For starters, this is how you might want to turn your well-written Markdown file (with common metadata fields like `title`, `author` and `date`) into a properly typeset PDF document:
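A minimal sketch of that first step could look like the following. The file name `main.md` and every metadata value are placeholders of mine, and the conversion assumes that both Pandoc and a LaTeX engine are installed:

```shell
# Write a small Markdown source with a YAML metadata block.
# The file name and all field values are placeholders.
cat > main.md <<'EOF'
---
title: A Sample Report
author: Jane Doe
date: 2024-01-01
---

# Introduction

Some *well-written* Markdown goes here.
EOF

# Pandoc infers the output format from the extension passed to -o.
# PDF output needs a LaTeX engine (pdflatex by default) on the PATH.
if command -v pandoc >/dev/null 2>&1; then
  pandoc main.md -o main.pdf || echo "pandoc failed (is a LaTeX engine installed?)" >&2
fi
```

The `command -v` guard only keeps the snippet from erroring out on a machine without Pandoc; in day-to-day use the single `pandoc main.md -o main.pdf` line is all there is to it.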
However, Markdown is not TeX. Not even close. Once you need fine-grained control over the typesetting outcome, or perhaps just a little refinement of the LaTeX templating, you’ll soon notice that Pandoc has its quirks and gotchas. I’ve been using Pandoc in all my serious academic writing (incl. homework reports) for years, ever since I gave up on learning more about the overwhelmingly sophisticated TeX ecosystem and turned to something that “just works”. Pandoc fits my needs well. And when it doesn’t, there’s almost always a workaround that achieves the same thing neatly. That is mostly what this write-up is about.
Tweaking `default.latex`? Bad idea.
You could, of course, modify the default template (`default.latex`) provided by Pandoc, as long as you’re no stranger to LaTeX. In this way, you can achieve anything you want – in pure LaTeX.
There are, however, a few problems with this naïve approach:
- If you are tweaking the template just for something you’re currently working on, you will end up with a highly document-specific, hardly reusable template. This also defeats the purpose of using Pandoc – you could just write plain LaTeX anyway.
- If Pandoc improves its default template for a newer version, your home-brewed template won’t benefit from this (unless you’re willing to merge the diffs and resolve any conflicts by hand).
I’m conservative about changing the templates. If it’s a general issue that needs to be fixed in the default template, sending a pull request to pandoc-templates might be a better idea. Of course, if there’s a certain submission format you have to stick with (say, a given LaTeX template for conference papers), then you will have to fall back on your own template.
Separating the formatting stuff
I wouldn’t claim that I know the best practice of using Pandoc, but there is one common idiom that cannot be overstressed: Separate presentation and content!
In the YAML front matter of the main Markdown file you’re writing, put only things that matter to your potential readers:
And in a separate YAML file, here goes the formatting stuff:
Above is my personal default, and it’s worth a few words to explain:
- `geometry` is where you control the geometric settings of your document. For example, you may narrow down the page margin there, which is equivalent to passing the same option to the `geometry` package in raw LaTeX.
- Set `indent` to any value other than `false` if paragraph indentation is desired. (And it is often desired in formal publications.)
- `header-includes` is where you define your own macros, configure existing ones, or add a `\usepackage` line in case you want to use a package not enabled by Pandoc. Although you might as well define those in other places (e.g., in the content of a Markdown file), don’t do that.
- This decent Q.E.D. tombstone: is my favorite of all time. It doesn’t require the package.
With a separate formatting file, now here we are:
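As a rough sketch of this split, the snippet below writes a presentation-only YAML file and hands it to Pandoc next to the content. The file names, the margin value, the `mathtools` package and the `build_pdf` wrapper are all placeholders of mine; I'm also assuming a Pandoc recent enough to support `--metadata-file`, which is only one way to feed a separate YAML file in.

```shell
# Presentation-only metadata, kept apart from the content.
# The keys are Pandoc template variables; every value is a placeholder.
cat > format.yaml <<'EOF'
geometry: margin=1.5in
indent: true
header-includes: |
  \usepackage{mathtools}
EOF

# Build <name>.pdf from <name>.md plus the formatting file.
# PANDOC is a hypothetical override, e.g. PANDOC=echo for a dry run.
build_pdf() {
  "${PANDOC:-pandoc}" "$1" --metadata-file=format.yaml -o "${1%.*}.pdf"
}

# Usage: build_pdf main.md
```

Values set in the document's own front matter win over those in the metadata file, so the content file can still override a default when it really has to.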
While the Markdown syntax for citing is rather easy (`[@key]`), it takes effort to make things right, especially if you have a certain preferred citation format (APA, MLA, Chicago, IEEE, etc.).
The suggestion is: Use pandoc-citeproc. Once you have a list of references you’re interested in, you need two things to typeset those nicely in your document:
- A CSL (Citation Style Language) file, to specify the citation format you want to use.
- A BibTeX file, which is a list of all entries you might cite.
  - Citation entries in BibTeX format may be found easily on the Internet, through academic search engines and databases. Concatenate them into a single file.
As part of the YAML metadata, point the `bibliography` and `csl` fields at those two files:
Using `pandoc-citeproc` as a filter, generate the document with citations:
The list of references is appended to the end of the document. It is often desirable to give the references an obvious title (“References”), start them on a new page and avoid any further indentation, so the following goes at the end of the Markdown source:
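Here is a hedged sketch of the whole citation setup. All file names and the `build_with_citations` wrapper are placeholders. One assumption worth flagging: on Pandoc 2.11 and later, citation processing is built in and enabled with `--citeproc`; on older versions the equivalent is `--filter pandoc-citeproc`.

```shell
# Citation-related metadata; both file names are placeholders.
cat > cite.yaml <<'EOF'
bibliography: references.bib
csl: style.csl
EOF

# Tail of the Markdown source: page break, no paragraph indentation,
# then the heading under which the reference list will be placed.
cat >> main.md <<'EOF'

\newpage
\setlength\parindent{0pt}

# References
EOF

# PANDOC is a hypothetical override, e.g. PANDOC=echo for a dry run.
# On Pandoc older than 2.11, replace --citeproc with
# --filter pandoc-citeproc.
build_with_citations() {
  "${PANDOC:-pandoc}" "$1" --metadata-file=cite.yaml --citeproc -o "${1%.*}.pdf"
}

# Usage: build_with_citations main.md
```

The two fields in `cite.yaml` could just as well live in the front matter of the main file; keeping them separate is only my habit.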
Putting it all together!
Basically, we need 5 files in total:
- For content:
  - A Markdown file (possibly mixed with LaTeX): main text.
  - A BibTeX/BibLaTeX file: list of references.
- For presentation:
  - A YAML file: format-related metadata.
  - A LaTeX file: content of `header-includes`; package imports and macro definitions.
  - A CSL (XML) file: citation style.
And one command:
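One possible shape for that command, with all five file names as placeholders and `--citeproc` assuming Pandoc 2.11 or later:

```shell
# Build main.pdf from the five files listed above.
# All file names are placeholders; PANDOC is a hypothetical override
# (e.g. PANDOC=echo prints the arguments instead of running Pandoc).
build_all() {
  "${PANDOC:-pandoc}" main.md \
    --metadata-file=format.yaml \
    --include-in-header=format.tex \
    --bibliography=references.bib \
    --csl=style.csl \
    --citeproc \
    -o main.pdf
}

# Usage: build_all
```

If the YAML file also sets `header-includes`, I would move that content into the LaTeX file instead: as far as I can tell, `--include-in-header` takes precedence over the metadata field rather than adding to it.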
Open question: Lightweight replacement for `amsthm`?
Pandoc doesn’t provide native support for `amsthm` (and I wonder if there ever will be). You can still have the same thing in Pandoc Markdown:
However, everything in between `\begin` and `\end` will be treated as raw LaTeX, and the expressiveness of Markdown is lost there. More importantly, this is purely a LaTeX-specific thing, so there’s no way for Pandoc to convert this to HTML or any other format (unless you have a filter that does the trick). Consequently, I tend to write all definitions / theorems (lemmas, claims, corollaries, propositions…) in simple Markdown:
It does have some advantages over `amsthm`:
- Using `amsthm`, you cannot see the numbering of each theorem (definition, etc.) in the text editor (well, you can’t without a dedicated plugin at least). This is inconvenient when you need to refer to a prior one later. By numbering them explicitly, you can clearly see these ordinals in the Markdown source.
- It is perfectly valid Markdown, so it converts to any format as you wish (HTML, for example).
This also has some drawbacks compared to using `amsthm`, though:
- It doesn’t have theorem counters. You need to number things explicitly, manually. (Clearly you can’t have implicit numbering and explicit numbering at the same time, so here’s the trade-off.)
- It doesn’t have automatic formatting. That is, you could possibly get the style for a certain entry (plain, definition, remark) wrong.
- Semantically, they are not recognized as theorems, just normal text paragraphs. This is problematic if you want to prevent definitions and theorems from being indented, since there’s no way for LaTeX to tell them from a normal text.
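To make the comparison concrete, here is a small sketch that writes both styles side by side into a scratch file. The file name, the theorem text and the exact Markdown convention (bold label, italic body) are illustrative choices of mine; the raw LaTeX variant additionally assumes a `\newtheorem{theorem}{Theorem}` declaration in the header.

```shell
# Write the same theorem twice: once as a raw LaTeX environment,
# once as plain Markdown with an explicit, hand-written number.
# The quoted 'EOF' keeps the shell from touching backslashes.
cat > theorems.md <<'EOF'
\begin{theorem}
Every bounded monotone sequence of real numbers converges.
\end{theorem}

**Theorem 1.** *Every bounded monotone sequence of real numbers converges.*
EOF
```

Only the second form survives a conversion to HTML unchanged; only the first is numbered and styled by LaTeX.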
(Probably) the best solution is to write a filter that, by convention, converts a plain-text label at the beginning of a paragraph (a theorem, definition or lemma heading, for instance) to proper Markdown (for HTML target) or the corresponding `amsthm` block (for LaTeX target). Even better, it should be able to do cross-references accordingly (remember an earlier theorem? Let’s put an anchored link on that!). This is yet to be done, but would be very helpful to anyone who writes a lot of theorems and proofs and thus wants to avoid the kludge of mixing raw LaTeX with semantically evident Markdown.
Markdown is a really awesome format for text and prose. It’s really easy to manage in any text editor, and it’s quick to write. It has a lot of features, including bolding, italicizing, lists, quotes, embedded code, and more. It’s so easy to write that it’s the “language” of choice for many major websites such as Stack Overflow and Reddit, it being much easier to implement and looking nicer than a WYSIWYG text editor. I’m even writing this blog post using Markdown. However, you can’t really send someone a Markdown document. It’s meant to be processed into a more readable format, most commonly HTML.
LaTeX is a great tool for typesetting text. It has a lot of flexibility and standardizes how documents look. It’s so powerful that it has become the de facto tool for writing research papers. However, there is definitely a learning curve in using the software, and the source doesn’t look very nice.
Pandoc brings the best of both worlds. It allows conversion of Markdown to a predefined LaTeX template, allowing you to use Markdown to write LaTeX documents in a format that works out 99% of the time if you’re just writing notes or submitting a linear homework assignment. Usage is simple:
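Something along these lines should do. The `md_to` helper and the file name `notes.md` are made up for illustration; the Pandoc options themselves (`-s` and `-o`) are standard.

```shell
# Convert a Markdown file to the given output extension (tex or pdf).
# -s (--standalone) makes the .tex output a complete document rather
# than a fragment; it is harmless for PDF output.
# PANDOC is a hypothetical override, e.g. PANDOC=echo for a dry run.
md_to() {
  "${PANDOC:-pandoc}" "$1" -s -o "${1%.*}.$2"
}

# Usage:
#   md_to notes.md tex   # writes notes.tex
#   md_to notes.md pdf   # writes notes.pdf (needs a LaTeX engine)
```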
This generates either a `.tex` or a `.pdf` (compiled LaTeX) file that looks pretty good using the default settings.
Generating PDFs quickly
Since I use Markdown so much to generate PDFs, I’ve created the following shell function:
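A sketch of such a function, under a name of my choosing (`mdpdf`), might look like this:

```shell
# Convert a Markdown file to a PDF of the same base name, written to
# the current working directory regardless of where the source lives.
# PANDOC is a hypothetical override, e.g. PANDOC=echo for a dry run.
mdpdf() {
  "${PANDOC:-pandoc}" "$1" -o "$(basename "${1%.*}").pdf"
}

# Usage: mdpdf docs/notes.md   # writes ./notes.pdf
```

`${1%.*}` strips the extension and `basename` drops the directory part, which is what puts the PDF in the present working directory.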
You can add this to your shell startup file to get a command that works as follows:
This will generate a PDF named after the input file in your present working directory.
You can use file watching tools to automatically generate a PDF and leave it open/refreshing in your PDF reader. I personally use a tool called entr(1), which can be installed via Homebrew.
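For instance, a small wrapper around entr could look like the sketch below; the function name `watch_pdf` is hypothetical, and both `entr` and `pandoc` are assumed to be on the PATH.

```shell
# Rebuild the PDF every time the Markdown source is saved.
# entr reads the list of files to watch from standard input and
# reruns the given command whenever one of them changes.
watch_pdf() {
  echo "$1" | entr pandoc "$1" -o "${1%.*}.pdf"
}

# Usage: watch_pdf notes.md   # runs until interrupted with Ctrl-C
```

Leave the resulting PDF open in a reader that reloads changed files and you get something close to a live preview.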
You may want to insert some math into your document. You can do this by surrounding your math in dollar signs (`$...$`) and writing it in LaTeX form, for example:
This produces the equation inline. Documentation for this feature of Pandoc is pretty spotty – if you know of more ways to embed LaTeX equations in Pandoc-generated documents, please let me know!
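Here is a small sample, written to a scratch file whose name I made up. Besides the inline form, it shows the double-dollar form, which Pandoc's Markdown also accepts for display math.

```shell
# The quoted 'EOF' stops the shell from expanding the dollar signs.
cat > math.md <<'EOF'
Euler's identity, $e^{i\pi} + 1 = 0$, is set inline.

$$\int_0^1 x^2 \, dx = \frac{1}{3}$$
EOF
```

One gotcha with the inline form: the opening `$` must be followed directly by a non-space character and the closing `$` preceded by one, otherwise Pandoc treats the dollar signs as literal text.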
Setting everything up
All of the tools mentioned can be installed via Homebrew.
Writing Markdown instead of LaTeX allows me to iterate faster on my homework/notes and lets me worry more about my content than the formatting of my sections and subsections. It’s really easy to set everything up, and as a Vim user, it has increased my productivity quite a bit. I hope you find my workflow as useful as I have!