Writing your thesis with bookdown

This post details some tips and tricks for writing a thesis/dissertation using the bookdown R package by Yihui Xie. The idea of this post is to supplement the fantastic book that Xie has written about bookdown, which can be found here. I will assume that readers know a bit about R Markdown; a decent knowledge of R Markdown is going to be essential to using bookdown.

The first thing to highlight is that I’m not a pandoc or LaTeX expert. Indeed, I had no experience with LaTeX prior to writing my thesis, and limited knowledge of the pandoc internals. Some of the tips I present here may, therefore, be suboptimal. Nevertheless, the fact that I was able to successfully write my thesis using bookdown is a testament to the accessibility of the tool, even for someone who hadn’t written a line of code before starting a PhD.

What I did have going into writing my thesis was considerable prior experience with R Markdown, which I started using at the end of the first year of my PhD. I would say that learning R Markdown is an essential part of learning R nowadays, so hopefully R users reading this post already know some markdown. The biggest thing that made using bookdown so accessible for me was that I already had lots of my analyses written up in markdown. For example, for one of my thesis chapters I had already submitted a paper where I had a markdown file for all the analysis (see here). I similarly had another paper where I had written the whole initial draft using markdown. For PhD students who might want to use bookdown for their thesis I’d strongly recommend using markdown for any analysis reports or papers you write along the way. Markdown has many benefits by itself, and using it also sets you up nicely to use bookdown for your thesis.

Why bookdown?

Hopefully readers are already familiar with why R Markdown is good to use for analyses. By facilitating reproducibility and removing the need to manually type results, R Markdown makes analyses far easier to write up. Bookdown introduces some additional functionality on top of R Markdown that makes it particularly useful for writing a thesis. It automatically handles the numbering of your sections, figures, and tables, and generates the contents page and the lists of tables and figures. As anyone who has written a thesis in MS Word will tell you, formatting can often be a painful process. With bookdown all your formatting is handled for you and can be customised with some minor tweaks. While I won’t discuss it here, you also have the option with bookdown to save the .tex file it generates when compiling a PDF and customise it to your heart’s content, drawing on the power of LaTeX.

Some tips and tricks

The focus of this post is outputting to PDF as this is what you’ll want to do for the final version of your thesis. I’ve also included a few tips for outputting particular chapters to MS Word. This is something I had to do for my supervisors to comment on, and I imagine many others will be in that situation.

The YAML front matter

In your index.Rmd file there will be some YAML front matter where you can include various bits of information, as well as control the output in various ways. For more information on YAML you can head here or to the bookdown book.

One thing you’ll have to do for a thesis is create a title page. Happily with bookdown you can specify the title, subtitle, and author in the YAML front matter. When I had a play with this I found that the default formatting of the author and subtitle didn’t really suit my university’s formatting requirements. Thus, I ended up specifying my name using the subtitle field and put some other info in the author field:

title: 'Assessing and supporting working memory in children: the role of attention and the environment'  
subtitle: 'Edward David John Berry'  
author: 
- 'Submitted in accordance with the requirements for the degree of Doctor of Philosophy'
- 'School of Psychology'
- 'University of Leeds'
date: 'September 2017'

For the author field each bullet point is then a different line of text that I needed on my title page. The final version looked like this.

Another thing you’ll want to do for the PDF version of your thesis is specify the correct margins. The nice thing about bookdown/LaTeX is that it will automatically sort your margins for double-sided printing. This gives a more professional look and also saves on paper. To specify margins all you have to add to your YAML header is:

geometry: "left=4cm, right=3cm, top=2.5cm, bottom=2.5cm"

When you create the PDF version it will handle everything to ensure that the 4cm margin is always on the binding edge. Obviously, when you get your thesis printed don’t forget to tell them you want double-sided!

You can also control things like font size, and line-spacing (linestretch) in the YAML header, as well as whether you want a list of figures (lof) or tables (lot), or how many numbered sections to include in your contents (toc-depth):

fontsize: 12pt
linestretch: 1.5
toc-depth: 1
lof: True
lot: True

A toc-depth of 1 will mean that only sections 1 and 1.1, 1.2 etc. will be included in the contents. There are lots of other LaTeX YAML options that you can read about here, and in the section below on references. You can see the end of this post for the full YAML header I used for my thesis.

As well as being able to set certain options in the YAML header, you can also get bookdown to insert some LaTeX into the header of the source document it creates (in_header), or before or after its body (before_body, after_body; see here). In my YAML header I pointed bookdown to a .tex file:

output:
  bookdown::pdf_book:
    includes:
      in_header: preamble.tex

In preamble.tex I only had a few lines of LaTeX code:

\usepackage[none]{hyphenat}
\pagestyle{plain}
\raggedbottom 

\usepackage[none]{hyphenat} turns off hyphenation of words across multiple lines as I’m not a fan of this. \pagestyle{plain} removes section headings from the top of pages as I wanted a simple look. Finally, \raggedbottom prevents LaTeX from varying the spacing between paragraphs to minimise empty space on pages. Personally, I think the variable spacing looks a bit odd, so I turned it off.

Abstract, acknowledgements, and abbreviations

For the final version of your thesis you’re likely going to need to include some copyright statement, as well as abstract and acknowledgement pages. While bookdown offers the functionality to add in a preface (see here), this comes after the contents page, and lists of tables and figures. For my university, the abstract etc. must come before the contents page, making this preface option unsuitable. There is also the option to get bookdown to include some LaTeX code before the main body by providing a file in the YAML header, as noted above (see here). However, I had a play with this option and found that it still wasn’t creating the abstract etc. exactly how I wanted them. Ultimately, I developed a rather inelegant work-around, but it did the job. I created an R Markdown file in which to write the abstract etc. and then output to PDF. To do this you’ll need some stuff in the YAML header so that the formatting matches up with your bookdown book.

---
title: " "
author: " "
date: " "
output: pdf_document
classoption: twoside
fontsize: 12pt
linestretch: 1.5
geometry: "left=4cm, right=3cm, top=2.5cm, bottom=2.5cm"
---

The critical bits are getting your margins and font right, as well as ensuring your document is double-sided. I also removed page numbering to avoid conflicts with the main body of the thesis. To do this you can just add \pagenumbering{gobble} to the start of the document. For double-sided printing I wanted the copyright statement to be on the back of my title page. This meant including a page in the document that would later be removed so that the margins were correct for the copyright page. You can download the whole of my before_body.Rmd file here as there are a few other bits of LaTeX I had to use to achieve the result I needed. For example, I included a quote within a ‘minipage’. I also needed to insert a blank page into the file and did this by using an invisible non-breaking space character. The blank page was required to ensure that the title of my Acknowledgements section would appear on the right-hand page.

Once you’ve got the material you want before the body of the thesis sorted, it needs adding into the PDF that bookdown generates. This can be done with some very simple LaTeX code:

\documentclass{article}
\usepackage{pdfpages}

\begin{document}

\includepdf[pages=1]{../_book/_main.pdf}
\includepdf[pages=2-]{../rmarkdown/before_body.pdf}
\includepdf[pages=2-]{../_book/_main.pdf}

\end{document}

What we’re doing is taking the first page of the PDF that bookdown outputs in the _book sub-folder with \includepdf[pages=1]{../_book/_main.pdf}. Then we take from the 2nd page to the end (pages=2-) of our before_body.pdf generated by markdown. Remember we don’t want the first page of before_body.pdf as that was only created to ensure the margins were correct. Finally we take from the 2nd page to the end of our thesis. You can write .tex documents within RStudio and you even get a nice ‘Compile PDF’ button at the top.

Another thing you probably have to include in your thesis is a table of abbreviations. Here the ability to include a preface in a bookdown book does the job. In your index.Rmd file you just need to add a heading saying # Abbreviations {-}, with the {-} telling bookdown that it’s a preface that shouldn’t be included in the chapter numbering. You can then create a table of abbreviations with a bit of R code:

library(tidyverse)
library(knitr)

# tibble() replaces the deprecated data_frame()
tibble(
  Term = c("Working memory", "Short-term memory", "Long-term memory"),
  Abbreviation = c("WM", "STM", "LTM")) %>%
  arrange(Term) %>% # i.e. alphabetical order by Term
  kable(booktabs = TRUE) # booktabs = TRUE gives us a pretty APA-ish table

Referencing

You can also use your YAML header to control aspects of how referencing works. For example, I included:

bibliography: [packages.bib, papers.bib]
biblio-style: apalike

With bibliography I’m indicating the names of two .bib files in the top folder of my project that have my references in them. I used Mendeley for reading PDFs during my PhD so I originally created papers.bib by saving my library as a .bib file. packages.bib was generated in R using knitr::write_bib(). Once I had these files I needed a way to edit them easily, with changes being automatically made to the underlying .bib file. While I began by manually editing papers.bib in a text editor, I found JabRef offered a better solution. JabRef provides a minimalistic GUI for managing a BibTeX library, making edits, and adding new entries. Crucially, any changes you make are reflected in the underlying .bib file, meaning they’ll also be automatically reflected in your bookdown book.
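As a minimal sketch of how packages.bib can be generated, the snippet below calls knitr::write_bib() on a short list of packages. The package list and the temporary output location are assumptions for illustration; in a real project you would list the packages you actually cite and write to packages.bib in the top folder.

```r
# Packages to cite. This list is an assumption: replace it with your own.
pkgs <- c("base", "knitr")

# Written to a temporary folder here so the sketch is safe to run anywhere;
# in a thesis project you would use file = "packages.bib" instead.
bib_file <- file.path(tempdir(), "packages.bib")

# write_bib() creates one BibTeX entry per package. Keys get the default
# "R-" prefix, so they can be cited as [@R-base] or [@R-knitr].
knitr::write_bib(pkgs, file = bib_file)
```

Re-running this before the final compile picks up any package updates, since the entries record the installed versions.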

With the biblio-style option in the YAML header you can choose the style you want to use for your referencing. I found apalike great as it sensibly deals with the frustrating situation where people publish under slightly different variants of their name. For example, if Peter Example has published under “Pete Example”, “Peter Example”, and “Pete A. Example” it can handle this without thinking they are three different authors that must be differentiated.

One thing that caught me out with my references was that I assumed my in-text citations would be put in alphabetical order for me within a set of brackets. They aren’t, so make sure you put them in alphabetical order as you go. I.e. make sure you’re putting:

blah blah [@Adams2015; @Jones1976; @Peters1980]

and not:

blah blah [@Jones1976; @Adams2015; @Peters1980]

In the compiled version the citations will appear in the order you put them. Obviously the alphabetical ordering of the actual reference list is all handled for you.
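If you want to check the ordering after the fact, something like the following might help. It is a rough sketch in base R, not part of bookdown: unsorted_citations() is a hypothetical helper that scans lines of text for bracketed citation groups and returns those whose keys are out of alphabetical order. It assumes your citation keys start with the first author’s surname, as in the examples above, so sorting by key approximates sorting by author.

```r
# Hypothetical helper: return the bracketed citation groups whose keys are
# not in alphabetical order (case-insensitive).
unsorted_citations <- function(lines) {
  # Find bracketed groups containing at least one @key, e.g. [@a; @b]
  groups <- unlist(regmatches(lines, gregexpr("\\[[^]]*@[^]]*\\]", lines)))
  is_unsorted <- vapply(groups, function(g) {
    # Pull out the individual citation keys within this group
    keys <- unlist(regmatches(g, gregexpr("@[A-Za-z0-9_:.-]+", g)))
    is.unsorted(tolower(keys))
  }, logical(1))
  unname(groups[is_unsorted])
}

# Usage (the file name is only an example):
# unsorted_citations(readLines("01-intro.Rmd"))
```

The function only flags groups for you to inspect; it deliberately doesn’t rewrite the file.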

Tables

There are many different ways to do tables in markdown. I opted for using the kable() function from knitr with booktabs = TRUE. This gives you APA-like tables in the PDF version, which is what I wanted for the thesis. I also supplemented kable() with the kableExtra package by Hao Zhu. This package includes functions to add footnotes and header columns to a table, among other things. The guide by Zhu to using the package to create PDF tables is a great resource (see here).

Interacting with MS Word

I imagine many people thinking of using bookdown will still need to provide their PhD supervisors with a Word document to read and comment on. Clearly, the optimal workflow would be to bypass Word and use a platform like GitHub to manage comments and changes. However, this probably isn’t feasible unless your supervisors are already comfortable using version control. Instead, I would send my supervisors a Word document generated for a chapter and work on it in Word before integrating the changes back into my bookdown book. To generate a Word document for a chapter you can use bookdown::preview_chapter():

bookdown::preview_chapter("01-intro.Rmd",
                output_format = "bookdown::word_document2",
                output_file = paste0("thesis-intro-", format(Sys.Date(), "%d-%m-%y"), ".docx"),
                output_dir = "chapter-previews",
                output_options = list(reference_docx = "word-style-ref.docx"))

You can set the output_format to "bookdown::word_document2", which is slightly different to rmarkdown::word_document (see here). I had a folder in my thesis project called chapter-previews in which I saved these Word outputs. I also provided a Word reference document as an additional argument. This isn’t necessary but allows you to control what the output looks like by providing a Word document where you’ve edited the styles (see here). One thing that may freak you out when you output to Word is that the in-text citations might be a mess. For example, the first names of authors might be included in the citations. This is a result of the issue with people publishing under different names discussed above. You don’t have to worry as this will all be handled nicely in the final PDF version (see above); just make sure your supervisors know this.

Once you’re working in Word a few things will make your life easier when it comes to adding changes into your markdown document. Firstly, it’s best to avoid accepting changes in the Word document, as you’ll want some indication of where the changes are later. However, if you do end up accepting changes then both Word and LibreOffice have a useful feature for comparing documents. All you need to do is create a new Word document from your markdown file then compare this to the document you’ve been working on with your supervisors. In Word you can use the Compare option on the Review tab (see here). With LibreOffice you can go Edit > Track Changes > Compare Document… (see here). In both cases you’ll have the differences between the two documents highlighted; just be sure you know which version of the chapter is being used as the ‘reference document’.

Miscellaneous

When you’re writing your markdown document I’d recommend having each sentence on a separate line (credit to Olivia Guest for this tip). This makes it easier to keep track of changes, as well as quickly look over your references at the end of lines. Looking over your references is useful if, for example, you want to quickly check if you’ve missed the @ off any citations.

You might be tempted when writing your thesis to include all your R code in the files for each chapter and then cache the results. Personally, I’d recommend creating separate analysis scripts for each chapter and then using save.image() to save your R environment. You can then read in this environment and all your results within the markdown files for each chapter. In addition, saving the environment as a separate .RData file gives you more flexibility to load in the results for other purposes. To use this approach I just included something like this at the end of each analysis script:

save.image(file = paste0("data/chapter_2/", Sys.Date(), "_analysis-data", ".RData"))
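On the chapter side, reading the results back in might look roughly like this. It is a sketch rather than what I necessarily ran: load_latest_image() is a hypothetical helper, and it assumes the file naming used above, where the Sys.Date() prefix (YYYY-MM-DD) means the file names sort chronologically.

```r
# Hypothetical helper: load the most recent saved analysis image in a folder.
# Assumes files are named "<YYYY-MM-DD>_analysis-data.RData" as above.
load_latest_image <- function(dir, envir = parent.frame()) {
  files <- list.files(dir, pattern = "_analysis-data\\.RData$",
                      full.names = TRUE)
  if (length(files) == 0) {
    stop("No analysis image found in ", dir)
  }
  # ISO-style dates sort alphabetically in chronological order,
  # so the last file after sorting is the newest.
  latest <- sort(files)[length(files)]
  load(latest, envir = envir)
  invisible(latest)
}

# Usage in the first chunk of a chapter file (the path is only an example):
# load_latest_image("data/chapter_2")
```

If you would rather pin a chapter to a specific set of results, a plain load() with the full file name does the same job without the helper.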

As well as saving my analysis results in independent files, I did a similar thing for my figures. In my thesis project I had a folder called figures with a sub-folder for each chapter where I saved the figures. I preferred this approach as it meant I had the figure images to reuse for other purposes, such as talks. It also means that I have scripts for each chapter that just create the plots. These can then be used in the future to recreate similar plots without having to scroll through the markdown files for each chapter. In my chapters I then used include_graphics() from knitr to add in the figures, e.g.:

include_graphics("figures/chapter_2/hist_2017-08-09.png")

You need to create PDF versions of the figures for creating the PDF version of the thesis, as well as PNGs for Word/HTML output. If you specify the path to a PNG file, include_graphics() can look in the same folder for a PDF version of the figure when creating the PDF output (see here). In more recent versions of knitr this behaviour is controlled by the auto_pdf argument, so you may need to set auto_pdf = TRUE.
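One way to keep the two versions of each figure in step is to write them from the same plotting code. The sketch below uses only base R graphics devices; save_figure() is a hypothetical helper, and the default size and resolution are assumptions you would adjust to your own requirements.

```r
# Hypothetical helper: draw the same plot to a PDF and a PNG sharing a file stem.
# `draw` is a function that produces the plot when called.
save_figure <- function(draw, stem, formats = c("pdf", "png"),
                        width = 6, height = 4) {
  formats <- match.arg(formats, several.ok = TRUE)
  # Create the target folder (e.g. figures/chapter_2) if it doesn't exist
  dir.create(dirname(stem), recursive = TRUE, showWarnings = FALSE)
  paths <- paste0(stem, ".", formats)
  for (i in seq_along(paths)) {
    if (formats[i] == "pdf") {
      grDevices::pdf(paths[i], width = width, height = height)
    } else {
      grDevices::png(paths[i], width = width, height = height,
                     units = "in", res = 300)
    }
    # Always close the device, even if the plotting code fails
    tryCatch(draw(), finally = grDevices::dev.off())
  }
  invisible(paths)
}

# Usage (the path and the data object are only examples):
# save_figure(function() hist(my_data$score), "figures/chapter_2/hist_2017-08-09")
```

For a ggplot2 object p, the draw function would be function() print(p).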

Conclusion

This post aimed to provide a few tips that will make writing a PhD thesis using bookdown easier for people. To reiterate, this post is meant to supplement the bookdown book. There are many more topics I have not covered that Xie’s book goes over. I would recommend returning to this post after you’ve worked through at least some of the book. If there’s something you wish I had covered in this post you could tweet me at @ed_berry and I may update the blog.

Appendix

Here is the full YAML header used for my thesis. I kept the .tex file with keep_tex: yes but didn’t end up doing anything with it. That said, it can be useful to have for debugging.

--- 
title: "Assessing and supporting working memory in children: the role of attention and the environment"
subtitle: "Edward David John Berry"
author: 
- "Submitted in accordance with the requirements for the degree of Doctor of Philosophy"
- "University of Leeds" 
- "School of Psychology"
date: "September 2017"
output: 
  bookdown::pdf_book:
    includes:
      in_header: preamble.tex
    keep_tex: yes
    latex_engine: xelatex
    citation_package: natbib
fontsize: 12pt
linestretch: 1.5
toc-depth: 1
secnumdepth: 1
lof: True
lot: True
site: bookdown::bookdown_site
documentclass: book
bibliography: [packages.bib, papers.bib]
biblio-style: apalike
geometry: "left=4cm, right=3cm, top=2.5cm, bottom=2.5cm"
---