Associated Material

Zoom notes: Zoom Notes 05 - Communicate

Readings:

Introducing RMarkdown

So far in this course we’ve been using R scripts for analysis, which let us save and run our R code, including comments about what we’re doing along the way. We’re now going to introduce RMarkdown documents, which are like R scripts on steroids!

RMarkdown is a framework that enables the creation of reproducible documents which are a combination of text, R code, and the evaluated output from the code all embedded in a single document. Not only that, but from a single RMarkdown source document, multiple different output formats can be produced such as HTML, PDF, and Word docs.

In fact this entire course has been written using RMarkdown! At the top right of each page is a Code button that will let you download the RMarkdown code that created the page.


Below is an example of an RMarkdown source document

---
title: "Abridged Gapminder Analysis"
date: 2022-04-13
output: html_document
---

```{r setup, include = FALSE}
library(tidyverse)
```

# Introduction

Load in the Gapminder dataset so that it is ready for analysis

```{r read.csv}
# Save an imported data frame into a named variable
gapminder_data <- read_csv("gapminder_data_2007.csv")
```

There are `r nrow(gapminder_data)` rows to the dataset.

## Visualise Life Expectancy

This is a histogram of the life expectancy.

```{r hist}
# Histogram of life expectancy values from gapminder
gapminder_data %>% 
  ggplot(aes(x = lifeExp)) + 
  geom_histogram()
```

There are three main components to this document

  1. The YAML header which is surrounded by ---s and provides information for the compiling process
  2. R code chunks which are surrounded by ```s
  3. Text which can be formatted using the Markdown language.

A reference guide of RMarkdown syntax can be found through Help -> Cheat Sheets -> R Markdown Reference Guide in the RStudio menu.


Example RMarkdown

Before we delve into explaining each part of the RMarkdown file, we’re going to create our own from the template that comes with RStudio.

To do this, go to File -> New File -> R Markdown. You’ll then be presented with a dialog window for setting up the new document.

Take the opportunity to fill in your name and a title, then click OK.

You should now have a document that looks like the following:

---
title: "My First Rmd"
author: "Murray"
date: '2022-04-13'
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r cars}
summary(cars)
```

## Including Plots

You can also embed plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)
```

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

RStudio Visual Editor

Since RStudio v1.4 there has been a visual editor that can be turned on, providing a live-preview, graphical point-and-click method of editing Markdown.

(Screenshots: the same document shown in source view and in visual view.)


Documentation for how to use the editor and its functionality can be found at https://rstudio.github.io/visual-markdown-editing/


Knitting

In order to get our output document, we need to compile, or “knit”, the document. Behind the scenes the text portions are formatted based on the markdown syntax, the R code is run and the results generated, and then the formatted text, code, and results are “knitted” together into a single output.

A key benefit for reproducibility is that when you knit, the RMarkdown document is evaluated from top to bottom in a fresh R session, separate from your current one. The document therefore needs to be self-contained: it must include every command, from loading packages and reading your data in, through processing it, to making your awesome tables and plots like those in the Visualisation Module.

To knit the document, look for the Knit button in the top left of the “source” panel. The keyboard shortcut is Ctrl + Shift + K on PC or Cmd + Shift + K on macOS.

You will then be prompted to save this script; call it “r_markdown_example.Rmd” and save it in your scripts/ directory within your project directory. Once the document has been knitted, a window should pop up containing your brand new analysis document!

RMarkdown scripts generally have the file extension .Rmd.

Take a few minutes to read from top to bottom through your script and identify the same features in your outputted HTML document.
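Knitting can also be triggered from the R console with the render() function from the rmarkdown package, which is handy inside scripts. The sketch below is illustrative only: it writes a tiny throwaway document to a temporary file (the file and its contents are made up for this example) and renders it to HTML, assuming the rmarkdown package and Pandoc are available.

```r
library(rmarkdown)

# Write a tiny, made-up RMarkdown document to a temporary file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(
  c(
    "---",
    "title: \"Render example\"",
    "output: html_document",
    "---",
    "",
    "One plus two is `r 1 + 2`."
  ),
  rmd_file
)

# Render the document; the path of the output file is returned
html_file <- render(rmd_file, quiet = TRUE)
html_file
```

For your own document this would be something like render("scripts/r_markdown_example.Rmd"). Note that, called this way, render() evaluates the chunks in your current R session by default, so the Knit button remains the better check that a document is self-contained.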


Markdown syntax

Markdown is a simplified language that uses symbols to encode formatting of text in a compiled document. Markdown documents can be converted to HTML or LaTeX (used for PDF) through Pandoc (which comes bundled with RStudio).

Headings

Headings use # for the largest heading (level 1) through to ###### for the smallest heading (level 6), and correspond to the h1 to h6 heading tags in HTML.

# Level 1 heading

## level 2 heading

### level 3 heading

#### level 4 heading

##### level 5 heading

###### level 6 heading


We’ll cover some more of the common text formatting now, where you’ll see the rendered paragraph followed by the markdown syntax that was used to generate it:

Bold/Italics

Italics is encoded by surrounding word(s) with a single asterisk (*) or underscore (_), bold uses double asterisks ** or underscores __. To superscript something, surround it with carets (^), and to subscript surround it with tildes (~). Surrounding with double tildes will strikethrough.

*Italics* is encoded by surrounding word(s) with a single asterisk (\*) or underscore (_), **bold** uses double asterisks ** or underscores __. To ^super^script something, surround it with carets (^), and to ~sub~script surround it with tildes (~). Surrounding with double tildes will ~~strikethrough~~.


Lists

Unordered lists can be made by starting a line with either a dash (-) or an asterisk (*) and if you want to nest items use a tab or two spaces to indent per layer.

  • item 1
  • item 2
  • item 3
    • subitem 1
    • subitem 2
      • sub sub item 1
  • item 4

- item 1
- item 2
- item 3
  - subitem 1
  - subitem 2
    - sub sub item 1
- item 4

Ordered lists start the line with a number followed by a full stop. It is possible to nest unordered and ordered lists within the same list.

  1. item 1
  2. item 2
  3. item 3

1. item 1
2. item 2
3. item 3


Block quotes

Block quotes are a way of including blocks of text from someone else. To use these, begin the line with a > angle bracket.

> Block quotes are a way of including blocks of text from someone else. To use these, begin the line with a > angle bracket.


Verbatim code

If you want to include code in your document, the use of verbatim blocks will stop the symbols being interpreted as markdown, and they will be reproduced as is in the document.
These blocks are started and ended with three backticks ```

```
If you want to include code in your document, as has been done to demonstrate the markdown code that generated each of the example paragraphs, the use of verbatim blocks will stop the symbols being interpreted as markdown, and they will be reproduced as is in the document.
These blocks are started and ended with three backticks ```
```

You can also do inline verbatim by surrounding the text with a single backtick

You can also do `inline verbatim` by surrounding the text with a single backtick

Code Chunks

Markdown provides verbatim code blocks, but where RMarkdown really comes into its own is the ability to have the included code evaluated and the results embedded directly below the code that created them. And despite the name, you’re not limited to R: other languages can be included and run (so long as the underlying engines are set up).

A code chunk takes this format, similar to the verbatim code block, but the first three backticks are followed by curly braces containing the name of the language in lower case, in this case “r”:

```{r}
1 + 2
```

Would produce

1 + 2
#> [1] 3
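
To see which languages knitr knows about, you can list its registered engines. This is a small sketch using knitr's knit_engines object; an engine being listed only means knitr knows how to hand code to it, and the language itself (for example Python) is assumed to be installed separately.

```r
library(knitr)

# Names of the language engines that knitr has registered
engines <- names(knit_engines$get())

# Check for a few common ones
c("python", "bash", "sql") %in% engines
```

A chunk for one of these languages is written by swapping the r inside the curly braces for the engine name, for example {python}.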


Working directory

The working directory, the location where R starts looking for specified files (e.g. a csv to read in), defaults for an RMarkdown document to the directory where the RMarkdown file is saved. This is a common source of errors when compiling an RMarkdown document if your RMarkdown is saved in a subdirectory and your file paths don’t account for that.

Don’t use setwd() in an RMarkdown document. knitr restores the working directory at the end of each chunk, so the change will not carry over to later chunks, and a hard-coded path makes the document less portable.

If you are using an RStudio project and structure as introduced in Introducing R and Rstudio you can make use of the here package which provides a nice way of dealing with relative file paths as if you were navigating from the top of your project directory.

For instance given the following project setup:

my_project/
  |- data/
      \- my_csv.csv
  |- docs/
  |- outputs/
  |- scripts/
      \- my_rmd.Rmd
  \- my_project.Rproj


If we were working on the file my_rmd.Rmd without the use of here, we would need to use relative paths from scripts/. We want to use relative paths within our project because they aren’t dependent on any particular computer, which makes our project transferable. The command to read data in would look like this:

my_data <- read_csv("../data/my_csv.csv")

Using here, every path is relative to the directory containing the .Rproj file. This can be easier to think about because the path follows the structure of the project rather than the location of the file you’re currently working on; here works all that out for you:

library(here)
my_data <- read_csv(here("data/my_csv.csv"))
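
As a small sketch of what here() returns: it builds a full path starting from the top of the project. The path components can also be given as separate arguments, which avoids typing the separators yourself. The data/my_csv.csv file is the hypothetical one from the project layout above, and nothing is read from disk here.

```r
library(here)

# Both calls build the same path from the top of the project
path_single <- here("data/my_csv.csv")
path_parts  <- here("data", "my_csv.csv")

# Compare the two paths
identical(path_single, path_parts)
```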


Code Chunk Options

The behaviour of the code chunks can be modified with options. These options are provided inside the {}’s of the code chunk and are comma separated.

The defaults for a chunk are:

```{r, eval=TRUE, echo=TRUE, message=TRUE, include=TRUE, warning=TRUE}
1 + 2
```
  • echo=TRUE will “echo” the code that is run above the results
  • eval=TRUE means the code inside the chunk will be evaluated (run)
  • include=TRUE means the code and the results will be included in the document
  • warning=TRUE will include any warnings as output in the document
  • message=TRUE will include messages as output in the document

These can individually be specified and set to FALSE to disable the specific behaviour.
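
Rather than repeating the same options on every chunk, the defaults can be changed for the whole document with knitr::opts_chunk$set(), usually in the setup chunk at the top (as the RStudio template does with echo = TRUE). The particular values below are only an example, for a report that hides code, messages, and warnings.

```r
library(knitr)

# Change the default options for every chunk that follows
opts_chunk$set(
  echo = FALSE,     # don't show the code
  message = FALSE,  # don't show messages
  warning = FALSE   # don't show warnings
)

# Inspect one of the current defaults
opts_chunk$get("echo")
```

Options given on an individual chunk still override these document-wide defaults.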

Images, Figures and Tables

Images

Inserting images into RMarkdown documents can be done in two main ways

  1. Through markdown with ![alt text](path/to/image)
  2. Using a code chunk and the include_graphics() function from knitr

The second method gives you more control over the display of the image in the output because you can use code chunk options such as:

  • fig.align to control the alignment on the page of the image
  • fig.cap to provide a figure caption
  • out.width controls the output width
  • out.height controls the output height
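
A minimal sketch of the second method is shown below. So that it runs without any extra files, it uses the R logo that ships with a standard R installation (an assumption about your setup); in your own document you would give the path to your image instead. The chunk label and caption are made up for illustration.

```{r r-logo, echo=FALSE, out.width='30%', fig.align='center', fig.cap='The R logo.'}
# Path to an image; here the logo bundled with R
logo_path <- file.path(R.home("doc"), "html", "logo.jpg")

# Insert the image into the output document
knitr::include_graphics(logo_path)
```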

Figures

Images generated through code such as plots will automatically be included as the output underneath the code that created them.

The figure placement and size can be controlled through the code chunk options:

  • fig.align to control the alignment on the page of the image
  • fig.cap to provide a figure caption
  • fig.width controls the width of the plot (in inches)
  • fig.height controls the height of the plot (in inches)
  • fig.asp sets the aspect ratio (height / width) of the plot, so the height is calculated as fig.width * fig.asp
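
As an illustrative sketch, the chunk below combines several of these options. It uses the built-in pressure dataset from the template document; the label, sizes, and caption are arbitrary choices.

```{r pressure-plot, fig.width=6, fig.asp=0.6, fig.align='center', fig.cap='Vapour pressure of mercury against temperature.'}
# fig.width is in inches; with fig.asp = 0.6 the height works out to 6 * 0.6 = 3.6 inches
plot(pressure)
```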

Tables

Tables can be created manually through markdown using the following syntax

col 1 | col 2 | col 3
---|---|---
row 1 | a | 1
row 2 | b | 2

which creates the following table:

col 1   col 2   col 3
row 1   a       1
row 2   b       2

But these tables can be quite laborious to create and customise. They will also need to be manually updated if your data changes. A better option is to create tables directly from your data using the kable() function from knitr, which will take a data frame and automatically create the markdown for it.

library(palmerpenguins)
library(knitr)

penguins_small <- head(penguins, n = 10)

kable(penguins_small)
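
| species | island    | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex    | year |
|:--------|:----------|---------------:|--------------:|------------------:|------------:|:-------|-----:|
| Adelie  | Torgersen |           39.1 |          18.7 |               181 |        3750 | male   | 2007 |
| Adelie  | Torgersen |           39.5 |          17.4 |               186 |        3800 | female | 2007 |
| Adelie  | Torgersen |           40.3 |          18.0 |               195 |        3250 | female | 2007 |
| Adelie  | Torgersen |             NA |            NA |                NA |          NA | NA     | 2007 |
| Adelie  | Torgersen |           36.7 |          19.3 |               193 |        3450 | female | 2007 |
| Adelie  | Torgersen |           39.3 |          20.6 |               190 |        3650 | male   | 2007 |
| Adelie  | Torgersen |           38.9 |          17.8 |               181 |        3625 | female | 2007 |
| Adelie  | Torgersen |           39.2 |          19.6 |               195 |        4675 | male   | 2007 |
| Adelie  | Torgersen |           34.1 |          18.1 |               193 |        3475 | NA     | 2007 |
| Adelie  | Torgersen |           42.0 |          20.2 |               190 |        4250 | NA     | 2007 |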

These kable tables can be further customised from the function parameters such as col.names to provide a vector of column names for the table, digits to round numbers, and align to control the alignment of the columns.
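
For instance, a sketch using these three parameters on the penguins table from above (the column labels are just one possible choice, and the palmerpenguins package is assumed to be installed):

```r
library(palmerpenguins)
library(knitr)

penguins_small <- head(penguins, n = 10)

penguin_table <- kable(
  penguins_small,
  # one name per column, in order
  col.names = c("Species", "Island", "Bill length", "Bill depth",
                "Flipper length", "Body mass", "Sex", "Year"),
  # round numeric columns to whole numbers
  digits = 0,
  # l = left, r = right, c = centre; one letter per column
  align = "llrrrrcr"
)

penguin_table
```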

Additional customisation can be achieved through the kableExtra package, which provides numerous extra functions for customising tables in both HTML and LaTeX. What is possible differs slightly between the two formats, but both include features such as row/column/cell colouring, text formatting, groupings, and footnotes.


Here is an example of some extra customisations that could be done to the original table that was demonstrated above with kable. If your customisations on your table rely on the data staying the same they might need to be redone if you update the data in the table.

library(palmerpenguins)
library(knitr)
library(kableExtra)

penguins_small <- head(penguins, n = 10)

# kbl comes from kableExtra and is a version of kable()
kableExtra::kbl(penguins_small, 
    col.names = c("Species", 
                  "Island", 
                  "Bill Length", 
                  "Bill Depth", 
                  "Flipper Length", 
                  "Body Mass (g)", 
                  "Sex", 
                  "Year"), 
    align = "llrrrrcr",
    caption = "A table showing the measurements of the first 10 penguins from the Palmers Penguins dataset.") %>% 
  kableExtra::kable_styling(full_width = TRUE,
                            position = 'center',
                            font_size = 16,
                            bootstrap_options = 'striped') %>% 
  # add in a grouping header for the columns using mm
  kableExtra::add_header_above(header = c("",
                                          "", 
                                          "Measurements (mm)" = 3, 
                                          "", 
                                          "", 
                                          ""))
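
A table showing the measurements of the first 10 penguins from the Palmers Penguins dataset.

(Grouped header “Measurements (mm)” spans the Bill Length, Bill Depth, and Flipper Length columns.)

| Species | Island    | Bill Length | Bill Depth | Flipper Length | Body Mass (g) |  Sex   | Year |
|:--------|:----------|------------:|-----------:|---------------:|--------------:|:------:|-----:|
| Adelie  | Torgersen |        39.1 |       18.7 |            181 |          3750 |  male  | 2007 |
| Adelie  | Torgersen |        39.5 |       17.4 |            186 |          3800 | female | 2007 |
| Adelie  | Torgersen |        40.3 |       18.0 |            195 |          3250 | female | 2007 |
| Adelie  | Torgersen |          NA |         NA |             NA |            NA |   NA   | 2007 |
| Adelie  | Torgersen |        36.7 |       19.3 |            193 |          3450 | female | 2007 |
| Adelie  | Torgersen |        39.3 |       20.6 |            190 |          3650 |  male  | 2007 |
| Adelie  | Torgersen |        38.9 |       17.8 |            181 |          3625 | female | 2007 |
| Adelie  | Torgersen |        39.2 |       19.6 |            195 |          4675 |  male  | 2007 |
| Adelie  | Torgersen |        34.1 |       18.1 |            193 |          3475 |   NA   | 2007 |
| Adelie  | Torgersen |        42.0 |       20.2 |            190 |          4250 |   NA   | 2007 |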


Citations

Citations can be inserted into an RMarkdown document. This document from RStudio goes through how to do it using either Markdown or the visual editor, which can be linked with a citation manager such as Zotero, or can search by DOI and more.
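
As a rough sketch of the Markdown approach: the YAML header names a bibliography file, and entries are then cited in the text by their key using [@key]. The file name references.bib and the key smith2020 below are placeholders, not files or entries from this course.

```
---
title: "My First Rmd"
output: html_document
bibliography: references.bib
---

This result was first described elsewhere [@smith2020].
```

When the document is knitted, the cited entries are formatted and a reference list is added at the end of the document.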


Quarto

Pre-requisite: In order to use Quarto you will need to install the Quarto program, which RStudio can then use to compile Quarto documents. See https://quarto.org/docs/get-started/

RMarkdown is an extremely useful format for creating reproducible reports. However, some key features are missing (without additional packages and tweaking) that you will find you need if you want to use it for documents like theses or manuscripts. The easiest to point to is cross-referencing figures and tables in your text (the packages bookdown and thesisdown do add this functionality to RMarkdown).

Quarto is the next iteration of RMarkdown. It takes much of the functionality that the extra packages added to RMarkdown and includes it right from the get-go. Not only that, but Quarto has been designed from the start to have multi-language support, so if you find yourself working in another language such as Python, the same document publishing system is still available to you.

By and large, Quarto and RMarkdown are extremely similar - they both share the three main components:

  1. YAML header
  2. Markdown blocks
  3. Code chunks

Where they differ is there are some slight syntax changes, largely in the YAML header and how options are given to code chunks.

The first difference is that instead of being saved with a .Rmd file extension, a Quarto document has the extension .qmd. And the button that compiles the document is called Render instead of Knit.

YAML Header

The YAML header is the first place where there is a main difference. Instead of output: html_document we use format: html. e.g.

---
title: "My Quarto Document"
format: html
author: "Murray Cadzow"
---

Code chunks

The code part of a code chunk is exactly the same in Quarto as in RMarkdown. Where they differ is in how the chunk options are provided. Instead of being placed within the curly braces, the options can be listed at the top of the chunk, each on a line beginning with #| and using the key: value syntax of the YAML header rather than =.

```{r}
#| eval: false

1 + 2
```
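
Several options can be stacked, one per #| line. The sketch below also shows how Quarto handles the cross-referencing mentioned earlier: a chunk whose label starts with fig- and which has a fig-cap can be referred to in the text as @fig-pressure. The label and caption here are made up for illustration.

```{r}
#| label: fig-pressure
#| echo: false
#| fig-cap: "Vapour pressure of mercury against temperature."

plot(pressure)
```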

Conclusion

This module has only scratched the surface of what is possible with the highly versatile format that is RMarkdown. The main benefit of RMarkdown is that it provides a mechanism to create reproducible analysis documents that include prose, code, and generated outputs.

Make sure to check out R Markdown: The Definitive Guide for a comprehensive introduction and guide to the possibilities of RMarkdown. There are also packages for creating multi-document RMarkdown outputs such as entire websites (pkgdown, distill), blogs (blogdown), and books (bookdown).

