Containerizing Interactive R Markdown Documents

R Markdown is a reproducible authoring format supporting dozens of static and dynamic output formats. Let's review why and how you should containerize Rmd files.

The rmarkdown package is behind the versatility of R Markdown with dozens of standard and community-provided output formats, ranging from HTML, Word, and PDF, to slides, books, and interactive documents. This abundance of awesomeness is a direct continuation of a long line of predecessors: Sweave/LaTeX, knitr, and pandoc. Its success is the foundation upon which Quarto is built.

The htmlwidgets R package provides the basis for interactive JavaScript widgets that you can embed in HTML outputs. These are pre-rendered objects that respond to various gestures, like hover and click events. You just render the document once, and you are done until the next time when the document needs updating.

True reactivity, however, requires a lot more JavaScript heavy-lifting – e.g. using Observable – or you can use Shiny as the runtime for the R Markdown document. Such documents require a web server to watch for reactive updates in the background. This makes them effectively Shiny apps.

As with any type of Shiny app, a lot of the hosting options out there require the Shiny app to run inside of a Docker container (e.g. Heroku, ShinyProxy, Fly). Because interactive R Markdown documents differ from Shiny apps in subtle ways, serving them is also slightly different. In this post, we review how to "dockerize" R Markdown documents with different runtime environments.

Prerequisites

We will use the scripts from the analythium/rmarkdown-docker-examples GitHub repository.

You can also pull the following two Docker images:

docker pull eddelbuettel/r2u:20.04
docker pull nginx:alpine

Runtime: Shiny

The way to make an R Markdown document interactive/reactive is to add runtime: shiny to the document’s YAML header. Now you can add Shiny widgets and Shiny render functions to the file’s R code chunks. This way the rendered HTML document will include reactive components.

Here is the runtime-shiny/index.Rmd file as our first document (following this example):

---
title: "Runtime: shiny"
output: html_document
runtime: shiny
---

Here are two Shiny widgets

```{r echo = FALSE}
selectInput("n_breaks",
  label = "Number of bins:",
  choices = c(10, 20, 35, 50),
  selected = 20)
sliderInput("bw_adjust",
  label = "Bandwidth adjustment:",
  min = 0.2,
  max = 2,
  value = 1,
  step = 0.2)
```

And here is a histogram

```{r echo = FALSE}
renderPlot({
  hist(faithful$eruptions,
    probability = TRUE,
    breaks = as.numeric(input$n_breaks),
    xlab = "Duration (minutes)",
    main = "Geyser eruption duration")
  dens <- density(faithful$eruptions,
    adjust = input$bw_adjust)
  lines(dens,
    col = "blue")
})
```

You should use rmarkdown::run() instead of rmarkdown::render("index.Rmd") to get the Shiny app running.
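To try the document locally before building an image, a minimal sketch from a shell could look like the following. It assumes that R with the shiny and rmarkdown packages is installed on the host and that you are in the repository root. The run_rmd helper name is hypothetical and not part of any package.

```shell
# Hypothetical helper: serve <dir>/index.Rmd as a Shiny app on a local port.
# Assumes R, shiny, and rmarkdown are installed on the host.
run_rmd() {
  dir="${1:-runtime-shiny}"
  port="${2:-3838}"
  # rmarkdown::run() starts a Shiny server for the document;
  # rmarkdown::render() alone would not start a server.
  R -e "rmarkdown::run('${dir}/index.Rmd', shiny_args = list(port = ${port}))"
}
```

Calling run_rmd runtime-shiny 3838 and opening localhost:3838 in the browser should show the same app that the container serves later.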

We will use the following Dockerfile:

FROM eddelbuettel/r2u:20.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    pandoc \
    && rm -rf /var/lib/apt/lists/*

RUN install.r shiny rmarkdown

RUN addgroup --system app && adduser --system --ingroup app app
WORKDIR /home/app
COPY runtime-shiny .
RUN chown app:app -R /home/app
USER app

EXPOSE 3838

CMD ["R", "-e", "rmarkdown::run(shiny_args = list(port = 3838, host = '0.0.0.0'))"]

Here is the explanation for each line:

the eddelbuettel/r2u parent image represents one of the most significant improvements in developer experience in the past few years: it cuts Docker build times to seconds thanks to full dependency resolution through Ubuntu's apt package manager (read more about it here)
install pandoc from the Ubuntu repositories, which rmarkdown needs to render the document
install the R packages
add a user named app and set /home/app as the working directory
copy the contents of the runtime-shiny folder into the /home/app folder
set file permissions and set the app user as the user of the container
expose port 3838
define the command using rmarkdown::run(), making sure Shiny runs on the port that we expect

You can build and run the image:

docker build -f Dockerfile.shiny -t psolymos/rmd:shiny .

docker run -p 8080:3838 psolymos/rmd:shiny

Visit localhost:8080 to see the R Markdown document running as a Shiny app.
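R needs a moment to start inside the container, so the page might not answer right away. As a rough sketch, a small polling helper can wait until the app responds. The wait_for_app name is hypothetical, and the snippet assumes curl is available on the host.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Usage: wait_for_app <url> [max_attempts]
wait_for_app() {
  url="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors, -sS keeps it quiet except for errors
    if curl -fsS -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out: $url" >&2
  return 1
}
```

For example, wait_for_app http://localhost:8080 in a second terminal returns as soon as the container serves the document.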

However, because the Shiny runtime requires a full document render for each end-user browser session, it can perform poorly for documents that don’t render quickly.

Runtime: Shinyrmd

Prerendered Shiny documents represent an improvement over the Shiny runtime for documents that don’t render quickly. This is where runtime: shinyrmd (or its alias, runtime: shiny_prerendered) comes in. Such documents are pre-rendered before deployment so that the HTML loads faster. No need to wait for Shiny to render it for us.

The Shinyrmd runtime also comes with various contexts: server-start/setup/data (that is analogous to global.R), render (like the UI), and server. These contexts provide a hybrid model of execution, where some code is run once when the document is pre-rendered and some code is run every time the user interacts with the document.

The runtime-shinyrmd folder contains another Rmd file (based on this flexdashboard example):

---
title: "Runtime: shinyrmd"
output: flexdashboard::flex_dashboard
runtime: shinyrmd
---

```{r setup, include=FALSE}
library(dplyr)
knitr::opts_chunk$set(echo = FALSE)
```

```{r data, include=FALSE}
faithful_data <- sample_n(faithful, 100)
```

Column {.sidebar}
--------------------------------------------

```{r}
selectInput("n_breaks",
  label = "Number of bins:",
  choices = c(10, 20, 35, 50),
  selected = 20)
sliderInput("bw_adjust",
  label = "Bandwidth adjustment:",
  min = 0.2,
  max = 2,
  value = 1,
  step = 0.2)
```

Based on [this](...) example.

Column
--------------------------------------------

### Geyser Eruption Duration

```{r}
plotOutput("eruptions")
```

```{r, context="server"}
output$eruptions <- renderPlot({
  hist(faithful_data$eruptions,
    probability = TRUE,
    breaks = as.numeric(input$n_breaks),
    xlab = "Duration (minutes)",
    main = "Geyser Eruption Duration")
  dens <- density(faithful_data$eruptions,
    adjust = input$bw_adjust)
  lines(dens,
    col = "blue")
})
```

You can render and run with rmarkdown::run().

The Dockerfile is slightly modified from the Shiny runtime:

we need 2 more dependencies (flexdashboard and dplyr)
we need to pre-render the document with rmarkdown::render() so that it is there when we spin up the container

FROM eddelbuettel/r2u:20.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    pandoc \
    && rm -rf /var/lib/apt/lists/*

RUN install.r shiny rmarkdown flexdashboard dplyr

RUN addgroup --system app && adduser --system --ingroup app app
WORKDIR /home/app
COPY runtime-shinyrmd .
RUN R -e "rmarkdown::render('index.Rmd')"
ENV RMARKDOWN_RUN_PRERENDER=0
RUN chown app:app -R /home/app
USER app

EXPOSE 3838

CMD ["R", "-e", "rmarkdown::run(shiny_args = list(port = 3838, host = '0.0.0.0'))"]

Build and run:

docker build -f Dockerfile.shinyrmd -t psolymos/rmd:shinyrmd .

docker run -p 8080:3838 psolymos/rmd:shinyrmd

Visit localhost:8080 to see the R Markdown document running as a pre-rendered Shiny app.

The docker build is super fast, thanks to the r2u image we used. The image size is around 1 GB, a bit larger than the ~800 MB parent image.
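To check the sizes on your own machine, a short sketch using docker image ls with a Go template format string could look like this. The image_sizes helper name is hypothetical, and the sizes you see will depend on the package versions at build time.

```shell
# Hypothetical helper: print "repository:tag size" for each image given.
# With no arguments it falls back to the two images discussed above.
image_sizes() {
  if [ "$#" -eq 0 ]; then
    set -- eddelbuettel/r2u:20.04 psolymos/rmd:shinyrmd
  fi
  for img in "$@"; do
    docker image ls --format '{{.Repository}}:{{.Tag}} {{.Size}}' "$img"
  done
}
```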

⚠️ Note: some rmarkdown versions might try to re-render index.Rmd when the container starts. To avoid this, we set the RMARKDOWN_RUN_PRERENDER=0 environment variable with an ENV instruction after the render() command.

Interactive tutorials with learnr

The learnr R package is for creating interactive tutorials using R Markdown. This also uses the prerendered Shiny runtime. Let's see an example using this learnr tutorial (follow the link to the whole file, showing only the YAML header and a few more lines here):

---
title: "Set Up"
output:
  learnr::tutorial:
    progressive: true
    allow_skip: true
runtime: shiny_prerendered
description: >
  Learn how to set up R and RStudio on your machine. We will also demonstrate
  how to install R packages from CRAN, and install the tidyverse package.
---

```{r setup, include=FALSE}
library(learnr)
tutorial_options(exercise.timelimit = 60)
```

## Welcome

This is a demo tutorial. Compare it to the [source code](https://github.com/rstudio/learnr/tree/main/inst/tutorials/ex-setup-r/ex-setup-r.Rmd) that made it.

[...]

The corresponding Dockerfile is very similar to the previous example. It uses the RMARKDOWN_RUN_PRERENDER environment variable to make sure rendering does not happen at runtime:

FROM eddelbuettel/r2u:20.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    pandoc \
    && rm -rf /var/lib/apt/lists/*

RUN install.r shiny rmarkdown learnr

RUN addgroup --system app && adduser --system --ingroup app app
WORKDIR /home/app
COPY learnr .
RUN R -e "rmarkdown::render('index.Rmd')"
ENV RMARKDOWN_RUN_PRERENDER=0
RUN chown app:app -R /home/app
USER app

EXPOSE 3838

CMD ["R", "-e", "rmarkdown::run(shiny_args = list(port = 3838, host = '0.0.0.0'))"]

Build and run the example:

docker build -f Dockerfile.learnr -t psolymos/rmd:learnr .

docker run -p 8080:3838 psolymos/rmd:learnr

Runtime: Static

Static runtime, as its name implies, creates a static document. It stays the same until some of the document's inputs (images, data) change and the document is re-rendered. This gives us an easy way to render the HTML document locally, copy it into a Docker image, then serve it with Nginx using this Dockerfile:

FROM nginx:alpine
COPY runtime-static/index.html /usr/share/nginx/html/index.html
CMD ["nginx", "-g", "daemon off;"]

This creates a tiny image (30 MB). Run the container and forward port 80, where Nginx serves the static files, to see the result.
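The render, build, and run steps just described can be sketched as one small helper. The Dockerfile.nginx file name, the psolymos/rmd:nginx tag, and the serve_static function name are all hypothetical, so use whatever name you saved the Dockerfile under. The sketch also assumes R with rmarkdown, flexdashboard, and dplyr is installed on the host.

```shell
# Hypothetical helper: render locally, build the Nginx image, and serve it.
# Usage: serve_static [dockerfile] [tag]
serve_static() {
  dockerfile="${1:-Dockerfile.nginx}"   # hypothetical file name
  tag="${2:-psolymos/rmd:nginx}"        # hypothetical image tag
  # render() writes index.html next to index.Rmd by default,
  # which is the path the COPY instruction expects
  Rscript -e "rmarkdown::render('runtime-static/index.Rmd')" &&
    docker build -f "$dockerfile" -t "$tag" . &&
    docker run --rm -p 8080:80 "$tag"
}
```

After calling serve_static, the document should be available at localhost:8080.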

What if you want to take advantage of a Docker-based build environment? You might experience issues with some of the dependencies on certain operating systems, or your IT department might not allow you to install packages yourself but you can use Docker ... Or what if you just want to complicate something that should be simple?

This brings us to a neat Docker build feature called multi-stage builds. We know that our Ubuntu-based image is quite big, so we only want to use that to render the HTML. Once it is done, we just insert that artifact into a small Alpine Linux image.

Multi-stage build with Nginx

With multi-stage builds, you use multiple FROM statements in your Dockerfile. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.

Let's see how this works for our R Markdown example. Here is the stripped-down static index.Rmd file from the runtime-static folder:

---
title: "Runtime: static"
output: flexdashboard::flex_dashboard
runtime: static
---

```{r setup, include=FALSE}
library(dplyr)
knitr::opts_chunk$set(echo = FALSE)
```

```{r data, include=FALSE}
faithful_data <- sample_n(faithful, 100)
```

Column {.sidebar}
--------------------------------------

Based on [this](...) example.

Column
-------------------------------------

### Geyser Eruption Duration

```{r}
hist(faithful_data$eruptions,
  probability = TRUE,
  breaks = 20,
  xlab = "Duration (minutes)",
  main = "Geyser Eruption Duration")
dens <- density(faithful_data$eruptions,
  adjust = 1)
lines(dens,
  col = "blue")
```

The rendered document:

Here is the 2-stage Dockerfile:

FROM eddelbuettel/r2u:20.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
    pandoc \
    && rm -rf /var/lib/apt/lists/*
RUN install.r shiny rmarkdown flexdashboard dplyr
WORKDIR /root
COPY runtime-static .
RUN R -e "rmarkdown::render('index.Rmd', output_dir = 'output')"

FROM nginx:alpine
COPY --from=builder /root/output /usr/share/nginx/html
CMD ["nginx", "-g", "daemon off;"]

The 1st stage looks familiar, except we don't worry about being the root user for the build step. We name this stage builder using AS {name} after the FROM instruction.

The 2nd stage uses another FROM instruction, and we specify that we COPY from the builder stage: --from=builder. We grab all the rendered HTML and move it to the Nginx HTML folder to be served by the file server.
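If the render step misbehaves, it can help to stop the build at the first stage and look at what it produced. The sketch below uses the --target flag of docker build for that. The psolymos/rmd:static-builder tag and the inspect_builder helper name are hypothetical, and the Dockerfile.static file name is taken from the build command used in this post.

```shell
# Hypothetical helper: build only the "builder" stage and list its output.
# Usage: inspect_builder [tag]
inspect_builder() {
  tag="${1:-psolymos/rmd:static-builder}"   # hypothetical image tag
  # --target stops the build after the named stage
  docker build -f Dockerfile.static --target builder -t "$tag" . &&
    docker run --rm "$tag" ls -l /root/output
}
```

The listing should contain the rendered index.html, which is what the 2nd stage copies into the Nginx HTML folder.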

We just took advantage of the R build environment to render the document, and we ended up with a minimal-sized image (24.3MB) with the static content inside.

Build and run:

docker build -f Dockerfile.static -t psolymos/rmd:static .

docker run -p 8080:80 psolymos/rmd:static

Multi-stage build with the OpenFaaS Watchdog

Another option is to use the static mode of the of-watchdog from the OpenFaaS project.

The of-watchdog implements an HTTP server listening on port 8080, and acts as a reverse proxy for running functions and microservices. It can be used independently, or as the entrypoint for a container with OpenFaaS.

The 3-stage Dockerfile looks like this, including the watchdog stage that we copy the Go binary from:

FROM eddelbuettel/r2u:20.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
    pandoc \
    && rm -rf /var/lib/apt/lists/*
RUN install.r shiny rmarkdown flexdashboard dplyr
WORKDIR /root
COPY runtime-static .
RUN R -e "rmarkdown::render('index.Rmd', output_dir = 'output')"

FROM ghcr.io/openfaas/of-watchdog:0.9.6 AS watchdog

FROM alpine:latest
RUN mkdir /app
COPY --from=builder /root/output /app
COPY --from=watchdog /fwatchdog .
ENV mode="static"
ENV static_path="/app"
HEALTHCHECK --interval=3s CMD [ -e /tmp/.lock ] || exit 1
CMD ["./fwatchdog"]

The last few lines tell the of-watchdog which mode to run in and where to find the static files to serve (there are other modes for dynamic/streaming data types).

Build and run the image:

docker build -f Dockerfile.static2 -t psolymos/rmd:static2 .

docker run -p 8080:8080 psolymos/rmd:static2

The final image size is now down to 14.9MB. Besides the smaller size, the of-watchdog also exposes a few Prometheus metrics out of the box that can be handy for monitoring.
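As a rough sketch, the metrics can be fetched with curl. This assumes that the of-watchdog serves them on port 8081 under /metrics, which is the default as far as I know but may differ between versions, so check the of-watchdog README for your release. The watchdog_metrics helper name is hypothetical.

```shell
# Hypothetical helper: fetch the Prometheus metrics from a running of-watchdog.
# Assumption: metrics are served on port 8081 at /metrics.
# Usage: watchdog_metrics [host] [port]
watchdog_metrics() {
  host="${1:-localhost}"
  port="${2:-8081}"
  curl -fsS "http://${host}:${port}/metrics"
}
```

For this to work, the metrics port has to be published as well, for example by adding -p 8081:8081 to the docker run command.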

If you want to learn more about the of-watchdog and the OpenFaaS project, check out Alex Ellis's book Serverless for Everyone Else and the R templates for OpenFaaS.

Conclusions

The Shiny and the pre-rendered Shinyrmd runtimes for R Markdown make it possible to write documents that users can interact with. This is a great way to get started with reactive programming for folks who are already familiar with R Markdown.

We can treat such interactive documents similarly to Shiny apps and deploy them using Docker containers. When it comes to static R Markdown documents, there is nothing that can prevent us from serving these from containers. We learned how to shrink the Docker image using multi-stage builds.