The goal of this document/presentation is to take you (a biostatistician or similar) from knowing nothing about Docker to being able to use it in your research through a simple example.
Docker was developed for software engineering. As software has gotten more complex and applications more numerous, the cost of manually configuring a new server every time you needed to scale up was simply too high. To deal with this, Docker (when I use the name “Docker” here I really mean any container or Virtual Machine (VM) software) was built. Docker exists as a method of recreating an image of a server, with all its software installed etc., in a single command.
Docker’s job is to make it so a programmer only has to sit and type out `sudo apt-get install clang...` once; then, anytime they have a new machine, they just use the Docker ‘image’ to start from a setup just like they had before, guaranteed.
Quickly before I bore you with a bunch of terminology you could get elsewhere let’s motivate why you as a biostatistician/data scientist may want to use Docker.
The reasons why I find Docker valuable fall into five main points:
A typical Docker workflow is as follows:

1. Install Docker
2. Write a `Dockerfile` that specifies what software you need
3. Build an image from that `Dockerfile`
4. Run the image as a container

Because I expect that workflow to make approximately zero sense at first blush, let’s expand on each step.
Installing Docker takes a different form on different machines. The main docs do a much better job than I can so I will point you to them.
Unfortunately, as of writing this you need the Professional version of Windows to run Docker, although this is supposed to change in the near-ish future. See the Docker for Windows docs.
This is a simple text script that describes the state of the machine. Think of it like a super bash/shell script that does all the typing into the command line for you. So if your usual workflow upon getting a new computer is opening up the terminal and running something like…
```
sudo apt-get install R-Lang RStudio-Server ...
```
This can get translated into the Dockerfile so Docker knows how to get the various software you need.
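As a minimal sketch of that translation (the base image and package names here are illustrative assumptions, not taken from the original workflow), those terminal commands might become a Dockerfile like this:

```dockerfile
# Hypothetical sketch of a Dockerfile: base image and packages are
# illustrative assumptions, not the document's actual setup.
FROM ubuntu:18.04

# Everything you would normally type into a fresh terminal goes in RUN steps
RUN apt-get update && \
    apt-get install -y r-base

# Copy any local scripts you always want available into the image
COPY setup.R /home/setup.R
```

Each instruction becomes a cached layer, so rebuilding after a small change only reruns the steps below the change.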
That is all great, but you may wonder: why not just use a big bash script here and avoid all this hassle? The beauty of these Dockerfiles is that you can stack them, building upon previous images built by you or others.
For instance, say you want to run a machine that has instant access to version 3.4 of R with the tidyverse suite of packages already installed. You can use the Rocker project, which kindly provides pre-built images with different R versions etc. for you.
Note the stacking of the shipping containers in the logo.
```dockerfile
# ./DOCKERFILE
# Start from image with R 3.3.1 and tidyverse installed
FROM rocker/tidyverse:3.3.1

... # Add your own desired packages etc. on top
```
Let’s set up a super simple example of building a Docker image with both the tidyverse and another custom package, visdat, to look at data overviews.
```dockerfile
FROM rocker/tidyverse:3.5.0

# Install visdat from github
RUN R -e "devtools::install_github('ropensci/visdat')"
```
Now that we have our simple Dockerfile we can ‘build’ it. This simply means we tell our computer to go and grab all the necessary files and construct the image for use. Run this from the same directory where you made your Dockerfile:

```
$ docker build -t tidy_vizdat .
```
After you do this you’ll get a nice matrix-esque string of status bars…

```
Sending build context to Docker daemon  50.18kB
Step 1/2 : FROM rocker/tidyverse:3.5.0
3.5.0: Pulling from rocker/tidyverse
54f7e8ac135a: Pull complete
...
```
These show you progress of the downloading and constructing of your new image!
Once this completes you now have an image. This means that, no matter what happens, you will always be able to run this image and it will work the same. Note that if you don’t specify versions for packages and you rebuild the image at a later date, things may not be identical; but as long as you don’t rebuild the image, running it will always give the same result.
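One way to guard against that rebuild drift is to pin exact package versions in the Dockerfile rather than installing whatever is current. A sketch, assuming `devtools` is available in the base image (it is in the rocker/tidyverse images) and using an illustrative version number:

```dockerfile
FROM rocker/tidyverse:3.5.0

# Pin an exact package version so a rebuild years later installs the
# same code. The version number here is an assumption for illustration.
RUN R -e "devtools::install_version('visdat', version = '0.5.3')"
```

Tagged base images like `rocker/tidyverse:3.5.0` serve the same purpose one layer down: the R version is frozen even if `latest` moves on.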
Now that your image is built, all you need to do is use the `run` command to start it up and enter it:

```
docker run -it tidy_vizdat bash
```

This says: run your just-created image and enter it in a `bash` shell. After a second your terminal will look something like this:
```
$ docker run -it tidy_vizdat bash
root@061df01792d0:/#
```
You are now in the container! It’s just a Linux machine running within your computer. We can open R and run our world-changing calculations…
```
root@061df01792d0:/# R

R version 3.5.0 (2018-04-23) -- "Joy in Playing"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
...
Type 'q()' to quit R.

> 4 * 3
[1] 12
```
Beautiful! Now to close everything, simply type `exit` into the terminal; Docker will shut down the container and it will be like nothing ever happened!
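One small caveat: `exit` stops the container, but the stopped container actually lingers on disk until you remove it. A few standard commands for tidying up (`--rm` is the usual shortcut):

```shell
# List all containers, including stopped ones
docker ps -a

# Remove a stopped container by its ID (the hex string from the prompt)
docker rm 061df01792d0

# Or have Docker remove the container automatically when it exits
docker run --rm -it tidy_vizdat bash
```

Either way, the image itself is untouched; only the container instance is discarded.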
But wait, this isn’t particularly helpful. What if we wanted to use RStudio or import data? Fret not, these are possible as well.
The image that we loaded also happens to have RStudio Server installed on it. This means that if we can get access to the container from our web browser, we can use everyone’s favorite IDE to work/run scripts.
The beauty of container stacking in action: Rocker just added the tidyverse on top of their already-built RStudio image. DRY comes to software installation!
To make our web browser able to connect to the container, we need to tell Docker to map a local port to one of the container’s internal ports. That is, if a server is running on port `8787` inside the container, like RStudio Server does, we need to map a port on our local computer to it.
Luckily, this just means a couple of changes to the `docker run` command.

```
docker run -it -p 8787:8787 -e DISABLE_AUTH=true tidy_vizdat
```
Note that we have added `-p 8787:8787`, which tells Docker to map port `8787` on our computer to port `8787` in the container. In addition, we have added `-e DISABLE_AUTH=true`, which just tells RStudio we don’t want to use the login screen. To see more about customizing these behaviours, such as when you need more security, read the Rocker docs. Last, we simply left off the `bash` at the end of the command because we will do our accessing of the container through the web browser.
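As for importing data, the standard approach is a bind mount, which makes a local folder visible inside the container. A sketch, assuming your data sits in the current working directory; `/home/rstudio` is the default user’s home directory in Rocker’s RStudio images, and the `project` subfolder name is an arbitrary choice:

```shell
# Map the RStudio port and mount the current directory into the
# container, so files appear in RStudio's file pane and any edits
# persist on the host after the container exits.
docker run -it -p 8787:8787 -e DISABLE_AUTH=true \
    -v "$(pwd)":/home/rstudio/project tidy_vizdat
```

Anything written under `/home/rstudio/project` inside the container lands in your local directory, which is how results survive the otherwise throwaway container.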