Reproducible Research: Introduction to Containers

Session Overview

The Building Blocks of Reproducible Research
Why use Containers?
Intro to How Containers Work
Intro to Building Containers
Containers and Package Management

The Building Blocks of Reproducible Research

To build a reproducible scientific research project, we use:

Git: ensures our code is reproducible!
Package/Environment Managers: ensure our required scientific computing libraries are reproducible!
Containers: ensure that every remaining bit of our computing environment is reproducible!

Why Use Containers?

Containers:

Allow you to “Write Code Once, Deploy Anywhere”
Get rid of the “It Works on My Computer” Problem
Run On-Prem, In the Cloud, and on Microservice Platforms

Specifically, containers do two things:

Package your work into reproducible, portable units.
Allow you to deploy your work as a microservice.

Why Use Containers in the Cloud?

To scale in the cloud, we need code that is:

Portable
Modular
Version Controlled

How containers work

Where Containers Run

Building Containers

Building Containers: Defining Dockerfiles

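FROM python:3.9-slim

# Install curl and a C toolchain; the slim base image does not include them
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl build-essential \
    && rm -rf /var/lib/apt/lists/*

# Get Rust (rustup installs into /root/.cargo by default)
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y

ENV PATH="/root/.cargo/bin:${PATH}"

# Install Python dependencies
# (pip's 2020 resolver has been the default since pip 20.3, so no extra flag is needed)
COPY requirements.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Add app to image
COPY . ./

# Serve the app with 4 gunicorn workers on port 8050
CMD ["gunicorn", "-b", "0.0.0.0:8050", "-w", "4", "app:server"]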

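Key Commands:

FROM: sets the base image that the rest of the build starts from
RUN: executes a command at build time and saves the result in the image
ENV: sets an environment variable for later build steps and for the running container
COPY: copies files from the build context on your machine into the image
CMD: sets the default command that runs when a container starts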
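
The CMD line refers to app:server, meaning an object named server inside a module named app. The original slides do not show that module, so the following is only a minimal sketch of what a hypothetical app.py could look like, using a plain WSGI callable and nothing outside the standard library. A real project would more likely use a web framework here.

```python
# app.py (hypothetical): the module that "app:server" in the CMD line points to.
# Assumption: a bare WSGI callable stands in for whatever framework the real app uses.


def server(environ, start_response):
    """Minimal WSGI application; gunicorn would import it as app:server."""
    body = b"Hello from inside the container!\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # Send the status line and headers, then return the body as an iterable of bytes.
    start_response("200 OK", headers)
    return [body]
```

With a file like this next to the Dockerfile, the gunicorn command in CMD would have something to serve on port 8050.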

Building Containers: Defining Dockerfiles, continued

Some projects require lots of software installations. For instance, a single machine learning project may require:

C, C++, and Fortran compilers;
Relevant C, C++, and Fortran libraries;
Hardware drivers and CUDA for interfacing with GPUs during model training;
Python libraries for executing machine learning code and analyzing results;
Some final, extremely obscure scientific computing library that has that one tool that you really need.

That’s a lot for one researcher to install. This is where the magic of Docker Images comes in.

Building Containers: using Docker Images

Recall that Dockerfiles tell Docker what to put inside a Docker Image, which is then saved until it needs to be spun up into a container.
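
The Dockerfile-to-image-to-container lifecycle maps onto two Docker CLI commands: docker build and docker run. The sketch below only assembles and prints those commands so that it runs without Docker installed. The image tag my-research-app:0.1 and the helper names are hypothetical, and the port is the one used in the earlier Dockerfile.

```python
import shlex


def build_command(tag, context="."):
    """Command that turns the Dockerfile in `context` into an image named `tag`."""
    return ["docker", "build", "-t", tag, context]


def run_command(tag, port=8050):
    """Command that starts a container from `tag` and publishes `port` to the host."""
    return ["docker", "run", "--rm", "-p", f"{port}:{port}", tag]


IMAGE_TAG = "my-research-app:0.1"  # hypothetical image name and version tag

# Print the commands in copy-and-paste form instead of executing them.
for cmd in (build_command(IMAGE_TAG), run_command(IMAGE_TAG)):
    print(shlex.join(cmd))
```

On a machine with Docker installed, each list could be passed to subprocess.run(cmd, check=True) to actually build the image and start the container.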

Chances are that if you need to get a piece of software into an image, someone else has needed it, too.

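Pre-built Docker Images are available from reputable registries. Starting from one of them speeds up building the Docker Image that you need for your specific application. We use the FROM instruction to name the base image, which Docker pulls from the registry if it is not already available locally. For reproducibility, consider pinning a specific version tag instead of latest, since latest can point to a different image over time.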

FROM pangeo/ml-notebook:latest

Docker Registry

Building containers: using Python packages and environments

You may have noticed that our Dockerfile included a call to pip. Why is that?

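It’s because Docker containers, at their core, behave like extremely lightweight, isolated operating systems with nothing extra installed. (They share the host machine’s kernel rather than emulating a full computer, which is what keeps them lightweight.)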

Thus, you need to install anything that you need, including Python libraries.

To install what we need, we call pip install and pass a list of our desired Python libraries to Docker via requirements.txt, which we usually store in the same directory as the Dockerfile.
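
A requirements.txt helps reproducibility most when every library is pinned to an exact version with ==. As a rough illustration, the sketch below scans the text of a requirements file and reports entries that are not pinned. The helper name find_unpinned and the example file contents are hypothetical, and the parsing is deliberately simple (it does not handle every syntax pip accepts).

```python
def find_unpinned(requirements_text):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Hypothetical requirements.txt contents, for illustration only.
example = """\
numpy==1.26.4
pandas>=2.0
gunicorn
"""

print(find_unpinned(example))  # ['pandas>=2.0', 'gunicorn']
```

An empty result would suggest that rebuilding the image later installs the same library versions, at least for the packages listed directly in the file.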

Frequently, pip is sufficient for our needs when it comes to creating Docker Images and running containers. Sometimes, though, we may need additional capabilities…

Package & Environment Management

Package and environment managers:

Create isolated environments that handle dependencies, conflicts, and package versions.
Are portable and version controlled, because each environment is described in a plain-text file.

Python Package and Environment Managers

Conda
Pyenv
Virtual Environments
- Primer
- Guide
Poetry
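
Of the tools listed above, virtual environments are available directly from Python's standard library through the venv module. The following is a minimal sketch of creating one programmatically. The helper name create_environment and the directory name env are hypothetical, and pip is left out of the new environment (with_pip=False) only to keep the example fast and self-contained.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(target_dir):
    """Create an isolated virtual environment and return the path to its config file."""
    venv.create(target_dir, with_pip=False)
    return Path(target_dir) / "pyvenv.cfg"


# Build the environment in a throwaway directory and show its configuration.
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_environment(Path(tmp) / "env")
    print(cfg.read_text())
```

The pyvenv.cfg file records which base interpreter the environment was created from, which is part of what keeps it isolated from other projects. The usual command-line equivalent is python -m venv followed by a directory name.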

Docker Resources

Docker Resources (Continued)