To build a reproducible scientific research project, we use:
Containers:
Specifically, containers do two things:
1. Create Isolated Environments that handle dependencies, conflicts, and package versions.
2. Are Portable and Version Controlled (text-based)
To scale on the cloud, we need code with exactly these properties: isolated, portable, and reproducible. Below is an example Dockerfile for a small Python web application:
FROM python:3.9-slim
# Install system tools needed below: curl for rustup, a C toolchain for building packages
RUN apt-get update && apt-get install -y --no-install-recommends curl build-essential \
    && rm -rf /var/lib/apt/lists/*
# Get Rust
RUN curl https://sh.rustup.rs -sSf | bash -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"
# Install Python dependencies
COPY requirements.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Add app to image
COPY . ./
# Serve the app with gunicorn on port 8050
CMD ["gunicorn", "-b", "0.0.0.0:8050", "-w", "4", "app:server"]
Key Commands:
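Two Docker CLI commands do most of the day-to-day work: docker build turns the Dockerfile into an image, and docker run starts a container from that image. A minimal sketch, assuming the Dockerfile above sits in the current directory (the tag name myapp and the host port mapping are arbitrary choices for illustration, not part of the example above):

# Build an image from the Dockerfile in the current directory and tag it "myapp" (arbitrary name)
docker build -t myapp .
# Start a container from that image, publishing the app's port 8050 on the host
docker run --rm -p 8050:8050 myapp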
Some projects require a lot of software to be installed. A single machine learning project, for instance, may depend on many interdependent packages and system libraries. That is a lot for one researcher to install by hand. This is where the magic of Docker Images comes in.
Recall that Dockerfiles tell Docker what to put inside a Docker Image, which is then saved until it needs to be spun up into a container.
Chances are that if you need to get a piece of software into an image, someone else has needed it, too.
Pre-built Docker Images are available from reputable registries. Using them expedites your ability to build the Docker Image that you need for your specific application. We use the FROM command to pull the Docker base image.
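In the example above, the FROM line names the official python:3.9-slim image from Docker Hub; at build time Docker fetches it from the registry automatically, and you can also pull it explicitly to inspect it:

# Download the python:3.9-slim base image from Docker Hub
docker pull python:3.9-slim
# List the images now available locally
docker image ls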
You may have noticed that our Dockerfile included a call to pip. Why is that?
It’s because Docker Containers, at their core, emulate extremely lightweight operating systems with nothing extra installed.
Thus, you have to install everything you need yourself, including Python libraries.
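You can see this for yourself by opening an interactive shell inside the bare base image; the check below is only illustrative (numpy stands in for any library that is not preinstalled):

# Start a throwaway container from the bare base image and open a shell in it
docker run --rm -it python:3.9-slim bash
# Inside that shell: the Python interpreter is present, but extra libraries are not
python -c "import numpy"   # fails with ModuleNotFoundError until numpy is installed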
To install what we need, we call pip install and pass a list of our desired Python libraries to Docker via requirements.txt, which we usually store in the same directory as the Dockerfile.
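As a sketch, a requirements.txt for an application like the one above could be as short as the libraries the code imports plus gunicorn for serving it; the package names below are illustrative only, not taken from the example:

# requirements.txt (illustrative contents)
dash
pandas
gunicorn

In practice you would usually pin exact versions here so that the image builds reproducibly.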
Frequently, pip is sufficient for our needs when it comes to creating Docker Images and running containers. Sometimes, though, we may need additional capabilities…
Python Package and Environment Managers