Reproducible Research: Managing Containers

Session Overview

  • Brief Review of Reproducible Research and Containers
  • Defining a Docker Image with a Dockerfile
  • Running a pre-existing Docker Image
  • Finding running Docker Containers
  • Accessing and exploring running Docker Containers
  • Removing running Docker Containers

Why use containers?

  • Research is hard and details matter (like which version of a scientific computing library we’re using).
  • Different computing hardware and operating systems have different capabilities.
  • Docker Containers allow us to standardize our computing environments across computing platforms, no matter how different the underlying operating systems and hardware may be.

How do we use containers?

  1. Create a Dockerfile or pull a pre-existing Docker Image from a container registry.
  2. Use said Docker Image to run a corresponding Docker Container (we can run multiple containers simultaneously from the same image).

Containers can either be used interactively (e.g., running a Jupyter Lab server from within a Docker Container) or run autonomously (e.g., as part of a data processing pipeline on AWS).
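The two steps above can be sketched from the command line. This is a minimal example: `ubuntu:22.04` is just a stand-in image and `my-analysis` is a hypothetical image name, and the commands are guarded so the script is harmless on a machine without Docker.

```shell
#!/usr/bin/env bash
# Minimal sketch of the two-step workflow.
if command -v docker >/dev/null 2>&1; then
    # Step 1a: pull a pre-existing image from a registry (Docker Hub here)
    docker pull ubuntu:22.04
    # Step 1b (alternative): build your own image from a Dockerfile
    # in the current directory:
    #   docker build -t my-analysis:latest .

    # Step 2: run a container from the image (--rm cleans it up on exit)
    docker run --rm ubuntu:22.04 echo "hello from a container"
else
    echo "docker not installed; skipping"
fi
```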

Using pre-existing Docker Images: why it’s frequently a good idea

###################################################
################# BASE IMAGE START ################
###################################################

# Dockerfile for base image of all pangeo images
FROM ubuntu:22.04
# build file for pangeo images

LABEL org.opencontainers.image.source=https://github.com/pangeo-data/pangeo-docker-images

# Setup environment to match variables set by repo2docker as much as possible
# The name of the conda environment into which the requested packages are installed
ENV CONDA_ENV=notebook \
    # Tell apt-get to not block installs by asking for interactive human input
    DEBIAN_FRONTEND=noninteractive \
    # Set username, uid and gid (same as uid) of non-root user the container will be run as
    NB_USER=jovyan \
    NB_UID=1000 \
    # Use /bin/bash as shell, not the default /bin/sh (arrow keys, etc don't work then)
    SHELL=/bin/bash \
    # Setup locale to be UTF-8, avoiding gnarly hard to debug encoding errors
    LANG=C.UTF-8  \
    LC_ALL=C.UTF-8 \
    # Install conda in the same place repo2docker does
    CONDA_DIR=/srv/conda

# All env vars that reference other env vars need to be in their own ENV block
# Path to the python environment where the jupyter notebook packages are installed
ENV NB_PYTHON_PREFIX=${CONDA_DIR}/envs/${CONDA_ENV} \
    # Home directory of our non-root user
    HOME=/home/${NB_USER}

# Add both our notebook env as well as default conda installation to $PATH
# Thus, when we start a `python` process (for kernels, or notebooks, etc),
# it loads the python in the notebook conda environment, as that comes
# first here.
ENV PATH=${NB_PYTHON_PREFIX}/bin:${CONDA_DIR}/bin:${PATH}

# Ask dask to read config from ${CONDA_DIR}/etc rather than
# the default of /etc, since the non-root jovyan user can write
# to ${CONDA_DIR}/etc but not to /etc
ENV DASK_ROOT_CONFIG=${CONDA_DIR}/etc

RUN echo "Creating ${NB_USER} user..." \
    # Create a group for the user to be part of, with gid same as uid
    && groupadd --gid ${NB_UID} ${NB_USER}  \
    # Create non-root user, with given gid, uid and create $HOME
    && useradd --create-home --gid ${NB_UID} --no-log-init --uid ${NB_UID} ${NB_USER} \
    # Make sure that /srv is owned by non-root user, so we can install things there
    && chown -R ${NB_USER}:${NB_USER} /srv

# Run conda activate each time a bash shell starts, so users don't have to manually type conda activate
# Note this is only read by shell, but not by the jupyter notebook - that relies
# on us starting the correct `python` process, which we do by adding the notebook conda environment's
# bin to PATH earlier ($NB_PYTHON_PREFIX/bin)
RUN echo ". ${CONDA_DIR}/etc/profile.d/conda.sh ; conda activate ${CONDA_ENV}" > /etc/profile.d/init_conda.sh

# Install basic apt packages
RUN echo "Installing Apt-get packages..." \
    && apt-get update --fix-missing > /dev/null \
    && apt-get install -y apt-utils wget zip tzdata > /dev/null \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Add TZ configuration - https://github.com/PrefectHQ/prefect/issues/3061
ENV TZ UTC
# ========================

USER ${NB_USER}
WORKDIR ${HOME}

# Install latest mambaforge in ${CONDA_DIR}
RUN echo "Installing Mambaforge..." \
    && URL="https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh" \
    && wget --quiet ${URL} -O installer.sh \
    && /bin/bash installer.sh -u -b -p ${CONDA_DIR} \
    && rm installer.sh \
    && mamba install conda-lock -y \
    && mamba clean -afy \
    # After installing the packages, we cleanup some unnecessary files
    # to try reduce image size - see https://jcristharif.com/conda-docker-tips.html
    # Although we explicitly do *not* delete .pyc files, as that seems to slow down startup
    # quite a bit unfortunately - see https://github.com/2i2c-org/infrastructure/issues/2047
    && find ${CONDA_DIR} -follow -type f -name '*.a' -delete

EXPOSE 8888
ENTRYPOINT ["/srv/start"]
#CMD ["jupyter", "notebook", "--ip", "0.0.0.0"]

# We use ONBUILD (https://docs.docker.com/engine/reference/builder/#onbuild)
# to support triggering certain behavior when specific files exist in the directories of our
# child images (such as base-notebook, pangeo-notebook, etc). For example,
# in pangeo-notebook/Dockerfile, we *only* inherit from base-image:master, and
# that triggers all these ONBUILD directives - it is as if these ONBUILD
# directives are located inside pangeo-notebook/Dockerfile. This lets us
# keep the Dockerfiles for our child docker images simple, and customize
# them by just adding files with known names to them. This is
# to *mimic* the repo2docker behavior, where users can just add
# environment.yml, requirements.txt, apt.txt etc files to get certain
# behavior without having to understand how Dockerfiles work. We use
# ONBUILD to support a subset of the files that repo2docker supports.
# We do not use repo2docker itself here, to make the images much smaller
# and easier to reason about.
# ----------------------
ONBUILD USER root
# FIXME (?): user and home folder is hardcoded for now
# FIXME (?): this line breaks the cache of all steps below
ONBUILD COPY --chown=jovyan:jovyan . /home/jovyan

# repo2docker will load files from a .binder or binder directory if
# present. We check if those directories exist, and print a diagnostic
# message here.
ONBUILD RUN echo "Checking for 'binder' or '.binder' subfolder" \
        ; if [ -d binder ] ; then \
        echo "Using 'binder/' build context" \
        ; elif [ -d .binder ] ; then \
        echo "Using '.binder/' build context" \
        ; else \
        echo "Using './' build context" \
        ; fi

# Install apt packages specified in an apt.txt file if it exists.
# Unlike repo2docker, neither blank lines nor comments are supported here.
ONBUILD RUN echo "Checking for 'apt.txt'..." \
        ; [ -d binder ] && cd binder \
        ; [ -d .binder ] && cd .binder \
        ; if test -f "apt.txt" ; then \
        apt-get update --fix-missing > /dev/null \
        # Read apt.txt line by line, and execute apt-get install -y for each line in apt.txt
        && xargs -a apt.txt apt-get install -y \
        && apt-get clean \
        && rm -rf /var/lib/apt/lists/* \
        ; fi

# If a jupyter_notebook_config.py exists, copy it to /etc/jupyter so
# it will be read by jupyter processes when they start. This feature is
# not available in repo2docker.
ONBUILD RUN echo "Checking for 'jupyter_notebook_config.py'..." \
        ; [ -d binder ] && cd binder \
        ; [ -d .binder ] && cd .binder \
        ; if test -f "jupyter_notebook_config.py" ; then \
        mkdir -p /etc/jupyter \
        && cp jupyter_notebook_config.py /etc/jupyter \
        ; fi

ONBUILD USER ${NB_USER}

# We want to keep our images as reproducible as possible. If a lock
# file with exact versions of all required packages is present, we use
# it to install packages. conda-lock (https://github.com/conda-incubator/conda-lock)
# is used to generate this conda-linux-64.lock file from a given environment.yml
# file - so we get the exact same versions each time the image is built. This
# also lets us see what packages have changed between two images by diffing
# the contents of the lock file between those image versions.
# If a lock file is not present, we use the environment.yml file. And
# if that is also not present, we use the pangeo-notebook conda-forge
# package (https://anaconda.org/conda-forge/pangeo-notebook) to install
# a list of base packages.
# After installing the packages, we cleanup some unnecessary files
# to try reduce image size - see https://jcristharif.com/conda-docker-tips.html
ONBUILD RUN echo "Checking for 'conda-lock.yml' 'conda-linux-64.lock' or 'environment.yml'..." \
        ; [ -d binder ] && cd binder \
        ; [ -d .binder ] && cd .binder \
        ; if test -f "conda-lock.yml" ; then \
        conda-lock install --name ${CONDA_ENV} conda-lock.yml \
        ; elif test -f "conda-linux-64.lock" ; then \
        mamba create --name ${CONDA_ENV} --file conda-linux-64.lock \
        ; elif test -f "environment.yml" ; then \
        mamba env create --name ${CONDA_ENV} -f environment.yml  \
        ; else echo "No conda-lock.yml, conda-linux-64.lock, or environment.yml! *creating default env*" ; \
        mamba create --name ${CONDA_ENV} pangeo-notebook \
        ; fi \
        && mamba clean -yaf \
        && find ${CONDA_DIR} -follow -type f -name '*.a' -delete \
        && find ${CONDA_DIR} -follow -type f -name '*.js.map' -delete \
        ; if [ -d ${NB_PYTHON_PREFIX}/lib/python*/site-packages/bokeh/server/static ]; then \
        find ${NB_PYTHON_PREFIX}/lib/python*/site-packages/bokeh/server/static -follow -type f -name '*.js' ! -name '*.min.js' -delete \
        ; fi

# If a requirements.txt file exists, use pip to install packages
# listed there. We don't want to save cached wheels in the image
# to avoid wasting space.
ONBUILD RUN echo "Checking for pip 'requirements.txt'..." \
        ; [ -d binder ] && cd binder \
        ; [ -d .binder ] && cd .binder \
        ; if test -f "requirements.txt" ; then \
        ${NB_PYTHON_PREFIX}/bin/pip install --no-cache -r requirements.txt \
        ; fi

# If a postBuild file exists, run it!
# After it's done, we try to remove any possible cruft commands there
# leave behind under $HOME - particularly stuff that jupyterlab extensions
# leave behind.
ONBUILD RUN echo "Checking for 'postBuild'..." \
        ; [ -d binder ] && cd binder \
        ; [ -d .binder ] && cd .binder \
        ; if test -f "postBuild" ; then \
        chmod +x postBuild \
        && ./postBuild \
        && rm -rf /tmp/* \
        && rm -rf ${HOME}/.cache ${HOME}/.npm ${HOME}/.yarn \
        && rm -rf ${NB_PYTHON_PREFIX}/share/jupyter/lab/staging \
        && find ${CONDA_DIR} -follow -type f -name '*.a' -delete \
        && find ${CONDA_DIR} -follow -type f -name '*.js.map' -delete \
        ; fi

# If a start file exists, put that under /srv/start. Used in the
# same way as a start file in repo2docker.
ONBUILD RUN echo "Checking for 'start'..." \
        ; [ -d binder ] && cd binder \
        ; [ -d .binder ] && cd .binder \
        ; if test -f "start" ; then \
        chmod +x start \
        && cp start /srv/start \
        ; fi
# ----------------------

################# BASE IMAGE END ##################


###################################################
################ ML-NOTEBOOK START ################
###################################################

# ONBUILD instructions in base-image/Dockerfile are used to
# perform certain actions based on the presence of specific
# files (such as conda-linux-64.lock, start) in this repo.
# Refer to the base-image/Dockerfile for documentation.
ARG PANGEO_BASE_IMAGE_TAG=master
FROM pangeo/base-image:${PANGEO_BASE_IMAGE_TAG}

# Required for nvidia drivers to work inside the image on GKE
# No-ops on other platforms - Azure doesn't need these set.
# Shouldn't negatively affect anyone, and makes life easier on GKE.
ENV PATH=${PATH}:/usr/local/nvidia/bin
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/nvidia/lib64

################ ML-NOTEBOOK END ##################


###################################################
############# REQUIREMENTS.TXT START ##############
###################################################

# List of packages and versions installed in the environment
# Generated by parsing conda-linux-64.lock, please use that as source of truth
_libgcc_mutex==0.1
_openmp_mutex==4.5
_py-xgboost-mutex==2.0
absl-py==1.4.0
adal==1.2.7
adlfs==2024.2.0
affine==2.4.0
aiobotocore==2.11.2
aiohttp==3.9.3
aioitertools==0.11.0
aiosignal==1.3.1
alembic==1.13.1
annotated-types==0.6.0
anyio==3.7.1
aom==3.7.1
appdirs==1.4.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
argopy==0.1.14
arrow==1.3.0
asciitree==0.3.3
astropy==6.0.0
astropy-iers-data==0.2024.2.26.0.28.55
asttokens==2.4.1
astunparse==1.6.3
async-lru==2.0.4
async_generator==1.10
atk-1.0==2.38.0
attrs==23.2.0
av==11.0.0
aws-c-auth==0.7.3
aws-c-cal==0.6.1
aws-c-common==0.9.0
aws-c-compression==0.2.17
aws-c-event-stream==0.3.1
aws-c-http==0.7.11
aws-c-io==0.13.32
aws-c-mqtt==0.9.3
aws-c-s3==0.3.14
aws-c-sdkutils==0.1.12
aws-checksums==0.1.17
aws-crt-cpp==0.21.0
aws-sdk-cpp==1.10.57
awscli==2.13.39
awscrt==0.19.0
azure-core==1.30.0
azure-datalake-store==0.0.51
azure-identity==1.15.0
azure-storage-blob==12.19.0
babel==2.14.0
beautifulsoup4==4.12.3
binutils==2.40
binutils_impl_linux-64==2.40
binutils_linux-64==2.40
black==24.2.0
bleach==6.1.0
blinker==1.7.0
blosc==1.21.5
bokeh==3.3.4
boltons==23.1.1
boto3==1.34.34
botocore==1.34.34
bottleneck==1.3.8
bounded-pool-executor==0.0.3
branca==0.7.1
brotli==1.0.9
brotli-bin==1.0.9
brotli-python==1.0.9
bzip2==1.0.8
c-ares==1.27.0
c-compiler==1.7.0
ca-certificates==2024.2.2
cached-property==1.5.2
cached_property==1.5.2
cachetools==5.3.3
cachey==0.2.1
cairo==1.18.0
cartopy==0.22.0
cdsapi==0.6.1
certifi==2024.2.2
certipy==0.1.3
cf_xarray==0.9.0
cffi==1.16.0
cfgrib==0.9.10.4
cfitsio==4.3.1
cftime==1.6.3
cgen==2020.1
charset-normalizer==3.3.2
chex==0.1.83
ciso==0.2.0
click==8.1.7
click-plugins==1.1.1
cligj==0.7.2
cloudpickle==3.0.0
cmocean==3.1.3
colorama==0.4.6
colorcet==3.0.1
colorspacious==1.1.2
comm==0.2.1
configobj==5.0.8
contourpy==1.2.0
cryptography==40.0.2
cuda-version==11.8
cudatoolkit==11.8.0
cudnn==8.8.0.121
cycler==0.12.1
cython==3.0.8
cytoolz==0.12.3
dask==2024.2.1
dask-core==2024.2.1
dask-gateway==2024.1.0
dask-glm==0.3.2
dask-labextension==7.0.0
dask-ml==2023.3.24
datashader==0.16.0
dav1d==1.2.1
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
descartes==1.1.0
dill==0.3.8
distributed==2024.2.1
distro==1.8.0
dm-tree==0.1.8
docopt==0.6.2
docrep==0.3.2
docutils==0.19
donfig==0.8.1.post0
earthaccess==0.8.2
eccodes==2.34.1
entrypoints==0.4
eofs==1.4.1
erddapy==2.2.0
esmf==8.6.0
esmpy==8.6.0
etils==1.7.0
exceptiongroup==1.2.0
executing==2.0.1
expat==2.5.0
fastapi==0.110.0
fasteners==0.17.3
fastjmd95==0.2.1
fastprogress==1.0.3
ffmpeg==6.1.1
findlibs==0.0.5
fiona==1.9.5
flatbuffers==23.5.26
flax==0.6.1
flox==0.9.2
folium==0.15.1
font-ttf-dejavu-sans-mono==2.37
font-ttf-inconsolata==3.000
font-ttf-source-code-pro==2.038
font-ttf-ubuntu==0.83
fontconfig==2.14.2
fonts-conda-ecosystem==1
fonts-conda-forge==1
fonttools==4.49.0
fqdn==1.5.1
freeglut==3.2.2
freetype==2.12.1
freexl==2.0.0
fribidi==1.0.10
frozenlist==1.4.1
fsspec==2023.12.2
future==1.0.0
gast==0.5.4
gcc==12.3.0
gcc_impl_linux-64==12.3.0
gcc_linux-64==12.3.0
gcm_filters==0.3.0
gcsfs==2023.12.2.post1
gdal==3.8.1
gdk-pixbuf==2.42.10
geocube==0.5.0
geographiclib==1.52
geopandas==0.14.3
geopandas-base==0.14.3
geopy==2.4.1
geos==3.12.1
geotiff==1.7.1
geoviews-core==1.11.1
gettext==0.21.1
gflags==2.2.2
gh==2.43.1
gh-scoped-creds==4.1
giflib==5.2.1
git-lfs==3.4.1
gitdb==4.0.11
gitpython==3.1.42
glog==0.6.0
gmp==6.3.0
gnutls==3.7.9
google-api-core==2.17.1
google-auth==2.28.1
google-auth-oauthlib==1.0.0
google-cloud-core==2.4.1
google-cloud-storage==2.14.0
google-crc32c==1.1.2
google-pasta==0.2.0
google-resumable-media==2.7.0
googleapis-common-protos==1.62.0
graphite2==1.3.13
graphviz==9.0.0
greenlet==3.0.3
grpcio==1.54.3
gsw==3.6.17
gtk2==2.24.33
gts==0.7.6
h11==0.14.0
h2==4.1.0
h3-py==3.7.6
h5netcdf==1.3.0
h5py==3.10.0
harfbuzz==8.3.0
hdf4==4.2.15
hdf5==1.14.3
heapdict==1.0.1
holoviews==1.18.3
hpack==4.0.0
httpcore==1.0.4
httpx==0.27.0
hvplot==0.9.2
hyperframe==6.0.1
icu==73.2
idna==3.6
imagecodecs-lite==2019.12.3
imageio==2.34.0
importlib-metadata==7.0.1
importlib_metadata==7.0.1
importlib_resources==6.1.2
iniconfig==2.0.0
intake==2.0.3
intake-esm==2023.11.10
intake-geopandas==0.4.0
intake-stac==0.4.0
intake-xarray==0.7.0
ipdb==0.13.13
ipykernel==6.29.3
ipyleaflet==0.18.2
ipyspin==1.0.1
ipython==8.17.2
ipytree==0.2.2
ipyurl==0.1.2
ipywidgets==8.1.2
isodate==0.6.1
isoduration==20.11.0
jasper==4.2.1
jax==0.4.13
jaxlib==0.4.12
jedi==0.19.1
jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
json-c==0.17
json5==0.9.17
jsonpickle==3.0.2
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jsonschema-with-format-nongpl==4.21.1
jupyter-lsp==2.2.3
jupyter-panel-proxy==0.1.0
jupyter-resource-usage==1.0.1
jupyter-server-mathjax==0.2.6
jupyter-server-proxy==4.1.0
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_events==0.9.0
jupyter_server==2.12.5
jupyter_server_terminals==0.5.2
jupyter_server_xarray_leaflet==0.2.3
jupyter_telemetry==0.1.0
jupyterhub-base==4.0.2
jupyterhub-singleuser==4.0.2
jupyterlab==4.1.2
jupyterlab-git==0.50.0
jupyterlab-myst==2.3.1
jupyterlab-nvdashboard==0.4.0
jupyterlab_code_formatter==2.2.1
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.3
jupyterlab_widgets==3.0.10
kagglehub==0.1.9
kealib==1.5.3
keras==2.14.0
keras-core==0.1.7
keras-cv==0.8.2
kerchunk==0.2.3
kernel-headers_linux-64==2.6.32
keyutils==1.6.1
kiwisolver==1.4.5
krb5==1.21.2
lame==3.100
lazy_loader==0.3
lcms2==2.16
ld_impl_linux-64==2.40
lerc==4.0.0
libabseil==20230125.3
libaec==1.1.2
libarchive==3.7.2
libarrow==12.0.1
libass==0.17.1
libblas==3.9.0
libboost-headers==1.84.0
libbrotlicommon==1.0.9
libbrotlidec==1.0.9
libbrotlienc==1.0.9
libcblas==3.9.0
libcrc32c==1.1.2
libcurl==8.5.0
libdeflate==1.19
libdrm==2.4.120
libedit==3.1.20191231
libev==4.33
libevent==2.1.12
libexpat==2.5.0
libffi==3.4.2
libgcc-devel_linux-64==12.3.0
libgcc-ng==13.2.0
libgcrypt==1.10.3
libgd==2.3.3
libgdal==3.8.1
libgfortran-ng==13.2.0
libgfortran5==13.2.0
libgirepository==1.78.1
libglib==2.78.4
libglu==9.0.0
libgomp==13.2.0
libgoogle-cloud==2.12.0
libgpg-error==1.48
libgrpc==1.54.3
libhwloc==2.9.3
libiconv==1.17
libidn2==2.3.7
libjpeg-turbo==3.0.0
libkml==1.3.0
liblapack==3.9.0
libllvm14==14.0.6
libnetcdf==4.9.2
libnghttp2==1.58.0
libnsl==2.0.1
libnuma==2.0.16
libopenblas==0.3.26
libopenvino==2023.2.0
libopenvino-auto-batch-plugin==2023.2.0
libopenvino-auto-plugin==2023.2.0
libopenvino-hetero-plugin==2023.2.0
libopenvino-intel-cpu-plugin==2023.2.0
libopenvino-intel-gpu-plugin==2023.2.0
libopenvino-ir-frontend==2023.2.0
libopenvino-onnx-frontend==2023.2.0
libopenvino-paddle-frontend==2023.2.0
libopenvino-pytorch-frontend==2023.2.0
libopenvino-tensorflow-frontend==2023.2.0
libopenvino-tensorflow-lite-frontend==2023.2.0
libopus==1.3.1
libpciaccess==0.18
libpnetcdf==1.12.3
libpng==1.6.43
libpq==16.2
libprotobuf==3.21.12
librsvg==2.56.3
librttopo==1.1.0
libsanitizer==12.3.0
libsecret==0.18.8
libsodium==1.0.18
libspatialindex==1.9.3
libspatialite==5.1.0
libsqlite==3.45.1
libssh2==1.11.0
libstdcxx-ng==13.2.0
libtasn1==4.19.0
libthrift==0.18.1
libtiff==4.6.0
libunistring==0.9.10
libutf8proc==2.8.0
libuuid==2.38.1
libuv==1.46.0
libva==2.20.0
libvpx==1.13.1
libwebp==1.3.2
libwebp-base==1.3.2
libxcb==1.15
libxcrypt==4.4.36
libxgboost==2.0.3
libxml2==2.12.5
libxslt==1.1.39
libzip==1.10.1
libzlib==1.2.13
line_profiler==4.1.1
linkify-it-py==2.0.3
llvmlite==0.41.1
locket==1.0.0
lxml==5.1.0
lz4==4.3.3
lz4-c==1.9.4
lzo==2.10
mako==1.3.2
mapclassify==2.6.1
markdown==3.5.2
markdown-it-py==3.0.0
markupsafe==2.1.5
matplotlib-base==3.8.3
matplotlib-inline==0.1.6
mdit-py-plugins==0.4.0
mdurl==0.1.2
memory_profiler==0.61.0
mercantile==1.2.1
metpy==1.6.1
minizip==4.0.4
mistune==3.0.2
ml_dtypes==0.2.0
morecantile==5.3.0
mpi==1.0
mpich==4.2.0
msal==1.27.0
msal_extensions==1.1.0
msgpack-python==1.0.7
multidict==6.0.5
multimethod==1.11
multipledispatch==0.6.0
munkres==1.1.4
mypy_extensions==1.0.0
namex==0.0.7
nb_conda_kernels==2.3.1
nbclient==0.8.0
nbconvert-core==7.16.1
nbdime==4.0.1
nbformat==5.9.2
nbgitpuller==1.2.0
nbstripout==0.7.1
nc-time-axis==1.4.1
nccl==2.20.3.1
ncurses==6.4
nest-asyncio==1.6.0
netcdf-fortran==4.6.1
netcdf4==1.6.5
nettle==3.9.1
networkx==3.2.1
nodejs==20.9.0
noise==1.2.2
notebook==7.1.1
notebook-shim==0.2.4
nspr==4.35
nss==3.98
numba==0.58.1
numbagg==0.8.0
numcodecs==0.11.0
numpy==1.26.4
numpy_groupies==0.10.2
oauthlib==3.2.2
ocl-icd==2.3.2
ocl-icd-system==1.0.0
odc-geo==0.4.2
odc-stac==0.3.9
openh264==2.4.0
openjpeg==2.5.1
openssl==3.2.1
opt_einsum==3.3.0
optax==0.1.9
orc==1.9.0
overrides==7.7.0
p11-kit==0.24.1
packaging==23.2
pamela==1.1.0
pandas==2.2.1
pandocfilters==1.5.0
panel==1.3.8
pangeo-dask==2024.02.27
pangeo-notebook==2024.02.27
pango==1.52.0
parallelio==2.6.2
param==2.0.2
parcels==3.0.2
parso==0.8.3
partd==1.4.1
pathspec==0.12.1
patsy==0.5.6
pcre2==10.42
pexpect==4.9.0
pickleshare==0.7.5
pillow==10.2.0
pint==0.23
pint-xarray==0.3
pip==24.0
pixman==0.43.2
pkgutil-resolve-name==1.3.10
platformdirs==4.2.0
pluggy==1.4.0
pooch==1.8.1
pop-tools==2023.6.0
poppler==23.12.0
poppler-data==0.4.12
portalocker==2.8.2
postgresql==16.2
pqdm==0.2.0
proj==9.3.1
prometheus_client==0.20.0
promise==2.3
prompt-toolkit==3.0.38
prompt_toolkit==3.0.38
properscoring==0.1
protobuf==4.21.12
psutil==5.9.8
pthread-stubs==0.4
ptyprocess==0.7.0
pugixml==1.14
pure_eval==0.2.2
py-xgboost==2.0.3
pyarrow==12.0.1
pyarrow-hotfix==0.6
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycairo==1.26.0
pycamhd==0.7.0
pycparser==2.21
pyct==0.4.6
pyct-core==0.4.6
pydantic==2.6.2
pydantic-core==2.16.3
pydap==3.4.0
pyerfa==2.0.1.1
pygments==2.17.2
pygobject==3.46.0
pyjwt==2.8.0
pykdtree==1.3.11
pymbolic==2022.2
pynvml==11.5.0
pyopenssl==23.1.1
pyorbital==1.8.2
pyparsing==3.1.1
pyproj==3.6.1
pyresample==1.28.1
pyshp==2.3.1
pysocks==1.7.1
pyspectral==0.13.0
pystac==1.9.0
pystac-client==0.7.5
pytest==8.0.2
python==3.11.8
python-blosc==1.10.6
python-cmr==0.9.0
python-dateutil==2.8.2
python-eccodes==1.7.0
python-fastjsonschema==2.19.1
python-flatbuffers==23.5.26
python-geotiepoints==1.7.2
python-gist==0.10.6
python-gnupg==0.4.9
python-graphviz==0.20.1
python-json-logger==2.0.7
python-tzdata==2024.1
python-xxhash==3.4.1
python_abi==3.11
pytools==2023.1.1
pytz==2024.1
pyu2f==0.1.5
pyviz_comms==3.0.1
pywavelets==1.4.1
pyyaml==6.0.1
pyzmq==25.1.2
rasterio==1.3.9
rdma-core==28.9
re2==2023.03.02
readline==8.2
rechunker==0.5.2
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
requests-oauthlib==1.3.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.0
rio-cogeo==5.2.0
rioxarray==0.15.1
roaring-landmask==0.7.1
rpds-py==0.18.0
rsa==4.9
rtree==1.2.0
ruamel.yaml==0.17.21
ruamel.yaml.clib==0.2.7
s2n==1.3.49
s3fs==2023.12.2
s3transfer==0.10.0
satpy==0.47.0
scikit-image==0.20.0
scikit-learn==1.4.1.post1
scipy==1.12.0
seaborn==0.13.2
seaborn-base==0.13.2
send2trash==1.8.2
setuptools==69.1.1
shapely==2.0.3
simpervisor==1.0.0
six==1.16.0
smmap==5.0.0
snakeviz==2.2.0
snappy==1.1.10
sniffio==1.3.1
snuggs==1.4.7
sortedcontainers==2.4.0
soupsieve==2.5
sparse==0.15.1
sqlalchemy==2.0.27
sqlite==3.45.1
stack_data==0.6.2
stackstac==0.5.0
starlette==0.36.3
statsmodels==0.14.1
svt-av1==1.8.0
sysroot_linux-64==2.12
tbb==2021.11.0
tblib==3.0.0
tensorboard==2.14.1
tensorboard-data-server==0.7.0
tensorflow==2.14.0
tensorflow-base==2.14.0
tensorflow-datasets==4.8.3
tensorflow-estimator==2.14.0
tensorflow-metadata==1.13.1
termcolor==2.4.0
terminado==0.18.0
threadpoolctl==3.3.0
tifffile==2020.6.3
tiledb==2.18.2
tiledb-py==0.24.0
timezonefinder==6.4.1
tinycss2==1.2.1
tinynetrc==1.3.1
tk==8.6.13
toml==0.10.2
tomli==2.0.1
toolz==0.12.1
tornado==6.4
tqdm==4.66.2
traitlets==5.14.1
traittypes==0.2.1
trajan==0.6.0
trollimage==1.23.1
trollsift==0.5.1
types-python-dateutil==2.8.19.20240106
typing-extensions==4.10.0
typing_extensions==4.10.0
typing_utils==0.1.0
tzcode==2024a
tzdata==2024a
uc-micro-py==1.0.3
ucx==1.14.1
ujson==5.9.0
uri-template==1.3.0
uriparser==0.9.7
urllib3==1.26.18
uvicorn==0.27.1
watermark==2.4.3
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
webob==1.8.7
websocket-client==1.7.0
werkzeug==3.0.1
wheel==0.42.0
widgetsnbextension==4.0.10
wrapt==1.14.1
x264==1!164.3095
x265==3.5
xarray==2024.2.0
xarray-datatree==0.0.14
xarray-spatial==0.3.5
xarray_leaflet==0.2.3
xarrayutils==2.0.0
xbatcher==0.3.0
xcape==0.1.4
xclim==0.48.2
xerces-c==3.2.5
xesmf==0.8.4
xgboost==2.0.3
xgcm==0.8.1
xhistogram==0.3.2
xmip==0.7.2
xmitgcm==0.5.2
xorg-fixesproto==5.0
xorg-inputproto==2.3.2
xorg-kbproto==1.0.7
xorg-libice==1.1.1
xorg-libsm==1.2.4
xorg-libx11==1.8.7
xorg-libxau==1.0.11
xorg-libxdmcp==1.1.3
xorg-libxext==1.3.4
xorg-libxfixes==5.0.3
xorg-libxi==1.7.10
xorg-libxrender==0.9.11
xorg-renderproto==0.11.1
xorg-xextproto==7.3.0
xorg-xproto==7.0.31
xpublish==0.3.3
xrft==1.0.1
xskillscore==0.0.24
xxhash==0.8.2
xyzservices==2023.10.1
xz==5.2.6
yamale==4.0.4
yaml==0.2.5
yarl==1.9.4
zarr==2.17.0
zeromq==4.3.5
zict==3.0.0
zipp==3.17.0
zlib==1.2.13
zstd==1.5.5

############### REQUIREMENTS.TXT END ##############

...  # And there's still a lot more required for this Docker Image...

Using a pre-existing Docker Image

run_docker.sh
# From https://pangeo-docker-images.readthedocs.io/en/latest/ . 
# More specifically, from
# https://pangeo-docker-images.readthedocs.io/en/latest/howto/launch.html

aws_access_key=$(aws configure get default.aws_access_key_id)
aws_secret_access_key=$(aws configure get default.aws_secret_access_key)
aws_session_token=$(aws configure get default.aws_session_token)
aws_region="us-east-1"

docker pull pangeo/pangeo-notebook:latest

docker run -it --rm \
    -e AWS_ACCESS_KEY_ID=$aws_access_key \
    -e AWS_SECRET_ACCESS_KEY=$aws_secret_access_key \
    -e AWS_SESSION_TOKEN=$aws_session_token \
    -e AWS_DEFAULT_REGION=$aws_region \
    --volume $HOME:$HOME \
    -p 8080:8888 \
    pangeo/pangeo-notebook:latest \
    jupyter lab --ip 0.0.0.0 $HOME --NotebookApp.token=''
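For reference, here is what each flag in that docker run invocation does. The annotations are comments only; the guarded command at the end is a stripped-down relaunch (without the AWS credentials) that only runs if you set LAUNCH=1, since the server blocks until interrupted.

```shell
#   -it            interactive: keep STDIN open and allocate a pseudo-TTY
#   --rm           remove the container automatically when it exits
#   -e NAME=value  set an environment variable inside the container
#                  (here, AWS credentials for code running in the notebook)
#   --volume A:B   bind-mount host path A to path B inside the container
#   -p 8080:8888   forward host port 8080 to container port 8888
#                  (Jupyter listens on 8888; browse to http://localhost:8080)
if [ "${LAUNCH:-0}" = "1" ]; then
    docker run -it --rm -p 8080:8888 --volume "$HOME:$HOME" \
        pangeo/pangeo-notebook:latest \
        jupyter lab --ip 0.0.0.0 --NotebookApp.token=''
fi
```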
    

Finding running Docker Containers

Docker can run multiple containers at once (it can even run multiple containers from the same image, which is quite useful).

To find out which Docker containers are running, use docker ps. You’ll notice Docker borrows heavily from Linux and Git when it comes to naming its commands.

docker ps
CONTAINER ID   IMAGE                           COMMAND                  CREATED              STATUS              PORTS                NAMES
51aa22390e18   pangeo/pangeo-notebook:latest   "/srv/start jupyter …"   About a minute ago   Up About a minute   0.0.0.0:8080->8888   busy_blackwell
  • CONTAINER ID: the first 12 characters of the 64-character hexadecimal string that uniquely identifies the container.
  • IMAGE: the name of the image running in the container.
  • COMMAND: the command the container is running (the image’s entrypoint plus any command passed to docker run).
  • CREATED: how long ago the container was created.
  • STATUS: whether the container is running, exited, paused, restarting, etc.
  • PORTS: any port mappings between the host and the container (here, host port 8080 forwards to container port 8888).
  • NAMES: a human-friendly name for the container, much easier to type than the CONTAINER ID. Docker assigns one randomly unless you supply your own with --name.
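Beyond the plain listing, docker ps has flags for narrowing and reshaping the output. A few examples, guarded so they are skipped when Docker is absent:

```shell
if command -v docker >/dev/null 2>&1; then
    # Only names and status, one container per line:
    docker ps --format '{{.Names}}\t{{.Status}}'
    # Include stopped containers too:
    docker ps -a
    # Only containers started from a particular image:
    docker ps --filter "ancestor=pangeo/pangeo-notebook:latest"
else
    echo "docker not installed; skipping"
fi
```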

Accessing and exploring running containers

Sometimes it’s useful to access a running container in order to diagnose bugs and test potential solutions. You can access a container via:

docker exec -it <CONTAINER NAME> bash
  • docker exec: execute a command in a running container
  • -i: interactive
  • -t: allocate a pseudo-TTY (i.e., get a terminal console from within the running container)
  • <CONTAINER NAME>: the NAME (or CONTAINER ID) that docker ps reports for the container.
  • bash: the shell that we want to use to interact with the container.

Run this command and then look around. You can interact with the container much as you would any Linux system.
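A short exploration session might look like this, reusing the container name from the docker ps output above. The one-off docker exec calls are guarded so they only run if that container actually exists:

```shell
# Interactive shell inside the container:
#   docker exec -it busy_blackwell bash
# One-off, non-interactive commands work too:
if command -v docker >/dev/null 2>&1 \
   && docker ps --format '{{.Names}}' | grep -q '^busy_blackwell$'; then
    docker exec busy_blackwell whoami              # the NB_USER from the Dockerfile (jovyan)
    docker exec busy_blackwell printenv CONDA_ENV  # the conda env name (notebook)
    docker exec busy_blackwell ls /srv/conda       # where the image installed conda
else
    echo "container busy_blackwell is not running; skipping"
fi
```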

Removing running containers

Sometimes it’s also useful to kill running containers. This need usually arises while you’re iterating toward the correct Dockerfile for a project.

First run docker ps to get a list of all running containers. To kill a specific container, run:

docker kill <CONTAINER NAME>

Run docker ps again afterwards; that container should have disappeared from the list. Note that docker ps only shows running containers: killing a container stops it, but unless it was started with --rm it still exists until you delete it with docker rm (docker ps -a lists stopped containers too).
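The kill-then-clean-up sequence, sketched against the container name from earlier (guarded so it only runs if that container exists; the trailing docker rm is a no-op for containers started with --rm, which remove themselves):

```shell
if command -v docker >/dev/null 2>&1 \
   && docker ps --format '{{.Names}}' | grep -q '^busy_blackwell$'; then
    docker kill busy_blackwell    # SIGKILL: the container stops immediately
    # docker stop busy_blackwell  # gentler alternative: SIGTERM, then SIGKILL
    docker ps -a                  # a killed container still shows up here...
    docker rm busy_blackwell 2>/dev/null || true  # ...until removed (skip if --rm)
else
    echo "no running container named busy_blackwell; skipping"
fi
```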

Resources

Docker Resources (Continued)