Lecture 24 - Docker for Data Science
conda, pipenv, virtualenv
bash shell, a git client, a Python interpreter, a Jupyter notebook server, a Quarto document processor, a SQL database, and Jupyter tools
docker run command
Sign in on the Docker Desktop application or on the website; it will open a web browser and ask you to log in
FROM image:tag
Create a file named Dockerfile (without any extension) in your directory
Start the file with FROM, followed by the base image we want to use
The : character is used to specify the version of the image. In this case, we are using Ubuntu 24.04
LABEL command
The LABEL instruction adds metadata to an image
LABEL replaces the deprecated MAINTAINER instruction to specify the maintainer of the image
# Metadata
LABEL version="1.0"
LABEL description="Container with the tools covered in QTM 350"
LABEL maintainer="Danilo Freire <danilo.freire@emory.edu>"
LABEL license="MIT"
The metadata can be viewed with the docker inspect command
RUN command
The RUN instruction executes any commands in a new layer on top of the current image and commits the results
Remember apt? We will use it again to install software packages (as we did with AWS)
Run apt-get update (only once) and then apt-get install <package>
To install git, we would run apt-get update && apt-get install git -y
Run apt-get clean and rm -rf /var/lib/apt/lists/* after installing the packages to keep the image small
RUN command
# Update and install dependencies
# Versions: https://packages.ubuntu.com/
RUN apt-get update && apt-get install -y --no-install-recommends \
    bash=5.2.21-2ubuntu4 \
    git=1:2.43.0-1ubuntu7.2 \
    sqlite3=3.45.1-1ubuntu2 \
    wget=1.21.4-1ubuntu4.1 \
    python3.12=3.12.3-1ubuntu0.5 \
    python3.12-venv=3.12.3-1ubuntu0.5 \
    python3-pip=24.0+dfsg-1ubuntu1.1 && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
Make sure to select the right Ubuntu release (24.04 LTS or noble in this case)
bash
Docker uses sh as the default shell, but we want to use bash instead
We can switch to bash by adding the following line to the Dockerfile
SHELL ["/bin/bash", "-c"]
The -c option tells bash to run the command and then exit
This lets us run bash commands in the Dockerfile without having to specify the shell every time
I used apt list --installed to see which packages are installed on my system and just copied them to the Dockerfile
pip installed, so you would only need to install the other packages
We use RUN instructions to install a few Python libraries with pip3, such as numpy, pandas, jupyterlab, dask, and matplotlib
To find the versions, use pip show <package> | grep Version or pip freeze > requirements.txt and then copy the versions from the file
ENV PATH="/opt/venv/bin:$PATH" prepends the directory /opt/venv/bin to the beginning of the existing PATH environment variable within the Docker image
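To see what prepending to PATH does, here is a small sketch in plain shell. It only changes PATH for the current shell session (the ENV instruction does the same thing for every later layer and for the running container). The variable names original_path and first_entry are just for illustration.

```shell
# Keep a copy of the current search path so the change is easy to see
original_path="$PATH"

# Same assignment as the ENV instruction: the venv directory goes first
PATH="/opt/venv/bin:$original_path"

# Everything before the first colon is the directory searched first
first_entry="${PATH%%:*}"
echo "searched first: $first_entry"
echo "searched afterwards: ${PATH#*:}"
```

Because /opt/venv/bin comes first, commands such as python and pip resolve to the copies inside the virtual environment before the system ones.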
wget
We will use wget to download the binary
Quarto is not available through pip or apt, so we need to download it from the official website: https://quarto.org/docs/get-started/
wget is a command-line utility that allows you to download files from the web
With the .deb file (which is the package format for Ubuntu), we can install it with apt-get install <package> (like we did with the other packages)
Any file can be downloaded with wget, as long as we have the URL
JupyterLab runs on port 8888, so we will need to expose this port with the EXPOSE instruction
We can run bash inside the JupyterLab interface and have access to all the tools we installed in the container (like git, sqlite3, and Quarto) 😉
# Base image
FROM ubuntu:24.04
# Metadata
LABEL version="1.0"
LABEL description="Container with the tools covered in QTM 350"
LABEL maintainer="Danilo Freire <danilo.freire@emory.edu>"
LABEL license="MIT"
# Update and install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    bash=5.2.21-2ubuntu4 \
    git=1:2.43.0-1ubuntu7.2 \
    sqlite3=3.45.1-1ubuntu2 \
    libsqlite3-0=3.45.1-1ubuntu2 \
    wget=1.21.4-1ubuntu4.1 \
    nano=7.2-2ubuntu0.1 \
    python3.12=3.12.3-1ubuntu0.5 \
    python3.12-venv=3.12.3-1ubuntu0.5 \
    python3-pip=24.0+dfsg-1ubuntu1.1 && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
# Set default shell to Bash
SHELL ["/bin/bash", "-c"]
# Create and activate virtual environment
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Install Python dependencies in virtual environment
RUN pip install numpy==1.26.4 pandas==2.2.2 \
jupyterlab==4.2.5 ipykernel==6.29.5 \
dask==2024.11.2 matplotlib==3.9.2
# Install Quarto
RUN apt-get update && apt-get install -y --no-install-recommends wget ca-certificates && \
    # Download the specific Quarto deb file (arm64 build; use the linux-amd64 deb on x86-64 hosts)
    wget https://github.com/quarto-dev/quarto-cli/releases/download/v1.6.37/quarto-1.6.37-linux-arm64.deb && \
    # Install the local deb file (NOTICE the "./" prefix)
    apt-get install -y ./quarto-1.6.37-linux-arm64.deb && \
    # Clean up the downloaded file and apt cache
    rm quarto-1.6.37-linux-arm64.deb && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Create a directory for saving files
RUN mkdir -p /workspace
WORKDIR /workspace
# Expose port for JupyterLab
EXPOSE 8888
# Start JupyterLab
CMD ["sh", "-c", ". /opt/venv/bin/activate && jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root"]
# Run the Docker container
# docker build -t qtm350-container .
# docker run -it --rm -p 8888:8888 -v $(pwd):/workspace qtm350-container
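The Quarto download in the Dockerfile is pinned to the linux-arm64 package, which will not install on an x86-64 machine. As a rough sketch, the right file name could be picked from the machine type. The helper names deb_arch_for and quarto_deb_url are made up for this example, and the amd64 URL is an assumption that it follows the same naming pattern as the arm64 one used above.

```shell
quarto_version="1.6.37"   # version pinned in the Dockerfile

# Map the machine name reported by `uname -m` to the Debian architecture label
deb_arch_for() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the download URL for a given machine name
quarto_deb_url() {
  deb_arch="$(deb_arch_for "$1")" || return 1
  echo "https://github.com/quarto-dev/quarto-cli/releases/download/v${quarto_version}/quarto-${quarto_version}-linux-${deb_arch}.deb"
}

# Show the URL that matches the machine running this script
quarto_deb_url "$(uname -m)" || echo "no matching Quarto .deb for this machine"
```

The printed URL can then replace the one in the wget line (and the file name in the apt-get install and rm lines) before building the image.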
docker build command
The -t flag is used to tag the image with a name, in this case qtm350-container
The . at the end of the command specifies the build context, which is the current directory
docker run command
[...]
4.585 Some packages could not be installed. This may mean that you have
4.585 requested an impossible situation or if you are using the unstable
4.585 distribution that some required packages have not yet been created
4.585 or been moved out of Incoming.
4.585 The following information may help to resolve the situation:
4.585
4.585 The following packages have unmet dependencies:
4.650 sqlite3 : Depends: libsqlite3-0 (= 3.45.1-1ubuntu2) but 3.45.1-1ubuntu2.1 is to be installed
4.651 E: Unable to correct problems, you have held broken packages.
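The dependency line in the log names both the missing package and the exact version apt expects. As a minimal sketch, that line can be turned into a package=version pin with sed. The variable names error_line and pin are only illustrative, and the pattern assumes the "Depends: name (= version)" layout shown in this log.

```shell
# Dependency line copied from the build log above
error_line=' sqlite3 : Depends: libsqlite3-0 (= 3.45.1-1ubuntu2) but 3.45.1-1ubuntu2.1 is to be installed'

# Capture the package name and the exact version inside "(= ...)",
# then print them in the package=version form that apt-get install accepts
pin="$(printf '%s\n' "$error_line" | sed -n 's/.*Depends: \([^ ]*\) (= \([^)]*\)).*/\1=\2/p')"

echo "$pin"   # prints libsqlite3-0=3.45.1-1ubuntu2
```

The result is the extra line that appears in the apt-get install list of the full Dockerfile.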
The problem is with the sqlite3 package! It depends on libsqlite3-0 (= 3.45.1-1ubuntu2), but apt tries to install 3.45.1-1ubuntu2.1
Search for libsqlite3-0 and check the available versions
Version 3.45.1-1ubuntu2 is available, so we can just add it to the apt-get install command
docker run command
We use the -p flag to map the port 8888 of the container to the port 8888 of the host machine
We use the -v flag to mount a volume in the container, so we can persist the notebooks outside the container
Here, the -v flag is used to mount the current directory ($(pwd)) to the /workspace directory in the container
To stop the container, press Ctrl+C in the terminal where the container is running
Alternatively, use the docker ps command to see the list of running containers and then run the docker stop command with the container ID
Remove the container with the docker rm command and the image with the docker rmi command
The image can be shared with the docker tag and docker push commands
We used the FROM instruction to specify the base image, then used the RUN instruction to install the system packages and the Python libraries
We used the ENV instruction to set the PATH environment variable, the EXPOSE instruction to expose the port for the Jupyter notebook server, and the CMD instruction to start the Jupyter notebook server
We added metadata with the LABEL instructions
We built the image with the docker build command and ran it with the docker run command
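To tie the run and stop steps together, here is a hedged sketch of the container lifecycle as a script. It assumes the image was already built as qtm350-container, and it swaps the interactive -it flags for -d (detached) so the script can capture the container ID. The status variable is only there to report what happened; if Docker or the image is missing, the script does nothing.

```shell
image="qtm350-container"   # image name used in the docker build step
status="skipped"

# Only touch Docker when the CLI is installed and the image has been built
if command -v docker >/dev/null 2>&1 && docker image inspect "$image" >/dev/null 2>&1; then
  # -d starts the container in the background and prints its ID
  if container_id="$(docker run -d --rm -p 8888:8888 -v "$(pwd):/workspace" "$image")"; then
    docker ps --filter "id=$container_id"    # confirm it is running
    docker stop "$container_id" >/dev/null   # --rm removes it once stopped
    status="stopped"
  fi
fi

echo "container lifecycle: $status"
```

Because of --rm, there is no separate docker rm step here; without that flag the stopped container would stay around until removed.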