Lecture 24 - Docker for Data Science
conda, pipenv, virtualenv
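Python's built-in venv module follows the same idea as these tools, and it is what the Dockerfile in this lecture uses (RUN python3 -m venv /opt/venv followed by ENV PATH="/opt/venv/bin:$PATH"). Below is a minimal sketch of those two steps on a local machine, assuming a Linux or macOS shell with python3 installed; the temporary directory is only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: create a throwaway virtual environment and put it first on PATH,
# mirroring RUN python3 -m venv /opt/venv and ENV PATH="/opt/venv/bin:$PATH".
# Assumptions: python3 is installed; the temporary location is illustrative.
set -euo pipefail

# Hypothetical location; the Dockerfile uses /opt/venv instead.
venv_dir="$(mktemp -d)/venv"

# --without-pip skips bootstrapping pip, so creation also works where ensurepip
# is missing (on Debian/Ubuntu it ships in the separate python3.X-venv package).
python3 -m venv --without-pip "$venv_dir"

# Prepend the environment's bin directory, just like the ENV PATH line.
export PATH="$venv_dir/bin:$PATH"

# "python" now resolves to the interpreter inside the virtual environment.
active_python="$(command -v python)"
echo "python resolves to: $active_python"
```

Running python -c 'import sys; print(sys.prefix)' afterwards should print the environment's directory rather than the system Python's.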
bash shell, a git client, a Python interpreter, a Jupyter notebook server, a Quarto document processor, a SQL database, and Jupyter tools
docker run command
Sign in on the Docker Desktop application or on the website; it will open a web browser and ask you to log in
FROM image:tag
Create a file named Dockerfile (without any extension) in your directory
The first instruction is FROM, followed by the base image we want to use
The : character is used to specify the version (tag) of the image. In this case, we are using Ubuntu 24.04
LABEL command
The LABEL instruction adds metadata to an image
You may also see the MAINTAINER instruction used to specify the maintainer of the image; it is deprecated, and LABEL maintainer="..." is the recommended replacement
You can view an image's metadata with the docker inspect command
RUN command
The RUN instruction executes any commands in a new layer on top of the current image and commits the results
Remember apt? We will use it again to install software packages (as we did with AWS)
The syntax is apt-get update (only once) and apt-get install <package>
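For example, to install git, we would run apt-get update && apt-get install git -y (the -y flag answers "yes" to the confirmation prompt automatically)
It is good practice to run apt-get clean and rm -rf /var/lib/apt/lists/* after installing the packages, which deletes the cached package files and keeps the image smaller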
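The Dockerfiles in this lecture pin every system package with apt's package=version syntax, and a tip later in the lecture suggests copying those versions from apt list --installed. Below is a minimal sketch that converts that output into pins, assuming each package line has the form name/suites version arch [status]; the sample lines are hypothetical (the versions are borrowed from the Dockerfile), and apt itself warns that its command-line output is not stable for scripts, so double-check the result by hand.

```shell
#!/usr/bin/env bash
# Sketch: turn `apt list --installed` output into name=version pins for apt-get install.
# Assumption: each package line looks like "name/suites version arch [status]".
set -euo pipefail

# Read apt list output on stdin and print one "name=version" pin per line.
to_pins() {
  # Split fields on "/" and spaces: $1 = name, $2 = suites, $3 = version.
  # Lines without a "/" (such as the "Listing... Done" header) are skipped.
  awk -F'[/ ]' '/\// { print $1 "=" $3 }'
}

# Hypothetical sample; on a real Ubuntu machine you would pipe `apt list --installed` instead.
sample='Listing... Done
git/noble-updates,noble-security,now 1:2.43.0-1ubuntu7.2 arm64 [installed]
wget/noble-updates,now 1.21.4-1ubuntu4.1 arm64 [installed]'

pins="$(printf '%s\n' "$sample" | to_pins)"
printf '%s\n' "$pins"
# prints:
# git=1:2.43.0-1ubuntu7.2
# wget=1.21.4-1ubuntu4.1
```

Each output line can be pasted directly into an apt-get install -y --no-install-recommends command.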
RUN command
# Update and install dependencies
# Versions: https://packages.ubuntu.com/
RUN apt-get update && apt-get install -y --no-install-recommends \
bash=5.2.21-2ubuntu4 \
git=1:2.43.0-1ubuntu7.2 \
sqlite3=3.45.1-1ubuntu2 \
wget=1.21.4-1ubuntu4.1 \
python3.12=3.12.3-1ubuntu0.5 \
python3.12-venv=3.12.3-1ubuntu0.5 \
python3-pip=24.0+dfsg-1ubuntu1.1 && \
apt-get clean && rm -rf /var/lib/apt/lists/*
Package versions are specific to each Ubuntu release (24.04 LTS or noble in this case)
bash
Docker uses sh (/bin/sh -c) as the default shell for RUN commands, but we want to use bash instead
We can set the default shell to bash by adding the following line to the Dockerfile: SHELL ["/bin/bash", "-c"]
The -c option tells bash to run the command passed to it as a string and then exit
This lets us use bash commands in the Dockerfile without having to specify the shell every time
I usually run apt list --installed to see which packages are installed on my system and just copy them to the Dockerfile
... pip installed, so you would only need to install the other packages
We can also add RUN instructions to install a few Python libraries with pip3, such as numpy, pandas, jupyterlab, dask, and matplotlib
To find the version of an installed library, run pip show <package> | grep Version, or run pip freeze > requirements.txt and then copy the versions from the file
ENV PATH="/opt/venv/bin:$PATH" prepends the directory /opt/venv/bin to the beginning of the existing PATH environment variable within the Docker image, so python and pip resolve to the copies inside the virtual environment
wget
We will use wget to download the Quarto binary
We will not install Quarto with pip or apt; instead, we download it from the official website: https://quarto.org/docs/get-started/
wget is a command-line utility that allows you to download files from the web
Once we have the .deb file (which is the package format for Ubuntu), we can install it with apt-get install ./<file>.deb, like we did with the other packages; the ./ prefix tells apt that this is a local file rather than a package name
We can download other files with wget too, as long as we have the URL
JupyterLab runs on port 8888 by default, so we will need to expose this port with the EXPOSE instruction (EXPOSE only documents the port; the -p flag of docker run is what actually publishes it)
We can also open a bash terminal inside the JupyterLab interface and have access to all the tools we installed in the container (like git, sqlite3, and Quarto) 😉
# Base image
FROM ubuntu:24.04
# Metadata
LABEL version="1.0"
LABEL description="Container with the tools covered in QTM 350"
LABEL maintainer="Danilo Freire <danilo.freire@emory.edu>"
LABEL license="MIT"
# Update and install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
bash=5.2.21-2ubuntu4 \
git=1:2.43.0-1ubuntu7.2 \
sqlite3=3.45.1-1ubuntu2 \
libsqlite3-0=3.45.1-1ubuntu2 \
wget=1.21.4-1ubuntu4.1 \
nano=7.2-2ubuntu0.1 \
python3.12=3.12.3-1ubuntu0.5 \
python3.12-venv=3.12.3-1ubuntu0.5 \
python3-pip=24.0+dfsg-1ubuntu1.1 && \
apt-get clean && rm -rf /var/lib/apt/lists/*
# Set default shell to Bash
SHELL ["/bin/bash", "-c"]
# Create and activate virtual environment
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Install Python dependencies in virtual environment
RUN pip install numpy==1.26.4 pandas==2.2.2 \
jupyterlab==4.2.5 ipykernel==6.29.5 \
dask==2024.11.2 matplotlib==3.9.2
# Install Quarto
RUN apt-get update && apt-get install -y --no-install-recommends wget ca-certificates && \
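# Download the specific Quarto deb file (this is the arm64 build; on x86_64/amd64 machines use quarto-1.6.37-linux-amd64.deb here and in the two commands below)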
wget https://github.com/quarto-dev/quarto-cli/releases/download/v1.6.37/quarto-1.6.37-linux-arm64.deb && \
# Install the local deb file (NOTICE the "./" prefix)
apt-get install -y ./quarto-1.6.37-linux-arm64.deb && \
# Clean up the downloaded file and apt cache
rm quarto-1.6.37-linux-arm64.deb && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Create a directory for saving files
RUN mkdir -p /workspace
WORKDIR /workspace
# Expose port for JupyterLab
EXPOSE 8888
# Start JupyterLab
CMD ["sh", "-c", ". /opt/venv/bin/activate && jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root"]
# Run the Docker container
# docker build -t qtm350-container .
# docker run -it --rm -p 8888:8888 -v "$(pwd)":/workspace qtm350-container
docker build command
The -t flag is used to tag the image with a name, in this case qtm350-container
The . at the end of the command specifies the build context, which is the current directory
docker run command
[...]
4.585 Some packages could not be installed. This may mean that you have
4.585 requested an impossible situation or if you are using the unstable
4.585 distribution that some required packages have not yet been created
4.585 or been moved out of Incoming.
4.585 The following information may help to resolve the situation:
4.585
4.585 The following packages have unmet dependencies:
4.650 sqlite3 : Depends: libsqlite3-0 (= 3.45.1-1ubuntu2) but 3.45.1-1ubuntu2.1 is to be installed
4.651 E: Unable to correct problems, you have held broken packages.
The problem is the sqlite3 package!
sqlite3 depends on an exact version of libsqlite3-0 (3.45.1-1ubuntu2), but apt wants to install a newer one (3.45.1-1ubuntu2.1)
We can search for libsqlite3-0 and check the available versions
Version 3.45.1-1ubuntu2 is available, so we can just add it to the apt-get install command (libsqlite3-0=3.45.1-1ubuntu2)
docker run command
We use the -p flag to map port 8888 of the container to port 8888 of the host machine
We use the -v flag to mount a volume in the container, so we can persist the notebooks outside the container
Here, the -v flag mounts the current directory ($(pwd)) to the /workspace directory in the container
To stop the container, press Ctrl+C in the terminal where the container is running
Alternatively, use the docker ps command to see the list of running containers and then run the docker stop command with the container ID
You can remove the container with the docker rm command (not needed here, because --rm removes it automatically when it stops) and the image with the docker rmi command
To share the image on a registry such as Docker Hub, use the docker tag and docker push commands
We used the FROM instruction to specify the base image, then used the RUN instruction to install the system packages and the Python libraries
We also used the ENV instruction to set the PATH environment variable, the EXPOSE instruction to expose the port for the Jupyter notebook server, and the CMD instruction to start the Jupyter notebook server
We added metadata to the image with LABEL instructions
Finally, we built the image with the docker build command and ran it with the docker run command
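The docker run flags described above (-it, --rm, -p, -v) can be kept in one place as a bash array, which also keeps $(pwd) quoted so directory paths containing spaces survive. Below is a minimal sketch, assuming the image was built with docker build -t qtm350-container .; by default it only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: assemble this lecture's docker run command as a bash array.
# Assumptions: Docker is installed and the image is tagged qtm350-container;
# the script only prints the command by default.
set -euo pipefail

image="qtm350-container"

run_cmd=(
  docker run
  -it                      # keep STDIN open (-i) and allocate a pseudo-terminal (-t)
  --rm                     # delete the container automatically when it stops
  -p 8888:8888             # host port 8888 -> container port 8888 (JupyterLab)
  -v "$(pwd):/workspace"   # mount the current directory; quoted so spaces are safe
  "$image"
)

# Print a shell-quoted version of the command for inspection.
printf '%q ' "${run_cmd[@]}"
echo
```

To actually start the container, replace the printf and echo lines with "${run_cmd[@]}".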