Lecture 08 - Practice
Now you will practise setting up a new data science project.
You should execute these commands in your terminal, and you are encouraged to document the main commands in a README.md file as you go.
Estimated Time: 40-50 minutes
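Note: if you are working in a Jupyter notebook, you can prefix a command with ! to run it as a shell command, and it should work the same way. The exception is cd: each ! command runs in its own subshell, so use the %cd magic instead to change the notebook's working directory.
Create the project directory:
mkdir my_ds_project_revision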
Throughout this revision, please execute the commands in your terminal. You should create a project_log.md file in your project’s root directory and add the commands you use as you go.
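A minimal sketch of these first steps, assuming bash or another POSIX shell. The heading and the one-command-per-bullet layout in project_log.md are only an example of a Markdown log, not a required format.

```shell
# Create the project folder (-p: no error if it already exists) and move into it
mkdir -p my_ds_project_revision
cd my_ds_project_revision
pwd   # the path should end in /my_ds_project_revision

# Start the log in the project root.
# ">" creates (or overwrites!) the file; every later line uses ">>" to append.
echo "# Project log" > project_log.md
echo "" >> project_log.md
echo "## Setup" >> project_log.md
echo '- `mkdir my_ds_project_revision`' >> project_log.md
echo '- `cd my_ds_project_revision`' >> project_log.md

cat project_log.md
```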
Navigate into my_ds_project_revision:
cd my_ds_project_revision
(Use pwd to confirm you’re in the right place.)
Go to GitHub and create a new, empty public repository.
Name it my-ds-project-revision (or similar).
Important: Do not initialise it with a README, .gitignore, or license yet.
Once created, copy the HTTPS or SSH URL for your new repository.
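Once you have the URL, the terminal steps that follow could look like this sketch, run from inside my_ds_project_revision. It assumes the remote is called origin (the usual convention) and keeps YOUR_GITHUB_REPO_URL as a placeholder. Because Git cannot push a branch that has no commits, it first commits a minimal README.md; the "Your Name" / "you@example.com" identity is a placeholder that is only set if Git has none configured. The push is commented out because it needs network access and GitHub credentials.

```shell
# Run from the project root (my_ds_project_revision)
git init

# A branch needs at least one commit before it can be pushed.
# Create a minimal README only if you don't already have one.
[ -f README.md ] || echo "# my-ds-project-revision" > README.md
git add README.md

# A commit needs an author; set placeholder values for this repo only if none exist
git config user.name  >/dev/null || git config user.name  "Your Name"
git config user.email >/dev/null || git config user.email "you@example.com"
git commit -m "Add README"

# Rename the current branch to main (-M forces the rename)
git branch -M main

# Connect to GitHub; replace the placeholder with the URL you copied
git remote add origin YOUR_GITHUB_REPO_URL
git remote -v

# Upload main and set it as the upstream branch (needs network + credentials)
# git push -u origin main
```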
Back in your my_ds_project_revision directory in the terminal:
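Initialise a local Git repository and connect it to your new GitHub repository (replace YOUR_GITHUB_REPO_URL with the URL you copied).
Rename your local branch to main (if it’s not already named that, e.g., if it’s master).
Push the main branch to GitHub.
(Self-check: refresh your repository page on GitHub; you should see your README.md file.)
Create a .gitignore file and add the following to it (you can use echo ... >> .gitignore for each line or open it in a text editor):
echo "# Python" >> .gitignore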
echo "__pycache__/" >> .gitignore
echo "*.pyc" >> .gitignore
echo "*.pyo" >> .gitignore
echo "*.pyd" >> .gitignore
echo "" >> .gitignore
echo "# Jupyter Notebook" >> .gitignore
echo ".ipynb_checkpoints/" >> .gitignore
echo "" >> .gitignore
echo "# Data files" >> .gitignore
echo "data/raw/*" >> .gitignore
echo "data/processed/*" >> .gitignore
echo "!data/raw/placeholder.txt" >> .gitignore
echo "" >> .gitignore
echo "# Results" >> .gitignore
echo "results/*" >> .gitignore
echo "" >> .gitignore
echo "# Environment" >> .gitignore
echo ".env" >> .gitignore
echo "venv/" >> .gitignore
echo "env/" >> .gitignore
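To confirm that the rules behave as intended, in particular that the ! line re-includes data/raw/placeholder.txt, you can ask Git directly with git check-ignore, which exits with status 0 for an ignored path and 1 for a path that is not ignored. This sketch runs in a throwaway repository made with mktemp -d, writes a subset of the rules above with a heredoc (one alternative to repeated echo lines), and tests a few example paths whose file names are made up.

```shell
# Throwaway repository so nothing in your real project is touched
demo=$(mktemp -d)
cd "$demo"
git init -q

# A subset of the rules above, written in one go with a quoted heredoc
cat > .gitignore <<'EOF'
*.pyc
.ipynb_checkpoints/
data/raw/*
!data/raw/placeholder.txt
results/*
.env
EOF

# git check-ignore -q prints nothing; its exit status says whether the path is ignored
for path in data/raw/survey.csv data/raw/placeholder.txt results/model.pkl .env scripts/clean.py; do
    if git check-ignore -q "$path"; then
        echo "ignored:     $path"
    else
        echo "not ignored: $path"
    fi
done
```

In your own project, run the loop (or single git check-ignore -q calls) from the project root. Using -v instead of -q prints the .gitignore line that matched each path; for a rule starting with !, that match means the path is re-included.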
(Check the result using cat .gitignore.)
Create a new branch named feature/add-initial-script-logic and switch to it. (Or in one command: git checkout -b feature/add-initial-script-logic)
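A sketch of the two-command form, run in a throwaway repository so it can be tried safely; in your project you would run only the two branch commands. The setup lines exist because a new branch must point at an existing commit, and the identity values are placeholders. On Git 2.23 or later, git switch -c feature/add-initial-script-logic is another single-command option.

```shell
# Throwaway repo with one commit (a branch needs a commit to point at)
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name  >/dev/null || git config user.name  "Your Name"
git config user.email >/dev/null || git config user.email "you@example.com"
echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit"

# Two-step form: create the branch, then switch to it
git branch feature/add-initial-script-logic
git checkout feature/add-initial-script-logic

# Show the branch HEAD is now on
git rev-parse --abbrev-ref HEAD
```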
Write the following lines to scripts/01_data_preprocessing.py (the first echo uses > to create or overwrite the file; the rest use >> to append):
mkdir -p scripts  # make sure the folder exists
echo "# scripts/01_data_preprocessing.py" > scripts/01_data_preprocessing.py
echo "import pandas as pd" >> scripts/01_data_preprocessing.py
echo "" >> scripts/01_data_preprocessing.py
echo "def load_data(filepath):" >> scripts/01_data_preprocessing.py
echo "    print(f\"Loading data from {filepath}...\")" >> scripts/01_data_preprocessing.py
echo "    # df = pd.read_csv(filepath)" >> scripts/01_data_preprocessing.py
echo "    # print(\"Data loaded successfully.\")" >> scripts/01_data_preprocessing.py
echo "    # return df" >> scripts/01_data_preprocessing.py
echo "" >> scripts/01_data_preprocessing.py
echo "print(\"Data preprocessing script initialized.\")" >> scripts/01_data_preprocessing.py
cat scripts/01_data_preprocessing.py
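Escaping quotes on every echo line is error-prone. As an alternative, this minimal sketch (assuming bash or another POSIX shell, and python3 on your PATH) writes the same file with a quoted heredoc and then checks that it compiles. Note that it overwrites the file. py_compile only parses and byte-compiles the script, so pandas does not need to be installed for the check; it does create a scripts/__pycache__/ folder, which the __pycache__/ rule in .gitignore already excludes.

```shell
mkdir -p scripts

# Quoting 'EOF' stops the shell from expanding anything, so no \" escaping is needed
cat > scripts/01_data_preprocessing.py <<'EOF'
# scripts/01_data_preprocessing.py
import pandas as pd

def load_data(filepath):
    print(f"Loading data from {filepath}...")
    # df = pd.read_csv(filepath)
    # print("Data loaded successfully.")
    # return df

print("Data preprocessing script initialized.")
EOF

# Compile only: catches syntax and indentation errors without importing pandas
python3 -m py_compile scripts/01_data_preprocessing.py && echo "syntax OK"
```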
(Self-check: Verify on GitHub.)
Create project_log.md. It’s recommended you do this as you go.
Add all the shell and Git commands you used to this file using Markdown.
Stage, commit, and push project_log.md.
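A sketch of this final step in a throwaway repository. The log content is only an example of Markdown formatting (a heading plus one bullet per command); yours should list the commands you actually ran. The identity values are placeholders, and the push is left commented out because it needs a configured remote and credentials.

```shell
# Throwaway repo standing in for your project
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name  >/dev/null || git config user.name  "Your Name"
git config user.email >/dev/null || git config user.email "you@example.com"

# Example log content only
cat > project_log.md <<'EOF'
# Project log

## Repository setup
- `mkdir my_ds_project_revision`
- `git checkout -b feature/add-initial-script-logic`
EOF

# Stage and commit the log
git add project_log.md
git commit -q -m "Add project log"
git log -1 --name-only --format=%s   # commit message, then the file it touched

# git push   # pushes the current branch; needs a remote and an upstream (set with git push -u)
```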