QTM 350 - Data Science Computing

Lecture 06 - More Git and GitHub

Danilo Freire

Emory University

18 September, 2024

Great to see you all again! 😊

Recap and lecture overview 📚

Recap of our last lecture

In our last class, we covered

  • How to think in terms of projects
  • File and folder organisation tips
  • Git installation (brew install git)
  • Git configuration
    • git config --global user.name "Your Name"
    • git config --global user.email your@email.com
  • Creating and initialising repositories (git init)
  • How to add and commit changes
  • Viewing commit history with git log
  • How to use git checkout to go back in time
  • Checking repository status with git status
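As a quick refresher, the steps above can be sketched in one short session. This is only an illustration: the temporary folder, the file name notes.txt, and the commit message are made up, and the identity is set per repository (without --global) so the sketch does not touch your real configuration.

```shell
set -eu
work="$(mktemp -d)"                 # throwaway folder, so nothing real is touched
cd "$work"

git init -q my-project              # create and initialise the repository
cd my-project
git config user.name "Your Name"    # per-repository version of the --global setup
git config user.email "your@email.com"

echo "Hello, Git" > notes.txt       # hypothetical file
git add notes.txt                   # stage the change
git commit -q -m "Add notes"        # record it

git log --oneline                   # view the commit history
git status --short                  # no output means nothing is left to commit
```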

Lecture overview

Today we will cover

  • How to push and pull changes to and from remote repositories
  • How to use .gitignore to avoid tracking certain files
  • Creating and managing branches
  • How to clone and fork repositories on GitHub
  • Syncing forked repositories on GitHub
  • How to create issues and pull requests on GitHub
  • Other cool GitHub features, such as Gists, Pages, Actions, and CLI
  • If time allows, we will also see what GitHub Copilot does

Pushing changes to a remote repository 🚀

Pushing changes to a remote repository

  • Last class, we learned how to add and commit changes
  • But what if we want to share our changes with others? 🤔
  • There are two main ways to do this:
    • push changes to a remote repository (upload)
    • pull changes from a remote repository (download)
  • To push changes to a remote repository, use the command git push
  • And to pull changes, use the command git pull

Pushing changes to a remote repository

  • Let’s go back to our my-project local repository
  • Check the last commit with git log
  • Now, let’s push our changes to a remote repository on GitHub
  • First, we need to create a repository on GitHub
  • Go to GitHub and click on the + sign in the top right corner
  • Select New repository
  • Name your repository my-project
  • Click on Create repository

Now, let’s push our changes to GitHub

  • Why do we need to add a remote repository?
  • Because we need to tell Git where to push our changes!
  • git remote add origin https://github.com/danilofreire/my-project.git links the local repository to the one on GitHub:
    • remote: tells Git we are adding a remote repository
    • add: adds a new remote repository
    • origin: the name of the remote repository
    • https://github.com/danilofreire/my-project.git is the URL of the remote repository
  • git branch -M master: renames the branch to master
  • git push -u origin master: pushes the changes to the remote repository

  • You may notice that my branch was called main, not master
  • There is a movement to change the default branch name from master to main. More here.
  • You can change the default branch name in your repository settings, but here I will stick to the default 😉
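If you want to try the three commands without touching GitHub, here is a minimal sketch. It assumes a bare repository on your own disk (remote.git, a made-up name) as a stand-in for the GitHub repository; with a real GitHub repository you would use the https://... URL instead.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

# Assumption: a local bare repository plays the role of GitHub
git init -q --bare remote.git

git init -q my-project
cd my-project
git config user.name "Your Name"
git config user.email "your@email.com"
echo "Hello" > README.md
git add README.md
git commit -q -m "First commit"

git remote add origin "$work/remote.git"   # tell Git where to push
git branch -M master                       # rename the current branch to master
git push -q -u origin master               # upload and remember origin/master

git remote -v                              # lists origin for fetch and push
git status -sb                             # first line shows master...origin/master
```

The -u flag is what lets you type a plain git push next time: it records origin/master as the upstream of your local master branch.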

And voilà! 🎉

How to pull changes from a remote repository

  • To push further changes to the remote repository, you just repeat the process:
    • git add file-name
    • git commit -m "Message"
    • git push (no need to add the remote repository again)
  • If you work by yourself on your computer only, you already know everything you need to know about Git and you can go home now 😂
  • Just kidding! There is still a lot to learn!
  • If you work with others, or if you use different computers, you need to know how to pull changes from GitHub
  • To pull changes from a remote repository, use (you’ve guessed it) the command git pull
  • By default, git pull is the equivalent of git fetch followed by git merge
    • git fetch downloads the changes from the remote repository
    • git merge merges the changes with your local repository
  • Let’s see an example where I add a commit directly on GitHub
  • I will add a .gitignore file to ignore certain files and directories
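To see the two halves of git pull separately, here is a small sketch. Everything in it is hypothetical: a local bare repository (remote.git) stands in for GitHub, and the folders mine and coauthor stand in for two computers.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"
identity() { git config user.name "Your Name"; git config user.email "your@email.com"; }

# Build a tiny project and a local stand-in for GitHub
git init -q seed && cd seed && identity
echo "v1" > notes.txt
git add notes.txt && git commit -q -m "First commit"
git branch -M master
cd "$work"
git clone -q --bare seed remote.git
git clone -q remote.git mine
git clone -q remote.git coauthor

# The coauthor pushes a change
cd "$work/coauthor" && identity
echo "v2" >> notes.txt
git commit -q -am "Coauthor's change"
git push -q origin master

# Step 1: fetch downloads the change but leaves my files alone
cd "$work/mine"
git fetch -q origin
git log --oneline master..origin/master   # what arrived but is not merged yet

# Step 2: merge brings the change into my branch
git merge -q origin/master
cat notes.txt
```

Running git pull in the last part would do both steps at once.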

Adding .gitignore to our repository

  • I will go to my repository on GitHub and click on Add file > Create new file

  • After adding the file, I will commit the changes

  • About .gitignore:

    • It is a file that tells Git which files to ignore
    • You can use wildcards, such as *.csv, to ignore all CSV files
    • You can also ignore entire directories, such as data/
    • You can also ignore files by name, such as secrets.txt
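A minimal sketch of these three kinds of rules, using made-up file names (results.csv, data/raw.txt, secrets.txt, analysis.py) in a throwaway repository:

```shell
set -eu
work="$(mktemp -d)"
cd "$work"
git init -q my-project && cd my-project

# Hypothetical project files
mkdir data
touch results.csv data/raw.txt secrets.txt analysis.py

# One rule per line: a wildcard, a directory, and a file name
printf '%s\n' '*.csv' 'data/' 'secrets.txt' > .gitignore

git status --short     # only .gitignore and analysis.py show up as untracked

# check-ignore reports which rule matches each ignored path
git check-ignore -v results.csv data/raw.txt secrets.txt
```

Note that .gitignore only affects untracked files: a file that was already committed keeps being tracked until you remove it from the repository.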

Pulling changes

  • Now, let’s pull the changes to my local computer
  • Just type git pull and the changes will be downloaded and merged
  • If you work with others and they push changes to GitHub, you will need to pull the changes to your local computer too

But wait a minute! 🤔

  • What if both my coauthors and I change the same line of code?
  • If you change different files, or different lines in the same file, Git will merge the changes automatically 👍
  • But if you change the same line of code, Git will not know what to do and will ask you to resolve the conflict
  • You can use a tool like git mergetool to help you resolve the conflict
  • Or you can open the file in a text editor and resolve the conflict manually
  • Let me change the .gitignore file both on my computer and on GitHub, and commit the changes (do not push yet)

Changing the same file on GitHub

  • I’ve changed the same line of code on GitHub

  • Let’s see what happens if I try to pull the changes to my local computer
  • Oh no! 😱

Resolving conflicts

  • To resolve the conflict, you need to open the file in a text editor

  • You will see the changes from both you and your coauthor (or yourself)

  • You can choose which changes to keep, or you can keep both changes

  • After resolving the conflict, you need to add the file and commit the changes

  • Then you can push the changes to GitHub

  • The line that starts with <<<<<<< marks the beginning of your changes

  • ======= marks the end of your changes and the start of your coauthor’s changes

  • The line that starts with >>>>>>> marks the end of your coauthor’s changes and shows the commit hash

  • I will delete some lines and keep the changes from my computer
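The whole conflict can be reproduced locally with the sketch below. It rests on a few assumptions: a local bare repository (remote.git) stands in for GitHub, the folder coauthor stands in for the edit made on the website, and the ignore rules are invented. I also pass --no-rebase to git pull, because recent Git versions may ask you to choose between merging and rebasing when both sides have new commits.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"
identity() { git config user.name "Your Name"; git config user.email "your@email.com"; }

git init -q seed && cd seed && identity
echo "*.csv" > .gitignore
git add .gitignore && git commit -q -m "Add .gitignore"
git branch -M master
cd "$work"
git clone -q --bare seed remote.git     # local stand-in for GitHub
git clone -q remote.git mine
git clone -q remote.git coauthor

# The coauthor changes the line and pushes
cd "$work/coauthor" && identity
echo "*.txt" > .gitignore
git commit -q -am "Ignore txt files"
git push -q origin master

# I change the same line and commit (no push yet)
cd "$work/mine" && identity
echo "*.log" > .gitignore
git commit -q -am "Ignore log files"

# Pulling now fails with a conflict, so we catch the non-zero exit status
git pull --no-rebase origin master || echo "Conflict: fix the file by hand"
cat .gitignore                          # shows the <<<<<<<, =======, >>>>>>> markers

# Resolve by keeping my version, then add, commit, and push
echo "*.log" > .gitignore
git add .gitignore
git commit -q -m "Resolve conflict in .gitignore"
git push -q origin master
```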

Resolving conflicts

  • Now the conflict is resolved! 🥳

  • There are other ways to resolve conflicts, such as using git mergetool
  • You can also use a graphical tool, such as Sourcetree or GitHub Desktop
  • But it is good to know how to resolve conflicts manually 😊

Branches 🌿

Branches

  • Branches are a way to work on different features or versions of your project
  • The master (or main) branch is the default branch in Git on your computer and on GitHub
  • However, you can create new branches to work on new features or versions without affecting the master branch
  • You can create a new branch with the command git branch branch-name
  • And you can switch to the new branch with the command git checkout branch-name
    • Or you can use git switch branch-name if you have Git 2.23 or later
  • You can also create a new branch and switch to it with the command git checkout -b branch-name

  • Branches may or may not be merged back to the master branch
  • You can also delete branches after merging them

Branches

  • Let’s create a new branch called feature-1
    • git checkout -b feature-1
  • I will add a new file called feature-1.txt to the new branch
    • echo "This is feature 1" > feature-1.txt
  • Then, I will add, commit, and push the changes to GitHub
    • git add feature-1.txt && git commit -m "Add feature 1"
    • git push origin feature-1 (you need to push the new branch to GitHub)

Branches

  • Now the branch is already on GitHub

  • Let’s merge the changes to the master branch

    • git checkout master
    • git merge feature-1
    • git push
  • Done! 🎉 Let’s see what happened

  • You can delete the branch with git branch -d feature-1

  • To delete the branch on GitHub, use git push origin --delete feature-1
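Here is the full life cycle of the feature-1 branch in one sketch. As an assumption, a local bare repository (remote.git) stands in for GitHub so that the push commands work without a network connection.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"
git init -q --bare remote.git              # local stand-in for GitHub

git init -q my-project && cd my-project
git config user.name "Your Name"
git config user.email "your@email.com"
echo "Hello" > README.md
git add README.md && git commit -q -m "First commit"
git branch -M master
git remote add origin "$work/remote.git"
git push -q -u origin master

# Create the branch, work on it, and push it
git checkout -q -b feature-1
echo "This is feature 1" > feature-1.txt
git add feature-1.txt && git commit -q -m "Add feature 1"
git push -q origin feature-1

# Merge it back into master and push
git checkout -q master
git merge -q feature-1
git push -q

# Clean up: delete the branch locally and on the remote
git branch -d feature-1
git push -q origin --delete feature-1
git branch -a                              # only master and origin/master remain
```

If Git refuses git branch -d, the branch has changes that were not merged yet, which is a useful safety net.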

More about branches here.

Fork and clone 🍴

Cloning

  • Cloning is the process of downloading a repository from GitHub to your computer
  • You can clone your own repositories or other people’s repositories
  • To clone a repository, use the command git clone https://github.com/username/repository-name.git
    • For instance, git clone https://github.com/danilofreire/my-project.git
  • Cloning a repository will create a new folder with the repository name
  • Your copy will be directly connected to the original repository
  • This means that, if the repository is not yours and you have no push rights, you cannot push changes to it
  • Cloning is mostly used when you have push rights and want to work on a project locally
  • Cloning the my-project repository to a different folder
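A small sketch of cloning into a different folder. To keep it runnable offline, it clones from a repository on the local disk rather than from GitHub; the folder name my-project-copy is made up.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

# A tiny repository to clone from (stand-in for the GitHub URL)
git init -q my-project && cd my-project
git config user.name "Your Name"
git config user.email "your@email.com"
echo "Hello" > README.md
git add README.md && git commit -q -m "First commit"
cd "$work"

# The optional second argument chooses the name of the new folder
git clone -q my-project my-project-copy
cd my-project-copy

git remote -v        # origin points back to the repository we cloned from
git log --oneline    # the full history came along with the files
```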

Forking

  • Forking is the process of creating your copy of a repository on GitHub
  • You can fork your own repositories or other people’s repositories
  • Different from cloning, forking will create a new repository on your GitHub account
  • This is very useful, as you can push changes to your forked repository
  • The fork maintains a connection to the original repository, often called the “upstream” repository
  • Forks can be used to create entirely new projects based on the original codebase
  • You can also create a pull request to the original repository to suggest changes

Forking

  • Forking from GitHub is easy, but it requires a few steps
  • Forking is not part of Git, but it is part of GitHub
    • That’s why you cannot do it with Git commands alone (the GitHub CLI, covered later in this lecture, can fork repositories)
  • To fork a repository, go to the repository on GitHub and click on the Fork button in the top right corner
  • Then, you can clone your forked repository to your computer as you did before

Forking

  • After forking, you will see the repository on your GitHub account
  • Clone it with git clone https://github.com/your-username/repository-name.git

Syncing a forked repository

  • After forking a repository, you may want to keep your forked repository up to date with the original repository
  • This is pretty easy too
  • Just go on GitHub and click on the “Sync fork” button and confirm the merge
    • Save your changes before doing this!
  • This is surely the easiest way, but you can do it from the command line too
    • Add the original repository as a remote repository
      • git remote add upstream https://github.com/original-username/original-repository.git
      • git fetch upstream
      • git merge upstream/main (or upstream/master)
      • git push
      • Done! 🎉
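The command-line route can be rehearsed locally with the sketch below. All repositories are stand-ins on the local disk: original.git plays the original repository, fork.git plays your fork on GitHub, and maintainer is a hypothetical clone used to make the original project move ahead.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"
identity() { git config user.name "Your Name"; git config user.email "your@email.com"; }

git init -q seed && cd seed && identity
echo "v1" > README.md
git add README.md && git commit -q -m "First commit"
git branch -M master
cd "$work"
git clone -q --bare seed original.git       # the original repository
git clone -q --bare original.git fork.git   # stands in for clicking "Fork"
git clone -q fork.git my-fork               # local clone of the fork

# The original project receives a new commit
git clone -q original.git maintainer
cd maintainer && identity
echo "v2" >> README.md
git commit -q -am "Upstream change"
git push -q origin master

# Sync the fork: add upstream, fetch, merge, push
cd "$work/my-fork" && identity
git remote add upstream "$work/original.git"
git fetch -q upstream
git merge -q upstream/master
git push -q
```

You only need to add the upstream remote once; later syncs are just fetch, merge, and push.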

  • More about syncing a forked repository here.

Issues and pull requests 📥

Issues and pull requests

  • Issues are a way to track bugs, enhancements, or other tasks in a repository
  • You can create an issue by clicking on the Issues tab on GitHub and then on the New issue button
  • Pull requests are a way to suggest changes after you have forked and changed a repository
  • You can create a pull request by clicking on the Pull requests tab on GitHub and then on the New pull request button
  • The repository owner can then review your changes and merge them
  • And that’s how open-source projects work! 🤓🎉

Issues

Pull requests

Other cool GitHub features 🌟🐙

GitHub has many other interesting features

Gists

  • Gists are a great way to share code snippets, notes, or other text
  • You can create a Gist by clicking on the + sign in the top right corner of GitHub and then on New gist
  • Paste any text or code and save it
  • You can also create secret Gists that are not indexed by search engines
  • Gists also benefit from version control, so you can see the history of changes and revert to previous versions

GitHub Pages

  • This is one of the nicest features of GitHub
  • GitHub Pages allows you to create a website for your project, portfolio, or blog
  • As you can see, I use GitHub Pages for this course and for my personal website
  • You can create a website by creating a new repository with the name your-username.github.io
  • Then, you can upload your .html, .css, and .js files to the repository
  • Your website will be available at https://your-username.github.io
  • You can also use custom domains and Jekyll themes, which are pre-designed templates
  • Available themes here

GitHub Actions

  • GitHub Actions is a way to automate your workflow
  • Workflows run on GitHub’s servers in response to events, such as pushing changes to your repository
  • You can use them to build, test, package, release, and deploy your code
  • They are a relatively new feature of GitHub and are very powerful
  • Basically, you can create a .yml file in the .github/workflows/ folder of your repository with the steps you want to run
  • Then, every time you push a new commit, GitHub will run the steps in the .yml file
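To give you an idea of what such a file looks like, the sketch below writes a very small workflow from the terminal. The workflow name (Hello), the job name (greet), and the file name hello.yml are made up; the keys on, jobs, runs-on, steps, uses, and run are part of the standard workflow syntax. Nothing runs on your computer: GitHub only picks the file up after you commit and push it.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"
git init -q my-project && cd my-project

# Workflows live in .github/workflows/ inside the repository
mkdir -p .github/workflows
cat > .github/workflows/hello.yml <<'EOF'
name: Hello
on: push                      # run on every push
jobs:
  greet:
    runs-on: ubuntu-latest    # a fresh Linux machine hosted by GitHub
    steps:
      - uses: actions/checkout@v4                  # download the repository
      - run: echo "Hello from GitHub Actions"      # any shell command
EOF

git add .github/workflows/hello.yml
git status --short            # the workflow file is staged and ready to commit
```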

GitHub in the command line

  • GitHub CLI is a command-line tool that brings GitHub to your terminal
  • You can install it with brew install gh
  • With GitHub CLI, you can create repositories, issues, pull requests, and more
  • I use it both to create repositories and clone/fork them
    • gh repo create my-project
    • gh repo clone danilofreire/my-project
  • Have a look at the GitHub CLI manual for more information

GitHub Copilot 🤖

  • Okay, I have the feeling that you are already tired of me talking about GitHub 😂
  • But I’ve left the coolest feature for last: GitHub Copilot
  • This is really cool! GitHub Copilot is an AI pair programmer that helps you write code
  • Think about ChatGPT but for code
  • It was originally based on OpenAI’s Codex model and is available as an extension for Visual Studio Code and other editors
  • You can use it to write entire functions, classes, or even entire programs
  • Like ChatGPT, it is not perfect, but it is really good and can save you a lot of time

Phew! That’s all for today! 🎉

Thank you for your attention and see you next week! 😊🙏