class: inverse, center, title-slide, middle <style> .title-slide .remark-slide-number { display: none; } </style> # .title-wrap[Intro to Programming with R for Political Scientists] <br /> ## .header-fancy[Session 1: RStudio and Version Control] ### Markus Freitag ### Geschwister Scholl Institute of Political Science, LMU ### [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#415564;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg>](https://twitter.com/MarkusGFreitag) [<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#415564;" xmlns="http://www.w3.org/2000/svg"> <path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"></path></svg>](https://markusfreitag.netlify.app/) ### July 12, 2021 <a href="https://github.com/m-freitag" class="github-corner" aria-label="View source on Github"><svg width="80" height="80" viewBox="0 0 250 250" style="fill:#415564; color:#f6f3f2; position: absolute; top: 0; border: 0; right: 0;" aria-hidden="true"><path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path><path d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2" fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path><path d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z" fill="currentColor" class="octo-body"></path></svg></a><style>.github-corner:hover .octo-arm{animation:octocat-wave 560ms ease-in-out}@keyframes octocat-wave{0%,100%{transform:rotate(0)}20%,60%{transform:rotate(-25deg)}40%,80%{transform:rotate(10deg)}}@media (max-width:500px){.github-corner:hover .octo-arm{animation:none}.github-corner .octo-arm{animation:octocat-wave 560ms ease-in-out}}</style> --- class: inverse, center, middle name: intro # Intro --- # Hi! .hl2[Why this course?] - Intro to R with a focus on programming (↔ standard pol sci quant training). - A gentle introduction, but with some detail on how R works as a programming language (perhaps a bit different to other intro courses). - Set some good programming/workflow habits to make your life easier later on. -- > *Provide you with some basic practical skills necessary to get serious about quantitative research.* - There are tons of awesome & free materials from the best R Gurus of the world out there. - Take this course as a humble starting point to see through the thicket. .font70[.hl[Note:] Text will be highlighted in .hl[yellow] or .hl2[turquoise]. Links will appear [purple]().] --- # Hi! .hl[Who am I?] - <strike>Research Fellow (pre-doc)</strike>. Soon to be Junior Data Scientist in pharma. I like stats, programming, and causal inference. I also like cooking and (board) gaming. - Web: [<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#415564;" xmlns="http://www.w3.org/2000/svg"> <path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"></path></svg>](https://markusfreitag.netlify.app/) .hl2[Who are you?] Let's do a break-out icebreaker session. This will also be your team for the problem sets. --- # What are we going to cover? 1. .hl[Intro + R-Studio, and Git(Hub)] 2. Base R & Tidyverse Basics 3. Data Wrangling I 4. Data Wrangling II 5. Data Viz 6. Writing Functions --- # Why R? At the present day (and surely years to come), R is arguably the best programming language for academics: - R is from statisticians for statisticians. - Most active (academic) development community for .hl[statistical] computing/programming. - Excellent IDEs; Good integration of other languages and workflow components. - Best for data viz. --- # Why R? ![](data:image/png;base64,#Figs/whyr.png) --- # R versus other langs/software <img src="data:image/png;base64,#Figs/benchmarks.svg" width="73%" style="display: block; margin: auto;" /> <center> Source: [Julia Micro-Benchmarks](https://julialang.org/benchmarks/). <center/> --- # Showcase U can make pretty graphs... <img src="data:image/png;base64,#01_Intro_Git_files/figure-html/unnamed-chunk-2-1.png" width="70%" style="display: block; margin: auto;" /> <center> Data from [Haimueller/Bechtel 2011](https://web.stanford.edu/~jhain/Paper/AJPS2011.pdf). <center/> --- # Showcase U can make pretty [graphs](https://www.economist.com/graphic-detail/2021/04/13/in-america-republican-led-states-are-rolling-back-electoral-and-civil-liberties)... <div align="center"> <img src="https://www.economist.com/img/b/1000/590/90/sites/default/files/images/2021/04/articles/main/20210417_woc511.png" height=450> </div> --- # Showcase Or [maps](https://timogrossenbacher.ch/2019/04/bivariate-maps-with-ggplot2-and-sf/)... <img src="data:image/png;base64,#Figs/map_ineq.png" width="60%" style="display: block; margin: auto;" /> --- # Showcase Or interactive graphs... .pull-left[
] .pull-right[ <div align="center"> <img src="https://raw.githubusercontent.com/m-freitag/cjpowR/master/cjpowR_hex.png" height=200"> </div> If you are interested in [power for (conjoint/factorial) survey experiments](https://github.com/m-freitag/cjpowR)... ] --- # Showcase Or or interactive maps... <center>
<center/> .font60[Map shows [points of interest](https://www.opengov-muenchen.de/dataset/points-of-interest-an-der-sudlichen-isar) at the southern part of the Isar in Munich (GSI in orange). Let summer come!] --- # Showcase - You can easily combine R code and text in so-called Rmarkdown files to produce reproducible documents in various output formats (.html, .pdf, etc.). - This presentation is a (hopefully good) example. You can check out the source code [here](https://github.com/m-freitag/intro-r-polsci). Feel free to take and adapt. - We will learn a little more about this in just a few slides (and a little more next week)! --- class: inverse, center, middle name: intro # R-Studio & Git(Hub) --- # Installation Steps you should have done already: 1. Installed [R](https://www.r-project.org/). 2. Installed [R-Studio](https://www.rstudio.com/products/rstudio/download/#download). 3. Created a [GitHub](https://github.com/) account. Take some care with the username if you want to keep this account throughout your career. You can't change it afterwards. 4. Installed and set up [Git](https://git-scm.com/downloads) (and optionally a desktop client). --- # R-Studio R-Studio is an .hl2[IDE] (integrated development environment) for the R language: - comes with a console, code editor, tools for plotting, history, debugging, and workspace - open source - pretty accessible/easy to use Alternatives: - Visual Studio Code (nice, if you work with multiple langs; my fav IDE) - Alternatives are a bit more clunky for package development, shiny platforms, and some other R-specific stuff - If your main language is R, you can't go wrong with RStudio --- # R-Studio Let's take a tour... <img src="data:image/png;base64,#Figs/rst.png" width="75%" style="display: block; margin: auto;" /> --- # A Small Detour: Two Types of "Scripts" In this course, we will use two types of scripts to write our code: 1. Classic R-Scripts (`.R`): simple text file; comments are usually done like so: `# A comment.` 2. [Rmarkdown](https://rmarkdown.rstudio.com/lesson-1.html) files (`.rmd`): combines code and free text (+ figures and formulae) - Can be "knitted" to, e.g., .pdf, .html and Word - Makes your documents (e.g. a paper or a thesis) fully reproducible - Nice for problem sets (hence, we will use it for this right from the beginning) --- # Rmarkdown An Rmarkdown file consists of mainly three things: 1. .hl[YAML header] (Yet Another Markdown Language). Specifies meta info (e.g., author, date, document format, etc.): ```yaml --- title: "Untitled" author: Markus output: html_document --- ``` 2\. .hl[Code chunks] surrounded by ` ``` `. You can execute each chunk individually. 3\. .hl[Plain text] formatted via Markdown, a markup language with very [straightforward syntax](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf). Problem sets will come as pre-formatted `.rmd` files such that you can start working on exercises directly. --- # R projects - To keep our sanity when coding (and to produce something reproducible in the end), we want to keep all our data, analysis scripts, outputs, etc. together. **Three approaches:** -- **<span style="color: red;">Bad</span>** > Creating a folder in the explorer, dropping all files into it, and manually setting the working directory in the R script using, e.g., `setwd("C:/Users/XYZ/New Folder")`. Just don't use `setwd()`. Ever.<sup>1</sup> Why? Reproducibility. .footnote[[1] Yes, I am looking at you, Stata user, who loves to set working directories via `cd`.] --- # R projects **<span style="color: #eaaf0e;">Ok</span>** > Clicking ![](Figs/rproj.png?display=inline-block), creating a local "R-Project". This creates a folder with an `.Rproj` file. .font80[ Whenever you open an R-project, a fresh instance of R starts, and **the current working directory is set to the project directory.** You can then work with file paths relative to the project directory: E.g. `"Figures/somepicture.png"`. ] .font70[.hl[TIPP:] Future You should also check out the [`here`](https://github.com/jennybc/here_here) package to make relative file paths more robust.] **<span style="color: #03bc0d;">Good</span>** > Creating a project and using version control. This is where Git(Hub) comes in... --- # Why Git(Hub)? Having multiple scripts inside your project called, e.g., `thesis_analysis_final_01_revised_2.R` is a nightmare. - Using Git(Hub) improves your workflow: - Helps with keeping track of the changes you make. - Makes (code) collaboration with other researchers easy. - Helps to make your research/code projects accessible/reproducible/open source. --- # Why Git(Hub)? What's that thing called "Git"? - It's a version control system: $$ \text{Git} \approx \text{MsOffice track changes and restore features} + \text{dropbox/google drive version history}$$ - But it's better, especially for any kind of projects that involve code. - You have to "commit" (approx. an additional save) actively, but that's a good thing! --- # Why Git(Hub)? <div align="center"> <img src="https://imgs.xkcd.com/comics/git.png" height=500> </div> --- # GitHub GitHub to the rescue: - Built on top of Git - A kind of online cloud service that makes working with Git easier - Again, no automated sync (but that's good!) - Instead of in some folder, your project lives in a remote .hl2[repository.] -- .hl[The remote repository is your upstream storage.] → You can .hl2[clone] it from GitHub to create a local copy. → You can .hl2[fork] some repo (including those of other users); i.e., create a repo copy under "your repositories". You can then **clone** this forked repo to get a local copy. --- # Your first repo 1. Click [here](https://github.com/new) and create a new Git(Hub) repo. Call it "test", set it to .hl[private], and initialize with a readme file. 2. In R-Studio, navigate to `File > New Project > Version Control > Git`. 3. Paste the Repository URL, chose a name and project path, and **clone** the thing. 4. You will be asked to provide a personal access token. Generate it [here](https://github.com/settings/tokens) using some name and check the "repo box". 5. In the files tab, open README.md. Also, click on the Git tab. Do this .hl[now]! --- # 4 Operations You Need to Know 1. .hl[Stage] ("add") - Tells Git which files u want to make changes (edits, deletes, etc.) to in the repo (simplification); in RStudio this boils down to "selecting" files/changes to files by checking them. 2. .hl[Commit] - Git's way to "save" the changes you staged. 3. .hl[Pull] - "downloads" all new changes/new commits from GitHub 4. .hl[Push] (to origin) - "uploads" all commits to GitHub; to the origin, i.e., your upstream remote repo. --- # Commit Make some changes to the `README.md` file .hl2[save], stage, commit and push. To establish best practice, give your commit a meaningful name: <img src="data:image/png;base64,#Figs/git_commit.png" width="293" style="display: block; margin: auto;" /> - In general, commit whenever you think you made a meaningful change - Push a bit less often than you commit --- # Collaborate with Git(Hub) - Using version control really comes to shine when collaborating. - BUT: You are always collaborating with your future self. Therefore, always use version control. - To invite someone to a repo on GitHub, go to the repo `settings > manage access`. - Your collaborator can then clone the repo and contribute commits, push them, etc. --- # When Things Go Sideways: Merge Conflicts 1. Go to your new repo on .hl[GitHub]. Edit the README.md manually in line, say, .hl2[3.] 2. Commit some changes in the .hl2[same] line locally in R-Studio. 3. Pull (.hl[don't push]). 4. Git: ![:scale 50=%](data:image/png;base64,#https://img.devrant.com/devrant/rant/r_828080_WeZmP.gif) --- # When Things Go Sideways: Merge Conflicts What do you do now? - Well, you (maybe in exchange with your collaborator) decide! - Solve the conflict manually, add, commit, push. --- # Branches <img src="data:image/png;base64,#Figs/git_tree.png" width="50%" style="display: block; margin: auto;" /> --- # Branches - [Branches](https://git-scm.com/book/de/v2/Git-Branching-Branches-auf-einen-Blick) allow you to develop/test some ideas without touching the main version of your code. - Useful for larger projects. You get a full copy of the repo, and you can commit/pull/push all you want. - If your ideas turned out not to work. Just delete the branch. - In R-Studio, they can be created .hl2[via the little purple branch icon in the Git tab.] - If you want to integrate the feature/idea into the main branch, issue a [pull request](https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests) (the easiest way is via GitHub). --- # When Things Go Really, Really Wrong .pull-left[ - .hl[DON'T PUSH.] - If you did not, just clone a fresh instance of the repo. - If you did, you can revert to an older version (that may be easier in some cases but more work in others). ] .pull-right[ ![:scale 50=%](data:image/png;base64,#https://gifrific.com/wp-content/uploads/2012/07/Family-Guy-Do-Not-Push-Button.gif) ] --- # Workflow Summary 1. Create a repo or fork some existing repo -- 2. Invite collaborators -- 3. Clone it & create an R project -- 4. Edit code or make other changes -- 5. Stage, commit (with a message), pull (esp. if u work with others to avoid conflicts right away), push -- 6. Rinse and repeat steps 4-5. --- # Workflow Summary Yes, the order really is stage, commit, pull, push. .hl[Always commit first.] Pulling .hl[after] committing does .hl[not] overwrite your work. If you committed, what you pull gets compared to what you committed. If things conflict, it's up to you to resolve. If things work fine, great! If you work alone, you don't really need to pull if you do not change stuff on the remote repo manually. --- # Further Steps - .hl[Fork the course repo and clone it via R-Studio (that's perhaps a bit hacky but easy and does the job).] - You will need this to access the course materials/problem sets conveniently. - As the fork is your own copy of the repo, feel free to commit and push all you want. - You could also use the [shell](https://happygitwithr.com/shell.html) to clone the repo or, alternatively, GitHub Desktop. - For beginners, using a Git GUI (like GitHub Desktop) and R-Studio will cover well over 90% of the use cases. See, e.g., [here](https://happygitwithr.com) for further reading. --- class: inverse, center, middle name: intro # Next Up: Base R & the Tidyverse