R packages used to produce this presentation
library(tidyverse)   # for data wrangling and plotting
library(tidymodels)  # for modeling the tidy way
library(knitr)       # for presenting tables
library(xaringan)    # for rendering xaringan presentations
If you are missing a package, run the following command
install.packages("package_name")
Alternatively, you can just use the pacman package that loads and installs packages:
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, tidymodels, knitr, xaringan)
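For readers who prefer not to depend on pacman, here is a minimal base R sketch of the same "check, then install" idea. The helper name `missing_packages()` is hypothetical (it is not part of any package), and the install line is left commented out so nothing is installed by accident.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed.
# requireNamespace() is base R; it returns FALSE when a package is unavailable.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("tidyverse", "tidymodels", "knitr", "xaringan")
to_install <- missing_packages(required)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to actually install them
}
```

Unlike `pacman::p_load()`, this sketch only reports what is missing; it does not attach the packages with `library()`.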
Best Practice | Methodology |
---|---|
High dimensional statistics | Machine learning |
# code annotation | Notebooks (R Markdown, Jupyter) |
mydoc_1_3_new_final_23.docx | Version control |
Ready to use tables (xlsx) | Generate tables (SQL, dplyr, pandas) |
?? | Reproducibility |
Stata, SAS, EViews | R, Python, Julia |
work solo | Interdisciplinary teams |
Reproducible research allows anyone to generate your exact same results.
To make your project reproducible you'll need to:
Being in a "reproducible" state-of-mind means putting yourself in the shoes of the consumers, rather than producers, of your code.
(In "consumers" I also include the future you!)
renv
The renv package, by RStudio, helps you create reproducible environments for your R projects.
renv will make your R projects more (from the renv documentation):
Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because renv gives each project its own private package library.
Portable: Easily transport your projects from one computer to another, even across different platforms. renv makes it easy to install the packages your project depends on.
Reproducible: renv records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
For further details, see this introduction.
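A typical renv workflow might look like the sketch below. The three wrapper function names are hypothetical and exist only so that nothing runs until you call them from inside your project; `renv::init()`, `renv::snapshot()` and `renv::restore()` are the actual renv calls, and the renv package is assumed to be installed.

```r
# Hypothetical wrappers around the three core renv calls.
# They are only defined here; call them from inside your project directory.

# One-time setup: create a private project library and a lockfile (renv.lock).
setup_project_library <- function() {
  renv::init()
}

# After installing or updating packages: record the exact versions in use.
record_package_versions <- function() {
  renv::snapshot()
}

# On another computer (or after cloning the project): reinstall the
# versions recorded in renv.lock.
restore_package_versions <- function() {
  renv::restore()
}
```

In practice you would usually type the three renv calls directly in the console; the wrappers just make the order of the steps explicit.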
Docker is a virtual computer inside your computer.
Docker makes sure that anyone running your code will be able to perfectly reproduce your results.
Docker solves a major predictability barrier: replicating your entire development environment (operating system, R versions, dependencies, etc.).
For further details, see rOpenSci's tutorial.
If your R script starts with setwd() or rm(list=ls()) then you are doing something wrong!
Instead:
Use RStudio's project environment.
Go to Tools -> Global Options -> General and set "Save workspace to .RData on exit" to NEVER.
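Inside an RStudio project the working directory is the project root, so paths can be written relative to it instead of calling setwd(). A minimal sketch, assuming a hypothetical file data/raw.csv in your project (the file and the helper name are illustrative only):

```r
# Build the path relative to the project root; no setwd() needed.
# "data" and "raw.csv" are hypothetical names used for illustration.
data_path <- file.path("data", "raw.csv")

# Hypothetical helper: read the file if it exists, fail loudly otherwise.
read_project_data <- function(path = data_path) {
  if (!file.exists(path)) {
    stop("File not found relative to the project root: ", path)
  }
  read.csv(path)
}
```

Because the path never mentions a user-specific folder, the same script runs unchanged on a collaborator's computer.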
R Markdown notebooks, by RStudio, are perhaps THE go-to tool for conducting reproducible research in R.
The process of "knitting" an Rmd file starts with a clean slate.
An R Markdown file integrates text, code, links, figures, tables, and all that is related to your research project.
R Markdown is perfect for communicating research. One of its main advantages is that an *.Rmd file is a "meta-document" that can be exported as a:
website (blogdown).
book (bookdown).
paged document (pagedown).
dashboard (flexdashboards).
%>% is a pipe
The "pipe" operator %>%, introduced in the magrittr package, is deeply rooted in the tidyverse.
To understand what %>% does, try associating it with the word "then".
Instead of y <- f(x), we type y <- x %>% f(). This might seem cumbersome at first, but consider the following two lines of code:
> y <- h(g(f(x), z))
> y <- x %>% f() %>% g(z) %>% h()
The second line of code should be read as: "take x, then put it through f(), then put the result through g(., z), then put the result through h(), and finally, keep the result in y."
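To make this concrete, here is a small sketch with toy definitions of f(), g() and h(). These three functions and the values of x and z are made up for illustration; only %>% comes from magrittr.

```r
library(magrittr)

# Toy functions, invented for this example.
f <- function(x) x + 1
g <- function(x, z) x * z
h <- function(x) sqrt(x)

x <- 3
z <- 4

# Nested call: read from the inside out.
y_nested <- h(g(f(x), z))

# Piped call: read from left to right ("take x, then f, then g, then h").
y_piped <- x %>% f() %>% g(z) %>% h()

identical(y_nested, y_piped)  # both equal sqrt((3 + 1) * 4) = 4
```

The two versions compute the same value; the piped one simply lists the steps in the order they happen.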
df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
df_new <- df[df$x > 0, c("x", "y")]
df_new$xx <- df_new$x^2
df_new <- df %>% select(x, y) %>% filter(x > 0) %>% mutate(xx = x^2)
The above code chunk should be read as:
"generate a new dataframe df_new by taking df, then select x and y, then filter rows where x is positive, then mutate a new variable xx = x^2"
Following a "tidy" approach makes your code more readable ⇒ more reproducible.
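As a quick sanity check, the sketch below runs the base R and tidyverse versions on the same data and compares the results. The seed and the names df_base and df_tidy are arbitrary choices made for this example.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, so both versions see the same random data
df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))

# Base R version
df_base <- df[df$x > 0, c("x", "y")]
df_base$xx <- df_base$x^2

# Tidyverse version
df_tidy <- df %>%
  select(x, y) %>%
  filter(x > 0) %>%
  mutate(xx = x^2)

# Compare column names and values. Row names are ignored, because base R
# subsetting keeps the original row names while dplyr renumbers the rows.
same_names  <- identical(names(df_base), names(df_tidy))
same_values <- isTRUE(all.equal(df_base, df_tidy, check.attributes = FALSE))
same_names && same_values
```

Both approaches give the same table; the difference is in how easily a reader can follow the steps.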
I believe that there is a growing consensus in the #rstats community that we should learn the tidyverse first.
Nevertheless, note that the tidyverse is "Utopian" in the sense that it strives toward perfection, and thus keeps changing. By contrast, base R was built to last.
As usual, being proficient in both (base R and the tidyverse) will get you far...
Which packages come with tidyverse?
tidyverse_packages()
##  [1] "broom"      "cli"        "crayon"     "dbplyr"     "dplyr"      "forcats"
##  [7] "ggplot2"    "haven"      "hms"        "httr"       "jsonlite"   "lubridate"
## [13] "magrittr"   "modelr"     "pillar"     "purrr"      "readr"      "readxl"
## [19] "reprex"     "rlang"      "rstudioapi" "rvest"      "stringr"    "tibble"
## [25] "tidyr"      "xml2"       "tidyverse"
Note that not all these packages are loaded by default (e.g., lubridate).
We now briefly introduce one of the tidyverse flagships: dplyr.
dplyr: The grammar of data manipulation
dplyr is THE go-to tool for data manipulation.
Key "verbs":
filter() - selects observations (rows).
select() - selects variables (columns).
mutate() - generates new variables (columns).
arrange() - sorts observations (rows).
summarise() - computes summary statistics (by groups).
Other useful verbs:
group_by() - groups observations by variables.
sample_n() - samples rows from a table.
(dplyr documentation)
tidymodels package
tidymodels::tidymodels_packages()
##  [1] "broom"         "cli"           "crayon"        "dials"         "dplyr"
##  [6] "ggplot2"       "infer"         "magrittr"      "parsnip"       "pillar"
## [11] "purrr"         "recipes"       "rlang"         "rsample"       "rstudioapi"
## [16] "tibble"        "tidytext"      "tidypredict"   "tidyposterior" "tune"
## [21] "workflows"     "yardstick"     "tidymodels"
For further details, visit the tidymodels GitHub repo.
R for Data Science (r4ds) by Garrett Grolemund and Hadley Wickham.
Data wrangling and tidying with the “Tidyverse” by Grant McDermott.
Getting used to R, RStudio, and R Markdown by Chester Ismay and Patrick C. Kennedy.
Data Visualization: A practical introduction by Kieran Healy.
What's wrong with the "*_FINAL_FINAL" method?
What changed?
Where??
When???
By who????
You get the picture...
Git is a distributed version control system.
Huh?!
Sorry. Think of MS Word "track changes" for code projects.
Git has established itself as the de-facto standard for version control and software collaboration.
GitHub is a web-based hosting service for version control using Git.
OK, OK! Think of "Dropbox" for git projects. On steroids. And then some.
GitHub is where and how a large share of open-source projects (e.g., R packages) are being developed.
The source for the definition of GitHub is Wikipedia.
Happy Git and GitHub for the useR by Jenny Bryan.
Version Control with Git(Hub) by Grant McDermott.
RStudio:
GitHub Desktop:
install.packages("usethis")
library(usethis)
use_git_config(
  scope = "project",
  user.name = "Jane",
  user.email = "jane@example.org"
)
(6. Some extra steps are needed in order to publish and sync this new project with GitHub.)
The pull -> stage -> commit -> push workflow:
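For those who like to see what happens behind the buttons, the four steps correspond to the git commands pull, add, commit and push. The sketch below wraps them in a hypothetical R helper, `git_sync()`, that calls the git command line tool through `system2()`; it assumes git is installed and that the project already has a remote configured.

```r
# Hypothetical helper (not part of any package): run the
# pull -> stage -> commit -> push steps from R via the git command line.
git_sync <- function(message, files = ".", repo = getwd()) {
  stopifnot(is.character(message), length(message) == 1, nzchar(message))

  # Run one git command inside `repo`; stop if git reports an error.
  run_git <- function(...) {
    args <- c(...)
    status <- system2("git", c("-C", shQuote(repo), args))
    if (status != 0) {
      stop("git ", paste(args, collapse = " "), " failed")
    }
    invisible(status)
  }

  run_git("pull")                            # 1. pull: get the latest changes
  run_git("add", shQuote(files))             # 2. stage: choose what to record
  run_git("commit", "-m", shQuote(message))  # 3. commit: record a snapshot
  run_git("push")                            # 4. push: publish to the remote
}

# Example call (not run here):
# git_sync("Add first R Markdown file")
```

Note that git exits with an error when there is nothing to commit, in which case this sketch stops before the push step.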
Cloning:
Syncing:
Open RStudio (or log in to RStudio Cloud).
Create your first R project.
Initiate Git.1
Create a new RMarkdown file.
Commit.
1 RStudio automatically generates a .gitignore file that tells git which files to ignore (duh!). Click here for further details on how to configure what to ignore.