--- title: ".mono[RStudio] + Data i/o with .mono[R]" subtitle: "EC 425/525, Lab 3" author: "Edward Rubin" date: "`r format(Sys.time(), '%d %B %Y')`" output: xaringan::moon_reader: css: ['default', 'metropolis', 'metropolis-fonts', 'my-css.css'] # self_contained: true nature: highlightStyle: github highlightLines: true countIncrementalSlides: false --- class: inverse, middle ```{R, setup, include = F} # devtools::install_github("dill/emoGG") library(pacman) p_load( broom, tidyverse, latex2exp, ggplot2, ggthemes, ggforce, viridis, extrafont, gridExtra, kableExtra, snakecase, janitor, data.table, dplyr, estimatr, lubridate, knitr, parallel, lfe, here, magrittr ) # Define pink color red_pink <- "#e64173" turquoise <- "#20B2AA" orange <- "#FFA500" red <- "#fb6107" blue <- "#3b3b9a" green <- "#8bb174" grey_light <- "grey70" grey_mid <- "grey50" grey_dark <- "grey20" purple <- "#6A5ACD" slate <- "#314f4f" # Dark slate grey: #314f4f # Knitr options opts_chunk$set( comment = "#>", fig.align = "center", fig.height = 7, fig.width = 10.5, warning = F, message = F ) opts_chunk$set(dev = "svg") options(device = function(file, width, height) { svg(tempfile(), width = width, height = height) }) options(knitr.table.format = "html") ``` # Prologue --- name: schedule # Schedule ## Last time Working with data in .mono[R]—especially via `dplyr`. ## Today 1. .mono[RStudio] basics 2. Getting data in and out of .mono[R]. --- name: review # Review Key points from the last lab(s). 1. `dplyr` is your data-work friend. 2. Pipes (`%>%`) make your life easier..super[.pink[†]] .footnote[.pink[†] Check out `magrittr` for more pipe options, _e.g._, `%<>%`.] --- layout: false class: inverse, middle # RStudio --- name: features class: clear Let's recap some of the major features in .mono[RStudio]... ```{R, pic_rstudio, echo = F} knitr::include_graphics("RStudio/rstudio.png") ``` --- class: clear First, you write your .mono[R] scripts (source code) in the .hi[Source] pane. ```{R, pic_rstudio_source1, echo = F} knitr::include_graphics("RStudio/rstudio_source_rec.png") ``` --- class: clear You can use the menubar or .mono[⇧+⌘+N] to create new .mono[R] scripts. ```{R, pic_rstudio_source2, echo = F} knitr::include_graphics("RStudio/rstudio_source_arrow.png") ``` --- class: clear To execute commands from your .mono[R] script, use .mono[⌘+Enter]. ```{R, pic_rstudio_source3, echo = F} knitr::include_graphics("RStudio/rstudio_source_ex.png") ``` --- class: clear .mono[RStudio] will execute the command in the terminal. ```{R, pic_rstudio_source4, echo = F} knitr::include_graphics("RStudio/rstudio_source_ex2.png") ``` --- class: clear You can see our new object in the .hi[Environment] pane. ```{R, pic_rstudio_source5, echo = F} knitr::include_graphics("RStudio/rstudio_source_ex3.png") ``` --- class: clear The .hi-purple[History] tab (next to .hi[Environment]) records your old commands. ```{R, pic_rstudio_history, echo = F} knitr::include_graphics("RStudio/rstudio_history.png") ``` --- class: clear The .hi[Files] pane is file explorer. ```{R, pic_rstudio_files, echo = F} knitr::include_graphics("RStudio/rstudio_files.png") ``` --- class: clear The .hi[Plots] pane/tab shows... plots. ```{R, pic_rstudio_plots, echo = F} knitr::include_graphics("RStudio/rstudio_plots.png") ``` --- class: clear .hi[Packages] shows installed packages ```{R, pic_rstudio_packages, echo = F} knitr::include_graphics("RStudio/rstudio_packages.png") ``` --- count: false class: clear .hi[Packages] shows installed packages and whether they are .hi-purple[loaded]. ```{R, pic_rstudio_packages2, echo = F} knitr::include_graphics("RStudio/rstudio_packages2.png") ``` --- class: clear The .hi[Help] tab shows help documentation (also accessible via `?`). ```{R, pic_rstudio_help, echo = F} knitr::include_graphics("RStudio/rstudio_help.png") ``` --- class: clear Finally, you can customize the actual layout ```{R, pic_rstudio_layout, echo = F} knitr::include_graphics("RStudio/rstudio_layout.png") ``` --- count: false class: clear Finally, you can customize the actual layout and many other items. ```{R, pic_rstudio_customize, echo = F} knitr::include_graphics("RStudio/rstudio_customize.png") ``` --- name: best # .mono[R] and .mono[RStudio] ## Best practices 1. Write code in .mono[R] scripts. Troubleshoot in .mono[RStudio]. Then run the scripts. 1. Comment your code. (`# This is a comment`) 1. Name objects and variables with intelligible, standardized names. - .hi-purple[BAD] `ALLCARS`, `Vl123a8`, `a.fun`, `cens.12931`, `cens.12933` - .hi-pink[GOOD] `unique_cars`, `health_df`, `sim_fun`, `is_female`, `age` 1. Set seeds when generating randomness, _e.g._, `set.seed(123)`. 1. Parallelize when possible. (Packages: `parallel`, `purrr`, `foreach`, *etc.*) 1. Use projects in .mono[RStudio] (next). And organize your projects. --- layout: true # .mono[R] and .mono[RStudio] ## Projects --- name: projects Projects in .mono[R] offer several benefits: 1. Act as an anchor for working with files. 1. Make your work (projects) easily reproducible..super[.pink[†]] 1. Help you quickly jump back into your work. .footnote[.pink[†] In this class, we're assuming reproducibility is good/desirable.] --- layout: false class: clear To start a new project, hit the .hi[project icon]. ```{R, pic_rstudio_projects, echo = F} knitr::include_graphics("RStudio/rstudio_projects.png") ``` --- class: clear You'll then choose the folder/directory where your project lives. ```{R, pic_rstudio_projects2, echo = F} knitr::include_graphics("RStudio/rstudio_projects2.png") ``` --- class: clear If you open (double click) a project, .mono[RStudio] opens .mono[R] in that location. ```{R, pic_rstudio_projects3, echo = F} knitr::include_graphics("RStudio/rstudio_projects3.png") ``` --- count: false class: clear .mono[RStudio] will 'load' your previous setup (pane setup, scripts, *etc.*). ```{R, pic_rstudio_projects3b, echo = F} knitr::include_graphics("RStudio/rstudio_projects3.png") ``` --- layout: true # .mono[R] and .mono[RStudio] ## Projects --- .hi-purple[Without a project], you will need to define long file paths that you'll need to keep updating as folder names/locations change. -- `dir_class <- "/Users/edwardarubin/Dropbox/UO/Teaching/EC525S19/"`
`dir_labs <- paste0(dir_class, "NotesLab/")`
`dir_lab03 <- paste0(dir_labs, "03RInput/")`
`sample_df <- read.csv(paste0(dir_lab03, "sample.csv"))` -- .hi-pink[With a project], .mono[R] automatically references the project's folder. `sample_df <- read.csv("sample.csv")` -- .note[Double-plus bonus] The [`here`](https://github.com/r-lib/here) package extends projects' reproducibility. --- layout: true # Data i/o --- class: inverse, middle --- name: reading ## Reading files Projects solve the hardest part of data input/output in .mono[R], _i.e._, navigating your computer's file structure. .note[Steps to read in a file] 1. Figure out your .hi-slate[file's location] *relative to your project's location*. 1. .hi-slate[Find the function] that loads your files' file type. 1. .hi-slate[Load the file] with the function (using its location). --- name: dir ```{R, create_samplecsv, include = F, cache = T} p_load(babynames) set.seed(12345) n <- 15 write.csv( x = tibble( pid = str_pad(1:n, width = 3, side = "left", pad = 0), age = sample(x = 10:90, size = n, replace = T), first_name = c("Donald", sample(filter(babynames, between(year, 1990, 2000) & prop > 0.01)$name, size = n-1, replace = T)) %>% sample(size = n), is_orange = first_name == "Donald" ), file = "sample.csv", row.names = F ) ``` ## Reading CSVs We can check the files in the current (or any) directory with the `dir()`. -- ```{R, ex_dir} dir() ``` Our current directory has the CSV `sample.csv` that I want to load. --- name: read.csv ## Reading CSVs .mono[R]'s base function for reading CSVs is `read.csv(file)`. You feed `read.csv()` the directory and name of the CSV..super[.pink[†]] .footnote[.pink[†] There are many other optional arguments, _e.g._, whether variables are named, variable types, *etc.*] ```{R, ex_read.csv} read.csv("sample.csv") %>% head(4) ``` `read.csv()` returns a `data.frame` with the CSV's contents. --- name: read_csv ## Reading CSVs The Hadleyverse (technically, the `tidyverse` package) contains a package called `readr`, which contains the `read_csv()` function. `read_csv()` is pretty fast, guesses variable well, and returns a `tibble`..super[.pink[†]] .footnote[.pink[†] More speed: `fread()` from `data.table`. Notice `read.csv()` to `read_csv()` give `pid` differing classes.] ```{R, ex_read_csv} p_load(tidyverse) read_csv("sample.csv") %>% head(3) ``` --- name: read_other ## Reading other file types If you've got a file, chances are .mono[R] can read it. - Stata files: `read_dta` in `haven` - SAS files: `read_sas` in `haven` - Fixed-width files: `read_fwf()` in `readr` (also: `iotools`) - Excel files: `read_excel()` in `readxl` - Raster files: `raster()` in `raster` - Shapefiles: `st_read()` in `sf` --- name: write ## Writing If .mono[R] can read it, then .mono[R] can write it. Generally, there is a `write` or `save` function for each `read` function. ```{R, ex_write_csv, eval = F} # Read 'sample.csv' sample_df <- read_csv("sample.csv") # Write sample_df to 'sample_copy.csv' write_csv( x = sample_df, file = "sample_copy.csv" ) ``` --- name: rds ## RDS files While CSVs can be nice—they are readable without loading into a statistical program—when they get big, they can be slow and inefficient. .note[Enter] RDS files, .mono[R]'s compressed, faster answer. The base functions `readRDS()` and `saveRDS()` read and save RDS files. `readr` offers `read_rds()` and `write_rds()` for more standard naming. ```{R, ex_rds, eval = F} # Write sample_df to 'sample.rds' write_rds(x = sample_df, path = "sample.rds") # Read 'sample.rds' sample_df <- read_rds("sample.rds") ``` --- layout: false name: resources # Additional resources More resources related to today's materials. 1. .mono[RStudio]'s [cheatsheet for .mono[RStudio]](https://www.rstudio.com/wp-content/uploads/2016/01/rstudio-IDE-cheatsheet.pdf) 1. [Many other cheatsheets](https://www.rstudio.com/resources/cheatsheets/) from .mono[RStudio] --- layout: false # Table of contents .pull-left[.hi-slate[Data, .mono[R], and .mono[RStudio]] .smaller[ 1. [Schedule](#schedule) 1. [Review](#review) 1. [.mono[RStudio] features](#features) 1. [Best practices](#best) 1. [Projects](#projects) 1. [Data i/o](#reading) - [Reading files](#reading) - [`dir()`](#dir) - [`read.csv()`](#read.csv) - [`read_csv()`](#read_csv) - [Other file types](#read_other) - [Writing (output)](#write) - [RDS files](#rds) 1. [More resources](#resources) ]] --- exclude: true ```{R, generate pdfs, include = F, eval = T} source("../../ScriptsR/unpause.R") unpause("03RInput.Rmd", ".", T, T) ```