---
title: "Getting to know .mono[R]"
subtitle: "EC 425/525, Lab 1"
author: "Edward Rubin"
date: "`r format(Sys.time(), '%d %B %Y')`"
output:
  xaringan::moon_reader:
    css: ['default', 'metropolis', 'metropolis-fonts', 'my-css.css']
    # self_contained: true
    nature:
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: false
---
class: inverse, middle

```{R, setup, include = F}
# devtools::install_github("dill/emoGG")
library(pacman)
p_load(
  broom, tidyverse,
  latex2exp, ggplot2, ggthemes, ggforce, viridis, extrafont, gridExtra,
  kableExtra, snakecase, janitor,
  data.table, dplyr, estimatr,
  lubridate, knitr, parallel,
  lfe,
  here, magrittr
)
# Define pink color
red_pink <- "#e64173"
turquoise <- "#20B2AA"
orange <- "#FFA500"
red <- "#fb6107"
blue <- "#3b3b9a"
green <- "#8bb174"
grey_light <- "grey70"
grey_mid <- "grey50"
grey_dark <- "grey20"
purple <- "#6A5ACD"
slate <- "#314f4f"
# Dark slate grey: #314f4f
# Knitr options
opts_chunk$set(
  comment = "#>",
  fig.align = "center",
  fig.height = 7,
  fig.width = 10.5,
  warning = F,
  message = F
)
opts_chunk$set(dev = "svg")
options(device = function(file, width, height) {
  svg(tempfile(), width = width, height = height)
})
options(knitr.table.format = "html")
```

# Prologue

---
name: schedule

# Schedule

## Today

Get to know .mono[R]

1. Basic features of .mono[R]
2. Fun with functions
3. OLS (canned and custom)
4. Simulations

---
layout: true
# .mono[R] intro

---
name: types

## Object types/classes

As we discussed in class, .mono[R] revolves around objects, _e.g._, `test <- 123`.

--
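A minimal sketch of what assignment looks like in practice, reusing the name `test` from the example above. The `test + 1` step is only an illustration that a stored object can be reused.

```{R, ex_assign_sketch}
# Assign the value 123 to an object named 'test'
test <- 123
# Typing the object's name prints its value
test
# The object can then be used in later calculations
test + 1
```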
.hi-slate[*Note*] You can also assign values to objects via `=`, _e.g._, `test = 123`.

--

Objects have types/classes.

--

- `1` and `2/3` are `numeric`.

--

- `"Hello"` and `'cruel world'` are both `character`.

--

- `TRUE`, `T`, `FALSE`, and `F` are `logical` (as is the result of `3 > 2`).

--

The `class(x)` function tells you the class of object `x`.

---

## Object types/classes

.pull-left[
```{R, ex_class, split = T}
1
"Clever/funny example words?"
3 < 2
"Warriors" > "Bucks"
```
]

--

.pull-right[
```{R, ex_class2, split = T}
class(1)
class("Clever/funny example words?")
class(3 < 2)
class("Warriors" > "Bucks")
```
]

---
name: structure

## Structure

In addition to having types/classes, objects have some type of structure.

- `1:3`, `c(1, 2)`, and `seq(2, 8, 2)` each produce a numeric `vector` (note that `class(1:3)` is `integer`, which .mono[R] treats as numeric).

--

- `c("Alright", "already")` produces a `vector` of `character` class.

--

- `c(1, 3, T, "Hello")` produces a `vector` of `character` class.

--

- `matrix(data = 1:15, ncol = 5)` creates a `matrix` whose elements take their type from `data`.

--

- `data.frame(x = 1:2, y = c("a", "b"), z = T)` produces a `data.frame` with three columns and two rows. The first column (`x`) is `integer` (a numeric type), the second column (`y`) is `character`, and the third column (`z`) is `logical`.

---

## Object types

.pull-left[
Our `matrix`
```{R, ex_matrix}
matrix(data = 1:15, ncol = 5)
```
]

--

.pull-right[
Our first `data.frame`!
```{R, ex_df}
data.frame(x = 1:3, y = T)
```
]

--

Notice how .mono[R] recycles the single value `T` to fill out column `y` when lengths don't match.

---

## Object types

.mono[R] can help you check an object's type.

.pull-left[
```{R, ex_matrix2}
class(matrix(1:9, ncol = 3))
is.matrix(matrix(1:9, ncol = 3))
is.data.frame(matrix(1:9, ncol = 3))
```
]

--

.pull-right[
```{R, ex_df2}
class(data.frame(x = 1:3))
is.matrix(data.frame(x = 1:3))
is.data.frame(data.frame(x = 1:3))
```
]

---
name: mix

## Object types/classes

.hi-slate[Q] What happens when we mix classes, _e.g._, `c(12, "B", F)`?
--

.hi-slate[A] .mono[R] coerces every element to the most flexible class that can represent all of them: `logical` becomes `numeric`, and anything mixed with a `character` becomes `character`.

.pull-left[
```{R, ex_type1}
c(12, "B")
c(12, F)
```
]

.pull-right[
```{R, ex_type2}
c("B", F)
c(12, "B", F)
```
]

---
name: change

## Changing types and classes

.pull-left[
Change numbers to characters.
```{R, num2chr}
as.character(1:3)
```
]

.pull-right[
Change logical to numeric.
```{R, log2num}
as.numeric(c(T, F))
```
]

.pull-left[
Change vector to matrix.
```{R, vec2mat}
as.matrix(1:3)
```
]

---
name: packages

## Packages

Straight out of the box, .mono[R] has a ton of useful features, but it really gets its power from the additional packages (libraries) that users create.

- .hi-slate[Open-source greatness] Users find needs and create amazing solutions.
- .hi-slate[*Caveat utilitor*] There are a lot of packages, each with a lot of functions. Mistakes can happen.
- .hi-slate[Open-source greatness.sub[2]] Again, .mono[R] is open source: Check the code!

--
(Maybe. Sometimes it's very hard.)

--

.hi-slate[Examples] `ggplot2` (plotting), `dplyr` (data work that can link with SQL), `sf` and `raster` (geospatial work), `lfe` (high-dimensional fixed-effect regression), `data.table` (fast and efficient data work)

---

## Installing packages

Once you find a function/package that you need to install,.super[.pink[†]] you'll typically install it via `install.packages("newAmazingPackage")`..super[.pink[††]]

.footnote[.pink[†] Tool \#1: Google. .pink[††] The quotation marks are important.]

We'll use the package `dplyr` throughout the course. Let's install it.

```{R, ex_install_dplyr, eval = F}
# Install 'dplyr' package
install.packages("dplyr")
```

--

.hi-slate[*Aside*] Notice the comment above the actual code (.mono[R] uses `#` for comments).

--
While not necessary for .mono[R] to work, comments are necessary for research.

---

## Using packages

Once you install a package, it is on your machine. You don't need to install it again, though you should probably update your packages from time to time.

--

To .hi-slate[load a package], use the `library(package)` function.super[.pink[†]], _e.g._, to load `dplyr`

.footnote[.pink[†] Notice `library()` doesn't *need* quotation marks. I know...]

```{R, eval = F}
# Load 'dplyr'
library(dplyr)
```

--

Now all functions contained in `dplyr` are available (until you close .mono[R]).

---

## Package management

All of this installing, loading, updating, checking-for-existence-and-then-loading can get old. As can typing `library(package1)`, `library(package2)`, ...

--

.slate[*[Enter]*] The `pacman` package... for package management, of course.

--

After installing (`install.packages("pacman")`), you can

- Install and load packages via `p_load(package1, ..., packageN)`
- Update packages via `p_update()`

The `p_load` paradigm is especially helpful for collaborations or projects across multiple machines.

---
name: math

## Math in .mono[R]

.pull-left[
.hi-slate[Basic algebra:] scalars `a` and `b`
```{R, math_algebra, eval = F}
# Addition
a + b
# Subtraction
a - b
# Multiplication
a * b
# Division
a / b
# Mod
a %% b
# Integer division
a %/% b
# Exponents
a^b
```
]

--

.pull-right[
.hi-pink[Matrix algebra:] matrices `A` and `B`
```{R, matrix_algebra, eval = F}
# Addition
A + B
# Subtraction
A - B
# Multiplication
A %*% B
# Inverse
solve(A)
# Transpose
t(A)
# Diagonal
diag(A)
# Dimensions
dim(A); nrow(A); ncol(A)
```
]

---
name: vectorization

## Vectorization

One **great** feature in .mono[R]: vectorization.

With vectorization, .mono[R] automatically applies functions to each element of a vector, with no explicit iteration required.
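As a rough sketch of what this means, the chunk below computes the same squares twice: once with an explicit `for` loop and once with a single vectorized expression. The object names (`x`, `loop_squares`, `vec_squares`) and the values are made up for illustration.

```{R, ex_vec_sketch}
# A small example vector (hypothetical values)
x <- c(1, 2, 3, 4)
# Loop version: square each element one at a time
loop_squares <- numeric(length(x))
for (i in seq_along(x)) {
  loop_squares[i] <- x[i]^2
}
# Vectorized version: R squares every element in one expression
vec_squares <- x^2
# Check that both approaches give the same result
identical(loop_squares, vec_squares)
```

The vectorized version is shorter and is usually the more idiomatic choice in .mono[R].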
---

## Vectorization

.pull-left[
```{R, ex_vec}
# Multiply a scalar by a scalar
3 * 4
# Multiply a scalar by a vector
3 * c(4, 5, 6)
# Multiply a vector by a vector (element by element)
1:3 * c(4, 5, 6)
```
]

.pull-right[
Vectorization can be confusing: when lengths differ, .mono[R] recycles the shorter vector.
```{R, ex_vec_error}
c(0.5, 0.9) + c(1, 2, 3)
```
.mono[R] will send you a warning, but it won't stop you.
]

---
name: stat

## Statistics in .mono[R]

.pull-left[
.hi-slate[Summaries] for samples `x` and `y`
```{R, stat_functions, eval = F}
# Mean
mean(x)
# Median
median(x)
# Std. dev. and variance
sd(x)
var(x)
# Min. and max.
min(x)
max(x)
# Correlation/covariance
cor(x, y)
cov(x, y)
# Quartiles and mean
summary(x)
```
]

--

.pull-right[
.hi-pink[Sampling]
```{R, sampling_functions, eval = F}
# Set the seed
set.seed(246)
# 4 random draws from N(3,5), i.e., mean 3 and variance 5
rnorm(n = 4, mean = 3, sd = sqrt(5))
# CDF for N(0,1) at z=1.96
pnorm(q = 1.96, mean = 0, sd = 1)
# Sample 5 draws from x w/ repl.
sample(
  x = x,
  size = 5,
  replace = T
)
# First and last 3
head(x, 3)
tail(x, 3)
```
]

---
name: indexing

## Indexing vectors

Because vectors are so central to .mono[R], being able to index your vectors is important. *Note:* Vectors have one dimension.

Take the vector `x` (_e.g._, `x <- c(2, 4, 6, 9)`).

- `x[3]` will give us the third element of the vector, _i.e._, `6`.
- `x[2:3]` will give us the second *and* third elements, _i.e._, `c(4, 6)`.
- `x[-1]` returns all elements *except the first*, _i.e._, `c(4, 6, 9)`.
- `x[2] <- 0` replaces the second element with `0`, _i.e._, `x` becomes `c(2, 0, 6, 9)`.

--

Lists, _e.g._, `y <- list(1, 2, 3)`, are similar but use double brackets to extract an element, _e.g._, `y[[3]]`.

---

## Indexing matrices

Because matrices (and data frames) have two dimensions, we need to index both dimensions.

For matrix `A` (_e.g._, `A <- matrix(1:9, ncol = 3)`)

- `A[3,1]` references the element in the 3.super[rd] row and 1.super[st] column.
- `A[3,]` references all elements in the 3.super[rd] row (across all columns).
- `A[,1]` references all elements in the 1.super[st] column (across all rows).
- `A[-2,]` returns all elements in `A` except for the 2.super[nd] row.
- `A[2,3] <- 0` replaces the element `A[2,3]` with zero.

--

You can also name rows/columns in matrices and use these names for referencing.

---

## Other

.pull-left[
"Special" values

- `Inf` is ∞, _i.e._, 1/0. `-Inf` is -∞.
- `NA` is missing.
- `NaN` is *not a number*.
- `NULL` is null.
]

--

.pull-right[
Standard logical operators

- `==` is equal.
- `!=` is not equal.
- `>`, `>=`, `<`, `<=`
- `&` is *and*; `|` is *or*.
]

--

When comparing strings, .mono[R] orders numbers first, then lowercase, then uppercase (the exact order depends on your locale).

```{R, ex_order}
# Ordering: the number 1 is coerced to the string "1" before comparing
1 < "a"
```

---
name: more

## `NA`

Finally, `NA` contains no information in .mono[R].

.pull-left[
```{R, ex_na}
NA == NA
NA != NA
NA > 0
```
]

.pull-right[
```{R, ex_na2}
NA + 0
is.vector(NA)
```
]

---
name: functions

## Functions

In general, a function takes some arguments, performs some internal tasks, and returns some output.

.hi-slate[*Typical function in* .mono[R]:] `some_fun(arg1, arg2, arg3 = 0)`

- For `some_fun` to run, you must define `arg1` and `arg2`, _e.g._, `some_fun(arg1 = 12, arg2 = -1)`
- *Optional arguments* If you do not assign a value for `arg3`, then `some_fun` defaults to `arg3 = 0`
    - Omitted: `some_fun(arg1 = 12, arg2 = -1)`
    - Equivalent: `some_fun(arg1 = 12, arg2 = -1, arg3 = 0)`

---

## Functions

Functions in .mono[R] are flexible.

.hi-slate[*Examples*]

- `c(arg1, arg2, ... argN)` returns a vector of the inputted arguments
*Note* `c()` takes many inputs and returns one output.
- `ls()` lists all user-defined objects in the current environment
*Note* `ls` works without any inputs and returns a character vector.
- `rm(obj)` removes the object `obj` from the current environment
*Note* `rm` can take many inputs and returns no output.

---
name: user_fun

## User-defined functions

.mono[R] makes it easy to define your own functions..super[.pink[†]]

.footnote[.pink[†] We'll delve more deeply into this topic soon.]

.hi-slate[*Standard example*] A function that returns the product of three numbers.

```{R, ex_fun}
# Our function 'our_product' takes three arguments
our_product <- function(num1, num2, num3) {
  # Calculate the product
  tmp_product <- num1 * num2 * num3
  # Return the answer
  return(tmp_product)
}
```

You *could* get away without using `return()`, since .mono[R] returns the value of the last evaluated expression, but that's not recommended.

---

## User-defined functions

Our function in action...

```{R, ex_our_fun}
our_product(1, 2, 3)
```

```{R, ex_our_fun2}
our_product(1, 2, NA)
```

---
name: exercise

## Exercises

1. Using the tools we've covered, generate a dataset $\left( n=50 \right)$ such that $$ \begin{align} y_i = 12 + 1.5 x_i + \varepsilon_i \end{align} $$ where $x_i\sim N(3,7)$ and $\varepsilon_i\sim N(0,1)$.

2. Estimate the relationship via OLS using only matrix algebra. Recall $$ \begin{align} \hat{\beta}_\text{OLS} = \left( {X}^\prime {X} \right)^{-1} {X}^\prime {y} \end{align} $$

3. .hi-slate[*Harder*] Write a function that estimates OLS coefficients using matrix algebra. Compare your results with the canned function from .mono[R] (`lm`).

4. .hi-slate[*Hardest*] Bring it all together: Use your DGP (1) and function (3) to run a simulation that illustrates the unbiasedness of OLS.

---
layout: false

# Table of contents

.hi-slate[Introduction to .mono[R]]

.smaller[

1. [Schedule](#schedule)
1. [Object types and classes](#types)
    - [Data structures](#structure)
    - [Mixing types/classes](#mix)
    - [Changing](#change)
1. [Packages](#packages)
1. [Math in .mono[R]](#math)
1. [Vectorization](#vectorization)
1. [Statistics and simulation](#stat)
1. [Indexing](#indexing)
1. [`NA` and logical operators](#more)
1. [Functions](#functions)
1. [User-defined functions](#user_fun)
1. [Exercises](#exercise)
]

---
exclude: true

```{R, generate pdfs, include = F, eval = T}
source("../../ScriptsR/unpause.R")
unpause("01RBasics.Rmd", ".", T, T)
```