class: center, middle, inverse, title-slide # Getting to know .mono[R] ## EC 425/525, Lab 1 Solutions ### Edward Rubin ### 08 April 2019 --- exclude: true --- # Exercises 1. Using the tools we've covered, generate a dataset `\(\left( n=50 \right)\)` such that $$ `\begin{align} y_i = 12 + 1.5 x_i + \varepsilon_i \end{align}` $$ where `\(x_i\sim N(3,7)\)` and `\(\varepsilon_i\sim N(0,1)\)`. 2. Estimate the relationship via OLS using only matrix algebra. Recall $$ `\begin{align} \hat{\beta}_\text{OLS} = \left( {X}^\prime {X} \right)^{-1} {X}^\prime {y} \end{align}` $$ 3. .hi-slate[*Harder*] Write a function that estimates OLS coefficients using matrix algebra. Compare your results with the canned function from .mono[R] (`lm`). 4. .hi-slate[*Hardest*] Bring it all together: Use your DGP (1) and function (3) to run a simulation that illustrates the unbiasedness of OLS. --- layout: true # Example solution: Part 1 1\. Using the tools we've covered, generate a dataset `\(\left( n=50 \right)\)` such that $$ `\begin{align} y_i = 12 + 1.5 x_i + \varepsilon_i \end{align}` $$ where `\(x_i\sim N(3,7)\)` and `\(\varepsilon_i\sim N(0,1)\)`. --- .pull-left[.pseudocode-small[ 1. Set seed 1. Set sample size n=50 1. Generate x~N(3,7) 1. Generate ε~N(0,1) 1. Calculate y<br>y ≔ 12 + 1.5 x + ε ]] -- .pull-right[ ```r # Set seed set.seed(12345) # Set sample size n <- 50 # Generate x~N(3,7) x <- rnorm( n = n, mean = 3, sd = sqrt(7) ) # Generate ε~N(0,1) ε <- rnorm(n = n) # Calcualte y y <- 12 + 1.5 * x + ε ``` ] --- layout: false class: clear, center, middle <img src="01Solution_files/figure-html/plot1-1.svg" style="display: block; margin: auto;" /> --- layout: true # Example solution: Part 2 2\. Estimate the relationship via OLS using only matrix algebra. Recall $$ `\begin{align} \hat{\beta}_\text{OLS} = \left( {X}^\prime {X} \right)^{-1} {X}^\prime {y} \end{align}` $$ --- .pull-left[.pseudocode-small[ 1. Convert y to matrix 2. Create X matrix: [1 x] 3. OLS matrix math ]] -- .pull-right[ ```r # Convert y to matrix y_m <- as.matrix(y) # Create X matrix X_m <- cbind(1, x) # Matrix math XX <- t(X_m) %*% X_m Xy <- t(X_m) %*% y_m b_ols <- solve(XX) %*% Xy ``` ] -- - `cbind` is *column-binding* its arguments (`1` and `x`). -- - .slate[*Alternatives:*] - `matrix(data = c(rep(1, n), x), ncol = 2, byrow = F)` - `as.matrix(data.frame(1, x))` --- How did we do? -- .pull-left[ .hi-pink[Our estimates:] ```r b_ols ``` ``` #> [,1] #> 12.313159 #> x 1.499329 ``` ] -- .pull-right[ .hi-purple[.mono[R]'s estimates:] ```r lm(y ~ x) ``` ``` #> #> Call: #> lm(formula = y ~ x) #> #> Coefficients: #> (Intercept) x #> 12.313 1.499 ``` ] --- layout: false class: clear, center, middle <img src="01Solution_files/figure-html/sol2_plot-1.svg" style="display: block; margin: auto;" /> --- layout: true # Example solution: Part 3 3\. .hi-slate[*Harder*] Write a function that estimates OLS coefficients using matrix algebra. Compare your results with the canned function from .mono[R] (`lm`). .pseudocode-small[ 1. Convert data to matrix 1. Optional: Add a column of 1s 1. Calculate: (X'X).super[-1](X'y) ] --- Our function should take arguments - `y` (the outcome matrix) - `x` (covariates) - an optional argument for whether we add an intercept ot `x` --- ```r b_ols <- function(y, x, add_int = F) { # Force 'y' to matrix Y <- as.matrix(y) # Force 'x' to matrix X <- as.matrix(x) # If desired: Add intercept if (add_int == T) X <- cbind(1, X) # Matrix math b <- solve(t(X) %*% X) %*% t(X) %*% y # Done return(b) } ``` --- .pull-left[ .hi-pink[Us] ```r b_ols(y = y, x = x, add_int = T) ``` ``` #> [,1] #> [1,] 12.313159 #> [2,] 1.499329 ``` ] .pull-right[ .hi-purple[Canned .mono[R]] ```r lm(y ~ x) ``` ``` #> #> Call: #> lm(formula = y ~ x) #> #> Coefficients: #> (Intercept) x #> 12.313 1.499 ``` ] --- layout: true # Example solution: Part 4 4\. .hi-slate[*Hardest*] Bring it all together: Use your DGP (1) and function (3) to run a simulation that illustrates the unbiasedness of OLS. --- *Simulation outline* .pseudocode-small[ One iteration: 1. Generate data via DGP (x, ε, and y) 1. Estimate OLS coefficients Repeat n=10,000 times... ] --- Let's write a function for one iteration ```r one_iter <- function(iter, b0, b1, n) { # Generate x~N(3,7) x <- rnorm(n = n, mean = 3, sd = sqrt(7)) # Generate ε~N(0,1) ε <- rnorm(n = n) # Calcualte y y <- b0 + b1 * x + ε # Regress y and x with our function b_est <- b_ols(y = y, x = x, add_int = T) # Include iteration and convert to vector b_est <- c(iter, b_est) # Return return(b_est) } ``` --- Now we run the function 10,000 times...<sup>.pink[†]</sup> .footnote[.pink[†] We'll talk more about `lapply` and parallelization in the future.] ```r library(parallel) # Run the simulation (parallelized) sim_list <- mclapply( # The function we want to 'repeat' FUN = one_iter, # The values we want to use/vary X = 1:1e4, # Number of cores mc.cores = 4, # Other arguments/parameters for 'one_iter' b0 = 12, b1 = 1.5, n = 50 ) ``` --- .hi-slate[Q] What does our `list` named `sim_list` look like? -- ```r # First element sim_list[[1]] ``` ``` #> [1] 1.000000 11.907517 1.522365 ``` ```r # Last element tail(sim_list, 1) ``` ``` #> [[1]] #> [1] 10000.000000 12.290850 1.403717 ``` .hi-slate[A] It's made up of each iteration's vector of results. --- Let's bind the individual vectors together into a single data frame. ```r # Bind together the vectors (outputs matrix) sim_df <- do.call("rbind", sim_list) # Covert to data frame sim_df <- data.frame(sim_df) # Name our columns names(sim_df) <- c("iter", "b0", "b1") ``` --- layout: false class: clear Density of our estimates for `\(\beta_1\)` via OLS (mean `\(\hat{\beta}_1=\)` 1.501; `\(\beta_1=\)` 1.5) <img src="01Solution_files/figure-html/b_ols-1.svg" style="display: block; margin: auto;" /> .hi-slate[Q] Does this simulation tell us about *consistency* or *unbiasedness*? --- class: clear .mono[R] code from the density plot ```r ggplot(data = sim_df, aes(x = b1)) + geom_density(color = NA, fill = red_pink, alpha = 0.9) + ylab("Density") + xlab(expression(b[1])) + geom_hline(yintercept = 0, color = "black") + geom_vline(xintercept = 1.5, size = 1, linetype = "solid", color = orange) + theme_pander(base_family = "Fira Sans Book", base_size = 20) ``` --- exclude: true