class: center, middle, inverse, title-slide # Simple Linear Regression: Estimation ## EC 320: Introduction to Econometrics ### Kyle Raze ### Fall 2019 --- class: inverse, middle # Prologue --- # Housekeeping **Grading:** Expect Midterm 1 grades before Problem Set 2 grades. **Problem Set 3** - Due Wednesday, November 6th by 11:59pm *on Canvas*. - Submit computational solutions in an HTML file generated by R Markdown. - An R Markdown template is available on Canvas. - 10 points extra credit for typing analytical solutions. - .hi[The standard for extra credit will rise for Problem Sets 4 and 5] .pink[(*e.g.,* generate typed HTML solutions using R Markdown).] --- # Where Are We? ## Where we've been .hi[High Concepts] - Reviewed core ideas from statistics - Developed a framework for thinking about causality - Dabbled in regression analysis. Also, .mono[**R**]. --- # Where Are We? ## Where we're going .hi[The Weeds!] - Learn the mechanics of *how* OLS works -- - Interpret regression results (mechanically and critically) -- - Extend ideas about causality to a regression context -- - Think more deeply about statistical inference -- - Lay a foundation for more-sophisticated regression techniques. -- Also, **more** .mono[**R**]. --- class: inverse, middle # Simple Linear Regression --- # Addressing Questions ## Example: Effect of police on crime __Policy Question:__ Do on-campus police reduce crime on campus? -- - **Empirical Question:** Does the number of on-campus police officers affect campus crime rates? If so, by how much? How can we answer these questions? -- - Prior beliefs. -- - Theory. -- - __Data!__ --- # Let's _"Look"_ at Data ## Example: Effect of police on crime
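One way to *"look"* in .mono[R] is to print the first few rows of the data. A minimal sketch (the data frame `campus` and its column names are hypothetical stand-ins, with made-up values, not the actual campus crime data):

```r
# Toy stand-in for the campus crime data (values are made up):
# police = officers per 1,000 students, crime = crimes per 1,000 students
campus <- data.frame(
  police = c(1.2, 0.8, 2.5, 1.9, 0.4, 3.1),
  crime  = c(23, 31, 18, 45, 29, 38)
)

head(campus)  # print the first rows of the data frame
```

Rows of raw numbers rarely reveal a relationship on their own.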
--- # Take 2 ## Example: Effect of police on crime *"Looking"* at data wasn't especially helpful. -- Let's try using a scatter plot. - Plot each data point in `\((X,Y)\)`-space. - .mono[Police] on the `\(X\)`-axis. - .mono[Crime] on the `\(Y\)`-axis. --- # Take 2 ## Example: Effect of police on crime <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" /> --- # Take 2 ## Example: Effect of police on crime The scatter plot tells us more than the spreadsheet. - Somewhat weak _positive_ relationship. -- - The sample correlation coefficient of 0.14 confirms this. But our question was > Does the number of on-campus police officers affect campus crime rates? If so, by how much? - The scatter plot and correlation coefficient provide only a partial answer. --- # Take 3 ## Example: Effect of police on crime Our next step is to estimate a __statistical model.__ To keep it simple, we will relate an __explained variable__ `\(Y\)` to an __explanatory variable__ `\(X\)` in a linear model. --- # Simple Linear Regression Model We express the relationship between an .hi-purple[explained variable] and an .hi-green[explanatory variable] as linear: $$ \color{#9370DB}{Y_i} = \beta_1 + \beta_2\color{#007935}{X_i} + u_i. $$ - `\(\beta_1\)` is the __intercept__ or constant. - `\(\beta_2\)` is the __slope coefficient__. - `\(u_i\)` is an __error term__ or disturbance term. .footnote[ _Simple_ .mono[=] Only one explanatory variable. ] --- # Simple Linear Regression Model The .hi[intercept] tells us the expected value of `\(Y_i\)` when `\(X_i = 0\)`. $$ Y_i = \color{#e64173}{\beta_1} + \beta_2X_i + u_i $$ Usually not the focus of an analysis. --- # Simple Linear Regression Model The .hi[slope coefficient] tells us the expected change in `\(Y_i\)` when `\(X_i\)` increases by one.
$$ Y_i = \beta_1 + \color{#e64173}{\beta_2}X_i + u_i $$ "A one-unit increase in `\(X_i\)` is associated with a `\(\color{#e64173}{\beta_2}\)`-unit increase in `\(Y_i\)`." -- Under certain (strong) assumptions about the error term, `\(\color{#e64173}{\beta_2}\)` is the _effect of_ `\(X_i\)` _on_ `\(Y_i\)`. - Otherwise, it's the _association of_ `\(X_i\)` _with_ `\(Y_i\)`. --- # Simple Linear Regression Model The .hi[error term] reminds us that `\(X_i\)` does not perfectly explain `\(Y_i\)`. $$ Y_i = \beta_1 + \beta_2X_i + \color{#e64173}{u_i} $$ Represents all other factors that explain `\(Y_i\)`. - Useful mnemonic: pretend that `\(u\)` stands for *"unobserved"* or *"unexplained."* --- # Take 3, continued ## Example: Effect of police on crime How might we apply the simple linear regression model to our question about the effect of on-campus police on campus crime? - Which variable is `\(X\)`? Which is `\(Y\)`? -- $$ \text{Crime}_i = \beta_1 + \beta_2\text{Police}_i + u_i. $$ - `\(\beta_1\)` is the crime rate for colleges without police. - `\(\beta_2\)` is the increase in the crime rate for an additional police officer per 1000 students. --- # Take 3, continued ## Example: Effect of police on crime How might we apply the simple linear regression model to our question? $$ \text{Crime}_i = \beta_1 + \beta_2\text{Police}_i + u_i $$ `\(\beta_1\)` and `\(\beta_2\)` are the population parameters we want, but we cannot observe them. -- Instead, we must estimate the population parameters. - `\(\hat{\beta_1}\)` and `\(\hat{\beta_2}\)` generate predictions of `\(\text{Crime}_i\)` called `\(\hat{\text{Crime}_i}\)`. - We call the predictions of the dependent variable __fitted values.__ -- - Together, these trace a line: `\(\hat{\text{Crime}_i} = \hat{\beta_1} + \hat{\beta_2}\text{Police}_i\)`. --- # Take 3, attempted ## Example: Effect of police on crime Guess: `\(\hat{\beta_1} = 60\)` and `\(\hat{\beta_2} = -7\)`. 
-- <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" /> --- # Take 4 ## Example: Effect of police on crime Guess: `\(\hat{\beta_1} = 30\)` and `\(\hat{\beta_2} = 0\)`. -- <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" /> --- # Take 5 ## Example: Effect of police on crime Guess: `\(\hat{\beta_1} = 15.6\)` and `\(\hat{\beta_2} = 7.94\)`. -- <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" /> --- # Residuals Using `\(\hat{\beta_1}\)` and `\(\hat{\beta_2}\)` to make `\(\hat{Y_i}\)` generates misses called .hi[residuals]: $$ \color{#e64173}{\hat{u}_i} = \color{#e64173}{Y_i - \hat{Y_i}}. $$ - Sometimes called `\(\color{#e64173}{e_i}\)`. --- # Residuals ## Example: Effect of police on crime Using `\(\hat{\beta_1} = 15.6\)` and `\(\hat{\beta_2} = 7.94\)` to make `\(\color{#9370DB}{\hat{\text{Crime}_i}}\)` generates .hi[residuals]. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" /> --- # Residuals We want an estimator that makes fewer big misses. Why not minimize `\(\sum_{i=1}^n \hat{u}_i\)`? -- - There are positive _and_ negative residuals `\(\implies\)` no unique solution (we could always drive the sum lower by choosing a line with larger negative residuals). __Alternative:__ Minimize the sum of squared residuals, a.k.a. the .hi[residual sum of squares (RSS)]. - Squared numbers are never negative. --- # Residuals ## Example: Effect of police on crime .hi-blue[RSS] gives bigger penalties to bigger residuals. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" /> --- count: false # Residuals ## Example: Effect of police on crime .hi-blue[RSS] gives bigger penalties to bigger residuals.
<img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" /> --- count: false # Residuals ## Example: Effect of police on crime .hi-blue[RSS] gives bigger penalties to bigger residuals. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" /> --- # Residuals ## Minimizing RSS We could test thousands of guesses of `\(\hat{\beta_1}\)` and `\(\hat{\beta_2}\)` and pick the pair that minimizes RSS. - Or we can do a little math and derive formulas that give us the RSS-minimizing coefficients without the guesswork. --- class: inverse, middle # Ordinary Least Squares (OLS) --- # OLS The __OLS estimator__ chooses the coefficients `\(\hat{\beta_1}\)` and `\(\hat{\beta_2}\)` that minimize the .hi[residual sum of squares (RSS)]: `$$\min_{\hat{\beta}_1,\, \hat{\beta}_2} \quad \color{#e64173}{\sum_{i=1}^n \hat{u}_i^2}$$` This is why we call the estimator ordinary __least squares.__ --- # Deriving the OLS Estimator ## Outline 1. Replace `\(\sum_{i=1}^n \hat{u}_i^2\)` with an equivalent expression involving `\(\hat{\beta_1}\)` and `\(\hat{\beta_2}\)`. -- 2. Take partial derivatives of our RSS expression with respect to `\(\hat{\beta_1}\)` and `\(\hat{\beta_2}\)` and set each one equal to zero (first-order conditions). -- 3. Use the first-order conditions to solve for `\(\hat{\beta_1}\)` and `\(\hat{\beta_2}\)` in terms of data on `\(Y_i\)` and `\(X_i\)`. -- 4. Check second-order conditions to make sure we found the `\(\hat{\beta_1}\)` and `\(\hat{\beta_2}\)` that minimize RSS. --- # OLS Formulas For details, see the [handout](https://raw.githack.com/kyleraze/EC320_Econometrics/master/Lectures/07-Simple_Linear_Regression_Estimation/07-SLR_Estimation_handout.pdf) posted on Canvas.
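As a sketch of the middle steps: step 1 writes the objective as `\(\text{RSS} = \sum_{i=1}^n \hat{u}_i^2 = \sum_{i=1}^n (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2\)`, and setting the step-2 partial derivatives to zero gives the first-order conditions

$$ \dfrac{\partial \text{RSS}}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^n (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = 0 \qquad \dfrac{\partial \text{RSS}}{\partial \hat{\beta}_2} = -2 \sum_{i=1}^n X_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = 0 $$

which solve to the formulas below.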
__Slope coefficient__ `$$\hat{\beta}_2 = \dfrac{\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$` __Intercept__ $$ \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} $$ --- # Slope Coefficient The slope estimator is equal to the sample covariance of `\(X\)` and `\(Y\)` divided by the sample variance of `\(X\)`: $$ `\begin{aligned} \hat{\beta}_2 &= \dfrac{\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2} \\ \\ &= \dfrac{ \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{ \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2} \\ \\ &= \dfrac{S_{XY}}{S^2_X}. \end{aligned}` $$ --- # Take 6 ## Example: Effect of police on crime Using the OLS formulas, we get `\(\hat{\beta_1}\)` .mono[=] 18.41 and `\(\hat{\beta_2}\)` .mono[=] 1.76. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" /> --- count: false # Take 6 ## Example: Effect of police on crime Using the OLS formulas, we get `\(\hat{\beta_1}\)` .mono[=] 18.41 and `\(\hat{\beta_2}\)` .mono[=] 1.76. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> --- # Coefficient Interpretation ## Example: Effect of police on crime Using OLS gives us the fitted line $$ \hat{\text{Crime}_i} = \hat{\beta}_1 + \hat{\beta}_2\text{Police}_i. $$ What does `\(\hat{\beta_1}\)` .mono[=] 18.41 tell us? -- What does `\(\hat{\beta_2}\)` .mono[=] 1.76 tell us? -- __Gut check:__ Does this mean that police _cause_ crime? -- - Probably not. __Why?__ --- # Outliers ## Example: Association of police with crime <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" /> --- # Outliers ## Example: Association of police with crime .hi-purple[Fitted line] without outlier.
<img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" /> --- count: false # Outliers ## Example: Association of police with crime .hi-purple[Fitted line] without outlier. .hi[Fitted line] with outlier. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" />
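--- # OLS in .mono[R] The OLS formulas are easy to check in .mono[R]. A minimal sketch with simulated data (the variables and true coefficients below are made up, not the campus crime data):

```r
set.seed(320)

# Simulate data from a known linear model: y = 5 + 2x + noise
x <- rnorm(100, mean = 10, sd = 3)
y <- 5 + 2 * x + rnorm(100, sd = 4)

# Slope: sample covariance of x and y over sample variance of x
b2 <- cov(x, y) / var(x)

# Intercept: mean of y minus slope times mean of x
b1 <- mean(y) - b2 * mean(x)

# R's built-in least-squares routine recovers the same coefficients
fit <- lm(y ~ x)
c(b1, b2)
coef(fit)
```

The hand-rolled `b1` and `b2` match `coef(fit)` up to floating-point error.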