class: center, middle, inverse, title-slide # Simple Linear Regression: Estimation ## EC 320: Introduction to Econometrics ### Winter 2022 --- class: inverse, middle # Prologue --- # Housekeeping **Grading:** Midterm 1 grades are out. **Problem Set 2:** Due Monday, Feb 7th by 11:59pm *on Canvas*. **Lab & Exercise:** Due Wednesday, Feb 2nd by 11:59pm. --- # Where Are We? ## Where we've been .hi[High Concepts] - Reviewed core ideas from statistics - Developed a framework for thinking about causality - Dabbled in regression analysis. Also, .mono[**R**]. --- # Where Are We? ## Where we're going .hi[The Weeds!] - Learn the mechanics of *how* OLS works -- - Interpret regression results (mechanically and critically) -- - Extend ideas about causality to a regression context -- - Think more deeply about statistical inference -- - Lay a foundation for more-sophisticated regression techniques. -- Also, **more** .mono[**R**]. --- class: inverse, middle # Simple Linear Regression --- # Addressing Questions ## Example: Effect of police on crime __Policy Question:__ Do on-campus police reduce crime on campus? -- - **Empirical Question:** Does the number of on-campus police officers affect campus crime rates? If so, by how much? How can we answer these questions? -- - Prior beliefs. -- - Theory. -- - __Data!__ --- # Let's _"Look"_ at Data ## Example: Effect of police on crime
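"Looking" in .mono[R] might mean printing the first few rows. A minimal sketch with made-up numbers (the data frame `campus` and all of its values are hypothetical, not the actual campus-safety data):

```r
# Hypothetical campus data: officers and offenses per 1,000 students
campus <- data.frame(
  police = c(1.2, 2.5, 0.8, 3.1, 1.9),
  crime  = c(24, 31, 18, 35, 27)
)

head(campus)  # rows of raw numbers: any relationship is hard to see
```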
--- # Take 2 ## Example: Effect of police on crime *"Looking"* at data wasn't especially helpful. -- Let's try using a scatter plot. - Plot each data point in `\((X,Y)\)`-space. - .mono[Police] on the `\(X\)`-axis. - .mono[Crime] on the `\(Y\)`-axis. --- # Take 2 ## Example: Effect of police on crime <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" /> --- # Take 2 ## Example: Effect of police on crime The scatter plot tells us more than the spreadsheet. - Somewhat weak _positive_ relationship. -- - The sample correlation coefficient of 0.14 confirms this. But our question was > Does the number of on-campus police officers affect campus crime rates? If so, by how much? - The scatter plot and correlation coefficient provide only a partial answer. --- # Take 3 ## Example: Effect of police on crime Our next step is to estimate a __statistical model.__ To keep it simple, we will relate an __explained variable__ `\(Y\)` to an __explanatory variable__ `\(X\)` in a linear model. --- # Simple Linear Regression Model We express the relationship between an .hi-purple[explained variable] and an .hi-green[explanatory variable] as linear: $$ \color{#9370DB}{Y_i} = \beta_0 + \beta_1\color{#007935}{X_i} + u_i. $$ - `\(\beta_0\)` is the __intercept__ or constant. - `\(\beta_1\)` is the __slope coefficient__. - `\(u_i\)` is an __error term__ or disturbance term. .footnote[ _Simple_ .mono[=] Only one explanatory variable. ] --- # Simple Linear Regression Model The .hi[intercept] tells us the expected value of `\(Y_i\)` when `\(X_i = 0\)`. $$ Y_i = \color{#e64173}{\beta_0} + \beta_1X_i + u_i $$ Usually not the focus of an analysis. --- # Simple Linear Regression Model The .hi[slope coefficient] tells us the expected change in `\(Y_i\)` when `\(X_i\)` increases by one. 
$$ Y_i = \beta_0 + \color{#e64173}{\beta_1}X_i + u_i $$ "A one-unit increase in `\(X_i\)` is associated with a `\(\color{#e64173}{\beta_1}\)`-unit increase in `\(Y_i\)`." -- Under certain (strong) assumptions about the error term, `\(\color{#e64173}{\beta_1}\)` is the _effect of_ `\(X_i\)` _on_ `\(Y_i\)`. - Otherwise, it's the _association of_ `\(X_i\)` _with_ `\(Y_i\)`. --- # Simple Linear Regression Model The .hi[error term] reminds us that `\(X_i\)` does not perfectly explain `\(Y_i\)`. $$ Y_i = \beta_0 + \beta_1X_i + \color{#e64173}{u_i} $$ Represents all other factors that explain `\(Y_i\)`. - Useful mnemonic: pretend that `\(u\)` stands for *"unobserved"* or *"unexplained."* --- # Take 3, continued ## Example: Effect of police on crime How might we apply the simple linear regression model to our question about the effect of on-campus police on campus crime? - Which variable is `\(X\)`? Which is `\(Y\)`? -- $$ \text{Crime}_i = \beta_0 + \beta_1\text{Police}_i + u_i. $$ - `\(\beta_0\)` is the crime rate for colleges without police. - `\(\beta_1\)` is the increase in the crime rate for an additional police officer per 1000 students. --- # Take 3, continued ## Example: Effect of police on crime How might we apply the simple linear regression model to our question? $$ \text{Crime}_i = \beta_0 + \beta_1\text{Police}_i + u_i $$ `\(\beta_0\)` and `\(\beta_1\)` are the population parameters we want, but we cannot observe them. -- Instead, we must estimate the population parameters. - `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` generate predictions of `\(\text{Crime}_i\)` called `\(\hat{\text{Crime}_i}\)`. - We call the predictions of the dependent variable __fitted values.__ -- - Together, these trace a line: `\(\hat{\text{Crime}_i} = \hat{\beta_0} + \hat{\beta_1}\text{Police}_i\)`. --- # Take 3, attempted ## Example: Effect of police on crime Guess: `\(\hat{\beta_0} = 60\)` and `\(\hat{\beta_1} = -7\)`. 
-- <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" /> --- # Take 4 ## Example: Effect of police on crime Guess: `\(\hat{\beta_0} = 30\)` and `\(\hat{\beta_1} = 0\)`. -- <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" /> --- # Take 5 ## Example: Effect of police on crime Guess: `\(\hat{\beta_0} = 15.6\)` and `\(\hat{\beta_1} = 7.94\)`. -- <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" /> --- # Residuals Using `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` to make `\(\hat{Y_i}\)` generates misses called .hi[residuals]: $$ \color{#e64173}{\hat{u}_i} = \color{#e64173}{Y_i - \hat{Y_i}}. $$ - Sometimes called `\(\color{#e64173}{e_i}\)`. --- # Residuals ## Example: Effect of police on crime Using `\(\hat{\beta_0} = 15.6\)` and `\(\hat{\beta_1} = 7.94\)` to make `\(\color{#9370DB}{\hat{\text{Crime}_i}}\)` generates .hi[residuals]. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" /> --- # Residuals We want an estimator that makes fewer big misses. Why not minimize `\(\sum_{i=1}^n \hat{u}_i\)`? -- - Positive _and_ negative residuals offset each other `\(\implies\)` no solution (we can always find a line that makes the sum more negative). __Alternative:__ Minimize the sum of squared residuals a.k.a. the .hi[residual sum of squares (RSS)]. - Squared numbers are never negative. --- # Residuals ## Example: Effect of police on crime .hi-blue[RSS] gives bigger penalties to bigger residuals. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" /> --- count: false # Residuals ## Example: Effect of police on crime .hi-blue[RSS] gives bigger penalties to bigger residuals. 
<img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" /> --- count: false # Residuals ## Example: Effect of police on crime .hi-blue[RSS] gives bigger penalties to bigger residuals. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" /> --- # Residuals ## Minimizing RSS We could test thousands of guesses of `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` and pick the pair that minimizes RSS. - Or we just do a little math and derive some useful formulas that give us RSS-minimizing coefficients without the guesswork. --- class: inverse, middle # Ordinary Least Squares (OLS) --- # OLS The __OLS estimator__ chooses the parameters `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` that minimize the .hi[residual sum of squares (RSS)]: `$$\min_{\hat{\beta}_0,\, \hat{\beta}_1} \quad \color{#e64173}{\sum_{i=1}^n \hat{u}_i^2}$$` This is why we call the estimator ordinary __least squares.__ --- # Deriving the OLS Estimator ## Outline 1. Replace `\(\sum_{i=1}^n \hat{u}_i^2\)` with an equivalent expression involving `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)`. -- 2. Take partial derivatives of our RSS expression with respect to `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` and set each one equal to zero (first-order conditions). -- 3. Use the first-order conditions to solve for `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` in terms of data on `\(Y_i\)` and `\(X_i\)`. -- 4. Check second-order conditions to make sure we found the `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` that minimize RSS. --- # OLS Formulas For details, see the [handout](https://raw.githack.com/bchang2/ec320_w22/main/Lectures/07-Simple_Linear_Regression_Estimation/OLS_derivation.pdf) posted on Canvas. 
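Testing thousands of guesses really is feasible, if inelegant. A brute-force sketch in .mono[R] with made-up data (`x`, `y`, and the grid ranges are all hypothetical): compute RSS for every candidate `\((\hat{\beta_0}, \hat{\beta_1})\)` pair on a grid and keep the winner.

```r
# Made-up data for illustration
x <- c(1.2, 2.5, 0.8, 3.1, 1.9)
y <- c(24, 31, 18, 35, 27)

# Every candidate (b0, b1) pair on a coarse grid
grid <- expand.grid(b0 = seq(0, 40, by = 0.1),
                    b1 = seq(-10, 10, by = 0.1))

# RSS for each candidate line
grid$rss <- mapply(function(b0, b1) sum((y - b0 - b1 * x)^2),
                   grid$b0, grid$b1)

# The grid winner lands near the exact least-squares solution
best <- grid[which.min(grid$rss), ]
best[c("b0", "b1")]
coef(lm(y ~ x))
```

The formulas below deliver the exact minimizer in one step, no grid required.

--- # OLS Formulas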
__Slope coefficient__ `$$\hat{\beta}_1 = \dfrac{\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$` __Intercept__ $$ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} $$ --- # Slope coefficient The slope estimator is equal to the sample covariance divided by the sample variance of `\(X\)`: $$ `\begin{aligned} \hat{\beta}_1 &= \dfrac{\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2} \\ \\ &= \dfrac{ \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{ \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2} \\ \\ &= \dfrac{S_{XY}}{S^2_X}. \end{aligned}` $$ --- # Take 6 ## Example: Effect of police on crime Using the OLS formulas, we get `\(\hat{\beta_0}\)` .mono[=] 18.41 and `\(\hat{\beta_1}\)` .mono[=] 1.76. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" /> --- count: false # Take 6 ## Example: Effect of police on crime Using the OLS formulas, we get `\(\hat{\beta_0}\)` .mono[=] 18.41 and `\(\hat{\beta_1}\)` .mono[=] 1.76. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> --- # Coefficient Interpretation ## Example: Effect of police on crime Using OLS gives us the fitted line $$ \hat{\text{Crime}_i} = \hat{\beta}_0 + \hat{\beta}_1\text{Police}_i. $$ What does `\(\hat{\beta_0}\)` .mono[=] 18.41 tell us? -- What does `\(\hat{\beta_1}\)` .mono[=] 1.76 tell us? -- __Gut check:__ Does this mean that police _cause_ crime? -- - Probably not. __Why?__ --- # Outliers ## Example: Association of police with crime <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" /> --- # Outliers ## Example: Association of police with crime .hi-purple[Fitted line] without outlier. 
<img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" /> --- count: false # Outliers ## Example: Association of police with crime .hi-purple[Fitted line] without outlier. .hi[Fitted line] with outlier. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" />
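--- # OLS in .mono[R]

The slope and intercept formulas translate directly into code. A minimal sketch with hypothetical data (the `police` and `crime` values are made up, not the actual campus numbers); the built-in `lm()` function carries out the same least-squares calculation:

```r
# Hypothetical data
police <- c(1.2, 2.5, 0.8, 3.1, 1.9)
crime  <- c(24, 31, 18, 35, 27)

# Slope: sample covariance of (X, Y) over sample variance of X
b1 <- cov(police, crime) / var(police)

# Intercept: forces the fitted line through the point of means
b0 <- mean(crime) - b1 * mean(police)

c(b0 = b0, b1 = b1)
coef(lm(crime ~ police))  # lm() reproduces the by-hand estimates
```

Note that `cov()` and `var()` both divide by `\(n - 1\)`, so the factor cancels in the ratio, exactly as in the `\(S_{XY} / S^2_X\)` derivation.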