class: center, middle, inverse, title-slide

# Regression Logic
## EC 320: Introduction to Econometrics
### Kyle Raze
### Fall 2019

---
class: inverse, middle

# Prologue

---
# Housekeeping

Problem Set 2

- Analytical problems due by Tuesday at 17:00 on Canvas
- Computational problems due by Friday at 17:00 on Canvas

Midterm 1 next week (Wednesday)

Midterm review on Monday

---
# Last Time

1. Fundamental problem of econometrics
2. Selection bias
3. Randomized control trials

---
class: inverse, middle

# Regression Logic

---
# Regression

Economists often rely on (linear) regression for statistical comparisons.

- *"Linear"* is more flexible than you think.

Regression analysis helps us make *other things equal* comparisons.

- We can model the effect of `\(X\)` on `\(Y\)` while .hi[controlling] .pink[for potential confounders].
- Forces us to be explicit about the potential sources of selection bias.
- Failure to control for confounding variables leads to .hi[omitted-variable bias], a close cousin of selection bias.

---
# Returns to Private College

**Research Question:** Does going to a private college instead of a public college increase future earnings?

- **Outcome variable:** earnings
- **Treatment variable:** going to a private college (binary)

--

**Q:** How might a private college education increase earnings?

--

**Q:** Does a comparison of the average earnings of private college graduates with those of public college graduates .pink[isolate the economic returns to private college education]? Why or why not?

---
# Returns to Private College

**How might we estimate the causal effect of private college on earnings?**

**Approach 1:** Compare average earnings of private college graduates with those of public college graduates.

- Prone to selection bias.

**Approach 2:** Use a matching estimator that compares the earnings of individuals with the same admissions profiles.

- Cleaner comparison than a simple difference-in-means.
- Somewhat difficult to implement.
- Throws away data (inefficient).

**Approach 3:** Estimate a regression that compares the earnings of individuals with the same admissions profiles.

<!-- --- -->
<!-- # Difference-in-Means, Take 2 -->
<!-- ## Example: Returns to private college -->
<!-- show same data with groupings based on application profile; what are the differences/similarities within/across groups?; calculate within-group diff-in-means; take average of these (unweighted, then weighted); show and discuss causal diagram -->
<!-- --- -->
<!-- # Difference-in-Means, Regression style -->
<!-- ## Example: Returns to private college -->
<!-- write pop model, describe coefficients and regression lingo; hand wave about OLS and estimated pop model; run regression of example data -->

---
# The Regression Model

We can estimate the effect of `\(X\)` on `\(Y\)` by estimating a .hi[regression model]:

`$$Y_i = \beta_0 + \beta_1 X_i + u_i$$`

- `\(Y_i\)` is the outcome variable.

--

- `\(X_i\)` is the treatment variable (continuous).

--

- `\(u_i\)` is an error term that includes all other (omitted) factors affecting `\(Y_i\)`.

--

- `\(\beta_0\)` is the **intercept** parameter.

--

- `\(\beta_1\)` is the **slope** parameter.

---
# Running Regressions

The intercept and slope are population parameters.

Using an estimator with data on `\(X_i\)` and `\(Y_i\)`, we can estimate a .hi[fitted regression line]:

`$$\hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1 X_i$$`

- `\(\hat{Y_i}\)` is the **fitted value** of `\(Y_i\)`.
- `\(\hat{\beta}_0\)` is the **estimated intercept**.
- `\(\hat{\beta}_1\)` is the **estimated slope**.

--

The estimation procedure produces misses called .hi[residuals], defined as `\(Y_i - \hat{Y_i}\)`.

---
# Running Regressions

In practice, we estimate the regression coefficients using an estimator called .hi[Ordinary Least Squares] (OLS).

- Picks estimates that make `\(\hat{Y_i}\)` as close as possible to `\(Y_i\)` given the information we have on `\(X\)` and `\(Y\)`.
- We will dive into the weeds after the midterm.

---
# Running Regressions

OLS picks `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` that trace out the line of best fit. Ideally, we would like to interpret the slope of the line as the causal effect of `\(X\)` on `\(Y\)`.

<img src="05-Regression_Logic_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

---
# Confounders

However, the data are grouped by a third variable `\(W\)`. How would omitting `\(W\)` from the regression model affect the slope estimator?

<img src="05-Regression_Logic_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

---
# Confounders

The problem with `\(W\)` is that it affects both `\(Y\)` and `\(X\)`. Without adjusting for `\(W\)`, we cannot isolate the causal effect of `\(X\)` on `\(Y\)`.

<img src="05-Regression_Logic_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />

---
# Controlling for Confounders

We can control for `\(W\)` by specifying it in the regression model:

`$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i$$`

- `\(W_i\)` is a **control variable**.
- By including `\(W_i\)` in the regression, OLS can difference out the confounding effect of `\(W\)`.
- **Note:** OLS doesn't care whether a right-hand side variable is a treatment or control variable, but we do.

---
# Controlling for Confounders

.center[]

---
# Controlling for Confounders

Controlling for `\(W\)` "adjusts" the data by **differencing out** the group-specific means of `\(X\)` and `\(Y\)`. .hi-purple[Slope of the estimated regression line changes!]

<img src="05-Regression_Logic_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

---
# Controlling for Confounders

Can we interpret the estimated slope parameter as the causal effect of `\(X\)` on `\(Y\)` now that we've adjusted for `\(W\)`?
<img src="05-Regression_Logic_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" />

---
# Controlling for Confounders

## Example: Returns to schooling

Last class:

> **Q:** Could we simply compare the earnings of those with more education to those with less?
> <br> **A:** If we want to measure the causal effect, probably not.

.hi-green[What omitted variables should we worry about?]

---
# Controlling for Confounders

## Example: Returns to schooling

Three regressions ***of*** wages ***on*** schooling.

<table> <caption>Outcome variable: log(Wage)</caption> <thead> <tr> <th style="text-align:left;"> Explanatory variable </th> <th style="text-align:center;"> 1 </th> <th style="text-align:center;"> 2 </th> <th style="text-align:center;"> 3 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: #23373b !important;line-height: 110%;font-style: italic;color: black !important;"> Intercept </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;font-weight: bold;"> 5.571 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 5.581 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 5.695 </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.039) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.066) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.068) </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;line-height: 110%;font-style: italic;color: black !important;"> Education </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;font-weight: bold;"> 0.052 </td> <td 
style="text-align:center;color: #23373b !important;line-height: 110%;"> 0.026 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 0.027 </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.003) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.005) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.005) </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;line-height: 110%;font-style: italic;color: black !important;"> IQ Score </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;font-weight: bold;"> </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 0.004 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 0.003 </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.001) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.001) </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;line-height: 110%;font-style: italic;color: black !important;"> South </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;font-weight: bold;"> </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> </td> <td style="text-align:center;color: 
#23373b !important;line-height: 110%;"> -0.127 </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.019) </td> </tr> </tbody> </table>

---
count: false

# Controlling for Confounders

## Example: Returns to schooling

Three regressions ***of*** wages ***on*** schooling.

<table> <caption>Outcome variable: log(Wage)</caption> <thead> <tr> <th style="text-align:left;"> Explanatory variable </th> <th style="text-align:center;"> 1 </th> <th style="text-align:center;"> 2 </th> <th style="text-align:center;"> 3 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: #23373b !important;line-height: 110%;font-style: italic;color: black !important;"> Intercept </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 5.571 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;font-weight: bold;"> 5.581 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 5.695 </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.039) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.066) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.068) </td> </tr> <tr> <td style="text-align:left;color: #23373b 
!important;line-height: 110%;font-style: italic;color: black !important;"> Education </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 0.052 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;font-weight: bold;"> 0.026 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 0.027 </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.003) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.005) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.005) </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;line-height: 110%;font-style: italic;color: black !important;"> IQ Score </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;font-weight: bold;"> 0.004 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 0.003 </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.001) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.001) </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;line-height: 110%;font-style: italic;color: black !important;"> South </td> <td 
style="text-align:center;color: #23373b !important;line-height: 110%;"> </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;font-weight: bold;"> </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> -0.127 </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.019) </td> </tr> </tbody> </table>

---
count: false

# Controlling for Confounders

## Example: Returns to schooling

Three regressions ***of*** wages ***on*** schooling.

<table> <caption>Outcome variable: log(Wage)</caption> <thead> <tr> <th style="text-align:left;"> Explanatory variable </th> <th style="text-align:center;"> 1 </th> <th style="text-align:center;"> 2 </th> <th style="text-align:center;"> 3 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: #23373b !important;line-height: 110%;font-style: italic;color: black !important;"> Intercept </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 5.571 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 5.581 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;font-weight: bold;"> 5.695 </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.039) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> 
(0.066) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.068) </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;line-height: 110%;font-style: italic;color: black !important;"> Education </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 0.052 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 0.026 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;font-weight: bold;"> 0.027 </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.003) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.005) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.005) </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;line-height: 110%;font-style: italic;color: black !important;"> IQ Score </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> 0.004 </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;font-weight: bold;"> 0.003 </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> (0.001) </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 
110%;font-weight: bold;"> (0.001) </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;line-height: 110%;font-style: italic;color: black !important;"> South </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;"> </td> <td style="text-align:center;color: #23373b !important;line-height: 110%;font-weight: bold;"> -0.127 </td> </tr> <tr> <td style="text-align:left;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;"> </td> <td style="text-align:center;color: #23373b !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.019) </td> </tr> </tbody> </table>

---
# Omitted-Variable Bias

The presence of omitted-variable bias (OVB) precludes causal interpretation of our slope estimates.

We can back out the sign and magnitude of OVB by subtracting the .pink[slope estimate from a ***long*** regression] from the .purple[slope estimate from a ***short*** regression]:

`$$\text{OVB} = \color{#9370DB}{\hat{\beta}_1^{\text{Short}}} - \color{#e64173}{\hat{\beta}_1^{\text{Long}}}$$`

--

__Dealing with potential sources of OVB is one of the main objectives of econometric analysis!__

<!-- Find example RCT data and run through R example w/ diff-in-means and regression -->
<!-- https://www.povertyactionlab.org/evaluation/summer-jobs-reduce-violence-among-disadvantaged-youth-united-states -->
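---
# Omitted-Variable Bias

## Sketch: short vs. long regression

The short-minus-long calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's own code: the data are simulated, and the seed, sample size, coefficients, and the helper names `ols_slope` and `demean_by_group` are all hypothetical. The sketch treats `\(W\)` as a group label and controls for it by differencing out the group-specific means of `\(X\)` and `\(Y\)`, which gives the same slope as a long regression with one indicator variable per group.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def demean_by_group(values, groups):
    """Subtract each observation's group-specific mean."""
    group_means = {
        g: mean(v for v, gi in zip(values, groups) if gi == g)
        for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


# Simulated data (all numbers below are assumptions for illustration).
random.seed(320)
true_slope = -1.0                      # assumed causal effect of X on Y
w = [i % 3 for i in range(300)]        # confounder W: three equal-sized groups
x = [2.0 * wi + random.gauss(0, 1) for wi in w]            # W affects X
y = [true_slope * xi + 4.0 * wi + random.gauss(0, 1)       # W affects Y
     for xi, wi in zip(x, w)]

# Short regression: Y on X, omitting W.
beta_short = ols_slope(x, y)

# Long regression: Y on X controlling for W (group means differenced out).
beta_long = ols_slope(demean_by_group(x, w), demean_by_group(y, w))

# OVB = short slope minus long slope.
ovb = beta_short - beta_long

print(f"short: {beta_short:.3f}  long: {beta_long:.3f}  OVB: {ovb:.3f}")
```

With these assumed coefficients, `\(W\)` raises both `\(X\)` and `\(Y\)`, so the short regression slope sits above the long regression slope and the estimated OVB is positive.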