class: center, middle, inverse, title-slide # All About Regression ## EC 350: Labor Economics ### Kyle Raze
### Winter 2022 --- # All About Regression ## **Econometrics** **The objective?** Identify the effect of a treatment variable `\(D\)` on an outcome variable `\(Y\)`..super[.hi-pink[<span>†</span>]] - **How?** Find a way to shut down .hi-pink[selection bias]. .footnote[.super[.hi-pink[<span>†</span>]] The other objective? Forecast future values of key outcome variables, such as unemployment, GDP, customer retention, *etc.* But that's a different subject for a different course.] -- ## **Regression analysis** > A set of statistical processes for quantifying the relationship between a dependent variable (*e.g.,* an outcome) and one or more independent variables (*e.g.,* a treatment or a control variable). A bundle of useful tools for doing econometrics! --- # All About Regression ## **Regression analysis** Economists often rely on regression analysis to make various statistical comparisons. - Can facilitate *other things equal* comparisons. - Can shut down .pink[selection bias] by explicitly **controlling for** .hi-pink[confounding variables]. - Failure to control for confounding variables? .mono[-->] .hi-pink[omitted-variable bias]. -- **Our objective?** Learn how to interpret the results of a regression analysis. 1. **Literal interpretation** - Interpret the size and statistical significance of regression coefficient estimates. - Know your way around a regression table. 2. **Big-picture interpretation** - What do the estimates imply about the effects of a treatment? - Should we trust the estimates? Do they reflect a causal relationship? --- class: inverse, middle # Simple linear regression --- # Simple linear regression <img src="04-All_About_Regression_files/figure-html/simple-1.svg" style="display: block; margin: auto;" /> --- count: false # Simple linear regression <img src="04-All_About_Regression_files/figure-html/simple_reg-1.svg" style="display: block; margin: auto;" /> --- # Simple linear regression ## **Model** We can express the relationship between the .hi-purple[outcome variable] and the .hi-green[treatment variable] as linear: $$ \color{#9370DB}{Y_i} = \alpha + \beta~\color{#007935}{D_i} + \varepsilon_i $$ - `\(i\)` indexes an individual. - `\(\alpha\)` .mono[=] the __intercept__ or constant. - `\(\beta\)` .mono[=] the __slope coefficient__. - Imagine for now that `\(D_i\)` can take on many different values (*e.g.,* more than just 0 or 1). - `\(\varepsilon_i\)` .mono[=] the __error term__. .footnote[ _Simple_ .mono[=] Only one independent variable. ] --- # Simple linear regression ## **Model** The .hi[intercept] tells us the expected value of `\(Y_i\)` when `\(D_i = 0\)`. $$ Y_i = \color{#e64173}{\alpha} + \beta ~ D_i + \varepsilon_i $$ Part of the regression line, but almost never the focus of an analysis. - In practice, omitting the intercept would bias estimates of the slope coefficient—the object we really care about. --- # Simple linear regression ## **Model** The .hi[slope coefficient] tells us the expected change in `\(Y_i\)` when `\(D_i\)` increases by one. $$ Y_i = \alpha + \color{#e64173}{\beta} ~ D_i + \varepsilon_i $$ "A one-unit increase in `\(D_i\)` *is associated with* a `\(\color{#e64173}{\beta}\)`-unit increase in `\(Y_i\)`." -- Under certain (strong) assumptions about the error term (*e.g.,* no selection bias), `\(\color{#e64173}{\beta}\)` represents the causal effect of `\(D_i\)` on `\(Y_i\)`. - "A one-unit increase in `\(D_i\)` *leads to* a `\(\color{#e64173}{\beta}\)`-unit increase in `\(Y_i\)`." 
- Otherwise, it's just the _association of_ `\(D_i\)` _with_ `\(Y_i\)`, representing a non-causal correlation. --- # Simple linear regression ## **Model** The .hi[error term] reminds us that `\(D_i\)` isn't the only variable that affects `\(Y_i\)`. $$ Y_i = \alpha + \beta ~ D_i + \color{#e64173}{\varepsilon_i} $$ -- The error term represents all other factors that explain `\(Y_i\)`. - **So what?** If some of those factors influence `\(D_i\)`, then omitted-variable bias will contaminate estimates of the slope coefficient. --- # Simple linear regression ## **Example** .pull-left[ **Q:** How does attendance affect performance? As a first attempt at an answer, we can estimate a regression of final exam scores on attendance: `$$\text{Final}_i = \alpha + \beta~\text{Attend}_i + \varepsilon_i$$` <table> <thead> <tr> <th style="text-align:left;"> Parameter </th> <th style="text-align:center;"> (1) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Intercept </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> 56.82 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (2.19) </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Attendance </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> 0.3 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.08) </td> </tr> </tbody> </table> .center[*Standard errors in parentheses.*] ] .pull-right[ <img src="04-All_About_Regression_files/figure-html/attend_1_plot-1.svg" style="display: block; margin: auto;" /> ] --- # Simple linear regression ## **Example** .pull-left[ .center[.purple[Crime.sub[*i*] .mono[=] 18.41 .mono[+] 1.76 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/campus_crime_1_plot-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ **Q:** Do police on college campuses reduce crime? - What does the slope coefficient tell us? ] --- count: false # Simple linear regression ## **Example** .pull-left[ .center[.purple[Crime.sub[*i*] .mono[=] 18.41 .mono[+] 1.76 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/campus_crime_2_plot-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ **Q:** Do police on college campuses reduce crime? - What does the slope coefficient tell us? **Q:** Does this mean that police *cause* crime!? - Why or why not? ] --- count: false # Simple linear regression ## **Example** .pull-left[ .center[.purple[Crime.sub[*i*] .mono[=] 18.41 .mono[+] 1.76 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/campus_crime_3_plot-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ **Q:** Do police on college campuses reduce crime? - What does the slope coefficient tell us? **Q:** Does this mean that police *cause* crime!? - Why or why not? 
.footnote[For an interesting discussion of the causal effects of police staffing on crime and arrests—and how those effects vary by race—check out [episode 55](https://www.probablecausation.com/podcasts/episode-55-morgan-williams-jr) of the [*Probable Causation*](https://www.probablecausation.com/) podcast.] ] --- # Simple linear regression ## **Estimation** .pull-left[ **Q:** Where does the regression line come from? ] .pull-right[ .center[.purple[Crime.sub[*i*] .mono[=] 18.41 .mono[+] 1.76 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" /> ] --- count: false # Simple linear regression ## **Estimation** .pull-left[ **Q:** Where does the regression line come from? <br> **A:** A routine called **ordinary least squares (OLS)**. ] .pull-right[ .center[.purple[Crime.sub[*i*] .mono[=] 18.41 .mono[+] 1.76 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" /> ] --- count: false # Simple linear regression ## **Estimation** .pull-left[ **Q:** Where does the regression line come from? <br> **A:** A routine called **ordinary least squares (OLS)**. **How does OLS work?** ] .pull-right[ .center[.purple[Crime.sub[*i*] .mono[=] 18.41 .mono[+] 1.76 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" /> ] --- count: false # Simple linear regression ## **Estimation** .pull-left[ **Q:** Where does the regression line come from? <br> **A:** A routine called **ordinary least squares (OLS)**. **How does OLS work?** - Every "fitted line" produces .hi-pink[residuals]. - Residual .mono[=] actual .mono[-] .hi-purple[predicted] ] .pull-right[ .center[.purple[Crime.sub[*i*] .mono[=] 18.41 .mono[+] 1.76 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" /> ] --- # Simple linear regression ## **Estimation** .pull-left[ **Q:** Where does the regression line come from? <br> **A:** A routine called **ordinary least squares (OLS)**. **How does OLS work?** - Some fitted lines generate bigger residuals than others. ] .pull-right[ .center[.purple[Crime.sub[*i*] .mono[=] 58.2 .mono[+] -2.2 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" /> ] --- count: false # Simple linear regression ## **Estimation** .pull-left[ **Q:** Where does the regression line come from? <br> **A:** A routine called **ordinary least squares (OLS)**. **How does OLS work?** - Some fitted lines generate bigger residuals than others. ] .pull-right[ .center[.purple[Crime.sub[*i*] .mono[=] 20.5 .mono[+] 3.15 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" /> ] --- count: false # Simple linear regression ## **Estimation** .pull-left[ **Q:** Where does the regression line come from? <br> **A:** A routine called **ordinary least squares (OLS)**. **How does OLS work?** - Some fitted lines generate bigger residuals than others. ] .pull-right[ .center[.purple[Crime.sub[*i*] .mono[=] 1.3 .mono[+] 0.75 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> ] --- # Simple linear regression ## **Estimation** .pull-left[ **Q:** Where does the regression line come from? 
<br> **A:** A routine called **ordinary least squares (OLS)**. **How does OLS work?** - The "line of best fit" is the line that **minimizes** the **sum of squared residuals**. - **Q:** Why squared? ] .pull-right[ .center[.purple[Crime.sub[*i*] .mono[=] 18.41 .mono[+] 1.76 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" /> ] --- count: false # Simple linear regression ## **Estimation** .pull-left[ **Q:** Where does the regression line come from? <br> **A:** A routine called **ordinary least squares (OLS)**. **How does OLS work?** - The "line of best fit" is the line that **minimizes** the **sum of squared residuals**. - **Q:** Why squared? - Using math you'll see in EC 320 or matrix algebra, OLS does this without the guesswork. ] .pull-right[ .center[.purple[Crime.sub[*i*] .mono[=] 18.41 .mono[+] 1.76 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" /> ] --- # Simple linear regression ## **Estimation** .pull-left[ **Q:** Where does the regression line come from? <br> **A:** A routine called **ordinary least squares (OLS)**. **How does OLS work?** - **"Squares?"** Sum of squared residuals. - **"Least?"** Minimize that sum. - **"Ordinary?"** Oldest, most common way of estimating a regression. ] .pull-right[ .center[.purple[Crime.sub[*i*] .mono[=] 18.41 .mono[+] 1.76 Police.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" /> ] --- # Simple linear regression ## **Example: Returns to education** The optimal investment in education by students, parents, and legislators depends in part on the monetary *return to education*. -- .hi-purple[Thought experiment:] - Randomly select an individual. - Give her an additional year of education. - How much do her earnings increase? The change in her earnings describes the .hi-slate[causal effect] of education on earnings. --- # Simple linear regression ## **Example: Returns to education** .pull-left[ .center[.purple[Earnings.sub[*i*] .mono[=] 146.95 .mono[+] 60.21 Schooling.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-16-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ **Q:** How much extra money can a worker in this sample expect from an additional year of education? - How do you know? ] --- count: false # Simple linear regression ## **Example: Returns to education** .pull-left[ .center[.purple[Earnings.sub[*i*] .mono[=] 146.95 .mono[+] 60.21 Schooling.sub[*i*]]] <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-17-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ **Q:** How much extra money can a worker in this sample expect from an additional year of education? - How do you know? **Q:** Does this number represent the causal return to an additional year of education? - What other variables could be driving the relationship? ] --- class: inverse, middle # Making adjustments --- # Making adjustments .pull-left[ <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-18-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ We can produce a fitted line by estimating a regression of an outcome on a treatment: `$$Y_i = \alpha + \beta~D_i + \varepsilon_i$$` `\(\beta\)` describes how the outcome changes, *on average*, when treatment changes. 
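A minimal sketch of how a regression like this can be estimated in R (simulated data and the hypothetical names `y` and `d`, not the data behind the table below):

```r
# Illustrative only: simulated data with hypothetical names, not the data behind this slide
set.seed(101)
n <- 100
d <- runif(n, min = 0, max = 5)  # treatment variable
y <- 1 + 0.5 * d + rnorm(n)      # outcome = intercept + slope * treatment + error

fit <- lm(y ~ d)  # OLS: picks the line that minimizes the sum of squared residuals
coef(fit)         # estimated intercept and slope
summary(fit)      # adds standard errors, as reported in the table below
```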
<table> <thead> <tr> <th style="text-align:left;"> Parameter </th> <th style="text-align:center;"> (1) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Intercept </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> 1.22 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.18) </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Treatment </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> 0.56 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.08) </td> </tr> </tbody> </table> .center[*Standard errors in parentheses.*] ] --- # Making adjustments .pull-left[ <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-20-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ However, we might worry that a third variable `\(W_i\)` confounds our estimate of the effect of the treatment on the outcome. ] --- # Making adjustments .pull-left[ <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-21-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ If data on the confounder exists, it can be added to the regression model: `$$Y_i = \alpha + \beta~D_i + \gamma~W_i + \varepsilon_i$$` ] **Q:** How does OLS "adjust" for the confounder? --- count: false # Making adjustments .pull-left[ <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-22-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ If data on the confounder exists, it can be added to the regression model: `$$Y_i = \alpha + \beta~D_i + \gamma~W_i + \varepsilon_i$$` **Q:** How does OLS "adjust" for the confounder? - **Step 1:** Figure out what differences in D are explained by W. ] --- # Making adjustments .pull-left[ <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-23-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ If data on the confounder exists, it can be added to the regression model: `$$Y_i = \alpha + \beta~D_i + \gamma~W_i + \varepsilon_i$$` **Q:** How does OLS "adjust" for the confounder? - **Step 2:** Remove differences in D explained by W. ] --- # Making adjustments .pull-left[ <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-24-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ If data on the confounder exists, it can be added to the regression model: `$$Y_i = \alpha + \beta~D_i + \gamma~W_i + \varepsilon_i$$` **Q:** How does OLS "adjust" for the confounder? - **Step 3:** Figure out what differences in Y are explained by W. ] --- # Making adjustments .pull-left[ <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-25-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ If data on the confounder exists, it can be added to the regression model: `$$Y_i = \alpha + \beta~D_i + \gamma~W_i + \varepsilon_i$$` **Q:** How does OLS "adjust" for the confounder? 
- **Step 4:** Remove differences in Y explained by W. ] --- # Making adjustments .pull-left[ <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-26-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ If data on the confounder exists, it can be added to the regression model: `$$Y_i = \alpha + \beta~D_i + \gamma~W_i + \varepsilon_i$$` **Q:** How does OLS "adjust" for the confounder? - **Step 5:** Fit a regression through the adjusted data. ] --- # Making adjustments .pull-left[ <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-27-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ If data on the confounder exists, it can be added to the regression model: `$$Y_i = \alpha + \beta~D_i + \gamma~W_i + \varepsilon_i$$` <table> <thead> <tr> <th style="text-align:left;"> Parameter </th> <th style="text-align:center;"> (1) </th> <th style="text-align:center;"> (2) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Intercept </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 1.22 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> 0.9 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;"> (0.18) </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.1) </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Treatment </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 0.56 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> -0.42 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;"> (0.08) </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.07) </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Confounder </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> 3.91 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.2) </td> </tr> </tbody> </table> .center[*Standard errors in parentheses.*] ] --- # Omitted-variable bias ## **Example: Returns to education** .pull-left[ <br> <table> <caption>Outcome: Weekly Earnings</caption> <thead> <tr> <th style="text-align:left;"> Parameter </th> <th style="text-align:center;"> 1 </th> <th style="text-align:center;"> 2 </th> </tr> </thead> <tbody> <tr> <td 
style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Intercept </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> 146.95 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> -128.89 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (77.72) </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;"> (92.18) </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Schooling (Years) </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> 60.21 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 42.06 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (5.70) </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;"> (6.55) </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> IQ Score (Points) </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 5.14 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;"> (0.96) </td> </tr> </tbody> </table> .center[*Standard errors in parentheses.*] ] .pull-right[ ] --- count: false # Omitted-variable bias ## **Example: Returns to education** .pull-left[ <br> <table> <caption>Outcome: Weekly Earnings</caption> <thead> <tr> <th style="text-align:left;"> Parameter </th> <th style="text-align:center;"> 1 </th> <th style="text-align:center;"> 2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Intercept </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 146.95 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> -128.89 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;"> (77.72) </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (92.18) </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Schooling (Years) </td> <td 
style="text-align:center;color: #272822 !important;line-height: 110%;"> 60.21 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> 42.06 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;"> (5.70) </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (6.55) </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> IQ Score (Points) </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;font-weight: bold;"> 5.14 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-style: italic;color: black !important;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;"> </td> <td style="text-align:center;color: #272822 !important;color: #c2bebe !important;line-height: 110%;font-weight: bold;"> (0.96) </td> </tr> </tbody> </table> .center[*Standard errors in parentheses.*] ] -- .pull-right[ <br> <br> .orange[Bias] from omitting IQ score <br> `\(\quad\)` .mono[=] .pink["short"] .mono[-] .purple["long"] <br> `\(\quad\)` .mono[=] .pink[60.21] .mono[-] .purple[42.06] <br> `\(\quad\)` .mono[=] .orange[18.15] The first regression mistakenly attributes some of the influence of intelligence to education. ] --- # Omitted-variable bias .more-left[ <img src="04-All_About_Regression_files/figure-html/venn2-1.svg" style="display: block; margin: auto;" /> ] .less-right[ .hi-purple[Y] .mono[=] Outcome .hi-green[D] .mono[=] Treatment .hi-orange[W] .mono[=] Omitted variable If .hi-orange[W] is correlated with both .hi-green[D] and .hi-purple[Y] .mono[-->] omitted variable bias .mono[-->] regression fails to isolate the causal effect of .hi-green[D] on .hi-purple[Y]. ] --- # Omitted-variable bias .more-left[ <img src="04-All_About_Regression_files/figure-html/unnamed-chunk-31-1.svg" style="display: block; margin: auto;" /> ] .less-right[ .hi-purple[Y] .mono[=] Outcome .hi-green[D] .mono[=] Treatment .hi-orange[W] .mono[=] Omitted variable If .hi-orange[W] is correlated with both .hi-green[D] and .hi-purple[Y] .mono[-->] omitted variable bias .mono[-->] regression fails to isolate the causal effect of .hi-green[D] on .hi-purple[Y]. ] --- # Housekeeping **MLK Jr. Day:** No class or office hours on Monday the 17th. **Pre-recorded lecture** for Wednesday the 19th. - I will try to post it sometime next week. - In the meantime, enjoy your weekend! **Assigned reading for next week:** [Snapping back: Food stamp bans and criminal recidivism](https://www.aeaweb.org/articles?id=10.1257/pol.20170490) by Cody Tuttle (2019). - Best to read it *after* you watch next week's lecture. - Reading Quiz 3 due the following week (Monday the 24th). **Problem Set 1** due on Friday the 21st by 11:59pm. - Covers everything though next Wednesday.