class: center, middle, inverse, title-slide

# Classical Assumptions
## EC 320: Introduction to Econometrics
### Kyle Raze
### Fall 2019

---
class: inverse, middle

# Prologue

---
# Housekeeping

Lab attendance

Problem Set 2 grades

Problem Set 3

- Due this Wednesday by 17:00.
- For computational problems, submit an HTML document generated by R Markdown.

---
# Agenda

## Last Week

How does OLS estimate a regression line?

- Minimize RSS.

What are the direct consequences of minimizing RSS?

- Residuals sum to zero.
- Residuals and the explanatory variable `\(X\)` are uncorrelated.
- Mean values of `\(X\)` and `\(Y\)` are on the fitted regression line.

What do we mean by *goodness of fit*?

- What information does `\(R^2\)` convey?

---
# Agenda

## Today

Under what conditions is OLS *desirable*?

- **Desired properties:** Unbiasedness, efficiency, and the ability to conduct hypothesis tests.
- **Cost:** Six .hi-green[classical assumptions] about the population relationship and the sample.

---
# Returns to Schooling

__Policy Question:__ How much should the state subsidize higher education?

- Could higher education subsidies increase future tax revenue?
- Could targeted subsidies reduce income inequality and racial wealth gaps?
- Are there positive externalities associated with higher education?

--

__Empirical Question:__ What is the monetary return to an additional year of education?

- Focuses on the private benefits of education. Not the only important question!
- Useful for learning about the econometric assumptions that allow causal interpretation.

---
# Returns to Schooling

**Step 1:** Write down the population model.

`$$\log(\text{Earnings}_i) = \beta_1 + \beta_2\text{Education}_i + u_i$$`

--

**Step 2:** Find data.

- *Source:* Blackburn and Neumark (1992).

--

**Step 3:** Run a regression using OLS.

`$$\widehat{\log(\text{Earnings}_i)} = \hat{\beta}_1 + \hat{\beta}_2\text{Education}_i$$`

---
# Returns to Schooling

`\(\widehat{\log(\text{Earnings}_i)}\)` `\(=\)` .hi-purple[5.97] `\(+\)` .hi-purple[0.06] `\(\times\)` `\(\text{Education}_i\)`.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" />

---
# Returns to Schooling

An additional year of school is associated with a .hi-purple[6%] increase in earnings.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

---
# Returns to Schooling

`\(R^2\)` `\(=\)` .hi-purple[0.097].

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

---
# Returns to Schooling

Education explains .hi-purple[9.7%] of the variation in log earnings.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />

---
# Returns to Schooling

What must we __assume__ to interpret `\(\hat{\beta}_2\)` `\(=\)` .hi-purple[0.06] as the return to schooling?

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />

---
# Residuals *vs.* Errors

The most important assumptions concern the error term `\(u_i\)`.

--

**Important:** An error `\(u_i\)` and a residual `\(\hat{u}_i\)` are related, but different.

--

- .hi-green[Error:] Difference between the wage of a worker with 16 years of education and the .hi-green[expected wage] of workers with 16 years of education.

--

- .hi-purple[Residual:] Difference between the wage of a worker with 16 years of education and the .hi-purple[average wage] of workers in the sample with 16 years of education.

--

- .hi-green[Population] ***vs.*** .hi-purple[sample]**.**
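---
# Returns to Schooling in R

A minimal sketch of how one might estimate the regression from the previous slides in R. The `wage2` data from the **wooldridge** package are commonly cited as drawing on the Blackburn and Neumark (1992) sample; the package, dataset, and variable names here are assumptions for illustration, not part of the original slides.

```r
# Sketch: estimate the return to schooling with OLS
# (assumes wooldridge::wage2, where lwage = log(monthly earnings), educ = years of schooling)
library(wooldridge)

fit <- lm(lwage ~ educ, data = wage2)

coef(fit)                 # intercept and slope (the estimated return to schooling)
summary(fit)$r.squared    # share of variation in log earnings explained by education

head(resid(fit))          # residuals: each worker's log earnings minus the fitted value
```

The residuals above live in the sample; the errors `\(u_i\)` in the population model are never observed.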
---
# Residuals *vs.* Errors

A .hi[residual] tells us how a .hi-slate[worker]'s wages compare to the average wages of workers in the .hi-purple[sample] with the same level of education.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

---
# Residuals *vs.* Errors

A .hi[residual] tells us how a .hi-slate[worker]'s wages compare to the average wages of workers in the .hi-purple[sample] with the same level of education.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" />

---
# Residuals *vs.* Errors

An .hi-orange[error] tells us how a .hi-slate[worker]'s wages compare to the expected wages of workers in the .hi-green[population] with the same level of education.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" />

---
class: inverse, middle

# Classical Assumptions

---
# Classical Assumptions of OLS

1. **Linearity:** The population relationship is .hi[linear in parameters] with an additive error term.

--

2. **Sample Variation:** There is variation in `\(X\)`.

--

3. **Random Sampling:** We have a random sample from the population of interest.

--

4. **Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).

--

5. **Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`).

--

6. **Normality:** The population error term is normally distributed with mean zero and variance `\(\sigma^2\)` (*i.e.,* `\(u \sim N(0,\sigma^2)\)`).

---
class: inverse, middle

# When Can We Trust OLS?

---
# Bias

An estimator is __biased__ if its expected value is different from the true population parameter.

.pull-left[

**Unbiased estimator:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] = \beta\)`

<img src="09-Classical_Assumptions_files/figure-html/unbiased pdf-1.svg" style="display: block; margin: auto;" />

]

--

.pull-right[

**Biased estimator:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] \neq \beta\)`

<img src="09-Classical_Assumptions_files/figure-html/biased pdf-1.svg" style="display: block; margin: auto;" />

]

---
# When is OLS Unbiased?

## Assumptions

1. **Linearity:** The population relationship is .hi[linear in parameters] with an additive error term.

2. **Sample Variation:** There is variation in `\(X\)`.

3. **Random Sampling:** We have a random sample from the population of interest.

4. **Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).

--

## Result

OLS is unbiased.

---
# Linearity

## Assumption

The population relationship is __linear in parameters__ with an additive error term.

--

## Examples

- `\(\text{Wage}_i = \beta_1 + \beta_2 \text{Experience}_i + u_i\)`

--

- `\(\log(\text{Happiness}_i) = \beta_1 + \beta_2 \log(\text{Money}_i) + u_i\)`

--

- `\(\sqrt{\text{Convictions}_i} = \beta_1 + \beta_2 (\text{Early Childhood Lead Exposure})_i + u_i\)`

--

- `\(\log(\text{Earnings}_i) = \beta_1 + \beta_2 \text{Education}_i + u_i\)`
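---
# Linearity

A short sketch of the point above: models that are nonlinear in the *variables* but linear in the *parameters* can still be estimated with `lm()`. The data below are simulated purely for illustration; the coefficient values are made up.

```r
# Hypothetical data: nonlinear in variables, linear in parameters
set.seed(320)
educ     <- sample(8:20, 100, replace = TRUE)
earnings <- exp(5 + 0.06 * educ + rnorm(100, sd = 0.4))

lm(log(earnings) ~ educ)        # log-level model: still linear in the betas
lm(log(earnings) ~ log(educ))   # log-log model: also linear in the betas
```

The violations on the next slide cannot be rewritten as linear in the parameters with an additive error, so plain `lm()` no longer applies.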
---
# Linearity

## Assumption

The population relationship is __linear in parameters__ with an additive error term.

## Violations

- `\(\text{Wage}_i = (\beta_1 + \beta_2 \text{Experience}_i)u_i\)`

--

- `\(\text{Consumption}_i = \frac{1}{\beta_1 + \beta_2 \text{Income}_i} + u_i\)`

--

- `\(\text{Population}_i = \frac{\beta_1}{1 + e^{\beta_2 + \beta_3 \text{Food}_i}} + u_i\)`

--

- `\(\text{Batting Average}_i = \beta_1 (\text{Wheaties Consumption})_i^{\beta_2} + u_i\)`

---
# Sample Variation

## Assumption

There is variation in `\(X\)`.

## Example

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" />

---
# Sample Variation

## Assumption

There is variation in `\(X\)`.

## Violation

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" />

---
# Random Sampling

## Assumption

We have a random sample from the population of interest.

## Examples

Random sampling generates many cross-sectional datasets (especially surveys).

- Government surveys (*e.g.,* Current Population Survey, American Community Survey).
- Scientific surveys (*e.g.,* General Social Survey, American National Election Study).
- High-quality political polls (*e.g.,* YouGov, Quinnipiac University, Gallup).

---
# Random Sampling

## Assumption

We have a random sample from the population of interest.

## Violations

- Data collected from non-probability sampling (*e.g.,* snowball sampling).
- Most (all?) time-series data.
- Self-selected samples.

---
# Exogeneity

## Assumption

The `\(X\)` variable is __exogenous:__ `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`.

- For _any_ value of `\(X\)`, the mean of the error term is zero.

.hi[The most important assumption!]

--

Really two assumptions bundled into one:

1. On average, the error term is zero: `\(\mathop{\mathbb{E}}\left(u\right) = 0\)`.
2. The mean of the error term is the same for each value of `\(X\)`: `\(\mathop{\mathbb{E}}\left( u|X \right) = \mathop{\mathbb{E}}\left(u\right)\)`.

---
# Exogeneity

## Assumption

The `\(X\)` variable is __exogenous:__ `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`.

- The assignment of `\(X\)` is effectively random.
- **Implication:** .hi-purple[no selection bias] and .hi-green[no omitted-variable bias].

--

## Examples

In the labor market, an important component of `\(u\)` is unobserved ability.

- `\(\mathop{\mathbb{E}}\left( u|\text{Education} = 12 \right) = 0\)` and `\(\mathop{\mathbb{E}}\left( u|\text{Education} = 20 \right) = 0\)`.
- `\(\mathop{\mathbb{E}}\left( u|\text{Experience} = 0 \right) = 0\)` and `\(\mathop{\mathbb{E}}\left( u|\text{Experience} = 40 \right) = 0\)`.
- **Do you believe this?**

---
layout: false
class: white-slide, middle

Graphically...

---
exclude: true

---
class: white-slide

Valid exogeneity, _i.e._, `\(\mathop{\mathbb{E}}\left( u \mid X \right) = 0\)`

<img src="09-Classical_Assumptions_files/figure-html/ex_good_exog-1.svg" style="display: block; margin: auto;" />

---
class: white-slide

Invalid exogeneity, _i.e._, `\(\mathop{\mathbb{E}}\left( u \mid X \right) \neq 0\)`

<img src="09-Classical_Assumptions_files/figure-html/ex_bad_exog-1.svg" style="display: block; margin: auto;" />

---
class: inverse, middle

# Variance Matters, Too

---
# Why Variance Matters

Unbiasedness tells us that OLS gets it right, _on average_.

- But we can't tell whether our sample is "typical."

--

**Variance** tells us how far OLS estimates can deviate from their expected value.

- How tightly is OLS centered on its expected value?

--

The smaller the variance, the closer OLS estimates tend to be to the true population parameters in any given sample.

- Given two unbiased estimators, we want the one with the smaller variance.
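---
# Why Variance Matters

A small simulation sketch of the last point, using made-up parameter values: both estimators below are unbiased for the slope, but OLS has a much smaller sampling variance.

```r
# Two unbiased slope estimators, different variances (illustrative values)
set.seed(320)
x <- runif(50, 0, 10)            # hold X fixed across repeated samples

one_draw <- function() {
  y     <- 2 + 0.5 * x + rnorm(50)
  ols   <- unname(coef(lm(y ~ x))[2])              # OLS slope
  naive <- (y[which.max(x)] - y[which.min(x)]) /
           (max(x) - min(x))                       # slope through two points
  c(ols = ols, naive = naive)
}

draws <- replicate(5000, one_draw())
rowMeans(draws)       # both centered near the true slope of 0.5
apply(draws, 1, sd)   # OLS has the smaller spread
```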
---
# OLS Variance

To calculate the variance of OLS, we need:

1. The same four assumptions we made for unbiasedness.

2. __Homoskedasticity.__

---
# Homoskedasticity

## Assumption

The error term has the same variance for each value of the independent variable:

`$$\mathop{\text{Var}}(u|X) = \sigma^2.$$`

## Example

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" />

---
# Homoskedasticity

## Assumption

The error term has the same variance for each value of the independent variable:

`$$\mathop{\text{Var}}(u|X) = \sigma^2.$$`

## Violation: Heteroskedasticity

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" />

---
count: false
# Homoskedasticity

## Assumption

The error term has the same variance for each value of the independent variable:

`$$\mathop{\text{Var}}(u|X) = \sigma^2.$$`

## Violation: Heteroskedasticity

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" />

---
# OLS Variance

Variance of the slope estimator:

`$$\mathop{\text{Var}}(\hat{\beta}_2) = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}.$$`

- As the error variance increases, the variance of the slope estimator increases.
- As the variation in `\(X\)` increases, the variance of the slope estimator decreases.
- Larger samples have more total variation in `\(X\)` `\(\implies\)` `\(\mathop{\text{Var}}(\hat{\beta}_2)\)` falls as `\(n\)` rises.

---
class: inverse, middle

# Gauss-Markov

---
# Gauss-Markov Theorem

OLS is the __Best Linear Unbiased Estimator (BLUE)__ when:

1. **Linearity:** The population relationship is .hi[linear in parameters] with an additive error term.

2. **Sample Variation:** There is variation in `\(X\)`.

3. **Random Sampling:** We have a random sample from the population of interest.

4. **Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).

5. **Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`).

---
class: middle

# Gauss-Markov Theorem

OLS is the __Best Linear Unbiased Estimator (BLUE)__.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" />
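---
# OLS Variance

A quick check of the variance formula in R, on simulated homoskedastic data; the numbers and variable names are illustrative, not from the slides.

```r
# Compare the formula for Var(beta_2 hat) with lm()'s reported standard error
set.seed(320)
x <- rnorm(200, mean = 10, sd = 2)
y <- 1 + 0.5 * x + rnorm(200, sd = 3)          # homoskedastic errors

fit    <- lm(y ~ x)
sigma2 <- sum(resid(fit)^2) / (length(x) - 2)  # estimate of sigma^2
var_b2 <- sigma2 / sum((x - mean(x))^2)        # variance formula for the slope

sqrt(var_b2)                                   # standard error from the formula
summary(fit)$coefficients["x", "Std. Error"]   # same number reported by lm()
```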
---
class: inverse, middle

# Population *vs.* Sample, Revisited

---
layout: true

# Population *vs.* Sample

**Question:** Why do we care about *population vs. sample*?

---

--

.pull-left[

<img src="09-Classical_Assumptions_files/figure-html/pop1-1.svg" style="display: block; margin: auto;" />

.center[**Population**]

]

--

.pull-right[

<img src="09-Classical_Assumptions_files/figure-html/scatter1-1.svg" style="display: block; margin: auto;" />

.center[**Population relationship**]

$$ y_i = 2.53 + 0.57 x_i + u_i $$

$$ y_i = \beta_1 + \beta_2 x_i + u_i $$

]

---

.pull-left[

<img src="09-Classical_Assumptions_files/figure-html/sample1-1.svg" style="display: block; margin: auto;" />

.center[**Sample 1:** 30 random individuals]

]

--

.pull-right[

<img src="09-Classical_Assumptions_files/figure-html/sample1 scatter-1.svg" style="display: block; margin: auto;" />

.center[

**Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)`

**Sample relationship** <br> `\(\hat{y}_i = 1.36 + 0.76 x_i\)`

]

]

---
count: false

.pull-left[

<img src="09-Classical_Assumptions_files/figure-html/sample2-1.svg" style="display: block; margin: auto;" />

.center[**Sample 2:** 30 random individuals]

]

.pull-right[

<img src="09-Classical_Assumptions_files/figure-html/sample2 scatter-1.svg" style="display: block; margin: auto;" />

.center[

**Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)`

**Sample relationship** <br> `\(\hat{y}_i = 3.53 + 0.34 x_i\)`

]

]

---
count: false

.pull-left[

<img src="09-Classical_Assumptions_files/figure-html/sample3-1.svg" style="display: block; margin: auto;" />

.center[**Sample 3:** 30 random individuals]

]

.pull-right[

<img src="09-Classical_Assumptions_files/figure-html/sample3 scatter-1.svg" style="display: block; margin: auto;" />

.center[

**Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)`

**Sample relationship** <br> `\(\hat{y}_i = 1.44 + 0.86 x_i\)`

]

]

---
layout: false
class: white-slide, middle

Repeat **10,000 times** (Monte Carlo simulation).

---
layout: true
class: white-slide

---

<img src="09-Classical_Assumptions_files/figure-html/simulation scatter-1.png" style="display: block; margin: auto;" />

---
layout: true

# Population *vs.* Sample

**Question:** Why do we care about *population vs. sample*?

---

.pull-left[

<img src="09-Classical_Assumptions_files/figure-html/simulation scatter2-1.png" style="display: block; margin: auto;" />

]

.pull-right[

- On **average**, the regression lines match the population line nicely.
- However, **individual lines** (samples) can miss the mark.
- Differences between individual samples and the population create **uncertainty**.

]

---

--

**Answer:** Uncertainty matters.

`\(\hat{\beta}_1\)` and `\(\hat{\beta}_2\)` are random variables that depend on the random sample. We can't tell whether we have a "good" sample (similar to the population) or a "bad" sample (very different from the population).

--

Next time, we will leverage all six classical assumptions, including **normality**, to conduct hypothesis tests.
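---
layout: false

# Population *vs.* Sample: Simulation Sketch

A minimal sketch of the Monte Carlo exercise above. The population intercept and slope (2.53 and 0.57) come from the slides; the distribution of `\(x\)`, the error variance, and the population size are made-up choices for illustration.

```r
# Repeated sampling: draw 30 individuals, fit OLS, repeat 10,000 times
set.seed(320)
population   <- data.frame(x = runif(1e4, 0, 10))
population$y <- 2.53 + 0.57 * population$x + rnorm(1e4, sd = 1.5)

one_sample <- function() {
  s <- population[sample(nrow(population), 30), ]   # 30 random individuals
  coef(lm(y ~ x, data = s))
}

estimates <- t(replicate(1e4, one_sample()))
colMeans(estimates)       # centered near (2.53, 0.57): unbiased on average
apply(estimates, 2, sd)   # but any single sample can miss the mark
```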