class: center, middle, inverse, title-slide # Classical Assumptions ## EC 320: Introduction to Econometrics ### Philip Economides ### Winter 2022 --- class: inverse, middle # Prologue --- # Housekeeping Survey results in: - Assignments (x2), more room between lab and HW - Lecture slides (x1), uploaded by Sunday night at the latest -- Updates - .hi-pink[Problem Set 3:] due by Wednesday 11:59pm - This lecture is the last one relevant to the .hi-pink[Midterm exam] - Revise the material and have questions ready for the .hi-pink[review session] --- # Agenda ## Last Week How does OLS estimate a regression line? - .hi-pink[Minimize RSS]. What are the direct consequences of minimizing RSS? - Residuals sum to zero. - Residuals and the explanatory variable `\(X\)` are uncorrelated. - Mean values of `\(X\)` and `\(Y\)` are on the fitted regression line. Whatever do we mean by *goodness of fit*? - What information does `\(R^2\)` convey? --- # Agenda ## Today Under what conditions is OLS *desirable*? - **Desired properties:** Unbiasedness, efficiency, and the ability to conduct hypothesis tests. - **Cost:** Six .hi-green[classical assumptions] about the population relationship and the sample. --- # Returns to Schooling __Policy Question:__ How much should the state subsidize higher education? - Could higher education subsidies increase future tax revenue? - Could targeted subsidies reduce income inequality and racial wealth gaps? - Are there positive externalities associated with higher education? -- __Empirical Question:__ What is the monetary return to an additional year of education? - Focuses on the private benefits of education. Not the only important question! - Useful for learning about the econometric assumptions that allow causal interpretation. --- # Returns to Schooling **Step 1:** Write down the population model. `$$\log(\text{Earnings}_i) = \beta_1 + \beta_2\text{Education}_i + u_i$$` -- **Step 2:** Find data.
- *Source:* [Blackburn and Neumark (1992)](https://econpapers.repec.org/article/oupqjecon/v_3a107_3ay_3a1992_3ai_3a4_3ap_3a1421-1436..htm). -- **Step 3:** Run a regression using OLS. `$$\log(\hat{\text{Earnings}_i}) = \hat{\beta}_1 + \hat{\beta}_2\text{Education}_i$$` --- # Returns to Schooling `\(\log(\hat{\text{Earnings}_i})\)` `\(=\)` .hi-purple[5.97] `\(+\)` .hi-purple[0.06] `\(\times\)` `\(\text{Education}_i\)`. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling Additional year of school associated with a .hi-purple[6%] increase in earnings. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling `\(R^2\)` `\(=\)` .hi-purple[0.097]. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling Education explains .hi-purple[9.7%] of the variation in wages. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling What must we __assume__ to interpret `\(\hat{\beta}_2\)` `\(=\)` .hi-purple[0.06] as the return to schooling? <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" /> --- # Residuals *vs.* Errors The most important assumptions concern the error term `\(u_i\)`. -- **Important:** An error `\(u_i\)` and a residual `\(\hat{u}_i\)` are related, but different. -- - .hi-green[Error:] Difference between the wage of a worker with 16 years of education and the .hi-green[expected wage] with 16 years of education. -- - .hi-purple[Residual:] Difference between the wage of a worker with 16 years of education and the .hi-purple[average wage] of workers with 16 years of education. 
-- - .hi-green[Population] ***vs.*** .hi-purple[sample]**.** --- # Residuals *vs.* Errors A .hi[residual] tells us how a .hi-slate[worker]'s wages compare to the average wages of workers in the .hi-purple[sample] with the same level of education. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" /> --- # Residuals *vs.* Errors A .hi[residual] tells us how a .hi-slate[worker]'s wages compare to the average wages of workers in the .hi-purple[sample] with the same level of education. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" /> --- # Residuals *vs.* Errors An .hi-orange[error] tells us how a .hi-slate[worker]'s wages compare to the expected wages of workers in the .hi-green[population] with the same level of education. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" /> --- class: inverse, middle # Classical Assumptions --- # Classical Assumptions of OLS 1. **Linearity:** The population relationship is .hi[linear in parameters] with an additive error term. -- 2. **Sample Variation:** There is variation in `\(X\)`. -- 3. **Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).<sup>.pink[†]</sup> -- 4. **Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`). -- 5. **Non-autocorrelation:** The error terms are independently distributed (*i.e.,* `\(\mathop{\mathbb{E}}\left[ u_i u_j \right] = 0, \ \forall \ i \text{ s.t. } i \neq j\)`). -- 6. **Normality:** The population error term is normally distributed with mean zero and variance `\(\sigma^2\)` (*i.e.,* `\(u \sim N(0,\sigma^2)\)`). .footnote[ .pink[†] Implies the assumption of **Random Sampling:** We have a random sample from the population of interest.
] --- class: inverse, middle # When Can We Trust OLS? --- # Bias An estimator is __biased__ if its expected value differs from the true population parameter. .pull-left[ **Unbiased estimator:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] = \beta\)` <img src="08-Classical_Assumptions_files/figure-html/unbiased pdf-1.svg" style="display: block; margin: auto;" /> ] -- .pull-right[ **Biased estimator:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] \neq \beta\)` <img src="08-Classical_Assumptions_files/figure-html/biased pdf-1.svg" style="display: block; margin: auto;" /> ] --- # When is OLS Unbiased? ## Required Assumptions 1. **Linearity:** The population relationship is .hi[linear in parameters] with an additive error term. 2. **Sample Variation:** There is variation in `\(X\)`. 3. **Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`). -- ☛ (3) implies **Random Sampling**. Without it, the internal validity of OLS is uncompromised, but our external validity becomes uncertain.<sup>.pink[†]</sup> .footnote[ .pink[†] **Internal Validity:** relates to how well a study is conducted (does it satisfy the OLS assumptions?).<br> **External Validity:** relates to how applicable the findings are to the real world. ] --- ## Result Under assumptions (1)–(3), OLS is unbiased. --- # Linearity (A1.) ## Assumption The population relationship is __linear in parameters__ with an additive error term. -- ## Examples - `\(\text{Wage}_i = \beta_1 + \beta_2 \text{Experience}_i + u_i\)` -- - `\(\log(\text{Happiness}_i) = \beta_1 + \beta_2 \log(\text{Money}_i) + u_i\)` -- - `\(\sqrt{\text{Convictions}_i} = \beta_1 + \beta_2 (\text{Early Childhood Lead Exposure})_i + u_i\)` -- - `\(\log(\text{Earnings}_i) = \beta_1 + \beta_2 \text{Education}_i + u_i\)` --- # Linearity (A1.) ## Assumption The population relationship is __linear in parameters__ with an additive error term.
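Linearity in parameters is what lets the closed-form OLS formulas apply directly. A minimal Python sketch of the first example (the data and coefficients below are invented for illustration, not the course data):

```python
import numpy as np

rng = np.random.default_rng(320)

# Hypothetical linear-in-parameters model: Wage = 10 + 0.5 * Experience + u
experience = rng.uniform(0, 40, 1_000)
wage = 10 + 0.5 * experience + rng.normal(0, 1, 1_000)  # additive error term

# OLS estimates from the closed-form formulas
b2 = np.cov(experience, wage, ddof=1)[0, 1] / np.var(experience, ddof=1)
b1 = wage.mean() - b2 * experience.mean()  # recovers roughly (10, 0.5)
```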
## Violations - `\(\text{Wage}_i = (\beta_1 + \beta_2 \text{Experience}_i)u_i\)` -- - `\(\text{Consumption}_i = \frac{1}{\beta_1 + \beta_2 \text{Income}_i} + u_i\)` -- - `\(\text{Population}_i = \frac{\beta_1}{1 + e^{\beta_2 + \beta_3 \text{Food}_i}} + u_i\)` -- - `\(\text{Batting Average}_i = \beta_1 (\text{Wheaties Consumption})_i^{\beta_2} + u_i\)` --- # Sample Variation (A2.) ## Assumption There is variation in `\(X\)`. ## Example <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" /> --- # Sample Variation (A2.) ## Assumption There is variation in `\(X\)`. ## Violation <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" /> --- # Exogeneity (A3.) ## Assumption The `\(X\)` variable is __exogenous:__ `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`. - For _any_ value of `\(X\)`, the mean of the error term is zero. .hi[The most important assumption!] -- Really two assumptions bundled into one: 1. On average, the error term is zero: `\(\mathop{\mathbb{E}}\left(u\right) = 0\)`. 2. The mean of the error term is the same for each value of `\(X\)`: `\(\mathop{\mathbb{E}}\left( u|X \right) = \mathop{\mathbb{E}}\left(u\right)\)`. --- # Exogeneity (A3.) ## Assumption The `\(X\)` variable is __exogenous:__ `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`. - The assignment of `\(X\)` is effectively random. - **Implication:** .hi-purple[no selection bias] and .hi-green[no omitted-variable bias]. -- ## Examples In the labor market, an important component of `\(u\)` is unobserved ability. - `\(\mathop{\mathbb{E}}\left( u|\text{Education} = 12 \right) = 0\)` and `\(\mathop{\mathbb{E}}\left( u|\text{Education} = 20 \right) = 0\)`. - `\(\mathop{\mathbb{E}}\left( u|\text{Experience} = 0 \right) = 0\)` and `\(\mathop{\mathbb{E}}\left( u|\text{Experience} = 40 \right) = 0\)`. - **Do you believe this?** --- layout: false class: white-slide, middle Graphically... 
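---

# Exogeneity (A3.): A Simulated Example

A hedged sketch of why `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)` matters: if unobserved ability sits in the error term and also drives education, OLS picks up the bias. Every number below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(320)
n = 10_000

# Hypothetical DGP: unobserved ability raises both schooling and earnings,
# so folding ability into the error term makes E(u | X) != 0.
ability   = rng.normal(0, 1, n)
education = 12 + 2 * ability + rng.normal(0, 1, n)
log_earn  = 1.0 + 0.06 * education + 0.10 * ability + rng.normal(0, 0.3, n)

def ols_slope(x, y):
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

biased = ols_slope(education, log_earn)                   # exceeds 0.06
valid  = ols_slope(education, log_earn - 0.10 * ability)  # close to 0.06
```

The first slope overstates the return to schooling because education is endogenous; purging ability from the error term restores exogeneity.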
--- exclude: true --- class: white-slide Valid exogeneity, _i.e._, `\(\mathop{\mathbb{E}}\left( u \mid X \right) = 0\)` <img src="08-Classical_Assumptions_files/figure-html/ex_good_exog-1.svg" style="display: block; margin: auto;" /> --- class: white-slide Invalid exogeneity, _i.e._, `\(\mathop{\mathbb{E}}\left( u \mid X \right) \neq 0\)` <img src="08-Classical_Assumptions_files/figure-html/ex_bad_exog-1.svg" style="display: block; margin: auto;" /> --- class: inverse, middle # Variance Matters, Too --- # Why Variance Matters Unbiasedness tells us that OLS gets it right, _on average_. - But we can't tell whether our sample is "typical." -- **Variance** tells us how far OLS can deviate from its expected value. - How tightly is OLS centered on its expected value? - This determines the .hi-pink[efficiency] of our estimator. -- The smaller the variance, the closer OLS tends to get to the true population parameters _in any given sample_. - Given two unbiased estimators, we want the one with the smaller variance. - If (A4.) and (A5.) are satisfied as well, we are using the .hi-pink[most efficient] linear estimator. --- # OLS Variance To calculate the variance of OLS, we need: 1. The same three assumptions we made for unbiasedness. 2. __Homoskedasticity.__ 3. __Non-autocorrelation.__ --- # Homoskedasticity (A4.) ## Assumption The error term has the same variance for each value of the independent variable: `$$\mathop{\text{Var}}(u|X) = \sigma^2.$$` ## Example <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> --- # Homoskedasticity (A4.) ## Assumption The error term has the same variance for each value of the independent variable: `$$\mathop{\text{Var}}(u|X) = \sigma^2.$$` ## Violation: Heteroskedasticity <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" /> --- count: false # Homoskedasticity (A4.)
## Assumption The error term has the same variance for each value of the independent variable: `$$\mathop{\text{Var}}(u|X) = \sigma^2.$$` ## Violation: Heteroskedasticity <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" /> --- # Non-Autocorrelation (A5.) ## Assumption Any individual's error term is drawn independently of the other error terms. `$$\mathop{\text{Cov}}(u_i, u_j) = \mathop{\mathbb{E}}[(u_i - \mu_u)(u_j - \mu_u)]\\ = \mathop{\mathbb{E}}[u_i u_j] = \mathop{\mathbb{E}}[u_i] \mathop{\mathbb{E}}[u_j] = 0, \text{ where } i \neq j$$` - The second equality uses `\(\mathop{\mathbb{E}}(u) = 0\)` from (A3.); the third uses independence. - This implies no systematic association between the error term values of any pair of individuals. - In practice, there is always some correlation in unobservables across individuals (e.g., common correlation in unobservables among individuals within a given US state). - This is referred to as the .hi-pink[clustering] problem. Standard errors can be adjusted to address it. --- # OLS Variance Variance of the slope estimator: `$$\mathop{\text{Var}}(\hat{\beta}_2) = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}.$$` - As the error variance increases, the variance of the slope estimator increases. - As the variation in `\(X\)` increases, the variance of the slope estimator decreases. - Larger samples have a larger `\(\sum_{i=1}^n (X_i - \bar{X})^2\)`, so `\(\mathop{\text{Var}}(\hat{\beta}_2)\)` falls as `\(n\)` rises. --- class: inverse, middle # Gauss-Markov --- # Gauss-Markov Theorem OLS is the __Best Linear Unbiased Estimator (BLUE)__ when: 1. **Linearity:** The population relationship is .hi[linear in parameters] with an additive error term. 2. **Sample Variation:** There is variation in `\(X\)`. 3. **Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`). 4. **Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`). 5.
**Non-Autocorrelation:** Any two error terms are drawn independently of each other (*i.e.,* `\(\mathop{\mathbb{E}}(u_i u_j) = 0 \ \forall \ i \text{ s.t. } i \neq j\)`). --- class: middle # Gauss-Markov Theorem OLS is the __Best Linear Unbiased Estimator (BLUE)__ <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" /> --- class: inverse, middle # Population *vs.* Sample, Revisited --- layout: true # Population *vs.* Sample **Question:** Why do we care about *population vs. sample*? --- -- .pull-left[ <img src="08-Classical_Assumptions_files/figure-html/pop1-1.svg" style="display: block; margin: auto;" /> .center[**Population**] ] -- .pull-right[ <img src="08-Classical_Assumptions_files/figure-html/scatter1-1.svg" style="display: block; margin: auto;" /> .center[**Population relationship**] $$ y_i = 2.53 + 0.57 x_i + u_i $$ $$ y_i = \beta_1 + \beta_2 x_i + u_i $$ ] --- .pull-left[ <img src="08-Classical_Assumptions_files/figure-html/sample1-1.svg" style="display: block; margin: auto;" /> .center[**Sample 1:** 30 random individuals] ] -- .pull-right[ <img src="08-Classical_Assumptions_files/figure-html/sample1 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 2.36 + 0.61 x_i\)` ] ] --- count: false .pull-left[ <img src="08-Classical_Assumptions_files/figure-html/sample2-1.svg" style="display: block; margin: auto;" /> .center[**Sample 2:** 30 random individuals] ] .pull-right[ <img src="08-Classical_Assumptions_files/figure-html/sample2 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 2.79 + 0.56 x_i\)` ] ] --- count: false .pull-left[ <img src="08-Classical_Assumptions_files/figure-html/sample3-1.svg" style="display: block; margin: auto;" /> .center[**Sample
3:** 30 random individuals] ] .pull-right[ <img src="08-Classical_Assumptions_files/figure-html/sample3 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 3.21 + 0.45 x_i\)` ] ] --- layout: false class: white-slide, middle Repeat **10,000 times** (Monte Carlo simulation). --- layout: true class: white-slide --- <img src="08-Classical_Assumptions_files/figure-html/simulation scatter-1.png" style="display: block; margin: auto;" /> --- layout: true # Population *vs.* Sample **Question:** Why do we care about *population vs. sample*? --- .pull-left[ <img src="08-Classical_Assumptions_files/figure-html/simulation scatter2-1.png" style="display: block; margin: auto;" /> ] .pull-right[ - On **average**, the regression lines match the population line nicely. - However, **individual lines** (samples) can miss the mark. - Differences between individual samples and the population create **uncertainty**. ] --- -- **Answer:** Uncertainty matters. `\(\hat{\beta}_1\)` and `\(\hat{\beta}_2\)` are random variables that depend on the random sample. We can't tell whether we have a "good" sample (similar to the population) or a "bad" sample (very different from the population). -- Next time, we will leverage all six classical assumptions, including **normality**, to conduct hypothesis tests. --- exclude: true
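---
layout: false

# Appendix: The Monte Carlo in Code

The 10,000-sample exercise above can be sketched in a few lines. This is a hedged reconstruction, not the code behind the figures; in particular, the distribution of `\(x\)` is assumed uniform for illustration.

```python
import numpy as np

rng = np.random.default_rng(320)

# Population model from the slides: y_i = 2.53 + 0.57 x_i + u_i
def ols_slope(n=30):
    x = rng.uniform(0, 10, n)                  # assumed x distribution
    y = 2.53 + 0.57 * x + rng.normal(0, 1, n)  # draw one 30-person sample
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

slopes = np.array([ols_slope() for _ in range(10_000)])

# Unbiasedness: the slopes average out near 0.57, but any single
# 30-observation sample can miss the mark.
print(slopes.mean(), slopes.std())
```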