class: center, middle, inverse, title-slide

# Classical Assumptions
## EC 320: Introduction to Econometrics
### Winter 2022

---
class: inverse, middle

# Prologue

---
# Housekeeping

- Problem Set 02 due today by 11:59 pm on Canvas
- The solution to the problem set will be released on Wednesday.
- Midterm grade appeals accepted until tomorrow. The solution will be posted tomorrow; no appeals will be addressed after the solution is posted.

---
# Agenda

## Last Week

How does OLS estimate a regression line?

- Minimize RSS.

What are the direct consequences of minimizing RSS?

- Residuals sum to zero.
- Residuals and the explanatory variable `\(X\)` are uncorrelated.
- Mean values of `\(X\)` and `\(Y\)` are on the fitted regression line.

Whatever do we mean by *goodness of fit*?

- What information does `\(R^2\)` convey? "The proportion of the variance explained by the regression line."

---
# Agenda

## Today

Under what conditions is OLS *desirable*?

- **Desired properties:** Unbiasedness, efficiency, and the ability to conduct hypothesis tests.
- **Cost:** Six .hi-green[classical assumptions] about the population relationship and the sample.

---
# Returns to Schooling

__Policy Question:__ How much should the state subsidize higher education?

- Could higher education subsidies increase future tax revenue?
- Could targeted subsidies reduce income inequality and racial wealth gaps?
- Are there positive externalities associated with higher education?

--

__Empirical Question:__ What is the monetary return to an additional year of education?

- Focuses on the private benefits of education. Not the only important question!
- Useful for learning about the econometric assumptions that allow causal interpretation.

---
# Returns to Schooling

**Step 1:** Write down the population model.

`$$\log(\text{Earnings}_i) = \beta_0 + \beta_1\text{Education}_i + u_i$$`

--

**Step 2:** Find data.

- *Source:* Blackburn and Neumark (1992).

--

**Step 3:** Run a regression using OLS.
`$$\log(\hat{\text{Earnings}_i}) = \hat{\beta}_0 + \hat{\beta}_1\text{Education}_i$$` --- # Returns to Schooling `\(\log(\hat{\text{Earnings}_i})\)` `\(=\)` .hi-purple[5.97] `\(+\)` .hi-purple[0.06] `\(\times\)` `\(\text{Education}_i\)`. <img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling Additional year of school associated with a .hi-purple[6%] increase in earnings. <img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling `\(R^2\)` `\(=\)` .hi-purple[0.097]. <img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling Education explains .hi-purple[9.7%] of the variation in wages. <img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling What must we __assume__ to interpret `\(\hat{\beta}_1\)` `\(=\)` .hi-purple[0.06] as the return to schooling? <img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" /> --- # Residuals *vs.* Errors The most important assumptions concern the error term `\(u_i\)`. -- **Important:** An error `\(u_i\)` and a residual `\(\hat{u}_i\)` are related, but different. -- .pull-left[ .hi-green[Population] - `\(Y_i = \beta_0 + \beta_1 X_i + u_i\)` - .hi-green[Error:] Difference between the wage of a worker with 16 years of education and the .hi-green[expected wage] with 16 years of education. ] -- .pull-right[ .hi-purple[Sample] - `\(Y_i = \hat{\beta_0} + \hat{\beta_1} X_i + \hat{u_i}\)` - .hi-purple[Residual:] Difference between the wage of a worker with 16 years of education and the .hi-purple[average wage] of workers with 16 years of education. 
]

---
# Residuals *vs.* Errors

A .hi[residual] tells us how a .hi-slate[worker]'s wages compare to the average wages of workers in the .hi-purple[sample] with the same level of education.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

---
# Residuals *vs.* Errors

A .hi[residual] tells us how a .hi-slate[worker]'s wages compare to the average wages of workers in the .hi-purple[sample] with the same level of education.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" />

---
# Residuals *vs.* Errors

An .hi-orange[error] tells us how a .hi-slate[worker]'s wages compare to the expected wages of workers in the .hi-green[population] with the same level of education.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" />

---
class: inverse, middle

# Classical Assumptions

---
# Classical Assumptions of OLS

- **A1. Linearity:** The population relationship is .hi[linear in parameters] with an additive error term.
- **A2. Sample Variation:** There is variation in `\(X\)`.
- **A3. Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).<sup>.pink[†]</sup>
- **A4. Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`).
- **A5. Non-autocorrelation:** The values of the error terms are independent of one another (*i.e.,* `\(E[u_i u_j]=0, \forall i \text{ s.t. } i \neq j\)`).<sup>.pink[†]</sup>
- **A6. Normality:** The population error term is normally distributed with mean zero and variance `\(\sigma^2\)` (*i.e.,* `\(u \sim N(0,\sigma^2)\)`).

.footnote[
.pink[†] **Random Sampling:** Note that, up to this point, the data underlying our analysis have been cross-sectional.
Under random sampling, `\(u_i\)` and `\(u_j\)` are independent for any two observations `\(i\)` and `\(j\)`, which is exactly what the non-autocorrelation assumption requires. I state **non-autocorrelation** explicitly (1) to generalize to settings with time-series data and (2) to stay consistent with the textbook's notation. It can also be shown that, under random sampling, the errors for different observations in a cross-sectional sample are independent conditional on the explanatory variables.
]

---
class: inverse, middle

# When Can We Trust OLS?

---
# Bias

An estimator is __biased__ if its expected value is different from the true population parameter.

.pull-left[

**Unbiased estimator:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] = \beta\)`

<img src="09-Classical_Assumptions_files/figure-html/unbiased pdf-1.svg" style="display: block; margin: auto;" />

]

--

.pull-right[

**Biased estimator:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] \neq \beta\)`

<img src="09-Classical_Assumptions_files/figure-html/biased pdf-1.svg" style="display: block; margin: auto;" />

]

---
# When is OLS Unbiased?

## Assumptions

**A1. Linearity:** The population relationship is .hi[linear in parameters] with an additive error term.

**A2. Sample Variation:** There is variation in `\(X\)`.

**A3. Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).<sup>.pink[†]</sup>

.footnote[
.pink[†] In cross-sectional data, A3 is typically justified by random sampling; in this sense, A3 embeds a random-sampling assumption.
]

## Result

OLS is unbiased.

---
# Linearity

## Assumption

The population relationship is __linear in parameters__ with an additive error term.
--

## Examples

- `\(\text{Wage}_i = \beta_0 + \beta_1 \text{Experience}_i + u_i\)`

--

- `\(\log(\text{Happiness}_i) = \beta_0 + \beta_1 \log(\text{Money}_i) + u_i\)`

--

- `\(\sqrt{\text{Convictions}_i} = \beta_0 + \beta_1 (\text{Early Childhood Lead Exposure})_i + u_i\)`

--

- `\(\log(\text{Earnings}_i) = \beta_0 + \beta_1 \text{Education}_i + u_i\)`

---
# Linearity

## Assumption

The population relationship is __linear in parameters__ with an additive error term.

## Violations

- `\(\text{Wage}_i = (\beta_0 + \beta_1 \text{Experience}_i)u_i\)`

--

- `\(\text{Consumption}_i = \frac{1}{\beta_0 + \beta_1 \text{Income}_i} + u_i\)`

--

- `\(\text{Population}_i = \frac{\beta_0}{1 + e^{\beta_1 + \beta_2 \text{Food}_i}} + u_i\)`

--

- `\(\text{Batting Average}_i = \beta_0 (\text{Wheaties Consumption})_i^{\beta_1} + u_i\)`

---
# Sample Variation

## Assumption

There is variation in `\(X\)`.

## Example

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" />

---
# Sample Variation

## Assumption

There is variation in `\(X\)`.

## Violation

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" />

---
# Exogeneity

## Assumption

The `\(X\)` variable is __exogenous:__ `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`.

- For _any_ value of `\(X\)`, the mean of the error term is zero.

.hi[The most important assumption!]

--

Really two assumptions bundled into one:

1. On average, the error term is zero: `\(\mathop{\mathbb{E}}\left(u\right) = 0\)`.
2. The mean of the error term is the same for each value of `\(X\)`: `\(\mathop{\mathbb{E}}\left( u|X \right) = \mathop{\mathbb{E}}\left(u\right)\)`.

---
# Exogeneity

## Assumption

The `\(X\)` variable is __exogenous:__ `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`.

- The assignment of `\(X\)` is effectively random.
- **Implication:** .hi-purple[no selection bias] and .hi-green[no omitted-variable bias].
--

## Examples

In the labor market, an important component of `\(u\)` is unobserved ability.

- `\(\mathop{\mathbb{E}}\left( u|\text{Education} = 12 \right) = 0\)` and `\(\mathop{\mathbb{E}}\left( u|\text{Education} = 20 \right) = 0\)`.
- `\(\mathop{\mathbb{E}}\left( u|\text{Experience} = 0 \right) = 0\)` and `\(\mathop{\mathbb{E}}\left( u|\text{Experience} = 40 \right) = 0\)`.
- **Do you believe this?**

---
layout: false
class: white-slide, middle

Graphically...

---
exclude: true

---
class: white-slide

Valid exogeneity, _i.e._, `\(\mathop{\mathbb{E}}\left( u \mid X \right) = 0\)`

<img src="09-Classical_Assumptions_files/figure-html/ex_good_exog-1.svg" style="display: block; margin: auto;" />

---
class: white-slide

Invalid exogeneity, _i.e._, `\(\mathop{\mathbb{E}}\left( u \mid X \right) \neq 0\)`

<img src="09-Classical_Assumptions_files/figure-html/ex_bad_exog-1.svg" style="display: block; margin: auto;" />

---
class: inverse, middle

# Variance Matters, Too

---
# Why Variance Matters

Unbiasedness tells us that OLS gets it right, _on average_.

- But we can't tell whether our sample is "typical."

--

**Variance** tells us how far OLS can deviate from the population parameter.

- How tightly is OLS centered on its expected value?

--

The smaller the variance, the closer OLS gets to the true population parameters _on any sample_.

- Given two unbiased estimators, we want the one with smaller variance.

---
# OLS Variance

To calculate the variance of OLS, we need:

1. The same three assumptions we made for unbiasedness.
2. __Homoskedasticity__
3.
__Non-autocorrelation__

---
# Homoskedasticity

## Assumption

The error term has the same variance for each value of the independent variable:

`$$\mathop{\text{Var}}(u|X) = \sigma^2.$$`

## Example

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" />

---
# Homoskedasticity

## Assumption

The error term has the same variance for each value of the independent variable:

`$$\mathop{\text{Var}}(u|X) = \sigma^2.$$`

## Violation: Heteroskedasticity

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" />

---
count: false

# Homoskedasticity

## Assumption

The error term has the same variance for each value of the independent variable:

`$$\mathop{\text{Var}}(u|X) = \sigma^2.$$`

## Violation: Heteroskedasticity

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" />

---
# Non-Autocorrelation

## Assumption

The population covariance between `\(u_i\)` and `\(u_j\)` is zero: any individual's error term is drawn independently of the others.

`$$\mathop{\text{Cov}}(u_i, u_j) = E[(u_i - \mu_u)(u_j - \mu_u)]\\ = E[u_i u_j] = E[u_i] E[u_j] = 0, \text{ where } i \neq j$$`

- .small[This implies no systematic association between error-term values for any pair of individuals. (The second equality uses `\(\mu_u = 0\)`; the third uses independence.) If this assumption fails, OLS gives inefficient estimates.]
- .small[**Example:** The sign and magnitude of the disturbance for one observation should give no indication of the sign and magnitude of the disturbance for any other observation.
] - .small[**Violation:** Errors that are correlated with time (autocorrelation)] --- # OLS Variance Variance of the slope estimator: `$$\mathop{\text{Var}}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}.$$` - As the error variance increases, the variance of the slope estimator increases. - As the variation in `\(X\)` increases, the variance of the slope estimator decreases. - Larger sample sizes exhibit more variation in `\(X \implies \mathop{\text{Var}}(\hat{\beta}_1)\)` falls as `\(n\)` rises. --- class: inverse, middle # Gauss-Markov --- # Gauss-Markov Theorem OLS is the __Best Linear Unbiased Estimator__ under assumptions A1-A5: - **A1. Linearity:** The population relationship is .hi[linear in parameters] with an additive error term. - **A2. Sample Variation:** There is variation in `\(X\)`. - **A3. Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`). - **A4. Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`). - **A5. Non-Autocorrelation:** Any pair of error terms are drawn independently of each other (*i.e.,* `\(\mathop{\text{E}}(u_i u_j) = 0 \ \forall \ i \text{ s.t. } i \neq j\)`) --- class: middle # Gauss-Markov Theorem OLS is the __Best Linear Unbiased Estimator (BLUE)__ <img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" /> --- class: inverse, middle # Population *vs.* Sample, Revisited --- layout: true # Population *vs.* Sample **Question:** Why do we care about *population vs. sample*? 
--- -- .pull-left[ <img src="09-Classical_Assumptions_files/figure-html/pop1-1.svg" style="display: block; margin: auto;" /> .center[**Population**] ] -- .pull-right[ <img src="09-Classical_Assumptions_files/figure-html/scatter1-1.svg" style="display: block; margin: auto;" /> .center[**Population relationship**] $$ y_i = 2.53 + 0.57 x_i + u_i $$ $$ y_i = \beta_0 + \beta_1 x_i + u_i $$ ] --- .pull-left[ <img src="09-Classical_Assumptions_files/figure-html/sample1-1.svg" style="display: block; margin: auto;" /> .center[**Sample 1:** 30 random individuals] ] -- .pull-right[ <img src="09-Classical_Assumptions_files/figure-html/sample1 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 2.36 + 0.61 x_i\)` ] ] --- count: false .pull-left[ <img src="09-Classical_Assumptions_files/figure-html/sample2-1.svg" style="display: block; margin: auto;" /> .center[**Sample 2:** 30 random individuals] ] .pull-right[ <img src="09-Classical_Assumptions_files/figure-html/sample2 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 2.79 + 0.56 x_i\)` ] ] --- count: false .pull-left[ <img src="09-Classical_Assumptions_files/figure-html/sample3-1.svg" style="display: block; margin: auto;" /> .center[**Sample 3:** 30 random individuals] ] .pull-right[ <img src="09-Classical_Assumptions_files/figure-html/sample3 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 3.21 + 0.45 x_i\)` ] ] --- layout: false class: white-slide, middle Repeat **10,000 times** (Monte Carlo simulation). 
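---
class: white-slide

A minimal sketch of this Monte Carlo exercise in Python (the deck's figures come from R; the standard normal errors, the `\(\text{Uniform}(0, 10)\)` draws for `\(x\)`, and the seed below are illustrative assumptions, not the deck's actual settings):

```python
import numpy as np

rng = np.random.default_rng(320)

# Population relationship: y_i = 2.53 + 0.57 x_i + u_i
# (error and x distributions are assumed for illustration)
beta0, beta1 = 2.53, 0.57

def ols_slope(x, y):
    """OLS slope: sample covariance of (x, y) over sample variance of x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Draw 10,000 samples of 30 individuals each; estimate the slope every time
slopes = []
for _ in range(10_000):
    x = rng.uniform(0, 10, 30)   # 30 random individuals
    u = rng.normal(0, 1, 30)     # population errors
    y = beta0 + beta1 * x + u
    slopes.append(ols_slope(x, y))

print(round(np.mean(slopes), 3))  # averages out very close to the true 0.57
```

On average, the estimated slopes recover the population slope — the unbiasedness result the surrounding figures illustrate.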
---
layout: true
class: white-slide

---

<img src="09-Classical_Assumptions_files/figure-html/simulation scatter-1.png" style="display: block; margin: auto;" />

---
layout: true

# Population *vs.* Sample

**Question:** Why do we care about *population vs. sample*?

---

.pull-left[

<img src="09-Classical_Assumptions_files/figure-html/simulation scatter2-1.png" style="display: block; margin: auto;" />

]

.pull-right[

- On **average**, the regression lines match the population line nicely.
- However, **individual lines** (samples) can miss the mark.
- Differences between individual samples and the population create **uncertainty**.

]

---

--

**Answer:** Uncertainty matters.

`\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` are random variables that depend on the random sample. We can't tell if we have a "good" sample (similar to the population) or a "bad" sample (very different from the population).

--

Next time, we will leverage all six classical assumptions, including **normality**, to conduct hypothesis tests.
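---
class: white-slide

The slope-variance formula can be checked by the same kind of simulation. A sketch under assumed parameters (fixed `\(X\)` across samples, `\(\sigma = 1\)`, `\(n = 30\)` — all illustrative choices, not from the deck):

```python
import numpy as np

rng = np.random.default_rng(320)

# Hold X fixed across samples and redraw only the errors,
# matching the conditional-on-X variance formula
n, sigma = 30, 1.0
x = rng.uniform(0, 10, n)
beta0, beta1 = 2.53, 0.57

# Var(beta1_hat) = sigma^2 / sum((x_i - x_bar)^2)
theoretical = sigma**2 / np.sum((x - x.mean()) ** 2)

slopes = []
for _ in range(10_000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    slopes.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

empirical = np.var(slopes, ddof=1)
# empirical and theoretical agree to within simulation noise
```

The empirical variance of the 10,000 estimated slopes lines up with `\(\sigma^2 / \sum_{i=1}^n (X_i - \bar{X})^2\)`, the formula from the OLS Variance slide.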