class: center, middle, inverse, title-slide .title[ # .b[The Classical Linear Regression Model] ] .subtitle[ ## .b[.green[EC 339]] ] .author[ ### Marcio Santetti ] .date[ ### Fall 2022 ] --- class: inverse, middle # Motivation --- # OLS works, but it needs assumptions <br><br> - The goal when using OLS is to obtain .b[unbiased], .b[efficient], and .b[consistent] estimators. -- - Moreover, we want to be able to do .b[hypothesis testing]. -- - All these properties are made possible through .b[7 assumptions]. -- - This set of assumptions is known as the .b[Classical Linear Regression Model] (CLRM). --- class: inverse, middle # The Classical Assumptions --- # The set of Classical Assumptions <br> **1**. The regression model is .b[linear], .b[correctly specified], and has an .b[additive] stochastic error term. -- **2**. The stochastic error term `\((u_i)\)` has a .b[zero] population mean. -- **3**. All explanatory variables `\((x_i)\)` are .b[uncorrelated] with the error term. -- **4**. Observations of the error term are .b[uncorrelated] with each other. -- **5**. The error term has a .b[constant variance]. -- **6**. No explanatory variable is a .b[perfect linear function] of any other explanatory variable. -- **7**. The error term is .b[normally distributed]. --- # Assumption 1 > "*The regression model is .b[linear], .b[correctly specified], and has an .b[additive] stochastic error term.*" -- - .it[Linear] means linear in .b[parameters] `\((\beta_i)\)`; - .it[Correctly specified] means that it has the correct .b[functional form] and .b[no] omitted variables. - And an .b[additive] error term implies .b[no] other form in which `\(u_i\)` appears in a model. 
--

<br>

- **Examples of violations**:

$$
`\begin{align}
y_i = \beta_0 \beta_1x_{1i} + \beta_2x_{2i} + u_i
\end{align}`
$$

$$
`\begin{align}
y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i}u_i
\end{align}`
$$

$$
`\begin{align}
y_i = \beta_0 + \log(\beta_1)x_{1i} + \beta_2x_{2i} + u_i
\end{align}`
$$

---

# Assumption 1

One of the main reasons for a .it[violation] of CLRM Assumption I is an .b[incorrectly specified] model.

--

- This may happen due to:
  - Incorrect .b[functional form] (data visualization matters!);
  - .b[Omitted] variables (leading to omitted variables bias).

--

<br>

A regression's error term may sometimes be a .b[black box].

--

- Recall that any potentially omitted variable(s) lie(s) there!

--

Therefore, our models must have a .b[theoretical] motivation.

---

# What is bias?

An estimator is .b[biased] if its expected value differs from the *true* population parameter.

--

When considering our slope coefficients `\((\hat{\beta}_i)\)`, we expect them, on average, to equal the .b["true"] population parameter, `\(\beta_{pop}\)`.

.pull-left[

**Unbiased:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta}_{OLS} \right] = \beta_{pop}\)`

<img src="003-clrm_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" />

]

--

.pull-right[

**Biased:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta}_{OLS} \right] \neq \beta_{pop}\)`

<img src="003-clrm_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

]

---

# Assumption 2

> *"The stochastic error term `\((u_i)\)` has a .b[zero] population mean."*

--

<br>

- Values of the stochastic error term are determined by .b[pure chance].
- It follows a probability .b[distribution] centered around zero.
- Also known as the .b[exogeneity] assumption.

--

<br>

From standard Microeconomic theory, recall:

- Factors that influence the .b[demand] for a given good:
  - Price of the good itself, price of substitutes, preferences...
- Any of these factors .b[left out] of the model ends up in the error term.
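---

# Assumption 2: a simulation sketch

The exogeneity idea can be checked numerically. Below is a minimal sketch in Python (illustrative only, not part of this course's R workflow; the model `\(y = 2 + 3x + u\)`, the omitted variable `z`, and all parameter values are invented for this example). When the error term has zero mean given `\(x\)`, the OLS slope recovers the population parameter; when a variable correlated with `\(x\)` is omitted and hides in the error term, the slope is biased.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical population model: y = 2 + 3x + u (all values invented)
x = rng.normal(5, 2, n)
u = rng.normal(0, 1, n)              # E[u | x] = 0 by construction
y = 2 + 3 * x + u

# OLS slope via Cov(x, y) / Var(x)
beta1_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(round(beta1_hat, 2))           # close to the population value of 3

# Violation: z is correlated with x but omitted, so it hides in the error term
z = x + rng.normal(0, 1, n)
y2 = 2 + 3 * x + 1.5 * z + rng.normal(0, 1, n)
beta1_biased = np.cov(x, y2)[0, 1] / np.var(x, ddof=1)
print(round(beta1_biased, 2))        # systematically above 3: E[u | x] != 0
```

The second estimate picks up the effect of the omitted `z` through its correlation with `x`, which is exactly the omitted variables bias mentioned earlier.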
---

# Assumption 2

<br>

> *"The stochastic error term `\((u_i)\)` has a .b[zero] population mean."*

<br><br>

In practice, what is the difference between `\(\mathbb{E}[u \ | \ x] = 0\)` and `\(\mathbb{E}[u \ | \ x] \neq 0\)`?

---

# Assumption 3

> *"All explanatory variables `\((x_i)\)` are .b[uncorrelated] with the error term."*

--

<br><br>

- Observed values of the independent variables are determined .b[independently] of the values contained in the error term.
- `\(Cor(x_i, u_i) \neq 0 \implies\)` .b[violation] of CLRM Assumption III.
- A possible reason: a variable correlated with some `\(x_i\)` being .b[omitted] from the model.

---

# Assumption 4

<br><br>

> *"Observations of the error term are .b[uncorrelated] with each other."*

--

<br>

- A .b[violation] of this assumption is known as .b[autocorrelation] (or serial correlation).
- It is common in .b[time-series] data.
- It occurs when the model's disturbances are correlated .b[over time], i.e., `\(Cor(u_t, u_j) \neq 0\)` for `\(t \neq j\)`.

---

# Assumption 4

Behavior of `\(u_t\)` over time (positive serial correlation)

<img src="003-clrm_files/figure-html/positive auto u-1.svg" style="display: block; margin: auto;" />

---

# Assumption 4

Behavior of `\(u_t\)` over time (negative serial correlation)

<img src="003-clrm_files/figure-html/negative auto u-1.svg" style="display: block; margin: auto;" />

---

# Assumption 5

> *"The error term has a .b[constant variance]."*

--

<br>

- Also known as the .b[homoskedasticity] assumption.
- If violated, we have .b[heteroskedasticity].
- Heteroskedasticity is extremely .b[common] in cross-section data sets (and also in financial time-series data).

--

<br>

- This assumption implies that the error term has the .b[same variance] for each value of the independent variable.
- `\(Var(u|x) = \sigma^2\)`

---

# Assumption 5

- .b[Homoskedastic] residuals:

<img src="003-clrm_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

---

# Assumption 5

- .b[Heteroskedastic] residuals:

<img src="003-clrm_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />

---

# Assumption 6

> *"No explanatory variable is a .b[perfect linear function] of any other explanatory variable."*

--

<br><br>

- Also known as the .b[no perfect multicollinearity] assumption.
- Only completely .b[violated] if an independent variable `\(x_i\)` is a .b[deterministic] function of another variable `\(x_j\)`, for `\(i \neq j\)`.

--

<br>

Examples of violations:

- `\(x_3 = x_1 - 1,000\)`
- `\(x_2 = 50 + x_1\)`

---

# Assumption 7

<br><br>

> *"The error term is .b[normally distributed]."*

--

<br><br>

- Summarized by `\(u_i \sim \mathcal{N}(0, \sigma^2)\)`.

--

<br>

OLS .b[still works] without this assumption!

--

But it is crucial for .b[hypothesis testing and inference].

---

layout: false
class: inverse, middle

# The Gauss-Markov theorem

---

# The Gauss-Markov theorem

<br><br>

Under CLRM Assumptions .b[I through VI], OLS is guaranteed to be .hi-blue[BLUE]: the .b[B]est .b[L]inear .b[U]nbiased .b[E]stimator.

--

<br><br>

We will learn how to deal with the most common .b[violations] of the CLRM Assumptions after the Midterm exam.

---

layout: false
class: inverse, middle

# Next time: CLRM in practice

---
exclude: true
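---

# Appendix: unbiasedness in simulation

The "unbiased" panel from earlier can be reproduced with a short Monte Carlo sketch in Python (illustrative only, not part of this course's R workflow; the population values `\(\beta_0 = 2\)`, `\(\beta_1 = 3\)`, the sample size, and the number of replications are all invented). When the data satisfy Assumptions I through VI, the OLS slope estimates average out to the population parameter.

```python
import numpy as np

# Monte Carlo sketch: draw many samples from a model satisfying
# Assumptions I-VI and re-estimate the OLS slope each time.
rng = np.random.default_rng(339)
beta1_pop = 3.0                       # invented "true" population slope
estimates = []

for _ in range(2_000):
    x = rng.normal(5, 2, 200)
    u = rng.normal(0, 1, 200)         # zero mean, constant variance, uncorrelated
    y = 2 + beta1_pop * x + u
    estimates.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

# Unbiasedness: the estimates average out to the population parameter
print(round(float(np.mean(estimates)), 2))   # close to 3
```

Plotting a histogram of `estimates` would recover the bell-shaped sampling distribution centered on `\(\beta_{pop}\)` shown in the "What is bias?" slide.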