class: center, middle, inverse, title-slide

# Simple Linear Regression: Estimation
## EC 320: Introduction to Econometrics
### Winter 2022

---
class: inverse, middle

# Housekeeping

- Lab 04 today, Exercise 04 due today.
- Problem Set 2 out, due next Monday.

---
# Last Time

We considered a simple linear regression of `\(Y_i\)` on `\(X_i\)`:

$$ Y_i = \beta_0 + \beta_1X_i + u_i. $$

--

- `\(\beta_0\)` and `\(\beta_1\)` are __population parameters__ that describe the *"true"* relationship between `\(X_i\)` and `\(Y_i\)`.
- __Problem:__ We don't know the population parameters. The best we can do is to estimate them.

---
# Last Time

We derived the OLS estimator by picking estimates that minimize `\(\sum_{i=1}^n \hat{u}_i^2\)`.

- __Intercept:__

$$ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}. $$

- __Slope:__

$$ `\begin{aligned} \hat{\beta}_1 &= \dfrac{\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}. \end{aligned}` $$

We used these formulas to obtain estimates of the parameters `\(\beta_0\)` and `\(\beta_1\)` in a regression of `\(Y_i\)` on `\(X_i\)`.

---
# Last Time

With the OLS estimates of the population parameters, we constructed a regression line:

$$ \hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1X_i. $$

- `\(\hat{Y_i}\)` are predicted or __fitted__ values of `\(Y_i\)`.
- You can think of `\(\hat{Y_i}\)` as an estimate of the average value of `\(Y_i\)` given a particular value of `\(X_i\)`.

--

OLS still produces prediction errors: `\(\hat{u}_i = Y_i - \hat{Y_i}\)`.

- Put differently, there is a part of `\(Y_i\)` we can explain and a part we cannot: `\(Y_i = \hat{Y_i} + \hat{u}_i\)`.

---
# Review

What is the equation for the regression model estimated below?

<img src="08-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" />

---
# Review

The estimated __intercept__ is -9.85. What does this tell us?

<img src="08-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

---
# Review

The estimated __slope__ is 2.2. How do we interpret it?

<img src="08-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

---
# Today

## Agenda

1. Highlight important properties of OLS.
2. Discuss goodness of fit: how well does one variable explain another?
3. Units of measurement.

---
class: inverse, middle

# OLS Properties

---
# OLS Properties

The way we selected OLS estimates `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` gives us three important properties:

1. Residuals sum to zero: `\(\sum_{i=1}^n \hat{u}_i = 0\)`.
2. The sample covariance between the independent variable and the residuals is zero: `\(\sum_{i=1}^n X_i \hat{u}_i = 0\)`.
3. The point `\((\bar{X}, \bar{Y})\)` is always on the regression line.

---
# OLS Residuals

Residuals sum to zero: `\(\sum_{i=1}^n \hat{u}_i = 0\)`.

- By extension, the sample mean of the residuals is zero.
- You will prove this in Problem Set 2.

---
# OLS Residuals

The sample covariance between the independent variable and the residuals is zero: `\(\sum_{i=1}^n X_i \hat{u}_i = 0\)`.

- You will prove a version of this in Problem Set 2.

---
# OLS Regression Line

The point `\((\bar{X}, \bar{Y})\)` is always on the regression line.

- Start with the regression line: `\(\hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1X_i\)`.

--

- Substitute `\(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\)`: `\(\hat{Y_i} = \bar{Y} - \hat{\beta}_1 \bar{X} + \hat{\beta}_1X_i\)`.
--

- Plug `\(\bar{X}\)` into `\(X_i\)`:

$$ `\begin{aligned} \hat{Y_i} &= \bar{Y} - \hat{\beta}_1 \bar{X} + \hat{\beta}_1\bar{X} \\ &= \bar{Y}. \end{aligned}` $$

---
class: inverse, middle

# Goodness of Fit

---
# Goodness of Fit

## .hi[Regression 1] *vs.* .hi-green[Regression 2]

- Same slope.
- Same intercept.

**Q:** Which fitted regression line *"explains"*<sup>*</sup> the data better?

.pull-left[
<img src="08-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />
]

.pull-right[
<img src="08-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />
]

.footnote[
<sup>*</sup> _Explains_ .mono[=] _fits_.
]

---
# Goodness of Fit

## .hi[Regression 1] *vs.* .hi-green[Regression 2]

The __coefficient of determination__ `\(R^2\)` is the fraction of the variation in `\(Y_i\)` *"explained"* by `\(X_i\)` in a linear regression.

- `\(R^2 = 1 \implies X_i\)` explains _all_ of the variation in `\(Y_i\)`.
- `\(R^2 = 0 \implies X_i\)` explains _none_ of the variation in `\(Y_i\)`.

.pull-left[
.center[ `\(R^2\)` .mono[=] 0.73 ]
<img src="08-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />
]

.pull-right[
.center[ `\(R^2\)` .mono[=] 0.07 ]
<img src="08-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" />
]

---
# Goodness of Fit

<img src="08-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" />

---
# Goodness of Fit

<img src="08-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" />

---
# Goodness of Fit

<img src="08-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" />

---
# Explained and Unexplained Variation

Residuals remind us that there are parts of `\(Y_i\)` we can't explain.

$$ Y_i = \hat{Y_i} + \hat{u}_i $$

- Sum the above, divide by `\(n\)`, and use the fact that OLS residuals sum to zero to get `\(\bar{\hat{u}} = 0 \implies \bar{Y} = \bar{\hat{Y}}\)`.

--

__Total Sum of Squares (TSS)__ measures variation in `\(Y_i\)`:

$$ \text{TSS} \equiv \sum_{i=1}^n (Y_i - \bar{Y})^2. $$

- We will decompose this variation into explained and unexplained parts.

---
# Explained and Unexplained Variation

__Explained Sum of Squares (ESS)__ measures the variation in `\(\hat{Y_i}\)`:

$$ \text{ESS} \equiv \sum_{i=1}^n (\hat{Y_i} - \bar{Y})^2. $$

--

**Residual Sum of Squares (RSS)** measures the variation in `\(\hat{u}_i\)`:

$$ \text{RSS} \equiv \sum_{i=1}^n \hat{u}_i^2. $$

--

.hi[Goal:] Show that `\(\text{TSS} = \text{ESS} + \text{RSS}\)`.

---
class: white-slide

**Step 1:** Plug `\(Y_i = \hat{Y_i} + \hat{u}_i\)` into TSS.

`\(\text{TSS}\)`
--
<br> `\(\quad = \sum_{i=1}^n (Y_i - \bar{Y})^2\)`
--
<br> `\(\quad = \sum_{i=1}^n ([\hat{Y_i} + \hat{u}_i] - [\bar{\hat{Y}} + \bar{\hat{u}}])^2\)`
--

**Step 2:** Recall that `\(\bar{\hat{u}} = 0\)` and `\(\bar{Y} = \bar{\hat{Y}}\)`.
`\(\text{TSS}\)`
--
<br> `\(\quad = \sum_{i=1}^n \left( [\hat{Y_i} - \bar{Y}] + \hat{u}_i \right)^2\)`
--
<br> `\(\quad = \sum_{i=1}^n \left( [\hat{Y_i} - \bar{Y}] + \hat{u}_i \right) \left( [\hat{Y_i} - \bar{Y}] + \hat{u}_i \right)\)`
--
<br> `\(\quad = \sum_{i=1}^n (\hat{Y_i} - \bar{Y})^2 + \sum_{i=1}^n \hat{u}_i^2 + 2 \sum_{i=1}^n \left( (\hat{Y_i} - \bar{Y})\hat{u}_i \right)\)`

---
class: white-slide

**Step 3:** Notice .hi-purple[ESS] and .hi[RSS].

`\(\text{TSS}\)`
--
<br> `\(\quad = \color{#9370DB}{\sum_{i=1}^n (\hat{Y_i} - \bar{Y})^2} + \color{#e64173}{\sum_{i=1}^n \hat{u}_i^2} + 2 \sum_{i=1}^n \left( (\hat{Y_i} - \bar{Y})\hat{u}_i \right)\)`
--
<br> `\(\quad = \color{#9370DB}{\text{ESS}} + \color{#e64173}{\text{RSS}} + 2 \sum_{i=1}^n \left( (\hat{Y_i} - \bar{Y})\hat{u}_i \right)\)`

---
class: white-slide

**Step 4:** Simplify.

`\(\text{TSS}\)`
--
<br> `\(\quad = \text{ESS} + \text{RSS} + 2 \sum_{i=1}^n \left( (\hat{Y_i} - \bar{Y})\hat{u}_i \right)\)`
--
<br> `\(\quad = \text{ESS} + \text{RSS} + 2 \sum_{i=1}^n \hat{Y_i}\hat{u}_i - 2 \bar{Y}\sum_{i=1}^n \hat{u}_i\)`
--

**Step 5:** Show that the last two terms are zero. Notice that

`\(\sum_{i=1}^n \hat{Y_i}\hat{u}_i\)`
<br> `\(\quad = \sum_{i=1}^n (\hat{\beta}_0 + \hat{\beta}_1X_i)\hat{u}_i\)`
--
<br> `\(\quad = \hat{\beta}_0 \sum_{i=1}^n \hat{u}_i + \hat{\beta}_1 \sum_{i=1}^n X_i\hat{u}_i\)`
--
<br> `\(\quad = 0\)`
--

Both leftover terms vanish because `\(\sum_{i=1}^n \hat{u}_i = 0\)` and `\(\sum_{i=1}^n X_i\hat{u}_i = 0\)`, so `\(\text{TSS} = \text{ESS} + \text{RSS}\)`.

---
# Goodness of Fit

## Calculating `\(R^2\)`

- `\(R^2 = \frac{\text{ESS}}{\text{TSS}}\)`.
- `\(R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}\)`.

--

`\(R^2\)` is related to the correlation between the actual values of `\(Y\)` and the fitted values of `\(Y\)`.

- One can show that `\(R^2 = (r_{Y, \hat{Y}})^2\)`.

---
# Goodness of Fit

## So what?

In the social sciences, low `\(R^2\)` values are common.

--

Low `\(R^2\)` doesn't mean that an estimated regression is useless.

- In a randomized controlled trial, `\(R^2\)` is usually less than 0.1.

--

High `\(R^2\)` doesn't necessarily mean you have a *"good"* regression.

- Worries about selection bias and omitted variables still apply.

---
class: inverse, middle

# Units of Measurement

---
# Last Time

We ran a regression of crimes per 1000 students on police per 1000 students. We found that `\(\hat{\beta}_0\)` .mono[=] 18.41 and `\(\hat{\beta}_1\)` .mono[=] 1.76.

<img src="08-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" />

---
# Last Time

What if we had run a regression of crimes per student on police per 1000 students? What would happen to the slope?

--

<img src="08-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" />

`\(\hat{\beta}_1\)` .mono[=] 0.001756.

---
# Demeaning

## Practice problem

Suppose that, before running a regression of `\(Y_i\)` on `\(X_i\)`, you decided to _demean_ each variable by subtracting its mean from each observation. This gave you `\(\tilde{Y}_i = Y_i - \bar{Y}\)` and `\(\tilde{X}_i = X_i - \bar{X}\)`. You then estimate

$$ \tilde{Y}_i = \beta_0 + \beta_1 \tilde{X}_i + u_i. $$

What will you get for your intercept estimate `\(\hat{\beta}_0\)`?
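--

One way to check your answer is to simulate a small dataset in R and compare the regression on the original variables with the regression on the demeaned variables. The data below are made up purely for illustration; the seed, sample size, and coefficients are arbitrary.

```r
# Simulated data (arbitrary values, for illustration only)
set.seed(320)
x <- rnorm(100, mean = 5, sd = 2)
y <- 3 + 1.5 * x + rnorm(100)

# Regression with the original variables
coef(lm(y ~ x))

# Regression with the demeaned variables
y_tilde <- y - mean(y)
x_tilde <- x - mean(x)
coef(lm(y_tilde ~ x_tilde))  # compare the intercept and slope to the first regression
```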