class: center, middle, inverse, title-slide # Simple Linear Regression: Estimation ## EC 320: Introduction to Econometrics ### Winter 2022 --- class: inverse, middle # Prologue --- # Housekeeping **Grading:** Midterm 1 grades are out. **Problem Set 2:** Due Monday, Feb 7th by 11:59pm *on Canvas*. **Lab & Exercise:** Due Wednesday, Feb 2nd by 11:59pm. --- # Where Are We? ## Where we've been .hi[High Concepts] - Reviewed core ideas from statistics - Developed a framework for thinking about causality - Dabbled in regression analysis. Also, .mono[**R**]. --- # Where Are We? ## Where we're going .hi[The Weeds!] - Learn the mechanics of *how* OLS works -- - Interpret regression results (mechanically and critically) -- - Extend ideas about causality to a regression context -- - Think more deeply about statistical inference -- - Lay a foundation for more-sophisticated regression techniques. -- Also, **more** .mono[**R**]. --- class: inverse, middle # Simple Linear Regression --- # Addressing Questions ## Example: Effect of police on crime __Policy Question:__ Do on-campus police reduce crime on campus? -- - **Empirical Question:** Does the number of on-campus police officers affect campus crime rates? If so, by how much? How can we answer these questions? -- - Prior beliefs. -- - Theory. -- - __Data!__ --- # Let's _"Look"_ at Data ## Example: Effect of police on crime
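"Looking" in .mono[R] might mean printing the first few rows. A minimal sketch with made-up numbers (the data frame `campus` and all of its values are hypothetical, not the actual campus-safety data):

```r
# Hypothetical campus data: officers and offenses per 1,000 students
campus <- data.frame(
  police = c(1.2, 2.5, 0.8, 3.1, 1.9),
  crime  = c(24, 31, 18, 35, 27)
)

head(campus)  # rows of raw numbers: any relationship is hard to see
```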
--- # Take 2 ## Example: Effect of police on crime *"Looking"* at data wasn't especially helpful. -- Let's try using a scatter plot. - Plot each data point in `\((X,Y)\)`-space. - .mono[Police] on the `\(X\)`-axis. - .mono[Crime] on the `\(Y\)`-axis. --- # Take 2 ## Example: Effect of police on crime <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" /> --- # Take 2 ## Example: Effect of police on crime The scatter plot tells us more than the spreadsheet. - Somewhat weak _positive_ relationship. -- - The sample correlation coefficient of 0.14 confirms this. But our question was > Does the number of on-campus police officers affect campus crime rates? If so, by how much? - The scatter plot and correlation coefficient provide only a partial answer. --- # Take 3 ## Example: Effect of police on crime Our next step is to estimate a __statistical model.__ To keep it simple, we will relate an __explained variable__ `\(Y\)` to an __explanatory variable__ `\(X\)` in a linear model. --- # Simple Linear Regression Model We express the relationship between an .hi-purple[explained variable] and an .hi-green[explanatory variable] as linear: $$ \color{#9370DB}{Y_i} = \beta_0 + \beta_1\color{#007935}{X_i} + u_i. $$ - `\(\beta_0\)` is the __intercept__ or constant. - `\(\beta_1\)` is the __slope coefficient__. - `\(u_i\)` is an __error term__ or disturbance term. .footnote[ _Simple_ .mono[=] Only one explanatory variable. ] --- # Simple Linear Regression Model The .hi[intercept] tells us the expected value of `\(Y_i\)` when `\(X_i = 0\)`. $$ Y_i = \color{#e64173}{\beta_0} + \beta_1X_i + u_i $$ Usually not the focus of an analysis. --- # Simple Linear Regression Model The .hi[slope coefficient] tells us the expected change in `\(Y_i\)` when `\(X_i\)` increases by one. 
$$ Y_i = \beta_0 + \color{#e64173}{\beta_1}X_i + u_i $$ "A one-unit increase in `\(X_i\)` is associated with a `\(\color{#e64173}{\beta_1}\)`-unit increase in `\(Y_i\)`." -- Under certain (strong) assumptions about the error term, `\(\color{#e64173}{\beta_1}\)` is the _effect of_ `\(X_i\)` _on_ `\(Y_i\)`. - Otherwise, it's the _association of_ `\(X_i\)` _with_ `\(Y_i\)`. --- # Simple Linear Regression Model The .hi[error term] reminds us that `\(X_i\)` does not perfectly explain `\(Y_i\)`. $$ Y_i = \beta_0 + \beta_1X_i + \color{#e64173}{u_i} $$ Represents all other factors that explain `\(Y_i\)`. - Useful mnemonic: pretend that `\(u\)` stands for *"unobserved"* or *"unexplained."* --- # Take 3, continued ## Example: Effect of police on crime How might we apply the simple linear regression model to our question about the effect of on-campus police on campus crime? - Which variable is `\(X\)`? Which is `\(Y\)`? -- $$ \text{Crime}_i = \beta_0 + \beta_1\text{Police}_i + u_i. $$ - `\(\beta_0\)` is the crime rate for colleges without police. - `\(\beta_1\)` is the increase in the crime rate for an additional police officer per 1000 students. --- # Take 3, continued ## Example: Effect of police on crime How might we apply the simple linear regression model to our question? $$ \text{Crime}_i = \beta_0 + \beta_1\text{Police}_i + u_i $$ `\(\beta_0\)` and `\(\beta_1\)` are the population parameters we want, but we cannot observe them. -- Instead, we must estimate the population parameters. - `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` generate predictions of `\(\text{Crime}_i\)` called `\(\hat{\text{Crime}_i}\)`. - We call the predictions of the dependent variable __fitted values.__ -- - Together, these trace a line: `\(\hat{\text{Crime}_i} = \hat{\beta_0} + \hat{\beta_1}\text{Police}_i\)`. --- # Take 3, attempted ## Example: Effect of police on crime Guess: `\(\hat{\beta_0} = 60\)` and `\(\hat{\beta_1} = -7\)`. 
-- <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" /> --- # Take 4 ## Example: Effect of police on crime Guess: `\(\hat{\beta_0} = 30\)` and `\(\hat{\beta_1} = 0\)`. -- <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" /> --- # Take 5 ## Example: Effect of police on crime Guess: `\(\hat{\beta_0} = 15.6\)` and `\(\hat{\beta_1} = 7.94\)`. -- <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" /> --- # Residuals Using `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` to make `\(\hat{Y_i}\)` generates misses called .hi[residuals]: $$ \color{#e64173}{\hat{u}_i} = \color{#e64173}{Y_i - \hat{Y_i}}. $$ - Sometimes called `\(\color{#e64173}{e_i}\)`. --- # Residuals ## Example: Effect of police on crime Using `\(\hat{\beta_0} = 15.6\)` and `\(\hat{\beta_1} = 7.94\)` to make `\(\color{#9370DB}{\hat{\text{Crime}_i}}\)` generates .hi[residuals]. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" /> --- # Residuals We want an estimator that makes fewer big misses. Why not minimize `\(\sum_{i=1}^n \hat{u}_i\)`? -- - Positive _and_ negative residuals offset each other `\(\implies\)` no solution (we can always find a line that makes the sum more negative). __Alternative:__ Minimize the sum of squared residuals a.k.a. the .hi[residual sum of squares (RSS)]. - Squared numbers are never negative. --- # Residuals ## Example: Effect of police on crime .hi-blue[RSS] gives bigger penalties to bigger residuals. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" /> --- count: false # Residuals ## Example: Effect of police on crime .hi-blue[RSS] gives bigger penalties to bigger residuals. 
<img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" /> --- count: false # Residuals ## Example: Effect of police on crime .hi-blue[RSS] gives bigger penalties to bigger residuals. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" /> --- # Residuals ## Minimizing RSS We could test thousands of guesses of `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` and pick the pair that minimizes RSS. - Or we just do a little math and derive some useful formulas that give us RSS-minimizing coefficients without the guesswork. --- class: inverse, middle # Ordinary Least Squares (OLS) --- # OLS The __OLS estimator__ chooses the parameters `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` that minimize the .hi[residual sum of squares (RSS)]: `$$\min_{\hat{\beta}_0,\, \hat{\beta}_1} \quad \color{#e64173}{\sum_{i=1}^n \hat{u}_i^2}$$` This is why we call the estimator ordinary __least squares.__ --- # Deriving the OLS Estimator ## Outline 1. Replace `\(\sum_{i=1}^n \hat{u}_i^2\)` with an equivalent expression involving `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)`. -- 2. Take partial derivatives of our RSS expression with respect to `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` and set each one equal to zero (first-order conditions). -- 3. Use the first-order conditions to solve for `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` in terms of data on `\(Y_i\)` and `\(X_i\)`. -- 4. Check second-order conditions to make sure we found the `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` that minimize RSS. --- # OLS Formulas For details, see the [handout](https://raw.githack.com/bchang2/ec320_w22/main/Lectures/07-Simple_Linear_Regression_Estimation/OLS_derivation.pdf) posted on Canvas. 
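Testing thousands of guesses really is feasible, if inelegant. A brute-force sketch in .mono[R] with made-up data (`x`, `y`, and the grid ranges are all hypothetical): compute RSS for every candidate `\((\hat{\beta_0}, \hat{\beta_1})\)` pair on a grid and keep the winner.

```r
# Made-up data for illustration
x <- c(1.2, 2.5, 0.8, 3.1, 1.9)
y <- c(24, 31, 18, 35, 27)

# Every candidate (b0, b1) pair on a coarse grid
grid <- expand.grid(b0 = seq(0, 40, by = 0.1),
                    b1 = seq(-10, 10, by = 0.1))

# RSS for each candidate line
grid$rss <- mapply(function(b0, b1) sum((y - b0 - b1 * x)^2),
                   grid$b0, grid$b1)

# The grid winner lands near the exact least-squares solution
best <- grid[which.min(grid$rss), ]
best[c("b0", "b1")]
coef(lm(y ~ x))
```

The formulas below deliver the exact minimizer in one step, no grid required.

--- # OLS Formulas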
__Slope coefficient__ `$$\hat{\beta}_1 = \dfrac{\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$` __Intercept__ $$ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} $$ --- # Slope coefficient The slope estimator is equal to the sample covariance divided by the sample variance of `\(X\)`: $$ `\begin{aligned} \hat{\beta}_1 &= \dfrac{\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2} \\ \\ &= \dfrac{ \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{ \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2} \\ \\ &= \dfrac{S_{XY}}{S^2_X}. \end{aligned}` $$ --- # Take 6 ## Example: Effect of police on crime Using the OLS formulas, we get `\(\hat{\beta_0}\)` .mono[=] 18.41 and `\(\hat{\beta_1}\)` .mono[=] 1.76. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" /> --- count: false # Take 6 ## Example: Effect of police on crime Using the OLS formulas, we get `\(\hat{\beta_0}\)` .mono[=] 18.41 and `\(\hat{\beta_1}\)` .mono[=] 1.76. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> --- # Coefficient Interpretation ## Example: Effect of police on crime Using OLS gives us the fitted line $$ \hat{\text{Crime}_i} = \hat{\beta}_0 + \hat{\beta}_1\text{Police}_i. $$ What does `\(\hat{\beta_0}\)` .mono[=] 18.41 tell us? -- What does `\(\hat{\beta_1}\)` .mono[=] 1.76 tell us? -- __Gut check:__ Does this mean that police _cause_ crime? -- - Probably not. __Why?__ --- # Outliers ## Example: Association of police with crime <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" /> --- # Outliers ## Example: Association of police with crime .hi-purple[Fitted line] without outlier. 
<img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" /> --- count: false # Outliers ## Example: Association of police with crime .hi-purple[Fitted line] without outlier. .hi[Fitted line] with outlier. <img src="07-Simple_Linear_Regression_Estimation_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" />
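--- # OLS in .mono[R]

The slope and intercept formulas translate directly into code. A minimal sketch with hypothetical data (the `police` and `crime` values are made up, not the actual campus numbers); the built-in `lm()` function carries out the same least-squares calculation:

```r
# Hypothetical data
police <- c(1.2, 2.5, 0.8, 3.1, 1.9)
crime  <- c(24, 31, 18, 35, 27)

# Slope: sample covariance of (X, Y) over sample variance of X
b1 <- cov(police, crime) / var(police)

# Intercept: forces the fitted line through the point of means
b0 <- mean(crime) - b1 * mean(police)

c(b0 = b0, b1 = b1)
coef(lm(crime ~ police))  # lm() reproduces the by-hand estimates
```

Note that `cov()` and `var()` both divide by `\(n - 1\)`, so the factor cancels in the ratio, exactly as in the `\(S_{XY} / S^2_X\)` derivation.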