Lecture 09

.title[
# Lecture 09
]
.subtitle[
## OLS // Regression
]
.author[
### Ivan Rudik
]
.date[
### AEM 4510
]

---

```r
if (!require("pacman")) install.packages("pacman")
```

```
## Loading required package: pacman
```

```r
pacman::p_load(
  tidyverse, xaringanExtra, rlang, patchwork, nycflights13, broom, viridis, janitor
)
options(htmltools.dir.version = FALSE)
knitr::opts_hooks$set(fig.callout = function(options) {
  if (options$fig.callout) {
    options$echo = FALSE
  }
knitr::opts_chunk$set(echo = TRUE, fig.align="center")
  options
})
red_pink = "#e64173"
# A blank theme for ggplot
theme_empty = theme_minimal() +
  theme(
    legend.position = "none",
    title = element_text(size = 24),
    axis.text.x = element_text(size = 24), axis.text.y = element_text(size = 24, color = "#ffffff"),
    axis.title.x = element_text(size = 24), axis.title.y = element_text(size = 24),
    panel.grid.minor.x = element_blank(), panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank(), panel.grid.major.x = element_blank(),
    panel.background = element_rect(fill = "#ffffff", colour = NA),
    plot.background = element_rect(fill = "#ffffff", colour = NA),
    axis.line = element_line(colour = "black"), axis.ticks = element_line(),
  )
theme_blank = theme_minimal() +
  theme(
    legend.position = "none",
    title = element_text(size = 24),
    axis.text.x = element_blank(), axis.text.y = element_blank(),
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    panel.grid.minor.x = element_blank(), panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank(), panel.grid.major.x = element_blank(),
    panel.background = element_rect(fill = "#ffffff", colour = NA),
    plot.background = element_rect(fill = "#ffffff", colour = NA),
    axis.line = element_blank(), axis.ticks = element_blank(),
  ) 
theme_regular = 
  theme_minimal() +
  theme(
    legend.position = "none",
    title = element_text(size = 14),
    axis.text.x = element_text(size = 24), axis.text.y = element_text(size = 24),
    axis.title.x = element_text(size = 24), axis.title.y = element_text(size = 24),
    panel.grid.minor.x = element_blank(), panel.grid.minor.y = element_blank(),
    panel.grid.major.x = element_blank(), axis.ticks = element_line(),  axis.line = element_line(),
    panel.background = element_rect(fill = "#ffffff", colour = NA),
    plot.background = element_rect(fill = "#ffffff", colour = NA)
  ) 
nascar_df = read_csv("data/10-florida-nascar.csv") |> 
  as_tibble() 
```

```
## Rows: 68858 Columns: 12
```

```
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): school_name
## dbl (11): school_id, grade, year, zscore, nascar_lead, nascar_lead_weighted, years_leaded, industrial_lead, median_incom...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```
## Warning: 'xaringanExtra::style_panelset' is deprecated.
## Use 'style_panelset_tabs' instead.
## See help("Deprecated")
```

```
## Warning in style_panelset_tabs(...): The arguments to `syle_panelset()` changed in xaringanExtra 0.1.0. Please refer to the
## documentation to update your slides.
```

---

# Roadmap

- Intro to regression and ordinary least squares

---
  
class: inverse, center, middle
name: tidyverse

# Regression and ordinary least squares

---

# Why?

Let's start with a few .hi[basic, general questions]

1. What is the goal of econometrics?

2. Why do economists (or other people) study or use econometrics?

---

# Why? Example

GPA is an output from endowments (ability), and hours studied (inputs), and pollution exposure (externality)

One might hypothesize a model:
`$\text{GPA}=f(I, P, \text{SAT}, H)$`

where `$H$` is hours studied, `$P$` is pollution exposure, `$\text{SAT}$` is SAT score and `$\text{I}$` is family income

We expect that GPA will rise with some variables, and decrease with others

But who needs to _expect_?

We can test these hypotheses .hi[using a regression model]

---

# How?

We can write down a linear regression model of the relationship between GPA and (H, P, SAT, PCT):
$$ \text{GPA}_i = \beta_0 + \beta_1 I_i + \beta_2 P_i + \beta_3 \text{SAT}_i + \beta_4 H_i + \varepsilon_i $$

The left hand side of the equals sign is our .hi-blue[dependent variable] GPA

The right hand side of the equals sign contains all of our .hi-red[independent variables] (I, P, SAT, H), and an error term `$\varepsilon_i$` (described later)

The subscript `$i$` means that the variable contains the value for some person `$i$` in our dataset where `$i = 1,\dots,N$`

---

# How?

$$ \text{GPA}_i = \beta_0 + \beta_1 I_i + \beta_2 P_i + \beta_3 \text{SAT}_i + \beta_4 H_i + \varepsilon_i $$

We are interested in how pollution P affects GPA

This is given by `$\beta_2$`

Notice that `$\beta_2 = \frac{\partial\text{GPA}_i}{\partial\text{P}_i}$`

`$\beta_2$` tells us how GPA changes, given a 1 unit increase in pollution!

Our goal will be to estimate `$\beta_2$`, we denote estimates with hats: `$\hat{\beta}_2$`

---

# How?

How do we estimate `$\beta_2$`?

First, suppose we have a set of estimates for all of our `$\beta$`s, then we can *estimate* the GPA `$(\widehat{GPA}_i)$` for any given person based on just (I, P, SAT, H):
`$$\widehat{GPA}_i = \hat{\beta}_0 + \hat{\beta}_1 I_i + \hat{\beta}_2 P_i + \hat{\beta}_3 \text{SAT}_i + \hat{\beta}_4 H_i$$`

---

# How?

We estimate the `$\beta$`s with .hi[linear regression], specifically ordinary least squares

.hi[Ordinary least squares:] choose all the `$\beta$`s so that the sum of squared errors between the *real* GPAs and model-estimated GPAs are minimized:
`$$SSE = \sum_{i=1}^N (GPA_i - \widehat{GPA}_i)^2$$`

Choosing the `$\beta$`s in this fashion gives us the best-fit line through the data

---

# How?

---

# Simple example

Suppose we were only looking at GPA and pollution (lead/Pb):

$$\text{GPA}_i = \beta_0 + \beta_1 P_i + \varepsilon_i $$

![](09-slides-econometrics_files/figure-html/ols vs lines 1-1.svg)