class: center, middle, inverse, title-slide

# Panel Data
## EC 421, Set 11
### Edward Rubin
### 12 March 2019

---

class: inverse, middle

# Prologue

---
name: schedule

# Schedule

## Last Time

Instrumental variables and causality

## Today

Panel data

## Upcoming

- Assignment due .hi[Saturday]
- Final on Monday
---
name: final_review
# Final
## Information

1. The final is .hi[Monday].
  - The final will .hi[cover *all* material] from this course.
  - Expect .hi[recent topics] (time series to today) to dominate.
  - Don't neglect .hi[major topics] (_e.g._, omitted-variable bias).
1. This week's .hi[labs] will cover IV and homework.
1. .hi[Review session] this weekend w/ GEs.
---
layout: false
class: inverse, middle
# Panel data
---
layout: true
# Panel data
## Intro
---
exclude: true

---
name: intro

We've considered two types of data (each with one dimension):

.pull-left[
.hi-orange[Cross-sectional data:] individual `$i$`

```
#>    state year min_wage poverty_rate
#> 45    UT 2017     7.25          8.6
#> 46    VT 2017    10.00         10.2
#> 47    VA 2017     7.25         10.3
#> 48    WA 2017    11.00          9.9
#> 49    WV 2017     8.75         17.3
#> 50    WI 2017     7.25          9.5
#> 51    WY 2017     7.25         12.4
```
]

.pull-right[
.hi-purple[Time-series data:] time `$t$`

```
#>    state year min_wage poverty_rate
#> 32    OR 2011     8.50         14.4
#> 33    OR 2012     8.80         13.5
#> 34    OR 2013     8.95         15.1
#> 35    OR 2014     9.10         14.4
#> 36    OR 2015     9.25         11.9
#> 37    OR 2016     9.75         11.8
#> 38    OR 2017     9.75         10.2
```
]
---
count: false

We've considered two types of data (each with one dimension):

.pull-left[
.hi-orange[Cross-sectional data:] individual `$i$`
<img src="12_panel_data_files/figure-html/cross_sectional_plot-1.svg" style="display: block; margin: auto;" />
]

.pull-right[
.hi-purple[Time-series data:] time `$t$`
<img src="12_panel_data_files/figure-html/time_series_plot-1.svg" style="display: block; margin: auto;" />
]

.hi-pink[*Panel data*] combine these data types/dimensions: individual `$i$` **at** time `$t$`.
---
layout: false
class: clear

.hi-pink[*Panel data*] combine these data types/dimensions: individual `$i$` **at** time `$t$`.

<img src="12_panel_data_files/figure-html/panel_plot-1.svg" style="display: block; margin: auto;" />
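A minimal sketch in .mono[R] of how the two one-dimensional data types sit inside a panel. It assumes `panel_df` holds only the nine state-year rows from the definition slide (not the full dataset behind the plots): fixing the year gives a cross section, and fixing the state gives a time series.

```r
# Toy panel: the nine rows from the definition slide (3 states x 3 years)
panel_df <- data.frame(
  state        = rep(c("CA", "OR", "WA"), each = 3),
  year         = rep(c(1990, 2000, 2010), times = 3),
  poverty_rate = c(13.9, 12.7, 16.3, 9.2, 10.9, 14.2, 8.9, 10.8, 11.5),
  min_wage     = c(4.25, 6.25, 8.00, 4.25, 6.50, 8.40, 4.25, 6.50, 8.55)
)

# Fix t: every state in one year (a cross section)
cross_section <- subset(panel_df, year == 2010)

# Fix i: one state in every year (a time series)
time_series <- subset(panel_df, state == "OR")

cross_section
time_series
```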
---
layout: true
# Panel data
---
name: definition

## Definition

.pull-left[

With .hi-pink[*panel data*], we have

- .hi-purple[repeated observations] `$(t)$`
- on .hi-orange[multiple individuals] `$(i)$`.
]

.pull-right[

```
#>   state year poverty_rate min_wage
#> 1    CA 1990         13.9     4.25
#> 2    CA 2000         12.7     6.25
#> 3    CA 2010         16.3     8.00
#> 4    OR 1990          9.2     4.25
#> 5    OR 2000         10.9     6.50
#> 6    OR 2010         14.2     8.40
#> 7    WA 1990          8.9     4.25
#> 8    WA 2000         10.8     6.50
#> 9    WA 2010         11.5     8.55
```
]

Thus, our regression equation with a panel dataset looks like
$$
`\begin{align}
  y_{\color{#FFA500}{i}\color{#6A5ACD}{t}} = \beta_0 + \beta_1 x_{\color{#FFA500}{i}\color{#6A5ACD}{t}} + u_{\color{#FFA500}{i}\color{#6A5ACD}{t}}
\end{align}`
$$
for .orange[individual] `$\color{#FFA500}{i}$` at .purple[time] `$\color{#6A5ACD}{t}$`.
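As a sketch, pooled OLS on the nine example rows treats every state-year pair `$(i, t)$` as its own observation, so with one regressor the slope is simply `Cov(x, y) / Var(x)` computed over all `$i$`-`$t$` rows. These nine rows are only the illustrative table above, so the estimates will not reproduce the regression tables later in the slides.

```r
panel_df <- data.frame(
  state        = rep(c("CA", "OR", "WA"), each = 3),
  year         = rep(c(1990, 2000, 2010), times = 3),
  poverty_rate = c(13.9, 12.7, 16.3, 9.2, 10.9, 14.2, 8.9, 10.8, 11.5),
  min_wage     = c(4.25, 6.25, 8.00, 4.25, 6.50, 8.40, 4.25, 6.50, 8.55)
)

# Pooled OLS: no fixed effects, each (state, year) row is one observation
pooled <- lm(poverty_rate ~ min_wage, data = panel_df)
b1 <- coef(pooled)[["min_wage"]]

# Same slope from the single-regressor formula, pooling all i-t rows
b1_check <- cov(panel_df$min_wage, panel_df$poverty_rate) / var(panel_df$min_wage)

c(lm = b1, formula = b1_check)
```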
---
name: ex_wage

## Example

Minimum-wage laws involve many contentious/important policy questions.

- Do minimum-wage laws .hi[increase well-being] for minimum-wage earners and their families?
- Do minimum-wage laws .hi[increase unemployment]?
- Overall, do minimum-wage laws .hi[decrease poverty]?

We want to know the causal effect of the minimum wage, _i.e._, `$\beta_1$` in
$$
`\begin{align}
  \left( \text{Poverty Rate} \right)_{it} = \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{it} + u_{it}
\end{align}`
$$
where `$i$` denotes state and `$t$` indexes year.
---

## Example

If we go ahead and run OLS in our panel, we find

<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">OLS w/ outcome variable 'poverty rate'</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Term </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t stat. </th>
   <th style="text-align:left;"> p-Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: white;"> Intercept </td>
   <td style="text-align:right;background-color: white;"> 14.196 </td>
   <td style="text-align:right;background-color: white;"> 0.283 </td>
   <td style="text-align:right;background-color: white;"> 50.21 </td>
   <td style="text-align:left;background-color: white;"> &lt;0.0001 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #e64173;"> Min. Wage </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #e64173;"> -0.203 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #e64173;"> 0.051 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #e64173;"> -3.99 </td>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #e64173;"> &lt;0.0001 </td>
  </tr>
</tbody>
</table>

which suggests that a one-dollar increase in the minimum wage significantly .pink[*reduces*] poverty by approximately 0.203 percentage points.

Surprising?
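To make the magnitude concrete, here is the arithmetic in .mono[R] for a hypothetical increase from 7.25 to 10.00 dollars per hour (two minimum-wage levels from the cross-section above). It takes the OLS slope at face value and assumes the linear model holds over that range; whether the slope is causal is exactly the question that follows.

```r
# OLS slope from the table: percentage points of poverty per dollar of minimum wage
b1_hat <- -0.203

# Hypothetical change: minimum wage rises from 7.25 to 10.00 dollars per hour
delta_min_wage <- 10.00 - 7.25

# Predicted change in the poverty rate (percentage points)
delta_poverty <- b1_hat * delta_min_wage
delta_poverty
#> [1] -0.55825
```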

---

## Example: Causality is still hard

To isolate the causal effect of minimum wage on poverty in
$$
`\begin{align}
  \left( \text{Poverty Rate} \right)_{it} = \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{it} + u_{it}
\end{align}`
$$
we still need exogeneity, _i.e._, `$\mathop{\boldsymbol{E}}\left[ u_{it} \mid \left( \text{Min. Wage} \right)_{it} \right] = 0$`.

.hi[Exogeneity with *panel data:*] Are there omitted factors that affect both a state's minimum wage *and* its poverty rate?

We are going to discuss two common panel-data strategies:

1. .hi[Fixed effects]
2. .hi[First differences]
---
name: fe

## Fixed effects

.hi[*Fixed effects*] are binary indicator variables that *help* control for unobserved differences across individuals or time periods.

For example, we can include a .hi-orange[fixed effect for each individual state] `$\color{#FFA500}{i}$` to control for unobserved, time-invariant differences between states:

$$
`\begin{align}
  \left( \text{Poverty Rate} \right)_{it} = \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{it} + \color{#FFA500}{\text{State}_i} + u_{it}
\end{align}`
$$

```
#>   state year poverty_rate min_wage fe_ca fe_or fe_wa
#> 1    CA 2000         12.7     6.25     1     0     0
#> 2    CA 2010         16.3     8.00     1     0     0
#> 3    OR 2000         10.9     6.50     0     1     0
#> 4    OR 2010         14.2     8.40     0     1     0
#> 5    WA 2000         10.8     6.50     0     0     1
#> 6    WA 2010         11.5     8.55     0     0     1
```
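The dummy columns above can be generated rather than typed. A minimal base-.mono[R] sketch, assuming the six 2000/2010 rows from the table: `model.matrix()` with `- 1` creates one 0/1 column per state, and the `fe_*` names are simply copied from the table.

```r
# 2000 and 2010 rows from the table above
panel_df <- data.frame(
  state        = rep(c("CA", "OR", "WA"), each = 2),
  year         = rep(c(2000, 2010), times = 3),
  poverty_rate = c(12.7, 16.3, 10.9, 14.2, 10.8, 11.5),
  min_wage     = c(6.25, 8.00, 6.50, 8.40, 6.50, 8.55)
)

# One 0/1 column per state; "- 1" removes the intercept so no state is dropped
state_dummies <- model.matrix(~ state - 1, data = panel_df)
# Columns come out in alphabetical order: CA, OR, WA
colnames(state_dummies) <- c("fe_ca", "fe_or", "fe_wa")

cbind(panel_df, state_dummies)
```

Each row's state dummies sum to one, exactly like an intercept column. That is why a regression that also has an intercept (`$\beta_0$` in the equation above) can include only two of the three state dummies.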
---

## Fixed effects

Notice that these individual fixed effects are just .pink[individual-specific intercepts]—now each unit/individual gets her own intercept.

**Q:** What are these individual-level fixed effects (FEs) doing?

**A.sub[1]:** They remove each individual's mean, _i.e._, `$y_{it} - \overline{y}_i$` and `$x_{it} - \overline{x}_i$`.

**A.sub[2]:** They control for unobserved, time-invariant differences between units.<sup>.pink[†]</sup>

.footnote[
.pink[†] By *time-invariant differences*, we mean differences between individuals that do not change over time.
]
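A quick numerical check of A.sub[1], sketched in base .mono[R] on the nine example rows (not the full data): regressing on state dummies and regressing state-demeaned `$y$` on state-demeaned `$x$` give the same slope (the Frisch-Waugh-Lovell result). `demean()` is a helper defined here for illustration.

```r
panel_df <- data.frame(
  state        = rep(c("CA", "OR", "WA"), each = 3),
  year         = rep(c(1990, 2000, 2010), times = 3),
  poverty_rate = c(13.9, 12.7, 16.3, 9.2, 10.9, 14.2, 8.9, 10.8, 11.5),
  min_wage     = c(4.25, 6.25, 8.00, 4.25, 6.50, 8.40, 4.25, 6.50, 8.55)
)

# (1) State fixed effects as dummy variables: one intercept per state
fe_dummies <- lm(poverty_rate ~ min_wage + factor(state), data = panel_df)

# (2) Within transformation: ave() returns each row's state mean
demean <- function(v, g) v - ave(v, g)
y_dm <- demean(panel_df$poverty_rate, panel_df$state)
x_dm <- demean(panel_df$min_wage, panel_df$state)
fe_within <- lm(y_dm ~ x_dm - 1)  # no intercept: demeaned variables have mean zero

c(dummies = coef(fe_dummies)[["min_wage"]], within = coef(fe_within)[["x_dm"]])
```

Only the point estimate carries over: the standard errors from the hand-demeaned regression ignore the degrees of freedom used up by the state means, which is one reason to let software handle the demeaning.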
---
layout: false
class: clear

In the raw data (no fixed effects/demeaning), individuals differ in levels.

<img src="12_panel_data_files/figure-html/panel_plot_raw-1.svg" style="display: block; margin: auto;" />
---
class: clear

Individual fixed effects remove each individual's mean.

<img src="12_panel_data_files/figure-html/panel_plot_fe-1.svg" style="display: block; margin: auto;" />
---
layout: true
# Panel data
## Fixed effects
---

Fixed effects are one method econometricians use to "match" individuals, generating a valid control group for our treated individuals.

Toward this goal, we can also include .hi-purple[fixed effects for each time period] `$\color{#6A5ACD}{t}$` to (attempt to) control for shocks that affect all individuals in a given period.

$$
`\begin{align}
  \left( \text{Poverty Rate} \right)_{it} = \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{it} + \color{#FFA500}{\text{State}_i} + \color{#6A5ACD}{\text{Year}_t} + u_{it}
\end{align}`
$$

```
#>   state year poverty_rate min_wage fe_ca fe_or fe_wa fe_2000 fe_2010
#> 1    CA 2000         12.7     6.25     1     0     0       1       0
#> 2    CA 2010         16.3     8.00     1     0     0       0       1
#> 3    OR 2000         10.9     6.50     0     1     0       1       0
#> 4    OR 2010         14.2     8.40     0     1     0       0       1
#> 5    WA 2000         10.8     6.50     0     0     1       1       0
#> 6    WA 2010         11.5     8.55     0     0     1       0       1
```
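With both state and year fixed effects, the demeaning idea extends simply, but only for a *balanced* panel (every state observed in every year). Under that assumption, the two-way fixed-effects estimate of `$\beta_1$` equals the slope from regressing `$\ddot{y}_{it}$` on `$\ddot{x}_{it}$`, where

$$
`\begin{align}
  \ddot{y}_{it} = y_{it} - \overline{y}_{i} - \overline{y}_{t} + \overline{y}
\end{align}`
$$

and `$\ddot{x}_{it}$` is defined the same way. Here `$\overline{y}_i$` averages over years for state `$i$`, `$\overline{y}_t$` averages over states in year `$t$`, and `$\overline{y}$` is the overall mean. In an unbalanced panel, this shortcut no longer holds exactly.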
---
layout: true
# Panel data
## Fixed-effects estimation in .mono[R]
---
name: fe_r

.mono[R] makes estimation with fixed effects really easy.

As always, you have options.

We're going to use the `felm()` function from the `lfe` package.

.hi[General notation:]<br> `felm(y ~ x1 + x2 + ⋯ | fe1 + fe2 + ⋯, data = some_data)`

.hi[Our example:]<br> `felm(poverty_rate ~ min_wage | state + year, data = panel_df)`
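If `lfe` is not available, a base-.mono[R] sketch of the same model uses `lm()` with `factor()` dummies. We would expect the `min_wage` point estimate to match the `felm()` call, though it is worth checking the standard errors rather than assuming they agree. Below, on the nine example rows only, the dummy-variable slope is checked against two-way demeaning for a balanced panel; `two_way_demean()` is a helper defined for illustration.

```r
panel_df <- data.frame(
  state        = rep(c("CA", "OR", "WA"), each = 3),
  year         = rep(c(1990, 2000, 2010), times = 3),
  poverty_rate = c(13.9, 12.7, 16.3, 9.2, 10.9, 14.2, 8.9, 10.8, 11.5),
  min_wage     = c(4.25, 6.25, 8.00, 4.25, 6.50, 8.40, 4.25, 6.50, 8.55)
)

# Two-way fixed effects with explicit state and year dummies
twfe <- lm(poverty_rate ~ min_wage + factor(state) + factor(year), data = panel_df)

# Balanced panel: v_it - mean_i - mean_t + overall mean
two_way_demean <- function(v) {
  v - ave(v, panel_df$state) - ave(v, panel_df$year) + mean(v)
}
y_dd <- two_way_demean(panel_df$poverty_rate)
x_dd <- two_way_demean(panel_df$min_wage)
twfe_within <- lm(y_dd ~ x_dd - 1)

c(dummies = coef(twfe)[["min_wage"]], within = coef(twfe_within)[["x_dd"]])
```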
---

`felm(poverty_rate ~ min_wage | state + year, data = panel_df)`

<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">Fixed effects w/ outcome variable 'poverty rate'</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Term </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t stat. </th>
   <th style="text-align:left;"> p-Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #FFA500;"> Min. Wage </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #FFA500;"> 0.374 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #FFA500;"> 0.109 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #FFA500;"> 3.43 </td>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #FFA500;"> 0.0006 </td>
  </tr>
</tbody>
</table>

`lm(poverty_rate ~ min_wage, data = panel_df)`

<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">OLS w/ outcome variable 'poverty rate'</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Term </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t stat. </th>
   <th style="text-align:left;"> p-Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: white;"> Intercept </td>
   <td style="text-align:right;background-color: white;"> 14.196 </td>
   <td style="text-align:right;background-color: white;"> 0.283 </td>
   <td style="text-align:right;background-color: white;"> 50.21 </td>
   <td style="text-align:left;background-color: white;"> &lt;0.0001 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #314f4f;"> Min. Wage </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #314f4f;"> -0.203 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #314f4f;"> 0.051 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #314f4f;"> -3.99 </td>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #314f4f;"> &lt;0.0001 </td>
  </tr>
</tbody>
</table>
---

**Q:** Which set of estimates should we believe?

**A:** The set that you believe meets our exogeneity requirement.
---
layout: true
# Panel data
## First differences
---
name: diff

Another route—related to our time-series studies—uses .hi[*first differences*].

The .hi[*first difference*] for variable `$y$` is the difference between individual `$i$`'s current value of `$y$` (_i.e._, `$y_{i,t}$`) and his previous (lagged) value of `$y$` (_i.e._, `$y_{i,t-1}$`).

We write the first difference as

$$
`\begin{align}
  \Delta y_{it} = y_{i,t} - y_{i,t-1}
\end{align}`
$$
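A minimal sketch of computing first differences within each state in base .mono[R], using the nine example rows. Because these rows are ten years apart, `$t-1$` here means the previous observed year (a decade earlier), and each state's first year has no lag, so its difference is `NA`. `first_diff()` is a helper defined for illustration.

```r
panel_df <- data.frame(
  state        = rep(c("CA", "OR", "WA"), each = 3),
  year         = rep(c(1990, 2000, 2010), times = 3),
  poverty_rate = c(13.9, 12.7, 16.3, 9.2, 10.9, 14.2, 8.9, 10.8, 11.5),
  min_wage     = c(4.25, 6.25, 8.00, 4.25, 6.50, 8.40, 4.25, 6.50, 8.55)
)

# Sort so each state's rows run forward in time
panel_df <- panel_df[order(panel_df$state, panel_df$year), ]

# Within each state: current value minus previous value (NA for the first year)
first_diff <- function(v, g) ave(v, g, FUN = function(z) c(NA, diff(z)))

panel_df$d_poverty  <- first_diff(panel_df$poverty_rate, panel_df$state)
panel_df$d_min_wage <- first_diff(panel_df$min_wage, panel_df$state)
panel_df
```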
---

From our example, write the model for periods `$t$` and `$t-1$`:

$$
`\begin{align}
  \left( \text{Poverty Rate} \right)_{i,t} &= \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{i,t} + u_{i,t} \tag{t} \\
  \left( \text{Poverty Rate} \right)_{i,t-1} &= \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{i,t-1} + u_{i,t-1} \tag{t-1}
\end{align}`
$$

taking the difference between `$(t)$` and `$(t-1)$` gives

$$
`\begin{align}
  &\left( \text{Poverty Rate} \right)_{i,t} - \left( \text{Poverty Rate} \right)_{i,t-1} = \\
  &\qquad \beta_0 - \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{i,t} - \beta_1 \left( \text{Min. Wage} \right)_{i,t-1} + u_{i,t} - u_{i,t-1}
\end{align}`
$$

which implies

$$
`\begin{align}
  \Delta\left( \text{Poverty Rate} \right)_{i,t} &= \beta_1 \Delta \left( \text{Min. Wage} \right)_{i,t} + \Delta u_{i,t}
\end{align}`
$$
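The link to fixed effects becomes explicit if we suppose, for illustration, that the error contains a time-invariant state component `$\alpha_i$` plus an idiosyncratic part `$\varepsilon_{i,t}$` (notation introduced here), _i.e._, `$u_{i,t} = \alpha_i + \varepsilon_{i,t}$`. Differencing removes `$\alpha_i$` just as it removed `$\beta_0$`:

$$
`\begin{align}
  \Delta u_{i,t} &= \left( \alpha_i + \varepsilon_{i,t} \right) - \left( \alpha_i + \varepsilon_{i,t-1} \right) = \Delta \varepsilon_{i,t} \\
  \Delta\left( \text{Poverty Rate} \right)_{i,t} &= \beta_1 \Delta \left( \text{Min. Wage} \right)_{i,t} + \Delta \varepsilon_{i,t}
\end{align}`
$$

so any time-invariant state differences hidden in `$u_{it}$` drop out of the differenced equation.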
---

Estimating our model via first differences gives us the results

<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">First diff. w/ outcome variable 'poverty rate'</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Term </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t stat. </th>
   <th style="text-align:left;"> p-Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: white;"> Intercept </td>
   <td style="text-align:right;background-color: white;"> -0.064 </td>
   <td style="text-align:right;background-color: white;"> 0.047 </td>
   <td style="text-align:right;background-color: white;"> -1.34 </td>
   <td style="text-align:left;background-color: white;"> 0.1811 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #e64173;"> Min. Wage </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #e64173;"> 0.221 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #e64173;"> 0.157 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #e64173;"> 1.41 </td>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #e64173;"> 0.1584 </td>
  </tr>
</tbody>
</table>
.white[space]
<br>

.pull-left[
<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">Fixed effects w/ outcome variable 'poverty rate'</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Term </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t stat. </th>
   <th style="text-align:left;"> p-Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #FFA500;"> Min. Wage </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #FFA500;"> 0.374 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #FFA500;"> 0.109 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #FFA500;"> 3.43 </td>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #FFA500;"> 0.0006 </td>
  </tr>
</tbody>
</table>
]

.pull-right[
<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">OLS w/ outcome variable 'poverty rate'</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Term </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t stat. </th>
   <th style="text-align:left;"> p-Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: white;"> Intercept </td>
   <td style="text-align:right;background-color: white;"> 14.196 </td>
   <td style="text-align:right;background-color: white;"> 0.283 </td>
   <td style="text-align:right;background-color: white;"> 50.21 </td>
   <td style="text-align:left;background-color: white;"> &lt;0.0001 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #314f4f;"> Min. Wage </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #314f4f;"> -0.203 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #314f4f;"> 0.051 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #314f4f;"> -3.99 </td>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #314f4f;"> &lt;0.0001 </td>
  </tr>
</tbody>
</table>
]
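One connection worth seeing numerically: with exactly two periods, first differences (with an intercept) and fixed effects for state and year give the same slope. A sketch in base .mono[R] on the 2000 and 2010 rows from the fixed-effects example (six rows, so the standard errors mean little).

```r
panel_df <- data.frame(
  state        = rep(c("CA", "OR", "WA"), each = 2),
  year         = rep(c(2000, 2010), times = 3),
  poverty_rate = c(12.7, 16.3, 10.9, 14.2, 10.8, 11.5),
  min_wage     = c(6.25, 8.00, 6.50, 8.40, 6.50, 8.55)
)

# First differences: one 2010-minus-2000 change per state
p00 <- panel_df[panel_df$year == 2000, ]
p10 <- panel_df[panel_df$year == 2010, ]
stopifnot(identical(p00$state, p10$state))  # rows line up state by state
d_poverty  <- p10$poverty_rate - p00$poverty_rate
d_min_wage <- p10$min_wage - p00$min_wage
fd <- lm(d_poverty ~ d_min_wage)  # intercept captures the common 2000-to-2010 change

# Fixed effects for state and year, estimated on the levels
fe <- lm(poverty_rate ~ min_wage + factor(state) + factor(year), data = panel_df)

c(first_diff = coef(fd)[["d_min_wage"]], fixed_effects = coef(fe)[["min_wage"]])
```

With more than two periods the two estimators generally differ, so different point estimates, like the ones in the tables above, are not a contradiction by themselves.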

---
layout: false
class: clear, middle

**Q:** Conclusions?

**A:** Models (and their requirements) can .hi[*really*] affect your results.
---
class: clear, middle

Evaluations

---
layout: false
# Table of contents

.pull-left[
### Admin
.smallest[

1. [Schedule](#schedule)
1. [Final info](#final_review)
]
]

.pull-right[
### Panel data
.smallest[

1. [Introduction](#intro)
1. [Definition](#definition)
1. [Example: Minimum wage](#ex_wage)
1. [Fixed effects](#fe)
1. [Fixed effects in .mono[R]](#fe_r)
1. [First differences](#diff)
]
]
---
exclude: true