class: center, middle, inverse, title-slide

# Panel Data
## EC 421, Set 11
### Edward Rubin
### 12 March 2019

---
class: inverse, middle

# Prologue

---
name: schedule

# Schedule

## Last Time

Instrumental variables and causality

## Today

Panel data

## Upcoming

- Assignment due .hi[Saturday]
- Final on Monday

---
name: final_review

# Final

## Information

1. The final is .hi[Monday].
    - The final will .hi[cover *all* material] from this course.
    - Expect .hi[recent topics] (time series to today) to dominate.
    - Don't neglect .hi[major topics] (_e.g._, omitted-variable bias).
1. This week's .hi[labs] will cover IV and homework.
1. .hi[Review session] this weekend w/ GEs.

---
layout: false
class: inverse, middle

# Panel data

---
layout: true
# Panel data
## Intro

---
exclude: true

---
name: intro

We've considered two types of data (each with one dimension):

--

.pull-left[
.hi-orange[Cross-sectional data:] individual `\(i\)`

```
#>    state year min_wage poverty_rate
#> 45    UT 2017     7.25          8.6
#> 46    VT 2017    10.00         10.2
#> 47    VA 2017     7.25         10.3
#> 48    WA 2017    11.00          9.9
#> 49    WV 2017     8.75         17.3
#> 50    WI 2017     7.25          9.5
#> 51    WY 2017     7.25         12.4
```
]

--

.pull-right[
.hi-purple[Time-series data:] time `\(t\)`

```
#>    state year min_wage poverty_rate
#> 32    OR 2011     8.50         14.4
#> 33    OR 2012     8.80         13.5
#> 34    OR 2013     8.95         15.1
#> 35    OR 2014     9.10         14.4
#> 36    OR 2015     9.25         11.9
#> 37    OR 2016     9.75         11.8
#> 38    OR 2017     9.75         10.2
```
]

---
count: false

We've considered two types of data (each with one dimension):

.pull-left[
.hi-orange[Cross-sectional data:] individual `\(i\)`

<img src="12_panel_data_files/figure-html/cross_sectional_plot-1.svg" style="display: block; margin: auto;" />
]

.pull-right[
.hi-purple[Time-series data:] time `\(t\)`

<img src="12_panel_data_files/figure-html/time_series_plot-1.svg" style="display: block; margin: auto;" />
]

--

.hi-pink[*Panel data*] combine these data types/dimensions: individual `\(i\)` **at** time `\(t\)`.
---
layout: false
class: clear

.hi-pink[*Panel data*] combine these data types/dimensions: individual `\(i\)` **at** time `\(t\)`.

<img src="12_panel_data_files/figure-html/panel_plot-1.svg" style="display: block; margin: auto;" />

---
layout: true
# Panel data

---
name: definition

## Definition

.pull-left[
With .hi-pink[*panel data*], we have

- .hi-purple[repeated observations] `\((t)\)`
- on .hi-orange[multiple individuals] `\((i)\)`.
]

--

.pull-right[
```
#>   state year poverty_rate min_wage
#> 1    CA 1990         13.9     4.25
#> 2    CA 2000         12.7     6.25
#> 3    CA 2010         16.3     8.00
#> 4    OR 1990          9.2     4.25
#> 5    OR 2000         10.9     6.50
#> 6    OR 2010         14.2     8.40
#> 7    WA 1990          8.9     4.25
#> 8    WA 2000         10.8     6.50
#> 9    WA 2010         11.5     8.55
```
]

--

Thus, our regression equation with a panel dataset looks like

$$
`\begin{align}
  y_{\color{#FFA500}{i}\color{#6A5ACD}{t}} = \beta_0 + \beta_1 x_{\color{#FFA500}{i}\color{#6A5ACD}{t}} + u_{\color{#FFA500}{i}\color{#6A5ACD}{t}}
\end{align}`
$$

for .orange[individual] `\(\color{#FFA500}{i}\)` in .purple[time] `\(\color{#6A5ACD}{t}\)`.

---
name: ex_wage

## Example

Minimum-wage laws involve many contentious/important policy questions.

- Do minimum-wage laws .hi[increase well-being] for minimum-wage earners and their families?
- Do minimum-wage laws .hi[increase unemployment]?
- Overall, do minimum-wage laws .hi[decrease poverty]?

--

We want to know the causal effect of the minimum wage, _i.e._, `\(\beta_1\)` in

$$
`\begin{align}
  \left( \text{Poverty Rate} \right)_{it} = \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{it} + u_{it}
\end{align}`
$$

where `\(i\)` denotes state and `\(t\)` indexes year.

---

## Example

If we go ahead and run OLS in our panel, we find

--

<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">OLS w/ outcome variable 'poverty rate'</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Term </th>
   <th style="text-align:right;"> Est.
</th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t stat. </th>
   <th style="text-align:left;"> p-Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: white;"> Intercept </td>
   <td style="text-align:right;background-color: white;"> 14.196 </td>
   <td style="text-align:right;background-color: white;"> 0.283 </td>
   <td style="text-align:right;background-color: white;"> 50.21 </td>
   <td style="text-align:left;background-color: white;"> <0.0001 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #e64173;"> Min. Wage </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #e64173;"> -0.203 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #e64173;"> 0.051 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #e64173;"> -3.99 </td>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #e64173;"> <0.0001 </td>
  </tr>
</tbody>
</table>

--

which suggests that a one-dollar increase in the minimum wage significantly .pink[*reduces*] poverty by approximately 0.203 percentage points.

--

Surprising?

---

## Example: Causality is still hard

To isolate the causal effect of minimum wage on poverty in

$$
`\begin{align}
  \left( \text{Poverty Rate} \right)_{it} = \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{it} + u_{it}
\end{align}`
$$

we still need exogeneity, _i.e._, `\(\mathop{\boldsymbol{E}}\left[ u_{it} \mid \left( \text{Min. Wage} \right) \right] = 0\)`.

--

.hi[Exogeneity with *panel data:*] Are there omitted factors that affect both a state's minimum wage *and* its poverty rate?

--

We are going to discuss two common panel-data strategies:

1. .hi[Fixed effects]
2. .hi[First differences]

---
name: fe

## Fixed effects

.hi[*Fixed effects*] are binary indicator variables that *help* control for unobserved differences across individuals or time periods.

For example, we can include a .hi-orange[fixed effect for each individual state] `\(\color{#FFA500}{i}\)` to control for unobserved, time-invariant differences between states:

--

$$
`\begin{align}
  \left( \text{Poverty Rate} \right)_{it} = \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{it} + \color{#FFA500}{\text{State}_i} + u_{it}
\end{align}`
$$

--

```
#>   state year poverty_rate min_wage fe_ca fe_or fe_wa
#> 1    CA 2000         12.7     6.25     1     0     0
#> 2    CA 2010         16.3     8.00     1     0     0
#> 3    OR 2000         10.9     6.50     0     1     0
#> 4    OR 2010         14.2     8.40     0     1     0
#> 5    WA 2000         10.8     6.50     0     0     1
#> 6    WA 2010         11.5     8.55     0     0     1
```

---

## Fixed effects

Notice that these individual fixed effects are just .pink[individual-specific intercepts]: now each unit/individual gets her own intercept.

--

**Q:** What are these individual-level fixed effects (FEs) doing?

--

**A.sub[1]:** They remove each individual's mean, _i.e._, `\(y_{it} - \overline{y}_i\)` and `\(x_{it} - \overline{x}_i\)`.

--

**A.sub[2]:** They control for unobserved, time-invariant differences between units.<sup>.pink[†]</sup>

.footnote[
.pink[†] By *time-invariant differences* we mean differences between individuals that do not change over time.
]

---
layout: false
class: clear

In the raw data (no fixed effects/demeaning), individuals differ in levels.

<img src="12_panel_data_files/figure-html/panel_plot_raw-1.svg" style="display: block; margin: auto;" />

---
class: clear

Individual-fixed effects remove individuals' means.

<img src="12_panel_data_files/figure-html/panel_plot_fe-1.svg" style="display: block; margin: auto;" />

---
layout: true
# Panel data
## Fixed effects

---

Fixed effects are one way econometricians try to "match" individuals to generate a valid control group for our treated individuals.

Toward this goal, we can also include .hi-purple[fixed effects for each time period] `\(\color{#6A5ACD}{t}\)` to (attempt to) control for shocks that affect all individuals in a given period.

--

$$
`\begin{align}
  \left( \text{Poverty Rate} \right)_{it} = \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{it} + \color{#FFA500}{\text{State}_i} + \color{#6A5ACD}{\text{Year}_t} + u_{it}
\end{align}`
$$

```
#>   state year poverty_rate min_wage fe_ca fe_or fe_wa fe_2000 fe_2010
#> 1    CA 2000         12.7     6.25     1     0     0       1       0
#> 2    CA 2010         16.3     8.00     1     0     0       0       1
#> 3    OR 2000         10.9     6.50     0     1     0       1       0
#> 4    OR 2010         14.2     8.40     0     1     0       0       1
#> 5    WA 2000         10.8     6.50     0     0     1       1       0
#> 6    WA 2010         11.5     8.55     0     0     1       0       1
```

---
layout: true
# Panel data
## Fixed-effects estimation in .mono[R]

---
name: fe_r

.mono[R] makes estimation with fixed effects really easy.

--

As always, you have options. We're going to use the `felm()` function from the `lfe` package.

--

.hi[General notation:]<br> `felm(y ~ x1 + x2 + ⋯ | fe1 + fe2 + ⋯, data = some_data)`

--

.hi[Our example:]<br> `felm(poverty_rate ~ min_wage | state + year, data = panel_df)`

---

`felm(poverty_rate ~ min_wage | state + year, data = panel_df)`

--

<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">Fixed effects w/ outcome variable 'poverty rate'</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Term </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t stat. </th>
   <th style="text-align:left;"> p-Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #FFA500;"> Min.
Wage </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #FFA500;"> 0.374 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #FFA500;"> 0.109 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #FFA500;"> 3.43 </td>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #FFA500;"> 0.0006 </td>
  </tr>
</tbody>
</table>

--

`lm(poverty_rate ~ min_wage, data = panel_df)`

<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">OLS w/ outcome variable 'poverty rate'</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Term </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t stat. </th>
   <th style="text-align:left;"> p-Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: white;"> Intercept </td>
   <td style="text-align:right;background-color: white;"> 14.196 </td>
   <td style="text-align:right;background-color: white;"> 0.283 </td>
   <td style="text-align:right;background-color: white;"> 50.21 </td>
   <td style="text-align:left;background-color: white;"> <0.0001 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #314f4f;"> Min. Wage </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #314f4f;"> -0.203 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #314f4f;"> 0.051 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #314f4f;"> -3.99 </td>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #314f4f;"> <0.0001 </td>
  </tr>
</tbody>
</table>

---

**Q:** Which set of estimates should we believe?

--

**A:** The set that you believe meets our exogeneity requirement.
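
---

The two answers to "what are fixed effects doing?" can be checked by hand. Below is a minimal sketch, assuming only base .mono[R] and the nine-row CA/OR/WA panel from the definition slide. The names `toy_df`, `fe_dummy`, and `fe_demean` are made up for illustration and are not part of the course code. It compares state indicator variables with demeaning by state; the two routes should give the same slope on `min_wage`.

```r
# Toy panel copied from the definition slide (3 states x 3 years)
toy_df <- data.frame(
  state = rep(c("CA", "OR", "WA"), each = 3),
  year = rep(c(1990, 2000, 2010), times = 3),
  poverty_rate = c(13.9, 12.7, 16.3, 9.2, 10.9, 14.2, 8.9, 10.8, 11.5),
  min_wage = c(4.25, 6.25, 8.00, 4.25, 6.50, 8.40, 4.25, 6.50, 8.55)
)

# Route 1: state fixed effects as binary indicator variables
fe_dummy <- lm(poverty_rate ~ min_wage + factor(state), data = toy_df)

# Route 2: subtract each state's mean, then run OLS without an intercept
# (ave() returns the group mean for every row by default)
toy_df$y_dm <- toy_df$poverty_rate - ave(toy_df$poverty_rate, toy_df$state)
toy_df$x_dm <- toy_df$min_wage - ave(toy_df$min_wage, toy_df$state)
fe_demean <- lm(y_dm ~ x_dm - 1, data = toy_df)

# Compare the two slope estimates
b_dummy <- unname(coef(fe_dummy)["min_wage"])
b_demean <- unname(coef(fe_demean)["x_dm"])
print(c(dummy = b_dummy, demeaned = b_demean))
print(all.equal(b_dummy, b_demean))

# With the lfe package installed, the analogous call would be
# felm(poverty_rate ~ min_wage | state, data = toy_df)
```

The slopes agree, but the hand-demeaned `lm()` does not know that three state means were estimated first, so its standard errors use the wrong degrees of freedom. That is one reason to prefer a dedicated routine such as `felm()` in practice.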

---
layout: true
# Panel data
## First differences

---
name: diff

Another route, related to our time-series studies, uses .hi[*first differences*].

--

The .hi[*first difference*] for variable `\(y\)` is the difference between individual `\(i\)`'s current value of `\(y\)` (_i.e._, `\(y_{i,t}\)`) and his previous (lagged) value of `\(y\)` (_i.e._, `\(y_{i,t-1}\)`).

--

We write the first difference as

$$
`\begin{align}
  \Delta y_{i,t} = y_{i,t} - y_{i,t-1}
\end{align}`
$$

---

From our example, write the model for `\(t\)` and `\(t-1\)`

$$
`\begin{align}
  \left( \text{Poverty Rate} \right)_{i,t} &= \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{i,t} + u_{i,t} \tag{t} \\
  \left( \text{Poverty Rate} \right)_{i,t-1} &= \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{i,t-1} + u_{i,t-1} \tag{t-1}
\end{align}`
$$

--

Taking the difference between `\((t)\)` and `\((t-1)\)` gives

$$
`\begin{align}
  \left( \text{Poverty Rate} \right)_{i,t} - \left( \text{Poverty Rate} \right)_{i,t-1} =& \\
  \beta_0 - \beta_0 + \beta_1 \left( \text{Min. Wage} \right)_{i,t} - &\beta_1 \left( \text{Min. Wage} \right)_{i,t-1} + u_{i,t} - u_{i,t-1}
\end{align}`
$$

--

which implies

$$
`\begin{align}
  \Delta\left( \text{Poverty Rate} \right)_{i,t} &= \beta_1 \Delta \left( \text{Min. Wage} \right)_{i,t} + \Delta u_{i,t}
\end{align}`
$$

---

Estimating our model via first differences gives us the results

<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">First diff. w/ outcome variable 'poverty rate'</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Term </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t stat.
</th>
   <th style="text-align:left;"> p-Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: white;"> Intercept </td>
   <td style="text-align:right;background-color: white;"> -0.064 </td>
   <td style="text-align:right;background-color: white;"> 0.047 </td>
   <td style="text-align:right;background-color: white;"> -1.34 </td>
   <td style="text-align:left;background-color: white;"> 0.1811 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #e64173;"> Min. Wage </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #e64173;"> 0.221 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #e64173;"> 0.157 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #e64173;"> 1.41 </td>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #e64173;"> 0.1584 </td>
  </tr>
</tbody>
</table>

.white[space] <br>

--

.pull-left[
<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">Fixed effects w/ outcome variable 'poverty rate'</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Term </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t stat. </th>
   <th style="text-align:left;"> p-Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #FFA500;"> Min.
Wage </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #FFA500;"> 0.374 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #FFA500;"> 0.109 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #FFA500;"> 3.43 </td>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #FFA500;"> 0.0006 </td>
  </tr>
</tbody>
</table>
]

--

.pull-right[
<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">OLS w/ outcome variable 'poverty rate'</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Term </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t stat. </th>
   <th style="text-align:left;"> p-Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: white;"> Intercept </td>
   <td style="text-align:right;background-color: white;"> 14.196 </td>
   <td style="text-align:right;background-color: white;"> 0.283 </td>
   <td style="text-align:right;background-color: white;"> 50.21 </td>
   <td style="text-align:left;background-color: white;"> <0.0001 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #314f4f;"> Min. Wage </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #314f4f;"> -0.203 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #314f4f;"> 0.051 </td>
   <td style="text-align:right;background-color: white;font-weight: bold;color: #314f4f;"> -3.99 </td>
   <td style="text-align:left;background-color: white;font-weight: bold;color: #314f4f;"> <0.0001 </td>
  </tr>
</tbody>
</table>
]

---
layout: false
class: clear, middle

**Q:** Conclusions?

--

**A:** Models (and their requirements) can .hi[*really*] affect your results.

---
class: clear, middle

Evaluations

---
layout: false

# Table of contents

.pull-left[
### Admin
.smallest[

1. [Schedule](#schedule)
1. [Final info](#final_review)
]
]

.pull-right[
### Panel data
.smallest[

1. [Introduction](#intro)
1. [Definition](#definition)
1. [Example: Minimum wage](#ex_wage)
1. [Fixed effects](#fe)
1. [Fixed effects in .mono[R]](#fe_r)
1. [First differences](#diff)
]
]

---
exclude: true