class: center, middle, inverse, title-slide

# The Fundamental Problem of Econometrics
## EC 320: Introduction to Econometrics
### Winter 2022

---
class: inverse, middle

# Prologue

---
# Statistics Inform Policy

__Policy:__ In 2017, the University of Oregon started requiring first-year students to live on campus.

__Rationale:__ First-year students who live on campus fare better than those who live off campus.

- _80 percent more likely_ to graduate in four years.
- Second-year retention rate _5 percentage points higher_.
- GPAs _0.13 points higher_, on average.

--

.hi-pink[Do these comparisons suggest that the policy will improve student outcomes?]

--

.pink[Do they describe the effect of living on campus?]

--

.pink[Do they describe] .hi-pink[_something else_?]

---
# Other Things Equal

The UO's interpretation of those comparisons warrants skepticism.

- The decision to live on campus is probably related to family wealth and interest in school.
- Family wealth and interest in school are also related to academic achievement.

--

__Why?__ The difference in outcomes between those on and off campus is not an .hi-purple[_other things equal_]<sup>*</sup> comparison.

__Upshot:__ We can't attribute the difference in outcomes solely to living on campus.

.footnote[
<sup>*</sup> _Other things equal_ .mono[=] _ceteris paribus_, _all else held constant_, _etc_.
]

---
# Other Things Equal

## A high bar

When all other factors are held constant, statistical comparisons detect causal relationships.

--

(Micro)economics has developed a comparative advantage in understanding where .hi-purple[_other things equal_] comparisons can and cannot be made.

- Anyone can retort "_correlation doesn't necessarily imply causation_."
- Understanding _why_ is difficult, but useful for learning from data.

---
class: inverse, middle

# The Fundamental Problem of Econometrics

---
# Causal Identification

## Goal

Identify the effect of a .hi[treatment] on an .hi[outcome].
--

## Ideal data

Ideally, we could calculate the .hi[treatment effect] *for each individual* as

`$$Y_{1,i} - Y_{0,i}$$`

- `\(Y_{1,i}\)` is the outcome for person `\(i\)` when she receives the treatment.
- `\(Y_{0,i}\)` is the outcome for person `\(i\)` when she does not receive the treatment.
- Known as .pink[potential outcomes].

---
# Causal Identification

## Ideal data

.pull-left[
The *ideal* data for 10 people

```
#>     i trt  y1i  y0i
#> 1   1   1 5.01 2.56
#> 2   2   1 8.85 2.53
#> 3   3   1 6.31 2.67
#> 4   4   1 5.97 2.79
#> 5   5   1 7.61 4.34
#> 6   6   0 7.63 4.15
#> 7   7   0 4.75 0.56
#> 8   8   0 5.77 3.52
#> 9   9   0 7.47 4.49
#> 10 10   0 7.79 1.40
```
]

--

.pull-right[
Calculate the causal effect of treatment.

$$
`\begin{align}
  \tau_i = y_{1,i} - y_{0,i}
\end{align}`
$$

for each individual `\(i\)`.
]

---
count: false
# Causal Identification

## Ideal data

.pull-left[
The *ideal* data for 10 people

```
#>     i trt  y1i  y0i effect_i
#> 1   1   1 5.01 2.56     2.45
#> 2   2   1 8.85 2.53     6.32
#> 3   3   1 6.31 2.67     3.64
#> 4   4   1 5.97 2.79     3.18
#> 5   5   1 7.61 4.34     3.27
#> 6   6   0 7.63 4.15     3.48
#> 7   7   0 4.75 0.56     4.19
#> 8   8   0 5.77 3.52     2.25
#> 9   9   0 7.47 4.49     2.98
#> 10 10   0 7.79 1.40     6.39
```
]

.pull-right[
Calculate the causal effect of treatment.

$$
`\begin{align}
  \tau_i = y_{1,i} - y_{0,i}
\end{align}`
$$

for each individual `\(i\)`.
]

---
count: false
# Causal Identification

## Ideal data

.pull-left[
The *ideal* data for 10 people

```
#>     i trt  y1i  y0i effect_i
#> 1   1   1 5.01 2.56     2.45
#> 2   2   1 8.85 2.53     6.32
#> 3   3   1 6.31 2.67     3.64
#> 4   4   1 5.97 2.79     3.18
#> 5   5   1 7.61 4.34     3.27
#> 6   6   0 7.63 4.15     3.48
#> 7   7   0 4.75 0.56     4.19
#> 8   8   0 5.77 3.52     2.25
#> 9   9   0 7.47 4.49     2.98
#> 10 10   0 7.79 1.40     6.39
```
]

.pull-right[
Calculate the causal effect of treatment.

$$
`\begin{align}
  \tau_i = y_{1,i} - y_{0,i}
\end{align}`
$$

for each individual `\(i\)`.

The mean of `\(\tau_i\)` is the<br>.hi[average treatment effect] (.pink[ATE]).
Thus, `\(\color{#e64173}{\overline{\tau} = 3.82}\)`
]

---
# Fundamental Problem of Econometrics

## Ideal comparison

$$
`\begin{align}
  \tau_i = \color{#e64173}{y_{1,i}} &- \color{#9370DB}{y_{0,i}}
\end{align}`
$$

This comparison highlights the fundamental problem of econometrics.

--

## The problem

- If we observe `\(\color{#e64173}{y_{1,i}}\)`, then we cannot observe `\(\color{#9370DB}{y_{0,i}}\)`.
- If we observe `\(\color{#9370DB}{y_{0,i}}\)`, then we cannot observe `\(\color{#e64173}{y_{1,i}}\)`.
- We can only observe what actually happened; we cannot observe the **counterfactual**.

---
# Fundamental Problem of Econometrics

A dataset that we can observe for 10 people looks something like

.pull-left[
```
#>     i trt  y1i  y0i
#> 1   1   1 5.01   NA
#> 2   2   1 8.85   NA
#> 3   3   1 6.31   NA
#> 4   4   1 5.97   NA
#> 5   5   1 7.61   NA
#> 6   6   0   NA 4.15
#> 7   7   0   NA 0.56
#> 8   8   0   NA 3.52
#> 9   9   0   NA 4.49
#> 10 10   0   NA 1.40
```
]

--

.pull-right[
We can't observe both `\(\color{#e64173}{y_{1,i}}\)` and `\(\color{#9370DB}{y_{0,i}}\)` for the same individual.

But, we do observe

- `\(\color{#e64173}{y_{1,i}}\)` for `\(i\)` in 1, 2, 3, 4, 5
- `\(\color{#9370DB}{y_{0,j}}\)` for `\(j\)` in 6, 7, 8, 9, 10
]

--

**Q:** How do we "fill in" the `NA`s and estimate `\(\overline{\tau}\)`?

---
# Estimating Causal Effects

**Notation:** `\(D_i\)` is a binary indicator variable such that

- `\(\color{#e64173}{D_i=1}\)` .pink[if individual] `\(\color{#e64173}{i}\)` .pink[is treated].
- `\(\color{#9370DB}{D_i=0}\)` .purple[if individual] `\(\color{#9370DB}{i}\)` .purple[is not treated (*control* group).]

--

Then, rephrasing the previous slide,

- We only observe `\(\color{#e64173}{y_{1,i}}\)` when `\(\color{#e64173}{D_{i}=1}\)`.
- We only observe `\(\color{#9370DB}{y_{0,i}}\)` when `\(\color{#9370DB}{D_{i}=0}\)`.

--

**Q:** How can we estimate `\(\overline{\tau}\)` using only `\(\left(\color{#e64173}{y_{1,i}|D_i=1}\right)\)` and `\(\left(\color{#9370DB}{y_{0,i}|D_i=0}\right)\)`?
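---
# Estimating Causal Effects

## Sketch: from ideal data to observed data

A minimal Python sketch of the bookkeeping so far, using the 10-person table from the earlier slides. The variable names (`ideal`, `observed`, `ate`) are illustrative choices, not part of the course materials. It computes each `\(\tau_i\)` and their mean from the ideal data, then hides the counterfactual for each person according to `\(D_i\)`.

```python
# Ideal data from the slides: (i, D_i, y1_i, y0_i)
ideal = [
    (1, 1, 5.01, 2.56), (2, 1, 8.85, 2.53), (3, 1, 6.31, 2.67),
    (4, 1, 5.97, 2.79), (5, 1, 7.61, 4.34), (6, 0, 7.63, 4.15),
    (7, 0, 4.75, 0.56), (8, 0, 5.77, 3.52), (9, 0, 7.47, 4.49),
    (10, 0, 7.79, 1.40),
]

# Individual effects tau_i = y1_i - y0_i; only computable with ideal data.
effects = [y1 - y0 for _, _, y1, y0 in ideal]

# Average treatment effect: the mean of the individual effects.
ate = sum(effects) / len(effects)

# Observed data: keep y1 only when D = 1 and y0 only when D = 0.
# None plays the role of NA (the unobserved counterfactual).
observed = [
    (i, d, y1 if d == 1 else None, y0 if d == 0 else None)
    for i, d, y1, y0 in ideal
]

print(f"ATE from ideal data: {ate:.3f}")
for row in observed:
    print(row)
```

The computed mean agrees with the `\(\overline{\tau} = 3.82\)` reported earlier, up to rounding. With only `observed` in hand, no `\(\tau_i\)` can be computed directly, because every row is missing one of its two potential outcomes.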
---
# Estimating Causal Effects

**Q:** How can we estimate `\(\overline{\tau}\)` using only `\(\left(\color{#e64173}{y_{1,i}|D_i=1}\right)\)` and `\(\left(\color{#9370DB}{y_{0,i}|D_i=0}\right)\)`?

--

**Idea:** What if we compare the groups' means? _I.e._,

$$
`\begin{align}
  \color{#e64173}{\mathop{Avg}\left( y_i\mid D_i = 1 \right)} - \color{#9370DB}{\mathop{Avg}\left( y_i\mid D_i =0 \right)}
\end{align}`
$$

--

**Q:** When does a simple difference-in-means provide information on the .hi-slate[causal effect] of the treatment?

--

**Q.sub[2.0]:** Is `\(\color{#e64173}{\mathop{Avg}\left( y_i\mid D_i = 1 \right)} - \color{#9370DB}{\mathop{Avg}\left( y_i\mid D_i =0 \right)}\)` a *good* estimator for `\(\overline{\tau}\)`?

---
# Estimating Causal Effects

**Assumption:** Let `\(\tau_i = \tau\)` for all `\(i\)`.

- The treatment effect is equal (constant) across all individuals `\(i\)`.

--

**Note:** We defined

$$
`\begin{align}
  \tau_i = \tau = \color{#e64173}{y_{1,i}} - \color{#9370DB}{y_{0,i}}
\end{align}`
$$

which implies

$$
`\begin{align}
  \color{#e64173}{y_{1,i}} = \color{#9370DB}{y_{0,i}} + \tau
\end{align}`
$$

---
class: clear-slide

**Q:** Is `\(\color{#e64173}{\mathop{Avg}\left( y_i\mid D_i = 1 \right)} - \color{#9370DB}{\mathop{Avg}\left( y_i\mid D_i =0 \right)}\)` a *good* estimator for `\(\tau\)`?
--

Difference-in-means

--

<br> `\(\quad \color{#ffffff}{\Bigg|}=\color{#e64173}{\mathop{Avg}\left( y_i\mid D_i = 1 \right)} - \color{#9370DB}{\mathop{Avg}\left( y_i\mid D_i =0 \right)}\)`

--

<br> `\(\quad \color{#ffffff}{\Bigg|}=\color{#e64173}{\mathop{Avg}\left( y_{1,i}\mid D_i = 1 \right)} - \color{#9370DB}{\mathop{Avg}\left( y_{0,i}\mid D_i =0 \right)}\)`

--

<br> `\(\quad \color{#ffffff}{\Bigg|}=\color{#e64173}{\mathop{Avg}\left( \color{#000000}{\tau \: +} \: \color{#9370DB}{y_{0,i}} \mid D_i = 1 \right)} - \color{#9370DB}{\mathop{Avg}\left( y_{0,i}\mid D_i =0 \right)}\)`

--

<br> `\(\quad \color{#ffffff}{\Bigg|}=\tau + \color{#e64173}{\mathop{Avg}\left(\color{#9370DB}{y_{0,i}} \mid D_i = 1 \right)} - \color{#9370DB}{\mathop{Avg}\left( y_{0,i}\mid D_i =0 \right)}\)`

--

<br> `\(\quad \color{#ffffff}{\Bigg|}= \text{Average causal effect} + \color{#FD5F00}{\text{Selection bias}}\)`

--

Our proposed difference-in-means estimator gives us the sum of

1. `\(\tau\)`, the .hi-slate[causal, average treatment effect] that we want.
2. .hi-orange[Selection bias:] How much the treatment and control groups differ, on average, in their untreated outcomes `\(y_{0,i}\)`.

---
class: inverse, middle

# Randomized Control Trials

---
# Selection Bias

**Problem:** The existence of selection bias precludes *all else equal* comparisons.

- To make valid comparisons that yield causal effects, we need to shut down the bias term.

--

**Potential solution:** Conduct an experiment.

- How? .hi[Random assignment of treatment].
- Hence the name, .hi[*randomized* control trial] (RCT).

---
# Randomized Control Trials

## Example: Effect of de-worming on attendance

**Motivation:** Intestinal worms are common among children in less-developed countries. The symptoms of these parasites can keep school-aged children at home, disrupting human capital accumulation.

**Policy Question:** Do school-based de-worming interventions provide a cost-effective way to increase school attendance?
---
# Randomized Control Trials

## Example: Effect of de-worming on attendance

**Research Question:** How much do de-worming interventions increase school attendance?

**Q:** Could we simply compare average attendance among children with and without access to de-worming medication?

--

<br>**A:** If we're after the causal effect, probably not.

--

<br><br>**Q:** Why not?

--

<br>**A:** Selection bias: Families with access to de-worming medication probably have healthier children for other reasons, too (wealth, access to clean drinking water, *etc.*).<br>.pink[Can't make an *all else equal* comparison. Biased and/or spurious results.]

---
# Randomized Control Trials

## Example: Effect of de-worming on attendance

**Solution:** Run an experiment.

--

Imagine an RCT where we have two groups:

- .hi-slate[Treatment:] Villages where children get de-worming medication in school.
- .hi-slate[Control:] Villages where children don't get de-worming medication in school (status quo).

--

By randomizing villages into .hi-slate[treatment] or .hi-slate[control], we will, on average, include all kinds of villages (poor _vs._ less poor, access to clean water _vs._ contaminated water, hospital _vs._ no hospital, *etc.*) in both groups.

--

*All else equal*!
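---
# Randomized Control Trials

## Sketch: randomization and the bias term

A small simulation, offered as a sketch under made-up assumptions, of why random assignment shuts down the bias term. Everything numeric here is hypothetical: the constant effect `TAU`, the number of simulated villages `N`, the `wealth` confounder, and the rule that wealthier villages opt in. The code compares the difference in means and the selection-bias term `\(\mathop{Avg}(y_{0,i} \mid D_i = 1) - \mathop{Avg}(y_{0,i} \mid D_i = 0)\)` under self-selection and under a coin flip.

```python
import random
from statistics import mean

random.seed(320)  # arbitrary seed so the sketch is reproducible

TAU = 2.0    # hypothetical constant treatment effect
N = 10_000   # hypothetical number of villages (many more than 54)

# Hypothetical confounder: wealth raises untreated attendance y0.
wealth = [random.gauss(0, 1) for _ in range(N)]
y0 = [5 + w + random.gauss(0, 1) for w in wealth]
y1 = [y + TAU for y in y0]  # constant-effect assumption: y1 = y0 + tau


def diff_and_bias(d):
    """Return (difference in means, selection bias) for assignment d."""
    treated_y1 = [y1[i] for i in range(N) if d[i] == 1]  # observed
    control_y0 = [y0[i] for i in range(N) if d[i] == 0]  # observed
    treated_y0 = [y0[i] for i in range(N) if d[i] == 1]  # never observed in practice
    diff = mean(treated_y1) - mean(control_y0)
    bias = mean(treated_y0) - mean(control_y0)
    return diff, bias


# Self-selection: wealthier villages end up treated.
d_selected = [1 if w > 0 else 0 for w in wealth]
# Random assignment: a fair coin flip, unrelated to wealth.
d_random = [random.randint(0, 1) for _ in range(N)]

diff_sel, bias_sel = diff_and_bias(d_selected)
diff_rct, bias_rct = diff_and_bias(d_random)

print(f"self-selected: diff = {diff_sel:.2f}, bias = {bias_sel:.2f}")
print(f"randomized:    diff = {diff_rct:.2f}, bias = {bias_rct:.2f}")
```

In both cases the difference in means equals `TAU` plus the bias term. Under self-selection the bias is large because treated villages are wealthier. Under the coin flip it should be close to zero, so the difference in means lands near `TAU`.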
---
class: clear-slide

.hi-slate[54 villages]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot1-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54 villages] .hi[of varying levels of development]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot2-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54 villages] .hi[of varying levels of development] .hi-orange[plus randomly assigned treatment]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot3_1-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54 villages] .hi[of varying levels of development] .hi-orange[plus randomly assigned treatment]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot3_2-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54 villages] .hi[of varying levels of development] .hi-orange[plus randomly assigned treatment]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot3_3-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54 villages] .hi[of varying levels of development] .hi-orange[plus randomly assigned treatment]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot3_4-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54 villages] .hi[of varying levels of development] .hi-orange[plus randomly assigned treatment]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot3_5-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54 villages] .hi[of varying levels of development] .hi-orange[plus randomly assigned treatment]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot3_6-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54
villages] .hi[of varying levels of development] .hi-orange[plus randomly assigned treatment]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot3_7-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54 villages] .hi[of varying levels of development] .hi-orange[plus randomly assigned treatment]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot3_8-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54 villages] .hi[of varying levels of development] .hi-orange[plus randomly assigned treatment]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot3_9-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54 villages] .hi[of varying levels of development] .hi-orange[plus randomly assigned treatment]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot3_10-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54 villages] .hi[of varying levels of development] .hi-orange[plus randomly assigned treatment]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot3_11-1.png" style="display: block; margin: auto;" />

---
class: clear-slide
count: false

.hi-slate[54 villages] .hi[of varying levels of development] .hi-orange[plus randomly assigned treatment]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/plot3_12-1.png" style="display: block; margin: auto;" />

---
# Randomized Control Trials

## Example: Effect of de-worming on attendance

We can estimate the .hi[causal effect] of de-worming on school attendance by comparing the average attendance rates in the treatment group (💊) with those in the control group (no 💊).
$$
`\begin{align}
  \overline{\text{Attendance}}_\text{Treatment} - \overline{\text{Attendance}}_\text{Control}
\end{align}`
$$

--

Alternatively, we can use the regression

--

$$
`\begin{align}
  \text{Attendance}_i = \beta_0 + \beta_1 \text{Treatment}_i + u_i \tag{1}
\end{align}`
$$

where `\(\text{Treatment}_i\)` is a binary variable (=1 if village `\(i\)` received the de-worming treatment).

--

**Q:** Should we trust the results of `\((1)\)`? Why?
<br>**A:** On average, .hi[randomly assigning treatment should balance] the treatment and control groups across the other dimensions that affect school attendance.

---
class: clear-slide

.hi[Randomization can go wrong!]

<img src="04-Fundamental_Econometric_Problem_files/figure-html/fertilizer_plot3_bad-1.png" style="display: block; margin: auto;" />

---
# Causality

## Example: Returns to education

The optimal investment in education by students, parents, and legislators depends in part on the monetary *return to education*.

--

.hi-purple[Thought experiment:]

- Randomly select an individual.
- Give her an additional year of education.
- How much do her earnings increase?

The change in her earnings describes the .hi-slate[causal effect] of education on earnings.

---
# Causality

## Example: Returns to education

**Q:** Could we simply compare the earnings of those with more education to those with less?

--

<br>**A:** If we want to measure the causal effect, probably not.

--

1. People *choose* education based on their ability and other factors.
1. High-ability people tend to earn more *and* stay in school longer.
1. Education likely reduces experience (time out of the workforce).

--

Point (3) also illustrates the difficulty in learning about the effect of education while *holding all else constant*.

Many important variables pose the same challenge: gender, race, income.

---
# Causality

## Example: Returns to education

**Q:** How can we estimate the returns to education?

--

.hi-slate[Option 1:] Run an .hi[experiment].
--

- Randomly .pink[assign education] (might be difficult).
- Randomly .pink[encourage education] (might work).
- Randomly .pink[assign programs] that affect education (*e.g.*, mentoring).

--

.hi-slate[Option 2:] Look for a .hi-purple[*natural experiment*] (a policy or accident in society that arbitrarily increased education for one subset of people).

--

- Admissions .purple[cutoffs]