class: center, middle, inverse, title-slide # How Economists Learn from Data ## EC 201: Principles of Microeconomics ### Kyle Raze ### Winter 2020 --- class: inverse, middle # Prologue --- class: clear-slide .center[**The Fading *American Dream***] <img src="09-Data_Learning_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" /> --- # Why is the *American Dream* Fading? **Policy Question:** Why are children's chances of climbing the income ladder falling in America? - What can we do to reverse this trend? Difficult to answer with historical data on macroeconomic trends. - Numerous coinciding changes over time make it difficult to test between alternative explanations. - Only a handful of data points. --- # Theoretical Social Science Historically, the social sciences had limited data to study important policy questions. **Result:** Social sciences were .pink[theoretical] fields. - Economists developed .pink[mathematical models]. - Sociologists developed .pink[qualitative theories]. - Both used their theories to make policy recommendations (*e.g.,* to improve upward mobility). -- **Problem:** Untested theories! - Five economists often have five different answers to the same question. - Leads to a politicization of questions that, in principle, have scientific answers (*e.g.,* would universal healthcare make Americans healthier?). --- # The Rise of Empirical Evidence Today, the social sciences are increasingly .pink[empirical] thanks to the growing availability of data. - Ability to test and improve theories using real-world data. - Analogous to the natural sciences. --- class: clear-slide .center[**Empirical Articles in Leading Economics Journals**] <img src="09-Data_Learning_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" /> --- # What are Data? .more-left[
] .less-right[ .pink[Rows] represent .hi-pink[observations]. .green[Columns] represent .hi-green[variables]. Each .purple[value] is associated with an .pink[observation] and a .green[variable]. ] --- class: inverse, middle # Making Comparisons --- class: clear-slide **Policy:** In 2017, the University of Oregon started requiring first-year students to live on campus. **Rationale:** First-year students who live on campus fare better than those who live off campus. - _80 percent more likely_ to graduate in four years. - Second-year retention rate _5 percentage points higher_. - GPAs _0.13 points higher_, on average. -- **Q:** .pink[Do these comparisons suggest that the policy will improve student outcomes?] -- **Q:** .pink[Do they describe the effect of living on campus?] **Q:** .pink[Do they describe *something else?*] --- # Other Things Equal The UO's interpretation of those comparisons warrants skepticism. - The decision to live on campus is probably related to family wealth and interest in school. - Family wealth and interest in school are also related to academic achievement. -- __Why?__ The difference in outcomes between those on and off campus is not an *other things equal* comparison. __Upshot:__ We can't attribute the difference in outcomes solely to living on campus. --- # Other Things Equal ## A high bar When all other factors are held constant, statistical comparisons detect causal relationships. -- (Micro)economics has developed a comparative advantage in understanding where .hi-purple[other things equal] comparisons can and cannot be made. - Anyone can retort "_correlation doesn't necessarily imply causation_." - Understanding _why_ is difficult, but useful for learning from data. --- # Causal Identification ## Goal Identify the effect of a .hi[treatment] on an .hi[outcome]. 
--

## Ideal comparison

Ideally, we could calculate the .hi[treatment effect] *for each individual* as

`$$Y_{1,i} - Y_{0,i}$$`

- `\(Y_{1,i}\)` is the outcome for person `\(i\)` when she receives the treatment.
- `\(Y_{0,i}\)` is the outcome for person `\(i\)` when she does not receive the treatment.
- Known as .pink[potential outcomes].

---
# Causal Identification

## Ideal data

.pull-left[

The *ideal* data for 10 people

```
#>     i treat Y_1i Y_0i
#> 1   1     1 5.01 4.56
#> 2   2     1 8.85 4.53
#> 3   3     1 6.31 4.67
#> 4   4     1 5.97 4.79
#> 5   5     1 7.61 6.34
#> 6   6     0 7.63 4.15
#> 7   7     0 4.75 0.56
#> 8   8     0 5.77 3.52
#> 9   9     0 7.47 4.49
#> 10 10     0 7.79 1.40
```

]

--

.pull-right[

Calculate the causal effect of treatment.

$$ `\begin{align} \tau_i = Y_{1,i} - Y_{0,i} \end{align}` $$

for each individual `\(i\)`.

]

---
count: false

# Causal Identification

## Ideal data

.pull-left[

The *ideal* data for 10 people

```
#>     i treat Y_1i Y_0i effect_i
#> 1   1     1 5.01 4.56     0.45
#> 2   2     1 8.85 4.53     4.32
#> 3   3     1 6.31 4.67     1.64
#> 4   4     1 5.97 4.79     1.18
#> 5   5     1 7.61 6.34     1.27
#> 6   6     0 7.63 4.15     3.48
#> 7   7     0 4.75 0.56     4.19
#> 8   8     0 5.77 3.52     2.25
#> 9   9     0 7.47 4.49     2.98
#> 10 10     0 7.79 1.40     6.39
```

]

.pull-right[

Calculate the causal effect of treatment.

$$ `\begin{align} \tau_i = Y_{1,i} - Y_{0,i} \end{align}` $$

for each individual `\(i\)`.

]

---
count: false

# Causal Identification

## Ideal data

.pull-left[

The *ideal* data for 10 people

```
#>     i treat Y_1i Y_0i effect_i
#> 1   1     1 5.01 4.56     0.45
#> 2   2     1 8.85 4.53     4.32
#> 3   3     1 6.31 4.67     1.64
#> 4   4     1 5.97 4.79     1.18
#> 5   5     1 7.61 6.34     1.27
#> 6   6     0 7.63 4.15     3.48
#> 7   7     0 4.75 0.56     4.19
#> 8   8     0 5.77 3.52     2.25
#> 9   9     0 7.47 4.49     2.98
#> 10 10     0 7.79 1.40     6.39
```

]

.pull-right[

Calculate the causal effect of treatment.

$$ `\begin{align} \tau_i = Y_{1,i} - Y_{0,i} \end{align}` $$

for each individual `\(i\)`.

The mean of `\(\tau_i\)` is the<br>.hi[average treatment effect].
Thus, `\(\color{#e64173}{\overline{\tau}}\)` .mono[=] 2.82

]

---
# Causal Identification

## Ideal comparison

$$ `\begin{align} \tau_i = \color{#e64173}{Y_{1,i}} &- \color{#9370DB}{Y_{0,i}} \end{align}` $$

Highlights a fundamental problem.

--

## The problem

- If we observe `\(\color{#e64173}{Y_{1,i}}\)`, then we cannot observe `\(\color{#9370DB}{Y_{0,i}}\)`.
- If we observe `\(\color{#9370DB}{Y_{0,i}}\)`, then we cannot observe `\(\color{#e64173}{Y_{1,i}}\)`.
- Can only observe what actually happened; cannot observe the **counterfactual**.

---
# Causal Identification

A dataset that we can observe for 10 people looks something like

.pull-left[

```
#>     i treat Y_1i Y_0i
#> 1   1     1 5.01   NA
#> 2   2     1 8.85   NA
#> 3   3     1 6.31   NA
#> 4   4     1 5.97   NA
#> 5   5     1 7.61   NA
#> 6   6     0   NA 4.15
#> 7   7     0   NA 0.56
#> 8   8     0   NA 3.52
#> 9   9     0   NA 4.49
#> 10 10     0   NA 1.40
```

]

--

.pull-right[

We can't observe `\(\color{#e64173}{Y_{1,i}}\)` and `\(\color{#9370DB}{Y_{0,i}}\)` at the same time.

But, we do observe

- `\(\color{#e64173}{Y_{1,i}}\)` for the first 5 observations.
- `\(\color{#9370DB}{Y_{0,i}}\)` for the last 5 observations.

]

---
# Making Comparisons

**Q:** How can we estimate the average treatment effect using the available data?

--

**Idea:** What if we compare the outcome means of each group?

- Take the average of `\(\color{#e64173}{Y_{1,i}}\)` for people who actually received the treatment (.pink[treatment group mean]).
- Take the average of `\(\color{#9370DB}{Y_{0,i}}\)` for people who didn't receive the treatment (.purple[control group mean]).

--

**Q:** Does .pink[treatment group mean] .mono[-] .purple[control group mean] isolate the causal effect of the treatment?
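---
# Making Comparisons

One way to explore this question is to compute both quantities for the ten-person example. The sketch below is illustrative only: the variable and function names are assumptions, and the average treatment effect it reports is computable only because the example pretends we can see both potential outcomes.

```python
# Potential outcomes for the ten people in the "ideal data" table.
# treat[i] == 1 means person i actually received the treatment.
treat = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y1 = [5.01, 8.85, 6.31, 5.97, 7.61, 7.63, 4.75, 5.77, 7.47, 7.79]
y0 = [4.56, 4.53, 4.67, 4.79, 6.34, 4.15, 0.56, 3.52, 4.49, 1.40]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Average treatment effect: needs BOTH potential outcomes for everyone,
# so it cannot be computed from data we could actually observe.
ate = mean([a - b for a, b in zip(y1, y0)])

# What observable data allow: Y_1 for the treated, Y_0 for the untreated.
treated_mean = mean([a for a, t in zip(y1, treat) if t == 1])
control_mean = mean([b for b, t in zip(y0, treat) if t == 0])
diff_in_means = treated_mean - control_mean

# Residual under the decomposition
# difference-in-means = average treatment effect + selection bias.
selection_bias = diff_in_means - ate

print(f"Treatment group mean:     {treated_mean:.3f}")
print(f"Control group mean:       {control_mean:.3f}")
print(f"Difference-in-means:      {diff_in_means:.3f}")
print(f"Average treatment effect: {ate:.3f}")
print(f"Selection bias:           {selection_bias:.3f}")
```

If the selection bias term is not zero, the difference-in-means does not equal the average treatment effect.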
---
# Making Comparisons

.pull-left[

```
#>     i treat Y_1i Y_0i
#> 1   1     1 5.01   NA
#> 2   2     1 8.85   NA
#> 3   3     1 6.31   NA
#> 4   4     1 5.97   NA
#> 5   5     1 7.61   NA
#> 6   6     0   NA 4.15
#> 7   7     0   NA 0.56
#> 8   8     0   NA 3.52
#> 9   9     0   NA 4.49
#> 10 10     0   NA 1.40
```

]

.pull-right[

.pink[Treatment group mean] .mono[=] 6.75

.purple[Control group mean] .mono[=] 2.82

Difference-in-means .mono[=] 3.93

]

--

Difference-in-means .mono[=] .hi-green[average treatment effect] .mono[+] .hi-orange[selection bias]
--
<br> `\(\quad\)` .mono[=] .green[2.82] .mono[+] .orange[(3.93 .mono[-] 2.82)]
--
.mono[=] .green[2.82] .mono[+] .orange[1.11]

--

.orange[Selection bias] .mono[!=] 0 .mono[==>] people who select into treatment are different than those who do not.

---
class: clear-slide

**Podcast Question:** According to Emily Oster,

> **A.** Any amount of alcohol consumption during pregnancy can harm a fetus.

> **B.** Large amounts of alcohol consumption during pregnancy do not necessarily harm a fetus.

> **C.** Small amounts of alcohol consumption during pregnancy do not necessarily harm a fetus.

---
count: false
class: clear-slide

**Podcast Question:** According to Emily Oster,

> **A.** Any amount of alcohol consumption during pregnancy can harm a fetus.

> **B.** Large amounts of alcohol consumption during pregnancy do not necessarily harm a fetus.

> .pink[**C.** Small amounts of alcohol consumption during pregnancy do not necessarily harm a fetus.]

---
class: inverse, middle

# Randomized Control Trials

---
# Overcoming Selection Bias

**Problem:** Existence of selection bias precludes *other things equal* comparisons.

- To make valid comparisons that yield causal effects, we need to shut down the bias term.

--

**Solution:** Conduct an experiment.

- How? Assign treatment at .pink[random].
- Hence the name, .pink[*randomized* control trial] (RCT).

---
# Randomized Control Trials

## Example: Effect of de-worming on attendance

**Motivation:** Intestinal worms are common among children in less-developed countries.
The symptoms of these parasites can keep school-aged children at home, disrupting human capital accumulation.

**Policy Question:** Do school-based de-worming interventions provide a cost-effective way to increase school attendance?

---
# Randomized Control Trials

## Example: Effect of de-worming on attendance

**Research Question:** How much do de-worming interventions increase school attendance?

**Q:** Could we simply compare average attendance among children with and without access to de-worming medication?

--

<br>**A:** If we're after the causal effect, probably not.

--

<br><br>**Q:** Why not?

--

<br>**A:** Selection bias: Families with access to de-worming medication probably have healthier children for other reasons, too (wealth, access to clean drinking water, *etc.*).<br>.pink[Can't make an *all else equal* comparison. Biased and/or spurious results.]

---
# Randomized Control Trials

## Example: Effect of de-worming on attendance

**Solution:** Run an experiment.

--

Imagine an RCT where we have two groups:

- .hi-slate[Treatment:] Villages where children get de-worming medication in school.
- .hi-slate[Control:] Villages where children don't get de-worming medication in school (status quo).

--

By randomizing villages into .hi-slate[treatment] or .hi-slate[control], we will, on average, include all kinds of villages (poor _vs._ less poor, access to clean water _vs._ contaminated water, hospital _vs._ no hospital, *etc.*) in both groups.

--

*All else equal*!
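---
# Randomized Control Trials

## Example: Effect of de-worming on attendance

The "on average" claim can be checked with a small simulation. The sketch below is a hypothetical illustration, not data from the de-worming study: the development index, the seed, and the function names are all assumptions made up for the example.

```python
import random

rng = random.Random(201)  # fixed seed so the sketch is reproducible

# Hypothetical development index for 54 villages
# (0 = least developed, 1 = most developed). Simulated, not real data.
n_villages = 54
development = [rng.random() for _ in range(n_villages)]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def randomize(n, n_treated):
    """Return the set of village indices randomly assigned to treatment."""
    return set(rng.sample(range(n), n_treated))


def balance_gap(treated):
    """Treatment-minus-control difference in mean development."""
    t = [d for i, d in enumerate(development) if i in treated]
    c = [d for i, d in enumerate(development) if i not in treated]
    return mean(t) - mean(c)


# One random assignment: half of the villages get the medication.
treated = randomize(n_villages, n_villages // 2)
single_gap = balance_gap(treated)

# Repeat the assignment many times to see what happens on average.
gaps = [balance_gap(randomize(n_villages, n_villages // 2)) for _ in range(2000)]
average_gap = mean(gaps)

print(f"Gap in one random assignment:       {single_gap:+.3f}")
print(f"Average gap over 2,000 assignments: {average_gap:+.3f}")
```

Any single assignment can leave the two groups somewhat different by chance. The average gap across many assignments should sit close to zero, which is the sense in which randomization balances the groups.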
--- class: clear-slide .hi-slate[54 villages] <br> <br> <img src="09-Data_Learning_files/figure-html/plot1-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] <br> <br> <img src="09-Data_Learning_files/figure-html/plot2-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] .mono[+] .hi-orange[randomly assigned treatment] <img src="09-Data_Learning_files/figure-html/plot3_1-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] .mono[+] .hi-orange[randomly assigned treatment] <img src="09-Data_Learning_files/figure-html/plot3_2-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] .mono[+] .hi-orange[randomly assigned treatment] <img src="09-Data_Learning_files/figure-html/plot3_3-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] .mono[+] .hi-orange[randomly assigned treatment] <img src="09-Data_Learning_files/figure-html/plot3_4-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] .mono[+] .hi-orange[randomly assigned treatment] <img src="09-Data_Learning_files/figure-html/plot3_5-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] .mono[+] .hi-orange[randomly assigned treatment] <img src="09-Data_Learning_files/figure-html/plot3_6-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] .mono[+] .hi-orange[randomly assigned treatment] <img 
src="09-Data_Learning_files/figure-html/plot3_7-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] .mono[+] .hi-orange[randomly assigned treatment] <img src="09-Data_Learning_files/figure-html/plot3_8-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] .mono[+] .hi-orange[randomly assigned treatment] <img src="09-Data_Learning_files/figure-html/plot3_9-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] .mono[+] .hi-orange[randomly assigned treatment] <img src="09-Data_Learning_files/figure-html/plot3_10-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] .mono[+] .hi-orange[randomly assigned treatment] <img src="09-Data_Learning_files/figure-html/plot3_11-1.svg" style="display: block; margin: auto;" /> --- class: clear-slide count: false .hi-slate[54 villages] .hi[of varying levels of development] .mono[+] .hi-orange[randomly assigned treatment] <img src="09-Data_Learning_files/figure-html/plot3_12-1.svg" style="display: block; margin: auto;" /> --- # Randomized Control Trials ## Example: Effect of de-worming on attendance We can estimate the .hi[causal effect] of de-worming on school attendance by .purple[comparing] the average attendance rates in the treatment group (💊) with those in the control group (no 💊): .center[.purple[Treatment group attendance rate .mono[-] Control group attendance rate]] -- **Result:** [Attendance increases](https://www.povertyactionlab.org/case-study/deworming-schools-improves-attendance-and-benefits-communities-over-long-term) … by a lot. -- - 25 percent decrease in absenteeism at a cost of $0.60 per child. 
- Long-term cost-effectiveness: an additional 11.91 years of schooling per $100 spent on de-worming in Kenya.

---
# Randomized Control Trials

## Example: Effect of de-worming on attendance

We can estimate the .hi[causal effect] of de-worming on school attendance by .purple[comparing] the average attendance rates in the treatment group (💊) with those in the control group (no 💊):

.center[.purple[Treatment group attendance rate .mono[-] Control group attendance rate]]

**Q:** Should we trust the results of the comparison? Why?

--

**A:** We probably should. On average, randomly assigning treatment balances the treatment and control groups across other dimensions that affect school attendance.

---
class: clear-slide

.hi[Randomization can go wrong!]
<br>
<br>
<img src="09-Data_Learning_files/figure-html/fertilizer_plot3_bad-1.svg" style="display: block; margin: auto;" />

---
# Interpreting Results

## .pink[Internal Validity]

Addresses the question, .pink[*should we believe the study?*]

A study has high .pink[internal validity] if we are confident that one variable has a .pink[causal] influence on another variable within the context of the study (*e.g.,* .pink[no selection bias]).

--

## .purple[External Validity]

Addresses the question, .purple[*how far can we generalize the results of the study?*]

A study has high .purple[external validity] to the extent that the results .purple[apply to other contexts] (not just the local environment that generated the results).

--

- Requires internal validity!

---
class: inverse, middle

# Creating Moves to Opportunity

---
# Creating Moves to Opportunity

## Background

**Policy Question:** How can we lift people out of poverty?

**Research Agenda:** What kinds of social assistance programs have lasting effects on upward mobility?

Economists study a variety of state and federal social assistance programs.
- .purple[Medicaid], .purple[SNAP] (food stamps), .purple[TANF] (cash welfare), .purple[WIC] (benefits for mothers and infants), .purple[National School Lunch Program], .purple[public housing], .pink[Section 8] (housing vouchers), *etc.*
- Considerable variation in benefits and incentive structures.

--

- Today: .pink[Section 8].

---
# Creating Moves to Opportunity

## Background

Neighborhoods matter a great deal for later-in-life outcomes.

Previous research shows that children who move to better neighborhoods are

1. More likely to go to college.
2. More likely to earn higher incomes as adults.
3. Less likely to go to prison.

--

**Problem:** Low-income families tend to live in neighborhoods with limited upward income mobility.

- Those who receive Section 8 vouchers usually don't move to better neighborhoods.

---
# Creating Moves to Opportunity

## Experiment

**Research Question:** Why do voucher holders stay in low-opportunity neighborhoods? Do they prefer those neighborhoods or do they face barriers to moving?

--

**Social Experiment:** [Creating Moves to Opportunity (CMTO)](https://opportunityinsights.org/paper/cmto/)

274 low-income families in King County, WA randomly assigned to one of two groups:

- .hi-purple[Control group:] Housing voucher for any neighborhood in King County.
- .hi-pink[Treatment group:] Housing voucher for any neighborhood in King County .mono[+] counseling .mono[+] landlord engagement .mono[+] one-time moving cost assistance.

---
class: clear-slide

High-opportunity areas are neighborhoods with

1. High upward social mobility (*i.e.,* children of low-income parents move up in the income distribution as adults).
2. Relatively low rent.
.center[**King County, WA**] <img src="seattle_cmto_map.png" width="589" style="display: block; margin: auto;" /> --- class: clear-slide .center[**Creating Moves to Opportunity: Experimental Results**] <table> <caption>Outcome: Share Moving to Opportunity Area (%)</caption> <thead> <tr> <th style="text-align:left;"> Subgroup </th> <th style="text-align:center;"> Control Mean </th> <th style="text-align:center;"> Treatment Mean </th> <th style="text-align:center;"> Treatment Effect </th> <th style="text-align:center;"> Standard Error </th> <th style="text-align:center;"> N </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> All Families </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 14.3 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 54.3 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 40.0 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 5.2 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 274 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Black Non-Hispanic </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 12.3 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 46.2 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 33.9 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 7.3 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 134 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> White Non-Hispanic </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 
10.3 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 65.9 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 55.6 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 10.2 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 64 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Started in High Opportunity Tract </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 22.2 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 85.3 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 63.1 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 15.9 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 21 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Didn't Start in High Opportunity Tract </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 12.5 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 49.9 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 37.4 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 6.0 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 193 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> No College </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 9.1 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 56.9 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 47.8 </td> <td 
style="text-align:center;color: #272822 !important;line-height: 110%;"> 7.0 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 139 </td> </tr> <tr> <td style="text-align:left;color: #272822 !important;line-height: 110%;font-style: italic;color: black !important;"> Some College or More </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 22.6 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 52.6 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 30.0 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 8.1 </td> <td style="text-align:center;color: #272822 !important;line-height: 110%;"> 132 </td> </tr> </tbody> </table>
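
---
# Creating Moves to Opportunity

Each treatment effect in the table is the treatment mean minus the control mean. The sketch below recomputes those differences and adds a rough 95% interval. It rests on two assumptions that are not stated in the table: that the reported standard error belongs to the treatment effect, and that a normal approximation (effect ± 1.96 × standard error) is reasonable. The function names are illustrative.

```python
# (control mean, treatment mean, standard error), in percentage points,
# copied from the experimental results table.
results = {
    "All Families": (14.3, 54.3, 5.2),
    "Black Non-Hispanic": (12.3, 46.2, 7.3),
    "White Non-Hispanic": (10.3, 65.9, 10.2),
    "Started in High Opportunity Tract": (22.2, 85.3, 15.9),
    "Didn't Start in High Opportunity Tract": (12.5, 49.9, 6.0),
    "No College": (9.1, 56.9, 7.0),
    "Some College or More": (22.6, 52.6, 8.1),
}


def treatment_effect(control_mean, treatment_mean):
    """Difference in means between the treatment and control groups."""
    return treatment_mean - control_mean


def approx_ci(effect, se, z=1.96):
    """Rough 95% interval, ASSUMING a normal sampling distribution."""
    return effect - z * se, effect + z * se


effects = {}
for subgroup, (control, treated, se) in results.items():
    effect = treatment_effect(control, treated)
    low, high = approx_ci(effect, se)
    effects[subgroup] = effect
    print(f"{subgroup}: {effect:.1f} pp (approx. 95% CI {low:.1f} to {high:.1f})")
```

Subgroups with fewer families have larger standard errors, so their intervals are wider and their estimates less precise.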