Take-home final
EC 607
1 Admin
1.1 Academic honesty
You are not allowed to work with anyone else. Working with anyone else will be considered cheating. You will receive a zero for both parts of the final exam and will fail the class.
You can use online materials (including ChatGPT and Copilot), books, notes, solutions, etc. However, you still must put all of your answers in your own words. Copying other people’s (and chatbots’) words is also considered cheating.
Owen and Ed will not help you debug your code. Please do not ask.
1.2 Instructions
Due: Upload your answers to Canvas before 11:59 pm (Pacific) on Thursday, 13 June 2024.
Important: You must submit your answers as an HTML or PDF file, built from an RMarkdown (.RMD) or Quarto (.qmd) file (you can also submit a link to an HTML page if you prefer that route).
2 A paper (100 points)
Read the paper The long-term impact of military service on health: Evidence from World War II and Korean War veterans.
I’m serious. You need to read the paper. You can do it! This is actually the kind of thing a PhD program is trying to prepare you to do.
You are going to read an actual economics paper (🤯), digest the main ideas, and examine the plausibility of its central assumptions.
1.01 (5 pts) What is treatment here? What is the outcome?
1.02 (5 pts) Write out the model for an individual-level treatment effect \((\tau_i)\) in the potential outcomes framework.
1.03 (10 pts) Using a DAG, explain the authors’ concern for selection into treatment.
1.04 (5 pts) The authors write
In their seminal paper, Norman Hearst et al. (1986) resolve this selection bias problem by using the natural randomized experiment generated by the Vietnam draft lottery. They show that those draft-eligible men with low lottery numbers had higher mortality rates in the year immediately following Vietnam…
The Vietnam draft lottery essentially worked by randomly assigning a “draft order” to calendar days (e.g., “June 4th”). Men were then drafted into military service based on the “draft order” of their birthdate.
Suppose that, in a given year, 195 of the 366 dates were randomly drafted into military service.
Would comparing the mortality rate of randomly drafted men to randomly not-drafted men allow us to avoid selection bias? Briefly explain your answer.
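The lottery mechanism described above can be sketched in a few lines. This is only an illustration, written in Python with the standard library; the function name `draw_lottery`, the seed, and the day-of-year indexing are assumptions made for the example, not details of the actual lottery.

```python
import random

def draw_lottery(n_dates=366, n_drafted=195, seed=607):
    """Assign a random draft order to calendar days; draft the first n_drafted."""
    rng = random.Random(seed)              # hypothetical seed, for reproducibility
    days = list(range(1, n_dates + 1))     # day-of-year index: 1, ..., 366
    rng.shuffle(days)                      # position after shuffling = draft order
    draft_order = {day: rank for rank, day in enumerate(days, start=1)}
    drafted = {day for day, rank in draft_order.items() if rank <= n_drafted}
    return draft_order, drafted

draft_order, drafted = draw_lottery()
print("dates drafted:", len(drafted), "of", len(draft_order))
```

Because the order is drawn independently of anything about the men themselves, a birthdate's draft status depends only on the shuffle.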
1.05 (10 pts) Would the comparison in 1.04 suffer from any other forms of bias? Explain.
1.06 (10 pts) Hearst and coauthors found that the mortality rate among drafted men was 4 percent higher than among undrafted men. The authors also note that only 26 percent of men randomly selected for the draft actually ended up in the military.
We want to know the true impact of military service on mortality. Use econometrics to explain whether the “4-percent increase” is likely to (a) consistently estimate, (b) underestimate, or (c) overestimate this impact, or whether (d) you cannot say.
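For reference, one standard way to relate a comparison by draft status to the effect of service itself is the Wald (instrumental-variables) ratio. The notation here is introduced only for illustration: \(Z_i\) is an indicator for being randomly drafted, \(D_i\) is an indicator for actually serving, and \(M_i\) is mortality. Interpreting the ratio as a treatment effect requires the usual IV assumptions, which the text above does not assert.

\[ \tau_{\text{Wald}} = \frac{E[M_i \mid Z_i = 1] - E[M_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]} \]

The numerator is the drafted-versus-undrafted comparison; the denominator is the difference in service rates between the two groups. The text reports only the service rate among drafted men, so the denominator cannot be filled in from the numbers given here.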
1.07 (10 pts) The authors write that the key equation of interest is \[ M_{ict} = \alpha + \beta V_{ic} + \lambda_{t-c} + \phi_{t} + u_{ict} \]
and that they are concerned about a negative correlation between \(V_{ic}\) and \(u_{ict}\).
Why are they concerned?
1.08 (15 pts) The authors note that they cannot get “microdata” (individual-level data), so they move to a birth-cohort-level model for the log of the mortality rate
\[ \log \overline{M}_{ct} = \alpha + \beta \overline{V}_c + \delta(c) + \lambda_{t-c} + \phi_{t} + u_{ct} \]
They then suggest they can use the proportion of veterans in a birth cohort \(\overline{V}_c\) as an instrument for individual-level veteran status \(V_{ic}\).
Assume we are not considering a random draft—meaning individuals make their own choices to join the military. Explain whether the authors’ proposed instrument seems to meet the three requirements of a valid instrument.
1.09 (10 pts) The authors write
Clearly, cohort-average veteran rates cannot be used as an instrument for veteran status if there are “direct” cohort effects in equation (1) in addition to the unrestricted age effects.
What do they mean? Which—if any—of the requirements for valid instruments would these “direct” cohort effects violate?
1.10 (5 pts) The authors’ second empirical approach “assumes birth cohort effects are the same for men and women from the same cohorts” where “the female mortality rate provides a valid counterfactual for male mortality, absent military service.”
Explain how this approach relates to the fundamental problem of causal inference.
1.11 (5 pts) Explain how the approach in 1.10 relates to matching.
1.12 (5 pts) Does the approach outlined in 1.10 seem likely to provide consistent estimates of the treatment effect? Briefly explain why or why not.
1.13 (5 pts) Assuming the authors’ estimation “works”, what type of treatment effect will they recover? (For whom is the treatment effect applicable?)
3 A simulation (50 points)
Important: You may choose to skip this section of the test. However, if you skip this section, the highest grade you can earn in the entire course is a “B” (meaning no B+, A-, A, or A+). Completing this section does not guarantee you receive higher than a “B”, but it does give you a chance.
DGP: We will start by setting up a DGP for a two-stage least squares simulation.
- There are 4 types of people in the population: \(a, b, c, d\);
- The groups are equally represented in the population;
- We’re interested in the effect of a binary treatment \(D\);
- Treatment effects vary by group: \(\tau_a=0\), \(\tau_b=1\), \(\tau_c=2\), \(\tau_d=3\);
- We have three possible instruments for \(D\). Each is binary (with 50% chance of being equal to \(1\)) and affects a specific group.
- \(Z_1=1\) increases the probability of treatment for group \(a\) from 0.1 to 0.5;
- \(Z_2=1\) increases the probability of treatment for group \(b\) from 0.3 to 0.6;
- \(Z_3=1\) increases the probability of treatment for group \(c\) from 0.2 to 0.8.
- For group \(d\), the probability of treatment is 0.7.
Finally, our outcome is \[ Y_i = \alpha + \tau_{g(i)} D_i + \gamma Z_{3i} + w_i + u_i \] where
- \(\alpha = 1\), \(\gamma=1\);
- \(g(i)\) represents individual \(i\)’s group;
- \(w_i\sim N(0,1)\) and \(u_i\sim N(0,1)\).
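The DGP above can be translated into code fairly directly. The following is a minimal sketch, not a reference implementation: it is written in Python with the standard library (the same draws map onto `rbinom()` and `sample()` in R), the names `treat_prob` and `draw_sample` are hypothetical, and it assumes the three instruments are drawn independently and that \(w_i\) enters only the outcome equation, exactly as written above.

```python
import random

TAU = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}   # group-specific treatment effects
ALPHA, GAMMA = 1.0, 1.0

def treat_prob(group, z1, z2, z3):
    """P(D = 1) given the person's group and instrument values."""
    if group == "a":
        return 0.5 if z1 else 0.1
    if group == "b":
        return 0.6 if z2 else 0.3
    if group == "c":
        return 0.8 if z3 else 0.2
    return 0.7                                   # group d: no instrument matters

def draw_sample(n, rng):
    """Draw n individuals; each row is a dict with g, z1, z2, z3, d, y."""
    rows = []
    for _ in range(n):
        g = rng.choice("abcd")                   # groups equally represented
        # Assumption: instruments are independent fair coin flips.
        z1, z2, z3 = (int(rng.random() < 0.5) for _ in range(3))
        d = int(rng.random() < treat_prob(g, z1, z2, z3))
        w, u = rng.gauss(0, 1), rng.gauss(0, 1)
        y = ALPHA + TAU[g] * d + GAMMA * z3 + w + u
        rows.append({"g": g, "z1": z1, "z2": z2, "z3": z3, "d": d, "y": y})
    return rows

rows = draw_sample(20000, random.Random(607))    # hypothetical seed
for g in "abcd":
    ds = [r["d"] for r in rows if r["g"] == g]
    print(g, "treated share:", round(sum(ds) / len(ds), 3))
```

Printing the treated share by group is a quick sanity check that the treatment probabilities were coded as intended.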
2.01 (2.5 pts) Using the definition of the DGP above, draw a DAG.
You might want to draw a separate DAG for each group…
2.02 (5 pts) Using the DAG, explain which—if any—of our three instruments is valid.
2.03 (5 pts) Calculate the ATE. (I want a number.)
2.04 (2.5 pts) Calculate the LATE for \(Z_1\). (I want a number.)
2.05 (30 pts) Run a simulation that shows the distribution of treatment-effect estimates from each of the following estimation strategies:
- Plain OLS: Regress \(Y\) on \(D\)
- 2SLS1: Instrument \(D\) with \(Z_1\)
- 2SLS2: Instrument \(D\) with \(Z_2\)
- 2SLS3: Instrument \(D\) with \(Z_3\)
- 2SLS4: Instrument \(D\) with \(Z_1 + Z_2\)
- 2SLS5: Instrument \(D\) with \(Z_1 + Z_2 + Z_3\)
Each sample should have 1,000 individuals. You should run a bunch of iterations.
Hint: The rbinom() function is great for generating random binary variables. The sample() function works nicely for drawing randomly from a vector.
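The skeleton of such a simulation might look like the sketch below. It is an illustration under stated assumptions, not a complete answer: it is in Python with the standard library rather than R, it covers only plain OLS and the single-instrument case with \(Z_1\), and the names `draw_sample`, `ols_slope`, `iv_slope`, and `simulate` are hypothetical. It uses the fact that, with one instrument and no other covariates, the 2SLS coefficient equals \(\text{Cov}(Y, Z) / \text{Cov}(D, Z)\); strategies that combine instruments would need a full two-stage routine.

```python
import random
import statistics

TAU = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}
# (P(D=1 | Z=0), P(D=1 | Z=1)) for the instrument that moves each group
P_TREAT = {"a": (0.1, 0.5), "b": (0.3, 0.6), "c": (0.2, 0.8)}
ALPHA, GAMMA = 1.0, 1.0

def draw_sample(n, rng):
    """Return lists (y, d, z1) for n individuals drawn from the DGP."""
    y, d, z1 = [], [], []
    for _ in range(n):
        g = rng.choice("abcd")
        z = [int(rng.random() < 0.5) for _ in range(3)]      # Z1, Z2, Z3
        p = 0.7 if g == "d" else P_TREAT[g][z["abc".index(g)]]
        di = int(rng.random() < p)
        yi = ALPHA + TAU[g] * di + GAMMA * z[2] + rng.gauss(0, 1) + rng.gauss(0, 1)
        y.append(yi); d.append(di); z1.append(z[0])
    return y, d, z1

def cov(x, v):
    """Sample covariance of two equal-length lists."""
    mx, mv = statistics.fmean(x), statistics.fmean(v)
    return sum((a - mx) * (b - mv) for a, b in zip(x, v)) / (len(x) - 1)

def ols_slope(y, d):
    return cov(y, d) / cov(d, d)

def iv_slope(y, d, z):
    # Equals 2SLS only for a single instrument with no other covariates.
    return cov(y, z) / cov(d, z)

def simulate(n_iter=200, n=1000, seed=607):
    rng = random.Random(seed)                                # hypothetical seed
    ols, iv = [], []
    for _ in range(n_iter):
        y, d, z1 = draw_sample(n, rng)
        ols.append(ols_slope(y, d))
        iv.append(iv_slope(y, d, z1))
    return ols, iv

ols, iv = simulate()
print("OLS median:", round(statistics.median(ols), 2))
print("IV (Z1) median:", round(statistics.median(iv), 2))
```

The lists `ols` and `iv` hold one estimate per iteration, so they can be passed to a histogram or density plot to show the sampling distributions. Medians are printed rather than means because ratio estimators can produce occasional extreme draws when the first stage is small.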
2.06 (2.5 pts) What is the “best” strategy for estimating a treatment effect here? Explain why it’s the best.
2.07 (2.5 pts) What is the “worst” strategy for estimating a treatment effect here? Explain why it’s the worst.