Take-home final

EC607

Admin

Congratulations on making it to the end of the term! It’s been a pleasure having you all in class.

Academic honesty

You are not allowed to work with anyone else. Working with anyone else will be considered cheating: you will receive a zero for both parts of the final exam and will fail the class.

You can use online materials, including LLMs (e.g., ChatGPT and Copilot), books, notes, solutions, etc. However, you still must put all of your answers in your own words. Copying other people’s (and chatbots’) words is also considered cheating.

Connor and Ed will not help you debug your code. Please do not ask. We’re available for clarifying questions only.

Instructions

Due: Upload your answers to Canvas before 11:59 pm (Pacific) on Friday, 13 June 2025.

Important: You must submit your answers as an HTML or PDF file built from an R Markdown (.Rmd) or Quarto (.qmd) file. You can also submit a link to an HTML page if you prefer that route.

Conceptual questions

00 Read the paper Upstream and Downstream Impacts of College Merit-Based Financial Aid for Low-Income Students. (Here’s the AEA page for the paper, which includes a link to the replication package.)

I’m serious. You need to read the paper (at least pages 193–216). You can do it!

This is actually the kind of thing a PhD program is trying to prepare you to do. You are going to read an actual economics paper (🤯), digest the main ideas, and examine the plausibility of its central assumptions.

01 (5 pts) Let’s start big picture. What is the causal question at the heart of the paper?

02 (5 pts) How does the regression discontinuity (RD) design help the authors answer the question in 01?

03 (5 pts) Is the RD fuzzy or sharp? Explain.

04 (15 pts) As with any approach, the RD design rests on a set of assumptions. The authors write:

The three key assumptions for the validity of the RD design are the following: (i) there is no evidence of manipulation in assignment to treatment near the discontinuity; (ii) any observed differences in the neighborhood of the discontinuity occur only as a result of the differences in the running variables; and (iii) the predicted discontinuity creates a large change in assignment to treatment as a function of the running variable.

What evidence do the authors provide to support each of these assumptions?

05 (10 pts) Does the RD rely upon random assignment of test scores? Explain your answer.

06 (10 pts) The economics literature often estimates RDs in a two-stage least squares (2SLS) framework. Explain why 2SLS can be helpful in estimating treatment effects in an RD design.

07 (10 pts) Explain what the two running variables are and how each variable creates a discontinuity.

08 (10 pts) Why would someone want two running variables in an RD design?

Empirical questions

09 (0 pts) Load the data.

You need to download the dataset from the paper’s replication package and then unzip it. You will ultimately want the dataset called data_RD.dta, which should unzip into the data/ReplicationCode/Data/ directory.

Notice that the file extension is the dreaded .dta (Stata) format. You can read it into R using the read_dta() function from the haven package, which is installed with the tidyverse but is not attached by library(tidyverse), so you need to load it separately.

Note: The running variables are already transformed relative to the cutoff points, so you do not need to transform them further.
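
A minimal sketch of the loading step, assuming the replication package was unzipped so that the file sits at the path given above, relative to your working directory. The helper name load_rd_data and the object name rd_df are hypothetical (they are not part of the replication package). The helper returns NULL instead of erroring when the file is not where it expects.

```r
# Hypothetical helper (not part of the replication package): read the Stata
# file if it exists at `path`; otherwise print a message and return NULL.
load_rd_data <- function(path = "data/ReplicationCode/Data/data_RD.dta") {
  if (!file.exists(path)) {
    message("File not found: ", path, " (check your working directory)")
    return(NULL)
  }
  # haven is installed with the tidyverse but is not attached by
  # library(tidyverse), so call it with its namespace prefix.
  haven::read_dta(path)
}

rd_df <- load_rd_data()

# Inspect the columns (and their Stata labels) before doing anything else.
if (!is.null(rd_df)) str(rd_df, list.len = 10)
```

If the message appears, compare getwd() with the directory you unzipped into before changing the path.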

10 (10 pts) Reproduce the first-stage figure for each of the running variables.

Important: Notice the bandwidths used in the paper’s figures and how the authors restrict to “compliers” for the running variable not being plotted (e.g., what they’re doing with running variable two when they’re plotting running variable one).
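
The mechanics of a binned first-stage plot can be sketched on simulated data. Everything below is an assumption made for illustration: the variable names (r1, r2, treat), the bandwidth, the bin width, the take-up rate, and the rule that treatment requires both running variables to be at or above zero. None of it comes from data_RD.dta or the paper, and the restriction on r2 is only one possible reading of the "compliers" restriction. The sketch uses base R so that it runs without extra packages.

```r
set.seed(607)

# Simulated stand-in data: two centered running variables and a treatment
# that is possible only when BOTH are at or above zero (made-up rule).
n  <- 5000
r1 <- runif(n, -1, 1)
r2 <- runif(n, -1, 1)
eligible <- r1 >= 0 & r2 >= 0
treat <- as.integer(eligible & runif(n) < 0.8)  # imperfect take-up
sim <- data.frame(r1, r2, treat)

# To plot against r1: keep observations that clear the r2 cutoff and that
# fall inside an illustrative bandwidth h around the r1 cutoff.
h <- 0.5
plot_df <- sim[sim$r2 >= 0 & abs(sim$r1) <= h, ]

# Bin r1 so that no bin straddles the cutoff at zero, then average by bin.
breaks <- (-10:10) / 20  # bin width 0.05, with a break exactly at 0
plot_df$bin <- cut(plot_df$r1, breaks = breaks,
                   include.lowest = TRUE, right = FALSE)
binned <- aggregate(cbind(r1, treat) ~ bin, data = plot_df, FUN = mean)

# Binned means of treatment against the running variable.
plot(binned$r1, binned$treat, pch = 19,
     xlab = "Running variable 1 (centered)", ylab = "Share treated")
abline(v = 0, lty = 2)
```

Swapping the roles of r1 and r2 gives the plot for the second running variable. With the real data you would replace the simulated columns with the actual ones and match the paper's bandwidths.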

11 (10 pts) The authors only discuss compliers and never-takers in the paper. How/why are they able to rule out always-takers and defiers?

12 (10 pts) Use at least two plots to demonstrate a lack of manipulation/sorting around the test-score cutoff. Explain how your plots support the assumption of no manipulation.

13 (10 pts) Plot the reduced-form figures with respect to the two running variables.

14 (10 pts) Estimate the reduced-form and first-stage regressions implied by the figures in 10 and 13. Report the results in a table. Briefly discuss whether your regression results match the figures.
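
As a rough illustration of what "first-stage" and "reduced-form" regressions look like in an RD setting, here is a sketch on simulated data with a single running variable. All names (r, above, treat, y), the bandwidth, and the parameter values are assumptions chosen for the example. The local linear specification with separate slopes on each side is one common choice, not necessarily the paper's. Standard errors are the default homoskedastic ones from lm().

```r
set.seed(607)

# Simulated stand-in data (all names and parameter values are made up).
n <- 5000
r <- runif(n, -1, 1)                              # centered running variable
above <- as.integer(r >= 0)                       # clears the cutoff
treat <- as.integer(above == 1 & runif(n) < 0.8)  # no take-up below cutoff
y <- 1 + 0.5 * r + 2 * treat + rnorm(n)           # true treatment effect: 2

h <- 0.5                                          # illustrative bandwidth
sim <- data.frame(r, above, treat, y)[abs(r) <= h, ]

# Local linear specification with separate slopes on each side of the cutoff.
first_stage  <- lm(treat ~ above * r, data = sim)
reduced_form <- lm(y ~ above * r, data = sim)

# The coefficient on `above` is the estimated jump at the cutoff.
fs_jump <- coef(first_stage)[["above"]]
rf_jump <- coef(reduced_form)[["above"]]

# Side-by-side table of estimates and standard errors.
results <- data.frame(
  term         = names(coef(first_stage)),
  first_stage  = round(coef(first_stage), 3),
  fs_se        = round(sqrt(diag(vcov(first_stage))), 3),
  reduced_form = round(coef(reduced_form), 3),
  rf_se        = round(sqrt(diag(vcov(reduced_form))), 3),
  row.names    = NULL
)
print(results)

# With a single instrument and identical controls, the ratio of the two
# jumps equals the 2SLS estimate from instrumenting `treat` with `above`.
rf_jump / fs_jump
```

The jump in each regression should line up with the visible gap at the cutoff in the corresponding binned plot, which is the comparison the question asks you to discuss.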

15 (5 pts) To whom would the LATE be local in this paper? Explain your answer.