class: center, middle, inverse, title-slide .title[ # Big Data and Economics ] .subtitle[ ## Regression Discontinuity ] .author[ ### Kyle Coombs, adapted from Nick Huntington-Klein ] .date[ ### Bates College |
ECON/DCS 368
] --- <style type="text/css"> @media print { .has-continuation { display: block !important; } } pre { max-height: 350px; overflow-y: auto; } pre[class] { max-height: 100px; } </style> # Table of contents - [Prologue](#prologue) - [Regression Discontinuity](#rdd) - [Fitting Lines in RDD](#fitlines) - [Overfitting](#careful) - [Assumptions](#assumptions) - [RDD Challenges](#rdd-challenges) - [Fuzzy RDD](#fuzzy-rdd) (if time) - [How the pros do it](#how-the-pros-do-it) (if time) --- name: prologue class: inverse, center, middle # Prologue --- # Prologue - We've just finished covering difference-in-differences, which is one way of estimating a causal effect using observational data - DID is *very* widely applicable, but it relies on some pretty strong assumptions like parallel trends - Today we'll discuss another method for estimating causal effects using observational data: Regression Discontinuity - This method can sometimes be easier to defend - But it is rarer to find situations where it applies - There's also plenty of room for "snake oil" here as with all causal inference - Today I'm intentionally using simulated data to illustrate concepts because this approach looks so different in different applications - But the fundamentals are the same no matter what you're studying and I don't want that lost in the econometric sauce - As always, there's a ton here and we're just scratching the surface --- # Announcements 1. New bonus opportunity: the person with the most "good faith" posts/answers on GitHub gets a 2.5% bonus - This is to motivate you to use this tool the way actual data scientists/coding professionals do - I give some guidelines on "good faith" on the course site, but tl;dr: "don't spam and be constructive" 2. Dr.
Szymon Sacher is coming to speak Friday March 15th at 11am in PGill G50 - **Bonus opportunity**: Up to five points on problem set 4 if you (1) attend and (2) write one page describing a potential application of his work - Submit the description in your problem set 3 repository and tag me (@kgcsport) in an issue when you do -- just need to have a place for you to submit it --- # Questions - Any questions? - Feel free to interrupt with questions during lecture if needed --- # Student presentation --- # Attribution - Most of these slides are borrowed from Nick Huntington-Klein's slides on RDD and Raj Chetty's course on Big Data and Economics --- name: rdd class: inverse, center, middle # Regression Discontinuity --- # Regression Discontinuity - Regression discontinuity design (RDD) is currently the darling of the econometric world for estimating causal effects without running an experiment - It doesn't apply everywhere, but when it does, it's very easy to buy the identification assumptions - Not that it doesn't have its own issues, of course, but it's pretty good! --- # Regression Discontinuity The basic idea is this: - We look for a treatment that is assigned on the basis of being above/below a *cutoff value* of a continuous variable - For example, if you get above a certain test score they let you into a "gifted and talented" program - Or if you are just on one side of a time zone line, your day starts one hour earlier/later - Or if a candidate gets 50.1% of the vote they're in, 49.9% and they're out - Or if you're 65 years old you get Medicare, if you're 64.99 years old you don't - Class size must be below 40 students, so there are small classes when a grade reaches 41, 81, 121, etc. students We call these continuous variables "running variables" because we *run along them* until we hit the cutoff --- # Line up in height order 1. Line up in height order 2. Those below 5'6" get a pill to increase their height 3. Those above 5'6" don't 4.
We want to know the effect of the pill on height - Can we compare the average height of the treated and untreated groups after a year? -- - Nope! Heights and rates of growth differ for reasons other than the pill - But what if we compared people right around 5'6"? They're basically the same, except for random chance --- # Running var triggers treatment There is a relationship between an outcome ($Y$) and a running variable ($X$) There is also a treatment that triggers if `\(X \geq c\)`, where `\(c\)` is a cutoff. - Let's do the wrong thing 1. Assign `\(Treatment = 1\)` if the running variable is above `\(c\)` and `\(Treatment = 0\)` if below 2. Regress `\(y = \beta_0 + \beta_1 Treatment + \varepsilon\)` 3. Get a biased estimate. Why? -- - The running variable is omitted, so we have endogeneity! --- # Endogenous running variables - The key to RDD is that the running variable is *endogenous* - it's not randomly assigned - For example, people with higher test scores are better equipped to succeed in academia, as are people in gifted-and-talented programs - Similarly, you might wonder how Medicare affects health outcomes, but older people have worse health outcomes than young people - School enrollment increases both education resources AND class sizes at cutoff points - Shoot! Our treatment is endogenous! We have endogeneity if we omit the running variable from our model --- # Regression Discontinuity - So what does this mean? - If we can control for the running variable *everywhere except the cutoff*, then... - We will be controlling for the running variable, removing endogeneity - But leaving variation at the cutoff open, allowing for variation in treatment - We focus on just the variation around the cutoff, narrowing the range of the running variable we use so sharply that it's basically controlled for - Then the effect of the cutoff on treatment is like an experiment!
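--- # Seeing the bias in simulation To make the omitted-running-variable problem concrete, here's a minimal base-R sketch on hypothetical simulated data (not the course dataset): the naive regression that omits the running variable overstates the true effect, while controlling for the centered running variable on each side recovers it.

```r
# Hypothetical simulation: treatment triggers at X >= 0.5,
# and X also affects Y directly (true treatment effect = 0.7)
set.seed(123)
X <- runif(1000)
treated <- X >= 0.5
Y <- 1 * X + 0.7 * treated + rnorm(1000, 0, 0.3)

# Wrong: omitting the running variable picks up the effect of X too
naive <- coef(lm(Y ~ treated))["treatedTRUE"]      # around 1.2, not 0.7

# Better: control for the centered running variable, slopes varying by side
X_c <- X - 0.5
rdd <- coef(lm(Y ~ treated * X_c))["treatedTRUE"]  # close to 0.7
```

The bias in the naive estimate is exactly the omitted running variable at work: treated observations have higher `X` on average, and `X` raises `Y` on its own.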
--- # Regression Discontinuity - The idea is that *right around the cutoff*, treatment is randomly assigned - If you have a test score of 89.9 (not high enough for gifted-and-talented), you're basically the same as someone who has a test score of 90.0 (just barely high enough) - So if we just focus around the cutoff, we remove endogeneity because it's basically random which side of the line you're on - But we get variation in treatment! - This specifically gives us the effect of treatment *for people who are right around the cutoff* a.k.a. a "local average treatment effect" - we still won't know the effect of being put in gifted-and-talented for someone who gets a 30 --- # Terminology - Some quick terminology before we go on 1. **Running Variable**: The continuous variable that triggers treatment, sometimes called the **forcing variable** 2. **Cutoff**: The value of the running variable that triggers treatment 3. **Bandwidth**: The range of the running variable we use to estimate the effect of treatment -- do we look at everyone within .1 of the cutoff? .5? 1? The whole running variable? --- # Regression Discontinuity - A very basic idea of this, before we even get to regression, is to create a *binned scatterplot* - And see how the bin values jump at the cutoff - A binned chart chops the X-axis up into bins - Then takes the average Y value within each bin. That's it! - Then, we look at how those X bins relate to the binned Y values. - If it looks like a pretty normal, continuous relationship... then JUMPS UP at the cutoff X-axis value, that tells us that the treatment itself must be doing something!
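--- # Binned scatterplot sketch Here's a minimal sketch of building that binned scatterplot in base R, on hypothetical simulated data with an assumed cutoff of 0.5 (not the course dataset):

```r
# Simulate a running variable X and an outcome Y that jumps at X = 0.5
set.seed(123)
X <- runif(1000)
Y <- X + 0.7 * (X >= 0.5) + rnorm(1000, 0, 0.3)

# Chop the X-axis into 20 bins and average Y within each bin
bins      <- cut(X, breaks = seq(0, 1, by = 0.05), include.lowest = TRUE)
bin_means <- tapply(Y, bins, mean)
bin_mids  <- seq(0.025, 0.975, by = 0.05)

# Plot the bin averages; a visible jump at the dashed cutoff line
# suggests the treatment is doing something
plot(bin_mids, bin_means, xlab = "X (binned)", ylab = "Mean of Y")
abline(v = 0.5, lty = 2)
```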
--- # Regression Discontinuity ![](12-regression-discontinuity_files/figure-html/rdd-gif-1.gif)<!-- --> --- # Concept Checks - [Can you think of an example of a treatment that is assigned at least partially on a cutoff?](https://www.mentimeter.com/app/presentation/bl3969oqed38kkigk4v1qhzhsz22re8a/98biciiz6rj2) - [Why is it important to look as narrowly as possible around the cutoff? What does this gain over comparing the entire treated and untreated groups?](https://www.mentimeter.com/app/presentation/bl3969oqed38kkigk4v1qhzhsz22re8a/1rjcjy9g2hnk) --- name: fitlines # Fitting Lines in RDD - Looking purely at the cutoff and making no use of the space *away* from the cutoff throws out a lot of useful information - We know that the running variable is related to the outcome, so we can improve our *prediction* of the values on either side of the cutoff by *using data away from the cutoff* rather than *just data near the cutoff*, which is what that animation does - We can do this with good ol' OLS - The bin plot we did can help us pick a functional form for the slope --- # Fitting Lines in RDD - To be clear, producing the line(s) below is our goal. How can we do it? - The true model I've made is an RDD effect of 0.7, with a slope of 1 to the left of the cutoff and a slope of 1.5 to the right ![](12-regression-discontinuity_files/figure-html/fitting-1.png)<!-- --> --- # Regression in RDD - First, we need to *transform our data* - We need a "Treated" variable that's `TRUE` when treatment is applied - above or below the cutoff - Then, we are going to want a bunch of things to change at the cutoff.
- This will be easier if the running variable is *centered around the cutoff*.<sup>1</sup> - So we'll turn our running variable `\(X\)` into `\(X - cutoff\)` and call that `\(XCentered\)` ```r cutoff = .5 df <- df %>% mutate(treated = X >= cutoff, X_centered = X - cutoff) ``` .footnote[<sup>1</sup> If `\(X\)` is not centered, you can still back out the treatment effect, but you'll need to adjust standard errors and point estimates for the fact that the running variable is not centered. You save yourself a ton of time and headaches by just centering it.] --- # Varying Slope - Typically, you will want to let the slope vary to either side - In effect, we are fitting an entirely different regression line on each side of the cutoff - We can do this by interacting both slope and intercept with `\(Treated\)`! - `\(\beta_1\)` is how the intercept jumps at treatment - that's our RDD effect, `\(\beta_3\)` is how the slope changes. `$$Y = \beta_0 + \beta_1Treated + \beta_2XCentered + \beta_3Treated\times XCentered + \varepsilon$$` ```r etable(feols(Y ~ treated*X_centered, data = df,vcov='HC1')) ``` ``` ## feols(Y ~ treate.. ## Dependent Var.: Y ## ## Constant -0.0111 (0.0272) ## Treated 0.7467*** (0.0381) ## X_centered 0.9825*** (0.0916) ## Treated x X_centered 0.4470*** (0.1287) ## ____________________ __________________ ## S.E. type Heteroskedas.-rob. ## Observations 1,000 ## R2 0.84769 ## Adj. R2 0.84723 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # Varying Slope (as an aside, sometimes the effect of interest is the interaction term - the change in slope! This answers the question "does the effect of `\(X\)` on `\(Y\)` change at the cutoff?" This is called a "regression kink" design. We won't go more into it here, but it is out there!) --- # Polynomial Terms - We don't need to stop at linear slopes! - Just like we brought in our knowledge of binary and interaction terms to understand the linear slope change, we can bring in polynomials too.
Add a square maybe! - Don't get too wild with cubes, quartics, etc. - polynomials tend to be at their "weirdest" near the edges, and we don't want super-weird predictions right at the cutoff. It could give us a mistaken result! - A square term should be enough --- # Polynomial Terms - How do we do this? Interactions again. Take *any* regression equation... `$$Y = \beta_0 + \beta_1X + \beta_2X^2 + \varepsilon$$` - And just center the `\(X\)` (let's call it `\(XC\)`), then add on a set of the same terms multiplied by `\(Treated\)` (don't forget `\(Treated\)` by itself - that's `\(Treated\)` interacted with the constant!) `$$Y = \beta_0 + \beta_1XC + \beta_2XC^2 + \beta_3Treated + \beta_4Treated\times XC + \beta_5Treated\times XC^2 + \varepsilon$$` - The coefficient on `\(Treated\)` remains our "jump at the cutoff" - our RDD estimate! ```r etable(feols(Y ~ X_centered*treated + I(X_centered^2)*treated, data = df,vcov='HC1')) ``` ``` ## feols(Y ~ X_cent.. ## Dependent Var.: Y ## ## Constant -0.0340 (0.0400) ## X_centered 0.6990. (0.3700) ## Treated 0.7677*** (0.0603) ## X_centered square -0.5722 (0.7205) ## X_centered x Treated 0.7509 (0.5621) ## Treated x X_centered squared 0.5319 (1.076) ## ____________________________ __________________ ## S.E. type Heteroskedas.-rob. ## Observations 1,000 ## R2 0.84779 ## Adj. R2 0.84702 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # Fitting Quadratic Lines in RDD - Sometimes it can be hard to tell if a quadratic (or higher-order) term is really necessary - Visualizations can help! <div class="figure" style="text-align: center"> <img src="img/regressiondiscontinuity-linearrdd-1.png" alt="A linear slope makes the jump much bigger than it really is! (From the Effect Chapter 20)" width="65%" /> <p class="caption">A linear slope makes the jump much bigger than it really is!
(From the Effect Chapter 20)</p> </div> --- # Concept Checks - [Why might we want to use a polynomial term in our RDD model?](https://www.mentimeter.com/app/presentation/bl3969oqed38kkigk4v1qhzhsz22re8a/mineasatkyea) - [What relationship are we assuming between the outcome variable and the running variable if we choose not to include `\(XCentered\)` in our model at all (i.e. a "zero-order polynomial")?](https://www.mentimeter.com/app/presentation/bl3969oqed38kkigk4v1qhzhsz22re8a/qvcfi9ndkua1) --- name: careful # Careful with higher order polynomials - Sometimes higher order polynomials can be a little too flexible - They can make it look like there is an effect where none exists - "Overfitting", where your model too flexibly follows the data points, can lie to you! - Read [Andrew Gelman's](https://statmodeling.stat.columbia.edu/2020/12/27/rd-gullible/) blog for more info --- # Does voting make you sick? Or did the researchers just overfit their model? .pull-left[ <div class="figure" style="text-align: center"> <img src="img/healthcare_exp_rdd_gullible_top.png" alt="Health care use (0/1)" width="100%" /> <p class="caption">Health care use (0/1)</p> </div> ] .pull-right[ <div class="figure" style="text-align: center"> <img src="img/healthcare_exp_rdd_gullible_bottom.png" alt="Log of total health care expenditure among users" width="100%" /> <p class="caption">Log of total health care expenditure among users</p> </div> ] Running variable is age with cutoff at age 20 (voting eligibility). Chang & Meyerhoefer (2020) via Andrew Gelman. --- name: assumptions # Assumptions - There must be some assumptions lurking around here - Some are more obvious (we use the correct functional form) <br> - Others are trickier. What are we assuming about the error term and endogeneity here?
- Specifically, we are assuming that *the only thing jumping at the cutoff is treatment* - Sort of like parallel trends, but maybe more believable since we've narrowed in so far <br> - For example, if earning below 150% of the poverty line gets food stamps AND job training, then we can't isolate the effect of just food stamps - Or if the proportion of people who are self-employed jumps up just below 150% (based on *reported* income), that's endogeneity! <br> - The only thing different about just above/just below should be treatment --- # Graphically ![](12-regression-discontinuity_files/figure-html/rdd-graph-1.gif)<!-- --> --- # Robust standard errors - Often the error term is correlated with the running variable - That means people tend to use "robust" standard errors, though it is challenging to get them right ![](12-regression-discontinuity_files/figure-html/robust-standard-errors-1.png)<!-- --> --- name: rdd-challenges class: inverse, center, middle # RDD Challenges --- # Other Difficulties More assumptions, limitations, and diagnostics! - Windows - Granular running variables - Manipulated running variables - Fuzzy regression discontinuity - How pros do it --- # Windows - The basic idea of RDD is that we're interested in *the cutoff* - The points away from the cutoff are only useful to help predict values at the cutoff - Do we really want that full range? Is someone's test score of 30 really going to help us much in predicting `\(Y\)` at a test score of 89? - So we might limit our analysis within just a narrow window around the cutoff, just like that initial animation we saw!
- This makes the exogenous-at-the-jump assumption more plausible, and lets us worry less about functional form (over a narrow range, there's not too much difference between a linear term and a square), but on the flip side it reduces our sample size considerably --- # Windows - Pay attention to the sample sizes, accuracy (true value .7) and standard errors! ``` ## All |X|<.25 |X|<.1 |X|<.05 |X|<.01 ## Dependent Var.: Y Y Y Y Y ## ## Treated 0.75*** 0.77*** 0.71*** 0.61*** 0.56 ## (0.04) (0.06) (0.10) (0.15) (0.36) ## _______________ ________ ________ ________ ________ ________ ## S.E. type Het.-rob. Het.-rob. Het.-rob. Het.-rob. Het.-rob. ## Observations 1,000 492 206 93 15 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # Granular Running Variable - We assume that the running variable varies more or less *continuously* - That way, someone with a test score of 89 is, compared to someone with a test score of 90, almost certainly the same except for random chance - But what if our data only had test scores in big chunks? i.e. I just know who scored "80-89" or "90-100" - It's much less believable that those groups are only separated by random chance - There are some fancy RDD estimators that allow for granular running variables - But in general, if this is what you're facing, you might be in trouble - Before doing an RDD, ask: - Is it plausible that someone with the highest value just below the cutoff, and someone with the lowest value just above the cutoff, are only at different values because of random chance? --- # Looking for Lumping - Ok, now let's go back to our continuous running variables - What if the running variable is *manipulated*? - Imagine you're a teacher grading the gifted-and-talented exam. You see someone with an 89 and think "aww, they're so close! I'll just give them an extra point..." - Or, a school knows that if they have 41 students in a grade, they have to hire another teacher. So they just...
don't admit more students - Suddenly, that treatment is a lot less randomly assigned around the cutoff! - If there's manipulation of the running variable around the cutoff, we can often see it in the presence of *lumping* - i.e. if there's a big cluster of observations to one side of the cutoff and a seeming gap on the other side --- # Looking for Lumping - Here's an example from the real world in medical research - statistically, p-values *should* be uniformly distributed - But it's hard to get insignificant results published. So people "p-hack" until they find significance and we have selection into publication based on `\(p < .05\)`. ![p-value graph from Perneger & Combescure, 2017](img/p_value_distribution.png) --- # Looking for Lumping - We can look for lumping graphically by just plotting a binned histogram and looking for a jump in the *number of observations* at the cutoff - The first one looks pretty good. We have one that looks not-so-good on the right ![](12-regression-discontinuity_files/figure-html/lump_bad-1.png)<!-- --> --- # McCrary Test - A more formal way to check for manipulation is the McCrary test - The null hypothesis is that the density of the running variable is continuous at the cutoff - Intuitively, it assesses how likely the observed density of the running variable (number of observations) would be if the running variable were continuous at the cutoff - If really unlikely, we might reject that null, suggesting manipulation - It can be implemented easily with the **rddensity** package in R --- # rddensity() output Null hypothesis is that the running variable is continuous at the cutoff (i.e. no manipulation) for various bandwidths/window lengths ```r #library(rddensity) # Already loaded mccrary <- rddensity(df$X_centered,c=0) summary(mccrary) ``` ``` ## ## Manipulation testing using local polynomial density estimation.
## ## Number of obs = 1000 ## Model = unrestricted ## Kernel = triangular ## BW method = estimated ## VCE method = jackknife ## ## c = 0 Left of c Right of c ## Number of obs 501 499 ## Eff. Number of obs 185 161 ## Order est. (p) 2 2 ## Order bias (q) 3 3 ## BW est. (h) 0.175 0.169 ## ## Method T P > |T| ## Robust -1.59 0.1118 ## ## ## P-values of binomial tests (H0: p=0.5). ## ## Window Length <c >=c P>|T| ## 0.029 + 0.029 30 20 0.2026 ## 0.045 + 0.044 47 30 0.0675 ## 0.061 + 0.060 70 47 0.0415 ## 0.077 + 0.075 87 65 0.0882 ## 0.094 + 0.091 103 86 0.2444 ## 0.110 + 0.106 122 101 0.1803 ## 0.126 + 0.122 140 122 0.2936 ## 0.143 + 0.138 152 141 0.5592 ## 0.159 + 0.153 169 155 0.4702 ## 0.175 + 0.169 185 161 0.2162 ``` --- # rdplotdensity() implementation - Also creates a nice visual ```r rdplotdensity(mccrary,df$X_centered) ``` ![](12-regression-discontinuity_files/figure-html/rdplot-1.png)<!-- --> ``` ## $Estl ## Call: lpdensity ## ## Sample size 501 ## Polynomial order for point estimation (p=) 2 ## Order of derivative estimated (v=) 1 ## Polynomial order for confidence interval (q=) 3 ## Kernel function triangular ## Scaling factor 0.500500500500501 ## Bandwidth method user provided ## ## Use summary(...) to show estimates. ## ## $Estr ## Call: lpdensity ## ## Sample size 499 ## Polynomial order for point estimation (p=) 2 ## Order of derivative estimated (v=) 1 ## Polynomial order for confidence interval (q=) 3 ## Kernel function triangular ## Scaling factor 0.498498498498498 ## Bandwidth method user provided ## ## Use summary(...) to show estimates. ## ## $Estplot ``` ![](12-regression-discontinuity_files/figure-html/rdplot-2.png)<!-- --> --- # Looking for Lumping - Another thing we can do is do a "placebo test" - Check if variables *other than treatment or outcome* vary at the cutoff - We can do this by re-running our RDD but switching our outcome with another variable - If we get a significant jump, that's bad! 
That tells us that *other things are changing at the cutoff*, which implies some sort of manipulation (or just super lousy luck) - If all placebo tests are passed, that's good news, but it doesn't prove zero manipulation --- # Concept Checks - [Why does using a narrow window make the effect estimate noisier?](https://www.mentimeter.com/app/presentation/bl3969oqed38kkigk4v1qhzhsz22re8a/3wbmqrt4dow6) - [Intuitively, why would we be skeptical that a regression discontinuity run on a very granular running variable is valid?](https://www.mentimeter.com/app/presentation/bl3969oqed38kkigk4v1qhzhsz22re8a/bg1vqz24cpws) - [Why might bunching of observations on one side of the cutoff be a sign of manipulation?](https://www.mentimeter.com/app/presentation/bl3969oqed38kkigk4v1qhzhsz22re8a/jxmprymy381h) --- name: fuzzy-rdd class: inverse, center, middle # Fuzzy Regression Discontinuity --- # Fuzzy Regression Discontinuity - So far, we've assumed that you're either on one side of the cutoff and untreated, or the other and treated - What if it isn't so simple? What if the cutoff just *increases* your chances of treatment? - For example, what if 30% of schools with fewer than 40 students make smaller classrooms anyway, so crossing the cutoff only raises the treated share from 30% to 100% - It can get more complicated than this -- it always can - This is a "fuzzy regression discontinuity" (yes, that does sound like a bizarre Sesame Street episode) - Now, our RDD will understate the true effect, since it's being calculated on the assumption that we added treatment to 100% of people at the cutoff, when really it's 70%.
So we'll get only about 70% of the effect --- # Fuzzy Regression Discontinuity - We can account for this with a model designed for exactly this situation - Specifically, we can use something called two-stage least squares (or the Wald instrumental variable estimator) to handle these sorts of situations - Basically, two-stage least squares estimates how much the chances of treatment go up at the cutoff, and scales the estimate by that change - So it would take whatever result we got on the previous slide and divide it by 0.7 (the increase in the treated share) to get the true effect --- # Fuzzy Regression Discontinuity First let's make some fake data: ```r set.seed(1000) df <- tibble(X = runif(1000)) %>% mutate(treatassign = .05 + .3*(X > .5)) %>% mutate(rand = runif(1000)) %>% mutate(treatment = treatassign > rand) %>% mutate(Y = .2 + .4*X + .5*treatment + rnorm(1000,0,0.3)) %>% # True effect .5 mutate(X_center = X - .5) %>% mutate(above_cut = X > .5) ``` --- # Fuzzy Regression Discontinuity - Notice that the y-axis here isn't the outcome, it's "proportion treated" ![](12-regression-discontinuity_files/figure-html/fuzzy-rdd-1.png)<!-- --> --- # Fuzzy Regression Discontinuity - We can perform this using the instrumental-variables features of `feols` - The first stage regresses treatment (and its interaction with the running variable) on an above-the-cutoff indicator (and its interaction with the running variable) - `feols(outcome ~ controls | XC*treated ~ XC*above_the_cutoff)` --- # Fuzzy Regression Discontinuity - (the true effect of treatment is .5 - okay, it's not perfect) ```r predict_treatment <- feols(treatment ~ X_center*above_cut, data = df) without_fuzzy <- feols(Y ~ X_center*treatment, data = df) fuzzy_rdd <- feols(Y ~ 1 | X_center*treatment ~ X_center*above_cut, data = df) etable(predict_treatment, without_fuzzy, fuzzy_rdd, dict=c('above_cutTRUE'='Above Cut','treatmentTRUE'='Treatment')) ``` ``` ## predict_treatment without_fuzzy fuzzy_rdd ## Dependent Var.:
treatment Y Y ## ## Constant 0.0605. (0.0354) 0.4079*** (0.0108) 0.4129*** (0.0345) ## X_center 0.0044 (0.1215) 0.4030*** (0.0375) 0.4541*** (0.1227) ## Above Cut 0.3053*** (0.0484) ## X_center x Above Cut -0.0392 (0.1687) ## Treatment 0.4524*** (0.0281) 0.4837*** (0.1247) ## X_center x Treatment 0.0660 (0.0994) -0.2510 (0.4829) ## ____________________ __________________ __________________ __________________ ## S.E. type IID IID IID ## Observations 1,000 1,000 1,000 ## R2 0.13289 0.41696 0.41083 ## Adj. R2 0.13028 0.41520 0.40905 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # Concept Checks - [If the treatment variable is fuzzily assigned, do we underestimate with a sharp RDD?](https://www.mentimeter.com/app/presentation/bl3969oqed38kkigk4v1qhzhsz22re8a/mbca6662tdvd) -- - Yes, but [how do we know that, if our treatment variable is fuzzily assigned, we will *underestimate* the effect if we just run a regular RDD, rather than overestimate it?](https://www.mentimeter.com/app/presentation/bl3969oqed38kkigk4v1qhzhsz22re8a/gjmr13xpn7gc) --- name: how-the-pros-do-it class: inverse, center, middle # How professionals do it --- # How professionals do it - We've gone through all kinds of procedures for doing RDD in R already using regression - But often, professional researchers won't do it that way, because it's a bit too easy to mess up the details - Instead, they use packages like **rdrobust** (available in R, Stata, and Python), written by a team of econometricians - It abstracts away the tedious stuff, like bandwidth selection and standard errors, and gives you loads of customization options for your RDD - In general, it's a good idea to try packages like these, written by experts who are well-published on the method --- # RDrobust - There are three major functions in **rdrobust**: 1. `rdrobust()` - the main estimation function; it returns info about the regression and lets you customize a variety of complex RD stuff 2.
`rdplot()` - a plotting function that shows the jump at the cutoff and lets you customize many of the complexities 3. `rdbwselect()` - a bandwidth selection tool that helps you pick the best bandwidth for your RDD --- # Basics of **rdrobust** - We can specify an RDD model by just telling it the dependent variable `\(Y\)`, the running variable `\(X\)`, and the cutoff `\(c\)`. - We can also specify how many polynomial terms to use with `p` (defaults to 1) - (it applies the polynomials more locally than our linear OLS models do - a bit more flexible) - Use `c` to specify the cutoff (no need to center the running variable manually) - Pick the bandwidth with `h` or use a data-driven technique with `rdbwselect()` - There's also a `fuzzy` option to specify actual treatment status outside of the running variable/cutoff combo - And many other options - But the output is pretty nasty, so you'll need to do some work to get it into a readable format --- # rdrobust ```r summary(rdrobust(df$Y, df$X, c = .5)) ``` ``` ## Sharp RD estimates using local polynomial regression. ## ## Number of Obs. 1000 ## BW type mserd ## Kernel Triangular ## VCE method NN ## ## Number of Obs. 488 512 ## Eff. Number of Obs. 135 162 ## Order est. (p) 1 1 ## Order bias (q) 2 2 ## BW est. (h) 0.152 0.152 ## BW bias (b) 0.229 0.229 ## rho (h/b) 0.666 0.666 ## Unique Obs. 488 512 ## ## ============================================================================= ## Method Coef. Std. Err. z P>|z| [ 95% C.I. ] ## ============================================================================= ## Conventional 0.116 0.090 1.289 0.197 [-0.061 , 0.294] ## Robust - - 1.006 0.314 [-0.105 , 0.325] ## ============================================================================= ``` --- # rdrobust ```r summary(rdrobust(df$Y, df$X, c = .5, fuzzy = df$treatment)) ``` ``` ## Fuzzy RD estimates using local polynomial regression. ## ## Number of Obs. 1000 ## BW type mserd ## Kernel Triangular ## VCE method NN ## ## Number of Obs.
488 512 ## Eff. Number of Obs. 117 154 ## Order est. (p) 1 1 ## Order bias (q) 2 2 ## BW est. (h) 0.141 0.141 ## BW bias (b) 0.206 0.206 ## rho (h/b) 0.685 0.685 ## Unique Obs. 488 512 ## ## First-stage estimates. ## ## ============================================================================= ## Method Coef. Std. Err. z P>|z| [ 95% C.I. ] ## ============================================================================= ## Conventional 0.210 0.103 2.044 0.041 [0.009 , 0.411] ## Robust - - 1.376 0.169 [-0.073 , 0.416] ## ============================================================================= ## ## Treatment effect estimates. ## ## ============================================================================= ## Method Coef. Std. Err. z P>|z| [ 95% C.I. ] ## ============================================================================= ## Conventional 0.530 0.357 1.486 0.137 [-0.169 , 1.229] ## Robust - - 1.433 0.152 [-0.228 , 1.467] ## ============================================================================= ``` --- # rdrobust - We can even have it automatically make plots of our RDD! Same syntax ```r rdplot(df$Y, df$X, c = .5) ``` ![](12-regression-discontinuity_files/figure-html/rdrobust-plots-1.png)<!-- --> --- # That's it! - That's what we have for RDD - Go explore the regression discontinuity activity on class sizes --- class: inverse, center, middle # Next lecture: Bootstrapping <html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>