class: center, middle, inverse, title-slide # Instrumental Variables ## EC 425/525, Set 8 ### Edward Rubin ### 20 May 2019 --- class: inverse, middle $$ `\begin{align} \def\ci{\perp\mkern-10mu\perp} \end{align}` $$ # Prologue --- name: schedule # Schedule ## Last time Matching and propensity-score methods - Conditional independence - Overlap ## Today Instrumental variables (and two-stage least squares) ## Upcoming - Assignment due Sunday - Proposal due Wednesday 5/22 - Midterm? --- layout: true # Research designs --- class: inverse, middle --- name: designs ## Selection on observables and/or unobservables We've been focusing on .hi-slate[*selection-on-observables* designs], _i.e._, $$ `\begin{align} \left(\text{Y}_{0i},\, \text{Y}_{1i}\right) \ci \text{D}_{i}|\text{X}_{i} \end{align}` $$ for .hi-slate[observable] variables `\(\text{X}_{i}\)`. .hi-pink[*Selection-on-unobservable* designs] replace this assumption with two new (but related) assumptions 1. `\(\left(\text{Y}_{0i},\, \text{Y}_{1i}\right) \perp \text{Z}_{i}\)` 2. `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right) \neq 0\)` --- ## Selection on observables and/or unobservables Our main goal in causal-inference minded (applied) econometrics boils down to isolating .b["good" variation] in `\(\text{D}_{i}\)` (exogenous/as-good-as-random) from .b["bad" variation] (the part of `\(\text{D}_{i}\)` correlated with `\(\text{Y}_{0i}\)` and `\(\text{Y}_{1i}\)`). (We want to avoid selection bias.) - .hi-slate[Selection-on-observables designs] assume that we can control for all *bad variation* (selection) in `\(\text{D}_{i}\)` through a known (observed) `\(\text{X}_{i}\)`. - .hi-pink[Selection-on-unobservables designs] assume that we can extract part of the *good variation* in `\(\text{D}_{i}\)` (generally using some `\(\text{Z}_{i}\)`) and then use this *good* part of `\(\text{D}_{i}\)` to estimate the effect of `\(\text{D}_{i}\)` on `\(\text{Y}_{i}\)`. 
We throw away the *bad variation* in `\(\text{D}_{i}\)` (it's bad).

---

## Which route?

So which set of research designs is more palatable?

1. There are plenty of bad applications of both sets.<br>.purple[Violated assumptions, bad controls, *etc.*]

1. .hi-slate[Selection on observables] assumes we know .it[everything] about selection into treatment—we can identify .it[all] of the good (or bad) variation in `\(\text{D}_{i}\)`. <br>.purple[Tough in non-experimental settings. Difficult to validate in practice.]

1. .hi-pink[Selection on unobservables] assumes we can isolate .it[some] good/clean variation in `\(\text{D}_{i}\)`, which we then use to estimate the effect of `\(\text{D}_{i}\)` on `\(\text{Y}_{i}\)`. <br>.purple[Seems more plausible. Possible to validate. May be underpowered.]

---
layout: true
# Instrumental variables

---
name: intro

## Introduction

.attn[Instrumental variables] (IV).super[.pink[†]] is the canonical selection-on-unobservables design—isolating *good variation* in `\(\text{D}_{i}\)` via some magical .pink[instrument] `\(\color{#e64173}{\text{Z}_{i}}\)`.

.footnote[.pink[†] For the moment, we're lumping IV and two-stage least squares (2SLS) together—as many people do—even though they are technically different.]

Consider some model (structural equation)

$$
`\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align}`
$$

To guarantee consistent OLS estimates for `\(\beta_1\)`, we want `\(\mathop{\text{Cov}} \left( \text{D}_{i},\,\varepsilon_i \right)=0\)`. <br> In general, this is a heroic assumption.

.note[Alternative:] Estimate `\(\beta_1\)` via instrumental variables.

---
name: defined

## Definition

For our model

$$
`\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align}`
$$

A valid .attn[instrument] is a variable `\(\color{#e64173}{\text{Z}_{i}}\)` such that

1.
`\(\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \text{D}_{i} \right) \neq 0\)` <br>our .pink[instrument] correlates with treatment (so we can keep part of `\(\text{D}_{i}\)`)

2. `\(\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \varepsilon_i \right) = 0\)` <br>our .pink[instrument] is uncorrelated with other (non- `\(\!\!\text{D}_{i}\)`) determinants of `\(\text{Y}_{i}\)`, _i.e._, `\(\color{#e64173}{\text{Z}_{i}}\)` is excludable from equation `\((1)\)`. .attn[(exclusion restriction)]

---
name: example

## Example

Back to the returns to a college degree,

$$
`\begin{align} \text{Income}_i = \beta_0 + \beta_1 \text{Grad}_i + \varepsilon_i \end{align}`
$$

OLS is likely biased. What if a state conducts a (random) .hi-pink[lottery] for scholarships?

Let `\(\color{#e64173}{\text{Lottery}_i}\)` denote an indicator for whether `\(i\)` won a lottery scholarship..super[.pink[†]]

.footnote[.pink[†] We'll have to focus on families who were eligible/who applied.]

1. `\(\mathop{\text{Cov}} \left( \color{#e64173}{\text{Lottery}_i},\, \text{Grad}_i \right)\neq 0\)` `\(\left( >0 \right)\)` if scholarships increase grad. rates.

2. `\(\mathop{\text{Cov}} \left(\color{#e64173}{\text{Lottery}_i},\, \varepsilon_i\right) = 0\)` since the lottery is randomized.
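---

## Example: simulation

To see the mechanics, here is a minimal simulation sketch of the lottery example (all names and numbers are invented, not from real data). Unobserved ability drives both graduation and income, while the randomized lottery only shifts graduation; the ratio `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{Y}_{i} \right)/\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right)\)` lands near the true effect, while OLS does not.

```r
set.seed(101)
n <- 1e4
ability <- rnorm(n)                # unobserved; drives selection
lottery <- rbinom(n, 1, 0.5)       # randomized scholarship lottery (instrument)
# Winning the lottery raises the chance of graduating
grad <- as.numeric(1.5 * lottery + ability + rnorm(n) > 0)
# True effect of graduation on income: 2 (ability also raises income)
income <- 1 + 2 * grad + 3 * ability + rnorm(n)

b_ols <- coef(lm(income ~ grad))["grad"]           # biased upward by ability
b_iv <- cov(lottery, income) / cov(lottery, grad)  # close to 2
```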
---
layout: true
# Instrumental variables
## The IV estimator

The IV estimator for our model

$$
`\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align}`
$$

with (valid) instrument `\(\color{#e64173}{\text{Z}_{i}}\)` is

$$
`\begin{align} \hat{\beta}_\text{IV} = \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{Y}\right) \end{align}`
$$

---
name: iv-estimator

If you have no covariates, then

$$
`\begin{align} \hat{\beta}_\text{IV} = \dfrac{\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \text{Y}_{i}\right)}{\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \text{D}_{i} \right)} \end{align}`
$$

---

If you have additional (exogenous) covariates `\(\text{X}_i\)`, then

$$
`\begin{align} \text{Z} &= \begin{bmatrix}\color{#e64173}{\text{Z}_{i}} & \text{X}_{i}\end{bmatrix} \\[0.5em] \text{D} &= \begin{bmatrix}\text{D}_{i} & \text{X}_{i}\end{bmatrix} \end{align}`
$$

---
layout: true
# Instrumental variables

---

## Proof: Consistency

With a valid instrument `\(\text{Z}_{i}\)`, `\(\hat{\beta}_\text{IV}\)` is a consistent estimator for `\(\beta_1\)` in

$$
`\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align}`
$$

`\(\mathop{\text{plim}}\left( \hat{\beta}_{IV} \right)\)`

.pad-left[ `\(= \mathop{\text{plim}}\left( \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{Y}\right) \right)\)` ]

.pad-left[ `\(= \mathop{\text{plim}}\left( \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{D} \beta + \text{Z}'\varepsilon\right) \right)\)` ]

.pad-left[ `\(= \mathop{\text{plim}}\left( \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{D}\right) \beta\right) + \mathop{\text{plim}}\left(\dfrac{1}{N} \text{Z}'\text{D}\right)^{-1} \mathop{\text{plim}}\left( \dfrac{1}{N} \text{Z}'\varepsilon\right)\)` ]

.pad-left[ `\(=\beta\)` .pink[✔] since `\(\mathop{\text{plim}}\left( \frac{1}{N} \text{Z}'\varepsilon \right) = 0\)` by the exclusion restriction. ]

---
layout: true
# Two-stage least squares

---
class: inverse, middle

---
name: setup

## Setup

You'll commonly see
IV implemented as a two-stage process known as<br>.attn[two-stage least squares] (2SLS).

.attn[First stage] Estimate the effect of the instrument `\(\color{#e64173}{\text{Z}_{i}}\)` on our endogenous variable `\(\text{D}_{i}\)`, controlling for (predetermined) covariates `\(\text{X}_{i}\)`. Save `\(\color{#6A5ACD}{\widehat{\text{D}}_{i}}\)`.

$$
`\begin{align} \text{D}_{i} = \gamma_1 \color{#e64173}{\text{Z}_{i}} + \gamma_2 \text{X}_{i} + u_i \end{align}`
$$

.attn[Second stage] Estimate the model we wanted—but only using the variation in `\(\text{D}_{i}\)` that correlates with `\(\color{#e64173}{\text{Z}_{i}}\)`, _i.e._, `\(\color{#6A5ACD}{\widehat{\text{D}}_{i}}\)`.

$$
`\begin{align} \text{Y}_{i} = \beta_1 \color{#6A5ACD}{\widehat{\text{D}}_{i}} + \beta_2 \text{X}_{i} + \varepsilon_i \end{align}`
$$

.note[Note] The controls `\(\text{X}_{i}\)` must match in the first and second stages.

---

## IV estimation

This two-step procedure, with a valid instrument, produces an estimator `\(\hat{\beta}_1\)` that is consistent for `\(\beta_1\)`.

$$
`\begin{align} \hat{\beta}_\text{2SLS} &= \left( \text{D}' \text{P}_{\text{Z}} \text{D} \right)^{-1} \left( \text{D}' \text{P}_{\text{Z}} \text{Y} \right) \\[0.3em] \text{P}_{\text{Z}} &= \text{Z} \left( \text{Z}'\text{Z} \right)^{-1} \text{Z}' \end{align}`
$$

where `\(\text{D}\)` is a matrix of our treatment and predetermined covariates `\(\left( \text{X}_{i} \right)\)` and `\(\text{Z}\)` is a matrix of our instrument and our predetermined covariates.

---

## IV estimation

Important notes

- The controls `\(\left( \text{X}_{i} \right)\)` must match in the first and second stages.
- If you have exactly .hi-slate[one instrument] and exactly .hi-slate[one endogenous variable], then 2SLS and IV are identical.
- Your second-stage standard errors are not correct (they ignore the estimation error in the first stage).

---
name: reduced-form

## The reduced form

In addition to the regressions within the two stages of 2SLS

1. `\(\text{D}_{i} = \gamma_1 \color{#e64173}{\text{Z}_{i}} + \gamma_2 \text{X}_{i} + u_i\)`

2.
`\(\text{Y}_{i} = \beta_1 \color{#6A5ACD}{\widehat{\text{D}}_{i}} + \beta_2 \text{X}_{i} + \varepsilon_i\)`

there is a third important and related regression: the reduced form.

The .attn[reduced form] regresses the outcome `\(\text{Y}_{i}\)` (LHS of the second stage) on our instrument `\(\color{#e64173}{\text{Z}_{i}}\)` and covariates `\(\text{X}_{i}\)` (RHS of the first stage).

$$
`\begin{align} \text{Y}_{i} = \pi_1 \color{#e64173}{\text{Z}_{i}} + \pi_2 \text{X}_{i} + v_i \end{align}`
$$

Because a valid `\(\color{#e64173}{\text{Z}_{i}}\)` is exogenous, the reduced form provides a consistent estimate of the causal effect of our instrument on the outcome.

---

## The reduced form, continued

While the reduced form estimates the causal effect of the instrument on our outcome, we're often actually interested in the effect of *treatment* `\(\left( \text{D}_{i} \right)\)`.

That said, the reduced form is still incredibly helpful/important:

- Clarifies your source of identifying variation.
- Does not suffer from *weak instruments* problems.
- Only requires `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \varepsilon_i \right) = 0\)`.
- Offers insights into your estimates

$$
`\begin{align} \widehat{\beta}_{1}^\text{2SLS} = \dfrac{\widehat{\pi}_{1}}{\widehat{\gamma}_{1}} \end{align}`
$$

when you have exactly one instrument.

---
name: reduced-intuition

## The reduced form, intuition

This expression for the 2SLS (and IV) estimator can be very helpful.

$$
`\begin{align} \widehat{\beta}_{1}^\text{2SLS} = \dfrac{\color{#6A5ACD}{\widehat{\pi}_{1}}}{\color{#20B2AA}{\widehat{\gamma}_{1}}} = \dfrac{\color{#6A5ACD}{\text{Reduced-form estimate}}}{\color{#20B2AA}{\text{First-stage estimate}}} \end{align}`
$$

What's the interpretation/intuition? Back to our example: `\(\widehat{\beta}_1 =\)` est. effect of college graduation on income.

`\(\color{#6A5ACD}{\widehat{\pi}_1}\)` gives the estimated causal effect of the scholarship lottery on income, but what share of lottery winners graduate? We need to rescale if `\(<\)` 100%.
`\(\color{#20B2AA}{\widehat{\gamma}_1}\)` estimates the effect of winning the scholarship lottery on graduation—the share of winners who graduated due to winning.

We can scale with `\(\color{#20B2AA}{\widehat{\gamma}_1}\)`!

---
name: reduced-example

## The reduced form, example

To see why this scaling makes sense, imagine that 50% of lottery winners graduate from college due to the lottery, _i.e._, `\(\color{#20B2AA}{\widehat{\gamma}_1 =}\)` .turquoise[0.50]..super[.pink[†]]

.footnote[.pink[†] Imagine none of the applicants would have graduated otherwise.]

Our reduced-form estimate of `\(\color{#6A5ACD}{\widehat{\pi}_1=}\)` .purple[$5,000] says that lottery winners make $5,000 more than the control group, on average.

However, half of the winners did not graduate, so `\(\color{#6A5ACD}{\widehat{\pi}_1}\)` "underestimates" the effect of college graduation by combining graduates with nongraduates.

Thus, we want to double `\(\color{#6A5ACD}{\widehat{\pi}_1}\)`, _i.e._, divide by `\(\color{#20B2AA}{\widehat{\gamma}_1}\)`: `\(\color{#6A5ACD}{\widehat{\pi}_1}/\color{#20B2AA}{\widehat{\gamma}_1}\)` = .purple[$5,000]/.turquoise[0.5] = $10,000.

---
name: reduced-derivation

.qa[Q] How do we get this magical expression? `\(\left( \widehat{\beta}_1^\text{IV} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1} \right)\)`

## Derivation

`\(\widehat{\beta}_1^\text{IV} = \left( \text{Z}'\text{D} \right)^{-1} \left( \text{Z}'\text{Y} \right)\)`

`\(\color{#ffffff}{\widehat{\beta}_1^\text{IV}} = \left( \widetilde{\text{Z}}'\widetilde{\text{D}} \right)^{-1} \left( \widetilde{\text{Z}}'\text{Y} \right)\)` applying FWL to reduce `\(\text{D}\)` and `\(\text{Z}\)` to vectors.
`\(\color{#ffffff}{\widehat{\beta}_1^\text{IV}} = \dfrac{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_{i},\, \text{Y}_{i} \right)}{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_i,\, \widetilde{\text{D}}_{i} \right)}\)` `\(= \dfrac{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_{i},\, \text{Y}_{i} \right)/\mathop{\text{Var}} \left( \widetilde{\text{Z}}_i \right)}{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_i,\, \widetilde{\text{D}}_{i} \right)/\mathop{\text{Var}} \left( \widetilde{\text{Z}}_i \right)}\)`

`\(\color{#ffffff}{\widehat{\beta}_1^\text{IV}} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1}\)` .pink[✔]

---
layout: false
class: clear, middle

Let's push a bit deeper into IV's mechanics and intuition.

---
layout: true
# IV: Mechanics and intuition

---
name: iv-intuition

## Setup

In this section, we'll use medical trials as a working example..super[.pink[†]]

.footnote[.pink[†] Credit/thanks go to [Michael Anderson](https://are.berkeley.edu/~mlanderson/ARE_Website/Home.html) for this example—and much of these notes.]

We are interested in the regression model for the effect of some treatment (_e.g._, blood-pressure medication) on a medical outcome `\(\text{Y}_{i}\)`

$$
`\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \end{align}`
$$

`\(\text{D}_{i}\)` indicates whether `\(i\)` *takes* the treatment (medication). `\(\varepsilon_i\)` captures all other factors that affect `\(\text{Y}_{i}\)`.

Or in the potential-outcomes framework:

$$
`\begin{align} \text{Y}_{i} &= \text{Y}_{1i} \text{D}_{i} + \text{Y}_{0i} (1-\text{D}_{i}) \\ \text{Y}_{0i} &= \beta_0 + \varepsilon_i \\ \text{Y}_{1i} &= \text{Y}_{0i} + \beta_1 \end{align}`
$$

---

## Research design

.note[Goal] .hi-slate[Estimate the effect of blood-pressure medication] on blood pressure.

.note[Challenge] .hi-slate[Selection bias:] Even if treatment reduces blood pressure, selection bias will fight against the estimated effect.
.note[Solution] .hi-slate[Randomized medical trial:] Ask randomly chosen individuals in the treatment group to take the pill. Control individuals get a placebo (or nothing).

.note[Analysis 1] .attn[Intention to treat] (.attn[ITT]): `\(\widehat{\beta}_1^\text{ITT} = \overline{\text{Y}}_\text{Trt} - \overline{\text{Y}}_\text{Ctrl}\)`

.note[ITT problem] .attn[Bias from noncompliance:] People don't always follow rules. <br>*E.g.*, treated folks who don't take pills; control folks who take pills.

.note[Analysis 2] .hi-slate[IV!] Instrument medication `\(\text{D}_{i}\)` with intention to treat `\(\text{Z}_{i}\)`.

---

## The IV solution

First question: Is `\(\text{Z}_{i}\)` a valid instrument for `\(\text{D}_{i}\)`?

1. `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \varepsilon_i \right) = 0\)` as `\(\text{Z}_{i}\)` was randomly assigned (exclusion restriction).

1. `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right)\neq 0\)` if assignment to treatment changes the likelihood you take the pills (first stage).

∴ `\(\text{Z}_{i}\)` is a valid instrument for `\(\text{D}_{i}\)` and IV consistently estimates `\(\beta_1\)`.

---
name: iv-noncompliance

## Noncompliance

.attn[Noncompliant] individuals do not abide by their treatment assignment. Let's see how IV "solves" this problem.

First, assume noncompliance only affects treated individuals—*i.e.*, treated folks sometimes don't take their pills; control folks never take pills.

---

## Noncompliance, continued

The .hi-slate[first stage] recovers the share of treated individuals who take the pill

$$
`\begin{align} \text{D}_{i} = \gamma_1 \text{Z}_{i} + u_i \end{align}`
$$

*i.e.*, if 50% of treated individuals take the medication, `\(\widehat{\gamma}_1 =\)` 0.50.
The .hi-slate[reduced form] estimates the *ITT*

$$
`\begin{align} \text{Y}_{i} = \pi_1 \text{Z}_{i} + v_i \end{align}`
$$

which we know IV rescales using the first stage

$$
`\begin{align} \widehat{\beta}_{1}^\text{IV} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1} = \dfrac{\widehat{\pi}_1}{0.50} = 2 \times \widehat{\pi}_1 \end{align}`
$$

---
name: iv-rescale

## Noncompliance, continued

IV solves the noncompliance issue by rescaling by the rate of compliance.

If everyone perfectly complies, then `\(\widehat{\gamma}_1 = 1\)` and `\(\widehat{\beta}_{1}^\text{IV} = \widehat{\pi}_1/1 = \widehat{\beta}_{1}^\text{ITT}\)`.

.ex[Further example] `\(N_\text{Trt}\)` = 10; trt. compliance = 50%; ctrl. compliance = 100%.

`\(\overline{\text{Y}}_\text{Trt} = \dfrac{5 (\beta_0 + \beta_1) + 5 (\beta_0)}{10} = \beta_0 + \dfrac{\beta_1}{2}\)` and `\(\overline{\text{Y}}_\text{Ctrl} = \beta_0\)`.

So our reduced-form estimate (the ITT) is `\(\widehat{\pi}_1 = \dfrac{\beta_1}{2}\)` (half the true effect).

IV consistently estimates `\(\beta_1\)` by rescaling the ITT by the rate of compliance

$$
`\begin{align} \widehat{\beta}_1^\text{IV} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1} = \dfrac{\beta_1/2}{1/2} = \beta_1 \end{align}`
$$

---

## Takeaways

Main points

1. IV .b[rescales] .pink[the causal effect of] `\(\color{#e64173}{\text{Z}_{i}}\)` .pink[on] `\(\color{#e64173}{\text{Y}_{i}}\)` by .purple[the causal effect of] `\(\color{#6A5ACD}{\text{Z}_{i}}\)` .purple[on] `\(\color{#6A5ACD}{\text{D}_{i}}\)`.

1. IV .b[does not] compare treated compliers to untreated compliers. <br>Such a comparison/estimator would re-introduce selection bias.

---
layout: true
class: clear, middle

---
name: het

Thus far, we assumed homogeneous treatment effects.

.qa[Q] What happens .b[when treatment effects are heterogeneous]?

---

.qa[A] Let's recall what our instruments are doing (with Venn diagrams!).

.note[Credit] [Glen Waddell](http://www.glenwaddell.com) introduced me to IV via Venn.
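---

The rescaling arithmetic above can be checked with a quick simulation sketch (all numbers invented): half of the treatment group takes the pill, the control group never does, so the ITT is half the true effect and dividing by the first stage recovers it.

```r
set.seed(102)
n <- 1e4
z <- rbinom(n, 1, 0.5)        # random assignment (intention to treat)
takes <- rbinom(n, 1, 0.5)    # 50% of the treatment group complies
d <- z * takes                # the control group never takes the pill
y <- 3 - 2 * d + rnorm(n)     # true effect of the pill: -2

itt <- mean(y[z == 1]) - mean(y[z == 0])     # reduced form: about -1
gamma1 <- mean(d[z == 1]) - mean(d[z == 0])  # first stage: about 0.5
b_iv <- itt / gamma1                         # rescaled: about -2
```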
---
name: venn

<img src="08IV_NoPause_files/figure-html/venn_iv-1.svg" style="display: block; margin: auto;" />

---

<img src="08IV_NoPause_files/figure-html/venn-endog-1.svg" style="display: block; margin: auto;" />

---

<img src="08IV_NoPause_files/figure-html/venn-irrelevant-1.svg" style="display: block; margin: auto;" />

---

<img src="08IV_NoPause_files/figure-html/venn-iv-endog2-1.svg" style="display: block; margin: auto;" />

---

<img src="08IV_NoPause_files/figure-html/venn-iv-endog1-1.svg" style="display: block; margin: auto;" />

---
layout: true
# IV + heterogeneity

---

## Recap

Throughout the course, we've discussed two concepts of treatment effects.

1. .attn[Average treatment effect] (.attn[ATE]) The average treatment effect for an individual randomly drawn from our sample.

1. .attn[Treatment on the treated] (.attn[TOT]) The average treatment effect for a .it.hi-slate[treated] individual randomly drawn from our sample.

When we assume homogeneous/constant treatment effects, ATE = TOT.

.qa[Q] If treatment effects vary, then what do IV and 2SLS estimate?

.qa[A] Not ATE. And not TOT. They estimate the LATE..super[.pink[†]]

.footnote[ .pink[†] See [Angrist, Imbens, and Rubin (1996)](https://www.jstor.org/stable/2291629). ]

---
name: late

## The LATE

IV generally estimates the .attn[LATE]—the .attn[Local Average Treatment Effect].

.note[Recall] IV "works" by isolating variation in `\(\text{D}_{i}\)` induced by our instrument `\(\text{Z}_{i}\)`. In other words: IV focuses on the individuals whose `\(\text{D}_{i}\)` changes due to `\(\text{Z}_{i}\)`. Angrist, Imbens, and Rubin (1996) call these folks .attn[compliers]. However, *compliers* are only one of four possible groups.

.col-left[ 1. .attn[Compliers] `\(\text{D}_{i} = 1\)` iff `\(\text{Z}_{i}=1\)`. 1. .attn[Always-takers] `\(\text{D}_{i} = 1\)` `\(\forall \text{Z}_{i}\)`. 1. .attn[Never-takers] `\(\text{D}_{i} = 0\)` `\(\forall \text{Z}_{i}\)`. 1. .attn[Defiers] `\(\text{D}_{i} = 1\)` iff `\(\text{Z}_{i}=0\)`.
] .col-right[ Only take pills .hi-slate[when treated]. <br>.hi-slate[Always] take pills. <br>.hi-slate[Never] take pills. <br>Only take pills .hi-slate[when untreated]. ]

---

## The LATE

Because IV only uses variation in `\(\text{D}_{i}\)` that correlates with `\(\text{Z}_{i}\)`, IV mechanically drops *always-takers* and *never-takers*. Most IV derivations/applications assume away the existence of *defiers*.

Thus, IV estimates a treatment effect .hi-slate[using only *compliers*]. Hence the "local" in *local average treatment effect*.

---
name: late-ex

## The LATE: Medical-trial example

Imagine treatment works for some `\(\left( \beta_{1,i} < 0 \right)\)` and not for others `\(\left( \beta_{1,j} = 0 \right)\)`. Suppose individuals know their response to blood-pressure medication.

- `\(\beta_{1,i}<0\)` individuals always take the pill.
- `\(\beta_{1,j}=0\)` individuals only take the pill when treated.

Then our compliers will be individuals for whom `\(\beta_{1,j}=0\)`. Thus, IV's LATE will indicate no treatment effect `\(\left( \widehat{\beta}_1^\text{IV} = 0 \right)\)`.

---

## The LATE

.qa[Q] So is IV actually inconsistent?

.qa[A] It depends on what you are trying to estimate (and how you interpret it). IV doesn't estimate the ATE or TOT, so it would be inconsistent for them..super[.pink[†]]

.footnote[ .pink[†] Just as the TOT is not consistent for the ATE. ]

IV estimates the *local* average treatment effect.

.note[Takeaway] Because IV identifies off of compliers, it estimates an average treatment effect for these individuals (who *comply* with the instrument).

.note[Takeaway.sub[2]] Different instruments have different LATEs.
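---

## The LATE: simulation

The medical-trial story above can be sketched in a quick simulation (shares and effects invented): the drug works for always-takers but not for compliers, so IV recovers the complier effect (zero), not the ATE.

```r
set.seed(103)
n <- 2e4
# Three types; no defiers, so monotonicity holds
type <- sample(c("complier", "always", "never"), n,
               replace = TRUE, prob = c(0.4, 0.3, 0.3))
z <- rbinom(n, 1, 0.5)                              # random instrument
d <- (type == "always") + (type == "complier") * z  # who takes the pill
effect <- ifelse(type == "always", -2, 0)           # heterogeneous effects
y <- 1 + effect * d + rnorm(n)

b_iv <- cov(z, y) / cov(z, d)  # LATE: complier effect, about 0
ate <- mean(effect)            # about -0.6; IV does not estimate this
```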
---
name: monotonicity

## Monotonicity

We've already written down the two classical IV/2SLS assumptions

- .note[First stage:] `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right) \neq 0\)`
- .note[Exclusion restriction:] `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \varepsilon_{i} \right) = 0\)`

but we need a third assumption to ensure IV's complier-based LATE interpretation.

- .attn[Monotonicity] (.attn[Uniformity]).attn[:] `\(\text{D}_{i}(z)\geq \text{D}_{i}(z')\)` or `\(\text{D}_{i}(z)\leq \text{D}_{i}(z') \enspace \forall i\)` <br> [Heckman](http://jenni.uchicago.edu/papers/koop2006/koop2-IV_ho_2006-09-25a_mms.pdf): *Uniformity* of responses *across persons.* <br> [Imbens and Angrist (1994)](https://www.jstor.org/stable/2951620): Instrument has a monotone effect on `\(\text{D}_{i}\)`.

---

## Monotonicity

If "defiers" exist, then monotonicity/uniformity is violated. In this case, the IV estimand is

$$
`\begin{align} \dfrac{\tau_{c} \mathop{\text{Pr}}\left(\text{complier}\right) - \tau_{d} \mathop{\text{Pr}}\left(\text{defier}\right)}{ \mathop{\text{Pr}}\left(\text{complier}\right) - \mathop{\text{Pr}}\left(\text{defier}\right)} \end{align}`
$$

which is not bounded between `\(\tau_{c}\)` and `\(\tau_{d}\)`.

.ex[Example] `\(\tau_c=\)` 1 and `\(\tau_d=\)` 2. `\(\mathop{\text{Pr}}\left(\text{complier}\right)=\)` 2/3 and `\(\mathop{\text{Pr}}\left(\text{defier}\right)=\)` 1/3. Then the "LATE" is 0..super[.pink[†]]

.footnote[ .pink[†] Some people would instead say that there is no LATE when you violate monotonicity. ]

---
layout: false
class: clear, middle

Until now, we've focused on using a single instrument. The 2SLS estimator accommodates multiple instruments..super[.pink[†]]

.footnote[ .pink[†] Whether you can find multiple valid instruments is another question.
]

---
layout: true
# Multiple instruments

---
class: inverse, middle
name: multi-inst

---

## Motivation

.qa[Q] Why include multiple instruments?

.qa[A] Multiple instruments can capture more variation in `\(\text{D}_{i}\)` (efficiency).

Using terminology from the *system-of-equations* literature,

- one instrument for one endogenous variable: .attn[just identified]
- multiple instruments for one endogenous variable: .attn[overidentified]

---

## In practice

With (valid) instruments `\(\text{Z}_{1i}\)` and `\(\text{Z}_{2i}\)`, our first stage becomes

$$
`\begin{align} \text{D}_{i} = \gamma_0 + \gamma_1 \text{Z}_{1i} + \gamma_2 \text{Z}_{2i} + \gamma_3 \text{X}_{i} + u_i \end{align}`
$$

while our second stage is still

$$
`\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \widehat{\text{D}}_{i} + \beta_2 \text{X}_{i} + v_i \end{align}`
$$

---
layout: true
# Multiple instruments
## Example: Quarter of birth

---
name: multi-ex

Back to our quest to estimate the returns to education.

[Angrist and Krueger (1991)](https://www.jstor.org/stable/2937954) proposed *quarter of birth* as a set of instruments for years of schooling. Accordingly, their first stage looks something like.super[.pink[†]]

.footnote[ .pink[†] We need to drop one of the quarter-of-birth indicators to avoid perfect collinearity. ]

$$
`\begin{align} \text{Schooling}_i = \gamma_0 &+ \gamma_1 \mathbb{I}(\text{Born Q1})_{i} + \gamma_2 \mathbb{I}(\text{Born Q2})_{i} \\&+ \gamma_3 \mathbb{I}(\text{Born Q3})_{i} + \gamma_4 \mathbb{I}(\text{Born Q4})_{i} \\&+ \gamma_5 \text{X}_{i} + u_{i} \end{align}`
$$

---

.qa[Q] Is quarter of birth a valid instrument?

.qa[Q1] Why would quarter of birth affect schooling? (.note[First stage])

.qa[A1] Students cannot drop out of school until a certain age, and quarter of birth affects your age at the time you begin school.

.ex[Example] Some states require students to stay in school until they are 16.
- Students who start school at age .hi-slate[6] drop out after .hi-slate[10] years of schooling.
- Students who start school at age .hi-slate[5] drop out after .hi-slate[11] years of schooling.

---

If students must begin school in the calendar year in which they turn 6

- December birthdates: begin school at 5.75; drop out with 10.25 yrs.
- January birthdates: begin school at 6.75; drop out with 9.25 yrs.

For some groups, quarter of birth may affect the number of years in school.

---

It turns out that the first stage is also pretty weak in this setting.

.attn[Weak instruments] can cause several problems for 2SLS/IV:

1. Our estimator is a ratio of the reduced form and the first stage, so a weak first stage can blow up reduced-form estimates (amplifying reduced-form noise/bias).

2. Many weak instruments lead to a finite-sample issue in which 2SLS is biased toward OLS—our first stage is essentially overfitting.

What about our other requirements for a valid instrument?

---

.qa[Q2] Is quarter of birth uncorrelated with `\(\varepsilon_i\)` (.note[excludable])?

.qa[A2] While quarter of birth may be fairly arbitrary for some families, other families might time births. If these birth timers differ from other couples along other dimensions (_e.g._, income or education), then quarter of birth may correlate with `\(\varepsilon_i\)`.

---

.qa[Q3] Is the effect monotone?

.qa[A3] Some.super[.pink[†]] argue that monotonicity may be violated in this setting.

.footnote[ .pink[†] _E.g._, [Aliprantis (2012)](https://journals.sagepub.com/doi/abs/10.3102/1076998610396885) ]

Consider December births.

- Original idea: December birthdates will start school at age 5.75, inducing more years of education before 16.
- *Redshirting* idea: Parents hold back December kids so they can be older (_i.e._, 6.75), inducing fewer years of education before 16.

---
layout: true
# 2SLS and .mono[R]

---
name: 2sls-r

## `estimatr`

You can implement 2SLS/IV in many ways in .mono[R].
Today: `estimatr` and `iv_robust()`.

Specifically, we give `iv_robust()` the relationship that we want, separated from the instrument by `|`, *e.g.*,

```r
# Estimate 2SLS
iv_robust(Y ~ D | Z, data = sample_df, se_type = "classical") %>% tidy() %>% select(1:5)
```

```
#>          term estimate std.error statistic      p.value
#> 1 (Intercept) 5.786204 2.9744230  1.945320 0.0546020456
#> 2           D 1.107801 0.3043264  3.640173 0.0004372703
```

---

## Now in two stages!

Of course, we can estimate 2SLS in two stages.

```r
# First stage
stage1 <- lm_robust(D ~ Z, data = sample_df, se_type = "classical")
# First-stage results
stage1 %>% tidy() %>% select(1:5)
```

```
#>          term  estimate std.error statistic      p.value
#> 1 (Intercept) 8.8226148 0.3169568 27.835389 2.486413e-48
#> 2           Z 0.3257347 0.1031506  3.157857 2.112927e-03
```

---

## Second stage

We just need to add `\(\widehat{\text{D}}_{i}\)` to our dataset.

```r
# Add fitted (first-stage) values to data
sample_df %<>% mutate(D_hat = stage1$fitted.values)
# Second stage
stage2 <- lm_robust(Y ~ D_hat, data = sample_df, se_type = "classical")
# Second-stage results
stage2 %>% tidy() %>% select(1:5)
```

```
#>          term estimate std.error statistic    p.value
#> 1 (Intercept) 5.786204 5.4132099  1.068904 0.28773854
#> 2       D_hat 1.107801 0.5538496  2.000184 0.04824759
```

---

## Standard errors

However, recall that our second-stage standard errors are not correct.

.center.hi-purple[Second-stage results]

<span style="display:block; margin-bottom:-1em;"> </span> <table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Term </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> S.E. </th> <th style="text-align:right;"> t stat.
</th> <th style="text-align:left;"> p-Value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white;color: black;"> Int </td> <td style="text-align:right;background-color: white;color: black;"> 5.786 </td> <td style="text-align:right;background-color: white;color: black;"> 5.413 </td> <td style="text-align:right;background-color: white;color: black;"> 1.07 </td> <td style="text-align:left;background-color: white;color: black;"> 0.2877 </td> </tr> <tr> <td style="text-align:left;background-color: white;"> D hat </td> <td style="text-align:right;background-color: white;"> 1.108 </td> <td style="text-align:right;background-color: white;"> 0.554 </td> <td style="text-align:right;background-color: white;"> 2.00 </td> <td style="text-align:left;background-color: white;"> 0.0482 </td> </tr> </tbody> </table> .center.hi-pink[2SLS results] <span style="display:block; margin-bottom:-1em;"> </span> <table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Term </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> S.E. </th> <th style="text-align:right;"> t stat. 
</th> <th style="text-align:left;"> p-Value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white;color: black;"> Int </td> <td style="text-align:right;background-color: white;color: black;"> 5.786 </td> <td style="text-align:right;background-color: white;color: black;"> 2.974 </td> <td style="text-align:right;background-color: white;color: black;"> 1.95 </td> <td style="text-align:left;background-color: white;color: black;"> 0.0546 </td> </tr> <tr> <td style="text-align:left;background-color: white;"> D </td> <td style="text-align:right;background-color: white;"> 1.108 </td> <td style="text-align:right;background-color: white;"> 0.304 </td> <td style="text-align:right;background-color: white;"> 3.64 </td> <td style="text-align:left;background-color: white;"> 0.0004 </td> </tr> </tbody> </table> --- layout: true # IV and 2SLS ## Conclusions --- name: conclusions 1. IV/2SLS focus on .hi-slate[isolating some "good" variation] in `\(\text{D}_{i}\)` via `\(\text{Z}_{i}\)`. 1. Important .hi-slate[requirements]: strong first stage, excludability, monotonicity. 1. IV and 2SLS .hi-slate[rescale the reduced form] with the first stage. 1. Estimates are .hi-slate[LATE from compliers]. 1. Different instruments can produce .hi-slate[different LATEs]. 1. A .hi-slate[weak first stage] can lead to problems. --- layout: false # Table of contents .col-left[ ### Admin .smallest[ 1. [Schedule](#schedule) ] ### Instrumental variables .smallest[ 1. [Research designs](#designs) 1. [Introduction](#intro) 1. [Definition](#defined) 1. [Example](#example) 1. [IV estimator](#iv-estimator) ] ] .col-right[ ### Two-stage least squares .smallest[ 1. [Setup](#setup) 1. [The reduced form](#reduced-form) - [Defined](#reduced-form) - [Intuition](#reduced-intuition) - [Example](#reduced-example) - [Derivation](#reduced-derivation) 1. [Intuition and mechanics](#iv-intuition) - [Noncompliance](#iv-noncompliance) - [Rescaling](#iv-rescale) 1. 
[Heterogeneous treatment effects](#het) - [Venn diagram](#venn) - [LATE](#late) - [Example](#late-ex) - [Monotonicity](#monotonicity) 1. [Multiple instruments](#multi-inst) - [Example](#multi-ex) 1. [2SLS and .mono[R]](#2sls-r) 1. [Conclusions](#conclusions) ] ] --- exclude: true