class: center, middle, inverse, title-slide .title[ # Instrumental Variables ] .subtitle[ ## EC 607, Set 9 ] .author[ ### Edward Rubin ] --- class: inverse, middle $$ `\begin{align} \def\ci{\perp\mkern-10mu\perp} \end{align}` $$ # Prologue --- name: schedule # Schedule ## Last time Matching and propensity-score methods - Conditional independence - Overlap ## Today Instrumental variables (and two-stage least squares) ## Upcoming Assignment 2 --- layout: true # Research designs --- class: inverse, middle --- name: designs ## Selection on observables and/or unobservables We've been focusing on .hi-slate[*selection-on-observables* designs], _i.e._, $$ `\begin{align} \left(\text{Y}_{0i},\, \text{Y}_{1i}\right) \ci \text{D}_{i}|\text{X}_{i} \end{align}` $$ for .hi-slate[observable] variables `\(\text{X}_{i}\)`. -- .hi-pink[*Selection-on-unobservables* designs] replace this assumption with two new (but related) assumptions 1. `\(\left(\text{Y}_{0i},\, \text{Y}_{1i}\right) \ci \text{Z}_{i}\)` 2. `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right) \neq 0\)` --- ## Selection on observables and/or unobservables Our main goal in causal-inference-minded (applied) econometrics boils down to isolating .b["good" variation] in `\(\text{D}_{i}\)` (exogenous/as-good-as-random) from .b["bad" variation] (the part of `\(\text{D}_{i}\)` correlated with `\(\text{Y}_{0i}\)` and `\(\text{Y}_{1i}\)`). -- (We want to avoid selection bias.) -- - .hi-slate[Selection-on-observables designs] assume that we can control for all *bad variation* (selection) in `\(\text{D}_{i}\)` through a known (observed) `\(\text{X}_{i}\)`. -- - .hi-pink[Selection-on-unobservables designs] assume that we can extract .b[part of] the *good variation* in `\(\text{D}_{i}\)` (generally using some `\(\text{Z}_{i}\)`) and then use this *good* part of `\(\text{D}_{i}\)` to estimate the effect of `\(\text{D}_{i}\)` on `\(\text{Y}_{i}\)`.
-- We throw away the rest of `\(\text{D}_{i}\)` (it includes *bad variation*). --- ## Which route? Which set of research designs is more palatable? -- 1. There are plenty of bad applications of both sets.<br>.purple[Violated assumptions, bad controls, *etc.*] -- 1. .hi-slate[Selection on observables] assumes we know .it[everything] about selection into treatment—we can identify .it[all] of the good (or bad) variation in `\(\text{D}_{i}\)`. -- <br>.purple[Tough in non-experimental settings. Difficult to validate in practice.] -- 1. .hi-pink[Selection on unobservables] assumes we can isolate .it[some] good/clean variation in `\(\text{D}_{i}\)`, which we then use to estimate the effect of `\(\text{D}_{i}\)` on `\(\text{Y}_{i}\)`. -- <br>.purple[Seems more plausible. Possible to validate. May be underpowered.] --- layout: true # Instrumental variables --- name: intro ## Introduction .attn[Instrumental variables] (IV).super[.pink[†]] is the canonical selection-on-unobservables design—isolating *good variation* in `\(\text{D}_{i}\)` via some magical .pink[instrument] `\(\color{#e64173}{\text{Z}_{i}}\)`. .footnote[.pink[†] For the moment, we're lumping IV and two-stage least squares (2SLS) together—as many people do—even though they are technically different.] -- Consider some model (structural equation) $$ `\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align}` $$ To guarantee consistent OLS estimates for `\(\beta_1\)`, we want `\(\mathop{\text{Cov}} \left( \text{D}_{i},\,\varepsilon_i \right)=0\)`. <br> In general, this is a heroic assumption. -- .note[Alternative:] Estimate `\(\beta_1\)` via instrumental variables. --- name: defined ## Definition For our model $$ `\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align}` $$ A valid .attn[instrument] is a variable `\(\color{#e64173}{\text{Z}_{i}}\)` such that 1.
`\(\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \text{D}_{i} \right) \neq 0\)` -- <br>our .pink[instrument] correlates with treatment -- (so we can keep part of `\(\text{D}_{i}\)`) -- 2. `\(\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \varepsilon_i \right) = 0\)` -- <br>our .pink[instrument] is uncorrelated with other (non- `\(\!\!\text{D}_{i}\)`) determinants of `\(\text{Y}_{i}\)` -- , _i.e._, `\(\color{#e64173}{\text{Z}_{i}}\)` is excludable from equation `\((1)\)`. -- .attn[(exclusion restriction)] --- name: iv-dag ## The DAG <img src="09-iv_files/figure-html/dag-plot-1.svg" style="display: block; margin: auto;" /> -- .qa[Q] How does this DAG illustrate the requirements and identification of IV? --- ## The DAG <img src="09-iv_files/figure-html/dag-plot-2-1.svg" style="display: block; margin: auto;" /> .qa[Relevance:] .b.purple[Z] has an effect on .b.purple[D]. --- ## The DAG <img src="09-iv_files/figure-html/dag-plot-3-1.svg" style="display: block; margin: auto;" /> .qa[Exclusion restriction:] <br> 1\. .b.purple[Z] is .b.pink[exogenous] to (not associated with) .b.purple[U] because -- .b.purple[D] is a collider. -- <br> .white[1\.] .it[I.e.], the path .b.purple[Z → D ← U → Y] is closed without conditioning on (unobservable) .b.purple[U]. --- ## The DAG <img src="09-iv_files/figure-html/dag-plot-4-1.svg" style="display: block; margin: auto;" /> .qa[Exclusion restriction:] <br> 1\. .b.purple[Z] is .b.pink[exogenous] to (not associated with) .b.purple[U] because .b.purple[D] is a collider. <br> 2\. Also: .b.purple[Z] does not directly cause .b.purple[Y]. --- name: example ## Example Back to the returns to a college degree, $$ `\begin{align} \text{Income}_i = \beta_0 + \beta_1 \text{Grad}_i + \varepsilon_i \end{align}` $$ OLS is likely biased. -- What if a state conducts a (random) .hi-pink[lottery] for scholarships?
-- Let `\(\color{#e64173}{\text{Lottery}_i}\)` denote an indicator for whether `\(i\)` won a lottery scholarship..super[.pink[†]] .footnote[.pink[†] We'll have to focus on families who were eligible/who applied.] -- 1. `\(\mathop{\text{Cov}} \left( \color{#e64173}{\text{Lottery}_i},\, \text{Grad}_i \right)\neq 0\)` `\(\left( >0 \right)\)` if scholarships increase grad. rates. -- 2. `\(\mathop{\text{Cov}} \left(\color{#e64173}{\text{Lottery}_i},\, \varepsilon_i\right) = 0\)` since the lottery is randomized. --- layout: true # Instrumental variables ## The IV estimator The IV estimator for our model $$ `\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align}` $$ with (valid) instrument `\(\color{#e64173}{\text{Z}_{i}}\)` is $$ `\begin{align} \hat{\beta}_\text{IV} = \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{Y}\right) \end{align}` $$ --- name: iv-estimator -- If you have no covariates, then $$ `\begin{align} \hat{\beta}_\text{IV} = \dfrac{\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \text{Y}_{i}\right)}{\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \text{D}_{i} \right)} \end{align}` $$ --- If you have additional (exogenous) covariates `\(\text{X}_i\)`, then $$ `\begin{align} \text{Z} &= \begin{bmatrix}\color{#e64173}{\text{Z}_{i}} & \text{X}_{i}\end{bmatrix} \\[0.5em] \text{D} &= \begin{bmatrix}\text{D}_{i} & \text{X}_{i}\end{bmatrix} \end{align}` $$ --- layout: true # Instrumental variables --- ## Proof: Consistency With a valid instrument `\(\text{Z}_{i}\)`, `\(\hat{\beta}_\text{IV}\)` is a consistent estimator for `\(\beta_1\)` in $$ `\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align}` $$ `\(\mathop{\text{plim}}\left( \hat{\beta}_\text{IV} \right)\)` -- .pad-left[ `\(= \mathop{\text{plim}}\left( \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{Y}\right) \right)\)` ] -- .pad-left[ `\(= \mathop{\text{plim}}\left(
\left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{D} \beta + \text{Z}'\varepsilon\right) \right)\)` ] -- .pad-left[ `\(= \mathop{\text{plim}}\left( \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{D}\right) \beta\right) + \mathop{\text{plim}}\left(\dfrac{1}{N} \text{Z}'\text{D}\right)^{-1} \mathop{\text{plim}}\left( \dfrac{1}{N} \text{Z}'\varepsilon\right)\)` ] -- .pad-left[ `\(=\beta\)` .pink[✔] ] --- layout: true # Two-stage least squares --- class: inverse, middle --- name: setup ## Setup You'll commonly see IV implemented as a two-stage process known as<br>.attn[two-stage least squares] (2SLS). -- .attn[First stage] Regress our endogenous variable `\(\text{D}_{i}\)` on the instrument `\(\color{#e64173}{\text{Z}_{i}}\)` and (predetermined) covariates `\(\text{X}_{i}\)`. Save the fitted values `\(\color{#6A5ACD}{\widehat{\text{D}}_{i}}\)`. $$ `\begin{align} \text{D}_{i} = \gamma_1 \color{#e64173}{\text{Z}_{i}} + \gamma_2 \text{X}_{i} + u_i \end{align}` $$ -- .attn[Second stage] Estimate the model we wanted—but only using the variation in `\(\text{D}_{i}\)` that correlates with `\(\color{#e64173}{\text{Z}_{i}}\)`, _i.e._, `\(\color{#6A5ACD}{\widehat{\text{D}}_{i}}\)`. $$ `\begin{align} \text{Y}_{i} = \beta_1 \color{#6A5ACD}{\widehat{\text{D}}_{i}} + \beta_2 \text{X}_{i} + \varepsilon_i \end{align}` $$ .note[Note] The controls `\(\text{X}_{i}\)` must match in the first and second stages. --- ## IV estimation This two-step procedure, with a valid instrument, produces an estimator `\(\hat{\beta}_1\)` that is consistent for `\(\beta_1\)`.
$$ `\begin{align} \hat{\beta}_\text{2SLS} &= \left( \text{D}' \text{P}_{\text{Z}} \text{D} \right)^{-1} \left( \text{D}' \text{P}_{\text{Z}} \text{Y} \right) \\[0.3em] \text{P}_{\text{Z}} &= \text{Z} \left( \text{Z}'\text{Z} \right)^{-1} \text{Z}' \end{align}` $$ where `\(\text{D}\)` is a matrix of our treatment and predetermined covariates `\(\left( \text{X}_{i} \right)\)` and `\(\text{Z}\)` is a matrix of our instrument and our predetermined covariates. --- ## IV estimation Important notes - The controls `\(\left( \text{X}_{i} \right)\)` must match in the first and second stages. - *Related:* Nonlinear first stages can mess things up. - If you have exactly .hi-slate[one instrument] and exactly .hi-slate[one endogenous variable], then 2SLS and IV are identical. - Your second-stage standard errors are not correct. --- name: reduced-form ## The reduced form In addition to the regressions within the two stages of 2SLS 1. `\(\text{D}_{i} = \gamma_1 \color{#e64173}{\text{Z}_{i}} + \gamma_2 \text{X}_{i} + u_i\)` 2. `\(\text{Y}_{i} = \beta_1 \color{#6A5ACD}{\widehat{\text{D}}_{i}} + \beta_2 \text{X}_{i} + \varepsilon_i\)` there is a third important and related regression: the reduced form. -- The .attn[reduced form] regresses the outcome `\(\text{Y}_{i}\)` (LHS of the second stage) on our instrument `\(\color{#e64173}{\text{Z}_{i}}\)` and covariates `\(\text{X}_{i}\)` (RHS of the first stage). $$ `\begin{align} \text{Y}_{i} = \pi_1 \color{#e64173}{\text{Z}_{i}} + \pi_2 \text{X}_{i} + v_i \end{align}` $$ -- Because a valid `\(\color{#e64173}{\text{Z}_{i}}\)` is exogenous, the reduced form provides a consistent estimate of the causal effect of our instrument on the outcome. --- ## The reduced form, continued While the reduced form estimates the causal effect of the instrument on our outcome, we're often actually interested in the effect of *treatment* `\(\left( \text{D}_{i} \right)\)`. -- That said, the reduced form is still incredibly helpful/important: - Clarifies your source of identifying variation.
-- - Does not suffer from *weak instruments* problems. -- - Only requires `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \varepsilon_i \right) = 0\)`. -- - Offers insights into your estimates -- $$ `\begin{align} \widehat{\beta}_{1}^\text{2SLS} = \dfrac{\widehat{\pi}_{1}}{\widehat{\gamma}_{1}} \end{align}` $$ when you have exactly one instrument. --- name: reduced-intuition ## The reduced form, intuition This expression for the 2SLS (and IV) estimator can be very helpful. $$ `\begin{align} \widehat{\beta}_{1}^\text{2SLS} = \dfrac{\color{#6A5ACD}{\widehat{\pi}_{1}}}{\color{#20B2AA}{\widehat{\gamma}_{1}}} = \dfrac{\color{#6A5ACD}{\text{Reduced-form estimate}}}{\color{#20B2AA}{\text{First-stage estimate}}} \end{align}` $$ -- What's the interpretation/intuition? -- Back to our example: `\(\widehat{\beta}_1 =\)` est. effect of college graduation on income. -- `\(\color{#6A5ACD}{\widehat{\pi}_1}\)` gives the estimated causal effect of the scholarship lottery on income -- , but what share of lottery winners graduate? We need to rescale if `\(<\)` 100%. -- `\(\color{#20B2AA}{\widehat{\gamma}_1}\)` estimates the effect of winning the scholarship lottery on graduation -- —the share of winners who graduated due to winning. -- We can scale with `\(\color{#20B2AA}{\widehat{\gamma}_1}\)`! --- name: reduced-example ## The reduced form, example To see why this scaling makes sense, imagine that 50% of lottery winners graduate from college due to the lottery, _i.e._, `\(\color{#20B2AA}{\widehat{\gamma}_1 =}\)` .turquoise[0.50]..super[.pink[†]] .footnote[.pink[†] Imagine none of the applicants would have graduated otherwise.] -- Our reduced-form estimate of `\(\color{#6A5ACD}{\widehat{\pi}_1=}\)` .purple[$5,000] says that lottery winners make $5,000 more than the control group, on average. -- However, half of the winners did not graduate, so `\(\color{#6A5ACD}{\widehat{\pi}_1}\)` "underestimates" the effect of college graduation by combining graduates with nongraduates.
-- Thus, we want to double `\(\color{#6A5ACD}{\widehat{\pi}_1}\)`, _i.e._, divide by `\(\color{#20B2AA}{\widehat{\gamma}_1}\)`: `\(\color{#6A5ACD}{\widehat{\pi}_1}/\color{#20B2AA}{\widehat{\gamma}_1}\)` = .purple[$5,000]/.turquoise[0.5] = $10,000. --- name: reduced-derivation .qa[Q] How do we get this magical expression? `\(\left( \widehat{\beta}_1^\text{IV} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1} \right)\)` -- ## Derivation -- `\(\widehat{\beta}_1^\text{IV} = \left( \text{Z}'\text{D} \right)^{-1} \left( \text{Z}'\text{Y} \right)\)` -- `\(\color{#ffffff}{\widehat{\beta}_1^\text{IV}} = \left( \widetilde{\text{Z}}'\widetilde{\text{D}} \right)^{-1} \left( \widetilde{\text{Z}}'\text{Y} \right)\)` applying FWL (partialling out the covariates `\(\text{X}\)`) to reduce `\(\text{D}\)` and `\(\text{Z}\)` to vectors. -- `\(\color{#ffffff}{\widehat{\beta}_1^\text{IV}} = \dfrac{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_{i},\, \text{Y}_{i} \right)}{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_i,\, \widetilde{\text{D}}_{i} \right)}\)` -- `\(= \dfrac{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_{i},\, \text{Y}_{i} \right)/\mathop{\text{Var}} \left( \widetilde{\text{Z}}_i \right)}{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_i,\, \widetilde{\text{D}}_{i} \right)/\mathop{\text{Var}} \left( \widetilde{\text{Z}}_i \right)}\)` -- `\(\color{#ffffff}{\widehat{\beta}_1^\text{IV}} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1}\)` .pink[✔] --- layout: false class: clear, middle Let's push a bit deeper into IV's mechanics and intuition. --- layout: true # IV: Mechanics and intuition --- name: iv-intuition ## Setup In this section, we'll use medical trials as a working example..super[.pink[†]] .footnote[.pink[†] Credit/thanks go to [Michael Anderson](https://are.berkeley.edu/~mlanderson/ARE_Website/Home.html) for this example—and much of these notes.]
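Before turning to the example, the `\(\widehat{\pi}_1/\widehat{\gamma}_1\)` result from the derivation above is easy to verify numerically. Here is a minimal sketch with simulated data; every name and coefficient below is made up for illustration:

```r
# Sketch: with one instrument and no covariates, the IV estimate
# equals the reduced form divided by the first stage.
# (Simulated data; all coefficients here are illustrative.)
set.seed(42)
n = 1e4
z = rbinom(n, 1, 0.5)                       # instrument (e.g., a lottery)
u = rnorm(n)                                # unobserved confounder
d = as.numeric(0.5 * z + u + rnorm(n) > 0)  # endogenous treatment
y = 1 + 2 * d + u + rnorm(n)                # true effect of d is 2
# IV estimate: Cov(z, y) / Cov(z, d)
b_iv = cov(z, y) / cov(z, d)
# Reduced form divided by first stage
ratio = coef(lm(y ~ z))["z"] / coef(lm(d ~ z))["z"]
all.equal(unname(ratio), b_iv)              # TRUE: the two coincide
```

The equality is mechanical: both the reduced-form and first-stage slopes divide a covariance with `z` by `var(z)`, so the variances cancel in the ratio.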
-- We are interested in the regression model for the effect of some treatment (_e.g._, blood-pressure medication) on medical outcome `\(\text{Y}_{i}\)` -- $$ `\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \end{align}` $$ `\(\text{D}_{i}\)` indicates whether `\(i\)` *takes* the treatment (medication). `\(\varepsilon_i\)` captures all other factors that affect `\(\text{Y}_{i}\)`. -- Or in the potential-outcomes framework: $$ `\begin{align} \text{Y}_{i} &= \text{Y}_{1i} \text{D}_{i} + \text{Y}_{0i} (1-\text{D}_{i}) \\ \text{Y}_{0i} &= \beta_0 + \varepsilon_i \\ \text{Y}_{1i} &= \text{Y}_{0i} + \beta_1 \end{align}` $$ --- ## Research design .note[Goal] .hi-slate[Estimate the effect of blood-pressure medication] on blood pressure. -- .note[Challenge] .hi-slate[Selection bias:] Even if treatment reduces blood pressure, selection bias will fight against the estimated effect. -- .note[Solution] .hi-slate[Randomized medical trial:] Ask randomly chosen individuals in the treatment group to take the pill. Controls get placebo (or nothing). -- .note[Analysis 1] .attn[Intention to treat] (.attn[ITT]): `\(\widehat{\beta}_1^\text{ITT} = \overline{\text{Y}}_\text{Trt} - \overline{\text{Y}}_\text{Ctrl}\)` -- .note[ITT problem] .attn[Bias from noncompliance:] People don't always follow rules. <br>*E.g.*, treated folks who don't take pills; control folks who take pills. -- .note[Analysis 2] .hi-slate[IV!] -- Instrument medication `\(\text{D}_{i}\)` with intention to treat `\(\text{Z}_{i}\)`. --- ## The IV solution First question: Is `\(\text{Z}_{i}\)` a valid instrument for `\(\text{D}_{i}\)`? -- 1. `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \varepsilon_i \right) = 0\)` as `\(\text{Z}_{i}\)` was randomly assigned (exclusion restriction). -- 1. `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right)\neq 0\)` if assignment to treatment changes the likelihood you take the pills (first stage).
-- ∴ `\(\text{Z}_{i}\)` is a valid instrument for `\(\text{D}_{i}\)` and IV consistently estimates `\(\beta_1\)`. --- name: iv-noncompliance ## Noncompliance .attn[Noncompliant] individuals do not abide by their treatment assignment. -- Let's see how IV "solves" this problem. -- First, assume noncompliance only affects treated individuals—*i.e.*, treated folks sometimes don't take their pills; control folks never take pills. --- ## Noncompliance, continued The .hi-slate[first stage] recovers the share of treated individuals who take the pill $$ `\begin{align} \text{D}_{i} = \gamma_1 \text{Z}_{i} + u_i \end{align}` $$ -- *i.e.*, if 50% of treated individuals take the medication, `\(\widehat{\gamma}_1 =\)` 0.50. -- The .hi-slate[reduced form] estimates the *ITT* $$ `\begin{align} \text{Y}_{i} = \pi_1 \text{Z}_{i} + v_i \end{align}` $$ -- which we know IV rescales using the first stage $$ `\begin{align} \widehat{\beta}_{1}^\text{IV} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1} = \dfrac{\widehat{\pi}_1}{0.50} = 2 \times \widehat{\pi}_1 \end{align}` $$ --- name: iv-rescale ## Noncompliance, continued IV solves the noncompliance issue by rescaling by the rate of compliance. -- If everyone perfectly complies, then `\(\widehat{\gamma}_1 = 1\)` and `\(\widehat{\beta}_{1}^\text{IV} = \widehat{\pi}_1/1 = \widehat{\beta}_{1}^\text{ITT}\)`. -- .ex[Further example] `\(N_\text{Trt}\)` = 10; trt. compliance = 50%; ctrl. compliance = 100%. `\(\overline{\text{Y}}_\text{Trt} = \dfrac{5 (\beta_0 + \beta_1) + 5 (\beta_0)}{10} = \beta_0 + \dfrac{\beta_1}{2}\)` -- and `\(\overline{\text{Y}}_\text{Ctrl} = \beta_0\)`. -- So our reduced-form estimate (the ITT) is `\(\widehat{\pi}_1 = \dfrac{\beta_1}{2}\)` (half the true effect). -- IV consistently estimates `\(\beta_1\)` via rescaling the ITT by the rate of compliance $$ `\begin{align} \widehat{\beta}_1^\text{IV} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1} = \dfrac{\beta_1/2}{1/2} = \beta_1 \end{align}` $$ --- ## Takeaways Main points 1.
IV .b[rescales] .pink[the causal effect of] `\(\color{#e64173}{\text{Z}_{i}}\)` .pink[on] `\(\color{#e64173}{\text{Y}_{i}}\)` by .purple[the causal effect of] `\(\color{#6A5ACD}{\text{Z}_{i}}\)` .purple[on] `\(\color{#6A5ACD}{\text{D}_{i}}\)`. -- 1. IV .b[does not] compare treated compliers to untreated compliers. -- <br>Such a comparison/estimator would re-introduce selection bias. --- layout: true class: clear, middle --- name: het Thus far, we assumed homogeneous treatment effects. .qa[Q] What happens .b[when treatment effects are heterogeneous]? --- .qa[A] Let's recall what our instruments are doing (with Venn diagrams!). .note[Credit] [Glen Waddell](http://www.glenwaddell.com) introduced me to IV via Venn. --- name: venn <img src="09-iv_files/figure-html/venn_iv-1.svg" style="display: block; margin: auto;" /> --- <img src="09-iv_files/figure-html/venn-endog-1.svg" style="display: block; margin: auto;" /> --- <img src="09-iv_files/figure-html/venn-irrelevant-1.svg" style="display: block; margin: auto;" /> --- <img src="09-iv_files/figure-html/venn-iv-endog2-1.svg" style="display: block; margin: auto;" /> --- <img src="09-iv_files/figure-html/venn-iv-endog1-1.svg" style="display: block; margin: auto;" /> --- layout: false class: clear, middle Can you draw the DAGs? --- layout: true # IV + heterogeneity --- ## Recap Throughout the course, we've discussed two concepts of treatment effects. -- 1. .attn[Average treatment effect] (.attn[ATE]) The average treatment effect for an individual randomly drawn from our sample. -- 1. .attn[Treatment on the treated] (.attn[TOT]) The average treatment effect for a .it.hi-slate[treated] individual randomly drawn from our sample. -- When we assume homogeneous/constant treatment effects, ATE = TOT. -- .qa[Q] If treatment effects vary, then what do IV and 2SLS estimate? -- .qa[A] Not ATE. -- And not TOT. 
-- They estimate the LATE..super[.pink[†]] .footnote[ .pink[†] See [Angrist, Imbens, and Rubin (1996)](https://www.jstor.org/stable/2291629). ] --- name: late ## The LATE IV generally estimates the .attn[LATE]—the .attn[Local Average Treatment Effect]. -- .note[Recall] IV "works" by isolating variation in `\(\text{D}_{i}\)` induced by our instrument `\(\text{Z}_{i}\)`. -- In other words: IV focuses on the individuals whose `\(\text{D}_{i}\)` changes due to `\(\text{Z}_{i}\)`. Angrist, Imbens, and Rubin (1996) call these folks .attn[compliers]. -- However, *compliers* are only one of four possible groups. .col-left[ 1. .attn[Compliers] `\(\text{D}_{i} = 1\)` iff `\(\text{Z}_{i}=1\)`. 1. .attn[Always-takers] `\(\text{D}_{i} = 1\)` `\(\forall \text{Z}_{i}\)`. 1. .attn[Never-takers] `\(\text{D}_{i} = 0\)` `\(\forall \text{Z}_{i}\)`. 1. .attn[Defiers] `\(\text{D}_{i} = 1\)` iff `\(\text{Z}_{i}=0\)`. ] -- .col-right[ Only take pills .hi-slate[when treated]. <br>.hi-slate[Always] take pills. <br>.hi-slate[Never] take pills. <br>Only take pills .hi-slate[when untreated]. ] --- ## The LATE Because IV only uses variation in `\(\text{D}_{i}\)` that correlates with `\(\text{Z}_{i}\)`, IV mechanically drops *always-takers* and *never-takers*. -- Most IV derivations/applications assume away the existence of *defiers*. -- Thus, IV estimates a treatment effect .hi-slate[using only *compliers*]. -- Hence the "local" in *local average treatment effect*. --- name: late-ex ## The LATE: Medical-trial example Imagine treatment works for some `\(\left( \beta_{1,i} < 0 \right)\)` and not for others `\(\left( \beta_{1,j} = 0 \right)\)`. Suppose individuals know their response to blood-pressure medication. -- - `\(\beta_{1,i}<0\)` individuals always take the pill. -- - `\(\beta_{1,j}=0\)` individuals only take the pill when treated. -- Then our compliers will be individuals for whom `\(\beta_{1,j}=0\)`. 
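This complier logic is easy to simulate. In the sketch below (all parameters are made up for illustration), the pill works only for always-takers, so the IV estimate lands on the compliers' null effect:

```r
# Sketch of the heterogeneous-effects story: always-takers respond to
# the pill; compliers do not. (All parameters are illustrative.)
set.seed(1)
n    = 1e5
z    = rbinom(n, 1, 0.5)        # randomized assignment
at   = rbinom(n, 1, 0.5)        # 1 = always-taker, 0 = complier
d    = ifelse(at == 1, 1, z)    # always-takers take the pill regardless of z
beta = ifelse(at == 1, -10, 0)  # treatment effect only for always-takers
y    = 100 + beta * d + rnorm(n)
# IV identifies off of compliers only:
late_hat = cov(z, y) / cov(z, d)
late_hat                        # approximately 0: the complier LATE
```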
-- Thus, IV's LATE will indicate no treatment effect `\(\left( \widehat{\beta}_1^\text{IV} = 0 \right)\)`. --- ## The LATE .qa[Q] So is IV actually inconsistent? -- .qa[A] It depends on what you are trying to estimate (and how you interpret it). IV doesn't estimate the ATE or TOT, so it would be inconsistent for them..super[.pink[†]] .footnote[ .pink[†] Just as the TOT is not consistent for the ATE. ] -- IV estimates the *local* average treatment effect. -- .note[Takeaway] Because IV identifies off of compliers, it estimates an average treatment effect for these individuals (who *comply* with the instrument). -- .note[Takeaway.sub[2]] Different instruments have different LATEs. --- name: monotonicity ## Monotonicity We've already written down the two classical IV/2SLS assumptions - .note[First stage:] `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right) \neq 0\)` - .note[Exclusion restriction:] `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \varepsilon_{i} \right) = 0\)` but we need a third assumption to ensure IV's complier-based LATE interpretation. -- - .attn[Monotonicity] (.attn[Uniformity]).attn[:] `\(\text{D}_{i}(z)\geq \text{D}_{i}(z')\)` or `\(\text{D}_{i}(z)\leq \text{D}_{i}(z') \enspace \forall i\)` <br> [Heckman](http://jenni.uchicago.edu/papers/koop2006/koop2-IV_ho_2006-09-25a_mms.pdf): *Uniformity* of responses *across persons.* <br> [Imbens and Angrist (1994)](https://www.jstor.org/stable/2951620): Instrument has monotone effect on `\(\text{D}_{i}\)`. --- ## Monotonicity If "defiers" exist, then monotonicity/uniformity is violated. -- In this case, the IV estimand is $$ `\begin{align} \dfrac{\tau_{c} \mathop{\text{Pr}}\left(\text{complier}\right) - \tau_{d} \mathop{\text{Pr}}\left(\text{defier}\right)}{ \mathop{\text{Pr}}\left(\text{complier}\right) - \mathop{\text{Pr}}\left(\text{defier}\right)} \end{align}` $$ which need not lie between `\(\tau_{c}\)` and `\(\tau_{d}\)`.
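As a quick sketch, this estimand is simple to compute for any mix of compliers and defiers (the helper function and the particular numbers below are made up for illustration):

```r
# Sketch: the IV estimand when defiers are present.
iv_estimand = function(tau_c, tau_d, pr_c, pr_d) {
  (tau_c * pr_c - tau_d * pr_d) / (pr_c - pr_d)
}
# Both group effects are positive, yet the estimand is negative:
iv_estimand(tau_c = 1, tau_d = 3, pr_c = 2/3, pr_d = 1/3)  # -1
```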
-- .ex[Example] `\(\tau_c=\)` 1 and `\(\tau_d=\)` 2. `\(\mathop{\text{Pr}}\left(\text{complier}\right)=\)` 2/3 and `\(\mathop{\text{Pr}}\left(\text{defier}\right)=\)` 1/3. -- Then the "LATE" is 0..super[.pink[†]] .footnote[ .pink[†] Some people would instead say that there is no LATE when you violate monotonicity. ] --- layout: false class: clear, middle Until now, we've focused on using a single instrument. The 2SLS estimator accommodates multiple instruments..super[.pink[†]] .footnote[ .pink[†] Whether you can find multiple valid instruments is another question. ] --- layout: true # Multiple instruments --- class: inverse, middle name: multi-inst --- ## Motivation .qa[Q] Why include multiple instruments? -- .qa[A] Multiple instruments can capture more variation in `\(\text{D}_{i}\)` (efficiency). -- Using terminology from the *system-of-equations* literature, - one instrument for one endogenous variable: .attn[just identified] - multiple instruments for one endogenous variable: .attn[overidentified] --- ## In practice With (valid) instruments `\(\text{Z}_{1i}\)` and `\(\text{Z}_{2i}\)`, our first stage becomes $$ `\begin{align} \text{D}_{i} = \gamma_0 + \gamma_1 \text{Z}_{1i} + \gamma_2 \text{Z}_{2i} + \gamma_3 \text{X}_{i} + u_i \end{align}` $$ -- while our second stage is still $$ `\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \widehat{\text{D}}_{i} + \beta_2 \text{X}_{i} + v_i \end{align}` $$ --- layout: true # Multiple instruments ## Example: Quarter of birth --- name: multi-ex Back to our quest to estimate the returns to education. -- [Angrist and Krueger (1991)](https://www.jstor.org/stable/2937954) proposed *quarter of birth* as a set of instruments for years of schooling. -- Accordingly, their first stage looks something like.super[.pink[†]] .footnote[ .pink[†] We need to drop one of the quarter-of-birth indicators to avoid perfect collinearity.
] $$ `\begin{align} \text{Schooling}_i = \gamma_0 &+ \gamma_1 \mathbb{I}(\text{Born Q1})_{i} + \gamma_2 \mathbb{I}(\text{Born Q2})_{i} \\&+ \gamma_3 \mathbb{I}(\text{Born Q3})_{i} + \gamma_4 \mathbb{I}(\text{Born Q4})_{i} \\&+ \gamma_5 \text{X}_{i} + u_{i} \end{align}` $$ --- .qa[Q] Is quarter of birth a valid instrument? -- .qa[Q1] Why would quarter of birth affect schooling? (.note[First stage]) -- .qa[A1] Students cannot drop out of school until a certain age, and quarter of birth affects your age at the time you begin school. -- .ex[Example] Some states require students to stay in school until they are 16. - Students who start school at age .hi-slate[6] drop out after .hi-slate[10] years of schooling. - Students who start school at age .hi-slate[5] drop out after .hi-slate[11] years of schooling. --- If students must begin school in the calendar year in which they turn 6 - December birthdates: begin school at 5.75; drop out with 10.25 yrs. - January birthdates: begin school at 6.75; drop out with 9.25 yrs. -- For some students, quarter of birth may affect the number of years in school. --- It turns out that the first stage is also pretty weak in this setting. .attn[Weak instruments] can cause several problems for 2SLS/IV: -- 1. Our estimator is a ratio of the reduced form and the first stage, so a weak first stage essentially blows up the reduced-form estimates (amplifying reduced-form noise/bias). -- 2. Many weak instruments lead to a finite-sample issue in which 2SLS is biased toward OLS—our first stage is essentially overfitting. -- What about our other requirements for a valid instrument? --- .qa[Q2] Is quarter of birth uncorrelated with `\(\varepsilon_i\)` (.note[excludable])? -- .qa[A2] While quarter of birth may be fairly arbitrary for some families, other families might time births. If these birth timers differ from other couples along other dimensions (_e.g._, income or education), then quarter of birth may correlate with `\(\varepsilon_i\)`.
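If birth timing does correlate with `\(\varepsilon_i\)`, IV is inconsistent. A small simulation sketch illustrates the problem (the data-generating process below is entirely made up):

```r
# Sketch: an "instrument" that violates the exclusion restriction.
# (All coefficients below are illustrative.)
set.seed(123)
n = 1e5
e = rnorm(n)            # unobserved determinants of the outcome
z = 0.3 * e + rnorm(n)  # instrument correlated with e: invalid
d = 0.5 * z + rnorm(n)  # strong first stage
y = 1 + 2 * d + e       # true effect of d is 2
b_bad = cov(z, y) / cov(z, d)
b_bad                   # well above the true effect of 2
```

Here the probability limit is `2 + Cov(z, e)/Cov(z, d) = 2 + 0.3/0.545` (roughly 2.55), so even an infinite sample would not fix the bias.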
--- .qa[Q3] Is the effect monotone? -- .qa[A3] Some.super[.pink[†]] argue that monotonicity may be violated in this setting. .footnote[ .pink[†] _E.g._, [Aliprantis (2012)](https://journals.sagepub.com/doi/abs/10.3102/1076998610396885) ] -- Consider December births. -- - Original idea: December birthdates will start school at age 5.75, inducing more years of education before 16. -- - *Redshirting* idea: Parents hold back December kids so they can be older (_i.e._, 6.75), inducing fewer years of education before 16. --- layout: true # 2SLS and .mono[R] --- name: 2sls-r ## `feols` You can implement 2SLS/IV in many ways in .mono[R]. Today: `feols()` from `fixest` <br>There are others, *e.g.*, `estimatr::iv_robust()` and `lfe::felm()` -- Specifically, `feols()` wants the exogenous "part" of the equation, a `|`, and the "link" between the endogenous regressors and the instruments -- , *e.g.*, ```r # Estimate 2SLS feols(Y ~ 1 | D ~ Z, data = sample_df) %>% tidy() ``` ``` #> # A tibble: 2 × 5 #> term estimate std.error statistic p.value #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 5.79 2.97 1.95 0.0546 #> 2 fit_D 1.11 0.304 3.64 0.000437 ``` --- ## Now in two stages! Of course, we can estimate 2SLS in two stages. ```r # First stage stage1 = feols(D ~ Z, data = sample_df) # First-stage results stage1 %>% tidy() ``` ``` #> # A tibble: 2 × 5 #> term estimate std.error statistic p.value #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 8.82 0.317 27.8 2.49e-48 #> 2 Z 0.326 0.103 3.16 2.11e- 3 ``` --- ## Second stage We just need to add `\(\widehat{\text{D}}_{i}\)` to our dataset.
```r # Add fitted (first-stage) values to data sample_df %<>% mutate(D_hat = stage1$fitted.values) # Second stage stage2 = feols(Y ~ D_hat, data = sample_df) # Second-stage results stage2 %>% tidy() ``` ``` #> # A tibble: 2 × 5 #> term estimate std.error statistic p.value #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 5.79 5.41 1.07 0.288 #> 2 D_hat 1.11 0.554 2.00 0.0482 ``` --- ## Standard errors However, recall that our second-stage standard errors are not correct. -- .center.hi-purple[Second-stage results] <span style="display:block; margin-bottom:-1em;"> </span> <table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Term </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> S.E. </th> <th style="text-align:right;"> t stat. </th> <th style="text-align:left;"> p-Value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;color: black !important;"> Int </td> <td style="text-align:right;background-color: white !important;color: black !important;"> 5.786 </td> <td style="text-align:right;background-color: white !important;color: black !important;"> 5.413 </td> <td style="text-align:right;background-color: white !important;color: black !important;"> 1.07 </td> <td style="text-align:left;background-color: white !important;color: black !important;"> 0.2877 </td> </tr> <tr> <td style="text-align:left;background-color: white !important;"> D hat </td> <td style="text-align:right;background-color: white !important;"> 1.108 </td> <td style="text-align:right;background-color: white !important;"> 0.554 </td> <td style="text-align:right;background-color: white !important;"> 2.00 </td> <td style="text-align:left;background-color: white !important;"> 0.0482 </td> </tr> </tbody> </table> -- .center.hi-pink[2SLS results] <span style="display:block; margin-bottom:-1em;"> </span> <table class="table" style="font-size: 20px; margin-left: 
auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Term </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> S.E. </th> <th style="text-align:right;"> t stat. </th> <th style="text-align:left;"> p-Value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;color: black !important;"> Int </td> <td style="text-align:right;background-color: white !important;color: black !important;"> 5.786 </td> <td style="text-align:right;background-color: white !important;color: black !important;"> 2.974 </td> <td style="text-align:right;background-color: white !important;color: black !important;"> 1.95 </td> <td style="text-align:left;background-color: white !important;color: black !important;"> 0.0546 </td> </tr> <tr> <td style="text-align:left;background-color: white !important;"> D </td> <td style="text-align:right;background-color: white !important;"> 1.108 </td> <td style="text-align:right;background-color: white !important;"> 0.304 </td> <td style="text-align:right;background-color: white !important;"> 3.64 </td> <td style="text-align:left;background-color: white !important;"> 0.0004 </td> </tr> </tbody> </table> --- layout: true # IV and 2SLS ## Conclusions --- name: conclusions 1. IV/2SLS focus on .hi-slate[isolating some "good" variation] in `\(\text{D}_{i}\)` via `\(\text{Z}_{i}\)`. 1. Important .hi-slate[requirements]: strong first stage, excludability, monotonicity. 1. IV and 2SLS .hi-slate[rescale the reduced form] with the first stage. 1. Estimates are .hi-slate[LATE from compliers]. 1. Different instruments can produce .hi-slate[different LATEs]. 1. A .hi-slate[weak first stage] can lead to problems. --- layout: false # Table of contents .col-left[ ### Admin .smallest[ 1. [Schedule](#schedule) ] ### Instrumental variables .smallest[ 1. [Research designs](#designs) 1. [Introduction](#intro) 1. [Definition](#defined) 1. [DAG](#iv-dag) 1. [Example](#example) 1. 
[IV estimator](#iv-estimator) ] ] .col-right[ ### Two-stage least squares .smallest[ 1. [Setup](#setup) 1. [The reduced form](#reduced-form) - [Defined](#reduced-form) - [Intuition](#reduced-intuition) - [Example](#reduced-example) - [Derivation](#reduced-derivation) 1. [Intuition and mechanics](#iv-intuition) - [Noncompliance](#iv-noncompliance) - [Rescaling](#iv-rescale) 1. [Heterogeneous treatment effects](#het) - [Venn diagram](#venn) - [LATE](#late) - [Example](#late-ex) - [Monotonicity](#monotonicity) 1. [Multiple instruments](#multi-inst) - [Example](#multi-ex) 1. [2SLS and .mono[R]](#2sls-r) 1. [Conclusions](#conclusions) ] ]