class: center, middle, inverse, title-slide .title[ # Instrumental Variables ] .subtitle[ ## EC 607, Set 9 ] .author[ ### Edward Rubin ] --- class: inverse, middle $$ `\begin{align} \def\ci{\perp\mkern-10mu\perp} \end{align}` $$ # Prologue --- name: schedule # Schedule ## Last time Matching and propensity-score methods - Conditional independence - Overlap ## Today Instrumental variables (and two-stage least squares) ## Upcoming Assignment 2 --- layout: true # Research designs --- class: inverse, middle --- name: designs ## Selection on observables and/or unobservables We've been focusing on .hi-slate[*selection-on-observables* designs], _i.e._, $$ `\begin{align} \left(\text{Y}_{0i},\, \text{Y}_{1i}\right) \ci \text{D}_{i}|\text{X}_{i} \end{align}` $$ for .hi-slate[observable] variables `\(\text{X}_{i}\)`. -- .hi-pink[*Selection-on-unobservables* designs] replace this assumption with two new (but related) assumptions 1. `\(\left(\text{Y}_{0i},\, \text{Y}_{1i}\right) \ci \text{Z}_{i}\)` 2. `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right) \neq 0\)` --- ## Selection on observables and/or unobservables Our main goal in causal-inference-minded (applied) econometrics boils down to isolating .b["good" variation] in `\(\text{D}_{i}\)` (exogenous/as-good-as-random) from .b["bad" variation] (the part of `\(\text{D}_{i}\)` correlated with `\(\text{Y}_{0i}\)` and `\(\text{Y}_{1i}\)`). -- (We want to avoid selection bias.) -- - .hi-slate[Selection-on-observables designs] assume that we can control for all *bad variation* (selection) in `\(\text{D}_{i}\)` through a known (observed) `\(\text{X}_{i}\)`. -- - .hi-pink[Selection-on-unobservables designs] assume that we can extract .b[part of] the *good variation* in `\(\text{D}_{i}\)` (generally using some `\(\text{Z}_{i}\)`) and then use this *good* part of `\(\text{D}_{i}\)` to estimate the effect of `\(\text{D}_{i}\)` on `\(\text{Y}_{i}\)`.
-- We throw away the rest of `\(\text{D}_{i}\)` (it includes *bad variation*). --- ## Which route? Which set of research designs is more palatable? -- 1. There are plenty of bad applications of both sets.<br>.purple[Violated assumptions, bad controls, *etc.*] -- 1. .hi-slate[Selection on observables] assumes we know .it[everything] about selection into treatment—we can identify .it[all] of the good (or bad) variation in `\(\text{D}_{i}\)`. -- <br>.purple[Tough in non-experimental settings. Difficult to validate in practice.] -- 1. .hi-pink[Selection on unobservables] assumes we can isolate .it[some] good/clean variation in `\(\text{D}_{i}\)`, which we then use to estimate the effect of `\(\text{D}_{i}\)` on `\(\text{Y}_{i}\)`. -- <br>.purple[Seems more plausible. Possible to validate. May be underpowered.] --- layout: true # Instrumental variables --- name: intro ## Introduction .attn[Instrumental variables] (IV).super[.pink[†]] is the canonical selection-on-unobservables design—isolating *good variation* in `\(\text{D}_{i}\)` via some magical .pink[instrument] `\(\color{#e64173}{\text{Z}_{i}}\)`. .footnote[.pink[†] For the moment, we're lumping IV and two-stage least squares (2SLS) together—as many people do—even though they are technically different.] -- Consider some model (structural equation) $$ `\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align}` $$ To guarantee consistent OLS estimates for `\(\beta_1\)`, we want `\(\mathop{\text{Cov}} \left( \text{D}_{i},\,\varepsilon_i \right)=0\)`. <br> In general, this is a heroic assumption. -- .note[Alternative:] Estimate `\(\beta_1\)` via instrumental variables. --- name: defined ## Definition For our model $$ `\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align}` $$ A valid .attn[instrument] is a variable `\(\color{#e64173}{\text{Z}_{i}}\)` such that 1.
`\(\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \text{D}_{i} \right) \neq 0\)` -- <br>our .pink[instrument] correlates with treatment -- (so we can keep part of `\(\text{D}_{i}\)`) -- 2. `\(\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \varepsilon_i \right) = 0\)` -- <br>our .pink[instrument] is uncorrelated with other (non- `\(\!\!\text{D}_{i}\)`) determinants of `\(\text{Y}_{i}\)` -- , _i.e._, `\(\color{#e64173}{\text{Z}_{i}}\)` is excludable from equation `\((1)\)`. -- .attn[(exclusion restriction)] --- name: iv-dag ## The DAG <img src="09-iv_files/figure-html/dag-plot-1.svg" style="display: block; margin: auto;" /> -- .qa[Q] How does this DAG illustrate the requirements and identification of IV? --- ## The DAG <img src="09-iv_files/figure-html/dag-plot-2-1.svg" style="display: block; margin: auto;" /> .qa[Relevance:] .b.purple[Z] has an effect on .b.purple[D]. --- ## The DAG <img src="09-iv_files/figure-html/dag-plot-3-1.svg" style="display: block; margin: auto;" /> .qa[Exclusion restriction:] <br> 1\. .b.purple[Z] is .b.pink[exogenous] to (not associated with) .b.purple[U] because -- .b.purple[D] is a collider. -- <br> .white[1\.] .it[I.e.], the path .b.purple[Z → D ← U → Y] is closed without conditioning on (unobservable) .b.purple[U]. --- ## The DAG <img src="09-iv_files/figure-html/dag-plot-4-1.svg" style="display: block; margin: auto;" /> .qa[Exclusion restriction:] <br> 1\. .b.purple[Z] is .b.pink[exogenous] to (not associated with) .b.purple[U] because .b.purple[D] is a collider. <br> 2\. Also: .b.purple[Z] does not directly cause .b.purple[Y]. --- name: example ## Example Back to the returns to a college degree, $$ `\begin{align} \text{Income}_i = \beta_0 + \beta_1 \text{Grad}_i + \varepsilon_i \end{align}` $$ OLS is likely biased. -- What if a state conducts a (random) .hi-pink[lottery] for scholarships?
-- Let `\(\color{#e64173}{\text{Lottery}_i}\)` denote an indicator for whether `\(i\)` won a lottery scholarship..super[.pink[†]] .footnote[.pink[†] We'll have to focus on families who were eligible/who applied.] -- 1. `\(\mathop{\text{Cov}} \left( \color{#e64173}{\text{Lottery}_i},\, \text{Grad}_i \right)\neq 0\)` `\(\left( >0 \right)\)` if scholarships increase grad. rates. -- 2. `\(\mathop{\text{Cov}} \left(\color{#e64173}{\text{Lottery}_i},\, \varepsilon_i\right) = 0\)` since the lottery is randomized. --- layout: true # Instrumental variables ## The IV estimator The IV estimator for our model $$ `\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align}` $$ with (valid) instrument `\(\color{#e64173}{\text{Z}_{i}}\)` is $$ `\begin{align} \hat{\beta}_\text{IV} = \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{Y}\right) \end{align}` $$ --- name: iv-estimator -- If you have no covariates, then $$ `\begin{align} \hat{\beta}_\text{IV} = \dfrac{\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \text{Y}_{i}\right)}{\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \text{D}_{i} \right)} \end{align}` $$ --- If you have additional (exogenous) covariates `\(\text{X}_i\)`, then $$ `\begin{align} \text{Z} &= \begin{bmatrix}\color{#e64173}{\text{Z}_{i}} & \text{X}_{i}\end{bmatrix} \\[0.5em] \text{D} &= \begin{bmatrix}\text{D}_{i} & \text{X}_{i}\end{bmatrix} \end{align}` $$ --- layout: true # Instrumental variables --- ## Proof: Consistency With a valid instrument `\(\text{Z}_{i}\)`, `\(\hat{\beta}_\text{IV}\)` is a consistent estimator for `\(\beta_1\)` in $$ `\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align}` $$ `\(\mathop{\text{plim}}\left( \hat{\beta}_\text{IV} \right)\)` -- .pad-left[ `\(= \mathop{\text{plim}}\left( \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{Y}\right) \right)\)` ] -- .pad-left[ `\(= \mathop{\text{plim}}\left(
\left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{D} \beta + \text{Z}'\varepsilon\right) \right)\)` ] -- .pad-left[ `\(= \mathop{\text{plim}}\left( \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{D}\right) \beta\right) + \mathop{\text{plim}}\left(\dfrac{1}{N} \text{Z}'\text{D}\right)^{-1} \mathop{\text{plim}}\left( \dfrac{1}{N} \text{Z}'\varepsilon\right)\)` ] -- .pad-left[ `\(=\beta\)` .pink[✔] ] --- layout: true # Two-stage least squares --- class: inverse, middle --- name: setup ## Setup You'll commonly see IV implemented as a two-stage process known as<br>.attn[two-stage least squares] (2SLS). -- .attn[First stage] Regress our endogenous variable `\(\text{D}_{i}\)` on the instrument `\(\color{#e64173}{\text{Z}_{i}}\)` and (predetermined) covariates `\(\text{X}_{i}\)`. Save the fitted values `\(\color{#6A5ACD}{\widehat{\text{D}}_{i}}\)`. $$ `\begin{align} \text{D}_{i} = \gamma_1 \color{#e64173}{\text{Z}_{i}} + \gamma_2 \text{X}_{i} + u_i \end{align}` $$ -- .attn[Second stage] Estimate the model we wanted—but only using the variation in `\(\text{D}_{i}\)` that correlates with `\(\color{#e64173}{\text{Z}_{i}}\)`, _i.e._, `\(\color{#6A5ACD}{\widehat{\text{D}}_{i}}\)`. $$ `\begin{align} \text{Y}_{i} = \beta_1 \color{#6A5ACD}{\widehat{\text{D}}_{i}} + \beta_2 \text{X}_{i} + \varepsilon_i \end{align}` $$ .note[Note] The controls `\(\text{X}_{i}\)` must match in the first and second stages. --- ## IV estimation This two-step procedure, with a valid instrument, produces an estimator `\(\hat{\beta}_1\)` that is consistent for `\(\beta_1\)`.
$$ `\begin{align} \hat{\beta}_\text{2SLS} &= \left( \text{D}' \text{P}_{\text{Z}} \text{D} \right)^{-1} \left( \text{D}' \text{P}_{\text{Z}} \text{Y} \right) \\[0.3em] \text{P}_{\text{Z}} &= \text{Z} \left( \text{Z}'\text{Z} \right)^{-1} \text{Z}' \end{align}` $$ where `\(\text{D}\)` is a matrix of our treatment and predetermined covariates `\(\left( \text{X}_{i} \right)\)` and `\(\text{Z}\)` is a matrix of our instrument and our predetermined covariates. --- ## IV estimation Important notes - The controls `\(\left( \text{X}_{i} \right)\)` must match in the first and second stages. - *Related:* Nonlinear first stages can mess things up. - If you have exactly .hi-slate[one instrument] and exactly .hi-slate[one endogenous variable], then 2SLS and IV are identical. - Your second-stage standard errors are not correct. --- name: reduced-form ## The reduced form In addition to the regressions within the two stages of 2SLS 1. `\(\text{D}_{i} = \gamma_1 \color{#e64173}{\text{Z}_{i}} + \gamma_2 \text{X}_{i} + u_i\)` 2. `\(\text{Y}_{i} = \beta_1 \color{#6A5ACD}{\widehat{\text{D}}_{i}} + \beta_2 \text{X}_{i} + \varepsilon_i\)` there is a third important and related regression: the reduced form. -- The .attn[reduced form] regresses the outcome `\(\text{Y}_{i}\)` (LHS of the second stage) on our instrument `\(\color{#e64173}{\text{Z}_{i}}\)` and covariates `\(\text{X}_{i}\)` (RHS of the first stage). $$ `\begin{align} \text{Y}_{i} = \pi_1 \color{#e64173}{\text{Z}_{i}} + \pi_2 \text{X}_{i} + v_i \end{align}` $$ -- Because a valid `\(\color{#e64173}{\text{Z}_{i}}\)` is exogenous, the reduced form provides a consistent estimate of the causal effect of our instrument on the outcome. --- ## The reduced form, continued While the reduced form estimates the causal effect of the instrument on our outcome, we're often actually interested in the effect of *treatment* `\(\left( \text{D}_{i} \right)\)`. -- That said, the reduced form is still incredibly helpful/important: - Clarifies your source of identifying variation.
-- - Does not suffer from *weak instruments* problems. -- - Only requires `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \varepsilon_i \right) = 0\)`. -- - Offers insights into your estimates -- $$ `\begin{align} \widehat{\beta}_{1}^\text{2SLS} = \dfrac{\widehat{\pi}_{1}}{\widehat{\gamma}_{1}} \end{align}` $$ when you have exactly one instrument. --- name: reduced-intuition ## The reduced form, intuition This expression for the 2SLS (and IV) estimator can be very helpful. $$ `\begin{align} \widehat{\beta}_{1}^\text{2SLS} = \dfrac{\color{#6A5ACD}{\widehat{\pi}_{1}}}{\color{#20B2AA}{\widehat{\gamma}_{1}}} = \dfrac{\color{#6A5ACD}{\text{Reduced-form estimate}}}{\color{#20B2AA}{\text{First-stage estimate}}} \end{align}` $$ -- What's the interpretation/intuition? -- Back to our example: `\(\widehat{\beta}_1 =\)` est. effect of college graduation on income. -- `\(\color{#6A5ACD}{\widehat{\pi}_1}\)` gives the estimated causal effect of the scholarship lottery on income -- , but what share of lottery winners graduate? We need to rescale if `\(<\)` 100%. -- `\(\color{#20B2AA}{\widehat{\gamma}_1}\)` estimates the effect of winning the scholarship lottery on graduation -- —the share of winners who graduated due to winning. -- We can scale with `\(\color{#20B2AA}{\widehat{\gamma}_1}\)`! --- name: reduced-example ## The reduced form, example To see why this scaling makes sense, imagine that 50% of lottery winners graduate from college due to the lottery, _i.e._, `\(\color{#20B2AA}{\widehat{\gamma}_1 =}\)` .turquoise[0.50]..super[.pink[†]] .footnote[.pink[†] Imagine none of the applicants would have graduated otherwise.] -- Our reduced-form estimate of `\(\color{#6A5ACD}{\widehat{\pi}_1=}\)` .purple[$5,000] says that lottery winners make $5,000 more than the control group, on average. -- However, half of the winners did not graduate, so `\(\color{#6A5ACD}{\widehat{\pi}_1}\)` "underestimates" the effect of college graduation by combining graduates with nongraduates.
-- Thus, we want to double `\(\color{#6A5ACD}{\widehat{\pi}_1}\)`, _i.e._, divide by `\(\color{#20B2AA}{\widehat{\gamma}_1}\)`: `\(\color{#6A5ACD}{\widehat{\pi}_1}/\color{#20B2AA}{\widehat{\gamma}_1}\)` = .purple[$5,000]/.turquoise[0.5] = $10,000. --- name: reduced-derivation .qa[Q] How do we get this magical expression? `\(\left( \widehat{\beta}_1^\text{IV} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1} \right)\)` -- ## Derivation -- `\(\widehat{\beta}_1^\text{IV} = \left( \text{Z}'\text{D} \right)^{-1} \left( \text{Z}'\text{Y} \right)\)` -- `\(\color{#ffffff}{\widehat{\beta}_1^\text{IV}} = \left( \widetilde{\text{Z}}'\widetilde{\text{D}} \right)^{-1} \left( \widetilde{\text{Z}}'\text{Y} \right)\)` applying FWL (partialling out the covariates `\(\text{X}\)`) to reduce `\(\text{D}\)` and `\(\text{Z}\)` to vectors. -- `\(\color{#ffffff}{\widehat{\beta}_1^\text{IV}} = \dfrac{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_{i},\, \text{Y}_{i} \right)}{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_i,\, \widetilde{\text{D}}_{i} \right)}\)` -- `\(= \dfrac{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_{i},\, \text{Y}_{i} \right)/\mathop{\text{Var}} \left( \widetilde{\text{Z}}_i \right)}{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_i,\, \widetilde{\text{D}}_{i} \right)/\mathop{\text{Var}} \left( \widetilde{\text{Z}}_i \right)}\)` -- `\(\color{#ffffff}{\widehat{\beta}_1^\text{IV}} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1}\)` .pink[✔] --- layout: false class: clear, middle Let's push a bit deeper into IV's mechanics and intuition. --- layout: true # IV: Mechanics and intuition --- name: iv-intuition ## Setup In this section, we'll use medical trials as a working example..super[.pink[†]] .footnote[.pink[†] Credit/thanks go to [Michael Anderson](https://are.berkeley.edu/~mlanderson/ARE_Website/Home.html) for this example—and much of these notes.]
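Before turning to the example, the `\(\widehat{\pi}_1/\widehat{\gamma}_1\)` result from the derivation above is easy to verify numerically. Here is a minimal sketch with simulated data; every name and coefficient below is made up for illustration:

```r
# Sketch: with one instrument and no covariates, the IV estimate
# equals the reduced form divided by the first stage.
# (Simulated data; all coefficients here are illustrative.)
set.seed(42)
n = 1e4
z = rbinom(n, 1, 0.5)                       # instrument (e.g., a lottery)
u = rnorm(n)                                # unobserved confounder
d = as.numeric(0.5 * z + u + rnorm(n) > 0)  # endogenous treatment
y = 1 + 2 * d + u + rnorm(n)                # true effect of d is 2
# IV estimate: Cov(z, y) / Cov(z, d)
b_iv = cov(z, y) / cov(z, d)
# Reduced form divided by first stage
ratio = coef(lm(y ~ z))["z"] / coef(lm(d ~ z))["z"]
all.equal(unname(ratio), b_iv)              # TRUE: the two coincide
```

The equality is mechanical: both the reduced-form and first-stage slopes divide a covariance with `z` by `var(z)`, so the variances cancel in the ratio.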
-- We are interested in the regression model for the effect of some treatment (_e.g._, blood-pressure medication) on medical outcome `\(\text{Y}_{i}\)` -- $$ `\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \end{align}` $$ `\(\text{D}_{i}\)` indicates whether `\(i\)` *takes* the treatment (medication). `\(\varepsilon_i\)` captures all other factors that affect `\(\text{Y}_{i}\)`. -- Or in the potential-outcomes framework: $$ `\begin{align} \text{Y}_{i} &= \text{Y}_{1i} \text{D}_{i} + \text{Y}_{0i} (1-\text{D}_{i}) \\ \text{Y}_{0i} &= \beta_0 + \varepsilon_i \\ \text{Y}_{1i} &= \text{Y}_{0i} + \beta_1 \end{align}` $$ --- ## Research design .note[Goal] .hi-slate[Estimate the effect of blood-pressure medication] on blood pressure. -- .note[Challenge] .hi-slate[Selection bias:] Even if treatment reduces blood pressure, selection bias will fight against the estimated effect. -- .note[Solution] .hi-slate[Randomized medical trial:] Ask randomly chosen individuals in the treatment group to take the pill. Controls get placebo (or nothing). -- .note[Analysis 1] .attn[Intention to treat] (.attn[ITT]): `\(\widehat{\beta}_1^\text{ITT} = \overline{\text{Y}}_\text{Trt} - \overline{\text{Y}}_\text{Ctrl}\)` -- .note[ITT problem] .attn[Bias from noncompliance:] People don't always follow rules. <br>*E.g.*, treated folks who don't take pills; control folks who take pills. -- .note[Analysis 2] .hi-slate[IV!] -- Instrument medication `\(\text{D}_{i}\)` with intention to treat `\(\text{Z}_{i}\)`. --- ## The IV solution First question: Is `\(\text{Z}_{i}\)` a valid instrument for `\(\text{D}_{i}\)`? -- 1. `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \varepsilon_i \right) = 0\)` as `\(\text{Z}_{i}\)` was randomly assigned (exclusion restriction). -- 1. `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right)\neq 0\)` if assignment to treatment changes the likelihood you take the pills (first stage).
-- ∴ `\(\text{Z}_{i}\)` is a valid instrument for `\(\text{D}_{i}\)` and IV consistently estimates `\(\beta_1\)`. --- name: iv-noncompliance ## Noncompliance .attn[Noncompliant] individuals do not abide by their treatment assignment. -- Let's see how IV "solves" this problem. -- First, assume noncompliance only affects treated individuals—*i.e.*, treated folks sometimes don't take their pills; control folks never take pills. --- ## Noncompliance, continued The .hi-slate[first stage] recovers the share of treated individuals who take the pill $$ `\begin{align} \text{D}_{i} = \gamma_1 \text{Z}_{i} + u_i \end{align}` $$ -- *i.e.*, if 50% of treated individuals take the medication, `\(\widehat{\gamma}_1 =\)` 0.50. -- The .hi-slate[reduced form] estimates the *ITT* $$ `\begin{align} \text{Y}_{i} = \pi_1 \text{Z}_{i} + v_i \end{align}` $$ -- which we know IV rescales using the first stage $$ `\begin{align} \widehat{\beta}_{1}^\text{IV} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1} = \dfrac{\widehat{\pi}_1}{0.50} = 2 \times \widehat{\pi}_1 \end{align}` $$ --- name: iv-rescale ## Noncompliance, continued IV solves the noncompliance issue by rescaling by the rate of compliance. -- If everyone perfectly complies, then `\(\widehat{\gamma}_1 = 1\)` and `\(\widehat{\beta}_{1}^\text{IV} = \widehat{\pi}_1/1 = \widehat{\beta}_{1}^\text{ITT}\)`. -- .ex[Further example] `\(N_\text{Trt}\)` = 10; trt. compliance = 50%; ctrl. compliance = 100%. `\(\overline{\text{Y}}_\text{Trt} = \dfrac{5 (\beta_0 + \beta_1) + 5 (\beta_0)}{10} = \beta_0 + \dfrac{\beta_1}{2}\)` -- and `\(\overline{\text{Y}}_\text{Ctrl} = \beta_0\)`. -- So our reduced-form estimate (the ITT) is `\(\widehat{\pi}_1 = \dfrac{\beta_1}{2}\)` (half the true effect). -- IV consistently estimates `\(\beta_1\)` via rescaling the ITT by the rate of compliance $$ `\begin{align} \widehat{\beta}_1^\text{IV} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1} = \dfrac{\beta_1/2}{1/2} = \beta_1 \end{align}` $$ --- ## Takeaways Main points 1.
IV .b[rescales] .pink[the causal effect of] `\(\color{#e64173}{\text{Z}_{i}}\)` .pink[on] `\(\color{#e64173}{\text{Y}_{i}}\)` by .purple[the causal effect of] `\(\color{#6A5ACD}{\text{Z}_{i}}\)` .purple[on] `\(\color{#6A5ACD}{\text{D}_{i}}\)`. -- 1. IV .b[does not] compare treated compliers to untreated compliers. -- <br>Such a comparison/estimator would re-introduce selection bias. --- layout: true class: clear, middle --- name: het Thus far, we assumed homogeneous treatment effects. .qa[Q] What happens .b[when treatment effects are heterogeneous]? --- .qa[A] Let's recall what our instruments are doing (with Venn diagrams!). .note[Credit] [Glen Waddell](http://www.glenwaddell.com) introduced me to IV via Venn. --- name: venn <img src="09-iv_files/figure-html/venn_iv-1.svg" style="display: block; margin: auto;" /> --- <img src="09-iv_files/figure-html/venn-endog-1.svg" style="display: block; margin: auto;" /> --- <img src="09-iv_files/figure-html/venn-irrelevant-1.svg" style="display: block; margin: auto;" /> --- <img src="09-iv_files/figure-html/venn-iv-endog2-1.svg" style="display: block; margin: auto;" /> --- <img src="09-iv_files/figure-html/venn-iv-endog1-1.svg" style="display: block; margin: auto;" /> --- layout: false class: clear, middle Can you draw the DAGs? --- layout: true # IV + heterogeneity --- ## Recap Throughout the course, we've discussed two concepts of treatment effects. -- 1. .attn[Average treatment effect] (.attn[ATE]) The average treatment effect for an individual randomly drawn from our sample. -- 1. .attn[Treatment on the treated] (.attn[TOT]) The average treatment effect for a .it.hi-slate[treated] individual randomly drawn from our sample. -- When we assume homogeneous/constant treatment effects, ATE = TOT. -- .qa[Q] If treatment effects vary, then what do IV and 2SLS estimate? -- .qa[A] Not ATE. -- And not TOT. 
-- They estimate the LATE..super[.pink[†]] .footnote[ .pink[†] See [Angrist, Imbens, and Rubin (1996)](https://www.jstor.org/stable/2291629). ] --- name: late ## The LATE IV generally estimates the .attn[LATE]—the .attn[Local Average Treatment Effect]. -- .note[Recall] IV "works" by isolating variation in `\(\text{D}_{i}\)` induced by our instrument `\(\text{Z}_{i}\)`. -- In other words: IV focuses on the individuals whose `\(\text{D}_{i}\)` changes due to `\(\text{Z}_{i}\)`. Angrist, Imbens, and Rubin (1996) call these folks .attn[compliers]. -- However, *compliers* are only one of four possible groups. .col-left[ 1. .attn[Compliers] `\(\text{D}_{i} = 1\)` iff `\(\text{Z}_{i}=1\)`. 1. .attn[Always-takers] `\(\text{D}_{i} = 1\)` `\(\forall \text{Z}_{i}\)`. 1. .attn[Never-takers] `\(\text{D}_{i} = 0\)` `\(\forall \text{Z}_{i}\)`. 1. .attn[Defiers] `\(\text{D}_{i} = 1\)` iff `\(\text{Z}_{i}=0\)`. ] -- .col-right[ Only take pills .hi-slate[when treated]. <br>.hi-slate[Always] take pills. <br>.hi-slate[Never] take pills. <br>Only take pills .hi-slate[when untreated]. ] --- ## The LATE Because IV only uses variation in `\(\text{D}_{i}\)` that correlates with `\(\text{Z}_{i}\)`, IV mechanically drops *always-takers* and *never-takers*. -- Most IV derivations/applications assume away the existence of *defiers*. -- Thus, IV estimates a treatment effect .hi-slate[using only *compliers*]. -- Hence the "local" in *local average treatment effect*. --- name: late-ex ## The LATE: Medical-trial example Imagine treatment works for some `\(\left( \beta_{1,i} < 0 \right)\)` and not for others `\(\left( \beta_{1,j} = 0 \right)\)`. Suppose individuals know their response to blood-pressure medication. -- - `\(\beta_{1,i}<0\)` individuals always take the pill. -- - `\(\beta_{1,j}=0\)` individuals only take the pill when treated. -- Then our compliers will be individuals for whom `\(\beta_{1,j}=0\)`. 
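This complier logic is easy to simulate. In the sketch below (all parameters are made up for illustration), the pill works only for always-takers, so the IV estimate lands on the compliers' null effect:

```r
# Sketch of the heterogeneous-effects story: always-takers respond to
# the pill; compliers do not. (All parameters are illustrative.)
set.seed(1)
n    = 1e5
z    = rbinom(n, 1, 0.5)        # randomized assignment
at   = rbinom(n, 1, 0.5)        # 1 = always-taker, 0 = complier
d    = ifelse(at == 1, 1, z)    # always-takers take the pill regardless of z
beta = ifelse(at == 1, -10, 0)  # treatment effect only for always-takers
y    = 100 + beta * d + rnorm(n)
# IV identifies off of compliers only:
late_hat = cov(z, y) / cov(z, d)
late_hat                        # approximately 0: the complier LATE
```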
-- Thus, IV's LATE will indicate no treatment effect `\(\left( \widehat{\beta}_1^\text{IV} = 0 \right)\)`. --- ## The LATE .qa[Q] So is IV actually inconsistent? -- .qa[A] It depends on what you are trying to estimate (and how you interpret it). IV doesn't estimate the ATE or TOT, so it would be inconsistent for them..super[.pink[†]] .footnote[ .pink[†] Just as the TOT is not consistent for the ATE. ] -- IV estimates the *local* average treatment effect. -- .note[Takeaway] Because IV identifies off of compliers, it estimates an average treatment effect for these individuals (who *comply* with the instrument). -- .note[Takeaway.sub[2]] Different instruments have different LATEs. --- name: monotonicity ## Monotonicity We've already written down the two classical IV/2SLS assumptions - .note[First stage:] `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right) \neq 0\)` - .note[Exclusion restriction:] `\(\mathop{\text{Cov}} \left( \text{Z}_{i},\, \varepsilon_{i} \right) = 0\)` but we need a third assumption to ensure IV's complier-based LATE interpretation. -- - .attn[Monotonicity] (.attn[Uniformity]).attn[:] `\(\text{D}_{i}(z)\geq \text{D}_{i}(z')\)` or `\(\text{D}_{i}(z)\leq \text{D}_{i}(z') \enspace \forall i\)` <br> [Heckman](http://jenni.uchicago.edu/papers/koop2006/koop2-IV_ho_2006-09-25a_mms.pdf): *Uniformity* of responses *across persons.* <br> [Imbens and Angrist (1994)](https://www.jstor.org/stable/2951620): Instrument has monotone effect on `\(\text{D}_{i}\)`. --- ## Monotonicity If "defiers" exist, then monotonicity/uniformity is violated. -- In this case, the IV estimand is $$ `\begin{align} \dfrac{\tau_{c} \mathop{\text{Pr}}\left(\text{complier}\right) - \tau_{d} \mathop{\text{Pr}}\left(\text{defier}\right)}{ \mathop{\text{Pr}}\left(\text{complier}\right) - \mathop{\text{Pr}}\left(\text{defier}\right)} \end{align}` $$ which need not lie between `\(\tau_{c}\)` and `\(\tau_{d}\)`.
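As a quick sketch, this estimand is simple to compute for any mix of compliers and defiers (the helper function and the particular numbers below are made up for illustration):

```r
# Sketch: the IV estimand when defiers are present.
iv_estimand = function(tau_c, tau_d, pr_c, pr_d) {
  (tau_c * pr_c - tau_d * pr_d) / (pr_c - pr_d)
}
# Both group effects are positive, yet the estimand is negative:
iv_estimand(tau_c = 1, tau_d = 3, pr_c = 2/3, pr_d = 1/3)  # -1
```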
-- .ex[Example] `\(\tau_c=\)` 1 and `\(\tau_d=\)` 2. `\(\mathop{\text{Pr}}\left(\text{complier}\right)=\)` 2/3 and `\(\mathop{\text{Pr}}\left(\text{defier}\right)=\)` 1/3. -- Then the "LATE" is 0..super[.pink[†]] .footnote[ .pink[†] Some people would instead say that there is no LATE when you violate monotonicity. ] --- layout: false class: clear, middle Until now, we've focused on using a single instrument. The 2SLS estimator accommodates multiple instruments..super[.pink[†]] .footnote[ .pink[†] Whether you can find multiple valid instruments is another question. ] --- layout: true # Multiple instruments --- class: inverse, middle name: multi-inst --- ## Motivation .qa[Q] Why include multiple instruments? -- .qa[A] Multiple instruments can capture more variation in `\(\text{D}_{i}\)` (efficiency). -- Using terminology from the *system-of-equations* literature, - one instrument for one endogenous variable: .attn[just identified] - multiple instruments for one endogenous variable: .attn[overidentified] --- ## In practice With (valid) instruments `\(\text{Z}_{1i}\)` and `\(\text{Z}_{2i}\)`, our first stage becomes $$ `\begin{align} \text{D}_{i} = \gamma_0 + \gamma_1 \text{Z}_{1i} + \gamma_2 \text{Z}_{2i} + \gamma_3 \text{X}_{i} + u_i \end{align}` $$ -- while our second stage is still $$ `\begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \widehat{\text{D}}_{i} + \beta_2 \text{X}_{i} + v_i \end{align}` $$ --- layout: true # Multiple instruments ## Example: Quarter of birth --- name: multi-ex Back to our quest to estimate the returns to education. -- [Angrist and Krueger (1991)](https://www.jstor.org/stable/2937954) proposed *quarter of birth* as a set of instruments for years of schooling. -- Accordingly, their first stage looks something like.super[.pink[†]] .footnote[ .pink[†] We need to drop one of the quarter-of-birth indicators to avoid perfect collinearity.
] $$ `\begin{align} \text{Schooling}_i = \gamma_0 &+ \gamma_1 \mathbb{I}(\text{Born Q1})_{i} + \gamma_2 \mathbb{I}(\text{Born Q2})_{i} \\&+ \gamma_3 \mathbb{I}(\text{Born Q3})_{i} + \gamma_4 \mathbb{I}(\text{Born Q4})_{i} \\&+ \gamma_5 \text{X}_{i} + u_{i} \end{align}` $$ --- .qa[Q] Is quarter of birth a valid instrument? -- .qa[Q1] Why would quarter of birth affect schooling? (.note[First stage]) -- .qa[A1] Students cannot drop out of school until a certain age, and quarter of birth affects your age at the time you begin school. -- .ex[Example] Some states require students to stay in school until they are 16. - Students who start school at age .hi-slate[6] drop out after .hi-slate[10] years of schooling. - Students who start school at age .hi-slate[5] drop out after .hi-slate[11] years of schooling. --- If students must begin school in the calendar year in which they turn 6 - December birthdates: begin school at 5.75; drop out with 10.25 yrs. - January birthdates: begin school at 6.75; drop out with 9.25 yrs. -- For some students, quarter of birth may affect the number of years in school. --- It turns out that the first stage is also pretty weak in this setting. .attn[Weak instruments] can cause several problems for 2SLS/IV: -- 1. Our estimator is a ratio of the reduced form and the first stage, so a weak first stage essentially blows up the reduced-form estimates (amplifying reduced-form noise/bias). -- 2. Many weak instruments lead to a finite-sample issue in which 2SLS is biased toward OLS—our first stage is essentially overfitting. -- What about our other requirements for a valid instrument? --- .qa[Q2] Is quarter of birth uncorrelated with `\(\varepsilon_i\)` (.note[excludable])? -- .qa[A2] While quarter of birth may be fairly arbitrary for some families, other families might time births. If these birth timers differ from other couples along other dimensions (_e.g._, income or education), then quarter of birth may correlate with `\(\varepsilon_i\)`.
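If birth timing does correlate with `\(\varepsilon_i\)`, IV is inconsistent. A small simulation sketch illustrates the problem (the data-generating process below is entirely made up):

```r
# Sketch: an "instrument" that violates the exclusion restriction.
# (All coefficients below are illustrative.)
set.seed(123)
n = 1e5
e = rnorm(n)            # unobserved determinants of the outcome
z = 0.3 * e + rnorm(n)  # instrument correlated with e: invalid
d = 0.5 * z + rnorm(n)  # strong first stage
y = 1 + 2 * d + e       # true effect of d is 2
b_bad = cov(z, y) / cov(z, d)
b_bad                   # well above the true effect of 2
```

Here the probability limit is `2 + Cov(z, e)/Cov(z, d) = 2 + 0.3/0.545` (roughly 2.55), so even an infinite sample would not fix the bias.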
--- .qa[Q3] Is the effect monotone? -- .qa[A3] Some.super[.pink[†]] argue that monotonicity may be violated in this setting. .footnote[ .pink[†] _E.g._, [Aliprantis (2012)](https://journals.sagepub.com/doi/abs/10.3102/1076998610396885) ] -- Consider December births. -- - Original idea: December birthdates will start school at age 5.75, inducing more years of education before 16. -- - *Redshirting* idea: Parents hold back December kids so they can be older (_i.e._, 6.75), inducing fewer years of education before 16. --- layout: true # 2SLS and .mono[R] --- name: 2sls-r ## `feols` You can implement 2SLS/IV in many ways in .mono[R]. Today: `feols()` from `fixest` <br>There are others, *e.g.*, `estimatr::iv_robust()` and `lfe::felm()` -- Specifically, `feols()` wants the exogenous "part" of the equation, a `|`, and the "link" between the endogenous regressors and the instruments -- , *e.g.*, ```r # Estimate 2SLS feols(Y ~ 1 | D ~ Z, data = sample_df) %>% tidy() ``` ``` #> # A tibble: 2 × 5 #> term estimate std.error statistic p.value #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 5.79 2.97 1.95 0.0546 #> 2 fit_D 1.11 0.304 3.64 0.000437 ``` --- ## Now in two stages! Of course, we can estimate 2SLS in two stages. ```r # First stage stage1 = feols(D ~ Z, data = sample_df) # First-stage results stage1 %>% tidy() ``` ``` #> # A tibble: 2 × 5 #> term estimate std.error statistic p.value #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 8.82 0.317 27.8 2.49e-48 #> 2 Z 0.326 0.103 3.16 2.11e- 3 ``` --- ## Second stage We just need to add `\(\widehat{\text{D}}_{i}\)` to our dataset.
```r # Add fitted (first-stage) values to data sample_df %<>% mutate(D_hat = stage1$fitted.values) # Second stage stage2 = feols(Y ~ D_hat, data = sample_df) # Second-stage results stage2 %>% tidy() ``` ``` #> # A tibble: 2 × 5 #> term estimate std.error statistic p.value #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 5.79 5.41 1.07 0.288 #> 2 D_hat 1.11 0.554 2.00 0.0482 ``` --- ## Standard errors However, recall that our second-stage standard errors are not correct. -- .center.hi-purple[Second-stage results] <span style="display:block; margin-bottom:-1em;"> </span> <table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Term </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> S.E. </th> <th style="text-align:right;"> t stat. </th> <th style="text-align:left;"> p-Value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;color: black !important;"> Int </td> <td style="text-align:right;background-color: white !important;color: black !important;"> 5.786 </td> <td style="text-align:right;background-color: white !important;color: black !important;"> 5.413 </td> <td style="text-align:right;background-color: white !important;color: black !important;"> 1.07 </td> <td style="text-align:left;background-color: white !important;color: black !important;"> 0.2877 </td> </tr> <tr> <td style="text-align:left;background-color: white !important;"> D hat </td> <td style="text-align:right;background-color: white !important;"> 1.108 </td> <td style="text-align:right;background-color: white !important;"> 0.554 </td> <td style="text-align:right;background-color: white !important;"> 2.00 </td> <td style="text-align:left;background-color: white !important;"> 0.0482 </td> </tr> </tbody> </table> -- .center.hi-pink[2SLS results] <span style="display:block; margin-bottom:-1em;"> </span> <table class="table" style="font-size: 20px; margin-left: 
auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Term </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> S.E. </th> <th style="text-align:right;"> t stat. </th> <th style="text-align:left;"> p-Value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;color: black !important;"> Int </td> <td style="text-align:right;background-color: white !important;color: black !important;"> 5.786 </td> <td style="text-align:right;background-color: white !important;color: black !important;"> 2.974 </td> <td style="text-align:right;background-color: white !important;color: black !important;"> 1.95 </td> <td style="text-align:left;background-color: white !important;color: black !important;"> 0.0546 </td> </tr> <tr> <td style="text-align:left;background-color: white !important;"> D </td> <td style="text-align:right;background-color: white !important;"> 1.108 </td> <td style="text-align:right;background-color: white !important;"> 0.304 </td> <td style="text-align:right;background-color: white !important;"> 3.64 </td> <td style="text-align:left;background-color: white !important;"> 0.0004 </td> </tr> </tbody> </table> --- layout: true # IV and 2SLS ## Conclusions --- name: conclusions 1. IV/2SLS focus on .hi-slate[isolating some "good" variation] in `\(\text{D}_{i}\)` via `\(\text{Z}_{i}\)`. 1. Important .hi-slate[requirements]: strong first stage, excludability, monotonicity. 1. IV and 2SLS .hi-slate[rescale the reduced form] with the first stage. 1. Estimates are .hi-slate[LATE from compliers]. 1. Different instruments can produce .hi-slate[different LATEs]. 1. A .hi-slate[weak first stage] can lead to problems. --- layout: false # Table of contents .col-left[ ### Admin .smallest[ 1. [Schedule](#schedule) ] ### Instrumental variables .smallest[ 1. [Research designs](#designs) 1. [Introduction](#intro) 1. [Definition](#defined) 1. [DAG](#iv-dag) 1. [Example](#example) 1. 
[IV estimator](#iv-estimator) ] ] .col-right[ ### Two-stage least squares .smallest[ 1. [Setup](#setup) 1. [The reduced form](#reduced-form) - [Defined](#reduced-form) - [Intuition](#reduced-intuition) - [Example](#reduced-example) - [Derivation](#reduced-derivation) 1. [Intuition and mechanics](#iv-intuition) - [Noncompliance](#iv-noncompliance) - [Rescaling](#iv-rescale) 1. [Heterogeneous treatment effects](#het) - [Venn diagram](#venn) - [LATE](#late) - [Example](#late-ex) - [Monotonicity](#monotonicity) 1. [Multiple instruments](#multi-inst) - [Example](#multi-ex) 1. [2SLS and .mono[R]](#2sls-r) 1. [Conclusions](#conclusions) ] ]