--- title: "Instrumental Variables" subtitle: "EC 425/525, Set 8" author: "Edward Rubin" date: "`r format(Sys.time(), '%d %B %Y')`" output: xaringan::moon_reader: css: ['default', 'metropolis', 'metropolis-fonts', 'my-css.css'] # self_contained: true nature: highlightStyle: github highlightLines: true countIncrementalSlides: false --- class: inverse, middle ```{R, setup, include = F} # devtools::install_github("dill/emoGG") library(pacman) p_load( broom, tidyverse, ggplot2, ggthemes, ggforce, ggridges, latex2exp, viridis, extrafont, gridExtra, kableExtra, snakecase, janitor, data.table, dplyr, lubridate, knitr, estimatr, here, magrittr ) # Define pink color red_pink <- "#e64173" turquoise <- "#20B2AA" orange <- "#FFA500" red <- "#fb6107" blue <- "#3b3b9a" green <- "#8bb174" grey_light <- "grey70" grey_mid <- "grey50" grey_dark <- "grey20" purple <- "#6A5ACD" slate <- "#314f4f" # Dark slate grey: #314f4f # Knitr options opts_chunk$set( comment = "#>", fig.align = "center", fig.height = 7, fig.width = 10.5, warning = F, message = F ) opts_chunk$set(dev = "svg") options(device = function(file, width, height) { svg(tempfile(), width = width, height = height) }) options(crayon.enabled = F) options(knitr.table.format = "html") # A blank theme for ggplot theme_empty <- theme_bw() + theme( line = element_blank(), rect = element_blank(), strip.text = element_blank(), axis.text = element_blank(), plot.title = element_blank(), axis.title = element_blank(), plot.margin = structure(c(0, 0, -0.5, -1), unit = "lines", valid.unit = 3L, class = "unit"), legend.position = "none" ) theme_simple <- theme_bw() + theme( line = element_blank(), panel.grid = element_blank(), rect = element_blank(), strip.text = element_blank(), axis.text.x = element_text(size = 18, family = "STIXGeneral"), axis.text.y = element_blank(), axis.ticks = element_blank(), plot.title = element_blank(), axis.title = element_blank(), # plot.margin = structure(c(0, 0, -1, -1), unit = "lines", valid.unit = 3L, class 
= "unit"), legend.position = "none" ) theme_axes_math <- theme_void() + theme( text = element_text(family = "MathJax_Math"), axis.title = element_text(size = 22), axis.title.x = element_text(hjust = .95, margin = margin(0.15, 0, 0, 0, unit = "lines")), axis.title.y = element_text(vjust = .95, margin = margin(0, 0.15, 0, 0, unit = "lines")), axis.line = element_line( color = "grey70", size = 0.25, arrow = arrow(angle = 30, length = unit(0.15, "inches") )), plot.margin = structure(c(1, 0, 1, 0), unit = "lines", valid.unit = 3L, class = "unit"), legend.position = "none" ) theme_axes_serif <- theme_void() + theme( text = element_text(family = "MathJax_Main"), axis.title = element_text(size = 22), axis.title.x = element_text(hjust = .95, margin = margin(0.15, 0, 0, 0, unit = "lines")), axis.title.y = element_text(vjust = .95, margin = margin(0, 0.15, 0, 0, unit = "lines")), axis.line = element_line( color = "grey70", size = 0.25, arrow = arrow(angle = 30, length = unit(0.15, "inches") )), plot.margin = structure(c(1, 0, 1, 0), unit = "lines", valid.unit = 3L, class = "unit"), legend.position = "none" ) theme_axes <- theme_void() + theme( text = element_text(family = "Fira Sans Book"), axis.title = element_text(size = 18), axis.title.x = element_text(hjust = .95, margin = margin(0.15, 0, 0, 0, unit = "lines")), axis.title.y = element_text(vjust = .95, margin = margin(0, 0.15, 0, 0, unit = "lines")), axis.line = element_line( color = grey_light, size = 0.25, arrow = arrow(angle = 30, length = unit(0.15, "inches") )), plot.margin = structure(c(1, 0, 1, 0), unit = "lines", valid.unit = 3L, class = "unit"), legend.position = "none" ) theme_set(theme_gray(base_size = 20)) # Column names for regression results reg_columns <- c("Term", "Est.", "S.E.", "t stat.", "p-Value") # Function for formatting p values format_pvi <- function(pv) { return(ifelse( pv < 0.0001, "<0.0001", round(pv, 4) %>% format(scientific = F) )) } format_pv <- function(pvs) lapply(X = pvs, FUN = format_pvi) 
%>% unlist() # Tidy regression results table tidy_table <- function(x, terms, highlight_row = 1, highlight_color = "black", highlight_bold = T, digits = c(NA, 3, 3, 2, 5), title = NULL) { x %>% tidy() %>% select(1:5) %>% mutate( term = terms, p.value = p.value %>% format_pv() ) %>% kable( col.names = reg_columns, escape = F, digits = digits, caption = title ) %>% kable_styling(font_size = 20) %>% row_spec(1:nrow(tidy(x)), background = "white") %>% row_spec(highlight_row, bold = highlight_bold, color = highlight_color) } ``` $$ \begin{align} \def\ci{\perp\mkern-10mu\perp} \end{align} $$ # Prologue --- name: schedule # Schedule ## Last time Matching and propensity-score methods - Conditional independence - Overlap ## Today Instrumental variables (and two-stage least squares) ## Upcoming - Assignment due Sunday - Proposal due Wednesday 5/22 - Midterm? --- layout: true # Research designs --- class: inverse, middle --- name: designs ## Selection on observables and/or unobservables We've been focusing on .hi-slate[*selection-on-observables* designs], _i.e._, $$ \begin{align} \left(\text{Y}_{0i},\, \text{Y}_{1i}\right) \ci \text{D}_{i}|\text{X}_{i} \end{align} $$ for .hi-slate[observable] variables $\text{X}_{i}$. -- .hi-pink[*Selection-on-unobservable* designs] replace this assumption with two new (but related) assumptions 1. $\left(\text{Y}_{0i},\, \text{Y}_{1i}\right) \perp \text{Z}_{i}$ 2. $\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right) \neq 0$ --- ## Selection on observables and/or unobservables Our main goal in causal-inference minded (applied) econometrics boils down to isolating .b["good" variation] in $\text{D}_{i}$ (exogenous/as-good-as-random) from .b["bad" variation] (the part of $\text{D}_{i}$ correlated with $\text{Y}_{0i}$ and $\text{Y}_{1i}$). -- (We want to avoid selection bias.) 
-- - .hi-slate[Selection-on-observables designs] assume that we can control for all *bad variation* (selection) in $\text{D}_{i}$ through a known (observed) $\text{X}_{i}$. -- - .hi-pink[Selection-on-unobservables designs] assume that we can extract part of the *good variation* in $\text{D}_{i}$ (generally using some $\text{Z}_{i}$) and then use this *good* part of $\text{D}_{i}$ to estimate the effect of $\text{D}_{i}$ on $\text{Y}_{i}$. -- We throw away the *bad variation* in $\text{D}_{i}$ (it's bad). --- ## Which route? So which set of research designs is more palatable? -- 1. There are plenty of bad applications of both sets.
.purple[Violated assumptions, bad controls, *etc.*] -- 1. .hi-slate[Selection on observables] assumes we know .it[everything] about selection into treatment—we can identify .it[all] of the good (or bad) variation in $\text{D}_{i}$. --
.purple[Tough in non-experimental settings. Difficult to validate in practice.] -- 1. .hi-pink[Selection on unobservables] assumes we can isolate .it[some] good/clean variation in $\text{D}_{i}$, which we then use to estimate the effect of $\text{D}_{i}$ on $\text{Y}_{i}$. --
.purple[Seems more plausible. Possible to validate. May be underpowered.] --- layout: true # Instrumental variables --- name: intro ## Introduction .attn[Instrumental variables] (IV).super[.pink[†]] is the canonical selection-on-unobservables design—isolating *good variation* in $\text{D}_{i}$ via some magical .pink[instrument] $\color{#e64173}{\text{Z}_{i}}$. .footnote[.pink[†] For the moment, we're lumping IV and two-stage least squares (2SLS) together—as many people do—even though they are technically different.] -- Consider some model (structural equation) $$ \begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align} $$ To guarantee consistent OLS estimates for $\beta_1$, we want $\mathop{\text{Cov}} \left( \text{D}_{i},\,\varepsilon_i \right)=0$.
In general, this is a heroic assumption. -- .note[Alternative:] Estimate $\beta_1$ via instrumental variables. --- name: defined ## Definition For our model $$ \begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align} $$ A valid .attn[instrument] is a variable $\color{#e64173}{\text{Z}_{i}}$ such that 1. $\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \text{D}_{i} \right) \neq 0$ --
our .pink[instrument] correlates with treatment -- (so we can keep part of $\text{D}_{i}$) -- 2. $\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \varepsilon_i \right) = 0$ --
our .pink[instrument] is uncorrelated with other (non- $\!\!\text{D}_{i}$) determinants of $\text{Y}_{i}$ -- , _i.e._, $\color{#e64173}{\text{Z}_{i}}$ is excludable from equation $(1)$. -- .attn[(exclusion restriction)] --- name: example ## Example Back to the returns to a college degree, $$ \begin{align} \text{Income}_i = \beta_0 + \beta_1 \text{Grad}_i + \varepsilon_i \end{align} $$ OLS is likely biased. -- What if a state conducts a (random) .hi-pink[lottery] for scholarships? -- Let $\color{#e64173}{\text{Lottery}_i}$ denote an indicator for whether $i$ won a lottery scholarship..super[.pink[†]] .footnote[.pink[†] We'll have to focus on families who were eligible/who applied.] -- 1. $\mathop{\text{Cov}} \left( \color{#e64173}{\text{Lottery}_i},\, \text{Grad}_i \right)\neq 0$ $\left( >0 \right)$ if scholarships increase grad. rates. -- 2. $\mathop{\text{Cov}} \left(\color{#e64173}{\text{Lottery}_i},\, \varepsilon_i\right) = 0$ since the lottery is randomized. --- layout: true # Instrumental variables ## The IV estimator The IV estimator for our model $$ \begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align} $$ with (valid) instrument $\color{#e64173}{\text{Z}_{i}}$ is $$ \begin{align} \hat{\beta}_\text{IV} = \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{Y}\right) \end{align} $$ --- name: iv-estimator -- If you have no covariates, then $$ \begin{align} \hat{\beta}_\text{IV} = \dfrac{\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \text{Y}_{i}\right)}{\mathop{\text{Cov}} \left( \color{#e64173}{\text{Z}_{i}},\, \text{D}_{i} \right)} \end{align} $$ --- If you have additional (exogenous) covariates $\text{X}_i$, then $$ \begin{align} \text{Z} &= \begin{bmatrix}\color{#e64173}{\text{Z}_{i}} & \text{X}_{i}\end{bmatrix} \\[0.5em] \text{D} &= \begin{bmatrix}\text{D}_{i} & \text{X}_{i}\end{bmatrix} \end{align} $$ --- layout: true # Instrumental variables --- ## Proof: Consistency With a
valid instrument $\text{Z}_{i}$, $\hat{\beta}_\text{IV}$ is a consistent estimator for $\beta_1$ in $$ \begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \tag{1} \end{align} $$ $\mathop{\text{plim}}\left( \hat{\beta}_\text{IV} \right)$ -- .pad-left[ $= \mathop{\text{plim}}\left( \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{Y}\right) \right)$ ] -- .pad-left[ $= \mathop{\text{plim}}\left( \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{D} \beta + \text{Z}'\varepsilon\right) \right)$ ] -- .pad-left[ $= \mathop{\text{plim}}\left( \left(\text{Z}'\text{D}\right)^{-1} \left( \text{Z}'\text{D}\right) \beta\right) + \mathop{\text{plim}}\left(\dfrac{1}{N} \text{Z}'\text{D}\right)^{-1} \mathop{\text{plim}}\left( \dfrac{1}{N} \text{Z}'\varepsilon\right)$ ] -- .pad-left[ $=\beta$, since exclusion gives $\mathop{\text{plim}}\left( \frac{1}{N} \text{Z}'\varepsilon \right) = 0$  .pink[✔] ] --- layout: true # Two-stage least squares --- class: inverse, middle --- name: setup ## Setup You'll commonly see IV implemented as a two-stage process known as
.attn[two-stage least squares] (2SLS). -- .attn[First stage] Regress our endogenous variable $\text{D}_{i}$ on the instrument $\color{#e64173}{\text{Z}_{i}}$ and the (predetermined) covariates $\text{X}_{i}$. Save the fitted values $\color{#6A5ACD}{\widehat{\text{D}}_{i}}$. $$ \begin{align} \text{D}_{i} = \gamma_1 \color{#e64173}{\text{Z}_{i}} + \gamma_2 \text{X}_{i} + u_i \end{align} $$ -- .attn[Second stage] Estimate the model we wanted—but only using the variation in $\text{D}_{i}$ that correlates with $\color{#e64173}{\text{Z}_{i}}$, _i.e._, $\color{#6A5ACD}{\widehat{\text{D}}_{i}}$. $$ \begin{align} \text{Y}_{i} = \beta_1 \color{#6A5ACD}{\widehat{\text{D}}_{i}} + \beta_2 \text{X}_{i} + \varepsilon_i \end{align} $$ .note[Note] The controls $\text{X}_{i}$ must match in the first and second stages. --- ## IV estimation This two-step procedure, with a valid instrument, produces an estimator $\hat{\beta}_1$ that is consistent for $\beta_1$. $$ \begin{align} \hat{\beta}_\text{2SLS} &= \left( \text{D}' \text{P}_{\text{Z}} \text{D} \right)^{-1} \left( \text{D}' \text{P}_{\text{Z}} \text{Y} \right) \\[0.3em] \text{P}_{\text{Z}} &= \text{Z} \left( \text{Z}'\text{Z} \right)^{-1} \text{Z}' \end{align} $$ where $\text{D}$ is a matrix of our treatment and predetermined covariates $\left( \text{X}_{i} \right)$ and $\text{Z}$ is a matrix of our instrument and our predetermined covariates. --- ## IV estimation Important notes - The controls $\left( \text{X}_{i} \right)$ must match in the first and second stages. - If you have exactly .hi-slate[one instrument] and exactly .hi-slate[one endogenous variable], then 2SLS and IV are identical. - Your second-stage standard errors are not correct. --- name: reduced-form ## The reduced form In addition to the regressions within the two stages of 2SLS 1. $\text{D}_{i} = \gamma_1 \color{#e64173}{\text{Z}_{i}} + \gamma_2 \text{X}_{i} + u_i$ 2. 
$\text{Y}_{i} = \beta_1 \color{#6A5ACD}{\widehat{\text{D}}_{i}} + \beta_2 \text{X}_{i} + \varepsilon_i$ there is a third important and related regression: the reduced form. -- The .attn[reduced form] regresses the outcome $\text{Y}_{i}$ (LHS of the second stage) on our instrument $\color{#e64173}{\text{Z}_{i}}$ and covariates $\text{X}_{i}$ (RHS of the first stage). $$ \begin{align} \text{Y}_{i} = \pi_1 \color{#e64173}{\text{Z}_{i}} + \pi_2 \text{X}_{i} + w_i \end{align} $$ -- Because a valid instrument is uncorrelated with $\varepsilon_i$, the reduced form provides a consistent estimate of the causal effect of our instrument on the outcome. --- ## The reduced form, continued While the reduced form estimates the causal effect of the instrument on our outcome, we're often actually interested in the effect of *treatment* $\left( \text{D}_{i} \right)$. -- That said, the reduced form is still incredibly helpful/important: - Clarifies your source of identifying variation. -- - Does not suffer from *weak instruments* problems. -- - Only requires $\mathop{\text{Cov}} \left( \text{Z}_{i},\, \varepsilon_i \right) = 0$. -- - Offers insights into your estimates -- $$ \begin{align} \widehat{\beta}_{1}^\text{2SLS} = \dfrac{\widehat{\pi}_{1}}{\widehat{\gamma}_{1}} \end{align} $$ when you have exactly one instrument. --- name: reduced-intuition ## The reduced form, intuition This expression for the 2SLS (and IV) estimator can be very helpful. $$ \begin{align} \widehat{\beta}_{1}^\text{2SLS} = \dfrac{\color{#6A5ACD}{\widehat{\pi}_{1}}}{\color{#20B2AA}{\widehat{\gamma}_{1}}} = \dfrac{\color{#6A5ACD}{\text{Reduced-form estimate}}}{\color{#20B2AA}{\text{First-stage estimate}}} \end{align} $$ -- What's the interpretation/intuition? -- Back to our example: $\widehat{\beta}_1 =$ est. effect of college graduation on income. -- $\color{#6A5ACD}{\widehat{\pi}_1}$ gives the estimated causal effect of the scholarship lottery on income -- , but what share of lottery winners graduate? We need to rescale if $<$ 100%.
-- $\color{#20B2AA}{\widehat{\gamma}_1}$ estimates the effect of winning the scholarship lottery on graduation -- —the share of winners who graduated due to winning. -- We can scale with $\color{#20B2AA}{\widehat{\gamma}_1}$! --- name: reduced-example ## The reduced form, example To see why this scaling makes sense, imagine that 50% of lottery winners graduate from college due to the lottery, _i.e._, $\color{#20B2AA}{\widehat{\gamma}_1 =}$ .turquoise[0.50]..super[.pink[†]] .footnote[.pink[†] Imagine none of the applicants would have graduated otherwise.] -- Our reduced-form estimate of $\color{#6A5ACD}{\widehat{\pi}_1=}$ .purple[$5,000] says that lottery winners make $5,000 more than the control group, on average. -- However, half of the winners did not graduate, so $\color{#6A5ACD}{\widehat{\pi}_1}$ "underestimates" the effect of college graduation by combining graduates with nongraduates. -- Thus, we want to double $\color{#6A5ACD}{\widehat{\pi}_1}$, _i.e._, divide by $\color{#20B2AA}{\widehat{\gamma}_1}$: $\color{#6A5ACD}{\widehat{\pi}_1}/\color{#20B2AA}{\widehat{\gamma}_1}$ = .purple[$5,000]/.turquoise[0.5] = $10,000. --- name: reduced-derivation .qa[Q] How do we get this magical expression? $\left( \widehat{\beta}_1^\text{IV} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1} \right)$ -- ## Derivation -- $\widehat{\beta}_1^\text{IV} = \left( \text{Z}'\text{D} \right)^{-1} \left( \text{Z}'\text{Y} \right)$ -- $\color{#ffffff}{\widehat{\beta}_1^\text{IV}} = \left( \widetilde{\text{Z}}'\widetilde{\text{D}} \right)^{-1} \left( \widetilde{\text{Z}}'\text{Y} \right)$   applying FWL to reduce $\text{D}$ and $\text{Z}$ to vectors.
-- $\color{#ffffff}{\widehat{\beta}_1^\text{IV}} = \dfrac{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_{i},\, \text{Y}_{i} \right)}{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_i,\, \widetilde{\text{D}}_{i} \right)}$ -- $= \dfrac{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_{i},\, \text{Y}_{i} \right)/\mathop{\text{Var}} \left( \widetilde{\text{Z}}_i \right)}{\mathop{\text{Cov}} \left( \widetilde{\text{Z}}_i,\, \widetilde{\text{D}}_{i} \right)/\mathop{\text{Var}} \left( \widetilde{\text{Z}}_i \right)}$ -- $\color{#ffffff}{\widehat{\beta}_1^\text{IV}} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1}$  .pink[✔] --- layout: false class: clear, middle Let's push a bit deeper into IV's mechanics and intuition. --- layout: true # IV: Mechanics and intuition --- name: iv-intuition ## Setup In this section, we'll use medical trials as a working example..super[.pink[†]] .footnote[.pink[†] Credit/thanks go to [Michael Anderson](https://are.berkeley.edu/~mlanderson/ARE_Website/Home.html) for this example—and much of these notes.] -- We are interested in the regression model for the effect of some treatment (_e.g._, blood-pressure medication) on medical outcome $\text{Y}_{i}$ -- $$ \begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \text{D}_{i} + \varepsilon_i \end{align} $$ $\text{D}_{i}$ indicates whether $i$ *takes* the treatment (medication). $\varepsilon_i$ captures all other factors that affect $\text{Y}_{i}$. -- Or in the potential-outcomes framework: $$ \begin{align} \text{Y}_{i} &= \text{Y}_{1i} \text{D}_{i} + \text{Y}_{0i} (1-\text{D}_{i}) \\ \text{Y}_{0i} &= \beta_0 + \varepsilon_i \\ \text{Y}_{1i} &= \text{Y}_{0i} + \beta_1 \end{align} $$ --- ## Research design .note[Goal] .hi-slate[Estimate the effect of blood-pressure medication] on blood pressure. -- .note[Challenge] .hi-slate[Selection bias:] Even if treatment reduces blood pressure, selection bias will fight against the estimated effect.
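---
## Research design: Selection bias, simulated

A quick, self-contained sketch of this selection problem (hypothetical numbers, not real trial data): if sicker patients are more likely to take the medication, a naive comparison of means can even flip the sign of a beneficial effect.

```{R, selection-sim}
# Hypothetical DGP: sicker patients select into taking the medication
set.seed(42)
n <- 1e5
severity <- rnorm(n)                          # unobserved baseline illness
d <- as.numeric(severity + rnorm(n) > 0)      # sicker patients opt in
bp <- 140 + 10 * severity - 5 * d + rnorm(n)  # true effect of the pill: -5
# Naive difference in means: selection bias swamps (here, flips) the -5 effect
naive_diff <- mean(bp[d == 1]) - mean(bp[d == 0])
naive_diff
```

---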
-- .note[Solution] .hi-slate[Randomized medical trial:] Ask randomly chosen individuals in the treatment group to take the pill. Control individuals get a placebo (or nothing). -- .note[Analysis 1] .attn[Intention to treat] (.attn[ITT]): $\widehat{\beta}_1^\text{ITT} = \overline{\text{Y}}_\text{Trt} - \overline{\text{Y}}_\text{Ctrl}$ -- .note[ITT problem] .attn[Bias from noncompliance:] People don't always follow rules.
*E.g.*, treated folks who don't take pills; control folks who take pills. -- .note[Analysis 2] .hi-slate[IV!] -- Instrument medication $\text{D}_{i}$ with intention to treat $\text{Z}_{i}$. --- ## The IV solution First question: Is $\text{Z}_{i}$ a valid instrument for $\text{D}_{i}$? -- 1. $\mathop{\text{Cov}} \left( \text{Z}_{i},\, \varepsilon_i \right) = 0$ as $\text{Z}_{i}$ was randomly assigned (exclusion restriction). -- 1. $\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right)\neq 0$ if assignment to treatment changes the likelihood you take the pills (first stage). -- ∴ $\text{Z}_{i}$ is a valid instrument for $\text{D}_{i}$ and IV consistently estimates $\beta_1$. --- name: iv-noncompliance ## Noncompliance .attn[Noncompliant] individuals do not abide by their treatment assignment. -- Let's see how IV "solves" this problem. -- First, assume noncompliance only affects treated individuals—*i.e.*, treated folks sometimes don't take their pills; control folks never take pills. --- ## Noncompliance, continued The .hi-slate[first stage] recovers the share of treated individuals who take the pill $$ \begin{align} \text{D}_{i} = \gamma_1 \text{Z}_{i} + u_i \end{align} $$ -- *i.e.*, if 50% of treated individuals take the medication, $\widehat{\gamma}_1 =$ 0.50. -- The .hi-slate[reduced form] estimates the *ITT* $$ \begin{align} \text{Y}_{i} = \pi_1 \text{Z}_{i} + v_i \end{align} $$ -- which we know IV rescales using the first stage $$ \begin{align} \widehat{\beta}_{1}^\text{IV} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1} = \dfrac{\widehat{\pi}_1}{0.50} = 2 \times \widehat{\pi}_1 \end{align} $$ --- name: iv-rescale ## Noncompliance, continued IV solves the noncompliance issue by rescaling by the rate of compliance. -- If everyone perfectly complies, then $\widehat{\gamma}_1 = 1$ and $\widehat{\beta}_{1}^\text{IV} = \widehat{\pi}_1/1 = \widehat{\beta}_{1}^\text{ITT}$. -- .ex[Further example] $N_\text{Trt}$ = 10; trt. compliance = 50%; ctrl.
compliance = 100%. $\overline{\text{Y}}_\text{Trt} = \dfrac{5 (\beta_0 + \beta_1) + 5 (\beta_0)}{10} = \beta_0 + \dfrac{\beta_1}{2}$ -- and $\overline{\text{Y}}_\text{Ctrl} = \beta_0$. -- So our reduced-form estimate (the ITT) is $\widehat{\pi}_1 = \dfrac{\beta_1}{2}$ (half the true effect). -- IV consistently estimates $\beta_1$ via rescaling the ITT by the rate of compliance $$ \begin{align} \widehat{\beta}_1^\text{IV} = \dfrac{\widehat{\pi}_1}{\widehat{\gamma}_1} = \dfrac{\beta_1/2}{1/2} = \beta_1 \end{align} $$ --- ## Takeaways Main points 1. IV .b[rescales] .pink[the causal effect of] $\color{#e64173}{\text{Z}_{i}}$ .pink[on] $\color{#e64173}{\text{Y}_{i}}$ by .purple[the causal effect of] $\color{#6A5ACD}{\text{Z}_{i}}$ .purple[on] $\color{#6A5ACD}{\text{D}_{i}}$. -- 1. IV .b[does not] compare treated compliers to untreated compliers. --
Such a comparison/estimator would re-introduce selection bias. --- layout: true class: clear, middle --- name: het Thus far, we assumed homogeneous treatment effects. .qa[Q] What happens .b[when treatment effects are heterogeneous]? --- .qa[A] Let's recall what our instruments are doing (with Venn diagrams!). .note[Credit] [Glen Waddell](http://www.glenwaddell.com) introduced me to IV via Venn. --- name: venn ```{R, venn_iv, echo = F, fig.height = 7.5} # Colors (order: x1, x2, x3, y, z) venn_colors <- c(purple, red, "grey60", orange, red_pink) # Line types (order: x1, x2, x3, y, z) venn_lines <- c("solid", "dotted", "dotted", "solid", "solid") # Locations of circles venn_df <- tibble( x = c( 0.0, -0.5, 1.5, -1.0, -1.4), y = c( 0.0, -2.5, -1.8, 2.0, -2.6), r = c( 1.9, 1.5, 1.5, 1.3, 1.3), l = c( "Y", "X[1]", "X[2]", "X[3]", "Z"), xl = c( 0.0, 0.7, 1.6, -1.0, -2.9), yl = c( 0.0, -3.8, -1.9, 2.2, -2.6) ) # Venn ggplot(data = venn_df, aes(x0 = x, y0 = y, r = r, fill = l, color = l)) + geom_circle(aes(linetype = l), alpha = 0.3, size = 0.75) + theme_void() + theme(legend.position = "none") + scale_fill_manual(values = venn_colors) + scale_color_manual(values = venn_colors) + scale_linetype_manual(values = venn_lines) + geom_text(aes(x = xl, y = yl, label = l), size = 9, family = "Fira Sans Book", parse = T) + annotate( x = -5.5, y = 3.3, geom = "text", label = "Figure 1", size = 10, family = "Fira Sans Book", hjust = 0 ) + xlim(-5.5, 4.5) + ylim(-4.2, 3.4) + coord_equal() ``` --- ```{R, venn-endog, echo = F, fig.height = 7.5} # Change locations of circles venn_df %>% mutate( x = x + c(0, 0, 0, 0, 0), xl = xl + c(0, 0, 0, 0, 0), y = y + c(0, 0, 0, 0, 1), yl = yl + c(0, 0, 0, 0, 1) ) %>% # Venn ggplot(data = ., aes(x0 = x, y0 = y, r = r, fill = l, color = l)) + geom_circle(aes(linetype = l), alpha = 0.3, size = 0.75) + theme_void() + theme(legend.position = "none") + scale_fill_manual(values = venn_colors) + scale_color_manual(values = venn_colors) + 
scale_linetype_manual(values = venn_lines) + geom_text(aes(x = xl, y = yl, label = l), size = 9, family = "Fira Sans Book", parse = T) + annotate( x = -5.5, y = 3.3, geom = "text", label = "Figure 2", size = 10, family = "Fira Sans Book", hjust = 0 ) + xlim(-5.5, 4.5) + ylim(-4.2, 3.4) + coord_equal() ``` --- ```{R, venn-irrelevant, echo = F, fig.height = 7.5} # Change locations of circles venn_df %>% mutate( x = x + c(0, 0, 0, 0,-1), xl = xl + c(0, 0, 0, 0,-1), y = y + c(0, 0, 0, 0, 2.3), yl = yl + c(0, 0, 0, 0, 2.3) ) %>% # Venn ggplot(data = ., aes(x0 = x, y0 = y, r = r, fill = l, color = l)) + geom_circle(aes(linetype = l), alpha = 0.3, size = 0.75) + theme_void() + theme(legend.position = "none") + scale_fill_manual(values = venn_colors) + scale_color_manual(values = venn_colors) + scale_linetype_manual(values = venn_lines) + geom_text(aes(x = xl, y = yl, label = l), size = 9, family = "Fira Sans Book", parse = T) + annotate( x = -5.5, y = 3.3, geom = "text", label = "Figure 3", size = 10, family = "Fira Sans Book", hjust = 0 ) + xlim(-5.5, 4.5) + ylim(-4.2, 3.4) + coord_equal() ``` --- ```{R, venn-iv-endog2, echo = F, fig.height = 7.5} # Change locations of circles venn_df %>% mutate( x = x + c(0, 0, 0, 0, 2), xl = xl + c(0, -2.4, 0.8, 0, 4.6), y = y + c(0, 0, 0, 0, 0), yl = yl + c(0, 0, 0, 0, -1.1) ) %>% # Venn ggplot(data = ., aes(x0 = x, y0 = y, r = r, fill = l, color = l)) + geom_circle(aes(linetype = l), alpha = 0.3, size = 0.75) + theme_void() + theme(legend.position = "none") + scale_fill_manual(values = venn_colors) + scale_color_manual(values = venn_colors) + scale_linetype_manual(values = venn_lines) + geom_text(aes(x = xl, y = yl, label = l), size = 9, family = "Fira Sans Book", parse = T) + annotate( x = -5.5, y = 3.3, geom = "text", label = "Figure 4", size = 10, family = "Fira Sans Book", hjust = 0 ) + xlim(-5.5, 4.5) + ylim(-4.2, 3.4) + coord_equal() ``` --- ```{R, venn-iv-endog1, echo = F, fig.height = 7.5} # Venn ggplot(data = venn_df, aes(x0 
= x, y0 = y, r = r, fill = l, color = l)) + geom_circle(aes(linetype = l), alpha = 0.3, size = 0.75) + theme_void() + theme(legend.position = "none") + scale_fill_manual(values = venn_colors) + scale_color_manual(values = venn_colors) + scale_linetype_manual(values = venn_lines) + geom_text(aes(x = xl, y = yl, label = l), size = 9, family = "Fira Sans Book", parse = T) + annotate( x = -5.5, y = 3.3, geom = "text", label = "Figure 1", size = 10, family = "Fira Sans Book", hjust = 0 ) + xlim(-5.5, 4.5) + ylim(-4.2, 3.4) + coord_equal() ``` --- layout: true # IV + heterogeneity --- ## Recap Throughout the course, we've discussed two concepts of treatment effects. -- 1. .attn[Average treatment effect] (.attn[ATE]) The average treatment effect for an individual randomly drawn from our sample. -- 1. .attn[Treatment on the treated] (.attn[TOT]) The average treatment effect for a .it.hi-slate[treated] individual randomly drawn from our sample. -- When we assume homogeneous/constant treatment effects, ATE = TOT. -- .qa[Q] If treatment effects vary, then what do IV and 2SLS estimate? -- .qa[A] Not ATE. -- And not TOT. -- They estimate the LATE..super[.pink[†]] .footnote[ .pink[†] See [Angrist, Imbens, and Rubin (1996)](https://www.jstor.org/stable/2291629). ] --- ## The LATE IV generally estimates the .attn[LATE]—the .attn[Local Average Treatment Effect]. -- .note[Recall] IV "works" by isolating variation in $\text{D}_{i}$ induced by our instrument $\text{Z}_{i}$. -- In other words: IV focuses on the individuals whose $\text{D}_{i}$ changes due to $\text{Z}_{i}$. Angrist, Imbens, and Rubin (1996) call these folks .attn[compliers]. -- However, *compliers* are only one of four possible groups. .col-left[ 1. .attn[Compliers] $\text{D}_{i} = 1$ iff $\text{Z}_{i}=1$. 1. .attn[Always-takers] $\text{D}_{i} = 1$ $\forall \text{Z}_{i}$. 1. .attn[Never-takers] $\text{D}_{i} = 0$ $\forall \text{Z}_{i}$. 1. .attn[Defiers] $\text{D}_{i} = 1$ iff $\text{Z}_{i}=0$. 
] -- .col-right[ Only take pills .hi-slate[when treated].
.hi-slate[Always] take pills.
.hi-slate[Never] take pills.
Only take pills .hi-slate[when untreated]. ] --- ## The LATE Because IV only uses variation in $\text{D}_{i}$ that correlates with $\text{Z}_{i}$, IV mechanically drops *always-takers* and *never-takers*. -- Most IV derivations/applications assume away the existence of *defiers*. -- Thus, IV estimates a treatment effect .hi-slate[using only *compliers*]. -- Hence the "local" in *local average treatment effect*. --- name: late-ex ## The LATE: Medical-trial example Imagine treatment works for some $\left( \beta_{1,i} < 0 \right)$ and not for others $\left( \beta_{1,j} = 0 \right)$. Suppose individuals know their response to blood-pressure medication. -- - $\beta_{1,i}<0$ individuals always take the pill. -- - $\beta_{1,j}=0$ individuals only take the pill when treated. -- Then our compliers will be individuals for whom $\beta_{1,j}=0$. -- Thus, IV's LATE will indicate no treatment effect $\left( \widehat{\beta}_1^\text{IV} = 0 \right)$. --- ## The LATE .qa[Q] So is IV actually inconsistent? -- .qa[A] It depends on what you are trying to estimate (and how you interpret it). IV doesn't estimate the ATE or TOT, so it would be inconsistent for them..super[.pink[†]] .footnote[ .pink[†] Just as the TOT is not consistent for the ATE. ] -- IV estimates the *local* average treatment effect. -- .note[Takeaway] Because IV identifies off of compliers, it estimates an average treatment effect for these individuals (who *comply* with the instrument). -- .note[Takeaway.sub[2]] Different instruments have different LATEs. --- name: monotonicity ## Monotonicity We've already written down the two classical IV/2SLS assumptions - .note[First stage:] $\mathop{\text{Cov}} \left( \text{Z}_{i},\, \text{D}_{i} \right) > 0$ - .note[Exclusion restriction:] $\mathop{\text{Cov}} \left( \text{Z}_{i},\, \varepsilon_{i} \right) = 0$ but we need a third assumption to ensure IV's complier-based LATE interpretation.
-- - .attn[Monotonicity] (.attn[Uniformity]).attn[:] $\text{D}_{i}(z)\geq \text{D}_{i}(z')$ or $\text{D}_{i}(z)\leq \text{D}_{i}(z') \enspace \forall i$
[Heckman](http://jenni.uchicago.edu/papers/koop2006/koop2-IV_ho_2006-09-25a_mms.pdf): *Uniformity* of responses *across persons.*
[Imbens and Angrist (1994)](https://www.jstor.org/stable/2951620): Instrument has monotone effect on $\text{D}_{i}$. --- ## Monotonicity If "defiers" exist, then monotonicity/uniformity is violated. -- In this case, the IV estimand is $$ \begin{align} \dfrac{\tau_{c} \mathop{\text{Pr}}\left(\text{complier}\right) - \tau_{d} \mathop{\text{Pr}}\left(\text{defier}\right)}{ \mathop{\text{Pr}}\left(\text{complier}\right) - \mathop{\text{Pr}}\left(\text{defier}\right)} \end{align} $$ which is not bounded between $\tau_{c}$ and $\tau_{d}$. -- .ex[Example] $\tau_c=$ 1 and $\tau_d=$ 2. $\mathop{\text{Pr}}\left(\text{complier}\right)=$ 2/3 and $\mathop{\text{Pr}}\left(\text{defier}\right)=$ 1/3. -- Then the "LATE" is 0..super[.pink[†]] .footnote[ .pink[†] Some people would instead say that there is no LATE when you violate monotonicity. ] --- layout: false class: clear, middle Until now, we've focused on using a single instrument. The 2SLS estimator accommodates multiple instruments..super[.pink[†]] .footnote[ .pink[†] Whether you can find multiple valid instruments is another question. ] --- layout: true # Multiple instruments --- class: inverse, middle name: multi-inst --- ## Motivation .qa[Q] Why include multiple instruments? -- .qa[A] Multiple instruments can capture more variation in $\text{D}_{i}$ (efficiency).
-- Using terminology from the *system-of-equations* literature, - one instrument for one endogenous variable: .attn[just identified] - multiple instruments for one endogenous variable: .attn[overidentified] --- ## In practice With (valid) instruments $\text{Z}_{1i}$ and $\text{Z}_{2i}$, our first stage becomes $$ \begin{align} \text{D}_{i} = \gamma_0 + \gamma_1 \text{Z}_{1i} + \gamma_2 \text{Z}_{2i} + \gamma_3 \text{X}_{i} + u_i \end{align} $$ -- while our second stage is still $$ \begin{align} \text{Y}_{i} = \beta_0 + \beta_1 \widehat{\text{D}}_{i} + \beta_2 \text{X}_{i} + v_i \end{align} $$ --- layout: true # Multiple instruments ## Example: Quarter of birth --- name: multi-ex Back to our quest to estimate the returns to education. -- [Angrist and Krueger (1991)](https://www.jstor.org/stable/2937954) proposed *quarter of birth* as a set of instruments for years of schooling. -- Accordingly, their first stage looks something like.super[.pink[†]] .footnote[ .pink[†] We need to drop one of the quarter-of-birth indicators to avoid perfect collinearity. ] $$ \begin{align} \text{Schooling}_i = \gamma_0 &+ \gamma_1 \mathbb{I}(\text{Born Q1})_{i} + \gamma_2 \mathbb{I}(\text{Born Q2})_{i} \\&+ \gamma_3 \mathbb{I}(\text{Born Q3})_{i} + \gamma_4 \mathbb{I}(\text{Born Q4})_{i} \\&+ \gamma_5 \text{X}_{i} + u_{i} \end{align} $$ --- .qa[Q] Is quarter of birth a valid instrument? -- .qa[Q1] Why would quarter of birth affect schooling? (.note[First stage]) -- .qa[A1] Students cannot drop out of school until a certain age, and quarter of birth affects your age at the time you begin school. -- .ex[Example] Some states require students to stay in school until they are 16. - Students who start school at age .hi-slate[6] drop out after .hi-slate[10] years of schooling. - Students who start school at age .hi-slate[5] drop out after .hi-slate[11] years of schooling.
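---

To fix ideas, here is a .b[simulated] sketch of this design (hypothetical numbers—not the Angrist–Krueger census data): quarter-of-birth dummies shift schooling, while unobserved ability drives the endogeneity.

```{R, qob-sim}
# Simulated DGP (hypothetical): QOB shifts schooling; ability is unobserved
set.seed(123)
n <- 1e5
qob <- sample(1:4, n, replace = TRUE)
ability <- rnorm(n)
schooling <- 11 + 0.5 * (qob == 1) + 0.25 * (qob == 2) + ability + rnorm(n)
log_wage <- 1 + 0.10 * schooling + 0.5 * ability + rnorm(n)
# OLS overstates the true 0.10 return (ability hides in the error term)
b_ols <- unname(coef(lm(log_wage ~ schooling))["schooling"])
# Manual 2SLS: first stage on the QOB dummies, second stage on fitted values
first_stage <- lm(schooling ~ factor(qob))
b_2sls <- unname(coef(lm(log_wage ~ fitted(first_stage)))[2])
c(OLS = b_ols, `2SLS` = b_2sls)
```

With `iv_robust()`, multiple instruments simply enter after the `|`, *e.g.*, `Y ~ D | Z1 + Z2`.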
---

If students must begin school in the calendar year in which they turn 6,

- December birthdates: begin school at 5.75; drop out with 10.25 yrs.
- January birthdates: begin school at 6.75; drop out with 9.25 yrs.

--

For some groups, quarter of birth may affect the number of years in school.

---

It turns out that the first stage is also pretty weak in this setting.

.attn[Weak instruments] can cause several problems for 2SLS/IV:

--

1. Our estimator is a ratio of the reduced form and the first stage, so dividing by a weak (small) first stage can blow up reduced-form estimates, amplifying reduced-form noise and bias.

--

2. Many weak instruments lead to a finite-sample issue in which 2SLS is biased toward OLS: our first stage is essentially overfitting.

--

What about our other requirements for a valid instrument?

---

.qa[Q2] Is quarter of birth uncorrelated with $\varepsilon_i$ (.note[excludable])?

--

.qa[A2] While quarter of birth may be fairly arbitrary for some families, other families might time births. If these birth timers differ from other couples along other dimensions (_e.g._, income or education), then quarter of birth may correlate with $\varepsilon_i$.

---

.qa[Q3] Is the effect monotone?

--

.qa[A3] Some.super[.pink[†]] argue that monotonicity may be violated in this setting.

.footnote[
.pink[†] _E.g._, [Aliprantis (2012)](https://journals.sagepub.com/doi/abs/10.3102/1076998610396885)
]

--

Consider December births.

--

- Original idea: December birthdates will start school at age 5.75, inducing more years of education before 16.

--

- *Redshirting* idea: Parents hold back December kids so they can be older (_i.e._, 6.75), inducing fewer years of education before 16.

---
layout: true
# 2SLS and .mono[R]

---
name: 2sls-r
## `estimatr`

You can implement 2SLS/IV in many ways in .mono[R]. Today: `estimatr` and `iv_robust()`.
--

Specifically, we give `iv_robust()` the relationship that we want to estimate, separated from the instrument by `|`

--

, *e.g.*,

```{R, iv-data, include = F}
# Set seed
set.seed(12345)
# Sample size
n <- 1e2
# Define our variance-covariance matrix (D, ε, Z)
Σ <- matrix(data = c(1, 0.3, 0.3, 0.3, 1, 0, 0.3, 0, 1), ncol = 3)
# Our vector of means (D, ε, Z)
μ = c(10, 0, 3)
# Draw n observations; convert to tibble
sample_df <- MASS::mvrnorm(n = n, mu = μ, Sigma = Σ) %>% as_tibble()
# Name variables
names(sample_df) <- c("D", "ε", "Z")
# Calculate Y
sample_df %<>% mutate(Y = 7 + 1 * D + ε)
```

```{R, r-iv1}
# Estimate 2SLS
iv_robust(Y ~ D | Z, data = sample_df, se_type = "classical") %>%
  tidy() %>% select(1:5)
```

---
## Now in two stages!

Of course, we can estimate 2SLS in two stages.

```{R, r-iv-s1}
# First stage
stage1 <- lm_robust(D ~ Z, data = sample_df, se_type = "classical")
# First-stage results
stage1 %>% tidy() %>% select(1:5)
```

---
## Second stage

We just need to add $\widehat{\text{D}}_{i}$ to our dataset.

```{R, r-iv-s2}
# Add fitted (first-stage) values to data
sample_df %<>% mutate(D_hat = stage1$fitted.values)
# Second stage
stage2 <- lm_robust(Y ~ D_hat, data = sample_df, se_type = "classical")
# Second-stage results
stage2 %>% tidy() %>% select(1:5)
```

---
## Standard errors

However, recall that our second-stage standard errors are not correct.

--

.center.hi-purple[Second-stage results]

```{R, r-iv-2sls1, echo = F}
stage2 %>% tidy_table(
  terms = c("Int", "D hat"),
  highlight_bold = F
)
```

--

.center.hi-pink[2SLS results]

```{R, r-iv-2sls2, echo = F}
iv_robust(Y ~ D | Z, data = sample_df, se_type = "classical") %>% tidy_table(
  terms = c("Int", "D"),
  highlight_bold = F
)
```

---
layout: true
# IV and 2SLS
## Conclusions

---
name: conclusions

1. IV/2SLS focus on .hi-slate[isolating some "good" variation] in $\text{D}_{i}$ via $\text{Z}_{i}$.
1. Important .hi-slate[requirements]: strong first stage, excludability, monotonicity.
1. 
IV and 2SLS .hi-slate[rescale the reduced form] with the first stage.
1. Estimates are .hi-slate[LATEs for compliers].
1. Different instruments can produce .hi-slate[different LATEs].
1. A .hi-slate[weak first stage] can lead to problems.

---
layout: false
# Table of contents

.col-left[

### Admin

.smallest[

1. [Schedule](#schedule)

]

### Instrumental variables

.smallest[

1. [Research designs](#designs)
1. [Introduction](#intro)
1. [Definition](#defined)
1. [Example](#example)
1. [IV estimator](#iv-estimator)

]
]

.col-right[

### Two-stage least squares

.smallest[

1. [Setup](#setup)
1. [The reduced form](#reduced-form)
  - [Defined](#reduced-form)
  - [Intuition](#reduced-intuition)
  - [Example](#reduced-example)
  - [Derivation](#reduced-derivation)
1. [Intuition and mechanics](#iv-intuition)
  - [Noncompliance](#iv-noncompliance)
  - [Rescaling](#iv-rescale)
1. [Heterogeneous treatment effects](#het)
  - [Venn diagram](#venn)
  - [LATE](#late)
  - [Example](#late-ex)
  - [Monotonicity](#monotonicity)
1. [Multiple instruments](#multi-inst)
  - [Example](#multi-ex)
1. [2SLS and .mono[R]](#2sls-r)
1. [Conclusions](#conclusions)

]
]

---
exclude: true

```{R, generate pdfs, include = F, eval = T}
source("../../ScriptsR/unpause.R")
unpause("08IV.Rmd", ".", T, T)
```