--- title: "Inference and Randomization" subtitle: "EC 425/525, Set 11" author: "Edward Rubin" date: "`r format(Sys.time(), '%d %B %Y')`" output: xaringan::moon_reader: css: ['default', 'metropolis', 'metropolis-fonts', 'my-css.css'] # self_contained: true nature: highlightStyle: github highlightLines: true countIncrementalSlides: false --- class: inverse, middle ```{R, setup, include = F} # devtools::install_github("dill/emoGG") library(pacman) p_load( broom, tidyverse, ggplot2, ggthemes, ggforce, ggridges, latex2exp, viridis, extrafont, gridExtra, kableExtra, snakecase, janitor, data.table, dplyr, lubridate, knitr, future, furrr, estimatr, huxtable, here, magrittr ) # Define pink color red_pink <- "#e64173" turquoise <- "#20B2AA" orange <- "#FFA500" red <- "#fb6107" blue <- "#3b3b9a" green <- "#8bb174" grey_light <- "grey70" grey_mid <- "grey50" grey_dark <- "grey20" purple <- "#6A5ACD" slate <- "#314f4f" # Dark slate grey: #314f4f # Knitr options opts_chunk$set( comment = "#>", fig.align = "center", fig.height = 7, fig.width = 10.5, warning = F, message = F ) opts_chunk$set(dev = "svg") options(device = function(file, width, height) { svg(tempfile(), width = width, height = height) }) options(crayon.enabled = F) options(knitr.table.format = "html") # A blank theme for ggplot theme_empty <- theme_bw() + theme( line = element_blank(), rect = element_blank(), strip.text = element_blank(), axis.text = element_blank(), plot.title = element_blank(), axis.title = element_blank(), plot.margin = structure(c(0, 0, -0.5, -1), unit = "lines", valid.unit = 3L, class = "unit"), legend.position = "none" ) theme_simple <- theme_bw() + theme( line = element_blank(), panel.grid = element_blank(), rect = element_blank(), strip.text = element_blank(), axis.text.x = element_text(size = 18, family = "STIXGeneral"), axis.text.y = element_blank(), axis.ticks = element_blank(), plot.title = element_blank(), axis.title = element_blank(), # plot.margin = structure(c(0, 0, -1, -1), unit = "lines", valid.unit = 3L, class = "unit"), legend.position = "none" ) theme_axes_math <- theme_void() + theme( text = element_text(family = "MathJax_Math"), axis.title = element_text(size = 22), axis.title.x = element_text(hjust = .95, margin = margin(0.15, 0, 0, 0, unit = "lines")), axis.title.y = element_text(vjust = .95, margin = margin(0, 0.15, 0, 0, unit = "lines")), axis.line = element_line( color = "grey70", size = 0.25, arrow = arrow(angle = 30, length = unit(0.15, "inches") )), plot.margin = structure(c(1, 0, 1, 0), unit = "lines", valid.unit = 3L, class = "unit"), legend.position = "none" ) theme_axes_serif <- theme_void() + theme( text = element_text(family = "MathJax_Main"), axis.title = element_text(size = 22), axis.title.x = element_text(hjust = .95, margin = margin(0.15, 0, 0, 0, unit = "lines")), axis.title.y = element_text(vjust = .95, margin = margin(0, 0.15, 0, 0, unit = "lines")), axis.line = element_line( color = "grey70", size = 0.25, arrow = arrow(angle = 30, length = unit(0.15, "inches") )), plot.margin = structure(c(1, 0, 1, 0), unit = "lines", valid.unit = 3L, class = "unit"), legend.position = "none" ) theme_axes <- theme_void() + theme( text = element_text(family = "Fira Sans Book"), axis.title = element_text(size = 18), axis.title.x = element_text(hjust = .95, margin = margin(0.15, 0, 0, 0, unit = "lines")), axis.title.y = element_text(vjust = .95, margin = margin(0, 0.15, 0, 0, unit = "lines")), axis.line = element_line( color = grey_light, size = 0.25, arrow = arrow(angle = 30, length = unit(0.15, 
"inches") )), plot.margin = structure(c(1, 0, 1, 0), unit = "lines", valid.unit = 3L, class = "unit"), legend.position = "none" ) theme_set(theme_gray(base_size = 20)) # Column names for regression results reg_columns <- c("Term", "Est.", "S.E.", "t stat.", "p-Value") # Function for formatting p values format_pvi <- function(pv) { return(ifelse( pv < 0.0001, "<0.0001", round(pv, 4) %>% format(scientific = F) )) } format_pv <- function(pvs) lapply(X = pvs, FUN = format_pvi) %>% unlist() # Tidy regression results table tidy_table <- function(x, terms, highlight_row = 1, highlight_color = "black", highlight_bold = T, digits = c(NA, 3, 3, 2, 5), title = NULL) { x %>% tidy() %>% select(1:5) %>% mutate( term = terms, p.value = p.value %>% format_pv() ) %>% kable( col.names = reg_columns, escape = F, digits = digits, caption = title ) %>% kable_styling(font_size = 20) %>% row_spec(1:nrow(tidy(x)), background = "white") %>% row_spec(highlight_row, bold = highlight_bold, color = highlight_color) } ``` $$ \begin{align} \def\ci{\perp\mkern-10mu\perp} \end{align} $$ # Prologue --- name: schedule # Schedule ## Last time An analytical solution to cluter-robust inference ## Today Inference using (re)randomization .super[.pink[†]] .footnote[ .pink[†] These notes follow [notes](https://imai.fas.harvard.edu) by Kosuke Imai, *[Field Experiments](https://books.wwnorton.com/books/webad.aspx?id=24003)* by Gerber and Green, and *[Causal Inference for Statistics, Social, and Biomedical Sciences ](https://www.cambridge.org/core/books/causal-inference-for-statistics-social-and-biomedical-sciences/71126BE90C58F1A431FE9B2DD07938AB)* by Imbens and Rubin. ] ## Upcoming The end is near. As is the final. --- layout: true # Inference and (re)randomization --- class: inverse, middle --- ## Inference recap Our inference techniques have focused on (asymptotic) .attn[analytical methods]. 1. Choose (or derive) an estimator 2. Derived the estimator's (asymptotic) distribution.super[.pink[†]] 3. Construct confidence intervals or hypothesis tests .footnote[ .pink[†] And, consequently, standard errors. ] --- name: resampling ## Resampling .attn[Resampling methods] offers a different, more computationally intense (less asymptoically intense) approach. -- A .attn[resampling method] involves repeatedly drawing samples (*resampling*) from a dataset and refitting the model of interest on each sample. We can learn about the behavior of the model through its performance across the many iterations..super[.pink[†]] .footnote[ .pink[†] This approach is very similar to our Monte Carlo simulations, except that we will sample *with replacement* from a single dataset. ] -- .note[Common implementations:] Bootstrap (and jackknife), cross validation, permutation tests/randomization inference --- layout: true # The bootstrap --- name: boot class: inverse, middle --- ## Basics .attn[Bootstrapping] resamples, .it[with replacement], from the original dataset. -- - In each sample, we apply our estimator. - Then, we consider the distribution/properties of these estimates. This resampling helps us better understand the uncertainty associated with our estimator (within the current data setting). --- name: boot-formal ## More formally Let's formalize the bootstrap a bit. - $Z$ denotes our original dataset (_e.g._, $Z = \left[ \text{Y} \mid \text{X} \right]$ in our standard setup). -- - $\hat\alpha(Z)$ refers to the estimate for $\alpha$ derived from our dataset $Z$. -- - We draw $B$ bootstrap samples $b\in \left\{ 1,\,\ldots,\, B \right\}$. 
-- - $Z^{\star 1}$ represents our first bootstrap sample $\left( b=1 \right)$. -- - $\hat{\alpha}^{\star 1} = \hat\alpha(Z^{\star1})$ is our estimator evaluated on the first bootstrap sample. -- The .attn[bootstrapped standard error] of $\hat\alpha$ is the standard deviation of the $\hat\alpha^{\star b}$ $$ \begin{align} \mathop{\text{SE}_{B}}\left( \hat\alpha \right) = \sqrt{\dfrac{1}{B} \sum_{b=1}^{B} \left( \hat\alpha^{\star b} - \dfrac{1}{B} \sum_{\ell=1}^{B} \hat\alpha^{\star \ell} \right)^2} \end{align} $$ --- exclude: true ```{R, ex-boot-0, echo = F} # Generate the dataset set.seed(123) n = 9 z = tibble(x = 1:n, y = 1 + x + rnorm(n, sd = 5)) b = lm(y ~ x, data = z)$coefficient[2] boot_colors <- magma(n, begin = 0.1, end = 0.93) s = 1:n base_df <- expand.grid(x = 1:sqrt(n), y = 1:sqrt(n)) %>% as_tibble() # Bootstrap 1 s1 <- sample(1:n, n, replace = T) z1 <- z[s1,] b1 <- lm(y ~ x, data = z1)$coefficient[2] # Bootstrap 2 s2 <- sample(1:n, n, replace = T) z2 <- z[s2,] b2 <- lm(y ~ x, data = z2)$coefficient[2] # Bootstrap 3 s3 <- sample(1:n, n, replace = T) z3 <- z[s3,] b3 <- lm(y ~ x, data = z3)$coefficient[2] # Bootstrap 4 s4 <- sample(1:n, n, replace = T) z4 <- z[s4,] b4 <- lm(y ~ x, data = z4)$coefficient[2] ``` --- name: boot-graph ## More graphically .thin-left[ $$Z$$ ```{R, g1-boot0, echo = F, out.width = "100%"} # Graph individuals ggplot( data = base_df %>% mutate(fill = 1:n, lab = s), aes(x, y, fill = as.factor(fill)) ) + geom_tile(color = "white", size = 1.5) + geom_text(aes(label = lab), color = "white", size = 20) + coord_equal() + scale_fill_manual(values = boot_colors[s]) + scale_color_manual(values = boot_colors[s]) + theme_void() + theme(legend.position = "none") ``` $$\hat\beta = `r b %>% round(3)`$$ ```{R, g2-boot0, echo = F, out.width = '100%'} # Graph individuals ggplot( data = z %>% mutate(s = 1:n), aes(x, y, color = as.factor(s)) ) + geom_smooth(method = lm, se = F, color = "grey85", size = 5) + geom_point(size = 20, alpha = 0.5) + coord_equal() + xlim(-0.5,n+0.5) + scale_color_manual(values = boot_colors[s]) + theme_void() + theme(legend.position = "none") ``` ] -- .thin-left[ $$Z^{\star 1}$$ ```{R, g1-boot1, echo = F, out.width = "100%"} # Graph individuals ggplot( data = base_df %>% mutate(fill = 1:n, lab = s1), aes(x, y, fill = as.factor(fill)) ) + geom_tile(color = "white", size = 1.5) + geom_text(aes(label = lab), color = "white", size = 20) + coord_equal() + scale_fill_manual(values = boot_colors[s1]) + scale_color_manual(values = boot_colors[s1]) + theme_void() + theme(legend.position = "none") ``` $$\hat\beta = `r b1 %>% round(3)`$$ ```{R, g2-boot1, echo = F, out.width = '100%'} # Graph individuals ggplot( data = z1 %>% mutate(s = 1:n), aes(x, y, color = as.factor(s)) ) + geom_smooth(method = lm, se = F, color = "grey85", size = 5) + geom_point(size = 20, alpha = 0.5) + coord_equal() + xlim(-0.5,n+0.5) + scale_color_manual(values = boot_colors[s1]) + theme_void() + theme(legend.position = "none") ``` ] -- .thin-left[ $$Z^{\star 2}$$ ```{R, g1-boot2, echo = F, out.width = "100%"} # Graph individuals ggplot( data = base_df %>% mutate(fill = 1:n, lab = s2), aes(x, y, fill = as.factor(fill)) ) + geom_tile(color = "white", size = 1.5) + geom_text(aes(label = lab), color = "white", size = 20) + coord_equal() + scale_fill_manual(values = boot_colors[s2]) + scale_color_manual(values = boot_colors[s2]) + theme_void() + theme(legend.position = "none") ``` $$\hat\beta = `r b2 %>% round(3)`$$ ```{R, g2-boot2, echo = F, out.width = '100%'} # Graph individuals ggplot( 
data = z2 %>% mutate(s = 1:n), aes(x, y, color = as.factor(s)) ) + geom_smooth(method = lm, se = F, color = "grey85", size = 5) + geom_point(size = 20, alpha = 0.5) + coord_equal() + xlim(-0.5,n+0.5) + scale_color_manual(values = boot_colors[s2]) + theme_void() + theme(legend.position = "none") ``` ] -- .left5[


⋯ ] .thin-left[ $$Z^{\star B}$$ ```{R, g1-boot3, echo = F, out.width = "100%"} # Graph individuals ggplot( data = base_df %>% mutate(fill = 1:n, lab = s3), aes(x, y, fill = as.factor(fill)) ) + geom_tile(color = "white", size = 1.5) + geom_text(aes(label = lab), color = "white", size = 20) + coord_equal() + scale_fill_manual(values = boot_colors[s3]) + scale_color_manual(values = boot_colors[s3]) + theme_void() + theme(legend.position = "none") ``` $$\hat\beta = `r b3 %>% round(3)`$$ ```{R, g2-boot3, echo = F, out.width = '100%'} # Graph individuals ggplot( data = z3 %>% mutate(s = 1:n), aes(x, y, color = as.factor(s)) ) + geom_smooth(method = lm, se = F, color = "grey85", size = 5) + geom_point(size = 20, alpha = 0.5) + coord_equal() + xlim(-0.5,n+0.5) + scale_color_manual(values = boot_colors[s3]) + theme_void() + theme(legend.position = "none") ``` ] --- Running this bootstrap 10,000 times ```{R, boot-full, cache = T, eval = T} plan(multiprocess, workers = 10) # Set a seed set.seed(123) # Run the simulation 1e4 times boot_df <- future_map_dfr( # Repeat the sample size 'n' (here, 9) 1e4 times rep(n, 1e4), # Our function function(n) { # Estimates via bootstrap est <- lm(y ~ x, data = z[sample(1:n, n, replace = T), ]) # Return a tibble data.frame(int = est$coefficients[1], coef = est$coefficients[2]) }, # Let furrr know we want to set a seed .options = future_options(seed = T) ) ``` --- layout: false class: clear, middle ```{R, boot-full-graph, echo = F, dev = 'png', dpi = 250, cache = T} ggplot( data = z, aes(x, y, fill = as.factor(1:n)) ) + geom_abline( data = boot_df, aes(intercept = int, slope = coef), color = "grey50", alpha = 0.01 ) + geom_abline( intercept = lm(y ~ x, z)$coefficient[1], slope = lm(y ~ x, z)$coefficient[2], color = "black", size = 1.25 ) + geom_point( size = 10, stroke = 0.75, color = "white", shape = 21 ) + # coord_equal() + # xlim(-0.5,n+0.5) + scale_fill_manual(values = boot_colors[s]) + theme_void() + theme(legend.position = "none") ``` --- layout: true # The bootstrap --- ## Comparison In this 10,000-sample bootstrap, we calculate a standard error for $\hat\beta_1$ of approximately `r sd(boot_df$coef) %>% round(3)`. -- If we go the old-fashioned OLS route $\left( s^2 \left(\text{X}'\text{X}\right)^{-1} \right)$, we estimate `r tidy(lm(y~x,z))[2,3] %>% as.numeric() %>% round(3)`. Not bad. --- layout: true # Permutation tests --- name: perm class: inverse, middle --- name: perm-motive ## Motivation Consider the null hypothesis of *no average treatment effect*, _i.e._, .center[ H.sub[o]: $\overline{\text{Y}}_{0} = \overline{\text{Y}}_{1} \quad \left(\implies \overline{\tau}=0 \right)$ ] -- We've discussed how randomization avoids the pitfalls of selection bias. -- Randomization can also clarify inference, helping us quantify uncertainty. -- .qa[Q] How? -- .qa[A] We know exactly how the randomness happened (we assigned it), so we don't need parametric assumptions to derive a distribution under H.sub[o]! --
We use the .hi[experimental design], rather than a probability model. --- name: perm-tea ## Tea drinkers .ex[Classic example] Sir R. A. Fisher had a colleague who claimed to be able to tell whether the tea was poured into milk *or* milk was poured into the tea..super[.pink[†]] .footnote[ .pink[†] Don't worry, Fisher is known for more than this one experiment. ] -- Being the friend he was, Fisher designed an experiment to determine whether his colleague was telling the truth. -- Fisher randomized the order of 8 cups of tea: - 4 cups with .hi-purple[m]ilk added first - 4 cups with .hi-pink[t]ea added first -- Vindication! His colleague got all 8 correct. --
.qa[Q] With random guessing, how likely is correctly guessing all 8 cups? --- ## Tea drinkers 2 .qa[Q] With random guessing, how likely is correctly guessing all 8 cups? -- This question reflects our understanding of a .hi-slate[*p*-value]. If Fisher's colleague had no ability and simply guessed (H.sub[o]), what is the probability she would have guessed all 8 cups correctly? -- Fisher's H.sub[o]: the answers were unrelated to the cups' actual contents. Under this hypothesis, we can re-randomize the cups and see how many times her answer was perfectly correct. -- This is the idea behind .attn[permutation testing] and .attn[randomization inference]. --- ## Tea drinkers with a vengeance .left10.center[ .note[Cup.sub[]]
1
2
3
4
5
6
7
8 ] -- .left10.center[ .note[Guess.sub[]]
.hi-purple[m]
.hi-pink[t]
.hi-pink[t]
.hi-purple[m]
.hi-purple[m]
.hi-pink[t]
.hi-pink[t]
.hi-purple[m] ] -- .left10.center[ .note[Truth.sub[]]
.hi-purple[m]
.hi-pink[t]
.hi-pink[t]
.hi-purple[m]
.hi-purple[m]
.hi-pink[t]
.hi-pink[t]
.hi-purple[m]
.smaller[8/8] ] -- .left10.center[ .note[P.sub[1]]
.hi-purple[m]
.hi-purple.st[m]
.hi-purple.st[m]
.hi-purple[m]
.hi-pink.st[t]
.hi-pink[t]
.hi-pink[t]
.hi-pink.st[t]
.smaller[4/8] ] -- .left10.center[ .note[P.sub[2]]
.hi-purple[m]
.hi-purple.st[m]
.hi-purple.st[m]
.hi-pink.st[t]
.hi-purple[m]
.hi-pink[t]
.hi-pink[t]
.hi-pink.st[t]
.smaller[4/8] ] -- .left10.center[ .note[P.sub[3]]
.hi-purple[m]
.hi-purple.st[m]
.hi-purple.st[m]
.hi-pink.st[t]
.hi-pink.st[t]
.hi-purple.st[m]
.hi-pink[t]
.hi-pink.st[t]
.smaller[2/8] ] -- .left5.center[ .note[⋯] ] .left10.center[ .note[P.sub[70]]
.hi-pink.st[t]
.hi-pink[t]
.hi-pink[t]
.hi-pink.st[t]
.hi-purple[m]
.hi-purple.st[m]
.hi-purple.st[m]
.hi-purple[m]
.smaller[4/8] ] -- .left30.pad-left[
```{R, graph-tea, echo = F, out.width = '100%', fig.asp = 1.4/1} ggplot( data = tibble(x = 2 * (0:4), n = c(1, 16, 36, 16, 1)), aes(x = x, y = n) ) + geom_col() + geom_hline(yintercept = 0, size = 2) + scale_x_continuous( "N. correct", breaks = 2 * (0:4) ) + scale_y_continuous("Count") + theme_pander(base_size = 65, base_family = "Fira Sans Book") + theme( panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) ``` ] -- .clear-up[ So our permutation-test-based *p*-value is 1/70 $\approx$ 0.0143. $\implies$ Reject H.sub[o]. ] --- name: perm-gen ## Generalization The procedure for permutation-based hypothesis testing.super[.pink[†]] is the same as our "standard" asymptotic-based hypothesis testing. .footnote[ .pink[†] Also called *Fisher's exact test*, as you get exact *p*-values. ] -- 1. .hi-slate[Define hypotheses], H.sub[o] and H.sub[a]. 1. Choose our .hi-slate[rejection threshold] $\alpha$ (tolerated type-I error rate). 1. Choose a .hi-slate[test statistic] that is a function of our sample. 1. Derive/calculate the .hi-slate[test statistic's distribution .it[under H.sub[o]]]. 1. .hi-slate[Compute the *p*-value] by comparing test stat. to its H.sub[o] distribution. 1. .hi-slate[Conclusions]—reject or fail to reject H.sub[o]. -- .note[The difference:] Permutation tests use the randomization's mechanism to construct the test-statistic's exact distribution under H.sub[o]. --- ## More generally Fisher focused on testing a .attn[sharp null hypothesis]—no effect *for anyone*, _i.e._, .center[ H.sub[o]: $\text{Y}_{1i} - \text{Y}_{0i} = 0 \enspace \forall i \quad \left( \implies \tau_i = 0 \enspace \forall i \right)$ ] against an alternative hypothesis that someone has a non-zero effect .center[ H.sub[a]: $\enspace \text{Y}_{1i} - \text{Y}_{0i} \neq 0$ for some $i$ $\left( \implies \exists i \enspace \text{s.t.} \enspace \tau_i \neq 0\right)$ ] -- A .attn[sharp null hypothesis] is specified *for all individuals*, *e.g.*, .center[ H.sub[o]: $\text{Y}_{1i} - \text{Y}_{0i} = C \enspace \forall i$ ] which differs from the ATE-based nulls that we normally consider, *e.g.*, .center[ H.sub[o]: $\mathop{E}\left[\text{Y}_{1i} - \text{Y}_{0i}\right] = C$. ] --- name: perm-insight ## Key insight Our estimate (or test statistic) is a function of 1. individuals' responses $\left( \text{Y}_{i} \right)$ 2. individuals' treatment assignments $\left( \text{D}_{i} \right)$ -- .hi-slate[Under the sharp null] H.sub[o]: $\tau_i=0$ $\left( \forall i \right)$ - $\text{Y}_{0i} = \text{Y}_{1i} = \text{Y}_{i} \enspace \forall i$ (*i.e.*, changing $\text{D}_{i}$ will not affect observed $\text{Y}_{i}$) -- - Permutations of $\text{D}$ construct the *exact* null distribution (unchanged $\text{Y}$). -- The number of possible permutations can get big -- —*e.g.*, 500 treated and 500 control has 2.7 $\times$ 10.super[299] options. -- Approximate the distribution by sampling. --- ## On average The sharp null was central to Fisher's interpretation. [Neyman *et al.* (1935)](https://www-jstor-org.libproxy.uoregon.edu/stable/2983637) extended.super[.pink[†]] this idea of permutation-based tests to the average treatment effect (testing H.sub[o]: $\mathop{E}\left[ \text{Y}_{1i} \right] - \mathop{E}\left[ \text{Y}_{0i} \right] = 0$). Neyman and others also added standard errors and confidence intervals. .footnote[ .pink[†] Fisher, paraphrased: .bigger[🤬]
.pink[††] *Permutation tests* and *Randomization inference* are not the most strictly defined terms. ] -- These extensions have come to be known as .attn[randomization inference]..super[.pink[††]] --- layout: true # Randomization inference --- class: inverse, middle name: random --- name: random-setup ## Setup In order to .hi-slate[generalize our null hypothesis to the average treatment effect], .center[ H.sub[o]: $\overline{\tau} = 0 \implies \mathop{E}\left[ \text{Y}_{1i} - \text{Y}_{0i} \right] = 0$ ] we have to give up something. 1. If we want an exact null distribution, then we must .hi-pink[assume a uniform treatment effect]. -- (Assuming our way back to a sharp null.) -- 2. If we want to avoid assuming $\tau_i = \overline{\tau} \enspace \forall i$, then we have to .hi-purple[accept a non-exact null distribution]. (We don't observe $\text{Y}_{0i}$ for $\text{D}_{i}=1$.) -- If we don't like either option, then we need to go back to deriving asymptotic properties via probability modeling assumptions. --- ## Implementation Once we decide which simplification we're willing to accept, we proceed similarly to permutation tests: - shuffle $\text{D}$ in a way that mimics treatment assignment - collect test statistics from each iteration -- .note[Note] Monte Carlo simulations, the bootstrap, permutation tests, and randomization inference all apply very similar processes. --- ## (Which) Test statistics We still need to choose a test statistic on which we base the *p*-value. - The .hi-slate[actual estimate], *e.g.*, the difference in means or a regression coefficient - .hi-slate[Transformed estimates] - .hi-slate[Quantiles], *e.g.*, the median - .hi-slate[_t_ statistic] - .hi-slate[Rank] statistics -- We can also extend this idea to .hi-pink[confidence intervals]. .ex[*E.g.*,] Use the point estimates associated with the 2.5.super[th] and 97.5.super[th] percentiles to construct a 95% confidence interval. --- name: random-example ## Example Back to the LaLonde NSW dataset. We previously estimated ```{R, read-nsw, echo = F} # Read NSW data nsw_df <- haven::read_dta("nsw.dta") # Estimate est_ols <- lm_robust(re78 ~ treat, data = nsw_df) %>% tidy() ``` - the NSW increased real earnings by $\color{#e64173}{\hat{\beta}_1}\approx$ .pink[`r est_ols[2,"estimate"] %>% scales::dollar()`] - (het.-robust) standard error of .pink[`r est_ols[2,"std.error"] %>% scales::dollar()`] - _t_ statistic $\color{#e64173}{t_\text{stat}}\approx$ .pink[`r est_ols[2,"statistic"] %>% round(2)`] with *p*-value $\approx$ .pink[`r est_ols[2,"p.value"] %>% round(4)`] -- Let's re-randomize treatment 10,000 times. In each .hi-purple[iteration] $\color{#6A5ACD}{r}$, calculate 1. $\color{#6A5ACD}{\hat{\beta}_{1}^{r}}$, the .hi-purple[point estimate] (the regression coefficient) 1. $\color{#6A5ACD}{t_\text{stat}^{r}}$, the .hi-purple[_t_ statistic] -- Then calculate the implied *p*-values using the location of $\color{#e64173}{\hat\beta_1}$ and $\color{#e64173}{t_\text{stat}}$ in the distributions of $\color{#6A5ACD}{\hat{\beta}_1^r}$ and $\color{#6A5ACD}{t_\text{stat}^r}$, respectively..super[.pink[†]] .footnote[ .pink[†] Very similar exercise for confidence intervals. ] --- ## Example: Re-randomization The main decision is how to generate treatment. .qa[Q] Should we permute $\text{D}$ or draw $\text{D}_{i}$ for each individual?.super[.pink[†]] .footnote[ .pink[†] The difference is in whether we hold the number of treated individuals constant. ] -- .qa[A] How was the original randomization conducted? (See the sketch below.)
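Each option is a one-liner in R. Below is a minimal sketch of the two schemes using the NSW treatment vector; the chunk name and the `d_permuted`/`d_bernoulli` objects are illustrative (not part of our actual analysis), and only the first scheme holds the number of treated individuals fixed.

```{R, rerandomize-sketch, eval = F}
# Option 1: Permute the observed assignments (holds the number treated fixed)
d_permuted <- sample(nsw_df$treat, size = nrow(nsw_df), replace = F)
# Option 2: Draw each individual's treatment independently (number treated varies)
d_bernoulli <- rbinom(n = nrow(nsw_df), size = 1, prob = mean(nsw_df$treat))
# Compare the number of treated individuals under each scheme
c(sum(nsw_df$treat), sum(d_permuted), sum(d_bernoulli))
```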
-- We'll assume the NSW started with a set number of treatments to disperse. --- layout: true class: clear --- First, we'll write a function that performs one iteration. ```{R, random-fun1, eval = T} # Argument: 'i' (iteration) fun_randomization <- function(i) { # Sample the treatment vector. NOTE: Sampling WITHOUT replacement t_i <- sample(nsw_df$treat, size = nrow(nsw_df), replace = F) # Regression using our re-randomized treatment est_i <- lm_robust(re78 ~ t_i, data = nsw_df) %>% tidy() # Return tibble with iteration, point estimate, and test statistic tibble(i, est = est_i[2,"estimate"], t_stat = est_i[2,"statistic"]) } ``` -- And now run the re-randomization function 10,000 times. ```{R, random-fun2, cache = T, eval = T} # Set up parallelization and seed plan(multiprocess, workers = 4); set.seed(1234) # Run the simulation 1e4 times random_df <- future_map_dfr( 1:1e4, fun_randomization, .options = future_options(seed = T) ) ``` --- .note[Result 1] Share $\color{#6A5ACD}{\big|\hat{\beta}_1^r\big|} > \color{#e64173}{\big|\hat{\beta}_1\big|} =$ `r mean(abs(random_df$est)>abs(est_ols[2,"estimate"])) %>% round(4)`. (Original *p*-value $=$ .pink[`r round(est_ols[2,"p.value"], 4)`]) ```{R, random-dist1, echo = F, fig.height = 6.75} # Calculate density gg_df <- density(random_df$est, n = 1e3, kernel = "epanechnikov") %$% tibble( est = x, density = y, reject = abs(est) > est_ols[2,"estimate"] ) ggplot( data = gg_df, aes(x = est, ymin = 0, ymax = density) ) + geom_ribbon(fill = "grey85", alpha = 0.8) + geom_ribbon( data = gg_df %>% filter(est > abs(est_ols[2,"estimate"])), fill = purple ) + geom_ribbon( data = gg_df %>% filter(est < -abs(est_ols[2,"estimate"])), fill = purple ) + geom_hline(yintercept = 0) + geom_vline( xintercept = est_ols[2,"estimate"], color = red_pink, size = 1, linetype = "longdash" ) + xlab(TeX("$\\widehat{\\beta}_{1}^\\textit{r}$")) + theme_pander(base_size = 18, base_family = "Fira Sans Book") + theme( axis.text.y = element_blank(), axis.ticks.x = element_blank(), axis.title.x = element_text(family = "TeX Gyre Termes", size = 20), panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) ``` --- .note[Result 2] Share $\color{#6A5ACD}{\big| t_\text{stat}^r\big|} > \color{#e64173}{\big| t_\text{stat}\big|} =$ `r mean(abs(random_df$t_stat)>abs(est_ols[2,"statistic"])) %>% round(4)`.
(Original *p*-value $=$ .pink[`r round(est_ols[2,"p.value"], 4)`]) ```{R, random-dist2, echo = F, fig.height = 6.75} # Calculate density gg_df <- density(random_df$t_stat, n = 1e3, kernel = "epanechnikov") %$% tibble( stat = x, density = y, reject = abs(stat) > est_ols[2,"statistic"] ) ggplot( data = gg_df, aes(x = stat, ymin = 0, ymax = density) ) + geom_ribbon(fill = "grey85", alpha = 0.8) + geom_ribbon( data = gg_df %>% filter(stat > abs(est_ols[2,"statistic"])), fill = purple ) + geom_ribbon( data = gg_df %>% filter(stat < -abs(est_ols[2,"statistic"])), fill = purple ) + geom_hline(yintercept = 0) + geom_vline( xintercept = est_ols[2,"statistic"], color = red_pink, size = 1, linetype = "longdash" ) + xlab(TeX("\\textit{t}$_{stat}^\\textit{r}$")) + theme_pander(base_size = 18, base_family = "Fira Sans Book") + theme( axis.text.y = element_blank(), axis.ticks.x = element_blank(), axis.title.x = element_text(family = "TeX Gyre Termes", size = 20), panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) ``` --- layout: true # Randomization inference --- name: random-ci ## Confidence intervals To construct confidence intervals, we .hi[invert] randomization-based .hi[hypothesis tests], imposing a range of null hypotheses. -- .ex[E.g.,] To construct a 95% C.I. for $\color{#e64173}{\hat{\tau}}$ 1. Impose the null hypothesis H.sub[o]: $\tau = \tau_o$ for many values of $\tau_o$. 2. Find all values of $\tau_o$ for which we fail to reject H.sub[o] (given $\color{#e64173}{\hat{\tau}}$) at the 5% level. -- .note[Note] We must be able to clearly impose the null in our "model". (A short sketch of this inversion follows the Athey and Imbens excerpts.) --- layout: true class: clear --- Athey and Imbens ([2016](https://arxiv.org/pdf/1607.00698.pdf)) .hi-slate[on regression and randomization inference]:.super[.pink[†]] .footnote[ .pink[†] Specifically in the context of experiments, though the concerns should remain in other contexts. ] > Although these methods [regression] remain the most popular way of analyzing data from randomized experiments, .purple[we suggest caution in using them]. > ... In particular there is a disconnect between the way the conventional assumptions in regression analyses are formulated and the implications of randomization. As a result it is easy for the researcher using regression methods to go beyond analyses that are justified by randomization, and end up with analyses that rely on a .purple[difficult-to-assess mix of randomization assumptions, modeling assumptions, and large sample approximation]. --- Athey and Imbens ([2016](https://arxiv.org/pdf/1607.00698.pdf)) .hi-slate[on regression and randomization inference]:.super[.pink[†]] .footnote[ .pink[†] Specifically in the context of experiments, though the concerns should remain in other contexts. ] > Ultimately .purple[we recommend that researchers wishing to use regression or other model-based methods rather than the randomization-based methods we prefer, do so with care]. For example, using only indicator variables based on partitioning the covariate space, rather than using multi-valued variables as covariates in the regression function preserves many of the finite sample properties that simple comparisons of means have, and leads to regression estimates with clear interpretations. In addition, .purple[in many cases the potential gains from regression adjustment can also be captured by careful ex ante design], that is, through stratified randomized experiments to be discussed in the next section, without the potential costs associated with ex post regression adjustment.
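---

Returning to randomization-based [confidence intervals](#random-ci): here is a minimal sketch of the test inversion, assuming a constant (sharp) treatment effect. The chunk name, function name, grid of candidate effects, and number of permutations below are illustrative choices, not the implementation used for this deck's figures.

```{R, ci-invert-sketch, eval = F}
# Randomization p-value for the sharp null H_o: tau_i = tau_o (for all i)
fun_test_null <- function(tau_o, n_perm = 1e3) {
  # Impose the null: remove the hypothesized effect from treated individuals
  y_null <- nsw_df$re78 - tau_o * nsw_df$treat
  # Observed difference in means for the adjusted outcome
  stat_obs <- mean(y_null[nsw_df$treat == 1]) - mean(y_null[nsw_df$treat == 0])
  # Re-randomize treatment and recompute the statistic under H_o
  stat_perm <- replicate(n_perm, {
    d_r <- sample(nsw_df$treat)
    mean(y_null[d_r == 1]) - mean(y_null[d_r == 0])
  })
  # Two-sided randomization p-value
  mean(abs(stat_perm) >= abs(stat_obs))
}
# An illustrative grid of candidate effects (wide enough to bracket the estimate)
tau_grid <- seq(from = -1e3, to = 3e3, length.out = 41)
p_values <- sapply(tau_grid, fun_test_null)
# The 95% confidence interval keeps the tau_o values we fail to reject at 5%
range(tau_grid[p_values >= 0.05])
```

The interval is only as fine as the grid, so in practice we would refine the grid (or bisect) around its endpoints.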
--- exclude: true ```{R, fun-invert-ci, echo = F} # Arguments: 'null', the null values fun_invert_i <- function(null) { # Sample the treatment vector. NOTE: Sampling WITHOUT replacement t_i <- sample(nsw_df$treat, size = nrow(nsw_df), replace = F) map_dfr( null, function(null_j) { # Impose the null (generate outcomes) y_ij <- nsw_df$re78 + t_i * null_j # Regression using our re-randomized treatment est_ij <- lm_robust(y_ij ~ t_i) %>% tidy() # Return tibble with point estimate, se, test statistic, and null tibble( est = est_ij[2,"estimate"], se = est_ij[2, "std.error"], t_stat = est_ij[2,"statistic"], null = null_j ) } ) %>% mutate(null_group = unlist(null) %>% seq_along()) } # Function to run the function a bunch of times fun_invert <- function(null, times = 1e3) { plan(multiprocess, workers = 4) future_map_dfr( rep(null, times), fun_invert_i, .options = future_options(seed = T) ) } ``` ```{R, run-invert-ci, echo = F, cache = T, eval = T} ci_df <- fun_invert( # null = list(quantile(random_df$est, seq(0, 1, 0.01))), null = list(seq(from = est_ols[2,"conf.low"], to = est_ols[2,"conf.high"], length.out = 100)), times = 1e4 ) ``` ```{R, summarize-ci, eval = T} # Add groups and test ci_sum <- ci_df %>% mutate( reject = 2 * pt(abs((est - est_ols[2,"estimate"])/se), df = 720, lower.tail = F) < 0.05 ) # Summarize ci_sum %<>% group_by(null_group) %>% summarize( reject = mean(reject), null = first(null) ) ``` --- exclude: true .note[Constructing a 95% confidence interval] ```{R, random-ci1, echo = F, fig.height = 6.75, eval = T} ggplot( data = ci_sum, aes(x = null, y = reject, color = reject > 0.1) ) + geom_hline(yintercept = 0.10, linetype = "dotted") + geom_hline(yintercept = 0) + geom_point(size = 2.5) + xlab(TeX("$\\widehat{\\beta}_{1}^\\textit{o}$")) + ylab("Share rejecting original estimate") + scale_y_continuous(breaks = seq(0, 0.5, 0.1)) + scale_color_manual("", values = c(purple, "grey85")) + theme_pander(base_size = 18, base_family = "Fira Sans Book") + theme( legend.position = "none", axis.ticks = element_blank(), axis.title.x = element_text(family = "TeX Gyre Termes", size = 20, angle = 0, vjust = 0.5), panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) ``` --- layout: true # Randomization and clustering --- class: inverse, middle name: clustering --- ## The plot thickens Permutation tests and randomization inference both work because we know.super[.pink[†]] the process through which treatment was randomly assigned. .footnote[ .pink[†] Or claim to understand. ] -- If treatment is correlated within groups, then our bootstraps, permutations, and re-randomizations need to reflect this dependence. --- layout: false name: reading # Further reading ## Papers [Bootstrap-Based Improvements for Inference with Clustered Errors](https://www.mitpressjournals.org/doi/abs/10.1162/rest.90.3.414)
Cameron, Gelbach, and Miller (2008) [Channeling Fisher: Randomization Tests and the Statistical Insignificance of Seemingly Significant Experimental Results](https://academic.oup.com/qje/article/134/2/557/5195544)
Young (2019) [The Econometrics of Randomized Experiments](https://arxiv.org/pdf/1607.00698.pdf) Athey and Imbens (2016) [Randomization Inference With Natural Experiments](https://www.tandfonline.com/doi/abs/10.1198/016214505000001258)
Ho and Imai (2012)
.note[Also:] [Notes](https://imai.fas.harvard.edu) by Kosuke Imai --- # Further reading ### Books: Resampling methods and the bootstrap *[An Introduction to Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/)*
James, Witten, Hastie, and Tibshirani *[Elements of Statistical Learning](https://web.stanford.edu/~hastie/ElemStatLearn/)*
Hastie, Tibshirani, and Friedman ### Books: Permutation tests and randomization inference *[Causal Inference for Statistics, Social, and Biomedical Sciences ](https://www.cambridge.org/core/books/causal-inference-for-statistics-social-and-biomedical-sciences/71126BE90C58F1A431FE9B2DD07938AB)*
Imbens and Rubin *[Field Experiments](https://books.wwnorton.com/books/webad.aspx?id=24003)*
Gerber and Green --- layout: false # Table of contents .col-left[ #### Admin .smaller[ 1. [Schedule](#schedule) 1. [Further reading](#reading) ] ] .col-right[ #### Inference and randomization .smaller[ 1. [Resampling](#resampling) 1. [The bootstrap](#boot) - [Basics](#boot) - [Semi-formally](#boot-formal) - [Graphically](#boot-graph) 1. [Permutation tests](#perm) - [Motivation](#perm-motive) - [Tea tests](#perm-tea) - [Generalization](#perm-gen) - [Key insight](#perm-insight) 1. [Randomization inference](#random) - [Setup](#random-setup) - [Example](#random-example) - [Confidence intervals](#random-ci) 1. [Clustering](#clustering) ] ] --- exclude: true ```{R, generate pdfs, include = F, eval = T} source("../../ScriptsR/unpause.R") unpause("11Randomization.Rmd", ".", T, T) ```