The minimum detectable effect¹ (MDE) of an experimental design is tied to the design’s power, that is, the probability that, for a given effect size and statistical significance level, you will reject the null hypothesis of zero effect. Everything else constant, a larger sample size improves power, thus reducing the MDE.
Before conducting the experiment itself, you can use survey data to assess the design’s power. In this class we consider a setting where we have baseline student-level data on a sample of several schools and would like to randomize a treatment at the school level. We will also analyse the impact on power of including student-level covariates. Since treatment is assigned at the school level, standard errors should account for clustering. We start by loading our familiar escolas.csv dataset.
library(data.table)
library(fixest)
library(ggplot2)
setDTthreads(100)     # data.table uses all cpu cores
setFixest_nthreads(4) # fixest uses 4 threads

dt <- fread("Data/escolas.csv")[, -c("V1")]
summary(dt)
## escola aluno logico mulher
## Length:906 Min. : 1.0 Min. :-1.40923 Min. :0.0000
## Class :character 1st Qu.:227.2 1st Qu.:-0.77069 1st Qu.:0.0000
## Mode :character Median :453.5 Median :-0.13215 Median :0.0000
## Mean :453.5 Mean :-0.00218 Mean :0.4735
## 3rd Qu.:679.8 3rd Qu.: 0.66603 3rd Qu.:1.0000
## Max. :906.0 Max. : 3.06056 Max. :1.0000
## NA's :2
## idade
## Min. : 8.784
## 1st Qu.:10.661
## Median :11.060
## Mean :11.219
## 3rd Qu.:11.402
## Max. :17.005
##
Basic Principles
Consider a simple regression framework with binary treatment \(T\in\{0, 1\}\) and a proportion \(P\) of treated individuals. We are interested in the OLS coefficient \(\beta\)
\[ \begin{equation} Y_i=\alpha+\beta T_i + \varepsilon_i. \end{equation} \]
Consider that each individual was randomly sampled from an identical population, so that observations can be assumed to be i.i.d., with variance \(\sigma^2\). The variance of \(\hat\beta\) is given by
\[ \begin{equation} \sigma^2_{\beta}=\frac{1}{P(1-P)}\frac{\sigma^2}{N}. \end{equation} \] Hence, a higher sample size \(N\) and a balanced proportion of treated units \((P\sim 0.5)\) improve the precision of our estimate, leading to a lower MDE. If we are interested in testing the null hypothesis \(H_0\) that the effect of the program is zero against the alternative that it is not, then, for a given significance level \(\alpha\) and power \(\kappa\), the MDE is given by
\[ \begin{equation} MDE=\left(t_{1-\kappa}+t_{\alpha/2}\right)\sqrt{\frac{1}{P(1-P)}\frac{\sigma^2}{N}} \end{equation} \]
There is a trade-off between power and significance level. For a given MDE, if you want your design to have more power, you must accept a higher probability of committing a Type I error.
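To get a feel for these magnitudes, the formula can be evaluated directly. The sketch below uses the normal approximation (`qnorm` in place of the t quantiles), which is reasonable for large \(N\); the inputs (\(\sigma^2=1\), \(N=900\), \(P=0.5\)) are illustrative, not estimates from the data.

```r
# MDE under the normal approximation:
# (z_{kappa} + z_{1 - alpha/2}) * SE(beta_hat)
mde <- function(N, P = 0.5, sigma2 = 1, kappa = 0.8, alpha = 0.05) {
  se_beta <- sqrt(sigma2 / (P * (1 - P) * N))
  (qnorm(kappa) + qnorm(1 - alpha/2)) * se_beta
}
mde(N = 900)           # balanced design
mde(N = 900, P = 0.1)  # unbalanced assignment inflates the MDE
```

Note how the unbalanced design (\(P=0.1\)) yields a larger MDE for the same \(N\), since the \(1/\sqrt{P(1-P)}\) factor is minimized at \(P=0.5\).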
Fully Randomized Experiment
Let’s first consider the case of a fully (i.e. individual-level) randomized experiment with a homogeneous treatment effect. We start with a simple regression and incrementally add regressors (i.e. control variables) to assess the effect on the MDE. Our dependent variable is the score on the logic exam; the additional regressors are age and a female indicator. The econometric specification takes the form:
\[ \begin{equation} y_i=\alpha+\beta T_i+\gamma_1 age_i+\gamma_2 woman_i+\varepsilon_i \end{equation} \]
# Clean up the data. Never forget it!!
dt[escola == "S/escola", escola := "99"][, escola := as.integer(escola)]
dt <- dt[!is.na(logico)]
summary(dt)
## escola aluno logico mulher
## Min. : 1.00 Min. : 1.0 Min. :-1.40923 Min. :0.0000
## 1st Qu.: 8.00 1st Qu.:226.8 1st Qu.:-0.77069 1st Qu.:0.0000
## Median :14.50 Median :453.5 Median :-0.13215 Median :0.0000
## Mean :15.42 Mean :453.6 Mean :-0.00218 Mean :0.4735
## 3rd Qu.:23.00 3rd Qu.:680.2 3rd Qu.: 0.66603 3rd Qu.:1.0000
## Max. :99.00 Max. :906.0 Max. : 3.06056 Max. :1.0000
## idade
## Min. : 8.784
## 1st Qu.:10.660
## Median :11.060
## Mean :11.219
## 3rd Qu.:11.405
## Max. :17.005
We must simulate the treatment assignment planned for the experiment; in this case, a random draw from \(\{0, 1\}\) at the student level. For each imposed treatment effect we simulate a large number of assignments and compute the proportion of null hypothesis rejections (i.e. the power) for that effect.
#' Include the effect in the y variable for those observations in the
#' treatment group
set.seed(123456)
eff_grid <- seq(0, 0.5, by = 0.05) # Effect grid
reps <- 1000
alpha <- 0.05
dt2 <- copy(dt)
#' Create a data.table to hold all replications and effects
mde_dt <- CJ(effect = eff_grid, replication = seq_len(reps))
#' For each row, draw a treatment assignment vector, create an "artificial"
#' outcome y and run the regressions. Save the rejections
mde_dt[, rejections := lapply(effect, function(x){
  dt2[, treat := rbinom(.N, 1, prob = 0.5)]
  dt2[, y := logico + x*treat]
  model <- feols(y ~ treat + csw0(idade, mulher), data = dt2,
                 se = "hetero",
                 lean = TRUE)
  lapply(model, function(x) pvalue(x)["treat"] < alpha)
})]
#' Unlist the rejections for each model estimated
mde_dt[, by = .(effect, replication),
       c("Model1","Model2","Model3") := rejections[[1]]]
# Power table. Columns "Model" present the power for a given effect size
table <- mde_dt[, by = effect,
                lapply(.SD, mean),
                .SDcols = patterns("Model")]
table
## effect Model1 Model2 Model3
## 1: 0.00 0.044 0.047 0.053
## 2: 0.05 0.109 0.114 0.123
## 3: 0.10 0.335 0.354 0.353
## 4: 0.15 0.601 0.653 0.656
## 5: 0.20 0.851 0.876 0.887
## 6: 0.25 0.965 0.982 0.980
## 7: 0.30 0.996 0.996 0.996
## 8: 0.35 1.000 1.000 1.000
## 9: 0.40 1.000 1.000 1.000
## 10: 0.45 1.000 1.000 1.000
## 11: 0.50 1.000 1.000 1.000
Notice how power increases when we add regressors that help explain variation in the dependent variable. Including them reduces residual variance, improving the precision of \(\hat\beta\) and reducing the MDE.
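This variance-reduction channel is easy to verify on simulated data (a sketch, not the escolas data): a covariate that explains part of the outcome shrinks the residual standard deviation, and the MDE scales with that residual sd.

```r
set.seed(1)
n   <- 5000
age <- rnorm(n)                        # simulated covariate
y   <- 0.6 * age + rnorm(n)            # outcome with no treatment effect
sd_raw      <- sd(resid(lm(y ~ 1)))    # total outcome sd
sd_adjusted <- sd(resid(lm(y ~ age)))  # residual sd after controlling for age
# Proportional reduction in the MDE from adding the covariate
sd_adjusted / sd_raw
```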
Cluster-randomized Experiment
When analyzing individual data from programs randomized at a group level, it is important to take into account that the error term may not be independent across individuals. In this case, our simulation must account for the clustered treatment assignment and the possibility of within-group error correlation. Consider the following setup
\[ \begin{equation} Y_{ij}=\alpha+\beta T_{j}+v_j+w_{ij} \end{equation} \] where \(j\in\{1, \ldots, J\}\) indexes the cluster. Suppose same size clusters with \(n\) individuals, \(v_j\) is i.i.d. with variance \(\tau^2\), and \(w_{ij}\) is i.i.d. with variance \(\sigma^2\). The cluster-robust standard error for \(\hat\beta\) is given by
\[ \begin{equation} \sqrt{\frac{1}{P(1-P)}\frac{n\tau^2+\sigma^2}{nJ}} \end{equation} \] Therefore, the MDE in a cluster-randomized experiment is
\[ \begin{equation} MDE^{CR}=\left(t_{1-\kappa}+t_{\alpha/2}\right)\sqrt{\frac{1}{P(1-P)}\frac{n\tau^2+\sigma^2}{nJ}} \end{equation} \] which will be higher than the individually randomized \(MDE\). It shrinks roughly with the square root of the number of clusters \(J\), and is far less sensitive to the number of individuals \(n\) within each cluster.
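This penalty is often summarized by the design effect. Writing the intra-cluster correlation as \(\rho=\tau^2/(\tau^2+\sigma^2)\) (a notation introduced here for the sketch, not used above), the clustered variance equals the i.i.d. variance times \(1+(n-1)\rho\), so \(MDE^{CR}=MDE\sqrt{1+(n-1)\rho}\). The values below are illustrative:

```r
# Inflation of the MDE from clustering, given cluster size n and
# intra-cluster correlation rho = tau^2 / (tau^2 + sigma^2)
design_effect <- function(n, rho) sqrt(1 + (n - 1) * rho)
design_effect(n = 30, rho = 0.1)  # ~1.97: the MDE nearly doubles
design_effect(n = 30, rho = 0)    # 1: no clustering penalty
```

Even a modest \(\rho\) of 0.1 with 30 students per school nearly doubles the MDE relative to individual-level randomization.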
set.seed(123456)
dt2 <- copy(dt)
mde2_dt <- CJ(effect = eff_grid, replication = seq_len(reps))
mde2_dt[, rejections := lapply(effect, function(x){
  #' Treatment assignment made by school!
  dt2[, by = escola, treat := rbinom(1, 1, prob = 0.5)]
  dt2[, y := logico + x*treat]
  model <- feols(y ~ treat + csw0(idade, mulher), data = dt2,
                 cluster = ~escola, # Cluster the errors by school
                 lean = TRUE)
  lapply(model, function(x) pvalue(x)["treat"] < alpha)
})]
mde2_dt[, by = .(effect, replication),
        c("Model1","Model2","Model3") := rejections[[1]]]
# Power table. Columns "Model" present the power for a given effect size
table2 <- mde2_dt[, by = effect,
                  lapply(.SD, mean),
                  .SDcols = patterns("Model")]
table2
## effect Model1 Model2 Model3
## 1: 0.00 0.072 0.065 0.065
## 2: 0.05 0.074 0.068 0.065
## 3: 0.10 0.095 0.096 0.093
## 4: 0.15 0.143 0.160 0.153
## 5: 0.20 0.205 0.236 0.234
## 6: 0.25 0.294 0.324 0.323
## 7: 0.30 0.404 0.456 0.456
## 8: 0.35 0.508 0.587 0.577
## 9: 0.40 0.616 0.675 0.675
## 10: 0.45 0.717 0.777 0.769
## 11: 0.50 0.803 0.865 0.859
Compared to the fully randomized experiment, we see a loss of power for any given effect size. When errors are clustered, the effective number of observations shrinks, the coefficient’s standard error increases and so does the MDE.
The previous tables suggest the power function is increasing in the treatment effect \(\beta\). Take the first model, for example, depicted in the figure below.
ggplot(table2, aes(effect, Model1)) +
geom_point() +
geom_line() +
labs(x = "Effect size", y = "Power",
title = "Power function",
subtitle = "Significance level fixed at 5%.") +
theme_light()
Suppose now we want to compute the MDE for a specified power and significance level from the baseline data we were given. Essentially, we need to invert the power function. Remember we are dealing with survey data prior to any intervention, so we have no data on the treatment; we still simulate treatment assignment and compute the power, but now we have a target value to pursue. Once we reach that value, we know the minimum effect size that achieves the desired power.
set.seed(123456)
kappa <- 0.8
alpha <- 0.05
reps <- 100 # High values may take longer

power_diff <- function(beta, kappa = 0.8, alpha = 0.05, reps = 10){
  dt2 <- copy(dt) # Copying from the global environment
  mde_dt <- CJ(effect = beta, replication = seq_len(reps))
  # sapply because each iteration returns a single value
  mde_dt[, rejections := sapply(effect, function(x){
    dt2[, by = escola, treat := rbinom(1, 1, prob = 0.5)]
    dt2[, y := logico + x*treat]
    # Only one (simple) model
    model <- feols(y ~ treat, data = dt2,
                   cluster = ~escola,
                   lean = TRUE)
    pvalue(model)["treat"] < alpha
  })]
  # Power value. It's a scalar now
  pwr <- mde_dt[, mean(rejections)]
  return(pwr - kappa)
}

res <- uniroot(power_diff, interval = c(0, 1),
               kappa = kappa, alpha = alpha, reps = reps)
For this data, a specified power of 0.8 and a significance level of 0.05, we find an MDE of 0.5108621, with an estimated precision of \(6.1377666\times 10^{-5}\).
But what if this MDE is not small enough? By how much should we increase the sample size to achieve a desired MDE? The “pilot” data has already been collected and (suppose) cannot be redone. As a rule of thumb, the MDE decreases with the square root of the number of “effective observations”, which in a grouped experiment is the number of clusters. The ratio of MDEs is then proportional to the square root of the inverse ratio of observations.
\[ \begin{equation} MDE=MDE_{pilot}\sqrt{\frac{N_{pilot}}{N}} \end{equation} \]
Once we have calculated \(MDE_{pilot}\), for a given power and significance level, it is easy to estimate the minimum sample size required in order to reach a goal MDE. Suppose we want to halve the \(MDE_{pilot}\), then, keeping constant power and significance, we should quadruple the sample size.
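This rule of thumb is a one-liner. In the sketch below the pilot size of 30 clusters is hypothetical (the “effective observations” here are schools), while 0.51 is roughly the pilot MDE found above:

```r
# Minimum sample size for a target MDE, from the rule of thumb
# MDE = MDE_pilot * sqrt(N_pilot / N)  =>  N = N_pilot * (MDE_pilot / MDE_target)^2
required_n <- function(n_pilot, mde_pilot, mde_target) {
  ceiling(n_pilot * (mde_pilot / mde_target)^2)
}
required_n(n_pilot = 30, mde_pilot = 0.51, mde_target = 0.255)  # halving the MDE takes 120 clusters
```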
The paramtest R Package
As a final note, I refer you to the paramtest package for running simulations like the ones we have done here. The package is supposed to:
“Run simulations or other functions while easily varying parameters from one iteration to the next. Some common use cases would be grid search for machine learning algorithms, running sets of simulations (e.g., estimating statistical power for complex models), or bootstrapping under various conditions.”
Check the companion script provided, power.R, and the package’s documentation and vignette at the link below.
References
¹ A starter reference on the subject is Duflo, Glennerster, and Kremer (2007).↩︎