The minimum detectable effect (MDE)[1] of an experimental design is related to the power of that design, that is, the probability that, for a given effect size and statistical significance level, you will be able to reject the null hypothesis of zero effect. Everything else constant, larger sample sizes increase power and therefore reduce the MDE.

Before conducting the experiment itself, you can use survey data to assess the design’s power. In this class we will consider a setting where we have baseline student-level data on a sample of several schools and would like to randomize a treatment at the school level. We will also analyse the impact of including student-level covariates on power. Since treatment is assigned at the school level, standard errors should account for clustering. We start by loading the familiar escolas.csv dataset.

library(data.table)
library(fixest)
library(ggplot2)

setDTthreads(100) # request up to 100 threads; data.table caps this at the available cores
setFixest_nthreads(4) # fixest uses 4 threads
dt <- fread("Data/escolas.csv")[, -c("V1")]
summary(dt)
##     escola              aluno           logico             mulher      
##  Length:906         Min.   :  1.0   Min.   :-1.40923   Min.   :0.0000  
##  Class :character   1st Qu.:227.2   1st Qu.:-0.77069   1st Qu.:0.0000  
##  Mode  :character   Median :453.5   Median :-0.13215   Median :0.0000  
##                     Mean   :453.5   Mean   :-0.00218   Mean   :0.4735  
##                     3rd Qu.:679.8   3rd Qu.: 0.66603   3rd Qu.:1.0000  
##                     Max.   :906.0   Max.   : 3.06056   Max.   :1.0000  
##                                     NA's   :2                          
##      idade       
##  Min.   : 8.784  
##  1st Qu.:10.661  
##  Median :11.060  
##  Mean   :11.219  
##  3rd Qu.:11.402  
##  Max.   :17.005  
## 

Basic Principles

Consider a simple regression framework with binary treatment \(T\in\{0, 1\}\) and a proportion \(P\) of treated individuals. We are interested in the OLS coefficient \(\beta\)

\[ \begin{equation} Y_i=\alpha+\beta T_i + \varepsilon_i. \end{equation} \]

Consider that each individual was randomly sampled from the same population, so that observations can be assumed to be i.i.d. with error variance \(\sigma^2\). The variance of \(\hat\beta\) is given by

\[ \begin{equation} \sigma^2_{\beta}=\frac{1}{P(1-P)}\frac{\sigma^2}{N}. \end{equation} \] Hence, a larger sample size \(N\) and a balanced proportion of treated units \((P\sim 0.5)\) improve the precision of our estimate, leading to a lower MDE. If we want to test the hypothesis \(H_0\) that the effect of the program is zero against the alternative that it is not, then, for a given significance level \(\alpha\) and power \(\kappa\), the MDE is given by

\[ \begin{equation} MDE=\left(t_{1-\kappa}+t_{\alpha/2}\right)\sqrt{\frac{1}{P(1-P)}\frac{\sigma^2}{N}} \end{equation} \]

There is a trade-off between power and significance level. Holding the MDE and the sample size fixed, if you want more power in your design you must accept a higher probability of committing a Type I error (a larger \(\alpha\)).
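To fix ideas, here is a minimal sketch of the formula above in R, assuming normal critical values (a good approximation for large samples) and plugging in the sample variance of the logical exam score; the function name mde_analytical is just illustrative.

# Analytical MDE for an individual-level randomization with a proportion P treated,
# using normal quantiles in place of the t critical values
mde_analytical <- function(sigma2, N, P = 0.5, kappa = 0.8, alpha = 0.05) {
  (qnorm(kappa) + qnorm(1 - alpha/2)) * sqrt(sigma2 / (P * (1 - P) * N))
}
# Example with the baseline variance of the logical score
mde_analytical(sigma2 = var(dt$logico, na.rm = TRUE), N = sum(!is.na(dt$logico)))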

Fully Randomized Experiment

Let’s first consider the case of a fully (i.e. individual-level) randomized experiment with a homogeneous treatment effect. Moreover, we will start with a simple regression and incrementally add regressors (i.e. control variables) to assess the effect on the MDE. Our dependent variable is the score on the logical exam, and the additional regressors will be age and a female indicator. The econometric specification takes the form:

\[ \begin{equation} y_i=\alpha+\beta T_i+\gamma_1 age_i+\gamma_2 woman_i+\varepsilon_i \end{equation} \]

# Clean up the data. Never forget it!!
dt[escola == "S/escola", escola := "99"][, escola := as.integer(escola)]
dt <- dt[!is.na(logico)]
summary(dt)
##      escola          aluno           logico             mulher      
##  Min.   : 1.00   Min.   :  1.0   Min.   :-1.40923   Min.   :0.0000  
##  1st Qu.: 8.00   1st Qu.:226.8   1st Qu.:-0.77069   1st Qu.:0.0000  
##  Median :14.50   Median :453.5   Median :-0.13215   Median :0.0000  
##  Mean   :15.42   Mean   :453.6   Mean   :-0.00218   Mean   :0.4735  
##  3rd Qu.:23.00   3rd Qu.:680.2   3rd Qu.: 0.66603   3rd Qu.:1.0000  
##  Max.   :99.00   Max.   :906.0   Max.   : 3.06056   Max.   :1.0000  
##      idade       
##  Min.   : 8.784  
##  1st Qu.:10.660  
##  Median :11.060  
##  Mean   :11.219  
##  3rd Qu.:11.405  
##  Max.   :17.005

We now have to simulate the treatment assignment designed to happen in the experiment. In this case it will be a random draw from \(\{0, 1\}\) at the student level. For each imposed treatment effect we simulate a large number of assignments and compute the proportion of null-hypothesis rejections (i.e. the power) for that effect.

#' Include the effect in y variable for those observations in the treatment
#' group
set.seed(123456)
eff_grid <- seq(0, 0.5, by = 0.05) # Effect grid
reps <- 1000
alpha <- 0.05
dt2 <- copy(dt)
#' Creates a data.table to hold all replications and effects
mde_dt <- CJ(effect = eff_grid, replication = seq_len(reps))
#' For each row, draw a treatment assignment vector, create an "artificial" outcome y and run regressions. Save the rejections
mde_dt[, rejections := lapply(effect, function(x){
  dt2[, treat := rbinom(.N, 1, prob = 0.5)]
  dt2[, y := logico + x*treat]
  model <- feols(y~treat+csw0(idade, mulher), data = dt2, 
               se = "hetero", 
               lean = TRUE)
  lapply(model, function(m) pvalue(m)["treat"] < alpha)
})]
#' Unlist the rejections for each model estimated
mde_dt[, by = .(effect, replication), 
       c("Model1","Model2","Model3") := rejections[[1]]]
# Power table. Columns "Model" present the power for a given effect size
table <- mde_dt[, by = effect,
                lapply(.SD, mean),
                .SDcols = patterns("Model")]
table
##     effect Model1 Model2 Model3
##  1:   0.00  0.044  0.047  0.053
##  2:   0.05  0.109  0.114  0.123
##  3:   0.10  0.335  0.354  0.353
##  4:   0.15  0.601  0.653  0.656
##  5:   0.20  0.851  0.876  0.887
##  6:   0.25  0.965  0.982  0.980
##  7:   0.30  0.996  0.996  0.996
##  8:   0.35  1.000  1.000  1.000
##  9:   0.40  1.000  1.000  1.000
## 10:   0.45  1.000  1.000  1.000
## 11:   0.50  1.000  1.000  1.000

Notice how power increases when we add regressors that help explain variation in the dependent variable. Including those regressors reduces the residual variance, improving the precision of \(\hat\beta\) and reducing the MDE.
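A quick way to see this variance-reduction channel in the baseline data is to compare residual standard deviations with and without the covariates; the sketch below is only suggestive, since the actual gain also depends on the experimental sample.

# Residual standard deviation of the outcome with and without baseline covariates;
# a smaller residual SD translates into a more precise treatment coefficient
m0 <- feols(logico ~ 1, data = dt)
m1 <- feols(logico ~ idade + mulher, data = dt)
c(no_covariates = sd(resid(m0)), with_covariates = sd(resid(m1)))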

Cluster-randomized Experiment

When analyzing individual data from programs randomized at the group level, it is important to take into account that the error term may not be independent across individuals within the same group. In this case, our simulation must reflect both the clustered treatment assignment and the possibility that errors are correlated within groups. Consider the following setup

\[ \begin{equation} Y_{ij}=\alpha+\beta T_{j}+v_j+w_{ij} \end{equation} \] where \(j\in\{1, \ldots, J\}\) indexes the cluster. Suppose clusters of equal size \(n\), that \(v_j\) is i.i.d. with variance \(\tau^2\), and that \(w_{ij}\) is i.i.d. with variance \(\sigma^2\). The cluster-robust standard error for \(\hat\beta\) is given by

\[ \begin{equation} \sqrt{\frac{1}{P(1-P)}\frac{n\tau^2+\sigma^2}{nJ}} \end{equation} \] Therefore, the MDE in a cluster-randomized experiment is

\[ \begin{equation} MDE^{CR}=\left(t_{1-\kappa}+t_{\alpha/2}\right)\sqrt{\frac{1}{P(1-P)}\frac{n\tau^2+\sigma^2}{nJ}} \end{equation} \] which will be higher than the individually randomized \(MDE\). It shrinks roughly with the square root of the number of clusters \(J\), and much more slowly with the number of individuals \(n\) within each cluster.
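Before simulating, we can get a rough analytical benchmark from the baseline data. In the sketch below the between-school and within-school variances of the logical score stand in for \(\tau^2\) and \(\sigma^2\); the function name mde_cluster is illustrative.

# Rough analytical cluster-randomized MDE, assuming equal-sized clusters and
# normal quantiles in place of the t critical values
mde_cluster <- function(tau2, sigma2, n, J, P = 0.5, kappa = 0.8, alpha = 0.05) {
  (qnorm(kappa) + qnorm(1 - alpha/2)) *
    sqrt((n * tau2 + sigma2) / (P * (1 - P) * n * J))
}
by_school <- dt[, .(m = mean(logico), v = var(logico), n = .N), by = escola]
tau2   <- var(by_school$m)                 # between-school variance (rough estimate)
sigma2 <- mean(by_school$v, na.rm = TRUE)  # average within-school variance (rough estimate)
mde_cluster(tau2, sigma2, n = mean(by_school$n), J = nrow(by_school))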

set.seed(123456)
dt2 <- copy(dt)
mde2_dt <- CJ(effect = eff_grid, replication = seq_len(reps))
mde2_dt[, rejections := lapply(effect, function(x){
  #' Treatment assignment made by school!
  dt2[, by = escola, treat := rbinom(1, 1, prob = 0.5)]
  dt2[, y := logico + x*treat]
  model <- feols(y~treat+csw0(idade, mulher), data = dt2, 
               cluster = ~escola, # Cluster the errors by school
               lean = TRUE)
  lapply(model, function(m) pvalue(m)["treat"] < alpha)
})]
mde2_dt[, by = .(effect, replication), 
       c("Model1","Model2","Model3") := rejections[[1]]]
# Power table. Columns "Model" present the power for a given effect size
table2 <- mde2_dt[, by = effect,
                lapply(.SD, mean),
                .SDcols = patterns("Model")]
table2
##     effect Model1 Model2 Model3
##  1:   0.00  0.072  0.065  0.065
##  2:   0.05  0.074  0.068  0.065
##  3:   0.10  0.095  0.096  0.093
##  4:   0.15  0.143  0.160  0.153
##  5:   0.20  0.205  0.236  0.234
##  6:   0.25  0.294  0.324  0.323
##  7:   0.30  0.404  0.456  0.456
##  8:   0.35  0.508  0.587  0.577
##  9:   0.40  0.616  0.675  0.675
## 10:   0.45  0.717  0.777  0.769
## 11:   0.50  0.803  0.865  0.859

What we see here, compared to the fully randomized experiment, is a loss of power for a given effect size. When errors are clustered, the effective number of observations is reduced, the coefficient’s standard error increases, and so does the MDE.
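One common way to quantify this loss is the design effect based on the intraclass correlation \(\rho\) of the outcome. The sketch below reuses the tau2 and sigma2 estimates from the benchmark above; the variable names are illustrative.

# Design effect: DEFF = 1 + (n_bar - 1) * rho, where rho = tau2 / (tau2 + sigma2);
# the effective sample size is roughly N / DEFF
rho   <- tau2 / (tau2 + sigma2)
n_bar <- dt[, .N, by = escola][, mean(N)]
deff  <- 1 + (n_bar - 1) * rho
c(design_effect = deff, effective_N = nrow(dt) / deff)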

The previous tables suggest the power function is increasing in the treatment effect \(\beta\). Take the first model, for example, depicted in the figure below.

ggplot(table2, aes(effect, Model1)) +
  geom_point() +
  geom_line() +
  labs(x = "Effect size", y = "Power", 
       title = "Power function",
       subtitle = "Significance level fixed at 5%.") +
  theme_light()

Suppose now we want to compute the MDE for a specified power and significance level from the baseline data we were given. Basically, we need to invert the power function. We do not actually have treatment data (remember we are dealing with survey data collected prior to any intervention), so we still simulate treatment assignment and compute the power, but now we have a target value to pursue. Once we reach that value, we know the minimum effect size that achieves the desired power.

set.seed(123456)
kappa <- 0.8
alpha <- 0.05
reps <- 100 # High values may take longer

power_diff <- function(beta, kappa = 0.8, alpha = 0.05, reps = 10){
  dt2 <- copy(dt) # Copying from global environment
  mde_dt <- CJ(effect = beta, replication = seq_len(reps))
  # sapply because each call returns a single logical value
  mde_dt[, rejections := sapply(effect, function(x){
    dt2[, by = escola, treat := rbinom(1, 1, prob = 0.5)]
    dt2[, y := logico + x*treat]
    # Only one (simple) model
    model <- feols(y~treat, data = dt2, 
                   cluster = ~escola, 
                   lean = TRUE)
    pvalue(model)["treat"] < alpha
  })]
  # Power value. It is a scalar now
  pwr <- mde_dt[, mean(rejections)]
  return(pwr - kappa)
}
res <- uniroot(power_diff, interval = c(0, 1),
               kappa = kappa, alpha = alpha, reps = reps)

For this data, a specified power of 0.8 and a significance level of 0.05, we have found an MDE of 0.5108621, with an estimated precision of 6.1377666 × 10^-5.

But what if this MDE is not enough? By how much should we increase the sample size to achieve a desired MDE? The “pilot” data has already been collected and cannot (suppose that is the case) be redone. As a rule of thumb, the MDE decreases with the square root of the number of “effective observations”, where the effective observations are the clusters in a group-randomized experiment. The ratio of MDEs is therefore proportional to the square root of the inverse ratio of sample sizes.

\[ \begin{equation} MDE=MDE_{pilot}\sqrt{\frac{N_{pilot}}{N}} \end{equation} \]

Once we have calculated \(MDE_{pilot}\) for a given power and significance level, it is easy to estimate the minimum sample size required to reach a target MDE. Suppose we want to halve \(MDE_{pilot}\); then, keeping power and significance constant, we should quadruple the sample size.
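A minimal sketch of this back-of-the-envelope calculation, using the simulated \(MDE_{pilot}\) found by uniroot above and treating the schools as the effective observations; the helper name required_n is illustrative.

# Rule of thumb: N = N_pilot * (MDE_pilot / MDE_target)^2, with clusters as the
# effective observations in a group-randomized design
required_n <- function(n_pilot, mde_pilot, mde_target) {
  n_pilot * (mde_pilot / mde_target)^2
}
J_pilot <- dt[, uniqueN(escola)] # number of schools in the pilot data
required_n(J_pilot, mde_pilot = res$root, mde_target = res$root / 2)
# Halving the MDE requires roughly four times as many clusters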

paramtest R package

As a final note, I refer you to the paramtest package for running simulations like the ones we have done here. According to its description:

“Run simulations or other functions while easily varying parameters from one iteration to the next. Some common use cases would be grid search for machine learning algorithms, running sets of simulations (e.g., estimating statistical power for complex models), or bootstrapping under various conditions.”

Check the companion script provided, power.R, as well as the package’s documentation and vignette at the link below.

References

Duflo, Esther, Rachel Glennerster, and Michael Kremer. 2007. “Using Randomization in Development Economics Research: A Toolkit.” Handbook of Development Economics 4: 3895–3962.

  1. A starter reference on the subject is Duflo, Glennerster, and Kremer (2007).