QTM 385 - Experimental Methods

Lecture 07 - Blocking and Clustering

Danilo Freire

Emory University

Hi, there!
Hope all is well 😉

Brief recap 📚

Last time, we…

  • Discussed Kalla and Broockman (2015): Campaign Contributions Facilitate Access to Congressional Officials: A Randomized Field Experiment
  • In their field experiment, informing congressional offices that prospective meeting attendees were donors made securing a meeting with policymakers roughly 3-4 times more likely
  • Their treatment was a simple email sent to congressional offices with information about the meeting attendees (donors vs. local constituents)
  • The experiment also had a good theoretical grounding, as it was based on the idea that money distorts political representation
  • The results indicate this is true, at least in the American context (at the time of the experiment)

Last time, we…

  • The findings replicate elsewhere, too: For instance, 14% more callbacks for German- vs Turkish-sounding names (Kaas and Munger (2019))
  • Yemane and Reino (2019) find that Latinos are also discriminated against in the US labor market, but not in Spain. Why is that?

Today, we will…

  • Understand how to improve our experiments by using blocking
  • Learn why blocking reduces variance and increases precision
  • See how blocking solves some practical issues in field experiments
  • Learn about, and how to deal with, clustering and intra-cluster correlation
  • Understand the differences between blocking and clustering
  • But first… let’s talk about your group work again!

Group project 👥

Group project

The list is ready!

  • Thanks to everyone who emailed me with their group preferences 😉
  • I took all your preferences into account and tried to make the groups as balanced as possible
  • If you didn’t email me, I assigned you to a group randomly, as we agreed in class
  • Group numbers were also randomised to avoid any selection bias! 😂
  • I will also create the groups on Canvas in case you need to refer to them later
  • If you are happy with your group, that’s great! If not, please let me know and I will try to accommodate your request 🥳

Questions?
Did I miss anyone? 😅

Group project 🤝

Next steps (from our previous lecture)

  • Now that you have your groups, I’d like you to start working on your pre-analysis plan
  • I’ll give you some time to discuss your ideas and start writing your plan
  • What do you think about sending this to me next week?
    • Submit at most 2 paragraphs summarising an experiment that you want to develop in this course. At minimum, your summary should include a research question, why the question is important, and a rough sketch of how you plan to answer the question
  • In two weeks:
    • Write a title and abstract for a paper you imagine writing based on your proposed experiment. Assume that your findings align with your theoretical predictions. Remember to establish why the findings matter for your intended audience
  • In three weeks:
    • Outline your pre-analysis plan. Your outline should include sections on the research question, the experimental design, the data you will collect, and the analysis you will conduct

Blocking and Clustering

What is blocking?

  • Blocking is a procedure that involves grouping experimental units based on certain characteristics
  • These groups are called blocks or strata and are formed based on variables that are expected to affect the outcome of the experiment (heterogeneous treatment effects)
  • Within each block, units are randomly assigned to treatment or control groups
  • So we have “experiments within experiments”!
  • This approach helps to ensure that the treatment and control groups are comparable within each block, reducing the potential for confounding variables to affect the results

Why is blocking important?

  • Blocking can also help us ensure that an equal number of people from each group are assigned to the treatment and control groups
  • For instance, imagine an experiment that includes 20 people, 10 men and 10 women
  • If we randomly assign people to the treatment and control groups, we might end up with a treatment group that is mostly women, or even one made up only of men (although the chance of this happening is low)
  • Blocking removes this risk entirely (see the sketch after this list)
  • And as groups will be more homogeneous, blocking also reduces variance and increases precision
  • “Block what you can, and randomise what you cannot” (Gerber and Green 2012, p. 110)
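A minimal sketch of this idea in R (the randomizr package and the variable names are my additions, not from the lecture): blocking on sex guarantees exactly 5 treated men and 5 treated women in every assignment.

# Blocked assignment for the 20-person example (randomizr assumed)
library(randomizr)
set.seed(385)
sex <- rep(c("man", "woman"), each = 10)
Z_complete <- complete_ra(N = 20, m = 10) # complete randomisation: sexes may be unbalanced
Z_blocked <- block_ra(blocks = sex)       # blocked: exactly 5 treated per sex
table(sex, Z_blocked)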

One or many blocks?

  • With enough time and resources, we could create a block for every possible characteristic that might affect the outcome of the experiment
  • But this would be impractical and unnecessary
  • Instead, we should focus on the most important characteristics that are likely to have the greatest impact on the outcome
  • But how do we know which characteristics are important if we haven’t run the experiment yet?
  • Two strategies:
    • Use previous research (quantitative or qualitative) to identify important characteristics
    • Pilot the experiment with a small sample to test for important characteristics

How does blocking help?

  • Let’s say we are interested in testing the effect of a new drug on blood pressure
  • We know that age is an important factor that affects blood pressure, so we decide to block our sample by age
  • To simplify, we create two blocks: one for people under 50 and another for people over 50
  • Imagine that we have 12 people in our sample (\(N\)) and want to assign 6 of them (\(m=6\)) to the treatment group and 6 to the control group
  • How many possible ways are there to assign people to the treatment and control groups?
# Random assignment
choose(12, 6)
[1] 924
# Two blocks with 6 people each
# 3 in the treatment group
choose(6, 3) * choose(6, 3)
[1] 400

How does blocking help?

  • The assignments that are ruled out are those in which too many or too few units in a block are assigned to treatment
  • Those “extreme” assignments produce estimates that are in the tails of the sampling distribution
  • The figure shows the sampling distribution of the difference-in-means estimator under complete random assignment
  • The histogram is shaded according to whether the particular random assignment is permissible under a procedure that blocks on the binary covariate \(X\) (age, in our case)
  • After many simulations, we see that blocking rules out by design those assignments that are not perfectly balanced
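A rough simulation sketch of the figure’s logic (the potential outcomes below are made up for illustration; this is not the original figure code):

# Simulate the 12-person drug example with two age blocks
set.seed(385)
age <- rep(c("under50", "over50"), each = 6) # binary covariate X
y0 <- c(rnorm(6, 120, 5), rnorm(6, 140, 5))  # hypothetical blood pressures
y1 <- y0 - 10                                # constant treatment effect of -10
sims <- 10000
est <- numeric(sims)
permissible <- logical(sims)
for (s in 1:sims) {
  Z <- sample(rep(0:1, each = 6))            # complete random assignment, m = 6
  Y <- Z * y1 + (1 - Z) * y0
  est[s] <- mean(Y[Z == 1]) - mean(Y[Z == 0])
  permissible[s] <- all(tapply(Z, age, sum) == 3) # 3 treated in each block?
}
# Estimates from block-permissible assignments avoid the tails
c(sd_complete = sd(est), sd_blocked = sd(est[permissible]))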

Is there any disadvantage to blocking?

  • In general, no!
  • Gerber and Green argue that, even if you create blocks at random, you will do no worse than if you had not blocked at all (simple randomisation)
  • So there is no real disadvantage to blocking according to them
  • Others, such as Pashley and Miratrix (2021), argue that in some specific conditions (such as unthoughtful blocking), blocking will not be too beneficial, but it will not be so harmful either
  • In the vast majority of cases, you have a lot of gain from blocking and very little to lose
  • The only risk, in fact, is analysing the data incorrectly
  • And that can happen! We will see how soon 😉

When is blocking useful?

  • Blocking can be particularly useful when:
    • The sample size is small
    • There are important characteristics that are likely to affect the outcome of the experiment
    • The cost of blocking is low compared to the potential benefits
    • When it is important to affirm that heterogeneity is expected and should be explored (defense against p-hacking)
  • We should not overstate its benefits, as much of it can also be obtained with covariate adjustment. But the decrease in variance is real!

Kalla and Broockman (2015)

How to define ATE in blocked experiments?

  • If we consider the ATE at the unit level:

\[ATE = \frac{1}{N}\sum_{i=1}^N (y_{i,1} - y_{i,0})\]

  • We could re-express this quantity equivalently using the ATE of block \(j\), \(ATE_j\), as follows:

\[ATE = \sum_{j=1}^J \frac{N_j}{N}\left(\frac{1}{N_j}\sum_{i=1}^{N_j} (y_{i,1} - y_{i,0})\right) = \sum_{j=1}^J \frac{N_j}{N}ATE_j\]

  • And it is natural to estimate this quantity by plugging in the block-level estimates we can actually calculate:

\[\widehat{ATE} = \sum_{j=1}^J \frac{N_j}{N}\widehat{ATE_j}\]

How to define ATE in blocked experiments?

  • We can estimate the standard error of the blocked estimator by combining the block-level standard errors, weighted by block size (if our blocks are sufficiently large):

  • Square each block’s standard error
  • Weight it by the squared share of that block’s size, \((N_j/N)^2\)
  • Add up these weighted squared standard errors
  • Take the square root of the sum
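In symbols (my notation, following the recipe above), with \(\widehat{SE}_j\) the estimated standard error in block \(j\):

\[\widehat{SE}\left(\widehat{ATE}\right) = \sqrt{\sum_{j=1}^J \left(\frac{N_j}{N}\right)^2 \widehat{SE}_j^2}\]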

How to define ATE in blocked experiments?

Let’s simulate some data

set.seed(12345)
# We have 10 units
N <- 10
# y0 is the potential outcome under control
y0 <- c(0, 0, 0, 1, 1, 3, 4, 5, 190, 200)
# tau is the unit-level treatment effect (heterogeneous across units)
tau <- c(10, 30, 200, 90, 10, 20, 30, 40, 90, 20)
# y1 is the potential outcome under treatment
y1 <- y0 + tau
# Two blocks: a and b
block <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b")
# Z is the treatment assignment
# (in the code we use Z instead of T)
Z <- c(0, 0, 0, 0, 1, 1, 0, 0, 1, 1)
# Y is the observed outcome
Y <- Z * y1 + (1 - Z) * y0
# The data
dat <- data.frame(Z = Z, y0 = y0, y1 = y1, tau = tau, b = block, Y = Y)
head(dat)
  Z y0  y1 tau b  Y
1 0  0  10  10 a  0
2 0  0  30  30 a  0
3 0  0 200 200 a  0
4 0  1  91  90 a  1
5 1  1  11  10 a 11
6 1  3  23  20 a 23

How to define ATE in blocked experiments?

  • To estimate the overall ATE, we simply replace each \(ATE_j\) with its sample analogue \(\widehat{ATE_j}\). First, let’s see how treatment was assigned within each block
with(dat, table(b, Z))
   Z
b   0 1
  a 4 2
  b 2 2

As we can see, we have 6 units in block \(a\), 2 of which are assigned to the treatment, and 4 units in block \(b\), 2 of which are assigned to the treatment.

Estimating ATE in blocked experiments

  • First, let’s look at some possible estimators
# The ATE
library(estimatr)
lm_robust(Y ~ Z, data = dat)
              Estimate Std. Error  t value   Pr(>|t|)    CI Lower   CI Upper DF
(Intercept)   1.666667  0.9189366 1.813691 0.10728314  -0.4524049   3.785738  8
Z           131.833333 68.4173061 1.926900 0.09015082 -25.9372575 289.603924  8
lm_robust(Y ~ Z + block, data = dat)
             Estimate Std. Error   t value   Pr(>|t|)  CI Lower  CI Upper DF
(Intercept) -32.42857   21.63212 -1.499093 0.17752759 -83.58042  18.72327  7
Z           114.78571   50.53715  2.271314 0.05736626  -4.71565 234.28708  7
blockb      102.28571   50.49783  2.025547 0.08245280 -17.12268 221.69411  7
  • How are they different?

Why are they different?

  • The first one ignores the blocks altogether; the second adjusts for block membership with indicator (dummy) variables, i.e. block fixed effects, which implicitly weight the block-level effects differently

  • And we can estimate the total ATE by adjusting the weights according to the size of the blocks:

lm_lin(Y ~ Z, covariates = ~ block, data = dat)
            Estimate Std. Error  t value     Pr(>|t|)   CI Lower   CI Upper DF
(Intercept)     1.95   0.250000 7.800000 0.0002340912   1.338272   2.561728  6
Z             108.25  12.530862 8.638672 0.0001325490  77.588086 138.911914  6
blockb_c        4.25   0.559017 7.602631 0.0002696413   2.882135   5.617865  6
Z:blockb_c    228.75  30.599224 7.475680 0.0002957945 153.876397 303.623603  6
  • Which one should we use? Any thoughts?

Weighted average of block ATEs

  • The weighted average of the block ATEs is the best estimator
  • The weights are the proportion of units in each block, \(N_j/N\)
  • If the likelihood of being assigned to the treatment group differs by block, comparing the means across all subjects will lead to a biased estimate of the ATE
    • Unless the probability of assignment to the treatment group is identical for every block
  • In a nutshell: when estimating the ATE in a blocked experiment, just do it the old-fashioned way, computing each block’s ATE and taking their weighted average! 😅 (see the sketch below)
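As an illustration with the simulated data from earlier (the object names ate_j and N_j are mine), the hand computation reproduces the lm_lin estimate of 108.25:

# Block-level difference-in-means, weighted by block size N_j/N
ate_j <- with(dat, tapply(Y[Z == 1], b[Z == 1], mean) -
                   tapply(Y[Z == 0], b[Z == 0], mean))
N_j <- table(dat$b)
sum(ate_j * N_j / sum(N_j))
[1] 108.25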

Clustering

What is clustering?

  • So far, we have only allocated individual units to treatment and control groups
  • But in some cases, we might want to allocate whole groups of units to treatment and control conditions
  • Usually, we do this for practical reasons:
    • It is impossible to randomly assign individuals to treatment and control groups (TV markets, for instance)
    • We cannot isolate individuals from each other (same household, same school, etc.)
    • We have strong priors about possible spillover effects
  • However, we still measure effects at the individual level (or any level smaller than the randomisation unit)
  • Let’s see how that impacts our analyses 😉

Clustering

  • A common example is an education experiment in which the treatment is randomised at the classroom level
  • All students in a classroom are assigned to either treatment or control together, as it is impossible for students in the same classroom to be assigned to different conditions (different teachers, materials, etc)
  • Assignments do not vary within the classroom
  • Clusters can be localities, like villages, precincts, or neighbourhoods
  • When clusters are all the same size, the standard difference-in-means estimator we usually employ is unbiased
  • However, caution is needed when clusters have different numbers of units or when there are very few clusters, as the treatment effects could be correlated with cluster size
  • When cluster size is related to potential outcomes, the usual difference-in-means estimator is often biased

Intra-cluster correlation

  • Typically, cluster randomised trials have higher variance than individually randomised trials
  • Why? Because individuals within the same cluster tend to be more similar to each other than to individuals in different clusters
  • How much higher variance depends on a statistic that can be hard to think about, the intra-cluster correlation (ICC) of the outcome
  • The total variance can be decomposed into the variance of the cluster means \(\sigma^2_{between}\) plus the individual variance of the cluster-demeaned outcome \(\sigma^2_{within}\)
  • The ICC is a number between 0 and 1 that describes the fraction of the total variance that is due to the between variance: \(\rho = \frac{\sigma^2_{between}}{\sigma^2_{between} + \sigma^2_{within}}\)
  • Let me explain the ICC in more detail 😊
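Before that, here is a quick numerical check of the decomposition above (simulated classroom data; the variance parameters are made up):

# True values: sigma_between = 2, sigma_within = 4,
# so the true ICC is 2^2 / (2^2 + 4^2) = 0.2
set.seed(385)
J <- 40; m <- 25                                  # 40 classrooms of 25 students
cl <- rep(1:J, each = m)
Y <- rep(rnorm(J, mean = 70, sd = 2), each = m) + rnorm(J * m, sd = 4)
sigma2_between <- var(tapply(Y, cl, mean))        # variance of cluster means
sigma2_within <- var(Y - ave(Y, cl))              # variance of demeaned outcomes
sigma2_between / (sigma2_between + sigma2_within) # plug-in ICC, roughly 0.2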

Intra-cluster correlation

  • Think of a cluster as a group of individuals who share some common characteristic or environment
  • For example, all students in a class experience the same teacher and facilities
  • Intra-cluster correlation tells us how similar individuals within the same group are in relation to individuals from different groups
  • If everyone in a group is very alike (like students who all perform similarly on a test), the correlation is high. If they’re quite different, it’s low
  • \(\sigma^2_{between}\): Some classrooms might be better than others because of different teaching styles or resources
  • \(\sigma^2_{within}\): Even within the same classroom, students might perform differently due to their own unique abilities or efforts
  • The total variance is the sum of these two components, and the ICC tells us how much of the total variance is due to differences between classrooms
  • When ICC is 1, the effective sample size is equal to the number of clusters. When ICC is 0, the effective sample size is equal to the number of individuals.
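That last claim follows from the standard design-effect formula for equal-sized clusters with \(m\) units each (a textbook result, not stated in the slides):

\[N_{eff} = \frac{N}{1 + (m - 1)\rho}\]

When \(\rho = 1\), this gives \(N/m\), the number of clusters; when \(\rho = 0\), it gives \(N\), the number of individuals.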

Information reduction

  • The ICC is a measure of how much information we lose by clustering
  • Imagine an impact evaluation with 10 people, where 5 are assigned to the treatment group and 5 to control
  • In one version of the experiment, the treatment is assigned to individuals
  • In another version, treatment is assigned at the group level: the 5 individuals with black hair form one group and the 5 individuals with another hair colour form the other
  • This design has two clusters: ‘black hair’ and ‘other colour’
  • In the first version, the randomisation procedure allows for 252 different combinations of people as the treatment and control groups
  • However, in the second version, the randomisation procedure assigns all the black-haired subjects either to treatment or to control, and thus only ever produces two ways of combining people to estimate an effect
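We can verify both counts with the same R function we used earlier:

# Individual-level assignment: choose 5 of 10 people for treatment
choose(10, 5)
[1] 252
# Cluster-level assignment: choose 1 of the 2 hair-colour clusters
choose(2, 1)
[1] 2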


What to do about information reduction

Robust clustered standard errors

  • In many cases of clustered assignment, we will have to make sure that our estimates of uncertainty about treatment effects reflect information loss
  • This is because classical standard error estimators assume that observations are independent, which biases our standard errors downwards
  • As a result, we would make type I errors more often than our significance level suggests, and be too quick to conclude that a treatment effect is statistically significant
  • But fear not! There is a solution:
    • Robust clustered standard errors (RCSE)
    • Think of it as calculating a variance for each cluster first and then combining these in a way that reflects the true variability across the clusters
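A hedged sketch of RCSE with estimatr (the simulated cluster-randomised data and parameter values are mine):

# 40 clusters of 25 units; treatment assigned at the cluster level
library(estimatr)
set.seed(385)
J <- 40; m <- 25
cl <- rep(1:J, each = m)
Z <- rep(rbinom(J, 1, 0.5), each = m)
Y <- 2 * Z + rep(rnorm(J, sd = 2), each = m) + rnorm(J * m, sd = 4)
lm_robust(Y ~ Z)                  # HC2 SEs: ignore clustering, too small here
lm_robust(Y ~ Z, clusters = cl)   # CR2 SEs: account for intra-cluster correlation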

Blocking and clustering

  • One way to improve clustered designs is to use blocking
  • Imagine villages nested within discrete regions, or classrooms nested within discrete schools
  • Usually we do have some information about the clusters, so we can use it to reduce variance and add more effective information
  • But that is something we will see next time 😉

Summary

  • Blocking is a procedure that involves grouping experimental units based on certain characteristics
  • It helps to ensure that the treatment and control groups are comparable within each block, reducing the potential for confounding variables to affect the results
  • Blocking can also help us ensure that an equal number of people from each group are assigned to the treatment and control groups
  • Blocking reduces variance and increases precision
  • Clustering is a procedure in which whole groups of units, rather than individuals, are assigned together to treatment or control conditions
  • It is useful when it is impossible to randomly assign individuals to treatment and control groups
  • But it is often not the ideal solution:
    • We are usually required to use clusters, not because we want to

Next time

  • We will see more on blocking and clustering in R with the estimatr package
  • Estimate models with robust clustered standard errors and see how they differ from regular standard errors
  • More on covariate adjustment
  • Learn about required sample sizes and power analysis
  • And more! 😂

Thanks very much! 😊

See you next time! 👋