QTM 385 - Experimental Methods

Lecture 07 - Blocking and Clustering

Danilo Freire

Department of Data and Decision Sciences
Emory University

Hi there!
Hope all is well 😉

Brief recap 📚

Last time, we…

  • Discussed Kalla and Broockman (2015): Campaign Contributions Facilitate Access to Congressional Officials: A Randomized Field Experiment
  • They conducted a field experiment demonstrating that informing congressional offices that prospective meeting attendees were donors made securing a meeting with senior policymakers 3-4 times more likely
  • Their treatment was a simple email sent to congressional offices with information about the meeting attendees (donors vs. local constituents)
  • The experiment also had a good theoretical grounding, as it was based on the idea that money distorts political representation
  • The results indicate this is true, at least in the American context (at the time of the experiment)

Last time, we…

  • The findings replicate elsewhere, too: for instance, applicants with German-sounding names received 14% more callbacks than those with Turkish-sounding names (Kaas and Manger 2012)
  • Yemane and Fernández-Reino (2019) find that Latinos are also discriminated against in the US labor market, but not in Spain. Why is that?

Today, we will…

  • Understand how to improve our experiments by using blocking
  • Learn why blocking reduces variance and increases precision
  • See how blocking solves some practical issues in field experiments
  • Learn about, and how to deal with, clustering and intra-cluster correlation
  • Understand why clustering increases variance and reduces statistical power
  • Discuss the differences between blocking and clustering
  • But first, let’s talk about your group work again!

Group project 👥

Group project

  • Thanks to everyone who has emailed me with their group preferences 😉
  • If you haven’t emailed me yet, I will soon assign you to a group (randomly)
  • Group numbers have also been randomised to avoid any selection bias! 😂
  • I will also create the groups on Canvas in case you need to refer to it later
  • Please let me know if you have already formed a group and want to keep it!

Questions?

Group project 🤝

Next steps (from our previous lecture)

  • I’ll give you some time to discuss your ideas and start writing your plan
  • What do you think about sending this to me next week/in two weeks?
    • Submit at most 2 paragraphs summarising an experiment that you want to develop in this course. At minimum, your summary should include a research question, why the question is important, and a rough sketch of how you plan to answer the question
  • In two weeks:
    • Write a title and abstract for a paper you imagine writing based on your proposed experiment. Assume that your findings align with your theoretical predictions. Remember to establish why the findings matter for your intended audience
  • In three weeks:
    • Outline your pre-analysis plan. Your outline should include sections on the research question, the experimental design, the data you will collect, and the analysis you will conduct

Blocking and Clustering

What is blocking?

  • Blocking is a procedure that involves grouping experimental units based on certain characteristics
  • These groups are called blocks or strata and are formed based on variables that are expected to affect the outcome of the experiment (heterogeneous treatment effects)
  • Within each block, units are randomly assigned to treatment or control groups
  • So we have “experiments within experiments”!
  • This approach helps to ensure that the treatment and control groups are comparable within each block, reducing the potential for confounding variables to affect the results

Why is blocking important?

  • Blocking can also help us ensure that an equal number of people from each group are assigned to the treatment and control groups
  • For instance, imagine an experiment that includes 20 people, 10 men and 10 women
  • If we randomly assign people to the treatment and control groups, we might end up with 7 or 8 women in the treatment group, or even a treatment group made up only of men (although the chance of this happening is quite low)
  • Blocking removes this risk entirely
  • And as groups will be more homogeneous, blocking also reduces variance and increases precision
  • “Block what you can, and randomise what you cannot” (Gerber and Green 2012, p. 110)
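  • As a minimal sketch, blocked assignment guarantees this balance; here using the randomizr package (the gender labels and sizes are illustrative):
# Blocked random assignment with randomizr
library(randomizr)
gender <- rep(c("man", "woman"), each = 10)
# Randomise to treatment/control separately within each gender block
Z <- block_ra(blocks = gender)
# Exactly 5 treated and 5 control in each block
table(gender, Z)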

One or many blocks?

  • With enough time and resources, we could create a block for every possible characteristic that might affect the outcome of the experiment
  • But this would be impractical and unnecessary
  • Instead, we should focus on the most important characteristics that are likely to have the greatest impact on the outcome
  • But how do we know which characteristics are important if we haven’t run the experiment yet?
  • Two strategies:
    • Use previous research (quantitative or qualitative) to identify important characteristics
    • Pilot the experiment with a small sample to test for important characteristics

How does blocking help?

  • Let’s say we are interested in testing the effect of a new drug on blood pressure
  • We know that age is an important factor that affects blood pressure, so we decide to block our sample by age
  • To simplify, we create two blocks: one for people under 50 and another for people over 50
  • Imagine that we have 12 people in our sample (\(N = 12\)) and want to assign 6 of them (\(m = 6\)) to the treatment group and 6 to the control group
  • How many possible ways are there to assign people to the treatment and control groups?
# Random assignment
choose(12, 6)
[1] 924
# Two blocks with 6 people each
# 3 in the treatment group
choose(6, 3) * choose(6, 3)
[1] 400

How does blocking help?

  • The assignments that are ruled out are those in which too many or too few units in a block are assigned to treatment
  • Those “extreme” assignments produce estimates that are in the tails of the sampling distribution
  • The figure shows the sampling distribution of the difference-in-means estimator under complete random assignment
  • The histogram is shaded according to whether the particular random assignment is permissible under a procedure that blocks on the binary covariate \(X\) (age, in our case)
  • After many simulations, we see that blocking rules out by design those assignments that are not well balanced
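  • A minimal simulation sketch (with illustrative numbers) makes the variance reduction visible:
# Compare the spread of the difference-in-means estimator
# under complete vs blocked random assignment
set.seed(385)
X  <- rep(c(0, 1), each = 6)    # binary covariate (e.g. under/over 50)
y0 <- rnorm(12, mean = 10 * X)  # control outcomes depend on X
y1 <- y0 + 2                    # constant treatment effect of 2
dim_est <- function(Z) mean(y1[Z == 1]) - mean(y0[Z == 0])
# Complete randomisation: any 6 of the 12 can be treated
complete <- replicate(5000, dim_est(sample(rep(0:1, 6))))
# Blocked randomisation: exactly 3 treated within each block of 6
blocked <- replicate(5000, dim_est(c(sample(rep(0:1, 3)), sample(rep(0:1, 3)))))
c(sd_complete = sd(complete), sd_blocked = sd(blocked))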

Is there any disadvantage to blocking?

  • In general, no!
  • Gerber and Green argue that, even if you create blocks at random, you will do no worse than if you had not blocked at all (simple randomisation)
  • So there is no real disadvantage to blocking according to them
  • Others, such as Pashley and Miratrix (2021), argue that under some specific conditions (such as poorly chosen blocks), blocking will not be very beneficial, but it will not be very harmful either
  • In the vast majority of cases, you have a lot of gain from blocking and very little to lose
  • The only risk, in fact, is analysing the data incorrectly
  • And that can happen! We will see how soon 😉

When is blocking useful?

  • Blocking can be particularly useful when:
    • The sample size is small
    • There are important characteristics that are likely to affect the outcome of the experiment
    • The cost of blocking is low compared to the potential benefits
    • When you want to state in advance that heterogeneous effects are expected and will be explored (a defence against p-hacking)
  • We should not overstate its benefits, as much of the same gain can be obtained with covariate adjustment. But the reduction in variance is real!

Kalla and Broockman (2015)

How to define ATE in blocked experiments?

  • If we consider the ATE at the unit level:

\[ATE = \frac{1}{N}\sum_{i=1}^N (y_{i,1} - y_{i,0})\]

  • We could re-express this quantity equivalently using the ATE of block \(j\), \(ATE_j\), as follows:

\[ATE = \sum_{j=1}^J \frac{N_j}{N}\left(\frac{1}{N_j}\sum_{i=1}^{N_j} (y_{i,1} - y_{i,0})\right) = \sum_{j=1}^J \frac{N_j}{N}ATE_j\]

  • And it is natural to estimate this quantity by plugging in the block-level estimates we can actually compute:

\[\widehat{ATE} = \sum_{j=1}^J \frac{N_j}{N}\widehat{ATE_j}\]

How to define ATE in blocked experiments?

  • We can obtain the standard error of the estimator by combining the standard errors within each block (if our blocks are sufficiently large):

  • Square each block’s standard error
  • Weight it by the squared share of that block’s size, \((N_j/N)^2\)
  • Add up all these weighted squared standard errors
  • Take the square root of the sum
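  • In symbols, the four steps above amount to:

\[SE(\widehat{ATE}) = \sqrt{\sum_{j=1}^J \left(\frac{N_j}{N}\right)^2 SE(\widehat{ATE_j})^2}\]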

How to define ATE in blocked experiments?

Let’s simulate some data

set.seed(12345)
# We have 10 units
N <- 10
# y0 is the potential outcome under control
y0 <- c(0, 0, 0, 1, 1, 3, 4, 5, 190, 200)
# tau is each unit's own (intrinsic) treatment effect
tau <- c(10, 30, 200, 90, 10, 20, 30, 40, 90, 20)
# y1 is the potential outcome under treatment
y1 <- y0 + tau
# Two blocks: a and b
block <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b")
# Z is the treatment assignment
# (in the code we use Z instead of T)
Z <- c(0, 0, 0, 0, 1, 1, 0, 0, 1, 1)
# Y is the observed outcome
Y <- Z * y1 + (1 - Z) * y0
# The data
dat <- data.frame(Z = Z, y0 = y0, y1 = y1, tau = tau, b = block, Y = Y)
head(dat)
  Z y0  y1 tau b  Y
1 0  0  10  10 a  0
2 0  0  30  30 a  0
3 0  0 200 200 a  0
4 0  1  91  90 a  1
5 1  1  11  10 a 11
6 1  3  23  20 a 23

How to define ATE in blocked experiments?

  • One option is to estimate each block’s \(ATE_j\) with its sample analogue, \(\widehat{ATE_j}\), and take the weighted average
with(dat, table(b, Z))
   Z
b   0 1
  a 4 2
  b 2 2

As we can see, we have 6 units in block \(a\), 2 of which are assigned to the treatment, and 4 units in block \(b\), 2 of which are assigned to the treatment.

Estimating ATE in blocked experiments

  • First, let’s look at some possible estimators
# The ATE
library(estimatr)
lm_robust(Y ~ Z, data = dat)
              Estimate Std. Error  t value   Pr(>|t|)    CI Lower   CI Upper DF
(Intercept)   1.666667  0.9189366 1.813691 0.10728314  -0.4524049   3.785738  8
Z           131.833333 68.4173061 1.926900 0.09015082 -25.9372575 289.603924  8
lm_robust(Y ~ Z + block, data = dat)
             Estimate Std. Error   t value   Pr(>|t|)  CI Lower  CI Upper DF
(Intercept) -32.42857   21.63212 -1.499093 0.17752759 -83.58042  18.72327  7
Z           114.78571   50.53715  2.271314 0.05736626  -4.71565 234.28708  7
blockb      102.28571   50.49783  2.025547 0.08245280 -17.12268 221.69411  7
  • How are they different?

Why are they different?

  • How are they different? (The first one ignores the blocks entirely. The second adjusts for block membership with an indicator/dummy variable, a fixed effect, which implicitly weights the blocks differently from \(N_j/N\))

  • And we can estimate the total ATE by adjusting the weights according to the size of the blocks:

lm_lin(Y ~ Z, covariates = ~ block, data = dat)
            Estimate Std. Error  t value     Pr(>|t|)   CI Lower   CI Upper DF
(Intercept)     1.95   0.250000 7.800000 0.0002340912   1.338272   2.561728  6
Z             108.25  12.530862 8.638672 0.0001325490  77.588086 138.911914  6
blockb_c        4.25   0.559017 7.602631 0.0002696413   2.882135   5.617865  6
Z:blockb_c    228.75  30.599224 7.475680 0.0002957945 153.876397 303.623603  6
  • Which one should we use? Any thoughts?

Weighted average of block ATEs

  • The weighted average of the block ATEs is the best estimator
  • The weights are the proportion of units in each block, \(N_j/N\)
  • If the likelihood of being assigned to the treatment group differs by block, comparing the means across all subjects will lead to a biased estimate of the ATE
    • Unless the probability of assignment to the treatment group is identical for every block
  • In a nutshell: when estimating the ATE in a blocked experiment, just do it the old-fashioned way! 😅
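  • As a check, here is a minimal sketch that computes the weighted average by hand for the dat simulated earlier; it reproduces the 108.25 that lm_lin reported:
# Difference in means within each block
block_ates <- sapply(split(dat, dat$b), function(d) {
  mean(d$Y[d$Z == 1]) - mean(d$Y[d$Z == 0])
})
# Weight each block's estimate by its share of the sample, N_j/N
block_weights <- table(dat$b) / nrow(dat)
sum(block_ates * block_weights)
[1] 108.25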

Clustering

What is clustering?

  • So far, we have only allocated individual units to treatment and control groups
  • But in some cases, we might want to allocate whole groups of units to treatment and control conditions
  • Usually, we do this out of practical necessity:
    • It is impossible to randomly assign individuals to treatment and control groups (TV markets, for instance)
    • We cannot isolate individuals from each other (same household, same school, etc.)
    • We have strong priors about possible spillover effects
  • However, we still measure effects at the individual level (or any level smaller than the randomisation unit)
  • Let’s see how that impacts our analyses 😉

Source: TMG Research


Clustering

  • A common example is an education experiment in which the treatment is randomised at the classroom level
  • All students in a classroom are assigned to either treatment or control together, as it is impossible for students in the same classroom to be assigned to different conditions (different teachers, materials, etc.)
  • Assignments do not vary within the classroom
  • Clusters can be localities, like villages, precincts, or neighbourhoods
  • When clusters are all the same size, the standard difference-in-means estimator we usually employ is unbiased
  • However, caution is needed when clusters have different numbers of units or when there are very few clusters, as the treatment effects could be correlated with cluster size
  • When cluster size is related to potential outcomes, the usual difference-in-means estimator is often biased


Differences between blocking and clustering

| Aspect | Blocking | Clustering |
|---|---|---|
| Purpose | To reduce variance and increase precision | Practical necessity, not choice |
| Unit of randomisation | Individual units | Groups of units |
| Grouping | Based on pre-treatment characteristics | Based on natural or administrative groups |
| Analysis | Compare within blocks, then weight | Must account for within-cluster similarity |
| Example | Block by age, then randomise individuals within age groups | Randomise schools, but measure student outcomes |

Intra-cluster correlation

  • Typically, cluster randomised trials have higher variance than individually randomised trials
  • Why? Because individuals within the same cluster tend to be more similar to each other than to individuals in different clusters
  • How much higher the variance is depends on a statistic that can be hard to reason about: the intra-cluster correlation (ICC) of the outcome, \(\rho\)
  • Intra-cluster correlation (ICC) measures how similar individuals are within the same cluster compared to individuals in different clusters

Key Points:

  • Ranges from 0 to 1
  • Higher values indicate greater similarity within clusters
  • Formula: \(\rho = \frac{\sigma^2_{between}}{\sigma^2_{between} + \sigma^2_{within}}\)

Intuitive Explanation:

  • ICC = 0: Individuals within clusters are no more similar than individuals from different clusters (\(\sigma^2_{between} = 0\))
  • ICC = 1: All individuals within a cluster are identical to each other (\(\sigma^2_{within} = 0\))
  • Typical values: Usually between 0.01 and 0.2 in social science research

Example:

Imagine we’re studying student test scores in 10 schools with 30 students each:

  • Between-school variance (\(\sigma^2_{between}\)) = 25
  • Within-school variance (\(\sigma^2_{within}\)) = 75
  • ICC = \(\frac{25}{25 + 75} = 0.25\)

This means 25% of the total variance in test scores is due to differences between schools, and 75% is due to differences between students within the same school
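As a rough sketch (simulated data, balanced clusters, variance components chosen to match this example), the ICC can be estimated from a one-way ANOVA decomposition:

# Simulate 10 schools x 30 students with between-school variance 25
# and within-school variance 75, then recover the ICC
set.seed(385)
n_schools <- 10; n_students <- 30
school <- rep(1:n_schools, each = n_students)
score <- rnorm(n_schools, sd = 5)[school] + rnorm(n_schools * n_students, sd = sqrt(75))
ms <- summary(aov(score ~ factor(school)))[[1]]$"Mean Sq"
sigma2_between <- (ms[1] - ms[2]) / n_students  # ANOVA estimator of between variance
sigma2_within  <- ms[2]
sigma2_between / (sigma2_between + sigma2_within)  # roughly 0.25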

Information reduction

Clustering reduces the number of possible treatment assignments, which reduces statistical information

Example:

With 10 individuals (5 with black hair, 5 with other colours):

Individual Randomisation:

  • Each person can be independently assigned to treatment or control
  • Total possible assignments: \(\binom{10}{5} = 252\) different combinations

Cluster Randomisation:

  • All individuals in a cluster receive the same treatment

  • Only 2 possible assignments:

    1. Black hair cluster = treatment, Other colour cluster = control
    2. Black hair cluster = control, Other colour cluster = treatment
  • The more clustered our design, the fewer possible randomisation combinations we have, leading to less statistical information for estimating treatment effects

  • This is why clustered designs typically require larger sample sizes to achieve the same statistical power as individually randomised experiments

Source: Stata Guide

Information reduction

Design Effects in Clustered Designs

  • The design effect (DEFF) quantifies how much clustering increases the variance of our estimates compared to simple random sampling

\[DEFF = 1 + (m - 1) \times \rho\]

  • Where \(m\) = average cluster size and \(\rho\) = intra-cluster correlation (ICC)

  • DEFF = 1: Clustering has no effect on variance (\(\rho = 0\))

  • DEFF > 1: Clustering increases variance; need larger sample size

  • DEFF < 1: Clustering decreases variance (rare)

Example:

  • Average cluster size: 30 students per classroom
  • ICC: 0.1 (10% of total variance is between classrooms)
  • DEFF = 1 + (30 - 1) × 0.1 = 3.9

This means we need nearly 4 times as many observations as in a simple random sample to achieve the same statistical power!
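In code, using the numbers above and an assumed total of 600 students (e.g. 20 classrooms of 30):

# Design effect and effective sample size
m   <- 30        # average cluster size
rho <- 0.1       # intra-cluster correlation
deff <- 1 + (m - 1) * rho
deff             # 3.9
n_actual <- 600  # observations actually collected
n_actual / deff  # about 154 effectively independent observations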

Practical Recommendations for Clustered Designs

Before Conducting the Experiment:

  1. Estimate the ICC from previous studies or pilot data
  2. Calculate required sample size accounting for design effects
  3. Plan for adequate cluster numbers (typically need at least 15-20 clusters per arm)
  4. Consider blocking at higher levels to improve precision

During Analysis:

  1. Always use clustered standard errors at the level of randomisation
  2. Report both clustered and unclustered standard errors for comparison
  3. Check balance across clusters to ensure randomisation worked
  4. Consider heterogeneous treatment effects across clusters

Reporting Results:

  1. Clearly state the unit of randomisation
  2. Report the ICC and design effect
  3. Describe the clustering structure in your methodology
  4. Explain how standard errors were calculated

What to do about information reduction

Robust clustered standard errors

When dealing with clustered data, we need to adjust our standard errors to account for the within-cluster correlation

  • Classical standard errors assume independent observations

  • In clustered data, observations within clusters are correlated

  • This leads to underestimated standard errors and inflated Type I error rates

  • Solution: Robust Clustered Standard Errors

    • Adjust standard errors to account for clustering
    • Use the “sandwich” estimator to correct for within-cluster correlation
    • More conservative and accurate inference
    • Implemented in the estimatr package in R

R example

library(estimatr)

# Simulate clustered data
set.seed(123)
n_clusters <- 20
n_per_cluster <- 30
total_n <- n_clusters * n_per_cluster

# Create cluster IDs
cluster_id <- rep(1:n_clusters, each = n_per_cluster)

# Treatment assigned at cluster level
treatment <- rep(rbinom(n_clusters, 1, 0.5), each = n_per_cluster)

# Create outcome with cluster effects
cluster_effect <- rnorm(n_clusters, 0, 2)
individual_effect <- rnorm(total_n, 0, 1)
outcome <- 2 * treatment + rep(cluster_effect, each = n_per_cluster) + individual_effect

# Create data frame
data <- data.frame(
  cluster_id = factor(cluster_id),
  treatment = treatment,
  outcome = outcome
)

# Compare standard errors
model_naive <- lm_robust(outcome ~ treatment, data = data)
model_clustered <- lm_robust(outcome ~ treatment, clusters = cluster_id, data = data)

# Display results
summary(model_naive)

Call:
lm_robust(formula = outcome ~ treatment, data = data)

Standard error type:  HC2 

Coefficients:
            Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper  DF
(Intercept)   0.4872     0.1239   3.932 9.397e-05   0.2439   0.7305 598
treatment     0.7484     0.1737   4.308 1.927e-05   0.4072   1.0896 598

Multiple R-squared:  0.02962 ,  Adjusted R-squared:  0.02799 
F-statistic: 18.56 on 1 and 598 DF,  p-value: 1.927e-05
summary(model_clustered)

Call:
lm_robust(formula = outcome ~ treatment, data = data, clusters = cluster_id)

Standard error type:  CR2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper    DF
(Intercept)   0.4872     0.6400  0.7613   0.4683  -0.9886    1.963  8.00
treatment     0.7484     0.8941  0.8370   0.4140  -1.1361    2.633 17.22

Multiple R-squared:  0.02962 ,  Adjusted R-squared:  0.02799 
F-statistic: 0.7006 on 1 and 19 DF,  p-value: 0.413

Combining blocking and clustering

We can improve the efficiency of clustered designs by incorporating blocking at a higher level

Example:

  • Clusters: Classrooms (level where treatment is assigned)
  • Blocks: Schools (higher-level grouping)
  • Blocking strategy: Stratify randomisation by school characteristics

Benefits:

  • Reduces variance by ensuring similar schools are distributed across treatment arms
  • Improves precision of treatment effect estimates
  • Accounts for both clustering and heterogeneity between blocks

Implementation:

  1. Group clusters into blocks based on relevant characteristics
  2. Randomise treatment within each block
  3. Analyse using appropriate clustered standard errors
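A minimal sketch of this design with randomizr’s block_and_cluster_ra() (school and classroom sizes are illustrative):

library(randomizr)
# 4 schools (blocks), each with 2 classrooms (clusters) of 30 students
school    <- rep(c("S1", "S2", "S3", "S4"), each = 60)
classroom <- rep(1:8, each = 30)
# Randomise whole classrooms to treatment, separately within each school
Z <- block_and_cluster_ra(blocks = school, clusters = classroom)
table(school, Z) / 30  # one treated and one control classroom per school

The analysis could then use lm_robust(outcome ~ Z, clusters = classroom, fixed_effects = ~ school) to account for both the blocks and the clusters.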

Summary

  • Blocking is a technique to reduce variance and increase precision by grouping similar units before randomisation
  • It is especially useful in small samples or when important covariates are known
  • We run a series of mini-experiments within blocks, then combine results
  • Results can be estimated using weighted averages of block-specific ATEs
  • Clustering is a practical necessity when randomising groups of units together
  • It requires adjusting standard errors to account for within-cluster correlation
  • The intra-cluster correlation (ICC) quantifies similarity within clusters and impacts variance
  • Design effects from clustering often require larger sample sizes
  • Always use robust clustered standard errors when analysing clustered data
  • Combining blocking and clustering can further improve efficiency in experimental designs

Next lecture

  • We will see more on blocking and clustering in R with the estimatr package
  • Estimate models with robust clustered standard errors and see how they differ from regular standard errors
  • More on covariate adjustment
  • Learn about required sample sizes and power analysis
  • And more! 😂

Thanks very much! 😊

See you next time! 👋