QTM 385 - Experimental Methods

Lecture 16 - Interference

Danilo Freire

Emory University

How are you doing? 😊

Brief recap 📚

Brief recap 📚

  • Natural experiments framework
    • True vs “as-if” randomness in treatment assignment
    • Core assumption: exogeneity of assignment mechanism
    • Examples: Lottery-based charter school studies, border discontinuity designs
  • Quasi-experimental approaches
    • Regression discontinuity (RDD): Leveraging threshold-based assignment
    • Difference-in-differences (DID): Relying on the parallel trends assumption
  • Methodological challenges
    • Selection bias in observational data
    • SUTVA violations from treatment spillovers
    • Power limitations in natural variation contexts
  • Empirical examples
    • Angrist et al. (2013): School lottery IV analysis
    • Card & Krueger (1994): Minimum wage DID study
    • Mignozzetti et al. (2024): RDD in legislative analysis
  • Validation strategies
    • Placebo tests for assumption verification
    • Pre-treatment trend analysis for DID
    • Robustness checks for sensitivity assessments
  • Ethical considerations
    • Responsible communication of limitations
    • Secondary data ethics compliance
    • Policy impact assessments for natural experiments

Today’s plan 📅

Interference and spillovers

  • SUTVA violations happen quite often
  • So it is better to model interference explicitly
  • We will see how to deal with spillovers using multi-level designs, spatial spillovers, within-subject designs, repeated-measures experiments, and waitlist designs
  • We will also discuss the assumptions behind these designs and how to estimate treatment effects in these cases
  • Let’s get started! 😎

Interference

Interference

When treatment effects spill over

  • Remember SUTVA?
    • Stable unit treatment value assumption
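  • The no-interference part of SUTVA means that each unit's potential outcomes depend only on its own treatment, not on the treatments assigned to other units
  • When this fails, a simple comparison of treated and control units no longer recovers the causal effect we care about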
  • This happens quite often in social and public health interventions:
    • Peer effects in education
    • Contagion in public health
    • Spillovers in policy evaluations
    • Network effects in marketing and technology

A motivating example

  • Imagine you are trying to improve school grades
  • You think that awarding a prize to students will help motivate them
  • You randomly assign students to treatment and control groups
  • You find a positive effect of the program on test scores
  • But you also find that the effect varies depending on who receives the treatment
Student | Grades if Alice wins prize | Grades if Bob wins prize | Grades if Charlie wins prize | Grades if no prize awarded
Alice | 10 | 5 | 7 | 7
Bob | 5 | 5 | 5 | 5
Charlie | 9 | 5 | 9 | 9

Example

  • You find that the results vary depending on who wins the prize
  • Let's calculate the ATE in this case
  • If we simply calculate the ATE using the last column as \(Y_{i}(0)\) and the column corresponding to \(i\)’s award as \(Y_{i}(1)\), we have: \(\frac{((10 - 7) + (5 - 5) + (9 - 9))}{3} = 1\)
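  • However, only one student can win, so under randomisation the untreated students never reveal their "no prize" grades: they reveal their grades under someone else's award
  • The difference-in-means estimate for each possible assignment is:
  • If Alice wins the prize: \(10 - \frac{5+9}{2} = 3\)
  • If Bob wins the prize: \(5 - \frac{5+5}{2} = 0\)
  • If Charlie wins the prize: \(9 - \frac{7+5}{2} = 3\)
  • The average of these three estimates is \(2\), not \(1\): because of spillovers, the difference-in-means estimator is biased for the ATE defined above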
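A minimal Python sketch that reproduces both calculations from the grades table: the "naive" ATE that compares each student's own award with the no-prize column, and the difference-in-means estimate under each of the three equally likely assignments. The dictionary layout and the names used here are illustrative choices, not part of the original example.

```python
# Potential outcomes: outcomes[student][winner], where winner is the student
# who receives the prize, or None when no prize is awarded.
outcomes = {
    "Alice":   {"Alice": 10, "Bob": 5, "Charlie": 7, None: 7},
    "Bob":     {"Alice": 5,  "Bob": 5, "Charlie": 5, None: 5},
    "Charlie": {"Alice": 9,  "Bob": 5, "Charlie": 9, None: 9},
}
students = list(outcomes)

# "Naive" ATE: each student's grade under their own award vs. no prize at all.
naive_ate = sum(outcomes[s][s] - outcomes[s][None] for s in students) / len(students)

def diff_in_means(winner):
    """Difference-in-means when exactly one student (the winner) is treated."""
    treated = outcomes[winner][winner]
    controls = [outcomes[s][winner] for s in students if s != winner]
    return treated - sum(controls) / len(controls)

# Each student is equally likely to win, so the estimator's expectation
# is the simple average over the three possible assignments.
estimates = {w: diff_in_means(w) for w in students}
expected_estimate = sum(estimates.values()) / len(estimates)

print(naive_ate)          # 1.0
print(estimates)          # {'Alice': 3.0, 'Bob': 0.0, 'Charlie': 3.0}
print(expected_estimate)  # 2.0
```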

Not all spillovers are bad!

  • As we have seen, many social phenomena are interdependent
  • And some interventions and phenomena depend on spillovers by design
    • Contamination: The effect of being vaccinated on one’s probability of contracting a disease depends on whether others are vaccinated
    • Network effects: The value of a product or service increases as more people use it (social media, telecommunication)
    • Hot-spots policing: Increased police presence in one area can also reduce crime in nearby areas
    • Deterrence: A harsher punishment for one individual can deter others from committing crimes

How to deal with interference? (good and bad!)

How to deal with interference?

  • Interference can be a nuisance, but also an opportunity
  • So it is a good idea to model it explicitly in our analysis
  • In this case, we no longer need to assume "no spillovers", only "no unmodelled spillovers"
  • Good design can help us leverage spillovers to our advantage
  • There are several methods to do this:
    • Clustered randomisation (this one you already know!)
    • Multi-level designs (e.g., schools within districts)
    • Within-subject designs (e.g., repeated measures)
    • Waitlist designs (e.g., staggered rollouts)
  • Let’s see them in more detail 😎

Multi-level designs

  • Multi-level designs are nested experiments
  • This means that we have multiple levels of randomisation
  • For example, in a school-based intervention:
    • Schools are randomised to treatment or control
    • Within schools, students are also randomised
  • Political interventions can also be nested:
    • We randomise the treatment at the district level
    • Then we randomise the treatment again at the voter level
  • But we need to expand our potential outcomes notation to account for this…

An example

Get-out-the-vote campaign

  • Imagine we are running an experiment to test the effect of a get-out-the-vote (GOTV) mail campaign on voter turnout
  • We have a multi-level design:
    • Households are randomised to treatment or control
    • Within households, individuals are also randomised
  • Focus on two-voter households where residents share addresses
  • Random assignment groups:
    • 5,000 households: both voters targeted
    • 5,000 households: neither targeted
    • 10,000 households: one randomly targeted
  • This creates four groups of 10,000 voters each:
    • Mail received by both
    • Neither receives mail
    • Mail with untreated housemate
    • Untreated with treated housemate
  • Multi-level design features:
    • Two-stage randomisation (household then individual)
    • Expandable to additional levels like postcode
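The two-stage assignment can be sketched in Python as follows. This is a hypothetical helper, assuming the household shares above (one quarter both treated, one quarter neither, one half with one resident drawn at random) and a household count divisible by four; a real study would also track voter IDs and check covariate balance.

```python
import random
from collections import Counter

def two_stage_assignment(n_households, seed=0):
    """Hypothetical helper: 1/4 of households have both residents treated,
    1/4 neither, and 1/2 have one randomly chosen resident treated."""
    rng = random.Random(seed)
    households = list(range(n_households))
    rng.shuffle(households)                      # stage 1: household conditions
    q = n_households // 4
    both, neither, one = households[:q], households[q:2 * q], households[2 * q:]
    rows = []  # (household, person, housemate_treated, own_treated)
    for h in both:
        rows += [(h, 0, 1, 1), (h, 1, 1, 1)]
    for h in neither:
        rows += [(h, 0, 0, 0), (h, 1, 0, 0)]
    for h in one:
        target = rng.randrange(2)                # stage 2: which resident gets mail
        for p in (0, 1):
            own = int(p == target)
            rows.append((h, p, 1 - own, own))
    return rows

rows = two_stage_assignment(20_000)
# Key (a, b) follows the Y_ab notation: a = housemate, b = self.
counts = Counter((mate, own) for _, _, mate, own in rows)
print(sorted(counts.items()))
# [((0, 0), 10000), ((0, 1), 10000), ((1, 0), 10000), ((1, 1), 10000)]
```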

An example

Get-out-the-vote campaign

  • Potential outcomes depend on both own and housemate’s treatment status
  • We revise the notation so that \(Y_{ab}\) reflects both treatments, where:
    • a = housemate’s treatment status (0=control, 1=treated)
    • b = own treatment status (0=control, 1=treated)
Notation | Housemate | Self | Interpretation
\(Y_{00}\) | Control | Control | Neither treated (baseline outcome)
\(Y_{01}\) | Control | Treated | Own treatment only
\(Y_{10}\) | Treated | Control | Housemate's treatment only (exposed to spillover)
\(Y_{11}\) | Treated | Treated | Both treated
  • Key assumption: Strict household containment
    • Outcomes only affected by within-household interventions
    • No interference from external treatment assignments

Defining causal effects

  • From the four potential outcomes, key estimands emerge:
    • \(Y_{01} - Y_{00}\): Direct effect of treatment when housemate is untreated
    • \(Y_{11} - Y_{10}\): Treatment effect when housemate receives mail
    • \(Y_{10} - Y_{00}\): Spillover effect on an untreated voter whose housemate is treated
    • \(Y_{11} - Y_{01}\): Spillover effect on a treated voter whose housemate is also treated
  • Critical considerations:
    • Effects depend on housemate’s treatment status
    • Estimands capture combined influences of:
      • Direct intervention impacts
      • Communication spillovers
      • Shared resource effects
    • These estimands do not isolate the specific mechanisms behind spillovers
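Because every voter in this design has the same 1/4 chance of ending up in each of the four conditions, each estimand can be estimated by a simple difference in group means. The sketch below assumes a hypothetical list of (housemate treated, own treated, voted) records; the toy data are made up purely to show the mechanics.

```python
from statistics import mean

def household_estimands(data):
    """data: iterable of (housemate_treated, own_treated, outcome) tuples."""
    groups = {}
    for a, b, y in data:
        groups.setdefault((a, b), []).append(y)
    m = {key: mean(ys) for key, ys in groups.items()}  # m[(a, b)] estimates E[Y_ab]
    return {
        "Y01 - Y00 (direct, housemate untreated)": m[(0, 1)] - m[(0, 0)],
        "Y11 - Y10 (direct, housemate treated)": m[(1, 1)] - m[(1, 0)],
        "Y10 - Y00 (spillover on untreated)": m[(1, 0)] - m[(0, 0)],
        "Y11 - Y01 (spillover on treated)": m[(1, 1)] - m[(0, 1)],
    }

# Hypothetical turnout records (1 = voted), four voters per condition
toy = (
    [(0, 0, y) for y in (0, 0, 0, 1)]
    + [(0, 1, y) for y in (1, 0, 1, 0)]
    + [(1, 0, y) for y in (0, 1, 0, 0)]
    + [(1, 1, y) for y in (1, 1, 0, 1)]
)
for name, value in household_estimands(toy).items():
    print(f"{name}: {value:.2f}")
```

With unequal assignment probabilities, the group means would instead need inverse-probability weights, as in the spatial example.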

Implementation challenges

Practical considerations

  • Common implementation hurdles:
    • Treatment contamination between groups
    • Differential attrition across conditions
    • Compliance monitoring complexities
    • Resource allocation for multi-level tracking
  • Ethical considerations:
    • Informed consent for network members
    • Privacy protections for household data
    • Equity implications of spillover effects
  • Design limitations to address:
    • Sample size requirements for cluster effects
    • Measurement challenges for indirect impacts
    • Temporal aspects of spillover timing
    • Correct estimation of standard errors, as outcomes within the same household or cluster are likely to be correlated
Note: Pre-registration is crucial for complex designs

Spatial spillovers

Spatial spillovers

  • So far, we have weakened the SUTVA assumption by considering interference in a small setting (households)
  • Sometimes, spillovers are not confined to specific units, but spread across space
  • This is particularly common in urban settings:
    • Crime in adjacent neighbourhoods
    • Pollution in nearby areas
    • Economic development in neighbouring regions
    • Health outcomes in surrounding communities
  • While it seems tempting to just include a measure of proximity in our models, it is not that simple…
    • It is hard to specify precisely how spillover effects change with distance
    • Not valid when spillovers are not spatially confined (e.g., pollution)
    • Physical proximity does not always make sense (e.g., social networks, phones)

Motivating example

  • Imagine you want to improve healthcare in five villages
  • Assume also that the layout of the villages is known: locations A, B, C, D, E and F lie along a line, and spillovers reach only adjacent locations
  • Villages occupy locations A, B, C, D and F; location E is unoccupied
  • Only one village will receive the treatment, a new healthcare facility
  • There are three types of potential outcomes for a given village X:
    • \(Y_{01}\): Healthcare level of village X if it receives the treatment
    • \(Y_{10}\): Healthcare level of village X if an adjacent village receives the treatment
    • \(Y_{00}\): Healthcare level of village X if neither village X nor any adjacent village receives the treatment

Potential outcomes

Village | Untreated (\(Y_{00}\)) | Adjacent village treated (\(Y_{10}\)) | Treated (\(Y_{01}\))
A | 0 | 2 | 0
B | 6 | 2 | 10
C | 0 | 4 | 4
D | 6 | 6 | 6
F | 6 | NA | 3
  • Notice that some villages can never express certain potential outcomes
  • Location E is unoccupied, so it is not included in the analysis
  • NA indicates that village F can never be adjacent to a treated village (its only neighbouring location, E, is unoccupied), so F can never manifest a \(Y_{10}\) outcome
  • As this potential outcome can never be observed, we exclude F from the definition of the average spillover effect \(E[Y_{10} - Y_{00}]\)

Potential outcomes

  • A second complication is that the probability of assignment to each condition varies from one village to the next
  • For instance, village A has a 0.20 probability of being exposed to spillovers from an adjacent treated location, whereas village B has a 0.40 probability
  • As we have seen previously, we should weight each observation by the inverse probability of the condition it was assigned to, excluding villages that have zero probability of a condition from estimands involving that condition (e.g., F for spillovers)
Village | If A treated | If B treated | If C treated | If D treated | If F treated | Pr(assignment to control) | Pr(assignment to spillover) | Pr(assignment to treatment)
1 (A) | \(Y_{01}\) | \(Y_{10}\) | \(Y_{00}\) | \(Y_{00}\) | \(Y_{00}\) | 0.6 | 0.2 | 0.2
2 (B) | \(Y_{10}\) | \(Y_{01}\) | \(Y_{10}\) | \(Y_{00}\) | \(Y_{00}\) | 0.4 | 0.4 | 0.2
3 (C) | \(Y_{00}\) | \(Y_{10}\) | \(Y_{01}\) | \(Y_{10}\) | \(Y_{00}\) | 0.4 | 0.4 | 0.2
4 (D) | \(Y_{00}\) | \(Y_{00}\) | \(Y_{10}\) | \(Y_{01}\) | \(Y_{00}\) | 0.6 | 0.2 | 0.2
5 (F) | \(Y_{00}\) | \(Y_{00}\) | \(Y_{00}\) | \(Y_{00}\) | \(Y_{01}\) | 0.8 | 0 | 0.2
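The exposure mapping and probabilities in the table can be derived by enumerating the five equally likely assignments. This sketch assumes, as the table implies, that the six locations lie on a line and that spillovers reach only immediately adjacent locations; the function names are hypothetical.

```python
from fractions import Fraction

# Assumed layout: six locations on a line; E is unoccupied and never treated.
LINE = ["A", "B", "C", "D", "E", "F"]
VILLAGES = ["A", "B", "C", "D", "F"]

def exposure(village, treated):
    """Potential outcome a village expresses when `treated` gets the facility."""
    if village == treated:
        return "Y01"
    distance = abs(LINE.index(village) - LINE.index(treated))
    return "Y10" if distance == 1 else "Y00"   # spillovers reach neighbours only

def assignment_probabilities(village):
    """Probability of each condition when one occupied village is treated at random."""
    probs = {"Y00": Fraction(0), "Y10": Fraction(0), "Y01": Fraction(0)}
    for treated in VILLAGES:
        probs[exposure(village, treated)] += Fraction(1, len(VILLAGES))
    return probs

for v in VILLAGES:
    row = [exposure(v, t) for t in VILLAGES]
    p = assignment_probabilities(v)
    print(v, row, [float(p[c]) for c in ("Y00", "Y10", "Y01")])
```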

Calculating \(Y_{01} - Y_{00}\) and \(Y_{10} - Y_{00}\)

Average and spillover effects

  • Now we can calculate the average treatment effect in the presence of spillovers
  • We need to weight each observation by the inverse probability of the condition it expresses
  • Imagine that Village D is treated
  • Villages A, B and F express \(Y_{00}\), as they are neither treated nor adjacent to D
  • Village C, in contrast, is not included in this comparison: as D's neighbour, it expresses \(Y_{10}\) (spillover) rather than \(Y_{00}\) (untreated)
  • The weighted difference-in-means estimator of \(E[Y_{01} - Y_{00}]\) adjusts for the fact that the probabilities of being untreated are 0.60, 0.40, and 0.80 for Villages 1 (A), 2 (B), and 5 (F), respectively
  • So the effect is:

\[ \hat{E}[Y_{01} - Y_{00}] = \frac{\frac{6}{0.2}}{\frac{1}{0.2}} - \frac{\frac{0}{0.6} + \frac{6}{0.4} + \frac{6}{0.8}}{\frac{1}{0.6} + \frac{1}{0.4} + \frac{1}{0.8}} = 1.85. \]

  • If we wanted to estimate the spillover effect, we would calculate \(E[Y_{10} - Y_{00}]\) instead
  • Village D is treated, so we would exclude it from the calculation
  • Village F is also excluded as it cannot be adjacent to a treated village

\[ \hat{E}[Y_{10} - Y_{00}] = \frac{\frac{4}{0.4}}{\frac{1}{0.4}} - \frac{\frac{0}{0.6} + \frac{6}{0.4}}{\frac{1}{0.6} + \frac{1}{0.4}} = 0.40. \]
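Both weighted estimates can be checked in a few lines. This sketch implements the ratio (weighted-mean) form of the inverse-probability-weighted difference-in-means used in the two equations, for the single realised assignment in which village D is treated; the data structures simply transcribe the tables.

```python
# Potential outcomes by village (F has no Y10: it can never be exposed).
PO = {
    "A": {"Y00": 0, "Y10": 2, "Y01": 0},
    "B": {"Y00": 6, "Y10": 2, "Y01": 10},
    "C": {"Y00": 0, "Y10": 4, "Y01": 4},
    "D": {"Y00": 6, "Y10": 6, "Y01": 6},
    "F": {"Y00": 6, "Y10": None, "Y01": 3},
}
# Probability of each condition for each village.
PR = {
    "A": {"Y00": 0.6, "Y10": 0.2, "Y01": 0.2},
    "B": {"Y00": 0.4, "Y10": 0.4, "Y01": 0.2},
    "C": {"Y00": 0.4, "Y10": 0.4, "Y01": 0.2},
    "D": {"Y00": 0.6, "Y10": 0.2, "Y01": 0.2},
    "F": {"Y00": 0.8, "Y10": 0.0, "Y01": 0.2},
}
# Condition each village expresses in the realised assignment (D treated).
OBSERVED = {"A": "Y00", "B": "Y00", "C": "Y10", "D": "Y01", "F": "Y00"}

def weighted_mean(condition, eligible):
    """Inverse-probability-weighted mean of eligible units observed in `condition`."""
    units = [v for v in eligible if OBSERVED[v] == condition]
    num = sum(PO[v][condition] / PR[v][condition] for v in units)
    den = sum(1 / PR[v][condition] for v in units)
    return num / den

everyone = list(PO)
direct = weighted_mean("Y01", everyone) - weighted_mean("Y00", everyone)

# Spillover estimand: drop villages that can never be exposed (F).
exposable = [v for v in everyone if PR[v]["Y10"] > 0]
spillover = weighted_mean("Y10", exposable) - weighted_mean("Y00", exposable)
print(f"{direct:.2f} {spillover:.2f}")  # 1.85 0.40
```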

Within-subject designs and repeated-measures experiments

Within-subject designs

  • Within-subject designs measure the same subjects repeatedly over time, with random assignment determining when each subject is treated
  • The good thing about this design is that it controls for individual differences that could confound the results
  • This type of experiment is rather uncommon in the social sciences, but it is quite common in psychology
  • Instead, social scientists usually rely on interrupted time-series designs to study the effects of interventions
  • This is similar to a regression discontinuity in time
  • Let’s see how they work… (and why they are a bit problematic)
  • The problem with interrupted time-series designs is that it is hard to establish causality
  • Confounding factors can affect the results, as something else may change at the same time as the intervention

Within-subject designs

Assumptions

  • The key assumption in within-subject designs is that the intervention is the only thing that changes
  • In practice, this requires two additional assumptions:
    • No-anticipation: Subjects do not know when the intervention will occur, so they cannot change their behaviour in anticipation
    • For example, if we are studying the effect of a new policy on crime rates, we need to make sure that criminals do not know when the policy will be implemented
    • No persistence: Potential outcomes in one period are unaffected by treatments administered in prior periods
    • This is a bit tricky, as we are assuming that the effect of the treatment disappears before the next period
    • Experimenters often include “washout periods” between experimental sessions so as to allow the previous period’s effects to dissipate

Variations of repeated-measures experiments

  • Clifford et al. (2021) evaluate how different "pre-post" designs can improve experimental estimates, mainly by reducing standard errors
  • Regular experiments (“post only”) have relatively low precision because the outcome is measured just once
  • This is particularly problematic when the outcome is noisy or if the experiment involves multiple treatment arms, moderators, or small treatment effects
  • Pre-post designs measure the outcome prior to the experimental manipulation at point \(t_0\) and after the manipulation at point \(t_1\) (as we’ve just seen)
  • A pre-post design is also a between-subjects design when some respondents are never exposed to the treatment; respondents' difference scores are then compared between groups
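To see why pre-post designs can shrink standard errors, here is a small simulation under an assumed data-generating process: each respondent has a stable trait that appears in both measurements, plus independent noise, and the true effect is 0.5. It is a sketch for intuition only, not a reanalysis of Clifford et al. (2021).

```python
import random
from statistics import mean, variance

def se_diff(treated, control):
    """Conventional standard error of a difference in means."""
    return (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5

rng = random.Random(385)
n = 2000
assignment = [1] * (n // 2) + [0] * (n // 2)
rng.shuffle(assignment)                           # complete random assignment

post = {0: [], 1: []}     # outcome measured at t1 only ("post only")
change = {0: [], 1: []}   # difference score t1 - t0 ("pre-post")
for z in assignment:
    trait = rng.gauss(0, 2)                       # assumed stable individual component
    y_t0 = trait + rng.gauss(0, 1)                # pre-treatment measurement
    y_t1 = trait + 0.5 * z + rng.gauss(0, 1)      # post-treatment measurement, true effect 0.5
    post[z].append(y_t1)
    change[z].append(y_t1 - y_t0)

est_post, se_post = mean(post[1]) - mean(post[0]), se_diff(post[1], post[0])
est_change, se_change = mean(change[1]) - mean(change[0]), se_diff(change[1], change[0])
print(f"post only: {est_post:.2f} (SE {se_post:.3f})")
print(f"pre-post:  {est_change:.2f} (SE {se_change:.3f})")
```

Difference scores remove the stable trait, so the gain depends on how strongly the pre- and post-treatment measures are correlated; with a weakly correlated pretest, difference scores can be noisier than the post-only outcome.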

Variations of time-series experiments

  • Some scholars worry that measuring the outcome prior to the experiment could alter estimated treatment effects
  • This is true in some cases: the no-persistence assumption is likely violated when the treatment provides information or training, for example
  • The quasi-pretest-posttest design is a variation of the pre-post design that tries to avoid this problem by providing a similar, but not identical, treatment in the pretest phase
  • The authors test these designs by replicating six studies to see how they perform in practice
  • If different designs yield largely the same results, researchers can be more confident in the external validity of these findings and choose designs based on other features, like precision

Variations of time-series experiments

Replication of six studies

Variations of time-series experiments

Standard errors

Waitlist designs

Waitlist designs

  • The final type of design we will discuss today is the waitlist design, also known as the stepped-wedge design
  • Its scientific value comes from its ability to track how treatment effects play out over time across several subjects
  • Waitlists play a diplomatic role because they overcome the problem of withholding treatment from a control group (remember our ethics discussion?)
  • In this design, every subject is treated eventually; random assignment determines when they receive treatment
  • It is a hybrid between a within-subject and a between-subject design, and it is widely used in public health interventions (e.g., vaccine rollouts)

Waitlist designs

Source: Hemming et al. (2014)

Example: TV advertising and candidate support

  • Imagine you are a political consultant and you want to test the effect of TV advertising on candidate support
  • Ads are aired over three weeks, and outcomes (support for the gubernatorial candidate, as gauged by opinion polls) are assessed at the end of each week
  • Eight media markets are randomly assigned to one of four conditions:
    • Two media markets are randomly assigned to air ads for three weeks starting in week 1
    • Two markets air ads for two weeks starting in week 2
    • Two markets air ads for one week starting in week 3
    • And two markets air no ads at all
  • There are just three relevant potential outcomes:
    • \(Y_{00}\): untreated during preceding and current periods
    • \(Y_{01}\): untreated during the preceding period but treated during the current period
    • \(Y_{11}\): treated in both the preceding and current periods
    • Given the design, we never observe the potential outcome \(Y_{10}\) because media markets never cease to run ads once they start

Advertising waitlist experiment’s random assignments and observed outcomes

Assigned treatment

Market | Week 1 | Week 2 | Week 3
1 | 01 | 11 | 11
2 | 00 | 00 | 01
3 | 00 | 01 | 11
4 | 00 | 00 | 01
5 | 00 | 00 | 00
6 | 01 | 11 | 11
7 | 00 | 00 | 00
8 | 00 | 01 | 11

Observed outcomes

Market | Week 1 | Week 2 | Week 3
1 | 7 | 9 | 4
2 | 7 | 5 | 7
3 | 1 | 2 | 10
4 | 4 | 3 | 10
5 | 3 | 3 | 3
6 | 10 | 8 | 10
7 | 2 | 3 | 4
8 | 3 | 1 | 3

Probabilities of assignment to treatment condition

Treatment condition | Week 1 | Week 2 | Week 3
Pr(00) | 0.75 | 0.50 | 0.25
Pr(01) | 0.25 | 0.25 | 0.25
Pr(11) | 0 | 0.25 | 0.50
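The probability table follows from the design, since each market is equally likely to receive any of the four start weeks. The sketch below confirms this by enumerating every distinct allocation of start weeks to the eight markets (assuming complete random assignment of exactly two markets per condition) and reading off one market's condition in each week; by symmetry, every market has the same probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Start week for the eight markets: two each start in week 1, 2, 3, or never.
starts = [1, 1, 2, 2, 3, 3, None, None]

def condition(start, week):
    """Label (previous week, current week) treatment status, e.g. '01'."""
    def airing(w):
        return start is not None and w >= start
    return f"{int(airing(week - 1))}{int(airing(week))}"

# Every distinct allocation of start weeks is equally likely.
assignments = set(permutations(starts))          # 8! / (2!)^4 = 2,520

table = {}
for week in (1, 2, 3):
    counts = Counter(condition(a[0], week) for a in assignments)  # first market
    table[week] = {c: Fraction(counts[c], len(assignments)) for c in ("00", "01", "11")}
    print(week, {c: float(p) for c, p in table[week].items()})
```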

Estimating the immediate effect of TV advertising

  • The immediate treatment effect is \(Y_{01} - Y_{00}\), that is, the effect of being treated in the current period but not in the preceding period
  • We just need to take the numbers from the tables and apply inverse probability weighting again

\[ \begin{aligned} \widehat{E}[Y_{01} - Y_{00}] &= \frac{\frac{7 + 10}{0.25} + \frac{2 + 1}{0.25} + \frac{7 + 10}{0.25}}{\frac{2}{0.25} + \frac{2}{0.25} + \frac{2}{0.25}} \\ &- \frac{\frac{7 + 1 + 4 + 3 + 2 + 3}{0.75} + \frac{5 + 3 + 3 + 3}{0.50} + \frac{3 + 4}{0.25}}{\frac{6}{0.75} + \frac{4}{0.50} + \frac{2}{0.25}} = 2.72. \end{aligned} \]

Estimating the cumulative effect of TV advertising

  • Finally, we will estimate the cumulative effect of TV advertising, which is \(Y_{11} - Y_{00}\), that is, the effect of being treated in both the preceding and current periods
  • We do the same thing again, but now we consider the \(Y_{11}\) potential outcomes
  • However, we must restrict our attention to the second and third weeks, because this type of treatment cannot occur in the first week

\[ \begin{aligned} \widehat{E}[Y_{11} - Y_{00}] &= \frac{\frac{9 + 8}{0.25} + \frac{4 + 10 + 10 + 3}{0.50}}{\frac{2}{0.25} + \frac{4}{0.50}} \\ &- \frac{\frac{5 + 3 + 3 + 3}{0.50} + \frac{3 + 4}{0.25}}{\frac{4}{0.50} + \frac{2}{0.25}} = 4.13. \end{aligned} \]
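Both estimates can be reproduced from the assignment, outcome, and probability tables. This sketch applies the same weighted difference-in-means, weighting each market-week observation by the inverse probability of its condition in that week; the dictionaries simply transcribe the tables, and the helper name is hypothetical.

```python
# Assigned condition and observed outcome for each market in weeks 1-3.
assigned = {
    1: ["01", "11", "11"], 2: ["00", "00", "01"], 3: ["00", "01", "11"],
    4: ["00", "00", "01"], 5: ["00", "00", "00"], 6: ["01", "11", "11"],
    7: ["00", "00", "00"], 8: ["00", "01", "11"],
}
observed = {
    1: [7, 9, 4], 2: [7, 5, 7], 3: [1, 2, 10], 4: [4, 3, 10],
    5: [3, 3, 3], 6: [10, 8, 10], 7: [2, 3, 4], 8: [3, 1, 3],
}
# Pr(condition) by week (list index 0 = week 1).
prob = {
    "00": [0.75, 0.50, 0.25],
    "01": [0.25, 0.25, 0.25],
    "11": [0.00, 0.25, 0.50],
}

def ipw_mean(cond, weeks):
    """Inverse-probability-weighted mean of observations in `cond` over `weeks`."""
    num = den = 0.0
    for m in assigned:
        for w in weeks:
            if assigned[m][w] == cond:
                num += observed[m][w] / prob[cond][w]
                den += 1 / prob[cond][w]
    return num / den

immediate = ipw_mean("01", range(3)) - ipw_mean("00", range(3))
# Y11 is impossible in week 1, so the cumulative effect uses weeks 2 and 3 only.
cumulative = ipw_mean("11", range(1, 3)) - ipw_mean("00", range(1, 3))
print(f"{immediate:.3f} {cumulative:.3f}")  # 2.722 4.125
```

Unrounded, the estimates are 49/18 (about 2.722) and exactly 4.125, which are rounded to 2.72 and 4.13 above.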

Conclusion

  • Spillovers are a common problem in social science research
  • They can be positive or negative, and they can be modelled explicitly in our analysis
  • There are several methods to deal with spillovers, such as multi-level designs, spatial designs, within-subject designs, repeated-measures experiments, and waitlist designs
  • Each design has its advantages and disadvantages, and the choice of design should be based on the research question and the context
  • You can use DeclareDesign to simulate spillover designs and pretest posttest designs
  • The statistical analysis of waitlist designs is a little tricky, but you can use the swCRTdesign package in R to help you with that

And that’s all for today! 🎉

See you next time! 😉