QTM 385 - Experimental Methods

Lecture 09 - One-Sided Non-Compliance

Danilo Freire

Emory University

Hi, there!
Hope all is well! 😉

Group work 👥

Group work

This week’s task

  • Please send me an email by Wednesday with the following content:
    • Two paragraphs (maximum) summarising an experiment that you wish to develop in this course. At a minimum, your summary should include a research question, why the question is important, and a rough outline of how you plan to answer the question.
  • We’ll be working on a little bit of the task each week during class, and I’ll be posting the week’s assignments on the website. It will be fun! 😊

Brief recap 📚

Last class

  • Clustering:
    • Assigns whole groups (e.g. classrooms) to treatment due to practical constraints
    • Introduces intra-cluster correlation (ICC) which increases variance
    • Requires cluster-robust standard errors and careful power calculations
    • The effective sample size is always smaller than the actual sample size, sometimes substantially so!
  • A few suggestions on how to deal with clustering:
    • Increase the number of clusters
    • Increase the number of units per cluster
    • Use pair-matching (or any type of blocking) to improve precision
  • Statistical power:
    • Power = probability of detecting a true effect (aim for ≥80%)
    • Influenced by: effect size magnitude, outcome variability, sample size, and significance level
    • The DeclareDesign package enables power simulation through model declaration, treatment effect estimation, and design diagnosis across sample sizes
    • You can use any power calculator to estimate power, but DeclareDesign accommodates any type of design, which can be difficult to handle with traditional formulas
    • Power curves show how power changes with sample size

Today’s plan 📋

One-sided non-compliance

  • One-sided non-compliance
  • Compliers versus never-takers
  • Intent-to-treat (ITT) effect versus average treatment effect (ATE)
  • Complier average causal effect (CACE) is the effect of the treatment on the compliers
  • Instrumental variables (IV) can be used to estimate the CACE
  • Two-stage least squares (2SLS) is the most common IV method
  • Placebo designs can help test the IV assumptions

Non-compliance 🤔

Non-compliance: A big problem!

  • In experimental research, compliance is the extent to which participants follow the treatment assignment
  • Under full compliance, all participants follow the treatment assignment (and that’s what we want!)
  • Non-compliance occurs when participants do not follow the treatment assignment
  • In everyday language, compliance and non-compliance have a negative connotation, but in research, they are neutral terms
  • Non-compliance is a problem because it undermines the internal validity of the study
  • Today we will examine one-sided non-compliance, which is when units in the treatment group do not receive the treatment
    • Those in the control group are not affected by this issue
  • Next class, we will discuss two-sided non-compliance, which is when some people in the treatment group do not receive the treatment and some people in the control group do receive the treatment
    • This complicates analysis quite a bit, but we have methods to deal with it 🤓

Non-compliance visualised

Source: Spotify R&D Engineering (2023)

A motivating example

Canvassing to increase voter turnout

  • Imagine that you are interested in studying the effect of canvassing on voter turnout
    • Maybe if you knock on people’s doors and talk to them about the importance of voting, they will be more likely to vote!
  • You design an experiment where you randomly assign 1000 people to receive canvassing (treatment group) and 1000 people to not receive canvassing (control group)
  • However… usually only 25% of the people in the treatment group are actually canvassed
    • The rest are not home, refuse to talk, etc.
  • So we have 250 people treated and 1000 in the control group
  • What would you do? 🤔

Option 01: Just compare the two groups

As-treated analysis

  • The first option we have is to just compare the two groups as if nothing had happened
  • So we would compare the 1000 people who were in the treatment group with the 1000 people who were in the control group
  • Then calculate the average treatment effect (ATE) as the difference between the two groups as we always do
  • What do you think? 🤔

Option 01: Just compare the two groups

As-treated analysis

  • The problem with this approach is that it undermines the internal validity of the study
  • The random assignment still holds, but the comparison no longer isolates the effect of the treatment because most of the treatment group did not actually receive it
  • We are assuming that the effect of canvassing is zero for the 750 people who were not canvassed
  • There might be selection bias in the treatment group
    • For example, maybe the people who were canvassed are more likely to vote anyway
    • People who refuse to talk to canvassers might be less likely to vote, and so on
  • So this approach is not recommended 👎

Option 02: Assume random compliance

As-treated analysis

  • The second option, related to the first, is to assume that the differences between the two groups are random
  • In other words, we assume that the people who were canvassed are randomly selected from the treatment group
  • And the fact that only 25% of the people were canvassed is just bad luck
  • If this is the case, we can drop the people who were not canvassed from the treatment group and compare the 250 people who were canvassed with the 1000 people in the control group
  • This approach would recover the true ATE if the assumption were correct
  • What do you think? 🤔

Option 02: Assume random compliance

As-treated analysis

  • The problem with this approach is that we cannot test the assumption
  • We cannot know if the differences between the two groups are random or not
  • Most likely they are not!
  • Unless you can really justify that the differences are random, this approach is not recommended 👎
  • But if you can justify it (good luck with that! 😂), this is okay!

Option 03: Redefine the ATE

Just give people the choice

  • The third option is to stick to the random assignment and compare the two groups as if everyone had followed the treatment assignment
  • Instead of comparing the people who were actually canvassed with those who were not canvassed, we compare the people who were assigned to be canvassed with those who were not assigned to be canvassed
  • The difference here is semantic:
    • We would be able to recover the true ATE if we had only given people the choice of whether or not to be canvassed
  • For instance, rather than analysing the effect of Medicaid on health outcomes, we would be analysing the effect of being offered Medicaid on health outcomes
    • In this definition, non-compliance is impossible
  • What do you think?

Option 03: Redefine the ATE

Just give people the choice

  • The problem with this approach is that it underestimates the treatment effect
  • The average treatment effect is the difference between the outcome of the people who were actually canvassed and the outcome of the people who were not canvassed
  • But this analysis compares the outcome of the people who were assigned to be canvassed with the outcome of the people who were not assigned to be canvassed
  • This is not the same as receiving the treatment or not!
  • So this approach is not recommended either 👎

Option 04: Instrumental variables (IV)

A clever way to deal with non-compliance

  • The fourth option is to use instrumental variables (IV) (or two-stage least squares - 2SLS)
  • The benefit of IV is that it allows us to recover the true effect of the programme instead of only the effect of being offered the programme
  • The downside is that IV does not allow us to recover the true ATE in the whole population
  • This is because it only measures the effect of the programme on the compliers, that is, those who took the treatment when it was assigned to them
  • This is the best approach 👍
  • But first, some definitions and concepts… 🤓

New definitions and assumptions 🤓

Full compliance

  • In an ideal experiment, we randomly assign each user to a treatment or a control group
  • All users in the treatment group experience the treatment, and all users in the control group do not experience the treatment
  • The table below summarises full compliance:
| Random assignment (\(Z\)) | Treatment status (\(D\)) |
|---------------------------|--------------------------|
| Treatment                 | Treated                  |
| Control                   | Untreated                |
  • For the next slides, it is useful to introduce some definitions:
    • \(Z \in \{0, 1\}\) indicates whether a user was assigned to the treatment or the control group (visited by a canvasser or not)
    • \(D \in \{0, 1\}\) indicates whether a user was treated (actually heard the message)
    • \(Y\), as always, is the outcome we care about (voter turnout)
  • In this case, the treatment effect is the difference between the potential outcomes of the treated and untreated users, as we have seen before

One-sided non-compliance

  • In the case of one-sided non-compliance, some users in the treatment group do not receive the treatment
  • The table below summarises the situation:
| Random assignment (\(Z\)) | Treatment status (\(D\)) |
|---------------------------|--------------------------|
| Treatment                 | Treated or Untreated     |
| Control                   | Untreated                |
  • In this case, the quantity \(E[Y \mid Z=1] - E[Y \mid Z=0]\) does not represent the treatment effect anymore
  • Instead, it represents the effect of being assigned to the treatment group only, i.e., the intent-to-treat (ITT) effect
  • Let’s formalise this a bit more…

One-sided non-compliance

Notation

  • Let the experimental assignment of subject \(i\) be \(z_i\)
  • When \(z_i = 1\), the subject is assigned to the treatment group, and when \(z_i = 0\), the subject is assigned to the control group
  • Let \(d_i(z)\) represent whether subject \(i\) is actually treated, given the assignment \(z_i\)
  • To make it short, let’s write \(d_i(z = 1)\) as \(d_i(1)\) and \(d_i(z = 0)\) as \(d_i(0)\)
  • If a subject receives no treatment when assigned to the control group, we represent them as \(d_i(0) = 0\)
  • For one-sided non-compliance, \(d_i(0)\) is always 0 for everyone in the control group, but \(d_i(1)\) can be 0 or 1
  • If \(d_i(1) = 1\), subject \(i\) would open the door if canvassed, but if \(d_i(1) = 0\), they would not

Compliers and never-takers

Two new groups

  • In the case of one-sided non-compliance, we have two new groups of analysis
  • Compliers are those who would take the treatment if assigned to the treatment group and would not take the treatment if assigned to the control group
    • So, \(d_i(1) = 1\) and \(d_i(0) = 0\)
  • However, we also have a group of people who would not take the treatment even if assigned to the treatment group
    • These are the never-takers
    • For them, \(d_i(1) = d_i(0) = 0\)
  • Thus, the expression \(ATE \mid d_i(1) = 1\) means the average treatment effect (ATE) for the compliers
  • Keep in mind that the names “compliers” and “never-takers” are unrelated to the outcomes \(Y_i\); they depend only on the treatment status function \(d_i(z)\)
  • It is not always easy to define who is a complier in an experiment
    • What if canvassing is done at weekends but some people are at home only during the week? Compliers or never-takers?
    • If we canvass them during the week instead, are they compliers or never-takers?

First assumption: Non-interference

  • The first assumption we need to make is that of non-interference
  • Non-interference means that whether a subject is treated depends only on the subject’s own treatment group assignment
  • This assumption is strong, difficult to test, and often violated
  • The intent-to-treat (\(ITT\)) effect of assignment (\(z\)) on treatment status (\(d\)) is defined as:

\[ ITT_{i, D} = d_i (1) - d_i (0) \]

  • If everyone complies perfectly, then \(d_i(1)\) will be 1 and \(d_i(0)\) will be 0, so the difference is 1
  • The average \(ITT_{i, D}\) across all subjects is

\[ITT_D = E[ITT_{i, D}] = E[d_i(1)] - E[d_i(0)]\]

  • That is, the proportion of people who take the treatment when assigned to the treatment group minus the proportion of people who take the treatment when assigned to the control group
  • In one-sided non-compliance, \(E[d_i(0)] = 0\) for all subjects, so \(ITT_D = E[d_i(1)] \geq 0\)
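Since \(d_i(0) = 0\) for everyone under one-sided non-compliance, \(ITT_D\) reduces to the share of the treatment group that actually takes the treatment. A minimal sketch in Python (the lecture's own code is in R; the \(d_i(1)\) values below are made up):

```python
import numpy as np

# Hypothetical d_i(1) values for eight subjects assigned to treatment:
# 1 = complier (took the treatment), 0 = never-taker
d1 = np.array([1, 0, 0, 1, 0, 0, 0, 1])
d0 = np.zeros_like(d1)  # one-sided non-compliance: d_i(0) = 0 for everyone

itt_d = d1.mean() - d0.mean()  # = share of compliers in the treatment group
print(itt_d)  # 0.375
```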

ITT effect on the outcome

  • The intent-to-treat effect of \(z_i\) on \(Y_i\) for each subject is:

\[ITT_{i,Y} = Y_i(z = 1, d(1)) - Y_i(z = 0, d(0))\]

  • That is:
    • \(Y_i(z = 1, d(1))\): Outcome for person \(i\) if assigned to treatment (\(z=1\)) and they actually take the treatment (\(d(1)\))
    • \(Y_i(z = 0, d(0))\): Outcome for person \(i\) if assigned to control (\(z=0\)) and they do not take the treatment (\(d(0)\))
  • Hence, the average \(ITT_{Y}\) is:

\[ITT_{Y} = E[ITT_{i, Y}] = E[Y_i(z = 1, d(1))] - E[Y_i(z = 0, d(0))]\]

  • If we have full compliance, \(ITT_{Y}\) is the same as the average treatment effect (ATE)

  • If not, \(ITT_{Y}\) is the intent-to-treat (ITT) effect: whether a programme “made a difference” in the outcome, regardless of whether people actually took the treatment
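In practice \(ITT_Y\) is just the difference in mean outcomes by *assignment*, ignoring who was actually treated. A tiny Python illustration with invented turnout data (the lecture's own code is in R):

```python
import numpy as np

# Hypothetical data: z = random assignment (1 = assigned to canvassing),
# y = turnout outcome
z = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y = np.array([1, 0, 1, 1, 1, 0, 0, 1])

# Compare by assignment, not by treatment status
itt_y = y[z == 1].mean() - y[z == 0].mean()
print(itt_y)  # 0.25
```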

Second assumption: Exclusion restriction

  • The second assumption we need to make is that of the exclusion restriction
  • The exclusion restriction means that the only way the treatment assignment (\(z\)) affects the outcome (\(Y\)) is through its effect on whether people actually get the treatment (\(d\))
  • In other words, untreated subjects have the same potential outcomes regardless of their assignments:
    • \(Y_i(z = 0, d(0)) = Y_i(z = 0, d(1))\)
  • And the same is true for treated subjects:
    • \(Y_i(z = 1, d(1)) = Y_i(z = 1, d(0))\)
  • In general:
    • \(Y_i(z, d) = Y_i(d)\)
  • This assumption is also strong, and the main reason why we have placebos in science!

CACE and IVs 🤓

Complier average causal effect (CACE)

The effect of the treatment on the compliers

  • As we cannot correctly estimate the ATE with non-compliance, we focus on the complier average causal effect (CACE)
  • CACE tries to answer this question: “For those individuals who actually heard the message, what is the effect of the message on their likelihood of voting?”
  • Formally, the CACE is defined as:

\[ CACE \equiv \frac{\sum_{i=1}^{N}(Y_i(1) - Y_i(0))d_i(1)}{\sum_{i=1}^{N}d_i(1)} = \frac{ITT_Y}{ITT_D} = E[(Y_i(d = 1) - Y_i(d = 0)) | d_i(1) = 1] \]

  • In other words, it is the sum of the treatment effects among the compliers divided by the number of compliers
  • CACE is also known as the Local Average Treatment Effect (LATE)
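The ratio \(ITT_Y / ITT_D\) can be checked with a quick simulation. This is a sketch in Python (the lecture uses R; all parameters below, 30% compliers and a 14-point complier effect, are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

complier = rng.random(n) < 0.30      # assumed: 30% of subjects are compliers
z = rng.random(n) < 0.5              # random assignment
d = z & complier                     # one-sided: only assigned compliers get treated

p0, tau = 0.38, 0.14                 # assumed baseline turnout and complier effect
y = np.where(d, rng.random(n) < p0 + tau, rng.random(n) < p0).astype(int)

itt_d = d[z].mean() - d[~z].mean()   # ~0.30 (share of compliers)
itt_y = y[z].mean() - y[~z].mean()   # ~0.30 * 0.14 (diluted effect)
cace = itt_y / itt_d                 # ~0.14, the complier average causal effect
```

The diluted \(ITT_Y\) divided by the complier share recovers the effect among compliers, which is exactly what the formula above says.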

CACE and instrumental variables

The effect of the treatment on the compliers

  • The good thing about CACE/LATE is that we have a consistent estimator for it
  • It is equivalent to the two-stage least squares (2SLS) estimator:
    • Regress \(D_i\) on \(Z_i\) to get fitted values \(\hat{D_i}\)
    • Regress \(Y_i\) on \(\hat{D_i}\)
  • But remember the assumptions:
    • Non-interference will be violated, for instance, if subject \(j\) is canvassed and tells \(i\) about it
    • Excludability can also be violated if the controls are treated by another canvassing campaign
    • The first stage requires that assignment actually moves treatment status, i.e., \(ITT_D > 0\) (at least some compliers exist)

Estimating the regressions

  • Let’s go back to our example of canvassing
  • We have the following regressions to estimate the CACE:
    • First stage: \(TREATED_i = \alpha_0 + \alpha_1 ASSIGNED_i + \epsilon_i\)
    • Second stage: \(VOTED_i = \beta_0 + \beta_1 \widehat{TREATED}_i + u_i\)
  • The first stage estimates the effect of the treatment assignment on the treatment status
  • The second stage estimates the effect of the treatment status on the outcome
  • The coefficient \(\beta_1\) is the two-stage least squares (2SLS) estimator of the CACE
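The two stages can be written out by hand. With a single binary instrument, the 2SLS coefficient equals the Wald ratio \(ITT_Y / ITT_D\). Here is a sketch in Python using plain least squares on simulated data (not the canvassing dataset; the lecture's own code is in R):

```python
import numpy as np

def two_sls(y, d, z):
    """Manual 2SLS: regress d on z, then regress y on the fitted values of d."""
    Z = np.column_stack([np.ones(len(z)), z])
    alpha, *_ = np.linalg.lstsq(Z, d, rcond=None)   # first stage
    d_hat = Z @ alpha                               # fitted treatment status
    X = np.column_stack([np.ones(len(d_hat)), d_hat])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # second stage
    return beta[1]                                  # the CACE estimate

# Simulated one-sided non-compliance (assumed: 30% compliers, true effect 0.14)
rng = np.random.default_rng(0)
n = 100_000
complier = rng.random(n) < 0.3
z = (rng.random(n) < 0.5).astype(float)
d = (z.astype(bool) & complier).astype(float)
y = np.where(d == 1, rng.random(n) < 0.52, rng.random(n) < 0.38).astype(float)

beta_2sls = two_sls(y, d, z)
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
# beta_2sls and wald agree up to floating-point error
```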

Estimating the regressions

Data preparation

# Load the dataset
data1 <- read.csv("canvassing.csv", header = TRUE, sep = ",")
colnames(data1)
 [1] "X"          "id1"        "persons"    "v98_1"      "v98_2"     
 [6] "persngrp"   "mailings"   "phongotv"   "ward"       "majpty1"   
[11] "majpty2"    "age1"       "age2"       "placebo"    "vote98"    
[16] "v98"        "cntany"     "pcntany"    "v96_1"      "v96_0"     
[21] "age"        "agemiss"    "agesq"      "majorpty"   "onetreat"  
[26] "onetreat_p"
dim(data1)
[1] 31098    26
# Select one-person households that were either pure controls or canvass only
sel <-  data1$onetreat==1 & data1$mailings==0 & data1$phongotv==0 & data1$persons==1

# Verify the number of observations
table(sel)
sel
FALSE  TRUE 
24008  7090 
data2 <- data1[sel,]

# Rename variables
data2$VOTED    <- data2$v98
data2$ASSIGNED <- data2$persngrp
data2$TREATED  <- data2$cntany

Estimating the regressions

First stage: \(ITT_D\) with robust standard errors

  • The intercept of zero in this equation indicates that no one in the control group was contacted, in keeping with the definition of one-sided noncompliance
  • The coefficient 0.273 indicates that assignment to the treatment group caused 27.3% of the targeted subjects to be treated
  • In other words, the estimated share of Compliers in the treatment group is 27.3%. The 95% CI suggests that this proportion ranges from 25.0% to 29.6%
# Load the required packages
library(AER)      # For IV
library(sandwich) # For robust SEs

# Box 5.5: ITT_D
# Note that results from this will vary from the book
itt_d_fit <- lm(TREATED ~ ASSIGNED, data = data2)
coeftest(itt_d_fit, vcovHC(itt_d_fit))

t test of coefficients:

              Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 1.5358e-14 1.6258e-16  94.464 < 2.2e-16 ***
ASSIGNED    2.7336e-01 1.1733e-02  23.299 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimating the regressions

\(ITT_Y\) with robust standard errors

  • Here we estimate the ITT for the whole assigned population
  • We regress the outcome on assignment rather than on treatment status, because TREATED can be “endogenous”, that is, related to unobserved factors (\(u_i\)) that affect outcomes
  • Those assigned to the treatment group were 3.84 percentage points more likely to vote
  • The estimated ITT may be a useful thing to know!
  • If you are conducting an evaluation of a programme, you can use the ITT to assess the programme’s output in relation to its costs
# Box 5.4: ITT with robust SEs
itt_fit <- lm(VOTED ~ ASSIGNED, data = data2)
coeftest(itt_fit, vcovHC(itt_fit))

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 0.375376   0.006446 58.2344 < 2.2e-16 ***
ASSIGNED    0.038464   0.014479  2.6565  0.007914 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimating the regressions

CACE using 2SLS

  • Finally, here we estimate the CACE
  • It is the effect of the treatment on the compliers
  • So we could just have used the formula: \(CACE = \frac{ITT_Y}{ITT_D} = \frac{0.038464}{0.2734} \approx 0.1407\)
  • The estimated average treatment effect of the canvassing treatment among Compliers is a 14.07 percentage point increase in the probability of voting
  • We could have estimated the CACE using the ivreg function from the AER package and gotten the same result
# Box 5.6: CACE
cace_fit <- ivreg(VOTED ~ TREATED, ~ ASSIGNED, data = data2)
coeftest(cace_fit, vcovHC(cace_fit))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.375376   0.006446 58.2344   <2e-16 ***
TREATED     0.140711   0.052434  2.6836   0.0073 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimating the regressions

Using estimatr

  • There’s no need to learn how to use ivreg if you don’t want to!
  • Our familiar estimatr package has a function called iv_robust that does the same thing
  • The results are the same as before, and we also see that the 95% confidence interval ranges from 0.038 to 0.24; that is, canvassing increases the probability of voting by 3.8 to 24 percentage points
  • The effect is positive and statistically significant
# Box 5.6: CACE 
# Load estimatr
library(estimatr)

# CACE with estimatr
cace_fit2 <- iv_robust(VOTED ~ TREATED | ASSIGNED, data = data2)
cace_fit2
             Estimate Std. Error   t value    Pr(>|t|)   CI Lower  CI Upper
(Intercept) 0.3753764 0.00644539 58.239525 0.000000000 0.36274155 0.3880113
TREATED     0.1407115 0.05241688  2.684469 0.007281396 0.03795877 0.2434642
              DF
(Intercept) 7088
TREATED     7088

Designs that anticipate non-compliance 🤓

Large-\(n\) designs

  • Non-compliance not only prevents us from estimating the true ATE, it also makes CACE estimation more challenging.
  • While 2SLS is a consistent estimator for the CACE, the estimator becomes much less precise if the proportion of compliers is small
  • So the first advice is to design experiments with large sample sizes to increase the number of compliers
  • This is not always feasible, though, as it can be expensive
  • But if you can, do it! 😊

Placebo designs

  • A more realistic approach is to anticipate non-compliance and include placebo conditions in the experiment
  • This is done in two steps:
    • First, subjects are recruited to the study and assigned to treatment and control groups
    • Second, conditional on compliance (e.g. answering the door), subjects are randomly allocated to two groups:
      • The treatment group receives the treatment in the usual way
      • The placebo group receives a “non-treatment” that is assumed to have no effect on the outcome of interest
  • For instance, we could have a placebo group that receives a fake canvassing treatment, such as information about the importance of recycling or the benefits of exercise
  • CACE can be estimated by comparing the outcomes for those given the canvassing treatment and those given the “non-treatment”

Placebo designs

  • Why does this work?
  • Because the main problem in one-sided non-compliance is the existence of never-takers
  • But if we randomise the treatment amongst the compliers, we screen out the never-takers by design
  • Compliers in the treated state can then be compared directly to compliers in the untreated state, which eliminates the noise generated by the never-takers
  • Thus, we are back to full compliance among the compliers and can estimate their average treatment effect directly!
  • This is a very powerful tool in experimental research
  • Think about it when designing your experiments! 😊
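The logic of the placebo design can be checked with a small simulation in Python (all parameters assumed for illustration; the lecture uses R). Since only compliers answer the door in either arm, contacted subjects in the treatment and placebo arms are both compliers, and a simple difference in means among them recovers the complier effect:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

complier = rng.random(n) < 0.30   # assumed share who would answer the door
z = rng.random(n) < 0.5           # 1 = real canvassing script, 0 = placebo script
contacted = complier              # compliers answer the door in both arms

p0, tau = 0.45, 0.14              # assumed complier baseline turnout and effect
treated = contacted & z
y = np.where(treated, rng.random(n) < p0 + tau, rng.random(n) < p0).astype(int)

# Contacted subjects in both arms are compliers, so this simple difference
# in means estimates the complier effect directly -- no Wald ratio needed
est = y[contacted & z].mean() - y[contacted & ~z].mean()  # ~0.14
```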

Partial treatment

  • Finally, what to do when we have partial treatment?
  • For instance, a subject interrupts the medical treatment before the end
  • The easiest and most widely used approach is to classify the partially-treated subject as untreated, estimate the CACE, and then classify the subject as treated and estimate the CACE again
  • Those two estimates provide bounds for the CACE
    • The lower bound is the estimate when the subject is classified as treated
    • The upper bound is the estimate when the subject is classified as untreated
  • While not perfect, this strategy at least provides a range of possible values for the CACE and allows us to quantify the uncertainty in our estimates
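The bounding strategy above can be sketched as follows, again with simulated data and assumed parameters (the lecture uses R): compute the Wald estimate twice, once coding the partially treated as treated (lower bound) and once as untreated (upper bound):

```python
import numpy as np

def wald(y, d, z):
    """CACE estimate as ITT_Y / ITT_D."""
    return (y[z].mean() - y[~z].mean()) / (d[z].mean() - d[~z].mean())

rng = np.random.default_rng(3)
n = 200_000
z = rng.random(n) < 0.5
full = z & (rng.random(n) < 0.20)             # fully treated subjects
partial = z & ~full & (rng.random(n) < 0.10)  # partially treated subjects

# Assumed effects: 0.14 for the fully treated, 0.07 for the partially treated
y = (rng.random(n) < 0.40 + 0.14 * full + 0.07 * partial).astype(int)

d_treated_coding = full | partial  # partial coded as treated  -> lower bound
d_untreated_coding = full          # partial coded as untreated -> upper bound

lower = wald(y, d_treated_coding, z)
upper = wald(y, d_untreated_coding, z)
# lower <= upper: the two codings bracket the CACE
```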

Conclusion 📚

Conclusion

  • Non-compliance is a big problem in experimental research
  • One-sided non-compliance is when units in the treatment group do not receive the treatment
  • We have seen that we have several options to deal with non-compliance, but the best one is to use instrumental variables (IV)
  • IV allows us to estimate the complier average causal effect (CACE), which is the effect of the treatment on the compliers
  • We have also seen that large-\(n\) designs and placebo designs can help anticipate non-compliance
  • Next class, we will discuss two-sided non-compliance, which is when some people in the treatment group do not receive the treatment and some people in the control group do receive the treatment
  • …and we will see how to deal with it! 😊

…and that’s all for today! 🎉

Thank you! 🙏