QTM 385 - Experimental Methods

Lecture 10 - Two-sided non-compliance

Danilo Freire

Emory University

Hi, there!
Hope all is well! 😉

Brief recap 📚

Last class

  • One-sided non-compliance occurs when units in the treatment group fail to receive the intervention, while control group members remain unaffected
  • This scenario introduces two key groups:
    • Compliers: Subjects who accept treatment when assigned
    • Never-takers: Subjects who reject treatment regardless of assignment
  • The intent-to-treat (ITT) effect measures the impact of treatment assignment, while the complier average causal effect (CACE) estimates the true effect on those who actually received treatment
  • CACE can be calculated via: \(CACE = \frac{ITT_Y}{ITT_D}\)
  • Instrumental variables (IV) methods like two-stage least squares (2SLS) are preferred for estimation:
    • Regress treatment receipt on assignment (D ~ Z)
    • Use predicted treatment status to estimate outcome effects (Y ~ D_hat)
    • estimatr::iv_robust(Y ~ D | Z, data = data)
  • Placebo designs help validate IV assumptions by testing compliers against non-treatment conditions
  • We should anticipate non-compliance through large sample sizes and robust experimental designs to maintain internal validity

Today’s plan 📋

Two-sided non-compliance

  • Two-sided non-compliance:
    • Treatment group: Some units don’t receive treatment
    • Control group: Some units access treatment externally
  • Four compliance types:
    • Compliers: Follow assigned treatment
    • Never-takers: Never receive treatment
    • Always-takers: Always seek treatment
    • Defiers: Do opposite of assignment
  • Non-ignorable selection and cross-contamination between arms
  • Monotonicity (no defiers)
  • Encouragement designs, double randomisation, and noncompliance-adjusted power calculations

Source: Alves (2022)

Two-sided non-compliance 🤔

Non-compliance: You already know it’s a problem

Now it will get worse 😂

  • Last class, we discussed one-sided non-compliance
  • We saw that the ATE is not identified in this scenario, because comparing those who actually received treatment with those who did not reintroduces selection bias
  • We also learned how to estimate the CACE using IV methods: it is the difference in observed treatment and control group outcomes divided by the proportion of subjects who are Compliers
  • Today, we will discuss two-sided non-compliance, which is even more complex
  • In this scenario, some subjects in the treatment group do not receive treatment, while some in the control group do
  • Underestimation of treatment effect (usually): If some in the treatment group don’t comply, and some in the control group get treatment, the difference between the groups in terms of actual treatment received becomes smaller
  • This can make it look like the treatment is less effective than it actually is. We “dilute” the treatment effect 🤓

Compliance types 📊

Four types of compliance

  • So far, we have discussed Compliers and Never-takers
  • In two-sided non-compliance, we also have Always-takers and Defiers
  • Always-takers: Subjects who always seek treatment, regardless of assignment
  • Defiers: Subjects who do the opposite of their assignment: Imagine stubborn teenagers! 😂
  • Many experiments face this issue, especially in social sciences
    • Encouragement designs: For example, some students who receive private school vouchers still attend public schools, while some students in the control group attend private schools even without vouchers
    • Natural experiments: A lottery determined who would be drafted for the Vietnam War, but some drafted men avoided service, while some who were not drafted volunteered
  • Fortunately, the estimation is similar to one-sided non-compliance, just with more assumptions

Four types of compliance

  • More formally, we have the following:

  • \(Z_i\) is the treatment assignment, \(D_i\) is the treatment receipt, and \(Y_i\) is the outcome

  • Compliers: \(D_i(1) = 1\) and \(D_i(0) = 0\). Equivalently, \(D_i(1) \gt D_i(0)\)

  • Never-takers: \(D_i(1) = 0\) and \(D_i(0) = 0\)

  • Always-takers: \(D_i(1) = 1\) and \(D_i(0) = 1\)

  • Defiers: \(D_i(1) = 0\) and \(D_i(0) = 1\). These are usually rare, though

  • Connections between observed data and compliance types:

|             | \(Z_i = 0\)             | \(Z_i = 1\)              |
|-------------|-------------------------|--------------------------|
| \(D_i = 0\) | Never-taker or Complier | Never-taker or Defier    |
| \(D_i = 1\) | Always-taker or Defier  | Always-taker or Complier |
  • Notice that treatment assignment has no effect on whether always-takers or never-takers are treated
    • Always-takers are treated regardless of assignment, while never-takers are never treated
  • Defiers and compliers, on the other hand, respond to treatment assignment, but in opposite ways
    • So the problem is that we can’t tell who is who, and this makes estimation difficult! 🤔

How to solve this? 🤔

Motivating example: Candidate debate study

  • Mullainathan et al (2010) designed a study to measure the impact of watching a political debate on voting intentions
  • Treatment group: Encouraged to watch debate
  • Control group: Encouraged to watch non-political programme
  • Treatment defined as self-reported debate viewing
  • Always-Takers: Watch debate regardless of encouragement
  • Never-Takers: Never watch debate, even if encouraged
  • Compliers: Watch debate only when encouraged
  • Defiers: Watch debate only when discouraged (watch non-political programme)
  • Compliance type is a fixed attribute in this design
    • If the design was different, compliance types could change!

Quantifying compliance

Estimating group sizes

  • Let \(\pi_{AT}\), \(\pi_{NT}\), \(\pi_{C}\), \(\pi_{D}\) denote proportions of Always-Takers, Never-Takers, Compliers, and Defiers

  • Formulas to estimate these proportions (see the R sketch after this list):

    • Always-Takers’ share (\(\pi_{AT}\)): \[ \pi_{AT} = \frac{1}{N} \sum_{i=1}^{N} d_i(1)d_i(0) \]
    • Never-Takers’ share (\(\pi_{NT}\)): \[ \pi_{NT} = \frac{1}{N} \sum_{i=1}^{N} (1-d_i(1))(1-d_i(0)) \]
    • Compliers’ share (\(\pi_{C}\)): \[ \pi_{C} = \frac{1}{N} \sum_{i=1}^{N} d_i(1)(1-d_i(0)) \]
    • Defiers’ share (\(\pi_{D}\)): \[ \pi_{D} = 1 - \pi_{AT} - \pi_{NT} - \pi_{C} \]
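To see these formulas in action, here is a minimal R sketch with simulated potential treatment states d1 = \(d_i(1)\) and d0 = \(d_i(0)\). The shares and variable names are made up for illustration; in real data we never observe both potential states for the same subject.

# Simulated potential treatment states (illustration only; never both observed in practice)
set.seed(385)
N <- 1000
type <- sample(c("complier", "never", "always"), N, replace = TRUE,
               prob = c(0.21, 0.63, 0.16))            # hypothetical shares, no Defiers
d1 <- ifelse(type %in% c("complier", "always"), 1, 0) # treatment received if assigned Z = 1
d0 <- ifelse(type == "always", 1, 0)                  # treatment received if assigned Z = 0

pi_AT <- mean(d1 * d0)              # treated under both assignments
pi_NT <- mean((1 - d1) * (1 - d0))  # untreated under both assignments
pi_C  <- mean(d1 * (1 - d0))        # treated only when assigned
pi_D  <- 1 - pi_AT - pi_NT - pi_C   # whatever is left over (zero here)
round(c(AT = pi_AT, NT = pi_NT, C = pi_C, D = pi_D), 2)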

Quantifying compliance

Numbers from the debate study

  • Under random assignment, the assigned treatment group has the same expected shares of Always-Takers, Never-Takers, Compliers, and Defiers as the assigned control group
    • Right? Why or why not?
  • In the control group, the untreated subjects are either Never-Takers or Compliers
    • The study of the New York City mayoral debates found that 84% of the control group reported not watching the debate, so \(\hat{\pi}_{NT} + \hat{\pi}_{C} = 0.84\)
    • Subjects in the control group who watched the debate are either Always-Takers or Defiers, and \(\hat{\pi}_{AT} + \hat{\pi}_{D} = 0.16\)
  • In the treatment group, 37% of the subjects reported watching the debate
    • These subjects must be either Always-Takers or Compliers, so \(\hat{\pi}_{AT} + \hat{\pi}_{C} = 0.37\)
    • The remaining 63% are either Never-Takers or Defiers, so \(\hat{\pi}_{NT} + \hat{\pi}_{D} = 0.63\)
  • But we don’t know how many people are in each group, as mentioned before
  • We can estimate these proportions with a trick…

Monotonicity 📏

Monotonicity assumption

No defiers allowed! 😂

  • Monotonicity is a key assumption in two-sided non-compliance
  • It states that no subject is a Defier: No one does the opposite of their assignment
  • This is a strong assumption, as it is always possible that someone does the opposite of what they were told
  • But if we can assume there are no Defiers, the identification problem goes away! 😂
  • In our previous example:
    • If we assume that no one watched the debate when they were told not to, we can estimate the proportions of each group

Monotonicity assumption

Simplifying the model

  • Assume no Defiers (\(\pi_{D} = 0\)) to simplify estimation
  • With \(\pi_{D} = 0\), we can estimate other proportions:
    • Always-Takers (\(\hat{\pi}_{AT}\)):
    • \(\hat{\pi}_{AT} + \hat{\pi}_{D} = 0.16 \implies \hat{\pi}_{AT} = 0.16\)
    • Never-Takers (\(\hat{\pi}_{NT}\)):
    • \(\hat{\pi}_{NT} + \hat{\pi}_{D} = 0.63 \implies \hat{\pi}_{NT} = 0.63\)
    • Compliers (\(\hat{\pi}_{C}\)):
    • \(\hat{\pi}_{AT} + \hat{\pi}_{C} = 0.37 \implies \hat{\pi}_{C} = 0.37 - \hat{\pi}_{AT} = 0.37 - 0.16 = 0.21\)
    • Alternatively, using:
    • \(\hat{\pi}_{NT} + \hat{\pi}_{C} = 0.84 \implies \hat{\pi}_{C} = 0.84 - \hat{\pi}_{NT} = 0.84 - 0.63 = 0.21\)
  • Both calculations yield \(\hat{\pi}_{C} = 0.21\) (the same arithmetic in R is sketched below)
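As a quick sketch, the same back-of-the-envelope arithmetic in R, using the observed shares from the debate study and assuming no Defiers:

# Observed shares of self-reported debate viewing
p_watch_control   <- 0.16   # watched in the control group   -> AT + D
p_watch_treatment <- 0.37   # watched in the treatment group -> AT + C

# Under monotonicity, pi_D = 0, so the system solves directly
pi_AT <- p_watch_control             # 0.16
pi_C  <- p_watch_treatment - pi_AT   # 0.21
pi_NT <- 1 - pi_AT - pi_C            # 0.63
c(AT = pi_AT, C = pi_C, NT = pi_NT)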

Now the easy part, CACE estimation 🤓

Estimating CACE

Now we know what to do! 😉

  • Once we rule out Defiers, we can estimate the CACE using the same formula as before
  • The CACE is the difference in observed treatment and control group outcomes divided by the proportion of subjects who are Compliers
  • Although two-sided noncompliance introduces the possibility that some subjects are Always-Takers, they pose no identification problems
  • Always-Takers have no effect on the \(ITT\), and the share of Always-Takers is differenced away when we calculate the \(ITT_D\)
  • Why so? Because they are treated regardless of assignment, so they don’t affect the treatment effect!

ITT decomposition with no defiers

  • Intent-to-treat effect on outcome (\(ITT_Y\)) can be decomposed:

    \[ ITT_Y = ITT_{Y,co} \pi_{co} + \underbrace{ITT_{Y,at} \pi_{at}}_{=0 \text{ (ER)}} + \underbrace{ITT_{Y,nt} \pi_{nt}}_{=0 \text{ (ER)}} + \underbrace{ITT_{Y,de} \pi_{de}}_{=0 \text{ (mono)}} \]

  • Under Exclusion Restriction (ER) and Monotonicity (mono) assumptions, ITT simplifies to:

    \[ ITT_Y = ITT_{Y,co} \pi_{co} \]

  • Same identification result:

    \[ \tau_{LATE} = \frac{ITT_Y}{ITT_D} \]

Back to our example 📊

Estimating CACE in the debate study

Let’s see how all this works in practice! 😉

  • Remember Mullainathan et al’s (2010) initial experimental design:
    • Treatment group: Encouraged to watch debate
    • Control group: Encouraged to watch non-political programme
    • Treatment defined as self-reported debate viewing
  • Now let’s analyse a subset of their data in the table below:
|                                  | Treatment group | Control group |
|----------------------------------|-----------------|---------------|
| % Reporting change (N treated)   | 59.5 (185)      | 50.0 (80)     |
| % Reporting change (N untreated) | 40.6 (320)      | 40.2 (415)    |
| % Reporting change (total N)     | 47.5 (505)      | 41.8 (495)    |

Source: Mullainathan, Washington, and Azari 2010.

Download the data from our GitHub repository

Does it seem confusing? 🤔

Let’s break it down! 😉

  • The study \(N\) is 1000, with 505 in the treatment group and 495 in the control group
    • These are the total \(N\) in each group, in parentheses, last row of the table
  • What about those who were actually treated?
    • 185 in the treatment group and 80 in the control group (huge non-compliance!)
    • These are the \(N\) treated, in parentheses, first row of the table
  • And those who were not treated?
    • 320 in the treatment group and 415 in the control group
    • These are the \(N\) untreated, in parentheses, second row of the table
  • How to calculate CACE here? Remember that \(CACE = \frac{ITT_Y}{ITT_D}\)
  • \(ITT_Y\) is the difference in outcomes between treatment and control groups
    • \(ITT_Y = 47.5 - 41.8 = 5.7\) percentage points, or \(0.057\) as a proportion
  • So far so good? 😉

Calculating CACE

Let’s do the maths again! 😉

  • \(ITT_Y = 0.057\), right?
  • Now let’s calculate \(ITT_D\):
    • \(ITT_D = 36.6 - 16.2 = 20.4\) percentage points, or \(0.204\) as a proportion
  • Wait a minute! Where do these numbers come from?
    • Remember that only 185 people were treated in the treatment group
    • \(\frac{185}{505} \times 100 \approx 36.6\)% of the treatment group actually watched the debate! Aha! 😂
  • Control group “treatment” rate (non-compliance):
    • \(\frac{80}{495} \times 100 \approx 16.2\)% of the control group watched the debate
  • So, \(ITT_D = 20.4\) percentage points, or \(0.204\) as a proportion. Let’s work with proportions from here on 😉
  • Finally, \(CACE = \frac{ITT_Y}{ITT_D} = \frac{0.057}{0.204} \approx 0.28\)
  • So, the CACE is \(0.28\)! Woo-hoo! 🥳
  • In fancy words: “The estimate suggests that watching the debates raises the rate at which Compliers report opinion change by 28 percentage points.” The same arithmetic is sketched in R below
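Here is that hand calculation as a short R sketch, using the counts from the table above:

# Counts from the Mullainathan, Washington, and Azari (2010) table
n_treat <- 505; n_control <- 495       # assigned group sizes
d_treat <- 185; d_control <- 80        # number who actually watched the debate
y_treat <- 0.475; y_control <- 0.418   # share reporting opinion change

ITT_Y <- y_treat - y_control                        # effect of assignment on the outcome
ITT_D <- d_treat / n_treat - d_control / n_control  # effect of assignment on viewing
CACE  <- ITT_Y / ITT_D                              # complier average causal effect
round(c(ITT_Y = ITT_Y, ITT_D = ITT_D, CACE = CACE), 3)
# ITT_Y ~ 0.057, ITT_D ~ 0.205, CACE ~ 0.28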

Wait, there’s more! 🤓

Using IVs to estimate CACE

  • Remember that I said we could use IV methods to estimate the CACE?
  • It works the same way as in one-sided non-compliance, but with the “no defiers” (monotonicity) assumption
  • We regress treatment receipt on assignment, and then use predicted treatment status to estimate outcome effects
  • The formula is the same: estimatr::iv_robust(Y ~ D | Z, data = data)
  • The estimatr package is your friend here once again! 😉
  • Since we don’t live in the Stone Age any longer, we can use R to do all the hard work for us! 😂
  • The script is also available in our GitHub repository
  • Let’s do it together, step-by-step, to make sure we all understand what the coefficients mean 🤓

IV estimation in R

ITT

  • First, let’s load the data and estimate the ITT
  • Remember that \(ITT_Y = 0.057\) when we calculated it manually?
  • We will use the lm_robust function to estimate the ITT
  • Can you explain what the coefficients mean?
library(estimatr)

# Load the data
df <- read.csv("./mullainathan.csv")

# Rename variables
ASSIGNED <- df$watch
TREATED <- df$watchdps
Y <- df$ochange

# Estimate ITT
itt_model <- lm_robust(Y ~ ASSIGNED)
summary(itt_model)

Call:
lm_robust(formula = Y ~ ASSIGNED)

Standard error type:  HC2 

Coefficients:
            Estimate Std. Error t value  Pr(>|t|)  CI Lower CI Upper  DF
(Intercept)  0.41818    0.02219  18.843 5.371e-68  0.374632   0.4617 998
ASSIGNED     0.05707    0.03142   1.816 6.965e-02 -0.004595   0.1187 998

Multiple R-squared:  0.003293 , Adjusted R-squared:  0.002294 
F-statistic: 3.298 on 1 and 998 DF,  p-value: 0.06965
# Extract ITT for later
ITT <- coef(itt_model)[2]

ITT interpretation

  • Let’s see what the coefficients table means:
  • (Intercept):
    • Estimate: 0.41818
    • The intercept is the predicted value of Y (opinion change) when ASSIGNED = 0 (control group)
    • This is approximately the average rate of opinion change in the control group
    • Looking back at our table, the “% Reporting change (total N)” in the control group was 41.8% or 0.418. Pretty good!
  • ASSIGNED:
    • Estimate: 0.05707
    • This is the coefficient for ASSIGNED
    • It’s the estimated difference in the average value of Y (opinion change) between those assigned to the treatment condition and those who were not
    • This is our regression estimate of the ITT!
    • And it’s very close to the 0.057 we calculated by hand before
  • The p-value (0.07) is slightly above 0.05, so the ITT estimate is only marginally significant

IV estimation in R

ITT_D

  • So the first step is done! 🥳
  • Now let’s estimate the ITT_D
  • \(ITT_D = 0.204\), correct?
  • We will use the lm_robust function again, but this time with the TREATED variable
# Estimate ITT_D
itt_d_model <- lm_robust(TREATED ~ ASSIGNED)
summary(itt_d_model)

Call:
lm_robust(formula = TREATED ~ ASSIGNED)

Standard error type:  HC2 

Coefficients:
            Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper  DF
(Intercept)   0.1616    0.01656   9.759 1.503e-21   0.1291   0.1941 998
ASSIGNED      0.2047    0.02711   7.552 9.667e-14   0.1515   0.2579 998

Multiple R-squared:  0.05379 ,  Adjusted R-squared:  0.05284 
F-statistic: 57.03 on 1 and 998 DF,  p-value: 9.667e-14
# Extract ITT_D for later
ITT_D <- coef(itt_d_model)[2]

ITT_D interpretation

  • Let’s see what the coefficients table means this time:
  • (Intercept):
    • Estimate: 0.1616
    • The intercept is the predicted value of TREATED when ASSIGNED = 0 (control group)
    • This is approximately the average rate of watching the debate in the control group
    • Looking back at our table, \(80/495 \approx 16.2\)% of the control group watched the debate, or 0.162 indeed!
  • ASSIGNED:
    • Estimate: 0.2047
    • This is the coefficient for ASSIGNED = 1
    • It’s the estimated difference in the average value of TREATED between those assigned to the treatment condition and those who were not
    • This is our regression estimate of the ITT_D!
    • So those assigned to the treatment group were about 20 percentage points more likely to watch the debate than those who were not
    • The p-values are quite low this time, so the results are statistically significant

Putting it all together

IV estimation of CACE

  • Now that we have both ITT and ITT_D, we can calculate the CACE
# Calculate CACE
cace <- iv_robust(Y ~ TREATED | ASSIGNED)
summary(cace)

Call:
iv_robust(formula = Y ~ TREATED | ASSIGNED)

Standard error type:  HC2 

Coefficients:
            Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper  DF
(Intercept)   0.3731    0.04357   8.564 4.066e-17   0.2876   0.4586 998
TREATED       0.2787    0.15301   1.822 6.878e-02  -0.0215   0.5790 998

Multiple R-squared:  0.00992 ,  Adjusted R-squared:  0.008928 
F-statistic: 3.319 on 1 and 998 DF,  p-value: 0.06878
# Check value
ITT/ITT_D 
 ASSIGNED 
0.2787494 

CACE interpretation

  • Our final table indicates that:
  • (Intercept):
    • Estimate: 0.3731
    • This is the predicted value of Y when (instrumented) TREATED = 0
    • It is close to, but no longer exactly, the control group’s rate of opinion change, because TREATED is instrumented by ASSIGNED
  • TREATED:
    • Estimate: 0.2787
    • It’s the estimated difference in the average value of Y (opinion change) between those who were treated and those who were not
    • This is our regression estimate of the CACE, our main estimate in this experiment
  • The p-value (0.069) is just above 0.05, so the estimate is not statistically significant at conventional levels

Now you’re experts in non-compliance 😎

Discussing the assumptions in this model

And how to deal with them in practice (in your own experiments!)

  • Let’s start with the monotonicity assumption
  • In this case, Defiers would be people who watch the debate only when encouraged to watch the non-political programme, and skip it when encouraged to watch it
  • This behaviour does seem a little weird, but it’s not impossible
  • Suppose some people get bored and just flip from one channel to another and end up watching the debate
  • Remember that, when we have Defiers, the ratio \(\frac{ITT_Y}{ITT_D}\) no longer isolates the Compliers: \(ITT_D = \pi_{C} - \pi_{D}\), and \(ITT_Y\) mixes Complier and Defier effects
  • How big a problem is this in practice? It depends on the context
  • If the share of Defiers is small, the bias they introduce is also small, and the CACE remains a reasonable estimate
  • So what to do?
    • Gerber and Green argue that if you have a large sample size, the ATE amongst defiers will probably be close to that of compliers. Do you agree?
    • Create treatments that more closely align with the preferences of the subjects
    • Check if any reverse psychology or reason to resist the treatment exists
    • Check if the control condition seems unfair or unattractive, so that people would resist it
    • Finally, check whether your \(ITT_D\) is very small: a weak first stage makes the CACE estimate unstable and more sensitive to even a few Defiers (see the simulation sketch below)
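To get a feel for how much a few Defiers actually matter, here is a small simulation sketch. All shares and effect sizes are made up for illustration: we inject 5% Defiers and compare the IV estimate with the true effect among Compliers.

library(estimatr)

set.seed(385)
N <- 10000
# Hypothetical mix of compliance types, including 5% Defiers
type <- sample(c("complier", "always", "never", "defier"), N, replace = TRUE,
               prob = c(0.60, 0.15, 0.20, 0.05))
Z <- rbinom(N, 1, 0.5)                        # random assignment
D <- ifelse(type == "always", 1,
     ifelse(type == "never", 0,
     ifelse(type == "complier", Z, 1 - Z)))   # Defiers do the opposite of Z
tau <- ifelse(type == "complier", 0.30, 0.10) # true treatment effects by type (made up)
Y <- rbinom(N, 1, 0.30 + tau * D)             # binary outcome

iv_fit <- iv_robust(Y ~ D | Z)
coef(iv_fit)["D"]  # compare with the true Complier effect of 0.30

With only 5% Defiers the IV estimate is only mildly biased away from 0.30; the bias grows as the Defier share rises or as \(ITT_D\) shrinks.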

Exclusion restriction

  • The exclusion restriction is another key assumption in this model
  • It states that the instrumental variable (Z) affects the outcome (Y) only through the treatment (D)
  • In our example, the assignment to watch the debate (Z) should only affect opinion change (Y) through watching the debate (D)
  • Oh well… This is a tough one
  • Imagine that people who watch the debate start reading more newspapers and become more informed, which leads to opinion change
  • In this case, the exclusion restriction is violated, and the IV estimate is biased
  • Another possible violation is when people are aware of the observer’s presence and change their behaviour
  • This is a serious problem in survey experiments about sensitive topics, like voting intentions or drug use
  • In these cases, people may lie about their behaviour to avoid embarrassment or legal consequences
  • So the effect will be overestimated because people are more likely to report what they think the observer wants to hear
  • How to deal with this? Difficult to say!
  • In my experience, the best solution is to use objective measures whenever possible, that is, indicators that do not depend on self-reporting
  • Do you want to talk more about this? Let’s discuss! 😉

How would you improve the experiment? 🤓

Let’s discuss the debate study for a bit

  • How do you think we could lower the number of defiers in this study?
  • What about the exclusion restriction? How could we make sure that the IV only affects the outcome through the treatment?
  • Do you think that people’s behaviour would change if they knew they were being observed? How could we mitigate this problem?
  • Finally, do you think that the results are generalisable? Why or why not?
  • Think about your own experiments: How would you deal with these issues?

That’s all for today! 🥳

Summary

  • Today we discussed two-sided non-compliance, which is even more complex than one-sided non-compliance
  • We have four compliance types: Compliers, Never-takers, Always-takers, and Defiers
  • We also discussed the monotonicity assumption, which states that no one does the opposite of their assignment
  • We saw how to estimate the CACE using IV methods and how to interpret the results (again!)
  • Finally, we discussed the exclusion restriction and how to deal with it in practice
  • Do you have any questions? Let me know! 😉

Next class

  • Next class, we will discuss attrition (missing data in the outcome)
  • We will see how to deal with missing data in experiments and how to impute missing values (should you do it?)
  • Please also send me your ideas for the final project! I’m looking forward to hearing from you! 🤓
  • Thanks to all of you who have already done so! You’re awesome! 🥳

Thank you! 🙏

See you next time! 😉