QTM 385 - Experimental Methods

Lecture 10 - Two-sided non-compliance

Danilo Freire

Emory University

Hi, there!
Hope all is well! 😉

Brief recap 📚

Last class

  • One-sided non-compliance occurs when units in the treatment group fail to receive the intervention, while control group members remain unaffected
  • This scenario introduces two key groups:
    • Compliers: Subjects who accept treatment when assigned
    • Never-takers: Subjects who reject treatment regardless of assignment
  • The intent-to-treat (ITT) effect measures the impact of treatment assignment, while the complier average causal effect (CACE) estimates the true effect on those who actually received treatment
  • CACE can be calculated via: \(CACE = \frac{ITT_Y}{ITT_D}\)
  • Instrumental variables (IV) methods like two-stage least squares (2SLS) are preferred for estimation:
    • Regress treatment receipt on assignment (D ~ Z)
    • Use predicted treatment status to estimate outcome effects (Y ~ D_hat)
    • estimatr::iv_robust(Y ~ D | Z, data = data)
  • Placebo designs help validate IV assumptions by testing compliers against non-treatment conditions
  • We should anticipate non-compliance through large sample sizes and robust experimental designs to maintain internal validity

Today’s plan 📋

Two-sided non-compliance

  • Two-sided non-compliance:
    • Treatment group: Some units don’t receive treatment
    • Control group: Some units access treatment externally
  • Four compliance types:
    • Compliers: Follow assigned treatment
    • Never-takers: Never receive treatment
    • Always-takers: Always seek treatment
    • Defiers: Do opposite of assignment
  • Non-ignorable selection and cross-contamination between arms
  • Monotonicity (no defiers)
  • Encouragement designs, double randomisation, and noncompliance-adjusted power calculations

Source: Alves (2022)

Two-sided non-compliance 🤔

Non-compliance: You already know it’s a problem

Now it will get worse 😂

  • Last class, we discussed one-sided non-compliance
  • We saw that the ATE is not identified in this scenario, because comparing those who actually received treatment with those who did not reintroduces selection bias
  • We also learned how to estimate the CACE using IV methods: it is the difference in observed treatment and control group outcomes divided by the proportion of subjects who are Compliers
  • Today, we will discuss two-sided non-compliance, which is even more complex
  • In this scenario, some subjects in the treatment group do not receive treatment, while some in the control group do
  • Underestimation of treatment effect (usually): If some in the treatment group don’t comply, and some in the control group get treatment, the difference between the groups in terms of actual treatment received becomes smaller
  • This can make it look like the treatment is less effective than it actually is. We “dilute” the treatment effect 🤓

Compliance types 📊

Four types of compliance

  • So far, we have discussed Compliers and Never-takers
  • In two-sided non-compliance, we also have Always-takers and Defiers
  • Always-takers: Subjects who always seek treatment, regardless of assignment
  • Defiers: Subjects who do the opposite of their assignment: Imagine stubborn teenagers! 😂
  • Many experiments face this issue, especially in social sciences
    • Encouragement designs: For example, some students who receive private school vouchers still attend public schools, while some students in the control group attend private schools even without vouchers
    • Natural experiments: A lottery determined who would be drafted for the Vietnam War, but some drafted men avoided service, while some who were not drafted volunteered
  • Fortunately, the estimation is similar to one-sided non-compliance, just with more assumptions

Four types of compliance

  • More formally, we have the following:

  • \(Z_i\) is the treatment assignment, \(D_i\) is the treatment receipt, and \(Y_i\) is the outcome

  • Compliers: \(D_i(1) = 1\) and \(D_i(0) = 0\). Equivalently, \(D_i(1) \gt D_i(0)\)

  • Never-takers: \(D_i(1) = 0\) and \(D_i(0) = 0\)

  • Always-takers: \(D_i(1) = 1\) and \(D_i(0) = 1\)

  • Defiers: \(D_i(1) = 0\) and \(D_i(0) = 1\). These are usually rare, though

  • Connections between observed data and compliance types:

|             | \(Z_i = 0\)             | \(Z_i = 1\)              |
|-------------|-------------------------|--------------------------|
| \(D_i = 0\) | Never-taker or Complier | Never-taker or Defier    |
| \(D_i = 1\) | Always-taker or Defier  | Always-taker or Complier |
  • Notice that treatment assignment has no effect on whether always-takers or never-takers are treated
    • Always-takers are treated regardless of assignment, while never-takers are never treated
  • Defiers and compliers, on the other hand, respond to treatment assignment, but in opposite ways
    • So the problem is that we can’t tell who is who, and this makes estimation difficult! 🤔

How to solve this? 🤔

Motivating example: Candidate debate study

  • Mullainathan et al (2010) designed a study to measure the impact of watching a political debate on voting intentions
  • Treatment group: Encouraged to watch debate
  • Control group: Encouraged to watch non-political programme
  • Treatment defined as self-reported debate viewing
  • Always-Takers: Watch debate regardless of encouragement
  • Never-Takers: Never watch debate, even if encouraged
  • Compliers: Watch debate only when encouraged
  • Defiers: Watch debate only when discouraged (watch non-political programme)
  • Compliance type is a fixed attribute in this design
    • If the design was different, compliance types could change!

Quantifying compliance

Estimating group sizes

  • Let \(\pi_{AT}\), \(\pi_{NT}\), \(\pi_{C}\), \(\pi_{D}\) denote proportions of Always-Takers, Never-Takers, Compliers, and Defiers

  • Formulas to estimate these proportions (see the R sketch after this list):

    • Always-Takers’ share (\(\pi_{AT}\)): \[ \pi_{AT} = \frac{1}{N} \sum_{i=1}^{N} d_i(1)d_i(0) \]
    • Never-Takers’ share (\(\pi_{NT}\)): \[ \pi_{NT} = \frac{1}{N} \sum_{i=1}^{N} (1-d_i(1))(1-d_i(0)) \]
    • Compliers’ share (\(\pi_{C}\)): \[ \pi_{C} = \frac{1}{N} \sum_{i=1}^{N} d_i(1)(1-d_i(0)) \]
    • Defiers’ share (\(\pi_{D}\)): \[ \pi_{D} = 1 - \pi_{AT} - \pi_{NT} - \pi_{C} \]
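To see these formulas in action, here is a minimal R sketch with simulated potential treatment states d1 = \(d_i(1)\) and d0 = \(d_i(0)\). The shares and variable names are made up for illustration; in real data we never observe both potential states for the same subject.

# Simulated potential treatment states (illustration only; never both observed in practice)
set.seed(385)
N <- 1000
type <- sample(c("complier", "never", "always"), N, replace = TRUE,
               prob = c(0.21, 0.63, 0.16))            # hypothetical shares, no Defiers
d1 <- ifelse(type %in% c("complier", "always"), 1, 0) # treatment received if assigned Z = 1
d0 <- ifelse(type == "always", 1, 0)                  # treatment received if assigned Z = 0

pi_AT <- mean(d1 * d0)              # treated under both assignments
pi_NT <- mean((1 - d1) * (1 - d0))  # untreated under both assignments
pi_C  <- mean(d1 * (1 - d0))        # treated only when assigned
pi_D  <- 1 - pi_AT - pi_NT - pi_C   # whatever is left over (zero here)
round(c(AT = pi_AT, NT = pi_NT, C = pi_C, D = pi_D), 2)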

Quantifying compliance

Numbers from the debate study

  • Under random assignment, the assigned treatment group has the same expected shares of Always-Takers, Never-Takers, Compliers, and Defiers as the assigned control group
    • Right? Why or why not?
  • In the control group, the untreated subjects are either Never-Takers or Compliers
    • The study of the New York City mayoral debates found that 84% of the control group reported not watching the debate, so \(\hat{\pi}_{NT} + \hat{\pi}_{C} = 0.84\)
    • Subjects in the control group who watched the debate are either Always-Takers or Defiers, and \(\hat{\pi}_{AT} + \hat{\pi}_{D} = 0.16\)
  • In the treatment group, 37% of the subjects reported watching the debate
    • These subjects must be either Always-Takers or Compliers, so \(\hat{\pi}_{AT} + \hat{\pi}_{C} = 0.37\)
    • The remaining 63% are either Never-Takers or Defiers, so \(\hat{\pi}_{NT} + \hat{\pi}_{D} = 0.63\)
  • But we don’t know how many people are in each group, as mentioned before
  • We can estimate these proportions with a trick…

Monotonicity 📏

Monotonicity assumption

No defiers allowed! 😂

  • Monotonicity is a key assumption in two-sided non-compliance
  • It states that no subject is a Defier: No one does the opposite of their assignment
  • This is a strong assumption, as it is always possible that someone does the opposite of what they were told
  • But if we can assume there are no Defiers, the identification problem goes away! 😂
  • In our previous example:
    • If we assume that no one watched the debate when they were told not to, we can estimate the proportions of each group

Monotonicity assumption

Simplifying the model

  • Assume no Defiers (\(\pi_{D} = 0\)) to simplify estimation
  • With \(\pi_{D} = 0\), we can estimate other proportions:
    • Always-Takers (\(\hat{\pi}_{AT}\)):
    • \(\hat{\pi}_{AT} + \hat{\pi}_{D} = 0.16 \implies \hat{\pi}_{AT} = 0.16\)
    • Never-Takers (\(\hat{\pi}_{NT}\)):
    • \(\hat{\pi}_{NT} + \hat{\pi}_{D} = 0.63 \implies \hat{\pi}_{NT} = 0.63\)
    • Compliers (\(\hat{\pi}_{C}\)):
    • \(\hat{\pi}_{AT} + \hat{\pi}_{C} = 0.37 \implies \hat{\pi}_{C} = 0.37 - \hat{\pi}_{AT} = 0.37 - 0.16 = 0.21\)
    • Alternatively, using:
    • \(\hat{\pi}_{NT} + \hat{\pi}_{C} = 0.84 \implies \hat{\pi}_{C} = 0.84 - \hat{\pi}_{NT} = 0.84 - 0.63 = 0.21\)
  • Both calculations yield \(\hat{\pi}_{C} = 0.21\) (the same arithmetic in R is sketched below)
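As a quick sketch, the same back-of-the-envelope arithmetic in R, using the observed shares from the debate study and assuming no Defiers:

# Observed shares of self-reported debate viewing
p_watch_control   <- 0.16   # watched in the control group   -> AT + D
p_watch_treatment <- 0.37   # watched in the treatment group -> AT + C

# Under monotonicity, pi_D = 0, so the system solves directly
pi_AT <- p_watch_control             # 0.16
pi_C  <- p_watch_treatment - pi_AT   # 0.21
pi_NT <- 1 - pi_AT - pi_C            # 0.63
c(AT = pi_AT, C = pi_C, NT = pi_NT)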

Now the easy part, CACE estimation 🤓

Estimating CACE

Now we know what to do! 😉

  • Once we rule out Defiers, we can estimate the CACE using the same formula as before
  • The CACE is the difference in observed treatment and control group outcomes divided by the proportion of subjects who are Compliers
  • Although two-sided noncompliance introduces the possibility that some subjects are Always-Takers, they pose no identification problems
  • Always-Takers have no effect on the \(ITT\), and the share of Always-Takers is differenced away when we calculate the \(ITT_D\)
  • Why so? Because they are treated regardless of assignment, so they don’t affect the treatment effect!

ITT decomposition with no defiers

  • Intent-to-treat effect on outcome (\(ITT_Y\)) can be decomposed:

    \[ ITT_Y = ITT_{Y,co} \pi_{co} + \underbrace{ITT_{Y,at} \pi_{at}}_{=0 \text{ (ER)}} + \underbrace{ITT_{Y,nt} \pi_{nt}}_{=0 \text{ (ER)}} + \underbrace{ITT_{Y,de} \pi_{de}}_{=0 \text{ (mono)}} \]

  • Under Exclusion Restriction (ER) and Monotonicity (mono) assumptions, ITT simplifies to:

    \[ ITT_Y = ITT_{Y,co} \pi_{co} \]

  • Same identification result:

    \[ \tau_{LATE} = \frac{ITT_Y}{ITT_D} \]

Back to our example 📊

Estimating CACE in the debate study

Let’s see how all this works in practice! 😉

  • Remember Mullainathan et al’s (2010) initial experimental design:
    • Treatment group: Encouraged to watch debate
    • Control group: Encouraged to watch non-political programme
    • Treatment defined as self-reported debate viewing
  • Now let’s analyse a subset of their data in the table below:
|                                  | Treatment group | Control group |
|----------------------------------|-----------------|---------------|
| % Reporting change (N treated)   | 59.5 (185)      | 50.0 (80)     |
| % Reporting change (N untreated) | 40.6 (320)      | 40.2 (415)    |
| % Reporting change (total N)     | 47.5 (505)      | 41.8 (495)    |

Source: Mullainathan, Washington, and Azari 2010.

Download the data from our GitHub repository

Does it seem confusing? 🤔

Let’s break it down! 😉

  • The study \(N\) is 1000, with 505 in the treatment group and 495 in the control group
    • These are the total \(N\) in each group, in parentheses, last row of the table
  • What about those who were actually treated?
    • 185 in the treatment group and 80 in the control group (huge non-compliance!)
    • These are the \(N\) treated, in parentheses, first row of the table
  • And those who were not treated?
    • 320 in the treatment group and 415 in the control group
    • These are the \(N\) untreated, in parentheses, second row of the table
  • How to calculate CACE here? Remember that \(CACE = \frac{ITT_Y}{ITT_D}\)
  • \(ITT_Y\) is the difference in outcomes between treatment and control groups
    • \(ITT_Y = 47.5 - 41.8 = 5.7\) percentage points, or \(0.057\) as a proportion
  • So far so good? 😉

Calculating CACE

Let’s do the maths again! 😉

  • \(ITT_Y = 0.057\), right?
  • Now let’s calculate \(ITT_D\):
    • \(ITT_D = 36.6 - 16.2 = 20.4\) percentage points, or \(0.204\) as a proportion
  • Wait a minute! Where do these numbers come from?
    • Remember that only 185 people were treated in the treatment group
    • \(\frac{185}{505} \times 100 \approx 36.6\)% of the treatment group actually watched the debate! Aha! 😂
  • Control group “treatment” rate (non-compliance):
    • \(\frac{80}{495} \times 100 \approx 16.2\)% of the control group watched the debate
  • So, \(ITT_D = 20.4\) percentage points, or \(0.204\) as a proportion. Let’s work with proportions from here on 😉
  • Finally, \(CACE = \frac{ITT_Y}{ITT_D} = \frac{0.057}{0.204} \approx 0.28\)
  • So, the CACE is \(0.28\)! Woo-hoo! 🥳
  • In fancy words: “The estimate suggests that watching the debates raises the rate at which Compliers report opinion change by 28 percentage points.” The same arithmetic is sketched in R below
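Here is that hand calculation as a short R sketch, using the counts from the table above:

# Counts from the Mullainathan, Washington, and Azari (2010) table
n_treat <- 505; n_control <- 495       # assigned group sizes
d_treat <- 185; d_control <- 80        # number who actually watched the debate
y_treat <- 0.475; y_control <- 0.418   # share reporting opinion change

ITT_Y <- y_treat - y_control                        # effect of assignment on the outcome
ITT_D <- d_treat / n_treat - d_control / n_control  # effect of assignment on viewing
CACE  <- ITT_Y / ITT_D                              # complier average causal effect
round(c(ITT_Y = ITT_Y, ITT_D = ITT_D, CACE = CACE), 3)
# ITT_Y ~ 0.057, ITT_D ~ 0.205, CACE ~ 0.28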

Wait, there’s more! 🤓

Using IVs to estimate CACE

  • Remember that I said we could use IV methods to estimate the CACE?
  • It works the same way as in one-sided non-compliance, but with the “no defiers” (monotonicity) assumption
  • We regress treatment receipt on assignment, and then use predicted treatment status to estimate outcome effects
  • The formula is the same: estimatr::iv_robust(Y ~ D | Z, data = data)
  • The estimatr package is your friend here once again! 😉
  • Since we don’t live in the Stone Age any longer, we can use R to do all the hard work for us! 😂
  • The script is also available in our GitHub repository
  • Let’s do it together, step-by-step, to make sure we all understand what the coefficients mean 🤓

IV estimation in R

ITT

  • First, let’s load the data and estimate the ITT
  • Remember that \(ITT_Y = 0.057\) when we calculated it manually?
  • We will use the lm_robust function to estimate the ITT
  • Can you explain what the coefficients mean?
library(estimatr)

# Load the data
df <- read.csv("./mullainathan.csv")

# Rename variables
ASSIGNED <- df$watch
TREATED <- df$watchdps
Y <- df$ochange

# Estimate ITT
itt_model <- lm_robust(Y ~ ASSIGNED)
summary(itt_model)

Call:
lm_robust(formula = Y ~ ASSIGNED)

Standard error type:  HC2 

Coefficients:
            Estimate Std. Error t value  Pr(>|t|)  CI Lower CI Upper  DF
(Intercept)  0.41818    0.02219  18.843 5.371e-68  0.374632   0.4617 998
ASSIGNED     0.05707    0.03142   1.816 6.965e-02 -0.004595   0.1187 998

Multiple R-squared:  0.003293 , Adjusted R-squared:  0.002294 
F-statistic: 3.298 on 1 and 998 DF,  p-value: 0.06965
# Extract ITT for later
ITT <- coef(itt_model)[2]

ITT interpretation

  • Let’s see what the coefficients table means:
  • (Intercept):
    • Estimate: 0.41818
    • The intercept is the predicted value of Y (opinion change) when ASSIGNED = 0 (control group)
    • This is approximately the average rate of opinion change in the control group
    • Looking back at our table, the “% Reporting change (total N)” in the control group was 41.8% or 0.418. Pretty good!
  • ASSIGNED:
    • Estimate: 0.05707
    • This is the coefficient for ASSIGNED
    • It’s the estimated difference in the average value of Y (opinion change) between those assigned to the treatment condition and those who were not
    • This is our regression estimate of the ITT!
    • And it’s very close to the 0.057 we calculated by hand before
  • The p-value (0.07) is slightly above 0.05, so the ITT estimate is only marginally significant

IV estimation in R

ITT_D

  • So the first step is done! 🥳
  • Now let’s estimate the ITT_D
  • \(ITT_D = 0.204\), correct?
  • We will use the lm_robust function again, but this time with the TREATED variable
# Estimate ITT_D
itt_d_model <- lm_robust(TREATED ~ ASSIGNED)
summary(itt_d_model)

Call:
lm_robust(formula = TREATED ~ ASSIGNED)

Standard error type:  HC2 

Coefficients:
            Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper  DF
(Intercept)   0.1616    0.01656   9.759 1.503e-21   0.1291   0.1941 998
ASSIGNED      0.2047    0.02711   7.552 9.667e-14   0.1515   0.2579 998

Multiple R-squared:  0.05379 ,  Adjusted R-squared:  0.05284 
F-statistic: 57.03 on 1 and 998 DF,  p-value: 9.667e-14
# Extract ITT_D for later
ITT_D <- coef(itt_d_model)[2]

ITT_D interpretation

  • Let’s see what the coefficients table means this time:
  • (Intercept):
    • Estimate: 0.1616
    • The intercept is the predicted value of TREATED when ASSIGNED = 0 (control group)
    • This is approximately the average rate of watching the debate in the control group
    • Looking back at our table, \(80/495 \approx 16.2\)% of the control group watched the debate, or 0.162 indeed!
  • ASSIGNED:
    • Estimate: 0.2047
    • This is the coefficient for ASSIGNED = 1
    • It’s the estimated difference in the average value of TREATED between those assigned to the treatment condition and those who were not
    • This is our regression estimate of the ITT_D!
    • So those assigned to the treatment group were about 20 percentage points more likely to watch the debate than those who were not
    • The p-values are quite low this time, so the results are statistically significant

Putting it all together

IV estimation of CACE

  • Now that we have both ITT and ITT_D, we can calculate the CACE
# Calculate CACE
cace <- iv_robust(Y ~ TREATED | ASSIGNED)
summary(cace)

Call:
iv_robust(formula = Y ~ TREATED | ASSIGNED)

Standard error type:  HC2 

Coefficients:
            Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper  DF
(Intercept)   0.3731    0.04357   8.564 4.066e-17   0.2876   0.4586 998
TREATED       0.2787    0.15301   1.822 6.878e-02  -0.0215   0.5790 998

Multiple R-squared:  0.00992 ,  Adjusted R-squared:  0.008928 
F-statistic: 3.319 on 1 and 998 DF,  p-value: 0.06878
# Check value
ITT/ITT_D 
 ASSIGNED 
0.2787494 

CACE interpretation

  • Our final table indicates that:
  • (Intercept):
    • Estimate: 0.3731
    • This is the predicted value of Y when (instrumented) TREATED = 0
    • It is close to, but no longer exactly, the control group’s rate of opinion change, because TREATED is instrumented by ASSIGNED
  • TREATED:
    • Estimate: 0.2787
    • It’s the estimated difference in the average value of Y (opinion change) between those who were treated and those who were not
    • This is our regression estimate of the CACE, our main estimate in this experiment
  • The p-value (0.069) is just above 0.05, so the estimate is not statistically significant at conventional levels

Now you’re experts in non-compliance 😎

Discussing the assumptions in this model

And how to deal with them in practice (in your own experiments!)

  • Let’s start with the monotonicity assumption
  • In this case, Defiers would be people who watch the debate only when encouraged to watch the non-political programme, and skip it when encouraged to watch it
  • This behaviour does seem a little weird, but it’s not impossible
  • Suppose some people get bored and just flip from one channel to another and end up watching the debate
  • Remember that, when we have Defiers, the ratio \(\frac{ITT_Y}{ITT_D}\) no longer isolates the Compliers: \(ITT_D = \pi_{C} - \pi_{D}\), and \(ITT_Y\) mixes Complier and Defier effects
  • How big a problem is this in practice? It depends on the context
  • If the share of Defiers is small, the bias they introduce is also small, and the CACE remains a reasonable estimate
  • So what to do?
    • Gerber and Green argue that if you have a large sample size, the ATE amongst defiers will probably be close to that of compliers. Do you agree?
    • Create treatments that more closely align with the preferences of the subjects
    • Check if any reverse psychology or reason to resist the treatment exists
    • Check if the control condition seems unfair or unattractive, so that people would resist it
    • Finally, check whether your \(ITT_D\) is very small: a weak first stage makes the CACE estimate unstable and more sensitive to even a few Defiers (see the simulation sketch below)
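To get a feel for how much a few Defiers actually matter, here is a small simulation sketch. All shares and effect sizes are made up for illustration: we inject 5% Defiers and compare the IV estimate with the true effect among Compliers.

library(estimatr)

set.seed(385)
N <- 10000
# Hypothetical mix of compliance types, including 5% Defiers
type <- sample(c("complier", "always", "never", "defier"), N, replace = TRUE,
               prob = c(0.60, 0.15, 0.20, 0.05))
Z <- rbinom(N, 1, 0.5)                        # random assignment
D <- ifelse(type == "always", 1,
     ifelse(type == "never", 0,
     ifelse(type == "complier", Z, 1 - Z)))   # Defiers do the opposite of Z
tau <- ifelse(type == "complier", 0.30, 0.10) # true treatment effects by type (made up)
Y <- rbinom(N, 1, 0.30 + tau * D)             # binary outcome

iv_fit <- iv_robust(Y ~ D | Z)
coef(iv_fit)["D"]  # compare with the true Complier effect of 0.30

With only 5% Defiers the IV estimate is only mildly biased away from 0.30; the bias grows as the Defier share rises or as \(ITT_D\) shrinks.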

Exclusion restriction

  • The exclusion restriction is another key assumption in this model
  • It states that the instrumental variable (Z) affects the outcome (Y) only through the treatment (D)
  • In our example, the assignment to watch the debate (Z) should only affect opinion change (Y) through watching the debate (D)
  • Oh well… This is a tough one
  • Imagine that people who watch the debate start reading more newspapers and become more informed, which leads to opinion change
  • In this case, the exclusion restriction is violated, and the IV estimate is biased
  • Another possible violation is when people are aware of the observer’s presence and change their behaviour
  • This is a serious problem in survey experiments about sensitive topics, like voting intentions or drug use
  • In these cases, people may lie about their behaviour to avoid embarrassment or legal consequences
  • So the effect will be overestimated because people are more likely to report what they think the observer wants to hear
  • How to deal with this? Difficult to say!
  • In my experience, the best solution is to use objective measures whenever possible, that is, indicators that do not depend on self-reporting
  • Do you want to talk more about this? Let’s discuss! 😉

How would you improve the experiment? 🤓

Let’s discuss the debate study for a bit

  • How do you think we could lower the number of defiers in this study?
  • What about the exclusion restriction? How could we make sure that the IV only affects the outcome through the treatment?
  • Do you think that people’s behaviour would change if they knew they were being observed? How could we mitigate this problem?
  • Finally, do you think that the results are generalisable? Why or why not?
  • Think about your own experiments: How would you deal with these issues?

That’s all for today! 🥳

Summary

  • Today we discussed two-sided non-compliance, which is even more complex than one-sided non-compliance
  • We have four compliance types: Compliers, Never-takers, Always-takers, and Defiers
  • We also discussed the monotonicity assumption, which states that no one does the opposite of their assignment
  • We saw how to estimate the CACE using IV methods and how to interpret the results (again!)
  • Finally, we discussed the exclusion restriction and how to deal with it in practice
  • Do you have any questions? Let me know! 😉

Next class

  • Next class, we will discuss attrition (missing data in the outcome)
  • We will see how to deal with missing data in experiments and how to impute missing values (should you do it?)
  • Please also send me your ideas for the final project! I’m looking forward to hearing from you! 🤓
  • Thanks to all of you who have already done so! You’re awesome! 🥳

Thank you! 🙏

See you next time! 😉