Two paragraphs (maximum) summarising an experiment that you wish to develop in this course. At a minimum, your summary should include a research question, why the question is important, and a rough outline of how you plan to answer the question.
We’ll be working on a little bit of the task each week during class, and I’ll be posting the week’s assignments on the website. It will be fun! 😊
Brief recap 📚
Last class
Clustering:
Assigns whole groups (e.g. classrooms) to treatment due to practical constraints
Introduces intra-cluster correlation (ICC) which increases variance
Requires cluster-robust standard errors and careful power calculations
The effective sample size is always smaller than the actual sample size, sometimes substantially so!
A few suggestions on how to deal with clustering:
Increase the number of clusters
Increase the number of units per cluster
Use pair-matching (or any type of blocking) to improve precision
Statistical power:
Power = probability of detecting true effects (aim for ≥ 80%)
Influenced by: effect size magnitude, outcome variability, sample size, and significance level
DeclareDesign package enables power simulation through:
Model declaration, treatment effect estimation, design diagnosis across sample sizes
You can use any power calculator to estimate power, but DeclareDesign works with any type of design, which can be difficult to handle using traditional formulas
Power curves show how power changes with sample size
Today’s plan 📋
One-sided non-compliance
One-sided non-compliance
Compliers versus never-takers
Intent-to-treat (ITT) effect versus average treatment effect (ATE)
Complier average causal effect (CACE) is the effect of the treatment on the compliers
Instrumental variables (IV) can be used to estimate the CACE
Two-stage least squares (2SLS) is the most common IV method
In experimental research, compliance is the extent to which participants follow the treatment assignment
Under full compliance, all participants follow the treatment assignment (and that’s what we want!)
Non-compliance occurs when participants do not follow the treatment assignment
In everyday language, compliance and non-compliance have a negative connotation, but in research, they are neutral terms
Non-compliance is a problem because it undermines the internal validity of the study
Today we will examine one-sided non-compliance, which is when units in the treatment group do not receive the treatment
Those in the control group are not affected by this issue
Next class, we will discuss two-sided non-compliance, which is when some people in the treatment group do not receive the treatment and some people in the control group do receive the treatment
This complicates analysis quite a bit, but we have methods to deal with it 🤓
Imagine that you are interested in studying the effect of canvassing on voter turnout
Maybe if you knock on people’s doors and talk to them about the importance of voting, they will be more likely to vote!
You design an experiment where you randomly assign 1000 people to receive canvassing (treatment group) and 1000 people to not receive canvassing (control group)
However… in experiments like this, typically only about 25% of the people in the treatment group are actually canvassed
The rest are not home, refuse to talk, etc.
So we have 250 people treated and 1000 in the control group
What would you do? 🤔
Option 01: Just compare the two groups
As-treated analysis
The first option we have is to just compare the two groups as if nothing had happened
So we would compare the 1000 people who were in the treatment group with the 1000 people who were in the control group
Then calculate the average treatment effect (ATE) as the difference between the two groups as we always do
What do you think? 🤔
Option 01: Just compare the two groups
As-treated analysis
The problem with this approach is that it undermines the internal validity of the study
The comparison no longer isolates the effect of the treatment itself, because most of the treatment group never actually received it
We are assuming that the effect of canvassing is zero for the 750 people who were not canvassed
There might be selection bias in the treatment group
For example, maybe the people who were canvassed are more likely to vote anyway
People who refuse to talk to canvassers might be less likely to vote, and so on
So this approach is not recommended 👎
Option 02: Assume random compliance
As-treated analysis
The second option, related to the first, is to assume that the differences between the two groups are random
In other words, we assume that the people who were canvassed are randomly selected from the treatment group
And the fact that only 25% of the people were canvassed is just bad luck
If this is the case, we can drop the people who were not canvassed from the treatment group and compare the 250 people who were canvassed with the 1000 people in the control group
This would recover the true ATE if the assumption were correct
What do you think? 🤔
Option 02: Assume random compliance
As-treated analysis
The problem with this approach is that we cannot test the assumption
We cannot know if the differences between the two groups are random or not
Most likely they are not!
Unless you can really justify that the differences are random, this approach is not recommended 👎
But if you can justify it (good luck with that! 😂), this is okay!
Option 03: Redefine the ATE
Just give people the choice
The third option is to stick to the random assignment and compare the two groups as if everyone had followed the treatment assignment
Instead of comparing the people who were actually canvassed with those who were not canvassed, we compare the people who were assigned to be canvassed with those who were not assigned to be canvassed
The difference here is semantic:
We would be able to recover the true ATE if we had only given people the choice of whether or not to be canvassed
For instance, rather than analysing the effect of Medicaid on health outcomes, we would be analysing the effect of being offered Medicaid on health outcomes
In this definition, non-compliance is impossible
What do you think?
Option 03: Redefine the ATE
Just give people the choice
The problem with this approach is that it underestimates the treatment effect
The average treatment effect is the difference between the outcome of the people who were actually canvassed and the outcome of the people who were not canvassed
But this analysis compares the outcome of the people who were assigned to be canvassed with the outcome of the people who were not assigned to be canvassed
This is not the same as receiving the treatment or not!
So this approach is not recommended either 👎
Option 04: Instrumental variables (IV)
A clever way to deal with non-compliance
The fourth option is to use instrumental variables (IV) (or two-stage least squares - 2SLS)
The benefit of IV is that it allows us to recover the true effect of the programme instead of only the effect of being offered the programme
The downside is that IV does not allow us to recover the true ATE in the whole population
This is because it only measures the effect of the programme on the compliers, that is, those who took the treatment when it was assigned to them
This is the best approach 👍
But first, some definitions and concepts… 🤓
New definitions and assumptions 🤓
Full compliance
In an ideal experiment, we randomly assign each user to a treatment or a control group
All users in the treatment group experience the treatment, and all users in the control group do not experience the treatment
The table below summarises full compliance:
Random assignment (\(Z\))
Treatment status (\(D\))
Treatment
Treated
Control
Untreated
For the next slides, it is useful to introduce some definitions:
\(Z \in \{0, 1\}\) indicates whether a user was assigned to the treatment or the control group (visited by a canvasser or not)
\(D \in \{0, 1\}\) indicates whether a user was treated (actually heard the message)
\(Y\), as always, is the outcome we care about (voter turnout)
In this case, the treatment effect is the difference between the potential outcomes of the treated and untreated users, as we have seen before
One-sided non-compliance
In the case of one-sided non-compliance, some users in the treatment group do not receive the treatment
The table below summarises the situation:
Random assignment (\(Z\))
Treatment status (\(D\))
Treatment
Treated
Untreated
Control
Untreated
In this case, the quantity \(E[Y \mid Z=1] - E[Y \mid Z=0]\) does not represent the treatment effect anymore
Instead, it represents the effect of being assigned to the treatment group only, i.e., the intent-to-treat (ITT) effect
Let’s formalise this a bit more…
One-sided non-compliance
Notation
Let the experimental assignment of subject \(i\) be \(z_i\)
When \(z_i = 1\), the subject is assigned to the treatment group, and when \(z_i = 0\), the subject is assigned to the control group
Let \(d_i(z)\) represent whether subject \(i\) is actually treated, given the assignment \(z_i\)
To make it short, let’s write \(d_i(z = 1)\) as \(d_i(1)\) and \(d_i(z = 0)\) as \(d_i(0)\)
If a subject receives no treatment when assigned to the control group, we represent them as \(d_i(0) = 0\)
For one-sided non-compliance, \(d_i(0) = 0\) for everyone in the control group, but \(d_i(1)\) can be 0 or 1
If \(d_i(1) = 1\), I would open the door if canvassed, but if \(d_i(1) = 0\), I would not open the door
Compliers and never-takers
Two new groups
In the case of one-sided non-compliance, we have two new groups of analysis
Compliers are those who would take the treatment if assigned to the treatment group and would not take the treatment if assigned to the control group
So, \(d_i(1) = 1\) and \(d_i(0) = 0\)
However, we also have a group of people who would not take the treatment even if assigned to the treatment group
These are the never-takers
For them, \(d_i(1) = d_i(0) = 0\)
Thus, the expression \(ATE \mid d_i(1) = 1\) means the average treatment effect (ATE) for the compliers
Keep in mind that the labels “compliers” and “never-takers” are unrelated to the outcomes \(Y_i\); they depend only on treatment take-up \(d_i(z)\)
It is not always easy to define who is a complier in an experiment
What if canvassing happens at weekends but some people are at home only during the week? Are they compliers or never-takers?
If we canvass them during the week instead, are they compliers or never-takers?
First assumption: Non-interference
The first assumption we need to make is that of non-interference
Non-interference means that whether a subject is treated depends only on the subject’s own treatment group assignment
This assumption is strong, difficult to test, and often violated
The intent-to-treat (\(ITT\)) effect of assignment (\(z\)) on treatment status (\(d\)) is defined as:
\[ ITT_{i, D} = d_i (1) - d_i (0) \]
If everyone complies perfectly, then \(d_i(1)\) will be 1 and \(d_i(0)\) will be 0, so the difference is 1
The average \(ITT_{i, D}\) across all subjects is
\[ITT_D = E[ITT_{i, D}] = E[d_i(1)] - E[d_i(0)]\]
That is, the proportion of people who take the treatment when assigned to the treatment group minus the proportion of people who take the treatment when assigned to the control group
In one-sided non-compliance, \(E[d_i(0)] = 0\) for all subjects, so \(ITT_D = E[d_i(1)] \geq 0\)
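To make this concrete, here is a minimal simulation sketch in base R (all names are illustrative; the 25% complier share mirrors the canvassing example):

```r
# Sketch: under one-sided non-compliance, ITT_D equals the share of compliers
set.seed(42)
n <- 10000
z <- rbinom(n, 1, 0.5)          # random assignment
complier <- rbinom(n, 1, 0.25)  # 25% compliers, as in the canvassing example
d <- z * complier               # never-takers stay untreated even if assigned;
                                # no one in the control group is treated
itt_d <- mean(d[z == 1]) - mean(d[z == 0])
itt_d                           # close to 0.25
```

Note that `mean(d[z == 0])` is exactly zero here, which is precisely what one-sided non-compliance means.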
ITT effect on the outcome
The intent-to-treat effect of \(z_i\) on \(Y_i\) for each subject is:
\[ ITT_{i, Y} = Y_i(z = 1) - Y_i(z = 0) \]
and the average across all subjects is \(ITT_Y = E[ITT_{i, Y}]\)
If we have full compliance, \(ITT_{Y}\) is the same as the average treatment effect (ATE)
If not, \(ITT_{Y}\) is the intent-to-treat (ITT) effect: whether a programme “made a difference” in the outcome, regardless of whether people actually took the treatment
Second assumption: Exclusion restriction
The second assumption we need to make is that of the exclusion restriction
The exclusion restriction means that the only way the treatment assignment (\(z\)) affects the outcome (\(Y\)) is through its effect on whether people actually get the treatment (\(d\))
In other words, subjects who end up untreated have the same potential outcome under either assignment:
\(Y_i(z = 1, d = 0) = Y_i(z = 0, d = 0)\)
And the same is true for treated subjects:
\(Y_i(z = 1, d = 1) = Y_i(z = 0, d = 1)\)
In general:
\(Y_i(z, d) = Y_i(d)\)
This assumption is also strong, and the main reason why we have placebos in science!
CACE and IVs 🤓
Complier average causal effect (CACE)
The effect of the treatment on the compliers
As we cannot correctly estimate the ATE with non-compliance, we focus on the complier average causal effect (CACE)
CACE tries to answer this question: “For those individuals who actually heard the message, what is the effect of the message on their likelihood of voting?”
# Select one-person households that were either pure controls or canvass only
sel <- data1$onetreat == 1 & data1$mailings == 0 & data1$phongotv == 0 & data1$persons == 1
# Verify the number of observations
table(sel)
First stage: \(ITT_d\) with robust standard errors
The intercept of zero in this equation indicates that no one in the control group was contacted, in keeping with the definition of one-sided noncompliance
The coefficient 0.273 indicates that assignment to the treatment group caused 27.3% of the targeted subjects to be treated
In other words, the estimated share of Compliers in the treatment group is 27.3%. The 95% CI suggests that this proportion ranges from 25.0% to 29.6%
# Load the required packages
library(AER)      # For IV
library(sandwich) # For robust SEs

# Box 5.5: ITT_D
# Note that results from this will vary from the book
itt_d_fit <- lm(TREATED ~ ASSIGNED, data = data2)
coeftest(itt_d_fit, vcovHC(itt_d_fit))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.5358e-14 1.6258e-16 94.464 < 2.2e-16 ***
ASSIGNED 2.7336e-01 1.1733e-02 23.299 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Estimating the regressions
\(ITT_Y\) with robust standard errors
Here we estimate the ITT of the whole population
Note that we regress the outcome on ASSIGNED, not on TREATED
Actual treatment status can be “endogenous”, that is, related to unobserved factors (\(u_i\)) that affect outcomes, whereas random assignment is not
Those assigned to the treatment group were 3.84 percentage points more likely to vote
The estimated ITT may be a useful thing to know!
If you are conducting an evaluation of a programme, you can use the ITT to assess the programme’s output in relation to its costs
# Box 5.4: ITT with robust SEs
itt_fit <- lm(VOTED ~ ASSIGNED, data = data2)
coeftest(itt_fit, vcovHC(itt_fit))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.375376 0.006446 58.2344 < 2.2e-16 ***
ASSIGNED 0.038464 0.014479 2.6565 0.007914 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Estimating the regressions
CACE using 2SLS
Finally, here we estimate the CACE
It is the effect of the treatment on the compliers
So we could just have used the formula: \(CACE = \frac{ITT_Y}{ITT_D} = \frac{0.038464}{0.2734} \approx 0.1407\)
The estimated average treatment effect of the canvassing treatment among Compliers is a 14.07 percentage point increase in the probability of voting
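With a single binary instrument, 2SLS and the Wald ratio \(ITT_Y / ITT_D\) are numerically identical; a simulated sketch in base R (illustrative numbers, not the canvassing data):

```r
# Sketch: 2SLS with one binary instrument reproduces the Wald ratio ITT_Y / ITT_D
set.seed(123)
n <- 10000
z <- rbinom(n, 1, 0.5)                     # random assignment (the instrument)
complier <- rbinom(n, 1, 0.25)
d <- z * complier                          # one-sided non-compliance
y <- 0.38 + 0.14 * d + rnorm(n, sd = 0.1)  # true effect on compliers = 0.14
itt_y <- mean(y[z == 1]) - mean(y[z == 0])
itt_d <- mean(d[z == 1]) - mean(d[z == 0])
cace_wald <- itt_y / itt_d
# Manual 2SLS: first stage regresses d on z, second stage uses the fitted values
d_hat <- fitted(lm(d ~ z))
cace_2sls <- coef(lm(y ~ d_hat))[["d_hat"]]
c(cace_wald, cace_2sls)                    # the two estimates coincide
```

The equivalence holds because, with one binary instrument, the second-stage slope is \(\text{cov}(Y, Z)/\text{cov}(D, Z)\), which is exactly the ratio of the two differences in means.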
We could have estimated the CACE using the ivreg function from the AER package and gotten the same result
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.375376 0.006446 58.2344 <2e-16 ***
TREATED 0.140711 0.052434 2.6836 0.0073 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Estimating the regressions
Using estimatr
There’s no need to learn how to use ivreg if you don’t want to!
Our familiar estimatr package has a function called iv_robust that does the same thing
The results are the same as before, and we also see that the 95% confidence interval ranges from 0.038 to 0.243, that is, canvassing increases the probability of voting by 3.8 to 24.3 percentage points
The effect is positive and statistically significant
# Box 5.6: CACE
# Load estimatr
library(estimatr)

# CACE with estimatr
cace_fit2 <- iv_robust(VOTED ~ TREATED | ASSIGNED, data = data2)
cace_fit2
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper
(Intercept) 0.3753764 0.00644539 58.239525 0.000000000 0.36274155 0.3880113
TREATED 0.1407115 0.05241688 2.684469 0.007281396 0.03795877 0.2434642
DF
(Intercept) 7088
TREATED 7088
Designs that anticipate non-compliance 🤓
Large-\(n\) designs
Non-compliance not only prevents us from estimating the true ATE, it also makes CACE estimation more challenging.
While 2SLS is a consistent estimator for the CACE, the estimator becomes much less precise if the proportion of compliers is small
So the first advice is to design experiments with large sample sizes to increase the number of compliers
This is not always feasible, though, as it can be expensive
But if you can, do it! 😊
Placebo designs
A more realistic approach is to anticipate non-compliance and include placebo conditions in the experiment
This is done in two steps:
First, subjects are recruited to the study and assigned to treatment and control groups
Second, given compliance, subjects are randomly allocated to two groups:
The treatment group receives the treatment in the usual way
The placebo group receives a “non-treatment” that is assumed to have no effect on the outcome of interest
For instance, we could have a placebo group that receives a fake canvassing treatment, such as information about the importance of recycling or the benefits of exercise
CACE can be estimated by comparing the outcomes for those given the canvassing treatment and those given the “non-treatment”
Placebo designs
Why does this work?
Because the main problem in one-sided non-compliance is the existence of never-takers
But if we randomise the treatment amongst the compliers, we screen out the never-takers by design
Compliers in the treated state can then be compared directly to compliers in the untreated state, which eliminates the noise generated by the never-takers
Thus, amongst the compliers, we are back to full compliance and can estimate the CACE directly!
This is a very powerful tool in experimental research
Think about it when designing your experiments! 😊
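A simulated sketch of the two-step logic in base R (all numbers are hypothetical, and the placebo is assumed to have exactly zero effect):

```r
# Sketch of a placebo design: only compliers can be contacted, and contacted
# subjects are randomly given the real message or a placebo message
set.seed(99)
n <- 10000
complier <- rbinom(n, 1, 0.25)       # only these open the door
placebo <- rbinom(n, 1, 0.5)         # second randomisation amongst contacts
baseline <- 0.35 + 0.10 * complier   # compliers vote more to begin with
treated <- complier == 1 & placebo == 0
y <- baseline + 0.14 * treated + rnorm(n, sd = 0.1)
# Compare compliers given the real treatment with compliers given the placebo
contacted <- complier == 1
est <- mean(y[contacted & placebo == 0]) - mean(y[contacted & placebo == 1])
est                                  # close to the true effect of 0.14
```

The `baseline` term illustrates why the naive as-treated comparison fails (compliers differ from never-takers to begin with) and why the placebo comparison does not: both placebo arms contain only compliers, so the baseline difference cancels out.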
Partial treatment
Finally, what to do when we have partial treatment?
For instance, a subject interrupts the medical treatment before the end
The easiest and most widely used approach is to classify the partially-treated subject as untreated, estimate the CACE, and then classify the subject as treated and estimate the CACE again
Those two estimates provide bounds for the CACE
The lower bound is the estimate when the subject is classified as treated
The upper bound is the estimate when the subject is classified as untreated
While not perfect, this strategy at least provides a range of possible values for the CACE and allows us to quantify the uncertainty in our estimates
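The bounding strategy above can be sketched in base R (simulated data; the partial-treatment effect of 0.05 and all shares are illustrative assumptions):

```r
# Sketch: bound the CACE by coding partially-treated subjects both ways
set.seed(7)
n <- 10000
z <- rbinom(n, 1, 0.5)
status <- sample(c("full", "partial", "none"), n, replace = TRUE,
                 prob = c(0.2, 0.1, 0.7))
d_full <- as.numeric(z == 1 & status == "full")     # fully treated
d_part <- as.numeric(z == 1 & status == "partial")  # partially treated
y <- 0.40 + 0.15 * d_full + 0.05 * d_part + rnorm(n, sd = 0.1)
# Wald ratio ITT_Y / ITT_D for a given coding of treatment status
wald <- function(d) {
  (mean(y[z == 1]) - mean(y[z == 0])) / (mean(d[z == 1]) - mean(d[z == 0]))
}
cace_lower <- wald(d_full + d_part)  # partial coded as treated: lower bound
cace_upper <- wald(d_full)           # partial coded as untreated: upper bound
c(cace_lower, cace_upper)
```

Coding the partially-treated as treated inflates the denominator \(ITT_D\) and shrinks the ratio (lower bound); coding them as untreated does the opposite (upper bound).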
Conclusion 📚
Conclusion
Non-compliance is a big problem in experimental research
One-sided non-compliance is when units in the treatment group do not receive the treatment
We have seen that we have several options to deal with non-compliance, but the best one is to use instrumental variables (IV)
IV allows us to estimate the complier average causal effect (CACE), which is the effect of the treatment on the compliers
We have also seen that large-\(n\) designs and placebo designs can help anticipate non-compliance
Next class, we will discuss two-sided non-compliance, which is when some people in the treatment group do not receive the treatment and some people in the control group do receive the treatment