QTM 385 - Experimental Methods

Lecture 11 - Attrition

Danilo Freire

Emory University

Hello, everyone!
How are you doing today? 😉

Brief recap 📚

Last class

  • Two-sided non-compliance occurs when both treatment and control groups experience imperfect compliance
  • Four compliance types:
    • Compliers: Follow treatment assignment
    • Never-takers: Never receive treatment
    • Always-takers: Always seek treatment
    • Defiers: Do opposite of assignment (rare)
  • Monotonicity assumption required (no defiers) for identification
  • IV methods estimate CACE: \(\tau_{LATE} = \frac{ITT_Y}{ITT_D}\)

Today’s plan 📋

Attrition in experiments

Another missing data problem

  • Attrition: Loss of participants before study completion
  • Two main types:
    • Random attrition: Missingness unrelated to treatment/outcome
    • Non-random/Systematic attrition: Missingness correlates with variables
  • Impacts:
    • Reduced statistical power 📉
    • Potential selection bias threat to validity
    • Compromised generalisability 🌐

Handling Attrition

  • Analysis methods:
    • Inverse probability weighting (IPW)
    • Extreme bounds analysis (EBA)
    • Lee bounds (trimming the bounds)
  • IPW estimates ATE when missingness is related to observed variables
  • EBA is a robustness check that evaluates the sensitivity of the ATE to different assumptions about the missing data
  • Lee bounds are a way to trim the bounds to make them more informative

Attrition in Experiments 🧩

Attrition in experiments

Missing outcome data

  • Attrition is, unfortunately, another common issue in experiments
  • When attrition is correlated with treatment, it can bias estimates
  • Several factors can lead to attrition:
    • Non-compliance: Intervention too burdensome, unfair, or ineffective
    • Survey fatigue: Participants lose interest over time
    • Unforeseen events: Health issues, job changes, etc.
    • Data collection errors: Intentional or not
  • Attrition forces scholars to make assumptions about missing data
  • The main one is that missingness is ignorable, that is, unrelated to the outcome
  • Another approach is to assume that missingness is related to observed variables
  • In this case, some statistical methods can be used to model the missingness mechanism
  • Finally, we can try to collect more data to reduce attrition, but this is not always feasible

Motivating example: RAND Health Insurance Experiment

Does healthcare insurance improve health outcomes?

  • RAND Health Insurance Experiment (HIE) was a large-scale study in the 1970s that randomly assigned families to different insurance plans
  • The goal was to assess the impact of insurance on health outcomes
  • The experiment covered 5%, 50%, 75%, or 100% of each family’s health costs, with the remaining covered by the family
  • To insure families in the first three experimental groups against catastrophic financial loss, costs over $1,000 were covered by the experimental insurance plan
  • The study found that insurance coverage did not significantly affect health outcomes
  • The conclusion was that public health insurance was not cost-effective, thus the government did not need to expand it
  • Overall, the study cost about $550 million in today’s dollars 😮
  • However, the study had high attrition rates, which may have biased the results!

Health expenditure and health outcomes in the RAND HIE

Column (1) shows the average for the group assigned catastrophic coverage. Columns (2)–(5) compare averages in the deductible, cost-sharing, free care, and any insurance groups with the average in column (1).

Source: Angrist and Pischke (2014, 19)

Attrition in RAND HIE

  • Later statistical analyses revealed that the researchers excluded people who dropped out
  • Initial refusal rates differed significantly:
    • 8% of those assigned to the free care group refused to enrol after the baseline survey
    • 25% of those assigned to cost-sharing plans refused to enrol after the baseline survey
  • Among those who enrolled, further attrition occurred during the 3- to 5-year study
  • Withdrawal was more common in experimental groups where subjects had to pay for health services
  • Voluntary attrition rates varied by plan:
    • 0.4% in the free plan group
    • 6.7% in the cost-sharing plan groups
  • Subjects in cost-sharing conditions who anticipated serious health problems might have dropped out to maintain existing coverage
  • What issues do you see with this attrition pattern?

Some notation 📝

Attrition in experiments

Using the potential outcomes framework

  • You’re all familiar with the potential outcomes framework by now 🤓
  • Remember our variables from last class?
    • \(Y_i(0)\) and \(Y_i(1)\): Potential outcomes for individual \(i\)
    • \(z_i\): Treatment assignment
    • \(d_i\): Actual treatment received by individual \(i\)
    • \(Y_i\): Observed outcome for individual \(i\)
  • Here, let’s first assume that there is full compliance with treatment assignment, that is, \(d_i = z_i\)
  • So far, so good? 😊
  • Now, let’s introduce attrition to the mix and use a similar notation

Attrition in experiments

Some new notation

  • Let \(r_i\) (for reported) be an indicator for whether individual \(i\) remains in the study
  • \(r_i(z)\): This is a new potential outcome related to attrition
  • It represents whether the outcome for person \(i\) is reported (i.e., they stay in the study and we get their data) if they were assigned to treatment \(z\)
  • \(r_i(1)\): Whether the outcome for person \(i\) would be reported if they were in the treatment group
    • \(r_i(1) = 1\): Outcome is reported (person stays in the study)
    • \(r_i(1) = 0\): Outcome is missing (person drops out).
  • Similarly, \(r_i(0)\): Whether the outcome for person \(i\) would be reported if they were in the control group
    • \(r_i(0) = 1\): Outcome is reported (person stays in the study)
    • \(r_i(0) = 0\): Outcome is missing (person drops out)

\[r_i = r_i(0)(1 - z_i) + r_i(1)z_i \]

  • If they are in the control group, their reporting outcome is just \(r_i(0)\), and if they are in the treatment group, it is \(r_i(1)\)
  • Attrition occurs when \(r_i = 0\) for some subjects!
  • A bit formal, but just stating the obvious 😅

When does bias occur?

  • Bias arises if the characteristics of those who stay in the study (\(r_i(z) = 1\)) are systematically different from those who drop out (\(r_i(z) = 0\)), and if these differences are related to the potential outcomes \(Y_i(z)\)
  • In other words, if attrition is non-random, we have a problem!
  • We are essentially missing a piece of the puzzle (the dropouts), and that missing piece can distort our view of the true treatment effect
  • Sadly, under non-random attrition the difference-in-means estimator does not recover the ATE for the entire subject pool, and it will not recover the ATE for any meaningful subgroup either!
  • Why?
    • Attrition might be non-random within each of these baseline subgroups
    • For example, maybe younger people in the treatment group drop out more than older people, but the opposite can be true in the control group!

Special cases of attrition

Special cases of attrition

Missing independent of potential outcomes

  • Random attrition is the best-case scenario!
  • This claim is usually not verified directly, but rather assumed
  • Formally,

\[Y_i(z) \perp R_i(z)\]

  • In some cases this may be true, such as a computer malfunction that causes data loss to some participants
  • We can actually test that using randomisation inference!
    • Just regress \(r_i\) on the covariates and treatment assignment and you should find no significant relationship
  • If attrition is random, we can ignore it and proceed with our analysis
  • It still has issues, though:
    • Reduced power due to smaller sample size
    • Generalisability issues
  • But at least we can estimate the ATE without bias 😅
  • What are some other examples of random attrition? Can you think of any?
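The randomisation-inference check described above can be sketched in a few lines. Below is a minimal Python illustration (the course uses R, but the logic is language-agnostic) of a permutation test for whether attrition rates differ by treatment assignment; the function name and data are hypothetical.

```python
import random

def permutation_test_attrition(r, z, n_perm=5000, seed=42):
    """Permutation test of whether attrition rates differ by assignment.

    r: list of reporting indicators (1 = outcome observed, 0 = attrited)
    z: list of treatment assignments (1 = treatment, 0 = control)
    Returns the observed difference in attrition rates and a two-sided p-value.
    """
    rng = random.Random(seed)

    def rate_diff(r, z):
        treat = [1 - ri for ri, zi in zip(r, z) if zi == 1]
        ctrl = [1 - ri for ri, zi in zip(r, z) if zi == 0]
        return sum(treat) / len(treat) - sum(ctrl) / len(ctrl)

    observed = rate_diff(r, z)
    z_perm = list(z)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(z_perm)  # re-randomise the assignment labels
        if abs(rate_diff(r, z_perm)) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm
```

If attrition is truly random, the observed difference should look typical under reshuffled assignments; a small p-value is a warning sign that attrition is related to treatment.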

Special cases of attrition

  • Since random attrition is rare, we often have to deal with non-random attrition
  • A special case is missingness independent of potential outcomes given X, or \(MIPO | X\)
  • It is formally defined as:

\[Y_i(z) \perp R_i(z) | X_i\]

  • You can include more covariates in \(X_i\) if you think they are related to attrition
  • Suppose that students with poor attendance are both more likely to be missing on the day of the assessment and more likely to benefit from the intervention
  • \(MIPO|X\) means that if one partitions the experimental sample by prior attendance, missingness is random within each subgroup
  • That is, among students whose record of attendance is poor, there is no relationship between missing school on the day of the assessment and the subjects’ potential outcomes
  • The same goes for students with a good record of attendance

Inverse probability weighting

Inverse probability weighting

  • We can compensate for missing data by weighting the observed data
  • The idea is to upweight the observations that are similar to the missing ones
  • Let me give you a simple example:
    • Suppose we have 40 people in our experiment
    • 30 are men and 10 are women
    • The overall average would be \(\frac{30}{40} \times\) men’s average + \(\frac{10}{40} \times\) women’s average
  • Now assume that 15 out of the 30 men drop out
  • “Controlling for gender” (i.e., weighting):
    • The remaining 15 men produce an unbiased estimate of the average amongst 30 men
    • So we can just count them twice!
    • This is the essence of inverse probability weighting (IPW)
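The "count them twice" logic above can be written out as arithmetic. A short Python sketch with hypothetical group averages (the 5.0 and 8.0 are made up for illustration):

```python
# Hypothetical subgroup averages (not from real data)
n_men, n_women = 30, 10
mean_men, mean_women = 5.0, 8.0

# Full-sample average before any attrition
full_avg = (n_men / 40) * mean_men + (n_women / 40) * mean_women  # 5.75

# 15 of the 30 men drop out; if attrition is random within gender,
# the remaining men still estimate mean_men without bias
obs_men, obs_women = 15, 10

# Inverse probability weights: 1 / P(observed | gender)
w_men = 1 / (obs_men / n_men)        # = 2: each remaining man counts twice
w_women = 1 / (obs_women / n_women)  # = 1
ipw_avg = (obs_men * w_men * mean_men + obs_women * w_women * mean_women) / 40

assert ipw_avg == full_avg  # weighting recovers the full-sample average
```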

Inverse probability weighting

The formula

  • IPW estimates ATE when \(MIPO|X\) holds
  • To estimate ATE, we need \(E[Y_i(1)]\) and \(E[Y_i(0)]\) (as you know!)
  • When \(MIPO|X\) holds, \(E[Y_i(1)]\) is estimated by the weighted average:

\[E[Y_i(1)] = \frac{1}{N} \sum_{i=1}^{N} \frac{Y_i(1) r_i(1)}{\pi_i(z = 1, x)},\]

  • \(\pi_i(z = 1, x)\): share of non-missing subjects among treated with covariate profile \(x\)
  • Missing outcomes have no effect on sum
  • Reported outcomes weighted by \(1/\pi_i(z = 1, x)\)
  • Weights replace missing values with copies of non-missing values
  • Weighting scheme called inverse probability weighting because observations weighted by inverse probability of being observed

Motivating example for IPW

  • Treatment: Educational programme at a community centre
  • \(X_i\): Covariate indicating proximity to the center
    • \(X_i = 1\): Lives near the community center
    • \(X_i = 0\): Lives far away
  • Follow-up evaluation at the community center
  • Attrition is related to \(X_i\): People living far are less likely to show up
  • Potential outcomes differ by \(X_i\): People living far away have higher potential outcomes
  • Let’s see how it can recover ATE when \(MIPO|X\) holds!

Motivating example for IPW

Potential outcomes and covariates

Observation \(Y_i(0)\) \(Y_i(1)\) \(r_i(0)\) \(r_i(1)\) \(Y_i(0) \vert r_i(0)\) \(Y_i(1) \vert r_i(1)\) \(X_i\)
1 3 4 1 1 3 4 1
2 4 7 1 1 4 7 1
3 3 4 1 1 3 4 1
4 4 7 1 1 4 7 1
5 10 14 0 0 Missing Missing 0
6 12 18 0 0 Missing Missing 0
7 10 14 1 1 10 14 0
8 12 18 1 1 12 18 0

Motivating example for IPW

Calculation of ATE

  • ATE with complete data: \(E[Y_i(1)] - E[Y_i(0)] = 3.5\)
  • Relevant proportions of non-missingness:
    • \(\pi_i(z = 1, x = 1) = 1\)
    • \(\pi_i(z = 1, x = 0) = 0.5\)
    • \(\pi_i(z = 0, x = 1) = 1\)
    • \(\pi_i(z = 0, x = 0) = 0.5\)
  • Applying the weighted average formula:

\[E[Y_i(1)] = (\frac{1}{8}) (\frac{4}{1} + \frac{7}{1} + \frac{4}{1} + \frac{7}{1} + \frac{14}{0.5} + \frac{18}{0.5}) = 10.75,\]

\[E[Y_i(0)] = (\frac{1}{8}) (\frac{3}{1} + \frac{4}{1} + \frac{3}{1} + \frac{4}{1} + \frac{10}{0.5} + \frac{12}{0.5}) = 7.25,\]

\[E[Y_i(1)] - E[Y_i(0)] = 3.5.\]

  • IPW recovers the true ATE! 🎉
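The calculation can be verified in code. A Python sketch (data copied from the table above; the helper names are my own):

```python
# Rows from the slides' table: (Y0, Y1, r0, r1, X)
rows = [
    (3, 4, 1, 1, 1), (4, 7, 1, 1, 1), (3, 4, 1, 1, 1), (4, 7, 1, 1, 1),
    (10, 14, 0, 0, 0), (12, 18, 0, 0, 0), (10, 14, 1, 1, 0), (12, 18, 1, 1, 0),
]
N = len(rows)

def pi(z, x):
    """Share of non-missing outcomes among subjects with covariate X = x."""
    cell = [row for row in rows if row[4] == x]
    reported = [row for row in cell if (row[3] if z == 1 else row[2]) == 1]
    return len(reported) / len(cell)

def ipw_mean(z):
    """IPW estimate of E[Y(z)]: reported outcomes weighted by 1/pi."""
    total = 0.0
    for y0, y1, r0, r1, x in rows:
        y, r = (y1, r1) if z == 1 else (y0, r0)
        if r == 1:  # missing outcomes contribute nothing to the sum
            total += y / pi(z, x)
    return total / N

ate = ipw_mean(1) - ipw_mean(0)  # 10.75 - 7.25 = 3.5
```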

Downsides of IPW

When \(MIPO|X\) is incorrect

  • Incorrect \(MIPO|X\) assumption leads to misleading estimates
  • IPW gives largest weights to high-attrition subgroups
  • Biased subgroup estimate can disproportionately influence overall ATE
  • IPW average may be more biased than unweighted data
  • Re-weighting increases sampling variability
    • Extra weight on subsamples with many missing observations
  • \(MIPO|X\) discussion: hypothetical scenarios with known potential outcomes
  • In practice, researchers must make assumptions about attrition
  • Evaluating \(MIPO|X\): detective work and speculation 🕵️‍♂️

ATE Bounds

Extreme bounds analysis

A robustness check for attrition

  • Now, let’s analyse the worst case scenario: when attrition is so severe that it invalidates the \(MIPO|X\) assumption
  • In this case, there is no clear way to recover the ATE
  • But we can do something about it!
  • Extreme bounds analysis (EBA, also known as Manski bounds) is a robustness check that evaluates the sensitivity of the ATE to different assumptions
  • The idea is to bound the ATE by the most extreme assumptions about the missing data
    • That is, we assume the best and worst-case scenarios for the missing data
  • Sensitivity methods are a growing area of research in causal inference, and EBA is one of the most popular methods

Extreme bounds analysis

The procedure

  • Imagine the missing outcomes from those who dropped out

  • We don’t know what those outcomes are, but we can consider two extreme scenarios:

  • Best-case scenario: Assume the dropouts in the treatment group would have had the best possible outcomes, and dropouts in the control group would have had the worst possible outcomes (relative to the observed data)

  • Worst-case scenario: Assume the dropouts in the treatment group would have had the worst possible outcomes, and dropouts in the control group would have had the best possible outcomes

  • By calculating the ATE under these extreme assumptions, we get a range (upper bound and lower bound) that (hopefully 😅) contains the true ATE

A simple example

  • Suppose our programme has the following outcomes if there is no attrition:
    • Average for the treatment group: \(\frac{(7 + 10 + 6 + 6)}{4} = \frac{29}{4} = 7.25\)
    • Average for the control group: \(\frac{(3 + 7 + 5 + 6)}{4} = \frac{21}{4} = 5.25\)
    • ATE: \(7.25 - 5.25 = 2\)
  • Now, suppose that we only observe the following outcomes:
    • Average for the treatment group: \(\frac{(7 + 10 + ? + ?)}{4} = ?\)
    • Average for the control group: \(\frac{(? + 7 + 5 + 6)}{4} = ?\)
  • Assume that our collected outcomes show that the possible values for the missing data are between 0 and 10
  • Let’s see how to calculate the bounds for the ATE!

A simple example

  • To find the upper bound on the treatment effect estimate, substitute 10 for the missing values in the treatment group and 0 for the missing value in the control group

  • Upper bound: \(\frac{(7 + 10 + 10 + 10)}{4} - \frac{(0 + 7 + 5 + 6)}{4} = \frac{37}{4} - \frac{18}{4} = 4.75\)

  • To find the lower bound on the treatment effect estimate, substitute 0 for the missing values in the treatment group and 10 for the missing value in the control group

  • Lower bound: \(\frac{(7 + 10 + 0 + 0)}{4} - \frac{(10 + 7 + 5 + 6)}{4} = \frac{17}{4} - \frac{28}{4} = -2.75\)

  • So, the ATE is between -2.75 and 4.75… and indeed it is (2)! 🎉

  • But as you can see, the bounds are pretty wide
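The bounding procedure generalises readily. A Python sketch of the extreme-bounds calculation, using the slides' numbers as a check (the function name is my own):

```python
def manski_bounds(y_treat, y_ctrl, n_treat, n_ctrl, y_min, y_max):
    """Bound the ATE when some outcomes are missing.

    y_treat / y_ctrl: observed outcomes; n_treat / n_ctrl: assigned group sizes.
    Missing values are imputed at the extremes of the outcome range [y_min, y_max].
    """
    miss_t = n_treat - len(y_treat)
    miss_c = n_ctrl - len(y_ctrl)
    upper = ((sum(y_treat) + miss_t * y_max) / n_treat
             - (sum(y_ctrl) + miss_c * y_min) / n_ctrl)
    lower = ((sum(y_treat) + miss_t * y_min) / n_treat
             - (sum(y_ctrl) + miss_c * y_max) / n_ctrl)
    return lower, upper

# Slides' example: two treated outcomes missing, one control outcome missing
lo, hi = manski_bounds([7, 10], [7, 5, 6], 4, 4, 0, 10)
# lo = -2.75, hi = 4.75
```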

Trimming the bounds with stronger assumptions

Monotonicity revisited

  • Lee (2009) suggested that we can trim the bounds to make them more informative (and narrower)
  • The key assumption of Lee is monotonicity, similar to what we discussed in the context of non-compliance
  • Lee proposed that we have four types of respondents, defined with respect to specific post-treatment outcomes (some observed, others not):
    • Never-reporters: Those who would never report their outcomes (\(R(1) = 0\) and \(R(0) = 0\))
    • Always-reporters: Those who would always report their outcomes (\(R(1) = 1\) and \(R(0) = 1\))
    • If-treated reporters: Those who would report their outcomes if treated (\(R(1) = 1\) and \(R(0) = 0\))
    • If-untreated reporters: Those who would report their outcomes if in the control group (\(R(1) = 0\) and \(R(0) = 1\))
  • The assumption is that there are no if-untreated reporters, similar to the “no defiers” assumption in non-compliance

Trimming the bounds with stronger assumptions

Monotonicity revisited

  • Under monotonicity, we can bound the average treatment effect for Always-Reporters
  • For them, the ATE is defined as:

\[E[Y_i(1) | R_i(1) = 1, R_i(0) = 1] - E[Y_i(0) | R_i(1) = 1, R_i(0) = 1]\]

  • That is, we observe \(Y(0)\) only for the Always-Reporters, as If-Treated-Reporters and Never-Reporters produce only missing values, and under monotonicity we assume there are no If-Untreated-Reporters
  • It sounds a bit confusing, because it is! 😅
  • The difficult part is to estimate the ATE for Always-Reporters, as we observe \(Y(1)\) for both Always-Reporters and If-Treated-Reporters
  • Under monotonicity, everyone who reports in the control group is an Always-Reporter, so the control group’s reporting rate estimates the share of Always-Reporters
  • The difference between the treatment and control reporting rates gives the share of If-Treated-Reporters in the treatment group; we can then use the lower and upper ends of the treated outcome distribution to bound the ATE for Always-Reporters

Trimming the bounds with stronger assumptions

  • We can bound the effect for always-responders by taking the means of trimmed versions of the treated outcome distribution
  • A lower bound comes from trimming off the share of If-Treated-Reporters from the top of the treated outcome distribution, and an upper bound comes from trimming this share from the bottom
  • There are several R packages that calculate Lee bounds
  • Although this is a good solution to attrition, it has received some criticism too
  • For instance, McKenzie (2024) argues that it is not clear how to use Lee bounds with binary outcomes, that incorporating covariates is not straightforward (though a recent paper proposes a solution), and that the bounds are not easy to use with clustered data
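The trimming idea can be sketched as follows. This is a simplified Python illustration with hypothetical data, ignoring sampling uncertainty; in practice you would use the R packages mentioned above, which also provide standard errors.

```python
from statistics import mean

def lee_bounds(y_treat, y_ctrl, n_treat, n_ctrl):
    """Sketch of Lee (2009) trimming bounds under monotonicity.

    y_treat / y_ctrl: observed (reported) outcomes; n_*: assigned group sizes.
    Assumes the treatment group reports at least as often as the control group
    (no If-Untreated-Reporters).
    """
    p_t = len(y_treat) / n_treat  # reporting rate, treatment group
    p_c = len(y_ctrl) / n_ctrl    # reporting rate, control group
    # Excess reporting share among treated reporters = If-Treated-Reporters
    q = (p_t - p_c) / p_t
    k = round(q * len(y_treat))   # number of treated observations to trim
    ys = sorted(y_treat)
    lower = mean(ys[:len(ys) - k]) - mean(y_ctrl)  # trim from the top
    upper = mean(ys[k:]) - mean(y_ctrl)            # trim from the bottom
    return lower, upper

# Hypothetical data: 8 of 10 treated report, 6 of 10 controls report,
# so q = 0.25 and two treated observations are trimmed from each tail
lower, upper = lee_bounds([1, 2, 3, 4, 5, 6, 7, 8], [2, 3, 4, 5, 4, 6], 10, 10)
```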

Handling attrition in R

Handling attrition in R

attritevis package

  • The attritevis package has several useful functions to visualise the attrition process, calculate the bounds, and estimate the ATE
  • It has very few assumptions:
    • Data must be ordered by survey questions, i.e. if respondents answered Q1 before Q2, the variable Q1 must appear before Q2 (i.e. in an earlier column) in the dataframe
    • Attrition is defined as completely leaving the survey, not just skipping a question.
    • For balance tests, treatment and control conditions must be defined
  • The package is available on GitHub: https://github.com/lbassan/attritevis

An empathy experiment

  • The experiment manipulates peer-praise and measures empathy in a behavioral task
  • There are two arms in the peer-praise randomisation: peer-praise and no praise (control)
  • In the first arm, a word cloud of praise, drawn from real praise collected in a pilot study, is given for people who behave empathetically, with a line of text about peer group average thermometer ratings towards people who are empathetic
    • “Peers of yours on this platform have said they hold favorable feelings towards people who engage in empathetic behavior, with an average of 7.9, on a scale of 0 (least favorable) to 10 (most favorable). That same peer group provided real feedback for empathetic behavior, which is pictured in the word cloud below”
  • Respondents in the control condition do not receive any additional information

Installing the package and loading a dataset

  • To install and load the package, run the following code:
# devtools::install_github("lbassan/attritevis", dependencies = TRUE)
library(attritevis)
library(ggrepel)

test_data <- read.csv("./test_data.csv")
names(test_data)
 [1] "X"              "consent"        "age"            "sex"           
 [5] "education"      "state"          "income"         "part_id"       
 [9] "race"           "religion"       "attrition_1"    "attrition_2"   
[13] "cards_a"        "pa"             "pb_1"           "pb_2"          
[17] "pb_3"           "pc"             "cards_b"        "p2a"           
[21] "p2b_1"          "p2b_2"          "p2b_3"          "p2c"           
[25] "treat1"         "Happy_1_1"      "Happy_1_2"      "Happy_1_3"     
[29] "cards1"         "X1a"            "X1b_1"          "X1b_2"         
[33] "X1b_3"          "X1c"            "treat2"         "Happy_2_1"     
[37] "Happy_2_2"      "Happy_2_3"      "cards2"         "X2a"           
[41] "X2b_1"          "X2b_2"          "X2b_3"          "X2c"           
[45] "treat3"         "Happy_3_1"      "Happy_3_2"      "Happy_3_3"     
[49] "cards3"         "X3a"            "X3b_1"          "X3b_2"         
[53] "X3b_3"          "post1"          "post2_7"        "post3"         
[57] "post4"          "post5"          "post6"          "post7"         
[61] "post8"          "post9"          "post10"         "post11_1"      
[65] "post11_8"       "post13_1"       "post14_1"       "post15_1"      
[69] "post16_1"       "post17"         "ideology"       "trump_approval"
[73] "pres_approval" 

Attrition dataframe

  • The attrition() function creates a data frame that indicates, per question:
    • attrited – how many respondents attrited (left the survey) at each question
    • proportion – number of attrited respondents / number of respondents who entered survey
    • prop_q – number of attrited respondents / number of respondents entering into the question
    • questions – question names
    • responded – how many respondents responded in each question
    • prop_r – number of respondents who responded / number of respondents who entered survey
attrition_data <- attritevis::attrition(test_data)
head(attrition_data, 20)
   attrited prop_q proportion   questions responded prop_r
1         0   0.00       0.00           X       624   1.00
2         0   0.00       0.00     consent       624   1.00
3         3   0.00       0.00         age       621   1.00
4         0   0.00       0.00         sex       618   0.99
5         0   0.00       0.00   education       621   1.00
6         0   0.00       0.00       state       621   1.00
7         0   0.00       0.00      income       621   1.00
8         0   0.00       0.00     part_id       621   1.00
9         0   0.00       0.00        race       621   1.00
10        0   0.00       0.00    religion       621   1.00
11        1   0.00       0.00 attrition_1       620   0.99
12        6   0.01       0.01 attrition_2       614   0.98
13       37   0.06       0.06     cards_a       577   0.92
14        0   0.00       0.00          pa       553   0.89
15        0   0.00       0.00        pb_1       536   0.86
16        0   0.00       0.00        pb_2       536   0.86
17        0   0.00       0.00        pb_3       536   0.86
18        0   0.00       0.00          pc       534   0.86
19        0   0.00       0.00     cards_b       534   0.86
20        0   0.00       0.00         p2a       522   0.84
sum(attrition_data$attrited) #How many respondents attrited overall?
[1] 129
sum(attrition_data$attrited)/nrow(test_data) #What proportion of the overall sample is this?
[1] 0.2067308

Visualising the attrition process

  • The plot_attrition() function shows the attrition process for the post-treatment questions
plot_attrition(test_data)  

Visualising the attrition process

  • We can specify y = "responded" to plot response rather than attrition
plot_attrition(test_data, y="responded")  

Visualising the attrition process

  • With treatment_q, we can specify the treatment question, and with outcome_q, we can specify the outcome questions
attritevis::plot_attrition(test_data,
              y = "responded",
              outcome_q = c("cards1", "cards2",  "cards3"),
              treatment_q = "treat1",
              mycolors = c(treatment = "#000066",
                           control = "#CC0033"))

Balance tests

  • The balance_cov() function calculates the balance of covariates between the treatment and control groups
attritevis::balance_cov(data = test_data, 
        treatment = "treat1", 
        question = "age")

    Welch Two Sample t-test

data:  treat_data$question1 and control_data$question1
t = -0.32688, df = 568.57, p-value = 0.7439
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.002600  1.431137
sample estimates:
mean of x mean of y 
 37.42361  37.70934 

Balance tests

  • We can also use the function balance_cov() when the covariate (question) is a factor, but we must specify which factor we are interested in
attritevis::balance_cov(data = test_data, 
        treatment = "treat1", 
        question = "sex",
        factor = TRUE,
        factor_name = "female")

    2-sample test for equality of proportions with continuity correction

data:  x out of n
X-squared = 1.1305, df = 1, p-value = 0.2877
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.12931498  0.03623038
sample estimates:
   prop 1    prop 2 
0.3576389 0.4041812 

Balance across attrition

  • Next, we can check whether our treatment is correlated with attrition at any moment in the survey with the balance_attrite() function
attritevis::balance_attrite(data = test_data, 
        treatment = "treat1", 
        question = "Happy_3_1")

Call:
glm(formula = question1 ~ treatment1, family = binomial(link = "logit"), 
    data = data2)

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)          -1.7441     0.1653 -10.552   <2e-16 ***
treatment1treatment  -0.1704     0.2415  -0.706     0.48    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 464.49  on 576  degrees of freedom
Residual deviance: 463.99  on 575  degrees of freedom
  (47 observations deleted due to missingness)
AIC: 467.99

Number of Fisher Scoring iterations: 4

Calculating the bounds

  • Finally, we can estimate both Manski bounds and Lee bounds with the bounds() function
  • We have to install the attrition package to estimate the bounds
# devtools::install_github("acoppock/attrition")
attritevis::bounds(data = test_data, 
        treatment = "treat1", 
        DV = "cards1")
    ci_lower     ci_upper      low_est      upp_est      low_var      upp_var 
-0.003727682  0.144394733  0.070333526  0.070333526  0.001427859  0.001427859 
attritevis::bounds(data = test_data, 
        treatment = "treat1", 
        DV = "cards1", 
        type = "Lee")
    upper_bound     lower_bound       Out0_mono      Out1L_mono      Out1U_mono 
     0.07968127      0.06517928      0.29482072      0.36000000      0.37450199 
control_group_N   treat_group_N               Q              f1              f0 
   251.00000000    254.00000000      0.01523036      0.11805556      0.13148789 
         pi_r_1          pi_r_0 
     0.88194444      0.86851211 

Summary

  • Attrition is a common issue in experiments, and it can bias estimates (as everything else we’ve seen so far 😅)
  • We can handle attrition by assuming that it is related to observed variables
  • In this case, we can use inverse probability weighting (IPW) to estimate the ATE
  • We can also use extreme bounds analysis (EBA) to evaluate the sensitivity of the ATE to different assumptions about the missing data
  • Lee bounds are a way to trim the bounds to make them more informative
  • The attritevis package in R has several functions to visualise the attrition process, calculate the bounds, and estimate the ATE
  • Consider using these functions in your next experiment! 🤓

…and that’s it! 🎉

See you next time! 😉