Lecture 03 - Potential Outcomes Framework
A good research question should produce knowledge that solves real-world problems and guides policy decisions, with a practical and credible research design
Theory is essential in research design, whether implicit or explicit, as it helps generate hypotheses, informs design choices, and guides inference strategies
Operationalisation is the process of translating theoretical concepts into measurable variables, such as turning “social isolation” into the frequency of social interactions
Pre-registration involves filing research designs and hypotheses publicly to reduce bias, improve credibility, and distinguish between pre-planned and exploratory analyses
The reproducibility crisis in science highlights the need for transparent and replicable research, with pre-registration and pre-analysis plans helping address this issue
The EGAP research design form provides a blueprint for creating robust research designs, covering key components like hypotheses, population, intervention, outcomes, and analysis
The MIDA framework (Model, Inquiry, Data, Answer) helps researchers simulate and diagnose their designs before implementation, ensuring they can answer their research questions effectively
DeclareDesign is a tool that allows researchers to declare, diagnose, and design their studies using the MIDA framework, improving the quality and credibility of research
A two-arm trial is a common experimental design in which units are randomly assigned to a treatment or a control group and the average treatment effect (ATE) is estimated from the difference in group means (see the simulation sketch below)
Alignment between research design and theoretical frameworks is necessary for generating credible and actionable results, even when experiments are not feasible
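To make the MIDA logic concrete, here is a minimal simulation sketch in Python (DeclareDesign itself is an R package; the sample size, true effect, and number of simulations below are illustrative assumptions, not values from the lecture):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_two_arm_trial(n=100, true_ate=0.5, n_sims=1000):
    """Simulate a two-arm trial many times and diagnose the design (MIDA-style).

    Model:    potential outcomes Y0 ~ N(0, 1), Y1 = Y0 + true_ate
    Inquiry:  the average treatment effect (ATE)
    Data:     complete random assignment of n/2 units to treatment
    Answer:   difference in group means, one per simulated experiment
    """
    estimates = []
    for _ in range(n_sims):
        y0 = rng.normal(0, 1, n)                 # potential outcome under control
        y1 = y0 + true_ate                       # potential outcome under treatment
        t = rng.permutation(np.r_[np.ones(n // 2), np.zeros(n - n // 2)])
        y = np.where(t == 1, y1, y0)             # observed outcome (switching equation)
        estimates.append(y[t == 1].mean() - y[t == 0].mean())
    estimates = np.array(estimates)
    return {
        "bias": estimates.mean() - true_ate,     # should be close to zero
        "sd_of_estimates": estimates.std(),      # sampling variability of the answer
    }

print(simulate_two_arm_trial())
```

Re-running the same loop while varying the sample size or the assumed effect is the diagnosis step: it tells you, before any data are collected, whether the design can answer the inquiry.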
| | Hospital | No Hospital | Difference |
|---|---|---|---|
| Health status | 3.21 | 3.93 | −0.72∗∗∗ |
| | (0.014) | (0.003) | |
| Observations | 7,774 | 90,049 | |

Standard errors in parentheses.
\[ Y_i = \begin{cases} Y_{0i} & \text{if } T_i = 0 \\ Y_{1i} & \text{if } T_i = 1 \end{cases} \]
where \(T_i\) is a treatment indicator equal to 1 if \(i\) was treated and 0 otherwise
Each individual either participates in the programme or not
The causal impact of programme \(T\) on \(i\) is:
\[ Y_{1i} - Y_{0i} \]
\[ Y_i = Y_{0i} + (Y_{1i} - Y_{0i}) T_i \]
Example: Alejandro goes to the hospital, Benicio does not; for each of them we observe only one of the two potential outcomes.
Difference in group means
Even in a large sample, the difference in group means need not equal the causal effect:
People will choose to participate in a programme when they expect it to make them better off (i.e., when \(Y_{1i} - Y_{0i} > 0\))
Those who choose to participate are likely to differ from those who choose not to, even in the absence of the programme
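The standard decomposition of the naive comparison (using the notation above) makes this precise:

\[ \underbrace{E[Y_i \mid T_i = 1] - E[Y_i \mid T_i = 0]}_{\text{difference in group means}} = \underbrace{E[Y_{1i} - Y_{0i} \mid T_i = 1]}_{\text{effect on the treated}} + \underbrace{E[Y_{0i} \mid T_i = 1] - E[Y_{0i} \mid T_i = 0]}_{\text{selection bias}} \]

If people who go to the hospital would have been in worse health even without treatment, the selection-bias term is negative, which can make a helpful treatment look harmful, as in the table above.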
When a randomised experiment is not feasible, alternative identification strategies include (see the sketch after this list):
Conditional independence assumption (CIA) approaches
Explicit models (structural or not) of selection into treatment
Natural experiments when treatment is as-good-as-random
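As a sketch of the first strategy: the CIA says that once we condition on the observed variables that drive selection, treatment is as-good-as-random. In the toy example below (all variables and numbers are made up for illustration), the naive comparison has the wrong sign, while stratifying on the confounder recovers an estimate close to the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: X is an observed confounder (e.g. baseline sickness)
# that drives both treatment take-up and outcomes. Under the CIA,
# treatment is as-good-as-random *within* levels of X.
x = rng.binomial(1, 0.5, n)                    # 1 = sick, 0 = healthy
p_treat = np.where(x == 1, 0.8, 0.2)           # sick people select into treatment
t = rng.binomial(1, p_treat)
true_ate = 1.0
y = 2.0 - 3.0 * x + true_ate * t + rng.normal(0, 1, n)

# Naive comparison: confounded by X, so it even gets the sign wrong here.
naive = y[t == 1].mean() - y[t == 0].mean()

# CIA adjustment: compare treated and untreated within each stratum of X,
# then average the within-stratum differences by population shares.
strata_effects = []
weights = []
for value in (0, 1):
    mask = x == value
    diff = y[mask & (t == 1)].mean() - y[mask & (t == 0)].mean()
    strata_effects.append(diff)
    weights.append(mask.mean())
adjusted = np.average(strata_effects, weights=weights)

print(f"naive difference: {naive:.2f}, CIA-adjusted: {adjusted:.2f}, truth: {true_ate}")
```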
Main question: do hospitals make people sicker?
Heterogeneous treatment effects: treatments work differently for different people
Real-World Example:
Takeaway:
What would happen if everyone went to the hospital vs. no one?
Potential outcomes table:
Person | Outcome if Hospitalised | Outcome if Not Hospitalised |
---|---|---|
Sick | Health improves slightly | Health stays poor |
Healthy | Health slightly worsens | Health stays fine |
Randomisation breaks the link between treatment choice and outcomes
(An absurd) experiment: randomly assign people to the hospital, regardless of whether they are sick
Randomisation ensures groups are similar on average (similar mix of sick/healthy people)
Any difference in outcomes is caused by the hospital, not pre-existing health
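A small simulation of this experiment (the health scores, the share of sick people, and the take-up probabilities below are invented for illustration, chosen to mirror the potential outcomes table above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

sick = rng.binomial(1, 0.3, n)                 # assume 30% of people are sick

# Potential outcomes on a 1-5 health scale (illustrative numbers):
# the sick improve slightly in hospital, the healthy worsen slightly.
y0 = np.where(sick == 1, 2.0, 4.5)             # health if NOT hospitalised
y1 = np.where(sick == 1, 2.5, 4.2)             # health if hospitalised
true_ate = (y1 - y0).mean()

# Self-selection: (mostly) only the sick go to the hospital.
t_selected = rng.binomial(1, np.where(sick == 1, 0.9, 0.05))
y_obs = np.where(t_selected == 1, y1, y0)
naive = y_obs[t_selected == 1].mean() - y_obs[t_selected == 0].mean()

# Randomisation: a coin flip decides who goes, regardless of health.
t_random = rng.binomial(1, 0.5, n)
y_rand = np.where(t_random == 1, y1, y0)
experimental = y_rand[t_random == 1].mean() - y_rand[t_random == 0].mean()

print(f"true ATE: {true_ate:.2f}")
print(f"naive (self-selected) difference: {naive:.2f}")   # looks like hospitals harm a lot
print(f"randomised difference: {experimental:.2f}")        # close to the true ATE
```

Note that the randomised comparison recovers the average effect over everyone, sick and healthy alike, which is exactly why the next question asks what we are measuring.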
But what are we measuring here?
What if we only randomise treatment for people who need it?
Better Experiment: randomise hospital admission only among people who are actually sick
Result: the difference in outcomes now estimates the effect of hospitalisation for the sick, not for the whole population
What are we measuring now? Is this the ideal experiment?
We might consider randomising access to hospitals
But what if people ignore their random assignment?
Example: someone assigned to the control group feels very ill and goes to the hospital anyway, while someone assigned to treatment decides to stay home
Problem: the treatment people actually receive is no longer random, even though the assignment was
Result: comparing people by the treatment they actually received reintroduces selection bias
This is the compliance problem in experiments
Even with random assignment, human behaviour (often) complicates things
Example: A government offers free training to help people find jobs
Should we continue the programme?
Two ways to analyse (see the sketch below):
Intention-to-treat (ITT): compare people by the group they were randomly assigned to, whether or not they took the training
Effect on those actually trained: use the random offer as an instrument to estimate the effect among people who complied with their assignment
Takeaway: Compliance affects how we interpret results and design policies.
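A hedged sketch of both analyses under one-sided noncompliance (the compliance rate, earnings equation, and effect size below are invented for illustration): intention-to-treat compares people by their random offer, while the instrumental-variables (Wald) estimate rescales the ITT by take-up to recover the effect among those who actually comply.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical job-training experiment with one-sided noncompliance:
# only people OFFERED training can take it, but some offered people decline.
motivated = rng.binomial(1, 0.6, n)            # compliers: take training if offered
offered = rng.binomial(1, 0.5, n)              # random assignment (the offer)
trained = offered * motivated                  # actual take-up

true_effect = 2.0                              # assumed effect of training on earnings
earnings = 10.0 + 1.5 * motivated + true_effect * trained + rng.normal(0, 1, n)

# 1) Intention-to-treat: compare by random assignment, ignoring take-up.
itt = earnings[offered == 1].mean() - earnings[offered == 0].mean()

# 2) Naive "as-treated": compare trained vs untrained -> biased, because
#    the trained are the motivated ones.
as_treated = earnings[trained == 1].mean() - earnings[trained == 0].mean()

# 3) IV / Wald estimate: ITT scaled by the offer's effect on take-up,
#    recovering the effect for compliers.
take_up = trained[offered == 1].mean() - trained[offered == 0].mean()
iv = itt / take_up

print(f"ITT: {itt:.2f}  (true effect x compliance = {true_effect * 0.6:.2f})")
print(f"naive as-treated: {as_treated:.2f}  (biased upward)")
print(f"IV / complier effect: {iv:.2f}  (true effect = {true_effect})")
```

The ITT answers the policy question of what offering the programme does, while the IV estimate answers what the training does for those who take it; which one matters depends on the decision at hand.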