QTM 385 - Experimental Methods
Lecture 25 - Course Revision
Danilo Freire
Emory University
Welcome back! 🤓
Course revision session 📚
Today’s goal: connecting the dots
- We have covered a lot of ground in experimental methods this semester! 🥳
- Today, we will review some key concepts and methods from the course
- The aim is to see how different topics link together
- And feel free to ask me questions about the group project, or anything else! 😉
The research design process (Lec 02)
- Good research questions produce knowledge people care about, solving problems or helping policy
- Research questions should be clear, specific, and answerable
- No experiment is theory-free, even if not explicitly stated
- Operationalisation involves translating abstract concepts (e.g., social isolation) into measurable variables (e.g., frequency of social interactions)
- Construct validity ensures the measure accurately reflects the concept
- Credible designs yield practical answers, are transparent via pre-registration (PAPs), and are replicable
The MIDA framework (Lec 02 cont)
- The MIDA framework provides a structure for declaring and diagnosing any research design:
- Model: Assumptions about how the world works (potential outcomes, relationships)
- Inquiry: The specific question (estimand) we want to answer (e.g., ATE)
- Data Strategy: How data are generated (sampling, treatment assignment)
- Answer Strategy: The estimator used to answer the inquiry from the data (e.g., difference-in-means, regression)
- Using MIDA in code (with `DeclareDesign`) allows simulating the design to understand its properties (bias, power, etc.) before implementation
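As a quick refresher, here is a minimal sketch of how such a declaration looks in code; the effect size, sample size, and variable names are illustrative rather than taken from a specific lecture example:

```r
library(DeclareDesign)

design <-
  # M: 500 units with noise U and a constant treatment effect of 0.25
  declare_model(
    N = 500,
    U = rnorm(N),
    potential_outcomes(Y ~ 0.25 * Z + U)
  ) +
  # I: the average treatment effect
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  # D: complete random assignment, then reveal the observed outcome
  declare_assignment(Z = complete_ra(N, prob = 0.5)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  # A: compare treated and control means, targeting the ATE
  declare_estimator(Y ~ Z, inquiry = "ATE")

# Simulate the design repeatedly to check bias, power, coverage, etc.
diagnose_design(design)
```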
Potential outcomes & causality (Lec 03)
- The Potential Outcomes (PO) framework provides the formal language we use for defining causal effects
- For each unit \(i\), there’s an outcome if treated (\(Y_i(1)\)) and an outcome if untreated (\(Y_i(0)\))
- The individual treatment effect is \(\tau_i = Y_i(1) - Y_i(0)\)
- The Fundamental Problem of Causal Inference states we only observe one potential outcome per unit (\(Y_i = Y_i(1)Z_i + Y_i(0)(1-Z_i)\))
- Causality is inherently a missing data problem
- Our goal is often to estimate population averages, like the Average Treatment Effect (ATE): \(ATE = E[Y_i(1) - Y_i(0)]\)
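A tiny simulation makes this concrete: because we generate both potential outcomes for every unit (something real data never give us), we can compare the true ATE with what one random assignment reveals. All numbers below are made up for illustration:

```r
set.seed(385)
n <- 1000

# Both potential outcomes for every unit (only possible in a simulation)
Y0  <- rnorm(n, mean = 50, sd = 10)
tau <- rnorm(n, mean = 5, sd = 2)    # individual treatment effects
Y1  <- Y0 + tau

true_ATE <- mean(Y1 - Y0)

# One random assignment: we observe only one potential outcome per unit
Z     <- sample(rep(c(0, 1), n / 2))
Y_obs <- Z * Y1 + (1 - Z) * Y0

estimate <- mean(Y_obs[Z == 1]) - mean(Y_obs[Z == 0])
c(true_ATE = true_ATE, diff_in_means = estimate)
```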
Hypothesis testing: Neyman vs Fisher (Lec 05)
Neyman Approach (ATE)
- Focuses on estimating the average effect in the population
- Tests hypotheses like \(H_0: ATE = 0\) vs \(H_a: ATE \neq 0\)
- Uses test statistics (e.g., \(t\)-stat = estimate / SE) and \(p\)-values; rejects \(H_0\) if \(p\)-value < \(\alpha\)
- Confidence Intervals provide a range of plausible values for the ATE
- Relies on large sample approximations (Central Limit Theorem)
- Considers Type I (\(\alpha\)) and Type II (\(\beta\)) errors; Power = 1 - \(\beta\)
Fisher Approach (Randomisation Inference)
- Uses the random assignment process itself as the basis for inference
- Tests the sharp null hypothesis (\(H_0: Y_i(1) = Y_i(0)\) for all \(i\))
- Simulates all possible random assignments under \(H_0\) to build a reference distribution
- The \(p\)-value is the proportion of simulated statistics at least as extreme as the observed one
- Requires fewer assumptions (no normality) and yields exact \(p\)-values; good for small samples (uses the `ri2` package)
Randomisation inference details (Lec 05 cont)
- The core idea of RI is to ask: “Assuming the treatment had absolutely no effect on anyone (the sharp null), how likely were we to get a difference-in-means as large as the one we actually observed, just by the random chance of assignment?”
- We generate the randomisation distribution by:
- Assuming \(H_0\) is true, so \(Y_i(1)=Y_i(0)=Y_i^{obs}\) for all \(i\)
- Recalculating the difference-in-means (or other test statistic) for many (or all) possible ways the units could have been randomly assigned to \(Z=1\) and \(Z=0\)
- Plotting these simulated differences
- The \(p\)-value is the fraction of simulated differences that are at least as large in magnitude as our actual observed difference
- This avoids assumptions about the distribution of outcomes needed for t-tests
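A bare-bones version of this procedure in base R (the `ri2` package automates and generalises it); the data are simulated, and we approximate the full randomisation distribution with 5,000 re-randomisations rather than enumerating every possible assignment:

```r
set.seed(385)
n <- 50
Z <- sample(rep(c(0, 1), n / 2))
Y <- rnorm(n) + 0.4 * Z               # simulated outcomes with a modest true effect

obs_diff <- mean(Y[Z == 1]) - mean(Y[Z == 0])

# Under the sharp null the observed Y is fixed whatever the assignment,
# so we simply re-shuffle Z and recompute the test statistic
sims <- replicate(5000, {
  Z_sim <- sample(Z)
  mean(Y[Z_sim == 1]) - mean(Y[Z_sim == 0])
})

# Two-sided p-value: share of simulated differences at least as extreme
mean(abs(sims) >= abs(obs_diff))
```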
Key experimental findings (Lec 06)
- Discussed influential studies applying experimental methods:
- Kalla & Broockman (2015): Used a field experiment (blocked randomisation) to show revealing donor status significantly increased political access to US congressional officials
- Bertrand & Mullainathan (2004): Employed a correspondence study (field experiment) randomising names on CVs, finding significant callback gaps favouring White-sounding names in the US labour market
- Chattopadhyay & Duflo (2004): Leveraged a natural experiment (randomised council seat reservations in India) showing female leaders prioritised different public goods (water vs roads) compared to male leaders
Design Challenges & Solutions 🛠️
Blocking and clustering (Lec 07 & 08)
Blocking
- Group units by pre-treatment covariates (\(X\)) related to outcome (\(Y\)); randomise within blocks
- Increases precision (removes between-block variance), ensures balance on \(X\)
- Include block fixed effects or use interaction estimators (`lm_lin`)
Clustering
- Treatment assigned at group level (village, school); outcomes measured at individual level
- Often necessary due to practical constraints or spillovers
- Challenge: Intra-Cluster Correlation (ICC) violates independence (\(\rho = \frac{\sigma^2_{between}}{\sigma^2_{between} + \sigma^2_{within}}\))
Clustering Consequences & Power
- Use Cluster-Robust Standard Errors (CRSE) (`estimatr::lm_robust(..., clusters = ...)`); requires a sufficient number of clusters
- Power analysis must account for ICC; power is driven more by the number of clusters than by cluster size
- Improve designs via pair-matching or blocking at the cluster level
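A sketch of how these designs are typically randomised and analysed with `randomizr` and `estimatr`; the data frame and variable names (`block_id`, `village_id`, covariate `x`) are hypothetical placeholders:

```r
library(randomizr)
library(estimatr)

# Hypothetical data: 20 villages of 10 people, nested in 5 blocks, with covariate x
set.seed(385)
df <- data.frame(
  village_id = rep(1:20, each = 10),
  block_id   = rep(1:5, each = 40),
  x          = rnorm(200)
)
df$Y0 <- df$x + rnorm(200)

# Blocked design: randomise within blocks, adjust with the Lin interaction estimator
df$Z <- block_ra(blocks = df$block_id, prob = 0.5)
df$Y <- df$Y0 + 0.3 * df$Z
lm_lin(Y ~ Z, covariates = ~ x, data = df)

# Clustered design: assign whole villages, use cluster-robust standard errors
df$Z <- cluster_ra(clusters = df$village_id, prob = 0.5)
df$Y <- df$Y0 + 0.3 * df$Z
lm_robust(Y ~ Z, clusters = village_id, data = df)
```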
Power analysis principles (Lec 08 cont)
- Statistical power is the probability of correctly rejecting a false null hypothesis (detecting a true effect)
- Conventionally aim for power \(\ge 0.80\)
- Power depends on:
- Effect size: Larger effects are easier to detect
- Sample size (N): Larger N increases power
- Significance level (\(\alpha\)): Lower \(\alpha\) reduces power
- Outcome variance (\(\sigma^2\)): Lower variance increases power
- Proportion treated: Power is maximised with equal group sizes (50/50 split)
- Design features: Blocking increases power; clustering decreases power (must use the effective sample size, \(n_{ESS}\))
- Conduct power analysis before the experiment using tools like `DeclareDesign` or power calculators, making reasonable assumptions about effect size and variance
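A minimal simulation-based power check under assumed values (true effect of 0.3 SD, N = 200, \(\alpha\) = 0.05); `diagnose_design()` in `DeclareDesign` reports the same quantity automatically:

```r
set.seed(385)
power_sim <- function(n = 200, effect = 0.3, sims = 2000, alpha = 0.05) {
  rejections <- replicate(sims, {
    Z <- sample(rep(c(0, 1), n / 2))
    Y <- rnorm(n) + effect * Z
    t.test(Y[Z == 1], Y[Z == 0])$p.value < alpha
  })
  mean(rejections)   # share of simulations in which H0 is (correctly) rejected
}

power_sim()               # power at the assumed effect size
power_sim(effect = 0.15)  # a smaller effect gives lower power
```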
One-sided non-compliance (Lec 09)
- Some assigned to treatment (\(Z=1\)) don’t receive it (\(D=0\)), but control compliance (\(Z=0 \implies D=0\)) is perfect
- Compliers (\(D_i(1)=1, D_i(0)=0\)) and Never-takers (\(D_i(1)=0, D_i(0)=0\))
- Intent-to-Treat (ITT) effect (\(E[Y|Z=1] - E[Y|Z=0]\)) estimates the effect of assignment; it’s unbiased but diluted
- Complier Average Causal Effect (CACE/LATE) (\(E[Y_i(1) - Y_i(0) | D_i(1)>D_i(0)]\)) estimates the effect of treatment on compliers
- Estimation uses Instrumental Variables (IV) / 2SLS, with assignment (\(Z\)) as instrument for treatment receipt (\(D\))
- \(CACE = ITT_Y / ITT_D\)
- Requires the relevance, exclusion restriction, and independence assumptions
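In practice the CACE is usually estimated by 2SLS with assignment as the instrument, for example with `estimatr::iv_robust()`; the data below are simulated with roughly 70% compliance and are purely illustrative:

```r
library(estimatr)

set.seed(385)
n <- 1000
Z <- sample(rep(c(0, 1), n / 2))      # random assignment
complier <- rbinom(n, 1, 0.7)         # ~70% compliers, the rest never-takers
D <- Z * complier                     # one-sided non-compliance
Y <- 2 * D + rnorm(n)                 # effect operates only through treatment received
df <- data.frame(Y, D, Z)

# ITT: effect of assignment, unbiased but diluted by non-compliance
lm_robust(Y ~ Z, data = df)

# CACE/LATE: 2SLS with Z instrumenting D (numerically equal to ITT_Y / ITT_D)
iv_robust(Y ~ D | Z, data = df)
```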
Compliance types
|                  | \(D_i(0)=0\) | \(D_i(0)=1\) |
|------------------|--------------|--------------|
| \(D_i(1)=0\)     | never-taker  | defier       |
| \(D_i(1)=1\)     | complier     | always-taker |
Two-sided non-compliance (Lec 10)
- Non-compliance occurs in both arms: some \(Z=1\) don’t get \(D=1\); some \(Z=0\) do get \(D=1\) (e.g., control group finds alternative access)
- Adds potential for Always-takers (\(D_i(1)=1, D_i(0)=1\)) and Defiers (\(D_i(1)=0, D_i(0)=1\))
- Observed groups become mixtures of compliance types
- Requires the Monotonicity Assumption (assume no Defiers) to identify CACE; this implies \(D_i(1) \ge D_i(0)\) for all \(i\)
- Estimation still uses IV/2SLS (\(CACE = ITT_Y / ITT_D\))
- Always-takers don’t bias the IV estimate under monotonicity
Attrition: missing outcome data (Lec 11)
- Attrition involves missing outcome data post-randomisation (e.g., participants drop out)
- Bias occurs if attrition is non-random (differential attrition related to treatment or potential outcomes)
- Handling Options:
- Assume MCAR (missing completely at random, which is usually unlikely); analyse complete cases (reduces power)
- Assume MAR / Conditional Ignorability (\(MIPO|X\)): Missingness depends only on observed pre-treatment \(X\); use Inverse Probability Weighting (IPW) to upweight observed units similar to missing ones
- Assume MNAR: Missingness depends on unobservables; use Bounds Analysis to estimate range of possible ATEs under worst-case (Manski bounds) or monotonicity assumptions (Lee bounds)
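A minimal IPW sketch under the \(MIPO|X\) assumption: model the probability of being observed given a pre-treatment covariate, then weight complete cases by the inverse of that probability. The data and the covariate `x` are hypothetical:

```r
library(estimatr)

# Simulated data where response depends only on pre-treatment x (MAR)
set.seed(385)
n  <- 1000
x  <- rnorm(n)
Z  <- sample(rep(c(0, 1), n / 2))
Y  <- 0.5 * Z + x + rnorm(n)
observed <- rbinom(n, 1, plogis(0.5 + x))   # missingness depends on x only
df <- data.frame(Y, Z, x, observed)

# Step 1: model Pr(observed | X) with a logistic regression
df$p_obs <- glm(observed ~ x, family = binomial, data = df)$fitted.values

# Step 2: analyse complete cases, weighting each by 1 / Pr(observed | X)
lm_robust(Y ~ Z, weights = 1 / p_obs, data = df, subset = observed == 1)
```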
Ethics in research design (Lec 12)
- Ethical conduct is integral to good science
- Core Principles from the Belmont Report:
- Respect for Persons: Requires informed consent, autonomy, and protection for vulnerable groups
- Beneficence: Involves minimising harm and maximising potential benefits through careful risk-benefit assessment; Equipoise (genuine uncertainty) is key
- Justice: Demands fair participant selection and equitable distribution of research burdens/benefits
- Practical implementation involves Institutional Review Boards (IRBs), clear consent processes, data protection, and considering staff/community well-being
- Adaptive designs can enhance ethics by allocating more participants to effective treatments sooner
Six components of a `DeclareDesign` study (Lec 13)
- `DeclareDesign` formalises research plans using six key components, specified using `declare_*` functions:
- Population: Defines units and their characteristics (`declare_model`)
- Potential outcomes: Specifies how outcomes depend on treatments (`declare_model`)
- Sampling strategy: How units are selected (`declare_sampling`)
- Assignment: How units are assigned to treatment (`declare_assignment`)
- Estimand: The target quantity of interest (`declare_inquiry`)
- Estimator: The procedure/model used for estimation (`declare_estimator`)
- The `DesignLibrary` package provides pre-built templates for common designs
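For example, a pre-built two-arm template can be pulled from `DesignLibrary` and diagnosed directly (parameter values here are illustrative, and the exact designer arguments may differ across package versions):

```r
library(DesignLibrary)

# A standard two-arm design: 100 units, assumed true ATE of 0.25
design <- two_arm_designer(N = 100, ate = 0.25)

# Simulate it repeatedly to check power, bias, coverage, etc.
diagnose_design(design)
```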
Pre-analysis plans (PAPs) in practice (Lec 14)
- PAPs detail the research plan (hypotheses, design, analysis) before data analysis
- Aim to increase transparency, reduce bias (p-hacking, HARKing), enhance credibility
- Stemmed from the reproducibility crisis
- Key components: Motivation, Hypotheses, Population/Sampling, Intervention, Outcomes/Covariates, Randomisation, Analysis Plan (estimators, SEs, power, missing data, subgroups), Implementation details
- Should distinguish confirmatory (pre-specified) from exploratory analyses
- Pros: Credibility, transparency, limits researcher degrees of freedom
- Cons: Time-consuming, potentially inflexible (mitigated by allowing pre-specified exploratory analysis or clear justifications for deviations)
- Registries like OSF, AEA, EGAP host PAPs
- SOPs (Standard Operating Procedures) offer a potentially more flexible alternative but are less common
Advanced Methods & Applications 🔬
Natural & quasi-experiments (Lec 15)
- Used when RCTs aren’t feasible/ethical, leveraging “as-if” random assignment
- Natural Experiments rely on assignment outside researcher control (e.g., lotteries); require strong exogeneity arguments
- Quasi-Experiments are a broader category; common designs include:
- Regression Discontinuity (RDD): Exploits sharp cutoff rules (e.g., Mignozzetti et al.); assumes continuity of potential outcomes at the cutoff
- Difference-in-Differences (DID): Compares changes over time for treated vs control (e.g., Card & Krueger); assumes parallel trends in the absence of treatment
- Validity depends heavily on the plausibility of underlying assumptions
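As a reminder of the DID logic, the estimate is the interaction of a treated-group indicator with a post-period indicator; the panel below is simulated, and the estimate is only credible under parallel trends:

```r
library(estimatr)

# Hypothetical two-period panel: 100 treated and 100 control units
set.seed(385)
df <- expand.grid(id = 1:200, post = c(0, 1))
df$treated <- as.numeric(df$id <= 100)
df$Y <- 1 + 0.5 * df$treated + 0.3 * df$post +
  0.4 * df$treated * df$post + rnorm(400)     # true DID effect of 0.4

# The treated:post coefficient is the difference-in-differences estimate
lm_robust(Y ~ treated * post, data = df, clusters = id)
```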
Interference & spillovers (Lec 16)
- Interference occurs when one unit’s treatment affects another’s outcome (a SUTVA violation); common in social/network settings
- Standard ATE estimates become biased
- Requires expanding the potential outcomes notation (e.g., \(Y_{i}(Z_i, Z_{-i})\))
- Designs to address/estimate interference:
- Clustered randomisation: Randomise at a level high enough to contain spillovers
- Multi-level designs: Randomise at multiple levels (e.g., household & individual) to separate direct/indirect effects
Heterogeneous treatment effects (HTE) (Lec 18)
- Effects often vary; ATE is just the average; understanding variability is key
- Challenge: \(Var(\tau)\) depends on unidentifiable \(Cov(Y(1), Y(0))\)
- Exploring HTE:
- Treatment-by-Covariate Interactions (CATEs): Estimate the ATE within subgroups based on pre-treatment \(X\); use regression interactions (\(Y \sim Z * X\)); Caution: covariates are not randomised, so interactions are only correlational evidence about the source of HTE; multiple comparisons risk
- Treatment-by-Treatment Interactions (Factorial Designs): Experimentally manipulate multiple factors (\(Z_1, Z_2\)); allows causal inference about interactions; requires larger N
- Beware the multiple comparisons problem when testing many subgroups; use corrections (Bonferroni) or pre-specification
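A minimal CATE sketch via a regression interaction; the subgroup indicator `x` is hypothetical and the data are simulated so the effect is larger when `x = 1`:

```r
library(estimatr)

set.seed(385)
n <- 2000
x <- rbinom(n, 1, 0.5)                  # pre-treatment subgroup indicator
Z <- sample(rep(c(0, 1), n / 2))
Y <- 0.2 * Z + 0.4 * Z * x + rnorm(n)   # effect is 0.2 when x = 0, 0.6 when x = 1
df <- data.frame(Y, Z, x)

# The Z:x coefficient estimates how the treatment effect differs across subgroups
lm_robust(Y ~ Z * x, data = df)
```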
Survey experiments (Lec 21 & 22)
Survey Experiments (Lec 21)
- Random assignment embedded within survey instruments
- Ideal for studying attitudes, preferences, information effects
- Common designs: Question wording/framing, order effects, vignettes
- Trade-offs: High internal validity vs potential external validity/demand effect concerns
Sensitive Topics (Lec 22)
- Challenge: Social desirability bias
- Goal: Elicit truthful responses while protecting privacy
- Techniques: List Experiment, Randomised Response (RRT), Endorsement Experiment, Conjoint Analysis
Survey experiments: validation & design (Lec 21 cont)
- Ensuring construct validity (manipulating what you intend):
- Pilot testing treatments before main study
- Manipulation checks (post-treatment questions assessing whether the manipulation worked)
- Placebo conditions (similar task/info but without key manipulation)
- Non-equivalent outcomes (outcomes that shouldn’t be affected)
- Design considerations:
- Comparability across conditions (length, complexity)
- Realism of vignettes/stimuli
- Respondent burden and attention (timers, forced exposure)
- Device compatibility (mobile vs desktop)
Sensitive survey techniques details (Lec 22 cont)
- List Experiment: Compare mean count between T (list + sensitive item) and C (list only); difference estimates prevalence; assumes no design effects/no liars; watch for floor/ceiling effects
- RRT: Respondent uses random device (coin flip) to determine whether to answer truthfully or give fixed response; known probabilities allow estimation; can be confusing for respondents but often performs well in validation
- Endorsement Experiment: Randomly associate policy/statement with endorsing group; difference in support reveals implicit attitude towards endorser; analysis complex with multiple endorsers
- Conjoint Analysis: Respondents choose between profiles with multiple randomised attributes; estimates importance of each attribute (including sensitive ones) via trade-offs; powerful but complex design/analysis
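To make the list experiment estimator concrete, here is a simulated example of the basic difference-in-means version (true prevalence set to 20%); dedicated tools such as the `list` package provide more refined estimators:

```r
set.seed(385)
n <- 2000
treat <- sample(rep(c(0, 1), n / 2))    # 1 = long list including the sensitive item
baseline  <- rbinom(n, 4, 0.5)          # "yes" count over 4 innocuous items
sensitive <- rbinom(n, 1, 0.2)          # true prevalence of the sensitive item: 20%
count <- baseline + treat * sensitive   # respondents only report the total count

# Difference in mean counts estimates the prevalence of the sensitive item
mean(count[treat == 1]) - mean(count[treat == 0])
```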
Discussions & Integration 🌍
Key paper discussions (Lec 06, 17, 19, 23)
Foundational Ideas & Design
- Lec 06: Kalla & Broockman (access), Bertrand & Mullainathan (discrimination), Chattopadhyay & Duflo (representation)
- Classic examples of field & natural experiments demonstrating core concepts
- Lec 17: Centola (networks/contagion), Paluck (network intervention/climate), Gerber & Green (GOTV/interference/IV)
- Focused on exploring interference, network structure, and spillover effects experimentally and analytically
Identification & Complex Settings
- Lec 19: Munshi (networks/IV/FE), Miguel & Kremer (externalities/cluster RCT/spillovers)
- Showcased clever identification strategies using IVs and cluster-randomisation for observational data and spillovers
- Lec 23: Druckman (list/sensitive), Blair (list+endorsement/sensitive), Rosenfeld (validation/sensitive), Freire & Skarbek (conjoint/sensitive)
- Illustrated application and validation of methods for sensitive topics
Integration of research findings (Lec 24)
- Generalising results (extrapolation) is challenging; distinguish Sample ATE (SATE) from Population ATE (PATE); PATE estimation adds sampling uncertainty
- The Bayesian framework formally updates prior beliefs with new evidence and can incorporate beliefs about potential bias (e.g., sampling bias); the posterior is a precision-weighted average
- Meta-Analysis pools results from multiple studies
- Fixed Effects assumes one true effect; weights by precision (\(1/SE^2\))
- Random Effects allows true effect to vary across studies (more realistic); accounts for between-study heterogeneity
- Beware publication bias and study heterogeneity
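A hand-rolled fixed-effects (inverse-variance) pooling of five invented study results, just to show the precision-weighting; packages such as `metafor` implement both fixed- and random-effects models properly:

```r
# Invented estimates and standard errors from five hypothetical studies
est <- c(0.30, 0.12, 0.25, 0.05, 0.40)
se  <- c(0.10, 0.08, 0.15, 0.12, 0.20)

# Fixed-effects pooling: weight each study by its precision, 1 / SE^2
w <- 1 / se^2
pooled_est <- sum(w * est) / sum(w)
pooled_se  <- sqrt(1 / sum(w))

c(estimate = pooled_est,
  lower_95 = pooled_est - 1.96 * pooled_se,
  upper_95 = pooled_est + 1.96 * pooled_se)
```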
Wrapping up
- Phew, that was a lot of content! 😅
- We’ve moved from basic experimental design (MIDA, PO) through implementation challenges (compliance, attrition, ethics, interference) to advanced analysis (HTE, mediation, quasi-experimental methods) and synthesis (meta-analysis)
- Key theme: Understanding assumptions, potential biases, and choosing appropriate experimental designs and methods
- Experiments are amazing 🤩! But they require careful thought and execution
- Thank you for your engagement throughout the course 🙌
- Any questions?
Thank you for your attention! 🙏