QTM 385 - Experimental Methods

Lecture 25 - Course Revision

Danilo Freire

Emory University

Welcome back! 🤓
Course revision session 📚

Today’s goal: connecting the dots

  • We have covered a lot of ground in experimental methods this semester! 🥳
  • Today, we will review some key concepts and methods from the course
  • The aim is to see how different topics link together
  • And feel free to ask me questions about the group project, or anything else! 😉

Foundations 🏗️

The research design process (Lec 02)

  • Good research questions produce knowledge people care about, help solve practical problems, or inform policy
  • Research questions should be clear, specific, and answerable
  • No experiment is theory-free, even if not explicitly stated
  • Operationalisation involves translating abstract concepts (e.g., social isolation) into measurable variables (e.g., frequency of social interactions)
    • Construct validity ensures the measure accurately reflects the concept
  • Credible designs yield practical answers, are transparent via pre-registration (PAPs), and are replicable

The MIDA framework (Lec 02 cont)

  • The MIDA framework provides a structure for declaring and diagnosing any research design:
    • Model: Assumptions about how the world works (potential outcomes, relationships)
    • Inquiry: The specific question (estimand) we want to answer (e.g., the ATE)
    • Data Strategy: How data are generated (sampling, treatment assignment)
    • Answer Strategy: The estimator used to answer the inquiry from the data (e.g., difference-in-means, regression)
  • Using MIDA in code (with DeclareDesign) allows simulating the design to understand its properties (bias, power, etc.) before implementation; see the sketch below

Source: DeclareDesign
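
A minimal sketch of how such a design could be declared and diagnosed in code (the N, effect size, and variable names are illustrative assumptions; argument names follow recent DeclareDesign releases):

    library(DeclareDesign)

    design <-
      # M: model -- 500 units, noise U, and a hypothesised potential-outcomes function
      declare_model(
        N = 500,
        U = rnorm(N),
        potential_outcomes(Y ~ 0.25 * Z + U)   # assumed true effect of 0.25
      ) +
      # I: inquiry -- the ATE
      declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
      # D: data strategy -- sample 250 units, assign half of them to treatment
      declare_sampling(S = complete_rs(N, n = 250)) +
      declare_assignment(Z = complete_ra(N, prob = 0.5)) +
      declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
      # A: answer strategy -- difference-in-means via lm_robust
      declare_estimator(Y ~ Z, .method = lm_robust, inquiry = "ATE")

    diagnose_design(design)   # simulates the design: bias, power, coverage, etc.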

Potential outcomes & causality (Lec 03)

  • The Potential Outcomes (PO) framework gives us the formal language for defining causal effects
    • For each unit \(i\), there’s an outcome if treated (\(Y_i(1)\)) and an outcome if untreated (\(Y_i(0)\))
    • The individual treatment effect is \(\tau_i = Y_i(1) - Y_i(0)\)
  • The Fundamental Problem of Causal Inference states we only observe one potential outcome per unit (\(Y_i = Y_i(1)Z_i + Y_i(0)(1-Z_i)\))
  • Causality is inherently a missing data problem
  • Our goal is often to estimate population averages, like the Average Treatment Effect (ATE): \(ATE = E[Y_i(1) - Y_i(0)]\) (a tiny simulation follows below)
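
A tiny simulation (with made-up numbers) makes the fundamental problem concrete: the true ATE is computable only because we simulate both potential outcomes, while each unit reveals just one of them:

    set.seed(385)
    n  <- 1000
    Y0 <- rnorm(n, mean = 10)            # potential outcome if untreated
    Y1 <- Y0 + 2                         # potential outcome if treated (true tau_i = 2)
    Z  <- rbinom(n, 1, 0.5)              # random assignment
    Y  <- Z * Y1 + (1 - Z) * Y0          # switching equation: only one PO is observed

    mean(Y1 - Y0)                        # true ATE (knowable only in simulation)
    mean(Y[Z == 1]) - mean(Y[Z == 0])    # difference-in-means estimate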

Selection bias & SUTVA (Lec 03 cont)

  • Selection bias arises when comparing groups that differ systematically before treatment (e.g., sicker people choosing hospitals), biasing simple comparisons
  • Randomisation breaks the link between potential outcomes and treatment receipt, making groups comparable on average (in expectation) on all pre-treatment characteristics
  • This allows unbiased ATE estimation via difference-in-means
  • SUTVA (Stable Unit Treatment Value Assumption):
    • No interference between units (one unit’s treatment doesn’t affect another’s outcome)
    • Consistent treatment value (the treatment is the same for all units receiving it)

  • Three causal structures to keep in mind:
  1. Confounding: Underlying illness (\(Z\)) causes both taking a specific drug (\(X\)) and experiencing a bad outcome (\(Y\))
  2. Mediation: Smoking (\(X\)) causes lung damage (\(Z\)), which then leads to breathing difficulties (\(Y\))
  3. Collision: A student’s talent (\(X\)) influences their grades (\(Y\)); both talent (\(X\)) and grades (\(Y\)) influence getting a scholarship (\(Z\)) – selecting only scholarship winners (\(Z\)) distorts the observed talent-grade relationship

Selection bias (Lec 04)

  1. Start with the observed difference in means: \(E[Y_i|Z_i=1] - E[Y_i|Z_i=0]\)
  2. Substitute observed outcomes with potential outcomes based on treatment status: \(= E[Y_i(1)|Z_i=1] - E[Y_i(0)|Z_i=0]\)
  3. Add and subtract the counterfactual for the treated group (\(E[Y_i(0)|Z_i=1]\)): \(= E[Y_i(1)|Z_i=1] - E[Y_i(0)|Z_i=1] + E[Y_i(0)|Z_i=1] - E[Y_i(0)|Z_i=0]\)
  4. Group the terms: \(= \{ E[Y_i(1)|Z_i=1] - E[Y_i(0)|Z_i=1] \} + \{ E[Y_i(0)|Z_i=1] - E[Y_i(0)|Z_i=0] \}\)
  5. Identify the components: \(= ATT + \text{Selection Bias}\)
    • Where ATT is the Average Treatment Effect on the Treated
    • And Selection Bias is \(\{ E[Y_i(0)|Z_i=1] - E[Y_i(0)|Z_i=0] \}\), which is the difference in the potential untreated outcomes between those who selected treatment and those who did not
  6. Under randomisation, \(ATT = ATE\), and the selection bias term averages to zero (a simulation sketch follows after this list)
  • Balance tests check pre-treatment covariate balance; useful diagnostic, but doesn’t guarantee balance on unobservables
  • Beware common biases like attrition bias, survivorship bias, and post-treatment bias (controlling for mediators)
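
A short simulation (hypothetical health example with assumed numbers) shows the decomposition in action: when sicker people select into treatment, the naive comparison absorbs the selection-bias term, while randomisation removes it:

    set.seed(385)
    n    <- 10000
    sick <- rnorm(n)                     # pre-treatment illness severity
    Y0   <- -sick + rnorm(n)             # sicker people have worse untreated outcomes
    Y1   <- Y0 + 1                       # true treatment effect = 1

    # Self-selection: sicker people are more likely to take the treatment
    Z_self <- rbinom(n, 1, plogis(sick))
    Y_self <- Z_self * Y1 + (1 - Z_self) * Y0
    mean(Y_self[Z_self == 1]) - mean(Y_self[Z_self == 0])   # biased: ATT + negative selection bias

    # Randomisation: groups are comparable in expectation, so the estimate is unbiased
    Z_rand <- rbinom(n, 1, 0.5)
    Y_rand <- Z_rand * Y1 + (1 - Z_rand) * Y0
    mean(Y_rand[Z_rand == 1]) - mean(Y_rand[Z_rand == 0])    # close to 1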

Hypothesis testing: Neyman vs Fisher (Lec 05)

Neyman Approach (ATE)

  • Focuses on estimating the average effect in the population
  • Tests hypotheses like \(H_0: ATE = 0\) vs \(H_a: ATE \neq 0\)
  • Uses test statistics (e.g., \(t\)-stat = estimate / SE) and \(p\)-values; rejects \(H_0\) if \(p\)-value < \(\alpha\)
  • Confidence Intervals provide a range of plausible values for the ATE
  • Relies on large sample approximations (Central Limit Theorem)
  • Considers Type I (\(\alpha\)) and Type II (\(\beta\)) errors; Power = 1 - \(\beta\)

Fisher Approach (Randomisation Inference)

  • Uses the random assignment process itself as the basis for inference
  • Tests the sharp null hypothesis (\(H_0: Y_i(1) = Y_i(0)\) for all \(i\))
  • Simulates all possible random assignments under \(H_0\) to build a reference distribution
  • The \(p\)-value is the proportion of simulated statistics as extreme as the observed one
  • Requires fewer assumptions (no normality) and yields exact \(p\)-values; good for small samples (e.g., via the ri2 package in R)

Randomisation inference details (Lec 05 cont)

  • The core idea of RI is to ask: “Assuming the treatment had absolutely no effect on anyone (the sharp null), how likely were we to get a difference-in-means as large as the one we actually observed, just by the random chance of assignment?”
  • We generate the randomisation distribution by:
    • Assuming \(H_0\) is true, so \(Y_i(1)=Y_i(0)=Y_i^{obs}\) for all \(i\)
    • Recalculating the difference-in-means (or other test statistic) for many (or all) possible ways the units could have been randomly assigned to \(Z=1\) and \(Z=0\)
    • Plotting these simulated differences
  • The \(p\)-value is the fraction of simulated differences that are as large or larger in magnitude than our actual observed difference
  • This avoids the distributional assumptions about outcomes that t-tests require (a bare-bones permutation sketch follows below)
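
A bare-bones permutation version of this procedure (the ri2 package automates and generalises it; the toy data below are assumed):

    set.seed(385)
    Z <- rep(c(0, 1), each = 20)               # actual assignment
    Y <- rnorm(40) + 0.5 * Z                   # observed outcomes

    obs_diff <- mean(Y[Z == 1]) - mean(Y[Z == 0])

    # Under the sharp null the outcomes are fixed; only the assignment could have differed
    sims <- replicate(10000, {
      Z_sim <- sample(Z)                       # shuffle the assignment labels
      mean(Y[Z_sim == 1]) - mean(Y[Z_sim == 0])
    })

    p_value <- mean(abs(sims) >= abs(obs_diff))   # two-sided RI p-value
    p_value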

Key experimental findings (Lec 06)

  • Discussed influential studies applying experimental methods:
    • Kalla & Broockman (2015): Used a field experiment (blocked randomisation) to show revealing donor status significantly increased political access to US congressional officials
    • Bertrand & Mullainathan (2004): Employed a correspondence study (field experiment) randomising names on CVs, finding significant callback gaps favouring White-sounding names in the US labour market
    • Chattopadhyay & Duflo (2004): Leveraged a natural experiment (randomised council seat reservations in India) showing female leaders prioritised different public goods (water vs roads) compared to male leaders

Design Challenges & Solutions 🛠️

Blocking and clustering (Lec 07 & 08)

Blocking

  • Group units by pre-treatment covariates (\(X\)) related to outcome (\(Y\)); randomise within blocks
  • Increases precision (removes between-block variance), ensures balance on \(X\)
  • Include block fixed effects or use interaction estimators (estimatr::lm_lin)

Clustering

  • Treatment assigned at group level (village, school); outcomes measured at individual level
  • Often necessary due to practical constraints or spillovers
  • Challenge: Intra-Cluster Correlation (ICC) violates independence (\(\rho = \frac{\sigma^2_{between}}{\sigma^2_{between} + \sigma^2_{within}}\))

Clustering Consequences & Power

  • Use Cluster-Robust Standard Errors (CRSE) (estimatr::lm_robust(..., clusters = ...)), requires sufficient clusters
  • Power analysis must account for ICC; driven more by number of clusters
  • Improve designs via pair-matching or blocking at the cluster level (see the estimatr sketch below)
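
A sketch with estimatr, assuming a data frame dat with outcome Y, treatment Z, a pre-treatment covariate X, a blocking variable block, and cluster IDs village (all names hypothetical):

    library(estimatr)

    # Blocked design: block fixed effects, or the Lin interaction estimator
    m_fe  <- lm_robust(Y ~ Z + factor(block), data = dat)
    m_lin <- lm_lin(Y ~ Z, covariates = ~ X + block, data = dat)

    # Clustered design: cluster-robust standard errors at the level of assignment
    m_cl  <- lm_robust(Y ~ Z, clusters = village, data = dat)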

Power analysis principles (Lec 08 cont)

  • Statistical power is the probability of correctly rejecting a false null hypothesis (detecting a true effect)
  • Conventionally aim for power \(\ge 0.80\)
  • Power depends on:
    • Effect size: Larger effects are easier to detect
    • Sample size (N): Larger N increases power
    • Significance level (\(\alpha\)): Lower \(\alpha\) reduces power
    • Outcome variance (\(\sigma^2\)): Lower variance increases power
    • Proportion treated: Power is maximised with equal group sizes (50/50 split)
    • Design features: Blocking increases power; clustering decreases power (account for the effective sample size, \(n_{ESS}\))
  • Conduct power analysis before the experiment using tools like DeclareDesign or power calculators, making reasonable assumptions about effect size and variance (a quick example follows below)
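
A quick illustration with base R's power.t.test plus the usual cluster design-effect adjustment (the effect size, ICC, and cluster size are assumed for illustration):

    # Per-arm n needed to detect a 0.3 SD effect with 80% power at alpha = 0.05
    pt <- power.t.test(delta = 0.3, sd = 1, sig.level = 0.05, power = 0.80)
    pt$n

    # Clustering inflates the required n via the design effect
    icc  <- 0.05                       # assumed intra-cluster correlation
    m    <- 20                         # assumed individuals per cluster
    deff <- 1 + (m - 1) * icc          # design effect
    ceiling(pt$n * deff)               # approximate per-arm n once clustering is accounted for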

One-sided non-compliance (Lec 09)

  • Some assigned to treatment (\(Z=1\)) don’t receive it (\(D=0\)), but control compliance (\(Z=0 \implies D=0\)) is perfect
  • Compliers (\(D_i(1)=1, D_i(0)=0\)) and Never-takers (\(D_i(1)=0, D_i(0)=0\))
  • Intent-to-Treat (ITT) effect (\(E[Y|Z=1] - E[Y|Z=0]\)) estimates the effect of assignment; it’s unbiased but diluted
  • Complier Average Causal Effect (CACE/LATE) (\(E[Y_i(1) - Y_i(0) | D_i(1)>D_i(0)]\)) estimates the effect of treatment on compliers
  • Estimation uses Instrumental Variables (IV) / 2SLS, with assignment (\(Z\)) as instrument for treatment receipt (\(D\))
  • \(CACE = ITT_Y / ITT_D\)
  • Requires the relevance, exclusion, and independence assumptions (estimation sketch below, after the compliance table)
Compliance types

                 Dᵢ(0) = 0       Dᵢ(0) = 1
  Dᵢ(1) = 0     never-taker      defier
  Dᵢ(1) = 1     complier         always-taker
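
A sketch of the CACE/LATE estimation, assuming a data frame dat with outcome Y, assignment Z, and treatment received D (names hypothetical); the Wald ratio and 2SLS via estimatr::iv_robust coincide in this simple case:

    library(estimatr)

    # ITT effects of assignment on the outcome and on treatment receipt
    itt_y <- coef(lm_robust(Y ~ Z, data = dat))["Z"]
    itt_d <- coef(lm_robust(D ~ Z, data = dat))["Z"]   # first stage / compliance rate

    itt_y / itt_d                                      # Wald estimator: CACE = ITT_Y / ITT_D

    # Equivalent 2SLS: instrument treatment received D with assignment Z
    iv_robust(Y ~ D | Z, data = dat)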

Two-sided non-compliance (Lec 10)

  • Non-compliance occurs in both arms: some \(Z=1\) don’t get \(D=1\); some \(Z=0\) do get \(D=1\) (e.g., the control group finds alternative access)
  • Adds potential for Always-takers (\(D_i(1)=1, D_i(0)=1\)) and Defiers (\(D_i(1)=0, D_i(0)=1\))
  • Observed groups become mixtures of compliance types
  • Requires the Monotonicity Assumption (assume no Defiers) to identify CACE; this implies \(D_i(1) \ge D_i(0)\) for all \(i\)
  • Estimation still uses IV/2SLS (\(CACE = ITT_Y / ITT_D\))
  • Always-takers don’t bias the IV estimate under monotonicity

Source: Facure (2022)

Attrition: missing outcome data (Lec 11)

  • Attrition involves missing outcome data post-randomisation (e.g., participants drop out)
  • Bias occurs if attrition is non-random (differential attrition related to treatment or potential outcomes)
  • Handling Options:
    • Assume MCAR (unlikely); analyse complete cases (reduces power)
    • Assume MAR / Conditional Ignorability (\(MIPO|X\)): Missingness depends only on observed pre-treatment \(X\); use Inverse Probability Weighting (IPW) to upweight observed units similar to missing ones (a short IPW sketch follows after this list)
    • Assume MNAR: Missingness depends on unobservables; use Bounds Analysis to estimate range of possible ATEs under worst-case (Manski bounds) or monotonicity assumptions (Lee bounds)
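
A minimal IPW sketch under the MAR/\(MIPO|X\) assumption, assuming a data frame dat with treatment Z, a pre-treatment covariate X, an indicator observed for non-missing outcomes, and outcome Y (names hypothetical):

    library(estimatr)

    # 1. Model the probability of being observed, given treatment and pre-treatment X
    p_obs <- glm(observed ~ Z + X, data = dat, family = binomial)
    dat$w <- 1 / predict(p_obs, type = "response")     # inverse probability weights

    # 2. Re-estimate the ATE on complete cases, upweighting units that resemble the missing
    lm_robust(Y ~ Z, weights = w, data = subset(dat, observed == 1))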

Ethics in research design (Lec 12)

  • Ethical conduct is integral to good science
  • Core Principles from the Belmont Report:
    • Respect for Persons: Requires informed consent, autonomy, and protection for vulnerable groups
    • Beneficence: Involves minimising harm and maximising potential benefits through careful risk-benefit assessment; Equipoise (genuine uncertainty) is key
    • Justice: Demands fair participant selection and equitable distribution of research burdens/benefits
  • Practical implementation involves Institutional Review Boards (IRBs), clear consent processes, data protection, and considering staff/community well-being
  • Adaptive designs can enhance ethics by allocating more participants to effective treatments sooner

Six components of a DeclareDesign study (Lec 13)

  • DeclareDesign formalises research plans using six key components, specified using declare_* functions:
    • Population: Defines units and their characteristics (declare_model)
    • Potential outcomes: Specifies how outcomes depend on treatments (declare_model)
    • Sampling strategy: How units are selected (declare_sampling)
    • Assignment: How units are assigned to treatment (declare_assignment)
    • Estimand: The target quantity of interest (declare_inquiry)
    • Estimator: The procedure/model used for estimation (declare_estimator)
  • The DesignLibrary package provides pre-built templates for common designs (a simulation and diagnosis sketch follows below)

Source: DeclareDesign
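
Once a design is declared (as in the MIDA sketch earlier), its properties can be simulated and summarised; simulate_design and diagnose_design are standard DeclareDesign functions:

    simulations <- simulate_design(design)         # one row per simulated estimate
    diagnosis   <- diagnose_design(design, sims = 500)
    diagnosis                                      # bias, RMSE, power, coverage, ...

    # DesignLibrary offers pre-built, parameterised versions of common designs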

Pre-analysis plans (PAPs) in practice (Lec 14)

  • PAPs detail the research plan (hypotheses, design, analysis) before data analysis
  • Aim to increase transparency, reduce bias (p-hacking, HARKing), enhance credibility
  • Stemmed from reproducibility crisis
  • Key components: Motivation, Hypotheses, Population/Sampling, Intervention, Outcomes/Covariates, Randomisation, Analysis Plan (estimators, SEs, power, missing data, subgroups), Implementation details
  • Should distinguish confirmatory (pre-specified) from exploratory analyses
  • Pros: Credibility, transparency, limits researcher degrees of freedom
  • Cons: Time-consuming, potentially inflexible (mitigated by allowing pre-specified exploratory analysis or clear justifications for deviations)
  • Registries like OSF, AEA, EGAP host PAPs
  • SOPs (Standard Operating Procedures) offer a potentially more flexible alternative but are less common

Advanced Methods & Applications 🔬

Natural & quasi-experiments (Lec 15)

  • Used when RCTs aren’t feasible/ethical, leveraging “as-if” random assignment
  • Natural Experiments rely on assignment outside researcher control (e.g., lotteries); require strong exogeneity arguments
  • Quasi-Experiments are a broader category; common designs include:
    • Regression Discontinuity (RDD): Exploits sharp cutoff rules (e.g., Mignozzetti et al.); assumes continuity of potential outcomes at the cutoff
    • Difference-in-Differences (DID): Compares changes over time for treated vs control (e.g., Card & Krueger); assumes parallel trends in the absence of treatment
  • Validity depends heavily on the plausibility of underlying assumptions

Interference & spillovers (Lec 16)

  • Interference occurs when one unit’s treatment affects another’s outcome (a SUTVA violation); common in social/network settings
  • Standard ATE estimates become biased
  • Requires expanding potential outcomes notation (e.g., \(Y_{i}(Z_i, Z_{-i})\))
  • Designs to address/estimate interference:
    • Clustered randomisation: Randomise at a level high enough to contain spillovers
    • Multi-level designs: Randomise at multiple levels (e.g., household & individual) to separate direct/indirect effects

Heterogeneous treatment effects (HTE) (Lec 18)

  • Effects often vary; ATE is just the average; understanding variability is key
  • Challenge: \(Var(\tau)\) depends on unidentifiable \(Cov(Y(1), Y(0))\)
  • Exploring HTE:
    • Treatment-by-Covariate Interactions (CATEs): Estimate the ATE within subgroups defined by pre-treatment \(X\); use regression interactions (\(Y \sim Z * X\)); caution: such interactions are descriptive about the source of heterogeneity, and testing many subgroups invites false positives
    • Treatment-by-Treatment Interactions (Factorial Designs): Experimentally manipulate multiple factors (\(Z_1, Z_2\)); allows causal inference about interactions; requires larger N
  • Beware the multiple comparisons problem when testing many subgroups; use corrections (e.g., Bonferroni) or pre-specification (an interaction sketch follows below)
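
A sketch of a treatment-by-covariate interaction, assuming dat contains outcome Y, treatment Z, and a binary pre-treatment moderator female (names hypothetical):

    library(estimatr)

    # The Z:female coefficient is the difference between subgroup CATEs
    lm_robust(Y ~ Z * female, data = dat)

    # Equivalently, subgroup ATEs (remember the multiple-comparisons caveat)
    lm_robust(Y ~ Z, data = subset(dat, female == 1))
    lm_robust(Y ~ Z, data = subset(dat, female == 0))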

Mediation analysis (Lec 20)

  • Seeks to understand how \(Z\) affects \(Y\) via mediator \(M\) (\(Z \to M \to Y\))
  • Traditional regression methods are often biased by unmeasured \(M\)-\(Y\) confounding (omitted variables affecting both \(M\) and \(Y\))
  • Potential outcomes approach reveals fundamental identification problem requiring strong ‘sequential ignorability’ assumptions (\(M\) is ‘as-if’ random conditional on \(Z\) and \(X\))
  • Randomising \(Z\) alone is insufficient
  • Using \(Z\) as an IV for \(M\) requires the strong exclusion restriction (no direct \(Z \to Y\) path), which is often violated
  • Implicit Mediation is a robust design-based alternative:
    • Experimentally vary treatment components (\(Z\) vs \(Z'\))
    • Compare effects of different bundles to infer mechanism importance without measuring \(M\)

Survey experiments (Lec 21 & 22)

Survey Experiments (Lec 21)

  • Random assignment embedded within survey instruments
  • Ideal for studying attitudes, preferences, information effects
  • Common designs: Question wording/framing, order effects, vignettes
  • Trade-offs: High internal validity vs potential external validity/demand effect concerns

Sensitive Topics (Lec 22)

  • Challenge: Social desirability bias
  • Goal: Elicit truthful responses while protecting privacy
  • Techniques: List Experiment, Randomised Response (RRT), Endorsement Experiment, Conjoint Analysis

Survey experiments: validation & design (Lec 21 cont)

  • Ensuring construct validity (manipulating what you intend):
    • Pilot testing treatments before main study
    • Manipulation checks (post-treatment questions assessing if manipulation worked)
    • Placebo conditions (similar task/info but without key manipulation)
    • Non-equivalent outcomes (outcomes that shouldn’t be affected)
  • Design considerations:
    • Comparability across conditions (length, complexity)
    • Realism of vignettes/stimuli
    • Respondent burden and attention (timers, forced exposure)
    • Device compatibility (mobile vs desktop)

Sensitive survey techniques details (Lec 22 cont)

  • List Experiment: Compare the mean item count between T (list + sensitive item) and C (list only); the difference estimates prevalence; assumes no design effects and no liars; watch for floor/ceiling effects (a short sketch follows after this list)
  • RRT: Respondent uses a random device (e.g., a coin flip) to determine whether to answer truthfully or give a fixed response; known probabilities allow estimation; can be confusing for respondents but often performs well in validation
  • Endorsement Experiment: Randomly associate policy/statement with endorsing group; difference in support reveals implicit attitude towards endorser; analysis complex with multiple endorsers
  • Conjoint Analysis: Respondents choose between profiles with multiple randomised attributes; estimates importance of each attribute (including sensitive ones) via trade-offs; powerful but complex design/analysis
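
A back-of-the-envelope list-experiment estimate, assuming dat has the count of affirmative items count and a treatment indicator treat (1 = list includes the sensitive item); the difference in mean counts estimates prevalence:

    library(estimatr)

    with(dat, mean(count[treat == 1]) - mean(count[treat == 0]))   # estimated prevalence
    lm_robust(count ~ treat, data = dat)                           # same estimate, with a standard error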

Discussions & Integration 🌍

Key paper discussions (Lec 06, 17, 19, 23)

Foundational Ideas & Design

  • Lec 06: Kalla & Broockman (access), Bertrand & Mullainathan (discrimination), Chattopadhyay & Duflo (representation)
    • Classic examples of field & natural experiments demonstrating core concepts
  • Lec 17: Centola (networks/contagion), Paluck (network intervention/climate), Gerber & Green (GOTV/interference/IV)
    • Focused on exploring interference, network structure, and spillover effects experimentally and analytically

Identification & Complex Settings

  • Lec 19: Munshi (networks/IV/FE), Miguel & Kremer (externalities/cluster RCT/spillovers)
    • Showcased clever identification strategies using IVs and cluster-randomisation for observational data and spillovers
  • Lec 23: Druckman (list/sensitive), Blair (list+endorsement/sensitive), Rosenfeld (validation/sensitive), Freire & Skarbek (conjoint/sensitive)
    • Illustrated application and validation of methods for sensitive topics

Integration of research findings (Lec 24)

  • Generalising results (extrapolation) is challenging; distinguish Sample ATE (SATE) from Population ATE (PATE); PATE estimation adds sampling uncertainty
  • The Bayesian framework formally updates prior beliefs with new evidence and can incorporate beliefs about potential bias (e.g., sampling bias); the posterior is a precision-weighted average
  • Meta-Analysis pools results from multiple studies
    • Fixed Effects assumes one true effect; weights by precision (\(1/SE^2\))
    • Random Effects allows true effect to vary across studies (more realistic); accounts for between-study heterogeneity
    • Beware publication bias and study heterogeneity

Generalisation & meta-analysis details (Lec 24 cont)

  • PATE standard error includes sampling variance: \(SE(\widehat{PATE}) = \sqrt{ \frac{Var(Y_i(1))}{m} + \frac{Var(Y_i(0))}{N-m} }\)
  • Bayesian updating combines prior precision (\(\rho_{prior}\)) and data precision (\(\rho_{data}\)) for posterior precision (\(\rho_{posterior} = \rho_{prior} + \rho_{data}\))
  • Posterior mean is precision-weighted average: \(\mu_{posterior} = \frac{\rho_{prior} \mu_{prior} + \rho_{data} x_e}{\rho_{posterior}}\)
  • Incorporating bias (\(B \sim N(\beta, \sigma^2_B)\)) reduces effective data precision: \(\rho_{effective\_data} = \frac{1}{\sigma^2_B + \sigma^2_{xe}}\)
  • Meta-analysis requires careful study selection (PRISMA), data extraction, and a choice between fixed- and random-effects models; meta-regression explores sources of heterogeneity (a worked numeric sketch follows below)
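
A worked numeric sketch of both formulas (all numbers invented for illustration): a precision-weighted Bayesian update, then fixed-effects pooling with weights \(1/SE^2\):

    # --- Bayesian updating: prior belief combined with one experimental result ---
    mu_prior  <- 0.10; se_prior <- 0.05
    x_e       <- 0.30; se_e     <- 0.10
    rho_prior <- 1 / se_prior^2
    rho_data  <- 1 / se_e^2
    rho_post  <- rho_prior + rho_data
    mu_post   <- (rho_prior * mu_prior + rho_data * x_e) / rho_post   # precision-weighted mean

    # --- Fixed-effects meta-analysis: three hypothetical studies ---
    est <- c(0.20, 0.35, 0.10)
    se  <- c(0.08, 0.15, 0.05)
    w   <- 1 / se^2                                  # precision weights
    pooled    <- sum(w * est) / sum(w)               # pooled estimate
    pooled_se <- sqrt(1 / sum(w))                    # pooled standard error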

Wrapping up

  • Phew, that was a lot of content! 😅
  • We’ve moved from basic experimental design (MIDA, potential outcomes) through implementation challenges (compliance, attrition, ethics, interference) to advanced analysis (HTE, mediation, quasi-experimental methods) and synthesis (meta-analysis)
  • Key theme: Understanding assumptions, potential biases, and choosing appropriate experimental designs and methods
  • Experiments are amazing 🤩! But they require careful thought and execution
  • Thank you for your engagement throughout the course 🙌
  • Any questions?

Thank you for your attention! 🙏