QTM 385 - Experimental Methods

Lecture 02 - The Research Design Process

Danilo Freire

Emory University

Hello, everyone! 👋
Nice to meet you again! 😊

Recap and lecture overview 📚

Welcome back! 🎉

About the course

Last week

  • We saw a brief history of experimental methods and their main characteristics
    • Key milestones from Petrarch (1394) to modern RCTs
  • We also saw the main types of experiments
    • Lab Experiments: Conducted in controlled environments with high internal validity
    • Field Experiments: Take place in real-world settings, offering higher external validity
    • Natural Experiments: Utilise naturally occurring variations, providing less control but valuable insights
  • Finally, we briefly discussed the distinction between correlation and causation
  • Key tools: R/Python, GitHub, Quarto, and Jupyter Notebooks
  • Assignments: Problem sets (50%), pre-analysis plan (20%), and final project (30%), with a late policy and collaboration guidelines

Course website and materials

Click on the image to access the course website! 🌐

https://danilofreire.github.io/qtm385/

Textbooks

Required: Alan Gerber and Donald Green - Field Experiments

Lots of additional readings included in the syllabus 😊

Today’s lecture 📅

The research design process: Questions and credibility

  • What makes a research question good?
  • The importance of theory
  • Designing and selecting your treatment
  • How to interpret your findings carefully
  • The importance of transparency and reproducibility
  • The research design process
  • EGAP research design form: A guide to help you think through the key components of your research design
  • Introduction to DeclareDesign and the MIDA framework
  • Brief overview of pre-registration and pre-analysis plans

But first…

Let’s do an experiment right now! 😄

Instructions

  • Come here to the front of the room (one at a time!)
  • You will look at my screen and answer a question
  • Please don’t say the answer out loud and don’t give hints to others!
  • Let’s see what happens! 😊


https://forms.gle/xUL39k7ngY2kXYJC7

Let’s discuss the results! 📊

What makes a research question good? 🤔

The importance of a good research question

  • The answer to a good research question should produce knowledge that people will care about
  • Addressing the question should (help) solve a problem, make a decision, or clarify/challenge our understanding of the world
  • But an interesting question is not enough
  • A good research design is a practical plan for research that makes the best use of available resources and produces a credible answer
  • The quality of a research design can be assessed by how well it produces results that can be used to guide policy and improve science:
    • A great research design produces results that clearly point in certain directions that we care about
    • A poor research design produces results that leave us in the dark — results with confusing interpretation

Source: University of Cambridge

Causality and experiments

  • As researchers, we are interested in research questions about how the world works

  • There are a number of different types of questions that we may want to answer. In academia, they are often divided into two broad categories:

    • Descriptive questions: Descriptions of a given phenomenon: e.g., “How do teachers allocate their time during a school day?”
    • Causal questions: Questions about how \(X\) affects \(Y\): e.g., “Does providing vocational training to migrants improve their economic integration in the receiving country?”
  • Only then can we move on to questions about why \(\rightarrow\) i.e., knowing the effect of a cause is necessary before moving on to understanding the causes of an effect

  • (Next sessions: more about what we mean by causality and how experiments give us leverage to make causal claims.)

Theory

  • What is the phenomenon we want to explain?

    • Our outcome (we are going to call it \(Y\))
  • Does the cause we theorise lead to observing changes in \(Y\)?

    • Our treatment (in the context of experiments) (we are going to call it \(T\))
    • We will use \(X\) to refer to background variables (covariates)
  • What is the theory of change?

  • We are ultimately interested in how two theoretical concepts are related, measured by observed variables \(T\) (our treatment) and \(Y\) (our outcomes)

  • Why is theory important, then?

The importance of theory

  • There is no such thing as “just doing an experiment” 🧐

  • All research design involves theory, whether implicit or explicit

  • Our questions are value-laden: for example, when social scientists studied marijuana use in the 1950s as a form of “deviance”, the questions focused on “why are people making such bad decisions?” or “how can policy makers prevent marijuana use?”

  • Why do the research? We might want to change how scientists explain the world and/or change the policy decisions in (a) one place and time and/or (b) in other places and times

  • Research focused on learning the causal effect of \(T\) on \(Y\) requires a model of the world: how intervention \(T\) might have an effect on some outcome \(Y\), why, and how large the effect might be. It also helps us think about how a different intervention, or targeting different recipients, might lead to different results.

  • Our theories and models are important not just for generating hypotheses, but for informing design and strategies for inference

  • Designing research will often clarify where we are less certain about our theories. Our theories will point to problems with our design. And questions arising from the process of design may indicate a need for more work on explanation and mechanism

Richard Feynman on the experimental method

Click on the image to watch the video! 📺

Designing or selecting your treatment

  • Operationalisation: The process of translating theoretical concepts into measurable variables.
    • Example: Turning the concept of “social isolation” into a measurable variable such as the frequency of social interactions
  • Key Questions:
    • How will we measure our outcomes?
    • What indicators will be used to represent the underlying concept?
    • How will we manipulate the cause of interest?
    • What intervention or treatment will be implemented?
  • Importance of Alignment:
    • The research design must align with the theoretical framework to ensure that the study is addressing the intended questions

Let’s consider the example from our experiment practicum

  • What is the outcome of interest (\(Y\))?

  • What is the cause of interest (\(T\))?

  • What theory could yield this experimental design?

  • What can be the main hypothesis?

  • How can we measure our outcomes?


  • What do you think?

Measuring treatments

  • Can we directly manipulate \(T\)? (underlying treatment concept of interest)

    • Ethical, logistical and other types of considerations can limit our ability to manipulate all of the indicators of \(T\)
    • At best, we may be able to change some of its indicators
    • We design a treatment, \(T\), to do so
  • How does our actual treatment relate to \(T\)?

    • But our treatment can be manipulating other things (bundled treatments)
  • Did everyone receive \(T\)?

    • Measure compliance

Source: World Health Organization, 2003

Measuring outcomes

  • Similarly, we often cannot directly observe the true value of the outcome concept for most of the outcomes we are interested in

  • Examples:

    • Correct answers to problems (indicators) for underlying mathematical aptitude (the actual phenomenon)
    • Days without food (indicators) for hunger (the actual phenomenon)
    • Reports of bribes (indicators) for corruption (the actual phenomenon)
  • Moreover, the underlying outcome concept may be even under debate (e.g., democracy)

  • If our indicators don’t measure the underlying concept that we’re interested in, then we may not be able to learn very much, even if we have an otherwise very sound experiment

Thinking about the treatment from the practicum…

How do you feel about the world?

  • Now think of yourselves as the researchers

  • In pairs or groups of three:

    • Generate hypotheses on potential heterogeneous effects
    • Generate an expected effect size
    • Discuss the theories behind the hypotheses and expected effect size, with emphasis on the importance of theory
    • Other ways of measuring the outcome or mode of administering the treatment?

Pre-registration and pre-analysis plans 📝

Bias in published research against null results

  • Anticipating or facing difficulties in getting published, researchers often never submit manuscripts with null results for review, or put them away in a “file drawer” after several rejections

  • We all face incentives to change our specifications, measurements, or even hypotheses to get a statistically significant result (\(p\)-hacking) and improve our chances of publication

  • Even people not facing these incentives make many decisions when they analyse data: handling missing values and duplicate observations, creating scales, etc. And these choices can be consequential

  • Overall result: reduced credibility for individual pieces of research and (rightly) reduced confidence in whether we actually know what we claim to know

Amy Cuddy demonstrating her theory of “power posing”, which could never be replicated. The TED Talk has 74 million views!

The reproducibility crisis in medicine

https://doi.org/10.1371/journal.pmed.0020124

The reproducibility crisis in psychology

https://www.science.org/doi/10.1126/science.aac4716

The reproducibility crisis in political science

https://www.journals.uchicago.edu/doi/abs/10.1086/734279?journalCode=jop

The reproducibility crisis in economics

https://onlinelibrary.wiley.com/doi/full/10.1111/joes.12598

The reproducibility crisis… everywhere?

https://onlinelibrary.wiley.com/doi/full/10.1002/jrsm.1703

Not even our code is reproducible

https://www.nature.com/articles/s41597-022-01143-6

Towards review of design rather than outcomes

  • One part of solving this problem is to focus on the design, rather than the outcomes

  • The bias against null results can be overcome by reviewing the design, prior to learning the results

  • A good design executed well will produce credible research, which might be a null result. We want credible and actionable null results

  • Reviews of designs are also an opportunity to improve the research before it is implemented

“…if you gave the PAP to two different programmers and asked each to prepare the data for the primary dependent/independent variable(s), they should both be able to do so without asking any questions, and they should both be able to get the same answer.” Olken, 2015

Pre-registration of analysis plans and research designs

  • Pre-registration is the filing of your research design and hypotheses with a publicly accessible repository. EGAP hosts one that you can use for free (currently on OSF.io using the EGAP registration form)

  • Pre-registration does not preclude later exploratory analyses that were not stated in advance. You just have to clearly distinguish between the two

  • Even if you will be submitting a paper with results rather than a design to an academic journal or you are primarily interested in a final report with findings for a policy audience, there are important advantages to you and to other researchers from pre-registering your research

    • You can learn about other research, completed and in progress; others can learn about yours. We can learn about studies that produced null results
    • It forces you to state your hypotheses and plan of analysis in advance of seeing the results, which limits \(p\)-hacking

Evidence in Governance and Politics (EGAP)

  • EGAP is a network of researchers and practitioners who conduct experiments to learn about effective governance and politics
  • They have long been advocates of pre-registration and reproducible research, such as with their Metaketa initiative
  • Disclosure: I am a member of EGAP and have been involved in several of their projects (including the courses on experimental design 😉)
  • They have lots of useful resources for researchers interested in experimental methods, some of which we will use in this course. More at https://egap.org/

The EGAP research design form

  • The EGAP research design form is a guide to help you think through the key components of your research design
  • It shows the main parts of a pre-analysis plan, and it is a good blueprint for your own future experiments
  • We will be using it in our course, and you can access it here or on our website
  • We will have a few sessions dedicated to discussing pre-analysis plans, but I would like to introduce you to the form today (so you can start thinking about your final project!)

The EGAP research design form

Section 1: Introduction
  1. Researcher name
  2. Research project title
  3. One-sentence summary of your specific research question
  4. General motivation
    1. Why should someone who is not an academic care about the results of this research? [1 paragraph]
    2. What policy decision(s) will your research help inform? [1 paragraph]
  5. Theoretical motivation
    1. What theoretical questions can this research shed light on? [1 paragraph]
    2. Key debate(s)/literature(s) that will be informed by the answer to your research question [1 paragraph]
  6. Primary hypotheses
    1. What are the key parameters/estimands the research design seeks to estimate? What sign and/or magnitude is predicted by the primary hypotheses for each parameter/estimand? [1-2 paragraphs]
    2. What is the logic or theory of change behind the primary hypotheses? [1-2 paragraphs]
    3. What are the key pieces in the relevant academic literature that inform your hypotheses? [2-3 pieces]
  7. Secondary hypotheses
    1. What are the secondary parameters/estimands the research design seeks to estimate? What sign and/or magnitude is predicted by the secondary hypotheses for each parameter/estimand? [These may be conditional effects for subgroups or hypotheses about additional outcomes or cross-randomized treatments.]
    2. What is the logic or theory of change behind each secondary hypothesis? [Explain what effects we should expect if the theory behind your primary hypothesis is correct.]
  8. Alternative explanations if results are consistent with hypotheses
    1. What alternative theories could explain the results?
    2. Hypothesis for an alternative outcome (or other subgroups) that would be consistent only with the alternative explanation and not the logic behind your primary hypothesis
  9. Alternative explanations if results are inconsistent with hypotheses
    1. What alternative theories could explain the results?
Section 2: Population and Sample
  1. Population of interest
    1. Where and when will your study take place?
    2. Does this match up to your population of interest, or are there conditions that make this study context different?
  2. Sample size
    1. How is this sample selected? Be specific about the procedure.
  3. Consent
    1. How will you obtain informed consent? If you will not, what is the justification?
    2. Is this population vulnerable to being coerced into participating in the study?
  4. Ethics
    1. Is the sample size large enough that you have sufficient power for your research conclusions to be credible and useful?
    2. Is the sample size no larger than necessary for the research?
    3. Can the research (results) be used to target people or make people more vulnerable?
Section 3: Intervention
  1. Status Quo
    1. Describe the status quo: what are the current conditions in terms of the outcomes you hope to change? What aspects of the intervention already exist, if any?
  2. Intervention
    1. Describe your intervention(s)
    2. What is already known about the effect of the proposed intervention relative to the status quo? Is there credible evidence on the question?
  3. Control
    1. Describe the control condition
    2. Is the control condition a pure control (no intervention at all) or a placebo? What is the placebo condition designed to control for?
  4. Units
    1. To what units (level) will the intervention be applied? Individual, classroom, school, village, municipality, etc.
    2. Is this the same level at which outcomes will be measured? If not, how will you address the different levels if they do not perfectly overlap?
  5. Compliance
    1. What does it mean to “take” (comply with) the intervention?
    2. If the intervention is a program, how much does someone need to attend (showing up once? finishing the program?) in order to count as having attended?
  6. Non-compliance
    1. Is there any concern with non-compliance (either taking the intervention if assigned to control/placebo or failing to take the intervention if assigned to treatment)?
  7. Ethics
    1. Is the control condition no worse than the status quo, according to the best evidence available?
    2. Are there concerns that participants may be forced to comply with the intervention?
    3. What are the risks and magnitude of potentially negative effects of the treatment? Are such risks concentrated on a particular subset of your population?
Section 4: Outcome and Covariates
  1. Primary Outcome
    1. What is your primary outcome?
  2. Measurement
    1. How will it be measured? (Give the actual text of the survey question and response options, if using a survey measure. Is the outcome continuous, binary, etc.?)
  3. Priors
    1. What is the expected distribution of the primary outcome? (This may come from a prior study on a similar population, or you may have to make an educated guess.)
  4. Validity and measurement error
    1. Is there any concern with untruthful reporting? If so, how will you address it?
  5. Stages
    1. Will you collect a baseline?
    2. Will you collect a midline?
    3. Will you collect multiple waves of endline measurement?
    4. If you will collect a baseline or midline, how will you find the same respondents (to minimize attrition)?
  6. Covariates
    1. What covariate data do you need, including for subgroup analysis? How will covariates be measured?
    2. What additional covariates (if any) will you measure?
    3. What additional outcomes or covariates will you collect to distinguish between your explanation and alternatives if your findings are consistent with your hypothesis?
  7. Ethics
    1. Will data collection be onerous (time, effort) or painful (physically, emotionally) for any respondents?
    2. Are these costs necessary? Have they been minimized?
    3. Are they outweighed by the potential benefits of the research to society?
Section 5: Randomization
  1. Randomization strategy
    1. Complete/simple, block, cluster, factorial, etc.
  2. Blocks
    1. What are they, how many blocks, and how many units per block?
  3. Clusters
    1. What are they, how many clusters, and how many units per cluster?
    2. If you have clusters, what is the intra-class correlation (ICC)?
    3. Is clustering strictly necessary, or could you randomize at the individual level?
Section 6: Analysis
  1. Estimator
    1. What is your estimator?
  2. Standard Errors
    1. What kind of standard errors will you use?
  3. Test
    1. If you plan to report a p-value, what kind of test will you use?
  4. Missing Data
    1. How will you handle missing data?
  5. Effect size
    1. What is the expected effect size? What is the minimum effect size that would make the study worth running? What effect sizes have similar studies found?
  6. Power
    1. What is your power?
Section 7: Implementation
  1. Randomization
    1. How will you conduct the randomization? (On a computer in advance, drawing from an urn in public, etc.)
  2. Implementation
    1. Who will implement the intervention?
    2. Are there any dangers to your research team, including enumerators? How will you minimize them?
    3. How will you track the quality of the implementation of the intervention?
  3. Compliance
    1. Who will measure compliance?
  4. Data management
    1. How will you manage the data? (Security, anonymity, etc.)

The MIDA framework 📊

The MIDA framework

  • Finally, we will briefly introduce the MIDA framework
  • As we have seen, a good research design is a practical plan for research that makes the best use of available resources and produces a credible answer
  • But how can we assess the quality of a research design before we implement it?
  • Simulate it!
  • Luckily, there is a package that does all the hard work for us: DeclareDesign
  • It helps us be concrete about the stages of research design by representing them in code, which lets us simulate those stages and understand the properties of the statistical estimators and tests we use

Introduction to DeclareDesign

  • See https://declaredesign.org/

  • Regardless of the method, research designs have four components

  • MIDA:

    • M: Model (of how the world works)
    • I: Inquiry
    • D: Data strategy
    • A: Answer strategy
  • Critical insight: Simulation of a research design teaches what answers a research design can find

  • Working with simulated data before data collection helps prevent errors and oversights

Model

  • A model of how we think the world works, including:

    • \(T\)s and \(X\)s (treatments or focal causal variables like policy interventions and other background variables)
    • \(Y\)s (dependent variables)
    • Relations between variables (potential outcomes, functional forms, auxiliary variables and contexts)
    • Probability distribution over \(X\)s if not also over \(Y\)s.
  • This is the theory!

    • Codified numerically
  • The model is wrong by definition. If it were correct, you wouldn’t need to do the study

  • But without a model, we don’t have a place to start to assess what can be learned
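The bullets above can be codified in a few lines of code. DeclareDesign does this in R; the following is a minimal, hypothetical Python sketch, where the covariate coefficient (0.2) and the constant treatment effect (0.5) are purely illustrative assumptions, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy model of the world (all numbers are illustrative assumptions):
# a background covariate X, unobserved factors U, and potential outcomes
# Y(0) (untreated) and Y(1) (treated) with an assumed effect of 0.5.
N = 100
X = rng.normal(size=N)              # background variable
U = rng.normal(size=N)              # unknown factors
Y0 = 0.2 * X + U                    # potential outcome under control
Y1 = Y0 + 0.5                       # potential outcome under treatment

# The model makes the estimand concrete: the average treatment effect
print(round(float(np.mean(Y1 - Y0)), 2))  # prints 0.5 by construction
```

Writing the model down numerically forces us to state exactly which variables exist and how they relate, even though the model is, by definition, wrong.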

Inquiry

  • An answerable question about the model:

  • What is the effect of a treatment \(T\) on an outcome \(Y\)?

  • Usually a quantity of interest, some summary of the data:

    • Descriptive: What is the mean of \(Y\) in the treatment group?
    • Causal: What would be the average difference in \(Y\) if we switched treatment to control? If we claimed that \(T\) cannot cause \(Y\), how much evidence do we have about this claim?
    • This quantity is the estimand or hypothesis
  • Not all questions that we want to ask are answerable

    • And the range of inquiries we can ask are limited: how much can we learn from some summary quantity such as the average treatment effect (ATE)?

Data

  • Realise (generate) data on the set of variables (all \(X\)s, \(T\)s and \(Y\)s)

  • A function of your model

  • Includes both:

    • Sampling — how units arrive in your sample
    • Treatment assignment — what values of endogenous variables are revealed
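A data strategy can be sketched the same way. In this illustrative Python example (the potential outcomes and the assumed effect of 0.5 are made up for demonstration), complete random assignment determines which potential outcome each unit reveals:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy model (assumed): potential outcomes with a true effect of 0.5
N = 100
Y0 = rng.normal(size=N)
Y1 = Y0 + 0.5

# Data strategy: complete random assignment of exactly N/2 units to treatment
T = np.zeros(N, dtype=int)
T[rng.choice(N, size=N // 2, replace=False)] = 1

# Each unit reveals only the potential outcome matching its assignment
Y_obs = np.where(T == 1, Y1, Y0)
print(int(T.sum()))  # prints 50: half the units are treated
```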

Answer

  • Given a realisation of the data, generate an answer: an estimate of the quantity of interest (inquiry)

  • This is your estimator or test:

    • Difference-in-means
    • \(t\)-test
    • Regression methods
    • etc.
  • The answer is an estimate of the quantity of interest or a \(p\)-value (inquiry/estimand/test)
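For instance, a difference-in-means estimate from one simulated realisation of the data might look like this (a Python sketch; the true effect of 0.5, the sample size, and the noise are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate one realisation of a two-arm trial (assumed true effect = 0.5)
N = 200
Y0 = rng.normal(size=N)
Y1 = Y0 + 0.5
T = np.zeros(N, dtype=int)
T[rng.choice(N, size=N // 2, replace=False)] = 1
Y = np.where(T == 1, Y1, Y0)

# Answer strategy: difference-in-means estimator of the ATE
estimate = Y[T == 1].mean() - Y[T == 0].mean()
print(round(float(estimate), 2))  # close to 0.5, but not exact in one sample
```

In any single realisation the estimate differs from the estimand; the point of simulation is to study that gap across many realisations.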

Let’s see an example: Two-arm trial

Two-arm trial

  • Two-arm trial: A common design in which units are randomly assigned to one of two conditions (treatment and control)
  • Model: We have a treatment \(T\) that we think might affect an outcome \(Y\). We have \(N\) units, which we can sample in many ways (simple random sampling, stratified sampling, etc.). We can include background variables \(X\) that we think might affect \(Y\), but let’s keep it simple for now
  • Inquiry: What is the effect of \(T\) on \(Y\)?
    • Defined by the average treatment effect (ATE): \(\text{ATE} = \frac{1}{N} \sum_{i=1}^N Y_i(1) - \frac{1}{N} \sum_{i=1}^N Y_i(0)\)
  • Data: We randomly assign units to treatment and control, and we measure \(Y\) for each unit
  • Answer: We estimate the ATE with a difference-in-means estimator
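The four components above can be combined into a simulation that diagnoses the design before it is fielded, much as DeclareDesign does in R. This hypothetical Python sketch (effect size, sample size, and noise are all assumptions) repeats the model → data → answer pipeline many times and summarises the estimator's bias and variability:

```python
import numpy as np

rng = np.random.default_rng(123)

def simulate_once(N=100, effect=0.5):
    """One run of the design: model -> data -> answer."""
    # Model: potential outcomes with an assumed constant effect
    Y0 = rng.normal(size=N)
    Y1 = Y0 + effect
    # Data: complete random assignment of half the units to treatment
    T = np.zeros(N, dtype=int)
    T[rng.choice(N, size=N // 2, replace=False)] = 1
    Y = np.where(T == 1, Y1, Y0)
    # Answer: difference-in-means estimate of the ATE (inquiry = effect)
    return Y[T == 1].mean() - Y[T == 0].mean()

estimates = np.array([simulate_once() for _ in range(2000)])

# Diagnosands: bias and standard deviation of the estimator
print(f"bias: {estimates.mean() - 0.5:+.3f}")
print(f"sd:   {estimates.std():.3f}")
```

The diagnosis shows the estimator is unbiased for the ATE under random assignment, and its spread tells us whether the assumed sample size gives us enough precision to be worth running.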

Two-arm trial

  • This is a Directed acyclic graph (DAG) of the model
    • A DAG is a way to represent the model of the world, showing the relationships between variables
    • An outcome \(Y\) is affected by unknown factors \(U\) and a treatment \(Z\) (the authors use \(Z\) instead of \(T\))
    • The measurement procedure \(Q\) affects the observed outcome in the sense that it measures a latent outcome and records the measurement in a dataset
    • No arrows lead into \(Z\) because it is randomly assigned

Declare

Diagnose

Design

Available models

Wrap-up 🤓

Summary

Research design

  • Good Research Questions: Produce knowledge that people care about and solve real-world problems
  • Theory: Essential for generating hypotheses, informing design, and guiding inference
  • Operationalisation: Translating theoretical concepts into measurable variables (e.g., social isolation → frequency of interactions)
  • Pre-registration: Filing research designs and hypotheses publicly to reduce bias and improve credibility
  • Reproducibility: Ensuring research can be replicated, addressing the reproducibility crisis in science.

Importance of theory

  • No “Just Doing an Experiment”: All research involves theory, whether implicit or explicit
  • Model of the World: Helps predict how interventions affect outcomes and guides design
  • Alignment: Research design must align with theoretical frameworks to address intended questions
  • Causal Inference: Understanding the causal effect of \(T\) on \(Y\) requires a model of the world

Summary

Pre-registration

  • EGAP: Advocates for pre-registration and reproducible research
  • EGAP Research Design Form: Blueprint for research designs, available on EGAP’s website
  • MIDA Framework
    • Model: How we think the world works
    • Inquiry: Answerable question about the model
    • Data: Realising data on variables
    • Answer: Generating an estimate of the quantity of interest

DeclareDesign

  • Declare: Declare the model, inquiry, data, and answer
  • Diagnose: Diagnose the design to understand its properties
  • Design: Design the research to answer the inquiry
  • Available Models: Different models available to simulate research designs
  • Putting it all together: Simulate research designs to understand their properties

Next class

  • We will discuss the experimental ideal: the gold standard for causal inference
  • But we will also see what to do when we cannot run an experiment (and still use the experimental logic to infer causality)
  • Natural and quasi-experiments: the next best thing to a randomised controlled trial
  • I will also post the first assignment on the course website and on Canvas
  • Please read Blair et al. (2023) if you haven’t yet, as well as the required readings for next class
  • … and that’s it for today! 😊🎉

Questions?

Thank you very much! 👏
See you soon! 😊