QTM 385 - Experimental Methods

Lecture 02 - The Research Design Process

Danilo Freire

Emory University

Hello, everyone! 👋
Nice to meet you again! 😊

Recap and lecture overview 📚

Welcome back! 🎉

About the course

Last week

  • We saw a brief history of experimental methods and their main characteristics
    • Key milestones from Petrarch (1394) to modern RCTs
  • We also saw the main types of experiments
    • Lab Experiments: Conducted in controlled environments with high internal validity
    • Field Experiments: Take place in real-world settings, offering higher external validity
    • Natural Experiments: Utilise naturally occurring variations, providing less control but valuable insights
  • Finally, we briefly discussed the distinction between correlation and causation
  • Key tools: R/Python, GitHub, Quarto, and Jupyter Notebooks
  • Assignments: Problem sets (50%), pre-analysis plan (20%), and final project (30%), with a late policy and collaboration guidelines

Course website and materials

Click on the image to access the course website! 🌐

https://danilofreire.github.io/qtm385/

Textbooks

Required: Alan Gerber and Donald Green - Field Experiments

Lots of additional readings included in the syllabus 😊

Today’s lecture 📅

The research design process: Questions and credibility

  • What makes a research question good?
  • The importance of theory
  • Designing and selecting your treatment
  • How to interpret your findings carefully
  • The importance of transparency and reproducibility
  • The research design process
  • EGAP research design form: A guide to help you think through the key components of your research design
  • Introduction to DeclareDesign and the MIDA framework
  • Brief overview of pre-registration and pre-analysis plans

But first…

Let’s do an experiment right now! 😄

Instructions

  • Come here to the front of the room (one at a time!)
  • You will look at my screen and answer a question
  • Please don’t say the answer out loud and don’t give hints to others!
  • Let’s see what happens! 😊


https://forms.gle/xUL39k7ngY2kXYJC7

Let’s discuss the results! 📊

What makes a research question good? 🤔

The importance of a good research question

  • The answer to a good research question should produce knowledge that people will care about
  • Addressing the question should (help) solve a problem, make a decision, or clarify/challenge our understanding of the world
  • But an interesting question is not enough
  • A good research design is a practical plan for research that makes the best use of available resources and produces a credible answer
  • The quality of a research design can be assessed by how well it produces results that can be used to guide policy and improve science:
    • A great research design produces results that clearly point in certain directions that we care about
    • A poor research design produces results that leave us in the dark — results with confusing interpretation

Source: University of Cambridge

Causality and experiments

  • As researchers, we are interested in research questions about how the world works

  • There are a number of different types of questions that we may want to answer. In academia, they are often divided into two broad categories:

    • Descriptive questions: Descriptions of a given phenomenon: e.g., “How do teachers allocate their time during a school day?”
    • Causal questions: Questions about how \(X\) affects \(Y\): e.g., “Does providing vocational training to migrants improve their economic integration in the receiving country?”
  • Only then can we move on to questions about why \(\rightarrow\) i.e., knowing the effect of a cause is necessary before moving on to understanding the causes of an effect

  • (Next sessions: more about what we mean by causality and how experiments give us leverage to make causal claims.)

Theory

  • What is the phenomenon we want to explain?

    • Our outcome (we are going to call it \(Y\))
  • Does the cause we theorise lead to observing changes in \(Y\)?

    • Our treatment (in the context of experiments) (we are going to call it \(T\))
    • We will use \(X\) to refer to background variables (covariates)
  • What is the theory of change?

  • We are ultimately interested in how two theoretical concepts are related, measured by observed variables \(T\) (our treatment) and \(Y\) (our outcomes)

  • Why is theory important, then?

The importance of theory

  • There is no such thing as “just doing an experiment” 🧐

  • All research design involves theory, whether implicit or explicit

  • Our questions are value-laden: for example, when social scientists studied marijuana use in the 1950s as a form of “deviance”, the questions focused on “why are people making such bad decisions?” or “how can policy makers prevent marijuana use?”

  • Why do the research? We might want to change how scientists explain the world and/or change the policy decisions in (a) one place and time and/or (b) in other places and times

  • Research focused on learning the causal effect of \(T\) on \(Y\) requires a model of the world: how intervention \(T\) might have an effect on some outcome \(Y\), why, and how large the effect might be. It also helps us think about how a different intervention, or targeting different recipients, might lead to different results.

  • Our theories and models are important not just for generating hypotheses, but for informing design and strategies for inference

  • Designing research will often clarify where we are less certain about our theories. Our theories will point to problems with our design. And questions arising from the process of design may indicate a need for more work on explanation and mechanism

Richard Feynman on the experimental method

Click on the image to watch the video! 📺

Designing or selecting your treatment

  • Operationalisation: The process of translating theoretical concepts into measurable variables.
    • Example: Turning the concept of “social isolation” into a measurable variable such as the frequency of social interactions
  • Key Questions:
    • How will we measure our outcomes?
    • What indicators will be used to represent the underlying concept?
    • How will we manipulate the cause of interest?
    • What intervention or treatment will be implemented?
  • Importance of Alignment:
    • The research design must align with the theoretical framework to ensure that the study is addressing the intended questions

Let’s consider the example from our experiment practicum

  • What is the outcome of interest (\(Y\))?

  • What is the cause of interest (\(T\))?

  • What theory could yield this experimental design?

  • What can be the main hypothesis?

  • How can we measure our outcomes?


  • What do you think?

Measuring treatments

  • Can we directly manipulate \(T\)? (underlying treatment concept of interest)

    • Ethical, logistical and other types of considerations can limit our ability to manipulate all of the indicators of \(T\)
    • At best, we may be able to change some of its indicators
    • We design a treatment, \(T\), to do so
  • How does our actual treatment relate to \(T\)?

    • But our treatment can be manipulating other things (bundled treatments)
  • Did everyone receive \(T\)?

    • Measure compliance

Source: World Health Organization, 2003

Measuring outcomes

  • Similarly, we often cannot directly observe the true value of the outcome concept for most of the outcomes we are interested in

  • Examples:

    • Correct answers to problems (indicators) for underlying mathematical aptitude (the actual phenomenon)
    • Days without food (indicators) for hunger (the actual phenomenon)
    • Reports of bribes (indicators) for corruption (the actual phenomenon)
  • Moreover, the underlying outcome concept may be even under debate (e.g., democracy)

  • If our indicators don’t measure the underlying concept that we’re interested in, then we may not be able to learn very much, even if we have an otherwise very sound experiment

Thinking about the treatment from the practicum…

How do you feel about the world?

  • Now think of yourselves as the researchers

  • In pairs or groups of three:

    • Generate hypotheses on potential heterogeneous effects
    • Generate an expected effect size
    • Discuss the theories behind the hypotheses and expected effect size, with emphasis on the importance of theory
    • Other ways of measuring the outcome or mode of administering the treatment?

Pre-registration and pre-analysis plans 📝

Bias in published research against null results

  • Anticipating or facing difficulties in getting published, researchers often never submit manuscripts with null results for review, or put them away in a “file drawer” after several rejections

  • We all face incentives to change our specifications, measurements, or even hypotheses to get a statistically significant result (\(p\)-hacking) and improve our chances of publication

  • Even people not facing these incentives make many decisions when they analyse data: handling missing values and duplicate observations, creating scales, etc. And these choices can be consequential

  • Overall result: reduced credibility for individual pieces of research and (rightly) reduced confidence in whether we actually know what we claim to know

Amy Cuddy demonstrating her theory of “power posing”, which could never be replicated. The TED Talk has 74 million views!

The reproducibility crisis in medicine

https://doi.org/10.1371/journal.pmed.0020124

The reproducibility crisis in psychology

https://www.science.org/doi/10.1126/science.aac4716

The reproducibility crisis in political science

https://www.journals.uchicago.edu/doi/abs/10.1086/734279?journalCode=jop

The reproducibility crisis in economics

https://onlinelibrary.wiley.com/doi/full/10.1111/joes.12598

The reproducibility crisis… everywhere?

https://onlinelibrary.wiley.com/doi/full/10.1002/jrsm.1703

Not even our code is reproducible

https://www.nature.com/articles/s41597-022-01143-6

Towards review of design rather than outcomes

  • One part of solving this problem is to focus on the design, rather than the outcomes

  • The bias against null results can be overcome by reviewing the design, prior to learning the results

  • A good design executed well will produce credible research, which might be a null result. We want credible and actionable null results

  • Reviews of designs are also an opportunity to improve the research before it is implemented

“…if you gave the PAP to two different programmers and asked each to prepare the data for the primary dependent/independent variable(s), they should both be able to do so without asking any questions, and they should both be able to get the same answer.” Olken, 2015

Pre-registration of analysis plans and research designs

  • Pre-registration is the filing of your research design and hypotheses with a publicly accessible repository. EGAP hosts one that you can use for free (currently on OSF.io using the EGAP registration form)

  • Pre-registration does not preclude later exploratory analyses that were not stated in advance. You just have to clearly distinguish between the two

  • Even if you will be submitting a paper with results rather than a design to an academic journal or you are primarily interested in a final report with findings for a policy audience, there are important advantages to you and to other researchers from pre-registering your research

    • You can learn about other research, completed and in progress; others can learn about yours. We can learn about studies that produced null results
    • It forces you to state your hypotheses and plan of analysis in advance of seeing the results, which limits \(p\)-hacking

Evidence in Governance and Politics (EGAP)

  • EGAP is a network of researchers and practitioners who conduct experiments to learn about effective governance and politics
  • They have long been advocates of pre-registration and reproducible research, such as with their Metaketa initiative
  • Disclosure: I am a member of EGAP and have been involved in several of their projects (including the courses on experimental design 😉)
  • They have lots of useful resources for researchers interested in experimental methods, some of which we will use in this course. More at https://egap.org/

The EGAP research design form

  • The EGAP research design form is a guide to help you think through the key components of your research design
  • It shows the main parts of a pre-analysis plan, and it is a good blueprint for your own future experiments
  • We will be using it in our course, and you can access it here or on our website
  • We will have a few sessions dedicated to discussing pre-analysis plans, but I would like to introduce you to the form today (so you can start thinking about your final project!)

The EGAP research design form

Section 1: Introduction
  1. Researcher name
  2. Research project title
  3. One-sentence summary of your specific research question
  4. General motivation
    1. Why should someone who is not an academic care about the results of this research? [1 paragraph]
    2. What policy decision(s) will your research help inform? [1 paragraph]
  5. Theoretical motivation
    1. What theoretical questions can this research shed light on? [1 paragraph]
    2. Key debate(s)/literature(s) that will be informed by the answer to your research question [1 paragraph]
  6. Primary hypotheses
    1. What are the key parameters/estimands the research design seeks to estimate? What sign and/or magnitude is predicted by the primary hypotheses for each parameter/estimand? [1-2 paragraphs]
    2. What is the logic or theory of change behind the primary hypotheses? [1-2 paragraphs]
    3. What are the key pieces in the relevant academic literature that inform your hypotheses? [2-3 pieces]
  7. Secondary hypotheses
    1. What are the secondary parameters/estimands the research design seeks to estimate? What sign and/or magnitude is predicted by the secondary hypotheses for each parameter/estimand? [These may be conditional effects for subgroups or hypotheses about additional outcomes or cross-randomized treatments.]
    2. What is the logic or theory of change behind each secondary hypothesis? [Explain what effects we should expect if the theory behind your primary hypothesis is correct.]
  8. Alternative explanations if results are consistent with hypotheses
    1. What alternative theories could explain the results?
    2. Hypothesis for an alternative outcome (or other subgroups) that would be consistent only with the alternative explanation and not the logic behind your primary hypothesis
  9. Alternative explanations if results are inconsistent with hypotheses
    1. What alternative theories could explain the results?
Section 2: Population and Sample
  1. Population of interest
    1. Where and when will your study take place?
    2. Does this match up to your population of interest, or are there conditions that make this study context different?
  2. Sample size
    1. How is this sample selected? Be specific about the procedure.
  3. Consent
    1. How will you obtain informed consent? If you will not, what is the justification?
    2. Is this population vulnerable to being coerced into participating in the study?
  4. Ethics
    1. Is the sample size large enough that you have sufficient power for your research conclusions to be credible and useful?
    2. Is the sample size no larger than necessary for the research?
    3. Can the research (results) be used to target people or make people more vulnerable?
Section 3: Intervention
  1. Status Quo
    1. Describe the status quo: what are the current conditions in terms of the outcomes you hope to change? What aspects of the intervention already exist, if any?
  2. Intervention
    1. Describe your intervention(s)
    2. What is already known about the effect of the proposed intervention relative to the status quo? Is there credible evidence on the question?
  3. Control
    1. Describe the control condition
    2. Is the control condition a pure control (no intervention at all) or a placebo? What is the placebo condition designed to control for?
  4. Units
    1. To what units (level) will the intervention be applied? Individual, classroom, school, village, municipality, etc.
    2. Is this the same level at which outcomes will be measured? If not, how will you address the different levels if they do not perfectly overlap?
  5. Compliance
    1. What does it mean to “take” (comply with) the intervention?
    2. If the intervention is a program, how much does someone need to attend (showing up once? finishing the program?) in order to count as having attended?
  6. Non-compliance
    1. Is there any concern with non-compliance (either taking the intervention if assigned to control/placebo or failing to take the intervention if assigned to treatment)?
  7. Ethics
    1. Is the control condition no worse than the status quo, according to the best evidence available?
    2. Are there concerns that participants may be forced to comply with the intervention?
    3. What are the risks and magnitude of potentially negative effects of the treatment? Are such risks concentrated on a particular subset of your population?
Section 4: Outcome and Covariates
  1. Primary Outcome
    1. What is your primary outcome?
  2. Measurement
    1. How will it be measured? (Give the actual text of the survey question and response options, if using a survey measure. Is the outcome continuous, binary, etc.?)
  3. Priors
    1. What is the expected distribution of the primary outcome? (This may come from a prior study on a similar population, or you may have to make an educated guess.)
  4. Validity and measurement error
    1. Is there any concern with untruthful reporting? If so, how will you address it?
  5. Stages
    1. Will you collect a baseline?
    2. Will you collect a midline?
    3. Will you collect multiple waves of endline measurement?
    4. If you will collect a baseline or midline, how will you find the same respondents (to minimize attrition)?
  6. Covariates
    1. What covariate data do you need, including for subgroup analysis? How will covariates be measured?
    2. What additional covariates (if any) will you measure?
    3. What additional outcomes or covariates will you collect to distinguish between your explanation and alternatives if your findings are consistent with your hypothesis?
  7. Ethics
    1. Will data collection be onerous (time, effort) or painful (physically, emotionally) for any respondents?
    2. Are these costs necessary? Have they been minimized?
    3. Are they outweighed by the potential benefits of the research to society?
Section 5: Randomization
  1. Randomization strategy
    1. Complete/simple, block, cluster, factorial, etc.
  2. Blocks
    1. What are they, how many blocks, and how many units per block?
  3. Clusters
    1. What are they, how many clusters, and how many units per cluster?
    2. If you have clusters, what is the intra-class correlation (ICC)?
    3. Is clustering strictly necessary, or could you randomize at the individual level?
Section 6: Analysis
  1. Estimator
    1. What is your estimator?
  2. Standard Errors
    1. What kind of standard errors will you use?
  3. Test
    1. If you plan to report a p-value, what kind of test will you use?
  4. Missing Data
    1. How will you handle missing data?
  5. Effect size
    1. What is the expected effect size? What is the minimum effect size that would make the study worth running? What effect sizes have similar studies found?
  6. Power
    1. What is your power?
Section 7: Implementation
  1. Randomization
    1. How will you conduct the randomization? (On a computer in advance, drawing from an urn in public, etc.)
  2. Implementation
    1. Who will implement the intervention?
    2. Are there any dangers to your research team, including enumerators? How will you minimize them?
    3. How will you track the quality of the implementation of the intervention?
  3. Compliance
    1. Who will measure compliance?
  4. Data management
    1. How will you manage the data? (Security, anonymity, etc.)

The MIDA framework 📊

The MIDA framework

  • Finally, we will briefly introduce the MIDA framework
  • As we have seen, a good research design is a practical plan for research that makes the best use of available resources and produces a credible answer
  • But how can we assess the quality of a research design before we implement it?
  • Simulate it!
  • Luckily, there is a package that does all the hard work for us: DeclareDesign
  • It helps us be concrete about the stages of research design by representing them in code, which lets us simulate those stages and understand the properties of the statistical estimators and tests we use

Introduction to DeclareDesign

  • See https://declaredesign.org/

  • Regardless of the method, research designs have four components

  • MIDA:

    • M: Model (of how the world works)
    • I: Inquiry
    • D: Data strategy
    • A: Answer strategy
  • Critical insight: Simulation of a research design teaches what answers a research design can find

  • Working with simulated data before data collection helps prevent errors and oversights

Model

  • A model of how we think the world works, including:

    • \(T\)s and \(X\)s (treatments or focal causal variables like policy interventions and other background variables)
    • \(Y\)s (dependent variables)
    • Relations between variables (potential outcomes, functional forms, auxiliary variables and contexts)
    • Probability distribution over \(X\)s if not also over \(Y\)s.
  • This is the theory!

    • Codified numerically
  • The model is wrong by definition. If it were correct, you wouldn’t need to do the study

  • But without a model, we don’t have a place to start to assess what can be learned
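The bullets above can be codified in a few lines of code. DeclareDesign does this in R; the following is a minimal, hypothetical Python sketch, where the covariate coefficient (0.2) and the constant treatment effect (0.5) are purely illustrative assumptions, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy model of the world (all numbers are illustrative assumptions):
# a background covariate X, unobserved factors U, and potential outcomes
# Y(0) (untreated) and Y(1) (treated) with an assumed effect of 0.5.
N = 100
X = rng.normal(size=N)              # background variable
U = rng.normal(size=N)              # unknown factors
Y0 = 0.2 * X + U                    # potential outcome under control
Y1 = Y0 + 0.5                       # potential outcome under treatment

# The model makes the estimand concrete: the average treatment effect
print(round(float(np.mean(Y1 - Y0)), 2))  # prints 0.5 by construction
```

Writing the model down numerically forces us to state exactly which variables exist and how they relate, even though the model is, by definition, wrong.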

Inquiry

  • An answerable question about the model:

  • What is the effect of a treatment \(T\) on an outcome \(Y\)?

  • Usually a quantity of interest, some summary of the data:

    • Descriptive: What is the mean of \(Y\) in the treatment group?
    • Causal: What would be the average difference in \(Y\) if we switched treatment to control? If we claimed that \(T\) cannot cause \(Y\), how much evidence do we have about this claim?
    • This quantity is the estimand or hypothesis
  • Not all questions that we want to ask are answerable

    • And the range of inquiries we can ask are limited: how much can we learn from some summary quantity such as the average treatment effect (ATE)?

Data

  • Realise (generate) data on the set of variables (all \(X\)s, \(T\)s and \(Y\)s)

  • A function of your model

  • Includes both:

    • Sampling — how units arrive in your sample
    • Treatment assignment — what values of endogenous variables are revealed
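A data strategy can be sketched the same way. In this illustrative Python example (the potential outcomes and the assumed effect of 0.5 are made up for demonstration), complete random assignment determines which potential outcome each unit reveals:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy model (assumed): potential outcomes with a true effect of 0.5
N = 100
Y0 = rng.normal(size=N)
Y1 = Y0 + 0.5

# Data strategy: complete random assignment of exactly N/2 units to treatment
T = np.zeros(N, dtype=int)
T[rng.choice(N, size=N // 2, replace=False)] = 1

# Each unit reveals only the potential outcome matching its assignment
Y_obs = np.where(T == 1, Y1, Y0)
print(int(T.sum()))  # prints 50: half the units are treated
```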

Answer

  • Given a realisation of the data, generate an answer: an estimate of the quantity of interest (inquiry)

  • This is your estimator or test:

    • Difference-in-means
    • \(t\)-test
    • Regression methods
    • etc.
  • The answer is an estimate of the quantity of interest or a \(p\)-value (inquiry/estimand/test)
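For instance, a difference-in-means estimate from one simulated realisation of the data might look like this (a Python sketch; the true effect of 0.5, the sample size, and the noise are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate one realisation of a two-arm trial (assumed true effect = 0.5)
N = 200
Y0 = rng.normal(size=N)
Y1 = Y0 + 0.5
T = np.zeros(N, dtype=int)
T[rng.choice(N, size=N // 2, replace=False)] = 1
Y = np.where(T == 1, Y1, Y0)

# Answer strategy: difference-in-means estimator of the ATE
estimate = Y[T == 1].mean() - Y[T == 0].mean()
print(round(float(estimate), 2))  # close to 0.5, but not exact in one sample
```

In any single realisation the estimate differs from the estimand; the point of simulation is to study that gap across many realisations.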

Let’s see an example: Two-arm trial

Two-arm trial

  • Two-arm trial: A common design in which units are randomly assigned to one of two conditions (treatment and control)
  • Model: We have a treatment \(T\) that we think might affect an outcome \(Y\). We have \(N\) units, which we can sample in many ways (simple random sampling, stratified sampling, etc.). We can include background variables \(X\) that we think might affect \(Y\), but let’s keep it simple for now
  • Inquiry: What is the effect of \(T\) on \(Y\)?
    • Defined by the average treatment effect (ATE): \(\text{ATE} = \frac{1}{N} \sum_{i=1}^N Y_i(1) - \frac{1}{N} \sum_{i=1}^N Y_i(0)\)
  • Data: We randomly assign units to treatment and control, and we measure \(Y\) for each unit
  • Answer: We estimate the ATE with a difference-in-means estimator
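The four components above can be combined into a simulation that diagnoses the design before it is fielded, much as DeclareDesign does in R. This hypothetical Python sketch (effect size, sample size, and noise are all assumptions) repeats the model → data → answer pipeline many times and summarises the estimator's bias and variability:

```python
import numpy as np

rng = np.random.default_rng(123)

def simulate_once(N=100, effect=0.5):
    """One run of the design: model -> data -> answer."""
    # Model: potential outcomes with an assumed constant effect
    Y0 = rng.normal(size=N)
    Y1 = Y0 + effect
    # Data: complete random assignment of half the units to treatment
    T = np.zeros(N, dtype=int)
    T[rng.choice(N, size=N // 2, replace=False)] = 1
    Y = np.where(T == 1, Y1, Y0)
    # Answer: difference-in-means estimate of the ATE (inquiry = effect)
    return Y[T == 1].mean() - Y[T == 0].mean()

estimates = np.array([simulate_once() for _ in range(2000)])

# Diagnosands: bias and standard deviation of the estimator
print(f"bias: {estimates.mean() - 0.5:+.3f}")
print(f"sd:   {estimates.std():.3f}")
```

The diagnosis shows the estimator is unbiased for the ATE under random assignment, and its spread tells us whether the assumed sample size gives us enough precision to be worth running.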

Two-arm trial

  • This is a Directed acyclic graph (DAG) of the model
    • A DAG is a way to represent the model of the world, showing the relationships between variables
    • An outcome \(Y\) is affected by unknown factors \(U\) and a treatment \(Z\) (the authors use \(Z\) instead of \(T\))
    • The measurement procedure \(Q\) affects the observed outcome in the sense that it measures a latent outcome and records the measurement in a dataset
    • No arrows lead into \(Z\) because it is randomly assigned

Declare

Diagnose

Design

Available models

Wrap-up 🤓

Summary

Research design

  • Good Research Questions: Produce knowledge that people care about and solve real-world problems
  • Theory: Essential for generating hypotheses, informing design, and guiding inference
  • Operationalisation: Translating theoretical concepts into measurable variables (e.g., social isolation → frequency of interactions)
  • Pre-registration: Filing research designs and hypotheses publicly to reduce bias and improve credibility
  • Reproducibility: Ensuring research can be replicated, addressing the reproducibility crisis in science.

Importance of theory

  • No “Just Doing an Experiment”: All research involves theory, whether implicit or explicit
  • Model of the World: Helps predict how interventions affect outcomes and guides design
  • Alignment: Research design must align with theoretical frameworks to address intended questions
  • Causal Inference: Understanding the causal effect of \(T\) on \(Y\) requires a model of the world

Summary

Pre-registration

  • EGAP: Advocates for pre-registration and reproducible research
  • EGAP Research Design Form: Blueprint for research designs, available on EGAP’s website
  • MIDA Framework
    • Model: How we think the world works
    • Inquiry: Answerable question about the model
    • Data: Realising data on variables
    • Answer: Generating an estimate of the quantity of interest

DeclareDesign

  • Declare: Declare the model, inquiry, data, and answer
  • Diagnose: Diagnose the design to understand its properties
  • Design: Design the research to answer the inquiry
  • Available Models: Different models available to simulate research designs
  • Putting it all together: Simulate research designs to understand their properties

Next class

  • We will discuss the experimental ideal: the gold standard for causal inference
  • But we will also see what to do when we cannot run an experiment (and still use the experimental logic to infer causality)
  • Natural and quasi-experiments: the next best thing to a randomised controlled trial
  • I will also post the first assignment on the course website and on Canvas
  • Please read Blair et al. (2023) if you haven’t yet, as well as the required readings for next class
  • … and that’s it for today! 😊🎉

Questions?

Thank you very much! 👏
See you soon! 😊