Lecture 02 - The Research Design Process
Lots of additional readings included in the syllabus 😊
https://forms.gle/xUL39k7ngY2kXYJC7
Source: University of Cambridge
As researchers, we are interested in research questions about how the world works
There are a number of different types of questions that we may want to answer. In academia, they are often divided into two broad categories:
Then we can move on to questions about why? \(\rightarrow\) i.e., knowing the effect of a cause is necessary before moving on to understanding the causes of an effect.
(Next sessions: more on what we mean by causality and how experiments give us leverage to make causal claims.)
What is the phenomenon we want to explain?
Does the cause we theorise lead to observing changes in \(Y\)?
What is the theory of change?
We are ultimately interested in how two theoretical concepts are related, measured by observed variables \(T\) (our treatment) and \(Y\) (our outcomes)
There is no such thing as “just doing an experiment” 🧐
All research design involves theory, whether implicit or explicit
Our questions are value laden. For example, social scientists studied marijuana use in the 1950s as a form of “deviance”; the questions focused on “why are people making such bad decisions?” or “how can policy makers prevent marijuana use?”
Why do the research? We might want to change how scientists explain the world and/or change the policy decisions in (a) one place and time and/or (b) in other places and times
Research focused on learning the causal effect of \(T\) on \(Y\) requires a model of the world: how intervention \(T\) might have an effect on some outcome \(Y\), why, and how large the effect might be. It helps us think about how a different intervention or targeting different recipients might lead to different results.
Our theories and models are important not just for generating hypotheses, but for informing design and strategies for inference
Designing research will often clarify where we are less certain about our theories. Our theories will point to problems with our design. And questions arising from the process of design may indicate a need for more work on explanation and mechanism
What is the outcome of interest (\(Y\))?
What is the cause of interest (\(T\))?
What theory could lead to this experimental design?
What can be the main hypothesis?
How can we measure our outcomes?
Can we directly manipulate \(T\)? (underlying treatment concept of interest)
How does our actual treatment relate to \(T\)?
Did everyone receive \(T\)?
Source: World Health Organization, 2003
Similarly, we often cannot directly observe the true value of the outcome concept for most of the outcomes we are interested in
Examples:
Moreover, the underlying outcome concept may itself be under debate (e.g., democracy)
If our indicators don’t measure the underlying concept that we’re interested in, then we may not be able to learn very much, even if we have an otherwise very sound experiment
Now think of yourselves as the researchers
In pairs or groups of three:
Anticipating or facing difficulties in getting published, researchers either never submit manuscripts with null results for review or put them away in a “file drawer” after several rejections
We all face incentives to change our specifications, measurements, or even hypotheses to get a statistically significant result (\(p\)-hacking) and so improve our chances of publication
Even people not facing these incentives make many decisions when they analyse data: handling missing values and duplicate observations, creating scales, etc. And these choices can be consequential
Overall result: reduced credibility for individual pieces of research and (rightly) reduced confidence in whether we actually know what we claim to know
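To make the cost of these incentives concrete, here is a minimal simulation sketch. It assumes a researcher who measures several independent outcomes in an experiment where the true effect is zero and reports whichever one reaches significance. The function names, sample sizes, and the normal approximation to the test are illustrative assumptions, not taken from the lecture.

```python
import random
from statistics import NormalDist


def mean_and_var(xs):
    """Sample mean and (n - 1) variance of a list of numbers."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def p_value(treated, control):
    """Two-sided p-value for a difference in means (normal approximation)."""
    m_t, v_t = mean_and_var(treated)
    m_c, v_c = mean_and_var(control)
    se = (v_t / len(treated) + v_c / len(control)) ** 0.5
    z = (m_t - m_c) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_sims=1000, n=50, alpha=0.05, seed=1):
    """Share of simulated null experiments with at least one p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p_values = []
        for _ in range(n_outcomes):
            # True effect is zero: both arms are drawn from the same distribution.
            treated = [rng.gauss(0, 1) for _ in range(n)]
            control = [rng.gauss(0, 1) for _ in range(n)]
            p_values.append(p_value(treated, control))
        # A p-hacker reports whichever outcome happened to "work".
        if min(p_values) < alpha:
            hits += 1
    return hits / n_sims


honest = false_positive_rate(n_outcomes=1)
hacked = false_positive_rate(n_outcomes=10)
print(f"one pre-specified outcome: {honest:.3f}")
print(f"best of ten outcomes:      {hacked:.3f}")
```

With one pre-specified outcome, the false positive rate should sit near the nominal 5%. With ten independent tries, the chance of at least one false positive is about \(1 - 0.95^{10} \approx 0.40\), even though nothing in the simulated world is affected by treatment.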
Amy Cuddy demonstrating her theory of “power posing”. It could never be replicated. TED Talk with 74 million views!
One part of solving this problem is to focus on the design, rather than the outcomes
The bias against null results can be overcome by reviewing the design, prior to learning the results
A good design executed well will produce credible research, which might be a null result. We want credible and actionable null results
Reviews of designs are also an opportunity to improve the research before it is implemented
“…if you gave the PAP to two different programmers and asked each to prepare the data for the primary dependent/independent variable(s), they should both be able to do so without asking any questions, and they should both be able to get the same answer.” Olken, 2015
Pre-registration is the filing of your research design and hypotheses with a publicly-accessible repository. EGAP hosts one that you can use for free (currently on OSF.io using the EGAP registration form)
Pre-registration does not preclude later exploratory analyses that were not stated in advance. You just have to clearly distinguish between the two
Even if you will be submitting a paper with results rather than a design to an academic journal or you are primarily interested in a final report with findings for a policy audience, there are important advantages to you and to other researchers from pre-registering your research
Section 1: Introduction
Section 2: Population and Sample
Section 3: Intervention
Section 4: Outcome and Covariates
Section 5: Randomization
Section 6: Analysis
Section 7: Implementation
Regardless of the method, research designs have four components
MIDA: Model, Inquiry, Data strategy, Answer strategy
Critical insight: Simulating a research design shows what answers that design can find
Working with simulated data before data collection helps prevent errors and oversights
A model of how we think the world works, including:
This is the theory!
The model is wrong by definition. If it were correct, you wouldn’t need to do the study
But without a model, we don’t have a place to start to assess what can be learned
An answerable question about the model:
What is the effect of a treatment \(T\) on an outcome \(Y\) ?
Usually a quantity of interest, some summary of the data:
Not all questions that we want to ask are answerable
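As one illustration of an answerable inquiry, the average treatment effect can be written in potential-outcomes notation. The notation is an assumption introduced here, not taken from the slides: \(Y_i(1)\) is the outcome unit \(i\) would show if treated, \(Y_i(0)\) the outcome it would show if untreated, and \(N\) the number of units.

\[
\tau_{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left( Y_i(1) - Y_i(0) \right)
\]

By contrast, the effect for a single unit, \(Y_i(1) - Y_i(0)\), is an example of a question we cannot answer directly, because only one of the two potential outcomes is ever observed for each unit.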
Realise (generate) data on the set of variables (all \(X\)s, \(T\)s and \(Y\)s)
A function of your model
Includes both:
Given a realization of the data, generate an answer – an estimate of the quantity of interest (inquiry)
This is your estimator or test:
The answer is an estimate of the quantity of interest, or a \(p\)-value (inquiry/estimand/test)
Source: Blair et al. (2023)
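The four components can be sketched as a small simulation. This is a minimal illustration under strong assumptions (a constant treatment effect of 0.5, standard normal untreated outcomes, complete random assignment of half the units); every function name is hypothetical and the code does not reproduce any particular software.

```python
import random
from statistics import mean


def draw_population(rng, n=200, true_effect=0.5):
    """Model: each unit has two potential outcomes, y0 (untreated) and y1 (treated)."""
    population = []
    for _ in range(n):
        y0 = rng.gauss(0, 1)
        # Assumed constant effect: treatment shifts every unit by true_effect.
        population.append({"y0": y0, "y1": y0 + true_effect})
    return population


def inquiry(population):
    """Inquiry: the average treatment effect in this population."""
    return mean(u["y1"] - u["y0"] for u in population)


def data_strategy(rng, population):
    """Data strategy: treat exactly half the units at random, then reveal
    only the potential outcome that matches each unit's assignment."""
    treated_ids = set(rng.sample(range(len(population)), len(population) // 2))
    return [
        {"t": int(i in treated_ids), "y": u["y1"] if i in treated_ids else u["y0"]}
        for i, u in enumerate(population)
    ]


def answer_strategy(data):
    """Answer strategy: difference in mean outcomes between the two arms."""
    treated = [row["y"] for row in data if row["t"] == 1]
    control = [row["y"] for row in data if row["t"] == 0]
    return mean(treated) - mean(control)


def simulate(n_sims=500, seed=42):
    """Run the whole design many times and compare answers with the estimand."""
    rng = random.Random(seed)
    estimands, estimates = [], []
    for _ in range(n_sims):
        population = draw_population(rng)
        estimands.append(inquiry(population))
        estimates.append(answer_strategy(data_strategy(rng, population)))
    return mean(estimands), mean(estimates)


avg_estimand, avg_estimate = simulate()
print(f"average estimand: {avg_estimand:.3f}")
print(f"average estimate: {avg_estimate:.3f}")
print(f"bias:             {avg_estimate - avg_estimand:.3f}")
```

Because the model is written down explicitly, the estimand is known in every simulated run, so we can check whether the answer strategy recovers it on average before collecting any real data. Changing the model (for example, letting the effect vary across units) or the data strategy (for example, unequal assignment probabilities) shows how the same answer strategy behaves under different assumptions.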