QTM 385 - Experimental Methods

Lecture 14 - Writing Pre-Analysis Plans

Danilo Freire

danilo.freire@emory.edu

Emory University

Hello, everyone! 😊

Brief recap 📚

Last class, we discussed…

Quarto for reproducible research and document authoring
Literate programming principles combining code and documentation
Version control integration with Git
Multi-format publishing (HTML, PDF, slides)
DeclareDesign simulation workflow components
Research design fundamentals and simulation workflows
Six key components: Population, Outcomes, Sampling, Assignment, Estimand, Estimator
Diagnostic analysis with power calculations

DeclareDesign Library

Today’s plan 📅

A closer look at pre-analysis plans

Writing and executing PAPs

We have briefly discussed the importance of PAPs before
Today, we will dive a little deeper into the topic
Discuss their pros and cons
We will talk about their components
- Research questions, hypotheses, variables, estimations, threats to validity
In your group, you will work on a PAP template and discuss how its components apply to your research project
Finally, we will see some examples of PAPs you can use as a reference

Source: Open Science Framework

The brief history of PAPs 📜

Why do we bother with PAPs?

The idea of PAPs is actually more recent than you might think
While RCTs have been around for decades, the first PAPs were written in the early 2000s in response to a growing concern about false results in medicine
In 1997, the U.S. Food and Drug Administration Modernization Act (FDAMA) mandated the public registration of clinical trials, including protocols for data collection and analysis
This led to ClinicalTrials.gov in 2000, a registry requiring researchers to outline primary outcomes, sample sizes, and statistical methods before patient enrollment
As you might expect, these statistical analysis plans (SAPs) were designed to prevent data mining and selective reporting
The International Council for Harmonisation (ICH) in Madrid further expanded these requirements to include handling of missing data and statistical models to reduce Type I errors

Source: ClinicalTrials.gov

The rise of PAPs: Ulysses pacts for researchers?

The 2010s were a decade of reproducibility crises
Several studies tried to replicate famous experiments and failed
The Open Science Framework (OSF) was created in 2012 to promote open science practices
The American Economic Association (AEA) launched the RCT Registry in 2013
Casey et al. (2012) demonstrated how PAPs could “bind researchers’ hands” against data mining by pre-specifying outcomes, covariates, and subgroup analyses
Olken (2015) argued that comprehensive PAPs, like a “Ulysses pact”, were necessary to limit researcher discretion
Simmons et al. (2011) introduced the idea of “researcher degrees of freedom”, arguing that “it is unacceptably easy to publish ‘statistically significant’ evidence consistent with any hypothesis”

Ulysses/Odysseus and the Sirens

Pros and cons of PAPs

The two main advantages of PAPs are transparency and credibility
- They prevent p-hacking and HARKing (hypothesising after results are known)
But what about the cons?
Ofosu and Posner (2021) mention some of the main criticisms:
- PAPs are time-consuming: 50% of researchers spend more than 2 weeks writing them, 1/4 spend more than a month
- They are inflexible and limit the scope for breakthroughs
- PAPs force researchers to run sub-optimal analyses to avoid deviations, thus creating boring and uninformative research
- People can steal your work and publish it before you do
- Finally, some say that PAPs don’t even work, as they require constant policing and this is something academia does not reward

More than 50 hypotheses specified? Really? How can this prevent data mining?

Source: Ofosu and Posner (2021)
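
One way to see why dozens of pre-specified hypotheses weaken a PAP's protection is to compute the chance of at least one false positive. The sketch below assumes independent tests at the 5% level with every null hypothesis true, a simplification that is not part of the original slides; the function names are illustrative.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across n independent tests
    when every null hypothesis is true (independence is an assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    for k in (1, 5, 20, 50):
        print(k, round(familywise_error_rate(k), 3), bonferroni_threshold(k))
```

Under these assumptions, 50 tests give roughly a 92% chance of at least one spurious "significant" result, which is one reason to keep the list of primary hypotheses short or to commit to a multiple-comparison correction in the PAP.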

Responding to criticisms

PAPs are time-consuming: True, but the authors mention that 64% of respondents said that writing a PAP was useful
They are inflexible: Not necessarily
- A good approach can be “to freely undertake exploratory investigations that go beyond the PAP, clearly labeling the results of such investigations in the paper as coming from analyses that were not pre-specified, with an explanation provided for why they were added”
PAPs are worthless without policing: About 40% of respondents said the reviewers mentioned the PAP in their reviews
People can steal your work: This seems to be a minor concern, as only 15% of respondents said they cared about this

SOPs as a flexible alternative to PAPs?

Lin and Green (2016) argue that we should adopt standard operating procedures (SOPs) as much as possible to avoid the pitfalls of PAPs
SOPs are more flexible because they only specify what you plan to do if a given situation arises, so you don’t need to spell out every detail of your analysis
They are more like a “safety net” for your research
They can save time if you work in a research group, as you can share the same SOP with your colleagues and avoid writing multiple PAPs
You can see an example here: https://github.com/acoppock/Green-Lab-SOP
While interesting in principle, SOPs apparently never really took off
Ofosu and Posner (2021) mention that only 3% of respondents said they used SOPs
Maybe it’s time to revisit this idea?

SOPs as a flexible alternative to PAPs?

Source: Lin and Green (2016)

Components of a PAP 📝

What should a PAP contain?

Scholars do not fully agree about how long or detailed a PAP should be
Uri (2017) argues that PAPs should not contain anything that is not essential to the analysis
McKenzie (2012) has a helpful pre-analysis plan checklist that includes only a few points
EGAP (2017) proposes a more comprehensive list of components, with 7 sections
- This is the one we will use in this class!
From experience, PAPs can be as short as 2 pages or as long as 50 pages
- The average is probably between 10 and 20 pages
At the minimum, PAPs should include 4 sections:
- Unit of analysis, population, and inclusion/exclusion criteria
- Method (observational, experimental, quasi-experimental)
- Experimental intervention or explanatory variable
- Outcomes of interest
Let’s see the EGAP template in more detail
Available at https://danilofreire.github.io/qtm385/design-form.html

Group activity

How would you organise your PAP?

Together with your group members, you will work on the EGAP template
The idea is to discuss how you would organise your PAP and fill out the template
You don’t need to complete all of it, just a brief summary of each section
You have a few minutes to discuss each section with your group, then we will share our thoughts with the class
Let’s start! 🚀

Section 01 - Introduction

Section 1: Introduction
Researcher name
Research project title
One sentence summary of your specific research question
General motivation	Why should someone who is not an academic care about the results of this research? [1 paragraph] What policy decision(s) will your research help inform? [1 paragraph]
Theoretical motivation	What theoretical questions can this research shed light on? [1 paragraph] Key debate(s)/literature(s) that will be informed by the answer to your research question [1 paragraph]

Hypotheses

Section 1: Introduction
Primary hypotheses	What are the key parameters/estimands the research design seeks to estimate? What sign and/or magnitude is predicted by primary hypotheses for each parameter/estimand? [1-2 paragraphs] What is the logic or theory of change behind the primary hypotheses? [1-2 paragraphs] What are the key pieces in the relevant academic literature that inform your hypotheses? [2-3 pieces]
Secondary hypotheses	What are the secondary parameters/estimands the research design seeks to estimate? What sign and/or magnitude is predicted by the secondary hypotheses for each parameter/estimand? [These may be conditional effects for subgroups or hypotheses about additional outcomes or cross-randomized treatments.] What is the logic or theory of change behind each secondary hypothesis? [Explain what effects we should expect if the theory behind your primary hypothesis is correct.]
Alternative explanations if results are consistent with hypotheses	What alternative theories could explain the results? Hypothesis for an alternative outcome (or other subgroups) that would be consistent only with the alternative explanation and not the logic behind your primary hypothesis.
Alternative explanations if results are inconsistent with hypotheses	What alternative theories could explain the results?

Population and Sample

Section 2: Population and Sample
Population of interest
Where and when will your study take place?	Does this match up to your population of interest, or are there conditions that make this study context different?
Sample size	How is this sample selected? Be specific about the procedure.
Consent	How will you obtain informed consent? If you will not, what is the justification? Is this population vulnerable to being coerced into participating in the study?
Ethics	Is the sample size large enough that you have sufficient power for your research conclusions to be credible and useful? Is the sample size no larger than necessary for the research? Can the research (results) be used to target people or make people more vulnerable?

Section 03 - Treatment and Randomisation

Section 3: Intervention
Status Quo	Describe the status quo–what are the current conditions in terms of the outcomes you hope to change? What aspects of the intervention already exist, if any?
Intervention	Describe your intervention(s). What is already known about the effect of the proposed intervention relative to the status quo? Is there credible evidence on the question?
Control	Describe the control condition. Is the control condition a pure control (no intervention at all) or a placebo? What is the placebo condition designed to control for?
Units	To what units (level) will the intervention be applied? Individual, classroom, school, village, municipality, etc. Is this the same level at which outcomes will be measured? If not, how will you address the different levels if they do not perfectly overlap?

Threats to Validity

Section 3: Intervention
Compliance	What does it mean to “take” (comply with) the intervention? If the intervention is a program, how much does someone need to attend (showing up once? finishing the program?) in order to count as having attended?
Non-compliance	Is there any concern with non-compliance (either taking the intervention if assigned to control/placebo or failing to take the intervention if assigned to treatment)?
Ethics	Is the control condition no worse than the status quo, according to the best evidence available? Are there concerns that participants may be forced to comply with the intervention? What are the risks and magnitude of potentially negative effects of the treatment? Are such risks concentrated on a particular subset of your population?

Outcomes and Covariates

Section 4: Outcome and Covariates
Primary Outcome	What is your primary outcome?
Measurement	How will it be measured? (Give the actual text of the survey question and response options, if using a survey measure. Is the outcome continuous, binary, etc.?)
Priors	What is the expected distribution of the primary outcome? (This may come from a prior study on a similar population or you may have to make an educated guess).
Validity and measurement error	Is there any concern with untruthful reporting? If so, how will you address it?
Stages	Will you collect a baseline? Will you collect a midline? Will you collect multiple waves of endline measurement? If you will collect a baseline or midline, how will you find the same respondents (minimize attrition?)
Covariates	What covariate data do you need, including for subgroup analysis? How will covariates be measured? What additional covariates (if any) will you measure? What additional outcomes or covariates will you collect to distinguish between your explanation and alternatives if your findings are consistent with your hypothesis?
Ethics	Will data collection be onerous (time, effort) or painful (physically, emotionally) for any respondents? Are these costs necessary? Have they been minimized? Are they outweighed by the potential benefits of the research to society?

Randomisation

Section 5: Randomisation
Randomisation strategy	Complete/simple, block, cluster, factorial etc.
Blocks	What are they, how many blocks, how many units per block?
Clusters	What are they, how many clusters, how many units per cluster? If you have clusters, what is the intra-class correlation (ICC)? Is clustering strictly necessary, or could you randomize at the individual level?
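
The three strategies named above can be sketched in a few lines. This is a minimal illustration using only Python's standard library, not the course's DeclareDesign workflow; the function names, the toy units, the "urban"/"rural" blocks, and the seed value are all hypothetical.

```python
import random
from collections import defaultdict


def complete_ra(units, n_treated, rng):
    """Complete randomisation: exactly n_treated units receive treatment."""
    treated = set(rng.sample(list(units), n_treated))
    return {u: int(u in treated) for u in units}


def block_ra(units, block_of, rng):
    """Block randomisation: treat half of the units (rounded down) within each block."""
    blocks = defaultdict(list)
    for u in units:
        blocks[block_of[u]].append(u)
    assignment = {}
    for members in blocks.values():
        assignment.update(complete_ra(members, len(members) // 2, rng))
    return assignment


def cluster_ra(units, cluster_of, n_treated_clusters, rng):
    """Cluster randomisation: whole clusters are assigned, units inherit the status."""
    clusters = sorted({cluster_of[u] for u in units})
    cluster_z = complete_ra(clusters, n_treated_clusters, rng)
    return {u: cluster_z[cluster_of[u]] for u in units}


rng = random.Random(385)  # fixed seed so the assignment can be reproduced and audited
units = list(range(12))
block_of = {u: "urban" if u < 6 else "rural" for u in units}  # hypothetical blocks
cluster_of = {u: u // 3 for u in units}  # four hypothetical clusters of three units

z_complete = complete_ra(units, 6, rng)
z_block = block_ra(units, block_of, rng)
z_cluster = cluster_ra(units, cluster_of, 2, rng)
```

Recording the seed and the code in the PAP is one way to document that the randomisation was conducted "on a computer in advance" and can be checked later.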

Analysis

Section 6: Analysis
Estimator	What is your estimator?
Standard Errors	What kind of standard errors will you use?
Test	If you plan to report a p-value, what kind of test will you use?
Missing Data	How will you handle missing data?
Effect size	What is the expected effect size? What is the minimum effect size that would make the study worth running? What effect sizes have similar studies found?
What is your power?
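
For a first answer to the effect size and power questions, a closed-form approximation can complement the simulation-based diagnosis from last class. The sketch below assumes a two-arm design with equal arm sizes, a common outcome standard deviation, and a two-sided z-test; the design-effect adjustment assumes equal cluster sizes. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering with equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def power_two_arm(effect, sd, n_per_arm, alpha=0.05, deff=1.0):
    """Approximate power of a two-sided z-test for a difference in means.

    Assumes equal arm sizes, a common outcome SD, and a normal approximation;
    the small probability of rejecting in the wrong direction is ignored.
    """
    se = sd * sqrt(2.0 / n_per_arm) * sqrt(deff)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(abs(effect) / se - z_crit)


def minimum_detectable_effect(sd, n_per_arm, alpha=0.05, power=0.80, deff=1.0):
    """Smallest effect detectable with the requested power under the same assumptions."""
    se = sd * sqrt(2.0 / n_per_arm) * sqrt(deff)
    return (_STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) + _STD_NORMAL.inv_cdf(power)) * se
```

For example, `power_two_arm(0.5, 1.0, 64)` is the approximate power to detect a half-standard-deviation effect with 64 units per arm, and passing `deff=design_effect(20, 0.05)` shows how much power is lost when those units come in clusters of 20 with an ICC of 0.05.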

Implementation

Section 7: Implementation
Randomisation	How will you conduct the randomisation? (on a computer in advance, drawing from an urn in public, etc.)
Implementation	Who will implement the intervention? Are there any dangers to your research team, including enumerators? How will you minimize them? How will you track the quality of the implementation of the intervention?
Compliance	Who will measure compliance?
Data management	How will you manage the data? (security, anonymity, etc.)

PAP examples you can use 📝

Some examples

Fortunately, there are many examples of PAPs available online
If you need some guidance, you can check the following resources:
Some examples of PAPs:
Yokum’s (2016) evaluation of a police body-worn camera programme
Leight et al. (2020) about the impact of SMS messages on clinic visits in Mozambique

Dates for your PAPs 📅

Important dates

Wednesday, March 26: PAP draft due (10-15 pages)
Monday, March 31: You will receive feedback on your PAP
Monday, April 7: Final version of your PAP due
Monday, April 14: You will receive your dataset
Wednesday, April 24 and Monday, April 28: You will present your results in class (about 15 minutes) and submit your final report (about 15 pages)

And that’s all for today! 🎉

Have a great week! 🎉