---
title: "ggplot2 demo"
format:
  html:
    code-fold: true
---

## Meet Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see <https://quarto.org>.

```{r}
#| label: plot-penguins
#| echo: false
#| message: false
#| warning: false

library(tidyverse)
library(palmerpenguins)

ggplot(penguins, aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(color = species, shape = species)) +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(
    title = "Flipper and bill length",
    subtitle = "Dimensions for penguins at Palmer Station LTER",
    x = "Flipper length (mm)",
    y = "Bill length (mm)",
    color = "Penguin species",
    shape = "Penguin species"
  ) +
  theme_minimal()
```
Jupyter
---
title: "Palmer Penguins Demo"
format:
  html:
    code-fold: true
jupyter: python3
---

## Meet Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see <https://quarto.org>.

```{python}
#| echo: false
#| message: false

import pandas as pd
import seaborn as sns
from palmerpenguins import load_penguins

sns.set_style('whitegrid')
penguins = load_penguins()

g = sns.lmplot(x="flipper_length_mm",
               y="body_mass_g",
               hue="species",
               height=7,
               data=penguins,
               palette=['#FF8C00', '#159090', '#A034F0']);
g.set_xlabels('Flipper Length');
g.set_ylabels('Body Mass');
```
The easiest way to manage references is with a reference manager such as Zotero; alternatively, you can download a BibTeX reference from Google Scholar
Just search for a paper and click on the cite button
Then click on BibTeX and copy the reference to your .bib file
Example
@article{nash1950equilibrium,
  title={Equilibrium points in n-person games},
  author={Nash Jr, John F},
  journal={Proceedings of the national academy of sciences},
  volume={36},
  number={1},
  pages={48--49},
  year={1950},
  publisher={National Acad Sciences}
}
Save it in your references.bib file (or a file with any other name, as long as it ends in .bib and is referenced in your YAML header)
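As a minimal sketch of the two pieces involved, the R code below writes the entry to a references.bib file and creates a tiny Quarto document whose YAML header points to it through the `bibliography` key. The file names and the report text are hypothetical, and everything goes into a temporary folder so nothing of yours is overwritten.

```r
# Hypothetical file names, written to a temporary folder
project_dir <- tempdir()
bib_file <- file.path(project_dir, "references.bib")
qmd_file <- file.path(project_dir, "report.qmd")

# The BibTeX entry copied from Google Scholar
writeLines(c(
  "@article{nash1950equilibrium,",
  "  title={Equilibrium points in n-person games},",
  "  author={Nash Jr, John F},",
  "  journal={Proceedings of the national academy of sciences},",
  "  volume={36},",
  "  number={1},",
  "  pages={48--49},",
  "  year={1950},",
  "  publisher={National Acad Sciences}",
  "}"
), bib_file)

# A minimal Quarto document: the YAML header names the .bib file,
# and @key in the text inserts the citation
writeLines(c(
  "---",
  'title: "My report"',
  "bibliography: references.bib",
  "---",
  "",
  "Equilibrium points were defined by @nash1950equilibrium."
), qmd_file)
```

Rendering report.qmd with Quarto should then produce the in-text citation and a reference list; `[@nash1950equilibrium]` gives a parenthetical citation instead.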
DeclareDesign is an R package that helps you design and analyse experiments, quasi-experiments, and other observational studies
Its main goal is to provide tools for declaring, diagnosing, and redesigning research designs in code, that is, to help you plan your research and check how well it is likely to work before any data analysis
DeclareDesign can be thought of in two parts:
The data fabrication step (this is the bulk of the code)
The evaluation step, where users can change initial assumptions and see how well their design performs under them (this is the computationally intensive part of the code)
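To make the two parts concrete without the package, here is a base-R sketch; it is not how DeclareDesign is implemented. `fabricate_once()` and `evaluate_design()` are hypothetical helpers, and the data-generating process (effect of 0.2, noise SD of 0.25, a binary background variable X) is an assumption that mirrors the two-arm example used later in these slides.

```r
# Part 1, data fabrication: one simulated dataset under assumed parameters
fabricate_once <- function(n = 50, effect = 0.2) {
  X <- rep(c(0, 1), each = n / 2)          # binary background variable
  U <- rnorm(n, sd = 0.25)                 # idiosyncratic noise
  Z <- sample(rep(c(0, 1), each = n / 2))  # complete random assignment
  Y <- effect * Z + X + U                  # revealed outcome
  data.frame(X, Z, Y)
}

# Part 2, evaluation: repeat part 1 many times and summarise the estimates
evaluate_design <- function(sims = 500, n = 50, effect = 0.2) {
  results <- t(replicate(sims, {
    dat <- fabricate_once(n = n, effect = effect)
    fit <- summary(lm(Y ~ Z, data = dat))$coefficients
    c(estimate = fit["Z", "Estimate"], p = fit["Z", "Pr(>|t|)"])
  }))
  c(mean_estimate = mean(results[, "estimate"]),
    sd_estimate   = sd(results[, "estimate"]),
    power         = mean(results[, "p"] < 0.05))
}

set.seed(385)
evaluate_design(n = 50)   # change an assumption...
evaluate_design(n = 250)  # ...and see how the design performs
```

Changing `n` or `effect` and re-running the evaluation is the "redesign" idea in miniature; DeclareDesign automates this bookkeeping for much richer designs.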
The great thing about DeclareDesign is that, as far as I understand, you can simulate any kind of design you can think of (e.g., experiments, regression discontinuity, difference-in-differences, even qualitative research!)
However, most studies are not that complex, so you will rarely need to use all of DeclareDesign’s capabilities
The authors have written a book that explains the package in detail
And they have also created a companion package, DesignLibrary, with pre-built designs that you can use right away! 🤓
Six components of a DeclareDesign study
Population: Set of units about which inferences are sought and their characteristics
Where should the theory apply? Who are the units of interest?
Potential outcomes: Outcomes each unit might exhibit depending on how a causal process changes the world
Rooted in theory; non-compliance, spillovers and attrition affect potential outcomes
Sampling strategy: Strategy used to select units to include in study
How are we selecting units to analyse?
Assignment: Manner in which units are assigned to reveal one potential outcome or another
Randomisation strategy used
Estimand: Quantities that we want to learn about in the world, in terms of potential outcomes
What are we trying to estimate? ATE? Difference in means? Other quantities?
Estimator: Procedure for generating estimates of the quantities we want to learn about
This is your statistical model (e.g., OLS, IV, etc.)
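The six components can be traced in a single simulated study. This is a base-R sketch rather than DeclareDesign code; the parameter values mirror the two-arm example later in these slides, and every object name is illustrative.

```r
set.seed(385)

# Population: 10,000 units with a binary background variable X and noise U
N <- 10000
population <- data.frame(
  X = rep(c(0, 1), each = N / 2),
  U = rnorm(N, sd = 0.25)
)

# Potential outcomes: Y if untreated (Z = 0) and if treated (Z = 1)
population$Y_Z_0 <- population$X + population$U
population$Y_Z_1 <- 0.2 + population$X + population$U

# Estimand: the ATE, defined on the population's potential outcomes
ate <- mean(population$Y_Z_1 - population$Y_Z_0)

# Sampling strategy: simple random sample of 50 units
sampled <- population[sample(N, 50), ]

# Assignment: complete random assignment of exactly half the sample
sampled$Z <- sample(rep(c(0, 1), each = 25))

# Each unit reveals only the potential outcome for its assigned condition
sampled$Y <- ifelse(sampled$Z == 1, sampled$Y_Z_1, sampled$Y_Z_0)

# Estimator: OLS of Y on Z (equivalent to a difference in means here)
estimate <- coef(lm(Y ~ Z, data = sampled))[["Z"]]

c(estimand = ate, estimate = estimate)
```

The estimand is fixed by the model (0.2), while the estimate changes from draw to draw; diagnosing a design means studying that variation across many draws.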
DeclareDesign in action
First, you need to install DeclareDesign with install.packages("DeclareDesign") and DesignLibrary with install.packages("DesignLibrary")
Under the hood, DeclareDesign uses fabricatr to generate data, randomizr to randomise units, and estimatr to estimate causal effects
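All design steps are declared with functions that start with declare_, and you chain them together with + to build a design
Once you have declared a design, draw_data() generates one simulated dataset from it, and diagnose_design() simulates it many times to evaluate its properties (bias, power, coverage, etc.)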
Then you can also use ggplot2 to visualise your results or dplyr to summarise them
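A minimal sketch of that workflow, assuming DeclareDesign 1.0 or later (which uses the `.method` argument, as in the examples in these slides); the design `small_design` and its parameters are hypothetical.

```r
library(DeclareDesign)
library(dplyr)

set.seed(385)

# A deliberately small design: 100 units, exactly half assigned to treatment
small_design <-
  declare_model(
    N = 100,
    U = rnorm(N),
    potential_outcomes(Y ~ 0.2 * Z + U)
  ) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = complete_ra(N, m = 50)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, .method = lm_robust, inquiry = "ATE")

# One simulated dataset: handy for checking the code does what you expect
dat <- draw_data(small_design)
head(dat)

# Summarise it with dplyr, e.g. the mean revealed outcome by arm
dat |>
  group_by(Z) |>
  summarise(mean_Y = mean(Y), n = n())

# One full run of the design: estimand and estimate side by side
run_design(small_design)
```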
The authors have written a tutorial that explains how to use DeclareDesign
And they have also created a gallery with examples of different designs
| Design component | Function | Description |
|---|---|---|
| Model | declare_model() | background variables and potential outcomes |
| Inquiry | declare_inquiry() | research questions |
| Data strategy | declare_sampling() | sampling procedures |
| | declare_assignment() | assignment procedures |
| | declare_measurement() | measurement procedures |
| Answer strategy | declare_estimator() | estimation procedures |
| | declare_test() | testing procedures |
Chapter 18 of the book explains how to use DeclareDesign for experimental causal inference, with lots of examples, some of which we are going to see today 🤓
Example
Two-arm experiment
library(DeclareDesign)
library(tidyverse)

set.seed(385)

sample_size <- 50

two_arm <-
  declare_model(
    N = 10000,
    X = rep(c(0, 1), each = N / 2),
    U = rnorm(N, sd = 0.25),
    potential_outcomes(Y ~ 0.2 * Z + X + U)
  ) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_sampling(S = complete_rs(N = N, n = sample_size)) +
  declare_assignment(Z = complete_ra(N = N, m = sample_size / 2)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, .method = lm_robust, inquiry = "ATE")

diagnosis_two_arm <- diagnose_design(two_arm)
diagnosis_two_arm
Research design diagnosis based on 500 simulations. Diagnosis completed in 2 secs. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

| Design | Inquiry | Estimator | Outcome | Term | N Sims | Mean Estimand | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| two_arm | ATE | estimator | Y | Z | 500 | 0.20 (0.00) | 0.21 (0.01) | 0.01 (0.01) | 0.16 (0.00) | 0.16 (0.00) | 0.24 (0.02) | 0.96 (0.01) |
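The slides don't show the code that produced the power table and the `two_arm_power` object used in the plot. One plausible reconstruction, assuming it used `redesign()` to vary `sample_size` and `get_diagnosands()` to extract power; the reshaping into design / sample_size / diagnosand / estimate columns is a guess based on the table layout. The two_arm design is repeated so the chunk runs on its own, and simulated values will differ slightly from the table.

```r
library(DeclareDesign)
library(dplyr)

set.seed(385)
sample_size <- 50

# Same two-arm design as above
two_arm <-
  declare_model(
    N = 10000,
    X = rep(c(0, 1), each = N / 2),
    U = rnorm(N, sd = 0.25),
    potential_outcomes(Y ~ 0.2 * Z + X + U)
  ) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_sampling(S = complete_rs(N = N, n = sample_size)) +
  declare_assignment(Z = complete_ra(N = N, m = sample_size / 2)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, .method = lm_robust, inquiry = "ATE")

# One design per sample size; sample_size is carried into the diagnosis
two_arm_designs <- redesign(two_arm, sample_size = c(50, 100, 250, 500))
diagnosis_power <- diagnose_design(two_arm_designs)

# Keep only power, in the long layout the plot expects
two_arm_power <- get_diagnosands(diagnosis_power) |>
  transmute(design, sample_size, diagnosand = "power", estimate = power)
two_arm_power
```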
| design | sample_size | diagnosand | estimate |
|---|---|---|---|
| design_1 | 50 | power | 0.236 |
| design_2 | 100 | power | 0.398 |
| design_3 | 250 | power | 0.788 |
| design_4 | 500 | power | 0.986 |
ggplot(two_arm_power, aes(sample_size, estimate)) +
  geom_point() +
  geom_smooth(method = "loess", color = "blue") +
  geom_hline(yintercept = 0.8, color = "red") +
  labs(x = "Sample Size", y = "Statistical Power") +
  theme_minimal()
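As a rough cross-check of the simulated curve, base R's `power.t.test()` gives an analytic approximation. This assumes an equal-variance two-sample t-test is close enough to `lm_robust()` here, and takes the outcome SD from the model above: X is half zeros and half ones, so Var(X) = 0.25, and Var(U) = 0.25² = 0.0625.

```r
# Outcome SD implied by the model (X and U are independent)
outcome_sd <- sqrt(0.25 + 0.25^2)   # about 0.56

total_n <- c(50, 100, 250, 500)

# power.t.test() expects the number of units *per arm*, so halve the total
analytic_power <- sapply(total_n / 2, function(n_arm) {
  power.t.test(n = n_arm, delta = 0.2, sd = outcome_sd, sig.level = 0.05)$power
})
data.frame(sample_size = total_n, analytic_power = round(analytic_power, 3))

# Per-arm sample size needed for 80% power (double it for the total)
n_per_arm_80 <- power.t.test(delta = 0.2, sd = outcome_sd, power = 0.8)$n
n_per_arm_80
```

The per-arm n for 80% power comes out a little above 120, i.e. roughly 250 units in total, which is consistent with where the simulated curve approaches the red line.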
Two-arm experiments with covariates
set.seed(385)

model <- declare_model(
  N = 10000,
  X = rep(c(0, 1), each = N / 2),
  U = rnorm(N, sd = 0.25),
  potential_outcomes(Y ~ 0.2 * Z + X + U)
)

inquiry <- declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0))
sampling <- declare_sampling(S = complete_rs(N = N, n = sample_size))
assignment <- declare_assignment(Z = complete_ra(N = N, m = sample_size / 2))
measurement <- declare_measurement(Y = reveal_outcomes(Y ~ Z))
answer_strategy <- declare_estimator(Y ~ Z, .method = lm_robust, inquiry = "ATE", label = "OLS")

two_arms_a <- model + inquiry + sampling + assignment + measurement + answer_strategy

diagnosis_two_arms_a <- diagnose_design(two_arms_a)
diagnosis_two_arms_a
Research design diagnosis based on 500 simulations. Diagnosis completed in 2 secs. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

| Design | Inquiry | Estimator | Outcome | Term | N Sims | Mean Estimand | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| two_arms_a | ATE | OLS | Y | Z | 500 | 0.20 (0.00) | 0.21 (0.01) | 0.01 (0.01) | 0.16 (0.00) | 0.16 (0.00) | 0.24 (0.02) | 0.96 (0.01) |
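The code behind `two_arm_cov` is not shown in the slides. One plausible version, assuming covariate adjustment is done by adding X to the `lm_robust()` formula and that only the answer strategy changes; the label matches the output table, but that this reproduces those exact numbers is an assumption. The components are repeated so the chunk runs on its own.

```r
library(DeclareDesign)

set.seed(385)
sample_size <- 50

# Components repeated from the previous chunk
model <- declare_model(
  N = 10000,
  X = rep(c(0, 1), each = N / 2),
  U = rnorm(N, sd = 0.25),
  potential_outcomes(Y ~ 0.2 * Z + X + U)
)
inquiry <- declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0))
sampling <- declare_sampling(S = complete_rs(N = N, n = sample_size))
assignment <- declare_assignment(Z = complete_ra(N = N, m = sample_size / 2))
measurement <- declare_measurement(Y = reveal_outcomes(Y ~ Z))

# Only the answer strategy changes: adjust for the background variable X
answer_strategy_cov <- declare_estimator(
  Y ~ Z + X, .method = lm_robust,
  inquiry = "ATE", label = "OLS with covariates"
)

two_arm_cov <- model + inquiry + sampling + assignment + measurement +
  answer_strategy_cov

diagnosis_two_arm_cov <- diagnose_design(two_arm_cov)
diagnosis_two_arm_cov
```

Back-of-envelope: once X is accounted for, the residual SD is the SD of U, 0.25, so with 25 units per arm the standard error is about 0.25 × √(1/25 + 1/25) ≈ 0.07, in line with the SD Estimate reported for two_arm_cov.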
Research design diagnosis based on 500 simulations. Diagnosis completed in 2 secs. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

| Design | Inquiry | Estimator | Outcome | Term | N Sims | Mean Estimand | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| two_arm_cov | ATE | OLS with covariates | Y | Z | 500 | 0.20 (0.00) | 0.20 (0.00) | -0.00 (0.00) | 0.07 (0.00) | 0.07 (0.00) | 0.81 (0.02) | 0.95 (0.01) |
Two-arm experiments without covariates
simulations_ta <- get_simulations(diagnosis_two_arm)

# first create summary for vertical lines
summary_ta <- simulations_ta |>
  group_by(estimator) |>
  summarize(estimand = mean(estimand))

# then plot simulations
ggplot(simulations_ta) +
  geom_histogram(aes(estimate), bins = 40, fill = "#72B4F3") +
  geom_vline(data = summary_ta, aes(xintercept = estimand),
             lty = "dashed", color = "#C6227F") +
  annotate("text", y = 300, x = 0.18, label = "Estimand",
           color = "#C6227F", hjust = 1) +
  facet_wrap(~ estimator) +
  labs(x = "Estimate", y = "Count of simulations") +
  theme_minimal()
Two-arm experiments with covariates
simulations_tac <- get_simulations(diagnosis_two_arm_cov)

# first create summary for vertical lines
summary_tac <- simulations_tac |>
  group_by(estimator) |>
  summarize(estimand = mean(estimand))

# then plot simulations
ggplot(simulations_tac) +
  geom_histogram(aes(estimate), bins = 40, fill = "#72B4F3") +
  geom_vline(data = summary_tac, aes(xintercept = estimand),
             lty = "dashed", color = "#C6227F") +
  annotate("text", y = 300, x = 0.18, label = "Estimand",
           color = "#C6227F", hjust = 1) +
  facet_wrap(~ estimator) +
  labs(x = "Estimate", y = "Count of simulations") +
  theme_minimal()
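The diagnosis below is for a design called `block_randomisation` with a "Lin" estimator, but its code is not shown. A hedged sketch, assuming treatment is randomised within levels of X using randomizr's `block_ra()` and that the estimator is estatr's `lm_lin()`, which regresses Y on treatment, mean-centred covariates, and their interactions; `sample_size = 100` is a hypothetical value.

```r
library(DeclareDesign)

set.seed(385)
sample_size <- 100   # hypothetical: the value used in the slides is not shown

block_randomisation <-
  declare_model(
    N = 10000,
    X = rep(c(0, 1), each = N / 2),
    U = rnorm(N, sd = 0.25),
    potential_outcomes(Y ~ 0.2 * Z + X + U)
  ) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_sampling(S = complete_rs(N = N, n = sample_size)) +
  # Randomise separately within each level of X (the blocks)
  declare_assignment(Z = block_ra(blocks = X)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  # lm_lin: OLS with mean-centred covariates interacted with treatment
  declare_estimator(Y ~ Z, covariates = ~ X, .method = lm_lin,
                    inquiry = "ATE", label = "Lin")

diagnosis_block_randomisation <- diagnose_design(block_randomisation)
diagnosis_block_randomisation
```

With 100 units and a residual SD of 0.25, the back-of-envelope standard error is 0.25 × √(1/50 + 1/50) = 0.05, in line with the SD Estimate in the output; that the slides used 100 units is an inference, not something they state.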
Research design diagnosis based on 500 simulations. Diagnosis completed in 4 secs. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

| Design | Inquiry | Estimator | Outcome | Term | N Sims | Mean Estimand | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| block_randomisation | ATE | Lin | Y | Z | 500 | 0.20 (0.00) | 0.20 (0.00) | 0.00 (0.00) | 0.05 (0.00) | 0.05 (0.00) | 0.97 (0.01) | 0.94 (0.01) |
simulations_br <- get_simulations(diagnosis_block_randomisation)

# first create summary for vertical lines
summary_br <- simulations_br |>
  group_by(estimator) |>
  summarize(estimand = mean(estimand))

# then plot simulations
ggplot(simulations_br) +
  geom_histogram(aes(estimate), bins = 40, fill = "#72B4F3") +
  geom_vline(data = summary_br, aes(xintercept = estimand),
             lty = "dashed", color = "#C6227F") +
  annotate("text", y = 300, x = 0.18, label = "Estimand",
           color = "#C6227F", hjust = 1) +
  facet_wrap(~ estimator) +
  labs(x = "Estimate", y = "Count of simulations") +
  theme_minimal()
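The diagnosis below is for a design called `block_cluster`, whose model is not shown in the slides. The sketch illustrates block-and-cluster assignment in general and is not expected to reproduce that output: the classroom/student structure, the five blocks of four classrooms, and all parameter values are hypothetical. Whole classrooms are treated together, randomised within blocks with randomizr's `block_and_cluster_ra()`, and estatr's `difference_in_means()` accounts for both blocks and clusters.

```r
library(DeclareDesign)

set.seed(385)

block_cluster <-
  declare_model(
    # 20 classrooms in 5 blocks of 4, each with a shared classroom shock
    classrooms = add_level(
      N = 20,
      block = rep(1:5, each = 4),
      classroom_shock = rnorm(N)
    ),
    # 10 students per classroom; potential outcomes written out explicitly
    students = add_level(
      N = 10,
      U = rnorm(N),
      Y_Z_0 = classroom_shock + U,
      Y_Z_1 = Y_Z_0 + 0.2
    )
  ) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  # Assign whole classrooms, randomising within blocks
  declare_assignment(Z = block_and_cluster_ra(blocks = block,
                                              clusters = classrooms)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  # Estimation must respect both the blocking and the clustering
  declare_estimator(Y ~ Z, blocks = block, clusters = classrooms,
                    .method = difference_in_means, inquiry = "ATE")

head(draw_data(block_cluster))

diagnosis_block_cluster <- diagnose_design(block_cluster)
diagnosis_block_cluster
```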
Research design diagnosis based on 500 simulations. Diagnosis completed in 2 secs. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

| Design | Inquiry | Estimator | Outcome | Term | N Sims | Mean Estimand | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| block_cluster | ATE | estimator | Y | Z | 500 | 0.01 (0.01) | -0.02 (0.02) | -0.03 (0.02) | 0.52 (0.01) | 0.41 (0.01) | 0.00 (0.00) | 1.00 (0.00) |