Design and Statistical Analysis Portfolio
Summary of Case Studies
COVID Tracker and Policy Analysis
- Design and construction of a dashboard for tracking COVID-19 statistics in Chile.
- Consolidation of different data sources, including new cases, testing, and hospital capacity.
- Published research paper on the effectiveness of lockdowns during the pandemic.
In early 2020, I started a pet project to track COVID statistics in Chile while isolated in my New York apartment. Given that there was no proper dashboard to follow the data that the Ministry of Health was publishing, and the fact that other organizations were also posting disaggregated data, I decided to combine all these different sources and create a simple website that would allow users to follow the state of the country in terms of COVID each day.
The dashboard combined data involving total and new cases, available testing, hospital capacity, and geographic location of contagions (by municipality), among others. You can access the last updated version of the app here and see some of the different tabs displayed in Figure 1.
To create this web application, I taught myself how to build a Shiny app and how to scrape data, documenting the whole learning process both on GitHub and in the resources section of my website.
Using all the data compiled for this dashboard, I started analyzing in more depth how the measures adopted by the Chilean Government were potentially affecting the contagion rate. In particular, I was interested in the effectiveness of the strict lockdown measures adopted by authorities early in the pandemic. Chile was the first country to adopt strict stay-at-home orders and lockdowns in different regions of the country, in an attempt to slow down the rate of infection. I included some of this early analysis for a general audience in the same dashboard.
With all the data I had collected and harmonized, I was able to conduct a rigorous analysis of the effectiveness of lockdowns in Chile in a research paper titled “All Things Equal? Heterogeneity in Policy Effectiveness against COVID-19 Spread in Chile”, published in 2021 in the journal World Development, one of the top journals in development studies, with 131 citations to date.
In my analysis, I find that even though strict stay-at-home policies reduced COVID-19 contagion, the impact is markedly heterogeneous. Leveraging the date each municipality entered lockdown as a natural experiment, I use recent causal inference techniques such as the augmented synthetic control method to estimate the effect of these policies by socioeconomic status. While more affluent municipalities benefited from lockdowns, the effect is much smaller (and sometimes null) for lower-income municipalities. These findings are consistent with the mobility data I merged for this analysis, which show that higher-income neighborhoods reduced their mobility for longer periods than more vulnerable areas.
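As a rough illustration of the idea behind synthetic control, the sketch below fits non-negative donor weights that sum to one so that a weighted average of untreated municipalities tracks a treated municipality before lockdown, and then reads the post-lockdown gap as the estimated effect. This is the plain synthetic control estimator, not the augmented version used in the paper (which adds an outcome-model bias correction), and it is not the paper's code. All numbers, the function names, and the exponentiated-gradient solver are assumptions chosen for illustration only.

```python
import math

def synthetic_series(weights, donors):
    """Weighted average of the donor series, period by period."""
    n_periods = len(donors[0])
    return [sum(w * d[t] for w, d in zip(weights, donors)) for t in range(n_periods)]

def rmse(a, b):
    """Root mean squared difference between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def fit_weights(treated_pre, donors_pre, steps=2000, lr=0.5):
    """Donor weights (non-negative, summing to 1) that minimize the
    pre-period squared gap, found with exponentiated-gradient updates.
    The solver and its settings are illustrative assumptions."""
    n = len(donors_pre)
    weights = [1.0 / n] * n
    for _ in range(steps):
        synth = synthetic_series(weights, donors_pre)
        resid = [s - y for s, y in zip(synth, treated_pre)]
        # Gradient of the mean squared pre-period error for each weight.
        grad = [2.0 * sum(r * x for r, x in zip(resid, d)) / len(resid)
                for d in donors_pre]
        g_min = min(grad)  # shifting by a constant does not change the result
        weights = [w * math.exp(-lr * (g - g_min)) for w, g in zip(weights, grad)]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalize onto the simplex
    return weights

# Made-up outcome series (e.g. a contagion rate), NOT data from the paper.
donors_pre = [[1.0, 1.2, 1.4, 1.6], [2.0, 1.8, 2.2, 2.0], [0.5, 0.9, 0.6, 1.0]]
treated_pre = [1.25, 1.35, 1.56, 1.66]
donors_post = [[1.8, 2.0], [2.1, 2.3], [1.1, 1.0]]
treated_post = [1.5, 1.4]

weights = fit_weights(treated_pre, donors_pre)
pre_fit_error = rmse(synthetic_series(weights, donors_pre), treated_pre)
# Estimated effect in each post-lockdown period: observed minus synthetic.
gaps = [y - s for y, s in zip(treated_post, synthetic_series(weights, donors_post))]
```

Estimating the effect separately for groups of municipalities (for example, by income level) would mean repeating this fit for each treated unit and averaging the gaps within each group.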
Helping micro-entrepreneurs in Latin America
- Design of a two-phase study to understand the determinants of success for vulnerable micro-entrepreneurs in Latin America.
- Analysis of 2,000 in-depth interviews, used to build prediction models for metrics of success.
- Design and implementation of a large-scale randomized controlled trial of low-cost online interventions on training (e.g., financial education and business practices), role models, and starting loan amount.
In this ongoing project (co-authored with William Fuchs and Jaime Millan), in partnership with the BBVA Foundation, the objective is to identify actionable characteristics that improve the chances of success for vulnerable micro-entrepreneurs in Latin America and, in a second phase, to test interventions based on those characteristics and assess their effect on loan repayment and business growth.
For the first phase of this study, we conducted in-depth interviews with over 2,000 existing clients in four different countries, gathering information ranging from demographic and socioeconomic characteristics to business practices and financial knowledge. Additionally, we ran a set of games to assess risk and loss aversion, as well as honesty.
With this initial information, I led the analysis of our survey and administrative data to identify characteristics that correlated with different definitions of success. Combining administrative data from the lending entities with the survey data, I used prediction models (e.g., Random Forest) to identify which characteristics best explained the probability that a client was successful (i.e., paid on time and grew their business), and to identify heterogeneity between clusters of clients.
The main takeaways from Phase 1 are the following:
- The initial loan amount is positively associated with business growth
- The use of technology is a way to enhance business growth
- Training (especially in business practices) is associated with financial success
- Changes in mindset and ambition could help micro-entrepreneurs succeed
With these findings, we are now in the process of designing a Randomized Controlled Trial in these four countries for new clients with the following interventions:
- Online training
- Role Models + Ambition enhancement
- Initial loan amount increase
The first two interventions will be delivered fully online over time through a WhatsApp chatbot. The idea is to make the delivery of these interventions less costly for both the implementer and the client. We will be working with over 12,000 clients (see Power Calculations on the following app), randomizing them into six different groups so that we can also assess potential complementarities between the treatments.
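To make the design arithmetic concrete, here is a minimal sketch of two steps: dealing 12,000 clients into six equal arms, and a textbook normal-approximation formula for the minimum detectable effect when comparing a binary outcome between two arms. It is not the project's actual randomization or power code (the linked app holds the real calculations). The arm labels, the seed, the 50% baseline rate, the 5% significance level, and the 80% power target are all assumptions.

```python
import random
from statistics import NormalDist

def assign_arms(client_ids, n_arms=6, seed=2024):
    """Complete randomization: shuffle the clients, then deal them
    round-robin so that arm sizes differ by at most one."""
    rng = random.Random(seed)  # fixed seed (hypothetical) for reproducibility
    shuffled = list(client_ids)
    rng.shuffle(shuffled)
    return {cid: i % n_arms for i, cid in enumerate(shuffled)}

def minimum_detectable_effect(n_per_arm, baseline_rate, alpha=0.05, power=0.80):
    """Two-sided, two-arm MDE (in proportion units) for a binary outcome,
    using the normal approximation with variance evaluated at the baseline."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    std_error = (2 * baseline_rate * (1 - baseline_rate) / n_per_arm) ** 0.5
    return (z_alpha + z_power) * std_error

# 12,000 clients and six groups come from the text; everything else is assumed.
assignment = assign_arms(range(12000))
arm_sizes = [list(assignment.values()).count(arm) for arm in range(6)]

# Hypothetical 50% baseline rate for the outcome (e.g. on-time repayment).
mde = minimum_detectable_effect(n_per_arm=arm_sizes[0], baseline_rate=0.5)
```

A real design would likely stratify the randomization by country and adjust the power calculation for multiple comparisons across arms; neither is attempted here.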
Tracking on the field? Racial segregation in the Quarterback position
- Creation of a large data panel of football players at different stages (high school, college, and the NFL) from 2000 to 2019, covering race, physical characteristics, performance, and recruitment statistics.
- Analysis showing that even after adjusting for performance and other attributes, Black high school quarterbacks (QB) are 13 percentage points less likely to be recruited as college QBs compared to White players.
- Findings are robust to playing style and other performance metrics, which is indicative of racial discrimination at the moment of recruiting for the QB position.
In this study, I explore the racial imbalance present at various tiers of American competitive football, including high school, college, and the National Football League (NFL), focusing on the quarterback (QB) position, a critical role within the team. I examine the selection processes at each level, shedding light on recruitment and drafting patterns. My initial findings reveal significant disparities in recruitment probabilities for QBs based on race: highly skilled high school QBs of color are 13 percentage points less likely to be recruited by college teams than their White peers, even when accounting for physical attributes and playing techniques. Furthermore, elite college QBs of color face a 5 percentage point lower probability of entering the NFL relative to White players. These discrepancies remain even after adjustments to equate the playing styles of Black QBs to those of White QBs.
The following figure (Figure 5) shows the selection process at different stages according to race. While non-White players represent a third of ranked high school QBs, they represent just over 20% of the NFL quarterbacks in the sample I study from 2000 to 2017.
Even though sports analysts have many hypotheses about what could be driving the difference in recruitment by race at the QB position, there has been little quantitative analysis of this phenomenon. One of the main limitations is data: even though there are extensive datasets for college and professional football, data at the high school level are not as readily available and, most importantly, have not been systematically combined with other sources, including players’ race.
For example, several websites collect information at the high school level (such as rivals.com, 247sports.com, and espn.com, among others), but they sometimes report different statistics or data (e.g., some sites post combine data or recruitment interest, while others do not). Additionally, in order to analyze disparities by race, we need to have race coding in our data.
For this project, I merged all available data from the largest football platforms, matching players by name, state, and high school, when available. I harmonized these datasets to make them easier to analyze and, with the support of a research assistant, coded the race of all available players (both high school and college QBs) between 2000 and 2019 by searching for their photographs. Even though this is a measure of perceived race, I believe it is a good representation of what could potentially be driving disparities.
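The matching step can be sketched as follows: normalize names and states, look up candidates that share both, and use the high school as a tie-breaker only when both sources report it. This is a minimal illustration and not the project's actual pipeline. The field names, the normalization rules, and the example records are all hypothetical.

```python
import re
import unicodedata

def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())

def schools_compatible(rec_a, rec_b):
    """High school is only checked when both records report it."""
    school_a = normalize(rec_a.get("high_school"))
    school_b = normalize(rec_b.get("high_school"))
    return not school_a or not school_b or school_a == school_b

def merge_sources(primary, secondary):
    """Match on (name, state); keep a match only if exactly one candidate
    survives the high school check. Ambiguous records stay unmatched."""
    index = {}
    for rec in secondary:
        key = (normalize(rec["name"]), normalize(rec["state"]))
        index.setdefault(key, []).append(rec)
    merged, unmatched = [], []
    for rec in primary:
        key = (normalize(rec["name"]), normalize(rec["state"]))
        candidates = [c for c in index.get(key, []) if schools_compatible(rec, c)]
        if len(candidates) == 1:
            combined = dict(candidates[0])
            # Primary values win, but missing values never overwrite real ones.
            combined.update({k: v for k, v in rec.items() if v is not None})
            merged.append(combined)
        else:
            unmatched.append(rec)
    return merged, unmatched

# Invented records for illustration only (not real players or real sites' schemas).
site_a = [
    {"name": "Alex Example", "state": "TX", "high_school": "Sample High", "rating": 5.8},
    {"name": "Sam Placeholder", "state": "CA", "high_school": None, "rating": 5.5},
]
site_b = [
    {"name": "alex  example", "state": "tx", "high_school": "Sample High", "height_in": 74},
    {"name": "Alex Example", "state": "TX", "high_school": "Other High", "height_in": 70},
    {"name": "Sam Placeholder", "state": "CA", "high_school": "Demo Prep", "height_in": 73},
]

merged, unmatched = merge_sources(site_a, site_b)
```

Leaving ambiguous records unmatched, rather than guessing, keeps false merges out of the panel at the cost of a smaller sample; those cases can then be reviewed by hand.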
Some of the key findings are presented here:
- After adjusting for scouts’ rating, playing style, and physical characteristics, high school QBs of color are 13 percentage points less likely to be recruited to college as QBs compared to White high school QBs.
- After creating matched pairs of players (one White and one of color) with the same playing style (pocket passer or dual threat), the same scouts’ rating in high school, and very similar height and weight, Black high school QBs are 14 percentage points less likely to be recruited as college QBs compared to their White counterparts.
- The previous difference in probability (14 pp) is equivalent to the one that stems from a 0.57 SD difference in scouts’ rating.
- Players classified as “Dual Threats” in high school (a classification that is more likely for Black QBs) have very similar statistics to those classified as “Pocket Passers”, which indicates that playing style before college is not a strong determinant of performance.
- Finally, players classified as “Dual Threats” are 10 percentage points more likely to be switched to a position other than QB when transitioning from high school to college. This indicates that while high school QBs of color are being recruited for college football, they are not playing the leading position.
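The matched-pair comparison described above can be sketched in a few lines: pair each player of color with a White player who has the same playing style and scouts' rating and a similar height and weight, then average the within-pair difference in recruitment. This is only an illustration of the technique, not the study's code. The greedy one-to-one matching, the caliper values, the field names, and the records are all assumptions, and the records are invented purely to exercise the functions.

```python
def build_pairs(players, max_height_gap=1, max_weight_gap=10):
    """Greedy 1:1 matching: exact on style and rating, nearest on
    height and weight within the (hypothetical) calipers."""
    white = [p for p in players if p["race"] == "White"]
    others = [p for p in players if p["race"] != "White"]
    used, pairs = set(), []
    for b in others:
        best = None
        for i, a in enumerate(white):
            if i in used or a["style"] != b["style"] or a["rating"] != b["rating"]:
                continue
            height_gap = abs(a["height_in"] - b["height_in"])
            weight_gap = abs(a["weight_lb"] - b["weight_lb"])
            if height_gap > max_height_gap or weight_gap > max_weight_gap:
                continue
            # Scaled distance so that height and weight are comparable.
            distance = height_gap / max_height_gap + weight_gap / max_weight_gap
            if best is None or distance < best[0]:
                best = (distance, i)
        if best is not None:
            used.add(best[1])
            pairs.append((white[best[1]], b))
    return pairs

def gap_in_percentage_points(pairs):
    """Mean within-pair difference in being recruited as a QB (0/1),
    player of color minus White player, in percentage points."""
    diffs = [b["recruited_as_qb"] - a["recruited_as_qb"] for a, b in pairs]
    return 100 * sum(diffs) / len(diffs)

# Invented records, NOT data from the study.
players = [
    {"id": "toy-1", "race": "White", "style": "pocket", "rating": 5.7,
     "height_in": 75, "weight_lb": 205, "recruited_as_qb": 1},
    {"id": "toy-2", "race": "Black", "style": "pocket", "rating": 5.7,
     "height_in": 75, "weight_lb": 210, "recruited_as_qb": 1},
    {"id": "toy-3", "race": "White", "style": "dual", "rating": 5.5,
     "height_in": 73, "weight_lb": 195, "recruited_as_qb": 1},
    {"id": "toy-4", "race": "Black", "style": "dual", "rating": 5.5,
     "height_in": 74, "weight_lb": 190, "recruited_as_qb": 0},
    {"id": "toy-5", "race": "Black", "style": "dual", "rating": 5.9,
     "height_in": 74, "weight_lb": 200, "recruited_as_qb": 1},
]

pairs = build_pairs(players)
gap = gap_in_percentage_points(pairs)
```

Players without an acceptable counterpart (here, the last toy record) are dropped, so the estimate only speaks for the part of the sample where comparable pairs exist.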
The next step for this project is to continue diving into the available data to better understand the recruiting process. I have finished collecting colleges’ interest in players (when available), which will allow me to compare players with similar levels of interest from colleges and similar portfolios of recruitment. The preliminary report can be accessed here.