class: title-slide

<br><br><br>

# Lecture 20

## Machine Learning for Causal Modeling

### Tyler Ransom

### ECON 6343, University of Oklahoma

---

# Plan for the Day

Go over a number of econ papers that use machine learning methods

---

# Publishing fads

.center[]

[Image source](https://www.economist.com/finance-and-economics/2016/11/24/economists-are-prone-to-fads-and-the-latest-is-machine-learning)

---

# `\(k\)`-means clustering and unobserved types

- Bonhomme, Lamadon, and Manresa (2019)
- Panel data model where unobserved heterogeneity is continuous in the population
- But approximated in the model with a discrete distribution (Group Fixed Effects, GFE)
- Propose a 2-step estimation algorithm:
    1. Classify units into groups using `\(k\)`-means clustering
    2. Estimate the model, treating the group assignments from step 1 as known
- This is different from finite mixture models: no joint estimation required!

---

# Assumptions of BLM (2019)

There are two main assumptions:

1. Unobserved heterogeneity depends on a low-dimensional vector of latent types
    - This is similar to the conditions of a factor model
    - But this method doesn't require a factor structure
2. Underlying types can be approximated from individual-specific moments
    - Moments can come from the data (e.g. a battery of test scores)
    - They can also come from the model (e.g. choice probabilities)

---

# Further considerations

- The `\(k\)`-means objective function is not globally convex
    - This means you will need to search for the global minimum (e.g. by trying many starting values)
- Consider the log likelihood of a dynamic discrete choice model:

`\begin{align*} \ell_i\left(\alpha_i,\theta; d_{it},X_{it},Y_{it}\right) &= \sum_t \underbrace{\ln f\left(d_{it}\vert X_{it},\alpha_i,\theta\right)}_{\text{choices}} + \underbrace{\ln f\left(X_{it}\vert d_{it-1},X_{it-1},\alpha_i,\theta\right)}_{\text{state transitions}} + \\ &\phantom{=\sum_t} \underbrace{\ln f\left(Y_{it}\vert d_{it},X_{it},\alpha_i,\theta\right)}_{\text{outcomes}} \end{align*}`

- Likelihoods are assumed to be additively separable conditional on the FE `\(\alpha_i\)`

---

# Extensions

- You can incorporate covariates into the `\(k\)`-means step
    - This can often improve performance
- You can also incorporate model moments in the first step
    - This is required if you don't have external measurements (like test scores)
- Another thing to keep in mind is that the GFE estimator is inherently biased
    - The discrete groups only approximate the continuous heterogeneity
    - You may need to iterate on the 2-step estimator multiple times to correct for this

---

# Using ML to solve the sample selection problem

- Heckman (1979) outlines the canonical sample selection problem
    - e.g. we only observe the earnings of individuals who are employed
    - This might distort our estimates of wage returns to skill
- Can we improve on this by using machine learning?
    - Especially if the choice dimension is much larger than work/not work?
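---

# Illustration: selection bias in a simulated wage sample

A minimal sketch of the problem described above, not taken from Heckman (1979). The data-generating process, the parameter values (`beta`, `rho`), and the helper `ols_slope` are all illustrative assumptions. Employment depends on skill and on an unobserved shock that is correlated with the wage shock, so OLS on the employed subsample should understate the return to skill.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with an intercept)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


random.seed(0)
n, beta, rho = 20000, 0.5, 0.8   # hypothetical sample size, return to skill, shock correlation

skill, wage, employed = [], [], []
for _ in range(n):
    x = random.gauss(0, 1)                                     # observed skill
    v = random.gauss(0, 1)                                     # unobserved taste for work
    u = rho * v + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1)   # wage shock, corr(u, v) = rho
    skill.append(x)
    wage.append(1.0 + beta * x + u)
    employed.append(x + v > 0)                                 # wages observed only if employed

# Infeasible benchmark: everyone's wage is observed
slope_all = ols_slope(skill, wage)

# What the researcher can actually run: employed individuals only
slope_emp = ols_slope([x for x, e in zip(skill, employed) if e],
                      [w for w, e in zip(wage, employed) if e])

print(round(slope_all, 3), round(slope_emp, 3))
```

Under these assumptions the full-sample slope should be close to the true `beta`, while the employed-only slope should be noticeably smaller. A selection correction aims to close that gap.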
---

# Ransom (2021)

- Considers geographic heterogeneity in wage returns to college major
- Individuals choose where they live based on wages and non-wage factors
- Problem: researcher only sees wages in chosen residence location
- Thus, wage returns are potentially contaminated by selection bias

---

# Resolving the selection problem

- Heckman model: the inverse Mills ratio `\(\lambda(\cdot)\)` corrects for selection

`\begin{align*} \ln wage &= X\beta + \lambda\left(Z\gamma\right) + u \end{align*}`

- One can generalize this approach to multinomial choice and non-normality

`\begin{align*} \ln wage &= X\beta + \sum_j d_j\widetilde{\lambda}\left(p_j(Z),p_k(Z)\right) + u \end{align*}`

where

- `\(d_j\)` is a dummy for living in location `\(j\)`
- `\(\widetilde{\lambda}\)` is a flexible function
- `\(p_j\)` and `\(p_k\)` are probabilities of choosing `\(j\)` or `\(k\)` (as a function of `\(Z\)`)

---

# Using a tree model to estimate selection

- The `\(p\)`'s on the previous slide are selection probabilities
    - `\(p_j\)` is the probability of choosing the chosen alternative
    - `\(p_k\)` is the probability of choosing the next-preferred alternative
- Use a classification tree model to obtain the `\(p\)`'s
- Assume that individuals with the same values of `\(Z\)` and similar `\(p\)`'s have identical tastes
- This approach improves on a bin-estimation approach
    - Can include a higher dimension of `\(Z\)` while limiting the curse of dimensionality

---

# Can LASSO improve causal inference?

- Shifting gears, let's talk about how model selection might improve causal inference
- Thought experiment:
    - Methods such as matching and regression rely on unconfoundedness
    - If we have high-dimensional data, we can "control for everything"!
    - This would give us a high `\(R^2\)` and remove any omitted variable bias
    - LASSO can potentially select only the most important variables

---

# Prediction problems

- The problem with the above thought experiment is that LASSO is built only for prediction
- If we took a slightly different sample, it might select different variables
- This is because LASSO doesn't care about inference, it cares only about prediction
- Mullainathan and Spiess (2017) illustrate this in their Figure 2
- Two functions with very different coefficients can produce essentially the same predictions
- To use ML in econometrics, we need to be more principled about ML's role

---

# Regularization bias

- In econometrics, we like our estimators to be CAN (consistent and asymptotically normal)
- Suppose we want to estimate a treatment effect `\(\theta\)` in a high-dimensional model

`\begin{align*} Y &= D\cdot\theta + g(X) + U, & \mathbb{E} \left[U | X, D \right] =0 \end{align*}`

- We might want to use LASSO, ridge, random forest, etc. since `\(X\)` is high-dimensional
- This handles the bias/variance tradeoff for prediction, but introduces bias into `\(\hat\theta\)`
- Why? Because these methods lower variance by accepting some .hi[regularization bias], which carries over to `\(\hat\theta\)`
- See Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, and Robins (2018)

---

# Double ML estimation

- How do we solve the regularization bias problem? Add another equation
- Consider outcome and selection equations, respectively

`\begin{align*} Y &= D\cdot\theta + g(X) + U, & \mathbb{E} \left[U | X, D \right] =0 \\ D &= m(X) + V, & \mathbb{E} \left[V | X\right] =0 \end{align*}`

- We include the second equation to .hi[orthogonalize] `\(D\)`
- We also need to .hi[split our sample] to be able to estimate this system
- Instead of using `\(D\)`, we use `\(\hat V = D - \hat{m}(X)\)`
- This idea is related to the concept of control functions

---

# Steps for Double ML

(0.) Divide the sample in half; label one subsample `\(I^C\)` and the other `\(I\)`

1. Estimate `\(\hat{m}\)` in `\(I^C\)`, then form `\(\hat V = D - \hat{m}(X)\)` for the observations in `\(I\)`
2. Estimate `\(\hat{g}\)` in `\(I^C\)`, then form `\(\hat U = Y - \hat{g}(X)\)` for the observations in `\(I\)`
3. Estimate `\(\check \theta = \left(\hat{V}'D\right)^{-1}\hat{V}'\hat{U}\)` in `\(I\)` (cf. biased `\(\hat \theta = \left(D'D\right)^{-1}D'\hat{U}\)`)
4. Repeat steps 1-3, but switch `\(I^C\)` and `\(I\)` (this is known as cross-fitting)
5. `\(\check \theta_{cf} = \frac{1}{2} \check \theta(I^C,I)+ \frac{1}{2} \check \theta(I,I^C)\)`

- These steps make `\(\check \theta_{cf}\)` approximately unbiased and `\(\sqrt{n}\)`-consistent; cross-fitting recovers the efficiency lost by splitting the sample
- Nice examples in [R](https://www.r-bloggers.com/2017/06/cross-fitting-double-machine-learning-estimator/) and [Python](http://aeturrell.com/2018/02/10/econometrics-in-python-partI-ML/)

---

# Post Double Selection (PDS)

- Now let's consider a related idea to Double ML
- This is known as .hi[post double selection] (Belloni, Chernozhukov, and Hansen, 2014)
- It is a useful way to estimate treatment effects in linear models
- Same setup as Double ML, but here `\(g(\cdot)\)` and `\(m(\cdot)\)` are linear

`\begin{align*} Y &= D\cdot\theta + g(X) + U, & \mathbb{E} \left[U | X, D \right] =0 \\ D &= m(X) + V, & \mathbb{E} \left[V | X\right] =0 \end{align*}`

---

# PDS steps

1. Use LASSO to separately select `\(X\)`
    - First on `\(Y = g(X) + \tilde U\)`
    - Then on `\(D = m(X) + V\)`
2. Regress `\(Y\)` on `\(D\)` and the union of the selected `\(X\)`'s from step 1

- The procedure is called "post double selection" because the controls in the final regression come from two selection steps (one on the outcome equation, one on the selection equation)
- Key idea is that we avoid regularization bias by only using the selection part of LASSO (not the shrinkage part): the final regression is unpenalized

---

# Usefulness of PDS

- For an example, let's re-evaluate Donohue and Levitt (2001)
- Their claim: legalizing abortion reduces crime
- Intuition: unwanted children are most likely to become criminals
- Use a "two-way fixed effects" model on state-level panel data:

`\begin{align*} y_{st} &= \alpha a_{st} + \beta w_{st} + \delta_s + \gamma_t + \varepsilon_{st} \end{align*}`

where `\(s\)` is US state, `\(t\)` is time, and `\(a_{st}\)` is the abortion rate (15-25 years prior)

- `\(y_{st}\)` are various measures of crime (property, violent, murder, ...)
- `\(w_{st}\)` are state-level controls (prisoners per capita, police per capita, ...)
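---

# PDS in code: a minimal sketch

The two PDS steps can be sketched on simulated data as follows. This is an illustration under stated assumptions, not the implementation in Belloni, Chernozhukov, and Hansen (2014): the data-generating process, the fixed penalty `lam = 0.1` (the paper uses a data-driven penalty), and the helpers `lasso_select` and `ols` are hypothetical and written in plain Python to keep the example self-contained.

```python
import random


def lasso_select(y, X, lam, sweeps=50):
    """Indices of the columns of X that receive a nonzero LASSO coefficient.

    Minimizes (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.
    y is demeaned here; the columns of X are assumed to be roughly mean zero.
    """
    n, p = len(y), len(X[0])
    ybar = sum(y) / n
    resid = [v - ybar for v in y]            # residual y - X b, starting from b = 0
    b = [0.0] * p
    col_ss = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # covariance of column j with the partial residual that excludes column j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * b[j]
            new_bj = max(abs(rho) - lam, 0.0) / col_ss[j]   # soft-thresholding
            if rho < 0:
                new_bj = -new_bj
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return {j for j in range(p) if b[j] != 0.0}


def ols(y, Z):
    """OLS coefficients of y on the columns of Z, via the normal equations."""
    k = len(Z[0])
    # augmented matrix [Z'Z | Z'y]
    A = [[sum(r[a] * r[c] for r in Z) for c in range(k)]
         + [sum(r[a] * yi for r, yi in zip(Z, y))] for a in range(k)]
    for c in range(k):                       # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[a][k] / A[a][a] for a in range(k)]


# Simulated data (hypothetical DGP): only the first three of ten controls matter
random.seed(0)
n, p, theta = 1000, 10, 0.5
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
D = [0.8 * x[1] + 0.5 * x[2] + random.gauss(0, 1) for x in X]
Y = [theta * d + 1.0 * x[0] + 0.3 * x[1] + random.gauss(0, 1) for d, x in zip(D, X)]

# Step 1: LASSO selection, separately for the outcome and the treatment
sel_y = lasso_select(Y, X, lam=0.1)
sel_d = lasso_select(D, X, lam=0.1)
selected = sorted(sel_y | sel_d)             # union of the two selected sets

# Step 2: unpenalized OLS of Y on a constant, D, and the selected controls
Z = [[1.0, d] + [x[j] for j in selected] for d, x in zip(D, X)]
theta_pds = ols(Y, Z)[1]
print(selected, round(theta_pds, 3))
```

Taking the union matters here: control 2 affects `\(Y\)` only through `\(D\)`, so it is the treatment-equation LASSO that most reliably picks it up. The final OLS step applies no shrinkage to the selected controls.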
---

# Re-evaluating Donohue and Levitt (2001)

- A potential issue with Donohue and Levitt (2001): specification of `\(w_{st}\)`
- We might think we should include highly flexible forms of elements of `\(w_{st}\)`
- Indeed, when Belloni, Chernozhukov, and Hansen (2014) do this, the SE's get larger
- All previous results are diminished in magnitude and have 5x larger SE's
- The PDS approach is also useful for other regression designs such as DiD

---

# Heterogeneous treatment effects

- ML can also help us with treatment effect heterogeneity
- See Athey and Imbens (2016)
- Use regression trees to partition units into groups with similar TE's
- Estimation is "honest" in a similar way as Double ML:
    - Split the sample in half
    - Use one subsample to do the partitioning
    - Use the other subsample to estimate the TE's

---

# Matrix completion

- Causal inference is fundamentally a missing data problem
- This is because we only ever observe `\(Y=D_0 Y_0 + D_1 Y_1\)`
- Athey, Bayati, Doudchenko, Imbens, and Khosravi (2018) propose .hi[matrix completion] methods for panel data
- This is a credible data imputation technique
    - Estimate the ATE by imputing `\(Y_0\)` for treated units
    - Take into account within-unit serial correlation

---

# Recent advances in difference-in-differences

- Roth, Sant'Anna, Bilinski, and Poe (2022) provide an overview of recent advances
- de Chaisemartin and D'Haultfoeuille (2022) also provide an overview
- Main advances:
    - Multiple periods and variation in treatment timing
    - Non-parallel trends
    - Alternative sampling assumptions (i.e. appropriate standard errors)
- Main idea is that treatment effect heterogeneity complicates things

---

# Key papers in new DiD literature

- Multiple periods and variation in treatment timing:
    - de Chaisemartin and D'Haultfoeuille (2020); Goodman-Bacon (2021); Callaway and Sant’Anna (2021); Sun and Abraham (2021)
- Relaxing parallel trends assumption / testing for pre-trends:
    - Roth (2022); others (see Roth, Sant'Anna, Bilinski et al. (2022))
- Appropriate standard errors in DiD estimation:
    - Roth and Sant'Anna (2021); others (see Roth, Sant'Anna, Bilinski et al. (2022))

---

# Further reading

- Bajari, Nekipelov, Ryan, and Yang (2015)
    - Examples of using ML in IO demand estimation
- Dube, Jacobs, Naidu, and Suri (2020)
    - Example of using Double ML to estimate employer monopsony power
- Angrist and Frandsen (2022)
    - Discussion of the role ML should play in empirical labor economics

---

# References

.minuscule[
Aakvik, A., J. J. Heckman, and E. J. Vytlacil (2005). "Estimating Treatment Effects for Discrete Outcomes When Responses to Treatment Vary: An Application to Norwegian Vocational Rehabilitation Programs". In: _Journal of Econometrics_ 125.1, pp. 15-51. DOI: [10.1016/j.jeconom.2004.04.002](https://doi.org/10.1016%2Fj.jeconom.2004.04.002). Ackerberg, D. A. (2003). "Advertising, Learning, and Consumer Choice in Experience Good Markets: An Empirical Examination". In: _International Economic Review_ 44.3, pp. 1007-1040. DOI: [10.1111/1468-2354.t01-2-00098](https://doi.org/10.1111%2F1468-2354.t01-2-00098). Adams, R. P. (2018). _Model Selection and Cross Validation_. Lecture Notes. Princeton University. URL: [https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/model-selection.pdf](https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/model-selection.pdf). Ahlfeldt, G. M., S. J. Redding, D. M. Sturm, et al. (2015). "The Economics of Density: Evidence From the Berlin Wall". In: _Econometrica_ 83.6, pp. 2127-2189.
DOI: [10.3982/ECTA10876](https://doi.org/10.3982%2FECTA10876). Altonji, J. G., T. E. Elder, and C. R. Taber (2005). "Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools". In: _Journal of Political Economy_ 113.1, pp. 151-184. DOI: [10.1086/426036](https://doi.org/10.1086%2F426036). Altonji, J. G. and C. R. Pierret (2001). "Employer Learning and Statistical Discrimination". In: _Quarterly Journal of Economics_ 116.1, pp. 313-350. DOI: [10.1162/003355301556329](https://doi.org/10.1162%2F003355301556329). Angrist, J. D. and B. Frandsen (2022). "Machine Labor". In: _Journal of Labor Economics_ 40.S1, pp. S97-S140. DOI: [10.1086/717933](https://doi.org/10.1086%2F717933). Angrist, J. D. and A. B. Krueger (1991). "Does Compulsory School Attendance Affect Schooling and Earnings?" In: _Quarterly Journal of Economics_ 106.4, pp. 979-1014. DOI: [10.2307/2937954](https://doi.org/10.2307%2F2937954). Angrist, J. D. and J. Pischke (2009). _Mostly Harmless Econometrics: An Empiricist's Companion_. Princeton University Press. ISBN: 0691120358. Arcidiacono, P. (2004). "Ability Sorting and the Returns to College Major". In: _Journal of Econometrics_ 121, pp. 343-375. DOI: [10.1016/j.jeconom.2003.10.010](https://doi.org/10.1016%2Fj.jeconom.2003.10.010). Arcidiacono, P., E. Aucejo, A. Maurel, et al. (2016). _College Attrition and the Dynamics of Information Revelation_. Working Paper. Duke University. URL: [https://tyleransom.github.io/research/CollegeDropout2016May31.pdf](https://tyleransom.github.io/research/CollegeDropout2016May31.pdf). Arcidiacono, P., E. Aucejo, A. Maurel, et al. (2025). "College Attrition and the Dynamics of Information Revelation". In: _Journal of Political Economy_ 133.1. DOI: [10.1086/732526](https://doi.org/10.1086%2F732526). Arcidiacono, P. and J. B. Jones (2003). "Finite Mixture Distributions, Sequential Likelihood and the EM Algorithm". In: _Econometrica_ 71.3, pp. 933-946. 
DOI: [10.1111/1468-0262.00431](https://doi.org/10.1111%2F1468-0262.00431). Arcidiacono, P., J. Kinsler, and T. Ransom (2022b). "Asian American Discrimination in Harvard Admissions". In: _European Economic Review_ 144, p. 104079. DOI: [10.1016/j.euroecorev.2022.104079](https://doi.org/10.1016%2Fj.euroecorev.2022.104079). Arcidiacono, P., J. Kinsler, and T. Ransom (2022a). "Legacy and Athlete Preferences at Harvard". In: _Journal of Labor Economics_ 40.1, pp. 133-156. DOI: [10.1086/713744](https://doi.org/10.1086%2F713744). Arcidiacono, P. and R. A. Miller (2011). "Conditional Choice Probability Estimation of Dynamic Discrete Choice Models With Unobserved Heterogeneity". In: _Econometrica_ 79.6, pp. 1823-1867. DOI: [10.3982/ECTA7743](https://doi.org/10.3982%2FECTA7743). Arroyo Marioli, F., F. Bullano, S. Kucinskas, et al. (2020). _Tracking R of COVID-19: A New Real-Time Estimation Using the Kalman Filter_. Working Paper. medRxiv. DOI: [10.1101/2020.04.19.20071886](https://doi.org/10.1101%2F2020.04.19.20071886). Ashworth, J., V. J. Hotz, A. Maurel, et al. (2021). "Changes across Cohorts in Wage Returns to Schooling and Early Work Experiences". In: _Journal of Labor Economics_ 39.4, pp. 931-964. DOI: [10.1086/711851](https://doi.org/10.1086%2F711851). Athey, S., M. Bayati, N. Doudchenko, et al. (2018). _Matrix Completion Methods for Causal Panel Data Models_. Working Paper 25132. National Bureau of Economic Research. DOI: [10.3386/w25132](https://doi.org/10.3386%2Fw25132). Athey, S. and G. Imbens (2016). "Recursive partitioning for heterogeneous causal effects". In: _Proceedings of the National Academy of Sciences_ 113.27, pp. 7353-7360. DOI: [10.1073/pnas.1510489113](https://doi.org/10.1073%2Fpnas.1510489113). Attanasio, O. P., C. Meghir, and A. Santiago (2011). "Education Choices in Mexico: Using a Structural Model and a Randomized Experiment to Evaluate PROGRESA". In: _Review of Economic Studies_ 79.1, pp. 37-66. 
DOI: [10.1093/restud/rdr015](https://doi.org/10.1093%2Frestud%2Frdr015). Aucejo, E. M. and J. James (2019). "Catching Up to Girls: Understanding the Gender Imbalance in Educational Attainment Within Race". In: _Journal of Applied Econometrics_ 34.4, pp. 502-525. DOI: [10.1002/jae.2699](https://doi.org/10.1002%2Fjae.2699). Bajari, P., D. Nekipelov, S. P. Ryan, et al. (2015). "Machine Learning Methods for Demand Estimation". In: _American Economic Review_ 105.5, pp. 481-485. DOI: [10.1257/aer.p20151021](https://doi.org/10.1257%2Faer.p20151021). Baragatti, M., A. Grimaud, and D. Pommeret (2013). "Likelihood-free Parallel Tempering". In: _Statistics and Computing_ 23.4, pp. 535-549. DOI: [10.1007/s11222-012-9328-6](https://doi.org/10.1007%2Fs11222-012-9328-6). Bayer, P., R. McMillan, A. Murphy, et al. (2016). "A Dynamic Model of Demand for Houses and Neighborhoods". In: _Econometrica_ 84.3, pp. 893-942. DOI: [10.3982/ECTA10170](https://doi.org/10.3982%2FECTA10170). Begg, C. B. and R. Gray (1984). "Calculation of Polychotomous Logistic Regression Parameters Using Individualized Regressions". In: _Biometrika_ 71.1, pp. 11-18. DOI: [10.1093/biomet/71.1.11](https://doi.org/10.1093%2Fbiomet%2F71.1.11). Beggs, S. D., N. S. Cardell, and J. Hausman (1981). "Assessing the Potential Demand for Electric Cars". In: _Journal of Econometrics_ 17.1, pp. 1-19. DOI: [10.1016/0304-4076(81)90056-7](https://doi.org/10.1016%2F0304-4076%2881%2990056-7). Belloni, A., V. Chernozhukov, and C. Hansen (2014). "Inference on Treatment Effects after Selection among High-Dimensional Controls". In: _Review of Economic Studies_ 81.2, pp. 608-650. DOI: [10.1093/restud/rdt044](https://doi.org/10.1093%2Frestud%2Frdt044). Berry, S., J. Levinsohn, and A. Pakes (1995). "Automobile Prices in Market Equilibrium". In: _Econometrica_ 63.4, pp. 841-890. URL: [http://www.jstor.org/stable/2171802](http://www.jstor.org/stable/2171802). Bjorklund, A. and R. Moffitt (1987).
"The Estimation of Wage Gains and Welfare Gains in Self-Selection Models". In: _Review of Economics and Statistics_ 69.1, pp. 42-49. DOI: [10.2307/1937899](https://doi.org/10.2307%2F1937899). Blass, A. A., S. Lach, and C. F. Manski (2010). "Using Elicited Choice Probabilities to Estimate Random Utility Models: Preferences for Electricity Reliability". In: _International Economic Review_ 51.2, pp. 421-440. DOI: [10.1111/j.1468-2354.2010.00586.x](https://doi.org/10.1111%2Fj.1468-2354.2010.00586.x). Blundell, R. (2010). "Comments on: ``Structural vs. Atheoretic Approaches to Econometrics'' by Michael Keane". In: _Journal of Econometrics_ 156.1, pp. 25-26. DOI: [10.1016/j.jeconom.2009.09.005](https://doi.org/10.1016%2Fj.jeconom.2009.09.005). Bonhomme, S., T. Lamadon, and E. Manresa (2019). _Discretizing Unobserved Heterogeneity_. Working Paper. University of Chicago. URL: [https://lamadon.com/paper/blm2_2019.pdf](https://lamadon.com/paper/blm2_2019.pdf). Bonhomme, S. and J. Robin (2009). "Consistent Noisy Independent Component Analysis". In: _Journal of Econometrics_ 149.1, pp. 12-25. DOI: [10.1016/j.jeconom.2008.12.019](https://doi.org/10.1016%2Fj.jeconom.2008.12.019). Bonhomme, S. and J. Robin (2010). "Generalized Non-Parametric Deconvolution with an Application to Earnings Dynamics". In: _Review of Economic Studies_ 77.2, pp. 491-533. DOI: [10.1111/j.1467-937X.2009.00577.x](https://doi.org/10.1111%2Fj.1467-937X.2009.00577.x). Bresnahan, T. F., S. Stern, and M. Trajtenberg (1997). "Market Segmentation and the Sources of Rents from Innovation: Personal Computers in the Late 1980s". In: _The RAND Journal of Economics_ 28.0, pp. S17-S44. DOI: [10.2307/3087454](https://doi.org/10.2307%2F3087454). Brien, M. J., L. A. Lillard, and S. Stern (2006). "Cohabitation, Marriage, and Divorce in a Model of Match Quality". In: _International Economic Review_ 47.2, pp. 451-494. DOI: [10.1111/j.1468-2354.2006.00385.x](https://doi.org/10.1111%2Fj.1468-2354.2006.00385.x). Brinch, C. 
N., M. Mogstad, and M. Wiswall (2017). "Beyond LATE with a Discrete Instrument". In: _Journal of Political Economy_ 125.4, pp. 985-1039. DOI: [10.1086/692712](https://doi.org/10.1086%2F692712). Callaway, B. and P. H. Sant’Anna (2021). "Difference-in-Differences with multiple time periods". In: _Journal of Econometrics_ 225.2. Themed Issue: Treatment Effect 1, pp. 200-230. DOI: [10.1016/j.jeconom.2020.12.001](https://doi.org/10.1016%2Fj.jeconom.2020.12.001). Card, D. (1995). "Using Geographic Variation in College Proximity to Estimate the Return to Schooling". In: _Aspects of Labor Market Behaviour: Essays in Honour of John Vanderkamp_. Ed. by L. N. Christofides, E. K. Grant and R. Swidinsky. Toronto: University of Toronto Press. Cardell, N. S. (1997). "Variance Components Structures for the Extreme-Value and Logistic Distributions with Application to Models of Heterogeneity". In: _Econometric Theory_ 13.2, pp. 185-213. URL: [https://www.jstor.org/stable/3532724](https://www.jstor.org/stable/3532724). Carneiro, P., K. T. Hansen, and J. J. Heckman (2003). "Estimating Distributions of Treatment Effects with an Application to the Returns to Schooling and Measurement of the Effects of Uncertainty on College Choice". In: _International Economic Review_ 44.2, pp. 361-422. DOI: [10.1111/1468-2354.t01-1-00074](https://doi.org/10.1111%2F1468-2354.t01-1-00074). Carneiro, P., J. J. Heckman, and E. Vytlacil (2010). "Evaluating Marginal Policy Changes and the Average Effect of Treatment for Individuals at the Margin". In: _Econometrica_ 78.1, pp. 377-394. DOI: [10.3982/ECTA7089](https://doi.org/10.3982%2FECTA7089). Carneiro, P., J. J. Heckman, and E. J. Vytlacil (2011). "Estimating Marginal Returns to Education". In: _American Economic Review_ 101.6, pp. 2754-2781. DOI: [10.1257/aer.101.6.2754](https://doi.org/10.1257%2Faer.101.6.2754). Caucutt, E. M., L. Lochner, J. Mullins, et al. (2020). 
_Child Skill Production: Accounting for Parental and Market-Based Time and Goods Investments_. Working Paper 27838. National Bureau of Economic Research. DOI: [10.3386/w27838](https://doi.org/10.3386%2Fw27838). Chaisemartin, C. de and X. D'Haultfoeuille (2020). "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects". In: _American Economic Review_ 110.9, pp. 2964-2996. DOI: [10.1257/aer.20181169](https://doi.org/10.1257%2Faer.20181169). Chen, X., H. Hong, and D. Nekipelov (2011). "Nonlinear Models of Measurement Errors". In: _Journal of Economic Literature_ 49.4, pp. 901-937. DOI: [10.1257/jel.49.4.901](https://doi.org/10.1257%2Fjel.49.4.901). Chernozhukov, V., D. Chetverikov, M. Demirer, et al. (2018). "Double/Debiased Machine Learning for Treatment and Structural Parameters". In: _Econometrics Journal_ 21.1, pp. C1-C68. DOI: [10.1111/ectj.12097](https://doi.org/10.1111%2Fectj.12097). Chintagunta, P. K. (1992). "Estimating a Multinomial Probit Model of Brand Choice Using the Method of Simulated Moments". In: _Marketing Science_ 11.4, pp. 386-407. DOI: [10.1287/mksc.11.4.386](https://doi.org/10.1287%2Fmksc.11.4.386). Cinelli, C. and C. Hazlett (2020). "Making Sense of Sensitivity: Extending Omitted Variable Bias". In: _Journal of the Royal Statistical Society: Series B (Statistical Methodology)_ 82.1, pp. 39-67. DOI: [10.1111/rssb.12348](https://doi.org/10.1111%2Frssb.12348). Coate, P. and K. Mangum (2019). _Fast Locations and Slowing Labor Mobility_. Working Paper 19-49. Federal Reserve Bank of Philadelphia. Cunha, F. and J. Heckman (2007). "The Technology of Skill Formation". In: _American Economic Review_ 97.2, pp. 31-47. DOI: [10.1257/aer.97.2.31](https://doi.org/10.1257%2Faer.97.2.31). Cunha, F., J. J. Heckman, and S. M. Schennach (2010). "Estimating the Technology of Cognitive and Noncognitive Skill Formation". In: _Econometrica_ 78.3, pp. 883-931. DOI: [10.3982/ECTA6551](https://doi.org/10.3982%2FECTA6551). Cunningham, S. (2021). 
_Causal Inference: The Mixtape_. Yale University Press. URL: [https://www.scunning.com/causalinference_norap.pdf](https://www.scunning.com/causalinference_norap.pdf). de Chaisemartin, C. and X. D'Haultfoeuille (2022). "Two-way Fixed Effects and Differences-in-differences with Heterogeneous Treatment Effects: A Survey". In: _The Econometrics Journal_. DOI: [10.1093/ectj/utac017](https://doi.org/10.1093%2Fectj%2Futac017). Delavande, A. and C. F. Manski (2015). "Using Elicited Choice Probabilities in Hypothetical Elections to Study Decisions to Vote". In: _Electoral Studies_ 38, pp. 28-37. DOI: [10.1016/j.electstud.2015.01.006](https://doi.org/10.1016%2Fj.electstud.2015.01.006). Delavande, A. and B. Zafar (2019). "University Choice: The Role of Expected Earnings, Nonpecuniary Outcomes, and Financial Constraints". In: _Journal of Political Economy_ 127.5, pp. 2343-2393. DOI: [10.1086/701808](https://doi.org/10.1086%2F701808). Diegert, P., M. A. Masten, and A. Poirier (2025). _Assessing Omitted Variable Bias when the Controls are Endogenous_. arXiv. DOI: [10.48550/ARXIV.2206.02303](https://doi.org/10.48550%2FARXIV.2206.02303). Donohue, J. J. I. and S. D. Levitt (2001). "The Impact of Legalized Abortion on Crime". In: _Quarterly Journal of Economics_ 116.2, pp. 379-420. DOI: [10.1162/00335530151144050](https://doi.org/10.1162%2F00335530151144050). Dube, A., J. Jacobs, S. Naidu, et al. (2020). "Monopsony in Online Labor Markets". In: _American Economic Review: Insights_ 2.1, pp. 33-46. DOI: [10.1257/aeri.20180150](https://doi.org/10.1257%2Faeri.20180150). Erdem, T. and M. P. Keane (1996). "Decision-Making under Uncertainty: Capturing Dynamic Brand Choice Processes in Turbulent Consumer Goods Markets". In: _Marketing Science_ 15.1, pp. 1-20. DOI: [10.1287/mksc.15.1.1](https://doi.org/10.1287%2Fmksc.15.1.1). Evans, R. W. (2018). _Simulated Method of Moments (SMM) Estimation_. QuantEcon Note. University of Chicago.
URL: [https://notes.quantecon.org/submission/5b3db2ceb9eab00015b89f93](https://notes.quantecon.org/submission/5b3db2ceb9eab00015b89f93). Farber, H. S. and R. Gibbons (1996). "Learning and Wage Dynamics". In: _Quarterly Journal of Economics_ 111.4, pp. 1007-1047. DOI: [10.2307/2946706](https://doi.org/10.2307%2F2946706). Fu, C., N. Grau, and J. Rivera (2020). _Wandering Astray: Teenagers' Choices of Schooling and Crime_. Working Paper. University of Wisconsin-Madison. URL: [https://www.ssc.wisc.edu/~cfu/wander.pdf](https://www.ssc.wisc.edu/~cfu/wander.pdf). Geary, R. C. (1942). "Inherent Relations between Random Variables". In: _Proceedings of the Royal Irish Academy. Section A: Mathematical and Physical Sciences_ 47, pp. 63-76. URL: [http://www.jstor.org/stable/20488436](http://www.jstor.org/stable/20488436). Gillingham, K., F. Iskhakov, A. Munk-Nielsen, et al. (2022). "Equilibrium Trade in Automobiles". In: _Journal of Political Economy_. DOI: [10.1086/720463](https://doi.org/10.1086%2F720463). Goodman-Bacon, A. (2021). "Difference-in-differences with variation in treatment timing". In: _Journal of Econometrics_ 225.2. Themed Issue: Treatment Effect 1, pp. 254-277. DOI: [10.1016/j.jeconom.2021.03.014](https://doi.org/10.1016%2Fj.jeconom.2021.03.014). Haile, P. (2019). _``Structural vs. Reduced Form'' Language and Models in Empirical Economics_. Lecture Slides. Yale University. URL: [http://www.econ.yale.edu/~pah29/intro.pdf](http://www.econ.yale.edu/~pah29/intro.pdf). Haile, P. (2024). _Models, Measurement, and the Language of Empirical Economics_. Lecture Slides. Yale University. URL: [https://www.dropbox.com/s/8kwtwn30dyac18s/intro.pdf](https://www.dropbox.com/s/8kwtwn30dyac18s/intro.pdf). Hastie, T., R. Tibshirani, and J. Friedman (2009). _The Elements of Statistical Learning: Data Mining, Inference, Prediction_. 2nd. New York: Springer. URL: [https://web.stanford.edu/~hastie/Papers/ESLII.pdf](https://web.stanford.edu/~hastie/Papers/ESLII.pdf). Heckman, J. J. 
(1979). "Sample Selection Bias as a Specification Error". In: _Econometrica_ 47.1, pp. 153-161. DOI: [10.2307/1912352](https://doi.org/10.2307%2F1912352). Heckman, J. J. and J. A. Smith (1993). "Assessing the Case for Randomized Evaluation of Social Programs". In: _Measuring Labor Market Measures: Evaluating the Effects of Active Labour Market Policies_. Ed. by K. Jensen and P. K. Madsen. Copenhagen: Danish Ministry of Labor, pp. 35-96. Heckman, J. J., J. Smith, and N. Clements (1997). "Making the Most Out of Programme Evaluations and Social Experiments: Accounting for Heterogeneity in Programme Impacts". In: _Review of Economic Studies_ 64.4, pp. 487-535. URL: [http://www.jstor.org/stable/2971729](http://www.jstor.org/stable/2971729). Heckman, J. J., J. Stixrud, and S. Urzua (2006). "The Effects of Cognitive and Noncognitive Abilities on Labor Market Outcomes and Social Behavior". In: _Journal of Labor Economics_ 24.3, pp. 411-482. DOI: [10.1086/504455](https://doi.org/10.1086%2F504455). Heckman, J. J. and E. Vytlacil (2005). "Structural Equations, Treatment Effects, and Econometric Policy Evaluation". In: _Econometrica_ 73.3, pp. 669-738. DOI: [10.1111/j.1468-0262.2005.00594.x](https://doi.org/10.1111%2Fj.1468-0262.2005.00594.x). Hotz, V. J. and R. A. Miller (1993). "Conditional Choice Probabilities and the Estimation of Dynamic Models". In: _The Review of Economic Studies_ 60.3, pp. 497-529. DOI: [10.2307/2298122](https://doi.org/10.2307%2F2298122). Huenermund, P. and E. Bareinboim (2019). _Causal Inference and Data-Fusion in Econometrics_. Working Paper. arXiv. URL: [https://arxiv.org/abs/1912.09104](https://arxiv.org/abs/1912.09104). Hurwicz, L. (1950). "Generalization of the Concept of Identification". In: _Statistical Inference in Dynamic Economic Models_. Hoboken, NJ: John Wiley and Sons, pp. 245-257. Imbens, G. W. and J. D. Angrist (1994). "Identification and Estimation of Local Average Treatment Effects". In: _Econometrica_ 62.2, pp. 467-475.
DOI: [10.2307/2951620](https://doi.org/10.2307%2F2951620). Ishimaru, S. (2022). _Geographic Mobility of Youth and Spatial Gaps in Local College and Labor Market Opportunities_. Working Paper. Hitotsubashi University. James, G., D. Witten, T. Hastie, et al. (2013). _An Introduction to Statistical Learning with Applications in R_. New York: Springer. DOI: [10.1007/978-1-4614-7138-7](https://doi.org/10.1007%2F978-1-4614-7138-7). URL: [https://faculty.marshall.usc.edu/gareth-james/ISL/ISLR_Seventh_Printing.pdf](https://faculty.marshall.usc.edu/gareth-james/ISL/ISLR_Seventh_Printing.pdf). James, J. (2011). _Ability Matching and Occupational Choice_. Working Paper 11-25. Federal Reserve Bank of Cleveland. James, J. (2017). "MM Algorithm for General Mixed Multinomial Logit Models". In: _Journal of Applied Econometrics_ 32.4, pp. 841-857. DOI: [10.1002/jae.2532](https://doi.org/10.1002%2Fjae.2532). Jin, H. and H. Shen (2020). "Foreign Asset Accumulation Among Emerging Market Economies: A Case for Coordination". In: _Review of Economic Dynamics_ 35.1, pp. 54-73. DOI: [10.1016/j.red.2019.04.006](https://doi.org/10.1016%2Fj.red.2019.04.006). Keane, M. P. (2010). "Structural vs. Atheoretic Approaches to Econometrics". In: _Journal of Econometrics_ 156.1, pp. 3-20. DOI: [10.1016/j.jeconom.2009.09.003](https://doi.org/10.1016%2Fj.jeconom.2009.09.003). Keane, M. P. and K. I. Wolpin (1997). "The Career Decisions of Young Men". In: _Journal of Political Economy_ 105.3, pp. 473-522. DOI: [10.1086/262080](https://doi.org/10.1086%2F262080). Koopmans, T. C. and O. Reiersol (1950). "The Identification of Structural Characteristics". In: _The Annals of Mathematical Statistics_ 21.2, pp. 165-181. URL: [http://www.jstor.org/stable/2236899](http://www.jstor.org/stable/2236899). Kosar, G., T. Ransom, and W. van der Klaauw (2022). "Understanding Migration Aversion Using Elicited Counterfactual Choice Probabilities". In: _Journal of Econometrics_ 231.1, pp. 123-147. 
DOI: [10.1016/j.jeconom.2020.07.056](https://doi.org/10.1016%2Fj.jeconom.2020.07.056).

Kotlarski, I. (1967). "On Characterizing the Gamma and the Normal Distribution". In: _Pacific Journal of Mathematics_ 20, pp. 69-76.

Krauth, B. (2016). "Bounding a Linear Causal Effect Using Relative Correlation Restrictions". In: _Journal of Econometric Methods_ 5.1, pp. 117-141. DOI: [10.1515/jem-2013-0013](https://doi.org/10.1515%2Fjem-2013-0013).

Lang, K. and M. D. Palacios (2018). _The Determinants of Teachers' Occupational Choice_. Working Paper 24883. National Bureau of Economic Research. DOI: [10.3386/w24883](https://doi.org/10.3386%2Fw24883).

Lee, D. S., J. McCrary, M. J. Moreira, et al. (2020). _Valid t-ratio Inference for IV_. Working Paper. arXiv. URL: [https://arxiv.org/abs/2010.05058](https://arxiv.org/abs/2010.05058).

Lewbel, A. (2019). "The Identification Zoo: Meanings of Identification in Econometrics". In: _Journal of Economic Literature_ 57.4, pp. 835-903. DOI: [10.1257/jel.20181361](https://doi.org/10.1257%2Fjel.20181361).

Mahoney, N. (2022). "Principles for Combining Descriptive and Model-Based Analysis in Applied Microeconomics Research". In: _Journal of Economic Perspectives_ 36.3, pp. 211-22. DOI: [10.1257/jep.36.3.211](https://doi.org/10.1257%2Fjep.36.3.211).

Mardia, K. V. (1970). "Measures of Multivariate Skewness and Kurtosis with Applications". In: _Biometrika_ 57.3, pp. 519-530. URL: [http://www.jstor.org/stable/2334770](http://www.jstor.org/stable/2334770).

McFadden, D. (1978). "Modelling the Choice of Residential Location". In: _Spatial Interaction Theory and Planning Models_. Ed. by A. Karlqvist, L. Lundqvist, F. Snickers and J. W. Weibull. Amsterdam: North Holland, pp. 75-96.

McFadden, D. (1989). "A Method of Simulated Moments for Estimation of Discrete Response Models Without Numerical Integration". In: _Econometrica_ 57.5, pp. 995-1026. DOI: [10.2307/1913621](https://doi.org/10.2307%2F1913621).
URL: [http://www.jstor.org/stable/1913621](http://www.jstor.org/stable/1913621).

Mellon, J. (2020). _Rain, Rain, Go Away: 137 Potential Exclusion-Restriction Violations for Studies Using Weather as an Instrumental Variable_. Working Paper. University of Manchester. URL: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3715610](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3715610).

Miller, R. A. (1984). "Job Matching and Occupational Choice". In: _Journal of Political Economy_ 92.6, pp. 1086-1120. DOI: [10.1086/261276](https://doi.org/10.1086%2F261276).

Mincer, J. (1974). _Schooling, Experience and Earnings_. New York: Columbia University Press for National Bureau of Economic Research.

Mullainathan, S. and J. Spiess (2017). "Machine Learning: An Applied Econometric Approach". In: _Journal of Economic Perspectives_ 31.2, pp. 87-106. DOI: [10.1257/jep.31.2.87](https://doi.org/10.1257%2Fjep.31.2.87).

Ost, B., W. Pan, and D. Webber (2018). "The Returns to College Persistence for Marginal Students: Regression Discontinuity Evidence from University Dismissal Policies". In: _Journal of Labor Economics_ 36.3, pp. 779-805. DOI: [10.1086/696204](https://doi.org/10.1086%2F696204).

Oster, E. (2019). "Unobservable Selection and Coefficient Stability: Theory and Evidence". In: _Journal of Business & Economic Statistics_ 37.2, pp. 187-204. DOI: [10.1080/07350015.2016.1227711](https://doi.org/10.1080%2F07350015.2016.1227711).

Pearl, J. (2012). "The Do-Calculus Revisited". In: _Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence_. Ed. by N. de Freitas and K. Murphy. Corvallis, OR: AUAI Press, pp. 4-11.

Pischke, S. (2007). _Lecture Notes on Measurement Error_. Lecture Notes. London School of Economics. URL: [http://econ.lse.ac.uk/staff/spischke/ec524/Merr_new.pdf](http://econ.lse.ac.uk/staff/spischke/ec524/Merr_new.pdf).

Ransom, M. R. and T. Ransom (2018). "Do High School Sports Build or Reveal Character?
Bounding Causal Estimates of Sports Participation". In: _Economics of Education Review_ 64, pp. 75-89. DOI: [10.1016/j.econedurev.2018.04.002](https://doi.org/10.1016%2Fj.econedurev.2018.04.002).

Ransom, T. (2021). "Selective Migration, Occupational Choice, and the Wage Returns to College Majors". In: _Annals of Economics & Statistics_ 142, pp. 45-110. DOI: [10.15609/annaeconstat2009.142.0045](https://doi.org/10.15609%2Fannaeconstat2009.142.0045).

Ransom, T. (2022). "Labor Market Frictions and Moving Costs of the Employed and Unemployed". In: _Journal of Human Resources_ 57.S, pp. S137-S166. DOI: [10.3368/jhr.monopsony.0219-10013R2](https://doi.org/10.3368%2Fjhr.monopsony.0219-10013R2).

Reiersol, O. (1950). "Identifiability of a Linear Relation between Variables Which Are Subject to Error". In: _Econometrica_ 18.4, pp. 375-389. URL: [http://www.jstor.org/stable/1907835](http://www.jstor.org/stable/1907835).

Robins, J. M. (1997). "Causal Inference from Complex Longitudinal Data". In: _Latent Variable Modeling and Applications to Causality_. Ed. by M. Berkane. New York: Springer, pp. 69-117.

Robinson, P. M. (1988). "Root-N-Consistent Semiparametric Regression". In: _Econometrica_ 56.4, pp. 931-954. URL: [http://www.jstor.org/stable/1912705](http://www.jstor.org/stable/1912705).

Roth, J. (2022). "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends". In: _American Economic Review: Insights_ 4.3, pp. 305-322. DOI: [10.1257/aeri.20210236](https://doi.org/10.1257%2Faeri.20210236).

Roth, J. and P. H. C. Sant'Anna (2021). _Efficient Estimation for Staggered Rollout Designs_. Working Paper. arXiv. DOI: [10.48550/ARXIV.2102.01291](https://doi.org/10.48550%2FARXIV.2102.01291). URL: [https://arxiv.org/abs/2102.01291](https://arxiv.org/abs/2102.01291).

Roth, J., P. Sant'Anna, A. Bilinski, et al. (2022). "What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature". In: _Journal of Econometrics_.

Rudik, I. (2020). "Optimal Climate Policy When Damages Are Unknown". In: _American Economic Journal: Economic Policy_ 12.2, pp. 340-373. DOI: [10.1257/pol.20160541](https://doi.org/10.1257%2Fpol.20160541).

Rueschendorf, L. (1981). "Sharpness of Frechet-bounds". In: _Zeitschrift fur Wahrscheinlichkeitstheorie und Verwandte Gebiete_ 57.2, pp. 293-302. DOI: [10.1007/BF00535495](https://doi.org/10.1007%2FBF00535495).

Rust, J. (1987). "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher". In: _Econometrica_ 55.5, pp. 999-1033. URL: [http://www.jstor.org/stable/1911259](http://www.jstor.org/stable/1911259).

Shalizi, C. R. (2019). _Advanced Data Analysis from an Elementary Point of View_. Cambridge University Press. URL: [http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf](http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf).

Smith Jr., A. A. (2008). "Indirect Inference". In: _The New Palgrave Dictionary of Economics_. Ed. by S. N. Durlauf and L. E. Blume. Vol. 1-8. London: Palgrave Macmillan. DOI: [10.1007/978-1-349-58802-2](https://doi.org/10.1007%2F978-1-349-58802-2). URL: [http://www.econ.yale.edu/smith/palgrave7.pdf](http://www.econ.yale.edu/smith/palgrave7.pdf).

Stinebrickner, R. and T. Stinebrickner (2014a). "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model". In: _Journal of Labor Economics_ 32.3, pp. 601-644. DOI: [10.1086/675308](https://doi.org/10.1086%2F675308).

Stinebrickner, R. and T. R. Stinebrickner (2014b). "A Major in Science? Initial Beliefs and Final Outcomes for College Major and Dropout". In: _Review of Economic Studies_ 81.1, pp. 426-472. DOI: [10.1093/restud/rdt025](https://doi.org/10.1093%2Frestud%2Frdt025).

Su, C. and K. L. Judd (2012). "Constrained Optimization Approaches to Estimation of Structural Models". In: _Econometrica_ 80.5, pp. 2213-2230. DOI: [10.3982/ECTA7925](https://doi.org/10.3982%2FECTA7925).

Sun, L. and S. Abraham (2021). "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects". In: _Journal of Econometrics_ 225.2. Themed Issue: Treatment Effect 1, pp. 175-199. DOI: [10.1016/j.jeconom.2020.09.006](https://doi.org/10.1016%2Fj.jeconom.2020.09.006).

Train, K. (2009). _Discrete Choice Methods with Simulation_. 2nd ed. Cambridge; New York: Cambridge University Press. ISBN: 9780521766555.

Vytlacil, E. (2002). "Independence, Monotonicity, and Latent Index Models: An Equivalence Result". In: _Econometrica_ 70.1, pp. 331-341. DOI: [10.1111/1468-0262.00277](https://doi.org/10.1111%2F1468-0262.00277).

Wiswall, M. and B. Zafar (2018). "Preference for the Workplace, Investment in Human Capital, and Gender". In: _Quarterly Journal of Economics_ 133.1, pp. 457-507. DOI: [10.1093/qje/qjx035](https://doi.org/10.1093%2Fqje%2Fqjx035).

Young, A. (2020). _Consistency without Inference: Instrumental Variables in Practical Application_. Working Paper. London School of Economics.
]