class: title-slide

# Lecture 20

## Machine Learning for Causal Modeling

### Tyler Ransom

### ECON 6343, University of Oklahoma

---

# Plan for the Day

Go over a number of econ papers that use machine learning methods

---
# Publishing fads

.center[![economics publishing fads](econfads.png)]

[Image source](https://www.economist.com/finance-and-economics/2016/11/24/economists-are-prone-to-fads-and-the-latest-is-machine-learning)

---
# `\(k\)`-means clustering and unobserved types

- Bonhomme, Lamadon, and Manresa (2019)

- Panel data model where unobserved heterogeneity is continuous in the population

- But approximated in the model with a discrete distribution (Group Fixed Effects, GFE)

- Propose a 2-step estimation algorithm:

    1. Classify units into groups using `\(k\)`-means clustering
    
    2. Estimate the model using the groups from step 1

- This is different from finite mixture models: no joint estimation required!
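
---
# Sketch: two-step GFE with `\(k\)`-means

A minimal sketch of the two-step idea, in Python with NumPy only. It assumes (illustrative choices, not BLM's) a linear panel model `y_it = alpha_i + beta * x_it + e_it` with continuous `alpha_i`, individual means of `y` and `x` as the classification moments, and a hand-rolled Lloyd's algorithm with random restarts. The names `kmeans` and `gfe_two_step` are hypothetical.

```python
import numpy as np


def kmeans(moments, k, n_starts=20, n_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; keeps the lowest-objective solution."""
    rng = np.random.default_rng(seed)
    best_obj, best_labels = np.inf, None
    for _ in range(n_starts):
        centers = moments[rng.choice(len(moments), size=k, replace=False)]
        for _ in range(n_iter):
            # assign each unit to its nearest center (squared Euclidean distance)
            dist = ((moments[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # recompute centers; keep the old center if a group empties out
            new_centers = np.array([
                moments[labels == g].mean(axis=0) if np.any(labels == g) else centers[g]
                for g in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # final assignment and k-means objective for this start
        dist = ((moments[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        obj = dist[np.arange(len(moments)), labels].sum()
        if obj < best_obj:
            best_obj, best_labels = obj, labels
    return best_labels


def gfe_two_step(y, x, k):
    """Step 1: classify units by k-means on individual moments.
    Step 2: pooled OLS of y on x plus a full set of group dummies."""
    n, t = y.shape
    moments = np.column_stack([y.mean(axis=1), x.mean(axis=1)])
    groups = kmeans(moments, k)
    dummies = np.eye(k)[np.repeat(groups, t)]  # one column per group, no intercept
    design = np.column_stack([x.ravel(), dummies])
    coef, *_ = np.linalg.lstsq(design, y.ravel(), rcond=None)
    return coef[0], groups


# Simulated panel: continuous heterogeneity that is correlated with x
rng = np.random.default_rng(42)
n, t, beta = 500, 10, 1.0
alpha = rng.normal(size=n)
x = alpha[:, None] + rng.normal(size=(n, t))
y = alpha[:, None] + beta * x + rng.normal(size=(n, t))

beta_pooled = np.polyfit(x.ravel(), y.ravel(), 1)[0]  # ignores alpha
beta_gfe, groups = gfe_two_step(y, x, k=10)
print(f"pooled OLS: {beta_pooled:.3f}, GFE: {beta_gfe:.3f}")
```

Pooled OLS should land near 1.5 because `x` is correlated with `alpha`; the GFE estimate should be close to the true value of 1, with a small remaining approximation bias from discretizing a continuous `alpha`. The random restarts in `kmeans` address the non-convexity of the `\(k\)`-means objective.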

---
# Assumptions of BLM (2019)

There are two main assumptions:

1. Unobserved heterogeneity depends on a low-dimensional vector of latent types

    - This is similar to the conditions of a factor model
    
    - But this method doesn't require a factor structure
    
2. Underlying types can be approximated from individual-specific moments

    - Moments can come from the data (e.g., a battery of test scores)
    
    - They can also come from the model (e.g., choice probabilities)
    
---
# Further considerations

- The `\(k\)`-means objective function is not convex, so it can have many local minima

- This means you will need many starting values to search for the global minimum

- Consider the log likelihood of a dynamic discrete choice model:

`\begin{align*}
\ell_i\left(\alpha_i,\theta; d_{it},X_{it},Y_{it}\right) = \sum_t \Big[ &\underbrace{\ln f\left(d_{it}\vert X_{it},\alpha_i,\theta\right)}_{\text{choices}} + \underbrace{\ln f\left(X_{it}\vert d_{i,t-1},X_{i,t-1},\alpha_i,\theta\right)}_{\text{state transitions}} + \\
&\underbrace{\ln f\left(Y_{it}\vert d_{it},X_{it},\alpha_i,\theta\right)}_{\text{outcomes}} \Big]
\end{align*}`

- The log likelihood is assumed to be additively separable across these components, conditional on the FE `\(\alpha_i\)`

---
# Extensions

- You can incorporate covariates into the `\(k\)`-means step

- This can often improve performance

- You can also incorporate model moments in the first step

- This is required if you don't have external measurements (like test scores)

- Keep in mind that the GFE estimator is inherently biased: a discrete set of groups only approximates continuous heterogeneity

- You may need to iterate on the 2-step estimator multiple times to correct for this

---
# Using ML to solve the sample selection problem

- Heckman (1979) outlines the canonical sample selection problem

- e.g., we only observe the earnings of individuals who are employed

- This might distort our estimates of wage returns to skill

- Can we improve on this by using machine learning?

- Especially if the choice dimension is much larger than work/not work?

---
# Ransom (2021)

- Considers geographic heterogeneity in wage returns to college major

- Individuals choose where they live based on wages and non-wage factors

- Problem: researcher only sees wages in chosen residence location

- Thus, wage returns are potentially contaminated by selection bias

---
# Resolving the selection problem

- Heckman model: the inverse Mills ratio `\(\lambda(\cdot)\)` corrects for selection

`\begin{align*}
\ln wage &= X\beta + \lambda\left(Z\gamma\right) + u
\end{align*}`

- One can generalize this approach to multinomial choice and non-normality:

`\begin{align*}
\ln wage &= X\beta + \sum_j d_j\widetilde{\lambda}\left(p_j(Z),p_k(Z)\right) + u
\end{align*}`

where

- `\(d_j\)` is a dummy for living in location `\(j\)`
- `\(\widetilde{\lambda}\)` is a flexible function
- `\(p_j\)` and `\(p_k\)` are the probabilities of choosing `\(j\)` or `\(k\)` (as functions of `\(Z\)`)
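
---
# Sketch: Heckman two-step

The classic Heckman (1979) two-step version of the first equation can be sketched as follows. Assumptions: simulated data with one excluded variable `z` that shifts selection but not wages, bivariate-normal errors with correlation `rho`, and NumPy only (probit fit by Fisher scoring, normal CDF built from `math.erf`). The second-step regression puts a free coefficient on the inverse Mills ratio; it estimates `rho` times the wage-error standard deviation, which the slide's notation leaves implicit. Function names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def probit(Z, d, n_iter=100, tol=1e-10):
    """Probit MLE by Fisher scoring (Newton steps with the expected information)."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        idx = Z @ gamma
        p = np.clip(norm_cdf(idx), 1e-12, 1 - 1e-12)
        phi = norm_pdf(idx)
        score = Z.T @ (phi * (d - p) / (p * (1 - p)))
        info = Z.T @ (Z * (phi**2 / (p * (1 - p)))[:, None])
        step = np.linalg.solve(info, score)
        gamma += step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


def heckman_two_step(y, X, Z, d):
    """Step 1: probit of d on Z. Step 2: OLS of y on [X, lambda] where d == 1."""
    gamma = probit(Z, d)
    sel = d == 1
    idx = Z[sel] @ gamma
    mills = norm_pdf(idx) / norm_cdf(idx)          # inverse Mills ratio
    design = np.column_stack([X[sel], mills])
    coef, *_ = np.linalg.lstsq(design, y[sel], rcond=None)
    return coef                                     # [beta..., coefficient on lambda]


# Simulated data: z shifts selection but not wages (exclusion restriction)
rng = np.random.default_rng(0)
n, rho = 20_000, 0.7
x, z = rng.normal(size=n), rng.normal(size=n)
v = rng.normal(size=n)
u = rho * v + math.sqrt(1 - rho**2) * rng.normal(size=n)
d = (x + z + v > 0).astype(float)    # selection, e.g., employment
log_wage = 1.0 + 0.5 * x + u          # only used where d == 1

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), x, z])
sel = d == 1
naive, *_ = np.linalg.lstsq(X[sel], log_wage[sel], rcond=None)
heck = heckman_two_step(log_wage, X, Z, d)
print("naive slope:", naive[1], "Heckman slope:", heck[1], "lambda coef:", heck[2])
```

OLS on the employed sample should understate the slope on `x` (selection is stronger for low-`x` workers), while the two-step estimate should be close to 0.5 and the coefficient on the Mills ratio close to `rho = 0.7`.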

---
# Using a tree model to estimate selection

- The `\(p\)`'s on the previous slide are selection probabilities

- `\(p_j\)` is the probability of choosing the chosen alternative

- `\(p_k\)` is the probability of choosing the next-preferred alternative

- Use a classification tree model to obtain the `\(p\)`'s

- Assume that individuals with the same values of `\(Z\)` and similar `\(p\)`'s have identical tastes

- This approach improves on a bin-estimation approach

- It can accommodate a higher-dimensional `\(Z\)` while limiting the curse of dimensionality

---
# Can LASSO improve causal inference?

- Shifting gears, let's talk about how model selection might improve causal inference

- Thought experiment:

    - Methods such as matching and regression rely on unconfoundedness
    
    - If we have high-dimensional data, we can "control for everything"!
    
    - This would give us a high `\(R^2\)` and remove any omitted variable bias
    
    - LASSO can potentially select only the most important variables
    
---
# Prediction problems

- The problem with the above thought experiment is that LASSO only predicts

- If we took a slightly different sample, it might select different variables

- This is because LASSO doesn't care about inference; it cares only about prediction

- Mullainathan and Spiess (2017) illustrate this in their Figure 2

- Two functions with very different coefficients can produce nearly identical predictions

- To use ML in econometrics, we need to be more principled about ML's role
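
---
# Toy example: same predictions, different coefficients

A toy construction (ours, not Mullainathan and Spiess's) that makes the point about coefficients concrete: with two nearly collinear regressors, the coefficient vectors (1, 1) and (2, 0) give almost identical predictions and in-sample MSE, so a method judged only on prediction has no reason to prefer either one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # x2 is nearly a copy of x1
y = x1 + x2 + rng.normal(size=n)      # true coefficients are (1, 1)
X = np.column_stack([x1, x2])

b_true = np.array([1.0, 1.0])
b_alt = np.array([2.0, 0.0])          # "selects" only x1

pred_true, pred_alt = X @ b_true, X @ b_alt
mse_true = np.mean((y - pred_true) ** 2)
mse_alt = np.mean((y - pred_alt) ** 2)
corr = np.corrcoef(pred_true, pred_alt)[0, 1]
print(f"MSE (1,1): {mse_true:.3f}  MSE (2,0): {mse_alt:.3f}  corr(preds): {corr:.4f}")
```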

---
# Regularization bias

- In econometrics, we like our estimators to be CAN (consistent and asymptotically normal)

- Suppose we want to estimate a treatment effect `\(\theta\)` in a high-dimensional model

`\begin{align*}
Y &= D\cdot\theta + g(X) + U, & \mathbb{E} \left[U | X, D \right] =0
\end{align*}`

- We might want to use LASSO, ridge, random forest, etc. since `\(X\)` is high-dimensional

- These methods handle the bias/variance tradeoff well for predicting `\(g(X)\)`, but they introduce bias into `\(\hat\theta\)`

- Why? Because they lower variance by accepting some .hi[regularization bias] in `\(\hat g\)`, and that bias carries over into `\(\hat\theta\)`

- See Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, and Robins (2018)

---
# Double ML estimation

- How do we solve the regularization bias problem? Add another equation

- Consider outcome and selection equations, respectively
`\begin{align*}
Y &= D\cdot\theta + g(X) + U, & \mathbb{E} \left[U | X, D \right] =0 \\
D &= m(X) + V, & \mathbb{E} \left[V | X\right] =0
\end{align*}`

- We include the second equation to .hi[orthogonalize] `\(D\)`

- We also need to .hi[split our sample] to be able to estimate this system

- Instead of using `\(D\)`, we use `\(\hat V = D - \hat{m}(X)\)`

- This idea is related to the concept of control functions

---
# Steps for Double ML

(0.) Divide the sample in half; one subsample labeled `\(I^C\)` and the other labeled `\(I\)`

1. Fit `\(\hat{m}(\cdot)\)` in `\(I^C\)`, then form `\(\hat V = D - \hat{m}(X)\)` in `\(I\)`

2. Fit `\(\hat{g}(\cdot)\)` in `\(I^C\)`, then form `\(\hat U = Y - \hat{g}(X)\)` in `\(I\)`

3. Estimate `\(\check \theta = \left(\hat{V}'D\right)^{-1}\hat{V}'\hat{U}\)` in `\(I\)` (cf. the biased `\(\hat \theta = \left(D'D\right)^{-1}D'\hat{U}\)`)

4. Repeat steps 1-3, but switch `\(I^C\)` and `\(I\)` (this is known as cross-fitting)

5. `\(\check \theta_{cf} = \frac{1}{2} \check \theta(I^C,I)+ \frac{1}{2} \check \theta(I,I^C)\)`

- These steps make `\(\check \theta\)` approximately unbiased and CAN (`\(\sqrt{n}\)`-consistent and asymptotically normal)

- Nice examples in [R](https://www.r-bloggers.com/2017/06/cross-fitting-double-machine-learning-estimator/) and [Python](http://aeturrell.com/2018/02/10/econometrics-in-python-partI-ML/)
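
---
# Sketch: cross-fitted Double ML

A minimal two-fold cross-fitting sketch of these steps. Assumptions: simulated data; ridge regression on `[X, X**2]` as the ML learner (any learner could be plugged in); and the "partialling-out" variant from Chernozhukov et al. (2018), which fits `\(\mathbb{E}[Y|X]\)` instead of `\(g(X)\)` and regresses `\(\hat U\)` on `\(\hat V\)`, so the final step is `\(\left(\hat V'\hat V\right)^{-1}\hat V'\hat U\)` rather than `\(\left(\hat V'D\right)^{-1}\hat V'\hat U\)`. Function names are hypothetical.

```python
import numpy as np


def ridge_fit_predict(F_train, y_train, F_test, lam=1.0):
    """Ridge regression (unpenalized intercept via centering); returns test predictions."""
    mu_F, mu_y = F_train.mean(axis=0), y_train.mean()
    A = F_train - mu_F
    beta = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ (y_train - mu_y))
    return mu_y + (F_test - mu_F) @ beta


def dml_partial_linear(Y, D, F, seed=0):
    """Two-fold cross-fitted DML estimate of theta in Y = D*theta + g(X) + U."""
    n = len(Y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = (idx[: n // 2], idx[n // 2:])
    thetas = []
    for k in range(2):
        I_idx, Ic_idx = folds[k], folds[1 - k]
        # nuisance functions are fit on I^C and evaluated on I
        V_hat = D[I_idx] - ridge_fit_predict(F[Ic_idx], D[Ic_idx], F[I_idx])
        U_hat = Y[I_idx] - ridge_fit_predict(F[Ic_idx], Y[Ic_idx], F[I_idx])
        thetas.append((V_hat @ U_hat) / (V_hat @ V_hat))
    return 0.5 * (thetas[0] + thetas[1])  # average the two cross-fitted estimates


# Simulated data where confounding enters nonlinearly
rng = np.random.default_rng(1)
n, p, theta = 4_000, 10, 0.5
X = rng.normal(size=(n, p))
D = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)
Y = theta * D + 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(size=n)
F = np.column_stack([X, X**2])  # features handed to the learner

theta_naive = np.polyfit(D, Y, 1)[0]  # ignores g(X)
theta_dml = dml_partial_linear(Y, D, F)
print(f"naive: {theta_naive:.3f}, DML: {theta_dml:.3f}")
```

Regressing `Y` on `D` alone should give a slope far above 0.5 (near 1.7 in this design) because `\(g(X)\)` and `\(m(X)\)` are correlated; the cross-fitted estimate should be close to 0.5. Swapping `ridge_fit_predict` for another learner would not change the cross-fitting logic.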

---
# Post Double Selection (PDS)

- Now let's consider a related idea to Double ML

- This is known as .hi[post double selection] (Belloni, Chernozhukov, and Hansen, 2014)

- It is a useful way to estimate treatment effects in linear models

- Same setup as Double ML, but here `\(g(\cdot)\)` and `\(m(\cdot)\)` are linear

`\begin{align*}
Y &= D\cdot\theta + g(X) + U, & \mathbb{E} \left[U | X, D \right] =0 \\
D &= m(X) + V, & \mathbb{E} \left[V | X\right] =0
\end{align*}`

---
# PDS steps

1. Use LASSO to separately select `\(X\)`

    - First on `\(Y = g(X) + \tilde U\)`
    
    - Then on `\(D = m(X) + V\)`
    
2. Regress `\(Y\)` on `\(D\)` and the union of the selected `\(X\)`'s from step 1

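- The procedure is called "post double selection" because the controls are selected twice (once in the outcome equation, once in the treatment equation), and the final regression uses the union of the two selected sets

- The key idea is that we use only the selection part of LASSO, not the shrinkage part: the final coefficients come from unpenalized OLS, which avoids regularization bias

- Selecting on both equations also guards against dropping a confounder that predicts `\(D\)` strongly but `\(Y\)` only weakly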
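
---
# Sketch: post double selection

A minimal sketch of the PDS steps, assuming a hand-rolled coordinate-descent LASSO on standardized regressors and a simple ad hoc penalty `std(y) * sqrt(2 log p / n)` (Belloni, Chernozhukov, and Hansen use a more careful plug-in penalty). In the simulated design, the confounder `X[:, 1]` barely enters the reduced form for `Y`, so selecting on the outcome equation alone tends to miss it, while the treatment-equation LASSO picks it up. Function names are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (X standardized, y centered)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                          # current residual y - X @ b
    col_ss = (X**2).sum(axis=0) / n
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            rho = X[:, j] @ r / n + col_ss[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]  # soft-threshold
            if new != b[j]:
                r -= X[:, j] * (new - b[j])
                max_change = max(max_change, abs(new - b[j]))
                b[j] = new
        if max_change < tol:
            break
    return b


def lasso_select(X, y):
    """Indices of nonzero LASSO coefficients, using an ad hoc penalty."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    lam = yc.std() * np.sqrt(2 * np.log(p) / n)
    return set(np.flatnonzero(lasso_cd(Xs, yc, lam)))


def ols_coef_on_d(Y, D, X, cols):
    """OLS of Y on [1, D, X[:, cols]]; returns the coefficient on D."""
    design = np.column_stack([np.ones(len(Y)), D, X[:, sorted(cols)]])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[1]


rng = np.random.default_rng(0)
n, p, theta = 500, 200, 1.0
X = rng.normal(size=(n, p))
D = X[:, 0] + X[:, 1] + rng.normal(size=n)
Y = theta * D + X[:, 0] - 0.9 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

sel_y = lasso_select(X, Y)          # step 1a: outcome (reduced-form) equation
sel_d = lasso_select(X, D)          # step 1b: treatment equation
theta_single = ols_coef_on_d(Y, D, X, sel_y)
theta_pds = ols_coef_on_d(Y, D, X, sel_y | sel_d)   # step 2: union of selections
print("selected:", sorted(sel_y), sorted(sel_d))
print(f"single selection: {theta_single:.3f}, PDS: {theta_pds:.3f}")
```

In this design the outcome-equation LASSO will typically miss `X[:, 1]`, so the single-selection estimate is biased downward (toward roughly 0.55), while adding the treatment-equation selections restores an estimate near 1.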

---
# Usefulness of PDS

- For an example, let's re-evaluate Donohue and Levitt (2001)

- Their claim: legalizing abortion reduces crime

- Intuition: unwanted children are more likely to become criminals

- Use a "two-way fixed effects" model on state-level panel data:

`\begin{align*}
y_{st} &= \alpha a_{st} + \beta w_{st} + \delta_s + \gamma_t + \varepsilon_{st}
\end{align*}`

where `\(s\)` is US state, `\(t\)` is time, and `\(a_{st}\)` is the abortion rate (15-25 years prior)

- `\(y_{st}\)` are various measures of crime (property, violent, murder, ...)

- `\(w_{st}\)` are state-level controls (prisoners per capita, police per capita, ...)

---
# Re-evaluating Donohue and Levitt (2001)

- A potential issue with Donohue and Levitt (2001): specification of `\(w_{st}\)`

- We might want to include highly flexible functional forms of the elements of `\(w_{st}\)`

- Indeed, when Belloni, Chernozhukov, and Hansen (2014) do this, the SEs get larger

- All previous results are diminished in magnitude and have 5x larger SEs

- The PDS approach is also useful for other regression designs such as DiD

---
# Heterogeneous treatment effects

- ML can also help us with treatment effect heterogeneity

- See Athey and Imbens (2016)

- Use regression trees to partition units into groups with similar TEs

- Estimation is "honest" in a similar way to Double ML:

    - Split the sample in half
    
    - Use one subsample to do the partitioning
    
    - Use the other subsample to estimate the TEs
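
---
# Sketch: an honest one-split tree

A stripped-down sketch of honest estimation, assuming a randomized binary treatment and a single split (a depth-one tree). The split is chosen by maximizing the sum over leaves of `n_leaf * tau_leaf**2`, a simple heterogeneity criterion loosely in the spirit of Athey and Imbens (2016) but without their variance adjustment. One half of the sample picks the split; the other half estimates the leaf-level treatment effects by difference in means. Function names are hypothetical.

```python
import numpy as np


def diff_in_means(y, w):
    """Treatment effect estimate under random assignment (w is boolean)."""
    return y[w].mean() - y[~w].mean()


def best_split(X, y, w, quantiles=np.linspace(0.1, 0.9, 33)):
    """Pick (feature, threshold) maximizing the sum of n_leaf * tau_leaf**2."""
    best = (-np.inf, None, None)
    for j in range(X.shape[1]):
        for c in np.quantile(X[:, j], quantiles):
            left = X[:, j] <= c
            crit = sum(
                leaf.sum() * diff_in_means(y[leaf], w[leaf]) ** 2
                for leaf in (left, ~left)
            )
            if crit > best[0]:
                best = (crit, j, c)
    return best[1], best[2]


rng = np.random.default_rng(0)
n = 8_000
X = rng.uniform(size=(n, 2))
w = rng.uniform(size=n) < 0.5                 # randomized treatment
tau = (X[:, 0] > 0.5).astype(float)           # effect is 1 only when X0 > 0.5
y = X[:, 1] + tau * w + 0.5 * rng.normal(size=n)

# Honest split: one half partitions, the other half estimates
perm = rng.permutation(n)
train, est = perm[: n // 2], perm[n // 2:]
j, c = best_split(X[train], y[train], w[train])
left = X[est, j] <= c
tau_left = diff_in_means(y[est][left], w[est][left])
tau_right = diff_in_means(y[est][~left], w[est][~left])
print(f"split on X{j} at {c:.3f}: TE left = {tau_left:.3f}, TE right = {tau_right:.3f}")
```

Because the split was chosen on `train`, the leaf estimates on `est` are ordinary difference-in-means estimates and do not inherit the optimism of the split search.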

---
# Matrix completion

- Causal inference is fundamentally a missing data problem

- This is because we only ever observe `\(Y=(1-D) Y_0 + D Y_1\)`

- Athey, Bayati, Doudchenko, Imbens, and Khosravi (2018) propose .hi[matrix completion] methods for panel data

- They impute the missing potential outcomes by assuming the full matrix of `\(Y_0\)` is approximately low-rank

- Estimate the average treatment effect on the treated by imputing `\(Y_0\)` for treated unit-periods

- Take into account within-unit serial correlation
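
---
# Sketch: matrix completion by soft-impute

A minimal sketch of matrix-completion imputation, assuming soft-impute (iterative singular-value soft-thresholding) as a simplified stand-in for the nuclear-norm estimator of Athey et al. (2018), which additionally includes unit and time fixed effects. Treated cells are handled as missing `\(Y_0\)` entries, imputed from a low-rank fit, and the ATT is the average of `\(Y_1 - \hat Y_0\)` over treated cells. The penalty `lam` and the simulated rank-2 design are assumptions.

```python
import numpy as np


def soft_impute(Y, observed, lam, n_iter=1000, tol=1e-7):
    """Low-rank completion: fill missing cells with the current fit, then
    soft-threshold the singular values by lam; repeat until convergence."""
    L = np.where(observed, Y, 0.0)
    for _ in range(n_iter):
        filled = np.where(observed, Y, L)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        L_new = (U * np.maximum(s - lam, 0.0)) @ Vt
        if np.linalg.norm(L_new - L) <= tol * max(np.linalg.norm(L), 1.0):
            return L_new
        L = L_new
    return L


rng = np.random.default_rng(0)
N, T, rank, att = 50, 30, 2, 1.0
Y0 = rng.normal(size=(N, rank)) @ rng.normal(size=(rank, T)) + 0.1 * rng.normal(size=(N, T))
treated = np.zeros((N, T), dtype=bool)
treated[:10, 20:] = True                    # first 10 units treated from column 20 on
Y = np.where(treated, Y0 + att, Y0)         # observed outcomes

Y0_hat = soft_impute(Y, observed=~treated, lam=1.0)
att_hat = np.mean(Y[treated] - Y0_hat[treated])
print(f"estimated ATT: {att_hat:.3f}")
```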

---
# Recent advances in difference-in-differences

- Roth, Sant'Anna, Bilinski, and Poe (2022) provide an overview of recent advances

- de Chaisemartin and D'Haultfoeuille (2022) also provide an overview

- Main advances:

    - Multiple periods and variation in treatment timing

    - Non-parallel trends

    - Alternative sampling assumptions (i.e., appropriate standard errors)

- Main theme: when treatment effects are heterogeneous (across units or over time), standard two-way fixed effects estimates can be misleading
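
---
# Sketch: TWFE with staggered timing

A small simulation of why heterogeneity matters, assuming staggered adoption with no never-treated group and effects that grow with time since treatment. The TWFE coefficient is computed by two-way demeaning the treatment dummy, which is equivalent to OLS with unit and time dummies in a balanced panel. With these particular choices the TWFE estimate comes out around -1/6, even though every treated unit-period has a positive effect (the true ATT is 46/12), the kind of problem documented by Goodman-Bacon (2021) and de Chaisemartin and D'Haultfoeuille (2020).

```python
import numpy as np


def twfe_coef(Y, D):
    """TWFE coefficient on D for a balanced N x T panel (FWL with two-way demeaning)."""
    D_dm = D - D.mean(axis=1, keepdims=True) - D.mean(axis=0, keepdims=True) + D.mean()
    return (D_dm * Y).sum() / (D_dm**2).sum()


rng = np.random.default_rng(0)
n_per_group, T = 50, 10
periods = np.arange(1, T + 1)
adopt = np.repeat([3, 7], n_per_group)                  # early cohort t=3, late cohort t=7
D = (periods[None, :] >= adopt[:, None]).astype(float)
effect = D * (periods[None, :] - adopt[:, None] + 1)   # effect = 1, 2, 3, ... after adoption
unit_fe = rng.normal(size=(2 * n_per_group, 1))
time_fe = rng.normal(size=(1, T))
Y = unit_fe + time_fe + effect + 0.1 * rng.normal(size=D.shape)

att_true = effect[D == 1].mean()
beta_twfe = twfe_coef(Y, D)
print(f"true ATT: {att_true:.3f}, TWFE estimate: {beta_twfe:.3f}")
```

The negative estimate comes from "forbidden" comparisons that use the early cohort, whose effects are still growing, as a control group for the late cohort.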

---
# Key papers in new DiD literature

- Multiple periods and variation in treatment timing:

    - de Chaisemartin and D'Haultfoeuille (2020); Goodman-Bacon (2021); Callaway and Sant’Anna (2021); Sun and Abraham (2021)

- Relaxing the parallel trends assumption / testing for pre-trends:

    - Roth (2022); others (see Roth, Sant'Anna, Bilinski et al. (2022))

- Appropriate standard errors in DiD estimation:

    - Roth and Sant'Anna (2021); others (see Roth, Sant'Anna, Bilinski et al. (2022))

---
# Further reading

- Bajari, Nekipelov, Ryan, and Yang (2015)

    - Examples of using ML in IO demand estimation

- Dube, Jacobs, Naidu, and Suri (2020)

    - Example of using Double ML to estimate employer monopsony power

- Angrist and Frandsen (2022)

    - Discussion of the role ML should play in empirical labor economics

---
# References
.minuscule[
Aakvik, A., J. J. Heckman, and E. J. Vytlacil (2005). "Estimating Treatment Effects for
Discrete Outcomes When Responses to Treatment Vary: An Application to Norwegian Vocational
Rehabilitation Programs". In: _Journal of Econometrics_ 125.1, pp. 15-51. DOI:
[10.1016/j.jeconom.2004.04.002](https://doi.org/10.1016%2Fj.jeconom.2004.04.002).

Ackerberg, D. A. (2003). "Advertising, Learning, and Consumer Choice in Experience Good
Markets: An Empirical Examination". In: _International Economic Review_ 44.3, pp.
1007-1040. DOI:
[10.1111/1468-2354.t01-2-00098](https://doi.org/10.1111%2F1468-2354.t01-2-00098).

Adams, R. P. (2018). _Model Selection and Cross Validation_. Lecture Notes. Princeton
University. URL:
[https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/model-selection.pdf](https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/model-selection.pdf).

Ahlfeldt, G. M., S. J. Redding, D. M. Sturm, et al. (2015). "The Economics of Density:
Evidence From the Berlin Wall". In: _Econometrica_ 83.6, pp. 2127-2189. DOI:
[10.3982/ECTA10876](https://doi.org/10.3982%2FECTA10876).

Altonji, J. G., T. E. Elder, and C. R. Taber (2005). "Selection on Observed and Unobserved
Variables: Assessing the Effectiveness of Catholic Schools". In: _Journal of Political
Economy_ 113.1, pp. 151-184. DOI: [10.1086/426036](https://doi.org/10.1086%2F426036).

Altonji, J. G. and C. R. Pierret (2001). "Employer Learning and Statistical
Discrimination". In: _Quarterly Journal of Economics_ 116.1, pp. 313-350. DOI:
[10.1162/003355301556329](https://doi.org/10.1162%2F003355301556329).

Angrist, J. D. and B. Frandsen (2022). "Machine Labor". In: _Journal of Labor Economics_
40.S1, pp. S97-S140. DOI: [10.1086/717933](https://doi.org/10.1086%2F717933).

Angrist, J. D. and A. B. Krueger (1991). "Does Compulsory School Attendance Affect
Schooling and Earnings?" In: _Quarterly Journal of Economics_ 106.4, pp. 979-1014. DOI:
[10.2307/2937954](https://doi.org/10.2307%2F2937954).

Angrist, J. D. and J. Pischke (2009). _Mostly Harmless Econometrics: An Empiricist's
Companion_. Princeton University Press. ISBN: 0691120358.

Arcidiacono, P. (2004). "Ability Sorting and the Returns to College Major". In: _Journal
of Econometrics_ 121, pp. 343-375. DOI:
[10.1016/j.jeconom.2003.10.010](https://doi.org/10.1016%2Fj.jeconom.2003.10.010).

Arcidiacono, P., E. Aucejo, A. Maurel, et al. (2016). _College Attrition and the Dynamics
of Information Revelation_. Working Paper. Duke University. URL:
[https://tyleransom.github.io/research/CollegeDropout2016May31.pdf](https://tyleransom.github.io/research/CollegeDropout2016May31.pdf).

Arcidiacono, P., E. Aucejo, A. Maurel, et al. (2025). "College Attrition and the Dynamics
of Information Revelation". In: _Journal of Political Economy_ 133.1. DOI:
[10.1086/732526](https://doi.org/10.1086%2F732526).

Arcidiacono, P. and J. B. Jones (2003). "Finite Mixture Distributions, Sequential
Likelihood and the EM Algorithm". In: _Econometrica_ 71.3, pp. 933-946. DOI:
[10.1111/1468-0262.00431](https://doi.org/10.1111%2F1468-0262.00431).

Arcidiacono, P., J. Kinsler, and T. Ransom (2022b). "Asian American Discrimination in
Harvard Admissions". In: _European Economic Review_ 144, p. 104079. DOI:
[10.1016/j.euroecorev.2022.104079](https://doi.org/10.1016%2Fj.euroecorev.2022.104079).

Arcidiacono, P., J. Kinsler, and T. Ransom (2022a). "Legacy and Athlete Preferences at
Harvard". In: _Journal of Labor Economics_ 40.1, pp. 133-156. DOI:
[10.1086/713744](https://doi.org/10.1086%2F713744).

Arcidiacono, P. and R. A. Miller (2011). "Conditional Choice Probability Estimation of
Dynamic Discrete Choice Models With Unobserved Heterogeneity". In: _Econometrica_ 79.6,
pp. 1823-1867. DOI: [10.3982/ECTA7743](https://doi.org/10.3982%2FECTA7743).

Arroyo Marioli, F., F. Bullano, S. Kucinskas, et al. (2020). _Tracking R of COVID-19: A
New Real-Time Estimation Using the Kalman Filter_. Working Paper. medRxiv. DOI:
[10.1101/2020.04.19.20071886](https://doi.org/10.1101%2F2020.04.19.20071886).

Ashworth, J., V. J. Hotz, A. Maurel, et al. (2021). "Changes across Cohorts in Wage
Returns to Schooling and Early Work Experiences". In: _Journal of Labor Economics_ 39.4,
pp. 931-964. DOI: [10.1086/711851](https://doi.org/10.1086%2F711851).

Athey, S., M. Bayati, N. Doudchenko, et al. (2018). _Matrix Completion Methods for Causal
Panel Data Models_. Working Paper 25132. National Bureau of Economic Research. DOI:
[10.3386/w25132](https://doi.org/10.3386%2Fw25132).

Athey, S. and G. Imbens (2016). "Recursive partitioning for heterogeneous causal effects".
In: _Proceedings of the National Academy of Sciences_ 113.27, pp. 7353-7360. DOI:
[10.1073/pnas.1510489113](https://doi.org/10.1073%2Fpnas.1510489113).

Attanasio, O. P., C. Meghir, and A. Santiago (2011). "Education Choices in Mexico: Using a
Structural Model and a Randomized Experiment to Evaluate PROGRESA". In: _Review of
Economic Studies_ 79.1, pp. 37-66. DOI:
[10.1093/restud/rdr015](https://doi.org/10.1093%2Frestud%2Frdr015).

Aucejo, E. M. and J. James (2019). "Catching Up to Girls: Understanding the Gender
Imbalance in Educational Attainment Within Race". In: _Journal of Applied Econometrics_
34.4, pp. 502-525. DOI: [10.1002/jae.2699](https://doi.org/10.1002%2Fjae.2699).

Bajari, P., D. Nekipelov, S. P. Ryan, et al. (2015). "Machine Learning Methods for Demand
Estimation". In: _American Economic Review_ 105.5, pp. 481-485. DOI:
[10.1257/aer.p20151021](https://doi.org/10.1257%2Faer.p20151021).

Baragatti, M., A. Grimaud, and D. Pommeret (2013). "Likelihood-free Parallel Tempering".
In: _Statistics and Computing_ 23.4, pp. 535-549. DOI:
[10.1007/s11222-012-9328-6](https://doi.org/10.1007%2Fs11222-012-9328-6).

Bayer, P., R. McMillan, A. Murphy, et al. (2016). "A Dynamic Model of Demand for Houses
and Neighborhoods". In: _Econometrica_ 84.3, pp. 893-942. DOI:
[10.3982/ECTA10170](https://doi.org/10.3982%2FECTA10170).

Begg, C. B. and R. Gray (1984). "Calculation of Polychotomous Logistic Regression
Parameters Using Individualized Regressions". In: _Biometrika_ 71.1, pp. 11-18. DOI:
[10.1093/biomet/71.1.11](https://doi.org/10.1093%2Fbiomet%2F71.1.11).

Beggs, S. D., N. S. Cardell, and J. Hausman (1981). "Assessing the Potential Demand for
Electric Cars". In: _Journal of Econometrics_ 17.1, pp. 1-19. DOI:
[10.1016/0304-4076(81)90056-7](https://doi.org/10.1016%2F0304-4076%2881%2990056-7).

Belloni, A., V. Chernozhukov, and C. Hansen (2014). "Inference on Treatment Effects after
Selection among High-Dimensional Controls". In: _Review of Economic Studies_ 81.2, pp.
608-650. DOI: [10.1093/restud/rdt044](https://doi.org/10.1093%2Frestud%2Frdt044).

Berry, S., J. Levinsohn, and A. Pakes (1995). "Automobile Prices in Market Equilibrium".
In: _Econometrica_ 63.4, pp. 841-890. URL:
[http://www.jstor.org/stable/2171802](http://www.jstor.org/stable/2171802).

Bjorklund, A. and R. Moffitt (1987). "The Estimation of Wage Gains and Welfare Gains in
Self-Selection Models". In: _Review of Economics and Statistics_ 69.1, pp. 42-49. DOI:
[10.2307/1937899](https://doi.org/10.2307%2F1937899).

Blass, A. A., S. Lach, and C. F. Manski (2010). "Using Elicited Choice Probabilities to
Estimate Random Utility Models: Preferences for Electricity Reliability". In:
_International Economic Review_ 51.2, pp. 421-440. DOI:
[10.1111/j.1468-2354.2010.00586.x](https://doi.org/10.1111%2Fj.1468-2354.2010.00586.x).

Blundell, R. (2010). "Comments on: 'Structural vs. Atheoretic Approaches to
Econometrics' by Michael Keane". In: _Journal of Econometrics_ 156.1, pp. 25-26. DOI:
[10.1016/j.jeconom.2009.09.005](https://doi.org/10.1016%2Fj.jeconom.2009.09.005).

Bonhomme, S., T. Lamadon, and E. Manresa (2019). _Discretizing Unobserved Heterogeneity_.
Working Paper. University of Chicago. URL:
[https://lamadon.com/paper/blm2_2019.pdf](https://lamadon.com/paper/blm2_2019.pdf).

Bonhomme, S. and J. Robin (2009). "Consistent Noisy Independent Component Analysis". In:
_Journal of Econometrics_ 149.1, pp. 12-25. DOI:
[10.1016/j.jeconom.2008.12.019](https://doi.org/10.1016%2Fj.jeconom.2008.12.019).

Bonhomme, S. and J. Robin (2010). "Generalized Non-Parametric Deconvolution with an
Application to Earnings Dynamics". In: _Review of Economic Studies_ 77.2, pp. 491-533.
DOI:
[10.1111/j.1467-937X.2009.00577.x](https://doi.org/10.1111%2Fj.1467-937X.2009.00577.x).

Bresnahan, T. F., S. Stern, and M. Trajtenberg (1997). "Market Segmentation and the
Sources of Rents from Innovation: Personal Computers in the Late 1980s". In: _The RAND
Journal of Economics_ 28.0, pp. S17-S44. DOI:
[10.2307/3087454](https://doi.org/10.2307%2F3087454).

Brien, M. J., L. A. Lillard, and S. Stern (2006). "Cohabitation, Marriage, and Divorce in
a Model of Match Quality". In: _International Economic Review_ 47.2, pp. 451-494. DOI:
[10.1111/j.1468-2354.2006.00385.x](https://doi.org/10.1111%2Fj.1468-2354.2006.00385.x).

Brinch, C. N., M. Mogstad, and M. Wiswall (2017). "Beyond LATE with a Discrete
Instrument". In: _Journal of Political Economy_ 125.4, pp. 985-1039. DOI:
[10.1086/692712](https://doi.org/10.1086%2F692712).

Callaway, B. and P. H. Sant’Anna (2021). "Difference-in-Differences with multiple time
periods". In: _Journal of Econometrics_ 225.2. Themed Issue: Treatment Effect 1, pp.
200-230. DOI:
[10.1016/j.jeconom.2020.12.001](https://doi.org/10.1016%2Fj.jeconom.2020.12.001).

Card, D. (1995). "Using Geographic Variation in College Proximity to Estimate the Return
to Schooling". In: _Aspects of Labor Market Behaviour: Essays in Honour of John
Vanderkamp_. Ed. by L. N. Christofides, E. K. Grant and R. Swidinsky. Toronto: University
of Toronto Press.

Cardell, N. S. (1997). "Variance Components Structures for the Extreme-Value and Logistic
Distributions with Application to Models of Heterogeneity". In: _Econometric Theory_ 13.2,
pp. 185-213. URL:
[https://www.jstor.org/stable/3532724](https://www.jstor.org/stable/3532724).

Carneiro, P., K. T. Hansen, and J. J. Heckman (2003). "Estimating Distributions of
Treatment Effects with an Application to the Returns to Schooling and Measurement of the
Effects of Uncertainty on College Choice". In: _International Economic Review_ 44.2, pp.
361-422. DOI:
[10.1111/1468-2354.t01-1-00074](https://doi.org/10.1111%2F1468-2354.t01-1-00074).

Carneiro, P., J. J. Heckman, and E. Vytlacil (2010). "Evaluating Marginal Policy Changes
and the Average Effect of Treatment for Individuals at the Margin". In: _Econometrica_
78.1, pp. 377-394. DOI: [10.3982/ECTA7089](https://doi.org/10.3982%2FECTA7089).

Carneiro, P., J. J. Heckman, and E. J. Vytlacil (2011). "Estimating Marginal Returns to
Education". In: _American Economic Review_ 101.6, pp. 2754-2781. DOI:
[10.1257/aer.101.6.2754](https://doi.org/10.1257%2Faer.101.6.2754).

Caucutt, E. M., L. Lochner, J. Mullins, et al. (2020). _Child Skill Production: Accounting
for Parental and Market-Based Time and Goods Investments_. Working Paper 27838. National
Bureau of Economic Research. DOI: [10.3386/w27838](https://doi.org/10.3386%2Fw27838).

Chaisemartin, C. de and X. D'Haultfoeuille (2020). "Two-Way Fixed Effects Estimators with
Heterogeneous Treatment Effects". In: _American Economic Review_ 110.9, pp. 2964-2996.
DOI: [10.1257/aer.20181169](https://doi.org/10.1257%2Faer.20181169).

Chen, X., H. Hong, and D. Nekipelov (2011). "Nonlinear Models of Measurement Errors". In:
_Journal of Economic Literature_ 49.4, pp. 901-937. DOI:
[10.1257/jel.49.4.901](https://doi.org/10.1257%2Fjel.49.4.901).

Chernozhukov, V., D. Chetverikov, M. Demirer, et al. (2018). "Double/Debiased Machine
Learning for Treatment and Structural Parameters". In: _Econometrics Journal_ 21.1, pp.
C1-C68. DOI: [10.1111/ectj.12097](https://doi.org/10.1111%2Fectj.12097).

Chintagunta, P. K. (1992). "Estimating a Multinomial Probit Model of Brand Choice Using
the Method of Simulated Moments". In: _Marketing Science_ 11.4, pp. 386-407. DOI:
[10.1287/mksc.11.4.386](https://doi.org/10.1287%2Fmksc.11.4.386).

Cinelli, C. and C. Hazlett (2020). "Making Sense of Sensitivity: Extending Omitted
Variable Bias". In: _Journal of the Royal Statistical Society: Series B (Statistical
Methodology)_ 82.1, pp. 39-67. DOI:
[10.1111/rssb.12348](https://doi.org/10.1111%2Frssb.12348).

Coate, P. and K. Mangum (2019). _Fast Locations and Slowing Labor Mobility_. Working Paper
19-49. Federal Reserve Bank of Philadelphia.

Cunha, F. and J. Heckman (2007). "The Technology of Skill Formation". In: _American
Economic Review_ 97.2, pp. 31-47. DOI:
[10.1257/aer.97.2.31](https://doi.org/10.1257%2Faer.97.2.31).

Cunha, F., J. J. Heckman, and S. M. Schennach (2010). "Estimating the Technology of
Cognitive and Noncognitive Skill Formation". In: _Econometrica_ 78.3, pp. 883-931. DOI:
[10.3982/ECTA6551](https://doi.org/10.3982%2FECTA6551).

Cunningham, S. (2021). _Causal Inference: The Mixtape_. Yale University Press. URL:
[https://www.scunning.com/causalinference_norap.pdf](https://www.scunning.com/causalinference_norap.pdf).

de Chaisemartin, C. and X. D'Haultfoeuille (2022). "Two-way Fixed Effects and
Differences-in-differences with Heterogeneous Treatment Effects: A Survey". In: _The
Econometrics Journal_. DOI:
[10.1093/ectj/utac017](https://doi.org/10.1093%2Fectj%2Futac017).

Delavande, A. and C. F. Manski (2015). "Using Elicited Choice Probabilities in
Hypothetical Elections to Study Decisions to Vote". In: _Electoral Studies_ 38, pp. 28-37.
DOI: [10.1016/j.electstud.2015.01.006](https://doi.org/10.1016%2Fj.electstud.2015.01.006).

Delavande, A. and B. Zafar (2019). "University Choice: The Role of Expected Earnings,
Nonpecuniary Outcomes, and Financial Constraints". In: _Journal of Political Economy_
127.5, pp. 2343-2393. DOI: [10.1086/701808](https://doi.org/10.1086%2F701808).

Diegert, P., M. A. Masten, and A. Poirier (2025). _Assessing Omitted Variable Bias when
the Controls are Endogenous_. arXiv. DOI:
[10.48550/ARXIV.2206.02303](https://doi.org/10.48550%2FARXIV.2206.02303).

Donohue III, J. J. and S. D. Levitt (2001). "The Impact of Legalized Abortion on Crime".
In: _Quarterly Journal of Economics_ 116.2, pp. 379-420. DOI:
[10.1162/00335530151144050](https://doi.org/10.1162%2F00335530151144050).

Dube, A., J. Jacobs, S. Naidu, et al. (2020). "Monopsony in Online Labor Markets". In:
_American Economic Review: Insights_ 2.1, pp. 33-46. DOI:
[10.1257/aeri.20180150](https://doi.org/10.1257%2Faeri.20180150).

Erdem, T. and M. P. Keane (1996). "Decision-Making under Uncertainty: Capturing Dynamic
Brand Choice Processes in Turbulent Consumer Goods Markets". In: _Marketing Science_ 15.1,
pp. 1-20. DOI: [10.1287/mksc.15.1.1](https://doi.org/10.1287%2Fmksc.15.1.1).

Evans, R. W. (2018). _Simulated Method of Moments (SMM) Estimation_. QuantEcon Note.
University of Chicago. URL:
[https://notes.quantecon.org/submission/5b3db2ceb9eab00015b89f93](https://notes.quantecon.org/submission/5b3db2ceb9eab00015b89f93).

Farber, H. S. and R. Gibbons (1996). "Learning and Wage Dynamics". In: _Quarterly Journal
of Economics_ 111.4, pp. 1007-1047. DOI:
[10.2307/2946706](https://doi.org/10.2307%2F2946706).

Fu, C., N. Grau, and J. Rivera (2020). _Wandering Astray: Teenagers' Choices of Schooling
and Crime_. Working Paper. University of Wisconsin-Madison. URL:
[https://www.ssc.wisc.edu/~cfu/wander.pdf](https://www.ssc.wisc.edu/~cfu/wander.pdf).

Geary, R. C. (1942). "Inherent Relations between Random Variables". In: _Proceedings of
the Royal Irish Academy. Section A: Mathematical and Physical Sciences_ 47, pp. 63-76.
URL: [http://www.jstor.org/stable/20488436](http://www.jstor.org/stable/20488436).

Gillingham, K., F. Iskhakov, A. Munk-Nielsen, et al. (2022). "Equilibrium Trade in
Automobiles". In: _Journal of Political Economy_. DOI:
[10.1086/720463](https://doi.org/10.1086%2F720463).

Goodman-Bacon, A. (2021). "Difference-in-differences with variation in treatment timing".
In: _Journal of Econometrics_ 225.2. Themed Issue: Treatment Effect 1, pp. 254-277. DOI:
[10.1016/j.jeconom.2021.03.014](https://doi.org/10.1016%2Fj.jeconom.2021.03.014).

Haile, P. (2019). _'Structural vs. Reduced Form' Language and Models in Empirical
Economics_. Lecture Slides. Yale University. URL:
[http://www.econ.yale.edu/~pah29/intro.pdf](http://www.econ.yale.edu/~pah29/intro.pdf).

Haile, P. (2024). _Models, Measurement, and the Language of Empirical Economics_. Lecture
Slides. Yale University. URL:
[https://www.dropbox.com/s/8kwtwn30dyac18s/intro.pdf](https://www.dropbox.com/s/8kwtwn30dyac18s/intro.pdf).

Hastie, T., R. Tibshirani, and J. Friedman (2009). _The Elements of Statistical Learning:
Data Mining, Inference, Prediction_. 2nd. New York: Springer. URL:
[https://web.stanford.edu/~hastie/Papers/ESLII.pdf](https://web.stanford.edu/~hastie/Papers/ESLII.pdf).

Heckman, J. J. (1979). "Sample Selection Bias as a Specification Error". In:
_Econometrica_ 47.1, pp. 153-161. DOI:
[10.2307/1912352](https://doi.org/10.2307%2F1912352).

Heckman, J. J. and J. A. Smith (1993). "Assessing the Case for Randomized Evaluation of
Social Programs". In: _Measuring Labor Market Measures: Evaluating the Effects of Active
Labour Market Policies_. Ed. by K. Jensen and P. K. Madsen. Copenhagen: Danish Ministry of
Labor, pp. 35-96.

Heckman, J. J., J. Smith, and N. Clements (1997). "Making the Most Out of Programme
Evaluations and Social Experiments: Accounting for Heterogeneity in Programme Impacts".
In: _Review of Economic Studies_ 64.4, pp. 487-535. URL:
[http://www.jstor.org/stable/2971729](http://www.jstor.org/stable/2971729).

Heckman, J. J., J. Stixrud, and S. Urzua (2006). "The Effects of Cognitive and
Noncognitive Abilities on Labor Market Outcomes and Social Behavior". In: _Journal of
Labor Economics_ 24.3, pp. 411-482. DOI:
[10.1086/504455](https://doi.org/10.1086%2F504455).

Heckman, J. J. and E. Vytlacil (2005). "Structural Equations, Treatment Effects, and
Econometric Policy Evaluation". In: _Econometrica_ 73.3, pp. 669-738. DOI:
[10.1111/j.1468-0262.2005.00594.x](https://doi.org/10.1111%2Fj.1468-0262.2005.00594.x).

Hotz, V. J. and R. A. Miller (1993). "Conditional Choice Probabilities and the Estimation
of Dynamic Models". In: _The Review of Economic Studies_ 60.3, pp. 497-529. DOI:
[10.2307/2298122](https://doi.org/10.2307%2F2298122).

Huenermund, P. and E. Bareinboim (2019). _Causal Inference and Data-Fusion in
Econometrics_. Working Paper. arXiv. URL:
[https://arxiv.org/abs/1912.09104](https://arxiv.org/abs/1912.09104).

Hurwicz, L. (1950). "Generalization of the Concept of Identification". In: _Statistical
Inference in Dynamic Economic Models_. Hoboken, NJ: John Wiley and Sons, pp. 245-257.

Imbens, G. W. and J. D. Angrist (1994). "Identification and Estimation of Local Average
Treatment Effects". In: _Econometrica_ 62.2, pp. 467-475. DOI:
[10.2307/2951620](https://doi.org/10.2307%2F2951620).

Ishimaru, S. (2022). _Geographic Mobility of Youth and Spatial Gaps in Local College and
Labor Market Opportunities_. Working Paper. Hitotsubashi University.

James, G., D. Witten, T. Hastie, et al. (2013). _An Introduction to Statistical Learning
with Applications in R_. New York: Springer. DOI:
[10.1007/978-1-4614-7138-7](https://doi.org/10.1007%2F978-1-4614-7138-7). URL:
[https://faculty.marshall.usc.edu/gareth-james/ISL/ISLR_Seventh_Printing.pdf](https://faculty.marshall.usc.edu/gareth-james/ISL/ISLR_Seventh_Printing.pdf).

James, J. (2011). _Ability Matching and Occupational Choice_. Working Paper 11-25. Federal
Reserve Bank of Cleveland.

James, J. (2017). "MM Algorithm for General Mixed Multinomial Logit Models". In: _Journal
of Applied Econometrics_ 32.4, pp. 841-857. DOI:
[10.1002/jae.2532](https://doi.org/10.1002%2Fjae.2532).

Jin, H. and H. Shen (2020). "Foreign Asset Accumulation Among Emerging Market Economies: A
Case for Coordination". In: _Review of Economic Dynamics_ 35.1, pp. 54-73. DOI:
[10.1016/j.red.2019.04.006](https://doi.org/10.1016%2Fj.red.2019.04.006).

Keane, M. P. (2010). "Structural vs. Atheoretic Approaches to Econometrics". In: _Journal
of Econometrics_ 156.1, pp. 3-20. DOI:
[10.1016/j.jeconom.2009.09.003](https://doi.org/10.1016%2Fj.jeconom.2009.09.003).

Keane, M. P. and K. I. Wolpin (1997). "The Career Decisions of Young Men". In: _Journal of
Political Economy_ 105.3, pp. 473-522. DOI:
[10.1086/262080](https://doi.org/10.1086%2F262080).

Koopmans, T. C. and O. Reiersol (1950). "The Identification of Structural
Characteristics". In: _The Annals of Mathematical Statistics_ 21.2, pp. 165-181. URL:
[http://www.jstor.org/stable/2236899](http://www.jstor.org/stable/2236899).

Kosar, G., T. Ransom, and W. van der Klaauw (2022). "Understanding Migration Aversion
Using Elicited Counterfactual Choice Probabilities". In: _Journal of Econometrics_ 231.1,
pp. 123-147. DOI:
[10.1016/j.jeconom.2020.07.056](https://doi.org/10.1016%2Fj.jeconom.2020.07.056).

Kotlarski, I. (1967). "On Characterizing the Gamma and the Normal Distribution". In:
_Pacific Journal of Mathematics_ 20, pp. 69-76.

Krauth, B. (2016). "Bounding a Linear Causal Effect Using Relative Correlation
Restrictions". In: _Journal of Econometric Methods_ 5.1, pp. 117-141. DOI:
[10.1515/jem-2013-0013](https://doi.org/10.1515%2Fjem-2013-0013).

Lang, K. and M. D. Palacios (2018). _The Determinants of Teachers' Occupational Choice_.
Working Paper 24883. National Bureau of Economic Research. DOI:
[10.3386/w24883](https://doi.org/10.3386%2Fw24883).

Lee, D. S., J. McCrary, M. J. Moreira, et al. (2020). _Valid t-ratio Inference for IV_.
Working Paper. arXiv. URL:
[https://arxiv.org/abs/2010.05058](https://arxiv.org/abs/2010.05058).

Lewbel, A. (2019). "The Identification Zoo: Meanings of Identification in Econometrics".
In: _Journal of Economic Literature_ 57.4, pp. 835-903. DOI:
[10.1257/jel.20181361](https://doi.org/10.1257%2Fjel.20181361).

Mahoney, N. (2022). "Principles for Combining Descriptive and Model-Based Analysis in
Applied Microeconomics Research". In: _Journal of Economic Perspectives_ 36.3, pp. 211-222.
DOI: [10.1257/jep.36.3.211](https://doi.org/10.1257%2Fjep.36.3.211).

Mardia, K. V. (1970). "Measures of Multivariate Skewness and Kurtosis with Applications".
In: _Biometrika_ 57.3, pp. 519-530. URL:
[http://www.jstor.org/stable/2334770](http://www.jstor.org/stable/2334770).

McFadden, D. (1978). "Modelling the Choice of Residential Location". In: _Spatial
Interaction Theory and Planning Models_. Ed. by A. Karlqvist, L. Lundqvist, F. Snickers
and J. W. Weibull. Amsterdam: North Holland, pp. 75-96.

McFadden, D. (1989). "A Method of Simulated Moments for Estimation of Discrete Response
Models Without Numerical Integration". In: _Econometrica_ 57.5, pp. 995-1026. DOI:
[10.2307/1913621](https://doi.org/10.2307%2F1913621). URL:
[http://www.jstor.org/stable/1913621](http://www.jstor.org/stable/1913621).

Mellon, J. (2020). _Rain, Rain, Go Away: 137 Potential Exclusion-Restriction Violations
for Studies Using Weather as an Instrumental Variable_. Working Paper. University of
Manchester. URL:
[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3715610](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3715610).

Miller, R. A. (1984). "Job Matching and Occupational Choice". In: _Journal of Political
Economy_ 92.6, pp. 1086-1120. DOI: [10.1086/261276](https://doi.org/10.1086%2F261276).

Mincer, J. (1974). _Schooling, Experience and Earnings_. New York: Columbia University
Press for National Bureau of Economic Research.

Mullainathan, S. and J. Spiess (2017). "Machine Learning: An Applied Econometric
Approach". In: _Journal of Economic Perspectives_ 31.2, pp. 87-106. DOI:
[10.1257/jep.31.2.87](https://doi.org/10.1257%2Fjep.31.2.87).

Ost, B., W. Pan, and D. Webber (2018). "The Returns to College Persistence for Marginal
Students: Regression Discontinuity Evidence from University Dismissal Policies". In:
_Journal of Labor Economics_ 36.3, pp. 779-805. DOI:
[10.1086/696204](https://doi.org/10.1086%2F696204).

Oster, E. (2019). "Unobservable Selection and Coefficient Stability: Theory and Evidence".
In: _Journal of Business & Economic Statistics_ 37.2, pp. 187-204. DOI:
[10.1080/07350015.2016.1227711](https://doi.org/10.1080%2F07350015.2016.1227711).

Pearl, J. (2012). "The Do-Calculus Revisited". In: _Proceedings of the Twenty-Eighth
Conference on Uncertainty in Artificial Intelligence_. Ed. by N. de Freitas and K. Murphy.
Corvallis, OR: AUAI Press, pp. 4-11.

Pischke, S. (2007). _Lecture Notes on Measurement Error_. Lecture Notes. London School of
Economics. URL:
[http://econ.lse.ac.uk/staff/spischke/ec524/Merr_new.pdf](http://econ.lse.ac.uk/staff/spischke/ec524/Merr_new.pdf).

Ransom, M. R. and T. Ransom (2018). "Do High School Sports Build or Reveal Character?
Bounding Causal Estimates of Sports Participation". In: _Economics of Education Review_
64, pp. 75-89. DOI:
[10.1016/j.econedurev.2018.04.002](https://doi.org/10.1016%2Fj.econedurev.2018.04.002).

Ransom, T. (2021). "Selective Migration, Occupational Choice, and the Wage Returns to
College Majors". In: _Annals of Economics & Statistics_ 142, pp. 45-110. DOI:
[10.15609/annaeconstat2009.142.0045](https://doi.org/10.15609%2Fannaeconstat2009.142.0045).

Ransom, T. (2022). "Labor Market Frictions and Moving Costs of the Employed and
Unemployed". In: _Journal of Human Resources_ 57.S, pp. S137-S166. DOI:
[10.3368/jhr.monopsony.0219-10013R2](https://doi.org/10.3368%2Fjhr.monopsony.0219-10013R2).

Reiersol, O. (1950). "Identifiability of a Linear Relation between Variables Which Are
Subject to Error". In: _Econometrica_ 18.4, pp. 375-389. URL:
[http://www.jstor.org/stable/1907835](http://www.jstor.org/stable/1907835).

Robins, J. M. (1997). "Causal Inference from Complex Longitudinal Data". In: _Latent
Variable Modeling and Applications to Causality_. Ed. by M. Berkane. New York: Springer,
pp. 69-117.

Robinson, P. M. (1988). "Root-N-Consistent Semiparametric Regression". In: _Econometrica_
56.4, pp. 931-954. URL:
[http://www.jstor.org/stable/1912705](http://www.jstor.org/stable/1912705).

Roth, J. (2022). "Pretest with Caution: Event-Study Estimates after Testing for Parallel
Trends". In: _American Economic Review: Insights_ 4.3, pp. 305-322. DOI:
[10.1257/aeri.20210236](https://doi.org/10.1257%2Faeri.20210236).

Roth, J. and P. H. C. Sant'Anna (2021). _Efficient Estimation for Staggered Rollout
Designs_. Working Paper. arXiv. DOI:
[10.48550/ARXIV.2102.01291](https://doi.org/10.48550%2FARXIV.2102.01291). URL:
[https://arxiv.org/abs/2102.01291](https://arxiv.org/abs/2102.01291).

Roth, J., P. Sant'Anna, A. Bilinski, et al. (2022). "What's Trending in
Difference-in-Differences? A Synthesis of the Recent Econometrics Literature". In:
_Journal of Econometrics_.

Rudik, I. (2020). "Optimal Climate Policy When Damages Are Unknown". In: _American
Economic Journal: Economic Policy_ 12.2, pp. 340-373. DOI:
[10.1257/pol.20160541](https://doi.org/10.1257%2Fpol.20160541).

Rueschendorf, L. (1981). "Sharpness of Fréchet-bounds". In: _Zeitschrift für
Wahrscheinlichkeitstheorie und Verwandte Gebiete_ 57.2, pp. 293-302. DOI:
[10.1007/BF00535495](https://doi.org/10.1007%2FBF00535495).

Rust, J. (1987). "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold
Zurcher". In: _Econometrica_ 55.5, pp. 999-1033. URL:
[http://www.jstor.org/stable/1911259](http://www.jstor.org/stable/1911259).

Shalizi, C. R. (2019). _Advanced Data Analysis from an Elementary Point of View_.
Cambridge University Press. URL:
[http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf](http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf).

Smith Jr., A. A. (2008). "Indirect Inference". In: _The New Palgrave Dictionary of
Economics_. Ed. by S. N. Durlauf and L. E. Blume. Vol. 1-8. London: Palgrave Macmillan.
DOI: [10.1007/978-1-349-58802-2](https://doi.org/10.1007%2F978-1-349-58802-2). URL:
[http://www.econ.yale.edu/smith/palgrave7.pdf](http://www.econ.yale.edu/smith/palgrave7.pdf).

Stinebrickner, R. and T. R. Stinebrickner (2014a). "Academic Performance and College Dropout:
Using Longitudinal Expectations Data to Estimate a Learning Model". In: _Journal of Labor
Economics_ 32.3, pp. 601-644. DOI: [10.1086/675308](https://doi.org/10.1086%2F675308).

Stinebrickner, R. and T. R. Stinebrickner (2014b). "A Major in Science? Initial Beliefs
and Final Outcomes for College Major and Dropout". In: _Review of Economic Studies_ 81.1,
pp. 426-472. DOI: [10.1093/restud/rdt025](https://doi.org/10.1093%2Frestud%2Frdt025).

Su, C. and K. L. Judd (2012). "Constrained Optimization Approaches to Estimation of
Structural Models". In: _Econometrica_ 80.5, pp. 2213-2230. DOI:
[10.3982/ECTA7925](https://doi.org/10.3982%2FECTA7925).

Sun, L. and S. Abraham (2021). "Estimating Dynamic Treatment Effects in Event Studies with
Heterogeneous Treatment Effects". In: _Journal of Econometrics_ 225.2. Themed Issue:
Treatment Effect 1, pp. 175-199. DOI:
[10.1016/j.jeconom.2020.09.006](https://doi.org/10.1016%2Fj.jeconom.2020.09.006).

Train, K. (2009). _Discrete Choice Methods with Simulation_. 2nd ed. Cambridge; New York:
Cambridge University Press. ISBN: 9780521766555.

Vytlacil, E. (2002). "Independence, Monotonicity, and Latent Index Models: An Equivalence
Result". In: _Econometrica_ 70.1, pp. 331-341. DOI:
[10.1111/1468-0262.00277](https://doi.org/10.1111%2F1468-0262.00277).

Wiswall, M. and B. Zafar (2018). "Preference for the Workplace, Investment in Human
Capital, and Gender". In: _Quarterly Journal of Economics_ 133.1, pp. 457-507. DOI:
[10.1093/qje/qjx035](https://doi.org/10.1093%2Fqje%2Fqjx035).

Young, A. (2020). _Consistency without Inference: Instrumental Variables in Practical
Application_. Working Paper. London School of Economics.
]