class: title-slide

<br><br><br>

# Lecture 9

## Simulated Method of Moments: Another Method of Structural Estimation

### Tyler Ransom

### ECON 6343, University of Oklahoma

---

# Plan for the Day

1. Review Method of Moments and GMM

2. Introduce simulated method of moments (SMM)

3. Walk through how to do SMM in Julia

4. Discuss indirect inference

---

# Generalized Method of Moments (GMM)

- GMM is a fundamental concept taught in graduate-level econometrics

- It is very popular because it nests many common econometric estimators:

    - OLS
    - IV and 2SLS
    - Nonlinear least squares (NLLS)
    - MLE (e.g. probit, logit)

- There's a great overview video [here](https://www.youtube.com/watch?v=U7Ylm187hYA)

---

# Method of Moments

- We can use method of moments to estimate a model's parameters

- Consider a simple regression model

`\begin{align*}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon
\end{align*}`

- Assume `\(\mathbb{E}[\varepsilon \vert \mathbf{x}] = 0\)` (conditional mean independence)

- Then we can form a system of 3 equations and 3 unknowns

---

# OLS Population Moment Conditions

- If we write out the OLS moment conditions, we get

`\begin{align*}
\mathbb{E}[\varepsilon] &= 0\\
\mathbb{E}[\varepsilon' x_1] &= 0\\
\mathbb{E}[\varepsilon' x_2] &= 0\\
\end{align*}`

- Rewriting in terms of our parameters of interest `\((\beta_0,\beta_1,\beta_2)\)`:

`\begin{align*}
\mathbb{E}[(y - \beta_0 - \beta_1 x_1 - \beta_2 x_2)] &= 0\\
\mathbb{E}[(y - \beta_0 - \beta_1 x_1 - \beta_2 x_2)' x_1] &= 0\\
\mathbb{E}[(y - \beta_0 - \beta_1 x_1 - \beta_2 x_2)' x_2] &= 0\\
\end{align*}`

---

# OLS Sample Moment Conditions

- We then need to adjust the previous formula to work with sample analogs:

`\begin{align*}
g\left(\boldsymbol \beta\right) &=\begin{cases}
\frac{1}{N}\sum_{i=1}^N(y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2}) &= 0\\
\frac{1}{N}\sum_{i=1}^N(y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2})' x_{i1} &= 0\\
\frac{1}{N}\sum_{i=1}^N(y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2})'
x_{i2} &= 0\end{cases}
\end{align*}`

- We can estimate this by exactly-identified GMM using the objective function

`\begin{align*}
\hat{\boldsymbol \beta} &= \arg \min_{\boldsymbol \beta} J\left(\boldsymbol \beta\right)
\end{align*}`

where

`\begin{align*}
J\left(\boldsymbol \beta\right) &= N g\left(\boldsymbol \beta\right)' g\left(\boldsymbol \beta\right)
\end{align*}`

---

# GMM with more moment conditions than parameters

- The solution to the objective function on the previous slide has a closed form for OLS: `\((X'X)^{-1}X'y\)`

- In cases with more moment conditions than parameters, we need to weight the moments:

`\begin{align*}
\hat{\boldsymbol \beta} &= \arg \min_{\boldsymbol \beta} J\left(\boldsymbol \beta, \hat{\mathbf{W}}\right)
\end{align*}`

where

`\begin{align*}
J\left(\boldsymbol \beta, \hat{\mathbf{W}}\right) &= N g\left(\boldsymbol \beta\right)' \hat{\mathbf{W}} g\left(\boldsymbol \beta\right)
\end{align*}`

- There is a ton of econometric theory about the optimal weighting matrix `\(\hat{\mathbf{W}}\)`

- As well as the asymptotic properties of the GMM estimator (spoiler: they're good)

---

# GMM as OLS

- Another example is OLS posed a different way

- Previously, we solved `\(K\)` equations of `\(\mathbb{E}\left[\varepsilon'X_k\right]=0\)` and `\(\mathbb{E}\left[\varepsilon\right]=0\)`

- We could instead simply try to match `\(y\)` to `\(X\beta\)` for every observation

- In this case, `\(g = y-X\beta\)`

- There are `\(N\)` moment conditions and `\(K+1\)` parameters to be estimated

- Use the `\(N\times N\)` identity matrix for `\(\mathbf{W}\)` and this is precisely OLS

- In my experience, this approach has better computational properties than the "classical" approach

---

# Binary Logit Sample Moment Conditions

- The "classical" approach to the moment conditions for the binary logit model is:

`\begin{align*}
g\left(\boldsymbol \beta\right) &=\begin{cases}
\frac{1}{N}\sum_{i=1}^N\left[y_i - \frac{\exp\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\right)}{1+\exp\left(\beta_0 + \beta_1
x_{i1} + \beta_2 x_{i2}\right)}\right] &= 0\\
\frac{1}{N}\sum_{i=1}^N\left[y_i - \frac{\exp\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\right)}{1+\exp\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\right)}\right]' x_{i1} &= 0\\
\frac{1}{N}\sum_{i=1}^N\left[y_i - \frac{\exp\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\right)}{1+\exp\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\right)}\right]' x_{i2} &= 0\end{cases}
\end{align*}`

where `\(y_i \in \left\{0,1\right\}\)`

- With the same formula for `\(J\)` as in the OLS case (or any other case)

- Under the alternative approach, use `\(g = y - P\)` where `\(y\in \left\{0,1\right\}\)` and `\(P\in\left[0,1\right]\)`

- Again, there are `\(N\)` moment conditions and `\(K+1\)` parameters to be estimated

---

# Coding example: Estimating binary logit by GMM

- We can estimate the binary logit model by GMM as follows:

``` julia
using Optim, LinearAlgebra

function logit_gmm(α, X, y)
    # logit choice probabilities
    P = exp.(X*α)./(1 .+ exp.(X*α))
    # one moment per observation: outcome minus predicted probability
    g = y .- P
    # identity weighting matrix
    J = g'*I*g
    return J
end

# X and y are the data, assumed to be already loaded
α̂_optim = optimize(a -> logit_gmm(a, X, y), rand(size(X,2)), LBFGS(), Optim.Options(g_tol=1e-8, iterations=100_000))
println(α̂_optim.minimizer)
```

- This gives estimates that are quite close to (but not identical to) MLE

- With `\(\mathbf{W} = \mathbf{I}\)`, this objective function is identical to nonlinear least squares (NLLS)

---

# Usefulness of simulation

- As we showed in PS4, we can sometimes use simulation to compute integrals

- An alternative is to use quadrature to compute the integral

- In the simulation case, we took draws from the mixture distribution of a mixed logit

- More generally, we can estimate highly complex models using simulation methods

- In some cases, simulation is the _only_ option; everything else is intractable

- Quadrature typically only works with very low-dimensional integrals

---

# Simulation methods

Train (2009) mentions three different types of simulation-based methods:

1. .hi[Simulated Maximum Likelihood] (a.k.a.
Maximum Simulated Likelihood)

2. .hi[Simulated Method of Moments] (a.k.a. Method of Simulated Moments)

3. .hi[Method of Simulated Scores]

- What I asked you to code up in PS4 was basically SML

- Today we'll talk mostly about SMM

- We won't cover Method of Simulated Scores

---

# Simulated Method of Moments

- As the name would imply, SMM is a simulated version of GMM

- The difference: SMM uses moments from simulated data

- The objective is then to make simulated and actual data match

- See McFadden (1989) and Evans (2018) for more details

- Evans (2018) includes a Python coding example

- Notes by [Jason DeBacker](https://www.jasondebacker.com/classes/Lecture10_Notes_SMM.pdf), [Eric Sims](https://www3.nd.edu/~esims1/advanced_topics.pdf) and [Colin Cameron](http://cameron.econ.ucdavis.edu/mmabook/transparencies/ct06_gmm.pdf) are also helpful

---

# Pros of SMM

- Can estimate models with `\(P\)`'s that don't have a closed form, like probit (Chintagunta, 1992)

- Can estimate other models that would otherwise be intractable

    - e.g. dynamic models with high-dimensional integrals
    - Or micro-models based only on aggregated data

- Coding for simulating the model is already done! Can dive right into counterfactuals

- It's straightforward to interpret the moments and know the model is fitting these

- Also easier to compare with reduced-form evidence

---

# Cons of SMM

- Much more computationally intensive than GMM

- Loss of (statistical) efficiency, relative to MLE (i.e.
larger SE's)

- For me personally, it's not always clear which moments to select

    - this can feel a bit _ad hoc_

---

# SMM in Julia

- Once we know the objective function, we can program any estimator we please

- Let's consider how to estimate a simple linear regression model

`\begin{align*}
y &= X\beta + \varepsilon\\
\varepsilon&\sim N(0,\sigma^2)
\end{align*}`

- `\(y\)` and `\(X\)` are data, and we want to estimate `\(\beta\)` and `\(\sigma\)`

- .hi[Note:] here we need to make an assumption about what the DGP looks like

- This means making the strong assumption that `\(\varepsilon\sim N(0,\sigma^2)\)`

---

# Estimation steps

- For .hi[each guess] of `\(\theta = [\beta', \sigma]'\)` we do the following:

    - Compute data moments
    - Draw `\(N\)` `\(\varepsilon\)`'s `\(D\)` times (typically `\(D>1000\)`)
    - For each draw, compute `\(y\)` from the model equation (call it `\(\tilde{y}\)`) given `\(\theta\)`
    - Compute model moments using `\(\tilde{y}\)` (same as data moments with `\(y\)`)
    - Model moments are averaged across all `\(D\)` draws
    - Update objective function value given values of data and averaged model moments

---

# SMM in Julia

.scroll-box-12[
``` julia
using Random, Statistics, LinearAlgebra

function ols_smm(θ, X, y, D)
    K = size(X,2)
    N = size(y,1)
    β = θ[1:end-1]
    σ = θ[end]
    if length(β)==1
        β = β[1]
    end
    # N+1 moments in both model and data
    gmodel = zeros(N+1,D)
    # data moments are just the y vector itself
    # and the variance of the y vector
    gdata  = vcat(y,var(y))
    #### !!!!!!!!!!!!!!!!!!!!!!!!!!!!! ####
    # This is critical!                   #
    Random.seed!(1234)                    #
    # You must always use the same ε draw #
    # for every guess of θ!               #
    #### !!!!!!!!!!!!!!!!!!!!!!!!!!!!! ####
    # simulated model moments
    for d=1:D
        ε = σ*randn(N)
        ỹ = X*β .+ ε
        gmodel[1:end-1,d] = ỹ
        gmodel[  end  ,d] = var(ỹ)
    end
    # criterion function
    err = vec(gdata .- mean(gmodel; dims=2))
    # weighting matrix is the identity matrix
    # minimize weighted difference between data and model moments
    J = err'*I*err
    return J
end
```
]

- Data moments to match: `\(\left\{y_i, i=1,\ldots,N;\widehat{V}(y)\right\}\)`

- Model moments to match: `\(\left\{\tilde{y}_i, i=1,\ldots,N;\widehat{V}(\tilde{y})\right\}\)`

---

# SMM optimization

- We can optimize the objective function with any optimizer we'd like

- In general, the SMM objective function may be poorly behaved (e.g. it may have many local optima)

- So you may need to employ tactics to find the global optimum:

    - use LBFGS from many different starting values
    - use Simulated Annealing or Particle Swarm
    - (these are algorithms designed to find global optima)

- But SMM should be well behaved for simple problems (like OLS)

- .hi[Always remember:] Must use .hi[same] draw of `\(\varepsilon\)` in every optimizer iteration!

---

# SMM.jl

- SMM is so common that others have already implemented it

- And probably in a more computationally efficient manner!

- One such package is `SMM.jl`, written by [Florian Oswald](https://floswald.github.io/) (Sciences Po)

- This package allows for parallelization, which can speed up estimation time

- It also uses a Bayesian Markov Chain Monte Carlo algorithm known as BGP

    - "BGP" comes from Baragatti, Grimaud, and Pommeret (2013)

- I am still learning this package but there are some examples

---

# SMM.jl example

- Let's estimate the following model using `SMM.jl`

`\begin{align*}
Y_1 &= \beta_{01} + \varepsilon_{1}\\
Y_2 &= \beta_{02} + \varepsilon_{2}
\end{align*}`

where `\(\mathbf{\varepsilon} \sim MVN\left(\mathbf{0},I\right)\)`. Thus, the `\(\beta\)`'s constitute the means of each MVN dimension.
- The code to do this is included in the examples of `SMM.jl` with `\((\beta_{01},\beta_{02}) = (-1,1)\)`

``` julia
using SMM, DataFrames
MA = SMM.parallelNormal() # Note: this line may take up to 5 minutes to execute
dc = SMM.history(MA.chains[1])
dc = dc[dc[!,:accepted].==true, :]
println(describe(dc))
```

- You can then verify that the `mean` column for `p1` and `p2` is close to -1 and 1.

---

# Indirect inference (Smith Jr., 2008)

- So far today we've only talked about matching model moments to data

- Logic: if the model matches the data, then it is a reasonable model

- An alternative is known as .hi[indirect inference]

- In this case, we use an .hi[auxiliary model]

---

# Indirect Inference (Cont'd)

- The auxiliary model doesn't need to accurately describe the DGP

- It simply acts as a lens through which to view the world

- .hi[Objective:] choose the parameters of the economic model such that

    - real-world data = simulated data .hi[through the lens of the auxiliary model]

---

# Example: Economic Model

- Consider a simple macro model with two simultaneous equations:

`\begin{align*}
C_t &= \beta Y_t + u_t\\
Y_t &= C_t + X_t
\end{align*}`

- `\(C_t\)` (consumption) and `\(Y_t\)` (income) are endogenous

- `\(X_t\)` (non-consumption expenditure) is exogenous

- `\(u_t \overset{iid}{\sim}N(0,\sigma^2)\)`

- Supposing we know the value of `\(\sigma^2\)`, then `\(\beta\)` is the lone parameter in the model

---

# Example: Auxiliary model

- We don't need to use indirect inference to estimate `\(\beta\)`, but we can

- Suppose our auxiliary model is

`\begin{align*}
C_t &= \theta X_t + e_t\\
e_t &\sim N(0,s^2)
\end{align*}`

where again the variance `\(s^2\)` is known

- We can estimate `\(\theta\)` by OLS or MLE

- But how does that help us estimate `\(\beta\)`?

- We need to find the mapping between `\(\beta\)` and `\(\theta\)`

---

# Example: Finding the mapping

- Let's apply some algebra to the first system of equations.
Substituting `\(Y_t\)` gives

`\begin{align*}
C_t &= \beta(C_t+X_t)+u_t\\
(1-\beta) C_t &= \beta X_t + u_t\\
C_t &= \frac{\beta}{1-\beta}X_t + \frac{1}{1-\beta}u_t \\
&\Rightarrow \theta = \frac{\beta}{1-\beta} \\
&\Rightarrow \beta = \frac{\theta}{1+\theta}
\end{align*}`

- We know we can easily estimate `\(\theta\)` by OLS

- Then we can recover `\(\hat{\beta}\)` by evaluating `\(\frac{\hat{\theta}}{1+\hat{\theta}}\)`

- We worked backwards from the auxiliary model to get estimates of the main model

---

# References

.smallest[
Ackerberg, D. A. (2003). "Advertising, Learning, and Consumer Choice in Experience Good Markets: An Empirical Examination". In: _International Economic Review_ 44.3, pp. 1007-1040. DOI: [10.1111/1468-2354.t01-2-00098](https://doi.org/10.1111%2F1468-2354.t01-2-00098). Adams, R. P. (2018). _Model Selection and Cross Validation_. Lecture Notes. Princeton University. URL: [https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/model-selection.pdf](https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/model-selection.pdf). Ahlfeldt, G. M., S. J. Redding, D. M. Sturm, et al. (2015). "The Economics of Density: Evidence From the Berlin Wall". In: _Econometrica_ 83.6, pp. 2127-2189. DOI: [10.3982/ECTA10876](https://doi.org/10.3982%2FECTA10876). Altonji, J. G., T. E. Elder, and C. R. Taber (2005). "Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools". In: _Journal of Political Economy_ 113.1, pp. 151-184. DOI: [10.1086/426036](https://doi.org/10.1086%2F426036). Altonji, J. G. and C. R. Pierret (2001). "Employer Learning and Statistical Discrimination". In: _Quarterly Journal of Economics_ 116.1, pp. 313-350. DOI: [10.1162/003355301556329](https://doi.org/10.1162%2F003355301556329). Angrist, J. D. and A. B. Krueger (1991). "Does Compulsory School Attendance Affect Schooling and Earnings?" In: _Quarterly Journal of Economics_ 106.4, pp. 979-1014. DOI: [10.2307/2937954](https://doi.org/10.2307%2F2937954). Angrist, J. D. and J.
Pischke (2009). _Mostly Harmless Econometrics: An Empiricist's Companion_. Princeton University Press. ISBN: 0691120358. Arcidiacono, P. (2004). "Ability Sorting and the Returns to College Major". In: _Journal of Econometrics_ 121, pp. 343-375. DOI: [10.1016/j.jeconom.2003.10.010](https://doi.org/10.1016%2Fj.jeconom.2003.10.010). Arcidiacono, P., E. Aucejo, A. Maurel, et al. (2016). _College Attrition and the Dynamics of Information Revelation_. Working Paper. Duke University. URL: [https://tyleransom.github.io/research/CollegeDropout2016May31.pdf](https://tyleransom.github.io/research/CollegeDropout2016May31.pdf). Arcidiacono, P., E. Aucejo, A. Maurel, et al. (2025). "College Attrition and the Dynamics of Information Revelation". In: _Journal of Political Economy_ 133.1. DOI: [10.1086/732526](https://doi.org/10.1086%2F732526). Arcidiacono, P. and J. B. Jones (2003). "Finite Mixture Distributions, Sequential Likelihood and the EM Algorithm". In: _Econometrica_ 71.3, pp. 933-946. DOI: [10.1111/1468-0262.00431](https://doi.org/10.1111%2F1468-0262.00431). Arcidiacono, P., J. Kinsler, and T. Ransom (2022b). "Asian American Discrimination in Harvard Admissions". In: _European Economic Review_ 144, p. 104079. DOI: [10.1016/j.euroecorev.2022.104079](https://doi.org/10.1016%2Fj.euroecorev.2022.104079). Arcidiacono, P., J. Kinsler, and T. Ransom (2022a). "Legacy and Athlete Preferences at Harvard". In: _Journal of Labor Economics_ 40.1, pp. 133-156. DOI: [10.1086/713744](https://doi.org/10.1086%2F713744). Arcidiacono, P. and R. A. Miller (2011). "Conditional Choice Probability Estimation of Dynamic Discrete Choice Models With Unobserved Heterogeneity". In: _Econometrica_ 79.6, pp. 1823-1867. DOI: [10.3982/ECTA7743](https://doi.org/10.3982%2FECTA7743). Arroyo Marioli, F., F. Bullano, S. Kucinskas, et al. (2020). _Tracking R of COVID-19: A New Real-Time Estimation Using the Kalman Filter_. Working Paper. medRxiv. 
DOI: [10.1101/2020.04.19.20071886](https://doi.org/10.1101%2F2020.04.19.20071886). Ashworth, J., V. J. Hotz, A. Maurel, et al. (2021). "Changes across Cohorts in Wage Returns to Schooling and Early Work Experiences". In: _Journal of Labor Economics_ 39.4, pp. 931-964. DOI: [10.1086/711851](https://doi.org/10.1086%2F711851). Attanasio, O. P., C. Meghir, and A. Santiago (2011). "Education Choices in Mexico: Using a Structural Model and a Randomized Experiment to Evaluate PROGRESA". In: _Review of Economic Studies_ 79.1, pp. 37-66. DOI: [10.1093/restud/rdr015](https://doi.org/10.1093%2Frestud%2Frdr015). Aucejo, E. M. and J. James (2019). "Catching Up to Girls: Understanding the Gender Imbalance in Educational Attainment Within Race". In: _Journal of Applied Econometrics_ 34.4, pp. 502-525. DOI: [10.1002/jae.2699](https://doi.org/10.1002%2Fjae.2699). Baragatti, M., A. Grimaud, and D. Pommeret (2013). "Likelihood-free Parallel Tempering". In: _Statistics and Computing_ 23.4, pp. 535-549. DOI: [10.1007/s11222-012-9328-6](https://doi.org/10.1007%2Fs11222-012-9328-6). Bayer, P., R. McMillan, A. Murphy, et al. (2016). "A Dynamic Model of Demand for Houses and Neighborhoods". In: _Econometrica_ 84.3, pp. 893-942. DOI: [10.3982/ECTA10170](https://doi.org/10.3982%2FECTA10170). Begg, C. B. and R. Gray (1984). "Calculation of Polychotomous Logistic Regression Parameters Using Individualized Regressions". In: _Biometrika_ 71.1, pp. 11-18. DOI: [10.1093/biomet/71.1.11](https://doi.org/10.1093%2Fbiomet%2F71.1.11). Beggs, S. D., N. S. Cardell, and J. Hausman (1981). "Assessing the Potential Demand for Electric Cars". In: _Journal of Econometrics_ 17.1, pp. 1-19. DOI: [10.1016/0304-4076(81)90056-7](https://doi.org/10.1016%2F0304-4076%2881%2990056-7). Berry, S., J. Levinsohn, and A. Pakes (1995). "Automobile Prices in Market Equilibrium". In: _Econometrica_ 63.4, pp. 841-890. URL: [http://www.jstor.org/stable/2171802](http://www.jstor.org/stable/2171802). Blass, A. A., S.
Lach, and C. F. Manski (2010). "Using Elicited Choice Probabilities to Estimate Random Utility Models: Preferences for Electricity Reliability". In: _International Economic Review_ 51.2, pp. 421-440. DOI: [10.1111/j.1468-2354.2010.00586.x](https://doi.org/10.1111%2Fj.1468-2354.2010.00586.x). Blundell, R. (2010). "Comments on: ``Structural vs. Atheoretic Approaches to Econometrics'' by Michael Keane". In: _Journal of Econometrics_ 156.1, pp. 25-26. DOI: [10.1016/j.jeconom.2009.09.005](https://doi.org/10.1016%2Fj.jeconom.2009.09.005). Bresnahan, T. F., S. Stern, and M. Trajtenberg (1997). "Market Segmentation and the Sources of Rents from Innovation: Personal Computers in the Late 1980s". In: _The RAND Journal of Economics_ 28.0, pp. S17-S44. DOI: [10.2307/3087454](https://doi.org/10.2307%2F3087454). Brien, M. J., L. A. Lillard, and S. Stern (2006). "Cohabitation, Marriage, and Divorce in a Model of Match Quality". In: _International Economic Review_ 47.2, pp. 451-494. DOI: [10.1111/j.1468-2354.2006.00385.x](https://doi.org/10.1111%2Fj.1468-2354.2006.00385.x). Card, D. (1995). "Using Geographic Variation in College Proximity to Estimate the Return to Schooling". In: _Aspects of Labor Market Behaviour: Essays in Honour of John Vanderkamp_. Ed. by L. N. Christofides, E. K. Grant and R. Swidinsky. Toronto: University of Toronto Press. Cardell, N. S. (1997). "Variance Components Structures for the Extreme-Value and Logistic Distributions with Application to Models of Heterogeneity". In: _Econometric Theory_ 13.2, pp. 185-213. URL: [https://www.jstor.org/stable/3532724](https://www.jstor.org/stable/3532724). Caucutt, E. M., L. Lochner, J. Mullins, et al. (2020). _Child Skill Production: Accounting for Parental and Market-Based Time and Goods Investments_. Working Paper 27838. National Bureau of Economic Research. DOI: [10.3386/w27838](https://doi.org/10.3386%2Fw27838). Chen, X., H. Hong, and D. Nekipelov (2011). "Nonlinear Models of Measurement Errors". 
In: _Journal of Economic Literature_ 49.4, pp. 901-937. DOI: [10.1257/jel.49.4.901](https://doi.org/10.1257%2Fjel.49.4.901). Chintagunta, P. K. (1992). "Estimating a Multinomial Probit Model of Brand Choice Using the Method of Simulated Moments". In: _Marketing Science_ 11.4, pp. 386-407. DOI: [10.1287/mksc.11.4.386](https://doi.org/10.1287%2Fmksc.11.4.386). Cinelli, C. and C. Hazlett (2020). "Making Sense of Sensitivity: Extending Omitted Variable Bias". In: _Journal of the Royal Statistical Society: Series B (Statistical Methodology)_ 82.1, pp. 39-67. DOI: [10.1111/rssb.12348](https://doi.org/10.1111%2Frssb.12348). Coate, P. and K. Mangum (2019). _Fast Locations and Slowing Labor Mobility_. Working Paper 19-49. Federal Reserve Bank of Philadelphia. Cunha, F., J. J. Heckman, and S. M. Schennach (2010). "Estimating the Technology of Cognitive and Noncognitive Skill Formation". In: _Econometrica_ 78.3, pp. 883-931. DOI: [10.3982/ECTA6551](https://doi.org/10.3982%2FECTA6551). Cunningham, S. (2021). _Causal Inference: The Mixtape_. Yale University Press. URL: [https://www.scunning.com/causalinference_norap.pdf](https://www.scunning.com/causalinference_norap.pdf). Delavande, A. and C. F. Manski (2015). "Using Elicited Choice Probabilities in Hypothetical Elections to Study Decisions to Vote". In: _Electoral Studies_ 38, pp. 28-37. DOI: [10.1016/j.electstud.2015.01.006](https://doi.org/10.1016%2Fj.electstud.2015.01.006). Delavande, A. and B. Zafar (2019). "University Choice: The Role of Expected Earnings, Nonpecuniary Outcomes, and Financial Constraints". In: _Journal of Political Economy_ 127.5, pp. 2343-2393. DOI: [10.1086/701808](https://doi.org/10.1086%2F701808). Diegert, P., M. A. Masten, and A. Poirier (2025). _Assessing Omitted Variable Bias when the Controls are Endogenous_. arXiv. DOI: [10.48550/ARXIV.2206.02303](https://doi.org/10.48550%2FARXIV.2206.02303). Erdem, T. and M. P. Keane (1996). 
"Decision-Making under Uncertainty: Capturing Dynamic Brand Choice Processes in Turbulent Consumer Goods Markets". In: _Marketing Science_ 15.1, pp. 1-20. DOI: [10.1287/mksc.15.1.1](https://doi.org/10.1287%2Fmksc.15.1.1). Evans, R. W. (2018). _Simulated Method of Moments (SMM) Estimation_. QuantEcon Note. University of Chicago. URL: [https://notes.quantecon.org/submission/5b3db2ceb9eab00015b89f93](https://notes.quantecon.org/submission/5b3db2ceb9eab00015b89f93). Farber, H. S. and R. Gibbons (1996). "Learning and Wage Dynamics". In: _Quarterly Journal of Economics_ 111.4, pp. 1007-1047. DOI: [10.2307/2946706](https://doi.org/10.2307%2F2946706). Fu, C., N. Grau, and J. Rivera (2020). _Wandering Astray: Teenagers' Choices of Schooling and Crime_. Working Paper. University of Wisconsin-Madison. URL: [https://www.ssc.wisc.edu/~cfu/wander.pdf](https://www.ssc.wisc.edu/~cfu/wander.pdf). Gillingham, K., F. Iskhakov, A. Munk-Nielsen, et al. (2022). "Equilibrium Trade in Automobiles". In: _Journal of Political Economy_. DOI: [10.1086/720463](https://doi.org/10.1086%2F720463). Haile, P. (2019). _``Structural vs. Reduced Form'' Language and Models in Empirical Economics_. Lecture Slides. Yale University. URL: [http://www.econ.yale.edu/~pah29/intro.pdf](http://www.econ.yale.edu/~pah29/intro.pdf). Haile, P. (2024). _Models, Measurement, and the Language of Empirical Economics_. Lecture Slides. Yale University. URL: [https://www.dropbox.com/s/8kwtwn30dyac18s/intro.pdf](https://www.dropbox.com/s/8kwtwn30dyac18s/intro.pdf). Heckman, J. J., J. Stixrud, and S. Urzua (2006). "The Effects of Cognitive and Noncognitive Abilities on Labor Market Outcomes and Social Behavior". In: _Journal of Labor Economics_ 24.3, pp. 411-482. DOI: [10.1086/504455](https://doi.org/10.1086%2F504455). Hotz, V. J. and R. A. Miller (1993). "Conditional Choice Probabilities and the Estimation of Dynamic Models". In: _The Review of Economic Studies_ 60.3, pp. 497-529. 
DOI: [10.2307/2298122](https://doi.org/10.2307%2F2298122). Hurwicz, L. (1950). "Generalization of the Concept of Identification". In: _Statistical Inference in Dynamic Economic Models_. Hoboken, NJ: John Wiley and Sons, pp. 245-257. Ishimaru, S. (2022). _Geographic Mobility of Youth and Spatial Gaps in Local College and Labor Market Opportunities_. Working Paper. Hitotsubashi University. James, J. (2011). _Ability Matching and Occupational Choice_. Working Paper 11-25. Federal Reserve Bank of Cleveland. James, J. (2017). "MM Algorithm for General Mixed Multinomial Logit Models". In: _Journal of Applied Econometrics_ 32.4, pp. 841-857. DOI: [10.1002/jae.2532](https://doi.org/10.1002%2Fjae.2532). Jin, H. and H. Shen (2020). "Foreign Asset Accumulation Among Emerging Market Economies: A Case for Coordination". In: _Review of Economic Dynamics_ 35.1, pp. 54-73. DOI: [10.1016/j.red.2019.04.006](https://doi.org/10.1016%2Fj.red.2019.04.006). Keane, M. P. (2010). "Structural vs. Atheoretic Approaches to Econometrics". In: _Journal of Econometrics_ 156.1, pp. 3-20. DOI: [10.1016/j.jeconom.2009.09.003](https://doi.org/10.1016%2Fj.jeconom.2009.09.003). Keane, M. P. and K. I. Wolpin (1997). "The Career Decisions of Young Men". In: _Journal of Political Economy_ 105.3, pp. 473-522. DOI: [10.1086/262080](https://doi.org/10.1086%2F262080). Koopmans, T. C. and O. Reiersol (1950). "The Identification of Structural Characteristics". In: _The Annals of Mathematical Statistics_ 21.2, pp. 165-181. URL: [http://www.jstor.org/stable/2236899](http://www.jstor.org/stable/2236899). Kosar, G., T. Ransom, and W. van der Klaauw (2022). "Understanding Migration Aversion Using Elicited Counterfactual Choice Probabilities". In: _Journal of Econometrics_ 231.1, pp. 123-147. DOI: [10.1016/j.jeconom.2020.07.056](https://doi.org/10.1016%2Fj.jeconom.2020.07.056). Krauth, B. (2016). "Bounding a Linear Causal Effect Using Relative Correlation Restrictions". In: _Journal of Econometric Methods_ 5.1, pp. 
117-141. DOI: [10.1515/jem-2013-0013](https://doi.org/10.1515%2Fjem-2013-0013). Lang, K. and M. D. Palacios (2018). _The Determinants of Teachers' Occupational Choice_. Working Paper 24883. National Bureau of Economic Research. DOI: [10.3386/w24883](https://doi.org/10.3386%2Fw24883). Lee, D. S., J. McCrary, M. J. Moreira, et al. (2020). _Valid t-ratio Inference for IV_. Working Paper. arXiv. URL: [https://arxiv.org/abs/2010.05058](https://arxiv.org/abs/2010.05058). Lewbel, A. (2019). "The Identification Zoo: Meanings of Identification in Econometrics". In: _Journal of Economic Literature_ 57.4, pp. 835-903. DOI: [10.1257/jel.20181361](https://doi.org/10.1257%2Fjel.20181361). Mahoney, N. (2022). "Principles for Combining Descriptive and Model-Based Analysis in Applied Microeconomics Research". In: _Journal of Economic Perspectives_ 36.3, pp. 211-22. DOI: [10.1257/jep.36.3.211](https://doi.org/10.1257%2Fjep.36.3.211). McFadden, D. (1978). "Modelling the Choice of Residential Location". In: _Spatial Interaction Theory and Planning Models_. Ed. by A. Karlqvist, L. Lundqvist, F. Snickers and J. W. Weibull. Amsterdam: North Holland, pp. 75-96. McFadden, D. (1989). "A Method of Simulated Moments for Estimation of Discrete Response Models Without Numerical Integration". In: _Econometrica_ 57.5, pp. 995-1026. DOI: [10.2307/1913621](https://doi.org/10.2307%2F1913621). URL: [http://www.jstor.org/stable/1913621](http://www.jstor.org/stable/1913621). Mellon, J. (2020). _Rain, Rain, Go Away: 137 Potential Exclusion-Restriction Violations for Studies Using Weather as an Instrumental Variable_. Working Paper. University of Manchester. URL: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3715610](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3715610). Miller, R. A. (1984). "Job Matching and Occupational Choice". In: _Journal of Political Economy_ 92.6, pp. 1086-1120. DOI: [10.1086/261276](https://doi.org/10.1086%2F261276). Mincer, J. (1974). 
_Schooling, Experience and Earnings_. New York: Columbia University Press for National Bureau of Economic Research. Ost, B., W. Pan, and D. Webber (2018). "The Returns to College Persistence for Marginal Students: Regression Discontinuity Evidence from University Dismissal Policies". In: _Journal of Labor Economics_ 36.3, pp. 779-805. DOI: [10.1086/696204](https://doi.org/10.1086%2F696204). Oster, E. (2019). "Unobservable Selection and Coefficient Stability: Theory and Evidence". In: _Journal of Business & Economic Statistics_ 37.2, pp. 187-204. DOI: [10.1080/07350015.2016.1227711](https://doi.org/10.1080%2F07350015.2016.1227711). Pischke, S. (2007). _Lecture Notes on Measurement Error_. Lecture Notes. London School of Economics. URL: [http://econ.lse.ac.uk/staff/spischke/ec524/Merr_new.pdf](http://econ.lse.ac.uk/staff/spischke/ec524/Merr_new.pdf). Ransom, M. R. and T. Ransom (2018). "Do High School Sports Build or Reveal Character? Bounding Causal Estimates of Sports Participation". In: _Economics of Education Review_ 64, pp. 75-89. DOI: [10.1016/j.econedurev.2018.04.002](https://doi.org/10.1016%2Fj.econedurev.2018.04.002). Ransom, T. (2022). "Labor Market Frictions and Moving Costs of the Employed and Unemployed". In: _Journal of Human Resources_ 57.S, pp. S137-S166. DOI: [10.3368/jhr.monopsony.0219-10013R2](https://doi.org/10.3368%2Fjhr.monopsony.0219-10013R2). Rudik, I. (2020). "Optimal Climate Policy When Damages Are Unknown". In: _American Economic Journal: Economic Policy_ 12.2, pp. 340-373. DOI: [10.1257/pol.20160541](https://doi.org/10.1257%2Fpol.20160541). Rust, J. (1987). "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher". In: _Econometrica_ 55.5, pp. 999-1033. URL: [http://www.jstor.org/stable/1911259](http://www.jstor.org/stable/1911259). Shalizi, C. R. (2019). _Advanced Data Analysis from an Elementary Point of View_. Cambridge University Press. 
URL: [http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf](http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf). Smith Jr., A. A. (2008). "Indirect Inference". In: _The New Palgrave Dictionary of Economics_. Ed. by S. N. Durlauf and L. E. Blume. Vol. 1-8. London: Palgrave Macmillan. DOI: [10.1007/978-1-349-58802-2](https://doi.org/10.1007%2F978-1-349-58802-2). URL: [http://www.econ.yale.edu/smith/palgrave7.pdf](http://www.econ.yale.edu/smith/palgrave7.pdf). Stinebrickner, R. and T. Stinebrickner (2014a). "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model". In: _Journal of Labor Economics_ 32.3, pp. 601-644. DOI: [10.1086/675308](https://doi.org/10.1086%2F675308). Stinebrickner, R. and T. R. Stinebrickner (2014b). "A Major in Science? Initial Beliefs and Final Outcomes for College Major and Dropout". In: _Review of Economic Studies_ 81.1, pp. 426-472. DOI: [10.1093/restud/rdt025](https://doi.org/10.1093%2Frestud%2Frdt025). Su, C. and K. L. Judd (2012). "Constrained Optimization Approaches to Estimation of Structural Models". In: _Econometrica_ 80.5, pp. 2213-2230. DOI: [10.3982/ECTA7925](https://doi.org/10.3982%2FECTA7925). Train, K. (2009). _Discrete Choice Methods with Simulation_. 2nd ed. Cambridge; New York: Cambridge University Press. ISBN: 9780521766555. Wiswall, M. and B. Zafar (2018). "Preference for the Workplace, Investment in Human Capital, and Gender". In: _Quarterly Journal of Economics_ 133.1, pp. 457-507. DOI: [10.1093/qje/qjx035](https://doi.org/10.1093%2Fqje%2Fqjx035). Young, A. (2020). _Consistency without Inference: Instrumental Variables in Practical Application_. Working Paper. London School of Economics. ]
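
---

# Indirect inference: a Julia sketch

- The consumption example from the indirect inference slides can be sketched in a few lines of Julia

- This is only a minimal illustration under stated assumptions, not a definitive implementation: the function names (`simulate_C`, `aux_theta`, `indirect_inference`), the fake data, the known `\(\sigma = 1\)`, and the grid search (used in place of an optimizer to avoid extra dependencies) are all hypothetical choices made for this sketch

```julia
using Random, Statistics, LinearAlgebra

# Reduced form of the economic model: C_t = (β X_t + u_t) / (1 - β)
simulate_C(β, X, u) = (β .* X .+ u) ./ (1 - β)

# Auxiliary model C_t = θ X_t + e_t, estimated by OLS without an intercept
aux_theta(C, X) = dot(X, C) / dot(X, X)

function indirect_inference(C, X; σ=1.0, D=100, seed=1234, grid=0.01:0.005:0.99)
    # Draw the shocks once and reuse them for every guess of β
    rng = MersenneTwister(seed)
    U = σ .* randn(rng, length(X), D)
    # Auxiliary parameter estimated on the observed data
    θ_data = aux_theta(C, X)
    # Squared distance between the data estimate and the
    # average auxiliary estimate across the D simulated samples
    J = map(grid) do β
        θ_sim = mean(aux_theta(simulate_C(β, X, U[:, d]), X) for d in 1:D)
        (θ_data - θ_sim)^2
    end
    return grid[argmin(J)], θ_data
end

# Fake data for illustration only (hypothetical values, true β = 0.6)
rng = MersenneTwister(42)
T = 5_000
X = 1 .+ rand(rng, T)
C = simulate_C(0.6, X, randn(rng, T))

β_hat, θ_hat = indirect_inference(C, X)
println("indirect inference: ", β_hat, "   closed-form mapping: ", θ_hat / (1 + θ_hat))
```

- In this toy case the grid minimizer should land close to `\(\frac{\hat{\theta}}{1+\hat{\theta}}\)`, since the mapping between `\(\beta\)` and `\(\theta\)` is known in closed form; the simulation approach matters when no such mapping is available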