Lecture 4

class: title-slide

# Lecture 4

## Static Discrete Choice Models

### Tyler Ransom

### ECON 6343, University of Oklahoma

---

# Attribution

Many of these slides are based on slides written by Peter Arcidiacono. I use them with his permission.

These slides also follow Chapters 1-3 of Train (2009)

---

# Plan for the day

1. Describe static discrete choice models

2. Derive logit probabilities

3. Show how to estimate logit and multinomial logit models

4. Cover the IIA property

5. Discuss expected utility and consumer surplus

---

# What are discrete choice models?

- Discrete choice models are one of the workhorses of structural economics

- Deeply tied to economic theory:
    - utility maximization
    
    - revealed preference

- Used to model "utility" (broadly defined), for example:
    
    - consumer product purchase decisions

- firm market entry decisions

- investment decisions

---

# Example of a discrete choice model

- Cities in the Bay Area are interested in how the introduction of rideshare services will impact ridership on Bay Area Rapid Transit (BART)

- Questions that cities need to know the answers to:
    
    - Is rideshare a substitute for public transit or a complement?
    - How inelastic is demand for BART? Should fares be `\(\uparrow\)` or `\(\downarrow\)`?
    - Should BART services be scaled up to compete with rideshares?
    - Will the influx of rideshare vehicles increase traffic congestion / pollution?
    
- Each of these questions requires making a counterfactual prediction
- In particular, need a way to make such a prediction confidently and in a way that is easy to understand

---

# Properties of discrete choice models

1. Agents choose from among a .hi[finite] set of alternatives (called the _choice set_)

2. Alternatives in choice set are .hi[mutually exclusive]

3. Choice set is .hi[exhaustive]

---

# Notation

- Let `\(d_i\)` indicate the choice individual `\(i\)` makes where `\(d_i\in\{1,\ldots, J\}\)`.

- Individuals choose `\(d\)` to maximize their utility, `\(U\)`, which generally is written as:
`\begin{align}
U_{ij}&=u_{ij}+\epsilon_{ij}
\end{align}`
where:
- `\(u_{ij}\)` relates observed factors to the utility individual `\(i\)` receives from choosing option `\(j\)`
- `\(\epsilon_{ij}\)` are unobserved to the econometrician but observed to the individual

`\begin{align}
d_{ij}&=1 \text{  if  } u_{ij}+\epsilon_{ij}>u_{ij'}+\epsilon_{ij'}\text{  for all  } j'\neq j
\end{align}`

---

# Probabilities
- With the `\(\epsilon\)`'s unobserved, the probability of `\(i\)` making choice `\(j\)` is given by:
`\begin{align*}
P_{ij}&=\Pr(u_{ij}+\epsilon_{ij}>u_{ij'}+\epsilon_{ij'}\,\,\forall\,\, j'\neq j)\\
&=\Pr(\epsilon_{ij'}-\epsilon_{ij}<u_{ij}-u_{ij'}\,\,\forall\,\, j'\neq j)\\
&=\int_{\epsilon}I(\epsilon_{ij'}-\epsilon_{ij}<u_{ij}-u_{ij'}\,\,\forall\,\, j'\neq j)f(\epsilon)d\epsilon
\end{align*}`

- Note that, regardless of what distributional assumptions are made on the `\(\epsilon\)`'s, the probability of choosing a particular option does not change when we:

1. Add a constant to the utility of all options (i.e. .hi[only differences in utility matter])

2. Multiply by a positive number (need to .hi[scale something]; e.g. the variance of the `\(\epsilon\)`'s)

---

# Variables

- Suppose we have:
`\begin{eqnarray*}
u_{i1}=\alpha Male_i+\beta_1 X_i + \gamma Z_1\\
u_{i2}=\alpha Male_i+\beta_2 X_i+\gamma Z_2\\
\end{eqnarray*}`

- Since only differences in utility matter:
`\begin{align*}
u_{i1}-u_{i2}&=(\beta_1-\beta_2)X_i+\gamma (Z_1-Z_2)
\end{align*}`

- We can't tell whether men are happier than women, but can tell whether they more strongly prefer one option

- We can only obtain differenced coefficient estimates on `\(X\)`'s

- We can only obtain an estimate of a coefficient that is constant across choices if its corresponding variable varies by choice

---

# Number of Error Terms

- Similar to the `\(X\)`'s, there are restrictions on the number of error terms

- This is because only differences in utility matter

- Recall that he probability `\(i\)` will choose `\(j\)` is given by:
`\begin{align}
P_{ij}&=\Pr(u_{ij}+\epsilon_{ij}>u_{ij'}+\epsilon_{ij'}\,\,\forall\,\, j'\neq j)\nonumber\\
&=\Pr(\epsilon_{ij'}-\epsilon_{ij}<u_{ij}-u_{ij'}\,\,\forall\,\, j'\neq j)\label{eq:intprob}\\
&=\int_{\epsilon}I(\epsilon_{ij'}-\epsilon_{ij}<u_{ij}-u_{ij'}\,\,\forall\,\, j'\neq j)f(\epsilon)d\epsilon\nonumber
\end{align}`
where the integral is `\(J\)`-dimensional

---

# Number of Error Terms

- Rewriting the last line of \eqref{eq:intprob} as a `\(J-1\)` dimensional integral over the differenced `\(\epsilon\)`'s:
`\begin{align}
P_{ij}&=\int_{\tilde{\epsilon}}I(\tilde{\epsilon}_{ij'}<\tilde{u}_{ij'} \,\,\forall\,\, j'\neq j)g(\tilde{\epsilon})d\tilde{\epsilon}
\end{align}`

- This means one dimension of `\(f(\epsilon)\)` is not identified and must therefore be normalized

- Arises from only differences in utility mattering (.hi[location normalization])

- The scale of utility also doesn't matter (.hi[scale normalization])

- The scale normalization implies we must place restrictions on the variance of `\(\epsilon\)`'s

---

# More on the scale normalization

- The need to normalize scale means that we can never estimate the variance of `\(F\left(\tilde{\epsilon}\right)\)`

- This contrasts with linear regression models, where we can easily estimate MSE

- The scale normalization means our `\(\beta\)`'s and `\(\gamma\)`'s are implicitly divided by an unknown variance term:

`\begin{align*}
u_{i1}-u_{i2}&=(\beta_1-\beta_2)X_i+\gamma (Z_1-Z_2)\\
             &=\tilde{\beta}X_i + \gamma \tilde{Z} \\
             &=\frac{\beta^*}{\sigma}X_i + \frac{\gamma^*}{\sigma}\tilde{Z}
\end{align*}`

- `\(\tilde{\beta}\)` is what we estimate, but we will never know `\(\beta^*\)` because utility is scale-invariant

---

# Where does the logit formula come from?
- Consider a binary choice set `\(\{1,2\}\)`. The Type 1 extreme value CDF for `\(\epsilon_2\)` is:
`\begin{align*}
F(\epsilon_2)&=e^{-e^{(-\epsilon_2)}}
\end{align*}`

- To get the probability of choosing `\(1\)`, substitute in for `\(\epsilon_2\)` with `\(\epsilon_1+u_1-u_2\)`:
`\begin{align}
\Pr(d_1=1|\epsilon_1)&=e^{-e^{-(\epsilon_1+u_1-u_2)}}
\end{align}`
- But `\(\epsilon_1\)` is unobserved so we need to integrate it out

---
# Logit derivation
- Taking the integral over what is random `\((\epsilon_1)\)`:

`\begin{align*}
\Pr(d_1=1)&=\int_{-\infty}^{\infty}\left(e^{-e^{-(\epsilon_1+u_1-u_2)}}\right)f(\epsilon_1)d\epsilon_1\\
&=\int_{-\infty}^{\infty}\left(e^{-e^{-(\epsilon_1+u_1-u_2)}}\right)e^{-\epsilon_1}e^{-e^{-\epsilon_1}}d\epsilon_1\\
&=\int_{-\infty}^{\infty}\exp\left(-e^{-\epsilon_1}-e^{-(\epsilon_1+u_1-u_2)}\right)e^{-\epsilon_1}d\epsilon_1\\
&=\int_{-\infty}^{\infty}\exp\left(-e^{-\epsilon_1}\left[1+e^{u_2-u_1}\right]\right)e^{-\epsilon_1}d\epsilon_1
\end{align*}`

- We can simplify by U-substitution where `\(t=\exp(-\epsilon_1)\)` and `\(dt=-\exp(-\epsilon_1)d\epsilon_1\)`

- And adjusting the bounds of integration accordingly, `\(\exp(-\infty)=0\)` and `\(\exp(\infty)=\infty\)`

---

# Logit Derivation
 
- Substituting in then yields:

`\begin{align*}
\Pr(d_1=1)&=\int_{\infty}^0\exp\left(-t\left[1+e^{(u_2-u_1)}\right]\right)(-dt)\\
&=\int_0^{\infty}\exp\left(-t\left[1+e^{(u_2-u_1)}\right]\right)dt\\
&=\left.\frac{\exp\left(-t\left[1+e^{(u_2-u_1)}\right]\right)}{-\left[1+e^{(u_2-u_1)}\right]}\right\vert^{\infty}_{0}\\
&=0-\frac{1}{-\left[1+e^{(u_2-u_1)}\right]}\\
&=\frac{\exp(u_1)}{\exp(u_1)+\exp(u_2)}
\end{align*}`

---

# Logit Estimation
- Consider our model from before:
`\begin{align*}
u_{i1}-u_{i2}=&(\beta_1-\beta_2)X_i+\gamma (Z_1-Z_2)
\end{align*}`

- We observe `\(X_i\)`, `\(Z_1\)`, `\(Z_2\)`, and `\(d_i\)`

- Assuming `\(\epsilon_1,\epsilon_2 \overset{iid}{\sim} T1EV\)` gives the likelihood of choosing `\(1\)` and `\(2\)` respectively as:
`\begin{align*}
P_{i1}=&\frac{\exp(u_{i1}-u_{i2})}{1+\exp(u_{i1}-u_{i2})}\\
P_{i2}=&\frac{1}{1+\exp(u_{i1}-u_{i2})}
\end{align*}`

- Note: if `\(\epsilon_1,\epsilon_2 \overset{iid}{\sim} T1EV\)` then  `\(\tilde{\epsilon}_1 \sim Logistic\)`, where `\(\tilde{\epsilon}_1 := \epsilon_1-\epsilon_2\)`

---

# Likelihood function
- We can view the event `\(d_i = j\)` as a weighted coin flip

- This gives us a random variable that follows the Bernoulli distribution

- Supposing our sample is of size `\(N\)`, the likelihood function would then be
`\begin{align}
\mathcal{L}\left(X,Z;\beta,\gamma\right)=&\prod_{i=1}^N P_{i1}^{d_{i1}} P_{i2}^{d_{i2}} \nonumber\\
=&\prod_{i=1}^N P_{i1}^{d_{i1}}\left[1-P_{i1}\right]^{(1-d_{i1})}\label{eq:logitlike}
\end{align}`

where `\(P_{i1}\)` and `\(P_{i2}\)` are both functions of `\(X,Z,\beta,\gamma\)`

---

# Log likelihood function

- For many reasons, it's better to maximize the log likelihood function

- Taking the log of \eqref{eq:logitlike} gives

`\begin{align}
\ell\left(X,Z;\beta,\gamma\right)=&\sum_{i=1}^N d_{i1}\log P_{i1} + (1-d_{i1}) \log \left(1-P_{i1}\right)\nonumber\\
=&\sum_{i=1}^N \sum_{j=1}^2 d_{ij}\log P_{ij}\label{eq:logitloglike}\\
=&\sum_{i=1}^N d_{i1}\left[\log \left(\exp (u_{i1}-u_{i2})\right)-\log\left(1 + \exp(u_{i1}-u_{i2})\right)\right] + \nonumber\\
&(1-d_{i1}) \left[\log \left(1\right)-\log \left(1 + \exp (u_{i1}-u_{i2})\right)\right]\nonumber\\
=&\sum_{i=1}^N d_{i1} \left[u_{i1}-u_{i2}\right]-\log\left(1+\exp(u_{i1}-u_{i2})\right)\nonumber
\end{align}`

---

# Multinomial Logit Estimation
.small[

- Adding more choices with i.i.d. Type I extreme value errors yields the .hi[multinomial logit]

- Normalizing with respect to alternative `\(J\)` we have (for `\(j\in\{1,\ldots,J-1\}\)`)
`\begin{align}
u_{ij}-u_{iJ}=&(\beta_j-\beta_J)X_i+\gamma (Z_j-Z_{J})
\end{align}`

- We observe `\(X_i, Z_1, \ldots, Z_J\)`, and `\(d_i\)`. The likelihood of choosing `\(j\)` and `\(J\)` respectively is:
`\begin{align}
P_{ij}&=\frac{\exp(u_{ij}-u_{iJ})}{1+\sum_{j'=1}^{J-1}\exp(u_{ij'}-u_{iJ})},&P_{iJ}=\frac{1}{1+\sum_{j'=1}^{J-1}\exp(u_{ij'}-u_J)}
\end{align}`

- The log likelihood function we maximize is then:
`\begin{align}
\ell(X,Z;\beta,\gamma)=&\sum_{i=1}^N\left[\sum_{j=1}^{J-1}(d_{ij}=1)(u_{ij}-u_{iJ})\right]-\ln\left(1+\sum_{j'=1}^{J-1}\exp(u_{ij'}-u_{iJ})\right)
\end{align}`
]

---

# Independence of Irrelevant Alternatives (IIA)
- One of the properties of the multinomial logit model is .hi[IIA]

- `\(P_{ij}/P_{ik}\)` does not depend upon what other alternatives are available:
`\begin{align*}
\frac{P_{ij}}{P_{ik}}&=\frac{e^{u_{ij}}/\sum_{j'}e^{u_{ij'}}}{e^{u_{ik}}/\sum_{j'}e^{u_{ij'}}}\\
&=\frac{e^{u_{ij}}}{e^{u_{ik}}}\\
&=e^{u_{ij}-u_{ik}}
\end{align*}`

---

# Advantage of IIA

- IIA can simplify estimation. Instead of using as our likelihood
`\begin{align}
P_{ij}=&\frac{\exp(u_{ij})}{\sum_{j'}^J\exp(u_{ij'})},
\end{align}`
we can use the _conditional likelihood_ `\(P_i(j|j\in K)\)` where `\(K<J\)`.

- The log likelihood function is then  (Begg and Gray, 1984):
`\begin{align*}
L(\beta,\gamma|d_i\in K)&=\sum_{i=1}^N\left[\sum_{j=1}^{K-1}(d_{ij}=1)(u_{ij}-u_{iK})\right]\\
&-\ln\left(1+\sum_{j'=1}^K\exp(u_{ij'}-u_{iK})\right)
\end{align*}`

---

# Disadvantage of IIA

Most famously illustrated by the "red bus/blue bus problem"

- Consider a commuter with the choice set `\(\{\text{ride a blue-colored bus}, \text{drive a car}\}\)`

- Now add a red-colored bus to the choice set

- Assume that the only difference in utility between a red bus and a blue bus is in `\(\epsilon\)`

- This will .hi[double] the probability of taking a bus

- Why? `\(P(\text{blue bus})/P(\text{car})\)` does not depend upon whether the red bus is available

We'll talk later about how to address this disadvantage

---

# Expected Utility 
- It is possible to move from the estimates of the utility function to expected utility

- (or at least differences in expected utility)
 
- Individual `\(i\)` is going to choose the best alternative

- Thus, expected utility from the best choice, `\(V_i\)`, is given by:
`\begin{align*}
V_i&=E\max_{j}(u_{ij}+\epsilon_{ij})
\end{align*}`
where the expectation is over all possible values of `\(\epsilon _{ij}\)`

---

# Expected Utility

- For the multinomial logit, this has a closed form:
`\begin{align}
V_i&=\ln\left(\sum_{j=1}^J\exp{u_{ij}}\right)+C
\end{align}`
where `\(C\)` is Euler's constant (a.k.a. Euler-Mascheroni constant)

- We will use this later when we discuss dynamic discrete choice models

---

# Alternative expression for expected utility

- Note that we can also express `\(V_i\)` as:
`\begin{align*}
V_i&=\ln\left(\sum_{j=1}^J\frac{\exp(u_{iJ})\exp(u_{ij})}{\exp(u_{iJ})}\right)+C\\
&=\ln\left(\sum_{j=1}^J\exp(u_{ij}-u_{iJ})\right)+u_{iJ}+C\\
&=\ln\left(1+\sum_{j=1}^{J-1}\exp(u_{ij}-u_{iJ})\right)+u_{iJ}+C
\end{align*}`

- This representation will become useful later in the course

---

# From Expected Utility to Consumer Surplus
- We may want to transform utility into dollars to get consumer surplus

- We need something in the utility function (such as price) that is measured in dollars

- Suppose `\(u_{ij}=\beta_jX_i+\gamma Z_j-\delta p_j\)`

- The coefficient on price, `\(\delta\)` then gives the utils-to-dollars conversion:
`\begin{align}
E(CS_i)=&\frac{1}{\delta}\left[\ln\left(\sum_{j=1}^J\exp{u_{ij}}\right)+C\right]
\end{align}`
- We can calculate the change in consumer surplus after a policy change as `\(E(CS_{i2})-E(CS_{i1})\)` where the `\(C\)`'s cancel out

---

# References
.smallest[
Ackerberg, D. A. (2003). "Advertising, Learning, and Consumer Choice in Experience Good
Markets: An Empirical Examination". In: _International Economic Review_ 44.3, pp.
1007-1040. DOI:
[10.1111/1468-2354.t01-2-00098](https://doi.org/10.1111%2F1468-2354.t01-2-00098).

Adams, R. P. (2018). _Model Selection and Cross Validation_. Lecture Notes. Princeton
University. URL:
[https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/model-selection.pdf](https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/model-selection.pdf).

Ahlfeldt, G. M., S. J. Redding, D. M. Sturm, et al. (2015). "The Economics of Density:
Evidence From the Berlin Wall". In: _Econometrica_ 83.6, pp. 2127-2189. DOI:
[10.3982/ECTA10876](https://doi.org/10.3982%2FECTA10876).

Altonji, J. G., T. E. Elder, and C. R. Taber (2005). "Selection on Observed and Unobserved
Variables: Assessing the Effectiveness of Catholic Schools". In: _Journal of Political
Economy_ 113.1, pp. 151-184. DOI: [10.1086/426036](https://doi.org/10.1086%2F426036).

Altonji, J. G. and C. R. Pierret (2001). "Employer Learning and Statistical
Discrimination". In: _Quarterly Journal of Economics_ 116.1, pp. 313-350. DOI:
[10.1162/003355301556329](https://doi.org/10.1162%2F003355301556329).

Angrist, J. D. and A. B. Krueger (1991). "Does Compulsory School Attendance Affect
Schooling and Earnings?" In: _Quarterly Journal of Economics_ 106.4, pp. 979-1014. DOI:
[10.2307/2937954](https://doi.org/10.2307%2F2937954).

Angrist, J. D. and J. Pischke (2009). _Mostly Harmless Econometrics: An Empiricist's
Companion_. Princeton University Press. ISBN: 0691120358.

Arcidiacono, P. (2004). "Ability Sorting and the Returns to College Major". In: _Journal
of Econometrics_ 121, pp. 343-375. DOI:
[10.1016/j.jeconom.2003.10.010](https://doi.org/10.1016%2Fj.jeconom.2003.10.010).

Arcidiacono, P., E. Aucejo, A. Maurel, et al. (2016). _College Attrition and the Dynamics
of Information Revelation_. Working Paper. Duke University. URL:
[https://tyleransom.github.io/research/CollegeDropout2016May31.pdf](https://tyleransom.github.io/research/CollegeDropout2016May31.pdf).

Arcidiacono, P., E. Aucejo, A. Maurel, et al. (2025). "College Attrition and the Dynamics
of Information Revelation". In: _Journal of Political Economy_ 133.1. DOI:
[10.1086/732526](https://doi.org/10.1086%2F732526).

Arcidiacono, P. and J. B. Jones (2003). "Finite Mixture Distributions, Sequential
Likelihood and the EM Algorithm". In: _Econometrica_ 71.3, pp. 933-946. DOI:
[10.1111/1468-0262.00431](https://doi.org/10.1111%2F1468-0262.00431).

Arcidiacono, P., J. Kinsler, and T. Ransom (2022b). "Asian American Discrimination in
Harvard Admissions". In: _European Economic Review_ 144, p. 104079. DOI:
[10.1016/j.euroecorev.2022.104079](https://doi.org/10.1016%2Fj.euroecorev.2022.104079).

Arcidiacono, P., J. Kinsler, and T. Ransom (2022a). "Legacy and Athlete Preferences at
Harvard". In: _Journal of Labor Economics_ 40.1, pp. 133-156. DOI:
[10.1086/713744](https://doi.org/10.1086%2F713744).

Arcidiacono, P. and R. A. Miller (2011). "Conditional Choice Probability Estimation of
Dynamic Discrete Choice Models With Unobserved Heterogeneity". In: _Econometrica_ 79.6,
pp. 1823-1867. DOI: [10.3982/ECTA7743](https://doi.org/10.3982%2FECTA7743).

Arroyo Marioli, F., F. Bullano, S. Kucinskas, et al. (2020). _Tracking R of COVID-19: A
New Real-Time Estimation Using the Kalman Filter_. Working Paper. medRxiv. DOI:
[10.1101/2020.04.19.20071886](https://doi.org/10.1101%2F2020.04.19.20071886).

Ashworth, J., V. J. Hotz, A. Maurel, et al. (2021). "Changes across Cohorts in Wage
Returns to Schooling and Early Work Experiences". In: _Journal of Labor Economics_ 39.4,
pp. 931-964. DOI: [10.1086/711851](https://doi.org/10.1086%2F711851).

Attanasio, O. P., C. Meghir, and A. Santiago (2011). "Education Choices in Mexico: Using a
Structural Model and a Randomized Experiment to Evaluate PROGRESA". In: _Review of
Economic Studies_ 79.1, pp. 37-66. DOI:
[10.1093/restud/rdr015](https://doi.org/10.1093%2Frestud%2Frdr015).

Aucejo, E. M. and J. James (2019). "Catching Up to Girls: Understanding the Gender
Imbalance in Educational Attainment Within Race". In: _Journal of Applied Econometrics_
34.4, pp. 502-525. DOI: [10.1002/jae.2699](https://doi.org/10.1002%2Fjae.2699).

Baragatti, M., A. Grimaud, and D. Pommeret (2013). "Likelihood-free Parallel Tempering".
In: _Statistics and Computing_ 23.4, pp. 535-549. DOI: [
10.1007/s11222-012-9328-6](https://doi.org/%2010.1007%2Fs11222-012-9328-6).

Bayer, P., R. McMillan, A. Murphy, et al. (2016). "A Dynamic Model of Demand for Houses
and Neighborhoods". In: _Econometrica_ 84.3, pp. 893-942. DOI:
[10.3982/ECTA10170](https://doi.org/10.3982%2FECTA10170).

Begg, C. B. and R. Gray (1984). "Calculation of Polychotomous Logistic Regression
Parameters Using Individualized Regressions". In: _Biometrika_ 71.1, pp. 11-18. DOI:
[10.1093/biomet/71.1.11](https://doi.org/10.1093%2Fbiomet%2F71.1.11).

Beggs, S. D., N. S. Cardell, and J. Hausman (1981). "Assessing the Potential Demand for
Electric Cars". In: _Journal of Econometrics_ 17.1, pp. 1-19. DOI:
[10.1016/0304-4076(81)90056-7](https://doi.org/10.1016%2F0304-4076%2881%2990056-7).

Berry, S., J. Levinsohn, and A. Pakes (1995). "Automobile Prices in Market Equilibrium".
In: _Econometrica_ 63.4, pp. 841-890. URL:
[http://www.jstor.org/stable/2171802](http://www.jstor.org/stable/2171802).

Blass, A. A., S. Lach, and C. F. Manski (2010). "Using Elicited Choice Probabilities to
Estimate Random Utility Models: Preferences for Electricity Reliability". In:
_International Economic Review_ 51.2, pp. 421-440. DOI:
[10.1111/j.1468-2354.2010.00586.x](https://doi.org/10.1111%2Fj.1468-2354.2010.00586.x).

Blundell, R. (2010). "Comments on: ``Structural vs. Atheoretic Approaches to
Econometrics'' by Michael Keane". In: _Journal of Econometrics_ 156.1, pp. 25-26. DOI:
[10.1016/j.jeconom.2009.09.005](https://doi.org/10.1016%2Fj.jeconom.2009.09.005).

Bresnahan, T. F., S. Stern, and M. Trajtenberg (1997). "Market Segmentation and the
Sources of Rents from Innovation: Personal Computers in the Late 1980s". In: _The RAND
Journal of Economics_ 28.0, pp. S17-S44. DOI:
[10.2307/3087454](https://doi.org/10.2307%2F3087454).

Brien, M. J., L. A. Lillard, and S. Stern (2006). "Cohabitation, Marriage, and Divorce in
a Model of Match Quality". In: _International Economic Review_ 47.2, pp. 451-494. DOI:
[10.1111/j.1468-2354.2006.00385.x](https://doi.org/10.1111%2Fj.1468-2354.2006.00385.x).

Card, D. (1995). "Using Geographic Variation in College Proximity to Estimate the Return
to Schooling". In: _Aspects of Labor Market Behaviour: Essays in Honour of John
Vanderkamp_. Ed. by L. N. Christofides, E. K. Grant and R. Swidinsky. Toronto: University
of Toronto Press.

Cardell, N. S. (1997). "Variance Components Structures for the Extreme-Value and Logistic
Distributions with Application to Models of Heterogeneity". In: _Econometric Theory_ 13.2,
pp. 185-213. URL:
[https://www.jstor.org/stable/3532724](https://www.jstor.org/stable/3532724).

Caucutt, E. M., L. Lochner, J. Mullins, et al. (2020). _Child Skill Production: Accounting
for Parental and Market-Based Time and Goods Investments_. Working Paper 27838. National
Bureau of Economic Research. DOI: [10.3386/w27838](https://doi.org/10.3386%2Fw27838).

Chen, X., H. Hong, and D. Nekipelov (2011). "Nonlinear Models of Measurement Errors". In:
_Journal of Economic Literature_ 49.4, pp. 901-937. DOI:
[10.1257/jel.49.4.901](https://doi.org/10.1257%2Fjel.49.4.901).

Chintagunta, P. K. (1992). "Estimating a Multinomial Probit Model of Brand Choice Using
the Method of Simulated Moments". In: _Marketing Science_ 11.4, pp. 386-407. DOI:
[10.1287/mksc.11.4.386](https://doi.org/10.1287%2Fmksc.11.4.386).

Cinelli, C. and C. Hazlett (2020). "Making Sense of Sensitivity: Extending Omitted
Variable Bias". In: _Journal of the Royal Statistical Society: Series B (Statistical
Methodology)_ 82.1, pp. 39-67. DOI:
[10.1111/rssb.12348](https://doi.org/10.1111%2Frssb.12348).

Coate, P. and K. Mangum (2019). _Fast Locations and Slowing Labor Mobility_. Working Paper
19-49. Federal Reserve Bank of Philadelphia.

Cunha, F., J. J. Heckman, and S. M. Schennach (2010). "Estimating the Technology of
Cognitive and Noncognitive Skill Formation". In: _Econometrica_ 78.3, pp. 883-931. DOI:
[10.3982/ECTA6551](https://doi.org/10.3982%2FECTA6551).

Cunningham, S. (2021). _Causal Inference: The Mixtape_. Yale University Press. URL:
[https://www.scunning.com/causalinference_norap.pdf](https://www.scunning.com/causalinference_norap.pdf).

Delavande, A. and C. F. Manski (2015). "Using Elicited Choice Probabilities in
Hypothetical Elections to Study Decisions to Vote". In: _Electoral Studies_ 38, pp. 28-37.
DOI: [10.1016/j.electstud.2015.01.006](https://doi.org/10.1016%2Fj.electstud.2015.01.006).

Delavande, A. and B. Zafar (2019). "University Choice: The Role of Expected Earnings,
Nonpecuniary Outcomes, and Financial Constraints". In: _Journal of Political Economy_
127.5, pp. 2343-2393. DOI: [10.1086/701808](https://doi.org/10.1086%2F701808).

Diegert, P., M. A. Masten, and A. Poirier (2025). _Assessing Omitted Variable Bias when
the Controls are Endogenous_. arXiv. DOI:
[10.48550/ARXIV.2206.02303](https://doi.org/10.48550%2FARXIV.2206.02303).

Erdem, T. and M. P. Keane (1996). "Decision-Making under Uncertainty: Capturing Dynamic
Brand Choice Processes in Turbulent Consumer Goods Markets". In: _Marketing Science_ 15.1,
pp. 1-20. DOI: [10.1287/mksc.15.1.1](https://doi.org/10.1287%2Fmksc.15.1.1).

Evans, R. W. (2018). _Simulated Method of Moments (SMM) Estimation_. QuantEcon Note.
University of Chicago. URL:
[https://notes.quantecon.org/submission/5b3db2ceb9eab00015b89f93](https://notes.quantecon.org/submission/5b3db2ceb9eab00015b89f93).

Farber, H. S. and R. Gibbons (1996). "Learning and Wage Dynamics". In: _Quarterly Journal
of Economics_ 111.4, pp. 1007-1047. DOI:
[10.2307/2946706](https://doi.org/10.2307%2F2946706).

Fu, C., N. Grau, and J. Rivera (2020). _Wandering Astray: Teenagers' Choices of Schooling
and Crime_. Working Paper. University of Wisconsin-Madison. URL:
[https://www.ssc.wisc.edu/~cfu/wander.pdf](https://www.ssc.wisc.edu/~cfu/wander.pdf).

Gillingham, K., F. Iskhakov, A. Munk-Nielsen, et al. (2022). "Equilibrium Trade in
Automobiles". In: _Journal of Political Economy_. DOI:
[10.1086/720463](https://doi.org/10.1086%2F720463).

Haile, P. (2019). _``Structural vs. Reduced Form'' Language and Models in Empirical
Economics_. Lecture Slides. Yale University. URL:
[http://www.econ.yale.edu/~pah29/intro.pdf](http://www.econ.yale.edu/~pah29/intro.pdf).

Haile, P. (2024). _Models, Measurement, and the Language of Empirical Economics_. Lecture
Slides. Yale University. URL:
[https://www.dropbox.com/s/8kwtwn30dyac18s/intro.pdf](https://www.dropbox.com/s/8kwtwn30dyac18s/intro.pdf).

Heckman, J. J., J. Stixrud, and S. Urzua (2006). "The Effects of Cognitive and
Noncognitive Abilities on Labor Market Outcomes and Social Behavior". In: _Journal of
Labor Economics_ 24.3, pp. 411-482. DOI:
[10.1086/504455](https://doi.org/10.1086%2F504455).

Hotz, V. J. and R. A. Miller (1993). "Conditional Choice Probabilities and the Estimation
of Dynamic Models". In: _The Review of Economic Studies_ 60.3, pp. 497-529. DOI:
[10.2307/2298122](https://doi.org/10.2307%2F2298122).

Hurwicz, L. (1950). "Generalization of the Concept of Identification". In: _Statistical
Inference in Dynamic Economic Models_. Hoboken, NJ: John Wiley and Sons, pp. 245-257.

Ishimaru, S. (2022). _Geographic Mobility of Youth and Spatial Gaps in Local College and
Labor Market Opportunities_. Working Paper. Hitotsubashi University.

James, J. (2011). _Ability Matching and Occupational Choice_. Working Paper 11-25. Federal
Reserve Bank of Cleveland.

James, J. (2017). "MM Algorithm for General Mixed Multinomial Logit Models". In: _Journal
of Applied Econometrics_ 32.4, pp. 841-857. DOI:
[10.1002/jae.2532](https://doi.org/10.1002%2Fjae.2532).

Jin, H. and H. Shen (2020). "Foreign Asset Accumulation Among Emerging Market Economies: A
Case for Coordination". In: _Review of Economic Dynamics_ 35.1, pp. 54-73. DOI:
[10.1016/j.red.2019.04.006](https://doi.org/10.1016%2Fj.red.2019.04.006).

Keane, M. P. (2010). "Structural vs. Atheoretic Approaches to Econometrics". In: _Journal
of Econometrics_ 156.1, pp. 3-20. DOI:
[10.1016/j.jeconom.2009.09.003](https://doi.org/10.1016%2Fj.jeconom.2009.09.003).

Keane, M. P. and K. I. Wolpin (1997). "The Career Decisions of Young Men". In: _Journal of
Political Economy_ 105.3, pp. 473-522. DOI:
[10.1086/262080](https://doi.org/10.1086%2F262080).

Koopmans, T. C. and O. Reiersol (1950). "The Identification of Structural
Characteristics". In: _The Annals of Mathematical Statistics_ 21.2, pp. 165-181. URL:
[http://www.jstor.org/stable/2236899](http://www.jstor.org/stable/2236899).

Kosar, G., T. Ransom, and W. van der Klaauw (2022). "Understanding Migration Aversion
Using Elicited Counterfactual Choice Probabilities". In: _Journal of Econometrics_ 231.1,
pp. 123-147. DOI:
[10.1016/j.jeconom.2020.07.056](https://doi.org/10.1016%2Fj.jeconom.2020.07.056).

Krauth, B. (2016). "Bounding a Linear Causal Effect Using Relative Correlation
Restrictions". In: _Journal of Econometric Methods_ 5.1, pp. 117-141. DOI:
[10.1515/jem-2013-0013](https://doi.org/10.1515%2Fjem-2013-0013).

Lang, K. and M. D. Palacios (2018). _The Determinants of Teachers' Occupational Choice_.
Working Paper 24883. National Bureau of Economic Research. DOI:
[10.3386/w24883](https://doi.org/10.3386%2Fw24883).

Lee, D. S., J. McCrary, M. J. Moreira, et al. (2020). _Valid t-ratio Inference for IV_.
Working Paper. arXiv. URL:
[https://arxiv.org/abs/2010.05058](https://arxiv.org/abs/2010.05058).

Lewbel, A. (2019). "The Identification Zoo: Meanings of Identification in Econometrics".
In: _Journal of Economic Literature_ 57.4, pp. 835-903. DOI:
[10.1257/jel.20181361](https://doi.org/10.1257%2Fjel.20181361).

Mahoney, N. (2022). "Principles for Combining Descriptive and Model-Based Analysis in
Applied Microeconomics Research". In: _Journal of Economic Perspectives_ 36.3, pp. 211-22.
DOI: [10.1257/jep.36.3.211](https://doi.org/10.1257%2Fjep.36.3.211).

McFadden, D. (1978). "Modelling the Choice of Residential Location". In: _Spatial
Interaction Theory and Planning Models_. Ed. by A. Karlqvist, L. Lundqvist, F. Snickers
and J. W. Weibull. Amsterdam: North Holland, pp. 75-96.

McFadden, D. (1989). "A Method of Simulated Moments for Estimation of Discrete Response
Models Without Numerical Integration". In: _Econometrica_ 57.5, pp. 995-1026. DOI:
[10.2307/1913621](https://doi.org/10.2307%2F1913621). URL:
[http://www.jstor.org/stable/1913621](http://www.jstor.org/stable/1913621).

Mellon, J. (2020). _Rain, Rain, Go Away: 137 Potential Exclusion-Restriction Violations
for Studies Using Weather as an Instrumental Variable_. Working Paper. University of
Manchester. URL:
[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3715610](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3715610).

Miller, R. A. (1984). "Job Matching and Occupational Choice". In: _Journal of Political
Economy_ 92.6, pp. 1086-1120. DOI: [10.1086/261276](https://doi.org/10.1086%2F261276).

Mincer, J. (1974). _Schooling, Experience and Earnings_. New York: Columbia University
Press for National Bureau of Economic Research.

Ost, B., W. Pan, and D. Webber (2018). "The Returns to College Persistence for Marginal
Students: Regression Discontinuity Evidence from University Dismissal Policies". In:
_Journal of Labor Economics_ 36.3, pp. 779-805. DOI:
[10.1086/696204](https://doi.org/10.1086%2F696204).

Oster, E. (2019). "Unobservable Selection and Coefficient Stability: Theory and Evidence".
In: _Journal of Business & Economic Statistics_ 37.2, pp. 187-204. DOI:
[10.1080/07350015.2016.1227711](https://doi.org/10.1080%2F07350015.2016.1227711).

Pischke, S. (2007). _Lecture Notes on Measurement Error_. Lecture Notes. London School of
Economics. URL:
[http://econ.lse.ac.uk/staff/spischke/ec524/Merr_new.pdf](http://econ.lse.ac.uk/staff/spischke/ec524/Merr_new.pdf).

Ransom, M. R. and T. Ransom (2018). "Do High School Sports Build or Reveal Character?
Bounding Causal Estimates of Sports Participation". In: _Economics of Education Review_
64, pp. 75-89. DOI:
[10.1016/j.econedurev.2018.04.002](https://doi.org/10.1016%2Fj.econedurev.2018.04.002).

Ransom, T. (2022). "Labor Market Frictions and Moving Costs of the Employed and
Unemployed". In: _Journal of Human Resources_ 57.S, pp. S137-S166. DOI:
[10.3368/jhr.monopsony.0219-10013R2](https://doi.org/10.3368%2Fjhr.monopsony.0219-10013R2).

Rudik, I. (2020). "Optimal Climate Policy When Damages Are Unknown". In: _American
Economic Journal: Economic Policy_ 12.2, pp. 340-373. DOI:
[10.1257/pol.20160541](https://doi.org/10.1257%2Fpol.20160541).

Rust, J. (1987). "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold
Zurcher". In: _Econometrica_ 55.5, pp. 999-1033. URL:
[http://www.jstor.org/stable/1911259](http://www.jstor.org/stable/1911259).

Shalizi, C. R. (2019). _Advanced Data Analysis from an Elementary Point of View_.
Cambridge University Press. URL:
[http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf](http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf).

Smith Jr., A. A. (2008). "Indirect Inference". In: _The New Palgrave Dictionary of
Economics_. Ed. by S. N. Durlauf and L. E. Blume. Vol. 1-8. London: Palgrave Macmillan.
DOI: [10.1007/978-1-349-58802-2](https://doi.org/10.1007%2F978-1-349-58802-2). URL:
[http://www.econ.yale.edu/smith/palgrave7.pdf](http://www.econ.yale.edu/smith/palgrave7.pdf).

Stinebrickner, R. and T. Stinebrickner (2014a). "Academic Performance and College Dropout:
Using Longitudinal Expectations Data to Estimate a Learning Model". In: _Journal of Labor
Economics_ 32.3, pp. 601-644. DOI: [10.1086/675308](https://doi.org/10.1086%2F675308).

Stinebrickner, R. and T. R. Stinebrickner (2014b). "A Major in Science? Initial Beliefs
and Final Outcomes for College Major and Dropout". In: _Review of Economic Studies_ 81.1,
pp. 426-472. DOI: [10.1093/restud/rdt025](https://doi.org/10.1093%2Frestud%2Frdt025).

Su, C. and K. L. Judd (2012). "Constrained Optimization Approaches to Estimation of
Structural Models". In: _Econometrica_ 80.5, pp. 2213-2230. DOI:
[10.3982/ECTA7925](https://doi.org/10.3982%2FECTA7925).

Train, K. (2009). _Discrete Choice Methods with Simulation_. 2nd ed. Cambridge; New York:
Cambridge University Press. ISBN: 9780521766555.

Wiswall, M. and B. Zafar (2018). "Preference for the Workplace, Investment in Human
Capital, and Gender". In: _Quarterly Journal of Economics_ 133.1, pp. 457-507. DOI:
[10.1093/qje/qjx035](https://doi.org/10.1093%2Fqje%2Fqjx035).

Young, A. (2020). _Consistency without Inference: Instrumental Variables in Practical
Application_. Working Paper. London School of Economics.
]