class: center, middle, inverse, title-slide

# Lecture 10

## Model Fit, Validation and Counterfactual Simulations

### Tyler Ransom

### ECON 6343, University of Oklahoma

---

# Plan for the Day

1. Discuss how to assess model fit

2. Model validation

3. Counterfactual simulations

4. Go through examples from a couple of recent papers

    - Fu, Grau, and Rivera (2020)

    - Lang and Palacios (2018)

---

# Steps to producing a "structural" estimation

- We discussed the structural workflow in our 3rd class meeting

- Since that class meeting, we've focused mainly on estimation

- We also briefly talked about identification

- But there are still some steps we haven't covered

- Counterfactuals are where you actually answer your research question

- And before counterfactuals you need to show your model fits the data

---

# Model Fit

- Model fit consists of showing that your model can match the data

- This is because the counterfactuals rely solely on the model

- Most model fit is _ad hoc_: you can choose what to show

- Typically, you'll want to show that moments of the `\(Y\)` variables match

    - e.g. choice frequencies in the data match choice probabilities of the model

- But there may be other moments of interest, like dynamic choice transitions

---

# Example from Fu, Grau, and Rivera (2020)

- In this paper, the authors show the following model fit statistics:

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Variable </th>
   <th style="text-align:right;"> Data </th>
   <th style="text-align:right;"> Model </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Ever arrested % </td>
   <td style="text-align:right;"> 5.60 </td>
   <td style="text-align:right;"> 5.10 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Always enrolled, 0 arrest % </td>
   <td style="text-align:right;"> 72.20 </td>
   <td style="text-align:right;"> 71.10 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> GPA (standardized) </td>
   <td style="text-align:right;"> -0.39 </td>
   <td style="text-align:right;"> -0.43 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Retention % </td>
   <td style="text-align:right;"> 7.30 </td>
   <td style="text-align:right;"> 8.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Grade Completed by T </td>
   <td style="text-align:right;"> 11.10 </td>
   <td style="text-align:right;"> 11.20 </td>
  </tr>
</tbody>
</table>

- These results are for youth with parents who are not well educated

- (This group is the most at risk to be arrested or drop out of school)

- Model fit for other groups is similar

---

# How to do model fit

There are a couple of options for producing model fit statistics

1. Use the model assumptions and estimates to simulate a new dataset

2. Use the data and estimates to compute `\(\hat{Y}\)` and compare with `\(Y\)`

---

# Simulating a new dataset

- To simulate a new dataset, you simply draw the unobservables from the model

- Use any initial conditions from the data

- Then (if the model is dynamic) simply draw `\(Y\)` in `\(t=1\)`

- Then update the `\(X\)`'s in `\(t=2\)` and draw `\(Y\)` in `\(t=2\)`, ...

- If you used SMM to estimate the model, you should already have code for this (a minimal sketch follows on the next slide)

- Typically you want to repeat this process a few times

- This is because you don't want model fit to be subject to simulation error
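---

# Coding example: simulating choices (sketch)

- A minimal sketch, not code from the problem sets: it assumes you already have an `\(N \times J\)` matrix of model-implied choice probabilities (the name `probs` and the function below are hypothetical)

```julia
using Random

# Draw simulated choices R times from an N×J matrix of choice probabilities
# and return the percent choosing each alternative; averaging over the R
# replications keeps model fit from being driven by simulation error
function simulate_choices(probs::AbstractMatrix; R::Int = 50)
    N, J = size(probs)
    freq = zeros(J)
    for r in 1:R, i in 1:N
        cp = cumsum(probs[i, :])                      # CDF over the J alternatives
        c  = searchsortedfirst(cp, rand() * cp[end])  # inverse-CDF draw of the choice
        freq[c] += 1
    end
    return 100 .* freq ./ (N * R)                     # simulated choice frequencies (%)
end

# toy example: made-up probabilities for N = 2 observations, J = 3 alternatives
Random.seed!(1234)
simulate_choices([0.2 0.5 0.3; 0.6 0.1 0.3])
```

- In a dynamic model, the same idea applies period-by-period: draw `\(Y_t\)`, update the `\(X\)`'s, and repeat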
---

# Assessing model fit from the data and estimates

- A simpler option is to predict the `\(\hat{Y}\)` using the actual data

- This is essentially the same as using a `predict()` function (like in PS6)

- However, this approach may not be possible in a model with key latent variables

- For example, unobserved types or other unobservables are not in the data

- In this case, you have no choice but to go the simulation route

---

# Model Validation

- Even if your model fits the data well, some may still be skeptical

- This is because, in most cases, you used your entire dataset in estimation

- If you do this, you open yourself up to the possibility of .hi[overfitting]

- .hi[Model validation] is a way of showing that your model is not overfit

- To do so, you show that the model reproduces key stats in a "holdout sample"

- This sample was "held out" of the estimation for purposes of validation

---

# Mini aside on machine learning

Machine learning is all about automating two hand-in-hand processes:

1. Model selection

    - What should the specification be?

2. Model validation

    - Does the model generalize to other (similar) contexts?

- The goal of machine learning is to automate these processes to maximize prediction

- This is different from the goal of econometrics (causal inference)

---

# Model selection

- In classical econometrics, we always select our models by hand

- This is what the famed (and sometimes maligned) "robustness checks" address

- In machine learning, we let the computer tell us which specification is "best"

- "Best" is defined by minimizing both in- and out-of-sample prediction error

- This involves estimating many different specifications

- For complex models, automated model selection is computationally intensive

- Rust (1987) did this by hand when exploring the shape of the cost function

---

# Model validation

- Broadly speaking, model validation is about whether the model "makes sense"

- It can be qualitative ("do the parameter results make sense? conform to theory?")

- It can also be quantitative ("is the prediction error sufficiently low?")

- In machine learning, .hi[cross-validation] is typically used

- Cross-validation informs us about the optimal complexity of the model

- Then we (can) automate our specification search subject to that optimal complexity

- Automated specification search is still not commonly done in structural estimation

---

# How cross-validation works (Adams, 2018)

.center[]

- Blue is the data that we use to estimate the model's parameters

- We randomly hold out `\(K\)` portions of this data one-at-a-time (Green boxes)

- We assess the performance of the model in the Green data

- This tells us the optimal complexity (indexed by "hyperparameters" if CV is automated)

- A minimal sketch of this procedure appears on the next slide
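---

# Coding example: K-fold cross-validation (sketch)

- A minimal sketch with made-up data, not course code: it picks the degree of a polynomial regression (the "complexity" hyperparameter) by 5-fold CV; every name and number below is hypothetical

```julia
using Random, Statistics

Random.seed!(1234)
N = 500
x = randn(N)
y = sin.(x) .+ 0.3 .* randn(N)                              # made-up nonlinear DGP

K     = 5
folds = shuffle(repeat(1:K, outer = ceil(Int, N/K))[1:N])   # random fold labels

# average held-out MSE of a degree-`deg` polynomial fit
function cv_error(deg)
    errs = zeros(K)
    for k in 1:K
        tr, te = folds .!= k, folds .== k
        Xtr = hcat((x[tr] .^ p for p in 0:deg)...)   # design matrix, estimation folds
        Xte = hcat((x[te] .^ p for p in 0:deg)...)   # design matrix, held-out fold
        β   = Xtr \ y[tr]                            # OLS on the K-1 estimation folds
        errs[k] = mean((y[te] .- Xte * β).^2)        # prediction error in held-out fold
    end
    return mean(errs)
end

[cv_error(d) for d in 1:6]    # choose the degree with the smallest CV error
```

- The chosen complexity (here, the polynomial degree) is then held fixed when estimating on the full sample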
---

# Validation in Lang and Palacios (2018)

- A common question is how much unobserved heterogeneity to allow for

- This is not _ex ante_ knowable

- The typical approach is to put "enough" in there such that the in-sample fit is good

- Lang and Palacios (2018) search for the optimal number of types

- They use a 20% holdout sample to assess the out-of-sample fit

- Note that they do not do cross-validation

- They simply compare how the model fits in the estimation and holdout data (a sketch of this idea is on the next slide)
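---

# Coding example: holdout validation (sketch)

- A minimal sketch with made-up data, not the Lang and Palacios code: estimate a binary logit on 80% of the sample, then check whether the model reproduces the choice frequency in the 20% holdout sample (all names below are hypothetical)

```julia
using Random, Statistics, DataFrames, GLM, Distributions

Random.seed!(1234)
N  = 2_000
df = DataFrame(x = randn(N))
df.d = Float64.(rand(N) .< 1 ./ (1 .+ exp.(-(0.5 .+ df.x))))   # made-up binary choice

holdout = shuffle(1:N)[1:Int(0.2N)]          # 20% holdout sample
est     = setdiff(1:N, holdout)              # 80% estimation sample

# estimate on the estimation sample only
m = glm(@formula(d ~ x), df[est, :], Binomial(), LogitLink())

# compare data vs. model choice frequencies in both samples
for (lab, idx) in (("estimation", est), ("holdout", holdout))
    println(lab, ": data = ", round(100*mean(df.d[idx]), digits=1),
            "%, model = ", round(100*mean(predict(m, df[idx, :])), digits=1), "%")
end
```

- A model that fits well in the estimation sample but poorly in the holdout sample is likely overfit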
---

# Validation in Delavande and Zafar (2019)

- Another form of validation could be to predict treatment effects in the control group

- If the model is good, it should be able to do this

- Delavande and Zafar (2019) do something similar to this

- Rather than comparing model predictions in/out of sample, they compare them for a different state of the world

- In their case, the new state of the world is one without schooling costs

- They can do this because they elicited preferences for schooling under both regimes

- But they only used one regime when estimating their model

---

# Counterfactuals

- Counterfactuals involve changing the model, simulating new data and then comparing w/ the status quo

- "Changing a policy" typically can be captured by changing the value of a set of parameters

- "Policy experiments" typically mimic some real-world policy in this way

- .hi[Note:] policy experiments always assume invariance of the other model parameters!

- So if I change the value of one parameter, that won't "leak" into any others

- This may be a heroic assumption, depending on the model

---

# Examples of counterfactuals

.smaller[
- Fu, Grau, and Rivera (2020)

    - Would low-SES youth commit fewer crimes if we sent them to better schools?

- Arcidiacono, Aucejo, Maurel, and Ransom (2016)

    - How would human capital investment change if there were perfect information?

- Arcidiacono, Kinsler, and Ransom (2019)

    - Would Harvard become less white if it were to eliminate legacy admissions?

- Ransom (2019)

    - Would American workers move to a new city if they were paid $10,000 to do so?

- Caucutt, Lochner, Mullins, and Park (2020)

    - What can reduce disparities in child investment across rich/poor families?
]

---

# How to do counterfactuals

- To perform the counterfactual simulations, follow these steps:

    - Impose parameter restrictions

    - Simulate fake data under those restrictions

    - Compare the resulting model summary to a "baseline" prediction

    - (The baseline prediction is your model fit)

- As you can see, if you use SMM, you've already got the code to simulate the model

---

# Coding example: multinomial logit estimation

This code comes from the first question of Problem Set 4:

.scroll-box-16[
```julia
using Random, LinearAlgebra, Statistics, Optim, DataFrames, CSV, HTTP, GLM, ForwardDiff, FreqTables, Distributions

# read in data
url = "https://raw.githubusercontent.com/OU-PhD-Econometrics/fall-2020/master/ProblemSets/PS4-mixture/nlsw88t.csv"
df = CSV.read(HTTP.get(url).body, DataFrame)
X = [df.age df.white df.collgrad]
Z = hcat(df.elnwage1, df.elnwage2, df.elnwage3, df.elnwage4,
         df.elnwage5, df.elnwage6, df.elnwage7, df.elnwage8)
y = df.occ_code

# estimate multinomial logit model
include("mlogit_with_Z.jl")
θ_start = [ .0403744; .2439942; -1.57132;
            .0433254; .1468556; -2.959103;
            .1020574; .7473086; -4.12005;
            .0375628; .6884899; -3.65577;
            .0204543; -.3584007; -4.376929;
            .1074636; -.5263738; -6.199197;
            .1168824; -.2870554; -5.322248;
            1.307477]
td = TwiceDifferentiable(b -> mlogit_with_Z(b, X, Z, y), θ_start; autodiff = :forward);
θ̂_optim_ad = optimize(td, θ_start, LBFGS(), Optim.Options(g_tol = 1e-5, iterations=100_000, show_trace=true, show_every=50));
θ̂_mle_ad = θ̂_optim_ad.minimizer
H = Optim.hessian!(td, θ̂_mle_ad)
θ̂_mle_ad_se = sqrt.(diag(inv(H)))
```
]

---

# Coding example: model fit and counterfactuals

Now let's evaluate model fit and the effect of a counterfactual policy

.scroll-box-16[
```julia
# get baseline predicted probabilities
function plogit(θ, X, Z, J)
    α = θ[1:end-1]
    γ = θ[end]
    K = size(X,2)
    bigα = [reshape(α,K,J-1) zeros(K)]
    P = exp.(X*bigα .+ Z*γ) ./ sum.(eachrow(exp.(X*bigα .+ Z*γ)))
    return P
end
println(θ̂_mle_ad)
P = plogit(θ̂_mle_ad, X, Z, length(unique(y)))

# compare with data
modelfitdf = DataFrame(data_freq = 100*convert(Array, prop(freqtable(df,:occ_code))),
                       P = 100*vec(mean(P'; dims=2)))
display(modelfitdf)

# now run a counterfactual:
# suppose people don't care about wages when choosing occupation
# this means setting θ̂_mle_ad[end] = 0
θ̂_cfl = deepcopy(θ̂_mle_ad)
θ̂_cfl[end] = 0
P_cfl = plogit(θ̂_cfl, X, Z, length(unique(y)))
modelfitdf.P_cfl = 100*vec(mean(P_cfl'; dims=2))
modelfitdf.ew = vec(mean(Z'; dims=2))
modelfitdf.diff = modelfitdf.P_cfl .- modelfitdf.P
display(modelfitdf)

│ Row │ data_freq │ P       │ P_cfl   │ diff      │ ew      │
│     │ Float64   │ Float64 │ Float64 │ Float64   │ Float64 │
├─────┼───────────┼─────────┼─────────┼───────────┼─────────┤
│ 1   │ 10.5976   │ 11.2549 │ 8.34334 │ -2.91158  │ 1.96627 │
│ 2   │ 5.24943   │ 6.21553 │ 4.98708 │ -1.22845  │ 1.85026 │
│ 3   │ 38.6462   │ 37.5457 │ 34.4537 │ -3.092    │ 1.71771 │
│ 4   │ 4.66067   │ 4.88623 │ 5.70078 │ 0.814551  │ 1.54756 │
│ 5   │ 1.54063   │ 1.80065 │ 1.62727 │ -0.173382 │ 1.75298 │
│ 6   │ 15.1525   │ 14.676  │ 15.123  │ 0.447092  │ 1.58918 │
│ 7   │ 18.3571   │ 17.3364 │ 23.3689 │ 6.03245   │ 1.42956 │
│ 8   │ 5.79588   │ 6.28457 │ 6.39589 │ 0.111316  │ 1.59501 │
```
]

---

# Discussion of model fit and counterfactuals

- Model fit is good, though I would have expected it to be even better

    - MLE is designed to match the data's choice frequencies

- The counterfactuals make sense

    - Removing preference for earnings leads to re-sorting of occupations

    - Sorting into low-wage occupations and out of high-wage ones
---

# Confidence intervals around counterfactuals

- The counterfactuals above give us a new predicted occupational distribution

- But we don't know what the confidence intervals are around the prediction

- Our estimates inherently come with sampling variation

- We should incorporate that variation into our counterfactual

- How do we do this? Bootstrapping

---

# Bootstrapping to get CIs around counterfactuals

- Recall: `\(-H^{-1}\)` is the variance matrix of our estimates, where `\(H\)` is the Hessian of the log likelihood

- Assume that `\(\hat{\theta} \sim MVN(\theta,-H^{-1})\)`

    - but remember that we want to minimize `\(-\ell\)`, so we just use `\(H^{-1}\)`

- We sample from this distribution `\(B\)` times and compute our new counterfactuals

- Then we can look at the 2.5th and 97.5th percentiles

- This will represent a 95% confidence interval

- This algorithm is known as the .hi[parametric bootstrap]

- Cf. the .hi[nonparametric bootstrap], which is when we randomly re-sample from our data

---

# Coding example: parametric bootstrap

- Here's how we can get the confidence interval around the policy effect

- Note: it's uncommon to see this, due to the computational demands of even one counterfactual

.scroll-box-12[
```julia
# now do parametric bootstrap
Random.seed!(1234)
invHfix = UpperTriangular(inv(H)) .- Diagonal(inv(H)) .+ UpperTriangular(inv(H))'
# this code makes inv(H) truly symmetric (not just numerically symmetric)
d = MvNormal(θ̂_mle_ad,invHfix)
B = 1000
cfl_bs = zeros(8,B)
for b = 1:B
    θ_draw = rand(d)
    Pb  = plogit(θ_draw, X, Z, length(unique(y)))
    Pbs = 100*vec(mean(Pb'; dims=2))
    θ_draw[end] = 0
    P_cflb  = plogit(θ_draw, X, Z, length(unique(y)))
    Pcfl_bs = 100*vec(mean(P_cflb'; dims=2))
    cfl_bs[:,b] = Pcfl_bs .- Pbs
end
bsdiffL = deepcopy(cfl_bs[:,1])
bsdiffU = deepcopy(cfl_bs[:,1])
for i=1:size(cfl_bs,1)
    bsdiffL[i] = quantile(cfl_bs[i,:],.025)
    bsdiffU[i] = quantile(cfl_bs[i,:],.975)
end
modelfitdf.bs_diffL = bsdiffL
modelfitdf.bs_diffU = bsdiffU
display(modelfitdf)

│ Row │ data_freq │ P       │ P_cfl   │ diff      │ ew      │ bs_diffL  │ bs_diffU  │
│     │ Float64   │ Float64 │ Float64 │ Float64   │ Float64 │ Float64   │ Float64   │
├─────┼───────────┼─────────┼─────────┼───────────┼─────────┼───────────┼───────────┤
│ 1   │ 10.5976   │ 11.2549 │ 8.34334 │ -2.91158  │ 1.96627 │ -3.37791  │ -2.40627  │
│ 2   │ 5.24943   │ 6.21553 │ 4.98708 │ -1.22845  │ 1.85026 │ -1.46601  │ -1.0004   │
│ 3   │ 38.6462   │ 37.5457 │ 34.4537 │ -3.092    │ 1.71771 │ -3.72105  │ -2.4462   │
│ 4   │ 4.66067   │ 4.88623 │ 5.70078 │ 0.814551  │ 1.54756 │ 0.666758  │ 0.967763  │
│ 5   │ 1.54063   │ 1.80065 │ 1.62727 │ -0.173382 │ 1.75298 │ -0.214975 │ -0.135771 │
│ 6   │ 15.1525   │ 14.676  │ 15.123  │ 0.447092  │ 1.58918 │ 0.387452  │ 0.506462  │
│ 7   │ 18.3571   │ 17.3364 │ 23.3689 │ 6.03245   │ 1.42956 │ 4.82327   │ 7.20728   │
│ 8   │ 5.79588   │ 6.28457 │ 6.39589 │ 0.111316  │ 1.59501 │ 0.0795148 │ 0.143139  │
```
]

---

# References

.tiny[
Adams, R. P. (2018). _Model Selection and Cross Validation_. Lecture Notes. Princeton University. URL: [https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/model-selection.pdf](https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/model-selection.pdf).

Arcidiacono, P., E. Aucejo, A. Maurel, et al. (2016). _College Attrition and the Dynamics of Information Revelation_. Working Paper. Duke University. URL: [https://tyleransom.github.io/research/CollegeDropout2016May31.pdf](https://tyleransom.github.io/research/CollegeDropout2016May31.pdf).

Arcidiacono, P., J. Kinsler, and T. Ransom (2019). _Legacy and Athlete Preferences at Harvard_. Working Paper 26316. National Bureau of Economic Research. DOI: [10.3386/w26316](https://doi.org/10.3386%2Fw26316).

Caucutt, E. M., L. Lochner, J. Mullins, et al. (2020). _Child Skill Production: Accounting for Parental and Market-Based Time and Goods Investments_. Working Paper 27838. National Bureau of Economic Research. DOI: [10.3386/w27838](https://doi.org/10.3386%2Fw27838).

Delavande, A. and B. Zafar (2019). "University Choice: The Role of Expected Earnings, Nonpecuniary Outcomes, and Financial Constraints". In: _Journal of Political Economy_ 127.5, pp. 2343-2393. DOI: [10.1086/701808](https://doi.org/10.1086%2F701808).

Fu, C., N. Grau, and J. Rivera (2020). _Wandering Astray: Teenagers’ Choices of Schooling and Crime_. Working Paper. University of Wisconsin-Madison. URL: [https://www.ssc.wisc.edu/~cfu/wander.pdf](https://www.ssc.wisc.edu/~cfu/wander.pdf).

Lang, K. and M. D. Palacios (2018). _The Determinants of Teachers' Occupational Choice_. Working Paper 24883. National Bureau of Economic Research. DOI: [10.3386/w24883](https://doi.org/10.3386%2Fw24883).

Ransom, T. (2019). _Labor Market Frictions and Moving Costs of the Employed and Unemployed_. Discussion Paper 12139. IZA. URL: [http://ftp.iza.org/dp12139.pdf](http://ftp.iza.org/dp12139.pdf).

Rust, J. (1987). "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher". In: _Econometrica_ 55.5, pp. 999-1033. URL: [http://www.jstor.org/stable/1911259](http://www.jstor.org/stable/1911259).
]