
class: center, middle, inverse, title-slide

# Lecture 3
## Structural Modeling Workflow
### Tyler Ransom
### ECON 6343, University of Oklahoma

---

# Today

- What steps are required to estimate a structural model?

- Go through each step on an example model

---

# Steps we won't discuss today

- The material we discuss today will already assume you have data

- And that you have sufficient understanding of your data

- It also assumes you have an understanding of your preferred coding language

- These are all non-trivial steps, but they are typically covered in other classes

- I will (indirectly) try to help you develop these skills throughout the course

---

# Steps to performing structural estimation

Mike Keane gave a [talk](https://www.youtube.com/watch?v=0hazaPBAYWE) at the University of Chicago in 2015 and listed these steps:

1. Theoretical Model Development

2. Practical Specification Issues

3. Solving the Model

4. Understanding How the Model Works

5. Estimation

6. Validation

7. Policy Experiments

---

# An example model

To help fix ideas, let's revisit a commonly used model in introductory econometrics:

$$
`\begin{align}
\log(w_{i}) =& \beta_0 + \beta_1 s_{i} + \beta_2 x_{i} + \beta_3 x^2_{i} + \varepsilon_{i}
\label{eq:basicmincer}
\end{align}`
$$

where we have cross-sectional data and where
- `$i$` indexes individuals
- `$w_{i}$` is employment income
- `$s_{i}$` is years of schooling
- `$x_{i}$` is years of work experience (or, more commonly, _potential_ work experience)
- `$\varepsilon_{i}$` is anything else that determines income

We want to estimate `$\left(\beta_1,\beta_2,\beta_3\right)$`, which are .hi[returns to human capital investment]
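
---

# An example model: simulated sketch

To make the wage equation concrete, here is a minimal sketch that simulates data from \eqref{eq:basicmincer} and recovers the coefficients by OLS. The sample size, the distributions of `s` and `x`, and the values of `β` and the shock standard deviation are all made-up assumptions for illustration, not estimates from any data.

```julia
using Random, Statistics

Random.seed!(1234)                 # hypothetical seed, for reproducibility
N = 10_000                         # hypothetical sample size
s = rand(8:20, N)                  # years of schooling (made-up distribution)
x = rand(0:30, N)                  # years of experience (made-up distribution)
β = [1.0, 0.08, 0.04, -0.0006]     # assumed "true" (β0, β1, β2, β3)
ε = 0.4 .* randn(N)                # assumed Normal income shocks

X    = [ones(N) s x x.^2]          # regressors of the wage equation
logw = X*β .+ ε                    # log wages implied by the equation
β̂    = X \ logw                    # OLS via least squares

println(round.(β̂; digits=4))       # should be close to β
```

- Here `ε` is drawn independently of `s` and `x`, so OLS recovers `β`; that independence is exactly what is in doubt with real data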

---

# Quick review

- It is nearly certain that \eqref{eq:basicmincer} suffers from omitted variable bias

- i.e. there are lots of factors in `$\varepsilon_{i}$` that are correlated with both `$s_i$` and `$w_i$`

- Thus, our estimates of `$\left(\beta_1,\beta_2,\beta_3\right)$` will not be causal

- We could try to get causal estimates using a variety of identification strategies:

    - find a valid instrument for `$s_i$` (Angrist and Krueger, 1991; Card, 1995)

    - exploit a discontinuity in `$s_i$` (Ost, Pan, and Webber, 2018)

    - randomize `$s_i$` (Attanasio, Meghir, and Santiago, 2011)

    - etc.
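
---

# Quick review: simulated sketch

The omitted variable problem can be seen in a small simulation. The sketch below assumes a single unobserved "ability" variable `a` that raises both schooling and wages; every number in it is hypothetical. Under these assumptions, the regression that omits `a` should overstate the return to schooling by roughly `0.3 * Cov(s,a)/Var(s) = 0.3 * 2/5 = 0.12`.

```julia
using Random, Statistics

Random.seed!(1234)                            # hypothetical seed
N  = 50_000                                   # hypothetical sample size
a  = randn(N)                                 # unobserved ability (assumption)
s  = 13 .+ 2 .* a .+ randn(N)                 # schooling rises with ability
β1 = 0.08                                     # assumed true return to schooling
logw = 1.5 .+ β1 .* s .+ 0.3 .* a .+ 0.3 .* randn(N)

b_short = [ones(N) s]   \ logw                # omits ability (what we can run in practice)
b_long  = [ones(N) s a] \ logw                # infeasible regression that controls for ability

println((short = b_short[2], long = b_long[2], truth = β1))
```

- The "long" regression is only available here because we simulated `a` ourselves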

---

# A structural view of Equation \eqref{eq:basicmincer}

- We know that \eqref{eq:basicmincer} will produce biased estimates, but _why_? Some possibilities:

- .hi[ability bias]
    - `$s_i$` and `$w_i$` are both positively correlated with unobservable cognitive ability

- .hi[comparative advantage]
    - multidimensional unobservable ability `$\implies$` self-selection into schooling

- .hi[credit constraints] 
    - `$s_i$` is a costly investment; some people may not be able to borrow enough

- .hi[preference heterogeneity] (differing tastes for `$s_i$`, differing discount rates)

---

# 1. Theoretical Model Development

- Since schooling has an up-front cost and long-term benefit, need a dynamic model

    - period 1: decide how much schooling to get

    - period 2: choose whether or not to work; if working, receive income by \eqref{eq:basicmincer}

    - individuals choose schooling level to maximize lifetime utility

- Preferences (denote utility in period `$t$` by `$u_t$`, with `$s,x$` and `$w$` defined previously)

$$
`\begin{align}
u_1\left(z,c,\eta_1\right) & = f\left(z,c,\eta_1\right) \nonumber \\
u_2\left(w\left(s,x\right),k,\eta_2\right) & = g\left(w\left(s,x\right),k,\eta_2\right)
\label{eq:utils}
\end{align}`
$$
where `$z$` is family background, `$c$` is schooling costs, `$k$` is number of kids in adult household and `$\eta_t$` are unobservable preferences [similar to `$\varepsilon$` in \eqref{eq:basicmincer}]

---

# 1. Theoretical Model Development

With discount factor `$\delta \in \left[0,1\right]$`, the discounted lifetime utility function is then

$$
`\begin{align}
V & = u_1\left(z,c,\eta_1\right) + \delta u_2\left(w\left(s,x\right),k,\eta_2\right)
\label{eq:PDV}
\end{align}`
$$

- Equations \eqref{eq:basicmincer}–\eqref{eq:PDV} define our model

- This model is still .hi[laughably unrealistic], but at least we have something

- A number of important questions arise (but we'll ignore these for today)
    - Where is cognitive ability? What exactly does `$c$` represent? Where are loans?
    - Maybe people should care about _consumption_ in period 2, not income
    - Does family background really only enter `$u_1$` and not `$\log\left(w\right)$`?
    - Should `$x$` in \eqref{eq:basicmincer} be a function of `$s$`? (Lower `$s \implies$` longer working life)
    - What are people's beliefs about future `$k$` when deciding `$s$`?

---

# Overview of the theoretical model

- As you can see, it takes a lot of know-how to write down even a simple model

- Requires knowledge about the subject and about math/econ more generally

.smallest[
.pull-left[
.hi[Exogenous variables]
- family background `$(z)$`
- schooling costs `$(c)$`
- children in household `$(k)$`

.hi[Endogenous variables]
- schooling `$(s)$`
- period-2 work decision

]

.pull-right[
.hi[Outcome variable]
- labor income `$(w)$`

.hi[Unobservables]
- income `$(\varepsilon)$`
- preferences `$(\eta_t)$`

.hi[Model parameters]
- returns to human capital `$(\beta)$`
- discount factor `$(\delta)$`
- other parameters implied by `$f(\cdot)$` and `$g(\cdot)$`

]
]

---

# 2. Practical Specification Issues

- Now that we have a model, we need to figure out how to take it to data

- This is where we apply knowledge about .hi[our data] and .hi[stats/econometrics]

- Key data questions:
    - Can we observe the variables of the model in our data set?
    - If so, are they reliably measured?

- Key specification questions:
    - How to model `$\eta_t$` and `$\varepsilon$`? (Need to make distributional assumptions)
    - Functional forms of `$f(\cdot)$` and `$g(\cdot)$`
    - Should `$s$` be continuous (years of schooling) or discrete (college/not)?

---

# 2. Practical Specification Issues

- We won't get into too many details about this today, but specification is important!

- What determines the specification is often:

    - what is reliably measured in the data

    - what is computationally feasible to estimate

- Parameters of the model either need to be .hi[estimated] or .hi[calibrated]

    - e.g. often we don't have reliable data to allow us to estimate `$\delta$`; we must calibrate it

- Computational feasibility often governs how we specify the different functions

    - e.g. _linear-in-parameters_ with _additively separable_ unobservables [like \eqref{eq:basicmincer}]

---

# Example with real data

- Here is some real data from the most recent round of the NLSY97

.scroll-box-12[

```julia
using CSV, DataFrames, Statistics
df = CSV.read("Data/slides3data.csv", DataFrame; missingstring="NA") # recent CSV.jl versions need a sink (DataFrame) as the 2nd argument
size(df)
# outputs (6009, 12)

describe(df)
# outputs the below:
12×8 DataFrame
│ Row │ variable       │ mean     │ min  │ median  │ max     │ nunique │ nmissing │
│     │ Symbol         │ Float64  │ Real │ Float64 │ Real    │ Nothing │ Union…   │
├─────┼────────────────┼──────────┼──────┼─────────┼─────────┼─────────┼──────────┤
│ 1   │ id             │ 4534.71  │ 4    │ 4544.0  │ 9022    │         │          │
│ 2   │ female         │ 0.52671  │ 0    │ 1.0     │ 1       │         │          │
│ 3   │ black          │ 0.269762 │ 0    │ 0.0     │ 1       │         │          │
│ 4   │ latin          │ 0.210351 │ 0    │ 0.0     │ 1       │         │          │
│ 5   │ white          │ 0.511067 │ 0    │ 1.0     │ 1       │         │          │
│ 6   │ employed       │ 0.756532 │ 0    │ 1.0     │ 1       │         │          │
│ 7   │ wage           │ 25.5309  │ 8.0  │ 20.0    │ 150.0   │         │ 933      │
│ 8   │ collgrad       │ 0.350474 │ 0    │ 0.0     │ 1       │         │          │
│ 9   │ age            │ 34.967   │ 33   │ 35.0    │ 37      │         │          │
│ 10  │ parent_college │ 0.238975 │ 0    │ 0.0     │ 1       │         │          │
│ 11  │ numkids        │ 1.32684  │ 0    │ 1.0     │ 9       │         │          │
│ 12  │ efc            │ 4.2243   │ 0.0  │ 0.77763 │ 118.111 │         │          │
```
]

- We have demographics/background, wages, employment status, education, fertility
- N=6009, age `$\in \{33,\ldots,37\}$`, and 35% of respondents graduated college
- 24% have at least one college-graduate parent

---

# Example: setting up the specification

- It looks like we can estimate some form of our model

- We have family background, cost of college (this is the `efc` variable)

- We have employment status, wage and number of children

- It looks like we'll have to have `$s$` be binary (`collgrad` variable)

- Also need to assume `$x = \text{age} - 18$` if non-grad, `$x = \text{age} - 22$` if grad  (Mincer, 1974)

- Then we just need to add some functional form assumptions, and we'll be ready

- `$\varepsilon \sim$` Normal, `$\eta_t \sim$` Logistic
    - `$u_{i1} = \alpha_0 + \alpha_1 \text{ parent_college} + \alpha_2 \text{ efc} + \eta_1$`
    - `$u_{i2} = \gamma_0 + \gamma_1 \mathbb{E} \log w_{i} + \gamma_2 \text{ numkids} + \eta_2$`

---

# Parameters of the empirical model

- We can now detail the parameters of the empirical model

- .hi[wage parameters] `$(\beta,\sigma_{\varepsilon})$`
    - The latter is the std. dev. of income shocks

- .hi[schooling parameters] `$(\alpha)$`

- .hi[employment parameters] `$(\gamma,\delta)$`

- Then write down a statistical objective function as a function of data and parameters

    - e.g. maximize the likelihood, or minimize the sum of the squared residuals

- We'll learn how to do this in later classes, but not today
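
---

# Parameters of the empirical model: objective function sketch

As a preview of what "objective function" means in code, here is a minimal sketch of the two examples just mentioned: a sum of squared residuals for a linear wage equation and a negative log-likelihood for a binary logit. The function names `ssr` and `logit_nll`, the simulated data, and all parameter values are hypothetical; this is not the estimator developed later in the course.

```julia
using Random, Statistics

# Sum of squared residuals for a linear-in-parameters equation
ssr(β, y, X) = sum(abs2, y .- X*β)

# Negative log-likelihood of a binary logit with P(d=1) = 1/(1+exp(-Zα))
function logit_nll(α, d, Z)
    v = Z*α
    # each observation contributes log(1+exp(v)) - d*v
    return sum(log1p.(exp.(v)) .- d .* v)
end

Random.seed!(1234)                          # hypothetical seed
N = 5_000                                   # hypothetical sample size
X = [ones(N) randn(N)]
β_true = [1.0, 0.5]                         # assumed wage parameters
y = X*β_true .+ 0.4 .* randn(N)
Z = [ones(N) randn(N)]
α_true = [-0.5, 1.0]                        # assumed choice parameters
d = rand(N) .< 1 ./ (1 .+ exp.(-Z*α_true))  # simulated binary choices

# Both objectives should be lower at the true parameters than at perturbed ones
println(ssr(β_true, y, X) < ssr(β_true .+ 0.2, y, X))
println(logit_nll(α_true, d, Z) < logit_nll(α_true .+ 0.5, d, Z))
```

- Estimation then amounts to handing a function like these to an optimizer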

---

# 3. Solving and 4. Understanding How the Model Works

- .hi[Solving the model:]

    - solve the dynamic utility max problem for given parameter values

    - (we aren't estimating parameter values yet)

    - (we will talk about how to do this next week)

- .hi[Understanding the model:]

    - simulate data from the model

    - make sure the simulated data is consistent with the model's implications

    - look at descriptive statistics from the simulated data
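
---

# Understanding the model: simulation sketch

The "simulate, then check" loop can be sketched for the schooling choice alone. This sketch treats the period-1 decision as a static logit (choose college when `$u_{i1} > 0$`), which ignores the dynamic part of the model; the covariate distributions and the values in `α` are hypothetical.

```julia
using Random, Statistics

Random.seed!(1234)                                 # hypothetical seed
N = 100_000                                        # hypothetical number of simulated people
parent_college = rand(N) .< 0.24                   # share roughly matching the data summary
efc = 10 .* rand(N)                                # made-up distribution for efc
α   = [-1.2, 1.5, 0.05]                            # hypothetical (α0, α1, α2)

v = α[1] .+ α[2] .* parent_college .+ α[3] .* efc  # deterministic part of u_1
draws = rand(N)
η = log.(draws) .- log.(1 .- draws)                # Logistic shocks via the inverse CDF
collgrad = (v .+ η) .> 0                           # simulated schooling choice

p = 1 ./ (1 .+ exp.(-v))                           # choice probabilities implied by the model
println((mean(collgrad), mean(p)))                 # these two should be close
# α1 > 0 implies more college among those with a college-graduate parent
println(mean(collgrad[parent_college]) > mean(collgrad[.!parent_college]))
```

- If the simulated share and the average model probability disagree, the simulation code (or the model solution) has a bug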

---

# 3. Solving and 4. Understanding How the Model Works

- Start with as simple a model as possible; make sure things are working

- When introducing more complexities, do "numerical comparative statics"

    - Make sure outcomes move in the correct direction when you change a parameter

    - e.g. `$\uparrow \beta_1 \implies \uparrow$` schooling (ceteris paribus)

- If they don't, you've likely got a bug somewhere
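
---

# Numerical comparative statics: a toy sketch

Here is a deliberately stripped-down sketch of a numerical comparative static for `$\beta_1$`. It assumes everyone works in period 2, ignores experience differences, and takes the net lifetime gain from college to be `$\alpha_0 + \delta\gamma_1\beta_1 + \eta$` with `$\eta \sim$` Logistic. These simplifications, the function name `college_share`, and all parameter values are assumptions made for illustration, not the model solution covered later.

```julia
# Logistic CDF
Λ(v) = 1 / (1 + exp(-v))

# Share choosing college in the toy model, as a function of the return to schooling
function college_share(β1; α0=-1.5, γ1=1.0, δ=0.95)   # hypothetical parameter values
    Δv = α0 + δ*γ1*β1      # net lifetime gain from college, excluding the shock η
    return Λ(Δv)
end

grid   = 0.0:0.1:1.0                  # values of β1 to try
shares = college_share.(grid)         # re-solve the toy model at each value

println(round.(shares; digits=3))
println(issorted(shares))             # schooling should rise with β1
```

- In a real model each grid point requires re-solving and re-simulating, but the check is the same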

---

# Example with real data

- How would we do this in Julia?

- We can simulate log wages and then see how close we got

- This is kind of silly in our simple model, but the workflow is there

```julia
N = size(df,1)
β = [1.65,.4,.06,-.0002]
σ = .4;
df.exper = df.age .- ( 18*(1 .- df.collgrad) .+ 22*df.collgrad )
df.lwsim = β[1] .+ β[2]*df.collgrad .+ β[3]*df.exper .+ β[4]*df.exper.^2 .+ σ*randn(N)
df.lw    = log.(df.wage)
```

- We can then see how `df.lwsim` compares with `df.lw` in the data
.scroll-box-4[

```julia
describe(df;cols=[:lw,:lwsim])
# returns
│ Row │ variable │ mean    │ min     │ median  │ max     │ nunique │ nmissing │ eltype                  │
│     │ Symbol   │ Float64 │ Float64 │ Float64 │ Float64 │ Nothing │ Union…   │ Type                    │
├─────┼──────────┼─────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────────────────────┤
│ 1   │ lw       │ 3.06219 │ 2.07944 │ 2.99573 │ 5.01064 │         │ 933      │ Union{Missing, Float64} │
│ 2   │ lwsim    │ 2.67169 │ 1.12192 │ 2.67668 │ 3.98557 │         │          │ Float64                 │
```
]

---

# 5. Estimation

- Most structural models require .hi[nonlinear estimation]

    - e.g. MLE/GMM or their simulated counterparts

- In nonlinear optimization, starting values are crucial

    - Initializing at random starting values is likely to give poor results

- Keane recommends calibrating the model by hand

    - e.g. match the intercept of each equation to the `$\overline{Y}$`'s in the data

- I recommend estimating an intercepts-only model (or with very few `$X$`'s)

- But this advice is model-specific!
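
---

# Starting values: intercept-only sketch

For the equations in our example, intercept-only starting values have closed forms: an intercept-only logit is maximized at the log-odds of the sample mean, and an intercept-only linear equation at the sample mean. The sketch below uses the means reported in the `describe` output earlier in the lecture; the helper `logit` and the `*_start` names are hypothetical.

```julia
logit(p) = log(p / (1 - p))        # inverse of the logistic CDF

# Sample means copied from the describe() output shown earlier
mean_collgrad = 0.350474
mean_employed = 0.756532
mean_lw       = 3.06219

α0_start = logit(mean_collgrad)    # intercept-only schooling logit
γ0_start = logit(mean_employed)    # intercept-only employment logit
β0_start = mean_lw                 # intercept-only wage equation

println((α0_start, γ0_start, β0_start))
```

- Slope coefficients can then be started at zero and the optimizer left to move away from there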

---

# 5. Estimation

- There are lots of algorithms for nonlinear optimization

- We'll talk more about these later in the course

- Your next problem set will show how to do this in Julia

---

# Example using real data

- In our simple model, we can get good starting values by estimating OLS and logits

- The wage equation can be estimated by OLS (on the subsample who are employed)

```julia
using GLM
β̂ = lm(@formula(lw ~ collgrad + exper + exper^2), df[df.employed.==1,:])
# returns
Coefficients:
─────────────────────────────────────────────────────────────────────────────────
               Estimate  Std. Error    t value  Pr(>|t|)    Lower 95%   Upper 95%
─────────────────────────────────────────────────────────────────────────────────
(Intercept)   2.94607    0.323145     9.11688     <1e-18   2.31255     3.57959
collgrad      0.534326   0.0271395   19.6881      <1e-82   0.481119    0.587532
exper        -0.0265561  0.0412115   -0.644386    0.5194  -0.107351    0.0542385
exper ^ 2     0.0014304  0.00132307   1.08112     0.2797  -0.00116346  0.00402426
─────────────────────────────────────────────────────────────────────────────────

df.elwage = predict(β̂, df) # generates expected log wage for all observations

r2(β̂)                               # reports R2
sqrt(deviance(β̂)/dof_residual(β̂))  # reports root mean squared error
```

---

# Example using real data

- The `$u_t$` equations can be estimated as simple logits (on the full sample)

.scroll-box-14[

```julia
α̂ = glm(@formula(collgrad ~ parent_college + efc), df, Binomial(), LogitLink())
# returns
Coefficients:
──────────────────────────────────────────────────────────────────────────────────
                  Estimate  Std. Error   z value  Pr(>|z|)   Lower 95%   Upper 95%
──────────────────────────────────────────────────────────────────────────────────
(Intercept)     -1.20091    0.0364888   -32.9118    <1e-99  -1.27243    -1.1294
parent_college   1.47866    0.068433     21.6074    <1e-99   1.34453     1.61278
efc              0.0450253  0.00437704   10.2867    <1e-24   0.0364464   0.0536041
──────────────────────────────────────────────────────────────────────────────────

γ̂ = glm(@formula(employed ~ elwage + numkids), df, Binomial(), LogitLink())
# returns
Coefficients:
──────────────────────────────────────────────────────────────────────────────
               Estimate  Std. Error   z value  Pr(>|z|)  Lower 95%   Upper 95%
──────────────────────────────────────────────────────────────────────────────
(Intercept)  -4.25036     0.454826   -9.34503    <1e-20  -5.1418    -3.35892
elwage        1.80081     0.149078   12.0796     <1e-32   1.50863    2.093
numkids      -0.0797204   0.0218106  -3.65512    0.0003  -0.122468  -0.0369724
──────────────────────────────────────────────────────────────────────────────
```
]

---

# Do these results make sense?

- It can be informative to try and interpret even these simple results

- wage equation:

    - insignificant return to experience is surprising; otherwise makes sense

- schooling choice:

    - If `efc` captures college costs, it should have a negative sign, but the estimate is positive
    - This suggests omitted variable bias in this equation

- employment choice:

    - These results check out; may want to introduce nonlinearities in `numkids`

---

# 6. Validation

- If you have a good model, it should be .hi[valid] (i.e. predict well out of sample)

- Validation is not always possible, but it's good to do if you can

    - e.g. if experimental data, estimate on control group, validate on treatment group

    - e.g. see if model can replicate major policy change in data

- More simply, you could throw out half your data, then try to predict the other half

    - This is typically not done if the full sample isn't huge
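
---

# Validation: hold-out sketch

The "throw out half your data" idea can be sketched with a linear model standing in for the structural one. The data are simulated, and the sample size, parameter values, and the helper `rmse` are all hypothetical.

```julia
using Random, Statistics

Random.seed!(1234)                          # hypothetical seed
N = 4_000                                   # hypothetical sample size
X = [ones(N) randn(N, 2)]                   # made-up regressors
β = [3.0, 0.5, -0.2]                        # assumed true parameters
y = X*β .+ 0.5 .* randn(N)

idx     = randperm(N)                       # shuffle before splitting
train   = idx[1:N÷2]                        # estimation half
holdout = idx[N÷2+1:end]                    # validation half (never used in estimation)

β̂ = X[train, :] \ y[train]                  # estimate on the first half only
rmse(ŷ, y) = sqrt(mean(abs2, ŷ .- y))

rmse_in  = rmse(X[train, :]*β̂,   y[train])
rmse_out = rmse(X[holdout, :]*β̂, y[holdout])
println((in_sample = rmse_in, out_of_sample = rmse_out))
```

- A model that fits in-sample but predicts much worse on the hold-out half is likely overfit or misspecified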

---

# 7. Policy Experiments

- This is the main reason to do structural estimation!

- Structural estimation `$\implies$` recovering the data-generating process (DGP) of the model

- Once we know the DGP, we can simulate data from it and do policy experiments

    - requires having policy-invariant parameters!

- We can predict the effects of:

    - proposed policies

    - hypothetical policies

- Contrast with RCTs, which only reveal effects of implemented policies

---

# Example using real data

- We have two policy variables we could play with

    1. `efc` (i.e. how much gov't subsidizes college tuition & fees)
    2. return to schooling (this could change due to e.g. technological change)

- Here's how we would look at a counterfactual with lower cost:
.scroll-box-4[

```julia
df_cfl     = deepcopy(df)
df_cfl.efc = df.efc .- 1         # change value of efc to be $1,000 less
df.basesch = predict(α̂, df)     # predicted collgrad probabilities under status quo
df.cflsch  = predict(α̂, df_cfl) # predicted collgrad probabilities under counterfactual
describe(df;cols=[:basesch,:cflsch])
# returns
│ Row │ variable │ mean     │ min      │ median   │ max      │ nunique │ nmissing │
│     │ Symbol   │ Float64  │ Float64  │ Float64  │ Float64  │ Nothing │ Int64    │
├─────┼──────────┼──────────┼──────────┼──────────┼──────────┼─────────┼──────────┤
│ 1   │ basesch  │ 0.350474 │ 0.231313 │ 0.24387  │ 0.986715 │         │ 0        │
│ 2   │ cflsch   │ 0.341794 │ 0.223404 │ 0.235663 │ 0.986111 │         │ 0        │
```
]
- Average predicted probability of `collgrad` _declines_ from 35.0% to 34.2%

- This counterintuitive result follows mechanically from the wrong-signed (positive) `efc` coefficient

---

# Example using real data

- We can't assess the counterfactual of increasing the return to schooling

- Because `elwage` doesn't directly enter the `collgrad` logit model

- This is because we aren't really estimating the dynamic model yet

- We will learn how to do this in the near future

---

# In summary: Why structural estimation?

- Want to examine effects of policies not yet implemented

- Learn more about economics by looking through the lens of a model

- Assess performance of theoretical models in explaining real-world data

- Can be used to build up long-run "canonical" models of behavior in many areas

- It can be really fun to do more complicated econometrics beyond simple regressions

- Observational data is much cheaper to collect than experimental data

---

# In summary: Why _not_ structural estimation?

- It's really difficult to write down and estimate a tractable, realistic model!

- It requires additional effort beyond data preparation and running regressions

- Understanding identification of the model takes a lot of effort, too

- It can be really miserable to try and debug the code of a structural estimation

- Many structural models can take weeks to estimate one specification

    - in addition to months spent coding/debugging beforehand

- As you can see, even with a simple model things have already gotten complicated!

---

# References
.smallest[
Angrist, J. D. and A. B. Krueger (1991). "Does Compulsory School
Attendance Affect Schooling and Earnings?" In: _Quarterly Journal of
Economics_ 106.4, pp. 979-1014. DOI:
[10.2307/2937954](https://doi.org/10.2307%2F2937954).

Attanasio, O. P., C. Meghir, and A. Santiago (2011). "Education Choices
in Mexico: Using a Structural Model and a Randomized Experiment to
Evaluate PROGRESA". In: _Review of Economic Studies_ 79.1, pp. 37-66.
DOI:
[10.1093/restud/rdr015](https://doi.org/10.1093%2Frestud%2Frdr015).

Card, D. (1995). "Using Geographic Variation in College Proximity to
Estimate the Return to Schooling". In: _Aspects of Labor Market
Behaviour: Essays in Honour of John Vanderkamp_. Ed. by L. N.
Christofides, E. K. Grant and R. Swidinsky. Toronto: University of
Toronto Press.

Mincer, J. (1974). _Schooling, Experience and Earnings_. New York:
Columbia University Press for National Bureau of Economic Research.

Ost, B., W. Pan, and D. Webber (2018). "The Returns to College
Persistence for Marginal Students: Regression Discontinuity Evidence
from University Dismissal Policies". In: _Journal of Labor Economics_
36.3, pp. 779-805. DOI:
[10.1086/696204](https://doi.org/10.1086%2F696204).
]