class: title-slide

<br><br><br>

# Lecture 14

## Advanced Optimization Techniques

### Tyler Ransom

### ECON 6343, University of Oklahoma

---

# Plan for the Day

1. More on optimization and optimizers

2. Constrained optimization

3. Analytical gradients and Hessians

4. Fixed points and MPEC

---

# Beyond nonlinear optimization

- Throughout this course, we've focused on Julia's `Optim` package

- This package provides algorithms for nonlinear unconstrained optimization

- This makes sense: likelihood functions are nonlinear

- But there are many other types of optimizers out there

- They may be less applicable to econometric problems, but they can be helpful for certain applications

---

# Other optimizers

- Aside from nonlinear optimization, there are:

  - Linear programming
  - Mixed integer programming
  - Semidefinite programming
  - Convex optimization
  - Constrained nonlinear optimization

- Solvers for each of these problem classes can be accessed through Julia's `JuMP` package

---

# JuMP uses

- These other classes of optimizers have valuable real-world uses:

  - Optimal bus routes
  - Optimal power grid architecture
  - Solving budget constraint problems
  - Solving Sudoku puzzles

- The great thing about JuMP is that it .hi[interfaces] with a plethora of optimizers

- You can keep your code the same and simply switch out which optimizer to use

---

# Why do we need constrained optimization?

- In nonlinear optimization, constraints can be very helpful, for a number of reasons:

  - .hi[Numerical stability]
      - e.g. optimization will crash if it guesses a negative value for a variance
  - .hi[Keeping results consistent with economic theory]
      - e.g. the discount factor `\(\beta \in [0,1]\)` in DDC models; otherwise the model is undefined
  - .hi[Simplifying the problem]
      - e.g. `\(\beta=0\)` reduces the model to a static one
  - .hi[Solving equilibrium models more quickly] through a method called MPEC

---

# Brief review of constrained optimization

- How do we do constrained optimization in economics? Lagrangians!

`\begin{align*}
& \max_x f(x) \\
& \text{subject to} \\
& g(x)\leq 0
\end{align*}`

`\begin{align*}
\mathcal{L}(x,\lambda) &= f(x) - \lambda g(x)
\end{align*}`

- In our maximum likelihood setting, `\(f(x)\)` is the likelihood function and the `\(x\)`'s are the parameters

- The first-order conditions (FOCs) tell us what the optimal `\(x\)`'s are:

`\begin{align*}
\frac{\partial \mathcal{L}}{\partial x} &= f'(x) - \lambda g'(x) = 0, & \lambda g(x) &= 0, & \lambda &\geq 0
\end{align*}`

- The solution must also satisfy second-order conditions (SOCs) and the Kuhn-Tucker conditions (the complementary slackness terms above)

- With constraints, the SOCs involve looking at the .hi[bordered Hessian]

---

# How to use JuMP

- Let's go through an example of how to estimate an econometric model with `JuMP`

- There are four basic components to any `JuMP` model:

  1. An optimizer

  2. Variables

  3. Constraints

  4. Objective function

- This list is not too different from what goes into `Optim.jl` (a minimal skeleton is sketched below)
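- To make the four components concrete, here is a minimal sketch of a `JuMP` model (a toy problem invented for illustration; the real estimation example starts two slides below):

.scroll-box-12[
```julia
using JuMP, Ipopt

model = Model(Ipopt.Optimizer)          # 1. an optimizer
@variable(model, x >= 0)                # 2. variables
@variable(model, y >= 0)
@constraint(model, x + y <= 10)         # 3. constraints
@objective(model, Max, 3x + 2y - x^2)   # 4. objective function
optimize!(model)
value(x), value(y)
```
]

- Swapping `Ipopt.Optimizer` for another solver is, in principle, the only change needed to switch optimizers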
---

# Limitations and considerations when using JuMP

- You cannot vectorize the objective function

  - i.e. everything needs to be expressed as a scalar

- You cannot use `Distributions.jl` objects in the objective function

- It is a royal pain to extract the Hessian of the objective function

  - We need the Hessian to conduct statistical inference

- It is very simple to add constraints

- When adding constraints, the Hessian becomes the bordered Hessian

  - This gives incorrect SEs; JuMP also ignores linear constraints in the Hessian

---

# Example: unconstrained optimization with Optim

- This example will estimate a linear regression model by maximum likelihood

- The objective is to maximize `\(\ell = \sum_i \log f_i(\beta;y_i,X_i)\)` where `\(f(\cdot)\)` is the normal pdf

.scroll-box-12[
```julia
using JuMP, Ipopt, Optim, LineSearches, LinearAlgebra, SparseArrays, Distributions, DataFrames, CSV, HTTP

# Let's read in the data from PS8
url = "https://raw.githubusercontent.com/OU-PhD-Econometrics/fall-2021/master/ProblemSets/PS8-factor/nlsy.csv"
df  = CSV.read(HTTP.get(url).body, DataFrame)
X   = [df.black df.hispanic df.female df.schoolt df.gradHS df.grad4yr ones(size(df,1),1)]
y   = df.logwage

# first let's do unconstrained optimization
function reg_mle(θ, X, y)
    # first K elements are the coefficients of the outcome equation
    β = θ[1:end-1]
    # last element is the standard deviation
    σ = θ[end]
    # now build the negative log likelihood (Optim minimizes)
    loglike = -sum(-.5 .* ( log(2*π) .+ log(σ^2) .+ ( (y .- X*β)./σ ).^2 ) )
    # more intuitive way? (but JuMP can't use pdf's from Distributions.jl)
    #loglike = -sum( log(1 ./ sqrt(σ^2)) .+ logpdf.(Normal(0,1),(y .- X*β)./sqrt(σ^2)) )
    return loglike
end

# run the optimizer for MLE
svals = vcat(X\y,.5)
td = TwiceDifferentiable(th -> reg_mle(th, X, y), svals; autodiff = :forward)
θ̂_optim_ad = optimize(td, svals, Newton(linesearch = BackTracking()),
                      Optim.Options(g_tol = 1e-5, iterations=100_000, show_trace=true, show_every=1))
θ̂_mle_optim_ad = θ̂_optim_ad.minimizer
loglikeval = θ̂_optim_ad.minimum

# evaluate the Hessian at the estimates
H = Optim.hessian!(td, θ̂_mle_optim_ad)
θ̂_mle_optim_ad_se = sqrt.(diag(inv(H)))

# store results in a data frame
results = DataFrame(coef_mle = vcat(vec(θ̂_mle_optim_ad),-loglikeval),
                    se_mle   = vcat(vec(θ̂_mle_optim_ad_se),missing),
                    coef_ols = vcat(X\y,missing,missing))

│ Row │ coef_mle   │ se_mle     │ coef_ols   │
├─────┼────────────┼────────────┼────────────┤
│ 1   │ -0.167441  │ 0.0242349  │ -0.167441  │
│ 2   │ -0.054249  │ 0.0257493  │ -0.054249  │
│ 3   │ -0.155049  │ 0.0197612  │ -0.155049  │
│ 4   │ 0.00525102 │ 0.00493929 │ 0.00525102 │
│ 5   │ 0.195649   │ 0.0493521  │ 0.195649   │
│ 6   │ 0.299131   │ 0.0276513  │ 0.299131   │
│ 7   │ 2.00771    │ 0.0485718  │ 2.00771    │
│ 8   │ 0.476761   │ 0.00682761 │ missing    │
│ 9   │ -1653.45   │ missing    │ missing    │
```
]
---

# Same example using JuMP

.scroll-box-18[
```julia
# we need this function later to convert JuMP's sparse Hessian into a dense matrix
function dense_hessian(hessian_sparsity, V, n)
    I = [i for (i,j) in hessian_sparsity]
    J = [j for (i,j) in hessian_sparsity]
    raw = sparse(I, J, V, n, n)
    return Matrix(raw + raw' - sparse(diagm(0=>diag(raw))))
end

function jump_mle(θ₀, X, y)
    # define the model
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, β[j=1:size(X,2)], start = θ₀[j])
    @variable(model, σ, start = θ₀[end])
    @NLobjective(model, Max,
        sum(-.5 * ( log(2*π) + log(σ^2) +
                    ((y[i] - sum(X[i,j]*β[j] for j in 1:size(X,2)))/σ)^2 ) for i in 1:size(X,1)))
    # optimize the model
    JuMP.optimize!(model)
    # return parameter estimates
    coef_jump = vcat(JuMP.value.(β), JuMP.value(σ), JuMP.objective_value(model))
    # return Hessian for SEs
    values = coef_jump[1:end-1]
    MOI = JuMP.MathOptInterface
    d = JuMP.NLPEvaluator(model)
    MOI.initialize(d, [:Hess])
    hessian_sparsity = MOI.hessian_lagrangian_structure(d)
    V = zeros(length(hessian_sparsity))
    MOI.eval_hessian_lagrangian(d, V, values, 1.0, Float64[])
    H = dense_hessian(hessian_sparsity, V, length(values))
    se_jump = sqrt.(diag(inv(-H)))
    return coef_jump, se_jump
end
jump_coefs,jump_se = jump_mle(svals, X, y)
results.coef_jump = jump_coefs
results.se_jump   = vcat(jump_se,missing)

│ Row │ coef_mle   │ se_mle     │ coef_jump  │ se_jump    │
├─────┼────────────┼────────────┼────────────┼────────────┤
│ 1   │ -0.167441  │ 0.0242349  │ -0.167441  │ 0.0242349  │
│ 2   │ -0.054249  │ 0.0257493  │ -0.054249  │ 0.0257493  │
│ 3   │ -0.155049  │ 0.0197612  │ -0.155049  │ 0.0197612  │
│ 4   │ 0.00525102 │ 0.00493929 │ 0.00525102 │ 0.00493929 │
│ 5   │ 0.195649   │ 0.0493521  │ 0.195649   │ 0.0493521  │
│ 6   │ 0.299131   │ 0.0276513  │ 0.299131   │ 0.0276513  │
│ 7   │ 2.00771    │ 0.0485718  │ 2.00771    │ 0.0485718  │
│ 8   │ 0.476761   │ 0.00682761 │ 0.476761   │ 0.00682761 │
│ 9   │ -1653.45   │ missing    │ -1653.45   │ missing    │
```
]

---

# Doing constrained optimization

- In `JuMP`, it is simple to add a constraint

- Simply add, for example, `@constraint(model, β[2] == .16)`

- In `Optim`, it is a little bit trickier

- In that case, we need to treat the constrained parameter as "data":

  - We reduce the dimensionality of the vector we're estimating

  - We impose the constraint inside the objective function

  - We repeat these steps outside of the optimization to recover the full parameter vector (see the sketch below)
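- Here is a minimal sketch of the pattern for a single fixed-value constraint (a hypothetical simplification of the full machinery shown two slides below, reusing `reg_mle` from earlier):

.scroll-box-12[
```julia
# Sketch: constrain β₂ = .16 by optimizing over the remaining parameters only
function cns_reg_mle(θ_free, X, y)
    β_free = θ_free[1:end-1]
    σ      = θ_free[end]
    # re-insert the constrained coefficient as if it were data
    β = vcat(β_free[1], .16, β_free[2:end])
    return reg_mle(vcat(β, σ), X, y)
end

svals_free = vcat(deleteat!(X\y, 2), .5)                         # drop β₂ from the starting values
res        = optimize(th -> cns_reg_mle(th, X, y), svals_free, LBFGS())
θ̂_free     = Optim.minimizer(res)
θ̂_full     = vcat(θ̂_free[1], .16, θ̂_free[2:end])                 # add the constraint back in
```
]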
---

# Constrain `\(\beta_2 = .16, \beta_4 = 1+2\beta_3\)` in JuMP

- In `JuMP`, we have

.scroll-box-16[
```julia
function jump_cns2_mle(θ₀, X, y)
    # define the model
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, β[j=1:size(X,2)], start = θ₀[j])
    @variable(model, σ, start = θ₀[end])
*   @constraint(model, β[2] == .16)
*   @constraint(model, β[4] == 1+2*β[3])
    @NLobjective(model, Max,
        sum(-.5 * ( log(2*π) + log(σ^2) +
                    ((y[i] - sum(X[i,j]*β[j] for j in 1:size(X,2)))/σ)^2 ) for i in 1:size(X,1)))
    # optimize the model
    JuMP.optimize!(model)
    # return parameter estimates
    coef_jump = vcat(JuMP.value.(β), JuMP.value(σ), JuMP.objective_value(model))
    # return Hessian for SEs
    values = coef_jump[1:end-1]
    MOI = JuMP.MathOptInterface
    d = JuMP.NLPEvaluator(model)
    MOI.initialize(d, [:Hess])
    hessian_sparsity = MOI.hessian_lagrangian_structure(d)
    V = zeros(length(hessian_sparsity))
    MOI.eval_hessian_lagrangian(d, V, values, 1.0, Float64[])
    H = dense_hessian(hessian_sparsity, V, length(values))
    # Complex.() in case the bordered Hessian puts negative entries on the diagonal of inv(-H)
    se_jump = sqrt.(Complex.(diag(inv(-H))))
    return coef_jump, se_jump
end
jump_cns2_coefs,jump_cns2_se = jump_cns2_mle(svals, X, y)
results.coef_jump_cns2 = jump_cns2_coefs
results.se_jump_cns2   = vcat(jump_cns2_se,missing)

│ Row │ coef_jump  │ se_jump    │ coef_jump_cns2 │ se_jump_cns2 │
│     │ Float64    │ Float64?   │ Float64        │ Float64?     │
├─────┼────────────┼────────────┼────────────────┼──────────────┤
│ 1   │ -0.167441  │ 0.0242349  │ -0.0823977     │ 0.0260503    │
│ 2   │ -0.054249  │ 0.0257493  │ 0.16           │ 0.0284127    │
│ 3   │ -0.155049  │ 0.0197612  │ -0.488346      │ 0.0238168    │
│ 4   │ 0.00525102 │ 0.00493929 │ 0.0233084      │ 0.00531191   │
│ 5   │ 0.195649   │ 0.0493521  │ 0.248021       │ 0.0527678    │
│ 6   │ 0.299131   │ 0.0276513  │ 0.308031       │ 0.0295506    │
│ 7   │ 2.00771    │ 0.0485718  │ 1.96151        │ 0.051928     │
│ 8   │ 0.476761   │ 0.00682761 │ 0.509483       │ 0.00841742   │
│ 9   │ -1653.45   │ missing    │ -1815.29       │ missing      │
```
]

---

# Comments on JuMP results

- `JuMP` gives us the correct point estimates

- The standard errors, however, are incorrect

- At the very least, `\(\text{se}(\beta_2)\)` should be 0 (see the note below)

- Note that the constrained log likelihood is much lower; this is as it should be
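- What should the constrained SEs look like? For linear restrictions like these, the delta method supplies the answer (a note added here for completeness; it is not implemented in the JuMP code above). Since `\(\text{Var}(m + q\beta_B) = q^2\,\text{Var}(\beta_B)\)` for constants `\(m\)` and `\(q\)`:

`\begin{align*}
\beta_2 &= .16 & \Rightarrow \text{se}(\beta_2) &= 0 \\
\beta_4 &= 1 + 2\beta_3 & \Rightarrow \text{se}(\beta_4) &= 2\,\text{se}(\beta_3)
\end{align*}`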
---

# Constrain `\(\beta_2 = .16, \beta_4 = 1+2\beta_3\)` in Optim

- In `Optim`, it's helpful to create a matrix that stores our constraints

.scroll-box-16[
```julia
# can we use Optim for the same constrained optimization?
# now we need additional information: whether the constraint is
# "type 1" (set equal to a fixed value) or "type 2" (set equal to another parameter)

# first, set up constraints
# Type 1: restricting one parameter ("parmA") to equal a fixed value
# Type 2: restricting one parameter, parmA, to equal another ("parmB"),
#         potentially multiplied by some real number q and added to
#         some constant m, e.g. parmA = m + q*parmB
#
# The constraint matrix follows a very specific format. It is an R-by-5
# matrix, where R is the number of restrictions. The role of each of the
# five columns is as follows:
#
# Column 1  The index of parmA
# Column 2  The index of parmB (zero if type 1 restriction)
# Column 3  Binary indicator where 0 indicates a type 1 restriction (parmA
#           set equal to a fixed value) and 1 indicates a type 2
#           restriction (parmA set equal to parmB)
# Column 4  If a type 1 restriction, 0. If a type 2 restriction, the
#           real number q such that parmA = m + q*parmB
# Column 5  If a type 1 restriction, the fixed value. If a type 2
#           restriction, the real number m such that parmA = m + q*parmB
#
# NOTE: parmA should always have a later index than parmB
cns_mat2 = [2 0 0 0 .16;
            4 3 1 2 1]

function cns2_reg_mle(θ, cns_mat, X, y)
    # first K elements are the coefficients of the outcome equation
    β = θ[1:end-1]
    # last element is the standard deviation
    σ = θ[end]
    # impose constraints by re-inserting the constrained parameters
    for r in 1:size(cns_mat,1)
        idx1 = convert(Int64,cns_mat[r,1])
        idx2 = convert(Int64,cns_mat[r,2])
        if cns_mat[r,3]==0
            insert!(β,idx1,cns_mat[r,5])
        else
            insert!(β,idx1,cns_mat[r,5]+cns_mat[r,4]*β[idx2])
        end
    end
    # now build the negative log likelihood
    loglike = -sum(-.5 .* ( log(2*π) .+ log(σ^2) .+ ( (y .- X*β)./σ ).^2 ) )
    # more intuitive way? (but JuMP can't use pdf's from Distributions.jl)
    #loglike = -sum( log(1 ./ sqrt(σ^2)) .+ logpdf.(Normal(0,1),(y .- X*β)./sqrt(σ^2)) )
    return loglike
end

# run the optimizer for MLE
svals = vcat(X\y,.5)
# constraints are treated as data, so take them out of the starting values
# (they get added back in inside the objective function)
for r in 1:size(cns_mat2,1)
    deleteat!(svals, convert(Int64,cns_mat2[r,1]))
end
td = TwiceDifferentiable(th -> cns2_reg_mle(th, cns_mat2, X, y), svals; autodiff = :forward)
θ̂_optim_ad = optimize(td, svals, Newton(linesearch = BackTracking()),
                      Optim.Options(g_tol = 1e-5, iterations=100_000, show_trace=true, show_every=1))
θ̂_mle_optim_ad = θ̂_optim_ad.minimizer
loglikeval = θ̂_optim_ad.minimum

# evaluate the Hessian at the estimates
H = Optim.hessian!(td, θ̂_mle_optim_ad)
θ̂_mle_optim_ad_se = sqrt.(diag(inv(H)))
println(θ̂_mle_optim_ad)

# add back in the constraints in both estimates and SEs
for r in 1:size(cns_mat2,1)
    idx1 = convert(Int64,cns_mat2[r,1])
    idx2 = convert(Int64,cns_mat2[r,2])
    if cns_mat2[r,3]==0
        insert!(θ̂_mle_optim_ad,idx1,cns_mat2[r,5])
        insert!(θ̂_mle_optim_ad_se,idx1,0)
    else
        insert!(θ̂_mle_optim_ad,idx1,cns_mat2[r,5]+cns_mat2[r,4]*θ̂_mle_optim_ad[idx2])
        insert!(θ̂_mle_optim_ad_se,idx1,cns_mat2[r,5]+cns_mat2[r,4]*θ̂_mle_optim_ad_se[idx2]) # this is wrong; the delta method gives se = |q|*se(parmB)
    end
    println(θ̂_mle_optim_ad)
end

# store results in a data frame
results.coef_optim_cns2 = vcat(θ̂_mle_optim_ad,-loglikeval)
results.se_optim_cns2   = vcat(θ̂_mle_optim_ad_se,missing)

│ Row │ coef_mle   │ se_mle     │ coef_optim_cns2 │ se_optim_cns2 │
│     │ Float64    │ Float64?   │ Float64         │ Float64?      │
├─────┼────────────┼────────────┼─────────────────┼───────────────┤
│ 1   │ -0.167441  │ 0.0242349  │ -0.0823977      │ 0.0246269     │
│ 2   │ -0.054249  │ 0.0257493  │ 0.16            │ 0.0           │
│ 3   │ -0.155049  │ 0.0197612  │ -0.488346       │ 0.00256683    │
│ 4   │ 0.00525102 │ 0.00493929 │ 0.0233084       │ 1.00513       │
│ 5   │ 0.195649   │ 0.0493521  │ 0.248021        │ 0.0526101     │
│ 6   │ 0.299131   │ 0.0276513  │ 0.308031        │ 0.0291047     │
│ 7   │ 2.00771    │ 0.0485718  │ 1.96151         │ 0.0506234     │
│ 8   │ 0.476761   │ 0.00682761 │ 0.509483        │ 0.00729623    │
│ 9   │ -1653.45   │ missing    │ -1815.29        │ missing       │
```
]

---

# Analytical gradients and Hessians

- So far in this course, we've used Julia's `autodiff` to take derivatives for us

- In most cases, this will get you pretty close to as much speed as you'll need

- But in some cases, you may require even more performance gains

- In that case, it can be helpful to provide `Optim` with the analytical gradient (interface sketched below)

- In one test I ran, the analytical gradient ran over .hi[3x faster] than autodiff
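- The interface is simple: pass a mutating gradient function `g!` alongside the objective. A toy least-squares sketch (invented for illustration; the full multinomial logit example is on the next slide):

.scroll-box-12[
```julia
using Optim

X = randn(100, 3)                          # toy data, made up for this sketch
y = X*[1.0, -2.0, 0.5] .+ 0.1 .* randn(100)

f(b) = 0.5 * sum(abs2, y .- X*b)           # least-squares objective
function g!(G, b)                          # analytical gradient, filled in place
    G .= -X' * (y .- X*b)                  # ∇f(b) = -X'(y - Xb)
    return nothing
end

res = optimize(f, g!, zeros(3), LBFGS())
b̂ = Optim.minimizer(res)                   # ≈ [1.0, -2.0, 0.5]
```
]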
---

# How to pass an analytical gradient to Optim

- The example code below re-works question 1 from PS4 (multinomial logit)

.scroll-box-16[
```julia
@views @inline function asclogit(bstart::Vector, Y::Array, X::Array, Z::Array, J::Int64, baseAlt::Int64=J, W::Array=ones(length(Y)))
    ## error checking
    @assert ((!isempty(X) || !isempty(Z)) && !isempty(Y)) "You must supply data to the model"
    @assert (ndims(Y)==1 && size(Y,2)==1) "Y must be a 1-D Array"
    @assert (minimum(Y)==1 && maximum(Y)==J) "Y should contain integers numbered consecutively from 1 through J"
    if !isempty(X)
        @assert ndims(X)==2 "X must be a 2-dimensional matrix"
        @assert size(X,1)==size(Y,1) "The 1st dimension of X should equal the number of observations in Y"
    end
    if !isempty(Z)
        @assert ndims(Z)==3 "Z must be a 3-dimensional tensor"
        @assert size(Z,1)==size(Y,1) "The 1st dimension of Z should equal the number of observations in Y"
        @assert size(Z,3)==J "The 3rd dimension of Z should equal the number of choice alternatives"
    end

    K1  = size(X,2)
    K2  = size(Z,2)
    jdx = setdiff(1:J,baseAlt)

    # negative log likelihood
    function f(b)
        T = promote_type(promote_type(promote_type(eltype(X),eltype(b)),eltype(Z)),eltype(W))
        num  = zeros(T,size(Y))
        dem  = zeros(T,size(Y))
        temp = zeros(T,size(Y))
        b2 = b[K1*(J-1)+1:K1*(J-1)+K2]
        k = 1
        for j in 1:J
            if j != baseAlt
                temp .= X*b[(k-1)*K1+1:k*K1] .+ (Z[:,:,j].-Z[:,:,baseAlt])*b2
                num  .= (Y.==j).*temp .+ num
                dem .+= exp.(temp)
                k += 1
            end
        end
        dem .+= 1
        ℓ = -W'*(num .- log.(dem))
        return ℓ
    end

    # analytical gradient of the negative log likelihood, written into G in place
    function g!(G,b)
        T = promote_type(promote_type(promote_type(eltype(X),eltype(b)),eltype(Z)),eltype(W))
        numer = zeros(T,size(Y,1),J)
        P     = zeros(T,size(Y,1),J)
        numg  = zeros(T,K2)
        demg  = zeros(T,K2)
        b2 = b[K1*(J-1)+1:K1*(J-1)+K2]
        G .= T(0)
        # choice probabilities
        k = 1
        for j in 1:J
            if j != baseAlt
                numer[:,j] .= exp.( X*b[(k-1)*K1+1:k*K1] .+ (Z[:,:,j].-Z[:,:,baseAlt])*b2 )
                k += 1
            end
        end
        P .= numer./(1 .+ sum(numer; dims=2))
        # gradient of the coefficients on X
        k = 1
        for j in 1:J
            if j != baseAlt
                G[(k-1)*K1+1:k*K1] .= -X'*(W.*((Y.==j).-P[:,j]))
                k += 1
            end
        end
        # gradient of the coefficients on Z
        for j in 1:J
            if j != baseAlt
                numg .-= (Z[:,:,j].-Z[:,:,baseAlt])'*(W.*(Y.==j))
                demg .-= (Z[:,:,j].-Z[:,:,baseAlt])'*(W.*P[:,j])
            end
        end
        G[K1*(J-1)+1:K1*(J-1)+K2] .= numg .- demg
        return nothing
    end

    td = TwiceDifferentiable(f, g!, bstart, autodiff = :forwarddiff)
    rs = optimize(td, bstart, LBFGS(; linesearch = LineSearches.BackTracking()),
                  Optim.Options(iterations=100_000, g_tol=1e-8, f_tol=1e-8, x_tol=1e-8, show_trace=true))
    β  = Optim.minimizer(rs)
    ℓ  = Optim.minimum(rs)*(-1)
    H  = Optim.hessian!(td, β)
    g  = Optim.gradient!(td, β)
    se = sqrt.(diag(inv(H)))
    return β,se,ℓ,g
end

url = "https://raw.githubusercontent.com/OU-PhD-Econometrics/fall-2021/master/ProblemSets/PS4-mixture/nlsw88t.csv"
dff = CSV.read(HTTP.get(url).body, DataFrame)
XX = [dff.age dff.white dff.collgrad]
ZZ = cat(dff.elnwage1, dff.elnwage2, dff.elnwage3, dff.elnwage4,
         dff.elnwage5, dff.elnwage6, dff.elnwage7, dff.elnwage8; dims=3)
yy = dff.occ_code
J  = 8
startvals = [2*rand(7*size(XX,2)).-1; .1]
β,se,ℓ,g = asclogit(startvals,yy,XX,ZZ,J,J,ones(length(yy)))
dfr = DataFrame(β=β,se=se)
@show dfr
```
]

---

# Solving for equilibria using a contraction mapping

- In many instances, we may want to solve for an equilibrium

- This is especially common in IO applications, where firms interact strategically

- The goal is to estimate preference parameters consistent with the equilibrium

- This would typically involve some kind of a contraction mapping (sketched below):

  - Take a guess at the parameter values

  - Conditional on the parameter values, solve for the equilibrium

  - Update the parameter values, re-solve for the equilibrium, ...
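- Schematically, the nested loop looks like this (a toy sketch with a made-up contraction mapping and a made-up estimation objective, just to show the structure):

.scroll-box-12[
```julia
using Optim

# toy inner problem: the fixed point p = tanh(θ*p + 0.5), a contraction for |θ| < 1
function solve_equilibrium(θ; tol = 1e-12, maxit = 10_000)
    p = 0.0
    for it in 1:maxit
        p_new = tanh(θ*p + 0.5)
        abs(p_new - p) < tol && return p_new
        p = p_new
    end
    error("contraction did not converge")
end

# outer loop: every trial value of θ triggers a full inner equilibrium solve
p_obs = solve_equilibrium(0.7)                                      # pretend θ = 0.7 generated the data
res   = optimize(θ -> (solve_equilibrium(θ) - p_obs)^2, 0.0, 0.99)  # Brent's method on [0, 0.99]
θ̂     = Optim.minimizer(res)                                        # recovers θ ≈ 0.7
```
]

- The costly part is that the inner loop must run to convergence at every guess of `\(\theta\)`; MPEC (next slide) avoids this by handing the fixed-point condition to the optimizer as a constraint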
---

# Solving for equilibria using MPEC

- An alternative approach to solving for equilibria is MPEC

- .hi[MPEC:] Mathematical Programming with Equilibrium Constraints

- With MPEC, we re-cast the equilibrium conditions as a set of optimization constraints

- This eliminates the need to repeatedly solve for a fixed point

- Typically the estimation converges much faster

  - because the optimizer sees the constraints and makes "smarter" guesses

---

# MPEC in the Rust bus engine problem

- MPEC also applies to the fixed-point problems in single-agent dynamic models

- Su and Judd (2012) compare MPEC with the nested fixed point (NFXP) algorithm in the Rust (1987) model

- They show that MPEC converges about .hi[800x faster]

---

# MPEC example: Cournot oligopoly

- `JuMP`, with an add-on package (`Complementarity.jl`), supports MPEC

- Example: `\(N\)`-firm symmetric Cournot oligopoly

- Each firm `\(i\)` chooses output `\(q_i\)` to maximize profit, taking `\(q_{-i}\)` (the other firms' output decisions) as given

- Marginal cost `\(c_i\)` is assumed to be constant and equal across all firms

- Market demand is given by `\(P(Q) = a - bQ\)`

- This problem is easy to solve analytically (derivation below):

`\begin{align*}
q^* &= \frac{a-c}{b(N+1)}, & P^* &= \frac{a+Nc}{N+1}
\end{align*}`
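- Where do these formulas come from? A quick derivation (added for completeness):

`\begin{align*}
\pi_i &= (a - bQ)q_i - cq_i \\
\frac{\partial \pi_i}{\partial q_i} &= a - bQ - bq_i - c = 0
\end{align*}`

- Imposing symmetry (`\(q_i = q^*\)`, `\(Q = Nq^*\)`) turns the FOC into `\(a - c = b(N+1)q^*\)`, which gives `\(q^*\)`; substituting `\(Q^* = Nq^*\)` back into demand gives `\(P^*\)`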
---

# Cournot oligopoly using JuMP

- The code for solving this in JuMP is below

- We could easily generalize this to non-linear demand, asymmetric `\(c_i\)`, etc.

.scroll-box-12[
```julia
using JuMP, Ipopt, Complementarity

# simplified version of cournot demo (linear demand, symmetric costs)
function cournot_symm(; N = 7, mc = 20, a = 100, b = 2, L = 1000)
    c = mc*ones(N)
    m = Model(Ipopt.Optimizer)
    @variable(m, 0 <= x <= L)   # firm 1 output
    @variable(m, y[1:N-1])      # other firms' output
    @variable(m, l[1:N-1])      # Lagrange multipliers
    @variable(m, Q >= 0)        # total market output
    @constraint(m, Q == x + sum(y[i] for i in 1:N-1))
    @constraint(m, x == y[1])
    @NLobjective(m, Min, c[1]*x - x*( a - b*Q ) )  # firm 1's objective (cost minus revenue, i.e. negative profit)
    @NLconstraint(m, cnstr[i=1:N-1], 0 == ( c[i+1] ) - ( a - b*Q ) - y[i]*( -b ) - l[i] )  # other firms' FOCs
    for i in 1:N-1
        @complements(m, l[i], 0 <= y[i] <= L, smooth)
    end
    optimize!(m)
    @show getobjectivevalue(m)
    @show getvalue.(x)
    @show getvalue.(y)
    @show getvalue.(l)
    @show getvalue.(Q)
    @show P = a - b*getvalue.(Q)
    @assert isapprox(getvalue.(x), (a-mc)/(b*(N+1)), atol=1e-4)
    return P
end
P7  = cournot_symm()
P17 = cournot_symm(L=17)
```
]

---

# References

Ackerberg, D. A. (2003). "Advertising, Learning, and Consumer Choice in Experience Good Markets: An Empirical Examination". In: _International Economic Review_ 44.3, pp. 1007-1040. DOI: [10.1111/1468-2354.t01-2-00098](https://doi.org/10.1111%2F1468-2354.t01-2-00098).

Adams, R. P. (2018). _Model Selection and Cross Validation_. Lecture Notes. Princeton University. URL: [https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/model-selection.pdf](https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/model-selection.pdf).

Ahlfeldt, G. M., S. J. Redding, D. M. Sturm, et al. (2015). "The Economics of Density: Evidence From the Berlin Wall". In: _Econometrica_ 83.6, pp. 2127-2189. DOI: [10.3982/ECTA10876](https://doi.org/10.3982%2FECTA10876).

Altonji, J. G., T. E. Elder, and C. R. Taber (2005). "Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools". In: _Journal of Political Economy_ 113.1, pp. 151-184. DOI: [10.1086/426036](https://doi.org/10.1086%2F426036).

Altonji, J. G. and C. R. Pierret (2001). "Employer Learning and Statistical Discrimination". In: _Quarterly Journal of Economics_ 116.1, pp. 313-350. DOI: [10.1162/003355301556329](https://doi.org/10.1162%2F003355301556329).

Angrist, J. D. and A. B. Krueger (1991). "Does Compulsory School Attendance Affect Schooling and Earnings?" In: _Quarterly Journal of Economics_ 106.4, pp. 979-1014. DOI: [10.2307/2937954](https://doi.org/10.2307%2F2937954).

Angrist, J. D. and J. Pischke (2009). _Mostly Harmless Econometrics: An Empiricist's Companion_. Princeton University Press. ISBN: 0691120358.

Arcidiacono, P. (2004). "Ability Sorting and the Returns to College Major". In: _Journal of Econometrics_ 121, pp. 343-375. DOI: [10.1016/j.jeconom.2003.10.010](https://doi.org/10.1016%2Fj.jeconom.2003.10.010).

Arcidiacono, P., E. Aucejo, A. Maurel, et al. (2016). _College Attrition and the Dynamics of Information Revelation_. Working Paper. Duke University. URL: [https://tyleransom.github.io/research/CollegeDropout2016May31.pdf](https://tyleransom.github.io/research/CollegeDropout2016May31.pdf).

Arcidiacono, P., E. Aucejo, A. Maurel, et al. (2025). "College Attrition and the Dynamics of Information Revelation". In: _Journal of Political Economy_ 133.1. DOI: [10.1086/732526](https://doi.org/10.1086%2F732526).

Arcidiacono, P. and J. B. Jones (2003). "Finite Mixture Distributions, Sequential Likelihood and the EM Algorithm". In: _Econometrica_ 71.3, pp. 933-946. DOI: [10.1111/1468-0262.00431](https://doi.org/10.1111%2F1468-0262.00431).

Arcidiacono, P., J. Kinsler, and T. Ransom (2022b). "Asian American Discrimination in Harvard Admissions". In: _European Economic Review_ 144, p. 104079. DOI: [10.1016/j.euroecorev.2022.104079](https://doi.org/10.1016%2Fj.euroecorev.2022.104079).

Arcidiacono, P., J. Kinsler, and T. Ransom (2022a). "Legacy and Athlete Preferences at Harvard". In: _Journal of Labor Economics_ 40.1, pp. 133-156. DOI: [10.1086/713744](https://doi.org/10.1086%2F713744).

Arcidiacono, P. and R. A. Miller (2011). "Conditional Choice Probability Estimation of Dynamic Discrete Choice Models With Unobserved Heterogeneity". In: _Econometrica_ 79.6, pp. 1823-1867. DOI: [10.3982/ECTA7743](https://doi.org/10.3982%2FECTA7743).

Arroyo Marioli, F., F. Bullano, S. Kucinskas, et al. (2020). _Tracking R of COVID-19: A New Real-Time Estimation Using the Kalman Filter_. Working Paper. medRxiv. DOI: [10.1101/2020.04.19.20071886](https://doi.org/10.1101%2F2020.04.19.20071886).

Ashworth, J., V. J. Hotz, A. Maurel, et al. (2021). "Changes across Cohorts in Wage Returns to Schooling and Early Work Experiences". In: _Journal of Labor Economics_ 39.4, pp. 931-964. DOI: [10.1086/711851](https://doi.org/10.1086%2F711851).

Attanasio, O. P., C. Meghir, and A. Santiago (2011). "Education Choices in Mexico: Using a Structural Model and a Randomized Experiment to Evaluate PROGRESA". In: _Review of Economic Studies_ 79.1, pp. 37-66. DOI: [10.1093/restud/rdr015](https://doi.org/10.1093%2Frestud%2Frdr015).

Aucejo, E. M. and J. James (2019). "Catching Up to Girls: Understanding the Gender Imbalance in Educational Attainment Within Race". In: _Journal of Applied Econometrics_ 34.4, pp. 502-525. DOI: [10.1002/jae.2699](https://doi.org/10.1002%2Fjae.2699).

Baragatti, M., A. Grimaud, and D. Pommeret (2013). "Likelihood-free Parallel Tempering". In: _Statistics and Computing_ 23.4, pp. 535-549. DOI: [10.1007/s11222-012-9328-6](https://doi.org/10.1007%2Fs11222-012-9328-6).

Bayer, P., R. McMillan, A. Murphy, et al. (2016). "A Dynamic Model of Demand for Houses and Neighborhoods". In: _Econometrica_ 84.3, pp. 893-942. DOI: [10.3982/ECTA10170](https://doi.org/10.3982%2FECTA10170).

Begg, C. B. and R. Gray (1984). "Calculation of Polychotomous Logistic Regression Parameters Using Individualized Regressions". In: _Biometrika_ 71.1, pp. 11-18. DOI: [10.1093/biomet/71.1.11](https://doi.org/10.1093%2Fbiomet%2F71.1.11).

Beggs, S. D., N. S. Cardell, and J. Hausman (1981). "Assessing the Potential Demand for Electric Cars". In: _Journal of Econometrics_ 17.1, pp. 1-19. DOI: [10.1016/0304-4076(81)90056-7](https://doi.org/10.1016%2F0304-4076%2881%2990056-7).

Berry, S., J. Levinsohn, and A. Pakes (1995). "Automobile Prices in Market Equilibrium". In: _Econometrica_ 63.4, pp. 841-890. URL: [http://www.jstor.org/stable/2171802](http://www.jstor.org/stable/2171802).

Blass, A. A., S. Lach, and C. F. Manski (2010). "Using Elicited Choice Probabilities to Estimate Random Utility Models: Preferences for Electricity Reliability". In: _International Economic Review_ 51.2, pp. 421-440. DOI: [10.1111/j.1468-2354.2010.00586.x](https://doi.org/10.1111%2Fj.1468-2354.2010.00586.x).

Blundell, R. (2010). "Comments on: ``Structural vs. Atheoretic Approaches to Econometrics'' by Michael Keane". In: _Journal of Econometrics_ 156.1, pp. 25-26. DOI: [10.1016/j.jeconom.2009.09.005](https://doi.org/10.1016%2Fj.jeconom.2009.09.005).

Bresnahan, T. F., S. Stern, and M. Trajtenberg (1997). "Market Segmentation and the Sources of Rents from Innovation: Personal Computers in the Late 1980s". In: _The RAND Journal of Economics_ 28.0, pp. S17-S44. DOI: [10.2307/3087454](https://doi.org/10.2307%2F3087454).

Brien, M. J., L. A. Lillard, and S. Stern (2006). "Cohabitation, Marriage, and Divorce in a Model of Match Quality". In: _International Economic Review_ 47.2, pp. 451-494. DOI: [10.1111/j.1468-2354.2006.00385.x](https://doi.org/10.1111%2Fj.1468-2354.2006.00385.x).

Card, D. (1995). "Using Geographic Variation in College Proximity to Estimate the Return to Schooling". In: _Aspects of Labor Market Behaviour: Essays in Honour of John Vanderkamp_. Ed. by L. N. Christofides, E. K. Grant and R. Swidinsky. Toronto: University of Toronto Press.

Cardell, N. S. (1997). "Variance Components Structures for the Extreme-Value and Logistic Distributions with Application to Models of Heterogeneity". In: _Econometric Theory_ 13.2, pp. 185-213. URL: [https://www.jstor.org/stable/3532724](https://www.jstor.org/stable/3532724).

Caucutt, E. M., L. Lochner, J. Mullins, et al. (2020). _Child Skill Production: Accounting for Parental and Market-Based Time and Goods Investments_. Working Paper 27838. National Bureau of Economic Research. DOI: [10.3386/w27838](https://doi.org/10.3386%2Fw27838).

Chen, X., H. Hong, and D. Nekipelov (2011). "Nonlinear Models of Measurement Errors". In: _Journal of Economic Literature_ 49.4, pp. 901-937. DOI: [10.1257/jel.49.4.901](https://doi.org/10.1257%2Fjel.49.4.901).

Chintagunta, P. K. (1992). "Estimating a Multinomial Probit Model of Brand Choice Using the Method of Simulated Moments". In: _Marketing Science_ 11.4, pp. 386-407. DOI: [10.1287/mksc.11.4.386](https://doi.org/10.1287%2Fmksc.11.4.386).

Cinelli, C. and C. Hazlett (2020). "Making Sense of Sensitivity: Extending Omitted Variable Bias". In: _Journal of the Royal Statistical Society: Series B (Statistical Methodology)_ 82.1, pp. 39-67. DOI: [10.1111/rssb.12348](https://doi.org/10.1111%2Frssb.12348).

Coate, P. and K. Mangum (2019). _Fast Locations and Slowing Labor Mobility_. Working Paper 19-49. Federal Reserve Bank of Philadelphia.

Cunha, F., J. J. Heckman, and S. M. Schennach (2010). "Estimating the Technology of Cognitive and Noncognitive Skill Formation". In: _Econometrica_ 78.3, pp. 883-931. DOI: [10.3982/ECTA6551](https://doi.org/10.3982%2FECTA6551).

Cunningham, S. (2021). _Causal Inference: The Mixtape_. Yale University Press. URL: [https://www.scunning.com/causalinference_norap.pdf](https://www.scunning.com/causalinference_norap.pdf).

Delavande, A. and C. F. Manski (2015). "Using Elicited Choice Probabilities in Hypothetical Elections to Study Decisions to Vote". In: _Electoral Studies_ 38, pp. 28-37. DOI: [10.1016/j.electstud.2015.01.006](https://doi.org/10.1016%2Fj.electstud.2015.01.006).

Delavande, A. and B. Zafar (2019). "University Choice: The Role of Expected Earnings, Nonpecuniary Outcomes, and Financial Constraints". In: _Journal of Political Economy_ 127.5, pp. 2343-2393. DOI: [10.1086/701808](https://doi.org/10.1086%2F701808).

Diegert, P., M. A. Masten, and A. Poirier (2025). _Assessing Omitted Variable Bias when the Controls are Endogenous_. arXiv. DOI: [10.48550/ARXIV.2206.02303](https://doi.org/10.48550%2FARXIV.2206.02303).

Erdem, T. and M. P. Keane (1996). "Decision-Making under Uncertainty: Capturing Dynamic Brand Choice Processes in Turbulent Consumer Goods Markets". In: _Marketing Science_ 15.1, pp. 1-20. DOI: [10.1287/mksc.15.1.1](https://doi.org/10.1287%2Fmksc.15.1.1).

Evans, R. W. (2018). _Simulated Method of Moments (SMM) Estimation_. QuantEcon Note. University of Chicago. URL: [https://notes.quantecon.org/submission/5b3db2ceb9eab00015b89f93](https://notes.quantecon.org/submission/5b3db2ceb9eab00015b89f93).

Farber, H. S. and R. Gibbons (1996). "Learning and Wage Dynamics". In: _Quarterly Journal of Economics_ 111.4, pp. 1007-1047. DOI: [10.2307/2946706](https://doi.org/10.2307%2F2946706).

Fu, C., N. Grau, and J. Rivera (2020). _Wandering Astray: Teenagers' Choices of Schooling and Crime_. Working Paper. University of Wisconsin-Madison. URL: [https://www.ssc.wisc.edu/~cfu/wander.pdf](https://www.ssc.wisc.edu/~cfu/wander.pdf).

Gillingham, K., F. Iskhakov, A. Munk-Nielsen, et al. (2022). "Equilibrium Trade in Automobiles". In: _Journal of Political Economy_. DOI: [10.1086/720463](https://doi.org/10.1086%2F720463).

Haile, P. (2019). _``Structural vs. Reduced Form'' Language and Models in Empirical Economics_. Lecture Slides. Yale University. URL: [http://www.econ.yale.edu/~pah29/intro.pdf](http://www.econ.yale.edu/~pah29/intro.pdf).

Haile, P. (2024). _Models, Measurement, and the Language of Empirical Economics_. Lecture Slides. Yale University. URL: [https://www.dropbox.com/s/8kwtwn30dyac18s/intro.pdf](https://www.dropbox.com/s/8kwtwn30dyac18s/intro.pdf).

Heckman, J. J., J. Stixrud, and S. Urzua (2006). "The Effects of Cognitive and Noncognitive Abilities on Labor Market Outcomes and Social Behavior". In: _Journal of Labor Economics_ 24.3, pp. 411-482. DOI: [10.1086/504455](https://doi.org/10.1086%2F504455).

Hotz, V. J. and R. A. Miller (1993). "Conditional Choice Probabilities and the Estimation of Dynamic Models". In: _The Review of Economic Studies_ 60.3, pp. 497-529. DOI: [10.2307/2298122](https://doi.org/10.2307%2F2298122).

Hurwicz, L. (1950). "Generalization of the Concept of Identification". In: _Statistical Inference in Dynamic Economic Models_. Hoboken, NJ: John Wiley and Sons, pp. 245-257.

Ishimaru, S. (2022). _Geographic Mobility of Youth and Spatial Gaps in Local College and Labor Market Opportunities_. Working Paper. Hitotsubashi University.

James, J. (2011). _Ability Matching and Occupational Choice_. Working Paper 11-25. Federal Reserve Bank of Cleveland.

James, J. (2017). "MM Algorithm for General Mixed Multinomial Logit Models". In: _Journal of Applied Econometrics_ 32.4, pp. 841-857. DOI: [10.1002/jae.2532](https://doi.org/10.1002%2Fjae.2532).

Jin, H. and H. Shen (2020). "Foreign Asset Accumulation Among Emerging Market Economies: A Case for Coordination". In: _Review of Economic Dynamics_ 35.1, pp. 54-73. DOI: [10.1016/j.red.2019.04.006](https://doi.org/10.1016%2Fj.red.2019.04.006).

Keane, M. P. (2010). "Structural vs. Atheoretic Approaches to Econometrics". In: _Journal of Econometrics_ 156.1, pp. 3-20. DOI: [10.1016/j.jeconom.2009.09.003](https://doi.org/10.1016%2Fj.jeconom.2009.09.003).

Keane, M. P. and K. I. Wolpin (1997). "The Career Decisions of Young Men". In: _Journal of Political Economy_ 105.3, pp. 473-522. DOI: [10.1086/262080](https://doi.org/10.1086%2F262080).

Koopmans, T. C. and O. Reiersol (1950). "The Identification of Structural Characteristics". In: _The Annals of Mathematical Statistics_ 21.2, pp. 165-181. URL: [http://www.jstor.org/stable/2236899](http://www.jstor.org/stable/2236899).

Kosar, G., T. Ransom, and W. van der Klaauw (2022). "Understanding Migration Aversion Using Elicited Counterfactual Choice Probabilities". In: _Journal of Econometrics_ 231.1, pp. 123-147. DOI: [10.1016/j.jeconom.2020.07.056](https://doi.org/10.1016%2Fj.jeconom.2020.07.056).

Krauth, B. (2016). "Bounding a Linear Causal Effect Using Relative Correlation Restrictions". In: _Journal of Econometric Methods_ 5.1, pp. 117-141. DOI: [10.1515/jem-2013-0013](https://doi.org/10.1515%2Fjem-2013-0013).

Lang, K. and M. D. Palacios (2018). _The Determinants of Teachers' Occupational Choice_. Working Paper 24883. National Bureau of Economic Research. DOI: [10.3386/w24883](https://doi.org/10.3386%2Fw24883).

Lee, D. S., J. McCrary, M. J. Moreira, et al. (2020). _Valid t-ratio Inference for IV_. Working Paper. arXiv. URL: [https://arxiv.org/abs/2010.05058](https://arxiv.org/abs/2010.05058).

Lewbel, A. (2019). "The Identification Zoo: Meanings of Identification in Econometrics". In: _Journal of Economic Literature_ 57.4, pp. 835-903. DOI: [10.1257/jel.20181361](https://doi.org/10.1257%2Fjel.20181361).

Mahoney, N. (2022). "Principles for Combining Descriptive and Model-Based Analysis in Applied Microeconomics Research". In: _Journal of Economic Perspectives_ 36.3, pp. 211-22. DOI: [10.1257/jep.36.3.211](https://doi.org/10.1257%2Fjep.36.3.211).

McFadden, D. (1978). "Modelling the Choice of Residential Location". In: _Spatial Interaction Theory and Planning Models_. Ed. by A. Karlqvist, L. Lundqvist, F. Snickers and J. W. Weibull. Amsterdam: North Holland, pp. 75-96.

McFadden, D. (1989). "A Method of Simulated Moments for Estimation of Discrete Response Models Without Numerical Integration". In: _Econometrica_ 57.5, pp. 995-1026. DOI: [10.2307/1913621](https://doi.org/10.2307%2F1913621). URL: [http://www.jstor.org/stable/1913621](http://www.jstor.org/stable/1913621).

Mellon, J. (2020). _Rain, Rain, Go Away: 137 Potential Exclusion-Restriction Violations for Studies Using Weather as an Instrumental Variable_. Working Paper. University of Manchester. URL: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3715610](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3715610).

Miller, R. A. (1984). "Job Matching and Occupational Choice". In: _Journal of Political Economy_ 92.6, pp. 1086-1120. DOI: [10.1086/261276](https://doi.org/10.1086%2F261276).

Mincer, J. (1974). _Schooling, Experience and Earnings_. New York: Columbia University Press for National Bureau of Economic Research.

Ost, B., W. Pan, and D. Webber (2018). "The Returns to College Persistence for Marginal Students: Regression Discontinuity Evidence from University Dismissal Policies". In: _Journal of Labor Economics_ 36.3, pp. 779-805. DOI: [10.1086/696204](https://doi.org/10.1086%2F696204).

Oster, E. (2019). "Unobservable Selection and Coefficient Stability: Theory and Evidence". In: _Journal of Business & Economic Statistics_ 37.2, pp. 187-204. DOI: [10.1080/07350015.2016.1227711](https://doi.org/10.1080%2F07350015.2016.1227711).

Pischke, S. (2007). _Lecture Notes on Measurement Error_. Lecture Notes. London School of Economics. URL: [http://econ.lse.ac.uk/staff/spischke/ec524/Merr_new.pdf](http://econ.lse.ac.uk/staff/spischke/ec524/Merr_new.pdf).

Ransom, M. R. and T. Ransom (2018). "Do High School Sports Build or Reveal Character? Bounding Causal Estimates of Sports Participation". In: _Economics of Education Review_ 64, pp. 75-89. DOI: [10.1016/j.econedurev.2018.04.002](https://doi.org/10.1016%2Fj.econedurev.2018.04.002).

Ransom, T. (2022). "Labor Market Frictions and Moving Costs of the Employed and Unemployed". In: _Journal of Human Resources_ 57.S, pp. S137-S166. DOI: [10.3368/jhr.monopsony.0219-10013R2](https://doi.org/10.3368%2Fjhr.monopsony.0219-10013R2).

Rudik, I. (2020). "Optimal Climate Policy When Damages Are Unknown". In: _American Economic Journal: Economic Policy_ 12.2, pp. 340-373. DOI: [10.1257/pol.20160541](https://doi.org/10.1257%2Fpol.20160541).

Rust, J. (1987). "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher". In: _Econometrica_ 55.5, pp. 999-1033. URL: [http://www.jstor.org/stable/1911259](http://www.jstor.org/stable/1911259).

Shalizi, C. R. (2019). _Advanced Data Analysis from an Elementary Point of View_. Cambridge University Press. URL: [http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf](http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf).

Smith Jr., A. A. (2008). "Indirect Inference". In: _The New Palgrave Dictionary of Economics_. Ed. by S. N. Durlauf and L. E. Blume. Vol. 1-8. London: Palgrave Macmillan. DOI: [10.1007/978-1-349-58802-2](https://doi.org/10.1007%2F978-1-349-58802-2). URL: [http://www.econ.yale.edu/smith/palgrave7.pdf](http://www.econ.yale.edu/smith/palgrave7.pdf).

Stinebrickner, R. and T. Stinebrickner (2014a). "Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model". In: _Journal of Labor Economics_ 32.3, pp. 601-644. DOI: [10.1086/675308](https://doi.org/10.1086%2F675308).

Stinebrickner, R. and T. R. Stinebrickner (2014b). "A Major in Science? Initial Beliefs and Final Outcomes for College Major and Dropout". In: _Review of Economic Studies_ 81.1, pp. 426-472. DOI: [10.1093/restud/rdt025](https://doi.org/10.1093%2Frestud%2Frdt025).

Su, C. and K. L. Judd (2012). "Constrained Optimization Approaches to Estimation of Structural Models". In: _Econometrica_ 80.5, pp. 2213-2230. DOI: [10.3982/ECTA7925](https://doi.org/10.3982%2FECTA7925).

Train, K. (2009). _Discrete Choice Methods with Simulation_. 2nd ed. Cambridge; New York: Cambridge University Press. ISBN: 9780521766555.

Wiswall, M. and B. Zafar (2018). "Preference for the Workplace, Investment in Human Capital, and Gender". In: _Quarterly Journal of Economics_ 133.1, pp. 457-507. DOI: [10.1093/qje/qjx035](https://doi.org/10.1093%2Fqje%2Fqjx035).

Young, A. (2020). _Consistency without Inference: Instrumental Variables in Practical Application_. Working Paper. London School of Economics.