Lecture 14

class: center, middle, inverse, title-slide

# Lecture 14
## Advanced Optimization Techniques
### Tyler Ransom
### ECON 6343, University of Oklahoma

---

# Plan for the Day

1. More on optimization and optimizers

2. Constrained optimization

3. Analytical gradients and hessians

4. Fixed points and MPEC

---
# Beyond nonlinear optimization

- Throughout this course, we've focused on Julia's `Optim` package

- This package provides algorithms for nonlinear unconstrained optimization

- This makes sense: likelihood functions are nonlinear

- But there are many other types of optimizers out there

- They may not be as applicable for econometric applications

- But they might be helpful for certain applications

---
# Other optimizers

- Aside from nonlinear optimization, there are:

- Linear programming

- Mixed integer programming

- Semidefinite programming

- Convex optimization

- Constrained nonlinear optimization

- Each of these algorithms can be accessed through Julia's `JuMP` package

---
# JuMP uses

- These other algorithms in JuMP have valuable real-world uses

- Optimal bus routes

- Optimal power grid architecture

- Solving budget constraint problems

- Solving sudoku puzzles

- The great thing about JuMP is that it .hi[interfaces] with a plethora of optimizers

- You can keep your code the same and simply switch out which optimizer to use

---
# Why do we need constrained optimization?

- In nonlinear optimization, constraints can be very helpful, for a number of reasons:

- .hi[Numerical stability]
    - e.g. optimization will crash if it guesses a variance to be negative
    
- .hi[Make results consistent with economic theory]
    - e.g. discount factor `\(\beta \in [0,1]\)` in DDC models, otherwise model is undefined
    
- .hi[Simplify the problem] 
    - e.g. `\(\beta=0\)` reduces to a static model

- .hi[More quickly solve equilibrium models] through a method called MPEC

---
# Brief review of constrained optimization

- How do we do constrained optimization in economics? Lagrangians!

`\begin{align*}
& \max_x f(x) \\
& \text{subject to} \\
& g(x)\leq 0
\end{align*}`

`\begin{align*}
\mathcal{L}(x,\lambda) &= f(x) - \lambda g(x)
\end{align*}`

- In the case of optimization, `\(f(x)\)` is our likelihood function; `\(x\)`'s are the parameters

- The first-order conditions (FOCs) tell us what the optimal `\(x\)`'s are

- Also must satisfy second-order (SOCs) and Kuhn-Tucker conditions

- In this case, the SOCs involve looking at the .hi[bordered Hessian]

---
# How to use JuMP

- Let's go through an example of how to estimate an econometric model with `JuMP`

- There are four basic components to any `JuMP` model:

1. An optimizer
    
    2. Variables
    
    3. Constraints
    
    4. Objective function
    
- This list is not too different from what goes into `Optim.jl`

---
# Limitations and considerations when using JuMP

- You cannot vectorize the objective function

- i.e. everything needs to be expressed as a scalar

- You cannot use `Distributions.jl` objects in the objective function

- It is a royal pain to extract the Hessian of the objective function

- We need the Hessian to conduct statistical inference

- It is very simple to add constraints

- When adding constraints, the Hessian becomes the bordered Hessian

- This gives incorrect SEs; JuMP also ignores linear constraints in the Hessian

---
# Example: unconstrained optimization with Optim

- This example will estimate a linear regression model by maximum likelihood

- The objective is to maximize `\(\ell = \sum_i \log f_i(\beta;y_i,X_i)\)` where `\(f(\cdot)\)` is the normal pdf

.scroll-box-12[

```julia
using JuMP, Ipopt, Optim, LineSearches, LinearAlgebra, SparseArrays, Distributions, DataFrames, CSV, HTTP

# Let's read in the data from PS8
url = "https://raw.githubusercontent.com/OU-PhD-Econometrics/fall-2020/master/ProblemSets/PS8-factor/nlsy.csv"
df = CSV.read(HTTP.get(url).body)
X = [df.black df.hispanic df.female df.schoolt df.gradHS df.grad4yr ones(size(df,1),1)]
y = df.logwage

# first let's do unconstrained optimization
function reg_mle(θ, X, y)
    # first K elements are the coefficients of the outcome equation
    β = θ[1:end-1]
    # last element is the variance (stdev)
    σ = θ[end]
    # now build the likelihood
    loglike = -sum(-.5 .* ( log(2*π) .+ log(σ^2) .+ ( (y .- X*β)./σ ).^2 ) )
    # more intuitive way? (but JuMP can't use pdf's from Distributions.jl)
    #loglike = -sum( log(1 ./ sqrt(σ^2)) .+ logpdf.(Normal(0,1),(y .- X*β)./sqrt(σ^2)) )
    return loglike
end

# run the optimizer for MLE
svals = vcat(X\y,.5);
td = TwiceDifferentiable(th -> reg_mle(th, X, y), svals; autodiff = :forward)
θ̂_optim_ad = optimize(td, svals, Newton(linesearch = BackTracking()), Optim.Options(g_tol = 1e-5, iterations=100_000, show_trace=true, show_every=1))
θ̂_mle_optim_ad = θ̂_optim_ad.minimizer
loglikeval = θ̂_optim_ad.minimum
# evaluate the Hessian at the estimates
H  = Optim.hessian!(td, θ̂_mle_optim_ad)
θ̂_mle_optim_ad_se = sqrt.(diag(inv(H)))
# store results in a data frame
results = DataFrame(coef_mle = vcat(vec(θ̂_mle_optim_ad),-loglikeval), se_mle = vcat(vec(θ̂_mle_optim_ad_se),missing), coef_ols = vcat(X\y,missing,missing) )

│ Row │ coef_mle   │ se_mle     │ coef_ols   │
├─────┼────────────┼────────────┼────────────┼
│ 1   │ -0.167441  │ 0.0242349  │ -0.167441  │
│ 2   │ -0.054249  │ 0.0257493  │ -0.054249  │
│ 3   │ -0.155049  │ 0.0197612  │ -0.155049  │
│ 4   │ 0.00525102 │ 0.00493929 │ 0.00525102 │
│ 5   │ 0.195649   │ 0.0493521  │ 0.195649   │
│ 6   │ 0.299131   │ 0.0276513  │ 0.299131   │
│ 7   │ 2.00771    │ 0.0485718  │ 2.00771    │
│ 8   │ 0.476761   │ 0.00682761 │ missing    │
│ 9   │ -1653.45   │ missing    │ missing    │
```
]

---
# Same example using JuMP

.scroll-box-18[

```julia
# we need this function for later
function dense_hessian(hessian_sparsity, V, n)
    I = [i for (i,j) in hessian_sparsity]
    J = [j for (i,j) in hessian_sparsity]
    raw = sparse(I, J, V, n, n)
    return Matrix(raw + raw' - sparse(diagm(0=>diag(raw))))
end

function jump_mle(θ₀, X, y)
    # define the model
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, β[j=1:size(X,2)], start = θ₀[j])
    @variable(model, σ, start = θ₀[end])
    @NLobjective(model, Max, sum(-.5 * ( log(2*π) + log(σ^2) + ((y[i] - sum(X[i,j]*β[j] for j in 1:size(X,2)) )/σ)^2 ) for i in 1:size(X,1) ) )
    # optimize the model
    JuMP.optimize!(model)
    # return parameter estimates
    coef_jump = vcat(JuMP.value.(β), JuMP.value(σ), JuMP.objective_value(model) )
    # return Hessian for SEs
    values = coef_jump[1:end-1]
    MOI = JuMP.MathOptInterface
    d = JuMP.NLPEvaluator(model)
    MOI.initialize(d, [:Hess])
    hessian_sparsity = MOI.hessian_lagrangian_structure(d)
    V = zeros(length(hessian_sparsity))
    MOI.eval_hessian_lagrangian(d, V, values, 1.0, Float64[])
    H = dense_hessian(hessian_sparsity, V, length(values))
    se_jump = sqrt.(diag(inv(-H)))
    return coef_jump, se_jump
end
jump_coefs,jump_se = jump_mle(svals, X, y)
results.coef_jump = jump_coefs
results.se_jump = vcat(jump_se,missing)

│ Row │ coef_mle   │ se_mle     │ coef_jump  │ se_jump    │
├─────┼────────────┼────────────┼────────────┼────────────┼
│ 1   │ -0.167441  │ 0.0242349  │ -0.167441  │ 0.0242349  │
│ 2   │ -0.054249  │ 0.0257493  │ -0.054249  │ 0.0257493  │
│ 3   │ -0.155049  │ 0.0197612  │ -0.155049  │ 0.0197612  │
│ 4   │ 0.00525102 │ 0.00493929 │ 0.00525102 │ 0.00493929 │
│ 5   │ 0.195649   │ 0.0493521  │ 0.195649   │ 0.0493521  │
│ 6   │ 0.299131   │ 0.0276513  │ 0.299131   │ 0.0276513  │
│ 7   │ 2.00771    │ 0.0485718  │ 2.00771    │ 0.0485718  │
│ 8   │ 0.476761   │ 0.00682761 │ 0.476761   │ 0.00682761 │
│ 9   │ -1653.45   │ missing    │ -1653.45   │ missing    │
```
]

---
# Doing constrained optimization

- In `JuMP`, it is simple to add a constraint

- Simply add, for example, `@constraint(model, β[2] == .16)`

- In `Optim`, it is a little bit trickier

- In this case, we need to treat the constrained parameter as "data"
    
    - We need to reduce the dimensionality of the vector we're estimating
    
    - Then we need to impose the constraint
    
    - We also need to repeat these steps outside of the optimization

---
# Constrain `\(\beta_2 = .16, \beta_4 = 1+2\beta_3\)` in JuMP

- In `JuMP`, we have

.scroll-box-16[

```julia
function jump_cns2_mle(θ₀, X, y)
    # define the model
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, β[j=1:size(X,2)], start = θ₀[j])
    @variable(model, σ, start = θ₀[end])
*    @constraint(model, β[2] == .16)
*    @constraint(model, β[4] == 1+2*β[3])
    @NLobjective(model, Max, sum(-.5 * ( log(2*π) + log(σ^2) + ((y[i] - sum(X[i,j]*β[j] for j in 1:size(X,2)) )/σ)^2 ) for i in 1:size(X,1) ) )
    # optimize the model
    JuMP.optimize!(model)
    # return parameter estimates
    coef_jump = vcat(JuMP.value.(β), JuMP.value(σ), JuMP.objective_value(model) )
    # return Hessian for SEs
    values = coef_jump[1:end-1]
    MOI = JuMP.MathOptInterface
    d = JuMP.NLPEvaluator(model)
    MOI.initialize(d, [:Hess])
    hessian_sparsity = MOI.hessian_lagrangian_structure(d)
    V = zeros(length(hessian_sparsity))
    MOI.eval_hessian_lagrangian(d, V, values, 1.0, Float64[])
    H = dense_hessian(hessian_sparsity, V, length(values))
    se_jump = sqrt.(Complex.(diag(inv(-H))))
    return coef_jump, se_jump
end
jump_cns2_coefs,jump_cns2_se = jump_cns2_mle(svals, X, y)
results.coef_jump_cns2 = jump_cns2_coefs
results.se_jump_cns2 = vcat(jump_cns2_se,missing)

│ Row │ coef_jump  │ se_jump    │ coef_jump_cns2 │ se_jump_cns2 │
│     │ Float64    │ Float64?   │ Float64        │ Float64?     │
├─────┼────────────┼────────────┼────────────────┼──────────────┤
│ 1   │ -0.167441  │ 0.0242349  │ -0.0823977     │ 0.0260503    │
│ 2   │ -0.054249  │ 0.0257493  │ 0.16           │ 0.0284127    │
│ 3   │ -0.155049  │ 0.0197612  │ -0.488346      │ 0.0238168    │
│ 4   │ 0.00525102 │ 0.00493929 │ 0.0233084      │ 0.00531191   │
│ 5   │ 0.195649   │ 0.0493521  │ 0.248021       │ 0.0527678    │
│ 6   │ 0.299131   │ 0.0276513  │ 0.308031       │ 0.0295506    │
│ 7   │ 2.00771    │ 0.0485718  │ 1.96151        │ 0.051928     │
│ 8   │ 0.476761   │ 0.00682761 │ 0.509483       │ 0.00841742   │
│ 9   │ -1653.45   │ missing    │ -1815.29       │ missing      │
```
]

---
# Comments on JuMP results

- `JuMP` gives us the correct point estimates

- The standard errors, however, are incorrect

- At the very least, `\(\text{se}(\beta_2)\)` should be 0

- Note that the constrained log likelihood is much lower; this is as it should be

---
# Constrain `\(\beta_2 = .16, \beta_4 = 1+2\beta_3\)` in Optim

- In `Optim`, it's helpful to create a matrix that stores our constraints

.scroll-box-16[

```julia
# can we use optim for the same constrained optimization?
# now we need additional information: whether the constraint is "type 1" (set equal to fixed value) or "type 2" (set equal to another parameter)
# first, set up constraints
#     Type 1  Restricting one parameter ("parmA") to equal a fixed value
#     Type 2  Restricting one parameter, parmA, to equal another ("parmB"),
#             potentially multiplied by some real number q and addd to
#             some constant m, e.g. parmA = m + q*parmB.
#
#   RESTRMAT follows a very specific format. It is an R-by-5 matrix,
#   where R is the number of restrictions. The role of each of the four
#   columns is as follows
#
#   Column 1  The index of parmA
#   Column 2  The index of parmB (zero if type 1 restriction)
#   Column 3  Binary vector w   here 0 indciates a type 1 restriction (parmA
#               set equal to fixed value) and 1 indicates a type 2
#               restriction (parmA set equal to parmB)
#   Column 4  If a type 1 restriction, 0. If     a type 2 restriction, any
#               real number q such that parmA = q*parmB.
#   Column 5  If a type 1 restriction, the fixed value. If a type 2
#                restriction, any real number m such that parmA = m+q*parmB.
#   NOTE: parmA should always be a later index than parmB
cns_mat2 = [2 0 0 0 .16;
            4 3 1 2 1]

function cns2_reg_mle(θ, cns_mat, X, y)
    # first K elements are the coefficients of the outcome equation
    β = θ[1:end-1]
    # last element is the variance (stdev)
    σ = θ[end]
    # impose constraints
    for r in 1:size(cns_mat,1)
        idx1 = convert(Int64,cns_mat[r,1])
        idx2 = convert(Int64,cns_mat[r,2])
        if cns_mat[r,3]==0
            insert!(β,idx1,cns_mat[r,5])
        else
            insert!(β,idx1,cns_mat[r,5]+cns_mat[r,4]*β[idx2])
        end
    end
    # now build the likelihood
    loglike = -sum(-.5 .* ( log(2*π) .+ log(σ^2) .+ ( (y .- X*β)./σ ).^2 ) )
    # more intuitive way? (but JuMP can't use pdf's from Distributions.jl)
    #loglike = -sum( log(1 ./ sqrt(σ^2)) .+ logpdf.(Normal(0,1),(y .- X*β)./sqrt(σ^2)) )
    return loglike
end

# run the optimizer for MLE
svals = vcat(X\y,.5)
# constraints are treated as data, so take them out of starting values (they get added back in inside the obj function)
for r=1:size(cns_mat2,1)
    deleteat!(svals, convert(Int64,cns_mat2[r,1]))
end
td = TwiceDifferentiable(th -> cns2_reg_mle(th, cns_mat2, X, y), svals; autodiff = :forward)
θ̂_optim_ad = optimize(td, svals, Newton(linesearch = BackTracking()), Optim.Options(g_tol = 1e-5, iterations=100_000, show_trace=true, show_every=1))
θ̂_mle_optim_ad = θ̂_optim_ad.minimizer
loglikeval = θ̂_optim_ad.minimum
# evaluate the Hessian at the estimates
H  = Optim.hessian!(td, θ̂_mle_optim_ad)
θ̂_mle_optim_ad_se = sqrt.(diag(inv(H)))
println(θ̂_mle_optim_ad)
# add back in constraint in both estimates and SEs
for r in 1:size(cns_mat2,1)
    idx1 = convert(Int64,cns_mat2[r,1])
    idx2 = convert(Int64,cns_mat2[r,2])
    if cns_mat2[r,3]==0
        insert!(θ̂_mle_optim_ad,idx1,cns_mat2[r,5])
        insert!(θ̂_mle_optim_ad_se,idx1,0)
    else
        insert!(θ̂_mle_optim_ad,idx1,cns_mat2[r,5]+cns_mat2[r,4]*θ̂_mle_optim_ad[idx2])
        insert!(θ̂_mle_optim_ad_se,idx1,cns_mat2[r,5]+cns_mat2[r,4]*θ̂_mle_optim_ad_se[idx2]) # this is wrong
    end
    println(θ̂_mle_optim_ad)
end
# store results in a data frame
results.coef_optim_cns2 = vcat(θ̂_mle_optim_ad,-loglikeval)
results.se_optim_cns2   = vcat(θ̂_mle_optim_ad_se,missing)

│ Row │ coef_mle   │ se_mle     │ coef_optim_cns2 │ se_optim_cns2 │
│     │ Float64    │ Float64?   │ Float64         │ Float64?      │
├─────┼────────────┼────────────┼─────────────────┼───────────────┤
│ 1   │ -0.167441  │ 0.0242349  │ -0.0823977      │ 0.0246269     │
│ 2   │ -0.054249  │ 0.0257493  │ 0.16            │ 0.0           │
│ 3   │ -0.155049  │ 0.0197612  │ -0.488346       │ 0.00256683    │
│ 4   │ 0.00525102 │ 0.00493929 │ 0.0233084       │ 1.00513       │
│ 5   │ 0.195649   │ 0.0493521  │ 0.248021        │ 0.0526101     │
│ 6   │ 0.299131   │ 0.0276513  │ 0.308031        │ 0.0291047     │
│ 7   │ 2.00771    │ 0.0485718  │ 1.96151         │ 0.0506234     │
│ 8   │ 0.476761   │ 0.00682761 │ 0.509483        │ 0.00729623    │
│ 9   │ -1653.45   │ missing    │ -1815.29        │ missing       │
```
]

---
# Analytical gradients and Hessians

- So far in this course, we've used Julia's `autodiff` to take derivatives for us

- In most cases, this will get you pretty close to as much speed as you'll need

- But in some cases, you may require even more performance gains

- In this case, it can be helpful to provide `Optim` with the analytical gradient

- In one test I ran, the analytical gradient ran over .hi[3x faster] than autodiff

---
# How to pass an analytical gradient to Optim

- The example code below re-works question 1 from PS4 (multinomial logit)

.scroll-box-16[

```julia
@views @inline function asclogit(bstart::Vector,Y::Array,X::Array,Z::Array,J::Int64,baseAlt::Int64=J,W::Array=ones(length(Y)))
    ## error checking
    @assert ((!isempty(X) || !isempty(Z)) && !isempty(Y))    "You must supply data to the model"
    @assert (ndims(Y)==1 && size(Y,2)==1)                    "Y must be a 1-D Array"
    @assert (minimum(Y)==1 && maximum(Y)==J) "Y should contain integers numbered consecutively from 1 through J"
    if !isempty(X)
        @assert ndims(X)==2          "X must be a 2-dimensional matrix"
        @assert size(X,1)==size(Y,1) "The 1st dimension of X should equal the number of observations in Y"
    end
    if !isempty(Z)
        @assert ndims(Z)==3          "Z must be a 3-dimensional tensor"
        @assert size(Z,1)==size(Y,1) "The 1st dimension of Z should equal the number of observations in Y"
        @assert size(Z,3)==J         "The 3rd dimension of Z should equal the number of choice alternatives"
    end

K1 = size(X,2)
    K2 = size(Z,2)
    jdx = setdiff(1:J,baseAlt)

function f(b)
        T = promote_type(promote_type(promote_type(eltype(X),eltype(b)),eltype(Z)),eltype(W))
        num   = zeros(T,size(Y))
        dem   = zeros(T,size(Y))
        temp  = zeros(T,size(Y))
        numer =  ones(T,size(Y,1),J)
        P     = zeros(T,size(Y,1),J)
        ℓ     =  zero(T)
        b2 = b[K1*(J-1)+1:K1*(J-1)+K2]

k = 1
        for j in 1:J
            if j != baseAlt
                temp       .= X*b[(k-1)*K1+1:k*K1] .+ (Z[:,:,j].-Z[:,:,baseAlt])*b2
                num        .= (Y.==j).*temp.+num
                dem        .+= exp.(temp)
                numer[:,j] .=  exp.(temp)
                k += 1
            end
        end
        dem.+=1
        P   .=numer./(1 .+ sum(numer;dims=2))

ℓ = -W'*(num.-log.(dem))
    end

function g!(G,b)
        T     = promote_type(promote_type(promote_type(eltype(X),eltype(b)),eltype(Z)),eltype(W))
        numer = zeros(T,size(Y,1),J)
        P     = zeros(T,size(Y,1),J)
        numg  = zeros(T,K2)
        demg  = zeros(T,K2)
        b2    = b[K1*(J-1)+1:K1*(J-1)+K2]

G .= T(0)
        k = 1
        for j in 1:J
            if j != baseAlt
                numer[:,j] .= exp.( X*b[(k-1)*K1+1:k*K1] .+ (Z[:,:,j].-Z[:,:,baseAlt])*b2 )
                k += 1
            end
        end
        P   .=numer./(1 .+ sum(numer;dims=2))

k = 1
        for j in 1:J
            if j != baseAlt
                G[(k-1)*K1+1:k*K1] .= -X'*(W.*((Y.==j).-P[:,j]))
                k += 1
            end
        end

for j in 1:J
            if j != baseAlt
                numg .-= (Z[:,:,j].-Z[:,:,baseAlt])'*(W.*(Y.==j))
                demg .-= (Z[:,:,j].-Z[:,:,baseAlt])'*(W.*P[:,j])
            end
        end
        G[K1*(J-1)+1:K1*(J-1)+K2] .= numg.-demg
        return nothing
    end

td = TwiceDifferentiable(f, g!, bstart, autodiff = :forwarddiff)
    rs = optimize(td, bstart, LBFGS(; linesearch = LineSearches.BackTracking()), Optim.Options(iterations=100_000,g_tol=1e-8,f_tol=1e-8,x_tol=1e-8,show_trace=true))
    β  = Optim.minimizer(rs)
    ℓ  = Optim.minimum(rs)*(-1)
    H  = Optim.hessian!(td, β)
    g  = Optim.gradient!(td, β)
    se = sqrt.(diag(inv(H)))

return β,se,ℓ,g
end

url = "https://raw.githubusercontent.com/OU-PhD-Econometrics/fall-2020/master/ProblemSets/PS4-mixture/nlsw88t.csv"
dff = CSV.read(HTTP.get(url).body)
XX  = [dff.age dff.white dff.collgrad]
ZZ  = cat(dff.elnwage1, dff.elnwage2, dff.elnwage3, dff.elnwage4,
         dff.elnwage5, dff.elnwage6, dff.elnwage7, dff.elnwage8; dims=3)
yy  = dff.occ_code
J   = 8
startvals = [2*rand(7*size(XX,2)).-1; .1]
β,se,ℓ,g = asclogit(startvals,yy,XX,ZZ,J,J,ones(length(yy)))
dfr = DataFrame(β=β,se=se)
@show dfr
```
]

---
# Solving for equilibria using a contraction mapping

- In many instances, we may want to solve for an equilibrium

- This is especially common in IO applications, where firms interact strategically

- The goal is to estimate preference parameters consistent with the equilibrium

- This would typically involve some kind of a contraction mapping:

- Take a guess at the parameter values
    
    - Conditional on the parameter values, solve for the equilibrium
    
    - Update the parameter values, re-solve for the equilibrium, ...

---
# Solving for equilibria using MPEC

- An alternative approach to solving for equilibria is MPEC

- .hi[MPEC:] Mathematical Programming with Equilibrium Constraints

- With MPEC, we re-cast the equilibrium as a set of optimization constraints

- This reduces the need to solve for a fixed point

- Typically the estimation converges much faster

- because the optimizer sees the constraints and makes "smarter" guesses

---
# MPEC in the Rust bus engine problem

- An alternative approach to solving for equilibria is MPEC

- Su and Judd (2012) compare MPEC with NFXP in the Rust (1987) model

- They show that MPEC converges about .hi[800x faster]

---
# MPEC example: Cournot oligopoly

- `JuMP`, with an add-on package (`Complementarity.jl`), supports MPEC

- Example: `\(N\)`-firm symmetric Cournot monopolistic competition

- Each firm `\(i\)` chooses output `\(q_i\)` to maximize profit, subject to `\(q_{-i}\)` (others' output decisions)

- Marginal cost `\(c_i\)` is assumed to be constant and equal across all firms

- Market demand is given by `\(P(Q) = a - bQ\)`

- This problem is easy to solve analytically:
`\begin{align*}
q^* &= \frac{a-c}{b(N+1)}, & P^* &= \frac{a+Nc}{N+1}
\end{align*}`

---
# Cournot oligopoly using JuMP

- The code for solving this in JuMP is below

- We could easily generalize this to non-linear demand, asymmetric `\(c_i\)`, etc.

.scroll-box-12[

```julia
using JuMP, Ipopt, Complementarity

# simplified version of cournot demo (linear demand, symmetric costs)
function cournot_symm(N = 7, mc = 20, a=100, b=2, L = 1000)

c = mc*ones(N)
    m = Model(Ipopt.Optimizer)

@variable(m, 0 <= x <= L) # firm 1 output
    @variable(m, y[1:N-1])    # other firms' output
    @variable(m, l[1:N-1])    # legrange multipliers
    @variable(m, Q >= 0)      # total market output
    @constraint(m, Q == x+sum(y[i] for i in 1:N-1))
    @constraint(m, x == y[1])
    @NLobjective(m, Min, c[1]*x - x*( a - b*Q ) ) # firm 1's objective

@NLconstraint(m, cnstr[i=1:N-1], 0 == ( c[i+1] ) - ( a - b*Q ) - y[i]*( -b ) - l[i] ) # other firms' FOCs

for i in 1:N-1
        @complements(m, l[i], 0 <= y[i] <= L, smooth)
    end

optimize!(m)

@show getobjectivevalue(m)
    @show getvalue.(x)
    @show getvalue.(y)
    @show getvalue.(l)
    @show getvalue.(Q)
    @show P = a - b*getvalue.(Q)
    @assert isapprox(getvalue.(x), (a-mc)/(b*(N+1)), atol=1e-4)
    return P
end
P7  = cournot_symm()
P17 = cournot_symm(17)
```
]

---
# References
Rust, J. (1987). "Optimal Replacement of GMC Bus Engines: An Empirical
Model of Harold Zurcher". In: _Econometrica_ 55.5, pp. 999-1033. URL:
[http://www.jstor.org/stable/1911259](http://www.jstor.org/stable/1911259).

Su, C. and K. L. Judd (2012). "Constrained Optimization Approaches to
Estimation of Structural Models". In: _Econometrica_ 80.5, pp.
2213-2230. DOI: [10.3982/ECTA7925](https://doi.org/10.3982%2FECTA7925).