ACE 592 - Lecture 3.2

Nonlinear Systems of Equations

Diego S. Cardoso

University of Illinois Urbana-Champaign

Course Roadmap

  1. Introduction to Scientific Computing
  2. Fundamentals of numerical methods
  3. Systems of equations
    1. Linear systems
    2. Nonlinear systems
    3. Complementarity problems
  4. Optimization
  5. Function approximation
  6. Structural estimation

Agenda

  • We continue our study of methods to solve systems of equations, now looking at systems that contain nonlinear equations
  • Unlike linear systems, nonlinear systems come with no guarantee of a closed-form solution, so solving them is harder
  • We will study two types of methods: derivative-free and derivative-based

Main references for today

  • Miranda & Fackler (2002), Ch. 3
  • Judd (1998), Ch. 5
  • Nocedal & Wright (2006), Ch. 11
  • Lecture notes for Ivan Rudik’s Dynamic Optimization (Cornell)

Types of nonlinear equation problems

Types of nonlinear equation problems

Problems involving nonlinear equations are very common in economics

We can classify them into two types

  1. Systems of nonlinear equations
  2. Complementarity problems

System of nonlinear equations

These come in two forms

  1. Rootfinding problems

For a function \(f: \mathbb{R}^n \rightarrow \mathbb{R}^n\), we want to find an n-vector \(x\) that satisfies \(f(x) = 0\)

\(\rightarrow\) Any \(x\) that satisfies this condition is called a root of \(f\)

Examples?

  • Market equilibrium in general: market clearing conditions
  • No-arbitrage conditions (pricing models)
  • Solving first-order optimality conditions

System of nonlinear equations

  2. Fixed-point problems

For a function \(g: \mathbb{R}^n \rightarrow \mathbb{R}^n\), we want to find an n-vector \(x\) that satisfies \(x = g(x)\)

\(\rightarrow\) Any \(x\) that satisfies this condition is called a fixed point of \(g\)

Examples?

  • Best-response functions
  • Many equilibrium concepts in game theory
    • A Nash equilibrium is a fixed point in the strategy space

System of nonlinear equations

Rootfinding and fixed-point problems are equivalent

We can easily convert one into another

  • Rootfinding \(\rightarrow\) fixed-point
    • Define a new \(g(x) = x - f(x)\)
  • Fixed-point \(\rightarrow\) rootfinding
    • Define a new \(f(x) = x - g(x)\)
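In code, the conversion is a one-liner in each direction. A minimal sketch (the helper names `to_fixed_point` and `to_rootfinding` are illustrative):

```julia
# Convert a rootfinding problem f(x) = 0 into a fixed-point problem x = g(x)
to_fixed_point(f) = x -> x - f(x)

# Convert a fixed-point problem x = g(x) into a rootfinding problem f(x) = 0
to_rootfinding(g) = x -> x - g(x)

f(x) = -x^(-2) + x - 1   # the rootfinding example used later in this lecture
g = to_fixed_point(f)
# any root of f is a fixed point of g, and converting back recovers f
```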

Complementarity problems

In these problems, we have

  • A function \(f: \mathbb{R}^n \rightarrow \mathbb{R}^n\)
  • n-vectors \(a\) and \(b\), with \(a<b\)

And we are looking for an n-vector \(x \in [a,b]\) such that for all \(i = 1,\dots,n\)

\[ \begin{align*} x_i > a_i \Rightarrow f_i(x) \geq 0 \\ x_i < b_i \Rightarrow f_i(x) \leq 0 \end{align*} \]

Examples of complementarity problems?

  • Market equilibrium with constraints: quotas, price support, non-negativity, limited capacity, etc
  • First-order conditions of constrained function maximization/minimization

Complementarity problems

\[ \begin{align*} x_i > a_i \Rightarrow f_i(x) \geq 0 \\ x_i < b_i \Rightarrow f_i(x) \leq 0 \end{align*} \]

What do these equations mean?

If the constraints on \(x_i\) do not bind (\(a_i < x_i < b_i\)), then both conditions apply and \(f_i(x) = 0\) exactly

But suppose the upper bound binds ( \(x_i = b_i\) ). Then \(f_i(x) \geq 0\) since \(x_i > a_i\)

  • But we can’t guarantee that \(f_i(x) = 0\) because \(f_i(x)\) might still be increasing at that point

Complementarity problems

Rootfinding is a special case of complementarity problems: set \(a_i = -\infty\) and \(b_i = \infty\) for all \(i\)

But complementarity problems are not just about finding a root within \([a, b]\)

  • Remember: if some \(x_i\) is at the boundary ( \(x_i = a_i\) or \(x_i = b_i\) ), some element of \(f(x)\) can be non-zero!
  • NOTE: We will cover complementarity problems next lecture (separate slide deck)

Derivative-free methods

Rootfinding problems

Let’s start simple: we have a continuous function \(f:[a,b]\subset\mathbb{R} \rightarrow\mathbb{R}\) and we know that \(f(a)<0\) and \(f(b)>0\)

What does the Intermediate Value Theorem say here?

If \(f\) is continuous and \(f(a) \neq f(b)\), then \(f\) must take every value between \(f(a)\) and \(f(b)\) somewhere on \([a,b]\)

  • So if \(f(a)<0\) and \(f(b)>0 \Rightarrow\) there must be at least one root \(x\in[a,b]\) such that \(f(x) = 0\)

How would you go about finding a root?

Bisection method

Basic idea: split the search interval in two parts and check whether there’s a root in each part

How do we check that?

  • By looking at the signs of \(f(x)\) at the boundaries of each interval
    • If they are different, there’s a root \(\Rightarrow\) we keep looking there

Let’s see an illustration

Bisection method

We start with \(l = a, u = b\) and find \(m = (u+l)/2\)

Let’s say \(f(l)<0\), \(f(u)>0\), and \(f(m)>0\). What do we do next?

Bisection method

Since \(f(l)<0\) and \(f(m)>0\) have different signs, we continue our search in \([l,m]\)

We set \(u \leftarrow m\)

Then we calculate the new midpoint \(m\)

Now say \(f(l)<0\), \(f(u)>0\), and \(f(m)<0\). What do we do next?

Bisection method

Since \(f(m)<0\) and \(f(u)>0\) have different signs, we continue our search in \([m,u]\)

We set \(l \leftarrow m\)

And the search continues until we are satisfied with the precision

Try it! Bisection method

Your turn!

  • Write a function that takes (f, a, b) and returns a root of \(f\) using the bisection method.
  • Then, use it to find the root of \(f(x) = -x^{-2} + x - 1\) between \([0.2, 4]\)

Here are the basic steps:

  1. Start with a lower bound \(l = a\) and an upper bound \(u = b\)
  2. Get the midpoint \(m = (u+l)/2\)
  3. Check the sign of \(f(m)\)
  4. If \(sign(f(m)) = sign(f(l))\), move the lower bound up: \(l \leftarrow m\)
  5. If \(sign(f(m)) = sign(f(u))\), move the upper bound down: \(u \leftarrow m\)
  6. Repeat steps 2 to 5 until the interval is short enough (\((u-l)/2 < tol\)), then return \(x = m\)

Bisection method

Show the code
function bisection(f, lo, up)
    tolerance = 1e-3          # tolerance for solution
    mid = (lo + up)/2         # initial guess, bisect the interval
    difference = (up - lo)/2  # initialize bound difference

    while difference > tolerance         # loop until convergence
        println("Intermediate guess: $mid")
        if sign(f(lo)) == sign(f(mid)) # if the guess has the same sign as the lower bound
            lo = mid                   # a solution is in the upper half of the interval
            mid = (lo + up)/2
        else                           # else the solution is in the lower half of the interval
            up = mid
            mid = (lo + up)/2
        end
        difference = (up - lo)/2       # update the difference 
    end
    println("The root of f(x) is $mid")
end;

Bisection method

f(x) = -x^(-2) + x - 1;
bisection(f, 0.2, 4.0)
Intermediate guess: 2.1
Intermediate guess: 1.1500000000000001
Intermediate guess: 1.625
Intermediate guess: 1.3875000000000002
Intermediate guess: 1.50625
Intermediate guess: 1.4468750000000001
Intermediate guess: 1.4765625
Intermediate guess: 1.4617187500000002
Intermediate guess: 1.469140625
Intermediate guess: 1.4654296875000001
Intermediate guess: 1.46728515625
The root of f(x) is 1.4663574218750002

Bisection method

Bisection method

What happens if we specify the wrong interval (i.e., there is no root in it)?

It will go towards the boundaries

bisection(f, 2.0, 4.0)
Intermediate guess: 3.0
Intermediate guess: 3.5
Intermediate guess: 3.75
Intermediate guess: 3.875
Intermediate guess: 3.9375
Intermediate guess: 3.96875
Intermediate guess: 3.984375
Intermediate guess: 3.9921875
Intermediate guess: 3.99609375
Intermediate guess: 3.998046875
The root of f(x) is 3.9990234375

Bisection method

So it’s a good idea to check if you have the right boundaries

f(2.0)
0.75
f(4.0)
2.9375

These are both positive, so the IVT cannot guarantee that there’s a root in this interval

Bisection method

The bisection method is incredibly robust: if \(f\) satisfies the sign conditions of the IVT on \([a,b]\), it is guaranteed to converge in a finite number of iterations

A root can be calculated to arbitrary precision \(\epsilon\) in at most \(\log_2\frac{b-a}{\epsilon}\) iterations

But robustness comes with drawbacks:

  1. It only works in one dimension
  2. It is slow because it only uses information about the function’s level but not its variation
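The iteration bound is easy to check against our earlier run. A quick sketch (the function name is illustrative):

```julia
# Maximum number of bisections needed to shrink [a, b] below precision eps
max_bisection_iters(a, b, eps) = ceil(Int, log2((b - a)/eps))

max_bisection_iters(0.2, 4.0, 1e-3)   # 12, consistent with the 12 midpoints computed in our run
```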

Function iteration

For the next method, we recast rootfinding as a fixed-point problem

\[ f(x) = 0 \Rightarrow g(x) = x - f(x) \]

Then, we start with an initial guess \(x^{(0)}\) and iterate \(x^{(k+1)}\leftarrow g(x^{(k)})\) until convergence: \(|x^{(k+1)}-x^{(k)}| \approx 0\)

Let’s try it!

  • Write a function that takes (f, initial_guess) and returns a root of \(f\) using function iteration
  • Then, use it to find the root of \(f(x) = -x^{-2} + x - 1\) with initial_guess = 1

Function iteration

Show the code
function function_iteration(f, initial_guess)
    tolerance = 1e-3   # tolerance for solution
    difference = Inf   # initialize difference
    x = initial_guess  # initialize current value
    
    while abs(difference) > tolerance # loop until convergence
        println("Intermediate guess: $x")
        x_prev = x  # store previous value
        x = x_prev - f(x_prev) # calculate next guess
        difference = x - x_prev # update difference
    end
    println("The root of f(x) is $x")
end;

Function iteration

f(x) = -x^(-2) + x - 1;
function_iteration(f, 1.0)
Intermediate guess: 1.0
Intermediate guess: 2.0
Intermediate guess: 1.25
Intermediate guess: 1.6400000000000001
Intermediate guess: 1.37180249851279
Intermediate guess: 1.5313942135189396
Intermediate guess: 1.426408640598956
Intermediate guess: 1.4914870486759138
Intermediate guess: 1.4495324290188554
Intermediate guess: 1.475931147477801
Intermediate guess: 1.4590582576091302
Intermediate guess: 1.4697369615928915
Intermediate guess: 1.4629358005751476
Intermediate guess: 1.4672501656759618
Intermediate guess: 1.46450636089898
Intermediate guess: 1.4662485297695573
Intermediate guess: 1.4651412125747465
The root of f(x) is 1.4658445625216316

Function iteration

A fixed point \(x = g(x)\) is at the intersection between \(g(x)\) and the 45° line

  • Starting from \(x^{(0)}\), we calculate \(g(x^{(0)})\) and find the corresponding point on the 45° line for \(x^{(1)}\)
  • We keep iterating until we (approximately) find the fixed point

Function iteration

Function iteration is guaranteed to converge to a fixed point \(x^*\) if

  1. \(g\) is differentiable, and
  2. the initial guess is “sufficiently close” to a fixed point \(x^*\) at which \(\|g^\prime (x^{*}) \| < 1\)

It may also converge when these conditions are not met

Since this is an easy method to implement, it’s worth trying it before switching to more complex methods
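For our running example, we can check the sufficient condition directly. With \(g(x) = x - f(x)\) we have \(g'(x) = 1 - f'(x)\); a sketch evaluating \(|g'|\) near the root found above:

```julia
f_prime(x) = 2x^(-3) + 1     # derivative of f(x) = -x^(-2) + x - 1
g_prime(x) = 1 - f_prime(x)  # derivative of g(x) = x - f(x)

abs(g_prime(1.4656))         # ≈ 0.64 < 1, so function iteration converges locally
```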

Function iteration

But wait: What is “sufficiently close”?

Good question! There is no practical formula. As Miranda and Fackler (2002) put it

Typically, an analyst makes a reasonable guess for the root of \(f\) and counts his/her blessings if the iterates converge. If the iterates do not converge, then the analyst must look more closely at the properties of \(f\) to find a better starting value, or change to another rootfinding method.

This is where science also becomes a bit of an art

Derivative-based methods

Newton’s method

Newton’s method and variants are the workhorses of solving n-dimensional non-linear problems

Key idea: take a hard non-linear problem and replace it with a sequence of linear problems \(\rightarrow\) successive linearization

Newton’s method

  1. Start with an initial guess of the root at \(x^{(0)}\)
  2. Approximate the non-linear function with its first-order Taylor expansion around \(x^{(0)}\)
    • This is just the tangent line at \(x^{(0)}\)
  3. Solve for the root of this linear approximation, call it \(x^{(1)}\)

Newton’s method

  4. Repeat starting at \(x^{(1)}\) until we converge to \(x^*\)

This can be applied to a function with an arbitrary number of dimensions

Newton’s method

Formally: begin with some initial guess of the root vector \(\mathbf{x^{(0)}}\)

Given \(\mathbf{x^{(k)}}\), our new guess \(\mathbf{x^{(k+1)}}\) is obtained by approximating \(f(\mathbf{x})\) using a first-order Taylor expansion around \(\mathbf{x^{(k)}}\)

\[ f(\mathbf{x^{(k+1)}}) \approx f(\mathbf{x^{(k)}}) + f'(\mathbf{x^{(k)}})(\mathbf{x^{(k+1)}}-\mathbf{x^{(k)}}) = 0 \]

Then, solve for \(\mathbf{x^{(k+1)}}\):

\[ \mathbf{x^{(k+1)}} = \mathbf{x^{(k)}} - \left[f'(\mathbf{x^{(k)}})\right]^{-1}f(\mathbf{x^{(k)}}) \]

Your turn! Newton’s method

Try it!

  • Write a function that takes (f, f_prime, initial_guess) and returns a root of \(f\) using Newton’s method
  • Then use it to find the root of \(f(x) = -x^{-2} + x - 1\) with initial_guess = 1

\[ \mathbf{x^{(k+1)}} \leftarrow \mathbf{x^{(k)}} - \left[f'(\mathbf{x^{(k)}})\right]^{-1}f(\mathbf{x^{(k)}}) \]

Newton’s method

Show the code
function newtons_method(f, f_prime, initial_guess)
  tolerance = 1e-3   # tolerance for solution
  difference = Inf   # initialize difference
  x = initial_guess  # initialize current value
  
  while abs(difference) > tolerance # loop until convergence
      println("Intermediate guess: $x")
      x_prev = x  # store previous value
      x = x_prev - f(x_prev)/f_prime(x_prev) # calculate next guess
      # ^ this is the only line that changes from function iteration
      difference = x - x_prev # update difference
  end
  println("The root of f(x) is $x")
end;

Newton’s method

f(x) = -x^(-2) + x - 1;
f_prime(x) = 2x^(-3) + 1;
newtons_method(f, f_prime, 1.0)
Intermediate guess: 1.0
Intermediate guess: 1.3333333333333333
Intermediate guess: 1.4576271186440677
Intermediate guess: 1.4655459327062879
The root of f(x) is 1.465571231622256

Newton’s method

Newton’s method has nice properties regarding convergence and speed

It converges if

  1. \(f(x)\) is continuously differentiable,
  2. The initial guess is “sufficiently close” to the root, and
  3. \(f'(x)\) is invertible near the root

We need \(f'(x)\) to be invertible so the algorithm above is well defined

If \(f'(x)\) is ill-conditioned we can run into problems with rounding error

Newton’s method: a duopoly example

Inverse demand: \(P(q) = q^{-1/\epsilon}\)

Two firms with costs: \(C_i(q_i) = \frac{1}{2}c_i q_i^2\)

Firm i’s profits: \(\pi_i(q_1,q_2) = P(q_1 + q_2)q_i - C_i(q_i)\)

Firms take other’s output as given. So their first-order conditions are

\[ \frac{\partial \pi_i}{\partial q_i} = P(q_1 + q_2) + P^\prime(q_1 + q_2)q_i - C_i^\prime(q_i) = 0 \]

Newton’s method: a duopoly example

We are looking for an equilibrium: a pair \((q_1, q_2)\) that solves the two nonlinear equations

\[ \begin{align*} f_1(q_1, q_2) & = (q_1 + q_2)^{-1/\epsilon} - (1/\epsilon)(q_1 + q_2)^{-1/\epsilon-1}q_1 - c_1 q_1 = 0 \\ f_2(q_1, q_2) & = (q_1 + q_2)^{-1/\epsilon} - (1/\epsilon)(q_1 + q_2)^{-1/\epsilon-1}q_2 - c_2 q_2 = 0 \end{align*} \]

Can you solve this analytically? It’s quite hard…

Let’s do it numerically, starting by coding this function in Julia

Newton’s method: a duopoly example

For this example, \(\epsilon = 1.6, c_1 = 0.6, c_2 = 0.8\)

epsilon = 1.6; c = [0.6; 0.8]; # column vector

\(f(q)\) will return a vector (pay attention to how I used the dot syntax)

function f(q)
    Q = sum(q)
    F = Q^(-1/epsilon) .- (1/epsilon)Q^(-1/epsilon-1) .*q .- c .*q
end;
f([0.2; 0.2])
2-element Vector{Float64}:
 1.0989480808247896
 1.0589480808247895

Newton’s method: a duopoly example

What do we need next to use Newton’s method?

The derivatives! In this case, we have a function \(f:\mathbb{R}^2\rightarrow\mathbb{R}^2\), so we need to define the Jacobian matrix

\[ J = \begin{bmatrix} \frac{\partial f_1}{\partial q_1} & \frac{\partial f_1}{\partial q_2} \\ \frac{\partial f_2}{\partial q_1} & \frac{\partial f_2}{\partial q_2} \\ \end{bmatrix} \]

Newton’s method: a duopoly example

We can derive these terms analytically

\[ \begin{align*} \frac{\partial f_{i}}{\partial q_{i}} &= (1/\epsilon)(q_{1}+q_{2})^{-1/\epsilon-1}\left[-2+(1/\epsilon+1)(q_{1}+q_{2})^{-1}q_{i}\right]-c_{i} \\ \frac{\partial f_{i}}{\partial q_{j\neq i}} &= \underbrace{(1/\epsilon)(q_{1}+q_{2})^{-1/\epsilon-1}}_{A} \left[-1+ \underbrace{(1/\epsilon+1)(q_{1}+q_{2})^{-1}}_{B} q_{i}\right] \end{align*} \]

Newton’s method: a duopoly example

So we can write the Jacobian as

\[ J = -A \begin{bmatrix} 2 & 1\\ 1 & 2 \end{bmatrix} +AB \begin{bmatrix} q_{1} & q_{1}\\ q_{2} & q_{2}\\ \end{bmatrix} - \begin{bmatrix} c_{1} & 0\\ 0 & c_{2} \end{bmatrix} \]

where \(A\equiv(1/\epsilon)(q_{1}+q_{2})^{-1/\epsilon-1}\), \(B\equiv(1/\epsilon+1)(q_{1}+q_{2})^{-1}\)

Let’s turn this Jacobian into a Julia function so we can use Newton’s method

Newton’s method: a duopoly example

Now we define a function for the Jacobian

using LinearAlgebra
function f_jacobian(q)
    Q = sum(q)
    A = (1/epsilon)Q^(-1/epsilon-1)
    B = (1/epsilon+1)Q^(-1)
    J = -A .* [2 1; 1 2] + (A*B) .* [q q] - LinearAlgebra.Diagonal(c)
end;
f_jacobian([0.2; 0.2])
2×2 Matrix{Float64}:
 -3.88977   -0.519438
 -0.519438  -4.08977

Newton’s method: a duopoly example

We need to write a new version of our Newton’s method so it can handle \(n\)-dimensional functions

function newtons_method_multidim(f, f_jacobian, initial_guess) 
    tolerance = 1e-3   
    difference = Inf   
    x = initial_guess  
    
    while norm(difference) > tolerance # <=== Changed here
        println("Intermediate guess: $x")
        x_prev = x  
        x = x_prev - f_jacobian(x_prev)\f(x_prev) # <=== and here
        difference = x - x_prev 
    end
    println("The root of f(x) is $x")
    return x
  end;

Newton’s method: a duopoly example

Note that we only needed to change 2 things from the previous version:

  1. Our tolerance is now over the norm of the difference vector
  2. The “derivative” is a Jacobian matrix, so we multiply \(f(x^{(k)})\) by the inverse of \(J(x^{(k)})\)
  • We use the \ operator because it is more efficient than explicitly inverting \(J\)

We also added a return x so that the function returns a solution

Newton’s method: a duopoly example

Let’s test it out with initial guess \((0.2, 0.2)\)

x = newtons_method_multidim(f, f_jacobian, [0.2; 0.2])
Intermediate guess: [0.2, 0.2]
Intermediate guess: [0.4522233972882232, 0.4268911412785466]
Intermediate guess: [0.7443820767549576, 0.6380274933135309]
Intermediate guess: [0.8359566131332257, 0.6874038938481142]
Intermediate guess: [0.8395632064645864, 0.6887954351409207]
The root of f(x) is [0.8395676035295858, 0.6887964311626718]
2-element Vector{Float64}:
 0.8395676035295858
 0.6887964311626718

Newton’s method: a duopoly example

Let’s check our solution

f(x)
2-element Vector{Float64}:
 5.7654991891809004e-12
 9.037215420448774e-13

Pretty good result with quick convergence!

Newton’s method: a duopoly example

Visually, this is the path Newton’s method followed

Newton’s method: analytic vs numerical derivatives

It was tedious, but in the previous example we could calculate the Jacobian analytically. Sometimes it is much harder, and it is easy to make mistakes.

Here is a challenge for you: redo the previous example but, instead of defining the Jacobian analytically, use the ForwardDiff package to do the derivatives for you. Then, compare your solution with mine.

An alternative is to use the Symbolics.jl package. For low-dimensional derivatives and Jacobians, it does a good job!

Quasi-Newton methods

We usually don’t want to deal with analytic derivatives unless we have access to automatic differentiation

Why?

  1. It can be difficult to do the analytic derivation
  2. Coding a complicated Jacobian is error-prone and takes time
  3. Analytic derivatives can actually be slower to evaluate than finite differences for some nonlinear problems

Alternative \(\rightarrow\) finite differences instead of analytic derivatives

Quasi-Newton: Secant method

Using our current root guess \(x^{(k)}\) and our previous root guess \(x^{(k-1)}\):

\[ f'(x^{(k)}) \approx \frac{f(x^{(k)})-f(x^{(k-1)})}{x^{(k)} - x^{(k-1)}} \]

Our new iteration rule then becomes

\[ x^{(k+1)} = x^{(k)} - \frac{x^{(k)}-x^{(k-1)}}{f(x^{(k)})-f(x^{(k-1)})}f(x^{(k)}) \]

Now we require two initial guesses so that we have an initial approximation of the derivative
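The iteration rule above can be sketched in the same style as our earlier solvers (the tolerance and naming conventions are assumptions carried over from them):

```julia
function secant_method(f, x_prev, x)
    tolerance = 1e-3                   # tolerance for solution
    while abs(x - x_prev) > tolerance  # loop until successive guesses are close
        # finite-difference slope replaces f'(x) in the Newton step
        slope = (f(x) - f(x_prev)) / (x - x_prev)
        x_prev, x = x, x - f(x)/slope
    end
    return x
end

f(x) = -x^(-2) + x - 1
secant_method(f, 1.0, 1.5)   # ≈ 1.4656, matching the Newton result
```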

Quasi-Newton: Secant method

Quasi-Newton: Broyden’s method

Broyden’s method is the most widely used rootfinding method for n-dimensional problems

  • This is a generalization of the secant method in which we maintain a sequence of guesses of the Jacobian at the root

It also relies on a 1st-order Taylor expansion about \(\mathbf{x}\), but now in n dimensions

\[ f(\mathbf{x}) \approx f(\mathbf{x^{(k)}}) + A^{(k)} (\mathbf{x} - \mathbf{x^{(k)}}) = 0 \]

We must initially provide a guess of the root, \(\mathbf{x}^{(0)}\), but also a guess of the Jacobian, \(A^{(0)}\)

  • A good guess for \(A^{(0)}\) is to calculate the Jacobian numerically at our chosen \(\mathbf{x}^{(0)}\)

Quasi-Newton: Broyden’s method

The iteration rule is the same as before but with our guess of the Jacobian substituted in for the actual Jacobian (or the finite difference approximation)

\[ \mathbf{x^{(k+1)}} \leftarrow \mathbf{x^{(k)}} - (A^{(k)})^{-1} \, f(\mathbf{x^{(k)}}) \]

We still need to update \(A^{(k)}\): the idea of Broyden’s method is to choose a new Jacobian guess that satisfies the multidimensional secant condition

\[ f(\mathbf{x^{(k+1)}}) - f(\mathbf{x^{(k)}}) = A^{(k+1)}\left( \mathbf{x^{(k+1)}} - \mathbf{x^{(k)}} \right) \]

  • Any reasonable guess for the Jacobian should satisfy this condition

But this gives only \(n\) conditions to pin down the \(n^2\) elements of \(A\)

Quasi-Newton: Broyden’s method

Broyden’s method solves this underdetermined problem with an assumption that focuses on the direction we are most interested in: \(d^{(k)} = \mathbf{x^{(k+1)}} - \mathbf{x^{(k)}}\)

For any direction \(q\) orthogonal to \(d^{(k)}\), it assumes that \(A^{(k+1)}q = A^{(k)}q\)

  • In other words, our next guess is as good as the current one for any change in \(x\) orthogonal to the direction we are interested in right now

Jointly, the secant condition and the orthogonality assumption give the iteration rule for the Jacobian:

\[ A^{(k+1)} \leftarrow A^{(k)} + \left[f(\mathbf{x^{(k+1)}}) - f(\mathbf{x^{(k)}}) - A^{(k)}d^{(k)}\right] \frac{d^{(k)T}}{d^{(k)T}d^{(k)}} \]

Quasi-Newton: Broyden’s method

The general algorithm for Broyden’s method is:

  1. Choose an initial guess for the root \(\mathbf{x}^{(0)}\) and the Jacobian \(A^{(0)}\)
  2. Calculate the next guess for the root: \(\mathbf{x}^{(k+1)}\)
  3. Calculate the next guess for the Jacobian: \(A^{(k+1)}\)
  4. Repeat 2 and 3 until convergence: \(\|\mathbf{x}^{(k+1)} - \mathbf{x}^{(k)} \| \leq tolerance\)
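The four steps can be sketched as follows. This is a bare-bones version (no line search; the tolerance and cap on iterations are assumptions) applied to the duopoly model, seeding \(A^{(0)}\) with the Jacobian value printed earlier:

```julia
using LinearAlgebra

function broydens_method(f, x, A; tolerance = 1e-8, maxiter = 100)
    d = fill(Inf, length(x))
    iter = 0
    while norm(d) > tolerance && iter < maxiter
        x_prev = x
        x = x_prev - A \ f(x_prev)                        # step 2: next guess for the root
        d = x - x_prev
        A = A + (f(x) - f(x_prev) - A*d) * d' / (d' * d)  # step 3: secant update of the Jacobian
        iter += 1
    end
    return x
end

# duopoly model from the earlier slides
epsilon = 1.6; c = [0.6; 0.8];
function f(q)
    Q = sum(q)
    return Q^(-1/epsilon) .- (1/epsilon) * Q^(-1/epsilon - 1) .* q .- c .* q
end
A0 = [-3.88977 -0.519438; -0.519438 -4.08977]  # Jacobian at [0.2; 0.2], computed earlier
broydens_method(f, [0.2; 0.2], A0)             # ≈ [0.8396, 0.6888], matching Newton
```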

Quasi-Newton: Broyden’s method

One costly part of Broyden’s method is that we update the Jacobian and “invert” it every iteration

But, since we only need the Jacobian’s inverse to update \(\mathbf{x}\), we can make it faster by updating the inverse Jacobian directly

\[ B^{(k+1)} = B^{(k)} + \frac{\left[d^{(k)} - u^{(k)}\right]d^{(k)T}B^{(k)}}{d^{(k)T} u^{(k)}} \]

where \(u^{(k)} = B^{(k)}\left[f(\mathbf{x^{(k+1)}})-f(\mathbf{x^{(k)}})\right]\)

  • Most canned implementations of Broyden’s method use the inverse update rule
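To see that the two update rules agree, we can check numerically that the inverse update reproduces \(\left[A^{(k+1)}\right]^{-1}\). A sketch with made-up values for the step \(d^{(k)}\) and the change in residuals \(y^{(k)} = f(\mathbf{x^{(k+1)}})-f(\mathbf{x^{(k)}})\):

```julia
using LinearAlgebra

A = [2.0 0.5; 0.3 1.5]                  # current Jacobian guess A⁽ᵏ⁾
B = inv(A)                              # its inverse B⁽ᵏ⁾
d = [0.1; -0.2]                         # step d⁽ᵏ⁾ (made up)
y = [0.15; -0.25]                       # change in residuals y⁽ᵏ⁾ (made up)

A_new = A + (y - A*d) * d' / (d' * d)   # direct Broyden update of the Jacobian
u = B * y
B_new = B + (d - u) * d' * B / (d' * u) # inverse update rule from this slide

maximum(abs.(B_new - inv(A_new)))       # ≈ 0: the two rules agree up to rounding
```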

Quasi-Newton: Broyden’s method

Broyden converges under relatively weak conditions:

  1. \(f\) is continuously differentiable,
  2. \(x^{(0)}\) is “close” to the root of \(f\)
  3. \(f'\) is invertible around the root
  4. \(A^{(0)}\) is sufficiently close to the Jacobian at the root

Broyden’s method: a duopoly example

Let’s revisit our previous example but now solve it using Broyden’s method

We won’t code it by hand. Instead, we use package NLsolve.jl

Function NLsolve.nlsolve has a ton of options to solve nonlinear equations

Once again, we are looking for a pair \((q_1, q_2)\) that solves the two nonlinear equations

\[ \begin{align*} f_1(q_1, q_2) & = (q_1 + q_2)^{-1/\epsilon} - (1/\epsilon)(q_1 + q_2)^{-1/\epsilon-1}q_1 - c_1 q_1 = 0 \\ f_2(q_1, q_2) & = (q_1 + q_2)^{-1/\epsilon} - (1/\epsilon)(q_1 + q_2)^{-1/\epsilon-1}q_2 - c_2 q_2 = 0 \end{align*} \]

Broyden’s method: a duopoly example

Using our previously defined \(f\), we run

using NLsolve
NLsolve.nlsolve(f, [1.0; 2.0], method=:broyden,
                xtol=1e-8, ftol=0.0, iterations=1000, show_trace=true)
Iter     f(x) inf-norm    Step 2-norm 
------   --------------   --------------
     0     1.306427e+00              NaN
     1     1.451615e+00     2.730291e-02
     2     1.963157e-01     2.710146e+00
     3     1.212705e-01     4.591065e-02
     4     1.384160e-01     2.899985e-04
     5     6.349495e-03     2.595537e-02
     6     1.049686e-03     4.755299e-05
     7     3.404813e-05     1.308004e-06
     8     3.486624e-07     1.411664e-09
     9     3.116015e-10     1.486563e-13
    10     1.252332e-13     1.177236e-19
Results of Nonlinear Solver Algorithm
 * Algorithm: broyden without line-search
 * Starting Point: [1.0, 2.0]
 * Zero: [0.8395676035355293, 0.6887964311629567]
 * Inf-norm of residuals: 0.000000
 * Iterations: 10
 * Convergence: true
   * |x - x'| < 1.0e-08: true
   * |f(x)| < 0.0e+00: false
 * Function Calls (f): 37
 * Jacobian Calls (df/dx): 0

Broyden’s method: a duopoly example

  • The first and second arguments are \(f\) and an initial guess
    • nlsolve will automatically generate a Jacobian guess for us
  • method=:broyden tells nlsolve to use Broyden’s methods

The other arguments are optional

  • xtol is the convergence tolerance over \(x\): \(\| \mathbf{x^{(k+1)}} - \mathbf{x^{(k)}} \| \leq\) xtol (default is 0.0, meaning this criterion is not used)
  • ftol is the convergence tolerance over the residual: \(\| f(\mathbf{x^{(k)}}) \| \leq\) ftol (default is 1e-8)
  • iterations sets the maximum number of iterations before declaring non-convergence (default is 1000)
  • show_trace will print all the iterations if set to true (default is false)

NLsolve.jl

We can use this package to solve with Newton’s method as well. Here we make use of the analytic Jacobian we defined earlier

NLsolve.nlsolve(f, f_jacobian, [1.0; 2.0], method=:newton,
                xtol=1e-8, ftol=0.0, iterations=1000)
Results of Nonlinear Solver Algorithm
 * Algorithm: Newton with line-search
 * Starting Point: [1.0, 2.0]
 * Zero: [0.8395676035356598, 0.6887964311630005]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 1.0e-08: true
   * |f(x)| < 0.0e+00: false
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6

NLsolve.jl

If you omit the Jacobian, nlsolve will calculate it numerically for you using centered finite differencing

NLsolve.nlsolve(f, [1.0; 2.0], method=:newton,
                xtol=1e-8, ftol=0.0, iterations=1000)
Results of Nonlinear Solver Algorithm
 * Algorithm: Newton with line-search
 * Starting Point: [1.0, 2.0]
 * Zero: [0.83956760353566, 0.6887964311630005]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 1.0e-08: true
   * |f(x)| < 0.0e+00: false
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6

NLsolve.jl

You can add argument autodiff=:forward to use forward autodifferentiation instead of finite differences

NLsolve.nlsolve(f, [1.0; 2.0], method=:newton, autodiff=:forward,
                xtol=1e-8, ftol=0.0, iterations=1000)
Results of Nonlinear Solver Algorithm
 * Algorithm: Newton with line-search
 * Starting Point: [1.0, 2.0]
 * Zero: [0.8395676035356598, 0.6887964311630005]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 1.0e-08: true
   * |f(x)| < 0.0e+00: false
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6

Quick detour

Before we continue, take a look again at our \(f\)

function f(q)
    Q = sum(q)
    F = Q^(-1/epsilon) .- (1/epsilon)Q^(-1/epsilon-1) .*q .- c .*q
end;

Do you see any potential inefficiency?

We allocate a new F every time we call this function!

Instead, we can be more efficient by writing a function that modifies a pre-allocated vector

Quick detour: functions that modify arguments

By convention, in Julia we name functions that modify arguments with a ! at the end. For our \(f\), we can define

function f!(F, q)
    F[1] = sum(q)^(-1/epsilon) - (1/epsilon)sum(q)^(-1/epsilon-1)*q[1] - c[1]*q[1]
    F[2] = sum(q)^(-1/epsilon) - (1/epsilon)sum(q)^(-1/epsilon-1)*q[2] - c[2]*q[2]
end;
F = zeros(2) # This allocates a 2-vector with elements equal to zero
f!(F, [0.2; 0.2]); # Note the first argument is a pre-allocated vector
F
2-element Vector{Float64}:
 1.0989480808247896
 1.0589480808247895

Quick detour: functions that modify arguments

NLsolve.nlsolve understands when we pass a ! function and pre-allocates the vector for us

Because it allocates only once, it will be more efficient

NLsolve.nlsolve(f!, [1.0; 2.0], method=:newton,
                xtol=1e-8, ftol=0.0, iterations=1000)
Results of Nonlinear Solver Algorithm
 * Algorithm: Newton with line-search
 * Starting Point: [1.0, 2.0]
 * Zero: [0.83956760353566, 0.6887964311630005]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 1.0e-08: true
   * |f(x)| < 0.0e+00: false
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6

Comparing methods

Convergence speed

Rootfinding algorithms will converge at different speeds in terms of the number of operations

A sequence of iterates \(x^{(k)}\) is said to converge to \(x^*\) at a rate of order \(p\) if there is a constant \(C\) such that

\[ \|x^{(k+1)}-x^{*}\|\leq C \|x^{(k)} - x^{*}\|^p \]

for sufficiently large \(k\)

Convergence speed

\[ \|x^{(k+1)}-x^{*}\|\leq C \|x^{(k)} - x^{*}\|^p \]

  • If \(C < 1\) and \(p = 1\): linear convergence
  • If \(1 < p < 2\): superlinear convergence
  • If \(p = 2\): quadratic convergence

The higher order the convergence rate, the faster it converges

Convergence speed

How fast do the methods we’ve seen converge?

  • Bisection: linear rate with \(C = 0.5\)
  • Function iteration: linear rate with \(C = \|g^\prime(x^*)\|\)
  • Secant and Broyden: superlinear rate with \(p \approx 1.62\)
  • Newton: \(p = 2\)
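We can see Newton’s quadratic rate on our running example: the number of correct digits roughly doubles each iteration. A sketch, using a separately computed high-precision value of the root as the reference \(x^*\):

```julia
function newton_errors(f, f_prime, x, x_star, iters)
    errors = Float64[]
    for _ in 1:iters
        x -= f(x) / f_prime(x)          # Newton step
        push!(errors, abs(x - x_star))  # distance to the true root
    end
    return errors
end

f(x) = -x^(-2) + x - 1
f_prime(x) = 2x^(-3) + 1
x_star = 1.465571231876768   # root of f to high precision (computed separately)
newton_errors(f, f_prime, 1.0, x_star, 4)
# errors ≈ [1.3e-1, 7.9e-3, 2.5e-5, 2.5e-10]: roughly squared each iteration
```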

Convergence speed

Consider an example where \(f(x) = x - \sqrt{x} = 0\)

This is how the 3 main approaches converge in terms of the \(L^1\)-norm for an initial guess \(x^{(0)} = 0.5\)

Choosing a solution method

Convergence rates only account for the number of iterations of the method

The steps taken in a given iteration of each solution method may vary in computational cost because of differences in the number of arithmetic operations

An algorithm may take more iterations to converge, but if each iteration is cheaper, the overall algorithm may take less time

Choosing a solution method

  • Bisection method only requires a single function evaluation during each iteration
  • Function iteration only requires a single function evaluation during each iteration
  • Broyden’s method requires both a function evaluation and matrix multiplication
  • Newton’s method requires a function evaluation, a derivative evaluation, and solving a linear system

\(\rightarrow\) Bisection and function iteration are usually slow

\(\rightarrow\) Broyden’s method can be faster than Newton’s method if derivatives are costly to compute

Choosing a solution method

Besides convergence rates and algorithm speed, you should also factor development time

  • Newton’s method is fastest to converge
    • If deriving and programming the Jacobian is relatively simple and not too costly to compute, this method is a good choice
  • If derivatives are complex, quasi-Newton methods are good candidates
  • Bisection and function iteration are generally dominated options but are easy to program/debug, so they have value as a quick proof-of-concept
    • Bisection is often used in hybrid methods, such as Dekker’s and Brent’s. Hybrid methods select between bisection, secant, or other basic solution methods every iteration depending on a set of criteria

Common problems and solutions

When solvers fail: domain errors

A very common problem with Newton and Broyden: the solver tries values outside the function’s domain

Consider a market equilibrium with isoelastic supply \(S_i = c_i p_i^{\sigma_i}\) and demand \(D_i = y\, p_i^{\epsilon_{ii}} p_j^{\epsilon_{ij}}\)

\[ \begin{align} f_1(p_1, p_2) & = c_1 p_1^{\sigma_1} - y \, p_1^{\epsilon_{11}} p_2^{\epsilon_{12}} = 0 \\ f_2(p_1, p_2) & = c_2 p_2^{\sigma_2} - y \, p_1^{\epsilon_{21}} p_2^{\epsilon_{22}} = 0 \end{align} \]

Prices must be positive: \(p^{\sigma}\) with non-integer \(\sigma\) is only defined for \(p > 0\)

But the solver doesn’t know that! If a Newton/Broyden step lands on \(p_i \leq 0\), Julia throws a DomainError

When solvers fail: domain errors

Let’s see it happen

using NLsolve
c = [2.0; 3.0]; σ = [1.5; 1.3];
ϵ_own = [-1.2; -1.6]; ϵ_cross = [0.2; 0.2]; y₀ = 10.0;

function f!(F, p)
    F[1] = c[1]*p[1]^σ[1] - y₀*p[1]^ϵ_own[1]*p[2]^ϵ_cross[1]
    F[2] = c[2]*p[2]^σ[2] - y₀*p[1]^ϵ_cross[2]*p[2]^ϵ_own[2]
end

# From a good initial guess, it works:
NLsolve.nlsolve(f!, [1.5; 1.5], method=:broyden)
Results of Nonlinear Solver Algorithm
 * Algorithm: broyden without line-search
 * Starting Point: [1.5, 1.5]
 * Zero: [1.8777225419239614, 1.5818822320913937]
 * Inf-norm of residuals: 0.000000
 * Iterations: 9
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 45
 * Jacobian Calls (df/dx): 0

When solvers fail: domain errors

# From an asymmetric guess, it crashes:
try
    NLsolve.nlsolve(f!, [0.05; 2.0], method=:broyden)
catch e
    println("Error: $e")
end
Error: DomainError(-3.57491732087526, "Exponentiation yielding a complex result requires a complex argument.\nReplace x^y with (x+0im)^y, Complex(x)^y, or similar.")

What happened? Broyden’s approximate Jacobian led to a step that pushed \(p_1\) below zero. Julia tried to compute a negative number raised to a fractional power and threw a DomainError

Why this happens

Newton and Broyden are unconstrained methods: they search over all of \(\mathbb{R}^n\)

This is problematic when:

  • Variables must be positive (prices, quantities, income)
  • You use isoelastic functions: \(x^{\alpha}\) with non-integer \(\alpha\) requires \(x > 0\)
  • Your initial guess is far from the solution \(\rightarrow\) large steps \(\rightarrow\) overshooting

Broyden is more prone to this than Newton because its Jacobian approximation can degrade, leading to poorly directed steps

Three common strategies to deal with this:

  1. Start with a better initial guess
  2. Try a different method
  3. Transform variables so the solver’s domain is unrestricted

Strategy 1: Better initial guesses

Idea: Solve a simpler version of the problem first, then use that solution as the starting point

For a system with cross-price effects, we can ignore them and solve each market in isolation

\[ c_i p_i^{\sigma_i} = y \, p_i^{\epsilon_{ii}} \implies p_i^{(\sigma_i - \epsilon_{ii})} = \frac{y}{c_i} \implies p_i^{(0)} = \left(\frac{y}{c_i}\right)^{\frac{1}{\sigma_i - \epsilon_{ii}}} \]

This has a closed-form solution and gives a reasonable starting point for the coupled system

Strategy 1: Better initial guesses

# Solve each market in isolation (set cross-price elasticities to zero)
p0 = [(y₀/c[i])^(1/(σ[i] - ϵ_own[i])) for i in 1:2];
println("Simplified guess: $p0");
Simplified guess: [1.8150048061254909, 1.5146176577290846]
NLsolve.nlsolve(f!, p0, method=:broyden)
Results of Nonlinear Solver Algorithm
 * Algorithm: broyden without line-search
 * Starting Point: [1.8150048061254909, 1.5146176577290846]
 * Zero: [1.8777225419271355, 1.5818822320989752]
 * Inf-norm of residuals: 0.000000
 * Iterations: 7
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 40
 * Jacobian Calls (df/dx): 0

Strategy 2: Try a different method

Broyden failed because its Jacobian approximation was inaccurate far from the solution

Newton’s method recomputes the Jacobian at every step (here via automatic differentiation), and NLsolve’s implementation pairs it with a line search, so it is less prone to overshooting

# Newton with autodiff: same bad starting point, no DomainError
NLsolve.nlsolve(f!, [0.05; 2.0], method=:newton, autodiff=:forward)
Results of Nonlinear Solver Algorithm
 * Algorithm: Newton with line-search
 * Starting Point: [0.05, 2.0]
 * Zero: [1.8777225419239383, 1.581882232092248]
 * Inf-norm of residuals: 0.000000
 * Iterations: 10
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 11
 * Jacobian Calls (df/dx): 11

Strategy 3: Variable transformation

A more robust approach: change variables so the domain is unrestricted

If \(p\) must be positive, define \(z = \log(p)\), so \(p = e^z\)

  • \(z \in \mathbb{R}\): no domain restriction
  • \(e^z > 0\) always: prices stay positive regardless of what the solver does

Instead of solving \(f(p) = 0\), we solve \(g(z) \equiv f(e^z) = 0\)

After convergence, recover \(p^* = e^{z^*}\)

Strategy 3: Variable transformation

For our market equilibrium, substitute \(p_i = e^{z_i}\)

\[ \begin{align} g_1(z_1, z_2) & = c_1 e^{\sigma_1 z_1} - y \, e^{\epsilon_{11} z_1 + \epsilon_{12} z_2} = 0 \\ g_2(z_1, z_2) & = c_2 e^{\sigma_2 z_2} - y \, e^{\epsilon_{21} z_1 + \epsilon_{22} z_2} = 0 \end{align} \]

This is the same system, parameterized differently

Same equilibrium: if \(z^*\) solves \(g = 0\), then \(p^* = e^{z^*}\) solves \(f = 0\)

But now the solver can explore any \(z \in \mathbb{R}^2\) without ever triggering a domain error

Strategy 3: Variable transformation

function g!(G, z)
    G[1] = c[1]*exp(σ[1]*z[1]) - y₀*exp(ϵ_own[1]*z[1] + ϵ_cross[1]*z[2])
    G[2] = c[2]*exp(σ[2]*z[2]) - y₀*exp(ϵ_cross[2]*z[1] + ϵ_own[2]*z[2])
end

# Same starting point that failed before: p = (0.05, 2.0) → z = log.((0.05, 2.0))
result = NLsolve.nlsolve(g!, log.([0.05; 2.0]), method=:broyden)
p_solution = exp.(result.zero)
println("Solution: p = $p_solution")
Solution: p = [1.877722541843727, 1.5818822321046384]

No DomainError. The solver explored negative \(z\) values, and \(e^z\) kept prices positive throughout

When to use each strategy

  1. Better initial guess: simple, requires no code changes to \(f\), but needs problem-specific insight

  2. Different method: Newton with autodiff recomputes an accurate Jacobian at every step and is more robust than Broyden’s rank-one updates, at the cost of more computation per step

  3. Log transformation: most robust, works from distant guesses, but adds a layer of indirection and can lead to scaling issues if the solution is very large/small

Note: A related transformation for variables in \((0,1)\) (e.g., probabilities or shares):

\[ z = \log\!\left(\frac{x}{1-x}\right), \quad x = \frac{1}{1+e^{-z}} \]

This is the logit transformation: \(z \in \mathbb{R}\) maps to \(x \in (0,1)\)
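A minimal sketch of solving through the logit transform (the example equation, function names, and the finite-difference Newton loop are illustrative choices, not part of the lecture’s code):

```julia
# Logit transform: z ∈ ℝ ↔ x ∈ (0,1)
logistic(z) = 1 / (1 + exp(-z))

# Newton on h(z) = f(logistic(z)), using a central finite-difference derivative
function solve_in_unit_interval(f; z0 = 0.0, iters = 50)
    h(z) = f(logistic(z))
    z = z0
    for _ in 1:iters
        dh = (h(z + 1e-6) - h(z - 1e-6)) / 2e-6
        z -= h(z) / dh
    end
    return logistic(z)    # back-transform: the result is always in (0,1)
end

# Illustrative equation with a root in (0,1): x² + x - 1 = 0, x* = (√5 - 1)/2
x_star = solve_in_unit_interval(x -> x^2 + x - 1)
```

Whatever values of \(z\) the solver explores, the back-transformed \(x\) stays strictly inside \((0,1)\).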

When solvers fail: convergence problems

We’ve seen that Newton’s method converges if the initial guess is sufficiently close to a root where \(f'\) is invertible

But what does failure look like in practice?

Three classic failure modes:

  1. Cycling: iterates bounce back and forth without approaching the root
  2. Divergence: iterates move farther and farther from the root
  3. Slow convergence: iterates approach the root, but VERY slowly

Failure mode 1: Cycling

Consider \(f(x) = x^3 - 2x + 2\)

Starting at \(x^{(0)} = 0\):

  • \(f(0) = 2\), \(f'(0) = -2\) \(\Rightarrow x^{(1)} = 0 - \frac{2}{-2} = 1\)
  • \(f(1) = 1\), \(f'(1) = 1\) \(\Rightarrow x^{(2)} = 1 - \frac{1}{1} = 0\)

The method enters a perfect 2-cycle: the tangent at \(x = 0\) hits zero at \(x = 1\), and the tangent at \(x = 1\) hits zero at \(x = 0\)

The actual root is at \(x^* \approx -1.77\), which Newton never reaches from this starting point

Failure mode 1: Cycling

Let’s verify with a modified version of our newtons_method function

f(x) = x^3 - 2*x + 2;
f_prime(x) = 3*x^2 - 2;

We need to add a maximum iteration count so the loop terminates

function newtons_method_maxiter(f, f_prime, initial_guess; maxiter=20)
  tolerance = 1e-3   # tolerance for solution
  difference = Inf   # initialize difference
  x = initial_guess  # initialize current value
  
  for k in 1:maxiter
      println("Iteration $k: x = $(round(x, digits=6))")
      x_prev = x
      x = x_prev - f(x_prev)/f_prime(x_prev)
      difference = x - x_prev
      abs(difference) < tolerance && return x
  end
  println("Did not converge after $maxiter iterations")
  return x
end;

Failure mode 1: Cycling

The iterates alternate between 0.0 and 1.0 indefinitely

newtons_method_maxiter(f, f_prime, 0.0; maxiter=10);
Iteration 1: x = 0.0
Iteration 2: x = 1.0
Iteration 3: x = 0.0
Iteration 4: x = 1.0
Iteration 5: x = 0.0
Iteration 6: x = 1.0
Iteration 7: x = 0.0
Iteration 8: x = 1.0
Iteration 9: x = 0.0
Iteration 10: x = 1.0
Did not converge after 10 iterations

Failure mode 1: Cycling

using Plots; gr()

f(x) = x^3 - 2*x + 2;
f_prime(x) = 3*x^2 - 2;

xgrid = range(-2.5, 2.0, length=300)
p = plot(xgrid, f.(xgrid), label="f(x) = x³ - 2x + 2", color=:black, linewidth=2)
hline!([0], color=:gray, linestyle=:dash, label=nothing)

# Newton steps: tangent from (0, f(0)) to (1, 0), then (1, f(1)) to (0, 0)
xs_cycle = [0.0, 1.0, 0.0, 1.0, 0.0]
colors_cycle = [:royalblue, :crimson]
for k in 1:4
    xk = xs_cycle[k]; xk1 = xs_cycle[k+1]
    plot!([xk, xk1], [f(xk), 0.0], color=colors_cycle[mod1(k,2)],
          linestyle=:dot, linewidth=1.5, label=nothing)
    plot!([xk1, xk1], [0.0, f(xk1)], color=colors_cycle[mod1(k,2)],
          linestyle=:dot, linewidth=1.5, label=nothing)
end
scatter!([0.0, 1.0], [f(0.0), f(1.0)], color=:red, markersize=6, label="Iterates")
title!("Cycling: Newton bounces between x = 0 and x = 1")
xlabel!("x"); ylabel!("f(x)")
p

Failure mode 2: Divergence

Consider \(f(x) = \text{sign}(x)|x|^{1/3}\) (the cube root)

The root is at \(x^* = 0\), with \(f'(x) = \frac{1}{3}|x|^{-2/3}\). The Newton update simplifies to:

\[ x^{(k+1)} = x^{(k)} - \frac{\left(x^{(k)}\right)^{1/3}}{\frac{1}{3}\left(x^{(k)}\right)^{-2/3}} = x^{(k)} - 3x^{(k)} = -2x^{(k)} \]

Each step doubles the distance to the root and flips the sign

The function is too flat near the root: the tangent line always overshoots

Failure mode 2: Divergence

The iterates grow geometrically: \(0.5, -1, 2, -4, 8, \ldots\)

f(x) = sign(x) * abs(x)^(1/3);
f_prime(x) = (1/3) * abs(x)^(-2/3);

newtons_method_maxiter(f, f_prime, 0.5; maxiter=8);
Iteration 1: x = 0.5
Iteration 2: x = -1.0
Iteration 3: x = 2.0
Iteration 4: x = -4.0
Iteration 5: x = 8.0
Iteration 6: x = -16.0
Iteration 7: x = 32.0
Iteration 8: x = -64.0
Did not converge after 8 iterations

Failure mode 2: Divergence

f(x) = sign(x) * abs(x)^(1/3);
f_prime(x) = (1/3) * abs(x)^(-2/3);

# Collect iterates
xs_div = [0.5]
for k in 1:6
    xk = xs_div[end]
    push!(xs_div, xk - f(xk)/f_prime(xk))
end

xlim = maximum(abs.(xs_div)) * 1.2
xgrid = range(-xlim, xlim, length=500)
p = plot(xgrid, f.(xgrid), label="f(x) = sign(x)|x|^(1/3)", color=:black, linewidth=2)
hline!([0], color=:gray, linestyle=:dash, label=nothing)

for k in 1:min(length(xs_div)-1, 5)
    xk = xs_div[k]; xk1 = xs_div[k+1]
    plot!([xk, xk1], [f(xk), 0.0], color=:royalblue,
          linestyle=:dot, linewidth=1.5, label=nothing, alpha=0.4+0.12*k)
    plot!([xk1, xk1], [0.0, f(xk1)], color=:royalblue,
          linestyle=:dot, linewidth=1.5, label=nothing, alpha=0.4+0.12*k)
end
scatter!(xs_div[1:6], f.(xs_div[1:6]), color=:red, markersize=5, label="Iterates")
title!("Divergence: each step doubles the distance from the root")
xlabel!("x"); ylabel!("f(x)")
p

Failure mode 3: Slow convergence

Consider \(f(x) = x^3\), which has a root at \(x^* = 0\)

But this is a root with multiplicity 3: \(f'(0) = 0\)

The Newton update simplifies to:

\[ x^{(k+1)} = x^{(k)} - \frac{(x^{(k)})^3}{3(x^{(k)})^2} = \frac{2}{3}x^{(k)} \]

So the error only shrinks by a factor of \(2/3\) each step: linear convergence instead of the quadratic convergence we expect from Newton’s method

Failure mode 3: Slow convergence

f(x) = x^3;
f_prime(x) = 3*x^2;

newtons_method_maxiter(f, f_prime, 1.0; maxiter=50);
Iteration 1: x = 1.0
Iteration 2: x = 0.666667
Iteration 3: x = 0.444444
Iteration 4: x = 0.296296
Iteration 5: x = 0.197531
Iteration 6: x = 0.131687
Iteration 7: x = 0.087791
Iteration 8: x = 0.058528
Iteration 9: x = 0.039018
Iteration 10: x = 0.026012
Iteration 11: x = 0.017342
Iteration 12: x = 0.011561
Iteration 13: x = 0.007707
Iteration 14: x = 0.005138
Iteration 15: x = 0.003425
Iteration 16: x = 0.002284

Failure mode 3: Slow convergence

Compare with a well-behaved case: \(f(x) = x^3 - 1\) has a simple root at \(x^* = 1\)

# Collect errors for x^3 (repeated root)
f_slow(x) = x^3; fp_slow(x) = 3*x^2;
x = 1.0; errors_slow = [abs(x)]
for k in 1:50
    x = x - f_slow(x)/fp_slow(x)
    push!(errors_slow, abs(x))
end

# Collect errors for x^3 - 1 (simple root)
f_fast(x) = x^3 - 1; fp_fast(x) = 3*x^2;
x = 2.0; errors_fast = [abs(x - 1.0)]
for k in 1:15
    x = x - f_fast(x)/fp_fast(x)
    push!(errors_fast, abs(x - 1.0))
end

p = plot(0:length(errors_slow)-1, errors_slow, marker=:circle, markersize=3,
     color=:crimson, label="x³ (repeated root, linear rate)",
     yscale=:log10, xlabel="Iteration", ylabel="Error (log scale)",
     ylims=(1e-16, 10), linewidth=1.5)
plot!(0:length(errors_fast)-1, errors_fast, marker=:diamond, markersize=3,
     color=:royalblue, label="x³ - 1 (simple root, quadratic rate)", linewidth=1.5)
title!("Convergence comparison: repeated vs simple root")
p

Failure mode 3: Slow convergence

The simple root reaches machine precision in about 7 iterations

The repeated root (where both \(f\) and \(f'\) vanish at \(x^*\)) is still crawling after 50 iterations

Why? Newton’s quadratic convergence requires \(f'(x^*) \neq 0\)

When \(f'(x^*) = 0\) (repeated root), the method degrades to linear convergence!
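As an aside, a standard textbook remedy (sketched here, not part of the methods above): if the multiplicity \(m\) of the root is known, scaling the Newton step by \(m\) restores fast convergence. For \(f(x) = x^3\) with \(m = 3\), the scaled update is \(x^{(k)} - 3 \cdot \frac{(x^{(k)})^3}{3(x^{(k)})^2} = 0\), hitting the root in a single step:

```julia
f(x)  = x^3
fp(x) = 3x^2

# Newton step scaled by the (known) multiplicity m of the root
function modified_newton(f, fp, x0, m; iters = 10, tol = 1e-12)
    x = x0
    for _ in 1:iters
        abs(f(x)) < tol && break    # guard: avoid 0/0 once at the root
        x -= m * f(x) / fp(x)
    end
    return x
end

modified_newton(f, fp, 1.0, 3)   # scaled step: reaches the root immediately
modified_newton(f, fp, 1.0, 1)   # plain Newton: still far away after 10 steps
```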

Summary: convergence failures

| Failure mode | Example | What goes wrong | Root cause |
|---|---|---|---|
| Cycling | \(x^3 - 2x + 2\) at \(x^{(0)}=0\) | Iterates form a periodic orbit | Tangent lines create a closed loop from this starting point |
| Divergence | \(x^{1/3}\) at any \(x^{(0)} \neq 0\) | Iterates grow without bound | The Newton step \(f/f' = 3x\) overshoots the root, landing at \(-2x\): each iterate doubles the distance |
| Slow convergence | \(x^3\) near \(x^*=0\) | Linear instead of quadratic rate | \(f'(x^*) = 0\) (repeated root): steps shrink, but only linearly |

Summary: convergence failures

All three cases violate an assumption of Newton’s convergence theorem:

  1. Cycling: bad starting point relative to the function’s geometry
  2. Divergence: \(f\) is not continuously differentiable at the root, and \(f'\) grows unboundedly there
  3. Slow convergence: \(f'\) is singular at the root