ACE 592 - Lecture 3.2

Nonlinear Systems of Equations


Diego S. Cardoso

University of Illinois Urbana-Champaign

0.1 Course Roadmap

  1. Introduction to Scientific Computing
  2. Fundamentals of numerical methods
  3. Systems of equations
    1. Linear systems
    2. Nonlinear systems
    3. Complementarity problems
  4. Optimization
  5. Function approximation
  6. Structural estimation

0.2 Agenda

  • We continue our study of methods to solve systems of equations, now looking at systems that contain nonlinear equations
  • Unlike linear systems, with nonlinear systems we don’t have guarantees of closed-form solutions, so solving them is harder
  • We will study two types of methods: derivative-free and derivative-based

0.3 Main references for today

  • Miranda & Fackler (2002), Ch. 3
  • Judd (1998), Ch. 5
  • Nocedal & Wright (2006), Ch. 11
  • Lecture notes for Ivan Rudik’s Dynamic Optimization (Cornell)

1 Types of nonlinear equation problems

1.1 Types of nonlinear equation problems

Problems involving nonlinear equations are very common in economics

We can classify them into two types

  1. Systems of nonlinear equations
  2. Complementarity problems

1.2 System of nonlinear equations

These come in two forms

  1. Rootfinding problems

For a function f: \mathbb{R}^n \rightarrow \mathbb{R}^n, we want to find an n-vector x that satisfies f(x) = 0

\rightarrow Any x that satisfies this condition is called a root of f

Examples?

. . .

  • Market equilibrium in general: market clearing conditions
  • No-arbitrage conditions (pricing models)
  • Solving first-order optimality conditions

1.3 System of nonlinear equations

  2. Fixed-point problems

For a function g: \mathbb{R}^n \rightarrow \mathbb{R}^n, we want to find an n-vector x that satisfies x = g(x)

\rightarrow Any x that satisfies this condition is called a fixed point of g

Examples?

. . .

  • Best-response functions
  • Many equilibrium concepts in game theory
    • A Nash equilibrium is a fixed point in the strategy space

1.4 System of nonlinear equations

Rootfinding and fixed-point problems are equivalent

We can easily convert one into another

  • Rootfinding \rightarrow fixed-point

. . .

  • Define a new g(x) = x - f(x)

. . .

  • Fixed-point \rightarrow rootfinding
    • Define a new f(x) = x - g(x)
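As a one-line illustration in Julia, with a toy f(x) = x^2 - 2 (my own example, not from the lecture):

```julia
# Toy example (mine, not the lecture's): f(x) = x^2 - 2 has a root at sqrt(2)
f(x) = x^2 - 2.0        # rootfinding form:  f(x) = 0
g(x) = x - f(x)         # fixed-point form:  x = g(x)

x_star = sqrt(2.0)
f(x_star)               # essentially zero: x_star is a root of f
g(x_star) - x_star      # essentially zero: x_star is a fixed point of g
```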

1.5 Complementarity problems

In these problems, we have

  • A function f: \mathbb{R}^n \rightarrow \mathbb{R}^n
  • n-vectors a and b, with a<b

And we are looking for an n-vector x \in [a,b] such that for all i = 1,\dots,n

\begin{align*} x_i > a_i \Rightarrow f_i(x) \geq 0 \\ x_i < b_i \Rightarrow f_i(x) \leq 0 \end{align*}

. . .

Examples of complementarity problems?

. . .

  • Market equilibrium with constraints: quotas, price support, non-negativity, limited capacity, etc
  • First-order conditions of constrained function maximization/minimization

1.6 Complementarity problems

\begin{align*} x_i > a_i \Rightarrow f_i(x) \geq 0 \\ x_i < b_i \Rightarrow f_i(x) \leq 0 \end{align*}

What do these equations mean?

. . .

If the constraints on x_i do not bind ( a_i < x_i < b_i ), then both conditions apply and f_i(x) must be precisely zero

. . .

But suppose the upper bound binds ( x_i = b_i ). Then f_i(x) \geq 0 since x_i > a_i

  • But we can’t guarantee that f_i(x) = 0 because f_i(x) might still be strictly positive at that point (e.g., the objective is still increasing where the constraint binds)
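These conditions can be checked mechanically. Below is a hypothetical helper (`is_cp_solution` is my own name, not from the lecture) that tests whether a candidate x satisfies both implications:

```julia
# Hypothetical checker: does x satisfy the complementarity conditions for f on [a, b]?
function is_cp_solution(f, x, a, b; tol = 1e-8)
    fx = f(x)
    all(eachindex(x)) do i
        # x_i > a_i implies f_i(x) >= 0; x_i < b_i implies f_i(x) <= 0
        (x[i] <= a[i] + tol || fx[i] >= -tol) &&
        (x[i] >= b[i] - tol || fx[i] <= tol)
    end
end

is_cp_solution(x -> 1 .- x, [1.0], [0.0], [2.0])  # interior root: true
is_cp_solution(x -> 2 .- x, [1.0], [0.0], [1.0])  # binds at b with f > 0: still true
is_cp_solution(x -> 1 .- x, [0.0], [0.0], [2.0])  # f > 0 while x < b: false
```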

1.7 Complementarity problems

Rootfinding is a special case of complementarity problems: a = -\infty and b = \infty

But complementarity problems are not just about finding a root within [a, b]

  • Remember: if some x_i is at the boundary ( x_i = a_i or x_i = b_i ), some element of f(x) can be non-zero!

. . .

  • NOTE: We will cover complementarity problems next lecture (separate slide deck)

2 Derivative-free methods

2.1 Rootfinding problems

Let’s start simple: we have a continuous function f:[a,b]\subset\mathbb{R} \rightarrow\mathbb{R} and we know that f(a)<0 and f(b)>0

What does the Intermediate Value Theorem say here?

. . .

If f is continuous and f(a) \neq f(b), then f must assume all values in between f(a) and f(b)

  • So if f(a)<0 and f(b)>0 \Rightarrow there must be at least one root x\in[a,b] such that f(x) = 0

. . .

How would you go about finding a root?

2.2 Bisection method

Basic idea: split the search interval in two parts and check whether there’s a root in each part

How do we check that?

. . .

  • By looking at the signs of f(x) at the boundaries of each interval
    • If they are different, there’s a root \Rightarrow we keep looking there

Let’s see an illustration

2.3 Bisection method

We start with l = a, u = b and find m = (u+l)/2

Let’s say f(l)<0, f(u)>0, and f(m)>0. What do we do next?

2.4 Bisection method

Since f(l)<0 and f(m)>0 have different signs, we continue our search in [l,m]

We set u \leftarrow m

. . .

Then we calculate the new midpoint m

. . .

Now say f(l)<0, f(u)>0, and f(m)<0. What do we do next?

2.5 Bisection method

Since f(m)<0 and f(u)>0 have different signs, we continue our search in [m,u]

We set l \leftarrow m

And the search continues until we are satisfied with the precision

2.6 Try it! Bisection method

Your turn!

  • Write a function that takes (f, a, b) and returns a root of f using the bisection method.
  • Then, use it to find the root of f(x) = -x^{-2} + x - 1 in the interval [0.2, 4]

Here are the basic steps:

  1. Start with a lower bound ( l = a ) and an upper bound ( u = b )
  2. Get the midpoint m = (u+l)/2
  3. Check the sign of f(m)
  4. If sign(f(m)) = sign(f(l)), move lower bound up: l \leftarrow m
  5. If sign(f(m)) = sign(f(u)), move upper bound down: u \leftarrow m
  6. Repeat steps 2–5 until our interval is short enough ( (u-l)/2 < tol ), then return x = m

2.7 Bisection method

Show the code
function bisection(f, lo, up)
    tolerance = 1e-3          # tolerance for solution
    mid = (lo + up)/2         # initial guess, bisect the interval
    difference = (up - lo)/2  # initialize bound difference

    while difference > tolerance         # loop until convergence
        println("Intermediate guess: $mid")
        if sign(f(lo)) == sign(f(mid)) # if the guess has the same sign as the lower bound
            lo = mid                   # a solution is in the upper half of the interval
            mid = (lo + up)/2
        else                           # else the solution is in the lower half of the interval
            up = mid
            mid = (lo + up)/2
        end
        difference = (up - lo)/2       # update the difference 
    end
    println("The root of f(x) is $mid")
end;

2.8 Bisection method

Code
f(x) = -x^(-2) + x - 1;
bisection(f, 0.2, 4.0)
Intermediate guess: 2.1
Intermediate guess: 1.1500000000000001
Intermediate guess: 1.625
Intermediate guess: 1.3875000000000002
Intermediate guess: 1.50625
Intermediate guess: 1.4468750000000001
Intermediate guess: 1.4765625
Intermediate guess: 1.4617187500000002
Intermediate guess: 1.469140625
Intermediate guess: 1.4654296875000001
Intermediate guess: 1.46728515625
The root of f(x) is 1.4663574218750002

2.9 Bisection method

2.10 Bisection method

What happens if we specify the wrong interval (i.e., there’s no root in it)?

. . .

It will converge towards one of the boundaries

Code
bisection(f, 2.0, 4.0)
Intermediate guess: 3.0
Intermediate guess: 3.5
Intermediate guess: 3.75
Intermediate guess: 3.875
Intermediate guess: 3.9375
Intermediate guess: 3.96875
Intermediate guess: 3.984375
Intermediate guess: 3.9921875
Intermediate guess: 3.99609375
Intermediate guess: 3.998046875
The root of f(x) is 3.9990234375

2.11 Bisection method

So it’s a good idea to check if you have the right boundaries

Code
f(2.0)
0.75
Code
f(4.0)
2.9375

These are both positive. So by the IVT, we can’t know for sure if there’s a root here

2.12 Bisection method

The bisection method is incredibly robust: if a continuous function f changes sign over [a,b], bisection is guaranteed to converge to a root (at any given precision) in a finite number of iterations

. . .

A root can be calculated to arbitrary precision \epsilon in at most \log_2\frac{b-a}{\epsilon} iterations
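A quick numeric check of this bound, using the interval and tolerance from the earlier exercise:

```julia
# Check the bound with the interval and tolerance from the earlier bisection exercise
a, b, tol = 0.2, 4.0, 1e-3
max_iters = ceil(Int, log2((b - a) / tol))  # smallest k with (b - a)/2^k < tol
max_iters                                   # 12
```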

But robustness comes with drawbacks:

  1. It only works in one dimension
  2. It is slow because it only uses information about the function’s level but not its variation

2.13 Function iteration

For the next method, we recast rootfinding as a fixed-point problem

. . .

f(x) = 0 \Rightarrow g(x) = x - f(x)

Then, we start with an initial guess x^{(0)} and iterate x^{(k+1)}\leftarrow g(x^{(k)}) until convergence: |x^{(k+1)}-x^{(k)}| \approx 0

. . .

Let’s try it!

  • Write a function that takes (f, initial_guess) and returns a root of f using function iteration
  • Then, use it to find the root of f(x) = -x^{-2} + x - 1 with initial_guess = 1

2.14 Function iteration

Show the code
function function_iteration(f, initial_guess)
    tolerance = 1e-3   # tolerance for solution
    difference = Inf   # initialize difference
    x = initial_guess  # initialize current value
    
    while abs(difference) > tolerance # loop until convergence
        println("Intermediate guess: $x")
        x_prev = x  # store previous value
        x = x_prev - f(x_prev) # calculate next guess
        difference = x - x_prev # update difference
    end
    println("The root of f(x) is $x")
end;

2.15 Function iteration

Code
f(x) = -x^(-2) + x - 1;
function_iteration(f, 1.0)
Intermediate guess: 1.0
Intermediate guess: 2.0
Intermediate guess: 1.25
Intermediate guess: 1.6400000000000001
Intermediate guess: 1.37180249851279
Intermediate guess: 1.5313942135189396
Intermediate guess: 1.426408640598956
Intermediate guess: 1.4914870486759138
Intermediate guess: 1.4495324290188554
Intermediate guess: 1.475931147477801
Intermediate guess: 1.4590582576091302
Intermediate guess: 1.4697369615928915
Intermediate guess: 1.4629358005751476
Intermediate guess: 1.4672501656759618
Intermediate guess: 1.46450636089898
Intermediate guess: 1.4662485297695573
Intermediate guess: 1.4651412125747465
The root of f(x) is 1.4658445625216316

2.16 Function iteration

A fixed point x = g(x) is at the intersection between g(x) and the 45° line

  • Starting from x^{(0)}, we calculate g(x^{(0)}) and find the corresponding point on the 45° line for x^{(1)}
  • We keep iterating until we (approximately) find the fixed point

2.17 Function iteration

Function iteration is guaranteed to converge to a fixed point x^* if

  1. g is differentiable, and
  2. the initial guess is “sufficiently close” to an x^* at which \|g^\prime (x^{*}) \| < 1

It may also converge when these conditions are not met

Since this is an easy method to implement, it’s worth trying it before switching to more complex methods
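For the running example, we can verify the convergence condition directly. A quick sketch (x_star below is the root of f to machine precision; it solves x^3 = x^2 + 1):

```julia
# The running example, its fixed-point form, and the derivative of g
f(x) = -x^(-2) + x - 1
g(x) = x - f(x)              # = x^(-2) + 1
g_prime(x) = -2x^(-3)        # analytic derivative of g

x_star = 1.4655712318767682  # root of f to machine precision (solves x^3 = x^2 + 1)
abs(f(x_star))               # essentially zero
abs(g_prime(x_star))         # ≈ 0.635 < 1, so function iteration converges locally
```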

2.18 Function iteration

But wait: What is “sufficiently close”?

. . .

Good question! There is no practical formula. As Miranda and Fackler (2002) put it

Typically, an analyst makes a reasonable guess for the root of f and counts his/her blessings if the iterates converge. If the iterates do not converge, then the analyst must look more closely at the properties of f to find a better starting value, or change to another rootfinding method.

This is where science also becomes a bit of an art

3 Derivative-based methods

3.1 Newton’s method

Newton’s method and variants are the workhorses of solving n-dimensional non-linear problems

. . .

Key idea: take a hard non-linear problem and replace it with a sequence of linear problems \rightarrow successive linearization

3.2 Newton’s method

  1. Start with an initial guess of the root at x^{(0)}
  2. Approximate the non-linear function with its first-order Taylor expansion around x^{(0)}
    • This is just the tangent line at x^{(0)}
  3. Solve for the root of this linear approximation, call it x^{(1)}

3.3 Newton’s method

  4. Repeat starting at x^{(1)} until we converge to x^*

This can be applied to a function with an arbitrary number of dimensions

3.4 Newton’s method

Formally: begin with some initial guess of the root vector \mathbf{x^{(0)}}

. . .

Given \mathbf{x^{(k)}}, our new guess \mathbf{x^{(k+1)}} is obtained by approximating f(\mathbf{x}) using a first-order Taylor expansion around \mathbf{x^{(k)}}

f(\mathbf{x^{(k+1)}}) \approx f(\mathbf{x^{(k)}}) + f'(\mathbf{x^{(k)}})(\mathbf{x^{(k+1)}}-\mathbf{x^{(k)}}) = 0

Then, solve for \mathbf{x^{(k+1)}}:

\mathbf{x^{(k+1)}} = \mathbf{x^{(k)}} - \left[f'(\mathbf{x^{(k)}})\right]^{-1}f(\mathbf{x^{(k)}})

3.5 Your turn! Newton’s method

Try it!

  • Write a function that takes (f, f_prime, initial_guess) and returns a root of f using Newton’s method
  • Then use it to find the root of f(x) = -x^{-2} + x - 1 with initial_guess = 1

\mathbf{x^{(k+1)}} \leftarrow \mathbf{x^{(k)}} - \left[f'(\mathbf{x^{(k)}})\right]^{-1}f(\mathbf{x^{(k)}})

3.6 Newton’s method

Show the code
function newtons_method(f, f_prime, initial_guess)
  tolerance = 1e-3   # tolerance for solution
  difference = Inf   # initialize difference
  x = initial_guess  # initialize current value
  
  while abs(difference) > tolerance # loop until convergence
      println("Intermediate guess: $x")
      x_prev = x  # store previous value
      x = x_prev - f(x_prev)/f_prime(x_prev) # calculate next guess
      # ^ this is the only line that changes from function iteration
      difference = x - x_prev # update difference
  end
  println("The root of f(x) is $x")
end;

3.7 Newton’s method

Code
f(x) = -x^(-2) + x - 1;
f_prime(x) = 2x^(-3) + 1;
newtons_method(f, f_prime, 1.0)
Intermediate guess: 1.0
Intermediate guess: 1.3333333333333333
Intermediate guess: 1.4576271186440677
Intermediate guess: 1.4655459327062879
The root of f(x) is 1.465571231622256

3.8 Newton’s method

Newton’s method has nice properties regarding convergence and speed

It converges if

  1. f(x) is continuously differentiable,
  2. the initial guess is “sufficiently close” to the root, and
  3. f'(x) is invertible near the root

. . .

We need f'(x) to be invertible so the Newton step above is well defined

. . .

If f'(x) is ill-conditioned we can run into problems with rounding error
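A tiny illustration of what ill-conditioning means in practice (the matrix here is made up for illustration):

```julia
using LinearAlgebra

# A made-up, nearly singular Jacobian: the two rows are almost linearly dependent
J = [1.0  1.0;
     1.0  1.0 + 1e-12]

cond(J)  # enormous: a Newton step J \ fx would lose most significant digits to rounding
```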

3.9 Newton’s method: a duopoly example

Inverse demand: P(q) = q^{-1/\epsilon}

Two firms with costs: C_i(q_i) = \frac{1}{2}c_i q_i^2

. . .

Firm i’s profits: \pi_i(q_1,q_2) = P(q_1 + q_2)q_i - C_i(q_i)

. . .

Firms take each other’s output as given. So their first-order conditions are

\frac{\partial \pi_i}{\partial q_i} = P(q_1 + q_2) + P^\prime(q_1 + q_2)q_i - C_i^\prime(q_i) = 0

3.10 Newton’s method: a duopoly example

We are looking for an equilibrium: a pair (q_1, q_2) which are roots to two nonlinear equations

\begin{align*} f_1(q_1, q_2) & = (q_1 + q_2)^{-1/\epsilon} - (1/\epsilon)(q_1 + q_2)^{-1/\epsilon-1}q_1 - c_1 q_1 = 0 \\ f_2(q_1, q_2) & = (q_1 + q_2)^{-1/\epsilon} - (1/\epsilon)(q_1 + q_2)^{-1/\epsilon-1}q_2 - c_2 q_2 = 0 \end{align*}

Can you solve this analytically? It’s quite hard…

. . .

Let’s do it numerically, starting by coding this function in Julia

3.11 Newton’s method: a duopoly example

For this example, \epsilon = 1.6, c_1 = 0.6, c_2 = 0.8

Code
epsilon = 1.6; c = [0.6; 0.8]; # column vector

f(q) will return a vector (pay attention to how I used the dot syntax)

Code
function f(q)
    Q = sum(q)
    F = Q^(-1/epsilon) .- (1/epsilon)Q^(-1/epsilon-1) .*q .- c .*q
end;
f([0.2; 0.2])
2-element Vector{Float64}:
 1.0989480808247896
 1.0589480808247895

3.12 Newton’s method: a duopoly example

What do we need next to use Newton’s method?

. . .

The derivatives! In this case, we have a function f:\mathbb{R}^2\rightarrow\mathbb{R}^2, so we need to define the Jacobian matrix

J = \begin{bmatrix} \frac{\partial f_1}{\partial q_1} & \frac{\partial f_1}{\partial q_2} \\ \frac{\partial f_2}{\partial q_1} & \frac{\partial f_2}{\partial q_2} \\ \end{bmatrix}

3.13 Newton’s method: a duopoly example

We can derive these terms analytically

\begin{align*} \frac{\partial f_{i}}{\partial q_{i}} &= (1/\epsilon)(q_{1}+q_{2})^{-1/\epsilon-1}\left[-2+(1/\epsilon+1)(q_{1}+q_{2})^{-1}q_{i}\right]-c_{i} \\ \frac{\partial f_{i}}{\partial q_{j\neq i}} &= \underbrace{(1/\epsilon)(q_{1}+q_{2})^{-1/\epsilon-1}}_{A} \left[-1+ \underbrace{(1/\epsilon+1)(q_{1}+q_{2})^{-1}}_{B} q_{i}\right] \end{align*}

3.14 Newton’s method: a duopoly example

So we can write the Jacobian as

J = -A \begin{bmatrix} 2 & 1\\ 1 & 2 \end{bmatrix} +AB \begin{bmatrix} q_{1} & q_{1}\\ q_{2} & q_{2}\\ \end{bmatrix} - \begin{bmatrix} c_{1} & 0\\ 0 & c_{2} \end{bmatrix}

where A\equiv(1/\epsilon)(q_{1}+q_{2})^{-1/\epsilon-1}, B\equiv(1/\epsilon+1)(q_{1}+q_{2})^{-1}

. . .

Let’s turn this Jacobian into a Julia function so we can use Newton’s method

3.15 Newton’s method: a duopoly example

Now we define a function for the Jacobian

Code
using LinearAlgebra
function f_jacobian(q)
    Q = sum(q)
    A = (1/epsilon)Q^(-1/epsilon-1)
    B = (1/epsilon+1)Q^(-1)
    J = -A .* [2 1; 1 2] + (A*B) .* [q q] - LinearAlgebra.Diagonal(c)
end;
f_jacobian([0.2; 0.2])
2×2 Matrix{Float64}:
 -3.88977   -0.519438
 -0.519438  -4.08977

3.16 Newton’s method: a duopoly example

We need to write a new version of our Newton’s method so it can handle n-dimensional functions

Code
function newtons_method_multidim(f, f_jacobian, initial_guess) 
    tolerance = 1e-3   
    difference = Inf   
    x = initial_guess  
    
    while norm(difference) > tolerance # <=== Changed here
        println("Intermediate guess: $x")
        x_prev = x  
        x = x_prev - f_jacobian(x_prev)\f(x_prev) # <=== and here
        difference = x - x_prev 
    end
    println("The root of f(x) is $x")
    return x
  end;

3.17 Newton’s method: a duopoly example

Note that we only needed to change 2 things from the previous version:

  1. Our tolerance is now over the norm of the difference vector
  2. The “derivative” is a Jacobian matrix, so we multiply f(x^{(k)}) by the inverse of J(x^{(k)})
  • We use operator \ because it is more efficient than inverting J

We also added a return x so that the function returns a solution

3.18 Newton’s method: a duopoly example

Let’s test it out with initial guess (0.2, 0.2)

Code
x = newtons_method_multidim(f, f_jacobian, [0.2; 0.2])
Intermediate guess: [0.2, 0.2]
Intermediate guess: [0.4522233972882232, 0.4268911412785466]
Intermediate guess: [0.7443820767549576, 0.6380274933135309]
Intermediate guess: [0.8359566131332257, 0.6874038938481142]
Intermediate guess: [0.8395632064645864, 0.6887954351409207]
The root of f(x) is [0.8395676035295858, 0.6887964311626718]
2-element Vector{Float64}:
 0.8395676035295858
 0.6887964311626718

3.19 Newton’s method: a duopoly example

Let’s check our solution

Code
f(x)
2-element Vector{Float64}:
 5.7654991891809004e-12
 9.037215420448774e-13

Pretty good result with quick convergence!

3.20 Newton’s method: a duopoly example

Visually, this is the path Newton’s method followed

3.21 Newton’s method: analytic vs numerical derivatives

It was tedious, but in the previous example we could calculate the Jacobian analytically. Sometimes it’s much harder, and we can make mistakes.

Here is a challenge for you: redo the previous example but, instead of defining the Jacobian analytically, use the ForwardDiff package to do the derivatives for you. Then, compare your solution with mine.

. . .

An alternative is to use the Symbolics.jl package. For low-dimensional derivatives and Jacobians, it does a good job!

3.22 Quasi-Newton methods

We usually don’t want to deal with analytic derivatives unless we have access to autodifferentiation

Why?

. . .

  1. It can be difficult to do the analytic derivation
  2. Coding a complicated Jacobian is prone to errors and takes time
  3. Analytic derivatives can actually be slower to evaluate than finite differences for some nonlinear problems

. . .

Alternative \rightarrow finite differences instead of analytic derivatives
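As a sketch, a forward finite-difference Jacobian needs only function evaluations. The `fd_jacobian` helper below is my own, applied to the duopoly f from the earlier slides:

```julia
# Parameters and f from the duopoly example
epsilon = 1.6; c = [0.6; 0.8];
f(q) = sum(q)^(-1/epsilon) .- (1/epsilon)*sum(q)^(-1/epsilon-1) .* q .- c .* q

# My own helper: forward finite-difference Jacobian (one extra f evaluation per column)
function fd_jacobian(f, x; h = 1e-7)
    n = length(x)
    fx = f(x)
    J = zeros(n, n)
    for i in 1:n
        x_step = copy(x)
        x_step[i] += h
        J[:, i] = (f(x_step) - fx) ./ h   # column i approximates ∂f/∂x_i
    end
    return J
end

fd_jacobian(f, [0.2; 0.2])  # close to the analytic Jacobian computed earlier
```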

3.23 Quasi-Newton: Secant method

Using our current root guess x^{(k)} and our previous root guess x^{(k-1)}:

f'(x^{(k)}) \approx \frac{f(x^{(k)})-f(x^{(k-1)})}{x^{(k)} - x^{(k-1)}}

. . .

Our new iteration rule then becomes

x^{(k+1)} = x^{(k)} - \frac{x^{(k)}-x^{(k-1)}}{f(x^{(k)})-f(x^{(k-1)})}f(x^{(k)})

. . .

Now we require two initial guesses so that we have an initial approximation of the derivative
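A minimal sketch of the secant rule applied to the running example (the `secant` helper is mine, not from the lecture):

```julia
# The running example and a hand-rolled secant iteration
f(x) = -x^(-2) + x - 1

function secant(f, x_prev, x; tol = 1e-8, maxiter = 100)
    for _ in 1:maxiter
        # a finite-difference slope replaces f'(x) in Newton's update rule
        step = (x - x_prev) / (f(x) - f(x_prev)) * f(x)
        x_prev, x = x, x - step
        abs(step) < tol && return x
    end
    return x
end

secant(f, 1.0, 1.2)  # ≈ 1.46557, the same root Newton found
```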

3.24 Quasi-Newton: Secant method

3.25 Quasi-Newton: Broyden’s method

Broyden’s method is the most widely used rootfinding method for n-dimensional problems

  • This is a generalization of the secant method where we maintain a sequence of guesses of the Jacobian at the root

. . .

It also relies on a 1st-order Taylor expansion about \mathbf{x}, but now in n dimensions

f(\mathbf{x}) \approx f(\mathbf{x^{(k)}}) + A^{(k)} (\mathbf{x} - \mathbf{x^{(k)}}) = 0

. . .

We must initially provide a guess of the root, x^{(0)}, but also a guess of the Jacobian, A^{(0)}

  • A good guess for A^{(0)} is to calculate the Jacobian numerically at our chosen x^{(0)}

3.26 Quasi-Newton: Broyden’s method

The iteration rule is the same as before but with our guess of the Jacobian substituted in for the actual Jacobian (or the finite difference approximation)

\mathbf{x^{(k+1)}} \leftarrow \mathbf{x^{(k)}} - (A^{(k)})^{-1} \, f(\mathbf{x^{(k)}})

. . .

We still need to update A^{(k)}: the idea of Broyden’s method is to choose a new Jacobian guess that satisfies the multidimensional secant condition

f(\mathbf{x^{(k+1)}}) - f(\mathbf{x^{(k)}}) = A^{(k+1)}\left( \mathbf{x^{(k+1)}} - \mathbf{x^{(k)}} \right)

  • Any reasonable guess for the Jacobian should satisfy this condition

. . .

But this gives n conditions with n^2 elements to solve for in A

3.27 Quasi-Newton: Broyden’s method

Broyden’s method solves this under-determined problem with an assumption that focuses on the direction we are most interested in: d^{(k)} = \mathbf{x^{(k+1)}} - \mathbf{x^{(k)}}

For any direction q orthogonal to d^{(k)}, it assumes that A^{(k+1)}q = A^{(k)}q

  • In other words, our next guess is as good as the current one for any change in x that is orthogonal to the direction we are interested in right now

. . .

Jointly, the secant condition and the orthogonality assumption give the iteration rule for the Jacobian:

. . .

A^{(k+1)} \leftarrow A^{(k)} + \left[f(\mathbf{x^{(k+1)}}) - f(\mathbf{x^{(k)}}) - A^{(k)}d^{(k)}\right] \frac{d^{(k)T}}{d^{(k)T}d^{(k)}}

3.28 Quasi-Newton: Broyden’s method

The general algorithm for Broyden’s method is:

  1. Choose an initial guess for the root \mathbf{x}^{(0)} and the Jacobian A^{(0)}
  2. Calculate the next guess for the root: \mathbf{x}^{(k+1)}
  3. Calculate the next guess for the Jacobian: A^{(k+1)}
  4. Repeat 2 and 3 until convergence: \|\mathbf{x}^{(k+1)} - \mathbf{x}^{(k)} \| \leq tolerance
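The four steps can be sketched as follows (a minimal implementation, assuming the duopoly f from the earlier slides; a finite-difference Jacobian serves as A^{(0)}):

```julia
using LinearAlgebra

# Parameters and f from the duopoly example
epsilon = 1.6; c = [0.6; 0.8];
f(q) = sum(q)^(-1/epsilon) .- (1/epsilon)*sum(q)^(-1/epsilon-1) .* q .- c .* q

function broyden(f, x; tol = 1e-8, maxiter = 100)
    n = length(x)
    # Step 1: initial Jacobian guess A via forward finite differences at x
    h = 1e-7
    A = zeros(n, n)
    for i in 1:n
        x_step = copy(x)
        x_step[i] += h
        A[:, i] = (f(x_step) - f(x)) ./ h
    end
    for _ in 1:maxiter
        d = -(A \ f(x))                                  # Step 2: next root guess
        x_new = x + d
        A += (f(x_new) - f(x) - A * d) * d' / (d' * d)   # Step 3: secant update of A
        x = x_new
        norm(d) < tol && break                           # Step 4: convergence check
    end
    return x
end

broyden(f, [0.5; 0.5])  # converges to the same root Newton found: ≈ [0.8396, 0.6888]
```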

3.29 Quasi-Newton: Broyden’s method

One costly part of Broyden’s method is that we update the Jacobian and “invert” it every iteration

. . .

But, since we only need the Jacobian’s inverse to update \mathbf{x}, we can make it faster by updating the inverse Jacobian directly

B^{(k+1)} = B^{(k)} + \frac{\left[d^{(k)} - u^{(k)}\right] d^{(k)T} B^{(k)}}{d^{(k)T} u^{(k)}}

where u^{(k)} = B^{(k)}\left[f(\mathbf{x^{(k+1)}})-f(\mathbf{x^{(k)}})\right]

  • Most canned implementations of Broyden’s method use the inverse update rule

3.30 Quasi-Newton: Broyden’s method

Broyden converges under relatively weak conditions:

  1. f is continuously differentiable,
  2. x^{(0)} is “close” to the root of f
  3. f' is invertible around the root
  4. A^{(0)} is sufficiently close to the Jacobian at the root

3.31 Broyden’s method: a duopoly example

Let’s revisit our previous example but now solve it using Broyden’s method

We won’t code it by hand. Instead, we use package NLsolve.jl

Function NLsolve.nlsolve has a ton of options to solve nonlinear equations

. . .

Once again, we are looking for a pair (q_1, q_2) which solves two nonlinear equations

\begin{align*} f_1(q_1, q_2) & = (q_1 + q_2)^{-1/\epsilon} - (1/\epsilon)(q_1 + q_2)^{-1/\epsilon-1}q_1 - c_1 q_1 = 0 \\ f_2(q_1, q_2) & = (q_1 + q_2)^{-1/\epsilon} - (1/\epsilon)(q_1 + q_2)^{-1/\epsilon-1}q_2 - c_2 q_2 = 0 \end{align*}

3.32 Broyden’s method: a duopoly example

Using our previously defined f, we run

Code
using NLsolve
NLsolve.nlsolve(f, [1.0; 2.0], method=:broyden,
                xtol=:1e-8, ftol=:0.0, iterations=:1000, show_trace=:true)
Iter     f(x) inf-norm    Step 2-norm 
------   --------------   --------------
     0     1.306427e+00              NaN
     1     1.451615e+00     2.730291e-02
     2     1.963157e-01     2.710146e+00
     3     1.212705e-01     4.591065e-02
     4     1.384160e-01     2.899985e-04
     5     6.349495e-03     2.595537e-02
     6     1.049686e-03     4.755299e-05
     7     3.404813e-05     1.308004e-06
     8     3.486624e-07     1.411664e-09
     9     3.116015e-10     1.486563e-13
    10     1.252332e-13     1.177236e-19
Results of Nonlinear Solver Algorithm
 * Algorithm: broyden without line-search
 * Starting Point: [1.0, 2.0]
 * Zero: [0.8395676035355293, 0.6887964311629567]
 * Inf-norm of residuals: 0.000000
 * Iterations: 10
 * Convergence: true
   * |x - x'| < 1.0e-08: true
   * |f(x)| < 0.0e+00: false
 * Function Calls (f): 37
 * Jacobian Calls (df/dx): 0

3.33 Broyden’s method: a duopoly example

  • The first and second arguments are f and an initial guess
    • nlsolve will automatically generate a Jacobian guess for us
  • method=:broyden tells nlsolve to use Broyden’s method

. . .

The other arguments are optional

  • xtol is the convergence tolerance over x: \| \mathbf{x^{(k+1)}} - \mathbf{x^{(k)}} \| \leq xtol (default is 0.0 meaning no criterion)
  • ftol is the convergence tolerance over f(x): \| f(\mathbf{x^{(k+1)}}) - f(\mathbf{x^{(k)}}) \| \leq ftol (default is 1e-8)
  • iterations set the maximum number of iterations before declaring non-convergence (default is 1000)
  • show_trace will print all the iterations if you set to true (default is false)

3.34 NLsolve.jl

We can use this package to solve with Newton’s method as well. Here we make use of the analytic Jacobian we defined earlier

Code
NLsolve.nlsolve(f, f_jacobian, [1.0; 2.0], method=:newton,
                xtol=:1e-8, ftol=:0.0, iterations=:1000)
Results of Nonlinear Solver Algorithm
 * Algorithm: Newton with line-search
 * Starting Point: [1.0, 2.0]
 * Zero: [0.8395676035356598, 0.6887964311630005]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 1.0e-08: true
   * |f(x)| < 0.0e+00: false
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6

3.35 NLsolve.jl

If you omit the Jacobian, nlsolve will calculate it numerically for you using centered finite differencing

Code
NLsolve.nlsolve(f, [1.0; 2.0], method=:newton,
                xtol=:1e-8, ftol=:0.0, iterations=:1000)
Results of Nonlinear Solver Algorithm
 * Algorithm: Newton with line-search
 * Starting Point: [1.0, 2.0]
 * Zero: [0.83956760353566, 0.6887964311630005]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 1.0e-08: true
   * |f(x)| < 0.0e+00: false
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6

3.36 NLsolve.jl

You can add argument autodiff=:forward to use forward autodifferentiation instead of finite differences

Code
NLsolve.nlsolve(f, [1.0; 2.0], method=:newton, autodiff=:forward,
                xtol=:1e-8, ftol=:0.0, iterations=:1000)
Results of Nonlinear Solver Algorithm
 * Algorithm: Newton with line-search
 * Starting Point: [1.0, 2.0]
 * Zero: [0.8395676035356598, 0.6887964311630005]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 1.0e-08: true
   * |f(x)| < 0.0e+00: false
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6

3.37 Quick detour

Before we continue, take a look again at our f

Code
function f(q)
    Q = sum(q)
    F = Q^(-1/epsilon) .- (1/epsilon)Q^(-1/epsilon-1) .*q .- c .*q
end;

Do you see any potential inefficiency?

. . .

We allocate a new F every time we call this function!

Instead, we can be more efficient by writing a function that modifies a pre-allocated vector

3.38 Quick detour: functions that modify arguments

By convention, in Julia we name functions that modify arguments with a ! at the end. For our f, we can define

Code
function f!(F, q)
    F[1] = sum(q)^(-1/epsilon) - (1/epsilon)sum(q)^(-1/epsilon-1)*q[1] - c[1]*q[1]
    F[2] = sum(q)^(-1/epsilon) - (1/epsilon)sum(q)^(-1/epsilon-1)*q[2] - c[2]*q[2]
end;
F = zeros(2) # This allocates a 2-vector with elements equal to zero
f!(F, [0.2; 0.2]); # Note the first argument is a pre-allocated vector
F
2-element Vector{Float64}:
 1.0989480808247896
 1.0589480808247895

3.39 Quick detour: functions that modify arguments

NLsolve.nlsolve understands when we pass a ! function and pre-allocates the vector for us

Because it allocates only once, it will be more efficient

Code
NLsolve.nlsolve(f!, [1.0; 2.0], method=:newton,
                xtol=:1e-8, ftol=:0.0, iterations=:1000)
Results of Nonlinear Solver Algorithm
 * Algorithm: Newton with line-search
 * Starting Point: [1.0, 2.0]
 * Zero: [0.83956760353566, 0.6887964311630005]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 1.0e-08: true
   * |f(x)| < 0.0e+00: false
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6

4 Comparing methods

4.1 Convergence speed

Rootfinding algorithms will converge at different speeds in terms of the number of operations

. . .

A sequence of iterates x^{(k)} is said to converge to x^* at a rate of order p if there is a constant C such that

\|x^{(k+1)}-x^{*}\|\leq C \|x^{(k)} - x^{*}\|^p

for sufficiently large k

4.2 Convergence speed

\|x^{(k+1)}-x^{*}\|\leq C \|x^{(k)} - x^{*}\|^p

  • If C < 1 and p = 1: linear convergence
  • If 1 < p < 2: superlinear convergence
  • If p = 2: quadratic convergence

The higher order the convergence rate, the faster it converges

4.3 Convergence speed

How fast do the methods we’ve seen converge?

. . .

  • Bisection: linear rate with C = 0.5

. . .

  • Function iteration: linear rate with C = \|g'(x^*)\|

. . .

  • Secant and Broyden: superlinear rate with p \approx 1.62

. . .

  • Newton: p = 2

4.4 Convergence speed

Consider an example where f(x) = x - \sqrt{x} = 0

This is how the 3 main approaches converge in terms of the L^1-norm for an initial guess x^{(0)} = 0.5
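We can reproduce the flavor of this comparison by counting iterations (a quick sketch; the exact counts are indicative, not from the lecture):

```julia
# f(x) = x - sqrt(x) has a root at x* = 1; compare iterations from x0 = 0.5
f(x) = x - sqrt(x)
f_prime(x) = 1 - 1/(2sqrt(x))

# count how many updates are needed until |x - 1| < 1e-10
function count_iters(update, x; tol = 1e-10, maxiter = 1000)
    k = 0
    while abs(x - 1.0) > tol && k < maxiter
        x = update(x)
        k += 1
    end
    return k
end

newton_iters   = count_iters(x -> x - f(x)/f_prime(x), 0.5)  # quadratic: a handful
fixpoint_iters = count_iters(x -> x - f(x), 0.5)             # linear (g(x) = sqrt(x)): dozens
```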

4.5 Choosing a solution method

Convergence rates only account for the number of iterations of the method

The steps taken in a given iteration of each solution method may vary in computational cost because of differences in the number of arithmetic operations

Although an algorithm may take more iterations to converge, each iteration may run faster, so the overall algorithm can take less time

4.6 Choosing a solution method

  • Bisection method only requires a single function evaluation during each iteration
  • Function iteration only requires a single function evaluation during each iteration
  • Broyden’s method requires both a function evaluation and matrix multiplication
  • Newton’s method requires a function evaluation, a derivative evaluation, and solving a linear system

. . .

\rightarrow Bisection and function iteration are usually slow

\rightarrow Broyden’s method can be faster than Newton’s method if derivatives are costly to compute

4.7 Choosing a solution method

Besides convergence rates and algorithm speed, you should also factor in development time

  • Newton’s method is fastest to converge
    • If deriving and programming the Jacobian is relatively simple and not too costly to compute, this method is a good choice
  • If derivatives are complex, quasi-Newton methods are good candidates
  • Bisection and function iteration are generally dominated options but are easy to program/debug, so they have value as a quick proof-of-concept
    • Bisection is often used in hybrid methods, such as Dekker’s and Brent’s. Hybrid methods select between bisection, secant, or other basic solution methods every iteration depending on a set of criteria

5 Common problems and solutions

5.1 When solvers fail: domain errors

A very common problem with Newton and Broyden: the solver tries values outside the function’s domain

. . .

Consider a market equilibrium with isoelastic supply S_i = c_i p_i^{\sigma_i} and demand D_i = y\, p_i^{\epsilon_{ii}} p_j^{\epsilon_{ij}}

\begin{align} f_1(p_1, p_2) & = c_1 p_1^{\sigma_1} - y \, p_1^{\epsilon_{11}} p_2^{\epsilon_{12}} = 0 \\ f_2(p_1, p_2) & = c_2 p_2^{\sigma_2} - y \, p_1^{\epsilon_{21}} p_2^{\epsilon_{22}} = 0 \end{align}

Prices must be positive: p^{\sigma} with non-integer \sigma is only defined for p > 0

. . .

But the solver doesn’t know that! If a Newton/Broyden step lands on p_i \leq 0, Julia throws a DomainError

5.2 When solvers fail: domain errors

Let’s see it happen

Code
using NLsolve
c = [2.0; 3.0]; σ = [1.5; 1.3];
ϵ_own = [-1.2; -1.6]; ϵ_cross = [0.2; 0.2]; y₀ = 10.0;

function f!(F, p)
    F[1] = c[1]*p[1]^σ[1] - y₀*p[1]^ϵ_own[1]*p[2]^ϵ_cross[1]
    F[2] = c[2]*p[2]^σ[2] - y₀*p[1]^ϵ_cross[2]*p[2]^ϵ_own[2]
end

# From a good initial guess, it works:
NLsolve.nlsolve(f!, [1.5; 1.5], method=:broyden)
Results of Nonlinear Solver Algorithm
 * Algorithm: broyden without line-search
 * Starting Point: [1.5, 1.5]
 * Zero: [1.8777225419239614, 1.5818822320913937]
 * Inf-norm of residuals: 0.000000
 * Iterations: 9
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 45
 * Jacobian Calls (df/dx): 0

5.3 When solvers fail: domain errors

Code
# From an asymmetric guess, it crashes:
try
    NLsolve.nlsolve(f!, [0.05; 2.0], method=:broyden)
catch e
    println("Error: $e")
end
Error: DomainError(-3.57491732087526, "Exponentiation yielding a complex result requires a complex argument.\nReplace x^y with (x+0im)^y, Complex(x)^y, or similar.")

. . .

What happened? Broyden’s approximate Jacobian led to a step that pushed p_1 below zero. Julia tried to compute a negative number raised to a fractional power and threw a DomainError

5.4 Why this happens

Newton and Broyden are unconstrained methods: they search over all of \mathbb{R}^n

. . .

This is problematic when:

  • Variables must be positive (prices, quantities, income)
  • You use isoelastic functions: x^{\alpha} with non-integer \alpha requires x > 0
  • Your initial guess is far from the solution \rightarrow large steps \rightarrow overshooting

. . .

Broyden is more prone to this than Newton because its Jacobian approximation can degrade, leading to poorly directed steps

. . .

Three common strategies to deal with this:

  1. Start with a better initial guess
  2. Try a different method
  3. Transform variables so the solver’s domain is unrestricted

5.5 Strategy 1: Better initial guesses

Idea: Solve a simpler version of the problem first, then use that solution as the starting point

. . .

For a system with cross-price effects, we can ignore them and solve each market in isolation

c_i p_i^{\sigma_i} = y \, p_i^{\epsilon_{ii}} \implies p_i^{\sigma_i - \epsilon_{ii}} = \frac{y}{c_i} \implies p_i^{(0)} = \left(\frac{y}{c_i}\right)^{\frac{1}{\sigma_i - \epsilon_{ii}}}

. . .

This has a closed-form solution and gives a reasonable starting point for the coupled system

5.6 Strategy 1: Better initial guesses

Code
# Solve each market in isolation (set cross-price elasticities to zero)
p0 = [(y₀/c[i])^(1/(σ[i] - ϵ_own[i])) for i in 1:2];
println("Simplified guess: $p0");
Simplified guess: [1.8150048061254909, 1.5146176577290846]
Code
NLsolve.nlsolve(f!, p0, method=:broyden)
Results of Nonlinear Solver Algorithm
 * Algorithm: broyden without line-search
 * Starting Point: [1.8150048061254909, 1.5146176577290846]
 * Zero: [1.8777225419271355, 1.5818822320989752]
 * Inf-norm of residuals: 0.000000
 * Iterations: 7
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 40
 * Jacobian Calls (df/dx): 0

5.7 Strategy 2: Try a different method

Broyden failed because its Jacobian approximation was inaccurate far from the solution

. . .

Newton’s method computes (or approximates) the Jacobian at every step, so it is less prone to overshooting

Code
# Newton with autodiff: same bad starting point, no DomainError
NLsolve.nlsolve(f!, [0.05; 2.0], method=:newton, autodiff=:forward)
Results of Nonlinear Solver Algorithm
 * Algorithm: Newton with line-search
 * Starting Point: [0.05, 2.0]
 * Zero: [1.8777225419239383, 1.581882232092248]
 * Inf-norm of residuals: 0.000000
 * Iterations: 10
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 11
 * Jacobian Calls (df/dx): 11

5.8 Strategy 3: Variable transformation

A more robust approach: change variables so the domain is unrestricted

. . .

If p must be positive, define z = \log(p), so p = e^z

  • z \in \mathbb{R}: no domain restriction
  • e^z > 0 always: prices stay positive regardless of what the solver does

. . .

Instead of solving f(p) = 0, we solve g(z) \equiv f(e^z) = 0

After convergence, recover p^* = e^{z^*}

5.9 Strategy 3: Variable transformation

For our market equilibrium, substitute p_i = e^{z_i}

\begin{align} g_1(z_1, z_2) & = c_1 e^{\sigma_1 z_1} - y \, e^{\epsilon_{11} z_1 + \epsilon_{12} z_2} = 0 \\ g_2(z_1, z_2) & = c_2 e^{\sigma_2 z_2} - y \, e^{\epsilon_{21} z_1 + \epsilon_{22} z_2} = 0 \end{align}

. . .

This is the same system, parameterized differently

Same equilibrium: if z^* solves g = 0, then p^* = e^{z^*} solves f = 0

But now the solver can explore any z \in \mathbb{R}^2 without ever triggering a domain error

5.10 Strategy 3: Variable transformation

Code
function g!(G, z)
    G[1] = c[1]*exp(σ[1]*z[1]) - y₀*exp(ϵ_own[1]*z[1] + ϵ_cross[1]*z[2])
    G[2] = c[2]*exp(σ[2]*z[2]) - y₀*exp(ϵ_cross[2]*z[1] + ϵ_own[2]*z[2])
end

# Same starting point that failed before: p = (0.05, 2.0) → z = log.((0.05, 2.0))
result = NLsolve.nlsolve(g!, log.([0.05; 2.0]), method=:broyden)
p_solution = exp.(result.zero)
println("Solution: p = $p_solution")
Solution: p = [1.877722541843727, 1.5818822321046384]

No DomainError. The solver explored negative z values, and e^z kept prices positive throughout

5.11 When to use each strategy

  1. Better initial guess: simple, requires no code changes to f, but needs problem-specific insight

  2. Different method: Newton with autodiff is more robust than Broyden with finite differences, at the cost of more computation per step

  3. Log transformation: most robust, works from distant guesses, but adds a layer of indirection and can lead to scaling issues if the solution is very large/small

. . .

Note: A related transformation for variables in (0,1) (e.g., probabilities or shares):

z = \log\!\left(\frac{x}{1-x}\right), \quad x = \frac{1}{1+e^{-z}}

This is the logit transformation: z \in \mathbb{R} maps to x \in (0,1)
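As a quick sanity check of the round trip, here is a minimal sketch that solves a toy share equation s^2 = 1 - s for s \in (0,1) (a hypothetical example, not from the lecture) by Newton iteration in the unrestricted variable z

```julia
# Logit map: the solver searches z ∈ ℝ while s = logistic(z) stays in (0,1)
logistic(z) = 1 / (1 + exp(-z))

function solve_share()
    h(z) = logistic(z)^2 - (1 - logistic(z))  # residual in the transformed variable
    hp(z) = (h(z + 1e-6) - h(z)) / 1e-6       # crude finite-difference derivative
    z = 0.0                                   # start at s = 0.5, i.e. z = 0
    for _ in 1:20
        z -= h(z) / hp(z)                     # Newton step in z: no domain risk
    end
    return logistic(z)                        # map back to a share in (0,1)
end

s_star = solve_share()   # root is s* = (√5 - 1)/2 ≈ 0.618
```

No matter what z values the iteration visits, the recovered s always lies strictly inside (0,1).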

5.12 When solvers fail: convergence problems

We’ve seen that Newton’s method converges if the initial guess is sufficiently close to a root where f' is invertible

But what does failure look like in practice?

Three classic failure modes:

  1. Cycling: iterates bounce back and forth without approaching the root
  2. Divergence: iterates move farther and farther from the root
  3. Slow convergence: iterates approach the root, but VERY slowly

5.13 Failure mode 1: Cycling

Consider f(x) = x^3 - 2x + 2

Starting at x^{(0)} = 0:

  • f(0) = 2, f'(0) = -2 \Rightarrow x^{(1)} = 0 - \frac{2}{-2} = 1
  • f(1) = 1, f'(1) = 1 \Rightarrow x^{(2)} = 1 - \frac{1}{1} = 0

The method enters a perfect 2-cycle: the tangent at x = 0 hits zero at x = 1, and the tangent at x = 1 hits zero at x = 0

The actual root is at x^* \approx -1.77, which Newton never reaches from this starting point

5.14 Failure mode 1: Cycling

Let’s verify with a modified version of our newtons_method function

Code
f(x) = x^3 - 2*x + 2;
f_prime(x) = 3*x^2 - 2;

We need to add a maximum iteration count so the loop terminates

Code
function newtons_method_maxiter(f, f_prime, initial_guess; maxiter=20)
  tolerance = 1e-3   # tolerance for solution
  difference = Inf   # initialize difference
  x = initial_guess  # initialize current value
  
  for k in 1:maxiter
      println("Iteration $k: x = $(round(x, digits=6))")
      x_prev = x
      x = x_prev - f(x_prev)/f_prime(x_prev)
      difference = x - x_prev
      abs(difference) < tolerance && return x
  end
  println("Did not converge after $maxiter iterations")
  return x
end;

5.15 Failure mode 1: Cycling

The iterates alternate between 0.0 and 1.0 indefinitely

Code
newtons_method_maxiter(f, f_prime, 0.0; maxiter=10);
Iteration 1: x = 0.0
Iteration 2: x = 1.0
Iteration 3: x = 0.0
Iteration 4: x = 1.0
Iteration 5: x = 0.0
Iteration 6: x = 1.0
Iteration 7: x = 0.0
Iteration 8: x = 1.0
Iteration 9: x = 0.0
Iteration 10: x = 1.0
Did not converge after 10 iterations

5.16 Failure mode 1: Cycling

Show the code
using Plots; gr()

f(x) = x^3 - 2*x + 2;
f_prime(x) = 3*x^2 - 2;

xgrid = range(-2.5, 2.0, length=300)
p = plot(xgrid, f.(xgrid), label="f(x) = x³ - 2x + 2", color=:black, linewidth=2)
hline!([0], color=:gray, linestyle=:dash, label=nothing)

# Newton steps: tangent from (0, f(0)) to (1, 0), then (1, f(1)) to (0, 0)
xs_cycle = [0.0, 1.0, 0.0, 1.0, 0.0]
colors_cycle = [:royalblue, :crimson]
for k in 1:4
    xk = xs_cycle[k]; xk1 = xs_cycle[k+1]
    plot!([xk, xk1], [f(xk), 0.0], color=colors_cycle[mod1(k,2)],
          linestyle=:dot, linewidth=1.5, label=nothing)
    plot!([xk1, xk1], [0.0, f(xk1)], color=colors_cycle[mod1(k,2)],
          linestyle=:dot, linewidth=1.5, label=nothing)
end
scatter!([0.0, 1.0], [f(0.0), f(1.0)], color=:red, markersize=6, label="Iterates")
title!("Cycling: Newton bounces between x = 0 and x = 1")
xlabel!("x"); ylabel!("f(x)")
p

5.17 Failure mode 2: Divergence

Consider f(x) = \text{sign}(x)|x|^{1/3} (the cube root)

The root is at x^* = 0, with f'(x) = \frac{1}{3}|x|^{-2/3}. The Newton update simplifies to:

x^{(k+1)} = x^{(k)} - \frac{\left(x^{(k)}\right)^{1/3}}{\frac{1}{3}\left(x^{(k)}\right)^{-2/3}} = x^{(k)} - 3x^{(k)} = -2x^{(k)}

Each step doubles the distance to the root and flips the sign

The function is too flat near the root: the tangent line always overshoots

5.18 Failure mode 2: Divergence

The iterates grow geometrically: 0.5, -1, 2, -4, 8, \ldots

Code
f(x) = sign(x) * abs(x)^(1/3);
f_prime(x) = (1/3) * abs(x)^(-2/3);

newtons_method_maxiter(f, f_prime, 0.5; maxiter=8);
Iteration 1: x = 0.5
Iteration 2: x = -1.0
Iteration 3: x = 2.0
Iteration 4: x = -4.0
Iteration 5: x = 8.0
Iteration 6: x = -16.0
Iteration 7: x = 32.0
Iteration 8: x = -64.0
Did not converge after 8 iterations

5.19 Failure mode 2: Divergence

Show the code
f(x) = sign(x) * abs(x)^(1/3);
f_prime(x) = (1/3) * abs(x)^(-2/3);

# Collect iterates
xs_div = [0.5]
for k in 1:6
    xk = xs_div[end]
    push!(xs_div, xk - f(xk)/f_prime(xk))
end

xlim = maximum(abs.(xs_div)) * 1.2
xgrid = range(-xlim, xlim, length=500)
p = plot(xgrid, f.(xgrid), label="f(x) = sign(x)|x|^(1/3)", color=:black, linewidth=2)
hline!([0], color=:gray, linestyle=:dash, label=nothing)

for k in 1:min(length(xs_div)-1, 5)
    xk = xs_div[k]; xk1 = xs_div[k+1]
    plot!([xk, xk1], [f(xk), 0.0], color=:royalblue,
          linestyle=:dot, linewidth=1.5, label=nothing, alpha=0.4+0.12*k)
    plot!([xk1, xk1], [0.0, f(xk1)], color=:royalblue,
          linestyle=:dot, linewidth=1.5, label=nothing, alpha=0.4+0.12*k)
end
scatter!(xs_div[1:6], f.(xs_div[1:6]), color=:red, markersize=5, label="Iterates")
title!("Divergence: each step doubles the distance from the root")
xlabel!("x"); ylabel!("f(x)")
p

5.20 Failure mode 3: Slow convergence

Consider f(x) = x^3, which has a root at x^* = 0

But this is a root with multiplicity 3: f'(0) = 0

The Newton update simplifies to:

x^{(k+1)} = x^{(k)} - \frac{(x^{(k)})^3}{3(x^{(k)})^2} = \frac{2}{3}x^{(k)}

So the error only shrinks by a factor of 2/3 each step: linear convergence instead of the quadratic convergence we expect from Newton’s method

5.21 Failure mode 3: Slow convergence

Code
f(x) = x^3;
f_prime(x) = 3*x^2;

newtons_method_maxiter(f, f_prime, 1.0; maxiter=50);
Iteration 1: x = 1.0
Iteration 2: x = 0.666667
Iteration 3: x = 0.444444
Iteration 4: x = 0.296296
Iteration 5: x = 0.197531
Iteration 6: x = 0.131687
Iteration 7: x = 0.087791
Iteration 8: x = 0.058528
Iteration 9: x = 0.039018
Iteration 10: x = 0.026012
Iteration 11: x = 0.017342
Iteration 12: x = 0.011561
Iteration 13: x = 0.007707
Iteration 14: x = 0.005138
Iteration 15: x = 0.003425
Iteration 16: x = 0.002284

5.22 Failure mode 3: Slow convergence

Compare with a well-behaved case: f(x) = x^3 - 1 has a simple root at x^* = 1

Show the code
# Collect errors for x^3 (repeated root)
f_slow(x) = x^3; fp_slow(x) = 3*x^2;
x = 1.0; errors_slow = [abs(x)]
for k in 1:50
    x = x - f_slow(x)/fp_slow(x)
    push!(errors_slow, abs(x))
end

# Collect errors for x^3 - 1 (simple root)
f_fast(x) = x^3 - 1; fp_fast(x) = 3*x^2;
x = 2.0; errors_fast = [abs(x - 1.0)]
for k in 1:15
    x = x - f_fast(x)/fp_fast(x)
    push!(errors_fast, abs(x - 1.0))
end

p = plot(0:length(errors_slow)-1, errors_slow, marker=:circle, markersize=3,
     color=:crimson, label="x³ (repeated root, linear rate)",
     yscale=:log10, xlabel="Iteration", ylabel="Error (log scale)",
     ylims=(1e-16, 10), linewidth=1.5)
plot!(0:length(errors_fast)-1, errors_fast, marker=:diamond, markersize=3,
     color=:royalblue, label="x³ - 1 (simple root, quadratic rate)", linewidth=1.5)
title!("Convergence comparison: repeated vs simple root")
p

5.23 Failure mode 3: Slow convergence

The simple root reaches machine precision in about 7 iterations

The repeated root (where both f and f' vanish) is still crawling after 50 iterations. Why?

. . .

Newton’s quadratic convergence requires f'(x^*) \neq 0

When f'(x^*) = 0 (repeated root), the method degrades to linear convergence!

5.24 Summary: convergence failures

| Failure mode | Example | What goes wrong | Root cause |
|---|---|---|---|
| Cycling | x^3 - 2x + 2 at x_0 = 0 | Iterates form a periodic orbit | Tangent lines create a closed loop from this starting point |
| Divergence | x^{1/3} at any x_0 \neq 0 | Iterates grow without bound | The Newton step -f/f' = -3x always overshoots: each iterate doubles the distance to the root |
| Slow convergence | x^3 near x^* = 0 | Linear instead of quadratic rate | f'(x^*) = 0 (repeated root): steps shrink, but only linearly |

5.25 Summary: convergence failures

So, all three cases violate assumptions of Newton’s convergence theorem.

  1. Cycling: bad starting point relative to the function’s geometry
  2. Divergence: f is not continuously differentiable at the root: f' grows without bound near x^*, violating the theorem’s smoothness assumption
  3. Slow convergence: f' is singular at the root
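Line search is the standard remedy for divergence (the NLsolve Newton output above already used one). Below is a minimal backtracking sketch, assuming we simply halve the step until |f| decreases; note that this safeguard can still stall at a local minimum of |f| where f \neq 0, so it is not a cure-all

```julia
# Damped Newton: halve the step until the residual |f| decreases
function damped_newton(f, f_prime, x0; tol=1e-4, maxiter=100)
    x = x0
    for _ in 1:maxiter
        abs(f(x)) < tol && return x
        step = -f(x) / f_prime(x)      # full Newton step
        α = 1.0
        while abs(f(x + α*step)) >= abs(f(x)) && α > 1e-10
            α /= 2                     # backtrack until the residual shrinks
        end
        x += α*step
    end
    return x
end

f(x) = sign(x) * abs(x)^(1/3)          # plain Newton diverges on this
fp(x) = (1/3) * abs(x)^(-2/3)
x_star = damped_newton(f, fp, 0.5)     # damped steps converge toward the root at 0
```

On the cube-root example, the full step -3x always overshoots, but after one halving the step becomes -1.5x and the iterate -0.5x, so the damped iteration contracts toward the root instead of doubling away from it.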