Main references for today
- Miranda & Fackler (2002), Ch. 5
- Judd (1998), Ch. 7
- Nocedal & Wright (2006), Ch. 8
- Lecture notes for Ivan Rudik’s Dynamic Optimization (Cornell)
Numerical differentiation
University of Illinois Urbana-Champaign
A useful way to represent limiting behavior
How do we quantify speed and accuracy of computational algorithms?
The usual way is to use Big O (or order) notation
Written as: \(O(f(x))\)
Formally, for two real-valued functions \(f(x)\) and \(g(x)\) we say
\[g(x) = O(f(x))\]
if there \(\exists\) a positive \(C\) such that
\[|g(x)| \leq C |f(x)| \quad \text{as } x \rightarrow a\]
So \(g\) is bounded by \(f\) up to a scalar. Another way of saying it (if \(g\) is non-zero): \(\lim_{x \rightarrow a} \frac{|g(x)|}{|f(x)|} < \infty\)
Intuitively, \(g\) never gets “too far away” from \(f\) as \(x \rightarrow a\)
Here is how to think about it:
\(\mathbf{O(x)}\): linear
Any examples?
\(\mathbf{O(c^x)}:\) exponential
Any examples?
\(\mathbf{O(n!)}\): factorial
Any examples?
This is how you have probably seen Big O used before:
Taylor series for \(\sin(x)\) around zero:
\(\sin(x) = x - x^3/3! + x^5/5! + O(x^7)\)
What does \(O(x^7)\) mean here?
\(\sin(x) = x - x^3/3! + x^5/5! + O(x^7)\)
As we move away from \(0\) to some \(x\), the upper bound of the growth rate in the error of our approximation to \(sin(x)\) is \(x^7\)
If we are approximating about zero, \(x\) is small (\(|x| < 1\)), so \(|x|^n\) is decreasing in \(n\)
For small \(x\), higher order polynomials mean the error will grow slower and we have a better local approximation
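The definitions of these helpers were not shown; a reconstruction consistent with the printed errors below is:

```julia
# Difference between sin(x) and its 3rd- and 5th-order Taylor approximations
sin_error_3(x) = sin(x) - (x - x^3/6)
sin_error_5(x) = sin(x) - (x - x^3/6 + x^5/120)
```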
println("Error of fifth-order approximation at x = .001 is: $(sin_error_5(.001))
Error of third-order approximation at x = .001 is: $(sin_error_3(.001))
Error of fifth-order approximation at x = .01 is: $(sin_error_5(.01))
Error of third-order approximation at x = .01 is: $(sin_error_3(.01))
Error of fifth-order approximation at x = .1 is: $(sin_error_5(.1))
Error of third-order approximation at x = .1 is: $(sin_error_3(.1))")

Error of fifth-order approximation at x = .001 is: 0.0
Error of third-order approximation at x = .001 is: 8.239936510889834e-18
Error of fifth-order approximation at x = .01 is: -1.734723475976807e-18
Error of third-order approximation at x = .01 is: 8.333316675601665e-13
Error of fifth-order approximation at x = .1 is: -1.983851971587569e-11
Error of third-order approximation at x = .1 is: 8.331349481138783e-8
Here are a few examples for fundamental computational operations
\(\mathbf{O(1)}\): algorithm executes in constant time
The size of the input does not affect execution speed
Example: x[10]
\(\mathbf{O(x)}\): algorithm executes in linear time
Execution speed grows linearly in input size
Example:
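For instance, summing the entries of a vector visits each element once, so the work grows linearly with input size (a minimal sketch; the name linear_sum is illustrative):

```julia
# O(n): one pass over the input, so work scales linearly with length(v)
function linear_sum(v)
    total = 0.0
    for x in v
        total += x
    end
    return total
end
```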
\(\mathbf{O(x^2):}\) algorithm executes in quadratic time
Execution speed grows quadratically in input size
Example:
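For instance, comparing every entry against every other entry requires nested loops, so the work grows quadratically (again a sketch; pairwise_products_sum is an illustrative name):

```julia
# O(n^2): nested loops over the same input give quadratic growth in work
function pairwise_products_sum(v)
    s = 0.0
    for x in v, y in v
        s += x * y
    end
    return s
end
```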
Derivatives are obviously important in economics for finding optimal allocations, etc.
The formal definition of a derivative is:
\[\frac{d f(x)}{dx} = \lim_{h\rightarrow 0} \frac{f(x+h)-f(x)}{h}\]
But we can let \(t = 1/h\) and reframe this as an infinite limit
\[\frac{d f(x)}{dx} = \lim_{t\rightarrow \infty} \frac{f(x+1/t)-f(x)}{1/t}\]
which we know a computer can’t handle because of finite space to store \(t\)
How do we perform derivatives on computers if we can’t take the limit?
Finite difference methods
Idea: Approximate the limit by letting \(h\) be a small number
What does a finite difference approximation look like?
The forward difference looks exactly like the formal definition without the limit: \[\frac{d f(x)}{dx} \approx \frac{f(x+h)-f(x)}{h}\]
It works the same for partial derivatives:
\[\frac{\partial g(x,y)}{\partial x} \approx \frac{g(x+h,y)-g(x,y)}{h}\]
Let’s see how it works in practice by calculating derivatives of \(x^2\) at \(x=2\)
Let's define a function deriv_x_squared(h, x) that returns the forward difference approximation to the derivative of \(x^2\) at \(x\) with step size \(h\), and evaluate it at \(x = 2\) for three values of \(h\): 1e-1, 1e-12, and 1e-30
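The helper's exact code was not shown; a minimal implementation consistent with the outputs below is:

```julia
# Forward difference approximation to d/dx x^2 (exact answer: 2x)
deriv_x_squared(h, x) = ((x + h)^2 - x^2) / h
```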
The derivative with h=1e-1 is: 4.100000000000001
The derivative with h=1e-12 is: 4.000355602329364
The derivative with h=1e-30 is: 0.0
None of the values we chose for \(h\) were perfect, but clearly some were better than others
Why?
We face two opposing forces:
- Truncation error: we want \(h\) to be small so the secant through \(x\) and \(x+h\) approaches the tangent at \(x\)
- Rounding error: if \(h\) is too small, \(f(x+h) - f(x)\) is computed with few (or no) significant digits because of finite machine precision
We can select \(h\) in an optimal fashion: \(h = \max\{|x|,1\}\sqrt{\epsilon}\)
There are proofs for why this is the case, but in practice testing a few values of \(h\) works fine
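To illustrate the rule of thumb in base Julia (helper names are illustrative):

```julia
# Rule-of-thumb step size for forward differences: h = max(|x|, 1) * sqrt(eps)
optimal_step(x) = max(abs(x), 1.0) * sqrt(eps(Float64))

forward_diff(f, x, h) = (f(x + h) - f(x)) / h

x = 2.0
h = optimal_step(x)                           # ≈ 3e-8 here
err = abs(forward_diff(x -> x^2, x, h) - 2x)  # error vs. the exact derivative 2x = 4
```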
Can we measure the error growth rate in \(h\) (i.e. Big O notation)?
Perform a first-order Taylor expansion of \(f(x)\) around \(x\):
\[f(x+h) = f(x) + f'(x)h + O(h^2)\]
Recall that \(O(h^2)\) means the error in our approximation grows quadratically in \(h\), though we only did a linear approximation
How can we use this to understand the error in our finite difference approximation?
Rearrange to obtain: \(f'(x) = \frac{f(x+h) - f(x)}{h} + O(h^2)/h\)
\(\Rightarrow f'(x) = \frac{f(x+h) - f(x)}{h} + O(h)\) because \(O(h^2)/h = O(h)\)
Forward differences have linearly growing errors
If we halve \(h\), we halve the error in our approximation (ignoring rounding/truncation issues)
How can we improve the accuracy of the forward difference?
First, why do we have error?
How about we center the difference around \(x\)?
We can approximate \(f'(x)\) in a slightly different way: \[f'(x) \approx \frac{f(x+h)-f(x-h)}{2h}\]
Is this an improvement on forward differences?
Let's do two second-order Taylor expansions:
\[f(x+h) = f(x) + f'(x)h + \frac{f''(x)}{2}h^2 + O(h^3)\]
\[f(x-h) = f(x) - f'(x)h + \frac{f''(x)}{2}h^2 + O(h^3)\]
Subtract the second expression from the first (note that \(O(h^3) - O(h^3) = O(h^3)\)) and then divide by \(2h\) to get
\[f'(x) = \frac{f(x+h)-f(x-h)}{2h} + O(h^2)\]
Error is quadratic in \(h\): if we halve \(h\) we reduce error by 75%
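A quick base-Julia check of these rates (helper names are illustrative), using \(\sin\), whose exact derivative is \(\cos\):

```julia
forward_diff(f, x, h) = (f(x + h) - f(x)) / h
central_diff(f, x, h) = (f(x + h) - f(x - h)) / (2h)

err_f(h) = abs(forward_diff(sin, 1.0, h) - cos(1.0))
err_c(h) = abs(central_diff(sin, 1.0, h) - cos(1.0))

# Halving h roughly halves the forward error (O(h))...
ratio_f = err_f(1e-3) / err_f(5e-4)   # ≈ 2
# ...but cuts the central error by about 75% (O(h^2))
ratio_c = err_c(1e-3) / err_c(5e-4)   # ≈ 4
```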
We are looking for a combination of coefficients \(a, b, c, d\) in
\[af(x+h) + bf(x-h) + cf(x+2h) + df(x-2h)\]
such that we can cleverly cancel out higher order (2nd, 3rd, and 4th) terms
If you do the math, you'll find that with \(a = 2/3, b = -2/3, c = -1/12, d = 1/12\) (dividing the combination by \(h\)) we get
\[f'(x) = \frac{-f(x+2h)+8f(x+h)-8f(x-h)+f(x-2h)}{12h} + O(h^4)\]
By adding yet another two points, we doubled the order of accuracy: if we halve \(h\) we reduce the error by 93.75%!
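The five-point formula above is straightforward to sketch (the name central_diff_5pt is illustrative):

```julia
# Five-point central difference: first derivative with O(h^4) error
central_diff_5pt(f, x, h) =
    (-f(x + 2h) + 8f(x + h) - 8f(x - h) + f(x - 2h)) / (12h)
```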
Suppose we’re computing a Jacobian of a multidimensional function. Why would we ever use forward differences instead of central differences?
Forward differences save on function evaluations (\(n+1\) versus \(2n\) for an \(n\)-dimensional gradient) at the expense of accuracy
We can use these techniques to approximate higher order derivatives
For example, take two third-order Taylor expansions:
\[f(x+h) = f(x) + f'(x)h + \frac{f''(x)}{2}h^2 + \frac{f'''(x)}{6}h^3 + O(h^4)\]
\[f(x-h) = f(x) - f'(x)h + \frac{f''(x)}{2}h^2 - \frac{f'''(x)}{6}h^3 + O(h^4)\]
Add the two expressions, subtract \(2f(x)\), and divide by \(h^2\) to get
\[f''(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} + O(h^2)\]
Second derivatives are important for calculating Hessians and checking maxima or minima
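The second-derivative formula above, as a sketch (second_diff is an illustrative name):

```julia
# Central second difference: second derivative with O(h^2) error
second_diff(f, x, h) = (f(x + h) - 2f(x) + f(x - h)) / h^2
```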
Despair not! That’s what (reliable) packages are for. And we thank the good souls who programmed those tools for free so we can focus on solving our research questions
FiniteDifferences.jl in Julia (Python users have numdifftools)

FiniteDifferences.jl: basics

Let's start with the scalar (univariate) case. The package offers three constructors:
- forward_fdm(p, q): forward finite differences method
- backward_fdm(p, q): backward finite differences method
- central_fdm(p, q): central finite differences method
where p is the number of grid points (the order of the method) and q is the order of the derivative (1 for first-order)

FiniteDifferences.jl: scalars

Let's create an operator for 1st-order differentiation with each type using the simplest case with 2 points
FiniteDifferenceMethod:
order of method: 2
order of derivative: 1
grid: [0, 1]
coefficients: [-1.0, 1.0]
FiniteDifferenceMethod:
order of method: 2
order of derivative: 1
grid: [-1, 1]
coefficients: [-0.5, 0.5]
FiniteDifferences.jl: scalars

As expected, increasing the number of points (order of method) will increase precision
FiniteDifferenceMethod:
order of method: 6
order of derivative: 1
grid: [0, 1, 2, 3, 4, 5]
coefficients: [-2.283333333333333, 5.0, -5.0, 3.3333333333333335, -1.25, 0.2]
FiniteDifferenceMethod:
order of method: 6
order of derivative: 1
grid: [-3, -2, -1, 1, 2, 3]
coefficients: [-0.016666666666666666, 0.15, -0.75, 0.75, -0.15, 0.016666666666666666]
FiniteDifferences.jl: higher orders

Higher-order derivatives are also easy!
FiniteDifferenceMethod:
order of method: 3
order of derivative: 2
grid: [-1, 0, 1]
coefficients: [1.0, -2.0, 1.0]
FiniteDifferences.jl: gradients

We can also use this package to calculate derivatives of functions of vectors (multivariate functions). To do so, we use the grad function and pass our fdm operator first
Let's see an example for the function \(g(x_1, x_2) = x_1^2 + 2x_1 + x_2^2 - 2x_2 + 4 x_1 x_2\) evaluated at \((1, 1)\). We know the gradient is \(\nabla g = (2x_1 + 2 + 4x_2, \; 2x_2 - 2 + 4x_1)\), which equals \((8, 4)\) at \((1, 1)\)
([8.000000010231267, 3.9999999945594786],)
FiniteDifferences.jl: Jacobians of scalar functions

The Jacobian matrix for a multivariate scalar function \(f: \mathbb{R}^n \rightarrow \mathbb{R}\) is the row vector of first-order partial derivatives of \(f\):
\[ J = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} \]
FiniteDifferences.jl: Jacobians of vector functions

In many instances, we will work with vector functions of the type \(f: \mathbb{R}^n \rightarrow \mathbb{R}^n\), where \(f = (f_1, f_2, \ldots, f_n)\) and each \(f_i\) is a function of \(n\) variables \(x_1, x_2, \ldots, x_n\)
The Jacobian matrix can be represented as follows:
\[ J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \\ \end{bmatrix} \]
FiniteDifferences.jl: Jacobians of vector functions

The process with FiniteDifferences.jl for vector functions is the same. Let's try it with \(G: \mathbb{R}^2 \rightarrow \mathbb{R}^2\)
\[ G(\mathbf{X}) = \begin{bmatrix} x_1^2 + 2x_1 + x_2^2 - 2x_2 + 4 x_1 x_2 \\ x_1^2 - 2x_1 + x_2^2 + 2x_2 - 4 x_1 x_2 \\ \end{bmatrix} \]
function G(x)
G = similar(x) # Initializes a vector with same dimensions as x (2x1)
G[1] = x[1]^2 + 2.0*x[1] + x[2]^2 - 2.0*x[2] + 4.0*x[1]*x[2]
G[2] = x[1]^2 - 2.0*x[1] + x[2]^2 + 2.0*x[2] - 4.0*x[1]*x[2]
return G
end;
jacobian(my_central_diff, G, [1.0 1.0])[1] # [1] returns the first element of the output tuple (a matrix)

2×2 Matrix{Float64}:
8.0 4.0
-4.0 0.0
FiniteDifferences.jl: Hessians of scalar functions

The Hessian matrix for a function \(f: \mathbb{R}^n \rightarrow \mathbb{R}\) is a square matrix of second-order partial derivatives of \(f\):
\[ H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \\ \end{bmatrix} \]
FiniteDifferences.jl currently doesn’t have a specific function to calculate Hessians directly. But it’s easy to do it by acknowledging that each row \(i\) of a Hessian matrix is the Jacobian of function \(f_i(x) = \frac{\partial f(x)}{\partial x_i} \rightarrow\) it’s a Jacobian of a Jacobian!
FiniteDifferences.jl: Hessians of scalar functions

Let's write a function to calculate a Hessian matrix
function hessian(fdm, f, x)
# Calculate the Jacobian of function f
f_i(x) = jacobian(fdm, f, x)[1] #[1] gets the 1st element in the tuple output, which is a vector of partial derivatives
# Calculate the Jacobian of vector function f_i
H = jacobian(fdm, f_i, x)[1]
return H
end;
hessian(my_central_diff, g, [1.0, 1.0])

2×2 Matrix{Float64}:
2.00789 3.80655
3.51684 2.45064
Finite differences put us in between two opposing forces on the size of \(h\)
Can we improve upon finite differences?
Exact solution! (But up to machine precision, of course)
Coding up analytic derivatives by hand for complex problems is not always great because:
- It is time-consuming
- It is error-prone: a single algebra mistake can silently break your solution
There is another option…
Think about this: your code is always made up of simple arithmetic operations
The closed-form derivatives of these operations are not hard: it turns out your computer can compute them and yield exact results
That’s called automatic differentiation, or autodiff for short
How?
There are methods that basically apply a giant chain rule to your whole program and break down the derivative into the (easy) component parts that another package knows how to handle
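To build intuition, here is a toy forward-mode sketch using dual numbers. This is a simplified illustration of the idea, not how ForwardDiff.jl is actually implemented:

```julia
# A dual number carries a value and a derivative through every operation
struct Dual
    val::Float64
    der::Float64
end

# Each primitive operation propagates the derivative via its own rule
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# Differentiate f at x by seeding the derivative slot with 1
autodiff(f, x) = f(Dual(x, 1.0)).der
```

For example, autodiff(x -> x * x + sin(x), 2.0) returns 4 + cos(2) exactly (up to floating point).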
The details of decomposing calculations with computer instructions get pretty complicated pretty fast. We’re not going to code it by hand. Instead, we’re going to use a package for that: ForwardDiff.jl
ForwardDiff.jl: basics

There are four basic methods we are interested in:
- derivative(f, x): derivative of univariate scalar function f at point x
- gradient(f, x): partial derivatives of multivariate scalar function f
- jacobian(f, x): Jacobian matrix of multivariate vector function f (f MUST return a vector, even if it's one-dimensional)
- hessian(f, x): Hessian matrix of multivariate scalar function f

ForwardDiff.jl: scalars and gradients

Calculating scalar derivatives of univariate functions is easy:
The same goes for gradients:
2-element Vector{Float64}:
8.0
4.0
Note: ForwardDiff.jl doesn’t implement higher-order derivatives. But you can easily calculate them by composition: define derivatives as functions and apply derivatives on them.
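A sketch of that composition (assumes ForwardDiff is loaded; f, df, and d2f are illustrative names):

```julia
using ForwardDiff

f(x) = sin(x)
df(x) = ForwardDiff.derivative(f, x)    # first derivative as a function
d2f(x) = ForwardDiff.derivative(df, x)  # second derivative by composing
```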
ForwardDiff.jl: Jacobians

For Jacobians, you need to be careful and make sure your function always returns an array, even if it's one-dimensional
1-element Vector{Float64}:
6.0
Jacobians of “true” vector functions follow the same procedure
ForwardDiff.jl: Hessians

Hessians are also easy to obtain:
| Factor | Finite Differences | Automatic Differentiation |
|---|---|---|
| Accuracy | - Less accurate and sensitive to step size choice, especially for higher derivatives. | - Highly accurate for all orders of derivatives. |
| Robustness | - Error prone due to numerical instability. | - More robust against numerical issues. |
| Efficiency | - Generally faster, but efficiency decreases with higher orders. | - Generally slower, but efficiency maintained for higher derivatives. |
| Memory Usage | - Lower memory usage in general. | - Can be memory-intensive due to graph storage. |
| Scope | - Works fine with “black box” functions. | - Must trace the function’s internal operations, so it may fail on black-box or external code. |
Analytical differentiation in Julia
Symbolics.jl
Latexify.jl to generate LaTeX expressions for us

Symbolics.jl: defining symbols

Let's work out a simple example of a profit function with isoelastic inverse demand \(p(q) = \alpha q^{-1/\epsilon}\) and quadratic cost \(c(q) = \kappa \frac{1}{2}q^2\)
\[ \pi(q) = q p(q) - c(q) = q [\alpha q^{-1/\epsilon}] - \kappa \frac{1}{2}q^2 \]
Symbolics.jl: defining symbols

Then, define the symbol for \(\pi\) as a function of other symbols
Symbolics.jl: taking derivatives

\[ \begin{equation} \frac{ - q^{-1 + \frac{-1}{\epsilon}} ~ q ~ \alpha}{\epsilon} - q ~ \kappa + q^{\frac{-1}{\epsilon}} ~ \alpha \end{equation} \]
Symbolics.jl: taking Jacobians

Now let's expand this to a duopoly problem:
\[ \begin{gather} \pi_1(q_1, q_2) = q_1 p(q_1 + q_2) - c(q_1) = q_1 [\alpha (q_1+q_2)^{-1/\epsilon}] - \kappa_1 \frac{1}{2}q_1^2 \\ \pi_2(q_1, q_2) = q_2 p(q_1 + q_2) - c(q_2) = q_2 [\alpha (q_1+q_2)^{-1/\epsilon}] - \kappa_2 \frac{1}{2}q_2^2 \end{gather} \]
\[ \begin{equation} - 0.5 ~ \mathtt{q\_1}^{2} ~ \mathtt{\kappa\_1} + \left( \mathtt{q\_1} + \mathtt{q\_2} \right)^{\frac{-1}{\epsilon}} ~ \mathtt{q\_1} ~ \alpha \end{equation} \]
Symbolics.jl: taking Jacobians

We can obtain the Jacobian matrix of the stacked profit functions with
\[ \begin{equation} \left[ \begin{array}{cc} \frac{ - \left( \mathtt{q\_1} + \mathtt{q\_2} \right)^{-1 + \frac{-1}{\epsilon}} ~ \mathtt{q\_1} ~ \alpha}{\epsilon} - \mathtt{q\_1} ~ \mathtt{\kappa\_1} + \left( \mathtt{q\_1} + \mathtt{q\_2} \right)^{\frac{-1}{\epsilon}} ~ \alpha & \frac{ - \left( \mathtt{q\_1} + \mathtt{q\_2} \right)^{-1 + \frac{-1}{\epsilon}} ~ \mathtt{q\_1} ~ \alpha}{\epsilon} \\ \frac{ - \left( \mathtt{q\_1} + \mathtt{q\_2} \right)^{-1 + \frac{-1}{\epsilon}} ~ \mathtt{q\_2} ~ \alpha}{\epsilon} & \frac{ - \left( \mathtt{q\_1} + \mathtt{q\_2} \right)^{-1 + \frac{-1}{\epsilon}} ~ \mathtt{q\_2} ~ \alpha}{\epsilon} - \mathtt{q\_2} ~ \mathtt{\kappa\_2} + \left( \mathtt{q\_1} + \mathtt{q\_2} \right)^{\frac{-1}{\epsilon}} ~ \alpha \\ \end{array} \right] \end{equation} \]
Symbolics.jl: turning symbolics into numerics

This package also offers a handy way to turn those expressions into actual Julia functions!
But first we need to assign the global values for our constant parameters
Symbolics.jl: turning symbolics into numerics

Testing some values
Symbolics.jl: turning symbolic derivatives into numerical ones

Now, we are ready to create a Julia function for the Jacobian
The build_function function actually gives us two options in this case.
Symbolics.jl: turning symbolic derivatives into numerical ones

Next, we use a wrapper function to map vector q into symbolic variables
2×2 Matrix{Float64}:
2.3179 -1.10813
-1.10813 2.2779