Code
function hello()
    print("hello, world\n")
end;
hello()
hello, world
Numerical arithmetic
(or How do computers work?)
Knowing the basics of how computers perform the instructions we give them is essential to writing efficient code
function hello()
    print("hello, world\n")
end;
hello()
hello, world
hello is compiled: another program reads our human language and converts it to machine language (0s and 1s), turning a sequence of bits that can be represented as characters on a display into another sequence of bits that represents instructions the CPU can understand. . .
We call hello() in the Julia REPL. Our input prompts Julia and the Operating System (OS) to prepare those instructions for execution. . .
. . .
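As an aside (not from the slides), Julia lets you peek at this translation: it can print the actual CPU instructions it compiles for a function.

```julia
using InteractiveUtils   # provides @code_native outside the REPL

# A trivial function, and the machine instructions Julia generates for it.
add(a, b) = a + b
@code_native add(1, 2)   # prints the compiled CPU assembly for this method
```

Running this shows that even one-line Julia code becomes a sequence of low-level instructions the CPU executes.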
Moral of the story: the computer spends A LOT of time just moving information around and very little time actually doing “real work!”
What to do?
. . .
Efficient computation needs to
. . .
. . .
. . .
Physics law: bigger memory = slower memory
But the main memory is too big and therefore too slow. The CPU actually has its own internal layers of fast memory of different sizes: the caches
Now that you know about the memory hierarchy, you can use that knowledge to write more efficient, cache-friendly code. That means:
. . .
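As a sketch of what "cache-friendly" means in practice (the function names are just for illustration): Julia stores matrices column-major, so an inner loop that walks down a column touches contiguous memory, while one that walks across a row jumps around and defeats the cache.

```julia
# Julia matrices are column-major: entries of one column sit next to
# each other in memory.

function sum_by_columns(A)
    s = 0.0
    for j in axes(A, 2), i in axes(A, 1)   # inner loop walks contiguous memory
        s += A[i, j]
    end
    return s
end

function sum_by_rows(A)
    s = 0.0
    for i in axes(A, 1), j in axes(A, 2)   # inner loop jumps across memory
        s += A[i, j]
    end
    return s
end
```

Both functions compute the same sum, but on a large matrix (try `@time` on a 10_000×10_000 one) the column-order version is typically several times faster, purely because of the memory hierarchy.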
Simple arithmetic: (10^{-20} + 1) - 1 = 10^{-20} + (1 - 1), right?
. . .
Let’s check:
x = (1e-20 + 1) - 1;
y = (1e-20) + (1 - 1);
x == y
false
What happened?!
Welcome to the world of finite precision
x
0.0
y
1.0e-20
. . .
Numbers are an ideal/abstract concept that works perfectly in analytic operations
Computers are a physical machine with limited resources: they can’t have infinite precision
. . .
Understanding this limitation is crucial in numerical analysis
Q: How are numbers physically represented by a computer?
. . .
A: In a binary (or Base 2) system. This means each digit can only take on 0 or 1
. . .
The system we’re most used to is Base 10: each digit can take on 0-9
. . .
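A small illustration: Julia's string function can write the same integer in either base.

```julia
# The integer ten, written in base 10 and in base 2.
println(string(10))           # "10":   1*10 + 0*1
println(string(10, base=2))   # "1010": 1*8 + 0*4 + 1*2 + 0*1
```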
Q: So which numbers can be represented by a computer?
. . .
A: A subset of the rational numbers
. . .
But computers have finite memory and hard disk space, while there are infinitely many rational numbers
. . .
This imposes a strict limitation on the storage of numbers
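For example (a sketch using Julia's exact Rational conversion): the literal 0.1 cannot be stored exactly; what the machine keeps is the nearest rational with a power-of-two denominator.

```julia
# Convert the Float64 value 0.1 exactly to a fraction:
# the stored number is not 1//10.
exact = Rational{BigInt}(0.1)
println(exact)            # 3602879701896397//36028797018963968
println(exact == 1//10)   # false
```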
Numbers are stored as: \pm m \times b^{\pm n}
All three of m (the mantissa), b (the base), and n (the exponent) are integers
. . .
The size of numbers a computer can represent is limited by how much space is allocated for a number
. . .
Space allocations are usually 64 bits: 1 for the sign, 11 for n, and 52 for m (an implicit leading bit gives 53 effective bits of precision)
. . .
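Julia has built-in functions to inspect these pieces of a stored number (here b = 2; note that significand returns m normalized to [1, 2) rather than as a raw integer):

```julia
# Decompose a Float64 into its stored parts.
x = 5.0
println(bitstring(x))     # all 64 bits: 1 sign, 11 exponent, 52 mantissa
println(significand(x))   # 1.25: the mantissa, normalized to [1, 2)
println(exponent(x))      # 2:    since 5.0 == 1.25 * 2^2
```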
We’ve seen these types before
println(typeof(5.0))
println(typeof(5))
Float64
Int64
Int64 means it is an integer with 64 bits of storage
Float64 means it is a floating point number with 64 bits of storage
These two types take the same space in memory (64 bits) but are interpreted differently by the processor
Limitations on storage suggest three facts
. . .
Fact 1: There exists a machine epsilon: the smallest relative quantity representable by a computer
. . .
Machine epsilon is the smallest \epsilon such that a machine can always distinguish N+\epsilon > N > N-\epsilon
println("Machine epsilon ϵ is $(eps(Float64))")
println("Is 1 + ϵ/2 > 1? $(1.0 + eps(Float64)/2 > 1.0)")
Machine epsilon ϵ is 2.220446049250313e-16
Is 1 + ϵ/2 > 1? false
The function eps gives the distance between 1.0 and the next representable Float64
println("The smallest representable number larger than 1.0 is $(nextfloat(1.0))")
println("The largest representable number smaller than 1.0 is $(prevfloat(1.0))")
The smallest representable number larger than 1.0 is 1.0000000000000002
The largest representable number smaller than 1.0 is 0.9999999999999999
The machine epsilon changes depending on the amount of storage allocated
. . .
println("32 bit machine epsilon is $(eps(Float32))")
println("Is 1 + ϵ/2 > 1? $(Float32(1) + eps(Float32)/2 > 1)")
32 bit machine epsilon is 1.1920929e-7
Is 1 + ϵ/2 > 1? false
. . .
There is a trade-off between precision and storage requirements
Fact 2: There is a smallest representable positive (normalized) number
println("64 bit smallest float is $(floatmin(Float64))")
println("32 bit smallest float is $(floatmin(Float32))")
println("16 bit smallest float is $(floatmin(Float16))")
64 bit smallest float is 2.2250738585072014e-308
32 bit smallest float is 1.1754944e-38
16 bit smallest float is 6.104e-5
Fact 3: There is a largest representable number
println("64 bit largest float is $(floatmax(Float64))")
println("32 bit largest float is $(floatmax(Float32))")
println("16 bit largest float is $(floatmax(Float16))")
64 bit largest float is 1.7976931348623157e308
32 bit largest float is 3.4028235e38
16 bit largest float is 6.55e4
Q: If eps() is of the order of 10^{-16}, why is floatmin() of the order of 10^{-308}?
. . .
A: Because it’s all about the relative scale of numbers. eps() in particular is about the relative distance from 1.0\times 10^0, so it’s a good reference point. But what matters is the distance in the exponents! Let’s divide everything by 100 and check:
println("Is 1/100 + ϵ/100 > 1/100? $(1.0/100 + eps(Float64)/100 > 1.0/100)")
Is 1/100 + ϵ/100 > 1/100? true
println("Because the next representable number from 1/100 is $(nextfloat(1.0/100.0)),
so the machine epsilon at that scale is $(eps(1.0/100.0))")
Because the next representable number from 1/100 is 0.010000000000000002,
so the machine epsilon at that scale is 1.734723475976807e-18
And it works all the way down to the smallest representable number
println("The minimum float is $(floatmin(Float64)), and the next float is $(nextfloat(floatmin(Float64))),
so their difference $(nextfloat(floatmin(Float64)) - floatmin(Float64))
is equal to epsilon at that scale $(eps(floatmin(Float64)))")
The minimum float is 2.2250738585072014e-308, and the next float is 2.225073858507202e-308,
so their difference 5.0e-324
is equal to epsilon at that scale 5.0e-324
. . .
So we will have problems if we mix scales: a step sized by the epsilon at scale 1.0 completely swallows numbers near floatmin
println("Is floatmin + ϵ(1.0) = ϵ(1.0)? $(floatmin(Float64) + eps(1.0) == eps(1.0))")
Is floatmin + ϵ(1.0) = ϵ(1.0)? true
Integers have limits too
println("The largest 64 bit integer is $(typemax(Int64))")
println("Add one to it and we get: $(typemax(Int64)+1)")
println("It loops us around the number line: $(typemin(Int64))")
The largest 64 bit integer is 9223372036854775807
Add one to it and we get: -9223372036854775808
It loops us around the number line: -9223372036854775808
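If silently wrapping around the number line is dangerous for your application, Julia's Base.Checked functions (a sketch, not covered in the slides) raise an error instead of wrapping:

```julia
using Base.Checked: checked_add

println(checked_add(2, 3))   # ordinary addition: 5
try
    checked_add(typemax(Int64), 1)   # would wrap; throws OverflowError instead
catch err
    println(err isa OverflowError)   # true
end
```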
The scale of your problem matters
. . .
If a parameter or variable is > floatmax or < floatmin, you will have a very bad time
. . .
Scale numbers appropriately (e.g. millions of dollars, not millionths of cents)
We can only represent a finite number of numbers
. . .
This means we will have error in our computations
. . .
Error comes in two major forms: rounding and truncation
Rounding: we will always need to round numbers to the nearest computer-representable number. This introduces error
println("Half of π is: $(π/2)")
Half of π is: 1.5707963267948966
. . .
The computer gave us a rational number, but \pi/2 should be irrational
Truncation: lots of important numbers are defined by infinite sums, e.g. e^x = \sum_{n=0}^\infty \frac{x^n}{n!}
. . .
It turns out that computers can’t add up infinitely many terms because there is finite space \rightarrow we need to truncate the sum
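A sketch of such a truncation (the function name and term counts are just for illustration): sum only the first k terms of the series for e^x and compare with Julia's built-in exp.

```julia
# Approximate e^x by the first k terms of its infinite series.
function exp_taylor(x, k)
    term = 1.0    # the n = 0 term: x^0 / 0!
    total = 1.0
    for n in 1:(k - 1)
        term *= x / n    # x^n/n! built incrementally from the previous term
        total += term
    end
    return total
end

println(exp_taylor(1.0, 5))    # only 5 terms: noticeably off
println(exp_taylor(1.0, 20))   # 20 terms: very close
println(exp(1.0))              # the built-in reference value
```

With enough terms the truncation error shrinks below rounding error, so adding more terms stops helping.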
Errors are small, who cares?
. . .
You should!
Because errors can propagate and grow as you keep applying an algorithm (e.g. function iteration)
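A sketch of such growth under iteration (using the logistic map, a standard chaotic example; the function name is illustrative): start two trajectories that differ by roughly machine epsilon and watch the gap between them.

```julia
# Iterate x -> 4x(1-x) from two starting points differing by δ
# and return the gap between the trajectories after `iters` steps.
function trajectory_gap(x0, δ, iters)
    f(x) = 4x * (1 - x)   # logistic map; errors roughly double each step
    a, b = x0, x0 + δ
    for _ in 1:iters
        a, b = f(a), f(b)
    end
    return abs(a - b)
end

println(trajectory_gap(0.3, 1e-15, 0))    # initial gap: about 1e-15
println(trajectory_gap(0.3, 1e-15, 60))   # after 60 steps: typically order 1
```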
Let’s get back to the example we saw earlier
x = (1e-20 + 1) - 1
y = 1e-20 + (1 - 1)
println("The difference is: $(x-y).")
The difference is: -1.0e-20.
Why did we get y > x ?
. . .
. . .
Adding numbers of greatly different magnitudes does not always work like you would want
Consider a simple quadratic equation x^2-26x+1=0 with solution x = 13-\sqrt{168}
. . .
println("64 bit: 13 - √168 = $(13-sqrt(168))")
println("32 bit: 13 - √168 = $(convert(Float32,13-sqrt(168)))")
println("16 bit: 13 - √168 = $(convert(Float16,13-sqrt(168)))")
64 bit: 13 - √168 = 0.03851860318427924
32 bit: 13 - √168 = 0.038518604
16 bit: 13 - √168 = 0.0385
Let’s check whether they solve the equation
. . .
x64 = 13-sqrt(168); x32 = convert(Float32,13-sqrt(168)); x16 = convert(Float16,13-sqrt(168));
f(x) = x^2 - 26x + 1;
println("64 bit: $(f(x64))")
println("32 bit: $(f(x32))")
println("16 bit: $(f(x16))")
64 bit: 7.66053886991358e-15
32 bit: 0.0
16 bit: 0.0004883
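A standard remedy (not from the slides) is to rationalize: algebraically, 13 - √168 = 1/(13 + √168), but the right-hand side adds instead of subtracting two nearly equal numbers, so it avoids the cancellation. The function names below are illustrative.

```julia
# Two algebraically identical formulas for the root x = 13 - sqrt(168).
naive(T)  = T(13) - sqrt(T(168))             # subtracts nearly equal numbers
stable(T) = one(T) / (T(13) + sqrt(T(168)))  # no cancellation

println(naive(Float16))    # loses most of its correct digits
println(stable(Float16))   # much closer to 0.03851860318427924
```

At 16 bits the cancellation-free version is dramatically more accurate, even though the two formulas are equal in exact arithmetic.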
Let’s just subtract two numbers: 100000.2 - 100000.1
We know the answer is 0.1
Try it!
Let's compute the difference and check (with an if clause) whether it is equal to 0.1
println("100000.2 - 100000.1 is: $(100000.2 - 100000.1)")
if (100000.2 - 100000.1) == 0.1
    println("and it is equal to 0.1")
else
    println("and it is not equal to 0.1")
end
100000.2 - 100000.1 is: 0.09999999999126885
and it is not equal to 0.1
Why do we get this error?
. . .
Neither of the two numbers can be precisely represented by the machine!
100000.1 \approx 6871954545547674\times 2^{-36} = 100000.100000000005820766091
100000.2 \approx 6871961417495347\times 2^{-36} = 100000.199999999997089616954
So their difference won’t necessarily be 0.1
There are tools for approximate equality. Remember the \approx operator we saw last lecture? You can use it
(100000.2 - 100000.1) ≈ 0.1 # You type \approx then hit TAB
true
This is equivalent to:
isapprox(100000.2 - 100000.1, 0.1)
true
This matters, particularly when you’re trying to evaluate logical expressions of equality