Code
function hello()
    print("hello, world\n")
end;
hello()
hello, world
Numerical arithmetic
(or How do computers work?)
Knowing the basics of how computers perform the instructions we give them is essential to writing efficient code
function hello()
    print("hello, world\n")
end;
hello()
hello, world
hello is compiled: another program reads our human language and converts it to machine language (0s and 1s), turning a sequence of bits that can be represented as characters on a display into another sequence of bits that represents instructions the CPU can understand. . .
We call hello() in the Julia REPL. Our input prompts Julia and the Operating System (OS) to prepare those instructions for execution. . .
. . .
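As an aside (not from the slides), Julia lets you peek at this translation: it can print the actual CPU instructions it compiles for a function.

```julia
using InteractiveUtils   # provides @code_native outside the REPL

# A trivial function, and the machine instructions Julia generates for it.
add(a, b) = a + b
@code_native add(1, 2)   # prints the compiled CPU assembly for this method
```

Running this shows that even one-line Julia code becomes a sequence of low-level instructions the CPU executes.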
Moral of the story: the computer spends A LOT of time just moving information around and very little time actually doing “real work!”
What to do?
. . .
Efficient computation needs to
. . .
. . .
. . .
Physics law: bigger memory = slower memory
But the main memory is too big and therefore too slow. The CPU actually has its own internal layers of fast memory of different sizes: the caches
Now that you know about the memory hierarchy, you can use that knowledge to write more efficient, cache-friendly code. That means:
. . .
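As a sketch of what "cache-friendly" means in practice (the function names are just for illustration): Julia stores matrices column-major, so an inner loop that walks down a column touches contiguous memory, while one that walks across a row jumps around and defeats the cache.

```julia
# Julia matrices are column-major: entries of one column sit next to
# each other in memory.

function sum_by_columns(A)
    s = 0.0
    for j in axes(A, 2), i in axes(A, 1)   # inner loop walks contiguous memory
        s += A[i, j]
    end
    return s
end

function sum_by_rows(A)
    s = 0.0
    for i in axes(A, 1), j in axes(A, 2)   # inner loop jumps across memory
        s += A[i, j]
    end
    return s
end
```

Both functions compute the same sum, but on a large matrix (try `@time` on a 10_000×10_000 one) the column-order version is typically several times faster, purely because of the memory hierarchy.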
Simple arithmetic: (10^{-20} + 1) - 1 = 10^{-20} + (1 - 1), right?
. . .
Let’s check:
x = (1e-20 + 1) - 1;
y = (1e-20) + (1 - 1);
x == y
false
What happened?!
Welcome to the world of finite precision
x
0.0
y
1.0e-20
. . .
Numbers are an ideal/abstract concept that works perfectly in analytic operations
Computers are a physical machine with limited resources: they can’t have infinite precision
. . .
Understanding this limitation is crucial in numerical analysis
Q: How are numbers physically represented by a computer?
. . .
A: In a binary (or Base 2) system. This means each digit can only take on 0 or 1
. . .
The system we’re most used to is Base 10: each digit can take on 0-9
. . .
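A small illustration: Julia's string function can write the same integer in either base.

```julia
# The integer ten, written in base 10 and in base 2.
println(string(10))           # "10":   1*10 + 0*1
println(string(10, base=2))   # "1010": 1*8 + 0*4 + 1*2 + 0*1
```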
Q: So which numbers can be represented by a computer?
. . .
A: A subset of the rational numbers
. . .
But computers have finite memory and hard disk space, while there are infinitely many rational numbers
. . .
This imposes a strict limitation on the storage of numbers
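For example (a sketch using Julia's exact Rational conversion): the literal 0.1 cannot be stored exactly; what the machine keeps is the nearest rational with a power-of-two denominator.

```julia
# Convert the Float64 value 0.1 exactly to a fraction:
# the stored number is not 1//10.
exact = Rational{BigInt}(0.1)
println(exact)            # 3602879701896397//36028797018963968
println(exact == 1//10)   # false
```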
Numbers are stored as: \pm m \times b^{\pm n}
All three of m (the mantissa), b (the base), and n (the exponent) are integers
. . .
The size of numbers a computer can represent is limited by how much space is allocated for a number
. . .
Space allocations are usually 64 bits: 1 for the sign, 11 for n, and 52 for m (an implicit leading bit gives 53 effective bits of precision)
. . .
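Julia has built-in functions to inspect these pieces of a stored number (here b = 2; note that significand returns m normalized to [1, 2) rather than as a raw integer):

```julia
# Decompose a Float64 into its stored parts.
x = 5.0
println(bitstring(x))     # all 64 bits: 1 sign, 11 exponent, 52 mantissa
println(significand(x))   # 1.25: the mantissa, normalized to [1, 2)
println(exponent(x))      # 2:    since 5.0 == 1.25 * 2^2
```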
We’ve seen these types before
println(typeof(5.0))
println(typeof(5))
Float64
Int64
Int64 means it is an integer with 64 bits of storage
Float64 means it is a floating point number with 64 bits of storage
These two types take the same space in memory (64 bits) but are interpreted differently by the processor
Limitations on storage suggest three facts
. . .
Fact 1: There exists a machine epsilon: the smallest relative quantity representable by a computer
. . .
Machine epsilon is the smallest \epsilon such that a machine can always distinguish N+\epsilon > N > N-\epsilon
println("Machine epsilon ϵ is $(eps(Float64))")
println("Is 1 + ϵ/2 > 1? $(1.0 + eps(Float64)/2 > 1.0)")
Machine epsilon ϵ is 2.220446049250313e-16
Is 1 + ϵ/2 > 1? false
The function eps gives the distance between 1.0 and the next representable Float64
println("The smallest representable number larger than 1.0 is $(nextfloat(1.0))")
println("The largest representable number smaller than 1.0 is $(prevfloat(1.0))")
The smallest representable number larger than 1.0 is 1.0000000000000002
The largest representable number smaller than 1.0 is 0.9999999999999999
The machine epsilon changes depending on the amount of storage allocated
. . .
println("32 bit machine epsilon is $(eps(Float32))")
println("Is 1 + ϵ/2 > 1? $(Float32(1) + eps(Float32)/2 > 1)")
32 bit machine epsilon is 1.1920929e-7
Is 1 + ϵ/2 > 1? false
. . .
There is a trade-off between precision and storage requirements
Fact 2: There is a smallest representable positive (normalized) number
println("64 bit smallest float is $(floatmin(Float64))")
println("32 bit smallest float is $(floatmin(Float32))")
println("16 bit smallest float is $(floatmin(Float16))")
64 bit smallest float is 2.2250738585072014e-308
32 bit smallest float is 1.1754944e-38
16 bit smallest float is 6.104e-5
Fact 3: There is a largest representable number
println("64 bit largest float is $(floatmax(Float64))")
println("32 bit largest float is $(floatmax(Float32))")
println("16 bit largest float is $(floatmax(Float16))")
64 bit largest float is 1.7976931348623157e308
32 bit largest float is 3.4028235e38
16 bit largest float is 6.55e4
Q: If eps() is of the order of 10^{-16}, why is floatmin() of the order of 10^{-308}?
. . .
A: Because it’s all about the relative scale of numbers. eps() in particular is about the relative distance from 1.0\times 10^0, so it’s a good reference point. But what matters is the distance in the exponents! Let’s divide everything by 100 and check:
println("Is 1/100 + ϵ/100 > 1/100? $(1.0/100 + eps(Float64)/100 > 1.0/100)")
Is 1/100 + ϵ/100 > 1/100? true
println("Because the next representable number from 1/100 is $(nextfloat(1.0/100.0)),
so the machine epsilon at that scale is $(eps(1.0/100.0))")
Because the next representable number from 1/100 is 0.010000000000000002,
so the machine epsilon at that scale is 1.734723475976807e-18
And it works all the way down to the smallest representable number
println("The minimum float is $(floatmin(Float64)), and the next float is $(nextfloat(floatmin(Float64))),
so their difference $(nextfloat(floatmin(Float64)) - floatmin(Float64))
is equal to epsilon at that scale $(eps(floatmin(Float64)))")
The minimum float is 2.2250738585072014e-308, and the next float is 2.225073858507202e-308,
so their difference 5.0e-324
is equal to epsilon at that scale 5.0e-324
. . .
So we will have problems if we mix scales: a step sized by the epsilon at scale 1.0 completely swallows numbers near floatmin
println("Is floatmin + ϵ(1.0) = ϵ(1.0)? $(floatmin(Float64) + eps(1.0) == eps(1.0))")
Is floatmin + ϵ(1.0) = ϵ(1.0)? true
Integers have limits too
println("The largest 64 bit integer is $(typemax(Int64))")
println("Add one to it and we get: $(typemax(Int64)+1)")
println("It loops us around the number line: $(typemin(Int64))")
The largest 64 bit integer is 9223372036854775807
Add one to it and we get: -9223372036854775808
It loops us around the number line: -9223372036854775808
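If silently wrapping around the number line is dangerous for your application, Julia's Base.Checked functions (a sketch, not covered in the slides) raise an error instead of wrapping:

```julia
using Base.Checked: checked_add

println(checked_add(2, 3))   # ordinary addition: 5
try
    checked_add(typemax(Int64), 1)   # would wrap; throws OverflowError instead
catch err
    println(err isa OverflowError)   # true
end
```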
The scale of your problem matters
. . .
If a parameter or variable is > floatmax or < floatmin, you will have a very bad time
. . .
Scale numbers appropriately (e.g. millions of dollars, not millionths of cents)
We can only represent a finite number of numbers
. . .
This means we will have error in our computations
. . .
Error comes in two major forms: rounding and truncation
Rounding: we will always need to round numbers to the nearest computer-representable number. This introduces error
println("Half of π is: $(π/2)")
Half of π is: 1.5707963267948966
. . .
The computer gave us a rational number, but \pi/2 should be irrational
Truncation: lots of important numbers are defined by infinite sums, e.g. e^x = \sum_{n=0}^\infty \frac{x^n}{n!}
. . .
It turns out that computers can’t add up infinitely many terms because there is finite space \rightarrow we need to truncate the sum
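A sketch of such a truncation (the function name and term counts are just for illustration): sum only the first k terms of the series for e^x and compare with Julia's built-in exp.

```julia
# Approximate e^x by the first k terms of its infinite series.
function exp_taylor(x, k)
    term = 1.0    # the n = 0 term: x^0 / 0!
    total = 1.0
    for n in 1:(k - 1)
        term *= x / n    # x^n/n! built incrementally from the previous term
        total += term
    end
    return total
end

println(exp_taylor(1.0, 5))    # only 5 terms: noticeably off
println(exp_taylor(1.0, 20))   # 20 terms: very close
println(exp(1.0))              # the built-in reference value
```

With enough terms the truncation error shrinks below rounding error, so adding more terms stops helping.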
Errors are small, who cares?
. . .
You should!
Because errors can propagate and grow as you keep applying an algorithm (e.g. function iteration)
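A sketch of such growth under iteration (using the logistic map, a standard chaotic example; the function name is illustrative): start two trajectories that differ by roughly machine epsilon and watch the gap between them.

```julia
# Iterate x -> 4x(1-x) from two starting points differing by δ
# and return the gap between the trajectories after `iters` steps.
function trajectory_gap(x0, δ, iters)
    f(x) = 4x * (1 - x)   # logistic map; errors roughly double each step
    a, b = x0, x0 + δ
    for _ in 1:iters
        a, b = f(a), f(b)
    end
    return abs(a - b)
end

println(trajectory_gap(0.3, 1e-15, 0))    # initial gap: about 1e-15
println(trajectory_gap(0.3, 1e-15, 60))   # after 60 steps: typically order 1
```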
Let’s get back to the example we saw earlier
x = (1e-20 + 1) - 1
y = 1e-20 + (1 - 1)
println("The difference is: $(x-y).")
The difference is: -1.0e-20.
Why did we get y > x ?
. . .
. . .
Adding numbers of greatly different magnitudes does not always work like you would want
Consider a simple quadratic equation x^2-26x+1=0 with solution x = 13-\sqrt{168}
. . .
println("64 bit: 13 - √168 = $(13-sqrt(168))")
println("32 bit: 13 - √168 = $(convert(Float32,13-sqrt(168)))")
println("16 bit: 13 - √168 = $(convert(Float16,13-sqrt(168)))")
64 bit: 13 - √168 = 0.03851860318427924
32 bit: 13 - √168 = 0.038518604
16 bit: 13 - √168 = 0.0385
Let’s check whether they solve the equation
. . .
x64 = 13-sqrt(168); x32 = convert(Float32,13-sqrt(168)); x16 = convert(Float16,13-sqrt(168));
f(x) = x^2 - 26x + 1;
println("64 bit: $(f(x64))")
println("32 bit: $(f(x32))")
println("16 bit: $(f(x16))")
64 bit: 7.66053886991358e-15
32 bit: 0.0
16 bit: 0.0004883
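A standard remedy (not from the slides) is to rationalize: algebraically, 13 - √168 = 1/(13 + √168), but the right-hand side adds instead of subtracting two nearly equal numbers, so it avoids the cancellation. The function names below are illustrative.

```julia
# Two algebraically identical formulas for the root x = 13 - sqrt(168).
naive(T)  = T(13) - sqrt(T(168))             # subtracts nearly equal numbers
stable(T) = one(T) / (T(13) + sqrt(T(168)))  # no cancellation

println(naive(Float16))    # loses most of its correct digits
println(stable(Float16))   # much closer to 0.03851860318427924
```

At 16 bits the cancellation-free version is dramatically more accurate, even though the two formulas are equal in exact arithmetic.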
Let’s just subtract two numbers: 100000.2 - 100000.1
We know the answer is 0.1
Try it!
Let's compute the difference and check (with an if clause) whether it is equal to 0.1
println("100000.2 - 100000.1 is: $(100000.2 - 100000.1)")
if (100000.2 - 100000.1) == 0.1
    println("and it is equal to 0.1")
else
    println("and it is not equal to 0.1")
end
100000.2 - 100000.1 is: 0.09999999999126885
and it is not equal to 0.1
Why do we get this error?
. . .
Neither of the two numbers can be precisely represented by the machine!
100000.1 \approx 6871954545547674\times 2^{-36} = 100000.100000000005820766091
100000.2 \approx 6871961417495347\times 2^{-36} = 100000.199999999997089616954
So their difference won’t necessarily be 0.1
There are tools for approximate equality. Remember the \approx operator we saw last lecture? You can use it
(100000.2 - 100000.1) ≈ 0.1 # You type \approx then hit TAB
true
This is equivalent to:
isapprox(100000.2 - 100000.1, 0.1)
true
This matters, particularly when you’re trying to evaluate logical expressions of equality