Math 171 - UTK

Math 371 - Alexiades - UTK

Errors - Finite Precision - Roundoff

errors/approximations are everywhere in problem formulation, data, algorithm, programming, arithmetic, ...

if X is an approximation to x: absolute error = |X − x| , relative error = |X − x|/|x| ( = | 1 − X/x | )

approximation correct to k significant (decimal) digits means: |error| ≤ ½ 10^−k

Machine numbers (floating point numbers)

real numbers are represented in the form: (sign).(Mantissa) 2^Exponent
number of bits for Mantissa determines the presicion,
number of bits for (biased) Exponent determines the range of representable numbers.

Only finitely many floating point numbers exist on any computer!

They are NOT evenly spaced (same # between consecutive powers of 2), so they spread out further apart.

Any real number x must be approximated by a nearest machine number, fl(x), resulting in round-off error.
e.g. For x=8.3456: fl(x)=.83456×10¹. On a 3-digit decimal machine: fl(x)=.835×10=8.35 rel.error 0.05%
NOTE: x=0.1 is NOT a machine number! so fl(0.1) ≠ 0.1 !!!

There is a smallest, s, and largest, L, positive machine number; |x|<s underflows to 0, |x|>L overflows to Inf.
Computations leading to an undefined value (like 0/0, Inf-Inf) result in NaN (Not-a-Number)

NOTE: Calculations are performed internally on CPU registers in extra precision (typically 80 bits), then rounded.
Writing a computed value to a variable forces rounding to machine precision.

IEEE 754 Standard (1985)
Single Precision (SP) : 32 bit word: sign, 8 bits for biased Exponent, 23 bits for Mantissa
Double Precision (DP): 64 bit word: sign, 11 bits for biased Exponent, 53 bits for Mantissa

.	precision	lower expo	upper expo	smallest positive #	largest #	number of machine #s	ε_mach	# of reliable decimal digits
SP	23 bits	−126	+127	2⁻¹²⁶ ≈ 1.2 × 10⁻³⁸	≈ 3.4 × 10⁺³⁸	≈ 4 × 10⁹	2⁻²⁴ ≈ 6 × 10⁻⁸	7
DP	53 bits	−1022	+1023	2⁻¹⁰²² ≈ 2.2 × 10⁻³⁰⁸	≈ 1.8 × 10⁺³⁰⁸	≈ 2 × 10¹⁹	2⁻⁵³ ≈ 1.1 × 10⁻¹⁶	15

Machine epsilon ε_mach

ε_mach is the smallest positive number such that 1+ε_mach > 1 . = distance from 1 to nearest machine number.

ε_mach = 2^−p , p=precision = # of bits in mantissa. In double precision, p=53, so ε_mach = 2⁻⁵³ ≈ 10⁻¹⁵ .

For any real x, fl(x) = x(1 + ε), so ε = (fl(x)−x)/x = relative error in representing x by fl(x), with ε ≤ ε_mach .

Thus, the absolute error in representing x by fl(x) is roundoff = fl(x)−x = εx , it is proportional to x ,
so the larger the number the larger the roundoff error!

In Matlab: eps(x) gives the machine number nearest to x for that computer. eps(1) or just eps gives ε_mach

For example: eps(1)=2.2204e-16 , eps(1000)=1.1369e-13 , eps(1.e6)=1.1642e-10 ..., losing more and more accuracy.

Testing: (1+eps == 1) gives 0 (False), (1+eps/2 == 1) gives 1 (True).
So, any (positive) number less than eps is effectively zero in computation!

Important consequencies:
1. Inherent error in representing real numbers (by nearest machine number), called round-off error .
2. Loss of significant digits in arithmetic may occur. Especially dangerous situations are:
• subtractive cancellation: when subtracting nearly equal reals (try to avoid...)
• negligible addition: when adding a very small to a large real number
• unbalanced multiplication: when nultiplying by a very small number (or dividing by a very large number).
3. Relative error in addition or subtraction is ≤ 2ε_mach ; in multiplication or division is ≤ 3ε_mach.
After some millions of these, precision may be lost... but it can be much worse in the cases shown in 2. above.
4. Machine arithmetic is not necessarily associative, i.e. grouping of operations may affect the result!
e.g. if a=0.99ε_mach then (1+a)+a=1+a = 1 but 1+(a+a)=1+2a > 1 !!!
5. Should NEVER ask for equality of computed real numbers, use if( abs(a − b) ≤ TOL ) , for some TOLerance
( with TOL no smaller than 1.e-15, for double precision ).
6. Number of decimal digits that can be trusted: up to 7 in single precision, up to 15 in double precision.
Do not trust what you see: printed values are affected by how they are being printed (format)!

Some strategies for effective computing:

Minimize number of operations.

Use double precision computations (Matlab does it by default).

Guard against loss of digits, (subtraction of nearly equal numbers, etc)
Try to re-write the expression somehow...

Enter very big and very small numbers in scientific notation: e.g. 3.14e6 , 3.14e-6
and print big and small numbers with '%e' formating.

Use integer exponents in powers (use * for low powers: x*x, x*x*x )

Do not round input, enter all digits known to be correct.

Evaluate polynomials in nested form (saves operations and may reduce roundoff):
e.g. ax² + bx + c = c + x*(b+a*x) , e.g. ax³ + bx² + cx + d = d + x*(c+x*(b+a*x)) , etc.

The quadratic formula is subject to loss of significant digits when 0 < 4ac << b² (much less than).
A safe way to find roots of a quadratic: q = −( b + sign(b)√(b²−4ac) ) / 2 , then x₁ = q/a , x₂ = c / q .

Fact of life: Computation (with reals) is always approximate. Errors may come from many sources and many are unavoidable.
Estimation of algorithmic errors (from discretization and roundoff) is crucial, the subject of Numerical Analysis.
More generally, the field of Uncertainty Quantification (UQ) studies how errors affect solutions.