Math 371 - Alexiades - UTK

            Errors - Finite Precision - Roundoff

  • errors/approximations are everywhere   in problem formulation, data, algorithm, programming, arithmetic, ...
  • if X is an approximation to x:   absolute error = |X − x| ,   relative error = |X − x|/|x|   ( = | 1 − X/x | )
  • approximation correct to k significant (decimal) digits   means:   relative error ≤ ½ × 10^(−k)
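
    A quick Matlab check of these definitions (a sketch; using 22/7 as an approximation to pi is just an illustrative choice):

        x = pi;  X = 22/7;                % x = exact value, X = approximation
        abs_err = abs(X - x)              % absolute error |X - x|, about 1.26e-3
        rel_err = abs(X - x)/abs(x)       % relative error, about 4.0e-4
        % rel_err <= 0.5e-3, so 22/7 is correct to about 3 significant digits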

    Machine numbers (floating point numbers)
  • real numbers are represented in the form:   (sign)(Mantissa) × 2^(Exponent)
        number of bits for Mantissa determines the precision,
        number of bits for (biased) Exponent determines the range of representable numbers.
  • Only finitely many floating point numbers exist on any computer!
  • They are NOT evenly spaced (there is the same number of them between consecutive powers of 2), so they spread farther apart as magnitude grows.
  • Any real number x must be approximated by a nearest machine number, fl(x), resulting in round-off error.
      e.g. For x=8.3456: fl(x)=.83456×10^1. On a 3-digit decimal machine: fl(x)=.835×10^1=8.35, rel.error ≈ 0.05%
      NOTE: x=0.1 is NOT a machine number! so fl(0.1) ≠ 0.1 !!!
  • There is a smallest, s, and largest, L, positive machine number; |x|<s  underflows to 0,  |x|>L   overflows to Inf.
      Computations leading to an undefined value (like 0/0, Inf-Inf) result in NaN (Not-a-Number)
  • NOTE: Calculations are performed internally on CPU registers in extra precision (typically 80 bits), then rounded.
           Writing a computed value to a variable forces rounding to machine precision.
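
    A small Matlab illustration of these points (a sketch; the exact printed digits may vary slightly by platform):

        fprintf('%.20f\n', 0.1)     % 0.10000000000000000555... : fl(0.1) is not 0.1
        0.1 + 0.2 == 0.3            % gives 0 (False): both sides carry roundoff
        realmin/2^10                % still nonzero (a subnormal number)
        realmin/2^60                % underflows to 0
        2*realmax                   % overflows to Inf
        Inf - Inf                   % undefined value: NaN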

    IEEE 754 Standard (1985)
    Single  Precision (SP) :  32 bit word:  sign,   8 bits for biased Exponent, 23 bits for Mantissa (24-bit precision with the implicit leading bit)
    Double Precision (DP):  64 bit word:  sign, 11 bits for biased Exponent, 52 bits for Mantissa (53-bit precision with the implicit leading bit)
           precision p   expo range        smallest positive #       largest #        # of machine #s   εmach                # of reliable decimal digits
    SP     24 bits       −126 to +127      2^−126  ≈ 1.2×10^−38      ≈ 3.4×10^+38     ≈ 4×10^9          2^−24 ≈ 6×10^−8       7
    DP     53 bits       −1022 to +1023    2^−1022 ≈ 2.2×10^−308     ≈ 1.8×10^+308    ≈ 2×10^19         2^−53 ≈ 1.1×10^−16   15
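
    The table entries can be checked directly in Matlab (double precision is the default; note Matlab's eps reports the gap from 1 to the next machine number, which is 2εmach in the rounding-unit convention used above):

        realmin                  % 2.2251e-308 = 2^(-1022), smallest positive normal DP number
        realmax                  % 1.7977e+308, largest DP number
        eps                      % 2.2204e-16  = 2^(-52) = 2*2^(-53)
        realmin('single')        % 1.1755e-38  = 2^(-126)
        realmax('single')        % 3.4028e+38
        eps('single')            % 1.1921e-07  = 2^(-23) = 2*2^(-24)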

    Machine epsilon εmach
  • εmach is the smallest positive number such that 1+εmach > 1 in machine arithmetic   = distance from 1 to the nearest machine number.
  • εmach = 2^−p , p = precision = # of bits in the mantissa.   In double precision, p=53, so   εmach = 2^−53 ≈ 1.1×10^−16 .
  • For any real x,   fl(x) = x(1 + ε), so   ε = (fl(x)−x)/x = relative error in representing x by fl(x), with |ε| ≤ εmach .
  • Thus, the absolute error in representing x by fl(x) is   roundoff = fl(x)−x = εx ; it is proportional to x ,
      so the larger the number, the larger the roundoff error!
  • In Matlab: eps(x) gives the gap from |x| to the next larger machine number on that computer. eps(1), or just eps, gives 2^−52 = 2εmach ≈ 2.2204e-16, the gap just above 1.
  • For example: eps(1)=2.2204e-16 , eps(1000)=1.1369e-13 , eps(1.e6)=1.1642e-10 ...: the gaps grow with magnitude, losing more and more absolute accuracy.
  • Testing: (1+eps == 1) gives 0 (False),   (1+eps/2 == 1) gives 1 (True).
            So, adding any (positive) number less than eps to 1 leaves it unchanged: such a number is effectively zero in computation!
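
    The classic way to "discover" machine epsilon is by successive halving; a minimal Matlab sketch:

        e = 1;
        while 1 + e/2 > 1           % keep halving while e/2 still affects 1
            e = e/2;
        end
        e                           % 2.2204e-16 = 2^(-52), same as Matlab's eps
        eps(1000)                   % 1.1369e-13: gap between machine numbers near 1000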

    Important consequences:
    1. Inherent error in representing real numbers (by nearest machine number), called round-off error .
    2. Loss of significant digits in arithmetic may occur. Especially dangerous situations are (see the sketch after this list):
      • subtractive cancellation: when subtracting nearly equal reals (try to avoid...)
      • negligible addition: when adding a very small to a large real number
      • unbalanced multiplication: when multiplying by a very small number (or dividing by a very large number).
    3. Relative error in addition or subtraction is ≤ 2εmach ; in multiplication or division is ≤ 3εmach.
      After some millions of these, precision may be lost... but it can be much worse in the cases shown in 2. above.
    4. Machine arithmetic is not necessarily associative, i.e. grouping of operations may affect the result!
      e.g. if a = 0.99 εmach , then (1+a)+a = 1+a = 1 , but 1+(a+a) = 1+2a > 1 !!!
    5. Should NEVER test for equality of computed real numbers; use if( abs(a − b) ≤ TOL ) , for some TOLerance
      ( with TOL no smaller than 1.e-15, for double precision ).
    6. Number of decimal digits that can be trusted:   up to 7 in single precision, up to 15 in double precision.
      Do not trust what you see: printed values are affected by how they are being printed (format)!
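
    A Matlab sketch of consequences 2 and 4 above (the numbers are chosen just to trigger the effects):

        a = 0.99*eps/2;             % just under the rounding unit 2^(-53)
        (1 + a) + a == 1            % gives 1 (True):  each addition of a is lost
        1 + (a + a) == 1            % gives 0 (False): regrouping changed the result!
        % subtractive cancellation: 1 - cos(x) for small x
        x = 1.e-8;
        1 - cos(x)                  % gives 0: all significant digits cancel
        2*sin(x/2)^2                % ~ 5e-17: same quantity, rewritten to avoid cancellation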

    Some strategies for effective computing:
  • Minimize number of operations.
  • Use double precision computations (Matlab does it by default).
  • Guard against loss of digits (subtraction of nearly equal numbers, etc):
      try to re-write the expression somehow...
  • Enter very big and very small numbers in scientific notation: e.g. 3.14e6 , 3.14e-6
      and print big and small numbers with '%e' formatting.
  • Use integer exponents in powers (use * for low powers: x*x, x*x*x )
  • Do not round input, enter all digits known to be correct.
  • Evaluate polynomials in nested form (saves operations and may reduce roundoff):
    e.g. ax^2 + bx + c = c + x*(b+a*x) ,   e.g. ax^3 + bx^2 + cx + d = d + x*(c+x*(b+a*x)) , etc.
  • The quadratic formula is subject to loss of significant digits when   0 < 4ac << b^2 (much less than).
      A safe way to find roots of a quadratic:   q = −( b + sign(b)√(b^2−4ac) ) / 2 ,   then x1 = q/a ,   x2 = c / q .
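
    A Matlab sketch contrasting the naive and safe formulas (coefficients chosen just to force 0 < 4ac << b^2):

        a = 1;  b = 1.e8;  c = 1;       % true roots: ~ -1.e8 and ~ -1.e-8
        d = sqrt(b^2 - 4*a*c);
        x2_naive = (-b + d)/(2*a)       % ~ -7.45e-9 : badly wrong (subtractive cancellation!)
        q  = -( b + sign(b)*d )/2;
        x1 = q/a                        % -1.0e8  : large root, fine either way
        x2 = c/q                        % ~ -1.0e-8 : small root, now fully accurate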

    Fact of life: Computation (with reals) is always approximate. Errors may come from many sources and many are unavoidable.
        Estimation of algorithmic errors (from discretization and roundoff) is crucial; it is the subject of Numerical Analysis.
        More generally, the field of Uncertainty Quantification (UQ) studies how errors affect solutions.