4-3  FLOATING-POINT NUMBERS - SOME PRACTICAL ISSUES 
 ***************************************************

 The previous chapters discussed briefly the way floating-point
 arithmetic is implemented in practice, In the following chapters
 we will explore the consequences.

 Topics discussed in this chapter:

    Roundoff errors in practice
    Little and Big-Endian machines
    Porting binary files



 Roundoff errors in practice
 ---------------------------
 We have already said that roundoff errors are the result of the 'forced' 
 representation of the continuum of real numbers by a discrete subset.

 Roundoff errors are generated when the internal representation is created,
 and after almost every arithmetical operation performed, they accumulate 
 and are very difficult to estimate directly.

 You can find if roundoff errors are trashing your computation in
 the following ways:

    1) Recomputing with a larger and more exact float type, e.g.
       replace every REAL*4 in the program with REAL*8 or REAL*16, 
       recompute and compare the new results with the former. 
       If there is no significant difference you probably don't have 
       a roundoff problem.

    2) Using a special compiler option, that truncates floating-point
       results after a specified number of bits, with different values.
       Such compiler option exists for CRAY's single precision floats
       on CRAY computers (f77 -Wf-t n where n is in [0,47]).

    3) Repeat the computation several times, each time randomly modify 
       (but in a controlled way) the value of some key variable. 
       See the example program in the chapter on numerical stability.

       The idea is to add/subtract a small random number from the 
       variable under test. Controlled randomization can be applied
       to different parts of the program, and the results of the 
       program can be examined, if they change unexpectedly, then 
       probably a numerical problem does exist.

       The responsible piece of code can be located by applying the
       randomization to different places, and watching the results.

    4) Using an interval/stochastic arithmetic package. These methods 
       compute with each arithmetical operation the maximal error 
       (or an interval the true result must be within), or the most 
       probable error. 

       Using such arithmetical packages slows down the computation, 
       but may be very valuable during the development stage.

       These extended arithmetics are implemented as procedure calls,
       with a special preprocessor or with the function overloading 
       feature of Fortran 90.

    5) Using a special compiler option that changes the rounding mode,
       e.g. instead of rounding towards zero, round towards -infinity.


 You can make roundoff errors smaller if you use a very large type of 
 floating-point numbers, DEC and Sun supports a non-standard REAL*16. 
 REAL*16 computations takes more CPU time than REAL*8 (X 3-13), but the 
 roundoff error is quite small compared with REAL*8. 

 See the chapter on performance measurements for an example program
 that compares CPU time usage of different float types.

 See the chapter on avoiding floating-point errors for other methods
 to minimize roundoff errors.

 However, don't think you can solve all numerical problems that way,
 accuracy depends also on other factors except the size of the individual 
 roundoff errors. For example, unstable algorithms can trash even REAL*16 
 computations because they accumulate and magnify the individual roundoff
 errors (see the chapter on stability).


 Little and Big-Endian machines
 ------------------------------
 Another practical complication arises when you store any multi-byte 
 entity in memory. 

 Computer memory is referenced by addresses that are positive integers. 
 It is 'natural' to store numbers with the LEAST SIGNIFICANT BYTE coming 
 before the MOST SIGNIFICANT BYTE in the computer memory, however 
 computer designers prefer sometimes to use a reversed order version of 
 the representation.

 The 'natural' order, where less significant binary digits comes before 
 more significant digits in memory is called LITTLE-ENDIAN, many vendors 
 like IBM, CRAY and Sun preferred the reverse order that of course is 
 called BIG-ENDIAN.

 Vendors and float types
 -----------------------

  Machine       Endian          Type            Conversion
  ======        ======          ====            ==========
  ALPHA         LITTLE          DEC/IEEE        Several mechanisms
  CRAY          BIG             ----            ---
  HP            ------          ----            ---
  IBM           BIG             IBM             ---
  MAC           BIG             IEEE            ---
  PC            LITTLE          IEEE            ---
  SGI           BIG             IEEE            ---
  SUN           BIG             IEEE            yes
  VAX           LITTLE          DEC             Several mechanisms


 A small example program can check the endian status of your computer, 
 (of course the program would be less horrible if we used the BYTE 
  non-standard data type and EQUIVALENCE): 


      PROGRAM CHKEND
C     ------------------------------------------------------------------
      INTEGER*4
     *              I,
     *              ASCII_0, ASCII_1, ASCII_2, ASCII_3
C     ------------------------------------------------------------------
      PARAMETER(
     *              ASCII_0 =       48,
     *              ASCII_1 =       49,
     *              ASCII_2 =       50,
     *              ASCII_3 =       51)
C     ------------------------------------------------------------------
      COMMON // I
C     ------------------------------------------------------------------
      I = ASCII_0 + ASCII_1*256 + ASCII_2*(256**2) + ASCII_3*(256**3)
      CALL SUB()
C     ------------------------------------------------------------------
      END


      SUBROUTINE SUB()
C     ------------------------------------------------------------------
      CHARACTER
     *              I*4
C     ------------------------------------------------------------------
      COMMON // I
C     ------------------------------------------------------------------
      WRITE(*,*) ' Integer structure: ', I
      WRITE(*,*) ' Byte order:        ', '0123'
      WRITE(*,*) 
C     ------------------------------------------------------------------
      IF (I .EQ. '0123') THEN
        WRITE(*,*) ' Machine is Little-Endian '
        RETURN
      ENDIF
C     ------------------------------------------------------------------
      IF (I .EQ. '3210') THEN
        WRITE(*,*) ' Machine is Big-Endian '
        RETURN
      ENDIF
C     ------------------------------------------------------------------
      WRITE(*,*) ' Mixed endianity machine ... '
C     ------------------------------------------------------------------
      RETURN
      END


 The example program may have to be modified on 64 bit machines where
 INTEGERs may be 8 bytes long.


 Porting binary files
 --------------------
 See the chapter on files and records for a description of the
 internal structure of unformatted files on different systems.

 When you transfer binary (unformatted) files from one machine to 
 another that has different type of internal representation you
 will need to perform some conversion. You will need to convert
 between different floating point representations, different 
 endianity types, and there may also be differences in the file 
 formats used.

 The most simple solution is to avoid transferring binary files and
 use instead ASCII (formatted) files, the price is making the
 transferred files about 2-3 times larger.

 DEC and other vendors supplies some conversion mechanisms between 
 their NATIVE formats and the formats of others.


  +---------------------------------------------------------------------+
  |     SUMMARY                                                         |
  |     =======                                                         |
  |     1) Roundoff errors accumulates; Can be crudely estimated        |
  |     2) Another complication, little/big endian representations      |
  |     3) Porting binary files may be a problem                        |
  +---------------------------------------------------------------------+

Return to contents page