FORTRAN Performance Tuning co-guide

Copyright (C) 1998 Timothy C. Prince

Freely distributable with acknowledgment



References

branch history and prediction schemes: Uht, Sindagi, Somanathan "Branch Effect Reduction Techniques" IEEE Computer May 1997 pp 71-81.

cache prefetch:
Vander Wiel, Lilja "When Caches Aren't Enough: Data Prefetching Techniques" IEEE Computer July '97 pp 23-30.

celefunt: Cody's accuracy test suite for FORTRAN complex math functions, netlib/toms714. Quite useful in its standard form, although not written for extended precision (such as Intel's).

directives: "Visual KAP for OpenMP User's Manual" www.kai.com/vkomp

divide/sqrt hardware techniques:
Soderquist, Leeser "Division and Square Root ..." IEEE Micro July/Aug '97 pp 56-66.

egcs: directories under ftp.cygnus.com and many mirror sites
 
elefunt: Accuracy test suite for FORTRAN math functions. Has some portability problems (it runs, but the results are not right). Translated to C by Plauger and further modified by Prince. Copyright by Plauger, possibly available with permission.

f77/f90 comparison:
Einarsson, Shokin "Fortran 90 for the Fortran 77 Programmer"
http://www.nsc.liu.se/~boein/f77tof90

Computational Science Education Project "Fortran 90 and Computational Science".

f90 tutorial: Metcalf http://wwwcn.cern.ch/asdoc/WWW/f90/
Patrick Corde, Herve Delouis "Cours Fortran 90"  idris.fr

f95 compilers and netlib software: many listed on www.fortran.com/fortran
look for modernized versions of netlib software elsewhere
e.g. http://www.vic.cmis.csiro.au/~alan

f95: FORTRAN 95 Handbook, Adams, Brainerd et al., MIT Press 1997, ISBN 0-262-51096-0.

fused MAC effects etc:
http://http.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps Note that Kahan's quadratic code for fused MAC is not satisfactorily programmable in standard FORTRAN, but can be done reasonably in C.

g77: gnu or egcs mirror sites; CD versions tend to be out of date.

HP PA-8000:
Kumar "The HP PA-8000 RISC CPU" IEEE Micro Mar/Apr '97 pp 27-32.

IEEE P754/854: Cody, IEEE Micro Aug. 1984 pp 84-100.

Intel Pentium Pro: Papworth "Tuning the Pentium Pro.." IEEE Micro April 1996 pp 8-15; Bhandarkar and Ding "Performance Characterization of the Pentium Pro", distributed via the Internet.

latency and instruction level parallelism, Newton and Goldschmidt schemes: Soderquist, Leeser "Division and Square Root ..." IEEE Micro July/Aug '97 pp 56-66.

Alan Miller's site for modernized netlib: http://www.ozemail.com.au/~milleraj

MIPS/SGI R10000: Yeager "The MIPS R10000.." IEEE Micro April 1996 pp 28-40.

pipelining: Smith, Weiss "PowerPC 601 and Alpha 21064..." IEEE Computer, June 1994 pp 46-58.