William N. Celmaster

Modern Fortran Revived as the Language of Scientific Parallel Computing

New features of Fortran are changing the way in which scientists are writing and maintaining large analytic codes. Further, a number of these new features make it easier for compilers to generate highly optimized architecture-specific codes. Among the most exciting kinds of architecture-specific optimizations are those having to do with parallelism. This paper describes Fortran 90 and the standardized language extensions for both shared-memory and distributed-memory parallelism. In particular, three case studies are examined, showing how the distributed-memory extensions (High Performance Fortran) are used both for data parallel algorithms and for single-program multiple-data algorithms.

A Brief History of Fortran

The Fortran (FORmula TRANslating) computer language was the result of a project begun by John Backus at IBM in 1954. The goal of this project was to provide a way for programmers to express mathematical formulas through a formalism that computers could translate into machine instructions. Initially there was a great deal of skepticism about the efficacy of such a scheme. "How," the scientists asked, "would anyone be able to tolerate the inefficiencies that would result from compiled code?" But, as it turned out, the first compilers were surprisingly good, and programmers were able, for the first time, to express mathematics in a high-level computer language.

Fortran has evolved continually over the years in response to the needs of users, particularly in the areas of mathematical expressivity, program maintainability, hardware control (such as I/O), and, of course, code optimizations. In the meantime, other languages such as C and C++ have been designed to better meet the nonmathematical aspects of software design, such as graphical interfaces and complex logical layouts. These languages have caught on and have gradually begun to erode the scientific/engineering Fortran code base. By the 1980s, pronouncements of the "death of Fortran" prompted language designers to propose extensions to Fortran that would incorporate the best features of other high-level languages and, in addition, provide new levels of mathematical expressivity popular on supercomputers such as the CYBER 205 and the CRAY systems. This language became standardized as Fortran 90 (ISO/IEC 1539:1991; ANSI X3.198-1992). At the present time, Fortran 95, which includes many of the parallelization features of High Performance Fortran discussed later in this paper, is in the final stages of standardization.

It is not yet clear whether the modernization of Fortran can, of itself, stem the C tide. However, I will demonstrate in this paper that modern Fortran is a viable mainstream language for parallelism. It is true that parallelism is not yet part of the scientific programming mainstream. However, it seems likely that, with the scientists' never-ending thirst for affordable performance, parallelism will become much more common, especially now that appropriate standards have evolved. Just as early Fortran enabled average scientists and engineers to program the computers of the 1960s, modern Fortran may enable average scientists and engineers to program parallel computers of the next decade.
An Introduction to Fortran 90

Fortran 90 introduces some important capabilities in mathematical expressivity through a wealth of natural constructs for manipulating arrays.1 In addition, Fortran 90 incorporates modern control constructs and up-to-date features for data abstraction and data hiding. Some of these constructs, for example, DO WHILE, although not part of FORTRAN 77, are already part of the de facto Fortran standard as provided, for example, with DEC Fortran.

Among the key new features of Fortran 90 are the following:

■ Inclusion of all of FORTRAN 77, so users can compile their FORTRAN 77 codes without modification
■ Permissibility of free-form source code, so programmers can use long (i.e., meaningful) variable names and are not restricted to begin statements in column 7
■ Modern control structures like CASE and DO WHILE, so programmers can take advantage of structured programming constructs
■ Extended control of numeric precision, for architecture independence
■ Array processing extensions, for more easily expressing array operations and also for expressing independence of element operations
■ Dynamic memory allocation and deallocation, for memory management
■ Pointers, for more flexible control of data placement
■ Data structures, for data abstraction
■ User-defined types and operators, for data abstraction
■ Procedures and modules, to help programmers write reusable code
■ Stream character-oriented input/output features
■ New intrinsic functions

With these new features, a modern Fortran programmer can not only successfully compile and execute previous standards-compliant Fortran codes but also design better codes with

■ Dramatically simplified ways of doing dynamic memory management
■ Better modularity and therefore reusability
■ Better readability
■ Easier program maintenance

Additionally, of course, programmers have the assurance of complete portability between platforms and architectures.

The following code fragment illustrates the simplicity of dynamic memory allocation with Fortran 90. It also includes some of the new syntax for declaring variables, some examples of array manipulations, and an example of how to use the new intrinsic matrix multiplication function. In addition, the exclamation mark, which is used to begin comment statements, is a new Fortran 90 feature that was widely used in the past as an extension to FORTRAN 77.

      REAL, DIMENSION(:,:,:), &        ! NEW DECLARATION SYNTAX
     &   ALLOCATABLE :: GRID           ! DYNAMIC STORAGE
      REAL*8 A(4,4),B(4,4),C(4,4)      ! OLD DECLARATION SYNTAX
      READ *, N                        ! READ IN THE DIMENSION
      ALLOCATE(GRID(N+2,N+2,2))        ! ALLOCATE THE STORAGE
      GRID(:,:,1) = 1.0                ! ASSIGN PART OF ARRAY
      GRID(:,:,2) = 2.0                ! ASSIGN REST OF ARRAY
      A = GRID(1:4,1:4,1)              ! ASSIGNMENT
      B = GRID(2:5,1:4,2)              ! ASSIGNMENT
      C = MATMUL(A,B)                  ! MATRIX MULTIPLICATION

Some of the new features of Fortran 90 were introduced not only for simplified programming but also to permit better hardware-specific optimizations. For example, in Fortran 90, one can write the array assignment

      A = B + C

which in FORTRAN 77 would be written as

      DO 100 J = 1,N
         DO 200 I = 1,M
            A(I,J) = B(I,J) + C(I,J)
 200     END DO
 100  END DO

The Fortran 90 array assignment not only is more elegant but also permits the compiler to easily recognize that the individual element assignments are independent of one another. If the compiler were targeting a vector or parallel computer, it could generate code that exploits the architecture by taking advantage of this independence between iterations.

Of course, the particular DO loop shown above is simple enough that many compilers would recognize the independence of iterations and could therefore perform the architecture-specific optimizations without the aid of Fortran 90's new array constructs. But in general, many of the new features of Fortran 90 help compilers to perform architecture-specific optimizations. More important, these features help programmers express basic numerical algorithms in ways inherently more amenable to optimizations that take advantage of multiple arithmetic units.
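As a further illustration of the data abstraction features listed above, the short module below is a sketch that is not taken from the article; the module, type, and procedure names (GRID_MODULE, FIELD, NEW_FIELD, SCALE_FIELD) are invented for this example. It combines derived types, pointer-valued array components (the Fortran 90 mechanism for dynamic storage inside a derived type), module procedures, and whole-array assignment.

      MODULE GRID_MODULE
         IMPLICIT NONE
         TYPE FIELD                                  ! USER-DEFINED TYPE (DATA ABSTRACTION)
            REAL, DIMENSION(:,:), POINTER :: VALUES  ! DYNAMIC STORAGE VIA A POINTER COMPONENT
         END TYPE FIELD
      CONTAINS
         SUBROUTINE NEW_FIELD(F, N, M)               ! ALLOCATE AN N-BY-M FIELD
            TYPE(FIELD), INTENT(OUT) :: F
            INTEGER, INTENT(IN) :: N, M
            ALLOCATE(F%VALUES(N,M))
            F%VALUES = 0.0                           ! WHOLE-ARRAY ASSIGNMENT
         END SUBROUTINE NEW_FIELD
         SUBROUTINE SCALE_FIELD(F, FACTOR)           ! REUSABLE MODULE PROCEDURE
            TYPE(FIELD), INTENT(INOUT) :: F
            REAL, INTENT(IN) :: FACTOR
            F%VALUES = FACTOR * F%VALUES             ! ARRAY OPERATION, NO EXPLICIT DO LOOPS
         END SUBROUTINE SCALE_FIELD
      END MODULE GRID_MODULE

A caller would simply USE GRID_MODULE, declare a TYPE(FIELD) variable, and call these routines; the representation of the field stays hidden behind the type.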
A Brief History of Parallel Fortran: PCF and HPF

During the past ten years, two significant efforts have been undertaken to standardize parallel extensions to Fortran. The first of these was under the auspices of the Parallel Computing Forum (PCF) and targeted global-shared-memory architectures. The PCF effort was directed to control parallelism, with little attention to language features for managing data locality. The 1991 PCF standard established an approach to shared-memory extensions of Fortran and also established an interim syntax. These extensions were later somewhat modified and incorporated in the standard extensions now known as ANSI X3H5.

At about the time the ANSI X3H5 standard was adopted, another standardization committee began work on extending Fortran 90 for distributed-memory architectures, with the goal of providing a language suitable for scalable computing. This committee became known as the High Performance Fortran Forum and produced in 1993 the High Performance Fortran (HPF) language specification.2 The HPF programming-model target was data parallelism, and many data placement directives are provided for the programmer to optimize data locality. In addition, HPF includes ways to specify a more general style of single-program multiple-data (SPMD) execution in which separate processors can independently work on different parts of the code. This SPMD specification is formalized in such a way as to make the resulting code far more maintainable than previous message-passing-library ways of specifying SPMD distributed parallelism.

Can HPF and PCF extensions be used together in the same Fortran 90 code? Sure. But the PCF specification has lots of "user-beware" warnings about the correct usage of the PARALLEL REGION construct, and the HPF specification has lots of warnings about the correct usage of the EXTRINSIC(HPF_LOCAL) construct. So as you can see, there are times when a programmer had better be very knowledgeable if she or he wants to write a mixed HPF/PCF code. Digital's products support both the PCF and HPF extensions. The HPF extensions are supported as part of the DEC Fortran 90 compiler, and the PCF extensions are supported through Digital's KAP Fortran optimizer.3,4

Shared Memory Fortran Parallelism

The traditional discussions of parallel computing focus rather heavily on what is known as control parallelism. Namely, the application is analyzed in terms of the opportunities for parallel execution of various threads of control. The canonical example is a DO loop in which the individual iterations operate on independent data. Each iteration could, in principle, be executed simultaneously (provided of course that the hardware allows simultaneous access to instructions and data). Technology has evolved to the point at which compilers are often able to detect these kinds of parallelization opportunities and automatically decompose codes. Even when the compiler is not able to make this analysis, the programmer often is able to do so, perhaps after performing a few algorithmic modifications. It is then relatively easy to provide language constructs that the user can add to the program as parallelization hints to the compiler.

This kind of analysis is all well and good, provided that data can be accessed democratically and quickly by all processors. With modern hardware clocked at about 300 megahertz, this amounts to saying that memory latencies are lower than 100 nanoseconds, and memory bandwidths are greater than 100 megabytes per second. This characterizes today's single and symmetric multiprocessing (SMP) computers such as Digital's AlphaServer 8400 system, which comes with twelve 600-megaflop processors on a backplane with a bandwidth of close to 2 gigabytes per second.

In summary, the beauty of shared-memory parallelism is that the programmer does not need to worry too much about where the data is and can concentrate instead on the easier problem of control parallelism. In the simplest cases, the compiler can automatically decompose the problem without requiring any code modifications. For example, automatic decomposition for SMP systems of a code called, for example, cfd.f, can be done trivially with Digital's KAP optimizer by using the command line

      kf90 -fkapargs='-conc' cfd.f -o cfd.exe

As an example of guided automatic decomposition, the following shows how a KAP parallelization assertion can be included in the code. (Actually, the code segment below is so simple that the compiler can automatically detect the parallelism without the help of the assertion.)

C*$* ASSERT DO (CONCURRENT)
      DO 100 I = 4,N
         A(I) = B(I) + C(I)
 100  END DO
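The case in which the programmer knows more than the compiler can prove arises, for example, with indirect addressing. The sketch below is not from the article: the index array IDX is assumed (by the programmer) to contain no repeated values, so the writes to A can never collide, and the same KAP assertion can be used to convey that knowledge. Without the assertion, the possible aliasing through IDX would typically force the preprocessor to generate serial code.

C     ILLUSTRATIVE ONLY: IDX IS ASSUMED TO BE A PERMUTATION OF 1..N,
C     A FACT KNOWN TO THE PROGRAMMER BUT INVISIBLE TO THE COMPILER.
C*$* ASSERT DO (CONCURRENT)
      DO I = 1,N
         A(IDX(I)) = B(I) + C(I)
      END DO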
For explicit control of the parallelism, PCF directives can be used. In the example that follows, the KAP preprocessor form of the PCF directives is used to parallelize a loop.

C*KAP*PARALLEL REGION
C*KAP*&SHARED(A,B,C) LOCAL(I)
C*KAP*PARALLEL DO
      DO 10 I = 1,N
         A(I) = B(I) + C(I)
 10   CONTINUE
C*KAP*END PARALLEL REGION

Cluster Fortran Parallelism

High Performance Fortran V1.1 is currently the only language standard for distributed-memory parallel computing. The most significant way in which HPF extends Fortran 90 is through a rich family of data placement directives. There are also library routines and some extensions for control parallelism. HPF is the simplest way of parallelizing data-parallel applications on clusters (also known as "farms") of workstations and servers. Other methods of cluster parallelism, such as message passing, require more bookkeeping and are therefore less easy to express and less easy to maintain. In addition, during the past year, HPF has become widely available and is supported on the platforms of all major vendors.

HPF is often considered to be a data parallel language. That is, it facilitates parallelization of array-based algorithms in which the instruction stream can be described as a sequence of array manipulations, each of which is inherently parallel. What is less well known is that HPF also provides a powerful way of expressing the more general SPMD parallelism mentioned earlier. This kind of parallelism, often expressed with message-passing libraries such as MPI,5 is one in which individual processors can operate simultaneously on independent instruction streams and generally exchange data either by explicitly sharing memory or by exchanging messages. Three case studies follow which illustrate the data parallel and the SPMD styles of programming.

A One-dimensional Finite-difference Algorithm

Consider a simple one-dimensional grid problem (the most mind-bogglingly simple illustration of HPF in action) in which each grid value is updated as a linear combination of its (previous) nearest neighbors. For each interior grid index i, the update algorithm is

      Y(i) = X(i-1) + X(i+1) - 2*X(i)

In Fortran 90, the resulting DO loop can be expressed as a single array assignment. How would this be parallelized? The simplest way to imagine parallelization would be to partition the X and Y arrays into equal-size chunks, with one chunk on each processor. Each iteration could proceed simultaneously, and at the chunk boundaries, some communication would occur between processors. The HPF implementation of this idea is simply to add to the Fortran 90 code two data placement statements. One of these declares that the X array should be distributed into chunks, or blocks. The other declares that the Y array should be distributed such that the elements align to the same processors as the corresponding elements of the X array. The resultant code for arrays with 1,000 elements is as follows:

!HPF$ DISTRIBUTE X(BLOCK)
!HPF$ ALIGN Y WITH X
      REAL*8 X(1000), Y(1000)

      Y(2:999) = X(1:998) + X(3:1000) - 2 * X(2:999)

      END

The HPF compiler is responsible for generating all of the boundary-element communication code. The compiler is also responsible for determining the most even distribution of arrays. (If, for example, there were 13 processors, some chunks would be bigger than others.)

This simple example is useful not only as an illustration of the power of HPF but also as a way of pointing to one of the hazards of parallel algorithm development. Each of the element-updates involves three floating-point operations: an addition, a subtraction, and a multiplication. So, as an example, on a four-processor system, each processor would operate on 250 elements with 750 floating-point operations. In addition, each processor would be required to communicate one word of data for each of the two chunk boundaries. The time that each of these communications takes is known as the communications latency. Typical transmission control protocol/internet protocol (TCP/IP) network latencies are twenty thousand times (or more) longer than the time it typically takes a high-performance system to perform a floating-point operation. Thus even 750 floating-point operations are negligible compared with the time taken to communicate. In the above example, network parallelism would be a net loss, since the total execution time would be totally swamped by the network latency.

Of course, some communication mechanisms are of lower latency than TCP/IP networks. As an example, Digital's implementation of MEMORY CHANNEL cluster interconnect reduces the latency to less than 1000 floating-point operations (relative to the performance of, say, Digital's AlphaStation 600 5/300 system). For SMP, the latency is even smaller. In both cases, there may be a benefit to parallelism.
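For readers who want to see the fragment above as a complete program, here is a minimal sketch. It is not from the article; the program name FD1D, the linear-ramp initial data, and the printed check are assumptions added for illustration. Because the second difference of a linear ramp vanishes, the printed maximum should be zero no matter how many processors the program is compiled for.

      PROGRAM FD1D                        ! ILLUSTRATIVE NAME
      REAL*8 X(1000), Y(1000)
!HPF$ DISTRIBUTE X(BLOCK)
!HPF$ ALIGN Y WITH X
      INTEGER I
      X = (/ (DBLE(I), I = 1, 1000) /)    ! ASSUMED INITIAL DATA: A LINEAR RAMP
      Y = 0.0D0
      Y(2:999) = X(1:998) + X(3:1000) - 2 * X(2:999)
      PRINT *, 'MAX INTERIOR VALUE =', MAXVAL(ABS(Y(2:999)))
      END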
A Three-dimensional Red-Black Poisson Equation Solver

The example of a one-dimensional algorithm in the previous section can be easily generalized to a more realistic three-dimensional algorithm for solving the Poisson equation using a relaxation technique commonly known as the red-black method. The grid is partitioned into two colors, following a two-dimensional checkerboard arrangement. Each red grid element is updated based on the values of neighboring black elements. A similar array assignment can be written as in the previous example or, as shown in the partial code segment below, alternatively can use the HPF FORALL construct to express the assignments in a style similar to that for serial DO loops.

!HPF$ DISTRIBUTE(*,BLOCK,BLOCK) :: U,V
      FORALL (I=2:NX-1, J=2:NY-1:2, K=2:NZ-1:2)
         U(I,J,K) = FACTOR*(HSQ*F(I,J,K) +  &
                    U(I-1,J,K) + U(I+1,J,K) + &

The distribution directive lays out the array so that the first dimension is completely contained within a processor, with the other two dimensions block-distributed across processors in rectangular chunks. The red-black checkerboarding is performed along the second and third dimensions. Note also the Fortran 90 free-form syntax employed here, in which the ampersand is used as an end-of-line continuation statement.
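The article shows only the beginning of the update. Purely as an illustration, a possible completion is sketched below; the remaining four neighbor terms and the closing END FORALL are assumptions based on the usual six-point Poisson stencil, and the scalars FACTOR and HSQ, as in the fragment above, are assumed to have been defined elsewhere.

!     ILLUSTRATIVE COMPLETION; THE LAST FOUR NEIGHBOR TERMS ARE ASSUMED
!HPF$ DISTRIBUTE(*,BLOCK,BLOCK) :: U,V
      FORALL (I=2:NX-1, J=2:NY-1:2, K=2:NZ-1:2)
         U(I,J,K) = FACTOR*(HSQ*F(I,J,K) +     &
                    U(I-1,J,K) + U(I+1,J,K) +  &
                    U(I,J-1,K) + U(I,J+1,K) +  &
                    U(I,J,K-1) + U(I,J,K+1))
      END FORALL

A second, analogous FORALL over the complementary points would then update the black elements.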
In this example, the parallelism is similar to that of the one-dimensional finite-difference example. However, communication now occurs along the two-dimensional boundaries between blocks. The HPF compiler is responsible for these communications. Digital's Fortran 90 compiler performs several optimizations of those communications. First, it packages up all of the data that must be communicated into long vectors so that the start-up latency is effectively hidden. Second, the compiler creates so-called shadow edges (processor-local copies of nonlocal boundary edges) for the local arrays so as to minimize the effect of buffering of neighbor values. These kinds of optimizations can be extremely tedious to message-passing programmers, and one of the virtues of a high-level language like HPF is that the compiler can take care of the bookkeeping. Also, since the compiler can reliably do buffer-management bookkeeping (for example, ensuring that communication buffers do not overflow), the communications runtime library can be optimized to a far greater extent than one would normally expect from a user-safe message library. Indeed, Digital's HPF communications are performed using a proprietary optimized communications library, Digital's Parallel Software Environment.6

Communications and SPMD Programming with HPF

Since HPF can be used to place data, it stands to reason that communication can be forced between processors. The beauty of HPF is that all of this can be done in the context of mathematics rather than in the context of distributed parallel programming. The code fragment in Figure 1 illustrates how this is done. On two processors, the two columns of the U and V arrays are each on different processors; thus the array assignment causes one of those columns to be moved to the other processor. This kind of an operation begins to provide programmers with explicit ways to control data communication and therefore to more explicitly manage the association of data and operations to processors. Notice that the programmer need not be explicit about the parallelism. In fact, scientists and engineers rarely want to express parallelism. In typical message-passing programs, the messages often express communication of vector and array information.

!HPF$ DISTRIBUTE(*,BLOCK) :: U
!HPF$ ALIGN V WITH U
      REAL*8 U(N,2),V(N,2)

      V(:,1) = U(:,2)    ! MOVE A VECTOR BETWEEN PROCESSORS

Figure 1
Code Example Showing Control of Data Communication without Expression of Parallelism

However, despite the fervent hopes of programmers, there are times when a parallel algorithm can be expressed most simply as a collection of individual instruction streams operating on local data. This SPMD style of programming can be expressed in HPF with the EXTRINSIC(HPF_LOCAL) declaration, as illustrated by continuing the above code segment as shown in Figure 2.

      CALL CFD(V)    ! DO LOCAL WORK ON THE LOCAL PART OF V

      EXTRINSIC(HPF_LOCAL) SUBROUTINE CFD(VLOCAL)
      REAL*8, DIMENSION(:,:) :: VLOCAL
!HPF$ DISTRIBUTE *(*,BLOCK) :: VLOCAL
      END

Figure 2
Code Example of Parallel Algorithm Expressed as Collection of Instruction Streams

Because the subroutine CFD is declared to be EXTRINSIC(HPF_LOCAL), the HPF compiler executes that routine independently on each processor (or more generally, the execution is done once per peer process), operating on routine-local data. As for the array argument, V, which is passed to the CFD routine, each processor operates only on its local slice of that array. In the specific example above on two processors, the first one operates on the first column of V and the second one operates on the second column of V.

It is important to mention here that, although HPF permits, and even encourages, SPMD programming, the more popular method at this time is the message-passing technique embodied in, for example, the PVM7 and MPI5 libraries. These libraries can be invoked from Fortran, and can also be used in conjunction with EXTRINSIC(HPF_LOCAL) subroutines.
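As a purely illustrative continuation of Figure 2 (the body below is not in the article; the local column count and local checksum are assumptions chosen to emphasize that only routine-local data is touched), each peer might inspect its own slice as follows. In the two-processor example above, each peer would report exactly one column.

      EXTRINSIC(HPF_LOCAL) SUBROUTINE CFD(VLOCAL)
      REAL*8, DIMENSION(:,:) :: VLOCAL
!HPF$ DISTRIBUTE *(*,BLOCK) :: VLOCAL
      INTEGER NLOC
      NLOC = SIZE(VLOCAL,2)                ! NUMBER OF COLUMNS HELD BY THIS PEER
      PRINT *, 'LOCAL COLUMNS =', NLOC, ' LOCAL SUM =', SUM(VLOCAL)
      END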
Clusters of SMP Systems

During these last few years of the second millennium, we are witnessing the emergence of systems that consist of clusters of shared-memory SMP computers. This exciting development is the logical result of the exponential increase in performance of mid-priced ($100K to $1000K) systems.

There are two natural ways of writing parallel Fortran programs for clusters of SMP systems. The easiest way is to use HPF and to target the total number of processors. So, for example, if there were two SMP systems, each with four processors, one would compile the HPF program for eight processors (more generally, for eight peers). If the program contained, for instance, block-distribution directives, the affected arrays would be split up into eight chunks of contiguous array sections.

The second way of writing parallel Fortran programs for clustered SMP systems is to use HPF to target the total number of SMP machines and then to use PCF (or more generally, shared-memory extensions) to achieve parallelism locally on each of the SMP machines. For example, one might write

!HPF$ DISTRIBUTE (*,BLOCK) :: V

      EXTRINSIC(HPF_LOCAL) SUBROUTINE CFD(V)

C*KAP*PARALLEL REGION

If the target system consisted of two SMP systems, each with four processors, and the above program was compiled for two peers, then the V array would be distributed into two chunks of columns, one chunk per SMP system. Then the routine, CFD, would be executed once per SMP system; and the PCF directives would, on each system, cause parallelism on four threads of execution.
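To make that combination concrete, the following expanded sketch is offered. It is not from the article; the array sizes, the loop body, and the directive clause lists are assumptions (the KAP clause syntax, in particular, may differ in detail). HPF distributes the columns of V across the peers (one peer per SMP system), and the KAP/PCF directives then spread the work on the local columns across the processors within each system.

      PROGRAM HYBRID                        ! ILLUSTRATIVE NAME
      INTEGER, PARAMETER :: N = 1000, NCOLS = 8
      REAL*8 V(N,NCOLS)
!HPF$ DISTRIBUTE (*,BLOCK) :: V             ! ONE BLOCK OF COLUMNS PER PEER
      V = 1.0D0
      CALL CFD(V)                           ! RUNS ONCE PER PEER, ON ITS LOCAL BLOCK
      PRINT *, 'GLOBAL SUM =', SUM(V)
      END

      EXTRINSIC(HPF_LOCAL) SUBROUTINE CFD(VLOCAL)
      REAL*8, DIMENSION(:,:) :: VLOCAL
      INTEGER I, J
C*KAP*PARALLEL REGION
C*KAP*&SHARED(VLOCAL) LOCAL(I,J)
C*KAP*PARALLEL DO
      DO J = 1, SIZE(VLOCAL,2)              ! LOCAL COLUMNS, SHARED AMONG THE SMP THREADS
         DO I = 1, SIZE(VLOCAL,1)
            VLOCAL(I,J) = 2.0D0*VLOCAL(I,J) ! ASSUMED LOCAL WORK
         END DO
      END DO
C*KAP*END PARALLEL REGION
      END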
It is unclear at this time whether there would ever be a practical reason for using a mix of HPF and PCF extensions. It might be tempting to think that there would be performance advantages associated with the local use of shared-memory parallelism. However, experience has shown that program performance tends to be restricted by the weakest link in the performance chain (an observation that has been enshrined as "Amdahl's Law"). In the case of clustered SMP systems, the weak link would be the inter-SMP communication and not the intra-SMP (shared-memory) communication. This casts some doubt on the worth of local communications optimizations. Experimentation will be necessary.

Whatever else one might say about parallelism, one thing is certain: The future will not be boring.

Summary

Fortran was developed and has continued to evolve as a computer language that is particularly suited to expressing mathematical formulas. Among the recent extensions to Fortran are a variety of constructs for the high-level manipulation of arrays. These constructs are especially amenable to parallel optimization. In addition, there are extensions (PCF) for explicit shared-memory parallelization and also data-parallel extensions (HPF) for cluster parallelism. The Digital Fortran compiler performs many interesting optimizations of codes written using HPF. These HPF codes are able to hide, without sacrificing performance, much of the tedium that otherwise accompanies cluster programming. Today, the most exciting frontier for Fortran is that of SMP clusters and other nonuniform-memory-access (NUMA) systems.

References

1. J. Adams et al., Fortran 90 Handbook (New York: McGraw-Hill, Inc., 1992).

2. "High Performance Fortran Language Specification," Scientific Programming, vol. 2: 1-170 (New York: John Wiley and Sons, Inc., 1993), and C. Koelbel et al., The High Performance Fortran Handbook (Boston: MIT Press, 1994).

3. J. Harris et al., "Compiling High Performance Fortran for Distributed-memory Systems," Digital Technical Journal, vol. 7, no. 3 (1995): 5-23.

4. R. Kuhn, B. Leasure, and S. Shah, "The KAP Parallelizer for DEC Fortran and DEC C Programs," Digital Technical Journal, vol. 6, no. 3 (1994): 57-70.

5. For example, see Proceedings of Supercomputing '93 (IEEE, November 1993): 878-883, and W. Gropp, E. Lusk, and A. Skjellum, Using MPI (Boston: MIT Press, 1994).

6. E. Benson et al., "Design of Digital's Parallel Software Environment," Digital Technical Journal, vol. 7, no. 3 (1995): 24-38.

7. For example, see A. Geist et al., PVM: Parallel Virtual Machine. A Users' Guide and Tutorial for Network Parallel Computing (Boston: MIT Press, 1994).

Biography

William N. Celmaster
Bill Celmaster has long been involved with high-performance computing, both as a scientist and as a computing consultant. Joining Digital from BBN in 1991, Bill managed the porting of major scientific and engineering applications to the DECmpp 12000 system. Now a member of Digital's High Performance Computing Expertise Center, he is responsible for parallel software demonstrations and performance characterization of Digital's high-performance systems. He has published numerous papers on parallel computing methods, as well as on topics in the field of physics. Bill received a bachelor of science degree in mathematics and physics from the University of British Columbia and a Ph.D. in physics from Harvard University.