Parallelization will be done only in the z-direction, so
always use an even Mz that is divisible by nWRs.
Specific steps for parallelization: do them on your 1D code first!
[Note on file names: I find it convenient to have all files of a code start with a common
letter, like: z.main.f z.io.f ...., and all output files like o.out o.prof o.hist ....]
1. 1Dserial code:
a. Place a copy of your serial 1D code in a directory, say, 1Dserial/ .
b. Organize your code: Split your serial code into separate files: z.main.f z.io.f z.setup.f z.update.f
with obvious contents: input/output in z.io.f, MESH,INIT in z.setup.f, FLUX,PDE in z.update.f
c. Download the file Makefile_serial. Save As "Makefile" (must be PLAIN text file!).
Look inside it to see what a makefile looks like. It contains macros and directives for 'make'.
Note that directives (like 'run:') are followed by a line starting with TAB (not spaces!!!).
Customize it (set names, compiler, ...).
d. Try it: make compile (executable will be $(PROG).x)
make run (it should run)
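For orientation, here is a minimal sketch of what such a Makefile could look like (the macro names FC, PROG, OBJS and the flags are illustrative assumptions; the actual Makefile_serial you download may differ):

```make
# ---- macros (customize these) ----
FC   = gfortran
PROG = zserial
OBJS = z.main.o z.io.o z.setup.o z.update.o

# ---- directives: every action line below MUST begin with a TAB ----
compile: $(OBJS)
	$(FC) -o $(PROG).x $(OBJS)

%.o: %.f
	$(FC) -c $<

run: compile
	./$(PROG).x

clean:
	rm -f $(OBJS) $(PROG).x
```

Typing 'make compile' rebuilds only the .o files whose .f sources changed, then links them into $(PROG).x.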
2. 1Dparallel code:
a. Copy (all files of) your 1Dserial code into a new dir, say, 1Dpar or lab6
b. Comment out all subroutine/function calls.
c. Compile each file with: COMPILER -c z.* and fix the worst problems.
d. Make two copies of main.f: mainMR.f, mainWR.f
e. In mainMR.f : comment out what will NOT be done by MR. Compile.
In mainWR.f : comment out what will NOT be done by WR. Compile.
3. Create a new main.f which: starts up MPI, calls MASTER() or WORKER() and shuts down MPI
Here is a sample:
-------------------------- main.f sample -------------------------------
include 'mpif.h' !!(include 'mpi.h' for C)
... (declare variables) ...
call MPI_INIT( ierr )
!....... nPROC is specified at mpirun or mpiexec, see Makefile....
call MPI_COMM_SIZE( MPI_COMM_WORLD,nPROC,ierr ) !..returns nPROC
mster = 0 ! master gets rank=0
nWRs = nPROC - 1 ! =number of workers
!----------------- start 0, ... ,nWRs tasks ---------------!
call MPI_COMM_RANK(MPI_COMM_WORLD, myID, ierr) !..assigns myID
IF( myID .EQ. mster ) THEN
tt0 = MPI_Wtime() !...start CPU timer on MR
call MASTER( nWRs, mster, ... )
tt1 = MPI_Wtime() !...end timer
print*,'>>main>> MR timing= ',tt1-tt0,' sec on ',nWRs,' WRs'
ELSE
call WORKER( nWRs, myID, ... ) !... now MPI is running ...!
print*, 'Bye from WR:',myID,': ierr= ', ierr
if( ierr .NE. 0 ) then
print*, '>>>> worker:',myID,' ended with ierr= ',ierr
endif
ENDIF
!...termination: the only clean way to exit is this:
call MPI_FINALIZE( ierr )
end
-------------------------------------------------------------------------
4. Download Makefile_parallel. Save As 'Makefile'.
Customize it for 1Dpar code (set names, compiler, ...).
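As a hedged sketch of how the parallel Makefile's run step typically differs (NP, the mpif90 wrapper name, and the mpirun flags are assumptions; consult the downloaded Makefile_parallel for the real ones):

```make
FC   = mpif90        # MPI compiler wrapper; name varies by installation
PROG = zpar
NP   = 3             # nPROC = 1 MR + nWRs WRs; change NP for steps 7-9

run: compile
	mpirun -np $(NP) ./$(PROG).x
```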
5. Parallelization strategy: Domain Decomposition along one direction only (z-direction)
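To make this concrete, here is a sketch of the index bookkeeping each worker might do (the names MzLoc, jbot, jtop, NodeUP, NodeDN are illustrative, not taken from the course files), assuming Mz is divisible by nWRs:

```fortran
!...worker Me (1..nWRs) owns MzLoc consecutive z-planes
      MzLoc  = Mz/nWRs           !..local size; needs mod(Mz,nWRs)=0
      jbot   = (Me-1)*MzLoc + 1  !..global index of my bottom plane
      jtop   = Me*MzLoc          !..global index of my top plane
      NodeDN = Me - 1            !..neighbor below (absent if Me=1)
      NodeUP = Me + 1            !..neighbor above (absent if Me=nWRs)
```

For example, with Mz=8 and nWRs=2, worker 1 owns planes 1-4 and worker 2 owns planes 5-8.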
6. Start inserting MPI calls in mainMR.f and corresponding in mainWR.f
See the sample coding in "outlineMPI" and look up the syntax of MPI functions.
Try your hardest to do it correctly the first time!!!
(in C coding, remove the 'ierr' item from the arguments).
Insert one MPI call, test it, fix it, then another,...
Use a very coarse mesh (MM=4 or 8), and lots of print statements
to see what's happening (then comment out).
7. Test/correct till it runs with nPROC=2, i.e. nWRs=1, on local machine,
(even though nothing is computed yet till step 10 below).
This is still essentially serial.
8. Test/correct till it runs with nPROC=3, i.e. nWRs=2, on local machine.
This is now parallel ! Most bugs will have been removed by this stage...
9. Test/correct till it runs with nPROC=5, i.e. nWRs=4.
Tougher bugs will have been removed at this stage...
10. Once the basic MPI operations are working, start adding one
routine/function call at a time, test as in 6, and periodically as in 7, 8, 9,
with lots of print statements.
Make sure you keep backup copies of each version that works
so that you can get back to a working version if worse comes to worst...
11. Exchanging "boundary" values between neighbors:
All the message passing between neighbors has to be done before the FLUX
routine is called.
Each PROCess (except Me=1) must send its bottom row (j=1) of U to its NodeDN
neighbor and receive that neighbor's top row as its boundary values (j=0).
Also, each PROCess (except Me=nWRs) must send its top row to NodeUP and
receive that neighbor's bottom row as its top boundary values (j=Mz+1).
Be careful with the indices and the logic!
Do this crucial message passing on paper first !
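The step-11 exchange can be sketched with MPI_SENDRECV, which pairs each send with its matching receive and so avoids deadlock (the names Me, NodeUP, NodeDN, the tag values, and the assumption that U is a double-precision array U(0:Mz+1) are all illustrative; for a 2D slab, pass the whole row and the matching count instead of 1):

```fortran
      integer istat(MPI_STATUS_SIZE), ierr
!...fill ghost values before FLUX: j=0 from NodeDN, j=Mz+1 from NodeUP
      if( Me .NE. 1 ) then
!........send my bottom value (j=1) down; get ghost j=0 from below
         call MPI_SENDRECV( U(1), 1, MPI_DOUBLE_PRECISION, NodeDN, 1,
     &        U(0), 1, MPI_DOUBLE_PRECISION, NodeDN, 2,
     &        MPI_COMM_WORLD, istat, ierr )
      endif
      if( Me .NE. nWRs ) then
!........send my top value (j=Mz) up; get ghost j=Mz+1 from above
         call MPI_SENDRECV( U(Mz), 1, MPI_DOUBLE_PRECISION, NodeUP, 2,
     &        U(Mz+1), 1, MPI_DOUBLE_PRECISION, NodeUP, 1,
     &        MPI_COMM_WORLD, istat, ierr )
      endif
```

Note how the tags pair up: the tag-1 message travels downward (sent to NodeDN, received from NodeUP) and the tag-2 message travels upward. Test it exactly as in steps 7-9, starting with nWRs=2.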
GOOD LUCK and have fun !!!