Math578 - Alexiades - Parallel-part2

ACF info

ACF is UTK's Advanced Computing Facility.

Our class ACF Project is "ACF-UTK0151".

If you already have an ACF account: log in to portal.nics.utk.edu and choose the option to be added to Project ACF-UTK0151 .

If you do not have an ACF account: go to https://portal.acf.utk.edu/accounts/request and request a "new account" for Project ACF-UTK0151 .

Running MPI code on ACF
Clusters and HPC systems, like ACF, provide environments for running BATCH jobs.
Resources are loaded by "module", which drastically simplifies the Makefile.
Running code involves several steps you need to be aware of:
Login, transfer files, put them in Lustre, compile, submit batch job (via 'qsub' using PBSscript), and wait for it to run...
ACF consists of several clusters (Beacon, Rho, Sigma, ...), each with several nodes (Beacon has 43 compute nodes),
plus several login nodes. Each node has 16 "cores" in 2 "sockets".

File systems: Home directories are mounted on login (service) nodes via NFS, but NOT mounted on the compute nodes.
You MUST run jobs from the Lustre file system, which provides "scratch" space, mounted on all compute nodes.
The envar $SCRATCHDIR points to your scratch space.
Create a link to it in your home dir: ln -s $SCRATCHDIR Scratch so you can do: cd Scratch
Note: Lustre files are NOT backed up and are deleted after 30 days, so copy important files to your $HOME often.
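
The link and the "copy important files to $HOME" habit can be wrapped in two small shell functions. This is only a sketch: link_scratch and save_results are made-up helper names (not ACF tools), and the directory "results" in the usage comment is a hypothetical choice.

```shell
# link_scratch HOME_DIR SCRATCH_DIR
#   create HOME_DIR/Scratch -> SCRATCH_DIR, unless something named Scratch is already there
link_scratch() {
    home_dir=$1; scratch_dir=$2
    [ -L "$home_dir/Scratch" ] || [ -e "$home_dir/Scratch" ] || ln -s "$scratch_dir" "$home_dir/Scratch"
}

# save_results DEST_DIR FILE...
#   copy the named files out of scratch into DEST_DIR, preserving timestamps
save_results() {
    dest=$1; shift
    mkdir -p "$dest" && cp -p "$@" "$dest"/
}

# On ACF one might call (not executed here):
#   link_scratch "$HOME" "$SCRATCHDIR"
#   save_results "$HOME/results" "$HOME/Scratch/CODE/OUT"
```

The functions take their directories as arguments, so they can be tried on any machine before being used on ACF.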

Compilers: There are 2 suites: Intel: mpiicc, mpiicpc, mpiifort , and Gnu: gcc, g++, gfortran.
They are loaded by module (see below; see also module commands). Intel compilers are usually faster than Gnu.

Scheduler and PBS: Jobs are submitted (to Torque manager and Moab scheduler) via a "PBSscript".
Copy the following to a (plain text) file named PBSscript
For each run, you will need to set: nodes=?:ppn=? , walltime , jobname , -n ?? , code.x

############ PBSscript for ACF ##########
#PBS -S /bin/sh
#PBS -A ACF-UTK0151             #( this is our account number )
#PBS -l nodes=1:ppn=11		#( requests 11 cores of 1 node )
#PBS -N name_for_your_job       #( short single_string e.g. J256on11 )
#PBS -l walltime=00:30:00	#( hh:mm:ss )
#PBS -j oe
#PBS -k oe
cd $PBS_O_WORKDIR		#(points to dir job is submitted from)
####------ ACF mpich ------:
mpirun -n 11 ./code.x  	# < ./dat > ./OUT  (to redirect I/O) 
############ end of PBSscript ##########

Submit with: qsub PBSscript
This will schedule job "J256on11" to be run on 11 cores of one node (when resources become available...).
Important: The batch system will allocate the entire node exclusively to you, even if you only use 1 core!
The more nodes (and cores) you request, the longer it will take for your job to start running...
Job monitoring commands: qstat -a (qu script, see below), showq -r , checkjob , qdel , ...
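
The per-run settings listed above (nodes, ppn, walltime, jobname, -n, code.x) could also be filled in by a small shell function instead of by hand. A minimal sketch, assuming you want one MPI rank per requested core (so -n = nodes*ppn); write_pbs is a made-up helper name, not an ACF tool:

```shell
# write_pbs NODES PPN WALLTIME JOBNAME EXE
#   print a PBSscript (same layout as the one above) on stdout
write_pbs() {
    nodes=$1; ppn=$2; walltime=$3; jobname=$4; exe=$5
    nranks=$((nodes * ppn))     # assumption: one MPI rank per requested core
    cat <<EOF
#PBS -S /bin/sh
#PBS -A ACF-UTK0151
#PBS -l nodes=${nodes}:ppn=${ppn}
#PBS -N ${jobname}
#PBS -l walltime=${walltime}
#PBS -j oe
#PBS -k oe
cd \$PBS_O_WORKDIR
mpirun -n ${nranks} ./${exe}
EOF
}

# Example: the run described in the text (11 cores of one node, 30 minutes)
write_pbs 1 11 00:30:00 J256on11 code.x
```

Redirect the output to a file to use it: write_pbs 1 11 00:30:00 J256on11 code.x > PBSscript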

Steps for compiling and running code (see Running Jobs )

Login to ACF (with your smartphone at hand for Duo...): ssh -X NetID@duo.acf.tennessee.edu
[ To get another terminal without going thru Duo: on ACF type: nohup xterm -bg black -fg cyan -fn 8x13bold -ls &
other colors: aquamarine , khaki , peachpuff , seagreen , ... and can reverse -bg with -fg.
Can create an alias "xt" in your ~/.bashrc : alias xt="nohup xterm ..... " , then: . ~/.bashrc , then: xt
Or download this fancier xtloc script into a file "xtloc", and make it executable: chmod u+x xtloc ]

In another window on your PC, zip (the dir with) your code into a CODE.zip and scp it to ACF:
scp -p CODE.zip NetID@acf-login2.nics.utk.edu:CODE.zip

It will go to your $HOME . Copy it to your $SCRATCHDIR: cp -ip CODE.zip Scratch

cd Scratch ; unzip CODE.zip

cd CODE , and make sure you copy "PBSscript" into CODE/ .

Check what's loaded: module list (several modules are loaded by default, including Intel compilers).
(To use Gnu compilers: module swap PE-intel PE-gnu)

Compile your code. Basically mpiifort code.f90 -o code.x ... or mpiicpc code.cpp -o code.x ...
Better way: In Makefile insert: COMP = mpiifort or COMP = mpiicpc and then: make compile
Note: To compile with '-fast' optimization, put these in Makefile, and do: make mpifast

##............on acf -fast  needs 2-steps:
mpifast:
        mpiifort $(code_f) -c -fPIE -fast
        mpiifort -pie $(code_o)  -o $(code).x

Edit your PBSscript to customize it for this specific run. Then

submit the job: qsub PBSscript (or: make pbs )

Check status: qstat -a | grep $USER
Better yet, use this qu script (put in a file "qu" and make it executable: chmod u+x qu ).
The first item displayed is JobID, needed for 'checkjob', 'qdel', ...

If a job is running and you want to kill it: qdel JobID
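
Since 'checkjob' and 'qdel' need the JobID, it can be convenient to save it at submit time: qsub prints the JobID of the new job on stdout. A minimal sketch; submit_job, kill_last and the file name "last_jobid" are made-up names for illustration:

```shell
# submit_job SCRIPT
#   submit SCRIPT with qsub and save the JobID that qsub prints
submit_job() {
    jobid=$(qsub "$1") || return 1
    echo "$jobid" > last_jobid      # hypothetical file name, in the current dir
    echo "submitted: $jobid"
}

# kill_last
#   delete the most recently submitted job
kill_last() {
    qdel "$(cat last_jobid)"
}

# On ACF (not executed here):
#   submit_job PBSscript
#   qstat -a | grep "$USER"
#   kill_last
```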

Compile and run your SERIAL code on ACF

On your PC, put your Lab3 code (and relevant files) into a dir "SERIAL".

Clean it up! Comment out any diagnostics, remove any and all interactive features.
Only a data file should be read in (simplest way: code.x < dat ). Only the OUTPUT routine should print out
(and main at the end of the run), only essentials.
[ For C++ programmers, strongly recommend printing via printf(...) and not via "<<", it's much cleaner... ]

Copy the 'PBSscript' and 'qu' scripts into SERIAL .

zip -oy SERIAL.zip SERIAL/*

transfer to ACF: scp SERIAL.zip NetID@acf-login2.nics.utk.edu:SERIAL.zip

Copy SERIAL.zip to your Scratch dir, and: unzip SERIAL.zip

To compile and run with Intel compiler:

mv SERIAL SERIAL-intel (rename it)

cd SERIAL-intel ; module list , should see PE-intel

Compile it. Name the executable 'serial-intel.x'

Edit PBSscript and set: nodes=1:ppn=1 , jobname: Jintel , -n 1

qsub PBSscript

./qu to see if it's running and the JobID, something like:
63157 username Jintel -- R 00:00:13

Hopefully it will run and give you what you expect! Record the timing.

To compile and run with Gnu compiler: (no good reason for it, Intel compilers are faster, but anyway...)

cd .. (to parent dir) ; unzip SERIAL.zip ; mv SERIAL SERIAL-gnu ; cd SERIAL-gnu

module swap PE-intel PE-gnu

module list should show PE-gnu

Repeat the above, replacing "intel" with "gnu"
Good luck!

Then do the above with your parallelized par1D code !
Good luck!
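
Once you have recorded a serial timing and a parallel timing, a natural comparison is the speedup S = T1/Tp and the parallel efficiency S/p on p cores. A minimal sketch of that arithmetic; speedup is a made-up helper name, and the timings in the example call are invented for illustration, not measurements:

```shell
# speedup T_SERIAL T_PARALLEL NPROCS
#   print speedup T_SERIAL/T_PARALLEL and efficiency speedup/NPROCS
speedup() {
    awk -v t1="$1" -v tp="$2" -v p="$3" 'BEGIN {
        s = t1 / tp                     # speedup S = T1 / Tp
        printf "speedup %.2f  efficiency %.2f\n", s, s / p
    }'
}

# Invented timings, for illustration only: 120 s serial, 15 s on 11 cores
speedup 120 15 11
```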

...if I forgot anything let me know... last updated on 11mar21

Hybrid Computing: MPI + OpenMP
For the adventurous, the next level up would be Hybrid Programming with OpenMP and MPI

OpenMP tutorial , rather detailed, from LLNL
Do not confuse "Open MPI" (an implementation of MPI) with "OpenMP" (a shared-memory programming standard).