Math578 - Alexiades
ACF info
ACF is UTK's Advanced Computing Facility.
Our class ACF Project is "ACF-UTK0151".
If you already have an ACF account:
login to portal.nics.utk.edu
and choose the option to be added to Project ACF-UTK0151.
If you do not have an ACF account: go to
https://portal.acf.utk.edu/accounts/request
and request a "new account" for Project ACF-UTK0151.
Running MPI code on ACF
Clusters and HPC systems, like ACF, provide environments for running
BATCH jobs.
Resources are loaded by "module",
which drastically simplifies the Makefile.
Running code involves several steps you need to be aware of:
Login, transfer files, put them in Lustre, compile,
submit a batch job (via 'qsub' using a PBSscript), and wait for it to run...
ACF consists of several clusters (Beacon, Rho, Sigma, ...),
each with several compute nodes (Beacon has 43),
plus several login nodes.
Each node has 16 "cores" in 2 "sockets".
File systems:
Home directories are mounted on login (service) nodes via NFS,
but NOT mounted on the compute nodes.
You MUST run jobs from the Lustre file system,
which provides "scratch" space, mounted on all compute nodes.
The envar $SCRATCHDIR points to your scratch space.
Create a link to it in your home dir:
ln -s $SCRATCHDIR Scratch
so you can do: cd Scratch
Note: Lustre files are NOT backed up, and are deleted after
30 days, so copy important files to your $HOME often.
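For example (a sketch; "CODE" and "OUT" here are just placeholders for your own run directory and output file):
cd ~/Scratch/CODE     # your run directory on Lustre, via the link above
cp -ip OUT $HOME/     # copy important results back to your home directory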
Compilers:
There are 2 suites: Intel: mpiicc, mpiicpc, mpiifort ,
and Gnu: gcc, g++, gfortran.
They are loaded by module (see below). See
module commands
Intel compilers are usually faster than Gnu.
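Typical module commands (a quick sketch; module names other than PE-intel / PE-gnu may differ on ACF):
module list                   # show what is currently loaded
module avail                  # list what can be loaded
module swap PE-intel PE-gnu   # switch from the Intel suite to the Gnu suite
module swap PE-gnu PE-intel   # ...and back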
Scheduler and PBS:
Jobs are submitted (to Torque manager and Moab scheduler) via a "PBSscript".
Copy the following to a (plain text) file named PBSscript
For each run, you will need to set:
nodes=?:ppn=? , walltime , jobname , -n ?? , code.x
############ PBSscript for ACF ##########
#PBS -S /bin/sh
#PBS -A ACF-UTK0151 #( this is our account number )
#PBS -l nodes=1:ppn=11 #( requests 11 cores of 1 node )
#PBS -N name_for_your_job #( short single_string e.g. J256on11 )
#PBS -l walltime=00:30:00 #( hh:mm:ss )
#PBS -j oe
#PBS -k oe
cd $PBS_O_WORKDIR #(points to dir job is submitted from)
####------ ACF mpich ------:
mpirun -n 11 ./code.x # < ./dat > ./OUT (to redirect I/O)
############ end of PBSscript ##########
Submit with: qsub PBSscript
This will schedule job "J256on11" to be run on 11 cores of one node
(when resources become available...).
Important:
The batch system will allocate an entire node exclusively to you,
even if you only use 1 core!
The more nodes (and cores) you request, the longer it will take
for your job to start running...
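For example, since each node has 16 cores, a run on 2 full nodes could use (a sketch; only these lines of the PBSscript change):
#PBS -l nodes=2:ppn=16 #( 2 nodes x 16 cores = 32 cores total )
mpirun -n 32 ./code.x #( one MPI rank per core )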
Job monitoring commands: qstat -a (qu script, see below), showq -r , checkjob , qdel , ...
Steps for compiling and running code
(see Running Jobs )
Login to ACF (with your smartphone at hand for Duo...):
ssh -X NetID@duo.acf.tennessee.edu
[ To get another terminal without going thru Duo:
on ACF type:
nohup xterm -bg black -fg cyan -fn 8x13bold -ls &
other colors: aquamarine , khaki , peachpuff , seagreen , ...
and can reverse -bg with -fg.
Can create an alias "xt" in your ~/.bashrc :
alias xt="nohup xterm ..... " , then:
. ~/.bashrc , then: xt
Or download this fancier xtloc script into a file "xtloc", and make it executable: chmod u+x xtloc ]
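The alias in the note above could be spelled out like this in your ~/.bashrc (a sketch using the xterm options shown there; pick your own colors):
alias xt='nohup xterm -bg black -fg cyan -fn 8x13bold -ls &'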
In another window on your PC, zip (the dir with your) code into a
CODE.zip and scp it to ACF:
scp -p CODE.zip NetID@acf-login2.nics.utk.edu:CODE.zip
It will go to your $HOME . Copy it to your $SCRATCHDIR:
cp -ip CODE.zip Scratch
cd Scratch ; unzip CODE.zip
cd CODE
Make sure you copy "PBSscript" into CODE/.
Check what's loaded: module list
several are loaded by default (including Intel compilers).
(To use Gnu compilers: module swap PE-intel PE-gnu)
Compile your code. Basically
mpiifort code.f90 -o code.x ...
or mpiicpc code.cpp -o code.x ...
Better way: In Makefile insert: COMP = mpiifort
or COMP = mpiicpc
and then: make compile
Note: To compile with '-fast' optimization, put these lines in the Makefile, and do: make mpifast
##............on acf -fast needs 2-steps:
mpifast:
mpiifort $(code_f) -c -fPIE -fast
mpiifort -pie $(code_o) -o $(code).x
Edit your PBSscript to customize it for this specific run. Then
submit the job: qsub PBSscript
(or: make pbs )
Check status: qstat -a | grep $USER
Better yet, use this
qu script
(put in a file "qu" and make it executable: chmod u+x qu ).
The first item displayed is JobID, needed for 'checkjob', 'qdel', ...
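If you'd rather not download it, a bare-bones stand-in for 'qu' could be just this (a sketch; the linked script is fancier):
#!/bin/sh
## qu -- list my jobs; the first item shown is the JobID
qstat -a | grep "$USER"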
If a job is running and you want to kill it:
qdel JobID
Compile and run your SERIAL code on ACF
On your PC, put your Lab3 code (and relevant files)
into a dir "SERIAL".
Clean it up! Comment out any diagnostics, remove any and all interactive features.
Only a data file should be read in (simplest way: code.x < dat ).
Only the OUTPUT routine (and main, at the end of the run)
should print, and only essentials.
[ For C++ programmers, strongly recommend printing via
printf(...) and not via "<<", it's much cleaner... ]
Copy the 'PBSscript' and 'qu' scripts into SERIAL .
zip -oy SERIAL.zip SERIAL/*
transfer to ACF: scp SERIAL.zip NetID@acf-login2.nics.utk.edu:SERIAL.zip
Login on ACF.
Copy SERIAL.zip to your Scratch dir, and: unzip SERIAL.zip
To compile and run with Intel compiler:
mv SERIAL SERIAL-intel (rename it)
cd SERIAL-intel ; module list , should show PE-intel
Compile it. Name the executable 'serial-intel.x'
Edit PBSscript and set: nodes=1:ppn=1 ,
jobname: Jintel , -n 1
qsub PBSscript
./qu to see if it's running and the JobID, something like:
63157 username Jintel -- R 00:00:13
Hopefully it will run and give you what you expect!
Record the timing.
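Put together, the Intel run looks roughly like this (a sketch; 'lab3.f90' is just a placeholder for your own source file):
cd ~/Scratch ; cp -ip $HOME/SERIAL.zip .
unzip SERIAL.zip ; mv SERIAL SERIAL-intel ; cd SERIAL-intel
module list #( should show PE-intel )
mpiifort lab3.f90 -o serial-intel.x #( or mpiicpc lab3.cpp -o serial-intel.x )
## edit PBSscript: nodes=1:ppn=1 , -N Jintel , mpirun -n 1 ./serial-intel.x
qsub PBSscript
./qu #( check status and note the JobID )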
To compile and run with Gnu compiler:
(no good reason for it, Intel compilers are faster, but anyway...)
cd .. (to parent dir) ;
unzip SERIAL.zip ; mv SERIAL SERIAL-gnu ;
cd SERIAL-gnu
module swap PE-intel PE-gnu
module list should show PE-gnu
Repeat the above, replacing "intel" with "gnu"
Good luck!
Then do the above with your parallelized par1D code !
Good luck!
...if I forgot anything let me know...
last updated on 11mar21
Hybrid Computing: MPI + OpenMP
For the adventurous, the next level up would be
Hybrid Programming with OpenMP and MPI
OpenMP tutorial , rather detailed, from LLNL
Do not confuse "OpenMPI" (an implementation of MPI)
with "OpenMP" (a shared-memory programming standard).