Timings of meltflow benchmark
Melting of Tin in a square cavity
( Navier-Stokes + energy )
implicit scheme, 600x600 mesh, serial code
100, 300, 400, 500 time-steps

Summary
 100 time steps ++++++++++++++++++++++++++++++++++++++++++++++++++++
                                 CPU                CPUsec   wall      run
machine     model                MHz    mem comp     /step  hh:mm     date
----------- ------------------- ---- ------ ------ ------- ------  -------
mars        Intel Xeon i7x4     3600   24GB gfort      3.5   0:06  26jun15
ares        Intel Xeon i7x4     3600   16GB gfort      3.7   0:06  24aug15
darter      Cray XC30 XeonE5    2600    2GB Cray       4.6   0:07  19sep15
householder Intel XeonE5        3000  192GB gfort      4.8   0:08  26may14
frost       Intel XeonE5440     2830    4GB ifort     10.8   0:18   3feb12
ThinkPad    Intel core i5       2400  1.5GB gfort     10.9   0:18  30apr11
midtown     AMD opteron2376     2300    2GB ifort     15.0   0:25   6feb10
zeus        AMD opteron2376     2300    2GB ifort     15.0   0:25   6feb10
newton      Intel Xeon_64       3200    4GB ifort     18.6   0:31   5may06
ares        AMD Opteron2220     2800   16GB gfort     18.8   0:32  30apr11
zeus        AMD opteron252      2600    2GB g77       21.5   0:36  13apr06
zeus        AMD opteron252      2600    2GB pgf95     21.9   0:37  13apr06
tiger       Cray opteron248     2200    8GB pgf90     23.7   0:40    dec07
tiger       Cray opteron248     2200    8GB g77       24.0   0:40    dec07
oic         Intel Xeon_64       3400    4GB ifort     27.2   0:45   5may06
cheetah     IBM SP p690 pwr4    1300    1GB xlf       28.9   0:48  16nov02
fubini      Intel P4 Xeon       3056    4GB ifort     33.6   0:56  13oct03
hawk        AMD opteron242      1596    2GB g77       31.8   0:55  25jan05
hawk        AMD opteron242      1596    2GB ifort     34.1   0:57  25jan05
frodo       AMD opteron240      1396    2GB g77       40.4   1:07  29oct04
agnesi      Intel P4 Xeon       2200    4GB ifort     41.2   1:08  13nov02
abcd        Intel P4 Xeon       3200    4GB ifort     52.1   1:26  28oct04
hawk        AMD opteron242      1596    2GB pf90      53.3   1:31  29jan05
colt        Alpha SC ev67        667    2GB f90       65.9   1:50  29apr01
knox3       Sun UltraSparc       900    1GB f77       76.0   2:07   9jan05
barnard     Sun ultra80 dual     450 1024MB f77      162.9   6:52  24nov01
vxa         Dell Latitude C600   752  256MB g77      198.3   5:36  21nov01
capsicum    SGI IP27             250 4096MB f77      220.8   6:10  30nov01
goliath     Intel PentiumIII     497 1028MB f77      221.1   6:09  22nov01
larry       Sun ultra4           296 2048MB f77      263.9   7:20  23nov01

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 300 time steps ++++++++++++++++++++++++++++++++++++++++++++++++++++
                                 CPU                CPUsec   wall      run
machine     model                MHz    mem comp     /step  hh:mm     date
----------- ------------------- ---- ------ ------ ------- ------  -------
mars        Intel Xeon i7x4     3600   24GB gfort      4.8   0:24  26jun15
ares        Intel Xeon i7x4     3600   16GB gfort      5.2   0:26  24aug15
householder Intel XeonE5        3000  192GB gfort      6.0   0:33  26may14
frost       Intel XeonE5440     2830    4GB ifort     14.2   1:11   3feb12
ThinkPad    Intel core i5       2400  1.5GB gfort     13.4   1:27  30apr11
midtown     AMD opteron2376     2300    2GB ifort     20.8   1:44   6feb10
zeus        AMD opteron2376     2300    2GB ifort     22.6   1:53   6feb10
newton      Intel Xeon_64       3200    4GB ifort     25.6   2:08   5may06
ares        AMD Opteron2220     2800   16GB gfort     26.6   2:15  30apr11
zeus        AMD opteron252      2600    2GB pgf95     31.3   2:37  14apr06
zeus        AMD opteron252      2600    2GB g77       33.8   2:49  13apr06
tiger       Cray opteron248     2200    8GB pgf90     30.0   2:30    dec07
tiger       Cray opteron248     2200    8GB g77       33.5   2:48    dec07
cheetah     IBM SP p690 pwr4    1300    1GB xlf       38.7   3:14  16nov02
fubini      Intel P4 Xeon       3056    4GB ifort     42.9   3:35  13oct03
oic         Intel Xeon_64       3400    4GB ifort    49.03   4:05   5may06
frodo       AMD opteron240      1396    2GB g77       53.2   4:27  29oct04
agnesi      Intel P4 Xeon       2200    4GB ifort    53.98   4:30  13nov02
hawk        AMD opteron242      1596    2GB ifort     53.8   4:33  30jan05
hawk        AMD opteron242      1596    2GB g77       55.9   4:58  25jan05
hawk        AMD opteron242      1596    2GB pf90      78.1   6:35  30jan05
abcd        Intel P4 Xeon       3200    4GB ifort     79.4   6:37  28oct04
colt        Alpha SC ev67        667    2GB f90      95.77   7:59   2nov01
capsicum    SGI IP27             250 4096MB f77     219.15  18:21   1dec01
barnard     Sun ultra80          450 1024MB f77     219.15  27:54  27nov01
vxa         Dell Latitude C600   752  256MB g77     259.54  23:39  25nov01
goliath     dual PentiumIII      497 1028MB f77     296.61  24:43  22nov01
larry       Sun ultra4           296 2048MB f77     364.37  30:26  24nov01

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 400 time steps ++++++++++++++++++++++++++++++++++++++++++++++++++++
                                 CPU                CPUsec   wall      run
machine     model                MHz    mem comp     /step  hh:mm     date
----------- ------------------- ---- ------ ------ ------- ------  -------
mars        Intel Xeon i7x4     3600   24GB gfort      5.7   0:38  26jun15
ares        Intel Xeon i7x4     3600   16GB gfort      6.1   0:41  24aug15
darter      Cray XC30 XeonE5    2600    2GB Cray       7.4   0:50  19sep15
householder Intel XeonE5        3000  192GB gfort      7.7   0:51  26may14
darter      Cray XC30 XeonE5    2600    2GB ifort      8.2   0:55  19sep15
frost       Intel XeonE5440     2830    4GB ifort     16.2   1:48   3feb12
ThinkPad    Intel core i5       2400  1.5GB gfort     20.9   2:20  30apr11
midtown     AMD opteron2376     2300    2GB ifort     23.2   2:35   6feb10
zeus        AMD opteron2376     2300    2GB ifort     25.5   2:50   6feb10
newton      Intel Xeon_64       3200    4GB ifort     29.8   3:19   6may06
zeus        AMD opteron252      2600    2GB pgf95     34.8   3:52  14apr06
tiger       Cray opteron248     2200    8GB g77       35.5   3:57    dec07
tiger       Cray opteron248     2200    8GB pgf90     37.2   4:09    dec07
ares        AMD Opteron2220     2800   16GB gfort     37.5   3:50  30apr11
zeus        AMD opteron252      2600    2GB g77       38.6   4:18  13apr06
cheetah     IBM SP p690 pwr4    1300    1GB xlf      46.70   5:11  17nov02
frodo       AMD opteron240      1396    2GB g77       57.1   6:20  29oct04
fatou       Intel P4 Xeon       3056    4GB ifort    55.80   6:12   1jul04
agnesi      Intel P4 Xeon       2200    4GB ifort    61.48   6:50  14nov02
hawk        AMD opteron242      1596    2GB ifort    65.07   7:14  30jan05
hawk        AMD opteron242      1596    2GB g77      72.98   8:07  30jan05
abcd        Intel P4 Xeon       3200    4GB ifort     75.8   8:26  28oct04
hawk        AMD opteron242      1596    2GB pf90     79.48   8:51  30jan05
colt        Alpha SC ev67        667    2GB f90     103.27  11:29   4nov01
barnard     Sun ultra80          450 1024MB f77     255.40  52:45  30nov01
capsicum    SGI IP27             250 4096MB f90     326.18  36:27   3dec01
goliath     dual PentiumIII      497 1028MB g77     339.73  38:11  25nov01
larry       Sun ultra4           296 2048MB f77     425.70  47:21   2dec01
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 500 time steps ++++++++++++++++++++++++++++++++++++++++++++++++++++
                                 CPU                CPUsec   wall      run
machine     model                MHz    mem comp     /step  hh:mm     date
----------- ------------------- ---- ------ ------ ------- ------  -------
mars        Intel Xeon i7x4     3600   24GB gfort     6.95   0:58  26jun15
ares        Intel Xeon i7x4     3600   16GB gfort      7.4   1:02  24aug15
householder Intel XeonE5        3000  192GB gfort     10.1   1:24  26may14
frost       Intel XeonE5440     2830    4GB ifort     19.3   2:41   3feb12
ThinkPad    Intel core i5       2400  1.5GB gfort     22.8   3:10  30apr11
midtown     AMD opteron2376     2300    2GB ifort     27.5   3:50   6feb10
zeus        AMD opteron2376     2300    2GB ifort     29.6   4:06   6feb10
newton      Intel Xeon_64       3200    4GB ifort     38.1   5:17   6may06
zeus        AMD opteron252      2600    2GB pgf95     42.4   5:56  14apr06
tiger       Cray opteron248     2200    8GB pgf90    42.95   5:59    dec07
zeus        AMD opteron252      2600    2GB g77       44.7   6:13  13apr06
tiger       Cray opteron248     2200    8GB g77       46.3   6:27    dec07
abcd        Intel P4 Xeon       3200    4GB ifort     58.7   8:09  28oct04
cheetah     IBM SP p690 pwr4    1300    1GB xlf       60.3   8:22  17nov02
hawk        AMD opteron242      1596    2GB g77       65.0   9:04  25jan05
hawk        AMD opteron242      1596    2GB ifort    67.13   9:20  30jan05
frodo       AMD opteron240      1396    2GB g77       65.7   9:07  29oct04
fatou       Intel P4 Xeon       3056    4GB ifort    66.65   9:16   1jul04
agnesi      Intel P4 Xeon       2200    4GB ifort    76.09  10:34  15nov02
hawk        AMD opteron242      1596    2GB pf90     106.9  14:53  29jan05
colt        Alpha SC ev67        667    2GB f90     122.42  17:01   3nov01
barnard     Sun ultra80          450 1024MB f77     306.76  82:07   5dec01
goliath     dual PentiumIII      497 1028MB g77     405.84  56:49   4dec01
larry       Sun ultra4           296 2048MB f77     514.99  71:49   5dec01
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
If you would be willing to run the benchmark on another machine,
please email me at alexiades@utk.edu


Details

colt.ccs.ornl.gov    on one CPU
falcon.ccs.ornl.gov  on one CPU
   Compaq AlphaServer SC, 4 SMP CPUs per node, 2GB RAM
   CPU: ES40 processor: 21264a (ev67), 667 MHz, 64KB I-cache, 64KB D-cache, 8MB L2 cache
   On colt or falcon (with prun, no DFS/DCE)
 100 steps:
   uname -a: OSF1 colt13 V5.0 910 alpha
   f90 5.3: f90 -fast -O5 -tune ev6
      6588.46u 1.16s 1:49:54 99% 0+2709k      29apr01
      ==> 65.88 CPUs/step in 1:49 hrs
   f95 Compaq Fortran Compiler X5.5-2801-48CAG
      6900.18u 0.73s 1:55:05 99% 0+2709k      13nov02
      ==> 69.00 CPUs/step in 1:55 hrs
   uname -a: OSF1 falcon63 V5.1 732 alpha
   f95 Compaq Fortran Compiler V5.4A-1472-46B2F
      7382.22u 2.71s 2:03:15 99% 0+2709k      21nov01
      ==> 73.82 CPUs/step in 2:03 hrs
 300 steps:
   uname -a: OSF1 colt13 V5.0 910 alpha
   f90 5.3: f90 -fast -O5 -tune ev6
      23229.03u 0.96s 6:27:24 99% 0+2721k     29apr01
      ==> 77.43 CPUs/step in 6:27 hrs
   uname -a: OSF1 colt13 V5.1 732 alpha
   f95 Compaq Fortran Compiler V5.4A-1472-46B2F
      28730.36u 3.77s 7:59:15 99% 0+2720k      2nov01
      ==> 95.77 CPUs/step in 7:59 hrs
   uname -a: OSF1 falcon63 V5.1 732 alpha
   f95 Compaq Fortran Compiler V5.4A-1472-46B2F
      29594.60u 18.05s 8:14:27 99% 0+2720k    21nov01
      ==> 98.65 CPUs/step in 8:14 hrs
 400 steps:
   uname -a: OSF1 colt13 V5.0 910 alpha
   f90 5.3: f90 -fast -O5 -tune ev6
      35077.46u 1.11s 9:45:00 99% 0+2725k     29apr01
      ==> 87.69 CPUs/step in 9:45 hrs
   uname -a: OSF1 colt13 V5.1 732 alpha
   f95 Compaq Fortran Compiler V5.4A-1472-46B2F
      41308.00u 1.00s 11:28:52 99% 0+2725k     4nov01
      ==> 103.27 CPUs/step in 11:29 hrs
   uname -a: OSF1 falcon63 V5.1 732 alpha
   f95 Compaq Fortran Compiler V5.4A-1472-46B2F
      42850.37u 0.88s 11:54:43 99% 0+2724k    22nov01
      ==> 107.13 CPUs/step in 11:54 hrs
 500 steps:
   uname -a: OSF1 colt13 V5.0 910 alpha
   f90 5.3: f90 -fast -O5 -tune ev6
      58239.35u 31.24s 16:11:57 99% 0+2728k    2may01
      ==> 116.48 CPUs/step in 16:12 hrs
   uname -a: OSF1 colt13 V5.1 732 alpha
   f95 Compaq Fortran Compiler V5.4A-1472-46B2F
      61211.11u 1.14s 17:00:46 99% 0+2729k     3nov01
      ==> 122.42 CPUs/step in 17:00 hrs

goliath.math.utk.edu
   Dell ???, dual PentiumIII, 497MHz, 1028MB, 512KB cache
   uname -a: Linux goliath 2.4.7-10smp #1 SMP i686 unknown
   g77 version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)
   g77 -O3
   100 steps: 22109.0u 5.75s 6:08:42                     22nov01
      ==> 221.09 CPUs/step in 6:09 hrs
   300 steps: 88982.47u 18.4s 24:43:02                   22nov01
      ==> 296.61 CPUs/step in 24:43 hrs
   400 steps: 135893.92u 42.170s 38:11:27.54             25nov01
      ==> 339.73 CPUs/step in 38:11 hrs
   500 steps: 202920.600u 100.160s 56:48:51.17 99.2%      4dec01
      ==> 405.84 CPUs/step in 56:49 hrs

bearcat.ccs.ornl.gov
   IBM Power3 ?
   uname -a: AIX bearcat 3 4 000729924C00
   xlf -O5
   100 steps: not enough memory to run !

power3.cs.utk.edu
   IBM Power3 dual SMP
   uname -a: AIX power3 3 4 00005F6B4C00
   xlf -O4 -qarch=auto -qnolm -qtune=pwr3
   100 steps: not enough memory to run !

barnard.math.ua.edu (N.Hannoun ran it)
   Sun ultra-80 dual SMP 450MHz, 1GB, solaris 5.8
   SunOS barnard 5.8 Generic_108528-06 sun4u sparc SUNW,Ultra-80
   f95: Sun WorkShop 6 update 1 Fortran 95 6.1 2000/09/11
   f95 -fast -O4
   100 steps: 16288.0u 1.0s 6:52:27 65%      24nov01
      ==> 162.88 CPUs/step in 6:52 hrs
   300 steps: 65744.0u 1.0s 27:53:52 65%     27nov01
      ==> 219.15 CPUs/step in 27:54 hrs
   400 steps: 102160.0u 2.0s 52:44:52 53%    30nov01
      ==> 255.40 CPUs/step in 52:45 hrs
   500 steps: 153381.0u 2.0s 82:07:19 51%     4dec01
      ==> 306.76 CPUs/step in 82:07 hrs

larry.cas.utk.edu
   Sun Ultra-4 296MHz, 2048MB, solaris 5.8
   SunOS larry 5.8 Generic_108528-07 sun4u sparc SUNW,Ultra-4
   f95: Sun WorkShop 6 2000/04/07 FORTRAN 95 6.0
   f95 -fast -O4
   100 steps: 26388.0u 2.6s 7:19:48           22nov01
      ==> 263.88 CPUs/step in 7:20 hrs
   300 steps: 109310.20u 3.1s 30:26:39        24nov01
      ==> 364.37 CPUs/step in 30:26 hrs
   400 steps: 172535.0u 10.0s 53:01:27 90%    26nov01
      ==> 431.37 CPUs/step in 53:01 hrs
   f95 -fast -O4 -xtarget=ultra2
   100 steps: 26754.0u 2.0s 7:26:41 99%        7dec01
   300 steps: 110351.0u 3.0s 30:45:31 99%      6dec01
      ==> 367.84 CPUs/step in 30:46 hrs
   400 steps: 170279.0u 3.0s 47:21:55 99%      1dec01
      ==> 425.70 CPUs/step in 47:21 hrs
   500 steps: 257494.0u 8.0s 71:48:49 -66%     5dec01
      ==> 514.99 CPUs/step in 71:49 hrs

capsicum.epm.ornl.gov
   SGI 32 node SMP, 250MHz, 32K/4MB, 4096 MB
   uname -a: IRIX64 capsicum 6.5 04131233 IP27
   MIPSpro Compilers: Version 7.3.1.1m
   f90 -Ofast
   100 steps: 22080.3u 16.3s 6:10:06 99%      30nov01
      ==> 220.80 CPUs/step in 6:10 hrs
   300 steps: 65744.8u 20.5s 18:21:09 99%      1dec01
      ==> 219.15 CPUs/step in 18:21 hrs
   400 steps: 130473.3u 53.6s 36:26:51 99%     3dec01
      ==> 326.18 CPUs/step in 36:27 hrs

vxa.math.utk.edu
   Dell Latitude C600 752MHz, 256MB, 256KB cache
   redhat7.1 linux2.4.2
   g77 version 2.96 20000731 (RedHat Linux 7.1.2.96-81)
   g77 -O3
   100 steps: 19829.04u 7.86s 5:35:57                   21nov01
      ==> 198.29 CPUs/step in 5:36 hrs
   300 steps: 77863.240u 56.950s 23:39:14.78 91.5%      25nov01
      ==> 259.54 CPUs/step in 23:39 hrs

knox3.rgrid.utk.edu (a node of knox OIT cluster)
   Sun UltraSparc 900MHz 1MB
   uname -a: SunOS knox1 5.9 Generic_112233-11 sun4u sparc SUNW,Sun-Fire-280R
   f95 -V: Forte Developer 7 Fortran 95 7.0 2002/03/09
   f95 -fast -O4
   100 steps: 7599.0u 0.0s 2:07:16      09jan05
      ==> 75.99 CPUs/step in 2:07 hrs

agnesi.math.utk.edu:
   Dell ??? 2.2GHz Xeon, 4GB, 512KB cache
   Linux 2.4.9-31enterprise Red Hat Linux release 7.2
   Intel(R) Fortran Compiler for 32-bit applications, Version 6.0 Build 020312Z trial nov02
   ifc -O3 -mp1 -tpp7
   100 steps: 4114.890u 0.440s 1:08:34.60       13nov02
      ==> 41.15 CPUs/step in 1:08 hrs
   300 steps: 16192.93u 0.62s 4:29:53           13nov02
      ==> 53.98 CPUs/step in 4:30 hrs
   400 steps: 24590.580u 4.920s 6:49:55.30      14nov02
      ==> 61.48 CPUs/step in 6:50 hrs
   500 steps: 38043.360u 0.600s 10:34:04.97     15nov02
      ==> 76.09 CPUs/step in 10:34 hrs

cheetah.ccs.ornl.gov
   IBM pSeries System (p690)
   27 "Regatta" nodes, each with 32 processors on 16 chips
   CPU: 1.3 GHz Power4 processor, 64 KB L1 cache, 32 KB D-cache, 1.5 MB L2 cache
   estimated computational power 4.5 TeraFLOP/s
   OS: AIX 5.1.0.0
   uname -a: AIX cheetah0033 1 5 00207D8A4C00
   Fortran level: 7.1.1.3
   xlf_r -g -O4 -qnoipa -bmaxdata:0x40000000    run from GPFS area
   (default is 32bit, needs -bmaxdata for 1GB memory, faster than -q64)
   runs on single processor:
   100 steps: 3107.85u 0.64s 51:59       xlf_r -g -O4 -qnoipa -q64   16nov02
      ==> 31.08 CPUs/step in 52 min      slower than 32bit:
              2886.1u 0.8s 48:09         xlf_r -g -O4 -qnoipa        16nov02
      ==> 28.86 CPUs/step in 48 min
   300 steps: 11608.9u 0.9s 3:13:32      xlf_r -g -O4 -qnoipa        16nov02
      ==> 38.7 CPUs/step in 3:14 hrs
   400 steps: 18679.9u 0.9s 5:11:19      xlf_r -g -O4 -qnoipa        17nov02
      ==> 46.7 CPUs/step in 5:11 hrs
   500 steps: 30138.3u 1.6s 8:22:17      xlf_r -g -O4 -qnoipa        17nov02
      ==> 60.28 CPUs/step in 8:22 hrs

fubini.math.utk.edu:
   Dell ??? 3.06GHz Xeon, 4GB, 512KB cache
   Red Hat Linux release 9(Shrike)
   Intel(R) Fortran Compiler for 32-bit applications, Version 7.1 Build 20030909Z
   100 steps: 3359.830u 4.730s 56:04.80        ifc -O3 -tpp7    13oct03
   300 steps: 12871.430u 12.480s 3:34:45.47    ifc -O3 -tpp7    13oct03
   400 steps: 34414.940u 4.390s 9:33:44.05     ? g77 -O3 ?      15oct03

fatou.math.utk.edu:
   Dell dual P4 Xeon 3.06GHz, 512KB cache, 4GB mem
   Linux 2.4.20-30.9bigmem #1 SMP ; Red Hat Linux release 9(Shrike)
   Statically compiled (on ares-linux) with:
   Intel(R) Fortran Compiler Version 8.0 Build 20031231Z
   and ran on fatou:
   400 steps: 22318.670u 34.470s 6:12:38.22    ifc -O3 -tpp7    1jul04
      ==> 55.8 CPUs/step in 6:12 hrs
   500 steps: 33327.400u 40.560s 9:16:09.12    ifc -O3 -tpp7    2jul04
      ==> 66.65 CPUs/step in 9:16 hrs

abcd.math.vanderbilt.edu
   dual Intel Pentium 4 XEON 3.20GHz 512KB cache, 4GB mem
   uname -a: 2.4.9-e.3smp #1 SMP i686 unknown
   ifc -v: Version 8.0
   compiled with: ifc -O3 -tpp7 -w95 -FI
   100 steps: 5206.730u 0.610s 1:26:47.28      27oct04
      ==> 52.1 CPUs/step in 1:26 hrs
   300 steps: 23828.830u 0.770s 6:37:10.30     27oct04
      ==> 79.4 CPUs/step in 6:37 hrs
   400 steps: 30334.310u 0.510s 8:25:35.10     27oct04
      ==> 75.8 CPUs/step in 8:26 hrs
   500 steps: 29350.420u 0.460s 8:09:10.05     27oct04
      ==> 58.7 CPUs/step in 8:09 hrs

frodo.sinrg.cs.utk.edu
   64 node linux cluster
   dual AMD Opteron 240 1.4GHz 1024KB cache 2GB mem
   uname -a: Linux head 2.4.19-NUMA #1 SMP x86_64
   gcc -v: gcc version 3.2.2 (SuSE Linux)
   on head node, g77 -O3:
   100 steps: 3992.950u 43.300s 1:07:24.90     27oct04
      ==> 40.4 CPUs/step in 1:07 hrs
   300 steps: 15968.600u 15.180s 4:26:50.86    27oct04
      ==> 53.2 CPUs/step in 4:27 hrs
   400 steps: 22812.520u 29.470s 6:20:42.16    27oct04
      ==> 57.1 CPUs/step in 6:20 hrs
   500 steps: 32839.200u 0.320s 9:07:19.75     27oct04
      ==> 65.7 CPUs/step in 9:07 hrs

grig.sinrg.cs.utk.edu
   64 node linux cluster
   dual Intel Xeon 3.2GHz 1024KB cache 4GB mem
   uname -a: Linux grig-head 2.6.8 #1 SMP x86_64 GNU/Linux
   gcc -v: gcc version 3.3.5 (Debian 1:3.3.5-13)
   ran via PBS
   100 steps: walltime=00:43:18    g77 -O3    26mar07

hawk.csm.ornl.gov
   head node of 50 node linux cluster
   dual AMD Opteron 242 1.4GHz 1024KB cache 2GB mem
   uname -a: Linux

   g77 -v: gcc version 3.3.3 (SuSE Linux)
   g77 -O3 on hawk1 (head node) without prun:
   100 steps: 3400.614u 0.555s 0:56:44 99.9%         w/out prun      25jan05
      ==> 34.0 CPUs/step in 0:57 hrs
   g77 -O3 -fPIC -fno-automatic -finit-local-zero -Wno-globals
   100 steps: 3658.311u 0.388s 1:01:00.24 99.9%      prun:hawk31     29jan05
      ==> 36.6 CPUs/step in 1:01 hrs: slower than -O3 on hawk1
   300 steps: 16770.257u 17.298s 4:57:49.58 93.9%    -O3 w/out prun  25jan05
      ==> 55.9 CPUs/step in 4:58 hrs
   400 steps: 29192.212u 7.433s 8:06:54.38 99.9%     -O3 prun:hawk29 30jan05
      ==> 72.98 CPUs/step in 8:07 hrs
   500 steps: 32517.853u 17.562s 9:03:53.24 99.7%    -O3 w/out prun  25jan05
      ==> 65.0 CPUs/step in 9:04 hrs

   g77 -v: gcc version 3.4.2   <----- faster than 3.3.3 ?
   g77-3.4.2 -O3 -fPIC -fno-automatic -finit-local-zero -Wno-globals
   100 steps: 3633.325u 1.362s 1:00:42.14 99.7%      no prun:hawk1    1feb05
   g77-3.4.2 -O3 -fPIC   <---- best:
   100 steps: 3280.363u 0.706s 54:46.54 99.8%        no prun:hawk1    1feb05
      ==> 31.8 CPUs/step in 0:55 hrs

   ifort : version 8.1
   ifort -O3 on hawk1 (head node) without prun:
   100 steps: 3411.446u 1.140s 0:56:59 99.8%                         25jan05
      ==> 34.1 CPUs/step in 0:57 hrs
   ifort -O3 -fpic -save -w95 -FI
   100 steps: 4197.278u 1.916s 1:10:03.41 99.8%      prun: hawk30    29jan05
      ==> 42.0 CPUs/step in 1:10 hrs: slower than -O3 on hawk1
   300 steps: 16135.677u 14.059s 4:32:54.38 98.6%    -O4 w/out prun  30jan05
      ==> 53.8 CPUs/step in 4:33 hrs
   400 steps: 26026.244u 1.833s 7:13:53.04 99.9%     -O4 prun:hawk29 30jan05
      ==> 65.07 CPUs/step in 7:14 hrs
   500 steps: 33564.316u 1.672s 9:20:07.35 99.8%     -O4 prun:hawk30 30jan05
      ==> 67.13 CPUs/step in 9:20 hrs

   pathscale EKO Version 1.4
   gcc version 3.3.1 (PathScale 1.4 driver)
   pathf90 -Ofast -fpic -static-data -msse2 -Wno-globals
   100 steps: 6285.782u 0.649s 1:44:47.61 99.9%      prun: hawk16    29jan05
      ==> 62.9 CPUs/step in 1:45 hrs:
   pathf90 -Ofast -fpic -static-data -msse2 -mtune=opteron
   100 steps: 6486.800u 6.292s 1:48:48.50 99.4%                      29jan05
      ==> 64.9 CPUs/step in 1:49 hrs: worse with -mtune !
   pathf90 -Ofast   <-------- fastest for pf90, rest with -Ofast:
   100 steps: 5326.438u 4.774s 1:30:34.04 98.1%
      ==> 53.3 CPUs/step in 1:31 hrs
   300 steps: 23438.313u 18.280s 6:34:31.39 99.0%
      ==> 78.1 CPUs/step in 6:35 hrs
   400 steps: 31791.565u 0.343s 8:50:31.37 99.8%     prun:hawk31     30jan05
      ==> 79.48 CPUs/step in 8:51 hrs
   500 steps: 53463.715u 14.770s 14:52:43.87 99.8%   w/out prun      28jan05
      ==> 106.9 CPUs/step in 14:53 hrs

zeus.math.utk.edu
   9+headnode Opteron 252 linux cluster
   dual AMD Opteron 252 2.6GHz 1024KB cache 2GB mem
   uname -a: Linux 2.6.12-1.1381_FC3smp x86_64 GNU/Linux

   g77 -v: gcc version 3.4.4 20050721 (Red Hat 3.4.4-2)
   g77 -O3 on master or single nodes
   100 steps: 2151.284u 0.938s 35:53.67 99.9%                13apr06
      ==> 21.5 CPUs/step in 0:36 hrs    41% faster than fatou ifc
   300 steps: 10141.622u 0.503s 2:49:05.80                   13apr06
      ==> 33.8 CPUs/step in 2:49 hrs
   400 steps: 15446.104u 0.690s 4:17:32.58    slower than on 100steps    13apr06
      ==> 38.6 CPUs/step in 4:18 hrs    31% faster than fatou ifc
   500 steps: 22344.229u 0.746s 6:12:32.97                   13apr06
      ==> 44.7 CPUs/step in 6:13 hrs    33% faster than fatou ifc

   pgf95 -V: pgf95 6.1-3 64-bit target on x86-64 Linux       13apr06
   pgf95 -fast -O3 on master:    40% faster than fatou ifc !
   100 steps: 2194.774u 0.793s 36:38.14
      => 21.9 CPUs/step in 0:37 hrs
   pgf95 -fast -O3 -fastsse on n01:
   100 steps: 2202.048u 0.304s 36:43.15
      => 22.0 CPUs/step in 0:37 hrs
   pgf95 -fast -O3 -fastsse -Mconcur (with: setenv NCPUS 2) on n02:
   100 steps: 4327.326u 0.550s 36:05.19 199.8%
      => 43.3 CPUs/step 0:36 hrs
   pgf95 -fast -O3 -fastsse on single nodes:
   300 steps: 9390.347u 0.562s 2:36:34.67
      ==> 31.3 CPUs/step 2:37 hrs    7.4% faster than zeus g77
   400 steps: 13901.953u 0.580s 3:51:47.79    38% faster than fatou ifc
      ==> 34.8 CPUs/step 3:52 hrs    9.8% faster than zeus g77
   500 steps: 21353.843u 0.796s 5:56:02.18    36% faster than fatou ifc
      ==> 42.7 CPUs/step 5:56 hrs    4.5% faster than g77
              21207.959u 115.662s 5:55:31.74  on n02
      ==> 42.4 CPUs/step 5:56 hrs    4.5% faster than g77

oic.ornl.gov
   325 node Xeon linux cluster
   dual Intel Xeon 3.4GHz 2048KB cache 4GB mem
   uname -a: Linux b06l02 2.6.9-22.0.2.ELsmp #1 SMP x86_64 GNU/Linux
   ? gigabit interconnect, PBS(torque)scheduler, Maui mgr , MPICH
   ifort -V: Intel(R) Fortran Compiler for Intel(R) EM64T-based applications,
             Version 9.0 Build 20051201
   on headnode: /opt/intel/fce/9.0/bin/ifort -fast
   100 steps: 2722.722u 0.848s 45:24.70        5may06
      ==> 27.2 CPUs/step in 0:45 hrs
   300 steps: 14710.469u 3.493s 4:05:19.39
      ==> 49.03 CPUs/step in 4:05 hrs
   400 steps:
      ==>       CPUs/step in      hrs
   500 steps:
      ==>       CPUs/step in      hrs
   on a node via 'qsub PBSscript':
   /opt/mpich-ch_p4-icc-1.2.7/bin/mpif90 -fast -save -w95 -FI
   100 steps:                                  5may06
      ==> 27.2 CPUs/step in 45min

newton.usg.utk.edu
   head of 36-node Xeon linux cluster
   32 compute nodes: dual Xeon 3.2GHz
   uname -a: Linux 2.6.9-11.ELsmp #1 SMP x86_64 x86_64 GNU/Linux
   g77 -v: gcc version 3.4.3 20050227 (Red Hat 3.4.3-22.1)
   g77 -O3 -finit-local-zero -Wno-globals      5may06
   ifort in /opt/intel/fce/9.0/bin/ifort:
   Intel(R) Fortran Compiler for Intel(R) EM64T-based v 9.0 Build 20050809
   runs on headnode:
   100 steps: 1857.528u 0.585s 30:59.99        5may06
      ==> 18.6 CPUs/step in 0:31 hrs
   300 steps: 7669.275u 1.260s 2:07:58.78      5may06
      ==> 25.6 CPUs/step in 2:08 hrs
   400 steps: 11916.587u 2.222s 3:18:52.91     5may06
      ==> 29.8 CPUs/step in 3:19 hrs
   500 steps: 19029.789u 3.392s 5:17:29.36     6may06
      ==> 38.1 CPUs/step in 5:17 hrs
   /opt/mpich/intel/bin/mpif90 -fast -save -w95 -FI
   /opt/mpich/intel/bin/mpirun -np ...         5may06

tiger.ornl.gov (head of 72-node Cray XD1 linux cluster)
   70 compute nodes: dual Opteron 248, 8GB memory
   Cray RapidArray Interconnect (Hypertransport).
   LSS synchronizes nodes with global clock and co-schedules
   processes to avoid latency in global communication.
   Linux ch328-n6 2.6.5_H_01_04 #39 SMP x86_64 x86_64 GNU/Linux

   pgf95 -V: pgf95 7.0-2 64-bit target on x86-64 Linux
   pgf95 -fast -O3 -fastsse                    dec07
   100 steps: 2367.934u 1.985s 39:33.49        => 23.7 CPUs/step 40 min
   200 steps: 5033.659u 5.028s 1:24:11.28      => 25.2 CPUs/step 1:24 hrs
   300 steps: 9000.059u 6.429s 2:30:23.08      => 30.0 CPUs/step 2:30 hrs
   400 steps: 14897.405u 10.201s 4:08:55.42    => 37.2 CPUs/step 4:09 hrs
   500 steps: 21475.207u 14.057s 5:58:52.68    => 42.95 CPUs/step 5:59 hrs

   g77 -v: gcc version 3.3.3 (SuSE Linux)
   g77 -O3 -Wno-globals -funroll-loops         dec07
   100 steps: 2397.798u 1.850s 40:03.65        => 24.0 CPUs/step 40 min
   300 steps: 10055.587u 6.698s 2:47:57.66     => 33.5 CPUs/step 2:48 hrs
   400 steps: 14204.340u 9.494s 3:57:16.88     => 35.5 CPUs/step 3:57 hrs
   500 steps: 23164.182u 15.356s 6:26:59.00    => 46.3 CPUs/step 6:27 hrs

zeus.math.utk.edu (head of 52-cpu Linux cluster) installed aug2009
   head+2 nodes of dual Quad-Core AMD Opteron 2376 2.3GHz 2GB/node
   plus 15 dual Opteron 252 nodes 2GB/node
   uname -a: head.bw01.math.utk.edu 2.6.18-128.2.1.el5 #1 SMP x86_64
   ifort -V: Version 11.1 Build 20090630 ID: l_cprof_p_11.1.046
   ifort -fast -O3 on 1 cpu of head            feb10
   100 steps: 1502.461u 0.162s 25:02.91        => 15.0 CPUs/step 25 min
   300 steps: 6795.622u 0.389s 1:53:16.53      => 22.6 CPUs/step 1:53 hrs
              PBS resources_used.mem=261756kb, vmem=591724kb
   400 steps: 10194.112u 0.427s 2:49:55.57     => 25.5 CPUs/step 2:50 hrs
   500 steps: 14785.711u 0.273s 4:06:28.02     => 29.6 CPUs/step 4:06 hrs

midtown.uthsc.edu (head of 56-cpu Linux cluster) installed nov2009
   7 nodes of dual Quad-Core AMD Opteron 2376 2.3GHz 2GB/node
   uname -a: midtown.bw01.uthsc.edu 2.6.18-164.9.1.el5 #1 SMP x86_64
   ifort -V: Version 11.1 Build 20091130 ID: l_cprof_p_11.1.064
   ifort -fast -O3 on 1cpu of a node           feb10
   100 steps: 1545.806u 0.155s 25:46.20        => 15.0 CPUs/step 25 min
   200 steps: 3573.607u 0.383s 59:35.62        => 17.9 CPUs/step in 1:00 hrs
   300 steps: 6253.090u 1.014s 1:44:15.25      => 20.8 CPUs/step 1:44 hrs
   400 steps: 9273.045u 0.260s 2:34:34.12      => 23.2 CPUs/step 2:35 hrs
   500 steps: 13772.764u 0.300s 3:49:34.35     => 27.5 CPUs/step 3:50 hrs

....30 apr 2011....
ares.math.utk.edu
   64bit dual-core AMD Opteron 2220 x86_64 (Fedora 14)
   uname -a: 2.6.35.12-88.fc14.x86_64 #1 SMP x86_64 GNU/Linux
   gfortran -O3 (gcc 4.5.1)
   100 steps: 1877.161u 0.890s 31:45.09        => 18.8 CPUs/step in 32 min
   200 steps: 4355.763u 0.786s 1:13:36.91      => 21.8 CPUs/step in 1:14 hrs
   300 steps: 7974.257u 1.529s 2:14:51.88      => 26.6 CPUs/step in 2:15 hrs
   400 steps:                                  =>      CPUs/step in      hrs
   500 steps: 18627.188u 3.381s 5:15:04.08     => 37.5 CPUs/step in 5:15 hrs

....30 apr 2011....
ThinkPad X201
   32bit Intel Core i5 CPU M 520 @ 2.40GHz i686 2-core
   uname -a: 2.6.34.8-68.fc13.i686 #1 SMP i686 GNU/Linux
   gfortran -O3 (gcc 4.4.5)
   100 steps: 1086.187u 1.822s 18:08.36        => 10.9 CPUs/step in 18 min
   200 steps: 2966.126u 10.776s 49:38.23       => 14.8 CPUs/step in 50 min
   300 steps: 4020.415u 3.957s 1:27:21.11      => 13.4 CPUs/step in 1:27 hrs
   400 steps: 8379.494u 13.460s 2:20:28.24     => 20.9 CPUs/step in 2:20 hrs
   500 steps: 11415.944u 24.757s 17:40:21.74   => 22.8 CPUs/step in 3:10 hrs

....3 feb 2012....
frost.ccs.ornl.gov (node of 2048 core Linux cluster) 3feb2012
   SGI Altix ICE 8200 cluster, 128 nodes x16=2048 cores, 24GB mem
   Intel Xeon CPU E5440 @ 2.83GHz 4GB
   uname -a: Linux frost3 2.6.18-128.7.1.el5 #1 SMP x86_64 GNU/Linux
   Red Hat Enterprise Linux Server release 5.3 (Tikanga)
   ifort -fast
   100 steps: 1075.660u 0.531s 17:56.71        => 10.8 CPUs/step 18 min
   200 steps: 2377.953u 0.461s 39:39.18        => 11.9 CPUs/step in 40 min
   300 steps: 4253.870u 0.209s 1:10:55.38      => 14.2 CPUs/step in 1:11 hrs
   400 steps: 6485.084u 0.458s 1:48:06.69      => 16.2 CPUs/step in 1:48 hrs
   500 steps: 9637.637u 0.379s 2:40:45.71      => 19.3 CPUs/step in 2:41 hrs

householder.math.utk.edu (20 core cluster) (Fedora19) 26may2014
   Two 10 core Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz 192GB mem
   (Huge shared memory, no MPI, no scheduler, just the OS decides)
   uname -a: 3.14.4-100.fc19.x86_64 #1 SMP x86_64 GNU/Linux
   gfortran -O3 (gcc 4.8.2)
   100 steps: 485.462u 0.443s 8:07.22          => 4.8 CPUs/step in 8 min
   300 steps: 1981.121u 0.944s 33:07.25        => 6.0 CPUs/step in 33 min
   400 steps: 3064.296u 1.359s 51:13.86        => 7.7 CPUs/step in 51 min
   500 steps: 5047.879u 2.180s 1:24:25.39      => 10.1 CPUs/step in 1:24 hrs
   running 4 concurrent 200-step jobs (in different directories):
      1197.249u 0.769s 20:01.92,  1194.257u 0.868s 19:59.24,
      1188.664u 1.166s 19:53.68,  1208.337u 0.814s 20:13.14
      so they actually ran a bit faster than the one alone!
   running 4 concurrent 400-step jobs (in different directories):
      3423.242u 5.039s 57:22.24,  3308.207u 2.660s 55:25.01,
      3307.878u 3.612s 55:25.14,  3670.192u 2.316s 1:01:32.57
      all slower than the single run, one much slower (20%)!
      So performance IS affected by how many jobs are running.

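The 20% figure for the concurrent 400-step jobs on householder can be checked against the single 400-step run. The sketch below is only an illustration in Python (not part of the benchmark); the function name `slowdown_percent` is hypothetical, and it assumes the comparison is made on user CPU seconds.

```python
def slowdown_percent(single_cpu_sec, concurrent_cpu_sec):
    """Percent by which a concurrent run used more CPU time than the single run."""
    return 100.0 * (concurrent_cpu_sec / single_cpu_sec - 1.0)

# User CPU seconds copied from the householder notes (400 steps).
single_400 = 3064.296
concurrent_400 = [3423.242, 3308.207, 3307.878, 3670.192]

for cpu_sec in concurrent_400:
    print("%.3fu is %.1f%% slower than the single run"
          % (cpu_sec, slowdown_percent(single_400, cpu_sec)))
```

The slowest of the four jobs comes out just under 20% slower, consistent with the note above.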
mars.math.utk.edu (4 core ) (Fedora21) Dell Optiplex ???? 26jun2015
   Two 2 core Intel(R) Xeon(R) CPU i7-4790 CPU @ 3.60GHz 24.6GB mem
   uname -a: 4.0.4-201.fc21.x86_64 #1 SMP x86_64 GNU/Linux
   gfortran -O3 (gcc 4.9.2-6)
   100 steps: 346.288u 0.142s 5:46.65          => 3.5 CPUs/step in 6 min
   300 steps: 1452.034u 0.176s 24:12.83        => 4.8 CPUs/step in 24 min
   400 steps: 2274.902u 0.390s 37:56.25        => 5.7 CPUs/step in 38 min
   500 steps: 3475.478u 0.528s 57:57.41        => 6.95 CPUs/step in 58 min

ares.math.utk.edu (4 core ) (Fedora21) Dell Optiplex 9020 26jun2015
   Two 2 core Intel(R) Xeon(R) CPU i7-4790 CPU @ 3.60GHz 16.4GB mem
   uname -a: 4.0.8-200.fc21.x86_64 #1 SMP x86_64 GNU/Linux
   gfortran -O3 (gcc 4.9.2-6)
   100 steps: 367.116u 0.056s 6:07.31          => 3.7 CPUs/step in 6 min
   300 steps: 1553.037u 0.059s 25:53.67        => 5.2 CPUs/step in 26 min
   400 steps: 2455.629u 0.071s 40:56.62        => 6.1 CPUs/step in 41 min
   500 steps: 3724.907u 0.287s 1:02:11.72      => 7.4 CPUs/step in 62 min

darter.nics.tennessee.edu
   Cray XC30 (Cascade) supercomputer
   724 compute nodes, each with 16 cores, 32 GB of memory.
   Cores: 2.6 GHz 64bit Intel XEON E5-2600
   Peak performance of 240.9 TF
   Cray Aries router (8GB/sec bandwidth)
   torque/4.2.9 , moab/7.2.9 scheduler, PBS
   runs with module PrgEnv-cray/5.2.40: crayftn mpich OMP
   ftn -o melt-cray.x -ffixed meltflowbnch.f (without OMP)
   100 steps: 460u 00:07:40                    => 4.6 CPUs/step in 7.7 min
   400 steps: 2971u 00:49:52                   => 7.4 CPUs/step in 50 min
   ftn -o melt-cray.x -ffixed -homp meltflowbnch.f (with OMP)
   100 steps: 459u 00:07:39                    => 4.6 CPUs/step in 7.65 min
   runs with module PrgEnv-intel/5.2.40: ifort mpich OMP
   ftn -o melt-intel.x -fixed -fast -openmp meltflowbnch.f (OMP)
   400 steps: 3290u 00:54:50                   => 8.2 CPUs/step in 55 min
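
Most entries above quote the shell's `time` output (user seconds `u`, system seconds `s`, elapsed time) followed by a derived "CPUs/step" value, which in most entries appears to be the user CPU seconds divided by the number of time steps. The following Python sketch shows that arithmetic; it is an illustration only, the names `elapsed_seconds` and `summarize` are hypothetical, and it assumes the line starts with the `...u ...s elapsed` fields as in the runs listed here.

```python
import re

def elapsed_seconds(text):
    """Convert an elapsed time such as 'h:mm:ss', 'mm:ss' or 'mm:ss.ff' to seconds."""
    total = 0.0
    for part in text.split(":"):
        total = total * 60 + float(part)
    return total

def summarize(time_line, steps):
    """Return (user CPU seconds per step, wall time as 'h:mm') for one run."""
    m = re.match(r"\s*([\d.]+)u\s+([\d.]+)s\s+([\d:.]+)", time_line)
    if m is None:
        raise ValueError("unrecognized time output: %r" % time_line)
    user = float(m.group(1))
    # Round the elapsed time to the nearest whole minute for the hh:mm column.
    minutes = int(elapsed_seconds(m.group(3)) / 60 + 0.5)
    return round(user / steps, 2), "%d:%02d" % divmod(minutes, 60)

# The colt 100-step run with f90 5.3, copied from the notes above.
print(summarize("6588.46u 1.16s 1:49:54 99% 0+2709k", 100))
```

For the colt line this reproduces the 65.88 CPU seconds per step quoted in its entry. A few entries (frodo, for example) seem to include system time as well, so small differences from the quoted values are possible.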

How to find out specs
  • OS, hostname, etc: uname -a
  • CPU, cache:
      linux  : more /proc/cpuinfo
      alpha  : psrinfo -v
      solaris: /opt/SUNWspro/bin/fpversion
      irix64 : hinv | grep -e MHZ -e cache
      aix    : sysinfo | grep cache
  • memory :
      linux  : cat /proc/meminfo
      alpha  : ulimit -a | grep memory
      solaris: /usr/sbin/prtconf | grep -i memory
      irix64 : hinv | grep memory
      aix    : sysinfo | grep memory
  • compiler :
      linux  : f77 -v , pgf90 -V , gcc -v
      alpha  : f95 -version ; cc -V; cxx -V
      solaris: f95 -V ; /opt/SUNWspro/bin/cc -V
      irix64 : f90 -version
      aix    : sysinfo | grep xlf
             : lslpp -i | grep xlf
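
On Linux, the CPU and memory lookups above can be scripted. The Python sketch below is one possible way to do it, not part of the benchmark; the helper names `cpu_summary` and `mem_total_mb` are hypothetical. It assumes the x86 field names `model name`, `cpu MHz` and `cache size` in /proc/cpuinfo (other architectures use different fields) and the `MemTotal` line, reported in kB, in /proc/meminfo.

```python
import os
import platform

def cpu_summary(cpuinfo_text):
    """Return (model name, MHz, cache size) for the first CPU in /proc/cpuinfo text."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, value = line.split(":", 1)
        fields.setdefault(key.strip(), value.strip())
    return fields.get("model name"), fields.get("cpu MHz"), fields.get("cache size")

def mem_total_mb(meminfo_text):
    """Return total memory in MB from /proc/meminfo text, or None if not found."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) // 1024  # value is given in kB
    return None

if __name__ == "__main__":
    print(" ".join(platform.uname()))  # roughly what 'uname -a' reports
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print(cpu_summary(f.read()))
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(mem_total_mb(f.read()), "MB")
```

The parsing functions take the file contents as text, so they can be tried on a saved copy of /proc/cpuinfo from another machine.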

Other benchmarking pages

  • FsPx Benchmark: alloy solidification
  • Retina-MPI Benchmark: Phototransduction in Retinal Rod Cells
  • BenchWeb at netlib
  • MDBNCH: A molecular dynamics benchmark

    © 2001-2015  V. Alexiades     alexiades@utk.edu                 Last Updated:   19 Sep 2015