Performance of the hybrid MPI/OpenMP version of the HERACLES code on the Curie "Fat nodes" system
Edouard Audit, Matthias Gonzalez, Pierre Kestener and Pierre-François Lavallée
SIAM meeting, Savannah, February 2012
The HERACLES code
(Magneto)hydrodynamics: finite volume, 2nd-order Godunov, explicit or implicit
Multigroup radiative transfer: moment method, implicit
Gravity, fully coupled to hydro / split
Thermochemistry and/or heating/cooling function (local)
Turbulent forcing (local)
Fixed-grid finite-volume code working in 1, 2, and 3D in Cartesian, cylindrical and spherical coordinates. Fortran + MPI, domain decomposition.
Used in astrophysics (star formation, interstellar medium studies, …) and to interpret laser-generated plasma experiments.
Domain Decomposition
[Figure: the computational domain split into subdomains, one per MPI process]
Domain Decomposition
[Figure: ghost zones filled either from physical boundary conditions or by communications with neighboring MPI processes]
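HERACLES handles this decomposition internally; for illustration, a minimal sketch of how such a block decomposition can be set up with MPI's Cartesian topology routines in Fortran (program and variable names are illustrative, not HERACLES routines):

  program decomp_sketch
    use mpi
    implicit none
    integer :: ierr, rank, nprocs, cart_comm, left, right
    integer :: dims(3), coords(3)
    logical :: periods(3)

    call MPI_Init(ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

    dims = 0                                  ! let MPI pick a balanced 3D grid
    call MPI_Dims_create(nprocs, 3, dims, ierr)
    periods = .false.                         ! physical (non-periodic) boundaries
    call MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, .true., cart_comm, ierr)
    call MPI_Comm_rank(cart_comm, rank, ierr)
    call MPI_Cart_coords(cart_comm, rank, 3, coords, ierr)

    ! Neighbours along x; MPI_PROC_NULL at physical boundaries, so the
    ! same ghost-cell exchange code works for interior and edge ranks.
    call MPI_Cart_shift(cart_comm, 0, 1, left, right, ierr)

    call MPI_Finalize(ierr)
  end program decomp_sketch

With MPI_PROC_NULL neighbors, the same exchange code covers both cases shown in the figure: physical boundaries and inter-process communications.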
The HERACLES code
Read simulation parameters
Split domain over the MPI processes
Initial conditions
Loop over time:
  Fill the ghost cells: boundary conditions or communications   (not multi-threaded)
  Compute time step                                             (OpenMP)
  Hydro step:
    Loop over chunks                                            (OpenMP)
      Loop over cells (slopes, Riemann solver, …)
  Compute cooling (local)                                       (OpenMP)
  Stirring (local)                                              (OpenMP)
  Output
End
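For illustration, a minimal Fortran/OpenMP sketch of the multi-threaded chunk loop above, where threads share whole chunks and the per-cell work (slopes, Riemann solver, …) lives in a kernel routine; all names here are hypothetical stand-ins, not HERACLES code:

  program hybrid_sketch
    implicit none
    integer, parameter :: nchunks = 64, n = 1000
    real(8) :: u(n, nchunks)
    integer :: ic

    u = 1.0d0
    ! Each thread processes whole chunks independently; the loop over
    ! cells is hidden inside the per-chunk kernel.
    !$omp parallel do schedule(dynamic)
    do ic = 1, nchunks
       call update_chunk(u(:, ic))
    end do
    !$omp end parallel do
    print *, 'u(1,1) =', u(1, 1)

  contains

    subroutine update_chunk(col)            ! toy stand-in for the hydro kernel
      real(8), intent(inout) :: col(:)
      col = 0.5d0 * (col + cshift(col, 1))  ! trivial stencil-like update
    end subroutine update_chunk

  end program hybrid_sketch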
Pure MPI vs. MPI/OpenMP
[Figure: the same domain split among 16 MPI processes (pure MPI) or 4 MPI processes with 4 threads each]
Pure MPI: 16 messages of size 1. MPI + 4 threads: 4 messages of size 2.
With threads, fewer MPI subdomains remain, so fewer but larger messages cross the network: less latency overhead and less total ghost-cell traffic.
The Curie system
Fat nodes (February 2011): 360 BullX S6010, Intel Nehalem-EX at 2.26 GHz, 11,520 cores (32 cores/node), 128 GB/node, 105 TFlops
Thin nodes (March 2012): 5,040 BullX B510, Intel Sandy Bridge, 80,640 cores (16 cores/node), 4 GB/core, 128 GB SSD, 1.5+ PFlops
Hybrid nodes (October 2011): 144 BullX B505 with 288 Nvidia M2090 GPUs, 184 + 11 TFlops
Interconnect: InfiniBand QDR
Storage (1st level): Lustre, 6 PB, 150 GB/s
Strong Scaling (900³ run)
[Plot: performance vs. number of cores for pure MPI and for 2, 4, and 8 threads per MPI process]
Weak scaling (256³ per node, 32 cores)
[Plot: performance vs. number of cores for pure MPI and for 2, 4, and 8 threads per MPI process]
Scaling on the Blue Gene at IDRIS (strong scaling)
IO – the craftsman way
All processes write their output at the same time…
This fails beyond a few thousand (10³) processes.
Remedy: write by packets, with a wait between packets (temporization):
Ncpu_write ~ 100–1000, T_wait ~ 2–10 seconds
One output costs about 5 time steps.
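A minimal sketch of this write-by-packet idea, assuming a simple rank-ordered grouping and a busy-wait temporization; the actual HERACLES implementation may differ:

  program packet_io_sketch
    use mpi
    implicit none
    integer, parameter :: ncpu_write = 128    ! ~100-1000 on Curie
    real(8), parameter :: t_wait = 2.0d0      ! ~2-10 s between packets
    integer :: ierr, rank, nprocs, packet, npackets
    character(len=32) :: fname

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
    npackets = (nprocs - 1) / ncpu_write + 1

    do packet = 0, npackets - 1
       if (packet == rank / ncpu_write) then  ! this rank's turn to write
          write(fname, '(a,i6.6,a)') 'out_', rank, '.dat'
          open(10, file=fname, form='unformatted')
          write(10) rank                      ! stands in for the real output
          close(10)
       end if
       call pause_for(t_wait)                 ! temporization between packets
    end do

    call MPI_Finalize(ierr)

  contains

    subroutine pause_for(t)                   ! busy wait; sleep() would also do
      real(8), intent(in) :: t
      real(8) :: t0
      t0 = MPI_Wtime()
      do while (MPI_Wtime() - t0 < t)
      end do
    end subroutine pause_for

  end program packet_io_sketch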
IO – the professional approach (P. Wautelet and P. Kestener)
Four different I/O approaches were tested: POSIX (one file per MPI process), MPI-IO, HDF5, Parallel-NetCDF.
STEP 1: Optimizing the MPI-IO hints. MPI-IO hints can have a dramatic effect on I/O performance, and the best parameters depend on the application: 7 of the 23 available hints were tested! (A sketch of setting hints through MPI_Info follows below.)
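For illustration, a hedged sketch of how MPI-IO hints are passed through an MPI_Info object at file-open time; the hint names are standard ROMIO/Lustre hints, but the values shown are illustrative, not the tuned Curie settings:

  program mpiio_hints_sketch
    use mpi
    implicit none
    integer :: ierr, info, fh

    call MPI_Init(ierr)
    call MPI_Info_create(info, ierr)
    ! Standard ROMIO / Lustre hints; values here are illustrative only.
    call MPI_Info_set(info, 'romio_cb_write',  'enable',   ierr)  ! collective buffering
    call MPI_Info_set(info, 'cb_buffer_size',  '16777216', ierr)  ! 16 MB aggregation buffer
    call MPI_Info_set(info, 'striping_factor', '64',       ierr)  ! Lustre stripe count
    call MPI_File_open(MPI_COMM_WORLD, 'out.dat', &
                       MPI_MODE_WRONLY + MPI_MODE_CREATE, info, fh, ierr)
    ! ... collective writes (MPI_File_set_view / MPI_File_write_all) here ...
    call MPI_File_close(fh, ierr)
    call MPI_Info_free(info, ierr)
    call MPI_Finalize(ierr)
  end program mpiio_hints_sketch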
STEP 2: Strong scaling test.
Conclusions
Multi-threading is necessary for large numbers of cores.
OpenMP is "easy" to implement but not always easy to understand…
Multi-threaded communications are probably necessary.
Good results for a small number of threads.