Performance of the hybrid MPI/OpenMP version of the HERACLES code on the Curie « Fat nodes » system

Edouard Audit, Matthias Gonzalez, Pierre Kestener and Pierre-François Lavallé

SIAM meeting, Savannah, February 2012

The HERACLES code

(Magneto)hydrodynamics: finite volume, 2nd-order Godunov, explicit or implicit

Multigroup radiative transfer: moment method, implicit

Gravity: fully coupled to the hydro, or split

Thermochemistry and/or heating/cooling function (local)

Turbulent forcing (local)

Fixed-grid finite-volume code working in 1, 2 and 3D, in Cartesian, cylindrical and spherical coordinates. Fortran + MPI, domain decomposition.

Used in astrophysics (star formation, interstellar medium studies, …) and to interpret laser-generated plasma experiments.

Domain Decomposition

[Figure: the computational domain is split into subdomains, each owned by one MPI process. The edges of a subdomain are either physical boundaries or communications with a neighbouring process.]
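As a rough illustration of the decomposition pictured above, the Python sketch below computes the index range owned by one process along each axis and classifies each face of its subdomain as a physical boundary or a communication. The function names, the block-distribution rule and the non-periodic boundaries are assumptions made for this example; the slides do not describe how HERACLES implements the split.

```python
def local_range(n_cells, n_procs, coord):
    """Half-open index range [start, stop) owned by process `coord` along one axis.

    Assumed rule: block distribution, where the first (n_cells % n_procs)
    processes receive one extra cell.
    """
    base, extra = divmod(n_cells, n_procs)
    start = coord * base + min(coord, extra)
    stop = start + base + (1 if coord < extra else 0)
    return start, stop


def face_types(coords, proc_grid):
    """Classify the low and high face of a subdomain along each axis.

    Assumes non-periodic boundaries: a face on the edge of the process
    grid is a physical boundary, any other face needs communication.
    """
    faces = []
    for c, p in zip(coords, proc_grid):
        low = "physical boundary" if c == 0 else "communication"
        high = "physical boundary" if c == p - 1 else "communication"
        faces.append((low, high))
    return faces


if __name__ == "__main__":
    grid = (8, 8)        # global number of cells per axis (illustrative)
    proc_grid = (2, 2)   # four processes in a 2 x 2 layout (illustrative)
    coords = (0, 1)      # position of one process in the process grid
    faces = face_types(coords, proc_grid)
    for axis, (n, p, c) in enumerate(zip(grid, proc_grid, coords)):
        print(axis, local_range(n, p, c), faces[axis])
```

In an MPI code the same information would normally come from a Cartesian communicator; the hand-written version here only makes the bookkeeping visible.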

The HERACLES code

Read simulation parameters
Split domain over the MPI processes
Initial conditions
Loop over time
    Fill the ghost cells: boundary conditions or communications
    Compute time step
    Hydro step
        Loop over chunks
            Loop over cells (slope, Riemann solver, …)
    Compute cooling (local)
    Stirring (local)
    Output
End

[Slide annotations attached to individual steps of the loop: "OpenMP" (four times) and "Not multi-threaded" (once).]
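The control flow listed above can be mimicked with a toy problem. The sketch below is not HERACLES: it replaces the 2nd-order Godunov hydro step with first-order upwind advection of a scalar in 1D, uses periodic ghost cells in place of MPI communications, and uses a Python thread pool as a stand-in for an OpenMP loop over chunks (it shows the structure only and gives no real speed-up in CPython). All names and parameter values are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

NGHOST = 1          # one ghost cell per side is enough for first-order upwind
VELOCITY = 1.0      # constant advection speed (assumed positive)
CFL = 0.5


def fill_ghost_cells(u):
    """Periodic boundaries; a parallel code would exchange messages here."""
    u[0] = u[-2]
    u[-1] = u[1]


def update_chunk(u, u_new, start, stop, dt, dx):
    """Loop over the cells of one chunk (first-order upwind update)."""
    for i in range(start, stop):
        u_new[i] = u[i] - VELOCITY * dt / dx * (u[i] - u[i - 1])


def run(n_cells=16, n_steps=10, n_chunks=4, n_threads=4):
    dx = 1.0 / n_cells
    # Initial conditions: a square pulse, stored with ghost cells at both ends.
    u = [0.0] * (n_cells + 2 * NGHOST)
    for i in range(NGHOST, NGHOST + n_cells // 4):
        u[i] = 1.0
    # Chunk boundaries over the interior cells.
    bounds = [NGHOST + k * n_cells // n_chunks for k in range(n_chunks + 1)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for _ in range(n_steps):                    # loop over time
            fill_ghost_cells(u)                     # boundaries / communications
            dt = CFL * dx / abs(VELOCITY)           # compute time step
            u_new = list(u)
            jobs = [pool.submit(update_chunk, u, u_new, a, b, dt, dx)
                    for a, b in zip(bounds, bounds[1:])]   # loop over chunks
            for job in jobs:
                job.result()                        # wait for every chunk
            u = u_new
    return u[NGHOST:-NGHOST]
```

Each chunk reads only the old array and writes a disjoint slice of the new one, so the chunks can be processed in any order or concurrently without changing the result.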

Pure MPI vs MPI/OpenMP

MPI: 16 messages of size 1

MPI + 4 threads: 4 messages of size 2
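One way to read the numbers on this slide is sketched below in Python. It assumes that 16 cores cover a square 2D domain, that each MPI process owns a square block of cores (one core per thread), and that each process sends one message per exchange direction whose size is proportional to the edge length of its block. These assumptions and the function name are illustrative; the slide only gives the resulting counts.

```python
import math


def messages_per_exchange(n_cores, threads_per_process):
    """Return (number of messages, relative message size) for one direction.

    Assumptions: square 2D layout, one thread per core, one message per
    process and per direction, message size proportional to the edge
    length of the block of cores owned by the process.
    """
    n_procs, remainder = divmod(n_cores, threads_per_process)
    edge = math.isqrt(threads_per_process)
    if remainder or edge * edge != threads_per_process:
        raise ValueError("sketch only handles square blocks that divide n_cores")
    return n_procs, edge


if __name__ == "__main__":
    for threads in (1, 4):
        count, size = messages_per_exchange(16, threads)
        print(f"{threads} thread(s): {count} messages of size {size}")
```

Under these assumptions the hybrid version sends fewer but larger messages, and the total volume exchanged per direction drops from 16 to 8 units because edges interior to a process no longer require a message.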

The Curie system

Fat nodes: 360 BullX S6010
    Intel NH EX (Nehalem-EX), 2.26 GHz
    11 520 cores, 32 cores/node, 128 GB/node
    105 TFlops

Thin nodes: 5040 BullX B510
    Intel new generation (SNB)
    80 640 cores, 16 cores/node, 4 GB/core, 128 GB SSD
    1.5+ PFlops

Hybrid nodes: 144 BullX B505
    288 Nvidia M2090
    184 + 11 TFlops

Interconnect: InfiniBand QDR

1st-level Lustre: 6 PB, 150 GB/s

Dates shown on the slide: February 2011, March 2012, October 2011

Strong scaling (900^3 run)

[Figure: strong-scaling curves for pure MPI and for 2, 4 and 8 threads]

Weak scaling: 256^3 per node (32 cores)

[Figure: weak-scaling curves for pure MPI and for 2, 4 and 8 threads]
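The weak-scaling setup fixes the work per node, so the thread count only changes how that work is divided among MPI processes. The short Python sketch below works out the division, assuming that 256^3 counts grid cells, that every core runs exactly one thread, and that the cells of a node are shared evenly among its processes; the function name is made up for this example.

```python
CELLS_PER_NODE = 256 ** 3   # problem size per node, from the slide
CORES_PER_NODE = 32         # cores per Curie fat node, from the slide


def layout(threads_per_process):
    """Return (MPI processes per node, cells per MPI process)."""
    if CORES_PER_NODE % threads_per_process:
        raise ValueError("thread count must divide the cores per node")
    procs_per_node = CORES_PER_NODE // threads_per_process
    return procs_per_node, CELLS_PER_NODE // procs_per_node


if __name__ == "__main__":
    for threads in (1, 2, 4, 8):
        procs, cells = layout(threads)
        print(f"{threads} thread(s): {procs} processes/node, {cells} cells/process")
```

With 8 threads per process each MPI process therefore owns a subdomain eight times larger than in the pure MPI run, while the work per core stays at 524 288 cells.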

Scaling on BlueGene-IDRIS (strong scaling)

IO: the craftsman way

All processes write their output at the same time…

Fails beyond a few 10^3 processes

Write by packets, with a time delay between packets

Ncpu_write ~ 100 to 1000, T_wait ~ 2 to 10 seconds

One output ~ 5 time steps
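The packet-and-delay scheme can be sketched as follows in Python. The grouping of ranks into consecutive packets and the function names are assumptions made for this example (only the parameter names Ncpu_write and T_wait come from the slide), and the process count in the usage lines is illustrative.

```python
def write_delay(rank, ncpu_write, t_wait):
    """Seconds a rank waits before writing.

    Assumed scheme: ranks are grouped into consecutive packets of
    ncpu_write processes, and packet k starts k * t_wait seconds after
    the first one.
    """
    return (rank // ncpu_write) * t_wait


def total_stagger(n_procs, ncpu_write, t_wait):
    """Start time of the last packet, i.e. the wall time added by staggering."""
    n_packets = -(-n_procs // ncpu_write)   # ceiling division
    return (n_packets - 1) * t_wait


if __name__ == "__main__":
    n_procs, ncpu_write, t_wait = 4096, 512, 5   # illustrative values
    print(write_delay(1023, ncpu_write, t_wait))
    print(total_stagger(n_procs, ncpu_write, t_wait))
```

In a real run each process would sleep for `write_delay(...)` seconds before opening its file, which bounds the number of simultaneous writers at the cost of a longer output phase.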

IO: the professional approach (P. Wautelet and P. Kestener)

4 different IO approaches were tested:
    POSIX: 1 file per MPI process
    MPI-IO
    HDF5
    Parallel-NetCDF

STEP 1: Optimizing the MPI-IO hints
    MPI-IO hints can have a dramatic effect on IO performance
    Best parameters depend on the application
    7 of the 23 available hints were tested!

STEP 2: Strong scaling test
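A hint study of the kind described in STEP 1 amounts to a parameter sweep. The Python sketch below enumerates every combination of a few candidate values. The hint names are standard MPI-IO/ROMIO hints, but which hints and values the authors actually tested is not stated on the slide, so the selection and the values here are assumptions.

```python
from itertools import product

# Candidate values for a few MPI-IO hints (assumed selection; values are
# strings because MPI info values are strings).
CANDIDATES = {
    "romio_cb_write": ["enable", "disable"],     # collective buffering on writes
    "romio_ds_write": ["enable", "disable"],     # data sieving on writes
    "cb_buffer_size": ["4194304", "16777216"],   # collective buffer size in bytes
    "striping_factor": ["4", "16"],              # number of stripes for the file
}


def hint_combinations(candidates):
    """Yield one dict of hints per combination of candidate values."""
    keys = list(candidates)
    for values in product(*(candidates[key] for key in keys)):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    combos = list(hint_combinations(CANDIDATES))
    print(len(combos), "combinations; first:", combos[0])
```

Each dictionary would then be copied into an MPI info object passed at file-open time and the write benchmark repeated. Since the best parameters depend on the application, the sweep has to be run with the application's own IO pattern.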

Conclusions

Multi-threading necessary for large number of cores

OpenMP is “easy” to implement but not always to understand…

Multi-threaded communications probably necessary

Good results for a small number of threads.