
Efficient I/O Using Dedicated Cores in Large-Scale HPC Simulations

Matthieu Dorier, ENS Cachan Brittany, IRISA, [email protected]

Experiments on up to 9216 cores of Kraken with the CM1 atmospheric simulation

Damaris…
Achieves nearly perfect scalability, with more than a 3x speedup over Collective-I/O
Improves the aggregate throughput by a factor of 15 compared to Collective-I/O

Simulation codes used: Nek5000 (CFD) and CM1 (atmospheric simulation).
Project website: http://damaris.gforge.inria.fr

[Figure: four panels plotting time (sec) per iteration against iteration number (1–25); each panel shows the simulation time, and one panel additionally shows the in-situ visualization time.]

[Figures from the CM1 experiments on Kraken: (1) scalability factor vs. number of cores for Perfect scaling, Damaris, File-per-process and Collective-I/O; (2) aggregate throughput (GB/s, log scale) vs. number of cores for File-per-process, Damaris and Collective-I/O; (3) duration of write (sec) at 576, 2304 and 9216 cores for Collective-I/O, File-per-process and Damaris; (4) run time (sec) at 576, 2304 and 9216 cores for File-per-process, Collective-I/O and Damaris.]

The VisIt visualization software connects to the running simulation to perform in-situ visualization.

All cores used by Nek5000, no visualization.

One core out of 24 in each node is dedicated; no visualization.

The dedicated cores are used to perform in-situ visualization in the background, without impacting Nek5000.

Image by Matthieu Dorier

Images by Leigh Orf

Experiments on 816 cores of Grid’5000 with the Nek5000 CFD code

Damaris…
Completely hides the run-time impact of in-situ visualization and analysis within dedicated cores
Provides interactivity through a connection with the VisIt software

Challenges

HPC simulations running on over 100,000 cores
Petabytes of data to be stored for subsequent visualization and analysis
Heavy pressure on the file system
Huge storage space requirements

How to efficiently write large data? How to efficiently retrieve scientific insights?

The Damaris Approach: Dedicated I/O Cores

Configuration (my_config.xml):

<mesh name="my3Dgrid" type="rectilinear">
    <coordinate name="x"/>
    <coordinate name="y"/>
</mesh>
<layout name="my3Dlayout" dim="3,5*N/2"/>
<variable name="temperature" layout="my3Dlayout" mesh="my3Dgrid"/>
<event name="do_statistics" library="mylib.so" action="my_function"/>

Simulation code (Fortran):

program example
    real, dimension(64,16,2) :: my_data
    ...
    call dc_initialize("my_config.xml", ...)
    call dc_write("temperature", my_data, ierr)
    call dc_signal("do_statistics", ierr)
    call dc_end_iteration(ierr)
    call dc_finalize(ierr)
    ...
end program example
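As an illustration of the plugin system, the sketch below shows what the action referenced by the <event> entry above might look like. The argument list is an assumption made for this sketch, not the actual Damaris plugin interface; consult the Damaris documentation for the real callback signature.

! Hypothetical plugin action for mylib.so. The signature is assumed for
! illustration only and may differ from the real Damaris plugin interface.
subroutine my_function(iteration) bind(C, name="my_function")
    use iso_c_binding, only: c_int
    implicit none
    integer(c_int), value :: iteration
    ! On a dedicated core, such a routine could read the "temperature"
    ! variable from shared memory and compute statistics on it without
    ! blocking the simulation's compute cores.
    print *, "do_statistics triggered at iteration", iteration
end subroutine my_function

Compiled into mylib.so (for instance with -shared -fPIC), such a routine would run on the dedicated cores each time the simulation calls dc_signal("do_statistics", ierr).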

Damaris – Leave a core, go faster!
Dedicate one or a few cores per SMP node
Communicate data through shared memory
Through an adaptable plugin system, use these cores asynchronously for the tasks listed further below; a plain-MPI sketch of the core-partitioning idea follows.
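The sketch below (not the Damaris implementation; the program and communicator names are invented for illustration) shows, in plain MPI, how ranks sharing an SMP node can be grouped so that one rank per node is set aside for background work while the others keep computing.

! Minimal sketch of per-node core partitioning, assuming MPI-3.
! Not the Damaris implementation; names are illustrative only.
program dedicated_core_sketch
    use mpi
    implicit none
    integer :: ierr, world_rank, node_comm, node_rank, role, split_comm

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)

    ! Group the ranks that share a node (shared-memory split).
    call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                             MPI_INFO_NULL, node_comm, ierr)
    call MPI_Comm_rank(node_comm, node_rank, ierr)

    ! Rank 0 of each node becomes the dedicated core; the rest compute.
    role = merge(1, 0, node_rank == 0)
    call MPI_Comm_split(MPI_COMM_WORLD, role, world_rank, split_comm, ierr)

    if (role == 1) then
        ! Dedicated core: would wait for data placed in shared memory and
        ! perform asynchronous I/O, compression or in-situ analysis.
    else
        ! Compute cores: run the simulation and hand data over
        ! (with Damaris, through calls such as dc_write).
    end if

    call MPI_Comm_free(split_comm, ierr)
    call MPI_Comm_free(node_comm, ierr)
    call MPI_Finalize(ierr)
end program dedicated_core_sketch

In Damaris itself, the hand-off between the compute cores and the dedicated cores goes through shared memory, as described above.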

Damaris was evaluated on JaguarPF (ORNL), Kraken (NICS), Intrepid (ANL), BlueWaters (NCSA) and Grid’5000.

Damaris constitutes one of the first results of the JLPC to be validated for use on BlueWaters.

Completely hides the I/O performance variability from the point of view of the simulation
Aggregates data into bigger files and allows an overhead-free 600% compression ratio
Spares time on the dedicated cores that can be used for in-situ visualization

Asynchronously, the dedicated cores can:
Process, compress and aggregate the data
Write it to files
Analyze it while the simulation runs
Transparently connect to visualization backends

Check out an online demo from your smartphone!

[Diagram: multicore SMP nodes (cores sharing memory) shown without Damaris and with Damaris; with Damaris, a dedicated core performs asynchronous storage to files and in-situ visualization. Image by Roberto Sisneros.]

Overall, Damaris improves the application run time.