Experiments on up to 9216 cores of Kraken with the CM1 atmospheric simulation

Damaris… achieves nearly perfect scalability, shows more than a 3x speedup compared to collective I/O, and improves the aggregate throughput by a factor of 15 compared to collective I/O.
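The comparisons above are ratios of measured quantities. The poster does not spell out its definitions, so the sketch below rests on assumptions. Speedup is taken as the baseline run time divided by the Damaris run time. Aggregate throughput is the total bytes written by all processes divided by the write duration. The weak-scaling factor is S(N) = N · T_ref / T(N), which equals N (the "perfect scaling" line) when run time stays constant as cores are added. The inputs are hypothetical, not measurements.

```python
def speedup(baseline_time_s: float, improved_time_s: float) -> float:
    """How many times faster the improved run is than the baseline (assumed definition)."""
    return baseline_time_s / improved_time_s


def aggregate_throughput_gbps(total_bytes: float, write_time_s: float) -> float:
    """Total data written by all processes divided by write duration, in GB/s (1 GB = 1e9 bytes)."""
    return total_bytes / write_time_s / 1e9


def weak_scaling_factor(n_cores: int, ref_time_s: float, time_s: float) -> float:
    """Assumed weak-scaling factor: equals n_cores when run time stays at ref_time_s."""
    return n_cores * ref_time_s / time_s


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the formulas.
    print(speedup(900.0, 300.0))                    # 3.0
    print(aggregate_throughput_gbps(2e12, 100.0))   # 20.0
    print(weak_scaling_factor(9216, 100.0, 100.0))  # 9216.0
```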

Efficient I/O Using Dedicated Cores in Large-Scale HPC Simulations

Matthieu Dorier, ENS Cachan Brittany, IRISA

Nek5000

CM1

http://damaris.gforge.inria.fr

[Figure: Nek5000 time per iteration (sec) over iterations 1 to 25, in four panels. Three panels show simulation time only; one panel shows both simulation and visualization time.]

[Figure: Scalability factor vs. number of cores (up to 10,000) for Damaris, file-per-process, and collective I/O, compared with perfect scaling.]

[Figure: Aggregate throughput (GB/s, log scale) vs. number of cores for file-per-process, Damaris, and collective I/O.]

[Figure: Duration of write (sec) on 576, 2304, and 9216 cores with collective I/O, file-per-process, and Damaris.]

[Figure: Run time (sec) on 576, 2304, and 9216 cores with file-per-process, collective I/O, and Damaris.]

The VisIt visualization software connects to the running simulation to perform in-situ visualization.

All cores used by Nek5000, no visualization.

1 core out of 24 in each node is dedicated. No visualization.

The dedicated cores are used to perform in-situ visualization in the background, without impacting Nek5000.
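Suppose all 816 cores of the Grid’5000 run sit on 24-core nodes with 1 core per node dedicated. Then the allocation is 816 / 24 = 34 nodes, which gives 34 dedicated cores and 782 cores for Nek5000. The bookkeeping can be sketched as follows. The placement rule (ranks numbered contiguously per node, with the last core of each node dedicated) is a hypothetical choice for illustration, not necessarily how Damaris assigns cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreSplit:
    nodes: int
    compute_cores: int
    dedicated_cores: int


def split_cores(total_cores: int, cores_per_node: int, dedicated_per_node: int = 1) -> CoreSplit:
    """Count nodes, compute cores and dedicated cores for a uniform allocation."""
    if total_cores % cores_per_node != 0:
        raise ValueError("total_cores must be a multiple of cores_per_node")
    if not 0 < dedicated_per_node < cores_per_node:
        raise ValueError("each node needs at least one compute core and one dedicated core")
    nodes = total_cores // cores_per_node
    dedicated = nodes * dedicated_per_node
    return CoreSplit(nodes, total_cores - dedicated, dedicated)


def is_dedicated(rank: int, cores_per_node: int, dedicated_per_node: int = 1) -> bool:
    """Hypothetical placement: ranks fill nodes contiguously; the last core(s) of each node are dedicated."""
    return rank % cores_per_node >= cores_per_node - dedicated_per_node


if __name__ == "__main__":
    print(split_cores(816, 24))
    # CoreSplit(nodes=34, compute_cores=782, dedicated_cores=34)
    print([r for r in range(48) if is_dedicated(r, 24)])  # [23, 47]
```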

Image by Matthieu Dorier

Images by Leigh Orf

Experiments on 816 cores of Grid’5000 with the Nek5000 CFD code

Damaris… completely hides the run-time impact of in-situ visualization and analysis by running them on dedicated cores, and provides interactivity through a connection with the VisIt software.

Challenges

HPC simulations running on over 100,000 cores. Petabytes of data to be stored for subsequent visualization and analysis. Heavy pressure on the file system. Huge storage space requirements.

How to efficiently write large data? How to efficiently retrieve scientific insights?

The Damaris Approach: Dedicated I/O Cores

<mesh name="my3Dgrid" type="rectilinear">
  <coordinate name="x"/>
  <coordinate name="y"/>
</mesh>
<layout name="my3Dlayout" dim="3,5*N/2"/>
<variable name="temperature" layout="my3Dlayout" mesh="my3Dgrid"/>
<event name="do_statistics" library="mylib.so" action="my_function"/>
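In this configuration the variable refers to its layout by name, and the layout's dimensions are arithmetic expressions over a parameter N. The sketch below resolves them with Python's standard xml.etree.ElementTree. The <simulation> root element, the parameter value, and the reading of "/" as integer (floor) division are all assumptions. Damaris's own parser may behave differently.

```python
import ast
import operator
import xml.etree.ElementTree as ET

# The elements from the poster, wrapped in a hypothetical <simulation> root element.
CONFIG = """<simulation>
  <mesh name="my3Dgrid" type="rectilinear">
    <coordinate name="x"/>
    <coordinate name="y"/>
  </mesh>
  <layout name="my3Dlayout" dim="3,5*N/2"/>
  <variable name="temperature" layout="my3Dlayout" mesh="my3Dgrid"/>
  <event name="do_statistics" library="mylib.so" action="my_function"/>
</simulation>"""

# Assumed semantics: integer arithmetic only, with "/" treated as floor division.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.floordiv}


def eval_dim(node: ast.AST, params: dict[str, int]) -> int:
    """Evaluate integer literals, parameter names and + - * / without calling eval()."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name):
        return params[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](eval_dim(node.left, params), eval_dim(node.right, params))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def variable_shape(root: ET.Element, var_name: str, params: dict[str, int]) -> tuple[int, ...]:
    """Follow variable -> layout by name and evaluate each comma-separated dimension."""
    var = root.find(f"variable[@name='{var_name}']")
    if var is None:
        raise KeyError(var_name)
    layout = root.find(f"layout[@name='{var.get('layout')}']")
    return tuple(eval_dim(ast.parse(d.strip(), mode="eval").body, params)
                 for d in layout.get("dim").split(","))


if __name__ == "__main__":
    root = ET.fromstring(CONFIG)
    print(variable_shape(root, "temperature", {"N": 4}))  # (3, 10)
```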

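program example
  implicit none
  integer :: ierr
  real, dimension(64,16,2) :: my_data
  ...
  ! Read the XML configuration and start Damaris
  call dc_initialize("my_config.xml",…)
  ! Pass "temperature" to the dedicated core through shared memory
  call dc_write("temperature", my_data, ierr)
  ! Trigger the "do_statistics" event declared in the configuration
  call dc_signal("do_statistics", ierr)
  ! Mark the current iteration as complete
  call dc_end_iteration(ierr)
  call dc_finalize(ierr)
  ...
end program example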
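In the program above, dc_write hands data to the dedicated core and dc_signal triggers the event declared in the configuration, whose action is my_function from mylib.so. The sketch below illustrates that write-then-signal pattern in Python. A thread stands in for the dedicated core, a queue stands in for shared memory, and a dict of Python callables stands in for the plugin library. DedicatedCore and its methods are hypothetical names, not the Damaris API.

```python
import queue
import threading
from typing import Callable


class DedicatedCore:
    """Hypothetical stand-in for a dedicated core that runs event actions in the background."""

    def __init__(self, actions: dict[str, Callable[[dict], None]]):
        self.actions = actions                   # event name -> action (role of mylib.so:my_function)
        self.data: dict[str, list[float]] = {}   # latest variables received from the compute side
        self.inbox: queue.Queue = queue.Queue()  # stands in for the shared-memory channel
        self.worker = threading.Thread(target=self._run)
        self.worker.start()

    # Compute-side calls, loosely mirroring dc_write / dc_signal / dc_finalize.
    def write(self, name: str, values: list[float]) -> None:
        self.inbox.put(("write", name, list(values)))  # copy, so the simulation can reuse its buffer

    def signal(self, event: str) -> None:
        self.inbox.put(("signal", event, None))

    def finalize(self) -> None:
        self.inbox.put(("stop", None, None))
        self.worker.join()

    def _run(self) -> None:
        while True:
            kind, name, payload = self.inbox.get()  # FIFO: a write is seen before a later signal
            if kind == "stop":
                return
            if kind == "write":
                self.data[name] = payload
            elif kind == "signal":
                self.actions[name](self.data)  # run the configured action on the background thread


results: list[float] = []


def my_function(data: dict) -> None:
    """Hypothetical statistics action: record the mean temperature."""
    temps = data["temperature"]
    results.append(sum(temps) / len(temps))


if __name__ == "__main__":
    core = DedicatedCore({"do_statistics": my_function})
    core.write("temperature", [1.0, 2.0, 3.0])
    core.signal("do_statistics")
    core.finalize()
    print(results)  # [2.0]
```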

Damaris – Leave a core, go faster!
Dedicate one or a few cores per SMP node.
Communicate data through shared memory.
Through an adaptable plugin system, use these cores to asynchronously…
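A simplified timing model shows why leaving a core can make the run faster. Without dedicated cores, every iteration pays compute time plus write time. With them, compute slows by cores/(cores - dedicated), but each write overlaps with the next iteration, so the simulation only waits when a write outlasts an iteration. The model assumes perfect intra-node scaling, a single shared-memory buffer, and a fixed copy cost. It is an illustration under those assumptions, not Damaris's measured behavior.

```python
def run_time_without_dedicated(compute_s: float, io_times_s: list[float]) -> float:
    """Every iteration, all cores compute and then block until their write completes."""
    return sum(compute_s + io_s for io_s in io_times_s)


def run_time_with_dedicated(compute_s: float, io_times_s: list[float], cores_per_node: int,
                            dedicated_per_node: int = 1, copy_s: float = 0.0) -> float:
    """Simplified pipeline: the dedicated core writes iteration i while the others compute i+1."""
    # Assumption: the node's compute work spreads perfectly over the remaining cores.
    step = compute_s * cores_per_node / (cores_per_node - dedicated_per_node)
    sim_clock = 0.0  # time at which the simulation cores are ready
    io_done = 0.0    # time at which the dedicated core finishes its current write
    for io_s in io_times_s:
        sim_clock += step                    # compute one iteration
        sim_clock = max(sim_clock, io_done)  # single buffer: wait if the previous write is still running
        sim_clock += copy_s                  # copy the data into shared memory
        io_done = sim_clock + io_s           # the dedicated core writes in the background
    return max(sim_clock, io_done)           # the run ends once the last write is flushed


if __name__ == "__main__":
    io = [5.0, 5.0, 30.0, 5.0]  # hypothetical write times with one slow outlier
    print(run_time_without_dedicated(10.0, io))            # 85.0
    print(round(run_time_with_dedicated(10.0, io, 24), 2))  # 66.3
```

With 10 s of compute per iteration and 24 cores per node, only the part of the 30 s write that exceeds the lengthened iteration, plus the final flush, remains visible to the simulation in this model. With no I/O at all, the dedicated core is pure overhead (about 41.7 s instead of 40 s for four iterations).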

Damaris was evaluated on JaguarPF (ORNL), Kraken (NICS), Intrepid (ANL), Blue Waters (NCSA), and Grid’5000.

Damaris is one of the first results of the JLPC to be validated for use on Blue Waters.

Completely hides the I/O performance variability from the point of view of the simulation

Aggregates data into bigger files and allows an overhead-free 600% compression ratio.
Spares time on the dedicated cores that can be used for in-situ visualization.
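Because compression runs on the dedicated core, its cost stays off the simulation's critical path. The sketch below aggregates per-core chunks into one buffer and compresses it. It uses Python's zlib as a stand-in codec, since the poster does not name the compressor. It also assumes that "600%" means uncompressed size divided by compressed size, i.e. 6:1. The chunk contents are hypothetical, and real ratios depend entirely on the data.

```python
import zlib


def aggregate_and_compress(chunks: list[bytes], level: int = 6) -> tuple[bytes, float]:
    """Concatenate per-core chunks into one buffer (one output instead of one per core),
    compress it, and return (compressed bytes, uncompressed/compressed size ratio)."""
    aggregated = b"".join(chunks)
    compressed = zlib.compress(aggregated, level)
    return compressed, len(aggregated) / len(compressed)


if __name__ == "__main__":
    # Hypothetical data: 23 compute cores each hand over a highly repetitive field.
    chunks = [bytes(i % 7 for i in range(4096)) for _ in range(23)]
    blob, ratio = aggregate_and_compress(chunks)
    print(f"compression ratio: {ratio:.1f}:1")
```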

Process, compress and aggregate the data.
Write it to files.
Analyze it while the simulation runs.
Transparently connect to visualization backends.

Check out an online demo from your smartphone!

[Figure: a multicore SMP node (four cores sharing memory), without Damaris and with Damaris; labels: In-Situ Visualization, Asynchronous Storage, Files.]

Image by Roberto Sisneros

Improves the overall application run time.