Experiments on up to 9216 cores of Kraken with the CM1 atmospheric simulation
Damaris…
Achieves nearly perfect scalability and a more than 3x speedup compared to collective-I/O
Improves the aggregate throughput by a factor of 15 compared to collective-I/O
Efficient I/O Using Dedicated Cores in Large-Scale HPC Simulations
Matthieu Dorier, ENS Cachan Brittany, IRISA, [email protected]
Nek5000
CM1
http://damaris.gforge.inria.fr
[Figure: four panels plotting time (sec, 0 to 300) against iteration number (1 to 25). Three panels show a "Simulation" series only; one shows "Simulation" and "Visualization".]
[Figure: scalability factor versus number of cores (0 to 10000). Series: Perfect scaling, Damaris, File-per-process, Collective-I/O.]
[Figure: aggregate throughput (GB/s, logarithmic axis from 0.01 to 100) versus number of cores (0 to 10000). Series: File-per-process, Damaris, Collective-I/O.]
[Figure: duration of write (sec, 0 to 600) at 576, 2304 and 9216 cores. Series: Collective-I/O, File-per-process, Damaris.]
[Figure: run time (sec, 0 to 1000) at 576, 2304 and 9216 cores. Series: File-per-process, Collective-I/O, Damaris.]
The VisIt visualization software connects to the running simulation to perform in-situ visualization.
All cores used by Nek5000, no visualization.
1 core out of 24 in each node is dedicated. No visualization.
The dedicated cores are used to perform in-situ visualization in the background, without impacting Nek5000.
Image by Matthieu Dorier
Images by Leigh Orf
Experiments on 816 cores of Grid’5000 with the Nek5000 CFD code
Damaris…
Completely hides the run-time impact of in-situ visualization and analysis within dedicated cores
Provides interactivity through a connection with the VisIt software
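The cost of dedicating cores can be estimated with simple arithmetic. The sketch below assumes that all 816 cores of the Grid’5000 experiment sit on 24-core nodes with one dedicated core each, as in the Nek5000 configuration described on this poster; the function name `split_cores` is hypothetical and only serves the illustration.

```python
def split_cores(total_cores: int, cores_per_node: int, dedicated_per_node: int = 1):
    """Return (nodes, dedicated_cores, compute_cores) for a dedicated-core layout.

    Assumes homogeneous nodes; this is an illustration, not part of Damaris.
    """
    if total_cores % cores_per_node != 0:
        raise ValueError("total_cores must be a multiple of cores_per_node")
    if not 0 <= dedicated_per_node < cores_per_node:
        raise ValueError("dedicated_per_node must leave at least one compute core")
    nodes = total_cores // cores_per_node
    dedicated = nodes * dedicated_per_node
    return nodes, dedicated, total_cores - dedicated


nodes, dedicated, compute = split_cores(816, 24)
print(nodes, dedicated, compute)                      # 34 34 782
print(f"{dedicated / 816:.1%} of cores dedicated")    # 4.2% of cores dedicated
```

Under these assumptions, about 4% of the cores are taken away from the simulation and used for visualization in the background.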
Challenges
HPC simulations running on over 100,000 cores
Petabytes of data to be stored for subsequent visualization and analysis
Heavy pressure on the file system
Huge storage space requirements
How to efficiently write large data?
How to efficiently retrieve scientific insights?
The Damaris Approach: Dedicated I/O Cores
<mesh name="my3Dgrid" type="rectilinear">
  <coordinate name="x"/>
  <coordinate name="y"/>
</mesh>
<layout name="my3Dlayout" dim="3,5*N/2"/>
<variable name="temperature" layout="my3Dlayout" mesh="my3Dgrid"/>
<event name="do_statistics" library="mylib.so" action="my_function"/>
program example
  real, dimension(64,16,2) :: my_data
  ...
  call dc_initialize("my_config.xml",…)
  call dc_write("temperature", my_data, ierr)
  call dc_signal("do_statistics", ierr)
  call dc_end_iteration(ierr)
  call dc_finalize(ierr)
  ...
end program example
Damaris – Leave a core, go faster!
Dedicate one or a few cores per SMP node
Communicate data through shared memory
Through an adaptable plugin system, use this core to asynchronously…
Damaris was evaluated on
JaguarPF (ORNL)
Kraken (NICS)
Intrepid (ANL)
BlueWaters (NCSA)
Grid’5000
Damaris constitutes one of the first results of the JLPC to be
validated for use on BlueWaters
Completely hides the I/O performance variability from the point of view of the simulation
Aggregates data in bigger files and allows an overhead-free 600% compression ratio
Spares time in dedicated cores that can be used for in-situ visualization
Process, compress and aggregate the data
Write it to files
Analyze it while the simulation runs
Transparently connect to visualization backends
Check out an online demo from your smartphone!
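The asynchronous hand-off described above can be illustrated with a rough single-process analogy. In this sketch a Python thread stands in for the dedicated core and a queue stands in for the shared-memory segment. Neither is how Damaris itself is implemented, and the names `io_worker` and `run` are hypothetical.

```python
import os
import queue
import tempfile
import threading
import zlib


def io_worker(tasks: queue.Queue, out_dir: str, written: list) -> None:
    """Drain the queue: compress each buffer and write it to its own file."""
    while True:
        item = tasks.get()
        if item is None:              # sentinel: the simulation has finished
            break
        iteration, payload = item
        path = os.path.join(out_dir, f"iter_{iteration:04d}.z")
        with open(path, "wb") as f:
            f.write(zlib.compress(payload))
        written.append(path)


def run(iterations: int, out_dir: str) -> list:
    """Run a toy simulation loop that never waits for the file system."""
    tasks = queue.Queue()
    written = []
    worker = threading.Thread(target=io_worker, args=(tasks, out_dir, written))
    worker.start()
    for it in range(iterations):
        data = bytes([it % 256]) * 4096   # stand-in for one iteration of output
        tasks.put((it, data))             # returns at once; the worker does the I/O
    tasks.put(None)                       # tell the worker to stop
    worker.join()
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(len(run(5, d)), "files written")
```

The simulation loop only pays for the hand-off, while compression and file writes happen in the background. Damaris applies the same idea with dedicated cores rather than threads.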
[Diagram: a multicore SMP node (cores sharing memory), shown without Damaris and with Damaris. Labels: In-Situ Visualization, Asynchronous Storage, Files.]
Image by Roberto Sisneros
Improves the overall application run time