HPC for reactor physics
C. Calvin - C. Diop
CEA – Nuclear Energy Division
CEA/DEN/DANS/DM2S/SERMA
Outline
• Introduction: today and future solutions for HPC (a short intro.)
  – Hardware zoology
  – Software
  – CEA/DEN tools in this landscape
• HPC for reactor physics
  – Objectives and simulation targets
• HPC in deterministic simulations
  – APOLLO2, CRONOS2, APOLLO3
• HPC in Monte Carlo simulations
  – TRIPOLI-4
Hardware & software zoology
• Hardware
  – Vector machines: CRAY X1, NEC SX9
  – Parallel machines
    • Based on standard processors (Intel, AMD, IBM Power)
    • Networks of workstations (NOW)
    • Clusters of SMPs (SGI, IBM, HP, SUN, BULL)
    • MPP (IBM Blue Gene, CRAY XT4)
  – Trends
    • All processors are now multi-core
    • Petaflops → power / Flops
    • Accelerators (GPU, Cell, ClearSpeed, FPGAs, …) – hybrid architectures (CRAY XT5h)
• Software
  – Today, mainly 2 models (see the sketch after this list):
    • Explicit message passing between processors (MPI)
    • Communication through memory and multithreading (OpenMP)
  – Today + 1:
    • Specific languages for accelerators (CUDA, Cell SDK, …)
    • Attempts at generalization (RapidMind, CAPS HMPP, …)
  – Tomorrow:
    • New languages for managing the complexity of:
      – hybrid architectures
      – massive multi-core in a node
      – machines with very many cores (> 100 000)
    • UPC, CAF, X10, Chapel
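To make the two dominant models concrete, here is a minimal hybrid sketch, assuming a generic MPI + OpenMP toolchain (illustrative code, not a CEA code): MPI ranks exchange results by message passing while OpenMP threads share memory inside each process.

```cpp
// Minimal hybrid sketch: MPI across processes, OpenMP threads inside each one.
#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);               // explicit message passing between processes
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = 0.0;
    #pragma omp parallel reduction(+ : local)   // shared-memory threads inside the node
    {
        local += omp_get_thread_num() + 1;      // stand-in for per-thread work
    }

    double global = 0.0;                  // combine per-process results by message passing
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("sum over %d processes: %g\n", size, global);

    MPI_Finalize();
    return 0;
}
```

A typical build would be something like `mpicxx -fopenmp hybrid.cpp`, launched with `mpirun`.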
TOP 500 List – June 2008 – Europe

Rank  Site                Manuf.       Computer                     Proc    RPeak (TFlops)
6     FZJ                 IBM          Blue Gene/P Solution         65536   223
9     IDRIS               IBM          Blue Gene/P Solution         40960   139
10    Total               SGI          SGI Altix ICE 8200EX         10240   123
13    EDF R&D             IBM          Blue Gene/P Solution         32768   111
18    ECMWF               IBM          Power 575                    8320    156
19    RZG/Max-Planck      IBM          Power 575                    6720    126
26    BSC                 IBM          BladeCenter JS21 Cluster     10240   94
27    Leibniz             SGI          Altix 4700 1.6 GHz           9728    62
29    Univ. of Edinburgh  Cray Inc.    Cray XT4, 2.8 GHz            11328   63
32    CEA                 Bull SA      NovaScale 5160               9968    64
36    Moscow Univ.        T-Platforms  T-Platforms T60              5000    60
39    HPC2N               IBM          BladeCenter HS21             5376    54
40    NSC                 HP           Cluster Platform 3000 DL140  6400    60
42    CEA                 Bull SA      Novascale 3045               7680    49
45    Gdansk              ACTION       ACTION Cluster               5336    50
46    FZJ                 IBM          Blue Gene                    16384   46
47    Univ. of Bergen     Cray Inc.    Cray XT4                     5548    51
CEA/DEN tools in this landscape
• Hardware:
  – Production:
    • Vector machine (NEC SX8+ – CCRT)
    • NOW (Dep.)
    • SMP and clusters of SMPs (Dep. + BULL Novascale CCRT)
  – Experimentation:
    • MPP (Blue Gene)
    • Accelerators (Cell, GPU – CCRT)
• Software:
  – MPI, PVM, OpenMP

[Diagram: CCRT configuration – NEC machine (2 TFlops, 0.9 TB memory, 40 TB disks), HP Opteron cluster (2.4 TFlops, 0.8 TB memory, 12 TB disks) and BULL cluster (47.7 TFlops, 23.2 TB memory, 420 TB disks), connected through the 10 Gb/s CCRT backbone (links of 1 and 10 Gb/s) to a two-level storage system: level 1, 520 TB of SATA DDN disks behind an NFS server and 2 SGI Altix 450 + DMF; level 2, STK tape robotics (9940, T10000); user access at 1 Gb/s]
CCRT:
2008: 48 TFlops (CCRT) + 2 TFlops (Dep.)
2009: +103 TFlops + 192 TFlops (GPU) – CCRT
End 2009: 350 TFlops
End 2010: 1 PFlops (PRACE)
CEA/DEN tools in this landscape
RoadRunner: 1st 1 PFlops sustained – June 2008
IBM hybrid architecture – Opteron + Cell BE
→ Next step: EXAFLOPS
Application fields and challenges

Core physics, criticality, radiation shielding, instrumentation
• Particle propagation – neutrons, …
• Isotopic depletion
• Energy range: 0 – 20 MeV
• Domain size: ≈ 30 000 m³ ≡ 3×10^10 cm³
[Timeline: 0 s → 3 years → 50 years → 10^6 years – in-core stay; pool storage; fuel transport, refining, production; dismantling; geological disposal; radioactivity α, β, γ, n]
HPC For reactor physics: simulation targets
• To provide simulation capabilities in order to:
  – Simulate specific situations for safety reasons
    • Sensitivity calculations, transient simulations
  – Allow fast design of complex reactors
    • Design simulations
  – Validate and qualify models and calculation routes when experiments are not possible or too expensive
    • Reference simulations
  – Improve reactor management
    • Real-time simulation
• These simulations have to be as realistic as possible, which implies the use of very fine 3D modeling.
New reactor generations → new approach for design and safety → new code generation, with controlled uncertainties
The different levels of parallelism
• 3 levels (see the sketch after this list)
  – Level 1: multi-parameter (distributed computing)
    • multi-parameter assembly calculations
  – Level 2: multi-domain (massively parallel computing)
    • σ, Σ
    • Depletion, neutronic feedback
    • DDM for flux computation
  – Level 3: flux computation on a sub-domain (SMP)
• Long term:
  – Coupling spatial domain decomposition with parallel particle models
  – Adaptation to physical phenomena
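A compact way to picture the hierarchy, as a sketch under assumed structure (illustrative C++, not APOLLO3/CRONOS2 code): MPI_Comm_split carves the machine into independent parameter groups (level 1), ranks inside a group own spatial sub-domains (level 2), and OpenMP threads work on one sub-domain (level 3).

```cpp
// Sketch of the three levels of parallelism (assumed structure, illustration only).
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Level 1: split the processes into independent parameter groups.
    const int n_param_groups = 4;                // illustrative value
    int group = world_rank % n_param_groups;     // which parameter set we compute
    MPI_Comm domain_comm;                        // communicator for one group
    MPI_Comm_split(MPI_COMM_WORLD, group, world_rank, &domain_comm);

    // Level 2: inside a group, each rank owns one spatial sub-domain.
    int subdomain;
    MPI_Comm_rank(domain_comm, &subdomain);

    // Level 3: threaded flux computation on this sub-domain (placeholder loop).
    double flux = 0.0;
    #pragma omp parallel for reduction(+ : flux)
    for (int cell = 0; cell < 100000; ++cell)
        flux += 1e-5;                            // stand-in for the cell update

    // Sub-domain results are combined within the group's communicator.
    double domain_flux = 0.0;
    MPI_Allreduce(&flux, &domain_flux, 1, MPI_DOUBLE, MPI_SUM, domain_comm);

    MPI_Comm_free(&domain_comm);
    MPI_Finalize();
    return 0;
}
```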
Some existing references (SERMA)
• Solver parallelization:
  – MINOS parallelization and depletion solver in CRONOS2:
    • MINOS: OpenMP, between 60% (4 threads) and 80% (2 threads) efficiency
    • Micro-depletion: MPI, > 90% (CRONOS2.8)
    • « Application of multi-threaded computing and domain decomposition to 3D neutronics FEM code CRONOS », J. Ragusa, SNA 2003
    • « Implementation of multithreaded computing in the MINOS neutronics solver of the CRONOS code », J. Ragusa, Nuclear Mathematical and Computational Sciences – 2003
  – CP solver parallelization in APOLLO2:
    • Coupling APOLLO2 C90 – CP solver // T3E – Sp = 17 on 32 proc. (Ts on C90)
    • « Advanced plutonium assembly parallel calculations using the APOLLO2 code », Z. Stankovski, A. Puill, L. Dullier, M&C'97
    • 2-level parallelism (see the sketch after this list):
      – On energy groups using PVM
      – Within one energy group, trajectory parallelization using OpenMP
    • « Arlequin: an integrated Java application », Z. Stankovski, Java Grande'2001
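The inner level of that 2-level scheme can be pictured as follows, a minimal sketch with an invented kernel (trajectory_contribution is a hypothetical stand-in, and this is not the APOLLO2 CP solver): an OpenMP loop over the trajectories of one energy group, merging per-thread partial sums.

```cpp
// Sketch: OpenMP over trajectories within one energy group (illustration only).
#include <cstdio>
#include <vector>

double trajectory_contribution(int traj, int region) {
    return 1.0 / ((traj + 1.0) * (region + 1.0));   // hypothetical kernel
}

int main() {
    const int n_trajectories = 10000, n_regions = 32;
    std::vector<double> cp(n_regions, 0.0);         // collision-probability tallies
    #pragma omp parallel
    {
        std::vector<double> local(n_regions, 0.0);  // per-thread accumulator
        #pragma omp for nowait
        for (int t = 0; t < n_trajectories; ++t)
            for (int r = 0; r < n_regions; ++r)
                local[r] += trajectory_contribution(t, r);
        #pragma omp critical                         // merge thread results
        for (int r = 0; r < n_regions; ++r)
            cp[r] += local[r];
    }
    std::printf("cp[0] = %.6f\n", cp[0]);
    return 0;
}
```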
T1: Multi-parameter calculations and sensitivities
• Parameterization of nuclear data with APOLLO2 (EDF, AREVA):
  – All the computations are independent and are run in parallel
  – 50 000 2D computations at assembly level
  – Example (efficiency worked out below):
    • Lib. UO2: 160 h with 1 P, from 12 to 20 h with 13 P
    • Lib. Gado: 256 h with 1 P, 37 h with 13 P
    • Note SERMA/LENR/RT/02-3123/A, J. Ragusa
• Nuclear fuel optimization using genetic algorithms:
  – APOLLO3 diffusion solver: MINOS
  – 200 000 2D core calculations in 30 minutes using 30 processors
  – Note SERMA/LLPR/RT/07-4332/A, G. Arnaud et al.
• Simulation of incidental or accidental configurations of power plants:
  – CRONOS2 – FLICA4 with sensitivity analyses (10^-4 s time step)
  – Already effective: 3D power map – RIA simulation (CRONOS2/FLICA4 codes)
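As a consistency check on the Lib. UO2 timings, applying the standard speedup and efficiency definitions (numbers taken from the bullet above):

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}
\;\Rightarrow\;
S_{13} = \frac{160\,\text{h}}{12\,\text{h}} \approx 13.3,\ E_{13} \approx 103\%
\ \text{(best case)};\qquad
S_{13} = \frac{160\,\text{h}}{20\,\text{h}} = 8,\ E_{13} \approx 62\%
\ \text{(worst case)}.
```

Efficiencies near (or above) 100% are plausible here because the 50 000 assembly computations are fully independent.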
T2: Design simulation

• Parallel JHR simulation using APOLLO3: domain decomposition method
[Chart: computation time (sec.) and parallel efficiency E (%) versus number of processors P = 1, 2, 4, 6, 8, 9, 12, 16, 20, 25, 30, 36; "Time" and "Eff" curves; Eff. > 100%]
• Domain decomposition → acceleration method (illustrative sketch below)
P. Guérin, A.M. Baudron, J.J. Lautard.
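To illustrate the idea behind the method (only the generic overlapping-Schwarz principle; the actual APOLLO3 algorithm of Guérin, Baudron and Lautard is not reproduced here), a minimal 1D diffusion sketch with two overlapping sub-domains:

```cpp
// Overlapping Schwarz iteration on -u'' = f, u(0)=u(1)=0 (illustration only).
#include <cstdio>
#include <vector>

int main() {
    const int n = 101;                  // grid points, h = 1/(n-1)
    const double h = 1.0 / (n - 1);
    std::vector<double> u(n, 0.0), f(n, 1.0);

    // Two overlapping sub-domains: [0, split+ovl] and [split-ovl, n-1].
    const int split = n / 2, ovl = 5;

    auto solve_subdomain = [&](int lo, int hi) {
        // Gauss-Seidel sweeps on the sub-domain; Dirichlet data at lo and hi
        // are taken from the current global iterate (the interface values).
        for (int sweep = 0; sweep < 50; ++sweep)
            for (int i = lo + 1; i < hi; ++i)
                u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i]);
    };

    for (int it = 0; it < 100; ++it) {  // alternating Schwarz iterations
        solve_subdomain(0, split + ovl);
        solve_subdomain(split - ovl, n - 1);
    }

    // Exact solution is u(x) = x(1-x)/2; report the midpoint value.
    std::printf("u(0.5) = %.6f (exact 0.125)\n", u[n / 2]);
    return 0;
}
```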
T3: Reference simulations

• In order to validate the different models and calculation routes, one needs reference calculations, which can be achieved using Monte Carlo simulations and deterministic simulations.
• The aim is to simulate a whole cycle of fuel depletion at pin-core level.

Computing requirements (arithmetic below):
• 3D heterogeneous transport solver: Flops = 1.5×10^18; target: results in 1 week → 25 TFlops peak
• MC simulation + depletion solver: Flops = 4×10^18; target: results in 1 week → 60 TFlops peak

Examples:
• GT-MHR: heterogeneous 2D MOC depletion calculation (APOLLO2 code)
• Vessel aging using MC simulation (TRIPOLI-4 code)
• 2D VVER full MOC calculation (APOLLO2 code)
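The peak figures follow from the flop counts once a sustained-to-peak ratio is assumed; with roughly 10% (our assumption, not stated on the slide):

```latex
\frac{1.5\times10^{18}\ \text{flop}}{1\ \text{week}}
= \frac{1.5\times10^{18}}{6.05\times10^{5}\ \text{s}}
\approx 2.5\ \text{TFlop/s sustained}
\;\xrightarrow{\ \sim 10\%\ \text{efficiency}\ }\;
\approx 25\ \text{TFlop/s peak};
\qquad
\frac{4\times10^{18}}{6.05\times10^{5}}
\approx 6.6\ \text{TFlop/s}
\;\rightarrow\;
\approx 60\ \text{TFlop/s peak}.
```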
T4: Real time neutronics
• Objectives
  – Simulator: real-time simulation of the reactor behavior (training, optimization)
  – Real-time control: embedded computer to control the reactor behavior
• Mainly simplified models in order to reach real time
  – → Try to have more accurate modeling (3D, TH feedbacks, …)
• Computing time: TARGET 1 s / time step
• Industrial scheme
  – Homogenized description, diffusion, 2 groups, simplified TH/TM, depletion
  – → 30 s / time step
  – With moderate parallelism: < 5 s (8 cores)
• Fine modeling
  – Pin-by-pin description, SPn, multi-group, simplified TH/TM, depletion
  – → 3 000 s / time step → > 2 TFlops (see the estimate below)
APOLLO3
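One way to read the > 2 TFlops figure (the serial sustained rate is our assumption, not given on the slide): the 1 s/step target means delivering the work of a 3 000 s step in one second, so

```latex
R_{\text{target}} \;=\; \frac{W_{\text{step}}}{1\ \text{s}}
\;=\; \frac{3\,000\ \text{s}\times R_{\text{serial}}}{1\ \text{s}}
\;\gtrsim\; 2\ \text{TFlop/s}
\quad\text{for}\quad R_{\text{serial}} \gtrsim 0.7\ \text{GFlop/s}.
```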
T4: Real time neutronics
• Real-time neutronics simulation cannot be achieved using massively parallel computing:
  – Whether for a simulator or for industry, the architecture must remain "reachable"
• How to obtain real-time simulation without using >> 100 processors?
• Use the other side of HPC: get the best possible performance out of limited hardware:
  – Exceed 20% of the peak performance of a standard processor
  – Make maximum use of "limited" parallelism (around 10 proc.)
  – Exploit high-performance processing units (Cell, GPU, FPGA)
• Fine modeling → > 2 TFlops
  – Cell processor 200 GFlops/s → 1 rack over 10 TFlops/s
  – Tesla: 500 GFlops/s – server over 4 TFlops/s
• It is possible to reach real time even for fine modeling using a "moderate" architecture
x10 acceleration – power method (see the sketch below)
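Since the slide credits the x10 acceleration to the power method, here is a minimal sketch of that iteration on a small dense test operator, with the dominant matrix-vector product threaded via OpenMP; the matrix is invented for illustration and the accelerator (Cell/GPU) port is not shown.

```cpp
// Power method for the dominant eigenvalue (the outer iteration used for
// quantities like k_eff); illustration only, on a hypothetical test matrix.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1000;
    // Hypothetical symmetric, diagonally dominant test operator A.
    std::vector<double> A(n * n, 0.001);
    for (int i = 0; i < n; ++i) A[i * n + i] = 2.0 + i * 1e-3;

    std::vector<double> x(n, 1.0), y(n);
    double lambda = 0.0;
    for (int it = 0; it < 200; ++it) {
        // y = A x  (the dominant cost; this is what an accelerator speeds up)
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            double s = 0.0;
            for (int j = 0; j < n; ++j) s += A[i * n + j] * x[j];
            y[i] = s;
        }
        // Rayleigh-quotient estimate (x is unit-norm after normalization).
        double num = 0.0, nrm = 0.0;
        for (int i = 0; i < n; ++i) { num += x[i] * y[i]; nrm += y[i] * y[i]; }
        lambda = num;
        nrm = std::sqrt(nrm);
        for (int i = 0; i < n; ++i) x[i] = y[i] / nrm;   // normalize iterate
    }
    std::printf("dominant eigenvalue ~ %.6f\n", lambda);
    return 0;
}
```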
HPC in Monte Carlo simulations
HPC → Parallelization of TRIPOLI-4

• Principle of the parallelization implemented in TRIPOLI-4 (a minimal sketch follows the flow chart):
  – a monitor: controls all processes
  – a scorer: collects all results
  – several simulators: simulate the particle trajectories
• Natural parallelization based on the statistical independence of trajectories
• Massively parallel machines
• Distributed computer configurations
[Flow chart: the monitor controls all tasks and initializes the collector and the simulators; each simulator computes batches; the scorer computes the means over the batches from the simulators' results and edits the final results]
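A minimal sketch of that scheme, with assumed roles and an invented tally (this is not TRIPOLI-4 source): rank 0 plays monitor and scorer, the other ranks simulate independent batches of histories, and the scorer reduces the batch statistics.

```cpp
// Monitor/scorer/simulator sketch (illustration only; run with >= 2 MPI ranks).
#include <mpi.h>
#include <cmath>
#include <cstdio>
#include <random>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int batches_per_rank = 10, histories_per_batch = 100000;
    double sum = 0.0, sum2 = 0.0;                 // batch means and their squares
    if (rank != 0) {                              // --- simulators ---
        std::mt19937_64 rng(42 + rank);           // independent stream per rank
        std::uniform_real_distribution<double> u(0.0, 1.0);
        for (int b = 0; b < batches_per_rank; ++b) {
            double score = 0.0;                   // stand-in tally: mean of u
            for (int h = 0; h < histories_per_batch; ++h) score += u(rng);
            double mean = score / histories_per_batch;
            sum += mean; sum2 += mean * mean;
        }
    }
    // --- scorer: combine batch statistics from all simulators ---
    double tot[2] = {sum, sum2}, acc[2] = {0.0, 0.0};
    MPI_Reduce(tot, acc, 2, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) {                              // --- monitor edits results ---
        int nb = batches_per_rank * (size - 1);
        double m = acc[0] / nb;
        double var = (acc[1] / nb - m * m) / (nb - 1);   // variance of the mean
        std::printf("estimate = %.6f +/- %.6f over %d batches\n",
                    m, std::sqrt(var), nb);
    }
    MPI_Finalize();
    return 0;
}
```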
HPC in Monte Carlo simulations
HPC → Full PWR core calculation with TRIPOLI-4

« 40 000 pins »
• Neutron effective multiplication factor k_eff
• Power of each assembly
• k_eff and power of each assembly: convergence study – calculation of the variance taking batch correlations into account (see the formula below)
  – 1.3 billion neutron histories
  – 20 processors at 3 GHz
  – ~ 24 hours
  – Precision: k_eff ~ 3 pcm; power ~ 0.1%
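The standard batch-correlation correction (the generic estimator; TRIPOLI-4's exact implementation may differ) replaces the naive s²/B by

```latex
\operatorname{Var}(\bar{x}) \;\approx\; \frac{s^2}{B}\Bigl(1 + 2\sum_{k \ge 1} \rho_k\Bigr),
```

where B is the number of batches, s² the sample variance of the batch means, and ρ_k the lag-k autocorrelation between batch means; ignoring the ρ_k underestimates the variance when successive batches are correlated.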
E. Dumonteil, F.X. Hugot
HPC in Monte Carlo simulations
HPC → Deep penetration calculation with TRIPOLI-4

[Figure: radial cross-section of the reactor – core, barrel, water, pressure vessel, concrete – with positions R1 to R30 marked around the vessel (positive direction indicated; positions R1-1 to R1-10)]
• 150 million to 100 billion neutron histories
• 20 to 100 processors
• ~ 15 days
• Precision: 0.1 to a few %
• Neutron fluence and ex-core chamber responses versus the assembly loading
• FLUOLE experiment in the EOLE reactor
O. Petit, S. Bourganel, M. Chiron, G. Campionni
HPC in Monte Carlo simulations
HPC → TRIPOLI-4 adaptations needed

TRIPOLI-4 calculations with 1000 processors:
[Chart: efficiency versus number of processors]
→ decreasing communications between processors
→ increasing virtual-memory needs: nuclear data; post-processing of the simulation features
→ increasing number of open files

Example of the size needed to store the simulation information related to the particles:
number of histories × number of collisions × number of parameters × number of bytes per parameter:
10^11 × 10 × 20 × 8 = 160 TBytes

F.X. Hugot
HPC in Monte Carlo simulations

• Computational resources are also needed for pre-processing and post-processing
• Geometrical and material aspects:
  – TRIPOLI-4 / SALOME: REPLICA benchmark device – J.C. Trama
  – TRIPOLI-4 / ROOT: PWR – E. Dumonteil
• ROOT post-processing of neutron histories in the OMEGA fusion device – E. Dumonteil
Conclusion
• HPC for core physics
  – Parametric studies → decrease margins by decreasing uncertainties
  – Finer physics, "Numerical Nuclear Reactor" → design optimization
  – "Generalized" MC approach → improve safety
• Classes of HPC
  – Distributed computing (X × 100 P) → parametric studies
  – Extensive computing (>> 1000 P) → reference calculations
  – Intensive computing ("TFlops in a box") → real-time neutronics
Conclusion
• R&D and training
  – Theses
    • Efficient use of accelerators and programming models
    • …
  – Participation in working groups on HPC related to nuclear reactor applications
  – Organization of a summer school on HPC and petascale
  – Collaboration with other institutes and countries:
    • PAAP France-Japan / ANL / CEA/DEN – JAEA
• Challenges
  – Uncertainties, mesh refinement, sensitivity analysis … → management tools
  – Efficiency vs. portability
  – Efficient use of computing resources (intensive and extensive)
  – Fault tolerance
  – Training