28
UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien and Janine Taylor Lawrence Livermore National Laboratory Joint Russian-American Five-Laboratory Conference on Computational Mathematics and Physics 19 – 23 June 2005 Vienna, Austria Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551 30 June 2005 Page 1 of 28

UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

Embed Size (px)

Citation preview

Page 1: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Dynamic Load Balancing of ParallelMonte Carlo Transport Calculations

Richard Procassini, Matthew O'Brien and Janine TaylorLawrence Livermore National Laboratory

Joint Russian-American Five-Laboratory Conference on Computational Mathematics and Physics

19 – 23 June 2005Vienna, Austria

Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551

30 June 2005 Page 1 of 28

Page 2: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

● Description of the MERCURY Monte Carlo Code

● The MERCURY Parallel Programming Model

● Load Imbalance in Parallel Monte Carlo Calculations

● The MERCURY Load Balancing Algorithm

● The Results of Dynamic Load Balancing in MERCURY

● Summary and Conclusions

30 June 2005 Page 2 of 28

Outline

Page 3: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

● The main physics capabilities of the MERCURY Monte Carlo transport code include:

Time dependent transport of several types of particles through a medium:

➔ Neutrons n

➔ Gammas

➔ Light charged ions 1H , 2H , 3 H , 3 He , 4 He

Particle tracking through a wide variety of problem geometries:

➔ 1-D spherical (radial) meshes

➔ 2-D r-z structured and quadrilateral unstructured meshes

➔ 3-D Cartesian structured and tetrahedral unstructured meshes

➔ 3-D combinatorial geometry

Multigroup and continuous energy treatment of cross sections

Population control can be applied to all types of particles

Static k eff and eigenvalue “settle” calculations for neutrons

Dynamic calculations for all types of particles

30 June 2005 Page 3 of 28

Description of the MERCURY Monte Carlo Code

Page 4: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

● Main physics capabilities of MERCURY (continued):

All types of particles can interact with the medium via collisions, resulting in:

➔ Deposition of energy

➔ Depletion and accretion of isotopes resulting from nuclear reactions

➔ Deposition of momentum (to be added)

Support for sources is currently limited to:

➔ External monoenergetic, fission-spectrum or file-based sources

➔ Zonal-based reaction sources

● Near term enhancements of MERCURY will include:

Generalization of the current source capabilities

Generalization of the current tally capabilities, and addition of event history support

Post-processing of tallies will be provided by the CALORIS code

Addition of several variance reduction methods

30 June 2005 Page 4 of 28

Description of the MERCURY Monte Carlo Code

Page 5: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

● The MERCURY Monte Carlo particle transport code employs a hybrid approach to

parallelism:

Domain Decomposition: The problem geometry or mesh is spatially partitioned into

domains. Individual processors are then assigned to work on specific domains.

This form of spatial parallelism allows MERCURY to track particles through large,

multidimensional meshes/geometries.

Domain Replication: The easiest way to parallelize a Monte Carlo transport code is

to store the geometry information redundantly on each of the processors. Individual

processors are then assigned to work on a different set of particles. This form of

particle parallelism allows MERCURY to track a large number of particles. The

number of processors assigned to work on a domain is known as the replication

level of the domain.

Many problems are so large that particle parallelism alone is not sufficient. For

these cases, a combination of both spatial and particle parallelism is employed to

achieve a scalable parallel solution. 30 June 2005 Page 5 of 28

The MERCURY Parallel Programming Model

Page 6: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Domain Decomposition (Spatial Parallelism)

30 June 2005 Page 6 of 28

The MERCURY Parallel Programming Model

Page 7: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Domain Replication (Particle Parallelism)

30 June 2005 Page 7 of 28

The MERCURY Parallel Programming Model

Page 8: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Domain Decomposition and Domain Replication

(Spatial and Particle Parallelism)

30 June 2005 Page 8 of 28

The MERCURY Parallel Programming Model

Page 9: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

● An important feature of Monte Carlo transport codes is that particles migrate in both

space and time between different regions of a problem in response to the physics.

● A natural consequence of domain decomposition is that the amount of computational

work will vary from domain to domain.

● Many Monte Carlo algorithms require that one phase of the calculation (cycle, itera-

tion, etc.) must be completed by all processors before the next phase can commence.

● If one processor has more work than the other processors, the less-loaded processors

must wait for the most-loaded processor to complete its work before proceeding.

● The result is particle-induced load imbalance.

● Particle-induced load imbalance can dramatically reduce the parallel efficiency of the

calculation, where the efficiency is defined as:

= W pW p

= 1N p ∑p=1

N p [W p]

max p = 1N p [W p]

30 June 2005 Page 9 of 28

Load Imbalance in Parallel Monte Carlo Calculations

Page 10: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Particle-Induced Load Imbalance

Double-Density Godiva Static Calculation

30 June 2005 Page 10 of 28

Load Imbalance in Parallel Monte Carlo Calculations

(a) (b) (c)Problem Geometry Uniform: 60% Efficient Variable: 91% Efficient

4-way Spatial Parallelism (Black lines represent domain boundaries).Static Replication of 4 domains on 16 processors (Domain replication level is indicated).

Pseudocolor plots represent neutron number density per cell.

U

Air 4 4

4 4

5 2

7 2

Page 11: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Particle-Induced Load Imbalance

Double-Density Godiva Static Calculation

30 June 2005 Page 11 of 28

Load Imbalance in Parallel Monte Carlo Calculations

Pseudocolor plots of neutron number density per cell at six cycles during the calculation.4-way Spatial Parallelism (Black lines represent domain boundaries).

Page 12: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

● In an attempt to reduce the effects of particle-induced load imbalance, a load balanc-

ing algorithm has been developed and implemented in MERCURY.

● This method is applicable for parallel calculations employing both domain decomposi-

tion and domain replication.

● The essence of the method is that the domain replication level is dynamically varied in

accordance with the amount of work (particle segments) on that domain.

● When the replication level of a domain is changed, the number of particles located in

that domain are distributed evenly among the number of processors assigned to work

on the domain. Only the minimum number of particles required to achieve a particle-

balanced domain are communicated.

● The method periodically checks to determine if the load imbalance is severe enough

to warrant the expense of changing the replication levels of the domain, including the

cost of communicating particles between processors.

30 June 2005 Page 12 of 28

The MERCURY Load Balancing Algorithm

Page 13: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

30 June 2005 Page 13 of 28

The MERCURY Load Balancing Algorithm

This is the legend for the diagrams in this figure. The length of the parti-cle bar indicates the number of particles on each processor. Particles within a domain must remain within that domain after load balancing.

Domain 0 Domain 1

Domain 2 Domain 3

Processor

Domain 0 Particles

Domain 1 ParticlesDomain 2 ParticlesDomain 3 Particles

Step 1. The initial particle distribu-tion over processors at the start of a cycle.

Step 2. Domain 2 loses one pro-cessor that is reassigned to Do-main 0. The particles on that pro-cessor must be communicated to the other processors that remain assigned to Domain 2.

Domain 0 Domain 1

Domain 3Domain 2

Domain 0 Domain 1

Domain 3Domain 2

a

b

Page 14: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

30 June 2005 Page 14 of 28

The MERCURY Load Balancing Algorithm

Step 3. After determining which processors are assigned to each domain, each domain can inde-pendently balance its particle load.

Step 4. This communication is necessary to achieve load balance within each domain.

Domain 0 Domain 1

Domain 2 Domain 3

Domain 0 Domain 1

Domain 2 Domain 3

Step 5. The end result of load balancing: the number of proces-sors per domain has been chan-ged so that the maximum number of particles per processor in mini-mized.

Domain 0 Domain 1

Domain 2 Domain 3

c

d

Page 15: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Dynamic, Variable Domain Replication

Double-Density Godiva Static Calculation

30 June 2005 Page 15 of 28

The MERCURY Load Balancing Algorithm

Dynamic Replication of 4 domains on 16 processors.

Processors Per Spatial Domain

0

24

68

1012

14

0 1 3 4 5 7 30

Cycle

Num

ber o

f Pro

cess

ors

Domain 0Domain 1Domain 2Domain 3

Page 16: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

● The efficacy of dynamic load balancing in the context of parallel Monte Carlo particle

transport calculations is tested by running two test problems: one criticality problem

and one sourced problem.

● These problems are chosen because they exhibit substantial particle-induced dynamic

load imbalance during the course of the calculation.

● Each of these problems is time dependent, and the particle distributions also evolve in

space, energy and direction over many cycles.

● Two calculations were made for each of these problems, with the dynamic load bal-

ancing feature either disabled or enabled.

● Parallel calculations were run on the MCR machine, a Linux-cluster parallel computer

with 2-way symmetric multiprocessor (SMP) nodes at LLNL.

30 June 2005 Page 16 of 28

The Results of Dynamic Load Balancing in MERCURY

Page 17: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Criticality Test Problem

● The criticality problem is one of the benchmark critical assemblies compiled in the

International Handbook of Evaluated Criticality Safety Benchmark Experiments.

● The particular critical assembly is a known as HEU-MET-FAST-017: a right-circular

cylindrical system comprised of alternating layers of highly-enriched uranium and

beryllium, with beryllium end reflectors.

● These calculations were run with N p = 2×106 particles, using a “pseudo-dynamic”

algorithm that iterates in time to calculate both the k eff and eigenvalues of the

system.

● This problem was run on a 2-D r − z mesh that was spatially decomposed into 14

domains, axially along the axis of rotation. Parallel calculations were run on 28

processors.

30 June 2005 Page 17 of 28

The Results of Dynamic Load Balancing in MERCURY

Page 18: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Criticality Test Problem

Critical Assembly HEU-MET-FAST-017

30 June 2005 Page 18 of 28

The Results of Dynamic Load Balancing in MERCURY

BerylliumReflector

HighlyEnrichedUranium

SourceCavity

BerylliumModerator

Page 19: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Criticality Test Problem

Critical Assembly HEU-MET-FAST-017

30 June 2005 Page 19 of 28

The Results of Dynamic Load Balancing in MERCURY

Pseudocolor plots of neutron number density per cell at six cycles during the calculation.14-way Spatial Parallelism (Black lines represent domain boundaries).

Cycle2Cycle1

Cycle3 Cycle 4

Cycle8 Cycle15

Page 20: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Criticality Test Problem

Critical Assembly HEU-MET-FAST-017

30 June 2005 Page 20 of 28

The Results of Dynamic Load Balancing in MERCURY

Number of Processors Per Domain

0

2

4

6

8

10

12

14

0 1 2 3 4 5 6 7 8 9 10 11 12 13

Domain Number

Num

ber o

f Pro

cess

ors

Cycle 0Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 7Cycle 10Cycle 15

0 1 2 3 4 5 6 7 8 9 10 11 12 13

Page 21: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Criticality Test Problem

Critical Assembly HEU-MET-FAST-017

30 June 2005 Page 21 of 28

The Results of Dynamic Load Balancing in MERCURY

Cycle Run Time vs. Cycle

0

5

10

15

20

25

30

35

40

45

1 2 3 4 5 6 7 8 9 10 11 12 13 14

Cycle

Run

Tim

e P

er C

ycle

(sec

onds

)

Load BalancedNot Load Balanced

Parallel Efficiency vs. Cycle

0.00%

10.00%

20.00%

30.00%

40.00%

50.00%

60.00%

70.00%

80.00%

1 2 3 4 5 6 7 8 9 10 11 12 13 14

Cycle

Par

alle

l Effi

cien

cy

Load BalancedNot Load Balanced

Critical Assembly Test Problem Cumulative Run Times

Problem Run Time (sec)Cycle Range Not Load

BalancedLoad Balanced Speedup

1 to 4 102.2 65.0 1.571 to 14 397.1 286.4 1.39

Page 22: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Sourced Test Problem

● The time-dependent sourced problem is a spherized version of a candidate neutron

shield that would surround a fusion reactor.

● The shield consists of alternating layers of stainless steel and borated polyethylene.

● Monoenergetic E = 14.1 MeV particles are sourced into the center of the system from

an isotropic point source.

● During each of the the first 200 cycles, N p' = 1×105 particles are injected into the sys-

tem. The source is then shut off, and the particles continue to flow through the shield

for the next 800 cycles. The size of the time step is t = 1×10−8 sec.

● This problem was run on a 2-D r − z mesh that was spatially decomposed in 4 do-

mains, 2 domains along each of the axes. Parallel calculations were run on 16 pro-

cessors.

30 June 2005 Page 22 of 28

The Results of Dynamic Load Balancing in MERCURY

Page 23: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Sourced Test Problem

Spherical Neutron Shield

30 June 2005 Page 23 of 28

The Results of Dynamic Load Balancing in MERCURY

Air

SS

SS

SS

BP

BP

Page 24: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Sourced Test Problem

Spherical Neutron Shield

30 June 2005 Page 24 of 28

The Results of Dynamic Load Balancing in MERCURY

Particle scatter plots of neutron kinetic energy at six cycles during the calculation.4-way Spatial Parallelism (Black lines represent domain boundaries).

Cycle2 Cycle101 Cycle201

Cycle301 Cycle701 Cycle1001

Page 25: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Sourced Test Problem

Spherical Neutron Shield

30 June 2005 Page 25 of 28

The Results of Dynamic Load Balancing in MERCURY

Number Of Processors Per Domain

0

2

4

6

8

10

12

14

0 200 400 600 800 1000

Cycle

Num

ber O

f Pro

cess

ors

Domain 0Domain 1Domain 2Domain 3

Domain 2 Domain 3

Domain 0 Domain 1

Page 26: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

Sourced Test Problem

Spherical Neutron Shield

30 June 2005 Page 26 of 28

The Results of Dynamic Load Balancing in MERCURY

Cycle Run Time vs. Cycle

0

1

2

3

4

5

6

7

8

1 101 201 301 401 501 601 701 801 901

Cycle

Cyc

le R

un T

ime

(sec

onds

)

LoadBalanced

Not LoadBalanced

Parallel Efficiency vs. Cycle

0

10

20

30

40

50

60

70

80

90

100

1 101 201 301 401 501 601 701 801 901

Cycle

Para

llel E

ffici

ency

LoadBalanced

Not LoadBalanced

Spherical Shield Test Problem Cumulative Run Times

Problem Run Time (sec)Cycle Range Not Load

BalancedLoad Balanced Speedup

1 to 201 1355 615 2.201 to 1001 2221 1404 1.58

Page 27: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

● Particle-induced load imbalance in Monte Carlo transport calculations is a natural con-

sequence of a parallelization scheme which employs spatial domain decomposition.

● The synchronous nature of several Monte Carlo algorithms means that the parallel

performance of the overall calculation is limited by the performance of the processor

with the most loaded domain.

● A load balancing method has been developed and implemented in MERCURY for use

in parallel Monte Carlo calculations which employ both domain decomposition (spatial

parallelism) and domain replication (particle parallelism).

● This method varies the replication level of each domain dynamically in response to the

work (number of particle segments) on that domain.

● When the load balancing method has been enabled during parallel calculations of a

criticality problem and a sourced problem, the parallel efficiency of MERCURY has

been observed to increase by more than a factor of 2.

30 June 2005 Page 27 of 28

Summary and Conclusions

Page 28: UCRL-PRES-212795 Dynamic Load Balancing of Parallel … · UCRL-PRES-212795 Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien

UCRL-PRES-212795

For additional information, please visit our web site:

www.llnl.gov/mercury

AcknowledgmentsThis work was performed under the auspices of the

U. S. Department of Energy by the University of California

Lawrence Livermore National Laboratory

under Contract W-7405-Eng-48.

30 June 2005 Page 28 of 28