Parallelized Computing Slides

7/25/2019 Parallelized Computing Slides

1/32

Adaptive Mesh Applications

Sathish Vadhiyar

Sources:

- Schloegel, Karypis, Kumar. Multilevel Diffusion Schemes for Repartitioning of Adaptive Meshes. JPDC 1997 (Taken verbatim)


2/32

Adaptive Applications

- Highly adaptive and irregular applications
- Amount of work per task can vary drastically throughout the execution; similar to earlier applications, but...
- Has notions of interesting regions
- Computations in the interesting regions of the domain are larger than for other regions
- It is difficult to predict which regions will become interesting


3/32

AMR Applications

- An example of such applications is Parallel Adaptive Mesh Refinement (AMR) for multi-scale applications
- Adaptive mesh: the mesh or grid size is not fixed as in Laplace/Jacobi; instead, interesting regions are refined to form finer-level grids/meshes
- E.g., to study crack growth through a macroscopic structure under stress


4/32

AMR Applications - Crack propagation

- Such a system is subject to the laws of plasticity and elasticity and can be solved using the finite element method
- Crack growth forces the geometry of the domain to change
- This in turn necessitates localized remeshing


5/32

AMR Applications - Adaptivity

- Adaptivity arises when the advancing crack crosses from one subdomain to another
- It is unknown in advance when or where the crack growth will take place and which subdomains will be affected
- The computational complexity of a subdomain can increase dramatically due to greater levels of mesh refinement
- It is difficult to predict future workloads


6/32

Repartitioning

- In adaptive mesh computations, areas of the mesh are selectively refined or derefined in order to accurately model the dynamic computation
- Hence, repartitioning and redistributing the adapted mesh across processors is necessary


7/32

Repartitioning

- The challenge is to keep the repartitioning cost to a minimum
- Similar problems to MD, GoL
- The primary difference in AMR is that loads can change drastically and cannot be predicted; one has to wait for refinement, then repartition


8/32

Structure of Parallel AMR


9/32

Repartitioning

- Two methods for creating a new partitioning from an already distributed mesh that has become load imbalanced due to mesh refinement and coarsening:
  - Scratch-remap schemes create an entirely new partition
  - Diffusive schemes attempt to tweak the existing partition to achieve better load balance, often minimizing migration costs


10/32

Graph Representation of Mesh

- For irregular mesh applications, the computations associated with a mesh can be represented as a graph
- Vertices represent the grid cells; vertex weights represent the amount of computation associated with the grid cells
- Edges represent the communication between the grid cells; edge weights represent the amount of interaction


11/32

Graph Representation of Mesh

- The objective is to partition the graph across P processors such that:
  - Each partition has an equal amount of vertex weight
  - The total weight of the edges cut by the partition is minimized
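The two quantities in this objective can be computed directly from a weighted graph. The sketch below is only an illustration: the dictionary/edge-list representation, the function names, and the 4-cell example mesh are assumptions, not part of the original slides.

```python
from collections import defaultdict

def partition_weights(vertex_weight, part):
    """Sum the vertex weights assigned to each partition."""
    totals = defaultdict(int)
    for v, w in vertex_weight.items():
        totals[part[v]] += w
    return dict(totals)

def edge_cut(edges, part):
    """Total weight of edges whose endpoints lie in different partitions."""
    return sum(w for (u, v, w) in edges if part[u] != part[v])

# Hypothetical mesh: a path of four cells 0-1-2-3 with unit weights.
vertex_weight = {0: 1, 1: 1, 2: 1, 3: 1}
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]   # (u, v, edge weight)
part = {0: 0, 1: 0, 2: 1, 3: 1}             # cell -> processor

print(partition_weights(vertex_weight, part))  # {0: 2, 1: 2}
print(edge_cut(edges, part))                   # 1
```

A good partition keeps the first result even across processors while keeping the second as small as possible.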


12/32

Scratch-remap Method

- Partitioning from scratch will result in high vertex migration, since the partitioning does not take the initial location of the vertices into account
- Hence a partitioning method should incrementally construct a new partition as simply a modification of the input partition


13/32

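Notations

- Let B(q) be the set of vertices with partition q
- Weight of any partition q can be defined as: $W(q) = \sum_{v \in B(q)} w(v)$, where $w(v)$ is the weight of vertex $v$
- Average partition weight: $\bar{W} = \frac{1}{P} \sum_{q=1}^{P} W(q)$
- A graph is imbalanced if it is partitioned, and: $\exists\, q$ such that $W(q) > \bar{W}\,(1 + \epsilon)$, for a small tolerance $\epsilon$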


14/32

Terms

- A partition is over-balanced if its weight is greater than the average partition weight times $(1 + \epsilon)$, for a small tolerance $\epsilon$
- If less, it is under-balanced
- The graph is balanced when no partition is over-balanced
- Repartitioning: an existing partition is used as an input to form a new partition
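The balance test above can be sketched as follows. The function names and the tolerance value `eps=0.05` are illustrative assumptions; only the over-balanced condition is coded, since that is the one the definition of a balanced graph depends on.

```python
def average_weight(weights):
    """Average partition weight over all partitions."""
    return sum(weights.values()) / len(weights)

def overbalanced_partitions(weights, eps):
    """Partitions whose weight exceeds average * (1 + eps)."""
    limit = average_weight(weights) * (1 + eps)
    return sorted(q for q, w in weights.items() if w > limit)

def is_balanced(weights, eps):
    """The graph is balanced when no partition is over-balanced."""
    return not overbalanced_partitions(weights, eps)

# Hypothetical partition weights after a refinement step.
weights = {0: 6, 1: 2, 2: 4}
print(overbalanced_partitions(weights, eps=0.05))  # [0]
print(is_balanced(weights, eps=0.05))              # False
```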


15/32

Terms

- A vertex is clean if its current partition is its initial partition; else it is dirty
- Border vertex: a vertex with an adjacent vertex in another partition; those partitions are neighbor partitions
- TotalV: sum of the sizes of the vertices which change partitions, i.e., sum of the sizes of the dirty vertices
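These terms translate into a few lines of code. This is a minimal sketch; the dictionary representation and the names `dirty_vertices`, `border_vertices` and `total_v` are assumptions made for illustration.

```python
def dirty_vertices(initial_part, current_part):
    """Vertices whose current partition differs from their initial one."""
    return [v for v in sorted(initial_part)
            if current_part[v] != initial_part[v]]

def border_vertices(adj, part):
    """Vertices with at least one neighbour in another partition."""
    return [v for v in sorted(adj)
            if any(part[u] != part[v] for u in adj[v])]

def total_v(size, initial_part, current_part):
    """TotalV: sum of the sizes of the dirty vertices."""
    return sum(size[v] for v in dirty_vertices(initial_part, current_part))

# Hypothetical three-vertex path 0-1-2; vertex 1 migrates to partition 1.
adj = {0: [1], 1: [0, 2], 2: [1]}
size = {0: 1, 1: 3, 2: 2}
initial = {0: 0, 1: 0, 2: 1}
current = {0: 0, 1: 1, 2: 1}

print(dirty_vertices(initial, current))   # [1]
print(total_v(size, initial, current))    # 3
print(border_vertices(adj, current))      # [0, 1]
```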


16/32

Objectives

- Maintain balance between partitions
- Minimize edge cuts
- Minimize TotalV


17/32

Different Schemes

- Repartitioning from scratch
- Cut-and-paste repartitioning: excess vertices in an overbalanced partition are simply swapped into one or more underbalanced partitions in order to bring these partitions up to balance
  - The method can optimize TotalV, but can have a negative effect on the edge-cut
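A rough sketch of the cut-and-paste idea is given below. It is a simplified single-pass greedy illustration, not the scheme from the cited paper: the function name, the visiting order, and the choice of the lightest partition as the target are all assumptions. Note that the graph's edges are never consulted, which is why the edge-cut can suffer.

```python
def cut_and_paste(vertex_weight, part, num_parts, eps=0.05):
    """Move excess vertices out of over-balanced partitions, ignoring edges."""
    part = dict(part)                       # work on a copy
    weights = [0] * num_parts
    for v, w in vertex_weight.items():
        weights[part[v]] += w
    limit = sum(weights) / num_parts * (1 + eps)

    moved = []
    for v in sorted(vertex_weight):         # arbitrary (assumed) visiting order
        src = part[v]
        if weights[src] <= limit:
            continue                        # source is not over-balanced
        dst = min(range(num_parts), key=lambda q: weights[q])
        w = vertex_weight[v]
        if weights[dst] + w <= limit:       # target must stay within the limit
            part[v] = dst
            weights[src] -= w
            weights[dst] += w
            moved.append(v)
    return part, moved

# Hypothetical case: six unit-weight vertices, five of them on partition 0.
vertex_weight = {v: 1 for v in range(6)}
part = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 1}
new_part, moved = cut_and_paste(vertex_weight, part, num_parts=2)
print(moved)  # [0, 1]
```

Only two vertices migrate, so TotalV is small, but nothing guarantees that the moved vertices are adjacent to their new partition.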


18/32

Different Schemes

- Another method is analogous to diffusion
  - The concept is for vertices to move from overbalanced to neighboring underbalanced partitions
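One sweep of such a diffusion-like scheme might look as follows. This is a heavily simplified single-level sketch under assumed rules (only border vertices move, and only to a lighter neighboring partition); it is not the multilevel algorithm of the cited paper, and all names are hypothetical.

```python
def diffusion_sweep(adj, vertex_weight, part, num_parts):
    """One sweep: border vertices flow from heavy to lighter neighbour partitions."""
    part = dict(part)                       # work on a copy
    weights = [0] * num_parts
    for v, w in vertex_weight.items():
        weights[part[v]] += w
    avg = sum(weights) / num_parts

    for v in sorted(adj):
        src = part[v]
        if weights[src] <= avg:
            continue                        # source partition is not overloaded
        neighbours = {part[u] for u in adj[v]} - {src}
        if not neighbours:
            continue                        # interior vertex, cannot diffuse
        dst = min(sorted(neighbours), key=lambda q: weights[q])
        w = vertex_weight[v]
        # Move only if the target does not end up heavier than the source.
        if weights[dst] + w <= weights[src] - w:
            part[v] = dst
            weights[src] -= w
            weights[dst] += w
    return part

# Hypothetical path mesh 0-1-2-3-4-5 with unit weights, split 4 / 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
vertex_weight = {v: 1 for v in adj}
part = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
print(diffusion_sweep(adj, vertex_weight, part, num_parts=2))
# {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Because only the border vertex 3 moves, the partitions stay contiguous: the edge-cut remains 1 and a single vertex migrates.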


19/32

Example (Assuming edge and vertex weights as equal to 1)


20/32

Example (contd.)


21/32

Analysis of the schemes

- Thus, cut-and-paste repartitioning minimizes TotalV, while completely ignoring edge-cut
- Partitioning the graph from scratch minimizes edge-cut, while resulting in high TotalV
- Diffusion attempts to keep both TotalV and edge-cut low


22/32

Space Filling Curves for Partitioning and Load Balancing


23/32

Space Filling Curves

- The underlying idea is to map a multidimensional space to one dimension where the partitioning is trivial
- There are many different ways to do this
  - But a mapping for partitioning algorithms should preserve the proximity information present in the multidimensional space to minimize communication costs


24/32

Space Filling Curve

- Space filling curves are quick to run, can be implemented in parallel, and produce good load balancing with locality
- A space-filling curve is formed over grid/mesh cells by using the centroid of the cells to represent them
- The SFC produces a linear ordering of the cells such that cells that are close together in the linear ordering are also close together in the higher dimensional space


25/32

Space Filling Curve

- The curve is then broken into segments based on the weights of the cells (weights computed using size and number of particles)
- The segments are distributed to processors; thus cells that are close together in space are assigned to the same processor
- This reduces the overall amount of communication that occurs, i.e., increases locality
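Once the cells are in curve order, breaking the curve into segments is a one-dimensional problem. The sketch below shows one simple way to do it, assuming the weights are already listed in SFC order; the midpoint rule and the name `split_curve` are illustrative choices, not taken from the slides.

```python
def split_curve(weights, num_procs):
    """Assign each cell (in curve order) to a processor by cumulative weight."""
    total = sum(weights)
    owner = []
    prefix = 0
    for w in weights:
        # Place the cell according to the midpoint of its weight interval.
        mid = prefix + w / 2
        owner.append(min(num_procs - 1, int(mid * num_procs / total)))
        prefix += w
    return owner

# Eight unit-weight cells over four processors.
print(split_curve([1, 1, 1, 1, 1, 1, 1, 1], 4))  # [0, 0, 1, 1, 2, 2, 3, 3]
# A heavily refined first cell takes a whole segment by itself.
print(split_curve([4, 1, 1, 2], 2))              # [0, 1, 1, 1]
```

Each processor receives a contiguous run of the curve, so cells that are neighbors on the curve mostly end up on the same processor.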


26/32

SFC representation


27/32

Z-curve or Morton ordering

- The curve for a 2^k x 2^k grid is composed of four 2^(k-1) x 2^(k-1) curves, one in each quadrant of the 2^k x 2^k grid
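The position of a cell on the Z-curve is commonly computed by interleaving the bits of its coordinates. The sketch below assumes integer cell coordinates in a 2^k x 2^k grid and places the x bit in the even positions; that bit order and the name `morton_index` are assumptions made for illustration.

```python
def morton_index(x, y, k):
    """Interleave the k low bits of x and y into a single Z-curve index."""
    idx = 0
    for bit in range(k):
        idx |= ((x >> bit) & 1) << (2 * bit)        # x bit -> even position
        idx |= ((y >> bit) & 1) << (2 * bit + 1)    # y bit -> odd position
    return idx

# Order of visit on a 2 x 2 grid (k = 1).
cells = [(x, y) for y in range(2) for x in range(2)]
print(sorted(cells, key=lambda c: morton_index(c[0], c[1], 1)))
# [(0, 0), (1, 0), (0, 1), (1, 1)]

print(morton_index(2, 3, 2))  # 14
```

On a 4 x 4 grid, the four cells of each quadrant receive four consecutive indices, which is the recursive structure described above.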


28/32

Graycode Curve

- Uses the same interleaving function as the Z-curve
- But visits points in the graycode order
  - Graycode: two successive values differ in only one bit
- The one-bit gray code is (0,1)


29/32

Graycode Curve

- The gray code list for n bits can be generated recursively using n-1 bits
  - By reflecting the list (reversing the list): [1,0]
  - Concatenating the original with the reflected list: [0,1,1,0]
  - Prefixing entries in the original list with 0, and prefixing entries in the reflected list with 1: [00,01,11,10]


30/32

Graycode Curve

- 3-bit gray code: 000,001,011,010,110,111,101,100
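The reflect-and-prefix construction can be written as a short recursive function. This is a minimal sketch; the name `gray_code` and the use of bit strings are illustrative choices.

```python
def gray_code(n):
    """Return the n-bit reflected gray code as a list of bit strings."""
    if n == 1:
        return ["0", "1"]                   # the one-bit gray code
    prev = gray_code(n - 1)
    # Prefix the original list with 0 and the reflected list with 1.
    return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]

print(gray_code(2))  # ['00', '01', '11', '10']
print(gray_code(3))
# ['000', '001', '011', '010', '110', '111', '101', '100']
```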


31/32

Hilbert Curve

- The Hilbert curve is a smooth curve that avoids the sudden jumps in the Z-curve and graycode curve
- The curve is composed of four curves of the previous resolution in four quadrants
- The curve in the lower left quadrant is rotated clockwise by 90 degrees, and the curve in the lower right quadrant is rotated anticlockwise by 90 degrees
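A cell's position along a Hilbert curve can be computed with a well-known iterative bit-manipulation routine, sketched below for a 2^k x 2^k grid. The function name is hypothetical, and the orientation produced by this variant (which quadrant comes first and which sub-curves are rotated) is an assumption that may differ from the figure in the slides.

```python
def hilbert_index(x, y, k):
    """Distance of cell (x, y) along a Hilbert curve on a 2^k x 2^k grid."""
    n = 1 << k
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)        # which quadrant, in curve order
        # Rotate/flip so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Visit order on a 2 x 2 grid: every step moves to an adjacent cell.
cells = [(x, y) for x in range(2) for y in range(2)]
print(sorted(cells, key=lambda c: hilbert_index(c[0], c[1], 1)))
# [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Unlike the Z-curve, consecutive cells in this ordering always share an edge, which is the "no sudden jumps" property mentioned above.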


32/32

SFCs for AMR

- All these curve-based partitioning techniques can also be applied to adaptive meshes by forming hierarchical SFCs
