Partitioning using Mesh Adjacencies
Graph-based dynamic balancing
- Parallel construction and balancing of a standard partition graph with small cuts takes reasonable time
- In the case of unstructured meshes, a graph node is represented as a mesh region, and mesh adjacencies define the edges
Mesh adjacencies are a more complete representation than a standard partition graph
- All mesh entities can be considered (a graph has to decide what defines the graph nodes, and information on the adjacencies that define the graph edges is lost)
- Any adjacency is obtained in O(1) time, as opposed to having to construct multiple graphs (assuming use of a complete mesh adjacency structure)
Possible advantages
- Avoid graph construction (assuming you have the needed adjacencies)
- Account for multiple entity types: important for the solve process, typically the most computationally expensive step
- Easy to use with diffusive procedures, but not ideal for “global” balancing
Disadvantage
- Lack of well-developed algorithms for parallel partitioning operations directly from mesh adjacencies
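To make the relation between mesh adjacencies and a partition graph concrete, here is a minimal sketch that derives a standard partition graph (one node per element, one edge per shared side) from element-to-vertex connectivity. It is a 2D illustration with triangles, where elements are adjacent through shared edges; in 3D the same idea applies to regions sharing faces. The function name `dual_graph` and the tiny mesh are made up for this example and are not part of ParMA.

```python
from collections import defaultdict
from itertools import combinations


def dual_graph(elements):
    """Build an element adjacency graph from element-to-vertex connectivity.

    elements: list of vertex-id tuples, one per triangle (hypothetical input).
    Two triangles become graph neighbors when they share a mesh edge.
    """
    # Upward adjacency: mesh edge -> elements bounded by it.
    edge_to_elems = defaultdict(list)
    for elem_id, verts in enumerate(elements):
        for a, b in combinations(sorted(verts), 2):
            edge_to_elems[(a, b)].append(elem_id)

    # Elements that share a mesh edge are connected by a graph edge.
    graph = {elem_id: set() for elem_id in range(len(elements))}
    for elems in edge_to_elems.values():
        for e1, e2 in combinations(elems, 2):
            graph[e1].add(e2)
            graph[e2].add(e1)
    return graph


# Three triangles in a strip: 0 and 1 share edge (1, 2), 1 and 2 share edge (2, 3).
triangles = [(0, 1, 2), (1, 3, 2), (2, 3, 4)]
graph = dual_graph(triangles)
```

Note that the graph keeps only the element-to-element relation; the vertex and edge information used to build it is discarded, which is the loss of information the slide refers to.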
ParMA: Partition Improvement
Improve scaling of applications by reducing imbalances through exchange of mesh regions between neighboring parts
- The current algorithm focuses on improved scalability of the solve by accounting for balance of multiple entity types
- Imbalance is limited to a small number of heavily loaded parts, referred to as spikes, which limit the scalability of applications
- Example: reduce the small number of entity imbalance spikes at the cost of an increase in imbalance in regions, the entity used as the nodes in the standard graph
Similar approaches can be used to:
- Improve balance when using multiple parts per process; may be as good as a full rebalance for lower total cost
- Improve balance during mesh adaptation; likely want extensions past simple diffusive methods
Example of C0, linear shape function finite elements
- Assembly is sensitive to mesh element imbalances
- Solve is sensitive to mesh vertex imbalances since vertices hold the dof; this is the dominant computation
- Heaviest loaded part dictates solver performance
Element-based partitioning results in spikes of dofs
ParMA: Application Requirements
- Element imbalance increased from 2.64% to 4.54%
- Dof imbalance reduced from 14.7% to 4.92%
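The slides do not state how the imbalance percentages are computed. A minimal sketch, assuming the common definition (heaviest part relative to the average part, minus one), shows why a partition that is balanced in elements can still have a dof spike. The per-part counts below are invented for illustration and are not the data behind the numbers above.

```python
def imbalance_percent(counts):
    """Imbalance of per-part entity counts, assumed here to be max/avg - 1, in percent."""
    average = sum(counts) / len(counts)
    return (max(counts) / average - 1.0) * 100.0


# Hypothetical per-part counts for a 4-part partition.
elements_per_part = [100, 100, 100, 100]  # perfectly balanced elements
vertices_per_part = [60, 50, 45, 45]      # one part holds a spike of vertices (dofs)

element_imbalance = imbalance_percent(elements_per_part)
vertex_imbalance = imbalance_percent(vertices_per_part)
```

With these counts the element imbalance is zero while the vertex imbalance is about 20%, so the solve is limited by the first part even though the element-based partition looks ideal.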
ParMA: Algorithm
Input:
- Types of mesh entities that need to be balanced (Rgn, Face, Edge, Vtx)
- The relative importance (priority) between them (= or >)
- The balance of entities not specified in the input is not explicitly improved or preserved
- Mesh with complete representation and communication, computation and migration weights for each entity
Algorithm:
- From high to low priority if separated by “>” (different groups)
- From low to high dimension, based on entity topology, if separated by “=” (same group)
- Compute migration schedule
- Select regions for migration and migrate
e.g., “Rgn>Face=Edge>Vtx” is the user’s input
- Step 1: improve balance for mesh regions
- Step 2.1: improve balance for mesh edges
- Step 2.2: improve balance for mesh faces
- Step 3: improve balance for mesh vertices
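The ordering rule above can be sketched as a small parser. This is an illustration of the rule as stated on the slide, not ParMA's actual input handling; the function name `balancing_order` and the `DIMENSION` table are assumptions made for the example.

```python
# Topological dimension of each entity type named on the slide.
DIMENSION = {"Vtx": 0, "Edge": 1, "Face": 2, "Rgn": 3}


def balancing_order(priority):
    """Turn a priority string into an ordered list of groups of entity types.

    Groups separated by '>' are processed from high to low priority;
    types joined by '=' share a group and are processed from low to
    high dimension.
    """
    steps = []
    for group in priority.split(">"):
        types = [name.strip() for name in group.split("=")]
        steps.append(sorted(types, key=DIMENSION.__getitem__))
    return steps


order = balancing_order("Rgn>Face=Edge>Vtx")
# order == [["Rgn"], ["Edge", "Face"], ["Vtx"]]
```

The result reproduces the step numbering of the example: regions first, then edges before faces within the shared group, then vertices.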
ParMA: Application Defined Partition Criteria
Application defined priority list of entity types such that imbalance of high priority types is not increased when balancing lower priority types
- Satisfying multiple constraints simultaneously becomes more difficult as more are added
- Multi-constraint graph based partitioning methods balance all constraints equally [Karypis1999, Karypis2003, Aykanat2008]
- Constraint priorities give flexibility to element migration and selection procedures that can result in increased partition quality
Quantify balance requirements with application defined weights on mesh entities: communication, computation, and data migration
ParMA: Migration Schedule
Coordination needed to migrate elements between parts without ‘stepping on toes’
- Ex) Consider three adjacent parts, two of which are heavily loaded, the other lightly. The two heavily loaded parts migrate elements to the lightly loaded part, making it heavily loaded.
Migrate computational load to the correct part
- Multilevel graph schemes create several partitions before converging to the final partition; the mesh element migration cost is only paid once, to create the final partition
- Apply Hu and Blake’s diffusive solution algorithm to determine a low migration cost migration schedule that balances computational load for a given mesh entity type. [HuBlake]
- Green parts are overweight by 10
- White parts are underweight by 10
- Yellow parts have average weights
- The diffusive solution is noted on each edge
Figure 1. Diffusive Solution [Dongarra2002]
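A diffusive schedule of this kind can be sketched in a few lines. The sketch assumes the usual formulation of the Hu and Blake scheme: solve the part-graph Laplacian system L λ = b, where b is each part's load minus the average, and migrate λ_i − λ_j across each edge (i, j). The solver here is a plain conjugate gradient written in pure Python, the part graph is assumed to be connected, and the function name and the three-part example are made up for illustration.

```python
def diffusive_schedule(loads, edges, tol=1e-12, max_iter=1000):
    """Return {(i, j): amount} where a positive amount moves load from i to j.

    loads: per-part load; edges: list of (i, j) pairs of adjacent parts.
    Assumes a connected part graph (hypothetical helper, not ParMA's API).
    """
    n = len(loads)
    average = sum(loads) / n
    b = [w - average for w in loads]  # surplus (+) or deficit (-) per part

    def laplacian_times(x):
        # Matrix-free product with the graph Laplacian.
        y = [0.0] * n
        for i, j in edges:
            d = x[i] - x[j]
            y[i] += d
            y[j] -= d
        return y

    # Conjugate gradient on L * lam = b, starting from lam = 0.
    lam = [0.0] * n
    r = b[:]
    p = r[:]
    rr = sum(v * v for v in r)
    for _ in range(max_iter):
        if rr <= tol:
            break
        Ap = laplacian_times(p)
        alpha = rr / sum(pi * ai for pi, ai in zip(p, Ap))
        lam = [l + alpha * pi for l, pi in zip(lam, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = sum(v * v for v in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new

    return {(i, j): lam[i] - lam[j] for i, j in edges}


# Two heavy parts (0 and 1) and one light part (2), all mutually adjacent.
schedule = diffusive_schedule([15.0, 15.0, 0.0], [(0, 1), (0, 2), (1, 2)])
```

For this example each heavy part sends 5 units to the light part and nothing moves between the two heavy parts, so all three end at the average load of 10 instead of overloading the light part.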
ParMA: Region Selection
Vertex: The vertices on inter-part boundaries bounding a small number of regions on source part P0; tips of ‘spikes’
Edge: The edges on inter-part boundaries bounding a small number of faces; ‘ridge’ edges with (a) 2 bounding faces, and (b) 3 bounding faces on source part P0
Face/Region: Regions which have two or three faces on inter-part boundaries; (a) ‘spike’ region (b) region on a ‘ridge’
Apply a KL/FM-like greedy heuristic to measure the relative change, or gain, in communication cost if a given mesh element is migrated
Migrate regions that have a large ratio of computational cost to migration cost: high ‘bang for the buck’
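The ‘bang for the buck’ rule can be sketched as a greedy ranking. This is a simplified illustration that only uses the ratio of computational cost to migration cost and omits the KL/FM-style communication gain; the function name, the tuple layout, and the candidate values are assumptions made for the example.

```python
def select_regions(candidates, target_load):
    """Greedily pick regions to migrate until target_load has been moved.

    candidates: list of (region_id, computation_cost, migration_cost)
    tuples for regions on the part boundary (hypothetical layout).
    Regions with the highest computation-to-migration ratio go first.
    """
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen, moved = [], 0.0
    for region_id, computation_cost, _migration_cost in ranked:
        if moved >= target_load:
            break
        chosen.append(region_id)
        moved += computation_cost
    return chosen


# Ratios are 4.0, 1.0 and 3.0, so r1 and r3 are preferred over r2.
candidates = [("r1", 4.0, 1.0), ("r2", 3.0, 3.0), ("r3", 6.0, 2.0)]
selected = select_regions(candidates, target_load=8.0)
```

Here `target_load` would come from the migration schedule for the neighboring part; the loop stops as soon as the scheduled amount has been covered.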
ParMA: Strong Scaling – 1B Mesh up to 160k Cores
AAA 1B elements: effective partitioning at extreme scale with and without ParMA (uniform weights, iterative migration using simple schedule)
[Graph: strong scaling up to the full system, without ParMA and with ParMA; panels labeled PMod]
ParMA: Tests
133M region mesh on 16k parts
Table 1: User’s input
Table 2: Balance of partitions
Table 3: Time usage and iterations (tests on Jaguar Cray XT5 system)