Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf ·...

Load Balancing

Aiichiro NakanoCollaboratory for Advanced Computing & Simulations

Department of Computer ScienceDepartment of Physics & Astronomy

Department of Chemical Engineering & Materials ScienceDepartment of Biological Sciences University of Southern California

Email: anakano@usc.edu

Load Balancing• Goal: Keep all processors equally busy while minimizing inter-

processor communication for irregular parallel computations • Issues:- Spatial data vs. generic graph- Static vs. adaptive- Incremental vs. non-incremental

• Load-balancing schemes:- Recursive bisection- Spectral method- Spacefilling curve- Curved space- Load diffusion

Data Locality in Parallelization

Irregulardata-structures/processor-speed

Parallelcomputer

Challenge: Load balancing for irregular data structures

E = tcomp max p {i | ri ∈ p}( ) + tcomm max p {i | ri −∂p < rc}( )+tlatency max p Nmessage(p)[ ]( )

Optimization problem:• Minimize the load-imbalance cost• Minimize the communication cost• Topology-preserving spatial decomposition → structured 6-step message passing minimizes latency

Computational-Space DecompositionTopology-preserving “computational-space”decomposition in curved space

Curvilinear coordinate transformationξ = x + u(x)

Regular mesh topology in computational space, ξ

Curved partition in physical space, x

p ξi( ) = px ξix( )PyPz + py ξiy( )Pz + pz ξiz( )pα ξiα( ) = ξiαPα /Lα⎣ ⎦ α = x, y,z( )

⎧ ⎨ ⎩ ⎪

Particle-processor mapping: regular 3D mesh topology

A. Nakano & T. J. Campbell, Parallel Comput. 23, 1461 (’97)

Wavelet-based Adaptive Load Balancing

10 -3 10 -1 10 1 10 3

Plane wave

Wavelet

CPU time (min)

ξ(x) = x+ dlmψlm (x)l,m∑

•Simulated annealing to minimize the load-imbalance & communication costs, E[ξ(x)]

•Wavelet representation speeds up the optimization

A. Nakano, Concurrency: Practice and Experience 11, 343 (’99)

Load Balancing as Graph Partitioning

www.cs.berkeley.edu/~demmel/cs267_Spr16!Prof. James Demmel (UC Berkeley)!

• Need: Decompose tasks without spatial indices• Graph partitioning: Given a graph G = (N, E, WN, WE)-N: node set = {j | tasks}-WN: node weights = {wN(j): task costs} -E: edge set = {(j,k) | messages from j to k}-WE: edge weights = {wE(j,k): message sizes} choose a partition N = N1 ∪ N2 ∪ … ∪ NP to minimize-maxp{∑j∈NpwN(j)}-max(p,q){∑j∈Np,k∈NqwE(j,k)}

• Graph bisection: Special case of N = N1 ∪ N2

• Choosing optimal partitioning is known to be NP-complete → need heuristics

1 (2) 2 (2) 3 (1)

6 (2) 7 (3)

2 1 2 3

58 cut edges

Spectral Bisection: Motivation1. Graph as point masses connected via harmonic springs2. The node of the eigenvector of the Hessian matrix, ∂2V/∂x2, corresponding

to the 2nd smallest eigenvalue separates the graph into 2

1D example!

1st eigenvector!

2nd eigenvector!

3rd eigenvector!

2D example!

2nd!eigenvector!

Partitioned!half!

circled!

Spectral BisectionLaplacian matrix: L(G) of a graph G(N,E) is an |N| by |N| symmetric matrix:-L(G)(i,i) = degree of node i (number of incident edges)-L(G)(i,j) = -1 if i ≠ j and there is an edge (i,j)-L(G)(i,j) = 0 otherwise

Theorems:1. The eigenvalues of L(G) are nonnegative: λ1 = 0 ≤ λ2 ≤ ••• ≤ λN)

2. λ2(L(G)) ≠ 0 if and only if G is connected

Spectral bisection algorithm:1. Compute eigenvector v2 corresponding to λ2(L(G))2. For each node i of G a. if v2(i) < 0, put node i in partition N- b. else put node i in partition N+

Example!

1 2 3 4 51 1 −12 −1 2 −13 −1 2 −14 −1 2 −15 −1 1

O(N) λ2 ComputationLanczos algorithm:• Given an N×N symmetric matrix A (e.g., L(G)), compute a

K×K “approximation” T by performing K matrix-vector products, where K << N

• Approximate A’s eigenvalues & eigenvectors using T’s

Choose an arbitrary starting vector r!b(0) = ||r||!j=0!repeat! j=j+1! q(j) = r/b(j-1) ! r = A*q(j)! r = r - b(j-1)*v(j-1)! a(j) = v(j)T * r! r = r - a(j)*v(j)! b(j) = ||r||!until convergence!

a1 b1b1 a2 b2! ! !

bK−2 aK−1 bK−1bK−1 aK

⎢ ⎢ ⎢ ⎢ ⎢ ⎢

⎥ ⎥ ⎥ ⎥ ⎥ ⎥

Multilevel PartitioningRecursively apply:1. Replace G(N,E) by a coarse approximation Gc(Nc,Ec), & partition Gc2. Use partition of Gc to obtain a rough partitioning of G, then uncoarsen & iteratively improve it

(N+,N-) = Multilevel_Partition(N,E)!// returns N+ and N- where N = N+ ∪ N-! if |N| is small!1 Partition G = (N,E) directly to get N = N+ ∪ N-! Return (N+,N-)! else!2 Coarsen G to get an approximation Gc = (Nc,Ec)!3 (Nc+,Nc-) = Multilevel_Partition(Nc,Ec)!4 Expand (Nc+,Nc-) to a partition (N+,N-) of N!5 Improve the partition (N+,N-)! Return (N+,N-)! endif!

(5) Coarsening! Multilevel!

V-cycle!

An Extra LessonContinuous optimization is easier than discrete combinatorial optimization

cf.• Linear combination of atomic potentials (LCAP) M. Wang et al., J. Amer. Chem. Soc. 128, 3228 (’06) • Gradient-directed Monte Carlo (DGMC) X. Hu, J. Chem. Phys. 129, 064102 (’08)

v(! r ) = bA

! R vA

! R (! r )

! R ,A∑LCAP:

Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf ·...

Documents

Lanczos Method for Eigensystemscacs.usc.edu/education/phys516/Lanczos.pdf · Lanczos Method for Eigensystems! Aiichiro Nakano! Collaboratory for Advanced Computing & Simulations!

New Dynamic Balancing Algorithms Based Upon Couple · PDF fileNEW DYNAMIC BALANCING ALGORITHMS BASED UPON COUPLE FORCE REDUCTION & ... industrial balancing ... Dynamic balancing has

Hybrid MPI+OpenMP Parallel MDcacs.usc.edu/education/cs653/HMD.pdf · Hybrid MPI+OpenMP Parallel MD Aiichiro Nakano Collaboratory for Advanced Computing & Simulations Department of

DVAD42 –Load Balancing P4 programmable Load Balancing

DYNAMICS OF MACHINERY - Notesvillage of Machinery.pdfBalancing: - Balancing of rotating masses, static balancing and dynamic balancing, ... Estimate the balancing mass that needs to

Soundcard Based Dynamic Balancing Machine for ... - …users.atw.hu/aerotarget/Balancing/Manual.pdf · Soundcard Based Dynamic Balancing Machine for Model Gas ... Balancing of the

Balancing Tool Chain: Balancing and automatic control in

OptionalPrereading - University of Southern Californiacacs.usc.edu/education/cs596/Prereading-Reinders... · tuning for both serial and multithreaded applications. He is the main

Hybrid Particle-Continuum Simulation - University of …cacs.usc.edu/education/cs653/Multiscale.pdfHybrid Particle-Continuum Simulation Aiichiro Nakano Collaboratory for Advanced Computing

BALANCING - Darshan Institute of Engineering & Technology€¦ · BALANCING Course Contents 3.1 Introduction 3.2 Static Balancing 3.3 Types of Balancing 3.4 Balancing of Several Masses

Modern Strategies in Electroorganic Synthesis · Modern Strategies in Electroorganic Synthesis Jun-ichi Yoshida,* Kazuhide Kataoka, Roberto Horcajada, and Aiichiro Nagaki ... and

HVAC TEST & BALANCING EQUIPMENT HVAC TEST & BALANCING

Balancing Machine Shanghai Jianping Dynamic Balancing

On Final Projects - University of Southern Californiacacs.usc.edu/education/phys516/FinalNote.pdf · On Final Projects • What? Anything You like • Written report + presentation

BALANCING VALVES - Kerr Valve & Industrial Supplies Balancing Valve.pdf · BALANCING VALVES BALANCING VALVE The STAD balancing valve delivers accurate hydronic perfomance ... TA,

Balancing and Sequencing Versus Only Balancing in …ijm2c.iauctb.ac.ir/article_521907_d6fed473bc0cf6223d0cd6c74ff229ea.… · Balancing and Sequencing Versus Only Balancing in

BALANCING VALVES - BMS Engineering & Trading …bmsengrg.com.sg/product-services/TA Balancing Valve-STAD.pdf · BALANCING VALVES BALANCING VALVE ... See catalogue leaﬂ et Prefab

Load Balancing - Northern Kentucky University · Load Balancing + Failover . ... • DNS global server load balancing • Layer 4 load balancer ... – Better load balancing algorithms

The Balancing Act: An Example of Line Balancing - Simul8 · PDF fileThe Balancing Act: An Example of Line Balancing ... to cover the 6 manual operations within ... SIMUL8.com The Balancing

B50-B200 Balancing Machines - IRD Balancing & LEXSECO