11
Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations Department of Computer Science Department of Physics & Astronomy Department of Chemical Engineering & Materials Science Department of Biological Sciences University of Southern California Email: [email protected]

Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf · Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf · Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations

Load Balancing

Aiichiro NakanoCollaboratory for Advanced Computing & Simulations

Department of Computer ScienceDepartment of Physics & Astronomy

Department of Chemical Engineering & Materials ScienceDepartment of Biological Sciences University of Southern California

Email: [email protected]

Page 2: Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf · Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations

Load Balancing• Goal: Keep all processors equally busy while minimizing inter-

processor communication for irregular parallel computations • Issues:- Spatial data vs. generic graph- Static vs. adaptive- Incremental vs. non-incremental

• Load-balancing schemes:- Recursive bisection- Spectral method- Spacefilling curve- Curved space- Load diffusion

Page 3: Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf · Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations

Data Locality in Parallelization

Irregulardata-structures/processor-speed

Parallelcomputer

Map

Challenge: Load balancing for irregular data structures

E = tcomp max p {i | ri ∈ p}( ) + tcomm max p {i | ri −∂p < rc}( )+tlatency max p Nmessage(p)[ ]( )

Optimization problem:• Minimize the load-imbalance cost• Minimize the communication cost• Topology-preserving spatial decomposition → structured 6-step message passing minimizes latency

Page 4: Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf · Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations

Computational-Space DecompositionTopology-preserving “computational-space”decomposition in curved space

Curvilinear coordinate transformationξ = x + u(x)

Regular mesh topology in computational space, ξ

Curved partition in physical space, x

p ξi( ) = px ξix( )PyPz + py ξiy( )Pz + pz ξiz( )pα ξiα( ) = ξiαPα /Lα⎣ ⎦ α = x, y,z( )

⎧ ⎨ ⎩ ⎪

Particle-processor mapping: regular 3D mesh topology

A. Nakano & T. J. Campbell, Parallel Comput. 23, 1461 (’97)

Page 5: Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf · Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations

Wavelet-based Adaptive Load Balancing

1000

2000

3000

10 -3 10 -1 10 1 10 3

Plane wave

Wavelet

Tot

al c

ost

CPU time (min)

ξ(x) = x+ dlmψlm (x)l,m∑

•Simulated annealing to minimize the load-imbalance & communication costs, E[ξ(x)]

•Wavelet representation speeds up the optimization

A. Nakano, Concurrency: Practice and Experience 11, 343 (’99)

Page 6: Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf · Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations

Load Balancing as Graph Partitioning

www.cs.berkeley.edu/~demmel/cs267_Spr16!Prof. James Demmel (UC Berkeley)!

• Need: Decompose tasks without spatial indices• Graph partitioning: Given a graph G = (N, E, WN, WE)-N: node set = {j | tasks}-WN: node weights = {wN(j): task costs} -E: edge set = {(j,k) | messages from j to k}-WE: edge weights = {wE(j,k): message sizes} choose a partition N = N1 ∪ N2 ∪ … ∪ NP to minimize-maxp{∑j∈NpwN(j)}-max(p,q){∑j∈Np,k∈NqwE(j,k)}

• Graph bisection: Special case of N = N1 ∪ N2

• Choosing optimal partitioning is known to be NP-complete → need heuristics

1 (2) 2 (2) 3 (1)

4 (3)

5 (1)

6 (2) 7 (3)

8 (1)

4

6

1 2

1

2 1 2 3

5

58 cut edges

Page 7: Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf · Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations

Spectral Bisection: Motivation1. Graph as point masses connected via harmonic springs2. The node of the eigenvector of the Hessian matrix, ∂2V/∂x2, corresponding

to the 2nd smallest eigenvalue separates the graph into 2

1D example!

1st eigenvector!

2nd eigenvector!

3rd eigenvector!

+

+

+ +

2D example!

2nd!eigenvector!

Partitioned!half!

circled!

Page 8: Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf · Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations

Spectral BisectionLaplacian matrix: L(G) of a graph G(N,E) is an |N| by |N| symmetric matrix:-L(G)(i,i) = degree of node i (number of incident edges)-L(G)(i,j) = -1 if i ≠ j and there is an edge (i,j)-L(G)(i,j) = 0 otherwise

Theorems:1. The eigenvalues of L(G) are nonnegative: λ1 = 0 ≤ λ2 ≤ ••• ≤ λN)

2. λ2(L(G)) ≠ 0 if and only if G is connected

Spectral bisection algorithm:1. Compute eigenvector v2 corresponding to λ2(L(G))2. For each node i of G a. if v2(i) < 0, put node i in partition N- b. else put node i in partition N+

Example!

1 2 3 4 51 1 −12 −1 2 −13 −1 2 −14 −1 2 −15 −1 1

Page 9: Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf · Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations

O(N) λ2 ComputationLanczos algorithm:• Given an N×N symmetric matrix A (e.g., L(G)), compute a

K×K “approximation” T by performing K matrix-vector products, where K << N

• Approximate A’s eigenvalues & eigenvectors using T’s

Choose an arbitrary starting vector r!b(0) = ||r||!j=0!repeat! j=j+1! q(j) = r/b(j-1) ! r = A*q(j)! r = r - b(j-1)*v(j-1)! a(j) = v(j)T * r! r = r - a(j)*v(j)! b(j) = ||r||!until convergence!

T =

a1 b1b1 a2 b2! ! !

bK−2 aK−1 bK−1bK−1 aK

⎢ ⎢ ⎢ ⎢ ⎢ ⎢

⎥ ⎥ ⎥ ⎥ ⎥ ⎥

Page 10: Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf · Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations

Multilevel PartitioningRecursively apply:1. Replace G(N,E) by a coarse approximation Gc(Nc,Ec), & partition Gc2. Use partition of Gc to obtain a rough partitioning of G, then uncoarsen & iteratively improve it

(N+,N-) = Multilevel_Partition(N,E)!// returns N+ and N- where N = N+ ∪ N-! if |N| is small!1 Partition G = (N,E) directly to get N = N+ ∪ N-! Return (N+,N-)! else!2 Coarsen G to get an approximation Gc = (Nc,Ec)!3 (Nc+,Nc-) = Multilevel_Partition(Nc,Ec)!4 Expand (Nc+,Nc-) to a partition (N+,N-) of N!5 Improve the partition (N+,N-)! Return (N+,N-)! endif!

(2,3)

(2,3)

(2,3)

(1)

(4)

(4)

(4)

(5)

(5)

(5) Coarsening! Multilevel!

V-cycle!

Page 11: Load Balancing - University of Southern Californiacacs.usc.edu/education/phys516/LoadBalance.pdf · Load Balancing Aiichiro Nakano Collaboratory for Advanced Computing & Simulations

An Extra LessonContinuous optimization is easier than discrete combinatorial optimization

cf.• Linear combination of atomic potentials (LCAP) M. Wang et al., J. Amer. Chem. Soc. 128, 3228 (’06) • Gradient-directed Monte Carlo (DGMC) X. Hu, J. Chem. Phys. 129, 064102 (’08)

v(! r ) = bA

! R vA

! R (! r )

! R ,A∑LCAP: