Multithreaded Graph Coloring Algorithms for Scientific Computing on Many-core Architectures
Assefaw Gebremedhin, agebreme@purdue.edu
Purdue University
ICCS Workshop on Manycore and Accelerator-based High-Performance Scientific Computing
Berkeley, January 28, 2011




CSCAPES 

www.cscapes.org 


Coloring and its applications

•  Graph coloring is an abstraction for partitioning a set of binary-related objects into a few "independent sets"

•  Coloring contributed to the growth of much of Graph Theory

•  Our work on coloring is motivated by its practical applications:

–  Concurrency discovery in parallel (scientific) computing

–  Sparse derivative matrix computation

–  Scheduling

–  Frequency assignment

–  Facility location

–  Register allocation, etc.


Graph coloring in concurrency discovery 

[Figure: tasks S1 to S6 drawn as a graph, and the same tasks scheduled along a time axis with steps T1, T2, T3.]

•  Adaptive mesh refinement

•  Iterative methods for sparse linear systems

•  Full sparse tiling


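Coloring models in derivative computation: overview

| Matrix   | Unidirectional partition | Bidirectional partition | Method       |
|----------|--------------------------|-------------------------|--------------|
| Jacobian | distance-2 coloring      | star bicoloring         | Direct       |
| Hessian  | star coloring            | NA                      | Direct       |
| Jacobian | NA                       | acyclic bicoloring      | Substitution |
| Hessian  | acyclic coloring         | NA                      | Substitution |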

4-step procedure for computing a sparse derivative matrix A using Automatic Differentiation:

•  S1: Determine the sparsity structure of A

•  S2: Obtain a seed matrix S by coloring the graph of A

•  S3: Compute a compressed matrix B = AS

•  S4: Recover entries of A from B

Dimensions: A is m × n, S is n × p, B is m × p
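Steps S2 to S4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ColPack or ADOL-C code: the matrix `A`, the column coloring `colors`, and the helper names `seed_matrix`, `compress`, and `recover` are all hypothetical, and the dense product stands in for the p Jacobian-vector products an AD tool would compute.

```python
def seed_matrix(colors, p):
    """Build the n x p seed matrix S: S[j][c] = 1 iff column j has color c."""
    return [[1 if colors[j] == c else 0 for c in range(p)]
            for j in range(len(colors))]

def compress(A, S):
    """Form B = A S densely (a stand-in for p products computed by AD)."""
    m, n, p = len(A), len(S), len(S[0])
    return [[sum(A[i][j] * S[j][c] for j in range(n)) for c in range(p)]
            for i in range(m)]

def recover(B, pattern, colors):
    """Direct recovery: each structural nonzero (i, j) is read off B[i][colors[j]]."""
    A = [[0.0] * len(colors) for _ in range(len(B))]
    for i, j in pattern:
        A[i][j] = B[i][colors[j]]
    return A

# Hypothetical 3 x 4 example. Columns 0, 1, 3 share no row, so they can share a color.
A = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 4.0, 5.0]]
pattern = [(i, j) for i, row in enumerate(A) for j, x in enumerate(row) if x != 0]
colors = [0, 0, 1, 0]          # S2: a structurally orthogonal column partition
S = seed_matrix(colors, 2)
B = compress(A, S)             # S3: 2 columns instead of 4
A_rec = recover(B, pattern, colors)  # S4
```

The saving comes from p (here 2) being smaller than n (here 4); recovery is exact only because no two columns of the same color have a nonzero in the same row.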


Distance‐2 coloring: an archetypal model in direct methods 

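Symmetric case: matrix A and its adjacency graph G_a on vertices c1, ..., c5

$$
A = \begin{bmatrix}
a_{11} & 0 & 0 & 0 & a_{15} \\
0 & a_{22} & a_{23} & 0 & 0 \\
0 & a_{32} & a_{33} & a_{34} & a_{35} \\
0 & 0 & a_{43} & a_{44} & a_{45} \\
a_{51} & 0 & a_{53} & a_{54} & a_{55}
\end{bmatrix}
$$

Nonsymmetric case: matrix A and its bipartite graph G_b on row vertices r1, ..., r4 and column vertices c1, ..., c5

$$
A = \begin{bmatrix}
a_{11} & a_{12} & 0 & 0 & a_{15} \\
a_{21} & a_{22} & 0 & 0 & 0 \\
a_{31} & 0 & 0 & a_{34} & 0 \\
0 & 0 & a_{43} & a_{44} & a_{45}
\end{bmatrix}
$$

[Figure: the graphs G_a and G_b with colored column vertices.]

A structurally orthogonal partition of the columns of A corresponds to a distance-2 coloring of the graph.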
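The link between a structurally orthogonal partition and distance-2 coloring can be made concrete with a short Python sketch. It greedily colors the columns so that two columns sharing a row (that is, column vertices at distance 2 in the bipartite graph) never get the same color. The sparsity pattern below is one reading of the nonsymmetric 4 × 5 example on this slide, and the function name `color_columns` is hypothetical; this is not how ColPack is implemented.

```python
def color_columns(rows_of_col, n_rows):
    """Greedy column coloring: columns with a nonzero in a common row differ in color."""
    # Invert the pattern: for each row, the columns that have a nonzero in it.
    cols_of_row = [[] for _ in range(n_rows)]
    for j, rows in enumerate(rows_of_col):
        for i in rows:
            cols_of_row[i].append(j)

    colors = [-1] * len(rows_of_col)
    for j, rows in enumerate(rows_of_col):
        # Colors already used by columns two hops away (column -> row -> column).
        forbidden = {colors[k] for i in rows for k in cols_of_row[i]
                     if colors[k] != -1}
        c = 0
        while c in forbidden:
            c += 1
        colors[j] = c
    return colors

# Rows (0-indexed) holding a nonzero in each of the five columns c1..c5.
rows_of_col = [[0, 1, 2], [0, 1], [3], [2, 3], [0, 3]]
colors = color_columns(rows_of_col, n_rows=4)
groups = {}
for j, c in enumerate(colors):
    groups.setdefault(c, []).append(j)
```

With the natural column order this yields three groups, {c1, c3}, {c2, c4}, and {c5}, so three compressed columns suffice for this pattern.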


Coloring models in derivative computation revisited

| Matrix   | Unidirectional partition | Bidirectional partition | Method |
|----------|--------------------------|-------------------------|--------|
| Jacobian | distance-2 coloring: Gebremedhin, Manne and Pothen (05) | star bicoloring: Coleman and Verma (98); Hossain and Steihaug (98) | Direct |
| Hessian  | star coloring: Coleman and More (84); restricted star coloring*: Powell and Toint (79) | NA | Direct |
| Jacobian | NA | acyclic bicoloring: Coleman and Verma (98) | Substitution |
| Hessian  | acyclic coloring: Coleman and Cai (86); triangular coloring*: Coleman and More (84) | NA | Substitution |

Jacobian: bipartite graph
Hessian: adjacency graph

* Less accurate models

Source: SIAM Review 47(4):629-705, 2005.

ColPack: www.cscapes.org/coloringpage


An Example Application


Principle of Chromatography

[Figure: schematic of a chromatographic column with the following labels.]

•  Desorbent (water, organic solvent, etc.)

•  Feed (mixture of red and blue components)

•  Pump

•  Packing medium (adsorbent particles)

•  Chromatographic column

•  Blue component, red component

Red component sticks more strongly to adsorbent particles

Image source: http://www.cwg.hu/english/r‐wtcomp.html

Figure courtesy of Yoshiaki Kawajiri, GT


Simulated Moving Bed process

•  A pseudo counter-current process that mimics operation of TMB

•  Reaches only Cyclic Steady State

•  Various objectives to be maximized could be identified, e.g. product purity, product recovery, desorbent consumption, throughput

•  We considered throughput maximization

•  Objective modeled as an optimization problem with PDAEs as constraints

•  Full discretization was used to solve the PDAEs, which gives rise to sparse Jacobians


•  Tested efficacy of the 4-step procedure:

•  Used ADOL-C for steps S1 and S3, and ColPack for steps S2 and S4

•  Observed results for each step matched analytical results

•  Techniques enabled huge savings in runtime

   Time(Jacobian eval) ≈ 100 × Time(function eval)

•  Dense computation (without exploiting sparsity) was infeasible

Results on Jacobian computation on SMB problem

[Figure: two plots against m/100000. One shows runtime(task)/runtime(F) for sparsity detection (S1), seed generation (S2), matrix-vector product (S3), recovery (S4), and the total. The other shows runtime(F).]

Gebremedhin, Pothen and Walther: AD2008.


Complexity and algorithms

•  Distance-k, star, and acyclic coloring are NP-hard (to even approximate)

–  Distance-1 coloring is hard to approximate to within n^(1-ε) for all ε > 0 [Zuckerman'07]

•  A greedy algorithm usually gives a good solution

GREEDY(G=(V,E))
    Order the vertices in V
    for i = 1 to |V| do
        Determine forbidden colors to v_i
        Assign v_i the smallest permissible color
        [Update collection of induced subgraphs]
    end-for

•  ColPack has

–  O(|V| d_k)-time algorithms for distance-k coloring (d_k is the average degree-k)

–  O(|V| d_2)-time algorithms for star and acyclic coloring

Key idea: exploit structure of two-colored induced subgraphs
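For the distance-1 case, the GREEDY scheme can be written as a short Python sketch. It is an illustration rather than ColPack's code: the adjacency-list input format and the name `greedy_coloring` are assumptions, and the bracketed subgraph update (needed only for star and acyclic coloring) is left out. The `forbidden` array is stamped with the current vertex instead of being cleared, which keeps the total work proportional to the number of edges.

```python
def greedy_coloring(adj, order=None):
    """Distance-1 greedy coloring; adj[v] lists the neighbors of vertex v."""
    n = len(adj)
    order = range(n) if order is None else order
    color = [-1] * n
    # forbidden[c] == v means color c is used by some neighbor of v.
    # A vertex needs at most deg(v) + 1 <= n colors, so n + 1 slots suffice.
    forbidden = [-1] * (n + 1)
    for v in order:
        for w in adj[v]:
            if color[w] != -1:
                forbidden[color[w]] = v
        c = 0
        while forbidden[c] == v:   # smallest permissible color
            c += 1
        color[v] = c
    return color

# A 5-cycle: an odd cycle needs three colors.
cycle = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
coloring = greedy_coloring(cycle)
```

Passing a different `order` is how an ordering heuristic plugs in without touching the coloring loop.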


Ordering techniques in ColPack: fresh formulation

| Ordering              | Property |
|-----------------------|----------|
| Largest First         | for i = 1 to n: v_i has largest degree in V \ {v_1, v_2, ..., v_{i-1}} |
| Incidence Degree      | for i = 1 to n: v_i has largest back degree in V \ {v_1, v_2, ..., v_{i-1}} |
| Dynamic Largest First | for i = 1 to n: v_i has largest forward degree in V \ {v_1, v_2, ..., v_{i-1}} |
| Smallest Last         | for i = n to 1: v_i has smallest back degree in V \ {v_n, v_{n-1}, ..., v_{i+1}} |

Formulation enables:
•  modular implementation
•  linear-time implementation
•  discovery of use in other contexts

[Figure: the ordered vertices v_1, v_2, ..., v_i, ..., v_{n-1}, v_n, with the back degree, degree, and forward degree of v_i marked.]
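As an illustration of the Smallest Last property, the following Python sketch fills the ordering from position n down to 1, each time picking a remaining vertex with the fewest remaining neighbors. It is deliberately simple and runs in quadratic time; the linear-time variant mentioned on the slide would use degree buckets instead of the `min` scan. The function name and the adjacency-list format are assumptions made for this example.

```python
def smallest_last_order(adj):
    """Return a vertex order in which order[i] has the smallest back degree
    among the vertices not yet placed at positions i+1 .. n-1."""
    n = len(adj)
    degree = [len(adj[v]) for v in range(n)]   # degree within the remaining graph
    removed = [False] * n
    order = [None] * n
    for i in range(n - 1, -1, -1):
        # Quadratic-time selection; ties go to the lowest vertex index.
        v = min((u for u in range(n) if not removed[u]), key=lambda u: degree[u])
        removed[v] = True
        order[i] = v
        for w in adj[v]:
            if not removed[w]:
                degree[w] -= 1
    return order

# Hypothetical graph: vertex 1 is joined to vertices 0, 2 and 3.
star = [[1], [0, 2, 3], [1], [1]]
order = smallest_last_order(star)
```

Coloring the vertices in the returned order (first to last) is the usual way such an ordering is consumed by a greedy algorithm.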


Parallelization…


Challenges in parallelization in general (on contemporary platforms)

•  Parallel Architectural Models?
–  Control mechanism; address space (memory) organization; interconnection network; etc.

•  Parallel Programming Models?
–  Shared memory; distributed memory; massive threading; etc.

•  Parallel Computational Models?
–  Wish: realistic yet reasonably simple abstractions


Challenges in parallelizing graph algorithms

•  Low available concurrency

•  Poor data locality

•  Irregular memory access pattern

•  Access pattern determined only at runtime

•  High data access to computation ratio


Parallel Coloring Algorithms 

•  Independent-set based (previous approaches)
–  Find maximal independent set in parallel (Luby's algorithm)
–  Limited (or no) success

•  Iteration and speculation

     Iterative Algorithm (G=(V,E))
         Order V in parallel
         U = V
         while U is not empty
             1. Speculatively color vertices in U in parallel;
             2. Check consistency of colors in U in parallel, store conflicts in R;
             U = R;

•  Dataflow
–  Fine-grain (edge-level) synchronization; no iteration
–  Feasible when there is HW support for FGS (like the Cray XMT)
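The iterative (speculate, then repair) scheme can be imitated sequentially in Python. This sketch is a pessimistic simulation, not the multithreaded implementation: in each round every vertex in U picks its color from the same snapshot, as if all threads ran at once, and the rule that the higher-numbered endpoint of a conflicting edge is recolored is an assumption made here for determinism. The ordering step is omitted and all names are hypothetical.

```python
def iterative_coloring(adj):
    """Sequential simulation of speculative coloring with conflict resolution."""
    n = len(adj)
    color = [-1] * n
    U = list(range(n))
    rounds = 0
    while U:
        rounds += 1
        # Snapshot seen by all "threads"; vertices being recolored count as uncolored.
        snapshot = color[:]
        for v in U:
            snapshot[v] = -1
        # Phase 1: speculatively give each vertex the smallest color not seen
        # on its neighbors in the snapshot.
        for v in U:
            taken = {snapshot[w] for w in adj[v]}
            c = 0
            while c in taken:
                c += 1
            color[v] = c
        # Phase 2: detect conflicts; the higher-numbered endpoint is recolored.
        R = [v for v in U
             if any(color[w] == color[v] and w < v for w in adj[v])]
        U = R
    return color, rounds

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
graph = [[1, 2], [0, 2], [0, 1, 3], [2]]
coloring, rounds = iterative_coloring(graph)
```

The loop always terminates because the lowest-numbered vertex in U can only conflict with other members of U, so at least one vertex is settled per round. Real thread schedules see fresher neighbor colors than this snapshot model, so they would typically produce fewer conflicts.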


Enhancing the Iterative Algorithm

•  Color choice
–  First Fit
–  Staggered First Fit
–  Least Used
–  Random

•  Resolving a conflict
–  Randomization


Ordering is inherently sequential

Remedy: approximation

Illustration: Smallest Last ordering


Experimental Results on  Parallel Performance 


Test platforms

Cray XMT
•  128 processors
•  128 hardware thread streams per processor
•  cache-less, globally accessible shared memory
•  hardware support for fine-grain synchronization

Sun Niagara 2
•  two 8-core sockets
•  8 hardware threads per core
•  L1 cache on core, shared L2 cache

Intel Nehalem
•  two quad-core chips
•  two hyperthreads per core
•  private L1 and L2 cache, shared L3 cache

[Figure: block diagrams of the three platforms; the labels are not recoverable from the extracted text.]


Test graphs 

sc : graphs from scientific computing apps
er : R-MAT (0.25, 0.25, 0.25, 0.25)
g  : R-MAT (0.45, 0.15, 0.15, 0.25)
b  : R-MAT (0.55, 0.15, 0.15, 0.15)
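The four numbers after R-MAT are the probabilities of descending into each quadrant of the adjacency matrix at every level of the recursion. A minimal Python sketch of that idea follows; it is not the generator used for these experiments, the function name and parameters are hypothetical, and it keeps duplicate edges and self-loops and adds no noise to the probabilities, all of which practical generators often handle differently.

```python
import random

def rmat_edges(scale, n_edges, probs=(0.45, 0.15, 0.15, 0.25), seed=0):
    """Generate n_edges directed edges on 2**scale vertices, R-MAT style."""
    a, b, c, _d = probs            # quadrant probabilities; they should sum to 1
    rng = random.Random(seed)
    edges = []
    for _ in range(n_edges):
        u = v = 0
        for _ in range(scale):     # one quadrant choice per bit of the vertex ids
            r = rng.random()
            u, v = 2 * u, 2 * v
            if r < a:              # top-left quadrant: both bits stay 0
                pass
            elif r < a + b:        # top-right quadrant
                v += 1
            elif r < a + b + c:    # bottom-left quadrant
                u += 1
            else:                  # bottom-right quadrant
                u += 1
                v += 1
        edges.append((u, v))
    return edges

# Small instances with the "er" and "b" parameter sets listed above.
er_edges = rmat_edges(scale=4, n_edges=100, probs=(0.25, 0.25, 0.25, 0.25))
b_edges = rmat_edges(scale=4, n_edges=100, probs=(0.55, 0.15, 0.15, 0.15))
```

Equal probabilities make every matrix cell equally likely, while skewed probabilities concentrate edges on low-numbered vertices and give a much less uniform degree distribution.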


Distance‐2 coloring: # colors 

Nehalem 


Distance‐2 coloring: # colors 

Nehalem 


Distance-2 coloring: runtime

Nehalem 


Distance-2 coloring: runtime

Nehalem 


Distance‐1 coloring: # colors 

Nehalem, Niagara 2, Cray XMT

Distance-1 coloring: runtime

[Figure: runtime plots for the Iterative and Dataflow algorithms on Niagara 2, Nehalem, and Cray XMT. Captions: "Small-world graph with 2^24 = 16M vertices and 134M edges" and "Small-world graphs with 2^24, …, 2^27 vertices and 134M, …, 1B edges". Axis labels are not recoverable from the extracted text.]


Iterative: looking inside

Nehalem, Niagara 2, Cray XMT

A "generic" parallelization technique?

•  "Standard" Partitioning
–  Break up the given problem into p independent subproblems of almost equal sizes
–  Solve the p subproblems concurrently

•  "Relaxed" Partitioning
–  Break up the problem into p, not necessarily entirely independent, subproblems of almost equal sizes
–  Solve the p subproblems concurrently
–  Detect inconsistencies in the solutions concurrently
–  Resolve any inconsistencies

Can potentially be used successfully if the resolution in the fourth step involves only local adjustments


Thanks 

•  Erik Boman, Doruk Bozdag, Umit Catalyurek, John Feo, Mahantesh Halappanavar, Bruce Hendrickson, Paul Hovland, Fredrik Manne, Duc Nguyen, Mostafa Patwary, Alex Pothen, Arijit Tarafdar, Andrea Walther

•  Financial Support: DOE, NSF 


Some References

•  Gebremedhin, Nguyen, Pothen and Patwary. ColPack: Graph Coloring Software for Derivative Computation and Beyond. ACM Trans. Math. Software. Submitted, 2010.

•  Gebremedhin, Manne and Pothen. What color is your Jacobian? Graph coloring for computing derivatives. SIAM Review 47(4):629-705, 2005.

•  Gebremedhin, Tarafdar, Manne and Pothen. New acyclic and star coloring algorithms with applications to computing Hessians. SIAM J. Sci. Comput. 29:1042-1072, 2007.

•  Gebremedhin, Pothen and Walther. Exploiting sparsity in Jacobian computation via coloring and automatic differentiation: a case study in a Simulated Moving Bed process. AD2008, LNCSE 64:339-349, 2008.

•  Catalyurek, Feo, Gebremedhin, Halappanavar, Pothen. Multithreaded Algorithms for Graph Coloring. In submission, 2011.

•  Bozdag, Catalyurek, Gebremedhin, Manne, Boman and Ozguner. Distributed-memory parallel algorithms for distance-2 coloring and related problems in derivative computation. SIAM J. Sci. Comput. 32(4):2418-2446, 2010.

•  Bozdag, Gebremedhin, Manne, Boman and Catalyurek. A framework for scalable greedy coloring on distributed-memory parallel computers. J. Parallel Distrib. Comput. 68(4):515-535, 2008.

•  For more information: www.cs.purdue.edu/homes/agebreme