29
Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful Azad, Lawrence Berkeley Na.onal Laboratory Joint work with Aydın Buluç (LBNL) Support: DOE Office of Science SIAM PP 2016, Paris

Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra

Ariful  Azad,  Lawrence  Berkeley  Na.onal  Laboratory    

Joint  work  with  Aydın  Buluç  (LBNL)    Support:  DOE  Office  of  Science  

SIAM  PP  2016,  Paris  

Page 2: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

q  Matching:  A  subset    of  independent  edges,  i.e.,  at  most  one  edge  in  the  matching  is  incident  on  each  vertex.    

q  Maximal  cardinality  matching:  A  matching  where  if  another  edge  is  added  it  is  not  a  matching  anymore.  

q  Maximum  cardinality  matching  (MCM)  has  the  maximum  possible  cardinality    

A  matching  in  a  graph  

x1

x2

x3

y1

y2

y3

Maximal  Cardinality  Matching  Cardinality  =  2    

x1

x2

x3

y1

y2

y3

Maximum  Cardinality  Matching  Cardinality  =  3  

Matched  vertex  

Unmatched  vertex  

Matched  edge  

Unmatched  edge  

Page 3: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

Applica.on  of  matching  in  scien.fic  compu.ng  

Bipar&te)Cardinality)Matching) Bipar&te)Weighted)Matching)

Maximum)Cardinality)Matching)(MCM))

Maximum)Weighted)Matching)(MWM))exact/approx.)

Maximum<Weight)Perfect)Matching)(MWPM))

exact/approx.)

Maximal)Cardinality)Matching)(1/2)approx.))

Block)Triangular))Form)(BTF))

Sparse)QR)) Sparse)LU)) Graph))Par&&oning)

KLU)

Alg

orith

ms

App

licat

ions

AMG)Least)square)on)HBS)matrices)

Pre<condi&oning)

HSS<based)mul&frontal)solver)

use  rela.onship  

Page 4: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

q  Problem:  Cardinality  matching  in  a  bipar.te  graph  –  Maximum  cardinality  matching  (MCM)  –  Maximal  cardinality  matching  (used  to  ini.alize  MCM)  

q  Algorithm:  Distributed-­‐memory  parallel  algorithms  

q  Approach:  Matrix-­‐algebraic  formula.ons  of  graph  primi.ves.  Inspired  by  Graph  BLAS  (hWp://graphblas.org/).  –  More  discussion  on  Friday  (MS68):  The  GraphBLAS  Effort:  Kernels,  API,  and  Parallel  Implementa.ons  by  Aydin  Buluc.  

q  Covers  two  recent  papers:  –  Maximal  matching:  Azad  and  Buluç,  IEEE  CLUSTER  2015  –  Maximum  matching:  Azad  and  Buluç,  IPDPS  2016  

Scope  of  this  talk  

Page 5: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

MCM  algorithm  based  on  augmen.ng-­‐path  searches  

q  Augmen.ng  path:  A  path  that  alternates  between  matched  and  unmatched  edges  with  unmatched  end  points.    

q  Algorithm:  Search  for  augmen.ng  paths  and  flip  edges  across  the  paths  to  increase  cardinality  of  the  matching.  –  Algorithmic  op?ons:  single  source  or  mul.-­‐source,  breadth-­‐first  search  

(BFS)  or  depth-­‐first  search  (DFS)  

x1

x2

x3

y1

y2

y3

x1

x2

x3

y1

y2

y3

Augment  matching  

Page 6: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

Algorithmic  landscape  of  cardinality  matching  Duff,  Kaya  and  Ucar  (ACM  TOMS  2011),  Azad,  Buluç,  Pothen  (TPDS  2016)  

Class   Search  strategy   Serial  Complexity  

Maximum  cardinality  matching    

Single-­‐source  augmen.ng  path  search     DFS  or  BFS   O(nm)  

Mul.-­‐source  augmen.ng  path  search  

DFS  w  lookahead  (Pothen-­‐Fan)   O(nm)  

BFS  (MS-­‐BFS)   O(nm)  

DFS  &  BFS  (Hopcrof-­‐Karp)   O(√nm)  

Push  relabel   Label  guided  FIFO  search     O(nm)  

Maximal  cardinality  matching  

Greedy  Karp-­‐Sipser    

Dynamic  mindegree  Local     O(m)  

MS-­‐BFS:  exposes  more  parallelism    

Hopcrof-­‐Karp:  best  asympto.c  complexity   Ini.alizes  MCM  

Our  focus  

Page 7: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

q When  a  graph  does  not  fit  in  the  memory  of  a  node  q  The  graph  is  already  distributed  

–  Example:  sta.c  pivo.ng  in  SuperLU_DIST  (Li  and  Demmel,  2003)  –  The  graph  is  gathered  on  a  single  node  and  MC64  is  used  to  compute  the  matching,  which  is  unscalable  and  expensive    

 

The  need  for  distributed-­‐memory  algorithms  

��

��

��

��

�� ��� ���� ���� �����

���������

���� �� ����� �� ��� �����

Time  to  gather  a  graph  and  scaWer  the  matching  on  2048  cores  of    NERSC/Edison  (Cray  XC30)  

Distributed  algorithms    are  cheaper  and  scalable  

Page 8: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

q  Prior  work:  Push-­‐relabel  by  Langguth  et  al.  (2011)  and  Karp-­‐Sipser  on  general  graph  by  Patwary  et  al.  (2010).  –  does  not  scale  beyond  64  processors  

q  Challenge  –   long  paths  passing  through  mul.ple  processors    –   lots  of  fine-­‐grained  asynchronous  communica.on    

q  Here  we  use  graph-­‐matrix  duality  and  design  matching  algorithms  using  scalable  matrix  and  vector  opera.ons.  –  A  handful  of  standard  opera.ons    –  Offers  bulk-­‐synchronous  parallelism  –  Jumping  among  algorithms  is  easier        

Distributed-­‐memory  cardinality  matching  

Page 9: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

Two  required  primi.ves  1.  Sparse  matrix-­‐sparse  vector  mul?ply  (SpMSpV)  

1

5

2 3 4

1 5 2 3 4

×

× ×

×

× ×

c1

c2

c3

c4

r1

r2

r3

r4

r5 c5

× × ×

1 2 5

1"

1"

2"

5"

5"

Select columns

Unmatched columns: fc

In each row, retain the minimum product from the selected columns

Vis

ited

row

s: f r

Matrix A Bipartite graph G(R,C,E)

Semiring  Op.on:  (mul.ply,add)                                                                  (select2nd,  min)  

c1 c2

r1 r2 r3 r4 r5

c5

Row  

 Ver.ces  

Column  

 Ver.ces  

1

5

2 3 4

1 5 2 3 4

×

× ×

×

× ×

c1

c2

c3

c4

r1

r2

r3

r4

r5 c5

× × ×

1 2 5

1"

1"

2"

5"

5"

Select columns

Unmatched columns: fc

In each row, retain the minimum product from the selected columns

Vis

ited

row

s: f r

Matrix A Bipartite graph G(R,C,E)

Graph  Opera.on  Traverse    

unvisited  neighbors  Matrix  Opera.on  

SpMSpV  A  matching  

Page 10: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

Two  required  primi.ves  2.  Inverted  index  in  a  sparse  vector  

c1 c2

r1 r2 r4

c5

r4

c5

r1

c1

Swap  parents    and  children  Duplicates  removed  Graph  Opera.on  

1.  Keep  unique  child  2.  Swap  matched  and              unmatched  edges  

Invert  Vector  Opera.on  Inverted  index  in  a    sparse  vector  

1 1 5 1 4

1 2 3 4 5 1 2 3 4 5

Index:  child  Value:  parent  

Index:  parent  Value:  child  

Page 11: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

Mul.-­‐source  BFS  (MS-­‐BFS)  algorithm    using  matrix  and  vector  opera.ons  

x1

x2

x3

x4

x5

y1

y2

y3

y4

y5

x6 y6

(a) A maximal matching in a Bipartite Graph

x1 x2

x3 x4 x5

y1 y2 y3

y4 y5

(b) Alternating BFS Forest

Roots  of  BFS  trees  

Sparse  matrix-­‐sparse  vector  mul.ply    (SpMSpV)  

Inverted  index  using    matching  vector  

Sparse  matrix-­‐sparse  vector  mul.ply    (SpMSpV)  

Not  explored  to  maintain  vertex-­‐disjoint  trees  

Step-­‐1:  Discover  vertex-­‐disjoint  augmen?ng  paths  

Page 12: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

MS-­‐BFS  algorithm    using  matrix  and  vector  opera.ons  

x1 x2

x3 x4 x5

y1 y2 y3

y4 y5

Augment matching

Step-­‐2:  Augment  matching  by  flipping  matched  and  unmatched  edges  along  the  augmen?ng  paths    

Invert  

Invert  

Page 13: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

Distributed  memory  paralleliza.on  (SpMSpV)  

x A fron.er  

à  x

ALGORITHM:  1.  Gather  ver.ces  in  processor  column  [communica?on]  2.  Local  mul.plica.on  [computa.on]  3.  Find  owners  of  the  current  fron.er’s  adjacency  and  exchange  

adjacencies  in  processor  row  [communica?on]  

n p

n p

p × p Processor  grid  

P  processors  are  arranged  in  

Page 14: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

Shared-­‐memory  paralleliza.on  (SpMSpV)    

y  

x  (fron.er)  n (t p )

Thread  1  

Thread  2  

•  Explicitly  split  local  submatrices  to  t  (#threads)  pieces  along  the  rows.  

n (t p )

Page 15: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

Computa.on  and  communica.on  .me  of    discovering  vertex-­‐disjoint  augmen.ng  paths  (a  phase)  

Opera?on  Per  processor  Computa?on  (lower  bound)  

Per  processor  Comm  (latency)  

Per  processor  Comm  

(bandwidth)  

SpMSpV  

Invert  

height *α p

height *αp βnp

βmp+np

!

"##

$

%&&

mp

np

α  :  latency  (0.25  μs  to  3.7  μs  MPI  latency  on  Edison)  β  :  inverse  bandwidth  (~8GB/sec  MPI  bandwidth  on  Edison)  p  :  number  of  processors    

n:  number  of  ver.ces,  m:  number  of  edges    height:  maximum  height  of  the  BFS  forest      

Page 16: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

x

x

y

y

x

x

y

y

x

x

y

y

x

x

y

y

x

x

y

y

x

x

y

y Level  synchronous:                  BFS  Style  

One  path  per  process  Using  one-­‐sided  communica.on    

 via  MPI  Remote  Memory  Access  (RMA)  

P1,  P2  

P1,  P2  

P1                    P2  

Special  treatments  for  long  augmen.ng  paths  

Page 17: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

q  Plaxorm:  Edison  (NERSC)  –  2.4  GHz  Intel  Ivy  Bridge  processor,  24  cores  (2  sockets)  and  64  GB  RAM  per  node  

–  Cray  Aries  network  using  a  Dragonfly  topology  (0.25  μs  to  3.7  μs  MPI  latency,  ~8GB/sec  MPI  bandwidth)  

–  Programming  environment:  C++  and  Cray  MPI,  Combinatorial  BLAS  library  (Buluc  and  Gilbert,  2011)  

q  Input  graphs  –  Real  matrices  from  Florida  sparse  matrix  collec.on  and  randomly  generated  matrices.  

–  Matrix-­‐  bipar.te  graph  conversion  •  rows:  ver.ces  in  one  part,  columns:  ver.ces  in  another  part,  nonzeros:  edges.  

Results:  experimental  Setup  

Page 18: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

q  Karp-­‐Sipser  obtains  the  highest  cardinality  for  many  prac.cal  problems,  but  it  runs  the  slowest  on  high  concurrency  

q  We  found  that  dynamic  mindegree  +  MCM    ofen  runs  the  fastest  on  high  concurrency.    

Impact  of  ini.aliza.on  on  MCM  

0

8

16

24

32

Dynamic Mindegree

Karp- Sipser

Greedy No Init

Tim

e (s

ec)

nlpkkt200 Maximum Matching Maximal Matching

0!

4!

8!

12!

16!

Dynamic Mindegree!

Karp- Sipser!

Greedy! No Init!

Tim

e (s

ec)!

road_usa!

0!

2!

4!

6!

Dynamic Mindegree!

Karp- Sipser!

Greedy! No Init!

Tim

e (s

ec)!

GL7d19!

0!

2!

4!

6!

8!

Dynamic Mindegree!

Karp- Sipser!

Greedy! No Init!

Tim

e (s

ec)!

wikipedia!

On  1024  cores    of  Edison  

Page 19: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

16 32 64 128 256 512 1024 20482

4

8

16

32

64

128

256

Number of Cores

Tim

e (s

ec)

ljournalcage15road_usanlpkkt200hugetracedelaunay_n24HV15R

MCM  strong  scaling  (real  matrices)  

~80x  increase  of  cores  

12x-­‐18x  speedups  

To  appear:  Azad  and  Buluç,    IPDPS  2016  

1  node    (24  cores  of  Edison)  

Page 20: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

MCM  strong  scaling  (G500  RMAT  matrices)  

64 128 256 512 1024 2048 4096 8192 163841

2

4

8

16

32

64

Number of Cores

Tim

e (s

ec)

(b) G500

Scale−30Scale−29Scale−28Scale−27Scale−26

Scale-­‐30  RMAT:  2  billion  ver.ces,  32  billion  edges  Scaling  con.nues  beyond  10K  core  on  Large  matrices    

Page 21: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

MCM:  Breakdown  of  run.me  

48 108 192 432 972 2,0140

0.5

1

1.5

2

2.5

Number of Cores

Tim

e (s

ec)

GL7d19

MaximalSpMVSelectInvertPruneAugment

V1=  1.91M,    V2  =  1.96M,    #edges  =  37M  

Page 22: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

q  Similar  graph-­‐matrix  transforma.on  applies  to  weighted  matching  algorithms.  

q  Auc.on  algorithm  ideas  [Ongoing  work]    –  Bidders  bid  for  most  profitable  objects:  SpMSpV  with  (select2nd,  max)  semiring  

–  An  object  selects  the  best  bidder  from  which  it  received  bid:  Inverted  index  

–  Dual  updates  can  be  done  using  vector  opera.ons    

Ideas  for  weighted  matching  

Page 23: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

q  Summary  of  contribu.ons  – Methods:  distributed  memory  matching  algorithms  based  on  matrix  algebra  

–  Performance:  scales  up  to  10K  cores  on  large  graphs.  –  Easy  to  implement  an  algorithm  using  matrix-­‐algebraic  primi.ves.    

–  Source  code  publicly  available  at:      h1p://gauss.cs.ucsb.edu/~aydin/CombBLAS/html/  

 

q  Future  work  –  Distributed  weighted  matching  using  matrix  algebra  

Summary  

Page 24: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

q  A.  Azad  and  A.  Buluç,  to  appear  IPDPS  2016,  Distributed-­‐Memory  Algorithms  for  Maximum  Cardinality  Matching  in  Bipar.te  Graphs.  

q  A.  Azad  and  A.  Buluç,  CLUSTER  2015,  Distributed-­‐memory  algorithms  for  maximal  cardinality  matching  using  matrix  algebra.  

q  Langguth  et  al.,  Parallel  Compu.ng  2011,  Parallel  algorithms  for  bipar.te  matching  problems  on  distributed  memory  computers.  

q  M.  Patwary,  R.  Bisseling,  F.  Manne,  HPPA  2010,  Parallel  greedy  graph  matching  using  an  edge  par..oning  approach.  

q  M.  Sathe,  O.  Schenk,  H.  Burkhart,  Parallel  Compu.ng  2012,  An  auc.on-­‐based  weighted  matching  implementa.on  on  massively  parallel  architectures.  

Relevant  references  

Thanks  for  your  aWen.on    

Page 25: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

                                           Suppor.ng  slides    

Page 26: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

q Used  to  ini.alize  MCM  q  Example:  dynamic  mindegree  algorithm  

–  Greedy  and  Karp-­‐Sipser  are  similar  (Azad  and  Buluc,  2015)  

Maximal  matching  algorithms    using  matrix  and  vector  opera.ons  

x1

x2

x3

y1

y2

y3

d=2  

d=2  

d=3  

d=3  

d=2  

d=2  

x1

x2

x3

y1

y2

y3

d=2  

d=2  

d=3  

d=3  

d=2  

d=2  

x1

x2

x3

y1

y2

y3

d=2  

d=2  

d=2  

d=2  

neighbor  with    mindegree   Match   Update  degree  

SpMSpV  Addi?on  =  min  (degree)  

Inverted  Index    

SpMSpV  Addi?on  =  plus  

Graph  Op  

Matrix  Op  

Page 27: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

��� ��� ���� ���� ���� ���� ������

��

�� ��� �� ����������

�� ������

��� ������

�� �!���� �!���� �!�

��� ��� ���� ���� ���� ���� ������

��

��

�� ��� �� ����������

�� ������

��� "�#� �� ��#��$���

�� �!���� �!���� �!�

��� ��� ���� ���� ���� ���� ������

��

�� ��� �� ������������ ������

��� %��&!'�&���

�� �!���� �!���� �!�

Maximal  matching  strong  Scaling    Randomly  generated  RMAT  graphs  

Graph   #ver?ces   #edges   Greedy   Karp-­‐Sipser  

Dynamic  Mindegree  

RMAT-­‐26   128  million   2  billion   3x   no   6x  

RMAT-­‐28   512  million   8  billion   7x   3x   10x  

RMAT-­‐30   2  billion   32  billion   12x   8x   15x  

For  16x  increase  of  cores:  1,024  –  16,384  

Larger  graphs  Higher  speedups  

Page 28: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

Strong  Scaling    Why  does  dynamic  mindegree  scale  beWer?  

Graph   #ver?ces   #edges   Greedy   Karp-­‐Sipser  

Dynamic  Mindegree  

RMAT-­‐26   128  million   2  billion   3x   0x   6x  

RMAT-­‐28   512  million   8  billion   7x   3x   10x  

RMAT-­‐30   2  billion   32  billion   12x   8x   15x  

For  16x  increase  of  cores:  1,024  –  16,384  

� �� �� �� �� �� �� �� ���

��

���

���

���

�� ���

��� �!"#�!$��

� ��� ��� � � ������ �� � � ������

� � � � � � %�

���

���

���

���

�� ���

��� &�� ��� '��������

���� ��� � � ������ �� � � ������

%.m

e  or  %matched

 

Page 29: Distributed-Memory Algorithms for Cardinality Matching using … · 2016. 6. 21. · Distributed-Memory Algorithms for Cardinality Matching using Matrix Algebra Ariful’Azad,’Lawrence(Berkeley(Naonal(Laboratory(

q  For  graph-­‐based  algorithms,  matching  quality  decreases  with  increased  concurrency  

Graph-­‐based  vs.  Matrix-­‐based    parallel  algorithms  

� � �� �� ��� ���� ���� ������

��

��

��

��

���

�� �� �� ����� �� �������

��������������������

�!����������"���#$�����%&���!'

�!����������"���#$�����%���() �'

*����� ����"���#$�����%+�����'

*����� ����,����(%+�����'