EDGAR: Energy-efficient Data and Graph Algorithms Research. Funded by Applied Math, ASCR; Early Career Research Program (start: 2013). PI: Aydın Buluç (Berkeley Lab). Postdoctoral Fellow: Ariful Azad (100%). Students: Veronika Strnadova-Neeley (since Oct 2015, UCSB), Adam Sealfon (CSGF Fellow, Summer 2015, MIT), Chaitanya Aluru (undergraduate, UC Berkeley).

EDGAR - Lawrence Berkeley National Laboratory. EDGAR (FY15) Awards: Aydın Buluç, IEEE TCSC Award for Excellence for Early Career Research by the IEEE Committee on Scalable Computing, 2015


Page 1

EDGAR

• Energy-efficient Data and Graph Algorithms Research
• Funded by Applied Math, ASCR
• Early Career Research Program (start: 2013)
• PI: Aydın Buluç (Berkeley Lab)
• Postdoctoral Fellow:
  – Ariful Azad (100%)
• Students:
  – Veronika Strnadova-Neeley (since Oct 2015, UCSB)
  – Adam Sealfon (CSGF Fellow, Summer 2015, MIT)
  – Chaitanya Aluru (undergraduate, UC Berkeley)

Page 2

EDGAR (FY15)

Awards:
• Aydın Buluç, IEEE TCSC Award for Excellence for Early Career Research by the IEEE Committee on Scalable Computing, 2015
• HipMer team (next slide), HPCwire's Readers' Choice Award for the Best Use of HPC Application in Life Sciences

Artifacts:
• Aydın Buluç, Guest Editor: Parallel Computing, special issue on "Graph Analysis for Scientific Discovery"
• Six peer-reviewed publications, one invited article
• Six invited talks (one at conference, five at universities/labs)

Page 3

HipMer: An Extreme-Scale De Novo Genome Assembler

[Figure: Meraculous Assembly Pipeline: reads → k-mers → contigs → scaffolding using scalable alignment]

• New fast & parallel I/O
• New k-mer analysis filters errors using probabilistic "Bloom Filter"
• Graph algorithm (connected components) scales to 15K cores on NERSC's Edison

Meraculous assembler is used in production at the Joint Genome Institute
• Wheat assembly is a "grand challenge"
• Hardest part is contig generation (large in-memory hash table that represents graph)
• HipMer is an efficient parallelization of Meraculous
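A minimal, serial sketch of how a Bloom filter can keep k-mers that occur only once (likely sequencing errors) out of the k-mer hash table. This is an illustration under stated assumptions, not HipMer's distributed implementation: the `BloomFilter` class, the function `count_repeated_kmers`, the filter size, and the hash choice are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter over strings (sizes are illustrative assumptions)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        """Insert item; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def count_repeated_kmers(reads, k):
    """Count k-mers, storing only those seen at least twice."""
    bloom = BloomFilter()
    counts = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
            elif bloom.add(kmer):
                # Second sighting: the first one lives only in the filter.
                counts[kmer] = 2
    return counts
```

Singleton k-mers cost only a few bits in the filter instead of a full hash-table entry. A false positive in the filter can admit an occasional singleton with an inflated count, which is why the filtering is probabilistic.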

[Figure: strong scaling, seconds (32 to 16384) vs. number of cores (960 to 15360); series: overall time, kmer analysis, contig generation, scaffolding, ideal overall time]

Performance improvement from days to minutes

E. Georganas, A. Buluç, J. Chapman, S. Hofmeyr, C. Aluru, R. Egan, L. Oliker, D. Rokhsar, K. Yelick. HipMer: An extreme-scale de novo genome assembler. SC'15

Page 4

Communication-Avoiding Sparse Matrix-Matrix Multiply

[Figure: nlpkkt160 x nlpkkt160 (on Edison), time (sec, 0.25 to 16) vs. number of cores (64 to 16384); series: 2D (t=1), 2D (t=3), 2D (t=6), 3D (c=4, t=1), 3D (c=4, t=3), 3D (c=8, t=1), 3D (c=8, t=6), 3D (c=16, t=6); annotations: 2D threads increasing, 3D layers & threads increasing]

3D-threaded (red) beats the previous state-of-the-art (blue) by 8X at large concurrencies

[Figure: A is split into layers A::1, A::2, A::3; in each layer A x B = C intermediate, and AlltoAll exchanges turn C intermediate into C final]

$C^{\mathrm{int}}_{ijk} = \sum_{l=1}^{\sqrt{p/c}} A_{ilk} B_{ljk}$
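The layered scheme in the figure (per-layer products forming C intermediate, then a reduction into C final) can be simulated serially. The sketch below assumes the layers partition the inner dimension of the product. The dict-of-dicts storage and the names `spgemm`, `split_inner`, and `spgemm_layered` are illustrative, and no communication is modeled.

```python
from collections import defaultdict


def spgemm(A, B):
    """Multiply sparse matrices stored as {row: {col: value}} dicts."""
    C = defaultdict(dict)
    for i, row in A.items():
        for k, a in row.items():
            for j, b in B.get(k, {}).items():
                C[i][j] = C[i].get(j, 0) + a * b
    return dict(C)


def split_inner(A, B, n_inner, layers):
    """Yield (A_s, B_s) pairs, one per contiguous slice of the inner dimension."""
    step = -(-n_inner // layers)  # ceiling division
    for s in range(layers):
        lo, hi = s * step, min((s + 1) * step, n_inner)
        A_s = {i: {k: v for k, v in row.items() if lo <= k < hi}
               for i, row in A.items()}
        B_s = {k: row for k, row in B.items() if lo <= k < hi}
        yield A_s, B_s


def spgemm_layered(A, B, n_inner, layers):
    """Compute one intermediate product per layer, then sum them."""
    intermediates = [spgemm(A_s, B_s)
                     for A_s, B_s in split_inner(A, B, n_inner, layers)]
    C = defaultdict(dict)
    for C_int in intermediates:  # stands in for the cross-layer reduction
        for i, row in C_int.items():
            for j, v in row.items():
                C[i][j] = C[i].get(j, 0) + v
    return dict(C)
```

Each layer touches only its own slice of A and B, so in a distributed setting the per-layer products can run independently. The final summation is the step that requires exchanging data between layers.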

Applications:
– Algebraic multigrid (AMG) restriction
– Graph computations
– Quantum chemistry
– Similarity computation (data mining)
– Interior-point optimization

A. Azad, G. Ballard, A. Buluç, J. Demmel, L. Grigori, O. Schwartz, S. Toledo, S. Williams. Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication. arXiv preprint arXiv:1510.00844.

Page 5

Parallel Maximum Cardinality Matching in Bipartite Graphs Using "Tree Grafting"

[Figure: relative performance (0 to 16) of Push-Relabel, Pothen-Fan, and MS-BFS-Graft on kkt_power, hugetrace, delaunay, rgg_n24_s0, coPapersDBLP, amazon0312, cit-Patents, RMAT, road_usa, wb-edu, web-Google, wikipedia; off-scale bars labeled 18, 35, 29, 42]

Performance: On 40-core Intel. On average 7x faster than current best algorithm. Can be up to 42x faster.

[Figure: speedup (1 to 32) vs. number of threads (1 to 64) for Scientific, Scale-free, and Networks graphs; annotations: Hyperthreading, Change of socket]

Scaling: One node of Edison (24-core Intel Ivy Bridge). On average 17x speedups relative to serial algorithm.
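For reference, the problem itself can be illustrated with a textbook serial augmenting-path search. The sketch below is a plain depth-first search from one unmatched row at a time. It is not the tree-grafting algorithm of this slide, and the function name and the adjacency-list input format are assumptions made for the example.

```python
def maximum_matching(adj, n_cols):
    """Maximum cardinality matching in a bipartite graph.

    adj[r] lists the columns adjacent to row r.
    Returns (matching size, col_mate) where col_mate[c] is the row
    matched to column c, or -1 if c is unmatched.
    """
    col_mate = [-1] * n_cols

    def augment(r, visited):
        # Look for an augmenting path starting at unmatched row r.
        for c in adj[r]:
            if c in visited:
                continue
            visited.add(c)
            # Column c is free, or its current mate can be rematched.
            if col_mate[c] == -1 or augment(col_mate[c], visited):
                col_mate[c] = r
                return True
        return False

    size = sum(augment(r, set()) for r in range(len(adj)))
    return size, col_mate
```

Each search here restarts from scratch, so work done in a failed search is discarded before the next one begins.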

A. Azad, A. Pothen, A. Buluç. Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting. Under revision in IEEE Transactions on Parallel and Distributed Systems (TPDS).

Page 6

Counting and Enumerating Triangles using Matrix Algebra

[Figure: example graph on vertices 1 to 6 with its adjacency matrix A, the triangular parts L and U, and the resulting matrices B and C]

A = L + U (hi->lo + lo->hi)
L × U = B (wedge, low hinge)
A ∧ B = C (closed wedge)
sum(C)/2 = 4 triangles
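The four steps (A = L + U, B = L × U, C = A ∧ B, sum(C)/2) can be traced in a few lines. This is a serial sketch that uses sets in place of sparse matrices. The function name is hypothetical, and the input graph in the usage note is not the one drawn on the slide.

```python
def count_triangles(edges, n):
    """Count triangles in an undirected simple graph on vertices 0..n-1."""
    lower = [set() for _ in range(n)]  # L: row i holds neighbors j < i
    upper = [set() for _ in range(n)]  # U: row i holds neighbors j > i
    for u, v in edges:
        lo, hi = min(u, v), max(u, v)
        lower[hi].add(lo)
        upper[lo].add(hi)

    total = 0
    for i in range(n):
        # The nonzeros of A in row i act as the mask for B = L x U.
        for j in lower[i] | upper[i]:
            # B[i][j] = sum_k L[i][k] * U[k][j]: wedges i-k-j whose
            # hinge k has a lower index than both endpoints.
            total += sum(1 for k in lower[i] if j in upper[k])
    # Each triangle is counted at C[i][j] and at C[j][i].
    return total // 2
```

For the complete graph on four vertices, `count_triangles([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)], 4)` counts each of its four triangles once, at the triangle's lowest-numbered vertex.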

A. Azad, A. Buluç, and J. R. Gilbert. "Parallel triangle counting and enumeration using matrix algebra", Graph Algorithm Building Blocks, IPDPS Workshops, 2015

Page 7

Maximal Cardinality Matching using Matrix Algebra

A. Azad, A. Buluç. Distributed-memory algorithms for maximal cardinality matching using matrix algebra. In IEEE International Conference on Cluster Computing (CLUSTER), 2015 (full paper).

[Figure: time (sec) vs. number of processors (256 to 16384) on RMAT-26, RMAT-28, and RMAT-30; panels: (a) Greedy, (b) Karp-Sipser, (c) Dynamic Mindegree]

[Figure: bipartite graph G(R,C,E) with rows r1 to r5 and columns c1 to c5, and its matrix A; selecting the unmatched columns f_c yields the visited rows f_r]

Matrix-based primitives enable efficient and scalable distributed-memory implementations of various maximal cardinality matching algorithms solely by minimal modifications to the underlying semiring operator.
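A serial sketch of the greedy variant in this style: unmatched columns form a sparse vector f_c, and a matrix-vector product over a (select2nd, min) semiring gives each unmatched row one candidate column, producing f_r. The data layout, the function names, and the tie-breaking rule are assumptions made for illustration, not the paper's distributed code.

```python
def spmv_select2nd_min(A, f_c):
    """Semiring SpMV: y[r] = min of f_c[c] over columns c of row r in f_c.

    A is {row: set(cols)}. 'Multiply' returns the vector entry
    (select2nd) and 'add' keeps the minimum.
    """
    y = {}
    for r, cols in A.items():
        candidates = [f_c[c] for c in cols if c in f_c]
        if candidates:
            y[r] = min(candidates)
    return y


def greedy_maximal_matching(A, n_cols):
    """Return (row_mate, col_mate) for a maximal (not maximum) matching."""
    row_mate, col_mate = {}, {}
    while True:
        # f_c: unmatched columns, each carrying its own index.
        f_c = {c: c for c in range(n_cols) if c not in col_mate}
        unmatched_rows = {r: cols for r, cols in A.items()
                          if r not in row_mate}
        # f_r: unmatched rows reached from f_c, each with one candidate.
        f_r = spmv_select2nd_min(unmatched_rows, f_c)
        if not f_r:
            break
        # Several rows may pick the same column; keep the smallest row.
        chosen = {}
        for r, c in f_r.items():
            chosen[c] = min(r, chosen.get(c, r))
        for c, r in chosen.items():
            row_mate[r], col_mate[c] = c, r
    return row_mate, col_mate
```

Swapping `min` for another reduction inside `spmv_select2nd_min` changes which candidate each row keeps without touching the outer loop, which is the sense in which the algorithm is parameterized by the semiring.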