Counting Triangles and the Curse of the Last Reducer
A paper by: Siddharth Suri and Sergei Vassilvitskii
Presented by: Ryan Rogers (with some slides from Sergei's presentation)

Source: grigory.us/files/mr-triangles.pdf



Introduction

• Study social networks
• Main metric for analyzing social networks: the clustering coefficient of each node
• Finding the clustering coefficient of a node is essentially the same as counting the number of triangles incident to that node.


Clustering  and  Triangles  


$CC(v) = 2 / \binom{4}{2} = 1/3$


OR…  


Number of triangles at $v$ = $\binom{d}{2} \times CC(v)$, where $d$ is the degree of $v$
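The relation between the clustering coefficient and the triangle count can be checked on a toy graph. The sketch below is illustrative only: the adjacency-dict representation, the function names, and the five-node example graph are assumptions, chosen so that the node `v` has 4 neighbors with 2 of the 6 neighbor pairs connected, matching the 1/3 value on the slide.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves adjacent."""
    neighbors = adj[v]
    d = len(neighbors)
    if d < 2:
        return 0.0  # convention assumed here; the ratio is undefined for d < 2
    closed = sum(1 for u, w in combinations(neighbors, 2) if w in adj[u])
    return closed / (d * (d - 1) / 2)


def triangles_at(adj, v):
    """Triangles incident to v, recovered as C(d, 2) * CC(v)."""
    d = len(adj[v])
    return round(d * (d - 1) / 2 * clustering_coefficient(adj, v))


# Hypothetical example graph: edges a-b and b-c close two triangles at v.
adj = {
    "v": {"a", "b", "c", "d"},
    "a": {"v", "b"},
    "b": {"v", "a", "c"},
    "c": {"v", "b"},
    "d": {"v"},
}
cc_v = clustering_coefficient(adj, "v")
tri_v = triangles_at(adj, "v")
```

Here `cc_v` is 2/6 and `tri_v` is 2, one triangle for each connected neighbor pair.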


Past Work

• Coppersmith and Kumar ('04) and Buriol et al. ('04): Streaming algorithms to find the total number of triangles with high accuracy
• Becchetti et al. ('08): Estimate the number of triangles incident on each node.
• Tsourakakis et al. ('09): Randomized MapReduce procedure that gives the total number of triangles accurately in expectation.


Contributions

• Count the exact number of triangles
• Count the number of triangles incident on each node, exactly.
• Speedup comparable to that of the randomized MapReduce procedure.


Counting Triangles (Naïve)

• Let $T \leftarrow 0$
  – for $v \in V$
    • for each $u \in \Gamma(v)$
      – for each $w \in \Gamma(v)$
        » if $(u,w) \in E$: $T \leftarrow T + 1/2$
• Output $T \leftarrow T/3$

Run time: $O\left(\sum_{u \in V} d_u^2\right)$
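The naïve procedure translates almost line for line into code. The following is a minimal sketch, assuming an undirected simple graph given as an edge list; the helper `build_adjacency` and the function name `node_iterator` are illustrative names, not taken from the slides.

```python
def build_adjacency(edges):
    """Adjacency sets for an undirected simple graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_iterator(adj):
    """Naive triangle count: check every ordered pair of neighbors of every node."""
    t = 0.0
    for v in adj:
        for u in adj[v]:
            for w in adj[v]:
                if w in adj[u]:  # (u, w) is an edge, so u, v, w form a triangle
                    t += 0.5     # each pair is seen twice: as (u, w) and (w, u)
    return t / 3                 # each triangle is found once per corner


# Complete graph on 4 nodes: every 3-subset is a triangle, so 4 in total.
k4 = build_adjacency([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
naive_total = node_iterator(k4)
```

The two inner loops over $\Gamma(v)$ are what produce the $d_v^2$ term in the running time.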


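MapReduce (Naïve)

• Map 1: $G = (V,E)$ → key-value pairs $\langle v_1, \Gamma(v_1)\rangle, \langle v_2, \Gamma(v_2)\rangle, \langle v_3, \Gamma(v_3)\rangle, \ldots, \langle v_n, \Gamma(v_n)\rangle$
• Reduce 1: $\langle v, \Gamma(v)\rangle$ → $\{\langle (u_1,u_2), v\rangle : u_1, u_2 \in \Gamma(v)\}$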


MapReduce (Naïve)

• Map 2: $\langle (u,w), v\rangle$ + $\langle (u,w), \text{Edge}\rangle$
• Reduce 2: $\langle (u,w), \{v_1, \ldots, v_k, \text{Edge?}\}\rangle$
  – If $\text{Edge}$ then
    • For $v \in \{v_1, \ldots, v_k\}$ emit $\langle v, 1\rangle$
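The two naïve rounds can be simulated in a single process to see how the records flow. This is only a sketch under stated assumptions: `group_by_key` stands in for the shuffle phase, the `EDGE` marker string is a made-up sentinel (it must not collide with a vertex label), vertex labels are assumed to be sortable, and Reduce 1 emits each unordered pair once so that every node receives one count per incident triangle.

```python
from collections import defaultdict
from itertools import combinations

EDGE = "$edge$"  # hypothetical marker for "this pair is an edge of G"


def group_by_key(pairs):
    """Stand-in for the shuffle phase: collect values per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def mr_naive_triangles(edges):
    """Per-node triangle counts via the two naive MapReduce rounds."""
    # Map 1: emit each edge in both directions so reducer v receives Gamma(v).
    round1 = group_by_key((a, b) for u, v in edges for a, b in ((u, v), (v, u)))

    # Reduce 1: emit every 2-path centered at v, keyed by its two endpoints.
    two_paths = [
        ((u, w), v)
        for v, nbrs in round1.items()
        for u, w in combinations(sorted(set(nbrs)), 2)
    ]

    # Map 2: pass the 2-paths through and tag every original edge.
    edge_records = [((min(u, v), max(u, v)), EDGE) for u, v in edges]
    round2 = group_by_key(two_paths + edge_records)

    # Reduce 2: a 2-path u-v-w closes into a triangle iff (u, w) is an edge.
    counts = defaultdict(int)
    for values in round2.values():
        if EDGE in values:
            for v in values:
                if v != EDGE:
                    counts[v] += 1
    return dict(counts)


per_node = mr_naive_triangles([(1, 2), (2, 3), (1, 3), (3, 4)])
```

Summing the per-node counts and dividing by 3 gives the total number of triangles. Note that the reducer for a node of degree $d$ emits $\binom{d}{2}$ records in round 1.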


What's Wrong with This?

• Does this improve our running time?
• There may still be a very high-degree vertex in the network.
• Thus, one machine may be stuck with a lot of data!

Still has run time: $O(d_{\max}^2)$


Reality

• Social networks are typically sparse.
• However, there may be a few nodes with very high degree.


Live  Journal  Data  


The Curse of the Last Reducer

• The idea that 99% of the computation finishes quickly, but the last 1% takes a HUGE amount of time.


Possible Fixes

• Generating 2-paths around high-degree nodes is expensive: concentrate on low degree.
• Divide the graph into overlapping subgraphs and somehow account for the overlap.


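Counting Triangles (Optimal)

• NodeIterator++
  – Input: $(V,E)$; $T \leftarrow 0$
  – For $v \in V$
    • For $u \in \Gamma(v)$ and $u \succ v$
      – for $w \in \Gamma(v)$ and $w \succ u$
        » if $(u,w) \in E$
          • $T \leftarrow T + 1$
• Return $T$

Here $u \succ v$ when $d_u > d_v$.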
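A minimal sketch of NodeIterator++ follows. The slide only states that $u \succ v$ when $d_u > d_v$; breaking degree ties by vertex label, as done in `rank` below, is an assumption made here so that the order is total. The adjacency-dict input and the function name are likewise illustrative.

```python
def node_iterator_pp(adj):
    """Count each triangle once, at its lowest-ranked corner."""
    def rank(x):
        # Degree order; ties broken by label (assumption, labels must be sortable).
        return (len(adj[x]), x)

    t = 0
    for v in adj:
        for u in adj[v]:
            if rank(u) > rank(v):
                for w in adj[v]:
                    # Only pairs with v < u < w in rank order are examined.
                    if rank(w) > rank(u) and w in adj[u]:
                        t += 1
    return t


# Complete graph on 4 nodes as adjacency sets: 4 triangles.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
pp_total = node_iterator_pp(k4)
```

Because a high-degree node only pairs up neighbors that rank above it, the quadratic blow-up around hubs disappears, and no division by 2 or 3 is needed at the end.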


Properties of NodeIterator++

• Has running time $O(m^{3/2})$ and gives the exact number of triangles incident to each node [Schank '07]
• Best possible bound:

[Figure labels: $K_{\sqrt{n}}$, $n - \sqrt{n}$]
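One way to read the figure labels is as a graph made of a clique on $\sqrt{n}$ nodes plus $n - \sqrt{n}$ further nodes. Assuming those further nodes contribute only $O(n)$ edges (for example, if they form a path; this detail is an assumption, since the figure itself is not recoverable), the counts work out as follows:

```latex
\begin{align*}
\text{edges in } K_{\sqrt{n}} &= \binom{\sqrt{n}}{2} = \frac{n - \sqrt{n}}{2} = \Theta(n)
  &&\Rightarrow\quad m = \Theta(n), \\
\text{triangles in } K_{\sqrt{n}} &= \binom{\sqrt{n}}{3} = \Theta\!\left(n^{3/2}\right)
  = \Theta\!\left(m^{3/2}\right).
\end{align*}
```

Under that reading, any algorithm that touches every triangle needs $\Omega(m^{3/2})$ time on this graph, which is why $O(m^{3/2})$ cannot be improved in general.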


MR-NodeIterator++

• Map 1':
  – If $v \succ u$
    • Emit $\langle u, v\rangle$
• Reduce 1': $\langle u, S \subseteq \Gamma(u)\rangle \to \{\langle u, (v,w)\rangle : v, w \in S\}$
• Map 2, Reduce 2.
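The modified first round can be simulated in one process to see how it shrinks the largest reducer input. This is a sketch, not the authors' implementation: it assumes a duplicate-free edge list with sortable labels, breaks degree ties by label, and returns the size of the largest round-1 reducer input alongside the triangle total purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def mr_node_iterator_pp(edges):
    """Return (total triangles, largest round-1 reducer input)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    def rank(x):
        return (degree[x], x)  # ties broken by label (assumption)

    # Map 1': emit <u, v> only when v ranks above u.
    higher = defaultdict(set)
    for u, v in edges:
        low, high = sorted((u, v), key=rank)
        higher[low].add(high)

    # Reduce 1': pair up the higher-ranked neighbors of each pivot u.
    candidates = defaultdict(list)
    for u, s in higher.items():
        for v, w in combinations(sorted(s), 2):
            candidates[(v, w)].append(u)

    # Map 2 / Reduce 2: keep a candidate pair only if it is a real edge.
    edge_set = {(min(u, v), max(u, v)) for u, v in edges}
    total = sum(len(p) for pair, p in candidates.items() if pair in edge_set)
    largest_input = max((len(s) for s in higher.values()), default=0)
    return total, largest_input


# Star with a hub of degree 5: the hub's reducer receives nothing at all.
star_total, star_max = mr_node_iterator_pp([(0, i) for i in range(1, 6)])
```

On the star, the naïve Reduce 1 would hand all 5 neighbors to the hub, while here every reducer sees at most one node.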


Memory Required per Machine

• Lemma: The input to any reduce instance in the first round has $O(\sqrt{m})$ edges (sublinear space)
• Proof: $L = \{v \in V : d_v < \sqrt{m}\}$, $H = \{v \in V : d_v \ge \sqrt{m}\}$
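The slide lists only the two vertex classes; the remaining steps are presumably given verbally. A sketch of how the argument can be completed, writing $S$ for the set of higher-ranked neighbors that the reducer for $u$ receives in Reduce 1':

```latex
\begin{align*}
\sqrt{m}\,|H| \;\le\; \sum_{v \in H} d_v \;\le\; \sum_{v \in V} d_v = 2m
  \quad&\Rightarrow\quad |H| \le 2\sqrt{m}, \\
u \in L:\quad |S| \le d_u < \sqrt{m}, \\
u \in H:\quad v \in S \Rightarrow d_v \ge d_u \ge \sqrt{m} \Rightarrow v \in H
  \quad&\Rightarrow\quad |S| \le |H| \le 2\sqrt{m}.
\end{align*}
```

In both cases the reducer input has $O(\sqrt{m})$ entries.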


Size of Output after Round 1

• Lemma: The total number of records output at the end of the first reduce is $O(m^{3/2})$.
• Proof:
  – There are at most $n = O(m)$ machines with low-degree nodes. A low-degree node $v$ has $d_v < m^{1/2}$, so its machine outputs at most $d_v^2 \le m^{1/2} d_v$ pairs; since $\sum_v d_v = 2m$, these machines output $O(m^{3/2})$ records in total.
  – There are at most $O(m^{1/2})$ machines with high-degree nodes, and each machine must output pairs with other high-degree nodes => $O(m)$ output size each, $O(m^{3/2})$ in total.


Did  it  Help?  


MR-GraphPartition

• Input: $(V, E, \rho)$
• Partition vertices into $\rho$ equal-sized sets $V_0, \ldots, V_{\rho-1}$
• Consider all triples $(V_i, V_j, V_k)$ and the induced graph $G_{ijk} = G[V_i, V_j, V_k]$ for $i < j < k$
• Compute triangles on each graph separately
  – You can use your favorite triangle counting algorithm on each!
• Map nodes to index $i$ by using a universal hash


MR-GraphPartition

• Map 1'': Input $\langle (u,v), 1\rangle$
  – for $a < b < c \le \rho - 1$
    • if $\{h(u), h(v)\} \subseteq \{a,b,c\}$
      – emit $\langle (a,b,c), (u,v)\rangle$
• Reduce 1'': Input: $\langle (i,j,k), E_{ijk}\rangle$
  – Count triangles and weight accordingly.
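The partition scheme, including one possible choice of weights, can be sketched as follows. Several things here are assumptions rather than slide content: `h` defaults to `v % rho` on integer labels as a stand-in for a universal hash, and the weights are derived by counting how many triples $(a,b,c)$ contain a triangle: $\binom{\rho-1}{2}$ if all three corners hash to one part, $\rho - 2$ if they hash to two parts, and 1 if they hash to three.

```python
from fractions import Fraction
from itertools import combinations


def partition_triangle_count(edges, rho, h=None):
    """Total triangles via overlapping subgraphs G_abc, weighted to avoid overcounting."""
    if rho < 3:
        raise ValueError("need at least 3 parts to form a triple")
    if h is None:
        h = lambda v: v % rho  # stand-in for a universal hash; assumes int labels

    # Map 1'': send each edge to every triple (a, b, c) covering both endpoints.
    reducers = {}
    for u, v in edges:
        for triple in combinations(range(rho), 3):
            if {h(u), h(v)} <= set(triple):
                reducers.setdefault(triple, set()).add((min(u, v), max(u, v)))

    # Reduce 1'': count triangles in each subgraph and weight by 1 / copies.
    total = Fraction(0)
    for sub_edges in reducers.values():
        adj = {}
        for u, v in sub_edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        for x, y in sub_edges:  # x < y by construction
            for z in adj[x] & adj[y]:
                if z > y:  # visit each triangle once per subgraph, as x < y < z
                    parts = len({h(x), h(y), h(z)})
                    if parts == 1:
                        copies = (rho - 1) * (rho - 2) // 2
                    elif parts == 2:
                        copies = rho - 2
                    else:
                        copies = 1
                    total += Fraction(1, copies)
    return total


# Complete graph on 9 nodes has C(9, 3) = 84 triangles.
k9_edges = list(combinations(range(9), 2))
k9_total = partition_triangle_count(k9_edges, rho=4)
```

Using `Fraction` keeps the weighted sum exact; with floating-point weights the total could drift slightly from an integer.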


May Overcount # of Triangles


Analysis

• The expected size of the input to any machine instance is $O(m/\rho^2)$
• The expected total space used at the end of the map phase is $O(m\rho)$
• Proof: SEE BOARD
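The board proof is not part of the slides. One way the two bounds can be derived, assuming $h$ maps each vertex uniformly to one of the $\rho$ parts and is pairwise independent, is:

```latex
\begin{align*}
\Pr\big[(u,v) \in E_{ijk}\big]
  &= \Pr\big[h(u) \in \{i,j,k\}\big] \cdot \Pr\big[h(v) \in \{i,j,k\}\big]
   = \left(\frac{3}{\rho}\right)^{2} = \frac{9}{\rho^{2}}, \\
\mathbb{E}\big[|E_{ijk}|\big]
  &= \frac{9m}{\rho^{2}} = O\!\left(\frac{m}{\rho^{2}}\right), \\
\mathbb{E}\Big[\sum_{i<j<k} |E_{ijk}|\Big]
  &= \binom{\rho}{3} \cdot \frac{9m}{\rho^{2}}
   \le \frac{\rho^{3}}{6} \cdot \frac{9m}{\rho^{2}} = O(m\rho).
\end{align*}
```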


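Analysis (continued)

• Theorem: For $\rho \le \sqrt{m}$, the amount of work done by all the machines is $O(m^{3/2})$
• Proof: $O(1)$ time per emitted edge => $O(m\rho) = O(m^{3/2})$ time for the map phase. Partition input amongst $O(\rho^3)$ reducers. Running time per reducer: $O(\#\text{Edges}^{3/2}) = O\left((m/\rho^2)^{3/2}\right)$, so the $O(\rho^3)$ reducers together do $O\left(\rho^3 \cdot m^{3/2}/\rho^3\right) = O(m^{3/2})$ work.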


Results for Partition


Comparison of Results

[Figure: results for Naïve, ++, and Partition]


The Curse of the Last Reducer


Questions?