ASAP: Fast, Approximate Graph Pattern Mining at Scale

Anand Iyer et al. @ OSDI 2018

Presenter: Yunang Chen

Source: pages.cs.wisc.edu/~shivaram/cs744-slides/cs744-yunang-asap.pdf


ASAP Design Overview

• A Swift Approximate Pattern-miner
• Navigates the tradeoff between result accuracy and latency
• Runs on a general-purpose distributed dataflow platform
• Supports generalized graph pattern mining algorithms


Graph Pattern Mining

• Standard approach: Iterative expansion
• Lack of scalability
  ◦ Generates exponentially large intermediate candidate sets
  ◦ Needs to store and exchange them in a distributed environment

* Experiments performed on a cluster of 20 machines, each having 256 GB of memory.

Graph Pattern Mining

• Many pattern mining tasks do not need exact answers.
  ◦ Frequent sub-graph mining (FSM) finds the frequency of subgraphs, but with an end goal of ordering them by occurrences.
• Leverage approximation for pattern mining


Approximate Pattern Mining

• Previous approach: Apply the exact same algorithm on subsets of the input data, then use the statistical properties of these subsets to estimate final results.
  ◦ No significant speedup
  ◦ Large error rate

Approximate Pattern Mining

• Neighborhood sampling:
  1. Model the edges in the graph as a stream
  2. Sample one edge, $e_1$
  3. Gradually add more adjacent edges, $e_2, \ldots, e_k$
  4. Stop when the edges form the pattern or it becomes impossible to do so
  5. Use the probability of sampling to bound the total number of occurrences of the pattern:
     $P(e_1, \ldots, e_k) = P(e_1) \times P(e_2 \mid e_1) \times \cdots \times P(e_k \mid e_1, \ldots, e_{k-1})$
  6. Repeat Steps 1-5 multiple times
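As a worked instance of the probability in Step 5, consider a triangle ($k = 3$). The derivation below is a sketch under assumptions the slide does not state: the stream has $m$ edges, $e_1$ is drawn uniformly from them, $e_2$ is drawn uniformly from the $c$ edges that are adjacent to $e_1$ and arrive after it, and the closing edge $e_3$ is then fully determined. The symbols $m$, $c$, $c_t$ and $\hat{X}$ are introduced here for illustration only; $c_t$ is the value of $c$ for the first-arriving edge of triangle $t$.

```latex
P(e_1, e_2, e_3) = P(e_1) \times P(e_2 \mid e_1) \times P(e_3 \mid e_1, e_2)
                 = \frac{1}{m} \times \frac{1}{c} \times 1
                 = \frac{1}{mc}

\hat{X} =
\begin{cases}
  mc & \text{if the closing edge arrives after } e_2 \\
  0  & \text{otherwise}
\end{cases}

\mathbb{E}[\hat{X}] = \sum_{t} \frac{1}{m c_t} \cdot m c_t = \text{number of triangles}
```

Under these assumptions each triangle is found through exactly one ordered choice of $(e_1, e_2)$, so weighting a hit by the inverse of its sampling probability gives an unbiased single-estimator value. Averaging many estimators (Step 6) reduces the variance.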

Approximate Pattern Mining

• Neighborhood sampling: Triangle Counting
  1. Model the edges in the graph as a stream
  2. Sample one edge
  3. Gradually add more adjacent edges
  4. Stop when the edges form the pattern or it becomes impossible to do so
  5. Use the probability of sampling to bound the total number of occurrences
  6. Repeat Steps 1-5 multiple times

edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
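The triangle-counting steps can be sketched in Python on the edge stream shown on these slides. This is a minimal offline illustration, not ASAP's implementation: it assumes a simple graph (no duplicate edges or self-loops), assumes the first edge is drawn uniformly and the second uniformly among later adjacent edges, and weights a completed triangle by the inverse of its sampling probability. The function names `triangle_estimator` and `estimate_triangles` are hypothetical.

```python
import random

# Edge stream from the slides: the complete graph on vertices 0..4.
EDGE_STREAM = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2),
               (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]


def triangle_estimator(edges, rng):
    """Run one neighborhood-sampling estimator over an ordered edge stream."""
    m = len(edges)
    # Step 2: sample the first edge uniformly from the stream.
    i = rng.randrange(m)
    e1 = edges[i]
    # Step 3: candidates for the second edge arrive later and share a vertex.
    later_adjacent = [j for j in range(i + 1, m) if set(edges[j]) & set(e1)]
    c = len(later_adjacent)
    if c == 0:
        return 0.0  # Step 4: the pattern can no longer be formed.
    j = rng.choice(later_adjacent)
    e2 = edges[j]
    # The closing edge joins the two endpoints that e1 and e2 do not share.
    closing = set(e1) ^ set(e2)
    # Step 4: wait for the closing edge to arrive after e2.
    closed = any(set(edges[k]) == closing for k in range(j + 1, m))
    # Step 5: the triangle was sampled with probability (1/m) * (1/c),
    # so its inverse, m * c, is returned as the estimate.
    return float(m * c) if closed else 0.0


def estimate_triangles(edges, num_estimators, seed=0):
    """Step 6: average many independent estimators."""
    rng = random.Random(seed)
    total = sum(triangle_estimator(edges, rng) for _ in range(num_estimators))
    return total / num_estimators


if __name__ == "__main__":
    print(estimate_triangles(EDGE_STREAM, 20000))
```

The example graph has exactly 10 triangles (every choice of 3 of its 5 vertices), so the printed average should land close to 10, with the gap shrinking as `num_estimators` grows.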

ASAP Architecture


Programming API

• Sampling Phase: fix the vertices for a pattern
• Closing Phase: wait for the remaining edges to complete the pattern

Distributed Execution

• Rely on map and reduce operations
  1. Partition the vertices across $w$ workers
  2. Apply the estimator task on each subgraph to produce a partial count
  3. Sum up the partial counts
  4. Adjust for underestimation by multiplying by $f(w)$
     e.g. for triangle count, $f(w) = w^2$
• Patterns across partitions are ignored
• Total occurrence count is reduced to $1/f(w)$ of the true count
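The four steps can be sketched as follows. This is an illustration under stated assumptions, not ASAP's code: vertices are assigned to workers independently and uniformly at random, an exact per-partition triangle count stands in for the estimator task, and the names `count_triangles`, `random_partition`, and `distributed_count` are hypothetical. With that assignment a triangle keeps all three vertices on one worker with probability $w \times (1/w)^3 = 1/w^2$, which is why the summed count is scaled by $w^2$.

```python
import random
from itertools import combinations

# Complete graph on 5 vertices (10 triangles), used as a small example.
K5_EDGES = list(combinations(range(5), 2))


def count_triangles(edges):
    """Exact triangle count; stands in for the per-worker estimator task."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is seen once per edge, hence the division by 3.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def random_partition(vertices, w, rng):
    """Step 1: assign every vertex to one of w workers uniformly at random."""
    return {v: rng.randrange(w) for v in vertices}


def distributed_count(edges, worker_of, w):
    # Step 2 (map): each worker sees only edges with both endpoints local,
    # so patterns that span partitions are ignored.
    partial = []
    for worker in range(w):
        local = [(u, v) for u, v in edges
                 if worker_of[u] == worker and worker_of[v] == worker]
        partial.append(count_triangles(local))
    # Steps 3-4 (reduce): sum the partial counts, then scale by f(w) = w^2.
    return sum(partial) * w ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    assignment = random_partition(range(5), 2, rng)
    print(distributed_count(K5_EDGES, assignment, 2))
```

A single random partition gives a noisy value. The correction only holds in expectation, so the scaled count approaches the true count when averaged over many partitions or on graphs with many pattern occurrences.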

ASAP Architecture

Error-Latency Profile (ELP)

• ASAP can perform tasks in two modes:
  ◦ Time budget $T$
  ◦ Error budget $\epsilon$
• Given a time/error bound, how many estimators should ASAP use?

Error-Latency Profile (ELP)

• Running time scales linearly with the number of estimators
• Test exponentially spaced points + extrapolation to build a linear model
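One way to picture the time side of the profile is a least-squares line through a few exponentially spaced probes, then inverting it for a time budget. The sketch below assumes exactly that; the probe sizes, the `stand_in_runtime` function (a synthetic line, not a measurement), and all function names are hypothetical and only illustrate the idea.

```python
def stand_in_runtime(num_estimators):
    """Synthetic stand-in for timing a real run; NOT measured data."""
    return 0.5 + 2e-4 * num_estimators


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimators_for_time_budget(slope, intercept, budget_seconds):
    """Invert the linear model: most estimators that fit in the budget."""
    return max(0, int((budget_seconds - intercept) / slope))


if __name__ == "__main__":
    # Exponentially spaced probe points, then extrapolate far beyond them.
    probes = [1000, 2000, 4000, 8000]
    times = [stand_in_runtime(n) for n in probes]
    slope, intercept = fit_line(probes, times)
    print(estimators_for_time_budget(slope, intercept, 10.5))
```

Doubling the probe size at each step keeps the profiling cost small while still spanning a wide range, and the linear relationship is what lets a few cheap runs predict much larger ones.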

Error-Latency Profile (ELP)

• Chernoff bound for triangle counting: $N_e > \dfrac{K \times m \times \Delta}{\epsilon^2 P}$
• Estimate ground truth $P_s$ on a small sample of the graph + scale to $P$
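The bound can be evaluated directly once its inputs are fixed. The slide does not define the symbols, so the sketch below assumes $m$ is the number of edges, $\Delta$ the maximum degree, $P$ the (estimated) true triangle count, and $K$ an unspecified constant supplied by the caller. The function name and the input values are hypothetical, and how $P_s$ is scaled to $P$ is left out because the slide does not say.

```python
import math


def estimators_for_error(K, m, max_degree, epsilon, P):
    """Smallest integer N_e that strictly exceeds K*m*Delta / (epsilon^2 * P)."""
    bound = K * m * max_degree / (epsilon ** 2 * P)
    return math.floor(bound) + 1


if __name__ == "__main__":
    # Toy inputs: K = 1 is a placeholder, not a value from the source.
    print(estimators_for_error(K=1, m=10, max_degree=4, epsilon=0.5, P=10))
    print(estimators_for_error(K=1, m=10, max_degree=4, epsilon=0.25, P=10))
```

Because $\epsilon$ enters squared in the denominator, halving the error budget roughly quadruples the number of estimators, while a graph with more triangles (larger $P$) needs fewer.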

Evaluation

• 77x speedup with under 5% loss of accuracy for smaller graphs (0.01-30 million edges)

Evaluation

• 258x speedup with under 5% loss of accuracy for larger graphs

Conclusion

• ASAP is the first system that does fast, scalable approximate graph pattern mining on large graphs.
• ASAP is more than an order of magnitude faster than Arabesque, at a cost of 5% accuracy.
• ASAP scales to larger graphs, whereas Arabesque fails to complete execution.

References

• https://www.usenix.org/sites/default/files/conference/protected-files/osdi18_slides_iyer.pdf
• Iyer, Anand Padmanabha, et al. "ASAP: Fast, approximate graph pattern mining at scale." Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. USENIX Association, 2018.
• Iyer, Anand Padmanabha, et al. "Towards fast and scalable graph pattern mining." 10th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 18). USENIX Association, 2018.