ASAP: Fast, Approximate Graph Pattern Mining at Scale

Anand Iyer et al. @ OSDI 2018

Presenter: Yunang Chen

ASAP Design Overview

• A Swift Approximate Pattern-miner
• Navigates the tradeoff between result accuracy and latency
• Runs on a general-purpose distributed dataflow platform
• Supports generalized graph pattern mining algorithms

Graph Pattern Mining

• Standard approach: iterative expansion
• Lack of scalability
  ◦ Generates exponentially large intermediate candidate sets
  ◦ Needs to store and exchange them in a distributed environment

* Experiments performed on a cluster of 20 machines, each having 256 GB of memory.
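The growth of intermediate candidate sets can be illustrated with a minimal sketch of iterative expansion. It assumes vertex-based expansion, where each candidate is a connected set of vertices that grows by one adjacent vertex per round. The helper name `expand_embeddings` is hypothetical and is not taken from any system discussed in these slides.

```python
def expand_embeddings(edges, max_size):
    """Grow connected vertex sets one vertex at a time.

    Returns the number of distinct candidates at each size 1..max_size.
    Illustrative only: every candidate of a round is kept in memory,
    which is the scalability problem described above.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    level = {frozenset([v]) for v in adj}  # size-1 candidates
    sizes = [len(level)]
    for _ in range(max_size - 1):
        next_level = set()
        for emb in level:
            # Vertices adjacent to the candidate but not yet part of it.
            frontier = set().union(*(adj[v] for v in emb)) - emb
            for v in frontier:
                next_level.add(emb | {v})
        level = next_level
        sizes.append(len(level))
    return sizes


# Complete graph on 5 vertices (the same edge list used in the triangle example).
K5 = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2),
      (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(expand_embeddings(K5, 3))
```

On a graph this small the candidate sets stay tiny; the point of the sketch is only that every round materializes a full set of candidates before any pattern is counted.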

• Many pattern mining tasks do not need exact answers.
  ◦ Frequent sub-graph mining (FSM) finds the frequency of subgraphs, but with the end goal of ordering them by occurrences.
• Leverage approximation for pattern mining

Approximate Pattern Mining

• Previous approach: apply the exact same algorithm on subsets of the input data, then use the statistical properties of these subsets to estimate the final results.
  ◦ No significant speedup
  ◦ Large error rate

• Neighborhood sampling:
  1. Model the edges in the graph as a stream
  2. Sample one edge, $e_1$
  3. Gradually add more adjacent edges, $e_2, \ldots, e_k$
  4. Stop when the edges form the pattern or it becomes impossible to do so
  5. Use the probability of sampling to bound the total number of occurrences of the pattern:
     $P(e_1, \ldots, e_k) = P(e_1) \times P(e_2 \mid e_1) \times \cdots \times P(e_k \mid e_1, \ldots, e_{k-1})$
  6. Repeat steps 1-5 multiple times

• Neighborhood sampling: triangle counting
  1. Model the edges in the graph as a stream
     edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
  2. Sample one edge
  3. Gradually add more adjacent edges
  4. Stop when the edges form the pattern or it becomes impossible to do so
  5. Use the probability of sampling to bound the total number of occurrences
  6. Repeat steps 1-5 multiple times
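The triangle-counting steps can be sketched as follows, using the edge stream from the slides. This is a minimal sketch under stated assumptions, not the ASAP implementation: the first edge is drawn uniformly from all $m$ edges, the second uniformly from the $c$ later edges adjacent to it, and the stream is held in a list rather than read in one pass. Under those assumptions a triangle is sampled with probability $1/(m \cdot c)$, so each successful estimator reports $m \cdot c$. The names `triangle_estimator` and `estimate_triangles` are hypothetical.

```python
import random

# Edge stream from the slides: the complete graph on vertices 0..4.
K5_STREAM = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2),
             (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]


def triangle_estimator(stream, rng):
    """Run one neighborhood-sampling estimator; return m * c or 0."""
    m = len(stream)
    # Step 2: sample the first edge uniformly, P(e1) = 1/m.
    i = rng.randrange(m)
    e1 = stream[i]
    # Step 3: sample an adjacent edge that arrives later, P(e2 | e1) = 1/c.
    later_adjacent = [j for j in range(i + 1, m) if set(stream[j]) & set(e1)]
    c = len(later_adjacent)
    if c == 0:
        return 0  # Step 4: the pattern can no longer be formed.
    j = rng.choice(later_adjacent)
    # Step 4: wait for the edge joining the two non-shared endpoints.
    closing = set(e1) ^ set(stream[j])
    for k in range(j + 1, m):
        if set(stream[k]) == closing:
            return m * c  # Step 5: inverse of the sampling probability.
    return 0


def estimate_triangles(stream, num_estimators, seed=0):
    """Step 6: average many independent estimators."""
    rng = random.Random(seed)
    total = sum(triangle_estimator(stream, rng) for _ in range(num_estimators))
    return total / num_estimators


print(estimate_triangles(K5_STREAM, 20000, seed=1))
```

The complete graph on 5 vertices contains 10 triangles, so the printed average should land close to 10; a single estimator on its own is very noisy, which is why step 6 repeats the procedure.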

ASAP Architecture

Programming API

• Sampling phase: fix the vertices for a pattern
• Closing phase: wait for the remaining edges to complete the pattern

Distributed Execution

• Relies on map and reduce operations:
  1. Partition the vertices across $w$ workers
  2. Apply the estimator task on each subgraph to produce a partial count
  3. Sum up the partial counts
  4. Adjust for underestimation by multiplying by $f(w)$, e.g. for triangle count, $f(w) = w^2$
• Patterns across partitions are ignored
• Total occurrence is reduced by a factor of $1/f(w)$
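The map and reduce steps can be sketched as below. This is a minimal sketch under stated assumptions: vertices are assigned to workers independently and uniformly at random, and an exact triangle count stands in for the estimator task that each worker would actually run. Under that assumption a triangle survives only when all three of its vertices land on the same worker, which happens with probability $1/w^2$, hence the correction factor $f(w) = w^2$. All function names are hypothetical.

```python
from itertools import combinations, product


def count_triangles(edges):
    """Exact triangle count; a stand-in for the per-worker estimator task."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])


def distributed_triangle_count(edges, assignment, w):
    """assignment maps each vertex to a worker id in range(w)."""
    partial_counts = []
    for worker in range(w):
        # Map: a worker only sees edges with both endpoints in its partition,
        # so patterns that span partitions are ignored.
        local = [(u, v) for u, v in edges
                 if assignment[u] == worker and assignment[v] == worker]
        partial_counts.append(count_triangles(local))
    # Reduce: sum the partial counts, then multiply by f(w) = w^2.
    return sum(partial_counts) * w ** 2


def mean_over_all_assignments(edges, w):
    """Average the corrected count over every possible vertex assignment."""
    vertices = sorted({x for e in edges for x in e})
    results = [distributed_triangle_count(edges, dict(zip(vertices, a)), w)
               for a in product(range(w), repeat=len(vertices))]
    return sum(results) / len(results)


K5 = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2),
      (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(mean_over_all_assignments(K5, 2))
```

Averaging over every assignment is only feasible for a toy graph; it is used here to show that the $w^2$ correction recovers the true count in expectation, while any single partitioning can be far off.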


Error-Latency Profile (ELP)

• ASAP can perform tasks in two modes:
  ◦ Time budget $T$
  ◦ Error budget $\epsilon$
• Given a time/error bound, how many estimators should ASAP use?

• Running time scales linearly with the number of estimators
• Test exponentially spaced points and extrapolate to build a linear model
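For the time budget, the linear model can be sketched as an ordinary least-squares line through a few profiled points, which is then inverted to pick an estimator count. This is a minimal sketch: the profile numbers below are invented for illustration, and the function names are hypothetical rather than part of ASAP.

```python
import math


def fit_linear_profile(num_estimators, seconds):
    """Least-squares fit of seconds = slope * num_estimators + intercept."""
    n = len(num_estimators)
    mean_x = sum(num_estimators) / n
    mean_y = sum(seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in num_estimators)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(num_estimators, seconds))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_estimators_within(time_budget, slope, intercept):
    """Largest estimator count whose predicted time fits the budget."""
    if slope <= 0:
        raise ValueError("running time must grow with the estimator count")
    return max(0, math.floor((time_budget - intercept) / slope))


# Hypothetical profile measured at exponentially spaced estimator counts.
counts = [1000, 2000, 4000, 8000]
times = [3.0, 4.0, 6.0, 10.0]  # seconds, made up for illustration
slope, intercept = fit_linear_profile(counts, times)
print(max_estimators_within(7.0, slope, intercept))
```

Exponentially spaced probe points keep the profiling cost low while still covering a wide range of estimator counts.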

• Chernoff bound for triangle counting: $N_e > \frac{K \times m \times \Delta}{\epsilon^2 P}$
• Estimate the ground truth $P_s$ on a small sample of the graph and scale it to $P$
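For the error budget, the bound can be evaluated directly once its inputs are known. The sketch below assumes that $m$ is the number of edges, $\Delta$ the maximum degree, $P$ the (estimated) number of triangles and $K$ a constant; the slides do not define these symbols, so that reading is an assumption, as are the function name and the example values.

```python
import math


def min_estimators_for_error(k_const, num_edges, max_degree, epsilon,
                             pattern_count):
    """Smallest integer N_e with N_e > K * m * Delta / (epsilon^2 * P)."""
    if epsilon <= 0 or pattern_count <= 0:
        raise ValueError("epsilon and pattern_count must be positive")
    bound = k_const * num_edges * max_degree / (epsilon ** 2 * pattern_count)
    return math.floor(bound) + 1  # strict inequality


# Toy values: the complete graph on 5 vertices has m = 10 edges,
# maximum degree 4 and 10 triangles; K = 1 is chosen arbitrarily.
print(min_estimators_for_error(1.0, 10, 4, 0.5, 10))
print(min_estimators_for_error(1.0, 10, 4, 0.25, 10))
```

Halving $\epsilon$ roughly quadruples the number of estimators, which is the $1/\epsilon^2$ dependence in the bound.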

Evaluation

• 77x speedup with under 5% loss of accuracy for smaller graphs (0.01-30 million edges)
• 258x speedup with under 5% loss of accuracy for larger graphs

Conclusion

• ASAP is the first system that does fast, scalable approximate graph pattern mining on large graphs.
• ASAP outperforms Arabesque by more than an order of magnitude, at the cost of 5% accuracy.
• ASAP scales to larger graphs, whereas Arabesque fails to complete execution.

Reference

• https://www.usenix.org/sites/default/files/conference/protected-files/osdi18_slides_iyer.pdf
• Iyer, Anand Padmanabha, et al. "ASAP: Fast, approximate graph pattern mining at scale." Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation. USENIX Association, 2018.
• Iyer, Anand Padmanabha, et al. "Towards fast and scalable graph pattern mining." 10th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 18). USENIX Association, 2018.
