ASAP: Fast, Approximate Graph Pattern Mining at Scale
Anand Iyer et al. @ OSDI 2018
Presenter: Yunang Chen

ASAP Design Overview
A Swift Approximate Pattern-miner
Navigates the tradeoff between result accuracy and latency
Runs on a general-purpose distributed dataflow platform
Supports generalized graph pattern mining algorithms

Graph Pattern Mining
Standard approach: iterative expansion
Lack of scalability
◦ Generates exponentially large intermediate candidate sets
◦ Needs to store and exchange them in a distributed environment
* Experiments performed on a cluster of 20 machines, each having 256 GB of memory.
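
To make the scalability problem concrete, here is a minimal Python sketch of iterative expansion. It assumes embeddings are plain vertex sets and leaves out the pattern-matching filter a real miner would apply at each step. The function names and the toy input (the complete graph on five vertices) are illustrative choices, not taken from the slides.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def expand(embeddings, adj):
    """Grow every connected vertex set by one neighbouring vertex."""
    grown = set()
    for emb in embeddings:
        for v in emb:
            for n in adj[v]:
                if n not in emb:
                    grown.add(emb | {n})  # frozenset | set gives a frozenset
    return grown


def candidate_sizes(edges, k):
    """Number of candidate embeddings held at each size 1..k."""
    adj = build_adjacency(edges)
    level = {frozenset([v]) for v in adj}
    sizes = [len(level)]
    for _ in range(k - 1):
        level = expand(level, adj)
        sizes.append(len(level))
    return sizes


# Toy input (hypothetical): complete graph on vertices 0..4.
k5_edges = [(u, v) for u in range(5) for v in range(u + 1, 5)]
sizes = candidate_sizes(k5_edges, 3)
```

Every level has to be materialized before the next one can be produced, which is the store-and-exchange cost the slide refers to. On real graphs these sets typically grow much faster than in this five-vertex toy.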
Many pattern mining tasks do not need exact answers.
◦ Frequent sub-graph mining (FSM) finds the frequency of subgraphs, but with the end goal of ordering them by occurrences.
Leverage approximation for pattern mining

Approximate Pattern Mining
Previous approach: apply the exact same algorithm on subsets of the input data, then use the statistical properties of these subsets to estimate the final results.
◦ No significant speedup
◦ Large error rate
Neighborhood sampling:
1. Model the edges in the graph as a stream
2. Sample one edge, $e_1$
3. Gradually add more adjacent edges, $e_2, \dots, e_k$
4. Stop when the edges form the pattern or it becomes impossible to do so
5. Use the probability of sampling to bound the total number of occurrences of the pattern:
   $P(e_1, \dots, e_k) = P(e_1) \times P(e_2 \mid e_1) \times \dots \times P(e_k \mid e_1, \dots, e_{k-1})$
6. Repeat steps 1-5 multiple times
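
One way to read steps 5 and 6, written out as a sketch: each run $i$ yields an estimate $X_i$, and the $r$ runs are averaged. The symbols $X_i$, $r$ and $\hat{N}$ are notation introduced here for illustration; the slides only give the sampling probability.

```latex
X_i =
\begin{cases}
  \dfrac{1}{P(e_1, \dots, e_k)} & \text{if the sampled edges } e_1, \dots, e_k \text{ form the pattern,} \\[6pt]
  0 & \text{otherwise,}
\end{cases}
\qquad
\hat{N} = \frac{1}{r} \sum_{i=1}^{r} X_i
```

Assuming each occurrence of the pattern can be sampled in exactly one way, $E[X_i]$ equals the true number of occurrences, so averaging more runs tightens the estimate.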
Neighborhood sampling: triangle counting
edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
1. Model the edges in the graph as a stream
2. Sample one edge
3. Gradually add more adjacent edges
4. Stop when the edges form the pattern or it becomes impossible to do so
5. Use the probability of sampling to bound the total number of occurrences
6. Repeat steps 1-5 multiple times
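
The six steps can be sketched in Python for the edge stream above. This is an illustrative sketch, not ASAP's implementation: it assumes the whole stream fits in a list, and the names `triangle_estimate` and `average_estimate` are made up for this example.

```python
import random


def triangle_estimate(stream, rng):
    """Run one neighborhood-sampling estimator over an edge stream."""
    m = len(stream)
    # Step 2: sample the first edge uniformly, so P(e1) = 1/m.
    i = rng.randrange(m)
    e1 = set(stream[i])
    # Step 3: candidates for e2 are later edges sharing one endpoint with e1.
    later_adjacent = [j for j in range(i + 1, m)
                      if len(e1 & set(stream[j])) == 1]
    if not later_adjacent:
        return 0.0  # Step 4: e1 cannot be extended into a triangle.
    c = len(later_adjacent)
    j = rng.choice(later_adjacent)  # P(e2 | e1) = 1/c
    # Step 4: the triangle closes if the edge joining the two unshared
    # endpoints shows up after e2 in the stream.
    closing = e1 ^ set(stream[j])
    closed = any(set(stream[k]) == closing for k in range(j + 1, m))
    # Step 5: a hit is weighted by 1 / (P(e1) * P(e2 | e1)) = m * c.
    return float(m * c) if closed else 0.0


def average_estimate(stream, runs, seed=0):
    """Step 6: repeat the estimator and average the results."""
    rng = random.Random(seed)
    return sum(triangle_estimate(stream, rng) for _ in range(runs)) / runs


stream = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2),
          (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
approx = average_estimate(stream, runs=20000)
```

The stream is the complete graph on five vertices, which contains 10 triangles, so the average should land close to 10. A single run returns either 0 or a multiple of the stream length, which is why many runs are needed.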

ASAP Architecture
Programming API
Sampling phase: fix the vertices for a pattern
Closing phase: wait for the remaining edges to complete the pattern
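
The two phases can be illustrated with a one-pass triangle estimator. This is a sketch under stated assumptions, not ASAP's actual API: the class `TriangleEstimator`, its methods, and the use of reservoir sampling to pick edges from a stream are choices made for this example.

```python
import random


class TriangleEstimator:
    """One estimator that reads the edge stream once, edge by edge."""

    def __init__(self, rng):
        self.rng = rng
        self.m = 0           # edges seen so far
        self.e1 = None       # first sampled edge
        self.c = 0           # edges adjacent to e1 seen after e1
        self.e2 = None       # second sampled edge
        self.closed = False  # has the closing edge arrived?

    def process(self, edge):
        edge = frozenset(edge)
        self.m += 1
        # Sampling phase, part 1: reservoir-sample e1 over all edges.
        if self.rng.random() < 1.0 / self.m:
            self.e1, self.c, self.e2, self.closed = edge, 0, None, False
            return
        # Sampling phase, part 2: reservoir-sample e2 among later edges
        # that share exactly one endpoint with e1. Together, e1 and e2
        # fix the three vertices of the candidate triangle.
        if len(edge & self.e1) == 1:
            self.c += 1
            if self.rng.random() < 1.0 / self.c:
                self.e2, self.closed = edge, False
                return
        # Closing phase: wait for the one edge that completes the triangle.
        if self.e2 is not None and edge == self.e1 ^ self.e2:
            self.closed = True

    def estimate(self):
        # The sampling probability is 1 / (m * c), so a hit counts as m * c.
        return float(self.m * self.c) if self.closed else 0.0


def run_stream(stream, num_estimators, seed=0):
    """Feed one pass of the stream to many estimators and average them."""
    rng = random.Random(seed)
    estimators = [TriangleEstimator(rng) for _ in range(num_estimators)]
    for edge in stream:
        for est in estimators:
            est.process(edge)
    return sum(est.estimate() for est in estimators) / num_estimators


k5_stream = [(u, v) for u in range(5) for v in range(u + 1, 5)]
streamed = run_stream(k5_stream, num_estimators=20000)
```

Because the estimators never look back at earlier edges, they can all share a single pass over the data. For the five-vertex complete graph used here, the averaged result should be close to its 10 triangles.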

Distributed Execution
Rely on map and reduce operations
1. Partition the vertices across $w$ workers
2. Apply the estimator task on each subgraph to produce a partial count
3. Sum up the partial counts
4. Adjust for underestimation by multiplying by $f(w)$
   e.g. for triangle count, $f(w) = w^2$
• Patterns across partitions are ignored
• The summed count is therefore reduced to about $1/f(w)$ of the total occurrences
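
The four steps and the $w^2$ correction can be checked on a toy graph. In this sketch an exact triangle counter stands in for the estimator task so that the result is deterministic. The function names and the explicit `assignment` dictionary are assumptions made for this example, not part of ASAP.

```python
from itertools import combinations, product


def count_triangles(edges):
    """Exact triangle count of a small edge list (stand-in for the estimator)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])


def distributed_triangle_count(edges, assignment, w):
    """assignment maps each vertex to a worker id in range(w)."""
    partial_counts = []
    for worker in range(w):
        # Map: a worker only sees edges with both endpoints in its partition,
        # so triangles that span partitions are never counted.
        local = [(u, v) for u, v in edges
                 if assignment[u] == worker and assignment[v] == worker]
        partial_counts.append(count_triangles(local))
    # Reduce: sum the partial counts, then multiply by f(w) = w^2.
    return sum(partial_counts) * w ** 2


k5 = [(u, v) for u in range(5) for v in range(u + 1, 5)]

# With a single worker nothing is lost and f(1) = 1.
exact = distributed_triangle_count(k5, {v: 0 for v in range(5)}, 1)

# Average the corrected count over every way of placing 5 vertices on 2 workers.
scaled = [distributed_triangle_count(k5, dict(enumerate(workers)), 2)
          for workers in product(range(2), repeat=5)]
mean_scaled = sum(scaled) / len(scaled)
```

A triangle survives partitioning only when its three vertices land on the same worker, which happens with probability $1/w^2$ under a uniform random assignment. That is why the corrected count averages back to the true value, even though a single assignment can be off.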

Error-Latency Profile (ELP)
ASAP can perform tasks in two modes:
◦ Time budget $T$
◦ Error budget $\epsilon$
Given a time/error bound, how many estimators should ASAP use?
Running time scales linearly with the number of estimators
Test exponentially spaced points and extrapolate to build a linear model
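
A minimal sketch of how such a linear model could answer the time-budget question. The measurements are invented for illustration, and the least-squares fit and function names are assumptions rather than ASAP's actual procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimators_for_time_budget(slope, intercept, budget_seconds):
    """Largest estimator count whose predicted runtime fits the budget."""
    if budget_seconds <= intercept:
        return 0
    return int((budget_seconds - intercept) / slope)


# Hypothetical runtimes measured at exponentially spaced estimator counts.
counts = [100_000, 200_000, 400_000, 800_000]
seconds = [1.5, 2.5, 4.5, 8.5]

slope, intercept = fit_line(counts, seconds)
n_estimators = estimators_for_time_budget(slope, intercept, budget_seconds=10.0)
```

Doubling the estimator count at each profiling point covers a wide range with only a few test runs. The fitted line is then used to extrapolate beyond the largest measured point.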
Chernoff bound for triangle counting: $N_e > \dfrac{K \times m \times \Delta}{\epsilon^2 P}$
Estimate the ground truth $P_s$ on a small sample of the graph and scale it to $P$
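
The bound translates directly into a small helper. This sketch reads $m$ as the number of edges, $\Delta$ as the maximum degree, and $P$ as the true pattern count. That reading is an assumption, since the slide does not define the symbols. The slides give no value for the constant $K$, so it stays a parameter, and how $P_s$ is scaled to $P$ is outside this example.

```python
import math


def estimators_for_error(K, m, max_degree, epsilon, pattern_count):
    """Smallest integer N_e with N_e > K * m * max_degree / (epsilon^2 * P)."""
    bound = K * m * max_degree / (epsilon ** 2 * pattern_count)
    return math.floor(bound) + 1


# Toy numbers (hypothetical): 10 edges, maximum degree 4, 10 triangles, K = 1.
n_e = estimators_for_error(K=1, m=10, max_degree=4, epsilon=0.5, pattern_count=10)
n_e_tighter = estimators_for_error(K=1, m=10, max_degree=4, epsilon=0.25, pattern_count=10)
```

Halving the error budget roughly quadruples the number of estimators, because $\epsilon$ enters the bound squared.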

Evaluation
77x speedup with under 5% loss of accuracy for smaller graphs (0.01-30 million edges)
258x speedup with under 5% loss of accuracy for larger graphs

Conclusion
ASAP is the first system that does fast, scalable approximate graph pattern mining on large graphs.
ASAP is more than an order of magnitude faster than Arabesque, at a cost of 5% accuracy.
ASAP scales to larger graphs, whereas Arabesque fails to complete execution.

Reference
◦ https://www.usenix.org/sites/default/files/conference/protected-files/osdi18_slides_iyer.pdf
◦ Iyer, Anand Padmanabha, et al. "ASAP: Fast, approximate graph pattern mining at scale." Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. USENIX Association, 2018.
◦ Iyer, Anand Padmanabha, et al. "Towards fast and scalable graph pattern mining." 10th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 18). USENIX Association, 2018.