Eccentricity Heuristics through Sublinear Analysis Lenses
Tal Wagner, MIT
Graph Eccentricities
• Let $G(V, E)$ be a graph
• Shortest-path metric: $\Delta : V \times V \to \mathbb{R}$
• Eccentricities: $e(v) = \max_{u \in V} \Delta(v, u)$
• Max $e(v)$ = diameter; Min $e(v)$ = radius
• 90th percentile of $e(v)$ = "effective diameter" (excludes outliers)
[Figure: example node $v$ with $e(v) = 3$]
Applications: network topology analysis (computers, social, biological), hardware verification, sparse linear system solving, …
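A minimal sketch of the definitions above, assuming a connected, unweighted, undirected graph stored as an adjacency dictionary. The helper names (`bfs_distances`, `eccentricities`) and the example path are illustrative and not from the talk.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def eccentricities(adj):
    """Exact eccentricities of a connected graph: one BFS per node."""
    return {v: max(bfs_distances(adj, v).values()) for v in adj}

# Hypothetical example: the path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ecc = eccentricities(path)
diameter = max(ecc.values())  # largest eccentricity
radius = min(ecc.values())    # smallest eccentricity
print(ecc, diameter, radius)
```

On this path the endpoints have eccentricity 4 and the middle node has eccentricity 2, so the diameter is 4 and the radius is 2.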
Eccentricity Distribution of Large Graphs
[Figure: eccentricity plot of LinkedIn (Aug 2006), annotated "Outsiders". Sources: Kang et al., TKDD 2011; Leskovec et al., WWW 2008.]
[Figures: eccentricity plots. Kang et al., TKDD 2011: U.S. Patent (1985), LinkedIn (Aug 2006), YahooWeb (GCC only). Iwabuchi et al., CLUSTER 2018: Wikipedia, Twitter, com-Friendster, Webgraph (GCCs only).]
Computing All Eccentricities
• Exact computation: $O(mn)$ (e.g. BFS from each node)
• Approximate algorithms
  • Theoretical:
    • 4-approx., $O(m)$ time [One BFS]
    • $(2 + \delta)$-approx., $\tilde{O}(m/\delta)$ time [Backurs-Roditty-Segal-V. Williams-Wein '18]
    • $5/3$-approx., $\tilde{O}(m^{1.5})$ time [Chechik-Larkin-Roditty-Schoenebeck-Tarjan-V. Williams '14]
    • Tight under SETH
  • Empirical: [Kang et al. '11], [Boldi et al. '11], [Takes & Kosters '13], […], [Shun '15]
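Parallel $k$-BFS Heuristics [Shun '15]
$k$-BFS1:
• $S_1 \leftarrow k$ random nodes
• Compute BFS from each $u \in S_1$
• $\hat{e}_1(v) \leftarrow$ max distance from $S_1$
$k$-BFS2:
• $S_2 \leftarrow k$ furthest nodes from $S_1$
• Compute BFS from each $u \in S_2$
• $\hat{e}_2(v) \leftarrow$ max distance from $S_1 \cup S_2$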
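The two heuristics can be sketched sequentially as follows. This is not the parallel implementation of [Shun '15]. It assumes a connected graph given as an adjacency dictionary, and it reads "furthest from $S_1$" as "largest $\hat{e}_1$ value", which is an assumption because the slide does not spell out the set distance. The function names are illustrative.

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def k_bfs(adj, k, seed=0):
    """Return (e1, e2): the k-BFS1 and k-BFS2 estimates for every node."""
    rng = random.Random(seed)
    nodes = list(adj)
    s1 = rng.sample(nodes, min(k, len(nodes)))
    # k-BFS1: for each v, the largest distance to a sampled node.
    e1 = {v: 0 for v in nodes}
    for u in s1:
        for v, d in bfs_distances(adj, u).items():
            e1[v] = max(e1[v], d)
    # k-BFS2: take the k nodes with the largest e1 value (assumed reading
    # of "furthest from S1") and run a BFS from each of them as well.
    s2 = sorted(nodes, key=lambda v: e1[v], reverse=True)[:k]
    e2 = dict(e1)
    for u in s2:
        for v, d in bfs_distances(adj, u).items():
            e2[v] = max(e2[v], d)
    return e1, e2

# Hypothetical example: a path on 7 nodes.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
e1, e2 = k_bfs(path, k=2, seed=1)
print(e1, e2)
```

Both estimates are maxima of true distances, so they never exceed the true eccentricity, and $\hat{e}_2(v) \ge \hat{e}_1(v)$ for every $v$.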
Empirical Results in [Shun '15]
• $k$-BFS1 performs reasonably well
  • E.g., median average relative error 7.55%
• $k$-BFS2 beats all other methods by orders of magnitude
  • Often computes all eccentricities exactly
Why?
Reagan's Principle
"They're the sort of people who see something works in practice and wonder if it would work in theory."
This Work
• Analyze heuristics in order to explain and improve
  • Will get provable variants with better empirical performance
  • Need to go beyond worst-case (due to SETH-hardness)
• $k$-BFS2: Connection to Streaming Set Cover
  • [Demaine, Indyk, Mahabadi, Vakilian '14]
• $k$-BFS1: Connection to Diameter Property Testing
  • [Parnas & Ron '02]
Empirical validation of theory-based algorithms
$k$-BFS2 by Streaming Set Cover
Set Cover Formulation
• Set Cover: Given elements $V$ and subsets $\mathcal{S} \subset 2^V$, find the smallest cover $C \subset \mathcal{S}$ of $V$.
• Eccentricities as Set Cover:
  • Nodes are elements
  • Nodes are sets: $\mathcal{S} = \{A_v : v \in V\}$, where $A_v = \{u \in V : e(u) = \Delta(v, u)\}$
[Figures: a node $v$ and its set $A_v$; a node $v$ with $A_v = \emptyset$]
• A cover computes all eccentricities: running a BFS from every node of a cover $C$ gives $e(u) = \max_{v \in C} \Delta(v, u)$ for all $u$
• Optimal cover = "eccentric cover", of size $\kappa$
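The formulation can be checked by brute force on small graphs. The sketch below builds every set $A_v$ with one BFS per node and then runs a plain greedy set cover. Greedy is not guaranteed to be optimal, so its size is only an upper bound on $\kappa$. The function names and example graphs are illustrative assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def eccentric_sets(adj):
    """A_v = {u : e(u) = dist(v, u)} for every v (connected graph assumed)."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    ecc = {v: max(dist[v].values()) for v in adj}
    return {v: {u for u in adj if dist[v][u] == ecc[u]} for v in adj}

def greedy_cover(sets, universe):
    """Greedy set cover: repeatedly take the set covering the most uncovered elements."""
    uncovered, cover = set(universe), []
    while uncovered:
        best = max(sets, key=lambda v: len(sets[v] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("sets do not cover the universe")
        cover.append(best)
        uncovered -= sets[best]
    return cover

# Hypothetical examples: a path on 5 nodes and a cycle on 6 nodes.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
path_sets = eccentric_sets(path)
path_cover = greedy_cover(path_sets, path)
cycle_cover = greedy_cover(eccentric_sets(cycle), cycle)
print(path_cover, len(cycle_cover))
```

On the path, the two endpoints cover every node, while an interior node such as 1 has an empty set because it is never the farthest node from anyone. On the even cycle each node is the unique farthest node of its antipode, so all six sets are needed.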
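Computational Constraints
• Computing a set $A_v$ is prohibitive
  • $O(mn)$ work
• Computing which sets cover $v$ is expensive
  • Single BFS, $O(m)$ work
• Known Set Cover algorithms? Yes
[Figures: a node $v$ and its set $A_v$; a node $v$ and the sets $A_{v_1}, A_{v_2}, A_{v_3}$]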
Streaming Set Cover [Demaine-Indyk-Mahabadi-Vakilian '14]
• $S_1 \leftarrow k$ random elements
• $C \leftarrow$ cover for the sample (e.g. greedy)
Element Sampling Lemma: If the global optimum is small, $C$ covers almost all elements.
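A toy illustration of the sampling step, not the streaming algorithm of [DIMV '14] itself: sample $k$ elements, cover only the sample greedily, then measure how much of the whole universe that cover reaches. The set system, the names `greedy_cover` and `sampled_cover`, and the parameter values are all made up for the example.

```python
import random

def greedy_cover(sets, targets):
    """Greedy set cover of `targets`: repeatedly pick the set covering most uncovered targets."""
    uncovered, cover = set(targets), []
    while uncovered:
        best = max(sets, key=lambda name: len(sets[name] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("targets cannot be covered")
        cover.append(best)
        uncovered -= sets[best]
    return cover

def sampled_cover(sets, universe, k, seed=0):
    """Cover a random sample of k elements; return the cover and the covered fraction of the universe."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(universe), min(k, len(universe)))
    cover = greedy_cover(sets, sample)
    covered = set().union(*(sets[name] for name in cover))
    return cover, len(covered & set(universe)) / len(universe)

# Hypothetical set system: two large sets form an optimal cover of size 2,
# and 100 singleton sets act as distractors.
universe = range(100)
sets = {"low": set(range(50)), "high": set(range(50, 100))}
sets.update({("single", i): {i} for i in universe})
cover, fraction = sampled_cover(sets, universe, k=10)
print(cover, fraction)
```

Because the optimum here has size 2, a sample of 10 elements is very likely to hit both large sets, in which case the cover computed for the sample covers the entire universe.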
$k$-BFS2 vs. DIMV
$k$-BFS2 | Streaming Set Cover [DIMV '14]
$S \leftarrow$ random sample ⟺ $S \leftarrow$ random sample
Compute BFS from each $v \in S$ ⟺ Compute covering sets for each $v \in S$
$C \leftarrow k$ nodes with max $\Delta(v, S)$ ≉ $C \leftarrow$ greedy cover for $S$
$k$-BFS$_{SC}$: $k$-BFS2 with the last step replaced by $C \leftarrow$ parallel greedy cover for $S$ [Blelloch-Peng-Tangwongsan '11], [Blelloch-Simhadri-Tangwongsan '12]
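A sequential sketch of the set-cover variant, under stated assumptions: the graph is connected and stored as an adjacency dictionary, a plain greedy cover stands in for the parallel greedy cover cited on the slide, and the final estimate takes the maximum distance over all BFS sources (sample plus cover). The name `k_bfs_sc` is illustrative.

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def k_bfs_sc(adj, k, seed=0):
    """Return (estimates, chosen cover nodes)."""
    rng = random.Random(seed)
    nodes = list(adj)
    sample = rng.sample(nodes, min(k, len(nodes)))
    est = {v: 0 for v in nodes}
    covers = {}  # candidate node c -> sampled nodes s with s in A_c
    for s in sample:
        dist = bfs_distances(adj, s)
        ecc_s = max(dist.values())  # exact eccentricity of the sampled node
        for v, d in dist.items():
            est[v] = max(est[v], d)
            if d == ecc_s:          # v is a farthest node from s, so s is in A_v
                covers.setdefault(v, set()).add(s)
    # Sequential greedy cover of the sample (stand-in for the parallel one).
    uncovered, chosen = set(sample), []
    while uncovered:
        best = max(covers, key=lambda c: len(covers[c] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    for c in chosen:
        for v, d in bfs_distances(adj, c).items():
            est[v] = max(est[v], d)
    return est, chosen

# Hypothetical example: a path on 7 nodes.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
est, chosen = k_bfs_sc(path, k=3, seed=0)
print(est, chosen)
```

On a path, the farthest node from any sample is an endpoint, so the cover only ever contains endpoints. Whenever both endpoints are chosen, every estimate is exact.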
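$k$-BFS$_{SC}$ Theorem: Suppose $G(V, E)$ has eccentric cover size $\kappa$.
$k$-BFS$_{SC}$ with $k = \tilde{O}(\kappa \cdot \epsilon^{-1} \log n)$ satisfies:
• Expected work: $O(km)$, expected depth: $\tilde{O}(\mathrm{diam}(G))$
• Computes exact eccentricities of all but an $\epsilon$-fraction of nodes w.h.p.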
???
Eccentric Cover: Warm-Up
• Path, star, clique: $\kappa = 2$
• Even cycle, hypercube: $\kappa = n$
• Odd cycle: $\kappa = \tfrac{1}{2}(n + 1)$
Eccentric Cover in the Wild
• 8 real-world graphs in [Shun '15]
• 1M-4M nodes each
• Upper bounds on eccentric cover size:
  • 2 graphs: $\kappa \le 128$
  • 5 graphs: $\kappa \lesssim 1{,}000$
  • 1 graph: $\kappa \lesssim 10{,}000$
Real-world graphs have small eccentric covers
Experiments
$k$-BFS2 vs. $k$-BFS$_{SC}$ (real-world graphs from the Stanford Network Analysis Project)
[Plots: average relative error vs. BFS count, one panel per graph]
Graph 1: $n = 36{,}646$, $m = 88{,}303$, $\kappa \le 1{,}024$. Exact ratio at $k = 16$: $k$-BFS2: 80%, $k$-BFS$_{SC}$: 99%
Graph 2: $n = 11{,}174$, $m = 23{,}409$, $\kappa \le 32$. Exact ratio at $k = 16$: $k$-BFS2: 97%, $k$-BFS$_{SC}$: 100%
$k$-BFS1 by Property Testing
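Property Testing Approximation
• Usual approximation: $\hat{e}(v)$ is close to $e(v)$
• Property testing approximation: $\hat{e}(v)$ is exact on some $\hat{G}$ close to $G$
  • Graphs are $\epsilon$-close if up to $\epsilon \cdot m$ edges can be added/removed to get $\hat{G}$ from $G$
  • No sparsity/density assumption ("General Graph Model")
• Notation: $\hat{e}(v) \le e(v) \preceq_\epsilon \hat{e}(v)$ (the second inequality holds on some graph $\epsilon$-close to $G$)
[Figure: example with $\hat{e}(v) = 1$ and $e(v) = 2$, where $\hat{G}$ is $\tfrac{4}{9}$-close to $G$]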
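The closeness notion can be made concrete with a small helper that counts edge insertions and deletions relative to $m$. This is a sketch for undirected graphs stored as adjacency dictionaries. The names `edge_set` and `closeness` and the three-node example are illustrative, not taken from the talk.

```python
def edge_set(adj):
    """Undirected edges of an adjacency dictionary, as frozensets."""
    return {frozenset((u, v)) for u in adj for v in adj[u]}

def closeness(adj, adj_mod):
    """Smallest eps such that adj_mod is eps-close to adj: (edges added or removed) / m."""
    edges, edges_mod = edge_set(adj), edge_set(adj_mod)
    return len(edges ^ edges_mod) / len(edges)

# Hypothetical example: the path 0 - 1 - 2 has e(0) = 2.
path = {0: [1], 1: [0, 2], 2: [1]}
# Adding the edge {0, 2} yields a triangle, in which node 0 has eccentricity 1.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
eps = closeness(path, triangle)
print(eps)
```

Here one edge is added to a graph with $m = 2$ edges, so the triangle is $\tfrac{1}{2}$-close to the path, and the estimate 1 for node 0 is a valid answer in the property testing sense for any $\epsilon \ge \tfrac{1}{2}$.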
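$k$-BFS1 vs. Diameter Testing
$k$-BFS1 with $k = O(\epsilon^{-1} \log n)$ satisfies $\hat{e}(v) \le e(v) \preceq_\epsilon \hat{e}(v)$ for all $v$.
• Work: $\tilde{O}(\epsilon^{-1} m)$, depth: $\tilde{O}(\mathrm{diam}(G))$
• Algorithm: Start BFS at $k$ random nodes
Theorem [Parnas & Ron]: Given a graph $G$, compute a diameter estimate $\hat{D}$ such that $\hat{D} \le \mathrm{diam}(G) \preceq_\epsilon 2\hat{D} + 2$.
• Time: $\mathrm{poly}(\epsilon^{-1})$
• Algorithm: Start truncated BFS at $k$ random nodes
But $k$-BFS1 takes linear time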
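Eccentricity Testing
Aux. Theorem: Given $G$ and $v$, compute $\hat{e}(v)$ s.t. $\hat{e}(v) \le e(v) \preceq_\epsilon \hat{e}(v)$ in time $\mathrm{poly}(\epsilon^{-1})$.
• Corollary (diameter testing): $\hat{D} \le \mathrm{diam}(G) \preceq_\epsilon 2\hat{D}$ (shaved off the $+2$)
• Corollary (radius testing): $\hat{R} \le \mathrm{radius}(G) \preceq_\epsilon \hat{R} + 1$
Implies a variant of $k$-BFS1: $k$-BFS$_{TST}$
$k$-BFS$_{TST}$ Theorem: $k$-BFS$_{TST}$ satisfies $\hat{e}(v) \le e(v) \preceq_\epsilon \hat{e}(v)$ for all $v$.
• Work: $O(\epsilon^{-2} n)$, depth: $\tilde{O}(\epsilon^{-1} \log n)$
• Same guarantee as $k$-BFS1 but in sublinear work and depth, independent of graph.
• Algorithm: truncated BFS
  • $S_1 \leftarrow k$ random nodes
  • From each $u \in S_1$, start a BFS up to the first level $\ell_u$ where $\tilde{O}(\epsilon^{-1})$ nodes are seen. All unseen nodes are considered at "distance" $\ell_u + 1$ from $u$.
  • $\hat{e}_{TST}(v) \leftarrow$ max "distance" from $S_1$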
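The truncated BFS can be sketched as below. The parameter `budget` is a stand-in for the $\tilde{O}(\epsilon^{-1})$ threshold, whose exact value the slide does not give. The final loop assigns a value to every node for each source, so this simple version does not attain the stated work bound. Function names are illustrative, and a connected graph is assumed.

```python
import random

def truncated_bfs(adj, source, budget):
    """BFS level by level, stopping after the first level at which at least
    `budget` nodes have been seen. Returns (distances of seen nodes, last level)."""
    dist = {source: 0}
    frontier, level = [source], 0
    while len(dist) < budget:
        next_frontier = []
        for v in frontier:
            for w in adj[v]:
                if w not in dist:
                    dist[w] = level + 1
                    next_frontier.append(w)
        if not next_frontier:  # whole component explored before reaching the budget
            break
        frontier, level = next_frontier, level + 1
    return dist, level

def k_bfs_tst(adj, k, budget, seed=0):
    """Estimate every eccentricity from k truncated BFS runs."""
    rng = random.Random(seed)
    nodes = list(adj)
    est = {v: 0 for v in nodes}
    for u in rng.sample(nodes, min(k, len(nodes))):
        dist, level = truncated_bfs(adj, u, budget)
        for v in nodes:
            # Unseen nodes are placed at "distance" level + 1 from u.
            est[v] = max(est[v], dist.get(v, level + 1))
    return est

# Hypothetical example: a path on 9 nodes.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 9] for i in range(9)}
print(truncated_bfs(path, 0, budget=3))
print(k_bfs_tst(path, k=3, budget=3))
```

An unseen node lies beyond the last completed level, so its true distance from $u$ is at least $\ell_u + 1$. The estimates therefore never exceed the true eccentricities.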
Experiments
$k$-BFS1 vs. $k$-BFS$_{TST}$ (with different BFS cutoffs)
[Plots: average relative error vs. edge traversals, one panel per graph]
Graph 1: $n = 36{,}646$, $m = 88{,}303$
Graph 2: $n = 11{,}174$, $m = 23{,}409$
Conclusion
• Explain and improve high-performing heuristics
  • Practical algorithm -> "fit" analysis -> practical improvement with guarantees
• Inter-connections of parallel, streaming, sketching, and property testing algorithms
  • All "point to same direction"
Thank you