Upload
jason-riedy
View
105
Download
3
Tags:
Embed Size (px)
Citation preview
STINGER: Multi-threaded GraphStreamingJason Riedy and David BaderGeorgia Institute of Technology
Graph Algorithms Building Blocks (GABB), 19 May 2014
(insert current prefix here)scale data analysis
Health care Finding outbreaks, population epidemiology
Social networks Advertising, searching, grouping
Intelligence Decisions at scale, regulating algorithms
Systems biology Understanding interactions, drug design
Power grid Disruptions, conservation
Simulation Discrete events, cracking meshes
Γ Data analysis runs throughout.Γ Many problems can be phrased through graphs.
Γ Any many are changing and dynamic.
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 2 / 24
Outline
Motivation: Graph Algorithms for Analysis
Graphs and Streaming Data
STING/STINGER Analysis Framework
Building Blocks for Streaming Graph DataPageRankTriangle CountingAgglomerative Communities
Observations
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 3 / 24
General approaches
Γ High-performance static graph analysisΓ Develop techniques that apply to unchanging massive
graphs.Γ Provides useful after-the-fact information, starting points.Γ Serves many existing applications well: market research,
much bioinformatics, ...Γ Needs to be O(|E|).
Γ High-performance streaming graph analysisΓ Focus on the dynamic changes within massive graphs.Γ Find trends or new information as they appear.Γ Serves upcoming applications: fault or threat detection,
trend analysis, online prediction...Γ Can be O(|βE|)? O(Vol(βV))? Less data β faster, efficient
Both very important to different areas.Remaining focus is on streaming.
Note: Not CS theory streaming, but analysis of streaming data.
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 4 / 24
Streaming graph data
Data RatesFrom www.statisticsbrain.com:Γ 58M posts per day on Twitter (671 / sec)Γ 1M links shared per 20 minutes on Facebook
Other applications (e.g. network security) need to respondnearly at line rate, 81k-1.5M pps on gigabit ethernet.
Opportunities
Γ Do not need to analyze the entire graph.Γ Different domains: Throughput & latency
Γ Expose different levels of concurrencyΓ Can achieve ridiculous βspeed ups.β
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 5 / 24
Streaming Queries
Different kinds of questions
Γ How are individual graph metrics (e.g. clusteringcoefficients) changing?
Γ What are the patterns in the changes?Γ Are there seasonal variations?Γ What are responses to events?
Γ What are temporal anomalies in the graph?Γ Do key members in clusters / communities change?Γ Are there indicators of event responses before they are
obvious?
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 6 / 24
On to STING...
Motivation: Graph Algorithms for Analysis
Graphs and Streaming Data
STING/STINGER Analysis Framework
Building Blocks for Streaming Graph Data
Observations
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 7 / 24
STINGβs focus
Source data
predictionaction
summary
Control
VizSimulation / query
Γ STING: Spatio-Temporal Interaction Networks and GraphsΓ STING manages queries against changing graph data.
Γ Visualization and control often are application specific.Γ Ideal: Maintain many persistent graph analysis kernels.
Γ One current graph snapshot, kernels keep smaller histories.Γ Also (a harder goal), coordinate the kernelsβ cooperation.
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 8 / 24
STING: High-level architecture
Slide credit: Rob McColl and David Ediger
Γ OpenMP + sufficiently POSIX-ishΓ Multiple processes for resilience
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 9 / 24
STING Extensible Representation: Core data structure
sourcevertex
edgetype
adj. vtx weight mod. time first time
edge type index
vertexindex
Initial considerations [Bader, et al.]
Γ Be useful for the entire βlarge graphβ communityΓ Permit good performance: No single structure is optimal for all.Γ Assume globally addressable memory access and atomic operationsΓ Not a graph database, but supports types, subsetsΓ Large graph β rare conflicts
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 10 / 24
Building Blocks for Streaming Graph Data
Motivation: Graph Algorithms for Analysis
Graphs and Streaming Data
STING/STINGER Analysis Framework
Building Blocks for Streaming Graph DataPageRankTriangle CountingAgglomerative Communities
Observations
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 11 / 24
Incremental PageRank
Γ PageRank: Well-understood method for ranking verticesbased on random walks (related to minimizingconductance).
Γ Equivalent problem, solve (Iβ Ξ±ATDβ1)x = (1β Ξ±)v giveninitial weights v.
Γ Goal: Use for seed set expansion, sparse v.Γ State-of-the-art for updating x when the graph represented
by A changes? Re-start iteration with the previous x.Γ Can do significantly better for low-latency needs.Γ Compute the change βx instead of the entire new x.
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 12 / 24
Incremental PageRank: Iteration
Iterative solverStep kβ k + 1:
βx(k+1) = Ξ±(A + βA)T(D + βD)β1βx(k)+
Ξ±οΏ½
(A + βA)T(D + βD)β1 β ATDβ1οΏ½
x
Γ Additive part: Non-zero only at changes.Γ Operator: Pushes changes outward.
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 13 / 24
Incremental PageRank: Limiting Expansion
Iterative solverStep kβ k + 1:
βxΜ(k+1) = Ξ±(A + βA)T(D + βD)β1βxΜ(k)|ex + Ξ±βxΜ|held
Ξ±οΏ½
(A + βA)T(D + βD)β1 β ATDβ1οΏ½
xΜ
Γ Additive part: Non-zero only at changes.Γ Operator: Pushes sufficiently large changes outward.
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 14 / 24
Incremental PageRank: Test Cases
Γ Initial, high-level implementation via sparse matrices in Julia.Γ Test graphs from the 10th DIMACS Implementation Challenge.Γ Add uniformly random edges... worst case.Γ Up to 100k in different batch sizes.Γ One sequence of edge actions per graph shared across
experiments.Γ Conv. & hold base threshold: 10β12
Graph |V| |E| Avg. Deg. Size (MiB)
caidaRouterLevel 192 244 609 066 3.17 5.38coPapersCiteseer 434 102 1 603 6720 36.94 124.01
coPapersDBLP 540 486 15 245 729 28.21 118.38great-britain.osm 7 733 822 8 156 517 1.05 91.73PGPgiantcompo 10 680 24 316 2.28 0.23
power 4 941 6 594 1.33 0.07
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 15 / 24
Incremental PageRank: Throughput v. latency
Percent of edge traversals relative to re-started iteration:1000 100 10
β β β β β β β β β ββ β β β β β β β β ββ β β β β β β β β ββ β β β β β β β β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0%
20%
40%
60%
80%
0%
20%
40%
60%
80%
0%
20%
40%
60%
80%
0%
20%
40%
60%
80%
0%
20%
40%
60%
80%
0%
20%
40%
60%
80%
caidaRouterLevel
coPapersC
iteseercoP
apersDB
LPgreat_britain.osm
PG
Pgiantcom
popow
er
0 2500 5000 7500 10000 0 2500 5000 7500 10000 0 2500 5000 7500 10000"Time"
Fra
ctio
n of
edg
es v
. res
tart
ed P
R
Graph name β caidaRouterLevel coPapersCiteseer coPapersDBLP great_britain.osm PGPgiantcompo power
Held vertex threshold / 10**β12 β β β β10000.0 1000.0 100.0 10.0
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 16 / 24
Triangle Counting
Current version
Γ Count all the triangles around each graph vertex.Γ Used in clustering coefficients (numerator), etc.
Γ Up to 130 000 graph updates per second on X5570(Nehalem-EP, 2.93GHz)
Γ 2000Γ speed-up over static recomputationΓ Main algorithm, for each vertex v:
Γ Sort its adjacency list.Γ For each neighbor w,
Γ search for wβs neighbors in the sorted list.
Γ Could compute diag(A3), more or less...
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 17 / 24
Triangle Counting: Small Batches
Low-latency case
Γ In generall, diag(A3) is a silly option.Γ But A3βx, a BFS, to count around a few vertices...Γ Brute force (MTAAP10)
Γ Roughly 4Γ slower with moderate batches, andΓ less than 2Γ slower with small batches.
Γ Could be reasonable for a quick hack.
Small changes (low latency) may find more applications oflinear algebra-like primitives.
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 18 / 24
Community Detection
What do we mean?Γ Partition a graphβs vertices
into disjoint communities.Γ Locally optimize some metric,
e.g. modularity, conductanceΓ Try to capture that vertices are
more similar within onecommunity than betweencommunities.
Γ Modularity: More internaledges than expected.
Jasonβs network via LinkedIn Labs
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 19 / 24
Parallel Agglomerative Method
A
B
C
DE
FG
Γ Use a matching to avoid a serializingqueue.Γ Simple greedy algorithm.Γ Would require smuggling data and
communication in through elementoperators...
Γ Highly scalable, 5.8Γ 106 β 7Γ 106
edges/sec, 8Γ β 16Γ speed-up on80-thread E7-8870 (thanks Intel!)
Γ Extends to dynamic communitymaintenanceΓ Extract vertices from communities,
re-agglomerateΓ Matrix triple product-ish
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 20 / 24
Parallel Agglomerative Method
A
B
C
DE
FG
C
D
G
Γ Use a matching to avoid a serializingqueue.Γ Simple greedy algorithm.Γ Would require smuggling data and
communication in through elementoperators...
Γ Highly scalable, 5.8Γ 106 β 7Γ 106
edges/sec, 8Γ β 16Γ speed-up on80-thread E7-8870 (thanks Intel!)
Γ Extends to dynamic communitymaintenanceΓ Extract vertices from communities,
re-agglomerateΓ Matrix triple product-ish
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 20 / 24
Parallel Agglomerative Method
A
B
C
DE
FG
C
D
G
E
B
C
Γ Use a matching to avoid a serializingqueue.Γ Simple greedy algorithm.Γ Would require smuggling data and
communication in through elementoperators...
Γ Highly scalable, 5.8Γ 106 β 7Γ 106
edges/sec, 8Γ β 16Γ speed-up on80-thread E7-8870 (thanks Intel!)
Γ Extends to dynamic communitymaintenanceΓ Extract vertices from communities,
re-agglomerateΓ Matrix triple product-ish
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 20 / 24
Seed Set Expansion
Problem
Γ Given a small number of vertices, andΓ find a region of interest around them.
Γ Start with a subset consisting of the selection.Γ Evaluate the change in modularity around the current
subset.Γ Absorb all vertices that...
Γ may increase modularity by a significant amount, orΓ are within the top 10% of changes, or...
Γ Repeat until the set is large enough.Γ Step-wise guided expansion doesnβt fit current primitives.
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 21 / 24
Observations
Γ Throughput / latency trade offs:Γ Different levels of parallelism and optimizationsΓ Larger batches β higher throughput, more collisionsΓ Small batches β lower latency, more scatteredΓ Impact optimizations similarly to direction-optimized BFS
Γ Can build proposed building blocks against STINGERΓ Many algorithms are not naturally expressed:
Γ MatchingΓ Guided set expansion by changing criteriaΓ Streaming versions of these...
Γ Targets for version 2?
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 22 / 24
STINGER: Where do you get it?
www.cc.gatech.edu/stinger/Gateway toΓ code,Γ development,Γ documentation,Γ presentations...Γ (working on usage and
development screencasts)Remember: Still academic code, butmaturing.Users / contributors / questioners:Georgia Tech, PNNL, CMU, Berkeley,Intel, Cray, NVIDIA, IBM, FederalGovernment, Ionic Security, Citi
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 23 / 24
Acknowledgment of support
Riedy, Baderβ STINGER: Building Blocks 19 May 2014 24 / 24