FASCIA: Fast Approximate Subgraph Counting and Enumeration

George M. Slota, Kamesh Madduri
Scalable Computing Laboratory, The Pennsylvania State University
2 Oct. 2013
Overview
Background
Motivation
Color-coding
Related Work
FASCIA
Results
Future Work
Background: Subgraph Counting
Background: Subgraph Enumeration
Motivation: Why do we want fast algorithms for subgraph counting?

Important to bioinformatics, chemoinformatics, social network analysis, communication network analysis, etc.
Forms the basis of more complex analysis:
  Motif finding
  Graphlet frequency distance (GFD)
  Graphlet degree distributions (GDD)
Counting and enumeration on large networks is very tough: O(n^k) complexity for the naïve algorithm
Motivation: Motif finding, GFD, and GDD

Motif finding: Look for all subgraphs of a certain size
GFD: Numerically compare occurrence frequency to other networks
GDD: Numerically compare embeddings/vertex distribution

[Figure: relative frequency (log scale) of subgraphs 1-11 for the E. coli, S. cerevisiae, H. pylori, and C. elegans PPI networks]
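The GFD comparison above can be illustrated with a small sketch. This is a simplified form of my own choosing (Euclidean distance between relative-frequency vectors); the exact GFD formulation used in the literature may differ, and the function name is hypothetical:

```python
import math

def graphlet_frequency_distance(counts_a, counts_b):
    """Compare two networks by their subgraph occurrence frequencies:
    normalize each count vector to relative frequencies, then take the
    Euclidean distance between the two frequency vectors."""
    def rel(counts):
        total = sum(counts)
        return [c / total for c in counts]
    fa, fb = rel(counts_a), rel(counts_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

Networks with proportionally identical subgraph counts have distance 0, which is why relative (not absolute) frequencies are compared across networks of different sizes.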
Color-coding for Approximate Subgraph Counting

Color-coding [Alon et al., 1995] is a way to approximate the number of tree-like non-induced subgraph embeddings in a network
We randomly color a graph with k colors and count the number of colorful subgraph embeddings
We retrieve an approximate total count by scaling the colorful count by the probability that any given embedding would be colorful
Run multiple times with unique graph colorings to get a final average
Using a dynamic programming approach (to be explained), each iteration runs in O(m · 2^k · e^k)
Color-coding: example

cnt_colorful = 3,  C_total = 3^3,  C_colorful = 3!,  P = 3!/3^3
cnt_estimate = cnt_colorful / P = 13.5
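The scaling step in the example above can be checked numerically. A minimal sketch (function names are mine, not FASCIA's):

```python
from math import factorial

def colorful_probability(k: int) -> float:
    """Probability that a fixed k-vertex embedding receives k distinct
    colors when each vertex is colored uniformly at random from k
    colors: k! / k^k."""
    return factorial(k) / k ** k

def estimate_count(colorful_count: float, k: int) -> float:
    """Scale the observed colorful count by 1/P to estimate the
    total number of embeddings."""
    return colorful_count / colorful_probability(k)

# The slide's example: k = 3 colors, 3 colorful embeddings observed.
# P = 3!/3^3 = 6/27, so the estimate is 3 / (6/27) = 13.5.
print(estimate_count(3, 3))  # 13.5
```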
Color-coding: Algorithmic Overview

1: Partition input template T (k vertices) into subtemplates S_i using single edge cuts.
2: Select N_iter iterations to be performed
3: for i = 1 to N_iter do
4:   Randomly assign to each vertex v in graph G a color between 0 and k − 1.
5:   for all v ∈ G do
6:     Use a dynamic programming scheme to count colorful non-induced occurrences of T rooted at v.
7:   end for
8: end for
9: Take the average of all N_iter counts as the final count.
Color-coding: Template partitioning

[Figure: example of partitioning a template into subtemplates via single edge cuts]
Color-coding: Dynamic Programming Step

1: for all subtemplates S_i created from partitioning T, in the reverse order they were created during partitioning do
2:   for all vertices v ∈ G do
3:     if S_i consists of a single node then
4:       Set table[S_i][v][color of v] := 1
5:     else
6:       S_i consists of active child a_i and passive child p_i
7:       for all colorsets C of unique values mapped to S_i do
8:         Set count := 0
9:         for all u ∈ N(v), where N(v) is the neighborhood of v do
10:          for all possible combinations C_a and C_p created by splitting C and mapping onto a_i and p_i do
11:            count += table[a_i][v][C_a] · table[p_i][u][C_p]
12:          end for
13:        end for
14:        Set table[S_i][v][C] := count
15:      end for
16:    end if
17:  end for
18: end for
19: templateCount := Σ_v Σ_C table[T][v][C]
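The DP above can be sketched for the special case of chain (path) templates, where each subtemplate extension adds one vertex. This avoids the full template-partitioning machinery and uses frozensets in place of FASCIA's combinatorial colorset indexes; it is an illustrative simplification, not the paper's implementation:

```python
from math import factorial
import random

def colorful_path_count(adj, k, colors):
    """Count colorful k-vertex paths (all colors distinct), k >= 2.
    table[v][colorset] = number of colorful i-vertex paths ending at v."""
    n = len(adj)
    # base case: single-vertex paths, as in line 4 of the pseudocode
    table = [{frozenset([colors[v]]): 1} for v in range(n)]
    for _ in range(k - 1):  # grow paths one vertex at a time
        new = [dict() for _ in range(n)]
        for v in range(n):
            for u in adj[v]:
                cu = colors[u]
                for cs, cnt in table[v].items():
                    if cu not in cs:  # extension must stay colorful
                        key = cs | {cu}
                        new[u][key] = new[u].get(key, 0) + cnt
        table = new
    # each undirected path is found once from each endpoint
    return sum(c for t in table for c in t.values()) // 2

def estimate_path_count(adj, k, iters=100, seed=1):
    """Average the scaled colorful counts over iters random colorings."""
    rng = random.Random(seed)
    p = factorial(k) / k ** k  # probability an embedding is colorful
    total = 0.0
    for _ in range(iters):
        colors = [rng.randrange(k) for _ in adj]
        total += colorful_path_count(adj, k, colors) / p
    return total / iters
```

Because the colorful constraint forbids repeated colors, colorful walks are automatically simple paths up to same-colored vertices, which is exactly the relaxation that makes the DP tractable.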
Color-coding: Colorset and count calculation

[Figure: worked example of splitting a colorset C into C_a and C_p and combining child counts]
Related Work: Color-coding, non-induced subgraph enumeration

Alon et al.'s implementation [Alon et al., 2008]
  Motif finding on PPI networks
MODA [Omidi et al., 2009]
  Uses approximation or exact scheme
  Motif finding on small networks
PARSE [Zhao et al., 2010]
  Distributed color-coding algorithm using MPI
  Handles large graphs through partitioning
SAHAD [Zhao et al., 2012]
  Distributed color-coding algorithm using Hadoop (MapReduce)
  Handles vertex-labeled graphs, computes graphlet degree distributions
FASCIA: Overview

FASCIA: Fast Approximate Subgraph Counting and Enumeration
  Count and enumerate subgraphs, supports node labels
  Perform motif finding, calculate GDD
Algorithmic optimizations:
  Combinatorial indexing scheme for color mappings
  Multiple shared-memory parallelization strategies
  Memory reduction via array design, hashing schemes
  Speedup via template partitioning and work avoidance
FASCIA: Combinatorial indexing scheme

Combinatorial number system to represent colorsets: C = (c1 choose 1) + (c2 choose 2) + · · · + (ck choose k), where c1 < c2 < · · · < ck
Precompute indexes and ordering in advance, store in a table (< 2 MB for k = 12)
This avoids explicit handling/passing of colors, and computation of colorset indexes at runtime
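The combinatorial number system formula above can be written directly (FASCIA precomputes these ranks into a lookup table rather than evaluating binomials at runtime; the function name here is mine):

```python
from math import comb

def colorset_index(colorset):
    """Rank of a colorset {c1 < c2 < ... < ck} in the combinatorial
    number system: C = C(c1,1) + C(c2,2) + ... + C(ck,k).
    This maps the C(n,k) k-subsets of {0..n-1} bijectively onto
    0 .. C(n,k)-1, giving a compact dense table index."""
    cs = sorted(colorset)
    return sum(comb(c, i) for i, c in enumerate(cs, start=1))
```

For example, the 2-subsets of {0, 1, 2} rank as {0,1} → 0, {0,2} → 1, {1,2} → 2, so a table indexed by rank wastes no slots.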
FASCIA: Shared memory parallelization

Inner loop parallelization: for all v ∈ G
Outer loop parallelization: for i = 1 to N_iter
Outer loop requires an individual dynamic table for each separate iteration; storage increases linearly with thread count
Possible to do arbitrary combinations, e.g. a 12-thread machine with 2 outer-loop threads, each with 6 inner-loop threads

1: Partition input template T
2: Select N_iter to be performed
3: for i = 1 to N_iter in parallel do
4:   Randomly color G
5:   for all S_i created during partitioning, with children a_i and p_i do
6:     for all v ∈ G in parallel do
7:       ....
8:     end for
9:   end for
10: end for
11: Take the average of all N_iter counts as the final count.
FASCIA: Memory optimizations

We implement both a three-level array and a hash table
Items in the array are stored/retrieved as table[S_i][v][C]
Hash table, for each subtemplate: key = v · N_c + C
If we initialize and resize the hash table to be a factor of N_v · N_c, then a simple hash function (key mod N_v) results in a relatively uniform distribution based on the initial random graph coloring
Generally: the array method is more memory efficient for dense graphs, the hash table more efficient for sparse graphs
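The key-flattening scheme on this slide can be sketched as follows (function names are hypothetical; this only illustrates the arithmetic, not FASCIA's actual hash table):

```python
def table_key(v: int, colorset_idx: int, num_colorsets: int) -> int:
    """Flatten a (vertex, colorset) pair into one integer:
    key = v * N_c + C, where N_c is the number of colorsets
    for the current subtemplate."""
    return v * num_colorsets + colorset_idx

def bucket(key: int, num_vertices: int) -> int:
    """The slide's simple hash: key mod N_v. With the table sized as
    a factor of N_v * N_c, consecutive colorsets of the same vertex
    spread across buckets, so occupancy tracks the (roughly uniform)
    random coloring."""
    return key % num_vertices
```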
FASCIA: Template partitioning and work avoidance

Basic partitioning: try to evenly partition the template
Observation: algorithm runtime is proportional to Σ_i (k choose |S_i|) · (|S_i| choose |a_i|), i.e. Σ_i |C_i| · |C_{a_i}|
This sum can be minimized by a one-at-a-time partitioning approach
On certain templates, this sum can be minimized by exploiting latent symmetry, HOWEVER ...
FASCIA: Template partitioning, work reduction

With a single-node active or passive child we can do ∼ 1/k of the original work: C_a = color(v) or C_p = color(u)
We can avoid all work for a given v if table[a_i][v][] == NULL, for any arbitrary subtemplate
Can also minimize the size of N(v) by checking the initialization for each possible u
Finally, we can order the way in which we process all S_i to minimize memory usage

1: for all S_i created during partitioning, with children a_i and p_i do
2:   for all v ∈ G, where table[a_i][v] != NULL do
3:     for all C possibly mapped to S_i do
4:       for all C_p, C_a from C, where C_a, C_p is fixed if |a_i| = 1 or |p_i| = 1 do
5:         for all u ∈ N(v), where table[p_i][u] != NULL do
6:           count += table[a_i][v][C_a] · table[p_i][u][C_p]
7:         end for
8:       end for
9:       Set table[S_i][v][C] := count
10:    end for
11:  end for
12: end for
Results: Overview

Networks and templates analyzed
Large network runtimes
Approximation error in template counts
Memory savings
Strong scaling
Results: Networks and templates analyzed

Gordon at SDSC, 2× Xeon E5 @ 2.6 GHz (Sandy Bridge), 64 GB DDR3
Database of Interacting Proteins, SNAP, and Virginia Tech NDSSL

Network        n          m           d_avg  d_max
Portland       1,588,212  31,204,286  39.3   275
PA Road Net    1,090,917  1,541,898   2.8    9
Slashdot       82,168     438,643     10.7   2510
Enron          33,696     180,811     10.7   1383
E. coli        2,546      11,520      9.0    178
S. cerevisiae  5,021      22,119      8.8    289
H. pylori      687        1,352       3.9    54
C. elegans     2,391      3,831       3.2    187

Templates: U3-1, U3-2, U5-1, U5-2, U7-1, U7-2, U10-1, U10-2, U12-1, U12-2
Results: Large network runtimes

Parallel (16 cores) results for all unlabeled (left) and labeled (right) templates on the Portland network
8 possible demographic labels (M/F and kid/youth/adult/senior)
∼200 seconds for up to a 12-vertex unlabeled template, less than 1 second for all labeled templates

[Figure: single-iteration execution time (s) for unlabeled templates U3-1 through U12-2 (left) and labeled templates L3-1 through L12-2 (right)]
Results: Motif finding runtimes

Parallel (16 cores) runtimes for a single iteration of all 7-, 10-, and 12-vertex templates on the four biological PPI networks
7 → 11 templates, 10 → 106 templates, 12 → 556 templates

[Figure: single-iteration execution time (s, log scale) vs. motif size for E. coli, S. cerevisiae, H. pylori, and C. elegans]
Results: Approximation error in template counts

Enron (left) with U3-1 and U5-1, and H. pylori (right) across all 11 unique templates of size 7
Error increases with template size and inversely with network size

[Figure: error vs. iterations for U3-1 and U5-1 on Enron (left) and for the size-7 templates on H. pylori (right)]
Results: Memory savings

Memory requirements on the Portland and PA Road networks for the improved array (left) and hash table (right)
Using all UX-1 chain templates
Up to 20%-90% savings versus the naïve method

[Figure: memory usage (GB) vs. template size — Portland network with the improved array (left), PA Road network with the hash table (right), naïve vs. improved vs. labeled/hash-table variants]
Results: Strong scaling

Inner loop for large graphs (for all v ∈ G)
Outer loop for small graphs (for i = 1 to N_iter)
U12-2 template on Portland (left) and Enron (right), 16 threads
About 12× speedup for the inner loop on Portland
About 6× speedup for the outer loop on Enron, 3.5× for the inner loop

[Figure: single-iteration execution time (s) vs. processor cores (1-16) for inner- and outer-loop parallelization on Portland (left) and Enron (right)]
Conclusions and Future (Ongoing) Work

Memory and algorithmic improvements allow FASCIA to achieve considerable speedup relative to previous work
Real-time count estimates for up to 7-vertex templates on large networks
The algorithm provides reasonable error after a low number of iterations
Further optimization for count storage (CSR-like structure)
Distribution via MPI, graph partitioning to allow larger counts
Bibliography
N. Alon, R. Yuster, and U. Zwick. Color-coding. J. ACM, 42(4):844–856,1995.
N. Alon, P. Dao, I. Hajirasouliha, F. Hormozdiari, and S.C. Sahinalp.Biomolecular network motif counting and discovery by color coding.Bioinformatics, 24(13):i241–i249, 2008.
S. Omidi, F. Schreiber, and A. Masoudi-Nejad. MODA: an efficientalgorithm for network motif discovery in biological networks. GenesGenet Syst, 84(5):385–395, 2009.
Z. Zhao, M. Khan, V.S.A. Kumar, and M.V. Marathe. Subgraphenumeration in large social contact networks using parallel colorcoding and streaming. In Proc. 39th Int’l. Conf. on Parallel Processing(ICPP ’10), pages 594–603, 2010.
Z. Zhao, G. Wang, A.R. Butt, M. Khan, V.S.A. Kumar, and M.V. Marathe. SAHAD: Subgraph analysis in massive networks using Hadoop. In Proc. 26th IEEE Int'l. Parallel and Distributed Processing Symp. (IPDPS), pages 390–401, May 2012.