FASCIA: Fast Approximate Subgraph Counting and Enumeration

George M. Slota, Kamesh Madduri
Scalable Computing Laboratory, The Pennsylvania State University
2 Oct. 2013
Overview
Background
Motivation
Color-coding
Related Work
FASCIA
Results
Future Work
Background: Subgraph Counting
Background: Subgraph Enumeration
Motivation: Why do we want fast algorithms for subgraph counting?

Important to bioinformatics, chemoinformatics, social network analysis, communication network analysis, etc.
Forms the basis of more complex analysis:
  Motif finding
  Graphlet frequency distance (GFD)
  Graphlet degree distributions (GDD)
Counting and enumeration on large networks is very tough: O(n^k) complexity for the naïve algorithm
Motivation: Motif finding, GFD, and GDD

Motif finding: Look for all subgraphs of a certain size
GFD: Numerically compare occurrence frequency to other networks
GDD: Numerically compare embeddings/vertex distribution

[Figure: relative frequency (log scale) of subgraphs 1-11 for the E. coli, S. cerevisiae, H. pylori, and C. elegans PPI networks]
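The GFD comparison above can be illustrated with a small sketch. This is a simplified form of my own choosing (Euclidean distance between relative-frequency vectors); the exact GFD formulation used in the literature may differ, and the function name is hypothetical:

```python
import math

def graphlet_frequency_distance(counts_a, counts_b):
    """Compare two networks by their subgraph occurrence frequencies:
    normalize each count vector to relative frequencies, then take the
    Euclidean distance between the two frequency vectors."""
    def rel(counts):
        total = sum(counts)
        return [c / total for c in counts]
    fa, fb = rel(counts_a), rel(counts_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

Networks with proportionally identical subgraph counts have distance 0, which is why relative (not absolute) frequencies are compared across networks of different sizes.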
Color-coding for Approximate Subgraph Counting

Color-coding [Alon et al., 1995] is a way to approximate the number of tree-like non-induced subgraph embeddings in a network
We randomly color a graph with k colors and count the number of colorful subgraph embeddings
We retrieve an approximate total count by scaling the colorful count by the probability that any given embedding would be colorful
Run multiple times with unique graph colorings to get a final average
Using a dynamic programming approach (to be explained), each iteration runs in O(m · 2^k · e^k)
Color-coding: example

cnt_colorful = 3,  C_total = 3^3,  C_colorful = 3!,  P = 3!/3^3
cnt_estimate = cnt_colorful / P = 13.5
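The scaling step in the example above can be checked numerically. A minimal sketch (function names are mine, not FASCIA's):

```python
from math import factorial

def colorful_probability(k: int) -> float:
    """Probability that a fixed k-vertex embedding receives k distinct
    colors when each vertex is colored uniformly at random from k
    colors: k! / k^k."""
    return factorial(k) / k ** k

def estimate_count(colorful_count: float, k: int) -> float:
    """Scale the observed colorful count by 1/P to estimate the
    total number of embeddings."""
    return colorful_count / colorful_probability(k)

# The slide's example: k = 3 colors, 3 colorful embeddings observed.
# P = 3!/3^3 = 6/27, so the estimate is 3 / (6/27) = 13.5.
print(estimate_count(3, 3))  # 13.5
```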
Color-coding: Algorithmic Overview

1: Partition input template T (k vertices) into subtemplates S_i using single edge cuts.
2: Select N_iter iterations to be performed
3: for i = 1 to N_iter do
4:   Randomly assign to each vertex v in graph G a color between 0 and k − 1.
5:   for all v ∈ G do
6:     Use a dynamic programming scheme to count colorful non-induced occurrences of T rooted at v.
7:   end for
8: end for
9: Take the average of all N_iter counts as the final count.
Color-coding: Template partitioning

[Figure: example of partitioning a template into subtemplates via single edge cuts]
Color-coding: Dynamic Programming Step

1: for all subtemplates S_i created from partitioning T, in the reverse order they were created during partitioning do
2:   for all vertices v ∈ G do
3:     if S_i consists of a single node then
4:       Set table[S_i][v][color of v] := 1
5:     else
6:       S_i consists of active child a_i and passive child p_i
7:       for all colorsets C of unique values mapped to S_i do
8:         Set count := 0
9:         for all u ∈ N(v), where N(v) is the neighborhood of v do
10:          for all possible combinations C_a and C_p created by splitting C and mapping onto a_i and p_i do
11:            count += table[a_i][v][C_a] · table[p_i][u][C_p]
12:          end for
13:        end for
14:        Set table[S_i][v][C] := count
15:      end for
16:    end if
17:  end for
18: end for
19: templateCount := Σ_v Σ_C table[T][v][C]
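The DP above can be sketched for the special case of chain (path) templates, where each subtemplate extension adds one vertex. This avoids the full template-partitioning machinery and uses frozensets in place of FASCIA's combinatorial colorset indexes; it is an illustrative simplification, not the paper's implementation:

```python
from math import factorial
import random

def colorful_path_count(adj, k, colors):
    """Count colorful k-vertex paths (all colors distinct), k >= 2.
    table[v][colorset] = number of colorful i-vertex paths ending at v."""
    n = len(adj)
    # base case: single-vertex paths, as in line 4 of the pseudocode
    table = [{frozenset([colors[v]]): 1} for v in range(n)]
    for _ in range(k - 1):  # grow paths one vertex at a time
        new = [dict() for _ in range(n)]
        for v in range(n):
            for u in adj[v]:
                cu = colors[u]
                for cs, cnt in table[v].items():
                    if cu not in cs:  # extension must stay colorful
                        key = cs | {cu}
                        new[u][key] = new[u].get(key, 0) + cnt
        table = new
    # each undirected path is found once from each endpoint
    return sum(c for t in table for c in t.values()) // 2

def estimate_path_count(adj, k, iters=100, seed=1):
    """Average the scaled colorful counts over iters random colorings."""
    rng = random.Random(seed)
    p = factorial(k) / k ** k  # probability an embedding is colorful
    total = 0.0
    for _ in range(iters):
        colors = [rng.randrange(k) for _ in adj]
        total += colorful_path_count(adj, k, colors) / p
    return total / iters
```

Because the colorful constraint forbids repeated colors, colorful walks are automatically simple paths up to same-colored vertices, which is exactly the relaxation that makes the DP tractable.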
Color-coding: Colorset and count calculation

[Figure: worked example of splitting a colorset C into C_a and C_p and combining child counts]
Related Work: Color-coding, non-induced subgraph enumeration

Alon et al.'s implementation [Alon et al., 2008]
  Motif finding on PPI networks
MODA [Omidi et al., 2009]
  Uses approximation or exact scheme
  Motif finding on small networks
PARSE [Zhao et al., 2010]
  Distributed color-coding algorithm using MPI
  Handles large graphs through partitioning
SAHAD [Zhao et al., 2012]
  Distributed color-coding algorithm using Hadoop (MapReduce)
  Handles vertex-labeled graphs, computes graphlet degree distributions
FASCIA: Overview

FASCIA: Fast Approximate Subgraph Counting and Enumeration
  Count and enumerate subgraphs, supports node labels
  Perform motif finding, calculate GDD
Algorithmic optimizations:
  Combinatorial indexing scheme for color mappings
  Multiple shared-memory parallelization strategies
  Memory reduction via array design, hashing schemes
  Speedup via template partitioning and work avoidance
FASCIA: Combinatorial indexing scheme

Combinatorial number system to represent colorsets: C = (c1 choose 1) + (c2 choose 2) + · · · + (ck choose k), where c1 < c2 < · · · < ck
Precompute indexes and ordering in advance, store in a table (< 2 MB for k = 12)
This avoids explicit handling/passing of colors, and computation of colorset indexes at runtime
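The combinatorial number system formula above can be written directly (FASCIA precomputes these ranks into a lookup table rather than evaluating binomials at runtime; the function name here is mine):

```python
from math import comb

def colorset_index(colorset):
    """Rank of a colorset {c1 < c2 < ... < ck} in the combinatorial
    number system: C = C(c1,1) + C(c2,2) + ... + C(ck,k).
    This maps the C(n,k) k-subsets of {0..n-1} bijectively onto
    0 .. C(n,k)-1, giving a compact dense table index."""
    cs = sorted(colorset)
    return sum(comb(c, i) for i, c in enumerate(cs, start=1))
```

For example, the 2-subsets of {0, 1, 2} rank as {0,1} → 0, {0,2} → 1, {1,2} → 2, so a table indexed by rank wastes no slots.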
FASCIA: Shared memory parallelization

Inner loop parallelization: for all v ∈ G
Outer loop parallelization: for i = 1 to N_iter
Outer loop requires an individual dynamic table for each separate iteration; storage increases linearly with thread count
Possible to do arbitrary combinations, e.g. a 12-thread machine with 2 outer-loop threads, each with 6 inner-loop threads

1: Partition input template T
2: Select N_iter to be performed
3: for i = 1 to N_iter in parallel do
4:   Randomly color G
5:   for all S_i created during partitioning, with children a_i and p_i do
6:     for all v ∈ G in parallel do
7:       ....
8:     end for
9:   end for
10: end for
11: Take the average of all N_iter counts as the final count.
FASCIA: Memory optimizations

We implement both a three-level array and a hash table
Items in the array are stored/retrieved as table[S_i][v][C]
Hash table, for each subtemplate: key = v · N_c + C
If we initialize and resize the hash table to be a factor of N_v · N_c, then a simple hash function (key mod N_v) results in a relatively uniform distribution based on the initial random graph coloring
Generally: the array method is more memory efficient for dense graphs, the hash table more efficient for sparse graphs
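The key-flattening scheme on this slide can be sketched as follows (function names are hypothetical; this only illustrates the arithmetic, not FASCIA's actual hash table):

```python
def table_key(v: int, colorset_idx: int, num_colorsets: int) -> int:
    """Flatten a (vertex, colorset) pair into one integer:
    key = v * N_c + C, where N_c is the number of colorsets
    for the current subtemplate."""
    return v * num_colorsets + colorset_idx

def bucket(key: int, num_vertices: int) -> int:
    """The slide's simple hash: key mod N_v. With the table sized as
    a factor of N_v * N_c, consecutive colorsets of the same vertex
    spread across buckets, so occupancy tracks the (roughly uniform)
    random coloring."""
    return key % num_vertices
```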
FASCIA: Template partitioning and work avoidance

Basic partitioning: try to evenly partition the template
Observation: algorithm runtime is proportional to Σ_i (k choose |S_i|) · (|S_i| choose |a_i|), i.e. Σ_i |C_i| · |C_{a_i}|
This sum can be minimized by a one-at-a-time partitioning approach
On certain templates, this sum can be minimized by exploiting latent symmetry, HOWEVER ...
FASCIA: Template partitioning, work reduction

With a single-node active or passive child we can do ∼ 1/k of the original work: C_a = color(v) or C_p = color(u)
We can avoid all work for a given v if table[a_i][v][] == NULL, for any arbitrary subtemplate
Can also minimize the size of N(v) by checking the initialization for each possible u
Finally, we can order the way in which we process all S_i to minimize memory usage

1: for all S_i created during partitioning, with children a_i and p_i do
2:   for all v ∈ G, where table[a_i][v] != NULL do
3:     for all C possibly mapped to S_i do
4:       for all C_p, C_a from C, where C_a, C_p is fixed if |a_i| = 1 or |p_i| = 1 do
5:         for all u ∈ N(v), where table[p_i][u] != NULL do
6:           count += table[a_i][v][C_a] · table[p_i][u][C_p]
7:         end for
8:       end for
9:       Set table[S_i][v][C] := count
10:    end for
11:  end for
12: end for
Results: Overview

Networks and templates analyzed
Large network runtimes
Approximation error in template counts
Memory savings
Strong scaling
Results: Networks and templates analyzed

Gordon at SDSC, 2× Xeon E5 @ 2.6 GHz (Sandy Bridge), 64 GB DDR3
Database of Interacting Proteins, SNAP, and Virginia Tech NDSSL

Network        n          m           d_avg  d_max
Portland       1,588,212  31,204,286  39.3   275
PA Road Net    1,090,917  1,541,898   2.8    9
Slashdot       82,168     438,643     10.7   2510
Enron          33,696     180,811     10.7   1383
E. coli        2,546      11,520      9.0    178
S. cerevisiae  5,021      22,119      8.8    289
H. pylori      687        1,352       3.9    54
C. elegans     2,391      3,831       3.2    187

Templates: U3-1, U3-2, U5-1, U5-2, U7-1, U7-2, U10-1, U10-2, U12-1, U12-2
Results: Large network runtimes

Parallel (16 cores) results for all unlabeled (left) and labeled (right) templates on the Portland network
8 possible demographic labels (M/F and kid/youth/adult/senior)
∼200 seconds for up to a 12-vertex unlabeled template, less than 1 second for all labeled templates

[Figure: single-iteration execution time (s) for unlabeled templates U3-1 through U12-2 (left) and labeled templates L3-1 through L12-2 (right)]
Results: Motif finding runtimes

Parallel (16 cores) runtimes for a single iteration of all 7-, 10-, and 12-vertex templates on the four biological PPI networks
7 → 11 templates, 10 → 106 templates, 12 → 556 templates

[Figure: single-iteration execution time (s, log scale) vs. motif size for E. coli, S. cerevisiae, H. pylori, and C. elegans]
Results: Approximation error in template counts

Enron (left) with U3-1 and U5-1, and H. pylori (right) across all 11 unique templates of size 7
Error increases with template size and inversely with network size

[Figure: error vs. iterations for U3-1 and U5-1 on Enron (left) and for the size-7 templates on H. pylori (right)]
Results: Memory savings

Memory requirements on the Portland and PA Road networks for the improved array (left) and hash table (right)
Using all UX-1 chain templates
Up to 20%-90% savings versus the naïve method

[Figure: memory usage (GB) vs. template size — Portland network with the improved array (left), PA Road network with the hash table (right), naïve vs. improved vs. labeled/hash-table variants]
Results: Strong scaling

Inner loop for large graphs (for all v ∈ G)
Outer loop for small graphs (for i = 1 to N_iter)
U12-2 template on Portland (left) and Enron (right), 16 threads
About 12× speedup for the inner loop on Portland
About 6× speedup for the outer loop on Enron, 3.5× for the inner loop

[Figure: single-iteration execution time (s) vs. processor cores (1-16) for inner- and outer-loop parallelization on Portland (left) and Enron (right)]
Conclusions and Future (Ongoing) Work

Memory and algorithmic improvements allow FASCIA to achieve considerable speedup relative to previous work
Real-time count estimates for up to 7-vertex templates on large networks
The algorithm provides reasonable error after a low number of iterations
Further optimization for count storage (CSR-like structure)
Distribution via MPI, graph partitioning to allow larger counts
Bibliography
N. Alon, R. Yuster, and U. Zwick. Color-coding. J. ACM, 42(4):844–856,1995.
N. Alon, P. Dao, I. Hajirasouliha, F. Hormozdiari, and S.C. Sahinalp.Biomolecular network motif counting and discovery by color coding.Bioinformatics, 24(13):i241–i249, 2008.
S. Omidi, F. Schreiber, and A. Masoudi-Nejad. MODA: an efficientalgorithm for network motif discovery in biological networks. GenesGenet Syst, 84(5):385–395, 2009.
Z. Zhao, M. Khan, V.S.A. Kumar, and M.V. Marathe. Subgraphenumeration in large social contact networks using parallel colorcoding and streaming. In Proc. 39th Int’l. Conf. on Parallel Processing(ICPP ’10), pages 594–603, 2010.
Z. Zhao, G. Wang, A.R. Butt, M. Khan, V.S.A. Kumar, and M.V. Marathe. SAHAD: Subgraph analysis in massive networks using Hadoop. In Proc. 26th IEEE Int'l. Parallel and Distributed Processing Symp. (IPDPS), pages 390–401, May 2012.