Chayant Tantipathananandh with Tanya Berger-Wolf: Constant-Factor Approximation Algorithms for Identifying Dynamic Communities


Social Networks

These are snapshots and networks change over time

Dynamic Networks

[Figure: aggregated network of individuals 1 to 5, next to the groups observed at time steps t=1, t=2, …]

• Interactions occur in the form of disjoint groups
• Groups are not communities

Communities

• What is a community?

“Cohesive subgroups are subsets of actors among whom there are relatively strong, direct, intense, frequent, or positive ties.” [Wasserman & Faust 1994]

• Dynamic Community Identification
– GraphScope [Sun et al 2005]
– Metagroups [Berger-Wolf & Saia 2006]
– Dynamic Communities [TBK 2007]
– Clique Percolation [Palla et al 2007]
– FacetNet [Lin et al 2009]
– Bayesian approach [Yang et al 2009]

Ship of Theseus

Jeannot's knife “has had its blade changed fifteen times and its handle fifteen times, but is still the same knife.” [French story]


from Wikipedia

“The ship … was preserved by the Athenians …, for they took away the old planks as they decayed, putting in new and stronger timber in their place, insomuch that this ship became a standing example among the philosophers, for the logical question of things that grow; one side holding that the ship remained the same, and the other contending that it was not the same.” [Plutarch, Theseus]


Ship of Theseus

• Individual parts never change identities
• Cost for changing identity

Ship of Theseus

Identity changes to match the group

Costs for visiting and being absent

Approach

Community = Color

Valid coloring: In each time step, different groups have different colors.
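The validity condition above can be checked mechanically. The following is a minimal sketch, assuming each time step is given as a list of disjoint groups and each group carries one integer color; the function name and the toy input are hypothetical and not taken from the talk.

```python
from typing import FrozenSet, List

# One time step: the disjoint groups observed at that step.
Groups = List[FrozenSet[int]]


def is_valid_group_coloring(groups_per_step: List[Groups],
                            colors_per_step: List[List[int]]) -> bool:
    """Return True if, within every time step, no two groups share a color."""
    if len(groups_per_step) != len(colors_per_step):
        return False  # one color list per time step is required
    for groups, colors in zip(groups_per_step, colors_per_step):
        if len(colors) != len(groups):
            return False  # every group needs exactly one color
        if len(set(colors)) != len(colors):
            return False  # two groups in the same step share a color
    return True


# Hypothetical toy input: 5 individuals, 2 time steps.
steps = [
    [frozenset({1, 2}), frozenset({3, 4, 5})],
    [frozenset({1, 2, 3}), frozenset({4, 5})],
]
valid = is_valid_group_coloring(steps, [[0, 1], [0, 1]])
invalid = is_valid_group_coloring(steps, [[0, 1], [1, 1]])
```

The same color may reappear at different time steps; only reuse within one step is rejected.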

Interpretation

Group color: How does community c interact at time t?

Interpretation

Individual color: Who belongs to community c at time t?


Social Costs: Conservatism

• Switching cost α
• Absence cost β1
• Visiting cost β2

[Figure: coloring example with the switching costs α marked]

Social Costs: Loyalty

• Switching cost α
• Absence cost β1
• Visiting cost β2

[Figure: coloring example with the absence costs β1 marked]

Social Costs: Loyalty

• Switching cost α
• Absence cost β1
• Visiting cost β2

[Figure: coloring example with the visiting costs β2 marked]
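To make the three costs concrete, here is a minimal sketch of how the total cost of one coloring might be tallied. The slides only name the costs, so the charging rules are assumptions: α is charged when an individual's color changes between consecutive time steps, β1 when a group of the individual's color is present but the individual is not in it, and β2 when the individual sits in a group of a different color. The function `total_cost` and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, List


def total_cost(groups: List[List[FrozenSet[int]]],
               group_color: List[List[int]],
               ind_color: List[Dict[int, int]],
               alpha: float, beta1: float, beta2: float) -> float:
    """Sum switching, absence and visiting costs under the assumed rules."""
    cost = 0.0
    for t, step_groups in enumerate(groups):
        # Map each color present at time t to the group wearing it.
        color_to_group = {c: g for g, c in zip(step_groups, group_color[t])}
        for i, c in ind_color[t].items():
            # Switching: the color differs from the previous time step.
            if t > 0 and ind_color[t - 1].get(i, c) != c:
                cost += alpha
            # Absence: community c meets at time t, but i is not there.
            if c in color_to_group and i not in color_to_group[c]:
                cost += beta1
        # Visiting: i is observed in a group whose color is not its own.
        for g, gc in zip(step_groups, group_color[t]):
            for i in g:
                if ind_color[t][i] != gc:
                    cost += beta2
    return cost


groups = [[frozenset({1, 2}), frozenset({3})],
          [frozenset({1}), frozenset({2, 3})]]
group_color = [[0, 1], [0, 1]]
stay = [{1: 0, 2: 0, 3: 1}, {1: 0, 2: 0, 3: 1}]    # individual 2 keeps color 0
switch = [{1: 0, 2: 0, 3: 1}, {1: 0, 2: 1, 3: 1}]  # individual 2 changes color
cost_stay = total_cost(groups, group_color, stay, 1.0, 2.0, 3.0)
cost_switch = total_cost(groups, group_color, switch, 1.0, 2.0, 3.0)
```

In the toy data, keeping individual 2 loyal to color 0 costs one absence plus one visit, while changing its color costs a single switch.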

Problem Complexity

• Minimizing total cost is hard: NP-complete and APX-hard [with Berger-Wolf and Kempe 2007]

• Constant-Factor Approximation [details in paper]

• Easy special case: if no missing individuals and 2α ≤ β2, then simply weighted bipartite matching [details in paper]

Group Graph

Approximation via bipartite matching

– assume all individuals are observed at all time steps
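As an illustration of the matching idea, the sketch below links the groups of two consecutive time steps. The slide does not say how edges are weighted, so the weight used here (the number of individuals two groups share) is an assumption. The exhaustive search is only meant for tiny inputs; a real implementation would use a polynomial-time assignment algorithm. All names are hypothetical.

```python
from itertools import permutations
from typing import FrozenSet, List, Tuple

Group = FrozenSet[int]


def overlap_weights(prev: List[Group], nxt: List[Group]) -> List[List[int]]:
    """weights[a][b] = number of individuals shared by prev[a] and nxt[b]."""
    return [[len(g & h) for h in nxt] for g in prev]


def best_matching(prev: List[Group],
                  nxt: List[Group]) -> Tuple[int, List[Tuple[int, int]]]:
    """Maximum-weight matching between two steps, by exhaustive search."""
    w = overlap_weights(prev, nxt)
    swap = len(prev) > len(nxt)  # always permute over the larger side
    rows, cols = (len(nxt), len(prev)) if swap else (len(prev), len(nxt))
    best_total, best_pairs = -1, []
    for perm in permutations(range(cols), rows):
        # Each pair is (index in prev, index in nxt).
        pairs = [(perm[r], r) if swap else (r, perm[r]) for r in range(rows)]
        total = sum(w[a][b] for a, b in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs


step1 = [frozenset({1, 2}), frozenset({3, 4, 5})]
step2 = [frozenset({1, 2, 3}), frozenset({4, 5})]
match_weight, match_pairs = best_matching(step1, step2)
```

Matched groups would then share a color, so that as few individuals as possible have to switch.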

Greedy Approximation

No visiting or absence and minimizing switching

Greedy Approximation

No visiting or absence and minimizing switching ≈ maximizing path coverage

[Figure: group graph over time with weighted edges]

Improvement by dynamic programming

Greedy alg guarantees an approximation factor of max{2, 2α/β1, 4α/β2}: a constant in α, β1, β2, independent of input size
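The path-coverage view can be sketched as follows. This is a simplified illustration, not the algorithm from the paper: it assumes edges exist only between groups of consecutive time steps, weights them by the number of shared individuals, finds the heaviest path among uncolored groups by dynamic programming over time, gives that path a fresh color, and repeats. All names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Node = Tuple[int, int]  # (time step, index of the group within that step)
Steps = List[List[FrozenSet[int]]]


def heaviest_path(groups: Steps, free: Set[Node]) -> List[Node]:
    """Heaviest path through uncolored groups, by DP over time steps."""
    best: Dict[Node, Tuple[int, List[Node]]] = {}
    for t, step in enumerate(groups):
        for g, members in enumerate(step):
            if (t, g) not in free:
                continue
            cand = (0, [(t, g)])  # path that starts at this group
            if t > 0:
                for h, prev in enumerate(groups[t - 1]):
                    if (t - 1, h) in best:
                        w, path = best[(t - 1, h)]
                        w += len(prev & members)
                        if w > cand[0]:  # extend only across real overlap
                            cand = (w, path + [(t, g)])
            best[(t, g)] = cand
    return max(best.values(), key=lambda wp: wp[0])[1]


def greedy_coloring(groups: Steps) -> Dict[Node, int]:
    """Color one heaviest path at a time until every group has a color."""
    free = {(t, g) for t, step in enumerate(groups) for g in range(len(step))}
    color: Dict[Node, int] = {}
    next_color = 0
    while free:
        for node in heaviest_path(groups, free):
            color[node] = next_color
            free.discard(node)
        next_color += 1
    return color


toy = [[frozenset({1, 2}), frozenset({3, 4, 5})],
       [frozenset({1, 2, 3}), frozenset({4, 5})],
       [frozenset({1, 2}), frozenset({3, 4, 5})]]
colors = greedy_coloring(toy)
```

Because a path visits at most one group per time step, the resulting group coloring is valid by construction.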


Southern Women Data Set [DGG 1941]

• 18 individuals, 14 time steps
• Collected in Natchez, MS, 1935

[Figure: aggregated network]

Ethnography [DGG 1941]

[Figure: ethnography table with the cores marked; note: columns not ordered by time]

Optimal Communities

[Figure: optimal communities (individuals vs. time) compared with the ethnography cores; all costs equal; white circles = unknown]

[Figure: panels "Approximate" and "Optimal" over time, compared with the ethnography cores]

Approximation Power

[Figure: three data sets: 28 inds, 44 times; 29 inds, 82 times; 313 inds, 758 times]

Approximation Power

[Figure: three data sets: 41 inds, 418 times; 264 inds, 425 times; 96 inds, 1577 times]

Conclusions

• Identity of objects that change over time (Ship of Theseus Paradox)
• Formulate an optimization problem
• Greedy approximation
– Fast
– Near-optimal
• Future Work
– Algorithm with guarantee not depending on α, β1, β2
– Network snapshots instead of disjoint groups

Thank You

NSF grant, KDD student travel award

Arun Maiya, Saad Sheikh, Habiba, David Kempe, Jared Saia, Mayank Lahiri, Dan Rubenstein, Tanya Berger-Wolf, Rajmonda Sulo, Robert Grossman, Siva Sundaresan, Ilya Fischoff, Anushka Anand, Chayant

Ravi Kumar, Jasmine Novak, Prabhakar Raghavan, Andrew Tomkins
IBM Almaden Research Center

On the Bursty Evolution of Blogspace

Blogspace

• Blogspace
  • Collection of blogs with their links
• Motivation
– Sociological
  • Different from traditional web pages
– Technical
  • From static snapshots to dynamic graphs

Background

• Web communities (Ravi Kumar, 1999)
  • groups of individuals who share a common interest
  • characterized by dense directed bipartite subgraphs
• Bursty communities of blogs
  • Exhibit striking temporal characteristics
  • Extract the community within a time interval

Time graph

• time graph G = (V, E)
  • each v in V has an associated duration D(v)
  • each e in E is a triple (u, v, t)
  • t is a time in the interval D(u) ∩ D(v)
• prefix of G at time t: Gt = (Vt, Et)
  • Vt = {v in V | D(v) ∩ [0, t] ≠ Ø}
  • Et = {(u, v, t’) in E | t’ ≤ t}
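The definition translates almost directly into code. The sketch below is one possible reading, assuming each duration D(v) is a single closed interval with a non-negative start; the class name `TimeGraph`, its methods and the toy blogs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Interval = Tuple[float, float]  # closed interval [start, end]
Edge = Tuple[str, str, float]   # (u, v, t)


@dataclass
class TimeGraph:
    duration: Dict[str, Interval]  # D(v) for each vertex v
    edges: List[Edge] = field(default_factory=list)

    def add_edge(self, u: str, v: str, t: float) -> None:
        """Add (u, v, t), enforcing that t lies in D(u) ∩ D(v)."""
        (su, eu), (sv, ev) = self.duration[u], self.duration[v]
        if not (max(su, sv) <= t <= min(eu, ev)):
            raise ValueError("edge time must lie in D(u) ∩ D(v)")
        self.edges.append((u, v, t))

    def prefix(self, t: float) -> Tuple[Set[str], List[Edge]]:
        """Return (Vt, Et): vertices alive by time t, edges stamped <= t."""
        vt = {v for v, (start, end) in self.duration.items()
              if start <= t and end >= 0}
        et = [(u, v, s) for (u, v, s) in self.edges if s <= t]
        return vt, et


g = TimeGraph({"a": (0, 10), "b": (2, 5), "c": (6, 9)})
g.add_edge("a", "b", 3)
g.add_edge("a", "c", 7)
vt, et = g.prefix(4)
```

At time 4 the prefix contains blogs a and b and the single edge created at time 3; blog c and its edge appear only in later prefixes.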

Approach

• Two step approach
– Community extraction
  • Extract dense subgraphs (potential communities)
– Burst analysis
  • Analyze each dense subgraph to identify and rank bursts in these communities

Community extraction

• Finding the densest subgraph: NP-hard
• Two steps:
– Pruning
  • Remove vertices of degree no more than one
  • Vertices of degree two are K3
  • Output and remove communities (pass a threshold)
  • Repeat the 3 steps above
– Expanding
  • Determine the vertex containing the most links
  • Add it to the community if the number of links is larger than tk

Burst analysis

• Kleinberg’s method (SIGKDD 2002)
• model the generation of events by an automaton
– one of two states, “low” and “high”; the high state is hypothesized as generating bursts of events
• a cost is associated with any state transition to discourage short bursts
• find a low cost state sequence that is likely to generate the stream
• solves the problem of enumerating all the bursts by order of weight (dynamic programming)
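The two-state idea can be illustrated with a small dynamic program. This is a sketch in the spirit of the slide, not Kleinberg's exact formulation: it assumes the stream is a non-empty list of event counts per time step, scores each state with a Poisson negative log-likelihood (dropping the term that does not depend on the state), charges a penalty `gamma` only for moving from low to high, and assumes the automaton starts in the low state. It returns one cheapest state sequence and does not enumerate bursts by weight. All names and parameter values are hypothetical.

```python
import math
from typing import List


def burst_states(counts: List[int], low_rate: float, high_rate: float,
                 gamma: float) -> List[int]:
    """Cheapest low(0)/high(1) state sequence for a non-empty count stream."""
    rates = (low_rate, high_rate)

    def emit(count: int, state: int) -> float:
        # Poisson negative log-likelihood without the log(count!) term.
        return rates[state] - count * math.log(rates[state])

    def move(a: int, b: int) -> float:
        # Entering the high state is penalized; leaving it is free.
        return gamma if (a, b) == (0, 1) else 0.0

    # cost[s]: cheapest cost of any state sequence that ends in state s.
    cost = [emit(counts[0], 0), emit(counts[0], 1) + gamma]
    back: List[List[int]] = []
    for c in counts[1:]:
        new_cost, choice = [], []
        for s in (0, 1):
            prev = min((0, 1), key=lambda p: cost[p] + move(p, s))
            choice.append(prev)
            new_cost.append(cost[prev] + move(prev, s) + emit(c, s))
        cost = new_cost
        back.append(choice)

    # Trace the cheapest sequence backwards from the better final state.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for choice in reversed(back):
        state = choice[state]
        states.append(state)
    return states[::-1]


burst_demo = burst_states([1, 1, 8, 9, 8, 1, 1],
                          low_rate=1.0, high_rate=8.0, gamma=2.0)
```

On the toy stream, the three steps with high counts are labeled as a burst; raising `gamma` makes the automaton more reluctant to enter the high state for short spikes.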

Tuning the algorithms

• Expansion in community extraction
  • edges must grow to triangles
  • communities of size up to six will only grow vertices that link to all but one vertex
  • communities of size up to nine will only grow vertices that link to all but two vertices
  • communities up to size 20 will grow only vertices that link to 70% of the community
  • larger communities will grow only vertices that link to at least 60% of the community
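The expansion thresholds listed above can be written as a single lookup. This is a minimal sketch; rounding the percentage rules up to the next whole vertex is an assumption, and the function names are hypothetical.

```python
def required_links(size: int) -> int:
    """Minimum number of community members a candidate vertex must link to."""
    if size < 2:
        raise ValueError("a community starts from an edge (two vertices)")
    if size == 2:
        return 2                      # edges must grow to triangles
    if size <= 6:
        return size - 1               # all but one vertex
    if size <= 9:
        return size - 2               # all but two vertices
    if size <= 20:
        return (7 * size + 9) // 10   # 70% of the community, rounded up
    return (6 * size + 9) // 10       # at least 60%, rounded up


def can_join(links_into_community: int, community_size: int) -> bool:
    """True if a candidate with this many links passes the threshold."""
    return links_into_community >= required_links(community_size)
```

For example, a candidate needs links to 4 of 5 members, 7 of 9, 14 of 20, and 18 of 30. Integer arithmetic is used for the percentages to avoid floating-point rounding surprises.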

Results