
Overlapping Community Detection Using Bayesian NMF


Overlapping Community Detection Using Bayesian Non-Negative Matrix Factorization

Rajkumar Singh, Rishi Barua

Guide: Dr. Ashish Anand

Dept. of Computer Science, Indian Institute of Technology, Guwahati

April 18, 2013

Network Paradigm

$V \in \mathbb{R}^{N \times N}$ is the adjacency matrix.

$v_{ij}$ can be boolean or denote a connection weight.

$k_i = \sum_j v_{ij}$ is the degree of node $i$.

The data are in relational form.
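
As a small illustration of the notation above, the degree vector can be obtained by summing each row of the adjacency matrix. The 4-node graph and the helper name `degrees` are made up for this sketch and do not come from the talk.

```python
import numpy as np

# Hypothetical 4-node undirected graph (illustrative, not from the slides).
# V[i, j] = 1 if nodes i and j are linked, 0 otherwise.
V = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

def degrees(V):
    """Return k_i = sum_j v_ij for every node i."""
    return V.sum(axis=1)

k = degrees(V)
print(k)  # [2 2 3 1]
```

The same row sum works unchanged for a weighted matrix, where it gives the total connection weight of each node.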


Community Detection

Q. What is a community?

There is no single accepted definition; it is context dependent.

Defn.: Here we define communities as modules, i.e. subgraphs whose nodes have more links to each other than to nodes outside the subgraph.

A given real-world network is assumed to be clustered into a number of latent classes of nodes.

These nodes form regions of increased connectivity in the network.

These communities usually reflect functional modules that affect the overall behavior of the system.

Examples: friend cliques in social networks, similar proteins in a protein interaction network, research groups in a scientific collaboration network.
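
The working definition above (more links inside than outside) can be checked directly on an adjacency matrix. The following is a minimal sketch for an undirected, unweighted graph; the function name `internal_external_links` and the toy graph are assumptions made for illustration.

```python
import numpy as np

def internal_external_links(V, members):
    """Count links inside a node set and links leaving it (symmetric 0/1 matrix V)."""
    inside = np.zeros(V.shape[0], dtype=bool)
    inside[list(members)] = True
    # Each internal link appears twice in a symmetric matrix, hence the // 2.
    internal = V[np.ix_(inside, inside)].sum() // 2
    external = V[np.ix_(inside, ~inside)].sum()
    return internal, external

# Hypothetical graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the link 2-3.
V = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    V[i, j] = V[j, i] = 1

internal, external = internal_external_links(V, {0, 1, 2})
print(internal, external)  # 3 1
```

Here the node set {0, 1, 2} has three internal links and one external link, so it qualifies as a module under the definition.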



Non-negative Matrix Factorization

Decompose the data matrix $V$ into a product of two other matrices $W$, $H$ under non-negativity constraints:

$V \approx \hat{V} = WH$, with $V, \hat{V} \in \mathbb{R}^{F \times N}$, $W \in \mathbb{R}^{F \times K}$ and $H \in \mathbb{R}^{K \times N}$, so that $FK + KN < FN$.

Non-negativity constraints avoid the problem of an ill-posed solution.

They also reflect the idea of a parts-based representation: $V$ can be expressed as an additive combination of certain basis structures defined by the columns $w_{:k}$, given the encodings $h_{k:}$:

$$\hat{V} = \sum_k w_{:k}\, h_{k:} \tag{1}$$
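
Equation (1) says that the product $WH$ is a sum of $K$ rank-one terms, one per basis structure. The sketch below checks this numerically with random non-negative factors; the sizes `F`, `N`, `K` and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 6, 5, 2          # illustrative sizes chosen so that F*K + K*N < F*N

W = rng.random((F, K))     # non-negative basis matrix
H = rng.random((K, N))     # non-negative encoding matrix

# Equation (1): V_hat as a sum of K rank-one terms w_{:k} h_{k:}
V_hat = sum(np.outer(W[:, k], H[k, :]) for k in range(K))

print(np.allclose(V_hat, W @ H))   # True
print(F * K + K * N, "<", F * N)   # 22 < 30
```

The second line of output shows the compression condition: the two factors hold 22 numbers in place of the 30 entries of the full matrix.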

Poisson Model

Given an adjacency matrix $V \in \mathbb{R}^{N \times N}$.

Assume each pairwise interaction $v_{ij}$ is generated by a Poisson distribution with rate $\hat{v}_{ij}$.

Hence $\hat{V} \approx V$.

The expectation network $\hat{V}$ is composed of $W$ and $H$, so that $\hat{V} = WH$.

Inner rank $K$: the unknown number of communities.

$w_{ik}, h_{kj}$ with $k \in \{1, \dots, K\}$: the contribution of each latent community $k$ to $\hat{v}_{ij} = \sum_k w_{ik} h_{kj}$.
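
Under the Poisson assumption, each entry contributes $\hat{v}_{ij} - v_{ij}\log\hat{v}_{ij} + \log v_{ij}!$ to the negative log-likelihood. The sketch below evaluates that sum for given factors; the function name `poisson_nll` and the small `eps` guard are assumptions added for this illustration, not part of the talk.

```python
import numpy as np
from math import lgamma

def poisson_nll(V, W, H, eps=1e-12):
    """Negative log-likelihood of V under v_ij ~ Poisson(v_hat_ij), with V_hat = W H."""
    V = np.asarray(V, dtype=float)
    V_hat = W @ H + eps                        # eps guards against log(0)
    log_fact = np.vectorize(lgamma)(V + 1.0)   # log(v_ij!) via the log-gamma function
    return float(np.sum(V_hat - V * np.log(V_hat) + log_fact))

# One-entry example: v = 2 observed, rate v_hat = 1 * 3 = 3.
V = np.array([[2.0]])
W = np.array([[1.0]])
H = np.array([[3.0]])
print(poisson_nll(V, W, H))
```

A lower value means the expectation network $WH$ explains the observed interactions better.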

Cost Function

Joint distribution, to which the posterior $p(W, H, \beta \mid V)$ is proportional:

$$p(V, W, H, \beta) = p(V \mid W, H)\, p(W \mid \beta)\, p(H \mid \beta)\, p(\beta) \tag{2}$$

Taking the $-\log$ of the right-hand side of (2), we define the cost $U$:

$$U = \underbrace{-\log p(V \mid W, H)}_{\text{likelihood}} \; \underbrace{-\log p(W \mid \beta) - \log p(H \mid \beta) - \log p(\beta)}_{\text{priors}} \tag{3}$$

$U$ is our minimization objective under non-negativity constraints.
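
This slide does not name the priors. One choice that reproduces the prior terms of the expanded cost on the "Parameter inference" slide is a half-normal prior with precision $\beta_k$ on each $w_{ik}$ and $h_{kj}$, and a Gamma prior with hyperparameters $a_k$, $b_k$ on each $\beta_k$. This is stated here as an assumption and worked through as a sketch:

```latex
% Assumed half-normal prior on the factor entries (same form for h_{kj}):
p(w_{ik} \mid \beta_k) = \sqrt{\frac{2\beta_k}{\pi}}\,
    \exp\!\left(-\tfrac{1}{2}\beta_k w_{ik}^2\right), \qquad w_{ik} \ge 0
\;\Longrightarrow\;
-\log p(w_{ik} \mid \beta_k) = \tfrac{1}{2}\beta_k w_{ik}^2 - \tfrac{1}{2}\log\beta_k + \text{const}

% Assumed Gamma prior on the precisions:
p(\beta_k \mid a_k, b_k) = \frac{b_k^{a_k}}{\Gamma(a_k)}\,\beta_k^{a_k - 1} e^{-b_k \beta_k}
\;\Longrightarrow\;
-\log p(\beta_k \mid a_k, b_k) = \beta_k b_k - (a_k - 1)\log\beta_k + \text{const}

% Summing over i = 1..N (entries of W) and j = 1..N (entries of H) for each k:
-\log p(W \mid \beta) - \log p(H \mid \beta)
  = \tfrac{1}{2}\sum_k \Big[ \sum_i \beta_k w_{ik}^2 + \sum_j \beta_k h_{kj}^2
      - 2N \log\beta_k \Big] + \text{const}
```

Because every entry in column $k$ of $W$ and row $k$ of $H$ shares the same precision $\beta_k$, a large $\beta_k$ shrinks the whole component toward zero, which is what lets irrelevant communities drop out.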


Parameter inference

$$U = \sum_i \sum_j \left[ v_{ij} \log\!\left(\frac{v_{ij}}{\hat{v}_{ij}}\right) + \hat{v}_{ij} \right] + \frac{1}{2} \sum_k \left[ \Big(\sum_i \beta_k w_{ik}^2\Big) + \Big(\sum_j \beta_k h_{kj}^2\Big) - 2N \log \beta_k \right] + \sum_k \big( \beta_k b_k - (a_k - 1) \log \beta_k \big) + \kappa$$

where $\kappa$ is a constant.

We find $W^*$, $H^*$, $\beta^*$ that minimize $U$ using the gradient descent algorithm.

The effective number of communities $K$ is the number of non-zero columns of $W^*$ and rows of $H^*$.
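
To monitor the minimization, the cost can be evaluated term by term. The sketch below computes $U$ without its additive constant and counts the surviving components; the function names, the `eps` and `tol` guards, and the convention that $v \log(v/\hat{v}) = 0$ when $v = 0$ are assumptions made for this illustration, and the code covers only the evaluation, not the optimizer itself.

```python
import numpy as np

def cost_U(V, W, H, beta, a, b, eps=1e-12):
    """Evaluate U up to its additive constant.

    V: (N, N) non-negative, W: (N, K), H: (K, N), beta: (K,),
    a, b: Gamma hyperparameters (scalars or arrays of shape (K,)).
    """
    N = V.shape[0]
    V_hat = W @ H + eps
    # Likelihood term; v * log(v / v_hat) is taken to be 0 where v == 0.
    ratio = np.where(V > 0, V / V_hat, 1.0)
    likelihood = np.sum(V * np.log(ratio) + V_hat)
    # Prior terms on W and H: column norms of W and row norms of H share beta_k.
    sq_norms = (W ** 2).sum(axis=0) + (H ** 2).sum(axis=1)
    prior_wh = 0.5 * np.sum(beta * sq_norms - 2 * N * np.log(beta))
    # Prior term on beta.
    prior_beta = np.sum(beta * b - (a - 1) * np.log(beta))
    return float(likelihood + prior_wh + prior_beta)

def effective_communities(W, H, tol=1e-8):
    """Count components whose column of W and row of H are not numerically zero."""
    active = (W.sum(axis=0) > tol) & (H.sum(axis=1) > tol)
    return int(active.sum())

# One-node, one-community check: likelihood 2, prior on W/H 2.5, prior on beta 1.
U = cost_U(np.array([[2.0]]), np.array([[1.0]]), np.array([[2.0]]),
           beta=np.array([1.0]), a=1.0, b=1.0)
print(round(U, 6))  # 5.5
```

In practice an exact zero is rarely reached numerically, which is why the count uses a small tolerance instead of testing for equality with zero.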


Results

$W^*$, $H^*$ describe a bipartite network of node allocations to communities.

If our original adjacency matrix $V$ is symmetric, then $W^* = (H^*)^T$.

Each $w_{ik}$ or $h_{ki}$ denotes the participation strength of node $i$ in community $k$.

The $i$-th row of $W$ or column of $H$ describes a soft-membership distribution of node $i$ across communities.

Varying node participation scores allow us to describe overlaps between communities in a disciplined manner.
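
One way to read soft memberships off $W$ is to normalize each row so that it sums to one, then flag nodes with substantial weight in more than one community. This is a sketch only: the matrix values, the function names and the 0.3 threshold are arbitrary assumptions, not results or settings from the talk.

```python
import numpy as np

def soft_membership(W, eps=1e-12):
    """Normalize each row of W so node i's participation scores sum to 1."""
    return W / (W.sum(axis=1, keepdims=True) + eps)

def overlapping_nodes(W, threshold=0.3):
    """Indices of nodes whose membership exceeds `threshold` in two or more communities."""
    P = soft_membership(W)
    return np.flatnonzero((P > threshold).sum(axis=1) >= 2)

# Hypothetical W* for 4 nodes and 2 communities (illustrative values only).
W = np.array([
    [2.0, 0.0],
    [1.5, 0.1],
    [1.0, 1.0],
    [0.0, 3.0],
])
print(soft_membership(W).round(2))
print(overlapping_nodes(W))  # [2]
```

Node 2 participates equally in both communities and is therefore reported as an overlap, while the other nodes belong almost entirely to one community.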

Example: 1

[Figure slides for Example 1 (10 / 15 to 12 / 15); the images are not available in the extracted text.]

Example: 2

We now take a 1000 × 1000 adjacency matrix $V$, which is very sparse.

The graph is a non-community graph, i.e. it has no community structure.

Several of the nodes are isolated.

Solutions from EO and Louvain (as described in the paper) offer higher modularity than NMF.

The NMF solution has low modularity: the nodes show no preference for any particular community, i.e. they are non-communal.

NMF does not suffer from the resolution limit of modularity (which merges smaller communities), as can be clearly seen.

The output shows 1 community, with 428 isolated (i.e. non-communal) nodes.
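
The comparison above is stated in terms of modularity. For reference, the standard Newman modularity of a hard partition of an undirected graph can be computed as below. This is a generic sketch, not the evaluation code behind the reported results, and applying it to an NMF solution assumes each node is first assigned to a single community, for example the one with its largest participation score.

```python
import numpy as np

def modularity(V, labels):
    """Newman modularity Q of a hard partition of an undirected graph."""
    V = np.asarray(V, dtype=float)
    labels = np.asarray(labels)
    k = V.sum(axis=1)                  # node degrees
    two_m = k.sum()                    # twice the total link weight
    if two_m == 0:
        return 0.0                     # no links: define Q as 0
    same = labels[:, None] == labels[None, :]
    return float(((V - np.outer(k, k) / two_m) * same).sum() / two_m)

# Hypothetical graph: two disconnected triangles.
V = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    V[i, j] = V[j, i] = 1

print(modularity(V, [0, 0, 0, 1, 1, 1]))  # 0.5
print(modularity(V, [0, 0, 0, 0, 0, 0]))  # 0.0
```

Placing every node in one community always gives a modularity of zero, which is the baseline against which a low-modularity solution should be read.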


References

1 Overlapping community detection using Bayesian non-negative matrix factorization - I. Psorakis, S. Roberts and M. Ebden

2 Signal Processing with Adaptive Sparse Structured Representations - V. Tan and C. Fevotte

3 D. D. Lee and H. S. Seung, Nature, 401