Consistent estimation of Mixed Memberships with Successive Projections

Maxim Panov
joint work with E. Marshakov, R. Ushakov and N. Mokrov

Skoltech and IITP

15.05.2018

Community detection: Problem statement

Graph $G(E, V)$ with nodes $v_j$ and edges $A_{ij}$.

Problem: we want to partition the graph in such a way that there are few edges between groups.

Maxim Panov (Skoltech) Overlapping community detection 15.05.2018 2 / 31

Community detection: Overlapping communities

Non-overlapping vs. overlapping communities


Graph models: Erdos-Renyi graph

Simplest possible random graph model:

$A_{ij} \sim \mathrm{Bernoulli}(p)$,

where $A_{ij}$ are independent and $p \in [0, 1]$.

Figure: Erdos-Renyi graph with p = 0.5.


Graph models: Generalized Erdos-Renyi graph

Simple generalization of the Erdos-Renyi model:

$A_{ij} \sim \mathrm{Bernoulli}(p_{ij})$,

where $p_{ij} \in [0, 1]$.

In matrix form we can write

$A \sim \mathrm{Bernoulli}(P)$,

where $P = \{p_{ij}\}_{i,j=1}^{n}$.
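The sampling rule $A \sim \mathrm{Bernoulli}(P)$ can be sketched in a few lines of NumPy. The helper name `sample_adjacency` is hypothetical, and the sketch assumes an undirected graph without self-loops, which the slides do not state explicitly.

```python
import numpy as np

def sample_adjacency(P, rng):
    """Draw a symmetric 0/1 adjacency matrix with A_ij ~ Bernoulli(p_ij) for i < j."""
    n = P.shape[0]
    # Independent coin flips, kept only strictly above the diagonal.
    upper = np.triu(rng.random((n, n)) < P, k=1).astype(int)
    # Mirror to obtain an undirected graph without self-loops (assumption).
    return upper + upper.T

rng = np.random.default_rng(0)
n = 200
P = np.full((n, n), 0.5)          # classical Erdos-Renyi case: p_ij = p = 0.5
A = sample_adjacency(P, rng)
density = A.sum() / (n * (n - 1))  # fraction of node pairs joined by an edge
```

Passing any other matrix `P` with entries in [0, 1] gives the generalized model; the classical Erdos-Renyi graph is the constant-matrix special case.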

Question: what types of matrix P allow for community structure?


Graph models: Stochastic block model (SBM)

Figure: Example of stochastic block model and corresponding graph.


Graph models: Mixed membership stochastic block model (MMSB)

Graph edges are generated according to the generalized Erdos-Renyi model:

$A \sim \mathrm{Bernoulli}(P)$.

The probability matrix $P$ can be factorized as

$P = \Theta B \Theta^{\mathsf T}$,

where

$B \in [0, 1]^{K \times K}$ is a symmetric matrix of community-community probabilities;

$\Theta \in [0, 1]^{n \times K}$ is a community membership matrix.

Condition

We assume that

1 Every row of matrix $\Theta$ sums to 1: $\sum_{k=1}^{K} \theta_{ik} = 1$, $i = 1, \dots, n$;

2 (optional) All the community membership vectors are independent draws from the Dirichlet distribution, i.e. $\theta_i \sim \mathrm{Dirichlet}(\alpha)$ for some $\alpha \in \mathbb{R}_+^K$, $i = 1, \dots, n$.
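A minimal sketch of sampling from this model, assuming the Dirichlet option for the membership vectors and an undirected graph without self-loops. The function name `sample_mmsb` is hypothetical; the values of $B$ and $\alpha$ are the default settings listed in the experiments part of this deck, and $n = 500$ is chosen only to keep the example small.

```python
import numpy as np

def sample_mmsb(n, B, alpha, rng):
    """Sample (A, Theta, P) from an MMSB with Dirichlet(alpha) membership vectors."""
    Theta = rng.dirichlet(alpha, size=n)   # each row is nonnegative and sums to 1
    P = Theta @ B @ Theta.T                # edge probabilities p_ij
    upper = np.triu(rng.random((n, n)) < P, k=1).astype(int)
    A = upper + upper.T                    # undirected, no self-loops (assumption)
    return A, Theta, P

rng = np.random.default_rng(0)
B = np.diag([0.3, 0.5, 0.7])
alpha = np.full(3, 1 / 3)
A, Theta, P = sample_mmsb(500, B, alpha, rng)
```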


Graph models: MMSB examples

As discussed, in the MMSB model the probability matrix is

$P = \Theta B \Theta^{\mathsf T}$.

It means that

$p_{ij} = \sum_{k,l=1}^{K} \theta_{ik} \theta_{jl} b_{kl}$.

SBM is a particular case of MMSB with the property that for any $i \in \{1, \dots, n\}$ there exists $k \in \{1, \dots, K\}$ such that

$\theta_{ik} = 1$ and $\theta_{il} = 0$, $l \neq k$,

leading to

$p_{ij} = b_{kl}$

for any $i, j = 1, \dots, n$ and some $k = k(i)$, $l = l(j)$.
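The SBM special case is easy to check numerically. In the sketch below the matrix `B` and the label vector `z` (playing the role of $k(i)$) are made-up illustrative values.

```python
import numpy as np

# Hypothetical community-community matrix and hard labels k(i) for 6 nodes.
B = np.array([[0.70, 0.10, 0.05],
              [0.10, 0.50, 0.20],
              [0.05, 0.20, 0.30]])
z = np.array([0, 0, 1, 2, 2, 1])

Theta = np.eye(3)[z]           # one-hot rows: theta_ik = 1 iff k = k(i)
P = Theta @ B @ Theta.T        # MMSB formula

# With one-hot memberships the double sum collapses to a lookup in B.
P_lookup = B[np.ix_(z, z)]     # P_lookup[i, j] = b_{k(i), k(j)}
```

Here `P` and `P_lookup` coincide, which is exactly the statement $p_{ij} = b_{k(i) l(j)}$.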

Graph models: Identifiability of MMSB

Problem: If our goal is estimation of the parameters $\Theta$ and $B$, are the true values unique?

Answer: Of course not. For example, if $M_2 = M_1 M_1^{\mathsf T}$, then

$P^{(1)} = M_1 I_3 M_1^{\mathsf T} = I_3 M_2 I_3 = P^{(2)}$,

where $I_3$ is the identity matrix of size 3.
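A concrete instance can be checked numerically. The matrix `M1` below is an illustrative choice, not necessarily the one used on the original slide; any matrix with nonnegative rows summing to 1 works in the same way.

```python
import numpy as np

# Hypothetical membership matrix: every node is split between two communities.
M1 = np.array([[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5]])
I3 = np.eye(3)
M2 = M1 @ M1.T                 # used as a community-community matrix below

P1 = M1 @ I3 @ M1.T            # parametrization 1: Theta = M1, B = I3
P2 = I3 @ M2 @ I3.T            # parametrization 2: Theta = I3, B = M2
```

Both parametrizations are valid MMSB parameters (rows of $\Theta$ sum to 1, $B$ is symmetric with entries in $[0, 1]$) and produce the same probability matrix, so $P$ alone cannot distinguish them.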


Graph models: Identifiability of MMSB

Condition (Identifiability)

1 There is at least one “pure” node in each community, i.e. for each $k = 1, \dots, K$ there exists $i$ such that $\theta_{ik} = \sum_{l=1}^{K} \theta_{il} = 1$.

2 Matrix $B \in [0, 1]^{K \times K}$ is full rank.

3 Every row of matrix $\Theta$ sums to 1: $\sum_{k=1}^{K} \theta_{ik} = 1$, $i = 1, \dots, n$.

Theorem

If the Condition (Identifiability) is satisfied then the MMSB is identifiable, i.e. for every $P = \Theta B \Theta^{\mathsf T}$ the matrices $\Theta$ and $B$ are uniquely defined up to a permutation of communities (columns of matrix $\Theta$ and rows and columns of matrix $B$).
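The three parts of the condition are straightforward to test for given parameters. The helper `check_identifiability` and its tolerance `tol` are hypothetical names introduced for this sketch.

```python
import numpy as np

def check_identifiability(Theta, B, tol=1e-9):
    """Return True if (Theta, B) satisfy the three identifiability conditions."""
    K = B.shape[0]
    # Condition 3: every row of Theta sums to 1.
    rows_sum_to_one = bool(np.allclose(Theta.sum(axis=1), 1.0, atol=tol))
    # Condition 1: each column k contains an entry equal to 1 (a pure node).
    has_pure_nodes = bool(np.all((np.abs(Theta - 1.0) < tol).any(axis=0)))
    # Condition 2: B has full rank.
    full_rank = bool(np.linalg.matrix_rank(B) == K)
    return rows_sum_to_one and has_pure_nodes and full_rank

B = np.diag([0.3, 0.5, 0.7])
Theta_pure = np.vstack([np.eye(3), [[0.2, 0.3, 0.5]]])   # one pure node per community
Theta_mixed = np.array([[0.5, 0.5, 0.0],
                        [0.0, 0.5, 0.5],
                        [0.5, 0.0, 0.5]])               # no pure nodes

ok_pure = check_identifiability(Theta_pure, B)
ok_mixed = check_identifiability(Theta_mixed, B)
ok_singular = check_identifiability(Theta_pure, np.full((3, 3), 0.5))
```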


Algorithms for parameter estimation in MMSB

There exist several algorithms for parameter estimation in MMSB:

stochastic variational inference (Airoldi et al., 2009; SVI);

tensor spectral method (Anandkumar et al., 2013; Tensor);

geometrical nonnegative matrix factorization (Mao et al., 2013; GeoNMF).

Problems of these methods:

absence of provable guarantees (SVI);

high computational complexity (SVI, Tensor);

applicability only to a limited subclass of MMSB (GeoNMF).

Recently, a couple of algorithms were proposed (SPACL by Mao et al. and Mixed-SCORE by Jin et al.), which are based on ideas very similar to ours.


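Successive projection overlapping clustering (SPOC): Spectral properties of probability matrix

To account for sparsity:

$P = \rho \Theta B \Theta^{\mathsf T}$,

where $\rho > 0$ is a sparsity parameter and we restrict $\max_{k,l} B_{kl} = 1$.

Spectral decomposition of probability matrix (exact):

$P = U L U^{\mathsf T}$,

where $L \in \mathbb{R}^{K \times K}$ is the diagonal matrix of the $K$ nonzero eigenvalues of $P$ and $U \in \mathbb{R}^{n \times K}$ is the matrix of corresponding eigenvectors.

We can conclude that

$U = \Theta F$,

where $F \in \mathbb{R}^{K \times K}$ is some full rank matrix.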
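The relation $U = \Theta F$ can be verified numerically on a noise-free probability matrix. The sketch below assumes that the first $K$ nodes are pure (so that $F$ can be read off from the corresponding rows of $U$) and absorbs $\rho$ into $B$; all sizes and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 300, 3
B = np.diag([0.3, 0.5, 0.7])
Theta = rng.dirichlet(np.ones(K), size=n)
Theta[:K] = np.eye(K)                 # assumption: nodes 0..K-1 are pure
P = Theta @ B @ Theta.T               # rank-K probability matrix

eigvals, eigvecs = np.linalg.eigh(P)
top = np.argsort(np.abs(eigvals))[::-1][:K]   # the K nonzero eigenvalues
L = np.diag(eigvals[top])
U = eigvecs[:, top]

# At pure nodes Theta[:K] = I, hence U[:K] = I @ F = F.
F = U[:K]
residual = np.linalg.norm(U - Theta @ F)      # should be at rounding level
```

Since every row of $\Theta$ is a probability vector, each row of $U$ is a convex combination of the rows of $F$.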


Successive projection overlapping clustering (SPOC): Spectral properties of probability matrix

We can proceed with the decomposition

$U = \Theta F$.

Importantly, rows $u_i$ of matrix $U$ lie in a simplex: each row $u_i = \sum_{k=1}^{K} \theta_{ik} f_k$ is a convex combination of the rows $f_k$ of matrix $F$.

Figure: scatter plot of the rows $u_i$ of matrix $U$.


Successive projection overlapping clustering (SPOC): Successive projection algorithm

Question: How to detect the simplex?

Answer: Successive projection algorithm (Araujo et al., 2001; Gillis and Vavasis, 2014):

1 Find the point with the maximal norm: $j^* = \arg\max_j \|u_j\|$.

2 $f_j = u_{j^*}$.

3 $U = U \left( I - \frac{f_j^{\mathsf T} f_j}{\|f_j\|^2} \right)$.

4 Iterate.

The final output is the matrix $F = (f_j)_{j=1}^{K}$.
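The steps above can be sketched as follows. This version returns the indices of the selected rows rather than the projected vectors, which matches how SPA is used in Algorithm 1 (the vertices are then read from the original matrix). The function name `spa` and the test vertices `F_true` are hypothetical.

```python
import numpy as np

def spa(U, K):
    """Successive projection algorithm on the rows of U; returns K row indices."""
    R = np.array(U, dtype=float)      # working copy that gets projected
    indices = []
    for _ in range(K):
        j = int(np.argmax(np.linalg.norm(R, axis=1)))   # step 1: maximal norm
        f = R[j].copy()                                 # step 2
        # Step 3: R <- R (I - f^T f / ||f||^2), written without forming the matrix.
        R = R - np.outer(R @ f, f) / (f @ f)
        indices.append(j)
    return indices

# Noise-free check: rows of U are convex combinations of the rows of F_true.
rng = np.random.default_rng(0)
F_true = np.array([[2.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0],
                   [1.0, 1.0, 0.0]])  # hypothetical full rank vertex matrix
Theta = rng.dirichlet(np.ones(3), size=100)
Theta[:3] = np.eye(3)                 # nodes 0, 1, 2 are the vertices
U = Theta @ F_true
J = spa(U, 3)
```

In this noise-free setting the returned indices are exactly the three vertex rows, in the order in which SPA finds them.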


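Successive projection overlapping clustering (SPOC): Spectral properties of adjacency matrix

Spectral decomposition of adjacency matrix (approximate):

$A \simeq \hat{U} \hat{L} \hat{U}^{\mathsf T}$,

where $\hat{L} \in \mathbb{R}^{K \times K}$ is the diagonal matrix of top-$K$ eigenvalues and $\hat{U} \in \mathbb{R}^{n \times K}$ is the matrix of corresponding eigenvectors.

Similarly,

$\hat{U} = \Theta F + N$,

where $F \in \mathbb{R}^{K \times K}$ is some full rank matrix and $N$ is a noise term.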


Successive projection overlapping clustering (SPOC): Spectral properties of adjacency matrix

Importantly, rows $\hat{u}_i$ of matrix $\hat{U}$ approximately lie in a simplex.

Figure: scatter plot of the rows $\hat{u}_i$ of matrix $\hat{U}$.

So, we can compute an estimate $\hat{F}$ of matrix $F$ by the SPA algorithm.


Successive projection overlapping clustering (SPOC): Resulting estimates

Estimate of the community-community matrix:

$\hat{B} = \hat{F} \hat{L} \hat{F}^{\mathsf T}$.

Estimate of the community membership matrix:

$\hat{\Theta} = \hat{U} \hat{F}^{-1}$.

Question: What about the efficiency of estimates?


Successive projection overlapping clustering (SPOC)

Algorithm 1 SPOC

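Require: Adjacency matrix $A$ and number of communities $K$.
Ensure: Estimated $\hat{\rho}$, $\hat{\Theta}$, $\hat{B}$.

1: Get the rank-$K$ eigenvalue decomposition $A \simeq \hat{U} \hat{L} \hat{U}^{\mathsf T}$.
2: Run the SPA algorithm with input $\hat{U}$, which outputs a set of indices $J$ of cardinality $K$.
3: $\hat{F} = \hat{U}[J, :]$.
4: $\hat{B} = \hat{F} \hat{L} \hat{F}^{\mathsf T}$.
5: $\hat{\rho} = \max_{ij} \hat{B}_{ij}$.
6: $\hat{B} = \frac{1}{\hat{\rho}} \hat{B}$.
7: $\hat{\Theta} = \hat{U} \hat{F}^{-1}$.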
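A compact NumPy sketch of Algorithm 1 is given below. It is an illustration under stated assumptions rather than the authors' implementation: the top-$K$ eigenvalues are taken by absolute value, the function names `spa` and `spoc` are hypothetical, and the sanity check feeds the noise-free matrix $P$ instead of a sampled adjacency matrix so that the recovery is exact up to a permutation of communities.

```python
import numpy as np

def spa(U, K):
    """Successive projection algorithm on the rows of U; returns K row indices."""
    R = np.array(U, dtype=float)
    indices = []
    for _ in range(K):
        j = int(np.argmax(np.linalg.norm(R, axis=1)))
        f = R[j].copy()
        R = R - np.outer(R @ f, f) / (f @ f)   # project out the direction f
        indices.append(j)
    return indices

def spoc(A, K):
    """Return (rho_hat, Theta_hat, B_hat) following steps 1-7 of Algorithm 1."""
    eigvals, eigvecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(eigvals))[::-1][:K]   # assumption: top-K by magnitude
    L = np.diag(eigvals[top])
    U = eigvecs[:, top]                           # step 1
    J = spa(U, K)                                 # step 2
    F = U[J, :]                                   # step 3
    B = F @ L @ F.T                               # step 4
    rho = B.max()                                 # step 5
    B = B / rho                                   # step 6
    Theta = U @ np.linalg.inv(F)                  # step 7
    return rho, Theta, B

# Sanity check on a noise-free probability matrix with one pure node per community.
rng = np.random.default_rng(0)
n, K = 300, 3
rho_true = 0.7
B_true = np.diag([0.3, 0.5, 0.7]) / rho_true      # normalized: max entry is 1
Theta_true = rng.dirichlet(np.ones(K), size=n)
Theta_true[:K] = np.eye(K)
P = rho_true * Theta_true @ B_true @ Theta_true.T
rho_hat, Theta_hat, B_hat = spoc(P, K)
```

On a sampled adjacency matrix the same call returns approximate estimates; note that the rows of `Theta_hat` are then not guaranteed to be nonnegative or to sum exactly to 1.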


Provable efficiency: Davis-Kahan theorem

Lemma (Variant of Davis-Kahan)

Assume that $P \in \mathbb{R}^{n \times n}$ is a rank $K$ symmetric matrix with smallest nonzero singular value $\lambda_K(P)$.

Let $A$ be any symmetric matrix and $\hat{U}, U \in \mathbb{R}^{n \times K}$ be the $K$ leading eigenvectors of $A$ and $P$, respectively.

Then there exists a $K \times K$ orthogonal matrix $O_P$ such that

$\|\hat{U} - U O_P\|_F \le \frac{2 \sqrt{2K} \, \|A - P\|}{\lambda_K(P)}$.


Provable efficiency: Concentration in spectral norm

Lemma (Lei and Rinaldo, 2015)

Let $A$ be the adjacency matrix of a random graph on $n$ nodes in which edges occur independently.

Set $\mathbb{E}[A] = P = (p_{ij})_{i,j=1,\dots,n}$ and assume that $n \max_{ij} p_{ij} \le d$ for $d \ge c_0 \log n$ and $c_0 > 0$.

Then, for any $r > 0$ there exists a constant $C = C(r, c_0)$ such that

$\|A - P\| \le C \sqrt{d}$

with probability at least $1 - n^{-r}$.
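The scaling in this lemma can be observed in a small simulation. The sketch below uses an Erdos-Renyi graph with illustrative values $n = 500$ and $p = 0.1$ and compares $\|A - P\|$ with $\sqrt{d}$ for $d = n \max_{ij} p_{ij}$; it illustrates the order of magnitude only and does not estimate the constant $C$ of the lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 0.1
P = np.full((n, n), p)
np.fill_diagonal(P, 0.0)                  # no self-loops, so that E[A] = P
upper = np.triu(rng.random((n, n)) < P, k=1).astype(float)
A = upper + upper.T                       # independent edges, symmetric

d = n * P.max()                           # n * max_ij p_ij
spectral_norm = np.linalg.norm(A - P, 2)  # largest singular value of A - P
ratio = spectral_norm / np.sqrt(d)        # stays of order one
```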


Provable efficiency: Quality of SPA

Theorem (Gillis and Vavasis, 2014)

Let $G = FW$ and $\hat{G} = G + N$. Suppose that $K \ge 2$ and the Condition 2 is satisfied. If each column $n_i$ of matrix $N$ satisfies $\|n_i\|_F \le \varepsilon$ with

$\varepsilon \le \frac{\lambda_{\min}(F)}{1225 \sqrt{r}}$,

then the SPA algorithm with the input $(\hat{G}, r)$ returns the set of indices $J$ such that there exists a permutation $\pi$ which gives

$\|\hat{g}_{J(j)} - f_{\pi(j)}\|_2 \le (432 \kappa(F) + 4) \varepsilon$

for all $j = 1, \dots, r$, where $\hat{g}_k$ and $f_k$ are the columns of matrices $\hat{G}$ and $F$ correspondingly. Here $\kappa(F) = \frac{\lambda_{\max}(F)}{\lambda_{\min}(F)}$ is the condition number of the matrix $F$.


Provable efficiency: Beyond Davis-Kahan

Lemma (Panov et al., 2017)

Assume that $P \in \mathbb{R}^{n \times n}$ is a rank $K$ symmetric matrix with smallest nonzero singular value $\lambda_K(P)$.

Let $A$ be any symmetric matrix such that $\|A - P\| \le \frac{1}{2} \lambda_K(P)$ and $\hat{U}, U$ are the $n \times K$ matrices of eigenvectors for matrices $A$ and $P$ corresponding to top-$K$ eigenvalues.

Then

$\|e_i^{\mathsf T} (\hat{U} - U O_P)\|_F \le 23 K^{1/2} \kappa(P) \frac{\|e_i^{\mathsf T} A\|_F \cdot \|A - P\|}{\lambda_K^2(P)} + \frac{\|e_i^{\mathsf T} (A - P) U\|_F}{\lambda_K(P)}$,

where $e_i$ is a vector of length $n$ with 1 in the $i$-th position and $O_P$ is some orthogonal matrix.


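Provable efficiency: Final theorem

Theorem (Panov et al., 2017)

There exist constants $c$ and $C$ depending only on the condition numbers of the matrices $B$ and $\Theta$ and parameter $r$ such that for $\rho \ge c \frac{\log n}{n}$ it holds with probability at least $1 - n^{-r}$ that

$\frac{\|\hat{\rho} \hat{B} - \rho \Pi B \Pi^{\mathsf T}\|_F}{\|\rho B\|_F} \le C K \sqrt{\frac{\log n}{\rho n}}$

and

$\frac{\|\hat{\Theta} - \Theta \Pi^{\mathsf T}\|_F}{\|\Theta\|_F} \le C K \sqrt{\frac{\log n}{\rho n}}$,

where $\Pi$ is some permutation matrix and $\hat{\rho}$ is the maximal value in matrix $\hat{B}$.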
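The two relative errors in the theorem are defined up to an unknown permutation $\Pi$. For small $K$ they can be evaluated by brute force over all permutations, as in the sketch below. The function name `aligned_errors` is hypothetical, and selecting the permutation that minimizes the $\Theta$ error is a choice made for this illustration.

```python
import numpy as np
from itertools import permutations

def aligned_errors(Theta_hat, B_hat, Theta, B):
    """Relative Frobenius errors of (Theta_hat, B_hat), minimized over permutations."""
    K = B.shape[0]
    best = None
    for perm in permutations(range(K)):
        Pi = np.eye(K)[list(perm)]        # permutation matrix
        err_theta = np.linalg.norm(Theta_hat - Theta @ Pi.T) / np.linalg.norm(Theta)
        err_B = np.linalg.norm(B_hat - Pi @ B @ Pi.T) / np.linalg.norm(B)
        if best is None or err_theta < best[0]:
            best = (err_theta, err_B)     # keep the permutation with best Theta fit
    return best

# Check: estimates that differ from the truth only by a relabeling have zero error.
rng = np.random.default_rng(0)
B = np.array([[0.70, 0.10, 0.05],
              [0.10, 0.50, 0.20],
              [0.05, 0.20, 0.30]])        # hypothetical values
Theta = rng.dirichlet(np.ones(3), size=50)
idx = [2, 0, 1]                           # relabeling of communities
Theta_hat = Theta[:, idx]
B_hat = B[np.ix_(idx, idx)]
err_theta, err_B = aligned_errors(Theta_hat, B_hat, Theta, B)
```

The loop visits $K!$ permutations, which is fine for the small $K$ used in the experiments; for larger $K$ an assignment solver would be the usual replacement.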


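Provable efficiency: Lower bound

Theorem

Consider the MMSB model. Then there exists a constant $c > 0$ such that for $\rho \ge c \frac{\log n}{n}$ the following lower bounds for matrices $\Theta$, $B$ hold:

$\inf_{\hat{\Theta}} \sup_{\Theta \in \Theta_{n,K}} \mathbb{P}\left( \frac{\|\hat{\Theta} - \Theta\|_F}{\|\Theta\|_F} \ge C_\Theta \frac{1}{\sqrt{\rho n}} \right) > 0.1$,

$\inf_{\hat{B}} \sup_{B} \mathbb{P}\left( \frac{\|\hat{\rho} \hat{B} - \rho B\|_F}{\|\rho B\|_F} \ge C_B \frac{1}{\rho n} \right) > 0.1$,

where $C_\Theta, C_B > 0$ are some constants.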


Provable efficiency: Open question

We currently have a gap between the lower and upper bounds for matrix $B$:

$\frac{c_1}{\rho n} \le \inf_{\hat{B}} \sup_{B} \frac{\|\hat{\rho} \hat{B} - \rho B\|_F}{\|\rho B\|_F} \le \frac{C_1}{\sqrt{\rho n}}$.

The idea for improved algorithm:


Experiments: Model data

Default parameter settings:

number of nodes n = 5000;

number of communities K = 3;

number of pure nodes: 3;

Dirichlet parameter $\alpha = 1/3$;

community-community matrix $B = \mathrm{diag}(0.3, 0.5, 0.7)$.

We consider several experiments. Each experiment was repeated 20 times and results were averaged over runs.



Figure: Experiment with varying number of nodes n.



Figure: Experiment with noisy off-diagonal elements of B.



Figure: Experiment with skewed B matrix.


Experiments: Real data

Figure: Experiments on DBLP co-authorship networks.


Conclusions and outlook

Conclusions:

We proposed the SPOC algorithm for parameter estimation in MMSB, which is computationally efficient.

Theoretical guarantees on performance are provided.

Neither the algorithm nor its analysis is perfect yet.

Outlook: It is interesting to extend the results to the cases of

dynamical networks;

multiplex networks.

