Consistent estimation of Mixed Memberships with Successive Projections

Maxim Panov
joint work with E. Marshakov, R. Ushakov and N. Mokrov

Skoltech and IITP

15.05.2018


Community detection: Problem statement

Graph $G(E, V)$:

  • nodes $v_j$;
  • edges $A_{ij}$.

Problem: we want to partition the graph in such a way that there are few edges between groups.

Maxim Panov (Skoltech) Overlapping community detection 15.05.2018 2 / 31

Community detection: Overlapping communities

Non-overlapping vs. overlapping communities


Graph models: Erdos-Renyi graph

Simplest possible random graph model:

$A_{ij} \sim \mathrm{Bernoulli}(p)$,

where $A_{ij}$ are independent and $p \in [0, 1]$.

Figure: Erdos-Renyi graph with p = 0.5.


Graph models: Generalized Erdos-Renyi graph

Simple generalization of the Erdos-Renyi model:

$A_{ij} \sim \mathrm{Bernoulli}(p_{ij})$,

where $p_{ij} \in [0, 1]$.

In matrix form we can write

$A \sim \mathrm{Bernoulli}(P)$,

where $P = \{p_{ij}\}_{i,j=1}^{n}$.

Question: what types of matrix $P$ allow for community structure?
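A minimal sketch of sampling from the generalized Erdos-Renyi model. The helper name `sample_adjacency` is hypothetical, and the slide does not say whether the graph is directed or has self-loops; the sketch assumes an undirected graph without self-loops.

```python
import numpy as np

def sample_adjacency(P, rng):
    """Sample a symmetric 0/1 adjacency matrix with A_ij ~ Bernoulli(P_ij).

    Assumption (not stated on the slide): the graph is undirected without
    self-loops, so only the strict upper triangle is sampled and mirrored.
    """
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1)  # independent draws for i < j
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(0)
n, p = 200, 0.5
P = np.full((n, n), p)          # plain Erdos-Renyi is the case p_ij = p
A = sample_adjacency(P, rng)
density = A.sum() / (n * (n - 1))
print(density)                  # empirical edge density, close to p
```

Any other matrix `P` with entries in [0, 1] can be passed in the same way, which is what the community models on the following slides do.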


Graph models: Stochastic block model (SBM)

Figure: Example of stochastic block model and corresponding graph.


Graph models: Mixed membership stochastic block model (MMSB)

Graph edges are generated according to the generalized Erdos-Renyi model:

$A \sim \mathrm{Bernoulli}(P)$.

The probability matrix $P$ can be factorized as

$P = \Theta B \Theta^{T}$,

where

  • $B \in [0, 1]^{K \times K}$ is a symmetric matrix of community-community probabilities;
  • $\Theta \in [0, 1]^{n \times K}$ is a community membership matrix.

Condition

We assume that

1. Every row of matrix $\Theta$ sums to 1: $\sum_{k=1}^{K} \theta_{ik} = 1$, $i = 1, \dots, n$;
2. (optional) All the community membership vectors are independent draws from a Dirichlet distribution, i.e. $\theta_i \sim \mathrm{Dirichlet}(\alpha)$ for some $\alpha \in \mathbb{R}_{+}^{K}$, $i = 1, \dots, n$.
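The generative model can be sketched in a few lines of NumPy. The values of `n`, `K`, `alpha` and `B` below are hypothetical and chosen only for illustration; the undirected, no-self-loop sampling is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 300, 3
alpha = np.full(K, 1.0 / 3.0)                 # hypothetical Dirichlet parameter
B = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.7, 0.2],
              [0.1, 0.2, 0.6]])               # hypothetical symmetric B

Theta = rng.dirichlet(alpha, size=n)          # each row sums to 1 (condition 1)
P = Theta @ B @ Theta.T                       # P = Theta B Theta^T

# A_ij ~ Bernoulli(P_ij) for i < j, mirrored to get an undirected graph
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(int)
```

Because every row of `Theta` is a probability vector and every entry of `B` lies in [0, 1], each entry of `P` is a convex combination of entries of `B` and is itself a valid probability.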


Graph models: MMSB examples

As discussed, in the MMSB model the probability matrix is

$P = \Theta B \Theta^{T}$.

It means that

$p_{ij} = \sum_{k,l=1}^{K} \theta_{ik} \theta_{jl} b_{kl}$.

SBM is a particular case of MMSB with the property that for any $i \in \{1, \dots, n\}$ there exists $k \in \{1, \dots, K\}$ such that

$\theta_{ik} = 1$ and $\theta_{il} = 0$, $l \neq k$,

leading to

$p_{ij} = b_{kl}$

for any $i, j = 1, \dots, n$ and some $k = k(i)$, $l = l(j)$.
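The reduction to SBM can be checked numerically. In this sketch the matrix `B` and the hard assignment `labels` are hypothetical; the point is only that one-hot rows of $\Theta$ turn the MMSB formula into a lookup in $B$.

```python
import numpy as np

B = np.array([[0.9, 0.1],
              [0.1, 0.6]])          # hypothetical community-community matrix
labels = np.array([0, 0, 1, 1, 0])  # hypothetical hard assignment k(i)

Theta = np.eye(B.shape[0])[labels]  # one-hot rows: theta_ik = 1 iff k = k(i)
P = Theta @ B @ Theta.T             # MMSB formula

P_sbm = B[np.ix_(labels, labels)]   # direct SBM lookup p_ij = b_{k(i), l(j)}
print(np.allclose(P, P_sbm))
```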

Graph models: Identifiability of MMSB

Problem: if our goal is estimation of the parameters $\Theta$ and $B$, are the true values unique?

Answer: Of course not, for example

then

$P^{(1)} = M_1 I_3 M_1^{T} = I_3 M_2 I_3 = P^{(2)}$,

where $I_3$ is an identity matrix of size 3.
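A numerical illustration of this kind of non-uniqueness. The matrix `M1` below is hypothetical (it is not the matrix from the slide), and `M2` is defined as `M1 @ M1.T` so that the two parametrizations coincide.

```python
import numpy as np

# Hypothetical M1 with rows summing to 1; not the slide's own example.
M1 = np.array([[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5]])
I3 = np.eye(3)
M2 = M1 @ M1.T                      # chosen so that both products coincide

P1 = M1 @ I3 @ M1.T                 # parametrization 1: Theta = M1, B = I3
P2 = I3 @ M2 @ I3.T                 # parametrization 2: Theta = I3, B = M2
print(np.allclose(P1, P2))
```

Both pairs are valid MMSB parameters here: the rows of `M1` and of `I3` sum to 1, and `I3` and `M2` are symmetric with entries in [0, 1].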


Graph models: Identifiability of MMSB

Condition (Identifiability)

1. There is at least one “pure” node in each community, i.e. for each $k = 1, \dots, K$ there exists $i$ such that $\theta_{ik} = \sum_{l=1}^{K} \theta_{il} = 1$.
2. Matrix $B \in [0, 1]^{K \times K}$ is full rank.
3. Every row of matrix $\Theta$ sums to 1: $\sum_{k=1}^{K} \theta_{ik} = 1$, $i = 1, \dots, n$.

Theorem

If the Condition (Identifiability) is satisfied then the MMSB is identifiable, i.e. for every $P = \Theta B \Theta^{T}$ the matrices $\Theta$ and $B$ are uniquely defined up to a permutation of communities (columns of matrix $\Theta$ and rows and columns of matrix $B$).


Algorithms for parameter estimation in MMSB

There exist several algorithms for parameter estimation in MMSB:

  • stochastic variational inference (Airoldi et al., 2009; SVI);
  • tensor spectral method (Anandkumar et al., 2013; Tensor);
  • geometrical nonnegative matrix factorization (Mao et al., 2013; GeoNMF).

Problems of these methods:

  • absence of provable guarantees (SVI);
  • high computational complexity (SVI, Tensor);
  • applicability only to a limited subclass of MMSB (GeoNMF).

Recently, a couple of algorithms were proposed (SPACL by Mao et al. and Mixed-SCORE by Jin et al.), which are based on ideas very similar to ours.


Successive projection overlapping clustering (SPOC): Spectral properties of probability matrix

To account for sparsity:

$P = \rho \Theta B \Theta^{T}$,

where $\rho > 0$ is a sparsity parameter and we restrict $\max_{k,l} B_{kl} = 1$.

Spectral decomposition of probability matrix (exact):

$P = U L U^{T}$.

We can conclude that

$U = \Theta F$,

where $F \in \mathbb{R}^{K \times K}$ is some full rank matrix.
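The relation $U = \Theta F$ can be verified numerically. This is a sketch under assumptions: `B`, `rho` and `n` are hypothetical, the first `K` nodes are made pure by construction, and the eigenvectors kept are those of the `K` eigenvalues largest in absolute value. Under these assumptions $F$ consists of the rows of $U$ at the pure nodes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, rho = 200, 3, 0.5
B = np.array([[1.0, 0.2, 0.1],
              [0.2, 0.8, 0.3],
              [0.1, 0.3, 0.6]])         # hypothetical full-rank B with max entry 1

Theta = rng.dirichlet(np.ones(K), size=n)
Theta[:K] = np.eye(K)                   # make nodes 0..K-1 pure, one per community

P = rho * Theta @ B @ Theta.T
eigvals, eigvecs = np.linalg.eigh(P)
top = np.argsort(np.abs(eigvals))[-K:]  # indices of the K nonzero eigenvalues
L = np.diag(eigvals[top])
U = eigvecs[:, top]

F = U[:K]                               # rows of U at the pure nodes
print(np.allclose(P, U @ L @ U.T))      # exact rank-K decomposition
print(np.allclose(U, Theta @ F))        # U = Theta F
print(np.allclose(rho * B, F @ L @ F.T))
```

The last check shows why the pure-node rows are useful: once $F$ is known, $\rho B = F L F^{T}$ follows from $P = \Theta F L F^{T} \Theta^{T}$.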


Successive projection overlapping clustering (SPOC): Spectral properties of probability matrix

We can proceed with decomposition

U = ΘF.

Importantly, rows $u_i$ of matrix $U$ lie in a simplex:

[Figure: scatter plot of the rows $u_i$ of matrix $U$.]


Successive projection overlapping clustering (SPOC): Successive projection algorithm

Question: How to detect the simplex?

Answer: Successive projection algorithm (Araujo et al., 2001; Gillis and Vavasis, 2014):

1. Find the point with the maximal norm: $j^{*} = \arg\max_{j} \|u_j\|$.
2. $f_j = u_{j^{*}}$.
3. $U = U \left( I - \frac{f_j^{T} f_j}{\|f_j\|^{2}} \right)$.
4. Iterate.

The final output is the matrix $F = (f_j)_{j=1}^{K}$.
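The four steps can be sketched as follows. The function name `spa` and the toy matrix `F` are hypothetical. The sketch returns the selected row indices rather than the vectors themselves, so that the vertices can be read from the original, unprojected matrix.

```python
import numpy as np

def spa(U, K):
    """Successive projection algorithm on the rows of U; returns K row indices."""
    R = np.array(U, dtype=float)       # residual; the input is left untouched
    indices = []
    for _ in range(K):
        j_star = int(np.argmax(np.linalg.norm(R, axis=1)))  # step 1: max norm
        f = R[j_star].copy()                                 # step 2: f = u_{j*}
        R = R - np.outer(R @ f, f) / (f @ f)                 # step 3: project f out
        indices.append(j_star)
    return indices

# Toy check: rows are convex combinations of the rows of a hypothetical F
rng = np.random.default_rng(3)
F = np.array([[1.0, 0.0, 0.2],
              [0.1, 1.0, 0.0],
              [0.0, 0.3, 1.0]])
Theta = rng.dirichlet(np.ones(3), size=100)
Theta[[5, 17, 42]] = np.eye(3)         # three pure nodes
J = spa(Theta @ F, K=3)
print(sorted(J))
```

In this noise-free setting the selected indices are the pure nodes, because the Euclidean norm over a simplex is maximized at a vertex and the projection maps the already chosen vertex to zero.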


Successive projection overlapping clustering (SPOC): Spectral properties of adjacency matrix

Spectral decomposition of probability matrix (approximate):

$A \simeq \hat{U} \hat{L} \hat{U}^{T}$,

where $\hat{L} \in \mathbb{R}^{K \times K}$ is the diagonal matrix of top-$K$ eigenvalues and $\hat{U} \in \mathbb{R}^{n \times K}$ is the matrix of corresponding eigenvectors.

Similarly,

$\hat{U} = \Theta F + N$,

where $F \in \mathbb{R}^{K \times K}$ is some full rank matrix.


Successive projection overlapping clustering (SPOC): Spectral properties of adjacency matrix

Importantly, rows $\hat{u}_i$ of matrix $\hat{U}$ approximately lie in a simplex:

[Figure: scatter plot of the rows $\hat{u}_i$ of matrix $\hat{U}$.]

So, we can compute an estimate $\hat{F}$ of matrix $F$ by the SPA algorithm.


Successive projection overlapping clustering: Resulting estimates

Estimate of the community-community matrix:

$\hat{B} = \hat{F} \hat{L} \hat{F}^{T}$.

Estimate of the community membership matrix:

$\hat{\Theta} = \hat{U} \hat{F}^{-1}$.

Question: What about the efficiency of estimates?


Successive projection overlapping clustering (SPOC)

Algorithm 1 SPOC

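Require: Adjacency matrix $A$ and number of communities $K$.
Ensure: Estimated $\hat{\rho}$, $\hat{\Theta}$, $\hat{B}$.

1: Get the rank-$K$ eigenvalue decomposition $A \simeq \hat{U} \hat{L} \hat{U}^{T}$.
2: Run the SPA algorithm with input $\hat{U}$, which outputs a set of indices $J$ of cardinality $K$.
3: $\hat{F} = \hat{U}[J, :]$.
4: $\hat{B} = \hat{F} \hat{L} \hat{F}^{T}$.
5: $\hat{\rho} = \max_{ij} \hat{B}_{ij}$.
6: $\hat{B} = \frac{1}{\hat{\rho}} \hat{B}$.
7: $\hat{\Theta} = \hat{U} \hat{F}^{-1}$.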
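Algorithm 1 can be sketched in NumPy as below. This is an illustrative sketch, not the authors' implementation: the function names `spa` and `spoc` are hypothetical, "top-K eigenvalues" is interpreted as largest in absolute value, and the sanity check runs on a noise-free matrix $P$ with hypothetical parameters rather than on a sampled graph.

```python
import numpy as np

def spa(U, K):
    """Successive projection algorithm on the rows of U; returns K row indices."""
    R = np.array(U, dtype=float)
    J = []
    for _ in range(K):
        j = int(np.argmax(np.linalg.norm(R, axis=1)))
        f = R[j].copy()
        R = R - np.outer(R @ f, f) / (f @ f)
        J.append(j)
    return J

def spoc(A, K):
    """Steps 1-7 of Algorithm 1; returns (rho_hat, Theta_hat, B_hat)."""
    eigvals, eigvecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(eigvals))[-K:]     # step 1 (assumed: top-K by magnitude)
    L, U = np.diag(eigvals[top]), eigvecs[:, top]
    J = spa(U, K)                              # step 2
    F = U[J, :]                                # step 3
    B = F @ L @ F.T                            # step 4
    rho = B.max()                              # step 5
    B = B / rho                                # step 6
    Theta = U @ np.linalg.inv(F)               # step 7
    return rho, Theta, B

# Noise-free sanity check with hypothetical parameters
rng = np.random.default_rng(4)
n, K, rho = 300, 3, 0.4
B = np.diag([1.0, 0.7, 0.5])
Theta = rng.dirichlet(np.ones(K), size=n)
Theta[:K] = np.eye(K)                          # nodes 0..K-1 are pure
P = rho * Theta @ B @ Theta.T

rho_hat, Theta_hat, B_hat = spoc(P, K)
perm = np.argmax(Theta_hat[:K], axis=0)        # community order chosen by SPA
print(np.isclose(rho_hat, rho), np.allclose(Theta_hat, Theta[:, perm]))
```

On a sampled adjacency matrix the rows of the estimated membership matrix are only approximately probability vectors; the sketch does not post-process them.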


Provable efficiency: Davis-Kahan theorem

Lemma (Variant of Davis-Kahan)

Assume that $P \in \mathbb{R}^{n \times n}$ is a rank $K$ symmetric matrix with smallest nonzero singular value $\lambda_K(P)$.

Let $A$ be any symmetric matrix and $\hat{U}, U \in \mathbb{R}^{n \times K}$ be the $K$ leading eigenvectors of $A$ and $P$, respectively.

Then there exists a $K \times K$ orthogonal matrix $O_P$ such that

$\|\hat{U} - U O_P\|_F \le \frac{2 \sqrt{2K} \, \|A - P\|}{\lambda_K(P)}$.
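The bound can be checked numerically on a small example. The parameters below are hypothetical, $P$ is built to be positive semidefinite so that "leading" means largest eigenvalues, and the orthogonal matrix is taken as the orthogonal Procrustes solution, which is one valid choice of $O_P$ since it minimizes the left-hand side.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 400, 3
Theta = rng.dirichlet(np.ones(K), size=n)
P = Theta @ np.diag([0.9, 0.7, 0.5]) @ Theta.T   # hypothetical rank-K, PSD matrix
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)              # symmetric Bernoulli sample

def top_eigvecs(M, K):
    vals, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    return vals[-K:], vecs[:, -K:]

lam_P, U = top_eigvecs(P, K)
_, U_hat = top_eigvecs(A, K)

# Orthogonal Procrustes: the O minimising ||U_hat - U O||_F
W, _, Vt = np.linalg.svd(U.T @ U_hat)
O = W @ Vt

lhs = np.linalg.norm(U_hat - U @ O, "fro")
rhs = 2 * np.sqrt(2 * K) * np.linalg.norm(A - P, 2) / lam_P.min()
print(lhs, rhs)
```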


Provable efficiency: Concentration in spectral norm

Lemma (Lei and Rinaldo, 2015)

Let $A$ be the adjacency matrix of a random graph on $n$ nodes in which edges occur independently.

Set $E[A] = P = (p_{ij})_{i,j=1,\dots,n}$ and assume that $n \max_{ij} p_{ij} \le d$ for $d \ge c_0 \log n$ and $c_0 > 0$.

Then, for any $r > 0$ there exists a constant $C = C(r, c_0)$ such that

$\|A - P\| \le C \sqrt{d}$

with probability at least $1 - n^{-r}$.
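A quick simulation can illustrate the $\sqrt{d}$ scaling. This sketch uses a plain Erdos-Renyi graph with a hypothetical edge probability `p` and sets $d = n \max_{ij} p_{ij}$; it only shows that the ratio $\|A - P\| / \sqrt{d}$ stays bounded as $n$ grows and says nothing about the value of the constant $C$ in the lemma.

```python
import numpy as np

rng = np.random.default_rng(6)
p = 0.1                                    # hypothetical edge probability
ratios = []
for n in (200, 400, 800):
    P = np.full((n, n), p)
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(float)    # undirected graph, no self-loops
    d = n * P.max()                        # here d = n * max_ij p_ij
    ratio = np.linalg.norm(A - P, 2) / np.sqrt(d)  # spectral norm over sqrt(d)
    ratios.append(ratio)
    print(n, round(ratio, 2))
```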


Provable efficiency: Quality of SPA

Theorem (Gillis and Vavasis, 2014)

Let $G = FW$ and $\hat{G} = G + N$. Suppose that $K \ge 2$ and the Condition 2 is satisfied. If in matrix $N$ each column $n_i$ satisfies $\|n_i\|_F \le \varepsilon$ with

$\varepsilon \le \frac{\lambda_{\min}(F)}{1225 \sqrt{r}}$,

then the SPA algorithm with the input $(\hat{G}, r)$ returns the set of indices $J$ such that there exists a permutation $\pi$ which gives

$\|\hat{g}_{J(j)} - f_{\pi(j)}\|_2 \le (432 \kappa(F) + 4) \varepsilon$

for all $j = 1, \dots, r$, where $\hat{g}_k$ and $f_k$ are the columns of matrices $\hat{G}$ and $F$ correspondingly. Here $\kappa(F) = \frac{\lambda_{\max}(F)}{\lambda_{\min}(F)}$ is the condition number of the matrix $F$.


Provable efficiency: Beyond Davis-Kahan

Lemma (Panov et al., 2017)

Assume that $P \in \mathbb{R}^{n \times n}$ is a rank $K$ symmetric matrix with smallest non-zero singular value $\lambda_K(P)$.

Let $A$ be any symmetric matrix such that $\|A - P\| \le \frac{1}{2} \lambda_K(P)$ and $\hat{U}, U$ are the $n \times K$ matrices of eigenvectors for matrices $A$ and $P$ corresponding to top-$K$ eigenvalues.

Then

$\|e_i^{T} (\hat{U} - U O_P)\|_F \le 23 K^{1/2} \kappa(P) \frac{\|e_i^{T} A\|_F \cdot \|A - P\|}{\lambda_K^{2}(P)} + \frac{\|e_i^{T} (A - P) U\|_F}{\lambda_K(P)}$,

where $e_i$ is a vector of length $n$ with 1 in the $i$-th position and $O_P$ is some orthogonal matrix.


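Provable efficiency: Final theorem

Theorem (Panov et al., 2017)

There exist constants $c$ and $C$ depending only on the condition numbers of the matrices $B$ and $\Theta$ and parameter $r$ such that for $\rho \ge c \frac{\log n}{n}$ it holds with probability at least $1 - n^{-r}$ that

$\frac{\|\hat{\rho} \hat{B} - \rho \Pi B \Pi^{T}\|_F}{\|\rho B\|_F} \le C K \sqrt{\frac{\log n}{\rho n}}$

and

$\frac{\|\hat{\Theta} - \Theta \Pi^{T}\|_F}{\|\Theta\|_F} \le C K \sqrt{\frac{\log n}{\rho n}}$,

where $\Pi$ is some permutation matrix and $\rho$ is the maximal value in matrix $B$.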
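The two relative errors in the theorem are defined up to a relabeling of communities, so evaluating them requires a search over permutations. A minimal sketch, assuming small $K$ so that brute force over all $K!$ permutations is affordable; the function name `relative_errors` is hypothetical, and the permutation is chosen to minimize the membership error and then reused for the community-community matrix.

```python
import itertools
import numpy as np

def relative_errors(Theta_hat, B_hat, Theta, B):
    """Relative Frobenius errors minimised over community relabelings.

    B_hat and B stand for the scaled matrices (rho_hat * B_hat and rho * B).
    Brute force over K! permutations, so only suitable for small K.
    """
    K = B.shape[0]
    best = None
    for perm in itertools.permutations(range(K)):
        idx = list(perm)
        err_theta = np.linalg.norm(Theta_hat - Theta[:, idx]) / np.linalg.norm(Theta)
        err_B = np.linalg.norm(B_hat - B[np.ix_(idx, idx)]) / np.linalg.norm(B)
        if best is None or err_theta < best[0]:
            best = (err_theta, err_B)
    return best

# Usage: a relabeled copy of the truth has zero error
rng = np.random.default_rng(7)
Theta = rng.dirichlet(np.ones(3), size=50)
B = np.array([[0.9, 0.1, 0.2],
              [0.1, 0.8, 0.3],
              [0.2, 0.3, 0.7]])        # hypothetical scaled matrix
relabel = [2, 0, 1]
errs = relative_errors(Theta[:, relabel], B[np.ix_(relabel, relabel)], Theta, B)
print(errs)
```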


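Provable efficiency: Lower bound

Theorem

Consider the MMSB model. Then there exists a constant $c > 0$ such that for $\rho \ge c \frac{\log n}{n}$ the following lower bounds for matrices $\Theta$, $B$ hold:

$\inf_{\hat{\Theta}} \sup_{\Theta \in \Theta_{n,K}} \mathbb{P}\left( \frac{\|\hat{\Theta} - \Theta\|_F}{\|\Theta\|_F} \ge C_{\Theta} \frac{1}{\sqrt{\rho n}} \right) > 0.1$,

$\inf_{\hat{B}} \sup_{B} \mathbb{P}\left( \frac{\|\hat{\rho} \hat{B} - \rho B\|_F}{\|\rho B\|_F} \ge C_{B} \frac{1}{\rho n} \right) > 0.1$,

where $C_{\Theta}, C_{B} > 0$ are some constants.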


Provable efficiency: Open question

We currently have a gap between the lower and upper bounds for matrix $B$:

$\frac{c_1}{\rho n} \le \inf_{\hat{B}} \sup_{B} \frac{\|\hat{\rho} \hat{B} - \rho B\|_F}{\|\rho B\|_F} \le \frac{C_1}{\sqrt{\rho n}}$.

The idea for improved algorithm:


Experiments: Model data

Default parameter settings:

  • number of nodes $n = 5000$;
  • number of communities $K = 3$;
  • number of pure nodes: 3;
  • Dirichlet parameter $\alpha = 1/3$;
  • community-community matrix $B = \mathrm{diag}(0.3, 0.5, 0.7)$.

We consider several experiments. Each experiment was repeated 20 times and results were averaged over runs.
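A sketch of how data with these default settings could be generated. The function name `make_model_data` is hypothetical, and the sketch assumes that "3 pure nodes" means one pure node per community and that the graph is undirected without self-loops; neither detail is stated on the slide.

```python
import numpy as np

def make_model_data(n=5000, K=3, alpha=1.0 / 3.0, b_diag=(0.3, 0.5, 0.7), seed=0):
    """Generate (A, Theta, B) under the default settings listed above."""
    rng = np.random.default_rng(seed)
    Theta = rng.dirichlet(np.full(K, alpha), size=n)
    Theta[:K] = np.eye(K)            # assumption: one pure node per community
    B = np.diag(b_diag)
    P = Theta @ B @ Theta.T
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(np.int8)
    return A, Theta, B

# Smaller n than the default, for a quick run
A, Theta, B = make_model_data(n=500)
print(A.shape, A.mean())
```

Repeated runs, such as the 20 repetitions mentioned above, can use different values of `seed`.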


Experiments: Model data

Figure: Experiment with varying number of nodes n.


Experiments: Model data

Figure: Experiment with noisy off-diagonal elements of B.


Experiments: Model data

Figure: Experiment with skewed B matrix.


Experiments: Real data

Figure: Experiments on DBLP co-authorship networks.


Conclusions and outlook

Conclusions:

  • We proposed the SPOC algorithm for parameter estimation in MMSB, which is computationally efficient.
  • Theoretical guarantees on performance are provided.
  • The algorithm is still not perfect, and neither is its analysis.

Outlook: it is interesting to extend the results to the cases of

  • dynamical networks;
  • multiplex networks.
