Semi-blind Subgraph Reconstruction in Gaussian Graphical Models

†Tianpei Xie, ⋆Sijia Liu, ⋆Alfred O. Hero
†Transaction Risk Management Team @ Amazon
⋆University of Michigan, Ann Arbor

This work was completed while Tianpei Xie was a PhD student at the University of Michigan.
Outline

1. Problem Motivations
2. Model Formulation
3. Experiments
4. Conclusion
Background

• Learning a dependency graph from relational data is a key step in data visualization and analysis. Examples include
  1. recommendation systems
  2. social network analysis [Goyal et al., 2010]
  3. sensor network analysis [Joshi and Boyd, 2009, Liu et al., 2016]
• However, in many situations only a limited set of data is accessible, due to
  • limited budgets during data collection (e.g., labor, energy)
  • restricted access to data sources (e.g., data security, privacy)
• Semi-blind subgraph topology learning problem: we only see data on a subgraph and are blind to the rest.
Semi-blind subgraph topology learning problem
Challenges

• Challenges:
  • external latent data influence the target network ⇒ bias in inference
  • in probabilistic models, marginalizing over latent variables ⇒ false positives in edge detection

Figure: The red nodes are conditionally independent given the blue node. Marginalizing out the blue node creates a false connection in the graph.

• Assumption: additional information from external sources ⇒ summary info. of the latent data
Settings

• Random graph signal $x \in \mathbb{R}^n \sim \mathcal{N}(0, \Theta^{-1})$, Markovian w.r.t. $G = (V, E)$, $|V| = n$ $\Rightarrow$ $x_i \perp\!\!\!\perp x_j \mid x_{-\{i,j\}} \Leftrightarrow \Theta_{i,j} = 0 \Leftrightarrow (i,j) \notin E$.
• Consider
  • a non-overlapping partition $V = V_1 \cup V_2$, $|V_1| = n_1$, $|V_2| = n_2$; the edge set between $V_1$ and $V_2$ is denoted $E_{1,2}$
  • accessible $x_1 := x_{V_1} \in \mathbb{R}^{n_1}$, inaccessible (latent) $x_2 := x_{V_2} \in \mathbb{R}^{n_2}$
  • precision matrix $\Theta := \begin{bmatrix} \Theta_1 & \Theta_{12} \\ \Theta_{21} & \Theta_2 \end{bmatrix}$
  • $G_1 = (V_1, E_1)$, the target network; $E_1 := E \cap (V_1 \times V_1) \Leftrightarrow \Theta_1$
• Goal: estimate $\Theta_1$, given (see the data-generation sketch below)
  1. accessible data on $V_1$: $x_1 \Rightarrow \Sigma_1 := E[x_1 x_1^T]$, the sample marginal covariance
  2. a noisy summary $\Theta_2 \in \mathbb{R}^{n_2 \times n_2}$ of the inverse covariance of $x_2$, shared by external sources
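Below, a minimal data-generation sketch of this setting, assuming a toy 6-node chain graph in place of $G$ (sizes and edge weights are illustrative, not the paper's):

```python
import numpy as np

# Toy instance of the setting: x ~ N(0, Theta^{-1}) is Markov w.r.t. a
# 6-node chain graph; only the V_1 block is observed. Illustrative only.
rng = np.random.default_rng(0)
n, n1, m = 6, 4, 400                          # |V| = n, |V_1| = n1, m samples
Theta = 2.0 * np.eye(n)                       # chain-graph precision matrix
for i in range(n - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = -0.8  # edges (i, i+1) only

x = rng.multivariate_normal(np.zeros(n), np.linalg.inv(Theta), size=m)
x1 = x[:, :n1]                                # accessible data on V_1
Sigma1 = x1.T @ x1 / m                        # sample marginal covariance
Theta2 = Theta[n1:, n1:]                      # latent-block summary (noise-free here)
```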
Network topology learning from partially shared information
Related works

• Without latent variables, many algorithms estimate $\Theta$ of a Gaussian graphical model, e.g.
  1. $\ell_1$-regularized ML, such as gLasso [Friedman et al., 2008]
  2. quadratic approximation, QUIC [Hsieh et al., 2011]
  3. $\ell_0$-regularized ML [Marjanovic and Hero, 2015]
• With latent variables, to estimate the sub-matrix $\Theta_1$ of the full precision $\Theta$:
  1. the latent variable Gaussian graphical model (LV-GGM) of [Chandrasekaran et al., 2012]
  2. Key identity (a Schur complement; numerical check below):
     $$(\Sigma_1)^{-1} = \underbrace{\Theta_1}_{\text{sparse}} - \underbrace{\Theta_{12}\,\Theta_2^{-1}\,\Theta_{21}}_{\text{low-rank}} =: C - M \;\Rightarrow\; \text{signal} + \text{confounding factor}$$
  3. Disadvantages:
     • the effect of latent variables is uniform and global; it does not change during propagation
     • it does not exploit the dependency structure among latent variables
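The key identity above is just the Schur complement of the partitioned precision matrix; here is a quick numerical check with an arbitrary random positive-definite precision (not the paper's data):

```python
import numpy as np

# Check: (Sigma_1)^{-1} = Theta_1 - Theta_12 Theta_2^{-1} Theta_21,
# i.e., the marginal precision of x_1 is the Schur complement of Theta_2.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Theta = A @ A.T + 5.0 * np.eye(5)        # random positive-definite precision
n1 = 3                                   # observed block size
Theta1, Theta12 = Theta[:n1, :n1], Theta[:n1, n1:]
Theta21, Theta2 = Theta[n1:, :n1], Theta[n1:, n1:]

Sigma1 = np.linalg.inv(Theta)[:n1, :n1]  # marginal covariance of x_1
schur = Theta1 - Theta12 @ np.linalg.inv(Theta2) @ Theta21
print(np.allclose(np.linalg.inv(Sigma1), schur))  # True
```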
Global influence model vs. decayed influence model

(a) Global influence by LV-GGM:
  • $E_{21}$ dense
  • no edges among nodes in $V_2$, i.e., $x_2$ conditionally independent given $x_1$

(b) Decayed-influence latent variable model:
  • $E_{21}$ sparse
  • edges among nodes in $V_2$, i.e., conditional dependence $\sim \Theta_2$
Our contributions

• We propose the decayed-influence latent variable Gaussian graphical model (DiLat-GGM), which
  1. accounts for the decayed influence effect during the propagation of information
  2. fully utilizes the shared dependency information from external sources
  3. supports latent variable inference and selection
LV-GGM vs. DiLat-GGM

|                          | LV-GGM | DiLat-GGM |
|--------------------------|--------|-----------|
| variables                | $C \in \mathbb{R}^{n_1 \times n_1}$, $M \in \mathbb{R}^{n_1 \times n_1}$ | $C \in \mathbb{R}^{n_1 \times n_1}$, $B := \Theta_{12}\Theta_2^{-1} \in \mathbb{R}^{n_1 \times n_2}$ |
| known                    | $\Sigma_1, \alpha, \beta$ | $\Sigma_1, \alpha, \beta$, $\Theta_2 \succ 0 \in \mathbb{R}^{n_2 \times n_2}$ |
| constraint               | $\Theta_1 = C - M \succ 0$ | $\Theta_1 = C - B\Theta_2 B^T \succ 0$ |
| key                      | $M \succeq 0$, low-rank | $\Theta_{21} = \Theta_2 B^T = \begin{bmatrix} 0 \\ \Theta_{\delta V_2,1} \end{bmatrix}$, row-sparse |
| inference on latent var. | No | Yes: $p(x_2 \mid x_1) = \mathcal{N}(\mu_{2\mid 1}, \Theta_2)$, $\mu_{2\mid 1} = B^T x_1$ (sketch below) |
| latent feature selection | No | Yes |
| convexity                | Yes | No |
| implementation           | ADMM | Convex-concave procedure (CCP) + ADMM |
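For the "inference on latent variables" row, a one-line sketch of the Gaussian conditional mean; it uses the textbook formula $\mu_{2\mid 1} = -\Theta_2^{-1}\Theta_{21} x_1$, which matches the table's $B^T x_1$ up to the sign convention chosen for $B$:

```python
import numpy as np

# Gaussian conditioning in a GGM: p(x_2 | x_1) has precision Theta_2 and a
# mean linear in x_1. Textbook sign shown; the slide's B^T x_1 agrees up to
# the sign convention used when defining B.
def latent_conditional_mean(Theta2, Theta21, x1):
    return -np.linalg.solve(Theta2, Theta21 @ x1)
```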
The decayed-influence latent variable Gaussian graphical model

The proposed DiLat-GGM solves
$$\min_{C,B} \; -\log\det\!\left(C - B\Theta_2 B^T\right) + \mathrm{tr}\!\left(\Sigma_1\!\left(C - B\Theta_2 B^T\right)\right) + \underbrace{\alpha_m \|C\|_1}_{\text{sparsity of cond. graph}} + \underbrace{\beta_m \left\|\Theta_2 B^T\right\|_{2,1}}_{\text{sparsity of } E_{\text{cross}} \text{ \& latent feat. sel.}}$$
$$\text{s.t.} \quad C - B\Theta_2 B^T \succ 0,$$
where
• $\left\|\Theta_2 B^T\right\|_{2,1} := \sum_{i \in V_2} \left\|[\Theta_2 B^T]_i\right\|_2$ is the mixed $\ell_{2,1}$ norm over rows.
• An external source provides $\Theta_2$ $\Rightarrow$ partial correlations of $x_2$.
• DiLat-GGM is a difference-of-convex program and can be solved via the convex-concave procedure (CCP) [Yuille et al., 2002, Lipp and Boyd, 2016]. (An evaluation sketch of the objective follows below.)
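As a reading aid, a sketch that simply evaluates this objective (shapes as on the slide: $C$ is $n_1 \times n_1$, $B$ is $n_1 \times n_2$); the actual optimization uses CCP + ADMM, not naive evaluation:

```python
import numpy as np

# Evaluate the DiLat-GGM objective for given (C, B); illustrative only.
def dilat_ggm_objective(C, B, Sigma1, Theta2, alpha_m, beta_m):
    K = C - B @ Theta2 @ B.T                          # candidate Theta_1
    sign, logdet = np.linalg.slogdet(K)
    if sign <= 0:                                     # violates K > 0 constraint
        return np.inf
    row_norms = np.linalg.norm(Theta2 @ B.T, axis=1)  # rows of Theta_2 B^T
    return (-logdet + np.trace(Sigma1 @ K)
            + alpha_m * np.abs(C).sum()               # l1 term on C
            + beta_m * row_norms.sum())               # mixed l_{2,1} term
```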
The convex-concave procedure

• Example: find $x^* = \arg\min_x (f(x) - g(x))$, where $f$ and $g$ are convex.
• Iteratively solve $x_t := \arg\min_x \left( f(x) - g(x_{t-1}) - \nabla g(x_{t-1})^T (x - x_{t-1}) \right)$.
• For DiLat-GGM, $g(B) = \mathrm{tr}\!\left(\Sigma_1 B \Theta_2 B^T\right)$; the rest is $f(\cdot)$. (A toy CCP run follows below.)
Experiments

• Compared algorithms:
  • DiLat-GGM
  • GLasso [Friedman et al., 2008]
  • LV-GGM [Chandrasekaran et al., 2012]
  • EM-GLasso [Yuan, 2012]
  • Generalized Laplacian learning (GenLap) [Pavez and Ortega, 2016]
• $m$ i.i.d. realizations of $x = [x_1, \ldots, x_n]$, with $m = 400$.
• Three types of graphs:
  1. the complete binary tree ($h$ := height)
  2. the grid ($w$ := width, $h$ := height)
  3. the Erdős–Rényi graph $(n, p)$
• The Jaccard distance error [Jaccard, 1901, Choi et al., 2010] for edge selection between two sets $A$, $B$ (implementation sketch below):
  $$\mathrm{dist}_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|} \in [0, 1]$$
  1. $A$ := non-zero support set of the estimated $\Theta_1$
  2. $B := E_1$, the ground-truth edge set
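A direct implementation of this metric, with toy undirected edge sets for illustration:

```python
# Jaccard distance between two edge sets; frozensets make edges undirected.
def jaccard_distance(A, B):
    A, B = set(A), set(B)
    if not A | B:                       # both empty: define distance as 0
        return 0.0
    return 1.0 - len(A & B) / len(A | B)

est   = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}  # estimated support
truth = {frozenset(e) for e in [(0, 1), (1, 2), (1, 3)]}  # ground-truth E_1
print(jaccard_distance(est, truth))     # 0.5
```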
Comparison of mean edge selection error
Comparison of Learned Network

Figure: (a) ground truth, (b) GLasso, (c) LV-GGM, (d) DiLat-GGM.
Conclusion

• We propose DiLat-GGM as a generalization of LV-GGM.
• The proposed model learns network topology from internal data together with a summary of latent factors from an external source.
• An efficient algorithm based on CCP is proposed.
• Future research directions: large-scale network learning, hierarchical models.
Thank you!
References I
Venkat Chandrasekaran, Pablo A. Parrilo, and Alan S. Willsky. Latent variable graphical model selection via convex optimization. The Annals of Statistics, 40(4):1935–1967, 2012.

Seung-Seok Choi, Sung-Hyuk Cha, and Charles C. Tappert. A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, 8(1):43–48, 2010.

Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008.

Amit Goyal, Francesco Bonchi, and Laks V. S. Lakshmanan. Learning influence probabilities in social networks. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, pages 241–250. ACM, 2010.

Cho-Jui Hsieh, Inderjit S. Dhillon, Pradeep K. Ravikumar, and Matyas A. Sustik. Sparse inverse covariance matrix estimation using quadratic approximation. Advances in Neural Information Processing Systems, pages 2330–2338, 2011.
References II
Paul Jaccard. Étude comparative de la distribution florale dans une portion des Alpes et du Jura. Impr. Corbaz, 1901.

Siddharth Joshi and Stephen Boyd. Sensor selection via convex optimization. IEEE Transactions on Signal Processing, 57(2):451–462, 2009.

Thomas Lipp and Stephen Boyd. Variations and extension of the convex–concave procedure. Optimization and Engineering, 17(2):263–287, 2016.

Sijia Liu, Sundeep Prabhakar Chepuri, Makan Fardad, Engin Masazade, Geert Leus, and Pramod K. Varshney. Sensor selection for estimation with correlated measurement noise. IEEE Transactions on Signal Processing, 64(13):3509–3522, 2016.

Goran Marjanovic and Alfred O. Hero. $\ell_0$ sparse inverse covariance estimation. IEEE Transactions on Signal Processing, 63(12):3218–3231, 2015.

Eduardo Pavez and Antonio Ortega. Generalized Laplacian precision matrix estimation for graph signal processing. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6350–6354. IEEE, 2016.
References III
Ming Yuan. Discussion: Latent variable graphical model selection via convex optimization. The Annals of Statistics, 40(4):1968–1972, 2012.

Alan L. Yuille and Anand Rangarajan. The concave-convex procedure (CCCP). Advances in Neural Information Processing Systems, 2:1033–1040, 2002.
DiLat-GGM as a Difference-of-Convex program

$$\min_{C,B} \; \underbrace{-\log\det\!\left(C - B\Theta_2 B^T\right) + \mathrm{tr}(\Sigma_1 C)}_{f(C,B) \text{ convex}} \; - \; \underbrace{\mathrm{tr}\!\left(\Sigma_1 B \Theta_2 B^T\right)}_{g(B) \text{ convex}} \; + \; \text{regularizer}$$
$$\text{s.t.} \quad C - B\Theta_2 B^T \succ 0,$$

• $f(C,B) = -\log\det \begin{bmatrix} C & B \\ B^T & \Theta_2^{-1} \end{bmatrix} + \mathrm{tr}(\Sigma_1 C)$ is convex;
  $g(B) = \mathrm{vec}\!\left(B^T\right)^T \left(\Sigma_1 \otimes \Theta_2\right) \mathrm{vec}\!\left(B^T\right)$ is convex (numerical check below).
• Can be solved via the convex-concave procedure (CCP) [Yuille et al., 2002, Lipp and Boyd, 2016].
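A quick numerical check of the vec/Kronecker form of $g(B)$ above, with random symmetric matrices standing in for $\Sigma_1$ and $\Theta_2$:

```python
import numpy as np

# Check: tr(Sigma1 B Theta2 B^T) = vec(B^T)^T (Sigma1 kron Theta2) vec(B^T).
rng = np.random.default_rng(1)
n1, n2 = 4, 3
S = rng.standard_normal((n1, n1)); Sigma1 = S @ S.T
T = rng.standard_normal((n2, n2)); Theta2 = T @ T.T
B = rng.standard_normal((n1, n2))

lhs = np.trace(Sigma1 @ B @ Theta2 @ B.T)
v = B.T.reshape(-1, order="F")            # vec(B^T): stack columns of B^T
rhs = v @ np.kron(Sigma1, Theta2) @ v
print(np.isclose(lhs, rhs))               # True
```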
The convex sub-problem

At iteration $t$,
$$(C^{t+1}, B^{t+1}) = \arg\min_{C,B} \; \ldots + \mathrm{tr}\!\left(\Sigma_1\!\left(C - 2 B D_t^T\right)\right) \qquad (1)$$
$$\text{s.t.} \quad \ldots$$
where $\nabla_B g(B^t) = 2\Sigma_1 B^t \Theta_2$ and $D_t := B^t \Theta_2$.

• An SDP problem $\Rightarrow$ convex (a CVXPY sketch follows below).
• CCP is a special case of the majorization-minimization (MM) algorithm.
• Guaranteed to converge to a local stationary point, regardless of the choice of initial point.
• Generic SDP solvers have time complexity $O(n^{6.5})$ $\Rightarrow$ an efficient solver based on ADMM, $O(n^3)$.
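A hedged CVXPY sketch of subproblem (1), reusing the block-matrix form of $f$ from the DC-program slide so that `log_det` receives an affine argument; `Sigma1`, `Theta2`, `B_t`, `alpha_m`, `beta_m` are assumed given. This is a generic-solver illustration, not the paper's $O(n^3)$ ADMM routine:

```python
import cvxpy as cp
import numpy as np

# One CCP iteration for DiLat-GGM, solved with a generic conic solver.
def ccp_subproblem(Sigma1, Theta2, B_t, alpha_m, beta_m):
    n1, n2 = Sigma1.shape[0], Theta2.shape[0]
    C = cp.Variable((n1, n1), symmetric=True)
    B = cp.Variable((n1, n2))
    D_t = B_t @ Theta2                                    # D_t := B^t Theta_2
    R = cp.bmat([[C, B], [B.T, np.linalg.inv(Theta2)]])   # affine in (C, B)
    obj = (-cp.log_det(R)                                 # block form of f
           + cp.trace(Sigma1 @ (C - 2 * B @ D_t.T))       # linearized g term
           + alpha_m * cp.sum(cp.abs(C))                  # elementwise l1 on C
           + beta_m * cp.mixed_norm(Theta2 @ B.T, 2, 1))  # l_{2,1} over rows
    prob = cp.Problem(cp.Minimize(obj), [R >> 0])
    prob.solve()
    return C.value, B.value
```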
Solving the sub-problem using ADMM

• Define $R := \begin{bmatrix} C & B \\ B^T & \Theta_2^{-1} \end{bmatrix}$, $P = \begin{bmatrix} P_1 & P_{21}^T \\ P_{21} & P_2 \end{bmatrix} := R$, $W := \Theta_2 P_{21}$.

We reformulate the convex sub-problem as
$$\min_{R,P,W} \; -\log\det R + \mathrm{tr}\!\left(S_t R\right) + \mathbb{1}\{R \succ 0\} + \alpha_m \|P_1\|_1 + \beta_m \|W\|_{2,1} \qquad (2)$$
$$\text{s.t.} \quad P_2 = \Theta_2^{-1}, \quad R = P, \quad W = \Theta_2 P_{21},$$
where $\mathbb{1}\{A\}$ is the indicator function and $S_t := \begin{bmatrix} \Sigma_1 & -\Sigma_1 D_t \\ -D_t^T \Sigma_1 & \gamma_t I \end{bmatrix}$.

• ADMM solves three subproblems, w.r.t. $R$, $P$, and $W$, iteratively (see the $R$-update sketch below).
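The $R$-subproblem in such log-det splittings typically has a closed-form solution via eigendecomposition; a sketch of that standard proximal update (the generic graphical-lasso-style step, assumed here to match the form of the paper's $R$-update):

```python
import numpy as np

# Solve argmin_R  -log det R + tr(S R) + (rho/2) ||R - Z||_F^2  over R > 0.
# Stationarity gives rho R - R^{-1} = rho Z - S; diagonalizing the right side
# as Q diag(lam) Q^T reduces it to scalar quadratics with positive roots.
def logdet_prox(S, Z, rho):
    lam, Q = np.linalg.eigh(rho * Z - S)
    r = (lam + np.sqrt(lam**2 + 4.0 * rho)) / (2.0 * rho)  # positive roots
    return (Q * r) @ Q.T                                    # Q diag(r) Q^T
```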
Sensitivity to α, β

Figure: Jaccard distance error as a function of one regularization parameter, for several values of the other (0.01, 0.1, 0.5, 1, 5); Erdős–Rényi graph with $n = 30$, $p = 0.16$.
Sensitivity to Θ2

Figure: Jaccard distance error vs. SNR (dB) for several $(\alpha, \beta)$ settings: $(0.2, 2.0)$, $(0.1, 1.0)$, $(0.1, 2.0)$.

• $\Theta_2 = L_2 + \sigma^2 G$, where $G = HH^T/n_2$, $H_{i,j} \sim \mathcal{N}(0,1)$, and $L_2$ is the inverse covariance matrix of $x_2$. (A sketch of this noise model follows below.)
• The signal-to-noise ratio (SNR) is defined as $\log\!\left(\frac{\|L_2\|_F^2}{\sigma^2}\right)$ (dB).
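A sketch of this noise model for the external summary ($L_2$ assumed given; $\sigma$ controls the SNR exactly as defined above):

```python
import numpy as np

# Noisy external summary: Theta_2 = L2 + sigma^2 * G, G = H H^T / n2,
# with H_ij ~ N(0, 1); SNR = log(||L2||_F^2 / sigma^2), as on the slide.
def noisy_summary(L2, sigma, rng=None):
    rng = rng or np.random.default_rng()
    n2 = L2.shape[0]
    H = rng.standard_normal((n2, n2))
    return L2 + sigma**2 * (H @ H.T / n2)

def snr(L2, sigma):
    return np.log(np.linalg.norm(L2, "fro") ** 2 / sigma**2)
```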
Sensitivity to Θ2 (cond. correlated latent var.)

Figure: Jaccard distance error of DiLat-GGM vs. LV-GGM; Erdős–Rényi graph.
Sensitivity to Θ2 (cond. correlated latent var.)
Figure: comparison of learned network topologies (three 30-node graph panels).
Sensitivity to Θ2 (cond. indep. latent var.)

Figure: Jaccard distance error of DiLat-GGM vs. LV-GGM for $\Theta_2 = L_2$ and $\Theta_2 = I$; complete binary tree.
Sensitivity to Θ2
Figure: comparison of learned network topologies (three 31-node graph panels).