Semi-blind Subgraph Reconstruction in Gaussian Graphical Models

†Tianpei Xie, ⋆Sijia Liu, ⋆Alfred O. Hero
†Transaction Risk Management Team @ Amazon
⋆University of Michigan, Ann Arbor

This work was completed while Tianpei Xie was a PhD student at the University of Michigan.
Outline

1. Problem Motivations
2. Model Formulation
3. Experiments
4. Conclusion
Background

• Learning a dependency graph from relational data is a key step in data visualization and analysis. Examples include
  1. recommendation systems
  2. social network analysis [Goyal et al., 2010]
  3. sensor network analysis [Joshi and Boyd, 2009, Liu et al., 2016]
• However, in many situations only a limited set of data is accessible, due to
  • limited budgets during data collection (e.g., labor, energy)
  • restricted access to data sources (e.g., data security, privacy)
• Semi-blind subgraph topology learning problem: we only see data on a subgraph and are blind to the rest.
Semi-blind subgraph topology learning problem
Challenges

• Challenges:
  • external latent data influence the target network ⇒ bias in inference
  • in probabilistic models, marginalizing over latent variables ⇒ false positives in edge detection

Figure: The red nodes are conditionally independent given the blue node. Marginalizing out the blue node creates a false connection in the graph.

• Assumption: additional information from external sources ⇒ summary info. of the latent data
Settings

• Random graph signal $x \in \mathbb{R}^n \sim \mathcal{N}(0, \Theta^{-1})$, Markovian w.r.t. $G = (V, E)$, $|V| = n$ $\Rightarrow$ $x_i \perp\!\!\!\perp x_j \mid x_{-\{i,j\}} \Leftrightarrow \Theta_{i,j} = 0 \Leftrightarrow (i,j) \notin E$.
• Consider
  • a non-overlapping partition $V = V_1 \cup V_2$, $|V_1| = n_1$, $|V_2| = n_2$; the edge set between $V_1$ and $V_2$ is denoted $E_{1,2}$
  • accessible $x_1 := x_{V_1} \in \mathbb{R}^{n_1}$, inaccessible (latent) $x_2 := x_{V_2} \in \mathbb{R}^{n_2}$
  • precision matrix $\Theta := \begin{bmatrix} \Theta_1 & \Theta_{12} \\ \Theta_{21} & \Theta_2 \end{bmatrix}$
  • $G_1 = (V_1, E_1)$, the target network; $E_1 := E \cap (V_1 \times V_1) \Leftrightarrow \Theta_1$
• Goal: estimate $\Theta_1$, given (see the data-generation sketch below)
  1. accessible data on $V_1$: $x_1 \Rightarrow \Sigma_1 := E[x_1 x_1^T]$, the sample marginal covariance
  2. a noisy summary $\Theta_2 \in \mathbb{R}^{n_2 \times n_2}$ of the inverse covariance of $x_2$, shared by external sources
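Below, a minimal data-generation sketch of this setting, assuming a toy 6-node chain graph in place of $G$ (sizes and edge weights are illustrative, not the paper's):

```python
import numpy as np

# Toy instance of the setting: x ~ N(0, Theta^{-1}) is Markov w.r.t. a
# 6-node chain graph; only the V_1 block is observed. Illustrative only.
rng = np.random.default_rng(0)
n, n1, m = 6, 4, 400                          # |V| = n, |V_1| = n1, m samples
Theta = 2.0 * np.eye(n)                       # chain-graph precision matrix
for i in range(n - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = -0.8  # edges (i, i+1) only

x = rng.multivariate_normal(np.zeros(n), np.linalg.inv(Theta), size=m)
x1 = x[:, :n1]                                # accessible data on V_1
Sigma1 = x1.T @ x1 / m                        # sample marginal covariance
Theta2 = Theta[n1:, n1:]                      # latent-block summary (noise-free here)
```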
Network topology learning from partially shared information
Related works

• Without latent variables, many algorithms estimate $\Theta$ of a Gaussian graphical model, e.g.
  1. $\ell_1$-regularized ML, such as gLasso [Friedman et al., 2008]
  2. quadratic approximation, QUIC [Hsieh et al., 2011]
  3. $\ell_0$-regularized ML [Marjanovic and Hero, 2015]
• With latent variables, to estimate the sub-matrix $\Theta_1$ of the full precision $\Theta$:
  1. the latent variable Gaussian graphical model (LV-GGM) of [Chandrasekaran et al., 2012]
  2. Key identity (a Schur complement; numerical check below):
     $$(\Sigma_1)^{-1} = \underbrace{\Theta_1}_{\text{sparse}} - \underbrace{\Theta_{12}\,\Theta_2^{-1}\,\Theta_{21}}_{\text{low-rank}} =: C - M \;\Rightarrow\; \text{signal} + \text{confounding factor}$$
  3. Disadvantages:
     • the effect of latent variables is uniform and global; it does not change during propagation
     • it does not exploit the dependency structure among latent variables
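The key identity above is just the Schur complement of the partitioned precision matrix; here is a quick numerical check with an arbitrary random positive-definite precision (not the paper's data):

```python
import numpy as np

# Check: (Sigma_1)^{-1} = Theta_1 - Theta_12 Theta_2^{-1} Theta_21,
# i.e., the marginal precision of x_1 is the Schur complement of Theta_2.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Theta = A @ A.T + 5.0 * np.eye(5)        # random positive-definite precision
n1 = 3                                   # observed block size
Theta1, Theta12 = Theta[:n1, :n1], Theta[:n1, n1:]
Theta21, Theta2 = Theta[n1:, :n1], Theta[n1:, n1:]

Sigma1 = np.linalg.inv(Theta)[:n1, :n1]  # marginal covariance of x_1
schur = Theta1 - Theta12 @ np.linalg.inv(Theta2) @ Theta21
print(np.allclose(np.linalg.inv(Sigma1), schur))  # True
```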
Global influence model vs. decayed influence model

(a) Global influence by LV-GGM:
  • $E_{21}$ dense
  • no edges among nodes in $V_2$, i.e., $x_2$ conditionally independent given $x_1$

(b) Decayed-influence latent variable model:
  • $E_{21}$ sparse
  • edges among nodes in $V_2$, i.e., conditional dependence $\sim \Theta_2$
Our contributions

• We propose the decayed-influence latent variable Gaussian graphical model (DiLat-GGM), which
  1. accounts for the decayed influence effect during the propagation of information
  2. fully utilizes the shared dependency information from external sources
  3. supports latent variable inference and selection
LV-GGM vs. DiLat-GGM

|                          | LV-GGM | DiLat-GGM |
|--------------------------|--------|-----------|
| variables                | $C \in \mathbb{R}^{n_1 \times n_1}$, $M \in \mathbb{R}^{n_1 \times n_1}$ | $C \in \mathbb{R}^{n_1 \times n_1}$, $B := \Theta_{12}\Theta_2^{-1} \in \mathbb{R}^{n_1 \times n_2}$ |
| known                    | $\Sigma_1, \alpha, \beta$ | $\Sigma_1, \alpha, \beta$, $\Theta_2 \succ 0 \in \mathbb{R}^{n_2 \times n_2}$ |
| constraint               | $\Theta_1 = C - M \succ 0$ | $\Theta_1 = C - B\Theta_2 B^T \succ 0$ |
| key                      | $M \succeq 0$, low-rank | $\Theta_{21} = \Theta_2 B^T = \begin{bmatrix} 0 \\ \Theta_{\delta V_2,1} \end{bmatrix}$, row-sparse |
| inference on latent var. | No | Yes: $p(x_2 \mid x_1) = \mathcal{N}(\mu_{2\mid 1}, \Theta_2)$, $\mu_{2\mid 1} = B^T x_1$ (sketch below) |
| latent feature selection | No | Yes |
| convexity                | Yes | No |
| implementation           | ADMM | Convex-concave procedure (CCP) + ADMM |
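For the "inference on latent variables" row, a one-line sketch of the Gaussian conditional mean; it uses the textbook formula $\mu_{2\mid 1} = -\Theta_2^{-1}\Theta_{21} x_1$, which matches the table's $B^T x_1$ up to the sign convention chosen for $B$:

```python
import numpy as np

# Gaussian conditioning in a GGM: p(x_2 | x_1) has precision Theta_2 and a
# mean linear in x_1. Textbook sign shown; the slide's B^T x_1 agrees up to
# the sign convention used when defining B.
def latent_conditional_mean(Theta2, Theta21, x1):
    return -np.linalg.solve(Theta2, Theta21 @ x1)
```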
The decayed-influence latent variable Gaussian graphical model

The proposed DiLat-GGM solves
$$\min_{C,B} \; -\log\det\!\left(C - B\Theta_2 B^T\right) + \mathrm{tr}\!\left(\Sigma_1\!\left(C - B\Theta_2 B^T\right)\right) + \underbrace{\alpha_m \|C\|_1}_{\text{sparsity of cond. graph}} + \underbrace{\beta_m \left\|\Theta_2 B^T\right\|_{2,1}}_{\text{sparsity of } E_{\text{cross}} \text{ \& latent feat. sel.}}$$
$$\text{s.t.} \quad C - B\Theta_2 B^T \succ 0,$$
where
• $\left\|\Theta_2 B^T\right\|_{2,1} := \sum_{i \in V_2} \left\|[\Theta_2 B^T]_i\right\|_2$ is the mixed $\ell_{2,1}$ norm over rows.
• An external source provides $\Theta_2$ $\Rightarrow$ partial correlations of $x_2$.
• DiLat-GGM is a difference-of-convex program and can be solved via the convex-concave procedure (CCP) [Yuille et al., 2002, Lipp and Boyd, 2016]. (An evaluation sketch of the objective follows below.)
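As a reading aid, a sketch that simply evaluates this objective (shapes as on the slide: $C$ is $n_1 \times n_1$, $B$ is $n_1 \times n_2$); the actual optimization uses CCP + ADMM, not naive evaluation:

```python
import numpy as np

# Evaluate the DiLat-GGM objective for given (C, B); illustrative only.
def dilat_ggm_objective(C, B, Sigma1, Theta2, alpha_m, beta_m):
    K = C - B @ Theta2 @ B.T                          # candidate Theta_1
    sign, logdet = np.linalg.slogdet(K)
    if sign <= 0:                                     # violates K > 0 constraint
        return np.inf
    row_norms = np.linalg.norm(Theta2 @ B.T, axis=1)  # rows of Theta_2 B^T
    return (-logdet + np.trace(Sigma1 @ K)
            + alpha_m * np.abs(C).sum()               # l1 term on C
            + beta_m * row_norms.sum())               # mixed l_{2,1} term
```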
The convex-concave procedure

• Example: find $x^* = \arg\min_x (f(x) - g(x))$, where $f$ and $g$ are convex.
• Iteratively solve $x_t := \arg\min_x \left( f(x) - g(x_{t-1}) - \nabla g(x_{t-1})^T (x - x_{t-1}) \right)$.
• For DiLat-GGM, $g(B) = \mathrm{tr}\!\left(\Sigma_1 B \Theta_2 B^T\right)$; the rest is $f(\cdot)$. (A toy CCP run follows below.)
Experiments

• Compared algorithms:
  • DiLat-GGM
  • GLasso [Friedman et al., 2008]
  • LV-GGM [Chandrasekaran et al., 2012]
  • EM-GLasso [Yuan, 2012]
  • Generalized Laplacian learning (GenLap) [Pavez and Ortega, 2016]
• $m$ i.i.d. realizations of $x = [x_1, \ldots, x_n]$, with $m = 400$.
• Three types of graphs:
  1. the complete binary tree ($h$ := height)
  2. the grid ($w$ := width, $h$ := height)
  3. the Erdős–Rényi graph $(n, p)$
• The Jaccard distance error [Jaccard, 1901, Choi et al., 2010] for edge selection between two sets $A$, $B$ (implementation sketch below):
  $$\mathrm{dist}_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|} \in [0, 1]$$
  1. $A$ := non-zero support set of the estimated $\Theta_1$
  2. $B := E_1$, the ground-truth edge set
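A direct implementation of this metric, with toy undirected edge sets for illustration:

```python
# Jaccard distance between two edge sets; frozensets make edges undirected.
def jaccard_distance(A, B):
    A, B = set(A), set(B)
    if not A | B:                       # both empty: define distance as 0
        return 0.0
    return 1.0 - len(A & B) / len(A | B)

est   = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}  # estimated support
truth = {frozenset(e) for e in [(0, 1), (1, 2), (1, 3)]}  # ground-truth E_1
print(jaccard_distance(est, truth))     # 0.5
```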
Comparison of mean edge selection error
Comparison of Learned Network

Figure: (a) ground truth, (b) GLasso, (c) LV-GGM, (d) DiLat-GGM.
Conclusion

• We propose DiLat-GGM as a generalization of LV-GGM.
• The proposed model learns network topology from internal data together with a summary of latent factors from an external source.
• An efficient algorithm based on CCP is proposed.
• Future research directions: large-scale network learning, hierarchical models.
Thank you!
References I
Venkat Chandrasekaran, Pablo A. Parrilo, and Alan S. Willsky. Latent variable graphical model selection via convex optimization. The Annals of Statistics, 40(4):1935–1967, 2012.

Seung-Seok Choi, Sung-Hyuk Cha, and Charles C. Tappert. A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, 8(1):43–48, 2010.

Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008.

Amit Goyal, Francesco Bonchi, and Laks V. S. Lakshmanan. Learning influence probabilities in social networks. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, pages 241–250. ACM, 2010.

Cho-Jui Hsieh, Inderjit S. Dhillon, Pradeep K. Ravikumar, and Matyas A. Sustik. Sparse inverse covariance matrix estimation using quadratic approximation. Advances in Neural Information Processing Systems, pages 2330–2338, 2011.
References II
Paul Jaccard. Étude comparative de la distribution florale dans une portion des Alpes et du Jura. Impr. Corbaz, 1901.

Siddharth Joshi and Stephen Boyd. Sensor selection via convex optimization. IEEE Transactions on Signal Processing, 57(2):451–462, 2009.

Thomas Lipp and Stephen Boyd. Variations and extension of the convex–concave procedure. Optimization and Engineering, 17(2):263–287, 2016.

Sijia Liu, Sundeep Prabhakar Chepuri, Makan Fardad, Engin Masazade, Geert Leus, and Pramod K. Varshney. Sensor selection for estimation with correlated measurement noise. IEEE Transactions on Signal Processing, 64(13):3509–3522, 2016.

Goran Marjanovic and Alfred O. Hero. $\ell_0$ sparse inverse covariance estimation. IEEE Transactions on Signal Processing, 63(12):3218–3231, 2015.

Eduardo Pavez and Antonio Ortega. Generalized Laplacian precision matrix estimation for graph signal processing. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6350–6354. IEEE, 2016.
References III
Ming Yuan. Discussion: Latent variable graphical model selection via convex optimization. The Annals of Statistics, 40(4):1968–1972, 2012.

Alan L. Yuille and Anand Rangarajan. The concave-convex procedure (CCCP). Advances in Neural Information Processing Systems, 2:1033–1040, 2002.
DiLat-GGM as a Difference-of-Convex program

$$\min_{C,B} \; \underbrace{-\log\det\!\left(C - B\Theta_2 B^T\right) + \mathrm{tr}(\Sigma_1 C)}_{f(C,B) \text{ convex}} \; - \; \underbrace{\mathrm{tr}\!\left(\Sigma_1 B \Theta_2 B^T\right)}_{g(B) \text{ convex}} \; + \; \text{regularizer}$$
$$\text{s.t.} \quad C - B\Theta_2 B^T \succ 0,$$

• $f(C,B) = -\log\det \begin{bmatrix} C & B \\ B^T & \Theta_2^{-1} \end{bmatrix} + \mathrm{tr}(\Sigma_1 C)$ is convex;
  $g(B) = \mathrm{vec}\!\left(B^T\right)^T \left(\Sigma_1 \otimes \Theta_2\right) \mathrm{vec}\!\left(B^T\right)$ is convex (numerical check below).
• Can be solved via the convex-concave procedure (CCP) [Yuille et al., 2002, Lipp and Boyd, 2016].
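A quick numerical check of the vec/Kronecker form of $g(B)$ above, with random symmetric matrices standing in for $\Sigma_1$ and $\Theta_2$:

```python
import numpy as np

# Check: tr(Sigma1 B Theta2 B^T) = vec(B^T)^T (Sigma1 kron Theta2) vec(B^T).
rng = np.random.default_rng(1)
n1, n2 = 4, 3
S = rng.standard_normal((n1, n1)); Sigma1 = S @ S.T
T = rng.standard_normal((n2, n2)); Theta2 = T @ T.T
B = rng.standard_normal((n1, n2))

lhs = np.trace(Sigma1 @ B @ Theta2 @ B.T)
v = B.T.reshape(-1, order="F")            # vec(B^T): stack columns of B^T
rhs = v @ np.kron(Sigma1, Theta2) @ v
print(np.isclose(lhs, rhs))               # True
```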
The convex sub-problem

At iteration $t$,
$$(C^{t+1}, B^{t+1}) = \arg\min_{C,B} \; \ldots + \mathrm{tr}\!\left(\Sigma_1\!\left(C - 2 B D_t^T\right)\right) \qquad (1)$$
$$\text{s.t.} \quad \ldots$$
where $\nabla_B g(B^t) = 2\Sigma_1 B^t \Theta_2$ and $D_t := B^t \Theta_2$.

• An SDP problem $\Rightarrow$ convex (a CVXPY sketch follows below).
• CCP is a special case of the majorization-minimization (MM) algorithm.
• Guaranteed to converge to a local stationary point, regardless of the choice of initial point.
• Generic SDP solvers have time complexity $O(n^{6.5})$ $\Rightarrow$ an efficient solver based on ADMM, $O(n^3)$.
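A hedged CVXPY sketch of subproblem (1), reusing the block-matrix form of $f$ from the DC-program slide so that `log_det` receives an affine argument; `Sigma1`, `Theta2`, `B_t`, `alpha_m`, `beta_m` are assumed given. This is a generic-solver illustration, not the paper's $O(n^3)$ ADMM routine:

```python
import cvxpy as cp
import numpy as np

# One CCP iteration for DiLat-GGM, solved with a generic conic solver.
def ccp_subproblem(Sigma1, Theta2, B_t, alpha_m, beta_m):
    n1, n2 = Sigma1.shape[0], Theta2.shape[0]
    C = cp.Variable((n1, n1), symmetric=True)
    B = cp.Variable((n1, n2))
    D_t = B_t @ Theta2                                    # D_t := B^t Theta_2
    R = cp.bmat([[C, B], [B.T, np.linalg.inv(Theta2)]])   # affine in (C, B)
    obj = (-cp.log_det(R)                                 # block form of f
           + cp.trace(Sigma1 @ (C - 2 * B @ D_t.T))       # linearized g term
           + alpha_m * cp.sum(cp.abs(C))                  # elementwise l1 on C
           + beta_m * cp.mixed_norm(Theta2 @ B.T, 2, 1))  # l_{2,1} over rows
    prob = cp.Problem(cp.Minimize(obj), [R >> 0])
    prob.solve()
    return C.value, B.value
```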
Solving the sub-problem using ADMM

• Define $R := \begin{bmatrix} C & B \\ B^T & \Theta_2^{-1} \end{bmatrix}$, $P = \begin{bmatrix} P_1 & P_{21}^T \\ P_{21} & P_2 \end{bmatrix} := R$, $W := \Theta_2 P_{21}$.

We reformulate the convex sub-problem as
$$\min_{R,P,W} \; -\log\det R + \mathrm{tr}\!\left(S_t R\right) + \mathbb{1}\{R \succ 0\} + \alpha_m \|P_1\|_1 + \beta_m \|W\|_{2,1} \qquad (2)$$
$$\text{s.t.} \quad P_2 = \Theta_2^{-1}, \quad R = P, \quad W = \Theta_2 P_{21},$$
where $\mathbb{1}\{A\}$ is the indicator function and $S_t := \begin{bmatrix} \Sigma_1 & -\Sigma_1 D_t \\ -D_t^T \Sigma_1 & \gamma_t I \end{bmatrix}$.

• ADMM solves three subproblems, w.r.t. $R$, $P$, and $W$, iteratively (see the $R$-update sketch below).
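The $R$-subproblem in such log-det splittings typically has a closed-form solution via eigendecomposition; a sketch of that standard proximal update (the generic graphical-lasso-style step, assumed here to match the form of the paper's $R$-update):

```python
import numpy as np

# Solve argmin_R  -log det R + tr(S R) + (rho/2) ||R - Z||_F^2  over R > 0.
# Stationarity gives rho R - R^{-1} = rho Z - S; diagonalizing the right side
# as Q diag(lam) Q^T reduces it to scalar quadratics with positive roots.
def logdet_prox(S, Z, rho):
    lam, Q = np.linalg.eigh(rho * Z - S)
    r = (lam + np.sqrt(lam**2 + 4.0 * rho)) / (2.0 * rho)  # positive roots
    return (Q * r) @ Q.T                                    # Q diag(r) Q^T
```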
Sensitivity to α, β

Figure: Jaccard distance error as a function of one regularization parameter, for several values of the other (0.01, 0.1, 0.5, 1, 5); Erdős–Rényi graph with $n = 30$, $p = 0.16$.
Sensitivity to Θ2

Figure: Jaccard distance error vs. SNR (dB) for several $(\alpha, \beta)$ settings: $(0.2, 2.0)$, $(0.1, 1.0)$, $(0.1, 2.0)$.

• $\Theta_2 = L_2 + \sigma^2 G$, where $G = HH^T/n_2$, $H_{i,j} \sim \mathcal{N}(0,1)$, and $L_2$ is the inverse covariance matrix of $x_2$. (A sketch of this noise model follows below.)
• The signal-to-noise ratio (SNR) is defined as $\log\!\left(\frac{\|L_2\|_F^2}{\sigma^2}\right)$ (dB).
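A sketch of this noise model for the external summary ($L_2$ assumed given; $\sigma$ controls the SNR exactly as defined above):

```python
import numpy as np

# Noisy external summary: Theta_2 = L2 + sigma^2 * G, G = H H^T / n2,
# with H_ij ~ N(0, 1); SNR = log(||L2||_F^2 / sigma^2), as on the slide.
def noisy_summary(L2, sigma, rng=None):
    rng = rng or np.random.default_rng()
    n2 = L2.shape[0]
    H = rng.standard_normal((n2, n2))
    return L2 + sigma**2 * (H @ H.T / n2)

def snr(L2, sigma):
    return np.log(np.linalg.norm(L2, "fro") ** 2 / sigma**2)
```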
Sensitivity to Θ2 (cond. correlated latent var.)

Figure: Jaccard distance error of DiLat-GGM vs. LV-GGM; Erdős–Rényi graph.
Sensitivity to Θ2 (cond. correlated latent var.)
Figure: comparison of learned network topologies (three 30-node graph panels).
Sensitivity to Θ2 (cond. indep. latent var.)

Figure: Jaccard distance error of DiLat-GGM vs. LV-GGM for $\Theta_2 = L_2$ and $\Theta_2 = I$; complete binary tree.
Sensitivity to Θ2
Figure: comparison of learned network topologies (three 31-node graph panels).