Upload
others
View
20
Download
0
Embed Size (px)
Citation preview
Gaussian Graphical Models arguable points
Gaussian Graphical Models
Jiang Guo
Junuary 9th, 2013
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points
Contents
1 Gaussian Graphical ModelsUndirected Graphical ModelGaussian Graphical ModelPrecision matrix estimationMain approaches
SparsityGraphical LassoCLIMETIGERGreedy methods
Measure methodsnon-gaussian scenarioApplicationsProject
2 arguable points
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Undirected Graphical Model
Markov Property
pairwise
Xu ⊥⊥ Xv|XV \u,v if u, v /∈ E
localglobal
Conditional Independence
Partial Correlation
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Gaussian Graphical Model
Multivariate Gaussian
X ∼ Nd(ξ,Σ)
If Σ is positive definite, distribution has density on Rd
f(x|ξ,Σ) = (2π)−d/2(detΩ)1/2e−(x−ξ)T Ω(x−ξ)/2
where Ω = Σ−1 is the Precision matrix of the distribution.
Marginal distribution: Xs ∼ Nd′(ξs,Σs,s)Conditional distribution:
X1|X2 ∼ N (ξ1|2,Σ1|2)
where: ξ1|2 = ξ1 + Σ12Σ−122 (x2 − ξ2)
Σ1|2 = Σ11 − Σ12Σ−122 Σ21
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Gaussian Graphical Model
Multivariate Gaussian
Sample Covariance Matrix
Σ =1
n− 1
n∑i=1
(xi − ξ)(xi − ξ)T
Precision MatrixΩ = Σ−1
In high dimensional settings where p n, the Σ is not invertible(semi-positive definite).
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Gaussian Graphical Model
Every Multivariate Gaussian distribution can be represented by apairwise Gaussian Markov Random Field (GMRF)
GMRF: Undirected graph G = (V,E) with
vertex set V = 1, ..., p corresponding to random variablesedge set E = (i, j) ∈ V |i 6= j,Ωij 6= 0
Goal: Estimate sparse graph structuregiven n p iid observations.
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Precision matrix estimation
Graph recovery
also known as ”Graph structure learning/estimation”
For each pair of nodes (variables), decide whether there should bean edge
Edge(α, β) not exists ⇔ α ⊥⊥ β|V \α, β ⇔ Ωα,β = 0
Precision matrix is sparse
Hence, it turns out to be a non-zero patterns detection problem.
x y z w
xyzw
0 1 0 11 0 0 10 0 0 01 1 0 0
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Precision matrix estimation
Parameter Estimation
x y z w
xyzw
0 ? 0 ?? 0 0 ?0 0 0 0? ? 0 0
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Sparsity
The word ”sparsity” has (at least) four related meanings in NLP! (NoahSmith et al.)
1 Data sparsity: N is too small to obtain a good estimate for w.(usually bad)
2 ”Probability” sparsity: most of events receive zero probability
3 Sparsity in the dual: Associated with SVM and other kernel-basedmethods.
4 Model sparsity: Most dimensions of f is not needed for a good hw;those dimensions of w can be zero, leading to a sparse w (model)
We focus on sense #4.
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Sparsity
Linear regression
f(~x) = w0 +
d∑i=1
wi ∗ xi
sparsity means some of the wi(s) are zero
1 problem 1: why do we need sparse solution?
feature/variable selectionbetter interprete the datashrinkage the size of modelcomputatioal savingsdiscourage overfitting
2 problem 2: how to achieve sparse solution?
solutions to come...
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Sparsity
Ordinary Least Square
Objective function to minimize:
L =
N∑i=1
|yi − f(xi)|2
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Sparsity
Ridge: L2 norm Regularization
minL =
N∑i=1
|yi − f(xi)|2 +λ
2‖~w‖22
equaivalent form (constrained optimization):
minL =
N∑i=1
|yi − f(xi)|2
subject to ‖~w‖22 6 C
Corresponds to zero-mean Gaussian prior ~w ∼ N (0, σ2), i.e.p(wi) ∝ exp(−λ‖w‖22)
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Sparsity
Lasso: L1 norm Regularization
minL =
N∑i=1
|yi − f(xi)|2 +λ
2‖~w‖1
equaivalent form (constrained optimization):
minL =
N∑i=1
|yi − f(xi)|2
subject to ‖~w‖1 6 C
Corresponds to zero-mean Laplace prior ~w ∼ Laplace(0, b), i.e.p(wi) ∝ exp(−λ|wi|)sparse solution
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Why lasso sparse? an intuitive interpretation
Lasso Ridge
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Algorithms for the Lasso
Subgradient methods
interior-point methods (Boyd et al 2007)
Least Angle RegreSsion(LARS) (Efron et al 2004), computes entirepath of solutions. State of the Art until 2008.
Pathwise Coordinate Descent (Friedman, Hastie et al2007)
Proximal Gradient (project gradient)
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Pathwise Coordinate Descent for the Lasso
Coordinate descent: optimize one parameter (coordinate) at a time.
How? suppose we had only one predictor, problem is to minimize∑i
(γi − xiβ)2 + λ|β|
Solution is the soft-thresholded estimate
sign(β)(|β| − λ)+
where β is the least square solution. and
sign(z)(|z| − λ)+ =
z − λ if z > 0 and λ < |z|z + λ if z < 0 and λ < |z|0 if λ > |z|
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Pathwise Coordinate Descent for the Lasso
With multiple predictors, cycle through each predictor in turn. Wecompute residuals γi = yi −
∑j 6=k xij βk and apply univariate
soft-thresholding, pretending our data is (xij , ri)
Start with large value for λ (high sparsity) and slowly decrease it.
Exploits current estimation as warm start, leading to a more stablesolution.
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Variable selection
Lasso: L1 regularization
L =
N∑i=1
|yi − f(xi)|2 +λ
2‖~w‖1
Subset selection: L0 regularization
L =
N∑i=1
|yi − f(xi)|2 +λ
2‖~w‖0
Greedy forward
Greedy backward
Greedy forward-backward (Tong Zhang, 2009 and 2011)
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Variable selection
Lasso: L1 regularization
L =
N∑i=1
|yi − f(xi)|2 +λ
2‖~w‖1
Subset selection: L0 regularization
L =
N∑i=1
|yi − f(xi)|2 +λ
2‖~w‖0
Greedy forward
Greedy backward
Greedy forward-backward (Tong Zhang, 2009 and 2011)
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Greedy forward-backward algorithm
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Graphical Lasso
Recall the Gaussian graphical model(multi-variate gaussian)
f(x|ξ,Σ) = (2π)−d/2(detΩ)1/2e−(x−ξ)T Ω(x−ξ)/2
log-likelihoodlog det Ω− trace(ΣΩ)
Graphical lasso (Friedman et al 2007)
Maximize the L1 penalized log-likelihood:
log det Ω− trace(ΣΩ)− λ‖Ω‖1
Coordinate descent
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
CLIME
Constrained L1 Minimization approach to sparse precision matrixEstimation. (CAI et al 2011)
CLIME estimator
min‖Ω‖1 subject to:
‖ΣΩ− I‖∞ 6 λn,Ω ∈ Rp×p
Solution Ω have to be symmetrized
Equivalent to solving the p optimization problems:
min‖β‖1 subject to: ‖Σβ − ei‖∞ 6 λn
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
CLIME
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Gaussian Graphical Model and Column-by-ColumnRegression
Consider the conditional distribution
X1|X2 ∼ N (ξ1|2,Σ1|2)
where: ξ1|2 = ξ1 + Σ12Σ−122 (X2 − ξ2)
Σ1|2 = Σ11 − Σ12Σ−122 Σ21
By standardizing the data matrix, i.e. ξ1 = ξ2 = 0
X1|X2 ∼ N (Σ12Σ−122 X2,Σ11 − Σ12Σ−1
22 Σ21)
Column-by-Column Regression
X1 = αT1 X2 + ε1 where ε1 ∼ N (0, σ1)
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Column-by-Column Regression
node-by-node neighbordetection
Easy to implement
Easy to parallelize, scalable tolarge scale data
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
TIGER
A generalized framework for supervised learning
β = arg minβL(β;X,Y ) + Ω(β)
A Tuning-Insensitive Approach for Optimally Estimating GaussianGraphical Models (Han et al 2012)
SQRT-Lasso
β = arg minβ∈Rd
1√n‖y −Xβ‖2 + λ‖β‖1
Tuning-insensitive (good point)
State-of-the-art
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Greedy methods
High-dimensional (Gaussian) Graphical Model Estimation UsingGreedy Methods (Pradeep et al 2012)
Rather than exploiting lasso-like methods to achieve sparse solution,we apply greedy methods to do variable selection
Global greedy: treat each element of the Precision Matrix as aVariable
Local greedy: Column-by-column fashion
(Potentially) state-of-the-art
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Global Greedy
Estimate graph structure through a series of forward and backwardstagewise steps
Forward Step: Choose ”best” new edge and add to current estimate,as long as decrease in loss δ exceeds stopping criterion.
Backward Step: Choose ”weakest” current edge and remove ifincrease in loss is < product of backward step factor and decrease inloss due to previous forward step (νδ).
”Best” and ”Weakest” determined by sample-based Gaussian MLE
L(Ω) = log det Ω− trace(ΣΩ)
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Local Greedy
Estimates each node’s neighborhood in parallel using a series offorward and backward steps
Forward Step: Choose ”best” new edge and add to current estimate,as long as decrease in loss δ exceeds stopping criterion.
Backward Step: Choose ”weakest” current edge and remove ifincrease in loss is < product of backward step factor and decrease inloss due to previous forward step (νδ).
”Best” and ”Weakest” determined by least-square loss
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Measure methods
Graph recovery: ROC Curves
TP: True positive rate
FP: False positve rate
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Graph recovery: ROC Curves
Decrease the regularization parameter gradually to achieve the solutionpath
An illustration:
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Measure methods
Parameter estimation: norm error
Frobenius norm errorµ‖Ω− Ω‖F
‖A‖F =
√√√√ m∑i=1
n∑j=1
|a2ij |
Spectrum norm error: ‖Ω− Ω‖2
‖A‖2 =√λmax(A ∗A) = σmax(A)
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Performance: Greedy, TIGER, Clime, Glasso
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Performance: Greedy, TIGER, Clime, Glasso
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
non-gaussian scenario
Question?
In a general graph, whether a relationship exists between conditionalindependence and the structure of the precision matrix?
Remain unsolved. Recently, there are some progresses:
High dimensional semiparametric Gaussian copula graphical models(Han et al. 2012)
The nonparanormal: Semiparametric estimation of high dimensionalundirected graphs (Han et al. 2009)
Structure estimation for discrete graphical models: Generalizedcovariance matrices and their inverses (NIPS 2012 OutstandingStudent Paper Awards)
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
non-gaussian scenario
Question?
In a general graph, whether a relationship exists between conditionalindependence and the structure of the precision matrix?
Remain unsolved. Recently, there are some progresses:
High dimensional semiparametric Gaussian copula graphical models(Han et al. 2012)
The nonparanormal: Semiparametric estimation of high dimensionalundirected graphs (Han et al. 2009)
Structure estimation for discrete graphical models: Generalizedcovariance matrices and their inverses (NIPS 2012 OutstandingStudent Paper Awards)
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
non-gaussian scenario
Question?
In a general graph, whether a relationship exists between conditionalindependence and the structure of the precision matrix?
Remain unsolved. Recently, there are some progresses:
High dimensional semiparametric Gaussian copula graphical models(Han et al. 2012)
The nonparanormal: Semiparametric estimation of high dimensionalundirected graphs (Han et al. 2009)
Structure estimation for discrete graphical models: Generalizedcovariance matrices and their inverses (NIPS 2012 OutstandingStudent Paper Awards)
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Applications
Biomatics
Social media
NLP?
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Applications
Biomatics
Social media
NLP?
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Applications
Biomatics
Social media
NLP?
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Applications
Biomatics
Social media
NLP?
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Project: a graphical modeling platform
Graphical Modeling Platform
User
dataset Model
Amazon EC2 (Cloud) & Princeton Server
Numerical Precision Matrix
Graph Visualization· Infovis (Client, Javascript)· Circos (Server, Perl)
· Regression Analysising (future)
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project
Project: a graphical modeling platform
My visualization(1): R Visualization:
Jiang Guo Gaussian Graphical Models
Gaussian Graphical Models arguable points
Arguable points
It is student who should push supervisor, rather than the superviserpush student
Work extremely hard, blindly trust your supervisor
Keep quality first, quantity will follow
Science is about problems, not equations.
Being focused.
Think less, do more.
Open mind.
Jiang Guo Gaussian Graphical Models