
Reading revue of "Inferring Multiple Graphical Structures"


February 17th, 2012 Groupe de travail Graphes, SAMM, Université Paris 1


Reading review of "Inferring Multiple Graphical Structures"

from J. Chiquet et al. (and related articles)

Nathalie Villa-Vialaneix - [email protected]

http://www.nathalievilla.org

Groupe de travail samm-graph - 17/02/2012


Outline

1 Network inference
  • Package GeneNet
  • Package glasso

2 Multiple Graphical Structures


Network inference

Framework

Data: large-scale gene expression data, collected in an n × p matrix

  $X = \big(X_i^j\big)_{1 \le i \le n,\ 1 \le j \le p}$

with rows indexed by individuals ($n \simeq 30$–$50$) and columns by variables (gene expressions, $p \simeq 10^3$–$10^4$).

What we want to obtain: a network with

• nodes: genes;

• edges: significant and direct co-expression between two genes (to track transcription regulations).


Network inference

Advantages of inferring a network from large-scale transcription data

1 over raw data: focuses on direct links (strong indirect correlations are discarded);

2 over raw data (again): focuses on significant links (more robust);

3 over a bibliographic network: can handle interactions with yet unknown (not annotated) genes.


Network inference

Various approaches (and packages) to infer gene co-expression networks

• Graphical Gaussian Model: $(X_i)_{i=1,\dots,n}$ are i.i.d. Gaussian random vectors $\mathcal{N}(0, \Sigma)$ (gene expressions); then

  $j \longleftrightarrow j'$ (genes $j$ and $j'$ are linked) $\Leftrightarrow\ \mathrm{Cor}\big(X^j, X^{j'} \mid (X^k)_{k \neq j, j'}\big) \neq 0$,

  $\mathrm{Cor}\big(X^j, X^{j'} \mid (X^k)_{k \neq j, j'}\big) \simeq \big(\Sigma^{-1}\big)_{jj'}$ ⇒ find the partial correlations by means of $(\widehat{\Sigma}_n)^{-1}$.

  Problem: $\Sigma$ is a $p$-dimensional matrix (with $p$ large) and $n$ is small compared to $p$ ⇒ $(\widehat{\Sigma}_n)^{-1}$ is a poor estimate of $\Sigma^{-1}$!

  • seminal work: [Schäfer and Strimmer, 2005a, Schäfer and Strimmer, 2005b] (with bootstrapping or shrinkage and a proposal for a Bayesian test for significance); package GeneNet;

  • sparse approaches [Friedman et al., 2008]: packages GGMselect [Giraud et al., 2009] or SIMoNe [Chiquet et al., 2009, Ambroise et al., 2009, Chiquet et al., 2011] (with unsupervised clustering, or able to handle multiple-population data).

• Bayesian network learning [Pearl, 1998, Pearl and Russell, 2002]: a DAG (Directed Acyclic Graph) and (conditional) probability tables. Learning: find the conditional probability tables and the DAG. Standard issues:

  • search for unobserved (latent) variable dependencies;

  • estimate probabilities by ML optimization (EM algorithm);

  • search for the DAG (skeleton, directionality): several DAGs are often plausible.

  Package bnlearn [Scutari, 2010].

• Networks based on mutual information (MI): the MI $I(X^j, X^{j'})$ measures the information gain (related to the KL divergence):

  $I(X^j, X^{j'}) = H(X^j) + H(X^{j'}) - H(X^j, X^{j'}) = H(X^j) - H(X^j \mid X^{j'})$

  where $H$ is the entropy, $H(X^j) = -\sum_{x \in \mathcal{X}^j} p(x) \log p(x)$ ($I$ ≃ the uncertainty reduction in one variable after removing the uncertainty in the other variable). Standard issues:

  • estimate $I$;

  • find out which pairs of variables have significant MI.

  Package minet [Meyer et al., 2008]. (A toy sketch of the last two approaches follows this slide.)
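As a reading aid, not taken from the slides, here is a minimal R sketch of the two approaches that are not detailed in the rest of the talk: the Bayesian-network one with bnlearn and the MI-based one with minet. The toy expression matrix, the Spearman estimator and the MRNET/hill-climbing choices are assumptions of this note.

    library(minet)    # MI-based inference [Meyer et al., 2008]
    library(bnlearn)  # Bayesian network learning [Scutari, 2010]

    set.seed(1)
    expr <- matrix(rnorm(50 * 20), nrow = 50,
                   dimnames = list(NULL, paste0("g", 1:20)))  # toy data: n = 50, p = 20

    ## MI route: estimate a mutual-information matrix, then prune it (here with MRNET);
    ## the result is a weighted adjacency matrix between genes.
    mim    <- build.mim(expr, estimator = "spearman")
    net_mi <- mrnet(mim)

    ## Bayesian-network route: hill-climbing search for a DAG that scores the data.
    dag <- hc(as.data.frame(expr))
    arcs(dag)   # directed edges of the learned DAG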


Network inference Package GeneNet

GGM: shrinkage approach

The package GeneNet estimates partial correlations in the Gaussian Graphical Model framework [Schäfer and Strimmer, 2005b]:

• $X = (X^1, \dots, X^p)$ (the $p$ gene expressions): random Gaussian vector with variance $\Sigma$;

• $j \leftrightarrow j' \Leftrightarrow \mathrm{Cor}\big(X^j, X^{j'} \mid (X^k)_{k \neq j, j'}\big) \neq 0 \Leftrightarrow \big(\Sigma^{-1}\big)_{jj'} \neq 0$.

Shrinkage: use $(1 - \lambda)\widehat{\Sigma} + \lambda\Omega$ instead of $\widehat{\Sigma}$ (where $\Omega$ is, e.g., the identity matrix and $\lambda$ is estimated from the data) to stabilize the estimation of $\Sigma^{-1}$ (bagging is also usable [Schäfer and Strimmer, 2005a]).

Significant partial correlations are then selected with a Bayesian test based on a distribution mixture: the partial correlations fit the mixture model

  $\eta_0 f_0(\cdot, \kappa) + \eta_A f_A$

with $\eta_0$ the prior for the null hypothesis, $\eta_A = 1 - \eta_0$, $\eta_0 \gg \eta_A$ ($\eta_0$ and $\kappa$ estimated by EM).

FDR correction: at level $\alpha$ (5% here), keep the edges for which $p_{(i)} \leq \frac{i\,\alpha}{e\,\eta_0}$, where $e$ is the number of candidate edges and $p_{(1)}, p_{(2)}, \dots, p_{(e)}$ are the ordered p-values.
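A minimal R sketch of this workflow, assuming a numeric expression matrix with genes in columns; the toy data and the 0.95 cutoff are assumptions of this note, not values from the talk.

    library(GeneNet)

    set.seed(1)
    expr <- matrix(rnorm(53 * 30), nrow = 53)   # toy stand-in: 53 observations, 30 genes

    ## Shrinkage estimate of the partial-correlation matrix
    ## (lambda chosen from the data, as in Schäfer and Strimmer, 2005b).
    pc <- ggm.estimate.pcor(expr)

    ## Empirical-Bayes test of each partial correlation (mixture eta0*f0 + etaA*fA),
    ## then keep the edges that pass the chosen posterior-probability cutoff.
    edge.stats <- network.test.edges(pc)
    net <- extract.network(edge.stats, cutoff.ggm = 0.95)
    head(net)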


Network inference Package GeneNet

Example

Expression data: 272 genes and 53 observations (pigs...)

Shrinkage approach: 883 edges (density: 2.24%); bootstrap approach: 2345 edges (density: 6.36%).


Network inference Package glasso

Sparse linear regression

Linear regression for each node:

  $\forall\, j = 1, \dots, p, \quad X^j = S_j X^{-j} + \varepsilon_j$

with $X^{-j}$ the gene expressions without gene $j$.

Relation with the network: $j \leftrightarrow j' \Leftrightarrow S_{jj'} \neq 0$.

Estimation [Meinshausen and Bühlmann, 2006]: LS estimate with $L_1$ penalization

  $\forall\, j = 1, \dots, p, \quad \operatorname*{argmin}_{S_j} \sum_{i=1}^{n} \big(X_i^j - S_j X_i^{-j}\big)^2 + \lambda \sum_{j' \neq j} |S_{jj'}|$

Sparse penalization ⇒ only a few $j'$ are such that $S_{jj'} \neq 0$ (variable selection).
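A minimal sketch of this neighborhood-selection idea using glmnet (the package choice, toy data and single penalty level are assumptions of this note; the talk itself relies on glasso/SIMoNe): one lasso regression per gene, then symmetrization with the OR or AND rule.

    library(glmnet)

    set.seed(1)
    n <- 50; p <- 30
    X <- matrix(rnorm(n * p), n, p)
    lambda <- 0.2                        # a single, hand-picked penalty level

    S <- matrix(0, p, p)
    for (j in 1:p) {
      fit <- glmnet(X[, -j], X[, j], lambda = lambda)   # lasso regression of gene j on the others
      S[j, -j] <- as.numeric(coef(fit))[-1]             # drop the intercept
    }

    adj_or  <- (S != 0) | t(S != 0)      # OR policy: edge if either regression selects it
    adj_and <- (S != 0) & t(S != 0)      # AND policy: edge only if both regressions select it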


Network inference Package glasso

Sparse linear regression by pseudo-likelihood maximization

Estimation [Friedman et al., 2008]: the Gaussian framework allows us to use pseudo-ML optimization with a sparse penalization

  $\mathcal{L}(S \mid X) - \lambda\|S\|_1 = \sum_{i=1}^{n} \sum_{j=1}^{p} \log \mathbb{P}\big(X_i^j \mid X_i^{-j}, S_j\big) - \lambda\|S\|_1$

Remark: for [Meinshausen and Bühlmann, 2006], the estimates are not symmetric ⇒ symmetrization is done by the OR or AND policy.
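A minimal sketch of the graphical-lasso route of [Friedman et al., 2008] with the glasso package, which maximizes the L1-penalized Gaussian likelihood; the toy data and the arbitrary rho value are assumptions of this note.

    library(glasso)

    set.seed(1)
    X <- matrix(rnorm(50 * 30), 50, 30)   # toy data: n = 50, p = 30

    S_hat <- cov(X)                       # empirical covariance matrix
    fit   <- glasso(S_hat, rho = 0.2)     # L1-penalized estimate of the precision matrix
    Theta <- fit$wi                       # estimated inverse covariance

    adj <- abs(Theta) > 1e-8              # edges = nonzero off-diagonal entries
    diag(adj) <- FALSE
    sum(adj) / 2                          # number of inferred edges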


Network inference Package glasso

Summary

Density comparison

  Schäfer and Strimmer (shrinkage)         2.24%
  Schäfer and Strimmer (bootstrap)         6.36%
  Friedman et al.                          3.78%
  Meinshausen and Bühlmann (OR policy)     3.24%
  Meinshausen and Bühlmann (AND policy)    1.68%

Edges comparison (number of edges shared by the row and column networks)

                                  Schäfer & Strimmer (883)   Schäfer & Strimmer (2345)   Friedman et al. (1425)
  Schäfer & Strimmer                       883
  Friedman et al.                          883                       1425
  Meinshausen & Bühlmann (1195)            883                       1195                        1195


Multiple Graphical Structures

Framework

T samples measuring the expression of the same genes:

  $X^{1,t}, \dots, X^{p,t}$

for $t = 1, \dots, T$, where each $X^{j,t}$ is an $n_t$-dimensional vector ($n_t$ observations in sample $t$).

Naive approach: independent inferences

  $\mathcal{L}\big(S^t \mid X^t\big) = \sum_{i=1}^{n_t} \sum_{j=1}^{p} \log \mathbb{P}\big(X_i^{j,t} \mid X_i^{-j,t}, S_j^t\big)$

and

  $\operatorname*{arg\,max}_{S^1, \dots, S^T} \; \sum_t \Big(\mathcal{L}\big(S^t \mid X^t\big) - \lambda \|S^t\|_1\Big)$

Problem: this does not use the fact that the samples are actually related... and it produces T networks!
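For reference (not from the slides), a minimal R sketch of this naive strategy: T independent graphical-lasso fits, one per sample, with a toy data set and an arbitrary penalty level.

    library(glasso)

    set.seed(1)
    p <- 20; n_t <- c(25, 30, 40)                  # T = 3 samples with n_t observations each
    X <- lapply(n_t, function(nt) matrix(rnorm(nt * p), nt, p))

    lambda <- 0.2
    nets <- lapply(X, function(Xt) {
      fit <- glasso(cov(Xt), rho = lambda)         # independent sparse fit for sample t
      adj <- abs(fit$wi) > 1e-8
      diag(adj) <- FALSE
      adj                                          # one adjacency matrix per sample
    })
    sapply(nets, sum) / 2                          # T edge counts: T unrelated networks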


Multiple Graphical Structures

3 solutions to address this issue

First note that, in the Gaussian framework,

  $\mathcal{L}(S \mid X) = \frac{n}{2} \log\det(D) - \frac{n}{2} \mathrm{Tr}\big(D^{-1/2} S \widehat{\Sigma} S D^{-1/2}\big) - np$

where $D = \mathrm{Diag}(S_{11}, \dots, S_{pp})$ and $\widehat{\Sigma}$ is the empirical covariance matrix ⇒ $\mathcal{L}(S \mid X) \equiv \mathcal{L}(S \mid \widehat{\Sigma})$.

• Intertwined estimation: use $\widetilde{\Sigma}^t = \alpha \widehat{\Sigma}^t + (1 - \alpha)\bar{\Sigma}$ instead of $\widehat{\Sigma}^t$, where $\bar{\Sigma} = \frac{1}{n} \sum_t n_t \widehat{\Sigma}^t$, and solve

    $\operatorname*{arg\,max}_{S^1, \dots, S^T} \; \sum_t \Big(\mathcal{L}\big(S^t \mid \widetilde{\Sigma}^t\big) - \lambda \|S^t\|_1\Big)$

  Similar to the assumption that each sample is generated from a mixture of Gaussians(?). In the experiments, $\alpha = 1/2$. (A toy sketch follows this list.)

• Group-LASSO: mixed norm

    $\operatorname*{arg\,max} \; \sum_t \mathcal{L}\big(S^t \mid \widehat{\Sigma}^t\big) - \lambda \sum_{j \neq j'} \Big(\sum_t \big(S^t_{jj'}\big)^2\Big)^{1/2}$

  The group norm $\big(\sum_t (S^t_{jj'})^2\big)^{1/2}$ tends to encourage $S_{jj'} = 0$ simultaneously for all $t$, hence should lead to very consensual inferred networks.

• Cooperative-LASSO:

    $\operatorname*{arg\,max} \; \sum_t \mathcal{L}\big(S^t \mid \widehat{\Sigma}^t\big) - \lambda \sum_{j \neq j'} \Bigg[ \underbrace{\Big(\sum_t \big(S^t_{jj'}\big)_+^2\Big)^{1/2}}_{(S_+)_{jj'}} + \underbrace{\Big(\sum_t \big(-S^t_{jj'}\big)_+^2\Big)^{1/2}}_{(S_-)_{jj'}} \Bigg]$

  Takes into account that sign swaps are unlikely across samples (down- and up-regulations).
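A minimal R sketch of the intertwined idea alone (the group- and cooperative-LASSO penalties require a dedicated solver, e.g. the one shipped with the SIMoNe package): each per-sample covariance is blended with the pooled one, with α = 1/2 as in the experiments, before one sparse fit per sample. The toy data and penalty level are assumptions of this note.

    library(glasso)

    set.seed(1)
    p <- 20; n_t <- c(25, 30, 40); n <- sum(n_t)
    X <- lapply(n_t, function(nt) matrix(rnorm(nt * p), nt, p))

    S_hat  <- lapply(X, cov)                                            # per-sample empirical covariances
    S_pool <- Reduce(`+`, Map(function(S, nt) nt * S, S_hat, n_t)) / n  # pooled covariance
    alpha  <- 1/2
    S_tilde <- lapply(S_hat, function(S) alpha * S + (1 - alpha) * S_pool)

    lambda <- 0.2
    nets <- lapply(S_tilde, function(S) {
      fit <- glasso(S, rho = lambda)                                    # sparse fit on the blended covariance
      adj <- abs(fit$wi) > 1e-8
      diag(adj) <- FALSE
      adj
    })
    sapply(nets, sum) / 2                                               # the T networks now share information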


Multiple Graphical Structures

Illustration of Group vs Cooperative LASSO [figure not reproduced]


Multiple Graphical Structures

Comparison [figure not reproduced]


Multiple Graphical Structures

Real-life experiment

[figure not reproduced; remaining label text: independent estimations true - sum of intertwined]


Multiple Graphical Structures

Open questions

• Is the group-lasso type penalty the correct approach to the biological problem?

• How to combine the networks: to analyze the differences between networks (distances between graphs?), or to build a unique consensual network from all samples (mean network, AND network, OR network... ?)

• Could it be relevant to penalize the sparse regression problem by an additional regularization (e.g., the distance between each network and a consensual network)?


References

Ambroise, C., Chiquet, J., and Matias, C. (2009).
Inferring sparse Gaussian graphical models with latent structure. Electronic Journal of Statistics, 3:205–238.

Chiquet, J., Grandvalet, Y., and Ambroise, C. (2011).
Inferring multiple graphical structures. Statistics and Computing, 21(4):537–553.

Chiquet, J., Smith, A., Grasseau, G., Matias, C., and Ambroise, C. (2009).
SIMoNe: Statistical Inference for MOdular NEtworks. Bioinformatics, 25(3):417–418.

Friedman, J., Hastie, T., and Tibshirani, R. (2008).
Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.

Giraud, C., Huet, S., and Verzelen, N. (2009).
Graph selection with GGMselect. Technical report, preprint arXiv. http://fr.arxiv.org/abs/0907.0619.

Meinshausen, N. and Bühlmann, P. (2006).
High dimensional graphs and variable selection with the lasso. Annals of Statistics, 34(3):1436–1462.

Meyer, P., Lafitte, F., and Bontempi, G. (2008).
minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics, 9(461).

Pearl, J. (1998).
Probabilistic reasoning in intelligent systems: networks of plausible inference. In Kaufmann, M., editor, Representation and Reasoning Series (2nd printing ed.). San Francisco, California, USA.


Pearl, J. and Russell, S. (2002).
Bayesian networks. In Arbib, M. A., editor, Handbook of Brain Theory and Neural Networks. Bradford Books (MIT Press), Cambridge, Massachusetts, USA.

Schäfer, J. and Strimmer, K. (2005a).
An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics, 21(6):754–764.

Schäfer, J. and Strimmer, K. (2005b).
A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4:1–32.

Scutari, M. (2010).
Learning Bayesian networks with the bnlearn R package. Journal of Statistical Software, 35(3):1–22.
