
Graphical Models: Learning

Pradeep Ravikumar

Carnegie Mellon University

Learning Graphical Models

• In many contexts, we do not have an a priori specified graphical model distribution

• All we have access to are i.i.d. samples drawn from an *unknown* graphical model distribution

• We would like to estimate the graphical model distribution from the data:

‣ the graph

‣ the factor functions

• We focus on parametric graphical models, where the factor functions have a specific parametric form, so this entails estimating parameters (rather than general functions)

Learning Undirected Graphical Models

We will focus on pairwise graphical models:

p(X; θ, G) = (1/Z(θ)) exp( Σ_{(s,t) ∈ E(G)} θ_st φ_st(X_s, X_t) )

φ_st(x_s, x_t): arbitrary potential functions

• Ising: φ_st(x_s, x_t) = x_s x_t

• Potts: φ_st(x_s, x_t) = I(x_s = x_t)

• Indicator: φ_st(x_s, x_t) = I(x_s = j, x_t = k)
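To make the pairwise form concrete, here is a minimal sketch that evaluates the (un)normalized density under each of the three potential choices. The 3-node chain graph and the weights `theta` are illustrative assumptions, not an example from the slides:

```python
import itertools
import math

# Toy sketch of the pairwise model above; the 3-node chain and the
# weights theta are illustrative assumptions.
def potential(kind, xs, xt, j=1, k=1):
    if kind == "ising":          # phi(xs, xt) = xs * xt
        return xs * xt
    if kind == "potts":          # phi(xs, xt) = I(xs == xt)
        return float(xs == xt)
    if kind == "indicator":      # phi(xs, xt) = I(xs == j, xt == k)
        return float(xs == j and xt == k)
    raise ValueError(kind)

def unnormalized(x, edges, theta, kind):
    # exp( sum_{(s,t) in E} theta_st * phi_st(x_s, x_t) )
    return math.exp(sum(th * potential(kind, x[s], x[t])
                        for th, (s, t) in zip(theta, edges)))

edges = [(0, 1), (1, 2)]         # chain graph on 3 nodes
theta = [0.5, -0.3]

# Z(theta) normalizes by summing over all {-1,+1}^3 configurations.
Z = sum(unnormalized(x, edges, theta, "ising")
        for x in itertools.product([-1, 1], repeat=3))
probs = {x: unnormalized(x, edges, theta, "ising") / Z
         for x in itertools.product([-1, 1], repeat=3)}
print(round(sum(probs.values()), 6))  # probabilities sum to 1
```

Swapping `"ising"` for `"potts"` or `"indicator"` changes only the factor functions, not the overall form of the model.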

Graphical Model Selection

Let G = (V, E) be an undirected graph on p = |V| vertices.

Pairwise Markov random field: family of probability distributions

P(x_1, …, x_p; θ) = (1/Z(θ)) exp( Σ_{(s,t) ∈ E} θ_st x_s x_t )

Problem of graph selection: given n independent and identically distributed (i.i.d.) samples of X = (X_1, …, X_p), identify the underlying graph structure.

Martin Wainwright (UC Berkeley), High-dimensional graph selection, September 2009

Given: n samples of X = (X_1, …, X_p) with distribution p(X; θ, G), where

p(X; θ) = exp( Σ_{(s,t) ∈ E(G)} θ_st φ_st(x_s, x_t) − A(θ) )

Problem: Estimate the graph G given just the n samples.

Samples from binary-valued pairwise MRFs

[Figure: sample patterns under the independence model (θ_st = 0), medium coupling (θ_st ≈ 0.2), and strong coupling (θ_st ≈ 0.8).]



Learning Graphical Models

• Two Step Procedures:

‣ 1. Model Selection; estimate graph structure

‣ 2. Parameter Inference given graph structure

• Score Based Approaches: search over space of graphs, with a score for graph based on parameter inference

• Constraint-based Approaches: estimate individual edges by hypothesis tests for conditional independences

• Caveats: (a) it is difficult to provide guarantees for these estimators; (b) the estimators are NP-hard to compute

Learning Graphical Models

• State-of-the-art methods are based on estimating neighborhoods

‣ Via high-dimensional statistical model estimation

‣ Via high-dimensional hypothesis tests

Ising Model Selection

Given: n samples of X = (X_1, …, X_p) with distribution p(X; θ, G), where

p(X; θ) = exp( Σ_{(s,t) ∈ E(G)} θ_st X_s X_t − A(θ) )

Problem: Estimate the graph G given just the n samples.


Applications: statistical physics, computer vision, social network analysis

US Senate 109th Congress

Banerjee et al, 2008


Ising Model Selection

• Just computing the likelihood of a known Ising model is NP-hard (since the normalization constant requires summing over exponentially many configurations)

• Estimating the unknown Ising model parameters, as well as the graph structure, might seem to be NP-hard as well

• On the other hand, it is tractable to estimate the node-wise conditional distributions, of one variable conditioned on the rest of the variables

Z(θ) = Σ_{x ∈ {−1,1}^p} exp( Σ_{(s,t)} θ_st x_s x_t )
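A brute-force evaluation of Z(θ) makes the exponential blow-up visible: the outer sum has 2^p terms. This sketch (a complete graph with small random weights, an illustrative assumption) is feasible only for tiny p:

```python
import itertools
import math
import random

# Brute-force partition function for a small Ising model; the random
# complete-graph weights here are an illustrative assumption.
def log_Z(p, theta):
    # theta: dict (s, t) -> theta_st; the outer sum has 2**p terms
    total = 0.0
    for x in itertools.product([-1, 1], repeat=p):
        total += math.exp(sum(w * x[s] * x[t] for (s, t), w in theta.items()))
    return math.log(total)

random.seed(0)
for p in (4, 8, 12):
    theta = {(s, t): random.gauss(0, 0.1)
             for s in range(p) for t in range(s + 1, p)}
    print(p, 2 ** p, round(log_Z(p, theta), 3))  # term count doubles per node
```

At p = 30 the sum would already have over a billion terms, which is why likelihood-based learning of the global model is avoided.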


Neighborhood Estimation in Ising Models

• So instead of estimating a graph-structure-constrained global Ising model, we could estimate structure-constrained local node-conditional distributions, i.e., logistic regression models

• But would node-conditional distributions uniquely specify a consistent joint, or even be consistent with any joint at all?

For Ising models, the node-conditional distribution is just a logistic regression model:

p(X_r | X_{V\r}; θ, G) = exp( Σ_{t ∈ N(r)} 2θ_rt X_r X_t ) / ( exp( Σ_{t ∈ N(r)} 2θ_rt X_r X_t ) + 1 )
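The logistic form can be verified numerically: on a toy 3-node Ising model (assumed chain weights, not from the slides), the conditional computed directly from the joint matches it exactly, because the partition function and all edge terms not touching node r cancel in the ratio:

```python
import itertools
import math

# Toy 3-node Ising model with assumed edge weights (a chain 0-1-2).
theta = {(0, 1): 0.7, (1, 2): -0.4}

def joint_unnorm(x):
    return math.exp(sum(w * x[s] * x[t] for (s, t), w in theta.items()))

def exact_conditional(r, x):
    # P(X_r = x_r | rest): Z and non-incident edge terms cancel
    flipped = list(x)
    flipped[r] = -x[r]
    return joint_unnorm(x) / (joint_unnorm(x) + joint_unnorm(tuple(flipped)))

def logistic_conditional(r, x):
    # exp(sum_{t in N(r)} 2 theta_rt x_r x_t) / (same + 1)
    a = sum(2 * w * x[s] * x[t] for (s, t), w in theta.items() if r in (s, t))
    return math.exp(a) / (math.exp(a) + 1)

for x in itertools.product([-1, 1], repeat=3):
    for r in range(3):
        assert abs(exact_conditional(r, x) - logistic_conditional(r, x)) < 1e-12
print("node conditionals match the logistic form")
```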


Conditional and Joint Distributions

• Would node-conditional distributions uniquely specify a consistent joint, or even be consistent with any joint at all?

• In general: no!

• But for the Ising model and node-wise logistic regression models: yes!

• Theorem (Besag 1974; R., Wainwright, Lafferty 2010): An Ising model uniquely specifies, and is uniquely specified by, a set of node-wise logistic regression models.

Neighborhood Estimation in Ising Models

• Global graph constraint of sparse, bounded degree graphs is equivalent to local constraint of bounded node-degrees (number of neighbors)

• Estimate node neighborhoods via constrained logistic regression models, and stitch the node-neighborhoods together to form the global graph

For Ising models, the node-conditional distribution is just a logistic regression model:

p(X_r | X_{V\r}; θ, G) = exp( Σ_{t ∈ N(r)} 2θ_rt X_r X_t ) / ( exp( Σ_{t ∈ N(r)} 2θ_rt X_r X_t ) + 1 )

Graph selection via neighborhood regression

Observation: Recovering the graph G is equivalent to recovering the neighborhood set N(s) for all s ∈ V.

Method: Given n i.i.d. samples X^(1), …, X^(n), perform logistic regression of each node X_s on X_\s := {X_t, t ≠ s} to estimate the neighborhood structure N̂(s).

1. For each node s ∈ V, perform ℓ1-regularized logistic regression of X_s on the remaining variables X_\s:

θ̂[s] := argmin_{θ ∈ R^(p−1)} { (1/n) Σ_{i=1}^n f(θ; X^(i)_\s) + ρ_n ‖θ‖_1 }

where the first term is the logistic likelihood and the second is the regularization penalty.

2. Estimate the local neighborhood N̂(s) as the support (non-zero entries) of the regression vector θ̂[s].

3. Combine the neighborhood estimates in a consistent manner (AND or OR rule).
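The three steps above can be sketched end-to-end on synthetic data. Everything below is an illustrative assumption (a 5-node chain, Gibbs sampling for the data, a plain proximal-gradient (ISTA) solver for the ℓ1-regularized logistic regression, and ad hoc tuning constants), not the paper's experimental setup:

```python
import numpy as np

# End-to-end sketch of neighborhood regression on synthetic data.
# The chain graph, Gibbs sampler, ISTA solver, and constants below are
# illustrative assumptions.
rng = np.random.default_rng(0)
p, n = 5, 4000
true_edges = {(0, 1), (1, 2), (2, 3), (3, 4)}
W = np.zeros((p, p))
for s, t in true_edges:
    W[s, t] = W[t, s] = 0.8

def gibbs_sample(n, burn=200):
    # Sample the Ising model via its logistic node conditionals
    x = rng.choice([-1.0, 1.0], size=p)
    out = []
    for it in range(burn + n):
        for r in range(p):
            a = 2.0 * (x @ W[r])                # 2 * sum_t theta_rt x_t
            x[r] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-a)) else -1.0
        if it >= burn:
            out.append(x.copy())
    return np.array(out)

def l1_logistic(X, y, rho, steps=300, lr=0.1):
    # ISTA: gradient step on the logistic loss, then soft-thresholding
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        margin = (X @ theta) * y
        grad = -(X * (y / (1.0 + np.exp(margin)))[:, None]).mean(axis=0)
        theta = theta - lr * grad
        theta = np.sign(theta) * np.maximum(np.abs(theta) - lr * rho, 0.0)
    return theta

X = gibbs_sample(n)
rho = 0.5 * np.sqrt(np.log(p) / n)
nbhd = []
for r in range(p):
    th = l1_logistic(np.delete(X, r, axis=1), X[:, r], rho)
    others = [t for t in range(p) if t != r]
    nbhd.append({others[j] for j in np.flatnonzero(np.abs(th) > 0.1)})

# AND rule: keep (s, t) only if each node appears in the other's estimate
est_edges = {(min(s, t), max(s, t))
             for s in range(p) for t in nbhd[s] if s in nbhd[t]}
print(sorted(est_edges))
```

The OR rule would instead keep an edge whenever either endpoint selects it; the AND rule used here is the more conservative choice.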

Empirical behavior: Unrescaled plots

[Figure: probability of success versus raw sample size n (0 to 600) for a star graph with a linear fraction of neighbors, at p = 64, 100, and 225.]

Sufficient conditions for consistent model selection

Graph sequences G_{p,d} = (V, E) with p vertices and maximum degree d.

Edge weights |θ_st| ≥ θ_min for all (s, t) ∈ E.

Draw n i.i.d. samples, and analyze the probability of success indexed by (n, p, d).

Theorem (R., Wainwright, Lafferty, 2010)

Under incoherence conditions, for a rescaled sample size

θ_LR(n, p, d) := n / (d³ log p) > θ_crit

and regularization parameter ρ_n ≥ c₁ τ √(log p / n), with probability greater than 1 − 2 exp(−c₂ (τ − 2) log p) → 1:

(a) Uniqueness: For each node s ∈ V, the ℓ1-regularized logistic convex program has a unique solution. (Non-trivial, since p ≫ n implies the program is not strictly convex.)

(b) Correct exclusion: The estimated sign neighborhood N̂(s) correctly excludes all edges not in the true neighborhood.

(c) Correct inclusion: For θ_min ≥ c₃ τ √d ρ_n, the method selects the correct signed neighborhood.

Consequence: For θ_min = Ω(1/d), it suffices to have n = Ω(d³ log p).

Assumptions

Define the Fisher information matrix of the logistic regression: Q* := E_{θ*}[∇² f(θ*; X)].

A1. Dependency condition: bounded eigenspectra:

C_min ≤ λ_min(Q*_SS), and λ_max(Q*_SS) ≤ C_max;

λ_max(E_{θ*}[X Xᵀ]) ≤ D_max.

A2. Incoherence: there exists a ν ∈ (0, 1] such that

|||Q*_{SᶜS} (Q*_SS)^(−1)|||_{∞,∞} ≤ 1 − ν,

where |||A|||_{∞,∞} := max_i Σ_j |A_ij|.

• Bounds on eigenvalues are fairly standard.

• The incoherence condition is partly necessary (prevention of degenerate models) and partly an artifact of ℓ1-regularization.

• The incoherence condition is weaker than correlation decay.

Other Undirected Graphical Models

• Similar estimators work for other undirected parametric graphical models as well

• Discrete/Categorical Graphical Models (Jalali, Ravikumar, Sanghavi, Ruan 2011)

• Gaussian Graphical Models (Ravikumar, Raskutti, Wainwright, Yu 2011)

• Exponential Family Graphical Models (Yang, Ravikumar, Allen, Liu 2015)

Example: Mixed Graphical Models

Experiments: Cancer Genomic and Transcriptomic Data

Combine 'Level III RNA-sequencing' data and 'Level II non-silent somatic mutation and Level III copy number variation' data for 697 breast cancer patients.

[Figure: learned gene network over the combined expression and mutation variables.]

TPGM - Ising graphical model

(Yellow) Gene expression via RNA-sequencing, count-valued

(Blue) Genomic mutation, binary mutation status

Well known components: (DLK1, THSD4) - (TP53)

(UT Austin) Mixed Graphical Models via Exponential Families AISTATS 2014 22 / 25


Poisson-Ising Models

Example: Poisson Graphical Models

• MicroRNA network learnt from The Cancer Genome Atlas (TCGA) Breast Cancer Level II Data

Case Study: Biological Results

SPGM miRNA Network:

An Important Special Case: Poisson Graphical Model

Joint distribution:

P(X) = exp( Σ_s θ_s X_s + Σ_{(s,t) ∈ E} θ_st X_s X_t − Σ_s log(X_s!) − A(θ) )

Node-conditional distributions:

P(X_s | X_{V\s}) ∝ exp( (θ_s + Σ_{t ∈ N(s)} θ_st X_t) X_s − log(X_s!) )

The pairwise variant is discussed as the "Poisson auto-model" in (Besag, 1974).
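A quick numerical check (with toy values for θ_s, θ_st, and the neighbor value X_t, all assumed) that the node-conditional above is a Poisson distribution with rate λ = exp(θ_s + Σ_t θ_st X_t):

```python
import math

# Assumed toy parameters; a negative coupling keeps the joint normalizable.
theta_s, theta_st, x_t = 0.2, -0.3, 2

eta = theta_s + theta_st * x_t   # natural parameter of the conditional
# Unnormalized conditional weights exp(eta * k - log k!) for k = 0, 1, ...
weights = [math.exp(eta * k - math.lgamma(k + 1)) for k in range(50)]
Z = sum(weights)                 # truncation error is negligible here
lam = math.exp(eta)
for k in range(10):
    poisson_pmf = math.exp(-lam) * lam ** k / math.factorial(k)
    assert abs(weights[k] / Z - poisson_pmf) < 1e-9
print("node conditional is Poisson with rate", round(lam, 4))
```

This is the Poisson analogue of the Ising/logistic correspondence: each node-conditional is a generalized linear model whose natural parameter is linear in the neighboring variables.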

Learning Directed Graphical Models

Moralization

• The undirected graphical model corresponding to the moralized graph is the smallest undirected graphical model that includes the directed graphical model distribution

• Learning the undirected graphical model structure given i.i.d. samples from a directed graphical model would thus estimate the moralized graph

Recall: Moralization

[Figure: (a) a directed graph on X1, …, X6 and (b) its moralized undirected counterpart.]
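Moralization itself is mechanical: marry all co-parents, then drop edge directions. A minimal sketch, on an assumed 6-node DAG (not necessarily the exact graph in the figure):

```python
from itertools import combinations

# Moralize a DAG given as a dict: node -> list of its parents.
def moralize(parents):
    edges = set()
    for child, pa in parents.items():
        for u in pa:
            edges.add(tuple(sorted((u, child))))   # drop edge direction
        for u, v in combinations(pa, 2):
            edges.add(tuple(sorted((u, v))))       # marry co-parents
    return edges

dag = {1: [], 2: [1], 3: [1], 4: [2], 5: [3], 6: [4, 5]}
print(sorted(moralize(dag)))
# edge (4, 5) appears even though the DAG has no arc between 4 and 5
```

The married edge (4, 5) is exactly why learning an undirected model from directed-model samples recovers the moralized graph rather than the DAG skeleton.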

Learning Directed Graphical Models

• Two Step Process:

• Learn the undirected “moralized” graph

• Orient the edges using conditional independence tests

• PC algorithm — Spirtes, Glymour & Scheines (2000)

• Open Problem: the algorithms for the second step (and for overall directed graphical model estimation) are not very scalable
