
Page 1:

Learning Module Networks

Eran Segal, Stanford University

Joint work with: Dana Pe'er (Hebrew U.), Daphne Koller (Stanford), Aviv Regev (Harvard), Nir Friedman (Hebrew U.)

Page 2:

Learning Bayesian Networks

Density estimation
  Model the data distribution in the population
  Probabilistic inference: prediction, classification

Dependency structure
  Interactions between variables
  Causality, scientific discovery

[Figure: data matrix and a Bayesian network learned over the stocks INTL, MSFT, MOT, NVLS]

Page 3:

Stock Market

Learn the dependency of stock prices as a function of:
  Global influencing factors
  Sector influencing factors
  Prices of other major stocks

[Figure: stock price time series from Jan.'02 to Jan.'03 for MSFT, DELL, INTL, NVLS, MOT]

Page 5:

Stock Market

Learn the dependency of stock prices as a function of:
  Global influencing factors
  Sector influencing factors
  Prices of other major stocks

[Figure: the same price series, together with a Bayesian network learned over MSFT, DELL, INTL, NVLS, MOT]

Page 6:

Stock Market

4411 stocks (variables), 273 trading days (instances), Jan.'02 - Mar.'03

[Figure: fragment of the learned Bayesian network]

Problems:
  Statistical robustness
  Interpretability

Page 7:

Key Observation

Many stocks depend on the same influencing factors in much the same way

Example: Intel, Novellus, Motorola, and Dell all depend on the price of Microsoft

Many other domains have similar characteristics:
  Gene expression
  Collaborative filtering
  Computer network performance
  ...

[Figure: stock price time series from Jan.'02 to Jan.'03 for MSFT, DELL, INTL, NVLS, MOT]

Page 8:

The Module Network Idea

[Figure: a Bayesian network vs. a module network over INTL, MSFT, MOT, DELL, AMAT, HPQ. In the Bayesian network each variable has its own CPD (CPD 1-6); in the module network the variables are partitioned into Module I, Module II, and Module III, and all variables in a module share a single CPD (CPD 1-3)]

Page 9:

Problems and Solutions

Statistical robustness: share parameters and dependencies between variables with similar behavior

Interpretability: explicit modeling of the modular structure

Page 10:

Outline

Module Network
  Probabilistic model
  Learning the model
Experimental results

Page 11:

Module Network Components

Module assignment function A:
  A(MSFT) = M_I
  A(MOT) = A(DELL) = A(INTL) = M_II
  A(AMAT) = A(HPQ) = M_III

[Figure: the variables INTL, MSFT, MOT, DELL, AMAT, HPQ partitioned into Module I, Module II, Module III]

Page 12:

Module Network Components

Module assignment function A
Set of parents for each module:
  Pa(M_I) = ∅
  Pa(M_II) = {MSFT}
  Pa(M_III) = {DELL, INTL}

[Figure: module network in which MSFT (Module I) is the parent of Module II, and DELL and INTL (Module II) are the parents of Module III]

Page 13:

Module Network Components

Module assignment function A
Set of parents for each module
CPD template for each module

[Figure: module network over INTL, MSFT, MOT, DELL, AMAT, HPQ with Modules I-III]
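These three components (assignment, per-module parents, and a CPD template shared by all of a module's variables) fit in a small data structure. A minimal sketch in Python; the names (ModuleNetwork, ground_parents) are illustrative, not from the paper's code:

```python
from dataclasses import dataclass


@dataclass
class ModuleNetwork:
    assignment: dict       # A: variable -> module, e.g. {"MSFT": "M_I", ...}
    module_parents: dict   # Pa: module -> set of parent variables
    cpd_template: dict     # module -> shared CPD parameters for all its variables

    def ground_parents(self, variable):
        """Parents of a variable in the ground Bayesian network: its module's parents."""
        return self.module_parents[self.assignment[variable]]


# Example matching the slides:
mn = ModuleNetwork(
    assignment={"MSFT": "M_I", "MOT": "M_II", "DELL": "M_II", "INTL": "M_II",
                "AMAT": "M_III", "HPQ": "M_III"},
    module_parents={"M_I": set(), "M_II": {"MSFT"}, "M_III": {"DELL", "INTL"}},
    cpd_template={"M_I": None, "M_II": None, "M_III": None},  # CPDs omitted in this sketch
)
print(mn.ground_parents("AMAT"))   # {'DELL', 'INTL'}
```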

Page 14:

Ground Bayesian Network

A module network induces a ground Bayesian network over X

A module network defines a coherent probability distribution over X if the ground Bayesian network is acyclic

[Figure: the module network and the ground Bayesian network it induces over INTL, MSFT, MOT, DELL, AMAT, HPQ]
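A hedged sketch of the grounding step, reusing the illustrative structure from the previous sketch: every variable simply inherits the parent set of its module.

```python
def ground_bn_edges(assignment, module_parents):
    """Edges of the ground Bayesian network: (parent variable -> variable) pairs."""
    return [(p, x) for x, m in assignment.items() for p in module_parents[m]]


# With the example assignment above this yields ("MSFT", "MOT"), ("MSFT", "DELL"),
# ("MSFT", "INTL"), ("DELL", "AMAT"), ("INTL", "AMAT"), ("DELL", "HPQ"), ("INTL", "HPQ").
```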

Page 15:

Module Graph

Nodes correspond to modules
M_i → M_j if at least one variable in M_i is a parent of M_j

Theorem: the ground Bayesian network is acyclic if the module graph is acyclic
Acyclicity can therefore be checked efficiently on the module graph

[Figure: module network and its module graph M_I → M_II → M_III]
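A small sketch of this check in Python. Building the module graph and running depth-first cycle detection is one possible implementation; the helper names are illustrative.

```python
def module_graph_edges(assignment, module_parents):
    """Edges M_i -> M_j whenever some variable assigned to M_i is a parent of M_j."""
    edges = {m: set() for m in module_parents}
    for module, parents in module_parents.items():
        for parent_var in parents:
            edges[assignment[parent_var]].add(module)
    return edges


def is_acyclic(edges):
    """Depth-first search for a back edge in the module graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {m: WHITE for m in edges}

    def dfs(m):
        color[m] = GRAY
        for nxt in edges[m]:
            if color[nxt] == GRAY or (color[nxt] == WHITE and not dfs(nxt)):
                return False
        color[m] = BLACK
        return True

    return all(dfs(m) for m in edges if color[m] == WHITE)
```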

Page 16:

Outline

Module Network
  Probabilistic model
  Learning the model
Experimental results

Page 17:

Learning Overview

Given data D, find the assignment function A and the structure S that maximize the Bayesian score:

  Score(S, A : D) = log P(D | S, A) + log P(S, A)

The first term is the marginal likelihood of the data; the second is the assignment/structure prior.

The marginal data likelihood integrates the data likelihood against the parameter prior:

  P(D | S, A) = ∫ P(D | S, A, θ) P(θ | S, A) dθ

Page 18:

Likelihood Function

The likelihood function decomposes by modules

Parameters are shared within each module: θ_{M_I}, θ_{M_II|MSFT}, θ_{M_III|DELL,INTL}

Ŝ(X, Y) denotes the sufficient statistics of (X, Y); a module's statistics pool those of its variables, e.g.
  Ŝ(M_II, MSFT) = Ŝ(DELL, MSFT) + Ŝ(MOT, MSFT) + Ŝ(INTL, MSFT)

[Figure: data instances 1-3 and the module network over INTL, MSFT, MOT, DELL, AMAT, HPQ]
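To make the pooling concrete, here is a minimal sketch for fully discrete data: the (parents, child) counts of every variable assigned to the module are summed into one table. The function name and the choice of plain count statistics are illustrative; the slide does not fix a particular CPD family.

```python
import numpy as np


def module_sufficient_stats(data, assignment, module, parents, n_values):
    """Pool the (parents, child) count table over every variable assigned to `module`.

    data: dict variable -> 1-D integer array (one value per instance)
    parents: ordered list of parent variable names
    n_values: number of discrete values each variable can take
    """
    counts = np.zeros((n_values,) * (len(parents) + 1))
    for x, m in assignment.items():
        if m != module:
            continue
        for i in range(len(data[x])):
            idx = tuple(data[p][i] for p in parents) + (data[x][i],)
            counts[idx] += 1   # statistics are shared across the module's variables
    return counts
```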

Page 19:

Bayesian Score Decomposition

The Bayesian score decomposes by modules:

  score(S, A : D) = Σ_{j=1..k} score_{M_j}(Pa_{M_j}, X_{M_j} : D)

where X_{M_j} are the variables of module j and Pa_{M_j} are its parents. For the example network:

  score_1(∅, X_{M_1} : D) + score_2(MSFT, X_{M_2} : D) + score_3({DELL, INTL}, X_{M_3} : D)

A structure change touches a single term: deleting INTL from the parents of Module III only replaces score_3({DELL, INTL}, X_{M_3} : D) with score_3(DELL, X_{M_3} : D)

[Figure: module network over INTL, MSFT, MOT, DELL, AMAT, HPQ]
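Because the score is a sum of per-module terms, evaluating a structure operator only requires rescoring the module it touches. A hedged sketch; module_score stands in for the per-module Bayesian score, whose exact form the slide does not spell out.

```python
def delta_delete_parent(module_score, module, parents, parent_to_remove, assignment, data):
    """Score change from deleting one parent of `module`; only that module's term changes."""
    before = module_score(module, parents, assignment, data)
    after = module_score(module, parents - {parent_to_remove}, assignment, data)
    return after - before   # apply the deletion if this is positive
```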

Page 20:

Bayesian Score Decomposition

The Bayesian score decomposes by modules:

  score(S, A : D) = Σ_{j=1..k} score_{M_j}(Pa_{M_j}, X_{M_j} : D)

  score_1(∅, X_{M_1} : D) + score_2(MSFT, X_{M_2} : D) + score_3(DELL, X_{M_3} : D)

A reassignment touches two terms: changing A(MOT) = 2 to A(MOT) = 1 (moving MOT from Module II to Module I) only changes score_1 and score_2

[Figure: module network over INTL, MSFT, MOT, DELL, AMAT, HPQ]

Page 21:

Algorithm Overview

Find the assignment function A and the structure S that maximize the Bayesian score:

  Score(S, A : D) = log P(D | S, A) + log P(S, A)

[Diagram: find an initial assignment A, then iterate between improving the dependency structure S and improving the assignments A]

Page 22:

Initial Assignment Function

Find variables that behave similarly across the instances and place them in the same module

Example: MOT, INTL, and DELL behave similarly across the trading days, so A(MOT) = A(INTL) = A(DELL) = M_II

[Figure: data matrix with variables (stocks) as columns and instances (trading days) x[1]..x[4] as rows, reordered into groups 1, 2, 3]
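A minimal sketch of this initialization, assuming k-means over the columns of the data matrix as the similarity-based grouping; the particular clustering method is an illustrative choice, not the one specified on the slide.

```python
import numpy as np
from sklearn.cluster import KMeans


def initial_assignment(data_matrix, variable_names, n_modules, seed=0):
    """data_matrix: instances x variables; returns {variable: module label}."""
    km = KMeans(n_clusters=n_modules, n_init=10, random_state=seed)
    labels = km.fit_predict(data_matrix.T)   # cluster the columns (variables)
    return dict(zip(variable_names, labels.tolist()))
```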

Page 23:

Algorithm Overview

[Same diagram as Page 21: find an initial assignment A, then iterate between improving the structure S and improving the assignments A]

Page 24:

Learning Dependency Structure

Heuristic search with operators:
  Add/delete a parent of a module
  Edges cannot be reversed

Acyclicity is handled by checking the module graph, which can be done efficiently

Efficient computation: after applying an operator to module M_j, only the scores of operators for module M_j need to be updated

[Figure: module network and its module graph over M_I, M_II, M_III with example operators; adding INTL as a parent of Module I is rejected because it would make the module graph cyclic, while adding INTL as a parent of Module III is allowed]
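Putting these pieces together, a hedged sketch of the module-level greedy search with add/delete-parent operators and the module-graph acyclicity test. module_score, is_acyclic, and module_graph_edges are the illustrative stand-ins from the earlier sketches, not the paper's implementation.

```python
def learn_structure(assignment, module_parents, variables, data, module_score):
    """Greedy add/delete of module parents; keep moves that stay acyclic and raise the score."""
    improved = True
    while improved:
        improved = False
        for module in module_parents:
            current = module_score(module, module_parents[module], assignment, data)
            for var in variables:
                parents = set(module_parents[module])
                # toggle var as a parent of this module
                candidate = parents - {var} if var in parents else parents | {var}
                trial = {**module_parents, module: candidate}
                if not is_acyclic(module_graph_edges(assignment, trial)):
                    continue
                new = module_score(module, candidate, assignment, data)
                if new > current:
                    module_parents[module] = candidate
                    current = new
                    improved = True
    return module_parents
```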

Page 25:

Learning Dependency Structure

Structure search is done at the module level:
  Parent selection
  Acyclicity checking
Reduced search space relative to a Bayesian network
Individual variables are used only for computing sufficient statistics

Page 26:

Algorithm Overview

[Same diagram as Page 21: find an initial assignment A, then iterate between improving the structure S and improving the assignments A]

Page 27:

Learning Assignment Function

Try each module for DELL:
  A(DELL) = M_I: score 0.7

[Figure: DELL tentatively placed in Module I]

Page 28:

Learning Assignment Function

  A(DELL) = M_I: score 0.7
  A(DELL) = M_II: score 0.9

[Figure: DELL tentatively placed in Module II]

Page 29:

Learning Assignment Function

  A(DELL) = M_I: score 0.7
  A(DELL) = M_II: score 0.9
  A(DELL) = M_III: cyclic, not allowed

[Figure: placing DELL in Module III would create a cycle]

Page 30:

Learning Assignment Function

  A(DELL) = M_I: score 0.7
  A(DELL) = M_II: score 0.9
  A(DELL) = M_III: cyclic, not allowed

DELL is assigned to M_II, its highest-scoring legal module

[Figure: DELL placed in Module II]

Page 31:

Ideal Algorithm

Learn the module assignment of all variables simultaneously:

  A = argmax_{A'} score(S, A' : D)

Page 32:

Problem

Due to the acyclicity constraint, the assignments of different variables cannot be optimized separately

[Figure: module network with DELL in Module I, MSFT in Module II, AMAT in Module III, HPQ in Module IV, and its module graph; the candidate reassignments A(DELL) = Module IV and A(MSFT) = Module III interact through the acyclicity constraint]


Page 34:

Learning Assignment Function

Sequential update algorithm:
  Iterate over all variables
  For each variable, find its optimal assignment given the current assignment of all other variables

Efficient computation: when changing an assignment from M_i to M_j, only the scores of modules i and j need to be recomputed
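A hedged sketch of the sequential update, using the same illustrative helpers as before. For simplicity it recomputes the full sum of module scores for each candidate move; a real implementation would recompute only the two affected module scores, as the slide notes.

```python
def update_assignments(assignment, module_parents, data, module_score):
    """Move each variable to its best-scoring module, holding the other assignments fixed."""
    changed = True
    while changed:
        changed = False
        for var in list(assignment):
            best_m, best_total = assignment[var], None
            for m in module_parents:
                trial = {**assignment, var: m}
                if not is_acyclic(module_graph_edges(trial, module_parents)):
                    continue   # reject moves that make the module graph cyclic
                total = sum(module_score(j, module_parents[j], trial, data)
                            for j in module_parents)
                if best_total is None or total > best_total:
                    best_m, best_total = m, total
            if best_m != assignment[var]:
                assignment[var] = best_m
                changed = True
    return assignment
```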

Page 35:

Learning the Model

Initialize the module assignment A
Optimize the structure S
Optimize the module assignment A: for each variable, find its optimal assignment given the current assignment of all other variables
Iterate the two optimization steps

[Figure: module network before and after reassigning MOT]
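The full procedure on this slide, as a hedged sketch that strings together the earlier illustrative pieces (initial_assignment, learn_structure, update_assignments, module_score); stopping when the score no longer improves is an assumption, since the slide does not state a termination rule.

```python
def learn_module_network(data_matrix, variable_names, n_modules, data, module_score):
    assignment = initial_assignment(data_matrix, variable_names, n_modules)
    module_parents = {m: set() for m in set(assignment.values())}
    previous = None
    while True:
        module_parents = learn_structure(assignment, module_parents,
                                         variable_names, data, module_score)
        assignment = update_assignments(assignment, module_parents, data, module_score)
        total = sum(module_score(m, module_parents[m], assignment, data)
                    for m in module_parents)
        if previous is not None and total <= previous:
            break   # stop once the score no longer improves
        previous = total
    return assignment, module_parents
```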

Page 36:

Related Work

[Table comparing Bayesian networks, parameter sharing, PRMs, OOBNs (structure learning: Langseth et al.), and Module Networks along four dimensions: shared structure, shared parameters, learning the parameter sharing, and learning the structure. Module networks share both structure and parameters within a module and learn the sharing and the structure from data]

Page 37:

Outline

Module Network
  Probabilistic model
  Learning the model
Experimental results
  Statistical validation
  Case study: gene regulation

Page 38:

Learning Algorithm Performance

[Plots: Bayesian score (avg. per gene) vs. algorithm iterations (0-20), with structure-change iterations marked; and variables changed (% of total) vs. algorithm iterations]

Page 39: Learning Module Networks Eran Segal Stanford University Joint work with: Dana Pe’er (Hebrew U.) Daphne Koller (Stanford) Aviv Regev (Harvard) Nir Friedman

-800

-750

-700

-650

-600

-550

-500

-450

0 20 40 60 80 100 120 140 160 180 200

Test

data

lik

elih

ood

(p

er

inst

an

ce)

Number of modules

25 instances50 instances

100 instances

200 instances

500 instances

Generalization to Test Data

Synthetic data: 10 modules, 500 variables

Best performance achieved for models with 10 modules

Page 40:

Generalization to Test Data

Synthetic data: 10 modules, 500 variables

[Same plot as Page 39]

The gain beyond 100 training instances is small

Page 41:

Structure Recovery

Synthetic data: 10 modules, 500 variables

[Plot: recovered structure (% correct) vs. number of modules (0-200), for training sets of 25, 50, 100, 200, and 500 instances]

74% of the 2250 parent-child relationships were recovered

Page 42:

Stock Market

4411 variables (stocks), 273 instances (trading days)
Comparison to Bayesian networks (cross-validation)

[Plot: test-data log-likelihood gain per instance vs. number of modules (0-300), with the Bayesian network performance as the baseline; the module network gains roughly 400-600 per instance]

Page 43:

Regulatory Networks

Learn the structure of regulatory networks: which genes are regulated by each regulator

Page 44:

Gene Expression Data

Measures the mRNA level of all genes in one condition

Learn the dependency of gene expression as a function of the expression of regulators

[Figure: expression matrix with genes and experiments, colored from induced to repressed]

Page 45:

Gene Expression

2355 variables (genes), 173 instances (arrays)
Comparison to Bayesian networks

[Plot: test-data log-likelihood gain per instance vs. number of modules (0-500), with the Bayesian network performance as the baseline]

Page 46:

Biological Evaluation

Find sets of co-regulated genes (regulatory modules)
Find the regulators of each module

[Figure: evaluation of the learned regulatory modules; annotations 46/50 and 30/50]

Segal et al., Nature Genetics, 2003

Page 47:

Experimental Design

Hypothesis: regulator 'X' activates process 'Y'
Experiment: knock out 'X' and repeat the experiment

[Figure: regulatory program with true/false tests on HAP4 and Ypl230W; does knocking out X change the outcome?]

Segal et al., Nature Genetics, 2003

Page 48: Learning Module Networks Eran Segal Stanford University Joint work with: Dana Pe’er (Hebrew U.) Daphne Koller (Stanford) Aviv Regev (Harvard) Nir Friedman

wt Ypl230w

0 3 5 7 9 24 0 2 5 7 9 24

(hrs.)

>16x

341 differentially expressed genes

0 7 15 30 60 0 7 15 30 60

wt (min.)

Ppt1

>4x

602

0 5 15 30 60 0 5 15 30 60

wt (min.)

Kin82

>4x

281

Differentially Expressed Genes

Segal et al., Nature Genetics, 2003

Page 49:

Biological Experiment Validation

Were the differentially expressed genes predicted as targets?
Rank modules by enrichment for differentially expressed genes

Ppt1:
  #   Module                                Significance
  14  Ribosomal and phosphate metabolism    8/32, 9e-3
  11  Amino acid and purine metabolism      11/53, 1e-2
  15  mRNA, rRNA and tRNA processing        9/43, 2e-2
  39  Protein folding                       6/23, 2e-2
  30  Cell cycle                            7/30, 2e-2

Ypl230w:
  #   Module                                Significance
  39  Protein folding                       7/23, 1e-4
  29  Cell differentiation                  6/41, 2e-2
  5   Glycolysis and folding                5/37, 4e-2
  34  Mitochondrial and protein fate        5/37, 4e-2

Kin82:
  #   Module                                Significance
  3   Energy and osmotic stress I           8/31, 1e-4
  2   Energy, osmolarity & cAMP signaling   9/64, 6e-3
  15  mRNA, rRNA and tRNA processing        6/43, 2e-2

All three regulators regulate their predicted modules

Segal et al., Nature Genetics, 2003
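A sketch of how a "Significance" column like this could be produced, assuming a standard hypergeometric enrichment test (the slides do not state the exact test used) and reading counts such as 8/32 as differentially expressed genes over module size.

```python
from scipy.stats import hypergeom


def module_enrichment(module_genes, diff_expressed, all_genes):
    """P-value that a module of this size contains at least this many diff. expressed genes."""
    k = len(module_genes & diff_expressed)   # e.g. 8 hits in the module
    n = len(module_genes)                    # e.g. 32 genes in the module
    K = len(diff_expressed)                  # diff. expressed genes overall
    N = len(all_genes)                       # genes considered
    return k, n, hypergeom.sf(k - 1, N, K, n)   # P(X >= k)
```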

Page 50:

Summary

A probabilistic model for learning modules of variables and their structural dependencies

Improved performance over Bayesian networks: statistical robustness, interpretability

Application to gene regulation:
  Reconstruction of many known regulatory modules
  Prediction of targets for unknown regulators
