Learning Module Networks

Eran Segal, Stanford University

Joint work with: Dana Pe’er (Hebrew U.), Daphne Koller (Stanford), Aviv Regev (Harvard), Nir Friedman (Hebrew U.)

Learning Bayesian Networks

Density estimation: model the data distribution in the population

Probabilistic inference: prediction, classification

Dependency structure: interactions between variables, causality, scientific discovery

[Figure: data over example variables INTL, MSFT, MOT, NVLS.]

Stock Market

Learn the dependency of stock prices as a function of: global influencing factors, sector influencing factors, and the price of other major stocks

[Figure: price charts for MSFT, DELL, INTL, NVLS, MOT, Jan. ’02 to Jan. ’03; price axis 10 to 70.]

Bayesian Network

Fragment of learned BN

Stock Market

4411 stocks (variables), 273 trading days (instances), from Jan. ’02 to Mar. ’03

Problems: statistical robustness, interpretability

Key Observation

Many stocks depend on the same influencing factors in much the same way

Example: Intel, Novellus, Motorola, Dell depend on the price of Microsoft

Many other domains have similar characteristics: gene expression, collaborative filtering, computer network performance, …

[Figure: stock price chart, Jan. ’02 to Jan. ’03; price axis 10 to 70.]

The Module Network Idea

[Figure: a Bayesian network over INTL, MSFT, MOT, DELL, AMAT, HPQ with one CPD per variable (CPD 1 to CPD 6), next to a module network over the same variables with one CPD per module (CPD 1 to CPD 3 for Modules I, II, III).]

Problems and Solutions

Statistical robustness: share parameters and dependencies between variables with similar behavior

Interpretability: explicit modeling of modular structure

Outline

Module network: probabilistic model, learning the model

Experimental results

Module Network Components

Module assignment function: A(MSFT) = M_I; A(MOT) = A(DELL) = A(INTL) = M_II; A(AMAT) = A(HPQ) = M_III

Set of parents for each module: Pa(M_I) = ∅; Pa(M_II) = {MSFT}; Pa(M_III) = {DELL, INTL}

CPD template for each module

[Figure: module network with Module I = {MSFT}, Module II = {MOT, DELL, INTL}, Module III = {AMAT, HPQ}.]
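The first two components can be written down as plain dictionaries. The following is a minimal sketch, not the authors' implementation: the class name `ModuleNetwork`, its method names, and the module labels `"M_I"`, `"M_II"`, `"M_III"` are assumptions made for illustration, and the CPD templates are left out because the slides do not specify their form.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ModuleNetwork:
    """Hypothetical container for the assignment function A and the parent sets Pa."""
    assignment: Dict[str, str]      # variable -> module, i.e. A(X)
    parents: Dict[str, Set[str]]    # module -> parent variables, i.e. Pa(M)

    def members(self, module: str) -> List[str]:
        """Variables assigned to the given module, in sorted order."""
        return sorted(v for v, m in self.assignment.items() if m == module)

    def ground_parents(self, variable: str) -> Set[str]:
        """Parents of a variable in the ground BN: the parents of its module."""
        return self.parents[self.assignment[variable]]


# The example network from the slides.
net = ModuleNetwork(
    assignment={"MSFT": "M_I", "MOT": "M_II", "DELL": "M_II",
                "INTL": "M_II", "AMAT": "M_III", "HPQ": "M_III"},
    parents={"M_I": set(), "M_II": {"MSFT"}, "M_III": {"DELL", "INTL"}},
)
print(net.members("M_II"))
print(net.ground_parents("HPQ"))
```

Every variable in a module inherits the module's parent set, which is what `ground_parents` expresses.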

Ground Bayesian Network

A module network induces a ground BN over X

A module network defines a coherent probability distribution over X if the ground BN is acyclic

[Figure: module network and the ground Bayesian network it induces over INTL, MSFT, MOT, DELL, AMAT, HPQ.]

Module Graph

Nodes correspond to modules; M_i → M_j if at least one variable in M_i is a parent of M_j

[Figure: module network and its module graph M_I → M_II → M_III.]

Theorem: the ground BN is acyclic if the module graph is acyclic. Acyclicity can be checked efficiently using the module graph.
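The theorem means the acyclicity test only needs a graph with one node per module. Below is a minimal sketch of that check, assuming the assignment and parent sets are stored as dictionaries; the function names `module_graph` and `is_acyclic` are hypothetical, and the cycle test is a standard topological sort (Kahn's algorithm) rather than anything prescribed by the slides.

```python
def module_graph(assignment, parents):
    """Edges M_i -> M_j whenever some variable in M_i is a parent of M_j."""
    edges = {m: set() for m in parents}
    for child, parent_vars in parents.items():
        for var in parent_vars:
            edges[assignment[var]].add(child)
    return edges


def is_acyclic(edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    indegree = {m: 0 for m in edges}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    ready = [m for m, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        m = ready.pop()
        removed += 1
        for t in edges[m]:
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    return removed == len(edges)


# Example network from the slides.
assignment = {"MSFT": "M_I", "MOT": "M_II", "DELL": "M_II",
              "INTL": "M_II", "AMAT": "M_III", "HPQ": "M_III"}
parents = {"M_I": set(), "M_II": {"MSFT"}, "M_III": {"DELL", "INTL"}}

print(is_acyclic(module_graph(assignment, parents)))   # legal network

# Making INTL a parent of Module I adds M_II -> M_I and closes a cycle.
with_cycle = dict(parents, M_I={"INTL"})
print(is_acyclic(module_graph(assignment, with_cycle)))
```

The check runs over the modules, not over the individual variables, so its cost depends on the number of modules and module-level edges.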


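Learning Overview

Given data D, find the assignment function A and structure S that maximize the Bayesian score

Marginal data likelihood (data likelihood integrated against the parameter prior):

$$P(D \mid S, A) = \int P(D \mid S, A, \theta)\, P(\theta \mid S, A)\, d\theta$$

Bayesian score (marginal likelihood plus assignment/structure prior):

$$\mathrm{Score}(S, A : D) = \log P(D \mid S, A) + \log P(S, A)$$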

Likelihood Function

[Figure: example module network with data instances 1, 2, 3 and the module CPDs for M_I, M_II | MSFT, and M_III | DELL, INTL.]

Sufficient statistics of (X, Y): $\hat{S}(X, Y)$

$$\hat{S}(M_{II}, \mathrm{MSFT}) = \hat{S}(\mathrm{INTL}, \mathrm{MSFT}) + \hat{S}(\mathrm{MOT}, \mathrm{MSFT}) + \hat{S}(\mathrm{DELL}, \mathrm{MSFT})$$

Likelihood function decomposes by modules
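Because all variables in a module share one CPD, the module's sufficient statistics are the sum of the per-variable statistics. The sketch below illustrates this under a loud assumption that is not in the slides: variables are discretized to "up"/"down", so the sufficient statistics are simple joint counts. The function name `pooled_counts` and the two data rows are made up for illustration.

```python
from collections import Counter


def pooled_counts(data, members, parent):
    """Joint counts of (parent value, child value), pooled over all module members."""
    counts = Counter()
    for row in data:
        for var in members:
            counts[(row[parent], row[var])] += 1
    return counts


# Two made-up instances (trading days) with discretized price moves.
data = [
    {"MSFT": "up",   "INTL": "up",   "MOT": "up", "DELL": "down"},
    {"MSFT": "down", "INTL": "down", "MOT": "up", "DELL": "down"},
]

module_stats = pooled_counts(data, ["INTL", "MOT", "DELL"], "MSFT")

# The same statistics obtained by summing the per-variable counts.
summed = Counter()
for var in ["INTL", "MOT", "DELL"]:
    summed += pooled_counts(data, [var], "MSFT")

print(module_stats == summed)
```

Each instance contributes one observation per module member, so a module with three variables sees three times as many samples for its shared CPD as a single variable would.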

Bayesian Score Decomposition

Bayesian score decomposes by modules

$$\mathrm{score}(S, A : D) = \sum_{j=1}^{k} \mathrm{score}_{M_j}(\mathrm{Pa}_{M_j}, X^{j} : D)$$

where Pa_{M_j} are the parents of module j and X^j are the variables of module j.

For the example network, the terms are score_{M_1}(∅, X^1 : D), score_{M_2}(MSFT, X^2 : D), and score_{M_3}({DELL, INTL}, X^3 : D)

Deleting INTL → Module III changes only the Module III term, which becomes score_{M_3}(DELL, X^3 : D)

Changing A(MOT) = 2 to A(MOT) = 1 changes only the Module I and Module II terms

Algorithm Overview

Find the assignment function A and structure S that maximize the Bayesian score

$$\mathrm{Score}(S, A : D) = \log P(D \mid S, A) + \log P(S, A)$$

Find an initial assignment A, then alternate between two steps: improve the dependency structure S, and improve the assignment function A

Initial Assignment Function

[Figure: data matrix with variables (stocks) AMAT, MSFT, DELL, MOT, HPQ, INTL as columns and instances (trading days) x[1] to x[4] as rows; the columns are then grouped into clusters 1, 2, 3.]

Find variables that are similar across instances

A(MOT) = M_II, A(INTL) = M_II, A(DELL) = M_II
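One way to "find variables that are similar across instances" is to cluster the columns of the data matrix. The slides do not say which clustering method is used, so the sketch below is an assumption: average-linkage agglomerative clustering on Pearson correlation, with hypothetical names `pearson` and `initial_assignment` and made-up data. It also assumes no column is constant, since the correlation would then be undefined.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def initial_assignment(columns, k):
    """Merge the two most correlated clusters until k remain; return variable -> module index."""
    names = list(columns)
    clusters = [[name] for name in names]
    while len(clusters) > k:
        best_sim, best_pair = float("-inf"), None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [pearson(columns[u], columns[v])
                        for u in clusters[a] for v in clusters[b]]
                sim = sum(sims) / len(sims)          # average linkage
                if sim > best_sim:
                    best_sim, best_pair = sim, (a, b)
        a, b = best_pair
        clusters[a].extend(clusters.pop(b))          # b > a, so index a stays valid
    return {v: m for m, cluster in enumerate(clusters, start=1) for v in cluster}


# Made-up data: A and B move together, C and D move together.
trend = [1, 2, 3, 4, 5, 6, 7, 8]
zigzag = [1, -1, 1, -1, 1, -1, 1, -1]
columns = {
    "A": trend,
    "B": [2 * t + 1 for t in trend],
    "C": zigzag,
    "D": [3 * z for z in zigzag],
}
initial = initial_assignment(columns, k=2)
print(initial)
```

The result is only a starting point; the later assignment-improvement step is free to move variables between modules.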

Learning Dependency Structure

Heuristic search with operators: add or delete a parent for a module; edges cannot be reversed

Handle acyclicity: can be checked efficiently on the module graph

Efficient computation: after applying an operator for module M_j, only update the scores of operators for module M_j

[Figure: candidate operators on the example network: INTL → Module I (marked X), INTL → Module III, MSFT → Module II (marked X).]

Learning Dependency Structure

Structure search is done at the module level: parent selection, acyclicity checking

Reduced search space relative to a BN

Individual variables are used only for computing sufficient statistics

Learning Assignment Function

Candidate reassignments of DELL:

A(DELL) = M_I: score 0.7
A(DELL) = M_II: score 0.9
A(DELL) = M_III: cyclic! (DELL is a parent of Module III)

[Figure: example module network with DELL placed in each of Modules I, II, III in turn.]

Ideal Algorithm

Learn the module assignment of all variables simultaneously:

$$A = \arg\max_{A'} \mathrm{score}(S, A' : D)$$

Problem

Due to acyclicity, the assignment cannot be optimized for each variable separately

[Figure: module network with DELL in Module I, MSFT in Module II, AMAT in Module III, HPQ in Module IV, and its module graph over M_I, M_II, M_III, M_IV; candidate moves A(DELL) = Module IV and A(MSFT) = Module III.]


Sequential update algorithm: iterate over all variables; for each variable, find its optimal assignment given the current assignment to all other variables

Efficient computation: when changing an assignment from M_i to M_j, only the scores for modules i and j need to be recomputed
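The sequential update can be sketched as a loop that tries each variable in every other module, skips moves that make the module graph cyclic, and compares only the two affected module scores. This is a sketch under assumptions, not the authors' code: `has_cycle`, `sequential_update`, `toy_score`, and the `feature` values are hypothetical, and the toy score (negative within-module spread of a made-up feature) stands in for the per-module Bayesian score.

```python
def has_cycle(assignment, parents):
    """Depth-first search for a cycle in the module graph."""
    edges = {m: set() for m in parents}
    for child, parent_vars in parents.items():
        for var in parent_vars:
            edges[assignment[var]].add(child)
    state = {}

    def visit(m):
        if state.get(m) == "open":
            return True
        if state.get(m) == "done":
            return False
        state[m] = "open"
        if any(visit(t) for t in edges[m]):
            return True
        state[m] = "done"
        return False

    return any(visit(m) for m in edges)


def sequential_update(assignment, parents, module_score):
    """Reassign one variable at a time, holding all other assignments fixed."""
    assignment = dict(assignment)

    def members(m):
        return frozenset(v for v, a in assignment.items() if a == m)

    changed = True
    while changed:
        changed = False
        for var in sorted(assignment):
            old = assignment[var]
            best_module, best_gain = old, 1e-12
            for new in parents:
                if new == old:
                    continue
                # Only the scores of the old and new module are affected.
                before = module_score(old, members(old)) + module_score(new, members(new))
                assignment[var] = new
                legal = not has_cycle(assignment, parents)
                after = module_score(old, members(old)) + module_score(new, members(new))
                assignment[var] = old
                if legal and after - before > best_gain:
                    best_module, best_gain = new, after - before
            if best_module != old:
                assignment[var] = best_module
                changed = True
    return assignment


# Made-up one-dimensional "behaviour" per stock, used only by the toy score.
feature = {"MSFT": 0.0, "MOT": 1.0, "DELL": 1.1, "INTL": 0.9, "AMAT": 5.0, "HPQ": 5.2}


def toy_score(module, member_set):
    """Stand-in score (an assumption): negative squared spread of the members' features."""
    if not member_set:
        return 0.0
    values = [feature[v] for v in member_set]
    mean = sum(values) / len(values)
    return -sum((x - mean) ** 2 for x in values)


parents = {"M_I": set(), "M_II": {"MSFT"}, "M_III": {"DELL", "INTL"}}
start = {"MSFT": "M_I", "DELL": "M_I", "MOT": "M_II",
         "INTL": "M_II", "AMAT": "M_III", "HPQ": "M_III"}
updated = sequential_update(start, parents, toy_score)
print(updated["DELL"])
```

In this toy run DELL starts in Module I and is moved to Module II; moving it to Module III is rejected as cyclic because DELL is a parent of Module III. The loop stops once a full pass changes nothing, which it must eventually do because every accepted move strictly increases the total score.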

Learning the Model

Initialize module assignment A

Optimize structure S

Optimize module assignment A: for each variable, find its optimal assignment given the current assignment to all other variables

[Figure: example module network over INTL, MSFT, MOT, DELL, AMAT, HPQ.]

Related Work

[Table: Bayesian networks, parameter sharing, PRMs, OOBNs, and module networks compared on four properties: shared structure, shared parameters, learn parameter sharing, learn structure. Entries are X marks, plus "Langseth et al." and "N/A".]

Outline

Experimental results: statistical validation; case study: gene regulation

Learning Algorithm Performance

[Figure, two panels over algorithm iterations 0 to 20: Bayesian score (avg. per gene), y-axis -131 to -128; variables changed (% of total), y-axis 0 to 50, with structure change iterations marked.]

Generalization to Test Data

Synthetic data: 10 modules, 500 variables

[Figure: test data likelihood (per instance), y-axis -800 to -450, vs. number of modules, 0 to 200; curves for 25, 50, 100, 200, and 500 instances.]

Best performance achieved for models with 10 modules

Gain beyond 100 instances is small

Structure Recovery

Synthetic data: 10 modules, 500 variables

[Figure: recovered structure (% correct), y-axis 0 to 90, vs. number of modules, 0 to 200; curves for 100, 200, and 500 instances.]

74% of 2250 parent-child relationships recovered

Stock Market

4411 variables (stocks), 273 instances (trading days)

Comparison to Bayesian networks (cross validation)

[Figure: test data log-likelihood (gain per instance), y-axis 400 to 600, vs. number of modules, 0 to 300, with Bayesian network performance shown as the baseline.]

Regulatory Networks

Learn the structure of regulatory networks: which genes are regulated by each regulator

Gene Expression Data

Measures mRNA level for all genes in one condition

Learn the dependency of the expression of genes as a function of the expression of regulators

[Figure: expression matrix of genes by experiments; color scale from repressed to induced.]

Gene Expression

2355 variables (genes), 173 instances (arrays)

Comparison to Bayesian networks

[Figure: test data log-likelihood (gain per instance), y-axis -150 to 150, vs. number of modules, 0 to 500, with Bayesian network performance shown as the baseline.]

Biological Evaluation

Find sets of co-regulated genes (regulatory module)

Find the regulators of each module

Segal et al., Nature Genetics, 2003

46/50

30/50

Experimental Design

Hypothesis: regulator ‘X’ activates process ‘Y’

Experiment: knock out ‘X’ and repeat the experiment

[Figure: example regulation tree with HAP4 and Ypl230W as true/false splits and a candidate regulator X.]

Differentially Expressed Genes

[Figure: expression time courses, wild type (wt) vs. each tested regulator strain.]

Regulator   Time points                                      Scale   Differentially expressed genes
Ypl230w     0, 3, 5, 7, 9, 24 and 0, 2, 5, 7, 9, 24 (hrs.)   >16x    341
Ppt1        0, 7, 15, 30, 60 (min.)                          >4x     602
Kin82       0, 5, 15, 30, 60 (min.)                          >4x     281


Were the differentially expressed genes predicted as targets?

Rank modules by enrichment for diff. expressed genes

Ppt1
#    Module                                 Significance
14   Ribosomal and phosphate metabolism     8/32, 9e-3
11   Amino acid and purine metabolism       11/53, 1e-2
15   mRNA, rRNA and tRNA processing         9/43, 2e-2
39   Protein folding                        6/23, 2e-2
30   Cell cycle                             7/30, 2e-2

Ypl230w
#    Module                                 Significance
39   Protein folding                        7/23, 1e-4
29   Cell differentiation                   6/41, 2e-2
5    Glycolysis and folding                 5/37, 4e-2
34   Mitochondrial and protein fate         5/37, 4e-2

Kin82
#    Module                                 Significance
3    Energy and osmotic stress I            8/31, 1e-4
2    Energy, osmolarity & cAMP signaling    9/64, 6e-3
15   mRNA, rRNA and tRNA processing         6/43, 2e-2

Biological Experiments Validation

All regulators regulate predicted modules


Summary

Probabilistic model for learning modules of variables and their structural dependencies

Improved performance over Bayesian networks: statistical robustness, interpretability

Application to gene regulation: reconstruction of many known regulatory modules; prediction of targets for unknown regulators
