Learning Module Networks
Eran Segal, Stanford University
Joint work with: Dana Pe’er (Hebrew U.), Daphne Koller (Stanford), Aviv Regev (Harvard), Nir Friedman (Hebrew U.)
Learning Bayesian Networks
Density estimation
  Model data distribution in population
  Probabilistic inference: prediction, classification
Dependency structure
  Interactions between variables
  Causality
  Scientific discovery
[Figure: data and a network over INTL, MSFT, MOT, NVLS]
Stock Market
Learn dependency of stock prices as a function of
  Global influencing factors
  Sector influencing factors
  Price of other major stocks
[Figure: time series for MSFT, DELL, INTL, NVLS and MOT, Jan. ’02 to Jan. ’03, vertical axis 10 to 70, with a network over the same five stocks]
[Figure: Bayesian network, fragment of learned BN]
Stock Market
4411 stocks (variables)
273 trading days (instances) from Jan. ’02 to Mar. ’03
Problems
  Statistical robustness
  Interpretability
Key Observation
Many stocks depend on the same influencing factors in much the same way
  Example: Intel, Novellus, Motorola, Dell depend on the price of Microsoft
Many other domains with similar characteristics
  Gene expression
  Collaborative filtering
  Computer network performance
  …
The Module Network Idea
[Figure: the variables INTL, MSFT, MOT, DELL, AMAT, HPQ drawn twice: as a Bayesian network with one CPD per variable, and as a module network in which Module I, Module II and Module III each share one CPD]
Problems and Solutions
Statistical robustness: share parameters and dependencies between variables with similar behavior
Interpretability: explicit modeling of modular structure
Outline
Module Network
  Probabilistic model
  Learning the model
Experimental results
Module Network Components
Module assignment function
  A(MSFT) = MI
  A(MOT) = A(DELL) = A(INTL) = MII
  A(AMAT) = A(HPQ) = MIII
Set of parents for each module
  Pa(MI) = ∅
  Pa(MII) = {MSFT}
  Pa(MIII) = {DELL, INTL}
CPD template for each module
[Figure: module network with MSFT in Module I; MOT, DELL, INTL in Module II; AMAT, HPQ in Module III]
Ground Bayesian Network
A module network induces a ground BN over X
A module network defines a coherent probability distribution over X if the ground BN is acyclic
[Figure: the example module network (Module I, Module II, Module III) and the ground Bayesian network it induces over INTL, MSFT, MOT, DELL, AMAT, HPQ]
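The induced ground BN can be illustrated with a minimal Python sketch. It assumes the example assignment from the slides, reads the parent set of Module I as empty, and uses hypothetical helper names (`ground_parents`, `is_acyclic`); it is an illustration, not the authors' implementation.

```python
from typing import Dict, List

# Module assignment A and module parents Pa from the running example.
# Assumption: Module I has no parents.
assignment: Dict[str, str] = {
    "MSFT": "MI",
    "MOT": "MII", "DELL": "MII", "INTL": "MII",
    "AMAT": "MIII", "HPQ": "MIII",
}
module_parents: Dict[str, List[str]] = {
    "MI": [],
    "MII": ["MSFT"],
    "MIII": ["DELL", "INTL"],
}


def ground_parents(assignment, module_parents):
    """Ground BN: every variable inherits the parent set of its module."""
    return {var: list(module_parents[mod]) for var, mod in assignment.items()}


def is_acyclic(parents):
    """Depth-first search for a directed cycle in a child -> parents map."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in parents}

    def visit(v):
        color[v] = GREY
        for p in parents[v]:
            if color[p] == GREY:  # back edge: cycle found
                return False
            if color[p] == WHITE and not visit(p):
                return False
        color[v] = BLACK
        return True

    return all(color[v] != WHITE or visit(v) for v in parents)


ground = ground_parents(assignment, module_parents)
print(ground["AMAT"], is_acyclic(ground))
```

In this example AMAT and HPQ both receive the parents DELL and INTL, and the ground network contains no directed cycle.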
Module Graph
Nodes correspond to modules
Mi → Mj if at least one variable in Mi is a parent of Mj
[Figure: the example module network and its module graph over MI, MII, MIII]
Theorem: The ground BN is acyclic if the module graph is acyclic
Acyclicity checked efficiently using the module graph
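A minimal sketch of the module graph and the acyclicity check, assuming the running example with an empty parent set for Module I. The function names are hypothetical, and Kahn's topological sort is one possible way to do the check; the slides do not say which method is used.

```python
from collections import deque

assignment = {
    "MSFT": "MI",
    "MOT": "MII", "DELL": "MII", "INTL": "MII",
    "AMAT": "MIII", "HPQ": "MIII",
}
module_parents = {"MI": [], "MII": ["MSFT"], "MIII": ["DELL", "INTL"]}


def module_graph(assignment, module_parents):
    """Edge Mi -> Mj if at least one variable in Mi is a parent of Mj."""
    edges = {m: set() for m in module_parents}
    for mj, parents in module_parents.items():
        for var in parents:
            edges[assignment[var]].add(mj)
    return edges


def is_acyclic(edges):
    """Kahn's algorithm: acyclic iff all nodes can be removed in topological order."""
    indegree = {m: 0 for m in edges}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(m for m, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        m = queue.popleft()
        removed += 1
        for t in edges[m]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    return removed == len(edges)


print(is_acyclic(module_graph(assignment, module_parents)))

# Moving DELL into Module III makes a Module III variable its own module's parent.
moved = dict(assignment, DELL="MIII")
print(is_acyclic(module_graph(moved, module_parents)))
```

The check runs over the modules rather than the individual variables, so its cost depends on the number of modules and module-level edges.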
Outline
Module Network
  Probabilistic model
  Learning the model
Experimental results
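Learning Overview
Given data D, find assignment function A and structure S that maximize the Bayesian score
Marginal data likelihood (data likelihood times parameter prior, integrated over the parameters θ):
$P(D \mid S, A) = \int P(D \mid S, A, \theta)\, P(\theta \mid S, A)\, d\theta$
Bayesian score (marginal likelihood plus assignment/structure prior):
$\mathrm{Score}(S, A : D) = \log P(D \mid S, A) + \log P(S, A)$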
Likelihood Function
Likelihood function decomposes by modules
$\hat{S}(X, Y)$: sufficient statistics of (X, Y)
Module CPDs: MI; MII | MSFT; MIII | DELL, INTL
$\hat{S}(M_{II}, \mathrm{MSFT}) = \hat{S}(\mathrm{DELL}, \mathrm{MSFT}) + \hat{S}(\mathrm{MOT}, \mathrm{MSFT}) + \hat{S}(\mathrm{INTL}, \mathrm{MSFT})$
[Figure: the example module network evaluated on Instance 1, Instance 2, Instance 3]
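The pooling of sufficient statistics within a module can be sketched as follows. This is a minimal illustration that assumes discretized values ("up"/"down") and simple count statistics; the slides do not state the CPD family, and the data and function names are made up for the example.

```python
from collections import Counter

# Three hypothetical instances (trading days) with discretized price moves.
data = [
    {"MSFT": "up", "DELL": "up", "MOT": "up", "INTL": "down"},
    {"MSFT": "up", "DELL": "up", "MOT": "down", "INTL": "up"},
    {"MSFT": "down", "DELL": "down", "MOT": "down", "INTL": "down"},
]


def sufficient_stats(data, child, parents):
    """Counts of (parent values, child value) pairs over all instances."""
    stats = Counter()
    for row in data:
        stats[(tuple(row[p] for p in parents), row[child])] += 1
    return stats


def module_stats(data, members, parents):
    """Module statistics: sum of the statistics of all member variables."""
    total = Counter()
    for var in members:
        total += sufficient_stats(data, var, parents)
    return total


pooled = module_stats(data, ["DELL", "MOT", "INTL"], ["MSFT"])
print(pooled)
```

Because the three member variables share one CPD, each instance contributes three observations to the module's statistics rather than one.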
Bayesian Score Decomposition
Bayesian score decomposes by modules
$\mathrm{score}(S, A : D) = \sum_{j=1}^{k} \mathrm{score}_{M_j}(\mathrm{Pa}_{M_j}, X^j : D)$
  $X^j$: Module j variables
  $\mathrm{Pa}_{M_j}$: Module j parents
Example
  Module I: $\mathrm{score}_{M_1}(\emptyset, X^1 : D)$
  Module II: $\mathrm{score}_{M_2}(\mathrm{MSFT}, X^2 : D)$
  Module III: $\mathrm{score}_{M_3}(\{\mathrm{DELL}, \mathrm{INTL}\}, X^3 : D)$
Delete INTL → Module III: the Module III term becomes $\mathrm{score}_{M_3}(\mathrm{DELL}, X^3 : D)$
Change A(MOT)=2 to A(MOT)=1: the Module I and Module II terms change
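The decomposition can be checked numerically with a small sketch. As an assumption for illustration, each module is scored with a Dirichlet-multinomial marginal likelihood over discretized values, and the log P(S, A) prior term is left out; the slides do not specify the local score, and the data are made up.

```python
import math
from collections import defaultdict

data = [
    {"MSFT": "up", "DELL": "up", "MOT": "up", "INTL": "down", "AMAT": "up", "HPQ": "up"},
    {"MSFT": "up", "DELL": "up", "MOT": "down", "INTL": "up", "AMAT": "up", "HPQ": "down"},
    {"MSFT": "down", "DELL": "down", "MOT": "down", "INTL": "down", "AMAT": "down", "HPQ": "down"},
]
module_parents = {"MI": [], "MII": ["MSFT"], "MIII": ["DELL", "INTL"]}
modules = {"MI": ["MSFT"], "MII": ["MOT", "DELL", "INTL"], "MIII": ["AMAT", "HPQ"]}


def module_score(data, members, parents, values=("down", "up"), alpha=1.0):
    """Log marginal likelihood of one module with a shared multinomial CPD."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in data:
        u = tuple(row[p] for p in parents)
        for var in members:  # all members pool their counts
            counts[u][row[var]] += 1
    score = 0.0
    for by_value in counts.values():
        n_u = sum(by_value.values())
        a_u = alpha * len(values)
        score += math.lgamma(a_u) - math.lgamma(a_u + n_u)
        for x in values:
            score += math.lgamma(alpha + by_value.get(x, 0)) - math.lgamma(alpha)
    return score


def total_score(data, modules, module_parents):
    """Sum of per-module scores (prior term omitted in this sketch)."""
    return sum(module_score(data, modules[m], module_parents[m]) for m in modules)


# Reassign MOT from Module II to Module I and compare per-module terms.
moved = {"MI": ["MSFT", "MOT"], "MII": ["DELL", "INTL"], "MIII": ["AMAT", "HPQ"]}
before = {m: module_score(data, modules[m], module_parents[m]) for m in modules}
after = {m: module_score(data, moved[m], module_parents[m]) for m in moved}
print(before["MIII"] == after["MIII"])
```

The Module III term is identical before and after the move, so a search procedure only has to rescore the two modules that the move touches.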
Algorithm Overview
Find assignment function A and structure S that maximize the Bayesian score
$\mathrm{Score}(S, A : D) = \log P(D \mid S, A) + \log P(S, A)$
Find initial assignment A
Improve structure (dependency structure S)
Improve assignments (assignment function A)
Initial Assignment Function
Find variables that are similar across instances
  A(MOT) = MII
  A(INTL) = MII
  A(DELL) = MII
[Figure: data matrix with variables (stocks) AMAT, MSFT, DELL, MOT, HPQ, INTL against instances (trading days) x[1], x[2], x[3], x[4]; the variables are then reordered and grouped into clusters 1, 2, 3]
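One way to find variables that behave similarly across instances is to cluster them. The sketch below assumes average-linkage agglomerative clustering on Pearson correlation; the slides do not name a method, and the price series and function names are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def initial_assignment(series, k):
    """Merge the two most similar clusters until k clusters remain."""
    clusters = [[name] for name in series]

    def similarity(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(pearson(series[a], series[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        candidates = [(i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))]
        i, j = max(candidates,
                   key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    # Cluster numbers start at 1 and become the initial module labels.
    return {name: idx + 1 for idx, c in enumerate(clusters) for name in c}


# Hypothetical values for instances x[1]..x[4].
series = {
    "AMAT": [4, 3, 2, 1], "MSFT": [1, 3, 1, 3], "DELL": [1, 2, 3, 4],
    "MOT": [1, 2, 3, 5], "HPQ": [8, 6, 4, 2], "INTL": [2, 4, 6, 8],
}
print(initial_assignment(series, 3))
```

With these made-up series, MOT, INTL and DELL end up in one cluster, AMAT and HPQ in another, and MSFT on its own.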
Learning Dependency Structure
Heuristic search with operators
  Add/delete parent for module
  Cannot reverse edges
Handle acyclicity
  Can be checked efficiently on the module graph
Efficient computation
  After applying operator for module Mj, only update score of operators for module Mj
[Figure: the example module network and its module graph MI, MII, MIII, with candidate edges INTL → Module I, INTL → Module III and MSFT → Module II, some marked X]
Learning Dependency Structure
Structure search done at module level
  Parent selection
  Reduced search space relative to BN
  Acyclicity checking
Individual variables only used for computation of sufficient statistics
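A greedy version of the module-level search can be sketched as below. It assumes a caller-supplied `local_score(module, parents)` function (hypothetical, standing in for the per-module Bayesian score) and picks the best add/delete operator in each round; for brevity it rescores every operator each round instead of caching as the slides suggest.

```python
def creates_cycle(assignment, parents):
    """True if the module graph induced by the parent sets has a directed cycle."""
    # incoming[m] = modules with at least one variable that is a parent of m
    incoming = {m: {assignment[v] for v in ps} for m, ps in parents.items()}
    state = {}

    def visit(m):
        if state.get(m) == "done":
            return False
        if state.get(m) == "open":  # reached a module already on the current path
            return True
        state[m] = "open"
        if any(visit(src) for src in incoming[m]):
            return True
        state[m] = "done"
        return False

    return any(visit(m) for m in incoming)


def greedy_structure_search(assignment, module_parents, candidates, local_score):
    """Repeatedly apply the best score-improving add/delete-parent operator."""
    parents = {m: set(ps) for m, ps in module_parents.items()}
    scores = {m: local_score(m, frozenset(ps)) for m, ps in parents.items()}
    while True:
        best = None  # (gain, module, new parent set)
        for m in parents:
            for var in candidates:
                new = parents[m] ^ {var}  # add var if absent, delete it if present
                trial = dict(parents)
                trial[m] = new
                if creates_cycle(assignment, trial):
                    continue
                gain = local_score(m, frozenset(new)) - scores[m]
                if gain > 1e-12 and (best is None or gain > best[0]):
                    best = (gain, m, new)
        if best is None:
            return parents
        gain, m, new = best
        parents[m] = new
        scores[m] += gain  # only module m's score changes


assignment = {"MSFT": "MI", "MOT": "MII", "DELL": "MII", "INTL": "MII",
              "AMAT": "MIII", "HPQ": "MIII"}
# Toy score: rewards two specific edges, penalizes every other parent.
reward = {("MIII", "DELL"): 1.0, ("MII", "AMAT"): 0.5}


def toy_score(module, parents):
    return sum(reward.get((module, v), -0.1) for v in parents)


start = {"MI": set(), "MII": set(), "MIII": set()}
result = greedy_structure_search(assignment, start, list(assignment), toy_score)
print(result)
```

In the toy run, DELL is added as a parent of Module III first; adding AMAT as a parent of Module II would then close a cycle between Modules II and III, so that operator is rejected even though its toy gain is positive.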
Learning Assignment Function
Candidate assignments for DELL
  A(DELL) = MI     Score: 0.7
  A(DELL) = MII    Score: 0.9
  A(DELL) = MIII   Score: cyclic!
[Figure: the example module network with DELL placed in Module I, Module II and Module III in turn]
Ideal Algorithm
Learn the module assignment of all variables simultaneously
$A = \arg\max_{A'} \mathrm{score}(S, A' : D)$
Problem
Due to acyclicity, cannot optimize assignment for each variable separately
[Figure: module network with DELL in Module I, MSFT in Module II, AMAT in Module III and HPQ in Module IV, next to its module graph MI, MII, MIII, MIV; candidate reassignments A(DELL) = Module IV and A(MSFT) = Module III]
Learning Assignment Function
Sequential update algorithm
  Iterate over all variables
  For each variable, find its optimal assignment given the current assignment to all other variables
Efficient computation
  When changing assignment from Mi to Mj, only need to recompute score for modules i and j
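The sequential update can be sketched as follows. The `total_score` callback is a hypothetical stand-in for the Bayesian score; for clarity the sketch rescores the whole assignment on every trial rather than only modules i and j, and the toy score values simply reuse the 0.7 and 0.9 from the DELL example.

```python
def has_cycle(assignment, module_parents):
    """True if the module graph induced by the assignment has a directed cycle."""
    incoming = {m: {assignment[v] for v in ps} for m, ps in module_parents.items()}
    state = {}

    def visit(m):
        if state.get(m) == "done":
            return False
        if state.get(m) == "open":
            return True
        state[m] = "open"
        if any(visit(src) for src in incoming[m]):
            return True
        state[m] = "done"
        return False

    return any(visit(m) for m in incoming)


def sequential_update(assignment, module_parents, total_score):
    """Move one variable at a time to its best legal module until nothing changes."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for var in list(assignment):
            best_mod, best_score = assignment[var], total_score(assignment)
            for mod in module_parents:
                if mod == assignment[var]:
                    continue
                trial = dict(assignment)
                trial[var] = mod
                if has_cycle(trial, module_parents):  # illegal move, skip it
                    continue
                s = total_score(trial)
                if s > best_score + 1e-12:
                    best_mod, best_score = mod, s
            if best_mod != assignment[var]:
                assignment[var] = best_mod
                changed = True
    return assignment


module_parents = {"MI": [], "MII": ["MSFT"], "MIII": ["DELL", "INTL"]}
start = {"MSFT": "MI", "DELL": "MI", "MOT": "MII", "INTL": "MII",
         "AMAT": "MIII", "HPQ": "MIII"}
# Toy score: only DELL's placement matters; Module III would score highest.
pref = {("DELL", "MI"): 0.7, ("DELL", "MII"): 0.9, ("DELL", "MIII"): 5.0}


def toy_total(a):
    return sum(pref.get((v, m), 0.0) for v, m in a.items())


result = sequential_update(start, module_parents, toy_total)
print(result["DELL"])
```

DELL moves to Module II: Module III has the higher toy score, but DELL is a parent of Module III, so that assignment would be cyclic and is never scored.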
Learning the Model
Initialize module assignment A
Optimize structure S
Optimize module assignment A
  For each variable, find its optimal assignment given the current assignment to all other variables
[Figure: the example module network over INTL, MSFT, MOT, DELL, AMAT, HPQ at successive steps of the procedure]
Related Work
[Table: Bayesian networks, parameter sharing, PRMs, OOBNs and Module Networks compared on four properties: shared structure, shared parameters, learn parameter sharing, learn structure. Entries include X marks, "Langseth+ al" and "N/A".]
Outline
Module Network
  Probabilistic model
  Learning the model
Experimental results
  Statistical validation
  Case study: Gene regulation
Learning Algorithm Performance
[Figure: Bayesian score (avg. per gene), from -131 to -128, vs. algorithm iterations (0 to 20)]
[Figure: variables changed (% from total), from 0 to 50, vs. algorithm iterations (0 to 20), with structure change iterations marked]
Generalization to Test Data
Synthetic data: 10 modules, 500 variables
[Figure: test data likelihood (per instance), from -800 to -450, vs. number of modules (0 to 200), with curves for 25, 50, 100, 200 and 500 instances]
Best performance achieved for models with 10 modules
Gain beyond 100 instances is small
Structure Recovery Graph
Synthetic data: 10 modules, 500 variables
[Figure: recovered structure (% correct), from 0 to 90, vs. number of modules (0 to 200), with curves for 25, 50, 100, 200 and 500 instances]
74% of 2250 parent-child relationships recovered
Stock Market
4411 variables (stocks), 273 instances (trading days)
Comparison to Bayesian networks (cross validation)
[Figure: test data log-likelihood (gain per instance) vs. number of modules (0 to 300), with Bayesian network performance marked for comparison]
Regulatory Networks
Learn structure of regulatory networks:
  Which genes are regulated by each regulator
Gene Expression Data
Measures mRNA level for all genes in one condition
Learn dependency of the expression of genes as a function of expression of regulators
[Figure: expression matrix of genes by experiments, colored as induced or repressed]
Gene Expression
2355 variables (genes), 173 instances (arrays)
Comparison to Bayesian networks
[Figure: test data log-likelihood (gain per instance), from -150 to 150, vs. number of modules (0 to 500), with Bayesian network performance marked for comparison]
Biological Evaluation
Find sets of co-regulated genes (regulatory module)
Find the regulators of each module
Segal et al., Nature Genetics, 2003
46/50
30/50
Experimental Design
Hypothesis: Regulator ‘X’ activates process ‘Y’
Experiment: Knock out ‘X’ and repeat experiment
[Figure: tree with HAP4 and Ypl230W as true/false split nodes; X?]
Differentially Expressed Genes
Segal et al., Nature Genetics, 2003
[Figure: expression time courses, wild type (wt) next to knockout]
  Ypl230w: wt at 0, 3, 5, 7, 9, 24 hrs.; knockout at 0, 2, 5, 7, 9, 24 hrs.; >16x; 341 differentially expressed genes
  Ppt1: 0, 7, 15, 30, 60 min.; >4x; 602
  Kin82: 0, 5, 15, 30, 60 min.; >4x; 281
Were the differentially expressed genes predicted as targets?
Rank modules by enrichment for diff. expressed genes

Ppt1
#   Module                                 Significance
14  Ribosomal and phosphate metabolism     8/32, 9e-3
11  Amino acid and purine metabolism       11/53, 1e-2
15  mRNA, rRNA and tRNA processing         9/43, 2e-2
39  Protein folding                        6/23, 2e-2
30  Cell cycle                             7/30, 2e-2

Ypl230w
#   Module                                 Significance
39  Protein folding                        7/23, 1e-4
29  Cell differentiation                   6/41, 2e-2
5   Glycolysis and folding                 5/37, 4e-2
34  Mitochondrial and protein fate         5/37, 4e-2

Kin82
#   Module                                 Significance
3   Energy and osmotic stress I            8/31, 1e-4
2   Energy, osmolarity & cAMP signaling    9/64, 6e-3
15  mRNA, rRNA and tRNA processing         6/43, 2e-2
Biological Experiments Validation
All regulators regulate predicted modules
Segal et al., Nature Genetics, 2003
Summary
Probabilistic model for learning modules of variables and their structural dependencies
Improved performance over Bayesian networks
  Statistical robustness
  Interpretability
Application to gene regulation
  Reconstruction of many known regulatory modules
  Prediction of targets for unknown regulators