Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
ActiveNetworksCross-Condition Analysis
of Functional Genomic Data
T. M. Murali
April 18, 2006
T. M. Murali April 18, 2006 ActiveNetworks
Motivation: Manual Systems Biology
I Biologists want to study a favourite stress, e.g., oxidativestress or desiccation tolerance.
I Measure gene expression, apply clustering algorithms, and findgenes whose expression level change in response to the stress.
I Trace genes by hand through databases of protein-proteininteractions, gene regulatory networks, metabolic pathways,PubMed searches to build networks activated in response tothe stress.
T. M. Murali April 18, 2006 ActiveNetworks
Motivation: Manual Systems Biology
I Biologists want to study a favourite stress, e.g., oxidativestress or desiccation tolerance.
I Measure gene expression, apply clustering algorithms, and findgenes whose expression level change in response to the stress.
I Trace genes by hand through databases of protein-proteininteractions, gene regulatory networks, metabolic pathways,PubMed searches to build networks activated in response tothe stress.
T. M. Murali April 18, 2006 ActiveNetworks
Motivation: Manual Systems Biology
ARO4
SAH1
YFR055W
SAM1SER3
URA7
LYS14
HIS4 HIS1
lysinesaccharopine
acetoacetate
-KG-ketoglutarate
Thiamine
SAM
transport
PHO3riboswitch ligands?
phospholipid synthesis
SIP18binding
NADPH
HMG-CoA
ERG13
glutamate
acetyl-CoA
acetyl-CoA
YBR238C
FRDS
OSM1
fumaratereductases
osmotic growthprotein
YBL085WTEF4
MET30
SIR3
CYS4
F-box; proteinubiquitination
TPO2 polyamine transport
YHB1 oxidative stress response
LYS12
CLN2
CDC34
GAS3
GAS1
ARO1
S0B L40B
L17B L7A
S26B L27B
S16B L13AS10AS22B
S9B
L7Bribosomalgenes
URA8
serine biosynthesis
cell wallorganization
A
B
C
Redescription R5Heat Shock, 30 min -1 T2 vs T1 -5 AND NOT T2 vs T1 -1
Redescription R5 Gene List
ARO4, ASN1, CLN2, GAS3, HEM13, HIS1, IMD4, PHO3, RPL-7A, 7B, 13A, 17B, 27B, 40B, RPS-0B, 9B, 10A, 16B, 22B, 26B, SAH1, SAM1, SUN4, TEF4, TPO2, URA7, UTR2, YHB1, YBR238C, YER156C, YFR055W, YOR309C
-7 -5 -3 -1Heat shock,
30 minT2 vs. T1
0.71-7 -5 -3 -1
Can we automate this process?
T. M. Murali April 18, 2006 ActiveNetworks
Motivation: Manual Systems Biology
ARO4
SAH1
YFR055W
SAM1SER3
URA7
LYS14
HIS4 HIS1
lysinesaccharopine
acetoacetate
-KG-ketoglutarate
Thiamine
SAM
transport
PHO3riboswitch ligands?
phospholipid synthesis
SIP18binding
NADPH
HMG-CoA
ERG13
glutamate
acetyl-CoA
acetyl-CoA
YBR238C
FRDS
OSM1
fumaratereductases
osmotic growthprotein
YBL085WTEF4
MET30
SIR3
CYS4
F-box; proteinubiquitination
TPO2 polyamine transport
YHB1 oxidative stress response
LYS12
CLN2
CDC34
GAS3
GAS1
ARO1
S0B L40B
L17B L7A
S26B L27B
S16B L13AS10AS22B
S9B
L7Bribosomalgenes
URA8
serine biosynthesis
cell wallorganization
A
B
C
Redescription R5Heat Shock, 30 min -1 T2 vs T1 -5 AND NOT T2 vs T1 -1
Redescription R5 Gene List
ARO4, ASN1, CLN2, GAS3, HEM13, HIS1, IMD4, PHO3, RPL-7A, 7B, 13A, 17B, 27B, 40B, RPS-0B, 9B, 10A, 16B, 22B, 26B, SAH1, SAM1, SUN4, TEF4, TPO2, URA7, UTR2, YHB1, YBR238C, YER156C, YFR055W, YOR309C
-7 -5 -3 -1Heat shock,
30 minT2 vs. T1
0.71-7 -5 -3 -1
Can we automate this process?
T. M. Murali April 18, 2006 ActiveNetworks
Requirements for Automation
I Wiring diagram of the cell: protein-protein interactions,metabolic pathways, transcriptional regulatory networks, . . .
I Measurement of molecular profiles (gene expression, proteinexpression, metabolite levels) under different conditions or cellstates.
I Algorithms for combining these types of information.
T. M. Murali April 18, 2006 ActiveNetworks
High-throughput Biology Provides WiringDiagram
I Large amounts of information on different types of cellularinteractions are now available.
I Protein-protein interactions: genome-scale yeast 2-hybridexperiments, in-vivo pulldowns of protein complexes.
I Transcriptional regulatory networks: ChIP-on-chipexperiments yield protein-DNA binding data.
I Metabolic networks: databases culled from the literature(KEGG).
I Techniques that extract interactions automatically fromabstracts.
T. M. Murali April 18, 2006 ActiveNetworks
S. cerevisiea Wiring Diagram
I Physical networkI 15,429 protein-protein interactions from the Database of
Interacting Proteins (DIP).I 5869 protein-DNA interactions (Lee et al., Science, 2002).I 6,306 metabolic interactions (proteins operate on at least
common metabolite) based on KEGG.
I Genetic networkI 4,125 synthetically lethal/sick interactions (Tong et al.,
Science, 2004).I 687 synthetically lethal interactions (MIPS).
I Overall network has 32,416 (27,604 physical and 4,812genetic) interactions between 5601 proteins (Kelley and Ideker,
Nature Biotech., 2005).
T. M. Murali April 18, 2006 ActiveNetworks
Challenges in Utilising the Wiring Diagram
I Networks are large; they contain tens of thousands ofinteractions.
I High-throughput experiments contain many errors.
I Networks are incomplete; experiments are expensive and havebiases.
I A biologist wants to explore and analyse system of interest.
I How do we zoom into the appropriate parts of the wiringdiagram?
T. M. Murali April 18, 2006 ActiveNetworks
Challenges in Utilising the Wiring Diagram
I Networks are large; they contain tens of thousands ofinteractions.
I High-throughput experiments contain many errors.
I Networks are incomplete; experiments are expensive and havebiases.
I A biologist wants to explore and analyse system of interest.
I How do we zoom into the appropriate parts of the wiringdiagram?
T. M. Murali April 18, 2006 ActiveNetworks
ActiveNetworks
ActiveNetwork: network of interactions activated inresponse to a stress or in a particular condition.
1. Overlay molecular profile for a particular stress on wiringdiagram to obtain ActiveNetwork for that stress.
2. Combine computed ActiveNetworks for each stress tofind
2.1 ActiveNetwork common to multiple stresses.2.2 ActiveNetwork unique to a particular stress or group of
stresses.
T. M. Murali April 18, 2006 ActiveNetworks
ActiveNetworks
ActiveNetwork: network of interactions activated inresponse to a stress or in a particular condition.
1. Overlay molecular profile for a particular stress on wiringdiagram to obtain ActiveNetwork for that stress.
2. Combine computed ActiveNetworks for each stress tofind
2.1 ActiveNetwork common to multiple stresses.2.2 ActiveNetwork unique to a particular stress or group of
stresses.
T. M. Murali April 18, 2006 ActiveNetworks
The ActiveNetworks pipeline
ActiveNetworksCondition−specific Cross−condition
Act
iveN
etw
ork
s fo
r ea
ch c
ondit
ion
Exper
imen
ts
ActiveNetworks
Hypotheses
interaction networksProtein−protein
pathwaysMetabolic
networks
network
ActiveNetworkmining
computationCarbon ActiveNetwork
Heat shock
Cold shock
Desiccation
starvation
Universal
Transcriptionalregulatory
T. M. Murali April 18, 2006 ActiveNetworks
Overlaying Gene Expression Data
I Weight of an interaction is the Pearson correlation betweenthe expression profiles of the interacting genes.
I Weight ≡ “activity” level of the interaction.
I Discard interactions based on a threshold.I Unsatisfactory since we test each interaction individually.
We find the most highly active subnetwork.
T. M. Murali April 18, 2006 ActiveNetworks
Overlaying Gene Expression Data
I Weight of an interaction is the Pearson correlation betweenthe expression profiles of the interacting genes.
I Weight ≡ “activity” level of the interaction.I Discard interactions based on a threshold.
I Unsatisfactory since we test each interaction individually.
We find the most highly active subnetwork.
T. M. Murali April 18, 2006 ActiveNetworks
Overlaying Gene Expression Data
I Weight of an interaction is the Pearson correlation betweenthe expression profiles of the interacting genes.
I Weight ≡ “activity” level of the interaction.I Discard interactions based on a threshold.
I Unsatisfactory since we test each interaction individually.
We find the most highly active subnetwork.
T. M. Murali April 18, 2006 ActiveNetworks
Defining Highly-Active Subnetworks
I How do we measure the activity/weight of a subnetwork?
I Sum or average of edge weights?
I The density of a network with n nodes is the total weight ofthe edges divided by n.
I Problem: Compute the subnetwork with highest density.
0.7
0.4
0.5
0.5
0.3
0.5
0.1
0.80.1
0.9
T. M. Murali April 18, 2006 ActiveNetworks
Defining Highly-Active Subnetworks
I How do we measure the activity/weight of a subnetwork?
I Sum or average of edge weights?
I The density of a network with n nodes is the total weight ofthe edges divided by n.
I Problem: Compute the subnetwork with highest density.
0.7
0.4
0.5
0.5
0.3
0.5
0.1
0.80.1
0.9
T. M. Murali April 18, 2006 ActiveNetworks
Defining Highly-Active Subnetworks
I How do we measure the activity/weight of a subnetwork?
I Sum or average of edge weights?
I The density of a network with n nodes is the total weight ofthe edges divided by n.
I Problem: Compute the subnetwork with highest density.
0.7
0.4
0.5
0.5
0.3
0.5
0.1
0.80.1
0.9
T. M. Murali April 18, 2006 ActiveNetworks
Computing Most Dense Subnetwork
I O(n3) time network flow-based approach gives optimal result(Gallo, Grigoriadis, Tarjan, SIAM J. Comp, 1989).
I Can also be solved by linear programming.
T. M. Murali April 18, 2006 ActiveNetworks
Computing Most Dense Subnetwork
I Greedy algorithm:I Weight of a node ≡ total weight of incident edges.I Repeatedly delete nodes with the smallest weight.I Keep track of density of remaining network.I Return the most dense subnetwork.
I Computed subnetwork is at least half as dense as the mostdense subnetwork (Charikar, Proc. APPROX, 2000).
0.7
0.4
0.5
0.5
0.3
0.5
0.1
0.80.1
0.9
T. M. Murali April 18, 2006 ActiveNetworks
Computing Most Dense Subnetwork
I Greedy algorithm:I Weight of a node ≡ total weight of incident edges.I Repeatedly delete nodes with the smallest weight.I Keep track of density of remaining network.I Return the most dense subnetwork.
I Computed subnetwork is at least half as dense as the mostdense subnetwork (Charikar, Proc. APPROX, 2000).
0.7
0.4
0.5
0.5
0.3
0.5
0.1
0.80.1
0.9
T. M. Murali April 18, 2006 ActiveNetworks
Computing Most Dense Subnetwork
I Greedy algorithm:I Weight of a node ≡ total weight of incident edges.I Repeatedly delete nodes with the smallest weight.I Keep track of density of remaining network.I Return the most dense subnetwork.
I Computed subnetwork is at least half as dense as the mostdense subnetwork (Charikar, Proc. APPROX, 2000).
0.7
0.4
0.5
0.5
0.3
0.5
0.1
0.80.1
0.9
T. M. Murali April 18, 2006 ActiveNetworks
Computing Multiple Dense Subnetworks
I Repeat
1. Apply greedy algorithm to compute most dense subnetwork.2. Remove edges of computed subnetwork from the network.
I Until remaining network has density less than the originalnetwork.
I Output is a sequence of decreasingly dense subnetworks thatcan share nodes but not edges.
T. M. Murali April 18, 2006 ActiveNetworks
Advantages of Dense Subnetworks
I Uses no parameters.
I Avoid inclusion of interactions that appear active due to noise.
I Relatively weakly correlated interactions can reinforce eachother.
T. M. Murali April 18, 2006 ActiveNetworks
Example of an ActiveNetwork
T. M. Murali April 18, 2006 ActiveNetworks
Example of an ActiveNetwork
T. M. Murali April 18, 2006 ActiveNetworks
Further Analysis of an ActiveNetwork
I Visualise the network (Graphviz package) and the geneexpression profiles.
I Measure functional enrichment.I Use hypergeometric distribution to calculate the significance of
functions enriched in an ActiveNetwork.I Use Bonferroni correction to adjust for testing multiple
hypotheses.
T. M. Murali April 18, 2006 ActiveNetworks
AA Starvation: Conventional Analysis
T. M. Murali April 18, 2006 ActiveNetworks
AA Starvation: ActiveNetwork
SDH1
YLR164W
COR1 RIP1
UGA2
LSC1
QCR10
LSC2
SDH2
IME1
CIN5
PHD1
NRG1
YNL092W
YAP6 MGA1
GPM1
TDH2
TDH3
TDH1
ADH5
PDC5
ADH1
PDC1
PHO11
ADH3
GCN4
GLT1
GLN1
CPA1
CPA2
ADE6
ADE4
GLN4
SIT4
GUA1
MSN2
GID8
HSP104
YKL044W
MSN4
CYB2
PYC1
PYK2
COX5B
ALT1URA8
ELP2IKI1ELP3
KTI12
TKL1
PRS1
PUS1
PRS5
PPT1
PMT4
CKA2
CTT1
PRS3
PRS4
T. M. Murali April 18, 2006 ActiveNetworks
ActiveNetworks for Multiple Stresses
ActiveNetwork: network of interactions activated inresponse to a stress or in a particular condition.
1. Overlay molecular profile for a particular stress on universalnetwork to obtain ActiveNetwork for that stress.
2. Combine computed ActiveNetworks for each stress tofind
2.1 ActiveNetwork common to multiple stresses.2.2 ActiveNetwork unique to a particular stress or group of
stresses.
T. M. Murali April 18, 2006 ActiveNetworks
ActiveNetworks for Multiple Stresses
ActiveNetwork: network of interactions activated inresponse to a stress or in a particular condition.
1. Overlay molecular profile for a particular stress on universalnetwork to obtain ActiveNetwork for that stress.
2. Combine computed ActiveNetworks for each stress tofind
2.1 ActiveNetwork common to multiple stresses.2.2 ActiveNetwork unique to a particular stress or group of
stresses.
T. M. Murali April 18, 2006 ActiveNetworks
Comparative ActiveNetwork Analysis
Ex
per
imen
ts
Act
iveN
etw
ork
s fo
r ea
ch c
on
dit
ion
Carbon ActiveNetwork
Universalnetwork
interaction networksProtein−protein Transcriptional
regulatory
starvation
networks
computation
Heat shock
Cold shock
I Richard and Malcolm want to compare desiccationActiveNetwork with other ActiveNetworks to findsimilarities and differences.
Can we automate this process?
T. M. Murali April 18, 2006 ActiveNetworks
Comparative ActiveNetwork Analysis
Act
iveN
etw
ork
s fo
r ea
ch c
on
dit
ion
Ex
per
imen
tsregulatory
network
interaction networks
Universal
Protein−protein
ActiveNetworkcomputation
Carbonstarvation
Heat shock
Cold shock
Desiccation
networks
Transcriptional
I Richard and Malcolm want to compare desiccationActiveNetwork with other ActiveNetworks to findsimilarities and differences.
Can we automate this process?
T. M. Murali April 18, 2006 ActiveNetworks
Cross-Condition ActiveNetworks
Cross−conditionCondition−specificActiveNetworks ActiveNetworks
Act
iveN
etw
ork
s fo
r ea
ch c
ondit
ion
Exper
imen
ts
Carbon
Protein−protein
mining
network
ActiveNetwork
Transcriptionalregulatory
Heat shock
Cold shock
Desiccation
starvation
networks
computationActiveNetwork
interaction networks
Universal
I A “conserved” ActiveNetwork is a set of conditions and aset of interactions, such that each interaction appears in theActiveNetwork for each condition.
T. M. Murali April 18, 2006 ActiveNetworks
Computing Conserved ActiveNetworks
Act
iveN
etw
orks
Interactions
I Construct a 0-1 interaction-by-condition matrix.
I A “conserved” ActiveNetwork is
a set of conditions and aset of interactions
,
such that each interaction appears in theActiveNetwork for each condition.
I We can compute a conserved ActiveNetwork usingtechniques for finding itemsets or biclusters.
T. M. Murali April 18, 2006 ActiveNetworks
Computing Conserved ActiveNetworks
Act
iveN
etw
orks
Interactions
I Construct a 0-1 interaction-by-condition matrix.
I A “conserved” ActiveNetwork is
a set of conditions and aset of interactions
,
such that each interaction appears in theActiveNetwork for each condition.
I We can compute a conserved ActiveNetwork usingtechniques for finding itemsets or biclusters.
T. M. Murali April 18, 2006 ActiveNetworks
Computing Conserved ActiveNetworks
Interactions
Act
iveN
etw
orks
I Construct a 0-1 interaction-by-condition matrix.
I A “conserved” ActiveNetwork is a set of conditions
and aset of interactions
,
such that each interaction appears in theActiveNetwork for each condition.
I We can compute a conserved ActiveNetwork usingtechniques for finding itemsets or biclusters.
T. M. Murali April 18, 2006 ActiveNetworks
Computing Conserved ActiveNetworks
Interactions
Act
iveN
etw
orks
I Construct a 0-1 interaction-by-condition matrix.
I A “conserved” ActiveNetwork is a set of conditions and aset of interactions,
such that each interaction appears in theActiveNetwork for each condition.
I We can compute a conserved ActiveNetwork usingtechniques for finding itemsets or biclusters.
T. M. Murali April 18, 2006 ActiveNetworks
Computing Conserved ActiveNetworks
Act
iveN
etw
orks
Interactions
I Construct a 0-1 interaction-by-condition matrix.
I A “conserved” ActiveNetwork is a set of conditions and aset of interactions, such that each interaction appears in theActiveNetwork for each condition.
I We can compute a conserved ActiveNetwork usingtechniques for finding itemsets or biclusters.
T. M. Murali April 18, 2006 ActiveNetworks
Computing Conserved ActiveNetworks
Act
iveN
etw
orks
Interactions
I Construct a 0-1 interaction-by-condition matrix.
I A “conserved” ActiveNetwork is a set of conditions and aset of interactions, such that each interaction appears in theActiveNetwork for each condition.
I We can compute a conserved ActiveNetwork usingtechniques for finding itemsets or biclusters.
T. M. Murali April 18, 2006 ActiveNetworks
Computing Conserved ActiveNetworks
Interactions
Act
iveN
etw
orks
I A “large” submatrix of 1’s is a frequent itemset.
I Such a submatrix is a special case of a bicluster in geneexpression data.
I We use the apriori algorithm for finding all maximal (closed)itemsets (Agrawal and Srikant 1995) and the xMotif algorithmfor finding large biclusters (Murali and Kasif, 2003).
T. M. Murali April 18, 2006 ActiveNetworks
Computing Conserved ActiveNetworks
Interactions
Act
iveN
etw
orks
I A “large” submatrix of 1’s is a frequent itemset.
I Such a submatrix is a special case of a bicluster in geneexpression data.
I We use the apriori algorithm for finding all maximal (closed)itemsets (Agrawal and Srikant 1995) and the xMotif algorithmfor finding large biclusters (Murali and Kasif, 2003).
T. M. Murali April 18, 2006 ActiveNetworks
Example of a Cross-Condition ActiveNetwork
RIP1COX12
DLD1
CYB2
QCR6
COR1
QCR8
COX7
HAP4
CYT1
QCR2
I Common to “Alternative carbon sources.” “DTT treatment”and “Growth in YPD culture.”
T. M. Murali April 18, 2006 ActiveNetworks
Example of a Cross-Condition ActiveNetwork
I Common to “Alternative carbon sources.” “DTT treatment”and “Growth in YPD culture.”
T. M. Murali April 18, 2006 ActiveNetworks
Comparative Systems Biology
I ActiveNetworks provide an integrated view ofmulti-modal universal networks and measurements ofmolecular profiles.
I Compute single stimulus ActiveNetworks using densesubgraphs.
I Compare and contrast ActiveNetworks for differentstimuli using frequent itemsets.
I Automatic extraction of network modules and legos from largescale data.
I Promises system-level insights from comparisons betweendifferent conditions, disease states, or species.
T. M. Murali April 18, 2006 ActiveNetworks
Closing the Loop
ActiveNetworksCondition−specific Cross−condition
Act
iveN
etw
ork
s fo
r ea
ch c
on
dit
ion
Ex
per
imen
ts
ActiveNetworks
Hypotheses
interaction networksProtein−protein
pathwaysMetabolic
networks
network
ActiveNetworkmining
computationCarbon ActiveNetwork
Heat shock
Cold shock
Desiccation
starvation
Universal
Transcriptionalregulatory
T. M. Murali April 18, 2006 ActiveNetworks
Project Members
I Greg Grothaus
I Deept Kumar
I Maulik Shukla
I Graham Jack
I Corban Rivera
I Richard Helm
I Malcolm Potts
I Naren Ramakrishnan
T. M. Murali April 18, 2006 ActiveNetworks
Future Research: Modelling and AlgorithmicImprovements
I Integrate other types of data: metabolic measurements,protein expression.
I Explicitly incorporate expression level of a gene.
T. M. Murali April 18, 2006 ActiveNetworks
Future Research: Applications
I ActiveNetworks in cancer: integrate gene expression dataand protein interaction networks.
I Compare oxidative stress networks across kingdom boundaries(yeast, Arabidopsis thaliana, malaria parasite, P. sojae).
I Cross-stress networks in Arabidopsis thaliana.
I Redox signalling in various plant species.
T. M. Murali April 18, 2006 ActiveNetworks
Related Research
I Discovering regulatory and signalling circuits in molecularinteraction networks, Ideker et al. ISMB 2002.
I Physical network models and multi-source data integration, Yeangand Jakkola, RECOMB 2003.
I Discovering molecular pathways from protein interaction and geneexpression data, Segal, Wang, and Koller, ISMB 2003.
I Computational discovery of gene modules and regulatory networks,Bar-Joseph et al., Nature Biotechnology, November 2003.
I Revealing modularity and organization in the yeast molecularnetwork by integrated analysis of highly heterogeneous genomewidedata, Tanay et al., PNAS, March 2004.
I Evidence for dynamically organized modularity in the yeastprotein-protein interaction network, Han et al., Nature, July 2004.
I Genomic analysis of regulatory network dynamics reveals largetoplogical changes, Luscombe et al., Nature, October 2004.
T. M. Murali April 18, 2006 ActiveNetworks