Joint analysis of regulatory networks and expression profiles
Ron Shamir
School of Computer Science, Tel Aviv University
April 2013
Sources:
• Igor Ulitsky and Ron Shamir. Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007).
• Igor Ulitsky and Ron Shamir. Identifying functional modules using expression profiles and confidence-scored protein interactions. Bioinformatics 25(9):1158-1164 (2009).
Outline
• Background
• Joint network and expression profiles
  – Matisse
  – Cezanne
[Figure: DNA → RNA → protein (transcription, translation); analogy panels: "The hard disk", "One program", "Its output"]
DNA Microarrays / RNA-seq
• Simultaneous measurement of the expression levels of all genes / transcripts
• Perform 10^5–10^9 measurements in one experiment
• Allow a global view of cellular processes
• One of the most important biotechnological breakthroughs of the last/current decade
(Figure: http://www.biomedcentral.com/1471-2105/12/323/figure/F2)
The Raw Data
[Figure: raw data matrix, rows = genes, columns = experiments]
• Entries of the raw data matrix: expression levels (ratios/absolute values/…)
• Row: expression pattern for each gene
• Column: profile for each experiment/condition/sample/chip
• Needs normalization!
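The slide notes only that the raw matrix needs normalization. As one common choice (an assumption here, not necessarily what the cited tools do), each gene's row can be standardized to zero mean and unit variance, so that later similarity computations reflect the shape of a profile rather than its scale. A minimal sketch with a hypothetical helper `standardize_rows` and made-up values:

```python
from statistics import mean, pstdev


def standardize_rows(matrix):
    """Z-score each gene's row: subtract the row mean, divide by the row's
    population standard deviation. Rows with zero variance become all zeros
    (a hypothetical convention for flat profiles)."""
    result = []
    for row in matrix:
        mu = mean(row)
        sd = pstdev(row)
        if sd == 0:
            result.append([0.0 for _ in row])
        else:
            result.append([(x - mu) / sd for x in row])
    return result


# Hypothetical 3 genes x 4 experiments matrix (e.g. log-ratios).
raw = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
    [0.5, -0.5, 0.5, -0.5],
]
norm = standardize_rows(raw)
```

Other schemes (e.g. per-array normalization before per-gene scaling) are equally plausible; the choice depends on the platform.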
EXPression ANalyzer and DisplayER (EXPANDER): http://acgt.cs.tau.ac.il/expander
• Clustering: identify clusters of co-expressed genes (CLICK, K-means, SOM, hierarchical)
• Functional enrichment: GO, TANGO
• Visualization
• Promoter analysis: analyze TF binding sites of co-regulated genes (PRIMA)
• Biclustering: identify homogeneous submatrices (SAMBA)
• microRNA function inference: FAME
A. Maron, R. Sharan, Bioinformatics 03; A. Maron-Katz, A. Tanay, C. Linhart, I. Steinfeld, R. Sharan, Y. Shiloh, R. Elkon, BMC Bioinformatics 05; Ulitsky et al., Nature Protocols 10
Networks of Protein-protein interactions (PPIs)
• Large, readily available resource
• Representation: network with nodes = proteins/genes, edges = interactions
• Analysis methods:
  – Global properties
  – Motif content analysis
  – Complex extraction
  – Cross-species comparison
• Potential inroad into pathways and function
• Can the network help to improve the analysis?
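Since every module sought below must be a connected subnetwork, the two basic operations are storing the network and testing whether a gene set induces a connected subgraph. A minimal sketch, assuming an undirected network given as a list of interacting pairs; `build_network`, `is_connected` and the gene names are hypothetical:

```python
from collections import defaultdict, deque


def build_network(edges):
    """Undirected PPI network as adjacency sets: node -> set of partners."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_connected(adj, nodes):
    """True if `nodes` induces a connected subnetwork.
    Runs a BFS that only walks edges whose endpoints are both in `nodes`.
    The empty set is treated as not connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


# Hypothetical toy interactions.
ppi = build_network([("g1", "g2"), ("g2", "g3"), ("g4", "g5")])
```

Adjacency sets keep neighbour lookups cheap, which matters because the module-finding heuristics test connectivity repeatedly.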
Goal
• Challenge: detect active functional modules: connected subnetworks of proteins whose genes are co-expressed
• "Where is the action in the network in a particular experiment?"
Ron Shamir, RNA Antalia, April 08
• Input: expression data and a PPI network
• Output: a collection of modules
  – Connected PPI subnetworks
  – Correlated expression profiles
[Figure legend: interaction; high expression similarity]
MATISSE: Modular Analysis for Topology of Interactions and Similarity SEts
http://acgt.cs.tau.ac.il/matisse
Probabilistic model
• Event $M_{ij}$: genes i, j are mates (= highly co-expressed); $S_{ij}$ is their expression similarity
• $P(S_{ij} \mid M_{ij}) \sim N(\mu_m, \sigma_m^2)$
• $P(S_{ij} \mid \neg M_{ij}) \sim N(\mu_n, \sigma_n^2)$
• $H_0$: U is a set of unrelated genes
• $H_1$: U is a module = connected subnetwork with high internal similarity
• $R_i$: gene i is transcriptionally regulated
• $\beta_m$: fraction of mates among module gene pairs that are transcriptionally regulated, $\beta_m = P(M_{ij} \mid R_i, R_j, H_1)$
• $p_m$: fraction of mates among all gene pairs that are transcriptionally regulated
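Probabilistic model (2)
• Is a connected gene set U a module?
• Assuming pair independence:
  – Define $\beta^m_{ij} = \beta_m P(R_i)P(R_j)$
  – Define $\beta^n_{ij} = p_m P(R_i)P(R_j)$
• Likelihood ratio: $\Pr(\text{Data} \mid H_1) / \Pr(\text{Data} \mid H_0)$
• Taking the log gives a sum of terms, one per pair ij:
$$w_{ij} = \log \frac{\beta^m_{ij} P(S_{ij} \mid M_{ij}) + (1-\beta^m_{ij}) P(S_{ij} \mid \neg M_{ij})}{\beta^n_{ij} P(S_{ij} \mid M_{ij}) + (1-\beta^n_{ij}) P(S_{ij} \mid \neg M_{ij})}$$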
Probabilistic model - summary
• Similarities: mixture of two Gaussians
• For a candidate group U, the log-likelihood ratio of originating from a module ($H_M$, i.e. $H_1$) or from the background ($H_N$, i.e. $H_0$) is
$$W_U = \log \frac{P(S_{U\times U} \mid H_M)}{P(S_{U\times U} \mid H_N)} = \sum_{(i,j)\in U\times U} \log \frac{P(S_{ij} \mid H_M)}{P(S_{ij} \mid H_N)} = \sum_{(i,j)\in U\times U} w_{ij}$$
• Module score = gene group log-likelihood ratio = sum over all the gene pairs
• Find connected subgraphs U with high $W_U$
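The per-pair weight $w_{ij}$ and the module score $W_U$ can be sketched directly from the mixture-of-Gaussians model above. The parameter values, the helper names (`pair_weight`, `module_score`) and the dictionary layout are illustrative assumptions; in practice the Gaussian parameters and the priors $P(R_i)$ would be estimated from the data (not shown). The sketch counts each unordered pair of distinct genes once and does not check connectivity.

```python
import math
from itertools import combinations


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def pair_weight(s_ij, r_i, r_j, params):
    """w_ij: log-likelihood ratio (module vs. background) for one gene pair.
    s_ij is the pair's similarity; r_i, r_j are the priors P(R_i), P(R_j)."""
    f_mate = normal_pdf(s_ij, params["mu_m"], params["sigma_m"])  # P(S_ij | M_ij)
    f_non = normal_pdf(s_ij, params["mu_n"], params["sigma_n"])   # P(S_ij | not M_ij)
    beta_m_ij = params["beta_m"] * r_i * r_j  # mate probability under H_M
    beta_n_ij = params["p_m"] * r_i * r_j     # mate probability under H_N
    num = beta_m_ij * f_mate + (1 - beta_m_ij) * f_non
    den = beta_n_ij * f_mate + (1 - beta_n_ij) * f_non
    return math.log(num / den)


def module_score(genes, sim, prior, params):
    """W_U: sum of w_ij over unordered pairs of distinct genes in U.
    `sim` maps frozenset({i, j}) -> similarity; `prior` maps gene -> P(R_i)."""
    return sum(
        pair_weight(sim[frozenset((i, j))], prior[i], prior[j], params)
        for i, j in combinations(sorted(genes), 2)
    )


# Hypothetical parameters; beta_m > p_m so that mates favour the module hypothesis.
params = {"mu_m": 0.8, "sigma_m": 0.1, "mu_n": 0.0, "sigma_n": 0.3,
          "beta_m": 0.8, "p_m": 0.05}
```

Highly similar pairs get positive weight, dissimilar pairs negative weight, and a pair involving a gene with $P(R_i) = 0$ contributes exactly zero, since both hypotheses then predict the same background density.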
Complexity
• Finding the heaviest connected subgraph is NP-hard, even without the connectivity constraint (with both positive and negative edge weights)
• We devised a heuristic for the problem
MATISSE workflow
• Seed generation
• Greedy optimization
• Significance filtering
Finding seeds
• Three seeding alternatives were tested
• All alternatives build a seed and delete it from the network
• Building small seeds around single nodes:
  – Best neighbors
  – All neighbors
• Approximating the heaviest subgraph:
  – Delete low-degree nodes and record the heaviest subnetwork found
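The "approximating the heaviest subgraph" seeding can be illustrated by a peeling loop: repeatedly delete the node with the lowest weighted degree and remember the heaviest connected set seen along the way. This is a sketch under assumptions: "degree" is taken here to mean the sum of pair weights $w_{ij}$ to the remaining nodes, missing pairs weigh 0, and ties break alphabetically; `peel_seed` and the toy data are hypothetical. Deleting the recorded seed from the network and repeating for further seeds is not shown.

```python
from collections import deque


def weight(w, i, j):
    """Pair weight w_ij; pairs absent from `w` count as 0 (assumption)."""
    return w.get(frozenset((i, j)), 0.0)


def total_weight(nodes, w):
    """Sum of w_ij over unordered pairs in `nodes`."""
    nodes = list(nodes)
    return sum(weight(w, a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:])


def connected(adj, nodes):
    """BFS restricted to `nodes`; True if they induce a connected subnetwork."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def peel_seed(adj, w):
    """Peel nodes of lowest weighted degree; return the heaviest connected set seen."""
    remaining = set(adj)
    best, best_score = None, float("-inf")
    while remaining:
        if connected(adj, remaining):
            score = total_weight(remaining, w)
            if score > best_score:
                best, best_score = set(remaining), score
        worst = min(
            remaining,
            key=lambda u: (sum(weight(w, u, v) for v in remaining if v != u), str(u)),
        )
        remaining.remove(worst)
    return best, best_score


# Hypothetical path network a-b-c-d with pair weights.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
w = {frozenset(p): x for p, x in [(("a", "b"), 2.0), (("b", "c"), 1.5), (("a", "c"), 1.0),
                                  (("c", "d"), -3.0), (("b", "d"), -1.0), (("a", "d"), -0.5)]}
seed, seed_score = peel_seed(adj, w)
```

Here node d, which is dissimilar to everything, is peeled first, and the recorded seed is the connected set {a, b, c}.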
Greedy optimization
• Simultaneous optimization of all the seeds
• The following steps are considered:
  – Node addition
  – Node removal
  – Assignment change
  – Module merge
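A single-module version of the greedy step can be sketched with the first two moves, node addition and node removal, applying the best strictly improving move until none remains. Assignment changes and module merges, which involve several modules at once, are omitted; `improve_module` and the toy data are hypothetical. Additions are restricted to network neighbours of the module, so connectivity is preserved, and a removal is accepted only if the rest stays connected.

```python
from collections import deque


def weight(w, i, j):
    """Pair weight w_ij; pairs absent from `w` count as 0 (assumption)."""
    return w.get(frozenset((i, j)), 0.0)


def connected(adj, nodes):
    """BFS restricted to `nodes`; True if they induce a connected subnetwork."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def improve_module(module, adj, w, max_iter=1000):
    """Apply the best strictly improving add/remove move until none is left."""
    module = set(module)
    for _ in range(max_iter):
        best_move, best_gain = None, 0.0
        # Node addition: candidates are network neighbours of the module.
        frontier = {v for u in module for v in adj.get(u, ()) if v not in module}
        for v in frontier:
            gain = sum(weight(w, v, u) for u in module)
            if gain > best_gain:
                best_move, best_gain = ("add", v), gain
        # Node removal: gain is minus the node's weight to the rest of the module.
        if len(module) > 1:
            for v in module:
                gain = -sum(weight(w, v, u) for u in module if u != v)
                if gain > best_gain and connected(adj, module - {v}):
                    best_move, best_gain = ("remove", v), gain
        if best_move is None:
            break
        op, v = best_move
        if op == "add":
            module.add(v)
        else:
            module.remove(v)
    return module


# Same hypothetical toy network and weights as a path a-b-c-d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
w = {frozenset(p): x for p, x in [(("a", "b"), 2.0), (("b", "c"), 1.5), (("a", "c"), 1.0),
                                  (("c", "d"), -3.0), (("b", "d"), -1.0), (("a", "d"), -0.5)]}
```

Because every accepted move strictly increases the score, the loop terminates; `max_iter` is only a safety cap.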
Front vs. Back nodes
• Only a fraction of the genes (front nodes) have meaningful similarity values
• MATISSE can link them using other genes (back nodes)
• Back nodes correspond to:
  – Unmeasured transcripts
  – Post-translational regulation
  – Partially regulated pathways
Test case: Yeast osmotic shock
• Network: 65,990 PPIs & protein-DNA interactions among 6,246 genes
• Expression: 133 experimental conditions – response of perturbed strains to osmotic shock (O’Rourke & Herskowitz 04)
• Front nodes: 2,000 genes with the highest variance
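Choosing front nodes by variance is a simple ranking. A sketch assuming a dict from gene to its expression values across conditions, with no missing values; `select_front_nodes` is a hypothetical helper and the toy data stand in for the 133-condition matrix (the study used k = 2,000). The remaining network genes would then serve as back nodes.

```python
from statistics import pvariance


def select_front_nodes(expr, k):
    """Return the k genes with the highest variance across conditions.
    Ties are broken by gene name so the result is deterministic."""
    ranked = sorted(expr, key=lambda g: (-pvariance(expr[g]), g))
    return set(ranked[:k])


# Hypothetical profiles over three conditions.
expr = {
    "g1": [0.1, 0.2, 0.1],
    "g2": [-2.0, 1.5, 3.0],
    "g3": [1.0, -1.0, 0.5],
    "g4": [0.0, 0.0, 0.0],
}
front = select_front_nodes(expr, 2)
back = set(expr) - front
```

With every gene measured over the same conditions, population and sample variance give the same ranking, so the choice between them does not matter here.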
Performance comparison
[Figure: two bar charts comparing Matisse, Co-Clustering, CLICK and Random on GO-Process, GO-Compartment, MIPS Phenotypes and KEGG Pathways. Left: % of modules with category enrichment at p < 10^-3 (y-axis 0–100). Right: % of annotations enriched at p < 10^-3 in modules (y-axis 0–45).]
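The slides do not state which test produced the enrichment p-values. A one-sided hypergeometric test is a common choice for category enrichment and is sketched below as an assumption: M genes in total, K of them annotated with the category, and a module of n genes containing k annotated ones. No multiple-testing correction is applied; `hypergeom_sf` is a hypothetical helper.

```python
from math import comb


def hypergeom_sf(k, M, K, n):
    """P(X >= k) for X ~ Hypergeometric(M total, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(M - K, n - i) for i in range(k, upper + 1))
    return hits / comb(M, n)


# Hypothetical module p-values; fraction of modules enriched at p < 10^-3.
pvals = [hypergeom_sf(3, 10, 3, 4), 1e-5, 2e-4]
frac_enriched = sum(p < 1e-3 for p in pvals) / len(pvals)
```

`math.comb` returns 0 when the lower argument exceeds the upper one, so impossible draws contribute nothing to the sum.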
GO and promoter analysis
Subnetwork | Size | Front nodes | Enriched GO terms (p-value) | TFs (p-value)
1 | 120 | 119 | processing of 20S pre-rRNA (< 0.001); rRNA processing (< 0.001); 35S primary transcript processing (< 0.001); ribosomal large subunit assembly and maintenance (0.019); rRNA modification (< 0.001); ribosome biogenesis (0.029) | Fhl1 (4.82E-16); Rap1 (2.89E-11); Sfp1 (2.98E-08)
2 | 120 | 118 | translational elongation (< 0.001) | Fhl1 (1.03E-05)
3 | 120 | 118 | processing of 20S pre-rRNA (< 0.001); rRNA processing (0.03); 35S primary transcript processing (0.011); ribosomal large subunit assembly and maintenance (0.019); ribosomal large subunit biogenesis (< 0.001) | –
5 | 120 | 112 | signal transduction during filamentous growth (0.01); conjugation with cellular fusion (< 0.001) | Ste12 (5.41E-13); Dig1 (5.41E-13)
6 | 120 | 99 | transcription from RNA polymerase III promoter (< 0.001); transcription from RNA polymerase I promoter (0.006) | –
7 | 120 | 107 | ergosterol biosynthesis (< 0.001); hexose transport (0.019) | –
8 | 114 | 85 | chromatin remodeling (0.05) | –
11 | 120 | 114 | pseudohyphal growth (0.01); response to stress (< 0.001) | Msn2 (3.17E-04); Msn4 (1.82E-12)
14 | 120 | 102 | ubiquitin-dependent protein catabolism (0.047) | –
15 | 120 | 96 | nuclear mRNA splicing, via spliceosome (< 0.001) | –
16 | 89 | 61 | ubiquitin-dependent protein catabolism (< 0.001) | Rpn4 (6.44E-06)
17 | 120 | 109 | response to stress (< 0.001); mitochondrial electron transport (< 0.001) | Msn4 (1.74E-03)
18 | 87 | 59 | nuclear mRNA splicing, via spliceosome (0.012) | –
20 | 46 | 35 | pyridoxine metabolism (0.045) | –
Application to stem cells
• ~150 human stem cell lines of diverse types profiled using microarrays
• Clustered the profiles into groups
• Adjusted Matisse to seek subnetworks that are characteristic of each group
• Focused the analysis on pluripotent stem cells
F. Müller, L. Laurent, D. Kostka, I. Ulitsky, R. Williams, C. Lu, I. Park, M. Rao, P. Schwartz, N. Schmidt, J. Loring, Nature 08
Pluripotent stem cells network
Highlights the key protein machinery underlying pluripotency
Accounting for PPI confidence
• PPI-based analysis is made difficult by abundant false positive / false negative interactions
• Various methods can assign a confidence (probability) to individual edges
• Idea: seek modules that are connected with high probability
Ulitsky & Shamir, Bioinformatics, 2009
CEZANNE: (Co-Expression Zone ANalysis using NEtworks)
• Edge probability p(e) → edge weight -log(1-p(e))
• For every nonempty proper subset W ⊂ U, at least one edge connects W with U\W with probability ≥ q (e.g. 0.95) ⟺ the weight of the minimum cut of U is at least -log(1-q) (edges assumed independent)
• Algorithm: among the subnetworks whose minimum cut exceeds -log(1-q), find the one with the maximum co-expression score
[Figure: example network on nodes A, B, C, D with edge probabilities 0.7, 0.9, 0.7 and 0.8; the minimum cut is marked]
P({A},{B,C,D}) = 1-0.3·0.3 = 0.91
P({A,C,D},{B}) = 0.94
P({A,B},{C,D}) = 0.94
P({A,B,D},{C}) = 0.994
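The equivalence between the probability test and the minimum-cut test can be checked on the four-node example. The edge assignment used here (A-B 0.7, A-C 0.7, B-C 0.8, C-D 0.9) is our assumption, chosen because it reproduces the cut probabilities listed on the slide; edges are assumed independent. The exhaustive search over bipartitions only suits tiny graphs and is not CEZANNE's algorithm; all function names are hypothetical.

```python
import math
from itertools import combinations


def crosses(edge, side):
    """True if the edge has exactly one endpoint in `side`."""
    u, v = edge
    return (u in side) != (v in side)


def cut_probability(edges, side):
    """P(at least one edge crosses the cut (side, rest)), independent edges."""
    miss = 1.0
    for e, p in edges.items():
        if crosses(e, side):
            miss *= 1.0 - p
    return 1.0 - miss


def cut_weight(edges, side):
    """Sum of -log(1 - p(e)) over crossing edges."""
    return sum(-math.log(1.0 - p) for e, p in edges.items() if crosses(e, side))


def min_cut(nodes, edges):
    """Exhaustive minimum cut: each bipartition is enumerated once by fixing
    the first node on the recorded side."""
    nodes = sorted(nodes)
    first, rest = nodes[0], nodes[1:]
    best = None
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            w = cut_weight(edges, side)
            if best is None or w < best[0]:
                best = (w, side)
    return best


# Assumed edge probabilities (hypothetical reconstruction of the figure).
edges = {("A", "B"): 0.7, ("A", "C"): 0.7, ("B", "C"): 0.8, ("C", "D"): 0.9}
best_w, best_side = min_cut({"A", "B", "C", "D"}, edges)
q = 0.95
qualifies = best_w >= -math.log(1 - q)
```

Under this assumed assignment the weakest cut separates D from {A, B, C} and is crossed with probability 0.9, so the set would not pass the test at q = 0.95.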
DNA damage response in S. cerevisiae
• 47 DNA damage response expression profiles (Gasch et al., 01)
• Front nodes: 2,074 genes with at least a two-fold expression change
• Network and confidence values: purification enrichment (PE) scores (Collins et al. 07)
Module size | GO biological process (p-value) | GO-slim protein complexes (p-value)
346 | ribosome biogenesis and assembly (1.2·10^-117); translation (1.0·10^-85); rRNA processing (7.5·10^-79); 35S primary transcript processing (4.6·10^-44); ribosome assembly (4.3·10^-39); ribosomal large subunit biogenesis (9.2·10^-14); rRNA modification (4.4·10^-12) | ribosome (5.9·10^-91); eukaryotic 43S preinitiation complex (3.8·10^-49); small nucleolar ribonucleoprotein complex (1.5·10^-41); DNA-directed RNA polymerase III complex (3.1·10^-17); exosome (RNase complex) (4.4·10^-15); DNA-directed RNA polymerase I complex (5.7·10^-14); Noc complex (3.2·10^-6)
38 | protein catabolism (1.8·10^-46); proteolysis (9.0·10^-44); ubiquitin cycle (1.1·10^-42) | proteasome complex (sensu Eukaryota) (5.7·10^-71); proteasome core complex (sensu Eukaryota) (9.4·10^-32)
12 | histone acetylation (3.6·10^-13); chromatin modification (5.9·10^-11); transcription from RNA polymerase II promoter (1.4·10^-6) | histone acetyltransferase complex (2.1·10^-12)
12 | translation (1.1·10^-14) | ribosome (1.4·10^-15)
12 | nuclear mRNA splicing, via spliceosome (3.5·10^-21) | spliceosome complex (3.5·10^-17); small nuclear ribonucleoprotein complex (2.5·10^-15)
10 | barbed-end actin filament capping (4.8·10^-6); endocytosis (1.1·10^-5); cytoskeleton organization and biogenesis (2.8·10^-5) | F-actin capping protein complex (4.8·10^-6)
8 | establishment and/or maintenance of chromatin architecture (1.1·10^-5) | chromatin remodeling complex (4.6·10^-6)
7 | glycogen metabolism (3.0·10^-8); sporulation (sensu Fungi) (2.0·10^-6) | protein phosphatase type 1 complex (3.3·10^-5)
6 | translation (1.1·10^-7) | ribosome (4.0·10^-8)
6 | tRNA processing (2.5·10^-14); rRNA processing (2.2·10^-9) | ribonuclease P complex (9.2·10^-8)
4 | trehalose biosynthesis (6.8·10^-14) | alpha,alpha-trehalose-phosphate synthase complex (UDP-forming) (6.8·10^-14)
4 | ubiquitin-dependent protein catabolism (5.2·10^-7) | –
3 | pseudohyphal growth (9.8·10^-7) | cAMP-dependent protein kinase complex (9.6·10^-7)
3 | proteasome assembly (3.2·10^-6); protein folding (3.9·10^-6) | –
DNA damage response modules
[Figure: modules labeled cytoplasmic ribosome biogenesis; proteasome; mitochondrial ribosome, small subunit; mitochondrial ribosome, large subunit; spliceosome; novel actin-localized pathway?; Hsp90; PKA; trehalose biosynthesis; ribonuclease P. Annotations: "Suggests SWS2 as a novel member"; "Novel pathway enriched with actin-localized proteins; supported in other datasets; similar deletion phenotypes".]
Comparison with prior work
[Figure: combined measure of sensitivity (% of annotations enriched) and specificity (% of modules enriched) at p < 0.001, compared across: clustering of expression data only; clustering of expression & network (Hanisch et al., 2002); expression similarity + network connectivity; expression similarity + confident network connectivity.]
Summary
• Algorithms using co-expression + networks to detect functionally coherent modules
• Accommodate both sparse and dense subnetworks
• Subnetworks linked to osmotic shock and DNA damage
• A general framework for confident connectivity in PPI networks
• The next steps:
  ▫ Co-expression is not the only interesting way to utilize GE data
  ▫ Scaling to complex human datasets