MATISSE - Modular Analysis for Topology of Interactions and Similarity SEts
http://acgt.cs.tau.ac.il/matisse
Igor Ulitsky and Ron Shamir. Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007).
Microarray data analysis
• Input: expression levels of (all) genes in several conditions
• Analysis methods:
  • Clustering (CLICK)
  • Biclustering (SAMBA)
  • Extraction of regulatory networks
Protein interaction network analysis
• Input: network with nodes = proteins/genes and edges = interactions
• Analysis methods:
  • Global properties
  • Motif content analysis
  • Complex extraction
  • Cross-species comparison
Integrated analysis
• Combined support for low-quality data
• Joint visualization
• Statistics of known pathways
• Detection of “hot spots”
MATISSE
• Identify sets of genes (modules) that:
  • have highly correlated expression patterns
  • induce connected subgraphs in the interaction network
[Figure legend: Interaction; High similarity]
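Both module criteria can be checked directly for a candidate gene set. The sketch below is a minimal illustration, assuming the network is an adjacency dict of undirected interactions and expression profiles are plain lists; it summarises co-expression by the mean pairwise Pearson correlation, which is a simple stand-in chosen here and not MATISSE's actual score. Genes without a profile are skipped in the correlation but still count toward connectivity.

```python
import math
from collections import deque


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def induces_connected_subgraph(genes, adj):
    """True if the subgraph of `adj` induced by `genes` is connected.

    `adj` maps each gene to the set of its interaction partners (undirected).
    """
    genes = set(genes)
    if not genes:
        return False
    start = next(iter(genes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            # Only walk along edges whose endpoints are both in the candidate set.
            if v in genes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == genes


def mean_pairwise_correlation(genes, profiles):
    """Average Pearson correlation over pairs of genes that have a profile."""
    measured = sorted(g for g in genes if g in profiles)
    pairs = [(a, b) for i, a in enumerate(measured) for b in measured[i + 1:]]
    if not pairs:
        return 0.0
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
profiles = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 2, 3, 5]}
print(induces_connected_subgraph({"A", "B", "C"}, adj))  # True
print(induces_connected_subgraph({"A", "C"}, adj))       # False: B is the only link
print(round(mean_pairwise_correlation({"A", "B", "C", "D"}, profiles), 3))  # 0.988
```

Gene D has no profile, so it is ignored by the correlation summary; in the same way, genes lacking expression data can still serve as connectors in the network.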
MATISSE workflow
• Seed generation
• Greedy optimization
• Significance filtering
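The greedy optimization step can be illustrated with a sketch that grows a seed by adding network neighbours, which keeps the module connected by construction. It assumes a pairwise log-likelihood-ratio score in which similarities of gene pairs inside a module and of background pairs follow two Gaussian distributions; this scoring model and every parameter value (`MU_IN`, `SD_IN`, `MU_OUT`, `SD_OUT`, `PRIOR_IN`, `max_size`) are hypothetical and not taken from the slides. Seed generation and significance filtering are not shown.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


# Hypothetical model: within-module similarities ~ N(0.6, 0.2^2),
# background similarities ~ N(0.0, 0.3^2), prior within-module fraction 0.1.
MU_IN, SD_IN, MU_OUT, SD_OUT, PRIOR_IN = 0.6, 0.2, 0.0, 0.3, 0.1


def pair_weight(s):
    """Log-likelihood ratio: within-module pair vs background pair, given similarity s."""
    return (math.log(PRIOR_IN) + normal_logpdf(s, MU_IN, SD_IN)
            - math.log(1 - PRIOR_IN) - normal_logpdf(s, MU_OUT, SD_OUT))


def module_score(module, sim):
    """Sum of pair weights over module gene pairs that have a similarity value."""
    genes = sorted(module)
    total = 0.0
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            s = sim.get((a, b), sim.get((b, a)))
            if s is not None:  # pairs involving genes without data contribute nothing
                total += pair_weight(s)
    return total


def greedy_expand(seed, adj, sim, max_size=100):
    """Repeatedly add the neighbouring gene that most improves the score.

    Stops when no addition helps or when max_size (a hypothetical cap) is reached.
    Only network neighbours are considered, so the module stays connected.
    """
    module = set(seed)
    score = module_score(module, sim)
    while len(module) < max_size:
        frontier = {v for u in module for v in adj.get(u, ()) if v not in module}
        best, best_score = None, score
        for v in sorted(frontier):
            candidate = module_score(module | {v}, sim)
            if candidate > best_score:
                best, best_score = v, candidate
        if best is None:
            break
        module.add(best)
        score = best_score
    return module, score


adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
sim = {("A", "B"): 0.9, ("A", "C"): 0.9, ("B", "C"): 0.9,
       ("A", "D"): 0.0, ("B", "D"): 0.0, ("C", "D"): 0.0}
module, score = greedy_expand({"A"}, adj, sim)
print(sorted(module))  # ['A', 'B', 'C']
```

Recomputing the full score for every candidate keeps the sketch short; an efficient version would update the score incrementally with the new gene's pair weights.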
Advantages of MATISSE
• No need for confidence estimation on individual measurements
• Works even when only a fraction of the genes have expression patterns
• Can handle any similarity data, not only expression
• Produces connected modules
• No need to specify the number of modules
Osmotic shock response of S. cerevisiae
• Network of 6,246 genes and 65,990 protein-protein and protein-DNA interactions
• 133 experimental conditions – response of perturbed strains to osmotic shock (O’Rourke and Herskowitz, 2004)
• 2,000 genes selected by a variation criterion
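The slides do not say which variation criterion was used. A minimal sketch, assuming variance across conditions as the criterion and complete data (no missing values), keeps the top-ranked genes:

```python
import statistics


def top_variable_genes(expr, n=2000):
    """Return the n genes whose profiles have the largest population variance.

    `expr` maps gene -> list of expression values, one per condition.
    Variance is an assumed stand-in for the unspecified variation criterion.
    """
    ranked = sorted(expr, key=lambda g: statistics.pvariance(expr[g]), reverse=True)
    return ranked[:n]


expr = {"g1": [0, 0, 0], "g2": [0, 5, 10], "g3": [1, 2, 3]}
print(top_variable_genes(expr, n=2))  # ['g2', 'g3']
```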
GO and promoter analysis
(Size = genes in the subnetwork; Front = genes in the subnetwork that have expression profiles; TFs = transcription factors enriched in the subnetwork's promoter analysis)
Subnetwork | Size | Front | Enriched GO terms (p-value) | TFs (p-value)
1 | 120 | 119 | processing of 20S pre-rRNA (< 0.001); rRNA processing (< 0.001); 35S primary transcript processing (< 0.001); ribosomal large subunit assembly and maintenance (0.019); rRNA modification (< 0.001); ribosome biogenesis (0.029) | Fhl1 (4.82E-16); Rap1 (2.89E-11); Sfp1 (2.98E-08)
2 | 120 | 118 | translational elongation (< 0.001) | Fhl1 (1.03E-05)
3 | 120 | 118 | processing of 20S pre-rRNA (< 0.001); rRNA processing (0.03); 35S primary transcript processing (0.011); ribosomal large subunit assembly and maintenance (0.019); ribosomal large subunit biogenesis (< 0.001) | -
5 | 120 | 112 | signal transduction during filamentous growth (0.01); conjugation with cellular fusion (< 0.001) | Ste12 (5.41E-13); Dig1 (5.41E-13)
6 | 120 | 99 | transcription from RNA polymerase III promoter (< 0.001); transcription from RNA polymerase I promoter (0.006) | -
7 | 120 | 107 | ergosterol biosynthesis (< 0.001); hexose transport (0.019) | -
8 | 114 | 85 | chromatin remodeling (0.05) | -
11 | 120 | 114 | pseudohyphal growth (0.01); response to stress (< 0.001) | Msn2 (3.17E-04); Msn4 (1.82E-12)
14 | 120 | 102 | ubiquitin-dependent protein catabolism (0.047) | -
15 | 120 | 96 | nuclear mRNA splicing, via spliceosome (< 0.001) | -
16 | 89 | 61 | ubiquitin-dependent protein catabolism (< 0.001) | Rpn4 (6.44E-06)
17 | 120 | 109 | response to stress (< 0.001); mitochondrial electron transport (< 0.001) | Msn4 (1.74E-03)
18 | 87 | 59 | nuclear mRNA splicing, via spliceosome (0.012) | -
20 | 46 | 35 | pyridoxine metabolism (0.045) | -
Pheromone response subnetwork
[Figure legend: Back; Front]
Proteolysis subnetwork
[Figure legend: Back; Front]
Performance comparison
[Figure: % of modules with category enrichment at p < 10^-3, for MATISSE, Co-Clustering, CLICK and Random, in four annotation sources: GO-Process, GO-Compartment, MIPS Phenotypes, KEGG Pathways]
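The slides do not name the enrichment test behind "p < 10^-3". A common choice, assumed here, is the one-sided hypergeometric test: given a background of N genes of which K carry an annotation, how surprising is it that k of the n module genes carry it? This sketch applies no multiple-testing correction.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated with the category,
    n: module size, k: module genes annotated with the category.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# 5 of 5 module genes annotated, when 5 of 10 background genes are annotated:
print(hypergeom_enrichment_p(k=5, n=5, K=5, N=10))  # 1/252, about 0.00397
```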
Performance comparison (2)
[Figure: % of annotations with enrichment at p < 10^-3 in modules, for MATISSE, Co-Clustering, CLICK and Random, in four annotation sources: GO-Process, GO-Compartment, MIPS Phenotypes, KEGG Pathways]
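The two performance charts read the same enrichment results from two angles: the share of modules that are enriched for at least one category, and the share of annotation categories that are enriched in at least one module. A minimal sketch of both summaries, assuming enrichment p-values have already been computed for every (module, category) pair; the module and category names are hypothetical:

```python
def enrichment_coverage(pvalues, threshold=1e-3):
    """Summarise enrichment results from two angles.

    `pvalues` maps (module, category) -> enrichment p-value. Returns a pair:
    (% of modules enriched for at least one category,
     % of categories enriched in at least one module).
    """
    modules = {m for m, _ in pvalues}
    categories = {c for _, c in pvalues}
    enriched = [(m, c) for (m, c), p in pvalues.items() if p < threshold]
    pct_modules = 100.0 * len({m for m, _ in enriched}) / len(modules)
    pct_categories = 100.0 * len({c for _, c in enriched}) / len(categories)
    return pct_modules, pct_categories


pvalues = {("m1", "GO:a"): 1e-5, ("m1", "GO:b"): 0.5,
           ("m2", "GO:a"): 0.2, ("m2", "GO:b"): 0.3,
           ("m3", "GO:a"): 1e-4, ("m3", "GO:b"): 0.9}
print(enrichment_coverage(pvalues))  # about (66.7, 50.0)
```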
Human cell cycle
• Constructed a network with 6,000 nodes and 25,000 edges from:
  • HPRD
  • BIND
  • Y2H studies
  • SPIKE
• HeLa cell cycle time series (Whitfield ’02)
• Produced subnetworks enriched for each of the phases of the cell cycle
M phase subnetwork
Extensions of MATISSE
• CEZANNE
  • Utilizes confidence-based networks
  • Extracts subnetworks that are connected with high confidence and co-expressed
• Applied to 11 studies of gene expression in the blood
• Not yet implemented in the MATISSE application
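The slide does not give CEZANNE's exact formulation. One straightforward way to quantify "connected with high confidence", assumed here purely for illustration and not necessarily CEZANNE's method, is to treat each edge confidence as an independent probability that the edge exists and estimate, by Monte Carlo sampling, the probability that the gene set stays connected. The sketch uses union-find to count components in each sampled network.

```python
import random


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def connectivity_probability(nodes, edge_conf, trials=10000, seed=0):
    """Monte Carlo estimate of P(induced subgraph on `nodes` is connected).

    Assumes each edge (u, v) exists independently with probability
    edge_conf[(u, v)].
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    node_set = set(nodes)
    edges = [(u, v, p) for (u, v), p in edge_conf.items()
             if u in node_set and v in node_set]
    connected = 0
    for _ in range(trials):
        parent = {x: x for x in nodes}
        components = len(nodes)
        for u, v, p in edges:
            if rng.random() < p:  # sample whether this edge is present
                ru, rv = _find(parent, u), _find(parent, v)
                if ru != rv:
                    parent[ru] = rv
                    components -= 1
        if components == 1:
            connected += 1
    return connected / trials


# Triangle with three 0.5-confidence edges: connected iff at least two edges
# are present, which has probability 0.5.
triangle = {("A", "B"): 0.5, ("B", "C"): 0.5, ("A", "C"): 0.5}
print(connectivity_probability(["A", "B", "C"], triangle))  # close to 0.5
```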
Extensions of MATISSE
• DEGAS
  • Utilizes case-control expression data
  • Identifies dysregulated pathways: areas in the network in which many genes are dysregulated in most of the cases
• Beta version implemented in the MATISSE software
• Ulitsky, Karp and Shamir, RECOMB 2008
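The phrase "many genes are dysregulated in most of the cases" can be made concrete as a coverage condition on a candidate subnetwork. This sketch assumes "many genes" means at least `k` module genes and "most of the cases" means all but at most `max_uncovered` cases; both parameter names are hypothetical, and the connectivity of the module in the network has to be checked separately.

```python
def covered_cases(module, dysregulated, k):
    """Cases in which at least k genes of `module` are dysregulated.

    `dysregulated` maps case id -> set of genes dysregulated in that case.
    """
    module = set(module)
    return [case for case, genes in dysregulated.items() if len(genes & module) >= k]


def is_dysregulated_pathway(module, dysregulated, k, max_uncovered):
    """True if all but at most `max_uncovered` cases have >= k dysregulated module genes."""
    uncovered = len(dysregulated) - len(covered_cases(module, dysregulated, k))
    return uncovered <= max_uncovered


dysregulated = {"p1": {"A", "B"}, "p2": {"B", "C"}, "p3": {"D"}}
module = {"A", "B", "C"}
print(covered_cases(module, dysregulated, k=2))  # ['p1', 'p2']
print(is_dysregulated_pathway(module, dysregulated, k=2, max_uncovered=1))  # True
```

Cases p1 and p2 are covered by different genes, which is the point of requiring dysregulation in the pathway rather than in the same genes for every patient.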
Difficulties with prior approaches
• In case-control data, correlation between gene patterns can arise from diverse factors unrelated to the disease
• Patients differ in:
  • Genetic background
  • Other diseases/confounding factors
  • Disease grade
• Current methods assume that the same genes are dysregulated in all the patients
• A weaker assumption: many dysregulated genes, not necessarily the same ones in every patient, appear in the same dysregulated pathway
HD down-regulated
• The pathway down-regulated in Huntington’s disease (HD)
• Enriched with:
  • HD modifiers
  • HD-relevant genes
  • Calcium signalling
[Figure labels: Huntingtin; Clear outlier]
Extensions of MATISSE
• Identification of modules correlated with external parameters
  • Numerical parameters: age, tumor grade, etc.
  • Logical parameters: gender, tumor type
• Identifies subnetworks with genes that are both:
  • correlated with the clinical parameter
  • correlated with one another
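The two requirements can be summarised for a candidate gene set by two numbers: how strongly its genes track the clinical parameter, and how strongly they track one another. The sketch below is an illustrative assumption, not the tool's scoring: it uses the mean absolute Pearson correlation with the parameter and the mean pairwise gene-gene correlation. A logical parameter such as gender can be coded as 0/1, in which case Pearson correlation equals the point-biserial correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def clinical_module_summary(module, profiles, clinical):
    """Return (mean |corr(gene, clinical)|, mean gene-gene correlation).

    `profiles` maps gene -> values per sample; `clinical` holds one value per
    sample (a logical parameter coded 0/1 gives the point-biserial correlation).
    """
    genes = sorted(g for g in module if g in profiles)
    if not genes:
        return 0.0, 0.0
    with_param = sum(abs(pearson(profiles[g], clinical)) for g in genes) / len(genes)
    pairs = [(a, b) for i, a in enumerate(genes) for b in genes[i + 1:]]
    among = (sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
             if pairs else 0.0)
    return with_param, among


profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8]}
age = [30, 40, 50, 60]
print(clinical_module_summary({"g1", "g2"}, profiles, age))  # (1.0, 1.0)
```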
MATISSE tool capabilities
• MATISSE algorithm execution
• Dynamic subnetwork layout
• Customized node/edge highlighting
• Dynamic expression matrix viewer
• Module annotation
  • TANGO – Gene Ontology
  • Annotations with custom datasets
• Calculation of different coefficients based on network/expression