MATISSE - Modular Analysis for Topology of Interactions and Similarity SEts
http://acgt.cs.tau.ac.il/matisse
Igor Ulitsky and Ron Shamir. Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007).



Microarray data analysis

• Input: expression levels of (all) genes in several conditions

• Analysis methods:
  • Clustering (CLICK)
  • Biclustering (SAMBA)
  • Extraction of regulatory networks


Protein interaction network analysis

• Input: network with nodes = proteins/genes, edges = interactions

• Analysis methods:
  • Global properties
  • Motif content analysis
  • Complex extraction
  • Cross-species comparison


Integrated analysis

• Combined support for low quality data

• Joint visualization

• Statistics of known pathways

• Detection of “hot spots”


MATISSE

• Identify sets of genes (modules) that
  • Have highly correlated expression patterns
  • Induce connected subgraphs in the interaction network

[Figure legend: edge types are Interaction and High Similarity]
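The two module criteria above can be checked for a candidate gene set with a minimal sketch like the following. It assumes that every gene in the set has an expression profile and that similarity is measured by Pearson correlation; the function names and data layout are illustrative and are not taken from the MATISSE software.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def induces_connected_subgraph(genes, edges):
    """True if the interaction edges restricted to `genes` connect all of them."""
    genes = set(genes)
    if not genes:
        return False
    adj = {g: set() for g in genes}
    for u, v in edges:
        if u in genes and v in genes:
            adj[u].add(v)
            adj[v].add(u)
    # Breadth-first search from an arbitrary gene of the set.
    start = next(iter(genes))
    seen, queue = {start}, deque([start])
    while queue:
        for nb in adj[queue.popleft()] - seen:
            seen.add(nb)
            queue.append(nb)
    return seen == genes


def mean_pairwise_correlation(genes, expression):
    """Average Pearson correlation over all gene pairs in the set."""
    pairs = list(combinations(sorted(genes), 2))
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)


# Toy data (hypothetical): a path A-B-C-D and four-condition profiles.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
expression = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [1, 3, 5, 7],
    "D": [4, 3, 2, 1],
}
candidate = {"A", "B", "C"}
print(induces_connected_subgraph(candidate, edges))
print(mean_pairwise_correlation(candidate, expression))
```

In this toy example the set {A, B, C} is connected and co-expressed, while {A, C} alone would fail the connectivity test because its only link runs through B.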


MATISSE workflow

• Seed generation

• Greedy optimization

• Significance filtering
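To illustrate the greedy optimization step only, here is a simplified sketch that grows a module from a seed by repeatedly adding the neighbouring node with the largest positive gain. The additive pairwise score, the `grow_module` name and the toy weights are assumptions made for illustration; the scoring function and the set of moves actually used by MATISSE are described in the paper and are not reproduced here.

```python
def grow_module(seed, adjacency, weight):
    """Greedily add neighbouring nodes while the module score improves.

    adjacency: dict node -> set of neighbours in the interaction network
    weight:    function (u, v) -> similarity weight, positive for similar pairs
    The module score is assumed to be the sum of pairwise weights.
    """
    module = set(seed)
    while True:
        # Only nodes adjacent to the module are candidates, so the
        # module stays connected after every addition.
        frontier = set().union(*(adjacency.get(u, set()) for u in module)) - module
        best_gain, best_node = 0.0, None
        for v in sorted(frontier):
            gain = sum(weight(u, v) for u in module)
            if gain > best_gain:
                best_gain, best_node = gain, v
        if best_node is None:
            return module
        module.add(best_node)


# Toy network (hypothetical): a path A-B-C-D with pairwise similarity weights.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
W = {
    frozenset("AB"): 1.0, frozenset("AC"): 0.5, frozenset("BC"): 1.0,
    frozenset("AD"): -1.0, frozenset("BD"): -1.0, frozenset("CD"): 0.5,
}
module = grow_module({"A"}, adjacency, lambda u, v: W[frozenset((u, v))])
print(sorted(module))
```

Node D is adjacent to the module but is left out because its total similarity to the current members is negative.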


Advantages of MATISSE

• No need for confidence estimation on individual measurements

• Works even when only a fraction of the genes have expression patterns

• Can handle any similarity data, not only expression

• Produces connected modules

• No need to specify the number of modules


Osmotic shock response of S. cerevisiae

• Network of 6,246 genes and 65,990 protein-protein and protein-DNA interactions

• 133 experimental conditions – response of perturbed strains to osmotic shock (O’Rourke and Herskowitz, 2004)

• 2,000 genes filtered based on variation criterion
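The slide does not state which variation criterion was used. As one plausible reading, the sketch below keeps the `k` genes with the highest variance across conditions; the function name and the choice of variance as the criterion are assumptions.

```python
from statistics import pvariance


def most_variable_genes(expression, k):
    """Return the k genes with the largest variance across conditions.

    expression: dict gene -> list of expression levels, one per condition
    """
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    return ranked[:k]


# Toy profiles (hypothetical): g1 is flat, g2 varies the most.
profiles = {"g1": [1, 1, 1], "g2": [0, 5, 10], "g3": [1, 2, 3]}
print(most_variable_genes(profiles, 2))
```

For the data set on this slide, `k` would be 2,000.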


GO and promoter analysis

| Subnetwork | Size | Front | Enriched GO term | P-value | TF | P-value |
|---|---|---|---|---|---|---|
| 1 | 120 | 119 | processing of 20S pre-rRNA | < 0.001 | Fhl1 | 4.82E-16 |
| | | | rRNA processing | < 0.001 | Rap1 | 2.89E-11 |
| | | | 35S primary transcript processing | < 0.001 | Sfp1 | 2.98E-08 |
| | | | ribosomal large subunit assembly and maintenance | 0.019 | | |
| | | | rRNA modification | < 0.001 | | |
| | | | ribosome biogenesis | 0.029 | | |
| 2 | 120 | 118 | translational elongation | < 0.001 | Fhl1 | 1.03E-05 |
| 3 | 120 | 118 | processing of 20S pre-rRNA | < 0.001 | | |
| | | | rRNA processing | 0.03 | | |
| | | | 35S primary transcript processing | 0.011 | | |
| | | | ribosomal large subunit assembly and maintenance | 0.019 | | |
| | | | ribosomal large subunit biogenesis | < 0.001 | | |
| 5 | 120 | 112 | signal transduction during filamentous growth | 0.01 | Ste12 | 5.41E-13 |
| | | | conjugation with cellular fusion | < 0.001 | Dig1 | 5.41E-13 |
| 6 | 120 | 99 | transcription from RNA polymerase III promoter | < 0.001 | | |
| | | | transcription from RNA polymerase I promoter | 0.006 | | |
| 7 | 120 | 107 | ergosterol biosynthesis | < 0.001 | | |
| | | | hexose transport | 0.019 | | |
| 8 | 114 | 85 | chromatin remodeling | 0.05 | | |
| 11 | 120 | 114 | pseudohyphal growth | 0.01 | Msn2 | 3.17E-04 |
| | | | response to stress | < 0.001 | Msn4 | 1.82E-12 |
| 14 | 120 | 102 | ubiquitin-dependent protein catabolism | 0.047 | | |
| 15 | 120 | 96 | nuclear mRNA splicing, via spliceosome | < 0.001 | | |
| 16 | 89 | 61 | ubiquitin-dependent protein catabolism | < 0.001 | Rpn4 | 6.44E-06 |
| 17 | 120 | 109 | response to stress | < 0.001 | Msn4 | 1.74E-03 |
| | | | mitochondrial electron transport | < 0.001 | | |
| 18 | 87 | 59 | nuclear mRNA splicing, via spliceosome | 0.012 | | |
| 20 | 46 | 35 | pyridoxine metabolism | 0.045 | | |


Pheromone response subnetwork

[Figure: subnetwork diagram; the legend distinguishes Back and Front nodes]


Proteolysis subnetwork

[Figure: subnetwork diagram; the legend distinguishes Back and Front nodes]


Performance comparison

[Figure: bar chart of the % of modules with category enrichment at p < 10^-3. Methods: Matisse, Co-Clustering, CLICK, Random. Annotation categories: GO-Process, GO-Compartment, MIPS Phenotypes, KEGG Pathways.]
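A common way to obtain an enrichment p-value for one module and one annotation category is the hypergeometric tail test, sketched below. The slides do not say which test was used for this comparison, so the choice of test, the function name and its parameters are assumptions; no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(population, annotated, module_size, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(population, annotated, module_size).

    population:  number of genes in the background
    annotated:   number of background genes carrying the annotation
    module_size: number of genes in the module
    overlap:     number of module genes carrying the annotation
    """
    upper = min(annotated, module_size)
    total = comb(population, module_size)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, module_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 10 genes, 4 annotated, module of 3 with 2 annotated.
p = enrichment_p_value(population=10, annotated=4, module_size=3, overlap=2)
print(p)
```

A module would then count as enriched for a category in the chart's sense when this value falls below 10^-3.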


Performance comparison (2)

[Figure: bar chart of the % of annotations with enrichment at p < 10^-3 in modules. Methods: Matisse, Co-Clustering, CLICK, Random. Annotation categories: GO-Process, GO-Compartment, MIPS Phenotypes, KEGG Pathways.]


Human cell cycle

• Constructed a network with 6,000 nodes, 25,000 edges
  • HPRD
  • BIND
  • Y2H studies
  • SPIKE

• HeLa cell cycle time series (Whitfield ’02)

• Produced subnetworks enriched with all the phases of the cell cycle


M phase subnetwork


Extensions of MATISSE

• CEZANNE
  • Utilizes confidence-based networks
  • Extracts subnetworks that are connected with high confidence and co-expressed

• Applied to 11 studies of gene expression in the blood

• Not yet implemented in the MATISSE application


Extensions of MATISSE

• DEGAS
  • Utilizes case-control expression data
  • Identifies dysregulated pathways – areas in the network in which many genes are dysregulated in most of the cases

• Beta version implemented in the MATISSE software

• Ulitsky, Karp and Shamir, RECOMB 2008


Difficulties with prior approaches

• In case-control data, gene pattern correlation can be due to diverse non-disease related factors

• Patients are different
  • Genetic background
  • Other diseases/confounding factors
  • Disease grade

• Current methods assume that the same genes are dysregulated in all the patients

• A weaker assumption – a lot of dysregulated genes appear in the same dysregulated pathway
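The weaker assumption can be phrased as a simple test on a candidate gene set: in all but a few cases, at least some minimum number of its genes are dysregulated, without requiring the same genes in every patient. The sketch below checks only that condition. The names `k` and `max_outliers` are hypothetical parameters, and the search for a connected subnetwork that satisfies the condition is not shown.

```python
def covers_cases(genes, dysregulated, k, max_outliers):
    """True if at least k of `genes` are dysregulated in all but
    at most `max_outliers` cases.

    dysregulated: dict case -> set of genes called dysregulated in that case
    """
    genes = set(genes)
    uncovered = sum(1 for hits in dysregulated.values() if len(genes & hits) < k)
    return uncovered <= max_outliers


# Toy calls (hypothetical): different patients hit different genes of the pathway.
calls = {
    "patient1": {"A", "B"},
    "patient2": {"B", "C", "X"},
    "patient3": {"X"},
}
pathway = {"A", "B", "C"}
print(covers_cases(pathway, calls, k=2, max_outliers=1))
```

Here no single gene is dysregulated in every patient, yet the pathway as a whole is hit at least twice in two of the three cases.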



HD down-regulated

• The pathway down-regulated in Huntington’s disease (HD)

• Enriched with:
  • HD modifiers
  • HD relevant genes
  • Calcium signalling

[Figure annotations: Huntingtin; Clear outlier]


Extensions of MATISSE

• Identification of modules correlated with external parameters
  • Numerical parameters: age, tumor grade, etc.
  • Logical parameters: gender, tumor type

• Identifies subnetworks with genes that are both
  • Correlated with the clinical parameter
  • Correlated with one another


MATISSE tool capabilities

• MATISSE algorithm execution

• Dynamic subnetwork layout

• Customized node/edge highlighting

• Dynamic expression matrix viewer

• Module annotation
  • TANGO – Gene Ontology
  • Annotations with custom datasets

• Calculation of different coefficients based on network/expression