NetBioSIG2013-Talk David Amar

1

From gene networks to module-maps:

improving interpretability and prediction in systems biology

David Amar, School of Computer Science

Tel Aviv University, July 2013

2

Biological interaction networks
Nodes: genes/proteins or other molecules
Edges: based on evidence for interaction

Voineagu et al. 2011 Nature

Breker and Schuldiner 2009

Gene co-expression

Protein-protein interaction

Genetic interaction

Goal: Integrated analysis of different types of networks

3

Integration of networks: a better picture, reduced noise
Traditional approaches:
Look for “conserved” clusters: co-clustering (Hanisch et al. 2002); JointCluster (Narayanan et al. 2011)
Look for clusters with special properties: MATISSE (Ulitsky and Shamir 2008)

4

Analysis of network pairs
Interaction types can differ: within (“positive”) vs. between (“negative”) functional units
Input: networks P, N with the same vertex set
Goal: summarize both networks in a module map
Node: a module, i.e., a gene set highly connected in P
Link: two modules highly interconnected in N
Between-pathway models

Kelley and Ideker 2005; Ulitsky et al. 2008; Kelley and Kingsford 2011; Leiserson et al. 2011

[Figure: networks P and N]

5

Algorithms
Different definitions for the links and for the optimization objective function
The problems are NP-hard
Approximation is also hard (weighted graphs)

Our algorithmic strategy:
Initiators: find a good initial solution
Improvers: refine by merging/excluding modules

6

Initiators
Cluster P: Hierarchical; Node addition
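Hierarchical clustering of P is one of the initiators. The slide does not state the linkage rule, so the Python sketch below assumes average linkage on the P weights and stops merging once the best mean weight between two clusters is no longer positive. The `threshold` parameter and the `lookup` helper (missing pairs treated as non-edges with weight -1) are illustrative assumptions.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple


def average_linkage(nodes: Iterable[int],
                    weight: Callable[[int, int], float],
                    threshold: float = 0.0) -> List[FrozenSet[int]]:
    """Agglomerative clustering with average linkage (hypothetical initiator).

    Repeatedly merges the two clusters with the highest mean pairwise
    P weight and stops once that mean is <= threshold.
    """
    clusters = [frozenset([v]) for v in nodes]

    def mean_weight(a: FrozenSet[int], b: FrozenSet[int]) -> float:
        return sum(weight(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = mean_weight(clusters[i], clusters[j])
                if best is None or s > best:
                    best, pair = s, (i, j)
        if best <= threshold:
            break  # no remaining pair of clusters is positively connected on average
        i, j = pair
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def lookup(weights: Dict[Tuple[int, int], float], default: float = -1.0):
    """Symmetric weight lookup; pairs not listed are treated as non-edges."""
    return lambda a, b: weights.get((a, b), weights.get((b, a), default))


if __name__ == "__main__":
    P = {(1, 2): 1.0, (3, 4): 1.0}
    print(average_linkage([1, 2, 3, 4], lookup(P)))
```

Each pass scans all cluster pairs, so this naive version is cubic or worse in the number of nodes; it is meant to show the rule, not to scale to thousands of genes.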

Find linked module pairs:
DICER: local search in P and N (Kelley and Ideker 2005; Amar et al. 2013)
MBC-DICER: find bicliques
Define candidate sets U and V that are bicliques in N
Exhaustive solver (FP-MBC, Li et al. 2007); requires tuning
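MBC-DICER starts from candidate sets U and V that form bicliques in N, found with the FP-MBC solver. The Python sketch below is not FP-MBC; it only illustrates the underlying definition: (U, V) is a biclique if every u in U is adjacent to every v in V, and for a fixed U the largest compatible V is the common N-neighborhood of U. Representing N as an adjacency dict (`nbrs`) is an assumption.

```python
from typing import Dict, Optional, Set


def is_biclique(U: Set[int], V: Set[int], nbrs: Dict[int, Set[int]]) -> bool:
    """True if U and V are disjoint and every u in U is adjacent in N to every v in V."""
    return not (U & V) and all(V <= nbrs.get(u, set()) for u in U)


def max_partner(U: Set[int], nbrs: Dict[int, Set[int]]) -> Set[int]:
    """Largest V such that (U, V) is a biclique: the common N-neighborhood of U."""
    common: Optional[Set[int]] = None
    for u in U:
        neighbors = nbrs.get(u, set())
        common = set(neighbors) if common is None else common & neighbors
    return (common or set()) - U


if __name__ == "__main__":
    N = {1: {3, 4, 5}, 2: {3, 4, 5, 6}, 3: {1, 2}, 4: {1, 2}, 5: {1, 2}, 6: {2}}
    V = max_partner({1, 2}, N)
    print(V, is_biclique({1, 2}, V, N))
```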

7

Local improvement (DICER algorithm, Amar et al. PLoS CB 2013)
Link: the sum of N weights between two modules is positive
Goal: enlarge links

Greedy approach: merge module links or add single nodes to a link
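The "add single nodes to a link" step can be sketched as a greedy loop in Python: the link weight is the sum of N weights between U and V, and at each step the node whose addition to either side raises that sum the most is added, until no addition helps. This is a hypothetical illustration of the N side only; DICER also requires each module to be dense in P, which the sketch ignores, and the names `grow_link` and `link_weight` are not from the talk.

```python
from typing import Callable, Iterable, Set, Tuple

Weight = Callable[[int, int], float]


def link_weight(U: Set[int], V: Set[int], n_weight: Weight) -> float:
    """Sum of N weights between modules U and V (a link when this is positive)."""
    return sum(n_weight(u, v) for u in U for v in V)


def grow_link(U: Iterable[int], V: Iterable[int], candidates: Iterable[int],
              n_weight: Weight) -> Tuple[Set[int], Set[int]]:
    """Greedily add single nodes to either side while the link weight increases."""
    U, V = set(U), set(V)
    pool = sorted(candidates)  # fixed order makes tie-breaking deterministic
    while True:
        best_gain, best_move = 0.0, None
        for x in pool:
            if x in U or x in V:
                continue
            gain_u = sum(n_weight(x, v) for v in V)  # change in link weight if x joins U
            gain_v = sum(n_weight(u, x) for u in U)  # change in link weight if x joins V
            if gain_u > best_gain:
                best_gain, best_move = gain_u, ("U", x)
            if gain_v > best_gain:
                best_gain, best_move = gain_v, ("V", x)
        if best_move is None:
            return U, V
        side, x = best_move
        (U if side == "U" else V).add(x)


if __name__ == "__main__":
    W = {(1, 3): 1.0, (1, 4): 1.0, (2, 3): 1.0, (2, 4): 1.0, (5, 3): 1.0, (5, 4): 1.0}
    n = lambda a, b: W.get((a, b), W.get((b, a), -1.0))
    U, V = grow_link({1}, {3}, range(1, 7), n)
    print(U, V, link_weight(U, V, n))
```

Because every accepted step has a strictly positive gain, the link weight increases monotonically and the loop terminates.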

8

Global analysis: node vs. module
Null hypothesis: the edges between v and M are drawn randomly (n = deg(v))

Hypergeometric p-value
Options for weighted graphs:
Use a Wilcoxon rank-sum test
Set a threshold and use the same test

[Figure: node v, module M, and the nodes not in M]
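Both tests on this slide can be sketched in standard-library Python. The hypergeometric tail gives the probability that at least k of v's deg(v) edges land in M by chance; the rank-sum variant for weighted graphs asks whether v's weights to M are larger than its weights to the other nodes. The function names, the one-sided alternative, and the normal approximation without tie correction are assumptions of this sketch, not details from the talk.

```python
from math import comb, erfc, sqrt
from typing import List, Sequence


def hypergeom_pvalue(k: int, n_total: int, module_size: int, degree: int) -> float:
    """P(X >= k): degree edges of v fall at random among n_total candidate nodes,
    module_size of which are in M (upper tail of the hypergeometric)."""
    upper = min(module_size, degree)
    hits = sum(comb(module_size, i) * comb(n_total - module_size, degree - i)
               for i in range(k, upper + 1))
    return hits / comb(n_total, degree)


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def ranksum_pvalue(to_module: Sequence[float], to_others: Sequence[float]) -> float:
    """One-sided Wilcoxon rank-sum test, normal approximation without tie correction."""
    n1, n2 = len(to_module), len(to_others)
    ranks = _average_ranks(list(to_module) + list(to_others))
    r1 = sum(ranks[:n1])
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mu) / sigma
    return 0.5 * erfc(z / sqrt(2))  # upper-tail normal probability


if __name__ == "__main__":
    print(hypergeom_pvalue(5, 10, 5, 5))
    print(ranksum_pvalue([5, 6, 7], [1, 2, 3]))
```

The threshold option from the slide amounts to binarizing the weights and then calling `hypergeom_pvalue` on the resulting edge counts.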

9

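Global analysis: module vs. module
Calculate a p-value for each node in V (against U) and each node in U (against V)
Merge the p-values using Fisher’s method

Under the null hypothesis, the statistic -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom (k = number of p-values)

[Figure: modules U and V and the other nodes]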
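Fisher's method can be computed without a statistics library: with an even number of degrees of freedom (2k), the chi-square survival function has the closed form exp(-x/2) · Σ_{i<k} (x/2)^i / i!. The Python sketch below uses that identity; the function name is hypothetical, and the method assumes the per-node p-values are independent, which nodes sharing a module may not strictly satisfy.

```python
from math import exp, log
from typing import Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square(2k) under H0.

    Uses the closed-form chi-square survival function for even degrees
    of freedom: exp(-x/2) * sum_{i<k} (x/2)**i / i!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 for p in pvalues):
        return 0.0  # a zero p-value makes the combined statistic infinite
    half_x = -sum(log(p) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (x/2)**i / i!, built incrementally
        total += term
    return min(1.0, exp(-half_x) * total)


if __name__ == "__main__":
    print(fisher_combine([0.01, 0.01]))
```

With a single p-value the formula returns that p-value unchanged, which is a quick sanity check on the degrees of freedom.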

10

Global analysis
Given a set of modules M and a set of significant links L, the solution score:

Improvement steps: merge modules if the score improves (select the best step iteratively)

Fast and accurate analysis: decide when to recalculate p-values; perform many merges simultaneously
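The slide scores a solution from its modules M and significant links L; since that score is specific to the method, the Python sketch below takes it as a caller-supplied function and shows only the "select the best merge iteratively" loop. The toy score `within_weight` (total P weight inside modules) is a hypothetical stand-in, not the talk's score, and the sketch omits the speed-ups listed above (deciding when to recompute p-values, simultaneous merges).

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Module = FrozenSet[int]


def greedy_merge(modules: Iterable[Iterable[int]],
                 score_fn: Callable[[List[Module]], float]) -> Tuple[List[Module], float]:
    """Best-improvement merging: apply the single best pairwise merge while it raises the score."""
    current_modules = [frozenset(m) for m in modules]
    current = score_fn(current_modules)
    while len(current_modules) > 1:
        best_score, best_pair = current, None
        for i, j in combinations(range(len(current_modules)), 2):
            trial = [m for k, m in enumerate(current_modules) if k not in (i, j)]
            trial.append(current_modules[i] | current_modules[j])
            s = score_fn(trial)
            if s > best_score:  # strict improvement only
                best_score, best_pair = s, (i, j)
        if best_pair is None:
            break  # no merge improves the score
        i, j = best_pair
        merged = current_modules[i] | current_modules[j]
        current_modules = [m for k, m in enumerate(current_modules) if k not in (i, j)]
        current_modules.append(merged)
        current = best_score
    return current_modules, current


def within_weight(modules: List[Module], weights: Dict[Tuple[int, int], float]) -> float:
    """Hypothetical toy score: total weight of pairs that fall inside a single module."""
    return sum(w for (a, b), w in weights.items()
               if any(a in m and b in m for m in modules))


if __name__ == "__main__":
    W = {(1, 2): 1.0, (3, 4): 1.0, (2, 3): -5.0}
    print(greedy_merge([{1}, {2}, {3}, {4}], lambda ms: within_weight(ms, W)))
```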

11

Experimental Results

12

(0) Simulations
Graphs with 500 nodes; edge weight 1, non-edge weight -1
Plant a tree map with 6 modules (module size 10-20)
Add random Gaussian noise (mean 0, SD = 1.2), additional modules, and bicliques
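The simulation setup can be sketched in Python as follows: 6 disjoint planted modules of size 10-20, P weights of +1 inside modules and -1 elsewhere, N weights of +1 between linked modules and -1 elsewhere, with the links forming a random tree, then symmetric Gaussian noise with SD 1.2 on every pair. How the talk sampled the tree and the decoy modules/bicliques is not stated; the random-parent tree, the zero diagonal, and the omission of decoys are assumptions of this sketch.

```python
import random
from typing import List, Tuple


def simulate(n_nodes: int = 500, n_modules: int = 6,
             size_range: Tuple[int, int] = (10, 20),
             noise_sd: float = 1.2, seed: int = 0):
    """Plant a tree-shaped module map in weighted P and N matrices (hypothetical generator)."""
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)
    modules: List[List[int]] = []
    start = 0
    for _ in range(n_modules):
        size = rng.randint(*size_range)  # inclusive bounds
        modules.append(order[start:start + size])
        start += size
    if start > n_nodes:
        raise ValueError("planted modules do not fit in the graph")

    # Assumed tree map: each module i >= 1 links to one randomly chosen earlier module.
    links = [(rng.randrange(i), i) for i in range(1, n_modules)]

    # Base weights: +1 for an edge, -1 for a non-edge, 0 on the diagonal.
    P = [[-1.0] * n_nodes for _ in range(n_nodes)]
    N = [[-1.0] * n_nodes for _ in range(n_nodes)]
    for a in range(n_nodes):
        P[a][a] = N[a][a] = 0.0
    for m in modules:
        for a in m:
            for b in m:
                if a != b:
                    P[a][b] = 1.0
    for i, j in links:
        for a in modules[i]:
            for b in modules[j]:
                N[a][b] = N[b][a] = 1.0

    # Symmetric Gaussian noise on every off-diagonal pair.
    for a in range(n_nodes):
        for b in range(a + 1, n_nodes):
            for W in (P, N):
                W[a][b] += rng.gauss(0.0, noise_sd)
                W[b][a] = W[a][b]
    return P, N, modules, links
```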

[Figure: Jaccard score (0 to 1) for the initiators MBC-DICER, DICER5, Hierarchical, NodeAddition and DICER, each followed by the global improver, the local improver, or no improver (initiator only)]
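The figure reports a Jaccard score against the planted map without spelling out the matching rule. One common choice, assumed here and not confirmed by the talk, is to match each planted module to its best-overlapping recovered module and average those Jaccard indices, as in this Python sketch.

```python
from typing import List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """|a ∩ b| / |a ∪ b|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_jaccard(planted: List[Set[int]], found: List[Set[int]]) -> float:
    """Average over planted modules of the best Jaccard index with any found module."""
    if not planted:
        return 0.0
    return sum(max((jaccard(p, f) for f in found), default=0.0)
               for p in planted) / len(planted)


if __name__ == "__main__":
    print(best_match_jaccard([{1, 2}, {3, 4}], [{1, 2}, {3}]))
```

A score computed over links (pairs of modules) rather than single modules would need a different matching; the module-level version is shown only because it is the simplest.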

13

(1) Yeast PPI and GI networks
3979 genes
P: protein-protein interactions (45,456 edges)
N: negative genetic interactions (76,267 edges)
Local improvers: poor results (fewer than 3 links)
Results for the global improver:

Initiator      Modules  Gene coverage  Max module size  Enriched GO terms  Enriched modules (%)  Enriched links (%)  Links
MBC-DICER      100      946            49               243                87                    80                  430
DICER5         103      957            46               249                82                    74                  438
DICER          104      837            34               192                67                    61                  498
Hierarchical   123      877            30               186                68                    59                  394
NodeAddition   102      950            49               240                83                    79                  430

14

Link p < 10^-50

Chromatin-related hubs, similar to Baryshnikova et al. 2011

The yeast module map

15

The top links in the map (p < 10^-70)

Between complexes

Between subcomplexes

16

Comparison to extant methods
Analysis of the Collins et al. 2007 data
Compared with extant methods that exploit both positive and negative GIs and their weights

Algorithm                            Modules  Gene coverage  Max module size  Enriched GO terms  Enriched modules (%)  Enriched links (%)  Links
MBC-DICER (Global)                   32       238            20               53                 84                    79                  67
Genecentric (Leiserson et al. 2011)  116      1248           25               39                 63                    43                  58
Kelley and Kingsford 2011            117      355            17               32                 17                    6                   403

17

(2) Arabidopsis PPI and MD networks
P: PPIs; N: metabolic dependencies (MD; Tzfadia et al. 2012)

Discover protein complexes and their metabolic links

18

Using the module map for function prediction
Modules were validated by their ability to predict gene functions in MapMan
Function assignment: each gene receives the best (most significant) term of its module
LOOCV: precision and recall > 80%

New predictions:

Gene       MapMan term                                Module p-value
AT5G48000  sulfur-containing.glucosinolates           0.0001
AT5G42590  sulfur-containing.glucosinolates           0.0001
AT2G30870  redox.ascorbate and glutathione.ascorbate  0.0028
AT4G15440  isoprenoids.carotenoids                    0.0002
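The leave-one-out validation above can be sketched in Python: hide one gene's annotation, predict a term from the other annotated members of its module, and count hits. As a simplification, this sketch predicts the most frequent term among the other members instead of the most significant (enriched) term used in the talk; the function name and the definitions of precision (hits over genes with a prediction) and recall (hits over all annotated genes) are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def loocv_module_prediction(module_of: Dict[str, Hashable],
                            annotation: Dict[str, str]) -> Tuple[float, float]:
    """Leave-one-out majority-vote function prediction within modules (hypothetical)."""
    members: Dict[Hashable, List[str]] = {}
    for gene, module in module_of.items():
        members.setdefault(module, []).append(gene)

    predicted = correct = 0
    for gene, true_term in annotation.items():
        module = module_of.get(gene)
        if module is None:
            continue  # gene not covered by the module map: no prediction
        others = [annotation[o] for o in members[module] if o != gene and o in annotation]
        if not others:
            continue  # no annotated module-mates to learn from
        term, _ = Counter(others).most_common(1)[0]
        predicted += 1
        correct += int(term == true_term)

    precision = correct / predicted if predicted else 0.0
    recall = correct / len(annotation) if annotation else 0.0
    return precision, recall
```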

19

(3) Human case-control profiles
Data: expression profiles of lung cancer (blood)
P: multi-phenotype co-expression network
N: differential correlation (DC), i.e., the change in correlation in disease vs. controls
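A minimal Python sketch of a differential-correlation weight for one gene pair, assuming DC is the Pearson correlation in the disease samples minus the Pearson correlation in the controls; the talk states only "change in correlation", so the choice of Pearson and of a plain difference are assumptions.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def differential_correlation(case_x: Sequence[float], case_y: Sequence[float],
                             ctrl_x: Sequence[float], ctrl_y: Sequence[float]) -> float:
    """Correlation in disease samples minus correlation in control samples."""
    return pearson(case_x, case_y) - pearson(ctrl_x, ctrl_y)


if __name__ == "__main__":
    print(differential_correlation([1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 4], [8, 6, 4, 2]))
```

With this definition DC ranges from -2 to 2; a pair that flips from perfectly anti-correlated in controls to perfectly correlated in disease gets the maximum value.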

Cross-validation: most links show high DC in the test set

Link example:

Breakage of immune activation in cancer (enrichment q-value < 1E-10)

Enrichment for NSCLC-specific causal miRNAs (miR-34 family, p = 0.002, miR2Disease DB)

20

Summary
Integration of networks
Considering different interaction types
A summary module map

Algorithms: initiators and improvers

Algorithms perform well in simulations and on real data: PPI+GI, PPI+MD, human disease (correlation and differential correlation)

Next steps (?)
Cytoscape app (maybe next year…)
Can we use module maps instead of gene networks for network inference?

21

Thank you!

Ron Shamir

Health & Medicine
