NetBioSIG2013-Talk David Amar

From gene networks to module-maps: improving interpretability and prediction in systems biology

David Amar
School of Computer Science, Tel Aviv University
July 2013

Biological interaction networks

- Nodes: genes/proteins or other molecules
- Edges based on evidence for interaction

Voineagu et al. 2011 Nature

Breker and Schuldiner 2009

Gene co-expression

Protein-protein interaction

Genetic interaction

Goal: Integrated analysis of different types of networks

Integration of networks

- Better picture, reduces noise
- Traditional approaches:
  - Look for “conserved” clusters: co-clustering (Hanisch et al. 2002); JointCluster (Narayanan et al. 2011)
  - Look for clusters with special properties: MATISSE (Ulitsky and Shamir 2008)

Analysis of network pairs

- Interaction types can differ: within (“positive”) vs. between (“negative”) functional units
- Input: networks P, N with the same vertex set
- Goal: summarize both networks in a module map
  - Node – module: a gene set highly connected in P
  - Link – two modules highly interconnected in N
- Between-pathway models: Kelley and Ideker 2005; Ulitsky et al. 2008; Kelley and Kingsford 2011; Leiserson et al. 2011
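The input and output described above can be pictured with a small data-structure sketch. It assumes unweighted networks stored as sets of undirected edges; the names `ModuleMap`, `within_density`, `between_density` and the toy genes are illustrative and do not come from the talk.

```python
from dataclasses import dataclass, field
from itertools import combinations, product


def edge(u, v):
    """Canonical key for an undirected edge."""
    return (u, v) if u <= v else (v, u)


@dataclass
class ModuleMap:
    modules: dict                              # module id -> set of genes
    links: set = field(default_factory=set)    # pairs of module ids


def within_density(P, genes):
    """Fraction of gene pairs inside a module that are edges of P."""
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        return 0.0
    return sum(1 for u, v in pairs if edge(u, v) in P) / len(pairs)


def between_density(N, genes_a, genes_b):
    """Fraction of cross pairs between two modules that are edges of N."""
    pairs = list(product(genes_a, genes_b))
    if not pairs:
        return 0.0
    return sum(1 for u, v in pairs if edge(u, v) in N) / len(pairs)


# Toy example (hypothetical genes): two modules joined by one link.
P = {edge(u, v) for u, v in [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")]}
N = {edge(u, v) for u in "abc" for v in "de"}
module_map = ModuleMap(
    modules={"M1": {"a", "b", "c"}, "M2": {"d", "e"}},
    links={("M1", "M2")},
)
m1_density = within_density(P, module_map.modules["M1"])
link_density = between_density(N, module_map.modules["M1"], module_map.modules["M2"])
```

In this toy case both densities are maximal: every pair inside M1 is a P edge and every M1-M2 pair is an N edge, which is the idealized picture of a module and a link.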

Algorithms

- Different definitions for the links and the optimization objective function
- Problems are NP-hard
- Approximation is also hard (weighted graphs)
- Our algorithmic strategy:
  - Initiators: find a good initial solution
  - Improvers: refine by merging/excluding modules

Initiators

- Cluster P
  - Hierarchical
  - Node addition
- Find linked module pairs
  - DICER: local search in P and N (Kelley and Ideker 2005; Amar et al. 2013)
  - MBC-DICER: find bicliques
    - Define candidate sets U and V that are bicliques in N
    - Exhaustive solver (FP-MBC, Li et al. 2007); requires tuning

Local improvement (DICER algorithm, Amar et al. PLoS CB 2013)

- Link: the sum of N weights between the modules is positive
- Goal: enlarge links
- Greedy approach: merge module links or add single nodes to a link
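The link definition and the greedy node-addition step can be sketched as follows. This is a simplified illustration, not the published DICER procedure: it assumes weights are stored in dictionaries keyed by sorted node pairs, that missing pairs get a default weight of -1, and that a node is added only when its total P weight to one side and its total N weight to the other side are both positive. The helper names are hypothetical.

```python
def weight(net, u, v, default=-1.0):
    """Weight of pair (u, v); missing pairs get `default` (an assumption)."""
    key = (u, v) if u <= v else (v, u)
    return net.get(key, default)


def link_weight(N, U, V):
    """Sum of N weights between modules U and V; a link requires it to be > 0."""
    return sum(weight(N, u, v) for u in U for v in V)


def greedy_extend(P, N, U, V, candidates):
    """Repeatedly add the single node that most increases the link weight."""
    U, V = set(U), set(V)
    pool = set(candidates) - U - V
    while pool:
        best = None
        for x in sorted(pool):                      # sorted for determinism
            for side, other in ((U, V), (V, U)):
                p_gain = sum(weight(P, x, u) for u in side)
                n_gain = sum(weight(N, x, v) for v in other)
                if p_gain > 0 and n_gain > 0 and (best is None or n_gain > best[0]):
                    best = (n_gain, x, side)
        if best is None:
            break                                   # no improving step is left
        _, x, side = best
        side.add(x)
        pool.discard(x)
    return U, V


# Toy example (hypothetical nodes): "e" fits module U, "f" fits neither side.
P = {("a", "b"): 1.0, ("a", "e"): 1.0, ("b", "e"): 1.0, ("c", "d"): 1.0}
N = {("a", "c"): 1.0, ("a", "d"): 1.0, ("b", "c"): 1.0, ("b", "d"): 1.0,
     ("c", "e"): 1.0, ("d", "e"): 1.0}
U, V = greedy_extend(P, N, {"a", "b"}, {"c", "d"}, {"e", "f"})
```

Here the link weight grows from 4 to 6 after `e` joins U, while `f` is rejected because all of its weights are negative.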

Global analysis: node vs. module

- Null hypothesis: edges between v and M are drawn randomly (n = deg(v))
- Hypergeometric p-value
- Options for weighted graphs:
  - Use the Wilcoxon rank-sum test
  - Set a threshold and use the same test
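For the unweighted case, the node-vs-module test can be sketched with the standard library alone. The sketch assumes the graph is given as a dictionary of neighbor sets, that the population is every node except v, and that the p-value is the upper tail (at least as many neighbors inside M as observed); `node_module_pvalue` and the toy graph are illustrative names, not part of the talk.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are drawn without replacement."""
    upper = min(successes, draws)
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def node_module_pvalue(adj, v, module):
    """Upper-tail hypergeometric p-value for node v against module M."""
    targets = set(module) - {v}
    population = len(adj) - 1          # every node except v itself
    draws = len(adj[v])                # n = deg(v)
    observed = len(adj[v] & targets)   # neighbors of v that fall inside M
    return hypergeom_tail(observed, population, len(targets), draws)


# Toy graph (hypothetical): node 0 is adjacent to all three module members.
adj = {i: set() for i in range(10)}
for u, w in [(0, 1), (0, 2), (0, 3), (4, 5)]:
    adj[u].add(w)
    adj[w].add(u)

p_connected = node_module_pvalue(adj, 0, {1, 2, 3})    # equals 1/84
p_unrelated = node_module_pvalue(adj, 4, {1, 2, 3})    # equals 1.0
```

With 9 other nodes, 3 of them in M, and deg(v) = 3, only one of the 84 possible neighbor sets hits all of M, which gives 1/84.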

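Global analysis: module vs. module

- Calculate a p-value for each node in V and each node in U
- Merge the p-values using Fisher’s method
- Under the null hypothesis, the Fisher statistic (-2 times the sum of the log p-values) follows a chi-square distribution with 2k degrees of freedom, where k is the number of p-values

[Figure label: Other nodes]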
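Merging the per-node p-values can be sketched as below. In its standard form, Fisher's method sums -2 ln(p) over k p-values and compares the result to a chi-square distribution with 2k degrees of freedom (assuming independent p-values). Because the degrees of freedom are always even, the survival function has a closed form and no external library is needed. The function names are illustrative, and the p-values must be strictly positive.

```python
from math import exp, log


def chi2_sf_even(x, df):
    """Chi-square survival function for even df (closed-form series)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i       # (x/2)^i / i!
        total += term
    return exp(-half) * total


def fisher_combine(pvalues):
    """Combine k positive p-values into one p-value with Fisher's method."""
    statistic = -2.0 * sum(log(p) for p in pvalues)
    return chi2_sf_even(statistic, 2 * len(pvalues))


single = fisher_combine([0.05])        # a single p-value is returned unchanged
pair = fisher_combine([0.1, 0.1])      # 0.01 * (1 - ln 0.01)
```

A quick sanity check of the closed form: with one p-value the statistic is -2 ln(p) and the survival function with 2 degrees of freedom gives back p.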

Global analysis

- Given a set of modules M and a set of significant links L, the solution score: [equation missing]
- Improvement steps: merge modules if the score improves (select the best step iteratively)
- Fast and accurate analysis:
  - Decide when to recalculate p-values
  - Perform many merges simultaneously

Experimental Results

(0) Simulations

- Graphs with 500 nodes; edge weight 1, non-edge weight -1
- Plant a tree map with 6 modules (module size 10-20)
- Add random Gaussian noise (mean 0, SD = 1.2), additional modules, bicliques

[Figure: Jaccard score for DICER5; series: Global, Local, Initiator only]
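A simulation of this kind can be sketched as follows. The sketch rests on several assumptions that the slide does not spell out: within-module pairs are the planted edges of P, pairs between tree-linked modules are the planted edges of N, noise is added independently to every pair, and the extra modules and bicliques are left out. The `jaccard` helper compares two gene sets; how the talk applied the Jaccard score to whole solutions is not stated here.

```python
import random
from itertools import combinations


def simulate(n=500, n_modules=6, size_range=(10, 20), sd=1.2, seed=0):
    """Plant a tree-shaped module map and return noisy P and N weights."""
    if n < n_modules * size_range[1]:
        raise ValueError("not enough nodes for the requested modules")
    rng = random.Random(seed)
    nodes = list(range(n))
    rng.shuffle(nodes)
    modules, start = [], 0
    for _ in range(n_modules):
        size = rng.randint(*size_range)
        modules.append(set(nodes[start:start + size]))
        start += size
    # Random tree over modules: module i links to one earlier module.
    links = {(rng.randrange(i), i) for i in range(1, n_modules)}
    member = {v: m for m, genes in enumerate(modules) for v in genes}
    P, N = {}, {}
    for u, v in combinations(range(n), 2):
        mu, mv = member.get(u), member.get(v)
        same = mu is not None and mu == mv
        linked = (mu is not None and mv is not None
                  and (min(mu, mv), max(mu, mv)) in links)
        P[(u, v)] = (1.0 if same else -1.0) + rng.gauss(0.0, sd)
        N[(u, v)] = (1.0 if linked else -1.0) + rng.gauss(0.0, sd)
    return modules, links, P, N


def jaccard(a, b):
    """Jaccard similarity of two gene sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


# A smaller graph than the 500 nodes in the talk keeps the example fast.
modules, links, P, N = simulate(n=150, seed=1)
```

Calling `simulate()` with the defaults reproduces the stated sizes (500 nodes, 6 modules of 10-20 genes, SD 1.2).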

(1) Yeast PPI and GI networks

- 3979 genes
- P: protein-protein interactions (45,456 edges)
- N: negative genetic interactions (76,267 edges)
- Local improvers: poor results (fewer than 3 links)
- Results for the global improver:

Initiator     Modules  Gene coverage  Max module  Enriched GO terms  Enriched modules  Enriched links (%)  (unlabeled)
MBC-DICER     100      946            49          243                87                80                  430
DICER5        103      957            46          249                82                74                  438
DICER         104      837            34          192                67                61                  498
Hierarchical  123      877            30          186                68                59                  394
NodeAddition  102      950            49          240                83                79                  430

The yeast module map

- Link p < 10^-50
- Chromatin-related hubs, similar to Baryshnikova et al. 2011

The top links in the map (p < 10^-70)

- Between complexes
- Between subcomplexes

Comparison to extant methods

- Analysis of the Collins et al. 2007 data
- Comparing to extant methods that exploit both positive and negative GIs and their weights

Algorithm                            Number of modules  Gene coverage  Maximal module size  Number of enriched GO terms  Percent enriched modules  Percent enriched links  Number of
MBC-DICER (Global)                   32                 238            20                   53                           84                        79                      67
Genecentric (Leiserson et al. 2011)  116                1248           25                   39                           63                        43                      58
Kelley and Kingsford 2011            117                355            17                   32                           17                        6                       403

(2) Arabidopsis PPI & MD networks

- P: PPIs; N: metabolic dependencies (Tzfadia et al. 2012)
- Discover protein complexes and their metabolic links

Using the module map for function prediction

- Validated modules by their ability to predict gene functions in MapMan
- Function assignment: each gene is assigned the best term of its module
- Leave-one-out cross-validation (LOOCV): precision and recall > 80%

New predictions

Gene       MapMan term                                Module p-value
AT5G48000  sulfur-containing.glucosinolates           0.0001
AT5G42590  sulfur-containing.glucosinolates           0.0001
AT2G30870  redox.ascorbate and glutathione.ascorbate  0.0028
AT4G15440  isoprenoids.carotenoids                    0.0002
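The leave-one-out evaluation of module-based function prediction can be sketched as below. The sketch swaps in a simpler rule than the talk's: the "best" term of a module is taken to be the most common annotation among the other members, whereas the slides rank terms by a module p-value. The precision and recall definitions, the single-term annotations, and the gene names are likewise assumptions made for illustration.

```python
from collections import Counter


def predict_term(gene, module, annotation):
    """Most common term among the other annotated genes of the module."""
    counts = Counter(
        annotation[g] for g in module if g != gene and g in annotation
    )
    return counts.most_common(1)[0][0] if counts else None


def loocv(modules, annotation):
    """Hide each annotated gene in turn and predict its term from its module."""
    tp = fp = fn = 0
    for module in modules:
        for gene in module:
            if gene not in annotation:
                continue
            predicted = predict_term(gene, module, annotation)
            if predicted is None:
                fn += 1                  # no prediction could be made
            elif predicted == annotation[gene]:
                tp += 1
            else:
                fp += 1                  # a wrong term was predicted
                fn += 1                  # and the true term was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example (hypothetical genes and terms).
modules = [["g1", "g2", "g3", "g4"]]
annotation = {"g1": "A", "g2": "A", "g3": "A", "g4": "B"}
precision, recall = loocv(modules, annotation)
```

In the toy module three genes are recovered correctly and one is mispredicted, so both precision and recall are 0.75.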

(3) Human case-control profiles

- Data: expression profiles of lung cancer (blood)
- P: multi-phenotype co-expression network; N: differential correlation (DC), the change in correlation in disease vs. controls
- Cross-validation: most links show high DC in the test set
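Differential correlation for one gene pair can be sketched as follows. The talk defines DC only as the change in correlation between disease and controls, so the plain difference of Pearson correlations used here is an assumption, as are the function names and the toy expression values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def differential_correlation(cases, controls, gene_a, gene_b):
    """Correlation in cases minus correlation in controls for one gene pair."""
    r_cases = pearson(cases[gene_a], cases[gene_b])
    r_controls = pearson(controls[gene_a], controls[gene_b])
    return r_cases - r_controls


# Toy profiles (hypothetical): the pair is co-expressed in controls only.
controls = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [2.0, 4.0, 6.0, 8.0]}
cases = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [8.0, 6.0, 4.0, 2.0]}
dc = differential_correlation(cases, controls, "g1", "g2")
```

A strongly negative value, as in this toy pair, indicates co-expression that is present in controls and lost or reversed in disease.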

Link example:

- Breakage of immune activation in cancer (enrichment q-value < 1E-10)
- Enrichment for NSCLC-specific causal miRNA (miR-34 family, p = 0.002, miR2Disease database)

Summary

- Integration of networks
  - Considering different interaction types
  - A summary module-map
- Algorithms
  - Initiators
  - Improvers
- Algorithms perform well in simulations and real data
  - PPI+GI
  - PPI+MD
  - Human disease: correlation and differential correlation

Next steps (?)

- Cytoscape app (maybe next year…)
- Can we use module maps instead of gene networks for network inference?

Thank you!

Ron Shamir
