Network integration and function prediction: Putting it all together. Curtis Huttenhower, 04-13-11, Harvard School of Public Health, Department of Biostatistics


Page 1: Network integration and function prediction: Putting it all together

Network integration and function prediction: Putting it all together

Curtis Huttenhower

04-13-11, Harvard School of Public Health, Department of Biostatistics

Page 2

Outline

• Functional network integration
  – Bayes nets and LR
  – The human genome, tissues, and disease

• Network meta-analysis
  – Pathogens and MTb
  – Quantifying progress in yeast

• Networks to pathways
  – Functional mapping: networks of networks
  – Hierarchical integration
  – Pathway prediction

• Regulatory network integration
  – Network motifs

Page 3

A computational definition of functional genomics

Inputs: genomic data and prior knowledge

• Data → Function
• Function → Function
• Gene → Gene
• Gene → Function

Page 4

A framework for functional genomics

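[Figure: Gene pairs with known relationships are labeled related (+; e.g. G1–G2, G4–G9) or unrelated (−; e.g. G3–G6, G7–G8), and an unlabeled pair (G2–G5) is marked "?". Each dataset scores every pair (e.g. 0.9, 0.7, … 0.1, 0.2, … 0.8), and the frequency of those scores is learned separately for related and unrelated pairs: high vs. low correlation, colocalized vs. not colocalized, similar vs. dissimilar. Combining this evidence across thousands of datasets and hundreds of millions of gene pairs yields a probability for each pair, e.g. P(G2–G5 | Data) = 0.85.]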
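The framework above can be sketched as naive Bayes integration. This is a minimal sketch under stated assumptions, not MEFIT itself: scores are binned with hypothetical bin edges, per-bin frequencies are learned separately from related (+) and unrelated (−) gold-standard pairs with a pseudocount, datasets are assumed conditionally independent given the label, and the dataset names and scores are made up for illustration.

```python
import math
from bisect import bisect_right


def discretize(score, edges):
    """Map a continuous score to a bin index (0..len(edges)) using sorted bin edges."""
    return bisect_right(edges, score)


def train(gold, data, edges, pseudocount=1.0):
    """Learn P(bin | related) and P(bin | unrelated) separately for each dataset.

    gold: dict mapping gene pair -> True (related, "+") or False (unrelated, "-")
    data: dict mapping dataset name -> dict of gene pair -> score
    """
    n_bins = len(edges) + 1
    tables = {}
    for name, scores in data.items():
        counts = {True: [pseudocount] * n_bins, False: [pseudocount] * n_bins}
        for pair, label in gold.items():
            if pair in scores:  # a dataset need not cover every gold-standard pair
                counts[label][discretize(scores[pair], edges)] += 1
        tables[name] = {label: [c / sum(cs) for c in cs] for label, cs in counts.items()}
    return tables


def posterior(pair, data, tables, edges, prior=0.5):
    """P(related | data), treating datasets as conditionally independent given the label."""
    log_odds = math.log(prior / (1 - prior))
    for name, scores in data.items():
        if pair not in scores:
            continue  # missing measurements contribute no evidence
        b = discretize(scores[pair], edges)
        log_odds += math.log(tables[name][True][b] / tables[name][False][b])
    return 1 / (1 + math.exp(-log_odds))


# Toy example (made-up scores): two datasets, four labeled pairs, one unknown pair.
edges = [0.3, 0.6]  # three bins: low, medium, high
gold = {("G1", "G2"): True, ("G4", "G9"): True, ("G3", "G6"): False, ("G7", "G8"): False}
data = {
    "correlation": {("G1", "G2"): 0.9, ("G4", "G9"): 0.7, ("G3", "G6"): 0.1,
                    ("G7", "G8"): 0.2, ("G2", "G5"): 0.8},
    "colocalization": {("G1", "G2"): 0.8, ("G4", "G9"): 0.5, ("G3", "G6"): 0.05,
                       ("G7", "G8"): 0.1, ("G2", "G5"): 0.6},
}
tables = train(gold, data, edges)
p = posterior(("G2", "G5"), data, tables, edges)
```

On these made-up numbers the unlabeled pair G2–G5 gets a posterior of 6/7 (about 0.86), while a pair absent from every dataset stays at the 0.5 prior. A full Bayes net would relax the independence assumption; training separate tables per biological context is one way such a model could be specialized.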

Page 5

MEFIT: A Framework for Functional Genomics

Golub 1999, Butte 2000, Whitfield 2002, Hansen 1998

Functional relationship + Biological context (functional area, tissue, disease, …)

Page 6

Functional network prediction and analysis

Global interaction network

Metabolism network, signaling network, gut community network

Currently includes data from 30,000 human experimental results (15,000 expression conditions + 15,000 diverse others), analyzed for 200 biological functions and 150 diseases

HEFalMp

Page 7

HEFalMp: Predicting human gene function

HEFalMp

Page 8

HEFalMp: Predicting human genetic interactions

HEFalMp

Page 9

HEFalMp: Analyzing human genomic data

HEFalMp

Page 10

HEFalMp: Understanding human disease

HEFalMp

Page 11

Validating Human Predictions

Autophagy assay, not starved vs. starved (autophagic): luciferase (negative control), ATG5 (positive control), LAMP2, RAB11A

Predicted novel autophagy proteins: 5½ of 7 predictions currently confirmed

With Erin Haley, Hilary Coller

Page 12

Outline

• Functional network integration
  – Bayes nets and LR
  – The human genome, tissues, and disease

• Network meta-analysis
  – Pathogens and MTb
  – Quantifying progress in yeast

• Networks to pathways
  – Functional mapping: networks of networks
  – Hierarchical integration
  – Pathway prediction

• Regulatory network integration
  – Network motifs

Page 13

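Meta-analysis for unsupervised functional data integration

Evangelou 2007; Huttenhower 2006; Hibbs 2007

$$z' = \frac{1}{2}\log\frac{1+r}{1-r}$$

Simple regression: all datasets are equally accurate

$$y_{e,i} = \theta_e + \varepsilon_{e,i}$$

Random effects: variation within and among datasets and interactions

$$y_{e,i} = \theta_e + \delta_{e,i} + \varepsilon_{e,i}$$

$$\hat{\theta}_e = \frac{\sum_i w^*_{e,i}\, y_{e,i}}{\sum_i w^*_{e,i}}, \qquad w^*_{e,i} = \frac{1}{s^2_{e,i} + \hat{\tau}^2_e}$$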
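The random-effects combination above can be sketched as follows. The slide cites Evangelou 2007 but does not name an estimator for the between-dataset variance τ²; this sketch assumes the common DerSimonian–Laird moment estimator, and assumes each dataset's correlation for a gene pair is Fisher z-transformed with variance approximately 1/(n − 3). The correlations and sample sizes are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher z-transform of a correlation r (|r| < 1): z' = 0.5 * log((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def random_effects(y, s2):
    """Random-effects pooled estimate for one gene pair e across datasets i.

    y:  per-dataset estimates y_{e,i} (e.g. Fisher z scores)
    s2: their within-dataset variances s^2_{e,i}
    Returns (theta_hat, tau2_hat), using the DerSimonian-Laird estimator for tau^2.
    """
    if len(y) < 2:
        return y[0], 0.0  # a single dataset has no between-dataset variance
    w = [1 / v for v in s2]  # fixed-effect (inverse-variance) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)  # between-dataset variance, floored at 0
    w_star = [1 / (v + tau2) for v in s2]  # random-effects weights w*_{e,i}
    theta = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return theta, tau2


# Hypothetical correlations of one gene pair in three datasets with n samples each;
# the variance of a Fisher z score is approximately 1 / (n - 3).
r = [0.8, 0.6, 0.1]
n = [50, 30, 20]
z = [fisher_z(ri) for ri in r]
s2 = [1 / (ni - 3) for ni in n]
theta_hat, tau2_hat = random_effects(z, s2)
```

When the datasets agree, τ² is floored at 0 and the result reduces to the inverse-variance (fixed-effect) mean; when they disagree, τ² grows and the weights become more uniform, so no single precise dataset dominates.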

Page 14

Meta-analysis for unsupervised functional data integration

Evangelou 2007; Huttenhower 2006; Hibbs 2007

$$z' = \frac{1}{2}\log\frac{1+r}{1-r}$$

Page 15

Unsupervised data integration: TB virulence and ESX-1 secretion (with Sarah Fortune)

Graphle http://huttenhower.sph.harvard.edu/graphle/

Page 16

Unsupervised data integration: TB virulence and ESX-1 secretion (with Sarah Fortune)

Graphle http://huttenhower.sph.harvard.edu/graphle/

X?

Page 17

Predicting gene function

Cell cycle genes

Predicted relationships between genes (legend: high confidence to low confidence)

Page 18

Predicting gene function

Predicted relationships between genes (legend: high confidence to low confidence)

Cell cycle genes

Page 19

Cell cycle genes

Predicting gene function

Predicted relationships between genes (legend: high confidence to low confidence)

These edges provide a measure of how likely a gene is to participate specifically in the process of interest.
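One simple way to turn such edges into per-gene predictions is guilt by association: score each gene by its average edge confidence to the genes already annotated to the process. This is a minimal sketch, not necessarily the method behind these slides; treating a missing edge as confidence 0 and the gene names g1 to g5 are assumptions.

```python
def process_scores(edges, process_genes):
    """Score every gene by its mean edge confidence to a process's known genes.

    edges: dict mapping frozenset({gene1, gene2}) -> confidence in [0, 1]
    process_genes: genes already annotated to the process (e.g. cell cycle)
    A known gene is scored against the other known genes; missing edges count as 0.
    """
    genes = {g for pair in edges for g in pair}
    scores = {}
    for g in genes:
        ws = [edges.get(frozenset((g, p)), 0.0) for p in process_genes if p != g]
        scores[g] = sum(ws) / len(ws) if ws else 0.0
    return scores


# Hypothetical network: g1-g3 are annotated to the process, g4 and g5 are not.
edges = {
    frozenset(("g1", "g2")): 0.9,
    frozenset(("g1", "g3")): 0.8,
    frozenset(("g2", "g3")): 0.7,
    frozenset(("g4", "g1")): 0.6,
    frozenset(("g4", "g2")): 0.4,
    frozenset(("g5", "g3")): 0.1,
}
scores = process_scores(edges, {"g1", "g2", "g3"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here the unannotated g4 outranks g5. Because a plain average also rewards hub genes that connect to everything, dividing by each gene's average confidence to all genes is one way to emphasize specific participation.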

Page 20

Comprehensive validation of computational predictions

Genomic data + prior knowledge → computational predictions of gene function (MEFIT; SPELL, Hibbs et al. 2007; bioPIXIE, Myers et al. 2005)

→ Genes predicted to function in mitochondrion organization and biogenesis

→ Laboratory experiments: petite frequency, growth curves, confocal microscopy

→ New known functions for correctly predicted genes → retraining

With David Hess, Amy Caudy

Page 21

Evaluating the performance of computational predictions

Genes involved in mitochondrion organization and biogenesis:
• 106 original GO annotations
• 135 under-annotations
• 82 novel confirmations, first iteration
• 17 novel confirmations, second iteration

340 total: >3x previously known genes in ~5 person-months

Page 22

Evaluating the performance of computational predictions

Genes involved in mitochondrion organization and biogenesis:
• 106 original GO annotations
• 95 under-annotations
• 40 confirmed under-annotations
• 80 novel confirmations, first iteration
• 17 novel confirmations, second iteration

340 total: >3x previously known genes in ~5 person-months

Computational predictions from large collections of genomic data can be accurate despite incomplete or misleading gold standards, and they continue to improve as additional data are incorporated.

Page 23

Outline

• Functional network integration
  – Bayes nets and LR
  – The human genome, tissues, and disease

• Network meta-analysis
  – Pathogens and MTb
  – Quantifying progress in yeast

• Networks to pathways
  – Functional mapping: networks of networks
  – Hierarchical integration
  – Pathway prediction

• Regulatory network integration
  – Network motifs

Page 24

Functional mapping: mining integrated networks

Predicted relationships between genes (legend: high confidence to low confidence)

The strength of these relationships indicates how cohesive a process is.

Chemotaxis

Page 25

Functional mapping: mining integrated networks

Predicted relationships between genes (legend: high confidence to low confidence)

Chemotaxis

Page 26

Functional mapping: mining integrated networks

Flagellar assembly

The strength of these relationships indicates how associated two processes are.

Predicted relationships between genes (legend: high confidence to low confidence)

Chemotaxis
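The two notions on these slides, cohesiveness within a process and association between processes, can be sketched as mean edge confidence relative to a genomic background. The exact statistics used in functional mapping are not given here; defining both as simple ratios to the genome-wide mean edge weight, and the toy gene sets, are assumptions.

```python
from itertools import combinations


def mean_weight(edges, pairs):
    """Mean confidence over gene pairs; a pair with no edge counts as 0."""
    ws = [edges.get(frozenset(p), 0.0) for p in pairs]
    return sum(ws) / len(ws) if ws else 0.0


def background(edges, all_genes):
    """Genome-wide baseline: mean confidence over all gene pairs."""
    return mean_weight(edges, combinations(sorted(all_genes), 2))


def cohesiveness(edges, genes, baseline):
    """Within-process mean confidence relative to the baseline (>1: more cohesive)."""
    return mean_weight(edges, combinations(sorted(genes), 2)) / baseline


def association(edges, genes_a, genes_b, baseline):
    """Between-process mean confidence relative to the baseline (>1: associated)."""
    pairs = [(a, b) for a in genes_a for b in genes_b if a != b]
    return mean_weight(edges, pairs) / baseline


# Toy network: two hypothetical chemotaxis genes, two flagellar genes, three others.
edges = {
    frozenset(("che1", "che2")): 0.9,
    frozenset(("fla1", "fla2")): 0.8,
    frozenset(("che1", "fla1")): 0.5,
    frozenset(("che2", "fla2")): 0.5,
}
chemotaxis, flagella = {"che1", "che2"}, {"fla1", "fla2"}
base = background(edges, chemotaxis | flagella | {"o1", "o2", "o3"})
coh = cohesiveness(edges, chemotaxis, base)
assoc = association(edges, chemotaxis, flagella, base)
```

Normalizing by the background keeps processes comparable across networks with different overall edge density; in this toy example chemotaxis is both cohesive and associated with flagellar assembly.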

Page 27

Functional mapping: Associations among processes

Edges: associations between processes (very strong to moderately strong)

Processes shown: Hydrogen Transport, Electron Transport, Cellular Respiration, Protein Processing, Peptide Metabolism, Cell Redox Homeostasis, Aldehyde Metabolism, Energy Reserve Metabolism, Vacuolar Protein Catabolism, Negative Regulation of Protein Metabolism, Organelle Fusion, Protein Depolymerization, Organelle Inheritance

Page 28

Functional mapping: Associations among processes

Edges: associations between processes (very strong to moderately strong)

Borders: data coverage of processes (well covered to sparsely covered)

Processes shown: Hydrogen Transport, Electron Transport, Cellular Respiration, Protein Processing, Peptide Metabolism, Cell Redox Homeostasis, Aldehyde Metabolism, Energy Reserve Metabolism, Vacuolar Protein Catabolism, Negative Regulation of Protein Metabolism, Organelle Fusion, Protein Depolymerization, Organelle Inheritance

Page 29

Functional mapping: Associations among processes

Edges: associations between processes (very strong to moderately strong)

Nodes: cohesiveness of processes (below baseline; baseline, i.e. genomic background; very cohesive)

Borders: data coverage of processes (well covered to sparsely covered)

Processes shown: Hydrogen Transport, Electron Transport, Cellular Respiration, Protein Processing, Peptide Metabolism, Cell Redox Homeostasis, Aldehyde Metabolism, Energy Reserve Metabolism, Vacuolar Protein Catabolism, Negative Regulation of Protein Metabolism, Organelle Fusion, Protein Depolymerization, Organelle Inheritance

Page 30

Functional mapping: Associations among processes

Edges: associations between processes (very strong to moderately strong)

Nodes: cohesiveness of processes (below baseline; baseline, i.e. genomic background; very cohesive)

Borders: data coverage of processes (well covered to sparsely covered)

Page 31

How do functional interactions become pathways?

• Gene expression
• Physical PPIs
• Genetic interactions
• Colocalization
• Sequence
• Protein domains
• Regulatory binding sites

Page 32

Functional genomic data

With Chris Park, Olga Troyanskaya

Simultaneous inference of physical, genetic, regulatory, and functional networks

Functional interactions

Regulatory interactions

Post-transcriptional regulation

Metabolic interactions

Phosphorylation

Protein complexes

Page 33

Learning a compendium of interaction networks

Train one SVM per interaction type

Resolve consistency using hierarchical Bayes net
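The first step, one SVM per interaction type, can be sketched with a small linear SVM trained by Pegasos-style stochastic subgradient descent (a swapped-in training method, chosen only because it fits in pure Python). The features, labels, and hyperparameters are hypothetical, and the hierarchical Bayes net that resolves consistency between interaction types is not sketched.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=1000, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) with Pegasos-style SGD.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so a bias is learned (and regularized) too.
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
            if margin < 1:  # hinge loss is active: step toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def decision(w, x):
    """Signed SVM score; positive means the interaction type is predicted."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_per_type(X, labels_by_type):
    """One independent SVM per interaction type, all sharing the same gene-pair features."""
    return {kind: train_linear_svm(X, y) for kind, y in labels_by_type.items()}


# Hypothetical features per gene pair: [coexpression, co-purification]
X = [[0.9, 0.1], [0.8, 0.9], [0.1, 0.8], [0.2, 0.2], [0.95, 0.95], [0.05, 0.1]]
labels = {
    "physical": [-1, 1, 1, -1, 1, -1],    # tracks the second feature
    "regulatory": [1, 1, -1, -1, 1, -1],  # tracks the first feature
}
models = train_per_type(X, labels)
```

Because each type's SVM is trained independently, its predictions can contradict those of other types (e.g. a physical interaction predicted without a functional one); that is the gap the consistency step on the slide is meant to close.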

Page 34

Learning a compendium of interaction networks

[Plot: AUC, axis from 0.5 to 1.0]

Both presence/absence and directionality of interactions are accurately inferred
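AUC, as plotted above, is the probability that a randomly chosen true interaction is scored above a randomly chosen non-interaction: 0.5 is random and 1.0 is perfect. A minimal pairwise (Mann–Whitney) computation is sketched below with hypothetical scores. Evaluating directionality by scoring ordered pairs (A→B and B→A) as separate examples is an assumption about how such an evaluation could be set up.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive outscores a randomly
    chosen negative, counting ties as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical predictor scores for two true and two false interactions.
example = auc([0.9, 0.2, 0.6, 0.1], [True, True, False, False])
```

The double loop is quadratic in the number of examples; for genome-scale evaluations, sorting the scores once and summing ranks gives the same value.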

Page 35

Using network compendia to predict complete pathways

Additional 20 novel synthetic lethality predictions tested, 14 confirmed (>100x better than random)

Legend: confirmed / unconfirmed

With David Hess

Page 36

Interactive aligned network viewer: http://function.princeton.edu/bioweaver

Graphle

Page 37

Outline

• Functional network integration
  – Bayes nets and LR
  – The human genome, tissues, and disease

• Network meta-analysis
  – Pathogens and MTb
  – Quantifying progress in yeast

• Networks to pathways
  – Functional mapping: networks of networks
  – Hierarchical integration
  – Pathway prediction

• Regulatory network integration
  – Network motifs

Page 38

• Of only five regulators found, four have generic cell cycle/proliferation targets

• Just five basic regulators for ~7,000 genes?

• These motifs only appear upstream of ~half of the genes

Human Regulatory Networks

[Figure (G0): 6,829 genes in clusters I–X across serum-starvation and serum-restimulation time courses (1 to 96 hrs); cluster annotations include development, cholesterol, protein localization, cell cycle, RNA processing, and metabolism]

FIRE: Elemento et al. 2007

Elk-1, Sp1, NF-Y, YY1

Quiescence: reversible exit from the cell cycle

Page 39

COALESCE: Combinatorial Algorithm for Expression and Sequence-based Cluster Extraction

Inputs:
• Gene expression
• DNA sequence: 5’ UTR, 3’ UTR, upstream flank, downstream flank
• Evolutionary conservation
• Nucleosome positions

Steps:
• Identify conditions where genes coexpress
• Identify motifs enriched in genes’ sequences
• Create a new module
• Select genes based on conditions and motifs
• Subtract mean from all data

Regulatory modules:
• Coregulated genes
• Conditions where they’re coregulated
• Putative regulating motifs

Feature selection: tests for differential expression/frequency

Bayesian integration

Page 40

COALESCE: Selecting Coexpressed Conditions

• For each gene expression condition…
  – Compare distributions of values for genes in the module versus genes not in the module
  – If significantly different, include the condition

Preserving data structure:
• If multiple conditions derive from the same dataset, they can be included/excluded as a unit
• For example, time course vs. deletion collection
• Test using multivariate z-test
• Precalculate covariance matrix; still very efficient
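The per-condition comparison above can be sketched with a two-sample z-test on module versus non-module means. This is a univariate simplification under stated assumptions: the cutoff of 3 is hypothetical, and the multivariate z-test that keeps conditions from one dataset together (with a precalculated covariance matrix) is not shown.

```python
import math
from statistics import mean, pvariance


def condition_z(values, module):
    """Two-sample z-score for one condition: module genes versus all other genes.

    values: dict gene -> expression value in this condition
    module: set of genes currently in the module (must be non-empty)
    """
    inside = [v for g, v in values.items() if g in module]
    outside = [v for g, v in values.items() if g not in module]
    se = math.sqrt(pvariance(inside) / len(inside) + pvariance(outside) / len(outside))
    return (mean(inside) - mean(outside)) / se if se > 0 else 0.0


def select_conditions(data, module, z_cutoff=3.0):
    """Keep conditions whose module/non-module difference is large in either direction."""
    return [c for c, values in data.items() if abs(condition_z(values, module)) >= z_cutoff]


# Hypothetical data: the module is strongly induced in "heat" but not in "cold".
data = {
    "heat": {"a": 2.0, "b": 2.2, "c": 1.8, "d": 0.1, "e": -0.1, "f": 0.0},
    "cold": {"a": 0.1, "b": -0.1, "c": 0.0, "d": 0.2, "e": -0.2, "f": 0.0},
}
selected = select_conditions(data, {"a", "b", "c"})
```

Testing in either direction lets a module be defined by conditions where its genes are jointly repressed as well as induced.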

Page 41

COALESCE: Selecting Significant Motifs

• COALESCE looks for three kinds of motifs:
  – K-mers (e.g. ACGACGT)
  – Reverse complement pairs (e.g. ACGACAT | ATGTCGT)
  – Probabilistic Suffix Trees (PSTs)
• For every possible motif…
  – Compare distributions of values for genes in the module versus genes not in the module
  – If significantly different, include the motif
• This can distinguish flanks from UTRs
• Fast!
• Efficient enough to search coding sequence (e.g. exons/introns)
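Counting the first two motif types can be sketched as follows: every overlapping k-mer is counted, and for reverse complement pairs each k-mer is merged with its reverse complement under a single key, matching the ACGACAT | ATGTCGT example. The key format is an assumption, and PSTs are not sketched. Per-gene counts inside versus outside the module could then be compared with the same kind of test used for conditions.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count every overlapping length-k substring of a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def rc_pair_counts(seq, k):
    """Merge each k-mer with its reverse complement under one key, 'A | B' in sorted order."""
    merged = Counter()
    for kmer, n in kmer_counts(seq, k).items():
        a, b = sorted((kmer, reverse_complement(kmer)))
        merged[f"{a} | {b}"] += n
    return merged


pairs = rc_pair_counts("AAATTT", 3)
```

Merging a k-mer with its reverse complement treats a binding site as strand-independent, which is usually appropriate for double-stranded DNA flanks but not for single-stranded RNA motifs in UTRs.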

Page 42

COALESCE: Selecting Probable Genes

• For each gene in the genome… for each significant condition… for each significant motif…
  – What’s the probability the gene came from the module’s distribution?
  – What’s the probability that it came from outside the module?

The probability of a gene being in the module given some data…

$$P(g \in M \mid D) = \frac{P(D \mid g \in M)\,P(g \in M)}{P(D \mid g \in M)\,P(g \in M) + P(D \mid g \notin M)\,P(g \notin M)}$$

Distributions of each feature in and out of the developing module are observed from the data.

Prior is used to stabilize module convergence; genes already in the module are more likely to stay there next iteration.
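The Bayes rule above can be sketched for one gene as follows. Modeling each feature as an independent Gaussian inside and outside the module is an assumption (the slide only says the distributions are observed from the data), and the means, standard deviations, and priors are hypothetical. Working in log space avoids underflow when many conditions and motifs are multiplied together.

```python
import math


def log_gaussian(x, mu, sd):
    """Log density of a normal distribution N(mu, sd^2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def membership_probability(features, in_stats, out_stats, prior_in):
    """P(g in M | D) by Bayes' rule over independent Gaussian features.

    features: one gene's values for the selected conditions and motifs
    in_stats / out_stats: (mean, sd) of each feature inside / outside the module
    prior_in: P(g in M), e.g. set higher for genes already in the module
    """
    log_in = math.log(prior_in)
    log_out = math.log(1 - prior_in)
    for x, (mu_in, sd_in), (mu_out, sd_out) in zip(features, in_stats, out_stats):
        log_in += log_gaussian(x, mu_in, sd_in)
        log_out += log_gaussian(x, mu_out, sd_out)
    m = max(log_in, log_out)  # subtract the max before exponentiating to avoid underflow
    p_in, p_out = math.exp(log_in - m), math.exp(log_out - m)
    return p_in / (p_in + p_out)


# Hypothetical feature distributions for two selected features.
in_stats = [(2.0, 0.5), (1.0, 0.5)]
out_stats = [(0.0, 0.5), (0.0, 0.5)]
p_member = membership_probability([2.0, 1.0], in_stats, out_stats, prior_in=0.5)
p_outsider = membership_probability([0.0, 0.0], in_stats, out_stats, prior_in=0.5)
```

With no features the posterior simply equals the prior, which is how a higher prior for current members keeps borderline genes from oscillating in and out between iterations.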

Page 43

COALESCE: Integrating Additional Data Types

Nucleosome placement; evolutionary conservation

• Can be included as additional datasets and feature-selected just like expression conditions/motifs.

• Or can be used as a prior or weight on the values of individual motifs.

      N    C
G1  2.5  0.0
G2  0.6  0.5
G3  1.2  0.9
…    …    …

TCCGGTAGAACTACTGGTATTGTTTTGGATTCCGGTGATG
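Using conservation as a weight on motif values can be sketched by letting each motif occurrence count as the mean per-base score over the bases it covers, so a conserved site counts fully and an unconserved one counts little. The weighting rule and the per-base scores are hypothetical; only the example sequence comes from the slide.

```python
def weighted_kmer_count(seq, scores, kmer):
    """Sum, over occurrences of kmer in seq, of the mean per-base score at that occurrence.

    seq: DNA sequence
    scores: per-base weights (e.g. conservation in [0, 1]), one per base of seq
    """
    if len(scores) != len(seq):
        raise ValueError("need one score per base")
    k = len(kmer)
    total = 0.0
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] == kmer:
            total += sum(scores[i:i + k]) / k
    return total


seq = "TCCGGTAGAACTACTGGTATTGTTTTGGATTCCGGTGATG"
conservation = [1.0] * 20 + [0.0] * 20  # hypothetical: only the first half is conserved
weighted = weighted_kmer_count(seq, conservation, "TCCGG")
```

TCCGG occurs twice in this sequence, but only the first occurrence falls in the (hypothetically) conserved half, so its weighted count is 1.0 instead of 2.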

Page 44

COALESCE Results: S. cerevisiae Modules

The haystack: ~2,200 conditions × ~6,000 genes

A needle: 100 genes × 80 conditions

Page 45

COALESCE Results: S. cerevisiae Modules

• Conjugation: 54 genes, 144 conditions
• Budding: 33 genes, 434 conditions
• Mitosis and DNA replication: 112 genes, 82 conditions

Swi5; Stb1/Swi6; Ste12

Page 46

COALESCE Results: S. cerevisiae Modules

• Iron transport: 50 genes, 775 conditions
• Phosphate transport: 11 genes, 844 conditions
• Glycolysis, iron and phosphate transport, amino acid metabolism…: 126 genes, 660 conditions

Pho4; Helix-Loop-Helix Tye7/Cbf1/Pho4; Aft1/2

Page 47

COALESCE Results: S. cerevisiae Modules

• Mitochondrial translation: 72 genes, 319 conditions (Puf3)

…plus more ribosome clusters than you can shake a stick at!

Page 48

COALESCE Results: Yeast TF/Target Accuracy

[Bar chart: Z-scores (axis −0.3 to 1.3) for COALESCE, cMonkey, FIRE, and Weeder on the targets of 15 yeast TFs: Bas1p, Hap4p, Met32p, Cup2p, Met31p, Zap1p, Upc2p, Mbp1p, Hsf1p, Gln3p, Hap3p, Gcn4p, Uga3p, Gis1p, Hap5p]

Page 49

COALESCE Results: TF/Targets Influenced by Supporting Data

[Bar chart: Z-scores (axis −0.5 to 3) for Sfl1p, Gcr1p, Uga3p, Mot3p, Sum1p, Cst6p, and Mig3p under COALESCE alone, COALESCE + conservation, COALESCE + nucleosomes, and COALESCE + conservation + nucleosomes]

Annotations: improved by any additional data, mainly conservation; decreased by additional data; improved by conservation; improved only by both

Page 50

COALESCE Results: Yeast Clustering Accuracy

• ~2,200 yeast conditions
  – Recapitulation of known biology from Gene Ontology

Page 51

COALESCE Results: Yeast Clustering Accuracy

• ~2,200 yeast conditions
  – Recapitulation of known biology from Gene Ontology

ASCL1 in 5’ flank, unch. sequences underenriched in 3’ UTR

M. musculus: Up in callosal and motor neurons

C. elegans: Up in larvae, down in adults

GATA in 5’ flank, miR-788 seed in 3’ UTR

AAGGGGC (zf?) and enriched in 5’ flank

H. sapiens: Up in normal muscle, down in diabetic

Page 52

COALESCE: Coregulated Quiescence Modules

• Predicts regulatory modules from genomic data:
  – Coregulated genes
  – Conditions under which coregulation occurs
  – Putative regulatory motifs

• 5 quiescence-related microarray datasets, 60 conditions:
  – Quiescence program (Coller et al. 2006)
  – Adenoviral infection (Miller et al. 2007)
  – let-7 response (Legesse-Miller et al. unpub.)
  – Contact inhibition (Scarino et al. unpub.)
  – Serum withdrawal (Legesse-Miller et al. unpub.)

Page 53

COALESCE: Coregulated Quiescence Modules

Down during quiescence entry, up during quiescence exit, down with adenoviral infection

Specific predicted uncharacterized reverse complement motif

Up during quiescence entry, down during quiescence exit

Many known related (proliferation) motifs: Pax4, Staf, NFKB1, Gfi, ESR1, Runx1, Su(H)

Down during quiescence entry, enriched for transport/trafficking

miR-297 motif predicted in 3’ UTR (CACATAC)

Down with let-7 exposure

let-7 motifs predicted in 3’ UTR (UACCUC)

Page 54

Network Motifs

• Coherent feed-forward: filter
• Incoherent feed-forward: pulse
• Bi-fan
• Positive auto-regulation: delay
• WGD and evolvability
• Negative auto-regulation: speed + stability
• Feedback: memory
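Finding the feed-forward motifs listed above can be sketched on a signed directed graph: a feed-forward loop is X→Y, Y→Z, X→Z, and it is coherent when the sign of the direct path X→Z matches the sign of the indirect path through Y, otherwise incoherent. The gene names are hypothetical, the brute-force search over node triples only suits small networks, and the comparison against randomized networks that Milo et al. use to call a motif significant is not shown.

```python
from itertools import permutations


def feed_forward_loops(edges):
    """Find feed-forward loops X->Y, Y->Z, X->Z in a signed directed graph.

    edges: dict mapping (source, target) -> sign (+1 activation, -1 repression)
    Returns (coherent, incoherent) lists of (X, Y, Z) triples.
    """
    nodes = {n for edge in edges for n in edge}
    coherent, incoherent = [], []
    for x, y, z in permutations(sorted(nodes), 3):
        if (x, y) in edges and (y, z) in edges and (x, z) in edges:
            indirect = edges[(x, y)] * edges[(y, z)]  # sign of the path through Y
            target = coherent if indirect == edges[(x, z)] else incoherent
            target.append((x, y, z))
    return coherent, incoherent


# Hypothetical regulatory edges: one coherent and one incoherent loop.
edges = {
    ("X", "Y"): 1, ("Y", "Z"): 1, ("X", "Z"): 1,
    ("A", "B"): 1, ("B", "C"): -1, ("A", "C"): 1,
}
coherent, incoherent = feed_forward_loops(edges)
```

Here X, Y, Z form a coherent loop (both paths activate Z), while A, B, C form an incoherent one (A activates C directly but represses it through B), the configuration associated with pulse-like responses.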

Page 55

March 1, 2010

From Milo, et al., Science, 2002

Page 56

Outline

• Functional network integration
  – Bayes nets and LR
  – The human genome, tissues, and disease

• Network meta-analysis
  – Pathogens and MTb
  – Quantifying progress in yeast

• Networks to pathways
  – Functional mapping: networks of networks
  – Hierarchical integration
  – Pathway prediction

• Regulatory network integration
  – Network motifs

Page 57

1:1 Lewis Carroll Map

"… And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!"

"Have you used it much?" I enquired.

"It has never been spread out, yet," said Mein Herr: "the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well."

Sylvie and Bruno Concluded by Lewis Carroll, 1893.
