29
Pathway and network analysis Functional interpretation of gene lists Bing Zhang Department of Biomedical Informatics Vanderbilt University [email protected]

Pathway and network analysis Functional interpretation of gene lists Bing Zhang Department of Biomedical Informatics Vanderbilt University [email protected]

Embed Size (px)

Citation preview

Pathway and network analysisFunctional interpretation of gene lists

Bing Zhang

Department of Biomedical Informatics

Vanderbilt University

[email protected]

Omics studies generate lists of interesting genes

2

92546_r_at

92545_f_at

96055_at

102105_f_at

102700_at

161361_s_at

92202_g_at

103548_at

100947_at

101869_s_at

102727_at

160708_at

…...

Differential expression

Clustering Lists of genes with potential biological interest

log2(ratio)

-log

10

(p v

alu

e)

Microarray

RNA-Seq

Proteomics

3

Organizing genes based on pathways

Gene 1 Pathway 1

Pathway 2

Gene 2 Pathway 1

Pathway 3

Gene 3 Pathway 2

Pathway 3

Gene 4 Pathway 3

Pathway 1 Gene 1

Gene 2

Pathway 2 Gene 1

Gene 3

Pathway 3 Gene 2

Gene 3

Gene 4

4

Advantages of pathway analysis

Better interpretation From interesting genes to interesting biological themes

Improved robustness Robust against noise in the data

Improved sensitivity Detecting minor but concordant changes in a pathway

5

Databases BioCarta (http://www.biocarta.com/genes/index.asp)

KEGG (http://www.genome.jp/kegg/pathway.html)

MetaCyc (http://metacyc.org)

Pathway commons (http://www.pathwaycommons.org)

Reactome (http://www.reactome.org)

STKE (http://stke.sciencemag.org/cm)

Signaling Gateway (http://www.signaling-gateway.org)

Wikipathways (http://www.wikipathways.org)

Limitation Limited coverage

Inconsistency among different databases

Relationship between pathways is not defined

Pathway databases

6

Structured, precisely defined, controlled vocabulary for describing the roles of genes and gene products

Three organizing principles: molecular function, biological process, and cellular component Dopamine receptor D2, the product of human gene DRD2

molecular function: dopamine receptor activity

biological process: synaptic transmission

cellular component: plasma membrane

Terms in GO are linked by several types of relationships Is_a (e.g. plasma membrane is_a membrane) Part_of (e.g. membrane is part_of cell) Has part Regulates Occurs in

Gene Ontology

7

Gene Ontology

8

Two types of GO annotations Electronic annotation

Manual annotation

All annotations must: be attributed to a source

indicate what evidence was found to support the GO term-gene/protein association

Types of evidence codes Experimental codes - IDA, IMP, IGI, IPI, IEP

Computational codes - ISS, IEA, RCA, IGC

Author statement - TAS, NAS

Other codes - IC, ND

Annotating genes using GO terms

9

Annotating genes using GO terms

…… ……

DLGAP1 discs, large (Drosophila) homolog-associated protein 1

DLGAP2 discs, large (Drosophila) homolog-associated protein 2

DNM1 dynamin 1

DOC2A double C2-like domains, alpha

DRD1 dopamine receptor D1

DRD1IP dopamine receptor D1 interacting protein

DRD2 dopamine receptor D2

DRD3 dopamine receptor D3

DRD4 dopamine receptor D4

DRD5 dopamine receptor D5

…… ……

Synaptic transmission

226 human genes

Cell-cell signaling

Child

Parent

10

Downloads (http://www.geneontology.org) Ontologies

http://www.geneontology.org/page/download-ontology

Annotations

http://www.geneontology.org/page/download-annotations

Web-based access AmiGO: http://www.godatabase.org

QuickGO: http://www.ebi.ac.uk/QuickGO

Access GO

11

Homo sapiens Mus musculus

#term #gene #term #gene

GO/BP 6502 15228 6227 15709

GO/MF 3144 16389 2961 17287

GO/CC 947 16765 882 16801

Coverage of GO annotations

12

Over-representation analysis: concept

Development (1842 genes) Is the observed overlap significantly larger than the expected value?

581

581

compareObserve

Expect

98

65

1842

1842Differentially expressed genes (581 genes)

Hoxa5Hoxa11Ltbp3Sox4Foxc1Edn1Ror2GnagSmad3Wdr5Trp63Sox9Pax1AcdRai1Pitx1……

Sash1Cd24aAgtPsrc1Ctla2bAngptl4Depdc7Sorbs1Macrod1Enpp2Tmem176a……

13

Over-representation analysis: method

Significant genes Non-significant genes Total

genes in the group k j-k j

Other genes n-k m-n-j+k m-j

Total n m-n m

Hypergeometric test: given a total of m genes where j genes are in the functional group, if we pick n genes randomly, what is the probability of having k or more genes from the group?

Zhang et.al. Nucleic Acids Res. 33:W741, 2005

n

Observed k

j

m

Arbitrary thresholding

Ignoring the order of genes in the significant gene list

Over-representation analysis: limitations

14

15

Gene Set Enrichment Analysis: concept

Do genes in a gene set tend to locate at the top or bottom of the ranked gene list?

16

Gene Set Enrichment Analysis: method

Subramanian et.al. PNAS 102:15545, 2005

http://www.broad.mit.edu/gsea/

+1/k

-1/(n-k)

k: Number of genes in the gene set Sn: Number of all genes in the ranked gene list

17

Organizing genes by Pathways

Gene Ontology

Enrichment analysis methods Over-representation analysis

Gene Set enrichment analysis

Major limitation Existing knowledge on pathways or gene functions is far from

complete

Pathway-based analysis

Biological networks

18

Networks Nodes Edges

Physical interaction networks

Protein-protein interaction network

Proteins Physical interaction, undirected

Signaling network Proteins Modification,directed

Gene regulatory network

TFs/miRNAsTarget genes

Physical interaction,directed

Metabolic network Metabolites Metabolic reaction,directed

Functional association networks

Co-expression network

Genes/proteins

Co-expression,undirected

Genetic network Genes Genetic interaction,undirected

Properties of complex networks

Scale-free(hubs)

Hierarchical modularSmall world(six degree separation)

Human protein-protein interaction network9,198 proteins and 36,707 interactions

Network visualization

Cytoscape (http://www.cytoscape.org)

20

Gehlenborg et al. Nature Methods, 7:S56, 2010

Network visualization tools

Proteins that lie closer to one another in a protein interaction network are more likely to have similar function and involve in similar biological process.

Network-based gene function prediction

Network-based disease gene prediction

Network distance vs functional similarity

21

Sharan et al. Mol Syst Biol, 3:88, 2007

22

Protein-protein interaction modules

Transcriptional regulatory modules Transcription factor targets

miRNA targets

Network module-based analysis Over-representation analysis

GSEA

Organizing genes based on network modules

TF

23

92546_r_at92545_f_at96055_at102105_f_at102700_at……

WebGestalt: http://www.webgestalt.org

Jul. 1, 2013 – Jun. 30, 201449,136 visits from 18,213 visitors

~200 ID types

~60K gene sets

Statistical analysis

Zhang et.al. Nucleic Acids Res. 33:W741, 2005Wang et al. Nucleic Acids Res. 41:W77, 2013

24

WebGestalt output: Enriched GO terms

Response to unfolded proteins

12 genesadjp=1.32e-08

25

WebGestalt output: an enriched pathways

TGF Beta Signaling Pathway

Input genes

26

WebGestalt output: enriched network modules

27

GSEA: http://www.broadinstitute.org/gsea

28

GSEA: output

29

Organizing genes by “gene sets” Pathways

Gene Ontology

Network modules

Enrichment analysis methods Over-representation analysis: WebGestalt

Gene Set enrichment analysis: GSEA

Tools WebGestalt (Over-representation analysis)

GSEA (Gene set enrichment analysis)

Manuals for WebGestalt and GSEA in the reading folder

Summary