Upload
scott-asher-cook
View
226
Download
2
Tags:
Embed Size (px)
Citation preview
Pathway and network analysisFunctional interpretation of gene lists
Bing Zhang
Department of Biomedical Informatics
Vanderbilt University
Omics studies generate lists of interesting genes
2
92546_r_at
92545_f_at
96055_at
102105_f_at
102700_at
161361_s_at
92202_g_at
103548_at
100947_at
101869_s_at
102727_at
160708_at
…...
Differential expression
Clustering Lists of genes with potential biological interest
log2(ratio)
-log
10
(p v
alu
e)
Microarray
RNA-Seq
Proteomics
3
Organizing genes based on pathways
Gene 1 Pathway 1
Pathway 2
Gene 2 Pathway 1
Pathway 3
Gene 3 Pathway 2
Pathway 3
Gene 4 Pathway 3
Pathway 1 Gene 1
Gene 2
Pathway 2 Gene 1
Gene 3
Pathway 3 Gene 2
Gene 3
Gene 4
4
Advantages of pathway analysis
Better interpretation From interesting genes to interesting biological themes
Improved robustness Robust against noise in the data
Improved sensitivity Detecting minor but concordant changes in a pathway
5
Databases BioCarta (http://www.biocarta.com/genes/index.asp)
KEGG (http://www.genome.jp/kegg/pathway.html)
MetaCyc (http://metacyc.org)
Pathway commons (http://www.pathwaycommons.org)
Reactome (http://www.reactome.org)
STKE (http://stke.sciencemag.org/cm)
Signaling Gateway (http://www.signaling-gateway.org)
Wikipathways (http://www.wikipathways.org)
Limitation Limited coverage
Inconsistency among different databases
Relationship between pathways is not defined
Pathway databases
6
Structured, precisely defined, controlled vocabulary for describing the roles of genes and gene products
Three organizing principles: molecular function, biological process, and cellular component Dopamine receptor D2, the product of human gene DRD2
molecular function: dopamine receptor activity
biological process: synaptic transmission
cellular component: plasma membrane
Terms in GO are linked by several types of relationships Is_a (e.g. plasma membrane is_a membrane) Part_of (e.g. membrane is part_of cell) Has part Regulates Occurs in
Gene Ontology
8
Two types of GO annotations Electronic annotation
Manual annotation
All annotations must: be attributed to a source
indicate what evidence was found to support the GO term-gene/protein association
Types of evidence codes Experimental codes - IDA, IMP, IGI, IPI, IEP
Computational codes - ISS, IEA, RCA, IGC
Author statement - TAS, NAS
Other codes - IC, ND
Annotating genes using GO terms
9
Annotating genes using GO terms
…… ……
DLGAP1 discs, large (Drosophila) homolog-associated protein 1
DLGAP2 discs, large (Drosophila) homolog-associated protein 2
DNM1 dynamin 1
DOC2A double C2-like domains, alpha
DRD1 dopamine receptor D1
DRD1IP dopamine receptor D1 interacting protein
DRD2 dopamine receptor D2
DRD3 dopamine receptor D3
DRD4 dopamine receptor D4
DRD5 dopamine receptor D5
…… ……
Synaptic transmission
226 human genes
Cell-cell signaling
Child
Parent
10
Downloads (http://www.geneontology.org) Ontologies
http://www.geneontology.org/page/download-ontology
Annotations
http://www.geneontology.org/page/download-annotations
Web-based access AmiGO: http://www.godatabase.org
QuickGO: http://www.ebi.ac.uk/QuickGO
Access GO
11
Homo sapiens Mus musculus
#term #gene #term #gene
GO/BP 6502 15228 6227 15709
GO/MF 3144 16389 2961 17287
GO/CC 947 16765 882 16801
Coverage of GO annotations
12
Over-representation analysis: concept
Development (1842 genes) Is the observed overlap significantly larger than the expected value?
581
581
compareObserve
Expect
98
65
1842
1842Differentially expressed genes (581 genes)
Hoxa5Hoxa11Ltbp3Sox4Foxc1Edn1Ror2GnagSmad3Wdr5Trp63Sox9Pax1AcdRai1Pitx1……
Sash1Cd24aAgtPsrc1Ctla2bAngptl4Depdc7Sorbs1Macrod1Enpp2Tmem176a……
13
Over-representation analysis: method
Significant genes Non-significant genes Total
genes in the group k j-k j
Other genes n-k m-n-j+k m-j
Total n m-n m
Hypergeometric test: given a total of m genes where j genes are in the functional group, if we pick n genes randomly, what is the probability of having k or more genes from the group?
Zhang et.al. Nucleic Acids Res. 33:W741, 2005
n
Observed k
j
m
Arbitrary thresholding
Ignoring the order of genes in the significant gene list
Over-representation analysis: limitations
14
15
Gene Set Enrichment Analysis: concept
Do genes in a gene set tend to locate at the top or bottom of the ranked gene list?
16
Gene Set Enrichment Analysis: method
Subramanian et.al. PNAS 102:15545, 2005
http://www.broad.mit.edu/gsea/
+1/k
-1/(n-k)
k: Number of genes in the gene set Sn: Number of all genes in the ranked gene list
17
Organizing genes by Pathways
Gene Ontology
Enrichment analysis methods Over-representation analysis
Gene Set enrichment analysis
Major limitation Existing knowledge on pathways or gene functions is far from
complete
Pathway-based analysis
Biological networks
18
Networks Nodes Edges
Physical interaction networks
Protein-protein interaction network
Proteins Physical interaction, undirected
Signaling network Proteins Modification,directed
Gene regulatory network
TFs/miRNAsTarget genes
Physical interaction,directed
Metabolic network Metabolites Metabolic reaction,directed
Functional association networks
Co-expression network
Genes/proteins
Co-expression,undirected
Genetic network Genes Genetic interaction,undirected
Properties of complex networks
Scale-free(hubs)
Hierarchical modularSmall world(six degree separation)
Human protein-protein interaction network9,198 proteins and 36,707 interactions
Network visualization
Cytoscape (http://www.cytoscape.org)
20
Gehlenborg et al. Nature Methods, 7:S56, 2010
Network visualization tools
Proteins that lie closer to one another in a protein interaction network are more likely to have similar function and involve in similar biological process.
Network-based gene function prediction
Network-based disease gene prediction
Network distance vs functional similarity
21
Sharan et al. Mol Syst Biol, 3:88, 2007
22
Protein-protein interaction modules
Transcriptional regulatory modules Transcription factor targets
miRNA targets
Network module-based analysis Over-representation analysis
GSEA
Organizing genes based on network modules
TF
23
92546_r_at92545_f_at96055_at102105_f_at102700_at……
WebGestalt: http://www.webgestalt.org
Jul. 1, 2013 – Jun. 30, 201449,136 visits from 18,213 visitors
~200 ID types
~60K gene sets
Statistical analysis
Zhang et.al. Nucleic Acids Res. 33:W741, 2005Wang et al. Nucleic Acids Res. 41:W77, 2013
29
Organizing genes by “gene sets” Pathways
Gene Ontology
Network modules
Enrichment analysis methods Over-representation analysis: WebGestalt
Gene Set enrichment analysis: GSEA
Tools WebGestalt (Over-representation analysis)
GSEA (Gene set enrichment analysis)
Manuals for WebGestalt and GSEA in the reading folder
Summary