1
The Genome Gamble, Knowledge or Carnage?
Comparative Genomics Leading the Way @ Organon
Tim Hulsen, Oss, November 11, 2003
2
Summary
• (1) An introduction to orthology and paralogy
• (2) Orthology determination within eukaryotes
• (3) Testing the advantages of our ortholog set
• (4) Using evolutionary conservation of co-expression for function prediction
• (5) Evolutionary conservation of chromosomal distance and orientation
3
(1) An introduction to orthology and paralogy
• Homologous genes: genes that have a common ancestor
• Orthologous genes: genes that evolved from a common ancestor through a speciation event (equivalents in different species)
• Paralogous genes: genes that evolved from a common ancestor through a duplication event
4
Orthology and paralogy explained graphically
(from http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html)
5
The importance of orthology and paralogy
• Orthology relationships are especially important for function prediction: orthologous genes generally have the same function, but in different species
• Paralogy relationships can be used for function prediction too: paralogous genes are often involved in the same process, but have different molecular functions (e.g. globins)
6
(2) Orthology determination within eukaryotes
• Not much eukaryotic orthology data is available at this moment:
– euKaryotic Orthologous Groups (KOG, NCBI)
– Inparanoid
– OrthoMCL
• Existing databases are either too inclusive or too restrictive
• Most methods rely on the best bidirectional hit (E-value), while orthology is an evolutionary principle: it should be determined using phylogenetic trees!
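For readers unfamiliar with the best-bidirectional-hit approach mentioned here, a minimal sketch follows. It is only an illustration of the general idea, not the procedure of any of the listed databases; the gene names, E-values, and the helper names `best_hits` and `bidirectional_best_hits` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A hit is (query_gene, subject_gene, e_value); a lower E-value is a better hit.
Hit = Tuple[str, str, float]

def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query gene to the subject gene with the lowest E-value."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, e_value in hits:
        if query not in best or e_value < best[query][1]:
            best[query] = (subject, e_value)
    return {query: subject for query, (subject, _) in best.items()}

def bidirectional_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> Set[Tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}

# Hypothetical search results between two genomes (invented values).
hs_vs_mm = [("hsA", "mmA", 1e-50), ("hsA", "mmB", 1e-20), ("hsB", "mmA", 1e-30)]
mm_vs_hs = [("mmA", "hsA", 1e-48), ("mmB", "hsA", 1e-19)]
bbh_pairs = bidirectional_best_hits(hs_vs_mm, mm_vs_hs)
print(bbh_pairs)
```

Note that `hsB` and `mmB` end up without a partner even though they have hits: this is the kind of information a tree-based approach can still use.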
7
Our orthology determination within eukaryotes
[Flowchart: from genomes to a list of orthologous pairs]
• Genome: Hs
• Genomes: At, Ce, Dm, Ec, Gt, Hs, Mm, Sc, Sp
• Selection of homologs: Z > 20, RH > 0.5*QL
• 24,263 groups
• Alignments and tree
• Phylome
• Tree scanning
• List: Hs-Mm: 85,848 pairs; Hs-Dm: 55,934 pairs; etc.
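The homolog selection thresholds (Z > 20, RH > 0.5*QL) could be applied along the following lines. This is a sketch under assumptions: the slide does not define RH and QL, so the code reads them as the length of the homologous region and the length of the query sequence; the `Hit` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hit:
    query: str
    subject: str
    z_score: float       # "Z" on the slide
    region_length: int   # "RH" on the slide (assumed: length of the homologous region)
    query_length: int    # "QL" on the slide (assumed: length of the query sequence)

def is_homolog(hit: Hit, min_z: float = 20.0, min_fraction: float = 0.5) -> bool:
    """Apply both thresholds from the slide: Z > 20 and RH > 0.5 * QL."""
    return hit.z_score > min_z and hit.region_length > min_fraction * hit.query_length

def select_homologs(hits: List[Hit]) -> List[Hit]:
    return [hit for hit in hits if is_homolog(hit)]

# Invented example hits.
hits = [
    Hit("hsA", "mmA", z_score=35.0, region_length=300, query_length=400),  # passes both
    Hit("hsA", "dmA", z_score=25.0, region_length=150, query_length=400),  # region too short
    Hit("hsA", "scA", z_score=12.0, region_length=380, query_length=400),  # Z too low
]
selected = select_homologs(hits)
print([hit.subject for hit in selected])
```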
8
Our orthology determination: using phylogenetic trees
Example: BMP6 (Bone Morphogenetic Protein 6). Five orthologous relations are defined, all Hs-Mm.
9
The ortholog database: Eukaryortho
http://t2.teras.sara.nl:4086 (only accessible from Organon, CMBI and SARA)
10
(3) Testing the advantages of our ortholog set
• Quality of orthology is difficult to test
• Orthologs should have more or less the same function --> use conservation of function as an orthology benchmark
• Gene Ontology (GO) database: hierarchical system of function and location descriptions
• Orthologs are considered to be in the same functional category when they share a 4th-level GO Molecular Function class
11
GO molecular function benchmark
[Figure: GO hierarchy, levels 0 to 4]
• Molecular function: one of the three ‘subroots’ (together with biological process and cellular component)
• ‘True’ orthologs should share a 4th-level molecular function (here: GO:0019912)
• Our Hs-Mm ortholog set: 67%
• KOG Hs-Mm ortholog set: 51%
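The benchmark score can be read as the fraction of ortholog pairs that share at least one 4th-level molecular function class. The sketch below illustrates that reading. It assumes the 4th-level classes per gene have already been looked up, and it skips pairs where either gene is unannotated, which the slides do not specify. The gene names and the `TERM_X`/`TERM_Y` identifiers are placeholders.

```python
from typing import Dict, List, Set, Tuple

def fraction_sharing_class(
    pairs: List[Tuple[str, str]],
    level4_terms: Dict[str, Set[str]],
) -> float:
    """Fraction of annotated pairs sharing at least one 4th-level class."""
    scored = 0
    shared = 0
    for gene_a, gene_b in pairs:
        terms_a = level4_terms.get(gene_a)
        terms_b = level4_terms.get(gene_b)
        if not terms_a or not terms_b:
            continue  # assumption: unannotated genes are left out of the score
        scored += 1
        if terms_a & terms_b:
            shared += 1
    return shared / scored if scored else 0.0

# Invented annotations; only GO:0019912 appears on the slide.
level4_terms = {
    "hs1": {"GO:0019912"}, "mm1": {"GO:0019912"},
    "hs2": {"TERM_X"},     "mm2": {"TERM_Y"},
    "mm3": {"TERM_X"},     # hs3 has no annotation
}
pairs = [("hs1", "mm1"), ("hs2", "mm2"), ("hs3", "mm3")]
score = fraction_sharing_class(pairs, level4_terms)
print(score)
```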
12
Co-expression benchmark
• Second method: comparing expression profiles of each orthologous gene pair
• Using GeneLogic Expressor data set:
– Human chips: 3269 samples, 44792 fragments, 115 tissue categories, 15 SNOMED tissue categories
– Mouse chips: 859 samples, 36701 fragments, 25 tissue categories, 12 SNOMED tissue categories
13
SNOMED tissue categories used for co-expression calculation
HUMAN                        MOUSE
1  Blood vessel              1  Blood vessel
2  Cardiovascular system     2  Cardiovascular system
3  Digestive organs          3  Digestive organs
4  Digestive system          4  Digestive system
5  Endocrine gland           -
6  Female genital system     5  Female genital system
7  Hematopoietic system      6  Hematopoietic system
8  Integumentary system      7  Integumentary system
9  Male genital system       8  Male genital system
10 Musculoskeletal system    9  Musculoskeletal system
11 Nervous system            10 Nervous system
12 Product of conception     -
13 Respiratory system        11 Respiratory system
14 Topographic region        -
15 Urinary tract             12 Urinary tract
14
Calculating the correlation
$$ r = \frac{N\sum xy - (\sum x)(\sum y)}{\sqrt{\left(N\sum x^2 - (\sum x)^2\right)\left(N\sum y^2 - (\sum y)^2\right)}} $$
Human gene 1    Mouse gene 1    Tissue      Human gene 2    Mouse gene 2
206316_s_at     162926_at       category    205428_s_at     97166_at
41.04           83.56           1           62.95           49.11
30.78           61.11           2           67.72           45.18
74.73           92.95           3           93.2            40.76
43.9            78.85           4           68.48           41.2
39.23           88.93           5           54.8            41.24
88.72           100.7           6           52.16           49.64
39.71           83.15           7           73.56           42.84
135.42          169.28          8           46.59           49.58
55.98           79.91           9           205.58          0
0               59.05           10          142.9           34.7
54.78           97.37           11          48.57           48.04
68.11           87.85           12          48.97           46.26
Gene pair 1, high correlation: 0.914167
Gene pair 2, low correlation: -0.935731
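The correlation on this slide is the Pearson correlation coefficient computed over the 12 shared tissue categories. A minimal sketch of that calculation, applied to the values listed on the slide, follows; the function name `pearson_r` and the variable names are illustrative choices. The slide reports 0.914167 for gene pair 1 and -0.935731 for gene pair 2.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation, using the sums-based formula from the slide."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator  # ZeroDivisionError if a profile is constant

# Expression values per tissue category (1 to 12), copied from the slide.
human_1 = [41.04, 30.78, 74.73, 43.9, 39.23, 88.72, 39.71, 135.42, 55.98, 0, 54.78, 68.11]
mouse_1 = [83.56, 61.11, 92.95, 78.85, 88.93, 100.7, 83.15, 169.28, 79.91, 59.05, 97.37, 87.85]
human_2 = [62.95, 67.72, 93.2, 68.48, 54.8, 52.16, 73.56, 46.59, 205.58, 142.9, 48.57, 48.97]
mouse_2 = [49.11, 45.18, 40.76, 41.2, 41.24, 49.64, 42.84, 49.58, 0, 34.7, 48.04, 46.26]

r_pair_1 = pearson_r(human_1, mouse_1)
r_pair_2 = pearson_r(human_2, mouse_2)
print(f"pair 1: {r_pair_1:.6f}, pair 2: {r_pair_2:.6f}")
```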
15
Co-expression comparison of our ortholog set to the KOG set
[Figure: x-axis: co-expression (correlation), -1 to 1; y-axis: fraction of pairs in this correlation range, 0 to 0.016; series: KOG rel., OUR rel.]
16
(4) Using evolutionary conservation of co-expression for function prediction
• Human: Gene A, Gene B; co-expression = Cab (-1 <= corr. <= 1)
• Human/Mouse: Gene A’, Gene B’; Ca’b’ >= Cab
• Increases probability that A and B are involved in the same process
(Co-expression calculated over 115 tissues in human, 25 in mouse)
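The condition Ca’b’ >= Cab can be sketched as a small check: compute the co-expression of A and B in human, compute the co-expression of their counterparts A’ and B’ (human paralogs or mouse orthologs), and compare. The function names and the toy expression profiles below are hypothetical; the two species may use different numbers of tissues, as on the slide.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )

def conserved_coexpression(profile_a, profile_b, profile_a2, profile_b2):
    """Return (Cab, Ca'b', conserved), where conserved means Ca'b' >= Cab."""
    c_ab = pearson_r(profile_a, profile_b)
    c_a2b2 = pearson_r(profile_a2, profile_b2)
    return c_ab, c_a2b2, c_a2b2 >= c_ab

# Toy profiles (invented): 4 tissues for A and B, 3 tissues for A' and B'.
c_ab, c_a2b2, conserved = conserved_coexpression(
    [1, 2, 3, 4], [2, 1, 4, 3],   # Gene A, Gene B
    [1, 2, 3], [2, 4, 6],         # Gene A', Gene B'
)
print(c_ab, c_a2b2, conserved)
```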
17
GO biological process benchmark
[Figure: GO hierarchy, levels 0 to 4]
• Biological process: one of the three ‘subroots’ (together with cellular component and molecular function)
• Both orthologs and paralogs are often involved in the same process/pathway (= sharing a 4th-level biological process, here: GO:0007584)
18
Conservation of co-expression used in function prediction
[Figure: x-axis: co-expression (correlation), -0.3 to 0.7; y-axis: fraction same GO Biological Process (4th level), 0 to 0.45; series: Human, Human-Human (paralogous conservation), Human-Mouse (orthologous conservation)]
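A plot of this kind can be produced by binning gene pairs on their co-expression and computing, per bin, the fraction of pairs that share a 4th-level GO biological process. The sketch below shows one way to do that; the bin width of 0.1, the function name, and the example pairs are assumptions for illustration only.

```python
from collections import defaultdict
from math import floor
from typing import Dict, Iterable, Tuple

def fraction_same_process_by_bin(
    pairs: Iterable[Tuple[float, bool]],
    bin_width: float = 0.1,
) -> Dict[float, float]:
    """pairs: (correlation, shares_process). Returns {bin lower edge: fraction}."""
    totals = defaultdict(int)
    same = defaultdict(int)
    for correlation, shares_process in pairs:
        # Round before flooring so that e.g. 0.3 / 0.1 lands in bin 3, not bin 2.
        index = floor(round(correlation / bin_width, 9))
        totals[index] += 1
        if shares_process:
            same[index] += 1
    return {
        round(index * bin_width, 10): same[index] / totals[index]
        for index in sorted(totals)
    }

# Invented gene pairs: (co-expression, shares a 4th-level biological process).
example_pairs = [(0.05, False), (0.07, True), (0.55, True), (0.58, True), (-0.12, False)]
fractions = fraction_same_process_by_bin(example_pairs)
print(fractions)
```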
19
The importance of (conserved) co-expression for function prediction
• Co-expression without conservation can already be used for function prediction
• Paralogous conservation gives a 2x higher accuracy
• Orthologous conservation gives a 3x or 4x higher accuracy
• Alternative for GO Biological Process: the KEGG Pathway database gives similar results
20
(5) Evolutionary conservation of chromosomal distance and orientation
• Human: Gene A, Gene B; distance = Dab (# bp); orientation = Oab; co-expression = Cab (-1 <= corr. <= 1)
• Human/Mouse: Gene A’, Gene B’; Da’b’ <= Dab; Oa’b’ == Oab; Ca’b’ >= Cab
• Increases probability that A and B are involved in the same process
(Co-expression calculated over 115 tissues in human, 25 in mouse)
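The three conditions on this slide (Da’b’ <= Dab, Oa’b’ == Oab, Ca’b’ >= Cab) can be combined into a single check, sketched below. The `GenePair` record, the string encoding of the orientation, and the example values are hypothetical; the sketch also assumes both genes of a pair lie on the same chromosome, so that a distance in bp is defined.

```python
from dataclasses import dataclass

@dataclass
class GenePair:
    distance_bp: int     # D: chromosomal distance between the two genes, in bp
    orientation: str     # O: relative orientation, hypothetical encoding such as "++" or "+-"
    coexpression: float  # C: correlation of the two expression profiles, -1 to 1

def is_conserved(human: GenePair, other: GenePair) -> bool:
    """True when distance, orientation and co-expression are all conserved."""
    return (
        other.distance_bp <= human.distance_bp
        and other.orientation == human.orientation
        and other.coexpression >= human.coexpression
    )

# Invented example: pair A/B in human, pair A'/B' in mouse.
human_ab = GenePair(distance_bp=800_000, orientation="++", coexpression=0.55)
mouse_ab = GenePair(distance_bp=600_000, orientation="++", coexpression=0.70)
flipped = GenePair(distance_bp=600_000, orientation="+-", coexpression=0.70)
print(is_conserved(human_ab, mouse_ab), is_conserved(human_ab, flipped))
```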
21
Function prediction using co-expression and chromosomal distance (without conservation)
[Figure: three axes: co-expression (correlation), -1 to 0.9; chromosomal distance, 1,000,000 to 97,000,000; fraction same GO process (4th level), 0 to 0.5]
22
Conservation of chromosomal distance used in function prediction
[Figure: x-axis: co-expression (correlation), -0.3 to 0.8; y-axis: fraction same GO process (4th level), 0.0 to 0.6; series: Human coexpr.; Human coexpr. + dist. < 10 Mbp; Human-Human paral. cons. coexpr.; Human-Human paral. cons. coexpr. + dist. < 10 Mbp; Human-Mouse orthol. cons. coexpr.; Human-Mouse orthol. cons. coexpr. + dist. < 10 Mbp]
23
The importance of chromosomal distance and orientation for function prediction
• Chromosomal distance is less important in eukaryotes than in prokaryotes (due to the absence of operons)
• Only genes with distance < 1 Mbp seem to be coregulated
• Conservation of relative orientation seems to be important only for very close gene pairs
• Only a limited number of genes can be functionally annotated using the conservation of chromosomal distance and orientation
24
Conclusions
• Orthologous and paralogous relations can be used to improve function prediction
• Our orthologous pairs of Protein World proteins perform better than KOG, in terms of co-expression and involvement in the same process
• Chromosomal distance and relative orientation between genes can be used for function prediction too, in a limited number of cases
• Future plans: find examples where the function of a protein can be predicted using these methods
25
Credits
• Martijn Huynen
• Peter Groenen
• Others at Comics
• Others at Organon Bioinf.