Detecting horizontal gene transfers using discrepancies in species and
gene classifications
Alix Boc
Université du Québec à Montréal
Problem description
[Figure: a species tree and the rbcL gene tree, shown side by side, for Rhodobacter, Hydrogenovibrio L2, Chromatium L, Thiobacillus fe1, Nitrobacter and Xanthobacter; the leaves appear in a different order in the two trees.]
Presentation summary
• Some words about phylogeny
• Network models in phylogenetic analysis
• What is a horizontal gene transfer (HGT)?
• Description of the new method
• Examples of application
• HGT-Detection vs LatTrans
• Future work
• T-Rex software
Reconstruction of a phylogenetic tree
DNA sequences:
A: CGTAAT
B: CGTACG
C: CGTCGA
D: ACT………
E: ………………
F: ………………
Distance matrix:
  A B C D E F
A 0 2 3 5 5 4
B 2 0 3 5 5 4
C 3 3 0 4 4 3
D 5 5 4 0 2 3
E 5 5 4 2 0 3
F 4 4 3 3 3 0
[Figure: the phylogenetic tree with leaves A to F inferred from the distance matrix.]
Reconstruction of a phylogenetic tree
AAATGATCTGCGTCAATATTATAA
GCCTGATCCTCACTACTGTCATCTTAA
ATAGGGCCCGTATTTACCCTATAG
AACTGGTCCACCCTTATACTAAAAGACGCCTCACTAGGAAGCTAA
AACTGATCTGCTTCAATAATTTAA
1. Align sequences
AAATGATCTGCGTCAATATTA---------------------TAA
GCCTGATCCTCACTA------------------CTGTCATCTTAA
ATA---------------------GGGCCCGTATTTACCCTATAG
AACTGGTCCACCCTTATACTAAAAGACGCCTCACTAGGAAGCTAA
AACTGATCTGCTTCAATAATT---------------------TAA
• Clustal (Higgins, 1994)
• Dialign (Morgenstern, 1999)
• ….
2. Apply a model of evolution (distance methods)
0 3 2 4 3
3 0 3 4 2
2 3 0 4 3
4 4 4 0 4
3 2 3 4 0
• Uncorrected Distances
• Jukes Cantor
• Tajima Nei
• Kimura 2 parameters
• Tamura
• Jin-Nei Gamma
• Kimura protein
• LogDet
• F84
• ….
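The first two models in the list can be sketched in a few lines. The following is a minimal illustration, not the T-Rex code: `differences`, `p_distance` and `jukes_cantor` are hypothetical helper names, gapped columns are simply skipped, and the three test sequences are the ones labelled A, B and C on the earlier slide.

```python
import math

def differences(s1, s2):
    """Return (mismatches, compared sites), skipping columns with a gap."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs), len(pairs)

def p_distance(s1, s2):
    """Uncorrected distance: proportion of differing sites."""
    diff, sites = differences(s1, s2)
    if sites == 0:
        raise ValueError("no ungapped column shared by the two sequences")
    return diff / sites

def jukes_cantor(s1, s2):
    """Jukes-Cantor correction d = -3/4 * ln(1 - 4p/3) of the p-distance."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Sequences A, B and C from the "DNA sequences" slide.
seqs = {"A": "CGTAAT", "B": "CGTACG", "C": "CGTCGA"}
counts = {(x, y): differences(seqs[x], seqs[y])[0] for x in seqs for y in seqs}
```

With these sequences the raw difference counts are 2 for (A,B) and 3 for (A,C) and (B,C), which agrees with the corresponding entries of the 6 x 6 distance matrix shown earlier.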
3. Apply a reconstruction method (distance methods)
• Neighbor Joining
• ADDTREE
• Unweighted Neighbor Joining
• Circular order reconstruction
• Weighted Least-squares
• BioNJ
• ….
Inferring phylogenetic trees
Four main approaches:
• Distance-based methods
  • UPGMA by Michener and Sokal (1957)
  • ADDTREE by Sattath and Tversky (1977)
  • Neighbor-joining (NJ) by Saitou and Nei (1987)
  • UNJ and BioNJ methods by Gascuel (1997)
  • Fitch by Felsenstein (1997)
  • Weighted least-squares MW by Makarenkov and Leclerc (1999)
• Maximum Parsimony (Camin and Sokal 1965; Farris 1970; Fitch 1971)
• Maximum Likelihood (Felsenstein 1981)
• Bayesian approach (Rannala and Yang 1996; Huelsenbeck and Ronquist 2001)
Network model
Phylogenetic mechanisms requiring a network representation
• Horizontal gene transfer (i.e. lateral gene transfer)
• Hybridization
• Homoplasy and gene convergence
• Gene duplication and gene loss
Software for building phylogenetic networks
• SplitsTree, Huson and Bryant (2006)
• T-Rex, Makarenkov (2001)
• NeighborNet, Bryant and Moulton (2002)
Three types of horizontal gene transfer
Methods for detecting horizontal gene transfers
• Hein (1990) and Hein et al. (1995, 1996)
• von Haeseler and Churchill (1993)
• Page (1994); Page and Charleston (1998)
• Charleston (1998)
• Hallett and Lagergren (2001)
• Mirkin, Fenner, Galperin and Koonin (2003)
• V’yugin, Gelfand and Lyubetsky (2003)
• Boc and Makarenkov (2003); Makarenkov, Boc and Diallo (2004)
The new model
Basic ideas:
1) Reconcile the species and gene phylogenetic trees using either a topological (Robinson and Foulds topological distance) or a metric (least-squares) criterion
[Figure: species tree and gene tree on the leaves A to F.]
2) Incorporate necessary biological rules into the mathematical model
3) Keep the time complexity of the algorithm polynomial
Partial gene transfer versus complete transfer
[Figure: rooted trees on the leaves A to F contrasting a partial transfer with a complete transfer; panels (a) and (b).]
Biological rules
Partial gene transfer. Incorporating biological rules.
Situations when a new HGT branch (a,b) can affect the evolutionary distance between species i and j, and cannot affect the distance between i1 and j.
[Figure: rooted tree with leaves i, i1 and j, nodes x, y, z and w, and the HGT branch (a,b).]
Partial gene transfer. Incorporating biological rules (2).
Three cases when the evolutionary distance between the species i and j is not affected by the addition of a new HGT branch (a,b).
[Figure: panels (a), (b) and (c), each a rooted tree with leaves i and j, nodes x, y, z and w, and the HGT branch (a,b).]
No HGTs can be considered when affected branches are located on the same lineage
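A minimal sketch of this rule, assuming the species tree is stored as a child-to-parent map and each branch is named by its lower node. The node labels (8 to 12) are taken from the program output shown later in the presentation; `ancestors` and `candidate_transfers` are hypothetical names, not part of the T-Rex code.

```python
# Species tree (((A,B),C),(F,(D,E))) as a child -> parent map; node 12 is the root.
PARENT = {
    "11": "12", "9": "12",
    "10": "11", "F": "11",
    "D": "10", "E": "10",
    "C": "9", "8": "9",
    "A": "8", "B": "8",
}

def ancestors(node, parent):
    """All nodes on the path from `node` up to the root (excluding `node`)."""
    found = set()
    while node in parent:
        node = parent[node]
        found.add(node)
    return found

def candidate_transfers(parent):
    """Ordered (source, target) branch pairs that are NOT on the same lineage."""
    branches = list(parent)  # one branch per non-root node
    anc = {b: ancestors(b, parent) for b in branches}
    return [
        (s, t)
        for s in branches
        for t in branches
        if s != t and s not in anc[t] and t not in anc[s]
    ]

pairs = candidate_transfers(PARENT)
```

For this six-leaf tree the filter keeps 66 of the 90 ordered branch pairs; for example, a transfer from branch A to branch D is kept, while one from branch 8 to branch A is discarded because branch 8 lies above A.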
Partial gene transfer. Incorporating biological rules (3).
[Figure: rooted tree with two lineages, Lineage 1 and Lineage 2, joined by two intersecting transfers, LGT1 and LGT2.]
No HGT can be considered when two HGTs affecting a pair of lineages intersect as shown
Partial gene transfer. Incorporating biological rules (4).
[Figure: panels (a) to (d), each showing the leaves i and j together with two HGT branches, (a,b) and (a1,b1).]
• Cases (a) and (b): path between the leaves i and j is allowed to go through both HGT branches (a,b) and (a1,b1).
• Cases (c) and (d) : path between the leaves i and j is not allowed to go through both HGT branches (a,b) and (a1,b1).
Partial gene transfer. Incorporating biological rules (5).
Sub-tree constraint
[Figure: species tree T and gene tree T1, each containing gene sub-tree 1 and gene sub-tree 2, with nodes x, y, z and w and the HGT branch (a,b).]
Timing constraint: the transfer between the branches (z,w) and (x,y) of the species tree T is allowed if and only if the cluster grouping both affected sub-trees is present in the gene tree T1. Here and below, a single branch is depicted by a plain line and a path by a wavy line.
• To resolve the topological conflicts between T and T1 that are due to transfers between single species or their close ancestors.
• To identify the transfers that occurred deeper in the phylogeny.
Optimization
Optimization problem : Least-squares
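The least-squares loss function Q, to be minimized over the unknown vector of branch lengths l in T and the unknown fraction α of the transferred gene, is as follows:

$$Q(l,\alpha)=\sum_{ij\in S}\Big((1-\alpha)\sum_{k\in path(ij)} l_k+\alpha\Big(\sum_{k\in path(ia)} l_k+\sum_{k\in path(jb)} l_k\Big)-\delta(i,j)\Big)^2+\sum_{ij\notin S}\Big(\sum_{k\in path(ij)} l_k-\delta(i,j)\Big)^2\to\min$$

where l_k is the length of the branch k of the corresponding path in T, δ(i,j) is the given dissimilarity value between i and j, and S is the set of pairs ij whose distance can be affected by the HGT branch (a,b).
[Figure: rooted tree with leaves i and j, nodes x, y, z and w, and the HGT branch (a,b); the original path segments carry the weight 1-α and the segments passing through (a,b) carry the weight α.]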
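To make the role of α concrete, here is a simplified sketch that keeps the branch lengths fixed and fits only α. This is an assumption made for illustration: the method on the slide also treats the branch lengths as unknowns. For each affected pair, `d_old` stands for the length of path(ij), `d_new` for the summed lengths of path(ia) and path(jb), and `delta` for the gene dissimilarity; all names and numbers below are hypothetical.

```python
def fit_alpha(affected):
    """Closed-form least-squares alpha for fixed path lengths.

    `affected` is a list of (d_old, d_new, delta) triples. Setting the
    derivative of sum(((1-a)*d_old + a*d_new - delta)**2) to zero gives
    a = sum((delta-d_old)*(d_new-d_old)) / sum((d_new-d_old)**2),
    which is then clipped to the interval [0, 1].
    """
    num = sum((delta - d_old) * (d_new - d_old) for d_old, d_new, delta in affected)
    den = sum((d_new - d_old) ** 2 for d_old, d_new, _ in affected)
    if den == 0:
        return 0.0  # the transfer changes no distance, so alpha is irrelevant
    return min(1.0, max(0.0, num / den))

def loss(alpha, affected, unaffected):
    """Value of Q: mixed distances for affected pairs, tree distances otherwise."""
    q = sum(((1 - alpha) * d_old + alpha * d_new - delta) ** 2
            for d_old, d_new, delta in affected)
    q += sum((d - delta) ** 2 for d, delta in unaffected)
    return q

# Toy numbers, not taken from the presentation.
affected = [(5.0, 3.0, 4.0), (4.0, 2.0, 3.0)]
unaffected = [(2.0, 2.0), (3.0, 4.0)]
alpha = fit_alpha(affected)
q = loss(alpha, affected, unaffected)
```

In this toy case α comes out as 0.5: the gene distances lie exactly halfway between the old tree paths and the paths through the HGT branch, so only the unaffected pair (3, 4) contributes to Q.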
Optimization problem : Robinson and Foulds topological distance
The topological distance of Robinson and Foulds (1981)
between two phylogenetic trees is equal to the minimum
number of elementary operations consisting of merging or
splitting vertices necessary to transform one tree into another.
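[Figure: two trees on the leaves A to E. In T the pairs (A,B) and (C,D) are grouped and E is attached separately; in T1 the pairs (A,B) and (D,E) are grouped and C is attached separately.]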
Robinson and Foulds topological distance
Robinson and Foulds distance between T and T1 is 2.
The HGT minimizing the Robinson and Foulds topological distance between the species and gene phylogenetic trees can be considered as the best candidate to reconcile the species and gene phylogenies.
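For unrooted trees this distance equals the number of bipartitions (splits) that occur in exactly one of the two trees, which is how it is usually computed. A minimal sketch, assuming trees are written as nested tuples of leaf names; `splits` and `robinson_foulds` are hypothetical helper names.

```python
def splits(tree):
    """Non-trivial bipartitions of an unrooted tree given as nested tuples.

    Each split is stored as the side that does not contain a fixed
    reference leaf, so that the same split always has the same form.
    """
    def leaf_set(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaf_set(c) for c in node))

    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(c) for c in node))
        side = below if ref not in below else all_leaves - below
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    walk(tree)
    return found

def robinson_foulds(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))

# The two five-leaf trees T and T1 from the slide.
T = (("A", "B"), ("C", "D"), "E")
T1 = (("A", "B"), ("D", "E"), "C")
```

Here `robinson_foulds(T, T1)` evaluates to 2, the value stated on the slide: the split separating {C, D} occurs only in T and the split separating {D, E} only in T1.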
[Figure: the tree T, the tree obtained from T after adding an HGT, and the tree T1, on the leaves A to E.]
Algorithm
Input file for our program
Set X of taxa = {A, B, C, D, E, F}
Distance matrix or Newick string for the species tree:
6
A 0 2 3 5 5 4
B 2 0 3 5 5 4
C 3 3 0 4 4 3
D 5 5 4 0 2 3
E 5 5 4 2 0 3
F 4 4 3 3 3 0
Distance matrix or Newick string for the gene tree:
6
A 0 4 4 2 4 4
B 4 0 4 4 2 4
C 4 4 0 4 4 2
D 2 4 4 0 4 4
E 4 2 4 4 0 4
F 4 4 2 4 4 0
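A minimal sketch of reading such a file, assuming the layout inferred from the slide: each matrix starts with the number of taxa, followed by one row per taxon (name, then distances), and the two matrices simply follow each other. `parse_matrices` is a hypothetical helper; the actual program may accept other layouts, for example Newick strings.

```python
def parse_matrices(text):
    """Parse consecutive square distance matrices from whitespace-separated text.

    Returns a list of (names, rows) tuples. Taxon names are assumed to
    contain no whitespace.
    """
    tokens = text.split()
    pos, matrices = 0, []
    while pos < len(tokens):
        n = int(tokens[pos])
        pos += 1
        names, rows = [], []
        for _ in range(n):
            names.append(tokens[pos])
            rows.append([float(x) for x in tokens[pos + 1:pos + 1 + n]])
            pos += 1 + n
        matrices.append((names, rows))
    return matrices

INPUT = """6
A 0 2 3 5 5 4
B 2 0 3 5 5 4
C 3 3 0 4 4 3
D 5 5 4 0 2 3
E 5 5 4 2 0 3
F 4 4 3 3 3 0
6
A 0 4 4 2 4 4
B 4 0 4 4 2 4
C 4 4 0 4 4 2
D 2 4 4 0 4 4
E 4 2 4 4 0 4
F 4 4 2 4 4 0
"""

(species_names, species_d), (gene_names, gene_d) = parse_matrices(INPUT)
```

Because the parser splits on any whitespace, it does not depend on how the rows are broken across lines.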
Program options
• Optimization criterion:
  – Least-squares
  – Robinson and Foulds distance
• Type of scenario:
  – Unique
  – Multiple
• Maximum number of HGTs
• Subtree constraint
• Position of the root
• Bootstrap (with sequences)
Algorithm : unique scenario
Begin
  Reconstruction of the species tree T and the gene tree T1
  Reestimate the length of each branch in T
  While Optimization criterion > 0 loop
    Test all possible HGTs
    Add the best HGT
    Reestimate the length of each branch in T
    Compute the value of the optimization criterion
  End Loop
End
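One step of this loop can be sketched for complete transfers, under several stated assumptions: trees are rooted, binary and written as nested tuples; a transfer moves the recipient subtree next to the donor branch; the criterion is a rooted variant of the Robinson and Foulds distance that counts clusters found in only one tree; branch lengths are ignored. All function names are hypothetical. The species tree is the one from the program output shown later, and the gene tree is the rooted tree obtained by applying the three transfers reported there.

```python
def leaves(t):
    return frozenset([t]) if isinstance(t, str) else leaves(t[0]) | leaves(t[1])

def clusters(t):
    """Leaf sets of all internal nodes of a rooted binary tree."""
    if isinstance(t, str):
        return set()
    return {leaves(t)} | clusters(t[0]) | clusters(t[1])

def rf(t1, t2):
    """Rooted RF variant: clusters present in exactly one tree."""
    return len(clusters(t1) ^ clusters(t2))

def subtrees(t):
    yield t
    if not isinstance(t, str):
        yield from subtrees(t[0])
        yield from subtrees(t[1])

def prune(t, s):
    """Remove subtree s from t and suppress the node left with one child."""
    if isinstance(t, str):
        return t
    left, right = t
    if left == s:
        return right
    if right == s:
        return left
    return (prune(left, s), prune(right, s))

def graft(t, donor, s):
    """Attach subtree s on the branch leading to the subtree `donor`."""
    if t == donor:
        return (t, s)
    if isinstance(t, str):
        return t
    return (graft(t[0], donor, s), graft(t[1], donor, s))

def transfer(t, donor, recipient):
    return graft(prune(t, recipient), donor, recipient)

def best_transfer(t, gene):
    """Test all branch pairs not on the same lineage; keep the lowest RF."""
    best = None
    for donor in subtrees(t):
        for recipient in subtrees(t):
            if leaves(donor) & leaves(recipient):
                continue  # same branch or same lineage: not allowed
            candidate = transfer(t, donor, recipient)
            score = rf(candidate, gene)
            if best is None or score < best[0]:
                best = (score, donor, recipient, candidate)
    return best

species = ((("A", "B"), "C"), ("F", ("D", "E")))
gene = ((("A", "D"), ("C", "F")), ("B", "E"))
```

Replaying the transfers A to D, C to F and E to B with `transfer` takes this criterion from 8 to 6, 4 and 0, the same RF values as in the program output; `best_transfer` performs the "test all possible HGTs" step for one iteration.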
Algorithm : multiple scenario
Begin
Reconstruction of the species tree T and the gene tree T1
Reestimate the length of each branch in T
Test all connections between pairs of branches
Establish a list of HGTs ordered according to the optimization criterion.
End
Algorithm : Step 1
• Reconstruction of the species tree T with Neighbor Joining
• Set X of n taxa
• Binary tree: all internal nodes are of degree 3, giving 2n-3 branches
• T is explicitly rooted (file, prompt, midpoint)
[Figure: species tree T with the leaves A to F.]
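The topology part of this step can be sketched as follows. This is a bare-bones Neighbor Joining using the standard Saitou and Nei selection criterion; branch lengths and rooting are left out, and `neighbor_joining` and `clades` are hypothetical helper names rather than the T-Rex routines. The input is the 6 x 6 species distance matrix shown earlier.

```python
from itertools import combinations

def neighbor_joining(names, matrix):
    """Return the NJ topology as nested tuples (three subtrees at the top)."""
    nodes = list(names)
    d = {(a, b): float(matrix[i][j])
         for i, a in enumerate(names) for j, b in enumerate(names)}
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a, b] for b in nodes) for a in nodes}
        # Join the pair minimising (n-2)*d(a,b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        u = (a, b)
        for c in nodes:
            if c != a and c != b:
                d[u, c] = d[c, u] = 0.5 * (d[a, c] + d[b, c] - d[a, b])
        d[u, u] = 0.0
        nodes = [c for c in nodes if c != a and c != b] + [u]
    return tuple(nodes)

def clades(tree):
    """Leaf sets below every internal node of the nested-tuple tree."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(c) for c in node))
        found.add(below)
        return below
    walk(tree)
    return found

NAMES = list("ABCDEF")
SPECIES = [
    [0, 2, 3, 5, 5, 4],
    [2, 0, 3, 5, 5, 4],
    [3, 3, 0, 4, 4, 3],
    [5, 5, 4, 0, 2, 3],
    [5, 5, 4, 2, 0, 3],
    [4, 4, 3, 3, 3, 0],
]
tree = neighbor_joining(NAMES, SPECIES)
```

On this matrix the sketch groups A with B and D with E, which matches the branches 8--A, 8--B, 10--D and 10--E listed in the program output.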
Algorithm : Step 2
• Comparing the gene tree T1 and the species tree T
Criterion 1 :
RF - Robinson and Foulds distance between T and T1
LS - Least-Squares coefficient between distances in T and T1
If RF == 0 or LS == 0 then
  There are no HGTs
Else
  Step 3
End if
[Figure: species tree T and gene tree T1 on the leaves A to F.]
Algorithm : Step 3
Multiple Scenario
[Figure: species tree with the leaves A to F.]
• Test all connections between pairs of branches.
• Reestimate the length of each branch in T according to the gene distance matrix.
• Establish a list of HGTs ordered according to the least-squares coefficient or the Robinson-Foulds distance.
Algorithm : Step 3
Unique Scenario
[Figure: four trees on the leaves A to F, linked by steps 1, 2 and 3: the species tree with the upcoming HGT1; the species tree + HGT1 with the upcoming HGT2; the species tree + HGT2 with the upcoming HGT3; the species tree + HGT3, which is the gene tree.]
1. The best HGT found is added to the species tree.
2. The length of each branch is reestimated according to the gene tree.
3. The RF distance or the LS coefficient is computed.
Output
**Species tree inferred with NJ (with branch lengths fitted to the gene distance)**
 1. 8--B     1.666667
 2. 9--C     1.866667
 3. 10--D    1.866667
 4. 10--11   0.000050
 5. 10--E    1.666667
 6. 8--A     1.866667
 7. 9--8     0.000050
 8. 12--Root 0.366687
 9. 11--F    1.866667
10. 11--12   0.000050
11. 9--12    0.000050
=========================================
= Criteria values before computation
=========================================
RF distance = 8
LS criterion = 10.667334
BD criterion = 10.000000
================
= HGT #1
================
From branch 8--A to branch D--10
RF = 6
LS = 6.160320
BD = 5.500000
================
= HGT #2
================
From branch C--9 to branch F--11
RF = 4
LS = 1.821514
BD = 3.000000
================
= HGT #3
================
From branch E--10 to branch B--8
RF = 0
LS = 0.000000
BD = 0.000000
Bootstrap
[Figure: the species tree and the gene tree on the leaves A to F, together with n replicates of the gene tree; the HGTs drawn on the species tree carry bootstrap support values of 75% and 60%.]
The first gene tree is chosen as the reference gene tree.
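One plausible reading of these percentages, stated here as an assumption rather than as the program's documented behaviour, is that the support of a transfer is the share of gene-tree replicates in which the same transfer is detected. A minimal sketch with made-up replicate results; `hgt_support` is a hypothetical name and each transfer is written as a (source, target) pair.

```python
from collections import Counter

def hgt_support(reference, replicates):
    """Percentage of replicates in which each reference transfer is found.

    `reference` lists the transfers detected with the reference gene tree;
    `replicates` holds one list of detected transfers per bootstrap replicate.
    """
    counts = Counter()
    for detected in replicates:
        counts.update(set(detected))  # count each transfer once per replicate
    return {hgt: 100.0 * counts[hgt] / len(replicates) for hgt in reference}

# Made-up example: four replicates, two reference transfers.
reference = [("A", "D"), ("C", "F")]
replicates = [
    [("A", "D"), ("C", "F")],
    [("A", "D")],
    [("A", "D"), ("C", "F")],
    [("E", "B")],
]
support = hgt_support(reference, replicates)
```

In this example the transfer from A to D is recovered in three of four replicates (75%) and the transfer from C to F in two of four (50%).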
Examples
Application example 1
Horizontal transfer of the Rubisco large subunit gene
Delwiche, C.F., and J. D. Palmer. 1996. Rampant Horizontal Transfer and Duplication of Rubisco Genes in Eubacteria and Plastids. Mol. Biol. Evol. 13:873-882.
rbcL Gene Phylogeny
[Figure: rbcL gene phylogeny. Leaves, in the order shown: Mn oxidizing bacterium (S|85-9a1), Rhodobacter sphaeroides I, Xanthobacter, Alcaligenes 17707 chromosomal, Alcaligenes H16 plasmid, Alcaligenes H16 chromosomal, Cyanidium, Porphyridium, Antithamnion, Ahnfeltia, Cryptomonas, Ectocarpus, Olistodiscus, Cylindrotheca, Prochlorococcus, Hydrogenovibrio L2, Chromatium L, Pseudomonas, Thiobacillus ferrooxidans fe1, Nitrobacter, Thiobacillus ferr. 19859, Thiobacillus denitrificans I, Endosymbiont, Anabaena, Synechococcus, Anacystis, Prochlorothrix, Synechocystis, Prochloron, Cyanophora, Chlorella, Bryopsis, Chlamydomonas, Euglena, Pyramimonas, Coleochaete, Marchantia, Pseudotsuga, Nicotiana, Oryza. Group labels: Proteobacteria, Red and Brown Plastids, Cyanobacteria, Green Plastids, Glaucophyte Plastid. The tree is divided into Green Type (Form I) and Red Type (Form I).]
Delwiche and Palmer (1996) - hypotheses of HGTs
1- Cyanobacteria → γ-Proteobacteria
2- α-Proteobacteria → Red and brown algae
3- γ-Proteobacteria → α-Proteobacteria
4- γ-Proteobacteria → β-Proteobacteria
[Figure: species tree of the same taxa with eight numbered HGTs (1 to 8) mapped onto it. Groups shown: Green Plastids, Glaucophyte plastid, Cyanobacteria, three groups of Proteobacteria, Red & Brown algae.]
Results for hgt-detection
HGTs of the rbcL gene - comparison
Hypotheses by Delwiche and Palmer (1996)
1- Cyanobacteria → γ-Proteobacteria
2- α-Proteobacteria → Red and brown algae
3- γ-Proteobacteria → α-Proteobacteria
4- γ-Proteobacteria → β-Proteobacteria
Solution
1. Cyanobacteria → -Proteobacteria
2. -Proteobacteria → -Proteobacteria
3. -Proteobacteria → -Proteobacteria
4. -Proteobacteria → Cyanobacteria
5. -Proteobacteria → -Proteobacteria
6. β-Proteobacteria → -Proteobacteria
7. -Proteobacteria → Red and brown algae
8. -Proteobacteria → -Proteobacteria
Application example 2
Horizontal transfers of the protein rpl12e
Data taken from:
Matte-Tailliez O., Brochier C., Forterre P. & Philippe H. Archaeal phylogeny based on ribosomal proteins. (2002). Mol. Biol. Evol. 19, 631-639.
Rpl12e HGTs
[Figure: species tree and rpl12e gene tree for Pyrobaculum aerophilum, Aeropyrum pernix, Sulfolobus solfataricus, Pyrococcus furiosus, Pyrococcus abyssi, Pyrococcus horikoshii, Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, Methanosarcina barkeri, Haloarcula marismortui, Halobacterium sp., Thermoplasma acidophilum and Ferroplasma acidarmanus.]
Assumed HGTs of the rpl12e gene involved the clusters of Crenarchaeota and Thermoplasmatales (Matte-Tailliez, 2004)
[Figure: tree of the same 14 species with bootstrap values on its internal branches (56, 100, 100, 56, 51, 65, 81, 61, 79, 79, 74).]
Result for hgt-detection with bootstrap
[Figure: tree of the same 14 species with five HGTs, numbered 1 to 5, and their bootstrap support values (70%, 35%, 36.3%, 48.8%, 26.3%).]
HGT-Detection vs. LatTrans
• LatTrans, Hallett & Lagergren (2001)
  • Time complexity: $O(2^{4t} n^2)$
• HGT-Detection
  • Time complexity: $O(n^5)$
[Figure: two plots of the detection rate (%). First plot: detection rate versus number of leaves (10 to 100). Second plot: detection rate versus number of transfers (1 to 10).]
HGT-Detection vs LatTrans
[Figure: two plots. First plot: same scenario (%) versus number of leaves (10 to 100). Second plot: running time versus number of transfers (1 to 10).]
Software
T-REX: Tree and Reticulogram Reconstruction (1)
Downloadable from http://www.info.uqam.ca/~makarenv/trex.html
Author: Vladimir Makarenkov
Versions: Windows 9x/NT/2000/XP and Macintosh
With contributions from A. Boc, P. Casgrain, A. B. Diallo, O. Gascuel, A. Guénoche, P.-A. Landry, F.-J. Lapointe, B. Leclerc, and P. Legendre.
(1) Makarenkov, V. 2001. T-REX: reconstructing and visualizing phylogenetic trees and reticulation networks. Bioinformatics 17: 664-668.
T-Rex: Multiple scenario screenshot
[Figure: screenshot of the multiple scenario option in the T-Rex bioinformatics software.]
T-Rex Web infrastructure
[Figure: components shown: web server (www.trex.uqam.ca); Internet; compute server with 32 processors at 3.4 GHz; PHP 4.4.1; GeneBase UQAM (Oracle 10g); UQAM; UQAM bioinformatics researchers; other users.]
Future developments
• Partial transfer
• Maximum Likelihood model
• Maximum Parsimony model
• Decreasing the running time
Bibliography
• Boc, A. and Makarenkov, V. (2003), New Efficient Algorithm for Detection of Horizontal Gene Transfer
Events, Algorithms in Bioinformatics, G. Benson and R. Page (Eds.), 3rd Workshop on Algorithms in
Bioinformatics, Springer-Verlag, pp. 190-201.
• Delwiche, C.F., and J. D. Palmer (1996). Rampant Horizontal Transfer and Duplication of Rubisco
Genes in Eubacteria and Plastids. Mol. Biol. Evol. 13:873-882.
• Makarenkov,V. (2001), T-Rex: reconstructing and visualizing phylogenetic trees and reticulation
networks. Bioinformatics, 17, 664-668.
• Makarenkov, V., Boc, A., Delwiche, C.F. and Philippe, H. (2005), A novel approach for detecting
horizontal gene transfers: Modeling partial and complete gene transfer scenarios, submitted Mol. Biol.
Evol.
• Makarenkov, V., Boc, A. and Diallo A.B. (2004), Representing Lateral gene transfer in species
classification. Unique scenario, IFCS’2004 proceedings, Chicago.
• Matte-Tailliez O., Brochier C., Forterre P. & Philippe H. (2002). Archaeal phylogeny based on ribosomal
proteins. Mol. Biol. Evol. 19, 631-639.
• Robinson, D.F. and Foulds L.R. (1981), Comparison of phylogenetic trees, Mathematical Biosciences
53, 131-147.
• Woese, C. R., G. Olsen, M. Ibba, and D. Söll. 2000. Aminoacyl-tRNA synthetases, the genetic code,
and the evolutionary process. Microbiol. Mol. Biol. Rev. 64:202-236.