Detecting horizontal gene transfers using discrepancies in species and gene classifications
Alix Boc
Vladimir Makarenkov
Université du Québec à Montréal
Presentation summary
• Some words about phylogeny
• Network models in phylogenetic analysis
• What is a horizontal gene transfer (HGT)?
• Description of the new method
• Examples of application
• Future work
• T-Rex software
Reconstruction of a phylogenetic tree
DNA Sequences → Distance Matrix → Phylogenetic Tree
DNA Sequences:
A: CGTAAT
B: CGTACG
C: CGTCGA
D: ACT………
E: ………………
F: ………………
Distance Matrix:
  A B C D E F
A 0 2 3 5 5 4
B 2 0 3 5 5 4
C 3 3 0 4 4 3
D 5 5 4 0 2 3
E 5 5 4 2 0 3
F 4 4 3 3 3 0
[Figure: phylogenetic tree with leaves A, B, C, D, E, F]
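As a rough illustration of the step from sequences to a distance matrix, the sketch below counts mismatched positions between the three sequences given in full on this slide (A, B and C). The slide does not say which distance was used; the Hamming distance is an assumption here, chosen because it reproduces the matrix entries for these three taxa. The function name `hamming` is hypothetical.

```python
from itertools import combinations

def hamming(s, t):
    """Number of positions at which two aligned sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s, t))

# Only A, B and C are given in full on the slide; D, E and F are elided.
sequences = {"A": "CGTAAT", "B": "CGTACG", "C": "CGTCGA"}

distances = {(p, q): hamming(sequences[p], sequences[q])
             for p, q in combinations(sequences, 2)}
print(distances)
```

The three values obtained this way agree with the entries d(A,B) = 2, d(A,C) = 3 and d(B,C) = 3 of the matrix.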
Inferring phylogenetic trees
Four main approaches:
• Distance-based methods
  • UPGMA by Michener and Sokal (1957)
  • ADDTREE by Sattath and Tversky (1977)
  • Neighbor-joining (NJ) by Saitou and Nei (1987)
  • UNJ and BioNJ methods by Gascuel (1997)
  • Fitch by Felsenstein (1997)
  • Weighted least-squares MW by Makarenkov and Leclerc (1999)
• Maximum Parsimony (Camin and Sokal 1965; Farris 1970; Fitch 1971)
• Maximum Likelihood (Felsenstein 1981)
• Bayesian approach (Rannala and Yang 1996; Huelsenbeck and Ronquist 2001)
Phylogenetic mechanisms requiring a network representation
• Horizontal gene transfer (i.e. lateral gene transfer)
• Hybridization
• Homoplasy and gene convergence
• Gene duplication and gene loss
[Figure: two diagrams on nodes 1 to 5]
Software for building phylogenetic networks
• SplitsTree, Huson (1998)
• T-Rex, Makarenkov (2001)
• NeighborNet, Bryant and Moulton (2002)
Methods for detecting horizontal gene transfers
• Hein (1990) and Hein et al. (1995, 1996)
• von Haeseler and Churchill (1993)
• Page (1994); Page and Charleston (1998)
• Charleston (1998)
• Hallett and Lagergren (2001)
• Mirkin, Fenner, Galperin and Koonin (2003)
• V’yugin, Gelfand and Lyubetsky (2003)
• Boc and Makarenkov (2003); Makarenkov, Boc and Diallo (2004)
Three types of horizontal gene transfer
The new model
Basic ideas:
1) Reconcile the species and gene phylogenetic trees using either a topological (Robinson and Foulds topological distance) or a metric (least-squares) criterion
[Figure: Species Tree and Gene Tree, both on leaves A to F]
2) Incorporate necessary biological rules into the mathematical model
3) Keep the time complexity of the algorithm polynomial
Partial gene transfer versus complete transfer
[Figure: rooted trees on leaves A to F illustrating (a) Partial Transfer and (b) Complete Transfer, with numbered nodes]
Biological rules
Partial gene transfer. Incorporating biological rules.
Situations in which a new HGT branch (a,b) can affect the evolutionary distance between species i and j, and cannot affect the distance between i1 and j.
[Figure: rooted tree with leaves i, i1 and j, branches (x,y) and (z,w), and the HGT branch (a,b)]
Partial gene transfer. Incorporating biological rules (2).
Three cases in which the evolutionary distance between the species i and j is not affected by the addition of a new HGT branch (a,b).
[Figure: panels (a), (b) and (c), each a rooted tree with leaves i and j, branches (x,y) and (z,w), and the HGT branch (a,b)]
No HGT can be considered when the affected branches are located on the same lineage.
Partial gene transfer. Incorporating biological rules (3).
[Figure: rooted tree with Lineage 1 and Lineage 2 and two transfers, LGT1 and LGT2]
No HGT can be considered when two HGTs affecting a pair of lineages intersect as shown.
Partial gene transfer. Incorporating biological rules (4).
[Figure: panels (a), (b), (c) and (d), each showing leaves i and j with the HGT branches (a,b) and (a1,b1)]
• Cases (a) and (b): the path between the leaves i and j is allowed to go through both HGT branches (a,b) and (a1,b1).
• Cases (c) and (d): the path between the leaves i and j is not allowed to go through both HGT branches (a,b) and (a1,b1).
Partial gene transfer. Incorporating biological rules (5).
Sub-Tree constraint
[Figure: species tree T and gene tree T1, each showing gene sub-tree 1 and gene sub-tree 2, branches (x,y) and (z,w), and the HGT branch (a,b)]
Timing constraint: the transfer between the branches (z,w) and (x,y) of the species tree T can be allowed if and only if the cluster regrouping both affected sub-trees is present in the gene tree T1. Here and further in the article a single branch is depicted by a plain line and a path is depicted by a wavy line.
• To resolve the topological conflicts between T and T1 that are due to transfers between single species or their close ancestors.
• To identify the transfers that have occurred deeper in the phylogeny.
Optimization
Optimization problem : Least-squares
The least-squares loss function to be minimized with an unknown length l of the HGT branch (a,b):
$$Q(ab,l) = \sum_{dist(i,j) > l} \left( \min\{ d(i,a) + d(j,b);\ d(j,a) + d(i,b) \} + l - \delta(i,j) \right)^2 + \sum_{dist(i,j) \le l} \left( d(i,j) - \delta(i,j) \right)^2 \rightarrow \min$$
d(i,j) - the minimum path-length distance between the leaves (i.e. taxa) i and j in the tree T
δ(i,j) - the given dissimilarity value between i and j
dist(i,j) = d(i,j) – Min { d(i,a) + d(j,b); d(j,a) + d(i,b) }
[Figure: rooted tree with leaves i and j, branches (x,y) and (z,w), and the HGT branch (a,b)]
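The loss can be evaluated directly for a fixed branch length l. In the minimal sketch below, each pair of leaves contributes the squared difference between its dissimilarity δ(i,j) and either the detour through (a,b), when dist(i,j) > l, or the unchanged tree distance d(i,j). The function name `ls_loss`, the argument layout (`d_a[i]` and `d_b[i]` hold d(i,a) and d(i,b)) and the toy numbers are assumptions made for illustration; they do not come from the presentation.

```python
from itertools import combinations

def ls_loss(d, delta, d_a, d_b, l):
    """Q(ab, l) for a candidate HGT branch (a,b) of length l.

    d[i][j]        : path-length distance between leaves i and j in T
    delta[i][j]    : given dissimilarity between i and j
    d_a[i], d_b[i] : tree distances from leaf i to the end points a and b
    """
    total = 0.0
    for i, j in combinations(range(len(d)), 2):
        detour = min(d_a[i] + d_b[j], d_a[j] + d_b[i])
        dist = d[i][j] - detour
        if dist > l:
            # the path through (a,b) is shorter than the tree path
            total += (detour + l - delta[i][j]) ** 2
        else:
            # the new branch does not change the distance between i and j
            total += (d[i][j] - delta[i][j]) ** 2
    return total

# Toy data (not from the slides): two leaves 10 apart in T, dissimilarity 4.
d = [[0, 10], [10, 0]]
delta = [[0, 4], [4, 0]]
d_a = [1, 9]   # leaf 0 is close to a
d_b = [9, 1]   # leaf 1 is close to b
short = ls_loss(d, delta, d_a, d_b, l=2)   # branch short enough to be used
long_ = ls_loss(d, delta, d_a, d_b, l=9)   # branch too long, distance unchanged
print(short, long_)
```

In the toy case the detour has length 2, so a branch of length 2 fits the dissimilarity exactly, while a branch of length 9 leaves the original distance of 10 in place.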
Optimization problem : Robinson and Foulds topological distance
The topological distance of Robinson and Foulds (1981)
between two phylogenetic trees is equal to the minimum
number of elementary operations consisting of merging or
splitting vertices necessary to transform one tree into another.
[Figure: trees T and T1 on leaves A, B, C, D, E]
Robinson and Foulds topological distance
Robinson and Foulds distance between T and T1 is 2.
The HGT minimizing the Robinson and Foulds topological distance between the species and gene phylogenetic trees can be considered as the best candidate to reconcile the species and gene phylogenies.
[Figure: trees T and T1 on leaves A, B, C, D, E, with an intermediate tree between them]
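A common way to compute the Robinson and Foulds distance is to count the bipartitions (splits) of the leaf set that occur in exactly one of the two trees, which is equivalent to the merge/split definition. The sketch below does this for trees written as nested tuples. The encoding, the function names and the two example trees are assumptions made for illustration; the example pair is hypothetical and is not claimed to match the trees drawn on the slide.

```python
def splits(tree):
    """Non-trivial bipartitions of a tree given as nested tuples of leaf names."""
    def leaf_set(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaf_set(child) for child in node))

    all_leaves = leaf_set(tree)
    ref = min(all_leaves)          # reference leaf used to normalise each split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # store the side of the bipartition that does not contain ref
        side = below if ref not in below else all_leaves - below
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found

def robinson_foulds(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))

# Hypothetical example: the two trees differ only in where C attaches.
T = (("A", "B"), ("C", ("D", "E")))
T1 = (("A", "B"), ("D", ("C", "E")))
rf = robinson_foulds(T, T1)
print(rf)
```

Here T has the split {D,E} and T1 has {C,E} instead, so the two trees differ by one split each and the distance is 2.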
Algorithm
Input file for our program
Set X of Taxa = {A,B,C,D,E,F}
Distance Matrix for the species tree
6
A 0 2 3 5 5 4
B 2 0 3 5 5 4
C 3 3 0 4 4 3
D 5 5 4 0 2 3
E 5 5 4 2 0 3
F 4 4 3 3 3 0
Distance Matrix for the gene tree
6
A 0 4 4 2 4 4
B 4 0 4 4 2 4
C 4 4 0 4 4 2
D 2 4 4 0 4 4
E 4 2 4 4 0 4
F 4 4 2 4 4 0
Program options
• Optimization criterion : Least-Squares or Robinson and Foulds distance.
• Type of scenario : Unique or Multiple.
• Maximum number of HGTs.
• Position of the root.
Algorithm : unique scenario
Begin
  Reconstruct the species tree T
  Re-estimate the length of each branch in T
  While Optimization criterion > 0 loop
    Test all possible HGTs
    Add the best HGT
    Re-estimate the length of each branch in T
    Compute the value of the optimization criterion
  End Loop
End
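The loop of the unique scenario is a greedy search: at each pass the single best HGT is added and the criterion is recomputed. The sketch below shows only this control flow. The callables `candidates`, `apply_hgt` and `criterion` are hypothetical placeholders for the tree operations, the `max_hgts` cap reflects the "Maximum number of HGTs" program option, and the integer "tree" in the usage lines is a toy stand-in, not a phylogeny.

```python
def unique_scenario(tree, candidates, apply_hgt, criterion, max_hgts):
    """Greedy loop: add the best HGT until the criterion reaches 0.

    candidates(tree)   -> iterable of possible HGTs for the current tree
    apply_hgt(tree, h) -> new tree with HGT h added (branch lengths re-estimated)
    criterion(tree)    -> RF distance or LS coefficient against the gene tree
    """
    accepted = []
    score = criterion(tree)
    while score > 0 and len(accepted) < max_hgts:
        best = min(candidates(tree),
                   key=lambda h: criterion(apply_hgt(tree, h)),
                   default=None)
        if best is None:          # no admissible HGT left
            break
        tree = apply_hgt(tree, best)
        score = criterion(tree)
        accepted.append((best, score))
    return tree, accepted

# Toy stand-in: the "tree" is an integer, an "HGT" subtracts 1, 2 or 3,
# and the criterion is the absolute value still to be removed.
final, steps = unique_scenario(
    5,
    candidates=lambda t: [1, 2, 3],
    apply_hgt=lambda t, h: t - h,
    criterion=abs,
    max_hgts=10,
)
print(final, steps)
```

In the real program the expensive part is inside `candidates` and `apply_hgt`, where the biological rules filter the pairs of branches and the branch lengths are re-estimated.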
Algorithm : multiple scenario
Begin
  Reconstruct the species tree T
  Re-estimate the length of each branch in T
  Test all connections between pairs of branches
  Establish a list of HGTs ordered according to the optimization criterion
End
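Unlike the unique scenario, the multiple scenario never modifies the species tree: every candidate connection is scored against the same starting tree and the results are sorted. A minimal sketch of that ranking follows; the callables `candidates`, `apply_hgt` and `criterion` and the toy integer "tree" are hypothetical and stand in for the real tree operations.

```python
def rank_hgts(tree, candidates, apply_hgt, criterion):
    """Score every candidate HGT on the unmodified tree and sort, best first."""
    scored = [(criterion(apply_hgt(tree, h)), h) for h in candidates(tree)]
    return sorted(scored, key=lambda pair: pair[0])

# Toy stand-in: the "tree" is an integer and an "HGT" subtracts a value from it.
ranked = rank_hgts(
    5,
    candidates=lambda t: [1, 2, 3, 7],
    apply_hgt=lambda t, h: t - h,
    criterion=abs,
)
print(ranked)
```

Because the sort is stable, candidates with equal scores keep the order in which they were generated.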
Algorithm : Step 1
• Reconstruction of the species tree T with Neighbor-Joining
• Set X of n taxa
• Binary tree: internal nodes are all of degree 3, 2n-3 branches
• T is explicitly rooted
[Figure: species tree T with leaves A to F]
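To make the Neighbor-Joining step concrete, the sketch below computes the standard NJ selection criterion, Q(i,j) = (n-2)·d(i,j) - r_i - r_j with r_i the sum of row i, on the six-taxon species distance matrix given earlier in the deck, and reports the first pair that would be joined. It covers only this first selection step, not the full tree reconstruction; the function name `first_nj_pair` is hypothetical, and ties are broken by taking the first pair in enumeration order, which is an assumption.

```python
from itertools import combinations

taxa = ["A", "B", "C", "D", "E", "F"]
matrix = [
    [0, 2, 3, 5, 5, 4],
    [2, 0, 3, 5, 5, 4],
    [3, 3, 0, 4, 4, 3],
    [5, 5, 4, 0, 2, 3],
    [5, 5, 4, 2, 0, 3],
    [4, 4, 3, 3, 3, 0],
]

def first_nj_pair(names, dist):
    """Pair minimising the NJ criterion Q, together with its Q value."""
    n = len(names)
    row_sums = [sum(row) for row in dist]

    def q(pair):
        i, j = pair
        return (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]

    i, j = min(combinations(range(n), 2), key=q)
    return names[i], names[j], q((i, j))

pair = first_nj_pair(taxa, matrix)
branches = 2 * len(taxa) - 3   # branch count of an unrooted binary tree
print(pair, branches)
```

On this matrix the pairs (A,B) and (D,E) tie at Q = -30, and the count 2n-3 = 9 agrees with the nine edges listed in the program output.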
Algorithm : Step 2
• Comparing the gene tree T1 and the species tree T
Criterion 1 :
RF - Robinson and Foulds distance between T and T1
If RF == 0 then
  There are no HGTs
Else
  Step 3 (next slide)
End if
Criterion 2 :
Re-estimate the length of each branch of the species tree T according to the distances in T1.
LS - Least-Squares coefficient between distances in T and T1
If LS == 0 then
  There are no HGTs
Else
  Step 3 (next slide)
End if
[Figure: Species Tree T and Gene Tree T1 on leaves A to F]
Algorithm : Step 3
Multiple Scenario
• Test all connections between pairs of branches.
• Re-estimate the length of each branch in T according to the gene distance matrix.
• Establish a list of HGTs ordered according to the least-squares coefficient or the Robinson-Foulds distance.
[Figure: species tree T with leaves A to F]
Algorithm : Step 3
Unique Scenario
1. The best HGT found is added to the species tree.
2. The length of each branch is re-estimated according to the gene tree.
3. The RF distance or the LS coefficient is computed.
[Figure: Species Tree T and Gene Tree T1 on leaves A to F, with four panels: Species Tree with upcoming HGT1; Species Tree + HGT1 with upcoming HGT2; Species Tree + HGT2 with upcoming HGT3; Species Tree + HGT3 (Gene Tree)]
Output
Scenario type: Unique
List of the edges, with their lengths, of the species tree built with NJ
1 7---B 1.800000
2 8---C 1.800000
3 9---D 1.800000
4 10---9 0.000020
5 9---E 1.800000
6 10---F 1.800000
7 7---A 1.800000
8 7---8 0.000020
9 10---8 0.000020
The least-squares criterion LS for the species tree whose branches are evaluated according to the gene tree is: 9.600160
The root is located on the branch 8--10
===================== HGT #1 ======================
Leading from the branch 7--B to the branch 10--9
LS = 5.333387
RF = 4
===================== HGT #2 ======================
Leading from the branch A--7 to the branch 9--D
LS = 0.000000
RF = 0
Examples
Horizontal transfer of the Rubisco Large subunit gene
Delwiche, C.F., and J. D. Palmer. 1996. Rampant
Horizontal Transfer and Duplication of Rubisco Genes in
Eubacteria and Plastids. Mol. Biol. Evol. 13:873-882.
Application example 1
rbcL Gene Phylogeny
[Figure: rbcL gene phylogeny. Taxa: Mn oxidizing bacterium (S|85-9a1), Rhodobacter sphaeroides I, Xanthobacter, Alcaligenes 17707 chromosomal, Alcaligenes H16 plasmid, Alcaligenes H16 chromosomal, Cyanidium, Porphyridium, Antithamnion, Ahnfeltia, Cryptomonas, Ectocarpus, Olistodiscus, Cylindrotheca, Prochlorococcus, Hydrogenovibrio L2, Chromatium L, Pseudomonas, Thiobacillus ferrooxidans fe1, Nitrobacter, Thiobacillus ferr. 19859, Thiobacillus denitrificans I, Endosymbiont, Anabaena, Synechococcus, Anacystis, Prochlorothrix, Synechocystis, Prochloron, Cyanophora, Chlorella, Bryopsis, Chlamydomonas, Euglena, Pyramimonas, Coleochaete, Marchantia, Pseudotsuga, Nicotiana, Oryza. Group labels: Proteobacteria, Cyanobacteria, Red and Brown Plastids, Green Plastids, Glaucophyte Plastid. Rubisco types: Green Type (FORM I), Red Type (FORM I)]
Delwiche and Palmer (1996) - hypotheses of HGTs
1- Cyanobacteria → γ-Proteobacteria
2- α-Proteobacteria → Red and brown algae
3- γ-Proteobacteria → α-Proteobacteria
4- γ-Proteobacteria → β-Proteobacteria
HGTs of the rbcL gene
[Figure: tree with the detected HGTs numbered 1 to 8. Taxa, in the groups shown:
Prochloron
Cyanophora
Cyanidium, Porphyridium, Ahnfeltia, Antithamnion, Cryptomonas, Ectocarpus, Olistodiscus, Cylindrotheca
Chlorella, Bryopsis, Chlamydomonas, Euglena, Pyramimonas, Coleochaete, Marchantia, Pseudotsuga, Nicotiana, Oryza
Synechocystis, Prochlorothrix, Anacystis, Synechococcus, Anabaena, Prochlorococcus
Nitrobacter, Mn oxidizing bacterium, Xanthobacter, Rhodobacter sphaeroides I
Hydrogenovibrio L2, Chromatium L
Thiobacillus ferr. 19859, Thiobacillus fe1
Pseudomonas, Endosymbiont, Thiobacillus denitrificans I, Alcaligenes 17707 chromosomal, Alcaligenes H16 chromosomal, Alcaligenes H16 plasmid
Group labels: Green Plastids, Glaucophyte plastid, Cyanobacteria, three Proteobacteria groups, Red & Brown algae]
HGTs of the rbcL gene - comparison
Hypotheses by Delwiche and Palmer (1996)
1- Cyanobacteria → γ-Proteobacteria
2- α-Proteobacteria → Red and brown algae
3- γ-Proteobacteria → α-Proteobacteria
4- γ-Proteobacteria → β-Proteobacteria
Solution
1. -Proteobacteria → β-Proteobacteria
2. α-Proteobacteria → Red and brown algae
3. -Proteobacteria → γ-Proteobacteria
4. -Proteobacteria → -Proteobacteria
5. γ-Proteobacteria → Cyanobacteria
6. β-Proteobacteria → γ-Proteobacteria
7. γ-Proteobacteria → β-Proteobacteria
8. Cyanobacteria → γ-Proteobacteria
Application example 2
Horizontal transfers of the protein rpl12e
Data taken from:
Matte-Tailliez O., Brochier C., Forterre P. & Philippe H. Archaeal phylogeny based on ribosomal proteins. (2002). Mol. Biol. Evol. 19, 631-639.
Rpl12e HGTs
[Figure: Species tree and Rpl12e gene tree, both on the same 14 taxa: Pyrobaculum aerophilum, Aeropyrum pernix, Sulfolobus solfataricus, Pyrococcus furiosus, Pyrococcus abyssi, Pyrococcus horikoshii, Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, Methanosarcina barkeri, Haloarcula marismortui, Halobacterium sp., Thermoplasma acidophilum, Ferroplasma acidarmanus]
Assumed HGTs of the rpl12e gene involved the clusters of Crenarchaeota and Thermoplasmatales (Matte-Tailliez, 2004)
Reconciliation scenario
[Figure: reconciliation scenario on the 14 archaeal taxa of the species tree, with numeric branch labels and five HGTs numbered 1 to 5; the percentages shown in the figure are 74%, 69%, 60%, 60% and 55%]
Application example 3
Horizontal transfers of the PheRS synthetase
Data taken from:
Woese, C. R., G. Olsen, M. Ibba, and D. Söll. 2000. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol. Mol. Biol. Rev. 64:202-236.
PheRS synthetase
[Figure: PheRS synthetase phylogeny with numeric branch labels and a scale bar of 20 changes; groups labeled Bacteria, Archaea and Eukarya. Taxa: M. thermoautotrophicum, A. fulgidus, H. sapiens, S. cerevisiae, M. jannaschii, S. solfataricus, P. aerophilum, P. horikoshii, T. pallidum, B. burgdorferi, M. pneumoniae, M. genitalium, B. subtilis, S. pyogenes, E. faecalis, R. capsulatus, T. maritima, M. tuberculosis, C. acetobutylicum, D. radiodurans, T. thermophilus, C. tepidum, C. trachomatis, P. gingivalis, A. aeolicus, Synechocystis sp. PCC 6803, N. gonorrhoeae, P. aeruginosa, R. prowazekii, H. pylori, H. influenzae, E. coli]
Reconciliation scenario
[Figure: reconciliation scenario on the 32 taxa of the PheRS phylogeny, with five HGTs numbered 1 to 5; the percentages shown in the figure are 62%, 88%, 65%, 85% and 60%]
Software
T-REX: Tree and Reticulogram Reconstruction [1]
Downloadable from http://www.info.uqam.ca/~makarenv/trex.html
Author: Vladimir Makarenkov
Versions: Windows 9x/NT/2000/XP and Macintosh
With contributions from A. Boc, P. Casgrain, A. B. Diallo, O. Gascuel, A. Guénoche, P.-A. Landry, F.-J. Lapointe, B. Leclerc, and P. Legendre.
[1] Makarenkov, V. 2001. T-REX: reconstructing and visualizing phylogenetic trees and reticulation networks. Bioinformatics 17: 664-668.
T-Rex : Multiple scenario screenshot
[Figure: screenshot of the T-Rex bioinformatics software]
T-Rex Web infrastructure
[Figure: infrastructure diagram. Components shown: Web server (www.trex.uqam.ca); PHP 4.4.1; compute server with 32 processors at 3.4 GHz; GeneBase UQAM, Oracle 10g; UQAM; UQAM bioinformatics researchers; other users; Internet]
Future developments
• Maximum Likelihood model
• Maximum Parsimony model
• Decreasing the running time
Bibliography
• Boc, A. and Makarenkov, V. (2003), New Efficient Algorithm for Detection of Horizontal Gene Transfer Events, Algorithms in Bioinformatics, G. Benson and R. Page (Eds.), 3rd Workshop on Algorithms in Bioinformatics, Springer-Verlag, pp. 190-201.
• Delwiche, C.F. and Palmer, J.D. (1996), Rampant Horizontal Transfer and Duplication of Rubisco Genes in Eubacteria and Plastids, Mol. Biol. Evol. 13, 873-882.
• Makarenkov, V. (2001), T-Rex: reconstructing and visualizing phylogenetic trees and reticulation networks, Bioinformatics 17, 664-668.
• Makarenkov, V., Boc, A., Delwiche, C.F. and Philippe, H. (2005), A novel approach for detecting horizontal gene transfers: Modeling partial and complete gene transfer scenarios, submitted to Mol. Biol. Evol.
• Makarenkov, V., Boc, A. and Diallo, A.B. (2004), Representing Lateral gene transfer in species classification. Unique scenario, IFCS’2004 proceedings, Chicago.
• Matte-Tailliez, O., Brochier, C., Forterre, P. and Philippe, H. (2002), Archaeal phylogeny based on ribosomal proteins, Mol. Biol. Evol. 19, 631-639.
• Robinson, D.F. and Foulds, L.R. (1981), Comparison of phylogenetic trees, Mathematical Biosciences 53, 131-147.
• Woese, C.R., Olsen, G., Ibba, M. and Söll, D. (2000), Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process, Microbiol. Mol. Biol. Rev. 64, 202-236.