Detecting horizontal gene transfers using discrepancies in species and gene classifications Alix Boc Vladimir Makarenkov Université du Québec à Montréal

Page 1: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Detecting horizontal gene transfers using discrepancies in species and

gene classifications

Alix Boc

Vladimir Makarenkov

Université du Québec à Montréal

Page 2: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Presentation summary

• Some words about phylogeny

• Network models in phylogenetic analysis

• What is a horizontal gene transfer (HGT)?

• Description of the new method

• Examples of application

• Future work

• T-Rex software

Page 3: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Reconstruction of a phylogenetic tree

DNA Sequences

A: CGTAAT
B: CGTACG
C: CGTCGA
D: ACT…
E: …
F: …

Distance Matrix

    A  B  C  D  E  F
A   0  2  3  5  5  4
B   2  0  3  5  5  4
C   3  3  0  4  4  3
D   5  5  4  0  2  3
E   5  5  4  2  0  3
F   4  4  3  3  3  0

Phylogenetic Tree

[Figure: phylogenetic tree with the leaves A, B, C, D, E and F]
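A minimal sketch of how the first entries of the distance matrix can be obtained from the sequences. It assumes the distance is the number of mismatched positions (Hamming distance); the slide does not name the measure, but this choice reproduces the A, B and C entries shown. Only the three complete sequences are used, and the function name is hypothetical.

```python
def hamming(seq1, seq2):
    """Count the positions at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq1, seq2))

# The three sequences that are given in full on the slide.
sequences = {"A": "CGTAAT", "B": "CGTACG", "C": "CGTCGA"}

# Pairwise distances, stored as a dict of dicts.
distances = {
    a: {b: hamming(sa, sb) for b, sb in sequences.items()}
    for a, sa in sequences.items()
}
```

With these inputs, `distances["A"]["B"]` is 2 and `distances["A"]["C"]` is 3, in agreement with the matrix above.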

Page 4: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Inferring phylogenetic trees

Four main approaches:

• Distance-based methods
    - UPGMA by Michener and Sokal (1957)
    - ADDTREE by Sattath and Tversky (1977)
    - Neighbor-joining (NJ) by Saitou and Nei (1987)
    - UNJ and BioNJ methods by Gascuel (1997)
    - Fitch by Felsenstein (1997)
    - Weighted least-squares MW by Makarenkov and Leclerc (1999)

• Maximum Parsimony (Camin and Sokal 1965; Farris 1970; Fitch 1971)

• Maximum Likelihood (Felsenstein 1981)

• Bayesian approach (Rannala and Yang 1996; Huelsenbeck and Ronquist 2001)

Page 5: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Phylogenetic mechanisms requiring a network representation

• Horizontal gene transfer (i.e. lateral gene transfer)

• Hybridization

• Homoplasy and gene convergence

• Gene duplication and gene loss

[Figure: two diagrams with nodes labeled 1 to 5]

Page 6: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Software for building phylogenetic networks

• SplitsTree, Huson (1998)

• T-Rex, Makarenkov (2001)

• NeighborNet, Bryant and Moulton (2002)

Page 7: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Methods for detecting horizontal gene transfers

• Hein (1990) and Hein et al. (1995, 1996)

• von Haeseler and Churchill (1993)

• Page (1994); Page and Charleston (1998)

• Charleston (1998)

• Hallett and Lagergren (2001)

• Mirkin, Fenner, Galperin and Koonin (2003)

• V’yugin, Gelfand and Lyubetsky (2003)

• Boc and Makarenkov (2003); Makarenkov, Boc and Diallo (2004)

Page 8: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Three types of horizontal gene transfer

Page 9: Detecting horizontal gene transfers using discrepancies in species and gene classifications

The new model

Basic ideas:

1) Reconcile the species and gene phylogenetic trees using either a topological (Robinson and Foulds topological distance) or a metric (least-squares) criterion

[Figure: Species Tree and Gene Tree, both on the leaves A, B, C, D, E and F]

2) Incorporate necessary biological rules into the mathematical model

3) Keep the time complexity of the algorithm polynomial

Page 10: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Partial gene transfer versus complete transfer

[Figure: rooted trees on the leaves A, B, C, D, E and F with numbered internal branches, illustrating (a) a Partial Transfer and (b) a Complete Transfer]

Page 11: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Biological rules

Page 12: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Partial gene transfer. Incorporating biological rules.

Situations when a new HGT branch (a,b) can affect the evolutionary distance between species i and j, and cannot affect the distance between i1 and j.

[Figure: rooted tree showing the leaves i, i1 and j, the nodes x, y, z and w, and the HGT branch (a,b)]

Page 13: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Partial gene transfer. Incorporating biological rules (2).

Three cases when the evolutionary distance between the species i and j is not affected by the addition of a new HGT branch (a,b).

[Figure: three rooted trees, (a), (b) and (c), each showing the leaves i and j, the nodes x, y, z and w, and the HGT branch (a,b)]

Page 14: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Partial gene transfer. Incorporating biological rules (3).

No HGTs can be considered when affected branches are located on the same lineage

[Figure: rooted tree with an HGT between two branches of the same lineage]
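The same-lineage rule can be sketched as an ancestor test on a rooted tree. This is an illustration under stated assumptions, not the authors' implementation: the tree is stored as a hypothetical child-to-parent map, and each branch is identified by its lower (child) endpoint.

```python
def ancestors(node, parent):
    """Return the nodes on the path from `node` up to the root (node excluded)."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def on_same_lineage(branch_a, branch_b, parent):
    """True if one branch lies on the root path of the other.

    A branch is named after its child endpoint, so the test reduces to
    checking whether one endpoint is an ancestor of the other.
    """
    return (
        branch_a == branch_b
        or branch_a in ancestors(branch_b, parent)
        or branch_b in ancestors(branch_a, parent)
    )

# Hypothetical rooted tree: ((A,B)n1, C)n2 and D hang from the root.
parent = {"A": "n1", "B": "n1", "n1": "n2", "C": "n2", "n2": "root", "D": "root"}

# A transfer between the branches above A and above n1 would be rejected,
# while a transfer between the branches above A and above B is a candidate.
rejected = on_same_lineage("A", "n1", parent)
allowed = not on_same_lineage("A", "B", parent)
```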

Page 15: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Partial gene transfer. Incorporating biological rules (4).

No HGT can be considered when two HGTs affecting a pair of lineages intersect as shown

[Figure: rooted tree with Lineage 1 and Lineage 2 and two intersecting transfers, LGT1 and LGT2]

Page 16: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Partial gene transfer. Incorporating biological rules (5).

[Figure: four configurations, (a) to (d), of two HGT branches (a,b) and (a1,b1) relative to the leaves i and j]

• Cases (a) and (b): the path between the leaves i and j is allowed to go through both HGT branches (a,b) and (a1,b1).

• Cases (c) and (d): the path between the leaves i and j is not allowed to go through both HGT branches (a,b) and (a1,b1).

Page 17: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Sub-Tree constraint

[Figure: species tree T with the branches (x,y) and (z,w) joined by the HGT branch (a,b), and gene tree T1; Gene sub-tree 1 and Gene sub-tree 2 are marked in both trees]

Timing constraint: the transfer between the branches (z,w) and (x,y) of the species tree T is allowed if and only if the cluster grouping both affected sub-trees is present in the gene tree T1. Here and further on, a single branch is depicted by a plain line and a path is depicted by a wavy line.

• To resolve the topological conflicts between T and T1 that are due to the transfers between single species or their close ancestors.

• To identify the transfers that have occurred deeper in the phylogeny.

Page 18: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Optimization

Page 19: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Optimization problem : Least-squares

The least-squares loss function to be minimized with an unknown length l of the HGT branch (a,b):

\[
Q(ab,l) = \sum_{dist(i,j) > l} \bigl( \min\{ d(i,a) + d(j,b);\ d(j,a) + d(i,b) \} + l - \delta(i,j) \bigr)^2
        + \sum_{dist(i,j) \le l} \bigl( d(i,j) - \delta(i,j) \bigr)^2 \;\to\; \min
\]

d(i,j) - the minimum path-length distance between the leaves (i.e. taxa) i and j in the tree T

δ(i,j) - the given dissimilarity value between i and j

dist(i,j) = d(i,j) − Min { d(i,a) + d(j,b); d(j,a) + d(i,b) }

[Figure: rooted tree showing the leaves i and j, the nodes x, y, z and w, and the HGT branch (a,b)]
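A minimal sketch of how Q(ab,l) can be evaluated for one candidate HGT branch and one trial length l. The data layout is an assumption made for illustration: `d` and `delta` are dicts of dicts over the leaves, `d_a[i]` and `d_b[i]` hold the tree distances from leaf i to the attachment points a and b, and the sum runs over unordered pairs of leaves. All numbers below are invented.

```python
def ls_criterion(leaves, d, delta, d_a, d_b, l):
    """Least-squares loss for an HGT branch (a,b) of length l."""
    q = 0.0
    for pos, i in enumerate(leaves):
        for j in leaves[pos + 1:]:
            # Shortest detour between i and j through a and b (HGT branch excluded).
            via = min(d_a[i] + d_b[j], d_a[j] + d_b[i])
            dist = d[i][j] - via
            # The HGT branch shortens the path only when dist(i,j) > l.
            fitted = via + l if dist > l else d[i][j]
            q += (fitted - delta[i][j]) ** 2
    return q

leaves = ["i", "j", "k"]
d = {"i": {"j": 4, "k": 2}, "j": {"i": 4, "k": 4}, "k": {"i": 2, "j": 4}}
delta = {"i": {"j": 2, "k": 2}, "j": {"i": 2, "k": 3}, "k": {"i": 2, "j": 3}}
d_a = {"i": 0.5, "j": 3.5, "k": 1.5}
d_b = {"i": 3.5, "j": 0.5, "k": 3.5}

q_short = ls_criterion(leaves, d, delta, d_a, d_b, l=1)   # branch short enough to be used
q_long = ls_criterion(leaves, d, delta, d_a, d_b, l=10)   # branch too long to change any path
```

In this toy setting a branch of length 1 fits the dissimilarities exactly (Q = 0), whereas a branch of length 10 leaves all tree distances unchanged (Q = 5).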

Page 20: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Optimization problem : Robinson and Foulds topological distance

The topological distance of Robinson and Foulds (1981) between two phylogenetic trees is equal to the minimum number of elementary operations consisting of merging or splitting vertices necessary to transform one tree into another.

[Figure: two trees, T and T1, on the leaves A, B, C, D and E]

Page 21: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Robinson and Foulds topological distance

Robinson and Foulds distance between T and T1 is 2.

The HGT minimizing the Robinson and Foulds topological distance between the species and gene phylogenetic trees can be considered as the best candidate to reconcile the species and gene phylogenies.

[Figure: trees T and T1 on the leaves A, B, C, D and E, with an intermediate tree between them]
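A hedged sketch of one common way to compute the Robinson and Foulds distance: count the non-trivial bipartitions (splits) found in exactly one of the two trees. Trees are written as nested tuples, and the two example trees are illustrative choices on the leaves A to E; they are not claimed to be the exact trees drawn on the slide.

```python
def leaf_set(tree):
    """Collect the leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def splits(tree):
    """Return the non-trivial bipartitions of an unrooted tree."""
    everything = leaf_set(tree)
    reference = min(everything)          # fixed leaf used to normalize each split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        if 1 < len(below) < len(everything) - 1:
            # Store the side of the split that does not contain the reference leaf.
            found.add(below if reference not in below else everything - below)
        return below

    visit(tree)
    return found

def robinson_foulds(tree1, tree2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree1) ^ splits(tree2))

T = (("A", "B"), ("C", "D"), "E")
T1 = (("A", "B"), ("D", "E"), "C")
rf = robinson_foulds(T, T1)
```

For these two example trees the distance is 2: the split {C,D} occurs only in `T` and the split {D,E} only in `T1`.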

Page 22: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Algorithm

Page 23: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Input file for our program

Set X of Taxa = {A,B,C,D,E,F}

Distance Matrix for the species tree

    6
    A 0 2 3 5 5 4
    B 2 0 3 5 5 4
    C 3 3 0 4 4 3
    D 5 5 4 0 2 3
    E 5 5 4 2 0 3
    F 4 4 3 3 3 0

Distance Matrix for the gene tree

    6
    A 0 4 4 2 4 4
    B 4 0 4 4 2 4
    C 4 4 0 4 4 2
    D 2 4 4 0 4 4
    E 4 2 4 4 0 4
    F 4 4 2 4 4 0
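A small sketch of a reader for this input. It assumes the layout suggested by the slide: each matrix starts with the number of taxa n, followed by n rows made of a label and n distances, and the species matrix comes before the gene matrix. The function name is hypothetical and this is not the parser used by the program.

```python
def parse_matrices(text):
    """Read consecutive labelled square matrices from whitespace-separated text."""
    tokens = text.split()
    pos = 0
    matrices = []
    while pos < len(tokens):
        n = int(tokens[pos])
        pos += 1
        names, rows = [], []
        for _ in range(n):
            names.append(tokens[pos])
            rows.append([float(v) for v in tokens[pos + 1:pos + 1 + n]])
            pos += 1 + n
        matrices.append((names, rows))
    return matrices

INPUT = """
6
A 0 2 3 5 5 4
B 2 0 3 5 5 4
C 3 3 0 4 4 3
D 5 5 4 0 2 3
E 5 5 4 2 0 3
F 4 4 3 3 3 0
6
A 0 4 4 2 4 4
B 4 0 4 4 2 4
C 4 4 0 4 4 2
D 2 4 4 0 4 4
E 4 2 4 4 0 4
F 4 4 2 4 4 0
"""

(species_names, species_rows), (gene_names, gene_rows) = parse_matrices(INPUT)
```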

Page 24: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Program options

• Optimization criterion : Least-Squares or Robinson and Foulds distance.

• Type of scenario : Unique or Multiple.

• Maximum number of HGTs.

• Position of the root.

Page 25: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Algorithm : unique scenario

Begin

    Reconstruction of the species tree T
    Reestimate the length of each branch in T

    While Optimization criterion > 0 loop

        Test all possible HGTs
        Add the best HGT
        Reestimate the length of each branch in T
        Compute the value of the optimization criterion

    End Loop

End
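The greedy loop above can be sketched generically. Everything concrete in this block is a toy stand-in chosen so the example runs: a "tree" is just a tuple of leaves, an "HGT" is a swap of two positions, and the criterion counts positions that differ from a target order. Only the control flow mirrors the pseudocode; the `max_moves` cap plays the role of the "Maximum number of HGTs" option.

```python
from itertools import combinations

def unique_scenario(state, candidates, apply_move, criterion, max_moves):
    """Repeatedly apply the candidate that minimizes the criterion."""
    history = []
    score = criterion(state)
    while score > 0 and len(history) < max_moves:
        # Test all candidates and keep the one giving the lowest criterion.
        best = min(candidates(state), key=lambda m: criterion(apply_move(state, m)))
        state = apply_move(state, best)
        score = criterion(state)
        history.append((best, score))
    return state, history

# Toy stand-ins (hypothetical, for illustration only).
TARGET = ("A", "F", "E", "D", "C", "B")

def swap(order, move):
    i, j = move
    out = list(order)
    out[i], out[j] = out[j], out[i]
    return tuple(out)

def mismatches(order):
    return sum(x != y for x, y in zip(order, TARGET))

def all_swaps(order):
    return list(combinations(range(len(order)), 2))

final, history = unique_scenario(
    ("A", "B", "F", "C", "E", "D"), all_swaps, swap, mismatches, max_moves=10
)
```

In the real method the candidates would be pairs of branches of T that satisfy the biological rules, and the criterion would be the LS coefficient or the RF distance to the gene tree.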

Page 26: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Algorithm : multiple scenario

Begin

    Reconstruction of the species tree T
    Reestimate the length of each branch in T

    Test all connections between pairs of branches

    Establish a list of HGTs ordered according to the optimization criterion.

End
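A minimal sketch of the ranking step of the multiple scenario: every ordered pair of branches (source, destination) is scored and the list is sorted by the criterion. The branch names and the scores are invented placeholders; in practice the score would be the LS coefficient or RF distance obtained after adding the corresponding HGT.

```python
from itertools import permutations

def rank_transfers(branches, score):
    """Score every ordered pair of distinct branches and sort by the criterion."""
    scored = [(score(src, dst), src, dst) for src, dst in permutations(branches, 2)]
    return sorted(scored)

# Hypothetical branch identifiers and criterion values.
branches = ["b1", "b2", "b3"]
toy_scores = {("b1", "b3"): 0.5, ("b2", "b1"): 2.0}

def toy_score(src, dst):
    # Pairs without an entry get a large default value.
    return toy_scores.get((src, dst), 9.0)

ranking = rank_transfers(branches, toy_score)
best_score, best_src, best_dst = ranking[0]
```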

Page 27: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Algorithm : Step 1

• Reconstruction of the species tree T with Neighbor-Joining

• Set X of n taxa

• Binary tree: internal nodes are all of degree 3, 2n-3 branches

• T is explicitly rooted

[Figure: species tree T on the leaves A, B, C, D, E and F]
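For reference, a compact sketch of the standard Neighbor-Joining procedure applied to the species distance matrix of the input example. It is a textbook version, not the T-Rex implementation: internal node names (`n0`, `n1`, ...) are hypothetical, ties are broken by enumeration order, and the tree is returned unrooted as a list of (internal node, neighbor, length) branches.

```python
def neighbor_joining(names, dist):
    """Return the branches of an unrooted NJ tree as (node, neighbor, length)."""
    d = {a: dict(dist[a]) for a in names}        # working copy of the distances
    nodes = list(names)
    edges = []
    new_id = 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        pairs = [(a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]]
        # Join the pair minimizing (n-2)*d(i,j) - r(i) - r(j).
        i, j = min(pairs, key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"n{new_id}"
        new_id += 1
        len_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((u, i, len_i))
        edges.append((u, j, d[i][j] - len_i))
        # Distances from the new internal node to the remaining nodes.
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last three nodes to a central node.
    a, b, c = nodes
    u = f"n{new_id}"
    edges.append((u, a, 0.5 * (d[a][b] + d[a][c] - d[b][c])))
    edges.append((u, b, 0.5 * (d[a][b] + d[b][c] - d[a][c])))
    edges.append((u, c, 0.5 * (d[a][c] + d[b][c] - d[a][b])))
    return edges

names = list("ABCDEF")
rows = [
    [0, 2, 3, 5, 5, 4],
    [2, 0, 3, 5, 5, 4],
    [3, 3, 0, 4, 4, 3],
    [5, 5, 4, 0, 2, 3],
    [5, 5, 4, 2, 0, 3],
    [4, 4, 3, 3, 3, 0],
]
dist = {a: dict(zip(names, row)) for a, row in zip(names, rows)}
edges = neighbor_joining(names, dist)
attached_to = {child: node for node, child, _ in edges}
```

The result has 2n-3 = 9 branches for n = 6 taxa, with A and B attached to one internal node and D and E to another. Rooting the tree is a separate step that this sketch leaves out.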

Page 28: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Algorithm : Step 2

• Comparing the gene tree T1 and the species tree T

Criterion 1 :

RF - Robinson and Foulds distance between T and T1

If RF == 0 then
    There are no HGTs
Else
    Step 3 (next slide)
End if

Criterion 2 :

Reestimate the length of each branch of the species tree T according to the distances in T1.

LS - Least-Squares coefficient between distances in T and T1

If LS == 0 then
    There are no HGTs
Else
    Step 3 (next slide)
End if

[Figure: Species Tree T and Gene Tree T1, both on the leaves A, B, C, D, E and F]

Page 29: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Algorithm : Step 3

Multiple Scenario

• Test all connections between pairs of branches.

• Reestimate the length of each branch in T according to the gene distance matrix.

• Establish a list of HGTs ordered according to the least-squares coefficient or the Robinson-Foulds distance.

[Figure: species tree on the leaves A, B, C, D, E and F]

Page 30: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Algorithm : Step 3

Unique Scenario

1. The best HGT found is added to the species tree.
2. The length of each branch is reestimated according to the gene tree.
3. The RF distance or the LS coefficient is computed.

[Figure: Species Tree T and Gene Tree T1 on the leaves A, B, C, D, E and F, followed by four panels numbered 1 to 3 in sequence: Species Tree with upcoming HGT1; Species Tree + HGT1 with upcoming HGT2; Species Tree + HGT2 with upcoming HGT3; Species Tree + HGT3 (Gene Tree)]

Page 31: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Output

Scenario type: Unique

List of the edges of the species tree built with NJ, with their lengths

    1   7---B    1.800000
    2   8---C    1.800000
    3   9---D    1.800000
    4   10---9   0.000020
    5   9---E    1.800000
    6   10---F   1.800000
    7   7---A    1.800000
    8   7---8    0.000020
    9   10---8   0.000020

The least-squares criterion LS for the species tree whose branches are evaluated according to the gene tree is: 9.600160

The root is located on the branch 8--10

===================== HGT #1 ======================
Leading from the branch 7--B to the branch 10--9
LS = 5.333387
RF = 4

===================== HGT #2 ======================
Leading from the branch A--7 to the branch 9--D
LS = 0.000000
RF = 0

Page 32: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Examples

Page 33: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Application example 1

Horizontal transfer of the Rubisco Large subunit gene

Delwiche, C.F., and J. D. Palmer. 1996. Rampant Horizontal Transfer and Duplication of Rubisco Genes in Eubacteria and Plastids. Mol. Biol. Evol. 13:873-882.

Page 34: Detecting horizontal gene transfers using discrepancies in species and gene classifications

rbcL Gene Phylogeny

[Figure: rbcL gene phylogeny (FORM I), divided into a Red Type and a Green Type. Clade labels: Proteobacteria, Red and Brown Plastids, Cyanobacteria, Green Plastids, Glaucophyte Plastid]

Taxa shown: Mn oxidizing bacterium (S|85-9a1), Rhodobacter sphaeroides I, Xanthobacter, Alcaligenes 17707 chromosomal, Alcaligenes H16 plasmid, Alcaligenes H16 chromosomal, Cyanidium, Porphyridium, Antithamnion, Ahnfeltia, Cryptomonas, Ectocarpus, Olistodiscus, Cylindrotheca, Prochlorococcus, Hydrogenovibrio L2, Chromatium L, Pseudomonas, Thiobacillus ferrooxidans fe1, Nitrobacter, Thiobacillus ferr. 19859, Thiobacillus denitrificans I, Endosymbiont, Anabaena, Synechococcus, Anacystis, Prochlorothrix, Synechocystis, Prochloron, Cyanophora, Chlorella, Bryopsis, Chlamydomonas, Euglena, Pyramimonas, Coleochaete, Marchantia, Pseudotsuga, Nicotiana, Oryza.

Page 35: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Delwiche and Palmer (1996) - hypotheses of HGTs

1- Cyanobacteria → γ-Proteobacteria

2- α-Proteobacteria → Red and brown algae

3- γ-Proteobacteria → α-Proteobacteria

4- γ-Proteobacteria → β-Proteobacteria

Page 36: Detecting horizontal gene transfers using discrepancies in species and gene classifications

HGTs of the rbcL gene

[Figure: species tree of the taxa of the rbcL gene phylogeny, grouped into Green Plastids, Glaucophyte plastid, Cyanobacteria, three Proteobacteria groups, and Red & Brown algae, with eight HGTs numbered 1 to 8]

Page 37: Detecting horizontal gene transfers using discrepancies in species and gene classifications

HGTs of the rbcL gene - comparison

Hypotheses by Delwiche and Palmer (1996)

1- Cyanobacteria → γ-Proteobacteria

2- α-Proteobacteria → Red and brown algae

3- γ-Proteobacteria → α-Proteobacteria

4- γ-Proteobacteria → β-Proteobacteria

Solution

1. -Proteobacteria → β-Proteobacteria

2. α-Proteobacteria → Red and brown algae

3. -Proteobacteria → γ-Proteobacteria

4. -Proteobacteria → -Proteobacteria

5. γ-Proteobacteria → Cyanobacteria

6. β-Proteobacteria → γ-Proteobacteria

7. γ-Proteobacteria → β-Proteobacteria

8. Cyanobacteria → γ-Proteobacteria

Page 38: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Application example 2

Horizontal transfers of the protein rpl12e

Data taken from:

Matte-Tailliez O., Brochier C., Forterre P. & Philippe H. Archaeal phylogeny based on ribosomal proteins. (2002). Mol. Biol. Evol. 19, 631-639.

Page 39: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Rpl12e HGTs

[Figure: Species tree and Rpl12e gene tree, both on the same 14 archaeal species]

Species shown: Pyrobaculum aerophilum, Aeropyrum pernix, Sulfolobus solfataricus, Pyrococcus furiosus, Pyrococcus abyssi, Pyrococcus horikoshii, Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, Methanosarcina barkeri, Haloarcula marismortui, Halobacterium sp., Thermoplasma acidophilum, Ferroplasma acidarmanus.

Assumed HGTs of the rpl12e gene involved the clusters of Crenarchaeota and Thermoplasmatales (Matte-Tailliez, 2004)

Page 40: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Reconciliation scenario

[Figure: species tree of the 14 archaeal species with bootstrap scores on the internal branches and five HGTs numbered 1 to 5; the percentages shown with the HGTs are 74%, 69%, 60%, 60% and 55%]

Page 41: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Application example 3

Horizontal transfers of the PheRS synthetase

Data taken from:

Woese, C. R., G. Olsen, M. Ibba, and D. Söll. 2000. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol. Mol. Biol. Rev. 64:202-236.

Page 42: Detecting horizontal gene transfers using discrepancies in species and gene classifications

PheRS synthetase

[Figure: PheRS phylogeny with bootstrap scores on the internal branches, a scale bar of 20 changes, and the clade labels Bacteria, Archaea and Eukarya]

Taxa shown: M. thermoautotrophicum, A. fulgidus, H. sapiens, S. cerevisiae, M. jannaschii, S. solfataricus, P. aerophilum, P. horikoshii, T. pallidum, B. burgdorferi, M. pneumoniae, M. genitalium, B. subtilis, S. pyogenes, E. faecalis, R. capsulatus, T. maritima, M. tuberculosis, C. acetobutylicum, D. radiodurans, T. thermophilus, C. tepidum, C. trachomatis, P. gingivalis, A. aeolicus, Synechocystis sp. PCC 6803, N. gonorrhoeae, P. aeruginosa, R. prowazekii, H. pylori, H. influenzae, E. coli.

Page 43: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Reconciliation scenario

[Figure: species tree of the PheRS taxa with five HGTs numbered 1 to 5; the percentages shown with the HGTs are 62%, 88%, 65%, 85% and 60%. The PheRS gene tree of the previous slide is repeated next to it (bootstrap scores, scale bar of 20 changes, clade labels Bacteria, Archaea and Eukarya)]

Page 44: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Software

Page 45: Detecting horizontal gene transfers using discrepancies in species and gene classifications

T-REX: Tree and Reticulogram Reconstruction [1]

Downloadable from http://www.info.uqam.ca/~makarenv/trex.html

Authors: Vladimir Makarenkov

Versions: Windows 9x/NT/2000/XP and Macintosh

With contributions from A. Boc, P. Casgrain, A. B. Diallo, O. Gascuel, A. Guénoche, P.-A. Landry, F.-J. Lapointe, B. Leclerc, and P. Legendre.

[1] Makarenkov, V. 2001. T-REX: reconstructing and visualizing phylogenetic trees and reticulation networks. Bioinformatics 17: 664-668.

Page 46: Detecting horizontal gene transfers using discrepancies in species and gene classifications

T-Rex : Multiple scenario screenshot

[Screenshot, with the side label "Bioinformatics software"]

Page 47: Detecting horizontal gene transfers using discrepancies in species and gene classifications

T-Rex Web infrastructure

[Diagram with the following components]

• Web server (www.trex.uqam.ca)

• Internet

• Compute server, 32 processors at 3.4 GHz

• PHP 4.4.1

• GeneBase UQAM, Oracle 10g

• UQAM

• UQAM bioinformatics researchers

• Other users

Page 48: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Future developments

• Maximum Likelihood model

• Maximum Parsimony model

• Decreasing the running time

Page 49: Detecting horizontal gene transfers using discrepancies in species and gene classifications

Bibliography

• Boc, A. and Makarenkov, V. (2003), New Efficient Algorithm for Detection of Horizontal Gene Transfer Events, Algorithms in Bioinformatics, G. Benson and R. Page (Eds.), 3rd Workshop on Algorithms in Bioinformatics, Springer-Verlag, pp. 190-201.

• Delwiche, C.F., and J. D. Palmer (1996). Rampant Horizontal Transfer and Duplication of Rubisco Genes in Eubacteria and Plastids. Mol. Biol. Evol. 13:873-882.

• Makarenkov, V. (2001), T-Rex: reconstructing and visualizing phylogenetic trees and reticulation networks. Bioinformatics, 17, 664-668.

• Makarenkov, V., Boc, A., Delwiche, C.F. and Philippe, H. (2005), A novel approach for detecting horizontal gene transfers: Modeling partial and complete gene transfer scenarios, submitted Mol. Biol. Evol.

• Makarenkov, V., Boc, A. and Diallo A.B. (2004), Representing Lateral gene transfer in species classification. Unique scenario, IFCS’2004 proceedings, Chicago.

• Matte-Tailliez O., Brochier C., Forterre P. & Philippe H. (2002). Archaeal phylogeny based on ribosomal proteins. Mol. Biol. Evol. 19, 631-639.

• Robinson, D.R. and Foulds L.R. (1981), Comparison of phylogenetic trees, Mathematical Biosciences 53, 131-147.

• Woese, C. R., G. Olsen, M. Ibba, and D. Söll. 2000. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol. Mol. Biol. Rev. 64:202-236.