
Page 1: 3- RIBOSOMAL RNA GENE RECONSTRUCTION

Phenetics vs. Cladistics

Homology/Homoplasy/Orthology/Paralogy

Evolution vs. Phylogeny

The relevance of the alignment

The algorithms

Bootstrap

One tree is no tree

Page 2

Phylogenetic coherence (monophyly): 16S rRNA; functional genes (MLSA); genomic analyses

Genomic coherence: DNA-DNA reassociation; G+C, AFLP, MLSA; genomic comparisons (ANI; AAI)

Phenotypic coherence: metabolism; chemotaxonomy; spectrometry (MALDI-TOF; ICR-FT/MS)

[Figure: similarity scale with ticks at 50%, 60%, 70%, 80% and 100%; the values 70-50% and 70% are marked]

Generally based on 16S rRNA gene analysis.

It is important to recognize the closest relatives by means of the type strain gene sequences.

Housekeeping genes (MLSA approach or single gene) may help to resolve phylogenies.

In the future, this will be done with full-genome sequences.

Page 3

Phenetics vs Cladistics

[Figure: dendrogram with leaf labels M8, M31, PR1, C12, M1, E11, C25AE7, P13, C16, C4, C5, A1, C9, P18, A7, E3 on a similarity scale from 80 to 90]

Similarity matrix or alignment

OTU A  10100010010010010
OTU B  11010001010001010
OTU C  00010010011110101
OTU D  00111110010101010
OTU E  00010010111001101
…

Data can be treated as presence/absence/intensity to generate similarity matrices.

If data are analyzed by their similarity => PHENETICS

If data are analyzed in an evolutionary context (i.e. changes in homologous characters are mutations or evolutionary steps) => CLADISTICS

For evolutionary purposes it is necessary to recognize HOMOLOGY.
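As a minimal sketch of the phenetic route, the presence/absence rows of the OTU table can be turned into a pairwise similarity matrix. The simple matching coefficient used here is an assumption chosen for illustration (the slides do not name a coefficient), and the five character strings are the OTU rows, each taken once.

```python
from itertools import combinations

# Presence/absence characters per OTU, taken from the slide (one copy of each row).
otus = {
    "A": "10100010010010010",
    "B": "11010001010001010",
    "C": "00010010011110101",
    "D": "00111110010101010",
    "E": "00010010111001101",
}


def simple_matching(x, y):
    """Percentage of characters that have the same state in both OTUs."""
    if len(x) != len(y):
        raise ValueError("OTUs must be scored for the same characters")
    matches = sum(a == b for a, b in zip(x, y))
    return 100.0 * matches / len(x)


# Lower triangle of the similarity matrix, keyed by OTU pair.
similarity = {
    (p, q): simple_matching(otus[p], otus[q])
    for p, q in combinations(sorted(otus), 2)
}

for (p, q), value in similarity.items():
    print(f"OTU {p} vs OTU {q}: {value:.1f}% similarity")
```

A clustering method would then group the OTUs from these values alone, which is the phenetic view: no statement is made about which character changes are evolutionary steps.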

Page 4

HOMOLOGY / ORTHOLOGY / PARALOGY / HOMOPLASY

[Figure: Organism A with Gene X; Organism B with Gene X; Organism A with Gene X, Gene X’ and Gene X’’]

Homology: same ancestral origin

Homoplasy: false homology

Orthology: homologous genes in different organisms

Paralogy: homologous genes in the same organism; gene duplications with identical or different function


Page 6

Evolution vs. Phylogeny

Evolution ≠ phylogeny

Evolution => mutations (morphometrics) + age (fossil record)

Phylogeny = genealogy => we know only the tips of the tree; nothing is said about putative ancestors

PROKARYOTES => no fossil record => molecular clocks

Molecular clocks (housekeeping genes): 16S rRNA; 23S rRNA; ATPases; elongation factor Tu; gyrases…

The 16S rRNA:
Universally represented
Conserved
Not protein coding
Base pairing (helix)
Natural amplification
Proper size

Ludwig and Schleifer, 1994 FEMS Rev 15:155-173

Page 7

The relevance of the alignment

To perform cladistic analyses we should first align all sequences in order to recognize all homologous positions.

Recognition by:
Sequence similarities
Base pairing due to secondary structure (helices for rRNA)
Insertions & deletions
Empirically (subjective)

Minimize homoplastic influences

There are many alignment programs; all look for common features that may indicate homologous sites:
Clustal X
MAFFT
PileUp
…

Page 8

The relevance of the alignment

Most programs do not take secondary structure into account, just sequence motif similarities.

rRNA has a secondary structure with helices that helps in aligning sequences.

Alignments of functional genes or translated proteins cannot be improved by secondary structure analysis.
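To illustrate how helix pairing can be used as a check on an rRNA alignment, the sketch below scores two alignment columns that are supposed to face each other in a helix. The toy alignment, the column indices and the function name `helix_support` are hypothetical; accepting the G-U wobble pair alongside the Watson-Crick pairs is an assumption of this example, not something stated in the slides.

```python
# Watson-Crick pairs plus the G-U wobble pair, which is common in rRNA helices.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def helix_support(alignment, i, j):
    """Fraction of sequences whose bases in columns i and j can pair."""
    paired = sum((seq[i], seq[j]) in PAIRS for seq in alignment)
    return paired / len(alignment)


# Hypothetical aligned RNA fragments; columns 0/7 and 1/6 are assumed to pair.
alignment = [
    "GCGAAAGC",  # G-C and C-G
    "GUGAAAAC",  # G-C and U-A (compensatory change keeps the pair)
    "GCGAAAUC",  # G-C and C-U (cannot pair: possible misalignment or mismatch)
]

for i, j in [(0, 7), (1, 6)]:
    print(f"columns {i}/{j}: {helix_support(alignment, i, j):.2f} of sequences pair")
```

Columns in which nearly all sequences pair are consistent with homologous helix positions; a low value flags a region worth checking by eye.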

Page 9

The relevance of the alignment

ARB does take into account features such as helix pairing.

By increasing the number of sequences, the alignment improves.

www.arb-home.de
www.arb-silva.de

Page 10

Maximum Likelihood

Like Maximum Parsimony, but takes into account:
difficulties in mutation events (transitions vs. transversions)
mutation position

Slower

[Figure: T <-> C and A <-> G changes are transitions; changes between T/C and A/G are transversions]

Maximum Parsimony

G C C A T => a
G C A C T => b
G C A C C => c

a – b => 2 mutations
a – c => 3 mutations
b – c => 1 mutation

[Figure: three alternative arrangements of a, b and c, with total lengths of 5, 5 and 3 mutations]

(pitfalls: nature may not be parsimonious)
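The mutation counts for a, b and c can be reproduced with a short sketch that also separates transitions from transversions, the distinction that maximum likelihood models can weight differently. The function names are illustrative only; the sequences are the three from the slide.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# The three aligned sequences from the slide.
sequences = {"a": "GCCAT", "b": "GCACT", "c": "GCACC"}


def classify(x, y):
    """Label one aligned position as match, transition or transversion."""
    if x == y:
        return "match"
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_changes(s, t):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for x, y in zip(s, t):
        kind = classify(x, y)
        if kind != "match":
            counts[kind] += 1
    return counts


for p, q in combinations(sorted(sequences), 2):
    changes = count_changes(sequences[p], sequences[q])
    total = changes["transition"] + changes["transversion"]
    print(f"{p} - {q}: {total} mutations {changes}")
```

Parsimony uses only the totals (2, 3 and 1), whereas a likelihood model can treat the single T/C transition between b and c differently from the transversions that separate a from the other two.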

Neighbor Joining

G C C A T => a
G C A C T => b
G C A C C => c

alignment => similarity matrix => distance matrix => dendrograms

Similarity matrix:
      a    b    c
a   100
b    60  100
c    40   80  100

Distance matrix:
      a    b    c
a     0
b    40    0
c    60   20    0

Distance transformation: Jukes-Cantor, Kimura, De Soete

(pitfalls: does not take into account multiple mutations)

[Figure: dendrogram of a, b and c]

The algorithms
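For the three sequences of the neighbor-joining example, the path from alignment to similarity, to raw distance, to a transformed distance can be sketched as follows. The Jukes-Cantor correction is the standard formula d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites; the function names are illustrative, and with only five sites the corrected values are a toy illustration rather than meaningful estimates.

```python
import math
from itertools import combinations

# The three aligned sequences from the slide.
sequences = {"a": "GCCAT", "b": "GCACT", "c": "GCACC"}


def p_distance(s, t):
    """Proportion of aligned positions that differ."""
    return sum(x != y for x, y in zip(s, t)) / len(s)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


for s, t in combinations(sorted(sequences), 2):
    p = p_distance(sequences[s], sequences[t])
    similarity = 100 * (1 - p)
    print(f"{s} - {t}: similarity {similarity:.0f}%, "
          f"distance {100 * p:.0f}%, Jukes-Cantor {jukes_cantor(p):.3f}")
```

The corrected distance is always at least as large as the raw one, because the transformation estimates substitutions hidden by multiple changes at the same site.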

Page 11

Bootstrap

Bootstrap indicates how stable a branching order is when a given dataset is submitted to multiple analyses.

Generally, short internode branches will have low bootstrap values.
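The resampling idea behind the bootstrap can be sketched with the three toy sequences a, b and c used for the algorithms. This is a deliberate simplification and not a full tree-building bootstrap: alignment columns are drawn with replacement, and instead of reconstructing a tree for each replicate, the code only asks whether b and c are still the unique closest pair. The function names, the number of replicates and the seed are assumptions for illustration.

```python
import random
from itertools import combinations

sequences = {"a": "GCCAT", "b": "GCACT", "c": "GCACC"}


def resample_columns(seqs, rng):
    """Build a pseudo-alignment by drawing columns with replacement."""
    n = len(next(iter(seqs.values())))
    columns = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(s[i] for i in columns) for name, s in seqs.items()}


def closest_pair(seqs):
    """Return the unique pair with the fewest differences, or None on a tie."""
    dist = {
        pair: sum(x != y for x, y in zip(seqs[pair[0]], seqs[pair[1]]))
        for pair in combinations(sorted(seqs), 2)
    }
    best = min(dist.values())
    winners = [pair for pair, d in dist.items() if d == best]
    return winners[0] if len(winners) == 1 else None


def bootstrap_support(seqs, pair, replicates=1000, seed=0):
    """Percentage of replicates in which `pair` is the unique closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(resample_columns(seqs, rng)) == pair
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


print(f"support for (b, c): {bootstrap_support(sequences, ('b', 'c')):.1f}%")
```

A grouping that rests on very few columns disappears in many replicates, which is why short internode branches tend to receive low values.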

Page 12

PHYLOGENETIC FILTERS

Filter      Homologous positions
TERMINI     42,284
BACTERIA    1,532
30%         1,433
50%         1,288

Page 13

USE OF PHYLOGENETIC FILTERS

[Figure: trees labelled NJ_bac, NJ_30% and NJ_50%]

Conservation filters are useful for deep-branching phylogenies.

Complete sequences are useful for closely related organisms.
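A conservation filter can be pictured as a mask over alignment columns. The sketch below assumes that an "x% filter" keeps only the columns in which the most frequent base occurs in at least x% of the sequences; this definition, the toy alignment and the function names are assumptions for illustration and may differ from how a given program builds its filters.

```python
from collections import Counter


def conservation_filter(alignment, min_fraction):
    """Indices of columns whose most frequent base reaches min_fraction."""
    keep = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment if seq[i] != "-"]  # ignore gaps
        if not column:
            continue
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / len(alignment) >= min_fraction:
            keep.append(i)
    return keep


def apply_filter(alignment, keep):
    """Reduce every sequence to the columns selected by the filter."""
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Hypothetical toy alignment of four sequences.
alignment = ["GCCAT", "GCACT", "GCACC", "GCGTA"]

for threshold in (0.5, 0.75):
    keep = conservation_filter(alignment, threshold)
    print(f"{threshold:.0%} filter keeps columns {keep}: "
          f"{apply_filter(alignment, keep)}")
```

Raising the threshold discards the more variable columns, which matches the decreasing number of positions from the 30% to the 50% filter.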

Page 14

Size & information content

Complete sequences give complete information.

Partial sequences lose phylogenetic signal.

Short sequences lose resolution.

[Figure: panels labelled 1500 nuc, 900 nuc and 300 nuc]

Page 15

One tree is no tree

[Figure: trees labelled PAR, NJ and RAxML]

Different algorithms, different topologies

Try different datasets as well

Draw a consensus tree
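One common way to draw a consensus is the majority rule: keep only the groupings (clades) that appear in more than half of the trees. The sketch below assumes each tree has already been reduced to its set of clades; the three clade sets stand in for a parsimony, a neighbor-joining and a maximum likelihood tree and are hypothetical, as is the function name.

```python
from collections import Counter


def majority_clades(trees, threshold=0.5):
    """Clades found in more than `threshold` of the trees, with their frequency."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {
        clade: count / len(trees)
        for clade, count in counts.items()
        if count / len(trees) > threshold
    }


# Hypothetical clade sets for three trees over the taxa A to E.
par = {frozenset("AB"), frozenset("CD"), frozenset("ABCD")}
nj = {frozenset("AB"), frozenset("CD"), frozenset("ABE")}
ml = {frozenset("AB"), frozenset("CE"), frozenset("ABCD")}

consensus = majority_clades([par, nj, ml])
for clade, frequency in sorted(consensus.items(), key=lambda item: sorted(item[0])):
    print(f"{''.join(sorted(clade))}: present in {frequency:.0%} of trees")
```

Clades that fall below the threshold are left unresolved in the consensus, so disagreement between algorithms shows up as a multifurcation rather than as an arbitrary choice.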

Page 16

RECOMMENDATIONS FOR 16S rRNA TREE RECONSTRUCTION

[Figure: two trees of taxa A to I. Left: tree with bootstrap (node values 100, 100, 90, 100, 95, 50, 25, 25). Right: tree with multifurcation]

SEQUENCE: almost complete sequences are better than short partial sequences.

ALIGNMENT: it is better to take secondary structures into account.

ALGORITHM: maximum likelihood is preferred, but compare it with others such as neighbor joining and maximum parsimony.

DATASET: never use just one dataset; try different sets of data (i.e. different numbers of sequences; different filters to find the best resolution).

FINAL TREE: either show all trees, or the best bootstrapped one, or a multifurcation showing unresolved branching order.
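Turning a bootstrapped tree into a multifurcation can be sketched as dissolving every internal node whose support falls below a cutoff. The nested-tuple tree format, the example tree and the cutoff of 50 are assumptions made for this illustration; the slides do not prescribe a threshold.

```python
def collapse(node, min_support=50):
    """Return a copy of the tree with weakly supported internal nodes dissolved.

    A leaf is a string; an internal node is a (support, [children]) tuple.
    """
    if isinstance(node, str):
        return node
    support, children = node
    new_children = []
    for child in children:
        child = collapse(child, min_support)
        if not isinstance(child, str) and child[0] < min_support:
            # Attach the grandchildren directly: the weak branch disappears.
            new_children.extend(child[1])
        else:
            new_children.append(child)
    return (support, new_children)


# Hypothetical tree: the node joining (A, B) with (C, D) has support 25.
tree = (100, [
    (25, [(100, ["A", "B"]), (95, ["C", "D"])]),
    (90, ["E", "F"]),
])

print(collapse(tree))
```

In the collapsed tree the well supported pairs remain, while their relative branching order is shown as unresolved.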

Page 17

MLSA: phylogenetic reconstructions

MULTIPLE SEQUENCE ALIGNMENTS sometimes have better resolution than the 16S rRNA gene.

The 16S rRNA gene can have very low resolution.

Jiménez et al., 2013, System Appl Microbiol, 36: 383-391