A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees José Augusto Amgarten Quitzau João Meidanis Scylla Bioinformatics, Brazil University of Campinas, Brazil

Page 1: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

José Augusto Amgarten Quitzau, João Meidanis

Scylla Bioinformatics, Brazil; University of Campinas, Brazil

Page 2: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Phylogeny reconstruction methods

Phylogeny reconstruction methods aim at inferring the phylogenetic tree that best describes the evolutionary history for a set of taxa.

Page 3: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Which tree to choose?

“The field of systematics has been in considerable turmoil as various investigators developed different methods of classification and argued their merits. I guarantee you that no one method or view has all the good points.”

Walter M. Fitch – 1984

Page 4: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Consensus as tree constructor

Consensus trees have been used traditionally in tree comparison and calculation of bootstrap values

We propose the use of consensus as a tree constructor

It can be efficiently implemented as long as we keep trees fully resolved

Page 5: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Splits

Every edge in a phylogenetic tree divides the leaves into two subgroups.

Each such pair of subgroups is a split of the tree.

[Figure: example tree on leaves A to H, with the pairs AB, CD, and EF]

Page 6: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Tree weight

Our method relies on weighing trees and taking the one with maximum weight

Let the frequency of a split in a collection of trees be the number of trees that contain the split divided by the total number of trees in the collection

Let the weight of an unrooted phylogenetic tree be the product of the frequencies of its splits
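The two definitions above can be illustrated with a minimal Python sketch. The representation of a split as an unordered pair of leaf sets, the helper names, and the three five-leaf input trees are assumptions made for this example, not part of the slides. Trivial splits (a single leaf against the rest) are left out because they occur in every tree and contribute a factor of 1.

```python
from fractions import Fraction
from math import prod

def split(side_a, side_b):
    """Canonical, order-independent representation of a split."""
    return frozenset({frozenset(side_a), frozenset(side_b)})

def split_frequencies(trees):
    """Map each split to the fraction of input trees that contain it."""
    counts = {}
    for tree in trees:
        for s in tree:
            counts[s] = counts.get(s, 0) + 1
    return {s: Fraction(n, len(trees)) for s, n in counts.items()}

def tree_weight(tree, freq):
    """Product of the frequencies of the tree's splits (0 for unseen splits)."""
    return prod(freq.get(s, Fraction(0)) for s in tree)

# Hypothetical input: three trees on leaves A..E, given by their nontrivial splits.
t1 = {split("AB", "CDE"), split("ABC", "DE")}
t2 = {split("AB", "CDE"), split("CD", "ABE")}
trees = [t1, t2, t1]

freq = split_frequencies(trees)
weights = [tree_weight(t, freq) for t in trees]
print(weights)
```

With this input, the split AB | CDE has frequency 1, ABC | DE has frequency 2/3, and CD | ABE has frequency 1/3, so `t1` outweighs `t2`.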

Page 7: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Most probable tree

A most probable tree for a collection of fully resolved phylogenetic trees is a tree that maximizes the weight, that is, the product of its split frequencies.
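The formula on this slide did not survive extraction. Based on the definitions of frequency and weight given on the previous slide, it can be written as follows. The symbols are assumptions introduced here: $T_1, \dots, T_t$ is the input collection, $\Sigma(T)$ is the set of splits of a tree $T$, $f(s)$ is the frequency of split $s$, and $w(T)$ is the weight of $T$.

```latex
f(s) = \frac{\bigl|\{\, i : s \in \Sigma(T_i) \,\}\bigr|}{t},
\qquad
w(T) = \prod_{s \in \Sigma(T)} f(s),
\qquad
T^{*} \in \arg\max_{T} \; w(T)
```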

Page 8: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Example

Page 9: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Solution

w = 0.0703125

Page 10: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Running time

The tree weight formula can be written as a product of the frequencies of the small subgroups

We designed an algorithm that finds all most probable trees for a given set of fully resolved phylogenetic trees

The complexity of the algorithm is $O(l^3 t^2 \log(lt))$, where $l$ is the number of leaves and $t$ is the number of trees

Page 11: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Experiments

Data sets used to test the new method:

Synthetic data: from Gascuel’s LIRMM site

K2P – Kimura 2 Parameter, no MC

K2Pm – Kimura 2 Parameter, with MC

COV – Covarion model, no MC

COVm – Covarion model, with MC

Real data: Ribosomal RNA

Page 12: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Experiments

Programs used to test the new method (19):

Software   Method                     Model
fastMe     Minimum evolution          JC, K2P
Mega       Minimum evolution          JC, K2P, TN
Mega       Maximum parsimony
Mega       Neighbor joining           JC, K2P, TN
dnacomp    DNA compatibility
dnaml      Maximum likelihood
dnapars    Maximum parsimony
neighbor   Neighbor joining           JC, K2P
neighbor   UPGMA                      JC, K2P
weighbor   Weighted neighbor joining  JC, K2P

Page 13: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Most probable = Median

Page 14: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Reflects general tendency

Page 15: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Results: average split distance

Data set    Minimum Distance
K2P         43.44
K2Pm        77.78
COV         52.67
COVm        69.11
Ribosomal   60.71

Consensus consistently yields minimum average split distance
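As a rough illustration of the quantity reported here, the sketch below computes an average split distance between one candidate tree and a collection of trees. It assumes the split distance is the number of splits found in exactly one of the two trees (the symmetric difference of their split sets). The slides do not state the exact definition or normalization, so this choice, the helper names, and the toy trees are all assumptions.

```python
from fractions import Fraction

def split(side_a, side_b):
    """Canonical, order-independent representation of a split."""
    return frozenset({frozenset(side_a), frozenset(side_b)})

def split_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(tree_a ^ tree_b)

def average_split_distance(candidate, trees):
    """Mean split distance from the candidate to every tree in the collection."""
    total = sum(split_distance(candidate, t) for t in trees)
    return Fraction(total, len(trees))

# Hypothetical input: three trees on leaves A..E, given by their nontrivial splits.
t1 = {split("AB", "CDE"), split("ABC", "DE")}
t2 = {split("AB", "CDE"), split("CD", "ABE")}
trees = [t1, t2, t1]

avg_t1 = average_split_distance(t1, trees)
avg_t2 = average_split_distance(t2, trees)
print(avg_t1, avg_t2)
```

In this toy collection `t1` is closer on average to the inputs than `t2`, which is the sense in which a tree can act as a median of the collection.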

Page 16: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

May result in better tree

Page 17: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Results: distance to “real” tree

Data set    Consensus not worse than ... of input trees
K2P         72 %
K2Pm        39 %
COV         78 %
COVm        72 %
Ribosomal   100 %

Consensus is consistently no worse than the majority of input trees

Page 18: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Theoretical foundations

[Figure: example tree on leaves A to H, with the pairs AB, CD, and EF]

Page 19: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

All splits of a tree

[Figure: example tree on leaves A to H]

A | BCDEFGH
B | ACDEFGH
AB | CDEFGH
C | ABDEFGH
D | ABCEFGH
H | ABCDEFG
G | ABCDEFH
F | ABCDEGH
E | ABCDFGH
CD | ABEFGH
EF | ABCDGH
EFG | ABCDH
ABCD | EFGH
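The list of splits can be reproduced mechanically by removing each edge of the tree in turn. The sketch below does this for a tree written as nested tuples. The nested-tuple encoding, the arbitrary choice of a root position, and the function names are assumptions made for illustration.

```python
def leaves_of(node):
    """Set of leaf labels below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves_of(child) for child in node))

def all_splits(tree):
    """One split per edge: the leaves below the edge against all the others."""
    everything = leaves_of(tree)
    splits = set()

    def visit(node):
        cluster = leaves_of(node)
        if cluster != everything:
            # The two edges at a binary root give the same split; the set dedupes them.
            splits.add(frozenset({cluster, everything - cluster}))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return splits

# The example tree from the slides, rooted arbitrarily on the edge ABCD | EFGH.
tree = ((("A", "B"), ("C", "D")), ((("E", "F"), "G"), "H"))
splits = all_splits(tree)
print(len(splits))
```

A fully resolved unrooted tree on 8 leaves has 2 * 8 - 3 = 13 edges, which matches the 13 splits listed on the slide.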

Page 20: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Small subgroup of each split

[Figure: example tree on leaves A to H]

Splits, with the small subgroup on the left:

A | BCDEFGH
B | ACDEFGH
AB | CDEFGH
C | ABDEFGH
D | ABCEFGH
H | ABCDEFG
G | ABCDEFH
F | ABCDEGH
E | ABCDFGH
CD | ABEFGH
EF | ABCDGH
EFG | ABCDH
ABCD | EFGH
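Picking the small subgroup of a split amounts to taking the side with fewer leaves. The slides do not say how a tie such as ABCD | EFGH is resolved. The sketch below assumes the side containing the alphabetically smallest leaf is kept, which agrees with the slide's choice of ABCD. The function name and this tie-breaking rule are assumptions.

```python
def small_subgroup(side_a, side_b):
    """Return the smaller side of a split as a frozenset of leaves."""
    a, b = frozenset(side_a), frozenset(side_b)
    if len(a) != len(b):
        return a if len(a) < len(b) else b
    # Tie (assumed rule): keep the side holding the alphabetically smallest leaf.
    return a if min(a) < min(b) else b

# The 13 splits of the example tree, written with arbitrary side order.
example_splits = [
    ("A", "BCDEFGH"), ("ACDEFGH", "B"), ("AB", "CDEFGH"), ("ABDEFGH", "C"),
    ("D", "ABCEFGH"), ("ABCDEFG", "H"), ("G", "ABCDEFH"), ("F", "ABCDEGH"),
    ("ABCDFGH", "E"), ("CD", "ABEFGH"), ("ABCDGH", "EF"), ("ABCDH", "EFG"),
    ("EFGH", "ABCD"),
]

subgroups = {small_subgroup(a, b) for a, b in example_splits}
labels = sorted("".join(sorted(g)) for g in subgroups)
print(labels)
```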

Page 21: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Small subgroups

A, B, AB, C, D, H, G, F, E, CD, EF, EFG, ABCD

Page 22: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Maximal clusters (n-trees)

Small subgroups: A, B, AB, C, D, H, G, F, E, CD, EF, EFG, ABCD

Maximal clusters (not contained in any other small subgroup): ABCD, EFG, H

Page 23: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Fundamental theoretical result

[Figure: the small subgroups A, B, AB, C, D, H, G, F, E, CD, EF, EFG, ABCD arranged as n-trees]

● The small subgroup set of a phylogenetic tree is always a finite set of n-trees

● There are exactly three n-trees in this set, and all n-trees are maximal if and only if the phylogenetic tree is fully resolved
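For the running example, the statement can be checked with a short sketch. It reads an n-tree as a family of clusters in which any two clusters are either nested or disjoint, and identifies each n-tree with its maximal cluster. That reading, and the function names, are assumptions on my part. The slides do not spell out the definition.

```python
from itertools import combinations

def is_laminar(clusters):
    """True if every pair of clusters is either nested or disjoint."""
    sets = [frozenset(c) for c in clusters]
    return all(a <= b or b <= a or not (a & b) for a, b in combinations(sets, 2))

def maximal_clusters(clusters):
    """Clusters that are not strictly contained in any other cluster."""
    sets = [frozenset(c) for c in clusters]
    return [c for c in sets if not any(c < other for other in sets)]

# Small subgroups of the example tree on leaves A..H.
small_subgroups = ["A", "B", "AB", "C", "D", "CD", "ABCD",
                   "E", "F", "EF", "G", "EFG", "H"]

tops = maximal_clusters(small_subgroups)
top_labels = sorted("".join(sorted(c)) for c in tops)
covers_all_leaves = frozenset().union(*tops) == frozenset("ABCDEFGH")
print(top_labels, is_laminar(small_subgroups), covers_all_leaves)
```

The three maximal clusters ABCD, EFG, and H are pairwise disjoint and together cover all eight leaves, so each one heads its own n-tree.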

Page 24: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Implementation details

[Figure: small subgroups D, E, F, G, EF, GH, ABC]

Page 25: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Dynamic programming

[Figure: small subgroups D, E, F, G, EF, GH, ABC]

Page 26: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Dynamic programming

[Figure: small subgroups D, E, F, G, EF, GH, ABC]

Page 27: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Dynamic programming

[Figure: small subgroups D, E, F, G, EF, GH, ABC]
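The slides do not give the details of the dynamic programming step, so the following is only a sketch of one way such a recurrence could look, not the authors' algorithm. It swaps the small-subgroup formulation for a simpler one: the unrooted tree is rooted at an arbitrary leaf, each split is identified with its side that does not contain that leaf, and the best weight of a cluster is its frequency times the best product over ways of cutting it into two candidate clusters. All names and the toy input are hypothetical.

```python
from fractions import Fraction

def most_probable(trees, leaves, root):
    """Return (weight, clusters) of a maximum-weight fully resolved tree.

    trees: list of sets of clusters, each cluster being the side of a
           nontrivial split that does not contain the leaf `root`.
    """
    t = len(trees)
    freq = {}
    for clusters in trees:
        for c in clusters:
            freq[c] = freq.get(c, 0) + Fraction(1, t)

    top = frozenset(leaves) - {root}
    for leaf in top:
        freq[frozenset([leaf])] = Fraction(1)  # trivial splits are in every tree
    freq[top] = Fraction(1)                    # the split root | everything else

    memo = {}

    def best(c):
        if len(c) == 1:
            return Fraction(1), frozenset([c])
        if c in memo:
            return memo[c]
        result = (Fraction(0), frozenset())
        for c1 in freq:                        # try every candidate sub-cluster
            if c1 < c and (c - c1) in freq:
                w1, s1 = best(c1)
                w2, s2 = best(c - c1)
                w = freq[c] * w1 * w2
                if w > result[0]:
                    result = (w, s1 | s2 | {c})
        memo[c] = result
        return result

    return best(top)

# Hypothetical input: three trees on leaves A..E, rooted at leaf E.
t1 = {frozenset("AB"), frozenset("ABC")}
t2 = {frozenset("AB"), frozenset("CD")}
weight, clusters = most_probable([t1, t2, t1], "ABCDE", "E")
print(weight, sorted("".join(sorted(c)) for c in clusters))
```

Only clusters seen in the input are considered, since any tree containing an unseen split has weight 0. This sketch returns a single maximizer and makes no attempt to match the running time stated on the slides.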

Page 28: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Implementation details

[Figure: small subgroups D, E, F, G, EF, GH, FGH, DEF, ABC, DE, and the complement L \ ABC]

Page 29: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Implementation details

Page 30: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

To Do List

Rooted trees

Polytomies

Non-uniform weights for input trees

Page 31: A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Acknowledgments

Scylla Bioinformatics and Institute of Computing, Unicamp, for machine time, infrastructure, and support

Brazilian Research Financing Agency CNPq, grant 470420/2004-9