Building phylogenetic trees

Read Chapter 5

Building a tree

The aim in building a phylogenetic tree is to use knowledge of the characters of organisms to build a tree that reflects the relationships between them.

Organisms with many characters in common are more likely to be related than those with few in common.

We want to use characters that are homologous [shared because of common ancestry] rather than analogous [independently evolved].

But how is this to be done? It turns out that there are many approaches, the first of which is to apply parsimony.

Parsimony

The basic idea of parsimony in tree building is to build a tree that requires the fewest evolutionary changes.

In the following trees one species differs from the other three. In each tree a single evolutionary change is all that is required to build it.

Similarly, we can [in the next slide] analyze a situation where two non-sister taxa (3 and 4) share a trait.

There are two equally parsimonious explanations in this case.

The same logic applies when dealing with multiple traits (three traits, each with two states, in the next example).

Each trait is treated separately and the most parsimonious explanation is calculated.

When the data are pooled, a total of five changes are present on the tree.

It turns out that the tree we just dealt with is not the most parsimonious tree.

It is possible to build a tree that has only three changes [it is impossible to have fewer than three changes].

Parsimony – Fitch algorithm

In the previous example it was easy to see the minimum number of changes needed to make a most parsimonious tree.

For larger trees this is not so simple to do.

The Fitch algorithm can be used to determine the minimum number of changes necessary for a given tree.

The Fitch algorithm begins at the branch tips of a tree and proceeds towards the base of the tree.

A running count is kept of the number of character changes needed.

As we proceed down the tree, each internal node is assigned one or more character states.

Two rules are used to assign character states at nodes.

Rule 1. If the two daughters of a node share no states in common, we assign to the node all possible states for both daughters.

In other words the set of possible traits at the node is the union of the sets of possible traits for daughters 1 and 2.

In this case we increase the tally of character changes by one.

Rule 2. If the daughters of a node share one or more possible states of a trait, then we assign the shared states to the node.

In other words we assign the intersection of the sets of possible states for each daughter to the node.

In this case we do not increase the tally of character changes.
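The two rules can be sketched in a few lines of Python. This is a minimal illustration, not a full phylogenetics tool: it assumes a rooted, fully bifurcating tree written as nested tuples, a single character, and hypothetical species names (`sp1` to `sp4`). Recursion visits the tips first and works towards the base, which matches the order described above.

```python
def fitch(tree, tip_states):
    """Return (state_set, changes) for one character on a rooted binary tree.

    tree is either a tip name (str) or a (left, right) tuple of subtrees.
    tip_states maps each tip name to its observed character state.
    """
    if isinstance(tree, str):
        # A tip: its state set is the observed state, with no changes so far.
        return {tip_states[tree]}, 0

    left_set, left_changes = fitch(tree[0], tip_states)
    right_set, right_changes = fitch(tree[1], tip_states)
    changes = left_changes + right_changes

    shared = left_set & right_set
    if shared:
        # Rule 2: the daughters share states, so keep the intersection
        # and leave the tally unchanged.
        return shared, changes
    # Rule 1: no shared states, so take the union and add one change.
    return left_set | right_set, changes + 1


# Hypothetical example: one species (sp4) differs from the other three.
tree = (("sp1", "sp2"), ("sp3", "sp4"))
states = {"sp1": "A", "sp2": "A", "sp3": "A", "sp4": "T"}
root_set, n_changes = fitch(tree, states)
print(root_set, n_changes)
```

For this example the node above `sp3` and `sp4` receives the union of A and T (one change), and the root receives the intersection, A, so the tree needs a single change.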

The Fitch algorithm just tells us the minimum number of changes needed for a given tree.

It does not tell us if a different tree would have fewer.

In order to compare different trees to find the most parsimonious we would have to repeat the Fitch process for all the trees.
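As a sketch of that comparison, the snippet below scores each candidate tree by running the Fitch rules on every trait and summing the changes. The data matrix is hypothetical (three two-state traits for four species named `sp1` to `sp4`), and only the three ways of pairing up four species are compared rather than every possible rooted tree.

```python
def fitch(tree, column):
    """Return (state_set, changes) for one character on a rooted binary tree."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_changes = fitch(tree[0], column)
    right_set, right_changes = fitch(tree[1], column)
    shared = left_set & right_set
    if shared:
        # Shared states: intersection, no extra change.
        return shared, left_changes + right_changes
    # No shared states: union, one extra change.
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, matrix):
    """Total changes over all characters, each character scored separately."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {tip: states[i] for tip, states in matrix.items()})[1]
        for i in range(n_chars)
    )


# Hypothetical data: each string holds the states of three traits.
matrix = {"sp1": "001", "sp2": "110", "sp3": "000", "sp4": "110"}

candidates = {
    "((sp1,sp2),(sp3,sp4))": (("sp1", "sp2"), ("sp3", "sp4")),
    "((sp1,sp3),(sp2,sp4))": (("sp1", "sp3"), ("sp2", "sp4")),
    "((sp1,sp4),(sp2,sp3))": (("sp1", "sp4"), ("sp2", "sp3")),
}
scores = {name: tree_length(tree, matrix) for name, tree in candidates.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(name, score)
print("most parsimonious:", best)
```

With these made-up data, two of the candidate trees need five changes and the tree pairing `sp1` with `sp3` needs only three, so it is the most parsimonious of the candidates examined.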

Distance Methods

Another approach to building phylogenetic trees is to use distance methods.

In this approach, pairwise distances (where distance is a measure of morphological or genetic difference between species) are calculated and used in tree construction.

Distances can be:

› Counts of the number of character differences between species.

› Based on morphological measurements.

In living species, trees are most commonly built from counts of base pair differences in DNA sequences or of differences in the amino acids they code for.

Sequence alignment

Because insertion/deletion mutations occur and can shift the reading frame of a length of DNA, sequences sometimes need to be aligned before they are used to build a phylogenetic tree.
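A small sketch shows why alignment matters when counting differences. The sequences are hypothetical, the alignment is supplied by hand rather than computed, and gap positions are simply skipped, which is a simplifying assumption rather than a standard scoring rule.

```python
def count_differences(seq_a, seq_b):
    """Count mismatched positions in two equal-length sequences, skipping gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(
        1 for a, b in zip(seq_a, seq_b)
        if a != b and "-" not in (a, b)
    )


# Hypothetical sequences: seq_b carries a single inserted G after position 3.
seq_a = "ACGTACGT"
seq_b = "ACGGTACGT"

# Naive comparison: line the sequences up from the first base and truncate.
unaligned = count_differences(seq_a, seq_b[: len(seq_a)])

# Hand-made alignment: a gap ("-") in seq_a marks the inserted base.
aligned = count_differences("ACG-TACGT", seq_b)

print(unaligned, aligned)
```

Without alignment, the one insertion shifts every later base and produces five apparent differences; with the gap in place there are no mismatches, only the single insertion event.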

Distance methods

Once distance measures have been calculated, the pairwise measures (differences between individual pairs of species) are arranged into a distance matrix.
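As an illustration, the following sketch builds a distance matrix from counts of base differences. The four aligned sequences and the species names are hypothetical, and the raw count of differing sites is only one of several possible distance measures.

```python
from itertools import combinations


def distance_matrix(sequences):
    """Return a nested dict of pairwise difference counts between sequences."""
    names = list(sequences)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        # Count the sites at which the two aligned sequences differ.
        d = sum(x != y for x, y in zip(sequences[a], sequences[b]))
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Hypothetical aligned sequences of equal length.
sequences = {
    "sp1": "ACGTACGT",
    "sp2": "ACGTACGA",
    "sp3": "ACCTTCGA",
    "sp4": "TCCTTCGA",
}
matrix = distance_matrix(sequences)

# Print the matrix as a simple table.
print("     " + "  ".join(sequences))
for a in sequences:
    print(a, "  ".join(f"{matrix[a][b]:>3}" for b in sequences))
```

The matrix is symmetric with zeros on the diagonal; here `sp1` and `sp2` differ at one site, while `sp1` and `sp4` differ at four.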

Once distance measures are tabulated, we need to figure out how to arrange these data on a tree and decide how long to make the branches.

For four species there is only one basic tree shape and only three pairwise species arrangements.
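The three arrangements can be listed explicitly. The sketch below also picks one of them using a simple criterion that is an assumption here, not something stated in the slides: for distances that fit a tree exactly, the correct pairing is the one whose two within-pair distances have the smallest sum. The distances and species names are hypothetical.

```python
def pairings(taxa):
    """Return the three ways of splitting four taxa into two pairs."""
    a, b, c, d = taxa
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def best_pairing(taxa, dist):
    """Pick the pairing with the smallest sum of within-pair distances."""
    def within_pair_sum(pairing):
        (w, x), (y, z) = pairing
        return dist[frozenset((w, x))] + dist[frozenset((y, z))]
    return min(pairings(taxa), key=within_pair_sum)


taxa = ("sp1", "sp2", "sp3", "sp4")

# Hypothetical pairwise distances, keyed by unordered pair of species.
dist = {
    frozenset(("sp1", "sp2")): 1,
    frozenset(("sp1", "sp3")): 3,
    frozenset(("sp1", "sp4")): 4,
    frozenset(("sp2", "sp3")): 2,
    frozenset(("sp2", "sp4")): 3,
    frozenset(("sp3", "sp4")): 1,
}

for pairing in pairings(taxa):
    print(pairing)
print("chosen:", best_pairing(taxa, dist))
```

For these distances the within-pair sums are 2, 6 and 6, so the arrangement grouping `sp1` with `sp2` and `sp3` with `sp4` is chosen.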

There are multiple statistical procedures that can be used to construct trees using distance data. The details of these are beyond the scope of this class.

However, the aim of all of them is to find a tree topology (or structure) in which each pairwise distance in the tree is as close as possible to that in the data matrix.
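To make "as close as possible" concrete, the sketch below measures how far a candidate tree is from the data by summing the squared differences between tree distances and observed distances. The sum of squares is an assumed closeness measure chosen for illustration, and the branch lengths, observed distances, and node names (`u` and `v` for the two internal nodes) are all hypothetical.

```python
from itertools import combinations


def path_lengths(edges, tips):
    """Return {frozenset({a, b}): path length} for every pair of tips."""
    neighbours = {}
    for (u, v), length in edges.items():
        neighbours.setdefault(u, []).append((v, length))
        neighbours.setdefault(v, []).append((u, length))

    def distances_from(start):
        # Depth-first walk; in a tree each node is reached by a unique path.
        seen = {start: 0.0}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt, length in neighbours[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + length
                    stack.append(nxt)
        return seen

    return {
        frozenset((a, b)): distances_from(a)[b]
        for a, b in combinations(tips, 2)
    }


def misfit(edges, observed):
    """Sum of squared differences between tree and observed distances."""
    tips = sorted({tip for pair in observed for tip in pair})
    tree_d = path_lengths(edges, tips)
    return sum((tree_d[pair] - d) ** 2 for pair, d in observed.items())


# Hypothetical unrooted tree: sp1 and sp2 join at u, sp3 and sp4 join at v.
edges = {
    ("sp1", "u"): 1, ("sp2", "u"): 1, ("u", "v"): 3,
    ("sp3", "v"): 1, ("sp4", "v"): 2,
}

# Hypothetical observed distance matrix.
observed = {
    frozenset(("sp1", "sp2")): 2, frozenset(("sp1", "sp3")): 5,
    frozenset(("sp1", "sp4")): 7, frozenset(("sp2", "sp3")): 5,
    frozenset(("sp2", "sp4")): 6, frozenset(("sp3", "sp4")): 3,
}

print(misfit(edges, observed))
```

Here the tree reproduces five of the six observed distances exactly and is off by one unit for `sp1` and `sp4`, giving a misfit of 1.0. A tree-building procedure would search over topologies and branch lengths to make this kind of quantity as small as possible.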

One philosophical objection to trees built using distance methods is that they do not explicitly incorporate underlying evolutionary relationships.

They rely on similarity measures and assume that similarity reflects homology, but analogous traits may sometimes be included.

How many trees are there?

We have spent a lot of time looking at ways of assessing how well trees are supported by data.

However, the big challenge in building phylogenies is identifying potentially useful trees from the huge number of potential trees.

It turns out that the number of potential phylogenetic trees increases faster than exponentially with the number of taxa in the tree.
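The growth can be made concrete with the standard counting formulas for fully bifurcating trees: (2n - 5)!! unrooted trees and (2n - 3)!! rooted trees for n taxa, where !! is the double factorial. The function names below are chosen for illustration.

```python
def double_factorial(n):
    """Product n * (n - 2) * (n - 4) * ... down to 1 or 2."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def n_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    return double_factorial(2 * n_taxa - 5)


def n_rooted_trees(n_taxa):
    """Number of rooted bifurcating trees: (2n - 3)!! for n >= 2."""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    return double_factorial(2 * n_taxa - 3)


for n in (4, 5, 10, 20):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

Four taxa give 3 unrooted and 15 rooted trees, while ten taxa already give 2,027,025 unrooted and 34,459,425 rooted trees.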

The challenge for phylogeneticists, who cannot search every possible tree, is to develop strategies to search only for plausible trees.

Computationally intensive algorithms are used to do this, but the underlying methodologies are beyond the scope of this class.

Statistical confidence in phylogenies

Phylogenetic trees are hypotheses about the relationships between taxa.

Once a tree is constructed how much confidence can we have that the tree (or some part of it) is correct?

This is an issue of statistical confidence.

There are a number of techniques that scientists have developed to measure how well the data support a given tree.

One of the most widely used is bootstrap resampling.

Bootstrap resampling

Bootstrap resampling is based on the idea that the data set a phylogeny is built from is itself only one possible set of data that the tree could have been built with.

How sensitive is the tree’s structure to the set of data we used? If we had used a similar but not identical set of data would we have produced the same tree?

To carry out a bootstrap analysis we simply resample from our original character matrix.

We randomly pick traits, with replacement, from our data set until we have a new data matrix of the same size, and use it to build a phylogenetic tree. That tree is then compared to the original tree.

After repeated bootstrap resamplings we see how often the new trees match the original tree.

If resampled trees match the original tree 90% of the time we say the tree has 90% bootstrap support.
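The procedure can be sketched for a toy case. Several things here are assumptions made for illustration: the character matrix is hypothetical, only the three pairings of four species are considered as candidate trees, trees are chosen by parsimony using the Fitch rules, and ties go to the first candidate listed. Real analyses use far larger data sets and dedicated software.

```python
import random

# The three candidate trees for four hypothetical species.
TREES = [
    (("sp1", "sp2"), ("sp3", "sp4")),
    (("sp1", "sp3"), ("sp2", "sp4")),
    (("sp1", "sp4"), ("sp2", "sp3")),
]


def fitch(tree, column):
    """Return (state_set, changes) for one character on a rooted binary tree."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], column), fitch(tree[1], column)
    shared = ls & rs
    return (shared, lc + rc) if shared else (ls | rs, lc + rc + 1)


def to_columns(matrix):
    """Turn {species: state string} into a list of per-character dicts."""
    n_chars = len(next(iter(matrix.values())))
    return [{sp: s[i] for sp, s in matrix.items()} for i in range(n_chars)]


def best_tree(columns):
    """Candidate tree needing the fewest changes (ties: first listed)."""
    return min(TREES, key=lambda t: sum(fitch(t, c)[1] for c in columns))


def bootstrap_support(matrix, n_replicates=1000, seed=0):
    """Fraction of resampled data sets that give the original tree."""
    rng = random.Random(seed)
    columns = to_columns(matrix)
    original = best_tree(columns)
    matches = 0
    for _ in range(n_replicates):
        # Draw characters with replacement, keeping the matrix the same size.
        resampled = [rng.choice(columns) for _ in columns]
        if best_tree(resampled) == original:
            matches += 1
    return original, matches / n_replicates


# Hypothetical data: four characters favour (sp1,sp2), one favours (sp1,sp3),
# and one is identical in all species.
matrix = {"sp1": "AAAAAC", "sp2": "AAAATC", "sp3": "TTTTAC", "sp4": "TTTTTC"}
tree, support = bootstrap_support(matrix)
print(tree, support)
```

Multiplying the returned fraction by 100 gives the bootstrap support as a percentage. With only six characters the value is sensitive to the random draws, which is one reason real analyses rely on many characters and many replicates.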

Example of Bootstrap analysis

For a considerable period of time before widespread genomic analysis there was controversy about whether the closest relatives of the eutherian (or placental) mammals were the marsupials or the monotremes.


In 2001 Killian et al. sequenced a large nuclear gene from 11 species of placental mammal, two marsupials and two monotremes.

Using the sequence data they constructed a phylogeny of the mammals that indicated the placental and marsupial mammals were sister groups.


To check how strongly their data supported the monophyly of the placental and marsupial mammals, Killian et al. carried out a bootstrap resampling analysis of their data.

The results showed that the marsupials and placental mammals formed a monophyletic clade in 100% of the trees.


The bootstrap analysis thus indicated that this data set strongly supported the monophyly of the placental and marsupial mammals.

Since Killian’s paper, numerous other studies of nuclear DNA have supported this conclusion.
