Phylogenetics - Distance-Based Methods CIS 667 March 11, 2204

Page 1: Phylogenetics - Distance-Based Methods

Phylogenetics - Distance-Based Methods

CIS 667 March 11, 2204

Page 2: Phylogenetics - Distance-Based Methods

Phylogenetics

• Attempts to infer the evolutionary history of a group of organisms or sequences of nucleic acids or proteins
  • Phylogenetic methods can be used for the study of evolutionary relationships between species of organisms as well as genes
  • Attempt to reconstruct evolutionary ancestors
  • Estimate time of divergence from ancestor

Page 3: Phylogenetics - Distance-Based Methods

Phylogenetic Trees

• We can use phylogenetic trees to illustrate the evolutionary relationships among groups of species or genes

• Leaf nodes of the tree are the species or genes we are comparing; interior nodes are inferred common ancestors

Page 4: Phylogenetics - Distance-Based Methods

Phylogenetic Trees

[Figure: Phylogenetic Tree for Close Human Relatives. Leaves: Humans, Orangutans, Chimpanzees, Gorillas. Interior nodes: Common Ancestor of Gorillas and Chimps; Common Ancestor of Gorillas, Chimps and Orangs; Common Ancestor of Humans and Apes]

Page 5: Phylogenetics - Distance-Based Methods

History

• Taxonomists used anatomy and physiology to group and classify organisms
  • Morphological features like presence of feathers or number of legs
• When protein sequencing, and later DNA sequencing, became common, amino acid and DNA sequences became the common way to construct trees

Page 6: Phylogenetics - Distance-Based Methods

Phylogenetic Tree constructed from amino acid (aa) sequences of the Cytochrome C protein


Page 7: Phylogenetics - Distance-Based Methods

The Big Picture

• Determine the species or genes to be studied
• Acquire homologous sequence data
• Use multiple sequence alignment software like ClustalW to align
• Clean up data by hand
• Use phylogenetic analysis software like PHYLIP based on techniques we will study
• Verify experimentally

Page 8: Phylogenetics - Distance-Based Methods

Phylogenetics

• Can be used to solve a number of interesting problems
  • Forensics
    • HIV virus mutates rapidly
  • Predicting evolution of influenza viruses
  • Predicting functions of uncharacterized genes - ortholog detection
  • Drug discovery
  • Vaccine development
    • Target inferred common ancestor

Page 9: Phylogenetics - Distance-Based Methods

Types of Data

• Two categories
  • Numerical data
    • Distance between objects
    • E.g. evolutionary distance between two species
    • Usually derived from sequence data
  • Character data
    • Each character has a finite number of states
    • E.g. number of legs = 1, 2, 4
    • DNA = {A, C, T, G}

Page 10: Phylogenetics - Distance-Based Methods

Phylogenetic Trees

• Trees are composed of nodes and branches
  • Terminal or leaf nodes correspond to a gene or organism for which data has been collected
  • Internal nodes usually represent an inferred common ancestor that gave rise to two independent lineages sometime in the past

Page 11: Phylogenetics - Distance-Based Methods

Rooted and Unrooted Trees

• Some trees make an inference about a common ancestor and the direction of evolution and some don't
  • The first type is called a rooted tree and has a single node designated as the root, which is the common ancestor
  • The second type is called an unrooted tree
    • Specifies only the relationship between nodes and says nothing about the direction of evolution

Page 12: Phylogenetics - Distance-Based Methods

Rooted and Unrooted Trees

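[Figure: a rooted tree with root R and leaves A, B, C, D, E drawn against a Time axis, next to an unrooted tree on the same leaves A, B, C, D, E]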

Page 13: Phylogenetics - Distance-Based Methods

Rooted and Unrooted Trees

• Roots can usually be assigned to unrooted trees using an outgroup
  • A species that unambiguously separated earliest from the others being studied
  • E.g. baboons in the case of humans and gorillas
• For three species there are 3 possible rooted trees, but only one possible unrooted tree

Page 14: Phylogenetics - Distance-Based Methods

Rooted and Unrooted Trees

• In fact the numbers of rooted trees ($N_R$) and unrooted trees ($N_U$) for $n$ species are

$$N_R = \frac{(2n - 3)!}{2^{\,n-2}\,(n - 2)!} \qquad N_U = \frac{(2n - 5)!}{2^{\,n-3}\,(n - 3)!}$$

Number of species   Rooted trees                        Unrooted trees
2                   1                                   1
3                   3                                   1
4                   15                                  3
5                   105                                 15
10                  34,459,425                          2,027,025
15                  213,458,046,676,875                 7,905,853,580,625
20                  8,200,794,532,637,891,559,375       221,643,095,476,699,771,875
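The closed forms for the tree counts can be checked directly. The following is a minimal sketch; the function names are illustrative and not from the slides. It evaluates both formulas with exact integer arithmetic. For n = 15 it gives 213,458,046,676,875 rooted trees.

```python
from math import factorial


def rooted_tree_count(n: int) -> int:
    """Number of rooted bifurcating trees on n labelled species (n >= 2)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_tree_count(n: int) -> int:
    """Number of unrooted bifurcating trees on n labelled species (n >= 3)."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


# Print a few rows of the table: n, rooted count, unrooted count.
for n in (3, 4, 5, 10, 15, 20):
    print(n, rooted_tree_count(n), unrooted_tree_count(n))
```

The counts grow so quickly that exhaustively scoring every tree is only practical for a handful of species, which is one motivation for the heuristic methods that follow.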

Page 15: Phylogenetics - Distance-Based Methods

Rooting Trees

• Trees can be rooted by using the outgroup method previously mentioned, or by putting the root midway between the two most distant species as determined by branch length
  • Branch length measures the amount of difference that occurred along a branch
  • Assumes the species are evolving in a clock-like manner

Page 16: Phylogenetics - Distance-Based Methods

Rooting a Tree


Page 17: Phylogenetics - Distance-Based Methods

More Tree Terminology

• Structure of a phylogenetic tree can be represented in Newick format using nested parentheses (((B, C), (D, E)), A)

• If we lack data to tell in which order two or more independent lineages arose in the past, the tree may be multifurcating (some interior node has more than two descendants); otherwise, it is bifurcating (exactly two descendants per interior node)
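The nested-parentheses notation maps naturally onto nested lists. The sketch below is an illustration only: the helper names `parse_newick` and `is_bifurcating` are made up here, and branch lengths and interior labels (which full Newick allows) are not handled. It parses the example string and tests whether every interior node has exactly two children.

```python
def parse_newick(text: str):
    """Parse nested parentheses into nested lists.

    Leaves become strings; each interior node becomes a list of its children.
    Branch lengths and interior labels are not supported in this sketch.
    """
    tokens = text.replace(" ", "").rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
            return children
        # Otherwise read a leaf label up to the next delimiter.
        start = pos
        while pos < len(tokens) and tokens[pos] not in ",()":
            pos += 1
        return tokens[start:pos]

    return node()


def is_bifurcating(tree) -> bool:
    """True if every interior node has exactly two children."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_bifurcating(child) for child in tree)


tree = parse_newick("(((B, C), (D, E)), A)")
print(tree)
print(is_bifurcating(tree))
print(is_bifurcating(parse_newick("((B, C, D), (E, A))")))
```

The first tree is bifurcating; the second contains an interior node with three children, so it is multifurcating.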

Page 18: Phylogenetics - Distance-Based Methods

Character and Distance Data

• Character-based methods use aligned DNA or protein sequences directly for tree inference

Species A   ATCGAATCGTTCCGGA
Species B   ATCCAATAGTTCCGGA
Species C   AACGAATCCTACCGGT
Species D   ATCGTTTCCAACCGCT
Species E   ATAGATTCGTTCGGGA

Page 19: Phylogenetics - Distance-Based Methods

Character and Distance Data

• Distance-based methods must transform the sequence data into a pairwise distance (dissimilarity) matrix for use during tree inference

Species   A   B   C   D
B         2   -   -   -
C         4   5   -   -
D         7   9   5   -
E         3   5   7   8
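One simple way to obtain such a matrix is to count mismatched positions between each pair of aligned sequences. The sketch below assumes that measure, which the slides do not name, and applies it to the five example sequences from the character-data slide. The helper name `mismatch_count` is illustrative. Under this count every pair agrees with the matrix except B and C, which differ at 6 positions rather than the 5 shown. The slide's matrix may therefore use a different measure or contain a transcription slip.

```python
from itertools import combinations

# Aligned example sequences from the character-data slide.
sequences = {
    "A": "ATCGAATCGTTCCGGA",
    "B": "ATCCAATAGTTCCGGA",
    "C": "AACGAATCCTACCGGT",
    "D": "ATCGTTTCCAACCGCT",
    "E": "ATAGATTCGTTCGGGA",
}


def mismatch_count(s: str, t: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(s, t) if x != y)


# Pairwise distances keyed by (species, species) in alphabetical order.
distances = {
    (p, q): mismatch_count(sequences[p], sequences[q])
    for p, q in combinations(sequences, 2)
}

for (p, q), d in distances.items():
    print(p, q, d)
```

Raw mismatch counts ignore multiple substitutions at the same site, so practical tools usually apply a correction model before building a tree.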

Page 20: Phylogenetics - Distance-Based Methods

Distance-Based Methods

• Given such an input matrix we want to find an edge-weighted tree where the leaves of the tree correspond to the species and the distance measured along the tree between two leaves matches the matrix value for that pair

Page 21: Phylogenetics - Distance-Based Methods

UPGMA

• UPGMA (Unweighted Pair Group Method with Arithmetic mean) is the oldest distance matrix method
  • Uses a distance matrix representing a measure of genetic distance between pairs of species being considered
  • Clusters the two closest species
  • Computes a new distance matrix using the arithmetic mean of the distances to the new cluster
  • Repeats until all species are grouped
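The steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `upgma` and the data layout (a dict keyed by `frozenset` pairs) are choices made here, and the input is the example distance matrix from the distance-data slide. Cluster-to-cluster distance is the arithmetic mean over all leaf pairs, and each merge is placed at half the merge distance.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster with UPGMA.

    names: list of leaf labels.
    dist:  dict mapping frozenset({x, y}) of leaf labels to their distance.
    Returns (tree, merges): tree is nested tuples; merges lists
    (cluster, height) pairs, where height is half the merge distance.
    """
    # Each active cluster (a label or nested tuple) maps to its leaves.
    clusters = {name: [name] for name in names}
    merges = []

    def avg(c1, c2):
        # Arithmetic mean of all leaf-to-leaf distances between two clusters.
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        d = avg(c1, c2)
        merged = (c1, c2)
        clusters[merged] = clusters.pop(c1) + clusters.pop(c2)
        merges.append((merged, d / 2))

    (tree,) = clusters
    return tree, merges


# Example matrix from the distance-data slide (lower triangle).
raw = {
    ("A", "B"): 2,
    ("A", "C"): 4, ("B", "C"): 5,
    ("A", "D"): 7, ("B", "D"): 9, ("C", "D"): 5,
    ("A", "E"): 3, ("B", "E"): 5, ("C", "E"): 7, ("D", "E"): 8,
}
dist = {frozenset(k): v for k, v in raw.items()}

tree, merges = upgma(list("ABCDE"), dist)
print(tree)
for cluster, height in merges:
    print(cluster, round(height, 3))
```

On this matrix the sketch joins A with B first, then adds E, then joins C with D, and finally merges the two groups at the root.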

Page 22: Phylogenetics - Distance-Based Methods

UPGMA

[Figure: UPGMA example for species A, B, C, D and E; the resulting tree lists the leaves in the order A, B, C, E, D]

Page 23: Phylogenetics - Distance-Based Methods

Estimation of Branch Length

• Scaled trees, where the lengths of the branches correspond to the degree to which sequences have diverged, are called phylograms (trees that show only branching order are called cladograms)

• If rates of evolution are assumed to be constant in all lineages then internal nodes are placed at equal distances from each of the species they give rise to on a bifurcating tree (as in the UPGMA example)

Page 24: Phylogenetics - Distance-Based Methods

UPGMA

• So UPGMA is very simple and generates rooted trees, however…

• Major weakness is that the algorithm assumes that rates of evolution are the same among different lineages

• This does not fit existing biological data, so UPGMA probably should not be used to build phylogenetic trees

Page 25: Phylogenetics - Distance-Based Methods

Transformed Distance Method

• Several distance matrix-based alternatives to UPGMA allow different rates of evolution within different lineages
  • Oldest and simplest is the transformed distance method, which takes advantage of an outgroup
  • The other lineages evolve separately from each other only after they diverge; since the outgroup diverged first, it can be used as a frame of reference for comparing how much each of the other lineages has evolved since then
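The slides do not give a formula for the transformation. One commonly stated form, assumed here, is $d'_{ij} = (d_{ij} - d_{iD} - d_{jD})/2 + \bar{d}_D$, where $D$ is the outgroup and $\bar{d}_D$ is the mean distance from the ingroup species to $D$. The sketch below applies that assumed form to the example matrix from the distance-data slide. Treating species D as the outgroup is purely for illustration, and the function name is made up.

```python
from itertools import combinations


def transformed_distances(dist, ingroup, outgroup):
    """Apply the (assumed) transformed distance formula with one outgroup.

    dist maps frozenset({x, y}) to the distance between species x and y.
    Returns a dict of transformed distances between ingroup pairs.
    """
    def d(x, y):
        return dist[frozenset((x, y))]

    # Mean distance from the ingroup species to the outgroup.
    mean_out = sum(d(k, outgroup) for k in ingroup) / len(ingroup)
    return {
        frozenset((i, j)): (d(i, j) - d(i, outgroup) - d(j, outgroup)) / 2 + mean_out
        for i, j in combinations(ingroup, 2)
    }


# Example matrix from the distance-data slide; D as outgroup is an assumption.
raw = {
    ("A", "B"): 2,
    ("A", "C"): 4, ("B", "C"): 5,
    ("A", "D"): 7, ("B", "D"): 9, ("C", "D"): 5,
    ("A", "E"): 3, ("B", "E"): 5, ("C", "E"): 7, ("D", "E"): 8,
}
dist = {frozenset(k): v for k, v in raw.items()}

transformed = transformed_distances(dist, ingroup=["A", "B", "C", "E"], outgroup="D")
for pair, value in transformed.items():
    print(sorted(pair), value)
```

The transformed matrix can then be clustered in the same way as the original one; pairs whose lineages both lie far from the outgroup are pulled closer together, which compensates for unequal rates.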

Page 26: Phylogenetics - Distance-Based Methods

Neighbor’s Relation Method

• One variant of UPGMA tries to pair species in such a way as to minimize the sum of the branch lengths
  • On a rooted tree, pairs of species separated from each other by only one node are called neighbors
  • Important relationships hold between the neighbors of a phylogenetic tree with four species

Page 27: Phylogenetics - Distance-Based Methods

Neighbor’s Relation Method

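[Figure: tree with four species A, B, C, D. A and B hang from one interior node on branches a and b; C and D hang from the other interior node on branches c and d; the two interior nodes are joined by the central branch e]

The following hold for this tree:

$$d_{AC} + d_{BD} = d_{AD} + d_{BC} = a + b + c + d + 2e = d_{AB} + d_{CD} + 2e$$

$$d_{AB} + d_{CD} < d_{AC} + d_{BD}$$

$$d_{AB} + d_{CD} < d_{AD} + d_{BC}$$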

Page 28: Phylogenetics - Distance-Based Methods

Neighbor’s Relation Method

• Consider all possible pairwise arrangements of four species, and determine which one satisfies the four-point condition (the set of 2 inequalities)

• This process can be iterated to generate a complete tree, but the process is infeasible for large sets of species
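For a single set of four species, the check amounts to comparing the three ways of splitting them into two pairs. The sketch below is illustrative: the function name is made up, and the distances for A, B, C and D are taken from the example matrix on the distance-data slide. It returns the pairing whose summed distance is strictly smaller than both alternatives.

```python
def four_point_pairing(dist, w, x, y, z):
    """Return the neighbor pairing of four species, or None if ambiguous.

    dist maps frozenset({p, q}) to the distance between species p and q.
    The chosen pairing is the one whose two within-pair distances have a
    sum strictly smaller than the sums for the other two pairings.
    """
    def d(p, q):
        return dist[frozenset((p, q))]

    sums = {
        ((w, x), (y, z)): d(w, x) + d(y, z),
        ((w, y), (x, z)): d(w, y) + d(x, z),
        ((w, z), (x, y)): d(w, z) + d(x, y),
    }
    ranked = sorted(sums.items(), key=lambda item: item[1])
    best, runner_up = ranked[0], ranked[1]
    return best[0] if best[1] < runner_up[1] else None


# Distances among A, B, C, D from the example matrix.
raw = {
    ("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 7,
    ("B", "C"): 5, ("B", "D"): 9, ("C", "D"): 5,
}
dist = {frozenset(k): v for k, v in raw.items()}

print(four_point_pairing(dist, "A", "B", "C", "D"))
```

Here the sums are 7 for AB|CD, 13 for AC|BD and 12 for AD|BC, so A and B are neighbors, as are C and D. Repeating the check over every set of four species is what makes the method costly as the number of species grows.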

Page 29: Phylogenetics - Distance-Based Methods

Neighbor-Joining Methods

• Other neighborliness approaches are available as well

• Neighbor-joining methods start with all species arranged in a star tree

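[Figure: a star tree joining species a, b, c, d and e at a single central node, next to the tree obtained after one pair of neighbors has been pulled out of the star]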

Page 30: Phylogenetics - Distance-Based Methods

Neighbor-Joining Methods

• The pair of nodes pulled out (grouped) at each iteration are chosen so that the total length of the branches on the tree is minimized

• After a pair of nodes is pulled out, it forms a cluster in the tree and is included in further rounds of iteration (and a new distance matrix is generated)

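• The pair to join is the one with the smallest value of the following criterion, which corresponds to the smallest total branch length; for nodes 1 and 2 among $N$ nodes it is calculated as: $Q_{12} = (N - 2)\,d_{12} - \sum_{i=1}^{N} d_{1i} - \sum_{i=1}^{N} d_{2i}$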
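The pair-selection step can be illustrated by evaluating the criterion $Q_{ij} = (N - 2)\,d_{ij} - \sum_k d_{ik} - \sum_k d_{jk}$ for every pair and taking the minimum. The sketch below covers only this first step of one iteration. It does not compute branch lengths or rebuild the matrix. The function name is illustrative and the input is the example matrix from the distance-data slide.

```python
from itertools import combinations


def q_matrix(names, dist):
    """Neighbor-joining criterion Q for every pair of current nodes.

    dist maps frozenset({x, y}) to the distance between nodes x and y.
    """
    n = len(names)

    def d(x, y):
        return dist[frozenset((x, y))]

    # Sum of distances from each node to all other nodes.
    row_sum = {x: sum(d(x, y) for y in names if y != x) for x in names}
    return {
        (x, y): (n - 2) * d(x, y) - row_sum[x] - row_sum[y]
        for x, y in combinations(names, 2)
    }


raw = {
    ("A", "B"): 2,
    ("A", "C"): 4, ("B", "C"): 5,
    ("A", "D"): 7, ("B", "D"): 9, ("C", "D"): 5,
    ("A", "E"): 3, ("B", "E"): 5, ("C", "E"): 7, ("D", "E"): 8,
}
dist = {frozenset(k): v for k, v in raw.items()}

q = q_matrix(list("ABCDE"), dist)
first_pair = min(q, key=q.get)
print(first_pair, q[first_pair])
```

On this matrix the smallest value is Q = -35 for the pair (C, D), so neighbor-joining would pull out C and D first, even though A and B have the smallest raw distance. This is how the method avoids UPGMA's equal-rates assumption.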