Distance based method

Page 1: Distance based method
Page 2: Distance based method

PHYLOGENETIC TREE CONSTRUCTION BY DISTANCE-BASED METHOD

Page 3: Distance based method

INTRODUCTION

A phylogenetic tree, also known as a phylogeny, is a diagram that depicts the lines of evolutionary descent of different species, organisms, or genes from a common ancestor.

• Attempts to reconstruct evolutionary ancestors
• Estimates time of divergence from an ancestor

Page 4: Distance based method

Can be used to solve a number of interesting problems:

• Forensics
    • HIV mutates rapidly
• Predicting evolution of influenza viruses
• Predicting functions of uncharacterized genes (ortholog detection)
• Drug discovery
• Vaccine development
    • Target inferred common ancestor

Page 5: Distance based method

HOW TO CONSTRUCT A PHYLOGENETIC TREE

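Step 1: Build a multiple sequence alignment from nucleotide or amino acid sequences (using MUSCLE or another alignment tool; BLAST is a pairwise search tool and is better suited to collecting homologous sequences beforehand).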

Page 6: Distance based method

Step 2: Check whether the multiple alignment reflects the evolutionary process.

Step 3: Choose the tree-building method and, depending on the method, either calculate the distances or use the alignment directly.

Step 4: Verify the result statistically.

Page 7: Distance based method

TYPES OF APPROACHES

CHARACTER BASED APPROACH

It makes use of all known evolutionary information, i.e. the individual substitutions among the sequences, to determine the most likely ancestral sequences.

Page 8: Distance based method

DISTANCE BASED APPROACH

Distance-matrix methods of phylogenetic analysis explicitly rely on a measure of "genetic distance" between the sequences being classified, and therefore they require an MSA (multiple sequence alignment) as input.

Page 9: Distance based method

Distance-based methods must transform the sequence data into a pairwise distance matrix for use during tree inference.
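As a minimal sketch of turning aligned sequences into a pairwise matrix, the example below uses the simple p-distance (proportion of differing sites). The slides do not name a distance model, so the p-distance, the toy alignment, and the function names `p_distance` and `distance_matrix` are assumptions made for illustration only.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ("-") are skipped.
    This is the simplest possible distance; it is an assumption here,
    not a model taken from the slides.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Build a symmetric dict-of-dicts matrix from {name: aligned sequence}."""
    names = list(alignment)
    dist = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        dist[x][y] = dist[y][x] = p_distance(alignment[x], alignment[y])
    return dist


# Hypothetical toy alignment (not real data).
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACGTTA",
    "D": "TCGAACG-TA",
}
matrix = distance_matrix(alignment)
print(matrix["A"]["B"])  # one difference in ten sites
```

In practice a corrected distance (one that accounts for multiple substitutions at the same site) would usually replace the raw proportion, which matches the later point that the result depends on the model used to obtain the distances.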

Page 10: Distance based method

VARIOUS DISTANCE BASED METHODS

1. UPGMA
2. NJ (Neighbor Joining)
3. FM (Fitch-Margoliash)
4. Minimum evolution

Page 11: Distance based method

UPGMA

• Stands for Unweighted Pair Group Method with Arithmetic Mean.
• Originally developed for numerical taxonomy in 1958 by Sokal and Michener.
• This method uses a sequential clustering algorithm.

Page 12: Distance based method

This method follows a clustering procedure:

(1) Assume that initially each species is a cluster on its own.

(2) Join the two closest clusters and recalculate the distance from the joined pair to every other cluster by taking the average.

(3) Repeat this process until all species are connected in a single cluster.
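The three steps above can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the code behind the slides: the matrix is a dict of dicts, merged clusters are represented as nested tuples, node heights are taken as half the merging distance, and the toy distances are hypothetical.

```python
from itertools import combinations


def upgma(dist):
    """Cluster a symmetric dict-of-dicts distance matrix with UPGMA.

    Returns (tree, height): the tree as nested tuples and a dict giving
    the height of every node (leaves are at height 0).
    """
    d = {a: dict(row) for a, row in dist.items()}  # working copy
    size = {a: 1 for a in d}       # step 1: every species is its own cluster
    height = {a: 0.0 for a in d}
    while len(d) > 1:
        # Step 2a: find the closest pair of clusters.
        a, b = min(combinations(list(d), 2), key=lambda p: d[p[0]][p[1]])
        new = (a, b)
        # Step 2b: average distance to every other cluster, weighted by
        # cluster size so that each original species counts equally.
        new_row = {
            x: (size[a] * d[a][x] + size[b] * d[b][x]) / (size[a] + size[b])
            for x in d
            if x not in (a, b)
        }
        height[new] = d[a][b] / 2
        size[new] = size[a] + size[b]
        del d[a], d[b]
        for x in d:
            del d[x][a], d[x][b]
            d[x][new] = new_row[x]
        d[new] = new_row
        # Step 3: loop until a single cluster remains.
    return next(iter(d)), height


# Hypothetical clock-like distances for four species.
dist = {
    "A": {"B": 2, "C": 6, "D": 10},
    "B": {"A": 2, "C": 6, "D": 10},
    "C": {"A": 6, "B": 6, "D": 10},
    "D": {"A": 10, "B": 10, "C": 10},
}
tree, height = upgma(dist)
print(tree)          # ('D', ('C', ('A', 'B')))
print(height[tree])  # 5.0
```

Ties between equally close pairs are broken by iteration order here, which is an arbitrary choice of this sketch.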

Page 13: Distance based method

CONSTRUCTION OF PHYLOGENETIC TREE

Page 17: Distance based method

DRAWBACK

• Strictly speaking, this algorithm is phenetic, which does not aim to reflect evolutionary descent.
• It assigns equal weight to every distance and assumes a constant molecular clock, i.e. the same rate of evolution in all lineages.
• WPGMA (Weighted Pair Group Method with Arithmetic Mean) is a similar algorithm but weights the distances differently: each merged cluster counts equally, regardless of how many species it contains.
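The difference between the two averaging rules can be written out explicitly. The notation is introduced here for illustration: A and B are the clusters being joined, X is any other cluster, and |A| is the number of species in A.

```latex
% UPGMA: size-weighted average, so every original species counts equally
d_{(A \cup B),X} = \frac{|A|\, d_{A,X} + |B|\, d_{B,X}}{|A| + |B|}

% WPGMA: plain average of the two clusters, regardless of their sizes
d_{(A \cup B),X} = \frac{d_{A,X} + d_{B,X}}{2}
```

The two rules give the same result whenever the joined clusters have equal size.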

Page 18: Distance based method

NEIGHBOUR JOINING METHOD

• Neighbor-joining methods apply general data clustering techniques to sequence analysis, using genetic distance as a clustering metric.
• Developed in 1987 by Saitou and Nei.
• The simple neighbor-joining method produces unrooted trees and does not assume a constant rate of evolution (i.e., a molecular clock) across lineages.

Page 19: Distance based method

• It begins with an unresolved star-like tree.
• Each pair is evaluated for being joined, and the sum of all branch lengths of the resulting tree is calculated.
• The pair that yields the smallest sum is considered the closest neighbors and is joined.
• A new branch is inserted between the joined pair and the rest of the tree, and the branch lengths are recalculated.
• This process is repeated until the tree is fully resolved.
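A compact sketch of this procedure follows. Instead of computing the total branch length for every candidate pair directly, it uses the commonly used Q-matrix form of the criterion, which selects the same pair; that substitution, the internal node names (`node0`, `node1`, ...), and the five-taxon toy matrix are assumptions of this example rather than details from the slides.

```python
from itertools import combinations


def neighbor_joining(dist):
    """Return an unrooted tree as a list of (node, node, branch_length) edges."""
    d = {a: dict(row) for a, row in dist.items()}  # working copy
    edges = []
    next_id = 0
    while len(d) > 2:
        n = len(d)
        # Row sums of the current matrix (diagonal excluded).
        total = {a: sum(d[a][x] for x in d if x != a) for a in d}
        # Pick the pair with the smallest Q value; this is the pair whose
        # joining gives the smallest total branch length.
        a, b = min(
            combinations(list(d), 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - total[p[0]] - total[p[1]],
        )
        new = f"node{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges += [(new, a, len_a), (new, b, len_b)]
        # Distances from the new node to every remaining node.
        new_row = {
            x: 0.5 * (d[a][x] + d[b][x] - d[a][b]) for x in d if x not in (a, b)
        }
        del d[a], d[b]
        for x in d:
            del d[x][a], d[x][b]
            d[x][new] = new_row[x]
        d[new] = new_row
    a, b = d  # the last two nodes are connected by the final branch
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical additive distances for five taxa.
dist = {
    "a": {"b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3},
}
edges = neighbor_joining(dist)
for edge in edges:
    print(edge)
```

Note that the branches leading to "a" and "b" come out with different lengths (2 and 3), which is what not assuming a molecular clock looks like in practice.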

Page 20: Distance based method

DRAWBACKS

• It produces only one tree and neglects other possible trees, which might be as good as the NJ tree, if not significantly better.
• Moreover, since errors in distance estimates are exponentially larger for longer distances, under some conditions this method will yield a biased tree.

Page 22: Distance based method

WEIGHTED NEIGHBOUR JOINING (WEIGHBOR)

• It is a more recent method, proposed in 2000 by Bruno, Socci, and Halpern.
• The Weighbor criterion consists of two terms that quantify the implications of joining a pair:
    1. an additivity term (of external branches)
    2. a positivity term (of internal branches)

Page 23: Distance based method

Weighbor gives less weight to the longer distances in the distance matrix. The resulting trees are less sensitive to specific biases than NJ trees and are relatively immune to the "long-branch attraction/distraction" drawbacks observed with other methods.

Page 24: Distance based method

FITCH-MARGOLIASH METHOD

• Proposed in 1967
• Produces unrooted trees
• Provides a criterion for fitting trees to distance matrices
• Uses a weighted least squares method for clustering based on genetic distance.
• Closely related sequences are given more weight in the tree construction process to correct for the increased inaccuracy in measuring distances between distantly related sequences.
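The weighted least squares idea can be illustrated by scoring a candidate tree against the observed distances. The sketch below assumes the usual Fitch-Margoliash weighting of 1/D² for an observed distance D, which is what gives short distances more influence. It only evaluates the criterion for a given tree; it does not optimize branch lengths or search topologies, and the function names and toy numbers are hypothetical.

```python
from itertools import combinations


def path_lengths(edges, start):
    """Sum of branch lengths from `start` to every other node of the tree."""
    adjacency = {}
    for u, v, length in edges:
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))
    seen = {start: 0.0}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour, length in adjacency[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + length
                stack.append(neighbour)
    return seen


def fitch_margoliash_score(observed, edges):
    """Sum over taxon pairs of (observed - tree distance)^2 / observed^2."""
    score = 0.0
    for x, y in combinations(list(observed), 2):
        tree_distance = path_lengths(edges, x)[y]
        score += (observed[x][y] - tree_distance) ** 2 / observed[x][y] ** 2
    return score


# Hypothetical observed distances for four taxa.
observed = {
    "A": {"B": 4, "C": 6, "D": 5},
    "B": {"A": 4, "C": 7, "D": 6},
    "C": {"A": 6, "B": 7, "D": 3},
    "D": {"A": 5, "B": 6, "C": 3},
}
# Two candidate unrooted trees with internal nodes u and v.
tree_ab_cd = [("u", "A", 1), ("u", "B", 2), ("u", "v", 3), ("v", "C", 2), ("v", "D", 1)]
tree_ac_bd = [("u", "A", 1), ("u", "C", 2), ("u", "v", 3), ("v", "B", 2), ("v", "D", 1)]

good = fitch_margoliash_score(observed, tree_ab_cd)
bad = fitch_margoliash_score(observed, tree_ac_bd)
print(good, bad)  # the tree grouping A with B fits far better
```

A lower score means a better fit, so a full search would keep the candidate tree with the smallest value.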

Page 26: Distance based method

MINIMUM EVOLUTION

• First described by Kidd & Sgaramella-Zonta in 1971, then later by Rzhetsky & Nei in 1992.
• Based on the assumption that the tree with the smallest sum of branch length estimates is most likely to be the true one.
• Unrooted metric trees

Page 27: Distance based method

In ME, the tree that minimizes the total tree length, i.e. the sum of the lengths of its branches, is regarded as the estimate of the phylogeny:

    S = \sum_{i=1}^{2n-3} v_i

where n is the number of taxa in the tree, v_i is the length of the ith branch, and 2n-3 is the number of branches in an unrooted bifurcating tree.

Page 28: Distance based method

DRAWBACKS

• In principle, all different tree topologies have to be investigated to find the minimum tree. However, this is impossible in practice because of the explosive increase in the number of tree topologies as taxa are added.
• Slower than clustering methods.
• Information is lost when characters are transformed to distances.
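To give a feel for the growth in the number of topologies mentioned in the first drawback, the short sketch below counts the distinct unrooted bifurcating trees for n taxa using the standard formula (2n-5)!!. The function name is hypothetical.

```python
def unrooted_topologies(n_taxa):
    """Number of distinct unrooted bifurcating trees: (2n-5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, unrooted_topologies(n))
```

Ten taxa already give 2,027,025 candidate trees, which is why practical ME implementations rely on heuristic searches rather than exhaustive enumeration.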

Page 29: Distance based method

ADVANTAGES OF DISTANCE BASED APPROACH

• Less sensitive to variations in evolutionary rate than cluster analysis
• Fast
• Can handle many sequences at a time
• Produces a reasonable estimate of phylogeny

Page 30: Distance based method

DISADVANTAGES OF DISTANCE BASED APPROACH

More sensitive than Parsimony or Maximum Likelihood to systematic errors.

The relationship between the individual characters and the tree is lost in the process of reducing characters to distances.

Strength of the technique is dependent on accuracy of the distance estimate, and thus dependent on the model used to obtain the distance matrix.

Page 31: Distance based method

THANK YOU