Phylogenetic Analysis

Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure


General comments on phylogenetics

• Phylogenetics is the branch of biology that deals with evolutionary relatedness

• Uses some measure of evolutionary relatedness: e.g., morphological features

• Phylogenetics on sequence data is an attempt to reconstruct the evolutionary history of those sequences

• Relationships between individual sequences are not necessarily the same as those between the organisms they are found in

• The ultimate goal is to be able to use sequence data from many sequences to give information about the phylogenetic history of organisms

• Phylogenetic relationships are usually depicted as trees, with branches representing ancestors of “children”; the tips at the bottom of the tree (individual organisms) are leaves. Individual branch points are nodes.

Phylogenetic trees

[Figure: a rooted tree with leaves A, B, C and D, drawn along a time axis, and an unrooted tree with the same four leaves, for which the direction of time is unknown]

• We will only consider binary trees: edges split only into two branches (daughter edges)

• Rooted trees have an explicit ancestor; the direction of time is explicit in these trees

• Unrooted trees do not have an explicit ancestor; the direction of time is undetermined in such trees

Types of phylogenetic analysis methods

• Phenetic (distance methods): trees are constructed based on observed characteristics, not on evolutionary history

• Cladistic (parsimony and maximum likelihood methods): trees are constructed based on fitting observed characteristics to some model of evolutionary history

Similarity and Homology

• The evolutionary relationship between sequences is inferred from the similarity of the sequences

• Similarity is a measurable quantity (e.g., % identity, alignment score, etc.)

• Homology is the inference from sequence similarity data that sequences are evolutionarily related

Sequence alignments

• Aligning sequences gives information about

    – Similarity

    – Areas of sequences that are conserved through evolution

The real problem …

• How do we compare sequences?

    Seq 1: CTGCACTA
    Seq 2: CACTA
    or     C---ACTA

• Scoring tries to approximate evolution: scores for substitutions and for gaps (insertions/deletions)

• Score = sum of terms for substitutions and for gaps (sequence as character string)

Sequence alignment I

• Simplest scoring: 1 for match, 0 for no match

    CTGCACTA
       CACTA        Score = 5

    CTGCACTA
    C---ACTA        Score = 5

Sequence alignment II

• Slightly more advanced scoring: +1 for match, 0 for no match, -1 for gap

    CTGCACTA
       CACTA        Score = 5

    CTGCACTA
    C---ACTA        Score = 2
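The two scoring schemes can be checked with a short script. This is only a sketch: the function name `score_alignment` and its parameters are illustrative, and it assumes (as the scores on the slide imply) that gaps at the ends of the shorter sequence are not penalized.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1, free_end_gaps=True):
    """Score two already-aligned strings of equal length ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(top, bottom))
    if free_end_gaps:
        # Assumption: leading and trailing gap columns cost nothing.
        while columns and "-" in columns[0]:
            columns.pop(0)
        while columns and "-" in columns[-1]:
            columns.pop()
    total = 0
    for x, y in columns:
        if x == "-" or y == "-":
            total += gap          # internal gap (insertion/deletion)
        elif x == y:
            total += match        # identical residues
        else:
            total += mismatch     # substitution
    return total


print(score_alignment("CTGCACTA", "---CACTA"))          # ungapped placement
print(score_alignment("CTGCACTA", "C---ACTA"))          # internal gap, -1 each
print(score_alignment("CTGCACTA", "C---ACTA", gap=0))   # simplest scheme
```

Setting `gap=0` reproduces the simplest scheme, where both placements score the same; with `gap=-1` the three internal gap columns cost three points.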

Identity scoring matrices: top, simple form; below, with mismatch penalty

         G   C   A   T
    G    1   0   0   0
    C    0   1   0   0
    A    0   0   1   0
    T    0   0   0   1

         G   C   A   T
    G    1  -1  -1  -1
    C   -1   1  -1  -1
    A   -1  -1   1  -1
    T   -1  -1  -1   1

In-class exercise II

• Using the “advanced scoring method” calculate the scores for the following pairs of nucleotide sequences:

    CCTGGGCTATGC
    CAGGGTT-TGC

    CCTGGGCTATGC
    CA-GGG-TTTGC

What about proteins?

• Chemistry of amino acids means that some substitutions in the sequence are better than others

• Substitution matrix: empirically derived scores for frequency of substitution of each amino acid for all 19 others.

BLOSUM62 substitution matrix

[Figure: the BLOSUM62 substitution matrix]

In-class exercise III

• Using the BLOSUM62 substitution matrix and a gap penalty of -2, score the following pairs of protein sequences (do not penalize end gaps)

    YIHMNVFLSFML
    RVGAANFPNPRL

    YIHMNVFLSFML
    FIHMNLFVSFML

    YIHMNVFLSFML
    IHMNLFV--SFML

    YIHMNVFLSFML
    IVLSMMFFLNHY

Dynamic programming: strategy

• Break alignment problem into small pieces

• Optimize first piece

• Then extend into second piece; since first piece is optimized already, program only needs to optimize extension

• Continue until end of comparison

              H     E     E
        0    -6   -12   -18
    H  -6    10     4    -2
    E -12     4    16    10
    A -18    -2    10    15
    E -24    -8     4    16
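The matrix above can be reproduced with a global (Needleman-Wunsch style) dynamic programming fill. The sketch below is not taken from the slides: the gap penalty of -6 and the scores H-H = 10, E-E = 6 and A-E = -1 are inferred from the table itself, while H-E = 0 and A-H = -2 are assumptions (the table only constrains them to be at most 10). The function and variable names are illustrative.

```python
# Substitution scores keyed by the unordered pair of residues.
# H-E and A-H are assumed values; the others are inferred from the table.
SCORES = {
    frozenset("H"): 10,
    frozenset("E"): 6,
    frozenset("HE"): 0,
    frozenset("AE"): -1,
    frozenset("AH"): -2,
}
GAP = -6  # linear gap penalty, inferred from the first row and column


def fill_matrix(row_seq, col_seq, scores=SCORES, gap=GAP):
    """Return the dynamic programming matrix for a global alignment."""
    n, m = len(row_seq), len(col_seq)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap              # leading gaps in col_seq
    for j in range(1, m + 1):
        f[0][j] = j * gap              # leading gaps in row_seq
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = frozenset((row_seq[i - 1], col_seq[j - 1]))
            diagonal = f[i - 1][j - 1] + scores[pair]  # align two residues
            up = f[i - 1][j] + gap                     # gap in col_seq
            left = f[i][j - 1] + gap                   # gap in row_seq
            f[i][j] = max(diagonal, up, left)
    return f


matrix = fill_matrix("HEAE", "HEE")
for row in matrix:
    print(row)
```

Each cell only needs its three already-computed neighbours, which is the "optimize the first piece, then extend" strategy; the bottom-right cell holds the score of the best global alignment.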


Why multiple alignments?

• Alignment of more than two sequences

• Usually gives better information about conserved regions and function (more data)

• Better estimate of significance when using a sequence of unknown function

• Must use multiple alignments when establishing phylogenetic relationships

Dynamic programming extended to many dimensions?

• No: uses up too much computer time and space

• E.g. 200 amino acids in a pairwise alignment: must evaluate $4 \times 10^4$ matrix elements

• If 3 sequences, $8 \times 10^6$ matrix elements

• If 6 sequences, $6.4 \times 10^{13}$ matrix elements
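These counts are simply the sequence length raised to the number of sequences. A quick check, with `dp_cells` as an illustrative helper name:

```python
def dp_cells(length, n_sequences):
    """Number of matrix elements for n sequences of the same length."""
    return length ** n_sequences


for k in (2, 3, 6):
    # Print the count in scientific notation for comparison with the slide.
    print(k, "sequences:", f"{dp_cells(200, k):.1e}", "matrix elements")
```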

• Need to find a more efficient method

• Sacrifice certainty of the optimum alignment for certainty of a good alignment, obtained faster

Feng-Doolittle algorithm

• Does all pairwise alignments and scores them

• Converts pairwise scores to “distances”

• $D = -\log S_{eff} = -\log [(S_{obs} - S_{rand})/(S_{max} - S_{rand})]$

• $S_{obs}$ = pairwise alignment score

• $S_{rand}$ = expected score for random alignment

• $S_{max}$ = average of self-alignments of the two sequences

• As $S_{obs}$ approaches $S_{rand}$ (increasing evolutionary distance), $S_{eff}$ goes down; to make the distance measure positive, use the $-\log$
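A minimal sketch of the score-to-distance conversion follows. The function name is illustrative, the scores in the example are made-up numbers, and the natural logarithm is an assumption (the slides do not state the base).

```python
import math


def feng_doolittle_distance(s_obs, s_rand, s_max):
    """D = -log(S_eff), with S_eff = (S_obs - S_rand) / (S_max - S_rand)."""
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        # The observed score is no better than random: distance is undefined.
        raise ValueError("effective score must be positive")
    return -math.log(s_eff)


# Hypothetical scores: identical sequences give D = 0, and D grows as the
# observed score falls toward the random expectation.
print(feng_doolittle_distance(s_obs=100, s_rand=10, s_max=100))
print(feng_doolittle_distance(s_obs=55, s_rand=10, s_max=100))
print(feng_doolittle_distance(s_obs=19, s_rand=10, s_max=100))
```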

• Once the distances have been calculated, construct a guide tree (more in the phylogeny class); it tells what order to group the sequences

• Sequences can be aligned with sequences or groups; groups can be aligned with groups

• Sequence-sequence alignments: dynamic programming

• Sequence-group alignments: all possible pairwise alignments between sequence and group are tried, highest scoring pair is how it gets aligned to group

• Group-group alignments: all possible pairwise alignments of sequences between groups are tried; highest scoring pair is how groups get aligned

Example

[Figure: Seq1, Seq2, Seq3, Seq4 and Seq5 are combined step by step through Alignment 1, Alignment 2 and Alignment 3 into the final alignment]

Notice that this method does not guarantee the optimum alignment; just a good one.

Gaps are preserved from alignment to alignment: “once a gap, always a gap”

Distance methods

• Measuring distance: just as when we talked about multiple alignment, distance represents all the differences at the various positions; these differences can be treated as equal or weighted according to empirical knowledge of substitution rates

• Another way to say this is that there is a set of distances $d_{ij}$ between each pair of sequences $i, j$ in the dataset. $d_{ij}$ can be the fraction $f$ of sites $u$ where residues $x_i$ and $x_j$ differ; or $d_{ij}$ can be such a fraction but weighted in some way (e.g. Jukes-Cantor distance)
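Both kinds of distance can be sketched in a few lines. The function names and the two example sequences are illustrative; the sketch assumes ungapped nucleotide sequences of equal length and uses the standard Jukes-Cantor correction $d = -\frac{3}{4}\ln(1 - \frac{4}{3}f)$.

```python
import math


def p_distance(seq_i, seq_j):
    """Fraction f of sites at which two aligned sequences differ."""
    if len(seq_i) != len(seq_j):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_i, seq_j) if a != b)
    return differences / len(seq_i)


def jukes_cantor(f):
    """Jukes-Cantor distance for nucleotides; only defined for f < 0.75."""
    return -0.75 * math.log(1 - 4 * f / 3)


f = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # two of ten sites differ
print(f, jukes_cantor(f))
```

The corrected distance is always somewhat larger than the raw fraction because it allows for multiple substitutions at the same site.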

Clustering algorithms

• UPGMA: this is the distance clustering method that is used in pileup to make the guide tree

• $d_{ij}$ is the average distance between pairs of sequences found in two clusters, $C_i$ and $C_j$.

• Text’s notation: $|C_i|$ = number of sequences in $C_i$

• The algorithm in the text means just what we said before: find the closest distance between two sequences, cluster those; then find the next closest distance, cluster those; as sequences are added to existing clusters, find the average distance between existing clusters

• Work through the notation!

• UPGMA assumes a molecular clock mechanism of evolution
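The clustering loop can be sketched directly from the definition of the average distance between clusters. This is an illustrative implementation, not the one used by pileup: the function names, the dict-of-dicts input format and the example distances are all assumptions, and the node height is taken as half the cluster distance, which is where the molecular clock assumption enters.

```python
from itertools import combinations


def average_distance(ci, cj, d):
    """Mean of d[p][q] over all sequences p in cluster ci and q in cluster cj."""
    return sum(d[p][q] for p in ci for q in cj) / (len(ci) * len(cj))


def upgma(d):
    """Cluster a symmetric dict-of-dicts distance matrix.

    Returns the merges in order as (cluster_a, cluster_b, node_height).
    """
    clusters = [(name,) for name in sorted(d)]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        ci, cj = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(pair[0], pair[1], d))
        height = average_distance(ci, cj, d) / 2
        merges.append((ci, cj, height))
        clusters = [c for c in clusters if c not in (ci, cj)] + [ci + cj]
    return merges


# Hypothetical clock-like distances between four sequences.
distances = {
    "A": {"B": 2, "C": 6, "D": 10},
    "B": {"A": 2, "C": 6, "D": 10},
    "C": {"A": 6, "B": 6, "D": 10},
    "D": {"A": 10, "B": 10, "C": 10},
}
for step in upgma(distances):
    print(step)
```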

• Neighbor-joining: corrects for UPGMA’s assumption of the same rate of evolution for each branch by modifying the distance matrix to reflect different rates of change.

• The net difference between sequence $i$ and all other sequences is

• $r_i = \sum_k d_{ik}$

• The rate-corrected distance matrix is then

• $M_{ij} = d_{ij} - (r_i + r_j)/(n - 2)$

• Join the two sequences whose $M_{ij}$ is minimal; then calculate the distance from this new node $k$ to all other sequences $m$ using

• $d_{km} = (d_{im} + d_{jm} - d_{ij})/2$

• Again correct for rates and join nodes.
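One round of neighbor-joining, following the three formulas above, might look like the sketch below. The function name, the dict-of-dicts input and the example distances are illustrative assumptions; the example comes from a made-up tree in which B and D sit on long branches, so the smallest raw distance (A to C) does not identify true neighbors.

```python
from itertools import combinations


def nj_step(d):
    """Join one pair of nodes; return the pair and the reduced matrix."""
    nodes = sorted(d)
    n = len(nodes)
    # Net difference r_i between node i and all other nodes.
    r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
    # Rate-corrected distances M_ij.
    m = {(i, j): d[i][j] - (r[i] + r[j]) / (n - 2)
         for i, j in combinations(nodes, 2)}
    i, j = min(m, key=m.get)
    new = f"({i},{j})"
    rest = [k for k in nodes if k not in (i, j)]
    reduced = {a: {b: d[a][b] for b in rest if b != a} for a in rest}
    reduced[new] = {}
    for k in rest:
        # Distance from the new node to every remaining node.
        dist = (d[i][k] + d[j][k] - d[i][j]) / 2
        reduced[new][k] = dist
        reduced[k][new] = dist
    return (i, j), reduced


# Hypothetical distances from a tree with branch lengths A=1, B=3, C=1, D=3
# and an internal branch of length 1 separating (A,B) from (C,D).
distances = {
    "A": {"B": 4, "C": 3, "D": 5},
    "B": {"A": 4, "C": 5, "D": 7},
    "C": {"A": 3, "B": 5, "D": 4},
    "D": {"A": 5, "B": 7, "C": 4},
}
pair, reduced = nj_step(distances)
print(pair, reduced)
```

In this example the pairs (A, B) and (C, D) tie for the minimal corrected distance and the first is chosen; a clustering based on raw distances alone would have started with A and C.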

In-class exercise I

• Retrieve the file named phylo2 from bioinfI.list in my directory

• Open it in the editor, select all the sequences

• Select Functions > Evolution > PAUPSearch; in Tree Optimality Criterion choose distance; in Method for Obtaining Best Tree choose heuristic. Leave everything else as default (make sure bootstrap option is not selected)

• Select Run. Inspect output

Parsimony methods

• Parsimony methods are based on the idea that the most probable evolutionary pathway is the one that requires the smallest number of changes from some ancestral state

• For sequences, this implies treating each position separately and finding the minimal number of substitutions at each position

Example of parsimonious tree building

• Tree on left requires only one change, tree on right requires two: left tree is most parsimonious

• Parsimony methods assign a cost to each tree available to the dataset, then screen trees available to the dataset and select the most parsimonious

• Screening all the trees available to even a smallish dataset would take too much time; the branch and bound method builds trees with increasing numbers of leaves but abandons a topology whenever the current partial tree already has a bigger cost than the best complete tree found so far
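The cost of one tree for one alignment position can be counted with Fitch's algorithm, a standard method that the slides do not name; it is used here only as an illustration. The function name, the nested-tuple tree format and the example site are assumptions.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimal number of changes).

    tree is either a leaf name or a pair (left_subtree, right_subtree);
    states maps each leaf name to its residue at one alignment position.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: one substitution is required on this node.
    return left_set | right_set, left_cost + right_cost + 1


site = {"A": "A", "B": "A", "C": "G", "D": "G"}  # hypothetical column
tree_one = (("A", "B"), ("C", "D"))
tree_two = (("A", "C"), ("B", "D"))
print(fitch(tree_one, site)[1], fitch(tree_two, site)[1])
```

Summing this count over all positions gives the cost of a tree; the tree with the smaller total is the more parsimonious one.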

In-class exercise II

• Use same data set and program as in exercise I, but choose maximum parsimony. Use heuristic for the tree building method.

• Inspect your tree. Compare it to the distance generated tree.

Maximum likelihood methods

• Maximum likelihood reconstructs a tree according to an explicit model of evolution. For the given model, no other method will work as well

• But, such models must be simple, because the method is computationally intensive

• Actually, all the other methods discussed implicitly use a simple model of evolution similar to the typical model made explicit in maximum likelihood:

• All sites selectively neutral

• All mutate independently, forward and reverse rates equal, given by $\mu$

• Also assume discrete generations and sites change independently

• Given this model, can calculate the probability that a site with initial nucleotide $i$ will change to nucleotide $j$ within time $t$:

• $P^t_{ij} = \delta_{ij} e^{-\mu t} + (1 - e^{-\mu t}) g_j$, where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise, and where $g_j$ is the equilibrium frequency of nucleotide $j$
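The transition probability can be evaluated numerically as a sanity check. This sketch reads the formula as $P^t_{ij} = \delta_{ij} e^{-\mu t} + (1 - e^{-\mu t}) g_j$, where the rate symbol $\mu$ is an assumption (the symbol did not survive in the slide text); the function name and the equal equilibrium frequencies are also illustrative.

```python
import math


def transition_probability(i, j, t, mu, g):
    """Probability that nucleotide i is replaced by j after time t."""
    decay = math.exp(-mu * t)
    delta = 1.0 if i == j else 0.0
    return delta * decay + (1.0 - decay) * g[j]


# Assumed equilibrium frequencies: all four nucleotides equally common.
g = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

for t in (0.0, 0.5, 50.0):
    row = [transition_probability("A", j, t, 1.0, g) for j in "ACGT"]
    print(t, [round(p, 4) for p in row], "sum =", round(sum(row), 4))
```

At $t = 0$ nothing has changed, each row always sums to one, and for long times the probabilities approach the equilibrium frequencies regardless of the starting nucleotide.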

• The likelihood that some site is in state $i$ at the $k$th node of a tree is $L_i^{(k)}$

• The likelihoods for all states for each site for each node are calculated separately; the product of the likelihoods for each site gives the overall likelihood for the observed data

• Different tree topologies are searched to find the highest overall likelihood

• Maximum likelihood is arguably the “gold standard” for phylogenetic analysis; but because of its computational intensity it can only be used for select data and only after much initial fine tuning of many parameters of sequence alignments

• Often used to distinguish between several already generated trees

Assessing trees

• The bootstrap: randomly sample all positions (columns in an alignment) with replacement, meaning some columns can be repeated, but conserving the number of positions; build a large dataset of these randomized samples

[Figure: bootstrap alignment process]

• Then use your method (distance, parsimony, likelihood) to generate another tree

• Do this a thousand or so times

• Note that if the assumptions the method is based on hold, you should always get the same tree from the bootstrapped alignments as you did originally

• The frequency of some feature of your phylogeny in the bootstrapped set gives some measure of the confidence you can have for this feature
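The resampling step itself is short. The sketch below only builds the bootstrapped alignments; tree building is left to whichever method is being assessed. The function name and the toy alignment are illustrative, and a seeded generator is used so the example is repeatable.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the column count."""
    n_columns = len(alignment[0])
    picks = [rng.randrange(n_columns) for _ in range(n_columns)]
    # The same column picks are applied to every sequence so that
    # columns stay intact.
    return ["".join(seq[c] for c in picks) for seq in alignment]


alignment = ["ACGTAC", "AGGTAC", "ACGAAT"]  # toy alignment, one row per sequence
rng = random.Random(0)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(1000)]
print(replicates[0])
```

Each replicate would then be passed to the tree-building method, and the fraction of replicate trees that contain a given grouping is reported as its bootstrap support.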

In-class exercise III

• Use the same dataset, select distance again. This time, select the bootstrap box.

• In options, make sure to select the box labelled Save a file containing PAUP screen output. Take defaults for everything else. Run.

• Inspect your output. In particular, look at the paup.log file and compare it to the paupdisplay.figure file.

• Repeat for the maximum parsimony method.

• Were the original trees (not bootstrapped) meaningful?