
General Phylogenetics



Points that will be covered in this presentation

•Tree Terminology

•General Points About Phylogenetic Trees

•Phylogenetic Analyses

The importance of alignments

The different analysis methods

Tree confidence measures


Tree Terminology

Node: point at which 2 or more branches diverge

Internal node: hypothetical last common ancestor

Terminal node: molecular or morphological data from which the tree is derived. (These will often be used to represent species or individual specimens and may be referred to as OTUs = Operational Taxonomic Units)

Clade: a node (hypothetical ancestor) and all the lineages descending from it

[Figure: example tree with internal nodes, terminal nodes (OTUs), and clades labeled]


Tree Terminology

Monophyletic group: a group in which all members are derived from a unique common ancestor, and which includes all descendants of that ancestor (i.e. a clade)

Polyphyletic group: a group whose members are not all derived from a unique common ancestor within the group. The common ancestor of the group has many descendants that are not in the group

Paraphyletic group: a group that includes a common ancestor but excludes some of its descendants (like a polyphyletic group, it is not monophyletic)
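The clade and monophyly definitions can be checked mechanically. The sketch below is only an illustration: it assumes a rooted tree written as nested Python tuples, and the taxon names A to E and the helper names are hypothetical, not from the slides.

```python
def leaves(tree):
    """Return the set of terminal nodes (OTUs) below a node."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Yield the leaf set of every node; each one is a clade."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """A group is monophyletic if it is exactly the leaf set of some node."""
    return frozenset(group) in set(clades(tree))


# Hypothetical rooted tree: ((A,B),(C,(D,E)))
tree = (("A", "B"), ("C", ("D", "E")))

print(is_monophyletic(tree, {"D", "E"}))       # D and E form a clade
print(is_monophyletic(tree, {"C", "D", "E"}))  # so do C, D and E
print(is_monophyletic(tree, {"C", "D"}))       # excludes E, so not monophyletic
```

A group such as {C, D} fails the test because its common ancestor also has E as a descendant.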


General Points About Phylogenetic Trees

All branches can rotate freely around a node (i.e. B is not more closely related to C than A, and C is not more closely related to D than E)

[Figure: cladogram of taxa A, B, C, D, and E]

Branch lengths may be drawn as equal between nodes – “cladograms” (see tree above)

(these are used when one is interested only in the branching pattern)

Branch lengths may be proportional to the hypothesized distance between nodes – “phylogram” (see tree on left)

[Figure: phylogram of taxa A, B, C, D, and E]
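Because branches can rotate freely around a node, two drawings with different left-to-right order can be the same tree. A minimal sketch of that idea, assuming trees written as nested tuples; the topologies below are hypothetical and are not the ones in the slide figures.

```python
def canonical(tree):
    """Order-independent form: a leaf is its name, an internal node is
    the frozenset of its children, so rotating branches changes nothing."""
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


t1 = (("A", "B"), ("C", ("D", "E")))
t2 = ((("E", "D"), "C"), ("B", "A"))   # t1 with every node rotated
t3 = (("A", "C"), ("B", ("D", "E")))   # a genuinely different topology

print(canonical(t1) == canonical(t2))  # same branching pattern
print(canonical(t1) == canonical(t3))  # different branching pattern
```

This compares branching pattern only (as in a cladogram); branch lengths are ignored.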


General Points About Phylogenetic Trees

[Figure: example trees with polytomies labeled]

Fully resolved trees are bifurcating (only two descendant lineages from each node).

A node with more than two descendant lineages is a multifurcating node, or polytomy.

Polytomies may be “soft” or “hard”

“Soft” = product of data or analysis

“Hard” = product of biology
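A polytomy is easy to spot programmatically: it is any node with more than two descendant lineages. The sketch below assumes nested-tuple trees and hypothetical taxon names; it cannot tell a “soft” polytomy from a “hard” one, since that depends on the data and the biology.

```python
def leaf_names(tree):
    """Return the sorted terminal nodes below a node."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaf_names(child))
    return sorted(names)


def polytomies(tree):
    """Return the leaf lists of all nodes with more than two descendants."""
    if isinstance(tree, str):
        return []
    found = []
    if len(tree) > 2:
        found.append(leaf_names(tree))
    for child in tree:
        found.extend(polytomies(child))
    return found


def is_fully_resolved(tree):
    """A fully resolved tree has no multifurcating nodes."""
    return not polytomies(tree)


resolved = (("A", "B"), ("C", ("D", "E")))
unresolved = (("A", "B"), ("C", "D", "E"))  # C, D, E form a polytomy

print(is_fully_resolved(resolved))
print(polytomies(unresolved))
```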


General Points About Phylogenetic Trees

Example of a “soft” polytomy: LSU analysis is unable to resolve the relationships of some Ptilophora species.

[Figure: LSU tree with a polytomy labeled (Tronchin et al. 2004)]

Using different data (rbcL) the relationships among Ptilophora species are better resolved.

[Figure: rbcL tree (Tronchin et al. 2004)]


Phylogenetic Analyses

The Importance of Alignments

Phylogenetic trees derived from the analysis of DNA or amino acid sequences are only as good as the data they are based upon.

Garbage In = Garbage Out

Consequently, sequence alignment is the most important step in phylogenetic analysis.

The aligned sites of a sequence must be homologous (or identical by descent = taxa share the same state because their ancestor did).

If two taxa share the same state but not by descent, it is called homoplasy.


Phylogenetic Analyses

The Importance of Alignments

[Figure: sequence alignment with three annotations: the same sites in different sequences need to be homologous; inferred insertion/deletion mutations (gaps); an area to possibly remove from analyses because of uncertain homology between sites]

DNA sequences are prone to homoplasy because there are only 4 possible character states at each site (A, C, G, T), plus insertion/deletion mutations [indels] for some loci.
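One crude way to act on uncertain homology is to drop every alignment column that contains a gap. The sketch below is only an illustration: the sequences and taxon names are made up, and real analyses usually exclude regions more selectively than this.

```python
# Hypothetical toy alignment; "-" marks an inferred indel (gap).
alignment = {
    "taxon1": "ACGT-ACGT",
    "taxon2": "ACGTTACGA",
    "taxon3": "AC-TTACGT",
}


def gap_free_columns(aln, gap="-"):
    """Return indices of columns in which no sequence has a gap."""
    seqs = list(aln.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    return [i for i in range(length) if all(s[i] != gap for s in seqs)]


def strip_gapped_columns(aln, gap="-"):
    """Keep only the gap-free columns of every sequence."""
    keep = gap_free_columns(aln, gap)
    return {name: "".join(seq[i] for i in keep) for name, seq in aln.items()}


print(gap_free_columns(alignment))
print(strip_gapped_columns(alignment))
```

Columns 2 and 4 (counting from 0) are removed here because one sequence has a gap in each.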


Phylogenetic Analyses

The Different Analysis Methods

See: evolution.genetics.washington.edu/phylip/software.html#methods for a list of software programs

Distance methods: based on similarity between OTUs

UPGMA – originally used for phenotypic characters in numerical taxonomy. Generally not applied to sequence data because it is highly sensitive to mutation rate changes in lineages, i.e. the data must fit a “molecular clock.”

NJ (Neighbor Joining) – a clustering algorithm that approximates the “minimum evolution” tree without examining all possible topologies.

The accuracy of a distance tree depends on 2 things:

1) How “true” the distances calculated between taxa are (how good is the model of evolution that your distances are based upon).

2) The standard error of the distance measure estimation
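To make “the model of evolution that your distances are based upon” concrete, the sketch below computes an uncorrected p-distance and then applies the Jukes-Cantor correction. Jukes-Cantor is chosen here only as the simplest example model; the slides do not name a model, and the sequences are made up.

```python
import math


def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites, ignoring sites with a gap."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical sequences differing at 2 of 10 sites.
s1 = "ACGTACGTAC"
s2 = "ACGTTCGTAA"

p = p_distance(s1, s2)
d = jukes_cantor(p)
print(p, d)
```

The corrected distance is larger than the observed proportion of differences because the model allows for multiple substitutions at the same site.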


Phylogenetic Analyses

The Different Analysis Methods

Optimization methods

•Parsimony: searching for the tree that requires the fewest mutational steps, i.e. the simplest is the best.

•Maximum Likelihood: searching for the most likely tree (the tree with the highest likelihood) given the OTUs (sequences) and model of evolution, i.e. the tree that maximizes the probability of observing the data is the best tree.

•Bayesian: searching for the best set of trees, i.e. the set of trees in which the likelihoods are so similar that changes between them are essentially random.
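The parsimony criterion can be illustrated by counting steps on a fixed tree. The sketch below uses Fitch's algorithm, a standard way to count the minimum number of changes per site; it assumes rooted binary trees as nested tuples, and the alignment and taxon names are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, steps) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree  # binary trees only
    lset, lsteps = fitch(left, states)
    rset, rsteps = fitch(right, states)
    common = lset & rset
    if common:
        return common, lsteps + rsteps
    # No shared state: one mutational step is required at this node.
    return lset | rset, lsteps + rsteps + 1


def parsimony_length(tree, alignment):
    """Total number of steps over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total


# Hypothetical 3-site alignment for four taxa.
alignment = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CGT"}

tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))

print(parsimony_length(tree1, alignment))  # 3 steps
print(parsimony_length(tree2, alignment))  # 5 steps
```

Under parsimony, tree1 is preferred because it needs fewer steps. A real search would score many candidate topologies rather than two.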


Phylogenetic Analyses

Tree Confidence Measures

Decay Analysis or Goodman-Bremer Support Values: a test used in parsimony analyses in which one determines how many steps longer than the most parsimonious tree a tree must be before a particular branch in your tree is no longer resolved in the consensus of all trees of that length or shorter.

[Figure: three trees: most parsimonious tree (L = 35), one step less parsimonious (L = 36), and two steps less parsimonious (L = 37); branches labeled d1 and d2]

How meaningful the values are may depend on the tree length.
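The decay value for a clade is the length of the shortest tree that lacks the clade minus the length of the most parsimonious tree. A minimal sketch, assuming the candidate trees have already been found and scored; the tree lengths echo the slide (35, 36, 37), but the clade sets are invented for illustration.

```python
def bremer_support(trees, clade):
    """trees: list of (length, set of clades as frozensets).

    Returns the number of extra steps needed before `clade` is lost,
    or None if no candidate tree lacks the clade.
    """
    clade = frozenset(clade)
    best = min(length for length, _ in trees)
    without = [length for length, clades in trees if clade not in clades]
    if not without:
        return None
    return min(without) - best


# Hypothetical candidate trees: (tree length, clades present).
trees = [
    (35, {frozenset("AB"), frozenset("DE"), frozenset("CDE")}),
    (36, {frozenset("AB"), frozenset("CD"), frozenset("CDE")}),
    (37, {frozenset("AC"), frozenset("DE"), frozenset("BDE")}),
]

print(bremer_support(trees, "DE"))  # lost one step from optimal
print(bremer_support(trees, "AB"))  # lost two steps from optimal
```

In practice the list would have to contain every tree up to the chosen length; with an incomplete list the value is only an upper bound.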


Phylogenetic Analyses

Tree Confidence Measures

Bootstrapping: A non-parametric test of how well the data support the nodes of a given tree.

Determining support is a bit of a statistical problem: Evolution only happened once so there is no underlying distribution to sample in order to develop confidence values.

Method: the original analysis is performed multiple times on pseudo-datasets derived by sampling the original dataset with replacement. The number, or fraction, of times that a particular clade is present in the resulting trees is its bootstrap value.

Bootstrapping is not portable, i.e. you cannot compare values across studies because changing any parameters will change the values.
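The resampling step can be sketched as follows. Alignment columns are sampled with replacement to build each pseudo-dataset, and the analysis is repeated on each one. The “analysis” here is a deliberately toy stand-in (it just groups the two most similar taxa) so that the example stays short; the function names, sequences, and replicate count are all assumptions.

```python
import random
from itertools import combinations


def resample_columns(alignment, rng):
    """Build a pseudo-dataset by sampling sites with replacement."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Toy stand-in for a tree search: the two taxa with fewest differences."""
    def differences(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return frozenset(min(combinations(sorted(alignment), 2), key=differences))


def bootstrap_support(alignment, clade, replicates=100, seed=0):
    """Fraction of pseudo-datasets in which `clade` is recovered."""
    rng = random.Random(seed)
    clade = frozenset(clade)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == clade
        for _ in range(replicates)
    )
    return hits / replicates


# Hypothetical alignment for four taxa.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGAAC",
    "C": "ACCTTCGAAG",
    "D": "TCCTTCGAAG",
}

print(bootstrap_support(alignment, {"A", "B"}))
```

Changing the seed, the number of replicates, or the analysis method changes the value, which is the sense in which bootstrap values are not portable.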


Phylogenetic Analyses

Tree Confidence Measures

Bootstrapping

By default, most programs will show bootstrap values when they are greater than 50, but does a bootstrap value of 50 mean anything?

For a discussion of this see Hillis & Bull (1993) Systematic Biology 42:182-192 (they tested bootstrap values based on a known phylogeny).

Wilson’s General Rule:

•60-80: be cautious; is there other evidence to support the relationship?

•80-90: usually pretty solid;

•90-100: solid and unlikely to be misleading.


General Points About Phylogenetic Trees

DNA or protein sequence trees are hypotheses of how a particular DNA locus or protein has evolved.

We assume that the way the DNA or protein has evolved reflects the way the species has evolved, i.e. gene tree = species tree

IMPORTANT: This may or may not reflect reality.

i.e. You Still Have To Think, as molecules do not necessarily trump morphology, development, etc.


General Points About Phylogenetic Trees

[Figure: gene tree and species tree for taxa A, B, and C; panels labeled “gene tree = species tree”]