Page 1: For immunologists 2013 Introduction to Phylogenies Dr Laura Emery Laura.Emery@ebi.ac.uk

for immunologists 2013

Introduction to Phylogenies

Dr Laura Emery

Laura.Emery@ebi.ac.uk

www.ebi.ac.uk/training

Objectives

After this tutorial you should be able to…

• Use essential phylogenetic terminology effectively

• Discuss aspects of phylogenies and their implications for phylogenetic interpretation

• Apply phylogenetic principles to interpret simple trees

This course will not:

• Provide you with an overview of phylogenetic methods

• Enable you to use tools to construct your own phylogenies

• Enable you to evaluate whether a sensible phylogenetic model or method was selected to construct a phylogeny

Outline

• Introduction

• Aspects of a tree

1. Topology

2. Branch lengths

3. Nodes

4. Confidence

• Simple phylogenetic interpretation

• Including homology, gene duplication, co-evolution

What can I do with phylogenetics?

• Deduce relationships among species or genes or cells

• Deduce the origin of pathogens

• Identify biological processes that affect how your sequence has evolved e.g. identify genes or residues undergoing positive selection

• Explore the evolution of traits through history

• Estimate the timing of major historical events

• Explore the impact of geography on species diversification

What is a phylogenetic tree?

A tree is an explanation of how sequences evolved, their genealogical relationships and thus how they came to be the way they are today (or at the time of sampling).

Darwin 1837

Phylogenies explain genealogical relationships

• Family tree

Aspects of a tree

1. Topology (branching order)

2. Branch lengths (indication of genetic change)

3. Nodes

i. Tips (sampled sequences known as taxa)

ii. Internal nodes (hypothetical ancestors)

iii. Root (oldest point on the tree)

4. Confidence (bootstraps/probabilities)

1. Topology

The topology describes the branching structure of the tree, which indicates patterns of relatedness.

[Figure: trees with tips A, B and C] These trees display the same topology

[Figure: trees with tips A, B and C] These trees display different topologies
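The idea that a topology does not depend on how the tree is drawn can be sketched in a few lines of Python. This is only an illustration: the nested-tuple representation, the function name `topology` and the three example trees are assumptions made here, not part of the original slides. Each rooted tree is turned into a form that ignores the left-to-right order of branches, so two drawings compare equal exactly when they share a branching order.

```python
def topology(tree):
    """Return an order-independent form of a rooted tree.

    A tree is either a tip name (a string) or a tuple of subtrees.
    Using frozensets means that swapping branches around a node
    does not change the result.
    """
    if isinstance(tree, str):
        return tree
    return frozenset(topology(child) for child in tree)


# Hypothetical example trees with tips A, B and C.
tree_1 = (("A", "B"), "C")
tree_2 = ("C", ("B", "A"))   # same branching order, drawn differently
tree_3 = (("A", "C"), "B")   # A now groups with C instead of B

print(topology(tree_1) == topology(tree_2))  # True
print(topology(tree_1) == topology(tree_3))  # False
```

The sketch assumes that every tip name is unique and that the trees are rooted; comparing unrooted trees needs a different approach.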

Topology Question

Are these topologies the same?

Answer = yes

Topology Question II

Which of these trees has a different topology from the others?

[Figure: five trees, each with tips A to F]

2. Branch lengths indicate genetic change

• Longer branches indicate greater change

• Change is typically represented in units of number of substitutions per site (but check the legend)

[Figure: tree with branch lengths labelled 1.2, 0.6, 0.8, 0.5, 0.5 and 0.5]

A scale bar can represent branch lengths

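[Figure: the same tree drawn twice, once with a scale bar (0.5) and once with branch lengths labelled 1.2, 0.6, 0.8, 0.5, 0.5 and 0.5]

These are alternative representations of the same phylogeny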

Alternative representations of phylogenies

All of these representations depict the same topology

Branch lengths are indicated in blue

Red lengths are meaningless

Not all trees include branch length data

[Figure: a cladogram (no branch length information) beside a phylogram (branch lengths drawn to scale)]

Distance and substitution rate are confounded

• Branch lengths indicate the genetic change that has occurred

• We often don’t know if long branch lengths reflect:

• A rapid evolutionary rate

• An ancient divergence time

• A combination of both

• Genetic change (substitutions/site) = Evolutionary rate (substitutions/site/year) × Divergence time (years)

[Figure: example tree with tips A, B, C, D and E]
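A short worked example may help to show why rate and time cannot be separated from a branch length alone. The sketch below applies the relationship above to two invented lineages; the function name and all of the numbers are hypothetical and chosen only so that both lineages give the same answer.

```python
import math


def genetic_change(rate_per_site_per_year, divergence_years):
    """Expected genetic change in substitutions per site."""
    return rate_per_site_per_year * divergence_years


# Hypothetical lineages: one evolves fast but diverged recently,
# the other evolves slowly but diverged long ago.
fast_recent = genetic_change(1e-3, 100)
slow_ancient = genetic_change(1e-6, 100_000)

print(f"fast and recent:  {fast_recent:.3f} substitutions/site")
print(f"slow and ancient: {slow_ancient:.3f} substitutions/site")
print(math.isclose(fast_recent, slow_ancient))
```

Both lineages would be drawn with the same branch length, so extra information (for example dated samples or fossil calibrations) is needed to tell the two scenarios apart.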

3. Nodes

• Nodes occur at the ends of branches

• There are three types of nodes:

i. Tips (sampled sequences known as taxa)

ii. Internal nodes (hypothetical ancestors)

iii. Root (oldest point on the tree)

[Figure: example tree with tips A, B, C, D and E]

Figures: Andrew Rambaut

The root is the oldest point on the tree

• The root indicates the direction of evolution

• It is also the (hypothesised) most recent common ancestor (MRCA) of all of the samples in the tree

[Figure: rooted tree with tips A, B, C, D and E; the time axis runs from past (root) to present (tips)]

Figures: Andrew Rambaut

Trees can be drawn in an unrooted form

[Figure: a tree with tips A, B, C, D and E, drawn in rooted and in unrooted form]

These are alternative representations of the same topology

There are multiple rooted tree topologies for any given unrooted tree

• Most tree-building methods produce unrooted trees

• Identifying the correct root is often critical for interpretation!

Figure Aiden Budd
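To get a feel for how many rooted trees correspond to one unrooted tree, note that a root can be placed on any branch. A fully bifurcating unrooted tree with n tips has 2n − 3 branches, so it has 2n − 3 possible root positions. The helper below is a minimal sketch of that count; the function name is an assumption made here, and the formula does not apply to trees containing multifurcations.

```python
def root_positions(n_tips):
    """Number of branches (possible root positions) in a fully
    bifurcating unrooted tree with n_tips tips."""
    if n_tips < 2:
        raise ValueError("a tree needs at least two tips")
    return 2 * n_tips - 3


for n in (3, 5, 10):
    print(n, "tips:", root_positions(n), "possible rooted trees")
```

For the five-tip trees used in these slides this gives seven candidate roots, each implying a different direction of evolution.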

How to root a tree

• Midpoint rooting

• Assumes a constant evolutionary rate

• This is often not the case!

• Outgroup rooting

• The outgroup is one or more taxa that are known to have diverged prior to the group being studied

• The node where the outgroup lineage joins the other taxa is the root

[Figure: the same tree shown unrooted, midpoint rooted and outgroup rooted; outgroup rooting is marked as recommended]
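Midpoint rooting can be sketched as follows: find the two tips that are furthest apart, then place the root halfway along the path between them. The code is a simplified illustration rather than a production implementation. The dictionary representation, the function names and the three-tip example tree (with an internal node called X) are all assumptions made for this sketch.

```python
from itertools import combinations


def path_between(tree, start, end):
    """Return the nodes on the unique path from start to end."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            return path
        for neighbour in tree[node]:
            if neighbour not in path:
                stack.append((neighbour, path + [neighbour]))
    raise ValueError("nodes are not connected")


def midpoint_root(tree):
    """Return (node_a, node_b, distance_from_a) locating the midpoint
    of the longest tip-to-tip path in an unrooted tree."""

    def length(path):
        return sum(tree[a][b] for a, b in zip(path, path[1:]))

    tips = [node for node, neighbours in tree.items() if len(neighbours) == 1]
    longest = max(
        (path_between(tree, a, b) for a, b in combinations(tips, 2)),
        key=length,
    )
    half = length(longest) / 2
    walked = 0.0
    for a, b in zip(longest, longest[1:]):
        if walked + tree[a][b] >= half:
            return a, b, half - walked
        walked += tree[a][b]


# Hypothetical unrooted tree: tips A, B, C joined at internal node X.
tree = {
    "A": {"X": 1.2},
    "B": {"X": 0.6},
    "C": {"X": 0.8},
    "X": {"A": 1.2, "B": 0.6, "C": 0.8},
}

print(midpoint_root(tree))
```

Here the longest path runs from A to C, so the root falls on the branch between A and X. If A simply evolved faster than the other tips, this root would be misplaced, which is why the constant-rate assumption matters.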

Root Question

This tree is a cladogram, i.e. the branch lengths do not indicate genetic change.

Indicate any root positions where bird and crocodile are not sister taxa (each other's closest relatives).

Alternative Representations Question

4. Confidence

How good is a tree?

A tree is a collection of hypotheses, so we assess our confidence in each of its parts or branches independently

There are three main approaches:

• Bootstraps

• Bayesian methods

• Approximate likelihood ratio test (aLRT) methods

[Figure: example trees annotated with bootstrap support values (85, 63, 100) and with probabilistic support values (0.93, 0.81, 0.99)]
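The bootstrap approach can be illustrated with a small sketch. Alignment columns are resampled with replacement to build replicate alignments, a tree is inferred from each replicate, and the support for a grouping is the percentage of replicate trees that contain it. The code below covers only the resampling and the counting; the tree-building step is left out, and the function names, the toy alignment and the replicate results are all hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment maps each sequence name to an aligned sequence;
    all sequences must have the same length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][i] for i in columns)
        for name in names
    }


def support(grouping, replicate_groupings):
    """Percentage of replicate trees that contain the grouping."""
    target = frozenset(grouping)
    hits = sum(1 for groups in replicate_groupings if target in groups)
    return 100 * hits / len(replicate_groupings)


# Hypothetical toy alignment.
alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "ACCTTC"}
replicate = bootstrap_alignment(alignment, random.Random(1))
print(replicate)

# Hypothetical groupings found in four replicate trees.
replicate_groupings = [
    {frozenset("AB")},
    {frozenset("AB")},
    {frozenset("BC")},
    {frozenset("AB")},
]
print(support("AB", replicate_groupings))  # 75.0
```

In practice hundreds or thousands of replicates are used, and the tree for each replicate comes from the same method that built the original tree.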

Confidence Question

Which of the bootstrap values indicates our confidence in the grouping of A, B, C, and D together as a monophyletic group? Do you think we can be confident in this grouping?

[Figure: tree with tips A, B, C, D, E and F and bootstrap values 84, 63, 91 and 100 on its internal branches]

Note: high bootstrap values do not always mean that we have confidence in a branch. False confidence can be generated under some phylogenetic methods.
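Whether a set of tips forms a monophyletic group can be checked mechanically: the group is monophyletic if some node of the rooted tree has exactly those tips, and no others, below it. The sketch below shows one way to do this. The nested-tuple tree, the function names and the example tree are assumptions made here; the example is not the tree from the slide.

```python
def tips(tree):
    """Return the set of tip names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))


def clades(tree):
    """Yield the tip set below every node of a rooted tree."""
    yield tips(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if the group is exactly the tip set of some node."""
    return frozenset(group) in set(clades(tree))


# Hypothetical rooted tree with tips A to F.
tree = (((("A", "B"), "C"), "D"), ("E", "F"))

print(is_monophyletic(tree, ["A", "B", "C", "D"]))  # True
print(is_monophyletic(tree, ["A", "B", "D"]))       # False
```

The support value that matters for a grouping is the one on the branch leading to the node that defines it.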

Part two: Phylogenetic interpretation

Phylogenetic interpretation skill set

1. Tree-thinking skills

• relatedness, confidence, homology

2. Knowledge of phylogenetic methods and their limitations

3. Knowledge of biological processes affecting sequence evolution

• gene duplication, recombination, horizontal gene transfer, population genetic processes, and many more!

4. Knowledge of the data you wish to interpret

Simple phylogenetic interpretation question

• Which is true?

• A) Mouse is more closely related to fish than frog is to fish

• B) Lizard is more closely related to fish than mouse is to fish

• C) Human and frog are equally related to fish

Gene duplication

Gene duplication and subsequent divergence can result in novel gene functions (it can also result in pseudogenes)

• Genes that are homologous due to gene duplication are paralogous

• Genes that are homologous due to speciation are orthologous

Immunology-related genes have atypical patterns of molecular evolution

• Immunology genes have a high dN/dS ratio indicative of positive selection

• Rapid evolutionary rate

• Difficult to align

• Violate assumptions of many phylogenetic models

Park et al 2012. Scientific Reports
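The dN/dS ratio (often written ω) compares the rate of amino-acid-changing substitutions (dN) with the rate of silent substitutions (dS). Under the usual reading, a ratio above 1 suggests positive selection, a ratio near 1 suggests neutral evolution, and a ratio below 1 suggests purifying selection. The function below is a minimal sketch of that reading; its name and the example values are hypothetical, and real analyses estimate dN and dS with dedicated software and test the result statistically.

```python
def interpret_omega(dn, ds):
    """Return (omega, interpretation) for given dN and dS estimates."""
    if ds <= 0:
        raise ValueError("dS must be positive to form a ratio")
    omega = dn / ds
    if omega > 1:
        label = "positive selection"
    elif omega < 1:
        label = "purifying selection"
    else:
        label = "neutral evolution"
    return omega, label


# Hypothetical estimates for two genes.
print(interpret_omega(0.5, 0.25))  # ratio above 1
print(interpret_omega(0.1, 0.5))   # ratio below 1
```

A simple threshold like this ignores estimation error, so it should be read as a first impression rather than a test.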

Positive selection can lead to ladder-like phylogenies

Example: influenza haemagglutinin phylogeny and immunological mapping

Smith et al 2004. Science

Phylogenetics can inform us of host-pathogen interactions and co-evolution

• "Mirror" phylogenies are indicative of host-parasite vertical inheritance

Jiggins web page: http://www.gen.cam.ac.uk/research/jiggins/research.html

T-cell receptors and immunoglobulin chains are homologous

Richards et al 2000

An extremely brief introduction to methods, analyses, & pitfalls

There is only one true tree

• The true tree refers to what actually happened in the evolutionary past

• All methods attempt to reconstruct the true phylogeny

• Even the best method may not give you the true tree

Phylogenetic Methods: The general approach

• We want to find the tree that best explains our aligned sequences

• We need to be able to define “best explains”

• we need a model of sequence evolution

• we need a criterion (or set of criteria) to use to choose between alternative trees

• then evaluate all possible trees

(NB: if N=20, then 2 × 10^20 possible unrooted trees!)

• or take a short cut

Paul Sharp
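The figure quoted for 20 sequences can be checked directly. The number of distinct unrooted, fully bifurcating trees for n tips is (2n − 5)!!, the product of the odd numbers from 1 up to 2n − 5. The function below is a small sketch of that calculation; its name is an assumption made here.

```python
def unrooted_trees(n_tips):
    """Number of distinct unrooted bifurcating trees for n_tips tips,
    computed as (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5)."""
    if n_tips < 3:
        raise ValueError("the formula applies to three or more tips")
    count = 1
    for odd in range(3, 2 * n_tips - 4, 2):
        count *= odd
    return count


for n in (4, 5, 10, 20):
    print(n, "tips:", unrooted_trees(n))

print(f"{unrooted_trees(20):.1e}")  # about 2.2e+20
```

The count grows so quickly that evaluating every tree is impractical for all but the smallest data sets, which is why tree-building programs rely on heuristic searches (the "short cut" above).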

Methodological approaches

1. Distance matrix methods (pre-computed distances)

• UPGMA (assumes a perfect molecular clock): Sokal & Michener (1958)

• Minimum evolution (e.g. Neighbor-joining, NJ): Saitou & Nei (1987)

2. Maximum parsimony: Fitch (1971)

• Minimises the number of mutational steps

3. Maximum likelihood (ML)

• Evaluates the statistical likelihood of alternative trees, based on an explicit model of substitution

4. Bayesian methods

• Like ML but can incorporate prior knowledge
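As one concrete illustration of "choosing between alternative trees", the sketch below scores two candidate trees under maximum parsimony using Fitch's counting procedure for rooted bifurcating trees. It is a teaching sketch only: the nested-tuple tree format, the function names and the toy alignment are assumptions made here, and real software also has to search over many trees rather than compare just two.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree is a tip name or a pair of subtrees; states maps each tip
    name to its character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: at least one change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical aligned sequences and two candidate trees.
alignment = {"A": "AAG", "B": "AAG", "C": "ACG", "D": "TCG"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_score(tree_1, alignment))  # 2
print(parsimony_score(tree_2, alignment))  # 3
```

Under the parsimony criterion tree_1 is preferred because it needs fewer mutational steps. Likelihood and Bayesian methods replace this step count with a probability calculated from an explicit substitution model.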

Phylogenetic analyses are not straightforward

[Flowchart containing the following steps and decision points:]

• Data assessment (known biology; additional data, e.g. geography)

• Formulate hypotheses

• Decide upon and implement method

• Phylogenetic result(s)

• Answered your question? (Yes/No)

• Can you validate this? (Yes/No)

• Investigate unexpected and unresolved aspects further (consider including more data)

• Final phylogeny and analysis

Further Reading

• Molecular Evolution: A Phylogenetic Approach (1998) Roderic D M Page & Edward C Holmes, Blackwell Science, Oxford.

• The Phylogenetic Handbook (2003), Marco Salemi and Anne-Mieke Vandamme Eds, Cambridge University Press, Cambridge.

• Inferring Phylogenies (2003) Joseph Felsenstein, Sinauer.

• Molecular Evolution (1997) Wen-Hsiung Li, Sinauer.

Phylogenetics at the EBI

• Clustal phylogeny currently available

• RAxML coming soon…

• www.EBI.ac.uk/tools/phylogeny

Acknowledgements

People

• Andrew Rambaut (University of Edinburgh)

• Paul Sharp (University of Edinburgh)

• Nick Goldman (EMBL-EBI)

• Benjamin Redelings (Duke University)

• Brian Moore (University of California, Davis)

• Olivier Gascuel (University of Montpellier)

• Aiden Budd (EMBL-Heidelberg)

…and the EBI training team

Funding

EMBL member states and…

Thank you!

www.ebi.ac.uk

Twitter: @emblebi

Facebook: EMBLEBI