Phylogenetics

Phylogenecs - CGIARhpc.ilri.cgiar.org/beca/training/maseno2012/Phylo... · 2012. 3. 8. · Phylogenetic Trees!Phylogenetics is the study of evolutionary relationships among organisms!A


Outline

• What Are Phylogenetic Trees?

• Build Phylogenetic Trees by Distance Methods

• Validate Phylogenetic Trees by Re-sampling

• Rock with PHYLIP


Phylogenetic Trees

• Phylogenetics is the study of evolutionary relationships among organisms

• A phylogenetic tree or phylogeny for a set of taxa (species, genes, …) is an evolutionary tree representing their relationships.

• A tree is an acyclic graph: horizontal transfer is ignored

• Edge weights may represent evolutionary distance


Phylogenetic Trees

• Trees can be rooted or unrooted.

• An unrooted tree is used when there is not enough data to determine the root of the tree

• The leaves of a phylogenetic tree usually represent the present-day taxa; the internal nodes represent hypothesized ancestors.


Tree Topology

[Figure: two tree diagrams over leaves 1 to 8, one of them with the root marked]


Why Phylogenetic Trees?

• Evolution of organisms (tree of species)

• Evolution of genes (tree of genes)

• Applications:

  • Comparative genomics

  • Gene function prediction


Models and Methods

• Model: an abstraction of “real” evolutionary events.

• Maximum Parsimony (MP) methods

• Distance Matrix (DM) methods

• Maximum Likelihood (ML) methods

• Which is better?


Maximum Parsimony

• Suited to sequences with small variation

• All possible trees are evaluated

• Practical for at most about 11 or 12 sequences

• Time-consuming

• A consensus tree is built when there is more than one MP tree


Distance Matrix methods

• Suited to sequences with intermediate variation

• Hierarchical inference

• Considerably faster than MP.

• Can handle a large number of sequences

• The distance matrix can be derived from a multiple alignment, from evolutionary events, or from other measures such as the k-tuple method


Maximum Likelihood

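• Tolerates somewhat larger variation among the sequences

• All possible trees are evaluated

• Practical for at most about 11 or 12 sequences

• Both topology and edge lengths are considered.

• Based on probabilistic inference: the likelihood $P(x^{\bullet} \mid T, t^{\bullet})$ of the sequences $x^{\bullet}$ given the tree topology $T$ and the edge lengths $t^{\bullet}$

[Figure: rooted tree with leaf sequences x1, x2, x4, x5 and edge lengths t1, t2, t3, t4]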


How many possible trees?

Rooted tree, m = 10: 34,459,425

Unrooted tree, m = 10: 2,027,025
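The two counts for m = 10 are consistent with the standard closed forms for bifurcating trees on m labelled leaves: (2m-3)!! rooted topologies and (2m-5)!! unrooted topologies, where !! is the double factorial. The slide's own formulas did not survive extraction, so the sketch below assumes these closed forms; the function names are illustrative.

```python
def double_factorial(n: int) -> int:
    """Return n * (n-2) * (n-4) * ... down to 1 or 2 (1 when n <= 1)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def count_rooted_trees(m: int) -> int:
    """Rooted bifurcating topologies on m >= 2 labelled leaves: (2m-3)!!."""
    if m < 2:
        raise ValueError("need at least 2 leaves")
    return double_factorial(2 * m - 3)


def count_unrooted_trees(m: int) -> int:
    """Unrooted bifurcating topologies on m >= 3 labelled leaves: (2m-5)!!."""
    if m < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial(2 * m - 5)


if __name__ == "__main__":
    for m in (4, 10, 20):
        print(m, count_rooted_trees(m), count_unrooted_trees(m))
```

The rapid growth is why methods that evaluate every tree are limited to roughly a dozen sequences.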


A Quick Summary

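                          MP     DM     ML
Variation                 +      ++     +++
Computation complexity    [six + marks in total; per-method split not recoverable]
Edge length estimation    N      N      Y
Flexibility               [six + marks in total; per-method split not recoverable]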


A General Protocol

1. Choose a set of related sequences

2. Build a multiple sequence alignment

3. Is there strong sequence similarity?

   • Y: use MP

   • N: is there clearly recognizable similarity?

     • Y: use DM

     • N: use ML

4. Validate the result

5. Combine different methods for a consensus
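The method-selection branch of this protocol can be written as a small decision function. This is only a sketch of the flowchart as read from the slide: the function name and the two boolean inputs are illustrative, and judging "strong" or "clearly recognizable" similarity is left to the user.

```python
def choose_method(strong_similarity: bool, recognizable_similarity: bool) -> str:
    """Pick a tree-building family following the slide's flowchart.

    strong_similarity: the aligned sequences are strongly similar.
    recognizable_similarity: similarity is weaker but still clearly recognizable.
    """
    if strong_similarity:
        return "MP"   # maximum parsimony
    if recognizable_similarity:
        return "DM"   # distance matrix methods
    return "ML"       # maximum likelihood


if __name__ == "__main__":
    print(choose_method(False, True))
```

Whatever method is chosen, the protocol still ends with validating the result.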


Distance Methods

• Neighbors: the closest taxa

• Rather fast

• More reliable than MP when branch lengths vary (Jin and Nei, 1990; Swofford et al., 1996)

• Additive: the distance between two taxa is assumed to equal the sum of the branch lengths on the path connecting them


Neighbor Joining

• Proposed by Saitou and Nei in 1987

• Pearson et al. enhanced NJ in 1999 (it no longer predicts just a single tree)

• Pairs sequences based on the effect of the pairing on the sum of the branch lengths of the tree

• Starts from a star-like tree


Similarity to Distance

• Convert alignment scores to distances:

  $$D = -\log S_{\mathrm{eff}} = -\log\left\{ \frac{S_{\mathrm{obs}} - S_{\mathrm{rand}}}{S_{\mathrm{max}} - S_{\mathrm{rand}}} \right\}$$

• $S_{\mathrm{obs}}$ is the observed pairwise alignment score

• $S_{\mathrm{max}}$ is the maximum score, the average of the scores of aligning either sequence to itself.

• $S_{\mathrm{rand}}$ is the expected score for aligning two random sequences of the same length and residue composition, which can be calculated by random shuffling of the two sequences or by an approximate calculation given in Feng & Doolittle [1996]
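As a worked illustration of the score-to-distance conversion, the sketch below computes the effective score and then the distance. The natural logarithm is an assumption (the slide does not state the base), and the function and parameter names are illustrative.

```python
import math


def score_to_distance(s_obs: float, s_self_a: float, s_self_b: float,
                      s_rand: float) -> float:
    """Convert a pairwise alignment score into a distance.

    s_obs:    observed score for aligning sequence a with sequence b
    s_self_a: score for aligning a with itself
    s_self_b: score for aligning b with itself
    s_rand:   expected score for random sequences of the same length
              and composition
    """
    # S_max is the average of the two self-alignment scores.
    s_max = 0.5 * (s_self_a + s_self_b)
    if s_max <= s_rand:
        raise ValueError("S_max must exceed S_rand")
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        raise ValueError("observed score is not above the random expectation")
    return -math.log(s_eff)


if __name__ == "__main__":
    # S_max = 100, S_eff = (60 - 20) / (100 - 20) = 0.5, so D = log 2
    print(score_to_distance(60, 90, 110, 20))
```

Identical sequences give S_eff = 1 and therefore distance 0; the distance grows as the observed score approaches the random expectation.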


Neighbour Joining Algorithm

• For each node i, the distance from the rest of the tree is estimated by

  $$r_i = \frac{1}{N-2} \sum_{k \neq i} d_{i,k}$$

  where N is the current number of nodes

• Choose the nodes i and j for which

  $$D_{ij} = d_{i,j} - r_i - r_j$$

  is smallest, and join i and j (ij is the new node)

• Compute the branch lengths from i and j to ij:

  $$d_{i,(ij)} = \tfrac{1}{2} d_{i,j} + \tfrac{1}{2}(r_i - r_j), \qquad d_{j,(ij)} = \tfrac{1}{2} d_{i,j} + \tfrac{1}{2}(r_j - r_i)$$

• Compute the distances between the new cluster and each other cluster k:

  $$d_{(ij),k} = \frac{d_{i,k} + d_{j,k} - d_{i,j}}{2}$$
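The four steps above can be sketched in Python as follows. This is a minimal illustration, not PHYLIP's implementation: the function name, the return format, and the scheme of naming a new node by concatenating labels are all assumptions. `NAMES` and `DIST` are transcribed from the seven-taxon worked example (A to G) used in these slides; only the first join is checked here.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Run neighbour joining; return (joins, final_edge).

    names: taxon labels; dist: symmetric matrix as a list of lists.
    joins: one (i, j, new_node, length_i, length_j) tuple per join.
    final_edge: (a, b, length) connecting the last two nodes.
    """
    # Dict-of-dicts so clusters can be added and removed by label.
    d = {a: {b: float(dist[x][y]) for y, b in enumerate(names) if b != a}
         for x, a in enumerate(names)}
    joins = []
    while len(d) > 2:
        n = len(d)
        # r_i: summed distance from i to every other node, divided by N - 2
        r = {i: sum(row.values()) / (n - 2) for i, row in d.items()}
        # pick the pair with the smallest D_ij = d_ij - r_i - r_j
        i, j = min(combinations(d, 2),
                   key=lambda p: d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        d_ij = d[i][j]
        len_i = 0.5 * d_ij + 0.5 * (r[i] - r[j])
        len_j = d_ij - len_i  # equals d_ij/2 + (r_j - r_i)/2
        new = i + j  # illustrative naming: concatenate the two labels
        # distance from the new node to every remaining node k
        new_row = {k: 0.5 * (d[i][k] + d[j][k] - d_ij)
                   for k in d if k not in (i, j)}
        joins.append((i, j, new, len_i, len_j))
        del d[i], d[j]
        for k, value in new_row.items():
            del d[k][i], d[k][j]
            d[k][new] = value
        d[new] = new_row
    a, b = d  # the two remaining node labels
    return joins, (a, b, d[a][b])


NAMES = ["A", "B", "C", "D", "E", "F", "G"]
DIST = [
    [0, 63, 94, 111, 67, 23, 107],
    [63, 0, 79, 96, 16, 58, 92],
    [94, 79, 0, 47, 83, 89, 43],
    [111, 96, 47, 0, 100, 106, 20],
    [67, 16, 83, 100, 0, 62, 96],
    [23, 58, 89, 106, 62, 0, 102],
    [107, 92, 43, 20, 96, 102, 0],
]

if __name__ == "__main__":
    joins, last = neighbor_joining(NAMES, DIST)
    for step in joins:
        print(step)
    print(last)
```

On this matrix the first join pairs D with G, with branch lengths 12 and 8.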


Neighbour joining algorithm (1)

Start from the star-like tree and calculate r_i. No molecular clock assumption is made.

        A      B      C      D      E      F      G      r_i
A       -     63     94    111     67     23    107     88.4
B      63      -     79     96     16     58     92     80.8
C      94     79      -     47     83     89     43     87
D     111     96     47      -    100    106     20     96
E      67     16     83    100      -     62     96     84.4
F      23     58     89    106     62      -    102     88
G     107     92     43     20     96    102      -     92

[Figure: star-like tree with leaves A, B, C, D, E, F, G]


Neighbour joining algorithm (2)

Calculate D_ij; D and G are the closest. Calculate the branch lengths of D and G: d = 12, g = 8.

Lower triangle: distances d_ij. Upper triangle: D_ij.

        A        B        C        D        E        F        G       r_i
A       -     -106.2    -81.4    -73.4   -105.8   -153.4    -69.4    88.4
B      63        -      -88.8    -80.8   -149.2   -110.8    -80.8    80.8
C      94       79        -     -136      -84.4    -86     -136      87
D     111       96       47        -      -80.4    -78     -168      96
E      67       16       83      100        -     -110.4    -80.4    84.4
F      23       58       89      106       62        -      -78      88
G     107       92       43       20       96      102        -      92


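Neighbour joining algorithm (3)

Join D and G, calculate the distances from DG to the other nodes, and recalculate r_i.

        A      B      C      E      F     DG      r_i
A       -     63     94     67     23     94     85.25
B      63      -     79     16     58     84     75
C      94     79      -     83     89     35     95
E      67     16     83      -     62     88     79
F      23     58     89     62      -     94     81.5
DG     94     84     35     88     94      -     91.25

[Figure: D and G joined at node DG; leaves A, B, C, E, F still attached to the star]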


Neighbour joining algorithm (4)

Calculate D_ij; C and DG are the closest. Calculate the branch lengths of C and DG: c = 19.375, dg = 15.625.

Lower triangle: distances d_ij. Upper triangle: D_ij.

        A        B        C        E        F        DG       r_i
A       -      -97.25   -86.25   -97.25  -143.75   -82.5     85.25
B      63        -      -91     -138      -98.5    -82.25    75
C      94       79        -      -91      -87.5   -151.25    95
E      67       16       83        -      -98.5    -82.25    79
F      23       58       89       62        -      -78.75    81.5
DG     94       84       35       88       94        -       91.25


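Neighbour joining algorithm (5)

Join DG and C, calculate the distances from CDG to the other nodes, and recalculate r_i.

        A      B      E      F     CDG     r_i
A       -     63     67     23     61     71.3
B      63      -     16     58     64     67
E      67     16      -     62     60     68.3
F      23     58     62      -     74     72.3
CDG    61     64     60     74      -     98.3

[Figure: C and DG joined at node CDG; leaves A, B, E, F still attached to the star]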


Neighbour joining algorithm (6)

Calculate D_ij; A and F are the closest. Calculate the branch lengths of A and F: a = 11, f = 12.

Lower triangle: distances d_ij. Upper triangle: D_ij.

        A        B        E        F       CDG      r_i
A       -      -75.3    -72.6   -120.6   -108.6    71.3
B      63        -     -119.3    -81.3   -101.3    67
E      67       16        -      -78.6    -90      68.3
F      23       58       62        -      -96.3    72.3
CDG    61       64       60       74        -      98.3


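Neighbour joining algorithm (7)

Join A and F, calculate the distances from AF to the other nodes, and recalculate r_i.

        AF      B      E     CDG     r_i
AF       -     98    106    112     158
B       98      -     16     64      89
E      106     16      -     60      91
CDG    112     64     60      -     118

[Figure: A and F joined at node AF; B, E and CDG still attached to the star]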


Neighbour joining algorithm (8)

Calculate D_ij; B and E are the closest. Calculate the branch lengths of B and E: b = 7, e = 9.

Lower triangle: distances d_ij. Upper triangle: D_ij.

        AF       B       E      CDG     r_i
AF       -     -149    -143    -164     158
B       98       -     -164    -143      89
E      106      16       -     -149      91
CDG    112      64      60       -      118


Neighbour joining algorithm (9)

Join B and E, calculate the distances from BE to the other nodes, and recalculate r_i.

        AF     BE    CDG     r_i
AF       -    188    112     300
BE     188      -    108     296
CDG    112    108      -     220

[Figure: B and E joined at node BE; AF, BE and CDG remain to be connected]


Neighbour joining algorithm (10)

Calculate D_ij; BE and CDG are the closest. Calculate the branch lengths of BE and CDG: be = 92, cdg = 16.

Lower triangle: distances d_ij. Upper triangle: D_ij.

        AF      BE     CDG     r_i
AF       -    -408    -408     300
BE     188      -     -408     296
CDG    112     108      -      220

Join BE and CDG, and calculate the distance from BECDG to the last node AF: 146

[Figure: BE and CDG joined, with AF as the last remaining node]


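Neighbour joining algorithm (11)

[Figure: final tree. D and G join at DG, which joins C at CDG; B and E join at BE; A and F join at AF]

Branch lengths: d = 12, g = 8, c = 19.375, dg = 15.625, a = 11, f = 12, b = 7, e = 9, be = 92, cdg = 16, last = 146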


A Quick Summary

• NJ is fast and reliable for topology

• But less reliable for edge lengths

• NJ does not necessarily assume a molecular clock.

• But the assumption can still be accommodated if required.

• Distances should satisfy the triangle inequality.
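The triangle inequality requirement, d(i,j) <= d(i,k) + d(k,j) for every triple, can be checked directly on a distance matrix before tree building. The sketch below is illustrative; the function name and the tolerance parameter are assumptions.

```python
from itertools import permutations


def triangle_violations(dist, tol=1e-9):
    """List index triples (i, j, k) where d[i][j] > d[i][k] + d[k][j].

    dist: square distance matrix as a list of lists.
    tol:  slack allowed for floating-point rounding.
    """
    n = len(dist)
    return [(i, j, k)
            for i, j, k in permutations(range(n), 3)
            if dist[i][j] > dist[i][k] + dist[k][j] + tol]


if __name__ == "__main__":
    good = [[0, 2, 3], [2, 0, 2], [3, 2, 0]]
    bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
    print(triangle_violations(good))  # no violations expected
    print(triangle_violations(bad))   # 0 to 2 is longer than the detour via 1
```

An empty result means every direct distance is no longer than any two-step detour.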


Validate the Inference

• Phylogenetic trees are inferred based on a model

• The inference is hypothetical

• How reliable is the result?

• Reliability vs. stability

• Validate the result by re-sampling.


Bootstrap(1)

• Given a dataset consisting of an alignment of sequences, an artificial dataset of the same size is generated by picking columns from the alignment at random with replacement.

• A given column in the original dataset can therefore appear several times in the artificial dataset
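The column resampling described here can be sketched in a few lines. This is an illustration rather than SEQBOOT's implementation; the function name, the dict-based alignment representation, and the toy sequences are assumptions.

```python
import random


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so that runs are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_cols = lengths.pop()
    # Draw n_cols column indices uniformly, with replacement.
    picked = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column choice to every sequence so columns stay intact.
    return {name: "".join(seq[c] for c in picked)
            for name, seq in alignment.items()}


# Toy alignment, invented for illustration only.
toy = {"s1": "ACGT", "s2": "ACGA", "s3": "TCGA"}
replicate = bootstrap_columns(toy, random.Random(2012))

if __name__ == "__main__":
    for name, seq in replicate.items():
        print(name, seq)
```

Because whole columns are drawn, each replicate has the same size as the original but may repeat some columns and omit others.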


Bootstrap(2)

• The tree-building algorithm is then applied to this new dataset, and the whole selection and tree-building procedure is repeated, typically 100 times.

• The frequency with which a chosen phylogenetic feature appears is taken to be a measure of the confidence we can have in this feature.

• Finally, a consensus tree is created
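To make the "frequency of a feature" concrete, the sketch below treats each replicate tree as a set of clades (each clade a frozenset of leaf names) and counts how often each clade appears. The representation, the function names, and the majority threshold are assumptions; the slides do not say which consensus rule is applied.

```python
from collections import Counter


def clade_support(replicate_trees):
    """Return the fraction of replicate trees containing each clade.

    replicate_trees: list of trees, each given as an iterable of clades,
    where a clade is a frozenset of leaf names.
    """
    counts = Counter()
    for clades in replicate_trees:
        counts.update(set(clades))  # count each clade once per tree
    total = len(replicate_trees)
    return {clade: n / total for clade, n in counts.items()}


def majority_clades(support, threshold=0.5):
    """Keep the clades found in more than `threshold` of the replicates."""
    return {clade for clade, s in support.items() if s > threshold}


# Four invented replicate trees, for illustration only.
trees = [
    {frozenset("DG"), frozenset("AF")},
    {frozenset("DG"), frozenset("AB")},
    {frozenset("DG"), frozenset("AF")},
    {frozenset("CG"), frozenset("AF")},
]
support = clade_support(trees)

if __name__ == "__main__":
    for clade, s in support.items():
        print(sorted(clade), s)
```

With 100 replicates, the support values correspond to the percentages usually drawn on the branches of the consensus tree.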


Validate the Tree

• To improve the prediction of trees and to help locate the root, an outgroup can be set.

• An outgroup should meet the following criteria:

  • It comes from species that are known to have separated from the others at an early evolutionary time

  • It is more distantly related to the other sequences than they are to each other


More words on Outgroup

• More than one outgroup can be selected

• Choose it using independent information, such as fossil evidence

• Too distant an outgroup may lead to incorrect predictions


Phylogenetic Software

• Multiple alignment

  • ClustalW

  • POA

• Phylogenetic analysis

  • PHYLIP (Felsenstein, 1989, 1996)

  • PAUP (Sinauer Associates)

  • PAML (Yang Ziheng)

  • MEGA (Nei)

  • MacClade (Macintosh computer)


Programs in PHYLIP

• Create a distance table with:

  • DNADIST: various models of evolution

  • PROTDIST: based on the PAM model or others

• Use it as input to the following:

  • NEIGHBOR:

    • NJ: no clock, no root

    • UPGMA: clock and root


NJ @ PHYLIP

• Multiple alignment: ClustalW

  • Save the output in PHYLIP format (*.phy)

• Bootstrap the sequence data: SEQBOOT

• Compute the distance matrices: DNADIST

• Build phylogenetic trees: NEIGHBOR

• Compute the consensus tree: CONSENSE


Multiple Sequence Alignment (*.PHY)

Mo3   ATGTATTTCGTACATTACTGCCAGCCACCATGAATATTGCACGGTACCAT
Mo5   ATGTATTTCGTACATTACTGCCAGCCACCATGAATATTGTACGGTACCAT
Mo6   ATGTATTTCGTACATTACTGCCAGCCACCATGAATATTGTACGGTACCAT
Mo7   ATGTATTTCGTACATTACTGCCAGCCACCATGAATATTGTACAGTACCAT
Mo8   ATGTATTTCGTACATTACTGCCAGCCACCATGAATATTGTACAGTACCAT
Mo9   ATGTATCTCGTACATTACTGCCAGCCACCATGAATATTGTACGGTACCAT
Mo12  ATGTATTTCGTACATTACTGCCAGCCACCATGAATATTGTACGGTACCAT
Mo13  ATGTATCTCGTACATTACTGCCAGCCACCATGAATATTGTACGGTACCAT


Multiple alignment in Phylip format

[Figure: screenshot of a PHYLIP-format alignment, annotated with the number of OTUs, the sequence length, and the OTU names]
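The annotated elements (number of OTUs, sequence length, OTU names) map onto the sequential PHYLIP layout: a header line with the two counts, then one line per OTU with the name padded to a 10-character field followed by the sequence. The sketch below writes that layout; the function name is illustrative, and it assumes the strict 10-character name field of classic PHYLIP.

```python
def to_phylip(alignment):
    """Format a dict of name -> aligned sequence as sequential PHYLIP text."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # Header: number of OTUs, then sequence length.
    lines = [f" {len(alignment)} {n_sites}"]
    for name, seq in alignment.items():
        if len(name) > 10:
            raise ValueError(f"name longer than 10 characters: {name}")
        # Name left-justified in a 10-character field, then the sequence.
        lines.append(f"{name:<10}{seq}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_phylip({"Mo3": "ATGTATTTCG", "Mo5": "ATGTATTTCG"}), end="")
```

Names shorter than ten characters must be padded with spaces, which is why an alignment exported by hand often fails to load.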


SEQBOOT

1. Enter the name of the *.PHY file

2. Enter a random number seed (must be odd)


SEQBOOT

J = Bootstrap

R = number of replicates, typically 100


The result file with 100 replicates


DNADIST

T (transition/transversion ratio): 15 ~ 30

M (multiple data sets): 100


100 replicates → 100 distance matrices


NEIGHBOR

M (multiple data sets) = 100


CONSENSE


View the tree file with TREEVIEW


More Help on PHYLIP

• Homepage:

  • http://evolution.genetics.washington.edu/phylip.html

• A pretty good tutorial:

  • http://koti.mbnet.fi/tuimala/oppaat/phylip2.pdf