Xuhua Xia Slide 2
Lecture Outline
• Objectives of this lecture
– Grasp the basic concepts of distance-based tree-building algorithms
– Learn the least-squares criterion and the minimum evolution criterion and how to use them to construct a tree
• Distance-based methods
– Genetic distance: generally defined as the number of substitutions per site
  • JC69 distance
  • K80 distance
  • TN84 distance
  • F84 distance
  • TN93 distance
  • LogDet distance
– Tree-building algorithms
  • UPGMA
  • Neighbor-joining
  • Fitch-Margoliash
  • FastME
Xuhua Xia Slide 3
Genetic Distances
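• Genetic distances: Assuming a substitution model, we can obtain the genetic distance (i.e., difference) between two nucleotide or amino acid sequences, e.g.,
• JC69: $K_{JC69} = -\frac{3}{4}\ln\left(1 - \frac{4p}{3}\right)$
• K80: $K_{K80} = \frac{1}{2}\ln\frac{1}{1 - 2P - Q} + \frac{1}{4}\ln\frac{1}{1 - 2Q}$
• TN93: $D_{TN93} = 4\pi_T\pi_C\alpha_1 + 4\pi_A\pi_G\alpha_2 + 4\pi_Y\pi_R\beta$, where
$\alpha_1 = \frac{-\ln\left(1 - \frac{\pi_Y P_1}{2\pi_T\pi_C} - \frac{Q}{2\pi_Y}\right) + \pi_R\ln\left(1 - \frac{Q}{2\pi_Y\pi_R}\right)}{2\pi_Y}$
$\alpha_2 = \frac{-\ln\left(1 - \frac{\pi_R P_2}{2\pi_A\pi_G} - \frac{Q}{2\pi_R}\right) + \pi_Y\ln\left(1 - \frac{Q}{2\pi_Y\pi_R}\right)}{2\pi_R}$
$\beta = -\frac{1}{2}\ln\left(1 - \frac{Q}{2\pi_Y\pi_R}\right)$
Here p is the proportion of sites that differ, P is the proportion of transitional differences (P_1 for T↔C, P_2 for A↔G), Q is the proportion of transversional differences, and π denotes nucleotide frequencies, with π_Y = π_T + π_C and π_R = π_A + π_G.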
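The TN93 formulas can be hard to read in symbolic form, so the following is a minimal sketch of how they might be evaluated. The function names `tn93_distance` and `k80_distance` and the `freqs` dictionary layout are assumptions made for this illustration, and α1, α2, and β are treated as time-scaled quantities. The input values at the bottom are hypothetical (equal nucleotide frequencies, transitions split evenly between T↔C and A↔G); under those conditions the TN93 distance should coincide with the K80 distance.

```python
import math

def k80_distance(P, Q):
    """K80 distance from the proportions of transitions (P) and transversions (Q)."""
    return 0.5 * math.log(1 / (1 - 2 * P - Q)) + 0.25 * math.log(1 / (1 - 2 * Q))

def tn93_distance(p1, p2, q, freqs):
    """TN93 distance.

    p1: proportion of sites with a T<->C difference
    p2: proportion of sites with an A<->G difference
    q:  proportion of sites with a transversional difference
    freqs: nucleotide frequencies keyed by 'A', 'C', 'G', 'T' (assumed layout)
    """
    pi_a, pi_c, pi_g, pi_t = (freqs[n] for n in "ACGT")
    pi_y = pi_c + pi_t  # pyrimidine frequency
    pi_r = pi_a + pi_g  # purine frequency
    log_q = math.log(1 - q / (2 * pi_y * pi_r))
    alpha1 = (-math.log(1 - pi_y * p1 / (2 * pi_t * pi_c) - q / (2 * pi_y))
              + pi_r * log_q) / (2 * pi_y)
    alpha2 = (-math.log(1 - pi_r * p2 / (2 * pi_a * pi_g) - q / (2 * pi_r))
              + pi_y * log_q) / (2 * pi_r)
    beta = -log_q / 2
    return 4 * pi_t * pi_c * alpha1 + 4 * pi_a * pi_g * alpha2 + 4 * pi_y * pi_r * beta

# Hypothetical inputs chosen so that TN93 should reduce to K80.
equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
d_tn93 = tn93_distance(1 / 12, 1 / 12, 1 / 12, equal)
d_k80 = k80_distance(1 / 6, 1 / 12)
print(d_tn93, d_k80)
```

Comparing against K80 in this special case is only a sanity check on the formulas, not a substitute for estimating the frequencies from real sequences.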
Xuhua Xia Slide 4
Calculation of KJC69
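$K_{JC69} = -\frac{3}{4}\ln\left(1 - \frac{4p}{3}\right)$
[Figure: the sequence AACGACGATCG shown for Species 1, for Species 2, and between them, with each of the two lineages labeled t]
The time separating Species 1 and Species 2 is 2t.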
Sp1: AAG CCT CGG GGC CCT TAT TTT TTG
     ||  |   ||| ||| |   ||| ||| ||
Sp2: AAT CTC CGG GGC CTC TAT TTT TTT
p = 6/24 = 0.25
K = 0.304099
Genetic distances are scaled to be the number of substitutions per site.
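The calculation on this slide can be reproduced with a short script. This is a minimal sketch; the helper names `p_distance` and `jc69_distance` are made up for this illustration, and the code assumes the two sequences are already aligned and of equal length.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)

def jc69_distance(p):
    """JC69 distance: expected number of substitutions per site."""
    return -0.75 * math.log(1 - 4 * p / 3)

# Sequences from the slide, with the codon spacing removed.
sp1 = "AAG CCT CGG GGC CCT TAT TTT TTG".replace(" ", "")
sp2 = "AAT CTC CGG GGC CTC TAT TTT TTT".replace(" ", "")

p = p_distance(sp1, sp2)
k = jc69_distance(p)
print(f"p = {p}, K_JC69 = {k:.6f}")
```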
Xuhua Xia Slide 5
Numerical Illustration
Sp1: AAG CCT CGG GGC CCT TAT TTT TTG
     ||  |   ||| ||| |   ||| ||| ||
Sp2: AAT CTC CGG GGC CTC TAT TTT TTT
What are P and Q?
P = 4/24, Q = 2/24
$K_{K80} = -\frac{\ln(1 - 2P - Q)}{2} - \frac{\ln(1 - 2Q)}{4} = 0.31507864$
Comparison of distances:
p = 0.25
Poisson distance = -ln(1 - p) = 0.288
K_JC69 = 0.304099
K_K80 = 0.3150786
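The comparison above can be checked by counting transitions and transversions directly. The sketch below assumes aligned DNA sequences containing only A, C, G, and T; the names `count_ts_tv` and `k80_distance` are illustrative, not from any library.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_ts_tv(seq1, seq2):
    """Count transitional and transversional differences between aligned sequences."""
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions

def k80_distance(P, Q):
    """K80 distance from the proportions of transitions (P) and transversions (Q)."""
    return -math.log(1 - 2 * P - Q) / 2 - math.log(1 - 2 * Q) / 4

sp1 = "AAG CCT CGG GGC CCT TAT TTT TTG".replace(" ", "")
sp2 = "AAT CTC CGG GGC CTC TAT TTT TTT".replace(" ", "")

ts, tv = count_ts_tv(sp1, sp2)
n = len(sp1)
p = (ts + tv) / n
distances = {
    "p": p,
    "Poisson": -math.log(1 - p),
    "JC69": -0.75 * math.log(1 - 4 * p / 3),
    "K80": k80_distance(ts / n, tv / n),
}
for name, value in distances.items():
    print(f"{name}: {value:.6f}")
```

Each correction accounts for more unobserved substitutions than the one before it, which is why the values increase from p to K80 for these sequences.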
Xuhua Xia Slide 6
Distance-based phylogenetic algorithms
Algorithm          Optimization  Assumes a molecular clock
UPGMA              Local         Yes
Neighbor-joining   Local         No
Minimum Evolution  Global        No
Fitch-Margoliash   Global        No
FastME             Global        No
Xuhua Xia Slide 7
A Star Tree (Completely Unresolved Tree)
Human
Chimpanzee
Gorilla
Orangutan
Gibbon
Xuhua Xia Slide 8
Genetic Distance Matrix
Matrix of Genetic distances (Dij):
         Human  Chimp  Gorilla  Orang  Gibbon
Human           0.015  0.045    0.143  0.198
Chimp                  0.030    0.126  0.179
Gorilla                         0.092  0.179
Orang                                  0.179
Gibbon
Xuhua Xia Slide 9
• Original distance matrix:
         Human  Chimp  Gorilla  Orang  Gibbon
Human           0.015  0.045    0.143  0.198
Chimp                  0.030    0.126  0.179
Gorilla                         0.092  0.179
Orang                                  0.179
Gibbon
• Human and Chimp have the smallest distance (0.015) and are joined first:
D(hu-ch),go = (Dhu,go + Dch,go)/2 = 0.038
D(hu-ch),or = (Dhu,or + Dch,or)/2 = 0.135
D(hu-ch),gi = (Dhu,gi + Dch,gi)/2 = 0.189
• Reduced distance matrix:
         hu-ch  Gorilla  Orang  Gibbon
hu-ch           0.038    0.135  0.189
Gorilla                  0.092  0.179
Orang                           0.179
Gibbon
[Figure: UPGMA clustering, from the star tree to (hu,ch),(go,or,gi) and then to ((hu,ch),go),(or,gi)]
Xuhua Xia Slide 10
• Original distance matrix:
         Human  Chimp  Gorilla  Orang  Gibbon
Human           0.015  0.045    0.143  0.198
Chimp                  0.030    0.126  0.179
Gorilla                         0.092  0.179
Orang                                  0.179
Gibbon
• After joining Gorilla to hu-ch:
D(hu-ch-go),or = (Dhu,or + Dch,or + Dgo,or)/3 = 0.120
D(hu-ch-go),gi = (Dhu,gi + Dch,gi + Dgo,gi)/3 = 0.185
• Reduced distance matrix:
          hu-ch-go  Orang  Gibbon
hu-ch-go            0.120  0.185
Orang                      0.179
Gibbon
• After joining Orang to hu-ch-go:
D(hu-ch-go-or),gi = (Dhu,gi + Dch,gi + Dgo,gi + Dor,gi)/4 = 0.184
[Figure: UPGMA clustering, continued until all five species are joined]
((((hu,ch),go),or),gi)
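The clustering steps on these two slides can be sketched in code. This is a deliberately simple illustration rather than an efficient implementation: it recomputes each cluster-to-cluster distance as the plain average of the original pairwise distances, which is the averaging used in the calculations above. The names `leaf_dist`, `cluster_dist`, and `upgma` are made up for this example.

```python
from itertools import combinations

# Upper-triangular distance matrix from the slides.
DIST = {
    ("hu", "ch"): 0.015, ("hu", "go"): 0.045, ("hu", "or"): 0.143, ("hu", "gi"): 0.198,
    ("ch", "go"): 0.030, ("ch", "or"): 0.126, ("ch", "gi"): 0.179,
    ("go", "or"): 0.092, ("go", "gi"): 0.179,
    ("or", "gi"): 0.179,
}

def leaf_dist(a, b):
    """Distance between two species, regardless of the order they are given in."""
    return DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)]

def cluster_dist(c1, c2):
    """Average of all pairwise distances between members of two clusters."""
    total = sum(leaf_dist(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def upgma(names):
    """Return the list of merges as (cluster1, cluster2, distance) tuples."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Join the pair of clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        merges.append((c1, c2, cluster_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

merges = upgma(["hu", "ch", "go", "or", "gi"])
for c1, c2, d in merges:
    print(c1, c2, d)
```

The merge distances printed here are unrounded (for example 0.0375 where the slide shows 0.038).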
Xuhua Xia Slide 11
Phylogenetic Relationship from UPGMA
• Original distance matrix:
         Human  Chimp  Gorilla  Orang  Gibbon
Human           0.015  0.045    0.143  0.198
Chimp                  0.030    0.126  0.179
Gorilla                         0.092  0.179
Orang                                  0.179
Gibbon
• After joining Human and Chimp:
         hu-ch  Gorilla  Orang  Gibbon
hu-ch           0.038    0.135  0.189
Gorilla                  0.092  0.179
Orang                           0.179
Gibbon
• After joining Gorilla to hu-ch:
          hu-ch-go  Orang  Gibbon
hu-ch-go            0.120  0.185
Orang                      0.179
Gibbon
Xuhua Xia Slide 12
Branch Lengths
((hu,ch),(go,or,gi))
(((hu,ch),go),(or,gi))
((((hu,ch),go),or),gi)
Dhu-ch = 0.015
D(hu-ch),go = (Dhu,go + Dch,go)/2 = 0.038
D(hu-ch),or = (Dhu,or + Dch,or)/2 = 0.135
D(hu-ch),gi = (Dhu,gi + Dch,gi)/2 = 0.189
D(hu-ch-go),or = (Dhu,or + Dch,or + Dgo,or)/3 = 0.120
D(hu-ch-go),gi = (Dhu,gi + Dch,gi + Dgo,gi)/3 = 0.185
D(hu-ch-go-or),gi = (Dhu,gi + Dch,gi + Dgo,gi + Dor,gi)/4 = 0.184
Each node is placed at half the distance between the two clusters it joins (e.g., 0.015/2 = 0.0075), and an internal branch is the difference between two successive node heights (e.g., 0.019 - 0.0075 = 0.0115).
((hu:0.0075,ch:0.0075),(go,or,gi))
(((hu:0.0075,ch:0.0075):0.0115,go:0.019),(or,gi))
((((hu:0.0075,ch:0.0075):0.0115,go:0.019):0.041,or:0.06):0.032,gi:0.092)
[Figure: UPGMA tree of Human, Chimp, Gorilla, Orang, and Gibbon with node heights 0.0075, 0.019, 0.06, and 0.092]
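The branch lengths in the Newick strings follow mechanically from the cluster distances: a node sits at half the distance between the clusters it joins, and each internal branch is the gap between consecutive node heights. The sketch below assumes a ladder-shaped tree in which one new species is added at every step, which holds for this example but not for UPGMA trees in general; `ladder_newick` is a hypothetical helper written for this illustration.

```python
# (species added at this step, distance between the two clusters being joined),
# taken from the rounded values on the slide.
joins = [("ch", 0.015), ("go", 0.038), ("or", 0.120), ("gi", 0.184)]

def ladder_newick(first_leaf, joins):
    """Build a Newick string for a ladder-shaped ultrametric tree."""
    newick = first_leaf
    previous_height = 0.0  # a single leaf sits at height 0
    for leaf, cluster_distance in joins:
        height = cluster_distance / 2  # node height is half the cluster distance
        stem = height - previous_height  # branch leading down to the existing subtree
        newick = f"({newick}:{stem:.4g},{leaf}:{height:.4g})"
        previous_height = height
    return newick + ";"

tree = ladder_newick("hu", joins)
print(tree)
```

Formatting with four significant digits is a choice made here only to hide floating-point noise in the subtractions.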
Xuhua Xia Slide 13
Final UPGMA Tree
[Figure: final UPGMA tree of Human, Chimp, Gorilla, Orang, and Gibbon; the scale marks node heights 0.092, 0.060, 0.019, and 0.0075 alongside 19, 13, 8, and 6 MY]
((((hu:0.0075,ch:0.0075):0.0115,go:0.019):0.041,or:0.06):0.032,gi:0.092);
Xuhua Xia Slide 14
Distance-based method
• Distance matrix
• Tree-building algorithms
– UPGMA
– Neighbor-joining
– FastME
– Fitch-Margoliash
• Criterion-based methods
– Branch-length estimation
– Tree-selection criterion
Xuhua Xia Slide 15
Branch Length Estimation
• For three OTUs, the branch lengths can be estimated directly
• For more than three OTUs, there are two commonly used methods for estimating branch lengths
– The least-squares method
– The Fitch-Margoliash method
• Don’t confuse the Fitch-Margoliash method of branch length estimation with the Fitch-Margoliash criterion of tree selection
• Illustration of the least-squares method of branch length estimation
Xuhua Xia Slide 16
For three OTUs
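Observed distances:
   1      2      3
1         0.092  0.179
2                0.179
3
In general:
   1      2      3
1         d12    d13
2                d23
3
d12 = x1 + x2
d13 = x1 + x3
d23 = x2 + x3
[Figure: star tree in which OTUs 1, 2, and 3 attach to a single internal node by branches x1, x2, and x3]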
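With three OTUs there are three equations and three unknowns, so the system can be solved exactly. Adding the first two equations and subtracting the third gives x1 = (d12 + d13 - d23)/2, and similarly for x2 and x3. A minimal sketch, with the function name `three_otu_branch_lengths` chosen for this illustration:

```python
def three_otu_branch_lengths(d12, d13, d23):
    """Solve d12 = x1 + x2, d13 = x1 + x3, d23 = x2 + x3 for x1, x2, x3."""
    x1 = (d12 + d13 - d23) / 2
    x2 = (d12 + d23 - d13) / 2
    x3 = (d13 + d23 - d12) / 2
    return x1, x2, x3

# Distances from the slide.
x1, x2, x3 = three_otu_branch_lengths(0.092, 0.179, 0.179)
print(x1, x2, x3)

# The fitted branch lengths reproduce the observed distances exactly.
print(x1 + x2, x1 + x3, x2 + x3)
```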
Xuhua Xia Slide 17
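Least-squares method
[Figure: unrooted four-OTU tree in which OTUs 1 and 2 (branches x1, x2) are connected to OTUs 3 and 4 (branches x3, x4) by the internal branch x5]
Observed distances:
4
Sp1
Sp2 0.3
Sp3 0.4 0.5
Sp4 0.4 0.6 0.6
In general:
4
Sp1
Sp2 d12
Sp3 d13 d23
Sp4 d14 d24 d34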
Xuhua Xia Slide 18
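Least-squares method
[Figure: unrooted four-OTU tree in which OTUs 1 and 2 (branches x1, x2) are connected to OTUs 3 and 4 (branches x3, x4) by the internal branch x5]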
d’12 = x1 + x2
d’13 = x1 + x5+ x3
d’14 = x1 + x5 + x4
d’23 = x2 + x5 + x3
d’24 = x2 + x5 + x4
d’34 = x3 + x4
(d12 - d’12)^2 = [d12 – (x1 + x2)]^2
(d13 - d’13)^2 = [d13 – (x1 + x5 + x3)]^2
(d14 - d’14)^2 = [d14 – (x1 + x5 + x4)]^2
(d23 - d’23)^2 = [d23 – (x2 + x5 + x3)]^2
(d24 - d’24)^2 = [d24 – (x2 + x5 + x4)]^2
(d34 - d’34)^2 = [d34 – (x3 + x4)]^2
$SS = \sum_{i<j}^{n} (d_{ij} - d'_{ij})^2$
Least-squares method: find the xi values that minimize SS.
Xuhua Xia Slide 19
Least-squares method
SS = [d12 – (x1 + x2)]^2 + [d13 – (x1 + x5 + x3)]^2 + [d14 – (x1 + x5 + x4)]^2
+ [d23 – (x2 + x5 + x3)]^2 + [d24 – (x2 + x5 + x4)]^2 + [d34 – (x3 + x4)]^2
Taking the partial derivative of SS with respect to each xi, we have
∂SS/∂x1 = -2 d12 + 6 x1 + 2 x2 - 2 d13 + 4 x5 + 2 x3 - 2 d14 + 2 x4
∂SS/∂x2 = -2 d12 + 2 x1 + 6 x2 - 2 d23 + 4 x5 + 2 x3 - 2 d24 + 2 x4
∂SS/∂x3 = -2 d13 + 2 x1 + 4 x5 + 6 x3 - 2 d23 + 2 x2 - 2 d34 + 2 x4
∂SS/∂x4 = -2 d14 + 2 x1 + 4 x5 + 6 x4 - 2 d24 + 2 x2 - 2 d34 + 2 x3
∂SS/∂x5 = -2 d13 + 4 x1 + 8 x5 + 4 x3 - 2 d14 + 4 x4 - 2 d23 + 4 x2 - 2 d24
Setting these partial derivatives to 0 and solving for xi, we have
x1 = d13/4 + d12/2 - d23/4 + d14/4 - d24/4
x2 = d12/2 - d13/4 + d23/4 - d14/4 + d24/4
x3 = d13/4 + d23/4 + d34/2 - d14/4 - d24/4
x4 = d14/4 - d13/4 - d23/4 + d34/2 + d24/4
x5 = - d12/2 + d23/4 - d34/2 + d14/4 + d24/4 + d13/4
Xuhua Xia Slide 20
Least-squares method
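x1 = d13/4 + d12/2 - d23/4 + d14/4 - d24/4
x2 = d12/2 - d13/4 + d23/4 - d14/4 + d24/4
x3 = d13/4 + d23/4 + d34/2 - d14/4 - d24/4
x4 = d14/4 - d13/4 - d23/4 + d34/2 + d24/4
x5 = - d12/2 + d23/4 - d34/2 + d14/4 + d24/4 + d13/4
Observed distances:
4
Sp1
Sp2 0.3
Sp3 0.4 0.5
Sp4 0.4 0.6 0.6
Estimated branch lengths:
x1 = 0.075
x2 = 0.225
x3 = 0.275
x4 = 0.325
x5 = 0.025
[Figure: unrooted four-OTU tree in which OTUs 1 and 2 (branches x1, x2) are connected to OTUs 3 and 4 (branches x3, x4) by the internal branch x5]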
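The same estimates can be obtained numerically, without deriving the closed-form expressions by hand. The sketch below writes the six path equations as a design matrix, forms the normal equations (which are the zero-derivative conditions divided by 2), and solves them by Gauss-Jordan elimination. It uses only the standard library; the helper names `solve_linear` and `least_squares` are made up for this illustration, and a linear-algebra library would normally be used instead.

```python
# Rows are the pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4);
# columns are x1..x5. A 1 means the branch lies on the path between the pair.
A = [
    [1, 1, 0, 0, 0],  # d'12 = x1 + x2
    [1, 0, 1, 0, 1],  # d'13 = x1 + x5 + x3
    [1, 0, 0, 1, 1],  # d'14 = x1 + x5 + x4
    [0, 1, 1, 0, 1],  # d'23 = x2 + x5 + x3
    [0, 1, 0, 1, 1],  # d'24 = x2 + x5 + x4
    [0, 0, 1, 1, 0],  # d'34 = x3 + x4
]
d = [0.3, 0.4, 0.4, 0.5, 0.6, 0.6]  # observed d12, d13, d14, d23, d24, d34

def solve_linear(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def least_squares(A, d):
    """Minimize sum((d - A x)^2) by solving the normal equations A^T A x = A^T d."""
    k = len(A[0])
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(k)] for i in range(k)]
    Atd = [sum(row[i] * y for row, y in zip(A, d)) for i in range(k)]
    return solve_linear(AtA, Atd)

x = least_squares(A, d)
print([round(v, 3) for v in x])
```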
Xuhua Xia Slide 21
Minimum Evolution Criterion
[Figure: three alternative unrooted topologies for OTUs 1 to 4, each with terminal branches x1 to x4 and internal branch x5]
The minimum evolution (ME) criterion: The tree with the shortest TreeLen is the best tree.
$TreeLen = \sum_{i=1}^{2n-3} x_i$, where n is the number of OTUs.
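To make the criterion concrete, the sketch below computes TreeLen for each of the three unrooted four-OTU topologies, using the four-species distance matrix from the least-squares slides. It assumes branch lengths are estimated by ordinary least squares and summed as they are, including any negative estimates; the closed-form expressions are the ones derived for ((1,2),(3,4)) with the OTU labels permuted. The names `dist`, `ls_branch_lengths`, and `tree_lengths` are hypothetical.

```python
D = {(1, 2): 0.3, (1, 3): 0.4, (1, 4): 0.4, (2, 3): 0.5, (2, 4): 0.6, (3, 4): 0.6}

def dist(i, j):
    """Observed distance between OTUs i and j, in either order."""
    return D[(i, j)] if (i, j) in D else D[(j, i)]

def ls_branch_lengths(a, b, c, e):
    """Least-squares branch lengths for the unrooted topology ((a,b),(c,e)).

    Returns [xa, xb, xc, xe, x_internal].
    """
    dab, dce = dist(a, b), dist(c, e)
    dac, dae, dbc, dbe = dist(a, c), dist(a, e), dist(b, c), dist(b, e)
    xa = dab / 2 + (dac + dae - dbc - dbe) / 4
    xb = dab / 2 + (dbc + dbe - dac - dae) / 4
    xc = dce / 2 + (dac + dbc - dae - dbe) / 4
    xe = dce / 2 + (dae + dbe - dac - dbc) / 4
    x_internal = (dac + dae + dbc + dbe) / 4 - dab / 2 - dce / 2
    return [xa, xb, xc, xe, x_internal]

topologies = {
    "((1,2),(3,4))": (1, 2, 3, 4),
    "((1,3),(2,4))": (1, 3, 2, 4),
    "((1,4),(2,3))": (1, 4, 2, 3),
}

# TreeLen is the sum of the 2n - 3 = 5 branch lengths of each topology.
tree_lengths = {name: sum(ls_branch_lengths(*otus)) for name, otus in topologies.items()}
for name, length in tree_lengths.items():
    print(name, round(length, 4))
```

Reporting all three lengths rather than only the minimum keeps ties visible: with this matrix, ((1,2),(3,4)) and ((1,4),(2,3)) both come out at 0.925, while ((1,3),(2,4)) is longer at 0.95.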