Available at http://planet.uwc.ac.za/nisl
DNA variation in Ecology and Evolution
IV - Clustering methods and phylogenetic reconstruction
Maria Eugenia D’Amato
BCB 705:Biodiversity
Organization of the presentation
• Phylogenetic reconstruction
   • Distance
   • Maximum likelihood (ML)
   • Maximum parsimony (MP)
• Networks
• Multivariate analysis
Characters: independent and homologous
• Continuous
• Discrete: binary or multistate
Example binary matrix (rows 1 to 5, characters C1 to C6):
      C1  C2  C3  C4  C5  C6
  1    1   0   0   1   1   1
  2    1   0   0   0   1   1
  3    1   1   1   1   1   1
  4    0   0   1   1   1   1
  5    1   1   1   1   0   0
DNA sequence characters
Alignment = hypothesizing a homology relationship for each site
Sequence comparison: BLAST search against GenBank
• Coding sequence: blastn (nucleotide vs. nucleotide) or blastx (translated nucleotide query vs. protein database)
• Non-coding DNA: blastn
BLAST search results
Sequences producing significant alignments:                         Score (bits)   E value
gi|87299397|dbj|AB239568.1|  Mantella baroni mitochondrial ND5...   101            3e-18
gi|343991|dbj|D10368.1|FRGMTURF2  Rana catesbeiana mitochondri...   97.6           5e-17
gi|14209845|gb|AF314017.1|AF314017  Rana sylvatica NADH dehydr...   93.7           8e-16
Callouts:
• The lower the E-value, the more significant the alignment (fewer hits this good would be expected by chance)
• The identifiers contain the GenBank accession number of each sequence (e.g. AB239568.1)
• The names give the species whose sequences match the query
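To see why a lower E-value means a more significant hit: BLAST relates the bit score S' to the E-value approximately as E = m·n·2^(−S'), where m and n are the effective lengths of the query and of the database. Each extra bit therefore halves the number of hits expected by chance. The sketch below assumes a hypothetical search space (the real one for the search above is not given), so only the relative comparison between hits is meaningful.

```python
def e_value_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate BLAST E-value from a bit score: E = m * n * 2**(-S')."""
    # query_len (m) and db_len (n) stand in for the effective search-space lengths.
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical search space: a 600-nt query against a 1e9-nt database.
    m, n = 600, 10**9
    for bits in (101.0, 97.6, 93.7):
        print(f"{bits:6.1f} bits -> E ~ {e_value_from_bits(bits, m, n):.1e}")
```

The ordering matches the hit table: the highest bit score gets the smallest E-value.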
BLAST search results
>gi|87299397|dbj|AB239568.1| Mantella baroni mitochondrial ND5, ND1, ND2 genes for NADH dehydrogenase
subunit 5, NADH dehydrogenase subunit 1, NADH dehydrogenase
subunit 2, complete cds
Length=10814
Score = 101 bits (51), Expect = 3e-18
Identities = 99/115 (86%), Gaps = 0/115 (0%)
Strand=Plus/Minus
Query  451    TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA  510
              |||||||||||||||||||||  ||||||| ||||||||| ||||| |  |||||||| |
Sbjct  10203  TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA  10144
Query  511    AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG  565
              | |||| ||||||||||||||| |||||| |||| || ||||| || ||||||||
Sbjct  10143  AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG  10089
Callouts:
• Header line: description of the genes contained in the sequence with this accession number
• Strand=Plus/Minus: the plus strand of the query is aligned with the minus strand of the subject (subject coordinates decrease)
• Query/Sbjct rows: the alignment itself, read from the 5' end of the query
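The "Identities" and "Gaps" figures in the hit header are simple column counts over the aligned rows. A minimal sketch, assuming both aligned strings have the same length and that '-' marks a gap; the two Query/Sbjct blocks of the ND5 hit are joined end to end as test data.

```python
def alignment_stats(query: str, subject: str) -> tuple[int, int, int]:
    """Count identities and gap columns in a pairwise alignment.

    Returns (identities, gap_columns, alignment_length).
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, gaps, len(query)


# The two blocks of the Mantella baroni ND5 hit, joined end to end.
QUERY = ("TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA"
         "AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG")
SUBJECT = ("TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA"
           "AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG")

ident, gaps, length = alignment_stats(QUERY, SUBJECT)
# prints: Identities = 99/115 (86%), Gaps = 0/115
print(f"Identities = {ident}/{length} ({100 * ident / length:.0f}%), Gaps = {gaps}/{length}")
```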
Phylogenetic reconstruction: distance methods
Data matrix (5 taxa × 7 characters, C1 to C7) → distance criterion, or similarity/dissimilarity criterion → 5 × 5 distance matrix → dendrogram
Distance criteria for binary data
Jaccard: $J = \frac{a}{a + b + c}$, where a = bands shared by samples A and B, b = bands present only in A, c = bands present only in B; Jaccard's distance is $1 - J$
For two points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$:
Manhattan distance $M = |x_1 - x_2| + |y_1 - y_2|$
Euclidean distance $E = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
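The three criteria above can be sketched in a few lines, applied here to rows 1 and 2 of the binary character matrix. Note that Jaccard ignores shared absences (0/0), while Manhattan and Euclidean distances count every character. Treating two all-absent profiles as identical is an assumption of this sketch.

```python
import math


def jaccard_distance(x: list[int], y: list[int]) -> float:
    """1 - a/(a + b + c) for two 0/1 band profiles of equal length."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # shared bands
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # only in x
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # only in y
    if a + b + c == 0:
        return 0.0  # assumption: two profiles with no bands count as identical
    return 1.0 - a / (a + b + c)


def manhattan(p: list[float], q: list[float]) -> float:
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def euclidean(p: list[float], q: list[float]) -> float:
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


row1 = [1, 0, 0, 1, 1, 1]
row2 = [1, 0, 0, 0, 1, 1]
print(jaccard_distance(row1, row2), manhattan(row1, row2), euclidean(row1, row2))
```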
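Distance criterion for DNA data: models of DNA substitution
p-distance: $p = \frac{\text{number of differing nucleotides}}{\text{total number of nucleotides compared}}$
Relative frequencies of nucleotide pairs between sequences x and y (row: base in x, column: base in y):
$F_{xy} = \begin{pmatrix} f_{AA} & f_{AC} & f_{AG} & f_{AT} \\ f_{CA} & f_{CC} & f_{CG} & f_{CT} \\ f_{GA} & f_{GC} & f_{GG} & f_{GT} \\ f_{TA} & f_{TC} & f_{TG} & f_{TT} \end{pmatrix} = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}$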
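A minimal sketch of building F_xy from two aligned sequences and reading the p-distance off its diagonal. It assumes that sites with gaps or ambiguous bases are simply skipped; the example sequences are hypothetical.

```python
BASES = "ACGT"


def pair_frequencies(x: str, y: str) -> list[list[float]]:
    """4 x 4 matrix F[i][j] = proportion of sites with BASES[i] in x and BASES[j] in y."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: only sites where both sequences carry A, C, G or T are compared.
    pairs = [(a, b) for a, b in zip(x.upper(), y.upper()) if a in BASES and b in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    counts = [[0] * 4 for _ in range(4)]
    for a, b in pairs:
        counts[BASES.index(a)][BASES.index(b)] += 1
    return [[c / len(pairs) for c in row] for row in counts]


def p_distance(f: list[list[float]]) -> float:
    """Proportion of differing sites: 1 - (a + f + k + p), i.e. 1 minus the diagonal."""
    return 1.0 - sum(f[i][i] for i in range(4))


F = pair_frequencies("AACG", "AATG")  # hypothetical sequences: one C/T difference
print(p_distance(F))
```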
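Models of DNA substitution
Jukes and Cantor (JC69): equal substitution rates, equal base frequencies
$D = 1 - (a + f + k + p)$
$d_{xy} = -\frac{3}{4} \ln\left(1 - \frac{4}{3} D\right)$
F81: equal substitution rates, unequal base frequencies $\pi_A, \pi_C, \pi_G, \pi_T$
$B = 1 - (\pi_A^2 + \pi_C^2 + \pi_G^2 + \pi_T^2)$
$d_{xy} = -B \ln\left(1 - \frac{D}{B}\right)$
Kimura two-parameter (K2P): separate rates for transitions and transversions
$P = c + h + i + n$ (transitions)
$Q = b + d + e + g + j + l + m + o$ (transversions)
$d_{xy} = \frac{1}{2} \ln\left(\frac{1}{1 - 2P - Q}\right) + \frac{1}{4} \ln\left(\frac{1}{1 - 2Q}\right)$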
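The three corrected distances can be sketched directly from the formulas. As a sanity check, F81 reduces to JC69 when all base frequencies are 1/4, and K2P reduces to JC69 when transitions make up one third of the differences (P = D/3, Q = 2D/3). The input proportions below are hypothetical.

```python
import math


def jc69(D: float) -> float:
    """Jukes-Cantor distance from the proportion of differing sites D."""
    # Undefined (math.log raises ValueError) once D >= 3/4: the sequences are saturated.
    return -0.75 * math.log(1.0 - 4.0 * D / 3.0)


def f81(D: float, freqs: list[float]) -> float:
    """F81 distance; freqs are the base frequencies (pi_A, pi_C, pi_G, pi_T)."""
    B = 1.0 - sum(pi * pi for pi in freqs)
    return -B * math.log(1.0 - D / B)


def k2p(P: float, Q: float) -> float:
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    return 0.5 * math.log(1.0 / (1.0 - 2.0 * P - Q)) + 0.25 * math.log(1.0 / (1.0 - 2.0 * Q))


D = 0.2
print(jc69(D), f81(D, [0.25] * 4), k2p(D / 3, 2 * D / 3))  # all three agree here
```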
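Distance criteria for diploid data (allele frequencies)
Nei 1972: $D_N = -\ln I$, with $I = \frac{J_{XY}}{\sqrt{J_X J_Y}}$
$J_X = \sum_i x_i^2$, $J_Y = \sum_i y_i^2$, $J_{XY} = \sum_i x_i y_i$
where $x_i$ and $y_i$ are the frequencies of allele i in populations X and Y; with several loci, $J_X$, $J_Y$ and $J_{XY}$ are averaged over loci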
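Cavalli-Sforza 1967: $D_{arc} = \frac{1}{L} \sum_{l=1}^{L} \left(\frac{2\theta_l}{\pi}\right)^2$, summed over L loci
$\theta = \cos^{-1} \sum_i \sqrt{x_i y_i}$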
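A minimal sketch of Nei's standard distance (J values averaged over loci) and of the Cavalli-Sforza angle θ for a single locus. Allele frequencies are given as one list per locus; the example populations are hypothetical.

```python
import math


def nei_distance(pop_x: list[list[float]], pop_y: list[list[float]]) -> float:
    """Nei's (1972) standard distance D = -ln(Jxy / sqrt(Jx * Jy)).

    pop_x[l][i] is the frequency of allele i at locus l in population X
    (same layout for pop_y). Jx, Jy and Jxy are averaged over loci.
    """
    loci = len(pop_x)
    jx = sum(sum(x * x for x in locus) for locus in pop_x) / loci
    jy = sum(sum(y * y for y in locus) for locus in pop_y) / loci
    jxy = sum(sum(x * y for x, y in zip(lx, ly)) for lx, ly in zip(pop_x, pop_y)) / loci
    # If the populations share no alleles, jxy == 0 and math.log raises ValueError (D is infinite).
    return -math.log(jxy / math.sqrt(jx * jy))


def cs_angle(x: list[float], y: list[float]) -> float:
    """Cavalli-Sforza angle theta = arccos(sum_i sqrt(x_i * y_i)) for one locus."""
    s = sum(math.sqrt(xi * yi) for xi, yi in zip(x, y))
    return math.acos(min(1.0, s))  # clamp rounding excess above 1


x_freqs, y_freqs = [0.5, 0.5], [1.0, 0.0]  # one locus, two alleles
print(nei_distance([x_freqs], [y_freqs]), cs_angle(x_freqs, y_freqs))
```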
Phylogenetic reconstruction criterion for distance data
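[Figure: additive tree (NJ) for taxa A to D with branch lengths V1 to V5; ultrametric tree (UPGMA) for taxa A to C with branch lengths V1 to V4]
Properties
Additive tree (NJ): the distance between two taxa is the sum of the branch lengths on the path joining them
dAB = v1 + v2
dAC = v1 + v3 + v4
dAD = v1 + v3 + v5
dBC = v2 + v3 + v4
dBD = v2 + v3 + v5
dCD = v4 + v5
Ultrametric tree (UPGMA): additive, and in addition all tips are equidistant from the root
dAB = v1 + v2 + v3
dAC = v1 + v2 + v4
dBC = v3 + v4
v3 = v4
v1 = v2 + v3 = v2 + v4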
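The two tree properties correspond to standard tests on a distance matrix: a matrix fits an additive tree if it satisfies the four-point condition, and an ultrametric tree if it satisfies the three-point condition. These conditions are not stated on the slides; the sketch below uses them with hypothetical branch lengths plugged into the path sums above.

```python
from itertools import combinations


def symmetric(pairs: dict) -> dict:
    """Fill in both orderings of each pair so d[(x, y)] == d[(y, x)]."""
    d = dict(pairs)
    d.update({(y, x): v for (x, y), v in pairs.items()})
    return d


def is_ultrametric(d: dict, taxa: str, tol: float = 1e-9) -> bool:
    """Three-point condition: in every triple the two largest distances are equal."""
    for x, y, z in combinations(taxa, 3):
        _, mid, top = sorted([d[(x, y)], d[(x, z)], d[(y, z)]])
        if abs(top - mid) > tol:
            return False
    return True


def is_additive(d: dict, taxa: str, tol: float = 1e-9) -> bool:
    """Four-point condition: in every quartet the two largest pair sums are equal."""
    for w, x, y, z in combinations(taxa, 4):
        sums = [d[(w, x)] + d[(y, z)], d[(w, y)] + d[(x, z)], d[(w, z)] + d[(x, y)]]
        _, mid, top = sorted(sums)
        if abs(top - mid) > tol:
            return False
    return True


# Hypothetical branch lengths for the additive tree ((A:v1, B:v2):v3, C:v4, D:v5).
v1, v2, v3, v4, v5 = 2, 3, 1, 4, 2
additive = symmetric({("A", "B"): v1 + v2, ("A", "C"): v1 + v3 + v4, ("A", "D"): v1 + v3 + v5,
                      ("B", "C"): v2 + v3 + v4, ("B", "D"): v2 + v3 + v5, ("C", "D"): v4 + v5})
print(is_additive(additive, "ABCD"), is_ultrametric(additive, "ABCD"))  # True False

# Hypothetical ultrametric tree: v3 = v4 = 1, v2 = 2, v1 = v2 + v3 = 3.
ultra = symmetric({("A", "B"): 3 + 2 + 1, ("A", "C"): 3 + 2 + 1, ("B", "C"): 1 + 1})
print(is_ultrametric(ultra, "ABC"))  # True
```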
Maximum Likelihood
(1)C….GGACACGTTTA….C
(2)C….AGACACCTCTA….C
(3)C….GGATAAGTTAA….C
(4)C….GGATAGCCTAG….C
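[Figure: sites 1 … j … N of the alignment; for site j (tip states C, A, C, G), the unrooted tree of sequences 1 to 4 and the same tree after rooting at an internal node, with internal nodes 5 and 6]
Likelihood of site j: sum over all possible ancestral states at internal nodes 5 and 6 (4 × 4 = 16 combinations)
$L_j = \sum_{s_5} \sum_{s_6} \Pr(C, A, C, G, s_5, s_6) = \Pr(\ldots, A, A) + \Pr(\ldots, A, C) + \cdots$
Likelihood of the tree: $L = L_1 \times L_2 \times \cdots \times L_N = \prod_{j=1}^{N} L_j$
Log-likelihood: $\ln L = \ln L_1 + \ln L_2 + \cdots + \ln L_N = \sum_{j=1}^{N} \ln L_j$
$L_D = \Pr(D \mid H)$: the likelihood of a hypothesis H (tree, branch lengths, substitution model) is the probability of the data D given H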
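The product-of-sites idea can be shown on the simplest possible "tree": two sequences joined by one branch of length d under the Jukes-Cantor model. This is a simplification of the four-taxon case (no internal nodes to sum over), and the grid search is a crude stand-in for the numerical optimization real programs use. For this case the maximum-likelihood d coincides with the JC69 distance formula, which the last line prints for comparison. The sequences are hypothetical.

```python
import math


def site_likelihood(x: str, y: str, d: float) -> float:
    """JC69 likelihood of one aligned site (bases x, y) for two sequences d apart."""
    e = math.exp(-4.0 * d / 3.0)
    p_xy = 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e  # transition probability
    return 0.25 * p_xy  # times the equilibrium frequency 1/4 of the starting base


def log_likelihood(seq1: str, seq2: str, d: float) -> float:
    """ln L = sum over sites of ln L_j (sites assumed independent)."""
    return sum(math.log(site_likelihood(a, b, d)) for a, b in zip(seq1, seq2))


def ml_distance(seq1: str, seq2: str, step: float = 0.001, d_max: float = 3.0) -> float:
    """Branch length d that maximizes ln L on a grid (d > 0)."""
    grid = [step * i for i in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: log_likelihood(seq1, seq2, d))


s1, s2 = "ACGTACGTAC", "ACGTACGTTT"  # hypothetical: 2 differences in 10 sites
print(ml_distance(s1, s2), -0.75 * math.log(1 - 4 / 3 * 0.2))
```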
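Hypothesis testing: likelihood ratio test
$\Delta = \log L_1 - \log L_0$
Uses: testing rate variation among lineages (molecular clock); choosing the appropriate substitution model
$2\Delta$ follows a $\chi^2$ distribution with d.f. = number of sequences in the tree $- 2$ (clock test); or
d.f. = difference in the number of parameters between H1 and H0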
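A minimal sketch of the test, assuming natural-log likelihoods. To stay within the standard library, the χ² tail probability is computed from the series for the regularized incomplete gamma function; this is adequate for moderate statistics, but a statistics library is preferable for very small p-values. The log-likelihoods are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df > x) via the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))  # regularized P(a, z)
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(lnL0: float, lnL1: float, df: int) -> tuple[float, float]:
    """Return (2 * Delta, p-value) for nested hypotheses H0 (simpler) and H1 (richer)."""
    stat = 2.0 * (lnL1 - lnL0)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a simpler (H0) and a richer (H1) substitution model.
print(likelihood_ratio_test(lnL0=-2510.4, lnL1=-2503.1, df=4))
```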
Bootstrapping: how well supported are the groups?
[Figure: bootstrap support values on a tree of trumpet fish]
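Bootstrapping resamples alignment columns with replacement, rebuilds the tree from each pseudo-alignment, and reports how often each group reappears. A minimal sketch: `infer_clades` is a hypothetical placeholder for any tree-building method (distance, MP or ML) that returns the clades of its tree as frozensets of sequence indices.

```python
import random


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """Resample alignment columns with replacement to build one pseudo-alignment."""
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def clade_support(alignment, infer_clades, n_reps=100, seed=1):
    """Percentage of bootstrap replicates in which each clade is recovered.

    infer_clades (hypothetical) takes an alignment and returns a set of clades,
    each clade a frozenset of sequence indices.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_reps):
        for clade in infer_clades(bootstrap_replicate(alignment, rng)):
            counts[clade] = counts.get(clade, 0) + 1
    return {clade: 100.0 * c / n_reps for clade, c in counts.items()}


aln = ["ATATT", "ATCGT", "GCAGT", "GCCGT"]
print(bootstrap_replicate(aln, random.Random(0)))
```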
Maximum Parsimony: minimize tree length
To obtain rooted trees (and character polarity), use an outgroup; this assumes that the ingroup is monophyletic.
1 ATATT
2 ATCGT
3 GCAGT
4 GCCGT
[Figure: first site (states A, A, G, G in sequences 1 to 4) mapped onto a tree, with reconstructed ancestral states and the number of changes required]
Maximum Parsimony: example
[Figure: sites 2 to 5 mapped onto the tree with reconstructed ancestral states; site 5 requires no changes]
Tree length: $L = \sum_{i=1}^{k} l_i$, where $l_i$ is the number of changes at site i and k is the number of sites
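Maximum parsimony: example
                   Sites
Tree               1  2  3  4  5  Total
((1,2),(3,4))      1  1  2  1  0  5
((1,3),(2,4))      2  2  1  1  0  6
((1,4),(2,3))      2  2  2  1  0  7
Phylogenetically informative sites: 1, 2 and 3 (site 4 costs one change and site 5 none on every tree). The most parsimonious tree is ((1,2),(3,4)), with length 5.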
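The per-site change counts in the table can be reproduced with Fitch's algorithm (not named on the slides), which computes the minimum number of changes for one site on a given tree. In this sketch each unrooted four-taxon tree is rooted arbitrarily on its central branch; the Fitch length does not depend on where the root is placed. A second helper flags parsimony-informative sites: at least two states, each present in at least two sequences.

```python
def fitch(tree, states: dict) -> tuple[set, int]:
    """Fitch pass for one site on a rooted binary tree given as nested tuples.

    Returns the state set at the root and the minimum number of changes.
    """
    if not isinstance(tree, tuple):  # leaf: a taxon label
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def site_lengths(tree, seqs: dict) -> list[int]:
    n_sites = len(next(iter(seqs.values())))
    return [fitch(tree, {t: s[i] for t, s in seqs.items()})[1] for i in range(n_sites)]


def informative_sites(seqs: dict) -> list[int]:
    """1-based sites where at least two states each occur in at least two sequences."""
    n_sites = len(next(iter(seqs.values())))
    result = []
    for i in range(n_sites):
        column = [s[i] for s in seqs.values()]
        if sum(1 for b in set(column) if column.count(b) >= 2) >= 2:
            result.append(i + 1)
    return result


seqs = {1: "ATATT", 2: "ATCGT", 3: "GCAGT", 4: "GCCGT"}
for tree in (((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))):
    per_site = site_lengths(tree, seqs)
    print(tree, per_site, sum(per_site))
print("informative sites:", informative_sites(seqs))
```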
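Networks
• Phylogenetic representation allowing reticulation
• More appropriate for intraspecific data
• The ancestor may still be alive (ancestral haplotypes can be present in the sample)
• Reticulation can arise from hybridization, recombination, horizontal transfer or polyploidization
[Figure: example network linking short sequences such as agct, acat, agctacat and agctacct]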
Multivariate clustering
Data matrix: 5 samples × 7 characters (C1 to C7)
Similarity criterion: correlations between characters (7 × 7 matrix)
Calculate the eigenvectors with the highest eigenvalues
Project the data onto the new axes (eigenvectors)
[Figure: samples plotted on the 1st (X), 2nd (Y) and 3rd (Z) axes]
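The steps above (correlation matrix, leading eigenvectors, projection) amount to a principal component analysis on the correlation matrix. The sketch below is pure Python and uses power iteration with deflation only to avoid external libraries; in practice a linear-algebra library would do the eigendecomposition. The data are hypothetical: characters 1 and 2 are perfectly correlated and character 3 is uncorrelated with them, so the eigenvalues should be close to 2 and 1. Scores are computed from columns scaled to unit sum of squares, which differs from unit-variance scaling only by a constant factor.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlations between the columns of a samples x characters matrix.

    Also returns the standardized data (columns centered and scaled to unit sum
    of squares); constant columns are set to zero.
    """
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    cen = [[row[j] - means[j] for j in range(m)] for row in data]
    norms = [math.sqrt(sum(r[j] ** 2 for r in cen)) for j in range(m)]
    z = [[r[j] / norms[j] if norms[j] > 0 else 0.0 for j in range(m)] for r in cen]
    corr = [[sum(r[i] * r[j] for r in z) for j in range(m)] for i in range(m)]
    return corr, z


def top_eigen(matrix, k, iters=2000, tol=1e-12, seed=0):
    """Largest k eigenpairs of a symmetric positive semi-definite matrix."""
    m = len(matrix)
    a = [row[:] for row in matrix]
    rng = random.Random(seed)
    values, vectors = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]  # positive random start vector
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            w = [x / norm for x in w]
            done = max(abs(p - q) for p, q in zip(w, v)) < tol
            v = w
            if done:
                break
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(m)) for i in range(m))
        values.append(lam)
        vectors.append(v)
        # Deflate: subtract lam * v v^T so the next pass finds the next-largest eigenvalue.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return values, vectors


def pca_scores(data, k=3):
    """Eigenvalues of the correlation matrix and sample coordinates on the first k axes."""
    corr, z = correlation_matrix(data)
    values, vectors = top_eigen(corr, k)
    scores = [[sum(r[j] * v[j] for j in range(len(v))) for v in vectors] for r in z]
    return values, scores


data = [[1, 2, 1], [2, 4, 0], [3, 6, 1], [4, 8, 0], [5, 10, 1]]  # hypothetical 5 x 3 matrix
values, scores = pca_scores(data, k=2)
print(values)  # eigenvalues close to 2 and 1
```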