
DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction


Page 1: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

Available at http://planet.uwc.ac.za/nisl

DNA variation in Ecology and Evolution

IV- Clustering methods and Phylogenetic reconstruction

Maria Eugenia D’Amato

BCB 705:Biodiversity

Page 2: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

Organization of the presentation

• Phylogenetic reconstruction: distance, maximum likelihood (ML), maximum parsimony (MP)

• Networks

• Multivariate analysis

Page 3: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

Characters: independent, homologous

• Continuous

• Discrete: binary or multistate

Example data matrix (taxa 1 to 5 scored for binary characters C1 to C6):

     C1  C2  C3  C4  C5  C6
1     1   0   0   1   1   1
2     1   0   0   0   1   1
3     1   1   1   1   1   1
4     0   0   1   1   1   1
5     1   1   1   1   0   0

Page 4: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

DNA sequence characters

Alignment = hypothesis of a homology relationship for each site

Sequence comparison: BLAST search against GenBank

• Coding sequence: blastn, blastx

• Non-coding DNA: blastn

Page 5: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

Blast search results

Sequences producing significant alignments:                         Score (Bits)   E Value

gi|87299397|dbj|AB239568.1| Mantella baroni mitochondrial ND5...    101            3e-18

gi|343991|dbj|D10368.1|FRGMTURF2 Rana catesbeiana mitochondri...    97.6           5e-17

gi|14209845|gb|AF314017.1|AF314017 Rana sylvatica NADH dehydr...    93.7           8e-16

Reading the table:

• First column: GenBank accession numbers for the sequence

• Second column: species that match the query

• E value: the lower the E-value, the better the alignment

Page 6: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

Blast search results

>gi|87299397|dbj|AB239568.1| Mantella baroni mitochondrial ND5, ND1, ND2 genes for NADH dehydrogenase subunit 5, NADH dehydrogenase subunit 1, NADH dehydrogenase subunit 2, complete cds

Length=10814

Score = 101 bits (51), Expect = 3e-18

Identities = 99/115 (86%), Gaps = 0/115 (0%)

Strand=Plus/Minus

Query  451    TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA  510
              |||||||||||||||||||||  ||||||| ||||||||| ||||| |  |||||||| |
Sbjct  10203  TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA  10144

Query  511    AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG  565
              | |||| ||||||||||||||| |||||| |||| || ||||| || ||||||||
Sbjct  10143  AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG  10089

Reading the record:

• Header line: description of the genes contained in the sequence with this accession number

• Strand: strands aligned

• Query/Sbjct lines: the alignment, starting at the 5’ end
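The identity figure reported by BLAST can be recomputed from the aligned strings. The sketch below is only an illustration: the function names `identity` and `match_line` are made up here, and it assumes an ungapped alignment like this one, where both strings have the same length.

```python
# Aligned segments copied from the BLAST record above (query 451-565).
QUERY = ("TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA"
         "AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG")
SBJCT = ("TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA"
         "AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG")


def identity(a, b):
    """Return (matching positions, alignment length) for two aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches, len(a)


def match_line(a, b):
    """Rebuild the '|' line BLAST prints between query and subject."""
    return "".join("|" if x == y else " " for x, y in zip(a, b))


matches, length = identity(QUERY, SBJCT)
print(f"Identities = {matches}/{length} ({100 * matches // length}%)")
```

The 16 mismatching positions counted here are also the raw material for the p-distance used by the distance methods later in the presentation.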

Page 7: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

Phylogenetic reconstruction: Distance methods

Data matrix (5 x 7): taxa 1 to 5 scored for characters C1 to C7

→ distance criterion (similarity / dissimilarity criterion)

→ distance matrix (5 x 5)

→ dendrogram

Page 8: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

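Distance criteria for binary data

Jaccard’s coefficient, for two samples A and B:

$$J = \frac{a}{a + b + c}$$

a = bands common to A and B
b = bands exclusive to A
c = bands exclusive to B

J measures similarity; the corresponding Jaccard distance is 1 - J.

For two points P1 = (x1, y1) and P2 = (x2, y2):

Manhattan distance:

$$M = |x_1 - x_2| + |y_1 - y_2|$$

Euclidean distance:

$$E = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$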
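As a minimal sketch of these three criteria, the functions below take two profiles of equal length. The function names are illustrative, and the Manhattan and Euclidean versions are written for any number of coordinates rather than only the two shown on the slide, which is an assumption of this example.

```python
import math


def jaccard(profile_a, profile_b):
    """Jaccard coefficient a / (a + b + c) for two 0/1 band profiles."""
    a = sum(1 for x, y in zip(profile_a, profile_b) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(profile_a, profile_b) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(profile_a, profile_b) if x == 0 and y == 1)
    if a + b + c == 0:
        raise ValueError("no bands present in either profile")
    return a / (a + b + c)


def manhattan(p1, p2):
    """Sum of absolute coordinate differences."""
    return sum(abs(u - v) for u, v in zip(p1, p2))


def euclidean(p1, p2):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p1, p2)))


# Two hypothetical band profiles scored for six characters.
taxon_1 = [1, 0, 0, 1, 1, 1]
taxon_2 = [1, 0, 0, 0, 1, 1]
similarity = jaccard(taxon_1, taxon_2)
print(similarity, 1 - similarity)            # coefficient and distance
print(manhattan((0, 0), (3, 4)), euclidean((0, 0), (3, 4)))
```

Shared absences (0 in both profiles) do not enter the Jaccard coefficient, which is why it is often preferred for band presence/absence data.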

Page 9: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

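Distance criterion for DNA data - Models of DNA substitution

$$p = \frac{\text{number of different nucleotides}}{\text{total number of nucleotides}}$$

Divergence matrix for sequences x and y (rows and columns in the order A, C, G, T):

$$F_{xy} = \begin{pmatrix} f_{AA} & f_{AC} & f_{AG} & f_{AT} \\ f_{CA} & f_{CC} & f_{CG} & f_{CT} \\ f_{GA} & f_{GC} & f_{GG} & f_{GT} \\ f_{TA} & f_{TC} & f_{TG} & f_{TT} \end{pmatrix} = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}$$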

Fxy =

Page 10: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

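Models of DNA substitution

The letters a to p are the entries of the divergence matrix Fxy.

Jukes and Cantor (equal rate):

$$D = 1 - (a + f + k + p)$$

$$d_{xy} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right)$$

F81 (unequal base freqs):

$$B = 1 - (\pi_A^2 + \pi_C^2 + \pi_G^2 + \pi_T^2)$$

$$d_{xy} = -B\ln\left(1 - \frac{D}{B}\right)$$

K2P:

$$P = c + h + i + n \quad \text{(transitions)}$$

$$Q = b + d + e + g + j + l + m + o \quad \text{(transversions)}$$

$$d_{xy} = \frac{1}{2}\ln\frac{1}{1 - 2P - Q} + \frac{1}{4}\ln\frac{1}{1 - 2Q}$$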
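The Jukes and Cantor and K2P corrections can be sketched in a few lines. This is an illustration under stated assumptions: the function names are made up, and the sequences are assumed to be aligned, ungapped, and written with the upper-case letters A, C, G, T only.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_transversion(seq_x, seq_y):
    """Return (P, Q): proportions of sites showing a transition / transversion."""
    if len(seq_x) != len(seq_y) or not seq_x:
        raise ValueError("sequences must be aligned and non-empty")
    transitions = transversions = 0
    for x, y in zip(seq_x, seq_y):
        if x == y:
            continue
        # A<->G and C<->T stay within one chemical class: transitions.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq_x)
    return transitions / n, transversions / n


def jukes_cantor(d_observed):
    """d = -3/4 ln(1 - 4/3 D); only defined for D < 3/4."""
    return -0.75 * math.log(1 - 4 * d_observed / 3)


def kimura_2p(p, q):
    """d = 1/2 ln(1 / (1 - 2P - Q)) + 1/4 ln(1 / (1 - 2Q))."""
    return 0.5 * math.log(1 / (1 - 2 * p - q)) + 0.25 * math.log(1 / (1 - 2 * q))


p, q = transition_transversion("AAGC", "GACT")   # toy sequences
print(p, q)
print(jukes_cantor(0.15), kimura_2p(0.05, 0.10))
```

When transitions make up exactly one third of the observed differences (P = D/3, Q = 2D/3), the K2P formula gives the same value as Jukes and Cantor, which is a quick sanity check on both functions.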

Page 11: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

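Distance criteria for diploid data

Nei 1972:

$$D_n = -\ln I, \qquad I = \frac{J_{xy}}{\sqrt{J_x J_y}}$$

$$J_x = \sum x_i^2, \qquad J_y = \sum y_i^2, \qquad J_{xy} = \sum x_i y_i$$

where x_i and y_i are the frequencies of allele i in populations x and y.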

Cavalli Sforza 1967:

$$D_{arc} = \frac{1}{L}\sum\left(\frac{2\theta}{\pi}\right)^2, \qquad \theta = \cos^{-1}\sum\sqrt{x_i y_i}$$
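Nei’s 1972 distance is straightforward to compute from allele frequencies. The sketch below assumes each population is given as a list of loci, each locus being a list of allele frequencies in the same allele order for both populations. The function name and the example frequencies are hypothetical.

```python
import math


def nei_1972(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    The sums run over all alleles at all loci; dividing each sum by the
    number of loci would not change the ratio.
    """
    jx = sum(x * x for locus in pop_x for x in locus)
    jy = sum(y * y for locus in pop_y for y in locus)
    jxy = sum(x * y
              for locus_x, locus_y in zip(pop_x, pop_y)
              for x, y in zip(locus_x, locus_y))
    if jxy == 0:
        return math.inf  # no shared alleles: identity I is zero
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


# Hypothetical one-locus example with two alleles.
population_x = [[0.5, 0.5]]
population_y = [[1.0, 0.0]]
print(nei_1972(population_x, population_y))
```

Two populations with identical allele frequencies give I = 1 and therefore a distance of 0.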

Page 12: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

Phylogenetic reconstruction criterion for distance data

Additive tree (NJ)

[Figure: unrooted tree with taxa A, B, C, D and branch lengths v1 to v5]

Properties:

dAB = v1 + v2

dAC = v1 + v3 + v4

dAD = v1 + v3 + v5

dBC = v2 + v3 + v4

dBD = v2 + v3 + v5

dCD = v4 + v5

Ultrametric tree (UPGMA)

[Figure: rooted tree with taxa A, B, C and branch lengths v1 to v4]

Properties:

dAB = v1 + v2 + v3

dAC = v1 + v2 + v4

dBC = v3 + v4

v3 = v4

v1 = v2 + v3 = v2 + v4
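The two sets of properties can be checked numerically. The sketch below builds the pairwise distances from branch lengths and tests the standard four-point condition for additivity and the three-point condition for ultrametricity. The function names and branch lengths are illustrative assumptions, not values from the slides.

```python
from itertools import combinations


def additive_distances(v1, v2, v3, v4, v5):
    """Path-length distances on the four-taxon tree with internal branch v3."""
    return {("A", "B"): v1 + v2, ("A", "C"): v1 + v3 + v4,
            ("A", "D"): v1 + v3 + v5, ("B", "C"): v2 + v3 + v4,
            ("B", "D"): v2 + v3 + v5, ("C", "D"): v4 + v5}


def dist(d, a, b):
    """Look up a distance regardless of the order of the two taxa."""
    return d[tuple(sorted((a, b)))]


def four_point_holds(d, tol=1e-9):
    """Additive: of the three pairwise sums, the two largest are equal."""
    sums = sorted([dist(d, "A", "B") + dist(d, "C", "D"),
                   dist(d, "A", "C") + dist(d, "B", "D"),
                   dist(d, "A", "D") + dist(d, "B", "C")])
    return abs(sums[2] - sums[1]) <= tol


def is_ultrametric(d, taxa, tol=1e-9):
    """Ultrametric: in every triple, the two largest distances are equal."""
    for a, b, c in combinations(taxa, 3):
        three = sorted([dist(d, a, b), dist(d, a, c), dist(d, b, c)])
        if abs(three[2] - three[1]) > tol:
            return False
    return True


additive = additive_distances(1, 2, 3, 4, 5)
# Ultrametric three-taxon tree with v1 = 3, v2 = 1, v3 = v4 = 2.
clock_like = {("A", "B"): 3 + 1 + 2, ("A", "C"): 3 + 1 + 2, ("B", "C"): 2 + 2}
print(four_point_holds(additive), is_ultrametric(additive, "ABCD"))
print(is_ultrametric(clock_like, "ABC"))
```

The first tree is additive but not ultrametric, which is the situation where NJ is appropriate and UPGMA may recover the wrong topology.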

Page 13: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

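Maximum Likelihood

Aligned sequences (sites 1 … j … n):

(1) C….GGACACGTTTA….C

(2) C….AGACACCTCTA….C

(3) C….GGATAAGTTAA….C

(4) C….GGATAGCCTAG….C

[Figure: unrooted tree of sequences 1 to 4, and the same tree after rooting at an internal node (internal nodes 5 and 6); at site j the tips carry C, A, C, G]

The likelihood of site j sums over the possible states at the internal nodes:

$$L_j = \Pr(\text{internal nodes } A, A) + \Pr(\text{internal nodes } A, C) + \Pr(\dots) + \dots$$

$$L = L_1 \times L_2 \times L_3 \times \dots \times L_N = \prod_j L_j$$

$$\ln L = \ln L_1 + \ln L_2 + \dots + \ln L_N = \sum_j \ln L_j$$

$$L_D = \Pr(D \mid H)$$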
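To make the sum over internal-node states concrete, here is a minimal sketch for the four-sequence tree. The slides do not fix a substitution model or branch lengths, so this example assumes the Jukes and Cantor model, the topology ((1,2),(3,4)) rooted at internal node 5, and one common branch length `t` for all five branches. All function names are illustrative.

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(x, y, t):
    """Jukes-Cantor probability of base x becoming y along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def site_likelihood(tips, t):
    """L_j for tip states (s1, s2, s3, s4).

    Node 5 (the root) joins tips 1, 2 and node 6; node 6 joins tips 3 and 4.
    The sum runs over all 16 state pairs at the two internal nodes.
    """
    s1, s2, s3, s4 = tips
    total = 0.0
    for x5, x6 in product(BASES, repeat=2):
        total += (0.25                      # equal base frequencies at the root
                  * jc_prob(x5, s1, t) * jc_prob(x5, s2, t)
                  * jc_prob(x5, x6, t)
                  * jc_prob(x6, s3, t) * jc_prob(x6, s4, t))
    return total


def log_likelihood(seqs, t):
    """ln L = sum over sites of ln L_j."""
    return sum(math.log(site_likelihood(column, t)) for column in zip(*seqs))


# Visible portion of the four sequences shown on the slide.
SEQS = ["GGACACGTTTA", "AGACACCTCTA", "GGATAAGTTAA", "GGATAGCCTAG"]
print(log_likelihood(SEQS, 0.1))
```

A full ML search would repeat this calculation while optimising branch lengths and comparing topologies; the sketch only evaluates one tree at one fixed branch length.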

Page 14: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

Hypothesis testing: Likelihood ratio test

$$\Delta = \log L_1 - \log L_0$$

Used to test:

• Rate variation

• Appropriate substitution model

$2\Delta$ follows a $\chi^2$ distribution, with d.f. = N sequences in the tree - 2; or d.f. = difference in the number of parameters between H1 and H0.
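A likelihood ratio test can be sketched with the standard library alone. The log-likelihood values below are hypothetical, and `chi2_sf` is a small helper written for this example (upper tail of the chi-square distribution for whole-number degrees of freedom), not a call into any statistics package.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), for integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # closed form for df = 1
    # Step up two degrees of freedom at a time:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(ln_l0, ln_l1, df):
    """Return (2 * delta, p-value) for nested hypotheses H0 and H1."""
    statistic = 2.0 * (ln_l1 - ln_l0)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods: H1 has one more parameter than H0.
statistic, p_value = likelihood_ratio_test(-1234.5, -1230.0, df=1)
print(statistic, p_value)
```

With these made-up numbers the statistic is 9.0 and the p-value is below 0.05, so the simpler hypothesis H0 would be rejected.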

Page 15: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

Bootstrapping

How well supported are the groups?

[Figure: Trumpet fish]
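Bootstrap support is obtained by resampling alignment columns with replacement, rebuilding the tree for each pseudo-replicate, and counting how often a group reappears. The sketch below shows only the resampling and counting. The tree-building step is left as a caller-supplied function, `grouping_found`, which is a hypothetical placeholder, as are the toy sequences.

```python
import random


def bootstrap_alignment(seqs, rng):
    """Resample the columns of an alignment with replacement."""
    n_sites = len(seqs[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in seqs]


def bootstrap_support(seqs, grouping_found, replicates=100, seed=0):
    """Fraction of pseudo-replicates in which grouping_found(...) is True."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(replicates)
               if grouping_found(bootstrap_alignment(seqs, rng)))
    return hits / replicates


def differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def one_two_closer(replicate):
    """Toy criterion: sequence 1 is closer to sequence 2 than to 3 or 4."""
    s1, s2, s3, s4 = replicate
    return differences(s1, s2) < min(differences(s1, s3), differences(s1, s4))


ALIGNMENT = ["ATATT", "ATCGT", "GCAGT", "GCCGT"]   # toy data
print(bootstrap_support(ALIGNMENT, one_two_closer, replicates=200))
```

In practice `grouping_found` would run the chosen reconstruction method (distance, MP or ML) on the replicate and report whether the group of interest is present in the resulting tree.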

Page 16: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

Maximum Parsimony

Minimize tree length.

To obtain rooted trees (and character polarity), use an outgroup. The ingroup is assumed to be monophyletic.

Example sequences:

1 ATATT

2 ATCGT

3 GCAGT

4 GCCGT

[Figure: tree for the first site (tips A, A, G, G) shown with two assignments of ancestral states, one requiring 1 change and the other 5 changes]

Page 17: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

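Maximum Parsimony - example

[Figure: ancestral state reconstructions for site 2 (tips T, T, C, C), site 3 (tips A, C, A, C), site 4 (tips T, G, G, G) and site 5 (tips T, T, T, T; no changes)]

Tree length:

$$L = \sum_{i=1}^{k} l_i$$

where l_i is the number of changes at site i and k is the number of sites.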

Page 18: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

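Maximum parsimony: example

Number of changes required at each site:

Tree             Site 1   Site 2   Site 3   Site 4   Site 5   Total
((1,2),(3,4))       1        1        2        1        0       5
((1,3),(2,4))       2        2        1        1        0       6
((1,4),(2,3))       2        2        2        1        0       7

Phylogenetically informative sites: 1, 2 and 3, the only sites where the number of changes differs among trees.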
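The counts in this table can be reproduced with Fitch's algorithm, a standard way of counting the minimum number of changes per site. It is used here as an illustration and the slides do not name it. Trees are written as nested tuples of sequence numbers, which is a representation chosen for this sketch.

```python
SEQS = {1: "ATATT", 2: "ATCGT", 3: "GCAGT", 4: "GCCGT"}
TREES = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]


def fitch(node, site):
    """Return (possible ancestral states, minimum changes) below this node."""
    if isinstance(node, int):                 # a tip: its observed state
        return {SEQS[node][site]}, 0
    left_states, left_cost = fitch(node[0], site)
    right_states, right_cost = fitch(node[1], site)
    common = left_states & right_states
    if common:                                # children agree: no extra change
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def site_changes(tree):
    """Minimum number of changes at each of the five sites."""
    return [fitch(tree, site)[1] for site in range(5)]


def tree_length(tree):
    """Tree length L = sum of the per-site changes."""
    return sum(site_changes(tree))


for tree in TREES:
    print(tree, site_changes(tree), tree_length(tree))
```

The shortest tree, ((1,2),(3,4)) with length 5, is the maximum parsimony tree for these four sequences.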

Page 19: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

Networks

• Phylogenetic representation allowing reticulation

• More appropriate for intraspecific data

• The ancestor may still be alive

• Reticulation arises from hybridization, recombination, horizontal transfer, polyploidization

[Figure: example network with nodes numbered 1 to 7, labelled with the sequences agct, agctacat, ac, ct, acat, agctacct]

Page 20: DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction

Multivariate clustering

Data matrix (5 x 7): taxa 1 to 5 scored for characters C1 to C7

→ similarity criterion: correlations

→ correlation matrix (7 x 7)

→ calculate eigenvectors with highest eigenvalues

→ project data onto new axes (eigenvectors)

[Figure: points plotted on the new axes; X = 1st axis, Y = 2nd axis, Z = 3rd axis]
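The steps above can be sketched without external libraries for the first axis only. This is an illustration under stated assumptions: the eigenvector is found by power iteration (one of several possible methods), every character is assumed to vary among taxa so that correlations are defined, and the function names and data are hypothetical.

```python
import math


def standardize(rows):
    """Convert each column to z-scores (mean 0, sample standard deviation 1)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]


def correlation_matrix(z):
    """k x k matrix of correlations between columns of the z-score table."""
    n, k = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(k)]
            for i in range(k)]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration: returns (largest eigenvalue, unit eigenvector)."""
    k = len(matrix)
    v = [1.0 / (i + 1) for i in range(k)]     # arbitrary non-uniform start
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(v, mv)), v


def project(z, axis):
    """Coordinates of each row on the new axis."""
    return [sum(a * b for a, b in zip(row, axis)) for row in z]


DATA = [(1, 2), (2, 4), (3, 6), (4, 8)]       # two perfectly correlated variables
z_scores = standardize(DATA)
eigenvalue, axis = leading_eigenvector(correlation_matrix(z_scores))
scores = project(z_scores, axis)
print(eigenvalue, axis, scores)
```

The second and third axes would be obtained the same way after removing the first component from the matrix. A numerical library would normally return all eigenvectors at once.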