Statistical Analysis of DNA

• Simple Repeats – Identical length and sequence

• agat agat agat agat agat

• Compound Repeats – Two or more adjacent simple repeats

• agat agat agat ttaa ttaa ttaa

• Complex Repeats – Variable unit length and possible intervening sequence

• agat agat aggat agat agat ttaacggccat agat agat
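The three repeat classes can be illustrated with a rough sketch. It assumes a fixed 4-base unit and uses a simple heuristic (one run is simple, several multi-copy runs are compound, anything else is complex); the function names `repeat_runs` and `classify` are hypothetical and not part of any forensic software.

```python
from itertools import groupby


def repeat_runs(sequence, unit_len=4):
    """Cut the sequence into fixed-length units and collapse consecutive identical units."""
    seq = sequence.replace(" ", "").lower()
    units = [seq[i:i + unit_len] for i in range(0, len(seq), unit_len)]
    return [(unit, len(list(group))) for unit, group in groupby(units)]


def classify(sequence, unit_len=4):
    """Heuristic label (an assumption of this sketch, not a formal definition)."""
    runs = repeat_runs(sequence, unit_len)
    if len(runs) == 1:
        return "simple"
    if all(len(unit) == unit_len and count > 1 for unit, count in runs):
        return "compound"
    return "complex"


simple = "agat agat agat agat agat"
compound = "agat agat agat ttaa ttaa ttaa"
complex_ = "agat agat aggat agat agat ttaacggccat agat agat"

for name, seq in [("simple", simple), ("compound", compound), ("complex", complex_)]:
    print(name, classify(seq), repeat_runs(seq))
```

Because the complex example contains a 5-base unit and an intervening sequence, fixed 4-base chunking falls out of register, which is exactly why it ends up in the "complex" bucket here.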

STR NOMENCLATURE

• Microvariants

– Alleles that contain incomplete units

• TH01 9.3

• aatg aatg aatg aatg aatg aatg aatg aatg aatg aatg - 10

• aatg aatg aatg aatg aatg aatg atg aatg aatg aatg - 9.3
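The microvariant naming above can be sketched in a few lines. This is a minimal illustration that assumes the input is only the repeat region and that the designation follows from its length alone (complete units, then leftover bases after the point); `allele_designation` is a hypothetical helper name.

```python
def allele_designation(repeat_region, unit_len=4):
    """Name an allele as 'complete units' or 'complete units.leftover bases'."""
    seq = repeat_region.replace(" ", "")
    complete, leftover = divmod(len(seq), unit_len)
    return f"{complete}.{leftover}" if leftover else str(complete)


allele_10 = "aatg aatg aatg aatg aatg aatg aatg aatg aatg aatg"
allele_9_3 = "aatg aatg aatg aatg aatg aatg atg aatg aatg aatg"

print(allele_designation(allele_10))
print(allele_designation(allele_9_3))
```

For the TH01 example, the second sequence has nine complete 4-base units plus one 3-base unit, hence 9.3.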

STRs Used In Forensic Science

• Need lots of variation - polymorphic

• Overall short segments - 100-400 bp

– Can use degraded DNA samples

– Segment size usually limits preferential amplification of smaller alleles

• Single base resolution

– TH01 9.3

• TETRANUCLEOTIDE REPEATS

– Narrow allele size range - multiplexing

– Reduces allelic dropout (stochastic effects)

– Use with degraded DNA possible

– Reduced stutter rates - easier to interpret mixtures

ALLELIC LADDERS

• Artificial mixture of common alleles

• Reference standards

• Enable forensic scientists to compare results

– Different instruments

– Different detection methods

• Allele quantities balanced

• Produced with same primers as test samples

• Commercially available in kits

Allelic Ladder Formation

Separate PCR products from various samples amplified with primers targeted to a particular STR locus

Combine

Re-amplify

Find representative alleles spanning population variation

Polyacrylamide Gel

Profiler Plus Allelic Ladders

D3S1358 vWA FGA

AMEL D8S1179 D21S11 D18S51

D5S818 D13S317 D7S820

Development of miniSTRs to Aid Testing of Degraded DNA

[Figure: Same DNA Sample Run with Each of the ABI STR Kits. One panel per kit, showing the loci it amplifies (among Amel, D3S1358, vWA, FGA, TH01, TPOX, CSF1PO, D5S818, D13S317, D7S820, D8S1179, D21S11, D18S51, D16S539, D19S433, D2S1338) against PCR Product Size (bp).]

Power of Discrimination

• Blue – 1:5000

• Green I – 1:410

• Profiler – 1:3.6 x 10^9

• Profiler Plus – 1:9.6 x 10^10

• COfiler – 1:8.4 x 10^5

• SGM Plus – 1:3.3 x 10^12

STR LOCI ALLELES

• TPOX

– THYROID PEROXIDASE

– Chromosome 2

– AATG repeat

– 6 to 13 repeats

• TH01

– TYROSINE HYDROXYLASE

– Chromosome 11

– TCTA repeat (Bottom strand)

– 4 to 11 repeats

– Common microvariant 9.3

STR LOCI ALLELES

• vWA

– von Willebrand Factor

– Chromosome 12

– TCTA with TCTG repeat

– 10 to 22 repeats

• D3S1358

– Chromosome 3

– AGAT with AGAC repeat

– 12 to 20 repeats

13 CODIS Core STR Loci with Chromosomal Positions

[Figure: chromosome map marking CSF1PO, D5S818, D21S11, TH01, TPOX, D13S317, D7S820, D16S539, D18S51, D8S1179, D3S1358, FGA, and VWA, with AMEL marked on the X and Y chromosomes.]

Position of Forensic STR Markers on Human Chromosomes

[Figure: the same map of the 13 CODIS Core STR Loci and AMEL (sex-typing), with the additional markers Penta E, Penta D, D2S1338, and D19S433.]

STR Allele Frequencies

• Exclusions don't require numbers

• Matches do require statistics

[Figure: TH01 Marker. Bar chart of Frequency (axis from 0 to 45) against Number of repeats (6, 7, 8, 9, 9.3, 10) for Caucasians (N=427), Blacks (N=414), and Hispanics (N=414).]

*Proc. Int. Sym. Hum. ID (Promega) 1997, p. 34

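Hardy-Weinberg Equilibrium

Frequency at one locus

freq(A1) = $p_1$

freq(A2) = $p_2$

Punnett square of allele combinations:

              A1 ($p_1$)             A2 ($p_2$)
A1 ($p_1$)    A1A1 = $p_1^2$         A1A2 = $p_1 p_2$
A2 ($p_2$)    A1A2 = $p_1 p_2$       A2A2 = $p_2^2$

Genotype frequencies:

A1A1 = $p_1^2$, A1A2 = $2 p_1 p_2$, A2A2 = $p_2^2$

$(p_1 + p_2)^2 = p_1^2 + 2 p_1 p_2 + p_2^2$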
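As a quick numerical check of the expansion above, the sketch below computes the three genotype frequencies for a two-allele locus. The allele frequencies 0.7 and 0.3 are made-up illustration values, and `hardy_weinberg` is a hypothetical helper name.

```python
def hardy_weinberg(p1, p2):
    """Expected genotype frequencies at a two-allele locus under Hardy-Weinberg equilibrium."""
    if abs(p1 + p2 - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return {
        "A1A1": p1 * p1,      # homozygote: p1^2
        "A1A2": 2 * p1 * p2,  # heterozygote: 2 * p1 * p2
        "A2A2": p2 * p2,      # homozygote: p2^2
    }


# Illustrative allele frequencies (assumed, not from the slides).
genotypes = hardy_weinberg(0.7, 0.3)
total = sum(genotypes.values())
print(genotypes, total)
```

The three genotype frequencies sum to 1, which is just the statement that (p1 + p2)^2 = 1 when p1 + p2 = 1.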

Product Rule

• The frequency of a multi-locus STR profile is the product of the genotype frequencies at the individual loci

$f_{\text{locus } 1} \times f_{\text{locus } 2} \times \dots \times f_{\text{locus } n} = f_{\text{combined}}$

Criteria for Use of Product Rule

• Inheritance of alleles at one locus has no effect on alleles inherited at other loci

COfiler

Item   D3S1358   D16S539   TH01    TPOX   CSF1PO   D7S820
Q1     16,16     10,12     8,9.3   9,10   12,12    8,11

Profiler Plus

Item   D3S1358   vWA     FGA     D8S1179   D21S11   D18S51   D5S818   D13S317   D7S820
Q1     16,16     15,17   21,22   13,13     29,30    16,20    8,12     12,12     8,11

D3S1358 = 16, 16 (homozygote)

Frequency of 16 allele = 0.3071

When both alleles are the same:

Genotype frequency = $p^2$ (for now!)

Genotype freq = 0.3071 x 0.3071 = 0.0943

This is the random match probability for this locus

VWA = 15, 17 (heterozygote)

Frequency of 15 allele = 0.2361

Frequency of 17 allele = 0.1833

When heterozygous: Genotype frequency = 2 x allele 1 freq x allele 2 freq ($2pq$)

Genotype freq = 2 x 0.2361 x 0.1833 = 0.0866

Overall profile frequency = Frequency D3S1358 x Frequency vWA

0.0943 x 0.0866 = 0.00817

This is the combined random match probability
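The two-locus calculation can be reproduced with a short sketch. It uses only the allele frequencies quoted above; the dictionary layout and the function names `genotype_frequency` and `profile_frequency` are assumptions made for illustration.

```python
from math import prod

# Allele frequencies quoted on the slides.
ALLELE_FREQS = {
    "D3S1358": {"16": 0.3071},
    "vWA": {"15": 0.2361, "17": 0.1833},
}

# Genotypes of item Q1 at the two loci worked through above.
PROFILE = {"D3S1358": ("16", "16"), "vWA": ("15", "17")}


def genotype_frequency(locus, genotype):
    """p^2 for a homozygote, 2pq for a heterozygote."""
    a, b = genotype
    p, q = ALLELE_FREQS[locus][a], ALLELE_FREQS[locus][b]
    return p * p if a == b else 2 * p * q


def profile_frequency(profile):
    """Product rule: multiply the per-locus genotype frequencies."""
    return prod(genotype_frequency(locus, gt) for locus, gt in profile.items())


d3 = genotype_frequency("D3S1358", PROFILE["D3S1358"])
vwa = genotype_frequency("vWA", PROFILE["vWA"])
combined = profile_frequency(PROFILE)
print(round(d3, 4), round(vwa, 4), combined)
```

The slide multiplies the already rounded values 0.0943 and 0.0866 to get 0.00817; carrying the unrounded per-locus values through the product gives a result that differs only in the last quoted digit.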

Observed genotype counts (rows and columns are alleles):

    13  14  15  16  17  18  19  20
13   0   0   0   1   0   0   1   0
14       1   4  10  10  10   4   0
15           3   7  14   8   4   1
16              11  27   7   3   1
17                  11  23   8   1
18                      16   6   0
19                           3   1
20                               0

Frequency of allele 13 = [(1 + 1)/(196*2)] x 100 = 0.510%

i.e. total # of occurrences / total # of alleles

Frequency of allele 15 = [(4+6+7+14+8+4+1)/(196*2)] x 100 = 11.224%

i.e. total # of occurrences / total # of alleles

NOTE: each homozygous individual carries two copies of the allele, so a homozygous genotype (for example 16,16) contributes twice the number of individuals observed. The 6 in the allele 15 sum is 2 x 3, from the three 15,15 individuals.
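The allele-counting rule can be sketched as follows. The counts are transcribed from the flattened observed-genotype table above, so individual cells may carry extraction errors; only the allele 13 and allele 15 results are checked against the figures quoted in the text. The names `genotype_counts` and `allele_frequencies` are hypothetical.

```python
from collections import Counter

ALLELES = [13, 14, 15, 16, 17, 18, 19, 20]

# Upper triangle of the observed table, one row per first allele
# (transcribed from the slide; treat individual cells with caution).
UPPER_TRIANGLE = [
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 4, 10, 10, 10, 4, 0],
    [3, 7, 14, 8, 4, 1],
    [11, 27, 7, 3, 1],
    [11, 23, 8, 1],
    [16, 6, 0],
    [3, 1],
    [0],
]


def genotype_counts():
    """Map each genotype (a, b) with a <= b to its observed count."""
    counts = {}
    for i, row in enumerate(UPPER_TRIANGLE):
        for offset, n in enumerate(row):
            counts[(ALLELES[i], ALLELES[i + offset])] = n
    return counts


def allele_frequencies(counts):
    """Count allele copies; a homozygote adds two copies of the same allele."""
    copies = Counter()
    for (a, b), n in counts.items():
        copies[a] += n
        copies[b] += n
    total_alleles = sum(copies.values())
    return {allele: copies[allele] / total_alleles for allele in ALLELES}


counts = genotype_counts()
n_individuals = sum(counts.values())
freqs = allele_frequencies(counts)
print(n_individuals, round(100 * freqs[13], 3), round(100 * freqs[15], 3))
```

Adding each genotype count once for each of its two alleles handles homozygotes automatically, which is the doubling described in the note.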

Expected genotype counts (allele frequencies shown beside each allele):

Allele    Freq.      13      14      15      16      17      18      19      20
                  0.005   0.102   0.112   0.201   0.263   0.222   0.084   0.010
13        0.005   0.005   0.200   0.220   0.394   0.515   0.435   0.165   0.020
14        0.102           2.039   4.478   8.037  10.516   8.876   3.359   0.400
15        0.112                   2.459   8.825  11.547   9.747   3.688   0.439
16        0.201                           7.919  20.722  17.492   6.619   0.788
17        0.263                                  13.557  22.887   8.660   1.031
18        0.222                                           9.660   7.310   0.870
19        0.084                                                   1.383   0.329
20        0.010                                                           0.020

From the observed allele frequencies that we have just calculated, a table of expected observations is calculated. Each entry is the expected genotype frequency for that pair of alleles, multiplied by the total number of individuals (N = 196).

When heterozygous: 2 x (allele 1 freq) x (allele 2 freq) x N = $2pq$ x 196

When homozygous: (allele freq)^2 x N = $p^2$ x 196
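The expected-count rule can be sketched directly. This assumes the three-decimal allele frequencies shown in the expected table are the inputs, with N = 196; `expected_counts` is a hypothetical helper name.

```python
# Rounded allele frequencies as they appear in the expected table.
FREQS = {13: 0.005, 14: 0.102, 15: 0.112, 16: 0.201,
         17: 0.263, 18: 0.222, 19: 0.084, 20: 0.010}
N = 196  # number of individuals typed


def expected_counts(freqs, n):
    """Expected genotype counts: p^2 * N for homozygotes, 2pq * N for heterozygotes."""
    alleles = sorted(freqs)
    expected = {}
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            if a == b:
                expected[(a, b)] = freqs[a] ** 2 * n
            else:
                expected[(a, b)] = 2 * freqs[a] * freqs[b] * n
    return expected


expected = expected_counts(FREQS, N)
print(round(expected[(14, 14)], 3), round(expected[(16, 17)], 3))
```

With eight alleles this yields 36 genotype cells, matching the upper-triangular layout of the table.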

We now have a table of observed values and a table of expected values. To compare the observed values with the expected values, a CHI-SQUARE test is performed in EXCEL.

Step 1. Select the cell in the worksheet where you would like the p-value of the CHI-SQUARE test to appear.

Step 2. From the menus, select Insert, then click the Function option. The Paste Function dialog box appears.

Step 3. In the Function category box choose Statistical, then in the Function name box select CHITEST and click OK.

Step 4. When the CHITEST dialog appears, enter the actual range, then the expected range, and click OK.

The p-value will appear in the selected cell. Since the p-value of 0.9798 is greater than the level of significance (0.05), we fail to reject the null hypothesis. This supports the independence of the alleles and indicates that the sample used is not statistically different from the general population.

The $\chi^2$ test first calculates a $\chi^2$ statistic using the formula:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(A_{ij} - E_{ij})^2}{E_{ij}}$$

where:

$A_{ij}$ = actual frequency in the i-th row, j-th column

$E_{ij}$ = expected frequency in the i-th row, j-th column

$r$ = number of rows

$c$ = number of columns

A low value of $\chi^2$ is an indicator of independence. As can be seen from the formula, $\chi^2$ is always positive or 0, and is 0 only if $A_{ij} = E_{ij}$ for every i, j.

CHITEST returns the probability that a value of the $\chi^2$ statistic at least as high as the value calculated by the above formula could have happened by chance under the assumption of independence.

To find the $\chi^2$ statistic value for the reported value of p:

Step 1. Select the cell in the worksheet where you would like the CHI-SQUARE statistic to appear.

Step 2. From the menus, select Insert, then click the Function option. The Paste Function dialog box appears.

Step 3. In the Function category box choose Statistical, then in the Function name box select CHIINV and click OK.

Step 4. When the CHIINV dialog appears, enter the cell containing the p-value (0.9798), then enter 28 for the degrees of freedom, and click OK.

A value of 14.98 is returned, and this is equal to the $\chi^2$ statistic.
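Outside a spreadsheet, the same two quantities can be sketched with the standard library alone. The statistic follows the sum of (actual - expected)^2 / expected, and the upper-tail probability uses the closed form that holds when the degrees of freedom are even (28 qualifies). The function names are hypothetical, and the three-cell example reuses the 14,14, 14,15 and 14,16 cells from the tables purely as an illustration, not as the full test.

```python
import math


def chi_square_statistic(actual, expected):
    """Sum of (A - E)^2 / E over all cells."""
    return sum((a - e) ** 2 / e for a, e in zip(actual, expected))


def chi_square_upper_tail_even_df(x, df):
    """P(chi-square >= x) for even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!"""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Three cells from the observed and expected tables (illustration only).
partial_stat = chi_square_statistic([1, 4, 10], [2.039, 4.478, 8.037])

# Upper-tail probability for the reported statistic and degrees of freedom.
p_value = chi_square_upper_tail_even_df(14.98, 28)
print(partial_stat, p_value)
```

For a statistic of 14.98 with 28 degrees of freedom this gives a probability of roughly 0.98, in line with the p-value reported above. For odd degrees of freedom a library routine such as `scipy.stats.chi2.sf` would be the usual choice.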