The drop-out/drop-in model


DESCRIPTION

Illustration of the LR principle applied to DNA mixtures


31 May 2012

The Likelihood Ratio model

Hinda Haned

h.haned@nfi.minvenj.nl

Outline

I Illustration of the LR principle applied to DNA mixtures

I Two-person mixtures to explain the principle (but no general formula is given!)

I Example with and without allelic dropout

The LR model— Avila June 2013 1

DNA mixtures

I Two or more individuals contributing to the sample

I More than two peaks per locus


Why are mixtures challenging?

What genotypes created the mixture?

I 12,12/13,15

I 12,15/13,15

I 12,13/13,15

I ...


ISFG DNA commission recommendations

The likelihood ratio is the preferred approach to mixture interpretation. (DNA commission 2005)

Probabilistic approaches and likelihood ratio principles are superior to classical methods. (DNA commission 2012)

The Bayesian framework: likelihood ratios

$$\mathrm{LR} = \frac{\Pr(\text{data} \mid H_{\text{prosecution}})}{\Pr(\text{data} \mid H_{\text{defence}})}$$

I data: alleles and their peaks

I ratio of two probabilities, or ratio of two likelihoods

Interpretation

I Need for an interpretation framework that applies to all types of samples:

• High template

• Low template: PCR-related stochastic effects are exacerbated, creating uncertainty about the composition of the crime-sample

Reporting officers make pre-case assessments and formulate the propositions to be evaluated within the likelihood ratio framework.

Dropout/Drop-in definitions

Allele or locus dropout is defined as a signal that falls below the limit-of-detection threshold; it occurs when one or both alleles of a heterozygote fail to PCR-amplify.

Allele drop-in is an allele that is not associated with the crime-sample and remains unexplained by the contributors under either Hp or Hd.

Low/High template DNA

High template DNA

I The epg (electropherogram) reflects the composition of the sample:

• no dropout

• no drop-in

Low level DNA

I The epg does not reflect the composition of the sample:

• allele dropout

• allele drop-in

• stutters

• ...

Part 1: High template DNA, the epg reflects the composition of the sample.

Two-person mixture example

I Two-person mixture


Two-person mixture example

Locus1

Evidence 9,11,12

Suspect 9,11

Victim 11,12

I Hp: Suspect + Victim contributed to the sample

I Hd: Victim + Unknown person (unrelated to the suspect) contributed to the sample

Two-person mixture: Under Hp

Locus1

Evidence 9,11,12

Suspect 9,11

Victim 11,12

Hp: Suspect + Victim contributed to the sample

Pr(Evidence|Hp) = 1

Two-person mixture: Under Hd

Locus1

Evidence 9,11,12

Victim 11,12

Unknown ?

Hd: Unknown + Victim contributed to the sample

Two-person mixture: Under Hd

I The victim’s profile explains 11 and 12

I The unknown has to have allele 9: allele 9 is constrained

Locus1

Evidence 9,11,12

Victim 11,12

Unknown 9,11 or 9,12 or 9,9

$$\Pr(\text{Evidence} \mid H_d) = 2p_9p_{11} + 2p_9p_{12} + p_9^2$$

Two-person mixture: LR

I Hp: Suspect + Victim contributed to the sample

I Hd: Victim + Unknown person (unrelated to the suspect) contributed to the sample

$$\Pr(\text{Evidence} \mid H_p) = 1$$

$$\Pr(\text{Evidence} \mid H_d) = 2p_9p_{11} + 2p_9p_{12} + p_9^2$$

$$\mathrm{LR} = \frac{1}{2p_9p_{11} + 2p_9p_{12} + p_9^2}$$
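The single-locus calculation above can be checked numerically. A minimal Python sketch, assuming hypothetical allele frequencies (the values of p9, p11 and p12 are invented for illustration and do not come from any population database):

```python
def pr_evidence_hd(p9, p11, p12):
    """Pr(Evidence | Hd): the victim (11,12) is known, the unknown must carry allele 9."""
    # Possible unknown genotypes: 9,11 and 9,12 (heterozygotes), 9,9 (homozygote).
    return 2 * p9 * p11 + 2 * p9 * p12 + p9 ** 2


# Hypothetical allele frequencies, for illustration only.
p9, p11, p12 = 0.1, 0.2, 0.3

pr_hp = 1.0  # under Hp, suspect (9,11) + victim (11,12) fully explain 9,11,12
pr_hd = pr_evidence_hd(p9, p11, p12)
lr = pr_hp / pr_hd

print(f"Pr(E|Hd) = {pr_hd:.4f}, LR = {lr:.2f}")
```

With these made-up frequencies the evidence is about nine times more probable under Hp than under Hd.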


What is the underlying model?

I LR is a function of the genotypic frequencies

I Assumes independent association of the alleles within loci: Hardy-Weinberg equilibrium

I Multiply between loci: Linkage equilibrium

The product rule
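As a sketch of how these assumptions are used: genotype probabilities follow from allele frequencies under Hardy-Weinberg equilibrium, and per-locus LRs are multiplied under linkage equilibrium. The function names, the allele frequencies and the per-locus LR values below are hypothetical, chosen only for illustration.

```python
import math


def genotype_prob(a, b, freqs):
    """Hardy-Weinberg genotype probability: p^2 for a homozygote, 2pq for a heterozygote."""
    if a == b:
        return freqs[a] ** 2
    return 2 * freqs[a] * freqs[b]


def overall_lr(per_locus_lrs):
    """Product rule: multiply the per-locus LRs (assumes linkage equilibrium)."""
    return math.prod(per_locus_lrs)


# Hypothetical allele frequencies and per-locus LRs, for illustration only.
freqs = {9: 0.1, 11: 0.2, 12: 0.3}
print(genotype_prob(9, 9, freqs))    # homozygote 9,9
print(genotype_prob(9, 11, freqs))   # heterozygote 9,11

per_locus_lrs = [9.0, 4.5, 12.0]
total = overall_lr(per_locus_lrs)
print(total, math.log10(total))
```

Working with the sum of log10 LRs instead of the raw product is a common choice when many loci are combined, since the product can become very large.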


Summary

I Derive the possible genotypes for the unknowns

I Determine the genotypic probabilities

I Sum up the probabilities for all plausible genotypes

I Calculate the ratio of the probabilities under Hp and under Hd

You should not do this by hand!

I usually, analysis of 15 or more loci simultaneously

I calculations get complicated with two or more unknowns


What happens if there are two unknowns under Hd?

I Hp: Suspect + Victim contributed to the sample

I Hd: Two Unknown individuals (unrelated to the suspect) contributed to the sample

Locus1

Evidence 9,11,12

Unknown 1 ?

Unknown 2 ?

I We have to consider all the plausible genotypic combinations for the unknowns that explain alleles 9, 11, 12 observed in the crime-sample.

Under Hd: two unknowns

Unknown 1    Unknown 2
9,9          11,12
11,11        9,12
12,12        9,11
9,11         9,12
9,11         11,12
9,12         11,12

$$\Pr(\text{Evidence} \mid H_d) = 2\,\bigl(p_9^2 \cdot 2p_{11}p_{12} + p_{11}^2 \cdot 2p_9p_{12} + p_{12}^2 \cdot 2p_9p_{11} + 2p_9p_{11} \cdot 2p_9p_{12} + 2p_9p_{11} \cdot 2p_{11}p_{12} + 2p_9p_{12} \cdot 2p_{11}p_{12}\bigr)$$

LR: two unknowns

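$$\mathrm{LR} = \frac{1}{2\,\bigl(p_9^2 \cdot 2p_{11}p_{12} + p_{11}^2 \cdot 2p_9p_{12} + p_{12}^2 \cdot 2p_9p_{11} + 2p_9p_{11} \cdot 2p_9p_{12} + 2p_9p_{11} \cdot 2p_{11}p_{12} + 2p_9p_{12} \cdot 2p_{11}p_{12}\bigr)}$$

I Increasing the number of unknowns increases the number of terms under Hd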
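The enumeration under Hd can be automated. The sketch below is one possible way to do it, not the method of any particular software: it lists every ordered combination of genotypes for the unknowns, keeps those whose alleles are exactly the observed ones (no dropout, no drop-in), and sums their Hardy-Weinberg probabilities. The function names and allele frequencies are hypothetical.

```python
from itertools import combinations_with_replacement, product


def genotype_prob(genotype, freqs):
    """Hardy-Weinberg probability of a genotype given as a pair of alleles."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def pr_evidence_unknowns(evidence, freqs, n_unknowns=2):
    """Pr(Evidence | n unknowns), assuming no dropout and no drop-in."""
    # Without dropout or drop-in, unknowns can only carry observed alleles.
    genotypes = list(combinations_with_replacement(sorted(evidence), 2))
    total = 0.0
    # Ordered combinations: (g1, g2) and (g2, g1) are both counted,
    # which is where the leading factor 2 in the slide formula comes from.
    for combo in product(genotypes, repeat=n_unknowns):
        alleles_seen = {allele for g in combo for allele in g}
        if alleles_seen == set(evidence):
            prob = 1.0
            for g in combo:
                prob *= genotype_prob(g, freqs)
            total += prob
    return total


# Hypothetical allele frequencies, for illustration only.
freqs = {9: 0.1, 11: 0.2, 12: 0.3}
pr_hd = pr_evidence_unknowns({9, 11, 12}, freqs, n_unknowns=2)
print(f"Pr(E|Hd) = {pr_hd:.4f}, LR = {1 / pr_hd:.2f}")
```

Raising `n_unknowns` shows how quickly the number of combinations grows, which is why these sums are not done by hand.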


Part 2: Low template DNA, the epg does not reflect the composition of the sample.

Likelihood ratios vs. Low template DNA

I Classical approach of the LR: the product rule

I Main source of uncertainty in previous examples: genotypes of unknown contributors

We will now see how we can modify the classical LR approach to account for uncertainty in the data due to low template DNA conditions

Uncertainty in the data: single-source example

Locus1

Evidence 11

Suspect 9,11

I Hp: Suspect contributed to the sample

I Hd: Unknown person (unrelated to the suspect) contributed to the sample

I Classical LR: Pr(Evidence|Hp) = 0

I LR with dropout and drop-in: Pr(Evidence|Hp) ≠ 0


LR with dropout and drop-in

I Main theory described by:

• Haned et al., FSIG, 2012

• DNA commission ISFG, FSIG, 2012

• Gill et al., FSI, 2007

• Curran et al., FSI, 2005

I Two key parameters in the model:

• dropout: Heterozygote, Homozygote

• drop-in: not treated here

Basic model: qualitative data only, also called the drop-model.

LR with dropout and drop-in

I An allele drops out with a probability of d

I An allele does not drop out with a probability of 1 − d

I Allele dropout from a heterozygote: d

I Allele dropout from a homozygote: d′

Single-source example: Under Hp

I Hp: Suspect contributed to the sample

Allele    Dropout
9         yes
11        no

$$\Pr(\text{evidence} \mid H_p) = \Pr(\text{dropout of 9}) \times \Pr(\text{non-dropout of 11}) = d \times (1 - d)$$

Single-source example: Under Hd

I Unknown contributed to the sample

Locus1

Evidence 11

Unknown ?


The Q alleles

I What are the possible genotypes for the unknown?

• The dropped-out alleles are gathered under a virtual allele Q

• Q is a ‘place-holder’ for all possible genotypes!

• The Unknown’s genotype has to explain allele 11 (no drop-in)

Under Hd

Locus1

Evidence 11

Unknown 11,11 or 11,Q

I Q can be anything except 11

I Unknown genotype must explain 11

I This leaves us with two possibilities:

• Homozygote: 11,11

• Heterozygote: 11,Q

Q allele

• Locus L has four alleles: {9, 10, 11, 12}

• $p_9 + p_{10} + p_{11} + p_{12} = 1$

• $p_Q = 1 - p_{11}$

• $p_Q = p_9 + p_{10} + p_{12}$

I 11,Q can be:

• 9,11

• 10,11

• 11,12

No need to worry about deriving all the genotypes!

I All three genotypes are grouped under 11,Q with frequency $2p_{11}p_Q$
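A quick numerical check of the grouping, as a minimal sketch with hypothetical allele frequencies for the four-allele locus: summing the three explicit heterozygote probabilities gives the same value as the single grouped term for 11,Q.

```python
# Hypothetical allele frequencies for a locus with alleles 9, 10, 11, 12
# (illustration only; they sum to 1).
freqs = {9: 0.10, 10: 0.25, 11: 0.35, 12: 0.30}

p11 = freqs[11]
p_q = 1 - p11  # Q stands for "any allele other than 11"

# Explicit sum over the heterozygotes 9,11 / 10,11 / 11,12.
explicit = sum(2 * p11 * p for allele, p in freqs.items() if allele != 11)

# Grouped term for the genotype 11,Q.
grouped = 2 * p11 * p_q

print(explicit, grouped)
```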


Summary

I Two possible genotypes: 11,11 and 11Q

Genotype    Dropout          Genotype probability
11,11       $(1-d')$         $p_{11}^2$
11,Q        $(1-d)\,d$       $2p_{11}p_Q$

$$\mathrm{LR} = \frac{d(1-d)}{(1-d')\,p_{11}^2 + (1-d)\,d \cdot 2p_{11}p_Q}$$
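The single-source LR can be evaluated for a range of dropout probabilities. This is a minimal sketch under two assumptions that are not stated on the slides: the frequency of allele 11 is a made-up value, and the homozygote dropout probability is set to d′ = d², a simplification used here only to reduce the model to one parameter.

```python
def lr_single_source(d, d_hom, p11):
    """LR for evidence {11}, suspect 9,11, one unknown under Hd (no drop-in)."""
    p_q = 1 - p11  # Q: any allele other than 11
    numerator = d * (1 - d)  # Hp: allele 9 dropped out, allele 11 did not
    denominator = (
        (1 - d_hom) * p11 ** 2              # unknown is 11,11, no dropout
        + (1 - d) * d * 2 * p11 * p_q       # unknown is 11,Q, Q dropped out
    )
    return numerator / denominator


# Hypothetical frequency of allele 11, for illustration only.
P11 = 0.2

for d in (0.1, 0.3, 0.5, 0.7, 0.9):
    # ASSUMPTION: d' = d**2 (not given on the slides).
    print(f"d = {d:.1f}  LR = {lr_single_source(d, d_hom=d ** 2, p11=P11):.3f}")
```

Scanning d in this way is the idea behind a plot of LR against the probability of dropout.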


LR vs. probability of dropout


Low-template mixture

I Low-template DNA mixture


Two-person mixture example: one dropout, no drop-in

Locus1

Evidence 9,10,12

Suspect 9,11

Victim 10,12

I Hp: Suspect + Victim

I Hd: Two unknowns (unrelated to suspect/victim)

Under Hp: Dropout from the suspect

Suspect 9,11: $d(1-d)$

Victim 10,12: $(1-d)^2$

$$\Pr(\text{Evidence} \mid H_p) = d(1-d)^3$$

Under Hd: dropout is possible

I Hd: two unknowns

I Dropout is possible: Q allele, can be anything except 9, 10, 12

               Unknown 1    Unknown 2
No dropout     9,9          10,12
               10,10        9,12
               12,12        9,10
               9,12         9,10
               9,12         10,12
               10,12        9,10
One dropout    9,Q          10,12
               10,Q         9,12
               12,Q         9,10

Under Hd: dropout is possible

I Hd: two unknowns

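I Dropout is possible: Q allele, can be anything except 9, 10, 12

Unknown 1    Unknown 2    Dropout             Genotype prob.
9,9          10,12        $(1-d')(1-d)^2$     $p_9^2 \times 2p_{10}p_{12}$
10,10        9,12         $(1-d')(1-d)^2$     $p_{10}^2 \times 2p_9p_{12}$
12,12        9,10         $(1-d')(1-d)^2$     $p_{12}^2 \times 2p_9p_{10}$
9,12         9,10         $(1-d)^4$           $2p_9p_{12} \times 2p_9p_{10}$
9,12         10,12        $(1-d)^4$           $2p_9p_{12} \times 2p_{10}p_{12}$
10,12        9,10         $(1-d)^4$           $2p_{10}p_{12} \times 2p_9p_{10}$
9,Q          10,12        $d(1-d)^3$          $2p_9p_Q \times 2p_{10}p_{12}$
10,Q         9,12         $d(1-d)^3$          $2p_{10}p_Q \times 2p_9p_{12}$
12,Q         9,10         $d(1-d)^3$          $2p_{12}p_Q \times 2p_9p_{10}$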
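The terms of this table can be summed in a few lines. The sketch below is an illustration under stated assumptions, not a reference implementation: allele frequencies are hypothetical, d′ is set to d² for convenience, and the `ordered` flag doubles the sum to count both orderings of the two unknowns, mirroring the factor 2 in the high-template two-unknowns formula (the table itself does not show that factor).

```python
def het(pa, pb):
    """Heterozygote genotype probability under Hardy-Weinberg: 2 * pa * pb."""
    return 2 * pa * pb


def pr_hp(d):
    """Suspect 9,11: allele 11 drops out, 9 does not; victim 10,12: no dropout."""
    return d * (1 - d) ** 3


def pr_hd(d, d_hom, p9, p10, p12, ordered=True):
    """Sum of the table terms for two unknowns explaining alleles 9, 10, 12."""
    p_q = 1 - (p9 + p10 + p12)  # Q: any allele other than 9, 10, 12
    hom = p9**2 * het(p10, p12) + p10**2 * het(p9, p12) + p12**2 * het(p9, p10)
    hets = (het(p9, p12) * het(p9, p10)
            + het(p9, p12) * het(p10, p12)
            + het(p10, p12) * het(p9, p10))
    one_drop = (het(p9, p_q) * het(p10, p12)
                + het(p10, p_q) * het(p9, p12)
                + het(p12, p_q) * het(p9, p10))
    total = ((1 - d_hom) * (1 - d) ** 2 * hom
             + (1 - d) ** 4 * hets
             + d * (1 - d) ** 3 * one_drop)
    # ASSUMPTION: doubling for the two orderings of the unknowns mirrors the
    # high-template two-unknowns formula; the table itself does not show it.
    return 2 * total if ordered else total


# Hypothetical inputs, for illustration only; d' = d**2 is an assumption.
d = 0.2
lr = pr_hp(d) / pr_hd(d, d_hom=d ** 2, p9=0.1, p10=0.2, p12=0.3)
print(f"LR = {lr:.2f}")
```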


Likelihood ratio


LR vs. dropout probability

[Figure: LR vs. drop-out. x-axis: d, from 0.0 to 1.0; y-axis: LR, from 5 to 30.]

How about drop-in probability?

Under Hp: Dropout from the suspect

Suspect 9,11: $d(1-d)$

Victim 10,12: $(1-d)^2$

I If drop-in = 0: $\Pr(\text{Evidence} \mid H_p) = d(1-d)^3$

I If drop-in ≠ 0: $\Pr(\text{Evidence} \mid H_p) = d(1-d)^3 \times (1-c)$

I c is the probability of drop-in


Under Hd: two unknowns

I Dropout is possible, no drop-in: Q allele, can be anything except 9, 10, 12

I If drop-in is possible: Q allele can be anything!

I So the genotypes of the unknowns no longer have to explain alleles 9, 10, 12.

I This increases the number of terms under Hd


Think of drop-in as a scaling factor

I If an allele is a drop-in: multiply by c × (frequency of allele i).

I If an allele is not a drop-in, multiply by (1− c)
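The scaling-factor view can be written as a small helper. This is a sketch under one interpretive assumption: when no allele at the locus is a drop-in, a single factor (1 − c) is applied for the locus, which matches the Hp expression d(1 − d)³ × (1 − c) given earlier. The function names and example values are hypothetical.

```python
def dropin_factor(dropin_alleles, freqs, c):
    """Scaling factor for drop-in at one locus.

    dropin_alleles: observed alleles explained as drop-in (may be empty).
    freqs: allele frequencies; c: probability of drop-in.
    """
    if not dropin_alleles:
        # ASSUMPTION: one (1 - c) factor per locus when nothing is a drop-in.
        return 1 - c
    factor = 1.0
    for allele in dropin_alleles:
        factor *= c * freqs[allele]  # c x frequency of the drop-in allele
    return factor


def pr_hp_no_dropin_event(d, c):
    """Hp for the low-template mixture: one dropout from the suspect, no drop-in event."""
    return d * (1 - d) ** 3 * dropin_factor([], {}, c)


# Hypothetical values, for illustration only.
print(pr_hp_no_dropin_event(d=0.2, c=0.05))
print(dropin_factor([12], {12: 0.3}, c=0.05))
```

The helper covers only the scaling itself; deciding which alleles are treated as drop-ins under each genotype combination is the part that makes the Hd sum grow.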


LR vs. dropout and drop-in probability

[Figure: LR vs. drop-out, one curve per drop-in probability (drop-in = 0, 0.01, 0.05). x-axis: d, from 0.0 to 1.0; y-axis: LR, from 5 to 30.]

Summary

I Derive the possible genotypes for the unknowns

I Determine the genotypic probabilities

I Sum up the probabilities for all plausible genotypes

I Determine the corresponding dropout probabilities

I Calculate the ratio of the probabilities under Hp and under Hd


Software

I Deriving the genotypes of the unknowns is the key issue

I Assign genotype probability to each genotype

I The number of possibilities increases with the number of contributors; deriving LRs for mixtures by hand is not realistic!

Casework example 1: A 3-person mixture

I Victim is major contributor

I At least two minor contributors


Casework example 1: A 3-person mixture

I Hp: Victim + Suspect + Unknown

I Hd: Victim + two unknowns


Sensitivity analysis: Overall LR

Same dropout probability for all contributors

[Figure: Overall LR for the 10 SGM+ loci. x-axis: d, from 0.01 to 0.99; y-axis: log10 LR, from 7.5 to 11.0.]

Sensitivity analysis: Overall LR

Average probability vs. splitting dropout per contributor ⇒ No significant differences between the models!

[Figure: Overall LR for the 10 SGM+ loci, Basic model vs. SplitDrop model. x-axis: d, from 0.01 to 0.99; y-axis: log10 LR, from 7.5 to 11.0.]

Plausible ranges for PrD?

LR                    Dropout
$\leq 10^{10}$        $0.01 \leq D \leq 0.50$
$[10^9, 10^8]$        $0.50 < D \leq 0.99$

[Figure: Overall LR for the 10 SGM+ loci. x-axis: d, from 0.01 to 0.99; y-axis: log10 LR, from 7.5 to 11.0.]

Casework example 2: two-person mixture

Region    LR                      Dropout
(1)       $[10^{10}, 10^9]$       $0 \leq D \leq 0.50$
(2)       $[10^9, 10^6]$          $0.50 < D \leq 0.76$
(3)       $[10^6, 10^4]$          $0.76 < D \leq 0.84$
(4)       $[10^4, 1]$             $D > 0.84$

[Figure: log10 LR (0 to 10) vs. probability of dropout, with regions (1) to (4) marked; x-axis ticks at 0.01, 0.50, 0.76, 0.93.]

Casework example 3: three-person mixture

Region    LR                      Dropout
(1)       $[10^{14}, 10^9]$       $0 \leq D \leq 0.08$
(2)       $[10^9, 10^6]$          $0.08 < D \leq 0.53$
(3)       $[10^6, 10^4]$          $0.53 < D \leq 0.75$
(4)       $[10^4, 100]$           $0.75 < D \leq 0.86$
(5)       $[100, 1]$              $0.86 < D \leq 0.93$

[Figure: log10 LR (0 to 15) vs. probability of dropout, with regions (1) to (5) marked; x-axis ticks at 0.08, 0.53, 0.75, 0.86.]

All models are wrong...

I Continuous models are expected to extract more information from the data, but their implementation is difficult and tedious in practice

I Semi-continuous methods are easier to implement and can serve as a good approximation

How to inform dropout probabilities?

I Estimate dropout probabilities via logistic regression

• difficult to extend to mixtures of more than two persons

I Define plausible ranges of dropout

• based on expert belief

• based on maximum likelihood principle

I Bayesian approach: combine prior belief and likelihood to yield a posterior distribution
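To make the last two options concrete, here is a toy sketch on a grid of dropout values. It is only one possible reading of these bullets: the likelihood used is the single-locus Hp expression d(1 − d)³ from the low-template mixture example, and the uniform prior is an arbitrary choice. Both are assumptions made for illustration, not the procedure used in casework.

```python
def likelihood_hp(d):
    """Pr(Evidence | Hp, d) for the low-template mixture example: d(1-d)^3."""
    return d * (1 - d) ** 3


# Grid of candidate dropout probabilities: 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]

# Maximum likelihood: the grid value of d that maximises the likelihood.
d_ml = max(grid, key=likelihood_hp)

# Bayesian approach: posterior is proportional to prior x likelihood.
# ASSUMPTION: uniform prior over the grid.
prior = [1 / len(grid)] * len(grid)
unnormalised = [pr * likelihood_hp(d) for pr, d in zip(prior, grid)]
norm = sum(unnormalised)
posterior = [u / norm for u in unnormalised]

posterior_mean = sum(d * w for d, w in zip(grid, posterior))
print(f"ML estimate of d: {d_ml:.2f}")
print(f"Posterior mean of d: {posterior_mean:.3f}")
```

A plausible range could then be read off the posterior, for example as the interval holding most of its mass; an informative prior based on expert belief would replace the uniform one.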
