Thinking About DNA Database Searches

William C. Thompson
Dept. of Criminology, Law & Society
University of California, Irvine

Value of DNA Match for Proving Identity

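Prior Odds x Likelihood Ratio = Posterior Odds

The prior odds may be very low. If a true source always produces a reported match, the likelihood ratio is approximately

$$\text{Likelihood Ratio} \approx \frac{1}{\text{RMP} + \text{FPP}}$$

where RMP is the random match probability and FPP is the false positive probability.*

*More precisely, the denominator is RMP + [FPP x (1 - RMP)]; see Thompson, Taroni & Aitken, 2003.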

1:1 million x 1 billion:1 = 1000:1

1:1 million x 1 million:1 = 1:1

1:1000 x 10,000:1 = 10:1
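The odds-form calculation can be sketched in Python. This is a minimal illustration that assumes p(reported match | suspect is the source) = 1. The function names, and the 1-in-1,000 false positive probability used at the end, are hypothetical and are chosen only to show how even a small FPP swamps a tiny RMP.

```python
def likelihood_ratio(rmp: float, fpp: float = 0.0) -> float:
    """LR for a reported match, assuming p(reported match | suspect is source) = 1.

    Uses the fuller denominator RMP + FPP * (1 - RMP).
    """
    return 1.0 / (rmp + fpp * (1.0 - rmp))


def posterior_odds(prior_odds: float, lr: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * lr


# The three worked examples above, with the LR given directly.
for prior, lr in [(1e-6, 1e9), (1e-6, 1e6), (1e-3, 1e4)]:
    print(f"prior odds {prior:g} x LR {lr:g} = posterior odds {posterior_odds(prior, lr):g}")

# Hypothetical: RMP of 1 in a billion, assumed FPP of 1 in 1,000.
# The false positive probability dominates, so the LR ends up just under 1,000.
print(f"LR = {likelihood_ratio(1e-9, 1e-3):,.1f}")
```

With a realistic FPP, the LR can never be much larger than 1/FPP, however small the RMP is.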

Mysterious Clusters and the Law of Truly Large Numbers

• In a truly large sample space, seemingly unusual events are bound to occur
  – E.g., double lottery winners; cancer clusters
  – See Diaconis & Mosteller (1989), Methods for studying coincidences, JASA, 84, 853-861.
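A minimal Python sketch shows why large sample spaces produce "mysterious" coincidences. The chance that an event with per-opportunity probability p happens at least once in n independent opportunities is 1 - (1 - p)^n. The one-in-a-million event and the numbers of opportunities below are hypothetical.

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """P(an event with per-trial probability p occurs at least once in n
    independent trials) = 1 - (1 - p)**n, computed with log1p/expm1 so that
    a tiny p does not lose precision."""
    return -math.expm1(n * math.log1p(-p))


# Hypothetical "one in a million" coincidence with a growing number of chances.
for n in (1_000, 1_000_000, 10_000_000):
    print(f"{n:>12,} chances: P(at least one) = {prob_at_least_one(1e-6, n):.4f}")
```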

Taking Account of Coincidence When Searching Truly Large DNA Databases

Should the frequency of the matching profile be presented to the jury? Standard answers:
– No
  • NRC I: test additional loci; report only the frequency of those
  • NRC II: multiply the frequency by N (the number of profiles in the database)
– Yes
  • Friedman/Donnelly: present the LR, but keep in mind that the prior odds may be very low
  • Prosecutors everywhere: the jury should hear the most impressive number possible “because it’s relevant”
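The NRC II and Friedman/Donnelly answers can be contrasted in a short Python sketch. It assumes p(match | source) = 1 and no laboratory error. The profile frequency, database size, and prior odds are hypothetical, and so are the function names. Multiplying the frequency by N gives the expected number of coincidental matches if no one in the database is the source.

```python
def nrc2_figure(freq: float, db_size: int) -> float:
    """NRC II: report frequency x N, which is also the expected number of
    coincidental matches if nobody in the database is the source."""
    return freq * db_size


def posterior_with_unadjusted_lr(freq: float, prior_odds: float) -> float:
    """Friedman/Donnelly view: keep the LR at 1/freq (assuming p(match | source) = 1
    and no laboratory error), but combine it with possibly very low prior odds."""
    return prior_odds * (1.0 / freq)


freq, db_size = 1e-9, 1_000_000  # hypothetical profile frequency and database size
prior_odds = 1e-7                # hypothetical prior odds for a database-selected suspect

print(f"NRC II figure:  {nrc2_figure(freq, db_size):g}")
print(f"Unadjusted LR:  {1 / freq:,.0f}")
print(f"Posterior odds: {posterior_with_unadjusted_lr(freq, prior_odds):,.1f}")
```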

My Solution: Present Profile Frequency Only When It Equals the RMP*

• Multiple tests of different hypotheses
  – Search unsolved-crime evidence against an offender database
  – For each offender, p(match | not source) = frequency
• Multiple tests of the same hypothesis
  – Search a suspect against an unsolved-crime database to see if he matches any unsolved crime
  – For this suspect, p(match | not source) ≈ frequency x N

*RMP = p(match | suspect not the source)
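A small Monte Carlo sketch in Python makes the distinction concrete. The parameters are hypothetical: a profile frequency of 1 in 1,000 and 50 unsolved-crime profiles. Real profile frequencies are far smaller; these values are chosen only so that the simulation finishes quickly. Comparisons are also assumed to be independent. The sketch estimates two quantities:
- the per-comparison coincidental match rate, which corresponds to p(match | not source) for each individual offender;
- the rate at which an innocent suspect matches at least one unsolved crime, which is close to frequency x N when that product is small.

```python
import random


def simulate(freq: float, n_crimes: int, trials: int, seed: int = 1) -> tuple[float, float]:
    """Monte Carlo sketch: compare `trials` innocent suspects against n_crimes
    unrelated crime-scene profiles, each comparison matching by coincidence
    with probability freq (comparisons assumed independent).

    Returns (fraction of individual comparisons that match,
             fraction of suspects matching at least one crime).
    """
    rng = random.Random(seed)
    comparisons = hits = suspects_with_hit = 0
    for _ in range(trials):
        matched_any = False
        for _ in range(n_crimes):
            comparisons += 1
            if rng.random() < freq:  # coincidental match
                hits += 1
                matched_any = True
        suspects_with_hit += matched_any
    return hits / comparisons, suspects_with_hit / trials


freq, n_crimes = 1e-3, 50  # hypothetical; real profile frequencies are far smaller
per_comparison, any_match = simulate(freq, n_crimes, trials=20_000)
print(f"per comparison: {per_comparison:.5f}   (frequency = {freq})")
print(f"any of {n_crimes} crimes: {any_match:.4f}   "
      f"(freq x N = {freq * n_crimes:.4f}, exact = {1 - (1 - freq) ** n_crimes:.4f})")
```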

My Solution: Present Profile Frequency Only When It Equals the RMP*

• Testing relatives of people who almost match
  – For most suspects, p(match | not source) = frequency of the matching profile
  – For relatives of people who almost match, p(match | not source) ≫ frequency
  – Therefore it is misleading to present the frequency of the matching profile in cases where the suspect is selected because a relative almost matches
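A related quantity, the chance that a full sibling of a profile's owner shares that profile, illustrates how far p(match | not source) for a close relative can exceed the profile frequency. The Python sketch below uses the standard full-sibling genotype-match formulas. It assumes Hardy-Weinberg proportions, unlinked loci, and no population-substructure correction. The 10-locus profile and its allele frequencies are invented for illustration.

```python
from math import prod
from typing import Optional


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote (q is None),
    2pq for a heterozygote."""
    return p * p if q is None else 2 * p * q


def full_sibling_match(p: float, q: Optional[float] = None) -> float:
    """Probability that a full sibling has the same genotype (no substructure
    correction): (1 + p)^2 / 4 for a homozygote, (1 + p + q + 2pq) / 4 for a
    heterozygote."""
    return (1 + p) ** 2 / 4 if q is None else (1 + p + q + 2 * p * q) / 4


# Hypothetical 10-locus profile; one-element tuples are homozygous loci.
# These allele frequencies are invented for illustration only.
profile = [(0.10, 0.20), (0.15,), (0.05, 0.30), (0.20, 0.10), (0.12, 0.08),
           (0.25,), (0.10, 0.10), (0.07, 0.20), (0.30, 0.15), (0.09,)]

freq = prod(genotype_frequency(*locus) for locus in profile)
sib = prod(full_sibling_match(*locus) for locus in profile)  # unlinked loci assumed
print(f"profile frequency:         1 in {1 / freq:,.0f}")
print(f"full-sibling match chance: 1 in {1 / sib:,.0f}")
```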

Database Searches and the Birthday Problem

• The probability that a randomly chosen person will have my birthday is 1 in 365

• The probability that at least two people in a room share a birthday can be far higher
  – With 23 people in a room, the probability that two will share a birthday exceeds 1 in 2
  – With 60 people in the room, the probability is nearly 1 in 1 (over 99%)
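The birthday figures can be checked with a short Python sketch. It assumes 365 equally likely birthdays and ignores leap years. It also contrasts "some pair shares a birthday" with "someone else has my birthday", which is the distinction the slide turns on.

```python
def p_any_shared_birthday(n_people: int, days: int = 365) -> float:
    """Exact probability that at least two of n_people share a birthday,
    assuming `days` equally likely birthdays (leap years ignored)."""
    p_all_distinct = 1.0
    for k in range(n_people):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def p_someone_has_my_birthday(n_others: int, days: int = 365) -> float:
    """Probability that at least one of n_others people has one specific birthday."""
    return 1.0 - (1.0 - 1.0 / days) ** n_others


for n in (23, 60):
    print(f"{n} people: P(some pair shares) = {p_any_shared_birthday(n):.3f}, "
          f"P(someone else has my birthday) = {p_someone_has_my_birthday(n - 1):.3f}")
```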

Database Searches and the Birthday Problem

• Suppose the probability of a random match between any two DNA profiles is between 1 in 10 billion and 1 in 1 trillion

• What is the probability of finding at least one matching pair of such profiles in a database of:
  – 1,000
  – 100,000
  – 1,000,000

Approximate likelihood that two profiles in a DNA database will match (columns: profile frequency)

| Database size | 1 in 10 billion | 1 in 100 billion | 1 in 1 trillion |
|---|---|---|---|
| 1,000 | 1 in 20,000 | 1 in 200,000 | 1 in 2 million |
| 10,000 | 1 in 200 | 1 in 2,000 | 1 in 20,000 |
| 100,000 | 1 in 2.5 | 1 in 20 | 1 in 200 |
| 1,000,000 | 1 in 1 | 1 in 1 | 1 in 2.5 |
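The table can be reproduced, to within its rounding, with the birthday-problem (Poisson) approximation. A database of n profiles contains n(n - 1)/2 pairs, and treating the pairwise comparisons as independent gives P(at least one matching pair) ≈ 1 - exp(-pairs x match probability). The Python sketch below is a minimal illustration under that assumption. The function names are hypothetical.

```python
import math


def p_any_pair_matches(db_size: int, match_prob: float) -> float:
    """Approximate P(at least one matching pair) among db_size profiles,
    treating the db_size*(db_size-1)/2 pairwise comparisons as independent
    (the birthday-problem / Poisson approximation)."""
    pairs = db_size * (db_size - 1) / 2
    return -math.expm1(-pairs * match_prob)


def as_one_in(prob: float) -> str:
    """Format a probability as '1 in X'."""
    odds = 1.0 / prob
    return f"1 in {odds:,.0f}" if odds >= 10 else f"1 in {odds:.1f}"


match_probs = (1e-10, 1e-11, 1e-12)  # 1 in 10 billion, 100 billion, 1 trillion
for size in (1_000, 10_000, 100_000, 1_000_000):
    row = [as_one_in(p_any_pair_matches(size, mp)) for mp in match_probs]
    print(f"{size:>9,}  " + "  ".join(f"{cell:>16}" for cell in row))
```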

Why present a birthday statistic in database cases?

• Because it is relevant…
