Thinking About DNA Database Searches
William C. Thompson
Dept. of Criminology, Law & Society
University of California, Irvine
Value of DNA Match for Proving Identity
Prior Odds x Likelihood Ratio = Posterior Odds
• Prior odds may be very low
• Likelihood Ratio = 1 / (RMP + FPP*)
*Actually RMP + [FPP x (1-RMP)], see Thompson, Taroni & Aitken, 2003
1:1 million x 1 billion:1 = 1000:1
1:1 million x 1 million:1 = 1:1
1:1000 x 10,000:1 = 10:1
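The three examples above can be checked with a minimal sketch in Python. The function names are illustrative, not from the slides. The denominator uses the footnoted form RMP + FPP x (1 - RMP), and the FPP of 1 in 10,000 in the last lines is a hypothetical value chosen only to show how a false positive probability caps the likelihood ratio.

```python
from fractions import Fraction

def likelihood_ratio(rmp, fpp=Fraction(0)):
    """LR = 1 / (RMP + FPP x (1 - RMP)); with FPP = 0 this is 1 / RMP."""
    return 1 / (rmp + fpp * (1 - rmp))

def posterior_odds(prior_odds, lr):
    """Posterior odds = prior odds x likelihood ratio."""
    return prior_odds * lr

# The three (prior odds, likelihood ratio) pairs shown on the slide.
slide_examples = [
    (Fraction(1, 10**6), Fraction(10**9)),
    (Fraction(1, 10**6), Fraction(10**6)),
    (Fraction(1, 1000), Fraction(10**4)),
]
for prior, lr in slide_examples:
    print(f"prior {prior} x LR {lr} = posterior odds {posterior_odds(prior, lr)}")

# Hypothetical illustration: RMP of 1 in 1 billion, FPP of 1 in 10,000 (assumed).
lr_with_errors = likelihood_ratio(Fraction(1, 10**9), Fraction(1, 10**4))
print(f"LR with FPP included is about {float(lr_with_errors):,.0f}")
```

Exact fractions are used so that the odds multiply without rounding; floats would serve equally well for a rough check.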
Mysterious Clusters and the Law of Truly Large Numbers
• In a truly large sample space, seemingly unusual events are bound to occur
– E.g., double lottery winners; cancer clusters
– See Diaconis & Mosteller (1989). Methods for studying coincidences, JASA, 84, 853-861.
Taking Account of Coincidence When Searching Truly Large DNA Databases
Should the frequency of the matching profile be presented to the jury? Standard answers:
– No
• NRC I – test additional loci; report only freq. of those
• NRC II – multiply freq. by N (for database)
– Yes
• Friedman/Donnelly – present LR but keep in mind prior odds may be very low
• Prosecutors Everywhere – jury should hear most impressive number possible “because it’s relevant”
My Solution: Present Profile Frequency Only When It Equals the RMP*
• Multiple Tests of Different Hypotheses
– Search unsolved crime evidence against offender database
– For each offender, p(match|not source) = frequency
• Multiple Tests of Same Hypothesis
– Search suspect against unsolved crime database to see if he matches any unsolved crime
– For this suspect, p(match|not source) = Freq. x N
*RMP = p(match|suspect not the source)
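The "Freq. x N" figure for repeated tests of the same hypothesis can be compared with the exact chance of at least one coincidental match, 1 - (1 - freq)^N. The sketch below assumes the N comparisons are independent; the frequency and database size are hypothetical values, not figures from the slides.

```python
import math

def p_any_coincidental_match(freq, n):
    """Exact chance of at least one coincidental match in n independent
    comparisons: 1 - (1 - freq)**n, computed stably for tiny freq."""
    return -math.expm1(n * math.log1p(-freq))

# Hypothetical values: profile frequency of 1 in 1 million, 10,000 unsolved crimes.
freq, n = 1e-6, 10_000
exact = p_any_coincidental_match(freq, n)
approx = freq * n  # the "Freq. x N" figure from the slide
print(f"exact = {exact:.6f}, Freq. x N = {approx:.6f}")
```

While Freq. x N stays well below 1 the two figures nearly coincide, and Freq. x N is never smaller than the exact value, so it errs in the suspect's favor.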
My Solution: Present Profile Frequency Only When It Equals the RMP*
• Testing relatives of people who almost match
– For most suspects, p(match|not source) = frequency of matching profile
– For relatives of people who almost match, p(match|not source) >> frequency
– Therefore it is misleading to present the frequency of the matching profile in cases where the suspect is selected because a relative almost matches
Database Searches and the Birthday Problem
• The probability that a randomly chosen person will have my birthday is 1 in 365
• The probability that some two people in a room share a birthday can be far higher
– With 23 people in a room, the likelihood that two will share a birthday exceeds 1 in 2
– With 60 people in the room, the probability is nearly 1 in 1
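The birthday figures can be reproduced with a short calculation. This sketch assumes 365 equally likely birthdays and ignores leap years; the function name is illustrative.

```python
def p_shared_birthday(people, days=365):
    """Chance that at least two of `people` share a birthday,
    assuming all `days` birthdays are equally likely."""
    p_all_distinct = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

for group in (23, 60):
    print(f"{group} people: {p_shared_birthday(group):.4f}")
```

The chance of a specific person sharing my birthday stays at 1 in 365, but the number of pairs in the room grows roughly with the square of its size, which is what drives the shared-birthday probability up so quickly.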
Database Searches and the Birthday Problem
• Suppose the probability of a random match between any two DNA profiles is between 1 in 10 billion and 1 in 1 trillion
• What is the probability of finding a match between two such profiles in a database of:
– 1,000
– 100,000
– 1,000,000
Approximate likelihood that two profiles in a DNA database will match

                 Profile Frequency
Database Size    1 in 10 billion    1 in 100 billion    1 in 1 trillion
1,000            1 in 20,000        1 in 200,000        1 in 2 million
10,000           1 in 200           1 in 2,000          1 in 20,000
100,000          1 in 2.5           1 in 20             1 in 200
1,000,000        1 in 1             1 in 1              1 in 2.5
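The table entries can be approximated with the same birthday-problem reasoning. The sketch below is one way to arrive at such figures, not necessarily the method used for the slide. It assumes every pair of profiles matches independently with the same probability, which ignores relatives and population structure.

```python
import math

def p_any_pair_matches(db_size, pair_match_prob):
    """Chance that at least one pair of profiles in the database matches,
    treating all N(N-1)/2 pairs as independent (an assumption)."""
    pairs = db_size * (db_size - 1) // 2
    return -math.expm1(pairs * math.log1p(-pair_match_prob))

# Columns: 1 in 10 billion, 1 in 100 billion, 1 in 1 trillion.
match_probs = (1e-10, 1e-11, 1e-12)
for n in (1_000, 10_000, 100_000, 1_000_000):
    cells = [1 / p_any_pair_matches(n, p) for p in match_probs]
    print(f"{n:>9,}  " + "  ".join(f"1 in {c:,.1f}" for c in cells))
```

The number of pairs grows with the square of the database size, so a tenfold increase in database size raises the expected number of coincidental pairs about a hundredfold.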
Why present a birthday statistic in database cases?
• Because it is relevant…