Representing and Solving Complex DNA Identification Cases Using Bayesian Networks

Philip Dawid, University College London

Julia Mortera & Paola Vicard, Università Roma Tre


FORENSIC USES FOR DNA PROFILES

• Murder/Rape/…: Is A the culprit?

• Paternity: Is A the father of B?

• Immigration: Is A the mother of B? How are A and B related?

• Disasters: 9/11, tsunami, Romanovs,…


Disputed Paternity

We have DNA data D from a disputed child c, its mother m and the putative father pf

[Figure: pedigree network with three founder nodes, one child node, one query node and a hypothesis node]

Building blocks: founder, child, query

If pf is not the true father tf, the true father is a “random” alternative father af.


Disputed Paternity

We have DNA data D from a disputed child c, its mother m and the putative father pf

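LIKELIHOOD RATIO

$$\mathrm{LR} = \frac{\Pr(D \mid P)}{\Pr(D \mid \bar{P})} = \frac{\Pr(c \mid m, pf, P)}{\Pr(c \mid m, \bar{P})}$$

where P is the hypothesis that pf is the true father tf, and $\bar{P}$ is its negation.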

Essen-Möller 1938

If pf is not the true father tf, the true father is a “random” alternative father af.
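A minimal single-marker sketch of this ratio, assuming Mendelian transmission with no mutation or silent alleles, and a “random” alternative father whose transmitted allele follows the population frequencies. The frequencies are the VWA (Austro-German) values from the single-marker analysis slide; the function names are hypothetical and are not part of HUGIN or any other package.

```python
from itertools import product

# Marker VWA, Austro-German population allele frequencies (as on the slides)
VWA = {12: .0003, 13: .0018, 14: .1009, 15: .1004, 16: .1949, 17: .2834,
       18: .2162, 19: .0866, 20: .0137, 21: .0015, 22: .0003}


def transmission(genotype):
    """Allele passed on by a typed parent: each of its two copies w.p. 1/2."""
    dist = {}
    for allele in genotype:
        dist[allele] = dist.get(allele, 0.0) + 0.5
    return dist


def child_prob(child, maternal, paternal):
    """Pr(unordered child genotype) given maternal/paternal allele distributions."""
    target = tuple(sorted(child))
    total = 0.0
    for (ma, p_ma), (pa, p_pa) in product(maternal.items(), paternal.items()):
        if tuple(sorted((ma, pa))) == target:
            total += p_ma * p_pa
    return total


def paternity_lr(child, mother, pf, freqs):
    """LR = Pr(c | m, pf, P) / Pr(c | m, not P) for one marker."""
    maternal = transmission(mother)
    numerator = child_prob(child, maternal, transmission(pf))
    # Under not-P the paternal allele comes from a random man in the population.
    denominator = child_prob(child, maternal, freqs)
    return numerator / denominator
```

For a mother 16/17, putative father 17/18 and child 16/18 this gives 1/(2 × 0.2162) ≈ 2.31. For a mother 12/20, a putative father typed 13 and a child typed 12 (both read as homozygotes) it gives 0, an apparent exclusion.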


MISSING DNA DATA

• What if we cannot obtain DNA from the suspect (or from another relevant individual)?

• Sometimes we can obtain indirect information by DNA profiling of relatives

• But analysis is complex and subtle…


Disputed Paternity Case

[Figure: pedigree network with three founder nodes, one child node, one query node and a hypothesis node]

Building blocks: founder, child, query


Complex Paternity Case

We have DNA from a disputed child c1 and its mother m1, but not from the putative father pf. We do have DNA from c2, an undisputed child of pf, and from her mother m2, as well as from two undisputed full brothers b1 and b2 of pf.

[Figure: pedigree network with five founder nodes, five child nodes, one query node and a hypothesis node]

Building blocks: founder, child, query


Criminal Identification Case

A body has been found, burnt beyond recognition, but there is reason to believe it might be that of a missing criminal CR. DNA is available from the body, from the wife of CR, and from two children c1 and c2 of CR and his wife.

[Figure: pedigree network with three founder nodes, two child nodes, one query node and a hypothesis node]

Building blocks: founder, child, query


Object-Oriented Bayesian Network

HUGIN 6

• Each building block (founder / child / query) in a pedigree can be an INSTANCE of a generic CLASS network, which can itself have further structure

• The pedigree is built up using simple mouse clicks to insert new nodes/instances and connect them up

• Genotype data are entered and propagated using simple mouse clicks

Under the microscope…

• Each CLASS is itself a Bayesian Network, with internal structure

• Recursive: can contain instances of further class networks

• Communication via input and output nodes


Single-marker analysis

Marker VWA (Austro-German population allele frequencies)

Allele   Frequency
12       .0003
13       .0018
14       .1009
15       .1004
16       .1949
17       .2834
18       .2162
19       .0866
20       .0137
21       .0015
22       .0003

(multiply LRs across markers)
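Multiplying LRs across markers rests on treating the markers as independent given each hypothesis (unlinked loci in linkage equilibrium), an assumption this small sketch makes explicit. The helper names are hypothetical; the log10 version is simply a convenient way to report very large products.

```python
import math


def combined_lr(marker_lrs):
    """Overall LR: product of single-marker LRs (markers assumed independent)."""
    return math.prod(marker_lrs)


def log10_combined_lr(marker_lrs):
    """Same product on the log10 scale; any zero LR (an exclusion) gives -inf."""
    if any(lr == 0 for lr in marker_lrs):
        return float("-inf")
    return math.fsum(math.log10(lr) for lr in marker_lrs)
```

A single marker with LR = 0 forces the overall product to 0, which is why the mutation and silent-allele refinements matter for apparent exclusions.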


Lowest Level Building Blocks

• gene: STR MARKER having an associated repertory of alleles together with their frequencies

• mendel: MENDELIAN SEGREGATION. The child's gene copies the parent's paternal or maternal gene, according to the outcome of a fair coin flip

• genotype: GENOTYPE consisting of the maximum and minimum of the paternal and maternal genes


founder

FOUNDER INDIVIDUAL: represented by a pair of genes pgin and mgin (instances of gene) sampled independently from the population distribution, and combined in instance gt of genotype

[Figure: class network founder, with two gene instances and one genotype instance]


child

CHILD INDIVIDUAL: the paternal [maternal] gene is selected by instance fmeiosis [mmeiosis] of mendel from the father's [mother's] two genes, and the two genes are combined in instance cgt of genotype

[Figure: class network child, with two mendel instances and one genotype instance]
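The founder and child classes can be mimicked outside HUGIN by small functions that pass distributions from input to output nodes. This is a minimal sketch under stated assumptions: each individual is a distribution over ordered (paternal gene, maternal gene) pairs, the two parents of a child are unrelated, and only forward (prior) propagation is done, with no conditioning on observed genotypes, which is the part the Bayesian network engine handles. The function names mirror the class names on the slides but are otherwise hypothetical.

```python
from collections import defaultdict
from itertools import product

# Marker VWA, Austro-German population allele frequencies (as on the slides)
VWA = {12: .0003, 13: .0018, 14: .1009, 15: .1004, 16: .1949, 17: .2834,
       18: .2162, 19: .0866, 20: .0137, 21: .0015, 22: .0003}


def founder(freqs):
    """founder: genes pgin and mgin drawn independently from the population."""
    return {(pg, mg): p1 * p2
            for (pg, p1), (mg, p2) in product(freqs.items(), repeat=2)}


def mendel(parent):
    """mendel: a fair coin picks the parent's paternal or maternal gene."""
    dist = defaultdict(float)
    for (pg, mg), p in parent.items():
        dist[pg] += 0.5 * p
        dist[mg] += 0.5 * p
    return dict(dist)


def child(father, mother):
    """child: instances fmeiosis and mmeiosis of mendel, one per parent.

    Treats the two transmitted genes as independent, i.e. assumes the
    parents are unrelated."""
    fmeiosis, mmeiosis = mendel(father), mendel(mother)
    return {(pg, mg): p1 * p2
            for (pg, p1), (mg, p2) in product(fmeiosis.items(), mmeiosis.items())}


def genotype(individual):
    """genotype: unordered (min, max) of the paternal and maternal genes."""
    dist = defaultdict(float)
    for (pg, mg), p in individual.items():
        dist[(min(pg, mg), max(pg, mg))] += p
    return dict(dist)
```

A typed parent is just a distribution with all its mass on one pair, e.g. `child({(16, 17): 1.0}, {(18, 18): 1.0})`, whose genotype is 16/18 or 17/18 with probability 1/2 each. The child of two founders has the same genotype distribution as a founder (Hardy–Weinberg), a quick sanity check on the composition.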


query

QUERY INDIVIDUAL: the true father's paternal gene tfpg [maternal gene mfpg] is chosen as either that of f1 or that of f2, according as tf=f1? is true or false.
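In a hand-rolled sketch the query block is just a switch: given the hypothesis node tf=f1?, the true father's gene pair is copied from f1 or from f2. Marginalising over a prior on the hypothesis gives a mixture; setting that probability to 1 or 0 in turn gives the two conditional networks whose data probabilities form the numerator and denominator of the LR. The dict-of-gene-pairs representation and the name `query` are assumptions of this sketch.

```python
def query(f1, f2, prob_tf_is_f1):
    """query: the true father's (paternal, maternal) gene pair is f1's if
    tf=f1? is true, otherwise f2's. f1 and f2 map gene pairs -> probability."""
    out = {}
    for dist, weight in ((f1, prob_tf_is_f1), (f2, 1.0 - prob_tf_is_f1)):
        for genes, p in dist.items():
            out[genes] = out.get(genes, 0.0) + weight * p
    return out
```

With f1 the putative father and f2 a founder for the alternative father, `query(f1, f2, 1.0)` and `query(f1, f2, 0.0)` correspond to conditioning on P and on its negation.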


Complex Paternity Case

[Figure: pedigree network with five founder nodes, five child nodes, one query node and a hypothesis node]

• Measurements for 12 DNA markers on all 6 individuals

• Enter data, “propagate” through system

• Overall Likelihood Ratio in favour of paternity: 1300


MORE COMPLEX DNA CASES

• Mutation

• Silent/missed alleles, …

• Mixed crime stains
  – rape
  – scuffle

• Multiple perpetrators and stains

• Database search

• Contamination, laboratory errors
  – …


MUTATION

[Figure: class network mendelmut]

+ an appropriate network mut to describe the mutation process


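e.g. proportional mutation:

[Figure: class network mut, with a founder instance supplying an alternative gene otherg]

Prob(otherg) ~ mutation rate: with probability equal to the mutation rate, the transmitted gene is replaced by otherg, drawn from the population distribution

– or build other, more realistic, models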
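A stand-alone sketch of proportional mutation as described: in each meiosis the transmitted gene is, with probability equal to the mutation rate, replaced by a gene drawn from the population frequencies. This is not the mut network itself; the names are hypothetical and a single mutation rate for all meioses is assumed. For the incompatible trio mother 12/20, putative father typed 13, child typed 12, the population frequency of allele 12 cancels and the LR comes out equal to the mutation rate.

```python
from itertools import product

# Marker VWA, Austro-German population allele frequencies (as on the slides)
VWA = {12: .0003, 13: .0018, 14: .1009, 15: .1004, 16: .1949, 17: .2834,
       18: .2162, 19: .0866, 20: .0137, 21: .0015, 22: .0003}


def mutate(allele_dist, freqs, rate):
    """Proportional mutation: with probability `rate` the transmitted gene is
    replaced by a gene drawn from the population (founder) distribution."""
    out = {a: rate * p for a, p in freqs.items()}
    for a, p in allele_dist.items():
        out[a] = out.get(a, 0.0) + (1.0 - rate) * p
    return out


def transmitted(genotype, freqs, rate):
    """Fair-coin choice of one parental gene, followed by possible mutation."""
    dist = {}
    for allele in genotype:
        dist[allele] = dist.get(allele, 0.0) + 0.5
    return mutate(dist, freqs, rate)


def child_prob(child, maternal, paternal):
    """Pr(unordered child genotype) given maternal/paternal allele distributions."""
    target = tuple(sorted(child))
    return sum(p_ma * p_pa
               for (ma, p_ma), (pa, p_pa) in product(maternal.items(), paternal.items())
               if tuple(sorted((ma, pa))) == target)


def paternity_lr_mut(child, mother, pf, freqs, rate):
    """Single-marker paternity LR with proportional mutation in every meiosis."""
    maternal = transmitted(mother, freqs, rate)
    numerator = child_prob(child, maternal, transmitted(pf, freqs, rate))
    # A random man's transmitted gene, mutated or not, still follows freqs.
    denominator = child_prob(child, maternal, mutate(freqs, freqs, rate))
    return numerator / denominator
```

Proportional mutation leaves the population frequencies unchanged, which is why the alternative father's term needs no special handling; models that favour one-step repeat changes would not in general have this property.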


SILENT ALLELES

• Code by additional allele (99) in gene

• In genotype the silent allele is unobserved but inherited, e.g. an observed 5 = 5/5 or 5/s
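A small sketch of the coding trick: the silent allele is one more allele (99) in the frequency table, and the typing step hides it. How the slides rescale the ordinary allele frequencies is not stated; scaling them by (1 - s) is an assumption here, as are the function names.

```python
from itertools import product

SILENT = 99  # additional allele code for a silent allele, as on the slides

# Marker VWA, Austro-German population allele frequencies (as on the slides)
VWA = {12: .0003, 13: .0018, 14: .1009, 15: .1004, 16: .1949, 17: .2834,
       18: .2162, 19: .0866, 20: .0137, 21: .0015, 22: .0003}


def with_silent(freqs, s):
    """Add the silent allele with frequency s; other frequencies scaled by 1 - s."""
    out = {a: p * (1.0 - s) for a, p in freqs.items()}
    out[SILENT] = s
    return out


def observed(genes):
    """Typed result: the silent allele is invisible, so 5/5 and 5/99 both read 5."""
    return frozenset(g for g in genes if g != SILENT)


def founder_type_prob(obs, freqs):
    """Pr(observed type) for a founder, summing over ordered gene pairs."""
    return sum(p1 * p2
               for (a, p1), (b, p2) in product(freqs.items(), repeat=2)
               if observed((a, b)) == obs)
```

A founder typed as 16 is then either 16/16 or 16/99, with probability p16² + 2·p16·s. Because the silent allele is an ordinary allele inside gene, it is inherited like any other.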


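MISSED ALLELES

[Figure: genotype network with a geneobs instance for each gene]

• Unobserved + non-inherited: each gene passes through an observation node geneobs, so an allele can be missed in one person's profile while still being passed on normally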
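A minimal sketch of the geneobs idea, assuming each gene copy is missed independently with a fixed probability; the slides give no parameters, so `miss_prob` and the function names are hypothetical. Because the observation node sits beside the inherited gene rather than on the path of inheritance, a missed allele is still passed to children, unlike a silent allele.

```python
from itertools import product


def geneobs(gene, miss_prob):
    """Observation of one gene copy: seen w.p. 1 - miss_prob, else missed (None).
    Missing is a laboratory event, so it does not change what is inherited."""
    return {gene: 1.0 - miss_prob, None: miss_prob}


def observed_type_dist(genotype, miss_prob):
    """Distribution of the observed allele set for a true (pg, mg) genotype."""
    pg, mg = genotype
    out = {}
    for (o1, p1), (o2, p2) in product(geneobs(pg, miss_prob).items(),
                                      geneobs(mg, miss_prob).items()):
        obs = frozenset(a for a in (o1, o2) if a is not None)
        out[obs] = out.get(obs, 0.0) + p1 * p2
    return out
```

With miss_prob = 0.1, a true 5/7 is typed 5/7 with probability 0.81 and as 5 alone with probability 0.09, while a true 5/5 is typed 5 with probability 0.99.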

COMBINATION

• Can combine any or all of above features (and others), by using all appropriate subnetworks

• Can use any desired pedigree network

– no visible difference at top level

• Simply enter data (and desired parameter-values) and propagate…

Effect of accounting for silent allele

• Simple paternity testing

• Paternity testing with additional measured individuals



Simple paternity testing

– allowing for silent alleles



Paternal incompatibility

mgt = 12/20, pfgt = 13, cgt = 12

p12 = 0.0003 (rare allele)

pr(silent)   LR (no mutation)   LR (with mutation ~ 0.005)
0            0                  3.8
0.000015     26                 30
0.0001       125                127
0.001        203                203


Maternal incompatibility

mgt = 16, pfgt = 18, cgt = 18

pr(silent)   LR
0            Impossible
0.000015     4.6
0.0001       4.6
0.001        4.6

The mother must have passed a silent allele to the child, who must have inherited allele 18 from his father.
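The silent-allele LRs can be checked with a brute-force sketch that, under each hypothesis, sums over every true genotype of the mother and putative father consistent with what was typed. Assumptions not taken from the slides: no mutation, the silent allele coded as 99 with frequency s, and the other VWA frequencies scaled by (1 - s); the names are hypothetical. Under these assumptions it gives 4.6 for the maternal incompatibility, returns None ("Impossible") when s = 0, and gives about 26, 125 and 203 for the paternal incompatibility, matching the no-mutation LR column.

```python
from itertools import product

SILENT = 99  # extra allele code for a silent allele

# Marker VWA, Austro-German population allele frequencies (as on the slides)
VWA = {12: .0003, 13: .0018, 14: .1009, 15: .1004, 16: .1949, 17: .2834,
       18: .2162, 19: .0866, 20: .0137, 21: .0015, 22: .0003}


def with_silent(freqs, s):
    """Add the silent allele with frequency s; other frequencies scaled by 1 - s."""
    out = {a: p * (1.0 - s) for a, p in freqs.items()}
    out[SILENT] = s
    return out


def observed(genes):
    """Typed result: the silent allele is never seen."""
    return frozenset(g for g in genes if g != SILENT)


def consistent_pairs(freqs, obs):
    """Ordered founder gene pairs (with probabilities) that would type as obs."""
    return [((a, b), pa * pb)
            for (a, pa), (b, pb) in product(freqs.items(), repeat=2)
            if observed((a, b)) == obs]


def data_prob(c_obs, m_obs, pf_obs, freqs, paternity):
    """Pr(typed trio | hypothesis), summing over true genotypes of m and pf."""
    total = 0.0
    for m, p_m in consistent_pairs(freqs, m_obs):
        for f, p_f in consistent_pairs(freqs, pf_obs):
            # Paternal gene: one of pf's genes (P) or a random man's gene (not P).
            paternal = [(g, 0.5) for g in f] if paternity else list(freqs.items())
            for ma in m:  # maternal gene: each of the mother's copies w.p. 1/2
                for pa, w in paternal:
                    if observed((ma, pa)) == c_obs:
                        total += p_m * p_f * 0.5 * w
    return total


def silent_lr(c_obs, m_obs, pf_obs, freqs, s):
    """Paternity LR allowing silent alleles; None if the data are impossible."""
    f = with_silent(freqs, s)
    numerator = data_prob(c_obs, m_obs, pf_obs, f, paternity=True)
    denominator = data_prob(c_obs, m_obs, pf_obs, f, paternity=False)
    return None if denominator == 0 else numerator / denominator
```

In the maternal case the LR stays close to 1/p18 ≈ 4.6 for every s > 0, because once the mother is forced to have passed a silent allele, the only informative transmission is the paternal 18.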

Paternity testing

Paternity testing with brother too


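Overall likelihood ratio is

$$\mathrm{LR}_{\text{overall}} = \mathrm{LR}_D \times \mathrm{LR}_B$$

Consider the additional information carried by the brother’s data B:

$$\mathrm{LR}_B = \frac{\Pr(B \mid D, P)}{\Pr(B \mid D, \bar{P})}$$

where D denotes the data on the triplet (pf, c, m)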


Incompatible triplet: mgt = 12/15, pfgt = 14, cgt = 12

p22 = .0003

                     LR_B, for brother's genotype B =
p(silent)   LR_D     16/20   12/14   14     22*
0           0        1       0.55    1      3334
0.000015    0.5      1       0.55    1.00   1595
0.0001      2.5      1       0.55    1.00   404
0.001       7.5      1       0.55    1.00   46

*Maximum LR_overall is 1027, at p(silent) = 0.0000642


Compatible triplet: mgt = 12/15, pfgt = 13, cgt = 12/13

                     LR_B, for brother's genotype B =
p(silent)   LR_D     13      13/16   21/22   22
0           556      1       1       1       1
0.000015    551      1       1.00    1       0.51
0.0001      528      1       1.02    1       0.52
0.001       410      1       1.11    1       0.61


Extensions

• Estimation of mutation rates from paternity data

• Peak area data
  – mixtures
  – contamination
  – low copy number


Network to estimate mutation rate


Mixed crime trace

Excerpt of data on 6 markers from Evett et al. (1998)

Marker              D8                   D18                   D21
Allele              10     11    14      13     16     17      59     65     67     70
Peak area (RFUs)    6416   383   5659    38985  1914   1991    1226   1434   8816   8894

Suspect alleles in yellow

Mixed crime trace – alleles only

Mixed crime trace – peak areas


Mixed crime trace

Markers D8, D18 and D21 as above, + 3 more…

• LR (alleles only): 25,000

• LR (peak areas too): 170,000,000


Thanks to:

Steffen Lauritzen, Robert Cowell

and

The Leverhulme Trust
