Anatomy of Data Decision - Vermont


Page 1: Anatomy of Data Decision - Vermont

8/19/2019

1

Artificial Intelligence in Forensic DNA Interpretation:

Artifact Management and Number of Contributor

Prediction

Michael A. Marciano, Ph.D., Research Assistant Professor
Jonathan D. Adelman, M.S., Research Assistant Professor

College of Arts and Sciences
Forensic and National Security Sciences Institute

August 1, 2019

Green Mountain DNA Conference

2

Overview

• Overview of AI (Machine Learning)
  • Why?
  • What to expect?

• Application: Artifact ID and NoC
  • PACE v1 vs PACE v2

3

The Anatomy of Decision

https://www.pinterest.com/pin/1477812348803917/

4

This is a long-lasting love story…

Data + __________ Decision

Data & Decision making

Experience

Validation conclusions

Computational/Statistical output

Input Prediction Judgement Decision


5

Process of making a decision

Input is needed

• Electropherogram
• Experience
• How much DNA?
• Degraded
• Locus and profile wide assessments
• Process related expectations (e.g. pull-up, stutter etc)
• Validation data

1. Artifact vs Allele?
2. NOC?

6

Process of making a decision

JUDGEMENT

How do we value or weight the input?

https://www.abc.net.au/news/2018-11-07/legal-system-1/10465232

7

What is machine learning?

http://www.itbriefcase.net/machine-learning-an-intuitive-definition

8

Definitions

• Artificial intelligence
  • Definition: capability of a machine to…
    • …imitate intelligent human behavior
    • …perform tasks that normally require human intelligence, such as:
      • speech recognition
      • image recognition
      • translation
      • decision-making

College of Arts and Sciences | Forensic and National Security Sciences Institute


9

Definitions

• Machine learning
  • Definition: capability of a computer to learn without being explicitly programmed
  • Branch of AI
  • Unlike other AI, these algorithms are dynamic: they adjust in response to data
  • Example: handwritten address interpretation

10

Projected utility

• “The last 10 years have been about building a world that is mobile-first. In the next 10 years, we will shift to a world that is AI-first.” (Sundar Pichai, CEO of Google, October 2016)

• “It’s hard to overstate how big of an impact it's going to have on society over the next 20 years.” (Jeff Bezos, CEO of Amazon, May 2016)

• Total value of machine learning M&A, 2014-2017
• A.I. startup acquisitions, 2013-2017

11

Projected utility

• Two core aspects of machine learning:
  • Data
    • Bottleneck: machine learning requires massive data sets
    • Resolution: Big data; internet of things
  • Computational power
    • Bottleneck: Moore’s Law
    • Resolution: GPUs
• Take-home points:
  • No immediate bottlenecks
  • Potential application space >> applied application space

12

Advances

• Rapid adoption…

20

Advances

• Rapid adoption…but not in forensic science
  • Latent prints: age estimation (Merkel et al.)
  • Firearms: chemical analysis of GSR (Gallidabino et al.)
  • DNA:
    • 2014 NIJ – mixture interpretation (Marciano and Adelman)
    • PACE
    • Cell morphology (Christopher Ehrhardt et al.)
    • EPG peak classification (Taylor et al.)
• Evolutionary steps forward, but not disruptive innovation
• Biggest hurdles are data availability and practitioners’ caution


21

Conclusions

• AI is coming, whether forensic science is ready or not
• Preparation for the paradigm shift:
  • Treat data as fuel
  • Gain basic ML/AI understanding

http://www.buckhamduffy.com/blog/artificial-intelligence-machine-learning-and-data

22

Application:

PACE: Probabilistic Assessment for Contributor Estimation

23

Why?

Adapted from: https://undsci.berkeley.edu/article/_0_0/howscienceworks_09

So, how many contributors do you think there are?

You might want to get a snack first…this could take a while.

24

What is PACE v2?

• Hybrid statistical and machine learning technique

• Probabilistic method to predict the number of contributors and assess profile complexity
  • Fully continuous
  • Rapid
  • 100% reproducible
  • Outputs probabilities of classes (1-4+) … 5+ coming soon

• Artifact identification/correction
  • Not just contributor estimation
  • Traditional and non-traditional stutter
  • Pull-up
  • Excess noise


25

Introducing PACE: Software Assist Tool

26

PACE is NOT…

• Not…a method to assign allelic data to individual contributors
  • PACE focus: number of contributors

• Does not directly use allele labels to make conclusions
  • PACE focus: distinguishing true signal from artifactual signal and recognizing patterns associated with NOC

• Not…a magic bullet
  • PACE is an assist tool, meant to complement analyst intuition

27

PACE Models, Helper Algorithms and Results

28

Sample sets

                           PowerPlex® Fusion 6c             GlobalFiler™
Sample #                   1969                             3921
Individuals                120                              79
Template range             3.0 pg – 5.1 ng                  3.0 pg – 3.5 ng
Mixture ratios             49                               88
Instruments                9 (5 – 3500, 4 – 31XX series)    3 (3500s)
Injection time / voltage   9 times / 2 kVs                  4 times / 2 kVs


29

Artifact Identification/Correction

Dynamic analytical threshold
• Detect alleles and remove low-level noise.
• Locus-sample-specific threshold (LSST).

Pull-up
• Machine learning
• Automated ID and removal

30

Dynamic Analytical Threshold: Locus-Sample Specific

[Figure: two examples, labeled 0.0625 ng and 4.0 ng; annotated values 63.9 and 6.6; x̄ = 23.12 ± 10.21 and x̄ = 1.98 ± 1.16]
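
The slide does not state how the locus-sample-specific threshold is derived. The annotated values are, however, numerically consistent with a rule of roughly mean + 4 SD of the noise: 23.12 + 4 × 10.21 = 63.96 against the 63.9 shown, and 1.98 + 4 × 1.16 = 6.62 against the 6.6 shown. The Python sketch below illustrates that reading only. The function names and the multiplier `k = 4.0` are assumptions made for illustration, not PACE's documented implementation.

```python
from statistics import mean, stdev


def threshold_from_summary(noise_mean, noise_sd, k=4.0):
    """Threshold from summary noise statistics: mean + k * SD.

    k = 4.0 is an assumption inferred from the numbers on the slide,
    not a documented PACE setting.
    """
    return noise_mean + k * noise_sd


def locus_sample_threshold(noise_heights, k=4.0):
    """Threshold for one locus of one sample from its raw noise peak heights."""
    if len(noise_heights) < 2:
        raise ValueError("need at least two noise peaks to estimate the spread")
    return threshold_from_summary(mean(noise_heights), stdev(noise_heights), k)


# Summary statistics copied from the slide.
larger_threshold = threshold_from_summary(23.12, 10.21)   # about 63.96
smaller_threshold = threshold_from_summary(1.98, 1.16)    # about 6.62
```

Because the threshold is recomputed for each locus of each sample, a noisy profile gets a higher cutoff than a quiet one, which is what distinguishes this approach from a fixed 50, 100 or 150 RFU threshold.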

31

Artifact Identification/Correction

Dynamic analytical threshold
• Detect alleles and remove low-level noise.
• Locus-sample-specific threshold (LSST).

Stutter filter
• Removes effects of stutter
• Models: a-10 to a+5

Trimming algorithms
• Removes noise remaining from thresholding

Machine learning signal assessment
• Probabilistic assessment and correction of remaining signal

Pull-up
• Machine learning
• Automated ID and removal

32

Results: Accuracy of detection and stutter removal

Threshold / trimming method   Stutter filter   Dropout alleles   Accuracy   Incorrect remaining alleles   Percentage of additional alleles
LSST-NR                       modeled          583               97.2%      142                           0.79%
LSST                          modeled          362               98.2%      746                           3.67%
50 RFU - NR                   modeled          1225              94.1%      44                            0.23%
50 RFU                        stock            3004              85.5%      116                           0.66%
100 RFU - NR                  modeled          2301              88.9%      16                            0.09%
100 RFU                       stock            4059              80.4%      51                            0.31%
150 RFU - NR                  modeled          3330              83.9%      6                             0.03%
150 RFU                       stock            4957              76.0%      31                            0.20%

Marciano, M. A., Williamson, V. R. & Adelman, J. D. A hybrid approach to increase the informedness of CE-based data using locus-specific thresholding and machine learning. Forensic Sci. Int. Genet. 35, 26–37 (2018)


33

Accuracy: Artifact removal

[Bar chart: number of peaks by category (area/height, low-min:max, excess noise, pull-up, stutter, LSST-NR), with PACE-GF and PACE-PPF6c accuracy labels ranging from 93.5% to 97.0%]

34

PACE: Maximum Probability PP Fusion 6c® Results

Expected number    PACE predicted number of contributors
of contributors    1      2      3      4+     Accuracy
1                  58     2      0      0      96.0%
2                  6      72     0      1      91.4%
3                  0      7      26     2      94.9%
4+                 0      1      1      21     97.5%

Submitted for Publication

$$\text{accuracy} = \frac{\text{number of correctly predicted events}}{\text{number of predicted events}}$$

35

PACE-PPF6c: Detailed results

36

PACE: Maximum Probability GlobalFiler™ Results

Expected number    PACE predicted number of contributors
of contributors    1      2      3      4+     Accuracy
1                  271    0      0      0      98.6%
2                  9      137    4      1      96.3%
3                  2      11     119    8      94.4%
4+                 0      4      19     200    95.9%
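
The accuracy column is not the per-row hit rate (271/271 would give 100% for the single-source row). It matches a one-vs-rest reading, in which each class counts both true positives and true negatives over all 785 samples. The Python sketch below applies that reading to the GlobalFiler counts shown above. The function name is hypothetical and the interpretation is inferred from the numbers rather than stated on the slide.

```python
def one_vs_rest_accuracy(matrix):
    """Per-class accuracy, (TP + TN) / N, from a square confusion matrix.

    matrix[i][j] is the number of samples whose expected class is i
    and whose predicted class is j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracies = []
    for c in range(n_classes):
        true_pos = matrix[c][c]
        false_neg = sum(matrix[c]) - true_pos
        false_pos = sum(matrix[r][c] for r in range(n_classes)) - true_pos
        true_neg = total - true_pos - false_neg - false_pos
        accuracies.append((true_pos + true_neg) / total)
    return accuracies


# GlobalFiler counts from the slide; rows are expected 1, 2, 3, 4+.
GF_MATRIX = [
    [271, 0, 0, 0],
    [9, 137, 4, 1],
    [2, 11, 119, 8],
    [0, 4, 19, 200],
]

gf_accuracy_pct = [round(100 * a, 1) for a in one_vs_rest_accuracy(GF_MATRIX)]
# gf_accuracy_pct is [98.6, 96.3, 94.4, 95.9], matching the slide's column
```

Applied to the PowerPlex Fusion 6c counts, the same calculation agrees with the reported column to within about 0.1 percentage point. Those counts sum to 197, while a later slide reports 198 PowerPlex Fusion 6c samples, which may account for the difference.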


37

PACE-GF: Detailed results

38

4-contributor, 9:1:1:1, 0.18 ng, degraded (12 mU DNase I)

Pr(3) = 0.18, Pr(4+) = 0.82

39

5-contributor, 1:1:1:1:1, 0.075 ng, degraded (12 mU DNase I)

Pr(1) = 0.22, Pr(2) = 0.29, Pr(3) = 0.11 and Pr(4+) = 0.37

40

PPF6c Incorrect call – sample quality

Bottom line – low quality sample

Actual n                           2
Expected mixture ratio             2:1
Template DNA amplified (ng)        0.0375
PACE predicted n                   4
Class probability (1, 2, 3, 4+)    0.00, 0.40, 0.19, 0.40
% dropout alleles                  27.7%
Mean % allele sharing              29.3%


41

Interpreting Results

https://marketoonist.com/2014/04/big-data-analytics.html

42

Interpreting NoC Results…Output

Expected NOC   Correct   Incorrect   Percent correct
1              33        2           94%
2              48        0           100%
3              38        4           90%
4              17        3           85%

P(NOC_1)   P(NOC_2)   P(NOC_3)   P(NOC_4)
0          0.65       0.35       0
0          0.73       0.27       0

• What is the probability threshold…is 0.73 good enough?
  o At what probability should you know that results lack confidence?

43

PACE Results: Probability threshold

• At what probability do we expect a correct result?
  o At what probability should you know that results lack confidence?

• Ultimately this is a lab-specific validation task

Maximum class   PACE-PPF6c                        PACE-GF
probability     % Correct   % of total samples    % Correct   % of total samples
0.95            97.6        62.1% (123/198)       99.4        59.1% (464/785)
0.9             97.8        69.7% (138/198)       99.3        68.3% (536/785)
0.8             96.1        77.8% (154/198)       99.0        78.1% (613/785)
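
As a minimal illustration of how a maximum-class-probability threshold might be applied, the Python sketch below picks the most probable class and flags calls that fall below a cutoff. The function name and the default cutoff of 0.8 are hypothetical. The slides stress that the cutoff is a lab-specific validation decision, and this is not PACE's own code.

```python
CLASS_LABELS = ["1", "2", "3", "4+"]


def call_noc(probabilities, min_confidence=0.8):
    """Return (predicted class label, its probability, meets cutoff?).

    min_confidence is a placeholder value; each lab would set its own
    cutoff through validation.
    """
    if len(probabilities) != len(CLASS_LABELS):
        raise ValueError("expected one probability per class")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    p_max = probabilities[best]
    return CLASS_LABELS[best], p_max, p_max >= min_confidence


# Class probabilities quoted elsewhere in these slides.
degraded_four_person = call_noc([0.0, 0.0, 0.18, 0.82])   # ("4+", 0.82, True)
ambiguous_output = call_noc([0.0, 0.73, 0.27, 0.0])       # ("2", 0.73, False)
degraded_five_person = call_noc([0.22, 0.29, 0.11, 0.37]) # ("4+", 0.37, False)
```

A flagged call is not necessarily wrong. The flag simply marks profiles where the spread of probabilities suggests the analyst should weigh the PACE output more cautiously against the other inputs.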

44

Prediction: Putting on the thinking cap

Prediction (method)
• Input
  Traditional
  • Electropherogram
  • Experience
  • Quality/Quantity of DNA
  • Locus and profile wide assessments
  • Process related components
  • Validation data
  New
  • PACE OUTPUT

Input Prediction Judgement Decision

PACE…A new “input” tool


45

Conclusions (1)

• Fully continuous probabilistic approach
• High accuracy and resolution between 3 & 4+
• Reproducible
• No change in computational resources
• Fast (seconds)
• Use prior to PG

46

Conclusions (2)


PACE assigns weights to each probability class, allowing the analyst to assess the distribution of probabilities to aid in the decision-making process.

The combination of PACE and manual interpretation is more accurate and robust than either used in isolation.

47

Resources


Developmental Validation of PACE™: Automated Artifact Identification and Contributor Estimation for use with GlobalFiler™ and PowerPlex® Fusion 6c Generated Data

Michael Marciano, Jonathan Adelman

FSI Genetics 2019 (accepted revision last night)

48

Acknowledgements

National Institute of Justice

Niche Vision LLC
• Luigi Armogida
• Vic Meles
• Tom Faris

Contributing Laboratories
• CT Division of Scientific Services
• Palm Beach County Sheriff's Office
• Kansas City PD
• Michigan State Police
• Erie County DFS
• Kentucky State Police
• Promega Corporation
• Idaho State Police Forensic Services
• NYC OCME
• Oakland PD
• Indiana State Police
• Washington DC DFS
• Rutgers University
• San Diego Sheriff's Dept
• Onondaga County CFS

Other
• Laura Haarer
• Victoria Williamson
• Angie Zhao
• Ebrar Mohammed


49

Thank You

Questions

[email protected] ; [email protected]

Phone: 315-443-5279

107 College Place; 120 LSB

Syracuse, NY 13244