#StrataData - Predicting real-time transaction fraud using supervised learning

Predicting Real-Time Transaction Fraud
Sami Niemi, PhD
Barclays, Quantitative Analytics, Fraud Detection


Contents

1 Background
2 Raw Data
3 Data Processing
4 Development
5 Validation
6 Implementation
7 Summary

Background – Definitions and Examples

3rd Party Fraud
An individual, or group of people, creates or uses a third party's identity in order to apply for products or take over an account without the consent or knowledge of the third party.

Card Transaction Fraud
• Card Present (CP): e.g. lost, stolen, counterfeit/clone
• Card not Present (CnP): e.g. identity theft, hacking, fake online shops

Background – Motivation (global view)


Background – Motivation (UK view)


Source: Fraud the Facts 2018 by UK Finance


Background – Challenges

• Fraudsters Adapt and Invent New MOs
• Real-Time Runtime Requirements
• Front Page News Material

The aim of the project was to develop and implement new Debit CP and CnP real-time fraud detection models that can reduce fraud losses and protect genuine customers.

Raw Data – Sources

• Debit Card Transactions
• Non-Mon Events
• Other Cards and Accounts
• Customer Info
• Account Info
• Payment Instrument Info
• Confirmed Frauds

Data Processing – Quality Assurance and Data Exploration


Data Quality

• Reconciliation, Volumes, and Amounts

• Daily and Monthly Summary Statistics

• Anomaly and Outlier Detection

Exploration

• Trend Analysis and Anomaly Detection

• Distributions (PDFs and bar charts for Fraud / Non)

• Correlations (covariance, correlation w/ target, etc.)

Report

• Thresholding

• Issue Generation and Resolution

• Documentation and Governance
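The volume checks and outlier detection listed above could be sketched as follows. This is only an illustration, not the method used in the project: the modified z-score, the 3.5 cut-off, and the daily counts are all assumptions made for the example.

```python
from statistics import median


def robust_outliers(daily_counts, threshold=3.5):
    """Return indices of days whose volume deviates strongly from the median.

    Uses the modified z-score 0.6745 * (x - median) / MAD, which is less
    affected by the outliers themselves than a mean/standard-deviation z-score.
    The threshold of 3.5 is an assumed, commonly used default.
    """
    med = median(daily_counts)
    mad = median(abs(x - med) for x in daily_counts)
    if mad == 0:
        return []  # no spread at all, so nothing can be flagged this way
    return [
        i for i, x in enumerate(daily_counts)
        if abs(0.6745 * (x - med) / mad) > threshold
    ]


# Hypothetical daily transaction counts (millions); index 5 mimics a partial load.
counts = [9.8, 10.1, 10.0, 9.9, 10.2, 4.1, 10.0, 9.7]
flagged = robust_outliers(counts)
print(flagged)
```

A flagged day would then feed the thresholding and issue-generation steps of the report.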


Data – High Level Statistics

Volumes
• Total: 220 – 300M debit card transactions with total value of £9 – 11B per month
• CP: 110M contactless, 20M ATM
• CnP: 85M e-commerce + telephony

Customers
• 10M unique customers per month
• transacting in 220 countries
• using 12M debit cards
• with 1.9M different merchants

Frauds
• Fraud Rates
• CP: less than 0.01%, depending on segment
• CnP: less than 0.15%, depending on segment

Development – Datasets

Debit: 14 months

CP
• Train: 12 months
• Sample: 45M transactions
• OOT (out-of-time): 2 recent months

CnP
• Train: 12 months
• Sample: 55M transactions
• OOT (out-of-time): 2 recent months
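A split of 14 months into 12 training months and 2 recent out-of-time months might look like the sketch below. The function names, the record layout, and the dates are hypothetical; the slides do not say how the split was implemented.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from `start` to `end` (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def split_train_oot(transactions, first_month, train_months=12, oot_months=2):
    """Send the first `train_months` months to train and the next `oot_months` to OOT.

    Transactions outside the combined window are dropped.
    """
    train, oot = [], []
    for txn in transactions:
        offset = months_between(first_month, txn["date"])
        if 0 <= offset < train_months:
            train.append(txn)
        elif train_months <= offset < train_months + oot_months:
            oot.append(txn)
    return train, oot


# Hypothetical transactions; only the date matters for the split.
txns = [
    {"id": 1, "date": date(2017, 1, 15)},   # month 0  -> train
    {"id": 2, "date": date(2017, 12, 31)},  # month 11 -> train
    {"id": 3, "date": date(2018, 1, 1)},    # month 12 -> OOT
    {"id": 4, "date": date(2018, 2, 28)},   # month 13 -> OOT
    {"id": 5, "date": date(2018, 3, 1)},    # month 14 -> outside the window
]
train, oot = split_train_oot(txns, first_month=date(2017, 1, 1))
print(len(train), len(oot))
```

Splitting by time rather than at random keeps the evaluation months strictly later than anything the model was trained on.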

Data Processing – Feature Engineering


and many more (e.g. merchant)… finally, ratios between values and current transaction.


Development – Feature Selection

Univariate
• Remove zero or extremely low variance
• Remove if all or extremely high level of missing values
• Very low Information Value or Spearman rank correlation

Model
• Lasso coefficient importance
• Random Forest feature importance

Wrapper
• Recursive Feature Elimination

Domain
• Business Review
• Implementation Considerations

Number of features remaining as selection proceeds: 20k → 10k → 1k → 500
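As an illustration of the univariate step, Information Value for one binned feature can be computed as below. This is a sketch under assumptions: the binning, the 0.5 smoothing constant, and the counts are invented for the example, and the slides do not state which IV variant or cut-off was used.

```python
import math


def information_value(bins):
    """Information Value of a binned feature.

    bins: list of (n_genuine, n_fraud) counts, one pair per bin.
    IV = sum over bins of (pct_genuine - pct_fraud) * ln(pct_genuine / pct_fraud).
    """
    total_genuine = sum(g for g, _ in bins)
    total_fraud = sum(f for _, f in bins)
    iv = 0.0
    for genuine, fraud in bins:
        # Add 0.5 to every cell (an assumed smoothing choice) to avoid log(0).
        pct_genuine = (genuine + 0.5) / (total_genuine + 0.5 * len(bins))
        pct_fraud = (fraud + 0.5) / (total_fraud + 0.5 * len(bins))
        iv += (pct_genuine - pct_fraud) * math.log(pct_genuine / pct_fraud)
    return iv


# Hypothetical counts: the first feature separates the classes, the second does not.
iv_informative = information_value([(1800, 2), (200, 18)])
iv_flat = information_value([(1000, 10), (1000, 10)])
print(iv_informative, iv_flat)
```

Features whose IV stays close to zero, like the second one here, would be candidates for removal.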

Development – Feature Selection & Business Review

Debit CP model feature: ratio of current transaction amount and maximum contactless in last X days

[Figure legend: Fraud, Genuine]
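A feature of this kind could be computed roughly as follows. The function name, the record layout, the 30-day window, and the handling of an empty window are assumptions for illustration; the slide leaves the window length as X.

```python
from datetime import datetime, timedelta


def amount_to_recent_max_ratio(history, current_time, current_amount, window_days):
    """Ratio of the current amount to the largest contactless amount in the window.

    history: list of (timestamp, amount, is_contactless) for earlier transactions.
    Returns None when the window holds no contactless transaction (assumed behaviour).
    """
    window_start = current_time - timedelta(days=window_days)
    recent = [
        amount
        for ts, amount, is_contactless in history
        if is_contactless and window_start <= ts < current_time
    ]
    if not recent:
        return None
    return current_amount / max(recent)


# Hypothetical card history.
history = [
    (datetime(2018, 1, 1), 30.0, True),    # contactless, but older than the window
    (datetime(2018, 3, 1), 5.0, True),
    (datetime(2018, 3, 10), 12.5, True),
    (datetime(2018, 3, 11), 80.0, False),  # not contactless, ignored
]
ratio = amount_to_recent_max_ratio(history, datetime(2018, 3, 12), 25.0, window_days=30)
print(ratio)
```

In a real-time setting the running maximum would normally be maintained incrementally rather than recomputed from the full history on every transaction.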

Development – Model Development Cycle

• Select Features
• Pick Model and Train
• Optimize
• Evaluate
• Review

Development – Hyper-parameter Optimization

Example of Bayesian hyper-parameter optimization using hyperopt

[Figure: Performance as a function of Fraction of Features in a Split and Number of Iterations]

Validation – CP Model Performance

• Precision-Recall Curve: AUC ~0.23
• ROC Curve: AUC ~0.95
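A large gap between the two areas is typical when frauds are very rare, because precision is dragged down by even a small number of genuine transactions ranked above a fraud. The toy example below shows the effect; the data are invented and do not reproduce the model's figures.

```python
def roc_auc(labels, scores):
    """Probability that a random fraud outscores a random genuine (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes distinct scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each fraud
    return total / n_pos


# Hypothetical ranking: 1000 transactions, frauds at ranks 3 and 40.
n = 1000
fraud_ranks = {3, 40}
labels = [1 if r in fraud_ranks else 0 for r in range(1, n + 1)]
scores = [n - r for r in range(1, n + 1)]  # higher score = higher rank

auc_roc = roc_auc(labels, scores)
auc_pr = average_precision(labels, scores)
print(round(auc_roc, 3), round(auc_pr, 3))
```

Both frauds sit in the top 4% of the ranking, so the ROC area is close to 1, yet the precision-recall area stays low because 38 genuine transactions outrank the second fraud.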

Validation – CP Model Performance

[Figure, two panels: Transaction Detection Rate and Value Detection Rate versus False Positive Rate [bps], each comparing the incumbent model with the new model]

Validation – CnP Model Performance

[Figure, two panels: Transaction Detection Rate and Value Detection Rate versus False Positive Rate [bps], each comparing the incumbent model with the new model]
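The quantities on these axes can be computed at a given score threshold as sketched below. The definitions are assumptions, since the slides do not spell them out: transaction detection rate as the share of fraud transactions alerted, value detection rate as the share of fraud value alerted, and false positive rate as alerted genuine transactions over all genuine transactions, in basis points (1 bp = 0.01%).

```python
def rates_at_threshold(transactions, threshold):
    """Return (transaction detection rate, value detection rate, FPR in bps).

    transactions: list of (score, amount, is_fraud); an alert fires when
    score >= threshold.
    """
    frauds = [t for t in transactions if t[2]]
    genuines = [t for t in transactions if not t[2]]
    caught = [t for t in frauds if t[0] >= threshold]
    false_alerts = [t for t in genuines if t[0] >= threshold]

    tdr = len(caught) / len(frauds)
    vdr = sum(t[1] for t in caught) / sum(t[1] for t in frauds)
    fpr_bps = 10_000 * len(false_alerts) / len(genuines)
    return tdr, vdr, fpr_bps


# Hypothetical scored transactions: 4 frauds and 2000 genuine payments.
data = [
    (0.9, 500.0, True),
    (0.8, 20.0, True),
    (0.2, 30.0, True),
    (0.1, 50.0, True),
    (0.7, 10.0, False),
] + [(0.05, 10.0, False)] * 1999

tdr, vdr, fpr_bps = rates_at_threshold(data, threshold=0.5)
print(tdr, round(vdr, 3), fpr_bps)
```

Here half of the fraud transactions are caught, but they carry most of the fraud value, which is why the two detection rates are worth tracking separately.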

Validation – CP Model Interrogation

[Figure: Fraud Risk versus time since a new card was issued, shown separately for Chip used and Chip not-used]

Implementation – Development Artefacts

Model Artefacts
• Model Specification (JSON)
• Model File (txt)
• Validation Data (parquet)

Feature Artefacts
• Feature Specification (JSON)
• Validation Data (parquet)
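One plausible use of the validation data is to check that a re-implemented feature pipeline reproduces the development values. The sketch below assumes that purpose; the specification layout, field names, and tolerance are hypothetical, and plain dictionaries stand in for the parquet data.

```python
import json
import math

# Hypothetical feature specification; the real JSON schema is not shown in the slides.
spec = json.loads(
    '{"features": [{"name": "amount_ratio"}, {"name": "days_since_issue"}]}'
)


def validate(rows, expected, compute, spec, tolerance=1e-9):
    """Return (row index, feature name) pairs where compute() disagrees with expected."""
    mismatches = []
    for i, (row, want) in enumerate(zip(rows, expected)):
        got = compute(row)
        for feature in spec["features"]:
            name = feature["name"]
            if not math.isclose(got[name], want[name], rel_tol=0.0, abs_tol=tolerance):
                mismatches.append((i, name))
    return mismatches


def production_features(row):
    """Stand-in for the production feature code under test."""
    return {
        "amount_ratio": row["amount"] / row["recent_max"],
        "days_since_issue": row["days_since_issue"],
    }


def buggy_features(row):
    """Same as above, with the ratio deliberately inverted."""
    out = production_features(row)
    out["amount_ratio"] = row["recent_max"] / row["amount"]
    return out


# Hypothetical validation rows and the values recorded during development.
rows = [
    {"amount": 20.0, "recent_max": 10.0, "days_since_issue": 3.0},
    {"amount": 5.0, "recent_max": 10.0, "days_since_issue": 40.0},
]
expected = [
    {"amount_ratio": 2.0, "days_since_issue": 3.0},
    {"amount_ratio": 0.5, "days_since_issue": 40.0},
]

ok = validate(rows, expected, production_features, spec)
bad = validate(rows, expected, buggy_features, spec)
print(ok, bad)
```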

Implementation - Process

• Artefacts from Nexus using Jenkins
• Model File and Implementation Validation
• Feature Code Gen and Validation
• Feature Maturation and Shadow Operations
• Production Validation and Go-Live

Summary

• An increasing number of customers become victims of fraud, especially remote purchase fraud (e.g. e-commerce).

• To improve fraud prevention and customer experience, we undertook
  – Development of generation 1 models for Debit CP and CnP using tree ensemble algorithms
  – 12 months of training data were converted to about 20k features to develop the best possible models
  – Both models are in implementation, with shadow operations due in May and go-live during summer

• R&D for generation 2 models (e.g. RNNs, autoencoders) is ongoing with promising results, but implementation requires more work…
