Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
#StrataData - Predicting real-time transaction fraud using supervised learning
Predicting Real-Time Transaction FraudSami Niemi, PhDBarclays, Quantitative Analytics, Fraud Detection
#StrataData - Predicting real-time transaction fraud using supervised learning
Contents
2
Background
4 Development
Implementation
Raw Data
Data Processing
Summary
Validation
7
1
2
3
4
5
6
#StrataData - Predicting real-time transaction fraud using supervised learning
Background – Definitions and Examples
an individual, or group of people, create oruse a third-party's identity in order to applyfor products or take over an account withoutthe consent or knowledge of the third-party.
3rd Party Fraud
Card Present (CP)• e.g. lost, stolen, counterfeit/cloneCard not Present (CnP)• e.g. identity theft, hacking, fake online shops
Card Transaction
Fraud
3
#StrataData - Predicting real-time transaction fraud using supervised learning
Background – Motivation (global view)
4
#StrataData - Predicting real-time transaction fraud using supervised learning
Background – Motivation (UK view)
5
Source: Fraud the Facts 2018 by UK Finance
#StrataData - Predicting real-time transaction fraud using supervised learning
Background – Challenges
6
Fraudsters Adapt and Invent New MOs
Real-Time Runtime Requirements
Front Page News Material
#StrataData - Predicting real-time transaction fraud using supervised learning 7
Aim of the project was to develop and implement newDebit CP and CnP real-time fraud detection models, whichcan reduce fraud losses and protect genuine customers.
#StrataData - Predicting real-time transaction fraud using supervised learning
Raw Data – Sources
8
Debit Card Transactions
Non-Mon Events
Other Cards and Accounts
Customer Info
Account Info
Payment Instrument
Info
Confirmed Frauds
#StrataData - Predicting real-time transaction fraud using supervised learning
Data Processing – Quality Assurance and Data Exploration
9
Data Quality
• Reconciliation, Volumes, and Amounts
• Daily and Monthly Summary Statistics
• Anomaly and Outlier Detection
Exploration
• Trend Analysis and Anomaly Detection
• Distributions (PDFs and bar charts for Fraud / Non)
• Correlations (covariance, correlation w/ target, etc.)
Report
• Thresholding
• Issue Generation and Resolution
• Documentation and Governance
#StrataData - Predicting real-time transaction fraud using supervised learning
Data – High Level Statistics
10
• Total: 220 – 300M debit card transactions with total value of £9 – 11B per month• CP: 110M contactless, 20M ATM• CnP: 85M e-commerce + telephony
Volumes
• 10M unique customers per month• transacting in 220 countries• using 12M debit cards• with 1.9M different merchants
Customers
• Fraud Rates• CP: less than 0.01%, depending on segment• CnP: less than 0.15%, depending on segment
Frauds
#StrataData - Predicting real-time transaction fraud using supervised learning
Development – Datasets
11
Debit14 months
CP
Train12 months
Sample45M transactions
OOT2 recent mnths
CnP
Train12 months
Sample55M transactions
OOT2 recent mnths
#StrataData - Predicting real-time transaction fraud using supervised learning
Data Processing – Feature Engineering
12
and many more (e.g. merchant)… finally, ratios between values and current transaction.
#StrataData - Predicting real-time transaction fraud using supervised learning
Development – Feature Selection
13
Univariate
• Remove zero or extremely low variance• Remove if all or extremely high level of missing values• Very low Information Value or Spearman rank correction
Model
• Lasso co-efficient importance• Random Forest feature importance
Wrapper• Recursive Feature Elimination
Domain
• Business Review• Implementation Considerations
20k
10k
1k
500
#StrataData - Predicting real-time transaction fraud using supervised learning
Development – Feature Selection & Business Review
14
FraudGenuine
Debit CP model feature:ratio of current transaction amount and maximum contactless in last X days
#StrataData - Predicting real-time transaction fraud using supervised learning
Development – Model Development Cycle
15
Select Features
Pick Model and Train
OptimizeEvaluate
Review
#StrataData - Predicting real-time transaction fraud using supervised learning
Development – Hyper-parameter Optimization
16
Fraction of Features in a SplitNumber of Iterations
Perfo
rman
ce
Example of Bayesian hyper-parameter optimization using hyper-opt
#StrataData - Predicting real-time transaction fraud using supervised learning
Validation – CP Model Performance
17
Precision Recall Curve: AUC ~0.23 ROC Curve: AUC ~0.95
#StrataData - Predicting real-time transaction fraud using supervised learning
Validation – CP Model Performance
18
Incumbent model
New model
Incumbent model
New model
False Positive Rate [bps] False Positive Rate [bps]
Transaction Detection Rate Value Detection Rate
#StrataData - Predicting real-time transaction fraud using supervised learning
Validation – CnP Model Performance
19
New model
Incumbent model
Incumbent model
Transaction Detection Rate Value Detection Rate
False Positive Rate [bps] False Positive Rate [bps]
New model
#StrataData - Predicting real-time transaction fraud using supervised learning
Validation – CP Model Interrogation
20
Time since a new card was issued
Frau
d Ri
skChip usedChip not-used
#StrataData - Predicting real-time transaction fraud using supervised learning
Implementation – Development Artefacts
21
Model Artefacts• Model Specification (JSON)• Model File (txt)• Validation Data (parquet)
Feature Artefacts• Feature Specification (JSON)• Validation Data (parquet)
#StrataData - Predicting real-time transaction fraud using supervised learning
Implementation - Process
22
Artefacts from Nexus using Jenkins
Model File and Implementation Validation
Feature Code Gen and Validation
Feature Maturation and Shadow Operations
Production Validation and Go-Live
#StrataData - Predicting real-time transaction fraud using supervised learning
Summary
• Increasing number of customers become victims of fraud, especially remote purchase (e.g. e-commerce).
• To improve fraud prevention and customer experience, we undertook– Development of generation 1 models for Debit CP and
CnP using tree ensemble algorithms– 12 months of training data were converted to about 20k
features to develop the best possible models– Both models are in implementation, shadow operations
due in May with go-live during summer• R&D for generation 2 models (e.g. RNNs, autoencoders) on-
going, promising results, but implementation requires more work…
23