1
Phone Fraudsters in a Haystack
Sri Kanajan, Prasad Telekuntla, Mijail Gomez
3rd place in Tata Telecommunications Global Hackathon
2
Example of Phone Fraud
• Fraudster leaves an international missed call
• Victim unknowingly calls a premium number or manipulative advertisement
• $2 billion of lost revenue from telecom providers
3
Motivations
• Current statistical solutions have low specificity and sensitivity
• Human fraud analysts have to continually update their heuristic-based
rules and thresholds
• Need an adaptive solution that works in real time with minimal false
positives
4
Hybrid Statistical and Machine Learning Solution
• Input: Live Streaming Phone Data
• Statistical Analysis Anomaly Detection: Number of Callers, Number of Callees, Cumulative Call Duration
• Machine Learning (Random Forests): Evaluation of other features in the call log such as answer indicator, area code, pricing…
• Used the de-identified hackathon phone log dataset (16 GB)
5
Anomaly Detection Through Statistical Analysis
# of Unique Callers per Phone Number
# of Unique Callees per Phone Number
Cumulative Duration of Calls to Specific Phone Numbers
ANOMALOUS Phone Numbers!!
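A minimal Python sketch of the three per-number statistics named on this slide, followed by a simple tail test. The record layout `(caller, callee, duration_seconds)`, the function names, and the "mean plus three standard deviations" cutoff are all assumptions for illustration; the slides do not state how the anomalous tail was chosen.

```python
from collections import defaultdict
from statistics import mean, pstdev

def per_number_stats(calls):
    """calls: iterable of (caller, callee, duration_seconds) tuples (assumed layout)."""
    callers_of = defaultdict(set)      # number -> distinct numbers that called it
    callees_of = defaultdict(set)      # number -> distinct numbers it called
    duration_to = defaultdict(float)   # number -> total seconds of calls received
    for caller, callee, duration in calls:
        callers_of[callee].add(caller)
        callees_of[caller].add(callee)
        duration_to[callee] += duration
    numbers = set(callers_of) | set(callees_of)
    return {
        n: {
            "unique_callers": len(callers_of.get(n, ())),
            "unique_callees": len(callees_of.get(n, ())),
            "total_duration": duration_to.get(n, 0.0),
        }
        for n in numbers
    }

def flag_outliers(stats, metric, k=3.0):
    """Flag numbers whose metric lies more than k standard deviations above the mean.

    The cutoff k is an assumed value, not one taken from the slides.
    """
    values = [s[metric] for s in stats.values()]
    center, spread = mean(values), pstdev(values)
    if spread == 0:
        return set()  # every number looks the same on this metric
    return {n for n, s in stats.items() if (s[metric] - center) / spread > k}
```

Calling `flag_outliers(per_number_stats(calls), "unique_callers")` returns the set of numbers in the upper tail of that histogram; the same call works for `"unique_callees"` and `"total_duration"`.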
6
Hybrid Statistical and Machine Learning Solution
[Diagram: Live Streaming Phone Data; Statistical Analysis Anomaly Detection; Machine Learning (Random Forests); Graph Analysis Anomaly Detection; Predicted Anomalies]
7
Fraud Detection Using Graph Metrics
• Triangle Counting
• PageRank
• Others…
Note: The goal is to uncover the callers that are very different from the
large majority
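The two named metrics can be sketched in plain Python as follows. This is an illustration under assumptions, not the team's implementation: the call graph is built from hypothetical `(caller, callee)` pairs, triangles are counted on the undirected graph, and PageRank uses a damping factor of 0.85 with a fixed number of power iterations.

```python
from collections import defaultdict

def build_graph(calls):
    """Build undirected neighbor sets and directed out-links from (caller, callee) pairs."""
    neighbors = defaultdict(set)
    out_links = defaultdict(set)
    for caller, callee in calls:
        if caller == callee:
            continue  # ignore self-calls
        neighbors[caller].add(callee)
        neighbors[callee].add(caller)
        out_links[caller].add(callee)
        out_links.setdefault(callee, set())  # make sure the callee is a known node
    return neighbors, out_links

def triangle_counts(neighbors):
    """Number of triangles each node takes part in (undirected graph)."""
    counts = {}
    for node, nbrs in neighbors.items():
        # A triangle through `node` is a linked pair of its neighbors; each pair
        # is seen once from either end, hence the division by 2.
        linked = sum(len(nbrs & neighbors[u]) for u in nbrs)
        counts[node] = linked // 2
    return counts

def pagerank(out_links, damping=0.85, iterations=50):
    """PageRank by power iteration; damping and iteration count are assumed defaults."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes that never call anyone is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank
```

Each number then gets a small vector of graph metrics (for example triangle count and PageRank), which is the kind of input an outlier search over callers needs.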
8
Fraud Detection Using Graph Metrics
Using Principal Component Analysis to uncover the outliers in the graph metrics
[Plot label: Possible Fraudsters!]
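One way to read this step, sketched in plain Python: standardize each graph metric, find the first principal component, and score every number by how far its projection lies from the center. The slides do not say how outliers were picked from the PCA plot, so the scoring rule, the function names, and the use of only the first component are assumptions.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit variance (constant columns are left at 0)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def covariance(rows):
    """Covariance matrix of already centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]

def first_component(matrix, iterations=100):
    """Leading eigenvector by power iteration.

    The start vector (1, 2, ..., d) is an arbitrary choice; power iteration
    fails if the start happens to be orthogonal to the leading eigenvector.
    """
    d = len(matrix)
    vec = [float(i + 1) for i in range(d)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec

def pca_outlier_scores(metrics):
    """metrics: dict of number -> list of graph metrics. Returns number -> score."""
    names = list(metrics)
    z = standardize([metrics[n] for n in names])
    pc1 = first_component(covariance(z))
    # Score: absolute projection onto the first principal component.
    return {n: abs(sum(a * b for a, b in zip(row, pc1))) for n, row in zip(names, z)}
```

Numbers with the largest scores are the candidates a plot would show far from the main cluster; a fuller version would also look at the second component.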
9
Hybrid Statistical and Machine Learning Solution
[Diagram: Live Streaming Phone Data; Statistical Analysis Anomaly Detection; Machine Learning (Random Forests); Graph Analysis Anomaly Detection; Predicted Anomalies; Possible Fraud; Human Observed; Fraud Analyst]
10
Fraud Detection Using Graph Metrics
Human Fraud Analyst Confirmation of Fraudster
www.fraud-detector.net
11
Hybrid Statistical and Machine Learning Solution
[Diagram: Live Streaming Phone Data; Statistical Analysis Anomaly Detection; Machine Learning (Random Forests); Graph Analysis Anomaly Detection; Predicted Anomalies; Possible Fraud; Human Observed; Fraud Analyst; Confirmed Fraudsters]
12
Ensemble Model – Machine Learning and Statistical
• With labeled data, the classifier can progressively identify patterns
beyond the graph metrics (uses all other features in the raw call log)
– E.g. patterns in area codes or specific pricing plans used by fraudsters
• Active learning is done online while the system is running, i.e. the
longer the system is in use, the better it gets
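One round of the analyst feedback loop might look like the sketch below. It is an assumption-laden illustration: `feedback_round`, `ask_analyst`, and `budget` are hypothetical names, and ranking unlabeled numbers by their current anomaly score is just one possible way to choose what the analyst reviews. Refitting the classifier (random forests in these slides) on the returned labels is left out.

```python
def feedback_round(scores, labels, ask_analyst, budget=2):
    """Send the highest-scoring unlabeled numbers to the analyst and store the verdicts.

    scores:      dict of number -> current anomaly or fraud score
    labels:      dict of number -> True (confirmed fraud) or False (false positive)
    ask_analyst: callable taking a number and returning True or False (hypothetical)
    budget:      how many numbers the analyst reviews per round (assumed)
    """
    unlabeled = [n for n in scores if n not in labels]
    unlabeled.sort(key=lambda n: scores[n], reverse=True)
    updated = dict(labels)  # leave the caller's label store untouched
    for number in unlabeled[:budget]:
        updated[number] = ask_analyst(number)
    return updated
```

The growing `labels` dictionary is what a supervised classifier would be retrained on after each round, using the other call-log features as inputs.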
14
Conclusion
Possible False Positive
Possible Fraudster
16
Acknowledgements
Zipfian Academy
Technologies Used
D3, Python