CONFIDENTIAL 2019
Talking to the Machines: Monitoring Production Machine Learning Systems
Ting-Fang Yen, Director of Research, DataVisor
September 12, 2019
CONFIDENTIAL 2019 2
Machine Learning Applications Everywhere
CONFIDENTIAL 2019
Production ML Pipeline is Complex
3
Sculley, D., et al. "Hidden technical debt in machine learning systems." NIPS. 2015.
“The required surrounding infrastructure is vast and complex.”
CONFIDENTIAL 2019 4
Machine Learning Pipeline at DataVisor
Real-time fraud detection service powered by unsupervised machine learning (UML) technologyBillions of application-level events processed daily
Correlation engine of billions of users
Temporal Eventseq
Velocity / freq
Spatial / geo
Domainattributes
Graphattributes
Data Collection and Storage
Data Processing
Feature Generation
UML Clustering Analysis
Raw Data
Case Management UI / API
Supervised Machine Learning
Automated Rule Engine
GIN
DataVisor Global Intelligence Network
CONFIDENTIAL 2019 5
What Can Go Wrong?
- Misconfigurations / Buggy pipeline- Input and output location- Module dependencies- Version control
- Resource Management- Out of memory- Out of disk- Job scheduling- Other job failures
- Data Issues- Data upload problems- Data format or parsing error
- Bugs in code- Runtime exceptions
- Model accuracy- …
CONFIDENTIAL 2019 6
Monitor Production ML Pipelines
CONFIDENTIAL 2019 7
Monitor Production ML Pipelines
CONFIDENTIAL 2019 8
Making Job Scheduling/Dependencies Simpler
DataVisor SparkGen: One job per cluster• High utilization
• No inter-job interference• Maximize job concurrency• Low maintenance
Time
Time
Sequential
One Job Per Cluster
A
D
C F G
B E
J
K
I
M
L
H
A
B
C
D
E
F G J
H
K M
I L
A B C D E F G H I LJ K M
Multiple Clusters
Time
A B
C
D E
F G
H
I
LJ
K
M
CONFIDENTIAL 2019 9
What About Model Prediction Quality?
Common approaches for measuring model prediction quality• Labeled data
• Holdout testing• Business metrics• Manual review
Can we come up with other metrics for evaluating machine learning models?
ML Service
ModelData
Predictions
95DataData ???
CONFIDENTIAL 2019 10
Defining Model Quality…
• Comparing with another model (the “surrogate”) • For example, a “simplified” model is used for better runtime performance. Run the
“full” model to monitor performance of “simplified” model
• Goodness of clustering• E.g., Silhouette index, prediction boundaries
• Quantify model “drift”• Model-specific metrics?• Meta-data about model internals and output
• Detections triggered by each feature• Number of other users a given user is associated with • Detection volume by attack type / category• Prediction score distribution• …
CONFIDENTIAL 2019 11
Our Approach
An anomaly detection problem• Assume model at deployment time is “good”
• Monitor for subsequent changes in model output
CONFIDENTIAL 2019 12
Time Series Decomposition
Seasonal decomposition of time series by Loess (STL)• Additive decomposition: Yt = Tt + St + Rt
References:• Cleveland et al. “STL: a seasonal-trend
decomposition” Journal of official statistics 6(1) pp. 3-73, 1990
• Vallis et al. "A Novel Technique for Long-Term Anomaly Detection in the Cloud" HotCloud,2014.
Trend (Tt)
Seasonal (St)
Residual (Rt)
CONFIDENTIAL 2019 13
Time Series Decomposition – with Trend Replacement
References:Hochenbaum et al. “Automatic Anomaly Detection in the Cloud Via Statistical Learning.” arXiv preprint arXiv:1704.07706, 2017
Residual (Rt) = Original (Yt) – Trend (Tt) – Seasonal (St)
CONFIDENTIAL 2019 14
Time Series Anomaly Detection: Residual Component
MAD statistic: Median absolute deviation• Median value of the difference between each data point and the median
• A robust measure of data variability
Anomalies: Distance to median > X * MAD
Median valueX*MAD
CONFIDENTIAL 2019 15
Breakouts: Gradual Changes
Could indicate subtle changes in model behavior
Apply heuristics on ”trend” component• Derivative of signal has the same sign over multiple,
consecutive data points
• Recent rate of change is large
CONFIDENTIAL 2019 16
Putting it Together
CONFIDENTIAL 2019 17
Investigating Anomalies
Group together “similar” time series• (Pearson) Correlation distance
• Clustering
• Rank time series in the same cluster as anomalous time series
CONFIDENTIAL 2019 18
Investigating Anomalies (cont’d)
• Multidimensional scaling (MDS) for plotting
CONFIDENTIAL 2019 19
Investigating Anomalies (cont’d)
Store results as Parquet filesLoad as PySpark dataframe (supports SQL-like operations)
Lowers technical barrier for debugging machine learning models
CONFIDENTIAL 2019 20
Lessons Learned
• Build resilient, modular pipelines• Decouple model decisions from data engineering• Decouple data from code, features from data
• Establish process for incident response, assign module owners• Encourage cross-team communication
• Explore alternative model evaluation metrics that fit your application/business goals
CONFIDENTIAL 2019
“By 2021, 50% of enterprises will have added unsupervised machine learning to their fraud detection solution suites. “
- Begin Investing Now in Enhanced Machine Learning Capabilities for Fraud Detection’, Gartner Research
22