© 2017 MapR Technologies 1
Machine Learning
Comparison and Evaluation
Contact Information
Ted Dunning, PhD
Chief Application Architect, MapR Technologies
Board Member, Apache Software Foundation
O’Reilly author
Email [email protected] [email protected]
Twitter @ted_dunning
Machine Learning Everywhere
Image courtesy Mtell, used with permission. Images © Ellen Friedman.
[Diagram: rendezvous architecture. Labels: Raw, Input, Features / profiles, m1, m2, m3, Decoy, Archive, Scores, Rendezvous, Results, Metrics]
Let’s talk about how the rendezvous architecture makes
evaluation easier
Decoy Model in the Rendezvous Architecture
[Diagram labels: Input, Decoy, Model 2, Model 3, Scores, Archive]
• Looks like a server, but it just archives inputs
• Safe in a good streaming environment, less safe without good isolation
Other Data Collected in Rendezvous
• Request ID + Input data
• All output scores
• Evaluation latency
• Round trip latency
• Rendezvous choices
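One way to picture a record that holds the items listed above is a small data class. This is only a sketch: the class name, field names, and millisecond units are assumptions for illustration, not a schema given in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RendezvousRecord:
    """Hypothetical per-request log record (all names are assumptions)."""
    request_id: str
    input_data: dict
    # model name -> score returned by that model
    scores: Dict[str, float] = field(default_factory=dict)
    # model name -> time that model took to evaluate the request
    eval_latency_ms: Dict[str, float] = field(default_factory=dict)
    # latency as seen by the caller
    round_trip_ms: Optional[float] = None
    # which model's answer the rendezvous server returned
    chosen_model: Optional[str] = None


record = RendezvousRecord(request_id="r-001", input_data={"amount": 12.5})
record.scores["m1"] = 0.91
record.eval_latency_ms["m1"] = 3.2
record.chosen_model = "m1"
record.round_trip_ms = 5.0
```

Keeping every model's score in the same record as the request ID is what makes the later comparisons possible without re-running anything.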
Direct Model Comparison
• Don’t need ground truth to compare models at a gross level
• For uncalibrated models, score quantiles are useful
• For mature models, most results will be very similar
– Large differences from known good models cannot be good
• Ultimately, ground truth is important
– But only for cases where scores differ significantly
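The quantile idea above can be sketched in a few lines. This is an illustrative sketch, not code from the talk: the function names, the 0.3 threshold, and the sample scores are all assumptions. Each model's scores are mapped to their own empirical quantiles so that different scales drop out, and only requests where the quantiles disagree strongly are flagged as candidates for ground-truth checking.

```python
from bisect import bisect_right


def to_quantiles(scores):
    """Map each score to its empirical quantile in (0, 1] within its own model."""
    ordered = sorted(scores)
    n = len(ordered)
    return [bisect_right(ordered, s) / n for s in scores]


def large_disagreements(scores_a, scores_b, threshold=0.3):
    """Indices of requests whose quantiles differ by more than threshold."""
    qa, qb = to_quantiles(scores_a), to_quantiles(scores_b)
    return [i for i, (a, b) in enumerate(zip(qa, qb)) if abs(a - b) > threshold]


# Two uncalibrated models on very different scales (made-up scores).
champion = [0.1, 0.4, 0.2, 0.9, 0.7]
challenger = [10.0, 45.0, 90.0, 95.0, 60.0]  # request 2 is ranked very differently
flagged = large_disagreements(champion, challenger)
```

Here only request 2 is flagged, so that is the one case where ground truth would be worth chasing.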
Direct Model Differencing
[Figure: two panels, "Raw Scores" (axis ticks from −2 to 4 and from 0 to 6) and "Q-Q plot" (both axes from 0.0 to 1.0)]
• Scales may differ radically
• Quantiles correct scaling
• Perfect match on high scores
Reject Inferencing
• Today’s model selects tomorrow’s training data
• Safe decisions often prevent data collection
– Fraud flag prevents the transaction
– Recommendation ranking has the same effect
• The model winds up confirming what it already knows
• Model comparison has the same problem
– Champion says reject, challenger says retain
Reject Inferencing Solution
• We must balance EXPLORATION
– Calling a bluff to look at ground truth
• Versus EXPLOITATION
– Doing what we think is right
• Exploration costs us because we make worse decisions
– But it can help make better decisions later
• Exploitation costs us because we don’t learn better answers
– But it is the best we know now
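A minimal sketch of this trade-off for the fraud-flag case: most of the time the model's reject decision is followed (exploitation), but a small random fraction of would-be rejects is let through so that ground truth can be observed (exploration). The function name, the 0.5 threshold, and the 5% exploration rate are assumptions chosen for illustration.

```python
import random


def decide(score, threshold=0.5, explore_rate=0.05, rng=random):
    """Return (action, explored); scores above threshold would normally be rejected."""
    if score <= threshold:
        return "accept", False
    if rng.random() < explore_rate:
        return "accept", True   # exploration: call the bluff, observe ground truth
    return "reject", False      # exploitation: do what we think is right


rng = random.Random(42)
decisions = [decide(0.9, rng=rng) for _ in range(10_000)]
explored = sum(1 for _, was_explored in decisions if was_explored)
```

Recording the `explored` flag matters: those are the cases whose outcomes are unbiased by the model's own decision.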
Multi-Armed Bandits
• Classic formulation for explore/exploit trade-offs
• Thompson sampling is a very good option
• Simple dithering may be good enough
• Key intuition is that we don’t need to perfectly characterize
losers … once we know they are losers, we don’t care
• Variant for ranking also good for model evaluation
– Also used to rank reddit comments
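As a rough illustration of Thompson sampling, here is a Beta-Bernoulli sketch: each arm (for example, each model) keeps success and failure counts, a plausible success rate is sampled from each arm's posterior, and the arm with the highest sample is played. The uniform prior, the true rates, and the function names are assumptions, not details from the talk.

```python
import random


def thompson_pick(successes, failures, rng=random):
    """Sample each arm's Beta posterior (uniform prior) and pick the largest sample."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(true_rates, rounds=5000, seed=1):
    """Run the bandit against made-up true rates; return pulls per arm."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = thompson_pick(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


pulls = simulate([0.05, 0.10, 0.20])
```

The losing arms end up with few pulls and therefore wide posteriors, which is the point made above: once an arm is clearly a loser, there is no need to characterize it precisely.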
Some Warnings
• Bad models can be good explorers
• That can make other models look better
• Offline evaluation is fine, but you don’t know what would have
happened … real innovation has high error bars
• Where models all agree, we learn nothing
• In the end, it is differences that matter the most
Having complete and precise history is golden for
offline comparisons
Allowing the rendezvous server
to do Thompson sampling is
even better
Change Detection
• Model comparison is all fine and good until the world changes
• And the world will change
• One of the most sensitive indicators is score distribution for a
good model
– T-digest is very effective for sketching distributions, especially in tails
– Compare current vs historical distribution using a Q-Q plot or the Kolmogorov-Smirnov (KS) test
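The KS comparison can be sketched directly on raw score samples. This is a simplified illustration: it keeps all samples in memory, whereas a production system would more likely compare sketches such as t-digests. The function name and the synthetic score data are assumptions.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


historical = [i / 100 for i in range(100)]         # scores spread over [0, 1)
similar = [i / 100 + 0.005 for i in range(100)]    # nearly the same distribution
shifted = [min(1.0, s + 0.3) for s in historical]  # the world changed

stable_gap = ks_statistic(historical, similar)
changed_gap = ks_statistic(historical, shifted)
```

A small statistic means the current score distribution still looks like history; a large one is a signal to investigate before trusting any model comparison.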
Analyzing latencies
Hotel Room Latencies
• These are ping latencies from my hotel
• Looks pretty good, right?
• But what about longer term?
208.302 198.571 185.099 191.258 201.392
214.738 197.389 187.749 201.693 186.762
185.296 186.390 183.960 188.060 190.763
> mean(y$t[i])
[1] 198.6047
> sd(y$t[i])
[1] 71.43965
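A quick check on the fifteen values printed on the slide hints at the problem. The sketch below assumes that the R output summarizes a longer series (`y$t[i]`) than the fifteen samples shown; under that assumption, a standard deviation far above what the visible samples produce suggests that the longer series contains large outliers.

```python
from statistics import mean, stdev

# The fifteen ping latencies printed on the slide.
visible = [
    208.302, 198.571, 185.099, 191.258, 201.392,
    214.738, 197.389, 187.749, 201.693, 186.762,
    185.296, 186.390, 183.960, 188.060, 190.763,
]

visible_mean = mean(visible)
visible_sd = stdev(visible)
reported_sd = 71.43965  # sd from the R output on the slide

print(f"visible mean={visible_mean:.3f} sd={visible_sd:.3f} reported sd={reported_sd}")
```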
Not So Fast …
This is long-tailed land
You have to know the distribution
of values
A single number
is simply not enough
And this histogram is hard to read
Idea – Exponential Bins
• Suppose we want relative accuracy in measurement space
• Latencies are positive and only matter within a few percent
– 1.1 ms versus 1.0 ms
– 1100 ms versus 1000 ms
• We can cheat by using floating point representations
– Compute bin using magic
– Adjust bins slightly using more magic
– Count
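Before the floating-point trick, it may help to see exponential bins written with an explicit logarithm. This is a sketch under assumed parameters: `bins_per_decade=100` is chosen here so that each bin is about 2.3% wide, and the function names and sample latencies are made up for illustration.

```python
import math
from collections import Counter


def exp_bin(x, bins_per_decade=100):
    """Index of the exponentially sized bin that contains the positive value x."""
    return math.floor(math.log10(x) * bins_per_decade)


def bin_lower_edge(index, bins_per_decade=100):
    """Lower edge of a bin; consecutive edges differ by a constant ratio."""
    return 10 ** (index / bins_per_decade)


# Made-up latencies in ms, spanning three orders of magnitude.
latencies = [1.05, 1.5, 185.3, 208.3, 1050.0, 1500.0]
counts = Counter(exp_bin(t) for t in latencies)
```

Because the edges grow geometrically, 1.5 ms versus 1.05 ms is resolved with the same relative precision as 1500 ms versus 1050 ms.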
FloatHistogram
• Assume all measurements are in the range
• Divide this range into power of 2 sub-ranges
• Sub-divide each sub-range evenly with steps
– is typical
• Relative error is bounded in measurement space
• Bin index can be computed using FP representation!
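The last bullet can be illustrated with the IEEE 754 bit pattern of a double. This is a sketch of the general idea, not the actual FloatHistogram code: the function name, `min_value`, and `mantissa_bits=3` (eight steps per power of 2) are assumptions. Keeping the exponent plus the top few mantissa bits yields an integer that increases with the value and splits every power-of-2 sub-range into equal steps.

```python
import struct


def fp_bin(x, min_value=1e-3, mantissa_bits=3):
    """Bin index for x >= min_value > 0, read from the double's bit pattern.

    Each power-of-2 sub-range is split into 2**mantissa_bits equal-width bins.
    """
    def top_bits(v):
        # Reinterpret the 64-bit double as a signed integer, then keep the
        # sign/exponent bits and the leading mantissa bits.
        (raw,) = struct.unpack("<q", struct.pack("<d", v))
        return raw >> (52 - mantissa_bits)

    return top_bits(x) - top_bits(min_value)


# With min_value=1.0, the range [1, 2) maps to bins 0..7 and [2, 4) to bins 8..15.
indices = [fp_bin(v, 1.0) for v in (1.0, 1.49, 1.5, 2.0, 4.0)]
```

With eight steps per power of 2, each bin is between 1/16 and 1/8 of its lower edge wide, which is the bounded relative error mentioned above.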
What about visualization?
Can’t see small count bars
Good Results
Bad Results – 1% of measurements are 3x bigger
Uniform Bins
FloatHistogram Bins
With FloatHistogram
Sign Up for Next Workshop in the MLL Series
by Ted Dunning, Chief Applications Architect at MapR:
Machine Learning in the Enterprise:
How to do model management in production
http://bit.ly/mapr-machine-learning-logistics-series
Additional Resources
O’Reilly report by Ted Dunning & Ellen Friedman © March 2017
Read free courtesy of MapR:
https://mapr.com/geo-distribution-big-data-and-analytics/
O’Reilly book by Ted Dunning & Ellen Friedman
© March 2016
Read free courtesy of MapR:
https://mapr.com/streaming-architecture-using-apache-kafka-mapr-streams/
Additional Resources
O’Reilly book by Ted Dunning & Ellen Friedman
© June 2014
Read free courtesy of MapR:
https://mapr.com/practical-machine-learning-new-look-anomaly-detection/
O’Reilly book by Ellen Friedman & Ted Dunning
© February 2014
Read free courtesy of MapR:
https://mapr.com/practical-machine-learning/
Additional Resources
by Ellen Friedman 8 Aug 2017 on MapR blog:
https://mapr.com/blog/tensorflow-mxnet-caffe-h2o-which-ml-best/
Interview by Thor Olavsrud in CIO:
https://www.cio.com.au/article/630299/what-dataops-collaborative-cross-functional-analytics/?fp=16&fpid=1
Read more in new book on model management:
New O’Reilly book by Ted Dunning & Ellen Friedman © September 2017
Download free pdf courtesy of MapR:
https://mapr.com/ebook/machine-learning-logistics/
Please support women in tech – help build
girls’ dreams of what they can accomplish
© Ellen Friedman 2015 #womenintech #datawomen
Q&A
@mapr
Maprtechnologies
ENGAGE WITH US
@ted_dunning