H2O Open Tour 2016, New York 1
Better Customer Experience with Data Science
(just add water)
Bernard Burg
Comcast
7/19/16
XFINITY TV
XFINITY Internet
XFINITY Voice
XFINITY Home
Digital & Other
Other
*Minority interest and/or non-controlling interest.
Slide is not comprehensive of all Comcast NBCUniversal assets
Updated: December 22, 2015
Complex Troubleshooting
• Failure scenario
– Customer orders a Video-on-Demand
– Transaction fails, customer care call initiated
• Consequences
– Unhappy customer: no visibility or opportunity to mitigate issue
– Potentially avoidable phone call
• Numerous potential reasons for failure
– Billing
– Resource unavailable
– Service issue
– Hardware issue (set-top box or router)
– Software issue
– Parental control settings
Analysis
• What brought the customer to this point?
– Call records
– Billing history
– Events generated by hardware
– Upstream outages
– Usage spikes
• What’s the best course of action now?
• How can we predict such issues?
Project Goals
Improve Customer Experience
• Keep our customers informed
• Empower our CARE agents
– Timely, accurate, complete information & context
– Smart recommendations
• Higher first call resolution
Maximize Efficiency
• Customer self service
– Fewer calls & truck rolls
• Self Assisted-healing equipment
Goal of Data Science
Each user’s set-top boxes can send 150+ different error codes, at any time.
Goal 1: predict if a user will call
Goal 2: predict why they call
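One way to turn the raw error stream into model inputs is to count, per user, how often each error code occurred in a recent time window. The sketch below assumes a hypothetical event format of (user_id, timestamp, error_code) tuples. The field names, the codes, and the 24-hour window are illustrative assumptions, not details of the Comcast pipeline.

```python
from collections import Counter, defaultdict


def error_count_features(events, window_start, window_end):
    """Count error codes per user for events with window_start <= ts < window_end."""
    features = defaultdict(Counter)
    for user_id, ts, code in events:
        if window_start <= ts < window_end:
            features[user_id][code] += 1
    return features


# Hypothetical events: (user_id, timestamp in hours, error_code)
events = [
    ("u1", 1, "E017"),
    ("u1", 2, "E017"),
    ("u1", 30, "E042"),  # falls outside the 24-hour window below
    ("u2", 5, "E101"),
]

features = error_count_features(events, window_start=0, window_end=24)
```

Each user's counter can then be expanded into one numeric column per error code before it is handed to a classifier.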
Predicting User Calls
Using Error Model Alone
Data science: Gradient Boosting Machine
Temporal model
66% accuracy
The algorithm reached a glass ceiling
[Figure: calls vs. no-calls]
Using Error + User Behavior Models
Data science: Gradient Boosting Machine
Temporal model
Behavior model
79% accuracy
[Figure: calls vs. no-calls]
Predicting Why Users Call
A Single Algorithm Predicting 10 Buckets
Data science: Gradient Boosting Machine
Temporal model
47% accuracy is not great but is about 5 times better than random
                  Spark ML                H2O
Accuracy          42%                     47%
Processing time   10 minutes              2 minutes
Memory            Limited size of test    No limit reached
Ease of use       Program dataFrame       UI
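The "about 5 times better than random" remark can be checked with simple arithmetic, assuming the random baseline is a uniform guess over the 10 buckets. The slides do not give the real class frequencies, so this is only an approximation.

```python
n_buckets = 10
random_accuracy = 1 / n_buckets  # uniform guess over 10 buckets: 0.10

h2o_accuracy = 0.47    # from the comparison table
spark_accuracy = 0.42  # from the comparison table

# Lift over the random baseline: how many times better than guessing
h2o_lift = h2o_accuracy / random_accuracy
spark_lift = spark_accuracy / random_accuracy

print(f"H2O lift: {h2o_lift:.1f}x, Spark ML lift: {spark_lift:.1f}x")
```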
Very easy to do in Sparkling Water: map the enum to n binary buckets
Predicting Why Users Call
10 Specialized Algorithms Predicting 10 Buckets
10 binary buckets
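Mapping an enum to n binary buckets amounts to building one 0/1 target column per bucket (a one-vs-rest encoding), so that each specialized algorithm learns a single bucket. The sketch below shows the idea in plain Python and is not the Sparkling Water API. The bucket names and labels are hypothetical, and only three buckets are used to keep the example short.

```python
def to_binary_buckets(labels, buckets):
    """Map one enum column to one 0/1 column per bucket (one-vs-rest targets)."""
    return {b: [1 if label == b else 0 for label in labels] for b in buckets}


buckets = ["B1", "B2", "B3"]       # hypothetical names; the talk uses 10 buckets
labels = ["B2", "B1", "B2", "B3"]  # hypothetical call reasons, one per call

binary = to_binary_buckets(labels, buckets)
for name, column in binary.items():
    print(name, column)
```

Each column then serves as the target of its own binary classifier.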
Predicting Why Users Call
10 Specialized Algorithms Predicting 10 Buckets
Data science: Gradient Boosting Machine
Temporal model
Predicting Why Users Call
Looks good but…
Data science: Gradient Boosting Machine
Temporal model
                  Spark ML                H2O
Accuracy          ?                       60%
Processing time   10 * 10 minutes         11 * 2 minutes
Memory            Limited size of test    No limit reached
Ease of use       Program dataFrame       UI
Why this drop from 95% to 60%?
Learning 10 Specialized Algorithms in H2O
Predicting Why Users Call
Overlapping Buckets
The hope raised by the 95% composite precision of the 10 binary algorithms did not materialize: overlapping classes cause elements to be misclassified, as shown in the ROC (Receiver Operating Characteristic) charts drawn by H2O.
[Figure: two ROC charts; x-axis: false positive, y-axis: true positive]
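One possible reading of the gap between the binary and the combined scores, offered as an illustration rather than as the speaker's analysis: when each one-vs-rest classifier is scored on its own, a single wrong bucket assignment counts as only 2 wrong answers out of 10 binary decisions, so the averaged binary accuracy stays high while the combined accuracy is much lower. The sketch below uses made-up labels for 10 hypothetical buckets B1..B10; the numbers are not the Comcast results.

```python
BUCKETS = [f"B{i}" for i in range(1, 11)]  # hypothetical bucket names


def multiclass_accuracy(true, pred):
    """Share of samples assigned to exactly the right bucket."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def mean_binary_accuracy(true, pred, buckets):
    """Average accuracy over all one-vs-rest 'is it bucket b?' decisions."""
    correct = 0
    for b in buckets:
        for t, p in zip(true, pred):
            correct += (t == b) == (p == b)
    return correct / (len(buckets) * len(true))


# Made-up data: 6 of 10 samples land in the right bucket, 4 do not.
true_labels = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B9", "B10"]
pred_labels = ["B1", "B2", "B3", "B4", "B5", "B6", "B2", "B2", "B3", "B3"]

multi = multiclass_accuracy(true_labels, pred_labels)
binary = mean_binary_accuracy(true_labels, pred_labels, BUCKETS)
print(f"multiclass: {multi:.2f}, mean binary: {binary:.2f}")
```

On this toy data the same predictions score 0.60 as a 10-bucket classifier and 0.92 when averaged over the binary decisions.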
Forecasting Improvements with H2O
• Hypothesis case 1: B2: billing can be predicted with 100% accuracy
• The overall prediction model would jump to 75% accuracy
Replace Estimation by result
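"Replace estimation by result" can be read as a what-if analysis: for the bucket assumed to be perfectly predictable, swap the model's prediction for the known answer and recompute the overall accuracy. The sketch below is one minimal way to do that, under the assumption that only samples whose true bucket is in the fixed set are corrected; samples wrongly predicted into a fixed bucket are left unchanged. The labels are made up and the resulting figures are not the ones from the talk.

```python
def accuracy(true, pred):
    """Share of samples assigned to the right bucket."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def what_if_accuracy(true, pred, fixed_buckets):
    """Accuracy if every sample whose true bucket is in fixed_buckets were predicted correctly."""
    patched = [t if t in fixed_buckets else p for t, p in zip(true, pred)]
    return accuracy(true, patched)


# Made-up labels; B2 and B3 play the role of the overlapping buckets.
true_labels = ["B1", "B2", "B2", "B2", "B3", "B3", "B4", "B5", "B2", "B3"]
pred_labels = ["B1", "B3", "B2", "B3", "B3", "B2", "B4", "B5", "B3", "B3"]

baseline = accuracy(true_labels, pred_labels)
fix_one = what_if_accuracy(true_labels, pred_labels, {"B2"})
fix_both = what_if_accuracy(true_labels, pred_labels, {"B2", "B3"})
print(baseline, fix_one, fix_both)
```

Running the same substitution per bucket ranks the buckets by how much the overall accuracy would gain if each one were fixed.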
Forecasting Improvements
• By fixing one of the problematic buckets:
• The overall prediction model would jump to 75% accuracy
• By fixing both problematic buckets:
• The overall prediction model would jump to 86% accuracy
These simple forecasts are worth gold, as they allow us to focus on the essentials (out of thousands of parameters)
Conclusion
The choice to switch to H2O was simple
• Superior results (accuracy)
• Faster algorithms (factor 3)
• Better use of memory
• Accelerated studies because of
– Input UI for selecting/deselecting columns
– Very smart output UI (ROC, influential parameters…)
• Stable and reliable algorithms
Room for improvement:
• Sparkling Water interface showed some instabilities
• We worked around it by generating CSV files