View
160
Download
1
Embed Size (px)
Citation preview
Inside the mind of Sports and Energy Industry using Machine LearningIGOR ILIC
MICROSOFT DEVELOPMENT CENTER SERBIA
Microsoft Development Center SerbiaEstablished 2005
220 employees:◦ Software Engineers
◦ Data Scientists
◦ Labelers
◦ Designers
◦ PR, HR, Finance
Teams:◦ SQL
◦ Office
◦ Bing
◦ Handwriting recognition
◦ etc.
Known for hiring local “super-stars”, but focused on discovering new talents
Start-up within a corporation
Data Science at MDCSMicrosoft (corporation) invests heavily in data-driven culture
MDCS: follow -> lead the way
Several areas being worked on in several teams:◦ Machine Learning
◦ Time Series Analysis
◦ Hypothesis testing
◦ etc.
Expanding role – proving our value increases the team
Data Science at SQL TeamAzure – Microsoft’s cloud service
Azure SQL DB – service for storing SQL DBs on Azure
Box products – based on assumptions and communication with limited amount of customers
Cloud – possibilities of customer insight like never before
Insights transformed into models in order to fit SQL features to most customers
◦ 99% approach
Insights
Model
Private Preview
General Availability
Box
Do data science teams bring enough value?Recognizing a gap:
◦ Data Science for marketing (IT) or regulations (banks) purposes
◦ Apart from the trivial ones, insights/models not used on a fundamental level
Why is there a gap?◦ Independence/Separation of Data Science teams
◦ Working on complex models
◦ Not scalable
◦ Needs internal marketing to push through
◦ Policies don’t work
Worst-case ScenarioData Science certainly is the future, but…
Perceived Not Adding Value + Hiring Heavily = Bubble:◦ Is data science going to follow .com or smartphone industry path?
How are we adding value?Our solution – mixed teams:
◦ Building expert systems
◦ No data science specific projects
◦ Working to the smallest details with SEs
◦ Complete understanding between each other’s work and models
This leads to products with data scientists’ signatures all over them
Absolutely never do DS specific stuff? Of course not!◦ ML for targeted marketing
◦ AR + FFT for predictability
◦ Expert systems – root which brings freedom to experiment
Mapping to other companiesHow does it map to other companies/industries:
◦ Credit Sales with Quantitative Analysts (Finance)
◦ Management with Business Intelligence (Retail)
◦ Software Engineers with Data Science (IT)
Approach good at companies with mature data-driven culture
Company new to data science? Prove value!
Most effective way – Machine Learning
Most effective way of the most effective way – user-friendly Machine Learning tools
This leads us to… Azure ML
So, what is Azure ML?A fully managed cloud service that is designed to enable simple, user-friendly building, deploying, and sharing of predictive analytics solutions
Building ML based solutions using simple drag-and-drop interface - from idea to deployment in a matter of clicks
Includes hundreds of built-in packages and support for custom code
Where is ML applicable? Everywhere!Public High School• Low graduation rate• Predicting drop-out probability• Pointing to “areas of intervention” for each
student-at-risk (reading, math, psychologist)• 90% accuracy => graduation rate increased by
nearly 50%
Bank• Predicting default probability• Learning based on historical data• Tailor made offer/interest rate
Aircraft Engine Producer• Predict aircraft engine failure• Maintenance expensive, done on fixed time
periods• Optimize using ML• Sensors in the engine – used for learning• IoT + ML = predicting maintenance
Oil Company• Oil tank spills prevented by periodic pick-up
by trucks• Prediction of oil levels in tanks using time
derivative• Alerts -> pick-up by trucks only when needed
Sports industry case studyProblem definitionNFL team
Problem: Non-targeted marketing of seasonal ticket packages
Goal: Use Machine Learning to identify potential customers who are more likely to purchase the next season ticket package and maintain a good customer retention rate to keep loyalty customers
Why: Target marketing and tailor offers for customers recognized as having high probability of purchasing season tickets or current season ticket holders likely to change status in the following season
Business goal: Boost the overall ticket sales and revenues
Sports industry case studyPre-analysisMost customers will not change their purchasing plans
What are the characteristics of those customers who do change plans?◦ Single game ticket or ticket package – historical preference
◦ Purchasing behavioral patterns
◦ Attendance patterns
◦ Financial status
◦ Age
◦ Job
Data sources available:◦ Customer information
◦ Historical sales records
◦ Historical attendance records
◦ Analytics data from third-party
Sports industry case study Feature EngineeringCombine available data sources and characteristics from pre-analysis to define a full feature set needed to build a model for prediction (this includes labels used to train prediction models)
Define a raw dataset – data available directly from the aforementioned data sources
Extract as many features as possible from the raw dataset:◦ Gender◦ Occupation◦ Is season ticket holder?◦ etc.
Features unavailable in the raw dataset, but considered needed for the prediction models, are inferred from the raw data:
◦ Price paid distribution (min, max, mean, standard deviation)◦ Attendance count (for each of the remaining 31 NFL teams)◦ etc.
Over a 100 features in the final set
Sports industry case studyModelling
4-class classifier that used to indicate the purchasing plan changes
◦ Non-season ticket holder to season ticket holder
◦ Season ticket holder to non-season ticket holder
◦ Keep as non-season ticket holder
◦ Keep as season ticket holder
Binary classifier◦ Change current purchasing plan
◦ Not change current purchasing plan
Sports industry case studyModelling (cont.)Try out different classification algorithms to find the one with the best prediction power:
◦ Decision Tree
◦ Decision Forest
◦ Logistic Regression
◦ Neural Networks
◦ etc.
70% of the (historical) dataset used for training, 30% for testing
Models compared based on accuracy on test dataset – best performing 4-class and 2-class models compared on further statistics
2-class model better in terms of accuracy plus 4-class has some logical misses (Renew STH after Non-STH)
Boosted decision tree chosen
Sports industry case studySolution ArchitectureReading from Azure SQL DB
Feature Engineering using Python scripts
Modelling (learning) using Azure ML built-in algorithms
Results stored in Azure Storage
Azure components connected using Azure Data Factory
All connected with NFL team’s Marketing System where relevant decisions are made based on the results
Energy Industry case studyNew York power energy distributor
Analyzing real-time power energy consumption:
◦ Total/Average hourly demand
◦ On-Peak/Off-Peak demand
◦ Regional demand
◦ etc.
Predicting future power energy consumption with ML:
◦ Historical consumption data
◦ Predicted weather (temperature)
Actual and predicted demand with confidence interval
Historical Error Rate
Energy Industry Case Study Solution Architecture
Energy Consumption Data & Weather Data
(Public Source)
Event Hub Stores Streaming
Data
Stream Analytics processes events as they arrive in the EventHub
AML Model Web Service BES endpoint
Power BI Dashboard
Data for
Real-time Processing
Data Stream
Job
Hourly Prediction
Updates
External Data Azure Services
Send to Azure SQL
for batch predictions
Get Data
Azure WebJob
Runs jobs to get data
from public source
Azure SQLContains Historical Energy
Consumption & Weather Data
Real time
data stats
Azure Data Factory Pipeline invokes AML
Web Service
Real Tim
eB
atc
h
ConclusionMany data science strategy approaches based on many different factors:
◦ Expert systems
◦ Independent Data Science teams
◦ Using third-party vendor
◦ Experimentation with Data Science – most efficient by ML
ML can be used as proof of concept for value added by introducing data-driven culture
Azure ML as an example of simple implementation for experimentation with ML
Many success stories in widest range of areas:◦ Education
◦ Energy
◦ Finance
◦ Sports
◦ Aircraft Industry
◦ etc.