29
Inside the mind of Sports and Energy Industry using Machine Learning IGOR ILIC MICROSOFT DEVELOPMENT CENTER SERBIA

Inside the mind of Sports and Energy Industry through Machine Learning - Igor Ilić

Embed Size (px)

Citation preview

Inside the mind of Sports and Energy Industry using Machine LearningIGOR ILIC

MICROSOFT DEVELOPMENT CENTER SERBIA

Microsoft Development Center SerbiaEstablished 2005

220 employees:◦ Software Engineers

◦ Data Scientists

◦ Labelers

◦ Designers

◦ PR, HR, Finance

Teams:◦ SQL

◦ Office

◦ Bing

◦ Handwriting recognition

◦ etc.

Known for hiring local “super-stars”, but focused on discovering new talents

Start-up within a corporation

Data Science at MDCSMicrosoft (corporation) invests heavily in data-driven culture

MDCS: follow -> lead the way

Several areas being worked on in several teams:◦ Machine Learning

◦ Time Series Analysis

◦ Hypothesis testing

◦ etc.

Expanding role – proving our value increases the team

Data Science at SQL TeamAzure – Microsoft’s cloud service

Azure SQL DB – service for storing SQL DBs on Azure

Box products – based on assumptions and communication with limited amount of customers

Cloud – possibilities of customer insight like never before

Insights transformed into models in order to fit SQL features to most customers

◦ 99% approach

Insights

Model

Private Preview

General Availability

Box

Before we proceed to Data Science team’s strategy, one small question

What would happen to your company if it were to lose all data

scientists?

Do data science teams bring enough value?Recognizing a gap:

◦ Data Science for marketing (IT) or regulations (banks) purposes

◦ Apart from the trivial ones, insights/models not used on a fundamental level

Why is there a gap?◦ Independence/Separation of Data Science teams

◦ Working on complex models

◦ Not scalable

◦ Needs internal marketing to push through

◦ Policies don’t work

Worst-case ScenarioData Science certainly is the future, but…

Perceived Not Adding Value + Hiring Heavily = Bubble:◦ Is data science going to follow .com or smartphone industry path?

How are we adding value?Our solution – mixed teams:

◦ Building expert systems

◦ No data science specific projects

◦ Working to the smallest details with SEs

◦ Complete understanding between each other’s work and models

This leads to products with data scientists’ signatures all over them

Absolutely never do DS specific stuff? Of course not!◦ ML for targeted marketing

◦ AR + FFT for predictability

◦ Expert systems – root which brings freedom to experiment

Mapping to other companiesHow does it map to other companies/industries:

◦ Credit Sales with Quantitative Analysts (Finance)

◦ Management with Business Intelligence (Retail)

◦ Software Engineers with Data Science (IT)

Approach good at companies with mature data-driven culture

Company new to data science? Prove value!

Most effective way – Machine Learning

Most effective way of the most effective way – user-friendly Machine Learning tools

This leads us to… Azure ML

Before we continue…

This talk is not an ad for Azure ML

This talk is an ad for ML

Because

The more companies use ML

=> The more companies use Azure ML

So, what is Azure ML?A fully managed cloud service that is designed to enable simple, user-friendly building, deploying, and sharing of predictive analytics solutions

Building ML based solutions using simple drag-and-drop interface - from idea to deployment in a matter of clicks

Includes hundreds of built-in packages and support for custom code

Where is ML applicable? Everywhere!Public High School• Low graduation rate• Predicting drop-out probability• Pointing to “areas of intervention” for each

student-at-risk (reading, math, psychologist)• 90% accuracy => graduation rate increased by

nearly 50%

Bank• Predicting default probability• Learning based on historical data• Tailor made offer/interest rate

Aircraft Engine Producer• Predict aircraft engine failure• Maintenance expensive, done on fixed time

periods• Optimize using ML• Sensors in the engine – used for learning• IoT + ML = predicting maintenance

Oil Company• Oil tank spills prevented by periodic pick-up

by trucks• Prediction of oil levels in tanks using time

derivative• Alerts -> pick-up by trucks only when needed

Sports industry case studyProblem definitionNFL team

Problem: Non-targeted marketing of seasonal ticket packages

Goal: Use Machine Learning to identify potential customers who are more likely to purchase the next season ticket package and maintain a good customer retention rate to keep loyalty customers

Why: Target marketing and tailor offers for customers recognized as having high probability of purchasing season tickets or current season ticket holders likely to change status in the following season

Business goal: Boost the overall ticket sales and revenues

Sports industry case studyPre-analysisMost customers will not change their purchasing plans

What are the characteristics of those customers who do change plans?◦ Single game ticket or ticket package – historical preference

◦ Purchasing behavioral patterns

◦ Attendance patterns

◦ Financial status

◦ Age

◦ Job

Data sources available:◦ Customer information

◦ Historical sales records

◦ Historical attendance records

◦ Analytics data from third-party

Sports industry case study Feature EngineeringCombine available data sources and characteristics from pre-analysis to define a full feature set needed to build a model for prediction (this includes labels used to train prediction models)

Define a raw dataset – data available directly from the aforementioned data sources

Extract as many features as possible from the raw dataset:◦ Gender◦ Occupation◦ Is season ticket holder?◦ etc.

Features unavailable in the raw dataset, but considered needed for the prediction models, are inferred from the raw data:

◦ Price paid distribution (min, max, mean, standard deviation)◦ Attendance count (for each of the remaining 31 NFL teams)◦ etc.

Over a 100 features in the final set

Sports industry case studyModelling

4-class classifier that used to indicate the purchasing plan changes

◦ Non-season ticket holder to season ticket holder

◦ Season ticket holder to non-season ticket holder

◦ Keep as non-season ticket holder

◦ Keep as season ticket holder

Binary classifier◦ Change current purchasing plan

◦ Not change current purchasing plan

Sports industry case studyModelling (cont.)Try out different classification algorithms to find the one with the best prediction power:

◦ Decision Tree

◦ Decision Forest

◦ Logistic Regression

◦ Neural Networks

◦ etc.

70% of the (historical) dataset used for training, 30% for testing

Models compared based on accuracy on test dataset – best performing 4-class and 2-class models compared on further statistics

2-class model better in terms of accuracy plus 4-class has some logical misses (Renew STH after Non-STH)

Boosted decision tree chosen

Sports industry case studySolution ArchitectureReading from Azure SQL DB

Feature Engineering using Python scripts

Modelling (learning) using Azure ML built-in algorithms

Results stored in Azure Storage

Azure components connected using Azure Data Factory

All connected with NFL team’s Marketing System where relevant decisions are made based on the results

Energy Industry case studyNew York power energy distributor

Analyzing real-time power energy consumption:

◦ Total/Average hourly demand

◦ On-Peak/Off-Peak demand

◦ Regional demand

◦ etc.

Predicting future power energy consumption with ML:

◦ Historical consumption data

◦ Predicted weather (temperature)

Actual and predicted demand with confidence interval

Historical Error Rate

Energy Industry Case Study Solution Architecture

Energy Consumption Data & Weather Data

(Public Source)

Event Hub Stores Streaming

Data

Stream Analytics processes events as they arrive in the EventHub

AML Model Web Service BES endpoint

Power BI Dashboard

Data for

Real-time Processing

Data Stream

Job

Hourly Prediction

Updates

External Data Azure Services

Send to Azure SQL

for batch predictions

Get Data

Azure WebJob

Runs jobs to get data

from public source

Azure SQLContains Historical Energy

Consumption & Weather Data

Real time

data stats

Azure Data Factory Pipeline invokes AML

Web Service

Real Tim

eB

atc

h

ConclusionMany data science strategy approaches based on many different factors:

◦ Expert systems

◦ Independent Data Science teams

◦ Using third-party vendor

◦ Experimentation with Data Science – most efficient by ML

ML can be used as proof of concept for value added by introducing data-driven culture

Azure ML as an example of simple implementation for experimentation with ML

Many success stories in widest range of areas:◦ Education

◦ Energy

◦ Finance

◦ Sports

◦ Aircraft Industry

◦ etc.

Thank you!

Q & A