Business Analytics and Optimization Introduction (part 2)

Business Analytics and Optimization: A Technical Introduction (Part 2)

Oleksandr Romanko, Ph.D.

Senior Research Analyst, Risk Analytics – Business Analytics, IBM

Adjunct Professor, University of Toronto

Toronto SMAC Meetup, October 9, 2014

© 2014 IBM Corporation

Business Analytics


Descriptive Analytics: What has happened?

Predictive Analytics: What will happen?

Prescriptive Analytics: What should we do?

What is analytics?

Data → (Analyze) → Insight → (Decide) → Action → Business Value


Analytics is the scientific process of deriving insights from data in order to make decisions


Business Analytics Education


IBM Academic Initiative program

Cognos SPSS ILOG


Business Analytics programs – curriculum

Applied Statistics and Probability

Fundamentals of Computational Mathematics

Data Mining and Knowledge Discovery

Simulation Modelling

Optimization

Financial Decision Making

Computational Methods for Business Data Analysis

Computational Finance and Risk Management

Visual Analytics and Knowledge Representation

Mathematical Modelling for Business

Machine Learning, Cognitive Computing and Artificial Intelligence

Marketing Analytics

Strategies for Managing Innovations

Analytics of Web, Social Networks and Business News


Applied Statistics


What kind of data are we dealing with?

Types of data

• Quantitative

• Categorical (ordered, unordered)

Data collection

• Independent observations (one observation per subject)

• Dependent observations (repeated observations of the same subject, relationships within groups, relationships over time or space)

Type of data drives the direction of your analysis

• How to plot

• How to summarize

• How to draw inferences and conclusions

• How to issue predictions



Quantitative data

Examples: temperature, age, income

Quick check: “Does it make sense to calculate an average?”

Appropriate summary statistics:

– Mean and Median

– Standard Deviation

– Percentiles

More advanced predictive methods: Regression, Time Series Analysis, …

Plot your data!



Summarizing quantitative data

One-number summaries

– Mean

Average, obtained by summing all observations and dividing by the number of observations.

– Median

The center value, below and above which you will find 50% of the observations.

Summarizing your data with one number may not tell the whole story:

[Figure: three example distributions, labelled Median = 19.8, Median = 19.8 and Median = 10.5]
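A minimal sketch of why a one-number summary can mislead. The two samples below are made up for illustration and are not the data behind the figure; they share a median of 19.8 but differ widely in spread.

```python
from statistics import mean, median, pstdev

# Hypothetical samples: same median, very different spread.
tight = [19.0, 19.5, 19.8, 20.1, 20.6]
wide = [5.0, 12.0, 19.8, 27.0, 41.0]

for name, sample in (("tight", tight), ("wide", wide)):
    # Report median, mean and (population) standard deviation side by side.
    print(name, median(sample), round(mean(sample), 2), round(pstdev(sample), 2))
```

Reporting a measure of spread next to the median is what separates the two samples here.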


Standard deviation

“Most observations fall within ±2 standard deviations of the mean.”

If the data is normally distributed, about 95% of observations fall within ±2 standard deviations of the mean.

Standard Deviation = 4.2

~95% of observations between 11.4 and 28.2
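The interval above can be checked with a few lines of arithmetic. This sketch assumes a mean of 19.8, which is inferred here as the midpoint of 11.4 and 28.2 rather than stated on the slide, and uses the standard library's normal distribution to confirm the "about 95%" figure.

```python
from statistics import NormalDist

mean = 19.8  # assumption: midpoint of the interval 11.4 to 28.2
sd = 4.2     # standard deviation given on the slide

low, high = mean - 2 * sd, mean + 2 * sd
dist = NormalDist(mu=mean, sigma=sd)
# Probability mass of a normal distribution within two standard deviations.
coverage = dist.cdf(high) - dist.cdf(low)

print(f"{low:.1f} to {high:.1f}, coverage {coverage:.4f}")
# → 11.4 to 28.2, coverage 0.9545
```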


Distributions: Normal distribution


Distributions

Estimate of the probability distribution of global mean temperature resulting from a doubling of CO2 relative to its pre-industrial value, made from 100,000 simulations


Modeling


Models



Simplified representation or abstraction of reality

Capture essence of system without unnecessary details

Models tailored for specific types of problems

Models help us understand the world

– Prediction (What if?)

– Optimization (What’s best?)

Models are often much easier, faster, and cheaper to experiment with than the real system


Models and reality

[Diagram: Problem, Model, Calculations, Interpretation and Decisions, linking the “real” world and the analyst’s world]

The model is a simplified abstraction of reality that captures the essence of the problem

From Monahan, G., “Management Decision Making”, Cambridge University Press, 2000


Environmental risk management


Predictive maintenance

Wind turbines are big and expensive machines, so keeping them running smoothly helps keep their operational costs down. The sensor data generated by the turbine can help achieve this: by analysing it, you can spot potential failures earlier. The longer the warning period before a part fails, the better you can prepare for it.

To do that, you need to be able to anticipate failures in heavy and expensive parts like the gearbox, generator and main shaft.

Preventive maintenance saves money:

– Shorter downtime and less lost production

– Better planning of people and materials

– Cheaper repairs

Source: Algoritmica, http://www.algoritmica.nl


Predictive maintenance – how it works

Wind turbines have an array of sensors that measure temperatures, pressures, voltages, currents, and blade angles. This data is available for analysis, typically as 10-minute averages of the sensor values.

The computer that controls the turbine uses these measurements for its operations. This includes error thresholds like ‘the gearbox oil temperature should be below 120 degrees Celsius’. However, by the time the threshold is exceeded it is usually too late: the damage has already been done. To catch failures earlier we should look for anomalies, i.e. measurements that are unexpected and therefore might indicate a problem, but are not yet so severe that they exceed a threshold.



Predictive maintenance – anomaly detection

Anomaly detection begins by defining what measurement values are expected and then calculating the difference with the actual situation. Since sensor data is delivered as a time series, we create a model that predicts the next value of a specific sensor given its previous values as well as the previous values of any other sensors that may be relevant. Based on these multiple inputs, the model then calculates its predicted value and compares it with the actual sensor reading. The difference (or residual) is now a measure of how much the turbine is deviating from its expected performance. If it is persistent or grows too large (i.e. becomes an anomaly), an analyst can investigate the cause and decide on a course of action together with the operations staff at the wind farm.
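The residual idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the approach used in the cited source: the sensor series is synthetic, the model predicts a reading only from the same sensor's previous reading (no other sensors), and the 3-standard-deviation limit is an arbitrary choice. It checks only for large residuals, not persistent ones.

```python
import random
from statistics import mean, pstdev

def fit_ar1(series):
    """Least-squares fit of x[t] ~ a + b * x[t-1]."""
    x, y = series[:-1], series[1:]
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def residuals(series, a, b):
    """Actual reading minus predicted reading, for t = 1 .. len(series) - 1."""
    return [series[t] - (a + b * series[t - 1]) for t in range(1, len(series))]

rng = random.Random(0)

def simulate(n, level=60.0):
    """Synthetic 'oil temperature' series that hovers around a fixed level."""
    values = [level]
    for _ in range(n - 1):
        values.append(level + 0.8 * (values[-1] - level) + rng.gauss(0, 0.5))
    return values

# Learn expected behaviour from healthy data, then set a residual limit.
train = simulate(1000)
a, b = fit_ar1(train)
limit = 3 * pstdev(residuals(train, a, b))

# New data with one injected jump that stays far below a 120-degree threshold.
live = simulate(200)
live[150] += 5.0
flags = [t for t, r in enumerate(residuals(live, a, b), start=1) if abs(r) > limit]
print(flags)
```

The injected jump is flagged even though the reading stays near 65 degrees, which a fixed threshold of 120 degrees would never notice.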



Predictive maintenance – machine learning model

To create such a sensor model we apply machine learning or data mining, i.e. one or more algorithms that use a set of examples (the ‘training set’) to learn a predictive model. For a wind turbine, it is a natural fit to use a year of sensor data as the training set so that all seasonal variations are included.



Predictive maintenance – driven by data

This is a data-driven approach: the model learns the relationship between the various sensor readings purely based on the training data. This is in contrast to a so-called physical model that explicitly describes the turbine design using detailed knowledge of its physical characteristics.

The main advantage of a data-driven approach is that the model can be trained by someone who is not a turbine expert and matches the actual situation by definition, whereas a physical model has to be carefully calibrated by an expert.



Simulation – Business Case Study


Case study – optimal store design

Study environmental impact of restaurant operations

Restaurant

– order types and probabilities

– processing times (fixed portion and variable portion)

– design alternatives

Drive Through

– number of service windows

– queuing capacity

Parking Lot

– parking capacity

– customer prioritization

Goals:

– maximize customer satisfaction (high customer service level)

– minimize environmental impact (quantity of emissions)
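A rough sketch of how such a simulation can compare design alternatives. Every number here is hypothetical (arrival rate, service times, capacities), and the model is a deliberate simplification: it treats service windows as interchangeable parallel servers and uses time spent in the lane as a stand-in for idling emissions. It is not the model used in the case study.

```python
import heapq
import random

def simulate_lane(windows, queue_capacity, n_customers=5000, seed=1):
    """First-come-first-served lane with parallel windows and limited queue space.

    Returns (share of arriving customers served, mean minutes spent in the lane).
    """
    rng = random.Random(seed)
    clock = 0.0
    free_at = [0.0] * windows  # min-heap: time at which each window is next free
    in_lane = []               # min-heap: departure times of cars still in the lane
    served = 0
    minutes_in_lane = 0.0
    for _ in range(n_customers):
        clock += rng.expovariate(1.0)  # hypothetical: one arrival per minute on average
        while in_lane and in_lane[0] <= clock:
            heapq.heappop(in_lane)     # cars that have already left
        if len(in_lane) >= windows + queue_capacity:
            continue                   # lane is full: this customer is turned away
        # Processing time = fixed portion + variable portion (both hypothetical).
        service = 1.0 + rng.expovariate(2.0)
        start = max(clock, heapq.heappop(free_at))
        finish = start + service
        heapq.heappush(free_at, finish)
        heapq.heappush(in_lane, finish)
        served += 1
        minutes_in_lane += finish - clock
    return served / n_customers, minutes_in_lane / served

for windows in (1, 2, 3):
    share, minutes = simulate_lane(windows, queue_capacity=5)
    print(windows, round(share, 3), round(minutes, 2))
```

Running each design alternative through the same simulation yields one (service level, time in lane) pair per alternative, which is the kind of data the result plots compare.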


Problem description


Restaurant operations




Problems with the standard design

Most of the variable portion of the emissions is generated at the drive through lane

Customers should be encouraged to park their cars and enter the restaurant

Drive through customers should be served as fast as possible

Results – key indicators

[Figure annotations: “less than 12 minutes”; “waiting for more than 1 minute to enter”]

Simulation results


Emissions vs. Customer Satisfaction: Data Points

[Plot: customer satisfaction (%), axis from 94 to 100, against emissions (kg/week), axis from 35 to 95]

Emissions vs. Customer Satisfaction: Data Points and Efficient Frontier

[Plot: customer satisfaction (%), axis from 94 to 100, against emissions (kg/week), axis from 35 to 95]
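An efficient frontier of this kind can be extracted from simulated alternatives with a single sorted sweep. The sketch below uses made-up (emissions, satisfaction) pairs, not values read from the plots: a design is kept only if no other design has lower-or-equal emissions together with higher-or-equal satisfaction.

```python
def efficient_frontier(points):
    """Return the non-dominated (emissions, satisfaction) pairs.

    Emissions are minimized and satisfaction is maximized.
    """
    frontier = []
    best_satisfaction = float("-inf")
    # Sweep from lowest emissions upward; on ties, highest satisfaction first.
    for emissions, satisfaction in sorted(points, key=lambda p: (p[0], -p[1])):
        if satisfaction > best_satisfaction:
            frontier.append((emissions, satisfaction))
            best_satisfaction = satisfaction
    return frontier

# Hypothetical design alternatives: (emissions in kg/week, satisfaction in %).
designs = [(40, 95.0), (45, 96.5), (50, 96.0), (60, 98.0),
           (70, 97.5), (85, 99.0), (90, 98.5)]
print(efficient_frontier(designs))
# → [(40, 95.0), (45, 96.5), (60, 98.0), (85, 99.0)]
```

Designs off the frontier can be discarded without further debate, since some other design is at least as good on both goals.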

Customer Prioritization

[Plot: customer satisfaction (%) against emissions (kg/week), with series Outside, Equal and Inside]

Comparing 72 alternatives:

– Limiting drive through to coffee/bakery orders

– Pull-off space for large drive through orders

– 2 or 3 service windows in drive through

– Customer prioritization: inside, outside or equal

– Varying queuing/parking capacity

Drive Through 2- and 3-Window Design

[Plot: customer satisfaction (%) against emissions (kg/week), with series 3-Window Design and 2-Window Design]

Pull-Off Space

[Plot: customer satisfaction (%) against emissions (kg/week), with series Disabled and Enabled]

Parking Capacity

[Plot: customer satisfaction (%) against emissions (kg/week), with series Capacity #1, Capacity #2, Capacity #3 and Capacity #4]

Drive Through Food Variety

[Plot: customer satisfaction (%) against emissions (kg/week), with series “Drive through limited to coffee/baked goods” and “Drive through serving everything”]

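Results - alternatives

– Drive through limited to coffee/bakery orders: yes

– Pull-off space for large drive through orders: no

– Service windows in drive through: 3

– Customer prioritization: outside

– Queuing/parking capacity: layout #4 (6/19)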


Additional extensions and policies

Make orders more expensive for drive through customers: this is equivalent to introducing an emission sales tax and can be justified from an environmental point of view

Provide customers with information about expected waiting times and greenhouse gas emissions per vehicle for the drive through lane and for using the parking lot: this information can be displayed on an illuminated indicator board (lighting panel) outside the restaurant

The “green” policy of the restaurant: make the drive through more efficient or encourage customers to use the parking lot instead


Recommendations

We recommend implementing the following design:

– Drive through limited to coffee and baked goods

– No pull-off space

– Separate pay and pickup windows at the drive through (3 service windows)

– Priority given to drive through customers (or equal priority if any difficulties are expected with prioritizing the outside customers)

– Any reasonable parking lot/drive through design would work (it depends more on the physical restrictions on the available space for the newly planned locations than on the other factors)

Implement our additional recommendations about the staffing patterns and waiting area size as well as “green” policies


Data Mining


Data mining

Data mining application classes of problems

– Classification

– Clustering

– Regression

– Forecasting

– Others

Hypothesis or discovery driven

Iterative

Scalable


What is the difference between descriptive (BI) and predictive analytics?

Descriptive

John: lives in Seattle (zip 98109), 21 years old, iPhone 5, plan $98 a month, talk 400 minutes, data 1.9 GB, SMS 370, complaints 0, customer care calls 1, dropped calls low

Mike: lives in Atlanta (zip 30308), 38 years old, Samsung Galaxy S3, plan $78 a month, talk 1200 minutes, data 0.2 GB, SMS 8, customer care calls 6, dropped calls high

Predictive

John: low churn risk

Mike: high churn risk


Classification

Classification is a supervised learning technique that maps data into predefined classes or groups

The training set contains a set of records, where one of the attributes indicates the class

The modeling objective is to assign a class to each record, using the other attributes to predict it

Data is divided into train and test sets: “train” is used to build the model and “test” is used to validate the accuracy of classification

Typical techniques: Decision Trees, Neural Networks

Gender   Age   Lipstick
Female   21    Yes
Male     30    No
Female   14    No
Female   35    Yes
Male     17    No
Female   16    Yes

Customers
  Female
    >=15 years: Yes
    <15 years: No
  Male: No


Classification: Creating Model

Training Data

Gender   Age   Lipstick
Female   21    Yes
Male     30    No
Female   14    No
Female   35    Yes
Male     17    No
Female   16    Yes

Training Data → Classification Algorithms → Trained Classifier

Trained Classifier: purchased lipstick if Gender = Female and Age >= 15

Works with both interval and categorical variables


Classification: Applying Rules

Records to score

Gender   Age   Lipstick
Female   27    ?
Male     55    ?
Female   47    ?
Male     39    ?
Female   27    ?
Male     19    ?

Apply Scoring: If Gender = Female and Age >= 15 then Purchase lipstick = YES

Scored records

Gender   Age   Lipstick
Female   27    Yes
Male     55    No
Female   47    Yes
Male     39    No
Female   27    Yes
Male     19    No
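The scoring step can be written directly as code. This sketch encodes the rule from the slide as a function (the function and variable names are illustrative), checks it against the six training records, and applies it to the six records to be scored.

```python
def buys_lipstick(gender, age):
    """Rule from the trained classifier: Gender = Female and Age >= 15."""
    return gender == "Female" and age >= 15

training = [("Female", 21, "Yes"), ("Male", 30, "No"), ("Female", 14, "No"),
            ("Female", 35, "Yes"), ("Male", 17, "No"), ("Female", 16, "Yes")]
to_score = [("Female", 27), ("Male", 55), ("Female", 47),
            ("Male", 39), ("Female", 27), ("Male", 19)]

# Share of training records the rule classifies correctly.
accuracy = sum(
    ("Yes" if buys_lipstick(g, a) else "No") == label for g, a, label in training
) / len(training)

predictions = ["Yes" if buys_lipstick(g, a) else "No" for g, a in to_score]
print(accuracy, predictions)
# → 1.0 ['Yes', 'No', 'Yes', 'No', 'Yes', 'No']
```

Perfect accuracy on six training records says little about new customers, which is why the held-out “test” portion of the data is used for validation.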


Decision (classification) Trees

A tree can be "learned" by splitting the source set into subsets based on an attribute value test

The tree partitions samples into mutually exclusive groups by selecting the best splitting attribute, one group for each terminal node

The process is repeated recursively for each derived subset, until a stopping criterion is reached

Works with both interval and categorical variables

No need to normalize the data

Intuitive if-then rules are easy to extract and apply

Best applied to binary outcomes

Decision trees can be used to support multiple modeling objectives

o Customer segmentation

o Investment / portfolio decisions

o Issuing a credit card or loan

o Medical patient / disease classification

Customers
  Female
    >=15 years: Yes
    <15 years: No
  Male: No
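One way to make "selecting the best splitting attribute" concrete is to score each candidate test by the weighted Gini impurity of the two groups it creates. Gini impurity is an assumption here (the slides do not name a splitting criterion); it is one common choice. The sketch uses the six-record lipstick table from the classification slides.

```python
def gini(labels):
    """Gini impurity of a list of "Yes"/"No" labels (0.0 means pure)."""
    if not labels:
        return 0.0
    p = labels.count("Yes") / len(labels)
    return 1.0 - p * p - (1.0 - p) ** 2

def best_split(rows):
    """Name of the candidate test with the lowest weighted Gini impurity."""
    tests = {"Gender = Female": lambda r: r[0] == "Female"}
    for age in sorted({r[1] for r in rows}):
        tests[f"Age >= {age}"] = lambda r, a=age: r[1] >= a

    def weighted_gini(test):
        left = [r[2] for r in rows if test(r)]
        right = [r[2] for r in rows if not test(r)]
        return (len(left) * gini(left) + len(right) * gini(right)) / len(rows)

    return min(tests, key=lambda name: weighted_gini(tests[name]))

rows = [("Female", 21, "Yes"), ("Male", 30, "No"), ("Female", 14, "No"),
        ("Female", 35, "Yes"), ("Male", 17, "No"), ("Female", 16, "Yes")]

print(best_split(rows))
# → Gender = Female
females = [r for r in rows if r[0] == "Female"]
print(best_split(females))
# → Age >= 16
```

The second split lands at 16 because candidate thresholds are taken from ages present in the data; any cut between 14 and 16, such as the slide's 15, separates these records equally well.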


Cluster Analysis (segmentation)

Unsupervised learning algorithm

o Unlabeled data and no “target” variable

Frequently used for segmentation (to identify natural groupings of customers)

o Market segmentation, customer segmentation

Most cluster analysis methods involve the use of a distance measure to calculate the closeness between pairs of items

o Data points in one cluster are more similar to one another

o Data points in separate clusters are less similar to one another

[Figure: plot with axes Spend and Income, showing Cluster #1, Cluster #2 and Cluster #3]


K-means clustering

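The slides for this topic are figures only, so here is a minimal sketch of the standard k-means procedure: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, and repeat. The (income, spend) values are hypothetical, and squared Euclidean distance is assumed as the distance measure.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Basic k-means on 2-D points. Returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (income, spend), both in thousands.
customers = [(20, 5), (22, 6), (25, 4), (60, 20), (62, 22), (65, 19),
             (100, 60), (105, 58), (98, 63)]
centroids, clusters = kmeans(customers, k=3)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The result depends on the random starting centroids, so in practice the algorithm is usually run from several starts; features on very different scales are also commonly rescaled first so that one of them does not dominate the distance.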


Clustering: LinkedIn



Optimization


Optimization

Optimization problem

Examples:

– Minimize cost

– Maximize profit


Shortest path or most beautiful path?


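Collateral optimization – problem setup

[Diagram: network linking M1 to M4 with R1 to R3, with decision variables x1 to x8]

Amounts on the M side: M1 = 200, M2 = 100, M3 = 500, M4 = 600

Types shown: Cash, USD; Debt, USD; Cash, EUR; Debt, EUR

Other labels shown: .85, 1, .80, 1.05, “200 +”

Amounts on the R side: R1 = 200, R2 = 550, R3 = 300

Eligibility labels: Only cash; Any; Only EUR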



Collateral optimization – optimal cost = 985

[Diagram: the problem setup network (M1 = 200, M2 = 100, M3 = 500, M4 = 600; R1 = 200, R2 = 550, R3 = 300) annotated with the optimal solution; values shown: 100, 0, 0, 100, 415, 600]


Collateral optimization – concentration constraints

[Diagram: the problem setup network (M1 = 200, M2 = 100, M3 = 500, M4 = 600; R1 = 200, R2 = 550, R3 = 300; variables x1 to x8) with an added constraint]

At most 50% EUR in total


Multi-objective optimization

Multi-objective optimization: simultaneously optimizing two or more conflicting objectives subject to certain constraints

Examples:

– Finance: Minimize risk & Maximize return

– Business: Minimize cost & Minimize environmental impact

– Health care: Maximize X-ray dose to tumor & Minimize X-ray dose to healthy tissues

Units of the objectives are typically not the same: dollars, probability, units of time, …
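Because the units differ, objectives are often rescaled before they are combined. The sketch below shows one common approach, a weighted sum of normalized objectives; it is offered as an illustration, not as the method used in the talk. The options, costs and emissions are made up, and both objectives are minimized.

```python
def normalize(values):
    """Rescale values to the range 0..1 (assumes they are not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def best_by_weighted_sum(options, weight_cost):
    """options: list of (name, cost in dollars, emissions in kg); lower is better.

    weight_cost is the weight on cost; emissions get 1 - weight_cost.
    """
    costs = normalize([o[1] for o in options])
    emissions = normalize([o[2] for o in options])
    scores = [weight_cost * c + (1 - weight_cost) * e for c, e in zip(costs, emissions)]
    return options[scores.index(min(scores))][0]

# Hypothetical alternatives: cheaper options emit more.
options = [("A", 1000, 90), ("B", 1400, 60), ("C", 2000, 40)]

for w in (0.9, 0.5, 0.1):
    print(w, best_by_weighted_sum(options, w))
# → 0.9 A
# → 0.5 B
# → 0.1 C
```

Changing the weight moves the preferred option along the trade-off between the two objectives, so the choice of weight is itself a business decision.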


Multi-objective optimization

Solving multi-objective optimization problems:


Visual Analytics


Visual analytics

Visual statistics of the Napoleon Campaign: the Minard Map




Visual analytics – portfolio


Historical visualization

Activity Histogram: distribution of events over time

Heat Map: how long objects spent in different places

Track Summary: show tracks of all objects returned from search




Visual analytics

http://www.nytimes.com/2011/11/06/opinion/sunday/population-control-marauder-style.html

• cause (vertical location)

• historical time (horizontal location)

• duration (equator)

• number of deaths (circle size)

• continent (color)

• rank, cause, number of deaths (text)










Visualization types


Visualization formatting


Watson Analytics

Natural language dialogue

Cloud-based agility

Data discovery

Quick start intuitive interface

Mobile-ready


Watson Analytics

Unified analytics experience

Visual storytelling

Intelligent automation

Data access and refinement

Report and dashboard creation

Integrated social business

Guided analytic discovery




Analytics Software


Software for analytics



Lavastorm survey of analytics tools

Source: R. Muenchen "The Popularity of Data Analysis Software", http://r4stats.com/articles/popularity/



Gartner “Magic Quadrant” plot of companies that sell advanced analytics software (2014)

Source: R. Muenchen "The Popularity of Data Analysis Software", http://r4stats.com/articles/popularity/



Questions?
