Business Analytics and Optimization: A Technical Introduction (Part 2) Oleksandr Romanko, Ph.D. Senior Research Analyst, Risk Analytics Business Analytics, IBM Adjunct Professor, University of Toronto Toronto SMAC Meetup October 9, 2014


Technical introduction to Business Analytics and optimization. This is part 2. Part 1 can be found here: http://www.slideshare.net/rfchong/business-analytics-and-optimization-introduction


Page 1: Business Analytics and Optimization Introduction (part 2)

Business Analytics and Optimization: A Technical Introduction (Part 2)

Oleksandr Romanko, Ph.D. Senior Research Analyst, Risk Analytics – Business Analytics, IBM Adjunct Professor, University of Toronto

Toronto SMAC Meetup October 9, 2014

Page 2: Business Analytics and Optimization Introduction (part 2)

© 2014 IBM Corporation

Business Analytics

Page 3: Business Analytics and Optimization Introduction (part 2)


What is analytics?

Analytics is the scientific process of deriving insights from data in order to make decisions

Descriptive Analytics: What has happened?

Predictive Analytics: What will happen?

Prescriptive Analytics: What should we do?

[Diagram labels: Data, Insight, Action; Analyze, Decide; Business Value]

Page 4: Business Analytics and Optimization Introduction (part 2)


Business Analytics Education

Page 5: Business Analytics and Optimization Introduction (part 2)


IBM Academic Initiative program

Cognos SPSS ILOG

Page 6: Business Analytics and Optimization Introduction (part 2)


Business Analytics programs – curriculum

Applied Statistics and Probability

Fundamentals of Computational Mathematics

Data Mining and Knowledge Discovery

Simulation Modelling

Optimization

Financial Decision Making

Computational Methods for Business Data Analysis

Computational Finance and Risk Management

Visual Analytics and Knowledge Representation

Mathematical Modelling for Business

Machine Learning, Cognitive Computing and Artificial Intelligence

Marketing Analytics

Strategies for Managing Innovations

Analytics of Web, Social Networks and Business News

Page 7: Business Analytics and Optimization Introduction (part 2)


Applied Statistics

Page 8: Business Analytics and Optimization Introduction (part 2)


What kind of data are we dealing with?

Types of data

• Quantitative

• Categorical (ordered, unordered)

Data collection

• Independent observations (one observation per subject)

• Dependent observations (repeated observation of the same subject, relationships within groups, relationships over time or space)

Type of data drives the direction of your analysis

• How to plot

• How to summarize

• How to draw inferences and conclusions

• How to issue predictions


Page 9: Business Analytics and Optimization Introduction (part 2)


Quantitative data

Examples: temperature, age, income

Quick check: “Does it make sense to calculate an average?”

Appropriate summary statistics:

– Mean and Median

– Standard Deviation

– Percentiles

More advanced predictive methods: Regression, Time Series Analysis, …

Plot your data!


Page 10: Business Analytics and Optimization Introduction (part 2)


Summarizing quantitative data

One-number summaries

– Mean

Average, obtained by summing all observations and dividing by the number of observations.

– Median

The center value, below and above which you will find 50% of the observations.

Summarizing your data with one number may not tell the whole story:

[Figure: three distributions, labelled Median = 19.8, Median = 19.8 and Median = 10.5]

Page 11: Business Analytics and Optimization Introduction (part 2)


Standard deviation

“Most observations fall within ±2 standard deviations of the mean.”

If the data is normally distributed, this range covers about 95% of observations.

Example: Standard Deviation = 4.2, so ~95% of observations lie between 11.4 and 28.2
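The ±2 standard deviation rule can be checked numerically. The interval 11.4 to 28.2 is what one gets from a centre of 19.8 and a standard deviation of 4.2, since 19.8 ± 2 × 4.2 gives those endpoints. The following is a minimal sketch using Python's standard `statistics` module; the sample values are made up for illustration and are not the data behind the slide.

```python
import statistics

def summarize(values):
    """Return mean, median, sample standard deviation and the mean ± 2 SD interval."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {"mean": mean, "median": median, "sd": sd,
            "low": mean - 2 * sd, "high": mean + 2 * sd}

def share_within(values, low, high):
    """Fraction of observations that fall inside [low, high]."""
    inside = sum(1 for v in values if low <= v <= high)
    return inside / len(values)

# Hypothetical observations (illustrative only).
sample = [12.0, 15.5, 17.0, 18.2, 19.8, 19.8, 21.1, 22.4, 24.0, 28.0]
stats = summarize(sample)
coverage = share_within(sample, stats["low"], stats["high"])
print(stats, coverage)
```

For a small or skewed sample the share inside the interval can differ noticeably from 95%, which is one reason to plot the data before relying on the rule.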

Page 12: Business Analytics and Optimization Introduction (part 2)


Distributions: Normal distribution


Page 13: Business Analytics and Optimization Introduction (part 2)


Distributions


Page 14: Business Analytics and Optimization Introduction (part 2)


Distributions

Estimate of the probability distribution of global mean temperature resulting from a doubling of CO2 relative to its pre-industrial value, made from 100000 simulations

Page 15: Business Analytics and Optimization Introduction (part 2)


Modeling

Page 16: Business Analytics and Optimization Introduction (part 2)


Models

Page 17: Business Analytics and Optimization Introduction (part 2)


Models

Simplified representation or abstraction of reality

Capture essence of system without unnecessary details

Models tailored for specific types of problems

Models help us understand the world

– Prediction (What if?)

– Optimization (What’s best?)

Models are often much easier, faster, and cheaper to experiment with than the real system

Page 18: Business Analytics and Optimization Introduction (part 2)


Models and reality

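[Diagram: the “Real” World contains the Problem and the Decisions; the Analyst's World contains the Model and the Calculations. The Model is a simplified abstraction of reality that captures the essence of the problem; Interpretation links the Calculations back to the Decisions.]

From Monahan, G., “Management Decision Making”, Cambridge University Press, 2000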

Page 19: Business Analytics and Optimization Introduction (part 2)


Environmental risk management

Page 20: Business Analytics and Optimization Introduction (part 2)


Predictive maintenance

Wind turbines are big and expensive machines, so keeping them running smoothly helps keep their operational cost down. The sensor data generated by the turbine can help achieve this: by analysing it, you can spot potential failures earlier. The longer the warning period before a part fails, the better you can prepare for it.

To do that, you need to be able to anticipate failures in heavy and expensive parts like the gearbox, generator and main shaft.

Preventive maintenance saves money:

– Shorter downtime and less lost production

– Better planning of people and materials

– Cheaper repairs

Source: Algoritmica, http://www.algoritmica.nl

Page 21: Business Analytics and Optimization Introduction (part 2)


Predictive maintenance – how it works

Wind turbines have an array of sensors that measure temperatures, pressures, voltages, currents, and blade angles. This data is available for analysis, typically as 10-minute averages of the sensor values.

The computer that controls the turbine uses these measurements for its operations. This includes error thresholds like ‘the gearbox oil temperature should be below 120 degrees Celsius’. However, by the time the threshold is exceeded it is usually too late: the damage has already been done. To catch failures earlier we should look for anomalies, i.e. measurements that are unexpected and therefore might indicate a problem, but are not yet so severe that they exceed a threshold.

Source: Algoritmica, http://www.algoritmica.nl

Page 22: Business Analytics and Optimization Introduction (part 2)


Predictive maintenance – anomaly detection

Anomaly detection begins by defining what measurement values are expected and then calculating the difference from the actual situation. Since sensor data is delivered as a time series, we create a model that predicts the next value of a specific sensor given its previous values as well as the previous values of any other sensors that may be relevant.

Based on these multiple inputs, the model then calculates its predicted value and compares it with the actual sensor reading. The difference (or residual) is now a measure of how much the turbine is deviating from its expected performance. If the residual is persistent or grows too large (i.e. becomes an anomaly), an analyst can investigate the cause and decide on a course of action together with the operations staff at the wind farm.
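The residual idea can be sketched in a few lines. This is an illustrative simplification, not Algoritmica's method: it predicts the next reading of one sensor from only its own previous reading (a first-order autoregressive fit by least squares), whereas the text describes models that also use other sensors. The temperature values and the 3-sigma threshold are assumptions made for the example.

```python
import statistics

def fit_ar1(series):
    """Least-squares fit of x[t] ≈ a + b * x[t-1] on a training series."""
    prev, nxt = series[:-1], series[1:]
    mean_prev, mean_next = statistics.mean(prev), statistics.mean(nxt)
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    cov = sum((p - mean_prev) * (n - mean_next) for p, n in zip(prev, nxt))
    b = cov / var_prev
    a = mean_next - b * mean_prev
    return a, b

def residuals(series, a, b):
    """Actual minus predicted value for every reading after the first."""
    return [n - (a + b * p) for p, n in zip(series[:-1], series[1:])]

def flag_anomalies(series, a, b, threshold):
    """Indices into `series` whose absolute residual exceeds the threshold."""
    res = residuals(series, a, b)
    return [i + 1 for i, r in enumerate(res) if abs(r) > threshold]

# Hypothetical 10-minute gearbox oil temperatures (deg C), illustrative only.
train = [60.0, 60.5, 61.0, 60.8, 60.2, 60.0, 60.4, 61.1, 60.9, 60.3, 60.1, 60.6]
a, b = fit_ar1(train)
# Assumed rule: flag residuals larger than 3 standard deviations of the training residuals.
threshold = 3 * statistics.pstdev(residuals(train, a, b))

new_readings = [60.2, 60.5, 60.9, 66.0, 66.4, 60.7]
flagged = flag_anomalies(new_readings, a, b, threshold)
print(flagged)
```

Note that the jump to 66 degrees is flagged even though it is far below a 120 degree error threshold, which is the point of monitoring residuals rather than raw values.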

Source: Algoritmica, http://www.algoritmica.nl

Page 23: Business Analytics and Optimization Introduction (part 2)


Predictive maintenance – machine learning model

To create such a sensor model we apply machine learning or data mining, i.e. one or more algorithms that use a set of examples (the ‘training set’) to learn a predictive model. For a wind turbine, it is a natural fit to use a year of sensor data as the training set so that all seasonal variations are included.

Source: Algoritmica, http://www.algoritmica.nl

Page 24: Business Analytics and Optimization Introduction (part 2)


Predictive maintenance – driven by data

This is a data-driven approach: the model learns the relationship between the various sensor readings purely based on the training data. This is in contrast to a so-called physical model that explicitly describes the turbine design using detailed knowledge of its physical characteristics.

The main advantage of a data-driven approach is that the model can be trained by someone who is not a turbine expert and matches the actual situation by definition, whereas a physical model has to be carefully calibrated by an expert.

Source: Algoritmica, http://www.algoritmica.nl

Page 25: Business Analytics and Optimization Introduction (part 2)


Simulation – Business Case Study

Page 26: Business Analytics and Optimization Introduction (part 2)


Case study – optimal store design

Study environmental impact of restaurant operations

Restaurant

– order types and probabilities

– processing times (fixed portion and variable portion)

– design alternatives

Drive Through

– number of service windows

– queuing capacity

Parking Lot

– parking capacity

– customer prioritization

Goals:

– maximize customer satisfaction (high customer service level)

– minimize environmental impact (quantity of emissions)
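To give a feel for how such a study can be set up, here is a minimal discrete-event sketch of a single drive-through lane. It is not the model used in the case study: the single service window, the exponential arrival and service distributions, and all parameter values are assumptions for illustration. It borrows only two ideas from the slide, a processing time with a fixed and a variable portion, and a limited queuing capacity.

```python
import random

def simulate_drive_through(n_customers, mean_interarrival, fixed_service,
                           mean_variable_service, queue_capacity, seed=0):
    """Simulate a single-window, first-come-first-served drive-through lane.

    `queue_capacity` is the maximum number of cars in the lane, including the
    one being served. Cars that find the lane full are turned away (for
    example, to park instead). Times are in minutes.
    Returns (mean minutes spent in the lane, number of cars turned away).
    """
    rng = random.Random(seed)
    clock = 0.0
    departures = []   # departure times of cars currently in the lane
    total_time = 0.0
    served = 0
    turned_away = 0
    for _ in range(n_customers):
        clock += rng.expovariate(1.0 / mean_interarrival)
        # Drop cars that have already left the lane by this arrival time.
        departures = [d for d in departures if d > clock]
        if len(departures) >= queue_capacity:
            turned_away += 1
            continue
        # Service starts when the car ahead leaves, or now if the lane is empty.
        start = departures[-1] if departures else clock
        service = fixed_service + rng.expovariate(1.0 / mean_variable_service)
        finish = start + service
        departures.append(finish)
        total_time += finish - clock
        served += 1
    return total_time / served, turned_away

# Two hypothetical designs that differ only in the variable service time.
slow_mean, slow_away = simulate_drive_through(5000, 1.5, 0.5, 1.0, 8)
fast_mean, fast_away = simulate_drive_through(5000, 1.5, 0.5, 0.4, 8)
print(slow_mean, slow_away, fast_mean, fast_away)
```

Time spent idling in the lane can serve as a rough proxy for the variable portion of emissions, and the share of cars served within a target time as a proxy for customer satisfaction; both mappings are assumptions here.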

Page 27: Business Analytics and Optimization Introduction (part 2)


Problem description

Page 28: Business Analytics and Optimization Introduction (part 2)


Restaurant operations

Page 29: Business Analytics and Optimization Introduction (part 2)


Restaurant operations

Page 30: Business Analytics and Optimization Introduction (part 2)


Problems with the standard design

Most of the variable portion of the emissions is generated at the drive through lane

Customers should be encouraged to park their cars and enter the restaurant

Drive through customers should be served as fast as possible

Page 31: Business Analytics and Optimization Introduction (part 2)

Results – key indicators

[Figure: key indicator charts; surviving annotations: “less than 12 minutes” and “waiting for more than 1 minute to enter”]

Page 32: Business Analytics and Optimization Introduction (part 2)

Simulation results

Page 33: Business Analytics and Optimization Introduction (part 2)


Results – alternatives

Comparing 72 alternatives:

– Limiting drive through to coffee/bakery orders

– Pull-off space for large drive through orders

– 2 or 3 service windows in drive through

– Customer prioritization: inside, outside or equal

– Varying queuing/parking capacity

[Figure: scatter plots of Customer satisfaction (%), from 94 to 100, against Emissions (kg/week), from 35 to 95. Panels: Emissions vs. Customer Satisfaction, Data Points; Emissions vs. Customer Satisfaction, Data Points and Efficient Frontier; Customer Prioritization (Outside, Equal, Inside); Drive Through 2- and 3-Window Design (3-Window Design, 2-Window Design); Pull-Off Space (Disabled, Enabled); Parking Capacity (Capacity #1, Capacity #2, Capacity #3, Capacity #4); Drive Through Food Variety (Drive through limited to coffee/baked goods, Drive through serving everything)]

[Slide annotations: yes; no; 3; outside; layout #4 (6/19)]
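An efficient frontier over emissions and customer satisfaction consists of the alternatives that no other alternative beats on both measures at once. The sketch below shows one straightforward way to extract it from a list of simulated alternatives; the five data points are invented for illustration and are not results from the study.

```python
def efficient_frontier(alternatives):
    """Return the alternatives that are not dominated by any other.

    Each alternative is (name, emissions, satisfaction); lower emissions and
    higher satisfaction are better. One alternative dominates another if it is
    at least as good on both measures and strictly better on at least one.
    """
    frontier = []
    for name, em, sat in alternatives:
        dominated = any(
            (e2 <= em and s2 >= sat) and (e2 < em or s2 > sat)
            for _, e2, s2 in alternatives
        )
        if not dominated:
            frontier.append((name, em, sat))
    return sorted(frontier, key=lambda alt: alt[1])

# Hypothetical design alternatives: (name, emissions in kg/week, satisfaction in %).
designs = [
    ("A", 40.0, 95.0),
    ("B", 50.0, 97.5),
    ("C", 55.0, 96.0),   # dominated by B: more emissions, less satisfaction
    ("D", 70.0, 99.0),
    ("E", 90.0, 98.5),   # dominated by D
]
frontier_names = [name for name, _, _ in efficient_frontier(designs)]
print(frontier_names)
```

The pairwise comparison is quadratic in the number of alternatives, which is unproblematic for a set of 72 designs.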

Page 34: Business Analytics and Optimization Introduction (part 2)


Additional extensions and policies

Make orders more expensive for the drive through customers. This is equivalent to introducing an emissions sales tax and can be justified from the environmental point of view

Provide customers with information about expected waiting times and greenhouse gas emissions per vehicle for the drive through lane and for using the parking lot. This information can be displayed on an illuminated indicator board (lighting panel) outside the restaurant

The “green” policy of the restaurant: make the drive through more efficient or encourage customers to use the parking lot instead

Page 35: Business Analytics and Optimization Introduction (part 2)


Recommendations

We recommend implementing the following design:

– Drive through limited to coffee and baked goods

– No pull-off space

– Separate pay and pickup windows at the drive through (3 service windows)

– Priority given to drive through customers (or equal priority if any difficulties are expected with prioritizing the outside customers)

– Any reasonable parking lot/drive through design would work (it depends more on the physical restrictions on the available space for the newly planned locations than on the other factors)

Implement our additional recommendations about the staffing patterns and waiting area size as well as “green” policies

Page 36: Business Analytics and Optimization Introduction (part 2)


Data Mining

Page 37: Business Analytics and Optimization Introduction (part 2)


Data mining


Data mining application classes of problems

– Classification

– Clustering

– Regression

– Forecasting

– Others

Hypothesis or discovery driven

Iterative

Scalable

Page 38: Business Analytics and Optimization Introduction (part 2)


What is the difference between descriptive (BI) and predictive analytics?


Descriptive:

John: lives in Seattle, zip: 98109; 21 years old; iPhone 5; Plan: $98 a month; Talk: 400 minutes; Data: 1.9 GB; SMS: 370; Complaints: 0; Customer care calls: 1; Dropped calls: low

Mike: lives in Atlanta, zip: 30308; 38 years old; Samsung Galaxy S3; Plan: $78 a month; Talk: 1200 minutes; Data: 0.2 GB; SMS: 8; Customer care calls: 6; Dropped calls: high

Predictive:

John: Low churn risk

Mike: High churn risk

Page 39: Business Analytics and Optimization Introduction (part 2)


Classification

Classification is a supervised learning technique, which maps data into predefined classes or groups

Training set contains a set of records, where one of the attributes indicates the class

Modeling objective is to assign a class to each record, using the other attributes to predict it

Data is divided into test / train, where “train” is used to build the model and “test” is used to validate the accuracy of classification

Typical techniques: Decision Trees, Neural Networks

Gender   Age   Lipstick
Female   21    Yes
Male     30    No
Female   14    No
Female   35    Yes
Male     17    No
Female   16    Yes

[Decision tree: Customers are split into Female and Male. Female customers are split into >=15 years (Yes) and <15 years (No). Male: No]

Page 40: Business Analytics and Optimization Introduction (part 2)


Classification: Creating Model

Training Data

Gender   Age   Lipstick
Female   21    Yes
Male     30    No
Female   14    No
Female   35    Yes
Male     17    No
Female   16    Yes

Training Data → Classification Algorithms → Trained Classifier

Trained Classifier: Purchased lipstick if Gender = Female and Age >= 15

Works with both interval and categorical variables

Page 41: Business Analytics and Optimization Introduction (part 2)


Classification: Applying Rules

Before scoring:

Gender   Age   Lipstick
Female   27    ?
Male     55    ?
Female   47    ?
Male     39    ?
Female   27    ?
Male     19    ?

Apply Scoring: If Gender = Female and Age >= 15 then Purchase lipstick = YES

After scoring:

Gender   Age   Lipstick (predicted)
Female   27    Yes
Male     55    No
Female   47    Yes
Male     39    No
Female   27    Yes
Male     19    No
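The scoring step amounts to applying the trained rule to each new record. The sketch below encodes the rule from the slide as a plain function (the function name is ours) and applies it to the slide's training and scoring tables. The accuracy is computed on the training records only, because the slides do not show a separate test set.

```python
def predict_lipstick(gender, age):
    """Rule from the slide: purchase if Gender = Female and Age >= 15."""
    return "Yes" if gender == "Female" and age >= 15 else "No"

# Training table from the slides: (Gender, Age, Lipstick).
training = [("Female", 21, "Yes"), ("Male", 30, "No"), ("Female", 14, "No"),
            ("Female", 35, "Yes"), ("Male", 17, "No"), ("Female", 16, "Yes")]

# Records to score from the slides: (Gender, Age).
to_score = [("Female", 27), ("Male", 55), ("Female", 47),
            ("Male", 39), ("Female", 27), ("Male", 19)]

correct = sum(predict_lipstick(g, a) == label for g, a, label in training)
accuracy = correct / len(training)
scores = [predict_lipstick(g, a) for g, a in to_score]
print(accuracy, scores)
```

In practice the accuracy would be measured on held-out test records, as the Classification slide describes, since accuracy on the training data tends to be optimistic.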

Page 42: Business Analytics and Optimization Introduction (part 2)


Decision (classification) Trees

A tree can be "learned" by splitting the source set into subsets based on an attribute value test

Tree partitions samples into mutually exclusive groups by selecting the best splitting attribute, one group for each terminal node

The process is repeated recursively for each derived subset, until a stopping criterion is reached

Works with both interval and categorical variables

No need to normalize the data

Intuitive if-then rules are easy to extract and apply

Best applied to binary outcomes

Decision trees can be used to support multiple modeling objectives

o Customer segmentation

o Investment / portfolio decisions

o Issuing a credit card or loan

o Medical patient / disease classification

[Decision tree: Customers are split into Female and Male. Female customers are split into >=15 years (Yes) and <15 years (No). Male: No]
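The recursive splitting described on this slide can be sketched as follows. This is a teaching-sized illustration, not a production algorithm: it assumes Gini impurity as the measure of the "best" split (the slides do not name one), takes numeric thresholds only from observed values, and stops when a node is pure or cannot be split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def candidate_tests(rows, attr):
    """Yield (description, predicate) pairs for one attribute."""
    for v in sorted({r[attr] for r in rows}):
        if isinstance(v, (int, float)):
            yield f"{attr} >= {v}", (lambda r, v=v: r[attr] >= v)
        else:
            yield f"{attr} = {v}", (lambda r, v=v: r[attr] == v)

def best_split(rows, attrs, target):
    """Return (weighted impurity, description, predicate) of the best test, or None."""
    best = None
    for attr in attrs:
        for desc, test in candidate_tests(rows, attr):
            yes = [r[target] for r in rows if test(r)]
            no = [r[target] for r in rows if not test(r)]
            if not yes or not no:
                continue  # a test that does not separate anything is useless
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
            if best is None or score < best[0]:
                best = (score, desc, test)
    return best

def build_tree(rows, attrs, target):
    """Split recursively until a node is pure or no useful test remains."""
    labels = [r[target] for r in rows]
    split = best_split(rows, attrs, target)
    if gini(labels) == 0.0 or split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    _, desc, test = split
    return {desc: {
        "yes": build_tree([r for r in rows if test(r)], attrs, target),
        "no": build_tree([r for r in rows if not test(r)], attrs, target),
    }}

# Training table from the classification slides.
rows = [
    {"Gender": "Female", "Age": 21, "Lipstick": "Yes"},
    {"Gender": "Male", "Age": 30, "Lipstick": "No"},
    {"Gender": "Female", "Age": 14, "Lipstick": "No"},
    {"Gender": "Female", "Age": 35, "Lipstick": "Yes"},
    {"Gender": "Male", "Age": 17, "Lipstick": "No"},
    {"Gender": "Female", "Age": 16, "Lipstick": "Yes"},
]
tree = build_tree(rows, ["Gender", "Age"], "Lipstick")
print(tree)
```

On this table the sketch splits on Gender first and then on Age at 16 rather than at the slide's 15. Both thresholds separate the training records equally well, because no customer in the table is aged 15; the difference comes only from restricting candidate thresholds to observed values.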

Page 43: Business Analytics and Optimization Introduction (part 2)


Cluster Analysis (segmentation)

Unsupervised learning algorithm

o Unlabeled data and no “target” variable

Frequently used for segmentation (to identify natural groupings of customers)

o Market segmentation, customer segmentation

Most cluster analysis methods involve the use of a distance measure to calculate the closeness between pairs of items

o Data points in one cluster are more similar to one another

o Data points in separate clusters are less similar to one another

[Figure: scatter plot with axes Spend and Income showing Cluster #1, Cluster #2 and Cluster #3]
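K-means is one widely used distance-based clustering method. The sketch below is a plain version of it with Euclidean distance; the (income, spend) points and the choice of starting centroids are invented for illustration, and real implementations usually add random restarts and a convergence check.

```python
import math

def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to cluster means."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid          # keep the centroid of an empty cluster
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)

# Hypothetical customers as (income, spend), both in thousands.
customers = [(20, 5), (22, 6), (25, 4),
             (50, 20), (55, 22), (52, 18),
             (90, 10), (95, 12), (92, 9)]
start = [(20, 5), (50, 20), (90, 10)]   # assumed initial centroids
centroids, clusters = kmeans(customers, start)
print(centroids)
print(clusters)
```

Because income and spend are on different scales, it is common to standardize the variables first so that one of them does not dominate the distance.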

Page 44: Business Analytics and Optimization Introduction (part 2)


K-means clustering


Page 45: Business Analytics and Optimization Introduction (part 2)


K-means clustering


Page 46: Business Analytics and Optimization Introduction (part 2)


K-means clustering


Page 47: Business Analytics and Optimization Introduction (part 2)


Clustering: LinkedIn


Page 48: Business Analytics and Optimization Introduction (part 2)


Clustering: LinkedIn

Page 49: Business Analytics and Optimization Introduction (part 2)


Optimization

Page 50: Business Analytics and Optimization Introduction (part 2)


Optimization

Optimization problem

Examples:

– Minimize cost

– Maximize profit

Page 51: Business Analytics and Optimization Introduction (part 2)


Shortest path or most beautiful path?


Page 52: Business Analytics and Optimization Introduction (part 2)


Shortest path or most beautiful path?

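The question in the slide title is about which objective to optimize: the same route-finding algorithm returns different routes depending on what each road segment "costs". The sketch below uses Dijkstra's algorithm from the standard library's `heapq`; the small road network and the "ugliness" penalty (lower means more beautiful) are hypothetical. Dijkstra's algorithm needs non-negative edge costs, which is why beauty is expressed as a penalty to minimize.

```python
import heapq

def best_path(graph, start, goal, weight):
    """Dijkstra's algorithm; `weight` names the edge attribute to minimize."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, attrs in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(
                    queue, (cost + attrs[weight], neighbor, path + [neighbor]))
    return None  # goal not reachable

# Hypothetical road network: each edge has a length and an "ugliness" penalty.
roads = {
    "Home":    {"Highway": {"km": 2, "ugliness": 8},
                "Park":    {"km": 4, "ugliness": 1}},
    "Highway": {"Office":  {"km": 3, "ugliness": 9}},
    "Park":    {"Office":  {"km": 5, "ugliness": 2}},
    "Office":  {},
}
shortest = best_path(roads, "Home", "Office", "km")
most_beautiful = best_path(roads, "Home", "Office", "ugliness")
print(shortest)
print(most_beautiful)
```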

Page 53: Business Analytics and Optimization Introduction (part 2)


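Collateral optimization – problem setup

[Diagram: collateral assets M1 to M4 are allocated to requirements R1 to R3 through decision variables x1 to x8. Asset amounts: 200 (M1), 100 (M2), 500 (M3), 600 (M4). Asset types: Cash, USD; Debt, USD; Cash, EUR; Debt, EUR. Requirement amounts: 200 (R1), 550 (R2), 300 (R3). Eligibility rules: Only cash; Any; Only EUR. Other labels: .85, 1, .80, 1.05, 200 +]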

Page 54: Business Analytics and Optimization Introduction (part 2)


Collateral optimization – problem setup

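[Diagram: collateral assets M1 to M4 are allocated to requirements R1 to R3 through decision variables x1 to x8. Asset amounts: 200 (M1), 100 (M2), 500 (M3), 600 (M4). Asset types: Cash, USD; Debt, USD; Cash, EUR; Debt, EUR. Requirement amounts: 200 (R1), 550 (R2), 300 (R3). Eligibility rules: Only cash; Any; Only EUR. Other labels: .85, 1, .80, 1.05, 200 +]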

Page 55: Business Analytics and Optimization Introduction (part 2)


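Collateral optimization – optimal cost = 985

[Diagram: optimal allocation of assets M1 to M4 (200, 100, 500, 600) to requirements R1 to R3 (200, 550, 300). Asset types: Cash, USD; Debt, USD; Cash, EUR; Debt, EUR. Eligibility rules: Only cash; Any; Only EUR. Allocation amounts shown: 100, 0, 0, 100, 415, 600. Other labels: .85, 1, .80, 1.05, 200 +]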

Page 56: Business Analytics and Optimization Introduction (part 2)


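Collateral optimization – concentration constraints

[Diagram: collateral assets M1 to M4 are allocated to requirements R1 to R3 through decision variables x1 to x8. Asset amounts: 200 (M1), 100 (M2), 500 (M3), 600 (M4). Asset types: Cash, USD; Debt, USD; Cash, EUR; Debt, EUR. Requirement amounts: 200 (R1), 550 (R2), 300 (R3). Eligibility rules: Only cash; Any; Only EUR. Other labels: .85, 1, .80, 1.05, 200 +]

At most 50% EUR in total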

Page 57: Business Analytics and Optimization Introduction (part 2)


Multi-objective optimization

Multi-objective optimization: simultaneously optimizing two or more conflicting objectives subject to certain constraints

Examples:

– Finance: Minimize risk & Maximize return

– Business: Minimize cost & Minimize environmental impact

– Health care: Maximize X-ray dose to tumor & Minimize X-ray dose to healthy tissues

Units of the objectives are typically not the same: dollars, probability, units of time, …
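One common way to handle objectives with different units is to rescale each to a common range and minimize a weighted sum. The slides do not say which method they use, so the sketch below is only an illustration of that one approach; the three options and their cost and impact figures are invented.

```python
def normalize(values):
    """Rescale values to [0, 1] so objectives in different units become comparable.

    Assumes the values are not all identical (otherwise the range is zero).
    """
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def weighted_sum_choice(options, weight_cost):
    """Pick the option minimizing weight_cost * cost + (1 - weight_cost) * impact."""
    costs = normalize([o["cost"] for o in options])
    impacts = normalize([o["impact"] for o in options])
    scores = [weight_cost * c + (1 - weight_cost) * i
              for c, i in zip(costs, impacts)]
    return options[scores.index(min(scores))]["name"]

# Hypothetical options: cost in dollars, environmental impact in kg of emissions.
options = [
    {"name": "cheap",    "cost": 100.0, "impact": 90.0},
    {"name": "balanced", "cost": 140.0, "impact": 50.0},
    {"name": "green",    "cost": 200.0, "impact": 30.0},
]
choices = {w: weighted_sum_choice(options, w) for w in (0.9, 0.5, 0.1)}
print(choices)
```

Changing the weight moves the chosen option along the trade-off between the two objectives, so the weight encodes how much one objective is worth relative to the other.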

Page 58: Business Analytics and Optimization Introduction (part 2)


Multi-objective optimization

Solving multi-objective optimization problems:

Page 59: Business Analytics and Optimization Introduction (part 2)


Visual Analytics

Page 60: Business Analytics and Optimization Introduction (part 2)


Visual analytics

Visual statistics of the Napoleon Campaign: the Minard Map

Page 61: Business Analytics and Optimization Introduction (part 2)


Visual analytics

Page 62: Business Analytics and Optimization Introduction (part 2)


Visual analytics – portfolio

Page 63: Business Analytics and Optimization Introduction (part 2)


Historical visualization

Activity Histogram: Distribution of events over time

Heat Map: How long objects spent in different places

Track Summary: Show tracks of all objects returned from search

Page 64: Business Analytics and Optimization Introduction (part 2)


Visual analytics

Page 66: Business Analytics and Optimization Introduction (part 2)


Visualization types

Page 67: Business Analytics and Optimization Introduction (part 2)


Visualization formatting

Page 68: Business Analytics and Optimization Introduction (part 2)


Watson Analytics

Natural language dialogue

Cloud-based agility

Data discovery

Quick start intuitive interface

Mobile-ready

Page 69: Business Analytics and Optimization Introduction (part 2)


Watson Analytics

Unified analytics experience

Visual storytelling

Intelligent automation

Data access and refinement

Report and dashboard creation

Integrated social business

Guided analytic discovery

Page 70: Business Analytics and Optimization Introduction (part 2)


Watson Analytics

Page 71: Business Analytics and Optimization Introduction (part 2)


Watson Analytics

Page 72: Business Analytics and Optimization Introduction (part 2)


Analytics Software

Page 73: Business Analytics and Optimization Introduction (part 2)


Software for analytics

Page 74: Business Analytics and Optimization Introduction (part 2)


Software for analytics

Lavastorm survey of analytics tools

Source: R. Muenchen "The Popularity of Data Analysis Software", http://r4stats.com/articles/popularity/

Page 75: Business Analytics and Optimization Introduction (part 2)


Software for analytics

Gartner “Magic Quadrant” plot of companies that sell advanced analytics software (2014)

Source: R. Muenchen "The Popularity of Data Analysis Software", http://r4stats.com/articles/popularity/

Page 76: Business Analytics and Optimization Introduction (part 2)


Page 77: Business Analytics and Optimization Introduction (part 2)


Questions?