DATA ANALYTICS IN DECISION MAKING S Anand, Chief Data Scientist, Gramener


Page 1: Data analytics in decision making

DATA ANALYTICS IN DECISION MAKING

S Anand, Chief Data Scientist, Gramener

Page 2: Data analytics in decision making

DO THESE FOUR CITIES LOOK IDENTICAL TO YOU?

Average price is the same. Average sales is the same too.

Variance in price is the same. So is the variance in sales.

Take a look at the sales report alongside. A company has branches in 4 cities, and each branch changes the product price every month. This leads to a corresponding change in the sales.

Here is the performance of the four branches with their monthly price and sales for each month.

Looking at the average, the four branches have an identical performance.

2010        Boston          Chicago         Detroit         New York
Month       Price  Sales    Price  Sales    Price  Sales    Price  Sales
Jan         10.0    8.04    10.0    9.14    10.0    7.46     8.0    6.58
Feb          8.0    6.95     8.0    8.14     8.0    6.77     8.0    5.76
Mar         13.0    7.58    13.0    8.74    13.0   12.74     8.0    7.71
Apr          9.0    8.81     9.0    8.77     9.0    7.11     8.0    8.84
May         11.0    8.33    11.0    9.26    11.0    7.81     8.0    8.47
Jun         14.0    9.96    14.0    8.10    14.0    8.84     8.0    7.04
Jul          6.0    7.24     6.0    6.13     6.0    6.08     8.0    5.25
Aug          4.0    4.26     4.0    3.10     4.0    5.39    19.0   12.50
Sep         12.0   10.84    12.0    9.13    12.0    8.15     8.0    5.56
Oct          7.0    4.82     7.0    7.26     7.0    6.42     8.0    7.91
Nov          5.0    5.68     5.0    4.74     5.0    5.73     8.0    6.89
Average      9.0    7.50     9.0    7.50     9.0    7.50     9.0    7.50
Variance    10.0    3.75    10.0    3.75    10.0    3.75    10.0    3.75

DO YOU AGREE?

Page 3: Data analytics in decision making

ARE THEY REALLY IDENTICAL? CHECK AGAIN…

But in fact, the four cities are totally different in behaviour.

Boston’s sales have generally increased with price.

Detroit has a nearly perfect increase in sales with price, except for one aberration.

Chicago shows a decline in sales beyond a price of 10.

New York’s sales fluctuate despite a nearly constant price.

[Figure: four scatter plots of sales against price, one per city: Boston, Detroit, Chicago, New York]
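The point of the two slides above can be checked with a few lines of Python. This is a minimal sketch that recomputes the summary rows of the table from the monthly figures; the figures are those of the slide (they match Anscombe's quartet), while the function names and the added correlation column are illustrative assumptions, not part of the deck.

```python
from math import sqrt
from statistics import mean, pvariance

# Monthly price and sales per city, Jan to Nov, copied from the slide's table.
price = {
    "Boston":   [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    "Chicago":  [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    "Detroit":  [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    "New York": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
}
sales = {
    "Boston":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "Chicago":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "Detroit":  [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "New York": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def summary(city):
    """Average and (population) variance of price and sales, plus their correlation."""
    return (
        round(mean(price[city]), 1),
        round(pvariance(price[city]), 1),
        round(mean(sales[city]), 2),
        round(pvariance(sales[city]), 2),
        round(pearson(price[city], sales[city]), 2),
    )


for city in price:
    print(city, summary(city))
```

Every city prints the same tuple, so none of these summary statistics can tell the four branches apart; only plotting the points does.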

Page 7: Data analytics in decision making

[Chart labels: Rural, Semi-urban, Urban, Metro, Total; Sanctioned, Utilised, Gap; years 2005 to 2015]

Page 8: Data analytics in decision making

INVESTMENTS IN BIG DATA & ANALYTICS NEED NOT GUARANTEE BUSINESS EFFECTIVENESS

No coherent consumption

Enterprises have a disjoint view of data across divisions. This impedes org action & speed

Last-mile disconnect

Processed & analyzed data is not presented effectively as a story. Meaningful consumption is an issue

Longer Realizations

Implementation takes years. System stabilization takes 1-2 years or more, with prohibitive cost of change

Org design Impedes

Org structures & authorization processes impede quick action after data bears needed action

ENTERPRISES NEED HELP CROSSING THE ANALYTICS CHASM

Page 9: Data analytics in decision making

COUNTER-INTUITION:

INSIGHTS FROM DATA

Page 10: Data analytics in decision making

PREDICTING MARKS

“What determines a child’s marks?

Do girls score better than boys?

Does the choice of subject matter?

Does the medium of instruction matter?

Does community or religion matter?

Does their birthday matter?

Does the first letter of their name matter?

EDUCATION

Page 11: Data analytics in decision making

TN CLASS X: ENGLISH

[Histogram: marks from 0 to 99 on the horizontal axis; vertical axis from 0 to 40,000]

Page 12: Data analytics in decision making

TN CLASS X: SOCIAL SCIENCE

[Histogram: marks from 0 to 99 on the horizontal axis; vertical axis from 0 to 40,000]

Page 13: Data analytics in decision making

TN CLASS X: MATHEMATICS

[Histogram: marks from 0 to 99 on the horizontal axis; vertical axis from 0 to 40,000]

Page 14: Data analytics in decision making

CBSE 2013 CLASS XII: ENGLISH MARKS

Page 15: Data analytics in decision making

DETECTING FRAUD

Page 16: Data analytics in decision making

DETECTING FRAUD

“We know meter readings are incorrect, for various reasons.

We don’t, however, have the concrete proof we need to start the process of meter reading automation.

Part of our problem is the volume of data that needs to be analysed. The other is the inexperience in tools or analyses to identify such patterns.

ENERGY UTILITY

Page 17: Data analytics in decision making

AN ENERGY UTILITY DETECTED BILLING FRAUD

An energy utility (with over 50 million subscribers) had 10 years' worth of customer billing data available.

Most fraud detection software failed to load the data, and sampled data revealed little or no insight.

Below is a simple histogram (or frequency distribution) of usage levels. Each bar represents the number of customers with a specific bill amount (in units, or kWh).

Tariffs are based on the usage slab. Someone with 101 units is billed in full at a higher tariff than someone with 100 units. So people have a strong incentive to stay at or within a slab boundary.

This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the slab boundaries.

This can happen in one of two ways.

First, people may be monitoring their usage very carefully, and turn off their lights and fans the instant their usage hits the slab boundary.

Or, more realistically, there’s probably some level of corruption involved, where customers pay a small sum to the meter reading staff to ensure that it stays exactly at the slab boundary, giving them the advantage of a lower price.
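The incentive described on this slide (101 units billed in full at the higher tariff) is easy to see with a small worked example. The slab limits and rates below are hypothetical, chosen only for illustration; the slide does not give the utility's actual tariff.

```python
# Hypothetical slab tariff: (upper limit in units, rate in Rs per unit).
# These numbers are assumptions for illustration, not the utility's real tariff.
SLABS = [(100, 2.0), (200, 3.0), (float("inf"), 5.0)]


def bill(units):
    """Bill the whole consumption at the rate of the slab it falls into."""
    for limit, rate in SLABS:
        if units <= limit:
            return units * rate


# One extra unit across the boundary re-prices all 101 units.
print(bill(100), bill(101))
```

Under these assumed rates, the 101st unit costs far more than any other unit, which is the kind of jump that makes a reading of exactly 100 so attractive.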

Page 18: Data analytics in decision making

This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the tariff slab boundaries.

This clearly shows collusion of some form with the customers.

This happens with specific customers, not randomly. Here are such customers’ meter readings.

Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
   217    219    200    200    200    200    200    200    200    350    200    200
   250    200    200    200    201    200    200    200    250    200    200    150
   250    150    150    200    200    200    200    200    200    200    200    150
   150    200    200    200    200    200    200    200    200    200    200     50
   200    200    200    150    180    150     50    100     50     70    100    100
   100    100    100    100    100    100    100    100    100    100    110    100
   100    150    123    123     50    100     50    100    100    100    100    100
     0    111    100    100    100    100    100    100    100    100     50     50
     0    100     27    100     50    100    100    100    100    100     70    100
     1      1      1    100     99     50    100    100    100    100    100    100
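One way to surface such customers is to measure how often each customer's monthly reading lands exactly on a slab boundary. The sketch below uses two rows of readings from this slide; the set of boundaries (50, 100, 150, 200) and the function name are assumptions, since the slide does not list the slabs.

```python
# Assumed slab boundaries, for illustration only.
SLAB_BOUNDARIES = {50, 100, 150, 200}


def boundary_share(readings, boundaries=SLAB_BOUNDARIES):
    """Fraction of a customer's monthly readings that sit exactly on a boundary."""
    return sum(r in boundaries for r in readings) / len(readings)


# Two customers' readings, Apr-10 to Mar-11, taken from the slide.
customers = {
    "customer 1": [217, 219, 200, 200, 200, 200, 200, 200, 200, 350, 200, 200],
    "customer 7": [100, 150, 123, 123, 50, 100, 50, 100, 100, 100, 100, 100],
}

for name, readings in customers.items():
    print(name, round(boundary_share(readings), 2))
```

A customer whose share stays high month after month is a candidate for a closer look; a threshold for "high" would have to be chosen from the overall distribution.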

If we define the “extent of fraud” as the percentage excess of the 100 unit meter reading, the value varies considerably across sections and time, with some explainable anomalies.

Section    Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
Section 1     70%    97%   136%    65%   110%   116%   121%   107%   114%    88%    74%   109%
Section 2     66%    92%    66%    87%    70%    64%    63%    50%    58%    38%    41%    54%
Section 3     90%    46%    47%    43%    28%    31%    50%    32%    19%    38%     8%    34%
Section 4     44%    24%    36%    39%    21%    18%    24%    49%    56%    44%    31%    14%
Section 5      4%    63%   -27%    20%    41%    82%    26%    34%    43%     2%    37%    15%
Section 6     18%    23%    30%    21%    28%    33%    39%    41%    39%    18%     0%    33%
Section 7     36%    51%    33%    33%    27%    35%    10%    39%    12%     5%    15%    14%
Section 8     22%    21%    28%    12%    24%    27%    10%    31%    13%    11%    22%    17%
Section 9     19%    35%    14%     9%    16%    32%    37%    12%     9%     5%    -3%    11%

Annotations on the table: New section manager arrives … and is transferred out

Why would these happen?
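A minimal sketch of how such an “extent of fraud” figure could be computed for one section and month. The slide does not say what the 100-unit count is compared against; here the baseline is assumed to be the average count of the neighbouring readings (95 to 105, excluding 100), and the function name and window are illustrative.

```python
from collections import Counter


def extent_of_fraud(readings, boundary=100, window=5):
    """Percentage by which the count at the boundary exceeds the average
    count of nearby readings. The baseline choice is an assumption."""
    counts = Counter(readings)
    neighbours = [counts[boundary + d] for d in range(-window, window + 1) if d != 0]
    expected = sum(neighbours) / len(neighbours)
    if expected == 0:
        return None  # no nearby readings to compare against
    return (counts[boundary] - expected) / expected * 100


# Toy data: ten customers at every reading from 95 to 105, plus seven extra at 100.
readings = list(range(95, 106)) * 10 + [100] * 7
print(extent_of_fraud(readings))
```

Running this per section and per month would give a grid of percentages like the one above; negative values mean fewer readings at the boundary than the neighbours suggest.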

Page 19: Data analytics in decision making

SIMPLE HEURISTICS

EMERGENCY

“A man is rushed to a hospital in the throes of a heart attack.

The nurse needs to decide whether the victim should be admitted into emergency care.

Although this decision can save or cost a life, the nurse must decide using only the available cues, and within a few seconds – preferably using some fancy statistical software package.

Page 20: Data analytics in decision making

SIMPLE HEURISTICS

EMERGENCY

[Decision tree: three yes/no questions asked in sequence: Pressure < 91? Age > 62? Pulse > 100?]
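The three questions on this slide can be written as a short chain of checks. This is a sketch only: the thresholds come from the slide, but the outcome attached to each Yes/No branch did not survive in the text, so the leaves below are assumptions that follow the commonly cited version of this heart-attack triage tree.

```python
def needs_emergency_care(pressure, age, pulse):
    """Fast-and-frugal tree: each question can end the decision immediately.

    Leaf outcomes are assumed, not taken from the slide.
    """
    if pressure < 91:      # question 1: very low blood pressure -> high risk
        return True
    if not age > 62:       # question 2: younger patients -> low risk
        return False
    return pulse > 100     # question 3: fast pulse -> high risk


print(needs_emergency_care(pressure=85, age=40, pulse=70))
print(needs_emergency_care(pressure=120, age=70, pulse=80))
```

The design point is that at most three cues are read, in a fixed order, and no weighting or arithmetic is needed at the bedside.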

Page 21: Data analytics in decision making

MODEL    MISSED    WASTED    COST PER CUST.    IMPROVEMENT
Base     8.3%      0.0%      100               0.0%

                     Prediction: No churn                  Prediction: Churn
Actual: No churn     OK                                    WASTED (Marketing cost Rs 40)
Actual: Churn        MISSED (Acquisition cost Rs 80)       OK

Page 22: Data analytics in decision making

MODEL            MISSED    WASTED    COST PER CUST.    IMPROVEMENT
Decision tree    3.2%      3.6%      61.7              39.3%

[Decision tree diagram: splits on Outgoing call (0 - 4, 5-14, 15+), REFILL AMOUNT > 50 RS (Y/N) and > 1 RECHARGE (Y/N); leaves are labelled 0 or 1]

Page 23: Data analytics in decision making

MODEL    MISSED    WASTED    COST PER CUST.    IMPROVEMENT
SVM      0.6%      2.5%      34.0              66.6%

Page 24: Data analytics in decision making

TAKEAWAYS

1. In a single circle with 2 crore customers, this improvement represents a saving of Rs 2.6 x 2 cr ~ Rs 5 cr / month / circle

2. Testing structure allows us to test out any number of models, and evaluate their effectiveness

3. Need to trade-off between simplicity vs over-fitting. Incremental improvements often not worth the trouble

4. Implementation needs to be constantly monitored, with continuous re-evaluation of the model
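The Rs 2.6 in the first takeaway can be reproduced from the earlier slides under one reading: expected cost per customer is the missed rate times the Rs 80 acquisition cost plus the wasted rate times the Rs 40 marketing cost, and the saving is base model minus decision tree. That reading is an assumption; the COST PER CUST. column on the slides appears to use a different scale, so this sketch does not try to reproduce it.

```python
MISSED_COST = 80  # Rs: acquisition cost when a churner is not predicted
WASTED_COST = 40  # Rs: marketing cost spent on a customer who would have stayed


def expected_cost(missed_rate, wasted_rate):
    """Expected cost per customer (Rs) from the two kinds of wrong prediction."""
    return missed_rate * MISSED_COST + wasted_rate * WASTED_COST


base = expected_cost(0.083, 0.0)      # base model: 8.3% missed, 0.0% wasted
tree = expected_cost(0.032, 0.036)    # decision tree: 3.2% missed, 3.6% wasted

saving_per_customer = base - tree
saving_per_circle = saving_per_customer * 2e7  # 2 crore customers in a circle

print(round(saving_per_customer, 2), round(saving_per_circle / 1e7, 2), "crore")
```

The same function can be applied to any other model's missed and wasted rates, which is what makes the testing structure in the second takeaway reusable.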

Page 25: Data analytics in decision making

ANALYSING CAUSAL DRIVERS

We group by every input factor

… and calculate the impact on every metric.

By moving from average to the best group, what’s the improvement?

The actual performance by each group is shown

0-3m 3-6m 6m-1yr 1-2 yrs > 2 yrs

11 12.3 12.7 15.3 16.1

Only significant results shown
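The grouping step on this slide can be sketched as follows. The field names (`tenure`, `score`) and the toy records are hypothetical, and the sketch leaves out the significance test that the slide's "only significant results shown" implies.

```python
from collections import defaultdict
from statistics import mean


def best_group_uplift(records, factor, metric):
    """Group records by one input factor and report the best group's mean
    and its relative improvement over the overall average."""
    groups = defaultdict(list)
    for row in records:
        groups[row[factor]].append(row[metric])
    overall = mean(row[metric] for row in records)
    group_means = {name: mean(values) for name, values in groups.items()}
    best = max(group_means, key=group_means.get)
    uplift = (group_means[best] - overall) / overall
    return best, group_means[best], uplift


# Hypothetical records: one input factor and one metric.
records = [
    {"tenure": "0-3m", "score": 10},
    {"tenure": "0-3m", "score": 12},
    {"tenure": "> 2 yrs", "score": 16},
    {"tenure": "> 2 yrs", "score": 18},
]
print(best_group_uplift(records, "tenure", "score"))
```

Looping this over every input factor and every metric gives the full grid the slide describes.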

Page 26: Data analytics in decision making

EMERGENT PATTERNS

Page 27: Data analytics in decision making

DIRECTORSHIPS AT THE TATAS

Every person who was a Director at the Tata Group is shown here as an orange circle. The size of the circle is based on the number of directorship positions held over their lifetime.

Every company in the Tata Group is shown here as a blue circle. The size of the circle is based on the number of directors the company has had over time.

Every directorship relation is shown by a line. If a person has held a directorship position at a company, the two are connected by a line.

The group appears to be divided into two clusters based on the network of directorship roles.

Annotations on the visual:

• First group of companies
• Second group of companies
• Some directors are mainly associated with the first group of companies
• Some directors are mainly associated with the second group of companies
• Prominent leaders bridge the groups

Companies labelled: Tata Teleservices, Tata Consultancy Services, Tata Business Support Services, Tata Global Beverages, Tata Infotech (merged), Tata Toyo Radiator, Honeywell Automation India, Tata Communications, A G C Networks, Tata Technologies, Tata Projects, Tata Power, Tata Finance, Idea Cellular, Tata Motors, Tata Sons, Tata Steel, Tayo Rolls, Tata Securities, Tata Coffee, Tata Investment Corp

Directors labelled: A J Engineer, H H Malgham, H K Sethna, Keshub Mahindra, Ravi Kant, Russi Mody, Sujit Gupta, A S Bam, Amal Ganguli, D B Engineer, D N Ghosh, M N Bhagwat, N N Kampani, U M Rao, B Muthuraman, Ishaat Hussain, J J Irani, N A Palkhivala, N A Soonawala, R Gopalakrishnan, Ratan Tata, S Ramadorai, S Ramakrishnan
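The construction of this network can be sketched from a plain list of (director, company) pairs. The names below are placeholders rather than real directorships, and the assignment of companies to groups is supplied by hand here; the slide does not say how the two clusters were found.

```python
from collections import defaultdict


def node_sizes(directorships):
    """Circle sizes: positions held per person, directors seen per company."""
    person_size, company_size = defaultdict(int), defaultdict(int)
    for person, company in set(directorships):
        person_size[person] += 1
        company_size[company] += 1
    return dict(person_size), dict(company_size)


def bridging_directors(directorships, group_of):
    """Directors with positions in more than one company group."""
    groups_seen = defaultdict(set)
    for person, company in directorships:
        groups_seen[person].add(group_of[company])
    return sorted(p for p, groups in groups_seen.items() if len(groups) > 1)


# Placeholder data, purely illustrative.
pairs = [
    ("Director A", "Company X"),
    ("Director A", "Company Y"),
    ("Director B", "Company X"),
    ("Director C", "Company Y"),
]
group_of = {"Company X": "first", "Company Y": "second"}

print(node_sizes(pairs))
print(bridging_directors(pairs, group_of))
```

Each pair becomes one line in the diagram, and the bridging list corresponds to the leaders the slide calls out as connecting the two groups.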

Page 28: Data analytics in decision making

SIMILARITIES IN AN SME TRANSACTION NETWORK

The same visual was applied to the SME clientele of a bank

• Identified clusters of SMEs transacting with each other

• Targeted non-clients in the middle of a client cluster

• Enhanced service for client in the middle of non-clients

This resulted in a 28% QOQ GROWTH in new accounts (against a default QoQ base of 3-8% in the city for the last 5 years)

We’ve used network diagrams to detect terrorism, corporate fraud, de-dup customers, and identify product affinities

Page 29: Data analytics in decision making

MONITORING PERFORMANCE

Page 30: Data analytics in decision making

PORTFOLIO PERFORMANCE VISUAL

Worldwide: $288.0mn
A: Accelerate: $68.9mn
B: Build: $77.2mn
C: Cut down: $141.9mn

Worldwide: $288 mn

UK: 87.0
    Stores: 34.4 (Product 9: 6.2, Product 10: 5.4, Product 7: 5.1, Product 15: 4.8, Product 8: 3.1, Product 14: 2.1)
    Partners: 29.2 (Product 15: 6.7, Product 17: 4.1, Product 6: 3.4, Product 1: 3.2, Product 7: 2.9, Product 11: 2.4)
    Direct: 23.5 (Product 17: 5.2, Product 8: 4.4, Product 16: 4.0, Product 14: 2.5, Product 1: 2.5)

Japan: 71.9
    Stores: 25.9 (Product 14: 6.0, Product 7: 5.4, Product 11: 4.0, Product 17: 2.8)
    Partners: 25.5 (Product 8: 8.2, Product 11: 3.6, Product 16: 3.3, Product 1: 3.1, Product 9: 2.0)
    Direct: 20.5 (Product 11: 5.2, Product 15: 4.5, Product 14: 2.8, Product 9: 2.3)

China: 65.6
    Partners: 27.3 (Product 10: 8.0, Product 3: 7.1, Product 15: 3.0, Product 2: 2.1, Product 8: 2.0)
    Direct: 19.6 (Product 3: 5.5, Product 2: 4.7, Product 8: 2.6, Product 17: 2.1)
    Stores: 18.7 (Product 10: 5.4, Product 14: 2.2, Product 7: 2.1, Product 15: 2.0)

India: 46.6
    Stores: 17.5 (Product 16: 6.8)
    Direct: 15.6 (Product 10: 3.4, Product 16: 2.9, Product 17: 2.5, Product 7: 2.4)
    Partners: 13.4 (Product 8: 2.5, Product 7: 2.3)

US: 17.0
    Partners: 6.0 (Product 10: 4.4)
    Direct: 5.8 (Product 11: 3.9)
    Stores: 5.3 (Product 11: 3.8)

The visualization shows the market opportunities across various countries to identify areas of focus. This chart has been built as an interactive app to present the key findings, while letting users click through and drill down to a custom view across 4 different levels.
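The drill-down behind such a chart can be sketched as a nested structure with a roll-up function. The channel totals below are the slide's figures for two countries; the nesting (country, then channel) and the function names are assumptions about how the hierarchy might be stored, not a description of the actual app.

```python
# Channel totals ($ mn) for two countries, copied from the slide.
portfolio = {
    "Japan": {"Stores": 25.9, "Partners": 25.5, "Direct": 20.5},
    "US": {"Partners": 6.0, "Direct": 5.8, "Stores": 5.3},
}


def total(node):
    """Roll a node up to a single number by summing everything beneath it."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node


def drill_down(node, path):
    """Follow a click path such as ["Japan", "Stores"] into the hierarchy."""
    for key in path:
        node = node[key]
    return node


print(round(total(drill_down(portfolio, ["Japan"])), 1))
print(drill_down(portfolio, ["US", "Direct"]))
```

Each click adds one key to the path, so the same two functions serve every level of the view.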


Page 31: Data analytics in decision making

BANKING DASHBOARD

Product Profitability

Cross Holding Analysis

ATM Transactions

Branch Performance

Employee Productivity

600+ mn transactions

40+ GB of data

11,000+ ATMs

2000+ Branches

120+ products

Hourly view

Data processed

Page 35: Data analytics in decision making

LIVE MONITORING: IMPACT OF BUDGET ON STOCKS

Page 36: Data analytics in decision making

LEVERAGING CROSS-SELL

Page 37: Data analytics in decision making

FINDING PATTERNS

“Which securities move together?

How should I diversify?

What should I sell to reduce risk?

What’s a reliable predictor of a security?

SECURITIES

Page 38: Data analytics in decision making

68% correlation between AUD & EUR

Plot of 6 month daily AUD - EUR values

Block of correlated currencies

… clustered hierarchically
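A correlation matrix with hierarchical grouping can be sketched with the standard library alone. The series below are made-up toy values labelled A to D, not currency data, and single-linkage clustering on the distance 1 - r with a fixed threshold is one possible choice; the slide does not say which method was used.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equally long series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def cluster(series, threshold=0.5):
    """Single-linkage agglomerative clustering on distance 1 - r.

    Repeatedly merges the two closest clusters until the closest pair
    is further apart than the threshold."""
    dist = {
        frozenset(pair): 1 - pearson(series[pair[0]], series[pair[1]])
        for pair in combinations(series, 2)
    }

    def linkage(c1, c2):
        return min(dist[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [{name} for name in series]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # safe: j > i, so index i is unaffected
    return clusters


# Toy series: A and B rise together, C and D fall together.
series = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 11],
    "C": [5, 3, 4, 1, 2],
    "D": [10, 6, 9, 2, 3],
}
print(cluster(series))
```

Ordering the rows and columns of the correlation matrix by cluster is what makes the correlated blocks visible in the grid.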

Page 39: Data analytics in decision making

RESTAURANT: PRODUCT SALES CORRELATION

Page 40: Data analytics in decision making

RESTAURANT: PRODUCT SALES CORRELATION

Page 41: Data analytics in decision making

RESTAURANT FOUND AN UNUSUAL DIP IN SALES

A restaurant chain had data for every single transaction made over a few years. Plotting this as a time series showed them nothing unusual.

However, the same data on a calendar map reveals a very different story.

Specifically, at the bottom left point-of-sale terminal, sales dip every Wednesday. At the bottom right point-of-sale terminal, sales rise every Wednesday (almost as if to compensate for the loss).

It turns out that the manager closes the bottom-left counter every Wednesday afternoon due to shortage of staff, assuming that it results in no loss of sales. There is, however, a net loss every Wednesday.
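The weekday pattern that the calendar map exposes can also be checked numerically by averaging one terminal's daily sales per weekday. The figures below are toy values (100 on most days, 60 on Wednesdays) and the function name is illustrative; the restaurant's data is not in the slides.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_averages(daily_sales):
    """daily_sales maps a date to that day's sales for one terminal."""
    buckets = defaultdict(list)
    for day, amount in daily_sales.items():
        buckets[WEEKDAYS[day.weekday()]].append(amount)
    return {name: sum(values) / len(values) for name, values in buckets.items()}


# Four weeks of toy data starting on Monday 1 Jan 2024.
start = date(2024, 1, 1)
daily_sales = {}
for offset in range(28):
    day = start + timedelta(days=offset)
    daily_sales[day] = 60 if day.weekday() == 2 else 100

averages = weekday_averages(daily_sales)
print(min(averages, key=averages.get), averages)
```

Running the same calculation for every terminal and summing across terminals would show whether the rise at one counter makes up for the dip at the other.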

Page 42: Data analytics in decision making

BANK FOUND ALL LOANS BEFORE 20TH POOR

The personal finance division of a bank, focusing on retail loans, drove its sales through a branch sales team.

A study of the non-performing assets of loans generated over the course of one year shows a strange pattern.

Every loan disbursed after the 20th of the month, i.e. from the 21st to the end of the month, shows consistently lower non-performing assets (i.e. better quality) than any loan disbursed prior to the 20th.

The bank mapped this back to their incentive scheme. The sales team’s commission is based only on loans disbursed until the 20th. Hence new loans are squeezed into this period without regard for their quality.
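The comparison behind this finding amounts to splitting loans by disbursal day and comparing non-performing rates. The sketch below uses a handful of made-up loans; the tuple layout and function name are assumptions for illustration.

```python
from datetime import date


def npa_rate_by_cutoff(loans, cutoff_day=20):
    """loans: iterable of (disbursal_date, is_npa).

    Returns the share of non-performing loans disbursed up to the cutoff
    day of the month and after it."""
    counts = {"up to cutoff": [0, 0], "after cutoff": [0, 0]}  # [loans, NPAs]
    for disbursed, is_npa in loans:
        key = "up to cutoff" if disbursed.day <= cutoff_day else "after cutoff"
        counts[key][0] += 1
        counts[key][1] += bool(is_npa)
    return {key: (npa / n if n else None) for key, (n, npa) in counts.items()}


# Made-up loans, purely illustrative.
loans = [
    (date(2012, 5, 3), True),
    (date(2012, 5, 18), False),
    (date(2012, 5, 19), True),
    (date(2012, 5, 20), False),
    (date(2012, 5, 22), False),
    (date(2012, 5, 28), False),
]
print(npa_rate_by_cutoff(loans))
```

Computing the rate for each day of the month, rather than just the two halves, gives the calendar-style view in which the step at the 20th stands out.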

Analytics can detect something that you’re specifically looking for.

It takes a visual to detect what we don’t know to look for.

This representation, known as a calendar map, can show some interesting patterns, particularly weekday-based patterns, as the next example will show.

Page 43: Data analytics in decision making

MONITORING SOCIAL MEDIA

Page 49: Data analytics in decision making

UNSTRUCTURED CONTENT

Page 50: Data analytics in decision making

How does Mahabharata, one of the largest epics with 1.8 million words lend itself to text analytics?

Can this ‘unstructured data’ be processed to extract analytical insights?

What does sentiment analysis of this tome convey?

Is there a better way to explore relations between characters?

How can closeness of characters be analysed & visualized?
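One common way to approach the last question is to count how often two characters are named in the same sentence. This is a sketch of that idea only, with made-up sentences; it is not presented as the method used for the visual, and the sentence splitting is deliberately crude.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence(text, names):
    """Count, for each pair of names, the sentences that mention both."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]+", text):
        present = sorted(
            name for name in names
            if re.search(rf"\b{re.escape(name)}\b", sentence)
        )
        pairs.update(combinations(present, 2))
    return pairs


# Made-up sentences, for illustration only.
text = "Arjuna spoke to Krishna. Krishna advised Arjuna again. Bhima fought alone."
pairs = cooccurrence(text, ["Arjuna", "Krishna", "Bhima"])
print(pairs.most_common())
```

The resulting pair counts can be used as edge weights in a network of characters, with higher counts drawn as closer or thicker links.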

VISUALISING THE MAHABHARATA

Page 51: Data analytics in decision making

3642 LIC
3148 MTNL
2494 BSES
444 RELIANCE ENERGY
426 ESCROW
396 ICICI
378 CLG RTD
294 MAHANAGAR GAS
232 HDFC
216 MAHANGAR GAS LTD
212 ORANGE
204 LIC OF INDIA
190 ESCROW A/C

Page 53: Data analytics in decision making

BUILDING ANALYTIC CAPABILITY

DATA → INSIGHTS → ACTION

Page 54: Data analytics in decision making

TWO ROUTES TO BUILDING ANALYTIC CAPABILITY

Business driven approach: Stakeholder groups have a set of Objectives that can be met by Initiatives which answer specific Questions using Data

Data driven approach: Data suggests Questions that can address Initiatives that meet Objectives for Stakeholder groups

Importance: Revenue impact, Breadth of usage, Effort reduction

Ease: Data availability, Technology feasibility

Quick wins: Start small with quick wins

Strategic: Cover strategic landscape

Deferred: Deferreds become easier with growing capability

Legend: Actions; Gap in current reports; Addressed by current reports

Page 55: Data analytics in decision making

TYPICAL INITIATIVES WE SEE ACROSS BANKS TODAY

Performance: Deposit mobilisation; Product performance; Branch performance; Employee performance; Transaction performance (e.g. ATM)

Product management: Product bundling; Competitive positioning

Customer mgmt: Predicting churn; Driving cross-sell; Product recommendations

Risk management: Fraud detection; Scenario modelling (e.g. interest rate change)

Client communication: Data driven insights in statements; Social listening

Infrastructure Initiatives in parallel: Digitisation and Data Cleansing

Page 56: Data analytics in decision making

NEW TECHNIQUES MAKE THESE POSSIBLE

The visuals shown in the earlier slides were created using the Gramener visualization server, which leverages some of the recent innovations at Gramener in automating:

Visualizations

Visuals are templatized. As the data or the parameters change, the visuals are re-drawn to match the data, ensuring that the view shows live data in real-time.

For example, this has been used to
• view social media events
• election results
• oil leakages in fuel stations
• monitor retail inventory
• plan truck delivery
• monitor sentiments on social media

Analysis

We’ve extracted common patterns of insights that apply across all datasets. When data is fed in, these automated analysis components perform a sequence of analytic steps and display results visually.

This has been applied to
• identify which security would go well with a given portfolio
• predict which telecom customers will leave
• assess the impact of changing delivery channel for proxy votes

Narration

Binding visuals together into a logical story using text or audio that weaves a story is an integral part of communicating insights. This too is automated in Gramener’s visualizations.

This has been applied to
• automatically “writing” a newspaper column on the day’s stock market
• automatically writing the report summarising the status of clinical trials
• automated videos

AUTOMATION

These techniques are focused on automating patterns of insights made by humans – effectively systematizing the “magic” that happens when we find something interesting in data. This is similar to how chess playing programs work. It’s not intelligent, as such. It just calculates and evaluates so many moves automatically that it seems intelligent.

Page 57: Data analytics in decision making

TAKE YOUR NEXT STEP TOWARDS

DATA-DRIVEN LEADERSHIP

S Anand, Chief Data Scientist, Gramener