

Copyright © 2012, SAS Institute Inc. All rights reserved.

High Performance Analytics and the Challenges of Big Data SAS Business Analytics Forum

Vancouver, 28 Nov 2012

Charu Shankar, SAS Technical Specialist

Big Data

When the volume, velocity and variety of data exceed an organization’s storage or compute capacity for accurate and timely decision-making.

Big data is RELATIVE, not ABSOLUTE.

THRIVING IN THE BIG DATA ERA

[Figure: data size, today vs. the future, across four dimensions: volume, variety, velocity and value]

THE ANALYTICS GAP

Most organizations:

• Can’t generate the information they need.
• Can’t generate information fast enough to act on it.
• Continue to incur huge costs due to uninformed decisions and misguided strategies.

OUR PERSPECTIVE

The opportunities afforded by analytics have never been greater.

Does this look familiar?

Data is a corporate asset, yet organizations are not leveraging it the way they leverage the labour and capital assets they normally have.

1.1 VOLUME

Data is no longer measured in megabytes or gigabytes. We’re talking petabytes, and a petabyte is 10^15 bytes.
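For scale, assuming decimal (SI) prefixes, a petabyte relates to the smaller units as follows; the binary counterpart, the pebibyte, is shown for comparison only.

```latex
\begin{aligned}
1\ \text{PB}  &= 10^{15}\ \text{bytes} = 10^{6}\ \text{GB} = 10^{9}\ \text{MB} \\
1\ \text{PiB} &= 2^{50}\ \text{bytes} \approx 1.126 \times 10^{15}\ \text{bytes}
\end{aligned}
```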

1.2 VARIETY: a real-life experience

The old: print media only.

The new: Facebook, Twitter, LinkedIn, the blogosphere. YIKES!

1.3 VELOCITY

Big data is coming at high velocity. Are you ready?

2.1 Problem #1: Data preparation time is part of the problem

[Figure: THE ANALYTICS LIFECYCLE, a cycle of stages: identify/formulate problem, data preparation, data exploration, transform & select, build model, validate model, deploy model, evaluate/monitor results]

Data preparation:

• Consumes up to 80% of the project
• Is specific to the data and the analysis

“Data is the number one challenge in the adoption or use of business analytics. Companies continue to struggle with data accuracy, consistency, and even access.” (Bloomberg BusinessWeek Survey, 2011)

2.2 Problem #2: Shortage of talent

2.3 Problem #3: Our ways of working don’t help

Health data: cholesterol counts

ANALYTICS / BUSINESS INTELLIGENCE

• Forecasting: leveraging historical data to drive better insight into decision-making for the future
• Data mining: mining transaction databases for spending patterns that indicate a stolen card
• Text analytics: finding treasures in unstructured data, such as social media or survey tools, that could uncover insights about consumer sentiment
• Optimization: analyzing massive amounts of data to accurately identify the areas likely to produce the most profitable results
• Statistics
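To make the stolen-card example concrete, one very simple way to flag spending that departs from a card's usual pattern is a z-score test against that card's own history. This is a hypothetical illustration, not the data mining method the slide refers to: the `flag_unusual` helper, the 3-standard-deviation threshold and the sample amounts are all assumptions.

```python
import statistics


def flag_unusual(history: list[float], new_amounts: list[float], threshold: float = 3.0) -> list[float]:
    """Return the new amounts lying more than `threshold` standard deviations
    from the mean of the card's past transaction amounts (hypothetical helper)."""
    if len(history) < 2:
        raise ValueError("need at least two past transactions to estimate spread")
    mean = statistics.fmean(history)
    sd = statistics.stdev(history)  # sample standard deviation of past spending
    if sd == 0:
        # No variation in the history: treat any different amount as unusual.
        return [amt for amt in new_amounts if amt != mean]
    return [amt for amt in new_amounts if abs(amt - mean) / sd > threshold]


if __name__ == "__main__":
    history = [42.0, 38.5, 55.0, 47.25, 40.0, 51.0]  # illustrative amounts only
    print(flag_unusual(history, [45.0, 900.0]))
```

With the illustrative history above, `flag_unusual(history, [45.0, 900.0])` returns `[900.0]`. Production fraud models look at many more signals (merchant, location, timing), so treat this only as a picture of the idea of "a pattern that doesn't fit".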

3.1 Some Definitions

1. HPA (high-performance analytics) is the ability to rapidly perform complex analysis on big data, enabling you to solve problems you thought were unsolvable. HPA procedures are identified by “HP” at the front of the PROC name.

2. HPA Server: lifts data into memory. When it sees an HP PROC, it splits the work across worker nodes, so that tasks such as sorting and summarizing data are done in parallel.

3. SAS VA (Visual Analytics) provides a drag-and-drop web interface that lets you quickly explore huge amounts of data.

4. Hadoop: think of it as an infinitely expandable filing cabinet that can also help you summarize what is stored in it.

5. SAS LASR Server: part of HPAS (High-Performance Analytics Server). Its role is to push data into memory.
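The HPA Server description (split the work across worker nodes, then combine the results) follows a split-summarize-combine pattern. The Python sketch below illustrates that pattern for summary statistics. It is not SAS code and does not reflect how the HPA Server is actually implemented; the `Partial`, `summarize_partition`, `merge` and `parallel_summary` names are hypothetical, and a local thread pool stands in for the worker nodes.

```python
import math
import statistics
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Partial:
    """Summary of one partition of the data (hypothetical structure)."""
    count: int
    mean: float
    m2: float  # sum of squared deviations from the partition mean
    minimum: float
    maximum: float


def summarize_partition(values: list[float]) -> Partial:
    # Each worker summarizes only its own slice of the data.
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values)
    return Partial(len(values), mean, m2, min(values), max(values))


def merge(a: Partial, b: Partial) -> Partial:
    # Combine two partial summaries without revisiting the raw rows,
    # using the standard pairwise update for means and variances.
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return Partial(n, mean, m2, min(a.minimum, b.minimum), max(a.maximum, b.maximum))


def parallel_summary(data: list[float], workers: int = 4) -> dict[str, float]:
    if len(data) < 2:
        raise ValueError("need at least two values")
    # Split the data into roughly equal, non-empty partitions, one per worker.
    size = math.ceil(len(data) / workers)
    partitions = [data[i:i + size] for i in range(0, len(data), size)]
    # A thread pool stands in for worker nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_partition, partitions))
    total = reduce(merge, partials)
    return {
        "n": total.count,
        "mean": total.mean,
        "std": math.sqrt(total.m2 / (total.count - 1)),  # sample standard deviation
        "min": total.minimum,
        "max": total.maximum,
    }


if __name__ == "__main__":
    data = [float(x) for x in range(1, 100_001)]
    print(parallel_summary(data))
```

Each worker hands back five numbers rather than its raw rows, which is what lets this pattern scale. In CPython, threads do not run CPU-bound work truly in parallel, so a real deployment would use separate processes or machines; the split and merge logic stays the same.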

3.2 What questions should we be asking?

Data mining models
• Which products are customers likely to buy?
• Which workers are likely to quit, resign or be fired?

Text models
• What are people saying about my products and services? Can I detect emerging issues from customer feedback or service claims?

Forecasting models
• How many products will be sold this year? Next year?
• How does this break down for each product over the next 3 months? 6 months?

Operations research
• What is the optimal level of inventory to hold for each product to minimize stock-outs and overall holding costs?
• What is the least-cost route for transporting goods from warehouses to final destinations? (PRESCRIPTIVE)

WHAT CAN DATA MINING MODELS TELL US?

Range penetration: salary level compared to peers
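The slide does not define range penetration. A common HR definition, assumed here, is where a salary sits within its pay range, expressed as a fraction of the range; comparing that value across employees in the same grade gives the peer comparison the slide mentions. The `range_penetration` helper and the example pay range are hypothetical.

```python
def range_penetration(salary: float, range_min: float, range_max: float) -> float:
    """Where a salary sits within its pay range: 0.0 at the minimum, 1.0 at the maximum.
    Values below 0 or above 1 mean the salary falls outside the range."""
    if range_max <= range_min:
        raise ValueError("range_max must be greater than range_min")
    return (salary - range_min) / (range_max - range_min)


if __name__ == "__main__":
    # Hypothetical pay range of 50,000 to 80,000
    print(range_penetration(65_000, 50_000, 80_000))
```

For the hypothetical range above, a salary of 65,000 sits exactly at the midpoint, so the function returns 0.5. A value like this could then serve as one input to a data mining model, for example one predicting which workers are likely to resign.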

What questions should we ask? Are we satisfied with age and income alone? What happens when we add location?

The value of harvesting big data in different industries

TELCO: For customer satisfaction at a telco, if wait time turns out to be important, I might take action to put the best customers at the head of the line. By understanding the underlying factors, I can influence customer satisfaction and then take action to influence purchasing behaviour.

HEALTH: The next cure for cancer lies in big data. If we had a way to track, monitor, store and retrieve cancer patients’ ways of life, we would be able to draw inferences that lead us to a cure.

PUBLIC HEALTH: Public Health Ontario built its own HP cluster for genomic analysis so it can analyze flu strains quickly: have we seen this strain before, or is it a new pandemic? We don’t want to wait weeks to understand the implications; SARS is an example of how long it took to get answers. The HP server removes the technology constraint.

Example: HPA in unemployment statistics

Unemployment rates:

• Saskatchewan: 5%
• Alberta: 4.5%
• Ontario: 7.9%

Looks like labour doesn’t move easily. What if we were able to compute provincial unemployment rates early on? That’s where HP comes in: taking masses and masses of data on population growth and immigration levels, and then helping government design programs, such as incentives, to attract newcomers to provinces where unemployment rates are lower.

HPA value: another example

More labour economics, this time about your own work: the data scientist. In an EMC survey, 65% of respondents expected demand for data scientists to outstrip availability over the next five years.

3.3 How can HPA help?

Key takeaways of working with big data using HPA

• Work with the entire data set, no longer just a sample
• Leverage real-time data access

Questions & Comments

SAS blog: http://blogs.sas.com/content/sastraining/author/charushankar/

Twitter: CharuSAS