Copyright © 2012, SAS Institute Inc. All rights reserved.
High Performance Analytics and the Challenges of Big Data
SAS Business Analytics Forum
Vancouver, 28 Nov 2012
Charu Shankar, SAS Technical Specialist
Big Data
When the volume, velocity and variety of data exceed an organization’s storage or compute capacity for accurate and timely decision-making
Big Data is RELATIVE not ABSOLUTE
VOLUME
VARIETY
VELOCITY
VALUE
[Chart: DATA SIZE, TODAY vs. THE FUTURE]
THRIVING IN THE BIG DATA ERA
Most organizations:
Can’t generate the information they need.
Can’t generate information fast enough to act on it.
Continue to incur huge costs due to uninformed decisions and misguided strategies.
The opportunities afforded by analytics have never been greater.
THE ANALYTICS GAP OUR PERSPECTIVE
Does this look familiar?
Data is a corporate asset, yet organizations are not leveraging it the way they do the labour and capital assets they normally have.
Data is no longer in megabytes or gigabytes.
We’re talking petabytes, and a petabyte is 10^15 bytes.
1.1 VOLUME
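To put the jump from gigabytes to petabytes in numbers, here is a small arithmetic sketch. It assumes decimal (SI) units, where a petabyte is 10^15 bytes; the constant names are illustrative and not part of the original slides.

```python
# Decimal (SI) storage units, expressed in bytes.
MEGABYTE = 10**6
GIGABYTE = 10**9
PETABYTE = 10**15

# How many of the smaller units fit into one petabyte.
gigabytes_per_petabyte = PETABYTE // GIGABYTE
megabytes_per_petabyte = PETABYTE // MEGABYTE

print(f"1 PB = {gigabytes_per_petabyte:,} GB")
print(f"1 PB = {megabytes_per_petabyte:,} MB")
```

Under these units one petabyte is a million gigabytes, which is why storage and compute that were sized for gigabytes stop being adequate.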
1.2 VARIETY & this is a real life experience
The New
Blogosphere
YIKES!
The Old
Print Media only
1.3 Velocity: Big data is coming at high velocity. Are you ready?
VELOCITY
IDENTIFY / FORMULATE PROBLEM
DATA EXPLORATION
TRANSFORM & SELECT
BUILD MODEL
VALIDATE MODEL
DEPLOY MODEL
EVALUATE / MONITOR RESULTS
THE ANALYTICS LIFECYCLE
Data is the number one challenge in the adoption or use of business analytics.
Companies continue to struggle with data accuracy, consistency, and even access.
Bloomberg BusinessWeek Survey 2011
• Consumes up to 80% of the project
• Specific to the data and the analysis
DATA PREPARATION
2.1 Problem #1: Data preparation time is part of the problem
2.2 Problem #2: Shortage of talent
2.3 Problem #3: Our working ways don’t help
Health Data cholesterol counts
ANALYTICS
FORECASTING: Leveraging historical data to drive better insight into decision-making for the future
DATA MINING: Mine transaction databases for data of spending patterns that indicate a stolen card
TEXT ANALYTICS: Finding treasures in unstructured data like social media or survey tools that could uncover insights about consumer sentiment
OPTIMIZATION: Analyze massive amounts of data in order to accurately identify areas likely to produce the most profitable results
STATISTICS
BUSINESS INTELLIGENCE
3.1 Some Definitions
1. HPA (High Performance Analytics) is the ability to rapidly perform complex analysis on big data, enabling you to solve problems that you thought were unsolvable. You will see HP on the front of a proc.
2. HPA Server lifts data into memory. When it sees an HP PROC, it splits the work (sorting data, summarizing data) across worker nodes, which do the work in parallel.
3. SAS VA (Visual Analytics) provides a drag-and-drop web interface to enable you to quickly explore huge amounts of data.
4. Hadoop: think of it as an infinitely expandable filing cabinet that has the ability to help you summarize what is stored in it.
5. SAS LASR Server is part of HPAS (High Performance Analytic Server). Its role is to push data into memory.
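The idea of splitting a summary across worker nodes and then combining the partial results can be sketched in a few lines. This is a minimal illustration in Python, not SAS code and not how HPA Server is implemented: threads stand in for worker nodes, and the function names (`split_into_chunks`, `summarize_chunk`, `parallel_mean`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, n_workers):
    """Deal the rows out into n_workers roughly equal slices."""
    return [rows[i::n_workers] for i in range(n_workers)]


def summarize_chunk(chunk):
    """Work done by one worker: a partial count and a partial sum."""
    return len(chunk), sum(chunk)


def parallel_mean(rows, n_workers=4):
    """Compute the mean by combining per-worker partial summaries.

    Assumes rows is a non-empty list of numbers.
    """
    chunks = split_into_chunks(rows, n_workers)
    # Each chunk is summarized independently; threads play the role of nodes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(summarize_chunk, chunks))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count


values = list(range(1, 101))
print(parallel_mean(values))
```

The point of the design is that no worker needs the whole data set: each one returns a small partial summary, and only those summaries are combined at the end.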
Data Mining Models
Which products are customers likely to buy?
Which workers are likely to quit/resign/be fired?
Text Models
What are people saying about my products and services? Can I detect emerging issues from customer feedback or service claims?
Forecasting Models
How many products will be sold this year, next year?
How does this break down into each product over the next 3 months, 6 months?
Operations Research
What is the optimal inventory to hold of each product to minimize stock-outs and overall holding costs?
What is the least cost route for transporting goods from warehouses to final destinations? (PRESCRIPTIVE)
3.2 What questions should we be asking?
Range penetration: salary level compared to peers
WHAT CAN DATA MINING MODELS TELL US?
What questions to ask: Are we satisfied with age and income? What happens when we add location?
TELCO: Take customer satisfaction at a telco. If wait time is important, I might take action to put my best customers at the head of the line. I can influence customer satisfaction by understanding the underlying factors and then taking action to influence purchasing behaviour.
HEALTH: The next cure for cancer lies in big data. If we had a way to track, monitor, store and retrieve cancer patients’ way of life, we would be able to draw inferences that lead us to a cure.
PUBLIC HEALTH: Public Health Ontario built their own HP cluster for genomic analysis to analyze flu strains quickly. Have we seen this strain before, or is it a new pandemic? They don't want to wait weeks to understand the implications. SARS is an example of how long it took to get answers. An HP server removes the technology constraint.
The value of harvesting big data in different industries
Example: HPA in unemployment statistics
Saskatchewan - 5%
Alberta - 4.5%
Ontario - 7.9%
Looks like labour doesn't move easily.
What if we were able to compute the unemployment rate in the provinces early on? That’s where HP comes in: taking masses and masses of data on population growth and immigration levels, and then helping government design programs (incentives, etc.) to attract newcomers to provinces where unemployment rates are lower.
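As a toy version of that comparison, the sketch below ranks the three provincial rates quoted on the slide and measures each province's gap from the lowest rate. Only the three percentages come from the slide; the variable names are illustrative, and a real analysis would work from the underlying population and immigration data rather than three summary figures.

```python
# Unemployment rates (percent) as quoted on the slide.
rates = {"Saskatchewan": 5.0, "Alberta": 4.5, "Ontario": 7.9}

# Rank provinces from lowest to highest unemployment rate.
ranked = sorted(rates.items(), key=lambda item: item[1])
lowest_province, lowest_rate = ranked[0]

# Gap, in percentage points, between each province and the lowest rate.
gaps = {
    province: round(rate - lowest_rate, 1)
    for province, rate in rates.items()
}

for province, rate in ranked:
    print(f"{province}: {rate}% (+{gaps[province]} points vs {lowest_province})")
```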
HPA value: another example
More labour economics, this time about your work: the data scientist.
EMC survey: 65% of the respondents expect demand for data scientists to outstrip availability over the next five years.
3.3 How can HPA help?
Key Takeaways of working with big data using HPA
• Work with the entire data set, no longer just a sample
• Leverage real-time data access
Questions & Comments
SAS BLOG http://blogs.sas.com/content/sastraining/author/charushankar/
TWITTER CharuSAS