

Page 1: Mass tlc presentation    menninger

Data Science A Practitioner’s Perspective

Mass Technology Leadership Council Panel Discussion

David Menninger, Formerly VP & Research Director, Ventana Research

[email protected]

©2012, Ventana Research

Page 2: Mass tlc presentation    menninger

David Menninger

Former Vice President – Ventana Research

Now head of business development and strategy for EMC Greenplum.

Until last week, covered analytics, business intelligence and information management for Ventana Research. Over two decades of experience developing and bringing to market leading-edge technologies that help organizations analyze data to support a range of action-taking and decision-making processes.

Prior to joining Ventana Research, served as VP of Marketing and Product Management at Vertica Systems, Oracle, Applix, InforSense and IRI Software. Helped create over three-quarters of a billion dollars of shareholder value while serving in these roles.

Email: [email protected]


Page 3: Mass tlc presentation    menninger

Some Recent Relevant Research

Page 4: Mass tlc presentation    menninger

Volume and Velocity of Data Are Most Important in Evaluating Big Data Technology

Volume of data:
less than 1 TB: 10%
1-10 TB: 29%
11-100 TB: 31%
101 TB-1 PB: 13%
more than 1 PB: 11%
Don't know: 7%

Velocity of data:
less than 10 GB per day: 26%
11-100 GB per day: 33%
101 GB-1 TB per day: 20%
1-10 TB per day: 4%
More than 10 TB per …: 6%
Don't know: 12%

Source: Ventana Research The Challenge of Big Data Benchmark Research

Page 5: Mass tlc presentation    menninger

Hadoop Is Being Adopted or Considered by 54% of Enterprises

Production: 22%
Planned: 15%
Evaluating: 17%

Source: Ventana Research Hadoop Information Management Analytics Research
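The 54% headline on this slide appears to be the sum of the three adoption stages shown in the chart (22% production, 15% planned, 17% evaluating). A minimal Python sketch of that arithmetic follows; the names `hadoop_stages` and `adopting_or_considering` are illustrative and not from the research.

```python
# Adoption-stage shares as shown on the slide, in percent of enterprises.
hadoop_stages = {"Production": 22, "Planned": 15, "Evaluating": 17}


def adopting_or_considering(stages):
    """Add up the shares of every stage that counts as adopting or considering."""
    return sum(stages.values())


total = adopting_or_considering(hadoop_stages)
print(f"Adopting or considering Hadoop: {total}%")
```

Under that reading, the remaining share of enterprises reported no Hadoop plans or gave no answer.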

Page 6: Mass tlc presentation    menninger

…but the Vast Majority Use a Variety of Big Data Technologies

Values per row: Currently in production / Plan to use within 12 months / Plan to use in 12-24 months / Still evaluating / No plans to use

An RDBMS (for example, IBM DB2, Microsoft SQL Server, MySQL, Oracle) on standard hardware: 89% / 2% / 2% / 3% / 3%
Flat files: 70% / 7% / 1% / 4% / 18%
A DW appliance (for example, Netezza, Exadata, EMC Greenplum, Teradata): 34% / 11% / 3% / 21% / 31%
In-memory databases: 33% / 13% / 4% / 17% / 33%
Hadoop: 22% / 12% / 3% / 17% / 45%
Other: 26% / 4% / 4% / 10% / 57%
A specialized DBMS (for example, Aster Data, Infobright, Kognitio, ParAccel, Sybase IQ, Vertica): 15% / 9% / 5% / 19% / 51%

Source: Ventana Research The Challenge of Big Data Benchmark Research

Page 7: Mass tlc presentation    menninger

What Types of Applications?

What types of large-scale data applications is your organization running?

Values per row: Hadoop / Non-Hadoop

Query and reporting: 60% / 89%
Consolidation of multiple data sources for analysis: 63% / 71%
Custom/production application: 65% / 68%
Data preparation: 56% / 60%
Advanced analyses: 69% / 47%
Analysis or indexing of unstructured data: 46% / 32%
Data sandbox / data experimentation: 44% / 32%

Hadoop is most often used for advanced analyses and is more likely to be used to analyze unstructured data and for data sandboxing than other technologies. It is less likely to be used for query and reporting.

Source: Ventana Research Hadoop Information Management Analytics Research

Page 8: Mass tlc presentation    menninger

Predictive Analytics Still Emerging

Despite its potential, predictive analytics remains a specialist tool, ranking 10th among BI capabilities with only 13% using it …

Spreadsheets: 60%
Business Intelligence: 49%
Analytic Databases: 41%
Custom-built systems: 34%
Data warehouse: 28%
Planning and forecasting: 26%
Application server: 20%
LOB analytics: 18%
RDB: 14%
Predictive Analytics: 13%

… yet 80% ranked predictive analytics capabilities as important or very important

Source: Ventana Research Business Analytics Benchmark Research

Page 9: Mass tlc presentation    menninger

Forecasting and Marketing Are the Most Common Uses of Predictive Analytics

Values per row: Current / Future

Forecasting …: 72% / 24%
Marketing analyses …: 70% / 22%
Customer service or support …: 45% / 34%
Product recommendations or offers: 43% / 22%
Fraud detection: 34% / 31%
Intelligence or surveillance analysis: 28% / 28%
Social network analysis: 27% / 38%
Logistics analysis: 26% / 27%
Predicting product development …: 18% / 34%
Predicting prices in the supply chain: 17% / 36%
Scientific or clinical research: 17% / 27%
Healthcare decisions: 16% / 29%
Predicting mechanical failures: 9% / 33%
Other: 17% / 24%

Source: Ventana Research Predictive Analytics Benchmark Research

Page 10: Mass tlc presentation    menninger

Organizations Employ a Variety of Predictive Analytics Algorithms

Classification and regression trees / decision trees and Linear Regression are the most popular predictive analytics techniques used.

Values per row: Frequently / Occasionally / Not at all

Classification and regression trees / …: 69% / 25% / 6%
Linear Regression: 66% / 33% / (not shown)
Logistic regression or other discrete choice …: 61% / 29% / 10%
Association rules: 49% / 37% / 14%
K-nearest neighbors: 36% / 42% / 21%
Neural networks: 30% / 36% / 34%
Box-Jenkins, Autoregressive …: 30% / 35% / 35%
Exponential smoothing / double exponential …: 22% / 43% / 34%
Naïve Bayes: 21% / 43% / 36%
Support vector machines: 20% / 23% / 57%
Survival analysis: 15% / 41% / 44%
Monte Carlo Simulations: 13% / 47% / 40%

Source: Ventana Research Predictive Analytics Benchmark Research

Page 11: Mass tlc presentation    menninger

Who Designs and Deploys Predictive Analytics?

… but who should be performing these tasks?

Data Scientist / Data Mining Resources: 32%
Bus. Intelligence / Data Warehouse Team: 31%
Line-of-Business Analysts: 19%

Source: Ventana Research Predictive Analytics Benchmark Research

Page 12: Mass tlc presentation    menninger

Who Does the Best Job?

Satisfaction vs. Project Team

Specialized data scientist, statistical or data mining resources: 70%
Line of business analysts: 65%
Business intelligence and data warehouse team: 59%

Source: Ventana Research Predictive Analytics Benchmark Research

Page 13: Mass tlc presentation    menninger

Real-Time Scoring of New Records

More than half the organizations perform real-time scoring infrequently or not at all.

Regularly: 30%
Occasionally: 18%
Infrequently: 22%
Not at all: 30%

Source: Ventana Research Predictive Analytics Benchmark Research

Page 14: Mass tlc presentation    menninger

Organizations Need More Timely Results from Predictive Analytics

Satisfaction vs. Use of Real-time Scoring

Regularly: 88%
Occasionally: 73%
Infrequently or Not at all: 47%

Source: Ventana Research Predictive Analytics Benchmark Research

Page 15: Mass tlc presentation    menninger

Frequency of Updating Predictive Models

Constantly: 12%
Hourly: 2%
Daily: 6%
Weekly: 11%
Monthly: 14%
Quarterly: 22%
Less often than quarterly: 17%
Don't know: 16%

Most organizations don’t update their analytic models frequently enough. Nearly four in 10 update their models quarterly or less frequently.

Source: Ventana Research Predictive Analytics Benchmark Research
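The "nearly four in 10" figure can be checked by adding the Quarterly (22%) and Less often than quarterly (17%) shares from this slide. The Python sketch below does that and, as an illustration only, rolls the other answers into coarser buckets. The bucket names and groupings are assumptions made here and are not taken from the research.

```python
# Shares from the "Frequency of Updating Predictive Models" slide, in percent.
update_frequency = {
    "Constantly": 12,
    "Hourly": 2,
    "Daily": 6,
    "Weekly": 11,
    "Monthly": 14,
    "Quarterly": 22,
    "Less often than quarterly": 17,
    "Don't know": 16,
}

# Hypothetical grouping, chosen only for this illustration.
buckets = {
    "At least daily": ["Constantly", "Hourly", "Daily"],
    "Weekly or monthly": ["Weekly", "Monthly"],
    "Quarterly or less often": ["Quarterly", "Less often than quarterly"],
    "Don't know": ["Don't know"],
}


def bucket_shares(shares, grouping):
    """Sum the percentage shares of the answers assigned to each bucket."""
    return {
        name: sum(shares[answer] for answer in members)
        for name, members in grouping.items()
    }


grouped = bucket_shares(update_frequency, buckets)
for name, share in grouped.items():
    print(f"{name}: {share}%")
```

With these groupings, "Quarterly or less often" comes to 39%, which is the slide's "nearly four in 10".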

Page 16: Mass tlc presentation    menninger

Organizations that Update Models More Frequently Have Higher Satisfaction

Satisfaction vs. Model Updates

At Least Daily: 81%
At Least Monthly: 74%
Less Frequently: 48%

Source: Ventana Research Predictive Analytics Benchmark Research

Page 17: Mass tlc presentation    menninger

Most Organizations Are Not Providing Adequate Support and Training

Values per row: Adequately / Only somewhat adequately / Inadequately

Training in predictive analytics concepts and techniques: 44% / 32% / 24%
Product training: 42% / 33% / 26%
Training in the application of predictive analytics to business problems: 39% / 38% / 23%
Specialized consulting resources (internal or external): 31% / 39% / 31%
Help desk resources: 24% / 34% / 42%

Source: Ventana Research Predictive Analytics Benchmark Research

Page 18: Mass tlc presentation    menninger

What Types of Training and Support Are Most Effective?

Satisfaction vs. Training and Support

Training in predictive analytics concepts and techniques: 89%
Help desk resources: 89%
Training in the application of predictive analytics to business problems: 86%
Product training: 79%
Specialized consulting resources (internal or external): 77%

Source: Ventana Research Predictive Analytics Benchmark Research
