Karl Rexer, PhD
President, Rexer Analytics
www.RexerAnalytics.com

2010 Data Miner Survey Highlights … The Views of 735 Data Miners

Predictive Analytics World
Washington, DC
October 2010



© 2010 Rexer Analytics 2

2010 Data Miner Survey: Overview

•  Fourth annual survey

•  47 questions

•  10,000+ invitations emailed, plus newsgroups, vendors, and snowball referrals

•  Respondents: 735 data miners from 60 countries

Respondents by type:
•  Corporate 33%
•  Consultants 31%
•  Academics 12%
•  NGO / Gov’t 5%
•  Vendors 19%

Note: Data from tool vendors was excluded from many analyses.

Respondents by region:
•  North America 45% (USA 40%, Canada 4%)
•  Europe 36% (Germany 7%, UK 5%, France 4%, Poland 4%)
•  Asia Pacific 12% (India 4%, Australia 3%, China 2%)
•  Central & South America 4% (Colombia 2%, Brazil 1%)
•  Middle East & Africa 3% (Israel 1%, Turkey 1%)


Fields Applying Data Mining

Question: In what fields do you TYPICALLY apply data mining? (Select all that apply)

CRM / Marketing: 41%
Financial: 29%
Academic: 25%
Insurance: 15%
Telecommunications: 15%
Retail: 14%
Pharmaceutical: 13%
Technology: 13%
Medical: 11%
Manufacturing: 10%
Internet-based: 10%
Government: 10%

•  CRM / Marketing, Financial, and Academic are the most commonly reported fields. This has been consistent since the 2007 survey.
   –  Many data miners work in several fields.


Data Mining Algorithms

Question: What algorithms/analytic methods do you TYPICALLY use? (Select all that apply)

Decision Trees: 69%
Regression: 68%
Cluster Analysis: 60%
Time Series: 32%
Neural Nets: 31%
Factor Analysis: 27%
Text Mining: 26%
Association Rules: 25%
Ensemble Models: 22%
Support Vector Machines: 21%
Bayesian: 21%
Anomaly Detection: 16%
Survival Analysis: 14%
Rule Induction: 13%
Social Network Analysis: 12%
Genetic Algorithms: 11%
Link Analysis: 9%
Uplift Modeling: 9%
MARS: 8%

•  Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. This has been very consistent from year to year.
•  However, data miners also use a wide variety of other algorithms.

Use by respondent type:
Method | Corporate | Consultants | Academic | NGO / Gov’t
Ensemble Models | 21% | 27% | 20% | 18%
Uplift Modeling | 10% | 12% | 4% | 5%


Text Mining

•  About a third of data miners currently incorporate text mining into their analyses, and another third plan to.

Text mining status:
•  Text Miners 30%
•  Plan to Start Text Mining 34%
•  No Plans to Conduct Text Mining 36%

Software Used (Text Miners):
STATISTICA Text Miner 19%
IBM SPSS Modeler 17%
SAS Text Miner 9%
IBM SPSS Text Analytics 7%
RapidMiner 6%
Provalis WordStat 2%
GATE 2%
KXEN 2%
Oracle Text or ODM 1%
Megaputer Text Analyst 1%
Autonomy 1%
Other 35%

Chart: How text miners use text mining (extracting key themes; sentiment analysis; text fields as inputs / predictors in a larger model; text mining as part of social network analyses)


Computing Environments

•  A lot of data mining happens on desktop and laptop computers.
•  Frequently the data and processing are local (not on a server, mainframe, or cloud).
•  Only a small minority of data mining happens in the cloud.

Question: What are the computing environments/platforms on which data mining/analytics occurs at your company/organization? (Check all that apply)

Platform | Overall | Corporate | Consultant | Academic | NGO / Gov’t | Vendor
Cloud Computing | 7% | 5% | 10% | 7% | 3% | 14%
Centralized Mainframe/Server | 18% | 20% | 16% | 14% | 32% | 26%
Local Server | 26% | 28% | 30% | 19% | 29% | 45%
Desktop PC/Workstation (with data & processing on server, mainframe or cloud) | 39% | 48% | 36% | 25% | 47% | 39%
Desktop PC/Workstation (with data & processing locally) | 49% | 43% | 49% | 58% | 58% | 35%
Laptop PC (with data & processing on server, mainframe or cloud) | 24% | 29% | 24% | 15% | 32% | 37%
Laptop PC (with data & processing locally) | 35% | 28% | 36% | 46% | 42% | 44%


Analytic Capability & Data Quality

Analytic Capability Question: How do you rate the analytic capabilities of your company/organization?

Data Quality Question: How do you rate the quality of data available for analysis at your company/organization?

•  Analytic capability:
   –  There’s room to improve if we’re going to “Compete on Analytics”.
•  Data quality:
   –  48% rate it “strong” or “very strong” (same as last year)
   –  16% rate it “poor” or “very poor” (13% last year)


Overcoming Challenges: Best Practices

•  Top challenges facing data miners:
   –  Dirty data: the #1 challenge every year, 2007-2010
   –  Explaining data mining to others: in the top 4 challenges every year, 2007-2010
   –  Difficult access to data: in the top 3 challenges every year, 2007-2010

•  This year, survey respondents provided “Best Practices” for overcoming these challenges.
   –  E.g., Dirty Data: Use anomaly detection to flag records to put before subject matter experts.
   –  E.g., Dirty Data: All projects begin with low-level data reports showing counts of records, verification of keys (uniqueness, widows/orphans), and distributions of field contents. These reports are echoed back to the data content experts.
   –  See the list of Best Practices at www.RexerAnalytics.com in early November.
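The two Dirty Data practices above (low-level reports on record counts, key uniqueness, widows/orphans, and field distributions, plus flagging anomalous records for subject matter experts) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a method the survey respondents described: each table is assumed to be a list of Python dicts, the function names are hypothetical, "orphans" is taken to mean child records whose foreign key matches no parent and "widows" parent records that no child refers to, and the anomaly detector is a simple z-score rule chosen only as one of many possible techniques.

```python
from collections import Counter
from statistics import mean, stdev


def profile_table(rows, key):
    """Hypothetical low-level report: record count, key checks, field distributions."""
    keys = [r.get(key) for r in rows]
    key_counts = Counter(k for k in keys if k is not None)
    fields = sorted({f for r in rows for f in r})
    return {
        "record_count": len(rows),
        "missing_keys": sum(1 for k in keys if k is None),
        # Keys that appear on more than one record (uniqueness check).
        "duplicate_keys": sorted(k for k, n in key_counts.items() if n > 1),
        # The five most common values in each field, to echo back to content experts.
        "distributions": {
            f: Counter(r.get(f) for r in rows).most_common(5) for f in fields
        },
    }


def find_orphans(children, fk, parents, pk):
    """Child records whose foreign key matches no parent record (assumed definition)."""
    parent_keys = {p.get(pk) for p in parents}
    return [c for c in children if c.get(fk) not in parent_keys]


def find_widows(parents, pk, children, fk):
    """Parent records that no child record refers to (assumed definition)."""
    child_keys = {c.get(fk) for c in children}
    return [p for p in parents if p.get(pk) not in child_keys]


def flag_outliers(rows, field, z=3.0):
    """Records whose numeric value lies more than z sample standard deviations from the mean."""
    values = [r[field] for r in rows if isinstance(r.get(field), (int, float))]
    if len(values) < 2:
        return []
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return []
    return [
        r for r in rows
        if isinstance(r.get(field), (int, float)) and abs(r[field] - mu) / sd > z
    ]


if __name__ == "__main__":
    # Hypothetical example tables.
    customers = [{"id": 1, "region": "East"}, {"id": 2, "region": "West"},
                 {"id": 2, "region": "West"}, {"id": 3, "region": None}]
    orders = [{"order_id": 10, "cust_id": 1, "amount": 20.0},
              {"order_id": 12, "cust_id": 9, "amount": 22.0}]
    print(profile_table(customers, "id"))
    print(find_orphans(orders, "cust_id", customers, "id"))  # order 12: no customer 9
    print(find_widows(customers, "id", orders, "cust_id"))   # customers 2, 2 and 3 have no orders
```

A plain z-score is sensitive to the very outliers it is looking for (one extreme value inflates the standard deviation), so with small tables a lower threshold or a median-based variant may work better; either way, the flagged records and the reports themselves are what go back to the subject matter experts.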


Data Mining Software

Survey Questions:
•  What data mining/analytic tools did you use in 2009? (Rate each as “never”, “occasionally”, or “frequently”.)
•  What one data mining software package do you use most frequently?

Chart: results shown overall and by respondent type (Corporate, Consultants, Academics, NGO / Gov’t)

•  The average data miner reports using 4.6 software tools.
•  R is used by the most data miners (43%).
•  STATISTICA is the primary data mining tool chosen most often (18%).


Satisfaction with Data Mining Tools

Question: Please rate your overall satisfaction with your primary Data Mining software package.

Chart: satisfaction ratings by primary tool, 2010 vs. 2009 (legend marks tools with sample size < 20)

•  STATISTICA received the highest satisfaction ratings. Consistent with the 2009 findings, R and SPSS Modeler users are also quite satisfied.
   –  About 80% of STATISTICA and R users also report that they are extremely likely to stay with these primary tools over the next 3 years. Only 42-45% of SAS, SPSS Statistics, and SAS-EM users report this, and only 18% of Weka users.

Continued Use question (not graphed): What is the likelihood that you will continue to use this tool as your primary Data Mining software package over the next 3 years?


Data Mining and the Economy

Question: How will the number of data mining projects your organization conducts in 2010 compare to what has been typical in the past few years?

Chart: Number of Data Mining Projects in 2010

There is a strong market for data mining:
•  73% of data miners foresee increases in the number of data mining projects.
•  Offshoring of data mining is also increasing: it is reported by 14% of data miners this year (8% last year).

Offshoring Question (not graphed): Has your company moved any data mining or other analytics to another country to take advantage of lower wages in the destination country?


Future Trends in Data Mining

Question: “What do you envision as the primary future trends in data mining?” (open-ended survey question)

Number of respondents mentioning each trend:
Growth in Data Mining Adoption: 50
Text Mining: 32
Social Network Analysis: 32
Automation: 26
Cloud Computing: 15
Data Visualization: 15
Tools Get Easier to Use: 12
Scaling to Bigger Data: 11


How to Get More Information

•  Questions?
   –  Talk with me at PAW
   –  Call or email me if you don’t see me in the hallways
•  Copy of these slides
   –  Available now
•  2010 Data Miner Survey Summary Report (free)
   –  Available in early November
   –  Available at the PAW website, or email me
•  Best Practices for overcoming data mining challenges
   –  Available in early November at www.RexerAnalytics.com

Karl Rexer, PhD
[email protected]
www.RexerAnalytics.com
617-233-8185