Integrating Hadoop Into Business Intelligence & Data Warehousing
Philip Russom, TDWI Research Director for Data Management, April 9, 2013

TDWI would like to thank the following companies for sponsoring the 2013 TDWI Best Practices research report:

Integrating Hadoop into Business Intelligence and Data Warehousing

This presentation is based on the findings of that report.

Today’s Agenda

• Definitions
– What is Hadoop? Its components?
– Why care about Hadoop’s integration with BI & DW?
• State of Hadoop Integration
– Benefits and Barriers
– Problems and Opportunities
• Hadoop Best Practices
– Developer Productivity
– Specific Techniques
• Trends in Hadoop Integration
• Top Ten Priorities
– for Integrating Hadoop with BI & DW

PLEASE TWEET: @pRussom, #TDWI, #Hadoop, #HDFS, #Analytics, #BigData

Ten Facts About Hadoop
These bust Hadoop’s ten most common myths.

1. Hadoop consists of multiple products.
2. Hadoop is open source from the Apache Software Foundation (apache.org), but available from vendors, too.
3. Hadoop is an ecosystem, not a single product.
4. The Hadoop Distributed File System (HDFS) is a file system, not a database mgt system.
5. Hive QL resembles SQL, but isn’t standard SQL.
6. HDFS and MapReduce are related, but don’t require each other.
7. MapReduce provides control for analytics, not analytics per se.
8. Hadoop is about data diversity, not just data volume.
9. Hadoop complements a DW, rarely replaces one.
10. Hadoop enables many types of analytics, not just Web analytics.

Hadoop Technologies in Use Today and Tomorrow

• HDFS & a few add-ons are the most common Hadoop products today
– MapReduce – Distributed processing of hand-coded logic, whether for analytics or other apps
– Hive – Projects structure onto Hadoop data, to query it with SQL-like language called HiveQL
– HBase – Simple, record-store database functions w/ HDFS’ data
• Some Hadoop tools are rare today:
– Chukwa, Ambari, Oozie, Hue, Flume
• Some will see aggressive growth:
– Mahout – Recommendation engine
– R – Language for analytics
– HCatalog – Metadata management
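
To make "distributed processing of hand-coded logic" more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps in plain Python. It does not call Hadoop or any Hadoop API, and nothing here is distributed. The function names (map_phase, shuffle, reduce_phase, run_job) and the sample records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Hand-coded map logic: emit a (word, 1) pair for each word in one line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Hand-coded reduce logic: sum the counts collected for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over an iterable of text lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input lines, standing in for records read from files.
    sample = ["hadoop stores files", "hive queries hadoop data"]
    print(run_job(sample))
```

The developer writes only the map and reduce logic; in this sketch the grouping step is written out by hand so the whole flow is visible in one place.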

Status of HDFS Implementations

• HDFS is used by a small minority of organizations today.
– Only 10% of survey respondents report having reached a production deployment.
• A whopping 73% of respondents expect to have HDFS in production.
– 10% are already in production, with another 63% coming.
– Only 27% of respondents say they will never put HDFS in production.
• HDFS usage will go from scarce to ensconced in three years.
– If survey respondents’ plans pan out, HDFS and other Hadoop products and technologies will be quite common in the near future.
• HDFS will have a large impact on
– BI, DW, DI, and analytics
– IT and data management in general
– How businesses leverage these
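
As a quick check on how the percentages above fit together, the short sketch below adds them up. The three figures are the ones quoted on the slide; the variable names are only illustrative.

```python
# Percentages of survey respondents, as quoted on the slide.
in_production_today = 10
planning_production = 63
never_in_production = 27

# Respondents who expect HDFS in production = already there + still coming.
expected_in_production = in_production_today + planning_production

assert expected_in_production == 73
assert expected_in_production + never_in_production == 100

print(f"{expected_in_production}% expect HDFS in production; "
      f"{never_in_production}% never will")
```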

Potential Benefits of Hadoop Integration
In priority order, based on survey responses

• Hadoop’s primary application = big data source for analytics (71%)

– Other apps: data archiving (20%); schema-free data staging (19%); managing machine data from robots, sensors, meters, etc. (17%)

• Hadoop-based analytics yields new facts about a business

– Information exploration and discovery (33%); exploratory analytics with big data (48%)

• Hadoop supports advanced forms of analytics, beyond OLAP

– Data mining, statistical analysis, complex SQL, and so on (68%); often coupled with data visualization (25%)

• HDFS complements a data warehouse (30%)

– Handles advanced analytics and multi-structured data

– So DW can stay focused on reporting, OLAP, performance mgt, etc.

• Extreme scalability (19%) on low-cost hardware and software (26%)

– So users can capture more data than before (24%)

Challenges to Hadoop Integration
In priority order, based on survey responses

• Inadequate staffing or skills for big data analytics (62%)

– HDFS and Hadoop tools (in their current state) demand a fair amount of hand-coding in languages that the average BI professional does not know well, namely Java, R, and Hive. Tools will get better.

• Tools for Hadoop are few and immature (28%)

– Hadoop tools lack adequate metadata management (25%); don’t handle data in real time (22%); don’t support standard SQL

– Tools get better about these almost daily

• Changes required for successfully integrating Hadoop with BI/DW

– Adjustments to an existing user-defined DW architecture (27%)

– Best practices are emerging, so this point will become moot

• Good News – Scalability is not a barrier to Hadoop usage

– Only 8% anticipate problems scaling up HDFS & other Hadoop tools

Integrating Hadoop with BI, DW, & Analytics is an Opportunity, not a Problem

Why care about Hadoop integration now?
Because it enables new, compelling apps.

• Hadoop scales with file-based big data

– Imagine HDFS as shared infrastructure, similar to SAN & NAS

– Imagine a huge, live archive

– Imagine content mgt on steroids

– Imagine low price per terabyte

• HDFS extends BI, DW, analytics…

– Managing multi-structured data

– Repository for detail source data

– Processing big data for analytics

– Advanced forms of analytics

– Data staging on steroids

DW Architectures are growing more distributed.

• System on the Side (SOS) or Edge System
– A workload and its data that’s deployed on a platform separate from the EDW
– Usually integrates with EDW, so not a silo
• Long-standing tradition of SOSs w/EDWs
– Data marts, operational data stores (ODSs), data staging areas
– Workload types: analytics, real-time, detailed source data, unstructured data
• Trend
– As workloads increase in number, so do SOSs and Edge Systems
– Each analytic method (or even each analytic application) may need its own SOS
• Hadoop can enable some DW areas
– Data staging, analytic sandboxes, detailed source data, multi-structured data mgt
– MapReduce for analytic processing, HBase for record stores, Hive for unstructured queries…
• Core EDW remains a killer app for…
– Standard reports, OLAP, performance mgt, dashboards, real-time operational BI, etc.

[Figure: Many Systems on the Side (SOSs) or Edge Systems can surround a central DW in a heavily distributed architecture. Diagram labels: EDW; Federated Data Marts; Real Time ODS; Customer Mart or ODS; No-SQL Database; Hadoop Distributed File Sys; Data Staging Area; Metrics for Performance Mgt; OLAP Cubes; Multi-dimensional Data Models; Detailed Source Data; Analytic Sand Box; DW Appliance; Columnar DBMS; Map Reduce; Data Mining Cache; Star or Snowflake Schema.]

STAT BITES

Organizations surveyed that have HDFS in production…

• Have 12 HDFS clusters on average

– Median is 2

• Have 45 nodes per cluster on average

– Median is 12

• Manage a few TBs in HDFS today

– But expect a half PB within 3 years

• Load HDFS mostly via batch every 24 hrs

– So, not much streaming big data yet
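
The gap between the averages and the medians above suggests a skewed distribution, where a few very large deployments pull the mean up. The sketch below illustrates that effect using Python's standard statistics module. The cluster counts are hypothetical, chosen only so that the mean and median match the slide's 12 and 2; they are not survey data.

```python
from statistics import mean, median

# Hypothetical cluster counts for five organizations (NOT survey data).
# One large deployment is enough to pull the mean far above the median.
clusters_per_org = [1, 1, 2, 2, 54]

average = mean(clusters_per_org)
middle = median(clusters_per_org)

assert average == 12
assert middle == 2

print(f"mean = {average}, median = {middle}")
```

Under this reading, the median is the better guide to what a typical organization runs, while the mean mostly reflects the largest installations.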

Most Needed Improvements in Hadoop Techs

• Security

– Needs to go beyond simple file-permission checks & become more granular

• Administration

– Need better tools for admin and deployment of clusters

• NameNode reliability

– Patches are available

• Latency issues

– Users want real-time (31%), fast queries (29%), streaming data (25%)

• Development tools

– For metadata, query design, less hand coding

Job Titles for Hadoop Workers

• Architects

– For data/BI, apps, generic

• Developers

– For apps, data/BI

• Data Scientists

– This job title is slowly replacing analyst titles

• Analysts

– Business, data, system

• Miscellaneous

– Ranges from engineers to marketers

– Ever-broadening range of end users who depend on data

Trends – What Tool Types are Users Integrating with Hadoop?

Tool type                       Integrated Today;      Will Integrate    No Plans to
                                Will Stay Integrated   Within 3 Years    Integrate
Analytic tools                  46%                    44%               10%
Data warehouses                 46%                    42%               13%
Reporting tools                 44%                    40%               17%
Web servers                     44%                    38%               19%
Data integration tools          40%                    44%               17%
Analytic databases              38%                    46%               17%
Data visualization tools        38%                    42%               21%
Operational applications        27%                    38%               35%
Data marts                      25%                    40%               35%
Third-party data providers      21%                    50%               29%
Data quality tools              19%                    52%               29%
Master data management tools    13%                    52%               35%
Machinery (robots, vehicles)    13%                    35%               52%
Sensors (thermometers, etc.)    8%                     38%               54%

GROUP 1 – BI, DW, DI, and Analytics
Commonly Integrated with Hadoop Today. Will become a bit more common in the future.

GROUP 2 – Applications

GROUP 3 – Data Management
Rarely Integrated with Hadoop Today. Will soon experience aggressive adoption.

GROUP 4 – Machine Data
Half of Hadoop users don’t need these. But adoption will grow anyway.

SOURCE: TDWI Best Practices Survey of late 2012. Based on 48 respondents who have experience with Hadoop. The chart is sorted by “Integrated Today,” in descending order.

Top Ten Priorities for Hadoop Integration
These are recommendations, requirements, or rules that can guide you.

1. Embrace the new tool and platform ecosystem of Hadoop.

2. Know the 10 myths of Hadoop and bust them daily.

3. Don’t be fooled: Hadoop isn’t free.

4. Get training (and maybe new staff) for new Hadoop.

5. Look for capabilities that make Hadoop data look relational.

6. Expect to wait a while for certain Hadoop functionality to mature.

7. Beware silo’d analytics, including Hadoop implementations.

8. Adjust your DW architecture to make place(s) for Hadoop.

9. Set up a proof of concept (POC), if you haven’t already.

10. Develop/apply a strategy for Hadoop integration with BI/DW.

Download a free copy of the report

• Download the report in a PDF file at: bit.ly/TDWI-BP-Rpt-List

• Feel free to distribute the PDF file of any TDWI Best Practices Report

Want to learn more about Big Data & Analytics?

Take courses at the TDWI World Conference in Chicago!

• May 5-10, 2013

• Chicago, Illinois

• New courses on big data, its mgt, its analysis

• Keynote addresses on big data best practices

• Peer networking, meals, social evenings, exhibits

• More information online

• Register online: tdwi.org/CH2013

Questions??

Contact Information

If you have further questions or comments:

Philip Russom, TDWI

[email protected]