Unlocking value from data with Data Integration Tools
Phil Watt, Principal Integration Architect, HP Business Intelligence Solutions, EMEA
29/04/2010


DESCRIPTION

Every day, consumers, businesses and not-for-profit organisations generate increasing volumes of data. Initiatives such as Smart Meters in the utilities sector, along with user generated 'Web 2.0' data sources and High Energy Physics, are causing an exponential growth in available data. Many businesses seek to take advantage of this data to analyse business performance or understand trends in customer or prospect behaviour. This analysis often requires looking at very high volume, complex data sources. Bringing this data together in a format that is easy for analysts to understand and query is often very challenging, particularly when business requirements for the data change and a rapid response can mean the difference between profit and loss. This is just one of many areas where Data Integration (DI) tools and technologies are being applied, providing the 'plumbing' from a source system to a target system. DI tools are designed to offer an order of magnitude increase in developer productivity compared to using languages such as SQL, Java and .NET. This productivity allows developers to deliver more quickly, respond to changes faster or deliver more with fewer resources. According to Gartner, the market for such tools is estimated to grow to $2.7 billion by 2013, and is currently dominated by a handful of enterprise class vendors. However, a new crop of Data Integration tools is emerging, with a mix of open source and commercial offerings that each seek to challenge the dominance of the established players. This talk will discuss the history of this area of technology to help understand the conditions we see today, offer a view of the future of the market and describe how these tools can help drive value within today's business and academic communities. At the end of the talk, attendees will have an opportunity to use one of the commercial tools and make their own minds up about the value of such technology.
Phil Watt is Principal Consultant at one of the world’s largest Systems Integrators, and has been working with high volume enterprise data for more than 17 years, building and designing data warehouses for customers in telco, media, utilities and financial services sectors. During the last 10 years, Phil has worked with a number of Data Integration technologies and advised many businesses about choosing a DI tool and applying best practices in their deployment.


Page 1: Unlocking value from data with data integration tools


Unlocking value from data with Data Integration Tools

Phil Watt, Principal Integration Architect, HP Business Intelligence Solutions, EMEA

29/04/2010

Page 2: Unlocking value from data with data integration tools

Outline

• Introduction
• Business drivers – why use a DI tool?
  • the challenge
  • private sector
  • public sector
• Background and history
  • DI tools timeline
• Emerging features – and value
• Governance and Best Practice
• Selecting a tool for your situation
• Demonstration
• Summary – followed by hands on session

Page 3: Unlocking value from data with data integration tools

About me

• 19 years big data
• 10 years Data Integration tools
  • High volume
  • Complex business rules
  • Governance and metadata management
• Clients include
  • BSkyB
  • BT
  • Barclays/Barclaycard
  • Centrica
  • Experian
  • John Lewis Partnership
  • Microsoft
  • A major UK political party
• Strong focus on pragmatic delivery
  • Best practices
  • Design patterns
  • Tool evaluation, selection and implementation

Page 4: Unlocking value from data with data integration tools

Scope

In scope
• Data plumbing
  • moving data around, and making it more useful to certain stakeholders
• Tools that help to
  • get data out of databases
  • get data into databases
  • transform data following some business rules

Out of scope
• Database technologies
  • OLTP vs OLAP
  • Column versus row based storage
  • NoSQL movement (Hadoop, Cassandra, etc.)
• Information security

Page 5: Unlocking value from data with data integration tools

Glossary

• Data Integration
• Data Governance
• Master Data Management (MDM)
• Data Dictionary
• Data Lineage
• Data Discovery/Data Profiling

Page 6: Unlocking value from data with data integration tools

The challenge

Data growth
• 60% annual global data growth through to 2012 (IDC research)
• New sources of machine generated data will see this increase rapidly, e.g. telemetry: new energy smart meters mean a x4000 growth in readings

Business drivers
• Increased complexity of business requirements; diverse sources and complex data
• Consistent application of business terms across the enterprise
• Time To Market (TTM) is a critical success factor
• Reduce costs/improve productivity
• Reduce power consumption

Collaboration
• Onshore versus offshore delivery teams

Variable data quality
• Data is often captured for one specific reason, then used or repurposed for different reasons

Cannot learn anything from data alone*
• The model must inform the analysis
• If the data does not support the model, then adjust the model
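To make the 60% annual growth figure on this slide concrete, the following is a minimal sketch of the compounding arithmetic. The starting volume of 1.0 (an arbitrary unit) and the function names are assumptions for illustration; only the 60% rate comes from the slide.

```python
import math


def compound_growth(start: float, annual_rate: float, years: int) -> float:
    """Volume after `years` of growth at a fixed annual rate."""
    return start * (1 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years needed for the volume to double at a fixed annual rate."""
    return math.log(2) / math.log(1 + annual_rate)


if __name__ == "__main__":
    # Starting volume of 1.0 is an arbitrary unit, not a figure from the talk.
    for year in range(4):
        print(year, round(compound_growth(1.0, 0.60, year), 3))
    print("doubling time in years:", round(doubling_time(0.60), 2))
```

At a steady 60% per year the volume roughly quadruples in three years and doubles in about a year and a half, which is why load processes sized for today's volumes can be outgrown quickly.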

Page 7: Unlocking value from data with data integration tools

Data warehouse example sizes

[Bar chart: data warehouse sizes in petabytes (axis 0 to 12) for Yahoo*, eBay, Facebook, Wal-mart, LHC and National ID Cards*]

Page 9: Unlocking value from data with data integration tools

Benefits of DI tools

Productivity improves dramatically
• Vendors often claim an order of magnitude improvement
  • that is, for coding activities alone
• 50% improvement is realistic when considering other non-coding activities

Improve understanding of the overall business using built in metadata management tools
• build data dictionaries more easily
• support and drive data governance

Built in scalability
• Parallel processing – component, pipeline and data
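Of the three forms of parallelism listed, data parallelism is the simplest to sketch: the records are split into partitions and the same transformation runs on each partition concurrently. The example below is only an illustration under stated assumptions. The record layout, the `transform` rule and all function names are hypothetical, and a thread pool is used purely to show the structure. In CPython, threads do not speed up pure Python CPU work, so a real engine would more likely spread partitions across processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(record: dict) -> dict:
    """Hypothetical business rule: tidy up the customer name."""
    return {**record, "name": record["name"].strip().title()}


def partition(records: list, n: int) -> list:
    """Split records into n partitions, round robin."""
    return [records[i::n] for i in range(n)]


def transform_partition(part: list) -> list:
    """Apply the same rule to every record in one partition."""
    return [transform(r) for r in part]


def run_data_parallel(records: list, workers: int = 4) -> list:
    """Transform all partitions concurrently and merge the results."""
    parts = partition(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transform_partition, parts))
    # Output is grouped by partition, so input order is not preserved.
    return [row for part in results for row in part]
```

Because the merged output is grouped by partition, a caller that needs the original order would sort on a key such as `id` afterwards.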

Page 10: Unlocking value from data with data integration tools

Extract, Transform and Load

[Diagram: Extract → Transform → Load. Labels: e.g. CRM or ERP system; Hub and spoke; Shared DW and ETL server]
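The ETL pattern can be sketched in a few lines: rows are pulled out of the source, transformed outside the database, and only then written to the target. This is a hand-coded illustration, not how any particular DI tool works. The use of SQLite, the table names `orders` and `orders_clean`, the columns and the sample rows are all assumptions made for the example.

```python
import sqlite3


def extract(source: sqlite3.Connection) -> list:
    """Extract: pull raw rows out of the source system."""
    return source.execute(
        "SELECT id, name, amount_pence FROM orders"
    ).fetchall()


def transform(rows: list) -> list:
    """Transform: apply business rules outside the database."""
    return [(oid, name.strip().upper(), pence / 100.0)
            for oid, name, pence in rows]


def load(target: sqlite3.Connection, rows: list) -> None:
    """Load: write the transformed rows into the warehouse table."""
    target.executemany(
        "INSERT INTO orders_clean (id, customer, amount_gbp) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()


def run_etl() -> list:
    # In-memory databases stand in for the source system and the warehouse.
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (id INTEGER, name TEXT, amount_pence INTEGER)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, " alice ", 1250), (2, "bob", 399)],
    )
    target = sqlite3.connect(":memory:")
    target.execute(
        "CREATE TABLE orders_clean (id INTEGER, customer TEXT, amount_gbp REAL)"
    )
    load(target, transform(extract(source)))
    return target.execute(
        "SELECT id, customer, amount_gbp FROM orders_clean ORDER BY id"
    ).fetchall()
```

The transformation step runs in the integration layer, so the target database only ever sees cleaned rows.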

Page 11: Unlocking value from data with data integration tools

Extract, Load and Transform

[Diagram: Extract → Load → Transform. Labels: e.g. CRM or ERP system; Shared DW and ETL server]
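In the ELT pattern the raw extract is loaded first and the database engine itself performs the transformation, typically as set-based SQL. The sketch below illustrates that ordering only. SQLite stands in for the warehouse, and the staging table `stg_orders`, the result table `customer_totals` and the sample rows are assumptions made for the example.

```python
import sqlite3


def run_elt() -> list:
    dw = sqlite3.connect(":memory:")

    # Load: land the raw extract in a staging table, untransformed.
    dw.execute(
        "CREATE TABLE stg_orders (id INTEGER, name TEXT, amount_pence INTEGER)"
    )
    dw.executemany(
        "INSERT INTO stg_orders VALUES (?, ?, ?)",
        [(1, " alice ", 1250), (2, "bob", 399), (3, " alice ", 250)],
    )

    # Transform: cleansing and aggregation run inside the database as SQL.
    dw.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT UPPER(TRIM(name)) AS customer,
               SUM(amount_pence) / 100.0 AS total_gbp
        FROM stg_orders
        GROUP BY UPPER(TRIM(name))
        """
    )
    return dw.execute(
        "SELECT customer, total_gbp FROM customer_totals ORDER BY customer"
    ).fetchall()
```

No rows leave the database between the load and the transformation, so large datasets do not have to be unloaded to be aggregated.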

Page 12: Unlocking value from data with data integration tools

ETL versus ELT

ETL
• Transformations often faster
• Not constrained by database performance limitations
• Typically scales better

ELT
• Avoids unloading large datasets for transformations and aggregations
• Best used with high performance analytical database systems such as:
  • Netezza, Neoview, Oracle Exadata, Teradata, Greenplum, etc.

Page 13: Unlocking value from data with data integration tools

Multiple sources and targets

Page 14: Unlocking value from data with data integration tools

DI Tools Features Timeline 1995 – 2005

[Timeline chart, axis 1994 to 2006 in two-year steps. Features shown: Parallelism, SCD, EAI/Message Queues, Connectors, Data Lineage, Config Mgmt, Business Metadata, CWM, Data Governance, MDM]

Page 15: Unlocking value from data with data integration tools

DI Tools Features Timeline from 2006

[Timeline chart, axis 2006 to 2011. Features shown: SOAP/WSDL, CDC, Screen Scrapers, Test management, CEP, Push Down Processing, Semantic Metadata, Rich Dashboards, Analyst Tools, Self Service DI]

Page 16: Unlocking value from data with data integration tools

Market features

Industry consolidation
• Niche players acquired by established vendors
• Watch out for product bloat

Price pressures / pricing complexity
• Open Source versus pure commercial
• Credit crunch
• Established vendors often have complex pricing models

Focus on optimising workflow
• Increase productivity / Reduce time to market
• Moving to self service for ‘purple people’

UK market very different to US
• Cool tech not enough for UK: must have strong business case

Page 17: Unlocking value from data with data integration tools


Gartner Magic Quadrant

Taken from research document, ‘Magic Quadrant for Data Integration Tools’

Authors: Ted Friedman, Mark A. Beyer, Eric Thoo

Full report available by registering at www.talend.com


Image removed for web publication as agreed with Gartner

Page 18: Unlocking value from data with data integration tools

Magic Quadrant Disclaimer

The Magic Quadrant is copyrighted November 25, 2009 by Gartner, Inc. and is reused with permission. The Magic Quadrant is a graphical representation of a marketplace at and for a specific time period. It depicts Gartner's analysis of how certain vendors measure against criteria for that marketplace, as defined by Gartner. Gartner does not endorse any vendor, product or service depicted in the Magic Quadrant, and does not advise technology users to select only those vendors placed in the "Leaders" quadrant.

The Magic Quadrant is intended solely as a research tool, and is not meant to be a specific guide to action.

Gartner disclaims all warranties, express or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Page 19: Unlocking value from data with data integration tools

Best practices

Page 20: Unlocking value from data with data integration tools

Worst Practices

Page 21: Unlocking value from data with data integration tools

Gartner advice

• Allocate minimum 20% to data source analysis
• Allocate 20 - 30% to mapping and transformation rules
• Avoid custom-coding or desktop tools
• Increase business user involvement to improve success

Best Practices Mitigate Data Migration Risks and Challenges – May 2009

Page 22: Unlocking value from data with data integration tools

Governance and the data integration lifecycle

Page 23: Unlocking value from data with data integration tools

Best practices

Do:
• Spend 50% of project time doing discovery, analysis, design
• Get business users involved early and often
• Use tools to accelerate and compress timescales
• Pay attention to governance and metadata

So you can:
• De-risk the project
• Reduce overall cost and timescales
• Achieve best possible quality

Page 24: Unlocking value from data with data integration tools

Selecting a tool for your situation

• 2 stage process
  • Paper based shortlist
  • On site Proof Of Concept (POC)
• Understand the vendor roadmap
• Match to your requirements
• Try to anticipate your needs over the next 3-5 years
• Do it yourself or outsource?
• Is there an SI ecosystem for the vendor's product?
• Get help to choose and upskill
• Find a partner that fits your culture and has the right skills

Page 25: Unlocking value from data with data integration tools

Qualification matrix (PW)

Page 26: Unlocking value from data with data integration tools

Demonstration


Page 33: Unlocking value from data with data integration tools

Demo metrics

Performance
• Hardware – dual core 2.0 GHz Intel Centrino, 2.5 GB RAM
• Environment – WinXP, Oracle Express (DB) + DI tool (Expressor 2.0)
• 3 data sources

Data source          Size      Records
Customers            155 MB    1000K
Today’s orders       112 MB    100K
Yesterday's orders   0.3 MB    3K
Total data volume    267 MB    1.1M

Execution time: 72 seconds
Throughput: 3.7 MB/sec, 41k/sec
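The throughput figures can be checked against the per-source volumes with a short calculation. The sizes, record counts and execution time below are taken from this slide; the variable and function names are assumptions for illustration.

```python
# (size in MB, record count) per data source, as listed on the slide.
SOURCES = {
    "Customers": (155.0, 1_000_000),
    "Today's orders": (112.0, 100_000),
    "Yesterday's orders": (0.3, 3_000),
}
EXECUTION_SECONDS = 72


def totals(sources: dict) -> tuple:
    """Total megabytes and total records across all sources."""
    total_mb = sum(mb for mb, _ in sources.values())
    total_records = sum(n for _, n in sources.values())
    return total_mb, total_records


def throughput(sources: dict, seconds: float) -> tuple:
    """Megabytes per second and records per second for one run."""
    total_mb, total_records = totals(sources)
    return total_mb / seconds, total_records / seconds


if __name__ == "__main__":
    mb_per_sec, records_per_sec = throughput(SOURCES, EXECUTION_SECONDS)
    print(round(mb_per_sec, 1), "MB/sec")
    print(round(records_per_sec), "records/sec")
```

Dividing the total volume by the execution time reproduces the 3.7 MB/sec figure. Dividing the 1.1M input records by 72 seconds gives roughly 15k records per second, so the 41k/sec figure presumably counts records on a different basis that the slide does not state.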

Page 34: Unlocking value from data with data integration tools

Demo features

Developer Productivity
• Graphical development
• Semantic Rationalisation and Re-usable Business Rules

Demo represents a generic business scenario
• XML, message queues (MSMQ), database inputs/outputs, joins, aggregations and referential integrity management
• Similar features to the ATG/Integrated Basket challenges?

Page 35: Unlocking value from data with data integration tools

Summary

• Business drivers – why use a DI tool?
  • the challenge
  • private sector
  • public sector
• Background and history
  • DI tools timeline
• Emerging features – and value
• Governance and Best Practice
• Selecting a tool for your situation
• Demonstration

Page 36: Unlocking value from data with data integration tools

Questions

Page 37: Unlocking value from data with data integration tools

References

• Curt Monash: http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/
• Wired: http://www.wired.com/wired/archive/12.04/grid.html
• ZDNet: http://blogs.zdnet.com/storage/?p=213
• Professor Chris Bishop: http://conferences.theiet.org/lectures/turing/
• Gartner: http://www.gartner.com
• LHC data (2007): http://www-conf.slac.stanford.edu/xldb07/xldb_lhc.pdf