25568 Genesee Trail Rd Golden, Colorado 80401 (303) 526-0340 Data Vault Modeling and Approach DW2.0 and Unstructured Data Master Data Management and Metadata Data Warehousing Agility BI-Event May 17 Hans Hultgren 2011 Genesee Academy, LLC 25568 Genesee Trail Rd Golden, Colorado 80401 © © 2011 Genesee Academy, LLC

Data Warehouse Agility Array Conference2011


DESCRIPTION

Hans Hultgren Agile Data Warehousing presentation from the Array BI conference in Netherlands May 2011


Page 1: Data Warehouse Agility Array Conference2011

25568 Genesee Trail Rd

Golden, Colorado 80401

(303) 526-0340

Data Vault Modeling and Approach DW2.0 and Unstructured Data Master Data Management and Metadata

Data Warehousing Agility

BI-Event May 17

Hans Hultgren

2011 Genesee Academy, LLC

25568 Genesee Trail Rd

Golden, Colorado 80401


© 2011 Genesee Academy, LLC

Page 2: Data Warehouse Agility Array Conference2011

• Definition of agility

• Types of agility

• Discuss current approaches

• Hyper-agility

• Observations from the field

– Also topics of operational data warehousing, operational BI, agile project management techniques, agility-oriented tools, and operational integration

Welcome

Page 3: Data Warehouse Agility Array Conference2011

Data Warehouse Agility

• Agility

– The overall measure of adaptability in terms of speed & scope.

– Overall performance in adapting to change.

NOTE: Not warehouse machine throughput, near real time (NRT) processing, and operational DW performance…

Ability of the data warehouse to adapt to change

Versus

Performance of an existing (steady state) warehouse

Page 4: Data Warehouse Agility Array Conference2011

Data Warehouse Agility

• Agility

– Agile in IT

• Agile Project Management

• Agile Software Development – Agile Manifesto

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

Individuals and interactions over processes and tools

Working software over comprehensive documentation

Customer collaboration over contract negotiation

Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.

• Agile Model Driven Development (AMDD)

• Test-Driven Design (TDD)

Page 5: Data Warehouse Agility Array Conference2011

Data Warehouse Agility

• Agility in the Data Warehouse

– Agility in terms of Data Warehousing is related to the ability to build incrementally.

– The approach today is more concerned with developing a business intelligence and data warehousing program, that is, the capability to increment (adapt and grow).

– Since the business is always changing (new reporting needs, new business processes, new business units, new data sources, etc.) the EDW program is an ongoing initiative that needs to focus on adapting to these changes.

– Note: distinguish between operational integration and data warehousing.

Page 6: Data Warehouse Agility Array Conference2011

Types of Data Warehouse Agility

[Diagram: Data Warehouse, with the change types New Source, New Attribute, New Mart, New Subject Area, Change DW]

Page 7: Data Warehouse Agility Array Conference2011

Types of Data Warehouse Agility

– Presentation Layer Agility – ability to adapt to new business requirements based on existing data elements in the EDW.

• Bottom Line: Ability to quickly and flexibly spin off new data marts

– New Data Source Agility – ability to assimilate new data sources into the EDW architecture from stage to CDW+ and existing data marts.

• Bottom Line: Ability to quickly adapt to new data sources * using existing structures

– New Attribute Agility – ability to absorb new attributes into the EDW architecture such that they can be loaded from the sources and integrate new attributes in terms of business context.

• Bottom Line: Ability to quickly incorporate new attributes in the EDW and apply business context to these attributes

– EDW Machine Agility – ability of the EDW machine (business and technical) to accommodate a new subject area from stage to mart.

• Bottom Line: EDW response time; a function of people, process & tools

– Changes in the DW – ability to absorb other changes such as integration logic, mappings, and business rules.


Page 8: Data Warehouse Agility Array Conference2011

Presentation Layer Agility

– Presentation Layer Agility - ability to adapt to new business requirements based on existing data elements in the EDW.

• Bottom Line: Ability to quickly and flexibly spin off new data marts

– In this layer, agility is measured as a function of the time it takes to design, construct and deliver a new data mart.

– Variables in this layer include:

• Strength of the BI team to capture requirements and define data mart.

• Ability of ETL integration team to understand EDW model and mart.

• Strength and repeatability of ETL processes for sourcing the EDW.

• Strength and repeatability of ETL development, testing and delivery.

– Constraints:

• Dependent upon the existence of the data in the EDW.

• Dependent upon the level of business alignment of the data in the EDW.


Page 9: Data Warehouse Agility Array Conference2011

New Data Source Agility

– New Data Source Agility - ability to assimilate new data sources into the EDW architecture from stage to CDW+ and existing data marts.

• Bottom Line: Ability to quickly adapt to new data sources * using existing structures

– In this layer, agility is measured as a function of the time it takes to design, model, build and load data into the EDW from a new source.

– Variables in this layer include:

• Strength of the DW team to design the required model changes.

• Strength and repeatability of EDW development, testing and delivery.

• Ability of ETL integration team to understand new EDW model.

• Strength and repeatability of ETL processes for mapping and loading new source into the EDW.

– Constraints:

• Level of alignment of the new source data with the existing model.

• Dependent upon the level of business alignment with the data in the EDW


Page 10: Data Warehouse Agility Array Conference2011

New Attribute Technical Agility

– New Attribute (Technical) Agility - ability to absorb new attributes into the EDW architecture such that they can be loaded from the sources.

• Bottom Line: Ability to quickly incorporate new attributes in the EDW

– In this layer, agility is measured as a function of the time it takes to design, map, add and load a new attribute from a source.

– Variables in this layer include:

• Strength of the DW team to design the required model changes.

• Strength and repeatability of EDW development, testing and delivery.

• Ability of ETL integration team to understand new EDW attribute(s).

• Strength and repeatability of ETL processes for mapping and loading new source attributes into the EDW.

– Constraints:

• Level of alignment of the new attribute with the existing model.

• Dependent upon business context being defined.


Page 11: Data Warehouse Agility Array Conference2011

New Attribute Business Context

– New Attribute (Business) Context Agility - ability to integrate new attributes in terms of business context.

• Bottom Line: Ability to quickly apply business context to new attributes

– In this layer, agility is measured as a function of the time it takes to align business context with a new attribute from a source.

– Variables in this layer include:

• Ability of the BI / DW team to accurately assess the business context of the new source attribute.

– Constraints:

• Level of alignment of the new attribute with the existing model.

• Dependent upon the level of business alignment with the data in the EDW


Page 12: Data Warehouse Agility Array Conference2011

EDW Machine Agility

– EDW Machine Agility – ability of the EDW machine (business and technical) to accommodate a new subject area from stage to mart.

• Bottom Line: EDW response time; a function of people, process & tools

– In this layer, agility is measured as an overall function of the EDW machine to integrate a new subject area from stage to mart.

– Variables in this layer include:

• Strength of the BI / DW development team.

• Strength and repeatability of EDW development, testing and delivery.

• Strength and ability of ETL integration team.

• Strength and repeatability of all BI / DW processes.

– Constraints:

• Executive sponsorship of the EDW program.

• Well defined organizational structure for BIW, BICC, Architecture and Governance.


Page 13: Data Warehouse Agility Array Conference2011

CURRENT APPROACHES

Page 14: Data Warehouse Agility Array Conference2011

DW Agility Current Approaches

– Incremental Data Warehouse Development

• Data Vault modeling, 2G, Anchor, etc.

– Agile BI Programs (People, Process, Models & Data)

• Methodologies (Centennium, Platon, etc.)

• Templates, Tools & Automation (Wherescape, etc.)

– Alternate & New Paradigms for the Agile DW


Page 15: Data Warehouse Agility Array Conference2011

DW Agility Components

– Absorb Changes

• Capture the Change

• Understand the Change

– A major constraint on agility is the required data warehouse modeling changes...

• So we can capture the data (create the buckets)

• So we can understand the data (context, meaning)

– Align to business keys, classify, describe (metadata)


Page 16: Data Warehouse Agility Array Conference2011

Data Warehouse Agility

• Why create a Data Model for the DW?

• Model Data versus Meaning?

– Separate the capture of data from the meaning?

– The structure of a table versus the semantics

– Business meaning versus data loading

– As XML is to EDI

Page 17: Data Warehouse Agility Array Conference2011

HYPER AGILITY AND THE NAME VALUE PAIR (NVP)

Page 18: Data Warehouse Agility Array Conference2011

Concept of Name/Value Pair

Cust_ID  Lname      Fname  Add         City     State  Zip    Bdate
121202   Lundquist  Carl   22 Bird St  NYC      NY     98291  10/9/1977
123335   Dahlgren   Eva    7 Academy   Madison  NJ     07940  2/12/1982
139090   Lundberg   Scott  444 7th St  Tuborg   MN     70098  4/22/1988
119944   Hultquist  Darla  17 South    Randolf  PA     91121  9/22/1967
120334   Forsberg   Sven   117 East A  NYC      NY     98292  8/19/1976

Moving to Name / Value Pair…

Each Value or "data item" (record value for each attribute) is provided in a list format, paired with the corresponding Name or "field name" (column header) from the normalized table structure.

Page 19: Data Warehouse Agility Array Conference2011

Concept of Name/Value Pair

Name   Cust_ID  Lname      Fname  Add         City     State  Zip    Bdate
Value  121202   Lundquist  Carl   22 Bird St  NYC      NY     98291  10/9/1977

Name   Cust_ID  Lname      Fname  Add         City     State  Zip    Bdate
Value  123335   Dahlgren   Eva    7 Academy   Madison  NJ     07940  2/12/1982

Name   Cust_ID  Lname      Fname  Add         City     State  Zip    Bdate
Value  139090   Lundberg   Scott  444 7th St  Tuborg   MN     70098  4/22/1988

Name   Cust_ID  Lname      Fname  Add         City     State  Zip    Bdate
Value  119944   Hultquist  Darla  17 South    Randolf  PA     91121  9/22/1967

Name   Cust_ID  Lname      Fname  Add         City     State  Zip    Bdate
Value  120334   Forsberg   Sven   117 East A  NYC      NY     98292  8/19/1976

Page 20: Data Warehouse Agility Array Conference2011

Moving to Name/Value Pair

Cust_ID  Lname      Fname  Add         City     State  Zip    Bdate
121202   Lundquist  Carl   22 Bird St  NYC      NY     98291  10/9/1977
123335   Dahlgren   Eva    7 Academy   Madison  NJ     07940  2/12/1982
139090   Lundberg   Scott  444 7th St  Tuborg   MN     70098  4/22/1988
119944   Hultquist  Darla  17 South    Randolf  PA     91121  9/22/1967
120334   Forsberg   Sven   117 East A  NYC      NY     98292  8/19/1976

NAME / VALUE

Transpose… with column headings…

Page 21: Data Warehouse Agility Array Conference2011

Name/Value Pair

Name Value

Cust_ID 121202

Lname Lundquist

Fname Carl

Add 22 Bird St

City NYC

State NY

Zip 98291

Bdate 10/9/1977

Cust_ID 123335

Lname Dahlgren

Fname Eva

Add 7 Academy

City Madison

State NJ

Zip 07940

Bdate 2/12/1982

Cust_ID 139090

Lname Lundberg

Fname Scott
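The transposition shown on these slides can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `to_name_value_pairs` is an assumption made here, and the data is the first two rows of the customer table from the slides.

```python
# Column headers and the first two rows of the customer table on the slides.
COLUMNS = ["Cust_ID", "Lname", "Fname", "Add", "City", "State", "Zip", "Bdate"]
ROWS = [
    ["121202", "Lundquist", "Carl", "22 Bird St", "NYC", "NY", "98291", "10/9/1977"],
    ["123335", "Dahlgren", "Eva", "7 Academy", "Madison", "NJ", "07940", "2/12/1982"],
]


def to_name_value_pairs(columns, rows):
    """Pair every value with its column header, row by row (hypothetical helper)."""
    pairs = []
    for row in rows:
        pairs.extend(zip(columns, row))
    return pairs


pairs = to_name_value_pairs(COLUMNS, ROWS)
for name, value in pairs:
    print(f"{name:<8} {value}")
```

Values are kept as text here, which is one simple way to preserve details such as the leading zero in a Zip code.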

Page 22: Data Warehouse Agility Array Conference2011

The concept of the "record" is effectively lost in this transformation. Now a RECORD is a set of Name/Value Pair instances…

CON: Lose resolution on the record.
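To illustrate what losing resolution on the record means in practice, the sketch below rebuilds records from a flat list of pairs. It relies on an assumption that the slides do not state: every record begins with a `Cust_ID` pair. The function name `regroup` is hypothetical.

```python
# A flat Name/Value list: nothing marks where one record ends and the next begins.
pairs = [
    ("Cust_ID", "121202"), ("Lname", "Lundquist"), ("Fname", "Carl"),
    ("Cust_ID", "123335"), ("Lname", "Dahlgren"), ("Fname", "Eva"),
]


def regroup(pairs, key_name="Cust_ID"):
    """Rebuild records, ASSUMING each record starts with a `key_name` pair."""
    records = []
    for name, value in pairs:
        # Start a new record whenever the key attribute appears
        # (or when no record has been started yet).
        if name == key_name or not records:
            records.append({})
        records[-1][name] = value
    return records


records = regroup(pairs)
print(records)
```

If the ordering assumption fails (for example, pairs arrive interleaved), the records cannot be recovered from the names and values alone, which is the point of the CON on this slide.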

Page 23: Data Warehouse Agility Array Conference2011

CON: Attributes are not pre-defined.

Also, the attributes are not defined in advance – we don’t know what to expect and we can’t check for attribute meaning, definitions, domain values or data types.
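A small sketch can make this CON concrete. Checks on domain values or data types are only possible for attributes that someone has defined somewhere. The registry below, its rules, and the names `REGISTRY`, `is_date` and `check` are all assumptions for illustration; they are not part of the presentation.

```python
from datetime import datetime


def is_date(text):
    """True if text parses as month/day/year, the format used on the slides."""
    try:
        datetime.strptime(text, "%m/%d/%Y")
        return True
    except ValueError:
        return False


# Hypothetical attribute registry: attribute name -> validation rule.
REGISTRY = {
    "Cust_ID": str.isdigit,
    "State": lambda v: len(v) == 2 and v.isalpha() and v.isupper(),
    "Bdate": is_date,
}


def check(pairs):
    """Label each pair 'ok', 'invalid', or 'undefined' (no rule registered)."""
    results = []
    for name, value in pairs:
        validator = REGISTRY.get(name)
        if validator is None:
            status = "undefined"
        elif validator(value):
            status = "ok"
        else:
            status = "invalid"
        results.append((name, value, status))
    return results


results = check([
    ("Cust_ID", "121202"),
    ("State", "NY"),
    ("Bdate", "10/9/1977"),
    ("CustClass", "Big"),   # never defined, so nothing can be checked
    ("State", "New York"),  # violates the assumed two-letter rule
])
for row in results:
    print(row)
```

An attribute that arrives without a definition, such as `CustClass` here, can be stored but not validated.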

Page 24: Data Warehouse Agility Array Conference2011

Name Value

Cust_ID 121202

Lname Lundquist

Fname Carl

Add 22 Bird St

City NYC

State NY

Zip 98291

Bdate 10/9/1977

CustClass Big

Cust_ID 123335

Lname Dahlgren

Fname Eva

Add 7 Academy

City Madison

State NJ

Zip 07940

Bdate 2/12/1982

CustClass Small

Cust_ID 139090

New attributes that are introduced into the source feed are added instantly to the DW. There is no modeling delay, no code change, and no ETL impact…

PRO: Absorb new attributes instantly.
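The PRO above can be sketched with an in-memory SQLite table. This is a minimal illustration under stated assumptions: the table name `nvp`, the `record_key` column (added here so pairs stay tied to a customer), and the `load` helper are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic structure for every attribute; record_key is an assumed addition.
conn.execute("CREATE TABLE nvp (record_key TEXT, name TEXT, value TEXT)")


def load(conn, record_key, pairs):
    """Insert Name/Value Pairs for one record (hypothetical loader)."""
    conn.executemany(
        "INSERT INTO nvp (record_key, name, value) VALUES (?, ?, ?)",
        [(record_key, name, value) for name, value in pairs],
    )


# Initial feed.
load(conn, "121202", [("Lname", "Lundquist"), ("State", "NY")])

# A later feed introduces CustClass: no ALTER TABLE and no loader change.
load(conn, "121202", [("CustClass", "Big")])

names = [row[0] for row in conn.execute("SELECT DISTINCT name FROM nvp ORDER BY name")]
print(names)
```

The same `load` call handles both feeds, because the new attribute is just another row rather than a new column.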

Page 25: Data Warehouse Agility Array Conference2011

Hyper Agility

• The solution to deal with these issues requires a further level of abstraction which in effect moves the persisted (historized, permanent, integrated) data store even further away from the business context that it is intended to represent.

• The DW model – the data model itself – is then not readable (not understandable). In fact ETL professionals will also find themselves further removed from this model. To the extent that a model is intuitive, self-descriptive, and aligned with business meaning, this approach takes a step in the other direction.

• Moving towards addressing these business-driven agility requirements causes the model itself to move much further away (an order of magnitude away) from the business, so far as to become effectively a technical solution utilizing only abstract representations.

Page 26: Data Warehouse Agility Array Conference2011

Hyper Agility

• The context – the meaning of the data – will in these cases need to be managed in a different way.

• This can include a form of persisted and historized metadata concerning the mappings and business rules. In effect a form of EAI within the DW.

• Or it might include a more traditional secondary DW layer.
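One possible reading of "persisted and historized metadata" is a lookup of business meaning by attribute name and date. The sketch below is an assumption-laden illustration only: the class `AttributeMeaning`, the function `meaning_as_of`, the business terms, and the dates are all hypothetical examples, not content from the presentation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AttributeMeaning:
    """One historized metadata entry: what an attribute name meant, and when."""
    name: str
    business_term: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the entry is still current


# Hypothetical history: the meaning of CustClass changes on 2011-06-01.
HISTORY = [
    AttributeMeaning("CustClass", "Customer size class", date(2011, 1, 1), date(2011, 6, 1)),
    AttributeMeaning("CustClass", "Customer revenue segment", date(2011, 6, 1)),
]


def meaning_as_of(name, as_of, history=HISTORY):
    """Return the business term valid for `name` on `as_of`, or None if undefined."""
    for entry in history:
        starts_before = entry.valid_from <= as_of
        still_valid = entry.valid_to is None or as_of < entry.valid_to
        if entry.name == name and starts_before and still_valid:
            return entry.business_term
    return None


print(meaning_as_of("CustClass", date(2011, 3, 1)))
print(meaning_as_of("CustClass", date(2011, 7, 1)))
print(meaning_as_of("Zip", date(2011, 7, 1)))
```

Keeping the context in a separate, dated structure like this lets the name/value store stay generic while the meaning is managed and versioned on its own.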

Page 27: Data Warehouse Agility Array Conference2011

DW AGILITY SUMMARY

• Consider specific Agility Requirements

• Classify Agility Types and consider Alternatives

• Distinguish between operational integration and DW

• Look to modeling techniques optimized for Data Warehouse

• Look at entire picture – people, process, models and data

• Consider specific methodologies, templates and tools

• Determine if hyper agility is a requirement

Page 28: Data Warehouse Agility Array Conference2011

Questions?

www.GeneseeAcademy.com

CDVDM Certification Seminar

June 23-24

October 27-28

2011 Genesee Academy, LLC

25568 Genesee Trail Rd

Golden, Colorado 80401

[email protected]

USA +1 303.526.0340

Sverige 070 250 2102
