Hans Hultgren's Agile Data Warehousing presentation from the Array BI conference in the Netherlands, May 2011.
25568 Genesee Trail Rd
Golden, Colorado 80401
(303) 526-0340
Data Vault Modeling and Approach
DW2.0 and Unstructured Data
Master Data Management and Metadata
Data Warehousing Agility
BI-Event May 17
Hans Hultgren
© 2011 Genesee Academy, LLC
Welcome
• Definition of agility
• Types of agility
• Discuss current approaches
• Hyper-agility
• Observations from the field
– Also topics of operational data warehousing, operational BI, agile project management techniques, agility-oriented tools, and operational integration
Data Warehouse Agility
• Agility
– The overall measure of adaptability in terms of speed & scope.
– Overall performance in adapting to change.
NOTE: Agility here does not mean warehouse machine throughput, near-real-time (NRT) processing, or operational DW performance.
Ability of the data warehouse to adapt to change versus performance of an existing (steady-state) warehouse
Data Warehouse Agility
• Agility
– Agile in IT
• Agile Project Management
• Agile Software Development – Agile Manifesto
We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.
• Agile Model Driven Development (AMDD)
• Test-Driven Development (TDD)
Data Warehouse Agility
• Agility in the Data Warehouse
– In data warehousing, agility relates to the ability to build incrementally.
– Today's approach is more concerned with developing a business intelligence / data warehousing program: the capability to increment (to adapt and grow).
– Since the business is always changing (new reporting needs, new business processes, new business units, new data sources, etc.), the EDW program is an ongoing initiative that needs to focus on adapting to these changes.
– Note: distinguish between operational integration and data warehousing.
Types of Data Warehouse Agility
[Figure: the data warehouse surrounded by the changes it must absorb: new source, new attribute, new mart, new subject area, and change in the DW]
– Presentation Layer Agility – ability to adapt to new business requirements based on existing data elements in the EDW.
• Bottom Line: Ability to quickly and flexibly spin off new data marts
– New Data Source Agility – ability to assimilate new data sources into the EDW architecture from stage to CDW+ and existing data marts.
• Bottom Line: Ability to quickly adapt to new data sources using existing structures
– New Attribute Agility – ability to absorb new attributes into the EDW architecture so that they can be loaded from the sources, and to integrate those attributes in terms of business context.
• Bottom Line: Ability to quickly incorporate new attributes in the EDW and apply business context to these attributes
– EDW Machine Agility – ability of the EDW machine (business and technical) to accommodate a new subject area from stage to mart.
• Bottom Line: EDW response time; a function of people, process & tools
– Changes in the DW – ability to absorb other changes such as integration logic, mappings, and business rules.
Presentation Layer Agility
– Presentation Layer Agility - ability to adapt to new business requirements based on existing data elements in the EDW.
• Bottom Line: Ability to quickly and flexibly spin off new data marts
– In this layer, agility is measured as a function of the time it takes to design, construct and deliver a new data mart.
– Variables in this layer include:
• Ability of the BI team to capture requirements and define the data mart.
• Ability of the ETL integration team to understand the EDW model and the mart.
• Strength and repeatability of ETL processes for sourcing the EDW.
• Strength and repeatability of ETL development, testing and delivery.
– Constraints:
• Dependent upon the existence of the data in the EDW.
• Dependent upon the level of business alignment of the data in the EDW.
New Data Source Agility
– New Data Source Agility - ability to assimilate new data sources into the EDW architecture from stage to CDW+ and existing data marts.
• Bottom Line: Ability to quickly adapt to new data sources using existing structures
– In this layer, agility is measured as a function of the time it takes to design, model, build and load data into the EDW from a new source.
– Variables in this layer include:
• Ability of the DW team to design the required model changes.
• Strength and repeatability of EDW development, testing and delivery.
• Ability of the ETL integration team to understand the new EDW model.
• Strength and repeatability of ETL processes for mapping and loading new source into the EDW.
– Constraints:
• Level of alignment of the new source data with the existing model.
• Dependent upon the level of business alignment with the data in the EDW.
New Attribute Technical Agility
– New Attribute (Technical) Agility - ability to absorb new attributes into the EDW architecture such that they can be loaded from the sources.
• Bottom Line: Ability to quickly incorporate new attributes in the EDW
– In this layer, agility is measured as a function of the time it takes to design, map, add and load a new attribute from a source.
– Variables in this layer include:
• Ability of the DW team to design the required model changes.
• Strength and repeatability of EDW development, testing and delivery.
• Ability of ETL integration team to understand new EDW attribute(s).
• Strength and repeatability of ETL processes for mapping and loading new source attributes into the EDW.
– Constraints:
• Level of alignment of the new attribute with the existing model.
• Dependent upon business context being defined.
New Attribute Business Context
– New Attribute (Business) Context Agility - ability to integrate new attributes in terms of business context.
• Bottom Line: Ability to quickly apply business context to new attributes
– In this layer, agility is measured as a function of the time it takes to align business context with a new attribute from a source.
– Variables in this layer include:
• Ability of the BI / DW team to accurately assess the business context of the new source attribute.
– Constraints:
• Level of alignment of the new attribute with the existing model.
• Dependent upon the level of business alignment with the data in the EDW.
EDW Machine Agility
– EDW Machine Agility – ability of the EDW machine (business and technical) to accommodate a new subject area from stage to mart.
• Bottom Line: EDW response time; a function of people, process & tools
– In this layer, agility is measured as an overall function of the EDW machine to integrate a new subject area from stage to mart.
– Variables in this layer include:
• Strength of the BI / DW development team.
• Strength and repeatability of EDW development, testing and delivery.
• Strength and ability of ETL integration team.
• Strength and repeatability of all BI / DW processes.
– Constraints:
• Executive sponsorship of the EDW program.
• Well defined organizational structure for BIW, BICC, Architecture and Governance.
CURRENT APPROACHES
DW Agility Current Approaches
– Incremental Data Warehouse Development
• Data Vault modeling, 2G, Anchor, etc.
– Agile BI Programs (People, Process, Models & Data)
• Methodologies (Centennium, Platon, etc.)
• Templates, Tools & Automation (WhereScape, etc.)
– Alternate & New Paradigms for the Agile DW
DW Agility Components
– Absorb Changes
• Capture the Change
• Understand the Change
– A major constraint on agility is the required data warehouse modeling changes...
• So we can capture the data (create the buckets)
• So we can understand the data (context, meaning)
– Align to business keys, classify, describe (metadata)
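The "create the buckets" and "align to business keys" points can be sketched loosely in the style of the Data Vault hubs and satellites listed under current approaches. This is a simplified, hypothetical illustration. The class `CustomerHub`, its `load` method, the satellite names, and the record sources "CRM" and "WEB" are assumptions, not standard Data Vault structures. The business key is registered once, and each source's descriptive attributes land in their own satellite. A new source therefore adds a bucket instead of changing existing ones.

```python
from datetime import datetime


class CustomerHub:
    """Hypothetical, simplified hub keyed by the business key (Cust_ID)."""

    def __init__(self):
        self.keys = {}        # business key -> record source that first delivered it
        self.satellites = {}  # satellite name -> list of (business key, load time, attributes)

    def load(self, satellite, business_key, attributes, record_source, load_time):
        # Align to the business key: register it once, whichever source delivers it
        self.keys.setdefault(business_key, record_source)
        # Capture the data: a new source gets its own satellite ("bucket"),
        # so existing satellites are left untouched
        self.satellites.setdefault(satellite, []).append(
            (business_key, load_time, dict(attributes))
        )


if __name__ == "__main__":
    hub = CustomerHub()
    hub.load("sat_customer_crm", "121202", {"Lname": "Lundquist", "Fname": "Carl"},
             "CRM", datetime(2011, 5, 1))
    # A later, new source for the same customer only adds a satellite
    hub.load("sat_customer_web", "121202", {"CustClass": "Big"},
             "WEB", datetime(2011, 5, 17))
    print(len(hub.keys), sorted(hub.satellites))
```

The sketch covers only capture. Understanding the data (context, meaning, classification) would still need separate descriptive metadata.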
Data Warehouse Agility
• Why create a Data Model for the DW?
• Model Data versus Meaning?
– Separate the capture of data from the meaning?
– The structure of a table versus the semantics
– Business meaning versus data loading
– As XML is to EDI
HYPER AGILITY AND THE NAME VALUE PAIR (NVP)
Concept of Name/Value Pair
Cust_ID | Lname | Fname | Add | City | State | Zip | Bdate
121202 | Lundquist | Carl | 22 Bird St | NYC | NY | 98291 | 10/9/1977
123335 | Dahlgren | Eva | 7 Academy | Madison | NJ | 07940 | 2/12/1982
139090 | Lundberg | Scott | 444 7th St | Tuborg | MN | 70098 | 4/22/1988
119944 | Hultquist | Darla | 17 South | Randolf | PA | 91121 | 9/22/1967
120334 | Forsberg | Sven | 117 East A | NYC | NY | 98292 | 8/19/1976
Moving to Name/Value Pair…
Each value or "data item" (the record's value for each attribute) is provided in list format, paired with the corresponding name or "field name" (column header) from the normalized table structure.
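The pairing step can be sketched as a simple transpose. This is a minimal sketch, assuming each row arrives as a list of strings in column order. The column names and sample rows come from the customer table. The helper name `to_name_value_pairs` is hypothetical.

```python
# Sketch: transpose rows of a normalized table into (name, value) pairs.
# Column names and sample rows come from the customer table above; the
# function name to_name_value_pairs is a hypothetical helper.

COLUMNS = ["Cust_ID", "Lname", "Fname", "Add", "City", "State", "Zip", "Bdate"]

ROWS = [
    ["121202", "Lundquist", "Carl", "22 Bird St", "NYC", "NY", "98291", "10/9/1977"],
    ["123335", "Dahlgren", "Eva", "7 Academy", "Madison", "NJ", "07940", "2/12/1982"],
]


def to_name_value_pairs(columns, rows):
    """Return one (name, value) tuple per data item, record after record."""
    pairs = []
    for row in rows:
        # zip() attaches each column heading to the matching data item
        pairs.extend(zip(columns, row))
    return pairs


if __name__ == "__main__":
    for name, value in to_name_value_pairs(COLUMNS, ROWS):
        print(f"{name:<8} {value}")
```

Values are kept as strings, so a Zip such as 07940 keeps its leading zero. The output is a single flat list, so nothing in it marks where one record ends and the next begins.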
Concept of Name/Value Pair
[Figure: each customer record drawn with its own copy of the column headers (Name) above its data items (Value)]
Moving to Name/Value Pair
[Figure: the customer table is transposed, with the column headings becoming the NAME column and each record's data items the VALUE column]
Name/Value Pair
Name | Value
Cust_ID | 121202
Lname | Lundquist
Fname | Carl
Add | 22 Bird St
City | NYC
State | NY
Zip | 98291
Bdate | 10/9/1977
Cust_ID | 123335
Lname | Dahlgren
Fname | Eva
Add | 7 Academy
City | Madison
State | NJ
Zip | 07940
Bdate | 2/12/1982
Cust_ID | 139090
Lname | Lundberg
Fname | Scott
The concept of the "record" is effectively lost in this transformation: a RECORD is now a set of Name/Value Pair instances.
CON: Loss of resolution on the record.
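Getting records back from a flat list of pairs depends on something outside the pairs themselves. Below is a hedged sketch with two hypothetical helpers. `rebuild_records` relies on the pairs staying in order, with each record starting at its `Cust_ID` pair. `rebuild_from_triples` assumes each pair carries an explicit record identifier (an assumption, not something the NVP list above contains).

```python
# Sketch: regrouping name/value pairs into records.
# Both helpers are hypothetical illustrations, not an established API.


def rebuild_records(pairs, key_name="Cust_ID"):
    """Regroup an ordered list of (name, value) pairs into records (dicts).

    Assumes the pairs keep their original order and that each record starts
    with key_name. If the order is lost, so are the record boundaries.
    """
    records = []
    for name, value in pairs:
        if name == key_name:
            records.append({})  # a key pair opens a new record
        if not records:
            raise ValueError(f"{name!r} appears before any {key_name!r} pair")
        records[-1][name] = value
    return records


def rebuild_from_triples(triples):
    """Regroup (record_id, name, value) triples; order no longer matters."""
    records = {}
    for record_id, name, value in triples:
        records.setdefault(record_id, {})[name] = value
    return records


if __name__ == "__main__":
    pairs = [("Cust_ID", "121202"), ("Lname", "Lundquist"),
             ("Cust_ID", "123335"), ("Lname", "Dahlgren")]
    print(rebuild_records(pairs))
    # Shuffled triples still regroup correctly because each carries its record id
    triples = [(2, "Lname", "Dahlgren"), (1, "Lname", "Lundquist"), (1, "Cust_ID", "121202")]
    print(rebuild_from_triples(triples))
```

Database rows have no inherent order, so in practice the second form (an explicit record identifier on every pair) is the safer way to keep record resolution.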
CON: Attributes are not pre-defined.
Because the attributes are not defined in advance, we don't know what to expect, and we can't check for attribute meaning, definitions, domain values, or data types.
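Checks like these can still be made, but only against rules kept outside the name/value structure. The sketch below is hypothetical. The rule table `ATTRIBUTE_RULES` and the function `check_pairs` are assumptions, and the specific rules (two-letter state, five-digit Zip, m/d/yyyy birth date) are illustrative guesses based on the sample data. Any name without a rule is reported as unknown.

```python
import re
from datetime import datetime


def _is_date(value):
    """True if value parses as month/day/year, e.g. 10/9/1977."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
        return True
    except ValueError:
        return False


# Hypothetical attribute metadata, managed separately from the NVP data itself
ATTRIBUTE_RULES = {
    "Cust_ID": lambda v: v.isdigit(),
    "Lname": lambda v: bool(v),
    "Fname": lambda v: bool(v),
    "Add": lambda v: bool(v),
    "City": lambda v: bool(v),
    "State": lambda v: re.fullmatch(r"[A-Z]{2}", v) is not None,
    "Zip": lambda v: re.fullmatch(r"\d{5}", v) is not None,
    "Bdate": _is_date,
}


def check_pairs(pairs, rules=ATTRIBUTE_RULES):
    """Return (name, value, problem) for every pair that cannot be validated."""
    problems = []
    for name, value in pairs:
        rule = rules.get(name)
        if rule is None:
            problems.append((name, value, "unknown attribute"))
        elif not rule(value):
            problems.append((name, value, "fails domain/type rule"))
    return problems


if __name__ == "__main__":
    print(check_pairs([("Zip", "7940"), ("Bdate", "10/9/1977"), ("CustClass", "Big")]))
```

In this example a Zip that has lost its leading zero is caught. A brand-new attribute such as CustClass can only be flagged as unknown until someone adds metadata for it.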
Name | Value
Cust_ID | 121202
Lname | Lundquist
Fname | Carl
Add | 22 Bird St
City | NYC
State | NY
Zip | 98291
Bdate | 10/9/1977
CustClass | Big
Cust_ID | 123335
Lname | Dahlgren
Fname | Eva
Add | 7 Academy
City | Madison
State | NJ
Zip | 07940
Bdate | 2/12/1982
CustClass | Small
Cust_ID | 139090
New attributes introduced into the source feed are added to the DW instantly: there is no modeling delay, no code change, and no ETL impact.
PRO: New attributes are absorbed instantly.
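The contrast can be sketched with two hypothetical loaders. `load_fixed` stands in for a table with predefined columns and rejects an attribute it has no column for. `load_nvp` appends whatever name/value pairs arrive, each tagged with a record identifier. Both function names and the in-memory list used as a "store" are assumptions for illustration only.

```python
# Sketch: a fixed-column load versus a name/value pair load.
# Both loaders are hypothetical; the list "store" stands in for a DW table.

FIXED_COLUMNS = ("Cust_ID", "Lname", "Fname", "Add", "City", "State", "Zip", "Bdate")


def load_fixed(record):
    """Map a record onto predefined columns; unknown attributes need a schema change."""
    unknown = set(record) - set(FIXED_COLUMNS)
    if unknown:
        raise KeyError(f"schema change needed for: {sorted(unknown)}")
    # Missing attributes become None in their predefined column
    return tuple(record.get(col) for col in FIXED_COLUMNS)


def load_nvp(store, record_id, record):
    """Append one (record_id, name, value) row per attribute, whatever its name."""
    for name, value in record.items():
        store.append((record_id, name, value))
    return store


if __name__ == "__main__":
    incoming = {"Cust_ID": "121202", "Lname": "Lundquist", "CustClass": "Big"}
    print(load_nvp([], 1, incoming))  # CustClass is stored with no model change
    try:
        load_fixed(incoming)
    except KeyError as err:
        print(err)  # the fixed-column load stops at the new attribute
```

The NVP loader accepts CustClass at no cost, but it also accepts anything else without question. This is the flip side of the CON points above.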
Hyper Agility
• Dealing with these issues requires a further level of abstraction, which in effect moves the persisted (historized, permanent, integrated) data store even further away from the business context it is intended to represent.
• The DW model (the data model itself) then becomes unreadable, that is, no longer understandable. ETL professionals will also find themselves further removed from this model. Whereas a good model is intuitive, self-descriptive, and aligned with business meaning, this approach takes a step in the other direction.
• Addressing these business-driven agility requirements causes the model itself to move much further away (an order of magnitude away) from the business, so far that it effectively becomes a technical solution built only on abstract representations.
Hyper Agility
• The context – the meaning of the data – will in these cases need to be managed in a different way.
• This can include a form of persisted and historized metadata concerning the mappings and business rules: in effect, a form of EAI (enterprise application integration) within the DW.
• Or it might include a more traditional secondary DW layer.
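The first option, persisted and historized mapping metadata, can be pictured as a small lookup with validity periods. This is a minimal sketch under stated assumptions. The `MAPPINGS` rows, their business terms, their dates, and the function `business_term_as_of` are all hypothetical. A source attribute name is resolved to the business term that applied on a given date.

```python
from datetime import date

# Hypothetical historized mapping metadata: which business term a source
# attribute name maps to, and over which period that mapping held.
MAPPINGS = [
    # (attribute name, business term, valid_from, valid_to (exclusive) or None)
    ("CustClass", "Customer Size Class", date(2011, 1, 1), date(2011, 6, 1)),
    ("CustClass", "Customer Segment", date(2011, 6, 1), None),
    ("Bdate", "Customer Birth Date", date(2011, 1, 1), None),
]


def business_term_as_of(name, as_of, mappings=MAPPINGS):
    """Return the business term mapped to name on the as_of date, or None."""
    for attr, term, valid_from, valid_to in mappings:
        if attr == name and valid_from <= as_of and (valid_to is None or as_of < valid_to):
            return term
    return None  # no business context defined for this attribute on that date


if __name__ == "__main__":
    print(business_term_as_of("CustClass", date(2011, 3, 15)))
    print(business_term_as_of("CustClass", date(2011, 6, 1)))
    print(business_term_as_of("Zip", date(2011, 3, 15)))
```

Because the mapping is historized, the raw pairs never change. When the business meaning changes, a new metadata row is added with a later validity date.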
DW AGILITY SUMMARY
• Consider specific Agility Requirements
• Classify Agility Types and consider Alternatives
• Distinguish between operational integration and DW
• Look to modeling techniques optimized for Data Warehouse
• Look at entire picture – people, process, models and data
• Consider specific methodologies, templates and tools
• Determine if hyper agility is a requirement
Questions?
www.GeneseeAcademy.com
CDVDM Certification Seminar
June 23-24
October 27-28
USA +1 303.526.0340
Sverige 070 250 2102