Using Pre-Packaged Data Models to Support Rapid BI Development

David Schoeff, Teradata Corp.
Jeff Hoffer, University of Dayton

© Jeffrey A. Hoffer


Overall Agenda

• Overview of iLDMs

• Learnings from case studies of iLDM application

• Workshop on using iLDMs in your organization


What We’ve Learned From Contrasting Case Studies

Jeffrey Hoffer, University of Dayton


Agenda

• Learning Resources I’ve Used
• Traditional Database Development Processes
  – Life cycle
  – Prototyping
• Case Studies of Rapid Development with LDMs
  – Overall process
  – Data mapping
  – The more general rapid BI environment that made it feasible and successful


Learning Resources
• On www.teradata.com
  – Search on “Hoberman” or “logical data models”; especially see “Leveraging the Industry Logical Data Model as Your Enterprise Data Model”
  – Search on “agile business intelligence”
• On www.beyenetwork.com
  – See Dan Linstedt’s blog, and “The 2-Month Data Model” by Bill Inmon
  – Search on “logical data models” or “industry data model”
• On www.tdwi.org
  – In White Papers, search on “agile business intelligence” or “industry data model”
• Hay, D.C. 1996, Data Model Patterns: Conventions of Thought, and 2006, Data Model Patterns: A Metadata Map
• Silverston, L., various dates, several volumes of The Data Model Resource Book, and various articles from 2002 in DM Review
• Moss, Larissa, President, Method Focus: see her articles and seminars on agile BI
• And, of course, there is Modern Database Management.


Traditional (Invented Here) Database Development Process

Conceptual/Enterprise Data Modeling: scope, ISA, EDM

Conceptual Data Modeling: detailed metadata

Logical Data Modeling: integrate, normalize, integrity, security

Physical/Technical Database Design: technology design, performance

Database Definition: schema, documentation, installation, training

Tuning: integrate new requirements, improve, fix (mini cycles of analysis, design, implementation)


Database Development with Prototyping (A Learning Together Approach)

Steps: Identify Need; Develop Initial Prototype; Revise & Enhance Prototype; Implement & Use Prototype; Convert to Operational Form

Database activities:
• Conceptual Data Modeling: preliminary CDM
• Logical Database Design: detailed requirements
• Physical Database Design: new database contents, structures, programs
• Database Implementation: coding, integrate contents
• Database Maintenance: evaluate and enhance
• Database Maintenance: tune, improve for performance

Flow labels: initial requirements; working prototype; deficiencies; new requirements; next version; if prototype is inefficient


Two Case Studies: LDMs Work in a Variety of Situations

• Case Study A
  – On-line retailer
  – Young
  – Highly competitive, rapidly changing
  – Information-driven
  – Dynamic, immersed leadership team
  – Turbulent period; needed a solution quickly
  – Business analysts embedded in units
  – LDM as “golden model”
• Case Study B
  – Technology provider
  – Mature
  – Innovative, detail-oriented, comprehensive
  – Highly analytical
  – Decentralized leadership team
  – Constant pressure and environmental changes
  – Diversified structure for business analysts
  – Internal systems as “golden model”


Data Modeling Process Changes for Rapid BI: Case Study A

• Background:
  – Hoffer, Watson, and Wixom
  – Large, on-line retailer
    • >300 hourly/daily reports
    • >400 Business Objects IDs
    • Also SAS, on a Teradata EDW platform
  – Critical need to get a BI environment up before the next Christmas buying season (core needs of the marketing, merchandising, and auction parts of the business were met in 9 months)
  – Limited internal resources due, in great measure, to the simultaneous implementation of a new ERP operational system.


Overview of Results: Case Study A

• The LDM was about 80% “right” before customization (they used several LDMs for the different industries represented by the company’s offerings)
  – Cost of an LDM is about one DBA for one year
  – Saved time, improved quality, less rework
• The LDM did not allow them to develop the new environment piecemeal: they needed a quick start with a solid foundation for the future of a rapidly changing business, and an enterprise perspective from the beginning
• Collaboration of external consultants (3 for one month, 2 for another 5 months, 1 for another 6 months) and internal data analysts
  – Key to short- and long-term success was involving the internal data analysts, who carry out the ongoing evolution of the data model
• “Acquisition of the LDMs was one of the key strategic things (we) did to gain quick results and long-term success with data warehousing and BI.” – DW Director.


Overview of Results: Case Study B

• Why did they use LDMs?
  – Use data consistently throughout BI applications
  – Adhere to government regulations
  – Understand data across the organization using common names
    • Supplier = Vendor, Commodity = Material
  – Comprehend transformations (part of LDMs)
    • Can combine and analyze data we didn’t know could be analyzed together
  – Allows normalized data structures to be traversed from anywhere to anywhere without introducing reporting anomalies
  – Allows quicker building of dimensional star schemas (dependent data marts) because the data structures are easy to navigate.
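To make the last point concrete, here is a minimal sketch that derives a star-schema dimension from normalized tables. It assumes a tiny, hypothetical product/category/supplier fragment; the table and column names are invented for illustration and are not taken from any vendor LDM. Because the normalized tables join cleanly, the dependent data mart can be defined as views over them.

```python
from __future__ import annotations

import sqlite3


def build_demo_mart() -> sqlite3.Connection:
    """Create a hypothetical normalized fragment and derive a dependent mart from it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Normalized fragment: each fact about a thing is stored once.
        CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
        CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
        CREATE TABLE product (
            product_id INTEGER PRIMARY KEY,
            product_name TEXT,
            category_id INTEGER REFERENCES category(category_id),
            supplier_id INTEGER REFERENCES supplier(supplier_id)
        );
        CREATE TABLE sale (
            sale_id INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES product(product_id),
            amount REAL
        );

        -- Dependent data mart: a denormalized dimension plus a fact view,
        -- both defined directly over the normalized tables.
        CREATE VIEW dim_product AS
            SELECT p.product_id, p.product_name, c.category_name, s.supplier_name
            FROM product p
            JOIN category c ON c.category_id = p.category_id
            JOIN supplier s ON s.supplier_id = p.supplier_id;
        CREATE VIEW fact_sales AS
            SELECT sale_id, product_id, amount FROM sale;
        """
    )
    conn.executemany("INSERT INTO supplier VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO category VALUES (?, ?)", [(10, "Tools"), (20, "Toys")])
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?, ?)",
        [(100, "Hammer", 10, 1), (101, "Kite", 20, 2)],
    )
    conn.executemany(
        "INSERT INTO sale VALUES (?, ?, ?)",
        [(1, 100, 20.0), (2, 100, 5.0), (3, 101, 12.5)],
    )
    return conn


def sales_by_category(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """A typical star join against the derived mart."""
    return conn.execute(
        """
        SELECT d.category_name, SUM(f.amount)
        FROM fact_sales f JOIN dim_product d ON d.product_id = f.product_id
        GROUP BY d.category_name
        ORDER BY d.category_name
        """
    ).fetchall()


if __name__ == "__main__":
    print(sales_by_category(build_demo_mart()))  # [('Tools', 25.0), ('Toys', 12.5)]
```

Views are used here only to keep the sketch short; a production mart would more likely materialize these tables on a load schedule.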


Database Development with LDMs

Identify Need → Evaluate Alternative Packages → Customize LDM → Evolve for New Needs (repeated)

• 6 months from need to first application
• 2 weeks for data model
• 90 days for first application
• 9 months from need to all phase I applications

Phases: Initial Infrastructure; then Applications & Infrastructure Evolution, delivered as 2-week release packages (application package development overlaps)


Observations About Customization

• Identify the entities, attributes, and relationships in the LDM that you need for the future
  – Concentrate on details for those you need first
  – Create a phased roadmap (entity clustering can show this; it is functional decomposition for data)
• Rename data to local terms
• Refine the LDM to local business rules
• Map LDM data to current databases (e.g., to design migration plans and load processes)
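The renaming and roadmap steps above can be sketched as simple bookkeeping. This is a minimal illustration, assuming a hypothetical glossary (reusing the Supplier = Vendor, Commodity = Material synonyms from Case Study B) and hypothetical phase assignments; grouping by phase is a flat stand-in for entity clustering, not the clustering technique itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LdmEntity:
    name: str              # entity name as delivered in the LDM (or after renaming)
    phase: Optional[int]   # roadmap phase; None means "not needed for the foreseeable future"


# Hypothetical local glossary: LDM term -> term the business actually uses.
LOCAL_TERMS = {"Supplier": "Vendor", "Commodity": "Material"}


def localize(entities: list[LdmEntity], glossary: dict[str, str]) -> list[LdmEntity]:
    """Rename entities to local terms; names without a local synonym are kept as-is."""
    return [LdmEntity(glossary.get(e.name, e.name), e.phase) for e in entities]


def roadmap(entities: list[LdmEntity]) -> dict[int, list[str]]:
    """Group the entities you need by phase, earliest phase first."""
    phases: dict[int, list[str]] = {}
    for e in entities:
        if e.phase is not None:
            phases.setdefault(e.phase, []).append(e.name)
    return dict(sorted(phases.items()))


entities = [
    LdmEntity("Supplier", 1),
    LdmEntity("Commodity", 1),
    LdmEntity("Party", 1),
    LdmEntity("Campaign", 2),
    LdmEntity("Auction Lot", None),
]
print(roadmap(localize(entities, LOCAL_TERMS)))
# {1: ['Vendor', 'Material', 'Party'], 2: ['Campaign']}
```

Keeping the LDM name and the local name in a glossary, rather than overwriting the model, makes it easier to apply later vendor updates to the LDM.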


What Is Mapping?
• The process of relating each LDM data element to a source
  – Do we need it? (now, later?)
  – From either current systems or the LDM
  – Where do we get it?
  – When do we get it?
  – How do we define it, and what do we name it?
  – Does it need to be transformed? Or do we need a more atomic source?
  – Does the source system need to be “improved”?
• It is NOT about resolving conflicts between source systems or fixing source systems
• It is NOT about designing or writing the ETL.
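One way to keep the mapping questions above in front of the team is a mapping worksheet with one row per LDM element. The sketch below is a hypothetical layout: the field names, the "now"/"later"/"no" codes, and the source-naming convention are all assumptions made for illustration. Consistent with the slide, a row only records *whether* a transformation is needed; it does not design the ETL.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class MappingRow:
    """One row of a hypothetical LDM-to-source mapping worksheet."""
    ldm_element: str                      # e.g. "Customer.Birth Date"
    needed: str                           # "now", "later", or "no"      (Do we need it?)
    source: Optional[str] = None          # e.g. "ERP.CUST.DOB"          (Where do we get it?)
    load_timing: Optional[str] = None     # e.g. "daily"                 (When do we get it?)
    local_name: Optional[str] = None      #                              (What do we name it?)
    transformation: Optional[str] = None  # note only, not ETL code      (Transformed?)
    source_needs_fix: bool = False        #                              (Source "improved"?)


def open_items(rows: list[MappingRow]) -> list[str]:
    """LDM elements needed now that still lack a source or a load timing."""
    return [
        r.ldm_element
        for r in rows
        if r.needed == "now" and (r.source is None or r.load_timing is None)
    ]


rows = [
    MappingRow("Customer.Name", "now", source="ERP.CUST.NAME",
               load_timing="daily", local_name="Customer Name"),
    MappingRow("Customer.Birth Date", "now", source="CRM.CONTACT.DOB"),
    MappingRow("Customer.Household", "later"),
]
print(open_items(rows))  # ['Customer.Birth Date']
```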


Key Points About Mapping
• Some elements will be missing in the LDM and in current databases; these become obvious because of the LDM
  – Are the mismatches really needed?
  – Avoid the temptation to always accept current databases as the tie-breaker
  – Encourage “thinking of the possibilities” from elements in the LDM that are not in current databases
  – Current databases are often poorly documented, which makes the process difficult
  – Watch for duplicate, inconsistent entries of the “same data” in different databases.
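The mismatches described above reduce to set comparisons once element names are on a common footing. A minimal sketch, assuming that every element name has already been normalized to local terms (a real mapping must also compare definitions and values, because two databases can use the same name for different data):

```python
from __future__ import annotations


def mapping_gaps(ldm: set[str], current_dbs: dict[str, set[str]]) -> dict[str, list[str]]:
    """Compare LDM elements with the elements found in each current database."""
    current_all: set[str] = set().union(*current_dbs.values())
    seen_in: dict[str, list[str]] = {}
    for db, elements in sorted(current_dbs.items()):
        for element in elements:
            seen_in.setdefault(element, []).append(db)
    return {
        # In the LDM but not in any current database: "think of the possibilities".
        "ldm_only": sorted(ldm - current_all),
        # In current databases but not in the LDM: is the mismatch really needed?
        "current_only": sorted(current_all - ldm),
        # Held in more than one database: watch for duplicate, inconsistent entries.
        "in_several_dbs": sorted(e for e, dbs in seen_in.items() if len(dbs) > 1),
    }


ldm = {"Customer Name", "Customer Email", "Household"}
current = {
    "ERP": {"Customer Name", "Credit Limit"},
    "Web": {"Customer Name", "Customer Email"},
}
print(mapping_gaps(ldm, current))
# {'ldm_only': ['Household'], 'current_only': ['Credit Limit'], 'in_several_dbs': ['Customer Name']}
```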


Key Points About Mapping

• The LDM is comprehensive in business rules (e.g., cardinalities and generalization) and can be complex; thus it is flexible to change
  – Do you really need all this complexity? Do you need something more restrictive?
  – Does the comprehensiveness suggest opportunities?
  – “Smartly tailor” the LDM to the organization
  – LDM updates can react to changing standards and regulations
  – The current environment likely has different standards and regulations for different sources.


Key Points About Mapping

• Engage users and managers early, because you have a validated prototype data model from the start; the LDM provides a visual, comprehensive checklist of possible questions
  – “Would we ever have a customer order with more than one customer?”
  – “Might an employee also be a customer?”
  – Give special attention to elements of the LDM that SMEs did not mention in interviews: “Will we ever go in that direction?” This is a basis for impertinence; it’s all about the questions you ask!
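The questions above are put to SMEs, but current data can show whether the situation already occurs. A minimal sketch, assuming hypothetical extracts: order lines as (order_id, customer_id) pairs, and employee and customer e-mail addresses. Matching people on a normalized e-mail address is a simplifying assumption; real party matching is considerably harder.

```python
from __future__ import annotations

from collections import defaultdict


def orders_with_multiple_customers(order_lines: list[tuple[str, str]]) -> list[str]:
    """Order ids linked to more than one distinct customer id."""
    customers: dict[str, set[str]] = defaultdict(set)
    for order_id, customer_id in order_lines:
        customers[order_id].add(customer_id)
    return sorted(o for o, c in customers.items() if len(c) > 1)


def employees_who_are_customers(employee_emails: set[str], customer_emails: set[str]) -> set[str]:
    """People appearing in both roles, matched on trimmed, lower-cased e-mail."""
    employees = {e.strip().lower() for e in employee_emails}
    customers = {c.strip().lower() for c in customer_emails}
    return employees & customers


lines = [("O1", "C1"), ("O1", "C1"), ("O2", "C1"), ("O2", "C2")]
print(orders_with_multiple_customers(lines))  # ['O2']
print(employees_who_are_customers({"Ann@X.com "}, {"ann@x.com", "bob@x.com"}))  # {'ann@x.com'}
```

An empty result does not settle the question; it only shows that the situation has not been recorded so far.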


Key Points About Mapping

• Mapping is critical; you can’t afford to do a bad job
• Mapping projects are great student projects in a capstone course: they require integrating data and systems knowledge and skills with an understanding of differences across platforms, ETL, timing, etc.


More on Customization
• Even with good mapping, do data profiling to identify overloading, obsolescence, empty columns, hidden (undocumented) requirements, and outliers; the proof is in the data
  – Understand the reasons for inconsistencies
    • Poorly designed databases
    • Inaccurate current data, which you do not want to migrate to the new database for analytics; this is a time for data cleansing
• Investigate the reasons for missing data in mapped attributes
  – Application software errors, human data entry errors, optional data (subtypes).
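To separate optional (subtype) data from genuinely missing data, one simple check is the empty-value rate of each mapped attribute within each value of a candidate subtype discriminator. A minimal sketch, assuming rows arrive as dictionaries and that both `None` and the empty string count as missing; the column names in the example are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def null_rate_by_group(
    rows: list[dict[str, object]], group_col: str, columns: list[str]
) -> dict[str, dict[str, float]]:
    """Share of empty values (None or "") per column, within each group value."""
    totals: dict[str, int] = defaultdict(int)
    nulls: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in rows:
        group = str(row.get(group_col))
        totals[group] += 1
        for col in columns:
            if row.get(col) in (None, ""):
                nulls[group][col] += 1
    return {g: {c: nulls[g][c] / n for c in columns} for g, n in sorted(totals.items())}


rows = [
    {"party_type": "PERSON", "birth_date": "1970-01-01", "tax_id": None},
    {"party_type": "PERSON", "birth_date": None, "tax_id": None},
    {"party_type": "ORG", "birth_date": None, "tax_id": "12-345"},
    {"party_type": "ORG", "birth_date": "", "tax_id": "67-890"},
]
print(null_rate_by_group(rows, "party_type", ["birth_date", "tax_id"]))
# {'ORG': {'birth_date': 1.0, 'tax_id': 0.0}, 'PERSON': {'birth_date': 0.5, 'tax_id': 1.0}}
```

Reading the example: a rate of 1.0 in one group and near 0.0 in another suggests a subtype-specific attribute (optional data), while the 0.5 for a person’s birth date points to entry errors or an application gap worth investigating.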


Data Profiling a Must

• Profiling = statistical analysis to uncover hidden patterns and flaws
• Look for outliers
• Sorting by date can reveal overloading, patterns of empty values, when data moved columns over time, or other shifts in the data
• Shifts in the data can be matched to major system changes
• Empty columns can imply entity subtypes
• Wide tables can imply denormalization, which can encourage erroneous data
• Profiling can identify flaws in current systems, the need for cleanup efforts, and the need to improve database design.
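Two of the profiling checks above can be sketched with the standard library alone: fill rates per column per month (to spot when data moved columns) and outlier detection. The sketch assumes rows are dictionaries with an ISO `YYYY-MM-DD` date string, and it uses Tukey’s interquartile-range fences as one common outlier rule; the deck does not prescribe a particular statistic.

```python
from __future__ import annotations

import statistics
from collections import defaultdict


def monthly_fill_rates(
    rows: list[dict[str, object]], date_col: str, columns: list[str]
) -> dict[str, dict[str, float]]:
    """Share of non-empty values per column for each month, in date order."""
    counts: dict[str, int] = defaultdict(int)
    filled: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in rows:
        month = str(row[date_col])[:7]  # "YYYY-MM"
        counts[month] += 1
        for col in columns:
            if row.get(col) not in (None, ""):
                filled[month][col] += 1
    return {m: {c: filled[m][c] / n for c in columns} for m, n in sorted(counts.items())}


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (needs at least two values)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


rows = [
    {"order_date": "2023-01-05", "ship_method": "UPS", "carrier": None},
    {"order_date": "2023-01-20", "ship_method": "FEDEX", "carrier": ""},
    {"order_date": "2023-02-03", "ship_method": None, "carrier": "UPS"},
]
print(monthly_fill_rates(rows, "order_date", ["ship_method", "carrier"]))
# {'2023-01': {'ship_method': 1.0, 'carrier': 0.0}, '2023-02': {'ship_method': 0.0, 'carrier': 1.0}}
print(iqr_outliers([10, 12, 11, 13, 12, 11, 500]))  # [500]
```

In the example, shipping data appears to move from `ship_method` to `carrier` in February 2023, the kind of shift that can then be matched to a system change.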


A Chance to Verify Business Rules
• Verify each business rule (in the LDM) for your organization
  – Review metadata (names, definitions, data types, formats, lengths, cardinality, etc.) with the best SMEs
  – Business rules dictate the transformations of operational data into the analytical database
  – Different operational systems may mean different business rules.
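Once metadata has been agreed with SMEs, it can be checked against sample values from each operational system; running the same rule against every source helps expose where systems follow different business rules. A minimal sketch, assuming a hypothetical `ElementRule` record (type, maximum length, regular-expression format, nullability); the postal-code rule in the example is illustrative only.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementRule:
    """Metadata for one LDM element as agreed with SMEs (illustrative fields)."""
    name: str
    data_type: type
    max_length: Optional[int] = None
    pattern: Optional[str] = None  # regular expression the whole value must match
    nullable: bool = True


def rule_violations(rule: ElementRule, values: list[object]) -> list[str]:
    """Describe every sample value that breaks the rule."""
    problems: list[str] = []
    for v in values:
        if v is None:
            if not rule.nullable:
                problems.append(f"{rule.name}: missing value")
            continue
        if not isinstance(v, rule.data_type):
            problems.append(f"{rule.name}: {v!r} is not {rule.data_type.__name__}")
            continue
        text = str(v)
        if rule.max_length is not None and len(text) > rule.max_length:
            problems.append(f"{rule.name}: {v!r} is longer than {rule.max_length}")
        if rule.pattern is not None and re.fullmatch(rule.pattern, text) is None:
            problems.append(f"{rule.name}: {v!r} does not match {rule.pattern}")
    return problems


postal = ElementRule("Postal Code", str, max_length=10,
                     pattern=r"\d{5}(-\d{4})?", nullable=False)
for problem in rule_violations(postal, ["45469", "45469-0001", "4546", None, 45469]):
    print(problem)
```

Here three of the five sample values are flagged: one has the wrong format, one is missing, and one arrives as a number rather than text.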


Observations About Evolve
• As new business needs arise, conduct mini customization projects that extend the current implementation from the LDM with a different focus (the LDM implementation scales easily as an architectural foundation for agile development)
• Dynamic businesses will yield extensions to LDMs, so vendors welcome feedback
• LDMs provide the flexibility and speed to react to (and even anticipate) “new” needs
• BI “systems” are not complex (although the infrastructure is), which is why LDMs are valuable and agile development works.


PMI View of Agile Project Management

Source: Sliger, M. “A Project Manager’s Guide to Going Agile”, Rally Software Development Corp., © 2006


Typical Evolve Scenario


An Environment Conducive to Rapid BI: Case Study A

• Organizational Climate
  – Compelled to do rapid development of infrastructure and applications
    • The business moves quickly (a dot.com or “swarming” mentality) when leadership turns its focus to something
    • Attitude of “we’ve defined it, let’s get it done, then move on”; perfection is not critical
  – Leaders see the firm as an “information company”
    • An interaction of technology and retail
    • Using technology and information well is a competitive advantage
    • Needed a drastic change to jump-start the transformation: the LDMs
  – The LDM also overcomes the hazards of swarming (lack of architecture/plan).


An Environment Conducive to Rapid BI: Case Study A

• LDM and Organizational Fit
  – LDMs essentially modify the agile approach at the start by making the business define core requirements (infrastructure) up front, while still supporting iterative evolution
    • A balance to “swarming”
  – The leadership team sets priorities and is willing to evolve in phases (the normal agile chunk approach)
    • The synergistic initiative gets the greatest attention
    • The LDM supports iteration, which builds trust
    • Incremental changes (2-week chunks of work) show continuing commitment (rather than a one-time, big-bang change), which also builds trust.


An Environment Conducive to Rapid BI: Case Study A

• Need tech- and business-savvy people
  – Business analysts are embedded in each business area (which removes bureaucracy) and report to both the VP of the business area and the head of BI applications; this creates deep knowledge about the business and facilitates rapid development
  – Business managers with strong technical aptitude and skills: a hiring priority.


Workshop Questions

• To start, do you have any questions about the iLDM?
• How does your ERD match up with the iLDM?
• What difficulties do you have merging the iLDM with your ERD?
• In your environment, which model trumps the other, and why?
• Is the iLDM “more” than you need? Why? How do you deal with that?
• Are there things missing in the iLDM that you need in your environment?
• What kinds of resistance would you get to using an iLDM?
• How would you make use of an iLDM in your environment?
