A company of Daimler AG LECTURE @DHBW: DATA WAREHOUSE PART XI: DATA VAULT MODELING ANDREAS BUCKENHOFER, DAIMLER TSS



ABOUT ME

https://de.linkedin.com/in/buckenhofer

https://twitter.com/ABuckenhofer

https://www.doag.org/de/themen/datenbank/in-memory/

http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/

https://www.xing.com/profile/Andreas_Buckenhofer2

Andreas Buckenhofer
Senior DB [email protected]

Since 2009 at Daimler TSS
Department: Big Data
Business Unit: Analytics


ANDREAS BUCKENHOFER, DAIMLER TSS GMBH

Data Warehouse / DHBWDaimler TSS 3

“Forming good abstractions and avoiding complexity is an essential part of a successful data architecture”

Data has always been my main focus throughout my long career in data integration. I work for Daimler TSS as a Database Professional and Data Architect with over 20 years of experience in Data Warehouse projects. I have been working with Hadoop and NoSQL since 2013. I keep my knowledge up to date, and I learn new things, experiment, and program every day.

I share my knowledge in internal presentations and as a speaker at international conferences. I regularly give a full lecture on Data Warehousing and a seminar on modern data architectures at Baden-Wuerttemberg Cooperative State University (DHBW). I also gained international experience through a two-year project in Greater London and several business trips to Asia.

I’m responsible for In-Memory DB Computing at the independent German Oracle User Group (DOAG) and was honored by Oracle as an ACE Associate. I hold current certifications such as “Certified Data Vault 2.0 Practitioner (CDVP2)”, “Big Data Architect”, “Oracle Database 12c Administrator Certified Professional”, “IBM InfoSphere Change Data Capture Technical Professional”, etc.


NOT JUST AVERAGE: OUTSTANDING.

As a 100% Daimler subsidiary, we give 100 percent, always and never less. We love IT and pull out all the stops to aid Daimler's development with our expertise on its journey into the future.

Our objective: We make Daimler the most innovative and digital mobility company.


INTERNAL IT PARTNER FOR DAIMLER

+ Holistic solutions according to the Daimler guidelines

+ IT strategy

+ Security

+ Architecture

+ Developing and securing know-how

+ TSS is a partner who can be trusted with sensitive data

As a subsidiary: maximum added value for Daimler

+ Market closeness

+ Independence

+ Flexibility (short decision-making process, ability to react quickly)


LOCATIONS

Daimler TSS Germany: 7 locations, 1000 employees*
  Ulm (Headquarters)
  Stuttgart
  Berlin
  Karlsruhe

Daimler TSS China: Hub Beijing, 10 employees

Daimler TSS Malaysia: Hub Kuala Lumpur, 42 employees

Daimler TSS India: Hub Bangalore, 22 employees

* as of August 2017


WHAT YOU WILL LEARN TODAY

By the end of this lecture you will be able to:

• Understand differences in data modeling between OLTP and OLAP
• Understand why data modeling is important
• Understand data modeling in the Core Warehouse Layer and Data Mart Layer
    • Data Vault
    • Dimensional Model / Star schema
• Understand dimensions and facts
• Understand ROLAP & MOLAP


LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE

[Figure: logical standard data warehouse architecture. Labels: internal and external data sources (OLTP); Data Warehouse backend with Staging Layer (Input Layer), Integration Layer (Cleansing Layer), Core Warehouse Layer (Storage Layer), Aggregation Layer, and Mart Layer (Output Layer, Reporting Layer); Frontend; cross-cutting components Metadata Management, Security, and DWH Manager incl. Monitor.]


DATA MODELS IN THE DWH

Staging Layer
  Characteristics:
    ▪ Temporary storage
    ▪ Ingest of source data
  Data model:
    ▪ Normally 1:1 copy of source table structure, usually without constraints and indexes

Core Warehouse Layer
  Characteristics:
    ▪ Historization / bitemporal data
    ▪ Integration
    ▪ Tool-independent
    ▪ Non-redundant data storage
  Data model:
    ▪ 3NF with historization
    ▪ Head and Version modelling
    ▪ Data Vault
    ▪ Anchor modeling
    ▪ Dimensional model with historization (possible)

Data Mart Layer
  Characteristics:
    ▪ Performance for end user queries required
    ▪ Tool-dependent
    ▪ Lots of joins necessary to answer complex questions
  Data model:
    ▪ Flat structures, esp. Dimensional model (ROLAP / MOLAP / HOLAP)


DATA MODELING: 3NF, STAR SCHEMA, DATA VAULT

Vehicle data: Vehicle identifier, Manufacturer, Model, Type, Plant, Delivery Date, Production Date, Price, Source system, Load date/time, Buyer, Salesperson, Engine

[Figure: the vehicle attributes are classified as Business Key, Relationships, or Context and history, and mapped to 3NF, Star (Dimensions, Facts), and Data Vault (Hub, Link, Sat).]


DATA VAULT - ARCHITECTURE, METHODOLOGY, MODEL

Lecture part 1: DWH Architectures
Lecture part 2: DWH Data Modeling

Architecture
• Multi-Tier
• Scalable
• Supports NoSQL

Methodology
• Repeatable
• Measurable
• Agile

Model
• Flexible
• Hash based
• Hub & Spoke

Implementation: Automation, Pattern based, High speed

DATA VAULT 2.0, DEFINITION BY DAN LINSTEDT

“Data Vault 2.0” is a system of business intelligence which includes: Modeling, Methodology, Architecture, and Implementation best practices. The components, also known as pillars of Data Vault 2.0 are identified as follows:

• Data Vault 2.0 Modeling - Focused on Process and Data Models
• Data Vault 2.0 Methodology - Following SCRUM and agile ways of working
• Data Vault 2.0 Architecture - Includes NoSQL and big-data systems
• Data Vault 2.0 Implementation - Pattern based automation and generation

The term “Data Vault” is merely a marketing term chosen in 2001 to represent the system to the market. The true name for the Data Vault System of BI is: common foundational warehouse modeling, methodology, architecture, and implementation.

Source: https://www.linkedin.com/pulse/defining-data-vault-10-20-business-dan-linstedt/


HUB

Unique identification by natural keys (Business Keys)


STRUCTURE HUB TABLES


HUB TABLES: TYPICAL CHARACTERISTICS

• Business Keys should be natural keys used by the business (e.g. Vehicle Identifier, Serial number)
• Business Keys should stand alone and have meaning to the business
• Business Keys should never change and should keep the same semantic meaning and the same granularity
• Focusing on Business Keys (instead of on source system surrogates) ensures that the result serves the needs of the business
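A minimal sketch of how a HUB could be loaded, given as an illustration only. The slide showing the actual HUB structure is not part of this text, so the column names (`hub_hash_key`, `business_key`, `load_date`, `record_source`), the MD5 hash, and the trim/upper-case normalization are assumptions; the deck itself only states that the model is "hash based" and that source system and load date/time are tracked.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_key_parts: str) -> str:
    """Derive a deterministic hash key from one or more business key parts."""
    # Assumed conventions: trim, upper-case, join with "||", hash with MD5.
    normalized = "||".join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubRow:
    hub_hash_key: str      # hypothetical column names
    business_key: str
    load_date: datetime
    record_source: str


def load_hub(hub: dict, business_keys, record_source: str) -> None:
    """Insert each business key at most once; existing rows are never updated."""
    now = datetime.now(timezone.utc)
    for business_key in business_keys:
        key = hash_key(business_key)
        if key not in hub:
            hub[key] = HubRow(key, business_key, now, record_source)


h_vehicle: dict = {}
load_hub(h_vehicle, ["V1", "V2", "V1"], record_source="stg_vehicle")
print(len(h_vehicle))  # 2: the repeated business key V1 is stored only once
```

Because the key is derived only from the business key, the same vehicle arriving from a second source system would map to the same HUB row.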


TYING BUSINESS PROCESSES TO BUSINESS KEYS

[Figure: business processes along a time axis. Process labels: Customer Contact, Sales, Contracts, Procurement, Planning, Manufacturing, Delivery, Finance, $$ Revenue. Example keys: SLS123, *P123MFG. Annotations: Excel Spreadsheet, Manual Process, NO VISIBILITY!]

© Copyright 1990-2017, Dan Linstedt, all rights reserved


LINK

Unique relationships between Business Keys (HUBs)


STRUCTURE LINK TABLES


LINK TABLES: TYPICAL CHARACTERISTICS

• A LINK models a relationship between 2 or more HUBs
• The relationship is always n:m
• The composed key must be unique. One of the foreign keys is the driving key
• Link-to-Link relationships are allowed but should be avoided in a physical implementation because they create load dependencies
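The uniqueness of the composed key can be illustrated with a small sketch. It assumes a LINK between a vehicle HUB and an engine HUB; the column names, the MD5 hash, and the `||` delimiter are hypothetical conventions, not taken from the slides.

```python
import hashlib


def hash_key(*business_key_parts: str) -> str:
    """Hash one or more business key parts (MD5 and '||' are assumed conventions)."""
    joined = "||".join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def load_link(link: dict, relationships, record_source: str) -> None:
    """Insert each vehicle/engine combination once, so the composed key stays unique."""
    for vehicle_id, engine_id in relationships:
        link_hash_key = hash_key(vehicle_id, engine_id)
        if link_hash_key not in link:
            link[link_hash_key] = {
                "vehicle_hash_key": hash_key(vehicle_id),  # reference to the vehicle HUB
                "engine_hash_key": hash_key(engine_id),    # reference to the engine HUB
                "record_source": record_source,
            }


l_vehicle_engine: dict = {}
load_link(l_vehicle_engine, [("V1", "E1"), ("V1", "E4"), ("V1", "E1")], "stg_vehicle")
print(len(l_vehicle_engine))  # 2: V1-E1 is stored once, V1-E4 is a second relationship
```

Since nothing in the structure restricts a vehicle to one engine (or an engine to one vehicle), the table is n:m by construction.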


CANDIDATES FOR LINKS

• Relationships / Associations
    • Foreign Keys in OLTP systems
• Hierarchies and Redefinitions
    • Hierarchical relationships are modeled by one link and two connections to HUBs: HAL (parent-child LINK) and SAL (same-as LINK)
• Transactions and events are often modeled as a Link (could also be a Hub)
    • E.g. sales order or sensor data
    • Intensive discussions about modeling as Hub or Link at conferences or on social media (the modeling solution depends on requirements, context, etc.)


SAT

Descriptive, detailed, current and historized data


STRUCTURE SAT TABLES


SAT TABLES: TYPICAL CHARACTERISTICS

• Contains all non-key attributes
• Is connected to exactly one Hub or Link
• HUB or LINK tables can (should) have several SAT tables, e.g. by source system
• In the extreme case, can contain only one column (or any number of columns)


SAT TABLE DESIGN

Different criteria to design SAT tables (separate data into different SAT tables):

• Source system
• Rate of change
• Data types (e.g. separate CLOBs or other lengthy textual fields)


SAT TABLE DESIGN

Separate by rate of change in order to avoid redundant storage of data

[Figure: attributes grouped into "Data that change often" and "Data that do not change". Attributes shown: Delivery date, SW-Version Controlunit, Theft message, Color, Comments, Interior.]


HOW MANY ROWS ARE STORED IN THE HUB AND LINK TABLES?

Staging data in table stg_vehicle from 15.01.2015

vehicleid  model   productiondate  engine  color
V1         SUV     15.01.13        E1      red
V2         Cabrio  16.01.13        E2      blue
V1         SUV     15.01.13        E1      red
V3         Cabrio  17.01.13        E3      red

Staging data in table stg_vehicle from 16.01.2015

vehicleid  model   productiondate  engine  color
V1         SUV     16.01.13        E4      red
V4         Cabrio  17.01.13        E5      blue

Staging data in table stg_vehicle from 17.01.2015

vehicleid  model   productiondate  engine  color
V1         SUV     16.01.13        E1      red


HOW MANY ROWS ARE STORED IN THE HUB AND LINK TABLES?

• H_VEHICLE
    • 4 rows: V1, V2, V3, V4
• H_ENGINE
    • 5 rows: E1, E2, E3, E4, E5
• L_PLUGGED_IN_EFFECTIVITY
    • 5 rows: V1-E1, V2-E2, V3-E3, V1-E4, V4-E5
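The row counts above can be reproduced with a short sketch. The staging rows are copied from the stg_vehicle example; the insert-if-new helper and the list-based tables are simplifications chosen for illustration, and the link table is named `l_plugged_in` here for brevity.

```python
# Staging batches as (vehicleid, model, productiondate, engine, color), keyed by load date.
batches = {
    "15.01.2015": [
        ("V1", "SUV", "15.01.13", "E1", "red"),
        ("V2", "Cabrio", "16.01.13", "E2", "blue"),
        ("V1", "SUV", "15.01.13", "E1", "red"),
        ("V3", "Cabrio", "17.01.13", "E3", "red"),
    ],
    "16.01.2015": [
        ("V1", "SUV", "16.01.13", "E4", "red"),
        ("V4", "Cabrio", "17.01.13", "E5", "blue"),
    ],
    "17.01.2015": [
        ("V1", "SUV", "16.01.13", "E1", "red"),
    ],
}

h_vehicle, h_engine, l_plugged_in = [], [], []


def insert_if_new(table: list, key) -> None:
    """Hubs and links store every key exactly once and are never updated."""
    if key not in table:
        table.append(key)


for load_date, rows in batches.items():
    for vehicleid, _model, _productiondate, engine, _color in rows:
        insert_if_new(h_vehicle, vehicleid)
        insert_if_new(h_engine, engine)
        insert_if_new(l_plugged_in, (vehicleid, engine))

print(len(h_vehicle), len(h_engine), len(l_plugged_in))  # 4 5 5
```

The V1-E1 row delivered on 17.01.2015 adds nothing to the link, because that combination was already stored by the first load.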


HOW MANY ROWS ARE STORED IN THE FIRST 3 SAT TABLES?

Staging data in table stg_vehicle from 15.01.2015

vehicleid  model   productiondate  engine  color
V1         SUV     15.01.13        E1      red
V2         Cabrio  16.01.13        E2      blue
V1         SUV     15.01.13        E1      red
V3         Cabrio  17.01.13        E3      red

Staging data in table stg_vehicle from 16.01.2015

vehicleid  model   productiondate  engine  color
V1         SUV     16.01.13        E4      red
V4         Cabrio  17.01.13        E5      blue

Staging data in table stg_vehicle from 17.01.2015

vehicleid  model   productiondate  engine  color
V1         SUV     16.01.13        E1      red


HOW MANY ROWS ARE STORED IN THE FIRST 3 SAT TABLES? (SOLUTION)

Row counts shown on the slide: 5, 4, 6, 4, 5, 5
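How such satellite row counts arise can be sketched as follows. The SAT tables the slide refers to are not reproduced in this text, so the sketch makes two assumptions: a satellite on the vehicle HUB receives a new row only when its attributes differ from the latest stored version, and the attribute groupings used below are hypothetical examples rather than the slide's actual tables.

```python
# Staging rows reduced to the vehicle attributes: (vehicleid, model, productiondate, color).
batches = [
    ("15.01.2015", [
        ("V1", "SUV", "15.01.13", "red"),
        ("V2", "Cabrio", "16.01.13", "blue"),
        ("V1", "SUV", "15.01.13", "red"),
        ("V3", "Cabrio", "17.01.13", "red"),
    ]),
    ("16.01.2015", [
        ("V1", "SUV", "16.01.13", "red"),
        ("V4", "Cabrio", "17.01.13", "blue"),
    ]),
    ("17.01.2015", [
        ("V1", "SUV", "16.01.13", "red"),
    ]),
]

ATTRIBUTE_NAMES = ("model", "productiondate", "color")


def load_satellite(batches, columns):
    """Append a row only when the selected attributes differ from the latest version."""
    positions = [ATTRIBUTE_NAMES.index(column) for column in columns]
    satellite = []   # append-only: (vehicleid, load_date, attribute values)
    latest = {}      # vehicleid -> most recently stored attribute values
    for load_date, rows in batches:
        for vehicleid, *attributes in rows:
            current = tuple(attributes[p] for p in positions)
            if latest.get(vehicleid) != current:
                satellite.append((vehicleid, load_date, current))
                latest[vehicleid] = current
    return satellite


all_attributes = load_satellite(batches, ("model", "productiondate", "color"))
color_only = load_satellite(batches, ("color",))
print(len(all_attributes), len(color_only))  # 5 4
```

The wide satellite gets a fifth row because the production date of V1 changes on 16.01.2015, while a satellite holding only the color stays at four rows. This is the effect that separating attributes by rate of change aims for.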


ENSEMBLE MODELING

Hans Hultgren: “An ensemble is a representation of a Core Business Concept including all of its parts – the business key, with context and relationships”

[Figure: an Entity from the source (e.g. vehicle) is turned into an Ensemble by Decomposition.]


ENSEMBLE MODELING – NOT JUST DATA VAULT 2.0


EXERCISE DATA VAULT

The following data model shows vehicle sales with the entities

• Person (sales_person and owner)
• Vehicle
• Production_plant

Architect a Data Vault model for the Core Warehouse Layer


SAMPLE SOLUTION DATA VAULT


DATA VAULT - ADVANTAGES

• Flexible / agile approach
• Highly parallel data loads, scalable
• Automatable
• Systematic approach that covers historization and integration
• Full auditability
• No updates or deletes on business data
• Horizontal and vertical partitioning
• Supports / combines RDBMS and Hadoop/NoSQL technologies
• Separates soft and hard rules into different parts of the data integration


DATA VAULT - DISADVANTAGES

• More tables
• More joins
• Performance to load Data Marts can be a challenge
• Logic to load Data Marts can be rather complex if many tables are involved
• All relationships are modeled n:m (documentation necessary!). Data Vault assumes the worst-case scenario for relationships
• The same source table is used several times while loading HUBs, SATs, and LINKs
• Data Vault is an additional layer compared to a Kimball DWH, bringing in some additional overhead
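The extra join effort when loading a Data Mart can be seen even in a tiny in-memory sketch. All table contents, hash key values, and column names below are made up for illustration; the point is only that one flat dimension row needs a HUB lookup, a LINK traversal, and a "latest version" selection from the SAT.

```python
# Hypothetical miniature Data Vault tables.
h_vehicle = {"hk_v1": "V1", "hk_v2": "V2"}   # hub: hash key -> business key
h_engine = {"hk_e1": "E1", "hk_e2": "E2"}    # hub: hash key -> business key
l_vehicle_engine = [("hk_v1", "hk_e1"), ("hk_v2", "hk_e2")]  # link rows
s_vehicle = [  # satellite: (vehicle hash key, load date, attributes)
    ("hk_v1", "2015-01-15", {"model": "SUV", "color": "red"}),
    ("hk_v1", "2015-01-16", {"model": "SUV", "color": "blue"}),
    ("hk_v2", "2015-01-15", {"model": "Cabrio", "color": "blue"}),
]


def latest_attributes(satellite):
    """Pick the most recent satellite version per hash key (ISO dates sort as text)."""
    latest = {}
    for hash_key, _load_date, attributes in sorted(satellite, key=lambda row: row[1]):
        latest[hash_key] = attributes
    return latest


def build_vehicle_dimension():
    """Join link, both hubs, and the current satellite rows into flat mart rows."""
    current = latest_attributes(s_vehicle)
    rows = []
    for vehicle_hk, engine_hk in l_vehicle_engine:
        rows.append({
            "vehicleid": h_vehicle[vehicle_hk],
            "engine": h_engine[engine_hk],
            **current[vehicle_hk],
        })
    return rows


for row in build_vehicle_dimension():
    print(row)
```

In a relational implementation each of these lookups becomes a join, which is why mart loads over many HUBs, LINKs, and SATs tend to grow complex.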


DATA VAULT - QUOTES

Bill Inmon: “Over multiple years, Dan improved the Data Vault and evolved it into Data Vault 2.0. Today this System Of Business Intelligence includes not only a more sophisticated model, but an agile methodology, a reference architecture for enterprise data warehouse systems, and best practices for implementation. The Data Vault 2.0 System Of Business Intelligence is ground-breaking, again. It incorporates concepts from massively parallel architectures, Big Data, real-time and unstructured data.”
Source: Linstedt / Olschimke: Building a Scalable Data Warehouse with Data Vault 2.0

Barry Devlin: “The Data Vault approach, since the early 2000s, promises a much-improved balance, with a hybrid of the normalized and star schema forms above. Version 2.0 introduced in 2013, consisting of a data model, methodology, and systems architecture, provides a design basis for data warehouses that emphasizes core data quality, consistency, and agility.”
Source: https://www.wherescape.com/media/3476/data-vault-thoughpoint-april-2017.pdf


DATA VAULT 2.0 BENEFITS ACCORDING TO DAN LINSTEDT

The issues Data Vault 2.0 is built to solve include:

• Global distributed Teams
• Global distributed physical data warehouse components
• “Lazy” joining during query time across multi-country servers
• Ingestion and query parsing of images, video, audio, documents (unstructured data)
• Ingestion of real-time streaming (IOT) data
• Cloud and On-premise seamless integration
• Agile Team Delivery
• Incorporation of Data Virtualization, and NoSQL platforms
• Extremely large data sets (into the Petabyte ranges and beyond)
• Automation and Generation of 80% of the work products

Source: https://www.linkedin.com/pulse/defining-data-vault-10-20-business-dan-linstedt/


THANK YOU

Daimler TSS GmbH
Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99
[email protected] / Internet: www.daimler-tss.com / Intranet-Portal-Code: @TSS
Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle


5 LEVELS OF CMMI

1. You can't understand what you don't research
2. You can't define what you don't understand (standards, context, concepts)
3. You can't identify what you don't define (KPA's and structure)
4. You can't measure what you don't identify (KPA's and KPI's)
5. You can't optimize what you can't measure (KPI's and retrospective adaptation)

Source: https://www.linkedin.com/pulse/data-vault-20-beyond-model-dan-linstedt/


FORMAL DEFINITION DATA VAULT (1.0) BY DAN LINSTEDT

“The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses.”

Source: http://www.vertabelo.com/blog/technical-articles/data-vault-series-agile-modeling-not-an-option-anymore