A company of Daimler AG LECTURE @DHBW: DATA WAREHOUSE 03 COMMON DWH ARCHITECTURES ANDREAS BUCKENHOFER, DAIMLER TSS


ABOUT ME

https://de.linkedin.com/in/buckenhofer

https://twitter.com/ABuckenhofer

https://www.doag.org/de/themen/datenbank/in-memory/

http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/

https://www.xing.com/profile/Andreas_Buckenhofer2

Andreas Buckenhofer, Senior DB [email protected]

Since 2009 at Daimler TSS. Department: Big Data. Business Unit: Analytics.


ANDREAS BUCKENHOFER, DAIMLER TSS GMBH

Data Warehouse / DHBWDaimler TSS 3

“Forming good abstractions and avoiding complexity is an essential part of a successful data architecture”

Data has always been my main focus throughout my long career in data integration. I work for Daimler TSS as a Database Professional and Data Architect with over 20 years of experience in Data Warehouse projects. I have been working with Hadoop and NoSQL since 2013. I keep my knowledge up to date, and I learn new things, experiment, and program every day.

I share my knowledge in internal presentations and as a speaker at international conferences. I regularly give a full lecture on Data Warehousing and a seminar on modern data architectures at Baden-Wuerttemberg Cooperative State University (DHBW). I also gained international experience through a two-year project in Greater London and several business trips to Asia.

I’m responsible for In-Memory DB Computing at the independent German Oracle User Group (DOAG) and was honored by Oracle as an ACE Associate. I hold current certifications such as "Certified Data Vault 2.0 Practitioner (CDVP2)", "Big Data Architect", "Oracle Database 12c Administrator Certified Professional", "IBM InfoSphere Change Data Capture Technical Professional", etc.


As a 100% Daimler subsidiary, we give 100 percent, always and never less.

We love IT and pull out all the stops to aid Daimler's development with our expertise on its journey into the future.

Our objective: We make Daimler the most innovative and digital mobility company.

NOT JUST AVERAGE: OUTSTANDING.


INTERNAL IT PARTNER FOR DAIMLER

+ Holistic solutions according to the Daimler guidelines

+ IT strategy

+ Security

+ Architecture

+ Developing and securing know-how

+ TSS is a partner who can be trusted with sensitive data

As subsidiary: maximum added value for Daimler

+ Market closeness

+ Independence

+ Flexibility (short decision-making process, ability to react quickly)


LOCATIONS

• Daimler TSS Germany: 7 locations, 1000 employees*: Ulm (Headquarters), Stuttgart, Berlin, Karlsruhe

• Daimler TSS China: Hub Beijing, 10 employees

• Daimler TSS Malaysia: Hub Kuala Lumpur, 42 employees

• Daimler TSS India: Hub Bangalore, 22 employees

* as of August 2017


• Describe different DWH architectures

• Explain DWH data modeling methods and design logical models

• Name DB techniques that are well-suited for DWHs

• Explain ETL processes

• Specify reporting & project management & meta data requirements

• Name current DWH trends

DWH LECTURE - LEARNING TARGETS


The article http://www.kimballgroup.com/2004/03/differences-of-opinion/ compares THE two classic DWH architectures.

Read the paper and complete the table / questions on the next slide.

(Caution: The paper is biased / favors one approach; you may want to read other/more papers for a neutral view.)

EXERCISE: CLASSICAL DWH ARCHITECTURES


EXERCISE: CLASSICAL DWH ARCHITECTURES

What are the approaches called?

Who “invented” the approach?

How many layers are used and what are the layers called?

Which data modeling approaches are used in which layer?

In which layer are atomic detail data stored?

In which layer are aggregated / summary data stored?

List at least 2 advantages

List at least 2 disadvantages

EXERCISE: CLASSICAL DWH ARCHITECTURES

What are the approaches called?

• Kimball Bus Architecture

• Corporate Information Factory (CIF)

Who “invented” the approach?

• Kimball Bus Architecture: Ralph Kimball

• Corporate Information Factory: Bill Inmon

How many layers are used and what are the layers called?

• Kimball Bus Architecture: Data Staging, Dimensional Data Warehouse

• Corporate Information Factory: Data Acquisition, Normalized Data Warehouse, Data Delivery / Dimensional Mart

Which data modeling approaches are used in which layer?

• Kimball Bus Architecture: Data Staging: variable, corresponds to source system; Dimensional Data Warehouse: Dimensional Model

• Corporate Information Factory: Data Acquisition: variable, corresponds to source system; Normalized Data Warehouse: 3NF; Data Delivery: Dimensional Model

In which layer are atomic detail data stored?

• Kimball Bus Architecture: Dimensional Data Warehouse

• Corporate Information Factory: Normalized Data Warehouse

In which layer are aggregated / summary data stored?

• Kimball Bus Architecture: Dimensional Data Warehouse

• Corporate Information Factory: Data Delivery / Dimensional Mart

EXERCISE: CLASSICAL DWH ARCHITECTURES

Advantages of the Kimball Bus Architecture

• Only two layers, which means faster development and less work

• Rather simple approach to make data fast and easily accessible

• Lower startup costs (but higher subsequent development costs)

Advantages of the Corporate Information Factory

• Separation of concerns: long-term enterprise data storage separated from data presentation

• Changes in requirements and scope are easier to manage

• Lower subsequent development costs (but higher startup costs)

Disadvantages of the Kimball Bus Architecture

• If table structures change (unstable source systems), implementing the changes and reloading the data takes high effort, especially for conformed dimensions (“Dimensionitis” disease)

• Non-metric data not optimal for dimensional model

• Dimensional model (esp. Star Schema) contains data redundancy

Disadvantages of the Corporate Information Factory

• Data model transformations from 3NF to Dimensional model required

• More complex as two different data models are required

• Larger team(s) of specialists required

• Kimball Bus Architecture (Central data warehouse based on data marts)

• Inmon Corporate Information Factory

• Data Vault 2.0 Architecture (Dan Linstedt)

• DW 2.0: The Architecture for the Next Generation of Data Warehousing

• Virtual Data Warehouse

• Operational Data Store (ODS)

OTHER ARCHITECTURES


KIMBALL BUS ARCHITECTURE (CENTRAL DATA WAREHOUSE BASED ON DATA MARTS)


Source: http://www.kimballgroup.com/2004/03/differences-of-opinion/


KIMBALL BUS ARCHITECTURE (CENTRAL DATA WAREHOUSE BASED ON DATA MARTS)

[Figure: Kimball Bus Architecture. Components shown: external and internal data sources (OLTP), Staging Layer (Input Layer), Core Warehouse Layer = Mart Layer (Data Mart 1, Data Mart 2, Data Mart 3), Backend, Frontend, Metadata Management, Security, DWH Manager incl. Monitor. Annotation: more business-process oriented than subject-oriented, integrated, time-variant, non-volatile.]

• Bottom-up approach

• Dimensional model with denormalized data

• The sum of the data marts constitutes the Enterprise DWH

• Enterprise bus / conformed dimensions for integration purposes (not to be confused with an Enterprise Service Bus (ESB), the middleware/communication system between applications)

• Kimball describes that agreeing on conformed dimensions is a hard job and it’s expected that the team will get stuck from time to time trying to align the incompatible original vocabularies of different groups

• Data marts need to be redesigned if incompatibilities exist
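The role of conformed dimensions can be illustrated with a minimal Python sketch. All table names, keys, and figures are hypothetical and only serve the illustration: two data marts (sales and returns) reference the same customer dimension, so their results can be combined by region.

```python
from collections import defaultdict

# Conformed customer dimension: identical keys and attribute meaning in every mart.
# (Hypothetical example data.)
dim_customer = {
    1: {"name": "Alice", "region": "EMEA"},
    2: {"name": "Bob", "region": "APAC"},
}

# Fact tables of two separate data marts that both reference dim_customer.
fact_sales = [
    {"customer_key": 1, "amount": 100.0},
    {"customer_key": 2, "amount": 50.0},
    {"customer_key": 1, "amount": 25.0},
]
fact_returns = [
    {"customer_key": 1, "amount": 10.0},
]


def total_by_region(fact_rows):
    """Aggregate a fact table by the region attribute of the conformed dimension."""
    totals = defaultdict(float)
    for row in fact_rows:
        region = dim_customer[row["customer_key"]]["region"]
        totals[region] += row["amount"]
    return dict(totals)


def combine_marts():
    """Combine both marts per region; this only works because the dimension is shared."""
    sales = total_by_region(fact_sales)
    returns = total_by_region(fact_returns)
    regions = sorted(set(sales) | set(returns))
    return {
        region: {"sales": sales.get(region, 0.0), "returns": returns.get(region, 0.0)}
        for region in regions
    }


report = combine_marts()
```

If each mart kept its own, incompatible customer dimension, the two aggregations could not be matched by region without first redesigning one of the marts.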

KIMBALL BUS ARCHITECTURE (CENTRAL DATA WAREHOUSE BASED ON DATA MARTS)

DATA INTEGRATION WITH AND WITHOUT CORE WAREHOUSE LAYER

[Figure: only the label “Core Warehouse Layer” is recoverable]

INMON CORPORATE INFORMATION FACTORY


Source: http://www.kimballgroup.com/2004/03/differences-of-opinion/


INMON CORPORATE INFORMATION FACTORY

[Figure: Inmon Corporate Information Factory. Components shown: external and internal data sources (OLTP), Staging Layer (Input Layer), Core Warehouse Layer (Storage Layer), Mart Layer (Output Layer / Reporting Layer), Backend, Frontend, Metadata Management, Security, DWH Manager incl. Monitor. Annotation: subject-oriented, integrated, time-variant, non-volatile.]

• Top-down approach

• (Normalized) Core Warehouse is essential for subject-oriented, integrated, time-variant and nonvolatile data storage

• Create (departmental) Data Marts as subsets of Core Enterprise DWH as needed

• Data Marts can be designed with Dimensional model

• The logical standard architecture is more general compared to CIF, but was mainly influenced by CIF
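As a rough illustration of deriving a dimensionally modeled data mart from a normalized core, the following sketch uses Python's built-in `sqlite3` module. The tables (`region`, `customer`, `sale`, `dim_customer`, `fact_sales`) and all values are assumptions made up for this example, not part of the lecture's material.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Normalized (3NF) core warehouse tables (hypothetical)
CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                       region_id INTEGER REFERENCES region(region_id));
CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(customer_id),
                       sale_date TEXT, amount REAL);

INSERT INTO region   VALUES (1, 'South'), (2, 'North');
INSERT INTO customer VALUES (10, 'Alice', 1), (11, 'Bob', 2);
INSERT INTO sale     VALUES (100, 10, '2018-01-05', 20.0),
                            (101, 10, '2018-01-06', 30.0),
                            (102, 11, '2018-01-06', 5.0);

-- Data mart derived from the core: denormalized dimension plus fact table
CREATE TABLE dim_customer AS
    SELECT c.customer_id AS customer_key, c.name, r.region_name
    FROM customer c JOIN region r ON r.region_id = c.region_id;
CREATE TABLE fact_sales AS
    SELECT customer_id AS customer_key, sale_date, amount FROM sale;
""")

# A typical mart query needs a single join instead of navigating the 3NF model.
rows = con.execute("""
    SELECT d.region_name, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer d ON d.customer_key = f.customer_key
    GROUP BY d.region_name
    ORDER BY d.region_name
""").fetchall()
```

The mart is a redundant, query-friendly copy; the core tables remain the long-term storage from which further marts could be derived as needed.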

INMON CORPORATE INFORMATION FACTORY


DATA VAULT 2.0 ARCHITECTURE – TODAY’S WORLD (DAN LINSTEDT)

DATA VAULT 2.0 ARCHITECTURE (DAN LINSTEDT)


Michael Olschimke, Dan Linstedt: Building a Scalable Data Warehouse with Data Vault 2.0, Morgan Kaufmann, 2015, Chapter 2.2


DATA VAULT 2.0 ARCHITECTURE (DAN LINSTEDT)

[Figure: Data Vault 2.0 Architecture. Components shown: external and internal data sources (OLTP), Staging Layer (Input Layer), Raw Data Vault, Business Data Vault, Mart Layer (Output Layer / Reporting Layer), Backend, Frontend, Metadata Management, Security, DWH Manager incl. Monitor. Annotations: “Hard Rules only” and “Soft Rules”.]

• Core Warehouse Layer is modeled with Data Vault and integrates data by BK (business key) “only” (Data Vault modeling is a separate lecture)

• Business rules (Soft Rules) are applied from Raw Data Vault Layer to Mart Layer and not earlier

• Alternatively from Raw Data Vault to additional layer called Business Data Vault

• Hard Rules (technical rules such as data type alignment) don’t change the content of the data

• Data is fully auditable

• Real-time capable architecture

• The architecture has become very popular recently; it is also applicable to Big Data and NoSQL
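The split between hard rules and soft rules, and the integration by business key, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Data Vault modeling itself: the source systems, the `vin` business key, and the mileage rule are hypothetical.

```python
def load_raw_vault(source_rows, record_source, load_date):
    """Hard rules only: align data types and add load metadata; content stays as delivered."""
    return [
        {
            "vin": str(row["vin"]),           # business key (hypothetical: vehicle ID)
            "mileage_km": row["mileage_km"],  # loaded as delivered, even if implausible
            "record_source": record_source,
            "load_date": load_date,
        }
        for row in source_rows
    ]


def build_hub(raw_rows):
    """Integrate by business key only: one entry per distinct key across all sources."""
    return sorted({row["vin"] for row in raw_rows})


def build_mart(raw_rows):
    """Soft rule (hypothetical business rule): a negative mileage is reported as unknown."""
    return [
        {
            "vin": row["vin"],
            "mileage_km": row["mileage_km"] if row["mileage_km"] >= 0 else None,
        }
        for row in raw_rows
    ]


system_a = [{"vin": "V1", "mileage_km": 1200}]
system_b = [{"vin": "V1", "mileage_km": -5}, {"vin": "V2", "mileage_km": 300}]

raw = (load_raw_vault(system_a, "system_a", "2018-10-01")
       + load_raw_vault(system_b, "system_b", "2018-10-01"))
hub = build_hub(raw)
mart = build_mart(raw)
```

Because the raw rows keep their record source and load date and are never altered, every value in the mart can be traced back to what the source delivered.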

DATA VAULT 2.0 ARCHITECTURE (DAN LINSTEDT)


• In the classical DWHs, the Core Warehouse Layer is regarded as “single version of the truth”

• Integrates + cleanses data from different sources and eliminates contradictions

• Produces consistent results/reports across Data Marts

• But: cleansing is (still) subjective, enterprises change regularly, and the paradigm does not scale as more and more systems exist

• Data in Raw Data Vault Layer is regarded as “Single version of the facts”

• 100% of the data is loaded 100% of the time

• Data is not cleansed and bad data is not removed in the Core Layer (Raw Vault)
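One consequence of keeping uncleansed data in the Raw Vault is that cleansing rules can change without reloading the sources. The sketch below assumes two hypothetical versions of a cleansing rule and shows that both can be applied to the same raw rows; all names and values are invented for illustration.

```python
# Raw Vault content: loaded as delivered, including incomplete rows (hypothetical data).
raw_vault = [
    {"customer": "C1", "country": "DE"},
    {"customer": "C2", "country": "Germany"},
    {"customer": "C3", "country": ""},
]


def rule_v1(row):
    """Cleansing rule, version 1 (hypothetical): rows without a country are dropped."""
    return row if row["country"] else None


def rule_v2(row):
    """Cleansing rule, version 2 (hypothetical): keep every row, harmonize country values."""
    mapping = {"Germany": "DE", "": "UNKNOWN"}
    return {**row, "country": mapping.get(row["country"], row["country"])}


def build_mart(raw_rows, rule):
    """Apply a rule downstream of the Raw Vault; the raw rows are never modified."""
    result = []
    for row in raw_rows:
        out = rule(dict(row))  # work on a copy
        if out is not None:
            result.append(out)
    return result


mart_v1 = build_mart(raw_vault, rule_v1)
mart_v2 = build_mart(raw_vault, rule_v2)
```

Had version 1 been applied while loading the core layer, customer C3 would be gone and version 2 could only be introduced after reloading from the source systems.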

DATA VAULT 2.0 ARCHITECTURE (DAN LINSTEDT)


• Data Vault is optimized for the following requirements:

• Flexibility

• Agility

• Data historization

• Data integration

• Auditability

• Bill Inmon wrote in 2008: “Data Vault is the optimal approach for modeling the EDW in the DW2.0 framework.” (DW2.0)

DATA VAULT 2.0 ARCHITECTURE (DAN LINSTEDT)


DW 2.0: THE ARCHITECTURE FOR THE NEXT GENERATION OF DATA WAREHOUSING


Source: W.H. Inmon, Dan Linstedt: Data Architecture: A Primer for the Data Scientist, Morgan Kaufmann, 2014, chapter 3.1

[Figure: DW 2.0 data lifecycle. Data models shown along the lifecycle: operational application data model, integrated corporate data model (shown twice), archival data model.]

Main characteristics:

• Structured and “unstructured” data, not just metrics

• Life Cycle of data with different storage areas

• Hot data: High speed, expensive storage (RAM, SSDs) for most recent data

• …

• Cold data: Low speed, inexpensive storage (e.g. hard disks) for old data; archival data model with high compression

• Metadata is an integral part of the DWH and not an afterthought
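The idea of a data lifecycle with different storage areas can be sketched as a simple age-based tier assignment. The thresholds and the name of the intermediate tier are assumptions for illustration only; the slide does not specify them.

```python
from datetime import date

# Hypothetical thresholds in days; real limits depend on workload and storage costs.
TIERS = [(90, "hot"), (730, "intermediate")]


def storage_tier(record_date, today):
    """Return the storage tier for a record based on its age in days."""
    age_days = (today - record_date).days
    for max_age, tier in TIERS:
        if age_days <= max_age:
            return tier
    return "cold"  # oldest data: inexpensive storage, archival model, high compression


today = date(2018, 4, 1)
recent = storage_tier(date(2018, 3, 1), today)
older = storage_tier(date(2017, 1, 1), today)
oldest = storage_tier(date(2010, 1, 1), today)
```

A periodic job could use such a function to decide which records to move from fast, expensive storage to slower, cheaper storage.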

DW 2.0: THE ARCHITECTURE FOR THE NEXT GENERATION OF DATA WAREHOUSING


VIRTUAL DATA WAREHOUSE

[Figure: Virtual Data Warehouse. Components shown: external and internal data sources (OLTP), Query Management, Backend, Frontend. Annotation: weakly and partly subject-oriented, weakly and partly integrated, not time-variant, not non-volatile.]

• Data is neither extracted from the operational systems nor stored separately

• Standardized interface for all operational data sources

• One "GUI" for all existing data

• Generates combined queries

• Query Processor joins query result data from different sources

• Can also access data in Hadoop (PolyBase, Big SQL, Big Data SQL, etc.)
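How a query processor joins results from different sources can be sketched with two separate in-memory SQLite databases standing in for two operational systems. The systems (`crm`, `shop`), their tables, and the data are hypothetical; a real virtual DWH would use a federation or data virtualization product rather than hand-written code.

```python
import sqlite3

# Two independent operational systems, simulated as separate in-memory databases.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

shop = sqlite3.connect(":memory:")
shop.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
shop.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 20.0), (1, 5.0), (2, 7.5)])


def revenue_per_customer():
    """Send one sub-query to each source system, then join the partial results."""
    names = dict(crm.execute("SELECT customer_id, name FROM customer"))
    totals = shop.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    )
    return {names[customer_id]: total for customer_id, total in totals}


result = revenue_per_customer()
```

No data is copied into a separate store: every call runs against the operational systems themselves.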

VIRTUAL DATA WAREHOUSE


• Query Management manages metadata about all operational systems

• (physical) location of data and algorithms for extracting data from OLTP system

• Implementation easier

• Low cost: can use existing hardware infrastructure

• Queries cause significant performance problems in operational systems

• Known problems when analyzing operational data directly

• Same query is processed multiple times (if queried multiple times)

• Same query delivers different results when processed at different times
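The last point can be reproduced with a tiny example: an operational table is updated in place, so the same query returns different values at different times. The `stock` table and its values are hypothetical.

```python
import sqlite3

oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, quantity INTEGER)")
oltp.execute("INSERT INTO stock VALUES ('bolt', 100)")

QUERY = "SELECT quantity FROM stock WHERE item = 'bolt'"

first = oltp.execute(QUERY).fetchone()[0]

# The operational system overwrites the value in place; no history is kept.
oltp.execute("UPDATE stock SET quantity = 60 WHERE item = 'bolt'")

second = oltp.execute(QUERY).fetchone()[0]
```

A report based on `first` cannot be reproduced later, because the operational system no longer holds the earlier value.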

VIRTUAL DATA WAREHOUSE


OPERATIONAL DATA STORE (ODS)

[Figure: Operational Data Store (ODS). Components shown: external and internal data sources (OLTP), Operational Data Store, Staging Layer (Input Layer), Core Warehouse Layer (Storage Layer), Mart Layer (Output Layer / Reporting Layer), Backend, Frontend, Metadata Management, Security, DWH Manager incl. Monitor. Annotation: subject-oriented, integrated, time-variant, non-volatile.]

• ODS: Real-time/Right-time layer

• Replication techniques used to transport data from source database to ODS layer with minimal impact on source system

• Data in the ODS has no history and is stored without any cleansing and without any integration (1:1 copy from single source)

• Performance of DWH queries is not optimal because the data model is designed for OLTP and not for reporting requirements

• The ODS normally exists in addition to the Staging / Core DWH / Mart Layers but can also exist alone, without other layers
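The behavior of an ODS as a 1:1 copy without history can be sketched as applying a stream of change events to a mirror table. The event format below is an assumption for illustration and is not the format of any particular replication product.

```python
def apply_change(ods_table, change):
    """Apply one replicated change event to the ODS copy of a source table."""
    op, key = change["op"], change["key"]
    if op in ("insert", "update"):
        ods_table[key] = change["row"]  # overwrite in place: no history is kept
    elif op == "delete":
        ods_table.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


# Hypothetical change stream captured from a single source table.
changes = [
    {"op": "insert", "key": "V1", "row": {"color": "red"}},
    {"op": "insert", "key": "V2", "row": {"color": "blue"}},
    {"op": "update", "key": "V1", "row": {"color": "black"}},
    {"op": "delete", "key": "V2"},
]

ods_vehicle = {}
for change in changes:
    apply_change(ods_vehicle, change)
```

After the stream is applied, the ODS holds only the current state of the source; earlier values such as the original color of V1 are not available.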

OPERATIONAL DATA STORE (ODS)


EXAMPLE DWH FOR STATE OF CONSTRUCTION DOCU


ARCHITECTURE FROM AN ACTUAL PROJECT IN THE AUTOMOTIVE INDUSTRY

[Figure: project architecture. Components shown: OLTP DB with IIDR replication engine (Source) and datastore (Source); Mirror DB (Operational Data Store) with IIDR replication engine (Mirror) and datastore (Mirror); DWH DB with IIDR replication engine (DWH) and datastore (DWH); Backend with Staging Layer, Raw + Business Data Vault, and Mart Layer; ETL Engine; Frontend with Standard Reports and AdHoc Reports; Logs; TSM.]

END USER SAMPLE QUESTIONS


Which vehicles or aggregates are documented incompletely? (Data quality)

Which vehicles / which control units require SW updates?

Which interiors are most common by region?

Supply data for external simulations, customs clearance, spare part planning, etc.


Review the presented data warehouse architectures.

Which architecture would you recommend for

• A holding of 3 telecommunication companies

• An online store with real/right-time data integration needs

• Marketing department of a bank

List advantages and drawbacks of your proposal.

EXERCISE: RECOMMEND AN ARCHITECTURE


A holding of 3 telecommunication companies

• Architecture: Virtual Data Warehouse

• + Companies may not want to provide their data to a new storage

• + Can easily be extended if new companies join the holding or reduced if a company leaves the holding

• - Bad performance

• - No real data integration is achieved; low data quality

• - Firewalls have to be opened

EXERCISE: RECOMMEND AN ARCHITECTURE


An online store with real-time/right-time data integration needs

• Architecture: Data Vault 2.0

• + Integration of many internal and external source systems (e.g. integrate social media data about the online store)

• + Fast data delivery in Raw Vault Layer (Real-time/Right-time data integration). Complex data cleansing / transformation / soft rules are delayed downstream towards Mart Layer

• - Transformation overhead (Source system data model to Data Vault data model to Dimensional data model)

EXERCISE: RECOMMEND AN ARCHITECTURE


Marketing department of a bank

• Architecture: Kimball Bus architecture

• + Start small for a department. If other departments are interested, new data and new Marts can be added on demand

• - High risk of losing the enterprise view and of several DWHs being built

That’s still quite a common scenario nowadays. A single Enterprise DWH is often not achieved (e.g. Mergers & Acquisitions, inflexibility due to a single centralized DWH, rapidly changing conditions, etc.) and therefore very often several DWHs with different architectures exist in parallel within a company.

EXERCISE: RECOMMEND AN ARCHITECTURE


• Now imagine that you are preparing an exam.

• Identify 1-3 questions about DWH architecture (and/or DWH introduction) that you would ask in an exam.

• Write down the questions on sticky notes.

EXERCISE - INTRODUCTION AND DWH ARCHITECTURE GROUP TASK

Which layers does the logical standard architecture have?

• Staging (Input), Integration (Cleansing), Core Warehouse (Storage), Aggregation, Mart (Reporting, Output) and additionally Metadata, Security, DWH Manager, Monitor

Which other architectures exist?

• Kimball Bus Architecture (Central data warehouse based on data marts)

• Inmon Corporate Information Factory

• Data Vault 2.0 Architecture (Dan Linstedt)

• DW 2.0: The Architecture for the Next Generation of Data Warehousing

• Virtual Data Warehouse

• Operational Data Store (ODS)

SUMMARY


Daimler TSS GmbH, Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99

[email protected] / Internet: www.daimler-tss.com / Intranet-Portal-Code: @TSS

Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle

THANK YOU