UNIT-II: Principles of dimensional modeling, Dimensional modeling: advanced topics, ETL, OLAP


UNIT-II

• Principles of dimensional modeling

• Dimensional modeling:advanced topics

• ETL

• OLAP

Principles of dimensional modeling

• From requirements to data design

• STAR schema

• STAR schema keys

• Advantages of the STAR schema

From requirements to data design

• Requirements gathering

• Requirements definition document (with information packages)

• Data design

• Dimensional model

(figure 10-1)

Figure 10-1

Design decisions

• Choosing the process (Subjects)

• Choosing the grain (level of detail)

• Identifying and conforming the dimensions

• Choosing the facts

• Choosing the duration of the database (duration of historical data)

Dimensional modeling basics

• From the information package diagram:

– The metrics or facts become the fact table (figure 10-2)

– The dimensions become dimension tables with attributes (figure 10-3)

Figure 10-2

Figure 10-3

Dimensional modeling

• A dimensional model places the fact table in the middle, with the dimension tables around it

• This arrangement is called a STAR schema (figure 10-4)

Figure 10-4

Dimensional Data Modeling (DDM)

• A DDM consists of one or more dimension tables and fact tables.

• Each dimension table stores records related to one particular dimension, e.g. location, product, or time.

• A fact (measure) table contains measures (sales gross value, total units sold) and dimension columns.

• These dimension columns are foreign keys referencing the respective dimension tables.
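As a minimal sketch of the structure just described, the Python snippet below creates a small star schema in an in-memory SQLite database. The table and column names (dim_product, fact_sales, sales_dollars and so on) are hypothetical; the point is that each dimension column in the fact table is a foreign key to a dimension table, and together those columns form the fact table's concatenated primary key.

```python
import sqlite3

# Hypothetical star schema: table and column names are illustrative only.
DDL = """
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);

-- Fact table: measures plus one foreign-key column per dimension.
CREATE TABLE fact_sales (
    product_key   INTEGER NOT NULL REFERENCES dim_product(product_key),
    location_key  INTEGER NOT NULL REFERENCES dim_location(location_key),
    time_key      INTEGER NOT NULL REFERENCES dim_time(time_key),
    sales_dollars REAL,     -- measure: gross sales value
    units_sold    INTEGER,  -- measure: total units sold
    PRIMARY KEY (product_key, location_key, time_key)  -- concatenated key
);
"""

def build_star_schema() -> sqlite3.Connection:
    """Create the hypothetical star schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn

conn = build_star_schema()
conn.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
conn.execute("INSERT INTO dim_location VALUES (1, 'Pune', 'West')")
conn.execute("INSERT INTO dim_time VALUES (1, '2024-01-15', '2024-01', 2024)")
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 250.0, 5)")
```

SQLite checks the REFERENCES clauses only after `PRAGMA foreign_keys = ON`, so with that setting a fact row whose product_key has no matching dimension row is rejected with `sqlite3.IntegrityError`.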

Example of Dimensional Data Model:

• In the figure, the sales fact table is connected to four dimensions: location, product, time, and organization.

• This shows that the data can be sliced along any of the dimensions.

• The data can also be aggregated across multiple dimensions.

• ‘Sales Dollar’ in the sales fact table can be calculated along each dimension independently or along several dimensions combined, as listed below:

– Sales Dollar value for a particular product

– Sales Dollar value for a product in a location

– Sales Dollar value for a product in a year within a location

– Sales Dollar value for a product in a year within a location, sold or serviced by an employee
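The combinations listed above amount to grouping the same measure by different sets of dimensions. The sketch below, in plain Python with made-up fact rows and a hypothetical `sales_dollars_by` helper, sums Sales Dollars for whichever dimensions are passed in.

```python
from collections import defaultdict

# Made-up fact rows: (product, location, year, employee, sales_dollars).
FACTS = [
    ("Widget", "Pune",   2023, "Asha", 100.0),
    ("Widget", "Pune",   2024, "Ravi", 150.0),
    ("Widget", "Mumbai", 2024, "Asha",  80.0),
    ("Gadget", "Pune",   2024, "Ravi", 200.0),
]
DIMS = ("product", "location", "year", "employee")

def sales_dollars_by(*dims):
    """Sum the sales_dollars measure grouped by the given dimension names."""
    positions = [DIMS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in FACTS:
        key = tuple(row[p] for p in positions)
        totals[key] += row[-1]  # last column is the measure
    return dict(totals)

print(sales_dollars_by("product"))  # {('Widget',): 330.0, ('Gadget',): 200.0}
print(sales_dollars_by("product", "location"))
print(sales_dollars_by("product", "year", "location", "employee"))
```

Each call corresponds to one line of the list: adding a dimension name to the call makes the grouping finer, and dropping one aggregates over it.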

Uses of DDM

• DDM is used for calculating summarized data.

• For example, sales data could be collected on a daily basis and then aggregated to the week level, the week data could be aggregated to the month level, and so on.

• Such data is referred to as aggregate or summarized data.

• Query performance on a DDM can be significantly improved by using materialized views.
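The daily-to-weekly-to-monthly aggregation described above can be sketched as a roll-up function that maps each detailed key to a coarser bucket and sums the measure. The data and helper names (`DAILY`, `roll_up`, `iso_week`, `month`) are illustrative assumptions, not part of the original material.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales: date -> sales dollars.
DAILY = {
    date(2024, 1, 29): 100.0,
    date(2024, 1, 30): 50.0,
    date(2024, 2, 1): 70.0,
    date(2024, 2, 5): 30.0,
}

def roll_up(values, key_fn):
    """Sum values into coarser buckets chosen by key_fn (e.g. week or month)."""
    out = defaultdict(float)
    for k, v in values.items():
        out[key_fn(k)] += v
    return dict(out)

def iso_week(d):
    """Bucket key: (ISO year, ISO week number)."""
    year, week, _ = d.isocalendar()
    return (year, week)

def month(d):
    """Bucket key: (calendar year, month)."""
    return (d.year, d.month)

weekly = roll_up(DAILY, iso_week)
monthly = roll_up(DAILY, month)
```

Both levels here are computed from the daily data because an ISO week can straddle two months (week 5 of 2024 contains both 29 January and 1 February), so weekly totals do not always divide cleanly into monthly ones.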

• A materialized view is a pre-computed table containing aggregated or joined data from the fact table and possibly the dimension tables; it is also known as a summary or aggregate table.
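SQLite has no `CREATE MATERIALIZED VIEW` statement, so the sketch below imitates one with an ordinary summary table built by `CREATE TABLE ... AS SELECT` and rebuilt by a hypothetical `refresh_summary` function. Databases with native materialized views (Oracle and PostgreSQL, for example) manage the stored result and its refresh themselves; the table and column names here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (product_key INTEGER, sale_day TEXT, sales_dollars REAL);
INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO fact_sales VALUES
    (1, '2024-01-05', 100.0), (1, '2024-01-20', 40.0),
    (2, '2024-01-07', 60.0),  (1, '2024-02-02', 25.0);
""")

def refresh_summary(conn):
    """(Re)build a pre-computed monthly sales-by-category summary table."""
    conn.executescript("""
    DROP TABLE IF EXISTS agg_sales_month_category;
    CREATE TABLE agg_sales_month_category AS
        SELECT substr(f.sale_day, 1, 7) AS month,   -- 'YYYY-MM'
               p.category,
               SUM(f.sales_dollars) AS sales_dollars
        FROM fact_sales f JOIN dim_product p USING (product_key)
        GROUP BY month, p.category;
    """)

refresh_summary(conn)
rows = conn.execute(
    "SELECT month, category, sales_dollars FROM agg_sales_month_category "
    "ORDER BY month, category"
).fetchall()
```

Queries that only need monthly totals per category can read the small summary table instead of scanning and joining the fact table; the trade-off is that the summary must be refreshed whenever the fact table changes.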

Dimension Table

• A dimension table describes the business entities of an enterprise as hierarchical, categorical information such as time, departments, locations, and products.

• Dimension tables are sometimes called lookup or reference tables.

Relational vs Dimensional

• The Relational Data Model (RDM) is used in OLTP systems, which are transaction-oriented, while DDM is used in OLAP systems, which are analytical.

• In an OLTP environment, lookups are stored in detail as independent tables, whereas in a DW these independent tables are merged into a single dimension.
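To illustrate the difference, the sketch below starts from three hypothetical normalized OLTP tables (product, category, brand) and merges them into a single denormalized `dim_product` table with one join query. All names and rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalized OLTP lookups, each in its own table
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE brand    (brand_id    INTEGER PRIMARY KEY, brand_name TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, product_name TEXT,
                       category_id INTEGER REFERENCES category,
                       brand_id    INTEGER REFERENCES brand);
INSERT INTO category VALUES (10, 'Beverages');
INSERT INTO brand    VALUES (7, 'Acme');
INSERT INTO product  VALUES (1, 'Cola', 10, 7), (2, 'Lemonade', 10, 7);

-- Denormalized DW dimension: the lookups are merged into one wide table
CREATE TABLE dim_product AS
    SELECT p.product_id AS source_product_id,
           p.product_name,
           c.category_name,
           b.brand_name
    FROM product p
    JOIN category c ON c.category_id = p.category_id
    JOIN brand b    ON b.brand_id    = p.brand_id;
""")

rows = conn.execute(
    "SELECT product_name, category_name, brand_name "
    "FROM dim_product ORDER BY source_product_id"
).fetchall()
```

The category and brand names are now repeated on every product row, which is the redundancy a dimension table accepts in exchange for answering queries without extra joins.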

Relational model (OLTP):

• Data is stored in an RDBMS

• Tables are the units of storage

• Data is normalized and used for OLTP; optimized for OLTP processing

• Several tables, with chains of relationships among them

• Volatile (frequent updates)

• Detailed level of transactional data

• Normal reports

Dimensional model (OLAP):

• Data is stored in an RDBMS or in multidimensional databases

• Cubes are the units of storage

• Data is denormalized and used in the DW and data marts; optimized for OLAP

• Few tables; fact tables are connected to dimension tables

• Non-volatile and time-variant

• Summary of bulky transactional data (aggregates and measures) used in business decisions

• User-friendly, interactive, drag-and-drop multidimensional OLAP reports

DM versus E-R modeling (figures 10-5, 10-6)

The STAR Schema

• A star schema is a database schema for representing multidimensional data.

• It is the simplest form of DW schema and contains one or more dimension tables and fact tables.

The STAR Schema

• It is called a star schema because the relationship between the dimension tables and the fact table resembles a star, with one fact table connected to multiple dimensions.

• The center of the star schema is a large fact table that points towards the dimension tables.

• Simple STAR schema (figure 10-7)

Figure 10-7

Steps in designing Star Schema

• Identify a business process for analysis (like sales).

• Identify measures or facts.

• Identify dimensions for facts.

• List the columns that describe each dimension.

• Determine the lowest level of summary in a fact table.
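One way to make the five steps concrete is to record each decision in a small design object and generate the table definitions from it. The `StarDesign` dataclass below is a hypothetical sketch, not a standard tool: columns are typed loosely (TEXT for attributes, REAL for facts) to keep it short, and the chosen grain is carried along as a comment in the generated DDL.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class StarDesign:
    process: str                       # step 1: business process, e.g. "sales"
    facts: list[str]                   # step 2: numeric measures
    dimensions: dict[str, list[str]]   # steps 3-4: dimension -> descriptive columns
    grain: str                         # step 5: what one fact row represents

    def ddl(self) -> str:
        """Generate CREATE TABLE statements for the dimensions and the fact table."""
        stmts = []
        for dim, cols in self.dimensions.items():
            col_sql = ", ".join(f"{c} TEXT" for c in cols)
            stmts.append(f"CREATE TABLE dim_{dim} ({dim}_key INTEGER PRIMARY KEY, {col_sql});")
        fk_sql = ", ".join(f"{d}_key INTEGER REFERENCES dim_{d}" for d in self.dimensions)
        fact_sql = ", ".join(f"{f} REAL" for f in self.facts)
        pk_sql = ", ".join(f"{d}_key" for d in self.dimensions)
        stmts.append(
            f"-- grain: {self.grain}\n"
            f"CREATE TABLE fact_{self.process} ({fk_sql}, {fact_sql}, PRIMARY KEY ({pk_sql}));"
        )
        return "\n".join(stmts)

design = StarDesign(
    process="sales",
    facts=["sales_dollars", "units_sold"],
    dimensions={
        "product": ["product_name", "category"],
        "time": ["day", "month", "year"],
        "store": ["city", "region"],
    },
    grain="one product sold in one store on one day",
)

conn = sqlite3.connect(":memory:")
conn.executescript(design.ddl())
print(design.ddl())
```

Writing the grain down next to the fact table is deliberate: every fact column and dimension key should be consistent with that one statement of what a row means.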

Characteristics of Dimension Table

• Dimension Table Key (PK)

• Table is Wide

• Textual Attributes

• Attributes not directly related

• Not Normalized

• Drilling-down, rolling-up

• Multiple Hierarchies

• Fewer records

Inside a dimension table (figure 10-10)

Characteristics of Fact Table

• Concatenated key

• Data granularity

• Measure types:

– Fully additive: measures that can be added across all dimensions.

– Non-additive: measures that cannot be added across any dimension.

– Semi-additive: measures that can be added across some dimensions but not across others.

• Table is deep, not wide

• Sparse data
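The three measure types behave differently under summation. In the sketch below, built on made-up account data, deposits are fully additive, account balances are semi-additive (they add across accounts but not across months), and an average price is non-additive, so it is recomputed from additive components instead of being summed. Using the period-end balance as the aggregate over time is a common convention, assumed here rather than taken from the slides.

```python
# Made-up fact rows: (account, month, balance, deposits)
ROWS = [
    ("A", "2024-01", 100.0, 100.0),
    ("A", "2024-02", 150.0,  50.0),
    ("B", "2024-01", 200.0, 200.0),
    ("B", "2024-02", 180.0,   0.0),
]

# Fully additive: deposits may be summed across accounts AND months.
total_deposits = sum(r[3] for r in ROWS)  # 350.0

# Semi-additive: balances may be summed across accounts for one month...
feb_total_balance = sum(r[2] for r in ROWS if r[1] == "2024-02")  # 330.0
# ...but not across months (A's Jan + Feb balances = 250.0 is meaningless).
# Over time, take the period-end value instead:
a_end_balance = max((r for r in ROWS if r[0] == "A"), key=lambda r: r[1])[2]  # 150.0

# Non-additive: a ratio such as average unit price cannot be summed;
# recompute it from the additive components (units and dollars).
SALES = [(2, 10.0), (3, 30.0)]  # (units, dollars) per transaction
avg_price = sum(d for _, d in SALES) / sum(u for u, _ in SALES)  # 8.0, not 5.0 + 10.0
```

Storing the additive components (units, dollars) in the fact table and deriving ratios at query time avoids the non-additive trap entirely.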

Inside the fact table (figure 10-11)

Factless fact table (figure 10-12)

Data granularity

• The fact table is kept at the lowest grain

Star schema keys

• Primary key (dimension table)

• Surrogate keys (system-generated sequence keys)

– Avoid built-in meanings in keys

– Do not use production system keys

• Foreign keys in fact table

• Concatenated primary key in fact table
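The sketch below shows the idea behind surrogate keys: each production key, qualified by its source system, is mapped to a meaningless sequence number the first time it appears, and the same number is returned on later lookups. `SurrogateKeyMap` is a hypothetical helper; a real ETL process would keep this mapping in a persistent key table rather than in memory.

```python
from itertools import count

class SurrogateKeyMap:
    """Assign meaningless, system-generated sequence keys to production keys.

    Hypothetical helper for illustration; real ETL would persist the mapping.
    """

    def __init__(self, start=1):
        self._next = count(start)  # system-generated sequence
        self._by_source = {}       # (source system, production key) -> surrogate key

    def key_for(self, source_system, production_key):
        natural = (source_system, production_key)
        if natural not in self._by_source:
            self._by_source[natural] = next(self._next)
        return self._by_source[natural]

keys = SurrogateKeyMap()
k1 = keys.key_for("billing", "CUST-0042")  # 1
k2 = keys.key_for("crm", "CUST-0042")      # 2: same code, different source system
k3 = keys.key_for("billing", "CUST-0042")  # 1 again: stable lookup
```

Because the surrogate key carries no built-in meaning, the dimension is unaffected if a production system reuses or reformats its codes, and identical codes from two different systems (as with CUST-0042 here) do not collide.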

Advantages of STAR schema

The STAR schema is a relational model, but it is not a normalized model:

• Easy for user to understand

• Optimizes navigation

• Most suitable for query processing
