Riadh Ben Messaoud

Data Warehouses & OLAP - FSEGN

1. The Big Picture
2. Data Warehouse Philosophy
3. Data Warehouse Concepts
4. Warehousing Applications
5. Warehouse Schema Design
6. Business Intelligence Reporting
7. On-Line Analytical Processing
8. OLAP Applications
9. Data Warehouse Implementation
10. Warehousing Software

Dimensional modeling refers to a set of data modeling techniques that have gained popularity and acceptance for DW implementations

The acknowledged guru of dimensional modeling is Ralph Kimball

The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley & Sons

Normalization is the standard database design technique for the relational DB of OLTP systems

Normalized Data Structures (NDS) allow operational systems to record hundreds of discrete, individual transactions, with minimal risk of data loss or data error

Although normalized databases are appropriate for OLTP systems, they quickly create problems when used with decisional systems

NDS are not easy to understand

NDS do not map to the natural thinking processes of business users

Business users are expected to perform queries against the DW on an ad hoc basis

They must be provided with data structures that are simple and easy to understand

NDS do not provide the required level of simplicity and friendliness

NDS require technical knowledge

To create queries and reports against an NDS, one requires knowledge of SQL

Business users, decision-makers, senior executives are not expected to manipulate SQL

Their time is better spent on non-programming activities

Unsurprisingly, the use of NDS results in many hours of IT resources devoted to writing reports for operational and decisional managers

NDS are not optimized to support decisional queries

Decisional queries require the summation of hundreds to thousands of figures stored in perhaps many rows in a DB

Such processing on a fully NDS is slow and cumbersome
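To make the point concrete, here is a minimal sketch of a decisional query against a small normalized schema. The tables (category, product, orders, order_line) and all of their rows are hypothetical and not taken from the slides; the aim is only to show how many joins a simple "revenue per category for a quarter" question needs.

```python
import sqlite3

# Hypothetical normalized OLTP-style schema, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT,
                      category_id INTEGER REFERENCES category(category_id));
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE order_line (order_id INTEGER REFERENCES orders(order_id),
                         product_id INTEGER REFERENCES product(product_id),
                         quantity INTEGER, unit_price REAL);
INSERT INTO category VALUES (1, 'Beverages'), (2, 'Snacks');
INSERT INTO product VALUES (10, 'Cola', 1), (11, 'Juice', 1), (20, 'Chips', 2);
INSERT INTO orders VALUES (100, '1997-04-01'), (101, '1997-04-02');
INSERT INTO order_line VALUES (100, 10, 2, 1.5), (100, 20, 1, 2.0),
                              (101, 11, 3, 2.0), (101, 10, 1, 1.5);
""")

# Revenue per category for one quarter: three joins and a summation
# over every matching order line.
rows = conn.execute("""
    SELECT c.name, SUM(ol.quantity * ol.unit_price) AS revenue
    FROM order_line ol
    JOIN product  p ON p.product_id  = ol.product_id
    JOIN category c ON c.category_id = p.category_id
    JOIN orders   o ON o.order_id    = ol.order_id
    WHERE o.order_date BETWEEN '1997-04-01' AND '1997-06-30'
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

for name, revenue in rows:
    print(name, revenue)
```

With only four order lines the query is instant; the cost the slide refers to appears when the same joins and sums run over a large transaction history.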

Dimensional Modeling for Decisional Systems

Principles for denormalizing the database structure to create schemas suitable for supporting decisional processing

Two types of tables are used in dimensional modeling:

Fact tables

Dimensional tables

Dimensional Modeling for Decisional Systems

Fact tables

Used to record actual facts or measures in the business

Facts are the numeric data items that are of interest to the business

Facts are the numbers that users analyze and summarize to gain a better understanding of the business

Dimensional Modeling for Decisional Systems

Fact tables (Examples)

Retail. Number of units sold, sales amount

Telecommunications. Length of call in minutes, average number of calls

Banking. Average daily balance, transaction amount

Insurance. Claims amounts

Airline. Ticket cost, baggage weight

Dimensional Modeling for Decisional Systems

Dimension tables

Establish the context and store fields describing the facts

Retail. Store name, store zip, product name, product category, day of week

Telecommunications. Call origin, call destination

Banking. Customer name, account number, date, branch, account officer

Insurance. Policy type, insured policy

Airline. Flight number, flight destination, airfare class

Dimensional Modeling for Decisional Systems

Facts and Dimensions in Reports

A manager requires a report showing the revenue for Store X, at Month Y, for Product Z

He needs the Store dimension, the Time dimension, and the Product dimension to describe the context of the revenue
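The report described above can be sketched as a query over one fact table and three dimension tables. The table names, column names, and sample rows below are assumptions made for illustration; the slides do not specify a schema.

```python
import sqlite3

# Hypothetical star schema: one fact table, three dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales_fact  (store_key INTEGER, time_key INTEGER,
                          product_key INTEGER, revenue REAL);
INSERT INTO store_dim VALUES (1, 'Store X'), (2, 'Store W');
INSERT INTO time_dim VALUES (1, '1997-04-01', '1997-04'),
                            (2, '1997-04-02', '1997-04'),
                            (3, '1997-05-01', '1997-05');
INSERT INTO product_dim VALUES (1, 'Product Z'), (2, 'Product Q');
INSERT INTO sales_fact VALUES (1, 1, 1, 100.0), (1, 2, 1, 50.0),
                              (1, 3, 1, 70.0), (2, 1, 1, 30.0),
                              (1, 1, 2, 20.0);
""")

def revenue_report(conn, store, month, product):
    """Sum the revenue fact in the context given by the three dimensions."""
    row = conn.execute("""
        SELECT COALESCE(SUM(f.revenue), 0)
        FROM sales_fact f
        JOIN store_dim   s ON s.store_key   = f.store_key
        JOIN time_dim    t ON t.time_key    = f.time_key
        JOIN product_dim p ON p.product_key = f.product_key
        WHERE s.store_name = ? AND t.month = ? AND p.product_name = ?
    """, (store, month, product)).fetchone()
    return row[0]

print(revenue_report(conn, "Store X", "1997-04", "Product Z"))
```

The fact (revenue) carries the number; the three dimension filters supply the context that makes the number meaningful.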

Dimensional Modeling for Decisional Systems

Facts and Dimensions in Reports

Sales region and country are dimensional attributes

“2Q, 1997” is a dimensional value

They establish the context and lend meaning to the facts: sales targets and sales actuals

Star Schema

The multidimensional view of data that is expressed using relational database semantics

Information is classified into two groups: facts and dimensions

The fact table resides at the center of the schema, and its dimensions are typically drawn around it

Star Schema

Key principles of dimensional modeling:

The use of fully normalized Fact tables

The use of fully denormalized Dimension tables

Normalizing dimension tables decreases the friendliness and navigability of the schema

By denormalizing the dimensions, we make available to the user all relevant attributes in a single table
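As a rough illustration of denormalizing a dimension, the sketch below flattens a normalized product, category, and department chain into a single product_dim table. All table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized form: three tables, two joins to reach the department name.
CREATE TABLE department (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE category   (category_id INTEGER PRIMARY KEY, category_name TEXT,
                         department_id INTEGER);
CREATE TABLE product    (product_id INTEGER PRIMARY KEY, product_name TEXT,
                         category_id INTEGER);
INSERT INTO department VALUES (1, 'Grocery');
INSERT INTO category VALUES (1, 'Beverages', 1), (2, 'Snacks', 1);
INSERT INTO product VALUES (10, 'Cola', 1), (20, 'Chips', 2);

-- Denormalized dimension: every descriptive attribute in one table.
CREATE TABLE product_dim AS
SELECT p.product_id AS product_key, p.product_name,
       c.category_name, d.department_name
FROM product p
JOIN category   c ON c.category_id   = p.category_id
JOIN department d ON d.department_id = c.department_id;
""")

# The user now browses a single table to see all product attributes.
dim_rows = conn.execute("""
    SELECT product_name, category_name, department_name
    FROM product_dim ORDER BY product_key
""").fetchall()
for row in dim_rows:
    print(row)
```

The department name is now repeated on every product row. That redundancy is the price paid for a dimension that needs no joins to navigate.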

Dimensional Hierarchies

A dimension has hierarchies that imply a grouping structure

Hierarchical Drilling

Users drill up and down dimensional hierarchies to obtain more or less detail about the business
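Drilling can be sketched as running the same summation at different depths of a hierarchy. The example assumes a hypothetical Time hierarchy of year, quarter, and month, with invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY,
                       year INTEGER, quarter TEXT, month TEXT);
CREATE TABLE sales_fact (time_key INTEGER, amount REAL);
INSERT INTO time_dim VALUES (1, 1997, 'Q1', '1997-01'),
                            (2, 1997, 'Q1', '1997-02'),
                            (3, 1997, 'Q2', '1997-04');
INSERT INTO sales_fact VALUES (1, 10.0), (2, 20.0), (3, 5.0), (1, 15.0);
""")

# Assumed hierarchy, ordered from least to most detailed.
LEVELS = ["year", "quarter", "month"]

def sales_by_level(conn, depth):
    """Summarize sales using the first `depth` levels of the hierarchy."""
    cols = ", ".join(LEVELS[:depth])  # names come from the fixed list above
    sql = (f"SELECT {cols}, SUM(f.amount) FROM sales_fact f "
           f"JOIN time_dim t ON t.time_key = f.time_key "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql).fetchall()

by_year = sales_by_level(conn, 1)      # drilled all the way up
by_quarter = sales_by_level(conn, 2)   # drill down one level
by_month = sales_by_level(conn, 3)     # drill down to the lowest level
print(by_year, by_quarter, by_month, sep="\n")
```

Drilling down adds a level to the GROUP BY; drilling up removes one. The facts themselves never change.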

Granularity of the fact table

Granularity indicates the level of detail stored in a table

The granularity of the Fact table follows from the level of detail of its related dimensions

For example, if each:

◦ Time record represents a day,

◦ Product record represents a product,

◦ Organization record represents one branch,

then the grain of a sales Fact table with these dimensions is sales per product per day per branch
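The grain in this example can be sketched by rolling individual sales transactions up to one row per product per day per branch. The transactions below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical source transactions: (day, product, branch, amount).
transactions = [
    ("1997-04-01", "Cola", "Branch A", 3.0),
    ("1997-04-01", "Cola", "Branch A", 1.5),
    ("1997-04-01", "Cola", "Branch B", 2.0),
    ("1997-04-02", "Chips", "Branch A", 4.0),
]

def to_grain(transactions):
    """Collapse transactions to the grain: sales per product per day per branch."""
    totals = defaultdict(float)
    for day, product, branch, amount in transactions:
        totals[(day, product, branch)] += amount
    return dict(totals)

fact_rows = to_grain(transactions)
for key, amount in sorted(fact_rows.items()):
    print(key, amount)
```

Two transactions for the same product, day, and branch become a single fact row, so detail below that grain (the individual tickets) is no longer available to the user.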

Granularity of the fact table

Proper identification of the granularity of each schema is crucial to the usefulness and cost of the DW

Granularity at too high a level (too coarse) severely limits the ability of users to obtain additional detail

Granularity at too low a level (too fine) results in a sharp increase in the size requirements of the DW
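The size trade-off can be estimated with simple arithmetic. The figures below (1,000 products, 50 branches, one year of history) are hypothetical, and the result is an upper bound that assumes every product sells in every branch in every period.

```python
def max_fact_rows(periods, products, branches):
    """Upper bound on fact rows: one row per period per product per branch."""
    return periods * products * branches

# Hypothetical volumes, chosen only to show the effect of the grain.
daily_rows = max_fact_rows(periods=365, products=1000, branches=50)
monthly_rows = max_fact_rows(periods=12, products=1000, branches=50)

print(daily_rows)    # daily grain
print(monthly_rows)  # monthly grain
print(round(daily_rows / monthly_rows, 1))  # how many times larger
```

Under these assumptions the daily grain needs about 30 times as many rows as the monthly grain, but only the daily grain can answer questions about individual days.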

The Fact Table Key Concatenates Dimension Keys

The key of the fact table is actually a concatenation of the keys of each of the dimensions that surround it

The sales fact table key is the concatenation of the client key, the product key and the time key (Day)
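A minimal sketch of such a concatenated key, using a composite primary key in SQLite. The column names follow the slide's example (client, product, day); the key values and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (
    client_key   INTEGER NOT NULL,
    product_key  INTEGER NOT NULL,
    time_key     INTEGER NOT NULL,   -- one key value per day
    sales_amount REAL,
    PRIMARY KEY (client_key, product_key, time_key)
);
INSERT INTO sales_fact VALUES (1, 10, 19970401, 25.0);
""")

def try_insert(row):
    """Return True if the row was stored, False if the key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert((1, 10, 19970401, 99.0))  # same client, product, day
new_day_ok = try_insert((1, 10, 19970402, 30.0))    # same client, product, next day
print(duplicate_ok, new_day_ok)
```

The composite key makes the grain explicit: the table can hold at most one row per client per product per day.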

Aggregates or Summaries

One of the most powerful concepts in DW

The proper use of aggregates dramatically improves query response times, and with them the overall performance and usability of the DW

An aggregate is a precalculated summary stored within the warehouse, usually in a separate schema

Aggregates are used to improve the performance of the warehouse for queries that require only high-level or summarized data

Aggregates or Summaries

Aggregates are summaries of the base-level data at higher points along the dimensional hierarchies

Rather than running a high-level query against base-level or detailed data, users can run the query against aggregated data

Aggregates provide improvements in performance because they contain significantly fewer records
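The effect on record counts can be sketched by precalculating a monthly aggregate from a daily fact table. Table names and data are hypothetical: two products sold every day for 28 days in each of three months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (day TEXT, product TEXT, amount REAL)")

# Invented base-level data at daily grain.
rows = [(f"1997-{m:02d}-{d:02d}", p, 1.0)
        for m in (1, 2, 3) for d in range(1, 29) for p in ("Cola", "Chips")]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)

# Precalculated aggregate one level up the Time hierarchy (day -> month).
conn.execute("""
    CREATE TABLE sales_month_agg AS
    SELECT substr(day, 1, 7) AS month, product, SUM(amount) AS amount
    FROM sales_fact
    GROUP BY substr(day, 1, 7), product
""")

base_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
agg_count = conn.execute("SELECT COUNT(*) FROM sales_month_agg").fetchone()[0]

# The same high-level question answered from either table.
from_base = conn.execute(
    "SELECT SUM(amount) FROM sales_fact WHERE substr(day, 1, 7) = '1997-02'"
).fetchone()[0]
from_agg = conn.execute(
    "SELECT SUM(amount) FROM sales_month_agg WHERE month = '1997-02'"
).fetchone()[0]
print(base_count, agg_count, from_base, from_agg)
```

Both queries return the same total, but the aggregate answers it by reading 6 rows instead of 168. The aggregate cannot answer day-level questions, which still go to the base table.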

Aggregates have fewer records – Example

Dimensional Attributes

Play a critical role in dimensional star schemas

The attribute values are used to establish the context of the facts

Multiple Star Schemas

A DW can have multiple star schemas (many Fact tables)

Each schema is designed to meet a specific set of needs

Each focusing on a different aspect of the business

We can use the same Dimension table in more than one schema

An enterprise can reuse the Time dimension in all warehouse schemas, provided that the level of detail is appropriate
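Sharing a dimension between schemas can be sketched with two fact tables that reference the same Time dimension. The sales and inventory subjects, the table names, and the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One Time dimension, shared by both star schemas.
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key), sales_amount REAL);
CREATE TABLE inventory_fact (
    time_key INTEGER REFERENCES time_dim(time_key), units_on_hand INTEGER);
INSERT INTO time_dim VALUES (1, '1997-04'), (2, '1997-05');
INSERT INTO sales_fact VALUES (1, 100.0), (1, 50.0), (2, 70.0);
INSERT INTO inventory_fact VALUES (1, 40), (2, 25);
""")

sales_by_month = conn.execute("""
    SELECT t.month, SUM(f.sales_amount)
    FROM sales_fact f JOIN time_dim t ON t.time_key = f.time_key
    GROUP BY t.month ORDER BY t.month
""").fetchall()

stock_by_month = conn.execute("""
    SELECT t.month, SUM(f.units_on_hand)
    FROM inventory_fact f JOIN time_dim t ON t.time_key = f.time_key
    GROUP BY t.month ORDER BY t.month
""").fetchall()
print(sales_by_month, stock_by_month)
```

Because both schemas use the same month values, the two result sets line up row for row. Here the dimension is stored at month level, so it suits only fact tables whose grain is a month or coarser.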

Advantages of Dimensional Modeling

It is simple

◦ Business users can easily grasp and comprehend the schema

It promotes data quality

◦ Enforcing foreign key constraints as a form of referential integrity check

Performance optimization

◦ Aggregates are a way to optimize query performance

Use of relational database technology

◦ We can rely on the highly scalable relational DB technology
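The referential integrity check mentioned under data quality can be sketched with a foreign key from the fact table to a dimension. The tables and values are hypothetical. SQLite is used only because it ships with Python, and it enforces foreign keys only after the pragma shown below is enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales_fact (
    product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
    sales_amount REAL
);
INSERT INTO product_dim VALUES (1, 'Cola');
""")

def load_fact(product_key, amount):
    """Return True if the fact row was accepted, False if it was rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO sales_fact VALUES (?, ?)",
                         (product_key, amount))
        return True
    except sqlite3.IntegrityError:
        return False

valid_loaded = load_fact(1, 10.0)     # product 1 exists in the dimension
orphan_loaded = load_fact(999, 5.0)   # no such product in the dimension
print(valid_loaded, orphan_loaded)
```

A fact row that points to a missing dimension record is rejected at load time, so every fact in the warehouse can be placed in context.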