1. Dimensional Modeling (2)
Gregory Ng, Data Warehouse / Business Intelligence Designer
17th March 2008
2. Dimension Model vs. ER Model
ER Model:
Normalized (up to 6NF) to remove redundancy and update anomalies and to improve integrity
Three major types of relationship: one-to-one, one-to-many, many-to-many
Optimized for INSERT, UPDATE and DELETE operations, i.e. ideal for OLTP applications (high volume of small transactions)
Things to consider:
ER does not really model a business; rather, it models the micro-relationships among data elements
Query optimization: analytical queries must traverse many tables and join paths
3. Dimension Model vs. ER Model (cont)
Dimension Model:
Denormalized to 2NF (reduces the number of tables and join paths), which creates redundancy
One major type of relationship: one-to-many
Ideal for SELECT operations
Top-down approach: focuses on the business process
Designed to support analytical queries and user access
Handles anomalies within the ETL
Predictable SQL
Ideal for OLAP applications
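To make the one-to-many shape and the "predictable SQL" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns and rows are hypothetical, chosen only for illustration; a real star would carry more dimensions and a full date dimension.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
SCHEMA = """
    CREATE TABLE date_dim (
        date_key       INTEGER PRIMARY KEY,
        calendar_month TEXT NOT NULL
    );
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category     TEXT NOT NULL  -- denormalized: no separate category table
    );
    -- Each dimension row relates to many fact rows (one-to-many).
    CREATE TABLE sales_fact (
        date_key     INTEGER NOT NULL REFERENCES date_dim (date_key),
        product_key  INTEGER NOT NULL REFERENCES product_dim (product_key),
        sales_amount REAL    NOT NULL
    );
"""

# The "predictable SQL" pattern: join the fact to the dimensions it needs,
# group by dimension attributes, aggregate the fact measures.
STAR_QUERY = """
    SELECT d.calendar_month, p.category, SUM(f.sales_amount) AS total_sales
    FROM sales_fact AS f
    JOIN date_dim    AS d ON f.date_key = d.date_key
    JOIN product_dim AS p ON f.product_key = p.product_key
    GROUP BY d.calendar_month, p.category
    ORDER BY d.calendar_month, p.category
"""


def build_star(conn: sqlite3.Connection) -> None:
    """Create the schema and load a few made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO date_dim VALUES (?, ?)",
                     [(20080301, "2008-03"), (20080401, "2008-04")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                     [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"),
                      (3, "Manual", "Books")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                     [(20080301, 1, 100.0), (20080301, 3, 20.0),
                      (20080401, 2, 50.0), (20080401, 1, 30.0)])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    for row in conn.execute(STAR_QUERY):
        print(row)
```

Any other report question on this star uses the same join, group and aggregate shape; only the dimension attributes in SELECT and GROUP BY change.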
4. Case Study 1
Project: Writeaway (2009)
Database: SQL Server 2000
Reporting: Hyperion IR
Star Schemas: 4
No. of records: ~1 million
Load: Complete refresh
Typical report generation time: ~3 seconds
Project build time: 4 months
Highlights: Drill Across, Factless Fact Table, Dimension Outrigger, Dimension Bridging, Junk Dimension
5. Case Study 2
Project: Absenteeism (2006)
Database: SQL Server 2000
Reporting: Cognos
Star Schemas: 1
No. of records: ~1 million
Load: Incremental
Typical report generation time: ~25 seconds
Project build time: 4 weeks
Highlights: Drill Across, Slowly Changing Dimension, Active Data Warehouse
6. Case Study 3
Project: Mortgage Wealth DNA (2009)
Database: Teradata
Reporting: Hyperion IR
Star Schemas: 3
No. of records: ~150 million
Load: Incremental
Typical report generation time: ~20-30 seconds
Project build time: 3 months
Highlights: Drill Across, Aggregate Join Index, Partitioning/Multi-Partitioning; 99% of aggregation done on the fly in Teradata to minimise data retrieval
7. Case Study 4
Project: Commway (2005)
Database: SQL Server 2000
Reporting: Cognos
Star Schemas: 3
No. of records: ~10 million
Load: Incremental
Typical report generation time: ~30 seconds
Project build time: 18 months
Highlights: Drill Across, Slowly Changing Dimension, Active Data Warehouse, .NET Front End for Data Entry (4000+ Users)
8. Skills we have now
Dimensional modeling techniques/templates for different business processes and subject areas
Data Warehouse architecture for Dimensional Modeling
Dimensional Modeling Workshop procedures
ETL mapping documentation
Reporting with the dimensional model: multi-pass SQL
Practiced star-schema-friendly Teradata features: AJI (Aggregate Join Index), partitioning, multi-partitioning
9. Technologies we have now
State-of-the-art Teradata hardware
GDW in 3NF
Essbase Studio (EIS)
DataStage
Oracle Grid coming online?
OBIEE
10. Shared Dimension (Conformed) and Drill Across
Drilling across to facts from different business processes is enabled by conformed dimensions
11. Shared Dimension (Conformed) and Drill Across (cont)
To produce the drill-across report below, each fact table is aggregated to the conformed Customer attribute in its own subquery, and the two result sets are then joined:

SELECT Act.Customer, Act.ActualAmount, Fcst.ForecastAmount
FROM
    -- Subquery Act returns actuals
    (SELECT c.Customer, SUM(s.SalesAmount) AS ActualAmount
     FROM SalesFact s
     JOIN Customer c ON s.CustomerKey = c.CustomerKey
     GROUP BY c.Customer) Act
INNER JOIN
    -- Subquery Fcst returns forecasts
    (SELECT c.Customer, SUM(f.ForecastAmount) AS ForecastAmount
     FROM ForecastFact f
     JOIN Customer c ON f.CustomerKey = c.CustomerKey
     GROUP BY c.Customer) Fcst
    -- Join the two result sets on the conformed attribute
    ON Act.Customer = Fcst.Customer

Customer      Actual Amount   Forecast Amount
Bill Owen     $76,859         $75,768
James Brown   $63,548         $85,676
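The drill-across pattern on this slide can be exercised end to end with a small sketch in Python's sqlite3. The schema (customer_dim, sales_fact, forecast_fact and their columns) is a hypothetical stand-in for the slide's Customer, Sales Fact and Forecast Fact, and the detail rows are invented so that their totals match the report figures. A naive single-pass join is included for comparison.

```python
import sqlite3

# Both fact tables carry the same customer_key, i.e. they share one
# conformed customer dimension (table and column names are hypothetical).
SCHEMA = """
    CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE sales_fact (customer_key INTEGER NOT NULL, sales_amount INTEGER NOT NULL);
    CREATE TABLE forecast_fact (customer_key INTEGER NOT NULL, forecast_amount INTEGER NOT NULL);
"""

# Drill across: aggregate each fact to the conformed attribute separately,
# then join the two result sets on that attribute.
DRILL_ACROSS = """
    SELECT act.customer, act.actual_amount, fcst.forecast_amount
    FROM (SELECT c.customer, SUM(s.sales_amount) AS actual_amount
          FROM sales_fact AS s
          JOIN customer_dim AS c ON s.customer_key = c.customer_key
          GROUP BY c.customer) AS act
    INNER JOIN
         (SELECT c.customer, SUM(f.forecast_amount) AS forecast_amount
          FROM forecast_fact AS f
          JOIN customer_dim AS c ON f.customer_key = c.customer_key
          GROUP BY c.customer) AS fcst
      ON act.customer = fcst.customer
    ORDER BY act.customer
"""

# For comparison: joining both facts in one pass multiplies rows (fan trap).
NAIVE_JOIN = """
    SELECT c.customer, SUM(s.sales_amount), SUM(f.forecast_amount)
    FROM customer_dim AS c
    JOIN sales_fact    AS s ON s.customer_key = c.customer_key
    JOIN forecast_fact AS f ON f.customer_key = c.customer_key
    GROUP BY c.customer
    ORDER BY c.customer
"""


def load_sample(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?)",
                     [(1, "Bill Owen"), (2, "James Brown")])
    # Made-up detail rows whose totals match the report on the slide.
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                     [(1, 50000), (1, 26859), (2, 63548)])
    conn.executemany("INSERT INTO forecast_fact VALUES (?, ?)",
                     [(1, 75768), (2, 40000), (2, 45676)])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    print(conn.execute(DRILL_ACROSS).fetchall())
    print(conn.execute(NAIVE_JOIN).fetchall())
```

The drill-across query returns the report figures. The naive join returns (Bill Owen, 76859, 151536) and (James Brown, 127096, 85676), because each sales row is repeated once per matching forecast row and vice versa. Note also that INNER JOIN drops customers present in only one fact; a FULL OUTER JOIN, or a multi-pass merge in the reporting tool, would keep them.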
12. Junk Dimension
Grouping of flags and indicators
Cleans up a cluttered design that already has too many dimensions
4 indicators (as in the example below) collapsed into a single integer surrogate key in the fact table
Provides a smaller, quicker point of entry for queries (probably less relevant for databases with bitmap indexes, e.g. Oracle)
See also: Kimball Design Tip #48: De-Clutter With Junk (Dimensions)
http://www.kimballgroup.com/html/designtipsPDF/DesignTips2003/KimballDT48DeClutter.pdf
Key  Indicator1  Indicator2  Indicator3  Indicator4
1    Y           Y           Y           Y
2    Y           Y           Y           N
3    Y           Y           N           Y
4    Y           N           Y           Y
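A minimal sketch, assuming four Y/N flags with hypothetical names indicator1 to indicator4 and a hypothetical order_id column: the ETL builds every combination of the flags once, then replaces the four flag columns on each incoming fact row with the single junk-dimension key. The key values are arbitrary, so they need not match the numbering in the example table.

```python
from itertools import product

# Hypothetical flag columns from the source system.
INDICATORS = ("indicator1", "indicator2", "indicator3", "indicator4")


def build_junk_dimension(values=("Y", "N")):
    """Return {flag_tuple: surrogate_key} covering every combination of the flags."""
    combos = product(values, repeat=len(INDICATORS))
    return {combo: key for key, combo in enumerate(combos, start=1)}


def junk_key(record, junk_dim):
    """Look up the one surrogate key that replaces the four flag columns on a fact row."""
    return junk_dim[tuple(record[name] for name in INDICATORS)]


if __name__ == "__main__":
    junk_dim = build_junk_dimension()
    print(len(junk_dim))  # 2**4 combinations
    src = {"order_id": 42, "indicator1": "Y", "indicator2": "N",
           "indicator3": "Y", "indicator4": "Y"}
    # The fact row keeps its measures/keys but carries one junk key instead of four flags.
    fact_row = {"order_id": src["order_id"], "junk_key": junk_key(src, junk_dim)}
    print(fact_row)
```

With four Y/N flags the full cross product is only 16 rows. When the possible combinations are far more numerous, a common alternative is to add junk rows only for combinations that actually occur in the source.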
13. Accumulating Snapshot Schema
Useful for tracking a multi-step business process
Captures the process history in a single row
Designed to simplify query design and improve query performance
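A sketch of the idea, assuming a hypothetical order-fulfilment process with three milestones (ordered, shipped, delivered) and using Python's sqlite3: one row is inserted when the process starts and is later updated in place as each milestone is reached. Because the whole history sits in one row, lag measures become simple column arithmetic rather than self-joins across transaction rows. Table, column and function names are illustrative only.

```python
import sqlite3


def create_snapshot(conn: sqlite3.Connection) -> None:
    conn.execute("""
        CREATE TABLE order_fulfilment_snapshot (
            order_id      INTEGER PRIMARY KEY,
            order_date    TEXT NOT NULL,
            ship_date     TEXT,          -- NULL until the milestone happens
            delivery_date TEXT
        )""")


def order_placed(conn: sqlite3.Connection, order_id: int, order_date: str) -> None:
    conn.execute("INSERT INTO order_fulfilment_snapshot (order_id, order_date) VALUES (?, ?)",
                 (order_id, order_date))


def milestone_reached(conn: sqlite3.Connection, order_id: int, column: str, when: str) -> None:
    # The same row is revisited and updated, unlike an insert-only transaction fact table.
    if column not in ("ship_date", "delivery_date"):  # whitelist before building the SQL
        raise ValueError(f"unknown milestone column: {column}")
    conn.execute(f"UPDATE order_fulfilment_snapshot SET {column} = ? WHERE order_id = ?",
                 (when, order_id))


# Lags are computed within a single row; unfinished milestones yield NULL.
LAG_QUERY = """
    SELECT order_id,
           CAST(julianday(ship_date) - julianday(order_date) AS INTEGER)     AS days_to_ship,
           CAST(julianday(delivery_date) - julianday(order_date) AS INTEGER) AS days_to_deliver
    FROM order_fulfilment_snapshot
    ORDER BY order_id
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_snapshot(conn)
    order_placed(conn, 1001, "2008-03-03")
    order_placed(conn, 1002, "2008-03-04")
    milestone_reached(conn, 1001, "ship_date", "2008-03-05")
    milestone_reached(conn, 1001, "delivery_date", "2008-03-10")
    milestone_reached(conn, 1002, "ship_date", "2008-03-07")
    print(conn.execute(LAG_QUERY).fetchall())
```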
14. Roadmap
Conformed Dimensions (Product, Department, Date) with full
Slowly Changing Dimension (SCD) capability
Best-practice ETL (error handling, batch controls, slowly changing dimension ETL, foreign key lookup, surrogate key assignment, entity start/end date generation, naming standards)
Star Schema design review process (we build it and we kill it until it can't be killed!)
Column-Store vs. Row-Store: "... column-store is able to process column-oriented data so effectively ... finding that late materialization improves performance by a factor of three ... compression provides about a factor of two on average" [1]
[1] D. J. Abadi, S. R. Madden, N. Hachem. Column-Stores vs. Row-Stores: How Different Are They Really? In Proc. SIGMOD 2008.
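The roadmap's "slowly changing dimension ETL, foreign key lookup, assigning surrogate key, entity start/end date generation" items fit together roughly as in this Type 2 SCD sketch in plain Python. Everything in it (the DimRow structure, the 9999-12-31 high date, expiring the old version on the day before the load date, and the P100/department sample data) is an illustrative assumption rather than the team's standard; it ignores source deletions, Type 1 attributes and batch control.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional

HIGH_DATE = date(9999, 12, 31)  # assumed open-ended end date for the current version


@dataclass
class DimRow:
    surrogate_key: int
    natural_key: str              # business key from the source system
    attributes: Dict[str, str]    # attributes tracked with Type 2 history
    start_date: date
    end_date: date = HIGH_DATE

    @property
    def is_current(self) -> bool:
        return self.end_date == HIGH_DATE


def apply_scd2(dim: List[DimRow], source: Dict[str, Dict[str, str]], load_date: date) -> None:
    """Merge one source extract into the dimension, keeping Type 2 history."""
    current = {r.natural_key: r for r in dim if r.is_current}
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    for natural_key, attrs in source.items():
        row = current.get(natural_key)
        if row is not None and row.attributes == attrs:
            continue  # no change: keep the existing version
        if row is not None:
            row.end_date = load_date - timedelta(days=1)  # expire the old version
        # New entity or changed attributes: new version with a new surrogate key.
        dim.append(DimRow(next_key, natural_key, dict(attrs), load_date))
        next_key += 1


def lookup_surrogate(dim: List[DimRow], natural_key: str, as_of: date) -> Optional[int]:
    """Foreign key lookup for a fact row: the version in effect on the transaction date."""
    for r in dim:
        if r.natural_key == natural_key and r.start_date <= as_of <= r.end_date:
            return r.surrogate_key
    return None


if __name__ == "__main__":
    dim: List[DimRow] = []
    apply_scd2(dim, {"P100": {"department": "Retail"}}, date(2008, 1, 1))
    apply_scd2(dim, {"P100": {"department": "Wealth"}}, date(2008, 3, 1))
    for r in dim:
        print(r)
    print(lookup_surrogate(dim, "P100", date(2008, 2, 15)))
    print(lookup_surrogate(dim, "P100", date(2008, 3, 17)))
```

Fact rows loaded for a February transaction pick up the Retail version, while March transactions pick up the Wealth version, so history is reported as it was at the time.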
16. The road is long but we won't get lost!
Books are on the way to our library!
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data (Ralph Kimball)
Building the Data Warehouse (William E. Inmon)
Mastering Data Warehouse Aggregates: Solutions for Star Schema
Performance (Christopher Adamson)
The Data Warehouse Toolkit: The Complete Guide to Dimensional
Modeling (Ralph Kimball)
Online materials (Kimball Group, http://www.kimballgroup.com)
Bus Matrix Diagram
Some more interesting academic papers/research on my desk!