Dimensional Modeling (2) Gregory Ng Data Warehouse / Business Intelligence Designer 17th March 2008

Dimensional Modelling Session 2

  • 1. Dimensional Modeling (2) Gregory Ng Data Warehouse / Business Intelligence Designer 17th March 2008
  • 2. Dimension Model vs. ER Model
    • ER Model:
    • Normalization (up to 6NF) to remove redundancy and anomalies and to improve integrity
    • 3 major types of relationship: one-to-one, one-to-many, many-to-many
    • Optimized for INSERT, UPDATE and DELETE operations
    • i.e. Perfect for OLTP applications (high volume of small transactions)
    • Things to consider:
      • ER does not really model a business; rather, it models the micro relationships among data elements
      • Query optimization
  • 3. Dimension Model vs. ER Model (cont)
    • Dimension Model:
    • Denormalized to 2NF (reduce number of tables and join paths), creates redundancy
    • 1 major type of relationship: one-to-many
    • Ideal for SELECT operations
    • Top-down approach: focus on business process
    • Designed to support analytical queries and user access
    • Handle anomalies within ETL
    • Predictable SQL
    • Perfect for OLAP applications
  • 4. Case Study 1
    • Project: Writeaway (2009)
    • Database: SQL Server 2000
    • Reporting: Hyperion IR
    • Star Schema: 4
    • No. of records: ~1 mil
    • Load: Complete refresh
    • Typical report generation time: ~3 seconds
    • Project build time: 4 months
    • Highlights:
      • Drill Across
      • Factless Fact Table
      • Dimension Outrigger
      • Dimension Bridging
      • Junk Dimension
  • 5. Case Study 2
    • Project: Absenteeism (2006)
    • Database: SQL Server 2000
    • Reporting: Cognos
    • Star Schema: 1
    • No. of records: ~1 mil
    • Load: Incremental
    • Typical report generation time: ~25 seconds
    • Project build time: 4 weeks
    • Highlights:
      • Drill Across
      • Slowly Changing Dimension
      • Active Data Warehouse
  • 6. Case Study 3
    • Project: Mortgage Wealth DNA (2009)
    • Database: Teradata
    • Reporting: Hyperion IR
    • Star Schema: 3
    • No. of records: ~150 mil
    • Load: Incremental
    • Typical report generation time: ~20-30 seconds
    • Project build time: 3 months
    • Highlights:
      • Drill Across
      • Aggregate Join Index
      • Partitioning/Multi-Partitioning
      • 99% of aggregation done on Teradata on the fly to minimise data retrieval
  • 7. Case Study 4
    • Project: Commway (2005)
    • Database: SQL Server 2000
    • Reporting: Cognos
    • Star Schema: 3
    • No. of records: ~10 mil
    • Load: Incremental
    • Typical report generation time: ~30 seconds
    • Project build time: 18 months
    • Highlights:
      • Drill Across
      • Slowly Changing Dimension
      • Active Data Warehouse
      • .NET Front End for Data Entry (4000+ Users)
  • 8. Skills we have now
      • Dimension modeling techniques/templates for different processes and subject areas
      • Practiced appropriate dimensional modeling techniques in different scenarios: Conformed Dimension, Junk Dimension, Outrigger Dimension, Rapidly Changing Dimension, Dimension Bridging, Degenerate Dimension, Accumulating Snapshot Fact Table, Late Arriving Fact, Factless Fact Table
      • Refining ETL coding techniques; fact-to-dimension foreign key lookup via natural key, source staging/staging/helper/interim table methodology
      • Data Warehouse architecture for Dimensional Modeling
      • Dimensional Modeling Workshop procedures
      • ETL mapping documentation
      • Reporting with Dimensional Model; multi-pass SQL
      • Practiced Star Schema friendly Teradata functions; AJI, Partition, Multi-Partitions
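The fact-to-dimension foreign key lookup via natural key mentioned above can be sketched as follows. This is a minimal illustration, not the team's actual ETL code: the column names (`customer_code`, `customer_key`, `sales_amount`) and the `-1` "unknown member" key are assumptions made for the example.

```python
def assign_surrogate_keys(staged_rows, dimension_rows, unknown_key=-1):
    """Swap each staged row's natural key for the dimension's surrogate key.

    All column names here are hypothetical. Rows whose natural key is not
    found in the dimension get `unknown_key` (an assumed convention) so the
    fact row is still loaded instead of being dropped.
    """
    # Build the lookup once: natural key -> surrogate key.
    key_by_natural = {d["customer_code"]: d["customer_key"] for d in dimension_rows}

    facts = []
    for row in staged_rows:
        facts.append({
            "customer_key": key_by_natural.get(row["customer_code"], unknown_key),
            "sales_amount": row["sales_amount"],
        })
    return facts


# Hypothetical dimension and staging data.
customer_dim = [
    {"customer_key": 1, "customer_code": "C-001"},
    {"customer_key": 2, "customer_code": "C-002"},
]
staging = [
    {"customer_code": "C-002", "sales_amount": 100},
    {"customer_code": "C-999", "sales_amount": 50},  # not in the dimension
]
fact_rows = assign_surrogate_keys(staging, customer_dim)
```

Building the lookup dictionary once per load keeps the per-row cost constant; in a database-resident ETL the same step would typically be a join between the staging table and the dimension on the natural key.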
  • 9. Technologies we have now
      • State-of-the-art Teradata hardware
      • GDW in 3rd NF
      • Essbase Studio (EIS)
      • DataStage
      • Oracle Grid coming online?
      • OBIEE
  • 10. Shared Dimension (Conformed) and Drill Across
    • Drilling across the facts of different business processes is enabled by conformed dimensions
  • 11. Shared Dimension (Conformed) and Drill Across (cont)
    • To produce the following drill across report:
        SELECT Act.Customer, Act.Actual_Amount, Fcst.Forecast_Amount
        FROM
          -- Subquery Act returns Actuals
          -- (Customer_Key is an assumed join key; the slide elides the join condition)
          ( SELECT Customer, SUM(Sales_Amount) AS Actual_Amount
            FROM Sales_Fact
            JOIN Customer ON Customer.Customer_Key = Sales_Fact.Customer_Key
            GROUP BY Customer ) Act
        INNER JOIN
          -- Subquery Fcst returns Forecast
          ( SELECT Customer, SUM(Forecast_Amount) AS Forecast_Amount
            FROM Forecast_Fact
            JOIN Customer ON Customer.Customer_Key = Forecast_Fact.Customer_Key
            GROUP BY Customer ) Fcst
          -- Join the above 2 result sets
          ON Act.Customer = Fcst.Customer

        Customer      Actual Amount   Forecast Amount
        Bill Owen     $76859          $75768
        James Brown   $63548          $85676
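A runnable sketch of the drill-across query, using Python's standard `sqlite3` module with an in-memory database. The table layouts, the `customer_key` join column and the individual fact rows are assumptions made for illustration; only the per-customer totals come from the report on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one conformed dimension shared by two fact tables.
    CREATE TABLE customer (customer_key INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE sales_fact (customer_key INTEGER, sales_amount INTEGER);
    CREATE TABLE forecast_fact (customer_key INTEGER, forecast_amount INTEGER);

    INSERT INTO customer VALUES (1, 'Bill Owen'), (2, 'James Brown');
    -- Invented detail rows that add up to the totals shown on the slide.
    INSERT INTO sales_fact VALUES (1, 70000), (1, 6859), (2, 63548);
    INSERT INTO forecast_fact VALUES (1, 75768), (2, 80000), (2, 5676);
""")

# Each fact table is aggregated to the customer grain in its own subquery,
# and only then are the two result sets joined on the conformed attribute.
DRILL_ACROSS = """
    SELECT act.customer, act.actual_amount, fcst.forecast_amount
    FROM (SELECT c.customer, SUM(s.sales_amount) AS actual_amount
          FROM sales_fact s
          JOIN customer c ON c.customer_key = s.customer_key
          GROUP BY c.customer) act
    JOIN (SELECT c.customer, SUM(f.forecast_amount) AS forecast_amount
          FROM forecast_fact f
          JOIN customer c ON c.customer_key = f.customer_key
          GROUP BY c.customer) fcst
      ON act.customer = fcst.customer
    ORDER BY act.customer
"""
rows = conn.execute(DRILL_ACROSS).fetchall()
for customer, actual, forecast in rows:
    print(customer, actual, forecast)
```

Aggregating before joining matters: joining `sales_fact` directly to `forecast_fact` through the customer would pair every sales row with every forecast row for that customer and inflate both sums.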
  • 12. Junk Dimension
      • Grouping of flags and indicators
      • Clean up cluttered design that already has too many dimensions
      • 4 indicators (as in the example table) collapsed into a single integer surrogate key in the fact table
      • Provide a smaller, quicker point of entry for queries (probably not so relevant for database with BITMAP indices, e.g. Oracle)
    • See Also: Kimball Design Tip #48: De-Clutter With Junk (Dimensions) http://www.kimballgroup.com/html/designtipsPDF/DesignTips2003/KimballDT48DeClutter.pdf
    Key  Indicator1  Indicator2  Indicator3  Indicator4
    1    Y           Y           Y           Y
    2    Y           Y           Y           N
    3    Y           Y           N           Y
    4    Y           N           Y           Y
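One way to populate a junk dimension like this is to pre-generate every combination of the flags and give each a surrogate key. The sketch below assumes four Y/N indicators; the helper names are hypothetical, and the key numbering is arbitrary, so it will not necessarily match the keys in the slide's table.

```python
from itertools import product

INDICATORS = ("indicator1", "indicator2", "indicator3", "indicator4")


def build_junk_dimension(n_flags):
    """Return {surrogate_key: (flag, ...)} covering every Y/N combination."""
    combos = product("YN", repeat=n_flags)
    return {key: combo for key, combo in enumerate(combos, start=1)}


junk_dim = build_junk_dimension(len(INDICATORS))

# Reverse lookup used while loading facts: flag combination -> surrogate key.
key_by_flags = {combo: key for key, combo in junk_dim.items()}


def junk_key(*flags):
    """Return the single integer key that replaces the flags in the fact row."""
    return key_by_flags[tuple(flags)]


print(len(junk_dim))                 # number of rows in the junk dimension
print(junk_key("Y", "N", "Y", "Y"))  # key stored in the fact table
```

With four two-valued indicators the dimension has 2^4 = 16 rows, so the fact table carries one small integer instead of four flag columns. When the number of possible combinations is very large, an alternative is to insert only the combinations actually observed in the source data.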
  • 13. Accumulating Snapshot Schema
    • Useful for tracking a multi-step business process
    • Captures the process history in a single row
    • Designed to ease query design and improve query performance
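A minimal sketch of an accumulating snapshot, using Python's standard `sqlite3` module. The process (a loan application with submitted, approved and settled milestones), the table and the column names are all assumptions chosen for illustration; the point is that each milestone updates the same row rather than adding a new one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE application_snapshot (
        application_id INTEGER PRIMARY KEY,
        submitted_date TEXT,   -- one date column per process milestone
        approved_date  TEXT,
        settled_date   TEXT
    )
""")

# Whitelist of milestone -> column, so no user text reaches the SQL string.
MILESTONE_COLUMNS = {
    "submitted": "submitted_date",
    "approved": "approved_date",
    "settled": "settled_date",
}


def record_milestone(application_id, milestone, iso_date):
    """Create the row on first sight, then fill in the milestone's date."""
    column = MILESTONE_COLUMNS[milestone]
    conn.execute(
        "INSERT OR IGNORE INTO application_snapshot (application_id) VALUES (?)",
        (application_id,),
    )
    conn.execute(
        f"UPDATE application_snapshot SET {column} = ? WHERE application_id = ?",
        (iso_date, application_id),
    )


record_milestone(1001, "submitted", "2008-03-01")
record_milestone(1001, "approved", "2008-03-03")

# Lag between milestones is a simple expression over one row, with no self-join.
row = conn.execute("""
    SELECT julianday(approved_date) - julianday(submitted_date) AS days_to_approve,
           settled_date
    FROM application_snapshot
    WHERE application_id = 1001
""").fetchone()
print(row)
```

Milestones that have not happened yet stay NULL, which makes "still in progress" rows easy to filter.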
  • 14. Roadmap
      • Conformed Dimensions (Product, Department, Date) with full Slowly Changing Dimension (SCD) capability
      • Best practice ETL (Error handling, batch controls, slowly changing dimension ETL, foreign key lookup, assigning surrogate key, entity start/end date generation, naming standard)
      • Star Schema design review process (we build it and we kill it until it can't be killed!)
      • Dimensional Modeling training
      • Code generator: DataStage, Oracle Warehouse Builder??
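The slowly changing dimension ETL, surrogate key assignment and entity start/end date generation listed in the roadmap can be illustrated with a small Type 2 sketch. Everything here is an assumption made for the example: the row layout, the `9999-12-31` high date marking the current row, and the convention that a closed row's end date equals the next row's start date.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for "current" rows


def apply_scd2(dimension, natural_key, attributes, effective):
    """Apply one source record to a Type 2 dimension held as a list of dicts.

    Returns the surrogate key of the row that is current after the call.
    """
    current = next(
        (r for r in dimension
         if r["natural_key"] == natural_key and r["end_date"] == HIGH_DATE),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return current["surrogate_key"]  # nothing changed
        current["end_date"] = effective      # close the old version

    # Assign the next surrogate key and open a new current version.
    new_key = max((r["surrogate_key"] for r in dimension), default=0) + 1
    dimension.append({
        "surrogate_key": new_key,
        "natural_key": natural_key,
        "attributes": dict(attributes),
        "start_date": effective,
        "end_date": HIGH_DATE,
    })
    return new_key


product_dim = []
k1 = apply_scd2(product_dim, "P1", {"department": "Toys"}, date(2008, 1, 1))
k2 = apply_scd2(product_dim, "P1", {"department": "Toys"}, date(2008, 2, 1))
k3 = apply_scd2(product_dim, "P1", {"department": "Games"}, date(2008, 3, 17))
```

Facts loaded before the change keep pointing at the old surrogate key, so historical reports still show the product under its earlier department.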
  • 15. Myth busted
      • Teradata does not support Star Schema
      • Star Schema cannot support large volume of data
      • Column-Store vs. Row-Store: "... column-store is able to process column-oriented data so effectively ... finding that late materialization improves performance by a factor of three ... compression provides about a factor of two on average" [1]
    • [1] D. J. Abadi, S. R. Madden, N. Hachem. Column-Stores vs. Row-Stores: How Different Are They Really? In SIGMOD '08.
  • 16. The road is long but we won't get lost!
    • Books are on the way to our library!
      • The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data (Ralph Kimball)
      • Building the Data Warehouse (William E. Inmon)
      • Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance (Christopher Adamson)
      • The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Ralph Kimball)
    • Online materials (Kimball Group http://www.kimballgroup.com )
    • Bus Matrix Diagram
    • Some more interesting academic papers/research on my desk!
  • 17. Bus Matrix
  • 18. Previous presentations