Upload
reginald-leonard
View
217
Download
1
Embed Size (px)
Citation preview
SOFTWARE/WEB/MOBILE/DATABASE ARCHITECT, ENGINEER, AND DEVELOPER TORONTO, CANADAHTTP://SAYED.JUSTETC.NETHTTP://WWW.JUSTETC.NET
Sayed Ahmed
Logical Design of a Data Warehouse
OUR SERVICES
Free Training and Educational Services Training and Education in Bangla:
Bangla.SaLearningSchool.com Training and Education in English:
www.SaLearningSchool.com English.SaLearningSchool.com
Ask a question and get answers: Ask.JustEtc.net
TOPICS - KEYWORDS
Design a Data Warehouse Star Schema Snow Flake Schema Dimension Tables Fact Tables Auditing Surrogate Keys Type 1, Type 2, Type 3, and Mixed solutions for
slowly changing dimension data ( SCD management) Pivoting for Analysis
To help with SSAS on data warehouse
TOPICS - KEYWORDS
Design a Data Warehouse Additive measures
Semi additive measures Hierarchies for dimensions
Attributes in dimensions Attributes in lookup tables
Long term data warehouse design Usually Star Schema
Short term data warehouse design POC Usually snowflake schema
TOPICS - KEYWORDS
Fact Tables measures foreign keys and possibly an additional primary key and lineage columns granularity of fact tables auditing and lineage needs
Measures can be additive non-additive semi-additive
TOPICS - KEYWORDS
dimension keys names attributes member properties translations and lineage
TOPICS - KEYWORDS
attributes natural hierarchies
many-to-many fact table relationships you can introduce an additional
intermediate dimension
CONCLUSION
Not much – right However, if you understand all the terms and
can implement all these concepts in your data warehouse That will be great Not necessarily you will need to use all of these
concepts; however, you may need to justify based on the situation, will all or any of these will help?
What will help and what will not help
Check our sub sequent videos and tutorials
TOOLS AND SOFTWARE REQUIREMENTS Download the Adventure Works databases
OLTP database (LOB database) Data warehouse Database From
http://msftdbprodsamples.codeplex.com/releases/view/55330
For this tutorial, you can just check our slides Though the following tools will help
And probably check the details in the downloaded databases esp. The AdventureWorksDW2012
You will need help from SQL Server and SQL Server MGMT Studio Tools
REQUIRED TOOLS
Useful/Required SQL Server Components Database Engine Services Documentation Components Management Tools - Basic Management Tools – Complete SQL Server Data Tools
DATA WAREHOUSE DESIGN – THE DETAILS Data Warehouse Logical Design
Topics: Design and Implement a Data Warehouse Design and implement dimensions. Design and implement fact tables Design Auditing
track the source and time for data coming into a DW through auditing i.e lineage information
Why a Data Warehouse? It is hard to
generate reports from OLTP/LOB/Transactional database To do Analysis on OLTP database data (some times) Get useful information/useful summarized and details data to
be used to take business decisions
DATA WAREHOUSE DESIGN – THE DETAILS Why a Data Warehouse?
Data in OLTP are heavily normalized. The goal was to keep one data only in one single place to reduce redundancy and consistency of data
You may end up with many tables 100s, 1000s To generate reports you may need to join many
tables – will be slow Historical data may not be there Data quality is also an issue For reporting or analyzing, you may need data
from multiple databases across many departments
WHY A DATA WAREHOUSE?
So you can create a Data Warehouse By cleaning data With historical data Combining data from multiple sources Denormalizing data Using specific design geared towards Data
Warehouse design Some or many consider DW design is less complex
than relational database design Though it also has some complex areas to address... (by
those some or many)
SO WHAT DOES A DATA WAREHOUSE CONTAIN?
Usually two schemas are used for a DW Star Schema-> looks like a star Snow Flake Schema
Another one called Dimensional Model Includes both Star and Snow Flake in the
same Data Warehouse Both Schemas has tables of two types
Dimension Tables Fact Tables
SO WHAT DOES A DATA WAREHOUSE CONTAIN?
Fact Tables are in the center A Fact table joins/combines all the data required for this
reporting or for the business aspect of this reporting Usually combines the primary keys of different tables that
contain data for this report/business aspect
Dimension tables are all the other tables that contain actual data Dimension tables are the tables that contain data
these can be the actual tables in the OLTP database without any modification (Snow Flake)
Or Dimension tables can be newly created by denormalizing the existing OLTP databases (Star)
SO WHAT DOES A DATA WAREHOUSE CONTAIN?
So, you know now what are dimension tables and what are fact tables Fact tables contain primary keys of all related
tables (here they are foreign keys) Dimension tables contain data
Usually, it’s better that you keep your data warehouse separate from your OLTP database So bring all the tables (dimension) here Or denormalize them and bring them here in the
new database
SIMPLIFIED: WHAT ARE STAR AND SNOWFLAKE SCHEMAS
If you just create Fact tables and take all the related tables from your OLTP/LOB databases You get a Snow Flake Schema Here all Dimension tables are still normalized (as
you just took them from the actual database) This is easy –
so good for short-term, quick, and experimental Data Warehouse
One note, your reporting and analysis services queries (MDX, DMV) will be slow with Snow Flake Schemas
SO WHAT DOES A DATA WAREHOUSE CONTAIN?
Now, when you denormalize the dimension tables You get the start schema The Fact tables remain the same for
example Star Schema is kind of standard and
used a lot Originally was developed in 1980’s
EXAMPLES: WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA
Sales amount for internet sales by different countries and historical years
WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA
issues that I did not mention before If your OLTP database was well designed
(?) It may be hard to find the tables related to the
reporting The table names and the column names can be
tricky – do not follow any conventions – do not have meaning
So it can be hard to find data for the reporting
WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA
Note: Reality: The OLTP may not even be well designed (that makes
reporting hard sometimes) even the relationships as well as normalization
– here we assumed that OLTP is perfect In a long back project
I had to re-write/verify/check/change/optimize/had to deal with (whatever you say) 100s (not really 100s, can be close to 100) of queries for a reporting system
Had to change the interface from one button for one report (easy to get lost)
Into a drop down list of reports The relations among data were arbitrary – actually had only in the
mind of the designer – did not follow any standards – No ER – no standard concepts---
So it was a hard job.. Anyway..
WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA
In such cases Tools such as SQL Profiler might help you could create a test environment,
try to insert some data through an LOB application have SQL Profiler identify where the data was inserted
Another, issue with this particular example No lookup for dates and years
You need to extract The tables may not contain even historical data
No date field So no historical data
WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA
If sales data reside in multiple databases even by multiple departments How do you merge Identify and match Customer data can be in different database with no
common identification Data quality can be low
Data missing Partial data Inconsistent data in multiple databases Data can be represented differenlt in different database
M or F for gender 1 or 0 for gender
STAR SCHEMA/FACT/DIMENSION/CUBE
TOTAL DW: MULTIPLE STAR SCHEMAS You saw one Star Schema for Internet Sales You can see another for Offline Sales Another for Accounting Your DW has many such Star Schemas And these start schemas need to be connected/related They will be connected when you use the same dimensions
for them i.e. If two star schemas have the same dimension they can share
that dimension Called: shared or conformed dimensions
For SSAS, you can use shared dimensions only There is a concept of private dimension
Not a great idea in practical and real life applications You cannot connect/compare/verify the data over the shared dimension
SHARED/CONFORMED DIMENSIONS
DENORMALIZED DIMDATE TABLE
SNOW FLAKES WILL BE MORE AND MORE NORMALIZED
Everything can be normalized Or the first level can be normalized
others are not
NORMALIZED PRODUCT DIMENSION
In the Star Schema, you could use these normalized product table to get snow flake schema (partially.) Could use all normalized dimensions to get full snow flake
SNOW FLAKE
In Snow flake, you may see partial than full snow flakes in reality
Though, in reality, better to go for star schema Queries will be faster
PARTIAL SNOW FLAKE
GRANULARITY
The number of Dimension Tables connected to a fact table Dimension of a star schema Cube = 3 dimension SSAS operates/analyzes on Cube
AUDITING AND LINEAGE
I will be very short on this In data warehouse, you may want some
auditing tables For every update, you should audit
who made the update, when it was made, and how many rows were transferred
to each dimension and fact table
in your DW
AUDITING AND LINEAGE
You will need additional fields/columns in your dimension and fact tables to track When, and who, and from where the row
data was/were updated Your ETL process needs to be updated If you used SSIS for the ETL
Modify SSIS packages so that you can record these information