The importance of a date dimension in a data warehouse and BI project

Date: March 3rd, 2015

Author: Dirk Cludts



Web-IT Support and Consulting ~ [email protected]

1. Introduction

You're about to set up a new data warehouse or modify an existing one, or maybe you are struggling with dates in reports, dashboards, ... If so, this article will help you understand the benefits of a date dimension table. I will even try to convince you that a data warehouse always needs a date dimension table, and that cubes, reports, dashboards, ... that are based on dates use one in 99% of cases (or should).

2. What's in it for you

Ask yourself the following questions. If you answer yes to one or more of them, I invite you to read on.

I have never used a dedicated date dimension?

I don't know what a date dimension is?

I need to perform analysis over different periods (weeks, months, quarters, years)?

I need to do some specific grouping on dates?

I have gaps in my date ranges?

Departments in my company have different ideas about what the first day of the week is, what the first week of the year is, what the legal holidays are (per country), what the first quarter is depending on fiscal or standard year, ...?

I work in an international company and have big issues with all kinds of date formats?

I can't get all the information I want out of a date (fiscal information, holiday, working day even if it falls on a weekend, ...)?

Did you answer “Yes” at least once? Guess what? 8 out of 10 people do when they start with data warehouses and Business Intelligence projects. Enjoy the next chapters, and if you have additional questions, send me a message and I will try to help you.


3. What and why?

A date dimension contains all the information you need about a certain date and allows developers, analysts, users, ... to analyze data as efficiently and accurately as possible. The most commonly known date attributes are day, week, month, quarter and year, but many more (very useful) attributes can be derived from a date (see later).

Having this information at your disposal at any time, without having to perform calculations, makes date operations much faster and lets you access the attributes very efficiently. On top of that, everyone in your company will talk about dates (standard or fiscal) in the same way, and no additional coding in stored procedures, reports, PowerPivot, ... will be needed.
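To make this concrete, here is a minimal sketch of a report query that groups sales per quarter by joining to the date dimension instead of calculating the quarter in the query. It uses Python with an in-memory SQLite database purely for illustration; the table and column names (`dim_date`, `fact_sales`, `DateKey`, `DateStandardQuarter`) are my own assumptions, not a prescribed structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        DateKey             INTEGER PRIMARY KEY,  -- e.g. 20150303
        DateStandardQuarter TEXT NOT NULL         -- stored value, not calculated
    );
    CREATE TABLE fact_sales (
        DateKey INTEGER NOT NULL,
        Amount  REAL NOT NULL
    );

    INSERT INTO dim_date VALUES (20150303, '2015-Q1'), (20150415, '2015-Q2');
    INSERT INTO fact_sales VALUES (20150303, 100.0), (20150303, 50.0), (20150415, 25.0);
""")

# The query contains no date logic at all: the quarter is simply read
# from the dimension table.
sales_per_quarter = conn.execute("""
    SELECT d.DateStandardQuarter, SUM(f.Amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.DateKey = f.DateKey
    GROUP BY d.DateStandardQuarter
    ORDER BY d.DateStandardQuarter
""").fetchall()

print(sales_per_quarter)  # [('2015-Q1', 150.0), ('2015-Q2', 25.0)]
```

If the company later decides that quarters follow the fiscal year, only the content of the dimension table changes; the query stays the same.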

In this article I focus on dates and not so much on time. The same principles apply to time, although with different attributes (seconds, minutes, hours).

Nevertheless, we advise you to keep these two dimensions separate, mainly for performance reasons. A day always has 24 hours, so the time dimension is a fixed table that never changes. Once you have created it, it will last forever and normally contains 86,400 records (60 seconds x 60 minutes x 24 hours). Combining it with every single date in the date dimension would lead to millions and millions of records, which makes no sense and will slow down date operations in your procedures. And to be honest, not many companies need a time dimension, whereas every company needs a date dimension.

For example: the number of records for 1 year would grow to 31,536,000 (60 seconds x 60 minutes x 24 hours x 365 days) instead of 365.
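The arithmetic behind these numbers can be checked in a few lines of Python. The variable names are only illustrative, and the year is assumed to be a non-leap year.

```python
SECONDS_PER_DAY = 60 * 60 * 24   # rows in a time dimension at second level
DAYS_PER_YEAR = 365              # rows in a date dimension for one non-leap year

# One combined date-and-time dimension: one row per second of the year.
combined_rows = SECONDS_PER_DAY * DAYS_PER_YEAR

# Two separate dimensions: the rows are simply added up.
separate_rows = SECONDS_PER_DAY + DAYS_PER_YEAR

print(SECONDS_PER_DAY)  # 86400
print(combined_rows)    # 31536000
print(separate_rows)    # 86765
```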

4. Date attributes

First things first, so that we understand each other clearly:

The following attributes are not an exhaustive list. More attributes are possible (seasons, fiscal information, company-specific information, extended holidays, boxed periods (Easter, Christmas, company events, ...), ...).

None of the fields are calculated fields. The values are stored directly in the field, never calculated on the fly! Mostly the content is loaded with company-specific scripts, certainly for fiscal and company definitions.

Each day can appear only once in your dimension table. Even when your fiscal year is not the same as a standard year, all the information about 1 day needs to be in 1 record.

Whatever information you think is useful for a date and will avoid calculations, concatenations or wrong interpretations, just add it as an additional field.


4.1. Example of a dimension table structure I mostly use for my customers

As said before, there can be many fields. It all depends on the kind of date-related information you need. The screenshot below only contains the fields I mostly use for a calendar year (DateStandardXXXXXX). The same goes for the fiscal year, whose fields should be added after these (DateFiscalXXXXXX).
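As a rough idea of what such a structure can look like, here is a minimal sketch created in an in-memory SQLite database from Python. Only the DateStandard and DateFiscal prefixes come from this article; the table name and the individual field names are assumptions I made for illustration, and your own table will most likely contain more fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_date (
        DateKey                INTEGER PRIMARY KEY,   -- e.g. 20150303
        DateFull               TEXT NOT NULL UNIQUE,  -- ISO date, one row per day
        DateStandardYear       INTEGER NOT NULL,
        DateStandardQuarter    INTEGER NOT NULL,
        DateStandardMonth      INTEGER NOT NULL,
        DateStandardMonthName  TEXT NOT NULL,
        DateStandardWeek       INTEGER NOT NULL,
        DateStandardDayOfWeek  INTEGER NOT NULL,
        DateStandardIsWeekend  INTEGER NOT NULL,
        -- fiscal fields are added after the standard ones
        DateFiscalYear         INTEGER,
        DateFiscalQuarter      INTEGER
    )
""")

# List the column names in the order they were defined.
columns = [row[1] for row in conn.execute("PRAGMA table_info(dim_date)")]
print(columns)
```

The UNIQUE constraint on the full date is one way to enforce the rule that each day appears only once.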


Fields that are often used in selection criteria should also be properly indexed. This is very important for performance. Most likely these are CHAR, INT or DATETIME fields, but it all depends on your data warehouse architecture. Unfortunately, there are no real golden rules, just good common sense.
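A minimal sketch of the idea, again with SQLite from Python as a stand-in for your own database engine. The index name and the choice of year and month as the selection fields are assumptions; which fields deserve an index depends on how your reports actually filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_date (
        DateKey           INTEGER PRIMARY KEY,
        DateStandardYear  INTEGER NOT NULL,
        DateStandardMonth INTEGER NOT NULL
    )
""")

# Index the fields that appear most often in WHERE clauses.
conn.execute("""
    CREATE INDEX ix_dim_date_year_month
    ON dim_date (DateStandardYear, DateStandardMonth)
""")

# Ask the engine how it would execute a typical selection.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT DateKey
    FROM dim_date
    WHERE DateStandardYear = 2015 AND DateStandardMonth = 3
""").fetchall()

# The last column of each plan row holds the readable description.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The printed plan should mention `ix_dim_date_year_month`, which shows that the selection is answered through the index rather than by scanning the whole table.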

4.2. Example of live data for one date

As the title says, this is an example. If you want to format your information in a different way, you can. Personalize all your fields in the way that works best for the type of output you need. And if, for some reason, the week number, the weekday, etc. are different for your organization, just adapt them to your needs. Again, no golden rules, just down-to-earth logic.
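To give an idea of what one record can contain, the sketch below derives a handful of attributes for a single date in Python. The field names are hypothetical, and the week and weekday follow the ISO 8601 convention (weeks start on Monday), which is an assumption; replace it with your organization's own definition if that differs.

```python
from datetime import date


def describe_date(d: date) -> dict:
    """Build one illustrative date dimension record for the given date."""
    iso_year, iso_week, iso_weekday = d.isocalendar()
    return {
        "DateKey": int(d.strftime("%Y%m%d")),
        "DateFull": d.isoformat(),
        "DateStandardYear": d.year,
        "DateStandardQuarter": (d.month - 1) // 3 + 1,
        "DateStandardMonth": d.month,
        "DateStandardWeek": iso_week,            # ISO week number
        "DateStandardDayOfWeek": iso_weekday,    # 1 = Monday ... 7 = Sunday
        "DateStandardDayOfYear": d.timetuple().tm_yday,
        "DateStandardIsWeekend": iso_weekday >= 6,
    }


row = describe_date(date(2015, 3, 3))
for field, value in row.items():
    print(f"{field}: {value}")
```

For March 3rd, 2015 this gives quarter 1, ISO week 10, weekday 2 (a Tuesday) and day 62 of the year.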


5. Useful date functions to help you fill in the table

The next few lines of code will often be the basis for calculating the different date elements you need to insert into the dimension table. Depending on the fields and their types, you need to be creative with these (and other) date functions, but this is a good start.
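The original code listing is not part of this text, so what follows is only a sketch of the kind of helper functions that are typically useful, written in Python. The function names are my own, and the week is assumed to start on Monday.

```python
import calendar
from datetime import date, timedelta


def first_day_of_month(d: date) -> date:
    return d.replace(day=1)


def last_day_of_month(d: date) -> date:
    # monthrange returns (weekday of the first day, number of days in the month)
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=days_in_month)


def first_day_of_quarter(d: date) -> date:
    first_month = 3 * ((d.month - 1) // 3) + 1
    return date(d.year, first_month, 1)


def monday_of_week(d: date) -> date:
    # weekday() is 0 for Monday, so this steps back to the start of the week
    return d - timedelta(days=d.weekday())


sample = date(2015, 3, 3)
print(first_day_of_month(sample))    # 2015-03-01
print(last_day_of_month(sample))     # 2015-03-31
print(first_day_of_quarter(sample))  # 2015-01-01
print(monday_of_week(sample))        # 2015-03-02
```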

To fill the table with a set of dates, the code structure looks like this (not all the details are coded, since they depend on your table):
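The original listing did not survive in this text. As a stand-in, here is a minimal sketch in Python that loops over a date range and inserts one row per day into an in-memory SQLite table. The table layout, the field names and the chosen year are assumptions; a real script would fill in all the fields of your own table.

```python
import sqlite3
from datetime import date, timedelta


def date_range(start: date, end: date):
    """Yield every date from start to end, both included."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_date (
        DateKey           INTEGER PRIMARY KEY,
        DateFull          TEXT NOT NULL UNIQUE,
        DateStandardYear  INTEGER NOT NULL,
        DateStandardMonth INTEGER NOT NULL,
        DateStandardDay   INTEGER NOT NULL
    )
""")

# One tuple per day; the values are stored, not calculated at query time.
rows = [
    (int(d.strftime("%Y%m%d")), d.isoformat(), d.year, d.month, d.day)
    for d in date_range(date(2015, 1, 1), date(2015, 12, 31))
]
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM dim_date").fetchone()[0]
print(row_count)  # 365
```

Because every day in the range gets exactly one row, the table has no gaps and no duplicates for the loaded period.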


6. Translations

For multinationals or companies that want to report in several languages, it can be a better option to create special translation tables. In the above example I added 4 languages (NL, FR, UK and GE) for month names, quarter names, ... This is the easiest way to set things up, and most of my customers report in one of those languages, so there is no need for more tables and more complex structures (you could even add Spanish and Italian). But if you're not sure that these languages are sufficient, a better solution is to work with separate tables. For each possible description of a month, quarter, ... you will then have 1 record per item and per language. For translations of the month January, for example, you would have a table with the following set of records:
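The original records were shown as an image that is not part of this text. A minimal sketch of such a translation table, with one record per month and per language, could look like this in Python with SQLite. The table name, the field names and the lookup function are assumptions; the language codes are the ones used in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE month_translation (
        MonthNumber  INTEGER NOT NULL,
        LanguageCode TEXT NOT NULL,
        MonthName    TEXT NOT NULL,
        PRIMARY KEY (MonthNumber, LanguageCode)  -- 1 record per item and language
    )
""")
conn.executemany(
    "INSERT INTO month_translation VALUES (?, ?, ?)",
    [
        (1, "NL", "januari"),
        (1, "FR", "janvier"),
        (1, "UK", "January"),
        (1, "GE", "Januar"),
    ],
)


def month_name(month_number: int, language_code: str) -> str:
    """Look up the name of a month in the requested language."""
    row = conn.execute(
        "SELECT MonthName FROM month_translation "
        "WHERE MonthNumber = ? AND LanguageCode = ?",
        (month_number, language_code),
    ).fetchone()
    return row[0]


print(month_name(1, "FR"))  # janvier
```

Adding a language later means adding records, not changing the table structure.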

Important: if you need to add Turkish, Russian, ... translations, all VARCHAR fields need to be NVARCHAR fields so that you can store Unicode characters too!
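The effect can be illustrated in Python, under the assumption that a VARCHAR field uses a single-byte Western European code page (cp1252 here) and an NVARCHAR field uses a 2-byte Unicode encoding (UTF-16). Which code page a VARCHAR field really uses depends on your database settings.

```python
russian_january = "Январь"

# A single-byte Western European code page has no Cyrillic letters,
# so every character is replaced by a question mark.
lossy = russian_january.encode("cp1252", errors="replace")
print(lossy)  # b'??????'

# A Unicode encoding stores and returns the text unchanged.
round_trip = russian_january.encode("utf-16-le").decode("utf-16-le")
print(round_trip == russian_january)  # True
```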

7. Summary

By now you should be ready to relate all the relevant dates from the different tables in your data warehouse to this very important dimension table.

No more gaps in date ranges, since you have every single date in this table.

No more wrong ideas and opinions on weeks, quarters or other periods.

No more issues when grouping dates in reports or PowerPivot.

Easy date hierarchy setup in your cubes (SSAS).

Correct date formats throughout the entire company.

And many more advantages.
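The first advantage, no more gaps in date ranges, can be sketched as follows in Python with SQLite. Starting from the date dimension and using a LEFT JOIN to the fact table returns a row for every day, including the days without any sales. The table names, field names and figures are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (DateKey INTEGER PRIMARY KEY);
    CREATE TABLE fact_sales (DateKey INTEGER NOT NULL, Amount REAL NOT NULL);

    INSERT INTO dim_date VALUES (20150301), (20150302), (20150303);
    -- No sales were recorded on 2015-03-02.
    INSERT INTO fact_sales VALUES (20150301, 40.0), (20150303, 60.0);
""")

# Start from the dimension, so every day is returned even without facts.
daily_sales = conn.execute("""
    SELECT d.DateKey, COALESCE(SUM(f.Amount), 0.0)
    FROM dim_date AS d
    LEFT JOIN fact_sales AS f ON f.DateKey = d.DateKey
    GROUP BY d.DateKey
    ORDER BY d.DateKey
""").fetchall()

print(daily_sales)  # [(20150301, 40.0), (20150302, 0.0), (20150303, 60.0)]
```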

I hope you found this article interesting and, above all, useful. If so, you don’t need to ask me whether you can share it with colleagues, friends, partners, students, ... Sharing knowledge is what it's all about.


One vision, one goal!

Learning & sharing information never ends…