
SCD Docs


SCD Type 1

The Type 1 Slowly Changing Dimension data warehouse architecture applies when no history is kept in the database. New, changed data simply overwrites the old entries. This approach is often used for data that changes over time as a result of correcting data quality errors (misspellings, data consolidations, trimming spaces, language-specific characters). Type 1 SCD is easy to maintain and is used mainly when losing the ability to track old history is not an issue.

SCD 1 implementation in Datastage

The job described and depicted below shows how to implement SCD Type 1 in Datastage. It is one of many possible designs that can implement this dimension. The example is based on a load of customers into a data warehouse.


The most important facts and stages of the CUST_SCD2 job processing:

There is a hashed file (Hash_NewCust) which handles a lookup of the new data coming from the text file. A T001_Lookups transformer does a lookup into the hashed file and maps new and old values to separate columns.


A T002 transformer updates old values with new ones without regard for the overwritten data.

(Figure: SCD 1 transformer, update old entries)

The database is updated in a target ODBC stage (with the 'update existing rows' update action).
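The Type 1 flow above (look up the incoming record, then overwrite the stored one) can be sketched outside Datastage in a few lines of Python. This is only an illustration: the dictionaries stand in for the hashed file and the warehouse table, the function name `apply_scd1` is hypothetical, and the sample values are borrowed from the customer data used elsewhere in this document.

```python
# Existing dimension rows keyed by CUST_ID (stands in for the warehouse table).
dimension = {
    "ETIMAA5": {"CUST_NAME": "ETL tools info", "CUST_COUNTRY_ID": "FI"},
    "DRBOUA7": {"CUST_NAME": "Dream Basket", "CUST_COUNTRY_ID": "PL"},
}

# Incoming rows from the text file (plays the role of the Hash_NewCust lookup).
new_customers = {
    "ETIMAA5": {"CUST_NAME": "ETL-Tools.info", "CUST_COUNTRY_ID": "ES"},
}


def apply_scd1(dimension, new_customers):
    """Overwrite matching dimension rows in place; no history is kept."""
    updated = []
    for cust_id, new_row in new_customers.items():
        old_row = dimension.get(cust_id)
        # Only existing rows are touched, mirroring 'update existing rows'.
        if old_row is not None and old_row != new_row:
            dimension[cust_id] = dict(new_row)
            updated.append(cust_id)
    return updated


changed_ids = apply_scd1(dimension, new_customers)
print(changed_ids, dimension["ETIMAA5"])
```

After the call, the old name and country of ETIMAA5 are gone, which is exactly the trade-off Type 1 accepts.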


SCD Type 2

Slowly Changing Dimension Type 2 is a model where the whole history is stored in the database. An additional dimension record is created for each change, so the old record values and the new (current) values are easy to tell apart and the history is clear. The fields 'effective date' and 'current indicator' are very often used in this dimension, and the fact table usually stores the dimension key and version number.

SCD 2 implementation in Datastage

The job described and depicted below shows how to implement SCD Type 2 in Datastage. It is one of many possible designs that can implement this dimension. For this example, we will use a table with customer data (its name is D_CUSTOMER_SCD2) which has the following structure and data:

CUST_ID  CUST_NAME       CUST_GROUP_ID  CUST_TYPE_ID  CUST_COUNTRY_ID  REC_VERSION  REC_EFFDT  REC_CURRENT_IND
DRBOUA7  Dream Basket    EL             S             PL               1            10/1/2006  Y
ETIMAA5  ETL tools info  BI             C             FI               1            9/29/2006  Y
FAMMFA0  Fajatso         FD             S             CD               1            9/27/2006  Y
FICILA0  First Pactonic  FD             C             IT               1            9/25/2006  Y
FRDXXA2  Frasir          EL             C             SK               1            9/23/2006  Y
GAMOPA9  Ganpa LTD.      FD             C             US               1            9/21/2006  Y
GGMOPA9  GG electronics  EL             S             RU               1            9/19/2006  Y
GLMFIA6  Glasithklini    FD             S             PL               1            9/17/2006  Y
GLMPEA9  Globiteleco     TC             S             FI               1            9/15/2006  Y
GONDWA5  Goli Airlines   BN             S             GB               1            9/13/2006  Y

The dimension table with customers is refreshed daily, and one of the data sources is a text file. For the purpose of this example, the record with CUST_ID=ETIMAA5 in the file differs from the one stored in the database, and it is the only record with changed data.

(Figure: SCD 2 - Customers file extract)

The most important facts and stages of the CUST_SCD2 job processing:

There is a hashed file (Hash_NewCust) which handles a lookup of the new data coming from the text file. A T001_Lookups transformer does a lookup into the hashed file and maps new and old values to separate columns.

(Figure: SCD 2 lookup transformer)


A T002_Check_Discrepacies_exist transformer compares the old and new values of each record and passes through only the records that differ.

(Figure: SCD 2 check discrepancies transformer)
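As a rough illustration of the discrepancy check, the comparison can be written as a filter over (old, new) record pairs. The helper name `differs` and the choice of compared columns are assumptions made for this sketch; the actual transformer constraint is not reproduced in this document.

```python
# Columns compared between the stored (old) and incoming (new) record.
# This column list is an assumption made for the sketch.
COMPARED = ("CUST_NAME", "CUST_GROUP_ID", "CUST_TYPE_ID", "CUST_COUNTRY_ID")


def differs(old, new, columns=COMPARED):
    """Return True when at least one compared column has changed."""
    return any(old[c] != new[c] for c in columns)


# (old, new) pairs as they might leave the lookup step.
pairs = [
    (
        {"CUST_ID": "DRBOUA7", "CUST_NAME": "Dream Basket", "CUST_GROUP_ID": "EL",
         "CUST_TYPE_ID": "S", "CUST_COUNTRY_ID": "PL"},
        {"CUST_ID": "DRBOUA7", "CUST_NAME": "Dream Basket", "CUST_GROUP_ID": "EL",
         "CUST_TYPE_ID": "S", "CUST_COUNTRY_ID": "PL"},
    ),
    (
        {"CUST_ID": "ETIMAA5", "CUST_NAME": "ETL tools info", "CUST_GROUP_ID": "BI",
         "CUST_TYPE_ID": "C", "CUST_COUNTRY_ID": "FI"},
        {"CUST_ID": "ETIMAA5", "CUST_NAME": "ETL-Tools.info", "CUST_GROUP_ID": "BI",
         "CUST_TYPE_ID": "C", "CUST_COUNTRY_ID": "ES"},
    ),
]

# Only pairs with a real difference are passed downstream.
changed_pairs = [(old, new) for old, new in pairs if differs(old, new)]
print([new["CUST_ID"] for _, new in changed_pairs])
```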


A T003 transformer handles the UPDATE and INSERT actions for a record. The old record is updated with the current indicator flag set to no, and the new record is inserted with the current indicator flag set to yes, the record version increased by 1, and the current date as the effective date.

(Figure: SCD 2 insert-update record transformer)
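The split into one UPDATE row and one INSERT row can be sketched as a small Python function. The function name `split_change` is hypothetical, and passing the load date in as a parameter is a choice made here so that the example is repeatable; the Datastage job itself is described as using the current date.

```python
from datetime import date


def split_change(old, new, today):
    """Return (update_row, insert_row) for one changed customer record."""
    # The stored record keeps its values but is flagged as no longer current.
    update_row = {**old, "REC_CURRENT_IND": "N"}
    # The new record gets the next version, the load date, and the current flag.
    insert_row = {
        **new,
        "REC_VERSION": old["REC_VERSION"] + 1,
        "REC_EFFDT": today,
        "REC_CURRENT_IND": "Y",
    }
    return update_row, insert_row


old_rec = {
    "CUST_ID": "ETIMAA5", "CUST_NAME": "ETL tools info", "CUST_GROUP_ID": "BI",
    "CUST_TYPE_ID": "C", "CUST_COUNTRY_ID": "FI",
    "REC_VERSION": 1, "REC_EFFDT": date(2006, 9, 29), "REC_CURRENT_IND": "Y",
}
new_rec = {
    "CUST_ID": "ETIMAA5", "CUST_NAME": "ETL-Tools.info", "CUST_GROUP_ID": "BI",
    "CUST_TYPE_ID": "C", "CUST_COUNTRY_ID": "ES",
}

update_row, insert_row = split_change(old_rec, new_rec, today=date(2006, 12, 2))
print(update_row["REC_CURRENT_IND"], insert_row["REC_VERSION"])
```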


ODBC Update stage (O_DW_Customers_SCD2_Upd): the update action is 'Update existing rows only' and the selected key columns are CUST_ID and REC_VERSION, so they will appear in the WHERE part of the constructed SQL statement.

ODBC Insert stage (O_DW_Customers_SCD2_Ins): the insert action is 'insert rows without clearing' and the key column is CUST_ID.
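To make the effect of the two target stages concrete, the sketch below runs comparable SQL against an in-memory SQLite database using Python's standard `sqlite3` module. SQLite stands in for the ODBC target, and the statements are hand-written approximations, not the SQL that Datastage generates. The point is only that CUST_ID and REC_VERSION together identify the row to update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE D_CUSTOMER_SCD2 (
           CUST_ID TEXT, CUST_NAME TEXT, CUST_GROUP_ID TEXT, CUST_TYPE_ID TEXT,
           CUST_COUNTRY_ID TEXT, REC_VERSION INTEGER, REC_EFFDT TEXT,
           REC_CURRENT_IND TEXT)"""
)
insert_sql = "INSERT INTO D_CUSTOMER_SCD2 VALUES (?, ?, ?, ?, ?, ?, ?, ?)"

# Row already stored in the dimension before the load.
conn.execute(insert_sql, ("ETIMAA5", "ETL tools info", "BI", "C", "FI", 1, "9/29/2006", "Y"))

# Update stage: the key columns CUST_ID and REC_VERSION form the WHERE clause.
conn.execute(
    "UPDATE D_CUSTOMER_SCD2 SET REC_CURRENT_IND = ? "
    "WHERE CUST_ID = ? AND REC_VERSION = ?",
    ("N", "ETIMAA5", 1),
)

# Insert stage: the new version is appended; existing rows are not cleared.
conn.execute(insert_sql, ("ETIMAA5", "ETL-Tools.info", "BI", "C", "ES", 2, "12/2/2006", "Y"))

rows = conn.execute(
    "SELECT REC_VERSION, CUST_NAME, CUST_COUNTRY_ID, REC_CURRENT_IND "
    "FROM D_CUSTOMER_SCD2 WHERE CUST_ID = ? ORDER BY REC_VERSION",
    ("ETIMAA5",),
).fetchall()
print(rows)
```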


The D_CUSTOMER_SCD2 table after the load:

CUST_ID  CUST_NAME       CUST_GROUP_ID  CUST_TYPE_ID  CUST_COUNTRY_ID  REC_VERSION  REC_EFFDT  REC_CURRENT_IND
DRBOUA7  Dream Basket    EL             S             PL               1            10/1/2006  Y
ETIMAA5  ETL tools info  BI             C             FI               1            9/29/2006  N
FAMMFA0  Fajatso         FD             S             CD               1            9/27/2006  Y
FICILA0  First Pactonic  FD             C             IT               1            9/25/2006  Y
FRDXXA2  Frasir          EL             C             SK               1            9/23/2006  Y
GAMOPA9  Ganpa LTD.      FD             C             US               1            9/21/2006  Y
GGMOPA9  GG electronics  EL             S             RU               1            9/19/2006  Y
GLMFIA6  Glasithklini    FD             S             PL               1            9/17/2006  Y
GLMPEA9  Globiteleco     TC             S             FI               1            9/15/2006  Y
GONDWA5  Goli Airlines   BN             S             GB               1            9/13/2006  Y
ETIMAA5  ETL-Tools.info  BI             C             ES               2            12/2/2006  Y

SCD Type 3

In the Type 3 Slowly Changing Dimension, only the previous value of a dimension attribute is written into the database. An 'old' or 'previous' column is created which stores the immediately preceding value of the attribute. In Type 3 SCD, users can describe history immediately and can report both forward and backward from the change. However, this model cannot track all historical changes, such as when a dimension attribute changes twice or more. That would require creating additional columns to store historical data and could make the whole data warehouse schema very complex.

To implement SCD Type 3 in Datastage, use the same processing as in the SCD 2 example, changing only the destination stages so that they store the old value in the previous-value field and overwrite the old value with the new one.
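A minimal Python sketch of the Type 3 idea follows. The `PREV_` column prefix and the function name `apply_scd3` are assumptions made for illustration; the document does not name the previous-value column. The second change in the example shows the limitation described above: the original value is lost once the attribute changes twice.

```python
def apply_scd3(row, column, new_value):
    """Keep only the immediately preceding value in a PREV_<column> field."""
    prev_column = "PREV_" + column  # hypothetical naming convention
    if row[column] != new_value:
        row[prev_column] = row[column]  # remember the value being replaced
        row[column] = new_value         # overwrite with the new value
    return row


customer = {"CUST_ID": "ETIMAA5", "CUST_COUNTRY_ID": "FI", "PREV_CUST_COUNTRY_ID": None}

apply_scd3(customer, "CUST_COUNTRY_ID", "ES")
first_change = dict(customer)

# A second change pushes 'ES' into the previous column; 'FI' is no longer stored.
apply_scd3(customer, "CUST_COUNTRY_ID", "DE")
print(first_change, customer)
```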

SCD Type 4


The Type 4 SCD idea is to store all historical changes in a separate historical data table for each of the dimensions.

To implement SCD Type 4 in Datastage, use the same processing as in the SCD 2 example, changing only the destination stages so that the old value is inserted into the destination stage connected to the historical data table (D_CUSTOMER_HIST, for example) and the old value in the dimension is then overwritten with the new one.
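The Type 4 pattern can be sketched in Python with one structure for the current dimension and one for the history table. The function name `apply_scd4` and the in-memory structures are assumptions for illustration only; the list `d_customer_hist` stands in for the D_CUSTOMER_HIST table mentioned above.

```python
def apply_scd4(current, history, cust_id, new_row):
    """Move the old row to the history table, then overwrite the current row."""
    old_row = current.get(cust_id)
    if old_row is not None and old_row != new_row:
        history.append({"CUST_ID": cust_id, **old_row})  # insert into history
        current[cust_id] = dict(new_row)                  # overwrite current


d_customer = {"ETIMAA5": {"CUST_NAME": "ETL tools info", "CUST_COUNTRY_ID": "FI"}}
d_customer_hist = []  # stands in for D_CUSTOMER_HIST

apply_scd4(
    d_customer,
    d_customer_hist,
    "ETIMAA5",
    {"CUST_NAME": "ETL-Tools.info", "CUST_COUNTRY_ID": "ES"},
)
print(d_customer["ETIMAA5"], d_customer_hist)
```

Unlike Type 3, repeated changes simply add more rows to the history structure, so no earlier value is lost.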