Recipes of Data Warehouse and Business Intelligence: How to check the Staging Area Loading

Data Warehouse and Business Intelligence - Recipe 3


The Micro ETL Foundation

• The Micro ETL Foundation is a set of ideas and solutions for Data Warehouse and Business Intelligence projects in an Oracle environment.

• It doesn’t use expensive ETL tools, only your intelligence and your ability to think, configure, build and load data using the features and the programming language of your RDBMS.

• This recipe is another easy example, based on the slides of Recipes 1 and 2 of Data Warehouse and Business Intelligence.

• By copying the content of the following slides into your editor and SQL interface utility, you can reproduce this example.

• The solution presented here is the check of the Staging Area loading.


The load of data file

• Configure and load the source data file according to the slides of «Recipes 2 of Data Warehouse and Business Intelligence».

• Copy the SQL statements into a file. Run the script and you will load a Staging Table with a «click».

• Now we will see how to verify the load process.

• The source data file is the following:

EMPLOYEE_ID  FIRST_NAME  LAST_NAME    EMAIL     PHONE_NUMBER   HIRE_DATE   JOB_ID    SALARY  COMMISSION_PCT  MANAGER_ID  DEPARTMENT_ID
117          Sigal       Tobias       STOBIAS   5.151.274.564  24/07/2005  PU_CLERK  2800                    114         30
118          Guy         Himuro       GHIMURO   5.151.274.565  15/11/2006  PU_CLERK  2600                    114         30
119          Karen       Colmenares   KCOLMENA  5.151.274.566  10/08/2007  PU_CLERK  2500                    114         30
120          Matthew     Weiss        MWEISS    6.501.231.234  18/07/2004  ST_MAN    8000                    100         50
121          Adam        Fripp        AFRIPP    6.501.232.234  10/04/2005  ST_MAN    8200                    100         50
122          Payam       Kaufling     PKAUFLIN  6.501.233.234  01/05/2003  ST_MAN    7900                    100         50
123          Shanta      Vollman      SVOLLMAN  6.501.234.234  10/10/2005  ST_MAN    6500                    100         50
124          Kevin       Mourgos      KMOURGOS  6.501.235.234  16/11/2007  ST_MAN    5800                    100         50
125          Julia       Nayer        JNAYER    6.501.241.214  16/07/2005  ST_CLERK  3200                    120         50
126          Irene       Mikkilineni  IMIKKILI  6.501.241.224  28/09/2006  ST_CLERK  2700                    120         50

(COMMISSION_PCT is empty in every row.)


The load process

• The objects involved in the process are shown in the next figure.

[Figure: the load process. Objects shown: Configuration External View (CXV), Configuration External Table (CXT), File Definition Table (CFT), Row File, Row External Table (RXT), Source Data File, Source External Table (FXT), Source External View (FXV), Staging Table (STT). Other labels in the figure: File System, Load, and the numbers 1 to 5.]


What to check

• At the end of the load, we need to verify that everything went well. We need to ensure that the number of rows in the Staging table is correct. To be sure of this, we must show that the following five numbers are all exactly the same:

1. The number of rows declared in the .row file
2. The number of rows in the source data file
3. The number of rows in the external table that refers to the data file
4. The number of rows in the view built on the external table
5. The number of rows in the staging table

• Now let's see what we need.


The detail check table

• Build a check table to contain the results of the checks.

• IO_COD is the same as in the configuration table created in «Recipes 2».

• SEQ_NUM is a global sequential number taken from an Oracle sequence.

• SOURCE_COD is the name of the data file.

• SORT_NUM is a sort number inside the IO_COD.

• CHECK_DET is a description of the check.

• N1_VAL is the row count.

• STAMP_DTS is the SYSDATE at insert time.

-- On the first run the DROP statements fail because the objects do not exist yet; this is harmless.
DROP TABLE STA_CHK_LOT;
CREATE TABLE STA_CHK_LOT(
   IO_COD     VARCHAR2(12)  NOT NULL
  ,SEQ_NUM    NUMBER        NOT NULL
  ,SOURCE_COD VARCHAR2(80)  NOT NULL  -- 80 matches SUBSTR(LOCATION,1,80) in the inserts (the original used 24)
  ,SORT_NUM   NUMBER        NOT NULL
  ,CHECK_DET  VARCHAR2(600) NOT NULL
  ,N1_VAL     NUMBER        NOT NULL
  ,STAMP_DTS  DATE          NOT NULL
);

DROP SEQUENCE STA_CHK_SEQ;
CREATE SEQUENCE STA_CHK_SEQ START WITH 1 INCREMENT BY 1;


The summary check table

• Build a summary check table that holds, in a single row, the results stored in the previous table.

• IO_COD is the same as in the configuration table created in «Recipes 2».

• The *_CNT columns hold the row counts from the 5 checks listed under «What to check».

• RET_COD will be the final result (OK or NOT OK).

• STAMP_DTS is the SYSDATE at insert time.

DROP TABLE STA_IO_LOT;
CREATE TABLE STA_IO_LOT(
   IO_COD     VARCHAR2(12) NOT NULL
  ,SOURCE_COD VARCHAR2(80) NOT NULL
  ,DEC_CNT    NUMBER
  ,FIL_CNT    NUMBER
  ,FXT_CNT    NUMBER
  ,FXV_CNT    NUMBER
  ,STT_CNT    NUMBER
  ,RET_COD    VARCHAR2(30)
  ,STAMP_DTS  DATE
);


The count rows function

• At this point we need to write some PL/SQL code. You could also write it in Java or another programming language.

• This function counts the number of lines in the source data file.

• It has 2 parameters: the folder (Oracle directory) and the file name.

• That is all. Now we can load the two check tables.

CREATE OR REPLACE FUNCTION F_COUNT_FILE_ROWS(
   P_DIR       VARCHAR2
  ,P_FILE_NAME VARCHAR2
) RETURN NUMBER IS
  V_F     UTL_FILE.FILE_TYPE;
  V_COUNT NUMBER;
  V_LINE  VARCHAR2(2000);
BEGIN
  V_COUNT := 0;
  -- Open the file in read mode from the given Oracle directory
  V_F := UTL_FILE.FOPEN(P_DIR, P_FILE_NAME, 'R');
  -- Read line by line; GET_LINE raises NO_DATA_FOUND at end of file,
  -- so the loop always ends in the exception handler below
  LOOP
    UTL_FILE.GET_LINE(V_F, V_LINE);
    V_COUNT := V_COUNT + 1;
  END LOOP;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    -- End of file: close it and return the number of lines read
    UTL_FILE.FCLOSE(V_F);
    RETURN V_COUNT;
END;
/
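A quick way to try the function on its own is to call it from SQL. This is only a sketch under assumptions: STA_BCK is the Oracle directory used later in this recipe, while employees1.txt is a hypothetical file name, to be replaced with the name of your own data file. The calling user needs read access to the directory.

```sql
-- Count the lines of one file in the STA_BCK directory.
-- 'employees1.txt' is a hypothetical file name used only for illustration.
SELECT F_COUNT_FILE_ROWS('STA_BCK', 'employees1.txt') AS FILE_ROWS
FROM DUAL;
```

The result counts every line of the file, header and footer lines included, which is why the checks on the view and on the staging table add HEAD_CNT and FOO_CNT to their own counts.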


The declared rows

• Insert this number with the following SQL statement.

• It uses the Oracle dictionary to find the file name.

• It uses the source external view to calculate the number.

INSERT INTO STA_CHK_LOT (IO_COD,SEQ_NUM,SOURCE_COD,SORT_NUM,CHECK_DET,N1_VAL,STAMP_DTS)
VALUES (
   'employees1'
  ,STA_CHK_SEQ.NEXTVAL
  ,(SELECT SUBSTR(LOCATION,1,80) FROM USER_EXTERNAL_LOCATIONS WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')
  ,1
  ,'DECLARED'
  ,(SELECT NVL(MAX(ROWS_NUM),0) FROM STA_EMPLOYEES1_FXV)
  ,SYSDATE
);
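The scalar subquery on USER_EXTERNAL_LOCATIONS must return exactly one row. The following query, a small sketch that uses only standard columns of that dictionary view, shows what the dictionary holds for the external table. If the table had more than one location file, the subquery in the INSERT would fail with ORA-01427 (single-row subquery returns more than one row).

```sql
-- List the file(s) registered for the source external table.
SELECT TABLE_NAME, DIRECTORY_NAME, LOCATION
FROM USER_EXTERNAL_LOCATIONS
WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT';
```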


The file rows

• Insert this number with the following SQL statement.

• It uses the Oracle dictionary to find the file name.

• It uses the function to calculate the number.

INSERT INTO STA_CHK_LOT (IO_COD,SEQ_NUM,SOURCE_COD,SORT_NUM,CHECK_DET,N1_VAL,STAMP_DTS)
VALUES (
   'employees1'
  ,STA_CHK_SEQ.NEXTVAL
  ,(SELECT SUBSTR(LOCATION,1,80) FROM USER_EXTERNAL_LOCATIONS WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')
  ,2
  ,'FILE'
  ,NVL(F_COUNT_FILE_ROWS('STA_BCK',
         (SELECT SUBSTR(LOCATION,1,80) FROM USER_EXTERNAL_LOCATIONS WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')),0)
  ,SYSDATE
);


The external table rows

• Insert this number with the following SQL statement.

• It uses the Oracle dictionary to find the file name.

• It uses the external table to calculate the number.

INSERT INTO STA_CHK_LOT (IO_COD,SEQ_NUM,SOURCE_COD,SORT_NUM,CHECK_DET,N1_VAL,STAMP_DTS)
VALUES (
   'employees1'
  ,STA_CHK_SEQ.NEXTVAL
  ,(SELECT SUBSTR(LOCATION,1,80) FROM USER_EXTERNAL_LOCATIONS WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')
  ,3
  ,'EXTERNAL TABLE (STA_EMPLOYEES1_FXT)'
  ,(SELECT NVL(COUNT(*),0) FROM STA_EMPLOYEES1_FXT)
  ,SYSDATE
);


The external view rows

• Insert this number with the following SQL statement.

• It uses the Oracle dictionary to find the file name.

• It uses the external view and the configuration table to calculate the number: the header and footer row counts (HEAD_CNT, FOO_CNT) are added to the view's row count.

INSERT INTO STA_CHK_LOT (IO_COD,SEQ_NUM,SOURCE_COD,SORT_NUM,CHECK_DET,N1_VAL,STAMP_DTS)
VALUES (
   'employees1'
  ,STA_CHK_SEQ.NEXTVAL
  ,(SELECT SUBSTR(LOCATION,1,80) FROM USER_EXTERNAL_LOCATIONS WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')
  ,4
  ,'EXTERNAL VIEW (STA_EMPLOYEES1_FXV)'
  ,(SELECT NVL(COUNT(*),0) FROM STA_EMPLOYEES1_FXV)
     +(SELECT HEAD_CNT+FOO_CNT FROM STA_IO_CFT WHERE IO_COD = 'employees1')
  ,SYSDATE
);
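The view and staging table checks both depend on the HEAD_CNT and FOO_CNT values stored in the configuration table. A quick lookup, sketched here using only the table and column names that appear in the statement above, shows the values in use for this file.

```sql
-- Header and footer row counts configured for the 'employees1' input.
SELECT IO_COD, HEAD_CNT, FOO_CNT
FROM STA_IO_CFT
WHERE IO_COD = 'employees1';
```

If this query returns no row, or if either column is NULL, the sum in the INSERT becomes NULL and the statement fails, because N1_VAL is declared NOT NULL.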


The staging table rows

• Insert this number with the following SQL statement.

• It uses the Oracle dictionary to find the file name.

• It uses the staging table and the configuration table to calculate the number: the header and footer row counts (HEAD_CNT, FOO_CNT) are added to the staging table's row count.

INSERT INTO STA_CHK_LOT (IO_COD,SEQ_NUM,SOURCE_COD,SORT_NUM,CHECK_DET,N1_VAL,STAMP_DTS)
VALUES (
   'employees1'
  ,STA_CHK_SEQ.NEXTVAL
  ,(SELECT SUBSTR(LOCATION,1,80) FROM USER_EXTERNAL_LOCATIONS WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')
  ,5
  ,'STAGING TABLE (STA_EMPLOYEES1_STT)'
  ,(SELECT NVL(COUNT(*),0) FROM STA_EMPLOYEES1_STT)
     +(SELECT HEAD_CNT+FOO_CNT FROM STA_IO_CFT WHERE IO_COD = 'employees1')
  ,SYSDATE
);


The summary check

• Insert the summary check with the following SQL statement.

• It uses the detail table.

• It uses the PIVOT clause, available since Oracle 11g (but you can use something else).

INSERT INTO STA_IO_LOT (
   IO_COD, SOURCE_COD, DEC_CNT, FIL_CNT, FXT_CNT, FXV_CNT, STT_CNT
  ,RET_COD, STAMP_DTS)
SELECT IO_COD, SOURCE_COD, DEC_CNT, FIL_CNT, FXT_CNT, FXV_CNT, STT_CNT
      ,(CASE WHEN (DEC_CNT=FIL_CNT AND FIL_CNT=FXT_CNT
               AND FXT_CNT=FXV_CNT AND FXV_CNT=STT_CNT)
             THEN 'OK' ELSE 'NOT OK' END)
      ,SYSDATE
FROM (SELECT IO_COD,SOURCE_COD,SORT_NUM,N1_VAL
      FROM STA_CHK_LOT
      WHERE SOURCE_COD = (SELECT SUBSTR(LOCATION,1,80)
                          FROM USER_EXTERNAL_LOCATIONS
                          WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT'))
PIVOT (SUM(N1_VAL) FOR SORT_NUM IN (
   1 AS DEC_CNT, 2 AS FIL_CNT, 3 AS FXT_CNT, 4 AS FXV_CNT, 5 AS STT_CNT));
COMMIT;
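As one possible «something else», the same five counters can be produced with conditional aggregation, which also works on Oracle versions older than 11g. This is a sketch of the SELECT part only, built on the same tables and columns as the statement in this slide; it is an alternative formulation, not the author's own code.

```sql
-- Alternative to PIVOT: one SUM(CASE ...) per SORT_NUM, grouped per input and file.
SELECT IO_COD, SOURCE_COD, DEC_CNT, FIL_CNT, FXT_CNT, FXV_CNT, STT_CNT
      ,CASE WHEN DEC_CNT = FIL_CNT AND FIL_CNT = FXT_CNT
             AND FXT_CNT = FXV_CNT AND FXV_CNT = STT_CNT
            THEN 'OK' ELSE 'NOT OK' END AS RET_COD
FROM (
  SELECT IO_COD, SOURCE_COD
        ,SUM(CASE WHEN SORT_NUM = 1 THEN N1_VAL END) AS DEC_CNT
        ,SUM(CASE WHEN SORT_NUM = 2 THEN N1_VAL END) AS FIL_CNT
        ,SUM(CASE WHEN SORT_NUM = 3 THEN N1_VAL END) AS FXT_CNT
        ,SUM(CASE WHEN SORT_NUM = 4 THEN N1_VAL END) AS FXV_CNT
        ,SUM(CASE WHEN SORT_NUM = 5 THEN N1_VAL END) AS STT_CNT
  FROM STA_CHK_LOT
  WHERE SOURCE_COD = (SELECT SUBSTR(LOCATION,1,80)
                      FROM USER_EXTERNAL_LOCATIONS
                      WHERE TABLE_NAME = 'STA_EMPLOYEES1_FXT')
  GROUP BY IO_COD, SOURCE_COD
);
```

If one of the five detail rows is missing, its counter is NULL, the comparison is not true, and the CASE falls through to 'NOT OK'.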


Conclusion

Email - [email protected] (Italian/English) - http://massimocenci.blogspot.it/

With only two log tables, a function and a few SQL statements, we have put the loading of a Staging Area table under control, without ETL tools. This is the philosophy of the Micro ETL Foundation.

We are at the end of this recipe. The final results are stored in the two check tables, STA_CHK_LOT and STA_IO_LOT.
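To look at the content of the two check tables after running the statements of this recipe, two plain queries are enough. They are a sketch that uses only the columns defined in the CREATE TABLE statements above; the actual values depend on your data file and configuration.

```sql
-- Detail: one row per check, in execution order.
SELECT IO_COD, SEQ_NUM, SOURCE_COD, SORT_NUM, CHECK_DET, N1_VAL, STAMP_DTS
FROM STA_CHK_LOT
ORDER BY SEQ_NUM;

-- Summary: one row per load, with the final OK / NOT OK result.
SELECT IO_COD, SOURCE_COD, DEC_CNT, FIL_CNT, FXT_CNT, FXV_CNT, STT_CNT, RET_COD, STAMP_DTS
FROM STA_IO_LOT;
```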