Data Warehouse Testing
Neveen ElGamal

Faculty of Computers and Information, Cairo University,

Giza, Egypt
+201002585680

[email protected]

Supervised by:

Ali ElBastawissy
Faculty of Computers and Information,

Cairo University, Giza, Egypt
[email protected]

Galal Galal-Edeen
Faculty of Computers and Information,

Cairo University, Giza, Egypt
[email protected]

Thesis State: Middle

ABSTRACT

During the development of the data warehouse (DW), a great deal of data is transformed, integrated, structured, cleansed, and grouped into a single structure, the DW. These various types of changes can lead to data corruption or data manipulation. Therefore, DW testing is a very critical stage in the DW development process. A number of attempts have been made to describe how the testing

process should take place in the DW environment. In this paper, I briefly describe these testing approaches, and then use a proposed matrix to evaluate and compare them. Afterwards, I highlight the weaknesses of the available DW testing approaches. Finally, I describe how my PhD will fill the gap in DW testing by developing a DW Testing Framework, briefly presenting its architecture. I then state the scope of work I plan to address and the limitations I expect to encounter in this area. In the end, I conclude my work and state possible future work in the field of DW testing.

1. INTRODUCTION
In the data warehousing process, data passes through several stages, each of which applies a different kind of change to the data before it finally reaches the user in the form of a chart or a report. There should be a way of guaranteeing that the data that reaches the user is the same data that exists in the sources, and that data quality is improved, not lost.

©2013 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the national government of Egypt. As such, the government of Egypt retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
EDBT/ICDT '13, March 18-22, 2013, Genoa, Italy.
Copyright 2013 ACM 978-1-4503-1599-9/13/03…$15.00

Comparing the DW system outputs with the data in the data sources is not the best approach to assessing data quality. This type of test is informative and does take place at a certain point in the testing process, but the most important part of the testing process should take place during DW development. Every stage and every component the data passes through should be tested to guarantee its efficiency and the preservation, or even improvement, of data quality.
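As a minimal illustration of such an informative end-to-end test, the total of a measure in a source can be compared with the total held by the DW. The sketch below uses an in-memory SQLite database; the table and column names are hypothetical, not taken from any of the surveyed approaches.

```python
import sqlite3

# Minimal sketch of an "informative" end-to-end check: the total of a
# measure in the source should match the total in the DW fact table.
# All table and column names here are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL);
    CREATE TABLE dw_fact_sales (order_key INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 25.5), (3, 4.5);
    -- The ETL process is assumed to have loaded the same three rows:
    INSERT INTO dw_fact_sales VALUES (1, 10.0), (2, 25.5), (3, 4.5);
""")

src_total = con.execute("SELECT SUM(amount) FROM src_orders").fetchone()[0]
dw_total = con.execute("SELECT SUM(amount) FROM dw_fact_sales").fetchone()[0]
assert abs(src_total - dw_total) < 1e-9, "measure total lost between source and DW"
print("end-to-end totals match:", src_total)  # → end-to-end totals match: 40.0
```

A check like this can only signal that something went wrong somewhere in the pipeline; it cannot localize the fault, which is why the per-stage tests discussed next matter more.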

1.1 The Challenges of DW Testing
It is widely agreed that the DW is fundamentally different from other systems, such as computer applications or transactional database systems. Consequently, the testing techniques used for these other systems are inadequate for DW testing. Some of the differences, as discussed in [6-8], are:

1. A DW answers ad-hoc queries, which makes it impossible to test them all prior to system delivery. In contrast, all functions of a conventional computer application are predefined.

2. DW testing is data centric, while application testing is code centric.

3. A DW always deals with huge data volumes.

4. The testing process in other systems ends with the development life cycle, while in DWs it continues after system delivery.

5. “Software projects are self contained but a DW project continues due to decision-making process requirement for ongoing changes” [8].

6. Most of the available testing scenarios are driven by some user inputs, while in a DW most of the tests are system-triggered scenarios.

7. The volume of test data in a DW is considerably larger than in any other testing process.


8. In other systems, test cases can reach the hundreds, but the valid combinations of these test cases are never unlimited. In a DW, by contrast, the test cases are unlimited because the core objective of the DW is to allow all possible views of the data [7].

9. DW testing consists of different types of tests depending on when the test takes place and on the component being tested. For example, the initial data load test is different from the incremental data load test.

One of the core challenges of testing DWs, or of providing techniques for testing them, is their flexible architecture. DW systems can have different architectures according to business requirements, required DW functionality, and/or budget and time constraints.

1.2 Data Warehouse Architecture
As shown in figure 1, a global DW system consists of a number of inter-related components:

• Data Sources (DS)
• Operational Data Store (ODS) / Data Staging Area (DSA)
• Data Warehouse (DW)
• Data Marts (DM)
• User Interface (UI) applications, e.g., OLAP reports, decision support tools, and analysis tools

Figure 1. DW System Architecture

Each component needs to be tested independently to verify its efficiency. The connections between the DW components are groups of transformations that take place on the data; these transformation processes should be tested as well, to ensure that data quality is preserved. The outputs of the DW system should be compared with the original data in the DSs. Finally, from the operational point of view, the DW system should be tested for performance, reliability, robustness, recovery, etc.
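As one concrete example of an operational test, a response-time check can assert that a representative query finishes within an agreed threshold. The sketch below is illustrative only; the threshold, schema, and query are assumptions, not part of any surveyed approach.

```python
import sqlite3
import time

# Hypothetical sketch of an operational (performance) test: a representative
# aggregate query must finish within an assumed service-level threshold.
THRESHOLD_SECONDS = 2.0  # assumed target, would come from requirements

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact (k INTEGER, v REAL)")
con.executemany("INSERT INTO fact VALUES (?, ?)",
                ((i, i * 0.5) for i in range(10000)))

start = time.perf_counter()
rows = con.execute("SELECT k % 10 AS grp, SUM(v) FROM fact GROUP BY grp").fetchall()
elapsed = time.perf_counter() - start

assert len(rows) == 10, "unexpected result shape"
assert elapsed < THRESHOLD_SECONDS, f"query too slow: {elapsed:.3f}s"
print(f"performance test passed in {elapsed:.3f}s")
```

Real performance tests would of course run against production-sized data volumes, which, as noted above, is one of the things that makes DW testing different.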

The remainder of this paper is organized as follows. Section 2 briefly surveys the existing DW testing approaches and introduces the DW testing matrices that we used to compare and evaluate them. Section 3 analyzes the comparison matrix to highlight the drawbacks and weaknesses that exist in the area of DW testing and the requirements for a DW testing framework. Section 4 presents my expected PhD contribution to fill the gap in DW testing by introducing a new DW testing framework. Section 5 presents the architecture of the proposed DW testing framework, and Section 6 states the scope of work and the limitations we expect to face during our work. Finally, Section 7 concludes the paper.

2. RELATED WORK

2.1 DW Testing Approaches
A number of attempts have been made to address the DW testing process. Some were made by companies offering consultancy services for DW testing, such as [5, 13, 14, 17, 22]. Others, to fill the gap left by the lack of a generic DW testing technique, proposed one as a research attempt, such as [1-4, 8, 11, 15, 18, 23]. A different trend was taken by authors and organizations who presented automated tools for the DW testing process, such as [12, 16, 22], and, from a different perspective, some authors presented DW testing methodologies, such as [2, 14, 21]. The rest of this section briefly introduces these approaches, grouped according to their similarities; a comparison between them is presented in the following sections.

The approaches presented in [1-3, 14, 15] adapted some of the software testing types, such as:

1. Unit testing,

2. Integration Testing,

3. System Testing,

4. User Acceptance Testing,

5. Security Testing,

6. Regression Testing,

7. Performance Testing,

and extended them to support the special needs of DW testing. A great advantage of these approaches is that they build on a solid background, the well-defined and well-formulated discipline of software testing, which helped them present a DW testing approach that is not too far from what system testers already understand and use.

Other attempts, like [4, 13, 23], focused on testing the most important part of the data warehousing process: the ETL process. [13] addressed DW testing from two high-level aspects:

• the underlying data, focusing on data coverage and data compliance with the transformation logic in accordance with the business rules;
• the DW components, focusing on performance, scalability, component orchestration, and regression testing.

What was unique about the approach presented in [4] is that it concentrated on the data validation of the ETL process and presented two


alternatives for the testing process: white-box testing, where the data is tracked through the ETL process itself, or black-box testing, where only the input and the output data of the ETL process are validated.
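A minimal sketch of the black-box alternative is shown below: the ETL step is exercised only through its input and output, with no inspection of its internals. The transformation itself (name cleanup plus rejection of rows with missing amounts) is a hypothetical example, not the one used in [4].

```python
# Black-box sketch: the ETL step is treated as an opaque function and only
# its output is compared against the expected output for a known input.
def etl_transform(rows):
    """Hypothetical ETL step: trim and upper-case names, drop rows
    with a missing amount (rejected records), round amounts to cents."""
    out = []
    for name, amount in rows:
        if amount is None:
            continue  # rejected record
        out.append((name.strip().upper(), round(amount, 2)))
    return out

source_rows = [(" alice ", 10.456), ("bob", None), ("Carol", 3.1)]
expected = [("ALICE", 10.46), ("CAROL", 3.1)]

actual = etl_transform(source_rows)
assert actual == expected, f"ETL output mismatch: {actual}"
print("black-box ETL test passed:", actual)
```

White-box testing of the same step would additionally assert on the intermediate state after each transformation rule, at the cost of coupling the tests to the ETL implementation.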

The research attempts presented by the two research groups from DISI, Bologna University (Matteo Golfarelli and Stefano Rizzi) and from the Slovak University of Technology (Pavol Tanuška, Oliver Moravčík, Pavel Važan, František Miksa, Peter Schreiber, Jaroslav Zeman, Werner Verschelde, and Michal Kopcek) were the richest attempts, addressing DW testing from various perspectives. In [18], the authors suggested a proposal for basic DW testing activities (routines) as the final part of a DW testing methodology; other parts of the methodology were published in [19-21]. The testing activities can be split into four logical units regarding:

• multidimensional database testing,
• data pump (ETL) testing,
• metadata testing, and
• OLAP testing.

The authors then showed how these activities split into smaller, more distinctive activities to be performed during the DW testing process.

In [8], the authors introduced DW testing activities (routines) framed within the DW development methodology introduced in [9]. They stated that the components that need to be tested are the conceptual schema, the logical schema, the ETL procedures, the database, and the front-end. To test these components, they listed eight test types that best fit the characteristics of DW systems. These test types are:

• functional test
• usability test
• performance test
• stress test
• recovery test
• security test
• regression test

A comprehensive explanation of how the DW components are tested by the above testing routines is then given, showing which type(s) of test is suitable for which component, as shown in table 1.

The authors then customized their DW testing technique to present a prototype-based methodological framework [11]. Its main features are (1) earliness in the life cycle, (2) modularity, (3) tight coupling with design, (4) scalability, and (5) measurability through proper metrics. The latest piece of work this research group presented, in [10], was a number of data-mart-specific testing activities, classified in terms of what is tested and how it is tested. The only drawback of this research group's work is that its DW architecture does not include a DW component: the data is loaded from the data sources directly into the data marts. Their DW architecture consists of a number of data marts addressing different business areas.

Table 1: DW components vs. testing types [8]

The approaches presented in [12, 16, 22] are CASE tools specially designed for DW testing. TAVANT's product, named “One-Click Automation Framework for Data Warehouse Testing”, supports a DW testing process that runs concurrently with the DW development life cycle [22]. It embeds the test cases in the ETL process and ensures that all stages of the ETL are tested and verified before subsequent data loads are triggered. Their DW testing methodology is incremental and is designed to accommodate changing DW schemas arising from evolving business needs.

The QuerySurge CASE tool developed by RTTS [16] assists DW testers in preparing and scheduling query pairs to compare the data transformed from the source to the destination; for example, preparing a query pair where one query runs on a DS and the other on the ODS, to verify the completeness, correctness, and consistency of the structure of the data and of the data transformed from the DS to the ODS.
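The query-pair idea can be illustrated with a small sketch: one query runs against the source, the other against the target, and the ordered result sets are compared row by row. This is not QuerySurge itself, only an illustration of the concept; all schema names are assumptions.

```python
import sqlite3

# Two separate databases stand in for the DS and the ODS.
src = sqlite3.connect(":memory:")
ods = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben');
""")
ods.executescript("""
    CREATE TABLE ods_customers (cust_id INTEGER, cust_name TEXT);
    INSERT INTO ods_customers VALUES (1, 'Ann'), (2, 'Ben');
""")

# A "query pair": semantically equivalent queries over source and target.
query_pair = (
    "SELECT id, name FROM customers ORDER BY id",
    "SELECT cust_id, cust_name FROM ods_customers ORDER BY cust_id",
)
src_rows = src.execute(query_pair[0]).fetchall()
ods_rows = ods.execute(query_pair[1]).fetchall()

assert len(src_rows) == len(ods_rows), "completeness check failed"
assert src_rows == ods_rows, "correctness check failed"
print("query pair passed on", len(src_rows), "rows")  # → query pair passed on 2 rows
```

Note that ordering both queries by the business key is what makes a row-by-row comparison meaningful.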

Inergy is a company specialized in designing, developing, and managing DWs and BI solutions [12]. They developed a tool to automate the ETL testing process. They did not build their tool from scratch; instead, they used existing tools like DbUnit (a database testing utility) and Ant task extensions (Ant is a Java-based build tool), and they presented an extension to DbUnit that applies DW-specific logic such as:

• which ETL process to run at what time,
• which dataset should be used as a reference,
• what logic to repeat and for which range.

They also developed a PowerDesigner script that extracts thecore of the test script based on the data model of the DW.

Unfortunately, not enough information was available for the approaches [5, 17], as they were developed by companies offering consultancy services, and revealing their DW testing techniques could negatively affect their business.

Having introduced the previous work on DW testing, it is now time to exhaustively study, compare, and evaluate it. In order to conduct such a comprehensive study, we defined a group of DW testing matrices in [6].


2.2 DW Testing Matrices
As we previously presented in [6], the DW testing matrices classify tests – or test routines, for clarity – according to where, what, and when they take place:

• WHERE: the component of the DW that the test targets. This divides the DW architecture shown in figure 1 into the following layers:

o Data Sources to Operational Data Store: the testing routines targeting the data sources, wrappers, extractors, transformations, and the operational data store itself.

o Operational Data Store to DW: the testing routines targeting the loading process and the DW itself.

o DW to Data Marts: the testing routines targeting the transformations that take place on the data used by the data marts, and the data marts themselves.

o Data Marts to User Interface: the testing techniques targeting the transformation of data to the interface applications, and the interface applications themselves.

• WHAT: what these routines test in the targeted component.

o Schema: focuses on testing DW design issues.
o Data: concerned with all data-related tests such as data quality, data transformation, data selection, data presentation, etc.

o Operational: includes tests concerned with the process of putting the DW into operation.

• WHEN will this test take place?

o Before System Delivery: a one-time test that takes place before the system is delivered to the user, or when any change takes place in the design of the system.

o After System Delivery: a recurring test that takes place several times during system operation.

The ‘what’, ‘where’, and ‘when’ testing categories result in a three-dimensional matrix. As shown in table 2, the rows represent the ‘where’ dimension and the columns represent the ‘what’ dimension; the ‘when’ dimension is represented by color in the following section. When this matrix is used to compare the existing DW testing approaches, each cell of the table contains the group of test routines that addresses that combination of dimension members.

Table 2: DW Testing Matrices

                      Schema    Data    Operation
Backend    DS → ODS
           ODS → DW
Frontend   DW → DM
           DM → UI
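The where/what/when classification can be sketched as a small lookup structure that files each routine under a (where, what, when) key and lets a tester select routines along any dimension. The routine names and their placements below are illustrative only.

```python
# Sketch of the three-dimensional classification; each test routine is
# filed under a (where, what, when) key. Placements are illustrative.
matrix = {
    ("DS->ODS", "data", "before"): ["record counts", "duplicate detection"],
    ("ODS->DW", "schema", "before"): ["hierarchy level integrity"],
    ("ODS->DW", "operation", "after"): ["incremental load test", "freshness"],
    ("DM->UI", "operation", "after"): ["performance test", "stress test"],
}

def routines(where=None, what=None, when=None):
    """Select the routines matching any combination of the three dimensions."""
    return sorted(
        r
        for (wh, wt, wn), rs in matrix.items()
        for r in rs
        if (where is None or wh == where)
        and (what is None or wt == what)
        and (when is None or wn == when)
    )

# All tests that recur after system delivery, across every layer:
print(routines(when="after"))
# → ['freshness', 'incremental load test', 'performance test', 'stress test']
```

Slicing by the ‘when’ dimension like this mirrors the color highlighting used in table 3 below.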

2.3 Comparison and Evaluation of Surveyed Approaches
After studying how each proposed DW testing approach addresses DW testing, and according to the DW testing matrices defined in the previous section, a comparison matrix is presented in table 3 showing the test routines that each approach covers. The DW testing approaches are represented in the columns, and the ‘what’ and ‘where’ dimensions classify the test routines in the rows. The intersection of a row and a column indicates the coverage of the test routine in that approach, where “√√” represents full coverage and “√” represents partial coverage. Finally, the ‘when’ dimension, which indicates whether a test takes place before or after system delivery, is represented by color: the tests that take place after system delivery are highlighted, while the tests that take place during system development, or when the system is subject to change, are left without color highlighting.

We were able to compare only 10 approaches, as not enoughdata was available for the rest of the approaches.

As is obvious in table 3, none of the proposed approaches addresses the entire DW testing matrices. This is simply because each approach addressed the DW testing process from its own point of view, without leaning on any standard or general framework. Some of the attempts considered only parts of the DW framework shown in figure 1. Other attempts used their own framework for the DW environment, according to the case they were addressing. For example, [8] used a DW architecture that includes neither an ODS nor a DW layer: the data is loaded from the data sources directly into the data marts. This architecture makes the data mart layer act as both the DW and the data marts interchangeably. Other approaches, like [1, 14, 15], did not include the ODS layer.

From another perspective, some test routines are not addressed by any approach, such as data quality factors like accuracy, precision, continuity, etc. Moreover, some major aspects of the DW are not tested by any of the proposed approaches, namely the DM schema and the additivity of measures in the DMs.

3. REQUIREMENTS FOR A DW TESTING FRAMEWORK
Based on a careful study of the available DW testing approaches, using the DW testing matrices presented previously to analyze, compare, and evaluate them, it is evident that the DW environment lacks the following:

1. A generic, well-defined DW testing approach that could be used in any project. Each approach presented its testing techniques based on its own DW architecture, which limits the reusability of the approach in other DW projects with different DW architectures.

2. None of the existing approaches includes all the test routines needed to guarantee a high-quality DW after delivery.


Table 3: DW Approaches Comparison

Approaches compared (columns): [1], [4], [14], [3], [2], [23], [15], [18], [8], [13].

Backend — DS → ODS
  Schema: Requirement testing; User requirements coverage; ODS logical model; Field mapping; Data type constraints; Aspects of transformation rules; Correct data selection
  Data: Data integrity constraints; Parent-child relationship; Record counts; Duplicate detection; Threshold test; Data boundaries; Data profiling; Random record comparison; Surrogate keys
  Operation: Review job procedures; Error messaging; Processing time; Integration testing; Rejected records; Data access

Backend — ODS → DW
  Schema: DW conceptual schema; DW logical model (integrity constraints, threshold test, data type constraints, hierarchy level integrity, granularity, derived attributes checking)
  Data: Record counts; No constants loaded; Null records; Field-to-field test; Data relationships; Data transformation; Duplicate detection; Value totals; Data boundaries; Quality factors; Compare transformed data with expected transformation; Data aggregation; Reversibility of data from DW to DS; Confirm all fields loaded; Simulate data loading
  Operation: Document ETL process; ETL test; Scalability test; Initial load test; Incremental load test; Regression test; Freshness; Data access

Frontend — DW → DM
  Schema: DM schema design; Calculated members; Irregular hierarchies; Aggregations; Correct data filters; Additivity guards

Frontend — DM → UI
  Schema: Reports comply with specifications; Report structure; Report cosmetic checks (font, color and format); Graph cosmetic checks (type, color, labels, and legend); Column headings; Description; Drilling-across query reports
  Data: Correct data displayed; Trace report fields to data source; Field data verification
  Operation: Performance test (response time, number of queries, number of users); Stress test; Audit user accesses; Refresh time for standard and complex reports; Friendliness

Overall
  Schema: Dependencies between levels; Metadata; Functionality meets requirements; Object definition; Data providers; Data type compatibility
  Operation: Error logging; HW configuration; SW setup in test environment; System connections setup; Security test; Final source-to-target comparison; Test through development


3. None of the existing approaches takes into consideration the dependencies between test routines. It is not always mandatory to perform all test routines: if some tests pass successfully, others can be neglected.

4. The approaches proposed in [18, 19, 21] were the only ones focusing on both the DW testing routines and the life cycle of the testing process. The life cycle of each test routine includes a test plan, test cases, test data, termination criteria, and test results. Nevertheless, they presented the two aspects independently, without showing how the testing routines fit into a complete DW testing life cycle.

5. In several projects, testing is neglected or diminished due to time or resource problems. None of the existing approaches includes any differentiation technique or prioritization of the test routines according to their impact on overall DW quality, to help testers select the important test routines that strongly affect the quality of the DW when resource or time limitations force them to shorten the testing process.

6. Some of the above test routines could be automated, but none of the proposed approaches showed how these routines could be automated, or given automated assistance, using custom-developed or existing tools.
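The prioritization gap noted in point 5 could, for instance, be addressed by a simple impact-versus-cost selection. The sketch below uses entirely hypothetical impact scores and cost estimates; it is one possible mechanism, not a technique taken from any surveyed approach.

```python
# Hypothetical sketch of test-routine prioritization: each routine carries
# an impact score and an estimated cost, and the tester greedily keeps the
# highest-impact routines that still fit the remaining time budget.
routines = [
    # (name, impact on DW quality, estimated hours) -- illustrative numbers
    ("record counts", 9, 2),
    ("report cosmetic checks", 2, 3),
    ("duplicate detection", 8, 4),
    ("incremental load test", 7, 6),
]

def select(routines, budget_hours):
    chosen = []
    for name, impact, cost in sorted(routines, key=lambda r: -r[1]):
        if cost <= budget_hours:
            chosen.append(name)
            budget_hours -= cost
    return chosen

print(select(routines, budget_hours=8))
# → ['record counts', 'duplicate detection']
```

Even a crude scheme like this lets a tester under schedule pressure cut the cosmetic checks before the data-integrity checks.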

These drawbacks urged us to think of a new DW testing approach that fills the gaps in this area while benefiting from the efforts made by others. The following section describes our proposed DW testing framework.

4. A PROPOSED DW TESTING FRAMEWORK
In my PhD, I plan to develop a DW testing framework that is generic enough to be used in several DW projects with different DW architectures. The framework's primary goal is to guide testers through the testing process by recommending the group of test routines that are required given the project's customized DW architecture.

The proposed framework will include definitions of the test routines. A main target of our research is to benefit from the existing DW testing approaches by adopting the test routine definitions from the available approaches, and to define the test routines that no previous approach addressed or comprehensively defined.
In our study, we will prioritize the test routines according to their importance and impact on the output product, so that, when scheduling or budget limitations are faced, the tester can select the tests that most affect the quality of the delivered system.

Part of each test routine's definition should state how the routine can be automated, or can receive automatic support if full automation is not applicable. It should also include suggestions for using existing automated test tools, to minimize the amount of work required to obtain automated support in the DW testing process.

One of the core features that we intend to include in our proposed testing framework is testing along the DW development life cycle. This can be done by stating the pre-requisites of each testing routine with respect to the DW development life-cycle stages, in order to detect errors as early as possible.

The framework's infrastructure is planned to take the form of a directed graph. Nodes of the graph represent test routines, and links between nodes represent relationships between test routines. We expect several types of relationships between test routines: for example, dependency between routines, tests that are guaranteed to succeed if other tests pass, or tests that should never be performed unless other tests pass.
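The planned graph infrastructure can be sketched with a standard topological ordering over a dependency map; the routines and dependencies below are hypothetical placeholders, not the framework's actual graph.

```python
from graphlib import TopologicalSorter

# Sketch of the planned directed graph: each test routine maps to the set
# of routines it depends on (which must pass first). The routines and
# dependencies shown here are hypothetical examples.
depends_on = {
    "field mapping": set(),
    "data type constraints": {"field mapping"},
    "record counts": {"field mapping"},
    "duplicate detection": {"record counts"},
    "incremental load test": {"record counts", "data type constraints"},
}

# A valid execution order respects every dependency edge:
order = list(TopologicalSorter(depends_on).static_order())
print(order)

assert order.index("field mapping") < order.index("record counts")
assert order.index("record counts") < order.index("incremental load test")
```

An ordering like this also supports the skipping rule discussed in section 3: if a prerequisite fails, every routine downstream of it can be pruned from the plan.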

After defining the DW testing framework and fully documenting it, I plan in my PhD to materialize the framework by developing a web service that makes it reachable for testers.
Finally, a case study will be used to exercise the testing framework in a real-world DW project, and the results of the experiment will be used to evaluate the proposed DW testing framework.

5. THE ARCHITECTURE OF THE PROPOSED FRAMEWORK
This section presents the architecture of the proposed framework, to show its work flow when it is put into operation. As shown in figure 2, the key player in the DW testing process is the Test Manager, who feeds the system with the DW architecture under test and the current state of the DW, in other words, which components of the DW have been developed so far. This step is needed because our proposed framework supports testing throughout system development.

The DW Architecture Analyzer component then studies the received data, compares it with the dependencies between test routines in the Test Dependency Graph with the assistance of the Test Dependency Manager component, and passes the data to the Test Recommender to generate an Abstract Test Plan.

The process of preparing the Detailed Test Plan then splits into two different directions according to the type of test routine: validation test or verification test. For validation tests, which are the more complicated testing routines, the Validation Manager involves the business expert(s) and system user(s), in addition to accessing the relevant data from the system Repository, to prepare the part of the test plan that concerns the validation test routines. For the verification test routines, the Verification Manager, along with the Test Case Generator and Test Data Generator modules, helps prepare the Detailed Test Plan.


Figure 2: Proposed DW Testing Framework Architecture

The Verification Manager involves the system tester(s) and the DB administrator for assistance in test case preparation and test data generation (in case the DW is still in the development phase and no real data is available).

To benefit from the existing test automation tools for DWs, the Verification Manager will include in the Detailed Test Plan possible uses of existing test automation tools for the test routines that can be automated.

6. SCOPE AND LIMITATIONS
In my PhD, I will not take into consideration:

1. Testing routines that target unconventional DWs such as temporal DWs, active DWs, spatial DWs, DW 2.0, etc.

2. Testing routines that target the last layer of the DW architecture presented in table 1 (DM to UI), since it covers a broad range of application types with different architectures and implementation techniques, for which it would be very hard to define generic testing routines.

3. Test routines checking the data quality of the data sources; instead, we will consider all sources to be of low quality.

In the process of assimilating the existing DW testing approaches into our proposed DW testing framework, we expect to face difficulties regarding the availability of information and the know-how of how the testing routines are defined and implemented. This is because a considerable number of the existing approaches come from industrial organizations, and revealing such information could strongly affect an organization's competitive edge in the field of DW testing.

7. CONCLUSIONS AND FUTURE WORK

Several attempts have been made to address DW testing; most of them were oriented towards a specific problem, and none of them was generic enough to be used in other data warehousing projects.

By the end of my PhD, I will present a generic DW testing framework that integrates and benefits from all existing DW testing attempts. It will guarantee to the system user that the data quality of the data sources is preserved, or even improved, thanks to the comprehensive testing process that the DW has passed through.

Future work in this field could extend this testing framework to support unconventional DWs such as active DWs, temporal DWs, spatial DWs, etc. This could be done by defining specializations of the test routines to address the special needs of these DWs.

8. REFERENCES

[1] Bateman, C. Where are the Articles on Data Warehouse Testing and Validation Strategy? Information Management, 2002.


[2] Bhat, S. Data Warehouse Testing - Practical. StickyMinds, 2007.
[3] Brahmkshatriya, K. Data Warehouse Testing. StickyMinds, 2007.
[4] Cooper, R. and Arbuckle, S. How to Thoroughly Test a Data Warehouse. In Software Testing Analysis and Review (STAREAST), Orlando, Florida, 2002.
[5] CTG. CTG Data Warehouse Testing, 2002.
[6] ElGamal, N., ElBastawissy, A. and Galal-Edeen, G. Towards a Data Warehouse Testing Framework. In IEEE 9th International Conference on ICT and Knowledge Engineering (IEEE ICT&KE), Bangkok, Thailand, 2011, 67-71.
[7] Executive-MiH. Data Warehouse Testing is Different, 2010.
[8] Golfarelli, M. and Rizzi, S. A Comprehensive Approach to Data Warehouse Testing. In ACM 12th International Workshop on Data Warehousing and OLAP (DOLAP '09), Hong Kong, China, 2009.
[9] Golfarelli, M. and Rizzi, S. Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill, 2009.
[10] Golfarelli, M. and Rizzi, S. Data Warehouse Testing. International Journal of Data Warehousing and Mining, 7(2), 26-43.
[11] Golfarelli, M. and Rizzi, S. Data Warehouse Testing: A Prototype-Based Methodology. Information and Software Technology, 53(11), 1183-1198.
[12] Inergy. Automated ETL Testing in Data Warehouse Environment, 2007.
[13] Mathen, M.P. Data Warehouse Testing. Infosys, 2010.
[14] Munshi, A. Testing a Data Warehouse Application. Wipro Technologies, 2003.
[15] Rainardi, V. Testing your Data Warehouse. In Building a Data Warehouse with Examples in SQL Server, Apress, 2008.
[16] RTTS. QuerySurge, 2011.
[17] SSNSolutions. SSN Solutions, 2006.
[18] Tanuška, P., Moravčík, O., Važan, P. and Miksa, F. The Proposal of Data Warehouse Testing Activities. In 20th Central European Conference on Information and Intelligent Systems, Varaždin, Croatia, 2009, 7-11.
[19] Tanuška, P., Moravčík, O., Važan, P. and Miksa, F. The Proposal of the Essential Strategies of Data Warehouse Testing. In 19th Central European Conference on Information and Intelligent Systems (CECIIS), 2008, 63-67.
[20] Tanuška, P., Schreiber, P. and Zeman, J. The Realization of Data Warehouse Testing Scenario. In Infokit-3: 3rd International Scientific-Technical Conference, Part II, Stavropol, Russia, 2008.
[21] Tanuška, P., Verschelde, W. and Kopček, M. The Proposal of Data Warehouse Test Scenario. In European Conference on the Use of Modern Information and Communication Technologies (ECUMICT), Gent, Belgium, 2008.
[22] Tavant Technologies. Data Warehouse Testing. Date accessed: Jan 2013.
[23] Theobald, J. Strategies for Testing Data Warehouse Applications. Information Management, 2007.