WHITE PAPER Profiling: Calculating Return on Investment for Data Migration and Data Integration Projects


This document contains Confidential, Proprietary and Trade Secret Information (“Confidential Information”) of Informatica Corporation and may not be copied, distributed, duplicated, or otherwise reproduced in any manner without the prior written consent of Informatica.

While every attempt has been made to ensure that the information in this document is accurate and complete, some typographical errors or technical inaccuracies may exist. Informatica does not accept responsibility for any kind of loss resulting from the use of information contained in this document. The information contained in this document is subject to change without notice.

The incorporation of the product attributes discussed in these materials into any release or upgrade of any Informatica software product—as well as the timing of any such release or upgrade—is at the sole discretion of Informatica.

Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670; 6,339,775; 6,044,374; 6,208,990; 6,208,990; 6,850,947; 6,895,471; or by the following pending U.S. Patents: 09/644,280; 10/966,046; 10/727,700.

This edition published November 2006


Table of Contents

Introduction
    Key Benefits of Informatica Data Explorer
Background
    Why is understanding data hard?
Example projects
    Project 1. Data Warehousing
    Project 2. ERP Implementation
    Project 3. Data Consolidation
ROI Scenarios
    Reduction in Direct Project Costs
    Business Value of Early Project Completion
    Reduction in Project Overrun Costs
    Value of Improved Data Quality
Cost of Informatica Data Explorer
Total Return on Investment
Conclusion


Introduction

Industry experience has shown that data migration and data integration projects are prone to the same challenges and problems that are common to all IS projects. They suffer from time and budget overruns, tradeoffs between quality and deadlines, and outright project failures. Project managers face these issues on a daily basis and are often left with few options and no good solutions.

Informatica provides comprehensive data integration solutions for today’s complex enterprise environments. Addressing the broadest range of data sources, Informatica solutions help companies and government organizations of all sizes to access, discover, cleanse, integrate, and deliver their data—where needed and when needed—so they can realize its full value to the business.

Data profiling is an essential step in the data integration lifecycle. Getting it right from the start can dramatically reduce the time it takes to plan and execute data integration strategies. Before data can be integrated or used in a data warehouse, CRM, ERP or business analytics application, its content, quality and structure must be fully understood. Informatica Data Explorer (formerly known as Similarity Systems Evoke AXIO) is a complete data investigation, discovery, and mapping product used by data analysts for planning complex data migration and data integration projects. Informatica Data Explorer provides a complete and accurate picture of all enterprise data through automated and process-driven data profiling.

The value of Informatica Data Explorer lies in its ability to quickly analyze multiple source systems and assist the user in scoping the size and complexity of a migration or data integration project. The software enables data analysts to quickly discover hidden data quality issues, gaps, inconsistencies and incompatibilities within data sources. The result is an accurate profile that is stored in a central open repository and can be used to accelerate the design and implementation of new applications, databases and data quality programs.

Key Benefits of Informatica Data Explorer

• Delivers accurate source system knowledge
• Improves corporate data quality and accuracy
• Enables accurate data migration, integration and consolidation
• Helps expedite integration of multiple, disparate data sources
• Mitigates risk in data management projects
• Minimizes overruns in enterprise applications projects
• Improves productivity of data management projects
• Improves merger & acquisition data integration success
• Reduces costs


Background

Despite careful initial planning, four out of five (83%) data migration projects over $1 million are at risk of overrunning in time and cost, or of outright failure, according to The Standish Group report “Migrate Headaches”. Although the increased or wasted cost of these projects is significant, it is often dwarfed by the loss in business value that occurs when the implementation of a new business system or data warehouse is either delayed or cancelled. For business value to be maximized, managers and planners of data migration projects must directly address the underlying causes of these overruns and failures. They must not only ensure that their project completes on time, but also drive to accelerate business value by completing the project ahead of schedule.

According to The Standish Group report, one of the primary causes of data migration project overruns and failures is a lack of understanding of the source data prior to data movement. This study involved a series of focus groups with IT executives in four US cities. According to these executives, 60-80% of the effort in data migration projects is expended on trying to understand and map the source data. They also indicated that this was an iterative process without a predictable endpoint. The study concluded that if source data could be understood more efficiently and accurately, then the project’s business value could be delivered on time, or even accelerated.

A deeper look into the problem shows that most migration projects and data integration initiatives rely on external information to provide an understanding of the data. Much of this information (documentation, source programs, existing data models, and staff experience) is often outdated, incorrect or missing. If the information is invalid, then it may take many iterative cycles to develop new information and validate that it is indeed correct (i.e., that it actually represents the source data). In this scenario as much as 50% of a project’s total labor budget may be wasted on manual, outdated data analysis and diagnosis techniques. At the same time the overall success of the project can be put in jeopardy because of a poor understanding of the source data.

Why is understanding data hard?

• Manual data profiling is tedious, slow, labor-intensive, and error-prone
• Metadata documentation may be missing, incomplete, or badly out of date
• Source code for legacy systems may no longer be found
• Relationships between data elements are not always obvious
• Assumed relationships and dependencies may be wrong
• Databases are not static; over time they may be corrupted


Tackling the analysis of multiple data sources takes time: typically three to five hours per attribute on average, with more challenging cases taking up to 10 hours per attribute or more. An extreme example is a major insurance provider where two months of effort resulted in the completed analysis of six attributes. An average of three to five hours per attribute is an accepted industry standard and consistent with a similar estimate on business rule extraction. [1]

There is, however, a more direct way to understand source data. Informatica Data Explorer can be used to thoroughly analyze the actual source data rather than relying on inaccurate metadata and documentation. The solution automates the identification of problematic data and metadata and enables companies to readily identify inconsistencies, redundancies and inaccuracies across all their corporate databases. With the level of understanding that can be rapidly achieved using our data profiling solution, the data migration project or data integration initiative can more readily proceed to completion.

Informatica Data Explorer supports a six-stage process that examines the source data in detail to provide a thorough understanding of its content, structure, quality and integrity, and then generates source-to-target attribute maps based on this understanding. The data profiling and mapping process rapidly produces a correct data model and a set of source-to-target transformation maps, and eliminates the unpredictable and time-consuming iterative cycles that result when a manual approach is used.

The solution can help lower project cost, reduce project risk, accelerate business value, and deliver better results by:

• Reducing direct project costs such as labor and resources
• Increasing project value through early project completion
• Reducing unexpected costs and delayed benefits associated with overruns or cancellations
• Increasing data quality of the migrated or converted data without extending deadlines

[Figure: bar chart comparing project phases (data analysis, develop, move, rework) for manual data profiling and mapping (DP&M) (50%, 30%, 20%) and automated DP&M (15%, 30%, 20%), with a 35% reduction for the automated approach.]

Figure 1: Faster ROI. The graph shows how using Informatica Data Explorer immediately reduces project timelines by cutting the time needed for data analysis. Even more time savings result from less rework and fewer delays in later stages.

[1] Terry Moriarty, “Getting Your Business Rules Automatically,” Database Programming and Design, October 1997:74


Example projects

In this document, the relative contribution of each of these four benefits to total return on investment (ROI) is determined for three different sized projects. The planning factors, estimates, and project results shown below are composites of actual experiences across a number of customers and do not reflect total results obtained at any particular customer site.

Project 1: Data Warehousing

An insurance provider moved data from a legacy application to a data warehouse and several data marts. The initial evaluation established that a full analysis of the legacy data could take a year or more to complete and might not produce useful results when finished. The project was put on hold. Subsequently Informatica Data Explorer was brought in and the company was able to complete the data profiling and mapping process in six weeks and the entire project in 10 weeks. An in-house project team did all the work; the fully loaded cost of $95 per hour is typical for the area.

Project 2: ERP Implementation

A manufacturing company underwent a migration from multiple legacy VSAM applications to an ERP system. It was considered critical to accurately and completely migrate human resource data, including benefits history, plus an extensive amount of manufacturing history data, to the new system. The project was completed by a combination of in-house staff and external consultants. Most of the work was done by the in-house staff, which is reflected in the average cost per hour of $75.

Project 3: Data Consolidation

A major international transportation company was consolidating data from a recent acquisition. The estimate of four hours of analysis per attribute was believed to be low but, due to the size of the project, could not be fully tested. Informatica Data Explorer was first used to verify the accuracy of the initial data analysis, and then to develop accurate data profiles and data maps. The project team included in-house staff plus consultants from a major systems integrator; the fully loaded cost of $100 per hour takes this staffing mix into account.


ROI Scenarios

For these three example projects, return on investment (ROI) is demonstrated in four separate scenarios:

• Reducing direct project costs such as labor and resources
• Increasing project value through early project completion
• Reducing unexpected costs and delayed benefits associated with overruns or cancellations
• Increasing data quality of the migrated or converted data without extending deadlines

Although each scenario is illustrated for each example project, most projects will focus on a subset of these scenarios when establishing the expected ROI associated with the use of a data profiling solution.

The ROI scenarios use the term “full time equivalent” or “FTE” in all staffing calculations. An FTE may be fractional, representing a person working part time on a project, or may consist of various staff members contributing different skill sets as needed by the project.

Reduction in Direct Project Costs

A properly planned project includes an allotment of resources for profiling data sources, designing the target systems and mapping specifications, and executing extraction, cleansing and transformation processes. Failure to be thorough in the profiling and mapping phases will increase costs due to undiscovered problems that will eventually surface and need to be resolved. In comparison to manual estimates of three to five hours per attribute, Informatica customers indicate that the maximum time required for data profiling and mapping is 15 minutes per attribute.

Table 1 shows the difference in direct project costs attributable to the data profiling and mapping (DP&M) phase for the three example projects when using a manual approach versus an approach that utilizes Informatica Data Explorer. The Standish Group report shows that manual data analysis in the average $1 million data migration project will consume 50% of the total project budget. Time, cost and resource constraints prevent many larger projects from undertaking a 100% analysis of the source data. In the scenario in Table 1, each of the projects attempts only a 75% manual analysis of the source data, even though analyzing less than 100% of the data will result in an increase in project risk and a decrease in data quality. The scenario also doubles the total effort required for data profiling and mapping with Informatica Data Explorer as a contingency to allow for implementation, training and startup time. Subsequent projects with an experienced team would not require this increase in the total effort.
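The Table 1 arithmetic can be sketched in Python as follows. This is an illustrative sketch rather than anything supplied with the product: the `Project` class, the function names, and the half-up rounding of hours to whole hours are assumptions, the last one chosen because it appears to reproduce the published hour totals.

```python
import math
from dataclasses import dataclass

MANUAL_COVERAGE = 0.75       # manual approach covers only 75% of the source data
TOOL_HOURS_PER_ATTR = 0.25   # 15 minutes per attribute with the profiling tool
FIRST_PROJECT_FACTOR = 2     # contingency for implementation, training and startup


@dataclass
class Project:
    name: str
    budget: float                 # A. initial project budget ($)
    rate: float                   # B. fully loaded cost per FTE per hour ($)
    attributes: int               # C. number of attributes
    manual_hours_per_attr: float  # D. manual hours per attribute


def whole_hours(hours: float) -> int:
    """Round half up to whole hours (assumed to match the table's rounding)."""
    return math.floor(hours + 0.5)


def direct_cost_reduction(p: Project) -> dict:
    """Compute rows E through M of Table 1 for one project."""
    manual_hours = whole_hours(
        p.attributes * p.manual_hours_per_attr * MANUAL_COVERAGE)   # E
    manual_cost = p.rate * manual_hours                              # F
    tool_hours = whole_hours(
        p.attributes * TOOL_HOURS_PER_ATTR * FIRST_PROJECT_FACTOR)  # I
    tool_cost = p.rate * tool_hours                                  # J
    revised_budget = p.budget - manual_cost + tool_cost              # K
    return {
        "manual_hours": manual_hours,
        "manual_cost": manual_cost,
        "manual_share": manual_cost / p.budget,      # G
        "tool_hours": tool_hours,
        "tool_cost": tool_cost,
        "revised_budget": revised_budget,
        "tool_share": tool_cost / revised_budget,    # L
        "net_reduction": manual_cost - tool_cost,    # M
    }


PROJECTS = [
    Project("Data Warehouse", 1_200_000, 95, 1_750, 5.0),
    Project("ERP Implementation", 3_000_000, 75, 4_675, 4.0),
    Project("Data Consolidation", 12_000_000, 100, 16_200, 3.0),
]

for project in PROJECTS:
    result = direct_cost_reduction(project)
    print(f"{project.name}: net reduction ${result['net_reduction']:,.0f}")
```

Setting `FIRST_PROJECT_FACTOR` to 1 models a later project run by an experienced team, for which the text says the contingency is not needed.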


Table 1: Reduction in Direct Project Costs

| Item | Formula | Data Warehouse | ERP Implementation | Data Consolidation |
|---|---|---|---|---|
| A. Initial project budget | Input value | $1,200,000 | $3,000,000 | $12,000,000 |
| B. Fully loaded cost per FTE per hour | Input value | $95 | $75 | $100 |
| C. Number of attributes | Input value | 1,750 | 4,675 | 16,200 |
| Manual analysis | | | | |
| D. Hours per attribute | Input value | 5.0 | 4.0 | 3.0 |
| E. Total hours (75% analysis)* | C x D x .75 | 6,563 | 14,025 | 36,450 |
| F. Cost of manual DP&M** | B x E | $623,485 | $1,051,875 | $3,645,000 |
| G. DP&M** as a percent of total budget | F / A | 52% | 35% | 30% |
| With Informatica Data Explorer | | | | |
| H. Hours per attribute | Input value | 0.25 | 0.25 | 0.25 |
| I. Total hours (100% analysis)*** | C x H x 2 | 875 | 2,338 | 8,100 |
| J. Cost of DP&M** using Informatica Data Explorer | B x I | $83,125 | $175,350 | $810,000 |
| K. Project budget using Informatica Data Explorer | A - F + J | $659,640 | $2,123,475 | $9,165,000 |
| L. DP&M** as a percent of revised budget | J / K | 13% | 8% | 9% |
| Reduction in direct project cost | | | | |
| M. Net reduction | F - J | $540,360 | $876,525 | $2,835,000 |

* With manual approach only 75% analysis possible
** Data Profiling and Mapping
*** Adjusted by factor of 2 for training and implementation for the first project

Business Value of Early Project Completion

A significant ROI component achieved by using Informatica Data Explorer is the business value that is realized due to early project completion. This value is easy to compute if the business value of the project has been estimated. In Table 2, the business value per month of each of the three example projects has been multiplied by the number of months saved to show the total business value of using Informatica Data Explorer to accelerate project completion. Each month is assumed to have 175 working hours.

Table 2: Business Value of Early Project Completion

| Item | Formula | Data Warehouse | ERP Implementation | Data Consolidation |
|---|---|---|---|---|
| A. Number of FTEs | Input value | 3 | 5 | 10 |
| B. Project business value per month | Input value | $60,000 | $90,000 | $200,000 |
| Manual analysis | | | | |
| C. Total hours (75% analysis) | Table 1, E | 6,563 | 14,025 | 36,450 |
| D. Elapsed time (months)* | C / A / 175 | 12.5 | 16.0 | 21.0 |
| With Informatica Data Explorer | | | | |
| E. Total hours (100% analysis) | Table 1, I | 875 | 2,338 | 8,100 |
| F. Elapsed time (months)* | E / A / 175 | 1.7 | 2.7 | 4.6 |
| Reduction in direct project cost | | | | |
| G. Elapsed time saving (months) | D - F | 10.8 | 13.3 | 16.4 |
| H. Net value | B x G | $648,000 | $1,197,000 | $3,280,000 |

* 175 working hours per month
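The early-completion calculation in Table 2 can be sketched as below. The function and parameter names are illustrative assumptions. The sketch keeps full precision, whereas the published table rounds elapsed months to one decimal before multiplying, so the dollar figures it produces differ somewhat from those in Table 2.

```python
HOURS_PER_MONTH = 175  # working hours per month, as assumed in Table 2


def early_completion_value(manual_hours: float, tool_hours: float,
                           ftes: float, value_per_month: float):
    """Return (months saved, business value of finishing early)."""
    manual_months = manual_hours / ftes / HOURS_PER_MONTH  # row D
    tool_months = tool_hours / ftes / HOURS_PER_MONTH      # row F
    months_saved = manual_months - tool_months             # row G
    return months_saved, value_per_month * months_saved    # row H


# Inputs taken from Table 1 (rows E and I) and Table 2 (rows A and B).
EXAMPLES = {
    "Data Warehouse": (6_563, 875, 3, 60_000),
    "ERP Implementation": (14_025, 2_338, 5, 90_000),
    "Data Consolidation": (36_450, 8_100, 10, 200_000),
}

for name, inputs in EXAMPLES.items():
    months, value = early_completion_value(*inputs)
    print(f"{name}: {months:.1f} months saved, value ${value:,.0f}")
```

Because the FTE count divides both elapsed times, adding staff shortens both schedules proportionally; the saving in months shrinks, but the saving in labor hours does not.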


Reduction in Project Overrun Costs

The traditional manual approach to data profiling and mapping has proven to be a very repetitive process. Metadata, copybooks, documentation and, in some cases, physical data are analyzed manually. Based on this analysis, and many assumptions, transformation specifications are produced and coded, and the data is extracted, transformed, and loaded into the new system. In over 80% of data migration projects, this process does not work correctly the first time, a pattern often referred to as “code, load, and explode”. As a result, the whole process is repeated over and over again until the project is ultimately successful or cancelled.

Unfortunately, there is no way to predict in advance how many iterations will be required or when the process will be completed. One of the key benefits of using Informatica Data Explorer is that the data profiling and mapping process is predictable. Furthermore, since the transformation specifications generated are 100% supported by the data, the project has a much greater probability of completing successfully the first time, without endless rework.

According to The Standish Group (“CHAOS ’98: A Summary Review,” 1998), with a manual approach to data profiling and mapping, 83% of data migration projects either overrun their budget or fail outright. Of the total projects studied, 49% will overrun and 34% will fail. The average overrun in project time is 79%, and in project cost is 69%. The latter percentage is used in Table 3 to calculate the anticipated overrun cost for the three example projects when an automated data profiling and mapping product such as Informatica Data Explorer is not used. This scenario also assumes that these overrun costs will be reduced by 85% when Informatica Data Explorer is used, a factor that has been validated in many data profiling and mapping projects. Although some projects might not overrun at all, good business planning requires taking this into consideration as part of the overall plan.

There are also additional project-related cost savings that vary by project. If the migration is part of a larger IT project, such as an SAP implementation, a delay in analyzing the data can delay the whole project, leaving SAP staff, software and hardware idle while they wait for the data. Another cost, particularly for migrations and consolidations, arises when old systems must be maintained longer than originally planned (for example, paying for MIPS on a mainframe that was due to be decommissioned). For simplicity, these costs are not included in this calculation.

Table 3: Reduction in Project Overrun Costs

| Item | Formula | Data Warehouse | ERP Implementation | Data Consolidation |
|---|---|---|---|---|
| A. Total planned cost for manual DP&M* | Table 1, F | $623,485 | $1,051,875 | $3,645,000 |
| B. Allowance for overrun | A x .69 | $430,205 | $725,794 | $2,515,050 |
| Reduction in project overrun cost | | | | |
| C. Net reduction | B x .85 | $365,674 | $616,925 | $2,137,793 |

* Data Profiling and Mapping
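The overrun calculation in Table 3 is a two-step multiplication, sketched below. The names are illustrative assumptions, and the two rates are exposed as parameters so a planner can substitute their own estimates. Results are left unrounded, so they can differ from the published figures by a fraction of a dollar.

```python
OVERRUN_COST_RATE = 0.69   # average cost overrun cited from The Standish Group
OVERRUN_REDUCTION = 0.85   # share of overrun cost assumed avoided with the tool


def overrun_reduction(manual_dpm_cost: float,
                      overrun_rate: float = OVERRUN_COST_RATE,
                      reduction: float = OVERRUN_REDUCTION):
    """Return (allowance for overrun, net reduction in overrun cost)."""
    allowance = manual_dpm_cost * overrun_rate   # row B
    return allowance, allowance * reduction      # row C


# Planned manual DP&M cost per project, from Table 1, row F.
for name, cost in [("Data Warehouse", 623_485),
                   ("ERP Implementation", 1_051_875),
                   ("Data Consolidation", 3_645_000)]:
    allowance, saved = overrun_reduction(cost)
    print(f"{name}: allowance ${allowance:,.0f}, net reduction ${saved:,.0f}")
```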


Value of Improved Data Quality

When IS projects are faced with hard deadlines, quality is often compromised in the interests of project completion. Informatica Data Explorer provides a wealth of information during the migration process that can be the starting point of an ongoing data quality initiative driven by our data quality management platform, Informatica Data Quality. While it is easy to show how Informatica Data Explorer allows the project manager to meet or beat their project deadlines while ensuring high quality, accurate data in the target database, it is much more difficult to quantify this benefit. It is the business user who must put a value on having accurate data. The conservative approach used in Table 4 is to say that improved data quality is worth at least 20% of the project business value per month for 12 months. Consider that it would cost at least this much, over this period of time, to recover from and correct poor data quality.

Table 4: Value of Improved Data Quality

| Item | Formula | Data Warehouse | ERP Implementation | Data Consolidation |
|---|---|---|---|---|
| A. Project business value per month | Table 2, B | $60,000 | $90,000 | $200,000 |
| Value of improved data quality | Input value | $60,000 | $90,000 | $200,000 |
| B. Net value | A x 0.2 x 12 | $144,000 | $216,000 | $480,000 |
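Because the 20% share and the 12-month horizon are planning assumptions that the business user is expected to set, it can help to treat them as parameters. A minimal sketch, with hypothetical names:

```python
def data_quality_value(value_per_month: float,
                       share: float = 0.20,
                       months: int = 12) -> float:
    """Value of improved data quality: a share of monthly business value
    accrued over a number of months (Table 4, row B)."""
    return value_per_month * share * months


# Monthly business value per project, from Table 2, row B.
for name, monthly in [("Data Warehouse", 60_000),
                      ("ERP Implementation", 90_000),
                      ("Data Consolidation", 200_000)]:
    print(f"{name}: ${data_quality_value(monthly):,.0f}")
```

Calling `data_quality_value(60_000, share=0.10)` shows how the estimate changes under a more cautious assumption.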


Cost of Informatica Data Explorer

Table 5 shows the total costs of acquiring and implementing Informatica Data Explorer for the example projects. This cost is factored into the total ROI calculation in Table 6. Costs are based on Informatica list prices as of November 2006, are subject to change without notice, and do not represent a price quotation or an offer to sell.

Table 5: Cost of Informatica Data Explorer

| Item | Formula | Data Warehouse | ERP Implementation | Data Consolidation |
|---|---|---|---|---|
| A. Informatica Data Explorer | Input value | $150,000 | $180,000 | $225,000 |
| B. Source importers (VSAM or Relational) | Input value | $0 | $20,000 | $20,000 |
| C. Maintenance for one year | 20% of A | $30,000 | $36,000 | $45,000 |
| D. Training and consulting | Input value | $25,000 | $64,000 | $104,000 |
| Cost of Informatica Data Explorer | | | | |
| F. Total cost | Sum A to D | $205,000 | $300,000 | $394,000 |

Total Return on Investment

One way of looking at the total return on investment for the three example projects is shown in Table 6. The total value is the sum of the reduction in direct project costs, the business value of early completion, the reduction in project overrun costs, and the value of improved data quality. ROI is this total value less the cost of Informatica Data Explorer, expressed as a percentage of that cost. In all cases the cost of Informatica Data Explorer is substantially less than the benefits that are realized.

Table 6: Total ROI

| Item | Formula | Data Warehouse | ERP Implementation | Data Consolidation |
|---|---|---|---|---|
| A. Reduction in direct project cost | Table 1, M | $540,360 | $876,525 | $2,835,000 |
| B. Value of early project completion | Table 2, H | $648,000 | $1,197,000 | $3,280,000 |
| C. Reduction in project overrun cost | Table 3, C | $365,674 | $616,925 | $2,137,793 |
| D. Value of improved data quality | Table 4, B | $144,000 | $216,000 | $480,000 |
| Total value | | | | |
| E. Total value | Sum A to D | $1,698,034 | $2,906,450 | $8,732,793 |
| F. Cost of Informatica Data Explorer | Table 5, F | $205,000 | $300,000 | $394,000 |
| ROI as a percent of investment in IDE | 100 (E - F) / F | 728% | 869% | 2116% |
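The cost and ROI formulas in Tables 5 and 6 can be sketched together as follows. The function names are illustrative assumptions; the inputs are the published figures from the tables above.

```python
def total_cost(license_fee: float, importers: float, training: float,
               maintenance_rate: float = 0.20) -> float:
    """Table 5: license + source importers + one year of maintenance
    (a percentage of the license fee) + training and consulting."""
    return license_fee + importers + license_fee * maintenance_rate + training


def roi_percent(benefits: list, cost: float) -> float:
    """Table 6: ROI = 100 * (total value - cost) / cost."""
    total_value = sum(benefits)
    return 100 * (total_value - cost) / cost


# (license, importers, training) from Table 5; benefits A-D from Table 6.
SCENARIOS = {
    "Data Warehouse": ((150_000, 0, 25_000),
                       [540_360, 648_000, 365_674, 144_000]),
    "ERP Implementation": ((180_000, 20_000, 64_000),
                           [876_525, 1_197_000, 616_925, 216_000]),
    "Data Consolidation": ((225_000, 20_000, 104_000),
                           [2_835_000, 3_280_000, 2_137_793, 480_000]),
}

for name, (cost_inputs, benefits) in SCENARIOS.items():
    cost = total_cost(*cost_inputs)
    print(f"{name}: cost ${cost:,.0f}, ROI {roi_percent(benefits, cost):.0f}%")
```

Passing only a subset of the four benefits to `roi_percent` reflects the earlier remark that most projects will rely on a subset of the scenarios when estimating expected ROI.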


Conclusion

Data migration and data integration projects that have used Informatica Data Explorer during the initial phases typically complete earlier than originally planned and much sooner than would have been possible otherwise. The benefits include lowering direct project costs, accelerating business value, reducing the risks of project overruns and failures, and increasing the quality of the final result. The financial benefits of using Informatica Data Explorer can be many times greater than the initial software investment.


Worldwide Headquarters, 100 Cardinal Way, Redwood City, CA 94063, USA
phone: 650.385.5000 fax: 650.385.5500 toll-free in the US: 1.800.653.3871 www.informatica.com

Informatica Offices Around The Globe: Australia • Belgium • Canada • China • France • Germany • Japan • Korea • the Netherlands • Singapore • Switzerland • United Kingdom • USA

© 2008 Informatica Corporation. All rights reserved. Printed in the U.S.A. Informatica, the Informatica logo, and The Data Integration Company are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

6744 (09/30/2008)