Informatica PowerCenter Data Cleansing

Standardize, Validate, and Correct Data to Maximize Its Integrity and Value

The Informatica® PowerCenter® Data Cleansing option standardizes, validates, and corrects name and address data to maximize the integrity and value of an organization’s most important information assets and provide users with accurate, business-relevant information. Using the Data Cleansing option, organizations can parse out separate data elements, and standardize and cleanse address data at the lowest granularity using information from third-party sources, such as those from the U.S. Postal Service. The Data Cleansing option works seamlessly within the PowerCenter environment, incorporating comprehensive data quality functionality into the PowerCenter Designer object-based, visual development environment. Optimized for performance, the option also utilizes the connectivity, metadata, parallel performance, and linear scalability of the PowerCenter platform to increase the number of data sources that may be cleansed and improve data quality in an organization.

Poor Data Quality Causes Poor Decisions and Inefficient Operations

Business decisions today are routinely made based on inaccurate or incomplete data, and many organizations fail to understand the impact that poor data quality can have. For example, bad CRM data can prevent organizations from obtaining a single view of the customer, which can lead to errors that could affect customer loyalty and even cause customer attrition. Poor data can also increase operational costs, and even cause companies to falter on federal regulations such as Sarbanes-Oxley, which requires companies to report accurate and relevant data.

These problems intensify when an organization attempts to aggregate different data repositories, or when it attempts to synchronize data across the enterprise with third-party data sources. Neglecting to examine the quality of data before data consolidation is one of the key causes of failure for data integration projects.

Poor data quality can result from mistakes made during the data entry process, inaccuracies during data processing and moving, and degradation of customer data over time. In the past, organizations have attempted to correct data errors manually or by using separate solutions that were not integrated into the data integration solution. This situation can hinder the data migration effort and cause delays in the deployment of new systems and integration projects.

Organizations need a solution that can simplify the process of cleansing, standardizing, and validating data and integrate this process into the broader data integration effort. With such a solution, developer productivity is maximized as the need for data reconciling decreases along with data integration costs.

BENEFITS

• Improve quality with the ability to standardize, validate, and correct name and address data

• Increase developer efficiency through a codeless development environment for both data cleansing and integration

• Optimize runtime for data correction to provide high performance and scalability

The PowerCenter Data Cleansing Option Improves Data Quality

The PowerCenter Data Cleansing option allows organizations to standardize, validate, and correct name and address data from within a single, unified data integration and data cleansing environment, while leveraging a high-performance engine optimized for data cleansing at runtime. Organizations can use PowerCenter Data Cleansing to parse out separate data elements, standardize names, and cleanse address data at the lowest granularity using information from third-party sources. Because organizations can utilize PowerCenter’s codeless development environment, as well as its connectivity and metadata-centric engine, organizations can achieve high performance across all data integration and quality efforts, while improving productivity and reducing operational and training costs.

With the PowerCenter Data Cleansing option, organizations can reconcile data quickly, gain greater confidence in their analytical systems, and create a single version of truth. As a result, they are able to increase customer satisfaction by becoming more responsive to customer needs, reduce costs associated with detecting errors and maintaining bad data, and increase revenues by making better business decisions.

Improve Data Quality with the Ability to Standardize, Validate, and Correct Name and Address Data

The PowerCenter Data Cleansing option intelligently parses and identifies individual components of data, and standardizes, corrects, and enhances it using empirical data. These capabilities allow organizations to load operational data that is reliable and accurate.

Parse fields by patterns

The PowerCenter Data Cleansing option provides robust features for parsing data elements, including unstructured data, identifying individual data elements in customer files, and separating them into individual fields. For example, parsing converts a field containing “John Taylor, 123 Anywhere Street, Hometown, IL” into five separate fields: “John,” “Taylor,” “123 Anywhere Street,” “Hometown,” and “IL.” Parsing isolates data and applies structure to each data element, helping to speed the data integration transformation processes and reduce the risk of error.
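The parsing example above can be illustrated with a minimal Python sketch. This is not the product's parser, which is configured visually in PowerCenter Designer; the pattern, the field names, and the `parse_contact` function are assumptions made up for illustration, and they handle only records shaped like the one in the text.

```python
import re

# Hypothetical pattern: "First Last, Street, City, ST".
# Field names are illustrative, not PowerCenter port names.
CONTACT_PATTERN = re.compile(
    r"^\s*(?P<first_name>[^\s,]+)\s+(?P<last_name>[^,]+?)\s*,"
    r"\s*(?P<street>[^,]+?)\s*,"
    r"\s*(?P<city>[^,]+?)\s*,"
    r"\s*(?P<state>[A-Za-z]{2})\s*$"
)


def parse_contact(raw):
    """Split one free-form contact string into five named fields.

    Returns a dict of fields, or None when the record does not fit
    the pattern and should be routed to manual review.
    """
    match = CONTACT_PATTERN.match(raw)
    return match.groupdict() if match else None


record = parse_contact("John Taylor, 123 Anywhere Street, Hometown, IL")
print(record)
```

Returning `None` for records that do not fit, rather than guessing, keeps unparseable input visible so it can be handled separately.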

Standardize format and structure of output record

The PowerCenter Data Cleansing option allows users to convert data into different formats and structures. This capability allows organizations to standardize the format and structure of records across the enterprise to enhance the accuracy of data integration efforts.
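As a rough illustration of what standardizing a record's format can mean, the Python sketch below maps a few spelling variants onto one canonical form. The lookup tables and function names are hypothetical and deliberately tiny; a real deployment would rely on the option's own rules and reference data rather than hand-written dictionaries.

```python
# Hypothetical lookup tables; entries are examples only.
STREET_SUFFIXES = {
    "bld": "Blvd.",
    "blvd": "Blvd.",
    "boulevard": "Blvd.",
}
REGIONS = {
    "cal": "CA",
    "calif": "CA",
    "california": "CA",
}


def standardize_street(line):
    """Replace known street-suffix variants with one canonical spelling."""
    words = []
    for word in line.split():
        key = word.lower().rstrip(".")
        words.append(STREET_SUFFIXES.get(key, word))
    return " ".join(words)


def standardize_region(region):
    """Map known region spellings to a two-letter code; otherwise upper-case."""
    cleaned = region.strip()
    key = cleaned.lower().rstrip(".")
    return REGIONS.get(key, cleaned.upper())


print(standardize_street("1234 Wilshire BLD"))
print(standardize_region("Cal"))
```

Unknown values pass through largely unchanged, so the sketch never invents a correction it has no rule for.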

Correct name and address using third-party data

PowerCenter Data Cleansing leverages an unrivaled repository of postal knowledge embodying global logistical data structures and content. This feature enables organizations to compare and correct address information against postal service directories from more than 195 countries, resulting in greater quality and accuracy of address information.

Increase Developer Efficiency Through a Codeless Environment for Both Data Cleansing and Integration

PowerCenter Data Cleansing offers wizard-driven business rules that reduce the development effort necessary to cleanse and transform data, thereby improving developer productivity and reducing training requirements. Developers using PowerCenter can take advantage of data cleansing and transformation tools directly from the PowerCenter toolbar, and data cleansing business rules write to and read from the single PowerCenter repository. With this unified architecture, developers have access to reusable PowerCenter data cleansing-specific transformations that they can use to both integrate and cleanse data in a single operation.

The PowerCenter Data Cleansing option provides a rich library of transformation capabilities for data correction, including the ability to normalize by pivoting arrays, rank and sort data, filter for subset processing, call custom and stored procedures, and many others. Well-known algorithms, such as Soundex and Metaphone, convert a text string into an encoded value that can easily be compared with other encoded values. These algorithms are designed to reduce comparison issues with misspellings or similar names spelled differently; for example, names such as “Johnson” and “Johnston.”
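To make the idea of phonetic encoding concrete, here is a minimal Python sketch of the classic American Soundex rules. It is an independent illustration, not the implementation shipped with the Data Cleansing option, and the function name `soundex` is simply a convenient label.

```python
# Consonant groups used by classic American Soundex.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Encode a name as one letter plus three digits (for example S530)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with equal codes
        code = _CODES.get(ch, "")  # vowels map to "" and reset prev
        if code and code != prev:
            encoded += code
            if len(encoded) == 4:
                break
        prev = code
    return encoded.ljust(4, "0")  # pad short codes with zeros


print(soundex("Smith"), soundex("Smyth"))
print(soundex("Johnson"), soundex("Johnston"))
```

Under these classic rules “Smith” and “Smyth” share the code S530, while “Johnson” and “Johnston” encode to J525 and J523, which agree in all but the last digit. Whether a given pair is treated as a match therefore depends on the algorithm variant and on how strictly the encoded values are compared.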

Input Record:
Irwin m fletcher
LA TiMes incorporated
1234 Wiltshire BLD 6th Floor
Los Angles Cal 94017

Output Record (after processing in the Mapping Designer):
Title: Mr.
First Name: Irwin
Middle Name: M.
Last Name: FLETCHER
Firm: Los Angeles Times, Inc.
Address Line 1: 1234 Wilshire Blvd.
Address Line 2: Suite 600
Locality: Los Angeles
Region: CA
Post Code: 90017-1908

Figure 1: Parsing, standardizing, and cleansing Name and Address

Optimize Runtime for Data Correction to Provide High Performance and Scalability

Increasing data volumes and shrinking processing timelines require an Information Technology (IT) infrastructure that integrates with and processes more information in less time than ever before. The fast pace at which managers consume data for decision making is driving the need for real-time, actionable data. This need requires organizations to have a data integration infrastructure that scales in throughput to keep pace with increasing data volumes and shrinking processing windows, while cleansing the data being processed so that it is accurate and actionable.

The Data Cleansing option’s tight integration with PowerCenter means users have access to all the data sources to which PowerCenter can connect. Leveraging PowerCenter’s scalable runtime, the Data Cleansing option is optimized to take advantage of the wide connectivity, metadata-based design and reuse, parallel performance, and linear scalability of the PowerCenter platform. Because the Data Cleansing option is partition-ready out of the box, users benefit from the same high performance they are accustomed to with PowerCenter.

Figure 2: Integrated Cleansing transformation capability in PowerCenter Designer

For more information on Informatica PowerCenter, please visit www.informatica.com/products/powercenter or call 1.800.653.3871.

Figure 3: Correct Name and Address using third-party data

“By 2005, Fortune 1000 companies will lose more money in operational inefficiency due to data quality issues than they will spend on data warehouse and CRM initiatives.”
(Gartner Research, T. Friedman, April 2004)

Worldwide Headquarters, 2100 Seaport Boulevard, Redwood City, CA 94063, USA
phone: 650.385.5000 fax: 650.385.5500 toll-free in the US: 1.800.970.1179 www.informatica.com

Informatica Offices Around The Globe: Australia • Belgium • Canada • France • Germany • Japan • the Netherlands • Singapore • Switzerland • United Kingdom • USA

© 2004 Informatica Corporation. All rights reserved. Printed in the U.S.A. Informatica, the Informatica logo, Turning integration into insight, and PowerCenter are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be tradenames or trademarks of their respective owners.

J50269 6537 (12/16/04)