Page 1: Informatica Data Quality 9.1.0 HotFix 2 Accelerator Guide ... · Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter

Informatica Data Quality (Version 9.1.0 HotFix 2)

Accelerator Guide


Informatica Data Quality Accelerator Guide

Version 9.1.0 HotFix 2
September 2011

Copyright (c) 2009-2011 Informatica. All rights reserved.

This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international Patents and other Patents Pending.

Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.

The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing.

Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange, Informatica On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging and Informatica Master Data Management are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright © Sun Microsystems. All rights reserved. Copyright © RSA Security Inc. All Rights Reserved. Copyright © Ordinal Technology Corp. All rights reserved. Copyright © Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright Isomorphic Software. All rights reserved. Copyright © Meta Integration Technology, Inc. All rights reserved. Copyright © Intalio. All rights reserved. Copyright © Oracle. All rights reserved. Copyright © Adobe Systems Incorporated. All rights reserved. Copyright © DataArt, Inc. All rights reserved. Copyright © ComponentSource. All rights reserved. Copyright © Microsoft Corporation. All rights reserved. Copyright © Rogue Wave Software, Inc. All rights reserved. Copyright © Teradata Corporation. All rights reserved. Copyright © Yahoo! Inc. All rights reserved. Copyright © Glyph & Cog, LLC. All rights reserved. Copyright © Thinkmap, Inc. All rights reserved. Copyright © Clearpace Software Limited. All rights reserved. Copyright © Information Builders, Inc. All rights reserved. Copyright © OSS Nokalva, Inc. All rights reserved. Copyright Edifecs, Inc. All rights reserved. Copyright Cleo Communications, Inc. All rights reserved. Copyright © International Organization for Standardization 1986. All rights reserved. Copyright © ej-technologies GmbH. All rights reserved. Copyright © Jaspersoft Corporation. All rights reserved.

This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and other software which is licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright © 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under the GNU Lesser General Public License Agreement, which may be found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright (©) 1993-2006, all rights reserved.

This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org and http://www.openssl.org/source/license.html.

This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, <[email protected]>. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

The product includes software copyright 2001-2005 (©) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/license.html.

The product includes software copyright © 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://dojotoolkit.org/license.

This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.

This product includes software copyright © 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http://www.gnu.org/software/kawa/Software-License.html.

This product includes OSSP UUID software which is Copyright © 2002 Ralf S. Engelschall, Copyright © 2002 The OSSP Project, Copyright © 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.

This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject to terms available at http://www.boost.org/LICENSE_1_0.txt.

This product includes software copyright © 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt.

This product includes software copyright © 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.eclipse.org/org/documents/epl-v10.php.

This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/license.html, http://www.asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, http://fusesource.com/downloads/license-agreements/fuse-message-broker-v-5-3-license-agreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/; http://www.bouncycastle.org/licence.html; http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/LICENSE.txt. http://jotm.objectweb.org/bsd_license.html; http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231; http://www.slf4j.org/license.html; http://developer.apple.com/library/mac/#samplecode/HelpHook/Listings/HelpHook_java.html; http://www.jcraft.com/jsch/LICENSE.txt; http://nanoxml.sourceforge.net/orig/copyright.html; http://www.json.org/license.html; http://forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html, http://www.tcl.tk/software/tcltk/license.html, http://www.jaxen.org/faq.html, http://www.jdom.org/docs/faq.html, and http://www.slf4j.org/license.html.


This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License (http://www.opensource.org/licenses/cddl1.php), the Common Public License (http://www.opensource.org/licenses/cpl1.0.php), the Sun Binary Code License Agreement Supplemental License Terms, the BSD License (http://www.opensource.org/licenses/bsd-license.php), the MIT License (http://www.opensource.org/licenses/mit-license.php) and the Artistic License (http://www.opensource.org/licenses/artistic-license-1.0).

This product includes software copyright © 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further information please visit http://www.extreme.indiana.edu/.

This Software is protected by U.S. Patent Numbers 5,794,246; 6,014,670; 6,016,501; 6,029,178; 6,032,158; 6,035,307; 6,044,374; 6,092,086; 6,208,990; 6,339,775; 6,640,226; 6,789,096; 6,820,077; 6,823,373; 6,850,947; 6,895,471; 7,117,215; 7,162,643; 7,254,590; 7,281,001; 7,421,458; 7,496,588; 7,523,121; 7,584,422; 7,720,842; 7,721,270; and 7,774,791, international Patents and other Patents Pending.

DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of noninfringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice.

NOTICES

This Informatica product (the "Software") includes certain drivers (the "DataDirect Drivers") from DataDirect Technologies, an operating company of Progress Software Corporation ("DataDirect") which are subject to the following terms and conditions:

1. THE DATADIRECT DRIVERS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.

Part Number: IN-ACG-91000-HF2-0002


Table of Contents

Preface
  Informatica Resources
    Informatica Customer Portal
    Informatica Documentation
    Informatica Web Site
    Informatica How-To Library
    Informatica Knowledge Base
    Informatica Multimedia Knowledge Base
    Informatica Global Customer Support

Chapter 1: Introduction to Accelerators
  Accelerators Overview
  Installing Accelerators
  Accelerator Rules
  Content Sets
  Demonstration Data Objects
  Demonstration Mappings
  Reference Tables
  Tags

Chapter 2: Core Accelerator
  Core Accelerator Overview
  Core Demonstration Mappings
  Core Address Data Cleansing Rules
  Core Contact Data Cleansing Rules
  Core Corporate Data Cleansing Rules
  Core General Data Cleansing Rules
  Core Product Data Cleansing Rules

Chapter 3: Australia/New Zealand Accelerator
  Australia/New Zealand Accelerator Overview
  Australia/New Zealand Demonstration Mappings
  Address Data Cleansing Rules
  Australia/New Zealand Contact Data Cleansing Rules
  Australia/New Zealand Corporate Data Cleansing Rules
  Australia/New Zealand General Data Cleansing Dependencies
  Australia/New Zealand Matching and Deduplication Rules

Chapter 4: Brazil Accelerator
  Brazil Accelerator Overview
  Brazil Demonstration Mappings
  Brazil Address Data Cleansing Rules
  Brazil Contact Data Cleansing Rules
  Brazil Corporate Data Cleansing Rules
  Brazil General Data Cleansing Dependencies
  Brazil Matching and Deduplication Rules

Chapter 5: Financial Services Accelerator
  Financial Services Accelerator Overview
  Financial Services Contact Data Cleansing Rules
  Financial Services Financial Data Cleansing Rules
  Financial Services General Data Cleansing Rules
  Financial Services Matching and Deduplication Rules

Chapter 6: Portugal Accelerator
  Portugal Accelerator Overview
  Portugal Demonstration Mappings
  Address Data Cleansing Rules
  Portugal Contact Data Cleansing Rules
  Portugal Corporate Data Cleansing Rules
  Portugal General Data Cleansing Dependencies
  Portugal Matching and Deduplication Rules

Chapter 7: United Kingdom Accelerator
  United Kingdom Accelerator Overview
  United Kingdom Demonstration Mappings
  United Kingdom Address Data Cleansing Rules
  United Kingdom Contact Data Cleansing Rules
  United Kingdom Financial Data Cleansing Rules
  United Kingdom General Data Cleansing Dependencies
  United Kingdom Matching and Deduplication Rules

Chapter 8: U.S./Canada Accelerator
  U.S./Canada Accelerator Overview
  U.S./Canada Demonstration Mappings
  Address Data Cleansing Rules
  U.S./Canada Contact Data Cleansing Rules
  General Data Cleansing Rules
  U.S./Canada Matching and Deduplication Rules


Preface

The Informatica Data Quality Accelerator Guide is written for data quality developers. This guide assumes that you have an understanding of data quality concepts such as standardization, parsing, labeling, and validation.

Informatica Resources

Informatica Customer Portal

As an Informatica customer, you can access the Informatica Customer Portal site at http://mysupport.informatica.com. The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base, the Informatica Multimedia Knowledge Base, Informatica Product Documentation, and access to the Informatica user community.

Informatica Documentation

The Informatica Documentation team makes every effort to create accurate, usable documentation. If you have questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email at [email protected]. We will use your feedback to improve our documentation. Let us know if we can contact you regarding your comments.

The Documentation team updates documentation as needed. To get the latest documentation for your product, navigate to Product Documentation from http://mysupport.informatica.com.

Informatica Web Site

You can access the Informatica corporate web site at http://www.informatica.com. The site contains information about Informatica, its background, upcoming events, and sales offices. You will also find product and partner information. The services area of the site includes important information about technical support, training and education, and implementation services.

Informatica How-To Library

As an Informatica customer, you can access the Informatica How-To Library at http://mysupport.informatica.com. The How-To Library is a collection of resources to help you learn more about Informatica products and features. It includes articles and interactive demonstrations that provide solutions to common problems, compare features and behaviors, and guide you through performing specific real-world tasks.


Informatica Knowledge Base

As an Informatica customer, you can access the Informatica Knowledge Base at http://mysupport.informatica.com. Use the Knowledge Base to search for documented solutions to known technical issues about Informatica products. You can also find answers to frequently asked questions, technical white papers, and technical tips. If you have questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

Informatica Multimedia Knowledge Base

As an Informatica customer, you can access the Informatica Multimedia Knowledge Base at http://mysupport.informatica.com. The Multimedia Knowledge Base is a collection of instructional multimedia files that help you learn about common concepts and guide you through performing specific tasks. If you have questions, comments, or ideas about the Multimedia Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

Informatica Global Customer Support

You can contact a Customer Support Center by telephone or through the Online Support. Online Support requires a user name and password. You can request a user name and password at http://mysupport.informatica.com.

Use the following telephone numbers to contact Informatica Global Customer Support:

North America / South America

Toll Free
- Brazil: 0800 891 0202
- Mexico: 001 888 209 8853
- North America: +1 877 463 2435

Europe / Middle East / Africa

Toll Free
- France: 0805 804632
- Germany: 0800 5891281
- Italy: 800 915 985
- Netherlands: 0800 2300001
- Portugal: 800 208 360
- Spain: 900 813 166
- Switzerland: 0800 463 200
- United Kingdom: 0800 023 4632

Standard Rate
- Belgium: +31 30 6022 797
- France: +33 1 4138 9226
- Germany: +49 1805 702 702
- Netherlands: +31 306 022 797
- United Kingdom: +44 1628 511445

Asia / Australia

Toll Free
- Australia: 1 800 151 830
- New Zealand: 09 9 128 901

Standard Rate
- India: +91 80 4112 5738


CHAPTER 1

Introduction to Accelerators

This chapter includes the following topics:

- Accelerators Overview
- Installing Accelerators
- Accelerator Rules
- Content Sets
- Demonstration Data Objects
- Demonstration Mappings
- Reference Tables
- Tags

Accelerators Overview

Accelerators provide solutions to common data quality issues in a country, region, or industry. Accelerators contain rules, reference tables, demonstration mappings, and demonstration data objects.

Informatica produces the following accelerators:

- Informatica Data Quality Accelerator for Australia and New Zealand
- Informatica Data Quality Accelerator for Brazil
- Informatica Data Quality Accelerator for Financial Services
- Informatica Data Quality Accelerator for Portugal
- Informatica Data Quality Accelerator for United Kingdom
- Informatica Data Quality Accelerator for US and Canada
- Informatica Data Quality Core Accelerator

Informatica provides the Core accelerator with the Content installer. The Core accelerator contains data quality rules that you can use with data from multiple regions.

Informatica licenses the remaining accelerators. For each accelerator, Informatica customizes the repository objects for a country, region, or industry. For example, the Brazil accelerator and the Portugal accelerator both contain a reference table that lists postal codes and cities. However, the contents of this reference table are different for each country.


Installing Accelerators

Use Informatica Developer to import accelerator rules, demonstration mappings, and reference tables to the Model repository and to write reference table data to the staging database. Use the Content installer executable files to install address reference data, identity populations, and accelerator demonstration data.

Install all accelerators to the same repository project to maintain accelerator dependencies. The following example repository path uses the string "Informatica_DQ_Content" as a project name:

[Informatica_DQ_Content]\Rules

For more information about installing accelerators, see the Data Quality Content Installation Guide. This guide contains detailed information about installation and installation prerequisites.

Using Accelerators in PowerCenter

To use an accelerator in PowerCenter, you must first install it in Data Quality and then export the rule mapplets or mappings to PowerCenter.

When you export accelerator rules from Data Quality to PowerCenter, verify that you include all reference tables, data objects, and dependencies. If you export rules to PowerCenter 8.6.1 or 9.0.1, verify that the Data Quality 9.1.0 Integration Plug-in is active on the Data Integration Service machine that runs the accelerator mappings.

Accelerator Rules

Accelerator rules contain prebuilt data quality operations. You can use rules individually or combine rules in a mapping.

Accelerator rules install to the following repository location:

[Informatica_DQ_Content]\Rules

Use accelerator rules to perform the following data quality tasks:

Address validation

Verify and correct postal address data. This task requires address reference data files.

Data parsing

Parse information from records. Parsing rules can extract the following types of information: person names, organization names, telephone numbers, calendar dates, and identification numbers.

Data standardization

Standardize the spelling and format of data. Standardization rules can correct person names, organization names, telephone numbers, calendar dates, and identification numbers.

Duplicate analysis

Find duplicate records in a data set. Duplicate analysis rules identify duplicate records by comparing names, telephone numbers, calendar dates, email addresses, and identification numbers. You cannot use duplicate analysis rules in the Analyst tool.
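
As a rough illustration of how standardization and duplicate analysis relate, the following Python sketch normalizes telephone numbers and then compares names to flag likely duplicates. It is not the logic that the accelerator rules use. The function names, the sample records, and the 0.85 similarity threshold are assumptions made for this example only.

```python
import re
from difflib import SequenceMatcher


def standardize_phone(raw):
    """Keep digits only so that differently formatted numbers compare equal."""
    return re.sub(r"\D", "", raw)


def name_similarity(a, b):
    """Return a similarity score between 0 and 1, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def find_duplicates(records, threshold=0.85):
    """Return index pairs whose phones match after standardization
    and whose names are at least `threshold` similar (assumed cutoff)."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            same_phone = (standardize_phone(records[i]["phone"])
                          == standardize_phone(records[j]["phone"]))
            similar_name = name_similarity(records[i]["name"],
                                           records[j]["name"]) >= threshold
            if same_phone and similar_name:
                pairs.append((i, j))
    return pairs


# Hypothetical sample data, not taken from the accelerator demonstration files.
records = [
    {"name": "John Smith", "phone": "(555) 010-2000"},
    {"name": "Jon Smith", "phone": "555 010 2000"},
    {"name": "Mary Jones", "phone": "555-010-3000"},
]
duplicates = find_duplicates(records)
print(duplicates)
```

The sketch compares every pair of records, which is practical only for small data sets. Standardizing values before comparison is what allows the two differently formatted telephone numbers to match.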


Content Sets

Accelerators include content sets that you can use in Labeler, Parser, and Standardizer transformations to identify data values for data quality operations. Content sets include character sets, pattern sets, regular expressions, and token sets.

Character Sets

A character set contains expressions that identify specific characters and character ranges. Use character sets to identify a specific character or range of characters. For example, you can label all numerals in a column that contains telephone numbers.

Pattern Sets

A pattern set contains expressions that identify data patterns in the output of a token labeling operation. Use pattern sets to analyze the tokenized data output port and to write matching strings to one or more output ports.

Regular Expressions

In a content set, a regular expression is an expression that you can use to identify one or more strings in input data.

Token Sets

A token set contains expressions that identify tokens. Use token sets to identify specific tokens as part of labeling and parsing operations.
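As a rough illustration of what character labeling does to a telephone number column, the following Python sketch assigns one label to each input character. The function name and the label codes ("9", "A", "_") are assumptions made for this example. They do not represent the output format of the Labeler transformation.

```python
def label_characters(value: str) -> str:
    """Return one label per input character.

    Hypothetical label codes: '9' for a numeral, 'A' for a letter,
    '_' for a space, and the character itself for anything else.
    """
    labels = []
    for ch in value:
        if ch.isdigit():
            labels.append("9")
        elif ch.isalpha():
            labels.append("A")
        elif ch == " ":
            labels.append("_")
        else:
            labels.append(ch)
    return "".join(labels)


# Label a sample telephone number that includes an extension.
print(label_characters("(555) 010-4477 ext 12"))
```

A labeled string of this kind is the sort of input that a pattern set could then inspect to decide which values to write to an output port.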

Demonstration Data Objects

Accelerators provide demonstration data objects that you can use to explore data quality functionality. These data objects are comma-separated data files.

Demonstration data objects install to the following repository location:

[Informatica_DQ_Content]\Rules_Demo

Demonstration Mappings

Accelerators provide demonstration mappings for data quality operations such as standardization and duplicate analysis. You can use demonstration mappings as templates for data quality operations.

For mappings that perform address validation, you must install a valid Address Doctor license key and reference data set.

Demonstration mappings install to the following repository location:

[Informatica_DQ_Content]\Rules_Demo

Reference Tables

Accelerators use reference tables to standardize source data and derive additional information associated with the source data. Each row in a reference table contains a set of related values, one of which is designated as the valid value.

Accelerator reference tables install to the following repository location:

[Informatica_DQ_Content]\Dictionaries
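The way a reference table row maps related values to one valid value can be sketched in Python as follows. The sample rows, the convention that the valid value comes first in each row, and the function names are all assumptions for illustration. They do not describe the contents or storage format of the installed reference tables.

```python
# Hypothetical reference table: the first entry in each row is the valid
# value, and the remaining entries are variants that map to it.
REFERENCE_ROWS = [
    ("STREET", "ST", "STR"),
    ("AVENUE", "AV", "AVE"),
]


def build_lookup(rows):
    """Map every value in a row, including the valid value, to the valid value."""
    lookup = {}
    for valid, *variants in rows:
        lookup[valid.upper()] = valid
        for variant in variants:
            lookup[variant.upper()] = valid
    return lookup


def standardize(token, lookup):
    """Return the valid value for a token, or the token unchanged if unknown."""
    return lookup.get(token.strip().upper(), token)


LOOKUP = build_lookup(REFERENCE_ROWS)
print(standardize("ave", LOOKUP))
print(standardize("Blvd", LOOKUP))
```

Returning unknown tokens unchanged is a design choice for this sketch only. A real standardization rule may flag or route unmatched values differently.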

Tags

Accelerator objects contain tags that describe accelerator categories such as business area, entity, function, and locale. You can search for tags to find all the accelerator objects in a category.

The following table lists the tags for each accelerator category:

Category Tags

Business Area: Company, Customer, Finance, General, Product

Entity: Address, Currency, Date, Email, Gender, Name, National ID, Number, Phone, SSN, Tax

Function: Enrich, Label, Matching, Parse, Profile, Standardize, Validate

Locale: Australia, Brazil, Canada, Europe, Great Britain, New Zealand, North America, Portugal, UK, USA

CHAPTER 2

Core Accelerator

This chapter includes the following topics:

• Core Accelerator Overview

• Core Demonstration Mappings

• Core Address Data Cleansing Rules

• Core Contact Data Cleansing Rules

• Core Corporate Data Cleansing Rules

• Core General Data Cleansing Rules

• Core Product Data Cleansing Rules

Core Accelerator Overview

The Core accelerator validates and enhances data by using specialized data quality processes and reference tables. This accelerator includes rules, reference tables, demonstration mappings, and demonstration data objects.

The Core accelerator includes rules that perform the following data quality processes:

• Address data cleansing

• Contact data cleansing

• Corporate data cleansing

• General data cleansing

• Product data cleansing

Other accelerators have dependencies on Core accelerator rules and mapplets.

Core Demonstration Mappings

The Core accelerator demonstration mappings combine accelerator rules to demonstrate complex data quality processes.

Core accelerator demonstration mappings install to the following repository location:

[Informatica_DQ_Content]\Rules_Demo\Core_Accelerator

The accelerator includes the following demonstration mappings:

m_customer_data_demo

Parses, standardizes, and validates U.S. and Canadian data. The data objects referenced in this mapping use the following path:

<ServerInstallDir>\services\DQContent\INFA_Content\demos\source_data

You may need to modify this path to match your system configuration. To perform address validation, you must also install an Address Doctor license key and reference data set.

m_product_demo

Demonstrates rules that parse product descriptions and validate the quality of those descriptions using reference values.

You can use this mapping as a template for validating product descriptions. This mapping is not a complete solution. The data objects referenced in this mapping use the following path:

<ServerInstallDir>\services\DQContent\INFA_Content\demos\source_data

You may need to modify this path to match your system configuration.

Core Address Data Cleansing Rules

Use address data cleansing rules from the Core accelerator to parse, standardize, and validate address data.

Core accelerator address data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Address_Data_Cleansing

The following table describes the address data cleansing rules from the Core accelerator:

Name Description

rule_Country_Name_Standardization Standardizes country names. This rule returns a country name, a two-character ISO country code, and a three-character ISO country code.

rule_Global_Address_Validation_Discrete Validates the deliverability of fully tokenized global addresses. This rule requires address reference data and a corresponding license.

rule_Global_Address_Validation_Hybrid Validates the deliverability of partially tokenized global addresses. This rule requires address reference data and a corresponding license.

rule_Global_Address_Validation_Multiline Validates the deliverability of multiline global addresses. This rule requires address reference data and a corresponding license.

Core Contact Data Cleansing Rules

Use contact data cleansing rules from the Core accelerator to parse and validate data.

Core accelerator contact data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing

The following table describes the contact data cleansing rules from the Core accelerator:

Name Description

rule_Email_Parse Parses email addresses from data fields.

rule_Email_Parse_Into_Mailbox_Domain Parses email addresses into mailbox, domain, and subdomain ports. For example, "info@informatica.com" is parsed in the following manner:
- Mailbox - "info"
- Sub-domain - "informatica"
- Domain - "com"

rule_Email_Validation Validates the format of email addresses. This rule does not verify that email addresses are accurate or active. This rule returns "Valid" or "Invalid."
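The following Python sketch mirrors the mailbox, sub-domain, and domain split shown in the table, together with a format-only check that returns "Valid" or "Invalid". The regular expression and the function names are assumptions for illustration. The guide does not document the logic that the accelerator rules use internally.

```python
import re

# Deliberately simple format check (an assumption for this sketch).
# Like the rule it illustrates, it cannot tell whether an address is active.
EMAIL_FORMAT = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def validate_email_format(address: str) -> str:
    """Return 'Valid' if the address matches the simple pattern, else 'Invalid'."""
    return "Valid" if EMAIL_FORMAT.match(address.strip()) else "Invalid"


def parse_email(address: str) -> dict:
    """Split an address into mailbox, sub-domain, and domain parts."""
    mailbox, _, host = address.strip().partition("@")
    # Everything before the last period is treated as the sub-domain.
    sub_domain, _, domain = host.rpartition(".")
    return {"mailbox": mailbox, "sub_domain": sub_domain, "domain": domain}


print(parse_email("info@informatica.com"))
print(validate_email_format("info@informatica"))
```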

Core Corporate Data Cleansing Rules

Use corporate data cleansing rules from the Core accelerator to standardize data.

Core accelerator corporate data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing

The following table describes the corporate data cleansing rules from the Core accelerator:

Name Description

rule_Company_Name_Standardization Standardizes company names using reference table values.

Core General Data Cleansing Rules

Use general data cleansing rules from the Core accelerator to parse, standardize, and validate data.

Core accelerator general data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\General_Data_Cleansing

The following table describes the general data cleansing rules from the Core accelerator:

Name Description

mplt_Parse_Tokens_Into_Single_Field Parses words from a space-delimited string. This mapplet is useful for analyzing strings to identify recurring patterns.

rule_Add_Leading_Zero Adds the numeral "0" to the beginning of a string.

rule_Add_Parentheses_At_Start_End_ofLine Adds parenthetical symbols at the start and end of a string.

rule_Add_Plus_To_Start_of_Line Adds the plus symbol at the start of a string.

rule_Add_Space_Around_Ampersand Adds a space before and after all ampersands in a string.

rule_Add_Space_Around_Hyphen Adds a space before and after all dashes and hyphens in a string.

rule_Add_Space_Between_Number_Letter Adds a space between a character pair composed of one numeral and one alphabetic character. Reading from left to right, this rule adds a space to the first numeral-alphabetic character pair in the data.

rule_Add_Spaces_Around_Period Adds a space before and after all periods in a string.

rule_AllTrim Removes all leading and trailing spaces from input data.

rule_Assign_DQ_90_ElementInputStatus_Description Assigns a description string to the Element Input Status output from the Address Validator transformation.

rule_Assign_DQ_90_ElementRelevance_Description Assigns a description string to the Element Relevance output from the Address Validator transformation.

rule_Assign_DQ_90_ElementResultStatus_Description Assigns a description string to the Element Result Status output from the Address Validator transformation.

rule_Assign_DQ_90_GeocodingStatus_Description Assigns a description string to the Geocoding Status output from the Address Validator transformation.

rule_Assign_DQ_90_Mailability_Score_Description Assigns a description string to the Mailability Score output from the Address Validator transformation.

rule_Assign_DQ_90_Match_Code_Descriptions Assigns a description string to the Address Match Code Score output from the Address Validator transformation.

rule_Compare_Dates Compares two dates and calculates the difference between them. This rule reports the total time difference in each of the following units of measure:
- Hours
- Days
- Months
- Years
Each output value is independent of the other values. The outputs cannot be added together to represent the difference between the dates.

rule_Completeness Checks for NULL values.

rule_Completeness_Multi_Port Checks multiple ports for NULL values.

rule_Concatenate_Words Concatenates two fields using a space as a separator.

rule_Convert_DQ90_Match_Codes_to_IDQ_86_Codes Converts Data Quality 9.0 and later match codes to Data Quality 8.6 match codes.

rule_CreditCard_Number_Validation Validates credit card numbers for credit cards that use the Luhn algorithm. This includes, but is not limited to, the following credit cards:
- American Express
- Diners Club Carte Blanche
- Diners Club International
- Diners Club US & Canada
- Discover Card
- JCB
- Maestro
- Master Card
- Solo
- Switch
- Visa
- Visa Electron
This rule returns "Valid" or "Invalid."

rule_Date_Parse Parses dates from strings. This rule recognizes dates in the following formats:
- dd/mm/yyyy
- mm/dd/yyyy
- yyyy/dd/mm
This rule returns a date and also returns a string that contains the input text with the date removed.

rule_Date_Standardization Standardizes date strings. For input dates that are not valid or do not fit the input format you designate, the rule returns all zeros. To configure the format of the output, edit the Output_Date_Format and Delimiter expression variables in the dq_FormatDate Expression transformation.

rule_Date_Validation Validates date strings. To configure the date format the rule uses for validation, edit the In_Date_Format expression variable in the dq_ValidateDate Expression transformation. Default is "MM/DD/YYYY." This rule returns "Valid" or "Invalid."

rule_IsNumeric Validates whether input is numeric. This rule returns "True" or "False."

rule_LowerCase Returns all alphabetic characters in lower case.

rule_Luhn_Algorithm Validates a numeric string using the Luhn Algorithm. This rule validatesstrings such as credit card numbers.

rule_Parse_First_Word Parses the first word in a string.

rule_Parse_Number_At_End_Of_Line Parses a number that occurs at the end of a string, reading from left to right.

rule_Parse_Number_At_Start_Of_Line Parses a number that occurs at the beginning of a string, reading from left to right.

rule_Parse_Text_Between_Parentheses Parses strings bracketed by left and right parentheses. This rule returns a port for parsed strings and a port for the input text with parsed strings removed.

rule_Parse_Text_in_Single_Quotes Parses strings located between quotation marks. In cases where the input contains multiple quoted elements, this rule parses the last element, reading from left to right. This rule returns a port for parsed strings and a port for the input text with parsed strings removed.

rule_Personal_Company_Identification Parses person names and company names. This rule has the following outputs:
- Person name
- Company name
- Data the rule cannot parse
- Data category, such as person name or company name

rule_Prepend_Zero_to_Single_Digit Prepends the numeral "0" to single numeric characters.

rule_Remove_Apostrophe Removes apostrophes. This rule merges the text strings on either side of the apostrophe.

rule_Remove_Control_Characters Removes control characters from text strings. This rule returns a string that contains the control characters and a string that contains the input text with the control characters removed.

rule_Remove_Extra_Spaces Replaces all multiple consecutive spaces with a single space and trims leading and trailing spaces.

rule_Remove_Hyphen Removes hyphens.

rule_Remove_Leading_Zero Removes a single instance of the numeric character "0" from the beginning of a string.

rule_Remove_Limited_Punctuation Removes extraneous characters. Extraneous characters include slashes, backslashes, periods, exclamation marks, underscores, and multiple consecutive spaces.

rule_Remove_Non_Numbers Removes all characters that are not numeric.

rule_Remove_Parentheses Removes right and left parenthesis symbols.

rule_Remove_Period Removes periods.

rule_Remove_Period_Parentheses Removes the following characters:
- Left and right parentheses
- Periods

rule_Remove_Punctuation Removes punctuation symbols from input data.

rule_Remove_Punctuation_and_Space Removes all punctuation and all space characters.

rule_Remove_Quotation Removes quotation marks.

rule_Remove_Slashes Removes forward and back slashes.

rule_Remove_Space Removes all space characters.

rule_Replace_Ampersand_With_Space Replaces ampersands with spaces.

rule_Replace_Hyphen_Underscore_with_Space Replaces hyphens and underscores with spaces.

rule_Replace_Hyphen_with_Space Replaces hyphens with spaces.

rule_Replace_Limited_Punct_with_Space Replaces the following punctuation characters with a single space: dash, backslash, period, exclamation mark, and underscore. This rule also replaces two, three, and four consecutive spaces with a single space.

rule_Replace_Non_Alphabetic_with_Space Replaces numerals and punctuation characters with a single space.

rule_Replace_Period_With_Space Replaces periods with a single space.

rule_Replace_Punctuation_with_Space Replaces all punctuation with spaces.

rule_Replace_Slashes_With_Space Replaces forward slashes and back slashes with spaces.

rule_Reverse_String_Input Reverses the order of characters in input strings.

rule_TitleCase Replaces strings with title case strings. In title case strings, the first letter of each word is capitalized.

rule_Translate_Diacritic_Characters Replaces diacritic characters with ASCII equivalents. For example, the rule converts "ã" to "a".

rule_UpperCase Returns all alphabetic characters in upper case.
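To make a few of the general data cleansing behaviors in the table concrete, the following Python sketch approximates three of them: a Luhn check, extra-space removal, and diacritic translation. The function names are hypothetical and the logic is an independent approximation. It is not the implementation inside the accelerator rules.

```python
import re
import unicodedata


def luhn_is_valid(number: str) -> bool:
    """Validate a numeric string with the Luhn algorithm, ignoring non-digits."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def remove_extra_spaces(text: str) -> str:
    """Collapse runs of spaces to one space and trim both ends."""
    return re.sub(r" {2,}", " ", text).strip()


def translate_diacritics(text: str) -> str:
    """Drop combining marks after decomposition, so that 'ã' becomes 'a'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(luhn_is_valid("4111 1111 1111 1111"))
print(remove_extra_spaces("  Acme   Pty  Ltd "))
print(translate_diacritics("São Paulo"))
```

The diacritic helper only handles characters that decompose into a base letter plus combining marks. Letters such as "ø" have no such decomposition and would need an explicit mapping.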

Core Product Data Cleansing Rules

Use Core product data cleansing rules to parse, standardize, and validate product data.

Core accelerator product data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Product_Data_Cleansing

The following table describes the Core product data cleansing rules:

Name Description

rule_Color_Parse Parses colors using a reference table.

rule_Parse_Quantity_And_UOM Parses the first instance of a quantity and unit of measure (UOM) from a string, reading from left to right. This rule returns the following data:
- Quantity
- Unit of measure
- Input text with quantity and unit of measure removed

rule_UOM_Standardization Standardizes a unit of measure (UOM). This rule returns standardized and unstandardized values for quantity and UOM. It also returns a string that contains the input text with a standardized UOM.

rule_UPC_Validation Validates Universal Product Code (UPC) codes and returns a standardized UPC code.
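The guide does not say which UPC variant or check the validation rule applies. As a sketch, assuming 12-digit UPC-A codes and the standard UPC-A check digit calculation, a validation could look like the following. The function name is hypothetical.

```python
def upc_a_is_valid(code: str) -> bool:
    """Check the final digit of a 12-digit UPC-A code, ignoring non-digits."""
    digits = [int(c) for c in code if c.isdigit()]
    if len(digits) != 12:
        return False
    odd_sum = sum(digits[0:11:2])   # positions 1, 3, ..., 11
    even_sum = sum(digits[1:11:2])  # positions 2, 4, ..., 10
    # The check digit brings the weighted total up to a multiple of 10.
    check = (10 - (odd_sum * 3 + even_sum) % 10) % 10
    return check == digits[11]


print(upc_a_is_valid("036000291452"))
print(upc_a_is_valid("036000291453"))
```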

CHAPTER 3

Australia/New Zealand Accelerator

This chapter includes the following topics:

• Australia/New Zealand Accelerator Overview

• Australia/New Zealand Demonstration Mappings

• Address Data Cleansing Rules

• Australia/New Zealand Contact Data Cleansing Rules

• Australia/New Zealand Corporate Data Cleansing Rules

• Australia/New Zealand General Data Cleansing Dependencies

• Australia/New Zealand Matching and Deduplication Rules

Australia/New Zealand Accelerator Overview

The Australia/New Zealand accelerator validates and enhances data by using specialized data quality processes and region-specific reference tables. This accelerator includes rules, reference tables, demonstration mappings, and demonstration data objects.

The Australia/New Zealand accelerator includes rules that perform the following data quality processes:

• Address data cleansing

• Contact data cleansing

• Corporate data cleansing

• General data cleansing

• Matching and deduplication

This accelerator depends on general data cleansing rules that the Core accelerator installs.

Australia/New Zealand Demonstration Mappings

The Australia/New Zealand demonstration mappings combine accelerator rules to demonstrate complex data quality processes.

Australia/New Zealand accelerator demonstration mappings install to the following repository location:

[Informatica_DQ_Content]\Rules_Demo\AUS_NZL_Accelerator

The accelerator includes the following demonstration mappings:

m_AUS_customer_data_demo

Parses, standardizes, and validates Australia and New Zealand data. The data objects referenced in this mapping use the following path:

<ServerInstallDir>\services\DQContent\INFA_Content\demos\source_data

You may need to modify this path to match your system configuration. To perform address validation, you must also install an Address Doctor license key and reference data set.

m_AUS_customer_matching_demo

Demonstrates standardization and parsing rules that are customized for matching data from Australia and New Zealand.

This mapping analyzes the following data combinations and generates match clusters for each combination:

• Person name and address data

• Person name and phone number

You can connect these match cluster outputs to an Association transformation to generate association IDs. You can then connect the Association transformation output to a Consolidation transformation to identify master records.

The data objects referenced in this mapping use the following path:

<ServerInstallDir>\services\DQContent\INFA_Content\demos\source_data

You may need to modify this path to match your system configuration. To perform address validation, you must also install an Address Doctor license key and reference data set.

Address Data Cleansing Rules

Use Australia/New Zealand address data cleansing rules to parse, standardize, and validate data.

Australia/New Zealand accelerator address data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Address_Data_Cleansing

The following table describes the Australia/New Zealand address data cleansing rules:

Name Description

rule_AUS_Address_Parse_Hybrid Parses partially tokenized Australian addresses into tokens. This rule does not validate address deliverability.

rule_AUS_Address_Parse_Multiline Parses multiline Australian addresses into tokens. This rule does not validate address deliverability.

rule_AUS_Address_Validation_Discrete Validates the deliverability of Australian addresses. Use this rule if the input address data is fully tokenized. This rule requires address reference data and a corresponding license.

rule_AUS_Address_Validation_Hybrid Validates the deliverability of Australian addresses. Use this rule if the input address data is partially tokenized. This rule requires address reference data and a corresponding license.

rule_AUS_Address_Validation_Multiline Validates the deliverability of Australian addresses. Use this rule if the input address data is not tokenized. This rule requires address reference data and a corresponding license.

rule_NZL_Address_Parse_Hybrid Parses partially tokenized New Zealand addresses into tokens. This rule does not validate address deliverability.

rule_NZL_Address_Parse_Multiline Parses multiline New Zealand addresses into tokens. This rule does not validate address deliverability.

rule_NZL_Address_Validation_Discrete Validates the deliverability of New Zealand addresses. Use this rule if the input address data is fully tokenized. This rule requires address reference data and a corresponding license.

rule_NZL_Address_Validation_Hybrid Validates the deliverability of New Zealand addresses. Use this rule if the input address data is partially tokenized. This rule requires address reference data and a corresponding license.

rule_NZL_Address_Validation_Multiline Validates the deliverability of New Zealand addresses. Use this rule if the input address data is not tokenized. This rule requires address reference data and a corresponding license.

Australia/New Zealand Contact Data Cleansing Rules

Use Australia/New Zealand contact data cleansing rules to parse, standardize, and validate data.

Australia/New Zealand accelerator contact data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing

The following table describes the Australia/New Zealand contact data cleansing rules:

Name Description

rule_AUS_Driver_Licence_Number_Validation Validates Australian driving license numbers using length and pattern requirements.

rule_AUS_Gender_Assignment Assigns gender according to Australian first names. For example, this rule assigns the name "John Smith" a gender of "M" for male. This rule returns "M" or "F."

rule_AUS_Given_Name_Standard Generates given names from Australian nicknames.

rule_AUS_Multi_Person_Name_Parse Parses Australian names into name tokens such as title, first name, middle name, and surname.

rule_AUS_Personal_Name_Parsing_FML Parses Australian names in First-Middle-Last format into tokens.

rule_AUS_Personal_Name_Parsing_LFM Parses Australian names in Last-First-Middle format into tokens.

rule_AUS_Phone_Number_Parse Parses an Australian phone number from a string. This rule parses the first phone number in the data, reading from right to left. This rule recognizes phone numbers that use leading zeros, international dialing codes, or extensions that begin with the hash symbol. This rule processes the following punctuation symbols: the plus sign, parentheses, and the hash symbol. Before you run this rule, remove all other punctuation, including double spaces. This rule returns a phone number and also returns a string that contains the input text with the phone number removed.

rule_AUS_Phone_Number_Standardization Standardizes Australian phone numbers to international and local dialing formats. This rule recognizes phone numbers that use leading zeros, international dialing codes, or extensions that begin with the hash symbol.

rule_AUS_Phone_Number_Validation Validates the area code and length of Australian phone numbers. This rule returns the region of the phone number, as well as codes that indicate if the area code and length of a phone number are valid.

rule_AUS_Tax_File_Number_Parse Parses Australian Tax File Numbers (TFN).

rule_AUS_Tax_File_Number_Standardization Standardizes Australian Tax File Numbers (TFN). To configure the standardized format, edit the TFN_Format expression variable in the dq_Format_TFN Expression transformation. Default is "No_punctuation."

rule_AUS_Tax_File_Number_Validation Validates Australian Tax File Numbers (TFN) using a check digit.

rule_NZL_Gender_Assignment Assigns gender according to New Zealand first names. For example, this rule assigns the name "John Smith" a gender of "M" for male. This rule returns "M" or "F."

rule_NZL_Given_Name_Standard Generates given names from New Zealand nicknames.

rule_NZL_IRD_Number_Parse Parses nine-digit numeric strings as New Zealand Inland Revenue Department numbers (IRD).

rule_NZL_IRD_Number_Standardization Standardizes New Zealand Inland Revenue Department numbers (IRD). To configure the standardized format, edit the IRD_Format expression variable in the dq_Format_IRD Expression transformation. Default is "No_punctuation." This rule requires that the input is a nine-digit string.

rule_NZL_IRD_Number_Validate Validates New Zealand Inland Revenue Department numbers (IRD) using a check digit.

rule_NZL_Phone_Number_Parse Parses a New Zealand phone number from a string. This rule parses the first phone number in the data, reading from right to left. This rule recognizes phone numbers that use leading zeros, international dialing codes, or extensions that begin with the hash symbol. This rule processes the following punctuation symbols: the plus sign, parentheses, and the hash symbol. Before you run this rule, remove all other punctuation, including double spaces. This rule returns a phone number and also returns a string that contains the input text with the phone number removed.

rule_NZL_Phone_Number_Standardization Standardizes New Zealand phone numbers to international and local dialing formats. This rule recognizes phone numbers that use leading zeros, international dialing codes, or extensions that begin with the hash symbol.

rule_NZL_Phone_Number_Validation Validates the area code and length of New Zealand phone numbers. This rule returns the region of the phone number, as well as codes that indicate if the area code and length of a phone number are valid.

rule_Prename_Assignment Generates an honorific according to the gender. You can change the female_prename expression variable from Ms. to Mrs.

rule_Salutation_Assignment Generates formal and casual greetings from prenames and name tokens. For example, for "Mr. John Smith," the rule generates the formal greeting "Dear Mr. Smith," and the casual greeting "Dear John,". You can change the prefix and punctuation by editing the variables in the dq_Generate_Salutation Expression transformation.
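The table states only that Tax File Numbers are validated "using a check digit." As a sketch of what such a check can look like, the following Python function applies the publicly documented weighted-sum test for nine-digit TFNs. The weights, the modulus of 11, and the function name are assumptions about the general algorithm. The guide does not confirm that the accelerator rule uses this exact logic, and the sketch does not handle eight-digit TFNs.

```python
# Weights for the publicly documented nine-digit TFN check (assumed here).
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def tfn_is_valid(tfn: str) -> bool:
    """Return True if the weighted digit sum of a nine-digit TFN is divisible by 11."""
    digits = [int(c) for c in tfn if c.isdigit()]
    if len(digits) != 9:
        return False
    weighted_sum = sum(d * w for d, w in zip(digits, TFN_WEIGHTS))
    return weighted_sum % 11 == 0


print(tfn_is_valid("123 456 782"))
print(tfn_is_valid("123456789"))
```

Stripping spaces and punctuation before the check resembles the "No_punctuation" default format that the standardization rule describes.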

Australia/New Zealand Corporate Data Cleansing Rules

Use Australia/New Zealand corporate data cleansing rules to parse, standardize, and validate data.

Australia/New Zealand accelerator corporate data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing

The following table describes the Australia/New Zealand corporate data cleansing rules:

Name Description

rule_AUS_Business_Number_Parse Parses 11-digit numeric strings as Australian Business Numbers (ABN).

rule_AUS_Business_Number_Standardize Standardizes Australian Business Numbers (ABN) into the standard format (99 999 999 999). This rule requires that the input is an 11-digit string.

rule_AUS_Business_Number_Validation Validates Australian Business Numbers (ABN) using a check digit.

rule_AUS_Company_Name_Standardization Standardizes company names using Australian reference table values.
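For Australian Business Numbers, the standardization and check digit steps can be sketched in Python as follows. The sketch assumes the publicly documented ABN test (subtract 1 from the first digit, apply the weights shown, and test divisibility by 89) and a standard layout of two digits followed by three groups of three. The function names are hypothetical, and the guide does not confirm that the accelerator rules work this way internally.

```python
# Weights for the publicly documented ABN check (assumed here).
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def abn_is_valid(abn: str) -> bool:
    """Return True if an 11-digit ABN passes the weighted modulus-89 test."""
    digits = [int(c) for c in abn if c.isdigit()]
    if len(digits) != 11:
        return False
    digits[0] -= 1  # the first digit is reduced by 1 before weighting
    return sum(d * w for d, w in zip(digits, ABN_WEIGHTS)) % 89 == 0


def format_abn(abn: str) -> str:
    """Format an 11-digit ABN as two digits plus three groups of three."""
    digits = "".join(c for c in abn if c.isdigit())
    if len(digits) != 11:
        raise ValueError("an ABN must contain exactly 11 digits")
    return f"{digits[:2]} {digits[2:5]} {digits[5:8]} {digits[8:]}"


print(format_abn("51824753556"))
print(abn_is_valid("51824753556"))
```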

Australia/New Zealand General Data Cleansing Dependencies

The Australia/New Zealand accelerator has dependencies on general data cleansing rules that you install as part of the Core accelerator.

The Australia/New Zealand accelerator has dependencies on the following general data cleansing rules:

• rule_Assign_DQ_90_Mailability_Score_Description

• rule_Assign_DQ_90_Match_Code_Descriptions

• rule_Remove_Extra_Spaces

• rule_Remove_Hyphen

• rule_Remove_Leading_Zero

• rule_Remove_Period_Parentheses

• rule_Remove_Punctuation

• rule_Remove_Punctuation_and_Space

• rule_Remove_Space

• rule_Replace_Limited_Punct_with_Space

• rule_UpperCase

For more information about these rules, see “Core General Data Cleansing Rules.”

Australia/New Zealand Matching and Deduplication Rules

Use Australia/New Zealand matching and deduplication rules to identify duplicate rows in data.

Australia/New Zealand matching and deduplication rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Matching_Deduplication

The following table describes the Australia/New Zealand matching and deduplication rules:

Name Description

mplt_AUS_Firstname_and_TFN_Match

Identifies duplicate rows for Australian data based on Tax File Number (TFN) and first names. This mapplet matches rows using group keys generated from TFN.

mplt_AUS_IMO_Company_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows for Australian data based on company names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_AUS_IMO_Familyname_and_Address_Match

Uses identity match strategies to identify duplicate rows for Australian data based on family names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_AUS_IMO_Individual_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows for Australian data based on person names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_AUS_IMO_Personal_Name_and_Data

Uses identity match strategies to identify duplicate rows for Australian data based on person names and personal data. The fields in the personal data column should contain a single type of data, such as phone number, email, or TFN. This mapplet matches rows using group keys generated from personal data.

mplt_AUS_Individual_Name_and_Address_Match

Identifies duplicate rows based on person names and Australia address data. This mapplet matches rows using group keys generated from the NYSIIS code for the surname and the first two digits of the postal code.

mplt_AUS_Individual_Name_and_Date_Match

Identifies duplicate rows based on Australian person names and dates. This mapplet matches rows using group keys generated from dates.

mplt_AUS_Individual_Name_and_Email_Match

Identifies duplicate rows based on email addresses and Australian person names. This mapplet matches rows using group keys generated from email addresses.

mplt_AUS_Individual_Name_and_Phone_Match

Identifies duplicate rows based on Australian person names and phone numbers. This mapplet matches rows using group keys generated from phone numbers.

mplt_AUS_Individual_Name_and_TFN_Match

Identifies duplicate rows for Australian data based on Tax File Number (TFN) and person names. This mapplet matches rows using group keys generated from TFN.

mplt_AUS_Individual_Name_Match

Identifies duplicate rows based on Australian person names. This mapplet matches rows using group keys generated from the NYSIIS codes for surnames.

mplt_AUS_NZL_Company_Name_and_Address_Match

Identifies duplicate rows based on company name and Australia/New Zealand address data. This mapplet matches rows using group keys generated from the first three characters of the Soundex code for the company name and the first two digits of the postal code.

mplt_AUS_NZL_Familyname_and_Address_Match

Identifies duplicate rows based on surname and Australia/New Zealand address data. This mapplet matches rows using group keys generated from the NYSIIS code for the surname and the first three digits of the postal code.

mplt_Company_Name_Match

Identifies duplicate rows based on company name. This mapplet matches rows using group keys generated from the first three characters of the Soundex code for the company name.

mplt_NZL_Firstname_and_IRD_Match

Identifies duplicate rows for New Zealand data based on Inland Revenue Department (IRD) number and first names. This mapplet matches rows using group keys generated from IRD numbers.

mplt_NZL_IMO_Company_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows for New Zealand data based on company names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_NZL_IMO_Familyname_and_Address_Match

Uses identity match strategies to identify duplicate rows for New Zealand data based on family names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_NZL_IMO_Individual_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows for New Zealand data based on person names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_NZL_IMO_Personal_Name_and_Data

Uses identity match strategies to identify duplicate rows for New Zealand data based on person names and personal data. The fields in the personal data column should contain a single type of data, such as phone number, email, or IRD. This mapplet matches rows using group keys generated from personal data.

mplt_NZL_Individual_Name_and_Address_Match

Identifies duplicate rows based on person names and New Zealand address data. This mapplet matches rows using group keys generated from the NYSIIS code for the surname and the first two digits of the postal code.

mplt_NZL_Individual_Name_and_Date_Match

Identifies duplicate rows based on New Zealand person names and dates. This mapplet matches rows using group keys generated from dates.

mplt_NZL_Individual_Name_and_Email_Match

Identifies duplicate rows based on email addresses and New Zealand person names. This mapplet matches rows using group keys generated from email addresses.

mplt_NZL_Individual_Name_and_IRD_Match

Identifies duplicate rows based on New Zealand person names and Inland Revenue Department (IRD) number. This mapplet matches rows using group keys generated from IRD numbers.

mplt_NZL_Individual_Name_and_Phone_Match

Identifies duplicate rows based on New Zealand person names and phone numbers. This mapplet matches rows using group keys generated from phone numbers.

mplt_NZL_Individual_Name_Match

Identifies duplicate rows based on New Zealand person names. This mapplet matches rows using group keys generated from the NYSIIS codes for surnames.

rule_AUS_NZL_Company_Name_and_Address_MatchScore

Generates a match score by comparing company names and Australia/New Zealand addresses.

rule_AUS_NZL_Familyname_and_Address_MatchScore

Generates a match score by comparing surnames and Australia/New Zealand addresses.

rule_AUS_NZL_Firstname_and_PID_MatchScore

Generates a match score by comparing first names and personal identification numbers.

rule_AUS_NZL_Individual_Name_and_Address_MatchScore

Generates a match score by comparing person names and Australia/New Zealand addresses.

rule_AUS_NZL_Individual_Name_and_PID_MatchScore

Generates a match score by comparing person names and personal identification numbers.

rule_Company_Name_MatchScore

Generates a match score by comparing company names.

rule_Individual_Name_and_Date_MatchScore

Generates a match score by comparing person names and dates.

rule_Individual_Name_and_Email_MatchScore

Generates a match score by comparing person names and email addresses.

rule_Individual_Name_and_Phone_MatchScore

Generates a match score by comparing person names and phone numbers.

rule_Individual_Name_MatchScore

Generates a match score by comparing person names.
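Several mapplets in the table above group rows by a key built from a phonetic code and a postal code prefix before they compare rows. The following Python sketch illustrates that idea for the company name key: the first three characters of the Soundex code followed by the first two digits of the postal code. It is an illustration only and is not the mapplet logic. The function names, the American Soundex variant, and the plain concatenation of the two key parts are assumptions.

```python
# Hypothetical sketch of a Soundex-based group key; not Informatica code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        # Skip vowels and skip repeats of the previous coded consonant.
        if digit and digit != prev:
            code += digit
        # H and W do not separate two consonants that share a code.
        if ch not in "HW":
            prev = digit
    return (code + "000")[:4]


def company_group_key(company_name, postal_code):
    """Assumed key layout: 3 Soundex characters + 2 postal code digits."""
    digits = "".join(ch for ch in postal_code if ch.isdigit())
    return soundex(company_name)[:3] + digits[:2]


# Two similar-sounding names in nearby postal areas share one group key.
key_a = company_group_key("Robert Holdings", "2000")
key_b = company_group_key("Rupert Holdings", "2010")
```

Rows that share a key become candidates for comparison. Rows with different keys are never compared, which is why the choice of key affects both speed and the duplicates that can be found.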

CHAPTER 4

Brazil Accelerator

This chapter includes the following topics:

¨ Brazil Accelerator Overview, 20

¨ Brazil Demonstration Mappings, 20

¨ Brazil Address Data Cleansing Rules, 21

¨ Brazil Contact Data Cleansing Rules, 22

¨ Brazil Corporate Data Cleansing Rules, 23

¨ Brazil General Data Cleansing Dependencies, 23

¨ Brazil Matching and Deduplication Rules, 23

Brazil Accelerator Overview

The Brazil accelerator validates and enhances Brazilian data by using specialized data quality processes and region-specific reference tables. This accelerator includes rules, reference tables, demonstration mappings, and demonstration data objects.

The Brazil accelerator includes rules that perform the following data quality processes:

¨ Address data cleansing

¨ Contact data cleansing

¨ Corporate data cleansing

¨ General data cleansing

¨ Matching and deduplication

This accelerator depends on general data cleansing and contact data cleansing rules that the Core accelerator installs.

Brazil Demonstration Mappings

The Brazil accelerator demonstration mappings combine accelerator rules to demonstrate complex data quality processes.

Brazil accelerator demonstration mappings install to the following repository location:

[Informatica_DQ_Content]\Rules_Demo\BRA_Accelerator

The accelerator includes the following demonstration mappings:

m_BRA_customer_data_demo

Parses, standardizes, and validates Brazilian data. The data objects referenced in this mapping use the following path:

<ServerInstallDir>\services\DQContent\INFA_Content\demos\source_data

You may need to modify this path to match your system configuration. To perform address validation, you must also install an Address Doctor license key and reference data set.

m_BRA_customer_matching_demo

Demonstrates the standardization and parsing rules required for matching Brazilian data.

This mapping analyzes the following data combinations and generates match clusters for each combination:

¨ Person name and address data

¨ Person name and phone number

You can connect these match cluster outputs to an Association transformation to generate AssociationIDs. You can then connect the Association transformation output to a Consolidation transformation to identify master records.

The data objects referenced in this mapping use the following path:

<ServerInstallDir>\services\DQContent\INFA_Content\demos\source_data

You may need to modify this path to match your system configuration. To perform address validation, you must also install an Address Doctor license key and reference data set.
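The association step described for this mapping can be pictured as linking any rows that share a match cluster in at least one of the data combinations. The guide does not describe how the Association transformation works internally, so the following Python sketch is only an analogy. It uses a union-find structure, and the function name, input layout, and ID numbering are assumptions.

```python
# Hypothetical sketch: derive one association ID per row from several
# match cluster columns. Not the Association transformation itself.
def associate(cluster_columns):
    """cluster_columns: list of columns, each with one cluster ID per row."""
    row_count = len(cluster_columns[0])
    parent = list(range(row_count))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for column in cluster_columns:
        first_row_for_cluster = {}
        for row, cluster_id in enumerate(column):
            if cluster_id in first_row_for_cluster:
                union(first_row_for_cluster[cluster_id], row)
            else:
                first_row_for_cluster[cluster_id] = row

    # Number the linked groups 1, 2, 3, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(row), len(ids) + 1) for row in range(row_count)]


# Rows 0 and 1 share a name-and-address cluster; rows 1 and 2 share a
# name-and-phone cluster, so rows 0, 1, and 2 end up associated.
name_address_clusters = [1, 1, 2, 3]
name_phone_clusters = [10, 11, 11, 12]
association_ids = associate([name_address_clusters, name_phone_clusters])
```

In this picture, a later consolidation step would pick one master record from each group of rows that share an association ID.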

Brazil Address Data Cleansing Rules

Use Brazil address data cleansing rules to parse, standardize, and validate data.

Brazil accelerator address data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Address_Data_Cleansing

The following table describes the Brazil address data cleansing rules:

Name Description

rule_BRA_Address_Parse_Hybrid

Parses partially tokenized Brazilian addresses into tokens. This rule does not validate address deliverability.

rule_BRA_Address_Parse_Multiline

Parses multiline Brazilian addresses into tokens. This rule does not validate address deliverability.

rule_BRA_Address_Validation_Discrete

Validates the deliverability of Brazilian addresses. Use this rule if the input address data is fully tokenized. This rule requires address reference data and a corresponding license.

rule_BRA_Address_Validation_Hybrid

Validates the deliverability of Brazilian addresses. Use this rule if the input address data is partially tokenized. This rule requires address reference data and a corresponding license.

rule_BRA_Address_Validation_Multiline

Validates the deliverability of Brazilian addresses. Use this rule if the input address data is not tokenized. This rule requires address reference data and a corresponding license.

Brazil Contact Data Cleansing Rules

Use Brazil contact data cleansing rules to parse, standardize, and validate data.

Brazil accelerator contact data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing

The following table describes the Brazil contact data cleansing rules:

Name Description

rule_BRA_Gender_Assignment

Assigns gender according to first name. For example, this rule assigns the name "John Smith" a gender of "M" for male. This rule returns "M" or "F."

rule_BRA_Given_Name_Standard

Generates given names from Brazilian nicknames.

rule_BRA_Personal_CPF_Validation

Validates check digits for Cadastro de Pessoas Físicas (CPF) numbers.

rule_BRA_Personal_Name_Parse_Validate

Parses Brazilian person names and validates spelling. This rule also provides information about whether the person name is potentially a company name.

rule_BRA_Phone_Number_Parse

Parses a Brazilian phone number from a string. This rule parses the first phone number in the data, reading from left to right. This rule returns a phone number and also returns a string that contains the input text with the phone number removed.

rule_BRA_Phone_Number_Standardization

Standardizes Brazilian phone numbers. The rule returns the phone number in the following formats:
- Standard: nn nnnn nnnn
- Dashes: nn-nnnn-nnnn
- No Spaces: nnnnnnnnnn

rule_BRA_Phone_Validatation

Validates the area code and length of Brazilian phone numbers. This rule returns codes that indicate if the area code and length of a phone number are valid.

rule_BRA_Prename_Assignment

Generates an honorific according to the gender. You can change the female_prename expression variable from "Sra" to "Sta".

rule_BRA_Salutation_Assignment

Generates formal and casual greetings from prenames and name tokens. For example, for "Sr John Smith," the rule generates the formal greeting "Prezado Sr Smith," and the casual greeting "Prezado John,". You can change the prefix and punctuation by editing the variables in the dq_Generate_Salutation Expression transformation.
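The CPF check-digit validation listed in the table above can be illustrated with the publicly documented modulo-11 scheme for CPF numbers. The following Python sketch assumes that scheme and checks only the two check digits. The function names and the accepted punctuation are assumptions, and the sketch does not reproduce the outputs of rule_BRA_Personal_CPF_Validation.

```python
import re


def _cpf_check_digit(digits):
    """Modulo-11 check digit: weights run from len(digits) + 1 down to 2."""
    weights = range(len(digits) + 1, 1, -1)
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def cpf_check_digits_valid(cpf):
    """Return True if the last two digits match the computed check digits."""
    # Assumption: dots, dashes, and spaces are formatting and can be dropped.
    cleaned = re.sub(r"[.\-\s]", "", cpf)
    if not re.fullmatch(r"[0-9]{11}", cleaned):
        return False
    digits = [int(ch) for ch in cleaned]
    first = _cpf_check_digit(digits[:9])
    second = _cpf_check_digit(digits[:10])
    return digits[9] == first and digits[10] == second


is_valid = cpf_check_digits_valid("529.982.247-25")
```

A production check would probably also reject inputs such as eleven identical digits, which satisfy the arithmetic but are not issued numbers. That extra step is left out here to keep the sketch focused on the check digits.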

Dependencies on Core Contact Data Cleansing Rules

The Brazil accelerator depends on the following contact data cleansing rules from the Core accelerator:

¨ rule_Email_Parse_Into_Mailbox_Domain

¨ rule_Email_Validation

For more information about these rules, see “Core Contact Data Cleansing Rules” on page 6.

Brazil Corporate Data Cleansing Rules

Use Brazil corporate data cleansing rules to standardize and validate data.

Brazil corporate data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing

The following table describes the Brazil corporate data cleansing rules:

Name Description

rule_BRA_Company_CNPJ_Validation

Validates Cadastro Nacional da Pessoa Jurídica (CNPJ) numbers. CNPJ numbers identify Brazilian companies.

rule_BRA_Company_Suffix_Standardization

Standardizes Brazilian company suffixes.
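The guide does not state which checks rule_BRA_Company_CNPJ_Validation performs. As an illustration, the following Python sketch assumes that validation includes the publicly documented modulo-11 check digits of the 14-digit numeric CNPJ format. The function names, the weight constants, and the accepted punctuation are assumptions.

```python
import re

# Weights for the first check digit, applied to the first 12 digits.
FIRST_WEIGHTS = (5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2)
# Weights for the second check digit, applied to the first 13 digits.
SECOND_WEIGHTS = (6,) + FIRST_WEIGHTS


def _cnpj_check_digit(digits, weights):
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def cnpj_check_digits_valid(cnpj):
    """Return True if both check digits of a numeric CNPJ are consistent."""
    # Assumption: dots, slashes, dashes, and spaces are formatting only.
    cleaned = re.sub(r"[.\-/\s]", "", cnpj)
    if not re.fullmatch(r"[0-9]{14}", cleaned):
        return False
    digits = [int(ch) for ch in cleaned]
    return (_cnpj_check_digit(digits[:12], FIRST_WEIGHTS) == digits[12]
            and _cnpj_check_digit(digits[:13], SECOND_WEIGHTS) == digits[13])


is_valid = cnpj_check_digits_valid("11.222.333/0001-81")
```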

Brazil General Data Cleansing Dependencies

The Brazil accelerator has dependencies on general data cleansing rules that you install as part of the Core accelerator.

The Brazil accelerator has dependencies on the following general data cleansing rules:

¨ rule_Assign_DQ_90_Mailability_Score_Description

¨ rule_Assign_DQ_90_Match_Code_Descriptions

¨ rule_Remove_Extra_Spaces

¨ rule_Remove_Punctuation

¨ rule_Replace_Limited_Punct_with_Space

¨ rule_TitleCase

¨ rule_UpperCase

For more information about these rules, see “Core General Data Cleansing Rules” on page 7.

Brazil Matching and Deduplication Rules

Use Brazil matching and deduplication rules to generate match scores and identify duplicate rows.

Brazil matching and deduplication rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Matching_Deduplication

The following table describes the Brazil matching and deduplication rules:

Name Description

mplt_BRA_Company_Name_and_Address_Match

Identifies duplicate rows based on company name and Brazil address data. This mapplet matches rows using group keys generated from the first three characters of the Soundex code for the company name and the first three digits of the postal code.

mplt_BRA_Familyname_and_Address_Match

Identifies duplicate rows based on surname and Brazilian address data. This mapplet matches rows using group keys generated from the NYSIIS code for the surname and the first three digits of the postal code.

mplt_BRA_Firstname_and_CPF_Match

Identifies duplicate rows based on first name and Cadastro de Pessoas Físicas (CPF) number. This mapplet matches rows using group keys generated from the CPF number.

mplt_BRA_IMO_Company_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows for Brazilian data based on company names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_BRA_IMO_Familyname_and_Address_Match

Uses identity match strategies to identify duplicate rows for Brazilian data based on family names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_BRA_IMO_Individual_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows for Brazilian data based on person names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_BRA_IMO_Personal_Name_and_Data

Uses identity match strategies to identify duplicate rows for Brazilian data based on person names and personal data. The fields in the personal data column should contain a single type of data, such as phone number, email, or CPF. This mapplet matches rows using group keys generated from personal data.

mplt_BRA_Individual_Name_and_Address_Match

Identifies duplicate rows based on person names and Brazilian address data. This mapplet matches rows using group keys generated from the NYSIIS code for the surname and the first three digits of the postal code.

mplt_BRA_Individual_Name_and_CPF_Match

Identifies duplicate rows based on person names and Cadastro de Pessoas Físicas (CPF) numbers. This mapplet matches rows using group keys generated from the CPF number.

mplt_BRA_Individual_Name_and_Date_Match

Identifies duplicate rows based on Brazilian person names and date data. This mapplet matches rows using group keys generated from dates.

mplt_BRA_Individual_Name_and_Email_Match

Identifies duplicate rows based on Brazilian person names and email addresses. This mapplet matches rows using group keys generated from email addresses.

mplt_BRA_Individual_Name_and_Phone_Match

Identifies duplicate rows based on Brazilian person names and phone numbers. This mapplet matches rows using group keys generated from phone numbers.

mplt_Company_Name_Match

Identifies duplicate rows based on company name. This mapplet matches rows using group keys generated from the first three characters of the Soundex code for the company name.

rule_BRA_Company_Name_and_Address_MatchScore

Generates a match score by comparing company names and Brazilian address data.

rule_BRA_Familyname_and_Address_MatchScore

Generates a match score by comparing surnames and Brazilian address data.

rule_BRA_Firstname_and_CPF_MatchScore

Generates a match score by comparing first name and Cadastro de Pessoas Físicas (CPF) number.

rule_BRA_Individual_Name_and_Address_MatchScore

Generates a match score by comparing person names and Brazilian address data.

rule_BRA_Individual_Name_and_CPF_MatchScore

Generates a match score by comparing person names and Cadastro de Pessoas Físicas (CPF) numbers.

rule_BRA_Individual_Name_and_Phone_MatchScore

Generates a match score by comparing person names and phone numbers.

rule_Company_Name_MatchScore

Generates a match score by comparing company names.

rule_Individual_Name_and_Date_MatchScore

Generates a match score by comparing person names and dates.

rule_Individual_Name_and_Email_MatchScore

Generates a match score by comparing person names and email addresses.
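The mapplets and match score rules in the table above follow a common pattern: group rows by a key, then score pairs of rows inside each group. The following Python sketch shows that pattern for person names grouped by email address. The guide does not describe how match scores are calculated, so the similarity measure (the standard library difflib ratio), the 0.85 threshold, and all function and field names are assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def match_score(name_a, name_b):
    """Assumed stand-in score: similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def find_duplicates(rows, threshold=0.85):
    """Return index pairs of rows that share a group key and score highly."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        # Group key: the normalized email address.
        groups[row["email"].strip().lower()].append(index)

    pairs = []
    for indexes in groups.values():
        # Only rows inside the same group are compared with each other.
        for a, b in combinations(indexes, 2):
            if match_score(rows[a]["name"], rows[b]["name"]) >= threshold:
                pairs.append((a, b))
    return pairs


rows = [
    {"name": "Maria Souza", "email": "maria@example.com"},
    {"name": "Maria Sousa", "email": "MARIA@example.com "},
    {"name": "Maria Souza", "email": "other@example.com"},
]
duplicates = find_duplicates(rows)
```

The third row has the same name as the first but a different group key, so it is never compared. This is the trade-off of grouping: fewer comparisons, at the cost of missing duplicates whose keys differ.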

CHAPTER 5

Financial Services Accelerator

This chapter includes the following topics:

¨ Financial Services Accelerator Overview, 26

¨ Financial Services Contact Data Cleansing Rules, 26

¨ Financial Services Financial Data Cleansing Rules, 27

¨ Financial Services General Data Cleansing Rules, 29

¨ Financial Services Matching and Deduplication Rules, 30

Financial Services Accelerator Overview

The Financial Services accelerator validates and enhances data by using specialized data quality processes and reference tables. This accelerator includes rules and reference data.

The Financial Services accelerator includes rules that perform the following data quality processes:

¨ Contact data cleansing

¨ Financial data cleansing

¨ General data cleansing

¨ Matching and deduplication

This accelerator depends on general data cleansing rules from the Core accelerator.

Financial Services Contact Data Cleansing Rules

Use the Financial Services contact data cleansing rule to standardize contact data.

The Financial Services accelerator contact data cleansing rule installs to the following repository location:

[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing

The following table describes the Financial Services contact data cleansing rule:

Name Description

rule_USA_Given_Name_Standard

Generates given names from United States nicknames. For example, this rule standardizes the nickname "Bob" to the given name "Robert."

Financial Services Financial Data Cleansing Rules

Use Financial Services financial data cleansing rules to parse, standardize, and validate financial data.

Financial Services financial data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Financial_Data_Cleansing

The following table describes the Financial Services financial data cleansing rules:

Name Description

rule_Account_Status_Validation

Validates the account status using reference table values. This rule requires account status reference data.

rule_Accrual_Period_Validation

Validates that the start date is earlier than the end date.

rule_Age_For_Account_Validation

Validates the customer age for the account type using reference table values. This rule uses the age_per_account_infa reference table. You must update this reference table with your own data.

rule_Beta_Coefficient_Validation

Validates that the Beta coefficient string is a number. This rule returns whether the string is a positive number, negative number, zero, or not a number.

rule_BIC_SWIFT_Code_Validation

Validates a BIC or SWIFT code by pattern recognition and country code validation.

rule_CAN_Transit_Number_Validation

Validates the pattern of a Canadian transit number used in paper and electronic funds transactions.

rule_Credit_Card_Expiry_Check

Validates a credit card expiration date. This rule compares the credit card expiration date to the system date and identifies expired dates. This rule accepts a seven-character string in the format MM/YYYY.

rule_Credit_Card_Security_Code_Validation

Validates that the credit card security code is a whole number containing three or four digits.

rule_Currency_Code_Country_Validation

Validates that the currency code is valid for the ISO three-character country code.

rule_Currency_Code_Validation

Validates the currency code. This rule returns "Valid" or "Invalid."

rule_CUSIP_Validation

Validates the format and length of the check digit value. This rule returns a status that describes the validity of the check digit value and a message that explains the status.

rule_Delta_Validation

Validates that the delta value is positive, negative, or zero.

rule_Dividend_Yield_Validation

Validates that the dividend yield string is a number greater than or equal to zero. This rule returns whether the string is a positive number, negative number, zero, or not a number.

rule_EAD_Drawn_Balance_Validation

Validates that the amount listed in the EAD is not less than the drawn balance. This rule follows the guidelines for EAD calculation by the Financial Services Authority in the United Kingdom.

rule_EAD_Validation

Validates that the EAD string is a number. This rule returns whether the string is a positive number, negative number, zero, or not a number.

rule_EPS_Validation

Validates that the input is a number greater than or equal to zero.

rule_Ex_Dividend_Date_Validation

Validates that the ex-dividend date and the record date are valid dates and that the ex-dividend date is before the record date. This rule identifies dates with a difference of more than 15 days as not valid. This rule returns the difference in days between the record date and the ex-dividend date.

rule_Gamma_Validation

Validates that the Gamma string is a number. This rule returns whether the string is a positive number, negative number, zero, or not a number.

rule_GBR_Bank_Account_Parse

Parses 8-digit numeric strings as U.K. bank account numbers.

rule_GBR_Bank_Account_Validation

Validates U.K. bank account numbers. This rule returns codes that indicate whether the input is numeric and whether it is the correct number of digits.

rule_GBR_Bank_Sort_Code_Parse

Parses 6-digit numeric strings as U.K. bank sort codes. This rule parses strings of numbers in the following formats:
- Consecutive numbers (999999)
- Numbers delimited with a dash (99-99-99)

rule_GBR_Bank_Sort_Code_Standardise

Standardizes a U.K. bank sort code to the format "NN-NN-NN."

rule_GBR_Bank_Sort_Code_Validation

Validates the format and length of U.K. bank sort codes that are standardized to the dash-delimited format (99-99-99). This rule returns a Status port that describes the validity of the sort code and a Validation Note port that explains the status. If the sort code prefix matches a known assignment for a U.K. bank, the Validation Note port includes the bank name.

rule_Interest_Rate_Within_Range

Validates if the decimal value is within the specified range. The range is set using the two variable ports in the Expression transformation. This rule returns "True" or "False."

rule_ISIN_Code_Validation

Validates an ISIN code by checking the format and check digit.

rule_Loan_to_Value_Ratio

Calculates the loan to value ratio, which is the loan amount divided by the property value.

rule_Loss_Given_Default_Validation

Validates that the string is numeric and a positive, negative, or zero value.

rule_Market_Cap_Validation

Validates that the input is a number greater than or equal to zero.

rule_Maturity_Date_Validation

Validates that the maturity date is later than the system date.

rule_Positive_Close_Price_Value_Validation

Validates that the input is a number greater than zero.

rule_Positive_Coupon_Percent_Validation

Validates that the input is a number greater than zero.

rule_Positive_Last_Price_Value_Validation

Validates that the input is a number greater than zero.

rule_Positive_Open_Price_Validation

Validates that the input is a number greater than zero.

rule_Positive_Volume_Validation

Validates that the input is a number greater than zero.

rule_Price_Earnings_Ratio_Validation

Validates that the price-to-earnings ratio is a positive number in the range of 0 - 100.

rule_Probability_of_Default_Validation

Validates that the probability of default value is numeric and indicates if it is a positive, negative, or zero value. If positive, this rule returns status messages for values in the following ranges:
- <= .1
- > .1 and <= .5
- > .5 and <= 1
- > 1

rule_Rating_Code_Validation

Validates that a rating is in the Standard & Poor's ratings scale, the Moody's ratings scale, or in a user-defined list.

rule_Rating_Date_Validation

Validates that the rating date is one year greater than the system date.

rule_Risk_Weighted_Asset_Validation

Validates that a risk weighted asset is a positive number.

rule_SEDOL_Validation

Validates a SEDOL code by checking its format and check digit.

rule_Stock_Exchange_Validation

Validates most stock exchanges worldwide by name and symbol.

rule_USA_Routing_Number_Validation

Validates a standard MICR formatted routing number. Validates the associated Federal Reserve Bank, the structure of the input, and the checksum calculation.

rule_Volatility_Validation

Validates that the volatility value is a number greater than or equal to zero.
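Several rules in the table above validate a check digit or checksum, including the ISIN, SEDOL, CUSIP, and U.S. routing number rules. The following Python sketch shows the publicly documented check-digit schemes for those four identifiers. It assumes that the rules follow these published schemes, it covers format and checksum only, and it omits other checks that the rules describe, such as the Federal Reserve Bank lookup. All function names are assumptions.

```python
import re


def _char_value(ch):
    """Map 0-9 to 0-9 and A-Z to 10-35."""
    return int(ch) if ch.isdigit() else ord(ch) - ord("A") + 10


def isin_valid(code):
    """ISIN: expand letters to numbers, then apply the Luhn algorithm."""
    code = code.strip().upper()
    if not re.fullmatch(r"[A-Z]{2}[A-Z0-9]{9}[0-9]", code):
        return False
    expanded = "".join(str(_char_value(ch)) for ch in code)
    total = 0
    for position, ch in enumerate(reversed(expanded)):
        digit = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            digit = digit * 2 - 9 if digit > 4 else digit * 2
        total += digit
    return total % 10 == 0


def sedol_valid(code):
    """SEDOL: weighted sum of all seven characters must end in zero."""
    code = code.strip().upper()
    if not re.fullmatch(r"[0-9BCDFGHJKLMNPQRSTVWXYZ]{6}[0-9]", code):
        return False
    weights = (1, 3, 1, 7, 3, 9, 1)
    return sum(w * _char_value(ch) for w, ch in zip(weights, code)) % 10 == 0


def cusip_valid(code):
    """CUSIP: digit sums of the first eight values, every second one doubled."""
    code = code.strip().upper()
    if not re.fullmatch(r"[A-Z0-9*@#]{8}[0-9]", code):
        return False
    special = {"*": 36, "@": 37, "#": 38}
    total = 0
    for position, ch in enumerate(code[:8]):
        value = special[ch] if ch in special else _char_value(ch)
        if position % 2 == 1:
            value *= 2
        total += value // 10 + value % 10
    return (10 - total % 10) % 10 == int(code[8])


def aba_routing_checksum_valid(number):
    """U.S. routing number: the 3-7-1 weighted sum must be a multiple of 10."""
    if not re.fullmatch(r"[0-9]{9}", number):
        return False
    d = [int(ch) for ch in number]
    total = (3 * (d[0] + d[3] + d[6]) + 7 * (d[1] + d[4] + d[7])
             + (d[2] + d[5] + d[8]))
    return total % 10 == 0


results = (isin_valid("US0378331005"), sedol_valid("0263494"),
           cusip_valid("037833100"), aba_routing_checksum_valid("021000021"))
```

Each function returns only True or False. The rules in the table return richer outputs, such as status values and explanatory messages, which this sketch does not attempt to reproduce.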

Financial Services General Data Cleansing Rules

Use Financial Services general data cleansing rules to identify the type of information contained in input fields.

Financial Services accelerator general data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\General_Data_Cleansing

The following table describes the Financial Services general data cleansing rules:

Name Description

rule_Postive_Number_Validation

Validates that the number is greater than zero.

Dependencies on Core General Data Cleansing Rules

The Financial Services accelerator depends on the following general data cleansing rules from the Core accelerator:

¨ rule_Remove_Punctuation

¨ rule_Remove_Punctuation_and_Space

¨ rule_Remove_Space

¨ rule_UpperCase

For more information about these rules, see “Core General Data Cleansing Rules” on page 7.

Financial Services Matching and Deduplication Rules

Use Financial Services matching and deduplication rules to generate match scores and identify duplicate records.

Financial Services matching and deduplication rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Matching_Deduplication

The following table describes the Financial Services matching and deduplication rules:

Name Description

mplt_Company_Name_and_Address_Match

Identifies duplicate rows based on company name and U.S. address data. This mapplet matches rows using group keys generated from the first three characters of the Soundex code for the company name and the first three digits of the postal code.

mplt_Company_Name_Match

Identifies duplicate rows based on company name. This mapplet matches rows using group keys generated from the first three characters of the Soundex code for the company name.

mplt_Familyname_and_Address_Match

Identifies duplicate rows based on surname and U.S. address data. This mapplet matches rows using group keys generated from the NYSIIS code for the surname and the first three digits of the postal code.

mplt_Individual_Name_and_Address_Match

Identifies duplicate rows based on person names and U.S. address data. This mapplet matches rows using group keys generated from the NYSIIS code for the surname and the first three digits of the postal code.

mplt_Individual_Name_and_Date_Match

Identifies duplicate rows based on person names and date data. This mapplet matches rows using group keys generated from dates.

mplt_Individual_Name_and_Email_Match

Identifies duplicate rows based on person names and email addresses. This mapplet matches rows using group keys generated from email addresses.

mplt_Individual_Name_and_Phone_Match

Identifies duplicate rows based on person names and phone numbers. This mapplet matches rows using group keys generated from phone numbers.

mplt_Individual_Name_Match

Identifies duplicate rows based on person names. This mapplet matches rows using group keys generated from the NYSIIS codes for surnames.

rule_Company_Name_and_Address_MatchScore

Generates a match score by comparing company names and U.S. address data.

rule_Company_Name_MatchScore

Generates a match score by comparing company names.

rule_Familyname_and_Address_MatchScore

Generates a match score by comparing surnames and U.S. address data.

rule_Individual_Name_and_Address_MatchScore

Generates a match score by comparing person names and U.S. address data.

rule_Individual_Name_and_Date_MatchScore

Generates a match score by comparing person names and dates.


rule_Individual_Name_and_Email_MatchScore

Generates a match score by comparing person names and email addresses.

rule_Individual_Name_and_Phone_MatchScore

Generates a match score by comparing person names and phone numbers.

rule_Individual_Name_MatchScore

Generates a match score by comparing person names.
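The group key described for mplt_Company_Name_and_Address_Match can be illustrated with a short Python sketch. This is not the mapplet's implementation: it assumes the standard American Soundex variant (the guide does not say which variant the mapplet uses), and the function names `soundex` and `company_group_key` are hypothetical.

```python
import re

# Standard American Soundex letter-to-digit codes (assumed variant).
_SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = re.sub(r"[^A-Z]", "", name.upper())
    if not letters:
        return ""
    code = letters[0]
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        # H and W do not separate letters that share a code; vowels do.
        if ch not in "HW":
            previous = digit
    return (code + "000")[:4]


def company_group_key(company_name: str, postal_code: str) -> str:
    """Hypothetical key: first 3 Soundex characters + first 3 postal code digits."""
    digits = re.sub(r"[^0-9]", "", postal_code)
    return soundex(company_name)[:3] + digits[:3]


print(company_group_key("Informatica Corporation", "94063-1234"))
```

Rows that share a key land in the same group, so only rows within a group need to be compared with each other.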


CHAPTER 6

Portugal Accelerator

This chapter includes the following topics:

• Portugal Accelerator Overview

• Portugal Demonstration Mappings

• Address Data Cleansing Rules

• Portugal Contact Data Cleansing Rules

• Portugal Corporate Data Cleansing Rules

• Portugal General Data Cleansing Dependencies

• Portugal Matching and Deduplication Rules

Portugal Accelerator Overview

The Portugal accelerator validates and enhances Portuguese data by using specialized data quality processes and region-specific reference tables. This accelerator includes rules, reference tables, demonstration mappings, and demonstration data objects.

The Portugal accelerator includes rules that perform the following data quality processes:

• Address data cleansing

• Contact data cleansing

• Corporate data cleansing

• Matching and deduplication

This accelerator depends on general data cleansing rules that the Core accelerator installs.

Portugal Demonstration Mappings

The Portugal accelerator demonstration mappings combine accelerator rules to demonstrate complex data quality processes.

Portugal accelerator demonstration mappings install to the following repository location:

[Informatica_DQ_Content]\Rules_Demo\PRT_Accelerator

The accelerator includes the following demonstration mappings:


m_PRT_customer_data_demo

Parses, standardizes, and validates Portuguese data. The data objects referenced in this mapping use the following path:

<ServerInstallDir>\services\DQContent\INFA_Content\demos\source_data

You may need to modify this path to match your system configuration. To perform address validation, you must also install an Address Doctor license key and reference data set.

m_PRT_customer_matching_demo

Demonstrates the standardization and parsing rules required for matching Portuguese data.

This mapping analyzes the following data combinations and generates match clusters for each combination:

• Person name and address data

• Person name and phone number

You can connect these match cluster outputs to an Association transformation to generate AssociationIDs. You can then connect the Association transformation output to a Consolidation transformation to identify master records.

The data objects referenced in this mapping use the following path:

<ServerInstallDir>\services\DQContent\INFA_Content\demos\source_data

You may need to modify this path to match your system configuration. To perform address validation, you must also install an Address Doctor license key and reference data set.
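The idea of combining two sets of match clusters into association IDs can be sketched in Python. This is a conceptual illustration only, not the Association transformation itself: it assumes that two records belong together whenever they share a cluster in either match output, and the function name and the dictionary inputs are hypothetical.

```python
from collections import defaultdict


def assign_association_ids(record_ids, *cluster_maps):
    """Link records that share a cluster ID in any of the given match outputs.

    Each cluster map is a dict of record ID -> cluster ID for one match
    strategy, and every key must appear in record_ids.
    Returns a dict of record ID -> association ID.
    """
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Follow parent links to the root, shortening the path on the way.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for cluster_map in cluster_maps:
        members = defaultdict(list)
        for rid, cluster_id in cluster_map.items():
            members[cluster_id].append(rid)
        for rids in members.values():
            for other in rids[1:]:
                root_a, root_b = find(rids[0]), find(other)
                if root_a != root_b:
                    parent[root_b] = root_a

    # Number the linked groups in order of first appearance.
    association_ids = {}
    roots = {}
    for rid in record_ids:
        root = find(rid)
        association_ids[rid] = roots.setdefault(root, len(roots) + 1)
    return association_ids


name_address = {1: "A", 2: "A", 3: "B", 4: "C", 5: "D"}
name_phone = {1: "X", 2: "Y", 3: "Y", 4: "Z", 5: "W"}
print(assign_association_ids([1, 2, 3, 4, 5], name_address, name_phone))
```

In this example records 1 and 2 match on name and address, and records 2 and 3 match on name and phone, so all three receive the same association ID. Choosing a master record from each group is a separate step that this sketch does not cover.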

Address Data Cleansing Rules

Use Portugal address data cleansing rules to parse and validate data.

Portugal accelerator address data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Address_Data_Cleansing

The following table describes the Portugal address data cleansing rules:

Name Description

rule_PRT_Address_Parse_Hybrid

Parses partially tokenized Portuguese addresses into tokens. This rule does not validate address deliverability.

rule_PRT_Address_Parse_Multiline

Parses multiline Portuguese addresses into tokens. This rule does not validate address deliverability.

rule_PRT_Address_Validation_Discrete

Validates the deliverability of Portuguese addresses. Use this rule if the input address data is fully tokenized. This rule requires address reference data and a corresponding license.

rule_PRT_Address_Validation_Hybrid

Validates the deliverability of Portuguese addresses. Use this rule if the input address data is partially tokenized. This rule requires address reference data and a corresponding license.

rule_PRT_Address_Validation_Multiline

Validates the deliverability of Portuguese addresses. Use this rule if the input address data is not tokenized. This rule requires address reference data and a corresponding license.


Portugal Contact Data Cleansing Rules

Use Portugal contact data cleansing rules to parse, standardize, and validate data.

Portugal accelerator contact data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing

The following table describes the Portugal contact data cleansing rules:

Name Description

rule_PRT_Gender_Assignment

Assigns gender according to first name. For example, this rule assigns the name "John Smith" a gender of "M" for male. This rule returns "M" or "F."

rule_PRT_NIF_Parse

Parses Número de Identificação Fiscal (NIF) numbers from strings. This rule returns the ID numbers and also returns a string that contains the input text with the ID numbers removed.

rule_PRT_NIF_Standardization

Standardizes Número de Identificação Fiscal (NIF) numbers into a nine-digit string. This rule removes alphabetic characters, symbols, and spaces.

rule_PRT_NIF_Validate

Validates Número de Identificação Fiscal (NIF) numbers by using a check digit algorithm. This rule requires that the input is a nine-digit numeric string with no spaces.

rule_PRT_Personal_Name_Parse_Validate

Parses Portuguese person names and validates spelling. This rule also provides information about whether the person name is potentially a company name.

rule_PRT_Phone_Number_Parse

Parses a Portuguese phone number from a string. This rule parses the first phone number in the data, reading from right to left. This rule returns a phone number and also returns a string that contains the input text with the phone number removed.

rule_PRT_Phone_Number_Standardization

Standardizes Portuguese phone numbers to international and local dialing formats.

rule_PRT_Prename_Assignment

Generates an honorific according to the gender. You can change the female_prename expression variable from "Sra" to "Sta".

rule_PRT_Salutation_Assignment

Generates formal and casual greetings from prenames and name tokens. For example, for "Sr John Smith," the rule generates the formal greeting "Prezado Sr Smith," and the casual greeting "Prezado John,". You can change the prefix and punctuation by editing the variables in the dq_Generate_Salutation Expression transformation.

rule_PRT_Given_Name_Standard

Generates given names from Portuguese nicknames.

rule_PRT_Phone_Number_Validation

Validates the area code and length of Portuguese phone numbers. This rule returns the region of the phone number, as well as codes that indicate if the area code and length of a phone number are valid.
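The NIF standardization and validation steps in the table above can be sketched in Python. The guide says only that validation uses "a check digit algorithm"; the sketch assumes the commonly documented modulus 11 scheme for NIF numbers, and the function names are hypothetical rather than part of the accelerator.

```python
import re


def standardize_nif(raw: str) -> str:
    """Remove alphabetic characters, symbols, and spaces, keeping the digits."""
    return re.sub(r"[^0-9]", "", raw)


def is_valid_nif(nif: str) -> bool:
    """Check a nine-digit NIF against a modulus 11 check digit (assumed scheme)."""
    if not re.fullmatch(r"[0-9]{9}", nif):
        return False
    # Weights 9 down to 2 apply to the first eight digits.
    weighted_sum = sum(int(d) * w for d, w in zip(nif[:8], range(9, 1, -1)))
    remainder = weighted_sum % 11
    expected = 0 if remainder < 2 else 11 - remainder
    return int(nif[8]) == expected


nif = standardize_nif("PT 123 456 789")
print(nif, is_valid_nif(nif))
```

Note that `standardize_nif` only strips characters. It does not guarantee a nine-digit result, so the validation step still checks the length before it computes the check digit.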

Portugal Corporate Data Cleansing Rules

Use Portugal corporate data cleansing rules to parse, standardize, and validate data.

Portugal corporate data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing


The following table describes the Portugal corporate data cleansing rules:

Name Description

rule_PRT_Company_Name_Standardization

Standardizes Portuguese company names using reference table values.

rule_PRT_NIPC_Parse

Parses a Número de Identificação Pessoa Colectiva (NIPC). This rule returns the NIPC and also returns a string that contains the input text with the NIPC removed.

rule_PRT_NIPC_Standardise

Standardizes a Número de Identificação Pessoa Colectiva (NIPC) into a nine-digit string. This rule removes alphabetic characters, symbols, and spaces.

rule_PRT_NIPC_Validate

Validates a Número de Identificação Pessoa Colectiva (NIPC) by using a check digit algorithm. This rule requires that the input is a nine-digit string.

Portugal General Data Cleansing Dependencies

The Portugal accelerator has dependencies on general data cleansing rules that you install as part of the Core accelerator.

The Portugal accelerator has dependencies on the following general data cleansing rules:

• rule_Assign_DQ_90_ElementResultStatus_Description

• rule_Assign_DQ_90_Match_Code_Descriptions

• rule_Remove_Extra_Spaces

• rule_Remove_Non_Numbers

• rule_Remove_Punctuation

• rule_Remove_Punctuation_and_Space

• rule_Replace_Limited_Punct_with_Space

• rule_UpperCase

For more information about these rules, see “Core General Data Cleansing Rules” on page 7.

Portugal Matching and Deduplication Rules

Use Portugal matching and deduplication rules to generate match scores and identify duplicate records.

Portugal matching and deduplication rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Matching_Deduplication


The following table describes the Portugal matching and deduplication rules:

Name Description

mplt_Company_Name_Match

Identifies duplicate rows based on company name. This mapplet matches rows using group keys generated from the first three characters of the Soundex code for the company name.

mplt_PRT_Company_Name_and_Address_Match

Identifies duplicate rows based on company name and Portuguese address data. This mapplet matches rows using group keys generated from the first three characters of the Soundex code for the company name and the first three digits of the postal code.

mplt_PRT_Familyname_and_Address_Match

Identifies duplicate rows based on surname and Portuguese address data. This mapplet matches rows using group keys generated from the NYSIIS code for the surname and the first three digits of the postal code.

mplt_PRT_Firstname_and_NIF_BI_Match

Identifies duplicate rows based on first name and personal identification numbers such as Número de Identificação Fiscal (NIF) and Bilhete de Identidade (BI). This mapplet matches rows using group keys generated from personal identification numbers.

mplt_PRT_IMO_Company_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows for Portuguese data based on company names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_PRT_IMO_Familyname_and_Address_Match

Uses identity match strategies to identify duplicate rows for Portuguese data based on family names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_PRT_IMO_Individual_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows for Portuguese data based on person names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_PRT_IMO_Personal_Name_and_Data

Uses identity match strategies to identify duplicate rows for Portuguese data based on person names and personal data. The fields in the personal data column should contain a single type of data, such as phone number, email, or NIF. This mapplet matches rows using group keys generated from personal data.

mplt_PRT_Individual_Name_and_Address_Match

Identifies duplicate rows based on person names and Portuguese address data. This mapplet matches rows using group keys generated from the NYSIIS code for the surname and the first three digits of the postal code.

mplt_PRT_Individual_Name_and_Date_Match

Identifies duplicate rows based on Portuguese person names and date data. This mapplet matches rows using group keys generated from dates.

mplt_PRT_Individual_Name_and_Email_Match

Identifies duplicate rows based on Portuguese person names and email addresses. This mapplet matches rows using group keys generated from email addresses.

mplt_PRT_Individual_Name_and_Phone_Match

Identifies duplicate rows based on Portuguese person names and phone numbers. This mapplet matches rows using group keys generated from phone numbers.

mplt_PRT_Individual_Name_Match

Identifies duplicate rows based on Portuguese person names. This mapplet matches rows using group keys generated from the NYSIIS codes for surnames.

rule_Company_Name_MatchScore

Generates a match score by comparing company names.

rule_PRT_Company_Name_and_Address_MatchScore

Generates a match score by comparing company names and Portuguese address data.


rule_PRT_Familyname_and_Address_MatchScore

Generates a match score by comparing surnames and Portuguese address data.

rule_PRT_Firstname_and_NIF_BI_MatchScore

Generates a match score by comparing first name data, Número de Identificação Fiscal (NIF), and Bilhete de Identidade (BI) numbers.

rule_PRT_Individual_Name_and_Address_MatchScore

Generates a match score by comparing person names and Portuguese address data.

rule_PRT_Individual_Name_and_Date_MatchScore

Generates a match score by comparing person names and dates.

rule_PRT_Individual_Name_and_Email_MatchScore

Generates a match score by comparing person names and email addresses.

rule_PRT_Individual_Name_and_Phone_MatchScore

Generates a match score by comparing person names and phone numbers.

rule_PRT_Individual_Name_MatchScore

Generates a match score by comparing person names.
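The guide does not describe how the MatchScore rules calculate their scores. As a rough illustration of the general idea only, the following Python sketch combines per-field similarities into one weighted score. The `difflib.SequenceMatcher` ratio stands in for whatever comparison strategy the rules actually use, and the field names, weights, and function names are hypothetical.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Return a similarity between 0.0 and 1.0 for two field values."""
    a, b = a.strip().upper(), b.strip().upper()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Combine field similarities into a weighted average (assumed scheme)."""
    total_weight = sum(weights.values())
    weighted = sum(
        weight * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in weights.items()
    )
    return weighted / total_weight


a = {"name": "Maria Silva", "phone": "+351 210 000 000"}
b = {"name": "Maria Silva", "phone": "+351 210 000 001"}
print(round(match_score(a, b, {"name": 0.6, "phone": 0.4}), 3))
```

A score near 1.0 suggests a likely duplicate. Where to set the threshold depends on the data and is not something this sketch can decide.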


CHAPTER 7

United Kingdom Accelerator

This chapter includes the following topics:

• United Kingdom Accelerator Overview

• United Kingdom Demonstration Mappings

• United Kingdom Address Data Cleansing Rules

• United Kingdom Contact Data Cleansing Rules

• United Kingdom Financial Data Cleansing Rules

• United Kingdom General Data Cleansing Dependencies

• United Kingdom Matching and Deduplication Rules

United Kingdom Accelerator Overview

The United Kingdom accelerator validates and enhances U.K. data by using specialized data quality processes and region-specific reference tables. This accelerator includes rules, reference tables, demonstration mappings, and demonstration data objects.

The United Kingdom accelerator includes rules that perform the following data quality processes:

• Address data cleansing

• Contact data cleansing

• Financial data cleansing

• Matching and deduplication

This accelerator depends on general data cleansing rules that the Core accelerator installs.

United Kingdom Demonstration Mappings

The United Kingdom accelerator demonstration mappings combine accelerator rules to demonstrate complex data quality processes.

United Kingdom accelerator demonstration mappings install to the following repository location:

[Informatica_DQ_Content]\Rules_Demo\GBR_Accelerator

The United Kingdom accelerator includes the following demonstration mappings:


m_GBR_customer_data_demo

Parses, standardizes, and validates data using rules customized for the United Kingdom. The data objects referenced in this mapping use the following path:

<ServerInstallDir>\services\DQContent\INFA_Content\demos\source_data

You may need to modify this path to match your system configuration. To perform address validation, you must also install an Address Doctor license key and reference data set.

m_GBR_customer_matching_demo

Demonstrates standardization and parsing rules that are customized for matching data from the United Kingdom.

This mapping analyzes the following data combinations and generates match clusters for each combination:

• Person name and address data

• Person name and phone number

You can connect these match cluster outputs to an Association transformation to generate AssociationIDs. You can then connect the Association transformation output to a Consolidation transformation to identify master records.

The data objects referenced in this mapping use the following path:

<ServerInstallDir>\services\DQContent\INFA_Content\demos\source_data

You may need to modify this path to match your system configuration. To perform address validation, you must also install an Address Doctor license key and reference data set.

United Kingdom Address Data Cleansing Rules

Use United Kingdom address data cleansing rules to parse, standardize, and validate address data.

United Kingdom accelerator address data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Address_Data_Cleansing

The following table describes the United Kingdom address data cleansing rules:

Name Description

rule_GBR_Address_Parse_Hybrid

Parses partially tokenized U.K. addresses into tokens. This rule does not validate address deliverability.

rule_GBR_Address_Parse_Multiline

Parses multiline U.K. addresses into tokens. This rule does not validate address deliverability.

rule_GBR_Address_Validation_Discrete

Validates the deliverability of U.K. addresses. Use this rule if the input address data is fully tokenized. This rule requires address reference data and a corresponding license.

rule_GBR_Address_Validation_Hybrid

Validates the deliverability of U.K. addresses. Use this rule if the input address data is partially tokenized. This rule requires address reference data and a corresponding license.

rule_GBR_Address_Validation_Multiline

Validates the deliverability of U.K. addresses. Use this rule if the input address data is not tokenized. This rule requires address reference data and a corresponding license.


rule_GBR_Postcode_Parse

Parses U.K. postal codes.

rule_GBR_Postcode_Standardise

Standardizes U.K. postal codes. This rule requires that the input follows predefined formats. The following list describes these formats, using "A" to represent alphabetic characters and "9" to represent numerals.
- A9 9AA
- A99 9AA
- AA9 9AA
- AA99 9AA
- A9A 9AA
- AA9A 9AA
- GIR 0AA
The rule does not standardize inputs that do not match these patterns.

rule_GBR_Postcode_Validate

Validates U.K. postal codes. This rule matches standardized postal codes with valid U.K. postal codes. If the rule does not find a matching postal code, it verifies whether the postal code follows the standard U.K. postal code pattern.
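The accepted formats listed for rule_GBR_Postcode_Standardise can be checked by reducing a postal code to its shape, with letters mapped to "A" and digits mapped to "9". The following Python sketch shows one way to do that. The normalization it applies (uppercasing and placing a single space before the last three characters) is an assumption, because the guide does not say what the rule outputs, and the function names are hypothetical.

```python
import re
from typing import Optional

# Shapes taken from the list in the rule description.
ACCEPTED_SHAPES = {
    "A9 9AA", "A99 9AA", "AA9 9AA", "AA99 9AA", "A9A 9AA", "AA9A 9AA",
}


def postcode_shape(postcode: str) -> str:
    """Map letters to 'A' and digits to '9', leaving other characters as-is."""
    shape = re.sub(r"[A-Z]", "A", postcode.upper())
    return re.sub(r"[0-9]", "9", shape)


def standardise_postcode(raw: str) -> Optional[str]:
    """Return the postcode in an accepted format, or None if it does not fit."""
    compact = re.sub(r"\s+", "", raw.upper())
    if len(compact) < 5:
        return None
    # The inward part of a U.K. postcode is always the last three characters.
    candidate = compact[:-3] + " " + compact[-3:]
    if candidate == "GIR 0AA" or postcode_shape(candidate) in ACCEPTED_SHAPES:
        return candidate
    return None


for value in ["sw1a 1aa", "m11ae", "GIR0AA", "12345"]:
    print(repr(value), "->", standardise_postcode(value))
```

A shape check of this kind confirms only the pattern. It does not confirm that the postal code exists, which is what the comparison against valid U.K. postal codes in rule_GBR_Postcode_Validate is for.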

United Kingdom Contact Data Cleansing Rules

Use United Kingdom contact data cleansing rules to parse, standardize, and validate contact data.

United Kingdom accelerator contact data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing

The following table describes the United Kingdom contact data cleansing rules:

Name Description

rule_GBR_Driver_Number_Parse

Parses strings that match the format of U.K. driver numbers.

rule_GBR_Driver_Number_Validation

Validates U.K. driver numbers using rules defined by the U.K. Government Data Standards Catalogue.

rule_GBR_Gender_Assignment

Assigns gender according to first name. For example, this rule assigns the name "John Smith" a gender of "M" for male. This rule returns "M" or "F."

rule_GBR_Given_Name_Standard

Generates given names from U.K. nicknames.

rule_GBR_Multi_Person_Name_Parse

Parses U.K. names into name tokens such as title, first name, middle name, and surname.

rule_GBR_NHS_Number_Parse

Parses National Health Service (NHS) numbers from a string. This rule returns the NHS number and also returns a string that contains the input text with the NHS number removed.

rule_GBR_NHS_Number_Standardise

Standardizes National Health Service (NHS) numbers into the standard format (999 999 9999). This rule requires that the input is a 10-digit string.

rule_GBR_NHS_Number_Validate

Validates National Health Service (NHS) numbers by using a check digit algorithm. This rule requires that the input is a 10-digit string.


rule_GBR_NINO_Conformity_Check

Validates the standard pattern for a U.K. National Insurance Number (NINO). This rule does not verify that a NINO is accurate or active.

rule_GBR_NINO_Parse

Parses U.K. National Insurance Numbers (NINO) from strings. This rule returns the NINO and also returns a string that contains the input text with the NINO removed.

rule_GBR_NINO_Standardization

Standardizes U.K. National Insurance Numbers (NINO) into the two most typical formats. This rule returns the following formats, where C represents alphabetic characters and N represents numerals:
- CC NN NN NN C
- CCNNNNNNC
This rule formats all alphabetic characters as uppercase. The rule requires that the input conforms to the pattern of a NINO.

rule_GBR_NINO_Validation

Validates a U.K. National Insurance Number (NINO). This rule does not verify that a NINO is active.

rule_GBR_Passport_Number_MR_Parse

Parses U.K. passport numbers in extended format. The extended format is the machine readable format for passport numbers.

rule_GBR_Passport_Number_Parse

Parses U.K. passport numbers that use the format specified by the Government Data Standards Catalogue. This rule parses all nine-digit strings.

rule_GBR_Passport_Number_Validation

Validates U.K. passport numbers that use the format specified by the Government Data Standards Catalogue.

rule_GBR_Personal_Name_Parsing_FML

Parses U.K. names in First-Middle-Last format into tokens.

rule_GBR_Personal_Name_Parsing_LFM

Parses U.K. names in Last-First-Middle format into tokens.

rule_GBR_Phone_Number_Parse

Parses a U.K. phone number from a string. This rule parses the first phone number in the data, reading from right to left.
This rule recognizes phone numbers that use leading zeros, the "+44" international dialing code, and extensions that begin with the hash symbol. This rule processes the following punctuation symbols: the plus sign, parentheses, and the hash symbol. Before you run this rule, remove all other punctuation, including double spaces.
This rule returns a phone number and also returns a string that contains the input text with the phone number removed.

rule_GBR_Phone_Number_Standardisation

Standardizes U.K. phone numbers to international and local dialing formats. This rule recognizes phone numbers that use leading zeros, the "+44" international dialing code, and extensions that begin with the hash symbol.

rule_GBR_Phone_Number_Validation

Validates the area code and length of U.K. phone numbers. This rule returns the region of the phone number, as well as codes that indicate if the area code and length of a phone number are valid.

rule_Prename_Assignment

Generates an honorific according to the gender. You can change the female_prename expression variable from Ms. to Mrs.

rule_Salutation_Assignment

Generates formal and casual greetings from prenames and name tokens. For example, for "Mr. John Smith," the rule generates the formal greeting "Dear Mr. Smith," and the casual greeting "Dear John,". You can change the prefix and punctuation by editing the variables in the dq_Generate_Salutation Expression transformation.
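The NHS number rules earlier in this table refer to a check digit algorithm without giving details. The following Python sketch assumes the published modulus 11 scheme for NHS numbers and the three-three-four digit grouping that is commonly used to display them. The function names are hypothetical and are not part of the accelerator.

```python
import re


def nhs_check_digit_ok(nhs_number: str) -> bool:
    """Validate a 10-digit NHS number with the modulus 11 check digit."""
    if not re.fullmatch(r"[0-9]{10}", nhs_number):
        return False
    # Weights 10 down to 2 apply to the first nine digits.
    weighted_sum = sum(
        int(d) * w for d, w in zip(nhs_number[:9], range(10, 1, -1))
    )
    check = 11 - (weighted_sum % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number cannot be valid.
    return check != 10 and check == int(nhs_number[9])


def format_nhs_number(nhs_number: str) -> str:
    """Group a 10-digit string as three, three, and four digits."""
    return f"{nhs_number[:3]} {nhs_number[3:6]} {nhs_number[6:]}"


number = "9434765919"
print(format_nhs_number(number), nhs_check_digit_ok(number))
```

As with the rules themselves, both functions expect a 10-digit string with no spaces, so any punctuation or spacing must be removed first.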


United Kingdom Financial Data Cleansing Rules

Use United Kingdom financial data cleansing rules to parse, standardize, and validate financial data.

United Kingdom financial data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Financial_Data_Cleansing

The following table describes the United Kingdom financial data cleansing rules:

Name Description

rule_GBR_Bank_Account_Parse

Parses 8-digit numeric strings as U.K. bank account numbers.

rule_GBR_Bank_Account_Validation

Validates U.K. bank account numbers. This rule returns codes that indicate whether the input is numeric and whether it is the correct number of digits.

rule_GBR_Bank_Sort_Code_Parse

Parses 6-digit numeric strings as U.K. bank sort codes. This rule parses strings of numbers in the following formats:
- Consecutive numbers (999999)
- Numbers delimited with a dash (99-99-99)

rule_GBR_Bank_Sort_Code_Validation

Validates the format and length of U.K. bank sort codes that are standardized to the dash-delimited format (99-99-99). This rule returns a Status port that describes the validity of the sort code and a Validation Note port that explains the status. If the sort code prefix matches a known assignment for a U.K. bank, the Validation Note port includes the bank name.

rule_GBR_Sort_Code_Standardise

Standardizes a U.K. bank sort code to the format "NN-NN-NN."
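The sort code and bank account checks described in this table are simple format checks, which a short Python sketch can illustrate. The function names and the keys of the returned dictionary are hypothetical. The 8-digit length comes from the description of rule_GBR_Bank_Account_Parse, and the sketch does not look up bank names.

```python
import re
from typing import Optional


def standardise_sort_code(raw: str) -> Optional[str]:
    """Return a sort code as NN-NN-NN, or None if the input has another format."""
    value = raw.strip()
    if not re.fullmatch(r"[0-9]{6}|[0-9]{2}-[0-9]{2}-[0-9]{2}", value):
        return None
    digits = value.replace("-", "")
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


def validate_bank_account(raw: str) -> dict:
    """Report whether the input is numeric and whether it has eight characters."""
    value = raw.strip()
    return {
        "is_numeric": re.fullmatch(r"[0-9]+", value) is not None,
        "is_correct_length": len(value) == 8,
    }


print(standardise_sort_code("123456"))
print(validate_bank_account("12345678"))
```

Returning the two checks separately mirrors the rule description, which mentions distinct codes for numeric content and for length.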

United Kingdom General Data Cleansing Dependencies

The United Kingdom accelerator has dependencies on general data cleansing rules that you install as part of the Core accelerator.

The United Kingdom accelerator has dependencies on the following general data cleansing rules:

• rule_Assign_DQ_90_Mailability_Score_Description

• rule_Assign_DQ_90_Match_Code_Descriptions

• rule_Remove_Extra_Spaces

• rule_Remove_Leading_Zero

• rule_Remove_Period_Parentheses

• rule_Remove_Punctuation

• rule_Remove_Punctuation_and_Space

• rule_Remove_Space

• rule_Replace_Limited_Punct_with_Space

• rule_UpperCase

For more information about these rules, see “Core General Data Cleansing Rules” on page 7.


United Kingdom Matching and Deduplication Rules

Use United Kingdom matching and deduplication rules to generate match scores and identify duplicate records.

United Kingdom matching and deduplication rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Matching_Deduplication

The following table describes the United Kingdom matching and deduplication rules:

Name Description

mplt_GBR_Company_Name_Postcode_Match

Identifies duplicate rows based on company name and postal code. This mapplet matches rows using group keys generated from the postal code.

mplt_GBR_Familyname_and_NINO_Match

Identifies duplicate rows based on surname and National Insurance Number. This mapplet matches rows using group keys generated from the National Insurance Number.

mplt_GBR_Familyname_and_Postcode_Match

Identifies duplicate rows based on surname and United Kingdom postal code. This mapplet matches rows using group keys generated from the postal code.

mplt_GBR_Firstname_3CharsSurname_DOB_and_Postcode_Match

Identifies duplicate rows based on the following data:
- First name
- The first three characters in the surname
- Date of birth
- Postal code
This mapplet matches rows using group keys generated from the postal code.

mplt_GBR_Firstname_Surname_2ElementsDOB_and_Postcode_Match

Identifies duplicate rows based on the following data:
- Person names
- Any two date of birth elements, such as month and year
- U.K. postal code
This mapplet matches rows using group keys generated from the postal code.

mplt_GBR_Firstname_Surname_DOB_and_Postcode_Match

Identifies duplicate rows based on the following data:
- Person names
- Date of birth
- Postal code
This mapplet matches rows using group keys generated from the postal code.

mplt_GBR_IMO_Company_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows based on company names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_GBR_IMO_Familyname_and_Address_Match

Uses identity match strategies to identify duplicate rows based on family names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_GBR_IMO_Individual_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows based on person names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_GBR_IMO_Personal_Name_and_Data

Uses identity match strategies to identify duplicate rows based on person names and personal data. The fields in the personal data column should contain a single type of data, such as phone number, email, or NINO. This mapplet matches rows using group keys generated from personal data.

mplt_GBR_Individual_Name_and_Date_Match

Identifies duplicate rows based on U.K. person names and date data. This mapplet matches rows using group keys generated from dates.

mplt_GBR_Individual_Name_and_Email_Match

Identifies duplicate rows based on U.K. person names and email addresses. This mapplet matches rows using group keys generated from email addresses.


mplt_GBR_Individual_Name_and_NINO_Match

Identifies duplicate rows based on U.K. person names and National Insurance Numbers(NINO). This mapplet matches rows using group keys generated from NINO data.

mplt_GBR_Individual_Name_and_Phone_Match

Identifies duplicate rows based on U.K. person names and phone numbers. Thismapplet matches rows using group keys generated from phone numbers.

mplt_GBR_Individual_Name_and_Postcode_Match

Identifies duplicate rows based on person names and postal code. This mappletmatches rows using group keys generated from postal codes.

mplt_GBR_Individual_Name_Match Identifies duplicate rows based on U.K. person names. This mapplet matches rowsusing group keys generated from the NYSIIS codes for surnames.

rule_GBR_Familyname_and_NINO_MatchScore

Generates a match score by comparing surnames and U.K. National IdentificationNumbers (NINO).

rule_GBR_Familyname_and_Postcode_MatchScore

Generates a match score by comparing surnames and U.K. postal codes.

rule_GBR_Firstname_3CharsSurname_DOB_and_Postcode_MatchScore

Generates a match score by comparing the following information:
- First name
- The first three characters in the surname
- Date of birth
- Postal code

rule_GBR_Firstname_Surname_2ElementsDOB_and_Postcode_MatchScore

Generates a match score by comparing the following information:
- Person names
- Any two date of birth elements, such as month and year
- U.K. postal code

rule_GBR_Firstname_Surname_DOB_and_Postcode_MatchScore

Generates a match score by comparing person names, date of birth, and postal code.

rule_GBR_Individual_Name_and_NINO_MatchScore

Generates a match score by comparing person names and U.K. National Insurance Numbers (NINO).

rule_GBR_Individual_Name_and_Phone_MatchScore

Generates a match score by comparing person names and phone numbers.

rule_GBR_Individual_Name_and_Postcode_MatchScore

Generates a match score by comparing person names and U.K. postal codes.

rule_Individual_Name_and_Date_MatchScore

Generates a match score by comparing person names and dates.

rule_Individual_Name_MatchScore Generates a match score by comparing person names.

rule_GBR_Company_Name_Postcode_MatchScore

Generates a match score by comparing company name and U.K. postal codes.

rule_Individual_Name_and_Email_MatchScore

Generates a match score by comparing person names and email addresses.


CHAPTER 8

U.S./Canada Accelerator

This chapter includes the following topics:

- U.S./Canada Accelerator Overview

- U.S./Canada Demonstration Mappings

- Address Data Cleansing Rules

- U.S./Canada Contact Data Cleansing Rules

- General Data Cleansing Rules

- U.S./Canada Matching and Deduplication Rules

U.S./Canada Accelerator Overview

The U.S./Canada accelerator validates and enhances data by using specialized data quality processes and region-specific reference tables. This accelerator includes rules, reference tables, demonstration mappings, and demonstration data objects.

The U.S./Canada accelerator includes rules that perform the following data quality processes:

- Address data cleansing

- Contact data cleansing

- General data cleansing

- Matching and deduplication

This accelerator depends on general data cleansing rules that the Core accelerator installs.

U.S./Canada Demonstration Mappings

The U.S./Canada accelerator demonstration mappings combine accelerator rules to demonstrate complex data quality processes.

U.S./Canada accelerator demonstration mappings install to the following repository location:

[Informatica_DQ_Content]\Rules_Demo\US_Canada_Accelerator

The U.S./Canada accelerator includes the following demonstration mappings:


m_customer_matching_US_demo

Demonstrates the standardization and parsing rules required for matching U.S. data.

This mapping analyzes the following data combinations and generates match clusters for each combination:

- Person name and address data

- Person name and phone number

You can connect these match cluster outputs to an Association transformation to generate Association IDs. You can then connect the Association transformation output to a Consolidation transformation to identify master records.

The data objects referenced in this mapping use the following path:

<ServerInstallDir>\services\DQContent\INFA_Content\demos\source_data

You may need to modify this path to match your system configuration. To perform address validation, you must also install an Address Doctor license key and reference data set.

m_customer_data_US_demo

Parses, standardizes, and validates U.S. and Canadian data. The data objects referenced in this mapping use the following path:

<ServerInstallDir>\services\DQContent\INFA_Content\demos\source_data

You may need to modify this path to match your system configuration. To perform address validation, you must also install an Address Doctor license key and reference data set.
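The association step described for m_customer_matching_US_demo can be pictured as merging clusters. If two records share a cluster in either match output, they should end up with the same association ID. The following Python sketch illustrates that idea with a union-find structure. It is not the Association transformation's implementation, and the record layout, the cluster ID values, and the function name `assign_association_ids` are assumptions made for illustration.

```python
def assign_association_ids(records):
    """Give one shared ID to records linked through any cluster column.

    `records` maps a record ID to a tuple of cluster IDs, one per match
    output (a hypothetical layout, not the transformation's port names).
    """
    parent = {rec_id: rec_id for rec_id in records}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller root so the resulting IDs are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Records that share a cluster ID in the same column belong together.
    first_seen = {}
    for rec_id, cluster_ids in records.items():
        for column, cluster_id in enumerate(cluster_ids):
            if cluster_id is None:
                continue
            key = (column, cluster_id)
            if key in first_seen:
                union(first_seen[key], rec_id)
            else:
                first_seen[key] = rec_id

    return {rec_id: find(rec_id) for rec_id in records}


# Column 0: name-and-address cluster; column 1: name-and-phone cluster.
records = {
    1: ("A1", "P1"),
    2: ("A1", "P2"),
    3: ("A2", "P2"),
    4: ("A3", "P3"),
}
association_ids = assign_association_ids(records)
```

In this sample, records 1 and 2 share an address cluster and records 2 and 3 share a phone cluster, so all three receive the same ID even though records 1 and 3 never match directly.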

Address Data Cleansing Rules

Use U.S. and Canada address data cleansing rules to parse, standardize, and validate address data.

U.S./Canada accelerator address data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Address_Data_Cleansing

The following table describes the U.S./Canada address data cleansing rules:

Name Description

rule_CAN_Address_Parse_Hybrid Parses partially tokenized Canadian addresses into tokens. This rule does not validate address deliverability.

rule_CAN_Address_Parse_Multiline

Parses multiline Canadian addresses into tokens. This rule does not validate address deliverability.

rule_CAN_Address_Validation_Discrete

Validates the deliverability of Canadian addresses. Use this rule if the input address data is fully tokenized. This rule requires address reference data and a corresponding license.

rule_CAN_Address_Validation_Hybrid

Validates the deliverability of Canadian addresses. Use this rule if the input address data is partially tokenized. This rule requires address reference data and a corresponding license.

rule_CAN_Address_Validation_Multiline

Validates the deliverability of Canadian addresses. Use this rule if the input address data is not tokenized. This rule requires address reference data and a corresponding license.


rule_CAN_Postcode_Validation Validates Canadian postal codes. This rule returns "Valid" or "Invalid."

rule_CAN_Province_Validation Validates Canadian province names. This rule returns "Valid" or "Invalid."

rule_Country_Identification Identifies country names within input fields. This rule can also use city, state, province, and postal code information to identify U.S. and Canadian addresses. This rule returns a country name, a two-character ISO country code, and a three-character ISO country code.

rule_Country_Name_Standardization

Standardizes country names. This rule returns a country name, a two-character ISO country code, and a three-character ISO country code.

rule_USA_Address_Parse_Hybrid Parses partially tokenized U.S. addresses into tokens. This rule does not validate address deliverability.

rule_USA_Address_Parse_Multiline Parses multiline U.S. addresses into tokens. This rule does not validate address deliverability.

rule_USA_Address_Validation_Discrete

Validates the deliverability of U.S. addresses. Use this rule if the input address data is fully tokenized. This rule requires address reference data and a corresponding license.

rule_USA_Address_Validation_Hybrid

Validates the deliverability of U.S. addresses. Use this rule if the input address data is partially tokenized. This rule requires address reference data and a corresponding license.

rule_USA_Address_Validation_Multiline

Validates the deliverability of U.S. addresses. Use this rule if the input address data is not tokenized. This rule requires address reference data and a corresponding license.

rule_USA_County_Validation Validates U.S. county names. This rule compares input data against county names in all states. This rule returns "Valid" or "Invalid."

rule_USA_State_Validation Validates U.S. state names. This rule returns "Valid" or "Invalid."

rule_USA_ZIPCode_Validation Validates five-digit U.S. Zip Codes. This rule returns "Valid" or "Invalid."
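The postal code rules in the table above return "Valid" or "Invalid." As a rough illustration of that contract, the following Python sketch performs a format-only check on five-digit ZIP Codes and Canadian postal codes. It is an assumption-laden stand-in, not the accelerator's logic: the function names are hypothetical, and a pattern check cannot confirm that a code is actually in use, which the accelerator rules may do with reference data.

```python
import re

# Five ASCII digits, nothing else.
ZIP5 = re.compile(r"[0-9]{5}")

# Canadian postal codes follow the pattern letter-digit-letter, optional
# space, digit-letter-digit. The letters D, F, I, O, Q, and U are not used,
# and W and Z are not used in the first position.
CAN_POSTCODE = re.compile(
    r"[ABCEGHJ-NPRSTVXY][0-9][ABCEGHJ-NPRSTV-Z] ?[0-9][ABCEGHJ-NPRSTV-Z][0-9]"
)


def validate_zip(value: str) -> str:
    """Format-only check for a five-digit U.S. ZIP Code (hypothetical helper)."""
    return "Valid" if ZIP5.fullmatch(value.strip()) else "Invalid"


def validate_can_postcode(value: str) -> str:
    """Format-only check for a Canadian postal code (hypothetical helper)."""
    candidate = value.strip().upper()
    return "Valid" if CAN_POSTCODE.fullmatch(candidate) else "Invalid"
```

For example, `validate_can_postcode("k1a 0b1")` passes the pattern after uppercasing, while a value that starts with "W" or "D" does not.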

U.S./Canada Contact Data Cleansing Rules

Use U.S. and Canada contact data cleansing rules to parse, standardize, and validate contact data.

U.S./Canada accelerator contact data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing

The following table describes the U.S./Canada contact data cleansing rules:

Name Description

rule_CAN_Gender_Assignment Assigns gender according to first name. For example, this rule assigns the name "John Smith" a gender of "M" for male. This rule returns "M" or "F."

rule_CAN_Given_Name_Standard Generates given names from Canadian nicknames. For example, this rule standardizes the nickname "Bob" to the given name "Robert."


rule_CAN_Multi_Person_Name_Parse

Parses names into name tokens such as title, first name, middle name, and surname.

rule_CAN_Personal_Name_Parse_and_Standardize_FML

Parses and standardizes Canadian names in First-Middle-Last format. This rule parses names into tokens and standardizes the tokens. This rule returns standardized tokens and a full name built from those tokens. The rule also returns data inferred from the input name, such as gender, formal greeting, and casual greeting.

rule_CAN_Personal_Name_Parse_and_Standardize_LFM

Parses and standardizes Canadian names in Last-First-Middle format. This rule parses names into tokens and standardizes the tokens. This rule returns standardized tokens and a full name built from those tokens. The rule also returns data inferred from the input name, such as gender, formal greeting, and casual greeting.

rule_CAN_Personal_Name_Parsing_FML

Parses names in First-Middle-Last format into tokens.

rule_CAN_Personal_Name_Parsing_LFM

Parses names in Last-First-Middle format into tokens.

rule_CAN_Phone_Number_Parse Parses a Canadian phone number from a string. This rule parses the first phone number in the data, reading from right to left. This rule returns a phone number and also returns a string that contains the input text with the phone number removed.

rule_CAN_Phone_Number_Standardization

Standardizes Canadian phone numbers. The rule returns the phone number in the following formats:
- Standard - (nnn) nnn-nnnn
- Dashes - nnn-nnn-nnnn
- No Spaces - nnnnnnnnnn

rule_CAN_Phone_Number_Validation

Validates the area code and length of Canadian phone numbers. This rule returns codes that indicate phone number type and validity. Types describe categories such as "toll-free."

rule_CAN_SIN_Parse Parses a Canadian Social Insurance Number (SIN) from a string. This rule returns the SIN and also returns a string that contains the input text with the SIN removed.

rule_CAN_SIN_Standardization Standardizes Canadian Social Insurance Numbers (SIN). This rule can output the following formats:
- No Punctuation - nnnnnnnnn
- Space - nnn nnn nnn
- Dash - nnn-nnn-nnn
To change the format, edit the SIN_format expression variable in the dq_Format_SIN Expression transformation. Default is "No_Punctuation."

rule_CAN_SIN_Validation Validates Canadian Social Insurance Numbers (SIN). This rule uses the Luhn algorithm to verify whether or not an SIN is valid. This rule returns "Valid" or "Invalid."

rule_Prename_Assignment Generates an honorific according to the gender. You can change the female_prename expression variable from Ms. to Mrs.

rule_Salutation_Assignment Generates formal and casual greetings from prenames and name tokens. For example, for "Mr. John Smith," the rule generates the formal greeting "Dear Mr. Smith," and the casual greeting "Dear John,". You can change the prefix and punctuation by editing the variables in the dq_Generate_Salutation Expression transformation.


rule_USA_Gender_Assignment Assigns gender according to first name. For example, this rule assigns the name "John Smith" a gender of "M" for male. This rule returns "M" or "F."

rule_USA_Given_Name_Standard Generates given names from U.S. nicknames. For example, this rule standardizes the nickname "Bob" to the given name "Robert."

rule_USA_Multi_Person_Name_Parse

Parses names into name tokens such as title, first name, middle name, and surname.

rule_USA_Personal_Name_Parse_and_Standardize_FML

Parses and standardizes U.S. names in First-Middle-Last format. This rule parses names into tokens and standardizes the tokens. This rule returns standardized tokens and a full name built from those tokens. The rule also returns data inferred from the input name, such as gender, formal greeting, and casual greeting.

rule_USA_Personal_Name_Parse_and_Standardize_LFM

Parses and standardizes U.S. names in Last-First-Middle format. This rule parses names into tokens and standardizes the tokens. This rule returns standardized tokens and a full name built from those tokens. The rule also returns data inferred from the input name, such as gender, formal greeting, and casual greeting.

rule_USA_Personal_Name_Parsing_FML

Parses names in First-Middle-Last format into tokens.

rule_USA_Personal_Name_Parsing_LFM

Parses names in Last-First-Middle format into tokens.

rule_USA_Phone_Number_Parse Parses a U.S. phone number from a string. This rule parses the first phone number in the data, reading from right to left. This rule returns a phone number and also returns a string that contains the input text with the phone number removed.

rule_USA_Phone_Number_Standardization

Standardizes U.S. phone numbers. The rule returns the phone number in the following formats:
- Standard - (nnn) nnn-nnnn
- Dashes - nnn-nnn-nnnn
- No Spaces - nnnnnnnnnn

rule_USA_Phone_Number_Validation

Validates the area code and length of U.S. phone numbers. This rule returns codes that indicate if the area code and length of a phone number are valid.

rule_USA_SSN_Standardization Standardizes U.S. Social Security Numbers (SSN). This rule can output the following formats:
- No Punctuation - nnnnnnnnn
- Space - nnn nn nnnn
- Dash - nnn-nn-nnnn
To change the format, edit the SSN_format expression variable in the dq_SSN_Format Expression transformation. Default is "No_Punctuation."

rule_USA_SSN_Validation Validates U.S. Social Security Numbers (SSN). Within each SSN, the rule verifies that the Area token (the first three numbers) and the Group token (the middle two numbers) are a valid combination. The rule does not verify that the SSN is an issued number. This rule returns "Valid" or "Invalid."
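The table above states that rule_CAN_SIN_Validation uses the Luhn algorithm and that rule_CAN_SIN_Standardization can output three formats. The following Python sketch shows one way those two steps could look outside the product. It is an illustration only: the function names are hypothetical, and the format keys "Space" and "Dash" are assumed by analogy with the documented default "No_Punctuation."

```python
import re


def luhn_is_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def validate_sin(raw: str) -> str:
    """Check length and Luhn checksum; this does not prove the SIN was issued."""
    digits = re.sub(r"[ -]", "", raw)
    if not re.fullmatch(r"[0-9]{9}", digits):
        return "Invalid"
    return "Valid" if luhn_is_valid(digits) else "Invalid"


def format_sin(raw: str, sin_format: str = "No_Punctuation") -> str:
    """Render a nine-digit SIN in one of the three documented layouts."""
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) != 9:
        raise ValueError("a SIN must contain exactly nine digits")
    # Format keys other than "No_Punctuation" are assumed names.
    separators = {"No_Punctuation": "", "Space": " ", "Dash": "-"}
    return separators[sin_format].join((digits[:3], digits[3:6], digits[6:]))
```

The widely published sample number 046 454 286 passes the checksum, so `validate_sin("046 454 286")` returns "Valid" and `format_sin("046 454 286", "Dash")` returns "046-454-286".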


General Data Cleansing Rules

Use U.S. and Canada general data cleansing rules to identify the type of information contained in input fields.

U.S./Canada accelerator general data cleansing rules install to the following repository location:

[Informatica_DQ_Content]\Rules\General_Data_Cleansing

The following table describes the U.S./Canada general data cleansing rules:

Name Description

rule_CAN_Field_Identification Identifies the type of information contained in an input field. This rule can identify names, Personal IDs, company names, dates, and Canadian address data. This rule returns a label that describes the type of input data.

rule_USA_Field_Identification Identifies the type of information contained in an input field. This rule can identify names, Personal IDs, company names, dates, and U.S. address data. This rule returns a label that describes the type of input data.

Dependencies on Core General Data Cleansing Rules

The U.S./Canada accelerator depends on the following general data cleansing rules from the Core accelerator:

- rule_Assign_DQ_90_Mailability_Score_Description

- rule_Assign_DQ_90_Match_Code_Descriptions

- rule_Date_Validation

- rule_Remove_Extra_Spaces

- rule_Remove_Punctuation

- rule_Replace_Limited_Punct_with_Space

- rule_UpperCase

For more information about these rules, see “Core General Data Cleansing Rules” on page 7.

U.S./Canada Matching and Deduplication Rules

Use matching and deduplication rules to generate match scores and identify duplicate records.

U.S./Canada matching and deduplication rules install to the following repository location:

[Informatica_DQ_Content]\Rules\Matching_Deduplication

The following table describes the U.S./Canada matching and deduplication rules:

Name Description

mplt_CAN_IMO_Company_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows based on company names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_CAN_IMO_Familyname_and_Address_Match

Uses identity match strategies to identify duplicate rows based on family names and addresses. This mapplet matches rows using group keys generated from postal codes.


mplt_CAN_IMO_Individual_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows based on person names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_CAN_IMO_Personal_Name_and_Data

Uses identity match strategies to identify duplicate rows based on person names and personal data. The fields in the personal data column should contain a single type of data, such as phone number, email, or SIN. This mapplet matches rows using group keys generated from personal data.

mplt_Company_Name_and_Address_Match

Identifies duplicate rows based on company name and U.S. address data. This mapplet matches rows using group keys generated from the first three characters of the Soundex code for the company name and the first three digits of the ZIP Code.

mplt_Company_Name_Match Identifies duplicate rows based on company name. This mapplet matches rows using group keys generated from the first three characters of the Soundex code for the company name.

mplt_Familyname_and_Address_Match

Identifies duplicate rows based on surname and U.S. address data. This mapplet matches rows using group keys generated from the NYSIIS code for the surname and the first three digits of the ZIP Code.

mplt_Firstname_and_SSN_Match Identifies duplicate rows based on U.S. Social Security numbers and first names. This mapplet matches rows using group keys generated from Social Security numbers.

mplt_Individual_Name_and_Address_Match

Identifies duplicate rows based on person names and U.S. address data. This mapplet matches rows using group keys generated from the NYSIIS code for the surname and the first three digits of the ZIP Code.

mplt_Individual_Name_and_Date_Match

Identifies duplicate rows based on person names and date data. This mapplet matches rows using group keys generated from dates.

mplt_Individual_Name_and_Email_Match

Identifies duplicate rows based on person names and email addresses. This mapplet matches rows using group keys generated from email addresses.

mplt_Individual_Name_and_Phone_Match

Identifies duplicate rows based on person names and phone numbers. This mapplet matches rows using group keys generated from phone numbers.

mplt_Individual_Name_and_SSN_Match

Identifies duplicate rows based on person names and U.S. Social Security numbers. This mapplet matches rows using group keys generated from Social Security numbers.

mplt_Individual_Name_Match Identifies duplicate rows based on person names. This mapplet matches rows using group keys generated from the NYSIIS codes for surnames.

mplt_USA_Address_Match Identifies duplicate rows based on U.S. address data. This mapplet matches rows using group keys generated from the first three digits of the ZIP Code.

mplt_USA_IMO_Company_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows based on company names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_USA_IMO_Familyname_and_Address_Match

Uses identity match strategies to identify duplicate rows based on family names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_USA_IMO_Individual_Name_and_Address_Match

Uses identity match strategies to identify duplicate rows based on person names and addresses. This mapplet matches rows using group keys generated from postal codes.

mplt_USA_IMO_Personal_Name_and_Data

Uses identity match strategies to identify duplicate rows based on person names and personal data. The fields in the personal data column should contain a single type of data, such as phone number, email, or SSN. This mapplet matches rows using group keys generated from personal data.

rule_Company_Name_and_Address_MatchScore

Generates a match score by comparing company names and U.S. address data.

rule_Company_Name_MatchScore Generates a match score by comparing company names.

rule_Familyname_and_Address_MatchScore

Generates a match score by comparing surnames and U.S. address data.

rule_Firstname_and_SSN_MatchScore

Generates a match score by comparing first names and U.S. Social Security numbers.

rule_Individual_Name_and_Address_MatchScore

Generates a match score by comparing person names and U.S. address data.

rule_Individual_Name_and_Date_MatchScore

Generates a match score by comparing person names and dates.

rule_Individual_Name_and_Email_MatchScore

Generates a match score by comparing person names and email addresses.

rule_Individual_Name_and_Phone_MatchScore

Generates a match score by comparing person names and phone numbers.

rule_Individual_Name_and_SSN_MatchScore

Generates a match score by comparing person names, Social Security numbers, and identification data.

rule_Individual_Name_MatchScore Generates a match score by comparing person names.

rule_USA_Address_MatchScore Generates a match score by comparing U.S. address data.
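Several mapplets in the table above match rows within group keys, for example the first three digits of the ZIP Code for mplt_USA_Address_Match. The following Python sketch illustrates why group keys matter: rows are compared only with other rows that share a key, which limits the number of comparisons. It is a simplified picture under stated assumptions. The similarity measure (`difflib.SequenceMatcher`), the 0.8 threshold, the row layout, and the function names are all stand-ins, not the accelerator's match strategies.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def group_key(row):
    # First three digits of the ZIP Code, as described for mplt_USA_Address_Match.
    return row["zip"][:3]


def match_score(a, b):
    # Stand-in similarity measure (an assumption); ratio() is a float in [0, 1].
    return SequenceMatcher(None, a["address"].upper(), b["address"].upper()).ratio()


def find_duplicates(rows, threshold=0.8):
    """Return (id, id, score) for row pairs in the same group that score high."""
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row)].append(row)

    pairs = []
    for members in groups.values():
        # Only rows that share a group key are ever compared.
        for a, b in combinations(members, 2):
            score = match_score(a, b)
            if score >= threshold:
                pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs


rows = [
    {"id": 1, "address": "100 Main Street", "zip": "94105"},
    {"id": 2, "address": "100 Main St", "zip": "94107"},
    {"id": 3, "address": "100 Main Street", "zip": "10001"},
]
duplicates = find_duplicates(rows)
```

In this sample, rows 1 and 2 share the key "941" and are reported as a candidate pair. Row 3 has an identical address to row 1 but a different key, so the two are never compared, which is the trade-off that group keys introduce.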
