Ron Forino DAMA - Washington, DC September 1999 Project Driven Data Quality Improvement


Page 1: Ron Forino DAMA - Washington, DC September 1999

Ron Forino

DAMA - Washington, DC September 1999

Project Driven Data Quality Improvement

Page 2: Ron Forino DAMA - Washington, DC September 1999

Confidential DMR Consulting Group Inc.

Examples

According to DM Review, one European company discovered through an audit that it was not invoicing 4% of its orders. With $2 billion in revenues, that meant $80 million went unpaid.

Electronic data audits show that the invalid data values in the typical customer database average around 15 - 20%. Physical audits suggest that this number may be closer to 25 - 30%.

In 1992, 96,000 IRS tax refund checks were returned “undeliverable” due to incorrect addresses.

This year, incorrect price data in retail databases will cost American consumers as much as $2.5 billion in overcharges.

According to organizations like the Data Warehouse Institute, the Gartner Group and META Group, data quality is one of the top three success factors in data warehousing.

The average mid-sized company may have 30,000 - 50,000 fields in files, tables, screens, reports, etc. [Platinum Technology]

Page 3: Ron Forino DAMA - Washington, DC September 1999

Agenda

• Definitions
• What is Data Quality?
• Tactics and the End Game
• Building Blocks to Data Quality
  – Tactical Initiatives
  – Strategic Initiatives
• Tactical Data Quality
  – Rule Disclosure
  – Data Quality Measurement, Analysis and Certification
  – Meta Data Creation
  – Validation
  – Quality Improvement

Page 4: Ron Forino DAMA - Washington, DC September 1999

Definitions

Page 5: Ron Forino DAMA - Washington, DC September 1999

Definitions

Data Transformation - Changing data values to a format consistent with integrity and business rules agreed to by data stakeholders.

Data Cleansing - Consolidation of redundant customer records. Term used to describe the process of “merging and purging” of customer lists in an effort to reduce duplicate or inaccurate customer records.

Data Quality Improvement - The process of improving data quality to the level desired to support the enterprise information demand.

Data Quality - definition to follow….

Page 6: Ron Forino DAMA - Washington, DC September 1999

Data Quality Improvement Decision Tree

[Diagram: decision tree rooted at Data Quality Improvement. Node labels as extracted: Data Cleansing; Transform; Data Reengineering; Match & Dedupe; Process Reengineering; Standardize, Validate, Match, Dedupe, Integrate, Enrich; Conform to Business Rule. Legend: Task, Process.]

Page 7: Ron Forino DAMA - Washington, DC September 1999

Tactics and The End Game

“We need better data quality...”

Enterprise Initiative

Select Project

Data Quality Assessment

Report & Recommendations

Source System Clean-up Initiative

Page 8: Ron Forino DAMA - Washington, DC September 1999

Tactics and The End Game

“We need better data quality...”

Enterprise Initiative

Select Project

Data Quality Assessment

Report & Recommendations

Source System Clean-up Initiative

Data Warehouse

Data Quality Assessment

Report

Staging Specifications

Source System Clean-up Initiative

Page 9: Ron Forino DAMA - Washington, DC September 1999

What is [Good] Data Quality?

Page 10: Ron Forino DAMA - Washington, DC September 1999

How Can We Know Good Data Quality?

Column 1
321453
212392
093255
214421
. . .

Is this Good Data Quality?

What can we conclude?

Page 11: Ron Forino DAMA - Washington, DC September 1999

What is Data Quality?

Information Quality = f(Definition + Data + Presentation)

Definition
• Defines Data
• Domain Value Specification
• Business Rules that Govern the Data
• Information Architecture Quality

Data Content
• Completeness
• Validity/Reasonability

Data Presentation
• Accessible
• Timely
• Non-ambiguous

Page 12: Ron Forino DAMA - Washington, DC September 1999

Common Data Quality Problems

Data Content
• Missing Data
• Invalid Data
• Data Outside Legal Domain
• Illogical Combinations of Data

Structural
• Record Key Integrity
• Referential Integrity
• Cardinality Integrity

Migration/Integration
• Rationalization Anomalies
• Duplicate or Lost Entities

Definitions and Standards
• Ambiguous Business Rules
• Multiple Formats for Same Data Elements
• Different Meanings for the Same Code Value
• Multiple Code Values with the Same Meaning
• Field Used for Unintended Data
• Data in Filler
• Y2K Violation

Page 13: Ron Forino DAMA - Washington, DC September 1999

Building Blocks to Data Quality

Page 14: Ron Forino DAMA - Washington, DC September 1999

Building Blocks of a Data Quality Program

[Diagram: building blocks arranged from Tactical to Strategic against Benefits Realization. Block labels as extracted: Rule Disclosure; Measure; Analyze & Certify; Meta Data Creation; Validation; Quality Improvement; Data Stewardship; DQ Requirements; Enterprise Cultural Shift; QC/Process Auditing; Defect Prevention; Quality Reengineering.]

Page 15: Ron Forino DAMA - Washington, DC September 1999

Tactical Data Quality

Page 16: Ron Forino DAMA - Washington, DC September 1999

Steps to Tactical Data Quality

1. Rule Disclosure
2. Measure Quality
3. Analyze & Certify
4. Meta Data Creation
5. Validation
6. Quality Improvement

Page 17: Ron Forino DAMA - Washington, DC September 1999

Rule Disclosure

Page 18: Ron Forino DAMA - Washington, DC September 1999

Sources of Meta Data

Legacy Meta Data
– Data Models, Process Models
– Data Dictionary, Definitions, Aliases
– Glossary of Terms
– Database Directory
– Domain Values, Range of Values
– Run Books

Transformation Meta Data
– Data Mapping
– Transformation Rules
– Error Handling Rules
– Derived Data Calculations
– Audit Statistics

Access Meta Data
– Data Directory
– Data Definitions
– Source & Transformation

The Subject Matter Expert

Page 19: Ron Forino DAMA - Washington, DC September 1999

Acquiring Good Meta Data is Essential

Meta Data can be gathered before, during or after the Assessment.

[Diagram: three alternative process flows. Box labels are listed as extracted; the sequence within each flow is not recoverable from the text.]

Flow 1: Collect Documentation; Report Findings; Validate the Meta Data; Assess the Data

Flow 2: Collect Documentation; Validate Findings; Assess the Data; Report Findings

Flow 3 (Preferred): Collect Valid Meta Data; Report Findings; Assess the Data

“You can pay me now, or you can pay me later…”

Page 20: Ron Forino DAMA - Washington, DC September 1999

Measuring Data Quality

• Techniques
• Tools
• Methods

Page 21: Ron Forino DAMA - Washington, DC September 1999

How can Data Quality be Measured?

• Customer Complaints
• User Interviews & Feedback
• Customer Satisfaction Survey
• Data Quality Requirements Gathering
• Data Quality Assessments

“One accurate measurement is worth a thousand expert opinions” [Grace Hopper, Admiral, US Navy]

Page 22: Ron Forino DAMA - Washington, DC September 1999

Measuring Data Quality - Tools

Analysis Tools
• Specifically designed assessment tools
  – Quality Manager, Migration Architect
  – N & A: Trillium, Group-1, ID Centric, Finalist, etc.
• Improvisations
  – SAS, Focus, SQL, other query tools

Other Necessary Tools
• File Transfer
• Data Conversion

Page 23: Ron Forino DAMA - Washington, DC September 1999

Assessment Measurements

Field Integrity: Intuitive Integrity Rules

Business Rule Integrity: Requiring Meta Data

Level 1: Completeness
– Nulls or Blanks
– Misuse (or overuse) of Default Values

Level 2: Validity
– Data Integrity Anomalies
– Invalid Data based on Business Rule

Level 3: Structural Integrity
– Primary Key Uniqueness
– Key Structure (Cardinality, Referential Integrity, Alternate Keys)

Level 4: Business Rule Violations
– Relationship between two or more fields
– Calculations
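The four assessment levels can be illustrated with a minimal Python sketch. The deck's own examples use SAS; the records, field names (`cust_id`, `region`, `qty`, `unit_price`, `total`), the region domain and the calculation rule below are all hypothetical, chosen only to show one check per level.

```python
from collections import Counter

# Hypothetical sample records; field names and rules are illustrative assumptions.
records = [
    {"cust_id": "C001", "region": "NE", "qty": 2, "unit_price": 5.0, "total": 10.0},
    {"cust_id": "C002", "region": "",   "qty": 1, "unit_price": 3.0, "total": 3.0},
    {"cust_id": "C002", "region": "ZZ", "qty": 4, "unit_price": 2.5, "total": 9.0},
]
VALID_REGIONS = {"NE", "SE", "MW", "W"}  # assumed legal domain for the region field


def completeness_problems(rows, field):
    """Level 1: count nulls or blanks in one field."""
    return sum(1 for r in rows if r.get(field) in (None, ""))


def validity_problems(rows, field, domain):
    """Level 2: count populated values that fall outside the legal domain."""
    return sum(
        1 for r in rows
        if r.get(field) not in (None, "") and r[field] not in domain
    )


def duplicate_keys(rows, key):
    """Level 3: return primary key values that occur more than once."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def rule_violations(rows):
    """Level 4: check a calculation across fields (total = qty * unit_price)."""
    return [
        r["cust_id"] for r in rows
        if abs(r["qty"] * r["unit_price"] - r["total"]) > 1e-9
    ]


print("Level 1 blanks in region:", completeness_problems(records, "region"))
print("Level 2 invalid regions:", validity_problems(records, "region", VALID_REGIONS))
print("Level 3 duplicate keys:", duplicate_keys(records, "cust_id"))
print("Level 4 rule violations:", rule_violations(records))
```

Levels 1 and 2 need only the field itself and its domain, while levels 3 and 4 need key definitions and business rules, which is why the slide pairs them with meta data.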

Page 24: Ron Forino DAMA - Washington, DC September 1999

Analyze and Certify

• Identifying Problems
• Sizing up Problems
• “To Certify or Not to Certify…”
• Report Card

Page 25: Ron Forino DAMA - Washington, DC September 1999

Template - field level

Data Quality Report

Value | Frequency | Percent | 88 Info | Analysis

• Value - the domain occurrence
• Frequency - the number of occurrences within the data set
• Percent - the % of the whole set
• 88 Info - the copybook definition for the value
• Analysis - comments about our findings
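A field-level report with these columns can be produced with a simple frequency count. The following Python sketch is illustrative only: the sample values and the `definitions` lookup (standing in for the copybook 88-level definitions) are hypothetical, and percentages are rounded half-up to one decimal place, which is an assumption about how the report rounds.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP


def frequency_report(values, definitions):
    """Return (value, frequency, percent, definition) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    for value, freq in counts.most_common():
        # Decimal with ROUND_HALF_UP avoids float rounding surprises (e.g. 1.25 -> 1.3).
        pct = (Decimal(freq) * 100 / Decimal(total)).quantize(
            Decimal("0.1"), rounding=ROUND_HALF_UP
        )
        rows.append((value, freq, pct, definitions.get(value, "")))
    return rows


# Hypothetical field values and value definitions.
values = ["A"] * 6 + ["B"] * 3 + [""]
definitions = {"A": "Active", "B": "Closed"}

rows = frequency_report(values, definitions)
for value, freq, pct, info in rows:
    label = value if value else "BLANK"
    print(f"{label:<8}{freq:>6}{pct:>8}  {info}")
```

Values with no definition print an empty 88 Info column, which is the cue for the analyst to fill in the Analysis column with a question for the subject matter expert.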

Page 26: Ron Forino DAMA - Washington, DC September 1999

Identifying Problems

Data Quality Report

Value   Frequency   Percent   88 Info               Analysis
BLANK          19      11.9                         Is this a required field? If yes, what is the value definition (88 Info) for 'BLANK'?
BBUY           59      36.9   Best Buy
ID216          53      33.1                         What are the value definitions (88 Info) for all non-blank values?
MUNI           23      14.4   Municipal Bond
MLCMO           2       1.3   CMO Account
MLMTN           2       1.3   Maintenance Account
STANG           2       1.3                         What are the value definitions (88 Info) for all non-blank values?
Total         160     100

Analysis (and Discovery)
1. Is the field required? If so, blanks indicate an anomaly.
2. Are the values “ID216” and “STANG” allowed? (Is this a problem with the data or the Meta Data?)
3. Some values occur in only 1.3% of the records. Is this telling us there is a problem?

Page 27: Ron Forino DAMA - Washington, DC September 1999

Data Quality Scoring

Scoring Key (Prior to SME Review)

Priority (Criticality/Sensitivity)                 High   Medium   Low
No Problem Encountered                             A      A        A
Less than .1 % had problems                        B+     B+       B+
Less than .5 % had problems                        B      B+       B+
Less than 1 % had problems                         C      B        B+
Less than 2 % had problems                         F      C        B
Less than 5 % had problems                         F      C        C
Less than 10 % had problems                        F      F        C
Less than 50 % had problems                        F      F        F
Field is not populated in 100% of the rows         X      X        X
Field cannot be scored because there is not a
proper definition or domain description, and
requires an SME's consultation                     *      *        *

Legend

Grade   Meaning         Remarks                                                           Action
A       Excellent       Pre-Certified. No problems encountered                            If findings agree with documented business rules, CERTIFY, otherwise review findings with a SME.
B       Good            Problems of small magnitude                                       Meet with SME to review metadata and report
C       Poor            Either has data anomaly or a business definition is inaccurate    Meet with SME to review metadata and report
F       Failure         Requires Serious Attention or is an unreliable field              Meet with SME to review metadata and report
X       Not Populated   Absent in 100% of rows                                            Verify if there are plans to use the field
*       For SME         Not enough meta data to score                                     SME Review
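The scoring key lends itself to a small lookup function. This Python sketch transcribes the grade rows from the key; the function name, its parameters, the precedence of `*` over `X`, and the fallback grade for 50% or more (which the key does not cover) are assumptions.

```python
# Rows of the scoring key: (upper bound in % of records with problems,
# grades for High / Medium / Low criticality).
SCORING_KEY = [
    (0.1,  ("B+", "B+", "B+")),
    (0.5,  ("B",  "B+", "B+")),
    (1.0,  ("C",  "B",  "B+")),
    (2.0,  ("F",  "C",  "B")),
    (5.0,  ("F",  "C",  "C")),
    (10.0, ("F",  "F",  "C")),
    (50.0, ("F",  "F",  "F")),
]
CRITICALITY_COLUMN = {"High": 0, "Medium": 1, "Low": 2}


def score_field(problem_count, total_records, criticality,
                populated=True, has_metadata=True):
    """Return the letter grade for one field (hypothetical helper)."""
    if not has_metadata:
        return "*"   # cannot be scored without a definition or domain
    if not populated:
        return "X"   # field is empty in 100% of the rows
    if problem_count == 0:
        return "A"
    pct = 100.0 * problem_count / total_records
    column = CRITICALITY_COLUMN[criticality]
    for upper_bound, grades in SCORING_KEY:
        if pct < upper_bound:
            return grades[column]
    return "F"       # assumption: the key stops at 50%, so treat anything above as F


# 97,248 invalid codes out of 4,794,726 records is just over 2%.
print(score_field(97248, 4794726, "High"))
print(score_field(97248, 4794726, "Low"))
```

The same defect rate yields a different grade depending on criticality, which is the point of scoring against priority rather than against a single threshold.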

Page 28: Ron Forino DAMA - Washington, DC September 1999

Example: Poor Data Quality

File: Customer Master
Field: Customer Code
Criticality: High
Test File Date: 1/19/99
SAS Program(s):
  Data Analysis\DQ Programs\cm\Cust Code BlankAN.txt
  Data Analysis\DQ Programs\cm\CustCodeLookup3.txt

Report Validation
Definition: Customer code (A/E) identifier.
New Business Rules/Domain: Customer codes are unique within regional office.
Data Cleansing Notes:
Transformation/Edit Recommendations:
Issue Log #:
Comments:

Scores

                        Score   % w/problems   Comment
Overall Certification   F       0%
Completeness            A       0%             Every Customer Code field contains data.
Validity                F       >2%            97248 customer codes not found on the validation lookup file.
Structural Integrity    N/A     N/A
Business Rules          N/A     N/A

Completeness Report

# Blank Fields   # Low-Value Fields   % Blank/Low-Values   # Populated Fields   Total Records
0                0                    0%                   4794726              4794726

Analysis:
> All fields contain a data value. No blank fields.

Validity Report

Code Values Not Found (SAMPLE)

Code   Frequency
C2     1
71     1
AA     3
AB     1
AC     2

# Codes Not Found: 97248
Total Records: 4794726

Analysis:
> 97248 codes could not be found on the validation lookup field T.cu.ccode (a copy of production file PVSAM.CICS.custcc 02/02/99).
> A significant number of data values are not valid cust codes.

Page 29: Ron Forino DAMA - Washington, DC September 1999

Field Analysis

In a range of values, in the absence of domain rules, investigate the first and last .2%.

[Figure: bell curve distribution]
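One way to pull the first and last .2% of a field's values for review is to sort them and slice both ends. This is a minimal Python sketch; the function name and the rounding-up of the tail size are assumptions, and the sample data is synthetic.

```python
def tails(values, per_thousand=2):
    """Return the lowest and highest 0.2% (2 per 1,000) of the sorted values."""
    ordered = sorted(values)
    # Integer ceiling division, with at least one value in each tail.
    k = max(1, -(-len(ordered) * per_thousand // 1000))
    return ordered[:k], ordered[-k:]


# Synthetic example: 1,000 account balances, two of them suspicious.
balances = list(range(100, 1098)) + [-999999, 99999999]
low, high = tails(balances)
print("lowest:", low)
print("highest:", high)
```

Outliers such as default or sentinel values tend to collect at the extremes, so a reviewer can inspect just these few values instead of the whole range.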

Page 30: Ron Forino DAMA - Washington, DC September 1999

Management Reporting - Short Engagement

TABLE OF CONTENTS

EXECUTIVE SUMMARY
  LEVEL 1: COMPLETENESS AND VALIDITY
  LEVEL 2: STRUCTURAL INTEGRITY
  LEVEL 3: BUSINESS RULES
  RECOMMENDATIONS

BACKGROUND
  THE CUSTOMER MASTER FILE
  DATA QUALITY ANALYSIS METHODOLOGY

CUSTOMER MASTER FILE SCORE CARD
  OVERALL ASSESSMENT
  LEVEL 1 ANALYSIS
    Completeness (Fair)
    Validity (Very Good)
  LEVEL 2 ANALYSIS
    Primary Keys (Very Good)
    Referential Integrity (N/A)
  LEVEL 3 ANALYSIS
    Business Rules and Calculations (Excellent*)

RECOMMENDATIONS
  ESTABLISH THE OFFICE OF CORPORATE DATA QUALITY
  IMPROVEMENT RECOMMENDATIONS
  MANAGE AND DISTRIBUTE THE CORPORATION’S METADATA
  INITIATIVES FOR DATA IMPROVEMENT
  PROPAGATE THE DATA QUALITY ASSESSMENT PROCESS
  SAFEGUARD DATA WAREHOUSE USERS FROM DEFECTIVE DATA
  FACILITATE BEST IN TESTING, QUALITY ASSURANCE AND DATA QUALITY

DETAILED ANALYSIS

Page 31: Ron Forino DAMA - Washington, DC September 1999

Management Reporting - Status

[Bar chart: Field Analysis. Series: Customer, Product. Categories: No Problem, Data Quality Anomalies, Undergoing Validation. Data labels as extracted, in pairs: 58% and 59%; 31% and 27%; 11% and 14%. Vertical axis: 0% to 60%.]

Page 32: Ron Forino DAMA - Washington, DC September 1999

Management Reporting - Anomalies

Statistic                  #      % of Total   % of Anomalies
Fields Completed           467    -            -
OPEN Fields                271    58.0%        -
Data Quality Anomalies     143    30.6%
  Completeness             106    22.7%        74.1%
  Validity                 36     7.7%         25.2%
  Structural Integrity     0      0.0%         0.0%
  Business Rule            1      0.2%         0.7%
Pre-certified              53     11.3%        -

[Bar chart: Data Quality Anomalies - %. Completeness 74%; Validity 25%; Structural Integrity 0%; Business Rule 1%. Horizontal axis: 0.0% to 80.0%.]
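The two percentage columns use different denominators: "% of Total" divides by the 467 completed fields, while "% of Anomalies" divides by the 143 fields with anomalies. A short Python check, using only the counts from the table (variable names are illustrative), reproduces the figures:

```python
fields_completed = 467
open_fields = 271
pre_certified = 53
anomalies = {
    "Completeness": 106,
    "Validity": 36,
    "Structural Integrity": 0,
    "Business Rule": 1,
}
total_anomalies = sum(anomalies.values())


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


print(f"{'Data Quality Anomalies':<24}{total_anomalies:>5}{pct(total_anomalies, fields_completed):>7}%")
for name, count in anomalies.items():
    print(f"  {name:<22}{count:>5}"
          f"{pct(count, fields_completed):>7}%"
          f"{pct(count, total_anomalies):>7}%")
```

The three top-level categories (open, anomalies, pre-certified) add up to the 467 completed fields, so their "% of Total" values sum to roughly 100%.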

Page 33: Ron Forino DAMA - Washington, DC September 1999

Management Reporting - Productivity

Security

Week ending: 26-Feb, 5-Mar, 12-Mar, 19-Mar, 26-Mar, 2-Apr, 9-Apr, 16-Apr, 23-Apr, 30-Apr, 1-May, 2-May, 3-May (weeks 1 to 13; figures are reported for weeks 1 to 10)

Statistic                     1    2    3    4    5    6    7    8    9   10

Data Analysis Metrics
Field Count                 997  997  997  997  997  997  997  997  997  997
Fields Eliminated           303  361  361  205  205  205  205  297  297  297
Adjusted Field Count        694  636  636  792  792  792  792  700  700  700

Work Completed
Tables In Progress            2    5    2    1    4    2    1    3    2    0
Tables Completed              1    1    6    8    9   11   15   16   17   20
%                            2%   3%  15%  21%  23%  28%  38%  41%  44%  53%
Fields in Progress          104  120  113   20  123   93   68   49   29    0
Fields Completed             27   27   82  185  227  257  335  403  423  467
Fields Completed (week)            0   55  103   42   30   78   68   20   44
%                            4%   4%  13%  23%  29%  32%  42%  58%  60%  67%
AVG                          27   14   27   46   45   43   48   50   47   47

<<<<< PROM only >>>>>

[Chart: PROM Fields Completed, weeks 1 to 10: 27, 27, 82, 185, 227, 257, 335, 403, 423, 467. Vertical axis: 0 to 500.]

[Chart: Prom Progress Report - by Week, weeks 1 to 10: 4%, 4%, 13%, 23%, 29%, 32%, 42%, 58%, 60%, 67%. Vertical axis: 0% to 80%.]

[Chart: Weekly Productivity. Series: Fields Completed (week), AVG. Horizontal axis: weeks 1 to 16. Vertical axis: 0 to 120.]
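The derived rows in the productivity table can be recomputed from the raw counts. The slide does not state the formulas; the ones in this Python sketch are inferred from the figures (adjusted count = field count minus fields eliminated, weekly count = week-over-week difference, % = fields completed over adjusted field count, AVG = cumulative fields completed per elapsed week) and happen to reproduce every reported value.

```python
field_count = 997
fields_eliminated = [303, 361, 361, 205, 205, 205, 205, 297, 297, 297]
fields_completed = [27, 27, 82, 185, 227, 257, 335, 403, 423, 467]

# Fields remaining in scope each week.
adjusted = [field_count - e for e in fields_eliminated]

# Fields completed during each week, starting from week 2.
weekly = [b - a for a, b in zip(fields_completed, fields_completed[1:])]

# Cumulative progress against the adjusted field count, in whole percent.
pct_complete = [round(100 * c / a) for c, a in zip(fields_completed, adjusted)]

# Average fields completed per week to date.
running_avg = [round(c / week) for week, c in enumerate(fields_completed, start=1)]

print("Adjusted Field Count:   ", adjusted)
print("Fields Completed (week):", weekly)
print("% complete:             ", pct_complete)
print("AVG:                    ", running_avg)
```

Because the denominator changes whenever fields are eliminated or restored, the percentage can jump (for example between weeks 7 and 8) without a matching jump in fields completed.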

Page 34: Ron Forino DAMA - Washington, DC September 1999

Meta Data Creation

Page 35: Ron Forino DAMA - Washington, DC September 1999

Example: Data Quality Repository

[Screenshot callout: Newly Discovered Rules]

Page 36: Ron Forino DAMA - Washington, DC September 1999

Meta Data Supply Chain

[Diagram: flow of meta data between work groups. Labels as extracted: Work Groups; Field Name; Data Inventory Meta Data; Meta Data; Knowledge Management; Transformation & Edit Recommendations; Data Quality Statistical Reports; DQ Assessment; Data Quality & Definition Validation; Data Cleansing Update; SME Validation; Definition & Domain; Meta Data Gathering; Data Requirements.]

Page 37: Ron Forino DAMA - Washington, DC September 1999

Results Validation

Page 38: Ron Forino DAMA - Washington, DC September 1999

Report Validation

SME validation… an opportunity to improve Meta Data
1. Supply a clear name for the field.
2. Is there a good definition?
3. Make the business rules public?
4. Will the SME initiate a data cleansing initiative?
5. Does the SME recommend edit or data transformation rules?
6. Are the findings consistent with the SME's expectations?

Report Sections
• Identification
• Field Definition & Rules
• Statistical Reports & Analysis
• Score & Explanation

Sample report:

File: PONACUST
Field: oc801ind
Criticality: Medium
Test File Date: T.TSOSZUC.EDA (Jan.19,1999)
SAS Program(s): T.EDA.SAS.INPUT(CdeIndPO)
  J:\EDA\DataAnalysis\DQPrograms\Ponacust\ CdeIndPO.txt
RSLTS: J:\EDA\DataAnalysis\DQAssessmentReports\Ponacust\CdeIndPO.xls

Report Validation
Business Name:
Business Rules:
Definition:
Data Cleansing Notes:
Transformation/Edit Recommendations:
Issue Log #:
Comments:

Scores

                        Score   % w/problems   Comment
Overall Certification   A       0.00
Completeness            N/A     N/A
Validity                N/A     N/A
Structural Integrity    N/A     N/A
Business Rules          N/A     N/A

Frequency Report

Value    Frequency   Percent   88 info   Analysis
0          3684278      76.8
1          1110449      23.2
Total=     4794727     100

Page 39: Ron Forino DAMA - Washington, DC September 1999

Quality Improvement

Page 40: Ron Forino DAMA - Washington, DC September 1999

Next Steps

[Diagram: assessment and improvement cycle. Labels as extracted: Information Management Objectives; Metadata, Models, Reports, etc.; Legacy Data Extractions; Perform Baseline Assessment; (Discovered Business Rules); Management Report & Recommendations; Steering Committee; Initiatives; Data Clean-up; Legacy System Enhancements & Re-engineering; Data Migration Transformation & Cleansing Specifications; Continued Monitoring; Monthly Reports.]

Page 41: Ron Forino DAMA - Washington, DC September 1999

Lessons Learned - Data Cleanup

[Chart: Completeness (to 100%) plotted against Accuracy (to 100%). Annotations: (More complete, more error prone); (More accurate, less data); $$ (Most complete, most accurate, most costly, most timely).]

Page 42: Ron Forino DAMA - Washington, DC September 1999

Summary

• We made the distinction between:
  – Data Migration
  – Data Quality
  – Data Cleansing
• We defined what “good” data quality is.
• We discussed that there could be 10 or more processes that could take place in building a comprehensive data quality program for the enterprise.
  – Tactical should precede the Strategic [or be the 1st step of ]
• There are 6 steps to an effective tactical data quality initiative:
  – Rule Disclosure
  – Quality Measurement
  – Analyze and Certify
  – Meta Data Creation
  – Validation
  – Quality Improvement

Page 43: Ron Forino DAMA - Washington, DC September 1999

Reference Material

The Deming Management Method (Total Quality Management), Mary Walton

Data Quality for the Information Age, Tom Redman

The Data Warehouse Challenge: Taming Data Chaos, Michael Brackett

Improving Data Warehouse and Business Information Quality, Larry English

DM Review Magazine, Information Quality series by Larry English

Page 44: Ron Forino DAMA - Washington, DC September 1999

Ron Forino

Director, Business Intelligence

DMR Consulting Group

(732) 549-4100
