dq-global
DESCRIPTION
Presentation on "Data, how to get it clean and keep it clean?" We can help with your data quality issues.
Data, how to get it clean and keep it clean?
The best way to make money is to stop wasting it!
Agenda:
Who are DQ
Setting the scene
Acceptable Quality
Data Defects
Get it Clean
Keep it Clean
Q&A via web chat
Close
Setting the scene…
Who are we?
What do we do?
How do we do it?
What’s in it for our clients?
UK B2C Data – annual rates of change…
UK Population is 63.23 M
• Over 3.25 M (5.1%) people move house
• 0.584 M (0.9%) people pass away
• 0.813 M (1.3%) Births
• 0.290 M (0.5%) Marry
• 0.130 M (0.2%) Divorce
• 0.500 M (1.9%) Changes by Royal Mail
• 0.250 M (1.4%) people sign up to MPS
UK Households 26.4 M
½ life of B2C data 1 to 1.2 years
UK B2B Data – annual rates of change…
4.934 M trading businesses in the UK
• 3.10 M (62.8%) sole proprietorships
• 0.43 M (8.8%) partnerships
• 1.40 M (28.4%) limited companies
• 0.60 M (12.2%) dormant businesses
5.7 M company or individual details changes:
• 1 moves every 6 minutes
• 1 fails every 4 minutes
On average a person changes jobs 11 times during their career
Over 1.1 M (22.3%) businesses are registered with the CTPS
• 99.9% of businesses employ fewer than 250 staff
• 99.2% of businesses employ fewer than 50 people who employ 59% of total staff
2.43 M employees of UK businesses:
@ 24% p.a. ½ life attrition = 3 years
@ 35% p.a. ½ life attrition = 2 years
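The half-life figures above can be checked with a short calculation. This is a minimal sketch that assumes the same fraction of records decays every year (compound decay); the slides do not state the model used, and `half_life_years` is a hypothetical helper name. Under that assumption 24% p.a. gives roughly 2.5 years and 35% p.a. roughly 1.6 years, which the slide appears to round up to 3 and 2.

```python
import math

def half_life_years(annual_attrition: float) -> float:
    """Years until half of the records are out of date, assuming the same
    fraction decays every year (compound decay)."""
    if not 0 < annual_attrition < 1:
        raise ValueError("annual_attrition must be between 0 and 1")
    return math.log(0.5) / math.log(1 - annual_attrition)

for rate in (0.24, 0.35):
    print(f"{rate:.0%} p.a. -> half-life of about {half_life_years(rate):.1f} years")
```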
Data decay – the impacts…
Financial:
• £220 M per-annum wasted on inaccurate mailings
• £95 M per-annum wasted by companies mailing people who have moved addresses
• It costs more to mail a moved or deceased individual than to suppress them
• Increase response rates – the same return with less mail
Brand:
• Duplicates and incorrect details cause a negative perception
• Mailing deceased individuals or bereaved families causes significant distress
• Mailing someone who no longer lives at an address does not impress
Compliance:
• Best practice – comply with Direct Marketing Association guidelines
• Calling a consumer who has registered their objection to receiving direct marketing phone calls is illegal
• Mailing a consumer who has registered their objection to receiving direct mail is bad management, contravenes the DMA Code of Practice and could be illegal
Environment
• Protect the environment – help cut down on wasteful mailing
The human factors
Acknowledging there is a problem
The Data Quality Delusion
Everyone understands the importance of data quality
Everyone agrees data quality is important
Everyone cares about data quality
Everyone knows what actions to take to improve data quality
Opening the Johari Window
Seeing what you don’t currently see!
Johari Window - You don’t know what you don’t know...
Axes: Self, Others
• Expand the Open Area
• Reduce Blind Area
• Reduce the Hidden Area
• Unknown Area: unknown to others and unknown to self
Acceptable levels of data quality?
All data has some level of quality; the question is at what level is it unacceptable?
How does anyone know?
Who’s responsible?
How much is low quality data actually costing?
Unacceptable
Acceptable
Temp < 37°C: Hypothermia
Temp = 37°C: Normal
Temp > 37°C: Abnormal
Temp > 37.8°C: Get help
How can we end up with bad data?
A boy's name beginning with the letter J: "Gerald.."
A word beginning with Z: "Xylophone.."
A part of the body beginning with N: "Knee.."
A mode of transport that you can walk in: "Your shoes.."
Getting your data clean and keeping it clean
Identify, correct, prevent
Get it Clean: the basics
About “CURING” data defects
• Mastering & Merging
• Manual review
Batch process automation
Mass defect identification
Time consuming
More costly than prevention
Keep it Clean: the basics
Prevention better than cure
• People
• Process
• Technology
Ongoing process
Costs of prevention many times lower than cure!
Waging war on error…
Finding defects
Defining standards
Correcting data
Preventing error
Monitoring defects
Reference data
Internal data
Boolean Logic & Dates
DD/MM/YY v MM/DD/YY
• 10/10/09 = 10/10/09
• 99/99/99 was accepted as a valid date structure yet it’s clearly wrong
Is it European format DD/MM/YYYY or US format MM/DD/YYYY?
Precision
• DD/MM/YY or DD/MM/YYYY
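The two date problems above (a value with the right shape that is not a real date, and a value that reads differently in European and US order) can be sketched as follows. This is an illustrative sketch using Python's standard `datetime` parser; `parse_strict` and `is_ambiguous` are hypothetical helper names, not part of any product shown in this presentation.

```python
from datetime import datetime

def parse_strict(text: str, fmt: str):
    """Return a date if text is a real calendar date in fmt, else None."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None

def is_ambiguous(text: str) -> bool:
    """True when a date is valid as both DD/MM/YY and MM/DD/YY
    but the two readings give different days."""
    uk = parse_strict(text, "%d/%m/%y")
    us = parse_strict(text, "%m/%d/%y")
    return uk is not None and us is not None and uk != us

print(parse_strict("99/99/99", "%d/%m/%y"))  # None: right shape, not a real date
print(is_ambiguous("10/10/09"))              # False: same day either way
print(is_ambiguous("03/04/09"))              # True: 3 April or 4 March
```

Checking the structure alone (two digits, slash, two digits, slash, two digits) would accept 99/99/99; parsing against the calendar rejects it.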
OK to Mail = Y
Not OK to Mail = Y
OK to Mail = N
Not OK to Mail = N
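Where a system holds both an "OK to Mail" and a "Not OK to Mail" flag, the two can contradict each other. The sketch below assumes both fields exist on the same record and hold Y or N; the function name and the returned labels are hypothetical.

```python
def mail_flag_status(ok_to_mail: str, not_ok_to_mail: str) -> str:
    """Reconcile two opposite Y/N flags into a single decision."""
    ok = ok_to_mail.strip().upper()
    not_ok = not_ok_to_mail.strip().upper()
    if ok not in {"Y", "N"} or not_ok not in {"Y", "N"}:
        return "invalid"          # blank or unexpected value
    if ok == not_ok:
        return "contradictory"    # Y/Y or N/N cannot both be true
    return "mail" if ok == "Y" else "suppress"

print(mail_flag_status("Y", "Y"))  # contradictory
print(mail_flag_status("Y", "N"))  # mail
print(mail_flag_status("N", "Y"))  # suppress
```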
Numbers in Text and Shared Numbers
Systems contain:
• 0’s and/or O’s
• 1’s and/or I’s
• Tel numbers of 9 zeros: 000 000 000
Same product – different numbers in 2 systems
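Two of the defects above lend themselves to simple checks: letters typed in place of digits, and placeholder phone numbers. The sketch below is illustrative only; the helper names are hypothetical, and the look-alike swap should only be applied to fields that are meant to be numeric.

```python
import re

# Letters commonly typed in place of digits (apply to numeric fields only).
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})

def fix_lookalikes(value: str) -> str:
    """Swap look-alike letters for the digits they usually stand for."""
    return value.translate(LOOKALIKES)

def is_placeholder_phone(value: str) -> bool:
    """True when every digit in the number is the same, e.g. 000 000 000."""
    digits = re.sub(r"\D", "", value)
    return len(digits) > 0 and len(set(digits)) == 1

print(fix_lookalikes("O2392 9883O3"))        # 02392 988303
print(is_placeholder_phone("000 000 000"))   # True
print(is_placeholder_phone("02392 988303"))  # False
```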
Misinterpretation & Standards
M = Male in one system and Married in another
S = Single in one system and Separated in another
Gender
• 9 variants in the gender field of a hotel project
Padhraic, Pádraig or Páraic
Lane, LN, Ln, Road, Rd, Rd. etc.
MI or Michigan
US or USA or United States
GB or UK or United Kingdom
Mr. or Mister
Hants or Hampshire
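One common way to deal with variants like these is a lookup table per field, so that the same code can safely mean different things in different fields. The sketch below is a minimal illustration; the table contents are taken from the examples above, while the field names and the `standardise` helper are assumptions.

```python
# One synonym table per field: the same code can mean different things
# depending on where it appears.
SYNONYMS = {
    "country": {"us": "United States", "usa": "United States",
                "united states": "United States",
                "gb": "United Kingdom", "uk": "United Kingdom",
                "united kingdom": "United Kingdom"},
    "county": {"hants": "Hampshire", "hampshire": "Hampshire"},
    "title": {"mr": "Mr", "mister": "Mr"},
}

def standardise(field: str, value: str) -> str:
    """Map a value to its standard form, or return it unchanged if unknown."""
    cleaned = value.strip()
    key = cleaned.rstrip(".").lower()
    return SYNONYMS.get(field, {}).get(key, cleaned)

print(standardise("country", "USA"))   # United States
print(standardise("title", "Mr."))     # Mr
print(standardise("county", "Hants"))  # Hampshire
```

Unknown values pass through unchanged so they can be reviewed rather than silently altered.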
Dislocation, misfielding
Address A             Address B
123 Arcasia Avenue    123 Arcasia Ave
Fareham               (blank)
Hampshire             Fareham
PO16 8XT              Hants
                      PO16 8XT

Person A              Person B
MartinP               Martin P
Doyle                 Doyle
02392 988303          +1 312-253-7873
+1 312-253-7873       02392 988303
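Dislocation of the kind shown in the address example can often be spotted by testing which field actually holds a value of a recognisable shape. The sketch below looks for the field containing something shaped like a UK postcode; the regular expression is a deliberate simplification, and the field names (`line1`, `town`, `county`, `postcode`, `extra`) are hypothetical.

```python
import re

# Simplified UK postcode shape; not a full validation against PAF.
UK_POSTCODE = re.compile(r"[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}")

def find_postcode_field(record: dict):
    """Return the name of the first field whose whole value looks like a UK postcode."""
    for name, value in record.items():
        if UK_POSTCODE.fullmatch(value.strip().upper()):
            return name
    return None

address_a = {"line1": "123 Arcasia Avenue", "town": "Fareham",
             "county": "Hampshire", "postcode": "PO16 8XT"}
address_b = {"line1": "123 Arcasia Ave", "town": "", "county": "Fareham",
             "postcode": "Hants", "extra": "PO16 8XT"}

print(find_postcode_field(address_a))  # postcode
print(find_postcode_field(address_b))  # extra
```

A record whose postcode turns up anywhere other than the postcode field is a candidate for re-fielding.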
Anomalies & Congruence
eMail does not tally with name parts
Currency does not tally with location
Goods shipped before order
Values not in application pick lists (metadata)
Default values used
Notes (memo) fields used without validation rules
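Several of these anomalies can be expressed as cross-field rules. The sketch below checks three of them on a single record; the record layout, field names and sample values are made up for illustration, and a real email-to-name check would need to be more forgiving (initials, nicknames, shared mailboxes).

```python
from datetime import date

def congruence_issues(record: dict, status_pick_list: set) -> list:
    """Return a list of human-readable anomalies found in one record."""
    issues = []
    local_part = record["email"].split("@")[0].lower()
    names = (record["first_name"].lower(), record["last_name"].lower())
    if not any(name in local_part for name in names):
        issues.append("email does not tally with name parts")
    if record["shipped"] < record["ordered"]:
        issues.append("goods shipped before order")
    if record["status"] not in status_pick_list:
        issues.append("value not in pick list")
    return issues

record = {  # made-up sample data
    "first_name": "Jane", "last_name": "Smith",
    "email": "bob.jones@example.com",
    "ordered": date(2013, 7, 10), "shipped": date(2013, 7, 8),
    "status": "Shiped",
}
print(congruence_issues(record, {"Open", "Shipped", "Closed"}))
```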
DQ Studio – identifying and fixing
• Product demonstration by: Martin Kerr
• How to connect, identify and correct defects…
DQ Studio
Classify
• Is the data in your database what you think it is?
Compare
• How similar is value A to value B in % similarity
Format
• Email
• I.P.
• Postcode
• Telephone
• URL
Generate:
• phonetic tokens
• pattern tokens
Transform data
• 13 Categories
• 5 Spoken Languages
Validate
• Email
• I.P. Address
• Postal code
• Telephone
• URL
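To make "% similarity" and "pattern tokens" concrete, here is a generic sketch using Python's standard library. It is not DQ Studio's algorithm; `similarity_percent` and `pattern_token` are hypothetical names, and `difflib.SequenceMatcher` is just one of several possible string comparison methods.

```python
from difflib import SequenceMatcher

def similarity_percent(a: str, b: str) -> float:
    """Case-insensitive similarity of two strings as a percentage."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio(), 1)

def pattern_token(value: str) -> str:
    """Replace letters with A and digits with 9, keeping other characters."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c
                   for c in value)

print(similarity_percent("Thomson", "Thompson"))
print(pattern_token("PO16 8XT"))  # AA99 9AA
print(pattern_token("P0I6 8XT"))  # A9A9 9AA
```

Comparing pattern tokens against the expected shape of a field is a quick way to classify what a column really contains.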
DQ Studio
Derive:
• Job Title
• Role
• Level
• Gender
• Male, female, unknown
• Telephone
• Country
• Location
• Number Type
Parse:
• Email
• I.P. Address
• Telephone
Verify
• Locations (240 Countries)
• Phones
• Businesses
• Contacts
Record matching
Identifying matches
Linking
Mastering
Merging
Updating
Matching – What is it?
• Identification and management of records which:
• Are the same
• Might be the same
• Are not the same
Dedupe
• Table v Itself
X-Match
• Table v Table
X-Ref API
• PAF Batch
• PAF Lookup
X-Ref Data
• No Way
• Gone Away
• Passed Away
• Append
How is it done?
Manually
• Internally
• External Bureau service
Automatically
• Software
Using black and white magic...
• Black = Matches
• White = Non Matches
• Grey = Ambiguous
Carefully to avoid:
• Too many matches
• Too few matches
• Errors in matches
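The black, white and grey idea above amounts to two thresholds on a match score. The sketch below shows this with assumed cut-off values of 95 and 80; the slides do not give thresholds, so both numbers and the function name are illustrative only.

```python
def classify_match(score: float, match_at: float = 95,
                   non_match_below: float = 80) -> str:
    """Sort a 0-100 match score into black (match), white (non-match)
    or grey (ambiguous, needs review). Thresholds are assumptions."""
    if score >= match_at:
        return "black"
    if score < non_match_below:
        return "white"
    return "grey"

for score in (100, 86, 42):
    print(score, classify_match(score))
```

Raising `match_at` reduces errors in matches at the cost of more grey records to review; lowering `non_match_below` has the opposite trade-off.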
The grey areas - When is a match a match?
Bob = Bobby = Rob = Robert = Robby = Roberto?
Thomson = Thompson = Tomson = Thomson?
Xerox = Zerocks?
PO16 8XT = P0I6 8XT?
Grey to Black or Grey to White
• Transformations (Synonyms)
• Phonetics
• String comparisons
• Intelligence
• Rules
• Spelling
• Typo’s
• Logic
• Experience
• Lookups
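Two of the techniques above, synonym transformations and typo-tolerant comparison, can be sketched with the examples from the previous slide. The nickname table and helper names are hypothetical, and the table is deliberately incomplete so that "Roberto" stays grey for a person to decide.

```python
# Hypothetical nickname table; anything not listed is left for review.
NICKNAMES = {"bob": "robert", "bobby": "robert", "rob": "robert", "robby": "robert"}

# For comparison only, treat O/0 and I/1 as the same character.
LOOKALIKE_DIGITS = str.maketrans("OI", "01")

def canonical_first_name(name: str) -> str:
    """Map known nicknames to one standard form (lower case)."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)

def canonical_postcode(postcode: str) -> str:
    """Comparison key for a postcode: upper case, no spaces, look-alikes folded."""
    return postcode.upper().replace(" ", "").translate(LOOKALIKE_DIGITS)

print(canonical_first_name("Bobby") == canonical_first_name("Robert"))    # True
print(canonical_first_name("Roberto") == canonical_first_name("Robert"))  # False
print(canonical_postcode("PO16 8XT") == canonical_postcode("P0I6 8XT"))   # True
```

The canonical postcode is a key for matching, not a corrected value; the stored postcode should still be fixed against reference data.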
Mastering Perfection & merging?
Problems:
• Which data survives?
• Which data gets re-assigned?
• Which data gets stored?
• Which data gets thrown away?
Solutions:
• Define the record master
• Define the field merge rules
• Use technology to automate processes
• Humanise exceptions
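Field merge rules can be written down as data and applied automatically. The sketch below assumes three simple rules ("keep_master" by default, "fill_blank" and "longest"); the rule names, the function and the sample records are illustrative, not a description of any particular product.

```python
def merge_records(master: dict, duplicate: dict, rules: dict) -> dict:
    """Merge duplicate into master using a per-field rule.
    Fields without a rule keep the master's value."""
    merged = dict(master)
    for field, dup_value in duplicate.items():
        rule = rules.get(field, "keep_master")
        master_value = master.get(field)
        if rule == "fill_blank" and not master_value:
            merged[field] = dup_value
        elif rule == "longest" and len(str(dup_value or "")) > len(str(master_value or "")):
            merged[field] = dup_value
    return merged

master = {"name": "Martin Doyle", "phone": "", "street": "123 Arcasia Ave"}
duplicate = {"name": "M Doyle", "phone": "02392 988303", "street": "123 Arcasia Avenue"}
rules = {"phone": "fill_blank", "street": "longest"}

print(merge_records(master, duplicate, rules))
```

Records where no rule applies cleanly are the exceptions to pass to a person for review.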
Perfect & Merge for
Identify Perfect Merge
Process flow
CRM Database
PrimaryID SecondaryID Score
{D1C12E3A-B7F2-E211-95FC-0015F298503A} {EFF76F28-E8EE-E211-9968-0015F298503A} 100
{D1C12E3A-B7F2-E211-95FC-0015F298503A} {EE1F80ED-53F0-E211-BBCE-0015F298503A} 86
{E9C12E3A-B7F2-E211-95FC-0015F298503A} {07F86F28-E8EE-E211-9968-0015F298503A} 100
{E9C12E3A-B7F2-E211-95FC-0015F298503A} {062080ED-53F0-E211-BBCE-0015F298503A} 94
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {1DF86F28-E8EE-E211-9968-0015F298503A} 100
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {81F86F28-E8EE-E211-9968-0015F298503A} 92
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {1C2080ED-53F0-E211-BBCE-0015F298503A} 99
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {802080ED-53F0-E211-BBCE-0015F298503A} 100
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {EBF76F28-E8EE-E211-9968-0015F298503A} 100
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {4FF86F28-E8EE-E211-9968-0015F298503A} 82
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {EA1F80ED-53F0-E211-BBCE-0015F298503A} 100
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {4E2080ED-53F0-E211-BBCE-0015F298503A} 82
{71F86F28-E8EE-E211-9968-0015F298503A} {702080ED-53F0-E211-BBCE-0015F298503A} 100
{6BF86F28-E8EE-E211-9968-0015F298503A} {6A2080ED-53F0-E211-BBCE-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {1FF86F28-E8EE-E211-9968-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {83F86F28-E8EE-E211-9968-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {1E2080ED-53F0-E211-BBCE-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {822080ED-53F0-E211-BBCE-0015F298503A} 100
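A pair list like the one above is usually turned into groups of duplicates per master record before merging. The sketch below groups secondary IDs under their primary ID and ignores pairs below an assumed minimum score of 90; the threshold and the function name are illustrative, and the sample rows are the first three pairs from the table.

```python
from collections import defaultdict

def group_matches(pairs, min_score=90):
    """Group secondary IDs under their primary ID, keeping only pairs
    that score at or above min_score (an assumed cut-off)."""
    groups = defaultdict(list)
    for primary_id, secondary_id, score in pairs:
        if score >= min_score:
            groups[primary_id].append(secondary_id)
    return dict(groups)

pairs = [
    ("{D1C12E3A-B7F2-E211-95FC-0015F298503A}", "{EFF76F28-E8EE-E211-9968-0015F298503A}", 100),
    ("{D1C12E3A-B7F2-E211-95FC-0015F298503A}", "{EE1F80ED-53F0-E211-BBCE-0015F298503A}", 86),
    ("{E9C12E3A-B7F2-E211-95FC-0015F298503A}", "{07F86F28-E8EE-E211-9968-0015F298503A}", 100),
]

for primary, secondaries in group_matches(pairs).items():
    print(primary, "->", secondaries)
```

With the assumed cut-off, the pair scoring 86 is left out of the automatic groups and would go to manual review.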
Match demonstration
Connecting
Defining
Identifying
Reviewing
Processing
Cleaning up your business systems:
Back-up your data
Define pick lists
Ensure legacy data conforms to picklists
Delete any temporary fields set up for testing that are still in the production system
Delete or archive old data
Identify contacts with no email and/or no telephone #
Identify and correct contacts with bogus phone numbers
Identify records whose email bounces
Identify businesses without contacts
Archive linked documents which are ‘n’ years old; however, take care with legal documents, including invoices and contracts
User admin – delete any users who no longer access systems
Review any prospects, suspects or opportunities not properly closed, i.e. > ‘n’ weeks from opening
Actions to consider…
Change attitudes to “ABC” thinking
Think prevention not cure
Apply DQ processes
Verify, Format & Validate
Suppress records
Merge duplicates
Append missing data for segmentation
Govern and Comply
Measure & Manage
Get a CXO sponsor
Prune & Consolidate & Remove competition
Common dictionary of terms
Define customer value, and lifetime?
In conclusion…
Identify
• recognise there is a problem?
Qualify
• gather evidence, what, when, where and how large is the problem?
Quantify
• what’s specifically doing the damage?
Accept
• acknowledge the scale of the task?
Define
• the goals and what will be measured?
Perform
• carry out the tasks agreed in order of significance
Questions…
• Build a better business based on trusted data…
• Contact DQ Global
• www.DQGlobal.com
• Talk to a consultant
• [email protected]
• +44 2392 988303 (Europe)
• +1 314-253-7873 (North America)