Monday November 28, 2016
2:05PM – 3:05PM
Jake Bound, K2 and You
Doug Morris, Computer System Innovations
Keeping Your Data Clean
Data Integrity Paradigm
Today
A day in our life…
Why?
• The 1-10-100 Rule
• It costs $1 to verify data when it is first entered into the database
• It costs $10 to clean and de-dupe the data later
• It costs $100 in added expense and lost opportunity if it is never corrected
• Trust
• Trust is priceless
Why?
• Data is the single most important asset in your organization
Or, to put it even more clearly:
• Your whole business might blow up!
How
1. Understand the Paradigm (you can’t win)
2. Decide what “clean” means to you
3. Create your iMIS superhero
4. Set standards
5. Identify bad data
6. Remediate
7. Refine
8. Rinse and Repeat
Understanding the Paradigm
• Fixing everything in one pass is impractical
• Data cleanup is a process, not a one-time task
• Not all errors can be prevented
• Not all cleanup can be automated
• Consider cost/benefit
• Always remember, WE ARE HUMAN
Deciding on what “clean” means to you
• Every organization has different:
• Requirements
• Configurations
• Budgets
• Audiences
• Staff abilities
• Legacy data
• Vendors
• Politics
The “why” behind cleaning
• Build Trust with all
• Improve reporting quality
• Increase self-service
• Simplify automation and other projects
• Change of process, fresh start
• Save license fees
• Increase value of rosters
• Discard legacy data
• Save postage
• Save space
• Save money (e.g. emails)
(Benefits listed from intangible to tangible)
Your iMIS Superhero
• In charge of
• New Fields added to iMIS
• Bridging the gap between what is important and what is controllable
• Working with your vendors
• Initial through ongoing
• Creating reports/queries/other clean-up options
Set Standards
• Define criteria for “clean” and “dirty”
• Communicate the standards to all (staff)
• Configure iMIS to capture data in a useful format
• Birth date vs. age, staff size range vs. number of employees
• Desired outputs guide your inputs
• The ultimate use of data should guide how you collect that data (e.g. spouse name, public directory)
Identify bad data
• Country = Australia, but State is not an Australian state
• Country = Canada, but State is not a Province
• Potential duplicate related data (two orders for the same products the same day, same ID)
• Middle name is 1 character with no period (iMIS 20 now fixes this)
• Orphaned address
• Preferred address is blank
• “Other” selection with no other text provided
• One Address found when two are expected
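As a hedged illustration of one check from the list above, a single-letter middle name with no period can be flagged with a short T-SQL query. This is a minimal sketch that runs against made-up sample rows. The MIDDLE_NAME column name is an assumption, so confirm it in your own Name table before pointing the query at real data.

```sql
-- Hypothetical sample rows standing in for the iMIS Name table.
-- MIDDLE_NAME is an assumed column name; verify it in your database.
WITH SampleName AS (
    SELECT * FROM (VALUES
        ('101', 'Ann',  'B',     'Lee'),
        ('102', 'Raj',  'K.',    'Patel'),
        ('103', 'Luis', 'Maria', 'Gomez'),
        ('104', 'Mei',  '',      'Chen')
    ) AS v (ID, FIRST_NAME, MIDDLE_NAME, LAST_NAME)
)
SELECT ID, FIRST_NAME, MIDDLE_NAME, LAST_NAME
FROM SampleName
WHERE LEN(MIDDLE_NAME) = 1          -- exactly one character (LEN ignores trailing spaces)
  AND MIDDLE_NAME LIKE '[A-Za-z]';  -- and that character is a letter, so no period follows it
```

To try it on live data, replace the SampleName CTE with the real table and keep the WHERE clause.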
Identify bad data
• Business rule violations
• Lapsed members (still a member, but expired)
• Email required
• Students with past grad date
• Apprentice members must have a mentor relationship
• Board member not registered for House of Delegates
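Business rules like "Email required" translate directly into a query. The T-SQL sketch below is one possible version, run against hypothetical sample rows. It assumes that STATUS = 'A' and MEMBER_RECORD = 1 identify active member records, as the views later in this deck do, and that a blank or NULL EMAIL counts as missing.

```sql
-- Hypothetical sample rows; the column names mirror the Name columns used elsewhere in this deck.
WITH SampleName AS (
    SELECT * FROM (VALUES
        ('201', 'A', 1, 'pat@example.org'),
        ('202', 'A', 1, ''),
        ('203', 'A', 1, NULL),
        ('204', 'I', 1, '')
    ) AS v (ID, STATUS, MEMBER_RECORD, EMAIL)
)
SELECT ID, STATUS, EMAIL
FROM SampleName
WHERE STATUS = 'A'                                -- active records only (assumed meaning of 'A')
  AND MEMBER_RECORD = 1                           -- individual member records only
  AND ISNULL(LTRIM(RTRIM(EMAIL)), '') = '';       -- email is NULL, empty, or only spaces
```

The same shape works for most "field X is required for group Y" rules: filter to the group, then test for the blank value.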
Identifying bad data
Free Tools (IQA)
Pros
• Easy to access
• On demand
• End user accessible from iMIS
• Does not require SQL knowledge
• Can be made interactive
Cons
• Reactive
• No scheduling
• No automated corrections
• No automatic distribution of results (without PAP)
Identifying bad data
Free Tools (SSRS)
Pros
• On demand or scheduled*
• Can be under user control
• Can give them a home in iMIS
• Automatic Distribution of results (send email to all)*
Cons
• Reactive
• No automated corrections
• Some technical aptitude required
Identifying bad data
Free Tools (SQL)
Pros
• Incredibly powerful (SOUNDEX)
• Can create views/business objects
• More powerful with each version of SQL
• Automated corrections
Cons
• Requires SQL Server knowledge and access
• Dangerous in the wrong hands
• Can stop working without notice (Agent/Jobs)
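The SOUNDEX function mentioned under Pros can surface records whose names sound alike but are spelled differently. The T-SQL sketch below is a minimal illustration on hypothetical sample rows, not a tuned matching routine. Pairing on both first and last name is an assumption you may want to loosen or tighten.

```sql
-- Hypothetical sample rows standing in for the iMIS Name table.
WITH SampleName AS (
    SELECT * FROM (VALUES
        ('301', 'Jon',   'Smith'),
        ('302', 'John',  'Smyth'),
        ('303', 'Maria', 'Garcia')
    ) AS v (ID, FIRST_NAME, LAST_NAME)
)
SELECT a.ID AS ID_A, a.FIRST_NAME AS FIRST_A, a.LAST_NAME AS LAST_A,
       b.ID AS ID_B, b.FIRST_NAME AS FIRST_B, b.LAST_NAME AS LAST_B
FROM SampleName a
    INNER JOIN SampleName b
        ON a.ID < b.ID                                      -- report each pair once, never a row with itself
       AND SOUNDEX(a.FIRST_NAME) = SOUNDEX(b.FIRST_NAME)    -- first names share a phonetic code
       AND SOUNDEX(a.LAST_NAME)  = SOUNDEX(b.LAST_NAME)     -- last names share a phonetic code
WHERE a.FIRST_NAME <> ''
  AND a.LAST_NAME <> '';
```

A self-join like this compares every row with every other row, so on a large table it is worth restricting it first (for example, to one city or one member type).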
Identifying bad data
Included with iMIS 20 (Process Automation)
Pros
• Included with iMIS 20
• Multiple alerts included
• Works great with Staff site and Member site
Cons
• Limited Data Quality Alerts (Missing Primary email alert, Missing Mobile phone alert)
Identifying bad data
Optional Modules (Process Automation Plus)
Pros
• Create your own alerts
• Can perform scheduled tasks (and call SQL)
• Can send emails to Staff
• Can send SSRS reports
• Automatic Distribution of results
Cons
• Reactive
• Optional
• Additional Licensing Required
Identifying bad data
Optional modules (Customer Service Alerts)
Pros
• Proactive
• User configurable
• Works with desktop and staff site
• Immediate notification
• Can run processes
Cons
• No automated corrections
• No interactive corrections*
• Reactive
• Additional Licensing Required
Identifying bad data
Optional modules (TaskCentre)
Pros
• User configurable
• Immediate or delayed notification
• Can correct errors as well
Cons
• Reactive
• No interactive corrections
• IT Support required to configure and maintain
• Requires dedicated SQL Server (limiting hosting options)
Remediate
• Better labeling/instructions
• Restrict access or increase access (nobody knows a person’s data better than that person)
• Change the process, add lookup tables/validations OR just stop collecting data
• Upgrade to newer technology (i20)
• Rollback changes
• Notify the person who entered it
• Sooner is better!
• Notify a supervisor or someone who is able to fix (or punish)
• Modify the data automatically using tools
Refine
• Is our process working?
• Do we need to automate what we log?
• Have we discovered new problems?
• What is the most serious remaining problem? (choose 1 per month)
• Remember to check cost/benefit (JB story)
Rinse and Repeat
• The process is continuous
How about some tools to take with you?
• IQA Query #1
• If country = Canada, province is a valid province
Or
• If valid province, country should = Canada
If Country = Canada then Valid Postal Code
If Valid Postal Code then Country = Canada
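For readers who prefer SQL to IQA, the two province checks from Query #1 can be sketched in T-SQL as well. This is an illustration on hypothetical sample rows, not the IQA from the session. The STATE_PROVINCE column name and the use of two-letter province codes are assumptions to check against your own Name_Address data.

```sql
-- Two-letter codes for the 13 Canadian provinces and territories.
WITH ValidProvince AS (
    SELECT CODE FROM (VALUES
        ('AB'), ('BC'), ('MB'), ('NB'), ('NL'), ('NS'), ('NT'),
        ('NU'), ('ON'), ('PE'), ('QC'), ('SK'), ('YT')
    ) AS p (CODE)
),
-- Hypothetical sample rows standing in for Name_Address (STATE_PROVINCE is an assumed column name).
SampleAddress AS (
    SELECT * FROM (VALUES
        ('401', 1, 'ON',  'Canada'),
        ('402', 2, 'TX',  'Canada'),
        ('403', 3, 'BC',  'United States'),
        ('404', 4, 'NSW', 'Australia')
    ) AS v (ID, ADDRESS_NUM, STATE_PROVINCE, COUNTRY)
)
SELECT a.ID, a.ADDRESS_NUM, a.STATE_PROVINCE, a.COUNTRY,
       ISSUE = CASE WHEN a.COUNTRY = 'Canada'
                    THEN 'Country is Canada but province is not valid'
                    ELSE 'Valid province but country is not Canada'
               END
FROM SampleAddress a
    LEFT JOIN ValidProvince p ON p.CODE = a.STATE_PROVINCE
WHERE (a.COUNTRY = 'Canada' AND p.CODE IS NULL)          -- check 1: Canada with an unknown province
   OR (a.COUNTRY <> 'Canada' AND p.CODE IS NOT NULL);    -- check 2: a province code on a non-Canadian address
```

The second check can raise false alarms where codes overlap between countries (NT is also Australia's Northern Territory), so treat its results as candidates for review rather than confirmed errors.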
How about some tools to take with you?
• IQA Query #2
• Ensure email address = username
• Because IQA can’t compare one field to another (can’t say where csContact.EMAIL <> users.UserId), we need a custom business object
• Create a SQL view
• Create a business object
• Create an IQA
Username = Email address
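-- Flags active individual member records whose web username is missing or differs from their email.
-- Rows with a blank UserId are excluded by the Where clause, so 'Username missing' is only a safeguard.
create view k2V_UserNameEmailCheck as
Select n.ID, n.EMAIL, u.ContactMaster, u.UserId,
    VALID_CHECK = case
        when n.EMAIL = '' then 'Email missing'
        when u.UserId = '' then 'Username missing'
        when n.EMAIL <> u.UserId then 'Invalid Username'
        else ''
    end
From Name n
    Inner Join UserMain u On n.ID = u.ContactMaster
Where n.STATUS = 'A'
    And n.MEMBER_RECORD = 1
    And n.COMPANY_RECORD = 0
    And u.UserId <> ''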
How about some tools to take with you?
• IQA Query #3
• Ensure Canadian postal code matches correct format (LNL NLN)
• Unfortunately, this is another case where IQA comparisons can’t express more advanced logic
• SQL and Business Object time again …
Check for Valid/Correct Postal Code
-- Lists Canadian addresses whose ZIP does not match the LNL NLN format.
-- Each PATINDEX returns 1 when the character in that position is of the expected type,
-- so a well-formed postal code scores 7.
create view k2V_PostalCodeCheck as
Select ID, ADDRESS_NUM, ZIP,
    VALID_CHECK =
        PATINDEX('%[A-Z]%', substring(ZIP,1,1)) +
        PATINDEX('%[0-9]%', substring(ZIP,2,1)) +
        PATINDEX('%[A-Z]%', substring(ZIP,3,1)) +
        PATINDEX('% %', substring(ZIP,4,1)) +
        PATINDEX('%[0-9]%', substring(ZIP,5,1)) +
        PATINDEX('%[A-Z]%', substring(ZIP,6,1)) +
        PATINDEX('%[0-9]%', substring(ZIP,7,1)),
    COUNTRY
From Name_Address
Where ZIP <> ''
    And COUNTRY = 'Canada'
    And (
        PATINDEX('%[A-Z]%', substring(ZIP,1,1)) +
        PATINDEX('%[0-9]%', substring(ZIP,2,1)) +
        PATINDEX('%[A-Z]%', substring(ZIP,3,1)) +
        PATINDEX('% %', substring(ZIP,4,1)) +
        PATINDEX('%[0-9]%', substring(ZIP,5,1)) +
        PATINDEX('%[A-Z]%', substring(ZIP,6,1)) +
        PATINDEX('%[0-9]%', substring(ZIP,7,1)) <> 7
        Or LEN(ZIP) <> 7  -- also catch extra characters after an otherwise valid code
    )
How about some tools to take with you?
• IQA Query #4
• Find Duplicates based on First Name, Last Name, and part of address
Possible Duplicates (Name and City)
-- Lists IDs that share the same first name + last name + city once case and spaces are ignored.
create view k2_PossibleDuplicates as
Select ID
From Name
Where replace(lower(FIRST_NAME)+lower(LAST_NAME)+lower(CITY),' ','') in (
    Select replace(lower(FIRST_NAME)+lower(LAST_NAME)+lower(CITY),' ','') as MATCH_KEY
    from Name
    where FIRST_NAME <> '' and LAST_NAME <> ''
    group by replace(lower(FIRST_NAME)+lower(LAST_NAME)+lower(CITY),' ','')
    having count(replace(lower(FIRST_NAME)+lower(LAST_NAME)+lower(CITY),' ','')) > 1
)
How about some tools to take with you?
• IQA Query #5
• Members without paid dues subscriptions
• Using the paid thru date to decide what counts as “no paid dues”
Unpaid Members
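A rough SQL equivalent of this check might look like the T-SQL sketch below, run on hypothetical sample rows. It rests on two assumptions that are not from the session: that the paid thru date lives in a PAID_THRU column, and that "unpaid" means the date is missing or already in the past. Adjust both to your own rules.

```sql
-- Hypothetical sample rows; PAID_THRU is an assumed column name.
WITH SampleName AS (
    SELECT * FROM (VALUES
        ('501', 'A', 1, CAST('2099-12-31' AS date)),
        ('502', 'A', 1, CAST('2015-12-31' AS date)),
        ('503', 'A', 1, CAST(NULL AS date))
    ) AS v (ID, STATUS, MEMBER_RECORD, PAID_THRU)
)
SELECT ID, PAID_THRU
FROM SampleName
WHERE STATUS = 'A'                                  -- still flagged as active
  AND MEMBER_RECORD = 1                             -- individual member records only
  AND (PAID_THRU IS NULL                            -- never paid, or
       OR PAID_THRU < CAST(GETDATE() AS date));     -- paid thru date is before today
```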
Custom Shortcuts to Help with Editing
• Create a simple custom page with just the address and security details
• Pass the ID to the custom page from each IQA
More ideas from Michelle Lelempsis!
Michelle’s favorite Stored Procedures
• Job coding in Financials for division
• Phone formatting
• Country specific (Australia/Canada/USA) removal from address
• Unlocking member web accounts
• Activity tasks to notify a staff member about an issue
Michelle’s plan
• Data sources and data integrity
• Develop a Data Management Plan with key teams and management
• Identify your Minimum Data Set
• Develop IQA reports for daily, weekly, monthly and ad hoc checks
• Identify SQL Stored Procedures for updating fields
• Develop Staff site dashboards
• Implement Staff Site Alerts
Staff Site Data Integrity Dashboards: Community & Events (i20 2015Q4)
Staff Site Data Integrity (iMIS 20 Q4)
• Staff site Alerts
Let’s hear from you!