Data Integrity Verification. Michael Kano, ACDA. IIA Orange County Chapter, November 13, 2015


Data Integrity Verification

Michael Kano, ACDA


IIA Orange County Chapter

November 13, 2015


Excel Transformed My Data!


BEFORE: 101122001XIOB00260002

AFTER: 10112210000000


Michael Kano

Senior Manager, Data Analytics

Sunera LLC

Michael is a Senior Manager with Sunera’s national data analytics practice. Michael has 20 years of experience in data analytics and internal audit with organizations in the USA, Canada, and Kuwait.

He has 20 years of experience with ACL software, including 8 years as the leader of ACL Services Ltd.’s global training team. During his tenure at ACL Services, Michael helped drive the training business to new levels of revenues and profits by actively supporting the Sales team in pre-sales discussions.

Michael’s most recent experience consists of four years with eBay, Inc.’s internal audit team as Manager, Audit Analysis. He was tasked with integrating data analytics into the audit workflow on strategic and tactical levels. This included developing quality and documentation standards, training users, and providing analytics support on numerous audits in the IT, PayPal, and eBay marketplaces business areas. He also provided support to non-IA teams such as the Business Ethics Office and Enterprise Risk Management teams.

During his years at eBay, Michael supported audits throughout the organization in the IT, compliance, operations, vendor management, revenue assurance, T&E, and human resources areas.

Michael also has 7 years of experience with Arbutus Software, and has managed the transition to Arbutus from other data analysis tools. He is a proficient user of Tableau, Microsoft Access, and Teradata SQL Assistant.


AGENDA

• Defining data integrity verification (DIV)

• Sources of integrity erosion

• File-level testing

• Field-level testing


Defining Data Integrity Verification


Data Integrity Verification (DIV)

• The process by which the data analyst tests the data to determine whether it is acceptable for analysis

• Tests should be carried out at both the file level and the field level before conducting any analytics.


The Risks of Integrity Erosion

• Lost time

• Incorrect conclusions

• Revenue/cost

• Security

• Professional standing


Evidence of data integrity erosion

• Missing records

• Excess records

• Duplicates

• Shifted fields

• Skewed records

• Blank/invalid entries in key fields

• Incorrect/invalid formatting

• Invalid characters in data


Shifted Fields


Skewed Records


Sources of Integrity Erosion


Processing…


The Process


Sources of data integrity errors

• Miscommunication of requirements

• Extraction

• Conversion

• Transmission

• Import

• Manual edits

• Data definition


Miscommunication

• "All AP transactions between April and June, including all important fields."

• "All AP payments and reversals between 4/1/2015 and 6/30/2015 (inclusive) including the following fields: <field list>. The output should be in a tab-delimited text file, and at no point should it pass through a spreadsheet or be opened in a spreadsheet application."


Conversion

• Dropping leading zeros (ID numbers)

• Converting date to numeric

• Removing alphas from alphanumeric field

• Use of a delimiter that also occurs within a text field

• Insertion of blank lines in Excel
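Two of the conversion problems above are easy to reproduce outside a spreadsheet. The following is a minimal Python sketch, not part of the original deck; the sample record and field names are hypothetical. It shows that an ID kept as text retains its leading zeros, that converting it to a number drops them, and that splitting on a delimiter that also appears inside a text field shifts the columns.

```python
import csv
import io

# Hypothetical tab-delimited extract: the ID has leading zeros and the
# description contains a comma.
raw = "vendor_id\tdescription\tamount\n00123\tBolts, hex\t45.10\n"

# Reading every field as text preserves the leading zeros.
rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))
id_as_text = rows[0]["vendor_id"]

# Converting the ID to a number silently drops them.
id_as_number = int(id_as_text)

# The same record written with commas as the delimiter and split naively:
# the comma inside the description produces an extra, shifted field.
naive_fields = "00123,Bolts, hex,45.10".split(",")

print(id_as_text, id_as_number, len(naive_fields))
```

With three real fields, the naive split yields four, which is the kind of shifted-field symptom listed earlier.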


Date Conversion


Manual Edits

• Inadvertent/deliberate editing

• How does that happen?

– Sorting

– Formatting

– Copy/pasting


Data Definition

• Record length

• Field position

• Formatting (date fields)


File-Level Testing


File-Level Testing

• Structure

• Content


Structure

• Review metadata

• Send table layout to a table in Arbutus/ACL

• Compare field type/length/format to metadata
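As an illustration of the comparison step, here is a rough Python sketch. It is not the Arbutus/ACL procedure from the slide. Both layouts are hypothetical dictionaries mapping a field name to a (type, length) pair, and `compare_layouts` is an illustrative helper name.

```python
# Hypothetical layout documented by the data owner: field -> (type, length).
expected = {
    "vendor_id": ("char", 5),
    "amount": ("numeric", 10),
    "inv_date": ("date", 10),
}

# Hypothetical layout as actually defined after import.
actual = {
    "vendor_id": ("numeric", 5),   # type differs from the metadata
    "amount": ("numeric", 10),
}

def compare_layouts(expected, actual):
    """Return (field, problem) pairs where the table disagrees with the metadata."""
    issues = []
    for name, spec in expected.items():
        if name not in actual:
            issues.append((name, "missing from table"))
        elif actual[name] != spec:
            issues.append((name, f"expected {spec}, found {actual[name]}"))
    for name in actual:
        if name not in expected:
            issues.append((name, "not in metadata"))
    return issues

issues = compare_layouts(expected, actual)
for field, problem in issues:
    print(field, "->", problem)
```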


Content

• Completeness

– Run COUNT to document number of records

– Run TOTAL on numeric fields for control totals

• Uniqueness: Run DUPLICATES command selecting all fields to identify duplicate records

• Validity: Run VERIFY against numeric and date fields
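For readers without Arbutus or ACL at hand, the same file-level content checks can be approximated in Python. This is a sketch under stated assumptions: the records are hypothetical, every field is held as text as extracted, and `is_valid_number` is only a rough stand-in for a validity check, not a reimplementation of VERIFY.

```python
from collections import Counter
from decimal import Decimal, InvalidOperation

# Hypothetical records, all fields held as text exactly as extracted.
records = [
    {"inv_no": "A001", "amount": "100.00"},
    {"inv_no": "A002", "amount": "250.50"},
    {"inv_no": "A002", "amount": "250.50"},   # exact duplicate record
    {"inv_no": "A003", "amount": "12x.00"},   # corrupt numeric entry
]

def is_valid_number(text):
    """Rough validity check: does the text parse as a decimal number?"""
    try:
        Decimal(text)
        return True
    except InvalidOperation:
        return False

# Completeness: record count and a control total to reconcile with the source.
record_count = len(records)
control_total = sum(
    Decimal(r["amount"]) for r in records if is_valid_number(r["amount"])
)

# Uniqueness: compare whole records, i.e. all fields selected.
whole_record_counts = Counter(tuple(sorted(r.items())) for r in records)
duplicates = [dict(key) for key, n in whole_record_counts.items() if n > 1]

# Validity: entries in the numeric field that do not parse.
invalid = [r for r in records if not is_valid_number(r["amount"])]

print(record_count, control_total, len(duplicates), len(invalid))
```

The count and control total only prove completeness when they are compared with figures supplied independently by the data owner.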


Field-Level Testing: Numerics


Numeric Fields: What to look for

• Field total

• Lowest value

• Highest value

• Average

• Second-highest value

• Range

• Ratio of 2nd highest to highest

• Absolute value

• Median

• Number of zeros

• Number of positives

• Number of negatives

• Number of corrupt entries


Testing Numeric Fields

• Run STATISTICS against all numeric fields

– Look for zeros, negatives, bounds, highest/second-highest

• Recalculate computed values with computed fields (e.g., Total_Amount = Price * Quantity)
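The numeric tests can be sketched in Python as follows. This is an illustration, not the output of the STATISTICS command: the sample amounts, the `numeric_profile` helper, and the half-cent tolerance used in the recalculation are all assumptions.

```python
from statistics import mean, median

# Hypothetical values from one numeric field (needs at least two values).
amounts = [120.0, 0.0, -35.5, 980.0, 1000.0, 0.0]

def numeric_profile(values):
    """Summarise a numeric field with the items a reviewer would scan first."""
    ordered = sorted(values)
    highest, second = ordered[-1], ordered[-2]
    return {
        "total": sum(values),
        "absolute_total": sum(abs(v) for v in values),
        "lowest": ordered[0],
        "highest": highest,
        "second_highest": second,
        "ratio_2nd_to_highest": second / highest if highest else None,
        "range": highest - ordered[0],
        "average": mean(values),
        "median": median(values),
        "zeros": sum(1 for v in values if v == 0),
        "positives": sum(1 for v in values if v > 0),
        "negatives": sum(1 for v in values if v < 0),
    }

profile = numeric_profile(amounts)

# Recalculate a stored total from its components and flag disagreements.
lines = [
    {"price": 2.50, "quantity": 4, "total_amount": 10.00},
    {"price": 3.00, "quantity": 3, "total_amount": 9.50},   # does not reconcile
]
mismatches = [
    ln for ln in lines
    if abs(ln["price"] * ln["quantity"] - ln["total_amount"]) > 0.005
]

print(profile)
print(len(mismatches), "line(s) where Total_Amount != Price * Quantity")
```

A ratio of second-highest to highest close to 1 suggests the top value is not an isolated outlier; a very small ratio is worth a closer look.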


Scripted Solution

• Shows table/field names, and test date-time in a table

• Provides comprehensive, standard test results

• Faster and less error-prone than manual execution

• 2 million records, 4 numeric fields in ~45 seconds

• Also saves table layout for file with _TL suffix


Script Results: Numerics


Field-Level Testing: Dates


Date Fields: What to look for

• Oldest

• Weekends

• Most recent

• Blanks

• Span of valid dates

• Invalid non-blank dates


Testing Date Fields

• Run STATISTICS against all date fields

– Blanks/invalids/weekends

– Bounds

• Test related fields, e.g., PO_Date <= Invoice_Date

• Test for completeness (24/7 data) with GAPS command
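The date tests above can be approximated in Python. This is a sketch, not the STATISTICS or GAPS commands themselves: the sample dates, the MM/DD/YYYY input format, and the assumption that 24/7 data should contain every calendar day are all illustrative.

```python
from datetime import date, datetime, timedelta

# Hypothetical raw date values as extracted (MM/DD/YYYY assumed).
raw_dates = ["04/01/2015", "04/04/2015", "", "02/30/2015", "04/06/2015"]

blanks = 0
invalid = []   # non-blank values that are not real dates
valid = []
for text in raw_dates:
    if not text.strip():
        blanks += 1
        continue
    try:
        valid.append(datetime.strptime(text, "%m/%d/%Y").date())
    except ValueError:
        invalid.append(text)

# Bounds and weekend activity (weekday() is 5 for Saturday, 6 for Sunday).
oldest, newest = min(valid), max(valid)
weekends = [d for d in valid if d.weekday() >= 5]

# Completeness for 24/7 data: calendar days in the span with no record.
present = set(valid)
span_days = (newest - oldest).days + 1
gaps = [
    oldest + timedelta(days=i)
    for i in range(span_days)
    if oldest + timedelta(days=i) not in present
]

# Related fields: a PO should not be dated after its invoice.
pairs = [
    (date(2015, 4, 1), date(2015, 4, 3)),
    (date(2015, 4, 10), date(2015, 4, 8)),   # PO_Date later than Invoice_Date
]
violations = [(po, inv) for po, inv in pairs if po > inv]

print(blanks, invalid, oldest, newest, weekends, gaps, violations)
```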


Blank Dates & Formatting

• An entire date column showing as blank indicates an incorrect format in the field definition.

• Use Edit >> Table Layout to review and correct the format
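The same symptom is easy to demonstrate in Python: when the declared format does not match the data, every value fails to parse. The sketch below uses hypothetical sample values and a hypothetical list of candidate formats, and reports the share of values each format can parse.

```python
from datetime import datetime

def parse_rate(values, fmt):
    """Fraction of values that parse as dates under the given format."""
    parsed = 0
    for value in values:
        try:
            datetime.strptime(value, fmt)
            parsed += 1
        except ValueError:
            pass
    return parsed / len(values)

# Hypothetical field content, stored as YYYY-MM-DD.
values = ["2015-04-01", "2015-05-15", "2015-06-30"]

# Candidate formats; the first two mimic an incorrect field definition.
candidates = ("%m/%d/%Y", "%d/%m/%Y", "%Y-%m-%d")
rates = {fmt: parse_rate(values, fmt) for fmt in candidates}

for fmt, rate in rates.items():
    print(fmt, f"{rate:.0%}")
```

A format with a parse rate of zero corresponds to the all-blank column described above; the fix is to correct the definition, not the data.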


Formatting Date Fields


Dates: Scripted Solution


Field-Level Testing: Characters


Character Fields: What to look for

Item             Functionality

Blanks           ISBLANK(<key>)

Invalid entries  CLASSIFY ON <key>
                 CLASSIFY ON FORMAT(<key>)

Duplicates       DUPLICATES ON <key>
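A rough Python equivalent of the blank and duplicate tests on a key field might look like this. The key values are hypothetical, and treating whitespace-only entries as blank is an assumption about how blanks should be defined.

```python
from collections import Counter

# Hypothetical key field values as extracted.
keys = ["PO1001", "PO1002", "   ", "PO1002", "", "PO1003"]

# Blanks: empty or whitespace-only entries in the key field.
blank_count = sum(1 for k in keys if not k.strip())

# Duplicates: non-blank key values that occur more than once.
key_counts = Counter(k for k in keys if k.strip())
duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)

print(blank_count, duplicate_keys)
```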


Character Fields: Formats

• Verify that format is valid

• May need to scrub

• PO numbers, customer IDs, phone numbers, zip codes

• Use FORMAT() function in CLASSIFY to display list of unique formats

CLASSIFY ON FORMAT(<field name>) TO "<output file>" OPEN


Output of CLASSIFY + Format()

• 1 record per format

• Shows frequency

x = lower-case alpha

X = upper-case alpha

9 = numeric

Blanks/special characters
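The idea behind classifying on a format mask can be sketched in Python. `format_mask` below is a hypothetical helper that follows the legend above (x, X, 9) and assumes blanks and special characters are passed through unchanged; it is not the implementation of the FORMAT() function. The phone numbers are made-up sample data.

```python
from collections import Counter

def format_mask(value):
    """Replace each character with its class: 9 digit, X upper, x lower."""
    mask = []
    for ch in value:
        if ch.isdigit():
            mask.append("9")
        elif ch.isalpha():
            mask.append("X" if ch.isupper() else "x")
        else:
            mask.append(ch)   # assumption: blanks/special characters kept as-is
    return "".join(mask)

# Hypothetical phone number field.
phones = ["714-555-0100", "(714) 555-0101", "714-555-0102", "7145550103"]

# One entry per distinct format, with its frequency.
format_counts = Counter(format_mask(p) for p in phones)

for mask, count in sorted(format_counts.items()):
    print(mask, count)
```

Rare masks in the output are the candidates for scrubbing or for follow-up with the data owner.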


Mitigating Integrity Risk


Key Items

• Know your data

• Obtain data independently (SQL?)

• Short chain from extraction to analysis

• Automated DIV


The Process


The New Process


Benefits

• Independence

• Confidence

• Shorter time

• Comprehensive DIV


Any questions?

Michael Kano, ACDA

[email protected]
