The Great Data Debate: Do data quality dimensions have a place in assessing data quality? DAMA UK / BCS Data Management Specialist Group, 20th June 2013

The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King


DESCRIPTION

This presentation comes from a joint BCS/DAMA event on 20 June 2013 that discussed different aspects of assessing data quality and the role that data quality dimensions can play. The presenter, Tim King of LSC Group, provided an overview of ISO 8000 and the standards perspective on assessing data quality. The video for this presentation is available at https://www.youtube.com/watch?v=kftnEO_A49c


Page 1: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The Great Data Debate – Do data quality dimensions have a place in assessing data quality?

DAMA UK/ BCS Data Management Specialist Group – 20th June 2013

Page 2: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

ISO 8000: Systemic and systematic data quality

03

Tim King, LSC Group

Page 3: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

ISO 8000: Systemic & systematic data quality

Dr. Timothy M. KING CEng CITP FIMechE FBCS DIC ACGI

IKM Principal Consultant, LSC Group

Convenor, ISO/TC184/SC4/WG13

DAMA / BCS DMSG: Do data quality dimensions have a place in assessing data quality?

2013-06-20

Page 4: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The context

• ISO/TC184/SC4

– "Industrial data"

– sub-committee of ISO/TC184 – "Automation systems & integration"

– founded July 1984

• standards for exchange, sharing & archiving of industrial data

– ISO 10303 – Product data representation & exchange

– ISO 13584 – Parts library

– ISO 15531 – Industrial manufacturing management data

– ISO 15926 – Integration of life-cycle data for process plants

– ISO 16739 – Data sharing in the construction & facility management industries

– ISO 17506 – 3D visualization of industrial data

– ISO 18629 – Process specification language

– ISO 18876 – Integration of industrial data for exchange, access & sharing

– ISO 22745 – Open technical dictionaries & their application to master data

– ISO 29002 – Exchange of characteristic data


Page 5: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The context


ISO/TC184/SC4/WG13 "Industrial data quality"

developing ISO 8000 "Data quality" since 2006


Page 6: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

ISO/TC184/SC4/WG13

• "Industrial data quality"

• founded 2006

• three face-to-face meetings per year

– two in parallel with parent committee ISO/TC184/SC4

• teleconference calls using Webex

– provided by ISO with free dial-in capability for all participants

• e-mail distribution list

– 150+ experts (including academics, engineers, scientists, consultants)

– 20+ countries

– manufacturing, logistics, mining, health, finance

• typical attendance at meetings of 15 to 20 individuals


Page 7: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

What is data quality?

Page 8: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

What is data quality?

The Mars Climate Orbiter

• ... lost upon entry into orbit around Mars

• the Executive Summary from the Mishap Investigation Board identified that the primary cause of the accident was a data quality issue …

"thruster performance data in English units was used … the data … was required to be in metric units per existing software interface documentation"
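The quoted failure can be illustrated with a small sketch in which every value carries its unit, so that an interface requiring metric units either converts or rejects the input explicitly. The `Impulse` type, the unit labels and the function name are hypothetical, and the quantity is assumed here to be an impulse; only the pound-force to newton conversion factor is a fixed fact.

```python
from dataclasses import dataclass

# Exact conversion factor: 1 pound-force = 4.4482216152605 newtons.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value that always carries its unit (hypothetical type)."""
    value: float
    unit: str  # "N*s" (metric) or "lbf*s" (English units)

    def to_newton_seconds(self) -> "Impulse":
        """Return the same impulse expressed in newton-seconds."""
        if self.unit == "N*s":
            return self
        if self.unit == "lbf*s":
            return Impulse(self.value * LBF_S_TO_N_S, "N*s")
        raise ValueError(f"unknown unit: {self.unit!r}")


def accept_thruster_data(impulse: Impulse) -> float:
    """Interface that requires metric units: convert known units, reject the rest."""
    return impulse.to_newton_seconds().value
```

With this shape, a bare number without a unit cannot be passed across the interface at all, which is the kind of explicit data requirement the rest of the presentation argues for.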


Page 9: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

What is data quality?

Examples of data quality problems:

• spare part in warehouse but not recorded in computer (number in stock = 0)

• data has no sensible interpretation (length of bolt = "green")

• self-intersecting curve in CAD file
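Two of these examples can be expressed as simple checks. The following is a minimal sketch with hypothetical function names; it assumes a bolt length is recorded as a number of millimetres and that a physical count of the warehouse is available for comparison.

```python
def check_bolt_length(value):
    """Return a list of problems found in a 'length of bolt' value (assumed mm)."""
    problems = []
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        problems.append(f"not a number: {value!r}")
    elif value <= 0:
        problems.append(f"not a positive length: {value!r}")
    return problems


def check_stock_against_count(recorded, counted):
    """Compare the recorded stock level with a physical count of the warehouse."""
    if recorded != counted:
        return [f"record says {recorded}, warehouse count is {counted}"]
    return []
```

The first check needs only the data and its requirement. The second needs a look at the warehouse itself, which is why the two kinds of problem call for different kinds of validation.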


Page 10: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

What is data quality?

• ISO/IEC 25012 (Software engineering data quality model)

• ISO/IEC 15288 (Systems engineering)

• Accenture

• US Defense Logistics Information Service

• Butler Group

• Korean Database Promotion Centre

• Shell

• UK MOD Acquisition Management System

• DGIQ (German Data & Information Quality Association)

• IAIDQ (International Association for Information & Data Quality)


Page 11: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

What is data quality?

accessibility, accessibility / security, accuracy, appropriate amount of data, authenticity, availability, believability, changeability, clarity, compatibility, complete, completeness, compliance, concise representation, conciseness, confidential, confidentiality, conformance with business rules, congruity, consistency, consistent representation, correctness, cost / benefit, credibility, currency, current, currentness, ease of manipulation, efficiency, flexibility, free of error, inaccurate, integrity, interpretability, legible, liability, necessity, objectivity, outdated, portability, precision, protection, recoverability, redundancy, redundant, referential integrity, relevance, relevancy, relevant, reputation, retrievability, safety, security, sufficiency, timeliness, timeliness / timely, traceability, unanimity, understandability, usability, utility, utilization, validity, validity of data content, validity of format, value added, verifiable

Page 12: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

ISO/IEC 25012 (Software engineering data quality model)


Page 13: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

IAIDQ (International Association for Information & Data Quality)


Page 14: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

What is data quality?

• ISO/IEC 25012: Software engineering data quality model

• IAIDQ: International Association for Information & Data Quality

Page 15: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

What is data quality?

Page 16: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The fundamentals of quality

ISO 9000:2005: a process-based quality management system

[Diagram labels: continual improvement of the quality management system; management responsibility; resource management; measurement, analysis & improvement; product realization; customer; requirements; input; output; product; satisfaction; accountability]

Page 17: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

Information & data quality

[Same ISO 9000:2005 diagram of a process-based quality management system as on the previous slide, annotated as follows.]

• product: for data processes, "product" is data

• requirements: quality is conformance to requirements, data quality is conformance to data requirements

• product realization: a process focus is the basis on which to build in quality
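If data quality is conformance to data requirements, then quality can be measured once the requirements are stated explicitly. The following is a minimal sketch under that reading; the record fields, the two requirements and the function names are hypothetical and not taken from ISO 8000.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Requirement = Callable[[Record], bool]

# Hypothetical data requirements for a spare-part record.
REQUIREMENTS: Dict[str, Requirement] = {
    "part number is present": lambda r: bool(r.get("part_number")),
    "stock level is a non-negative integer": lambda r: (
        isinstance(r.get("stock"), int)
        and not isinstance(r.get("stock"), bool)
        and r["stock"] >= 0
    ),
}


def nonconformances(record: Record) -> List[str]:
    """Names of the requirements that the record fails to meet."""
    return [name for name, ok in REQUIREMENTS.items() if not ok(record)]


def conformance_rate(records: List[Record]) -> float:
    """Fraction of records that meet every requirement (1.0 for an empty list)."""
    if not records:
        return 1.0
    conforming = sum(1 for r in records if not nonconformances(r))
    return conforming / len(records)
```

For example, a list containing `{"part_number": "A1", "stock": 3}` and `{"part_number": "", "stock": -1}` has a conformance rate of 0.5, and the second record is reported against both requirements.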


Page 18: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The different perspectives on information & data quality

business processes

• the primary, core processes of interest to the user, involving making decisions & achieving outcomes for which the user is responsible

• examples of these processes include designing an aircraft, recruiting a new member of staff, extinguishing a fire, manufacturing ice cream etc.


Page 19: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The different perspectives on information & data quality

business processes

information management

• the means by which data are made available to ensure the right person at the right time can make the right decision as part of a particular business process

• ISO 15288 identifies the following tasks as forming information management: generate, collect, transform, retain, retrieve, disseminate & dispose

DAMA-DMBOK Guide

• data governance

• data architecture management

• data development

• database operations management

• data security management

• reference & master data management

• data warehousing & business intelligence management

• document & content management

• meta data management

• data quality management


Page 20: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The different perspectives on information & data quality

business processes

information management

data enable processes

processes create data

resources enable information management

• any component by which to achieve the required outcomes of information management

• these resources include people, software & hardware


Page 21: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The different perspectives on information & data quality

business processes

information management

data enable processes

processes create data

resources enable information management

• process focus: quality management & process maturity

– ISO 9000

– ISO 15504 (ISO 33000)

• data focus: quality = conformance of data to requirements

– three types of quality: syntactic, semantic, pragmatic
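The slide names the three types of quality without defining them. The sketch below assumes the usual semiotic reading: syntactic quality is conformance to a specified format, semantic quality is correspondence to the thing described, and pragmatic quality is suitability for a particular use. The tyre-size format, field names and function names are all hypothetical.

```python
import re


def syntactic_ok(record):
    """Syntactic: the value follows the specified format (assumed 'NNN/NN RNN')."""
    return re.fullmatch(r"\d{3}/\d{2} R\d{2}", record["size"]) is not None


def semantic_ok(record, physical_item):
    """Semantic: the recorded value matches the item it is meant to describe."""
    return record["size"] == physical_item["size"]


def pragmatic_ok(record, needed_fields):
    """Pragmatic: the record holds every field a particular use needs."""
    return all(record.get(field) not in (None, "") for field in needed_fields)
```

A record such as `{"size": "435/95 R25"}` can pass the syntactic check while failing the semantic one (the tyre on the shelf is a different size) or the pragmatic one (a purchasing process also needs a supplier).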


Page 22: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

ISO 8000 – In-scope list

• The following are within the scope of ISO 8000:

– principles of data quality;

– characteristics of data that determine its quality;

– requirements for achieving data quality;

– requirements for the representation of data requirements, measurement methods, and inspection results for the purposes of data quality;

– frameworks for measuring and improving data quality.


Page 23: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The parts of ISO 8000

General

Information & data focus

Process focus


Page 24: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The parts of ISO 8000

General

• 1 Overview, principles & general requirements

• 2 Terminology

• 3 Taxonomy

Page 25: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The parts of ISO 8000

Information & data focus

• 8 Information quality: Concepts & measuring

• 9 Information quality: Relationship to other standards

• 10 Exchange of data: Syntax, semantic encoding & conformance to data specification

• 20 Exchange of data: Provenance

• 30 Exchange of data: Accuracy

• 40 Exchange of data: Completeness

• 100 Master data: Overview

• 102 Master data: Terminology

• 110 Master data: Exchange of characteristic data: Syntax, semantic encoding & conformance to data specification

• 120 Master data: Provenance

• 130 Master data: Accuracy

• 140 Master data: Completeness

• 311 Usage guide for ISO 10303-59 (Product data quality-shape)

Page 26: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The parts of ISO 8000

Process focus

• 60 Data quality management: The overview of process assessment

• 61 Data quality management: Process reference model

• 62 Data quality management: Process maturity assessment model

• 63 Data quality management: Measurement framework

• 150 Master data: Quality management framework

Page 27: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

Some complications

• "information" & "data"

– definitions from ISO/IEC 2382-1:1993

• data: "re-interpretable representation of information in a formalized manner suitable for communication, interpretation, or processing"

• information: "knowledge concerning objects, such as facts, events, things, processes, or ideas, including concepts, that within a certain context has a particular meaning"

• attributes? dimensions? does data have colour?

– try reading warning notices in red text when wearing night vision goggles …

– multiple layers to the issue

• ISO/IEC 25012: "Software engineering data quality model"


Page 28: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

Case study: Data quality requirements in master data management

Page 29: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

ISO 8000 in implementation form

[Diagram, courtesy of PiLog. Elements shown:]

• ERP

• Load Data Capture

• Map metadata to eOTD

• Convert to ISO 22745-40 data stream

• Data Integration

• ISO 8000-120 Master Data Warehouse: portable master data with provenance; provenance data

• ISO 22745 Managed Ontology: terminology, data requirements, classifications, description rules

• ECCMA Managed Ontology: terminology (eOTD), data requirements (eDRR), classifications (eCLR)

• Master Data Cleansing

– 1. Identify reference data

– 2. Identify or assign class

– 3. Assign data requirement

– 4. Map properties (attributes)

– 5. Identify & standardize values

– 6. Obtain missing data (enrich)

– 7. Validate data

• Create multilingual descriptions

• Identify potential duplicates

Page 30: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

Rigorous statement & exchange of requirements

Parties: Data requester, Data provider, Sub

Messages (the request and exchange pair appears twice on the slide):

• Data requirement: eOTD-i-xml (ISO 22745-30)

• Request for data: eOTD-q-xml (ISO 22745-35)

• Data exchange: eOTD-r-xml (ISO 22745-40)

Page 31: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

Inventory rationalization as a result of ISO 8000

Common ERP descriptions:

• 52368965412 – Tire Bridgestone 435/95 R25

• 56329845 – Tyre BS 435/R25 Standard Purpose E3 2 Star Radial

• 125435 – Bridge Stone 25inch 435/95

• 965123465 – Tyre Bridgestone Part Number 12345

Standardised Long Description:

Tire: Pneumatic, Vehicular

– Service Type for Which Designed: Loader

– Tire Rim Nominal Diameter: 25'

– Tire Width: 445mm

– Aspect Ratio: 0.95

– Tire Ply Arrangement: Radial

– Ply Rating: 2*

– Tire & Rim Association Number: E3

– Tread Material: Standard

– Tire Air Retention Method: Tubeless

– Tire Load Index and Speed Symbol: NA

– Tread Pattern: VHB

– TKPH Rating: 80

Standardised Short Description:

Tire Pneumatic: Loader 25' 445mm 0.95 2*
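One way to see how the standardised descriptions support rationalization is to treat each item as a set of property values and to generate the description from a rule. The sketch below is an assumption-laden illustration: the rule `SHORT_RULE`, the function names and the idea of comparing full property sets are hypothetical, not the PiLog or ISO 22745 method.

```python
from typing import Dict, List, Tuple

# Hypothetical description rule: which properties appear in the short
# description, and in which order.
SHORT_RULE = [
    "Service Type for Which Designed",
    "Tire Rim Nominal Diameter",
    "Tire Width",
    "Aspect Ratio",
    "Ply Rating",
]


def short_description(item_class: str, properties: Dict[str, str]) -> str:
    """Render a short description from property values, following SHORT_RULE."""
    values = [properties[name] for name in SHORT_RULE if name in properties]
    return f"{item_class}: " + " ".join(values)


def potential_duplicates(items: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """Pair each item number with the first earlier item that has identical properties."""
    seen: Dict[tuple, str] = {}
    pairs: List[Tuple[str, str]] = []
    for number, props in items.items():
        key = tuple(sorted(props.items()))
        if key in seen:
            pairs.append((seen[key], number))
        else:
            seen[key] = number
    return pairs
```

Because the free-text ERP descriptions differ in spelling and word order, they cannot be compared directly. Once each entry has been cleansed into the same properties, entries that describe the same tyre produce identical keys and are flagged as potential duplicates.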


Page 32: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

The benefits of ISO 8000

• vague data requirements → explicit, measurable data requirements

• human-readable requirements → computer-processable requirements

• requirements differ from project to project → classified, common types of requirement

• repeated cleansing of same non-conformances → data right, first & every time

• ad hoc approaches to validation → recommended types of validation

Page 33: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

Conclusions

• systematic

– alignment with ISO 9000 principles of quality

– driven by explicit, robust data requirements

• systemic

– errors in data fields as a symptom of the real problem

– sustainable quality from the enterprise strategy downwards


Page 34: The Great Data Debate (3) ISO8000: Systemic and systematic data quality, T.King

Useful links

• ISO

– http://www.iso.org/iso/home.html

• ISO/TC184/SC4/WG13

– http://isotc.iso.org/livelink/livelink?func=ll&objId=8838237&objAction=browse&sort=name

• BSI AMT/4 "Industrial data & manufacturing interfaces"

– http://standardsdevelopment.bsigroup.com/Home/Committee/50001757

• LSC Group

– http://www.lsc.co.uk/
