This presentation is from a joint BCS/DAMA event on 20/6/13 that discussed different aspects of assessing data quality and the role that data quality dimensions can play. It was given by Tim King of LSC Group, who provided an overview of ISO 8000 and the standards perspective on assessing data quality. The video of this presentation is available at https://www.youtube.com/watch?v=kftnEO_A49c
The Great Data Debate – Do data quality dimensions have a place in assessing data quality?
DAMA UK/ BCS Data Management Specialist Group – 20th June 2013
ISO 8000: Systemic and systematic data quality
Tim King, LSC Group
Dr. Timothy M. KING CEng CITP FIMechE FBCS DIC ACGI
IKM Principal Consultant, LSC Group
Convenor, ISO/TC184/SC4/WG13
DAMA / BCS DSMG Do data quality dimensions have a place in assessing data quality?
2013-06-20
The context
• ISO/TC184/SC4
– "Industrial data"
– sub-committee of ISO/TC184 – "Automation systems & integration"
– founded July 1984
• standards for exchange, sharing & archiving of industrial data
– ISO 10303 – Product data representation & exchange
– ISO 13584 – Parts library
– ISO 15531 – Industrial manufacturing management data
– ISO 15926 – Integration of life-cycle data for process plants
– ISO 16739 – Data sharing in the construction & facility management industries
– ISO 17506 – 3D visualization of industrial data
– ISO 18629 – Process specification language
– ISO 18876 – Integration of industrial data for exchange, access & sharing
– ISO 22745 – Open technical dictionaries & their application to master data
– ISO 29002 – Exchange of characteristic data
• ISO/TC184/SC4/WG13 "Industrial data quality"
– developing ISO 8000 "Data quality" since 2006
ISO/TC184/SC4/WG13
• "Industrial data quality"
• founded 2006
• three face-to-face meetings per year
– two in parallel with parent committee ISO/TC184/SC4
• teleconference calls using Webex
– provided by ISO with free dial capability for all participants
• e-mail distribution list
– 150+ experts (including academics, engineers, scientists, consultants)
– 20+ countries
– manufacturing, logistics, mining, health, finance
• typical attendance at meetings of 15 to 20 individuals
What is data quality?
The Mars Climate Orbiter
• ... lost upon entry into orbit around Mars
• the Executive Summary from the Mishap Investigation Board identified that the primary cause of the accident was a data quality issue …
– "thruster performance data in English units was used … the data … was required to be in metric units per existing software interface documentation"
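The interface failure quoted above can be illustrated with a minimal sketch. It assumes a hypothetical `Impulse` type that carries its unit explicitly and a hypothetical `check_interface` function; neither name comes from the mishap report. The only external fact used is the standard conversion 1 lbf = 4.4482216152605 N.

```python
from dataclasses import dataclass

# Standard conversion factor: 1 pound-force = 4.4482216152605 newtons.
LBF_TO_N = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A thruster impulse value that always carries its unit (hypothetical type)."""
    value: float
    unit: str  # "N*s" (metric) or "lbf*s" (English)

    def to_newton_seconds(self) -> float:
        """Convert the value to newton-seconds, rejecting unknown units."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_TO_N
        raise ValueError(f"unknown unit: {self.unit!r}")


def check_interface(impulse: Impulse, required_unit: str = "N*s") -> bool:
    """Return True only if the data conforms to the stated interface requirement."""
    return impulse.unit == required_unit
```

A value delivered as `Impulse(1.0, "lbf*s")` fails `check_interface`, so the mismatch is detected as a non-conformance to the data requirement rather than passing silently as a plain number.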
What is data quality?
• spare part in warehouse but not recorded in computer
– number in stock = 0
• data has no sensible interpretation
– length of bolt = "green"
– self-intersecting curve in CAD file
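The two kinds of problem above can be sketched as simple checks. This is an illustration only: the field name `length_mm` and both function names are assumptions, not part of any standard.

```python
from __future__ import annotations

from numbers import Real


def bolt_length_errors(record: dict) -> list[str]:
    """Report a length value that has no sensible interpretation."""
    errors = []
    length = record.get("length_mm")  # hypothetical field name
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(length, bool) or not isinstance(length, Real):
        errors.append(f"length_mm has no sensible interpretation: {length!r}")
    elif length <= 0:
        errors.append(f"length_mm must be positive: {length!r}")
    return errors


def stock_discrepancy(recorded: int, counted: int) -> int:
    """Difference between a physical count and the recorded stock level."""
    return counted - recorded
```

The first check needs only the data and its requirement. The second needs a comparison with the real world (a physical count), which is why a recorded stock of 0 can look valid while still being wrong.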
What is data quality?
• ISO/IEC 25012 (Software engineering data quality model)
• ISO/IEC 15288 (Systems engineering)
• Accenture
• US Defense Logistics Information Service
• Butler Group
• Korean Database Promotion Centre
• Shell
• UK MOD Acquisition Management System
• DGIQ (German Data & Information Quality Association)
• IAIDQ (International Association for Information & Data Quality)
What is data quality?
accessibility; accessibility / security; accuracy; appropriate amount of data;
authenticity; availability; believability; changeability; clarity; compatibility;
complete; completeness; compliance; concise representation; conciseness;
confidential; confidentiality; conformance with business rules; congruity;
consistency; consistent representation; correctness; cost / benefit; credibility;
currency; current; currentness; ease of manipulation; efficiency; flexibility;
free of error; inaccurate; integrity; interpretability; legible; liability;
necessity; objectivity; outdated; portability; precision; protection;
recoverability; redundancy; redundant; referential integrity; relevance;
relevancy; relevant; reputation; retrievability; safety; security; sufficiency;
timeliness; timeliness / timely; traceability; unanimity; understandability;
usability; utility; utilization; validity; validity of data content;
validity of format; value added; verifiable
[Figure: the list of dimensions above, repeated under the headings ISO/IEC 25012 (Software engineering data quality model) and IAIDQ (International Association for Information & Data Quality)]
The fundamentals of quality
[Figure: ISO 9000:2005, a process-based quality management system]
• continual improvement of the quality management system
• management responsibility
• resource management
• product realization
– input: requirements
– output: product
• measurement, analysis & improvement
• customer
– requirements
– satisfaction
• accountability
Information & data quality
[Figure: the ISO 9000:2005 process-based quality management system, annotated for data]
• product: for data processes, "product" is data
• requirements: quality is conformance to requirements, data quality is conformance to data requirements
• product realization: a process focus is the basis on which to build in quality
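If data quality is conformance to data requirements, it can be measured once the requirements are stated explicitly. The sketch below is one possible reading, not a method defined by ISO 8000: each requirement is a named predicate over a record, and the result is the fraction of records that satisfy it. The requirement names and record fields are hypothetical.

```python
from typing import Callable, Dict, List

# A requirement is a predicate: it returns True when a record conforms.
Requirement = Callable[[dict], bool]


def conformance(records: List[dict],
                requirements: Dict[str, Requirement]) -> Dict[str, float]:
    """Fraction of records that conform to each named requirement."""
    if not records:
        raise ValueError("no records to measure")
    return {
        name: sum(1 for record in records if check(record)) / len(records)
        for name, check in requirements.items()
    }


# Hypothetical requirements for a spare-parts record.
REQUIREMENTS: Dict[str, Requirement] = {
    "has_part_number": lambda r: bool(r.get("part_number")),
    "stock_is_non_negative_integer":
        lambda r: isinstance(r.get("stock"), int) and r["stock"] >= 0,
}
```

For example, `conformance([{"part_number": "A1", "stock": 3}, {"part_number": "", "stock": -1}], REQUIREMENTS)` reports 0.5 for both requirements, since one of the two records conforms to each.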
The different perspectives on information & data quality
business processes
• the primary, core processes of interest to the user, involving making decisions & achieving outcomes for which the user is responsible
• examples of these processes include designing an aircraft, recruiting a new member of staff, extinguishing a fire, manufacturing ice cream etc.
information management
• the means by which data are made available to ensure the right person at the right time can make the right decision as part of a particular business process
• ISO/IEC 15288 identifies the following tasks as forming information management: generate, collect, transform, retain, retrieve, disseminate & dispose
DAMA-DMBOK Guide
• data governance
• data architecture management
• data development
• database operations management
• data security management
• reference & master data management
• data warehousing & business intelligence management
• document & content management
• meta data management
• data quality management
data enable processes
processes create data
resources enable information management
• any component by which to achieve the required outcomes of information management
• these resources include people, software & hardware
• process focus: quality management & process maturity
– ISO 9000
– ISO 15504 (ISO 33000)
• data focus: quality = conformance of data to requirements
• three types of quality
– syntactic
– semantic
– pragmatic
ISO 8000 – In-scope list
• The following are within the scope of ISO 8000:
– principles of data quality;
– characteristics of data that determine its quality;
– requirements for achieving data quality;
– requirements for the representation of data requirements, measurement methods, and inspection results for the purposes of data quality;
– frameworks for measuring and improving data quality.
The parts of ISO 8000
• General
– 1 Overview, principles & general requirements
– 2 Terminology
– 3 Taxonomy
• Information & data focus
– 8 Information quality: Concepts & measuring
– 9 Information quality: Relationship to other standards
– 10 Exchange of data: Syntax, semantic encoding & conformance to data specification
– 20 Exchange of data: Provenance
– 30 Exchange of data: Accuracy
– 40 Exchange of data: Completeness
– 100 Master data: Overview
– 102 Master data: Terminology
– 110 Master data: Exchange of characteristic data: Syntax, semantic encoding & conformance to data specification
– 120 Master data: Provenance
– 130 Master data: Accuracy
– 140 Master data: Completeness
– 311 Usage guide for ISO 10303-59 (Product data quality-shape)
• Process focus
– 60 Data quality management: The overview of process assessment
– 61 Data quality management: Process reference model
– 62 Data quality management: Process maturity assessment model
– 63 Data quality management: Measurement framework
– 150 Master data: Quality management framework
Some complications
• "information" & "data"
– definitions from ISO/IEC 2382-1:1993
• data: "re-interpretable representation of information in a formalized manner suitable for communication, interpretation, or processing"
• information: "knowledge concerning objects, such as facts, events, things, processes, or ideas, including concepts, that within a certain context has a particular meaning"
• attributes? dimensions? does data have colour?
– try reading warning notices in red text when wearing night vision goggles …
– multiple layers to the issue
• ISO/IEC 25012: "Software engineering data quality model"
Case study: Data quality requirements in master data management
ISO 8000 in implementation form
[Figure: implementation architecture, courtesy of PiLog]
• ERP
• Data Integration
• Load
• Data Capture
• Map metadata to eOTD
• Convert to ISO 22745-40 data stream
• Master Data Cleansing
– 1. Identify reference data
– 2. Identify or assign class
– 3. Assign data requirement
– 4. Map properties (attributes)
– 5. Identify & standardize values
– 6. Obtain missing data (enrich)
– 7. Validate data
• Create multilingual descriptions
• Identify potential duplicates
• ISO 22745 Managed Ontology
– Terminology, Data requirements, Classifications, Description rules
• ECCMA Managed Ontology
– Terminology (eOTD), Data requirements (eDRR), Classifications (eCLR)
• ISO 8000-120 Master Data Warehouse
– portable master data with provenance
– provenance data
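The master data cleansing steps listed above can be sketched as a single function. This is a toy illustration under stated assumptions: the lookup tables `CLASSES`, `DATA_REQUIREMENTS` and `PROPERTY_MAP` are hypothetical stand-ins for a managed ontology and contain no real eOTD content. Step 1 (identify reference data) is represented only by the existence of those tables, and step 6 only reports what is missing rather than obtaining it.

```python
from __future__ import annotations

# Hypothetical stand-ins for a managed ontology; not real eOTD content.
CLASSES = {"tyre": "Tire", "tire": "Tire"}                      # term -> class
DATA_REQUIREMENTS = {"Tire": ["width_mm", "rim_diameter_in"]}   # class -> required properties
PROPERTY_MAP = {"w": "width_mm", "width": "width_mm", "rim": "rim_diameter_in"}


def cleanse(raw: dict) -> dict:
    """Apply simplified versions of cleansing steps 2 to 7 to one source record."""
    # Step 2: identify or assign the class from the free-text name.
    name = raw.get("name", "").lower()
    item_class = next((c for term, c in CLASSES.items() if term in name), None)

    # Step 3: assign the data requirement for that class.
    required = DATA_REQUIREMENTS.get(item_class, [])

    # Step 4: map source attributes to ontology properties.
    mapped = {PROPERTY_MAP[k]: v
              for k, v in raw.get("attributes", {}).items()
              if k in PROPERTY_MAP}

    # Step 5: standardize values by stripping unit text and converting to numbers.
    standardized = {}
    for prop, value in mapped.items():
        text = str(value).lower().replace("mm", "").replace("inch", "").strip()
        try:
            standardized[prop] = float(text)
        except ValueError:
            standardized[prop] = None  # value could not be standardized

    # Step 6: list the required properties that still need to be obtained.
    missing = [p for p in required if standardized.get(p) is None]

    # Step 7: validate against the data requirement.
    valid = item_class is not None and not missing
    return {"class": item_class, "properties": standardized,
            "missing": missing, "valid": valid}
```

Calling `cleanse({"name": "Tyre Bridgestone", "attributes": {"w": "445mm", "rim": "25inch"}})` yields a valid `Tire` record, while a record without a rim diameter is returned with that property listed under `missing`.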
Rigorous statement & exchange of requirements
• participants: Data requester, Data provider, Sub
• Request for data: eOTD-q-xml (ISO 22745-35)
• Data exchange: eOTD-r-xml (ISO 22745-40)
• Data requirement: eOTD-i-xml (ISO 22745-30)
Inventory rationalization as a result of ISO 8000
Common ERP descriptions:
• 52368965412 – Tire Bridgestone 435/95 R25
• 56329845 – Tyre BS 435/R25 Standard Purpose E3 2 Star Radial
• 125435 – Bridge Stone 25inch 435/95
• 965123465 – Tyre Bridgestone Part Number 12345
Standardised Long Description:
Tire: Pneumatic, Vehicular: Service Type for Which Designed: Loader Tire Rim Nominal Diameter: 25' Tire Width: 445mm Aspect Ratio: 0.95 Tire Ply Arrangement: Radial Ply Rating: 2* Tire & Rim Association Number: E3 Tread Material: Standard Tire Air Retention Method: Tubeless Tire Load Index and Speed Symbol: NA Tread Pattern: VHB TKPH Rating: 80
Standardised Short Description:
Tire Pneumatic: Loader 25' 445mm 0.95 2*
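The standardised descriptions above look like the output of a description rule applied to property-value pairs. The sketch below reproduces them under assumptions: the split of the long description into property names and values is this example's own reading of the slide, and the two rendering functions and the `SHORT_PROPERTIES` selection are hypothetical rather than rules taken from ISO 22745 or the eOTD.

```python
ITEM_CLASS = "Tire: Pneumatic, Vehicular"
SHORT_CLASS = "Tire Pneumatic"

# Property-value pairs as read from the slide's long description (assumed split).
PROPERTIES = [
    ("Service Type for Which Designed", "Loader"),
    ("Tire Rim Nominal Diameter", "25'"),
    ("Tire Width", "445mm"),
    ("Aspect Ratio", "0.95"),
    ("Tire Ply Arrangement", "Radial"),
    ("Ply Rating", "2*"),
    ("Tire & Rim Association Number", "E3"),
    ("Tread Material", "Standard"),
    ("Tire Air Retention Method", "Tubeless"),
    ("Tire Load Index and Speed Symbol", "NA"),
    ("Tread Pattern", "VHB"),
    ("TKPH Rating", "80"),
]

# Hypothetical rule: which properties appear in the short description, in order.
SHORT_PROPERTIES = [
    "Service Type for Which Designed",
    "Tire Rim Nominal Diameter",
    "Tire Width",
    "Aspect Ratio",
    "Ply Rating",
]


def long_description(item_class, properties):
    """Render the class followed by every 'name: value' pair."""
    return item_class + ": " + " ".join(f"{n}: {v}" for n, v in properties)


def short_description(short_class, properties, selected):
    """Render the short class name followed by the selected values only."""
    values = dict(properties)
    return short_class + ": " + " ".join(values[n] for n in selected)
```

Because both descriptions are generated from one structured record, the four differing ERP descriptions collapse to a single item once they are mapped to the same property values.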
The benefits of ISO 8000
• vague data requirements → explicit, measurable data requirements
• human-readable requirements → computer-processable requirements
• requirements differ from project to project → classified, common types of requirement
• repeated cleansing of same non-conformances → data right, first & every time
• ad hoc approaches to validation → recommended types of validation
Conclusions
• systematic
– alignment with ISO 9000 principles of quality
– driven by explicit, robust data requirements
• systemic
– errors in data fields as a symptom of the real problem
– sustainable quality from the enterprise strategy downwards
Useful links
• ISO
– http://www.iso.org/iso/home.html
• ISO/TC184/SC4/WG13
– http://isotc.iso.org/livelink/livelink?func=ll&objId=8838237&objAction=browse&sort=name
• BSI AMT/4 "Industrial data & manufacturing interfaces"
– http://standardsdevelopment.bsigroup.com/Home/Committee/50001757
• LSC Group
– http://www.lsc.co.uk/