Copyright © OASIS, 2000 Onwards
Customising OASIS CIQ Specifications V3.0 to meet end user requirements – A Case Study
Ram Kumar, Founding Chairman
October 2008
http://www.oasis-open.org/committees/ciq
Agenda
Why this case study?
Code List
- What, Why, Standard
OASIS Code List Representation TC
Methodology: Schematron based Value Validation using Genericode (from OASIS Code List TC)
OASIS CIQ TC Implementation of OASIS Code List Specifications and Methodology – A Case Study
Why this Case Study?
Why this case study?
Demonstrate how OASIS CIQ Specifications v3.0 can be customised to meet end user requirements
- Without breaking conformance to the specifications as a result of customisation
Improve interoperability of data defined/represented using CIQ Specifications
Define specific business rules using open industry standards to customise CIQ specifications
Define code lists of CIQ specifications using open industry standards
Code List
What is a Code List?
Also known as enumerations, controlled vocabularies, or classification scheme and classification values
A set of values to choose from which represent an agreed upon semantic concept
Days of a week = {“Mon”, “Tue”, “Wed”, “Thu”, “Fri”, “Sat”, “Sun”}
Code List = List Name + Values
- List Name = Days of a week
- Values = {“Mon”, “Tue”, “Wed”, “Thu”, “Fri”, “Sat”, “Sun”}
Why are Code Lists important?
It is not just element and attribute names in XML that need to be semantically unambiguous and aligned for interoperability
The lexical form of element and attribute text content also needs to be aligned, i.e. simple data items need to be represented the same way
This is more important for applications
For data oriented XML in particular (e.g. CIM), code lists are as important as elements and attributes: they form part of the complete vocabulary of the document
Standard for Code List
If code lists were really so simple and obvious, there would be a single, well known and acceptable way of handling them in XML
There is no agreed solution, though
The problem is that while code lists are a well understood concept, people do not actually agree on exactly what code lists are, and how they should be used
The code list is in the eyes of the beholder
The XML schema may require only 3-letter codes to represent the code list
The database may require a set of numeric codes, plus display labels (possibly in different languages)
The application may need to know which 3-letter code corresponds to which numeric code, so that it can process the XML and update the database
All of this code list information needs to be stored together in a single representation of the code list, so that all usages of the code list can be generated from the same source information
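A minimal Python sketch of the single-source idea on this slide. The currency rows, the column names (`alpha3`, `numeric`, `label_en`) and the three helper functions are assumptions made for illustration; they are not taken from any CIQ or genericode file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CurrencyEntry:
    alpha3: str    # 3-letter code used in the XML schema
    numeric: str   # numeric code used in the database
    label_en: str  # display label (English only in this sketch)


# Hypothetical master list: the one place where the code list is maintained.
MASTER = [
    CurrencyEntry("AUD", "036", "Australian Dollar"),
    CurrencyEntry("GBP", "826", "Pound Sterling"),
    CurrencyEntry("NZD", "554", "New Zealand Dollar"),
]


def xml_enumeration(entries):
    """Values an XML schema enumeration would list."""
    return [e.alpha3 for e in entries]


def database_rows(entries):
    """(numeric code, label) pairs a database table would store."""
    return [(e.numeric, e.label_en) for e in entries]


def alpha3_to_numeric(entries):
    """Lookup the application needs to move data from XML into the database."""
    return {e.alpha3: e.numeric for e in entries}
```

All three views are derived from `MASTER`, so a change to the master list reaches the schema, the database and the application mapping together.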
The only constant is change
Code lists change
For a code list model to be useful, it has to account for the fact that the code lists will change over time
There is little use in having a code list model that works only for a code list that is frozen in time
The code list model has to support changes between versions of a code list
The only constant is change
Not all changes to a code list are version changes, however
Some changes may be local changes to a distributed code list
The ISO 3-letter currency code list contains GBP for British Pounds. However, prices on the London Stock Exchange are normally quoted in pence
This has led to the practice of adding an extra code to the standard ISO list (e.g. GBp, GBX) in order to support pence as well as pounds
This kind of customisation is far from uncommon
The utility of any code list model is greatly reduced if it does not cater for local modifications of code lists
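One way to picture a local modification is as a derived list that leaves the distributed list untouched. The following Python sketch assumes a three-code excerpt of the currency list and a hypothetical `customise` helper; it is an illustration only, not how genericode itself expresses derivation.

```python
# Excerpt of the distributed list, for illustration only.
STANDARD_CURRENCIES = frozenset({"AUD", "GBP", "USD"})


def customise(base, add=(), remove=()):
    """Return a new code list derived from `base`; `base` itself is never modified."""
    unknown = set(remove) - set(base)
    if unknown:
        raise ValueError(f"cannot remove codes not in the base list: {sorted(unknown)}")
    return (frozenset(base) | frozenset(add)) - frozenset(remove)


# Local list for an application that also quotes prices in pence.
LOCAL_CURRENCIES = customise(STANDARD_CURRENCIES, add={"GBX"})
```

Because the local list is computed from the standard one, a new version of the standard list can be picked up without redoing the local change by hand.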
OASIS Code List Representation Technical Committee
The OASIS Code List Representation format, “genericode”, is a single model and XML format (with a W3C XML Schema) that can encode a broad range of code list information
The XML format is not designed for run-time or real-time use; it is a standardized interchange format that is meant to be massaged into an optimized representation
27 of the 40 requirements gathered are implemented in v1.0 of the specifications
Genericode Model
Has a tabular structure for code list information
Each row in the table represents a single distinct entry in the code list, i.e. each row represents a single uniquely identifiable item in the code list.
Each column in the table represents a metadata value that can be defined for each distinct entry in the code list. Each column is either required or optional. A required column does not allow any row to have an undefined (nil or null) value. An optional column allows undefined values.
A genericode key is a set of one or more required columns that together uniquely identify each distinct entry in the code list. Optional columns cannot be used for keys. Each code list must have at least one key.
Genericode keys are equivalent to what people usually mean when they talk about the “codes” in a code list. However, genericode allows multiple keys for each code list, and there is no single preferred key.
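The row, column and key rules above can be sketched as a small Python check. The function name `check_code_list`, the column names and the two sample rows are assumptions for illustration; a real genericode file is XML and carries more metadata than this table.

```python
def check_code_list(rows, required, keys):
    """Return a list of problems found in a tabular code list.

    rows     -- list of dicts, one per entry (column name -> value, None if undefined)
    required -- set of column names that must be defined in every row
    keys     -- list of keys; each key is a tuple of column names
    """
    problems = []
    if not keys:
        problems.append("a code list must have at least one key")
    for key in keys:
        optional = [c for c in key if c not in required]
        if optional:
            problems.append(f"key {key} uses optional column(s) {optional}")
    for i, row in enumerate(rows):
        for col in sorted(required):
            if row.get(col) is None:
                problems.append(f"row {i}: required column '{col}' is undefined")
    for key in keys:
        seen = set()
        for i, row in enumerate(rows):
            value = tuple(row.get(c) for c in key)
            if value in seen:
                problems.append(f"row {i}: duplicate value {value} for key {key}")
            seen.add(value)
    return problems


# Hypothetical country list: three required columns, one optional column.
ROWS = [
    {"alpha3": "AUS", "numeric": "036", "name": "Australia", "note": None},
    {"alpha3": "NZL", "numeric": "554", "name": "New Zealand", "note": None},
]
```

Calling `check_code_list(ROWS, {"alpha3", "numeric", "name"}, [("alpha3",), ("numeric",)])` treats both the 3-letter and the numeric column as keys, which mirrors the point that there is no single preferred key.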
Concept
Keep code lists (aka enumerations) out of the core XML schema by using “schemes”
The idea is that the code list from which an element value is taken is indicated via a “scheme” attribute containing a URI which represents the scheme (code list)
Same as the way that URIs are used to represent XML namespaces
This is done so that a newer version of core XML schema need not be released just because an externally controlled enumeration that it uses has changed (e.g. country code)
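A rough Python sketch of the scheme idea, using the `listSchemeURI` attribute and the Country list URI that appear in the examples of this deck. The `SCHEMES` registry, its three country codes and the function name are assumptions; in practice the values would come from external genericode files.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: scheme URI -> allowed values. In practice the values
# would be loaded from external genericode files rather than hard-coded.
SCHEMES = {
    "urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1": {"AUS", "NZL", "SGP"},
}


def check_coded_element(element, attr="listSchemeURI"):
    """Return None if the element's text is in the list named by its scheme URI,
    otherwise a short error message."""
    uri = element.get(attr)
    if uri not in SCHEMES:
        return f"unknown code list scheme: {uri!r}"
    value = (element.text or "").strip()
    if value not in SCHEMES[uri]:
        return f"{value!r} is not in code list {uri}"
    return None


good = ET.fromstring(
    '<CountryCode listSchemeURI="urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1">'
    "AUS</CountryCode>"
)
```

Updating the country list then means updating the registry entry behind the URI; the schema that declares `CountryCode` does not change.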
Methodology: Context Value Association using Genericode
XML Instance Document Validation
Namespace: xmlns="urn:oasis:names:tc:ciq:xNL:3"
Graphical Schema View: (figure)
Text view of XML instance (XMLinstanceXXX.xml):
<StsMetadataRecord>
xsi:schemaLocation="urn:oasis:names:tc:ciq:xNL:3"
  <ESLVersionNumberID>5.0</ESLVersionNumberID>
  <Person>
    <cbc:BirthDate>1967-08-13</cbc:BirthDate>
    <LearnerRegistration>
      <cbc:NationalStudentNumberID>123456</cbc:NationalStudentNumberID>
    </LearnerRegistration>
    <PersonNameGeneric>
      <cbc:FirstName>John</cbc:FirstName>
      <cbc:LastName>Smith</cbc:LastName>
    </PersonNameGeneric>
    . . .
XML instance documents can be validated against the applicable XML Schema
Background (Glossary)
XML Data Content
In an XML instance document, any values
- between XML angles ‘>’ and ‘<’, and
- between quotes of an attribute
are message data content
Examples:
<BirthDate>1960-06-09</BirthDate>
<Country>
  <CountryCode listSchemeURI="urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1">AUS</CountryCode>
  <Name>Australia</Name>
</Country>
Background (Glossary), continued
Types of XML data content:
- Code values
- Other values (non-code values)
Examples:
<Country>
  <CountryCode>AUS</CountryCode>
</Country>
<BirthDate>1960-06-09</BirthDate>
W3C XML Schema Limitations
W3C XML Schema is mostly about data structures
But it does some Data Content Validation
has good support for
- data type conformity
- min/max values
- length, patterns …
has limited support for:
- enumerations
has no support for
- complex business rules
- versioned changes of validation (without affecting the Schema’s version)
Business Rules Examples
Date Arithmetic:
BirthDate < CurrentDate – 6 Years
Attribute Value Restriction:
The code list value “First Name” cannot occur more than once
The code list value “Last Name” cannot occur more than once
Element Use Restriction:
Country element is optional, but cannot occur more than once
Zero-length string:
<Name></Name>
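To make three of these rules concrete (date arithmetic, element use restriction, zero-length string), here is a minimal Python sketch. The `<Person>` wrapper, the un-namespaced element names and both function names are assumptions for illustration; in the Methodology such rules are written as Schematron assertions, not Python.

```python
from datetime import date
import xml.etree.ElementTree as ET


def years_before(day, years):
    """Same calendar day `years` earlier (29 Feb falls back to 28 Feb)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def check_person(xml_text, today):
    """Apply three of the rules above to a hypothetical <Person> record."""
    person = ET.fromstring(xml_text)
    errors = []
    birth = person.findtext("BirthDate")
    if birth is not None and not date.fromisoformat(birth) < years_before(today, 6):
        errors.append("BirthDate must be earlier than CurrentDate - 6 years")
    if len(person.findall("Country")) > 1:
        errors.append("Country cannot occur more than once")
    for name in person.iter("Name"):
        if (name.text or "") == "":
            errors.append("Name must not be a zero-length string")
    return errors
```

None of these checks can be stated in W3C XML Schema 1.0 as it stands: the first depends on the current date, and the others would have to be baked into the schema itself.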
Business Rules Examples, continued
Code Lists
The code list (+version) used by CountryCode must be an accepted code list
<CountryCode listSchemeURI="urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1">AUS</CountryCode>
Code Value
CountryCode ‘XYZ’ must be valid in that Country code list version
<CountryCode listSchemeURI="urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1">AUS</CountryCode>
Co-occurrence
If Status=‘Closed’ then ClosureReason must be present also
<StatusCode>Closed</StatusCode>
<ClosureReason>Obsolete</ClosureReason>
Data Content Validation Conclusion
XML Schema does not cover all data content validation requirements
Embedding content validation in XML Schema has undesired consequences in conjunction with re-use and Schema versioning
Business rules vary more frequently than schema constraints, and the business rules between different partners would vary where the schema constraints remain the same.
By layering value constraints on top of structural/lexical constraints, the schemas can remain unchanged while being adapted to different partners through different value constraints
Is data content validation required?
How can data content be validated in XML instances?
Without Data Content Validation in XML
(Diagram labels: A; XML file; W3C XML Document Schema; Schema Validation; Design; Implementation; Data Exchange Partner Agreement)
Content Validation at A:
- Program code
- Database constraints
Content Validation at B:
- Program code
- Database constraints
Interoperability issues:
- Validation at A equivalent to Validation at B?
- Data quality of message is difficult to control
- Communication of data quality issues between A & B
- Relies on trust in the sender
- Hard to ascertain equal interpretation of codes
With Data Content Validation in XML
Sender’s and receiver’s data content validation must be
- electronic
- portable
- of shared logic and error output
- platform-independent
- versioned
(Diagram labels: A; XML file; W3C XML Document Schema; 1. Schema Validation; XML Content Validation; 2. Content Validation; Design; Implementation; Data Exchange Partner Agreement)
With Data Content Validation in XML
Sender’s and receiver’s data content validation must be
- electronic
- portable
- of equivalent logic and error output
- platform-independent
- versioned
(Diagram labels: A; XML file; W3C XML Document Schema; 1. Schema Validation; Methodology; 2. Content Validation; Design; Implementation; Data Exchange Partner Agreement)
Methodology - Features
Code Value Validation
Example: CountryCode must be a valid CountryCode
Code List Metadata Validation
Example: CountryCode must belong to an agreed, named Country Code list (+version) urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1
Complex Rules Validation
Examples:
- BirthDate < CurrentDate
- StatusCode ‘Closed’ requires a ClosureReason.
Methodology - Features, continued
Completely separate from W3C XML Schema
Platform-independent ISO/IEC 19757-3 Schematron (implemented using W3C XSLT stylesheets) – Open Industry Standard
Completely independent of any XML Naming and Design Rules (NDRs)
Versioning independent of the XML Schema version
Methodology - Process Overview
(Diagram labels: Data Exchange Partner Agreement; Data Content Validation Requirements; Validation Coding; generate; transform; W3C XML Validation Stylesheet; Schematron-based Value Validation using Genericode)
Methodology - Involved Roles
(Diagram labels: Data Content Validation Requirements; Validation Coding; generate; transform; W3C XML Validation Stylesheet; Schematron-based Value Validation using Genericode; Documentation)
(Roles shown: Business Analysts & Testers; Users; (Developers); (Data Architects); Value Validation Service Staff; Run-time Operator; Specialist; Developers & Testers; Users)
Methodology Run-Time Components
(Diagram labels: A; XML file; W3C XML Document Schema(s); W3C XML Validation Stylesheet)
Methodology - Value Validation
The validation process involves the use of the Schematron language and XSLT
Schematron is a rule-based XML schema language, developed by Rick Jelliffe and internationally standardized as ISO/IEC 19757-3, that uses XPath expressions to describe validation rules.
Schematron is used to confirm the success or failure of a set of assertions made about XML document instances.
Schematron can be used as an adjunct to DTDs, RELAX NG or XML Schemas. It allows co-occurrence constraints, non-regular constraints, and inter-document constraints
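The assertion model can be pictured with a small Python analogue. This is not Schematron and does not use a Schematron processor: each rule below simply mirrors the shape of a Schematron rule (a context that selects nodes, a test, and a message reported on failure). The `<Person>` element, the two rules and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Each rule mirrors the shape of a Schematron rule: a context that selects
# nodes, a test evaluated on each selected node, and a message reported when
# the test fails. Real Schematron expresses context and test as XPath.
RULES = [
    (".//Person",
     lambda p: len(p.findall("Country")) <= 1,
     "Country cannot occur more than once"),
    (".//Person",
     lambda p: p.findtext("StatusCode") != "Closed" or p.find("ClosureReason") is not None,
     "StatusCode 'Closed' requires a ClosureReason"),
]


def run_assertions(root, rules=RULES):
    """Return the messages of all failed assertions, in rule order."""
    failures = []
    for context, test, message in rules:
        for node in root.iterfind(context):
            if not test(node):
                failures.append(message)
    return failures
```

The second rule is a co-occurrence constraint: whether `ClosureReason` is required depends on the value of a sibling element.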
Methodology - Overview
Methodology Data Flow Diagram
Methodology - Process
(Diagram labels: steps A to H; XML; XSD; structure validation; Code list validation; Methodology; CVA; sch; XSL; References; Default Code List (gc); Customised Code List (gc); Validated XML; Application)
Application of the Process in an Enterprise
(Diagram labels: Enterprise Code Lists; Enterprise XML Schemas; Methodology; Application A with customised enterprise code lists and Business Rules; Application B with customised enterprise code lists and Business Rules)
Methodology - Status
OASIS Code List TC V1.0
No known platform-independent alternative
Plug-and-play run-time component
Methodology can evolve without impacting run-time requirements
(Diagram labels: A; W3C XML Document Schema; W3C XML Validation Stylesheet)
Methodology - Benefits
Verify that the instance document is valid as per the Data Exchange Partner Agreement (DEPA)
Validate data content platform-independently
Sender and receiver get the same validation result
Strong candidate to become a global industry standard (UN/CEFACT is taking an interest)
W3C Stylesheet and Schema are industry standards
Simple run-time requirement (XSLT or Python or any other ISO Schematron implementation)
Methodology - Benefits, continued
Supports versioned validation independently of the schema version
Documentation is in sync with implementation
Validation can be switched on/off as required (by message server or application)
Simplifies application coding
Simple run-time requirement allows for evolution of the methodology
Details of the methodology are transparent to operations
OASIS CIQ TC Case Study – Using the “Context Value Association” Methodology using Genericode to customise OASIS CIQ Spec. v3.0
OASIS CIQ Technical Committee
Open Industry Specifications for defining Party Centric Data from a global (international) perspective
Party – Person or Organisation
- Name (241+ countries in over 36 formats)
- Address/Location (241+ countries in over 130 formats)
- Party Centric Attributes
- Party Relationships
Delivering royalty free, open, international, industry and application neutral XML specifications for representing, interoperating, and managing party (person/organization) centric information
Why Genericode and the Methodology for CIQ TC?
Keeps code lists and values outside of the core CIQ XML Schemas
Provides users with the ability to define the semantics for the data represented in the CIQ structure
Provides users with the ability to customize the CIQ XML Schemas without modifying the CIQ XML Schemas
Provides users with the ability to write business rules to constrain the structure of the CIQ XML Schemas without modifying the XML schemas
OASIS CIQ Specifications
Party Name Schema – xNL.xsd
- Supporting enumeration list (13) – xNL-types.xsd
Party Address Schema – xAL.xsd
- Supporting enumeration list (32) – xAL-types.xsd
Party Information Schema – xPIL.xsd
- Supporting enumeration list (60) – xPIL-types.xsd
Party Relationships Schema – xPRL.xsd
- Supporting enumeration list (10) – xPRL-types.xsd
CIQ Specifications without Genericode Approach
Code Lists defined in “*-types.xsd” files
Use Party Name as Case Study
Code Lists defined in an XML Schema (xNL-types.xsd) that is “included” in xNL.xsd
Enumeration List referenced from xNL-types.xsd
xNL Enumeration List
Users given the choice to modify the code lists to meet their specific requirements
Basic default values are provided, but it is up to the users to use them as is or customise them
xNL Enumeration List - Drawbacks
Each application has to have its own enumeration list
Point to point negotiations between applications
No standard enumeration list file that remains untouched
Change in enumeration list will result in change to application code generation
The Name schema might be used in multiple locations in an organisation (e.g. billing, marketing, sales, customer identification) and hence, customising the enumeration list is not straightforward
It might be an overhead for an application to use a large code list when it requires only 3 values
Objective of this case study
Move away from embedding code lists as XML schemas that are “included” or “imported” in base XML schemas
Investigate the use of the genericode approach and UMCLVV (UBL Methodology for Code List and Value Validation) in CIQ Specifications
Implement the genericode approach in CIQ Specifications as an optional feature
Customise the genericode based default code lists to meet specific requirements without modifying the default code lists
Apply business rule constraints on the core CIQ XML schemas without modifying the XML schemas
Case Study - Scenarios
Add a new code list value to the default name code list (“NativePlaceName”)
Restrict the default name code list to allow no more than one first and last name (“FirstName”, “LastName”)
Restrict the default code list to allow only “FirstName”, “LastName”, and “NativePlaceName” as code values
Apply business rule constraints on XML Schema
Catering for the above requirements without changing the default xNL Code List is impossible when the code lists are embedded in the XML schema
Preparing xNL Schema with Genericode Approach to Handle Code Lists
Step 1 - Create default .gc files
Identify and decide on list-level and instance-level metadata to be included
Create a .gc file for each enumeration list in xNL-types.xsd
Ensure that each .gc file is structurally valid against the genericode-code-list.xsd file
.GC file - Example
Code Value
List Level Metadata
Instance Level Metadata
In the absence of metadata properties for values in the instance being validated, only the values found in the associated external list representation can be used. There being no qualification of the values in the instance, all values in the external file are in play as valid values for validation
If the instance being validated does have metadata properties specified for a given value, then that value is asserted to be a value from a particular version or identified list of values.
Instance level metadata allows an instance to disambiguate a coded value that might be the same value from two different lists.
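The effect of instance-level metadata on validation can be sketched in Python as follows. The two list versions, the `ref` and `ver` parameter names (loosely standing for a list's short name and version) and both functions are assumptions for illustration only.

```python
# Hypothetical external lists, keyed by the metadata that identifies them:
# (short name, version) -> values.
LISTS = {
    ("Country", "1"): {"AUS", "NZL"},
    ("Country", "2"): {"AUS", "NZL", "SGP"},
}


def allowed_values(ref=None, ver=None):
    """Values in play for one coded item.

    With no instance-level metadata every associated list contributes its
    values; with metadata only the lists that match it do.
    """
    values = set()
    for (list_ref, list_ver), codes in LISTS.items():
        if ref is not None and ref != list_ref:
            continue
        if ver is not None and ver != list_ver:
            continue
        values |= codes
    return values


def is_valid(value, ref=None, ver=None):
    """True if `value` is allowed given the metadata found in the instance."""
    return value in allowed_values(ref, ver)
```

Here `is_valid("SGP")` holds because nothing in the instance narrows the choice, while `is_valid("SGP", ref="Country", ver="1")` does not, because the instance asserts a list version that lacks that value.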
Step 2: Modify xNL.xsd
Remove references to enumeration lists defined as XML schemas
Include distinct instance level metadata for all elements/attributes that use code list values
Instance Level Metadata used
- Ref == genericode ShortName
- Ver == genericode Version
- URI == genericode CanonicalUri
- VerURI == genericode CanonicalVersionUri
Instance Level Metadata
Instance level Metadata for “ElementType” attribute
xs:string
Step 3: Prepare Context/Value Association (CVA) File
Every element and attribute information item below the document element of an XML document is in a document context described by its hierarchical ancestry of elements. A fully qualified document context specifies the information item’s precise location in the document.
Define all the default document contexts with pointers to the default genericode files produced from xNL-types.xsd
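The association between a document context and a list of values can be sketched in Python. The context paths, the un-namespaced element names, the value sets and the function name below are assumptions for illustration; a real CVA file is an XML document that points each context at one or more genericode files.

```python
import xml.etree.ElementTree as ET

# Hypothetical associations: (context path, attribute name or None, allowed values).
# An attribute name of None means the element's text content is checked.
ASSOCIATIONS = [
    ("./PersonName/NameElement", "ElementType", {"FirstName", "MiddleName", "LastName"}),
    ("./Country/CountryCode", None, {"AUS", "NZL"}),
]


def validate_contexts(root, associations=ASSOCIATIONS):
    """Check every item selected by a context against its associated values."""
    errors = []
    for path, attribute, allowed in associations:
        for element in root.iterfind(path):
            value = element.text if attribute is None else element.get(attribute)
            if value not in allowed:
                errors.append(f"{path}: value {value!r} is not allowed")
    return errors
```

Pointing a context at a different value set changes what is accepted at that location without touching the schema or the other contexts.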
CVA File
Step 4 - Prepare files for Value Validation
Run the supplied batch/shell files as part of the Methodology process to create the necessary files for code list value validation
Applying Constraints to Default Code Lists
Default Schema and Code List Values
- Add a new code value “NativePlaceName”
- Restrict the code values to have only “FirstName” and “LastName”
Step 1 – Add a new code list value
Add a new code list value “NativePlaceName”
Create a .gc file with this code value
Step 2 – Restrict the default code list
Restrict the code values to only “FirstName” and “LastName”
Create a .gc file with this restriction
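Steps 1 and 2 can be pictured as two small lists kept alongside the untouched default list. In this Python sketch the default values are an assumed excerpt, and `effective_values` is a hypothetical helper standing in for associating a context with more than one list.

```python
# Assumed excerpt of the default name element types; never edited.
DEFAULT_NAME_TYPES = frozenset({"Title", "FirstName", "MiddleName", "LastName"})

# Step 1: the addition lives in its own list.
ADDED_NAME_TYPES = frozenset({"NativePlaceName"})

# Step 2: the restriction lives in its own list and only picks existing values.
RESTRICTED_NAME_TYPES = frozenset({"FirstName", "LastName"})
assert RESTRICTED_NAME_TYPES <= DEFAULT_NAME_TYPES


def effective_values(*code_lists):
    """Union of the lists associated with one document context."""
    result = frozenset()
    for code_list in code_lists:
        result |= code_list
    return result


CUSTOM_NAME_TYPES = effective_values(RESTRICTED_NAME_TYPES, ADDED_NAME_TYPES)
```

The result allows only “FirstName”, “LastName” and “NativePlaceName”, while `DEFAULT_NAME_TYPES` stays exactly as distributed.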
Step 3 – Create Restriction CVA File
Applying Business Rules to Constrain Default XML Schemas
Step 4 – Define Business Rules to include constraints to default schema
Restrict the schema to accept only one First Name and one Last Name
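In the Methodology this rule is written as a Schematron assertion; the Python sketch below only illustrates the counting logic. The `PersonName` and `NameElement` element names are used without namespaces, and the function name is hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def check_single_names(person_name, limited=("FirstName", "LastName")):
    """Report every limited ElementType value that occurs more than once."""
    counts = Counter(e.get("ElementType") for e in person_name.iter("NameElement"))
    return [f"{t} cannot occur more than once" for t in limited if counts[t] > 1]
```

The schema still permits repeated name elements; the extra constraint lives entirely in the rule.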
Business Rules to define constraint
No changes to xNL Schema
Step 5 - Prepare files for Value Validation
Run the supplied demonstration batch/shell files as part of the Methodology process to create the necessary files for value validation
CIQ Global Address Specification (xAL)
Can be customized to a specific country's address structure using the Methodology, while keeping the customized structure compliant with the default xAL structure
Example 1: Customizing xAL for Singapore
Let us assume that Singapore Address does not require the following xAL elements:
Administrative Area
Rural Delivery, or Post Office
Location Coordinates
Free Text Address
Country
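The Singapore assumption above amounts to a rule that certain child elements must be absent. A rough Python sketch follows; the element names are assumptions modelled on the list above and should be checked against xAL.xsd, namespaces are ignored, and in the Methodology the rule would be a Schematron assertion rather than Python.

```python
import xml.etree.ElementTree as ET

# Element names are assumptions modelled on the list above; check them against xAL.xsd.
NOT_USED_IN_SINGAPORE = (
    "AdministrativeArea", "RuralDelivery", "PostOffice",
    "LocationByCoordinates", "FreeTextAddress", "Country",
)


def check_singapore_address(address):
    """Report each direct child of the address that the Singapore profile excludes."""
    present = {child.tag for child in address}
    return [f"{name} is not used in a Singapore address"
            for name in NOT_USED_IN_SINGAPORE if name in present]
```

An address that passes this check is still a valid xAL address, since the rule only narrows what the default schema already allows.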
Example 1: Customising xAL for Singapore
Example 1: Business Rule for Singapore Address
No changes to xAL Schema
Example 2: Customizing xAL to only use Free Text Address Lines
Business Rule for Example 2
No changes to xAL Schema
CIQ Specifications with Genericode Approach
Skills Required to use OASIS Code List Approach
XML Schema Language
Schematron Language
XSLT (sometimes)
XPath
XML Processors/XML Parsers
Batch Files / Shell Files
Experience using the Methodology and Genericode Approach
Powerful
The only standard for managing code lists now in industry
Manual effort (requires patience)
Painful without tool support
But once everything has been set up, works beautifully
Does not deal with mapping between schemas
References
OASIS Codelist Representation (Genericode) Version 1.0, December 2007, http://www.oasis-open.org/committees/codelist
Context Value Association, Working Draft, Version 0.4, April 2008, http://www.oasis-open.org/committees/codelist
OASIS Code List Adaptation Case Study (OASIS CIQ), Version 0.3, July 2007, http://www.oasis-open.org/committees/document.php?document_id=24813
OASIS Party Information Standards, http://www.oasis-open.org/committees/ciq
Special Thanks
Ken Holman, Chair, OASIS Code List Representation TC
Juerg Tschumperlin, Data Management Solutions, New Zealand
Thank You
http://www.oasis-open.org/committees/ciq