ISO 19757 – Document Schema Definition Languages (DSDL)
Martin Bryan, Convenor, JTC1/SC18 WG1

Parts of DSDL

1. Overview

2. Regular-grammar-based validation (RELAX NG)

3. Rule-based validation (Schematron)

4. Namespace-based validation dispatch language (NVDL)

5. Datatypes

6. Path-based integrity constraints

7. Character repertoire validation

8. Declarative document architectures

9. Datatype- and namespace-aware DTDs

10. Validation management

Regular-grammar-based validation (RELAX NG)

• XML description of a data model
  – Compact syntax is even simpler than DTDs

• Provides a way of defining shortcuts (named patterns)
  – More functional than parameter entities

• Provides context-dependent content models
  – Models can be amended when imported

• Supports namespaces and datatypes
  – Any datatype library can be used, including the W3C XML Schema datatypes

• Can import modules from multiple namespaces
  – Can build multi-source schemas

Main components of RELAX NG

pattern ::= <element name="QName"> pattern+ </element>
          | <element> nameClass pattern+ </element>
          | <attribute name="QName"> [pattern] </attribute>
          | <attribute> nameClass [pattern] </attribute>
          | <group> pattern+ </group>
          | <interleave> pattern+ </interleave>
          | <choice> pattern+ </choice>
          | <optional> pattern+ </optional>
          | <zeroOrMore> pattern+ </zeroOrMore>
          | <oneOrMore> pattern+ </oneOrMore>
          | <list> pattern+ </list>
          | <mixed> pattern+ </mixed>
          | <ref name="NCName"/>
          | <parentRef name="NCName"/>
          | <empty/>
          | <text/>
          | <value [type="NCName"]> string </value>
          | <data type="NCName"> param* [exceptPattern] </data>
          | <notAllowed/>
          | <externalRef href="anyURI"/>
          | <grammar> grammarContent* </grammar>
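To make the operators concrete, here is a minimal Python sketch of how a few of them compose. It encodes element, empty, group, choice, optional, zeroOrMore and oneOrMore as hypothetical tuples and tests whether a sequence of child element names matches, by tracking every position a pattern can reach. It deliberately ignores attributes, text, datatypes, name classes, interleave and the content of child elements, all of which a real RELAX NG validator has to handle.

```python
# Hypothetical tuple encoding of a subset of RELAX NG patterns:
#   ("element", name)        one child element called `name` (content ignored)
#   ("empty",)               matches no children
#   ("group", p1, p2, ...)   p1 followed by p2 followed by ...
#   ("choice", p1, p2, ...)  exactly one of p1, p2, ...
#   ("optional", p)          p or nothing
#   ("zeroOrMore", p)        p repeated zero or more times
#   ("oneOrMore", p)         p repeated one or more times
Pattern = tuple


def ends(p: Pattern, names: list, i: int) -> set:
    """Return every position at which pattern `p` can finish if it starts at `i`."""
    kind = p[0]
    if kind == "empty":
        return {i}
    if kind == "element":
        return {i + 1} if i < len(names) and names[i] == p[1] else set()
    if kind == "group":
        positions = {i}
        for sub in p[1:]:
            positions = {e for s in positions for e in ends(sub, names, s)}
        return positions
    if kind == "choice":
        return {e for sub in p[1:] for e in ends(sub, names, i)}
    if kind == "optional":
        return {i} | ends(p[1], names, i)
    if kind in ("zeroOrMore", "oneOrMore"):
        # Repeat until no new positions appear; positions never exceed
        # len(names), so the loop always terminates.
        reached = set()
        frontier = ends(p[1], names, i)
        while frontier - reached:
            new = frontier - reached
            reached |= new
            frontier = {e for s in new for e in ends(p[1], names, s)}
        return reached | ({i} if kind == "zeroOrMore" else set())
    raise ValueError(f"unsupported pattern kind: {kind}")


def matches(p: Pattern, names: list) -> bool:
    """True if the whole sequence of child element names matches `p`."""
    return len(names) in ends(p, names, 0)


# The content model of <document> in the full-syntax example: head, then body.
document = ("group", ("element", "head"), ("element", "body"))
print(matches(document, ["head", "body"]))  # True
print(matches(document, ["body", "head"]))  # False
```

Tracking sets of positions rather than a single cursor lets choice and repetition explore every alternative without explicit backtracking.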

Using the full syntax

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="document"/>
  </start>
  <define name="document">
    <element name="document">
      <ref name="head"/>
      <ref name="body"/>
    </element>
  </define>
  <define name="head">
    <element name="head">
      <interleave>
        <element name="organization">
          <choice>
            <value>ISO</value>
            <value>ISO/IEC</value>
          </choice>
        </element>
        <element name="document-type">
          <choice>
            <value>International Standard</value>
            <value>Technical Report</value>
            <value>Guide</value>
            <value>Publicly Available Specification</value>
            <value>Technical Specification</value>
            <value>International Standardized Profile</value>
          </choice>
        </element>
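Part of what that `head` definition enforces can be written out by hand, which shows what `<interleave>` and `<choice>`/`<value>` contribute. The Python sketch below (standard-library `xml.etree.ElementTree` only) checks just the two children shown; because the excerpt is cut off, any further children of `head` are ignored. It illustrates the constraints and is not a RELAX NG implementation.

```python
import xml.etree.ElementTree as ET

# Allowed values, copied from the <choice> patterns in the schema excerpt.
ORGANIZATIONS = {"ISO", "ISO/IEC"}
DOCUMENT_TYPES = {
    "International Standard", "Technical Report", "Guide",
    "Publicly Available Specification", "Technical Specification",
    "International Standardized Profile",
}


def check_head(head: ET.Element) -> list:
    """Hand-written check of the organization and document-type children."""
    errors = []
    for name, allowed in (("organization", ORGANIZATIONS),
                          ("document-type", DOCUMENT_TYPES)):
        # <interleave>: children may appear in any order, so look them up by
        # name rather than by position.
        found = head.findall(name)
        if len(found) != 1:
            errors.append(f"expected exactly one <{name}>, found {len(found)}")
            continue
        # In RELAX NG a <value> without a type attribute is compared as the
        # built-in 'token' type, i.e. after collapsing whitespace.
        value = " ".join((found[0].text or "").split())
        if value not in allowed:
            errors.append(f"<{name}> has disallowed value {value!r}")
    return errors


head = ET.fromstring(
    "<head><document-type>Technical  Report</document-type>"
    "<organization>ISO/IEC</organization></head>")
print(check_head(head))  # []
```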

Alternative compact syntax

• Can produce a whole ISO standard using just:

  namespace p = "http://relaxng.org/ns/proofsystem"
  datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
  formal = element p:* { attribute * { text }*, (formal|text)* }
  inline &= formal*
  block |= formal
  block |= element grammarref|rngref { attribute src { xsd:anyURI } }
  include "is.rnc"

• Can replace existing definitions with new ones

• Can extend definitions

  – |= means “add this option to an existing OR group” (combine="choice")

  – &= means “add this option to an existing AND group” (combine="interleave": all parts must occur, in any order)
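The merge rules behind `|=` and `&=` can be modelled in a few lines. The sketch below is a hypothetical Python model, not part of any RELAX NG toolkit: it records definitions by name, allows at most one plain `=` definition per name, refuses to mix choice and interleave combinations for the same name, and renders the merged result in compact-syntax style. The base definitions of `block` and `inline` live in `is.rnc`, which is not shown, so `para` and `emph` are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Definition:
    """All definitions seen so far for one pattern name."""
    method: Optional[str] = None  # "choice" for |=, "interleave" for &=
    has_plain: bool = False       # True once a plain '=' definition is seen
    parts: List[str] = field(default_factory=list)


def add(grammar: Dict[str, Definition], name: str, op: str, pattern: str) -> None:
    """Record the compact-syntax statement `name op pattern`."""
    d = grammar.setdefault(name, Definition())
    if op == "=":
        if d.has_plain:
            raise ValueError(f"{name}: more than one plain '=' definition")
        d.has_plain = True
    else:
        method = {"|=": "choice", "&=": "interleave"}[op]
        if d.method not in (None, method):
            raise ValueError(f"{name}: cannot mix |= and &=")
        d.method = method
    d.parts.append(pattern)


def merged(grammar: Dict[str, Definition], name: str) -> str:
    """Render the combined definition of `name`."""
    d = grammar[name]
    if len(d.parts) == 1:
        return d.parts[0]
    sep = " | " if d.method == "choice" else " & "
    return "(" + sep.join(d.parts) + ")"


g: Dict[str, Definition] = {}
add(g, "block", "=", "para")      # placeholder for the definition in is.rnc
add(g, "block", "|=", "formal")
add(g, "block", "|=", "rngref")   # stands for the grammarref|rngref element
print(merged(g, "block"))         # (para | formal | rngref)
```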

• Can merge grammars

Rule-based validation (Schematron)

• “A Schematron schema contains natural-language assertions concerning a set of documents, marked up with various elements and attributes for testing these natural-language assertions, and for simplifying and grouping the assertions.”

• “A Schematron schema reduces to a non-chaining rule system whose terms are boolean functions invoking an external query language on the instance and other visible XML documents, with syntactic features to reduce specification size and to allow efficient implementation.”

Schematron example

<sch:rule context="failed-assert | successful-report">
  <sch:extends rule="second-level"/>
  <sch:assert test="count(diagnostic-reference) + count(text) = count(*)">
    The <sch:name/> element should only contain a text element
    and diagnostic reference elements.
  </sch:assert>
  <sch:assert test="count(text) = 1">
    The <sch:name/> element should contain exactly one text element.
  </sch:assert>
  <sch:assert test="preceding-sibling::fired-rule
                    | preceding-sibling::failed-assert
                    | preceding-sibling::successful-report">
    A <sch:name/> comes after a fired-rule, a failed-assert or a successful-report.
  </sch:assert>
</sch:rule>
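The three assertions in that rule can be paraphrased in Python to show what each XPath test checks. This is a hand translation using only `xml.etree.ElementTree`, not a Schematron processor. ElementTree has no `preceding-sibling` axis, so the sketch walks each parent's children instead; it does not reproduce the abstract rule pulled in by `<sch:extends rule="second-level"/>`, and it does not check the document's root element.

```python
import xml.etree.ElementTree as ET

CONTEXT = ("failed-assert", "successful-report")
MAY_PRECEDE = {"fired-rule", "failed-assert", "successful-report"}


def check_report_entries(root: ET.Element) -> list:
    """Apply the rule's three assertions to every matching element below `root`."""
    messages = []
    for parent in root.iter():
        siblings = list(parent)
        for idx, node in enumerate(siblings):
            if node.tag not in CONTEXT:  # rule context
                continue
            kids = [child.tag for child in node]
            # count(diagnostic-reference) + count(text) = count(*)
            if kids.count("diagnostic-reference") + kids.count("text") != len(kids):
                messages.append(f"The {node.tag} element should only contain a text "
                                "element and diagnostic reference elements.")
            # count(text) = 1
            if kids.count("text") != 1:
                messages.append(f"The {node.tag} element should contain exactly "
                                "one text element.")
            # preceding-sibling::fired-rule | preceding-sibling::failed-assert | ...
            if not {s.tag for s in siblings[:idx]} & MAY_PRECEDE:
                messages.append(f"A {node.tag} comes after a fired-rule, "
                                "a failed-assert or a successful-report.")
    return messages
```

Each assertion is evaluated independently and only produces a message; no assertion triggers another rule, which is what the "non-chaining" description of Schematron means.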

Schematron core elements

• active • assert • extends • include • let • name • ns • param • pattern • phase • report • rule • schema • value-of

Ancillary elements and attributes

• Elements: diagnostics, diagnostic, dir, emph, p, span, title

• Attributes: flag, fpi, icon, role, see, subject

Namespace-based Validation Dispatching Language (NVDL)

• Allows data from different namespaces to be validated by different processes
  – Can validate one namespace using RELAX NG, another using a DTD and a third using W3C XML Schema

• Simple and full syntaxes
  – The full syntax is simplified to the simple syntax before use

• All validation is done in context
  – Slots are created to identify where data from alternative namespaces has been removed

• Allows attributes from different namespaces to be validated

• Elements and attributes in different namespaces are separated into separate “sections”
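The idea of sections and slots can be sketched with `xml.etree.ElementTree`. In the Python below, each element whose namespace differs from its parent's starts a new section, and the parent section keeps an empty placeholder where that element was removed. The placeholder name `slot` is hypothetical, chosen for this sketch, and this version does not separate attributes by namespace.

```python
import xml.etree.ElementTree as ET

SLOT = "slot"  # hypothetical placeholder marking where a child section was removed


def ns_of(tag: str) -> str:
    """Namespace URI of an ElementTree tag written as '{uri}local' ('' if none)."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def copy_section(elem: ET.Element, ns: str, pending: list) -> ET.Element:
    """Copy `elem` and its same-namespace descendants; queue foreign children."""
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text = elem.text
    for child in elem:
        if ns_of(child.tag) == ns:
            part = copy_section(child, ns, pending)
        else:
            part = ET.Element(SLOT)  # leave a slot; the child becomes its own section
            pending.append(child)
        part.tail = child.tail
        out.append(part)
    return out


def split_sections(root: ET.Element) -> list:
    """Return (namespace, section) pairs, outermost sections first."""
    sections, pending = [], [root]
    while pending:
        top = pending.pop(0)
        ns = ns_of(top.tag)
        sections.append((ns, copy_section(top, ns, pending)))
    return sections
```

Each section could then be handed to whichever validator is registered for its namespace, with the slot showing where the removed section sat in its parent.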

NVDL example – HTML + XForms (1)

<rules xmlns="purl://dsdl.org/nvdl/ns/structure/1.0"
       xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <namespace ns="http://www.w3.org/2002/06/xhtml2">
    <validate schema="xhtml2.rng">
      <mode>
        <namespace ns="http://www.w3.org/2002/xforms">
          <validate schema="xforms.rng">
            <mode>
              <namespace ns="http://www.w3.org/2002/xforms">
                <attach message="Skipped descendant XForms sections."/>
              </namespace>
              <namespace ns="http://www.w3.org/2002/06/xhtml2">
                <unwrap message="Skipped descendant XHTML2 sections."/>
              </namespace>
            </mode>
          </validate>
          …

NVDL example (2)

          <unwrap>
            <mode>
              <namespace ns="http://www.w3.org/2002/xforms">
                <unwrap message="Skipped descendant XForms"/>
              </namespace>
              <namespace ns="http://www.w3.org/2002/06/xhtml2">
                <attach message="Any descendant XHTML2 sections"/>
              </namespace>
            </mode>
          </unwrap>
        </namespace>
      </mode>
    </validate>
  </namespace>
</rules>

Datatypes

• Allows multiple datatype sets to be defined
  – W3C XML Schema datatypes can be used as the base

• Will allow user-defined datatype primitives to be added
  – Needed for extended date/period formats, etc.

• Will provide a mechanism for defining complex patterns
  – Patterns based on supertypes will be allowed

• Normalization of values, with results compared after normalization
  – e.g. convert local date formats to ISO 8601, then compare
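Normalize-then-compare can be sketched with Python's `datetime`. The set of accepted local formats (including the day-first reading of `05/03/2004`) is an assumption for illustration; in the approach described above, a Part 5 datatype definition would supply them.

```python
from datetime import datetime

# Local formats accepted by this sketch (an assumption; a real datatype
# definition would declare its own). Day-first order is assumed.
LOCAL_FORMATS = ("%d/%m/%Y", "%d.%m.%Y", "%Y-%m-%d")


def to_iso8601(value: str) -> str:
    """Normalize a date written in one of LOCAL_FORMATS to ISO 8601 (YYYY-MM-DD)."""
    for fmt in LOCAL_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def same_date(a: str, b: str) -> bool:
    """Compare two dates after normalization instead of as raw strings."""
    return to_iso8601(a) == to_iso8601(b)


print(same_date("05/03/2004", "2004-03-05"))  # True
print("05/03/2004" == "2004-03-05")           # False: the raw strings differ
```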

Possible form for Part 5

<datatype name="price">
  <supertype name="decimal">
    <cast>
      <if test="not(sign='-')">
        <copy-of select="whole-part"/>
        <text>.</text>
        <my:fraction-part>
          <value-of select="substring(concat(fraction-part, '00'), 1, 2)"/>
        </my:fraction-part>
      </if>
    </cast>
  </supertype>
</datatype>
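Read as a procedure, the draft `cast` keeps the whole part of a non-negative decimal and pads or truncates its fraction to two digits. The Python sketch below follows that reading. Splitting the lexical form into `sign`, `whole-part` and `fraction-part` with a regular expression, and substituting `0` for an empty whole part (as in `.5`), are assumptions, since the slide does not say how those components are derived.

```python
import re
from typing import Optional

# Assumed split of an xsd:decimal lexical form into the components the
# draft refers to: sign, whole-part and fraction-part.
DECIMAL = re.compile(r"([+-]?)(\d*)(?:\.(\d*))?")


def cast_price(lexical: str) -> Optional[str]:
    """Return the price form of a decimal, or None when the value is negative."""
    m = DECIMAL.fullmatch(lexical.strip())
    if m is None or not (m.group(2) or m.group(3)):
        raise ValueError(f"not a decimal: {lexical!r}")
    # Assumption: an empty whole part (".5") is treated as "0".
    sign, whole, fraction = m.group(1), m.group(2) or "0", m.group(3) or ""
    if sign == "-":
        return None  # <if test="not(sign='-')"> produces no output
    # substring(concat(fraction-part, '00'), 1, 2): pad with zeros, keep two
    # digits; this truncates rather than rounds.
    return f"{whole}.{(fraction + '00')[:2]}"


print(cast_price("12.5"))    # 12.50
print(cast_price("12.349"))  # 12.34
print(cast_price("-3"))      # None
```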

Path-based integrity constraints

• Non-hierarchical links between information items in a structured resource can be identified by addressing items within the document tree and then expressing the relationship between them.

• Provides a method for identifying information items by their ancestry or by the use of keys

• Provides a method for describing the role of relationships that are not hierarchical

• Allows selection of fragments to be validated

• Will include an extensible basis for supporting mechanisms not currently available
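One common kind of path-based constraint is a key together with references to it. The Python sketch below checks that keys are unique and that every reference attribute names an existing key. It relies on ElementTree's limited XPath subset, and the element and attribute names in the usage (`figure/@id`, `xref/@to`) are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_references(root: ET.Element, key_path: str, key_attr: str,
                     ref_path: str, ref_attr: str) -> list:
    """Check that keys are unique and that every reference resolves to one.

    key_path and ref_path are ElementTree path expressions selecting the
    elements that carry keys and references respectively.
    """
    errors, keys = [], set()
    for el in root.iterfind(key_path):
        key = el.get(key_attr)
        if key is None:
            errors.append(f"<{el.tag}> has no {key_attr} attribute")
        elif key in keys:
            errors.append(f"duplicate key {key!r}")
        else:
            keys.add(key)
    for el in root.iterfind(ref_path):
        ref = el.get(ref_attr)
        if ref is not None and ref not in keys:
            errors.append(f"dangling reference {ref!r}")
    return errors


# Hypothetical document: figures carry keys, xref elements point at them.
doc = ET.fromstring('<doc><figure id="f1"/><figure id="f2"/>'
                    '<p><xref to="f1"/><xref to="f3"/></p></doc>')
print(check_references(doc, ".//figure", "id", ".//xref", "to"))
# ["dangling reference 'f3'"]
```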

Character repertoire validation

• User-defined character sets that can be used to validate the contents of elements or attributes
  – Will be able to check that only the characters relevant to a particular language are used, rather than every character in a given Unicode block

• Schematron-like rules for associating character repertoires with a particular element or attribute

<sch:rule context="*[/*[@xml:lang='nl']]">
  <sch:assert test="\p{IsBasicLatin}\p{IsLatin-1Supplement}&#x132;&#x133;\p{IsGeneralPunctuation}\p{IsCurrencySymbols}">
    If this document is a Dutch document, it should have only characters used in typical Dutch publishing.
  </sch:assert>
</sch:rule>
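Python's standard `re` module does not support `\p{Is...}` block escapes, so a sketch of the same repertoire check has to spell the Unicode block ranges out. The code below applies the Dutch repertoire from the rule to all text and attribute values when the root element carries `xml:lang="nl"`, and reports any characters that fall outside it.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Code-point ranges for the blocks named in the rule, plus the Dutch IJ letters.
DUTCH_REPERTOIRE = [
    (0x0000, 0x007F),  # Basic Latin
    (0x0080, 0x00FF),  # Latin-1 Supplement
    (0x0132, 0x0133),  # LATIN CAPITAL/SMALL LIGATURE IJ
    (0x2000, 0x206F),  # General Punctuation
    (0x20A0, 0x20CF),  # Currency Symbols
]


def in_repertoire(ch: str, repertoire=DUTCH_REPERTOIRE) -> bool:
    return any(lo <= ord(ch) <= hi for lo, hi in repertoire)


def stray_characters(root: ET.Element) -> list:
    """Characters outside the Dutch repertoire, if the document is Dutch."""
    if root.get(XML_LANG) != "nl":  # context: /*[@xml:lang='nl']
        return []
    stray = set()
    for el in root.iter():
        for chunk in [el.text, el.tail, *el.attrib.values()]:
            stray.update(ch for ch in (chunk or "") if not in_repertoire(ch))
    return sorted(stray)


print(stray_characters(ET.fromstring('<doc xml:lang="nl"><p>Łódź</p></doc>')))
# ['Ł', 'ź']
```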

Declarative document architectures

• Allows locally meaningful names to be assigned to schema components
  – An 80/20 approach provides many of the functions of abstract classes

• Allows predefined fragments to be defined within a schema
  – Reintroduces entity definitions in a more controllable form
  – May contain optional components

• Can even redefine entity names
  – No longer restricted to English-based names when referencing standard entities such as &nbsp;

• Allows elements and attributes to be removed in defined contexts
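The renaming and removal ideas can be illustrated with a short Python sketch. The mapping from Dutch element names to English schema names is hypothetical, and this is not the syntax Part 8 will use; it only shows the effect of mapping locally meaningful names onto shared schema components and dropping elements before validation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical mapping from local (Dutch) element names to schema names.
# None means "remove this element in this context".
LOCAL_TO_SCHEMA: Dict[str, Optional[str]] = {
    "hoofdstuk": "chapter",
    "titel": "title",
    "alinea": "para",
    "opmerking": None,
}


def to_schema_names(elem: ET.Element, mapping: Dict[str, Optional[str]]) -> ET.Element:
    """Return a renamed copy of `elem`; unmapped names are kept unchanged."""
    # The element passed in is always kept, even if it is mapped to None.
    out = ET.Element(mapping.get(elem.tag) or elem.tag, dict(elem.attrib))
    out.text = elem.text
    for child in elem:
        if child.tag in mapping and mapping[child.tag] is None:
            # Drop the element and its content, but keep the text after it.
            if len(out):
                out[-1].tail = (out[-1].tail or "") + (child.tail or "")
            else:
                out.text = (out.text or "") + (child.tail or "")
            continue
        renamed = to_schema_names(child, mapping)
        renamed.tail = child.tail
        out.append(renamed)
    return out


local = ET.fromstring("<hoofdstuk><titel>Inleiding</titel>"
                      "<opmerking>intern</opmerking><alinea>Tekst</alinea></hoofdstuk>")
print(ET.tostring(to_schema_names(local, LOCAL_TO_SCHEMA), encoding="unicode"))
# <chapter><title>Inleiding</title><para>Tekst</para></chapter>
```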

Datatype/Namespace-aware DTDs

• Shows how the ISO 8879/XML Document Type Definition (DTD) syntax can be extended to validate documents that make full use of XML Namespaces and Part 5 Datatypes

• May be extended to add character repertoire validation

• Will allow DTDs to be used to validate any XML document, including those defined using Part 2

• Will allow SGML documents to be treated as input to ISO 19757 validation processes

Validation management

• Includes a mechanism to invoke parsers which read non-XML sources (and XML sources that can't be identified by a single URI) to create XML Infosets that can be used for subsequent processing

• Allows pre-validation transformations to be used to normalize and/or subset documents before validation

• Multiple validations and transformations may be applied

• Transformations will be able to split a document into multiple resulting documents

• Includes facilities to generate customized validation reports, which can be output as XML document instances that can be processed by other applications

Possible format for Part 10

<framework>
  <rule>
    <instance>
      <transform transformation="normalize.xslt"/>
    </instance>
    <assert>
      <isValid schema="my-schema.rng"/>
      <isValid schema="my-schema.sch"/>
    </assert>
  </rule>
</framework>
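A framework like this can be read as a small pipeline: for each rule, transform the instance, then run every listed validation and collect the results as XML. The Python sketch below interprets the element names shown above. Because the standard library has no XSLT, RELAX NG or Schematron engine, the transformation and schemas are looked up in dictionaries of Python callables supplied by the caller, and the `validation-report` output format is an assumption, not part of the draft.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Caller-supplied stand-ins for real transformation and validation engines.
Transform = Callable[[ET.Element], ET.Element]
Validator = Callable[[ET.Element], List[str]]  # returns error messages


def run_framework(framework: ET.Element, document: ET.Element,
                  transforms: Dict[str, Transform],
                  validators: Dict[str, Validator]) -> ET.Element:
    """Run every <rule> in `framework` against `document`; return an XML report."""
    report = ET.Element("validation-report")  # hypothetical report format
    for number, rule in enumerate(framework.iterfind("rule"), start=1):
        instance = document
        # Pre-validation transformations, applied in document order.
        for step in rule.iterfind("instance/transform"):
            instance = transforms[step.get("transformation")](instance)
        # Every isValid check sees the transformed instance.
        for check in rule.iterfind("assert/isValid"):
            schema = check.get("schema")
            errors = validators[schema](instance)
            result = ET.SubElement(report, "result", rule=str(number),
                                   schema=schema, valid=str(not errors).lower())
            for message in errors:
                ET.SubElement(result, "error").text = message
    return report
```

Each `result` element records which rule and schema produced it, so the report is itself an XML instance that other applications can process.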

Current status

• Published
  – Part 2, RELAX NG

• At Committee Draft stage
  – Part 3, Schematron
  – Part 4, NVDL

• Working Drafts under consideration
  – Part 1, Overview
  – Part 7, Character repertoire validation
  – Part 8, Declarative document architectures
  – Part 10, Validation management

• Parts 5, 6 and 9 not yet drafted

Tracking progress

• Via your national standards body
  – IST/41 at BSI

• Via XML UK or any ISUG chapter
  – Martin Bryan is the XML UK representative on IST/41 and the ISUG representative for SC34/WG1

• Via the DSDL public website
  – http://www.dsdl.org