XML Schema: An Intensive One-Day Tutorial
Henry S. Thompson
HCRC Language Technology Group
University of Edinburgh
Reuters / Henry S. Thompson / XML Schema, London 1999-12-15
Overview
  What are schemata, anyway?
    The nature of document structure
    Schema as contract
    Taking control of structure definition
  XML Schema: the activity
    The W3C and its WGs
    The Charter and Requirements
    The state of play
  The Draft RECs
    A detailed walkthrough
  Schemas and Layered Architecture
  2: When you see this, it means there’s accompanying information in the Additional Materials handbook
Terminology
  Documents have structure
    Document types
    Document instances
  Structure can be defined (D. S. D.)
    Informally
    SGML DTD
    XML DTD
    Schema using XML
Background
  SGML DTDs for D. S. D.
    Sperberg-McQueen
    Others
  Considered for XML itself
    MCF, then RDF, now DCD, by Bray et al.
    XML-Data, two versions, now XML-Data reduced, by Layman et al., then Frankston and Thompson
    SOX, from Veo Corp.
    XSchema, from an ad-hoc group of designers
Document Structure
  Two relations are constitutive
    Part-of
    Kind-of
  Existing DSD mechanisms use Content Models to specify part-of relations
  But they only specify kind-of relations implicitly or informally
  Making kind-of relations explicit would make both understanding and maintenance easier
Taking Control of D. S. D.
  Erik Naggum used to talk about SGML allowing users to take control of their data
  XML allows the same move one level up, for developers
    The starting point is much simpler
    The architecture is congenial
    The demand is there
  We need to do this, to make the transition to validation easier
Why validate?
  A D. S. D. is a contract between producers and consumers
    It provides a guaranteed interface
    Producers validate to ensure they are providing what they promised
    Consumers validate to check up on producers and to protect their applications
  Application authors validate to simplify their task
    Leave error detection and analysis to the validating parser
Reconstructing DTDs
  The Schema DTD is expressed in vanilla XML
  Top level elements for declaring
    Elements :-)
    Types
    Notations
    . . .
  Subordinate element types for declaring
    Attributes
    Content models
    . . .
An aside about terminology
  SGML and XML 1.0 talk about element types
  XML Schema to date has been more casual and just talked about elements
    Meaning either an element in an instance
    Or the abstraction which is described in a DTD or Schema
  Further confused by XML Schema making extensive use of type
  Also, schema means many different things to different people
    I'll try always to say/write XML Schema. . .
A simple example
  <!ELEMENT text (#PCDATA|emph|name)*>
  <!ATTLIST text timestamp NMTOKEN #REQUIRED>

  <element name="text">
    <type content="mixed">
      <element ref="emph"/>
      <element ref="name"/>
      <attribute name="timestamp" type="date" minOccurs="1"/>
    </type>
  </element>
The Schema Architecture: Static
  A document or an application or a user identifies a schema
  Each is well-formed XML
  The schema is valid w.r.t. the Schema DTD
  The document is schema-valid w.r.t. the schema
  The schema is schema-valid w.r.t. the schema for schemas
The Schema Architecture: Dynamic
  An XML application (XSP) which schema-validates
  ‘Takes control’ because changing how schemata work means
    changing the Schema DTD/schema for schemas
    upgrading XSP accordingly
    not changing XML itself
The W3C
  XML Schema hopes to be a W3C Recommendation
  The W3C is The World Wide Web Consortium, a voluntary association of companies and non-profit organisations.
    Membership costs serious money, confers voting rights.
    Complex procedures, with the Chairman (Tim Berners-Lee) holding all the high cards, but the big vendors (e.g. Microsoft, Adobe, Netscape) have a lot of power.
. . . and its WGs
  The XML recommendation was written by the W3C’s XML Working Group
  Which split itself into pieces, of which one is the XML Schema WG
    Chartered in the autumn of 1998
    Requirements document out in February of 1999
    Due to go to Last Call early in 2000
Requirements document
  Full of good and hopeful requirements
    DTDs and more
    Support inheritance
    Data-friendly
    Good inventory of primitive datatypes
  (Additional Materials: 5)
The state of play
  Two component documents
    Structures
    Datatypes
  Three public working drafts so far
    May 1999
    September 1999
    November 1999
  Further (near-final) PWD out December 1999
  http://www.w3.org/TR/xmlschema-1/ [contains pointers to previous drafts]
  (Additional Materials: 6, 8)
The XML Schema worldview
  Validity and well-formedness are XML 1.0 concepts
    They are defined over character sequences
  Namespace-compliant is a Namespace concept
    It's defined over character sequences too
  Schema-validity is the XML Schema concept
    It is defined over XML document Infosets
  So the whole XML Schema exercise is predicated on and layered on top of XML 1.0 well-formedness plus Namespaces
    Because they are constitutive of the Infoset
What's the Infoset?
  The XML 1.0 plus Namespaces abstract data model
  Defines a modest number of information items
    Element, attribute, namespace declaration, ...
  Each has required and optional properties
    Name, children, …
What the Infoset isn't
  It's not the DOM
    Much higher level
  It's not about implementation or interfacing at all
    But you can think of it as a data structure if that helps
  It's not an SGML property set/grove
    But it's close
    It doesn't have the entity problem
      a mixed blessing, as we will see
The Schema and the Infoset
  So crucially, schemas are about infosets, not character sequences
  You could schema-validate a DOM tree you built by hand!
    Using a schema which exists only as a DOM tree ditto
  This simplifies things tremendously
    but is hard to get your head around at first
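The point that checking operates on trees rather than on character sequences can be illustrated with a small Python sketch. It uses the standard library's ElementTree as a stand-in for an infoset (it is neither the DOM nor the Infoset proper), and the constraint `ends_with_surname` is a toy rule invented for this illustration, not anything defined by XML Schema.

```python
import xml.etree.ElementTree as ET

def child_tags(element):
    """Tags of an element's children, in document order."""
    return [child.tag for child in element]

def ends_with_surname(element):
    """Toy constraint, invented for this sketch: a <name> must end with <surname>."""
    tags = child_tags(element)
    return element.tag == "name" and bool(tags) and tags[-1] == "surname"

# Tree 1: obtained by parsing a character sequence.
parsed = ET.fromstring("<name><forename>Al</forename><surname>Gore</surname></name>")

# Tree 2: built by hand; no character sequence ever existed for it.
built = ET.Element("name")
ET.SubElement(built, "forename").text = "Al"
ET.SubElement(built, "surname").text = "Gore"

# The check sees only the tree, so both trees get the same verdict.
print(ends_with_surname(parsed), ends_with_surname(built))
```

Because the check never looks at markup, details such as quoting style or whitespace between tags cannot affect the result.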
Basic XML Schema concepts
  Syntax is not the Schema
  Namespaces are fundamental
  But a schema is not a namespace
  Separation of tag from type
  Simple and Complex types
  Modular Schema construction
  Powerful type construction
  Local tag-type association
  Powerful wildcards
  Element equivalence classes
  Extension mechanism
  Documentation mechanism
Schema Walkthrough 1
  A Toy Purchase Order schema (Additional Materials: 10)
Types and Type Derivation
  For purposes of discussion, consider only the content type aspects of types (attributes are analogous)
  A content type definition (simple or complex) consists of a set of constraints on what's allowed as content.
Permissions and obligations
  You can think of the type itself as the set of strings/EIIs its constraints allow.
  It's helpful to think of constraints as composed of obligations and permissions:
    (\d )?(\d{3}-)?\d{3}-\d{4}
    regexp definition facet for [US] 'phone number type
    the ? and the \d can be seen as permissions, the - and the {3} as obligations
    1 337-6818 and 207-422-6240 belong to this type
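The membership claim on this slide can be checked mechanically. The following is a minimal Python sketch that treats the type as the set of strings the pattern accepts; the helper name `is_us_phone` is made up for the example, and Python's `re` dialect is assumed to agree with the draft's regexp facet for this particular pattern.

```python
import re

# The pattern from the slide, used as a whole-string constraint.
US_PHONE = re.compile(r"(\d )?(\d{3}-)?\d{3}-\d{4}")

def is_us_phone(text):
    """True if the whole string belongs to the type defined by the pattern."""
    return US_PHONE.fullmatch(text) is not None

for candidate in ["1 337-6818", "207-422-6240", "3376818", "337 6818"]:
    print(candidate, is_us_phone(candidate))
```

The optional groups are the permissions (either may be left out), while the hyphens and the fixed digit counts are obligations, which is why the last two candidates are rejected.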
Complex types
  (title?,forename*,surname)
  (shorthand for) content model for name
  the ? can be seen as permission, the , and the 'surname' as obligations
  (at the end of the day, each component involves both permission AND obligation, but the balance of impact is as suggested)
Complex types, cont'd
  (title?,forename*,surname)
  <name>
    <forename>...</forename>
    <surname>...</surname>
  </name>
  and
  <name>
    <title>...</title>
    <surname>...</surname>
  </name>
  are both members of this type
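One way to make the membership test for a content model concrete is to translate the model into a regular expression over the sequence of child tags. This is only a sketch of the idea in Python, using ElementTree as the tree representation; it is not how a schema processor is specified to work, and `matches_name_model` is a name invented here.

```python
import re
import xml.etree.ElementTree as ET

# (title?,forename*,surname) rewritten over space-terminated tag tokens.
NAME_MODEL = re.compile(r"(title )?(forename )*surname ")

def matches_name_model(element):
    """True if the element's children, in order, satisfy the content model."""
    tags = "".join(child.tag + " " for child in element)
    return NAME_MODEL.fullmatch(tags) is not None

first = ET.fromstring("<name><forename>...</forename><surname>...</surname></name>")
second = ET.fromstring("<name><title>...</title><surname>...</surname></name>")
no_surname = ET.fromstring("<name><title>...</title></name>")

print(matches_name_model(first), matches_name_model(second), matches_name_model(no_surname))
```

The two instances from the slide are accepted; the third is rejected because the obligation to supply a surname is not met.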
Restriction
  A type definition may be a restriction of another type's definition if it reduces permissions, sometimes to the point of inducing obligations:
    \d[01]\d-\d{3}-\d{4}          (a restriction of US p#)
    (\d )?(\d{3}-)?\d{3}-\d{4}
  The membership of this type, which includes 207-422-6240 but not 1 337-6818, is a (proper) subset of the membership of the original type, because by construction every member of the new type is a member of the original.
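The subset relation can be spot-checked on a handful of strings. The Python sketch below uses the two patterns from the slide; the sample list is invented for the illustration, and sampling like this only demonstrates the relation on those samples rather than proving it in general.

```python
import re

ORIGINAL = re.compile(r"(\d )?(\d{3}-)?\d{3}-\d{4}")
RESTRICTED = re.compile(r"\d[01]\d-\d{3}-\d{4}")

# Sample strings chosen for the illustration (not from the slides, apart from the first two).
samples = ["207-422-6240", "1 337-6818", "337-6818", "912-555-0100", "1 207-422-6240"]

original_members = {s for s in samples if ORIGINAL.fullmatch(s)}
restricted_members = {s for s in samples if RESTRICTED.fullmatch(s)}

# Every sampled member of the restricted type is also a member of the original,
# and the original has members the restricted type lacks.
print(restricted_members <= original_members, restricted_members < original_members)
```

Here the restriction has removed two permissions (the leading digit and the option of omitting the area code) and thereby turned the area code into an obligation.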
Restriction, cont'd
  Similarly,
    (forename+,surname)
  is a restriction of the original type definition for name
    (title?,forename*,surname)
  and the same relation holds.
Restriction, cont'd
  (forename+,surname)
  Note first that
  <name>
    <forename>...</forename>
    <surname>...</surname>
  </name>
  is a member of the new type, but
  <name>
    <title>...</title>
    <surname>...</surname>
  </name>
  is not.
Extension
  Now consider
    (title?, forename*, surname, genMark?)
  This type extends the original type definition for name.
  <name>
    <forename>Al</forename>
    <surname>Gore</surname>
    <genMark>Jr</genMark>
  </name>
  is an instance of this new type, but not of the original.
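To compare the three content models side by side, each can be written as a regular expression over child-tag sequences and tried against the instances from the slides. This Python sketch is illustrative only; the dictionary keys `original`, `restricted` and `extended` are labels chosen for the example.

```python
import re

# The three content models, rewritten over space-terminated tag tokens.
MODELS = {
    "original": r"(title )?(forename )*surname ",
    "restricted": r"(forename )+surname ",
    "extended": r"(title )?(forename )*surname (genMark )?",
}

# Child-tag sequences of the <name> instances shown on the slides.
INSTANCES = [
    ("forename", "surname"),
    ("title", "surname"),
    ("forename", "surname", "genMark"),
]

def members(model_name):
    """Return the sample instances accepted by the named content model."""
    pattern = re.compile(MODELS[model_name])
    return {inst for inst in INSTANCES
            if pattern.fullmatch("".join(tag + " " for tag in inst))}

for model_name in MODELS:
    print(model_name, sorted(members(model_name)))
```

On these samples the restricted model accepts fewer instances than the original, and the extended model accepts everything the original does plus the instance with genMark.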
Any
  Finally note that the <any/> content model particle, in all of its forms, introduces particularly broad permissions into complex content types.
Where are we headed?
  A number of design decisions can now be stated:
  Should we make it easy to construct type definitions which restrict or extend other type definitions, by specifying only the method of derivation and the differences between the source and derived type definitions?
  The new proposal says 'yes', you do this by using the "source" and "derivedBy" attributes on your <type> or <datatype> element.
Datatype example
  Consider the simple type case first:
  <datatype name='bodytemp' source='decimal'>
    <precision value='4'/>
    <scale value='1'/>
    <minInclusive value='97.0'/>
    <maxInclusive value='105.0'/>
  </datatype>
Derived type
  <datatype name='healthyBodytemp' source='bodytemp'>
    <maxInclusive value='99.5'/>
  </datatype>
  The healthyBodytemp type definition is defined by closing down the permitted range of bodytemp. We say it 'inherits' the other facets of bodytemp, so the 'effective type definition' of healthyBodytemp is
Effective type
  <datatype name='healthyBodytemp' source='decimal'>
    <precision value='4'/>
    <scale value='1'/>
    <minInclusive value='97.0'/>
    <maxInclusive value='99.5'/>
  </datatype>
  Since it doesn't in general make sense to extend one simple type by another, the "derivedBy" attribute is actually redundant for <datatype>.
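Facet inheritance of this kind amounts to overlaying the derived type's facets on those of its source. The Python sketch below models that with dictionaries. It assumes that `precision` bounds the total number of digits and `scale` the number of fraction digits, which is my reading of the draft facets rather than something the slides state, and the function names are invented for the example.

```python
from decimal import Decimal

# Facets of bodytemp, as declared on the slide (source type: decimal).
BODYTEMP = {
    "precision": 4,
    "scale": 1,
    "minInclusive": Decimal("97.0"),
    "maxInclusive": Decimal("105.0"),
}

# healthyBodytemp restates only the facet it changes.
HEALTHY_OVERRIDES = {"maxInclusive": Decimal("99.5")}

def effective_facets(source, overrides):
    """Inherit every facet of the source, replacing those the derived type restates."""
    return {**source, **overrides}

def is_member(text, facets):
    """Check a finite decimal literal against the facets (assumed semantics, see lead-in)."""
    value = Decimal(text)
    _sign, digits, exponent = value.as_tuple()
    fraction_digits = max(0, -exponent)
    total_digits = len(digits) + max(0, exponent)
    return (total_digits <= facets["precision"]
            and fraction_digits <= facets["scale"]
            and facets["minInclusive"] <= value <= facets["maxInclusive"])

HEALTHY = effective_facets(BODYTEMP, HEALTHY_OVERRIDES)
print(HEALTHY)
print(is_member("98.6", HEALTHY), is_member("101.0", HEALTHY), is_member("101.0", BODYTEMP))
```

Because the derived type can only replace facets of its source, every value accepted by the derived facets here is also accepted by the source facets, as long as the replacement is at least as tight.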
Extension for complex types
  The next simplest case is extension for complex types:
  <type name='name'>
    <element name='title' minOccurs='0'/>
    <element name='forename' minOccurs='0' maxOccurs='*'/>
    <element name='surname'/>
  </type>
Derived type
  <type name='fullName' source='name' derivedBy='extension'>
    <element name='genMark' minOccurs='0'/>
  </type>
The effective type
  <type name='fullName'>
    <element name='title' minOccurs='0'/>
    <element name='forename' minOccurs='0' maxOccurs='*'/>
    <element name='surname'/>
    <element name='genMark' minOccurs='0'/>
  </type>
Restriction for complex types
  Restriction for complex types is harder to handle syntactically, because of the significance of linear order in content models, but the semantics are completely parallel to the simple type case:
Restriction example
  <type name='simpleName' source='name' derivedBy='restriction'>
    <restrictions>
      <element name='title' maxOccurs='0'/>
      <element name='forename' minOccurs='1'/>
    </restrictions>
  </type>
Restriction and Inheritance
  Just as in the <datatype> case, the content model aspects not mentioned are left alone, including the "maxOccurs='*'" on <forename> and the whole particle for <surname>, so the 'effective content model' of 'simpleName' is
Effective type
  <type name='simpleName'>
    <element name='title' maxOccurs='0' minOccurs='0'/> <!-- i.e. forbidden -->
    <element name='forename' minOccurs='1' maxOccurs='*'/>
    <element name='surname'/>
  </type>
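The way an effective content model is obtained from a source type plus a list of differences can be sketched in Python. Particles are modelled as dictionaries; omitted occurrence bounds are assumed to default to 1, matching how the slides treat surname, and the two function names are hypothetical helpers, not part of any schema API.

```python
UNBOUNDED = "*"

# The 'name' type from the slides, with occurrence bounds spelled out.
NAME = [
    {"name": "title", "minOccurs": 0, "maxOccurs": 1},
    {"name": "forename", "minOccurs": 0, "maxOccurs": UNBOUNDED},
    {"name": "surname", "minOccurs": 1, "maxOccurs": 1},
]

def derive_by_extension(source, added):
    """Effective model: the source particles followed by the added ones."""
    return [dict(p) for p in source] + [dict(p) for p in added]

def derive_by_restriction(source, restrictions):
    """Effective model: source particles in their original order, with the
    restated bounds overriding and everything not mentioned left alone."""
    changes = {r["name"]: r for r in restrictions}
    return [{**p, **changes.get(p["name"], {})} for p in source]

full_name = derive_by_extension(
    NAME, [{"name": "genMark", "minOccurs": 0, "maxOccurs": 1}])

simple_name = derive_by_restriction(NAME, [
    {"name": "title", "maxOccurs": 0},      # i.e. forbidden
    {"name": "forename", "minOccurs": 1},
])

for particle in simple_name:
    print(particle)
```

Keeping the source order fixed in `derive_by_restriction` reflects the slide's remark that linear order matters in content models: a restriction changes bounds, not positions.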
Instances
  Given all the example definitions above, all of
  <name>
    <title>Ms</title>
    <surname>Steinem</surname>
  </name>
  <name xsi:type='simpleName'>
    <forename>Harry</forename>
    <forename>S</forename>
    <surname>Truman</surname>
  </name>
Another instance
  <name xsi:type='fullName'>
    <forename>Al</forename>
    <surname>Gore</surname>
    <genMark>Jr</genMark>
  </name>
  all would be schema-valid per
  <element name='name' type='name'/>
Connecting Instances and Schemas
  Like I said
    A schema is not a namespace
    The connection cannot be made rigid
  The draft identifies three layers, first is
    schema-valid(EII,TypeName,ComponentSet)
    The TypeName is a (namespaceURI,NCName) pair
    The component set is made up of (namespaceURI,NCName,component) triples
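The shapes of the two arguments named here can be sketched as Python tuples. This is only an illustration of the data involved: the class and function names are invented, the namespace URI is a placeholder, and the draft's actual schema-valid relation does far more than this lookup.

```python
from typing import NamedTuple

class TypeName(NamedTuple):
    """A (namespaceURI, NCName) pair."""
    namespace_uri: str
    nc_name: str

class ComponentEntry(NamedTuple):
    """A (namespaceURI, NCName, component) triple."""
    namespace_uri: str
    nc_name: str
    component: object

def find_component(type_name, component_set):
    """Return the component registered under the given type name, or None."""
    for entry in component_set:
        if (entry.namespace_uri, entry.nc_name) == tuple(type_name):
            return entry.component
    return None

NS = "http://example.org/names"  # placeholder URI for the example
components = [
    ComponentEntry(NS, "name", "(title?,forename*,surname)"),
    ComponentEntry(NS, "fullName", "(title?,forename*,surname,genMark?)"),
]

print(find_component(TypeName(NS, "fullName"), components))
print(find_component(TypeName(NS, "nickname"), components))
```

The same namespace URI could equally appear in a different component set holding different definitions, which is one way to read "a schema is not a namespace".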
Other layers
  Layer 2: transfer syntax
  Layer 3: web connections
Schema Walkthrough 2
  The Schema for Datatypes (Additional Materials: 13)
Schema Walkthrough 3
  The Schema for Schemas (Additional Materials: 21)
Change of Gear
  Let's look at the role of schemas in supporting the layered architecture which is emerging all around us
XML is ASCII for the 21st century
  ASCII (ISO 646) solved a fundamental interchange problem for flat text documents
    What bits encode what characters
      – (For a pretty parochial definition of 'character')
  UNICODE/ISO 10646 extends that solution to the whole world
  XML thought it was doing the same for simple tree-structured documents
    The emphasis in the XML design was on simplifying SGML to move it to the Web
    XML didn't touch SGML's architectural vision
      – flexible linearisation/transfer syntax
      – for tree-structured documents with internal links
Just what is XML?
  It's a markup language used for annotating text
  It is concerned with logical structure
    to identify sections, titles, section headers, chapters, paragraphs,…
  It is not concerned with appearance
    you say 'this is a subtitle' not 'this is in bold, 14pt, centered'
    you say 'this is an example' not 'this is in verbatim, indented by 5pts, ragged right'
Take Two: Just what is XML?
  It's a markup language used for transferring data
  It is concerned with data models
    to convert between application-appropriate and transfer-appropriate forms
  It is not concerned with human beings
    It's produced and consumed by programs
XML as UI
  A slogan of Adam Bosworth
  I interpret it in two ways:
    At the client end
      – Use XML plus XSL as the basis for what the user sees on his/her screen
      – Use XLinks from a master document to pull together disparate sources of information
    At the server end
      – Use XML as a uniform interface for any data source onto the web
      – Not just documents, but e.g. databases, process control information, stock quotes
Application data
Structured markup
  <POORDERHDR>
    <DATETIME qualifier="DOCUMENT">
      <YEAR>1996</YEAR>
      <MONTH>06</MONTH>
      <DAY>30</DAY>
      <HOUR>23</HOUR>
      <MINUTE>59</MINUTE>
      <SECOND>59</SECOND>
      <SUBSECOND>0000</SUBSECOND>
      <TIMEZONE>+0100</TIMEZONE>
    </DATETIME>
    <OPERAMT qualifier="EXTENDED" type="T">
      <VALUE>670000</VALUE>
      <NUMOFDEC>2</NUMOFDEC>
      <SIGN>+</SIGN>
      <CURRENCY>USD</CURRENCY>
      . . .
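Markup like this is a transfer form; an application would normally convert it into its own representation. As a rough Python illustration, the sketch below turns the DATETIME element from the slide into a `datetime` value. The function name `to_datetime` is invented, and SUBSECOND is deliberately ignored because the slide does not say what unit it is in.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The DATETIME element from the slide, taken on its own.
FRAGMENT = """
<DATETIME qualifier="DOCUMENT">
  <YEAR>1996</YEAR> <MONTH>06</MONTH> <DAY>30</DAY>
  <HOUR>23</HOUR> <MINUTE>59</MINUTE> <SECOND>59</SECOND>
  <SUBSECOND>0000</SUBSECOND> <TIMEZONE>+0100</TIMEZONE>
</DATETIME>
"""

def to_datetime(element):
    """Convert the transfer form into an application-appropriate value.
    SUBSECOND is skipped: its unit is not given in the source."""
    fields = [element.findtext(tag) for tag in
              ("YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND", "TIMEZONE")]
    stamp = "{}-{}-{} {}:{}:{} {}".format(*fields)
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z")

moment = to_datetime(ET.fromstring(FRAGMENT.strip()))
print(moment.isoformat())
```

The eight child elements collapse into a single value on the application side, which is the kind of mapping between layers the later slides discuss.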
What just happened!?
  The whole transfer syntax story just went meta, that's what happened!
  XML has been a runaway success, on a much greater scale than its designers anticipated
    Not for the reason they had hoped
      – Because separation of form from content is right
    But for a reason they barely thought about
      – Data must travel the web
  Tree structured documents are a useable transfer syntax for just about anything
  So data-oriented web users think of XML as a transfer mechanism for their data
The Cambridge Communiqué
  A W3C Note resulting from a meeting this August (http://www.w3.org/TR/schema-arch)
  Signalled a widespread acceptance of layering:
    "XML has defined a transfer syntax for tree-structured documents;
    "Many data-oriented applications are being defined which build their own data structures on top of an XML document layer, effectively using XML documents as a transfer mechanism for structured data;"
The Communiqué, cont'd
  Called for support in XML Schema for specifying mapping between the XML document data model (or XML Infoset) and application-specific data models
  XML Schema is a W3C recommendation-in-progress for defining the structure of document families
    A grammar for markup structure
    E.g.
      article -> title, subtitle?, section+
    or
      POORDERHDR -> DATETIME, ORDERAMT
Mapping between layers
  Fortunately, XML Schema is actually notated in XML itself
  So there are elements defined for use in schemas to define. . .
    Elements :-)
    Attributes
    Types
  A type is a collection of constraints on element content and attribute values
  A type may be either
    simple, for constraining string values
    complex, for constraining elements which contain other elements
Type definition example
  <type name='personName'>
    <element name='title' minOccurs='0'/>
    <element name='forename' minOccurs='0' maxOccurs='*'/>
    <element name='surname'/>
    <attribute name='id' type='integer'/>
  </type>

  <element name='owner' type='personName'/>
Mapping between layers 2
  We can think of this in two ways
    In terms of an abstract data modelling language
      – Entity-Relation
      – UML
      – RDF
    In concrete implementation terms
      – Tables and rows
      – Class instances and instance variables
  The first is more portable
  The second more immediately useful
Mapping between layers 3
  Regardless of what approach we take, we need
    A vocabulary of data model components
    An attachment of that vocabulary to schema components
  Sample vocabularies
    entity, relationship, collection
    table, row, column
    instance, variable, list, dictionary
  Where should attachment be specified?
    In the schema
      – convenient
    Outside it
      – modular
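As a concrete picture of the "instance, variable, list" vocabulary, here is a small Python sketch that maps an owner element of the personName type onto a class instance, with the attachment written outside the schema as ordinary code. The `PersonName` class, the `to_person_name` function and the sample owner element are all invented for this illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PersonName:
    """Application-side counterpart of the personName type (hypothetical)."""
    surname: str                                         # instance variable
    title: Optional[str] = None                          # optional variable
    forenames: List[str] = field(default_factory=list)   # list
    id: Optional[int] = None                             # from the id attribute

def to_person_name(element):
    """Attach schema components to data model components, outside the schema."""
    raw_id = element.get("id")
    return PersonName(
        surname=element.findtext("surname"),
        title=element.findtext("title"),
        forenames=[e.text for e in element.findall("forename")],
        id=int(raw_id) if raw_id is not None else None,
    )

owner = ET.fromstring(
    "<owner id='7'><forename>Harry</forename><forename>S</forename>"
    "<surname>Truman</surname></owner>")
print(to_person_name(owner))
```

Note the cost the next slides mention: the function repeats structural knowledge (which children exist, which may repeat) that the schema already states.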
Specifying mapping in the schema
  Probably reasonable if done in high-level (ER, UML) terms
  See example infoset-xmpl.xml, infoset-uml.xsd
Specifying mapping outside
  Requires some duplication of structural information
  Encourages cross-language working
  See example infoset-xmpl.xsl
Take-home message
  The point at which idiosyncratic scripting takes over can be moved one layer up
  Using public consensual declarative standards is a Good Thing
  Interoperability makes things better for everyone
Overall Conclusion
  "Schemas are coming: Start using them!"
  Tim Berners-Lee, 1999-11-05