29
IS432 Semi-Structured Data Lecture 3: XSchema Dr. Gamal Al- Shorbagy

IS432 Semi-Structured Data Lecture 3: XSchema Dr. Gamal Al-Shorbagy

Embed Size (px)

Citation preview

IS432Semi-Structured Data

Lecture 3:

XSchema

Dr. Gamal Al-Shorbagy

In this lecture

• XML Schemas• Elements v. Types

• Regular expressions

• Expressive power

ResourcesW3C Draft: www.w3.org/TR/2001/REC-xmlschema-1-20010502

XML Schemas

• http://www.w3.org/TR/xmlschema-1/10/2000• generalizes DTDs• uses XML syntax• two documents: structure and datatypes

– http://www.w3.org/TR/xmlschema-1– http://www.w3.org/TR/xmlschema-2

• XML-Schema is very complex– often criticized– some alternative proposals

XML Schema Requirements• Structural

– namespaces– primitive types & structural schema integration– inheritance

• Data type– integers, dates, … (like in languages)– user-defined (constrain some properties)

• Conformance– processors, validity

Design Principles

• More expressive than DTDs

• Expressed in XML

• Self-describing

• Usable by various XML applications

• Simple enough

XML Schemas

<xsd:element name=“paper” type=“papertype”/>

<xsd:complexType name=“papertype”>

<xsd:sequence>

<xsd:element name=“title” type=“xsd:string”/>

<xsd:element name=“author” minOccurs=“0”/>

<xsd:element name=“year”/>

<xsd: choice> < xsd:element name=“journal”/>

<xsd:element name=“conference”/>

</xsd:choice>

</xsd:sequence>

</xsd:element>

<xsd:element name=“paper” type=“papertype”/>

<xsd:complexType name=“papertype”>

<xsd:sequence>

<xsd:element name=“title” type=“xsd:string”/>

<xsd:element name=“author” minOccurs=“0”/>

<xsd:element name=“year”/>

<xsd: choice> < xsd:element name=“journal”/>

<xsd:element name=“conference”/>

</xsd:choice>

</xsd:sequence>

</xsd:element>

DTD: <!ELEMENT paper (title,author*,year, (journal|conference))>

Elements v.s. Types in XML Schema

<xsd:element name=“person”> <xsd:complexType> <xsd:sequence> <xsd:element name=“name” type=“xsd:string”/> <xsd:element name=“address” type=“xsd:string”/> </xsd:sequence> </xsd:complexType></xsd:element>

<xsd:element name=“person”> <xsd:complexType> <xsd:sequence> <xsd:element name=“name” type=“xsd:string”/> <xsd:element name=“address” type=“xsd:string”/> </xsd:sequence> </xsd:complexType></xsd:element>

<xsd:element name=“person” type=“ttt”><xsd:complexType name=“ttt”> <xsd:sequence> <xsd:element name=“name” type=“xsd:string”/> <xsd:element name=“address” type=“xsd:string”/> </xsd:sequence></xsd:complexType>

<xsd:element name=“person” type=“ttt”><xsd:complexType name=“ttt”> <xsd:sequence> <xsd:element name=“name” type=“xsd:string”/> <xsd:element name=“address” type=“xsd:string”/> </xsd:sequence></xsd:complexType>

DTD: <!ELEMENT person (name,address)>

• Types:– Simple types (integers, strings, ...)

– Complex types (regular expressions, like in DTDs)

• Element-type-element alternation:– Root element has a complex type

– That type is a regular expression of elements

– Those elements have their complex types...

– ...

– On the leaves we have simple types

Elements v.s. Types in XML Schema

Local and Global Types in XML Schema

• Local type: <xsd:element name=“person”>

[define locally the person’s type] </xsd:element>

• Global type: <xsd:element name=“person” type=“ttt”/>

<xsd:complexType name=“ttt”> [define here the type ttt] </xsd:complexType>

Global types: can be reused in other elements

Local v.s. Global Elements inXML Schema

• Local element: <xsd:complexType name=“ttt”>

<xsd:sequence> <xsd:element name=“address” type=“...”/>... </xsd:sequence> </xsd:complexType>

• Global element: <xsd:element name=“address” type=“...”/>

<xsd:complexType name=“ttt”> <xsd:sequence> <xsd:element ref=“address”/> ... </xsd:sequence> </xsd:complexType>

Global elements: like in DTDs

Regular Expressions in XML Schema

Recall the element-type-element alternation: <xsd:complexType name=“....”>

[regular expression on elements] </xsd:complexType>

Regular expressions:• <xsd:sequence> A B C </...> = A B C

• <xsd:choice> A B C </...> = A | B | C

• <xsd:group> A B C </...> = (A B C)

• <xsd:... minOccurs=“0” maxOccurs=“unbounded”> ..</...> = (...)*

• <xsd:... minOccurs=“0” maxOccurs=“1”> ..</...> = (...)?

Local Names in XML-Schema<xsd:element name=“person”> <xsd:complexType> . . . . . <xsd:element name=“name”> <xsd:complexType> <xsd:sequence> <xsd:element name=“firstname” type=“xsd:string”/> <xsd:element name=“lastname” type=“xsd:string”/> </xsd:sequence> </xsd:element> . . . . </xsd:complexType></xsd:element>

<xsd:element name=“product”> <xsd:complexType> . . . . . <xsd:element name=“name” type=“xsd:string”/>

</xsd:complexType></xsd:element>

<xsd:element name=“person”> <xsd:complexType> . . . . . <xsd:element name=“name”> <xsd:complexType> <xsd:sequence> <xsd:element name=“firstname” type=“xsd:string”/> <xsd:element name=“lastname” type=“xsd:string”/> </xsd:sequence> </xsd:element> . . . . </xsd:complexType></xsd:element>

<xsd:element name=“product”> <xsd:complexType> . . . . . <xsd:element name=“name” type=“xsd:string”/>

</xsd:complexType></xsd:element>

name hasdifferent meaningsin person andin product

Subtle Use of Local Names<xsd:element name=“A” type=“oneB”/>

<xsd:complexType name=“onlyAs”> <xsd:choice> <xsd:sequence> <xsd:element name=“A” type=“onlyAs”/> <xsd:element name=“A” type=“onlyAs”/> </xsd:sequence> <xsd:element name=“A” type=“xsd:string”/> </xsd:choice></xsd:complexType>

<xsd:element name=“A” type=“oneB”/>

<xsd:complexType name=“onlyAs”> <xsd:choice> <xsd:sequence> <xsd:element name=“A” type=“onlyAs”/> <xsd:element name=“A” type=“onlyAs”/> </xsd:sequence> <xsd:element name=“A” type=“xsd:string”/> </xsd:choice></xsd:complexType>

<xsd:complexType name=“oneB”> <xsd:choice> <xsd:element name=“B” type=“xsd:string”/> <xsd:sequence> <xsd:element name=“A” type=“onlyAs”/> <xsd:element name=“A” type=“oneB”/> </xsd:sequence> <xsd:sequence> <xsd:element name=“A” type=“oneB”/> <xsd:element name=“A” type=“onlyAs”/> </xsd:sequence> </xsd:choice></xsd:complexType>

<xsd:complexType name=“oneB”> <xsd:choice> <xsd:element name=“B” type=“xsd:string”/> <xsd:sequence> <xsd:element name=“A” type=“onlyAs”/> <xsd:element name=“A” type=“oneB”/> </xsd:sequence> <xsd:sequence> <xsd:element name=“A” type=“oneB”/> <xsd:element name=“A” type=“onlyAs”/> </xsd:sequence> </xsd:choice></xsd:complexType>

Arbitrary deep binary tree with A elements, and a single B element

Attributes in XML Schema

<xsd:element name=“paper” type=“papertype”/>

<xsd:complexType name=“papertype”>

<xsd:sequence>

<xsd:element name=“title” type=“xsd:string”/>

. . . . . .

</xsd:sequence>

<xsd:attribute name=“language" type="xsd:NMTOKEN" fixed=“English"/>

</xsd:complexType>

<xsd:element name=“paper” type=“papertype”/>

<xsd:complexType name=“papertype”>

<xsd:sequence>

<xsd:element name=“title” type=“xsd:string”/>

. . . . . .

</xsd:sequence>

<xsd:attribute name=“language" type="xsd:NMTOKEN" fixed=“English"/>

</xsd:complexType>

Attributes are associated to the type, not to the elementOnly to complex types; more trouble if we want to add attributesto simple types.

“Mixed” Content, “Any” Type

• Better than in DTDs: can still enforce the type, but now may have text between any elements

• Means anything is permitted there

<xsd:complexType mixed="true"> . . . .

<xsd:complexType mixed="true"> . . . .

<xsd:element name="anything" type="xsd:anyType"/> . . . .

<xsd:element name="anything" type="xsd:anyType"/> . . . .

“All” Group

• A restricted form of & in SGML• Restrictions:

– Only at top level– Has only elements– Each element occurs at most once

• E.g. “comment” occurs 0 or 1 times

<xsd:complexType name="PurchaseOrderType">

<xsd:all> <xsd:element name="shipTo" type="USAddress"/>

<xsd:element name="billTo" type="USAddress"/>

<xsd:element ref="comment" minOccurs="0"/>

<xsd:element name="items" type="Items"/>

</xsd:all>

<xsd:attribute name="orderDate" type="xsd:date"/>

</xsd:complexType>

<xsd:complexType name="PurchaseOrderType">

<xsd:all> <xsd:element name="shipTo" type="USAddress"/>

<xsd:element name="billTo" type="USAddress"/>

<xsd:element ref="comment" minOccurs="0"/>

<xsd:element name="items" type="Items"/>

</xsd:all>

<xsd:attribute name="orderDate" type="xsd:date"/>

</xsd:complexType>

Derived Types by Extensions <complexType name="Address">

<sequence> <element name="street" type="string"/>

<element name="city" type="string"/>

</sequence>

</complexType>

<complexType name="USAddress">

<complexContent>

<extension base="ipo:Address">

<sequence> <element name="state" type="ipo:USState"/>

<element name="zip" type="positiveInteger"/>

</sequence>

</extension>

</complexContent>

</complexType>

<complexType name="Address">

<sequence> <element name="street" type="string"/>

<element name="city" type="string"/>

</sequence>

</complexType>

<complexType name="USAddress">

<complexContent>

<extension base="ipo:Address">

<sequence> <element name="state" type="ipo:USState"/>

<element name="zip" type="positiveInteger"/>

</sequence>

</extension>

</complexContent>

</complexType>

Corresponds to inheritance

Derived Types by Restrictions

• (*): may restrict cardinalities, e.g. (0,infty) to (1,1); may restrict choices; other restrictions…

<complexContent> <restriction base="ipo:Items“> … [rewrite the entire content, with restrictions]... </restriction> </complexContent>

<complexContent> <restriction base="ipo:Items“> … [rewrite the entire content, with restrictions]... </restriction> </complexContent>

Corresponds to set inclusion

Simple Types

• String

• Token

• Byte

• unsignedByte

• Integer

• positiveInteger

• Int (larger than integer)

• unsignedInt

• Long

• Short

• ...

• Time

• dateTime

• Duration

• Date

• ID

• IDREF

• IDREFS

Facets of Simple Types

Examples

• length

• minLength

• maxLength

• pattern

• enumeration

• whiteSpace

• maxInclusive

• maxExclusive

• minInclusive

• minExclusive

• totalDigits

• fractionDigits

•Facets = additional properties restricting a simple type

•15 facets defined by XML Schema

Facets of Simple Types

• Can further restrict a simple type by changing some facets

• Restriction = subset

Not so Simple Types

• List types:

• Union types

• Restriction types

<xsd:simpleType name="listOfMyIntType">

<xsd:list itemType="myInteger"/>

</xsd:simpleType>

<xsd:simpleType name="listOfMyIntType">

<xsd:list itemType="myInteger"/>

</xsd:simpleType>

<listOfMyInt>20003 15037 95977 95945</listOfMyInt>

An Example Document (1/2)<?xml version=“1.0”?>

<purchaseOrder orderDate=“1999-10-20”>

<shipTo country=“US”>

<name>Matthias Hauswirth</name>

<street>4500 Brookfield Dr.</street>

<city>Boulder</city>

<state>CO</state>

<zip>80303</zip>

</shipTo>

<billTo country=“US”>

<name>Brian Temple</name>

<street>1234 Strasse</street>

<city>Boulder</city>

<state>CO</state>

<zip>80302</zip>

</billTo>

...

An Example Document (2/2) <comment>Brian pays</comment>

<items>

<item partNum=“123-AB”>

<productName>Porsche</productName>

<quantity>1</quantity>

<price>129400.00</price>

<comment>Need a new one</comment>

</item>

<item>

<productName>Ferrari</productName>

<quantity>2</quantity>

<price>189000.25</price>

<shipDate>1999-05-21</shipDate>

</item>

</items>

</purchaseOrder>

An Example Schema (1/3)<xsd:schema xmlns:xsd=“http://www.w3.org/1999/XMLSchema”>

<xsd:element name=“purchaseOrder” type=“purchaseOrderType”/>

<xsd:element name=“comment” type=“xsd:string”/>

<xsd:complexType name=“PurchaseOrderType”>

<xsd:element name=“shipTo” type=“AddressType”/>

<xsd:element name=“billTo” type=“AddressType”/>

<xsd:element ref=“comment” minOccurs=“0”/>

<xsd:element name=“items” type=“ItemsType”/>

<xsd:attribute name=“orderDate” type=“xsd:date”/>

</xsd:complexType>

...

An Example Schema (2/3)

<xsd:complexType name=“AddressType”>

<xsd:element name=“name” type=“xsd:string/>

<xsd:element name=“street” type=“xsd:string”/>

<xsd:element name=“city” type=“xsd:string”/>

<xsd:element name=“state” type=“xsd:string”/>

<xsd:element name=“zip” type=“xsd:decimal”/>

<xsd:attribute name=“country” type=“xsd:NMTOKEN”

use=“fixed” value=“US”/>

</xsd:complexType>

<xsd:simpleType name=“SkuType” base=“xsd:string”>

<xsd:pattern value=“\d{3}-[A-Z]{2}”/>

</xsd:simpleType>

...

An Example Schema (3/3) <xsd:complexType name=“ItemsType”>

<xsd:element name=“item” minOccurs=“0” maxOccurs=“unbounded”>

<xsd:complexType>

<xsd:element name=“productName” type=“xsd:string/>

<xsd:element name=“quantity”>

<xsd:simpleType base=“xsd:positiveInteger”>

<xsd:maxExclusive Value=“100”/>

</xsd:simpleType>

</xsd:element>

<xsd:element name=“price” type=“xsd:decimal”/>

<xsd:element ref=“comment” minOccurs=“0”/>

<xsd:element name=“shipDate” type=“xsd:date” minOccurs=“0”/>

<xsd:attribute name=“partNum” type=“SkuType”/>

</xsd:complexType>

</xsd:element>

</xsd:complexType>

</xsd:schema>

XSchema-Tools• XML Schema-aware Parser

– Xerces-J– Oracle XML Schema Processor

• XML Schema Validator (XSV, online)• DTD to Schema Conversion Tools• XML Schema Editor

– Extensibility’s XML Authority

• XML Schema-aware Instance Editor– Extensibility’s XMLInstance– ChannelPoint’s Merlot (maybe in future)

Summary of XML Schema

• Formal Expressive Power:– Can express precisely the regular tree

languages (over unranked trees)

• Lots of other stuff– Some form of inheritance– A “null” value– Large collection of data types