Module 2: XML Schema Definition

a.Univ.-Prof. Dr. Werner Retschitzegger
Lecture: IFS in Bioinformatics, summer semester (SS) 2011

Johannes Kepler University Linz, www.jku.ac.at
Institute of Bioinformatics, www.bioinf.jku.at
Information Systems Group (IFS), www.ifs.uni-linz.ac.at

© 2011 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS)

Outline

- Introduction
  - Motivation for XML
  - Document Markup Languages
  - Application Areas for XML
- XML 1.0
- Namespaces
- XML Schema

The following slides are based (among others) on: Elliotte Rusty Harold, W. Scott Means, XML in a Nutshell: A Desktop Quick Reference, 3rd Edition, O'Reilly & Associates, 2005



Motivation for XML 1/5
From HTML to XML

"If I invent another programming language, its name will contain the letter X."
(N. Wirth, Software Pioniere Konferenz, Bonn 2001)

Google indicator (number of hits as of Sep 16, 2008):
- SQL: 223 million
- ABC: 252 million
- "Werner Retschitzegger": 20.6 thousand
- Soccer: 237 million
- XML: 603 million
- Love: 2.2 billion

Motivation for XML 2/5
From HTML to XML

Brian Kernighan: "The problem with HTML-WYSIWYG is that what you see is all you've got"

HTML (HyperText Markup Language) is the "lingua franca" for representing hypertext documents on the Web
- Proposed in 1989 by Tim Berners-Lee at CERN; later standardized by the W3C (World Wide Web Consortium)
- Basic concept: "markup" in terms of "tags"

Drawbacks
- Restricted number of pre-defined tags
  - leads to permanent extensions with proprietary tags
- Tags primarily describe layout aspects
  - makes Web search harder

Motivation for XML 3/5
From HTML to XML

HTML describes the layout of content:

    <h1>PDACatalog</h1>
    <h2>Nokia 8210</h2>
    <table border="1">
      <tr><td>Battery</td><td>900mAh</td></tr>
      <tr><td>Weight</td><td>141g</td></tr>
      …
    </table>

XML describes the structure and semantics of content:

    <PDACatalog>
      <Producer name="Nokia">
        <PDA name="8210">
          <Battery>900mAh</Battery>
          <Weight>141g</Weight>
          …
        </PDA>
      </Producer>
    </PDACatalog>

Tim Bray, Co-Editor of XML 1.0: "XML will become the ASCII of the 21st century - basic, essential, unexciting"
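To make the contrast concrete, here is a minimal sketch of structure-based access to the XML variant of the catalog, using Python's standard `xml.etree.ElementTree` module. The function name `pda_features` and the choice of Python are assumptions made for illustration; the elided parts of the slide's example are simply left out.

```python
import xml.etree.ElementTree as ET

# XML variant of the catalog from the slide (elided parts omitted).
CATALOG = """
<PDACatalog>
  <Producer name="Nokia">
    <PDA name="8210">
      <Battery>900mAh</Battery>
      <Weight>141g</Weight>
    </PDA>
  </Producer>
</PDACatalog>
"""

def pda_features(xml_text):
    """Return {PDA name: {feature name: value}} by following element names."""
    root = ET.fromstring(xml_text)
    result = {}
    for pda in root.iter("PDA"):
        # Each child element of <PDA> names the feature it carries.
        result[pda.get("name")] = {child.tag: child.text for child in pda}
    return result

features = pda_features(CATALOG)
print(features)
```

With the HTML variant, the same lookup would have to rely on table positions and headings, because the tags say nothing about what a cell means.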

Motivation for XML 4/5
Features of XML

- Layout independence: separation of structure and semantics of the content from its layout
- Platform and vendor independence: endorsed by the W3C
- Internationality: based on the Unicode standard
- Extensibility: tags can be defined and named arbitrarily (meta language)
- Structurability: tags can be nested arbitrarily
- Semi-structured: content can contain fully structured parts and fully unstructured parts
- Self-describing: tags describing structure and semantics of the content are
  - for humans: relatively easy to read and edit
  - for machines: easy to generate and parse
- X-technology infrastructure: W3C provides a set of XML-based standards ("XML Standards Family")
- Correctness check: optionally, XML documents can be checked for correctness

Motivation for XML 5/5
Properties of XML Documents and XML Processors

Well-formedness: syntactical properties, e.g.:
- At least one element per document
- Exactly one root element
- Elements must not overlap
- Each start tag has to have an end tag
- ...

Validity
- The XML document is well-formed and corresponds to a schema
- The schema defines vocabulary and grammar
- Alternatives: DTD or the XML Schema standard

XML processors parse XML documents and check
- either solely well-formedness (non-validating processors)
- or also validity (validating processors)

XML processors
- can be called from within an application (e.g., a browser)
- decompose an XML document into its parts, forming a tree that the application can access

[Figure: the XML processor (parser, entity manager) reads the XML document PDACatalog1.XML with its entities and Catalog.DTD, and passes document parts and errors to the application]
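The well-formedness rules listed above can be tried out with a non-validating processor. The following is a small sketch using Python's standard `xml.etree.ElementTree`, which checks well-formedness only and does not validate against a DTD or schema; the helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if a non-validating parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "one root, properly nested": "<a><b>text</b></a>",
    "overlapping tags": "<a><b>text</a></b>",
    "missing end tag": "<a><b>text</a>",
    "two root elements": "<a/><b/>",
}
results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'not well-formed'}")
```

Checking validity would additionally require a validating processor, which the Python standard library does not provide.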

Document Markup Languages 1/4
History

- 1945: Vannevar Bush, Memex
- 1962: Douglas Engelbart, Augment
- 1965: Ted Nelson, Xanadu
- 1967: William Tunnicliffe (GCA), GenCode
- 1969: Goldfarb, Mosher, Lorie (IBM), GML (Generalized Markup Language)
- 1978: ANSI, standardization (GenCode & GML)
- 1986: Charles Goldfarb, ISO, SGML (Standard Generalized Markup Language - ISO 8879)
- 1989: Tim Berners-Lee (CERN), HTML (Hypertext Markup Language)
- 1993: Marc Andreessen (NCSA), HTML forms (XMosaic)
- 1994: Netscape, Microsoft, HTML derivations
- 1996: Jon Bosak, Tim Bray, James Clark et al. (W3C), XML Working Group
- 10.2.1998: XML 1.0
- 29.9.2006: XML 1.1, 2nd Edition

Document Markup Languages 2/4
Memex

http://www.ps.uni-sb.de/~duchier/pub/vbush/vbush-all.shtml

Document Markup Languages 3/4
XML and OMG's Metadata Architecture

[Figure: metadata levels, after www.omg.org]
- M2, meta level: SGML, XML
- M1, language level (e.g., DTDs): HTML, XHTML, MathML, WML
- M0, instance level (documents): e.g., the formulas $e^{i\pi} + 1 = 0$ and $f(n) = \sum_{k=1}^{n} k$

Document Markup Languages 4/4
XML versus ...

... SGML
- Specification size: XML 60 pages vs. SGML 600 pages
- XML has 20% of SGML's complexity, but 80% of its functionality
- XML documents conform to an ISO revision of SGML: WebSGML (annex to the SGML standard ISO 8879)

... HTML
- XML is complementary to HTML (semantics and structure vs. layout)
- XML is not backward compatible with HTML
- Simple conversion from HTML documents to XML

... XHTML (= Extensible HTML)
- W3C Recommendation Aug. 2002 (2nd edition)
- HTML 4.01 as an "XML application", i.e., HTML was described by means of an XML DTD

Application Areas of XML 1/4
Three Main Application Areas

- Data exchange ("portable data")
  - Using XML solely as an exchange format, or
  - also using a common schema
- Multi-delivery
  - One and the same content can be delivered to different end user devices
- Intelligent retrieval
  - Instead of a simple keyword search on the basis of HTML documents, structure-based search on the basis of XML documents
  - Example: "Mozart": composer or chocolate ball?

Application Areas of XML 2/4
Industrial Sectors: "Verticalisation of XML"
[http://www.oasis-open.org/cover/xml.html#applications]

XML DTDs for ...
- Literature: "Gutenberg"
- Travel: "openTravel"
- News: "NewsML"
- Marketing: "adXML"
- Weather: "OMF"
- Human Resources: "XML-HR"
- Voice Applications: "VoxML"
- Vector Graphics: "SVG"
- Mobile Applications: "WML"
- Geo Applications: "ANZMETA"
- Health Care: "HL7"
- Mathematics: "MathML"
- Banking: "MBA"
- eGovernment: "eGovML"

Electronic commerce
- CBL: Common Business Library (Commerce One)
- BizTalk: Microsoft
- cXML: Commerce XML
- RosettaNet: format for online orders
- ebXML: OASIS + XML/EDI
- FnXML: Financial Products Markup Language
- ...

Application Areas of XML 3/4
Sources of XML Data

- Inter-application and mobile device communication data, e.g., Web Services
- Logs and blogs, e.g., RSS
- Metadata, e.g., Schema, WSDL, XMP
- Presentation data, e.g., XHTML
- Documents, e.g., Word
- Views of other sources of data, e.g., relational, LDAP, CSV, Excel, etc.
- Sensor data

Application Areas of XML 4/4
XML Standardization Family (excerpt)

- XML: XML language concepts incl. DTD
- XML Namespaces: support of a global identification scheme for element names and attribute names
- XPath (XML Path Language): path expressions for navigation in XML documents
- XML Schema: XML-based language for the definition of XML schemata
- XLink, XPointer: XML-based language for the linking of (parts of) XML documents
- XSL (Extensible Stylesheet Language)
  - XSLT: transformation of XML documents (declarative)
  - XSL-FO: rendering of XML documents (declarative)
- DOM (Document Object Model): API for accessing XML documents in a procedural manner

W3C standardization levels:
(1) Note
(2) Working Draft (WD)
(3) Candidate Recommendation (CR)
(4) Proposed Recommendation (PR)
(5) Recommendation (REC)

"It takes ten minutes to understand (base) XML, but then ten months to understand the new technologies hung around it." (Peter Chen)

Outline

- Introduction
- XML 1.0
  - XML Document
  - DTD
  - Entities
- Namespaces
- XML Schema

XML Document 1/3
Running Example: PDACatalog

PDACatalog1.XML:

    <?xml version="1.0" encoding="UTF-8"?>
    <PDACatalog>
      <!-- NOKIA -->
      <Producer name="NOKIA">
        <ProducerNo no="h1234"/>
        <PDA name="7110">
          <Weight>141g</Weight>
          <Price contract="yes">999</Price>
          <Price contract="no">4999</Price>
        </PDA>
        <PDA name="8210">
          ...
        </PDA>
      </Producer>
    </PDACatalog>

Terms annotated on the slide:
- Prologue (optional): "XML declaration", <?xml ... ?>
- "Root element" or "document element": <PDACatalog>
- Comment: <!-- NOKIA -->
- Start tag and end tag, e.g., <Weight> and </Weight>
- Element name, e.g., Weight
- Attribute and attribute value, e.g., name and "NOKIA"
- Text ("character data"), e.g., 141g
- Subelement, "element content" of <Producer>
- "Empty element": <ProducerNo no="h1234"/>
- "Mixed content"

XML Document 2/3
Elements and Attributes

Element and attribute names have to be valid "XML names":

    [ letter | _ | : ] [ letter | '0..9' | '.' | '-' | '_' | ':' ]*

- "letter": A-Z, a-z, and others like ä, ê, ς
- ':' is reserved for namespaces
- No length restriction
- Case-sensitive

Empty elements can be represented in long form or short form:

    <ProducerNo no="h1234"></ProducerNo>  or  <ProducerNo no="h1234"/>

Attribute values must be enclosed in quotation marks (single or double):

    <PDA name='8210'>  or  <PDA name="8210">
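The name rule above can be written as a regular expression. The sketch below is an ASCII-only simplification, which is an assumption made to keep it short: the actual XML rule also admits many non-ASCII letters such as ä or ς, which this pattern rejects. The function name `is_simple_xml_name` is hypothetical.

```python
import re

# ASCII-only simplification of the rule on the slide:
# first character: letter, '_' or ':'
# following characters: letter, digit, '.', '-', '_' or ':'
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")

def is_simple_xml_name(candidate):
    """Return True if the candidate matches the simplified name rule."""
    return NAME.fullmatch(candidate) is not None

for candidate in ["PDACatalog", "_x.1-y", "edi:price", "8210", "-a", ""]:
    print(repr(candidate), is_simple_xml_name(candidate))
```

Names are case-sensitive, so `PDA` and `pda` both match the rule but denote different elements.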

XML Document 3/3
Comments

Comments
- can stretch across multiple lines
- are allowed between the start tag and end tag of an element
- are allowed before or after the root element

Restrictions
- A comment within a tag is not allowed
- Nesting of comments is not allowed
- "--" within a comment is not allowed

    <!--A comment may comprise
        also <tagNames> or
        &entities;-->
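The placement rules for comments can be checked with a parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree` (built on the Expat parser); the helper name `parses` and the sample documents are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the parser accepts the text as well-formed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Allowed: inside element content, even with tag-like or entity-like text.
inside_content = parses("<a><!-- may contain <tagNames> or &entities; --><b/></a>")
# Allowed: after the root element.
after_root = parses("<a/><!-- trailing comment -->")
# Not allowed: inside a tag.
inside_tag = parses("<a <!-- not allowed --> />")
# Not allowed: "--" within a comment.
double_hyphen = parses("<a><!-- not -- allowed --></a>")

print(inside_content, after_root, inside_tag, double_hyphen)
```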

DTD 1/8
Purpose and Characteristics

- A DTD defines vocabulary and grammar for a set of XML documents
- An XML document is allowed to reference a single DTD only ("document type declaration", DOCTYPE)
- A DTD has to be referenced AFTER the prologue but BEFORE the root element
- A DTD does NOT DEFINE the root element of an XML document
  - The root element is rather defined within the XML document itself, using the DOCTYPE declaration
  - It can be an arbitrary element of the DTD

PDACatalog1.XML:

    <?xml version="1.0"?>
    <!DOCTYPE PDACatalog ...
    <PDACatalog>
    ...

[Figure: root element, defined in Catalog.DTD and used in PDACatalog1.XML]

DTD 2/8
Incorporating DTDs into XML Documents: 3 Alternatives

1. External DTD, i.e., a dedicated file (*.dtd) identified by means of a URI ("external subset")
       <!DOCTYPE PDACatalog SYSTEM "Catalog.dtd">
2. Internal DTD, i.e., defined within the XML document ("internal subset")
       <!DOCTYPE PDACatalog […]>
3. External & internal DTD, i.e., internal complements external

Excursus: URL vs. URI
- A URL (Uniform Resource Locator) identifies Internet resources on the basis of their location, using the Domain Name Service (DNS)
- A URI (Uniform Resource Identifier) identifies arbitrary resources on the basis of their names (e.g., ISBN) or other properties of the resource
- Each URL is a valid URI

DTD 3/8
Example: Catalog.dtd

    <!-- Catalog DTD Version 1.0 -->
    <!ELEMENT PDACatalog (Producer*)>
    <!ELEMENT Producer (ProducerNo, PDA+)>
    <!ATTLIST Producer name CDATA #REQUIRED>
    <!ELEMENT ProducerNo EMPTY>
    <!ATTLIST ProducerNo no ID #REQUIRED>
    <!ELEMENT PDA (Weight, Price+)>
    <!ATTLIST PDA name CDATA #REQUIRED>
    <!ELEMENT Weight (#PCDATA)>
    <!ELEMENT Price (#PCDATA)>
    <!ATTLIST Price contract (yes|no) "no">

[Figure: UML class diagram corresponding to the XML DTD; classes stand for XML elements, class attributes for XML attributes]
- PDACatalog consists of * Producer (attribute: name)
- Producer consists of 1 ProducerNo (attribute: no) and 1..* PDA (attribute: name)
- PDA consists of 1 Weight and 1..* Price (attribute: contract)

Legend: 1 = exactly once; 1..* = once or several times; * = zero or several times; connecting lines denote part-of

DTD 4/8
Element Declaration

    <!ELEMENT elementName (content model)>

- Sequence
      <!ELEMENT Producer (ProducerNo, PDA+)>
- Alternative
      <!ELEMENT Battery (LiIo | NiMh | NiCd)>
- Cardinality
  - Optional (zero or once)
        <!ELEMENT PDA (Comment?)>
  - Zero or several times
        <!ELEMENT PDACatalog (Producer*)>
  - Once or several times
        <!ELEMENT Producer (PDA+)>
- Content models can be nested by means of parentheses
      <!ELEMENT div1 (head, (p | list | note)*, div2*)>
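Content models read much like regular expressions over the sequence of child element names. The sketch below translates three content models from the running Catalog.dtd example into Python regular expressions by hand and checks a parsed document against them. This is only an illustration of the idea under that assumption, not how a validating processor is implemented; the names `CONTENT_MODELS` and `violations` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models; each child name is followed by one blank.
CONTENT_MODELS = {
    "PDACatalog": r"(Producer )*",      # (Producer*)
    "Producer": r"ProducerNo (PDA )+",  # (ProducerNo, PDA+)
    "PDA": r"Weight (Price )+",         # (Weight, Price+)
}

def violations(root):
    """Return the tags of elements whose child sequence breaks its model."""
    bad = []
    for element in root.iter():
        model = CONTENT_MODELS.get(element.tag)
        if model is None:
            continue  # no model recorded for this element in the sketch
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            bad.append(element.tag)
    return bad

valid = ET.fromstring(
    '<PDACatalog><Producer name="NOKIA"><ProducerNo no="h1234"/>'
    '<PDA name="7110"><Weight>141g</Weight><Price contract="yes">999</Price></PDA>'
    '</Producer></PDACatalog>'
)
invalid = ET.fromstring(
    '<PDACatalog><Producer name="NOKIA">'
    '<PDA name="7110"><Weight>141g</Weight><Price>999</Price></PDA>'
    '</Producer></PDACatalog>'
)
print(violations(valid))    # no violations expected
print(violations(invalid))  # Producer lacks its ProducerNo child
```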

DTD 5/8
Element Declaration

Empty element
- Element may contain attributes, but neither text nor subelements
      <!ELEMENT ProducerNo EMPTY>

Element content
- Element contains subelements and optional attributes, but no text
      <!ELEMENT PDACatalog (Producer*)>

Mixed content
- Element contains text and optional subelements or attributes
      <!ELEMENT Price (#PCDATA)>
      <!ELEMENT Price (#PCDATA | Category | Discount)*>

Element with arbitrary content
- Content is not exactly specified in the DTD
- Used elements have to be declared anyway
      <!ELEMENT Comment ANY>

DTD 6/8
Attribute Declaration

    <!ATTLIST elementName
        attributeName1 type default
        attributeName2 type default
        ...>

Attribute names must be unique within an element

Default specifications
- NOT NULL: #REQUIRED
- Optional value: #IMPLIED
- Default value: [#FIXED] "value"

DTD 7/8
Attribute Declaration: 10 Types

CDATA
- String
      <!ATTLIST Producer name CDATA #REQUIRED>

ID, IDREF(S)
- ID ensures uniqueness of attribute values within a document
- Only one attribute of type ID is allowed per element
- IDREF is a reference to an attribute of type ID
  - "Referential integrity" (untyped!) is checked by the XML processor
- Values of ID and IDREF(S) attributes must be valid XML names, i.e., they must not start with a digit

      <!ATTLIST Example
          identity  ID    #IMPLIED
          reference IDREF #IMPLIED>
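What a validating processor checks for ID and IDREF attributes can be sketched by hand. The example below assumes the attribute names `identity` and `reference` from the declaration above and hard-codes them, since Python's standard `xml.etree.ElementTree` does not read attribute types from a DTD; the function name `check_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

def check_ids(root, id_attr="identity", ref_attr="reference"):
    """Return (duplicate IDs, references that point to no ID)."""
    ids = [e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None]
    duplicates = sorted({value for value in ids if ids.count(value) > 1})
    refs = [e.get(ref_attr) for e in root.iter() if e.get(ref_attr) is not None]
    dangling = sorted(ref for ref in refs if ref not in set(ids))
    return duplicates, dangling

consistent = ET.fromstring(
    '<doc><Example identity="a1"/><Example identity="a2" reference="a1"/></doc>'
)
broken = ET.fromstring(
    '<doc><Example identity="a1"/><Example identity="a1" reference="zz"/></doc>'
)
print(check_ids(consistent))
print(check_ids(broken))
```

The check is untyped in the sense of the slide: a reference is accepted as soon as some element carries the referenced ID, regardless of which element it is.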

DTD 8/8
Attribute Declaration: 10 Types

Enumeration type
- A pre-defined set of values consisting of XML name tokens
      <!ATTLIST Price contract (yes|no) "no">

ENTITY, ENTITIES
- Attribute value is the name of a declared non-parsed entity
      <!ATTLIST Image filename ENTITY #REQUIRED>

NMTOKEN(S)
- "XML name tokens" are an extended form of XML names
- In addition, they can start with "0..9", "." and "-"
      <!ATTLIST journal year NMTOKEN #REQUIRED>

NOTATION
- Attribute value is the name of a declared notation; rarely used
      <!ATTLIST image type NOTATION (gif | tiff) #REQUIRED>

Entities 1/9
Overview

Entities are referenceable, named parts of XML documents (plain text, markup or other arbitrary formats) or of a DTD
- Purpose: character replacement, macros, modularisation
- Processing: references are expanded during parsing

Classification
- General entities: usage in XML documents
  - Pre-defined: replacement of XML-specific characters
  - Unicode: replacement of non-ASCII characters
  - User-defined: replacement of document parts
    - Internal (embedded)
    - External (file)
      - Parsed
      - Non-parsed
- Parameter entities: usage in DTDs
  - Internal
  - External

Entities 2/9
Pre-defined Entities

Purpose: representation of XML-specific characters, e.g., < and > ("escaping")

5 pre-defined entities
- &amp;   &  (ampersand)
- &lt;    <  (less than)
- &gt;    >  (greater than)
- &apos;  '  (apostrophe)
- &quot;  "  (quotation mark)

Example
    <formular>x &lt; y</formular>

Usage: as element value or attribute value

Alternative: CDATA section
- Example:
      <formular>x <![CDATA[<]]> y</formular>
- The content is interpreted as plain text, NOT as markup
- Within a CDATA section, only its end (']]>') is recognized
- CDATA sections cannot be nested
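Both escaping styles yield the same character data once the document is parsed. A minimal sketch with Python's standard library follows; the variable names are illustrative, and `xml.sax.saxutils.escape` is used to show escaping in the other direction, when text is written into a document.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: the entity reference and the CDATA section give the same text.
with_entity = ET.fromstring("<formular>x &lt; y</formular>").text
with_cdata = ET.fromstring("<formular>x <![CDATA[<]]> y</formular>").text

# Generating: escape() replaces &, < and > by their pre-defined entities.
escaped = escape("x < y & z")

print(with_entity)
print(with_cdata)
print(escaped)
```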

Entities 3/9
Unicode ("Character Encoding") Entities

Purpose
- Representation of characters not available on the keyboard
- http://www.unicode.org/

Unicode
- classifies characters into letters, numbers, punctuation, symbols (general, technical, mathematical), etc.
- Unique assignment of characters to numbers
- Supports 25 living languages (Cyrillic, Hebrew, Hiragana, ...)
- All in all approx. 50,000 different characters

Usage
- As element value or attribute value
- Arbitrary Unicode characters are referenced via their numbers (decimal or hexadecimal)
- Example: &#251;, &#xFB; and û all represent the same character
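That a decimal and a hexadecimal character reference denote the same character can be confirmed with a parser: decimal 251 equals hexadecimal FB. A short sketch with Python's standard `xml.etree.ElementTree`; the element name `c` is arbitrary.

```python
import xml.etree.ElementTree as ET

decimal = ET.fromstring("<c>&#251;</c>").text      # decimal reference
hexadecimal = ET.fromstring("<c>&#xFB;</c>").text  # hexadecimal reference

code_point = ord(decimal)
print(decimal, hexadecimal, code_point, hex(code_point))
```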

Entities 4/9
User-Defined Internal Entities

Text or well-formed markup is associated with a name

Declaration within the DTD:
    <!ENTITY entityName "replacementText or Markup">

Usage
    &entityName;
- As element value or attribute value of the XML document
- In entities themselves, but cyclic references are forbidden
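Internal entities declared in the internal subset are expanded by the parser, including references from one entity to another. The sketch below uses Python's standard `xml.etree.ElementTree`; the entity names `town` and `place` and the element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Internal subset with two internal entities; "place" refers to "town".
DOCUMENT = """<!DOCTYPE PDA [
  <!ENTITY town "Linz">
  <!ENTITY place "JKU &town;">
]>
<PDA><Origin>&town;</Origin><Note>&place;</Note></PDA>"""

root = ET.fromstring(DOCUMENT)
origin = root.find("Origin").text
note = root.find("Note").text
print(origin)
print(note)
```

Because expansion happens during parsing, the application only sees the replacement text and no longer the entity references.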

Entities 5/9
User-Defined External Parsed Entities

Purpose
- Decomposition of the XML document (similar to the SSI, Server Side Include, mechanism), because of the document's size or for reuse

Declaration within the DTD
    <!ENTITY entityName SYSTEM "URI">

Characteristics
- In principle well-formed, but may contain multiple root elements
- Reference to a DTD not allowed

Usage
- Syntax analogous to internal entities
- As element values of the XML document and within entities themselves
- Cyclic references are forbidden
- NOT within attribute values

Entities 6/9
User-Defined External Non-Parsed Entities

Purpose
- References to files with arbitrary formats, e.g., ASCII, not well-formed XML, GIF, JPEG, QuickTime movies

Declaration
    <!ENTITY entityName SYSTEM "URI" NDATA formatName>
    <!NOTATION formatName SYSTEM "URI">
- NDATA defines a "non-parsed" entity and specifies an arbitrary file format
- A NOTATION declaration is necessary to identify a corresponding application (via a URI) which is able to process files of this format

Usage
- Only as attribute value of type ENTITY
- Syntax: entity name within quotation marks (note: NO &...;)
- The processor only informs the application that there exists a non-parsed entity at a certain location; no expansion!

(More expressive) alternative: W3C's XLink standard

Entities 7/9
User-Defined Entities: Example

    <?xml version="1.0"?>
    <!DOCTYPE PDACatalog SYSTEM "Catalog.dtd" [
      <!-- Declaration -->
      <!ENTITY linkNokia "http://www.nokia.de/8210">               <!-- internal -->
      <!ENTITY address "<town>Linz</town>">                        <!-- internal -->
      <!ENTITY features SYSTEM "feat8210.XML">                     <!-- external, parsed -->
      <!ENTITY bildNokia SYSTEM "/pictures/8210.jpg" NDATA jpeg>   <!-- external, non-parsed -->
      <!NOTATION jpeg SYSTEM "image/jpeg">
      …
      <!ATTLIST Image filename ENTITY #REQUIRED>
    ]>
    …
    <!-- Usage -->
    <PDA name="8210">
      <Picture><Image filename="bildNokia"/></Picture>             <!-- usage as attribute value -->
      <ProducerInfo>&linkNokia;</ProducerInfo>                     <!-- usage as element value -->
      …
      &features; &address;
    </PDA>
    …

Entities 8/9
Parameter Entities

Purpose: modularization of DTDs

Syntactical difference to general entities
- Declaration: % followed by a blank
- Usage: % without a blank

Definition of ...
- Name and content model of elements
- Attribute declarations

Internal
    <!ENTITY % Battery "(type, capacity)">

    <!ELEMENT PDABatt %Battery;>
    <!ELEMENT camcorderBatt %Battery;>

External
    <!ENTITY % linkNokia SYSTEM "http://nokia.de">

    %linkNokia;

Entities 9/9
Parameter Entities: Overriding

External DTD
    <!ENTITY % residental_content "address,rooms">

Internal DTD of an XML document
    <!ENTITY % residental_content "address,rooms,baths">

- A parameter entity defined within an external DTD can be arbitrarily overridden within the internal DTD of an XML document
- This allows adapting the external DTD to the requirements of single XML documents without having to change the external DTD
- Thus, the parameter entity is used as a kind of "customization hook"

Outline

- Introduction
- XML 1.0
- Namespaces
- XML Schema

Namespaces 1/5

An XML namespace (NS) allows a unique global identification of elements and attributes
- W3C REC "Namespaces in XML", 14th Jan. 1999 (13 pages)

For this, elements and attributes of a domain (e.g., MathML) are assigned to one or more NS
- XSL uses, e.g., different namespaces for XSLT and XSL-FO

A NS is represented by a URI
- The URI need not directly refer to the corresponding vocabulary
- Thus, it provides a level of indirection which decouples the location of the vocabulary from the unique identifier, the URI

The associated elements and attributes have to be qualified by means of this URI when they are used, thus being made globally unique

This allows the reuse and especially the combination ("mixture") of different vocabularies

Namespaces 2/5
NS with Prefix vs. Default NS

BUT: URIs cannot be used for direct qualification
- This is because URIs normally contain characters which are not allowed as part of valid XML names (e.g., "/", "&")
- Instead, user-defined prefixes have to be used

One or more NS are declared on the basis of the pre-defined attribute xmlns
- This attribute can be specified in the context of any element
- The name of the element where the NS has been declared, as well as direct and indirect subelements and attributes, can be qualified with the NS ("NS inheritance")

Default NS
- Also declared via the pre-defined attribute xmlns, BUT only 1 per element, and without declaring any prefix
- Non-qualified subelements are automatically associated with the default NS, attributes are NOT
- Can be overridden within subelements

Namespaces 3/5
Declaration and Usage

    ...
    <edi:HC
        xmlns:edi='http://ecommerce.org/schema'
        xmlns='http://www.mobildev.com/schema'>
      <model name="8210">
        <edi:price edi:units='Euro'>32.18</edi:price>
        <price währung='USD'>25.16</price>
        ...
      </model>
      ...
    </edi:HC>

Annotations
- xmlns: pre-defined attribute for NS declaration
- edi: NS prefix (optional)
- 'http://ecommerce.org/schema': URI of the NS
- xmlns='http://www.mobildev.com/schema': default NS (no prefix)

Resulting namespaces
- The NS of the element edi:price is http://ecommerce.org/schema
- The NS of the elements model and price is the default NS http://www.mobildev.com/schema
- The attributes name and währung have NO NS associated with them
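The namespace assignments stated above can be observed with a namespace-aware parser. The sketch uses Python's standard `xml.etree.ElementTree`, which reports qualified names in the form `{URI}localname`; the elided parts of the slide's example are omitted and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

EDI = "http://ecommerce.org/schema"
MOBILE = "http://www.mobildev.com/schema"

DOCUMENT = """
<edi:HC xmlns:edi='http://ecommerce.org/schema'
        xmlns='http://www.mobildev.com/schema'>
  <model name="8210">
    <edi:price edi:units='Euro'>32.18</edi:price>
    <price währung='USD'>25.16</price>
  </model>
</edi:HC>
"""

root = ET.fromstring(DOCUMENT)
model = root[0]
edi_price, default_price = model[0], model[1]

print(root.tag)              # prefixed element: NS of the edi prefix
print(model.tag)             # unprefixed element: default NS
print(edi_price.attrib)      # prefixed attribute: qualified name
print(default_price.attrib)  # unprefixed attribute: no NS
```

Note that the prefix itself disappears after parsing: only the URI identifies the namespace, so a document using a different prefix for the same URI yields the same names.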

Namespaces 4/5
... and DTDs

NS are in principle independent of DTDs
- They can be used in documents with or without DTDs

BUT:
- All elements and attributes which are qualified in the XML document must also be declared appropriately within the DTD
- Huge overhead, since DTDs are not aware of NS

      In the document:   In the DTD:
      <edi:HC>           <!ELEMENT edi:HC (....)>
      <edi:price>        <!ELEMENT edi:price (#PCDATA)>

What can be done is to specify a default NS within the DTD

    <!ATTLIST edi:HC xmlns CDATA #FIXED 'http://www.mobildev.com/schema'>

Namespaces 5/5
Exemplary NS-URIs

- RDF: http://www.w3.org/1999/02/22-rdf-syntax-ns# and http://www.w3.org/2000/01/rdf-schema#
- MathML: http://www.w3.org/1998/Math/MathML
- XHTML: http://www.w3.org/1999/xhtml
- SMIL: http://www.w3.org/TR/REC-smil
- XSL: http://www.w3.org/1999/XSL/Transform and http://www.w3.org/1999/XSL/Format

Outline

- Introduction
- XML 1.0
- Namespaces
- XML Schema
  - Introduction
  - Elements and Attributes
  - Pre-defined Datatypes
  - User-defined Datatypes
  - Keys
  - Schema Composition
  - Schema Modeling Styles
  - Comparison DTD - XML Schema

Introduction
DTD versus XML Schema 1/2

XML Schema
- Definition of the structure of XML documents
- W3C REC May 2001, approx. 420 pages
- W3C REC 2nd edition October 2004

Drawbacks of DTDs
- Proprietary syntax
- Few datatypes, in fact only one: string
- Global definition of elements
- Parameter entities for modularization & overriding
- ID, IDREF(S): severe restrictions

Advantages of XML Schema
- XML as syntax
- Numerous pre-defined datatypes
- User-defined simple and complex datatypes
- Inheritance
- Keys, references: flexible concept

Introduction
DTD versus XML Schema 2/2

Catalog.xsd:

    <?xml version="1.0"?>
    <schema ...>
      <simpleType name="producerNoType"> ...
      <element name="PDACatalog">
        <complexType>
          <sequence>
            <element name="Producer" minOccurs="0" maxOccurs="unbounded">
              <complexType>
                <sequence>
                  <element name="ProducerNo" type="hc:producerNoType" minOccurs="1" maxOccurs="1"/>
                  <element name="PDA" minOccurs="1" maxOccurs="unbounded">
                    <complexType>
                      <sequence>
                        <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
                        <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
                      </sequence> ...
    </schema>

Catalog.dtd:

    ...
    <!ELEMENT PDACatalog (Producer*)>
    <!ELEMENT Producer (ProducerNo, PDA+)>
    <!ELEMENT PDA (Weight, Battery)>
    <!ELEMENT Weight (#PCDATA)>
    <!ELEMENT Battery (#PCDATA)>
    ...
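The correspondence between the two notations for cardinality can be sketched in code. The example maps minOccurs/maxOccurs pairs (both default to 1 in XML Schema) to the DTD operator with the same meaning and applies the mapping to a shortened schema. The schema text is a simplified stand-in written for this sketch, not the lecture's Catalog.xsd, and the names `dtd_cardinality` and `local_cardinalities` are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Simplified stand-in for the schema on the slide (assumption of this sketch).
SCHEMA = """
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="PDACatalog">
    <complexType><sequence>
      <element name="Producer" minOccurs="0" maxOccurs="unbounded">
        <complexType><sequence>
          <element name="ProducerNo" type="string"/>
          <element name="PDA" type="string" maxOccurs="unbounded"/>
        </sequence></complexType>
      </element>
    </sequence></complexType>
  </element>
</schema>
"""

def dtd_cardinality(min_occurs="1", max_occurs="1"):
    """Map a minOccurs/maxOccurs pair to the equivalent DTD operator."""
    table = {
        ("1", "1"): "",           # exactly once
        ("0", "1"): "?",          # optional
        ("0", "unbounded"): "*",  # zero or several times
        ("1", "unbounded"): "+",  # once or several times
    }
    return table.get((min_occurs, max_occurs))  # None: no single DTD operator

def local_cardinalities(schema_text):
    """Return {element name: DTD operator} for elements declared in sequences."""
    root = ET.fromstring(schema_text)
    result = {}
    for sequence in root.iter(XS + "sequence"):
        for element in sequence.findall(XS + "element"):
            result[element.get("name")] = dtd_cardinality(
                element.get("minOccurs", "1"), element.get("maxOccurs", "1")
            )
    return result

operators = local_cardinalities(SCHEMA)
print(operators)
```

Bounds such as minOccurs="2" maxOccurs="5" have no single DTD operator, which is one way XML Schema is more expressive than a DTD content model.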

Introduction
Declaration of Namespaces in the Schema

Namespace for own vocabulary
- The namespace (NS) of the vocabulary to be defined can be declared by means of the attribute targetNamespace (optional!)

NS of the XML Schema standard vocabulary
- Declaration is obligatory!
- Additional NS (i.e., vocabularies) can be incorporated

A single NS can be defined as default NS
- Either the own NS, the XML Schema NS or another NS
- For all other NS used, a prefix is obligatory

    <?xml version="1.0"?>
    <schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
            xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
            xmlns="http://www.w3.org/2001/XMLSchema"
            attributeFormDefault="qualified"
            elementFormDefault="qualified">
    ...

M2-47© 2010 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS)

XML SchemaNamespacesXML 1.0Introduction XML Schemadefinition

Introduction: Usage of NS in the XML Document

The schema of an XML document is specified within the root element via the attribute schemaLocation
  Part 1: targetNamespace of the schema
  Part 2: location of the schema document

Catalog.xsd

<?xml version="1.0"?>
<schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
        xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
        xmlns="http://www.w3.org/2001/XMLSchema"
        attributeFormDefault="qualified"
        elementFormDefault="qualified">
...

Catalog1.xml

<?xml version="1.0"?>
<PDACatalog xsi:schemaLocation="http://www.ifs.uni-linz.ac.at/hc Catalog.xsd"
            xmlns="http://www.ifs.uni-linz.ac.at/hc"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...

For a schema without a target namespace: xsi:noNamespaceSchemaLocation="directPathToXSD_File"
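A minimal sketch of the second variant, assuming a hypothetical schema file Catalog-no-ns.xsd that declares PDACatalog without a targetNamespace. The attribute value is then a single location instead of a namespace/location pair.

```xml
<?xml version="1.0"?>
<!-- Assumption: Catalog-no-ns.xsd is a hypothetical schema without targetNamespace -->
<PDACatalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="Catalog-no-ns.xsd">
  <!-- content elided; elements here are in no namespace -->
</PDACatalog>
```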


Elements and Attributes 1/3: Global / Local Definition

Global Definition
  Direct subelement of schema
  NOTE: the root element of the XML document must be defined as a global element!

Local Definition
  Definition on an arbitrary nesting level

Analogously for datatypes!

Element

<element name="name" type="type" minOccurs="int" maxOccurs="int|unbounded" ... />
  type: simple or complex type
  minOccurs / maxOccurs: cardinality, lower/upper bounds (only in "local" elements)

Attribute

<attribute name="name" type="type" use="how-its-used" default/fixed="value" ... />
  type: simple type
  use: values required, optional, prohibited (only in "local" attributes)
  default / fixed: only relevant if "use" is not defined
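As an illustration of the two templates, here is a small sketch with one global element, one local element and three local attributes. The attribute names color and unit and their values are invented for this example.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- Global element: direct subelement of schema, may serve as root element -->
  <element name="PDA">
    <complexType>
      <sequence>
        <!-- Local element: cardinality may be given here -->
        <element name="Feature" type="string" minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
      <!-- Local attributes (hypothetical names) -->
      <attribute name="no" type="nonNegativeInteger" use="required"/>
      <attribute name="color" type="string" default="black"/>
      <attribute name="unit" type="string" fixed="g"/>
    </complexType>
  </element>
</schema>
```

An instance such as <PDA no="1"/> would then be valid; color is supplied as "black" by default, and unit, if present, must be "g".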


Elements and Attributes 2/3: Global / Local Datatypes and References

Global or local datatypes

<element name="name" minOccurs="int" maxOccurs="int|unbounded" ...>
  <complexType>…</complexType>
</element>

<attribute name="name" use="how-its-used" default/fixed="value" ...>
  <simpleType>...</simpleType>
</attribute>

Reference to an existing element or attribute

<element ref="name" minOccurs="int" maxOccurs="int|unbounded" .../>

<attribute ref="name" use="how-its-used" default/fixed="value" .../>


Elements and Attributes 3/3: Summarizing Example – Global/Local

Orthogonality of concepts:

<schema ...>
  <!-- Global element, local datatype -->
  <element name="Producer">
    <complexType>
      <sequence>
        <!-- Local element, global datatype -->
        <element name="ProducerNo" type="hc:producerNoType" minOccurs="1" maxOccurs="1"/>
        <!-- Reference to a global element -->
        <element ref="hc:PDA" maxOccurs="unbounded"/>
      </sequence>
      <!-- Local attribute, pre-defined datatype -->
      <attribute name="name" type="string" use="required"/>
    </complexType>
  </element>
  <!-- Global element, local datatype -->
  <element name="PDA">
    <complexType>
      <sequence>
        <!-- Local elements, pre-defined datatype -->
        <element name="Weight" type="string"/>
        <element name="Battery" type="string"/>
      </sequence>
    </complexType>
  </element>
  <simpleType name="producerNoType"> …


Pre-Defined Datatypes 1/4

Type hierarchy (W3C REC, 28th Oct. 2004)

anyType
  (all complex types)
  anySimpleType
    Primitive (atomic) types:
      string, boolean, float, double, decimal, duration, dateTime, time, date,
      gYearMonth, gYear, gMonthDay, gDay, gMonth, hexBinary, base64Binary,
      anyURI, QName, NOTATION
    Derived from string:
      normalizedString
        token
          language
          NMTOKEN (list type: NMTOKENS)
          Name
            NCName
              ID
              IDREF (list type: IDREFS)
              ENTITY (list type: ENTITIES)
    Derived from decimal:
      integer
        nonPositiveInteger
          negativeInteger
        long
          int
            short
              byte
        nonNegativeInteger
          positiveInteger
          unsignedLong
            unsignedInt
              unsignedShort
                unsignedByte


Pre-Defined Datatypes 2/4: String Datatypes

Pre-defined primitive types (derived from anySimpleType)
  string: string datatype without whitespace replacement
  hexBinary, base64Binary: binary datatypes encoded as strings
  anyURI
  QName: qualified name; supports the usage of names with NS prefix
  NOTATION

Pre-defined derived types
  normalizedString: normalized string with whitespace replacement; each tab, linefeed and CR is replaced by a blank
  token: "tokenized" string; all whitespace characters are replaced by blanks, all leading and trailing blanks are deleted, and multiple consecutive blanks are replaced by a single one
  language: standardized language codes (e.g., en, en-US, de, de-DE)
  NMTOKEN: name token; string without blanks (e.g., "CMS", "234234")
  Name: XML name; must start with a letter, ":" or "_" (e.g., "CMS", "_1")
  NCName: name without prefix
  NMTOKENS, ID, IDREF, IDREFS, ENTITY, ENTITIES: backward compatibility with DTDs; for that reason, usable only as types for attributes
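The differences in whitespace handling are easiest to see side by side. The sketch below uses only pre-defined types; the element and attribute names (Note, Raw, Line, Title, lang, id) are invented for illustration.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="Note">
    <complexType>
      <sequence>
        <!-- string: whitespace is kept exactly as written -->
        <element name="Raw" type="string"/>
        <!-- normalizedString: tabs, linefeeds and CRs become blanks -->
        <element name="Line" type="normalizedString"/>
        <!-- token: additionally trimmed, consecutive blanks collapsed -->
        <element name="Title" type="token"/>
      </sequence>
      <!-- language code such as en-US or de-DE -->
      <attribute name="lang" type="language"/>
      <!-- DTD-compatible type, used on an attribute -->
      <attribute name="id" type="ID"/>
    </complexType>
  </element>
</schema>
```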


Pre-Defined Datatypes 3/4: Numerical Datatypes

Pre-defined primitive types (derived from anySimpleType)
  float, double: floating point numbers with single (32 bits) and double (64 bits) precision
  decimal: decimal numbers; decimal separator ".", leading "+" or "-" possible
  boolean

Pre-defined derived types
  integer (derived from decimal)
    nonPositiveInteger
      negativeInteger
    nonNegativeInteger
      positiveInteger
      unsignedLong, unsignedInt, unsignedShort, unsignedByte: 64, 32, 16 or 8 bits
    long, int, short, byte: 64, 32, 16 or 8 bits


Pre-Defined Datatypes 4/4: Date and Time Datatypes

All of these are primitive types derived from anySimpleType.

Datatype      Lexical format           Meaning
duration      "PnYnMnDTnHnMnS"
dateTime      "CCYY-MM-DDThh:mm:ss"
time          "hh:mm:ss"
date          "CCYY-MM-DD"
gYearMonth    "CCYY-MM"
gYear         "CCYY"
gMonthDay     "--MM-DD"                Day of the year
gDay          "---DD"                  Day of the month
gMonth        "--MM"                   Month of the year
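To make the lexical formats concrete, the following fragment gives one sample value per datatype. The wrapper element Samples and the convention of naming each element after its datatype are assumptions made only for this illustration.

```xml
<!-- Hypothetical elements; each value follows the lexical format of the
     datatype the element is named after -->
<Samples>
  <dateTime>2011-03-15T09:30:00</dateTime>
  <date>2011-03-15</date>
  <time>09:30:00</time>
  <gYearMonth>2011-03</gYearMonth>
  <gYear>2011</gYear>
  <gMonthDay>--03-15</gMonthDay>
  <gDay>---15</gDay>
  <gMonth>--03</gMonth>
  <!-- 1 year, 2 months, 3 days, 4 hours, 5 minutes, 6 seconds -->
  <duration>P1Y2M3DT4H5M6S</duration>
</Samples>
```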


User-defined Datatypes: Alternatives

Should the type contain elements or attributes?
  no: unstructured content, <simpleType>
    Derivation: <restriction>, <union> or <list>
  yes: structured content, <complexType>
    Derivation: <restriction>, <extension>
    Nesting: <sequence>, <all>, <choice>
    Empty / Mixed
    Should the type contain elements?
      yes: attributes & elements, <complexContent>
      no: attributes only, <simpleContent>

Both kinds of types can be named or anonymous.

Note: <complexContent> is only necessary in case of derivation from a user-defined type
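The <simpleContent> branch of the decision tree can be sketched as follows: a type with attributes but no nested elements, whose text content is a pre-defined simple type. The names WeightType and unit are hypothetical; the namespace is the one used throughout the slides.

```xml
<schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
        xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
        xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified">
  <!-- Attributes only, no nested elements: simpleContent -->
  <complexType name="WeightType">
    <simpleContent>
      <!-- Text content is a decimal, extended by one attribute -->
      <extension base="decimal">
        <attribute name="unit" type="string" use="required"/>
      </extension>
    </simpleContent>
  </complexType>
  <element name="Weight" type="hc:WeightType"/>
</schema>
```

An instance element under this sketch would look like <Weight unit="g">141</Weight> (in the hc namespace).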


User-defined Datatypes: Alternatives – Examples

Simple, pre-defined, no derivation:

xsd:integer

Simple, user-defined, named, derivation:

<xsd:simpleType name="longitudeType">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="-180"/>
    <xsd:maxInclusive value="180"/>
  </xsd:restriction>
</xsd:simpleType>

Complex, user-defined, anonymous, no derivation:

<xsd:complexType>
  <xsd:sequence>
    ....
  </xsd:sequence>
</xsd:complexType>

Complex, user-defined, named, derivation:

<xsd:complexType name="BookTypeWithID">
  <xsd:complexContent>
    <xsd:extension base="BookType">
      <xsd:attribute name="ID" type="xsd:token"/>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>


User-defined Datatypes: Derived Simple Datatypes – <simpleType>

Restriction of a pre-defined datatype: <restriction>

Union of pre-defined datatypes (extension): <union>
  Values must correspond to at least one of the combined datatypes

List of values of one pre-defined datatype (or of a union datatype; a list datatype itself cannot be the item type): <list>


User-defined Datatypes: Derived Simple Datatypes <simpleType> – restriction

Alternative definition possibilities
  Referencing an existing datatype via the attribute base
  Local definition from scratch by using simpleType as subelement of the restriction element

12 possible restrictions, depending on the base datatype
  length, minLength, maxLength
  pattern
  enumeration
  minInclusive, maxInclusive, minExclusive, maxExclusive
  whiteSpace
  totalDigits, fractionDigits

<simpleType name="batteryType">
  <restriction base="string">
    <enumeration value="NiMh"/>
    <enumeration value="NiCd"/>
    <enumeration value="LiIo"/>
  </restriction>
</simpleType>
<element name="Battery" type="hc:batteryType"/>

XML document:

<Battery>NiCd</Battery>
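Two further sketches of a restriction, one per definition alternative. The first references its base type via the attribute base and uses numeric restrictions; the second defines its base locally as a nested simpleType. The type names weightInGramsType and shortCodeType and all limits are invented for illustration.

```xml
<schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
        xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
        xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- Alternative 1: base datatype referenced via the attribute base -->
  <simpleType name="weightInGramsType">
    <restriction base="decimal">
      <minInclusive value="0"/>
      <totalDigits value="6"/>
      <fractionDigits value="1"/>
    </restriction>
  </simpleType>
  <!-- Alternative 2: base datatype defined locally as nested simpleType -->
  <simpleType name="shortCodeType">
    <restriction>
      <simpleType>
        <restriction base="token">
          <maxLength value="8"/>
        </restriction>
      </simpleType>
      <minLength value="2"/>
    </restriction>
  </simpleType>
</schema>
```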


User-defined Datatypes: Derived Simple Datatypes <simpleType> – restriction

Restrictions using a "pattern" element
  Restrictions of the lexical values

Simple regular expressions
  Normal characters: "C&amp;A"
  Categories of characters: "\p{IsBasicLatin}"
  Sets of characters: "[\p{IsBasicLatin}-[\d]]"
  Quantifiers: "[a-zA-Z]{1,8}"
  Parentheses: "(XML(\s+|-))?Schema"

Combinations of these expressions
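A pattern is attached to a type like any other restriction. The sketch below combines a character set, quantifiers and a normal character; the type name producerCodeType and the code format are assumptions for this example only.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- Two upper-case letters, a hyphen, then four digits, e.g. "AT-0042" -->
  <simpleType name="producerCodeType">
    <restriction base="string">
      <pattern value="[A-Z]{2}-\d{4}"/>
    </restriction>
  </simpleType>
</schema>
```

In XML Schema the pattern must match the whole lexical value, so no explicit start or end anchors are written.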


User-defined Datatypes: Derived Simple Datatypes <simpleType> – union/list

Alternative definition possibilities
  Referencing an existing datatype via attributes (memberTypes or itemType)
  Local definition from scratch by using simpleType as subelement of the union or list elements

<simpleType name="PDAFeatureType">
  <union memberTypes="hc:PDAColor hc:PDARobustness"/>
</simpleType>
<simpleType name="PDAFeatureListType">
  <list itemType="hc:PDAFeatureType"/>
</simpleType>
<element name="PDAFeatureList" type="hc:PDAFeatureListType"/>

XML document:

<PDAFeatureList>blue waterproof shockproof</PDAFeatureList>
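The slide does not show the member types PDAColor and PDARobustness. A possible sketch, with enumeration values guessed from the sample document, together with the local-definition alternative for the list type:

```xml
<schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
        xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
        xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- Assumed definitions of the two member types -->
  <simpleType name="PDAColor">
    <restriction base="string">
      <enumeration value="blue"/>
      <enumeration value="black"/>
    </restriction>
  </simpleType>
  <simpleType name="PDARobustness">
    <restriction base="string">
      <enumeration value="waterproof"/>
      <enumeration value="shockproof"/>
    </restriction>
  </simpleType>
  <!-- Local definition: the item type is an anonymous union -->
  <simpleType name="PDAFeatureListType">
    <list>
      <simpleType>
        <union memberTypes="hc:PDAColor hc:PDARobustness"/>
      </simpleType>
    </list>
  </simpleType>
</schema>
```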


User-defined Datatypes: <complexType> – Nested Elements / Attributes / Empty / Mixed

Nested Elements
  Possible within a complex datatype only

Attributes
  Possible within a complex datatype only
  Independent of the existence of nested elements

Empty Content
  Possible within a complex datatype only
  Does not have nested elements

Mixed Content
  Datatype may contain nested elements and text
  In contrast to DTDs, the ordering and cardinality properties of the nested elements can be arbitrarily specified
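Empty content is simply a complex type without a content model. A minimal sketch, with the hypothetical element Connector and attribute kind:

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="Connector">
    <!-- No nested elements and no text allowed; attributes are still possible -->
    <complexType>
      <attribute name="kind" type="string" use="required"/>
    </complexType>
  </element>
</schema>
```

A matching instance would be <Connector kind="USB"/>.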


User-defined Datatypes: <complexType> – Nested Elements / Attributes

Sequence – <sequence>

Choice – <choice>

Arbitrary Ordering – <all>
  Nested elements can be used in arbitrary order

Cardinality
  Expressed by means of minOccurs and maxOccurs

<complexType name="PDAType">
  <sequence minOccurs="1" maxOccurs="1">
    <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
    <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
  </sequence>
  <attribute name="no" type="nonNegativeInteger" use="required"/>
</complexType>
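The slide's example covers <sequence> only. The following sketch shows the other two nesting forms; all element names (PowerSupply, Adapter, Dimensions, Width, Height, Depth) are invented for illustration.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- choice: exactly one of the alternatives must occur -->
  <element name="PowerSupply">
    <complexType>
      <choice>
        <element name="Battery" type="string"/>
        <element name="Adapter" type="string"/>
      </choice>
    </complexType>
  </element>
  <!-- all: children may appear in any order, each at most once -->
  <element name="Dimensions">
    <complexType>
      <all>
        <element name="Width" type="decimal"/>
        <element name="Height" type="decimal"/>
        <element name="Depth" type="decimal" minOccurs="0"/>
      </all>
    </complexType>
  </element>
</schema>
```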


User-defined Datatypes: <complexType> – Mixed Content

<complexType name="PDAType" mixed="true">
  <sequence>
    <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
    <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
  </sequence>
</complexType>
<element name="PDA" type="hc:PDAType"/>

XML document:

<PDA>Type Nokia 7110 has
  <Weight>141g</Weight>and
  <Battery>900mAh</Battery>
</PDA>


User-defined Datatypes: <complexType> – Derivation of Complex Types

Extension: <extension>
  Additional nested elements and/or attributes

Restriction: <restriction>
  Domain
  Cardinality

Abstract Datatypes
  <complexType> with attribute abstract="true"

Prohibition of Derivation
  <complexType> with attribute final
  Potential values: #all, restriction, extension
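A short sketch of the two attributes abstract and final. The type names DeviceType and PhoneType are hypothetical and not part of the running example.

```xml
<schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
        xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
        xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- Abstract: cannot type an element in a document directly,
       only types derived from it can -->
  <complexType name="DeviceType" abstract="true">
    <sequence>
      <element name="Weight" type="string"/>
    </sequence>
  </complexType>
  <!-- final="#all": neither extension nor restriction of PhoneType is allowed -->
  <complexType name="PhoneType" final="#all">
    <complexContent>
      <extension base="hc:DeviceType">
        <sequence>
          <element name="Band" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
</schema>
```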


User-defined Datatypes: <complexType> – Derivation via Extension

Elements are appended at the end
Extension must be specified within a <complexContent> tag

<complexType name="extendedPDAType">
  <complexContent>
    <extension base="hc:PDAType">
      <sequence>
        <element name="Band" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Feature" type="string" minOccurs="1" maxOccurs="10"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

Type hierarchy: extendedPDAType is derived from PDAType


User-defined Datatypes: <complexType> – Derivation via Restriction

The declarations of the base datatype that are to be retained must be repeated
Restriction must be specified within a <complexContent> tag

<complexType name="restrictedPDAType">
  <complexContent>
    <restriction base="hc:extendedPDAType">
      <sequence>
        <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Band" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Feature" type="string" minOccurs="1" maxOccurs="5"/>
      </sequence>
    </restriction>
  </complexContent>
</complexType>

Type hierarchy: restrictedPDAType is derived from extendedPDAType, which is derived from PDAType


User-defined Datatypes: <complexType> – Two Usage Possibilities

Static
  Element PDA has datatype PDAType

Dynamic
  Definition of the derived datatype within the XML document via the attribute type of the XML Schema Instance (xsi) NS
  Datatype extendedPDAType is derived from PDAType: extension with Band & Feature

<PDA>
  <Weight>141g</Weight>
  <Battery>900mAh</Battery>
</PDA>
<PDA xsi:type="extendedPDAType">
  <Weight>115g</Weight>
  <Battery>550mAh</Battery>
  <Band>Dualband</Band>
  <Feature>Waterproof</Feature>
</PDA>
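The value of xsi:type is a qualified name, so the namespace declarations in scope matter. A self-contained sketch of the dynamic case, assuming that PDA is a global element of type hc:PDAType, that the schema uses elementFormDefault="qualified", and that the prefix hc is bound as in the slides:

```xml
<?xml version="1.0"?>
<PDA xmlns="http://www.ifs.uni-linz.ac.at/hc"
     xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:type="hc:extendedPDAType">
  <!-- Inherited from PDAType -->
  <Weight>115g</Weight>
  <Battery>550mAh</Battery>
  <!-- Added by the extension -->
  <Band>Dualband</Band>
  <Feature>Waterproof</Feature>
</PDA>
```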


Keys 1/2

Characteristics of a key (key)
  Value (combination) must be unique
  Value must exist
  Key must be defined as subelement of another element, following the type definition

Candidates for keys (field)
  Elements with simple datatypes only!
  Attributes
  Combinations of elements and attributes

Scope can be defined (selector)

Reference to a key can be defined (keyref)

Elements, attributes and combinations thereof can be defined to be unique (unique)
  Value (combination) must be unique
  Value need NOT exist


Keys 2/2

<element name="PDACatalog">
  <complexType> ...</complexType>
  <key name="typeKey">
    <selector xpath="hc:Producer/hc:PDA"/>
    <field xpath="@name"/>
    <field xpath="@version"/>
  </key>
  <keyref name="refToTypeKey" refer="hc:typeKey">
    <selector xpath="hc:Stock/hc:PDAQuantity"/>
    <field xpath="@name"/>
    <field xpath="@version"/>
  </keyref>
</element>

PDA (Name, Version, Weight, ...) is referenced by PDAQuantity (Name, Version, Quantity)

<element name="PDACatalog">
  <complexType> ...</complexType>
  <unique name="uniqueProducerNo">
    <selector xpath="hc:Producer"/>
    <field xpath="@producerNo"/>
  </unique>
</element>
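How the constraints act on a document can be sketched with a small instance. The structure follows the XPath expressions above; the attribute quantity, all values, and the assumption that the attributes are unqualified (as the fields @name and @version suggest) are illustrative.

```xml
<?xml version="1.0"?>
<PDACatalog xmlns="http://www.ifs.uni-linz.ac.at/hc">
  <Producer producerNo="1">
    <!-- typeKey: each (name, version) pair must exist and be unique -->
    <PDA name="7110" version="1">
      <Weight>141g</Weight>
      <Battery>900mAh</Battery>
    </PDA>
    <PDA name="7110" version="2">
      <Weight>115g</Weight>
      <Battery>550mAh</Battery>
    </PDA>
  </Producer>
  <Stock>
    <!-- refToTypeKey: (name, version) must match an existing PDA -->
    <PDAQuantity name="7110" version="2" quantity="40"/>
    <!-- Would violate the keyref, since no such PDA exists:
         <PDAQuantity name="9999" version="1" quantity="5"/> -->
  </Stock>
</PDACatalog>
```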


Schema Composition: Within a Schema 1/2

Group of Elements

<group name="mainData">
  <sequence>
    <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
    <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
  </sequence>
</group>

<complexType name="PDAType">
  <sequence>
    <group ref="hc:mainData"/>
    <element name="Feature" type="string" minOccurs="1" maxOccurs="10"/>
  </sequence>
</complexType>


Schema Composition: Within a Schema 2/2

Group of Attributes

<attributeGroup name="BatteryAttributeGroup">
  <attribute name="type" type="string" default="NiMh"/>
  <attribute name="capacity" type="string" use="required"/>
</attributeGroup>

<complexType name="BatteryType">
  <sequence>...</sequence>
  <attributeGroup ref="hc:BatteryAttributeGroup"/>
</complexType>


Schema Composition: Different Schemata 1/2

Incorporation of other schemata via include, redefine and import
  include, redefine and import elements must be subelements of schema, prior to any other declaration

Include of a schema – include
  The NS of the included schema must be equal to the NS of the including schema, or the included schema has no NS at all
  The included schema can be used as if it were declared directly within the including schema

<schema ...>
  <include schemaLocation="PDA.xsd"/>
  ...


Schema Composition: Different Schemata 2/2

Including and redefining a schema – redefine
  Same functionality as include
  In addition, included components (simpleType, complexType, group, attributeGroup) can be newly defined
  New definitions replace the original ones

<redefine schemaLocation="PDA.xsd">
  <complexType name="PDAType">....</complexType>
  ...
</redefine>
...

Import of a schema – import
  The imported schema can have an arbitrary NS (which may differ from the current one) or none

<import namespace="http://www.somewhere.else.com" schemaLocation="Producer.xsd"/>
...
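Components of an imported schema are then addressed through a prefix bound to the imported NS. A sketch, assuming that Producer.xsd declares a global element Producer in the namespace http://www.somewhere.else.com; the prefix prod is an arbitrary choice.

```xml
<schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
        xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
        xmlns:prod="http://www.somewhere.else.com"
        xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- import must precede all other declarations -->
  <import namespace="http://www.somewhere.else.com"
          schemaLocation="Producer.xsd"/>
  <element name="PDACatalog">
    <complexType>
      <sequence>
        <!-- Assumed global element of the imported schema -->
        <element ref="prod:Producer" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>
</schema>
```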


Schema Modeling Styles: Non-Normative Datamodel of XML Schema Concepts

[Figure: non-normative datamodel of the XML Schema concepts, with legend. Source: http://www.w3.org/TR/xmlschema-1/]


Schema Modeling Styles: XML Schema Concepts in Practice

Analysis of 1400 schemata of diverse standard vocabularies
  Open Travel Alliance (OTA)
  Human Resource XML (HR-XML)
  W3C
  Global Justice XML
  etc.

P. Kiel, Profiling XML Schema, http://www.xml.com/pub/a/2006/09/20/profiling-xml-schema.html


Schema Modeling Styles: Relationships / Global vs. Local / Element vs. Type

Relationships
  Realisation by means of nesting or via references

Global element/attribute declarations
  Prerequisite for reuse in the same or another schema
  Root element must be global

Local element/attribute declarations
  For cases in which a declaration makes sense only in combination with the declared type

Local element declarations
  Can occur with different structure but the same name in different types

Local attribute declarations
  Make sense since attributes are most often tightly coupled to elements

Three stereotypical design forms
  Russian Doll Design
  Salami Slice Design
  Venetian Blinds Design

Literature
  XML Schema Best Practices (Roger Costello): www.xfront.com
  P. Kiel, Profiling XML Schema, http://www.xml.com/pub/a/2006/09/20/profiling-xml-schema.html


Schema Modeling Styles: Russian Doll Design

Nested element declarations
  Local declarations only
  Prevents global types

Advantages
  Structure obvious (corresponds to the XML document's structure)
  Prevents side effects

Disadvantages
  Danger of deep nesting levels
  No reuse of declarations, hence redundancies
  No extensibility in terms of derivation
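Applied to the catalog example, a Russian Doll schema could look like the following sketch; only the root element is global and every other declaration is nested and anonymous. The reduced content model is an assumption for brevity.

```xml
<schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
        xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified">
  <!-- The only global declaration: the root element -->
  <element name="PDACatalog">
    <complexType>
      <sequence>
        <element name="Producer" maxOccurs="unbounded">
          <complexType>
            <sequence>
              <element name="PDA" maxOccurs="unbounded">
                <complexType>
                  <sequence>
                    <element name="Weight" type="string"/>
                    <element name="Battery" type="string"/>
                  </sequence>
                </complexType>
              </element>
            </sequence>
          </complexType>
        </element>
      </sequence>
    </complexType>
  </element>
</schema>
```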


Schema Modeling Styles: Salami Slice Design

Global element declarations
  Usage of global elements per reference (ref attribute)
  Each global element can be a root element

Advantages
  Reuse of elements

Disadvantages
  Large number of global elements
  Confusing
  Danger of side effects in case of changes to global elements
  No extensibility in terms of derivation
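The same vocabulary in Salami Slice style, as a sketch: every element is declared globally and composed via ref. The reduced content model is again an assumption.

```xml
<schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
        xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
        xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified">
  <!-- Every element is global and could therefore be a root element -->
  <element name="Weight" type="string"/>
  <element name="Battery" type="string"/>
  <element name="PDA">
    <complexType>
      <sequence>
        <element ref="hc:Weight"/>
        <element ref="hc:Battery"/>
      </sequence>
    </complexType>
  </element>
  <element name="Producer">
    <complexType>
      <sequence>
        <element ref="hc:PDA" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>
</schema>
```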


Schema Modeling Styles: Venetian Blinds Design

Global type declarations
  Elements, except the root element, are declared locally

Advantages
  Reuse of types
  A named type is available for each element and attribute
  Types can be imported from other schemata
  Extensibility by derivation and <redefine>

Disadvantages
  Large number of global types
  Confusing
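In Venetian Blinds style, the sketch below declares named global types and keeps all elements except the root local. The type name ProducerType is hypothetical; PDAType and batteryType follow the earlier slides.

```xml
<schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
        xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
        xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified">
  <!-- Global, named types -->
  <simpleType name="batteryType">
    <restriction base="string">
      <enumeration value="NiMh"/>
      <enumeration value="NiCd"/>
      <enumeration value="LiIo"/>
    </restriction>
  </simpleType>
  <complexType name="PDAType">
    <sequence>
      <!-- Local elements with global types -->
      <element name="Weight" type="string"/>
      <element name="Battery" type="hc:batteryType"/>
    </sequence>
  </complexType>
  <complexType name="ProducerType">
    <sequence>
      <element name="PDA" type="hc:PDAType" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
  <!-- The only global element: the root -->
  <element name="Producer" type="hc:ProducerType"/>
</schema>
```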


Schema Modeling Styles: Comparison

Russian Doll Design
  For restrictive structures
  The structure of the XML documents is in large parts pre-defined by the schema

Salami Slice Design
  For flexible structures
  The structure of the XML documents can vary strongly, since different root elements are possible

Venetian Blinds Design
  For flexible structures too
  The structure of the XML documents can vary strongly if type inheritance is used

In practice: mixtures!


Schema Modeling Styles: Possible Mixture: "Garden of Eden"
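The slide's figure is not preserved. As commonly described, the "Garden of Eden" style declares both elements and types globally, combining Salami Slice and Venetian Blinds. A sketch under that assumption:

```xml
<schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
        xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
        xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified">
  <!-- Global type whose content refers to global elements -->
  <complexType name="PDAType">
    <sequence>
      <element ref="hc:Weight"/>
      <element ref="hc:Battery"/>
    </sequence>
  </complexType>
  <!-- Global elements, each with a named or pre-defined type -->
  <element name="Weight" type="string"/>
  <element name="Battery" type="string"/>
  <element name="PDA" type="hc:PDAType"/>
</schema>
```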


Comparison DTD – XML Schema: General Criteria

Criterion      DTD              XML Schema
Syntax         Proprietary      XML
Structure      Simple           Complex
Namespaces     not supported    supported


Comparison DTD – XML Schema: Elements

Criterion                   DTD                              XML Schema
Definition of the content   Text, elements, mixed content    Simple types, complex types
Defined order               ","                              <sequence>
Alternative                 "|"                              <choice>
Arbitrary order             not supported                    <all>
Cardinality                 "?", "*", "+"                    minOccurs and maxOccurs (more flexible)
Default values              not supported                    supported


Comparison DTD – XML Schema: Attributes

Criterion        DTD          XML Schema
Optionality      supported    supported
Default values   supported    supported


Comparison DTD – XML Schema: Datatypes

Criterion                DTD                                                       XML Schema
Pre-defined datatypes    Few datatypes, in fact STRING only, e.g. CDATA, ID, ...   Various datatypes, e.g. boolean, integer, ...
User-defined datatypes   not supported                                             supported
Domains                  Enumerating all possible values (only for attributes)     Many possibilities: <length>, ...
Patterns for datatypes   Very restricted (e.g. by means of cardinality)            <pattern>


Comparison DTD – XML Schema: Inheritance

Criterion                                         DTD              XML Schema
Derivation from pre-defined, simple datatypes     not supported    base
Derivation from complex datatypes (extension)     not supported    base and <extension>
Derivation from complex datatypes (restriction)   not supported    base and <restriction>


Comparison DTD – XML Schema: Summary

Most important advantages of DTDs
  Quick and easy to specify
  Beneficial for the specification of simple documents

Most important advantages of XML Schema
  Numerous datatypes
  Object-oriented approach
  More modelling possibilities than with DTDs
  Beneficial for the specification of complex documents


Literature

Books
  Elliotte Rusty Harold, W. Scott Means, XML in a Nutshell: A Desktop Quick Reference, 3rd Edition, O'Reilly & Associates, 2005
    O'Reilly XML.com: http://www.xml.com
  Elliotte Rusty Harold, XML 1.1 Bible, 2nd Edition, John Wiley & Sons, 2004
    Elliotte Rusty Harold, Cafe con Leche XML News and Resources: http://www.ibiblio.org/xml

Conferences
  XML Europe (XTech Conference Series): http://www.xmleurope.com
  XML Conference & Exposition: http://www.xmlconference.org

Online Resources
  Commented XML Standard, Tim Bray: http://www.xml.com/axml/testaxml.htm
  W3Schools: http://www.w3schools.com/xml/
  XML & DTD Patterns: http://www.xmlpatterns.com/
  Overview of XML Editors: http://www.perfectxml.com/soft.asp?cat=6
  Java and XML, Sun Microsystems, Inc.: http://java.sun.com/xml/
  IBM XML Zone: http://www.ibm.com/developer/xml/
  Microsoft XML Developer Center: http://msdn.microsoft.com/xml/default.asp
  XML Schema Test Suites from the W3C: http://www.w3.org/2001/05/xmlschema-test-collection.html
  IBM's Schema Quality Checker (SQC): http://www.alphaworks.ibm.com/tech/xmlsqc