SE 5145 – eXtensible Markup Language (XML): XML Schema
2011-12/Spring, Bahçeşehir University, Istanbul


3rd Assignment: Validating XML with DTD & XML Schema (page 1/2)

The goal of this exercise is to understand the basic concepts of XML Schema and how it extends the capabilities of DTDs. You will use your XML Resume (CV) that you provided in Assignment 2.

Task 1. XML Schema: Write an XML schema definition for your XML Resume satisfying the following requirements:

  • For any date in your Resume XML, make sure that your XML Schema checks for a valid date value.
  • Try to avoid xs:string as much as possible. If you think that something really is a string, use your own string type, which could, for example, check for a maximum length and a character set (an xs:pattern facet could be used to achieve the latter).
  • Make sure that at least one of your types is used by more than one element (because reuse is good). In real-life applications, you would design a type library first and then use it when constructing your schema from the ground up.
  • Use minOccurs and maxOccurs to restrict the cardinality of some elements.
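The requirements of Task 1 could be approached along the lines of the following sketch. All names in it (ShortText, entry, title, organization, startDate, endDate) are hypothetical and only illustrate the idea; your own Resume vocabulary will differ.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical reusable string type: limited length and character set -->
  <xs:simpleType name="ShortText">
    <xs:restriction base="xs:string">
      <xs:maxLength value="80"/>
      <xs:pattern value="[A-Za-z0-9 .,]+"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="entry">
    <xs:complexType>
      <xs:sequence>
        <!-- ShortText is used by two elements (type reuse) -->
        <xs:element name="title" type="ShortText"/>
        <xs:element name="organization" type="ShortText"/>
        <!-- xs:date rejects impossible dates such as 2011-02-30 -->
        <xs:element name="startDate" type="xs:date"/>
        <!-- Cardinality: the end date may be omitted -->
        <xs:element name="endDate" type="xs:date" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```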


3rd Assignment: Validating XML with DTD & XML Schema (page 2/2)

The following are recommended (but optional) for this assignment:

  • Depending on how similar or different your employer and education entries are, try to find some structural similarity between these entries and then represent this similarity using complex type derivation.
  • Try to add a targetNamespace to your schema, so that your Resume schema becomes a full-grown schema with its own namespace. Don't forget that you then have to change the instance (by using the namespace there) to match the schema.
  • Identity constraints could be used to check various aspects of the Resume, depending on what you think should be unique, a key, or a reference to an existing key. A typical example would be to have a key for institutions (educational institutions or companies), and then have each of your skills reference this key so that you can represent where you acquired each skill.

Task 3 – Validate XML: Use a tool to validate your XML Resume (*.xml) against your XML schema (*.xsd).

  • A suitable online tool is http://www.xmlvalidation.com/. On the first page, provide the XML document and select ‘Validate against external XML schema’. Click ‘Validate’ and provide the XML schema on the second page.
  • Another tool is Altova XML Spy (you can download and use a trial version).
  • Alternatively, you can use the tools recommended by Melike (validator.w3.org) and Erokan (iexmltls.exe, msval.vbs) in the last presentations.


XML Schemas

  • “Schema” is a general term; DTDs are one form of XML schema
      • According to the dictionary, a schema is “a structured framework or plan”
  • When we say “XML Schemas,” we usually mean the W3C XML Schema Language
      • This is also known as the “XML Schema Definition” language, or XSD
      • It was introduced to overcome some of the commonly observed limitations of DTDs, most notably the lack of typing
  • DTDs, XML Schema, RELAX NG, and Schematron are all XML schema languages


What’s Wrong with DTDs?

  • DTDs do not support application-level datatypes
      • XML for B2B is very data-centric and needs typing
      • SGML was created for documents, where typing was less important
  • DTDs do not support any relationships between markup constructs
      • content models cannot be reused
      • attribute lists cannot be reused
      • structural relationships cannot be exploited in the DTD
  • DTDs provide a very weak specification language
      • No restrictions on text content
      • Very little control over mixed content (text plus elements)
      • Little control over ordering of elements
  • DTDs are written in a strange (non-XML) format
      • You need separate parsers for DTDs and XML


Why XML Schemas?

  • XML Schema Definition language (XSD) solves these problems
      • Like DTDs, XSD allows you to constrain the content of XML documents, but it is much more powerful and sophisticated
      • XSD allows a much finer level of control over structure and content
      • XSD is written in XML syntax instead of a custom syntax like the one DTDs use
  • XML Schema's simple datatypes provide some semantics
      • a formerly undescribed attribute can now be described as being an xs:date
      • it can be understood as being a date and inserted into a calendar
      • but what kind of date is it? a birthday? an order date? a shipping date?
      • that depends on the context in which the xs:date appears
  • XML Schema better supports model-level information
      • however, XML Schema also captures only part of the application semantics
      • an XML Schema is usually better than a DTD, because it contains types
      • types provide information about the basic datatypes being used
      • additional semantics (e.g., different kinds of dates) must be documented elsewhere


Why not XML schemas?

  • DTDs have been around longer than XSD
      • Therefore they are more widely used
      • Also, more tools support them
  • The power of XSD comes at a price:
      • XSD is a little harder to write and more verbose than DTDs, even by XML standards
      • More advanced XML Schema constructs can be non-intuitive and confusing
  • Nevertheless, XSD is not likely to go away quickly


Validation and Typing

XML Schema does two things at the same time:

1. Validation checks for structural integrity (is the document schema-valid?)
      • checking elements and attributes for proper usage (as with DTDs)
      • checking element contents and attribute values for proper values

2. Type annotations make the types available to applications
      • instead of having to look at the schema, applications get the Post-Schema-Validation Infoset (PSVI)
      • type-based applications (such as XSLT 2.0) can work on the typed instance


Schema-Validation and Applications


Anatomy of a Schema

  • XML Schema uses the namespace http://www.w3.org/2001/XMLSchema, usually bound to the xsd or xs prefix in the XML code
  • The file extension is .xsd
  • The root element is <schema>
  • An XSD starts like this:

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
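A complete, minimal schema built around this opening tag might look like the following sketch. The element name note is a hypothetical example, not part of the course material.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- One global element whose content must be a string -->
  <xsd:element name="note" type="xsd:string"/>
</xsd:schema>
```

An instance document consisting only of <note>Hello</note> would be valid against it.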


Referring to a schema

To refer to a DTD in an XML document, the reference goes before the root element:

    <?xml version="1.0"?>
    <!DOCTYPE rootElement SYSTEM "url">
    <rootElement> ... </rootElement>

To refer to an XML Schema in an XML document, the reference goes in the root element:

    <?xml version="1.0"?>
    <rootElement
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="url.xsd">
      ...
    </rootElement>

  • The xmlns:xsi declaration (the XML Schema Instance reference) is required
  • xsi:noNamespaceSchemaLocation is where your XML Schema definition can be found
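When the schema declares a targetNamespace (as recommended in the assignment), the instance refers to it with xsi:schemaLocation instead. The following is a sketch only; the namespace URI http://example.org/resume and the file name resume.xsd are assumptions made up for illustration.

```xml
<?xml version="1.0"?>
<!-- resume.xsd (hypothetical file name) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/resume"
           xmlns="http://example.org/resume"
           elementFormDefault="qualified">
  <xs:element name="resume" type="xs:string"/>
</xs:schema>
```

```xml
<?xml version="1.0"?>
<!-- The instance uses the namespace and pairs it with the schema location -->
<resume xmlns="http://example.org/resume"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://example.org/resume resume.xsd">...</resume>
```

The value of xsi:schemaLocation is a whitespace-separated pair: the namespace name first, then the location of the schema document.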


TYPES

  • A type is a set of values
      • the values can be enumerated (home, mobile, office)
      • the values can be described by extension (intervals, regular expressions)
  • DTDs have (almost) no types
      • element content is always #PCDATA (any number of any characters)
      • attributes most often are CDATA (any number of any characters)
      • attributes may have enumerated types (but no extensional types)
      • attributes may use ID/IDREF


TYPES

                        DTD                              XML Schema
Concepts                some conceptual model (formal/informal), shared by both
Types                   ID/IDREF and (#P)CDATA           Hierarchy of Simple and Complex Types
Markup Constructs       Element Type Declarations        Element Definitions
                        <!ELEMENT order ...              <xs:element name="order"> ...
Instances (Documents)   <order date=""> [ order content ] </order>, the same for both


“Simple” and “complex” elements

  • A “simple” element is one that contains text and nothing else
      • A simple element cannot have attributes
      • A simple element cannot contain other elements
      • A simple element cannot be empty
      • However, the text can be of many different types, and may have various restrictions applied to it
  • If an element isn’t simple, it’s “complex”
      • A complex element may have attributes
      • A complex element may be empty, or it may contain text, other elements, or both text and other elements


“Simple” types

  • Simple types describe values not structured by XML markup
      • they describe attribute values (date="2006-10-03")
      • they describe element content (<phone>+1-510-6432253</phone>)
  • Simple types can be used for elements or attributes
      • XML Schema treats content in elements and attributes equally
      • simple type libraries can be designed independently of their eventual use
  • Simple types are available in three flavors
      • atomic types: one value of one type (one number in some range)
      • union types: one value of a union of types (a number or the string undefined)
      • list types: a whitespace-separated list of values (phone type="home office")
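The three flavors could be written as in the following sketch. The type names (Percentage, PercentageOrUndefined, PhoneKind, PhoneKinds) are hypothetical and chosen only to mirror the examples in the list above.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Atomic: one number in some range -->
  <xs:simpleType name="Percentage">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Union: a percentage or the string "undefined" -->
  <xs:simpleType name="PercentageOrUndefined">
    <xs:union memberTypes="Percentage">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="undefined"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

  <!-- List: whitespace-separated values, e.g. "home office" -->
  <xs:simpleType name="PhoneKind">
    <xs:restriction base="xs:token">
      <xs:enumeration value="home"/>
      <xs:enumeration value="mobile"/>
      <xs:enumeration value="office"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="PhoneKinds">
    <xs:list itemType="PhoneKind"/>
  </xs:simpleType>

</xs:schema>
```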


Named vs. Anonymous

  • Types can be named or anonymous
      • named types have a name and can be referenced (and thus be reused)
      • anonymous types have no name and can only be used where they are defined
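The difference can be seen in a small sketch; the names PhoneNumber, phone, fax, and rating are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Named type: can be referenced by any number of declarations -->
  <xs:simpleType name="PhoneNumber">
    <xs:restriction base="xs:string">
      <xs:maxLength value="20"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="phone" type="PhoneNumber"/>
  <xs:element name="fax" type="PhoneNumber"/>

  <!-- Anonymous type: defined inline, usable only by this element -->
  <xs:element name="rating">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="5"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

</xs:schema>
```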


Type Definitions

  • Simple types are sets of values
      • named simple types are sets of values with a name (and thus reusable)
      • anonymous simple types are sets of values defined where they are needed
  • Simple types are defined to represent model-level information
      • in most cases, they will have restrictions associated with them
      • they may also simply be tags for semantics (fax and phone numbers share the same value space)
  • XML Schema has a library of built-in datatypes
      • ur-types are the conceptual grounding of all types
      • primitive types are the types that are there by definition
      • derived types are based on primitive types
      • users can derive their own types using simple type restriction


Type Hierarchy


Built-In Types


Declaring Elements with Schema

  • Elements can be declared as having a simple or complex type
      • Types can be either built-in or defined by your Schema
  • Elements can also have mixed, empty, or element content, just like in DTDs
  • Elements can be given a minimum and maximum number of times that they are allowed to occur


Defining a simple element

A simple element is defined as

    <xs:element name="name" type="type" minOccurs="number" maxOccurs="number|unbounded" />

where:
  • name is the name of the element
  • the most common values for type are
        xs:boolean    xs:integer
        xs:date       xs:string
        xs:decimal    xs:time
  • minOccurs and maxOccurs are optional; the default value of each is 1

Other attributes a simple element may have:
  • default="default value": used if no other value is specified
  • fixed="value": no other value may be specified
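A few simple element declarations using these attributes are sketched below. The element names are hypothetical. Note that the attribute names are case-sensitive (minOccurs, maxOccurs) and that occurrence attributes are only allowed on local declarations, which is why the elements are placed inside a sequence here.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <!-- exactly once (minOccurs and maxOccurs both default to 1) -->
        <xs:element name="orderDate" type="xs:date"/>
        <!-- optional; an empty <currency/> takes the default value -->
        <xs:element name="currency" type="xs:string" default="TRY" minOccurs="0"/>
        <!-- optional; if it has content, the value must equal 1.0 -->
        <xs:element name="formatVersion" type="xs:decimal" fixed="1.0" minOccurs="0"/>
        <!-- one or more items -->
        <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```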


Custom Simple Types with Restriction

  • You can define your own custom simple types by deriving them from existing simple types with restriction.
      • the base type must be a simple type
      • the derived type will be a simple type
      • all simple types form a tree, rooted at anySimpleType
  • Restrictions are based on facets
      • each restriction can use 0-n facets
      • facets can be refined in further simple type restrictions
      • XML Schema designers should try to restrict types as much as possible. WHY?


Restrictions

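The general form for putting a restriction on a text value is:

    <xs:element name="name">        (or xs:attribute)
      <xs:simpleType>
        <xs:restriction base="type">
          ... the restrictions ...
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

For example:

    <xs:element name="age">
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:minInclusive value="0"/>
          <xs:maxInclusive value="140"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

The base type here is xs:integer rather than xs:positiveInteger, because 0 is not in the value space of xs:positiveInteger and so could not be used as its minInclusive bound.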


Facets

  • Facets define a certain way of restricting a simple type
  • Facets may be repeated at different levels of the type hierarchy
  • Not all facets are applicable to all types
      • the applicability depends on the primitive type being used


Restrictions on numbers

minInclusive -- number must be ≥ the given value

minExclusive -- number must be > the given value

maxInclusive -- number must be ≤ the given value

maxExclusive -- number must be < the given value

totalDigits -- number must have no more than value digits in total

fractionDigits -- number must have no more than value digits after the decimal point
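Several numeric facets can be combined in one restriction, as in this sketch. The type name Price and the chosen bounds are assumptions for illustration only.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="Price">
    <xs:restriction base="xs:decimal">
      <!-- must be greater than 0, so 0 itself is rejected -->
      <xs:minExclusive value="0"/>
      <!-- 9999.99 is still allowed, 10000 is rejected -->
      <xs:maxInclusive value="9999.99"/>
      <!-- at most two digits after the decimal point, so 1.999 is rejected -->
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="price" type="Price"/>
</xs:schema>
```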


Restrictions on strings

length -- the string must contain exactly value characters

minLength -- the string must contain at least value characters

maxLength -- the string must contain no more than value characters

pattern -- the value is a regular expression that the string must match

whiteSpace -- not really a “restriction”; it tells what to do with whitespace
  • value="preserve": keep all whitespace
  • value="replace": change all whitespace characters to spaces
  • value="collapse": remove leading and trailing whitespace, and replace all sequences of whitespace with a single space


Patterns

  • Patterns restrict the lexical space of simple types
      • most other facets restrict the value space (e.g., intervals of numbers)
      • in many cases, patterns are useful additions to value-oriented facets
  • Patterns are regular expressions
      • they support many common regex constructs and Unicode
      • the language pattern (below) allows de, de-CH, and other tags
      • the pattern checks for lexical correctness, not against a code list

    ([a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]{1,8})(-[a-zA-Z]{1,8})*
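A user-defined type can apply a pattern in the same way. The sketch below assumes a hypothetical PostalCode type made of exactly five digits.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Lexical check only: five digits, e.g. 34353.
       It does not check that such a postal code actually exists. -->
  <xs:simpleType name="PostalCode">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="postalCode" type="PostalCode"/>
</xs:schema>
```

XML Schema patterns always have to match the whole value, so no ^ or $ anchors are needed.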


Facet Limitations

  • Facets limit one dimension of a type's value space
      • using pattern, the lexical space can also be restricted
      • restrictions should be made as specific as possible
      • no limitations are possible beyond the predefined facets
  • There is no connection to the context within the document
      • facets cannot make references to other values (e.g., neighboring attributes)
  • Additional constraints should be documented
      • documentation enables applications to implement constraint checking
      • other schema languages (like Schematron) may be used to express these constraints


Enumeration

An enumeration restricts content to allowable choices

Example:

    <xsd:element name="season">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="Spring"/>
          <xsd:enumeration value="Summer"/>
          <xsd:enumeration value="Autumn"/>
          <xsd:enumeration value="Fall"/>
          <xsd:enumeration value="Winter"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>


Simple Type Examples


What is a Complex Type ?

  • Complex types describe the allowed element content
      • they describe what the element may contain (the element's content model)
      • they describe the attributes that an element may have (the element's attribute list)
  • Complex types do not define the element name
      • they define which content is allowed for the element
      • the element definition uses the complex type to define the allowed element content
  • Complex types have similar properties to simple types
      • they can be named or anonymous
      • Complex Type Derivation can be used to construct a type hierarchy


Declaring Complex Elements

To declare an element with a complex type:
  • Use the xsd:anyType value for the type attribute
  • Use the <xsd:complexType> tag in the definition

Structure:

    <xs:element name="name">
      <xs:complexType>
        ... information about the complex type ...
      </xs:complexType>
    </xs:element>

Remember that attributes are always simple types


Complex elements

Example:

    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

<xs:sequence> says that elements must occur in this order


Complex Type Example


Complex Types & Content Types

  • Complex types can have different kinds of content
      • simple content is simple type content with additional attributes
      • complex content is anything else (anything beyond simple type content)
  • Complex Type Derivation heavily depends on this classification


DTD Content Models

  • Defining elements in DTDs uses a compact syntax
      • XML Schema supports the same facilities with a more verbose syntax
      • XML Schema adds features which DTDs do not support
  • DTDs allow elements to be mandatory, optional, repeatable, or optional and repeatable
      • XML Schema allows the cardinality to be specified
  • DTDs allow sequences (,) and alternatives (|)
      • XML Schema introduces a (very limited) operator for all groups
  • Apart from the syntax, XML Schema content models are not very different
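To make the comparison concrete, here is a sketch of one DTD content model and a possible XML Schema counterpart. The element names (contact, name, email, phone, fax) are hypothetical.

```xml
<?xml version="1.0"?>
<!-- DTD declaration being translated:
     <!ELEMENT contact (name, email*, (phone | fax)?)>
-->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <!-- name: mandatory -->
        <xs:element name="name" type="xs:string"/>
        <!-- email*: optional and repeatable -->
        <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        <!-- (phone | fax)?: an optional choice -->
        <xs:choice minOccurs="0">
          <xs:element name="phone" type="xs:string"/>
          <xs:element name="fax" type="xs:string"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Unlike the DTD operators ?, * and +, minOccurs and maxOccurs can also express limits such as "at most three" directly.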


Empty Content

  • DTDs have a special keyword for empty elements
      • instead of the content model, the keyword EMPTY is used
      • empty elements may still have attribute lists associated with them
  • XML Schema empty types are defined implicitly
      • there is no explicit keyword for defining an empty type
      • if a type has no model group inside it, it is empty (it still may have attributes)
  • Declaring empty elements:

    <xs:element name="myEmptyElement">
      <xs:complexType></xs:complexType>
    </xs:element>


Mixed Content

  • DTDs define mixed content by mixing #PCDATA into the content model
      • DTDs always require mixed content to use the form ( #PCDATA | a | b )*
      • the occurrence of elements in mixed content cannot be controlled
  • XML Schema defines mixed content outside of the content model
      • the content model is defined like an element-only content model
      • the mixed attribute on the type marks the type as being mixed
  • Example: (only one subtitle is allowed, why?)


Mixed Content

  • XML Schema mixed content can use all model groups
      • it is possible to constrain element occurrences in the same way as in element-only content
      • in practice, this feature is rarely used (mixed content often is very loosely defined)
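The original example figure is not preserved in this transcript. The following is a reconstruction sketch under the assumption of a title element that may contain at most one subtitle; the names are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title">
    <!-- mixed="true" allows text between the child elements -->
    <xs:complexType mixed="true">
      <xs:sequence>
        <!-- the model group still controls occurrences: at most one subtitle -->
        <xs:element name="subtitle" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance such as <title>XML Schema <subtitle>Part 1</subtitle> explained</title> would be valid, while a second subtitle would not, because maxOccurs defaults to 1.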


Defining an attribute

  • Attributes are always declared as simple types
      • Any of the simple types that can be used for elements can also be used for attributes.
  • An attribute is defined as

    <xs:attribute name="name" type="type" />

    where name and type are the same as for xs:element


Defining an attribute

Other attributes an attribute declaration may have:
  • default="default value": used if no other value is specified
  • fixed="value": no other value may be specified
  • use="required": the attribute must be present
  • use="optional": the attribute is not required (default)
  • use="prohibited": the attribute cannot be used

Example:

    <xsd:attribute name="city" type="xsd:string" use="optional" default="istanbul"/>


Adding attributes to the elements

Adding attributes to an element that has an empty content model
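The slide's original figure is not preserved here. As a sketch of this case, with hypothetical names pageBreak and count:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- No model group, so the content is empty; only the attribute is allowed -->
  <xs:element name="pageBreak">
    <xs:complexType>
      <xs:attribute name="count" type="xs:positiveInteger" default="1"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A valid instance would be <pageBreak count="2"/>.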


Adding attributes to the elements

Adding attributes to an element that only has character data content
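The slide's original figure is not preserved here. A sketch of this case, assuming a hypothetical price element with a currency attribute:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="price">
    <xs:complexType>
      <xs:simpleContent>
        <!-- the character data is a decimal; the extension adds the attribute -->
        <xs:extension base="xs:decimal">
          <xs:attribute name="currency" type="xs:string" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A valid instance would be <price currency="TRY">19.99</price>.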


Adding attributes to the elements

Adding attributes to an element that has an element or mixed content model
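The slide's original figure is not preserved here. A sketch of this case, using hypothetical names (book, title, author, isbn, edition):

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <!-- add mixed="true" here to allow text between the child elements -->
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- attribute declarations come after the model group -->
      <xs:attribute name="isbn" type="xs:string" use="required"/>
      <xs:attribute name="edition" type="xs:positiveInteger"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```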


Global and local definitions

  • Elements declared at the “top level” of a <schema> are available for use throughout the schema
  • Elements declared within an xs:complexType are local to that type
  • Thus, in

    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    the elements firstName and lastName are only locally declared
  • The order of declarations at the “top level” of a <schema> does not specify the order in the XML data document


Declaration and use

So far we’ve been talking about how to declare types, not how to use them

To use a type we have declared, use it as the value of type="..."

Examples:

    <xs:element name="student" type="person"/>
    <xs:element name="professor" type="person"/>

Scope is important: you cannot use a type if it is local to some other type
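For type="person" to work, person has to be a named type declared at the top level of the schema. A minimal sketch, assuming the same firstName and lastName content used in the earlier examples:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global (top-level) named type, so any element declaration may use it -->
  <xs:complexType name="person">
    <xs:sequence>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="lastName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Two different elements sharing one type -->
  <xs:element name="student" type="person"/>
  <xs:element name="professor" type="person"/>

</xs:schema>
```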


Declaring elements with element content

  • Sequence: child elements must appear in order
  • All: child elements can occur in any order
  • Choice: any one of the child elements from a list


sequence

Child elements must appear in a specific order:

    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>


xs:all

Child elements can appear in any order:

    <xs:element name="person">
      <xs:complexType>
        <xs:all>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:all>
      </xs:complexType>
    </xs:element>

  • Despite the name, the members of an xs:all group can occur once or not at all
  • You can use minOccurs="n" and maxOccurs="n" to specify how many times an element may occur (default value is 1)
      • In this context, n may only be 0 or 1


xs:choice

Any one of the child elements from a list:

    <xs:element name="vehicle">
      <xs:complexType>
        <xs:choice>
          <xs:element name="car" type="xs:integer" />
          <xs:element name="bus" type="xs:string" />
          <xs:element name="van" type="xs:string" />
          <xs:element name="motorcycle" type="xs:integer" />
        </xs:choice>
      </xs:complexType>
    </xs:element>


Referencing

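Once you have defined a global element or attribute (with name="..."), you can refer to it from inside another content model with ref="..."

Example:

    <xs:element name="person">
      <xs:complexType>
        <xs:all>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:all>
      </xs:complexType>
    </xs:element>

Inside another complex type, refer to it with:

    <xs:element ref="person"/>

A reference always uses the name of the referenced element, so name and ref cannot be combined (<xs:element name="student" ref="person"> is not allowed). To give the same structure a different element name, define a named type and use type="..." instead.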
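In a complete schema, a reference appears inside the content model of another element and always points to a global declaration. A minimal sketch follows; the element name class is hypothetical. In XML Schema, name and ref are mutually exclusive on a declaration, so the reference carries only ref plus any occurrence constraints.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global declaration that can be referenced -->
  <xs:element name="person">
    <xs:complexType>
      <xs:all>
        <xs:element name="firstName" type="xs:string"/>
        <xs:element name="lastName" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

  <xs:element name="class">
    <xs:complexType>
      <xs:sequence>
        <!-- reuses the global declaration; cardinality is set at the reference -->
        <xs:element ref="person" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```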


Text element with attributes

If a text element has attributes, it is no longer a simple type

    <xs:element name="population">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="xs:integer">
            <xs:attribute name="year" type="xs:integer"/>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>


Empty elements

Empty elements are (ridiculously) complex

    <xs:complexType name="counter">
      <xs:complexContent>
        <xs:restriction base="xs:anyType">
          <xs:attribute name="count" type="xs:integer"/>
        </xs:restriction>
      </xs:complexContent>
    </xs:complexType>


Mixed elements

  • Mixed elements may contain both text and elements
  • We add mixed="true" to the xs:complexType element
  • The text itself is not mentioned in the element, and may go anywhere (it is basically ignored)

    <xs:complexType name="paragraph" mixed="true">
      <xs:sequence>
        <xs:element name="someName" type="xs:anyType"/>
      </xs:sequence>
    </xs:complexType>


Extensions

You can base a complex type on another complex type:

    <xs:complexType name="newType">
      <xs:complexContent>
        <xs:extension base="otherType">
          ...new stuff...
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
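Applied to the Resume assignment, extension could capture what employer and education entries have in common. This is only a sketch; EntryType, EducationType, and the element names are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Base type: the structure shared by both kinds of entry -->
  <xs:complexType name="EntryType">
    <xs:sequence>
      <xs:element name="institution" type="xs:string"/>
      <xs:element name="from" type="xs:gYear"/>
      <xs:element name="to" type="xs:gYear" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Derived type: the new element is appended after the inherited ones -->
  <xs:complexType name="EducationType">
    <xs:complexContent>
      <xs:extension base="EntryType">
        <xs:sequence>
          <xs:element name="degree" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="employer" type="EntryType"/>
  <xs:element name="education" type="EducationType"/>

</xs:schema>
```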


Predefined string types

  • Recall that a simple element is defined as: <xs:element name="name" type="type" />
  • Here are a few of the possible string types:
      • xs:string -- a string
      • xs:normalizedString -- a string that doesn’t contain tabs, newlines, or carriage returns
      • xs:token -- a string that doesn’t contain any whitespace other than single spaces
  • Allowable restrictions on strings:
      • enumeration, length, maxLength, minLength, pattern, whiteSpace


Predefined date and time types

  • xs:date -- A date in the format CCYY-MM-DD, for example, 2002-11-05
  • xs:time -- A time in the format hh:mm:ss (hours, minutes, seconds)
  • xs:dateTime -- Format is CCYY-MM-DDThh:mm:ss
  • Allowable restrictions on dates and times:
      • enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, pattern, whiteSpace
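The date and time types can be used directly or restricted with the range facets, as in this sketch. The type name TermDate, the element names, and the boundary dates are assumptions for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Dates from 2012-02-01 up to, but not including, 2012-07-01 -->
  <xs:simpleType name="TermDate">
    <xs:restriction base="xs:date">
      <xs:minInclusive value="2012-02-01"/>
      <xs:maxExclusive value="2012-07-01"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- e.g. 2012-04-15 -->
  <xs:element name="submitted" type="TermDate"/>
  <!-- e.g. 18:30:00 -->
  <xs:element name="lectureStart" type="xs:time"/>
  <!-- e.g. 2012-04-15T18:30:00 -->
  <xs:element name="lastModified" type="xs:dateTime"/>

</xs:schema>
```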


Predefined numeric types

Here are some of the predefined numeric types:

    xs:decimal    xs:positiveInteger
    xs:byte       xs:negativeInteger
    xs:short      xs:nonPositiveInteger
    xs:int        xs:nonNegativeInteger
    xs:long

Allowable restrictions on numeric types:
  • enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, fractionDigits, totalDigits, pattern, whiteSpace


Practice..
