XML Technology in E-Commerce 2001
Lecture 2
Logical and Physical Structure, Validity, DTD, XML Schema
Lecture Outline
• Logical and Physical Structure of XML Documents;
• Validity;
• DTD
– Element declarations;
– Attribute declarations;
• XML Schema
– Element and Attribute declarations;
– Simple type definitions;
– Complex type definitions;
Logical and Physical Structure
• By definition, each XML document has both a logical and a physical structure;
• Markup is used to describe both structures;
• The two structures must be properly nested, according to the rules of the specification;
See “Logical and Physical Structure of XML Documents”
Logical Structure
• An XML document is an information item;
• Document logical structure: represents the information the way the user (application) perceives it;
Physical Structure
• An XML document is also a physical entity;
• The content that we perceive logically can be distributed across several physical entities. Together they form the physical structure:
Entity 1:
<students>
</students>
Entity 2:
<student> John Smith</student>
Entity 3:
<student> John Smith Jr.</student>
Logical View:
<students> <student> John Smith </student> <student> John Smith Jr. </student></students>
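One way the three entities above could be tied together is with external parsed entities declared in the document entity. This is a minimal sketch, not the lecture's own file: the entity names student1 and student2 and the file names student1.xml and student2.xml are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Entity 1: the document entity. The file names below are hypothetical. -->
<!DOCTYPE students [
  <!-- Entity 2: assumed to contain <student> John Smith</student> -->
  <!ENTITY student1 SYSTEM "student1.xml">
  <!-- Entity 3: assumed to contain <student> John Smith Jr.</student> -->
  <!ENTITY student2 SYSTEM "student2.xml">
]>
<students>
  &student1;
  &student2;
</students>
```

A parser that reads external entities replaces each reference with the content of the referenced file, so an application sees the single logical view shown on the slide.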
Valid XML Documents
• Well-formedness constraints do not specify element and attribute names and types, or the structure of the instance document;
• Validity constraints specify element and attribute names and types, and the document structure;
• Validation can be DTD-based or Schema-based;
• Parsers:
– Non-validating parsers: check documents for well-formedness;
– Validating parsers: check documents for both well-formedness and validity constraints;
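To make the difference between the two kinds of constraint concrete, here is a small sketch of a document that is well-formed but not valid. The element name teacher and its content are assumptions chosen for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE students [
  <!ELEMENT students (student+)>
  <!ELEMENT student (#PCDATA)>
]>
<students>
  <student>John Smith</student>
  <!-- Properly nested, so the document is well-formed, but 'teacher'
       is not declared and not allowed by the content model of 'students'. -->
  <teacher>Jane Doe</teacher>
</students>
```

A non-validating parser should accept this document, while a validating parser should report a validity error for the teacher element.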
DTD Validation
DTD
• DTD: Document Type Definition;
• A DTD is a grammar for a class of XML documents;
• Document Type Declaration:
– Contains or references the DTD for an XML document;
– External subset:
<!DOCTYPE root SYSTEM "myDTD.dtd">
– Internal subset:
<!DOCTYPE root [
  ……markup declarations………
]>
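The two forms can also be combined in a single Document Type Declaration. The sketch below assumes that myDTD.dtd declares root with character data content; the entity name course is hypothetical.

```xml
<?xml version="1.0"?>
<!-- External subset in myDTD.dtd (assumed to declare: <!ELEMENT root (#PCDATA)>),
     plus an internal subset that adds one general entity. -->
<!DOCTYPE root SYSTEM "myDTD.dtd" [
  <!ENTITY course "XML Technology in E-Commerce">
]>
<root>&course;</root>
```

The DTD of the document is the internal and external subsets taken together.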
DTD: Markup Declarations
• Element type declarations;
• Attribute list declarations;
• Entity declarations: declare the entities that form the document's physical structure. See “Logical and Physical Structure of XML Documents”;
• Notation declarations;
The Document Type Declaration can also contain processing instructions and comments.
DTD: Element Type Declaration
Specifies the element type and its content:
<!ELEMENT Name contentSpec>
Element content:
– Empty:
<!ELEMENT homepage EMPTY>
– Any:
<!ELEMENT container ANY>
– Only elements (element content);
– Mixed;
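A short sketch of how the EMPTY and ANY declarations above look in an instance document. The text inside container is made up for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE container [
  <!ELEMENT container ANY>
  <!ELEMENT homepage EMPTY>
]>
<!-- ANY permits character data mixed with any declared elements;
     an EMPTY element has neither text nor children. -->
<container>Some text <homepage/> and more text</container>
```

Note that ANY does not switch validation off entirely: every child element that appears must still be declared somewhere in the DTD.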
DTD: Element Content, Content Model
• Content model building blocks:
– Choice:
(p | list | table | form)
– Sequence:
(street, zip, city, country)
– Occurrence specifiers: ? (zero or one), + (one or more), * (zero or more)
• Example:
<!ELEMENT person (name, address+, homepage?, note*)>
See also Deitel 6.4.1, page 139
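As an illustration, the person declaration above can be completed into a DTD with a matching instance. The declarations for name, address, homepage, and note, and all data values, are assumptions; the slide only gives the person content model and the address sequence.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person (name, address+, homepage?, note*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (street, zip, city, country)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT zip (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ELEMENT homepage EMPTY>
  <!ELEMENT note (#PCDATA)>
]>
<person>
  <name>John Smith</name>
  <!-- address+ : at least one address is required -->
  <address>
    <street>1 Main Street</street>
    <zip>12345</zip>
    <city>Springfield</city>
    <country>USA</country>
  </address>
  <!-- homepage? : may be omitted -->
  <homepage/>
  <!-- note* : any number of notes, including none -->
  <note>First note</note>
  <note>Second note</note>
</person>
```

Swapping name and address, or omitting every address, would make the document invalid while leaving it well-formed.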
DTD: Element Content, Mixed Content
• Elements with mixed content can contain character data interleaved with other elements, or character data only:
<!ELEMENT note (#PCDATA | em | strong | abbr)*>
<!ELEMENT p (#PCDATA | em | i | b | a | ul)*>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
• Other examples: Deitel 6.4.2, page 143
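A minimal sketch of an instance that matches the note declaration above. The declarations of em, strong, and abbr and the sentence itself are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA | em | strong | abbr)*>
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT strong (#PCDATA)>
  <!ELEMENT abbr (#PCDATA)>
]>
<!-- Text and the listed child elements may appear in any order,
     any number of times. -->
<note>Submit the <strong>DTD</strong> exercise <em>before</em> the next lecture; <abbr>XSD</abbr> comes <em>later</em>.</note>
```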
DTD: Attribute List Declaration
• Attributes are always associated with a particular element;
• Attribute list declaration format:
<!ATTLIST elName
  attrName1 attrType1 attrDefault1
  attrName2 attrType2 attrDefault2
  ………………………………… >
• Attribute types:
– String type;
– Tokenized types;
– Enumerated types;
DTD: Attribute Declarations, Attribute Types
• String type:
<!ATTLIST person age CDATA #REQUIRED>
• Tokenized types:
– ID, IDREF, IDREFS (Deitel 6.6.1, page 147);
– ENTITY, ENTITIES (Deitel 6.6.1, page 150; “Logical and Physical Structure of XML Documents”);
– NMTOKEN, NMTOKENS (Deitel 6.6.1, page 152);
<!ATTLIST person id ID #REQUIRED>
• Enumerated type:
<!ATTLIST person gender (M | F) #IMPLIED>
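The attribute declarations above can be gathered into one attribute list. In this sketch the staff element, the manager attribute, and all values are assumptions added to show how an IDREF points at an ID.

```xml
<?xml version="1.0"?>
<!DOCTYPE staff [
  <!ELEMENT staff (person+)>
  <!ELEMENT person (#PCDATA)>
  <!ATTLIST person
    id      ID      #REQUIRED
    age     CDATA   #REQUIRED
    gender  (M | F) #IMPLIED
    manager IDREF   #IMPLIED>
]>
<staff>
  <!-- ID values must be unique in the document and must be XML names,
       so they cannot start with a digit. -->
  <person id="p1" age="45" gender="M">John Smith</person>
  <!-- An IDREF value must match the ID of some element in the document. -->
  <person id="p2" age="20" manager="p1">John Smith Jr.</person>
</staff>
```

Because age is CDATA, a validating parser would accept age="abc" as well; the DTD cannot require a number here.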
DTD: Attribute Declarations, Attribute Defaults
Provide information about the attribute's presence:
• #REQUIRED: the attribute must always be present.
• #IMPLIED: the attribute may be absent. There is no default value.
• Default value: the value used when the attribute is absent.
<!ATTLIST list type (ol|ul) "ul">
• #FIXED default value: if the attribute is present, it must have exactly this value.
<!ATTLIST list type (ol|ul) #FIXED "ul">
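A small sketch of the default value case, using the list declaration above. The wrapper element lists and the text content are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE lists [
  <!ELEMENT lists (list+)>
  <!ELEMENT list (#PCDATA)>
  <!ATTLIST list type (ol|ul) "ul">
]>
<lists>
  <!-- No type attribute given: the parser reports type="ul". -->
  <list>unordered by default</list>
  <!-- An explicit value overrides the default. -->
  <list type="ol">explicitly ordered</list>
</lists>
```

If the declaration used #FIXED "ul" instead, the second list element would be invalid.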
Summary on DTD validation
• A DTD is a grammar that specifies element and attribute names and types;
• A DTD contains declarations for the entities and notations used in the document's physical structure (see “Logical and Physical Structure of XML Documents”);
• Mixed element content cannot constrain the order of sub-elements;
• The set of attribute value types does not include primitive data types such as integer, date, or time.
Demo: DTD validation with XML Spy
Read: Deitel 6, “Logical and Physical Structure of XML Documents”
Assignment: Deitel Ex 6.6 and Ex 6.7, page 164
Schema Validation
XML Schema
• An XML Schema constrains the structure of XML documents and the names and types of their elements and attributes;
• There are several schema proposals. We will discuss the W3C XML Schema;
• The Schema specification defines an abstract data model for schemas and the corresponding XML representation;
• A schema is a set of components;
• There are 13 schema components divided into three groups:
– Primary components;
– Secondary components;
– Helper components;
Schema: XML Representation
• schema element:
<xs:schema
    xmlns:xs="http://www.w3.org/2000/10/XMLSchema" version="1.0">
  <xs:attribute ……>
</xs:schema>
• Current namespace URI (30 March, no support in XML Spy 3.5): http://www.w3.org/2001/XMLSchema
• Components:
– Element declarations;
– Attribute declarations;
– Simple type definitions;
– Complex type definitions;
Schema: Element Declaration
• Syntax:
<element name="myElement" type="myType"/>
<element ref="myElement"/>
• Occurrence: minOccurs and maxOccurs attributes
<element ref="myElement"
         minOccurs="2"
         maxOccurs="12"/>
<element ref="myElement"
         minOccurs="0"
         maxOccurs="unbounded"/>
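The declarations above can be placed in a complete schema document. This is a sketch under stated assumptions: it uses the 2001 namespace URI rather than the 2000/10 one supported by XML Spy 3.5, and the element names students and student are chosen for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global declaration: can be referenced with ref elsewhere. -->
  <xs:element name="student" type="xs:string"/>

  <xs:element name="students">
    <xs:complexType>
      <xs:sequence>
        <!-- Reference to the global declaration; zero or more occurrences.
             When omitted, minOccurs and maxOccurs both default to 1. -->
        <xs:element ref="student" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The pair minOccurs="0" maxOccurs="unbounded" plays the role of the DTD * specifier, while arbitrary bounds such as 2 and 12 have no direct DTD equivalent.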
Schema: Attribute Declaration (1)
• Syntax:
<attribute name="myAttr" type="myAttrType"/>
<attribute ref="myAttr"/>
• Defaults: use and value attributes
<attribute ref="myAttr" use="required"/>
<attribute ref="myAttr" use="default"
           value="37"/>
<attribute ref="myAttr" use="fixed"
           value="37"/>
Schema: Attribute Declaration (2)
Changes in the syntax of attribute occurrence constraints (made on 30 March, currently not supported by XML Spy 3.5)
• Defaults: use, default, and fixed attributes
<attribute ref="myAttr" use="required"/>
<attribute ref="myAttr" use="optional"
           default="37"/>
<attribute ref="myAttr" fixed="37"/>
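A sketch of the newer syntax in a complete schema, assuming the 2001 namespace URI. The element name item and the attribute names priority and version are hypothetical; myAttr is taken from the slide.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global attribute declaration, so that it can be referenced with ref. -->
  <xs:attribute name="myAttr" type="xs:integer"/>

  <xs:element name="item">
    <xs:complexType>
      <!-- Must be present in every item element. -->
      <xs:attribute ref="myAttr" use="required"/>
      <!-- Optional (the default for use); 37 is supplied when absent. -->
      <xs:attribute name="priority" type="xs:integer" default="37"/>
      <!-- Optional; if present, the value must be exactly 1.0. -->
      <xs:attribute name="version" type="xs:string" fixed="1.0"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under this schema <item myAttr="5"/> would be valid, while <item/> would not, because myAttr is required.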
Schema: Type Definitions
• XML Schema provides two kinds of type definition:
– Simple types: constrain the strings that can be used as values of attributes and of elements whose content is character data only;
– Complex types: specify the attributes and content model of document elements;
• Type definition hierarchy:
– Types defined by restriction;
– Types defined by extension;
– Root type: anyType;
Schema: Simple Types
• Usage: for attribute values and for the content of elements that have neither attributes nor children;
<phone>222-33-22-444-1</phone>
<age>23</age>
• A set of built-in simple datatypes is defined in the XML Schema Part 2: Datatypes specification (see XML Schema Primer, Appendix B, Table b1.a);
• Each simple type is a restriction of another simple type;
Schema: Simple Type Definition
• Syntax:
<simpleType name="mySimpleType">
  content: (restriction | union | list)
</simpleType>
• Restrictions:
<simpleType name="mySimpleType">
  <restriction base="integer">
    <minInclusive value="25"/>
    <maxInclusive value="100"/>
  </restriction>
</simpleType>
• Facets (see XML Schema Primer, Appendix B);
• List and Union Types (see XML Schema Primer 2.3.1 and 2.3.2);
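The slide shows only a restriction, so here is a hedged sketch of all three derivation forms side by side, assuming the 2001 namespace URI. The type names ageType, unknownType, ageListType, and ageOrUnknownType, and the chosen bounds, are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Restriction: integers from 0 to 150. -->
  <xs:simpleType name="ageType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="150"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Restriction with an enumeration facet: the single value "unknown". -->
  <xs:simpleType name="unknownType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="unknown"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- List: whitespace-separated ageType values, for example "23 45 67". -->
  <xs:simpleType name="ageListType">
    <xs:list itemType="ageType"/>
  </xs:simpleType>

  <!-- Union: either an ageType value or the string "unknown". -->
  <xs:simpleType name="ageOrUnknownType">
    <xs:union memberTypes="ageType unknownType"/>
  </xs:simpleType>

  <xs:element name="age" type="ageOrUnknownType"/>
</xs:schema>
```

With these definitions <age>23</age> and <age>unknown</age> would both be valid, while <age>200</age> would not.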
Schema: Complex Types
• A complex type definition contains a set of attribute declarations and a content model; together they specify the attributes and content of a set of elements;
• A complex type can be:
– a restriction of another complex type;
– an extension of a simple or complex type;
– a restriction of the anyType type;
• The extension mechanism appends content to the end of the base definition's content model and/or adds new attribute declarations;
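A sketch of extending one complex type from another, assuming the 2001 namespace URI. The type names addressType and internationalAddressType and the lang attribute are hypothetical; the child element names come from the address sequence used earlier in the lecture.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Base type: a plain sequence of three elements. -->
  <xs:complexType name="addressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="zip" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Extension: 'country' is appended after the base content model,
       and a new attribute is added. -->
  <xs:complexType name="internationalAddressType">
    <xs:complexContent>
      <xs:extension base="addressType">
        <xs:sequence>
          <xs:element name="country" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="lang" type="xs:language"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="address" type="internationalAddressType"/>
</xs:schema>
```

The effective content model of address is street, zip, city, country in that order; extension cannot insert country before the inherited elements.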
Schema: Complex Type Definition
Elements with text-only content and attributes are defined by extending a simple type:
<height units="m">125</height>
<complexType name="measurement">
  <simpleContent>
    <extension base="decimal">
      <attribute name="units" type="string"/>
    </extension>
  </simpleContent>
</complexType>
<element name="height" type="measurement"/>
Schema: Element Content Model
• Model Group Elements (see XML Schema Primer 2.7):
– sequence;
– choice;
– all;
– group;
• Mixed Content:
<complexType name="noteType" mixed="true">
  <choice maxOccurs="unbounded">
    <element name="em" type="string"/>
    <element name="b" type="string"/>
    <element name="i" type="string"/>
  </choice>
</complexType>
• Empty Elements (see XML Schema Primer 2.5.3)
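The slide illustrates only mixed content, so the following sketch covers two of the other items: an empty element and the all group. It assumes the 2001 namespace URI; the names homepage, href, personType, and person are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Empty element: a complex type with attributes but no content model. -->
  <xs:element name="homepage">
    <xs:complexType>
      <xs:attribute name="href" type="xs:anyURI" use="required"/>
    </xs:complexType>
  </xs:element>

  <!-- all: each child appears at most once, in any order. -->
  <xs:complexType name="personType">
    <xs:all>
      <xs:element name="name" type="xs:string"/>
      <xs:element ref="homepage" minOccurs="0"/>
    </xs:all>
  </xs:complexType>

  <xs:element name="person" type="personType"/>
</xs:schema>
```

Replacing all with sequence would fix the order to name followed by homepage, and replacing it with choice would allow exactly one of the two.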
Schema: Additional Features
• Anonymous Types (Primer 2.4);
• Attribute Groups (Primer 2.8);
• Namespaces (Primer 3.1);
• Deriving types by extension (Primer 4.2);
• Schema modularization (Primer 4.1);
• Annotations (Primer 2.6);
• Relating schema and document instances (Primer 5.6, Deitel 7.6)
Demo: Schema validation with XML Spy
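For the last item, relating a schema to a document instance, one common mechanism is the schema location hint from the XML Schema instance namespace. This sketch assumes the 2001 instance namespace URI and a schema without a target namespace; the file name students.xsd is hypothetical.

```xml
<?xml version="1.0"?>
<!-- xsi:noNamespaceSchemaLocation tells a schema processor where a schema
     for elements in no namespace may be found. -->
<students xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="students.xsd">
  <student>John Smith</student>
  <student>John Smith Jr.</student>
</students>
```

The attribute is only a hint: a processor may use a different schema supplied by other means. For schemas with a target namespace, xsi:schemaLocation takes pairs of namespace URI and location instead.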
Summary on XML Schema
• Expressed in XML;
• Based on an explicit notion of types for elements and attribute values;
• Provides namespace control;
• Uses extension and restriction for type derivation;
• Lacks support for entities;
Read: Deitel 7, XML Schema Primer (24.10.2000 version)
Skip: Deitel 7.3..7.5, Primer 5.1..5.3, 5.5
Assignment: Write a schema for planner.xml (Deitel 5.9, page 126) and compare it with the syntax in Deitel 7.7. Validate with XML Spy. Use Chapter 2 and Appendix B from the Primer, and Deitel 7.6.