Structuring XML Using DTD and Schema
Ching-Long Yeh, PhD, 葉 慶 隆
Department of Computer Science and Engineering
Tatung University
Taipei 104, Taiwan
Email: [email protected]
URL: http://www.cse.ttu.edu.tw/~chingyeh
Structuring XML 2
Content
• XML Document Basics
• DTD Syntax Review
• An Introduction to XML Schema
• Types of Interaction with Documents
• DTD in Electronic Business
• Conclusion
XML Document Basics
Structure, Content, and Format
• Central to XML is the concept that documents have structure, content, and format.
• These three ingredients combine to form a document.
• They interrelate in subtle ways, and you can easily confuse them as you work with your documents.
What is Structure?
• The structure defines how the document is laid out and in what order its elements are assembled.
• For example, a bicycle assembly manual might consist of the following sections, in this order:
– an introduction that describes the document and lists the manufacturer's address,
– assembly instructions,
– a parts list,
– instructions for ordering replacement parts,
– troubleshooting advice, and
– an index.
What is Content?
• Content is the actual data within a document.
• The words and illustrations that make up a bicycle assembly manual are its contents.
What is Format?
• Format consists of how the words, sentences, and paragraphs are visually presented and distinguished from one another within a document.
• Boldface for title, italics for special terms, and blank lines between sections are examples of document formats.
• People often confuse format with structure.
Why Structure, Content, and Format Are Important in XML?
• XML defines the structure and separates the content from the delivery-specific format.
• Through this approach, the actual document — its content and structure — becomes mobile.
Indicating Structure Through Visual Cues
PRODUCT ADVISORY
Number: 146
Type: Parts
Date: 8/15/95
Subject: Revised Replacement Parts ...
Model 501 User Replaceable Parts
The parts list identified in the AnyCorp Model 501 ...
New Parts List
1. 345-234 (Filter, cooling fan)
2. 148-745 (Fuse, power: 1.5amp)
3 ...

Product Advisory
Number: 146
Type: Parts
Date: 8/15/95
Revised:
Subject: Revised Replacement ...
Model 501 User-Replaceable Parts
The parts list identified in the ...
New Parts List
1. 345-234 (Filter, cooling fan)
2. 148-745 (Fuse, power: 1.5amp)
3. ...
Defining Structures in XML
• The structure of a document, that is, its type, is defined by a document type definition, or DTD.
• The DTD lays out the rules for a document through the use of elements, attributes, and entities.
Defining Structures in XML
<!DOCTYPE advisory [
<!-- Strictly, parameter-entity references inside declarations are allowed only in the external subset (e.g. advisory.dtd). -->
<!ENTITY % parael "para|blist|nlist|graphic">
<!ELEMENT advisory (idinfo,subject,subsec+)>
<!ELEMENT idinfo (advnbr,type,dateiss,daterev,product)>
<!ELEMENT subject (#PCDATA)>
<!ATTLIST subject safty (y|n) "n">
<!ELEMENT subsec (title,(%parael;)?)>
<!ELEMENT advnbr (#PCDATA)>
<!ELEMENT type (#PCDATA)>
<!ELEMENT dateiss (#PCDATA)>
<!ELEMENT daterev (#PCDATA)>
<!ELEMENT product (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
<!-- The SGML exclusion "-(nlist)" is dropped here; XML DTDs have no exclusions. -->
<!ELEMENT blist (item+)>
<!ELEMENT nlist (item+)>
<!-- XML mixed content must list #PCDATA first and end with "*". -->
<!ELEMENT item (#PCDATA|para|blist|nlist|graphic)*>
<!ELEMENT graphic EMPTY>
<!ATTLIST graphic filename CDATA #REQUIRED
                  artno    CDATA #IMPLIED>
]>
Using Structures in XML
<?xml version="1.0"?>
<!DOCTYPE advisory SYSTEM "advisory.dtd">
<advisory>
  <idinfo>
    <advnbr>Number: 146</advnbr>
    <type>Type: Parts</type>
    <dateiss>Date: 8/15/95</dateiss>
    <daterev>Revised: 9/29/95</daterev>
    <product>Model 501 Nebulation</product>
  </idinfo>
  <subject>Subject: Revised Replacement Parts List (AnyCorp Model 501)</subject>
  <subsec>
    <title>Model 501 User-Replaceable Parts</title>
    <para>The parts list identified in the AnyCorp Model 501 User's Maintenance Guide has been superseded, effective immediately. User-Replaceable parts are identified in the revised part list below. Parts orders which reference items on the previous list (dated 2/5/94) will be honored up to 3/14/96. Customers are advised to order from this revised list in order that they may achieve higher reliability at a lower unit cost. Questions on this subject should be directed to the Central Spares Organization.</para>
  </subsec>
  <subsec>
    <title>New Parts List</title>
    <blist>
      <item>345-234 (Filter, cooling fan)</item>
      <item>148-745 (Fuse, power, 1.5amp)</item>
      <item>345-712 (Lamp, Indicator)</item>
      <item>2346-92 (Disk, cleaning)</item>
      <item>347-622 (Swabs, cleaning)</item>
    </blist>
  </subsec>
</advisory>
Well-Formed and Valid Documents
• XML has two different notions of “correct.”
• Valid documents
– Declare conformance to a DTD in a document type declaration
– “Using the right words in the right place”
– Type-valid
• Well-formed documents
– Markup is intelligible.
– “Getting the pronunciation right”
– Non-type-valid
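The well-formedness half of this distinction can be checked with a few lines of Python. This is a minimal sketch using the standard library's `xml.etree.ElementTree`, which reports markup errors but does not validate against a DTD; the helper name `is_well_formed` and the two sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, i.e. the markup is intelligible."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples: the second one closes <State> with </state>.
good = "<label><name>Rock N. Robyn</name><state>MD</state></label>"
bad = "<label><name>Rock N. Robyn</name><State>MD</state></label>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Passing this check says nothing about validity: a document can be well-formed and still use elements that no DTD allows.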
Example: Table
Example: Database Publishing
Example: A DTD for B2B EC
• RosettaNet PIP 3A2 Price and Availability Query, Version 1.2. Available at http://www.rosettanet.org
DTD Syntax Review
DTD Syntax
• Seven major headings:
– document type declarations
– element types
– attributes
– entities
– notations
– conditional sections
– processing instructions
Document Type Declaration
• A document type declaration defines constraints on a document's logical structure and supports the use of predefined storage units.
• The XML document type declaration contains or points to markup declarations that provide a grammar for a class of documents.
Document Type Declaration
<?xml version="1.0"?>
<!DOCTYPE label [
  <!ELEMENT label (name,street,city,state,country,code)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ELEMENT code (#PCDATA)>
]>
<label>
  <name>Rock N. Robyn</name>
  <street>Jay Bird Street</street>
  <city>Baltimore</city>
  <state>MD</state>
  <country>USA</country>
  <code>43214</code>
</label>
Document Type Declaration
<?xml version="1.0"?>
<!DOCTYPE LABEL SYSTEM "http://www.sgmlsource.com/dtds/label.dtd">
<LABEL>. . .</LABEL>
Element Type Declaration
• Elements provide the basic logical structure for XML documents.
Element Type Declaration
[45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
Element-content Models
[47] children ::= (choice | seq) ('?' | '*' | '+')?
[48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
[49] choice ::= '(' S? cp ( S? '|' S? cp )* S? ')'
[50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'
Element Type Declaration
<!ELEMENT spec (front, body, back?)>
<!ELEMENT div1 (head, (p | list | note)*, div2*)>
<!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT b (#PCDATA)>
Attributes
• Attributes provide meta-data for elements, such as a security level, a revision status, or a unique identifier.
• Use an attribute list declaration to declare attributes for an element:
<!ATTLIST sample
  id     ID            #IMPLIED
  n      CDATA         #REQUIRED
  status (draft|final) "final">
Each line of the declaration gives an attribute name, an attribute type, and a default value.
Entities
• There are two types of entities:
– general entities: apply within the top-level element and its attribute values.
– parameter entities: apply within the internal and external DTD subsets.
Entities: General Entities
<!ENTITY xml "Extensible Markup Language">

<para>The &xml; is derived from ISO 8879, an International Standard<index label="&xml;"/></para>

<para>The Extensible Markup Language is derived from ISO 8879, an International Standard<index label="Extensible Markup Language"/></para>
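General entity expansion can be observed directly in a parser. The sketch below assumes Python's standard `xml.etree.ElementTree`, whose underlying expat parser expands internal general entities declared in the internal DTD subset; the shortened `<para>` text is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal subset and referenced in the content.
doc = """<!DOCTYPE para [
  <!ENTITY xml "Extensible Markup Language">
]>
<para>The &xml; is derived from ISO 8879.</para>"""

para = ET.fromstring(doc)

# The application sees the replacement text, not the entity reference.
print(para.text)
```

Because expansion happens inside the parser, untrusted documents with deeply nested entity declarations deserve caution in production code.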
Entities: Parameter Entity
<!ENTITY % inline "#PCDATA|emphasis|link">

<!ELEMENT para (%inline;)*>

<!ELEMENT para (#PCDATA|emphasis|link)*>
Notations
• Notations are used to include non-XML content, such as graphics, sound, video, or source-code listings, in XML documents.
• While the XML parser knows nothing about the specific notations, it can pass them on to the processing software to let it know what kinds of data to handle.
<!NOTATION TeX PUBLIC "+//ISBN 0-201-13448-9::Knuth//NOTATION The TeXbook//EN">
Conditional Sections
• In the external DTD subset and in external parameter entities, XML allows conditional sections that the parser includes or ignores, depending on the keyword at the start.
<![IGNORE[
<!ELEMENT para (#PCDATA)>
]]>

<!ENTITY % include-para "IGNORE">
<![%include-para;[
<!ELEMENT para (#PCDATA)>
]]>

Overriding a parameter entity:
<!DOCTYPE book SYSTEM "book.dtd" [
<!ENTITY % include-para "INCLUDE">
]>
Processing Instructions
• An XML parser passes processing instructions (PIs) on to your application, but it is up to you to do something useful with them.
<?IS10744:arch name="abc"?>
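How an application receives PIs depends on the parser API. As a rough sketch, Python's standard `xml.dom.minidom` keeps them in the tree as nodes with a `target` and a `data` string; the target `arch`, the sample document, and the helper `collect_pis` are hypothetical names chosen for illustration.

```python
from xml.dom import minidom

# Hypothetical document containing one processing instruction.
doc = minidom.parseString('<doc><?arch name="abc"?><para>text</para></doc>')

def collect_pis(node):
    """Return (target, data) pairs for every processing instruction under `node`."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
            found.append((child.target, child.data))
        else:
            found.extend(collect_pis(child))
    return found

pis = collect_pis(doc)
print(pis)
```

The parser does not interpret the data string; splitting `name="abc"` into a name and a value is left to the application.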
Introduction to XML Schema
Introduction
• The new XML Schema system aims at providing a rich grammatical structure for XML documents that overcomes the limitations of the DTD.
What is a Schema?
• A schema is a model for describing the structure of information.
• In the context of XML, a schema describes a model for a whole class of documents.
• A schema might also be viewed as an agreement on a common vocabulary for a particular application that involves exchanging documents.
What is a Schema?
• In schemas, models are described in terms of constraints.
• Two kinds of constraints that you can give:
– content model constraints describe the order and sequence of elements, and
– datatype constraints describe valid units of data.
What is a Schema?
• For example, a schema might describe a valid <address> with the content model constraint that
– it consists of a <name> element, followed by
– one or more <street> elements, followed by
– exactly one <city>, <state>, and <zip> element.
– The content of a <zip> might have a further datatype constraint that it consist of either a sequence of exactly five digits or a sequence of five digits, followed by a hyphen, followed by a sequence of exactly four digits. No other text is a valid ZIP code.
<address>
  <name>Namron H. Slaw</name>
  <street>256 Eight Bit Lane</street>
  <city>East Yahoo</city>
  <state>MA</state>
  <state>CT</state>
  <zip>blue</zip>
</address>
Invalid: there are two <state> elements, and the <zip> content is not a ZIP code.
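The two kinds of constraint in the <address> example can be imitated in a few lines of Python. This is only a sketch under stated assumptions: the function `check_address`, the trick of matching the child-tag sequence with a regular expression, and the sample strings are illustrative, not how a schema processor is implemented.

```python
import re
import xml.etree.ElementTree as ET

# Datatype constraint: five digits, optionally followed by a hyphen and four digits.
ZIP = re.compile(r"\d{5}(-\d{4})?")
# Content model constraint: name, one or more street, then city, state, zip.
CONTENT_MODEL = re.compile(r"name (street )+city state zip ")

def check_address(xml_text):
    """Return a list naming each kind of constraint the <address> violates."""
    address = ET.fromstring(xml_text)
    tag_sequence = "".join(child.tag + " " for child in address)
    problems = []
    if not CONTENT_MODEL.fullmatch(tag_sequence):
        problems.append("content model")
    zip_element = address.find("zip")
    if zip_element is None or not ZIP.fullmatch(zip_element.text or ""):
        problems.append("datatype")
    return problems

invalid = ("<address><name>Namron H. Slaw</name><street>256 Eight Bit Lane</street>"
           "<city>East Yahoo</city><state>MA</state><state>CT</state>"
           "<zip>blue</zip></address>")
valid = ("<address><name>N. Slaw</name><street>256 Eight Bit Lane</street>"
         "<street>Suite 2</street><city>East Yahoo</city><state>MA</state>"
         "<zip>02134-1234</zip></address>")

print(check_address(invalid))
print(check_address(valid))
```

The invalid sample fails both checks independently, which is the point of separating content model validity from datatype validity.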
Limitations of DTD
• XML inherited DTDs from SGML.
• DTDs can be used to define content models and, to a limited extent, the datatypes of attributes, but they have a number of obvious limitations:
– different (non-XML) syntax
– no support for namespaces
– extremely limited datatyping
– a complex and fragile extension mechanism based on little more than string substitution (no explicit relationship)
Features of Schema
• Richer datatypes
– booleans, numbers, dates and times, URIs, integers, decimal numbers, real numbers, intervals of time, etc.
• User defined types
• Attribute grouping
• Refinable archetypes
• Namespace support
Validity
• Reasons to validate documents:
– EC: confirm that what you received is exactly what you expect.
– B2B: validate data before inserting it into your database.
– XML documents used for control purposes
• Content model validity tests whether the order and nesting of tags are correct.
• Datatype validity is the ability to test whether specific units of information are of the correct type and fall within the specified legal values.
Illustrations of XML Schema
An XML document fragment
<InvoiceNo>123456789</InvoiceNo>
<ProductID>J123456</ProductID>

DTD fragment describing the above elements
<!ELEMENT InvoiceNo (#PCDATA)>
<!ELEMENT ProductID (#PCDATA)>

XML Schema fragment describing the above elements
<element name='InvoiceNo' type='positive-integer'/>
<element name='ProductID' type='ProductCode'/>
<simpleType name='ProductCode' base='string'>
  <pattern value='[A-Z]{1}\d{6}'/>
</simpleType>
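The gain over the DTD fragment is that the element content itself is checked. As a hedged illustration, the two datatypes in the schema fragment can be approximated in Python; the function names and the `CHECKS` table are assumptions made for this sketch, and a real schema processor would do this work from the schema itself.

```python
import re

def is_positive_integer(text):
    """Approximate the positive-integer type: digits only, value greater than zero."""
    return re.fullmatch(r"\+?\d+", text) is not None and int(text) > 0

def is_product_code(text):
    """Approximate ProductCode: one uppercase letter followed by six digits."""
    return re.fullmatch(r"[A-Z]\d{6}", text) is not None

# Hypothetical table mapping element names to their datatype checks.
CHECKS = {"InvoiceNo": is_positive_integer, "ProductID": is_product_code}

def datatype_valid(element_name, text):
    return CHECKS[element_name](text)

print(datatype_valid("InvoiceNo", "123456789"))
print(datatype_valid("ProductID", "J123456"))
print(datatype_valid("ProductID", "j123456"))
```

Under the DTD, all three values would be accepted, because #PCDATA places no constraint on the text.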
Using Namespaces in XML Schema
• One person may be processing documents from many other parties and the different parties may want to represent their data elements differently.
• Moreover, in a single document, they may need to separately refer to elements with the same name that are created by different parties.
• How can you distinguish between such different definitions with the same name?
• XML Schema uses namespaces to distinguish such definitions.
Using Namespaces in XML Schema
• A given XML Schema defines a set of new names. The names defined in a schema are said to belong to its target namespace.
• Definitions and declarations in a schema can refer to names that may belong to other namespaces. We refer to those namespaces as source namespaces.
• Each schema has one target namespace and possibly many source namespaces.
• In fact, every name in a given schema belongs to some namespace.
• The names of the namespaces can be fairly long, but they can be abbreviated with an xmlns declaration in the XML Schema document.
Using Namespaces in XML Schema
• Target and source namespaces
<!--XML Schema fragment in file schema1.xsd-->
<xsd:schema targetNamespace='http://www.SampleStore.com/Account'
  xmlns:xsd='http://www.w3.org/1999/XMLSchema'
  xmlns:ACC='http://www.SampleStore.com/Account'>
<xsd:element name='InvoiceNo' type='xsd:positive-integer'/>
<xsd:element name='ProductID' type='ACC:ProductCode'/>
<xsd:simpleType name='ProductCode' base='xsd:string'>
  <xsd:pattern value='[A-Z]{1}\d{6}'/>
</xsd:simpleType>
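Prefixes such as xsd and ACC are only abbreviations; a namespace-aware parser works with the full namespace name. The sketch below uses Python's standard `xml.etree.ElementTree` on a hypothetical instance document in which two parties both define a `ProductID` element; the `<order>` wrapper and the `glue-7` value are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the same local name from two different namespaces.
doc = """<order xmlns:ACC="http://www.SampleStore.com/Account"
               xmlns:PART="http://www.PartnerStore.com/PartsCatalog">
  <ACC:ProductID>J123456</ACC:ProductID>
  <PART:ProductID>glue-7</PART:ProductID>
</order>"""

root = ET.fromstring(doc)

# ElementTree expands each prefix into "{namespace-name}local-name".
tags = [child.tag for child in root]

# Lookups use any prefix mapped to the same namespace name.
ns = {"acc": "http://www.SampleStore.com/Account"}
account_id = root.find("acc:ProductID", ns).text

print(tags)
print(account_id)
```

The lookup uses the prefix `acc` rather than `ACC`, which shows that the prefix spelling is irrelevant as long as it maps to the same namespace name.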
Using Namespaces in XML Schema
Multiple source namespaces, importing a namespace
<!--XML Schema fragment in file schema1.xsd-->
<schema targetNamespace='http://www.SampleStore.com/Account'
xmlns='http://www.w3.org/1999/XMLSchema'
xmlns:ACC= 'http://www.SampleStore.com/Account'
xmlns:PART='http://www.PartnerStore.com/PartsCatalog'>
<import namespace='http://www.PartnerStore.com/PartsCatalog'
schemaLocation=
'http://www.ProductStandards.org/repository/alpha.xsd'/>
<element name='InvoiceNo' type='positive-integer'/>
<element name='ProductID' type='ACC:ProductCode'/>
<simpleType name='ProductCode' base='string'>
<pattern value='[A-Z]{1}\d{6}'/>
</simpleType>
<element name='stickyGlue' type='PART:SuperGlueType'/>
Defining Elements
• To define an element is to define its name and content model.
• In XML Schema, the content model of an element is defined by its type.
• Then, the instance elements in an XML document can have only values that fit the types defined in its schema.
Defining Elements
• A type can be simple or complex.
• A simple type cannot contain elements or attributes in its value.
• A complex type can create the effect of embedding elements in other elements or it can associate attributes with an element.
• The XML Schema spec also includes predefined simple types.
• A derived simple type constrains the values of its base type.
Defining Elements
• Simple, non-nested elements have a simple type
– An element that does not contain attributes or other elements can be defined to be of a simple type, predefined or user-defined, such as string, integer, decimal, time, ProductCode, etc.
<element name='age' type='integer'/>
<element name='price' type='decimal'/>
Defining Elements
• Elements with attributes must have a complex type
– If you want to add an attribute, you must define price as a complex type.
– We have defined what is called an anonymous type, where no explicit name is given to the complex type. In other words, the name attribute of the complexType element is not defined.
<element name='price'>
  <complexType base='decimal' derivedBy='extension'>
    <attribute name='currency' type='string'/>
  </complexType>
</element>
Defining Elements
• Elements that embed other elements must have a complex type
XML Document
<Book>
  <Title>Cool XML</Title>
  <Author>Cool Guy</Author>
</Book>
XML DTD
<!ELEMENT Book (Title, Author)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
XML Schema
<element name='Book' type='BookType'/>
<complexType name='BookType'>
  <element name='Title' type='string'/>
  <element name='Author' type='string'/>
</complexType>
Defining Elements
• A complex type defined with global simple types
<element name='Title' type='string'/>
<element name='Author' type='string'/>
<element name='Book' type='BookType'/>
<complexType name='BookType'>
  <element ref='Title'/>
  <element ref='Author'/>
</complexType>
Defining Elements
• Hiding BookType as a local type
<element name='Title' type='string'/>
<element name='Author' type='string'/>
<element name='Book'>
  <complexType>
    <element ref='Title'/>
    <element ref='Author'/>
  </complexType>
</element>
Defining Elements
• Expressing sophisticated constraints on elements
– XML Schema offers greater flexibility than DTD for expressing constraints on the content model of elements.
– At the simplest level, as in DTD, you can associate attributes with an element declaration and indicate that a sequence of exactly one (1), zero or more (*), or one or more (+) elements from a given set of elements can occur in it.
– You can express additional constraints in XML Schema using, for example, the minOccurs and maxOccurs attributes of the element element and using the choice, group, and all elements.
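The relationship between DTD occurrence indicators and minOccurs/maxOccurs can be made concrete with a small sketch. The function `occurs_ok` and the `DTD_INDICATORS` table are illustrative assumptions; the defaults of 1 for both attributes follow the XML Schema Recommendation.

```python
def occurs_ok(count, min_occurs=1, max_occurs=1):
    """Check an element count against minOccurs/maxOccurs.

    max_occurs may be an integer or the string "unbounded".
    """
    if count < min_occurs:
        return False
    return max_occurs == "unbounded" or count <= max_occurs

# DTD occurrence indicators written as (minOccurs, maxOccurs) pairs.
DTD_INDICATORS = {
    "": (1, 1),
    "?": (0, 1),
    "*": (0, "unbounded"),
    "+": (1, "unbounded"),
}

# A range with no DTD indicator: between 2 and 5 occurrences.
print(occurs_ok(3, min_occurs=2, max_occurs=5))
print(occurs_ok(6, min_occurs=2, max_occurs=5))
```

Every DTD indicator maps to one pair, but a pair such as (2, 5) has no single indicator in a DTD, which is the extra flexibility the slide refers to.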
The Purchase Order Schema (1)
<xsd:schema xmlns:xsd="http://www.w3.org/2000/08/XMLSchema">
  <xsd:annotation>
    <xsd:documentation>
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items" type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed" value="US"/>
  </xsd:complexType>
The Purchase Order Schema (2)
  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice" type="xsd:decimal"/>
            <xsd:element ref="comment" minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
Simple Types
<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="10000"/>
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="SKU">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>
Simple Types
<xsd:simpleType name="USState">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="AK"/>
    <xsd:enumeration value="AL"/>
    <xsd:enumeration value="AR"/>
    <!-- and so on ... -->
  </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="listOfMyIntType">
  <xsd:list itemType="myInteger"/>
</xsd:simpleType>

<listOfMyInt>20003 15037 95977 95945</listOfMyInt>
Simple Types
<xsd:simpleType name="USStateList">
  <xsd:list itemType="USState"/>
</xsd:simpleType>

<xsd:simpleType name="SixUSStates">
  <xsd:restriction base="USStateList">
    <xsd:length value="6"/>
  </xsd:restriction>
</xsd:simpleType>

<sixStates>PA NY CA NY LA AK</sixStates>
sixStates is declared to be a SixUSStates element.
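A list type splits the element content on whitespace and checks each item against the item type; the length facet then counts items, not characters. The following Python sketch imitates that behavior under stated assumptions: `US_STATES` is a deliberately abbreviated stand-in for the full USState enumeration, and `is_six_us_states` is a hypothetical helper.

```python
# Abbreviated stand-in for the USState enumeration (the real list has every state).
US_STATES = {"AK", "AL", "AR", "CA", "LA", "NY", "PA"}

def is_six_us_states(text):
    """Imitate SixUSStates: a whitespace-separated list of exactly six USState items."""
    items = text.split()
    return len(items) == 6 and all(item in US_STATES for item in items)

print(is_six_us_states("PA NY CA NY LA AK"))
print(is_six_us_states("PA NY CA"))
```

Repeated items such as NY are allowed, because a list type constrains the items and their count but not their uniqueness.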
Types of Interaction with Documents
Types of Interaction with Documents
• Most documents stored in XML form are created for the purpose of conveying information or keeping track of information.
• Types of interactions people have with documents:
– creation and modification
– management, storage, and archiving
– utilization.
Types of Interaction with Documents
[Figure: workflow diagram with three stages: document creation and modification; document management and storage; document utilization. Labels in the diagram: creation, import, update, review/validation, conversion/transformation; document classification, document assembly, document archival, document storage; printing (laser printer), building alternate documents, online searching, viewing, exchange, export; extraction and analysis of useful database information.]
DTD in Electronic Business
RosettaNet: An EB Standard
• RosettaNet is a consortium of major information technology (IT), electronic components (EC) and semiconductor manufacturing (SM) companies working to create and implement industry-wide EB process standards.
– Perfect real-time information.
– Efficient e-business processes.
– Dynamic trading-partner relationships.
– New business opportunities.
RosettaNet Focus
[Figure: layered comparison of human-to-human business exchange and partner-to-partner eBusiness exchange. Labels on the human side: telephone, sound, alphabet, words, grammar, dialog, business process. Labels on the eBusiness side: Internet, XML, dictionary, framework, PIP, eBusiness process, Ecom application. RosettaNet is marked against the eBusiness layers.]
Conclusion
• Schemas greatly improve on DTDs.
• Certain kinds of applications can be made more interoperable by XML Schema.
• DTDs are well understood and they do offer a good way to describe the structure of a document for interchange.
• It will take some time before XML Schemas are as well understood.