
Lecture 6

XML DTD

Content of .xml file
Content of .dtd file

What is DTD?

A DTD (Document Type Definition) defines the structure of an XML document.

A typical DTD defines which elements can appear in an XML document, how they can be nested, and which attributes the elements have.

When creating an XML document we can specify which DTD it conforms to.

DTD content models are written in a notation based on Extended Backus-Naur Form (EBNF).

DTDs can be stored locally or publicly. Examples of public DTDs are the XHTML DTDs.

DTD in the Context of an XML Project

[Figure: diagram of an XML project. Labels: DTD / Schema (.dtd, .xsd); .xml; Software System (DOM API, SAX API); Stylesheet (.css, .xsl, .fo).]

Document Type Declaration

Alternatives:

Internal subset
<!DOCTYPE myMessage [
  <!ELEMENT myMessage (#PCDATA)>
]>

External subset
<!DOCTYPE myMessage SYSTEM "myDTD.dtd">

Mixed
<!DOCTYPE myMessage SYSTEM "myDTD.dtd" [
  <!ELEMENT myMessage (#PCDATA)>
]>

The DTD is the DOCTYPE's internal subset plus its external subset.

Document Type Declaration (full example)

example.xml
<?xml version="1.0"?>
<!DOCTYPE myMessage SYSTEM "message.dtd">
<myMessage>
  <message>Hello World!</message>
</myMessage>

message.dtd
<!ELEMENT myMessage (message)>
<!ELEMENT message (#PCDATA)>

The name in the DOCTYPE declaration (myMessage) is the root element.
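A parser hands the document type declaration to the application. The following is a minimal sketch, assuming Python and its standard xml.dom.minidom module (neither is part of the lecture), that reads the DOCTYPE name and system identifier from example.xml and returns them together with the root element name. The helper name doctype_info is made up for illustration. The parser is non-validating and does not fetch message.dtd.

```python
from xml.dom import minidom

# The example.xml document from the slide, as a string.
EXAMPLE_XML = """<?xml version="1.0"?>
<!DOCTYPE myMessage SYSTEM "message.dtd">
<myMessage>
  <message>Hello World!</message>
</myMessage>"""


def doctype_info(xml_text):
    """Return (DOCTYPE name, system identifier, root element name)."""
    doc = minidom.parseString(xml_text)
    return doc.doctype.name, doc.doctype.systemId, doc.documentElement.tagName


name, system_id, root = doctype_info(EXAMPLE_XML)
print(name, system_id, root)
```

For a valid document the first and third values are equal, because the DOCTYPE name must match the root element.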

Sequences, Pipe Characters and Occurrence Indicators

A comma-separated sequence requires the children in the given order:
<!ELEMENT classroom ( teacher, student )>

A pipe character offers a choice between alternatives:
<!ELEMENT dessert ( iceCream | pastry )>

An occurrence indicator states how often a child or group may appear:
<!ELEMENT album ( song+ )>

<!ELEMENT album ( title, ( songTitle, duration )+ )>

<album>
  <title>Pablo Honey</title>
  <songTitle>You</songTitle> <duration>3:27</duration>
  <songTitle>Creep</songTitle> <duration>3:55</duration>
</album>

Sequences, Pipe Characters and Occurrence Indicators (contd.)

<!ELEMENT library ( book* )>

<library>
  <book>XML: How to Program</book>
  <book>Multimedia</book>
</library>

<library></library>

<!ELEMENT seat ( person? )>

<seat>
  <person>Peter O’Connor</person>
</seat>

<seat></seat>

Sequences, Pipe Characters and Occurrence Indicators (contd.)

<!ELEMENT donutBox ( jelly?, lemon*, ( ( crème | sugar )+ | glazed ) )>

<donutBox>
  <jelly>grape</jelly>
  <lemon>half-sour</lemon>
  <lemon>sour</lemon>
  <glazed>chocolate</glazed>
</donutBox>

<donutBox>
  <sugar>semi-sweet</sugar>
  <crème>whipped</crème>
  <sugar>sweet</sugar>
</donutBox>

Occurrence Indicators (summary)

A*    A may occur zero, one or more times
A+    A may occur one or more times (but must occur at least once)
A?    A must either not occur or occur once
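Content models behave like regular expressions over the sequence of child element names. The sketch below illustrates this in Python (an assumption; the lecture names no programming language) by translating four of the declarations above by hand into regular expressions and matching them against the children of an element. The table MODELS and the helper children_match are hypothetical names, element names are spelled songTitle throughout, and text content is ignored. This is not a DTD validator, only an illustration of the sequence, choice and occurrence rules.

```python
import re
import xml.etree.ElementTree as ET

# Hand-made translations of content models into regular expressions.
# Every child is written as its tag name followed by one space.
MODELS = {
    "album": r"title (songTitle duration )+",  # ( title, ( songTitle, duration )+ )
    "library": r"(book )*",                    # ( book* )
    "seat": r"(person )?",                     # ( person? )
    "dessert": r"(iceCream |pastry )",         # ( iceCream | pastry )
}


def children_match(xml_text):
    """Return True if the root element's children fit its content model."""
    element = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in element)
    return re.fullmatch(MODELS[element.tag], names) is not None


print(children_match("<library></library>"))
print(children_match("<seat><person>A</person><person>B</person></seat>"))
```

The empty library is accepted because of `*`, while the seat with two persons is rejected because `?` allows at most one.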

EMPTY, Mixed Content and ANY

<!ELEMENT oven EMPTY>

<oven/>

<!ELEMENT myMessage ( #PCDATA | message )*>

<myMessage>Here is some text, some <message>other text</message>and <message>even more text</message></myMessage>

<!ELEMENT program ANY>

ANY is typically used in the early stages of developing a DTD.

Attribute Declarations

<!ELEMENT x EMPTY>
<!ATTLIST x y CDATA #REQUIRED>

Element x has an attribute y of type CDATA whose value must be provided.

General form:

<!ATTLIST element attribute type attribute_default>

Attribute types:

1. CDATA: a character string; a literal < or &, and the quote character that delimits the value, must be written as references (e.g. &lt;, &amp;, &quot;)

2. Tokenized attribute types: ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS

3. Enumerated attribute types: NOTATION and enumerations

Well-formed and Valid XML Documents

An XML document is well-formed if it is syntactically correct, i.e.

1. There is a single root element.
2. Each element has a start tag and an end tag (or is an empty-element tag such as <oven/>).
3. Elements are nested properly.
4. Attribute values are in quotes.

An XML document is valid if it conforms to the DTD specified with the DOCTYPE declaration:

• Any element, attribute and entity used in the XML document must be defined in the DTD;
• Each element's content and attributes must correspond to the declarations in the DTD.

An XML document can be well-formed but NOT valid!
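The well-formedness rules can be observed with any XML parser. This is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is part of the lecture); the helper name is_well_formed is made up for illustration. The parser is non-validating, so the check says nothing about validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b></b></a>"))   # properly nested
print(is_well_formed("<a/><b/>"))         # two root elements
print(is_well_formed("<a><b></a></b>"))   # improper nesting
print(is_well_formed("<a x=1/>"))         # attribute value not quoted
```

Only the first document passes; each of the others breaks one of the numbered rules.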

XML Processor

Also called XML parser: a software program required to process an XML document.

• Checks syntax
• A validating XML processor checks whether an XML document conforms to the specified DTD
• Reports any errors
• Required to pass all characters in a document, including white space characters, to the application using the XML document
• A validating XML processor must also inform the application which characters constitute white space appearing in element content
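The rule that all characters, including white space, reach the application can be seen in a DOM tree. The sketch below assumes Python's standard xml.dom.minidom module (a non-validating parser, not named in the lecture) and a hypothetical helper child_summary; it lists the children of the root element, where the indentation shows up as text nodes.

```python
from xml.dom import minidom


def child_summary(xml_text):
    """List (node name, text or None) for each child of the root element."""
    root = minidom.parseString(xml_text).documentElement
    return [
        (node.nodeName, node.data if node.nodeType == node.TEXT_NODE else None)
        for node in root.childNodes
    ]


print(child_summary("<seat>\n  <person>Peter</person>\n</seat>"))
```

The seat element has three children here, not one: the line break and indentation before person, the person element, and the line break after it.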

Parameter Entities

Internal (parsed)
• Used to declare entities existing only in the DTD
    <!ENTITY % name "entity_value">
• The entity value is a string of characters that can contain:
    any character other than %, & and the quote character that delimits the value
    a parameter entity reference %Name;
    a general entity reference &Name; (to be explained shortly)
    a Unicode character reference, e.g., &#65; &#x4f;

External (parsed)
• Used to link to external DTDs
• Can be private or public
    <!ENTITY % name SYSTEM "URI">
    <!ENTITY % name PUBLIC "FPI" "URI">

URI: Uniform Resource Identifier (more generic than URL)
FPI: Formal Public Identifier

Parameter Entities (examples)

<!ELEMENT author (#PCDATA)>
<!ENTITY % js "John Smith">
<!ENTITY wb "written by %js;">

<!ENTITY % student SYSTEM "http://www.uni.com/stud.dtd">
%student;

<!ENTITY % info "(id, surname, firstname)">
<!ELEMENT lab_group_A %info;>
<!ELEMENT lab_group_B %info;>
<!ELEMENT lab_group_C %info;>

General Entities

Internal (parsed)
• Can be thought of as text macros
    <!ENTITY name "entity_value">
• The entity value is a string of characters that can contain:
    any character other than %, & and the quote character that delimits the value
    a parameter entity reference %Name;
    a general entity reference &Name;
    a Unicode character reference, e.g., &#65; &#x004f;

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE author [
  <!ELEMENT author (#PCDATA)>
  <!ENTITY js "John Smith">
]>
<author>&js;</author>
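The text-macro behaviour can be checked by parsing the author document. This is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (not part of the lecture), whose underlying parser expands internal general entities declared in the internal subset. The constant AUTHOR_XML and the helper author_text are illustrative names.

```python
import xml.etree.ElementTree as ET

# The author document from the slide, with the XML declaration closed by "?>".
AUTHOR_XML = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE author [
  <!ELEMENT author (#PCDATA)>
  <!ENTITY js "John Smith">
]>
<author>&js;</author>"""


def author_text(xml_text):
    """Return the text content of the root element after entity expansion."""
    return ET.fromstring(xml_text).text


print(author_text(AUTHOR_XML))
```

The application never sees the reference &js; itself, only its replacement text.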

General Entities (contd.)

External (parsed)
• Generally reference text that an XML parser has to parse.
• Useful for creating a common reference shared between multiple documents.
• Any changes made to an external entity are automatically picked up by the documents that reference it.
• Can be private or public
    <!ENTITY name SYSTEM "URI">
    <!ENTITY name PUBLIC "FPI" "URI">

<!ELEMENT copyright (#PCDATA)>
<!ENTITY c SYSTEM "http://www.xmlwriter.net/copyright.xml">
<!ENTITY pc PUBLIC "-//W3C//TEXT copyright//EN" "http://www.w3.org/xmlspec/copyright.html">

<copyright>&c;</copyright>
<copyright>&pc;</copyright>

General Entities (contd.)

External NDATA entities (unparsed)
• Generally reference non-XML data.
• Refer to data that an XML processor does not have to parse
    <!ENTITY name SYSTEM "URI" NDATA name>
    <!ENTITY name PUBLIC "FPI" "URI" NDATA name>

<!ENTITY logo SYSTEM "http://www.name.com/logo.gif" NDATA gif>
<!ENTITY plogo PUBLIC "-//W3C//GIF logo//EN" "http://www.w3.org/logo.gif" NDATA gif>

<!NOTATION gif PUBLIC "gif viewer">

NOTATIONs are used to identify the format of unparsed entities (non-XML data), elements with a notation attribute, or specific processing instructions.

Predefined General Entities

There are 5 predefined entities:
• &lt;    <
• &gt;    >
• &amp;   &
• &quot;  "
• &apos;  '

Every character can also be written as a character reference (its Unicode value)
• e.g., &#182; or &#xAF4C;
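Predefined entities and character references are resolved by the parser before the application sees the text. The sketch below assumes Python's standard xml.etree.ElementTree and xml.sax.saxutils modules (not part of the lecture): it decodes both kinds of reference and, in the other direction, escapes text for use as element content. The helper name decoded is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def decoded(fragment):
    """Wrap fragment in a dummy element and return its parsed text."""
    return ET.fromstring("<t>" + fragment + "</t>").text


print(decoded("&lt;&gt;&amp;&quot;&apos;"))  # the five predefined entities
print(decoded("&#65;&#x4f;&#182;"))          # decimal and hexadecimal references
print(escape("a < b & c"))                   # text made safe for element content
```

Decimal 65 and hexadecimal 4f are the code points of A and O; 182 is the pilcrow sign used on the slide.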

Attribute Declarations

Attributes are defined for the elements they belong to

<!ELEMENT test (question, answer+)>
<!ATTLIST test subject CDATA #REQUIRED>
<!ATTLIST test difficulty CDATA #IMPLIED>

or

<!ELEMENT test (question, answer+)>
<!ATTLIST test subject    CDATA #REQUIRED
               difficulty CDATA #IMPLIED>

Attribute Types

CDATA
• character data, that is, text that does not form markup (a literal < or &, and the quote character that delimits the value, must be escaped)

Tokenized attribute types
• ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS

Enumerated types
• NOTATION, Enumerated

Attribute Types (contd.)

Tokenized attribute types

• ID
    An ID value must not appear more than once in an XML document
    An element type may have only one ID attribute
    An ID attribute can only have an #IMPLIED or #REQUIRED default value
    The first character of an ID value must be a letter, _, or :

• IDREF
    Must refer to an ID value declared elsewhere in the document

• IDREFS
    Allows multiple ID values separated by white space

Attribute Types (contd.)

Tokenized attribute types (more…)

• ENTITY
    Name of an unparsed (NDATA) general entity declared in the DTD

• ENTITIES
    Allows multiple entity names separated by white space

• NMTOKEN
    Name token according to the XML specification; the first character must be a letter, digit, ., -, _, or :

• NMTOKENS
    Allows multiple NMTOKEN names separated by white space

Attribute Types (contd.)

Example 1

<!ATTLIST test difficulty CDATA #IMPLIED>

<test> ... </test>
<test difficulty="easy"> ... </test>

<!ATTLIST lecture number ID #REQUIRED>

<lecture number="XML05"> ... </lecture>
<lecture number="XML06"> ... </lecture>
<lecture number="XML07"> ... </lecture>

<!ATTLIST example belongsTo IDREF #REQUIRED>

<example belongsTo="XML07"> ... </example>
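The ID and IDREF constraints can be sketched as a small check. Python's standard parsers do not validate, so the following assumes the application is told by hand which attributes the DTD declares as ID and IDREF; the function check_ids, its two mapping parameters and the wrapping course element are all hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def check_ids(root, id_attrs, idref_attrs):
    """Return a list of ID/IDREF problems found in the tree under root.

    id_attrs and idref_attrs map an element name to the attribute that
    the (assumed) DTD declares as ID or IDREF for that element.
    """
    problems, seen = [], set()
    for elem in root.iter():
        attr = id_attrs.get(elem.tag)
        if attr and attr in elem.attrib:
            value = elem.attrib[attr]
            if value in seen:
                problems.append("duplicate ID " + value)
            seen.add(value)
    for elem in root.iter():
        attr = idref_attrs.get(elem.tag)
        if attr and attr in elem.attrib and elem.attrib[attr] not in seen:
            problems.append("IDREF " + elem.attrib[attr] + " matches no ID")
    return problems


doc = ET.fromstring(
    '<course><lecture number="XML05"/><lecture number="XML06"/>'
    '<lecture number="XML07"/><example belongsTo="XML07"/></course>'
)
print(check_ids(doc, {"lecture": "number"}, {"example": "belongsTo"}))
```

IDs are collected in a first pass so that an IDREF may point to an element that appears later in the document.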

Attribute Types (contd.)

Example 2

<!ELEMENT experiment_a (results)*>

<!ELEMENT results EMPTY>
<!ATTLIST results images ENTITIES #REQUIRED>

<!NOTATION gif PUBLIC "gif viewer">

<!ENTITY a1 SYSTEM "http://www.university.com/results/experimenta/a1.gif" NDATA gif>

<!ENTITY a2 SYSTEM "http://www.university.com/results/experimenta/a2.gif" NDATA gif>

<!ENTITY a3 SYSTEM "http://www.university.com/results/experimenta/a3.gif" NDATA gif>

<experiment_a>
  <results images="a1 a2 a3"/>
</experiment_a>

Attribute Types (contd.)

Enumerated attribute types

• NOTATION
    Useful when text needs to be interpreted in a particular way, for example, by another application. The first character must be a letter, _, or :

• Enumerated
    Enumerated attribute types allow you to make a choice between different attribute values. The first character of an Enumerated value must be a letter, digit, ., -, _, or :

Attribute Types (contd.)

Example 3

<?xml version="1.0"?>

<!DOCTYPE code [
  <!ELEMENT code (#PCDATA)>
  <!NOTATION vrml PUBLIC "VRML 1.0">
  <!ATTLIST code lang NOTATION (vrml) #REQUIRED>
]>

<code lang="vrml">Some VRML instructions</code>

Attribute Types (contd.)

Example 4

<?xml version="1.0"?>

<!DOCTYPE ToDoList [
  <!ELEMENT ToDoList (task)*>
  <!ELEMENT task (#PCDATA)>
  <!ATTLIST task status (important | normal) #REQUIRED>
]>

<ToDoList>
  <task status="important">This is an important task that must be completed</task>
  <task status="normal">This task can wait</task>
</ToDoList>

Attribute Default Values

#REQUIRED
• Such an attribute must be supplied
    <!ATTLIST lecture type CDATA #REQUIRED>

#IMPLIED
• Such an attribute may be supplied
    <!ATTLIST lecture type CDATA #IMPLIED>

#FIXED
• Such an attribute has a fixed value (even if it is NOT supplied)
    <!ATTLIST lecture type CDATA #FIXED "long">

Attribute Default Values (contd.)

<?xml version="1.0"?>

<!DOCTYPE ToDoList [
  <!ELEMENT ToDoList (task)*>
  <!ELEMENT task (#PCDATA)>
  <!ATTLIST task status (important|normal) "normal">
]>

<ToDoList>
  <task status="important">This is an important task.</task>
  <task>This is by default a task with a normal status.</task>
</ToDoList>
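A default declared in the internal subset is supplied by the parser when the attribute is missing. The sketch below assumes Python's standard xml.parsers.expat module (not part of the lecture), which by default reports attributes derived from ATTLIST declarations alongside those written in the document. The constant TODO_XML and the helper task_statuses are illustrative names.

```python
from xml.parsers import expat

# The ToDoList document from the slide, with the default "normal" status.
TODO_XML = """<?xml version="1.0"?>
<!DOCTYPE ToDoList [
  <!ELEMENT ToDoList (task)*>
  <!ELEMENT task (#PCDATA)>
  <!ATTLIST task status (important|normal) "normal">
]>
<ToDoList>
  <task status="important">This is an important task.</task>
  <task>This is by default a task with a normal status.</task>
</ToDoList>"""


def task_statuses(xml_text):
    """Return the status attribute reported for each task element."""
    statuses = []

    def start_element(name, attrs):
        # attrs is a dict that also holds defaulted attributes.
        if name == "task":
            statuses.append(attrs.get("status"))

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return statuses


print(task_statuses(TODO_XML))
```

Some higher-level Python APIs configure the parser to report only explicitly written attributes, which is why the low-level expat interface is used here.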

Declarations in DTD

<!ELEMENT …>

<!ATTLIST …>

<!ENTITY …>

<!NOTATION …>