www.monash.edu.au
CSE3201 Information Retrieval Systems
DTD
Document Type Definition
Valid XML Document
• A well-formed document complies with the syntax rules of the W3C XML Recommendation.
• A valid XML document is a well-formed XML document that also complies with the rules specified in a DTD or XML Schema.
• The rules include:
– Naming of elements and attributes.
– Structure of the document.
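The difference between the two checks can be sketched in Python. This is a minimal sketch, assuming only the standard library: xml.etree.ElementTree is a non-validating parser, so it reports well-formedness errors but does not check a document against a DTD. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the text parses as XML, i.e. it is well-formed."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<bookshop><book/></bookshop>"))  # True
print(is_well_formed("<bookshop><book></bookshop>"))   # False: <book> is never closed
```

Checking validity needs a validating parser, which the Python standard library does not provide.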
DTD Example
XML:
<bookshop>
  <book>
    <title>Harry Potter and the Philosopher’s Stone</title>
    <author>
      <initials>J.K.</initials>
      <surname>Rowling</surname>
    </author>
    <price value="16.95"></price>
  </book>
  …
</bookshop>
DTD:
<!DOCTYPE bookshop [
<!ELEMENT bookshop (book)+>
<!ELEMENT book (title, author, price)+>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (initials, surname)>
<!ELEMENT initials (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT price EMPTY>
<!ATTLIST price
  value CDATA #IMPLIED
>
]>
DOCTYPE declaration
• It declares the document type name.
• The name must be the same as the root element’s name.
<!DOCTYPE bookshop [ … ]>
NOTE: a DOCTYPE declaration DOES NOT replace a declaration of a root element. The root element needs to be declared separately using an element declaration.
<!DOCTYPE bookshop [
<!ELEMENT bookshop (book)+>
…
]>
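The rule that the DOCTYPE name matches the root element can be checked with a short Python sketch. It assumes the standard library's xml.parsers.expat module, which reports the DOCTYPE name and each start tag through handlers. The function name doctype_matches_root is hypothetical, and expat does not enforce the rule itself because it is non-validating.

```python
import xml.parsers.expat


def doctype_matches_root(document: str) -> bool:
    """Compare the name in <!DOCTYPE ...> with the name of the root element."""
    seen = {"doctype": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["doctype"] = name

    def on_start_element(name, attributes):
        if seen["root"] is None:  # only the first start tag is the root
            seen["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start_element
    parser.Parse(document, True)
    return seen["doctype"] is not None and seen["doctype"] == seen["root"]


DTD = "<!DOCTYPE bookshop [<!ELEMENT bookshop (book)+><!ELEMENT book EMPTY>]>"
print(doctype_matches_root(DTD + "<bookshop><book/></bookshop>"))  # True
print(doctype_matches_root(DTD + "<library><book/></library>"))    # False
```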
ELEMENT declaration
<!ELEMENT elementName contentModel>
• Content model defines what can be included between the opening tag and the closing tag of an element.
• Content Model: any, empty, text only (simple), element only (complex), mixed.
Content Model - Any
• An element declared with the ANY content model may contain anything, e.g. child elements, character data, or comments.
<!ELEMENT elementName ANY>
Content Model - Empty
• An element defined with an empty content model may not contain any text or child element.
<!ELEMENT elementName EMPTY>
• Attributes do not affect the content model: an EMPTY element may still carry attributes.
XML:
<img src='logo.png'/>
DTD:
<!ELEMENT img EMPTY>
<!ATTLIST img src CDATA #REQUIRED>
Content Model – Text only (Simple)
• An element declared with this content model can only contain textual data (simple string) and entity references.
<!ELEMENT elementName (#PCDATA)>
• Example:
XML:
<title>Harry Potter and the Philosopher’s Stone</title>
DTD:
<!ELEMENT title (#PCDATA)>
Content Model – Element Only (Complex)
• An element declared with this content model may only contain elements and entity references.
<!ELEMENT elementName (childElementName)+>
• Example:
XML:
<bookshop>
  <book/>
</bookshop>
DTD:
<!ELEMENT bookshop (book)+>
Content Model - Mixed
• An element declared with this content model may contain text interspersed with child elements.
• Example:
XML:
<anElement> text mixed with <childElement> child text</childElement></anElement>
DTD:
<!ELEMENT anElement (#PCDATA|childElement)*>
<!ELEMENT childElement (#PCDATA)>
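How a parser exposes mixed content can be sketched with Python's xml.etree.ElementTree. The trailing " and more text" is an addition to the slide's example, included only to show where text after a child element ends up.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring(
    "<anElement>text mixed with <childElement>child text</childElement>"
    " and more text</anElement>"
)
child = element[0]

print(repr(element.text))  # 'text mixed with '  (text before the first child)
print(repr(child.text))    # 'child text'        (text inside the child)
print(repr(child.tail))    # ' and more text'    (text following the child)

# itertext() walks the text and the children in document order.
full_text = "".join(element.itertext())
print(full_text)           # text mixed with child text and more text
```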
Cardinality
• Defines how many times a child element (or group of child elements) may occur in the declared element.
Operator   Description
(none)     One and only one child is allowed
?          Zero or one child
*          Zero or more child(ren)
+          One or more child(ren)
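The table can be restated as minimum and maximum occurrence counts. The following Python sketch is illustrative only. The names CARDINALITY_BOUNDS and occurrences_allowed are hypothetical, and the empty string stands for "no operator".

```python
# Minimum and maximum number of occurrences for each DTD cardinality operator.
# None as a maximum means "no upper limit".
CARDINALITY_BOUNDS = {
    "": (1, 1),      # no operator: exactly one
    "?": (0, 1),     # zero or one
    "*": (0, None),  # zero or more
    "+": (1, None),  # one or more
}


def occurrences_allowed(count: int, operator: str = "") -> bool:
    """Return True if `count` occurrences satisfy the given operator."""
    minimum, maximum = CARDINALITY_BOUNDS[operator]
    return count >= minimum and (maximum is None or count <= maximum)


print(occurrences_allowed(0, "+"))  # False: "+" needs at least one child
print(occurrences_allowed(3, "*"))  # True
```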
Sequence Indicator
• Defines how the child elements of a content model are combined.
• Possible combinations:
– "Followed by" => AND => ","
– "Choice of" => OR => "|"
Sequence Indicator – “Followed By”
DTD:
<!ELEMENT personName (title,firstName,middleName,lastName,suffix)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT middleName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT suffix (#PCDATA)>
XML:
<personName>
  <title>Mr</title>
  <firstName>John</firstName>
  <middleName> V </middleName>
  <lastName> Smart </lastName>
  <suffix>Jr</suffix>
</personName>
Sequence Indicator – “Choice of”
DTD:
<!ELEMENT personName ((Mr |Ms |Dr ),firstName,middleName,lastName,( Jr |Sr))>
<!ELEMENT Mr EMPTY >
<!ELEMENT Ms EMPTY >
<!ELEMENT Dr EMPTY >
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT middleName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT Jr EMPTY >
<!ELEMENT Sr EMPTY >
XML:
<personName>
  <Mr/>
  <firstName>John</firstName>
  <middleName> V </middleName>
  <lastName> Smart </lastName>
  <Jr/>
</personName>
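An element-only content model reads much like a regular expression over the names of the child elements: "," is concatenation, "|" is alternation, and "?", "*", "+" keep their meaning. The Python sketch below uses that analogy to check the children of one element. It is an illustration, not a DTD validator: the function names are hypothetical, and #PCDATA, EMPTY and ANY are not handled.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model: str) -> str:
    """Translate an element-only content model into a regular expression.

    Each element name becomes a token followed by one space, so the regex is
    matched against a string such as "Mr firstName lastName ".
    """
    compact = re.sub(r"\s+", "", model)
    with_names = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda match: "(?:" + re.escape(match.group(0)) + " )",
        compact,
    )
    # "," means "followed by", which is plain concatenation in a regex.
    return with_names.replace(",", "")


def children_match(element, model: str) -> bool:
    """Check the direct children of `element` against the content model."""
    child_names = "".join(child.tag + " " for child in element)
    return re.fullmatch(content_model_to_regex(model), child_names) is not None


PERSON_MODEL = "((Mr |Ms |Dr ),firstName,middleName,lastName,( Jr |Sr))"
person = ET.fromstring(
    "<personName><Mr/><firstName>John</firstName><middleName> V </middleName>"
    "<lastName> Smart </lastName><Jr/></personName>"
)
print(children_match(person, PERSON_MODEL))  # True
```

A personName with both <Mr/> and <Ms/> would fail the check, because the choice group allows exactly one of its alternatives.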
Attribute Declaration
<!ATTLIST elementName
  attrName1 attrType1 attrDefault defaultValue1
  …
  attrNameN attrTypeN attrDefault defaultValueN
>
DTD:
<!ELEMENT personName EMPTY>
<!ATTLIST personName
  title CDATA #IMPLIED
  firstName CDATA #REQUIRED
  surname CDATA #REQUIRED
>
XML:
<personName title="Dr" firstName="Jenny" surname="Genius"/>
Attribute Type (some)
Type                Description
CDATA               Character data (simple string)
Enumerated values   One of a series of listed values
ID                  A unique identifier for each instance of this element type
IDREF               A reference to an element with an ID type attribute
Attribute Defaults
Values                        Description
#REQUIRED                     Attribute must appear in every instance of the element.
#IMPLIED                      Attribute is OPTIONAL.
#FIXED (plus default value)   Attribute is OPTIONAL. If it does appear, it must match the default value. If it does not appear, the parser may supply the default value.
#Required - Example
<?xml version="1.0"?>
<!DOCTYPE friends [
<!ELEMENT friends (person)+>
<!ELEMENT person (personName,email) >
<!ELEMENT personName (firstName,surname) >
<!ELEMENT firstName (#PCDATA) >
<!ELEMENT surname (#PCDATA) >
<!ELEMENT email (#PCDATA) >
<!ATTLIST person perID ID #REQUIRED>
]>
<friends>
  <person perID="p1">
    <personName>
      <firstName> Jenny </firstName>
      <surname> Genius </surname>
    </personName>
    <email>[email protected]</email>
  </person>
</friends>
#Implied - Example
<?xml version="1.0"?>
<!DOCTYPE friends [
<!ELEMENT friends (person)+>
<!ELEMENT person (personName,email) >
<!ELEMENT personName (firstName,surname) >
<!ELEMENT firstName (#PCDATA) >
<!ELEMENT surname (#PCDATA) >
<!ELEMENT email (#PCDATA) >
<!ATTLIST person title CDATA #IMPLIED >
]>
<friends>
  <person title="Dr">
    <personName>
      <firstName> Jenny </firstName>
      <surname> Genius </surname>
    </personName>
    <email>[email protected]</email>
  </person>
  <person>
    <personName>
      <firstName> John </firstName>
      <surname> Howard </surname>
    </personName>
    <email>[email protected]</email>
  </person>
</friends>
#Fixed-Example
<?xml version="1.0"?>
<!DOCTYPE friends [
<!ELEMENT friends (person)+>
<!ELEMENT person (personName,email) >
<!ELEMENT personName (firstName,surname) >
<!ELEMENT firstName (#PCDATA) >
<!ELEMENT surname (#PCDATA) >
<!ELEMENT email (#PCDATA) >
<!ATTLIST person title CDATA #FIXED "Dr" >
]>
#Fixed-Valid Instances
<friends>
  <person title="Dr">
    <personName>
      <firstName> Jenny </firstName>
      <surname> Genius </surname>
    </personName>
    <email>[email protected]</email>
  </person>
  <person>
    <personName>
      <firstName> John </firstName>
      <surname> Howard </surname>
    </personName>
    <email>[email protected]</email>
  </person>
</friends>
#Fixed – Invalid Instance
<friends>
<person title="Ms">
<personName>
<firstName> Jenny </firstName>
<surname> Genius </surname>
</personName>
<email>[email protected]</email>
</person>
</friends>
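What a non-validating parser does with a #FIXED attribute can be sketched with Python's xml.parsers.expat. Two things in the sketch are assumptions made for brevity: person is declared EMPTY, unlike the slides, and the function name person_titles is hypothetical. Expat reads the internal DTD subset and supplies the declared value when the attribute is missing, but because it does not validate, it accepts the invalid title="Ms" without complaint.

```python
import xml.parsers.expat

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE friends [
<!ELEMENT friends (person)+>
<!ELEMENT person EMPTY>
<!ATTLIST person title CDATA #FIXED "Dr">
]>
<friends><person title="Dr"/><person/><person title="Ms"/></friends>"""


def person_titles(document: str) -> list:
    """Collect the title attribute the parser reports for each person."""
    titles = []

    def on_start_element(name, attributes):
        if name == "person":
            titles.append(attributes.get("title"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start_element
    parser.Parse(document, True)
    return titles


# The second person has no title in the document, so the parser supplies "Dr".
# The third person is invalid under the DTD, yet a non-validating parser
# reports "Ms" as written.
print(person_titles(DOCUMENT))
```

Rejecting the third person requires a validating parser.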
Entity
• An entity is a storage unit.
• An entity is declared in the DTD (except for the predefined entities) and is referenced from the DTD or the XML document.
Entity Example
<?xml version="1.0"?>
<!DOCTYPE footNote [
<!ELEMENT footNote (#PCDATA)>
<!ENTITY copy "©2001">
<!ENTITY uni "Monash University">
<!ENTITY disclaimer "No warranty &copy; &uni;">
]>
<footNote>All &uni; websites contain the following disclaimer "&disclaimer;"
</footNote>
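Entity expansion can be observed with Python's xml.etree.ElementTree, which expands entities declared in the internal DTD subset while parsing. This is a sketch under two assumptions about the example: the copyright sign is written as the character reference &#169;, and the disclaimer entity refers to &copy; so that a nested reference is exercised.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE footNote [
<!ELEMENT footNote (#PCDATA)>
<!ENTITY copy "&#169;2001">
<!ENTITY uni "Monash University">
<!ENTITY disclaimer "No warranty &copy; &uni;">
]>
<footNote>All &uni; websites contain the following disclaimer "&disclaimer;"</footNote>"""

foot_note = ET.fromstring(DOCUMENT)

# Every entity reference has been replaced by its declared text, including
# the references nested inside the disclaimer entity.
print(foot_note.text)
```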
External DTD
• Re-use of DTD.
• Easy to maintain
– single update
• Public DTD
<!DOCTYPE article PUBLIC "MyPublicDTD/Book" "http://www.csse.monash.edu.au/DTDs/maria/book.dtd">
• Local DTD
<!DOCTYPE article SYSTEM "book.dtd">
External DTD Example
• DTD file
<!ELEMENT friends (person)+>
<!ELEMENT person (personName,email) >
<!ELEMENT personName (firstName,surname) >
<!ELEMENT firstName (#PCDATA) >
<!ELEMENT surname (#PCDATA) >
<!ELEMENT email (#PCDATA) >
<!ATTLIST person title CDATA #IMPLIED >
• XML file
<?xml version="1.0" standalone="no"?>
<!DOCTYPE friends SYSTEM "friends.dtd">
<friends>
  <person title="Dr">
    <personName>
      <firstName> Jenny </firstName>
      <surname> Genius </surname>
    </personName>
    <email>[email protected]</email>
  </person>
</friends>
NOTE: the "standalone" attribute of the XML declaration must not be set to "yes" when the document depends on declarations in an external DTD (for example default attribute values or entities). If "standalone" is omitted and an external DTD is present, "no" is assumed.
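The identifiers in a DOCTYPE declaration can be read back with Python's xml.parsers.expat. This is a minimal sketch and the function name doctype_identifiers is hypothetical. By default expat only reports the identifiers and does not fetch the external DTD, so friends.dtd does not need to exist for the sketch to run.

```python
import xml.parsers.expat


def doctype_identifiers(document: str) -> dict:
    """Return the name, system ID and public ID from the DOCTYPE declaration."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.update(
            name=name,
            system_id=system_id,
            public_id=public_id,
            has_internal_subset=bool(has_internal_subset),
        )

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found


LOCAL = (
    '<?xml version="1.0" standalone="no"?>\n'
    '<!DOCTYPE friends SYSTEM "friends.dtd">\n'
    "<friends/>"
)
print(doctype_identifiers(LOCAL))
```

For a PUBLIC declaration, the public identifier is reported in public_id and the URL in system_id.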
Mixed DTDs
• Internal and external DTDs can be mixed.
• The external DTD reference has to come first, followed by the internal DTD subset in square brackets.
<!DOCTYPE article PUBLIC "MyPublicDTD/Book" "http://www.csse.monash.edu.au/DTDs/maria/book.dtd" [
DTD declarations …
]>
Conflict management:
– the internal DTD subset always takes priority
– an entity or attribute declaration in the internal DTD subset overrides the external declaration of the same name.
Limitations of DTD
• Non-XML syntax
• DTD is not Extensible
• Weak Data Typing
• No inheritance
• Possible solution: XML Schema