www.monash.edu.au CSE3201 Information Retrieval Systems DTD Document Type Definition


Valid XML Document

• A well-formed XML document complies with the syntax rules of the W3C XML Recommendation.

• A valid XML document is a well-formed XML document that also complies with the rules specified in a DTD or an XML Schema.

• The rules include:
  – Naming of elements and attributes.
  – Structure of the document.
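To see the difference between the two levels in practice: Python's standard-library parser `xml.etree.ElementTree` checks well-formedness only and does not validate against a DTD. A minimal sketch, where the helper name `is_well_formed` is hypothetical:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    ElementTree is non-validating: it reads an internal DTD subset but
    does not check the document against it.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<book><title>XML</title></book>"))  # True
print(is_well_formed("<book><title>XML</book>"))          # False: mismatched tags
# True: well-formed, although <book> is declared EMPTY in its own DTD
print(is_well_formed('<!DOCTYPE book [<!ELEMENT book EMPTY>]><book><title/></book>'))
```

The last call is the interesting one: the document is well-formed but not valid, and only a validating parser would reject it.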

DTD Example

<bookshop>
  <book>
    <title>Harry Potter and the Philosopher’s Stone</title>
    <author>
      <initials>J.K.</initials>
      <surname>Rowling</surname>
    </author>
    <price value="16.95"/>
  </book>
  …
</bookshop>

<!DOCTYPE bookshop [
  <!ELEMENT bookshop (book)+>
  <!ELEMENT book (title, author, price)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (initials, surname)>
  <!ELEMENT initials (#PCDATA)>
  <!ELEMENT surname (#PCDATA)>
  <!ELEMENT price EMPTY>
  <!ATTLIST price value CDATA #IMPLIED>
]>

DOCTYPE declaration

• It declares the name of the document type.
• The name must be the same as the root element’s name.

<!DOCTYPE bookshop [ … ]>

NOTE: a DOCTYPE declaration DOES NOT replace a declaration of a root element. The root element needs to be declared separately using an element declaration.

<!DOCTYPE bookshop [
  <!ELEMENT bookshop (book)+>
  …
]>
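The DOCTYPE/root rule can be checked by hand with Python's `xml.dom.minidom`, which records the DOCTYPE name but does not validate. A sketch; `doctype_matches_root` is a hypothetical helper name:

```python
from xml.dom import minidom


def doctype_matches_root(xml_text: str) -> bool:
    """Check that the DOCTYPE name equals the root element's name.

    minidom does not validate, so this rule is checked by hand here.
    """
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:  # no DOCTYPE declaration at all
        return False
    return doc.doctype.name == doc.documentElement.tagName


good = '<!DOCTYPE bookshop [<!ELEMENT bookshop (book)+>]><bookshop><book/></bookshop>'
bad = '<!DOCTYPE shop [<!ELEMENT shop ANY>]><bookshop/>'
print(doctype_matches_root(good))  # True
print(doctype_matches_root(bad))   # False
```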

ELEMENT declaration

<!ELEMENT elementName contentModel>

• Content model defines what can be included between the opening tag and the closing tag of an element.

• Content models: ANY, EMPTY, text only (simple), element only (complex) and mixed.

Content Model - Any

• An element declared with the ANY content model may contain anything, e.g. character data, comments and child elements (any child elements must still be declared in the DTD).

<!ELEMENT elementName ANY>

Content Model - Empty

• An element declared with the EMPTY content model must not contain any text or child elements.

<!ELEMENT elementName EMPTY>

• Attributes are not part of the content model, so an EMPTY element may still carry attributes.

XML: <img src='logo.png'/>

DTD:
<!ELEMENT img EMPTY>
<!ATTLIST img src CDATA #REQUIRED>

Content Model – Text only (Simple)

• An element declared with this content model can only contain textual data (simple string) and entity references.

<!ELEMENT elementName (#PCDATA)>

• Example:

XML:
<title>Harry Potter and the Philosopher’s Stone</title>

DTD:
<!ELEMENT title (#PCDATA)>

Content Model – Element Only (Complex)

• An element declared with this content model may only contain elements and entity references.

<!ELEMENT elementName (childElementName)+>

• Example:

XML:
<bookshop>
  <book/>
</bookshop>

DTD: <!ELEMENT bookshop (book)+>

Content Model - Mixed

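• An element declared with this content model may contain child elements interspersed with text. In a DTD, mixed content is declared as (#PCDATA | child1 | child2 …)*, with #PCDATA listed first and the group followed by *.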

• Example:

XML:
<anElement>text mixed with <childElement>child text</childElement></anElement>

DTD:
<!ELEMENT anElement (#PCDATA|childElement)*>
<!ELEMENT childElement (#PCDATA)>
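The content models above leave recognisable traces in an element instance. The sketch below, using Python's `xml.etree.ElementTree`, guesses which model a single instance fits; `content_category` is a hypothetical name, ANY is left out because it permits all of these, and a real validator checks the declared model rather than guessing:

```python
import xml.etree.ElementTree as ET


def content_category(elem: ET.Element) -> str:
    """Guess which content model one element instance fits.

    Sketch only: a DTD fixes the model per element *type*, while this
    looks at a single instance. ANY is not detectable this way.
    """
    children = list(elem)
    if not children:
        # EMPTY allows nothing at all, not even whitespace.
        return "empty" if not elem.text else "text only"
    # Character data directly inside elem: before the first child and
    # after each child (ElementTree stores the latter as child.tail).
    texts = [elem.text or ""] + [child.tail or "" for child in children]
    if any(t.strip() for t in texts):
        return "mixed"
    # Whitespace between children is allowed in element-only content.
    return "element only"


samples = [
    "<img src='logo.png'/>",
    "<title>Harry Potter and the Philosopher's Stone</title>",
    "<bookshop>\n  <book/>\n</bookshop>",
    "<anElement>text mixed with <childElement>child text</childElement></anElement>",
]
for s in samples:
    print(content_category(ET.fromstring(s)))
# prints: empty / text only / element only / mixed (one per line)
```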

Cardinality

• Defines how many times a child element (or a group of child elements) may appear within the declared element.

Operator    Description
(none)      Exactly one occurrence
?           Zero or one occurrence
*           Zero or more occurrences
+           One or more occurrences
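The cardinality operators can be expressed as a simple predicate on the number of occurrences. A sketch in Python (the function name `occurrences_allowed` is hypothetical); note that ?, * and + mean the same thing as in regular expressions:

```python
def occurrences_allowed(operator: str, count: int) -> bool:
    """Check a number of occurrences against a DTD cardinality operator.

    operator is "" (no operator), "?", "*" or "+".
    """
    if operator == "":
        return count == 1        # exactly one
    if operator == "?":
        return count in (0, 1)   # zero or one
    if operator == "*":
        return count >= 0        # zero or more
    if operator == "+":
        return count >= 1        # one or more
    raise ValueError(f"unknown cardinality operator: {operator!r}")


for op in ["", "?", "*", "+"]:
    print(repr(op), [occurrences_allowed(op, n) for n in range(3)])
# '' [False, True, False]
# '?' [True, True, False]
# '*' [True, True, True]
# '+' [False, True, True]
```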

Sequence Indicator

• Defines the order in which child elements occur, or the alternatives among them.
• Possible indicators:
  – “Followed by” (AND): ,
  – “Choice of” (OR): |

Sequence Indicator – “Followed By”

DTD:

<!ELEMENT personName (title, firstName, middleName, lastName, suffix)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT middleName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT suffix (#PCDATA)>

XML:

<personName>
  <title>Mr</title>
  <firstName>John</firstName>
  <middleName>V</middleName>
  <lastName>Smart</lastName>
  <suffix>Jr</suffix>
</personName>

Sequence Indicator – “Choice of”

DTD:
<!ELEMENT personName ((Mr | Ms | Dr), firstName, middleName, lastName, (Jr | Sr))>
<!ELEMENT Mr EMPTY>
<!ELEMENT Ms EMPTY>
<!ELEMENT Dr EMPTY>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT middleName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT Jr EMPTY>
<!ELEMENT Sr EMPTY>

XML:
<personName>
  <Mr/>
  <firstName>John</firstName>
  <middleName>V</middleName>
  <lastName>Smart</lastName>
  <Jr/>
</personName>
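Because sequence (,), choice (|) and the cardinality operators combine like a regular expression over child element names, an element-only content model can be checked by translating it into a regex. This is a minimal sketch under that assumption, not how any particular validating parser works; `model_to_regex` and `children_match` are hypothetical names, and mixed content is not handled:

```python
import re

# One token of a content model: a name, a bracket, a separator or an operator.
TOKEN = re.compile(r"\s*([A-Za-z_:][\w.\-:]*|[(),|?*+])")


def model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only content model into a regular expression.

    The regex is matched against child names, each followed by one space.
    Sketch only: #PCDATA (mixed content), ANY and EMPTY are not handled.
    """
    parts = []
    pos = 0
    model = model.strip()
    while pos < len(model):
        m = TOKEN.match(model, pos)
        if m is None:
            raise ValueError(f"cannot parse content model at: {model[pos:]!r}")
        tok, pos = m.group(1), m.end()
        if tok == "(":
            parts.append("(?:")          # non-capturing group
        elif tok == ",":
            continue                     # "followed by" is plain concatenation
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)            # choice and cardinality map 1:1
        else:
            parts.append(f"(?:{re.escape(tok)} )")  # one child element
    return re.compile("".join(parts))


def children_match(model: str, child_names: list) -> bool:
    """True if the sequence of child element names satisfies the model."""
    subject = "".join(name + " " for name in child_names)
    return model_to_regex(model).fullmatch(subject) is not None


choice = "((Mr | Ms | Dr), firstName, middleName, lastName, (Jr | Sr))"
print(children_match(choice, ["Mr", "firstName", "middleName", "lastName", "Jr"]))  # True
print(children_match(choice, ["Ms", "firstName", "lastName", "Sr"]))                # False
print(children_match("(book)+", []))                                                # False
```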

Attribute Declaration

<!ATTLIST elementName
  attrName1 attrType1 attrDefault defaultValue1
  …
  attrNameN attrTypeN attrDefault defaultValueN>

DTD:

<!ELEMENT personName EMPTY>
<!ATTLIST personName
  title     CDATA #IMPLIED
  firstName CDATA #REQUIRED
  surname   CDATA #REQUIRED>

XML: <personName title="Dr" firstName="Jenny" surname="Genius"/>
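A sketch of what a validating parser does with an attribute-list declaration: read the declared defaults, then check each element instance for missing #REQUIRED attributes. The regex-based `parse_attlist` and `missing_required` helpers are hypothetical and cover only the types and defaults on these slides:

```python
import re
import xml.etree.ElementTree as ET

# One attribute definition: name, type and default declaration.
ATTDEF = re.compile(
    r'([\w:.-]+)\s+(CDATA|IDREF|ID|\([^)]*\))\s+'
    r'(#REQUIRED|#IMPLIED|(?:#FIXED\s+)?"[^"]*")'
)


def parse_attlist(decl: str):
    """Split one <!ATTLIST ...> declaration into (element name, definitions).

    Sketch only: handles the CDATA, ID, IDREF and enumerated types.
    """
    body = decl.strip()
    if not (body.startswith("<!ATTLIST") and body.endswith(">")):
        raise ValueError("not an ATTLIST declaration")
    element, rest = body[len("<!ATTLIST"):-1].split(None, 1)
    defs = {name: (atype, default) for name, atype, default in ATTDEF.findall(rest)}
    return element, defs


def missing_required(elem: ET.Element, defs: dict) -> list:
    """Names of #REQUIRED attributes that the element instance lacks."""
    return [name for name, (_, default) in defs.items()
            if default == "#REQUIRED" and name not in elem.attrib]


element, defs = parse_attlist("""<!ATTLIST personName
    title     CDATA #IMPLIED
    firstName CDATA #REQUIRED
    surname   CDATA #REQUIRED>""")
print(element, defs["surname"])  # personName ('CDATA', '#REQUIRED')
jenny = ET.fromstring('<personName title="Dr" firstName="Jenny" surname="Genius"/>')
print(missing_required(jenny, defs))                                             # []
print(missing_required(ET.fromstring('<personName firstName="Jenny"/>'), defs))  # ['surname']
```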

Attribute Types (a selection)

Type                 Description
CDATA                Character data (a simple string)
Enumerated values    One of a listed series of values, written as (value1 | value2 | …)
ID                   A unique identifier: no two ID values may be the same anywhere in the document
IDREF                A reference to an element that has an ID-type attribute with the same value
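The ID and IDREF rules can be checked with a single document-wide pass. In this Python sketch, the helper `id_problems` and the attribute names `perID` and `friendOf` are hypothetical:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def id_problems(root: ET.Element, id_attr: str, idref_attr: str) -> list:
    """List the ID/IDREF errors a validating parser would report.

    Sketch: assumes a single ID attribute name and a single IDREF
    attribute name; a real DTD may declare several of each.
    """
    ids = [e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None]
    # ID values must be unique across the whole document.
    problems = [f"duplicate ID {v!r}" for v, n in Counter(ids).items() if n > 1]
    known = set(ids)
    # Every IDREF must point at an ID that exists somewhere in the document.
    for e in root.iter():
        ref = e.get(idref_attr)
        if ref is not None and ref not in known:
            problems.append(f"dangling IDREF {ref!r}")
    return problems


doc = ET.fromstring(
    '<friends>'
    '<person perID="p1"/>'
    '<person perID="p2" friendOf="p1"/>'
    '<person perID="p2" friendOf="p9"/>'
    '</friends>'
)
print(id_problems(doc, "perID", "friendOf"))
# ["duplicate ID 'p2'", "dangling IDREF 'p9'"]
```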

Attribute Defaults

Value             Description
#REQUIRED         The attribute must appear in every instance of the element.
#IMPLIED          The attribute is optional.
#FIXED "value"    The attribute is optional. If it appears, its value must match the fixed value; if it does not appear, the parser may supply the fixed value.
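The three default declarations can be summarised as one decision per attribute occurrence. A hypothetical Python helper, `check_attribute`, sketches the rules in the table; it is not a parser API:

```python
from typing import Optional, Tuple


def check_attribute(default_decl: str, value: Optional[str],
                    fixed: Optional[str] = None) -> Tuple[bool, Optional[str]]:
    """Return (valid, effective value) for one attribute on one element.

    value is None when the attribute is absent from the instance;
    fixed is the value given after #FIXED in the declaration.
    """
    if default_decl == "#REQUIRED":
        return value is not None, value
    if default_decl == "#IMPLIED":
        return True, value            # optional, no default supplied
    if default_decl == "#FIXED":
        if value is None:
            return True, fixed        # absent: the parser may supply it
        return value == fixed, value  # present: must equal the fixed value
    raise ValueError(f"unsupported default declaration: {default_decl}")


print(check_attribute("#REQUIRED", None))           # (False, None)
print(check_attribute("#IMPLIED", None))            # (True, None)
print(check_attribute("#FIXED", None, fixed="Dr"))  # (True, 'Dr')
print(check_attribute("#FIXED", "Ms", fixed="Dr"))  # (False, 'Ms')
```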

#REQUIRED - Example

<?xml version="1.0"?>
<!DOCTYPE friends [
  <!ELEMENT friends (person)+>
  <!ELEMENT person (personName, email)>
  <!ELEMENT personName (firstName, surname)>
  <!ELEMENT firstName (#PCDATA)>
  <!ELEMENT surname (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ATTLIST person perID ID #REQUIRED>
]>
<friends>
  <person perID="p1">
    <personName>
      <firstName>Jenny</firstName>
      <surname>Genius</surname>
    </personName>
    <email>[email protected]</email>
  </person>
</friends>

#IMPLIED - Example

<?xml version="1.0"?>
<!DOCTYPE friends [
  <!ELEMENT friends (person)+>
  <!ELEMENT person (personName, email)>
  <!ELEMENT personName (firstName, surname)>
  <!ELEMENT firstName (#PCDATA)>
  <!ELEMENT surname (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ATTLIST person title CDATA #IMPLIED>
]>

… (next slide)

#IMPLIED - Example (continued)

<friends>
  <person title="Dr">
    <personName>
      <firstName>Jenny</firstName>
      <surname>Genius</surname>
    </personName>
    <email>[email protected]</email>
  </person>
  <person>
    <personName>
      <firstName>John</firstName>
      <surname>Howard</surname>
    </personName>
    <email>[email protected]</email>
  </person>
</friends>

#FIXED - Example

<?xml version="1.0"?>
<!DOCTYPE friends [
  <!ELEMENT friends (person)+>
  <!ELEMENT person (personName, email)>
  <!ELEMENT personName (firstName, surname)>
  <!ELEMENT firstName (#PCDATA)>
  <!ELEMENT surname (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ATTLIST person title CDATA #FIXED "Dr">
]>

#FIXED - Valid Instances

<friends>
  <person title="Dr">
    <personName>
      <firstName>Jenny</firstName>
      <surname>Genius</surname>
    </personName>
    <email>[email protected]</email>
  </person>
  <person>
    <personName>
      <firstName>John</firstName>
      <surname>Howard</surname>
    </personName>
    <email>[email protected]</email>
  </person>
</friends>

#FIXED - Invalid Instance

<friends>
  <person title="Ms">
    <personName>
      <firstName>Jenny</firstName>
      <surname>Genius</surname>
    </personName>
    <email>[email protected]</email>
  </person>
</friends>

This instance is invalid: title is declared #FIXED "Dr", so its value cannot be "Ms".

Entity

• An entity is a storage unit.
• Entities are declared in the DTD (except the five predefined entities &lt; &gt; &amp; &apos; &quot;) and are referenced from the DTD or the XML document.

Entity Example

<?xml version="1.0"?>
<!DOCTYPE footNote [
  <!ELEMENT footNote (#PCDATA)>
  <!ENTITY copy "&#xA9;2001">
  <!ENTITY uni "Monash University">
  <!ENTITY disclaimer "No warranty &copy; &uni;">
]>
<footNote>All &uni; websites contain the following disclaimer &quot;&disclaimer;&quot;
</footNote>
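Python's `xml.etree.ElementTree` (backed by expat) expands internal entities declared in the internal DTD subset, including entity references nested inside other entity values, so the footnote above can be parsed to see its replacement text. A sketch:

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE footNote [
<!ELEMENT footNote (#PCDATA)>
<!ENTITY copy "&#xA9;2001">
<!ENTITY uni "Monash University">
<!ENTITY disclaimer "No warranty &copy; &uni;">
]>
<footNote>All &uni; websites contain the following disclaimer &quot;&disclaimer;&quot;
</footNote>"""

# The parser substitutes each entity while reading the document, so the
# tree only ever contains the replacement text, never "&uni;" itself.
footnote = ET.fromstring(DOC)
print(footnote.text.strip())
# All Monash University websites contain the following disclaimer "No warranty ©2001 Monash University"
```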

External DTD

• Re-use of DTDs.
• Easy to maintain:
  – a single update applies to every document that uses the DTD.
• Public DTD
<!DOCTYPE article PUBLIC "MyPublicDTD/Book" "http://www.csse.monash.edu.au/DTDs/maria/book.dtd">

• Local DTD
<!DOCTYPE article SYSTEM "book.dtd">
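Python's `xml.dom.minidom` records the identifiers from a DOCTYPE declaration without fetching the external DTD, which makes it a convenient way to see how public and system identifiers are split. A sketch using the two declarations above; the helper name `identifiers` is hypothetical:

```python
from xml.dom import minidom


def identifiers(xml_text: str):
    """Return (name, public identifier, system identifier) of the DOCTYPE."""
    doctype = minidom.parseString(xml_text).doctype
    # minidom keeps the identifiers but does not fetch the external DTD.
    return doctype.name, doctype.publicId, doctype.systemId


public_doc = ('<!DOCTYPE article PUBLIC "MyPublicDTD/Book" '
              '"http://www.csse.monash.edu.au/DTDs/maria/book.dtd"><article/>')
local_doc = '<!DOCTYPE article SYSTEM "book.dtd"><article/>'
print(identifiers(public_doc))
# ('article', 'MyPublicDTD/Book', 'http://www.csse.monash.edu.au/DTDs/maria/book.dtd')
print(identifiers(local_doc))  # ('article', None, 'book.dtd')
```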

External DTD Example

• DTD file (friends.dtd)

<!ELEMENT friends (person)+>
<!ELEMENT person (personName, email)>
<!ELEMENT personName (firstName, surname)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ATTLIST person title CDATA #IMPLIED>

External DTD Example

• XML file

<?xml version="1.0" standalone="no"?>
<!DOCTYPE friends SYSTEM "friends.dtd">
<friends>
  <person title="Dr">
    <personName>
      <firstName>Jenny</firstName>
      <surname>Genius</surname>
    </personName>
    <email>[email protected]</email>
  </person>
</friends>

NOTE: when an external DTD is used, set the XML declaration attribute “standalone” to “no” (or omit it: “no” is assumed when external markup declarations are present).

Mixed DTDs

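• Internal and external DTD subsets can be mixed.
• In the DOCTYPE declaration, the external identifier comes first, followed by the internal DTD subset in square brackets:

<!DOCTYPE article PUBLIC "MyPublicDTD/Book" "http://www.csse.monash.edu.au/DTDs/maria/book.dtd"
[ DTD declarations …]>

Conflict management:
  – The internal DTD subset always takes priority, because it is read before the external subset.
  – An entity or attribute-list declaration in the internal subset therefore overrides the same declaration in the external subset (an element type, however, may be declared only once).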
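The precedence rule can be modelled as "first declaration wins, internal subset read first". A sketch in Python that models the rule rather than parsing a DTD; `effective_entities` is a hypothetical helper and the entity `year` is invented for illustration:

```python
from typing import Dict, Iterable, Tuple

Decl = Tuple[str, str]  # (entity name, replacement text)


def effective_entities(internal: Iterable[Decl],
                       external: Iterable[Decl]) -> Dict[str, str]:
    """Resolve entity declarations spread over both DTD subsets.

    The internal subset counts as occurring before the external one,
    and the first declaration of an entity is binding.
    """
    merged: Dict[str, str] = {}
    for name, value in [*internal, *external]:
        merged.setdefault(name, value)  # later duplicates are ignored
    return merged


external = [("uni", "Monash University"), ("year", "2001")]
internal = [("uni", "Monash University, Clayton")]
print(effective_entities(internal, external))
# {'uni': 'Monash University, Clayton', 'year': '2001'}
```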

Limitations of DTD

• Non-XML syntax
• DTDs are not extensible
• Weak data typing
• No inheritance

• Possible solution: XML Schema