Module 2: XML Basics (XML, Namespaces, Usage scenarios, DTDs)

History: SGML vs. HTML vs. XML

• SGML (roots in the 1960s with GML; ISO standard in 1986)
• HTML (1990), an application of SGML
• XML (first draft in 1996; W3C Recommendation in 1998), a simplified subset of SGML
• XHTML (2000), HTML reformulated in XML

http://www.w3.org/TR/2006/REC-xml-20060816/

Why XML?

• HTML is meant to be interpreted by browsers and shown on the screen to a human.
• Desire to separate the “content” from the “presentation”:
  • Presentation has to please the human eye.
  • Content can be interpreted by machines; for machines, presentation is a handicap.
• Semantic markup of the data.

Information about a book in HTML

<td><h1 class="Books">Politics of experience by Ronald Laing, published in 1967</h1></td>
<td align="right" nowrap> Item number:320070381076</td>
<td align="right" valign="top"><img src="http://pics.booksstatic.com/aw/pics/globalAssets/rtCurve.gif" width="8" height="8"></td></tr>
<tr><td colspan="6" valign="middle" bgcolor="#5F66EE"><img src="http://pics.booksstatic.com/aw/pics/s.gif" width="1" height="4"></td></tr></table>
<table width="100%" border="0" cellpadding="0" cellspacing="0"><tr>
<td bgcolor="#CCCCFF"><img src="http://pics.booksstatic.com/aw/pics/s.gif" width="1" height="1"></td>
<td bgcolor="#EEEEFF"><div id="FastVIPBIBO"><table border="0" cellpadding="0" cellspacing="0" width="100%">

The same information in XML

<book year="1967">
  <title>Politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>

• Information is (1) decoupled from presentation, then (2) chopped into smaller pieces, and then (3) marked with semantic meaning.
• It can be processed by machines.
• Like HTML, XML is only a syntax, not a logical abstract data model.
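As a minimal sketch of "it can be processed by machines", the book record above can be read with Python's standard `xml.etree.ElementTree` module. The helper name `describe_book` and the dictionary layout are assumptions made for this illustration only; they are not part of the slides.

```python
import xml.etree.ElementTree as ET

# The book record from the slide, with straight quotes as XML requires.
BOOK_XML = """
<book year="1967">
  <title>Politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>
"""


def describe_book(xml_text):
    """Pull the semantically marked-up pieces out of a <book> element."""
    book = ET.fromstring(xml_text)
    first = book.findtext("author/firstname")
    last = book.findtext("author/lastname")
    return {
        # Attribute values are always strings; XML itself has no integer type.
        "year": book.get("year"),
        "title": book.findtext("title"),
        "author": f"{first} {last}",
    }


info = describe_book(BOOK_XML)
print(info)
```

Note that the year comes back as the string "1967": any conversion to a number is the application's decision, not something the XML syntax provides.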

XML key concepts

• Documents
• Elements
• Attributes
• Namespace declarations
• Text
• Comments
• Processing Instructions

All inherited from SGML, then HTML.

The key concepts of XML

<book year="1967">
  <title>Politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>

• Documents
• Elements
• Attributes
• Text

• Nested structure
• Conceptual tree
• Order is important
• Only “characters”, not integers, etc.

Elements

• Enclosed in tags
  • Begin tag: e.g., <bibliography>
  • End tag: e.g., </bibliography>
  • Element without content: e.g., <bibliography/> is shorthand for <bibliography></bibliography>
• Elements can be nested
    <bib> <book> Wilde Wutz </book> </bib>
• Subelements can implement multisets
    <bib> <book> ... </book> <book> ... </book> </bib>
• Order is important!
• Documents must be well-formed
  • <a> <b> </a> </b> is forbidden (tags overlap instead of nesting)
  • <a> <b> </b> is forbidden (<a> is never closed)
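The well-formedness rules above can be tried out with any XML parser. The following is a small sketch using Python's standard `xml.etree.ElementTree`; the function name `is_well_formed` is an assumption for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for overlapping tags, unclosed elements, and similar errors.
        return False


print(is_well_formed("<bib><book>Wilde Wutz</book></bib>"))  # properly nested
print(is_well_formed("<a><b></a></b>"))                      # overlapping tags
print(is_well_formed("<a><b></b>"))                          # <a> never closed
```

A parser reports only that the document is not well-formed; it does not try to repair it, which is a deliberate difference from how browsers treat HTML.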

Attributes

• Attributes are associated with elements
    <book price="55" year="1967">
      <title> ... </title>
      <author> ... </author>
    </book>
• Elements can have only attributes
    <person name="Wutz" age="33"/>
• Attribute names must be unique within an element! (No multisets)
    <person name="Wilde" name="Wutz"/> is illegal!
• What is the difference between a nested element and an attribute? Are attributes useful?
• Modeling decision: should “name” be an attribute or a subelement of a person? What about “age”?
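One way to explore the modeling question is to see how a parser treats the two options. This sketch (standard-library Python; the helper `parses` and the variable names are assumptions for the example) shows that attributes behave like a dictionary with unique keys, while subelements may repeat.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Attributes: an unordered set of uniquely named string values.
person = ET.fromstring('<person name="Wutz" age="33"/>')
print(person.attrib)

# A repeated attribute name is a well-formedness error.
print(parses('<person name="Wilde" name="Wutz"/>'))

# Subelements: repetition is allowed, and document order is preserved.
as_elements = ET.fromstring("<person><name>Wilde</name><name>Wutz</name></person>")
names = [n.text for n in as_elements.findall("name")]
print(names)
```

So a property that may occur several times, or that needs internal structure, points toward a subelement; a single atomic value such as "age" can reasonably be an attribute.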

Text and Mixed Content

• Text appears in element content
    <title>The politics of experience</title>
• Text can be mixed with other subelements
    <title>The politics of <em>experience</em></title>
• Mixed content
  • Very useful for “document” data
  • The need does not arise in “data” processing, which deals only with entities and relationships
• People speak in sentences, not in entities and relationships. XML makes it possible to preserve the structure of natural language while adding semantic markup that machines can interpret.

Continuous spectrum between natural language, semi-structured data, and structured data

1. Dana said that the book entitled “The politics of experience” is really excellent!
2. <citation author="Dana">The book entitled “The politics of experience” is really excellent!</citation>
3. <citation author="Dana">The book entitled <title>The politics of experience</title> is really excellent!</citation>
4. <citation>
     <author>Dana</author>
     <aboutTitle>The politics of experience</aboutTitle>
     <rating>excellent</rating>
   </citation>
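Variant 3 of the spectrum is mixed content: text and a subelement side by side. As a sketch of how a program sees it, Python's `xml.etree.ElementTree` stores the text before the first child in `.text` and the text following a child in that child's `.tail`. The variable names are assumptions for this example.

```python
import xml.etree.ElementTree as ET

citation = ET.fromstring(
    '<citation author="Dana">The book entitled '
    "<title>The politics of experience</title>"
    " is really excellent!</citation>"
)
title = citation.find("title")

before = citation.text   # text before the <title> subelement
marked = title.text      # the semantically marked-up part
after = title.tail       # text after </title>, still inside <citation>

# The natural-language sentence survives intact when the markup is ignored.
sentence = "".join(citation.itertext())
print(sentence)
```

The sentence structure is preserved, and a machine can still pick out the title without understanding English.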

CDATA sections

• Sometimes we would like to preserve the original characters and not interpret them as markup
• CDATA sections are not parsed as XML

Parsed as markup:
    <message>
      <greeting>Hello, world!</greeting>
    </message>

Preserved as characters:
    <message>
      <![CDATA[<greeting>Hello, world!</greeting>]]>
    </message>
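The difference between the two messages becomes visible once they are parsed. A short sketch with Python's standard library (variable names are assumptions for the example):

```python
import xml.etree.ElementTree as ET

# Without CDATA, <greeting> is markup and becomes a child element.
plain = ET.fromstring("<message><greeting>Hello, world!</greeting></message>")

# With CDATA, the same characters are plain text content of <message>.
cdata = ET.fromstring(
    "<message><![CDATA[<greeting>Hello, world!</greeting>]]></message>"
)

print(len(plain), plain[0].tag)   # one child element named "greeting"
print(len(cdata), cdata.text)     # no children; the tag characters are text
```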

Comments, PIs, Prolog

• Comment: syntax as in HTML
    <!-- this is a comment -->
• Processing Instructions (PIs)
  • Contain no data; they are interpreted by the processor
  • Syntax: <?pause 10 secs ?>
  • “pause” is the target; “10 secs” is the content
  • Targets beginning with “xml” are reserved; <?xml ... ?> is used for the prolog
• Prolog
    <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
  • standalone="yes" states that the document does not depend on external markup declarations (such as an external DTD)
  • The encoding is usually a Unicode encoding such as UTF-8

Whitespace declaration

• Whitespace = continuous sequence of space, tab, and line-break (carriage return, line feed) characters
• Special attribute xml:space to control its use
• Human-readable XML (with whitespace)
    <book xml:space="preserve">
      <title>The politics of experience</title>
      <author>Ronald Laing</author>
    </book>
• (Efficient) machine-readable XML (no whitespace)
    <book xml:space="default"><title>The politics of experience</title><author>Ronald Laing</author></book>
• Performance improvement: approximately a factor of 2.

Language declaration

<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>

<p xml:lang="en-GB">What colour is it?</p>

<p xml:lang="en-US">What color is it?</p>
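A sketch of how an application might read the language declarations above. The `xml` prefix is predefined and bound to the namespace `http://www.w3.org/XML/1998/namespace`, so `xml.etree.ElementTree` exposes the attribute under that expanded name. The wrapping `<doc>` element and the variable names are assumptions added only to make the fragment a single document.

```python
import xml.etree.ElementTree as ET

# Namespace that the reserved "xml" prefix is always bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"

doc = ET.fromstring(
    "<doc>"
    '<p xml:lang="en-GB">What colour is it?</p>'
    '<p xml:lang="en-US">What color is it?</p>'
    "</doc>"
)

# Map each declared language tag to the paragraph text.
by_language = {p.get(f"{{{XML_NS}}}lang"): p.text for p in doc}
print(by_language)
```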

Uniform Resource Identifiers on the Web: URLs, URIs, IRIs

• URL (Uniform Resource Locator): dereferenceable identifier on the Web
  • The target of a URL pointer is typically an HTML file (virtual or materialized)
• URI (Uniform Resource Identifier): general-purpose key to resources on the Web
  • Uniquely identifies a resource
  • The target need not be an HTML file; it can be anything (schema, table, file, entity, object, tuple, person, physical item, etc.)
  • Lifetime and scope of this “key” are user dependent
• IRI (Internationalized Resource Identifier)
  • Allows non-Latin characters (Chinese, Arabic, Japanese, etc.)
• URLs, URIs, IRIs
  • All strings
  • Very LONG strings

Namespaces

• Integration of data from diverse data sources
• Integration of different XML vocabularies (aka namespaces)
• Each “vocabulary” has a unique key, identified by a URI/IRI
• The same local name from different vocabularies can have
  • a different meaning
  • a different structure associated with it
• Qualified names (QNames) attach a “name” to its “vocabulary”
  • for the named nodes in an XML document (elements and attributes; PI targets are plain names without a prefix)
• QName ::= triple ( URI [ prefix: ] localname )
  • The binding (prefix, URI) is introduced in an element's start tag
  • Later only the prefix is used, not the long URI
  • The prefix is optional (default namespaces)
  • Prefix and localname are separated by “:”

“http://w3.org/TR/1999/REC-xml-names”

Namespaces (cont.)

• Namespace definitions look like attributes
  • Identified by “xmlns:prefix” or “xmlns” (default)
  • Bind the prefix to the URI
• Scope is the entire element where the namespace is declared
  • Includes the element itself, its attributes, and its subtrees
• Example
    <ns:a xmlns:ns="someURI" ns:b="foo">
      <ns:b>content</ns:b>
    </ns:a>
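To see that a qualified name is really the pair (URI, local name) and that the prefix is only an abbreviation, the example can be parsed with Python's `xml.etree.ElementTree`, which writes expanded names as `{URI}localname`. The second document with prefix `x` is an assumption added for the comparison.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<ns:a xmlns:ns="someURI" ns:b="foo"><ns:b>content</ns:b></ns:a>'
)

print(root.tag)      # the element name, expanded with its namespace URI
print(root.attrib)   # the prefixed attribute is expanded the same way
print(root[0].tag)   # the child element ns:b

# A different prefix bound to the same URI yields the same expanded name.
other = ET.fromstring('<x:a xmlns:x="someURI"/>')
same_name = other.tag == root.tag
print(same_name)
```

Note that the element `ns:b` and the attribute `ns:b` do not clash: one is an element name, the other an attribute name.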

Default namespaces

• Default namespaces, no prefix
    <a xmlns="someURI">
      <b/> <!-- a and b are in the someURI namespace! -->
    </a>
• Only applies to elements, not attributes
    <a xmlns="someURI" c="not in someURI namespace">
      <b/> <!-- a and b are in the someURI namespace! -->
    </a>
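The rule that a default namespace covers elements but not attributes can be checked directly. A minimal sketch with the standard library (variable names are assumptions for the example):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<a xmlns="someURI" c="not in someURI namespace"><b/></a>')

element_names = [root.tag, root[0].tag]
attribute_names = list(root.attrib)

print(element_names)    # both elements carry the default namespace
print(attribute_names)  # the unprefixed attribute stays in no namespace
```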

Example: Namespaces

• DQ1 defines dish for china: diameter, volume, decor, ...
• DQ2 defines dish for satellites: diameter, frequency
• How many “dishes” are there? The question is ambiguous.
• Better to ask “How many dishes are there?” with “dishes” taken from the DQ1 vocabulary, or the same question with “dishes” taken from the DQ2 vocabulary.

Example: Namespaces

<gs:dish xmlns:gs="http://china.com">
  <gs:dm gs:unit="cm">20</gs:dm>
  <gs:vol gs:unit="l">5</gs:vol>
  <gs:decor>Meissner</gs:decor>
</gs:dish>

<sat:dish xmlns:sat="http://satelite.com">
  <sat:dm>200</sat:dm>
  <sat:freq>20-2000MHz</sat:freq>
</sat:dish>
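With namespaces, the question "how many dishes are there?" can be asked precisely. The sketch below puts both dishes into one document; the wrapping `<catalog>` element and the prefix map `NS` are assumptions for this illustration, while the namespace URIs are those of the slide.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <gs:dish xmlns:gs="http://china.com">
    <gs:dm gs:unit="cm">20</gs:dm>
    <gs:vol gs:unit="l">5</gs:vol>
    <gs:decor>Meissner</gs:decor>
  </gs:dish>
  <sat:dish xmlns:sat="http://satelite.com">
    <sat:dm>200</sat:dm>
    <sat:freq>20-2000MHz</sat:freq>
  </sat:dish>
</catalog>
"""

# Prefixes used in the queries; they need not match those in the document.
NS = {"gs": "http://china.com", "sat": "http://satelite.com"}

root = ET.fromstring(CATALOG)
china_dishes = root.findall("gs:dish", NS)
satellite_dishes = root.findall("sat:dish", NS)

# Ignoring the namespace mixes up two unrelated kinds of "dish".
any_dishes = [el for el in root if el.tag.endswith("}dish")]

print(len(china_dishes), len(satellite_dishes), len(any_dishes))
```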

Mixing Several Namespaces

<gs:dish xmlns:gs="http://china.com"
         xmlns:uom="http://units.com">
  <gs:dm uom:unit="cm">20</gs:dm>
  <gs:vol uom:unit="l">5</gs:vol>
  <gs:decor>Meissner</gs:decor>
  <comment>This is an unqualified element name</comment>
</gs:dish>

Example XML data

• XHTML (browser/presentation)
• RSS (blogs)
• UBL (Universal Business Language)
• Health Level 7 (HL7, medical data)
• XBRL (financial data)
• Digital photography metadata (XMP)
• XMI (metadata)
• XQueryX (programs)
• XForms (forms)
• SOAP (message envelopes)
• Microsoft Office, e.g. PowerPoint in XML (documents)

XHTML

[Image not available]

RSS, blogs

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.xml.com/xml/news.rss">
    <title>XML.com</title>
    <link>http://xml.com/pub</link>
    <description>
      XML.com features a rich mix of information and services for the XML community.
    </description>
    <image rdf:resource="http://xml.com/universal/images/xml_tiny.gif" />
    <items>
      <rdf:Seq>
        <rdf:li resource="http://xml.com/pub/2000/08/09/xslt/xslt.html" />
        <rdf:li resource="http://xml.com/pub/2000/08/09/rdfdb/index.html" />
      </rdf:Seq>
    </items>
    <textinput rdf:resource="http://search.xml.com" />
  </channel>
  <image rdf:about="http://xml.com/universal/images/xml_tiny.gif">
    <title>XML.com</title>
    <link>http://www.xml.com</link>
    <url>http://xml.com/universal/images/xml_tiny.gif</url>
  </image>
  <!-- remaining elements omitted on the slide -->
</rdf:RDF>

UBL (Universal Business Language)

Vocabulary definitions for:

• ApplicationResponse
• AttachedDocument
• BillOfLading
• Catalogue
• CatalogueDeletion
• CatalogueItemSpecificationUpdate
• CataloguePricingUpdate
• CatalogueRequest
• CertificateOfOrigin
• CreditNote
• DebitNote
• DespatchAdvice
• ForwardingInstructions
• FreightInvoice
• Invoice
• Order
• OrderCancellation
• OrderChange
• OrderResponse
• OrderResponseSimple
• PackingList
• Quotation
• ReceiptAdvice
• Reminder
• RemittanceAdvice
• RequestForQuotation
• SelfBilledCreditNote
• SelfBilledInvoice
• Statement
• TransportationStatus
• Waybill

Health Level 7 (HL7)

• Medical information that is exchanged between hospitals, patients, doctors, pharmacies, and insurance companies

http://en.wikipedia.org/wiki/HL7

XBRL (Financial information)

• Goal: facilitate the exchange of business and financial performance information between companies, governments, insurance companies, banks, etc.
• Mandated by law in many countries

http://en.wikipedia.org/wiki/XBRL

Extensible Metadata Platform (XMP)

• Used in PDF, photography, and photo editing applications.
• Defines particular schemas for basic properties that are useful for recording the history of a resource as it passes through multiple processing steps, from being photographed, scanned, or authored as text, through photo editing steps (such as cropping or color adjustment), to assembly into a final image.
• XMP allows each software program or device along the way to add its own information to a digital resource, which can then be retained in the final digital file.

http://en.wikipedia.org/wiki/Extensible_Metadata_Platform

Microsoft Office in XML

• Office 2003 could import and export all documents as XML
• Office 2007 models the documents NATIVELY in XML
• Examples of vocabularies and schemas: WordprocessingML (the XML file format for Word 2003), SpreadsheetML (Excel 2003), FormTemplate XML schemas (InfoPath 2003), and DataDiagramingML (Visio 2003)

Forms on the Web in XML

• XML Forms (XForms)
• http://www.w3.org/TR/xforms/

<xforms:model>
  <xforms:instance>
    <ecommerce xmlns="">
      <method/>
      <number/>
      <expiry/>
    </ecommerce>
  </xforms:instance>
  <xforms:submission action="http://example.com/submit"
                     method="post" id="submit"/>
</xforms:model>

Programs and queries in XML

• XQuery, the XML query language, has an XML representation
• Programs and queries are also DATA
• Blurring the distinction between data, metadata, and code

Excerpt:

<xqx:functionName>distinct</xqx:functionName>
<xqx:parameters>
  <xqx:expr xsi:type="xqx:pathExpr">
    <xqx:expr xsi:type="xqx:functionCallExpr">
      <xqx:functionName>document</xqx:functionName>
      <xqx:parameters>
        <xqx:expr xsi:type="xqx:stringConstantExpr">
          <xqx:value>http://www.bn.com</xqx:value>
        </xqx:expr>
      </xqx:parameters>
    </xqx:expr>
    <xqx:stepExpr>
      <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis>
      <xqx:elementTest>
        <xqx:nodeName>
          <xqx:QName>author</xqx:QName>
        </xqx:nodeName>
      </xqx:elementTest>
    </xqx:stepExpr>
  </xqx:expr>

SOAP and Web Services

• Web services are a popular way of exchanging information between applications
• XML exchange over HTTP, with a specific protocol (SOAP)

<?xml version='1.0' ?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header>
    <m:reservation xmlns:m="http://travelcompany.example.org/reservation"
        env:role="http://www.w3.org/2003/05/soap-envelope/role/next"
        env:mustUnderstand="true">
      <m:reference>uuid:093a2da1-q345-739r-ba5d-pqff98fe8j7d</m:reference>
      <m:dateAndTime>2001-11-29T13:20:00.000-05:00</m:dateAndTime>
    </m:reservation>
    <n:passenger xmlns:n="http://mycompany.example.com/employees"
        env:role="http://www.w3.org/2003/05/soap-envelope/role/next"
        env:mustUnderstand="true">
      <n:name>Åke Jógvan Øyvind</n:name>
    </n:passenger>
  </env:Header>
  <env:Body/>
</env:Envelope>

The need for XML “schemas”

• Unlike most other data formats, XML is totally flexible: elements can be nested in arbitrary ways
• We can start by writing the XML data, with no need for a priori design of a schema
  • Contrast this with relational databases or Java classes
• However, schemas are necessary to:
  • Facilitate the writing of applications that process data
  • Constrain the data that is correct for a certain application
  • Have a priori agreements between parties with respect to the data being exchanged
• Schema: a model of the data
  • Structural definitions
  • Type definitions
  • Defaults

History and role of XML Schema Languages

• Several standard schema languages
  • DTDs, XML Schema, RELAX NG
• Schema languages have been designed after, and orthogonally to, XML itself
• Schemas and data are completely decoupled in XML
  • Data can exist with or without schemas
  • Or with multiple schemas
  • Schema evolution rarely requires evolving the data
• Schemas can be designed before the data, or extracted from the data (DataGuide, Stanford)
• This makes XML the right choice for manipulating semi-structured data, rapidly evolving data, or highly customizable data

DTDs

• Inherited from SGML
• Part of the original XML 1.0 specification
• Describe the “grammar” of the XML file
  • Element declarations: rules and constraints on how elements are allowed to nest within each other
  • Attribute lists: describe which attributes are allowed on which element
  • Some constraints on the values of elements and attributes
  • Which element is the root of the XML file
• Checking the structural constraints: DTD validation (valid vs. invalid documents)
• DTDs were very useful for a while but have largely been superseded because of several major limitations

Declaring the structure of elements

• A grammar describes the structure of the element
• Subelements are identified by name, or by #PCDATA for text
• Combinators:
  • “+” for at least 1
  • “*” for 0 or more
  • “?” for 0 or 1
  • “,” for concatenation
  • “|” for choice
    <!ELEMENT a ( (b | c)* , d? , e ) >
• #PCDATA: only textual content allowed
    <!ELEMENT a (#PCDATA)>
• EMPTY: the element must be empty
    <!ELEMENT a EMPTY>
• ANY: allows any content
    <!ELEMENT a ANY>
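A DTD content model is a regular expression over child element names. As a rough sketch of what a validator checks, the rule `( (b | c)* , d? , e )` is translated by hand below into a Python regular expression and matched against the sequence of child names. Python's standard library does not validate against DTDs, so the names `CONTENT_MODEL_A` and `children_match` and the whole approach are assumptions for illustration, not a real validator.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of:  <!ELEMENT a ( (b | c)* , d? , e ) >
# Every child name is followed by one space so that names cannot run together.
CONTENT_MODEL_A = re.compile(r"(?:b |c )*(?:d )?e ")


def children_match(element, model):
    """Check whether the ordered child names of element satisfy the model."""
    sequence = "".join(child.tag + " " for child in element)
    return model.fullmatch(sequence) is not None


valid = ET.fromstring("<a><b/><c/><b/><d/><e/></a>")
minimal = ET.fromstring("<a><e/></a>")
two_d = ET.fromstring("<a><d/><d/><e/></a>")
wrong_order = ET.fromstring("<a><e/><b/></a>")

for doc in (valid, minimal, two_d, wrong_order):
    print(children_match(doc, CONTENT_MODEL_A))
```

The last two documents are well-formed XML but would be invalid with respect to this declaration: validity is a stronger property than well-formedness.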

Example DTD for recipes

<!ELEMENT collection (description,recipe*)>
<!ELEMENT description ANY>
<!ELEMENT recipe (title,ingredient*,preparation,comment?,nutrition)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT ingredient (ingredient*,preparation)?>
<!ELEMENT preparation (step*)>
<!ELEMENT step (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT nutrition EMPTY>

Defining the attribute lists

Structure: <!ATTLIST ElementName definition>

<!ATTLIST ingredient
  name   CDATA #REQUIRED
  amount CDATA #IMPLIED
  unit   CDATA #FIXED "cup">

CDATA means plain character data (any text)

#REQUIRED and #IMPLIED state whether the attribute is mandatory or optional

A default value can be specified

Attributes (cont.)

#REQUIRED
  Document must specify a value for the attribute

#IMPLIED
  Attribute is optional; there is no default value

"value"
  Default value, used if no other value is specified

#FIXED "value"
  Default value, used if no value is specified
  If a value is specified, it must be the fixed value
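
The four kinds of attribute defaults can be summed up as one small decision function. This is a hand-written sketch of the rules listed above, not a parser API. The function name resolve and the mode label "DEFAULT" (standing for a plain quoted default value) are assumptions made for the example.

```python
from typing import Optional


def resolve(mode: str, declared: Optional[str], given: Optional[str]) -> Optional[str]:
    """Return the effective attribute value, or raise ValueError if invalid.

    mode     -- "#REQUIRED", "#IMPLIED", "#FIXED", or "DEFAULT" (hypothetical
                label for a plain default value in the ATTLIST)
    declared -- the value written in the ATTLIST, if any
    given    -- the value written in the document, or None if absent
    """
    if mode == "#REQUIRED":
        if given is None:
            raise ValueError("attribute is required")
        return given
    if mode == "#IMPLIED":
        return given  # optional, no default: stays None when absent
    if mode == "#FIXED":
        if given is not None and given != declared:
            raise ValueError("value must equal the fixed value")
        return declared
    if mode == "DEFAULT":
        return declared if given is None else given
    raise ValueError("unknown default declaration: " + mode)


print(resolve("#IMPLIED", None, None))    # None
print(resolve("DEFAULT", "cup", None))    # cup
print(resolve("DEFAULT", "cup", "spoon")) # spoon
print(resolve("#FIXED", "cup", None))     # cup
```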

Major attribute types

CDATA: plain text content

ID
  Value is unique within the document
  An element has at most one attribute of this type
  No default values allowed (must be #REQUIRED or #IMPLIED)

IDREF, IDREFS
  References to other elements (by their ID) within the document
  IDREFS: list of references, separated by whitespace

ID and IDREF attributes

<!ATTLIST book
  isbn  ID     #REQUIRED
  price CDATA  #IMPLIED
  index IDREFS #IMPLIED>

<book isbn="b1" index="b2 b3"/>
<book isbn="b2" index="b3"/>
<book isbn="b3"/>

ID values must be XML names, so they cannot start with a digit
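
The two ID/IDREF rules (IDs are unique, every reference points at an existing ID) can be checked by hand. The sketch below assumes the ID attribute is called isbn and the IDREFS attribute is called index, as in the ATTLIST declaration; the wrapper element library, the values b1 to b9, and the function check_ids are invented for the example. A validating parser would perform these checks itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; "library" is an invented wrapper element.
GOOD = """<library>
  <book isbn="b1" index="b2 b3"/>
  <book isbn="b2" index="b3"/>
  <book isbn="b3"/>
</library>"""

BAD = """<library>
  <book isbn="b1" index="b9"/>
  <book isbn="b1"/>
</library>"""


def check_ids(root, id_attr="isbn", refs_attr="index"):
    """Return a list of ID/IDREF violations found below root."""
    errors = []
    seen = set()
    # First pass: every ID value must be unique within the document.
    for element in root.iter():
        value = element.get(id_attr)
        if value is None:
            continue
        if value in seen:
            errors.append("duplicate ID " + value)
        seen.add(value)
    # Second pass: every whitespace-separated reference must match some ID.
    for element in root.iter():
        for ref in element.get(refs_attr, "").split():
            if ref not in seen:
                errors.append("dangling IDREF " + ref)
    return errors


print(check_ids(ET.fromstring(GOOD)))  # []
print(check_ids(ET.fromstring(BAD)))   # ['duplicate ID b1', 'dangling IDREF b9']
```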

Attributes list example

<!ELEMENT ingredient (ingredient*,preparation)?>
<!ATTLIST ingredient
  name   CDATA #REQUIRED
  amount CDATA #IMPLIED
  unit   CDATA #IMPLIED>
<!ELEMENT nutrition EMPTY>
<!ATTLIST nutrition
  protein       CDATA #REQUIRED
  carbohydrates CDATA #REQUIRED
  fat           CDATA #REQUIRED
  calories      CDATA #REQUIRED
  alcohol       CDATA #IMPLIED>

Mixed content in DTDs

Mixing #PCDATA with other subelements in a declaration means that the content can be "mixed"

<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>

<p>some text <em>some emphasized text</em> blah <b>some bold text</b> </p>
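
Mixed content means text and elements alternate inside the same parent. As a rough illustration, Python's ElementTree exposes this through the text attribute (text before the first child) and the tail attribute (text after a child's end tag). The helper name pieces is made up for this sketch.

```python
import xml.etree.ElementTree as ET

P = "<p>some text <em>some emphasized text</em> blah <b>some bold text</b> </p>"


def pieces(element):
    """List the text runs and child elements of a mixed-content element in order."""
    result = []
    if element.text:
        result.append(("text", element.text))
    for child in element:
        result.append((child.tag, child.text))
        if child.tail:  # text that follows the child's end tag
            result.append(("text", child.tail))
    return result


for kind, content in pieces(ET.fromstring(P)):
    print(kind, repr(content))
# text 'some text '
# em 'some emphasized text'
# text ' blah '
# b 'some bold text'
# text ' '
```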

Declarations of DTDs

No DTD (well-formed documents)

DTD inside the document:
<!DOCTYPE name [definition]>

DTD external, specified by URI:
<!DOCTYPE name SYSTEM "demo.dtd">

DTD external, public name and URI (in XML the URI is required after the public name):
<!DOCTYPE name PUBLIC "Demo" "demo.dtd">

DTD inside the document + external:
<!DOCTYPE name1 SYSTEM "demo.dtd" [definition]>

Correctness of XML documents

Well-formed documents
  Satisfy the basic XML constraints (e.g. <a></b> is not well formed)

Valid documents
  Additionally satisfy the DTD structural constraints

Documents that are not well formed cannot be processed

Non-valid documents can still be processed (queried, transformed, etc.)
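
The difference shows up directly in a non-validating parser. In this sketch, Python's standard ElementTree parser (which checks well-formedness only) rejects <a></b>, but happily parses a document that violates its own DTD, so that document can still be queried. The helper name is_well_formed and the sample documents are invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML (no DTD validation is done)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Not valid: the DTD demands a single b child, but the document contains c.
NOT_VALID = """<!DOCTYPE a [
  <!ELEMENT a (b)>
  <!ELEMENT b EMPTY>
]>
<a><c/></a>"""

print(is_well_formed("<a></b>"))    # False: mismatched tags
print(is_well_formed(NOT_VALID))    # True: well formed, although not valid

# The non-valid document can still be processed.
child_tags = [child.tag for child in ET.fromstring(NOT_VALID)]
print(child_tags)                   # ['c']
```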

Limitations of DTDs

DTDs describe only the "grammar" of the XML file, not the detailed structure and/or types

This grammatical description has some obvious shortcomings:
  We cannot express that a "length" element must contain a non-negative number (constraints on the type of the value of an element or attribute)
  The "unit" attribute should only be allowed when "amount" is present (co-occurrence constraints)
  The "comment" element should be allowed to appear anywhere (schema flexibility)
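
Constraints like the first two have to be checked in application code when only a DTD is available. The sketch below shows one way such checks might look. It assumes that length is an element with numeric text and that unit and amount are attributes of ingredient; the wrapper element data and the function extra_checks are invented for the example.

```python
import xml.etree.ElementTree as ET


def extra_checks(root):
    """Report violations of two constraints that a DTD cannot express."""
    problems = []
    # Type constraint: "length" must contain a non-negative number.
    for element in root.iter("length"):
        try:
            ok = float(element.text) >= 0
        except (TypeError, ValueError):  # empty element or non-numeric text
            ok = False
        if not ok:
            problems.append("bad length: " + repr(element.text))
    # Co-occurrence constraint: "unit" is only allowed together with "amount".
    for element in root.iter("ingredient"):
        if element.get("unit") is not None and element.get("amount") is None:
            problems.append("unit without amount: " + element.get("name", "?"))
    return problems


DOC = """<data>
  <length>12.5</length>
  <length>-3</length>
  <ingredient name="flour" amount="2" unit="cup"/>
  <ingredient name="salt" unit="pinch"/>
</data>"""

print(extra_checks(ET.fromstring(DOC)))
# ["bad length: '-3'", 'unit without amount: salt']
```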

Good Schema design principles

The XML schema language shall be
1. more expressive than XML DTDs
2. expressed in XML
3. self-describing
4. usable by a wide variety of applications that employ XML
5. straightforwardly usable on the Internet
6. optimized for interoperability
7. simple enough to be implemented with modest design and runtime resources
8. coordinated with relevant W3C specs

Recapitulation

XML as inheriting from the Web history
  SGML, HTML, XHTML, XML

XML key concepts
  Documents, elements, attributes, text
  Order, nested structure, textual information

Namespaces

XML usage scenarios
  Financial, medical, metadata, blogs, etc.

DTDs and the need for describing the "structure" of an XML file

Next: XML Schemas