about XML/Xquery/RDF

HTML vs. XML

HTML:

    <h1> Bibliography </h1>
    <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
        <br> Addison Wesley, 1995
    <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
        <br> Morgan Kaufmann, 1999

XML:

    <bibliography>
      <book>
        <title> Foundations… </title>
        <author> Abiteboul </author>
        <author> Hull </author>
        <author> Vianu </author>
        <publisher> Addison Wesley </publisher>
        <year> 1995 </year>
      </book>
      …
    </bibliography>

XML is “self-describing”
• Schema info is part of the data
• Good for data exchange (albeit baroque for storage)

Why are Database folks so excited about XML?

• XML is just a syntax for (self-describing) data
• This is still exciting because
  – No standard syntax for relational data
  – With XML, we can
     • Translate any legacy data to XML
     • Exchange data in XML format
        – Ship over the web, input to any application

XML: machine-accessible meaning (slides credited to Jim Hendler)

• This is what a web page in natural language looks like for a machine
• XML allows “meaningful tags” (e.g. education, private) to be added to parts of the text
• But to your machine, the tags look like this…
• Schemas help… by relating common terms between documents
• But other people use other schemas
  – Someone else has one like this… which doesn't fit in
• Moral: there is still a need for ontology mapping

[Figures omitted: a sample web page and its tags as seen by a machine]

The X-standards…

• XML: an on-the-wire representation for data
  – XQuery: a query language for XML
  – XSchema: a schema description language for XML data
• RDF: a language for meta-data description
• WSDL/SOAP/UDDI: languages for describing services

XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document: single root element

well-formed XML document: one whose tags match (and nest properly)
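
The well-formedness rule above can be checked mechanically with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the three sample strings are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML: one root element, matching nested tags."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Illustrative inputs (assumed for this sketch)
good = "<book><title>Foundations of Databases</title><red/></book>"
mismatched = "<book><title>Foundations of Databases</book></title>"  # tags cross
two_roots = "<book/><book/>"                                         # no single root

for sample in (good, mismatched, two_roots):
    print(is_well_formed(sample), sample)
```

Note that this only tests well-formedness; checking a document against a DTD or schema (validity) is a separate step that this sketch does not attempt.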


HTML describes presentation

XML describes content


More XML: Attributes

    <book price="55" currency="USD">
      <title> Foundations of Databases </title>
      <author> Abiteboul </author>
      …
      <year> 1995 </year>
    </book>

• Attributes are single-valued
• No guidance on when to use them

More XML: Oids and References

• Object identifiers (oids) and references in XML are just syntax

XML vs. Relational Data
• XML is meant as a language that supports both text and structured data
  – Conflicting demands...
• XML supports semi-structured data
  – In essence, the schema can be a union of multiple schemas
     • Easy to represent books with or without prices, books with any number of authors etc.
• XML supports free mixing of text and data
  – using the #PCDATA type
• XML is ordered (while relational data is unordered)

[Figure: a spectrum from less structure to more structure, placing XML relative to structured (relational) data]

DTDs

    <!DOCTYPE paper [
      <!ELEMENT paper (section*)>
      <!ELEMENT section ((title,section*) | text)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT text (#PCDATA)>
    ]>

Notice that the DTD is not in XML syntax…

[Figure label: semi-structured]

XML Schemas

• More recent proposal (with XML syntax)
• unifies previous schema proposals
• generalizes DTDs
• uses XML syntax
• two documents: structure and datatypes
  – http://www.w3.org/TR/xmlschema-1
  – http://www.w3.org/TR/xmlschema-2

RDF: Meta-data Standard for Web

    <rdf:Description about="www.mypage.com">
      <about> birds, butterflies, snakes </about>
      <author>
        <rdf:Description>
          <firstname> John </firstname>
          <lastname> Smith </lastname>
        </rdf:Description>
      </author>
    </rdf:Description>

[Figure: the same description drawn as a graph. The node www.mypage.com has an "about" edge to "birds, butterflies, snakes" and an "author" edge to a node whose "firstname" and "lastname" edges lead to John and Smith.]

Good ol' semantic networks..?
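
The graph reading of the description can be made concrete by flattening it into (subject, property, object) edges. This is only a sketch: the `xmlns:rdf` declaration is added here so the snippet parses (the slide omits it), the blank-node labels such as `_:b1` and the helper `triples` are made up for illustration, and a real RDF parser handles much more than this.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DESCRIPTION = f"{{{RDF_NS}}}Description"

# The slide's description, plus an xmlns:rdf declaration (assumption) so it parses.
doc = """
<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 about="www.mypage.com">
  <about> birds, butterflies, snakes </about>
  <author>
    <rdf:Description>
      <firstname> John </firstname>
      <lastname> Smith </lastname>
    </rdf:Description>
  </author>
</rdf:Description>
"""

def triples(node, subject, counter):
    """Yield (subject, property, object) edges for one rdf:Description node."""
    for prop in node:
        nested = prop.find(DESCRIPTION)
        if nested is None:
            # Leaf property: the object is the literal text.
            yield (subject, prop.tag, (prop.text or "").strip())
        else:
            # Nested description: invent a label for the anonymous node.
            counter[0] += 1
            blank = f"_:b{counter[0]}"
            yield (subject, prop.tag, blank)
            yield from triples(nested, blank, counter)

root = ET.fromstring(doc)
edges = list(triples(root, root.get("about"), [0]))
for edge in edges:
    print(edge)
```

Each printed tuple corresponds to one labelled edge of the graph, which is why the slide compares RDF to semantic networks.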

Querying XML
• Requirements:
  – Need to handle lack of schema.
     • We may not know much about the data, so we need to navigate the XML.
  – Need to support both “information retrieval” and “SQL-style” queries.
     • Ordered vs. un-ordered XML
  – “Human readable”
     • like SQL?
• Candidates
  – Many… based on conflicting requirements
     • XSL: Makes IR folks happy
     • XML-QL: Makes DB folks happy
     • XQuery: W3C’s attempt to make everybody (un)happy

Agenda: Xquery examples

Information Integration

Xquery Resources
• XQuery 1.0: An XML Query Language
  – W3C Working Draft 20 December 2001
• XML Query Use Cases
  – W3C Working Draft 20 December 2001
• Microsoft .Net Xquery Language Demo
  – http://131.107.228.20/
  – Supports querying on the documents described in the W3C Use Cases
• Xquery Tutorial by Fankhauser & Wadler
  – www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf

FLoWeR Expressions

XQuery queries are made up of FLWR expressions that work on “paths”

• For binds variables to nodes, one node at a time
• Let binds a variable to a whole sequence (typically used to compute aggregates)
• Where applies a formula to find matching elements
• Return constructs the output elements

Path expressions are of the form: element//element/element[@attrib = value]
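
For readers more at home in a general-purpose language, the four FLWR clauses map onto an ordinary loop. The sketch below is an analogy only, written in Python with the standard `xml.etree.ElementTree`; the document is a trimmed copy of the bib.xml sample (publisher, price and affiliation dropped), and the "at least two authors" condition is an assumption chosen to exercise the Let clause.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bib.xml sample used later in these slides.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
  <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first></editor></book>
</bib>"""

root = ET.fromstring(BIB)
results = []
for book in root.findall("book"):        # For: bind $b to each node on the path /bib/book
    authors = book.findall("author")     # Let: bind $a to the whole sequence $b/author
    if len(authors) >= 2:                # Where: keep bindings that satisfy a condition
        # Return: construct the output for this binding
        results.append((book.findtext("title"), len(authors)))

print(results)
```

The point of the analogy is that For iterates while Let does not: `authors` is bound once per book to the full list, which is what makes aggregates such as a count possible.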

Comparison to SQL
• Look at the use case description in the XQuery manual
• Supports all (?) SQL style queries (with different syntax of course) [default queries in the demo]
• Has support for
  – “construction”: outputting the answers in arbitrary XML formats (use case XMP)
  – “path expressions”: navigating the XML tree (use case SEQ)
  – Simple text queries (use case TEXT)
  – Allows queries on “Tag” elements
     • Removes the “data/meta-data” barrier in queries
• Example: for each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors. [XMP use case 6]
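
To illustrate what “construction” means in XMP use case 6, here is a rough Python equivalent of the requested output, not the W3C XQuery solution itself. The function name `first_two_authors` is hypothetical and the input is a trimmed copy of the bib.xml sample.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bib.xml sample (publisher and price omitted).
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
  <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first></editor></book>
</bib>"""

def first_two_authors(bib):
    """Build a new <bib>: title, first two authors, and <et-al/> when more exist."""
    out = ET.Element("bib")
    for book in bib.findall("book"):
        authors = book.findall("author")
        if not authors:              # books with only editors are skipped
            continue
        item = ET.SubElement(out, "book")
        item.append(book.find("title"))
        item.extend(authors[:2])
        if len(authors) > 2:
            ET.SubElement(item, "et-al")   # empty marker element
    return out

result = first_two_authors(ET.fromstring(BIB))
print(ET.tostring(result, encoding="unicode"))
```

The output has a shape that exists nowhere in the input, which is the capability the slide calls construction.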

DTD for http://www.bn.com/bib.xml

    <!ELEMENT bib (book* )>
    <!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
    <!ATTLIST book year CDATA #REQUIRED >
    <!ELEMENT author (last, first )>
    <!ELEMENT editor (last, first, affiliation )>
    <!ELEMENT title (#PCDATA )>
    <!ELEMENT last (#PCDATA )>
    <!ELEMENT first (#PCDATA )>
    <!ELEMENT affiliation (#PCDATA )>
    <!ELEMENT publisher (#PCDATA )>
    <!ELEMENT price (#PCDATA )>

Example Query

Query:

    <bib> {
      for $b in /bib/book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      return <book year={ $b/@year }> { $b/title } </book>
    } </bib>

“For all Addison-Wesley books published after 1991, return the year (as an attribute) and the title”

Result:

    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
      </book>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
      </book>
    </bib>

Example Query (2)
• Return the books that cost more at amazon than fatbrain

    let $amazon := document("http://www.amazon.com/books.xml")
    let $fatbrain := document("http://www.fatbrain.com/books.xml")
    for $am in $amazon/books/book,
        $fat in $fatbrain/books/book
    where $am/isbn = $fat/isbn and $am/price > $fat/price
    return <book>{ $am/title, $am/price, $fat/price }</book>

XML frenzy in the DB Community

• Now that XML is there, what can we do with it?
  – Convert all databases from Relational to XML?
     • Or provide XML views of relational databases?
  – Develop theory of native XML databases?
     • Or assume that XML data will be stored in relational databases..
  – Issues: What sort of storage mechanisms? What sort of indices?

XML middleware for Databases
• XML adapters (middle-ware) received significant attention in the DB community
  – SilkRoute (AT&T)
  – Xperanto (IBM)
• Issues:
  – Need to convert relational data into XML
     • Tagging (easy)
  – Need to convert XQuery queries into equivalent SQL queries
     • Trickier as XQuery supports schema querying

[Figure: an adapter sitting between XQuery on one side and relations on the other]

Xquery Tutorial

Craig Knoblock, University of Southern California



Data for www.bn.com/bib.xml

    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
        <author><last>Stevens</last><first>W.</first></author>
        <publisher>Addison-Wesley</publisher>
        <price> 65.95</price>
      </book>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
        <author><last>Stevens</last><first>W.</first></author>
        <publisher>Addison-Wesley</publisher>
        <price>65.95</price>
      </book>
      <book year="2000">
        <title>Data on the Web</title>
        <author><last>Abiteboul</last><first>Serge</first></author>
        <author><last>Buneman</last><first>Peter</first></author>
        <author><last>Suciu</last><first>Dan</first></author>
        <publisher>Morgan Kaufmann Publishers</publisher>
        <price> 39.95</price>
      </book>
      <book year="1999">
        <title>The Economics of Technology and Content for Digital TV</title>
        <editor>
          <last>Gerbarg</last><first>Darcy</first>
          <affiliation>CITI</affiliation>
        </editor>
        <publisher>Kluwer Academic Publishers</publisher>
        <price>129.95</price>
      </book>
    </bib>

Document References

• A document can either be referenced explicitly or in the default namespace
• In the Microsoft Demo
  – /bib = document("http://www.bn.com/bib.xml")/bib
• We will use /bib throughout, but you must use the expansion to run the demo
• In Theseus the document for the XQuery is passed as input

Projection
• Return the names of all authors of books

    /bib/book/author
    =
    <author><last>Stevens</last><first>W.</first></author>
    <author><last>Stevens</last><first>W.</first></author>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>

Projection (cont.)
• The same query can also be written as a for loop

    /bib/book/author
    =
    for $bk in /bib/book return
      for $aut in $bk/author return $aut
    =
    <author><last>Stevens</last><first>W.</first></author>
    <author><last>Stevens</last><first>W.</first></author>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
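
The equivalence between the path form and the nested for loops can be tried out outside an XQuery engine. The sketch below uses Python's `xml.etree.ElementTree` on a trimmed copy of bib.xml (publisher and price dropped); since `root` is the `<bib>` element, the path "book/author" stands in for /bib/book/author.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bib.xml sample.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
  <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first></editor></book>
</bib>"""

root = ET.fromstring(BIB)

# Path form: /bib/book/author
by_path = root.findall("book/author")

# Loop form: for $bk in /bib/book return for $aut in $bk/author return $aut
by_loop = [aut for bk in root.findall("book") for aut in bk.findall("author")]

last_names = [aut.findtext("last") for aut in by_path]
print(last_names)
```

Both forms visit the same author elements in document order; the book with only an editor contributes nothing.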

Selection
• Return the titles of all books published before 1997

    /bib/book[@year < "1997"]/title
    =
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix environment</title>

Selection (cont.)
• Return the titles of all books published before 1997

    /bib/book[@year < "1997"]/title
    =
    for $bk in /bib/book
    where $bk/@year < "1997"
    return $bk/title
    =
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix environment</title>

Selection (cont.)
• Return book with the title “Data on the Web”

    /bib/book[title = "Data on the Web"]
    =
    <book year="2000">
      <title>Data on the Web</title>
      <author><last>Abiteboul</last><first>Serge</first></author>
      <author><last>Buneman</last><first>Peter</first></author>
      <author><last>Suciu</last><first>Dan</first></author>
      <publisher>Morgan Kaufmann Publishers</publisher>
      <price> 39.95</price>
    </book>

Selection (cont.)
• Return the price of the book “Data on the Web”

    /bib/book[title = "Data on the Web"]/price
    =
    <price> 39.95</price>

How would you return the book with a price of $39.95?

Selection (cont.)
• Return the book with a price of $39.95 (note the leading space in the string: it must match the text of the price element exactly)

    for $bk in /bib/book
    where $bk/price = " 39.95"
    return $bk
    =
    <book year="2000">
      <title>Data on the Web</title>
      <author><last>Abiteboul</last><first>Serge</first></author>
      <author><last>Buneman</last><first>Peter</first></author>
      <author><last>Suciu</last><first>Dan</first></author>
      <publisher>Morgan Kaufmann Publishers</publisher>
      <price> 39.95</price>
    </book>
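
The price query compares strings, so the leading space in `" 39.95"` matters. The sketch below reproduces that behaviour in Python on a trimmed copy of bib.xml with the original whitespace in the price elements preserved; the helper `titles` is a made-up convenience, and the numeric variant is an alternative added here, not something the slides use.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of bib.xml; whitespace inside <price> is kept exactly as in the data.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><price> 65.95</price></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
  <book year="2000"><title>Data on the Web</title><price> 39.95</price></book>
  <book year="1999"><title>The Economics of Technology and Content for Digital TV</title><price>129.95</price></book>
</bib>"""

root = ET.fromstring(BIB)
books = root.findall("book")

def titles(selected):
    return [b.findtext("title") for b in selected]

# String comparison without the space finds nothing ...
exact = titles(b for b in books if b.findtext("price") == "39.95")
# ... while the slide's literal, with its leading space, matches.
padded = titles(b for b in books if b.findtext("price") == " 39.95")
# Comparing as numbers sidesteps the whitespace issue.
numeric = titles(b for b in books if float(b.findtext("price")) == 39.95)
# Same idea for the year selection: compare @year as a number.
before_1997 = titles(b for b in books if int(b.get("year")) < 1997)

print(exact, padded, numeric, before_1997)
```

The year queries on the earlier slides also compare against a string ("1997"); that happens to work for four-digit years, but a numeric comparison states the intent more directly.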

Construction
• Return year and title of all books published before 1997

    for $bk in /bib/book
    where $bk/@year < "1997"
    return <book>{ $bk/@year, $bk/title }</book>
    =
    <book year="1994">
      <title>TCP/IP Illustrated</title>
    </book>
    <book year="1992">
      <title>Advanced Programming in the Unix environment</title>
    </book>

Grouping
• Return titles for each author

    for $author in distinct(/bib/book/author/last)
    return <author name={ $author/text() }>
             { /bib/book[author/last = $author]/title }
           </author>
    =
    <author name="Stevens">
      <title>TCP/IP Illustrated</title>
      <title>Advanced Programming in the Unix environment</title>
    </author>
    <author name="Abiteboul">
      <title>Data on the Web</title>
    </author>
    …
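
The grouping query is the XML counterpart of a GROUP BY. As a rough illustration in Python (not a translation the slides provide), a dictionary keyed on the author's last name collects the titles; the `<authors>` wrapper element is an assumption added so the output has a single root.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bib.xml sample.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
</bib>"""

root = ET.fromstring(BIB)

# distinct(/bib/book/author/last): dict keys are unique and keep first-seen order.
groups = {}
for book in root.findall("book"):
    for last in book.findall("author/last"):
        groups.setdefault(last.text, []).append(book.findtext("title"))

# Construct one <author name="..."> element per group.
out = ET.Element("authors")   # hypothetical wrapper, not in the slide's output
for name, book_titles in groups.items():
    author = ET.SubElement(out, "author", name=name)
    for t in book_titles:
        ET.SubElement(author, "title").text = t

print(ET.tostring(out, encoding="unicode"))
```

The XQuery version re-scans /bib/book once per distinct author, whereas this sketch builds all groups in a single pass; the results are the same for this data.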

Join
• Return the books that cost more at amazon than fatbrain

    let $amazon := document("http://www.amazon.com/books.xml")
    let $fatbrain := document("http://www.fatbrain.com/books.xml")
    for $am in $amazon/books/book,
        $fat in $fatbrain/books/book
    where $am/isbn = $fat/isbn and $am/price > $fat/price
    return <book>{ $am/title, $am/price, $fat/price }</book>
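
The slides do not show the two books.xml documents, so the following Python sketch invents small stand-ins (the ISBNs, titles and prices are all hypothetical) purely to show the join logic. It also swaps the query's nested-loop formulation for a dictionary lookup on isbn, and compares prices as numbers.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two sources; only the books/book/{isbn,title,price}
# shape is taken from the query.
AMAZON = """<books>
  <book><isbn>111</isbn><title>Book A</title><price>30.00</price></book>
  <book><isbn>222</isbn><title>Book B</title><price>20.00</price></book>
</books>"""
FATBRAIN = """<books>
  <book><isbn>111</isbn><title>Book A</title><price>25.00</price></book>
  <book><isbn>222</isbn><title>Book B</title><price>22.00</price></book>
  <book><isbn>333</isbn><title>Book C</title><price>10.00</price></book>
</books>"""

amazon = ET.fromstring(AMAZON)
fatbrain = ET.fromstring(FATBRAIN)

# Index one side on the join key (where $am/isbn = $fat/isbn).
fat_by_isbn = {b.findtext("isbn"): b for b in fatbrain.findall("book")}

pricier = []
for am in amazon.findall("book"):
    fat = fat_by_isbn.get(am.findtext("isbn"))
    # and $am/price > $fat/price, compared numerically
    if fat is not None and float(am.findtext("price")) > float(fat.findtext("price")):
        pricier.append((am.findtext("title"), am.findtext("price"), fat.findtext("price")))

print(pricier)
```

Books present in only one source (Book C here) drop out, as in an inner join in SQL.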

Example Query 1

    <bib> {
      for $b in /bib/book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      return <book year={ $b/@year }> { $b/title } </book>
    } </bib>

What does this do?

Result Query 1

    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
      </book>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
      </book>
    </bib>

Example Query 2

    <results> {
      for $b in document("http://www.bn.com/bib.xml")/bib/book,
          $t in $b/title,
          $a in $b/author
      return <result> { $t } { $a } </result>
    } </results>

Result Query 2 (one result per title/author pair; { $a } copies the whole author element)

    <results>
      <result>
        <title>TCP/IP Illustrated</title>
        <author><last>Stevens</last><first>W.</first></author>
      </result>
      <result>
        <title>Advanced Programming in the Unix environment</title>
        <author><last>Stevens</last><first>W.</first></author>
      </result>
      <result>
        <title>Data on the Web</title>
        <author><last>Abiteboul</last><first>Serge</first></author>
      </result>
      <result>
        <title>Data on the Web</title>
        <author><last>Buneman</last><first>Peter</first></author>
      </result>
      <result>
        <title>Data on the Web</title>
        <author><last>Suciu</last><first>Dan</first></author>
      </result>
    </results>

Example Query 3

    <books-with-prices> {
      for $b in document("http://www.bn.com/bib.xml")//book,
          $a in document("http://www.amazon.com/reviews.xml")//entry
      where $b/title = $a/title
      return
        <book-with-prices>
          { $b/title }
          <price-amazon>{ $a/price/text() }</price-amazon>
          <price-bn>{ $b/price/text() }</price-bn>
        </book-with-prices>
    } </books-with-prices>

Result Query 3

    <books-with-prices>
      <book-with-prices>
        <title>TCP/IP Illustrated</title>
        <price-amazon>65.95</price-amazon>
        <price-bn> 65.95</price-bn>
      </book-with-prices>
      <book-with-prices>
        <title>Advanced Programming in the Unix environment</title>
        <price-amazon>65.95</price-amazon>
        <price-bn>65.95</price-bn>
      </book-with-prices>
      <book-with-prices>
        <title>Data on the Web</title>
        <price-amazon>34.95</price-amazon>
        <price-bn> 39.95</price-bn>
      </book-with-prices>
    </books-with-prices>

Example Query 4

    <bib> {
      for $b in document("www.bn.com/bib.xml")//book
      where $b/publisher = "Addison-Wesley" and $b/@year > "1991"
      return <book> { $b/@year } { $b/title } </book>
      sortby (title)
    } </bib>

Example Result 4

    <bib>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
      </book>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
      </book>
    </bib>
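
Query 4 combines selection, construction and sorting. A rough Python analogue, offered as a sketch rather than as the tutorial's own material, runs the same three steps on a trimmed copy of bib.xml (authors and prices omitted); `iter("book")` plays the role of //book and the year is compared as a number.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bib.xml sample.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title>
    <publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title>
    <publisher>Morgan Kaufmann Publishers</publisher></book>
  <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
    <publisher>Kluwer Academic Publishers</publisher></book>
</bib>"""

root = ET.fromstring(BIB)

# where $b/publisher = "Addison-Wesley" and $b/@year > 1991
selected = [b for b in root.iter("book")
            if b.findtext("publisher") == "Addison-Wesley" and int(b.get("year")) > 1991]

# sortby (title)
selected.sort(key=lambda b: b.findtext("title"))

# return <book> { $b/@year } { $b/title } </book>, wrapped in <bib>
result = ET.Element("bib")
for b in selected:
    item = ET.SubElement(result, "book", year=b.get("year"))
    item.append(b.find("title"))

print(ET.tostring(result, encoding="unicode"))
```

Sorting happens on the selected books before the output elements are built; for this data that yields the same order as sorting the constructed elements by title.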

Impact of XML on Integration

• If and when all sources accept XQueries and exchange data in XML format, then
  – Mediator can accept user queries in XQuery
  – Access sources using XQuery
  – Get data back in XML format
  – Merge results and send to user in XML format
• How about now?
  – Sources can use XML adapters (middle-ware)

[Figure: a mediator sending XQuery to sources, with adapters translating between XQuery and relations]

Is XML standardization a magical solution for Integration?

• If all Web sources standardize into XML format
  – Source access (wrapper generation) issues become easier to manage
  – BUT all other problems remain
     • Still need to relate source (XML) schemas to the mediator (XML) schema
     • Still need to reason about source overlap, source access limitations etc.
     • Still need to manage execution in the presence of source/network uncertainties

[Figure: architecture of an information integration system (mediator), queried via XQuery.
• Sources: services, web pages, structured data, sensors (streaming data)
• Executor: needs to handle source/network interruptions, runtime uncertainty, replanning
• Source Fusion/Query Planning: needs to handle multiple objectives, service composition, source quality & overlap
• Supporting knowledge: source trust; ontologies; source/service descriptions
• Monitor: issues probing queries and updates statistics
• Flows between components: query, answers, replanning requests, and a partly cut-off label ("Prefere…", "…tility")]

“Semantic Web”

• The LAV/GAV approaches assume that some human expert will do the actual schema mapping
• The “semantic-web” initiative attempts to automate schema mapping
  – Idea: Allow pages to write logical axioms relating their vocabulary (tags) to other external tags
  – Support automatic inference of relations between source and mediator schema using these rules
     • DAML+OIL
