about XML/Xquery/RDF


HTML vs. XML

<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999

<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>

“Self-describing”

-Schema info part of the data

-Good for data exchange

(albeit baroque for storage)

Why are Database folks so excited about XML?

• XML is just a syntax for (self-describing) data
• This is still exciting because
  – There is no standard syntax for relational data
  – With XML, we can
    • Translate any legacy data to XML
    • Exchange data in XML format
      – Ship over the web, input to any application

XML machine accessible meaning

This is what a web page in natural language looks like to a machine

Jim Hendler

[Figure: a CV page with the parts name, education, work, private, each enclosed in tags]

XML allows “meaningful tags” to be added to parts of the text

[Figure: the same tagged CV as a machine sees it]

But to your machine, the tags look like this…

Schemas help… by relating common terms between documents

[Figure: two tagged CVs (name, education, work, private) whose common tags are related to each other]

But other people use other schemas

[Figure: a CV (name, education, work, private) tagged under a different schema]

Someone else has one like this…

…which don’t fit in

[Figure: the two related CVs next to a third CV whose tags do not line up with theirs]

Moral: There is still need for ontology mapping.

11/18

The X-standards…

• XML: an on-the-wire representation for data
  – Xquery: a query language for XML
  – Xschema: a schema description language for XML data
• RDF: a language for meta-data description
• WSDL/SOAP/UDDI: languages for describing services

XML Terminology

• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document: single root element

Well-formed XML document: one whose tags match and nest properly
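As a small illustration of well-formedness, the sketch below uses Python's standard xml.etree.ElementTree parser, which rejects documents with mismatched tags or more than one root. The helper name is_well_formed and the sample strings are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML with one root and matching tags."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<book><title>Foundations</title><red/></book>"
bad_nesting = "<book><title>Foundations</book></title>"  # end tags cross over
two_roots = "<book/><book/>"                              # more than one root

print(is_well_formed(good), is_well_formed(bad_nesting), is_well_formed(two_roots))
```

Well-formedness is purely syntactic; it says nothing about whether the tags follow any schema.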

HTML describes presentation:

<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999

XML describes content:

<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>


More XML: Attributes

<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
</book>

• Attributes are single-valued
• No guidance on when to use them

More XML: Oids and References

<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name> <children idref="o123 o555"/> </person>
<person id="o123" mother="o456"> <name> John </name> </person>

Object identifiers: the id values (o555, o456, o123)

Oids and references in XML are just syntax
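Because oids and references are "just syntax", the application has to resolve them itself. The sketch below shows one way to do that in Python: build a dictionary from id values to elements, then follow idref and mother. The wrapping <people> root and the helper children_of are assumptions added so that the fragment parses as one document.

```python
import xml.etree.ElementTree as ET

DOC = """<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>"""

root = ET.fromstring(DOC)

# Index every person by its object identifier.
by_id = {p.get("id"): p for p in root.iter("person")}

def children_of(person):
    """Follow the space-separated idref list of a person's <children> element."""
    node = person.find("children")
    if node is None:
        return []
    return [by_id[ref].findtext("name") for ref in node.get("idref").split()]

mary_children = children_of(by_id["o456"])
john_mother = by_id[by_id["o123"].get("mother")].findtext("name")
print(mary_children, john_mother)
```

Nothing in the parser checks that a reference points at an existing id; a dangling reference would surface here as a KeyError.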

XML vs. Relational Data

• XML is meant as a language that supports both Text and Structured Data
  – Conflicting demands...
• XML supports semi-structured data
  – In essence, the schema can be a union of multiple schemas
    • Easy to represent books with or without prices, books with any number of authors etc.
• XML supports free mixing of text and data
  – using the #PCDATA type
• XML is ordered (while relational data is unordered)

[Figure: a scale from "Less Structure" (text) to "More Structure" (structured relational data), with XML placed on it]

DTDs

<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>

<paper>
  <section> <text> </text> </section>
  <section>
    <title> </title>
    <section> … </section>
    <section> … </section>
  </section>
</paper>

Notice that a DTD is not in XML syntax…

Semi-structured
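To make the content model of the paper DTD concrete, here is a hand-written checker for exactly those four element declarations. It is only a sketch: Python's xml.etree.ElementTree parses but does not validate against a DTD, so the functions valid_section and valid_paper are hypothetical helpers, and they ignore stray text between child elements.

```python
import xml.etree.ElementTree as ET

def valid_section(sec) -> bool:
    """section ((title, section*) | text); title and text hold only character data."""
    kids = list(sec)
    tags = [k.tag for k in kids]
    if tags == ["text"]:
        return len(kids[0]) == 0
    if tags[:1] == ["title"] and all(t == "section" for t in tags[1:]):
        return len(kids[0]) == 0 and all(valid_section(k) for k in kids[1:])
    return False

def valid_paper(root) -> bool:
    """paper (section*): zero or more valid sections and nothing else."""
    return root.tag == "paper" and all(
        child.tag == "section" and valid_section(child) for child in root
    )

ok = ET.fromstring(
    "<paper><section><text>hi</text></section>"
    "<section><title>T</title><section><text>a</text></section></section></paper>"
)
bad = ET.fromstring("<paper><section><title>T</title><text>x</text></section></paper>")
print(valid_paper(ok), valid_paper(bad))
```

The second document fails because a section may contain either a title followed by sections, or a single text element, but not a title followed by text.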

XML Schemas

• More recent proposal (with XML syntax)
• unifies previous schema proposals
• generalizes DTDs
• uses XML syntax
• two documents: structure and datatypes
  – http://www.w3.org/TR/xmlschema-1
  – http://www.w3.org/TR/xmlschema-2

RDF: Meta-data Standard for Web

<rdf:Description about="www.mypage.com">
  <about> birds, butterflies, snakes </about>
  <author>
    <rdf:Description>
      <firstname> John </firstname>
      <lastname> Smith </lastname>
    </rdf:Description>
  </author>
</rdf:Description>

[Figure: the same description drawn as a graph. The node www.mypage.com has an "about" edge to "birds, butterflies, snakes" and an "author" edge to a node with "firstname" John and "lastname" Smith]

Good ol’ semantic networks..?
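The graph reading of the RDF description can be written down as a list of (subject, predicate, object) triples. This is a minimal sketch in plain Python, not an RDF library; the label "_:a" for the unnamed author node and the helper objects are assumptions made for the example.

```python
# Each statement is a (subject, predicate, object) triple.
# "_:a" is a made-up label for the unnamed (blank) author node.
triples = [
    ("www.mypage.com", "about", "birds, butterflies, snakes"),
    ("www.mypage.com", "author", "_:a"),
    ("_:a", "firstname", "John"),
    ("_:a", "lastname", "Smith"),
]

def objects(subject, predicate):
    """Return all objects reachable from subject over edges labelled predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

author = objects("www.mypage.com", "author")[0]
full_name = " ".join(objects(author, "firstname") + objects(author, "lastname"))
print(full_name)
```

Following edges by label like this is what makes the structure look like a semantic network.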

Querying XML

• Requirements:
  – Need to handle lack of schema.
    • We may not know much about the data, so we need to navigate the XML.
  – Need to support both “information retrieval” and “SQL-style” queries.
    • Ordered vs. un-ordered XML
  – “Human readable”
    • like SQL?
• Candidates
  – Many… based on conflicting requirements
    • XSL: Makes IR folks happy
    • XML-QL: Makes DB folks happy
    • Xquery: W3C’s attempt to make everybody (un)happy

11/20

Agenda: Xquery examples

Information Integration

Xquery Resources

• XQuery 1.0: An XML Query Language
  – W3C Working Draft 20 December 2001
• XML Query Use Cases
  – W3C Working Draft 20 December 2001
• Microsoft .Net Xquery Language Demo
  – http://131.107.228.20/
  – Supports querying on the documents described in the W3C Use Cases
• Xquery Tutorial by Fankhauser & Wadler
  – www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf

FLoWeR Expressions

Xquery queries are made up of FLWR expressions that work on “paths”

• For binds variables to nodes
• Let computes aggregates
• Where applies a formula to find matching elements
• Return constructs the output elements

Path expressions are of the form: element//element/element[attrib=value]
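For readers more used to a general-purpose language, the four FLWR clauses map roughly onto a loop, a local binding, a condition, and output construction. The sketch below mimics this in Python with xml.etree.ElementTree on a cut-down sample document; it is an analogy, not Xquery, and the "let" line merely binds a derived value rather than an aggregate.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann Publishers</publisher></book>
</bib>"""

bib = ET.fromstring(BIB)
result = ET.Element("bib")

for b in bib.findall("book"):                                        # for
    year = int(b.get("year"))                                        # let
    if b.findtext("publisher") == "Addison-Wesley" and year > 1991:  # where
        out = ET.SubElement(result, "book", year=b.get("year"))      # return
        out.append(b.find("title"))

titles = [t.text for t in result.iter("title")]

# A path with an attribute test, in ElementTree's limited XPath subset.
titles_1994 = [t.text for t in bib.findall("./book[@year='1994']/title")]
print(titles, titles_1994)
```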

Comparison to SQL

• Look at the use case descriptions in the Xquery manual
• Supports all (?) SQL-style queries (with different syntax, of course) [default queries in the demo]
• Has support for
  – “construction”: outputting the answers in arbitrary XML formats (use case XMP)
  – “path expressions”: navigating the XML tree (use case seq)
  – Simple text queries [use case text]
  – Allows queries on “Tag” elements
    • Removes the “data/meta-data” barrier in queries
• For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors. [XMP use case 6]

DTD for http://www.bn.com/bib.xml

<!ELEMENT bib (book* )>
<!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
<!ATTLIST book year CDATA #REQUIRED >
<!ELEMENT author (last, first )>
<!ELEMENT editor (last, first, affiliation )>
<!ELEMENT title (#PCDATA )>
<!ELEMENT last (#PCDATA )>
<!ELEMENT first (#PCDATA )>
<!ELEMENT affiliation (#PCDATA )>
<!ELEMENT publisher (#PCDATA )>
<!ELEMENT price (#PCDATA )>

Example Query

Query:

<bib> {
  for $b in /bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return <book year={ $b/@year }> { $b/title } </book>
} </bib>

“For all Addison-Wesley books published after 1991, return the year (as an attribute) and the title”

Result:

<bib>
  <book year="1994"> <title>TCP/IP Illustrated</title> </book>
  <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
</bib>

Example Query (2): Join

• Return the books that cost more at amazon than fatbrain

let $amazon := document("http://www.amazon.com/books.xml")
let $fatbrain := document("http://www.fatbrain.com/books.xml")
for $am in $amazon/books/book,
    $fat in $fatbrain/books/book
where $am/isbn = $fat/isbn and $am/price > $fat/price
return <book>{ $am/title, $am/price, $fat/price }</book>

XML frenzy in the DB Community

• Now that XML is there, what can we do with it?
  – Convert all databases from Relational to XML?
    • Or provide XML views of relational databases?
  – Develop theory of native XML databases?
    • Or assume that XML data will be stored in relational databases..
  – Issues: What sort of storage mechanisms? What sort of indices?

XML middleware for Databases

• XML adapters (middle-ware) received significant attention in DB community
  – SilkRoute (AT&T)
  – Xperanto (IBM)
• Issues:
  – Need to convert relational data into XML
    • Tagging (easy)
  – Need to convert Xquery queries into equivalent SQL queries
    • Trickier as Xquery supports schema querying

[Figure: an adapter that takes Xquery and returns XML on one side, and issues SQL and receives relations on the other]
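The "tagging" step, turning relational rows into XML, can be sketched in a few lines. The example below uses Python's built-in sqlite3 and xml.etree.ElementTree; the table layout, the sample rows, and the helper tag_rows are assumptions for illustration and are not taken from SilkRoute or Xperanto.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, publisher TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [("TCP/IP Illustrated", "Addison-Wesley", 1994),
     ("Data on the Web", "Morgan Kaufmann Publishers", 2000)],
)

def tag_rows(cursor, root_tag, row_tag):
    """Wrap each result row in row_tag, and each column value in a tag named after the column."""
    root = ET.Element(root_tag)
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        elem = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            ET.SubElement(elem, col).text = str(value)
    return root

cur = conn.execute("SELECT title, publisher, year FROM book ORDER BY year")
bib = tag_rows(cur, "bib", "book")
xml_text = ET.tostring(bib, encoding="unicode")
print(xml_text)
```

The harder direction, translating an Xquery into SQL, is not attempted here.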

Xquery Tutorial

Craig Knoblock
University of Southern California

References

• XQuery 1.0: An XML Query Language
  – W3C Working Draft 20 December 2001
• XML Query Use Cases
  – W3C Working Draft 20 December 2001
• Microsoft .Net Xquery Language Demo
  – http://131.107.228.20/
  – Supports querying on the documents described in the W3C Use Cases
• Xquery Tutorial by Fankhauser & Wadler
  – www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf

DTD for http://www.bn.com/bib.xml

<!ELEMENT bib (book* )>
<!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
<!ATTLIST book year CDATA #REQUIRED >
<!ELEMENT author (last, first )>
<!ELEMENT editor (last, first, affiliation )>
<!ELEMENT title (#PCDATA )>
<!ELEMENT last (#PCDATA )>
<!ELEMENT first (#PCDATA )>
<!ELEMENT affiliation (#PCDATA )>
<!ELEMENT publisher (#PCDATA )>
<!ELEMENT price (#PCDATA )>

Data for www.bn.com/bib.xml

<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price> 65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price> 39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>

Document References

• Document can either be referenced explicitly or in the default namespace
• In the Microsoft Demo
  – /bib = document("http://www.bn.com/bib.xml")/bib
• We will use /bib throughout, but you must use the expansion to run the demo
• In Theseus the document for xquery is passed as input

Projection

• Return the names of all authors of books

/bib/book/author

=

<author><last>Stevens</last><first>W.</first></author>
<author><last>Stevens</last><first>W.</first></author>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>

Projection (cont.)

• The same query can also be written as a for loop

/bib/book/author

=

for $bk in /bib/book return
  for $aut in $bk/author return $aut

=

<author><last>Stevens</last><first>W.</first></author>
<author><last>Stevens</last><first>W.</first></author>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
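The equivalence of the path form and the nested for loop can be checked on a small sample. This Python sketch uses xml.etree.ElementTree on a shortened version of the data; since the parsed root is the bib element, the path /bib/book/author becomes ./book/author relative to it.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
</bib>"""

bib = ET.fromstring(BIB)

# Path form: one expression that walks bib -> book -> author.
by_path = bib.findall("./book/author")

# Loop form: iterate over books, then over each book's authors.
by_loop = [aut for bk in bib.findall("book") for aut in bk.findall("author")]

last_names = [a.findtext("last") for a in by_path]
print(by_path == by_loop, last_names)
```

Both forms return the same author elements in document order.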

Selection

• Return the titles of all books published before 1997

/bib/book[@year < "1997"]/title

=

<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix environment</title>

Selection (cont.)

• Return the titles of all books published before 1997

/bib/book[@year < "1997"]/title

=

for $bk in /bib/book
where $bk/@year < "1997"
return $bk/title

=

<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix environment</title>

Selection (cont.)

• Return book with the title “Data on the Web”

/bib/book[title = "Data on the Web"]

=

<book year="2000">
  <title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher>
  <price> 39.95</price>
</book>

Selection (cont.)

• Return the price of the book “Data on the Web”

/bib/book[title = "Data on the Web"]/price

=

<price> 39.95</price>

How would you return the book with a price of $39.95?

Selection (cont.)

• Return the book with a price of $39.95

for $bk in /bib/book
where $bk/price = " 39.95"
return $bk

=

<book year="2000">
  <title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher>
  <price> 39.95</price>
</book>
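Note the leading space inside " 39.95": the query compares against the stored text exactly as it appears in the data. The Python sketch below, on a shortened sample, illustrates why that matters and shows a numeric comparison as one way around it. Unlike the slides, which compare @year as a string, the sketch converts years and prices to numbers; that choice is an assumption of the example.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><price> 65.95</price></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
  <book year="2000"><title>Data on the Web</title><price> 39.95</price></book>
</bib>"""

books = ET.fromstring(BIB).findall("book")

# Selection on an attribute: books published before 1997.
before_1997 = [b.findtext("title") for b in books if int(b.get("year")) < 1997]

# Selection on element text: string equality is sensitive to whitespace.
exact = [b.findtext("title") for b in books if b.findtext("price") == "39.95"]
with_space = [b.findtext("title") for b in books if b.findtext("price") == " 39.95"]

# Comparing as numbers ignores the surrounding whitespace.
numeric = [b.findtext("title") for b in books if float(b.findtext("price")) == 39.95]

print(before_1997, exact, with_space, numeric)
```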

Construction

• Return year and title of all books published before 1997

for $bk in /bib/book
where $bk/@year < "1997"
return <book>{ $bk/@year, $bk/title }</book>

=

<book year="1994"> <title>TCP/IP Illustrated</title> </book>
<book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>

Grouping

• Return titles for each author

for $author in distinct(/bib/book/author/last)
return
  <author name={ $author/text() }>
    { /bib/book[author/last = $author]/title }
  </author>

=

<author name="Stevens">
  <title>TCP/IP Illustrated</title>
  <title>Advanced Programming in the Unix environment</title>
</author>
<author name="Abiteboul">
  <title>Data on the Web</title>
</author>
…
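The grouping query amounts to collecting titles under each distinct author last name. Here is a Python sketch of the same idea on a shortened sample, using a dictionary for the grouping; the wrapping <authors> element is an assumption so that the output has a single root.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last></author></book>
  <book><title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last></author>
    <author><last>Buneman</last></author>
    <author><last>Suciu</last></author></book>
</bib>"""

bib = ET.fromstring(BIB)

# Group titles by author last name (dicts keep first-seen order).
titles_by_author = {}
for book in bib.findall("book"):
    for last in book.findall("author/last"):
        titles_by_author.setdefault(last.text, []).append(book.findtext("title"))

# Construct one <author name="..."> element per distinct name.
out = ET.Element("authors")
for name, titles in titles_by_author.items():
    author = ET.SubElement(out, "author", name=name)
    for t in titles:
        ET.SubElement(author, "title").text = t

print(ET.tostring(out, encoding="unicode"))
```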

Join

• Return the books that cost more at amazon than fatbrain

let $amazon := document("http://www.amazon.com/books.xml")
let $fatbrain := document("http://www.fatbrain.com/books.xml")
for $am in $amazon/books/book,
    $fat in $fatbrain/books/book
where $am/isbn = $fat/isbn and $am/price > $fat/price
return <book>{ $am/title, $am/price, $fat/price }</book>
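The join pairs books from the two sources on isbn and keeps those that are dearer at amazon. A Python sketch follows, with made-up sample documents standing in for the two URLs (the ISBNs, titles, and prices are invented for illustration). It uses a dictionary lookup instead of the nested loop implied by the query, and assumes each isbn occurs once at fatbrain.

```python
import xml.etree.ElementTree as ET

AMAZON = """<books>
  <book><isbn>111</isbn><title>Book A</title><price>30.00</price></book>
  <book><isbn>222</isbn><title>Book B</title><price>20.00</price></book>
</books>"""
FATBRAIN = """<books>
  <book><isbn>111</isbn><title>Book A</title><price>25.00</price></book>
  <book><isbn>222</isbn><title>Book B</title><price>22.00</price></book>
</books>"""

amazon = ET.fromstring(AMAZON)
fatbrain = ET.fromstring(FATBRAIN)

# Index fatbrain prices by isbn so each amazon book needs only one lookup.
fat_price = {b.findtext("isbn"): float(b.findtext("price"))
             for b in fatbrain.findall("book")}

dearer_at_amazon = []
for am in amazon.findall("book"):
    isbn = am.findtext("isbn")
    am_price = float(am.findtext("price"))
    if isbn in fat_price and am_price > fat_price[isbn]:
        dearer_at_amazon.append((am.findtext("title"), am_price, fat_price[isbn]))

print(dearer_at_amazon)
```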

Example Query 1

<bib> {
  for $b in /bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return <book year={ $b/@year }> { $b/title } </book>
} </bib>

What does this do?

Result Query 1

<bib>
  <book year="1994"> <title>TCP/IP Illustrated</title> </book>
  <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
</bib>

Example Query 2

<results> {
  for $b in document("http://www.bn.com/bib.xml")/bib/book,
      $t in $b/title,
      $a in $b/author
  return <result> { $t } { $a } </result>
} </results>

Result Query 2

<results>
  <result> <title>TCP/IP Illustrated</title> <last>Stevens</last> </result>
  <result> <title>Advanced Programming in the Unix environment</title> <last>Stevens</last> </result>
  <result> <title>Data on the Web</title> <last>Abiteboul</last> </result>
  <result> <title>Data on the Web</title> <last>Buneman</last> </result>
  <result> <title>Data on the Web</title> <last>Suciu</last> </result>
</results>

Example Query 3

<books-with-prices> {
  for $b in document("http://www.bn.com/bib.xml")//book,
      $a in document("http://www.amazon.com/reviews.xml")//entry
  where $b/title = $a/title
  return
    <book-with-prices>
      { $b/title }
      <price-amazon>{ $a/price/text() }</price-amazon>
      <price-bn>{ $b/price/text() }</price-bn>
    </book-with-prices>
} </books-with-prices>

Result Query 3

<books-with-prices>
  <book-with-prices>
    <title>TCP/IP Illustrated</title>
    <price-amazon>65.95</price-amazon>
    <price-bn> 65.95</price-bn>
  </book-with-prices>
  <book-with-prices>
    <title>Advanced Programming in the Unix environment</title>
    <price-amazon>65.95</price-amazon>
    <price-bn>65.95</price-bn>
  </book-with-prices>
  <book-with-prices>
    <title>Data on the Web</title>
    <price-amazon>34.95</price-amazon>
    <price-bn> 39.95</price-bn>
  </book-with-prices>
</books-with-prices>

Example Query 4

<bib> {
  for $b in document("www.bn.com/bib.xml")//book
  where $b/publisher = "Addison-Wesley" and $b/@year > "1991"
  return <book> { $b/@year } { $b/title } </book>
  sortby (title)
} </bib>

Example Result 4

<bib>
  <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
  <book year="1994"> <title>TCP/IP Illustrated</title> </book>
</bib>
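Query 4 differs from Query 1 only in the final sort by title. A rough Python counterpart on a shortened sample is sketched below, with list.sort playing the role of sortby; the year is compared as an integer here, which is a choice of the sketch rather than of the query.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann Publishers</publisher></book>
</bib>"""

bib = ET.fromstring(BIB)

# for ... where: pick the Addison-Wesley books published after 1991.
selected = [b for b in bib.iter("book")
            if b.findtext("publisher") == "Addison-Wesley" and int(b.get("year")) > 1991]

# sortby (title): order the selected books by their title text.
selected.sort(key=lambda b: b.findtext("title"))

# return: build <book year="..."><title>...</title></book> for each one.
result = ET.Element("bib")
for b in selected:
    out = ET.SubElement(result, "book", year=b.get("year"))
    ET.SubElement(out, "title").text = b.findtext("title")

years = [b.get("year") for b in result]
print(years)
```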

Impact of XML on Integration

If and when all sources accept Xqueries and exchange data in XML format, then
  – Mediator can accept user queries in Xquery
  – Access sources using Xquery
  – Get data back in XML format
  – Merge results and send to user in XML format
• How about now?
  – Sources can use XML adapters (middle-ware)

[Figure: a mediator exchanging Xquery and XML with the user and with the sources; an adapter translates between Xquery/XML and SQL/relations]

Is XML standardization a magical solution for Integration?

If all WEB sources standardize into XML format
  – Source access (wrapper generation) issues become easier to manage
  – BUT all other problems remain
    • Still need to relate source (XML) schemas to mediator (XML) schema
    • Still need to reason about source overlap, source access limitations etc.
    • Still need to manage execution in the presence of source/network uncertainties

[Figure: information integration architecture. Queries are answered over services, web pages, structured data, and sensors (streaming data). Components: Source Fusion/Query Planning (needs to handle multiple objectives, service composition, source quality & overlap); Executor (needs to handle source/network interruptions, runtime uncertainty, replanning); Monitor; Source Trust; Ontologies; Source/Service Descriptions. Edge labels: Requests, Preference/Utility Model, Answers, Probing Queries, Source Calls, Replanning, Updating Statistics. The mediator exchanges Xquery and XML.]

“Semantic Web”

• The LAV/GAV approaches assume that some human expert will do the actual schema mapping
• The “semantic-web” initiative attempts to automate schema mapping
  – Idea: Allow pages to write logical axioms relating their vocabulary (tags) to other external tags
  – Support automatic inference of relations between source and mediator schema using these rules
• DAML+OIL
