Dan Suciu Tools for XML Data Exchange
Tools for XML Data Exchange
Dan Suciu
AT&T Labs
Joint work with Mary Fernandez
XML Has Many Facets
• XML for fancier Web pages
– XML generated with structural editors
• XML for messaging
– generated during applications
• XML for Data Exchange
– generated from legacy data
XML in Data Exchange
• communities agree on common DTD
• export their data in XML
• exchange over HTTP protocol
• applications understand only that DTD
An Example of XML Data
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name> </author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book> <publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
XML Exchange Vision
[Diagram: data sources (relational data, object-relational, legacy data) and applications exchange XML data over the Web (HTTP); operations on the XML data: Transform, Integrate, Warehouse]
Tools
• export legacy data to XML
– RXL
• query/transform/integrate XML data
– XML-QL
• compress XML data
– XMill
• store/process incoming XML data
– STORED
XML-QL: A Query Language for XML
• http://www.w3.org/TR/NOTE-xml-ql (8/98)
• W3C new Working Group on QL (9/99)
• XML-QL characteristics:
– relationally complete (like SQL)
– XML input, XML output
– queries, transforms, integrates XML data
[Deutsch et al., 1999 (WWW8)]
Querying in XML-QL
where <book language="french">
        <publisher> <name> Morgan Kaufmann </name> </publisher>
        <author> $a </author>
      </book> in "www.a.b.c/bib.xml"
construct $a
Pattern: the <book> ... </book> element in the where clause
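The pattern match above can be approximated in Python with the standard-library xml.etree.ElementTree module. This is only a sketch of the query's meaning, not an XML-QL implementation. The inline BIB document (standing in for "www.a.b.c/bib.xml"), its author names, and the function name authors_of are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document at "www.a.b.c/bib.xml".
BIB = """
<bib>
  <book language="french">
    <publisher><name>Morgan Kaufmann</name></publisher>
    <author>A. One</author>
    <author>B. Two</author>
  </book>
  <book language="english">
    <publisher><name>Morgan Kaufmann</name></publisher>
    <author>C. Three</author>
  </book>
  <book language="french">
    <publisher><name>Freeman</name></publisher>
    <author>D. Four</author>
  </book>
</bib>
"""


def authors_of(root, language, publisher):
    """Collect the bindings of $a for every <book> that matches the pattern."""
    bindings = []
    for book in root.iter("book"):
        # Attribute condition: <book language="...">
        if book.get("language") != language:
            continue
        # Nested element condition: <publisher> <name> ... </name> </publisher>
        names = [n.text.strip() for n in book.findall("publisher/name") if n.text]
        if publisher not in names:
            continue
        # One binding of $a per <author> child of a matching book.
        bindings.extend(a.text.strip() for a in book.findall("author") if a.text)
    return bindings


root = ET.fromstring(BIB)
result = authors_of(root, "french", "Morgan Kaufmann")
print(result)
```

Each matching book contributes one binding per author, which mirrors how the where clause binds $a once per match.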
Transformations in XML-QL
Note: </> abbreviates </book> or </result> or ...
where <book language = $l> <author> $a </> </> in "www.a.b.c/bib.xml"
construct <result> <author> $a </> <lang> $l </> </>
Template: the <result> ... </> element in the construct clause
Result:
<result> <author>. . .</author> <lang>. . .</lang> </result>
<result> <author>. . .</author> <lang>. . .</lang> </result>
<result> <author>. . .</author> <lang>. . .</lang> </result>
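As a rough illustration of the template step, the sketch below emits one <result> element per (author, language) binding using xml.etree.ElementTree. The sample document, the <results> wrapper element, and the function name transform are assumptions for illustration and do not come from XML-QL itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; only the element and attribute names follow the slide.
BIB = """<bib>
  <book language="french"><author>A. One</author><author>B. Two</author></book>
  <book language="english"><author>A. One</author></book>
</bib>"""


def transform(root):
    """Instantiate the <result> template once per binding of ($a, $l)."""
    out = ET.Element("results")  # hypothetical wrapper so the output has one root
    for book in root.iter("book"):
        lang = book.get("language")
        if lang is None:
            continue  # the pattern requires a language attribute
        for author in book.findall("author"):
            result = ET.SubElement(out, "result")
            ET.SubElement(result, "author").text = author.text
            ET.SubElement(result, "lang").text = lang
    return out


results = transform(ET.fromstring(BIB))
pairs = [(r.findtext("author"), r.findtext("lang")) for r in results]
print(ET.tostring(results, encoding="unicode"))
```

Note that the same author appears in two separate <result> elements here, because the plain template creates a new element for every binding.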
Transformations in XML-QL
where <book language = $l> <author> $a </> </> in "www.a.b.c/bib.xml"
construct <result> <author id=F($a)> $a </> <lang> $l </> </>
Skolem Functions in Templates
<result> <author>. . .</author> <lang>. . .</lang> <lang>. . .</lang> </result>
<result> <author>. . .</author> <lang>. . .</lang> <lang>. . .</lang> </result>
Data Integration in XML-QL
{ where <book> <isbn> $n </> <title> $t </> </> in "www.books.com"
  construct <result id=F($n)> <title> $t </> </> }
{ where <review> <isbn> $n </> <review> $r </> </> in "www.reviews.com"
  construct <result id=F($n)> <review> $r </> </> }
Result:
<result id=".."> <title>. . .</title> <review>. . .</review> <review>. . .</review> </result>
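The fusion performed by the Skolem function F can be sketched in Python as a dictionary from F($n) to a single <result> element, so that both query blocks add children to the same element when they see the same ISBN. The sample BOOKS and REVIEWS documents, the <results> wrapper, and the names integrate and result_for are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for "www.books.com" and "www.reviews.com".
BOOKS = """<books>
  <book><isbn>111</isbn><title>Title One</title></book>
  <book><isbn>222</isbn><title>Title Two</title></book>
</books>"""
REVIEWS = """<reviews>
  <review><isbn>111</isbn><review>Good</review></review>
  <review><isbn>111</isbn><review>Dense</review></review>
</reviews>"""


def integrate(books_root, reviews_root):
    """Merge both sources into one <result> element per ISBN."""
    out = ET.Element("results")  # hypothetical wrapper element
    by_id = {}                   # Skolem table: "F(isbn)" -> its <result> element

    def result_for(isbn):
        # Equal arguments to F yield the same element, created on first use.
        key = "F(%s)" % isbn
        if key not in by_id:
            by_id[key] = ET.SubElement(out, "result", id=key)
        return by_id[key]

    # First block: titles from the book source.
    for book in books_root.findall("book"):
        isbn = book.findtext("isbn").strip()
        ET.SubElement(result_for(isbn), "title").text = book.findtext("title")
    # Second block: reviews from the review source.
    for rev in reviews_root.findall("review"):
        isbn = rev.findtext("isbn").strip()
        ET.SubElement(result_for(isbn), "review").text = rev.findtext("review")
    return out


merged = integrate(ET.fromstring(BOOKS), ET.fromstring(REVIEWS))
print(ET.tostring(merged, encoding="unicode"))
```

A book without reviews still produces a <result> with only a title, and a review whose ISBN is missing from the book source would produce a <result> with only reviews.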
RXL: Export Legacy Data To XML
• legacy data
– fragmented into many flat relations
– 3rd normal form
– schema is proprietary
• XML data
– nested
– un-normalized
– schema designed by agreement
RXL: An Example
• relational database:
Store(sid, name)
SB(sid, bid)
Book(bid, title)
• virtual XML view:
<store> <name> n1 </name> <book> ... </book> <book> ... </book> ... </store>
<store> <name> n2 </name> <book> ... </book> <book> ... </book> … </store>
A Simple RXL Query
• specify XML view declaratively
from Store, SB, Book
where Store.sid=SB.sid and SB.bid=Book.bid
construct <store ID=f(Store.sid)>
            <name> Store.name </name>
            <book> Book.title </book>
          </store>
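One way to picture what this view definition computes is to run the join in SQL and nest the rows by store. The sketch below does that with Python's sqlite3 and xml.etree.ElementTree. It materializes the view, which is a simplification, and the sample rows, the <stores> wrapper, and the names build_db and export_view are assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db():
    """Create the three relations from the example with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Store(sid INTEGER, name TEXT);
        CREATE TABLE SB(sid INTEGER, bid INTEGER);
        CREATE TABLE Book(bid INTEGER, title TEXT);
        INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
        INSERT INTO Book VALUES (10, 'The Calculus'), (11, 'Another Title');
        INSERT INTO SB VALUES (1, 10), (1, 11), (2, 11);
    """)
    return conn


def export_view(conn):
    """Join the relations and group <book> elements under one <store> per sid."""
    root = ET.Element("stores")  # hypothetical wrapper element
    stores = {}                  # plays the role of f(Store.sid): sid -> <store>
    rows = conn.execute(
        "SELECT Store.sid, Store.name, Book.title "
        "FROM Store, SB, Book "
        "WHERE Store.sid = SB.sid AND SB.bid = Book.bid "
        "ORDER BY Store.sid, Book.bid"
    )
    for sid, name, title in rows:
        if sid not in stores:
            store = ET.SubElement(root, "store", ID="f(%d)" % sid)
            ET.SubElement(store, "name").text = name
            stores[sid] = store
        ET.SubElement(stores[sid], "book").text = title
    return root


view = export_view(build_db())
print(ET.tostring(view, encoding="unicode"))
```

The dictionary keyed by sid is what turns the flat join result into the nested, un-normalized XML shape.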
RXL: Querying the XML View
• users ask XML-QL queries:
– find stores that sell "The Calculus"
where <store> <name> $n </name> <book> The Calculus </book> </store>
construct <result> $n </result>
RXL: Query composition
system composes query with view:
from Store, SB, Book
where Store.sid=SB.sid and SB.bid=Book.bid and Book.title="The Calculus"
construct <result> Store.name </result>
[Diagram: relations Store(sid, name), SB(sid, bid), Book(bid, title); RXL defines the XML view of <store> elements; XML-QL queries the view]
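The point of composition is that the XML view never has to be built: the composed query runs directly against the relations. A minimal sketch with Python's sqlite3 follows. The sample rows and the names build_db and stores_selling are assumptions for illustration, and the SQL is a hand translation of the composed query, not output from an actual RXL system.

```python
import sqlite3


def build_db():
    """Create the three relations from the example with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Store(sid INTEGER, name TEXT);
        CREATE TABLE SB(sid INTEGER, bid INTEGER);
        CREATE TABLE Book(bid INTEGER, title TEXT);
        INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
        INSERT INTO Book VALUES (10, 'The Calculus'), (11, 'Another Title');
        INSERT INTO SB VALUES (1, 10), (1, 11), (2, 11);
    """)
    return conn


def stores_selling(conn, title):
    """Run the composed query: view join plus the user's title condition."""
    rows = conn.execute(
        "SELECT Store.name FROM Store, SB, Book "
        "WHERE Store.sid = SB.sid AND SB.bid = Book.bid AND Book.title = ? "
        "ORDER BY Store.sid",
        (title,),
    )
    # Only the final <result> elements are produced as XML.
    return ["<result> %s </result>" % name for (name,) in rows]


answers = stores_selling(build_db(), "The Calculus")
print(answers)
```

Pushing the title condition into the relational query lets the database use its own optimizer and indexes instead of scanning a materialized XML document.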
Compressing XML Data
• for exchange and archiving
• can use general tool (gzip)
• but a specialized tool (XMill) compresses about twice as well
XMill Example: Weblogs
202.239.238.16|GET / HTTP/1.0|text/html|200|1997/10/01-00:00:02|-|4478 |-|-|http://www02.so-net.or.jp/|Mozilla/3.01 [ja] (Win95; I)
<apache:entry>
<apache:host>202.239.238.16</apache:host>
<apache:requestLine>GET / HTTP/1.0</apache:requestLine>
<apache:contentType>text/html</apache:contentType>
<apache:statusCode>200</apache:statusCode>
<apache:date>1997/10/01-00:00:02</apache:date>
<apache:byteCount>4478</apache:byteCount>
<apache:referer>http://www02.so-net.or.jp/</apache:referer>
<apache:userAgent>Mozilla/3.01 [ja] (Win95; I)</apache:userAgent>
</apache:entry>
XMill Example: Weblogs
weblog.dat: 15.9MB    weblog.dat.gz: 1.6MB
weblog.xml: 24.2MB    weblog.xml.gz: 2.1MB
xmill -p // weblog.xml weblog1.xmi             weblog1.xmi: 1.75MB
xmill weblog.xml weblog2.xmi                   weblog2.xmi: 1.33MB
xmill -f settings.pz weblog.xml weblog3.xmi    weblog3.xmi: 0.82MB
XMill: Fine Tuning the Compression
-p//apache:host=>seqcomb(u8 "." u8 "." u8 "." u8)
-p//apache:userAgent=>seq(e "/" e)
-p//apache:byteCount=>u
-p//apache:statusCode=>e
-p//apache:contentType=>e
-p//apache:requestLine=>seq("GET " rep("/" e) " HTTP/1." e)
-p//apache:date=>seq(u "/" u8 "/" u8 "-" u8 ":" di ":" di)
-p//apache:referer=>or(seq("file:" t) seq("http://" or(seq(rep("." e) "/" rep("/" e)) rep("." e))) t)
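To give a feel for why a type-specific encoding helps, the sketch below packs the host field the way the first setting suggests: four small unsigned integers separated by dots, stored as four bytes instead of a text string. This is a guess at the intent of the u8 pattern, written in plain Python. It does not call XMill or reproduce its file format, and the names encode_host and decode_host are hypothetical.

```python
def encode_host(host):
    """Pack a dotted host such as '202.239.238.16' into four bytes.

    Assumes each dot-separated part is an integer in 0..255; bytes() raises
    ValueError for anything outside that range.
    """
    parts = host.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated parts: %r" % host)
    return bytes(int(p) for p in parts)


def decode_host(data):
    """Inverse of encode_host: four bytes back to dotted text."""
    if len(data) != 4:
        raise ValueError("expected exactly four bytes")
    return ".".join(str(b) for b in data)


host = "202.239.238.16"        # the host value from the weblog example
packed = encode_host(host)
print(len(host), len(packed))  # text length versus packed length
print(decode_host(packed))
```

The packed form is lossless for well-formed values, and a general-purpose compressor applied afterwards sees a column of uniform 4-byte records rather than variable-length text.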
Storing XML Data
• Scenario:
– receive a large XML data instance
– want to store, manage it
• Could build an XML management system from scratch (eXcelon)
• Preferably: use existing database systems
Storing XML: Ternary Relation
[Florescu, Kossmann 1999]
[Figure: XML data as a graph. &o1 has a paper edge to &o2; &o2 has title, author, author, and year edges to &o3, &o4, &o5, and &o6, whose values are "The Calculus", "…", "…", and "1986"]
Ref
Source  Label   Dest
&o1     paper   &o2
&o2     title   &o3
&o2     author  &o4
&o2     author  &o5
&o2     year    &o6
Val
Node  Value
&o3   The Calculus
&o4   …
&o5   …
&o6   1986
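The two relations can be loaded into any relational database and queried with joins, one join per step of a path. The sketch below uses Python's sqlite3 with the rows from the slide. The author values are elided on the slide, so they are left out of Val here rather than invented, and the function name values_at is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Ref(Source TEXT, Label TEXT, Dest TEXT);
    CREATE TABLE Val(Node TEXT, Value TEXT);
    INSERT INTO Ref VALUES
        ('&o1', 'paper',  '&o2'),
        ('&o2', 'title',  '&o3'),
        ('&o2', 'author', '&o4'),
        ('&o2', 'author', '&o5'),
        ('&o2', 'year',   '&o6');
    -- Values for &o4 and &o5 are elided on the slide, so they are omitted.
    INSERT INTO Val VALUES ('&o3', 'The Calculus'), ('&o6', '1986');
""")


def values_at(conn, outer_label, inner_label):
    """Follow a two-step path outer_label/inner_label and return leaf values."""
    rows = conn.execute(
        "SELECT v.Value FROM Ref AS a "
        "JOIN Ref AS b ON b.Source = a.Dest "   # second edge starts where the first ends
        "JOIN Val AS v ON v.Node = b.Dest "     # leaf value of the second edge's target
        "WHERE a.Label = ? AND b.Label = ? "
        "ORDER BY v.Node",
        (outer_label, inner_label),
    )
    return [value for (value,) in rows]


titles = values_at(conn, "paper", "title")
years = values_at(conn, "paper", "year")
print(titles, years)
```

The generic schema accepts any XML document without schema design, at the cost of one self-join on Ref per path step.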
Storing XML: Derive Schema from DTD
• DTD:
<!ELEMENT employee (name, address, project*)>
<!ELEMENT address (street, city, state, zip)>
• ODMG classes:
class Employee public type tuple (name:string, address:Address, project:List(Project))
class Address public type tuple (street:string, …)
• [Christophides et al. 1994, Shanmugasundaram et al. 1999]
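The same DTD-to-class mapping can be sketched with Python dataclasses: a sequence becomes a tuple-like class and a starred element becomes a list. The field list for Address follows the DTD, with every leaf assumed to be a string. The DTD does not define project, so the Project class with a single text field, the sample document, and the function name employee_from_xml are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    # <!ELEMENT address (street, city, state, zip)>; string leaves are assumed.
    street: str
    city: str
    state: str
    zip: str


@dataclass
class Project:
    # Hypothetical: the DTD on the slide does not define <project>.
    name: str


@dataclass
class Employee:
    # <!ELEMENT employee (name, address, project*)>
    name: str
    address: Address
    project: List[Project] = field(default_factory=list)  # project* -> list


def employee_from_xml(elem):
    """Load one <employee> element into the derived classes."""
    addr = elem.find("address")
    return Employee(
        name=elem.findtext("name"),
        address=Address(*(addr.findtext(t) for t in ("street", "city", "state", "zip"))),
        project=[Project(p.text) for p in elem.findall("project")],
    )


# Made-up instance that conforms to the DTD above.
SAMPLE = """<employee>
  <name>E. Example</name>
  <address><street>1 Main St</street><city>Springfield</city>
           <state>NJ</state><zip>00000</zip></address>
  <project>P1</project><project>P2</project>
</employee>"""

emp = employee_from_xml(ET.fromstring(SAMPLE))
print(emp)
```

A fixed schema like this gives typed, compact storage, but it only works when a DTD is available and the data conforms to it.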
STORED Approach: Mine Data to Derive Schema
[Figure: data tree of paper elements with author (fn, ln), title, and year children]
author  title
X       X

fn1  ln1  fn2  ln2  title  year
X    X    X    X    X      -
X    X    -    -    X      X
X    X    -    -    X      -
[Table labels on the slide: Paper1, Paper2]
[Deutsch et al. 1999]
Summary
• XML - simple (?), lightweight syntax
• Challenge: build bridges to existing database tools
• XML in data exchange: YES
• XML as a new data model: NO