Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com

XML and Databases
Ronald Bourret
rpbourret@rpbourret.com
http://www.rpbourret.com

Overview

• Is XML a Database?

• Why Use XML with Databases?

• Data vs. Documents

• Storing and Retrieving Data

• Storing and Retrieving Documents

Is XML a Database?

Is XML a database?

• This is really two questions
  » Is an XML document a database?
  » Are XML and its surrounding technologies a database management system (DBMS)?

Is an XML document a database?

• Yes, it is a collection of data

• Pros
  » Self-describing
  » Portable (Unicode)
  » Can store directed graphs

• Cons
  » Slow access
  » Verbose

Are XML and surrounding technologies a DBMS?

• Yes, they have:
  » Data storage (XML documents)
  » Schemas (DTDs, XML Schemas, RELAX, etc.)
  » Query languages (XPath, XQuery, XQL, etc.)
  » APIs (SAX, DOM)

Are XML and surrounding technologies a DBMS? (cont.)

• No, they don’t have:
  » Separation of logical and physical data
  » Efficient storage
  » Indexes
  » Transactions
  » Multi-user access
  » Security
  » ...

Using XML as a database

• Good for small, single-user databases
  » .ini files
  » Simple address book
  » List of browser bookmarks
  » Catalog of MP3s stolen with the help of Napster

• Almost useless for large or multi-user databases
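A small, single-user "database" of this kind can be sketched in a few lines of Python. This is only an illustration: the address book, its element names, and the helper `names_in_city` are hypothetical, and the query uses the limited XPath subset of the standard-library `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book; the element names are illustrative only.
ADDRESS_BOOK = """
<AddressBook>
  <Entry><Name>Ann</Name><City>Chicago</City></Entry>
  <Entry><Name>Bob</Name><City>Prague</City></Entry>
  <Entry><Name>Cy</Name><City>Chicago</City></Entry>
</AddressBook>
"""

def names_in_city(xml_text, city):
    """Return the names of all entries in the given city."""
    # The whole document is parsed on every query: there is no index.
    root = ET.fromstring(xml_text)
    # [City='...'] matches entries whose City child has exactly that text.
    return [entry.findtext("Name") for entry in root.findall(f"Entry[City='{city}']")]

print(names_in_city(ADDRESS_BOOK, "Chicago"))
```

Every query re-parses and scans the full document, which is acceptable for a handful of entries and is the reason this approach does not scale to large or multi-user data.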

Why Use XML with Databases?

Why use XML with databases?

• Expose legacy data as XML

• Transfer data between databases

• Integrate data from a variety of sources

• Store semi-structured data

• Queue e-commerce messages

• Manage and query large document collections

Data vs. Documents

Data vs. documents

• Are you storing documents or the data in them?

<Address>
  <Street>123 Main St.</Street>
  <City>Chicago</City>
  <State>IL</State>
  <PostCode>60609</PostCode>
  <Country>USA</Country>
</Address>

Slide color key: Yellow = Data; White + Yellow = Document

• Helps determine the system you need

• Look at your XML documents to decide

Data-centric documents

• Use XML primarily as a data transport

• Designed for machine consumption

• Sales orders, scientific data, dynamic Web pages

• Characteristics
  » Regular structure
  » Fine-grained data
  » Little or no mixed content
  » Sibling order not significant

Example: Sales order

<Order>
  <Number>1234</Number>
  <Customer>Gallagher Industries</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part>
    <Quantity>12</Quantity>
    <Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part>
    <Quantity>600</Quantity>
    <Price>3.99</Price>
  </Item>
</Order>

Example: Dynamic Web page

<html>
<head><title>Flight Schedule: SFO to FRA</title></head>
<body>
<p>Daily flights from SFO to FRA</p>
<table>
  <tr><th>Airline</th><th>Num</th><th>Depart</th><th>Arrive</th></tr>
  <tr><td>Air France</td><td>527</td><td>12:00</td><td>10:33</td></tr>
  <tr><td>Lufthansa</td><td>459</td><td>13:55</td><td>10:05</td></tr>
  <tr><td>American</td><td>385</td><td>14:17</td><td>11:48</td></tr>
  <tr><td>Delta</td><td>99</td><td>15:30</td><td>14:02</td></tr>
</table>
</body>
</html>

Document-centric documents

• Designed for human consumption

• Use XML to provide structure, metadata

• Books, presentations, email, static Web pages

• Characteristics
  » Irregular or semi-regular structure
  » Large-grained data
  » Lots of mixed content
  » Sibling order significant

Example: Product description

<Product>

<Para><Name>XML-DBMS</Name> is <Summary>middleware for transferring data between XML documents and relational databases</Summary>. It is written by <Developer>Ronald Bourret</Developer>.</Para>

<Para>XML-DBMS uses an object-relational mapping in which complex element types are viewed as classes and simple element types, PCDATA, and attributes, as well as references to complex types, are viewed as properties.</Para>

<Para>You can:
<List>
  <Item><Link URL="Readme.htm">Read more about XML-DBMS</Link></Item>
  <Item><Link URL="jxmldbms.zip">Download Java version</Link></Item>
  <Item><Link URL="pxmldbms.zip">Download PERL version</Link></Item>
</List>
</Para>

</Product>
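One rough way to "look at your XML documents" is to measure how much mixed content they contain. The sketch below is a heuristic of our own, not something from the talk: the function name `mixed_content_ratio` and the two shortened sample documents are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def mixed_content_ratio(xml_text):
    """Fraction of elements that hold both child elements and non-blank text."""
    root = ET.fromstring(xml_text)
    total = mixed = 0
    for elem in root.iter():
        total += 1
        children = list(elem)
        if not children:
            continue
        # Text before the first child, or after any child (its tail),
        # is character data mixed in with the child elements.
        has_text = bool((elem.text or "").strip()) or any(
            (child.tail or "").strip() for child in children)
        if has_text:
            mixed += 1
    return mixed / total

# Shortened versions of the sales order and product description examples.
ORDER = "<Order><Number>1234</Number><Item Number='1'><Part>A-10</Part></Item></Order>"
PARA = "<Para><Name>XML-DBMS</Name> is <Summary>middleware</Summary>.</Para>"

print(mixed_content_ratio(ORDER), mixed_content_ratio(PARA))
```

A ratio near zero suggests a data-centric document; a noticeably higher ratio suggests a document-centric one. Any threshold would be a judgment call, since the boundary is not sharp in practice.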

Storing data and documents

• Store data in traditional database
  » Use a native XML database under certain conditions

• Store documents in native XML database
  » Use a traditional database under certain conditions

• Boundary between data and documents not always clear in practice

Storing and Retrieving Data

Goals and non-goals

• Goals
  » Preserve data and hierarchical order
  » Optionally preserve sibling order
  » One- or two-way data transfer

• Non-goals
  » Preserve physical structure (entity use, encodings, ...)
  » Preserve DTD, comments, processing instructions...
  » Preserve document identity

Data transfer software

• May be middleware or integrated into DBMS

• If integrated, DBMS is said to be XML-enabled

Mapping data in XML documents to databases

• Most common mapping strategies
  » Template-driven
  » Model-driven

• No mapping needed for native XML databases

Template-driven mappings

• Commands embedded in template

• Extremely flexible
  » Retrieve data with SQL or other query language
  » Place values almost anywhere in document
  » Parameterize subsequent SQL statements
  » Programming constructs such as if-then-else and for

• Transfer from database to XML only

Example: Template

<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs.</Conclude>
</FlightInfo>

Example: Output

<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <Flights>
    <Row>
      <Airline>ACME</Airline>
      <FltNumber>123</FltNumber>
      <Depart>Dec 12, 1998 13:43</Depart>
      <Arrive>Dec 13, 1998 01:21</Arrive>
    </Row>
    ...
  </Flights>
  <Conclude>We hope one of these meets your needs.</Conclude>
</FlightInfo>
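The template expansion can be sketched as follows. This is a minimal illustration, not the behavior of any particular product: the function `expand_template`, its `table_name` parameter (used to name the result wrapper element), and the in-memory SQLite table are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs.</Conclude>
</FlightInfo>"""

def expand_template(template, conn, table_name="Flights"):
    """Replace each <SelectStmt> with an element holding one <Row> per result row."""
    root = ET.fromstring(template)
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "SelectStmt":
                continue
            # The SQL comes straight from the template, so templates must be trusted.
            cursor = conn.execute(child.text)
            columns = [description[0] for description in cursor.description]
            result = ET.Element(table_name)
            for row in cursor:
                row_elem = ET.SubElement(result, "Row")
                for name, value in zip(columns, row):
                    ET.SubElement(row_elem, name).text = str(value)
            result.tail = child.tail      # keep the surrounding whitespace
            parent[index] = result        # swap the command for its results
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)")
conn.execute("INSERT INTO Flights VALUES ('ACME', 123, 'Dec 12, 1998 13:43', 'Dec 13, 1998 01:21')")
print(expand_template(TEMPLATE, conn))
```

Text outside the embedded command passes through unchanged, which is what lets a template place query results almost anywhere in the output document.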

Model-driven mappings

• Two mappings are common
  » Table-based
  » Object-relational

• Data transferred according to model

• Two-way data transfer

• Simpler than templates, but less flexible

• Often used with XSLT

Table-based mapping

• Map document with “table” structure to RDBMS

<database>
  <table1>
    <row>
      <column1>value 1</column1>
      <column2>value 2</column2>
      ...
    </row>
    ...
  </table1>
  <table2>
    ...
  </table2>
  ...
</database>

Table1: Column1 | Column2 | ...

Table2: Column1 | Column2 | ...
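A table-based transfer from XML to the database can be sketched in Python with the standard-library `sqlite3` module. The `Parts` table, its columns, and the function `load_tables` are hypothetical; the sketch assumes the tables already exist and that element names match table and column names exactly.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document in the "table" structure: database / table / row / column.
DOC = """<database>
  <Parts>
    <row><Number>A-10</Number><Price>10.95</Price></row>
    <row><Number>B-43</Number><Price>3.99</Price></row>
  </Parts>
</database>"""

def load_tables(xml_text, conn):
    """Insert every <row> of every table element into the table of the same name."""
    root = ET.fromstring(xml_text)
    for table in root:
        for row in table.findall("row"):
            columns = [column.tag for column in row]
            values = [column.text for column in row]
            # Identifiers come from the document, so quote them; values are bound.
            column_list = ", ".join(f'"{name}"' for name in columns)
            placeholders = ", ".join("?" for _ in columns)
            conn.execute(
                f'INSERT INTO "{table.tag}" ({column_list}) VALUES ({placeholders})',
                values)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parts (Number TEXT, Price REAL)")
load_tables(DOC, conn)
rows = conn.execute("SELECT Number, Price FROM Parts ORDER BY Number").fetchall()
print(rows)
```

The code stays short because the document structure mirrors the schema one-to-one; any document that does not follow this shape cannot be loaded this way.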

Pros and cons

• Pros
  » Easy to understand
  » Code is simple and fast
  » Useful for serializing databases

• Cons
  » Only works on a small subset of XML documents

Object-relational mapping

• Map XML document to objects...

<Order SONumber="12345">
  <Customer CustNumber="543">
    ...
  </Customer>
  <OrderDate>150999</OrderDate>
  <Item LineNumber="1">
    <Part Name="Cherries">
      ...
    </Part>
    <Qty Unit="ton">2</Qty>
  </Item>
</Order>

Order
  Customer
  Item
    Part

Object-relational mapping (cont.)

• ... and objects to tables

Order
  Customer
  Item
    Part

Orders:    Number | Customer | ...
Items:     OrderNumber | ItemNumber | Part | ...
Customers: ...
Parts:     ...
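The two steps (XML to objects, objects to tables) can be sketched with Python dataclasses. This is an illustration under assumptions, not the XML-DBMS implementation: the class fields, the function names, and the simplified sample document (the elided content of Customer and Part is left empty) are ours.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Simplified sample: the elided Customer and Part content is omitted.
ORDER_XML = """<Order SONumber="12345">
  <Customer CustNumber="543"/>
  <OrderDate>150999</OrderDate>
  <Item LineNumber="1"><Part Name="Cherries"/><Qty Unit="ton">2</Qty></Item>
</Order>"""

@dataclass
class Customer:
    number: int

@dataclass
class Part:
    name: str

@dataclass
class Item:
    line_number: int
    part: Part
    quantity: int
    unit: str

@dataclass
class Order:
    number: int
    customer: Customer
    order_date: str
    items: List[Item] = field(default_factory=list)

def order_from_xml(xml_text):
    """Complex elements become objects; attributes and simple elements become properties."""
    root = ET.fromstring(xml_text)
    order = Order(
        number=int(root.get("SONumber")),
        customer=Customer(int(root.find("Customer").get("CustNumber"))),
        order_date=root.findtext("OrderDate"),
    )
    for elem in root.findall("Item"):
        qty = elem.find("Qty")
        order.items.append(Item(
            line_number=int(elem.get("LineNumber")),
            part=Part(elem.find("Part").get("Name")),
            quantity=int(qty.text),
            unit=qty.get("Unit"),
        ))
    return order

def order_to_rows(order):
    """Flatten the object graph into rows keyed by (hypothetical) table name."""
    return {
        "Orders": [(order.number, order.customer.number, order.order_date)],
        "Customers": [(order.customer.number,)],
        "Items": [(order.number, i.line_number, i.part.name, i.quantity, i.unit)
                  for i in order.items],
        "Parts": [(i.part.name,) for i in order.items],
    }

print(order_to_rows(order_from_xml(ORDER_XML)))
```

Child objects are linked to their parent through key columns (here the order number in each Items row), which is how the element hierarchy survives in flat tables.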

Objects are data-specific...

• Different for each DTD (schema)

• Model the content (data) of the document

<Order SONumber="12345">
  <Customer CustNumber="543">
    ...
  </Customer>
  <OrderDate>150999</OrderDate>
  <Item LineNumber="1">
    <Part Name="Cherries">
      ...
    </Part>
    <Qty Unit="ton">2</Qty>
  </Item>
</Order>

Order
  Customer
  Item
    Part

... not the DOM

• Same for all XML documents

• Model the structure of the document

<Order SONumber="12345">
  <Customer CustNumber="543">
    ...
  </Customer>
  <OrderDate>150999</OrderDate>
  <Item LineNumber="1">
    <Part Name="Cherries">
      ...
    </Part>
    <Qty Unit="ton">2</Qty>
  </Item>
</Order>

Element (Order)
  Attr (SONumber)
  Element (Customer)
    ...
  Element (OrderDate)
    ...
  Element (Item)
    ...

Pros and cons

• Pros
  » Can handle any XML document
  » Maps well to existing data structures

• Cons
  » Very inefficient for mixed content

Data transfer issues

• Data types
  » All XML data is string
  » Conversion problems due to many formats

• Null data
  » Equivalent to missing element or attribute
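Both issues show up as soon as XML text is turned into typed column values. The sketch below is illustrative only: the helper names are hypothetical, the date format (day.month.two-digit year, as in "29.10.00") is an assumption about the sales order example, and `Discount` is an invented element used to show a missing value.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from decimal import Decimal

def typed_child(parent, tag, convert):
    """Return the converted text of a child element, or None (NULL) if it is missing."""
    text = parent.findtext(tag)
    return None if text is None else convert(text)

def parse_date(text):
    # Assumed format: day.month.two-digit year, as in "29.10.00".
    return datetime.strptime(text, "%d.%m.%y").date()

order = ET.fromstring("<Order><Number>1234</Number><Date>29.10.00</Date></Order>")
row = (
    typed_child(order, "Number", int),
    typed_child(order, "Date", parse_date),
    typed_child(order, "Discount", Decimal),   # element absent, so the value is None
)
print(row)
```

Each converter encodes one assumed text format. A document that writes the same date as "2000-10-29" would need a different converter, which is the conversion problem the slide refers to.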

Data transfer issues (cont.)

• Binary data
  » No standard way to store in XML
  » Commonly stored as unparsed entities or Base64

• Character sets
  » XML can use any encoding, including Unicode
  » Databases often require single encoding
  » Unicode is inefficient to store
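The Base64 option can be sketched with the Python standard library. The element name `Photo` and the `encoding` attribute are conventions invented for this example; as noted above, there is no standard way to mark binary content.

```python
import base64
import xml.etree.ElementTree as ET

def embed_binary(tag, data):
    """Wrap raw bytes in an element as Base64 text."""
    # The "encoding" attribute is a made-up convention, not a standard.
    elem = ET.Element(tag, {"encoding": "base64"})
    elem.text = base64.b64encode(data).decode("ascii")
    return elem

def extract_binary(elem):
    """Recover the original bytes from an element produced by embed_binary."""
    return base64.b64decode(elem.text)

# Includes byte values that could not appear directly in XML character data.
payload = bytes([0, 255, 16, 60, 38])
xml_text = ET.tostring(embed_binary("Photo", payload), encoding="unicode")
restored = extract_binary(ET.fromstring(xml_text))
print(xml_text, restored == payload)
```

The cost is size: Base64 emits four characters for every three bytes, so the stored value is about a third larger than the original binary data.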

Storing data in a native XML database

• Data stored in XML (document) format

• Pros
  » Handles semi-structured data efficiently
  » Fast retrieval of whole documents
  » Support for XML query languages, XLinks, etc.

Storing data in a native XML database (cont.)

• Cons
  » Slow retrieval of views outside the document hierarchy
  » No referential integrity
  » Data not accessible by non-XML applications

Storing and Retrieving Documents

Goals

• Preserve entire document
  » Data: elements, attributes, PCDATA
  » Logical structure: element hierarchy, sibling order
  » Physical structure: entities, CDATA, encoding...
  » Other: DTD, comments, processing instructions...

• Preserve document identity

Storing documents as BLOBs

• Pros
  » Exploits existing capabilities: transactions, security...
  » Many databases have text search tools

• Cons
  » Text-based searches of XML unreliable
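Two ways a plain text search over stored XML can go wrong are shown below, using a made-up brochure document: an escaped character hides a real match, and a CDATA section produces a false one. The document and the helper names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored document: the author name contains an escaped "&",
# and the content quotes some markup inside a CDATA section.
RAW = ("<Brochure><Author>Smith &amp; Chen</Author>"
       "<Content><![CDATA[<Author>Nobody</Author>]]></Content></Brochure>")

def text_search(raw, needle):
    """Naive substring search over the stored text, as a BLOB text search would do."""
    return needle in raw

def author_is(raw, name):
    """Parse the document and compare the real Author value."""
    return ET.fromstring(raw).findtext("Author") == name

missed = text_search(RAW, "Smith & Chen")                 # escaped form hides the match
found = author_is(RAW, "Smith & Chen")                    # parsing sees the real value
false_hit = text_search(RAW, "<Author>Nobody</Author>")   # matches text inside CDATA
print(missed, found, false_hit)
```

The text search answers a question about characters, while the user is asking a question about elements; the two only coincide when the document avoids entities, CDATA, and similar physical-structure features.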

Indexing XML BLOBs with “side tables”

• Consider the following DTD

<!ELEMENT Brochure (Title, Author, Content)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>    <!-- To be indexed -->
<!ELEMENT Content (%Inline;)>  <!-- Inline entity from XHTML -->

• Store complete documents in one table

Brochures
---------
BrochureID  INTEGER      <--------- Index brochure IDs
Brochure    LONGVARCHAR  <--------- Complete XML documents

Indexing XML BLOBs with “side tables” (cont.)

• Store elements to be indexed in separate table

Authors
----------------------
Author      VARCHAR(50)  <--------- Index authors
BrochureID  INTEGER

• Search index table and join to document table

SELECT Brochure
FROM Brochures
WHERE BrochureID IN (SELECT BrochureID FROM Authors WHERE Author='Chen')
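The side-table scheme can be tried end to end with an in-memory SQLite database. This is a sketch under assumptions: SQLite's `TEXT` stands in for `LONGVARCHAR`, the index name `AuthorIndex` and both helper functions are hypothetical, and the sample brochures are invented.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Brochures (BrochureID INTEGER PRIMARY KEY, Brochure TEXT)")
conn.execute("CREATE TABLE Authors (Author VARCHAR(50), BrochureID INTEGER)")
conn.execute("CREATE INDEX AuthorIndex ON Authors (Author)")

def store_brochure(conn, brochure_id, xml_text):
    """Store the whole document and copy its Author value into the side table."""
    conn.execute("INSERT INTO Brochures VALUES (?, ?)", (brochure_id, xml_text))
    author = ET.fromstring(xml_text).findtext("Author")
    conn.execute("INSERT INTO Authors VALUES (?, ?)", (author, brochure_id))

def brochures_by(conn, author):
    """Find documents through the side table, binding the author as a parameter."""
    sql = ("SELECT Brochure FROM Brochures WHERE BrochureID IN "
           "(SELECT BrochureID FROM Authors WHERE Author = ?)")
    return [row[0] for row in conn.execute(sql, (author,))]

store_brochure(conn, 1, "<Brochure><Title>XML</Title><Author>Chen</Author><Content>...</Content></Brochure>")
store_brochure(conn, 2, "<Brochure><Title>SQL</Title><Author>Diaz</Author><Content>...</Content></Brochure>")
print(brochures_by(conn, "Chen"))
```

The side table must be kept in step with the document table: any update or delete of a brochure has to refresh its `Authors` rows as well, ideally inside the same transaction.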

Storing documents in native XML databases

• Store whole XML documents in “native” form

• Define a (logical) model for an XML document
  » Minimal model is elements, attributes, PCDATA, and document order
  » Store and retrieve documents according to that model

• Have normal database features
  » Query language, indexes, transactions, security, etc.

Implementation strategies for native XML databases

• Text-based
  » Store documents as text
  » Proprietary or file-system storage

• Model-based
  » Store pre-parsed documents according to model
  » Relational, object-oriented, hierarchical, or proprietary storage

Persistent DOMs (PDOMs)

• Implement DOM over persistent storage

• Returned DOM tree is “live”

• Used by DOM applications that process very large XML documents

• Database is usually local

Content management systems

• Manage document fragments (content)

• Hide database from user

• Maintain versions, document metadata

• Include editors, publishing systems, etc.

• Extensible through scripting or programming

Resources

• Ronald Bourret’s Papers Page
  » http://www.rpbourret.com/xml/index.htm

• XML:DB.org’s Resources Page
  » http://www.xmldb.org/resources.html

• XML:DB Mailing List
  » http://www.xmldb.org/projects.html

Questions?
Ronald Bourret
rpbourret@rpbourret.com
http://www.rpbourret.com
