XML and Databases

Ronald Bourret
rpbourret@rpbourret.com
http://www.rpbourret.com

Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com

Overview

• Is XML a Database?

• Why Use XML with Databases?

• Data vs. Documents

• Storing and Retrieving Data

• Storing and Retrieving Documents

Is XML a Database?

Is XML a database?

• This is really two questions
  » Is an XML document a database?
  » Are XML and its surrounding technologies a database management system (DBMS)?

Is an XML document a database?

• Yes, it is a collection of data

• Pros
  » Self-describing
  » Portable (Unicode)
  » Can store directed graphs

• Cons
  » Slow access
  » Verbose

Are XML and surrounding technologies a DBMS?

• Yes, they have:
  » Data storage (XML documents)
  » Schemas (DTDs, XML Schemas, RELAX, etc.)
  » Query languages (XPath, XQuery, XQL, etc.)
  » APIs (SAX, DOM)

Are XML and surrounding technologies a DBMS? (cont.)

• No, they don’t have:
  » Separation of logical and physical data
  » Efficient storage
  » Indexes
  » Transactions
  » Multi-user access
  » Security
  » ...

Using XML as a database

• Good for small, single-user databases
  » .ini files
  » Simple address book
  » List of browser bookmarks
  » Catalog of MP3s stolen with the help of Napster

• Almost useless for large or multi-user databases
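A minimal sketch of the "small, single-user database" case, using Python's standard xml.etree.ElementTree module. The address book document and its element names (AddressBook, Entry, Name, City) are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book; the element names are illustrative only.
ADDRESS_BOOK = """
<AddressBook>
  <Entry><Name>Ana</Name><City>Chicago</City></Entry>
  <Entry><Name>Ben</Name><City>Boston</City></Entry>
</AddressBook>
"""

def names_in_city(xml_text, city):
    """Scan every entry in the document; there is no index to consult."""
    root = ET.fromstring(xml_text)
    return [entry.findtext("Name")
            for entry in root.findall("Entry")
            if entry.findtext("City") == city]

chicago_names = names_in_city(ADDRESS_BOOK, "Chicago")
print(chicago_names)
```

Every query parses and walks the whole file, which is acceptable for a few hundred entries and one user, and is the reason this approach does not scale to large or multi-user databases.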

Why Use XML with Databases?

Why use XML with databases?

• Expose legacy data as XML

• Transfer data between databases

• Integrate data from a variety of sources

• Store semi-structured data

• Queue e-commerce messages

• Manage and query large document collections

Data vs. Documents

Data vs. documents

• Are you storing documents or the data in them?

<Address>
  <Street>123 Main St.</Street>
  <City>Chicago</City>
  <State>IL</State>
  <PostCode>60609</PostCode>
  <Country>USA</Country>
</Address>

Yellow = Data
White + Yellow = Document

• Helps determine the system you need

• Look at your XML documents to decide
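As a rough illustration of the distinction, the sketch below pulls only the data out of the Address example with Python's xml.etree.ElementTree. Treating the element names as keys is an assumption made for this example; the point is that the result keeps the values and discards the document itself.

```python
import xml.etree.ElementTree as ET

ADDRESS = """<Address>
  <Street>123 Main St.</Street>
  <City>Chicago</City>
  <State>IL</State>
  <PostCode>60609</PostCode>
  <Country>USA</Country>
</Address>"""

root = ET.fromstring(ADDRESS)

# The "data": one value per child element, keyed by element name.
data = {child.tag: child.text for child in root}
print(data)
```

If this dictionary is all an application needs, it is storing data. If it must get back the exact markup, whitespace, and ordering, it is storing a document.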

Data-centric documents

• Use XML primarily as a data transport

• Designed for machine consumption

• Sales orders, scientific data, dynamic Web pages

• Characteristics
  » Regular structure
  » Fine-grained data
  » Little or no mixed content
  » Sibling order not significant

Example: Sales order

<Order>
  <Number>1234</Number>
  <Customer>Gallagher Industries</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part>
    <Quantity>12</Quantity>
    <Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part>
    <Quantity>600</Quantity>
    <Price>3.99</Price>
  </Item>
</Order>

Example: Dynamic Web page

<html>
<head><title>Flight Schedule: SFO to FRA</title></head>
<body>
<p>Daily flights from SFO to FRA</p>
<table>
  <tr><th>Airline</th><th>Num</th><th>Depart</th><th>Arrive</th></tr>
  <tr><td>Air France</td><td>527</td><td>12:00</td><td>10:33</td></tr>
  <tr><td>Lufthansa</td><td>459</td><td>13:55</td><td>10:05</td></tr>
  <tr><td>American</td><td>385</td><td>14:17</td><td>11:48</td></tr>
  <tr><td>Delta</td><td>99</td><td>15:30</td><td>14:02</td></tr>
</table>
</body>
</html>

Document-centric documents

• Designed for human consumption

• Use XML to provide structure, metadata

• Books, presentations, email, static Web pages

• Characteristics
  » Irregular or semi-regular structure
  » Large-grained data
  » Lots of mixed content
  » Sibling order significant

Example: Product description

<Product>

<Para><Name>XML-DBMS</Name> is <Summary>middleware for transferring data between XML documents and relational databases</Summary>. It is written by <Developer>Ronald Bourret</Developer>.</Para>

<Para>XML-DBMS uses an object-relational mapping in which complex element types are viewed as classes and simple element types, PCDATA, and attributes, as well as references to complex types, are viewed as properties.</Para>

<Para>You can:
<List>
  <Item><Link URL="Readme.htm">Read more about XML-DBMS</Link></Item>
  <Item><Link URL="jxmldbms.zip">Download Java version</Link></Item>
  <Item><Link URL="pxmldbms.zip">Download PERL version</Link></Item>
</List>
</Para>

</Product>
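Mixed content is one of the easier characteristics to check mechanically. The sketch below is a heuristic only, written with Python's xml.etree.ElementTree; the helper name has_mixed_content and the two shortened sample fragments are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(elem):
    """Return True if any element holds both child elements and non-whitespace text."""
    for node in elem.iter():
        children = list(node)
        if not children:
            continue
        # Text before the first child, plus the text that follows each child.
        texts = [node.text] + [child.tail for child in children]
        if any(t and t.strip() for t in texts):
            return True
    return False

# Shortened versions of the sales order and product description examples.
order = ET.fromstring(
    "<Order><Number>1234</Number>"
    "<Item Number='1'><Part>A-10</Part></Item></Order>")
para = ET.fromstring(
    "<Para><Name>XML-DBMS</Name> is <Summary>middleware</Summary>.</Para>")

order_mixed = has_mixed_content(order)
para_mixed = has_mixed_content(para)
print(order_mixed, para_mixed)
```

A document that reports mixed content in many places is more likely to be document-centric, although a single check like this cannot settle the question on its own.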

Storing data and documents

• Store data in traditional database
  » Use a native XML database under certain conditions

• Store documents in native XML database
  » Use a traditional database under certain conditions

• Boundary between data and documents not always clear in practice

Storing and Retrieving Data

Goals and non-goals

• Goals
  » Preserve data and hierarchical order
  » Optionally preserve sibling order
  » One- or two-way data transfer

• Non-goals
  » Preserve physical structure (entity use, encodings, ...)
  » Preserve DTD, comments, processing instructions...
  » Preserve document identity

Data transfer software

• May be middleware or integrated into DBMS

• If integrated, DBMS is said to be XML-enabled

Mapping data in XML documents to databases

• Most common mapping strategies
  » Template-driven
  » Model-driven

• No mapping needed for native XML databases

Template-driven mappings

• Commands embedded in template

• Extremely flexible
  » Retrieve data with SQL or other query language
  » Place values almost anywhere in document
  » Parameterize subsequent SQL statements
  » Programming constructs such as if-then-else and for

• Transfer from database to XML only

Example: Template

<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs.</Conclude>
</FlightInfo>

Example: Output

<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <Flights>
    <Row>
      <Airline>ACME</Airline>
      <FltNumber>123</FltNumber>
      <Depart>Dec 12, 1998 13:43</Depart>
      <Arrive>Dec 13, 1998 01:21</Arrive>
    </Row>
    ...
  </Flights>
  <Conclude>We hope one of these meets your needs.</Conclude>
</FlightInfo>
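The template and its output can be connected with a small processor. The following is a sketch only, not the behavior of any particular product: it uses Python's sqlite3 and xml.etree.ElementTree, the function name expand_template is made up, the result element name Flights is hard-coded, and the in-memory Flights table with its one row is sample data for the demonstration.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs.</Conclude>
</FlightInfo>"""

def expand_template(template_text, conn):
    """Replace each top-level <SelectStmt> with a <Flights> element of <Row>s."""
    root = ET.fromstring(template_text)
    for index, child in enumerate(list(root)):
        if child.tag != "SelectStmt":
            continue
        cursor = conn.execute(child.text)           # run the embedded query
        columns = [col[0] for col in cursor.description]
        result = ET.Element("Flights")              # assumption: fixed result name
        for row in cursor:
            row_elem = ET.SubElement(result, "Row")
            for name, value in zip(columns, row):
                ET.SubElement(row_elem, name).text = str(value)
        result.tail = child.tail
        root[index] = result                        # swap the command for its result
    return root

# Sample database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)")
conn.execute(
    "INSERT INTO Flights VALUES ('ACME', 123, 'Dec 12, 1998 13:43', 'Dec 13, 1998 01:21')")

expanded = expand_template(TEMPLATE, conn)
print(ET.tostring(expanded, encoding="unicode"))
```

Because the query text lives in the template, the transfer only runs from the database to XML, which matches the limitation noted for template-driven mappings.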

Model-driven mappings

• Two mappings are common
  » Table-based
  » Object-relational

• Data transferred according to model

• Two-way data transfer

• Simpler than templates, but less flexible

• Often used with XSLT

Table-based mapping

• Map document with “table” structure to RDBMS

<database>
  <table1>
    <row>
      <column1>value 1</column1>
      <column2>value 2</column2>
      ...
    </row>
    ...
  </table1>
  <table2>
    ...
  </table2>
  ...
</database>

Table1 (Column1, Column2, ...)

Table2 (Column1, Column2, ...)
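A minimal sketch of the database-to-XML direction of a table-based mapping, using Python's sqlite3 and xml.etree.ElementTree. The function name export_tables and the sample Parts table are assumptions for this example, and omitting the element for a NULL column is one possible convention rather than a rule of the mapping.

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_tables(conn, table_names):
    """Serialize tables as <database><table><row><column>value</column>..."""
    database = ET.Element("database")
    for table in table_names:
        table_elem = ET.SubElement(database, table)
        # Table names come from a trusted list here; never interpolate user input.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [col[0] for col in cursor.description]
        for row in cursor:
            row_elem = ET.SubElement(table_elem, "row")
            for name, value in zip(columns, row):
                if value is not None:               # NULL: leave the element out
                    ET.SubElement(row_elem, name).text = str(value)
    return database

# Sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parts (Number TEXT, Price REAL)")
conn.executemany("INSERT INTO Parts VALUES (?, ?)", [("A-10", 10.95), ("B-43", None)])

exported = export_tables(conn, ["Parts"])
print(ET.tostring(exported, encoding="unicode"))
```

The code stays this short because the document structure mirrors the tables exactly, which is also why the mapping only fits documents that already have this shape.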

Pros and cons

• Pros
  » Easy to understand
  » Code is simple and fast
  » Useful for serializing databases

• Cons
  » Only works on a small subset of XML documents

Object-relational mapping

• Map XML document to objects...

<Order SONumber="12345">
  <Customer CustNumber="543">
    ...
  </Customer>
  <OrderDate>150999</OrderDate>
  <Item LineNumber="1">
    <Part Name="Cherries">
      ...
    </Part>
    <Qty Unit="ton">2</Qty>
  </Item>
</Order>

[Diagram: object tree. Order contains Customer and Item; Item contains Part.]

Object-relational mapping (cont.)

• ... and objects to tables

[Diagram: the Order, Customer, Item, and Part objects map to the following tables.]

Orders (Number, Customer, ...)

Items (OrderNumber, ItemNumber, Part, ...)

Customers (...)

Parts (...)
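The two steps (XML to objects, objects to tables) can be sketched as follows with Python dataclasses, xml.etree.ElementTree, and sqlite3. This is an illustration under stated assumptions, not the XML-DBMS implementation: the class and function names are invented, the Customer and Part elements are reduced to empty elements because their content is elided in the example, and the Quantity column is an addition for the demonstration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Item:
    line_number: int
    part: str
    quantity: int

@dataclass
class Order:
    number: str
    customer: str
    items: list = field(default_factory=list)

# Simplified order: the elided child content of Customer and Part is left out.
ORDER_XML = """<Order SONumber="12345">
  <Customer CustNumber="543"/>
  <Item LineNumber="1"><Part Name="Cherries"/><Qty Unit="ton">2</Qty></Item>
</Order>"""

def order_from_xml(xml_text):
    """Step 1: complex element types become objects, attributes become properties."""
    root = ET.fromstring(xml_text)
    order = Order(root.get("SONumber"), root.find("Customer").get("CustNumber"))
    for item in root.findall("Item"):
        order.items.append(Item(int(item.get("LineNumber")),
                                item.find("Part").get("Name"),
                                int(item.findtext("Qty"))))
    return order

def save_order(conn, order):
    """Step 2: each class maps to a table; child rows carry the parent's key."""
    conn.execute("INSERT INTO Orders VALUES (?, ?)", (order.number, order.customer))
    conn.executemany(
        "INSERT INTO Items VALUES (?, ?, ?, ?)",
        [(order.number, i.line_number, i.part, i.quantity) for i in order.items])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Number TEXT PRIMARY KEY, Customer TEXT)")
conn.execute(
    "CREATE TABLE Items (OrderNumber TEXT, ItemNumber INTEGER, Part TEXT, Quantity INTEGER)")

order = order_from_xml(ORDER_XML)
save_order(conn, order)
item_rows = conn.execute("SELECT * FROM Items").fetchall()
print(item_rows)
```

The OrderNumber column in Items plays the role of the foreign key that links each item row back to its order.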

Objects are data-specific...

• Different for each DTD (schema)

• Model the content (data) of the document

<Order SONumber="12345">
  <Customer CustNumber="543">
    ...
  </Customer>
  <OrderDate>150999</OrderDate>
  <Item LineNumber="1">
    <Part Name="Cherries">
      ...
    </Part>
    <Qty Unit="ton">2</Qty>
  </Item>
</Order>

[Diagram: object tree. Order contains Customer and Item; Item contains Part.]

... not the DOM

• Same for all XML documents

• Model the structure of the document

[Diagram: DOM tree. Element (Order) has an Attr (SONumber) and three child nodes: Element (Customer), Element (OrderDate), and Element (Item), each followed by further nodes (...).]

<Order SONumber="12345">
  <Customer CustNumber="543">
    ...
  </Customer>
  <OrderDate>150999</OrderDate>
  <Item LineNumber="1">
    <Part Name="Cherries">
      ...
    </Part>
    <Qty Unit="ton">2</Qty>
  </Item>
</Order>
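To make the contrast concrete, the sketch below parses a shortened order with Python's xml.dom.minidom. Whatever the document contains, the node classes are the same generic ones (Element, Attr, Text); nothing like an Order or Item class appears. The shortened document is an assumption made for this example.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<Order SONumber="12345"><OrderDate>150999</OrderDate></Order>')
root = doc.documentElement

# Generic node classes paired with the names they happen to carry.
node_kinds = [(type(node).__name__, node.nodeName) for node in root.childNodes]
attr_kind = type(root.getAttributeNode("SONumber")).__name__
print(type(root).__name__, attr_kind, node_kinds)
```

The DOM models the structure of any document, while the objects in an object-relational mapping model the content of documents that follow one particular DTD.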

Pros and cons

• Pros
  » Can handle any XML document
  » Maps well to existing data structures

• Cons
  » Very inefficient for mixed content

Data transfer issues

• Data types
  » All XML data is string
  » Conversion problems due to many formats

• Null data
  » Equivalent to missing element or attribute
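Both issues show up as soon as values leave the document. In the sketch below (Python, xml.etree.ElementTree and datetime) the date formats are guesses: "29.10.00" is read as day.month.two-digit-year and "150999" as ddmmyy, and nothing in the XML itself confirms either reading. The Discount element and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

def parse_date(text, fmt):
    """Convert a date string using a format the application must already know."""
    return datetime.strptime(text, fmt).date()

def optional_text(parent, tag):
    """Treat a missing element as NULL (None)."""
    elem = parent.find(tag)
    return None if elem is None else elem.text

order = ET.fromstring(
    "<Order><Date>29.10.00</Date><Quantity>12</Quantity></Order>")

order_date = parse_date(order.findtext("Date"), "%d.%m.%y")   # assumed dd.mm.yy
other_date = parse_date("150999", "%d%m%y")                   # assumed ddmmyy
quantity = int(order.findtext("Quantity"))                    # string to integer
discount = optional_text(order, "Discount")                   # element absent
print(order_date, other_date, quantity, discount)
```

The format strings have to come from outside the document (a schema, a mapping file, or convention), which is where the conversion problems originate.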

Data transfer issues (cont.)

• Binary data
  » No standard way to store in XML
  » Commonly stored as unparsed entities or Base64

• Character sets
  » XML can use any encoding, including Unicode
  » Databases often require single encoding
  » Unicode is inefficient to store
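A short sketch of the Base64 option and of the storage cost of different encodings, in Python. The Thumbnail element and its encoding attribute are hypothetical names, since XML defines no standard markup for binary content, and the byte values and the sample word are arbitrary.

```python
import base64
import xml.etree.ElementTree as ET

# Binary data: encode to Base64 text, store as element content, decode on the way out.
payload = bytes([0, 255, 16, 32])
elem = ET.Element("Thumbnail", {"encoding": "base64"})   # hypothetical markup
elem.text = base64.b64encode(payload).decode("ascii")
xml_text = ET.tostring(elem, encoding="unicode")
decoded = base64.b64decode(ET.fromstring(xml_text).text)

# Character sets: the same six characters need different amounts of storage.
word = "Zürich"
sizes = {name: len(word.encode(name)) for name in ("latin-1", "utf-8", "utf-16-le")}
print(xml_text, decoded == payload, sizes)
```

Base64 grows the data by roughly a third, and a fixed two-byte encoding doubles the size of mostly Latin text, which is the kind of overhead the character set bullet refers to.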

Storing data in a native XML database

• Data stored in XML (document) format

• Pros
  » Handles semi-structured data efficiently
  » Fast retrieving whole documents
  » Support for XML query languages, XLinks, etc.

Storing data in a native XML database (cont.)

• Cons
  » Slow retrieving views outside of document hierarchy
  » No referential integrity
  » Data not accessible by non-XML applications

Storing and Retrieving Documents

Goals

• Preserve entire document
  » Data: elements, attributes, PCDATA
  » Logical structure: element hierarchy, sibling order
  » Physical structure: entities, CDATA, encoding...
  » Other: DTD, comments, processing instructions...

• Preserve document identity

Storing documents as BLOBs

• Pros
  » Exploits existing capabilities: transactions, security...
  » Many databases have text search tools

• Cons
  » Text-based searches of XML unreliable
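One way to see why plain text searches are unreliable: the same logical content can be written with character references or CDATA sections, so a substring match can both miss real hits and report false ones. The document below is a contrived example, checked with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# The author is "Chen", written with a character reference (&#101; is "e").
# The title mentions "Chen" inside a CDATA section, but it is not an author.
DOC = ("<Brochure><Author>Ch&#101;n</Author>"
       "<Title><![CDATA[<Chen> & Co.]]></Title></Brochure>")

naive_author_hit = "<Author>Chen</Author>" in DOC    # misses the real author
naive_word_hit = "Chen" in DOC                       # matches text in the title

root = ET.fromstring(DOC)
parsed_author = root.findtext("Author")
parsed_title = root.findtext("Title")
print(naive_author_hit, naive_word_hit, parsed_author, parsed_title)
```

Only a search that understands XML (or an index built from parsed values) sees the author as "Chen" and ignores the occurrence in the title.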

Indexing XML BLOBs with “side tables”

• Consider the following DTD

<!ELEMENT Brochure (Title, Author, Content)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>    <!-- To be indexed -->
<!ELEMENT Content (%Inline;)>  <!-- Inline entity from XHTML -->

• Store complete documents in one table

Brochures
---------
BrochureID  INTEGER      <--------- Index brochure IDs
Brochure    LONGVARCHAR  <--------- Complete XML documents

Indexing XML BLOBs with “side tables” (cont.)

• Store elements to be indexed in separate table

Authors
----------------------
Author      VARCHAR(50)  <--------- Index authors
BrochureID  INTEGER

• Search index table and join to document table

SELECT Brochure
  FROM Brochures
  WHERE BrochureID IN (SELECT BrochureID FROM Authors WHERE Author='Chen')
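The side-table technique can be tried end to end with Python's sqlite3 and xml.etree.ElementTree. This is a sketch under assumptions: SQLite stands in for the DBMS, TEXT replaces LONGVARCHAR, the index name AuthorIndex and the helper store_brochure are invented, and the two brochures are sample data.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Brochures (BrochureID INTEGER PRIMARY KEY, Brochure TEXT)")
conn.execute("CREATE TABLE Authors (Author VARCHAR(50), BrochureID INTEGER)")
conn.execute("CREATE INDEX AuthorIndex ON Authors (Author)")

def store_brochure(conn, brochure_id, xml_text):
    """Store the whole document and copy the indexed element into the side table."""
    author = ET.fromstring(xml_text).findtext("Author")
    conn.execute("INSERT INTO Brochures VALUES (?, ?)", (brochure_id, xml_text))
    conn.execute("INSERT INTO Authors VALUES (?, ?)", (author, brochure_id))

store_brochure(conn, 1,
    "<Brochure><Title>A</Title><Author>Chen</Author><Content>x</Content></Brochure>")
store_brochure(conn, 2,
    "<Brochure><Title>B</Title><Author>Diaz</Author><Content>y</Content></Brochure>")

# Search the side table, then fetch the matching documents.
matches = conn.execute(
    "SELECT Brochure FROM Brochures WHERE BrochureID IN "
    "(SELECT BrochureID FROM Authors WHERE Author = ?)", ("Chen",)).fetchall()
print(matches)
```

The side table must be kept in step with the documents: every insert, update, or delete of a brochure has to adjust the Authors rows as well, ideally in the same transaction.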

Storing documents in native XML databases

• Store whole XML documents in “native” form

• Define a (logical) model for an XML document
  » Minimal model is elements, attributes, PCDATA, and document order
  » Store and retrieve documents according to that model

• Have normal database features
  » Query language, indexes, transactions, security, etc.

Implementation strategies for native XML databases

• Text-based
  » Store documents as text
  » Proprietary or file-system storage

• Model-based
  » Store pre-parsed documents according to model
  » Relational, object-oriented, hierarchical, or proprietary storage

Persistent DOMs (PDOMs)

• Implement DOM over persistent storage

• Returned DOM tree is “live”

• Used by DOM applications that process very large XML documents

• Database is usually local

Content management systems

• Manage document fragments (content)

• Hide database from user

• Maintain versions, document metadata

• Include editors, publishing systems, etc.

• Extensible through scripting or programming

Resources

• Ronald Bourret’s Papers Page
  » http://www.rpbourret.com/xml/index.htm

• XML:DB.org’s Resources Page
  » http://www.xmldb.org/resources.html

• XML:DB Mailing List
  » http://www.xmldb.org/projects.html

Questions?

Ronald Bourret
rpbourret@rpbourret.com
http://www.rpbourret.com