Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
XML and Databases
Ronald Bourret
http://www.rpbourret.com
Overview
• Is XML a Database?
• Why Use XML with Databases?
• Data vs. Documents
• Storing and Retrieving Data
• Storing and Retrieving Documents
Is XML a Database?
• This is really two questions:
  » Is an XML document a database?
  » Are XML and its surrounding technologies a database management system (DBMS)?
Is an XML document a database?
• Yes, it is a collection of data
• Pros
  » Self-describing
  » Portable (Unicode)
  » Can store directed graphs
• Cons
  » Slow access
  » Verbose
Are XML and surrounding technologies a DBMS?
• Yes, they have:
  » Data storage (XML documents)
  » Schemas (DTDs, XML Schemas, RELAX, etc.)
  » Query languages (XPath, XQuery, XQL, etc.)
  » APIs (SAX, DOM)
Are XML and surrounding technologies a DBMS? (cont.)
• No, they don’t have:
  » Separation of logical and physical data
  » Efficient storage
  » Indexes
  » Transactions
  » Multi-user access
  » Security
  » ...
Using XML as a database
• Good for small, single-user databases
  » .ini files
  » Simple address book
  » List of browser bookmarks
  » Catalog of MP3s stolen with the help of Napster
• Almost useless for large or multi-user databases
Why Use XML with Databases?
• Expose legacy data as XML
• Transfer data between databases
• Integrate data from a variety of sources
• Store semi-structured data
• Queue e-commerce messages
• Manage and query large document collections
Data vs. Documents
• Are you storing documents or the data in them?
<Address>
  <Street>123 Main St.</Street>
  <City>Chicago</City>
  <State>IL</State>
  <PostCode>60609</PostCode>
  <Country>USA</Country>
</Address>
Data = the values alone (123 Main St., Chicago, IL, ...); Document = the values together with the markup around them
• The answer helps determine the system you need
• Look at your XML documents to decide
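A minimal Python sketch (standard library only) makes the distinction concrete for the address example: the whole string, markup and layout included, is the document, while the data is just the element values. The name `address_data` is hypothetical, and the sketch assumes a single level of child elements.

```python
import xml.etree.ElementTree as ET

# The whole address example: this string is the "document".
DOCUMENT = """<Address>
  <Street>123 Main St.</Street>
  <City>Chicago</City>
  <State>IL</State>
  <PostCode>60609</PostCode>
  <Country>USA</Country>
</Address>"""


def address_data(xml_text: str) -> dict:
    """Return only the data: each child element's name mapped to its text value."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


data = address_data(DOCUMENT)
print(data["City"])  # prints "Chicago"
```

Keeping only `data` fits ordinary table columns; returning `DOCUMENT` exactly as written requires storing the document itself.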
Data-centric documents
• Use XML primarily as a data transport
• Designed for machine consumption
• Sales orders, scientific data, dynamic Web pages
• Characteristics
  » Regular structure
  » Fine-grained data
  » Little or no mixed content
  » Sibling order not significant
Example: Sales order
<Order>
  <Number>1234</Number>
  <Customer>Gallagher Industries</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part>
    <Quantity>12</Quantity>
    <Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part>
    <Quantity>600</Quantity>
    <Price>3.99</Price>
  </Item>
</Order>
Example: Dynamic Web page
<html>
  <head><title>Flight Schedule: SFO to FRA</title></head>
  <body>
    <p>Daily flights from SFO to FRA</p>
    <table>
      <tr><th>Airline</th><th>Num</th><th>Depart</th><th>Arrive</th></tr>
      <tr><td>Air France</td><td>527</td><td>12:00</td><td>10:33</td></tr>
      <tr><td>Lufthansa</td><td>459</td><td>13:55</td><td>10:05</td></tr>
      <tr><td>American</td><td>385</td><td>14:17</td><td>11:48</td></tr>
      <tr><td>Delta</td><td>99</td><td>15:30</td><td>14:02</td></tr>
    </table>
  </body>
</html>
Document-centric documents
• Designed for human consumption
• Use XML to provide structure, metadata
• Books, presentations, email, static Web pages
• Characteristics
  » Irregular or semi-regular structure
  » Large-grained data
  » Lots of mixed content
  » Sibling order significant
Example: Product description
<Product>
  <Para><Name>XML-DBMS</Name> is <Summary>middleware for transferring data between XML documents and relational databases</Summary>. It is written by <Developer>Ronald Bourret</Developer>.</Para>
  <Para>XML-DBMS uses an object-relational mapping in which complex element types are viewed as classes and simple element types, PCDATA, and attributes, as well as references to complex types, are viewed as properties.</Para>
  <Para>You can:<List>
    <Item><Link URL="Readme.htm">Read more about XML-DBMS</Link></Item>
    <Item><Link URL="jxmldbms.zip">Download Java version</Link></Item>
    <Item><Link URL="pxmldbms.zip">Download PERL version</Link></Item>
  </List></Para>
</Product>
Storing data and documents
• Store data in traditional database
  » Use a native XML database under certain conditions
• Store documents in native XML database
  » Use a traditional database under certain conditions
• Boundary between data and documents not always clear in practice
Storing and Retrieving Data
Goals and non-goals
• Goals
  » Preserve data and hierarchical order
  » Optionally preserve sibling order
  » One- or two-way data transfer
• Non-goals
  » Preserve physical structure (entity use, encodings, ...)
  » Preserve DTD, comments, processing instructions...
  » Preserve document identity
Data transfer software
• May be middleware or integrated into DBMS
• If integrated, DBMS is said to be XML-enabled
Mapping data in XML documents to databases
• Most common mapping strategies
  » Template-driven
  » Model-driven
• No mapping needed for native XML databases
Template-driven mappings
• Commands (such as SQL SELECT statements) embedded in a template; the processor replaces them with their results
• Extremely flexible
  » Retrieve data with SQL or other query language
  » Place values almost anywhere in document
  » Parameterize subsequent SQL statements
  » Programming constructs such as if-then-else and for
• Transfer from database to XML only
Example: Template
<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs.</Conclude>
</FlightInfo>
Example: Output
<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <Flights>
    <Row>
      <Airline>ACME</Airline>
      <FltNumber>123</FltNumber>
      <Depart>Dec 12, 1998 13:43</Depart>
      <Arrive>Dec 13, 1998 01:21</Arrive>
    </Row>
    ...
  </Flights>
  <Conclude>We hope one of these meets your needs.</Conclude>
</FlightInfo>
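A template-driven transfer might be sketched in Python with the standard `sqlite3` and `xml.etree.ElementTree` modules. This is an illustration under stated assumptions, not any product's template language: the `Flights` table and its one sample row are invented for the example, `fill_template` is a hypothetical name, and each `<SelectStmt>` is replaced by a `<Flights>` element holding one `<Row>` per result row, with child elements named after the result columns as in the output above. NULL values are simply omitted.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs.</Conclude>
</FlightInfo>"""


def fill_template(template: str, conn: sqlite3.Connection,
                  wrapper: str = "Flights") -> ET.Element:
    """Replace each <SelectStmt> with a <wrapper> element holding one <Row> per result row."""
    root = ET.fromstring(template)
    for parent in list(root.iter()):  # snapshot, because the tree is modified below
        for index, child in enumerate(list(parent)):
            if child.tag != "SelectStmt":
                continue
            cursor = conn.execute(child.text)  # the SQL comes from the template itself
            columns = [desc[0] for desc in cursor.description]
            result = ET.Element(wrapper)
            result.tail = child.tail  # keep the template's whitespace layout
            for row in cursor:
                row_elem = ET.SubElement(result, "Row")
                for name, value in zip(columns, row):
                    if value is not None:  # NULL -> no element
                        ET.SubElement(row_elem, name).text = str(value)
            # One element out, one in: the snapshot indexes stay aligned.
            parent.remove(child)
            parent.insert(index, result)
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)")
conn.execute("INSERT INTO Flights VALUES ('ACME', 123, 'Dec 12, 1998 13:43', 'Dec 13, 1998 01:21')")
output = fill_template(TEMPLATE, conn)
print(ET.tostring(output, encoding="unicode"))
```

Data moves only from the database into XML, consistent with the one-way limitation of template-driven mappings. Because the SQL text comes from the template, templates must come from trusted authors.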
Model-driven mappings
• Two mappings are common
  » Table-based
  » Object-relational
• Data transferred according to model
• Two-way data transfer
• Simpler than templates, but less flexible
• Often used with XSLT, which transforms documents to and from the structure the model expects
Table-based mapping
• Map document with “table” structure to RDBMS
<database>
  <table1>
    <row>
      <column1>value 1</column1>
      <column2>value 2</column2>
      ...
    </row>
    ...
  </table1>
  <table2>
    ...
  </table2>
  ...
</database>
Table1 (Column1, Column2, ...)
Table2 (Column1, Column2, ...)
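A table-based load can be sketched in Python, assuming SQLite as the relational database and `load_table_based` as a hypothetical name. Each child of the root is treated as a table, each of its children as a row, and each row's children as columns. Every column is declared `TEXT`, since XML values are untyped strings, and a column element missing from a row is stored as NULL. The sketch also assumes the first row of each table names all of its columns.

```python
import sqlite3
import xml.etree.ElementTree as ET


def quote_ident(name: str) -> str:
    """Quote an element name for use as an SQL table or column name."""
    return '"' + name.replace('"', '""') + '"'


def load_table_based(xml_text: str, conn: sqlite3.Connection) -> None:
    """Load <database><table><row><column>value</column>...</row></table></database>."""
    database = ET.fromstring(xml_text)
    for table in database:
        rows = list(table)
        if not rows:
            continue
        # Assumption: the first row lists every column the table needs.
        columns = [column.tag for column in rows[0]]
        table_name = quote_ident(table.tag)
        column_defs = ", ".join(quote_ident(c) + " TEXT" for c in columns)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table_name} ({column_defs})")
        column_list = ", ".join(quote_ident(c) for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        insert = f"INSERT INTO {table_name} ({column_list}) VALUES ({placeholders})"
        for row in rows:
            values = {column.tag: column.text for column in row}
            conn.execute(insert, [values.get(c) for c in columns])  # missing -> NULL


DOC = """<database>
  <table1>
    <row><column1>value 1</column1><column2>value 2</column2></row>
    <row><column1>value 3</column1></row>
  </table1>
</database>"""

conn = sqlite3.connect(":memory:")
load_table_based(DOC, conn)
print(conn.execute("SELECT column1, column2 FROM table1 ORDER BY rowid").fetchall())
# [('value 1', 'value 2'), ('value 3', None)]
```

Documents that do not follow the `<database>/<table>/<row>/<column>` shape cannot be loaded this way, which is why the mapping only suits a small subset of XML documents.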
Pros and cons
• Pros
  » Easy to understand
  » Code is simple and fast
  » Useful for serializing databases
• Cons
  » Only works on a small subset of XML documents
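Serializing in the other direction is equally short. This Python sketch (assumed names: `serialize_tables`, plus a sample `Parts` table invented for the example) writes each table in the same `<database>/<table>/<row>/<column>` shape and turns NULLs into missing elements. Table names are interpolated into the SQL, so they must be trusted identifiers.

```python
import sqlite3
import xml.etree.ElementTree as ET


def serialize_tables(conn: sqlite3.Connection, tables: list) -> str:
    """Write each named table as <database><table><row><column>... text."""
    database = ET.Element("database")
    for name in tables:
        table_elem = ET.SubElement(database, name)
        cursor = conn.execute(f'SELECT * FROM "{name}"')  # name assumed trusted
        columns = [desc[0] for desc in cursor.description]
        for row in cursor:
            row_elem = ET.SubElement(table_elem, "row")
            for column, value in zip(columns, row):
                if value is not None:  # NULL -> missing element
                    ET.SubElement(row_elem, column).text = str(value)
    return ET.tostring(database, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parts (Part TEXT, Price REAL)")
conn.executemany("INSERT INTO Parts VALUES (?, ?)", [("A-10", 10.95), ("B-43", None)])
xml_text = serialize_tables(conn, ["Parts"])
print(xml_text)
```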
Object-relational mapping
• Map XML document to objects...
<Order SONumber="12345">
  <Customer CustNumber="543">
    ...
  </Customer>
  <OrderDate>150999</OrderDate>
  <Item LineNumber="1">
    <Part Name="Cherries">
      ...
    </Part>
    <Qty Unit="ton">2</Qty>
  </Item>
</Order>
Object tree: Order, with child objects Customer and Item; Item has a child Part object
Object-relational mapping (cont.)
• ... and objects to tables
  Orders (Number, Customer, ...)
  Items (OrderNumber, ItemNumber, Part, ...)
  Customers (...)
  Parts (...)
Object tree: Order, with child objects Customer and Item; Item has a child Part object
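Both steps of an object-relational mapping can be sketched in Python for the order above. Everything here is an assumption for illustration: the class and function names, the table columns beyond those shown on the slide (`OrderDate`, `Qty`, `Unit`, and the key columns of `Customers` and `Parts`), and the simplified input, which leaves out the elided `...` content.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


# Step 1: one class per complex element type (data-specific, not DOM nodes).
@dataclass
class Customer:
    number: str


@dataclass
class Part:
    name: str


@dataclass
class Item:
    line_number: int
    part: Part
    qty: str
    unit: str


@dataclass
class Order:
    number: str
    customer: Customer
    order_date: str
    items: list = field(default_factory=list)


def xml_to_order(xml_text: str) -> Order:
    """Map elements to objects; attributes and simple elements become properties."""
    root = ET.fromstring(xml_text)
    order = Order(number=root.get("SONumber"),
                  customer=Customer(root.find("Customer").get("CustNumber")),
                  order_date=root.findtext("OrderDate"))
    for item in root.findall("Item"):
        qty = item.find("Qty")
        order.items.append(Item(line_number=int(item.get("LineNumber")),
                                part=Part(item.find("Part").get("Name")),
                                qty=qty.text, unit=qty.get("Unit")))
    return order


# Step 2: one table per class; object references become key columns.
SCHEMA = """
CREATE TABLE Customers (Number TEXT PRIMARY KEY);
CREATE TABLE Parts (Name TEXT PRIMARY KEY);
CREATE TABLE Orders (Number TEXT PRIMARY KEY, Customer TEXT REFERENCES Customers, OrderDate TEXT);
CREATE TABLE Items (OrderNumber TEXT REFERENCES Orders, ItemNumber INTEGER,
                    Part TEXT REFERENCES Parts, Qty TEXT, Unit TEXT,
                    PRIMARY KEY (OrderNumber, ItemNumber));
"""


def order_to_tables(order: Order, conn: sqlite3.Connection) -> None:
    conn.execute("INSERT OR IGNORE INTO Customers VALUES (?)", (order.customer.number,))
    conn.execute("INSERT INTO Orders VALUES (?, ?, ?)",
                 (order.number, order.customer.number, order.order_date))
    for item in order.items:
        conn.execute("INSERT OR IGNORE INTO Parts VALUES (?)", (item.part.name,))
        conn.execute("INSERT INTO Items VALUES (?, ?, ?, ?, ?)",
                     (order.number, item.line_number, item.part.name, item.qty, item.unit))


# The slide's order, with the elided "..." content left out.
ORDER_XML = """<Order SONumber="12345">
  <Customer CustNumber="543"/>
  <OrderDate>150999</OrderDate>
  <Item LineNumber="1">
    <Part Name="Cherries"/>
    <Qty Unit="ton">2</Qty>
  </Item>
</Order>"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
order = xml_to_order(ORDER_XML)
order_to_tables(order, conn)
print(conn.execute("SELECT * FROM Items").fetchall())  # [('12345', 1, 'Cherries', '2', 'ton')]
```

The `Unit` attribute of `Qty` becomes its own column, and the order's reference to its customer becomes the `Customer` key column in `Orders`.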
Objects are data-specific...
• Different for each DTD (schema)
• Model the content (data) of the document
<Order SONumber="12345">
  <Customer CustNumber="543">
    ...
  </Customer>
  <OrderDate>150999</OrderDate>
  <Item LineNumber="1">
    <Part Name="Cherries">
      ...
    </Part>
    <Qty Unit="ton">2</Qty>
  </Item>
</Order>
Object tree: Order, with child objects Customer and Item; Item has a child Part object
... not the DOM
• Same for all XML documents
• Model the structure of the document
DOM tree: Element (Order) with Attr (SONumber) and child nodes Element (Customer), Element (OrderDate), Element (Item), each with further nodes (...)
<Order SONumber="12345">
  <Customer CustNumber="543">
    ...
  </Customer>
  <OrderDate>150999</OrderDate>
  <Item LineNumber="1">
    <Part Name="Cherries">
      ...
    </Part>
    <Qty Unit="ton">2</Qty>
  </Item>
</Order>
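For contrast, a short Python sketch using the standard `xml.dom.minidom` module lists the DOM nodes for a trimmed copy of the order (only the `Customer` and `OrderDate` children, with the elided content left out). Only generic node types appear (Element, Attr, Text); nothing in them says this is a sales order. The function name `dom_outline` is hypothetical.

```python
from xml.dom import Node, minidom

ORDER_XML = ('<Order SONumber="12345"><Customer CustNumber="543"/>'
             '<OrderDate>150999</OrderDate></Order>')


def dom_outline(node, depth=0):
    """Yield (depth, DOM node type, name or text) for a node and everything below it."""
    if node.nodeType == Node.ELEMENT_NODE:
        yield depth, "Element", node.tagName
        attributes = node.attributes
        for i in range(attributes.length):
            yield depth + 1, "Attr", attributes.item(i).name
        for child in node.childNodes:
            yield from dom_outline(child, depth + 1)
    elif node.nodeType == Node.TEXT_NODE:
        yield depth, "Text", node.data


document = minidom.parseString(ORDER_XML)
for depth, kind, label in dom_outline(document.documentElement):
    print("  " * depth + f"{kind} ({label})")
```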
Pros and cons
• Pros
  » Can handle any XML document
  » Maps well to existing data structures
• Cons
  » Very inefficient for mixed content
Data transfer issues
• Data types
  » All XML data is text (strings)
  » Conversion problems due to many formats (dates, numbers, ...)
• Null data
  » Usually represented by a missing element or attribute
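The following Python sketch shows both issues under stated assumptions. Element text is converted with an explicit converter per field. A missing element (here a hypothetical `<Discount>`) becomes `None`, which a database driver would write as NULL. The two date styles from the sample orders (`29.10.00` and `150999`) are read with different format strings, assuming both mean day-month-year. The helper names `convert` and `date_format` are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime
from decimal import Decimal
from typing import Callable, Optional


def convert(text: Optional[str], to_type: Callable):
    """Convert XML text to a typed value; None (missing element/attribute) stays None, i.e. NULL."""
    return None if text is None else to_type(text)


def date_format(fmt: str) -> Callable[[str], date]:
    """Build a date converter for one source's format; each source may need its own."""
    return lambda text: datetime.strptime(text, fmt).date()


item = ET.fromstring('<Item Number="1"><Part>A-10</Part>'
                     '<Quantity>12</Quantity><Price>10.95</Price></Item>')
quantity = convert(item.findtext("Quantity"), int)
price = convert(item.findtext("Price"), Decimal)        # exact decimal, no float rounding
discount = convert(item.findtext("Discount"), Decimal)  # no <Discount> element -> None

# The two sample orders write dates differently (assumed DD.MM.YY and DDMMYY).
date_1 = convert("29.10.00", date_format("%d.%m.%y"))
date_2 = convert("150999", date_format("%d%m%y"))
print(quantity, price, discount, date_1, date_2)  # 12 10.95 None 2000-10-29 1999-09-15
```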
Data transfer issues (cont.)
• Binary data
  » No standard way to store in XML
  » Commonly stored as unparsed entities or Base64
• Character sets
  » XML can use any encoding, including Unicode
  » Databases often require a single encoding
  » Unicode can be inefficient to store (UTF-16 uses two bytes even for ASCII characters)
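A small Python sketch of the binary and character-set issues, using only the standard library. Bytes are Base64-encoded into element text (the `encoding` attribute is a made-up convention, not a standard). A document declared as ISO-8859-1 is decoded by the parser and re-encoded as UTF-8, assumed here to be the database's single encoding. The last lines compare UTF-8 and UTF-16 sizes for ASCII text.

```python
import base64
import xml.etree.ElementTree as ET

# Binary data: Base64-encode the bytes into an element's text.
# The "encoding" attribute is a hypothetical convention, not a standard.
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])
image = ET.Element("Image", {"encoding": "base64"})
image.text = base64.b64encode(payload).decode("ascii")
restored = base64.b64decode(image.text)

# Character sets: the parser decodes the declared encoding (ISO-8859-1 here);
# the application re-encodes text into the database's single encoding (UTF-8 assumed).
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><City>Zürich</City>'.encode("iso-8859-1")
city = ET.fromstring(latin1_doc).text
stored = city.encode("utf-8")

# Storage cost: UTF-16 spends two bytes per character even on ASCII text.
utf8_size = len("Chicago".encode("utf-8"))       # 7 bytes
utf16_size = len("Chicago".encode("utf-16-le"))  # 14 bytes
```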
Storing data in a native XML database
• Data stored in XML (document) format
• Pros
  » Handles semi-structured data efficiently
  » Fast retrieval of whole documents
  » Support for XML query languages, XLinks, etc.
Storing data in a native XML database (cont.)
• Cons
  » Slow retrieval of views that cut across the document hierarchy
  » No referential integrity
  » Data not accessible by non-XML applications
Storing and Retrieving Documents
Goals
• Preserve entire document
  » Data: elements, attributes, PCDATA
  » Logical structure: element hierarchy, sibling order
  » Physical structure: entities, CDATA, encoding...
  » Other: DTD, comments, processing instructions...
• Preserve document identity
Storing documents as BLOBs
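• Pros
  » Exploits existing capabilities: transactions, security...
  » Many databases have text search tools
• Cons
  » Text-based searches of XML are unreliable: they ignore the element structure, so a match may be in the wrong element or in the markup itself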
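To illustrate why, here is a small Python/SQLite sketch with three invented brochures stored as text. Searching the raw text for a name also matches it in the wrong element, and adding the tags to the pattern misses an `<Author>` element that merely contains extra whitespace. The `search` helper and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Brochures (BrochureID INTEGER PRIMARY KEY, Brochure TEXT)")
conn.executemany("INSERT INTO Brochures VALUES (?, ?)", [
    (1, "<Brochure><Title>Tea</Title><Author>Chen</Author><Content>About tea.</Content></Brochure>"),
    (2, "<Brochure><Title>Chen's Tea House</Title><Author>Smith</Author><Content>A visit.</Content></Brochure>"),
    (3, "<Brochure><Title>Coffee</Title><Author> Chen </Author><Content>About coffee.</Content></Brochure>"),
])


def search(pattern: str) -> list:
    """Return IDs of brochures whose raw XML text matches a LIKE pattern."""
    rows = conn.execute(
        "SELECT BrochureID FROM Brochures WHERE Brochure LIKE ? ORDER BY BrochureID", (pattern,))
    return [brochure_id for (brochure_id,) in rows]


# Plain word search: also returns brochure 2, where "Chen" appears only in the title.
by_word = search("%Chen%")
# Including the tags excludes brochure 2 but misses brochure 3 because of the extra
# whitespace inside <Author>; attributes or entity references would break it similarly.
by_markup = search("%<Author>Chen</Author>%")
```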
Indexing XML BLOBs with “side tables”
• Consider the following DTD
<!ELEMENT Brochure (Title, Author, Content)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>   <!-- To be indexed -->
<!ELEMENT Content (%Inline;)> <!-- Inline entity from XHTML -->
• Store complete documents in one table
Brochures
---------
BrochureID  INTEGER      <--------- Index brochure IDs
Brochure    LONGVARCHAR  <--------- Complete XML documents
Indexing XML BLOBs with “side tables” (cont.)
• Store elements to be indexed in separate table
Authors
----------------------
Author      VARCHAR(50)  <--------- Index authors
BrochureID  INTEGER
• Search index table and join to document table
SELECT Brochure FROM Brochures WHERE BrochureID IN (SELECT BrochureID FROM Authors WHERE Author='Chen')
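A Python/SQLite sketch of the side-table approach, using the table layout and query shown above. The sample brochures, the `store_brochure` helper, and the index name `AuthorsByName` are invented for illustration; SQLite accepts the `LONGVARCHAR` and `VARCHAR(50)` type names and treats both as text.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Brochures (BrochureID INTEGER PRIMARY KEY, Brochure LONGVARCHAR);
CREATE TABLE Authors (Author VARCHAR(50), BrochureID INTEGER REFERENCES Brochures);
CREATE INDEX AuthorsByName ON Authors (Author);
""")


def store_brochure(brochure_id: int, xml_text: str) -> None:
    """Store the complete document, then copy each <Author> value into the side table."""
    conn.execute("INSERT INTO Brochures (BrochureID, Brochure) VALUES (?, ?)",
                 (brochure_id, xml_text))
    for author in ET.fromstring(xml_text).iter("Author"):
        # Strip surrounding whitespace so " Chen " is indexed as "Chen".
        conn.execute("INSERT INTO Authors (Author, BrochureID) VALUES (?, ?)",
                     ((author.text or "").strip(), brochure_id))


store_brochure(1, "<Brochure><Title>Tea</Title><Author>Chen</Author><Content>About tea.</Content></Brochure>")
store_brochure(2, "<Brochure><Title>Chen's Tea House</Title><Author>Smith</Author><Content>A visit.</Content></Brochure>")
store_brochure(3, "<Brochure><Title>Coffee</Title><Author> Chen </Author><Content>About coffee.</Content></Brochure>")

# The query from the slide: search the side table, then fetch the full documents.
QUERY = ("SELECT Brochure FROM Brochures WHERE BrochureID IN "
         "(SELECT BrochureID FROM Authors WHERE Author='Chen')")
titles = sorted(ET.fromstring(doc).findtext("Title") for (doc,) in conn.execute(QUERY))
print(titles)  # ['Coffee', 'Tea']
```

Because the side table is filled from parsed elements, the match is on the `<Author>` value alone: the brochure that mentions Chen only in its title is not returned, and surrounding whitespace does not matter.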
Storing documents in native XML databases
• Store whole XML documents in “native” form
• Define a (logical) model for an XML document
  » Minimal model is elements, attributes, PCDATA, and document order
  » Store and retrieve documents according to that model
• Have normal database features
  » Query language, indexes, transactions, security, etc.
Implementation strategies for native XML databases
• Text-based
  » Store documents as text
  » Proprietary or file-system storage
• Model-based
  » Store pre-parsed documents according to model
  » Relational, object-oriented, hierarchical, or proprietary storage
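Model-based storage can be sketched in Python by shredding a parsed document into one relational node table that follows the minimal model: elements, attributes, PCDATA, and document order. The `Nodes` table, its columns, and the `store`/`load` functions are hypothetical. The sketch holds a single document and drops comments, processing instructions, and the DTD, which ElementTree does not keep by default.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """CREATE TABLE Nodes (
  NodeID   INTEGER PRIMARY KEY,  -- assigned in document order
  ParentID INTEGER,              -- NULL for the root element
  Kind     TEXT,                 -- 'element', 'attribute', or 'text'
  Name     TEXT,                 -- element or attribute name
  Value    TEXT                  -- attribute value or PCDATA
)"""


def store(conn: sqlite3.Connection, xml_text: str) -> None:
    """Shred one parsed document into Nodes rows, numbered in document order."""
    next_id = 0

    def add(parent_id, kind, name, value):
        nonlocal next_id
        next_id += 1
        conn.execute("INSERT INTO Nodes VALUES (?, ?, ?, ?, ?)",
                     (next_id, parent_id, kind, name, value))
        return next_id

    def walk(elem, parent_id):
        elem_id = add(parent_id, "element", elem.tag, None)
        for name, value in elem.attrib.items():
            add(elem_id, "attribute", name, value)
        if elem.text:
            add(elem_id, "text", None, elem.text)
        for child in elem:
            walk(child, elem_id)
            if child.tail:  # PCDATA that follows the child inside this element
                add(elem_id, "text", None, child.tail)

    walk(ET.fromstring(xml_text), None)


def load(conn: sqlite3.Connection) -> ET.Element:
    """Rebuild the document by replaying the rows in document order."""
    elements = {}
    last_child = {}  # most recent child element of each element
    root = None
    for node_id, parent_id, kind, name, value in conn.execute(
            "SELECT NodeID, ParentID, Kind, Name, Value FROM Nodes ORDER BY NodeID"):
        parent = elements.get(parent_id)
        if kind == "element":
            elem = ET.Element(name) if parent is None else ET.SubElement(parent, name)
            elements[node_id] = elem
            if parent is None:
                root = elem
            else:
                last_child[parent_id] = elem
        elif kind == "attribute":
            parent.set(name, value)
        elif parent_id in last_child:  # text after a child element: ElementTree "tail"
            last_child[parent_id].tail = (last_child[parent_id].tail or "") + value
        else:  # text before any child element
            parent.text = (parent.text or "") + value
    return root


DOC = ('<Product><Para><Name>XML-DBMS</Name> is <Summary>middleware for transferring '
       'data between XML documents and relational databases</Summary>. It is written by '
       '<Developer>Ronald Bourret</Developer>.</Para><Para>You can:<List><Item>'
       '<Link URL="Readme.htm">Read more about XML-DBMS</Link></Item></List></Para></Product>')

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store(conn, DOC)
# Query stored nodes directly, without re-parsing the document.
developer = conn.execute(
    "SELECT t.Value FROM Nodes AS e JOIN Nodes AS t ON t.ParentID = e.NodeID "
    "WHERE e.Name = 'Developer' AND t.Kind = 'text'").fetchone()[0]
print(developer)  # Ronald Bourret
print(ET.tostring(load(conn), encoding="unicode") == DOC)  # True
```

Individual nodes can be queried with SQL without re-parsing, and the document comes back with its mixed content intact; the cost is one row per node and many self-joins for deep queries.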
Persistent DOMs (PDOMs)
• Implement DOM over persistent storage
• Returned DOM tree is “live”
• Used by DOM applications that process very large XML documents
• Database is usually local
Content management systems
• Manage document fragments (content)
• Hide database from user
• Maintain versions, document metadata
• Include editors, publishing systems, etc.
• Extensible through scripting or programming
Questions?
Ronald Bourret
http://www.rpbourret.com