Full-Fidelity Flexible Object-Oriented XML Access
James F. Terwilliger, Philip A. Bernstein, Sergey Melnik
Persistent XML via your favorite programming language
Declarative mappings
Access to the whole document, able to reconstruct the original
2
LRX: LINQ over Relations and XML
Classes Tables XML
Object-oriented access to stored data
SQL, XQuery
Native programming language
Object-based queries and updates
Static type checking
ORMs do not handle XML!
Language-Integrated Query
3
Problem
• Currently common: use an ORM mapper, bring XML data to the client as a string, and process it on the client
– Load into XML-like objects and do XPath through an API
from o in DB.JobCandidates
where o.Resume.Skills.Contains("production")
select o.Resume.Name.Name_Last;
WITH XMLNAMESPACES ('http://.../adventure-works/Resume' AS r)
SELECT [Extent1].[Resume].value(N'/*[1]/r:Name/r:Name.Last', N'nvarchar(max)') AS [C1]
FROM [HumanResources].[JobCandidate] AS [Extent1]
WHERE [Extent1].[Resume].exist(N'/*[1]/r:Skills[contains(., "production")]') = CAST(1 as bit)
4
Inspiration: O-R Mapping
Entity Framework (Melnik et al. 2007)
Client-side (Objects): Store side (Relations):
Classes Tables
Q1 = Q1’
Q2 = Q2’
Q3 = Q3’
…
(select-project only)
Query view VQ
Update view VU
Merge view VM
Object Queries (LINQ)
Object Updates
Mapping specified at schema level
Mapping compiled to views
Preserve fidelity of the source data
5
EF Example
Client-side (Classes):
Person: id, name, title
Store side (Relations):
Person1(id integer PRIMARY KEY, name varchar(50))
Person2(id integer PRIMARY KEY, title varchar(50), details varchar(2000))
π_{id, name}(Person) = π_{id, name}(Person1)
Person = π_{id, name, title}(Person1 ⋈ Person2)
π_{id, title}(Person) = π_{id, title}(Person2)
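The three equations above can be read as one query view and two update views. The following is a minimal Python sketch of that reading, not the Entity Framework implementation: the in-memory rows, their values, and the function names `query_view` and `update_views` are all assumptions made for illustration.

```python
# Hypothetical in-memory stand-ins for the two store tables on the slide.
person1 = [{"id": 1, "name": "Sue"}, {"id": 2, "name": "Bob"}]
person2 = [
    {"id": 1, "title": "Ms.", "details": "store-only data"},
    {"id": 2, "title": "Mr.", "details": "never mapped"},
]


def query_view(p1_rows, p2_rows):
    """Person = project[id, name, title](Person1 join Person2)."""
    titles = {row["id"]: row["title"] for row in p2_rows}
    return [
        {"id": r["id"], "name": r["name"], "title": titles[r["id"]]}
        for r in p1_rows
        if r["id"] in titles
    ]


def update_views(people):
    """Project client objects back onto the columns each table is mapped to."""
    p1 = [{"id": p["id"], "name": p["name"]} for p in people]
    p2 = [{"id": p["id"], "title": p["title"]} for p in people]
    return p1, p2


people = query_view(person1, person2)
new_p1, new_p2 = update_views(people)
```

In this sketch the unmapped `details` column never reaches the client and is never written back, which is the sense in which the mapping leaves the rest of the source data alone.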
6
Extending EF for XML: Design Requirements
Classes Tables XML
• Map classes to XML using a similar mechanism
• Schema-level mapping language
• Compile into query and update procedures
• In-place updates to maintain fidelity of source
• BONUS: Full-Fidelity object representation
7
Challenges and Related Work
1. Express O-X mappings declaratively
– Some existing tools are canonical (not flexible)
– E.g., LINQ-to-XSD
2. Translate mappings into bidirectional procedures
– Some existing tools are unidirectional
– E.g., Clio
3. Translate client queries and updates into server analogs
– Some existing tools are state-based
– E.g., Lenses, Bidirectional XQuery
8
Outline
• Introduction
• Mappings
• Mapping compilation and query translation
• Full Fidelity and update translation
• Performance
• Conclusion
9
Running Example
Store Side (XML Schema):
type Contact:
  Address?
  sequence (1..5)
    phone_type
    number*
  Address*
  xsd:choice
    Name
      @Prefix
      First_Name
      Last_Name
    BusinessInfo
      Business_Name
Example Document:
<contact>
  <Address> … </Address>
  <phone_type>Home</phone_type> <number>555-5123</number>
  <phone_type>Cell</phone_type>
  <phone_type>Work</phone_type> <number>555-5234</number> <number>555-5345</number>
  <Address> … </Address>
  <Address> … </Address>
  <Name Prefix="Ms.">
    <First_Name>Sue</First_Name>
    <Last_Name>Wall</Last_Name>
  </Name>
</contact>
10
Component Designator (CD)
Store Side (XML Schema):
type Contact: Address?, sequence (1..5) { phone_type, number* }, Address*, xsd:choice { Name { @Prefix, First_Name, Last_Name } | BusinessInfo { Business_Name } }
/type::ns:Contact/model::sequence/schemaElement::ns:Address[1]
/type::ns:Contact/model::sequence/model::sequence[1]
/type::ns:Contact/model::sequence/schemaElement::ns:Address[2]
schemaElement::ns:Name/type::0/model::sequence[1]/schemaElement::First_Name[1]
Short form: Name/First_Name
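To make the shape of a long-form CD expression concrete, here is a small Python sketch that splits one into (axis, name, position) steps. It is only an illustration of the notation on this slide: the regular expression and the function name `parse_cd` are assumptions, and the sketch does not handle the short form (Name/First_Name).

```python
import re

# One step looks like axis::name or axis::name[position].
STEP = re.compile(r"^(?P<axis>\w+)::(?P<name>[\w:.\-]+?)(?:\[(?P<pos>\d+)\])?$")


def parse_cd(cd):
    """Split a long-form CD expression into (axis, name, position) tuples."""
    steps = []
    for part in cd.strip("/").split("/"):
        match = STEP.match(part)
        if match is None:
            raise ValueError(f"not a long-form CD step: {part!r}")
        pos = match.group("pos")
        steps.append((match.group("axis"), match.group("name"),
                      int(pos) if pos else None))
    return steps


first_address = parse_cd(
    "/type::ns:Contact/model::sequence/schemaElement::ns:Address[1]"
)
first_name = parse_cd(
    "schemaElement::ns:Name/type::0/model::sequence[1]"
    "/schemaElement::First_Name[1]"
)
```

The position on the last step is what distinguishes the Address before the phone sequence (Address[1]) from the Address list after it (Address[2]).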
Mappings and Flexibility: Intuition
Client Side (Objects):
Contact: Address, Phone1 … Phone5, ContactType (P/B), PersonName, BusinessInfo
PersonInfo: Prefix, First_Name, Last_Name
PhoneInfo: type, Numbers
Store Side (XML Schema):
type Contact: Address?, sequence (1..5) { phone_type, number* }, Address*, xsd:choice { Name { @Prefix, First_Name, Last_Name } | BusinessInfo { Business_Name } }
Mapping labels in the diagram:
Address[1], 1
model::sequence, 1
model::sequence, 5
Condition label: = P
11
12
Alternative Representation
Client Side (Objects):
Contact: Address, Phone[5]
Person: Prefix, First_Name, Last_Name
Store Side (XML Schema):
type Contact: Address?, sequence (1..5) { phone_type, number* }, Address*, xsd:choice { Name { @Prefix, First_Name, Last_Name } | BusinessInfo { Business_Name } }
Mapping labels in the diagram:
model::sequence, all
Name/@Prefix, 1
Name/Last_Name, 1
Condition label: EXISTS
13
Mappings
• A type mapping:
– Associates one client-side class with one XML type
– Assigns to each class member a CD expression
  • “Mapping Fragment”
  • Essentially maps to a schema element
  • Might include a position reference if mapping into a list
– Allows conditions on either side
  • Client-side can have equality conditions on values
  • XML-side can have equality conditions on values, tag names, or existence of elements
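One way to picture a type mapping is as a small record holding the class, the XML type, the per-member fragments, and the optional conditions. The Python sketch below is a hypothetical data model, not LRX's actual mapping language: the class names `MappingFragment` and `TypeMapping` and their fields are assumptions, and the First_Name fragment is filled in by analogy with the two fragments shown on the earlier slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class MappingFragment:
    member: str                      # client-side class member
    cd: str                          # CD expression it maps to
    position: Optional[str] = None   # position reference, e.g. "1" or "all"


@dataclass
class TypeMapping:
    client_class: str
    xml_type: str
    fragments: List[MappingFragment] = field(default_factory=list)
    client_condition: Optional[str] = None  # equality condition on values
    xml_condition: Optional[str] = None     # value, tag-name, or existence test


person_mapping = TypeMapping(
    client_class="Person",
    xml_type="Contact",
    fragments=[
        MappingFragment("Prefix", "Name/@Prefix", "1"),
        # Assumed by analogy; this fragment is not shown on the slide.
        MappingFragment("First_Name", "Name/First_Name", "1"),
        MappingFragment("Last_Name", "Name/Last_Name", "1"),
    ],
    xml_condition="EXISTS Name",
)

cd_by_member = {f.member: f.cd for f in person_mapping.fragments}
```

Keeping the conditions on the mapping rather than on individual fragments mirrors the bullets above, where conditions apply to either side of the whole type mapping.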
14
Compiling CD Expressions: UPA
Store Side (XML Schema):
type Contact: Address?, sequence (1..5) { phone_type, number* }, Address*, xsd:choice { Name { @Prefix, First_Name, Last_Name } | BusinessInfo { Business_Name } }
Example Document:
<contact>
  <Address> … </Address>
  <phone_type>Home</phone_type> <number>555-5123</number>
  <phone_type>Cell</phone_type>
  <phone_type>Work</phone_type> <number>555-5234</number> <number>555-5345</number>
  <Address> … </Address>
  <Address> … </Address>
  <Name Prefix="Ms.">
    <First_Name>Sue</First_Name>
    <Last_Name>Wall</Last_Name>
  </Name>
</contact>
Unique Particle Attribution
15
Compiling CD Expressions
Store Side (XML Schema):
type Contact: Address?, sequence (1..5) { phone_type, number* }, Address*, xsd:choice { Name { @Prefix, First_Name, Last_Name } | BusinessInfo { Business_Name } }
CD Expression (compiles to) Query to retrieve all elements that match the element
/type::ns:T/model::sequence/schemaElement::ns:Address[1] (compiles to) /Address[. << ../phone_type[1]]
Name/First_Name (compiles to) Name/First_Name
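The compiled query for Address[1] selects the Address elements that precede the first phone_type in document order. Python's standard ElementTree has no `<<` operator, so the sketch below emulates that one query by hand. The function name `address_1`, the sample documents, and the Address text values A1 to A3 are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document shaped like the running example; Address text is made up.
DOC = """
<contact>
  <Address>A1</Address>
  <phone_type>Home</phone_type><number>555-5123</number>
  <phone_type>Cell</phone_type>
  <Address>A2</Address>
  <Address>A3</Address>
</contact>
"""


def address_1(contact):
    """Emulate /Address[. << ../phone_type[1]] over the children of contact."""
    children = list(contact)
    first_phone = next(
        (i for i, child in enumerate(children) if child.tag == "phone_type"),
        None,
    )
    if first_phone is None:
        # Comparing against an empty sequence selects nothing.
        return []
    return [c for c in children[:first_phone] if c.tag == "Address"]


root = ET.fromstring(DOC)
leading = [a.text for a in address_1(root)]

no_leading = ET.fromstring(
    "<contact><phone_type>Home</phone_type><Address>A2</Address></contact>"
)
```

The second document shows why the position test is needed: its only Address belongs to the trailing Address* particle, so the query for Address[1] returns nothing.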
16
Queries and Query Translation: Intuition
from p in ObjectContext.People
from e in p.resume.employers
where e.address.city.Contains("Port")
select new {pname = p.name, ename = e.name}
(Translates to)
from p in ObjectContext.People
from e in SEQUENCE(p.resume, "/resume/employers")
where TEST(e, "/resume/address/city[contains(., 'Port')]")
select new {pname = p.name, ename = VALUE(e, "/name", string)}
17
Queries and Query Translation: Basics
Foo.bar1.bar2.bar3
PH(Q)
Type T
Q’: Compiled query for CD expression of T.bar3
PH’(Q/Q’)
VALUE: Run query, cast result as primitive type
QUERY: Run query
SEQUENCE: Run query, iterate over results
TEST: Run query, return boolean indicating if result is non-empty
PH and PH’ in {VALUE, QUERY, SEQUENCE, TEST}
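The four placeholder functions can be sketched on the client with Python's standard ElementTree, purely to show their different result shapes. This is not how LRX evaluates them (the slides translate them into server-side XQuery). The sample resume, the `employer` element name, and the relative paths are assumptions, and because ElementTree's path language has no contains(), the substring check is done in Python.

```python
import xml.etree.ElementTree as ET


def QUERY(node, path):
    """Run the query and return the matching nodes."""
    return node.findall(path)


def SEQUENCE(node, path):
    """Run the query and iterate over the results."""
    yield from node.findall(path)


def TEST(node, path):
    """Run the query and report whether the result is non-empty."""
    return node.find(path) is not None


def VALUE(node, path, cast):
    """Run the query and cast the first result's text to a primitive type."""
    hit = node.find(path)
    return None if hit is None else cast(hit.text)


resume = ET.fromstring(
    "<resume><employers>"
    "<employer><name>Acme</name><address><city>Portland</city></address></employer>"
    "<employer><name>Globex</name><address><city>Salem</city></address></employer>"
    "</employers></resume>"
)

enames = [
    VALUE(e, "./name", str)
    for e in SEQUENCE(resume, "./employers/employer")
    if TEST(e, "./address/city") and "Port" in VALUE(e, "./address/city", str)
]
```

The list comprehension has the same shape as the translated query on the previous slide: SEQUENCE drives the iteration, TEST guards the filter, and VALUE produces the projected primitive.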
18
Type Translation
Client Side (Objects):
Contact: Address, Phone[5]
Person: Prefix, First_Name, Last_Name
Store Side (XML Schema):
type Contact: Address?, sequence (1..5) { phone_type, number* }, Address*, xsd:choice { Name { @Prefix, First_Name, Last_Name } | BusinessInfo { Business_Name } }
Condition label: EXISTS
where obj is Person (translates to) where TEST(obj, "./Name")
19
Full Fidelity
<contact>
  <!-- added by Tom -->
  <Address source="corporate"> … </Address>
  …
  <!-- Need to review addresses -->
</contact>
class Contact {
  …
  AddressType Address;
  …
}
Not part of the schema for the document
20
Delta Representation
Without delta:
contactObject
  Address = new Address (…)
  Phone[1] = …
With delta:
contactObject
    BeforeEnd: "Need to review addresses" (Comment)
  Address = new Address (…)
    Before: "added by Tom" (Comment)
    Start: source="corporate" (Attribute)
  Phone[1] = …
<contact>
  <!-- added by Tom -->
  <Address source="corporate"> … </Address>
  …
  <!-- Need to review addresses -->
</contact>
• Each mappable location (anchor) is a key into the delta
• Unmapped data becomes associated with an anchor with a relative position reference
• Anchors stored in document order
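A minimal Python sketch of the delta for this example follows, assuming a dictionary keyed by anchor whose values are (position, kind, content) entries. The names `DeltaEntry` and `render` are hypothetical, and the serializer only handles the three position references that appear on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaEntry:
    position: str  # relative position reference: "Before", "Start", "BeforeEnd"
    kind: str      # "Comment" or "Attribute"
    content: str


# Anchors are kept in the order listed on the slide (dicts preserve insertion order).
delta = {
    "contactObject": [
        DeltaEntry("BeforeEnd", "Comment", "Need to review addresses"),
    ],
    "Address": [
        DeltaEntry("Before", "Comment", "added by Tom"),
        DeltaEntry("Start", "Attribute", 'source="corporate"'),
    ],
}


def render(tag, inner, entries):
    """Re-attach unmapped comments and attributes around a mapped element."""
    before = "".join(f"<!-- {e.content} -->" for e in entries
                     if e.position == "Before" and e.kind == "Comment")
    attrs = "".join(f" {e.content}" for e in entries
                    if e.position == "Start" and e.kind == "Attribute")
    before_end = "".join(f"<!-- {e.content} -->" for e in entries
                         if e.position == "BeforeEnd" and e.kind == "Comment")
    return f"{before}<{tag}{attrs}>{inner}{before_end}</{tag}>"


address_xml = render("Address", "…", delta["Address"])
contact_xml = render("contact", address_xml + "…", delta["contactObject"])
```

Rendering the two anchors reproduces the comments and the source attribute from the example document even though neither is part of the Contact class.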
21
Updates
AKA: What Does Full Fidelity Get Us?
• Re-serialization is always an option
– Repackage the entire XML document and overwrite
• In-place updates may be substantially faster
– Oracle, SQL Server, DB2 support in-place updates
• In-place updates based on XPath/XQuery
– Insert new node, replace existing node, delete node
– Inserts are relative to an existing node in tree
– After, before, as first, as last
22
Relative Location
Client Side (Objects):
Contact: Address, Phone[5]
Person: Prefix, First_Name, Last_Name
Store Side (XML Schema):
type Contact: Address?, sequence (1..5) { phone_type, number* }, Address*, xsd:choice { Name { @Prefix, First_Name, Last_Name } | BusinessInfo { Business_Name } }
First_Name = Bob
Phone[1] = new PhoneType (…)
• After? Before? As first?
• Correct location depends on pre-existing data
• Schema is insufficient
• Use delta representation to determine correct placement
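The placement question can be illustrated with a simplified Python sketch: given the anchors in document order and the set of anchors that currently hold data, pick an insert position relative to the nearest existing neighbour. This is an assumed heuristic for illustration only, not the algorithm from the talk. The anchor names, `ANCHOR_ORDER`, and `placement` are hypothetical, and the sketch ignores the unmapped data that the real delta also tracks.

```python
# Hypothetical anchor order for the Contact type, in document order.
ANCHOR_ORDER = [
    "Address[1]", "Phone[1]", "Phone[2]", "Phone[3]",
    "Phone[4]", "Phone[5]", "Address[2]", "Name",
]


def placement(new_anchor, present, order=ANCHOR_ORDER, parent="contact"):
    """Choose an insert position relative to data that already exists."""
    idx = order.index(new_anchor)
    # Prefer inserting right after the closest preceding anchor that has data.
    for prev in reversed(order[:idx]):
        if prev in present:
            return ("after", prev)
    # Otherwise insert before the closest following anchor that has data.
    for nxt in order[idx + 1:]:
        if nxt in present:
            return ("before", nxt)
    # Nothing to anchor on: the new node becomes the first child of the parent.
    return ("as first", parent)


with_address = placement("Phone[1]", {"Address[1]", "Name"})
without_address = placement("Phone[1]", {"Name"})
empty_document = placement("Phone[1]", set())
```

The three calls give three different answers for the same assignment (Phone[1] = new PhoneType (…)), which is the sense in which the schema alone cannot decide the location.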
23
Performance
• XMark queries over partially shredded data (4GB)
– Q1: Simple paths
– Q5: Aggregation and filtering
– Q9: Joins
– QN: Variant of Query Q6, descendant axis
• Query 6 needed to be rewritten because the interesting part of Q6 had been shredded
• LRX versus bringing data to client first
– Currently, only other option is manual XQuery
24
Example: Q5
• With LRX
var c = (from o in db.ClosedAuctions
         where o.closed_auction.price >= 40
         select o.auto_id).Count();
• Without LRX
var q = from o in db.ClosedAuctions
        select o.closed_auction;      // Data is pulled to client
int i = 0;
foreach (var o in q)                  // Filter and count on client
    if ((decimal)o.Element("price") >= 40) i++;
25
Takeaway: Benefit from Pushing Operations to Server
• Results are in seconds per 100 runs
• Blue bars are LRX, green bars are without LRX
• Tried with cold (C) and hot (H) page buffers
26
Conclusion and Future Directions
• Query optimization
– Pushing operations to either relational or XML
• Keyrefs Object pointers
• Queries/updates directly to delta
• LRX versus Lorax
27
Attention, my VLDB attendees!
Our system is LRX, it speaks for the trees
The XML trees overlooked by the tools
That follow the object-relational rules
Of course, one can always resort to XQuery
But FLWOR’s the deed that makes optimists dreary
We leave all relational portions pristine
But add new components for XML seen
28
LRX takes fragments on schema expressed
And compiles them to queries whose structures suggest
How to draw the right data from trees intertwined
And pack into objects of custom design
But what of the stuff ‘twixt the elements fall?
The comments, the whitespace, the order of xsd:all?
Our LRX tucks all of that data away
In a structure called “delta”, an indexed array
29
We draw from the keys in the delta in case
We must locate the space to do updates in place
When queries or updates on clients arrive
Native XQuery does LRX contrive
Inspection of query performance has shown
That LRX is faster than client alone
This is how we make objects of stored XML
My talk is now done, so I bid you farewell
30
Thanks!
var q = from c in db.ClosedAuctions
        where c.closed_auction.price > 20
        from t in c.closed_auction.annotation.description.text.Descendants("emph")
        select t;
WITH XMLNAMESPACES('http://.../Auction' AS a)
SELECT T FROM
  (SELECT T FROM ClosedAuctions C,
     SEQUENCE(C.closed_auction, 'a:auction/a:annotation/a:description/a:text//emph') AS T)
WHERE VALUE(C.closed_auction, 'a:auction/a:price', int) > 20
31
Supplemental Slides
32
Escape Hatch: LINQ-to-XML
• xsd:anyType, mixed content
– Or a preference for the XPath model
• Map to class XElement
• XPath-like interface to XML-like data
– Each method invocation translated into XPath on server
from c in db.ClosedAuctions
where c.closed_auction.price > 20
from t in c.closed_auction.annotation.description.text   // Object is of type XElement
            .Descendants("emph")                         // Pushed to server as corresponding XPath axis
select t;
33
Queries and Query Translation: Conditions
where e.foo == "bar"
– If conditions cover a client-side mapping fragment condition, translate into the corresponding store-side XML conditions
where e.foo.Contains("bar")
– If the method has an XML analog, translate into the analogous XQuery function
where e.foo is barType
– Find fragment for barType, then translate into type or element conditions on XML
34
[Architecture diagram]
Client-side object space: relation-mapped classes, XML-mapped classes
Object queries and updates via LINQ
Translate XML-mapped references to placeholder (PH) functions
Package queries and updates into abstract trees, then transform by applying mappings (O-R mappings, O-X mappings)
Translation to vendor-specific SQL syntax; PH → XQuery in each provider (SS Provider, DB2 Provider, Ora Provider)
Stores: SQL Server, DB2, Oracle
Build objects from query results
Shred XML into objects according to object type and mappings