31
1 [email protected] MonetDB/XQuery: Updates ADT 2010 ADT 2010 ADT 2010 XQuery Updates in MonetDB/XQuery XQuery Updates in MonetDB/XQuery Stefan Manegold [email protected] http://www.cwi.nl/~manegold/

ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

1

[email protected] MonetDB/XQuery: Updates ADT 2010

ADT 2010ADT 2010

XQuery Updates in MonetDB/XQueryXQuery Updates in MonetDB/XQuery

Stefan [email protected]

http://www.cwi.nl/~manegold/

Page 2: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

2

[email protected] MonetDB/XQuery: Updates ADT 2010

• 09.11.2010:

•RDBMS back-end support for XML/XQuery (1/2):

•Document Representation (XPath Accelerator, Pre/Post plane)

• 16.11.2010:

•XPath navigation (Staircase Join)

•XQuery to Relational Algebra Compiler:

•Item- & Sequence- Representation

•Efficient FLWoR Evaluation (Loop-Lifting)

•Optimization

• 23.11.2010:

•RDBMS back-end support for XML/XQuery (2/2):

•Updateable Document Representation

•Other (DB-) approaches to XML/XQuery processing

ScheduleSchedule

Page 3: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

3

[email protected] MonetDB/XQuery: Updates ADT 2010

XQuery Update Facility 1.0 W3C Candidate Recommendation http://www.c3.org/TR/xquery-update-10/

• Categorize updates into• Value updates• Structural updates

(MonetDB/XQuery does not yet support the latest syntax changes made by W3C; for details see

http://monetdb.cwi.nl/XQuery/Documentation/XQuery-Updates.html)

XML/XQuery UpdatesXML/XQuery Updates

Page 4: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

4

[email protected] MonetDB/XQuery: Updates ADT 2010

do replace value of fn:doc("bib.xml")/books/book[1]/pricewith fn:doc("bib.xml")/books/book[1]/price * 1.1

do replace value of fn:doc(“bib.xml”)/books/book[2]/@isbnwith “90-6196-517-9”

do rename fn:doc(“bib.xml”)/books/book[3]/author[1]into “primary-author”

do rename fn:doc(“bib.xml”)/journals/journal[9]/@isbninto “issn”

=> map directly to simple value updates in relational storage

Value UpdatesValue Updates

Page 5: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

5

[email protected] MonetDB/XQuery: Updates ADT 2010

do insert attribute isbn {“90-6196-517”}into fn:doc("bib.xml")/books/book[17]

do delete fn:doc(“bib.xml”)/books/book[2]/@wrong

do insert <author>Stefan Manegold</author>after fn:doc(“bib.xml”)/books/book[33]/author[last()]

do replace fn:doc(“bib.xml”)/books/book[44]/author[1]with fn:doc(“bib.xml”)/books/book[33]/author[last()]

do delete fn:doc(“bib.xml”)/books/book[author = “Kermit”]

=> How to implement on pre-/post-encoding?

Structural UpdatesStructural Updates

Page 6: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

6

[email protected] MonetDB/XQuery: Updates ADT 2010

XML/XQuery XML/XQuery UpdatesUpdates

do insert <k><l/><m/></k> as first into /a/f/g

Page 7: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

7

[email protected] MonetDB/XQuery: Updates ADT 2010

XML/XML/XQuery XQuery UpdatesUpdates

do insert <k><l/><m/></k> as first into /a/f/g

Page 8: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

8

[email protected] MonetDB/XQuery: Updates ADT 2010

XML/XQuery UpdatesXML/XQuery Updates

Page 9: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

9

[email protected] MonetDB/XQuery: Updates ADT 2010

XML/XQuery UpdatesXML/XQuery Updates

Page 10: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

10

[email protected] MonetDB/XQuery: Updates ADT 2010

XML/XML/XQuery XQuery UpdatesUpdates

StaircaseStaircaseJoinJoin

Page 11: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

11

[email protected] MonetDB/XQuery: Updates ADT 2010

XML Storage RevisitedXML Storage Revisited

N9N8N7

N6N5N4N3N2nullnullN1N0nid

147

null03

30113010229

208

306305224

null-121510110

levelsizerid

309308227206145304303222131090

levelsizepre

null-12nullnull3

30113010229208147306305224

1510110

levelsizepre

69j58i77h46g85f14e03d22c31b90a

postpre

post = pre + size - level

Allow holes Define logical pages

Page 12: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

12

[email protected] MonetDB/XQuery: Updates ADT 2010

XML Storage RevisitedXML Storage Revisited

N5N4N3

N2N9N8N7N6nullnullN1N0nid

307

null03

14113010309

228

306225204

null-121510110

levelsizerid

309308227206145304303222131090

levelsizepre

null-12nullnull3

30113010229208147306305224

1510110

levelsizepre

69j58i77h46g85f14e03d22c31b90a

postpre

post = pre + size - level

Allow holes Define logical pages

122100

mappage

rid = pre.swizzle( )

Page 13: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

13

[email protected] MonetDB/XQuery: Updates ADT 2010

XML Storage RevisitedXML Storage RevisitedUpdate-friendly• rid-table is append-only• rid-tuples may be unused• rid = autoincrement column

MonetDB: • rid not stored but computed (virtual oid)• allows positional lookup/join

Opportunity currently not exploited by other RDBMS

Occurs widely in our XQuery translation.

N5N4N3

N2N9N8N7N6nullnullN1N0nid

307

null03

14113010309

228

306225204

null-121510110

levelsizerid

Page 14: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

14

[email protected] MonetDB/XQuery: Updates ADT 2010

XML Storage RevisitedXML Storage RevisitedUpdate-friendly• rid-table is append-only• rid-tuples may be unused• rid = autoincrement column

MonetDB: • rid not stored but computed (virtual oid)• allows positional lookup/join

Opportunity currently not exploited by other RDBMS

Occurs widely in our XQuery translation.

N5N4N3

N2N9N8N7N6nullnullN1N0nid

307

null03

14113010309

228

306225204

null-121510110

levelsizerid

Page 15: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

15

[email protected] MonetDB/XQuery: Updates ADT 2010

MonetDB/XQueryMonetDB/XQueryOur own XML DBMS with (almost..) full XQuery support.• Built purely on an RDBMS, namely MonetDB

Pathfinder compiler & “staircase join”:– Universität Tübingen (Torsten Grust, et al.)

– Technical University Twente (Maurice van Keulen, et. al.)

MonetDB High-Performance DBMS– CWI Amsterdam (Peter Boncz, Stefan Manegold, ...)

Useful for:

• Large XML databases!

• Querying XML annotations (multimedia, forensic NFI)

• XML information retrieval

• ...

Pathfinder Compiler

RelationalAlgebra

XQuery

RDBMS

(MonetDB)

Page 16: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

16

[email protected] MonetDB/XQuery: Updates ADT 2010

Research Projects & ExtensionsResearch Projects & Extensions• Value indeces

• Runtime optimization• SIGMOD'09 [Abdel Kader, Boncz, v. Keulen Manegold]

• Algebraic Query Optimization• Grust, Rittinger, et al. (Universität Tübingen)

• Distributed XQuery P2P XQuery• SOAP group communication, XQuery RPC

• VLDB'07 [Zhang, Boncz]

• Benchmarking beyond XMark• ExpDB'06 Workshop [Manegold]

• Support for XML Interval Annotations• XIME-P'06 Workshop [Alink et al.]

• Xquery + Information Retrieval: PF/Tijah

Page 17: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

17

[email protected] MonetDB/XQuery: Updates ADT 2010

ConclusionsConclusions• Relational approach can be scalable & fast

• MonetDB/XQuery compares favorably with all other available systems

• Techniques that made it work• Property-driven peephole optimization

Order & other properties

• Loop-lifted XPath steps Evaluate Sets of context nodes in a single pass

• Support for dense (autoincrement) keys Positional lookup

• Background Information & Literaturehttp://monetdb-xquery.orghttp://pathfinder-xquery.org

Page 18: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

18

[email protected] Other Xquery Processing Approaches ADT 2010

• 09.11.2010:

•RDBMS back-end support for XML/XQuery (1/2):

•Document Representation (XPath Accelerator, Pre/Post plane)

• 16.11.2010:

•XPath navigation (Staircase Join)

•XQuery to Relational Algebra Compiler:

•Item- & Sequence- Representation

•Efficient FLWoR Evaluation (Loop-Lifting)

•Optimization

• 23.11.2010:

•RDBMS back-end support for XML/XQuery (2/2):

•Updateable Document Representation

•Other (DB-) approaches to XML/XQuery processing

ScheduleSchedule

Page 19: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

19

[email protected] Other Xquery Processing Approaches ADT 2010

TopicsTopics Other approaches & techniques (selection, far from complete!)

Document storage / tree encoding:

ORDPATH

DataGuides

XPath processing:

Tree patterns, holistic twig joins

Page 20: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

20

[email protected] Other Xquery Processing Approaches ADT 2010

Fixed-Width Tree Encodings & UpdatesFixed-Width Tree Encodings & Updates Fixed-width tree encoding (like XPath Accelerator) are

Good for read(-only) processing

small footprint, positional lookup, staircase join

But inherently static

Milo et al., PODS 2002:

“There is a sequence of updates (subtree insertions) for any persistent tree encoding scheme E (where each node keeps its initial encoding label even under updates), such that E needs labels of length (N) to encode the resulting tree of N nodes.”

Page 21: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

21

[email protected] Other Xquery Processing Approaches ADT 2010

Fixed-Width Tree Encodings & UpdatesFixed-Width Tree Encodings & Updates Fixed-width tree encoding (like XPath Accelerator) are

Good for read(-only) processing

small footprint, positional lookup, staircase join

But inherently static

Milo et al., PODS 2002:

“There is a sequence of updates (subtree insertions) for any persistent tree encoding scheme E (where each node keeps its initial encoding label even under updates), such that E needs labels of length (N) to encode the resulting tree of N nodes.”

Page 22: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

22

[email protected] Other Xquery Processing Approaches ADT 2010

XML/XQuery XML/XQuery UpdatesUpdates

do insert <k><l/><m/></k> as first into /a/f/g

Page 23: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

23

[email protected] Other Xquery Processing Approaches ADT 2010

XML/XML/XQuery XQuery UpdatesUpdates

MonetDB/XQuery

hack:

exploit paging

& mmap trick

but:

updating pg|off

is still O(N)

do insert <k><l/><m/></k> as first into /a/f/g

Page 24: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

24

[email protected] Other Xquery Processing Approaches ADT 2010

Fixed-Width Tree Encodings & UpdatesFixed-Width Tree Encodings & Updates Fixed-width tree encoding (like XPath Accelerator) are

Good for read(-only) processing

small footprint, positional lookup, staircase join

But inherently static

Non-solutions:

Gaps in the encoding (never large enough)

Encoding based on decimal fractions (limited precision)

Possible solution:

Variable-width tree encodings:

Cheaper updates

At the expense of more expensive read(-only) processing

Page 25: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

25

[email protected] Other Xquery Processing Approaches ADT 2010

A Variable-Width Tree Encoding: ORDPATHA Variable-Width Tree Encoding: ORDPATH

O'Neil et al., SIGMOD 2004.

Page 26: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

26

[email protected] Other Xquery Processing Approaches ADT 2010

ORDPATH Encoding: ExampleORDPATH Encoding: Example

Page 27: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

27

[email protected] Other Xquery Processing Approaches ADT 2010

ORDPATH: Insertion Between SiblingsORDPATH: Insertion Between Siblings

Page 28: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

28

[email protected] Other Xquery Processing Approaches ADT 2010

ORDPATH: Insertion Between SiblingsORDPATH: Insertion Between Siblings

Page 29: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

29

[email protected] Other Xquery Processing Approaches ADT 2010

ORDPATH: Insertion Between SiblingsORDPATH: Insertion Between Siblings

Page 30: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

30

[email protected] Other Xquery Processing Approaches ADT 2010

Is ORDPATH suitable for XQuery?Is ORDPATH suitable for XQuery?• Mapping core operations of the XQuery processing model

to operations on ORDPATH labels:

Page 31: ADT 2010 XQuery Updates in MonetDB/XQuery · •XQuery to Relational Algebra Compiler: •Item- & Sequence- Representation •Efficient FLWoR Evaluation (Loop-Lifting) •Optimization

31

[email protected] Other Xquery Processing Approaches ADT 2010

ORDPATH: Variable-Length Node EncodingORDPATH: Variable-Length Node Encoding• For a 10 MB XML sample document, the authors of ORDPATH observed

label lengths between 6 and 12 bytes.

• ORDPATH labels encode root-to-node paths => common prefixes.

=> Label comparisons often need to inspect encoding bits at the far right.

• MS SQL Server employs further path encodings organized in reverse

(node-to-root) order.

• Note: - Preorder ranks fit into CPU registers.- 4 byte pre's sufficient for 232 = 4G nodes (11 GB XMark fits easily).- 8 byte pre's sufficient for 264 nodes, i.e., “the universe” ...