2005 rel-xml-iii 1

View forests and query composition

• The composition algorithm works for a (large) subset of XQuery, excluding (see paper for details):

• User-defined recursive functions

• Order-dependent features (before, after, …)

• XQuery contains many redundant/complex features; every query can be translated (and simplified) into an XQuery Core expression (syntax on p. 20). This is normalization.

• The first step is type checking (e.g., data is applied only where the element is atomic-valued), then normalization, then the composition algorithm is applied

2005 rel-xml-iii 2

Normalization: (as defined in the XQuery formal semantics)

• Rewriting XML elements/attributes into internal form

• Breaking for/let clauses so each defines just one variable

• Replacing where e1 return e2 by if (e1) then e2 else ()

• Applying data to element/attribute operands of expressions that require atomic arguments

SilkRoute adds some more:

• Long path expressions are broken into one-step expressions

• Each one-step binds a new variable

• … (see paper)
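Two of the rewrites listed above can be sketched in a few lines of Python. This is only an illustration on strings, not SilkRoute's implementation: the helper names `one_step_fors` and `where_to_if`, the `$tmp` prefix for fresh variables, and the assumption that every abbreviated step uses the child axis are all hypothetical.

```python
def one_step_fors(var, source, steps, prefix="$tmp"):
    """Break `for var in source/s1/.../sn return` into one-step for clauses.

    Assumption: every step is an abbreviated child step, so `/name`
    is written out as `/child::name`.
    """
    clauses = []
    current = source
    for i, step in enumerate(steps, start=1):
        # The last step binds the user's variable; earlier steps bind fresh ones.
        target = var if i == len(steps) else f"{prefix}{i}"
        clauses.append(f"for {target} in {current}/child::{step} return")
        current = target
    return clauses


def where_to_if(condition, result):
    """Replace `where e1 return e2` by `if (e1) then e2 else ()`."""
    return f"if ({condition}) then {result} else ()"


for clause in one_step_fors("$c", "$CV", ["Clothing", "Tuple"]):
    print(clause)
print(where_to_if("$d/pid = $c/pid", "<sale/>"))
```

Each clause produced this way defines exactly one variable over exactly one step, which is the shape the composition algorithm later pattern-matches on.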

2005 rel-xml-iii 3

The original public query (fig. 6)

back 24

2005 rel-xml-iii 4

Upper and bottom parts of the normalized public query (fig. 14, p. 21)

p. 3

2005 rel-xml-iii 5

In the algorithm, a node is a triple

• QName

• Forest (of children)

• SQL (an sql fragment)

Constructed by

elementNode(QName, F, S) or attributeNode(QName, F, S)

An SQL fragment is a triple:

(From, Where, Select) (each may be empty)

The function joinSQL (written sqlJoin on later slides) concatenates several fragments into one

The algorithm is recursive and works top-down on the given query
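The two triples described above could be modelled as follows. This is a minimal sketch under stated assumptions: the Python names (`SqlFragment`, `Node`, `sql_join`, `element_node`, `attribute_node`) are illustrative, each component is stored as a tuple of strings, and "concatenation" is taken to mean component-wise concatenation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlFragment:
    """(From, Where, Select); each component may be empty."""
    from_: tuple = ()
    where: tuple = ()
    select: tuple = ()


def sql_join(*fragments):
    """Concatenate several fragments into one, component by component."""
    result = SqlFragment()
    for f in fragments:
        result = SqlFragment(result.from_ + f.from_,
                             result.where + f.where,
                             result.select + f.select)
    return result


@dataclass(frozen=True)
class Node:
    """A view-tree node: QName, forest of children, SQL fragment."""
    kind: str                      # "element" or "attribute"
    qname: str
    forest: tuple = ()
    sql: SqlFragment = SqlFragment()


def element_node(qname, forest, sql):
    return Node("element", qname, tuple(forest), sql)


def attribute_node(qname, forest, sql):
    return Node("attribute", qname, tuple(forest), sql)


joined = sql_join(SqlFragment(from_=("Clothing c",)),
                  SqlFragment(where=("c.on-sale",)),
                  SqlFragment(select=("c.item",)))
print(joined)
```

Keeping the fragments immutable fits the functional style of the algorithm: a recursive call receives a fragment and passes a new, longer one further down.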

2005 rel-xml-iii 6

We illustrate with a simple example:

The relational db has one relation, Clothing:

item     on-sale   price
coat     true      99
shirt    false     38
skirt    false     45

2005 rel-xml-iii 7

Here is a (simplified) canonical view tree

Omitted SQL fragments are empty

We assume $CV is bound to N1

N1 <Clothing>
  N1.1 <Tuple>    From: Clothing c
    N1.1.1 <item>
      N1.1.1.1 string    Select: c.item
    N1.1.2 <on-sale>
      N1.1.2.1 bool    Select: c.on-sale
    N1.1.3 <price>
      N1.1.3.1 int    Select: c.price

back 11

back 13

2005 rel-xml-iii 8

Here is our query:

element view {
  for $t in $CV/child::Tuple return
    for $s in $t/child::on-sale return
      if data($s) then
        element product {
          element name {
            for $item in $t/child::item return data($item)
          }
        }
      else ()
}

Changes: Tuple is renamed to product, only the item field is output (projection), and on-sale is used for a selection

p. 7

2005 rel-xml-iii 9

Here is the expected view tree of the composition

P1 <view>
  P1.1 <product>    From: Clothing c    Where: c.on-sale
    P1.1.1 <name>
      P1.1.1.1 string    Select: c.item

Q: How is the SQL fragment for P1.1 generated?

A: When the binding for $t is found, the SQL fragment From: Clothing c is collected;

when the if is processed, Where: c.on-sale is collected;

when element product is encountered, the SQL is output

The algorithm has three parameters

Env: bindings of variables seen so far to view trees/forests

Expr: the expression to be processed

S: an SQL fragment (collected on the way down)

p. 8
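As a sanity check on the expected view tree, the fragments attached to the product and name nodes can be assembled by hand into a single SQL query and run against the sample relation. The sketch below uses Python's standard sqlite3 module; storing the boolean on-sale column as 0/1 and the hand-written tagging step are assumptions made for this illustration, not part of the algorithm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "on-sale" contains a hyphen, so it must be a quoted identifier in SQL.
conn.execute('CREATE TABLE Clothing (item TEXT, "on-sale" INTEGER, price INTEGER)')
conn.executemany(
    "INSERT INTO Clothing VALUES (?, ?, ?)",
    [("coat", 1, 99), ("shirt", 0, 38), ("skirt", 0, 45)],  # true/false stored as 1/0
)

# From: Clothing c, Where: c.on-sale (product node), Select: c.item (name node)
rows = conn.execute('SELECT c.item FROM Clothing c WHERE c."on-sale"').fetchall()

# Tag the tuples according to the view tree: view / product / name.
xml = "<view>" + "".join(
    f"<product><name>{item}</name></product>" for (item,) in rows
) + "</view>"
print(xml)
```

Only the coat is on sale, so the composed view contains a single product element.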

2005 rel-xml-iii 10

The algorithm vfca(Env, Expr, S) is recursive, functional, top-down, and returns a view forest

Denote by Qi the query with its first i lines deleted

vfca is initially called with

Env0 = {$CV → N1},

Expr0 = Q0,

S0 = ()

Process the 1st line: element view {

let vf = vfca(Env0, Q1, ())

in elementNode(view, vf, S0)

A node P1, with label view and empty SQL, is generated; its child(ren) are whatever is returned by the recursive call

2005 rel-xml-iii 11

vfca(Env1 (= Env0), Q1, S1 (= S0)):

Processing the 2nd line: for $t in $CV/child::Tuple return

let vf = binding obtained for $t (= N1.1),

Env’ = Env1 + {$t → vf}, // will be changed a bit later

Expr’ = remainder of Q1 (= Q2),

S’ = sqlJoin(S1, vf.sql) = From: Clothing c

in vfca(Env’, Expr’, S’)

We refer in the following to these arguments as Env2, Q2, S2

p. 7

2005 rel-xml-iii 12

vfca(Env2, Q2, S2):

Now, process: for $s in $t/child::on-sale return

let vf = binding obtained for $s (= N1.1.2),

Env’ = Env2 + {$s → vf},

Expr’ = remainder of Q2 (= Q3),

S’ = sqlJoin(S2, vf.sql) = From: Clothing c

in vfca(Env’, Expr’, S’)

Note: the SQL fragment has not changed, since N1.1.2 contains an empty fragment

2005 rel-xml-iii 13

vfca(Env3, Q3, S3 (= From Clothing c)):

Now, process: if data($s) then …

How do we process vfca(Env3, data($s), S3)?

From type checking, we know that $s is bound to a node that contains an atomic-valued expression (bool, in this case)

On a db instance, we would obtain true or false

Given the binding for $s (to N1.1.2), the SQL fragment of the child N1.1.2.1 contains a Select with an atomic-valued expression, Select: c.on-sale

vfca returns this child: <N1.1.2.1 : Select: c.on-sale>

(vfca always returns a forest)

p. 7

2005 rel-xml-iii 14

vfca(Env, var, S) =

let vf = Env(var) // the forest/tree var is bound to

in forestJoin(vf, S)

For $s, this gives N1.1.2, with From Clothing c

vfca(Env, data(E), S) =

let vf = vfca(Env, E, S)

in vf/child::*

(* returns all nodes, independent of label; we know there is just one)

Note: S = From Clothing c is ignored in this result

2005 rel-xml-iii 15

vfca(Env, if E1 then E2 else E3, S) =

let vn = vfca(Env, E1, S) // must be a singleton, with bool-valued Select

sqlTrue.From = vn.SQL.From

sqlTrue.Where = vn.SQL.Where, “and (”, vn.SQL.Select, “)”

vf2 = vfca(Env, E2, sqlTrue)

sqlFalse.From = vn.SQL.From

sqlFalse.Where = vn.SQL.Where, “and not (”, vn.SQL.Select, “)”

vf3 = vfca(Env, E3, sqlFalse)

in forestJoin((vf2, vf3), S)

forestJoin(vf, S): adds (using sqlJoin) S to each root of vf

2005 rel-xml-iii 16

In our example, E3 = (), the empty forest, so vf3 = (), so (vf2, vf3) = vf2, a singleton forest (a tree)

vf2 is the result of vfca(Env3, Q4’, sqlTrue), where Q4’ is

element product {
  element name {
    for $item in $t/child::item return data($item)
  }
}

It returns

P1.1 <product>    Where: c.on-sale
  P1.1.1 <name>
    P1.1.1.1 string    Select: c.item

Only the c.on-sale condition (taken from the Select of N1.1.2.1) was passed down to this query

But forestJoin (at the end of the if) now adds to its root

From: Clothing c

2005 rel-xml-iii 17

A few more cases:

vfca(Env, element QName {E}, S) = // an element constructor

let vf = vfca(Env, E, ())

in elementNode(QName, vf, S)

Thus, all accumulated SQL is added to the constructed element node (see prev. page)

Why is the recursive call with empty SQL?

Attribute construction is similar
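The cases seen so far (variable, data, if, element constructor, and for over the child axis) are enough to run the example end to end. The following is a minimal sketch, not the paper's implementation: the tuple encoding of expressions, the Python names, and the decision to keep Where conditions as a tuple of implicitly and-ed strings are assumptions made for illustration, and variable renaming is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Sql:
    frm: tuple = ()     # From entries
    where: tuple = ()   # Where conditions, implicitly and-ed
    select: tuple = ()  # Select expressions


EMPTY = Sql()


@dataclass(frozen=True)
class Node:
    name: str
    children: tuple = ()
    sql: Sql = EMPTY


def sql_join(a, b):
    return Sql(a.frm + b.frm, a.where + b.where, a.select + b.select)


def forest_join(forest, s):
    # Add s (using sql_join) to the root of every tree in the forest.
    return tuple(replace(n, sql=sql_join(s, n.sql)) for n in forest)


def vfca(env, expr, s):
    kind = expr[0]
    if kind == "empty":                       # ()
        return ()
    if kind == "var":                         # $v
        return forest_join(env[expr[1]], s)
    if kind == "data":                        # data(E): children of the result
        return tuple(c for n in vfca(env, expr[1], s) for c in n.children)
    if kind == "element":                     # element QName {E}
        return (Node(expr[1], vfca(env, expr[2], EMPTY), s),)
    if kind == "for":                         # for $v in $src/child::step return body
        _, var, source, step, body = expr
        result = ()
        for parent in env[source]:
            for child in parent.children:
                if child.name == step:
                    bound = replace(child, sql=EMPTY)  # root SQL moves into S
                    result += vfca({**env, var: (bound,)}, body,
                                   sql_join(s, child.sql))
        return result
    if kind == "if":                          # if E1 then E2 else E3
        _, cond, then_e, else_e = expr
        (vn,) = vfca(env, cond, s)            # singleton with a bool-valued Select
        (test,) = vn.sql.select
        sql_true = Sql(vn.sql.frm, vn.sql.where + (test,))
        sql_false = Sql(vn.sql.frm, vn.sql.where + (f"not ({test})",))
        return forest_join(vfca(env, then_e, sql_true)
                           + vfca(env, else_e, sql_false), s)
    raise ValueError(f"unsupported expression: {kind}")


def leaf(kind, column):
    return Node(kind, (), Sql(select=(column,)))


N1 = Node("Clothing", (
    Node("Tuple", (
        Node("item", (leaf("string", "c.item"),)),
        Node("on-sale", (leaf("bool", "c.on-sale"),)),
        Node("price", (leaf("int", "c.price"),)),
    ), Sql(frm=("Clothing c",))),
))

QUERY = ("element", "view",
         ("for", "$t", "$CV", "Tuple",
          ("for", "$s", "$t", "on-sale",
           ("if", ("data", ("var", "$s")),
            ("element", "product",
             ("element", "name",
              ("for", "$item", "$t", "item", ("data", ("var", "$item"))))),
            ("empty",)))))

(view,) = vfca({"$CV": (N1,)}, QUERY, EMPTY)
(product,) = view.children
print(product.sql)
```

The element case shows why the recursive call starts from an empty fragment: everything accumulated in S is attached to the newly constructed node, so passing it further down would attach the same From and Where clauses a second time to its descendants.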

2005 rel-xml-iii 18

A few more cases:

vfca(Env, L, S) = // L is a literal

let sql.From = S.From

sql.Where = S.Where

sql.Select = L “as atomicValue” // atomicValue is an attribute name; in sql.Select, it is ignored

in atomicNode(sql)

vfca(Env, E1 arithOp|logicalOp E2, S) =

compute the trees for E1, E2, with SQL = ()

take their Select components, add arithOp|logicalOp

create an atomic node with this SQL fragment

then add S to its root
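The literal and operator cases can be sketched as string manipulation on Select components. The names below (`Sql`, `literal`, `binary_op`, `bare`) are illustrative, and one detail is an assumption the slides do not spell out: the From and Where parts of the two operand fragments are simply concatenated before S is added.

```python
from dataclasses import dataclass

ALIAS = " as atomicValue"


@dataclass(frozen=True)
class Sql:
    from_: tuple = ()
    where: tuple = ()
    select: str = ""


def literal(text, s):
    """Atomic fragment for a literal: From/Where come from S."""
    return Sql(s.from_, s.where, text + ALIAS)


def bare(select):
    """Drop the `as atomicValue` alias when a Select is reused as an operand."""
    return select[:-len(ALIAS)] if select.endswith(ALIAS) else select


def binary_op(op, left, right, s):
    """Combine two operand fragments (computed with SQL = ()) and add S."""
    select = f"{bare(left.select)} {op} {bare(right.select)}"
    return Sql(s.from_ + left.from_ + right.from_,
               s.where + left.where + right.where,
               select)


product = binary_op("*", Sql(select="d.discount"), Sql(select="c.price"), Sql())
print(product.select)

bumped = binary_op("+", Sql(select="c.price"), literal("2", Sql()),
                   Sql(from_=("Clothing c",)))
print(bumped)
```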

2005 rel-xml-iii 19

For a for $v2 in $v1/Axis::nodeTest …

We have looked at Axis = child

If Axis = descendant: need to collect the SQL components of all nodes between the bindings for $v1, $v2 (not inclusive)

If Axis = self/ancestor: essentially as in child

In many cases, we need to rename variables in the tree bound to $v2 – here is an example

2005 rel-xml-iii 20

$view is bound to

N1 <product>, From Clothing c Where c.category = “outerwear”
  N1.1 <report>, From Problems p Where p.pid = c.cid

Query (fragment):

for $p in $view/self::product return
  for $r1 in $p/child::report return
    for $r2 in $p/child::report return
      if (some comparison of fields of r1, r2 …) …

Without renaming, $r1 and $r2 will be bound to the same SQL fragment, with the same variable p being defined

2005 rel-xml-iii 21

Renaming:

When processing a for $v in… :

• Find the forest/tree bound to $v

• Copy it, renaming all defined variables in SQL fragments to new variables (not harmful, may be needed)

• Remove the SQL fragment from the root; it is now preserved as a parameter of recursive calls

• Add to the environment a binding of $v to resulting forest/tree
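The copy-and-rename step can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's procedure: nodes are plain dictionaries, every From entry has the form "Table alias", references always look like `alias.column`, and a fresh name is made by appending a counter value (assumed not to clash with existing aliases). The function names are hypothetical.

```python
import re
from itertools import count


def defined_aliases(node):
    """Aliases introduced by From clauses anywhere in the subtree."""
    aliases = [entry.split()[-1] for entry in node["from"]]
    for child in node["children"]:
        aliases += defined_aliases(child)
    return aliases


def rename(node, mapping):
    """Return a copy of the subtree with aliases replaced per `mapping`."""
    if mapping:
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\.")

        def fix(text):
            return pattern.sub(lambda m: mapping[m.group(1)] + ".", text)
    else:
        def fix(text):
            return text

    def fix_from(entry):
        table, alias = entry.split()
        return f"{table} {mapping.get(alias, alias)}"

    return {
        "name": node["name"],
        "from": [fix_from(e) for e in node["from"]],
        "where": [fix(w) for w in node["where"]],
        "select": [fix(x) for x in node["select"]],
        "children": [rename(c, mapping) for c in node["children"]],
    }


def fresh_copy(node, counter):
    """Copy a bound tree, giving every alias it defines a new name."""
    mapping = {a: f"{a}{next(counter)}" for a in defined_aliases(node)}
    return rename(node, mapping)


report = {"name": "report", "from": ["Problems p"],
          "where": ["p.pid = c.cid"], "select": [], "children": []}
counter = count(1)
r1 = fresh_copy(report, counter)   # binding for $r1
r2 = fresh_copy(report, counter)   # binding for $r2
print(r1["from"], r1["where"])
print(r2["from"], r2["where"])
```

The alias c is left alone because it is defined outside the copied subtree (in the product node), so both copies still join against the same Clothing tuple.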

2005 rel-xml-iii 22

Processing a let $v = E1 return E2 :

vfca(Env, let $v = E1 return E2, S) =

let vf1 = vfca(Env, E1, ())

in vfca(Env + {$v → vf1}, E2, S)

(much simpler than for)

2005 rel-xml-iii 23

Execution of view forests

To obtain a query’s result,

• One or more SQL queries are generated from its view forest

• They generate ordered streams, which are merged, nested and tagged (outside the machine)

2005 rel-xml-iii 24

SQL queries construction and XML generation :

Recall that each node n has an associated query Cn

• Add keys: Add to Cn’s Select keys for all relations in its From – needed for sorting results

• Partition the view tree into a spanning forest – an SQL query is constructed for each tree in this forest

• The schema for a tree t: if the longest node index has k digits, add k attributes L1, …, Lk; together, they represent a node index. The schema also contains all other selected attributes in the tree

• The query for t is an outer-join that combines the different paths, sorted on: L1, level-1 atts, L2, level-2 atts, …

• Merge the sorted streams, nest and tag
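The last step, merging sorted streams and then nesting and tagging, can be sketched with the standard library. The row layout (key of the level-1 node, L2 label, text), the sample rows, and the element names chosen for each L2 value are all hypothetical; the point is only that streams sorted on the same key sequence can be merged lazily and tagged in a single pass.

```python
import heapq
from itertools import groupby


def merge_and_tag(*streams):
    """Merge streams sorted on (pid, L2) and emit nested, tagged XML."""
    merged = heapq.merge(*streams, key=lambda row: (row[0], row[1]))
    out = []
    for _pid, rows in groupby(merged, key=lambda row: row[0]):
        out.append("<product>")            # one level-1 element per key value
        for _, l2, text in rows:
            tag = {1: "sale", 2: "report"}[l2]
            out.append(f"<{tag}>{text}</{tag}>")
        out.append("</product>")
    return "".join(out)


# Hypothetical rows: (c.pid, L2, value), each stream already sorted by its SQL query.
sales = [(1, 1, "9.9"), (2, 1, "3.8")]
reports = [(1, 2, "torn seam"), (3, 2, "late")]
xml = merge_and_tag(sales, reports)
print(xml)
```

Because every stream arrives sorted on the same attributes, the tagger never has to buffer more than the current row, which is what makes running it outside the database cheap.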

2005 rel-xml-iii 25

Illustration by example:

(compare to fig. 6, p. 3)

for $c in $CV/Clothing/Tuple return
  <product>
    for $d in $CV/Discount/Tuple
    where $d/pid = $c/pid return
      <sale>{ data($c/price) * data($d/discount) }</sale>
    for $p in $CV/Problems/Tuple
    where $p/pid = $c/pid return
      <report code="{ data($p/code) }">
        {$p/comments}
      </report>
  </product>

(elements are pointed to by arrows)

2005 rel-xml-iii 26

View tree:

N1 <product>    From Clothing c
  N1.1 <sale>    From Discount d    Where d.pid = c.pid
    N1.1.1 float    Select d.discount * c.price
  N1.2 <report>    From Problems p    Where p.pid = c.pid
    N1.2.1 string    Select p.comments

2005 rel-xml-iii 27

Adding keys:

N1 <product>    From Clothing c    Select c.pid
  N1.1 <sale>    From Discount d    Where d.pid = c.pid    Select d.pid
    N1.1.1 float    Select d.discount * c.price
  N1.2 <report>    From Problems p    Where p.pid = c.pid    Select p.pid
    N1.2.1 string    Select p.comments

2005 rel-xml-iii 28

Partition the tree :

Atomic nodes always go with their parents, so they are not considered separately

2005 rel-xml-iii 29

Query for partition (a) (all tree):

Q.

2005 rel-xml-iii 30

Query for partition (b) (right node, report):

Select 2 as L2, c.pid, p.code, p.comments
From Clothing c, Problems p
Where c.pid = d.pid
Order by c.pid, p.code

Note: this is on p. 35, and the original seems to have a typo: d does not appear in the From clause, so the condition presumably should read c.pid = p.pid

2005 rel-xml-iii 31

Performance results :

For a large database, and a query with a large result (for other queries all approaches are fine):

Both one large query and many small queries are inferior to a small number (4 to 6) of queries

The paper presents a greedy algorithm for selecting an appropriate partition of the view forest

As for the rest, go and study it