lynn-ferguson
2005 rel-xml-iii 1
View forests and query composition
• The composition algorithm works for a (large) subset of XQuery, excluding (see the paper for details):
• User-defined recursive functions
• Order-dependent features (before, after, …)
• XQuery contains many redundant or complex features;
every query can be translated (and simplified) into an XQuery Core expression (syntax on p. 20); this is normalization
• The first step is type checking (e.g., data is applied only where the
element is atomic-valued), then normalization, and then the composition algorithm is applied
2005 rel-xml-iii 2
Normalization: (as defined in the XQuery formal semantics)
• Rewriting XML elements/attributes into internal form
• Breaking for/let clauses so each defines just one variable
• Replacing where e1 return e2 by if (e1) then e2 else ()
• Applying data to element/attribute operands of expressions that require atomic arguments
SilkRoute adds some more:
• Long path expressions are broken into one-step expressions
• Each one-step binds a new variable
• … (see paper)
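Two of the normalization rules above (one variable per for clause, and where rewritten as if/else) can be sketched on a toy syntax tree. The tuple encoding and the name normalize_flwor are hypothetical, chosen only for this illustration; they are not taken from the paper or from the XQuery formal semantics.

```python
def normalize_flwor(bindings, where, body):
    """Rewrite 'for v1 in e1, v2 in e2 ... where c return b' into nested
    single-variable for expressions whose innermost body is an if/else.

    bindings: list of (variable, source expression) pairs
    where:    condition expression, or None if there is no where clause
    body:     the return expression
    """
    # where e1 return e2  ==>  if (e1) then e2 else ()
    expr = body if where is None else ("if", where, body, ("empty",))
    # Break the for clause so that each 'for' defines just one variable.
    for var, source in reversed(bindings):
        expr = ("for", var, source, expr)
    return expr


normalized = normalize_flwor(
    [("$t", "$CV/child::Tuple"), ("$s", "$t/child::on-sale")],
    "data($s)",
    "$t/child::item",
)
```

The outermost for binds the first variable, so the nesting order matches the order of the original clause.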
2005 rel-xml-iii 5
In the algorithm, a node is a triple
• QName
• Forest (of children)
• SQL (an sql fragment)
Constructed by
elementNode(QName, F, S) or attributeNode(QName, F, S)
An SQL fragment is a triple:
(From, Where, Select) (each may be empty)
The function sqlJoin concatenates several fragments into one
The algorithm is recursive and works top-down on the given query
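The node and SQL-fragment triples can be written down as small immutable records. This is a minimal sketch under assumptions: the field names (frm, where, select, forest), the extra kind field, and the Python spellings sql_join, element_node, and attribute_node are illustrative choices, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Sql:
    """An SQL fragment: (From, Where, Select); each part may be empty."""
    frm: Tuple[str, ...] = ()
    where: Tuple[str, ...] = ()    # conjuncts, implicitly joined by AND
    select: Tuple[str, ...] = ()


def sql_join(*fragments):
    """Concatenate several fragments into one, component by component."""
    return Sql(
        frm=tuple(x for f in fragments for x in f.frm),
        where=tuple(x for f in fragments for x in f.where),
        select=tuple(x for f in fragments for x in f.select),
    )


@dataclass(frozen=True)
class Node:
    """A view-tree node: QName, forest of children, SQL fragment."""
    kind: str                      # "element" or "attribute" (assumed field)
    qname: str
    forest: Tuple["Node", ...]
    sql: Sql


def element_node(qname, forest, sql):
    return Node("element", qname, tuple(forest), sql)


def attribute_node(qname, forest, sql):
    return Node("attribute", qname, tuple(forest), sql)


# A fragment collected on the way down, joined with a later condition.
tuple_node = element_node("Tuple", (), Sql(frm=("Clothing c",)))
joined = sql_join(tuple_node.sql, Sql(where=("c.on-sale",)))
```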
2005 rel-xml-iii 6
We illustrate with a simple example :
The relational db has one relation
Clothing
item   on-sale  price
coat   true     99
shirt  false    38
skirt  false    45
2005 rel-xml-iii 7
Here is a (simplified) canonical view tree
Omitted SQL fragments are empty
We assume $CV is bound to N1
N1 <Clothing>
  N1.1 <Tuple>    From: Clothing c
    N1.1.1 <item>
      N1.1.1.1 string    Select: c.item
    N1.1.2 <on-sale>
      N1.1.2.1 bool    Select: c.on-sale
    N1.1.3 <price>
      N1.1.3.1 int    Select: c.price
2005 rel-xml-iii 8
Here is our query:
element view {
  for $t in $CV/child::Tuple return
    for $s in $t/child::on-sale return
      if data($s) then
        element product {
          element name {
            for $item in $t/child::item return data($item)
          }
        }
      else ()
}
Changes: Tuple becomes product, only the item field is output (projection), and on-sale is used for a selection
p. 7
2005 rel-xml-iii 9
Here is the expected view tree of the composition
P1 <view>
  P1.1 <product>    From: Clothing c   Where: c.on-sale
    P1.1.1 <name>
      P1.1.1.1 string    Select: c.item
Q: How is the SQL fragment for P1.1 generated?
A: When the binding for $t is found, the SQL fragment From: Clothing c is collected;
when the if is processed, the Where: c.on-sale is collected;
when element product is encountered, the SQL is output
The algorithm has three parameters
Env: bindings of variables seen so far to view trees/forests
Expr: the expression to be processed
S: an SQL fragment (collected on the way down)
2005 rel-xml-iii 10
The algorithm vfca(Env, Expr, S) is
recursive, functional, top-down, and returns a view forest
Denote by Qi the query with the first i lines deleted
vfca is initially called with
Env0 = {$CV → N1},
Expr0 = Q0,
S0 = ()
Process the 1st line: element view {
let vf = vfca(Env0, Q1, ())
in elementNode(view, vf, S0)
A node P1, with label view and empty SQL, is generated, with child(ren) being whatever is returned by the recursive call
2005 rel-xml-iii 11
vfca(Env1 (= Env0), Q1, S1 (= S0)):
Processing the 2nd line: for $t in $CV/child::Tuple return
let vf = binding obtained for $t (= N1.1),
Env' = Env1 + {$t → vf}, // will be changed a bit later
Expr' = remainder of Q1 (= Q2),
S' = sqlJoin(S1, vf.sql) = From: Clothing c
in vfca(Env', Expr', S')
In the following we refer to these arguments as Env2, Q2, S2
p. 7
2005 rel-xml-iii 12
vfca(Env2, Q2, S2):
Now, process: for $s in $t/child::on-sale return
let vf = binding obtained for $s (= N1.1.2),
Env' = Env2 + {$s → vf},
Expr' = remainder of Q2 (= Q3),
S' = sqlJoin(S2, vf.sql) = From: Clothing c
in vfca(Env', Expr', S')
Note: the SQL fragment has not changed, since N1.1.2 contains an empty fragment
2005 rel-xml-iii 13
vfca(Env3, Q3, S3 (= From: Clothing c)):
Now, process: if data($s) then …
How do we process vfca(Env3, data($s), S3)?
From type checking, we know that $s is bound to a node that contains an atomic-valued expression (bool, in this case)
On a db instance, we would obtain true or false
Given the binding for $s (to N1.1.2), the SQL fragment of the child N1.1.2.1 contains a Select with an atomic-valued expression, Select: c.on-sale
vfca returns this child: <N1.1.2.1: Select: c.on-sale>
(vfca always returns a forest)
p. 7
2005 rel-xml-iii 14
vfca(Env, var, S) =
let vf = Env(var) // the forest/tree var is bound to
in forestJoin(vf, S)
For $s, this gives N1.1.2, with From: Clothing c
vfca(Env, data(E), S) =
let vf = vfca(Env, E, S)
in vf/child::* // * returns all nodes, independent of label; we know there is just one
Note: S = From: Clothing c is ignored in this result
2005 rel-xml-iii 15
vfca(Env, if E1 then E2 else E3, S) =
let vn = vfca(Env, E1, S) // must be a singleton, with bool-valued Select
sqlTrue.From = vn.SQL.From
sqlTrue.Where = vn.SQL.Where, "and (", vn.SQL.Select, ")"
vf2 = vfca(Env, E2, sqlTrue)
sqlFalse.From = vn.SQL.From
sqlFalse.Where = vn.SQL.Where, "and not (", vn.SQL.Select, ")"
vf3 = vfca(Env, E3, sqlFalse)
in forestJoin((vf2, vf3), S)
forestJoin(vf, S): adds S (using sqlJoin) to each root of vf
2005 rel-xml-iii 16
In our example, E3 = (), the empty forest, so vf3 = (), and (vf2, vf3) = vf2, a singleton forest (a tree)
vf2 is the result of vfca(Env3, Q4', sqlTrue), where Q4' is
element product {
  element name {
    for $item in $t/child::item return data($item)
  }
}
It returns
P1.1 <product>    Where: c.on-sale
  P1.1.1 <name>
    P1.1.1.1 string    Select: c.item
Only the Where: c.on-sale condition was passed down to this query
But forestJoin (at the end of the if case) now adds From: Clothing c to its root
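The walkthrough of the example can be replayed with a small interpreter. This is a rough sketch under stated assumptions, not the paper's algorithm: queries are encoded as hypothetical nested tuples, only the cases used in the example are handled (variable, data, element constructor, for over the child axis, if, and the empty forest), variable renaming is omitted, and the root SQL fragment is removed from a tree when it is bound to a for variable.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Sql:
    frm: Tuple[str, ...] = ()
    where: Tuple[str, ...] = ()
    select: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Node:
    qname: str
    children: Tuple["Node", ...] = ()
    sql: Sql = Sql()


def sql_join(a, b):
    return Sql(a.frm + b.frm, a.where + b.where, a.select + b.select)


def forest_join(forest, s):
    # Add S (using sql_join) to each root of the forest.
    return tuple(replace(n, sql=sql_join(s, n.sql)) for n in forest)


def vfca(env, expr, s):
    kind = expr[0]
    if kind == "empty":
        return ()
    if kind == "var":
        return forest_join(env[expr[1]], s)
    if kind == "data":  # children of the result; S on the roots is dropped
        return tuple(c for n in vfca(env, expr[1], s) for c in n.children)
    if kind == "element":  # body is processed with empty SQL; S goes on the new node
        _, qname, body = expr
        return (Node(qname, vfca(env, body, Sql()), s),)
    if kind == "for":  # for var in source/child::name return body
        _, var, source, name, body = expr
        out = ()
        for parent in env[source]:
            for child in parent.children:
                if child.qname == name:
                    bound = replace(child, sql=Sql())  # root SQL now travels in S
                    out += vfca({**env, var: (bound,)}, body, sql_join(s, child.sql))
        return out
    if kind == "if":
        _, cond, then_expr, else_expr = expr
        (vn,) = vfca(env, cond, s)        # must be a singleton
        (test,) = vn.sql.select           # with a bool-valued Select
        sql_true = Sql(vn.sql.frm, vn.sql.where + (test,))
        sql_false = Sql(vn.sql.frm, vn.sql.where + (f"not ({test})",))
        forest = vfca(env, then_expr, sql_true) + vfca(env, else_expr, sql_false)
        return forest_join(forest, s)
    raise ValueError(f"unsupported expression: {kind}")


def atom(type_name, column):
    return Node(type_name, (), Sql(select=(column,)))


# The canonical view tree N1 of the example.
item = Node("item", (atom("string", "c.item"),))
on_sale = Node("on-sale", (atom("bool", "c.on-sale"),))
price = Node("price", (atom("int", "c.price"),))
n1 = Node("Clothing", (Node("Tuple", (item, on_sale, price), Sql(frm=("Clothing c",))),))

query = ("element", "view",
         ("for", "$t", "$CV", "Tuple",
          ("for", "$s", "$t", "on-sale",
           ("if", ("data", ("var", "$s")),
            ("element", "product",
             ("element", "name",
              ("for", "$item", "$t", "item", ("data", ("var", "$item"))))),
            ("empty",)))))

(view,) = vfca({"$CV": (n1,)}, query, Sql())
(product,) = view.children
```

Here product.sql carries both the From collected at the binding of $t and the Where collected at the if, while the view node keeps an empty fragment.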
2005 rel-xml-iii 17
A few more cases:
vfca(Env, element QName {E}, S) = // an element constructor
let vf = vfca(Env, E, ())
in elementNode(QName, vf, S)
Thus, all accumulated SQL is added to the constructed element node (see prev. page)
Why is the recursive call with empty SQL?
Attribute construction is similar
2005 rel-xml-iii 18
A few more cases:
vfca(Env, L, S) = // L is a literal
let sql.From = S.From
sql.Where = S.Where
sql.Select = L "as atomicValue" // atomicValue is an attribute name; in sql.Select, it is ignored
in atomicNode(sql)
vfca(Env, E1 arithOp|logicalOp E2, S) =
compute the trees for E1, E2, with SQL = ()
take their Select components, add arithOp|logicalOp
create an atomic node with this SQL fragment
then add S to its root
2005 rel-xml-iii 19
For a for $v2 in $v1/Axis::nodeTest …
We have looked at Axis = child
If Axis = descendant: we need to collect the SQL components of all nodes between the bindings for $v1 and $v2 (not inclusive)
If Axis = self/ancestor:
Essentially as in child
In many cases, we need to rename variables in the tree bound to $v2; here is an example
2005 rel-xml-iii 20
$view is bound to
N1 <product>    From Clothing c   Where c.category = "outerwear"
  N1.1 <report>    From Problems p   Where p.pid = c.cid
Query (fragment):
for $p in $view/self::product return
  for $r1 in $p/child::report return
    for $r2 in $p/child::report return
      if (some comparison of fields of r1, r2 …) …
Without renaming, $r1 and $r2 will be bound to the same SQL fragment, with the same variable p being defined
2005 rel-xml-iii 21
Renaming:
When processing a for $v in… :
• Find the forest/tree bound to $v
• Copy it, renaming all defined variables in SQL fragments to new variables (not harmful, may be needed)
• Remove the SQL fragment from the root; it is now preserved as a parameter of recursive calls
• Add to the environment a binding of $v to resulting forest/tree
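The renaming steps can be sketched as follows. This is only an illustration under assumptions: From entries are assumed to have the form "Relation alias", fresh names are made by appending a counter, renaming is done textually with a word-boundary regular expression, and the helper names (defined_aliases, rename, bind_for_variable) are hypothetical.

```python
import itertools
import re
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Sql:
    frm: Tuple[str, ...] = ()
    where: Tuple[str, ...] = ()
    select: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Node:
    qname: str
    children: Tuple["Node", ...] = ()
    sql: Sql = Sql()


_fresh = itertools.count(1)  # source of fresh suffixes


def defined_aliases(node):
    """Aliases defined by From entries ('Relation alias') anywhere in the tree."""
    found = [entry.split()[-1] for entry in node.sql.frm]
    for child in node.children:
        found += defined_aliases(child)
    return found


def rename(node, mapping):
    """Copy the tree, replacing each old alias by its new name in every fragment."""
    def fix(text):
        for old, new in mapping.items():
            text = re.sub(rf"\b{re.escape(old)}\b", new, text)
        return text

    sql = Sql(tuple(map(fix, node.sql.frm)),
              tuple(map(fix, node.sql.where)),
              tuple(map(fix, node.sql.select)))
    return Node(node.qname, tuple(rename(c, mapping) for c in node.children), sql)


def bind_for_variable(node):
    """Return (root SQL to pass down as S, renamed copy with the root SQL removed)."""
    n = next(_fresh)
    mapping = {alias: f"{alias}_{n}" for alias in defined_aliases(node)}
    copy = rename(node, mapping)
    return copy.sql, replace(copy, sql=Sql())


# The report node from the example, bound twice (for $r1 and for $r2).
report = Node("report", (), Sql(frm=("Problems p",), where=("p.pid = c.cid",)))
s1, r1 = bind_for_variable(report)
s2, r2 = bind_for_variable(report)
```

After the two bindings, s1 and s2 define different aliases, so a comparison between fields of $r1 and $r2 refers to two distinct tuples of Problems; the alias c is left alone because it is defined outside the copied tree.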
2005 rel-xml-iii 22
Processing a let $v = E1 return E2 :
vfca(Env, let $v = E1 return E2, S) =
let vf1 = vfca(Env, E1, ())
in vfca(Env + {$v → vf1}, E2, S)
(much simpler than for )
2005 rel-xml-iii 23
Execution of view forests
To obtain a query’s result,
• One or more SQL queries are generated from its view forest
• They generate ordered streams that are merged, nested, and tagged (outside the machine)
2005 rel-xml-iii 24
SQL queries construction and XML generation :
Recall that each node n has an associated query Cn
• Add keys: Add to Cn’s Select keys for all relations in its From – needed for sorting results
• Partition the view tree into a spanning forest – an SQL query is constructed for each tree in this forest
• The schema for a tree t: If the longest node index has k digits, add k attributes L1, …Lk; together, they represent a node index ; the schema also contains all other selected attributes in the tree
• The query for t is an outer-join that combines the different paths, sorted on : L1, level-1 atts, L2, level-2 atts, …
• Merge the sorted streams, nest and tag
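The last step (merge the sorted streams, nest, and tag) might look roughly like the sketch below. Everything concrete in it is an assumption made for illustration: the row layout (pid, L2, text), the sample rows, and the rule that L2 = 1 means a sale row and L2 = 2 a report row. Each stream is assumed to arrive already sorted on (pid, L2).

```python
import heapq
from itertools import groupby

# Hypothetical sorted result streams, one per SQL query.
sale_stream = [(1, 1, "49.5"), (3, 1, "22.5")]
report_stream = [(1, 2, "loose button"), (2, 2, "torn seam")]


def tag(merged_rows):
    """Nest rows under one <product> per pid and tag them by their L2 label."""
    lines = []
    for _pid, rows in groupby(merged_rows, key=lambda row: row[0]):
        lines.append("<product>")
        for _, label, text in rows:
            name = "sale" if label == 1 else "report"
            lines.append(f"  <{name}>{text}</{name}>")
        lines.append("</product>")
    return "\n".join(lines)


# heapq.merge performs a streaming merge of already-sorted inputs.
xml = tag(heapq.merge(sale_stream, report_stream))
```

Because both inputs are sorted on the same prefix, the merge needs only one pass and constant memory per stream, which is why the keys are added to every Select.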
2005 rel-xml-iii 25
Illustration by example :
(compare to fig. 6, p. 3)
for $c in $CV/Clothing/Tuple return
<product>
  for $d in $CV/Discount/Tuple
  where $d/pid = $c/pid return
    <sale>{ data($c/price)*data($d/discount) }</sale>
  for $p in $CV/Problems/Tuple
  where $p/pid = $c/pid return
    <report code="{ data($p/code) }">
      {$p/comments}
    </report>
</product>
(elements are pointed to by arrows)
2005 rel-xml-iii 26
View tree :
N1 <product>    From Clothing c
  N1.1 <sale>    From Discount d   Where d.pid = c.pid
    N1.1.1 float    Select d.discount * c.price
  N1.2 <report>    From Problems p   Where p.pid = c.pid
    N1.2.1 string    Select p.comments
2005 rel-xml-iii 27
Adding keys :
N1 <product>    From Clothing c   Select c.pid
  N1.1 <sale>    From Discount d   Where d.pid = c.pid   Select d.pid
    N1.1.1 float    Select d.discount * c.price
  N1.2 <report>    From Problems p   Where p.pid = c.pid   Select p.pid
    N1.2.1 string    Select p.comments
2005 rel-xml-iii 28
Partition the tree :
Atomic nodes always go with their parents, so they are not considered separately
2005 rel-xml-iii 30
Query for partition (b) (right node, report):
Select 2 as L2, c.pid, p.code, p.comments
From Clothing c, Problems p
Where c.pid = p.pid
Order by c.pid, p.code
Note: this is on p. 35, and it seems the original has a typo
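To see what such a partition query returns, it can be run against a tiny in-memory database. This is a sketch under assumptions: the table layouts and the sample rows are invented for the illustration, and the join condition is taken to be c.pid = p.pid, since Problems p is the only other relation in the From clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clothing (pid INTEGER, item TEXT);
CREATE TABLE Problems (pid INTEGER, code TEXT, comments TEXT);
-- Hypothetical sample rows, deliberately inserted out of order.
INSERT INTO Clothing VALUES (1, 'coat'), (2, 'shirt');
INSERT INTO Problems VALUES
    (2, 'B7', 'torn seam'),
    (1, 'A2', 'loose button'),
    (1, 'A1', 'stain');
""")

# The query for the report partition: L2 labels the rows as belonging to
# node N1.2, and the ordering makes the stream ready for merging.
rows = conn.execute("""
SELECT 2 AS L2, c.pid, p.code, p.comments
FROM Clothing c, Problems p
WHERE c.pid = p.pid
ORDER BY c.pid, p.code
""").fetchall()
conn.close()
```

Each row carries the constant label and the key of the enclosing product, so the tagger can tell which product a report belongs to without a further lookup.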
2005 rel-xml-iii 31
Performance results :
For a large database, and a query with a large result (for other queries all approaches are fine):
Both one large query and many small queries are inferior to a small number (4-6) of queries
The paper presents a greedy algorithm for selecting an appropriate partition of the view forest
As for the rest, go and study it