Query Containment
The most fundamental relationship between a pair of queries.
Query Q is contained in Q’ if, for any database D, Q(D) is a subset of Q’(D).
Roadmap
  Introduction and problem definition
  Containment of a subset of XML queries
  Query containment is decidable
  Query containment in practice
  Relaxing the assumptions
  Conclusions

  Fanout \ Depth   Fixed            Arbitrary
  = 1              PTIME            PTIME
  Arbitrary        coNP-complete    in coNEXPTIME
Applications of Query Containment
  Semantic caching
  Determining independence of database updates
  Query answering using views
  Detecting that a reformulated query is redundant
  Query minimization
  Verification of knowledge bases
Query Processing in PDMS
XML query containment in a Peer Data Management System (PDMS):
  Answering queries using views to extract remote data
  Removing redundant queries to enhance performance
[Figure: a PDMS with peers UW, Stanford, Berkeley, and UPenn, the mappings MWS, MPW, MSB, and MBW between them, and the queries QW, QP, QS, QB1, and QB2 at the peers.]
Query Containment: Relational vs. XML
                         Relational                 XML
  Input D                Sets of tuples             An XML instance tree
  Output Q(D)            A set of tuples            An XML instance tree
  Instance containment   Q(D) ⊆ Q’(D): subset       Q(D) ⊆ Q’(D): tree embedding
  Query containment      Q ⊆ Q’: for every          Q ⊆ Q’: for every
                         input D, Q(D) ⊆ Q’(D)      input D, Q(D) ⊆ Q’(D)
Example – An XML Instance
D:
<project>
<member>Alice</member>
</project>
<project>
<member>Bob</member>
</project>
[Tree view of D: two project nodes, each with one member child, whose text values are Alice and Bob.]
Example – An XML Query
Q:
for $x in /project return
<group>{
for $y in $x/member return
<name>{
where $y=“Alice”
return <Alice/>
where $y=“Bob”
return <Bob/>
}</name>
}</group>
[Figure: the instance D from the previous example, and Q(D): two group nodes, each with one name child, containing Alice and Bob respectively.]
Example – Another XML Query
Q’:
for $x in /project return
<group>{
for $y in /project/member return
<name>{
where $y=“Alice”
return <Alice/>
where $y=“Bob”
return <Bob/>
}</name>
}</group>
[Figure: the instance D from the previous example, and Q’(D): a group node with two name children, containing Alice and Bob respectively.]
Tree Embedding
Given two trees T1 and T2, a node mapping ψ from T1 to T2 is said to be an embedding from T1 to T2 if:
  ψ maps the root of T1 to the root of T2.
  If node n2 is a child of node n1 in T1, then ψ(n2) is a child of ψ(n1), and n1 and n2 have the same labels as ψ(n1) and ψ(n2), respectively.
What is the time complexity of finding an embedding from T1 to T2?
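As a rough illustration of the definition above, the following Python sketch checks whether one tree embeds into another. The tuple representation, the `tree` helper, and the virtual `answer` root used to hold a forest are assumptions made for this sketch and do not come from the slides. Because ψ does not have to be injective, each child of a node in T1 can be matched independently against the children of its image in T2.

```python
def tree(label, *children):
    """Build a tree as (label, children); a representation assumed for this sketch."""
    return (label, tuple(children))


def embeds(t1, t2):
    """Return True if t1 can be embedded in t2 with the root mapped to the root."""
    label1, kids1 = t1
    label2, kids2 = t2
    if label1 != label2:
        return False
    # Every child of t1 must embed in some child of t2 (images need not be distinct).
    return all(any(embeds(c1, c2) for c2 in kids2) for c1 in kids1)


# Q(D) and Q'(D) from the earlier example, placed under a hypothetical "answer" root.
q_d = tree("answer",
           tree("group", tree("name", tree("Alice"))),
           tree("group", tree("name", tree("Bob"))))
q2_d = tree("answer",
            tree("group",
                 tree("name", tree("Alice")),
                 tree("name", tree("Bob"))))

print(embeds(q_d, q2_d))   # True
print(embeds(q2_d, q_d))   # False
```

In this recursion a given pair of nodes (one from T1, one from T2) is compared at most once, which suggests a running time on the order of |T1|·|T2| for this simple check.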
XML Instance Containment
Let e and e’ be two XML instances. e is contained in e’, denoted as e ⊆ e’, if the tree of e can be embedded in the tree of e’.
Containment is reflexive and transitive.
Containment is not antisymmetric: e ⊆ e’ and e’ ⊆ e do not imply e = e’.
[Figure: two XML instances over the labels a and b that contain each other but are not equivalent.]
XML Query Containment
Let Q and Q’ be two XML queries. Q is contained in Q’, denoted as Q ⊆ Q’, if for every input XML instance D, Q(D) ⊆ Q’(D).
Example – Tree Embedding and Query Containment
Q(D) ⊆ Q’(D)
Q’(D) ⊄ Q(D)
[Figure: Q(D), two group nodes with one name child each (Alice and Bob), beside Q’(D), one group node with two name children (Alice and Bob). Q(D) embeds into Q’(D), but not the reverse.]
Query Containment Problem
From answer containment to query containment:
  Q’(D) ⊄ Q(D), so Q’ ⊄ Q.
  Q(D) ⊆ Q’(D) for this D. Does Q ⊆ Q’ hold?
Our problems:
  Given queries Q and Q’, decide whether Q ⊆ Q’.
  The complexity of query containment.
Previous Work (I)
Relational query containment
  Conjunctive queries [Chandra and Merlin, STOC 1977]
  Acyclic queries [Yannakakis, VLDB 1981]
  Queries with union [Sagiv and Yannakakis, JACM 1980]
  Queries with negation [Levy and Sagiv, VLDB 1993]
  Queries with arithmetic comparisons [Klug, JACM 1988]
  Recursive queries [Shmueli, 1993], [Chaudhuri and Vardi, 1992]
  Queries over bags [Ioannidis and Ramakrishnan, 1995]
Previous Work (II)
XML query containment – two new challenges
XPath containment
  With *, // and […] [Miklau and Suciu, PODS 2002]
  With equality testing on tag variables [Deutsch and Tannen, KRDB 2001]
  Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998]
Nested query containment
Containment Cannot be Determined Solely by Comparing XPath Components
Q:
for $g in /group
where $g/gname/text() = “database”
return
<area>{
  for $p in $g/person return
  <person>
    <name>{$p/text()}</name>
    {for $q in $g/paper
     where $q/author/text() = $p/text()
     return <paper>{$q/title/text()}</paper>}
  </person>
}</area>
Q’:
for $g in /group return
<area>{
  for $p in $g/person return
  <person>
    <name>{$p/text()}</name>
    <group>{$g/gname/text()}</group>
    {for $q in $g/paper
     where $q/author/text() = $p/text()
     return <paper>{$q/title/text()}</paper>}
  </person>
}</area>
Previous Work (II) – cont.
Nested query containment
  Complex object query containment [Levy and Suciu, PODS 1997]
  Containment of nested XML queries has not been fully studied
Conjunctive XML Queries (c-XQueries)
Returned variables are bound to tag names or text values only.
Conjunctive: no two sibling query blocks return the same tag.
XPath fragment:
  HAVE: child axis (/), wildcards (*), branches ([…])
  NOT HAVE: descendant axis (//), arithmetic comparison, union
Here, XPath containment is in PTIME.
Conjunctive Queries – cont.
A c-XQuery consists of nested query blocks.
The fan-out of a query block is the number of its immediate sub-blocks.
The nesting depth of a query block is 1 plus the maximal nesting depth of its sub-blocks.
The nesting depth of the query is the depth of its outer-most block.
Query Head Tree
The structure of an XML query and its answers can be described using a query head tree. Edges represent query blocks.
The label of a node n in the head tree is the returned tag of the block corresponding to the incoming edge of n in Q.
A head tree is also an XML instance if its variables are substituted with actual values.
Query Head Tree Example:
Q:
for $x in /project return
<group>{
for $s in $x/title/text() return
<projtitle>{$s}</projtitle>}
{for $t in $x/member/text() return
<name>{$t}</name>}
</group>
[Figure: the query head tree of Q: a group node with two children, projtitle (with child s) and name (with child t).]
What are the fan-out and the nesting depth of Q?
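One way to make the two measures concrete is the small Python sketch below. It models a query block as a pair of its returned tag and its sub-blocks; this representation and the helper names are assumptions for illustration. It also assumes that a block without sub-blocks has nesting depth 1, which is how the definition reads when the maximum over no sub-blocks is taken as 0.

```python
def block(tag, *sub_blocks):
    """A query block as (returned tag, sub-blocks); a hypothetical representation."""
    return (tag, tuple(sub_blocks))


def fan_out(b):
    """Number of immediate sub-blocks of a block."""
    return len(b[1])


def max_fan_out(b):
    """Largest fan-out over the block and all of its nested sub-blocks."""
    return max([fan_out(b)] + [max_fan_out(s) for s in b[1]])


def nesting_depth(b):
    """1 plus the maximal nesting depth of the sub-blocks (0 if there are none)."""
    return 1 + max((nesting_depth(s) for s in b[1]), default=0)


# The example query Q: the outer block returns <group> and has two sub-blocks.
q = block("group", block("projtitle"), block("name"))

print(max_fan_out(q))     # 2
print(nesting_depth(q))   # 2
```

Under these assumptions, the example query has fan-out 2 and nesting depth 2.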
Constant Conjunctive XML Queries (cc-XQueries)
A cc-XQuery is a c-XQuery that does not return tag variables.
The head tree of a cc-XQuery has constant labels only.
Deciding Q ⊆ Q’?
How can we establish a property for an infinite number of input XML instances?
Standard technique: find a finite set of input representatives, the canonical databases.
  Relational query: each canonical database is a minimal input to generate the answer template.
  XML query: answers have an infinite number of shapes.
Find a finite set of answer templates, the canonical answers.
Answer Shapes Determined by the Head Tree
Q’:
for $x in /project return
<group>{
for $y in /project/member return
<name>{where $y=“Alice”
return <Alice/>
where $y=“Bob”
return <Bob/>
}</name>
}</group>
[Figure: the head tree of Q’ (group, with child name, with children Alice and Bob) next to answer shapes obtained from it, such as group, group/name, group/name/Alice, and group/name/Bob.]
An Additional Candidate Answer
[Figure: an additional candidate answer, a group node with two name children containing Alice and Bob, shown beside the earlier answer shapes and the head tree.]
Why Consider the Additional Case
[Figure: for the instance D with two projects (members Alice and Bob), Q(D) consists of two group nodes with one name child each, while Q’(D) is a group node with two name children, Alice and Bob.]
What can Serve as Canonical Answers?
Prefix subtrees of the head tree?
  – necessary but not sufficient
Trees contained in the head tree?
  – necessary and sufficient
  – but, too many and too complex
A Head Tree can Have Many Trees Contained in it
[Figure: the head tree (group, name, Alice, Bob) and several trees contained in it, including group nodes with two or three name children holding different combinations of Alice and Bob.]
Solution: consider only minimal trees that are contained in the head tree
Canonical Answer
A minimal XML instance: no two sibling subtrees where one is contained in the other.
Canonical answer: a minimal XML instance contained in the head tree.
Every answer A of query Q corresponds to a unique canonical answer CA, s.t. A ⊆ CA and CA ⊆ A.
[Figure: an answer with a redundant name subtree, the head tree, and the corresponding canonical answer: a group node with two name children, Alice and Bob.]
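The minimality condition above can be sketched in Python as follows. The tuple representation and the helper names `tree`, `embeds`, and `minimize` are assumptions for this illustration; the slides do not prescribe an algorithm. The sketch minimizes the children first and then drops any sibling subtree that is contained in another sibling, keeping one representative when two siblings contain each other.

```python
def tree(label, *children):
    """Build a tree as (label, children); a representation assumed for this sketch."""
    return (label, tuple(children))


def embeds(t1, t2):
    """True if t1 can be embedded in t2 with the root mapped to the root."""
    if t1[0] != t2[0]:
        return False
    return all(any(embeds(c1, c2) for c2 in t2[1]) for c1 in t1[1])


def minimize(t):
    """Remove sibling subtrees that are contained in another sibling subtree."""
    label, kids = t
    kids = [minimize(k) for k in kids]
    kept = []
    for i, k in enumerate(kids):
        # k is redundant if it is strictly contained in a sibling, or if it is
        # equivalent to an earlier sibling (so exactly one copy survives).
        redundant = any(
            j != i and embeds(k, other) and (j < i or not embeds(other, k))
            for j, other in enumerate(kids)
        )
        if not redundant:
            kept.append(k)
    return (label, tuple(kept))


alice = tree("name", tree("Alice"))
bob = tree("name", tree("Bob"))
answer = tree("group", alice, bob, alice)

print(minimize(answer) == tree("group", alice, bob))   # True
```

The minimized tree and the original answer embed into each other, which matches the statement that every answer corresponds to a canonical answer CA with A ⊆ CA and CA ⊆ A.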
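Canonical Database
Canonical database DB_CA: the minimal XML instance to generate CA.
[Figure: the canonical answer CA (a group node with two name children, Alice and Bob) and its canonical database DB, built from project and member nodes with the values Alice and Bob, for the query below.]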
for $x in /project return
<group>{
for $y in /project/member return
<name>{
where $y=“Alice”
return <Alice/>
where $y=“Bob”
return <Bob/>
}</name>
}</group>
Canonical Database – Formal Def.
Canonical database of a cc-XQuery: DB_CA.
DB_CA is an XML instance such that, for each node N of CA, where N’s generator query block is qn, the following holds:
Let p0/p1/…/pn be a path expression in qn, where p0 is an optional node variable from an ancestor query block.
For each pi, i ∈ [1,n], there is a distinct node, labeled pi, that is a child of the node for pi-1. If p0 is absent, then the node for p1 is a child of DB_CA’s root.
Sound and Complete Conditions for Nested Query Containment
Let Q and Q’ be two cc-XQueries.
The following three conditions are equivalent:
1. Q ⊆ Q’
2. For every canonical database DB of Q, Q(DB) ⊆ Q’(DB)
3. For every canonical answer CA of Q,
a) CA is a canonical answer of Q’
b) DB’_CA ⊆ DB_CA
Properties of Canonical Answers and Databases
Lemma 1: Let Q be a cc-XQuery and D be an XML instance. There exists a unique canonical answer CA of Q, s.t. Q(D) ⊆ CA and CA ⊆ Q(D).
Lemma 2: Let Q be a cc-XQuery, CA be a canonical answer of Q, DB_CA be the canonical database for CA of Q, and D be an XML instance. CA ⊆ Q(D) if and only if DB_CA ⊆ D.
Containment of cc-XQueries – Proof (1)
1) => 2) Follows from the definition.
2) => 3)
  CA ⊆ Q(DB_CA) (Lemma 2)
  Q(DB_CA) ⊆ Q’(DB_CA) (by 2)
  So CA ⊆ Q’(DB_CA) (containment is transitive), and a) holds.
  CA is a canonical answer of Q’ (a) and CA ⊆ Q’(DB_CA), so DB’_CA ⊆ DB_CA (Lemma 2), and b) holds.
Containment of cc-XQueries – Proof (2)
3) => 1) To show Q ⊆ Q’, we need to show that for every XML instance D, Q(D) ⊆ Q’(D).
  There exists a unique CA of Q, s.t. Q(D) ⊆ CA and CA ⊆ Q(D) (Lemma 1)
  So DB_CA ⊆ D (Lemma 2)
  DB’_CA ⊆ DB_CA (by 3b), so DB’_CA ⊆ D (containment is transitive)
  So CA ⊆ Q’(D) (Lemma 2), and therefore Q(D) ⊆ Q’(D) (containment is transitive)
Query Containment Algorithm
Algorithm:
for every canonical answer CA of Q do
1. check whether CA is a canonical answer of Q’
2. generate DB_CA and DB’_CA
3. check DB’_CA ⊆ DB_CA
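The loop above can be written down as a short Python skeleton. Everything the slides leave open is passed in as a parameter: `canonical_answers`, `is_canonical_answer_of_q2`, `db_for_q`, and `db_for_q2` are hypothetical names, and the toy values at the end are stand-ins chosen only to exercise the control flow. They are not derived from real queries. Only the tree-embedding test `embeds` follows the definition given earlier.

```python
def embeds(t1, t2):
    """True if tree t1 = (label, children) embeds in t2, root to root."""
    if t1[0] != t2[0]:
        return False
    return all(any(embeds(c1, c2) for c2 in t2[1]) for c1 in t1[1])


def is_contained(canonical_answers, is_canonical_answer_of_q2, db_for_q, db_for_q2):
    """Decide Q ⊆ Q' by checking each canonical answer CA of Q.

    All four arguments are assumed to be supplied by the caller:
    the canonical answers of Q, a membership test for Q', and functions
    that return DB_CA (for Q) and DB'_CA (for Q') as trees.
    """
    for ca in canonical_answers:
        if not is_canonical_answer_of_q2(ca):        # step 1
            return False
        db, db2 = db_for_q(ca), db_for_q2(ca)        # step 2
        if not embeds(db2, db):                      # step 3: DB'_CA ⊆ DB_CA
            return False
    return True


# Toy stand-ins (not computed from any real query).
ca = ("group", ())
big_db = ("root", (("project", (("member", ()),)),))
small_db = ("root", (("project", ()),))

print(is_contained([ca], lambda c: True, lambda c: big_db, lambda c: small_db))   # True
print(is_contained([ca], lambda c: True, lambda c: small_db, lambda c: big_db))   # False
```

Passing the pieces in as functions keeps the skeleton independent of how canonical answers and canonical databases are actually generated.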
Roadmap

  Fanout \ Depth   Fixed   Arbitrary
  = 1              ?       ?
  Arbitrary        ?       ?
Query Containment Algorithm – Complexity
Polynomial in the size and number of canonical answers.
  What are the sizes of canonical answers?
  What is the number of canonical answers?
Containment of XML Queries with Fanout 1
E.g. d=3 (the depth); m=1 (the maximum fanout)
Canonical answers and complexity:
  Number: the depth of the query
  Size: bounded by the depth of the query
  Complexity: O(d·|Q|·|Q’|)
Theorem: Testing containment of XML queries with fanout 1 is in PTIME.
for $x in /project return
<group>{for $y in /project/member return
<name>{where $y =“Alice” return <Alice/>
}</name>
}</group>
[Figure: the canonical answers of the query: group; group with child name; group with child name with child Alice.]
Nesting with fanout 1 does not increase complexity.
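For a query with fanout 1 the head tree is a single chain, and the example suggests that its canonical answers are the root-anchored prefixes of that chain. The Python sketch below enumerates them under that assumption; the list-of-labels representation and the function name are illustrative only.

```python
def canonical_answers_fanout1(chain):
    """Return the root-anchored prefixes of a head-tree chain.

    `chain` lists the head-tree labels from the root down, e.g.
    ["group", "name", "Alice"]. Assumes a fanout-1 query, so the head
    tree has exactly one node per nesting level.
    """
    return [chain[:k] for k in range(1, len(chain) + 1)]


answers = canonical_answers_fanout1(["group", "name", "Alice"])
for a in answers:
    print("/".join(a))
# group
# group/name
# group/name/Alice
```

The number of prefixes equals the depth d, and no prefix has more than d nodes, in line with the bounds stated for fanout 1.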
Roadmap

  Fanout \ Depth   Fixed   Arbitrary
  = 1              PTIME   PTIME
  Arbitrary        ?       ?
Containment of XML Queries with Arbitrary Fanout
E.g. d=4 (the depth); m=3 (the maximum fanout)
Canonical answers and complexity:
[Figure: canonical answers for the example, drawn as trees whose nodes are numbered 1 to 3. The expressions for the number and size of canonical answers, given in terms of d, are not recoverable.]
Theorem: Testing containment of XML queries with depth 2 and arbitrary fanout is coNP-hard.
Roadmap

  Fanout \ Depth   Fixed                   Arbitrary
  = 1              PTIME                   PTIME
  Arbitrary        coNP-hard (not tight)   coNP-hard (not tight)
Effect of the Depth on Containment of XML Queries
Insight: Kernel Canonical Answer (KCA)
  The root node has a single child.
  In any subtree, a path pattern is repeated no more than cd times.
  d = query depth
  c = maximum number of path steps in a query block
The size of kernel canonical answers:
  Polynomial in the query size (for fixed nesting depth).
  Exponential in the query depth (for arbitrary depth).
Theorem:
  Testing containment of XML queries with fixed depth is coNP-complete.
  Testing containment of XML queries with arbitrary depth is in coNEXPTIME.
Effect of the Depth on Containment of XML Queries – Cont.
Lemma 3: Let Q and Q’ be two cc-XQueries. Q ⊆ Q’ iff for each KCA of Q:
1. KCA is a canonical answer of Q’.
2. DB’_KCA ⊆ DB_KCA.
The size of a KCA is O((bcd)^d).
The number of KCAs is O(m^((bcd)^d)).
  b = number of query blocks in Q
  m = maximum fanout in Q
Containment Checking in Practice
Analyze element cardinality to reduce the number of canonical answers for containment checking.
Given the query structure and the underlying XML database schema, we can infer the cardinality of elements in the query answer.
Specifically, CAs are pruned according to the following 3 rules:
1. (=1) The schema implies that a certain element occurs exactly once under its parent element.
2. (≥1) The schema implies that a certain element occurs at least once under its parent element.
3. (≤1) The schema implies that a certain element occurs at most once under its parent element.
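One plausible reading of these rules, sketched in Python: once cardinality bounds for answer elements have been inferred, any candidate canonical answer whose child counts fall outside the bounds can be discarded. The `bounds` dictionary, the tuple representation of trees, and the function names are assumptions for this sketch; the slides do not spell out how the inference or the pruning is implemented.

```python
from collections import Counter


def tree(label, *children):
    """Build a tree as (label, children); a representation assumed for this sketch."""
    return (label, tuple(children))


def violates(t, bounds):
    """True if some node in t has a child count outside the given bounds.

    `bounds` maps (parent label, child label) to (low, high), where high
    may be None for "no upper bound". This encoding is hypothetical:
    (=1) is (1, 1), (≥1) is (1, None), and (≤1) is (0, 1).
    """
    label, kids = t
    counts = Counter(k[0] for k in kids)
    for (parent, child), (low, high) in bounds.items():
        if parent == label:
            n = counts[child]
            if n < low or (high is not None and n > high):
                return True
    return any(violates(k, bounds) for k in kids)


def prune(candidates, bounds):
    """Keep only the candidate canonical answers consistent with the bounds."""
    return [c for c in candidates if not violates(c, bounds)]


alice = tree("name", tree("Alice"))
bob = tree("name", tree("Bob"))
candidates = [tree("group"), tree("group", alice),
              tree("group", bob), tree("group", alice, bob)]

# Assume the analysis shows exactly one name under each group (rule =1).
kept = prune(candidates, {("group", "name"): (1, 1)})
print(len(kept))   # 2
```

With the assumed bound, the candidates without a name child and with two name children are discarded, leaving the two single-name answers.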
Containment Checking in Practice – Example
Q:
for $g in /group
where $g/gname/text() = “database”
return
<area>{
  for $p in $g/person return
  <person>
    <name>{$p/text()}</name>
    {for $q in $g/paper
     where $q/author/text() = $p/text()
     return <paper>{$q/title/text()}</paper>}
  </person>
}</area>
Q’:
for $g in /group return
<area>{
  for $p in $g/person return
  <person>
    <name>{$p/text()}</name>
    <group>{$g/gname/text()}</group>
    {for $q in $g/paper
     where $q/author/text() = $p/text()
     return <paper>{$q/title/text()}</paper>}
  </person>
}</area>
Number of canonical answers:
  originally: 71
  after analysis: 2
An Example Query that Returns Tag Variables
for $x in dbGrp return
<result>{
  for $y in $x/proj return
  <group>{
    for $u in $y/member return <name> $u/text() </name>
    for $v in $y/paper return <pub> $v/text() </pub>
  }</group>
}</result>
Deciding Query Containment
Leverage previous results: simulation mapping [Levy and Suciu, PODS’97].
Check query simulation mapping for every canonical answer.
Complexity:
  Simulation mapping can be checked in time polynomial in the query size.
  The complexity of checking containment does not increase.
Other Extensions
  Query type                 Un-nested        Fan-out = 1      Fixed depth
  No tag variables           PTIME            PTIME            coNP-complete
  With tag variables         PTIME            PTIME            coNP-complete
  With unions                coNP-complete    coNP-complete    coNP-complete
  With negation              coNP-complete    coNP-complete    coNP-complete
  With //                    coNP-complete    coNP-complete    coNP-complete
  With equi-join on tags     NP-complete      NP-complete      Π2P-complete
  With arith. comparisons    Π2P-complete     Π2P-complete     Π2P-complete
  General: in coNEXPTIME
Conclusions
Contributions:
  A sound and complete condition for containment of nested XML queries
  Detailed complexity analysis
Future work:
  Evaluate and optimize the containment algorithm with element cardinality analysis
  Answering nested XML queries using views