1
Computing Full Disjunctions
Yaron Kanza
Yehoshua Sagiv
The Selim and Rachel Benin
School of Engineering
and Computer Science
The Hebrew University of Jerusalem
2
A Formal Definition of Full Disjunction
3
Preliminary Notations
• Given
– a set of relations r1, …, rn
– with schemes R1, …, Rn, respectively
• We denote by tij the j-th tuple of ri
• For X ⊆ Ri, we denote by tij[X] the projection of tij on X
• Next, we give some preliminary definitions
4
Scheme Graph
• Two distinct schemes Ri and Rj are connected if Ri ∩ Rj is non-empty
• The scheme graph of R1, …, Rn consists of
– A node for each scheme Ri
– An edge between Ri and Rj if Ri and Rj are connected
[Figure: scheme graph over the relation schemes Movies, Actors, Actors-that-Directed and Acted-in]
5
Connected Relation Schemes
• Relation schemes Ri1, …, Rim are connected if their scheme graph is connected
• Tuples ti1j1, …, timjm, from m distinct relations, are connected if the relation schemes of these relations are connected
[Figure: Movies, Actors and Acted-in are connected relation schemes; Movies and Actors alone are unconnected relation schemes]
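The two definitions above (scheme graph and connected relation schemes) can be sketched in a few lines of Python. This is only an illustration: the attribute names given to Movies, Actors and Acted-in are assumptions, since the slides show the scheme names but not their attributes.

```python
from itertools import combinations

def scheme_graph(schemes):
    """Edges of the scheme graph: pairs of schemes that share an attribute."""
    edges = set()
    for a, b in combinations(schemes, 2):
        if schemes[a] & schemes[b]:          # Ri ∩ Rj is non-empty
            edges.add(frozenset((a, b)))
    return edges

def are_connected(names, schemes):
    """True if the scheme graph of the given schemes is connected."""
    names = list(names)
    if not names:
        return True
    edges = scheme_graph({n: schemes[n] for n in names})
    seen, stack = {names[0]}, [names[0]]
    while stack:                              # depth-first search
        cur = stack.pop()
        for other in names:
            if other not in seen and frozenset((cur, other)) in edges:
                seen.add(other)
                stack.append(other)
    return len(seen) == len(names)

# Hypothetical attributes, chosen only to reproduce the figure.
schemes = {
    "Movies": {"title", "year"},
    "Actors": {"name", "date_of_birth"},
    "Acted-in": {"title", "name"},
}
print(are_connected(["Movies", "Actors", "Acted-in"], schemes))
print(are_connected(["Movies", "Actors"], schemes))
```

With these assumed attributes, Acted-in is what links Movies to Actors, which matches the connected and unconnected cases in the figure.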
6
Join Consistent Tuples
• Two tuples ti1j1 and ti2j2 are join consistent if ti1j1[Ri1 ∩ Ri2] = ti2j2[Ri1 ∩ Ri2]
• m tuples, from m distinct relations, are join consistent if every pair of connected tuples is join consistent
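As a small sketch of the definition above, tuples can be modeled as Python dictionaries from attribute names to values, so that the keys of a tuple play the role of its scheme. The attribute names are assumptions made for the example; the values are taken from the movies example used later in the slides.

```python
from itertools import combinations

def join_consistent(t1, t2):
    """Two tuples agree on every attribute that their schemes share."""
    shared = t1.keys() & t2.keys()            # Ri1 ∩ Ri2
    return all(t1[a] == t2[a] for a in shared)

def all_join_consistent(tuples):
    """Every pair of tuples is join consistent.

    Pairs with no shared attribute pass trivially, so checking all pairs
    gives the same answer as checking only the connected pairs.
    """
    return all(join_consistent(a, b) for a, b in combinations(tuples, 2))

zelig = {"title": "Zelig", "year": 1983}
antz = {"title": "Antz", "year": 1998}
acted = {"title": "Zelig", "name": "Woody Allen"}
actor = {"name": "Woody Allen", "date_of_birth": "1/12/1935"}

print(all_join_consistent([zelig, acted, actor]))
print(join_consistent(antz, acted))
```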
7
Universal Tuple
• A universal tuple u is defined over all the attributes in R1 ∪ … ∪ Rn and consists of null and non-null values
• We denote by û the non-null portion of u
• A universal tuple u is called an integrated tuple if there are m connected and join consistent tuples ti1j1, …, timjm such that û is the natural join of ti1j1, …, timjm
8
Maximal Universal Tuple
• A universal tuple u subsumes a universal tuple v if u is equal to v on all the non-null attributes of v
(i.e., u can be created from v by replacing some null values with non-null values)
• In a given set D, a tuple u is maximal if there is no tuple in D, other than u, that subsumes u
9
A Full Disjunction
• The full disjunction of r1, …, rn is the set of all maximal integrated tuples that can be generated from m tuples of r1, …, rn
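To make the definition concrete, the following brute-force sketch enumerates every connected, join consistent combination of tuples, pads the natural join with nulls (`None`), and keeps the maximal results. It is exponential and is meant only to illustrate the definition, not the algorithm of this talk. The relation contents and attribute names are assumptions.

```python
from itertools import combinations, product

def connected(rows):
    """The tuples' schemes (their key sets) form a connected scheme graph."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(rows)):
            if j not in seen and rows[i].keys() & rows[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(rows)

def consistent(rows):
    """Every pair of tuples agrees on its shared attributes."""
    return all(a[k] == b[k]
               for a, b in combinations(rows, 2)
               for k in a.keys() & b.keys())

def subsumes(u, v):
    """u equals v on all the non-null attributes of v."""
    return all(x == y for x, y in zip(u, v) if y is not None)

def full_disjunction(relations):
    """relations: dict name -> list of tuples (dict attribute -> value)."""
    attrs = sorted({a for rows in relations.values() for t in rows for a in t})
    names = list(relations)
    integrated = set()
    for m in range(1, len(names) + 1):
        for subset in combinations(names, m):      # m distinct relations
            for rows in product(*(relations[n] for n in subset)):
                if connected(rows) and consistent(rows):
                    joined = {}
                    for t in rows:
                        joined.update(t)            # natural join
                    integrated.add(tuple(joined.get(a) for a in attrs))
    return attrs, {u for u in integrated
                   if not any(v != u and subsumes(v, u) for v in integrated)}

relations = {
    "Movies": [{"title": "Zelig", "year": 1983}, {"title": "Antz", "year": 1998}],
    "Actors": [{"name": "Woody Allen"}],
    "Acted-in": [{"title": "Zelig", "name": "Woody Allen"}],
}
attrs, result = full_disjunction(relations)
print(attrs)
for row in result:
    print(row)
```

In this toy instance the Antz tuple joins with nothing, so it survives on its own with a null name, while every partial combination involving Zelig is subsumed by the three-way join.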
10
Acyclic Scheme
• Given a set of schemes R1, …, Rn, their scheme hypergraph consists of
– A node for each attribute that appears in some Ri
– For each Ri (1 ≤ i ≤ n), a hyperedge that includes the attributes of Ri
• α-acyclic scheme hypergraph:
– All the hyperedges can be removed by a sequence of ear removals
• γ-acyclic scheme hypergraph:
– The Bachman diagram of the scheme hypergraph is acyclic
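One common way to test α-acyclicity is the GYO-style reduction, which is used here as a stand-in for the ear-removal sequence mentioned above (the slides do not spell out the procedure, so this choice is an assumption). It repeatedly drops attributes that occur in a single hyperedge and hyperedges that are empty or contained in another hyperedge; the hypergraph is α-acyclic exactly when nothing is left.

```python
def is_alpha_acyclic(schemes):
    """GYO-style reduction over a list of attribute sets (hyperedges)."""
    edges = [set(s) for s in schemes]
    changed = True
    while changed:
        changed = False
        # Rule 1: remove attributes that occur in exactly one hyperedge.
        for e in edges:
            lonely = {a for a in e if sum(a in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: remove one hyperedge that is empty or contained in another.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges

path = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
triangle = [{"A", "B"}, {"B", "C"}, {"A", "C"}]
covered_triangle = triangle + [{"A", "B", "C"}]

print(is_alpha_acyclic(path))
print(is_alpha_acyclic(triangle))
print(is_alpha_acyclic(covered_triangle))
```

The last example shows a known quirk of α-acyclicity: adding a hyperedge that covers the triangle makes the hypergraph acyclic again.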
11
12
Computing Full Disjunctions
13
Product Graph
• Given a query Q and a database D, the product of Q and D is a graph such that
– The nodes are pairs of a node of Q and a node of D
– There is an edge labeled l from (v, o) to (v', o') if Q has an edge labeled l from v to v' and D has an edge labeled l from o to o'
– The root is the pair of the root of Q and the root of D
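The product construction can be sketched as follows, with both graphs given as sets of labeled edges `(source, label, target)`. The tiny query and database below are assumptions, loosely modeled on the movies example, and only the edges of the product are materialized.

```python
def product_graph(q_edges, d_edges, q_root, d_root):
    """Return the root and the labeled edges of the product of Q and D."""
    edges = {((qs, ds), label, (qt, dt))
             for (qs, label, qt) in q_edges
             for (ds, d_label, dt) in d_edges
             if label == d_label}              # same label in Q and in D
    return (q_root, d_root), edges

# Hypothetical fragment of a query (variables) and a database (object ids).
q_edges = {("v1", "movie", "v2"), ("v1", "actor", "v3"), ("v2", "title", "w1")}
d_edges = {(1, "movie", 2), (1, "movie", 3), (1, "actor", 4),
           (2, "title", 5), (3, "title", 8)}

root, p_edges = product_graph(q_edges, d_edges, "v1", 1)
print(root)
for edge in sorted(p_edges, key=str):
    print(edge)
```

Note that the product also contains pairs such as `("v2", 2)` and `("v2", 3)` for the same variable; choosing between them is what the matching graphs defined next are about.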
14
[Figure: an example database graph (nodes 1 to 11, with edges labeled movie, actor, title, language, year, director, name, date of birth and filmography item, and values such as Zelig, Antz, 1983, 1998, English, Woody Allen and 1/12/1935) and an example query graph (variables v1, v2, v3, w1, w2, w3, w4, with edges labeled movie, actor, title, language, director, name, date of birth and filmography item)]
The product of the query and the database is the next graph
15
[Figure: the product graph, with nodes (V1, 1), (V2, 2), (V2, 3), (V3, 4), (w1, 5), (w2, 6), (w1, 8), (w3, 10), (w4, 11) and edges labeled movie, actor, title, language, director, name, date of birth and filmography item]
There are additional nodes that are not reachable from the root
16
Matching as a Subgraph of the Product Graph
• For a subgraph G of the product graph
1. G has no repeated variables
2. G contains the root
3. Each node in G is reachable from the root
4. G preserves the constraints (edges) of the query
• Conditions 1–3: OR-matching graph
• Conditions 1–4: weak-matching graph
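The four conditions can be checked mechanically. The sketch below assumes one particular reading of condition 4: whenever both endpoints of a query edge are assigned in G, the database must contain the corresponding edge. The query and database fragments are hypothetical and were chosen so that the database has a director edge from object 2 but not from object 3.

```python
def is_or_matching_graph(nodes, edges, root):
    variables = [var for var, _ in nodes]
    if len(variables) != len(set(variables)):    # 1. no repeated variables
        return False
    if root not in nodes:                        # 2. contains the root
        return False
    seen, stack = {root}, [root]                 # 3. reachability from root
    while stack:
        cur = stack.pop()
        for src, _, dst in edges:
            if src == cur and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen == set(nodes)

def is_weak_matching_graph(nodes, edges, root, q_edges, d_edges):
    if not is_or_matching_graph(nodes, edges, root):
        return False
    assigned = dict(nodes)                       # variable -> database object
    return all((assigned[s], label, assigned[t]) in d_edges   # 4. constraints
               for s, label, t in q_edges
               if s in assigned and t in assigned)

q_edges = {("v1", "movie", "v2"), ("v1", "actor", "v3"), ("v2", "director", "v3")}
d_edges = {(1, "movie", 2), (1, "movie", 3), (1, "actor", 4), (2, "director", 4)}
root = ("v1", 1)

def graph_through(movie_obj):
    """Subgraph that maps v2 to the given movie object (helper for the demo)."""
    nodes = {root, ("v2", movie_obj), ("v3", 4)}
    edges = {(root, "movie", ("v2", movie_obj)), (root, "actor", ("v3", 4))}
    return nodes, edges

for obj in (2, 3):
    nodes, edges = graph_through(obj)
    print(obj, is_or_matching_graph(nodes, edges, root),
          is_weak_matching_graph(nodes, edges, root, q_edges, d_edges))
```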
17
[Figure: the product graph and, next to it, the subgraph with nodes (V1, 1), (V2, 2), (w1, 5), (w2, 6), (V3, 4), (w3, 10), (w4, 11)]
An OR-matching graph. It is also a weak-matching graph
18
[Figure: the product graph and, next to it, the subgraph with nodes (V1, 1), (V2, 3), (w1, 8), (V3, 4), (w3, 10), (w4, 11)]
Another OR-matching graph
It is not a weak-matching graph since the “director” edge of the query is not preserved
19
Matching Graphs
• Each OR-matching graph represents an OR-matching (and each weak-matching graph represents a weak matching)
• An OR-matching can be represented by many OR-matching graphs, but all these graphs have the same set of nodes and differ only in their edges (and the same is true for weak matchings and weak-matching graphs)
20
Intuition
• For DAG queries, matching graphs are constructed by adding edges according to the query constraints
– The order of the extensions is simply given by a topological sort of the query nodes
• For cyclic queries, neither a simple traversal over the query nor a simple traversal over the database will work
– Instead, we use a stratum traversal over the matching graph
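For the DAG case, an extension order can be obtained with Python's standard `graphlib` module (Python 3.9 or later). The sketch below orders the query edges by the topological position of their source node, so that an edge is considered only after the edges leading into its source. The query fragment is hypothetical and acyclic; the query used in the slides also has a filmography item edge, which makes it cyclic.

```python
from graphlib import TopologicalSorter

def dag_edge_order(q_edges):
    """Order the edges of an acyclic query by a topological sort of its nodes."""
    preds = {}
    for src, _, dst in q_edges:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)    # node -> its predecessors
    # static_order() raises graphlib.CycleError if the query is cyclic.
    position = {v: i for i, v in enumerate(TopologicalSorter(preds).static_order())}
    return sorted(q_edges, key=lambda e: position[e[0]])

q_edges = [("v2", "title", "w1"), ("v1", "movie", "v2"),
           ("v1", "actor", "v3"), ("v2", "director", "v3")]
order = dag_edge_order(q_edges)
for edge in order:
    print(edge)
```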
21
[Figure: the product graph with its edges divided into strata: Stratum 1, Stratum 2, Stratum 3, …]
Dividing the edges into strata
22
Stratum Traversal
• A stratum traversal is an ordered list that
– Starts with the edges of stratum 1
– Followed by the edges of stratum 2
– …
– Followed by the edges of stratum n
– …
• The order of the edges in each stratum is unimportant
• There can be multiple occurrences of the same edge in different strata
• We only look at the first n strata, where n is the size of the query
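The slides do not define a stratum formally, so the sketch below rests on an assumption: stratum k holds the product-graph edges whose source can be reached from the root by a path of exactly k−1 edges. Under that reading, an edge on a cycle naturally shows up in several strata, and the traversal is cut off after n strata.

```python
def stratum_traversal(root, edges, n):
    """List the edges stratum by stratum, for the first n strata.

    Assumed definition: stratum 1 holds the edges leaving the root, and
    stratum k+1 holds the edges leaving the targets of stratum k.
    """
    frontier = {root}
    order = []
    for _ in range(n):
        stratum = [e for e in edges if e[0] in frontier]
        if not stratum:
            break
        order.extend(stratum)
        frontier = {dst for _, _, dst in stratum}
    return order

# A small cyclic graph: r -> x -> y -> x
edges = [("r", "a", "x"), ("x", "b", "y"), ("y", "c", "x")]
traversal = stratum_traversal("r", edges, 4)
for edge in traversal:
    print(edge)
```

In this example the edge from x to y appears in stratum 2 and again in stratum 4, which is the kind of repetition the slide refers to.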
23
Computing the OR-Matching Graphs
• A set of OR-matching graphs is created
• We extend each OR-matching graph in the set by adding edges according to the stratum traversal
• Initially, the set includes a single graph that consists only of the root of the product graph
• In each extension step, we try to add the current edge to the graphs that were produced so far, and this may cause
– The creation of a new graph that replaces the extended graph
– The creation of a new graph that is added to the set of graphs in addition to the existing graphs
– No change to the set of graphs
24
Adding an Edge
• After each addition of an edge, subsumed matching graphs are removed, to avoid an exponential blowup
• There are six cases that should be handled
• The cases of extending a graph by an edge are described next
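The removal of subsumed matching graphs could look like the sketch below. It assumes that a graph is a pair `(nodes, edges)` of frozensets and that one graph subsumes another when its node set strictly contains the other's. This comparison by node sets is an assumption, motivated by the earlier remark that graphs representing the same matching differ only in their edges. Graphs with identical node sets are treated as representing the same matching, and only the first is kept.

```python
def remove_subsumed(graphs):
    """Keep only the graphs whose node set is not covered by another graph."""
    kept = []
    for i, (nodes, edges) in enumerate(graphs):
        dominated = any(
            nodes < other_nodes                      # strictly smaller node set
            or (nodes == other_nodes and j < i)      # same matching, seen earlier
            for j, (other_nodes, _) in enumerate(graphs) if j != i)
        if not dominated:
            kept.append((nodes, edges))
    return kept

root = ("V1", 1)
big = (frozenset({root, ("V2", 2), ("V3", 4)}),
       frozenset({(root, "movie", ("V2", 2)), (root, "actor", ("V3", 4))}))
small = (frozenset({root, ("V3", 4)}),
         frozenset({(root, "actor", ("V3", 4))}))
other = (frozenset({root, ("V2", 3)}),
         frozenset({(root, "movie", ("V2", 3))}))

print(len(remove_subsumed([big, small, other])))
```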
25
Case 1: no change is made
[Figure: a graph with nodes (V1, O1), (V2, O2), (V3, O4) and edges labeled movie and actor; the added edge is labeled title and goes from (V2, O2) to (V1, O3)]
The target of the added edge is a pair that includes the root of Q without the root of D
Case 2: no change is made
[Figure: the same graph; the added edge is labeled movie and goes from (V1, O1) to (V2, O2)]
The graph already includes the added edge
26
Case 3: no change is made
[Figure: a graph with nodes (V1, O1), (V2, O2), (V3, O4) and edges labeled movie and actor; the added edge is labeled title and goes from (V2, O3) to (W1, O8)]
The graph does not include the source of the added edge
Case 4: the edge is added to the graph and the new graph replaces the existing graph
[Figure: the same graph; the added edge is labeled title and goes from (V2, O2) to (W1, O5); the new graph also contains (W1, O5) and the title edge]
The graph includes the source of the added edge and no node with the variable of the target
27
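Case 5: the edge is added to the graph and the new graph replaces the existing graph
[Figure: a graph with nodes (V1, O1), (V2, O2), (V3, O4), (W1, O3) and edges labeled movie, actor and title; the added edge is labeled a.k.a and goes from (V2, O2) to (W1, O3); the new graph also contains the a.k.a edge]
The graph already includes the source and the target of the added edge but does not include the added edge itself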
28
Case 6: a new graph is created and added to the set, without replacing the existing graph
[Figure: a graph with nodes (V1, O1), (V2, O2), (V3, O4), (W1, O3) and edges labeled movie, actor and title; the added edge is labeled film and goes from (V3, O4) to (V2, O4); (V2, O2) and (V2, O4) are different nodes with the same variable V2; the new graph contains (V1, O1), (V3, O4), (V2, O4) and the film edge]
The graph includes the source of the added edge but also includes a node with the same variable as the variable in the target of the added edge
(V2, O2) is replaced by (V2, O4), and nodes that are not reachable from the root are erased
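The six cases can be collected into one extension step, sketched below under some assumptions: a graph is a pair `(nodes, edges)` of frozensets, nodes are `(variable, object)` pairs, and in case 6 the edges incident to the replaced node are dropped before unreachable nodes are erased. The function returns the graphs that take the place of the input graph.

```python
def prune(root, nodes, edges):
    """Erase nodes (and their edges) that are not reachable from the root."""
    seen, stack = {root}, [root]
    while stack:
        cur = stack.pop()
        for src, _, dst in edges:
            if src == cur and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return (frozenset(n for n in nodes if n in seen),
            frozenset(e for e in edges if e[0] in seen and e[2] in seen))

def add_edge(graph, edge, root):
    nodes, edges = graph
    src, _, dst = edge
    q_root, d_root = root
    if dst[0] == q_root and dst[1] != d_root:   # case 1: root of Q, wrong object
        return [graph]
    if edge in edges:                           # case 2: edge already present
        return [graph]
    if src not in nodes:                        # case 3: source is missing
        return [graph]
    clash = [n for n in nodes if n[0] == dst[0] and n != dst]
    if not clash:                               # cases 4 and 5: extend in place
        return [(nodes | {dst}, edges | {edge})]
    old = clash[0]                              # case 6: same variable, other object
    kept = {e for e in edges if old not in (e[0], e[2])}
    new_graph = prune(root, (nodes - {old}) | {dst}, kept | {edge})
    return [graph, new_graph]

root = ("V1", "O1")
movie = (root, "movie", ("V2", "O2"))
actor = (root, "actor", ("V3", "O4"))
title = (("V2", "O2"), "title", ("W1", "O3"))
graph = (frozenset({root, ("V2", "O2"), ("V3", "O4"), ("W1", "O3")}),
         frozenset({movie, actor, title}))

film = (("V3", "O4"), "film", ("V2", "O4"))
for nodes, _ in add_edge(graph, film, root):
    print(sorted(nodes))
```

On the case 6 example, the second graph returned keeps (V1, O1) and (V3, O4), gains (V2, O4), and loses (W1, O3), which was reachable only through the replaced node.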
29
Applying the algorithm to the movies example
[Figure: steps 1 to 3. Starting from the root (V1, 1), the movie edges from (V1, 1) to (V2, 2) and to (V2, 3) are added, which gives two graphs]
30
[Figure: steps 4 and 5. Step 4 adds the actor edge from (V1, 1) to (V3, 4) to both graphs; step 5 adds the title edge from (V2, 2) to (w1, 5)]
31
[Figure: steps 6 and 7. Step 6 adds the language edge from (V2, 2) to (w2, 6); step 7 adds a title edge out of (V2, 3)]
32
[Figure: steps 8 and 9. Step 8 adds the name edge from (V3, 4) to (w3, 10); step 9 adds the date of birth edge from (V3, 4) to (w4, 11); both graphs are extended]
33
[Figure: step 10. The director edge from (V2, 2) to (V3, 4) is added to the graph that contains (V2, 2)]
34
[Figure: step 11. The filmography item edge from (V3, 4) to (V2, 2) is added; a smaller graph with nodes (V1, 1), (V2, 2), (V3, 4), (w3, 10), (w4, 11) is also created]
Subsumed by the left matching graph
35
[Figure: step 12. The filmography item edge from (V3, 4) to (V2, 3) is added; a smaller graph with nodes (V1, 1), (V2, 3), (V3, 4), (w3, 10), (w4, 11) is also created]
Subsumed by the right matching graph
36
[Figure: “The Product Graph” shown next to “The OR-Matchings”, the two OR-matching graphs that result from the run: one through (V2, 2), which includes the director edge, and one through (V2, 3)]
37
Computing Maximal Weak-Matching Graphs
• In order to compute maximal weak-matching graphs, the same algorithm is used with a slight change
• After each addition of an edge, the nodes that cause a query constraint not to be preserved are removed (along with the edges that contain these nodes)
• Nodes that are no longer reachable from the root as a result of this deletion are removed as well
38
The Algorithm Computes Weak-Queries in Polynomial Time
Theorem. Given a query Q and a database D, the revised algorithm terminates with the set of maximal weak-matching graphs of Q w.r.t. D. The runtime of the algorithm is O(q³dm²), where q is the size of the query, d is the size of the database and m is the size of the result