
Page 1: Computing FDs


Computing Full Disjunctions

Yaron Kanza
Yehoshua Sagiv

The Selim and Rachel Benin School of Engineering and Computer Science
The Hebrew University of Jerusalem

Page 2: Computing FDs

A Formal Definition of Full Disjunction

Page 3: Computing FDs

Preliminary Notations

• Given
– a set of relations r1, …, rn
– with schemes R1, …, Rn, respectively
• We denote by tij the j-th tuple of ri
• For X ⊆ Ri, we denote by tij[X] the projection of tij on X
• Next, we give some preliminary definitions

Page 4: Computing FDs

Scheme Graph

• Two distinct schemes Ri and Rj are connected if Ri ∩ Rj is non-empty
• The scheme graph of R1, …, Rn consists of
– A node for each scheme Ri
– An edge between Ri and Rj if Ri and Rj are connected

[Figure: a scheme graph over the schemes Movies, Actors, Actors-that-Directed, and Acted-in]
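The scheme graph definition can be sketched in a few lines of Python. This is only an illustration: the attribute sets below are hypothetical (the slide names the relations but not their attributes), and `scheme_graph` is a name introduced here.

```python
from itertools import combinations

def scheme_graph(schemes):
    """Return the edges of the scheme graph.

    `schemes` maps a relation name to the set of its attributes.
    Two distinct schemes are connected when they share an attribute.
    """
    edges = set()
    for a, b in combinations(sorted(schemes), 2):
        if schemes[a] & schemes[b]:
            edges.add((a, b))
    return edges

# Hypothetical attribute sets for the relations named on the slide.
schemes = {
    "Movies": {"title", "year"},
    "Actors": {"name", "birth"},
    "Acted-in": {"title", "name"},
    "Actors-that-Directed": {"name", "title"},
}
edges = scheme_graph(schemes)
```

With these assumed attributes, Movies and Actors share no attribute, so the graph has no edge between them; they are linked only through Acted-in or Actors-that-Directed.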

Page 5: Computing FDs

Connected Relation Schemes

• Relation schemes Ri1, …, Rim are connected if their scheme graph is connected
• Tuples ti1j1, …, timjm, from m distinct relations, are connected if the relation schemes of these relations are connected

[Figure: Movies, Actors, and Acted-in form connected relation schemes; Movies and Actors alone form unconnected relation schemes]
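Connectivity of a set of schemes can be checked with a breadth-first search over the scheme graph. The sketch below assumes schemes are given as attribute sets; the attributes are hypothetical and `are_connected` is a name introduced for illustration.

```python
from collections import deque

def are_connected(schemes):
    """True if the scheme graph of `schemes` (name -> attribute set) is connected."""
    names = list(schemes)
    if not names:
        return True
    seen = {names[0]}
    queue = deque([names[0]])
    while queue:
        cur = queue.popleft()
        for other in names:
            # An edge exists when the two schemes share an attribute.
            if other not in seen and schemes[cur] & schemes[other]:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(names)

# Hypothetical attribute sets mirroring the two figures on the slide.
with_acted_in = {
    "Movies": {"title", "year"},
    "Actors": {"name", "birth"},
    "Acted-in": {"title", "name"},
}
without_acted_in = {
    "Movies": {"title", "year"},
    "Actors": {"name", "birth"},
}
```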

Page 6: Computing FDs

Join Consistent Tuples

• Two tuples ti1j1 and ti2j2 are join consistent if ti1j1[Ri1 ∩ Ri2] = ti2j2[Ri1 ∩ Ri2]
• m tuples, from m distinct relations, are join consistent if every pair of connected tuples is join consistent
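A minimal sketch of the join-consistency test, assuming each tuple is represented as a Python dict from attribute names to non-null values (a representation chosen here, not one given in the slides). Pairs of tuples with no shared attribute pass trivially, so checking all pairs gives the same answer as checking only connected pairs.

```python
from itertools import combinations

def pair_join_consistent(t1, t2):
    """Two tuples agree on every attribute their schemes share."""
    shared = t1.keys() & t2.keys()
    return all(t1[a] == t2[a] for a in shared)

def join_consistent(tuples):
    """Every pair among the given tuples is join consistent."""
    return all(pair_join_consistent(a, b) for a, b in combinations(tuples, 2))

# Values taken from the movies example; the attribute names are assumptions.
zelig = {"title": "Zelig", "year": 1983}
antz = {"title": "Antz", "year": 1998}
acted = {"title": "Zelig", "name": "Woody Allen"}
actor = {"name": "Woody Allen", "birth": "1/12/1935"}
```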

Page 7: Computing FDs

Universal Tuple

• A universal tuple u is defined over all the attributes in R1 ∪ … ∪ Rn and consists of null and non-null values
• We denote by û the non-null portion of u
• A universal tuple is called an integrated tuple if there are m connected and join consistent tuples ti1j1, …, timjm such that û is the natural join of ti1j1, …, timjm

Page 8: Computing FDs


Maximal Universal Tuple

• A universal tuple u subsumes a universal tuple v if u is equal to v on all the non-null attributes of v

(i.e., u can be created from v by replacing some null values with non-null values)

• In a given set D, a tuple u is maximal if there is no tuple in D, other than u, that subsumes u
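Subsumption and maximality translate directly into code. The sketch below assumes universal tuples are dicts over the same attribute set, with `None` standing for null; both function names are introduced here for illustration.

```python
def subsumes(u, v):
    """u subsumes v if u equals v on all the non-null attributes of v."""
    return all(u.get(a) == val for a, val in v.items() if val is not None)

def maximal(tuples):
    """Keep the tuples that no other tuple in the collection subsumes."""
    return [u for u in tuples
            if not any(v != u and subsumes(v, u) for v in tuples)]

full = {"title": "Zelig", "year": 1983, "name": "Woody Allen"}
partial = {"title": "Zelig", "year": 1983, "name": None}
other = {"title": "Antz", "year": 1998, "name": None}
result = maximal([full, partial, other])
```

Here `partial` is dropped because `full` can be obtained from it by replacing a null with a value, while `other` survives because nothing else agrees with it on its non-null attributes.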

Page 9: Computing FDs


A Full Disjunction

• The full disjunction of r1, …, rn is the set of all maximal integrated tuples that can be generated from m tuples of r1, …, rn
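The definitions so far can be combined into a brute-force computation of the full disjunction. This is a sketch that follows the definition literally and takes exponential time; it is not the algorithm presented later in the deck. Tuples are assumed to be dicts without nulls, `None` marks a null in a universal tuple, and the "Jane Doe" row is hypothetical data added to show a dangling tuple.

```python
from itertools import combinations, product

def connected(schemes):
    """True if the scheme graph of the given attribute sets is connected."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(schemes)):
            if j not in seen and schemes[i] & schemes[j]:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(schemes)

def join_consistent(tuples):
    return all(a[k] == b[k]
               for a, b in combinations(tuples, 2)
               for k in a.keys() & b.keys())

def subsumes(u, v):
    return all(u[k] == val for k, val in v.items() if val is not None)

def full_disjunction(relations):
    """`relations` is a list of relations, each a list of dict tuples."""
    attrs = sorted({k for rel in relations for t in rel for k in t})
    integrated = []
    n = len(relations)
    for m in range(1, n + 1):
        for chosen in combinations(range(n), m):      # m distinct relations
            for tuples in product(*(relations[i] for i in chosen)):
                if not connected([set(t) for t in tuples]):
                    continue
                if not join_consistent(tuples):
                    continue
                u = dict.fromkeys(attrs)              # start with all nulls
                for t in tuples:
                    u.update(t)                       # natural join of the tuples
                if u not in integrated:
                    integrated.append(u)
    # Keep only the maximal integrated tuples.
    return [u for u in integrated
            if not any(v != u and subsumes(v, u) for v in integrated)]

movies = [{"title": "Zelig", "year": 1983}, {"title": "Antz", "year": 1998}]
acted_in = [{"title": "Zelig", "name": "Woody Allen"}]
actors = [{"name": "Woody Allen", "birth": "1/12/1935"},
          {"name": "Jane Doe", "birth": "2/2/1950"}]  # hypothetical row
result = full_disjunction([movies, acted_in, actors])
```

Unlike a natural join, the result keeps the Antz movie and the unmatched actor, each padded with nulls.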

Page 10: Computing FDs

Acyclic Scheme

• Given a set of schemes R1, …, Rn, their scheme hypergraph consists of
– A node for each attribute that appears in some Ri
– For each Ri (1 ≤ i ≤ n), a hyperedge that includes the attributes of Ri
• α-acyclic scheme hypergraph:
– All the hyperedges can be removed by a sequence of ear removals
• γ-acyclic scheme hypergraph:
– The Bachman diagram of the scheme hypergraph is acyclic
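One standard way to carry out ear removals is the GYO reduction: repeatedly delete attributes that occur in only one hyperedge, and delete hyperedges that are empty or contained in another hyperedge. The slides do not spell out the procedure, so the sketch below is an assumption about how the test could be implemented, with `is_alpha_acyclic` as an illustrative name.

```python
def is_alpha_acyclic(hyperedges):
    """GYO reduction: the hypergraph is alpha-acyclic iff it reduces to nothing."""
    edges = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop attributes that occur in exactly one hyperedge.
        for e in edges:
            lonely = {a for a in e if sum(a in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: drop one hyperedge that is empty or contained in another.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges

chain = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
triangle = [{"A", "B"}, {"B", "C"}, {"A", "C"}]
covered_triangle = triangle + [{"A", "B", "C"}]
```

The last example shows a known property of α-acyclicity: adding a hyperedge that covers a cycle makes the hypergraph α-acyclic.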

Page 11: Computing FDs


Page 12: Computing FDs


Computing Full Disjunctions

Page 13: Computing FDs

Product Graph

• Given a query Q and a database D, the product of Q and D is a graph such that
– The nodes are pairs of a node of Q and a node of D
– There is an edge from (v, o) to (v', o') if Q has an edge from v to v' and D has an edge from o to o' with the same label
– The root is the pair of the root of Q and the root of D
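A small sketch of the product construction, assuming both the query and the database are given as sets of labeled edges `(source, label, target)`. The example data is a hypothetical fragment loosely modeled on the movies example, and the function names are introduced here.

```python
def product_graph(q_edges, q_root, d_edges, d_root):
    """Return the root and the labeled edges of the product of Q and D."""
    edges = {((v, o), label, (v2, o2))
             for (v, label, v2) in q_edges
             for (o, d_label, o2) in d_edges
             if label == d_label}
    return (q_root, d_root), edges

def reachable(root, edges):
    """Nodes of the product graph that can be reached from the root."""
    seen, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for src, _, tgt in edges:
            if src == node and tgt not in seen:
                seen.add(tgt)
                stack.append(tgt)
    return seen

# Hypothetical fragment of a query and a database.
q_edges = {("v1", "movie", "v2"), ("v1", "actor", "v3"), ("v2", "title", "w1")}
d_edges = {(1, "movie", 2), (1, "movie", 3), (1, "actor", 4),
           (2, "title", 5), (3, "title", 8)}
root, p_edges = product_graph(q_edges, "v1", d_edges, 1)
nodes = reachable(root, p_edges)
```

Only the part reachable from the root matters for the matchings discussed next, which is why the sketch pairs the construction with a reachability pass.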

Page 14: Computing FDs

[Figure: an example database graph (nodes 1–11, with values such as Zelig, 1983, Antz, 1998, English, Woody Allen, and 1/12/1935) and a query graph (variables v1–v3 and w1–w4); the edges are labeled movie, actor, title, language, year, director, name, date of birth, and filmography item]

The product of the query and the database is the next graph

Page 15: Computing FDs

[Figure: the part of the product graph that is reachable from the root (V1, 1), with nodes (V2, 2), (V2, 3), (V3, 4), (w1, 5), (w2, 6), (w1, 8), (w3, 10), (w4, 11) and edges labeled movie, actor, title, language, director, name, date of birth, and filmography item]

There are additional nodes that are not reachable from the root

Page 16: Computing FDs

Matching as a Subgraph of the Product Graph

• For a subgraph G of the product graph, consider the following conditions
1. G has no repeated variables
2. G contains the root
3. Each node in G is reachable from the root
4. G preserves the constraints (edges) of the query
• If Conditions 1 – 3 hold, G is an OR-matching graph
• If Conditions 1 – 4 hold, G is a weak-matching graph
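The four conditions can be checked mechanically. The sketch below assumes G is given as a set of (variable, object) nodes plus a set of labeled edges. It reads condition 4 as "for every query edge whose two variables both appear in G, the database has the corresponding edge between the matched objects"; that reading is an assumption based on the "director" example in the deck, not a definition stated on the slide.

```python
def is_or_matching_graph(nodes, edges, root):
    variables = [v for v, _ in nodes]
    if len(variables) != len(set(variables)):      # condition 1
        return False
    if root not in nodes:                          # condition 2
        return False
    seen, stack = {root}, [root]                   # condition 3
    while stack:
        cur = stack.pop()
        for src, _, tgt in edges:
            if src == cur and tgt in nodes and tgt not in seen:
                seen.add(tgt)
                stack.append(tgt)
    return seen == set(nodes)

def is_weak_matching_graph(nodes, edges, root, q_edges, d_edges):
    if not is_or_matching_graph(nodes, edges, root):
        return False
    binding = dict(nodes)                          # variable -> database object
    return all((binding[v], label, binding[v2]) in d_edges   # condition 4 (assumed reading)
               for v, label, v2 in q_edges
               if v in binding and v2 in binding)

# Hypothetical fragment: the database has a director edge from 2 to 4 only.
q_edges = {("v1", "movie", "v2"), ("v1", "actor", "v3"), ("v2", "director", "v3")}
d_edges = {(1, "movie", 2), (1, "movie", 3), (1, "actor", 4), (2, "director", 4)}
root = ("v1", 1)
g1_nodes = {("v1", 1), ("v2", 2), ("v3", 4)}
g1_edges = {(("v1", 1), "movie", ("v2", 2)), (("v1", 1), "actor", ("v3", 4))}
g2_nodes = {("v1", 1), ("v2", 3), ("v3", 4)}
g2_edges = {(("v1", 1), "movie", ("v2", 3)), (("v1", 1), "actor", ("v3", 4))}
```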

Page 17: Computing FDs

[Figure: the product graph, and a subgraph of it with nodes (V1, 1), (V2, 2), (V3, 4), (w1, 5), (w2, 6), (w3, 10), (w4, 11)]

An OR-matching graph

It is also a weak-matching graph

Page 18: Computing FDs

[Figure: the product graph, and a subgraph of it with nodes (V1, 1), (V2, 3), (V3, 4), (w1, 8), (w3, 10), (w4, 11)]

Another OR-matching graph

It is not a weak-matching graph since the “director” edge of the query is not preserved

Page 19: Computing FDs

Matching Graphs

Each OR-matching graph represents an OR-matching (and each weak-matching graph represents a weak matching)

An OR-matching can be represented by many OR-matching graphs, but all these graphs have the same set of nodes and differ only in their edges (and the same is true for weak matchings and weak-matching graphs)

Page 20: Computing FDs

Intuition

• For DAG queries, matching graphs are constructed by adding edges according to the query constraints
– The order of the extensions is simply determined by a topological sort of the query nodes
• For cyclic queries, neither a simple traversal over the query nor a simple traversal over the database will work
– Instead, we use a stratum traversal over the matching graph
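For the DAG case, an extension order can be obtained with the standard library's `graphlib` module (Python 3.9 or later). The query below is a hypothetical DAG; each key maps a query node to the set of its predecessors.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG query: node -> set of predecessor nodes.
predecessors = {
    "v2": {"v1"},
    "v3": {"v1"},
    "w1": {"v2"},
    "w3": {"v3"},
}
order = list(TopologicalSorter(predecessors).static_order())
position = {node: i for i, node in enumerate(order)}
```

Any order produced this way places a query node after all the nodes that have edges into it. A cyclic query has no such order, and `static_order` raises `graphlib.CycleError` for it.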

Page 21: Computing FDs

Dividing the Edges into Strata

[Figure: the product graph with its edges divided into Stratum 1, Stratum 2, and Stratum 3]

Page 22: Computing FDs

Stratum Traversal

• A stratum traversal is an ordered list that
– Starts with the edges of stratum 1
– Followed by the edges of stratum 2
– …
– Followed by the edges of stratum n
– …

The order of the edges in each stratum is unimportant

There can be multiple occurrences of the same edge in different strata

We only look at the first n strata, where n is the size of the query
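The slides do not define a stratum formally. The sketch below assumes that stratum 1 holds the edges leaving the root and that stratum k+1 holds the edges leaving the targets of the edges in stratum k. Under this reading an edge on a cycle appears in several strata, which matches the remark about multiple occurrences. The function name and the tiny cyclic graph are illustrative.

```python
def stratum_traversal(root, edges, n):
    """Return the first n strata as a list of edge lists (assumed definition)."""
    traversal = []
    frontier = {root}
    for _ in range(n):
        # Edges whose source was reached by the previous stratum.
        stratum = sorted(e for e in edges if e[0] in frontier)
        if not stratum:
            break
        traversal.append(stratum)
        frontier = {tgt for _, _, tgt in stratum}
    return traversal

# A two-node cycle: the same edge shows up again in stratum 3.
edges = {("a", "x", "b"), ("b", "y", "a")}
strata = stratum_traversal("a", edges, 3)
```

Sorting inside a stratum is only for reproducible output, since the order within a stratum does not matter.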

Page 23: Computing FDs

Computing the OR-Matching Graphs

• A set of OR-matching graphs is created
• We extend each OR-matching graph in the set by adding edges according to the stratum traversal
• Initially, the set includes a single graph that consists only of the root of the product graph
• In each extension step, we try to add the current edge to the graphs that were produced so far, and this may cause
– The creation of a new graph that replaces the extended graph
– The creation of a new graph that is added to the set of graphs in addition to the existing graphs
– No change to the set of graphs

Page 24: Computing FDs

Adding an Edge

• After each addition of an edge, subsumed matching graphs are removed, to avoid exponential blowup
• There are six cases that should be handled
• The cases of extending a graph by an edge are described next

Page 25: Computing FDs

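Case 1: The target of the added edge is a node whose pair includes the root of Q without the root of D

[Figure: a matching graph with nodes (V1, O1), (V2, O2), (V3, O4) and edges labeled movie and actor; the added edge is labeled title and connects (V2, O2) and (V1, O3)]

No change is made

Case 2: The graph already includes the added edge

[Figure: the same matching graph; the added edge is the movie edge that connects (V1, O1) and (V2, O2)]

No change is made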

Page 26: Computing FDs

Case 3: The graph does not include the source of the added edge

[Figure: a matching graph with nodes (V1, O1), (V2, O2), (V3, O4) and edges labeled movie and actor; the added edge is labeled title and connects (V2, O3) and (W1, O8)]

No change is made

Case 4: The graph includes the source of the added edge and no node with the variable of the target

[Figure: the same matching graph; the added edge is labeled title and connects (V2, O2) and (W1, O5)]

The edge is added to the graph and the new graph replaces the existing graph

Page 27: Computing FDs

Case 5: The graph already includes the source and the target of the added edge but does not include the added edge itself

[Figure: a matching graph with nodes (V1, O1), (V2, O2), (V3, O4), (W1, O3) and edges labeled movie, actor, and title; the added edge is labeled a.k.a and connects (V2, O2) and (W1, O3)]

The edge is added to the graph and the new graph replaces the existing graph

Page 28: Computing FDs

Case 6: The graph includes the source of the added edge but also includes a node with the same variable as the variable in the target of the added edge

[Figure: a matching graph with nodes (V1, O1), (V2, O2), (V3, O4), (W1, O3) and edges labeled movie, actor, and title; the added edge is labeled film and connects (V3, O4) and (V2, O4). (V2, O2) and (V2, O4) are different nodes with the same variable V2]

A new graph is created and added to the set of graphs, without replacing the existing graph

In the new graph, (V2, O2) is replaced by (V2, O4), and nodes that are not reachable from the root are erased
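The six cases can be told apart by a short classification function. This is a sketch under assumptions: a matching graph is a set of (variable, object) nodes plus a set of labeled edges, the edge directions in the examples are guessed from the figures, and `classify_extension` is a name introduced here. It only decides which case applies; it does not perform the extension.

```python
def classify_extension(nodes, edges, edge, q_root, d_root):
    """Return the case number (1-6) for adding `edge` to the graph."""
    src, _, tgt = edge
    tgt_var, tgt_obj = tgt
    if tgt_var == q_root and tgt_obj != d_root:
        return 1      # target pairs the root of Q with a non-root of D
    if edge in edges:
        return 2      # the edge is already in the graph
    if src not in nodes:
        return 3      # the source is not in the graph
    if tgt in nodes:
        return 5      # both endpoints are present, only the edge is new
    if any(var == tgt_var for var, _ in nodes):
        return 6      # another node already uses the target's variable
    return 4          # the target's variable is new to the graph

nodes = {("V1", "O1"), ("V2", "O2"), ("V3", "O4")}
edges = {(("V1", "O1"), "movie", ("V2", "O2")),
         (("V1", "O1"), "actor", ("V3", "O4"))}
args = ("V1", "O1")  # root of Q and root of D
```

Cases 1 to 3 leave the set of graphs unchanged, cases 4 and 5 replace the graph with its extension, and case 6 adds a new graph next to the existing one.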

Page 29: Computing FDs

Applying the algorithm to the movies example

[Figure: steps 1–3. The set starts with a single graph that consists of the root (V1, 1). The movie edge from (V1, 1) to (V2, 2) and the movie edge from (V1, 1) to (V2, 3) are then added, giving two matching graphs]

Page 30: Computing FDs

[Figure: steps 4–5. The actor edge from (V1, 1) to (V3, 4) is added to both graphs, and the title edge from (V2, 2) to (w1, 5) is added to the first graph]

Page 31: Computing FDs

[Figure: steps 6–7. The language edge from (V2, 2) to (w2, 6) is added to the first graph, and a title edge from (V2, 3) is added to the second graph]

Page 32: Computing FDs

[Figure: steps 8–9. The name edge from (V3, 4) to (w3, 10) and the date of birth edge from (V3, 4) to (w4, 11) are added to both graphs]

Page 33: Computing FDs

[Figure: step 10. The director edge between (V2, 2) and (V3, 4) is added to the first graph]

Page 34: Computing FDs

[Figure: step 11. The filmography item edge between (V3, 4) and (V2, 2) is added. A newly created graph with nodes (V1, 1), (V2, 2), (V3, 4), (w3, 10), (w4, 11) also appears]

Subsumed by the left matching graph

Page 35: Computing FDs

[Figure: step 12. The filmography item edge between (V3, 4) and (V2, 3) is added. A newly created graph with nodes (V1, 1), (V2, 3), (V3, 4), (w3, 10), (w4, 11) also appears]

Subsumed by the right matching graph

Page 36: Computing FDs

The Product Graph

[Figure: the product graph reachable from the root (V1, 1)]

The OR-Matchings

[Figure: the two resulting OR-matching graphs, one that contains (V2, 2) and one that contains (V2, 3)]

Page 37: Computing FDs

Computing Maximal Weak-Matching Graphs

• In order to compute maximal weak-matching graphs, the same algorithm is used with a slight change
• After each addition of an edge, the nodes that cause a query constraint not to be preserved are removed (along with the edges that contain these nodes)
• Nodes that are no longer reachable from the root after this removal are also deleted
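The clean-up step can be sketched as follows. The function takes the set of offending nodes as input rather than computing it, since the slides do not say how those nodes are selected. The graph representation and the name `remove_and_prune` are assumptions made for this illustration.

```python
def remove_and_prune(nodes, edges, root, bad):
    """Remove `bad` nodes and their edges, then drop what the root cannot reach."""
    kept = set(nodes) - set(bad)
    kept_edges = {e for e in edges if e[0] in kept and e[2] in kept}
    seen, stack = set(), []
    if root in kept:
        seen.add(root)
        stack.append(root)
    while stack:
        cur = stack.pop()
        for src, _, tgt in kept_edges:
            if src == cur and tgt not in seen:
                seen.add(tgt)
                stack.append(tgt)
    pruned_edges = {e for e in kept_edges if e[0] in seen and e[2] in seen}
    return seen, pruned_edges

# Removing "a" also disconnects "b", which was reachable only through "a".
nodes = {"r", "a", "b", "c"}
edges = {("r", "x", "a"), ("a", "y", "b"), ("r", "z", "c")}
new_nodes, new_edges = remove_and_prune(nodes, edges, "r", {"a"})
```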

Page 38: Computing FDs

The Algorithm Computes Weak-Queries in Polynomial Time

Theorem. Given a query Q and a database D, the revised algorithm terminates with the set of maximal weak-matching graphs of Q w.r.t. D. The runtime of the algorithm is O(q³dm²), where q is the size of the query, d is the size of the database, and m is the size of the result