

Ann Math Artif Intell (2011) 63:257–285
DOI 10.1007/s10472-012-9276-z

Autonomous sets for the hypergraph of all canonical covers

Henning Köhler

Published online: 24 February 2012
© Springer Science+Business Media B.V. 2012

Abstract We present a method for decomposing a hypergraph with certain regularities into smaller hypergraphs, in a “direct product”-like fashion. By applying this to the set of all canonical covers of a given set of functional dependencies, we obtain more efficient methods for solving several optimization problems in database design. These include finding one or all “optimal” covers w.r.t. different criteria, which can help to synthesize better decompositions, and to reduce the cost of constraint checking. As a central step we investigate how the hypergraph of all canonical covers can be computed efficiently. Our results suggest that decomposed representations of this hypergraph are usually small and can be obtained rather quickly, even if the number of canonical covers is huge.

Keywords Hypergraph · Transversal · Database · Functional dependency · Cover

Mathematics Subject Classifications (2010) 03B04 · 05C65 · 05C70 · 06B04

1 Introduction

Many data structures can be modeled as hypergraphs, and such hypergraphs often display certain regularities. To make such regularities explicit, we introduce the notion of autonomous vertex sets. This allows us to decompose a given hypergraph into smaller hypergraphs, which can be stored and manipulated more efficiently.

In particular, the set of all solutions to a given problem often forms a hypergraph with the type of regularities we are interested in. While in this case the hypergraph is not given explicitly, determining the autonomous sets can help to split the problem into smaller sub-problems, which can be solved independently.

H. Köhler (B)
University of Queensland, Brisbane, QL, Australia
e-mail: [email protected]


One such hypergraph, which arises in database theory, is formed by the set of all canonical covers of a given set of functional dependencies. Covers, in particular canonical covers, play an important role in database design. They can determine the decomposition when following the synthesis approach [3], and are needed for constraint checking. To find good decompositions, or speed up constraint checking, it is vital to use the right cover, which is often in canonical form [8, 16] or can immediately be derived from a canonical cover [13].

We will investigate this hypergraph and show how autonomous sets can be found efficiently. We also describe how it can be computed in (partially) decomposed form directly, without obtaining a non-decomposed version first. This is vital since the number of canonical covers is often prohibitively large. Based on these findings we indicate how certain covers can be found which are “optimal” in some sense. As a result of using optimal covers, we can improve the speed of database updates.

The rest of the paper is organized as follows. Section 2 introduces some basic terminology related to hypergraphs. Autonomous sets are defined and characterized in Section 2.1. An algorithm for computing the minimal autonomous sets of a given hypergraph is given in Section 2.2. In Section 3 the results on hypergraphs are applied to canonical covers, and certain autonomous sets are characterized. This is then used to construct an algorithm which computes all canonical covers in decomposed form. The autonomous sets used here need not be minimal though, and in Section 4 it is shown that identifying the minimal autonomous sets is co-NP-hard. Related work is mentioned in Section 5.

A short version of this paper appeared in [9].

2 Hypergraph decomposition

We introduce some basic terminology and well known results about hypergraphs.

Definition 1 A hypergraph H on a vertex set V is a set of subsets of V, i.e., H ⊆ P(V). The elements of H are called edges. A hypergraph is called simple if none of its edges is included in another.

Definition 2 The set ϑH of vertices actually appearing in edges of a hypergraph H is called the support of H:

$$\vartheta H := \bigcup_{e \in H} e$$

Note that we do allow the empty edge in a hypergraph, and that we do not require that V = ϑH. This simplifies some arguments, but has no significant impact on the results.

Definition 3 Let H, G be hypergraphs. We define the cross-union H ∨ G of H and G as

$$H \vee G := \{e_H \cup e_G \mid e_H \in H,\ e_G \in G\}$$

If VH, VG are the vertex sets of H and G then H ∨ G is a hypergraph on VH ∪ VG.


Definition 4 Let H be a hypergraph on vertex set V. The projection H[S] of H onto S is

$$H[S] := \{e \cap S \mid e \in H\},$$

which is a hypergraph on V ∩ S.

Definition 5 Let H be a hypergraph on V. A set t ⊆ V is a transversal of H if t intersects with every edge of H. We denote the set of all minimal transversals (w.r.t. inclusion) by Tr(H), and call Tr(H) the transversal hypergraph of H.

Clearly Tr(H) is a simple hypergraph on V, even if H is not simple.

Lemma 1 [2] Let H be a simple hypergraph. Then Tr(Tr(H)) = H.
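The definitions above can be tried out on small inputs. The following is a minimal Python sketch, not taken from the paper: edges are modeled as `frozenset`s of single-character vertex names, and all function names (`edge`, `support`, `cross_union`, `project`, `transversal_hypergraph`) are hypothetical. The transversal computation is a brute-force enumeration that is only meant for toy examples.

```python
from itertools import product

def edge(s):
    # Hypothetical helper: "AC" -> frozenset({"A", "C"})
    return frozenset(s)

def support(H):
    # Definition 2: all vertices that occur in some edge
    return frozenset().union(*H)

def cross_union(H, G):
    # Definition 3: every union of an edge of H with an edge of G
    return {eh | eg for eh in H for eg in G}

def project(H, S):
    # Definition 4: restrict every edge to S
    S = frozenset(S)
    return {e & S for e in H}

def transversal_hypergraph(H):
    # Definition 5, by brute force: pick one vertex from each edge,
    # then keep only the inclusion-minimal choices.
    candidates = {frozenset(choice) for choice in product(*list(H))}
    return {t for t in candidates if not any(u < t for u in candidates)}

H = {edge(s) for s in ("AC", "AD", "BC", "BD")}
Tr = transversal_hypergraph(H)
```

For this simple hypergraph, applying `transversal_hypergraph` twice returns `H` again, as Lemma 1 states.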

2.1 Autonomous sets

We shall introduce the concept of an autonomous vertex set. Note that our definition is not meant to extend any use of the term “autonomous set” in the context of graphs, where it is better known as “module”, and characterizes vertex sets M in which each vertex v ∈ M has the same neighbors outside M [7]. Using our terminology, autonomous sets are only interesting for hypergraphs but not for graphs. Essentially the only graphs with non-trivial autonomous sets are complete bipartite graphs.

Definition 6 Let H be a hypergraph on the vertex set V. We call a vertex subset S ⊆ V autonomous if $H = H[S] \vee H[\overline{S}]$, where $\overline{S} := V \setminus S$ denotes the complement of S.

Clearly the complement of an autonomous set is itself autonomous, and

$$H \subseteq H[S] \vee H[\overline{S}]$$

for any S ⊆ V. The sets ∅, V are autonomous for any hypergraph H on V, as are all subsets of V \ ϑH and their complements.

Example 1 Consider the vertex set V = ABCDE, and on it the hypergraph

$$H = \{AC, AD, BC, BD\}$$

H is simple, and its support is ϑH = ABCD. The set S = AB is autonomous for H, as is its complement $\overline{S} = CDE$, since

$$H[AB] \vee H[CDE] = \{A, B\} \vee \{C, D\} = \{AC, AD, BC, BD\} = H$$
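Definition 6 translates directly into an executable check. The sketch below is illustrative only; the names `project`, `cross_union` and `is_autonomous` are assumptions and not part of the paper. It recomputes the cross-union of the two projections and compares it with the hypergraph from Example 1.

```python
def project(H, S):
    # H[S]: restrict every edge to S
    return {e & S for e in H}

def cross_union(H, G):
    # H v G: all pairwise unions of edges
    return {eh | eg for eh in H for eg in G}

def is_autonomous(H, V, S):
    # Definition 6: S is autonomous iff H equals H[S] v H[V \ S]
    S = frozenset(S)
    complement = frozenset(V) - S
    return cross_union(project(H, S), project(H, complement)) == set(H)

V = frozenset("ABCDE")
H = {frozenset(s) for s in ("AC", "AD", "BC", "BD")}
```

With these inputs, `is_autonomous(H, V, "AB")` and `is_autonomous(H, V, "CDE")` hold, while `is_autonomous(H, V, "A")` does not.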

Lemma 2 Let S, T ⊆ V be autonomous. Then S ∩ T is autonomous as well.

Proof We need to show that for every pair of edges e1, e2 ∈ H the edge $e'_1 \cup e'_2$ with $e'_1 := e_1 \cap (S \cap T)$ and $e'_2 := e_2 \cap \overline{S \cap T}$ lies in H as well. Since S is autonomous, the edge $e' := (e_1 \cap S) \cup (e_2 \cap \overline{S})$ lies in H. Thus, since T is autonomous, the edge $e'' := (e' \cap T) \cup (e_2 \cap \overline{T})$ lies in H. Clearly

$$e'' \cap (S \cap T) = e' \cap (S \cap T) = e_1 \cap (S \cap T) = e'_1$$

and similarly

$$e'' \cap \overline{S \cap T} = (e' \cap (T \setminus S)) \cup (e_2 \cap \overline{T}) = (e_2 \cap (T \setminus S)) \cup (e_2 \cap \overline{T}) = e'_2$$

which shows $e'_1 \cup e'_2 = e'' \in H$.

Corollary 1 Let S, T ⊆ V be autonomous. Then S ∪ T is autonomous.

Proof The complements and intersections of autonomous sets are autonomous by definition and by Lemma 2, respectively, and we have

$$S \cup T = \overline{\overline{S} \cap \overline{T}}$$

Proposition 1 Let H, G be hypergraphs and S1, S2 vertex sets. Then we have

(i) H[S1][S2] = H[S1 ∩ S2]
(ii) (H ∨ G)[S1] = H[S1] ∨ G[S1]

Lemma 3 Let H be a hypergraph on V and S ⊆ V autonomous for H. Then for any T ⊆ V the set S ∩ T is autonomous for H[T].

Proof Since S is autonomous for H we have $H = H[S] \vee H[\overline{S}]$. Thus

$$\begin{aligned} H[T] &= \left(H[S] \vee H[\overline{S}]\right)[T] \\ &= H[S][T] \vee H[\overline{S}][T] \\ &= H[T][S \cap T] \vee H[T][\overline{S} \cap T] \end{aligned}$$

Theorem 1 Let H be a hypergraph on V and {S1, . . . , Sn} a partition of V into autonomous sets. Then

$$H = H[S_1] \vee \ldots \vee H[S_n]$$

Proof By induction on n. The equation holds trivially for n = 1. Assume now the theorem holds for a fixed value of n. To show the theorem for n + 1 we use that Sn ∪ Sn+1 is autonomous by Corollary 1, so that by assumption we have

$$H = H[S_1] \vee \ldots \vee H[S_{n-1}] \vee H[S_n \cup S_{n+1}]$$

By Lemma 3 Sn is autonomous for H[Sn ∪ Sn+1], i.e., we have

$$H[S_n \cup S_{n+1}] = H[S_n] \vee H[S_{n+1}]$$

which shows the theorem for n + 1.


When talking about minimal autonomous sets, we will always mean minimal w.r.t. inclusion among all non-empty autonomous sets, even though the empty set is always autonomous by definition. While it would be more precise to call them minimal non-empty autonomous sets, this quickly becomes tedious.

Theorem 2 Every hypergraph H has a finest partition {S1, . . . , Sn} into minimal autonomous sets. The autonomous sets of H are just the unions of these sets.

Proof Let S1, . . . , Sn be the minimal autonomous sets of H. By Lemma 2 they are pairwise disjoint. The union of autonomous sets is itself autonomous by Corollary 1, in particular

$$S := \bigcup_{i=1}^{n} S_i$$

Furthermore the complement $\overline{S}$ of S is autonomous, and since $\overline{S}$ does not include any minimal autonomous set it is empty, i.e., S = V. Thus the sets S1, . . . , Sn form a partition of V.

Whenever an autonomous set T intersects with some Si it must include it completely, since otherwise T ∩ Si would be a smaller non-empty autonomous set by Lemma 2. Thus each autonomous set is the union of those Si it intersects with.

Example 2 Consider again H = {AC, AD, BC, BD} on V = ABCDE. Then

$$H = H[AB] \vee H[CD] \vee H[E] = \{A, B\} \vee \{C, D\} \vee \{\emptyset\}$$

so the minimal autonomous sets of H are AB, CD, E. Thus H has a total of $2^3$ autonomous sets:

∅, AB, CD, E, ABCD, ABE, CDE, ABCDE

We now consider another type of decomposition, which will help us in characterizing the autonomous sets of simple hypergraphs.

Definition 7 Let H be a hypergraph on vertex set V. The subhypergraph H⟨S⟩ of H induced by S ⊆ V is

$$H\langle S\rangle := \{e \in H \mid e \subseteq S\}$$

Definition 8 Let H be a hypergraph on the vertex set V. We call a vertex subset S ⊆ V isolated if $H = H\langle S\rangle \cup H\langle \overline{S}\rangle$.

Clearly S is isolated if and only if every edge it intersects with lies completely in S. As with minimal autonomous sets, we will mean by minimal isolated sets the minimal sets (w.r.t. inclusion) among all non-empty isolated sets.

Definition 9 As with graphs, we say that two vertices v1, vn in a hypergraph H are connected if there exists a sequence v1, v2, . . . , vn such that vi, vi+1 always lie in some common hyperedge of H. H is connected if all (pairs of) its vertices are connected. The connected components of H are its connected subhypergraphs.


It follows immediately that the minimal isolated sets of H are the vertex sets of its maximal connected components, and that the isolated sets of H are the unions of them.

Recall that Tr(H) denotes the transversal hypergraph of H.

Theorem 3 Let H, G be hypergraphs on disjoint vertex sets VH and VG. Then

$$\mathrm{Tr}(H \vee G) = \mathrm{Tr}(H) \cup \mathrm{Tr}(G)$$

$$\mathrm{Tr}(H \cup G) = \mathrm{Tr}(H) \vee \mathrm{Tr}(G)$$

Proof

(1) We first show that a set t ⊆ V := VH ∪ VG is a transversal of H ∨ G iff it intersects with every edge of H or with every edge of G. For the “if” part, assume w.l.o.g. that t intersects with every edge of H. Since every edge e ∈ H ∨ G is of the form e = eH ∪ eG with eH ∈ H, eG ∈ G, t intersects with e because it intersects with eH. We show the “only if” part by contraposition and assume that there be edges eH ∈ H, eG ∈ G such that t intersects with neither of them. But then t does not intersect eH ∪ eG ∈ H ∨ G either, i.e., t is not a transversal of H ∨ G.

We thus have that the transversals of H ∨ G are the transversals of H plus the transversals of G. Thus the minimal transversals of H ∨ G are the minimal elements of Tr(H) ∪ Tr(G). Since VH and VG are disjoint, all elements of Tr(H) ∪ Tr(G) are minimal. Thus

$$\mathrm{Tr}(H \vee G) = \mathrm{Tr}(H) \cup \mathrm{Tr}(G)$$

(2) By definition a set t ⊆ V is a transversal of H ∪ G iff it is a transversal of both H and G. Thus the transversals of H ∪ G are the unions of transversals of H with transversals of G. The minimal transversals of H ∪ G are therefore the minimal elements of Tr(H) ∨ Tr(G). Since VH and VG are disjoint, all elements of Tr(H) ∨ Tr(G) are minimal. Thus

$$\mathrm{Tr}(H \cup G) = \mathrm{Tr}(H) \vee \mathrm{Tr}(G)$$

We are now able to characterize the autonomous sets of a simple hypergraph.

Theorem 4 Let H be a simple hypergraph. Then the autonomous sets of H are the isolated sets of its transversal hypergraph Tr(H).

Proof Let S ⊆ V be autonomous for H, i.e., $H = H[S] \vee H[\overline{S}]$. Then

$$\mathrm{Tr}(H) = \mathrm{Tr}(H[S]) \cup \mathrm{Tr}(H[\overline{S}])$$

by Theorem 3, so S is isolated for Tr(H).

Conversely let S be any isolated set of Tr(H). Then

$$\mathrm{Tr}(H) = \mathrm{Tr}(H)\langle S\rangle \cup \mathrm{Tr}(H)\langle \overline{S}\rangle$$

and by Theorem 3 we have

$$\mathrm{Tr}(\mathrm{Tr}(H)) = \mathrm{Tr}(\mathrm{Tr}(H)\langle S\rangle) \vee \mathrm{Tr}(\mathrm{Tr}(H)\langle \overline{S}\rangle)$$

Thus S is autonomous for Tr(Tr(H)) = H.
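For a simple hypergraph, Theorem 4 suggests reading the minimal autonomous sets off the connected components of Tr(H). The Python sketch below illustrates this on the hypergraph of Examples 1 and 2. It is an assumption-laden toy: `transversal_hypergraph` is a brute-force enumeration and `minimal_isolated_sets` a naive component merge, both hypothetical names, and neither is the efficient method developed in Section 2.2.

```python
from itertools import product

def transversal_hypergraph(H):
    # Brute force: pick one vertex per edge, keep inclusion-minimal picks
    candidates = {frozenset(choice) for choice in product(*list(H))}
    return {t for t in candidates if not any(u < t for u in candidates)}

def minimal_isolated_sets(G, V):
    # Connected components of G on V; a vertex that lies in no edge
    # forms a singleton component of its own.
    components = [frozenset({v}) for v in V]
    for e in G:
        touched = [c for c in components if c & e]
        if touched:
            merged = frozenset().union(*touched)
            components = [c for c in components if not c & e] + [merged]
    return set(components)

V = frozenset("ABCDE")
H = {frozenset(s) for s in ("AC", "AD", "BC", "BD")}
parts = minimal_isolated_sets(transversal_hypergraph(H), V)
```

Here `parts` consists of AB, CD and E, which are the minimal autonomous sets listed in Example 2.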

Example 3 The requirement that H be simple in Theorem 4 is necessary: Consider the hypergraphs

$$H = \{AC, AD, BC, BD\}$$

and

$$H' = H \cup \{ABC\}$$

Both H and H′ have the same minimal transversals

$$\mathrm{Tr}(H) = \mathrm{Tr}(H') = \{AB, CD\}$$

Clearly AB, CD are isolated sets of Tr(H′), but AB and CD are not autonomous for H′:

$$\begin{aligned} H'[AB] \vee H'[CD] &= \{A, B, AB\} \vee \{C, D\} \\ &= \{AC, AD, BC, BD, ABC, ABD\} \\ &= H' \cup \{ABD\} \neq H' \end{aligned}$$

Since graphs are just special hypergraphs, our theory of autonomous sets applies to them as well. Clearly all complete bipartite graphs have a non-trivial partition into two autonomous sets, but one may wonder whether there are others.

Lemma 4 A simple graph G without isolated vertices has a non-trivial partition into autonomous sets iff it is complete bipartite.

Proof Let S ∉ {∅, ϑG} be autonomous. Since G contains no isolated vertices, G[S] and $G[\overline{S}]$ contain non-empty edges. As all edges in a simple graph contain exactly two vertices, the edges of G[S] and $G[\overline{S}]$ contain exactly one vertex each. Thus $G = G[S] \vee G[\overline{S}]$ is complete bipartite.

We note that non-simple graphs with non-trivial autonomous partition may also have loops on all vertices of one side of the bipartition, as well as isolated vertices.

2.2 Computing autonomous sets

To complete this section, we now address the question of computing the minimal autonomous sets of H. While Theorem 4 suggests an approach (at least for simple hypergraphs), computing the transversal hypergraph can lead to exponential run-time. Instead, we shall utilize the following observation.


Lemma 5 Let H be a hypergraph on V and P = {S1, . . . , Sn} the partition of V into minimal autonomous sets. Let further $H'_1 \subseteq H[S_1]$ be non-empty, and H′ ⊆ H be the hypergraph

$$H' := H'_1 \vee H[S_2] \vee \ldots \vee H[S_n].$$

Then S2, . . . , Sn are minimal autonomous sets of H′.

Proof By definition S2, . . . , Sn are autonomous for H′. If one of those Si were not minimal for H′, i.e., could be partitioned into smaller autonomous sets T1, . . . , Tk, then

$$H[S_i] = H[T_1] \vee \ldots \vee H[T_k]$$

and thus the sets Ti would be autonomous for H as well, contradicting the minimality of Si.

We use this to compute the partition of V into minimal autonomous sets as follows. We pick some vertex v ∈ V and split H into two hypergraphs, one containing all the edges which contain v, the other one containing all those edges which do not contain v. We will need only one of them, so let Hv be the smaller one of the two, i.e., the one with fewer edges (if both contain exactly the same number of edges we may choose either one):

$$H_v := \text{smaller of } \begin{cases} \{e \in H \mid v \in e\} \\ \{e \in H \mid v \notin e\} \end{cases}$$

If Hv is empty, then v lies in all or no edges of H, and in both cases the set {v} is autonomous for H. This reduces the problem of finding the minimal autonomous sets of H to finding the minimal autonomous sets of $H[\overline{v}]$, where $\overline{v} := \vartheta H \setminus \{v\}$, as they are also minimal autonomous sets of H.

Consider now the case where Hv is not empty. Let S1 be the minimal autonomous set of H containing v. Then Hv has the same form as H′ in Lemma 5, where $H'_1$ contains either the edges of H[S1] which do or those which do not contain v. We now compute the minimal autonomous sets of Hv, and check for each set whether it is autonomous for H. By Lemma 5 the sets autonomous for H are exactly the S2, . . . , Sn, while the sets not autonomous for H partition S1. Taking the union of those non-autonomous sets and keeping the autonomous ones thus gives us the minimal autonomous sets of H. Note that the set {v} is always autonomous for Hv, as v is contained in either all or no edges of Hv. Thus it suffices to compute the minimal autonomous sets of $H_v[\overline{v}]$.

In either case we have reduced the problem of finding the minimal autonomous sets of H to that of finding the minimal autonomous sets of a hypergraph with fewer vertices. This gives us algorithm “Recursive Autonomous Partitioning” below.

While the test whether a set S is autonomous for H can be performed by computing $H' := H[S] \vee H[\overline{S}]$ and comparing it to H, the resulting set can easily contain up to $|H|^2$ edges if S is not autonomous. We observe that always H ⊆ H′ and thus H = H′ iff |H| = |H′|. Since $|H'| = |H[S]| \cdot |H[\overline{S}]|$, the latter condition can be checked faster without actually computing H′.


Algorithm “Recursive Autonomous Partitioning”
INPUT: hypergraph H
OUTPUT: partition of ϑH into minimal autonomous sets

function RAP(H)
    select vertex v ∈ ϑH
    Hv := smaller of {e ∈ H | v ∈ e} and {e ∈ H | v ∉ e}
    if Hv = ∅ then
        return {{v}} ∪ RAP($H[\overline{v}]$)
    else
        Aut := ∅, S1 := {v}
        Autv := RAP($H_v[\overline{v}]$)
        for all S ∈ Autv do
            if S autonomous for H then
                Aut := Aut ∪ {S}
            else
                S1 := S1 ∪ S
        end
        return {S1} ∪ Aut
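The recursion can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the author's implementation, and it deviates in two labeled ways: the vertex set `V` is passed explicitly (so that vertices which drop out of the support of Hv are still partitioned, and so that the recursion has an empty-set base case), and the autonomy test uses the cardinality criterion |H| = |H[S]| · |H[V \ S]| described above. The names `rap`, `project` and `is_autonomous` are hypothetical.

```python
def project(H, S):
    # H[S]: restrict every edge to S
    return {e & S for e in H}

def is_autonomous(H, V, S):
    # Cardinality test: H = H[S] v H[V \ S] iff the edge counts multiply up
    return len(H) == len(project(H, S)) * len(project(H, V - S))

def rap(H, V=None):
    """Partition V (default: the support of H) into minimal autonomous sets."""
    H = set(H)
    if V is None:
        V = frozenset().union(*H)
    if not V:
        return set()  # base case added for this sketch
    v = next(iter(V))  # any vertex will do
    rest = V - {v}
    with_v = {e for e in H if v in e}
    without_v = H - with_v
    Hv = min(with_v, without_v, key=len)
    if not Hv:
        # v lies in all or in no edges, so {v} is autonomous
        return {frozenset({v})} | rap(project(H, rest), rest)
    s1 = frozenset({v})
    aut = set()
    for S in rap(project(Hv, rest), rest):
        if is_autonomous(H, V, S):
            aut.add(S)
        else:
            s1 |= S  # non-autonomous pieces belong to the set containing v
    return {s1} | aut

V = frozenset("ABCDE")
H = {frozenset(s) for s in ("AC", "AD", "BC", "BD")}
partition = rap(H, V)
```

For the hypergraph of Example 2 the result is the partition into AB, CD and E, independent of which vertex is selected first.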

Theorem 5 Let H be a hypergraph with k vertices and n edges. Then the “Recursive Autonomous Partitioning” algorithm computes the partition of ϑH into minimal autonomous sets of H in time $O(nk^2)$.

Proof We have already argued that the algorithm computes the minimal autonomous sets of H correctly, so we only need to show the time bound.

The depth of recursion is at most k. In each call we compute Hv, which can be done in O(n). If Hv = ∅ we only need to compute $H[\overline{v}]$, which is possible in O(nk). Thus this part of the algorithm can be performed in $O(nk^2)$.

If Hv ≠ ∅ we need to test each set found to be autonomous for Hv whether it is autonomous for H. The number of such tests is at most k, and each test can be performed in O(nk), by computing H[S] and $H[\overline{S}]$ and testing whether $|H| = |H[S]| \cdot |H[\overline{S}]|$. This leads to a complexity of $O(nk^2)$. Since the number of edges of Hv is at most half of the number of edges of H, the number of steps required for performing the tests on Hv (or the next subgraph in the recursion for which tests are required) is at most half as many. This leads to a total complexity of

$$O\left(\left(n + \frac{n}{2} + \frac{n}{4} + \ldots\right) k^2\right) = O(nk^2)$$

2.3 Superedges and partial superedges

While canonical covers will form the edges in our hypergraph, we will have to argue about sets of FDs which form a cover, but may contain more FDs than needed. We call such supersets of edges “superedges”.


Definition 10 Let H be a hypergraph on V. A set E ⊆ V is called a superedge of H if it includes some edge e ∈ H, i.e. e ⊆ E. We call E ⊆ S ⊆ V a partial (super)edge on S if E is a (super)edge of H[S].

Lemma 6 Let H be a hypergraph on V and S ⊆ V. A set S′ ⊆ S is a partial superedge on S iff $S' \cup \overline{S}$ is a superedge.

Proof By definition S′ is a partial superedge on S iff it includes a partial edge $e_S \in H[S]$, i.e. iff there exists an edge e ∈ H with $e \cap S = e_S \subseteq S'$. Since

$$e = e_S \cup (e \cap \overline{S}) \subseteq S' \cup \overline{S}$$

this implies that $S' \cup \overline{S}$ is a superedge. Conversely, if $S' \cup \overline{S}$ is a superedge, it includes an edge e ∈ H, which gives us

$$e \cap S \subseteq (S' \cup \overline{S}) \cap S = S'$$

Lemma 7 Let H be a hypergraph on V and P = {S1, . . . , Sn} a partition of V into autonomous sets. A set E ⊆ V is a superedge iff Ei := E ∩ Si is a partial superedge on Si for i = 1, . . . , n.

Proof If E is a superedge then it includes an edge e ∈ H. Thus Ei includes e ∩ Si ∈ H[Si], which makes Ei a partial superedge on Si.

Now for each i = 1, . . . , n let Ei be a partial superedge on Si, including the partial edge ei ∈ H[Si]. Thus E includes

$$e := e_1 \cup \ldots \cup e_n \in H[S_1] \vee \ldots \vee H[S_n] = H \quad \text{(Theorem 1)}$$

which makes E a superedge of H.
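Lemma 7 can be checked mechanically on the running example. The sketch below is only an illustration under assumed names (`is_superedge`, `is_partial_superedge`, `lemma7_agrees`); it compares the direct superedge test with the component-wise test over the autonomous partition AB, CD, E from Example 2.

```python
def is_superedge(H, E):
    # Definition 10: E includes some edge of H
    return any(e <= E for e in H)

def is_partial_superedge(H, S, E):
    # E is a superedge of the projection H[S]
    return any((e & S) <= E for e in H)

def lemma7_agrees(H, partition, E):
    # Both sides of Lemma 7 should give the same answer
    componentwise = all(is_partial_superedge(H, S, E & S) for S in partition)
    return is_superedge(H, E) == componentwise

H = {frozenset(s) for s in ("AC", "AD", "BC", "BD")}
partition = [frozenset("AB"), frozenset("CD"), frozenset("E")]
```

For instance, ABC is a superedge and passes the test on every component, while AB fails on the component CD because its intersection with CD is empty.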

We can therefore strengthen Lemma 6 when S is autonomous:

Lemma 8 Let H be a hypergraph on V, E ⊆ V a superedge and S ⊆ V autonomous. A set S′ ⊆ S is a partial superedge on S iff S′ ∪ (E \ S) is a superedge.

Proof By Lemma 7, the set S′ ∪ (E \ S) is a superedge iff S′ is a partial superedge on S and E \ S is a partial superedge on $\overline{S}$. Since E is a superedge, Lemma 7 assures that $E \setminus S = E \cap \overline{S}$ is a partial superedge on $\overline{S}$.

3 Canonical covers

We shall now apply our theory of hypergraph decomposition to the set of all canonical covers. For this, we begin by introducing basic terminology.

A functional dependency (FD) on a set of attributes R is an expression of the form X → Y (read “X determines Y”) where X and Y are subsets of R. Functional dependencies are frequently used in database systems, where they restrict which data tables can be stored [11, 14, 15]. For attribute sets X, Y and attribute A we will write XY short for X ∪ Y and A short for {A}.

A set X ⊆ R is a key of R w.r.t. a set Σ of integrity constraints on R, if Σ implies X → R. Note that some authors use the term ‘key’ only for minimal keys, and call keys which may not be minimal ‘superkeys’.

A set Σ of FDs can imply other FDs. Implication of FDs can be characterized by the following derivation rules, known as the Armstrong Axioms [1]:

$$\frac{}{X \to Y}\ \ Y \subseteq X, \qquad \frac{X \to Y}{XW \to YW}, \qquad \frac{X \to Y \quad Y \to Z}{X \to Z} \tag{1}$$

Here the FDs on the top imply the FD at the bottom, and Y ⊆ X is a side condition which needs to hold for the first rule to be applicable. We write $\Sigma^*$ for the set of all FDs on R implied by Σ.

Two sets of FDs Σ, Σ′ are called equivalent or covers of each other if they imply each other. This can be written as Σ ⊨ Σ′ and Σ′ ⊨ Σ, or as $\Sigma^* = \Sigma'^*$.
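Implication and equivalence of FD sets can be tested without enumerating $\Sigma^*$. The sketch below uses the standard attribute-closure test rather than explicit applications of the Armstrong Axioms; this is a swapped-in technique, not something the text spells out at this point. FDs are modeled as `(lhs, rhs)` pairs of attribute strings, and the function names and the two small FD sets are made up for illustration.

```python
def closure(attrs, fds):
    # Attributes determined by attrs under fds (standard closure test)
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)

def implies(fds, lhs, rhs):
    # fds implies lhs -> rhs iff rhs lies in the closure of lhs
    return set(rhs) <= closure(lhs, fds)

def equivalent(f, g):
    # f and g are covers of each other iff each implies all FDs of the other
    return (all(implies(g, lhs, rhs) for lhs, rhs in f)
            and all(implies(f, lhs, rhs) for lhs, rhs in g))

# Illustrative FD sets (not from the paper), given as lists of pairs
sigma = [("A", "B"), ("B", "C")]
sigma2 = [("A", "BC"), ("B", "C")]
```

Here `sigma` and `sigma2` are covers of each other, whereas dropping B → C from `sigma` yields a strictly weaker set.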

We will use letters at the end of the alphabet (. . . , X, Y, Z) to denote subsets of R, while letters at the beginning (A, B, C, . . .) denote single attributes.

Definition 11 We use the following terminology:

(i) A FD X → A is called singular.
(ii) A non-trivial singular FD $X \to A \in \Sigma^*$ is called atomic, if and only if for all Y ⊊ X we have $Y \to A \notin \Sigma^*$.
(iii) The atomic closure $\Sigma^{*a}$ of Σ is the set of all atomic FDs in $\Sigma^*$.
(iv) A set $G \subseteq \Sigma^{*a}$ of atomic FDs is called canonical cover if it is a cover of Σ which is minimal w.r.t. set inclusion, i.e., for all H ⊊ G the set H is not a cover of Σ.

Note that atomic FDs have also been called “elemental” FDs [18].

Definition 12 Let Σ be a set of FDs. We denote the set of all canonical covers of Σ by

$$CC(\Sigma) := \{G \subseteq \Sigma^{*a} \mid G \text{ is a canonical cover of } \Sigma\}$$

When given a set Σ of functional dependencies for schema decomposition, instance validation or similar tasks, we may choose to use a cover Σ′ of functional dependencies equivalent to Σ instead. The choice of Σ′ is important, as it determines the result of the decomposition, the speed of updates, and generally can have a huge impact on database performance. Optimal results are usually achieved by covers which are in some standard form, often canonical, but finding these optimal covers is often NP-hard.

To simplify these problems, we now wish to find autonomous sets of CC(Σ). While algorithm “Recursive Autonomous Partitioning” allows us to compute the minimal autonomous sets of a given hypergraph, CC(Σ) is usually not given directly. Instead we will assume that only Σ is given, and develop alternative means for finding autonomous sets of CC(Σ) without computing the entire hypergraph first.

Example 4 Let Σ consist of the following FDs:

Σ = {AB → C, C → A, A → D, D → E, E → A}

Then Σ has the following canonical covers:

CC(Σ) = {
{AB → C, C → A, A → D, D → E, E → A},
{AB → C, C → A, A → E, E → D, D → A},
{AB → C, C → A, A → D, D → A, A → E, E → A},
{AB → C, C → A, A → D, D → A, D → E, E → D},
{AB → C, C → A, A → E, E → A, D → E, E → D},
{DB → C, C → A, A → D, D → E, E → A},
{DB → C, C → A, A → E, E → D, D → A},
{DB → C, C → A, A → D, D → A, A → E, E → A},
{DB → C, C → A, A → D, D → A, D → E, E → D},
{DB → C, C → A, A → E, E → A, D → E, E → D},
{EB → C, C → A, A → D, D → E, E → A},
{EB → C, C → A, A → E, E → D, D → A},
{EB → C, C → A, A → D, D → A, A → E, E → A},
{EB → C, C → A, A → D, D → A, D → E, E → D},
{EB → C, C → A, A → E, E → A, D → E, E → D},
{AB → C, C → D, A → D, D → E, E → A},
{AB → C, C → D, A → E, E → D, D → A},
{AB → C, C → D, A → D, D → A, A → E, E → A},
{AB → C, C → D, A → D, D → A, D → E, E → D},
{AB → C, C → D, A → E, E → A, D → E, E → D},
{DB → C, C → D, A → D, D → E, E → A},
{DB → C, C → D, A → E, E → D, D → A},
{DB → C, C → D, A → D, D → A, A → E, E → A},
{DB → C, C → D, A → D, D → A, D → E, E → D},
{DB → C, C → D, A → E, E → A, D → E, E → D},
{EB → C, C → D, A → D, D → E, E → A},
{EB → C, C → D, A → E, E → D, D → A},
{EB → C, C → D, A → D, D → A, A → E, E → A},
{EB → C, C → D, A → D, D → A, D → E, E → D},
{EB → C, C → D, A → E, E → A, D → E, E → D},
{AB → C, C → E, A → D, D → E, E → A},
{AB → C, C → E, A → E, E → D, D → A},
{AB → C, C → E, A → D, D → A, A → E, E → A},
{AB → C, C → E, A → D, D → A, D → E, E → D},
{AB → C, C → E, A → E, E → A, D → E, E → D},
{DB → C, C → E, A → D, D → E, E → A},
{DB → C, C → E, A → E, E → D, D → A},
{DB → C, C → E, A → D, D → A, A → E, E → A},
{DB → C, C → E, A → D, D → A, D → E, E → D},
{DB → C, C → E, A → E, E → A, D → E, E → D},
{EB → C, C → E, A → D, D → E, E → A},
{EB → C, C → E, A → E, E → D, D → A},
{EB → C, C → E, A → D, D → A, A → E, E → A},
{EB → C, C → E, A → D, D → A, D → E, E → D},
{EB → C, C → E, A → E, E → A, D → E, E → D}
}


Instead of using this bulky “direct” representation, we aim to decompose CC(Σ) into sets of partial covers:

$$\left\{\begin{array}{l} \{AB \to C\}, \\ \{DB \to C\}, \\ \{EB \to C\} \end{array}\right\} \vee \left\{\begin{array}{l} \{C \to A\}, \\ \{C \to D\}, \\ \{C \to E\} \end{array}\right\} \vee \left\{\begin{array}{l} \{A \to D, D \to E, E \to A\}, \\ \{A \to E, E \to D, D \to A\}, \\ \{A \to D, D \to A, A \to E, E \to A\}, \\ \{A \to D, D \to A, D \to E, E \to D\}, \\ \{A \to E, E \to A, D \to E, E \to D\} \end{array}\right\}$$

Using this decomposed representation, dealing with CC(Σ) becomes much easier.
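The saving can be made concrete by expanding the decomposed form back into the full hypergraph. In the sketch below, FDs are plain strings such as `"AB->C"` and the helper names `fds` and `expand` are hypothetical. The three factors hold 3 + 3 + 5 = 11 partial covers, and their cross-union reproduces all 3 · 3 · 5 = 45 canonical covers of Example 4.

```python
from itertools import product

def fds(*names):
    # Hypothetical helper: a (partial) cover as a frozenset of FD strings
    return frozenset(names)

part1 = [fds("AB->C"), fds("DB->C"), fds("EB->C")]
part2 = [fds("C->A"), fds("C->D"), fds("C->E")]
part3 = [
    fds("A->D", "D->E", "E->A"),
    fds("A->E", "E->D", "D->A"),
    fds("A->D", "D->A", "A->E", "E->A"),
    fds("A->D", "D->A", "D->E", "E->D"),
    fds("A->E", "E->A", "D->E", "E->D"),
]

def expand(*parts):
    # Cross-union of all factors: one partial cover from each part
    return {frozenset().union(*combo) for combo in product(*parts)}

covers = expand(part1, part2, part3)
```

Optimization over CC(Σ) can then often be carried out factor by factor, inspecting 11 partial covers rather than 45 full ones.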

3.1 Linear resolution

When computing the hypergraph of all canonical covers, we first need to find the atomic closure as it contains all vertices of our hypergraph. A method called linear resolution for obtaining the atomic closure was presented in [8]. As already discovered in [12], linear resolution can also be used to find all minimal keys, and it will be central to our approach for computing all canonical covers as well. We outline the approach below.

Definition 13 For any application of the following derivation rule (known as resolution rule)

$$\frac{X \to A \quad AY \to B}{XY \to B}$$

we call X → A the substituting FD, AY → B the base FD, and XY → B the derived FD.

Theorem 6 [8] Let Σ be a set of singular FDs. Then every atomic FD in $\Sigma^{*a}$ can be derived from Σ using the resolution rule

$$\frac{X \to A \quad AY \to B}{XY \to B} \tag{2}$$

This result still holds if we restrict ourselves to derivations where the substituting FDs X → A lie in Σ, i.e., for every atomic FD $X_i \to A_i \in \Sigma^{*a}$ there exists a linear resolution tree deriving Xi → Ai from Σ.

Proof Let $X \to A \in \Sigma^{*a}$, and thus $A \in X^{*} \setminus X$, where $X^{*}$ denotes the closure of X under Σ. We use the known fact that the “closure” algorithm works, i.e., that there exists a sequence of FDs $X_i \to A_i \in \Sigma$, i = 1 . . . k with $A_k = A$ and

$$X_i \subseteq C_i := X \cup \{A_1, \ldots, A_{i-1}\}$$

We start our derivation with $X_k \to A$ ($= A_k$), and then successively use $X_i \to A_i$ for i = k − 1, . . . , 1 in the resolution rule (2):

$$\frac{X_i \to A_i \quad U_{i+1} \to A}{U_i \to A}$$

provided $A_i \in U_{i+1}$. In this, the derived left hand sides $U_i$ have the form

$$U_k = X_k, \qquad U_i = \begin{cases} X_i (U_{i+1} \setminus A_i) & \text{if } A_i \in U_{i+1} \\ U_{i+1} & \text{else} \end{cases}$$

It is easy to see that $U_i \subseteq C_i$, and in particular $U_1 \subseteq X$. Since X → A is atomic, we get $U_1 = X$, thus we have indeed constructed the derivation we wanted.

Note that the intermediate FDs obtained during the derivation need not be atomic. Since we are only interested in atomic FDs, we reduce the left hand side of any FD we derive by removing attributes that are not needed, or extraneous:

Definition 14 Let Σ be a set of FDs and $X \to Y \in \Sigma^*$. We say that an attribute A ∈ X is extraneous in X → Y w.r.t. Σ if $(X \setminus A) \to Y \in \Sigma^*$. A FD X → Y without extraneous attributes in X is called LHS-minimal. For an attribute set X we call A ∈ X extraneous if it is extraneous in X → X, i.e., if $(X \setminus A) \to X \in \Sigma^*$.

This leads to the following derivation rule:

$$\frac{X \to A \quad AY \to B}{LM_\Sigma(XY \to B)}$$

where $LM_\Sigma$ denotes LHS-minimization w.r.t. Σ, removing (arbitrary) extraneous attributes from the LHS until the FD is LHS-minimal.

Example 5 Starting with the canonical cover

\[ \Sigma = \{C \to L,\ CT \to R,\ LT \to C,\ RT \to C\} \]

where C, L, R, T stand for Course, Lecturer, Room, and Time, we use resolution:

\[ \frac{RT \to C \qquad C \to L}{RT \to L}, \qquad \frac{LT \to C \qquad CT \to R}{LT \to R} \]

The newly found FDs RT → L and LT → R are already atomic, so we add them to Σ∗a. We then test whether new resolution steps have become possible:

\[ \frac{CT \to R \qquad RT \to L}{[CT \to L]}, \qquad \frac{C \to L \qquad LT \to R}{[CT \to R]} \]

The FD CT → L can be LHS-minimized to C → L, which has already been found. The FD CT → R is already contained in Σ∗a as well, so no further atomic FDs can be derived. We therefore obtain:

\[ \Sigma^{*a} = \Sigma \cup \{RT \to L,\ LT \to R\} \]

Algorithm “linear resolution”
INPUT: set of FDs Σ
OUTPUT: atomic closure Σ∗a

compute a canonical cover Σ′ of Σ
Σ∗a := Σ′
for all Y → B ∈ Σ∗a do
    for all X → A ∈ Σ′ with A ∈ Y, B ∉ X do
        Σ∗a := Σ∗a ∪ {LMΣ((XY \ A) → B)}
    end
end
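The “linear resolution” algorithm can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: FDs are modelled as pairs (frozenset of one-letter attributes, single attribute), the input is assumed to be a canonical cover already (the first step of the algorithm is skipped), and the helper names `fd`, `closure`, `lhs_minimize` and `atomic_closure` are hypothetical, not taken from [8].

```python
from typing import FrozenSet, Iterable, Set, Tuple

FD = Tuple[FrozenSet[str], str]  # (LHS attribute set, single RHS attribute)


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from a string of one-letter attribute names (hypothetical helper)."""
    return frozenset(lhs), rhs


def closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Attribute closure: fire applicable FDs until nothing changes."""
    result, fds = set(attrs), list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if rhs not in result and lhs <= result:
                result.add(rhs)
                changed = True
    return result


def lhs_minimize(lhs: Iterable[str], rhs: str, fds: Iterable[FD]) -> FD:
    """The LM operator: drop extraneous LHS attributes one at a time."""
    fds, current = list(fds), set(lhs)
    for attr in sorted(current):
        if rhs in closure(current - {attr}, fds):
            current.discard(attr)
    return frozenset(current), rhs


def atomic_closure(cover: Iterable[FD]) -> Set[FD]:
    """Linear resolution; `cover` is assumed to be a canonical cover."""
    cover = list(cover)
    result, todo = set(cover), list(cover)
    while todo:
        y, b = todo.pop()                      # some Y -> B in the atomic closure
        for x, a in cover:                     # some X -> A in the canonical cover
            if a in y and b not in x:
                derived = lhs_minimize((x | y) - {a}, b, cover)
                if derived not in result:
                    result.add(derived)
                    todo.append(derived)       # newly found FDs are resolved too
    return result


# The canonical cover of Example 5.
sigma = {fd("C", "L"), fd("CT", "R"), fd("LT", "C"), fd("RT", "C")}
sigma_a = atomic_closure(sigma)
```

On the data of Example 5 the sketch adds exactly RT → L and LT → R, and `lhs_minimize("CT", "L", sigma)` reduces CT → L to C → L as described in the example.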

Theorem 7 [8] The “linear resolution” algorithm computes Σ∗a correctly.

Proof By Theorem 6 all atomic FDs can be derived using (2). Instead of storing the potentially non-atomic FD (XY \ A) → B, the “linear resolution” algorithm stores the stronger result U → B obtained by LHS-minimization, where U ⊆ XY \ A. Thus any derivation which requires (XY \ A) → B can be replaced by a derivation using U → B as initial FD instead.

As mentioned earlier, linear resolution can also be used to compute all minimal keys:

Theorem 8 [8] Let Σ be any set of singular FDs over R and Y ⊆ R. Then any FD X → Y ∈ Σ∗ with minimal left hand side X can be derived from Y → Y using linear resolution.

Proof Let X be minimal with X → Y ∈ Σ∗. For every Ai ∈ Y \ X there exists a minimal Xi ⊆ X with Xi → Ai ∈ Σ∗a. By Theorem 6 every Xi → Ai ∈ Σ∗a has a linear resolution tree in Σ. Combining these resolution trees to substitute all Ai ∈ Y \ X in Y → Y (possibly skipping some if the attribute Ai has been eliminated in an earlier step) yields a linear resolution of X′ → Y for some X′ ⊆ ⋃ Xi ⊆ X. Since X is minimal, X′ = X.

Further improvements to the algorithm can be found in [8]. The paper also provides a complexity analysis, showing a complexity of O(f · k²n²), where f is the number of FDs in Σ∗a (or the number of minimal keys, if used for that purpose), k the number of attributes and n the number of FDs in Σ.

3.2 Partial covers

The set of all canonical covers of Σ forms a simple hypergraph on the FDs in Σ∗a. We may thus use the terms defined for hypergraphs for canonical covers as well. In particular, we shall talk about autonomous sets of FDs, and (partial) superedges. Note that in this context the superedges are the atomic covers, while the edges are the canonical covers.

Definition 15 We call a set of FDs in Σ∗a autonomous if it is autonomous for the hypergraph CC(Σ). When talking about transversals, we always mean transversals of CC(Σ).

Lemma 9 A set G ⊆ Σ∗a is a cover of Σ iff it intersects with all minimal transversals of CC(Σ).

Proof G is a cover iff it is a superedge of CC(Σ). Furthermore, CC(Σ) is simple, and by Lemma 1 the edges of a simple hypergraph are the minimal sets which intersect with all minimal transversals. Thus superedges are simply sets (not necessarily minimal) which intersect with all minimal transversals.

As superedges become (atomic) covers for the hypergraph CC(Σ), partial superedges become partial covers.

Definition 16 Let Σ be a set of FDs and G ⊆ S ⊆ Σ∗a. We call G a partial cover of Σ on S if G is a partial superedge of CC(Σ) on S.

When S is autonomous, testing whether a set of FDs is a partial cover on S is easy:

Lemma 10 Let S ⊆ Σ∗a be autonomous, and let Σ′ ⊆ Σ∗a be an atomic cover of Σ. Then a set G ⊆ S is a partial cover on S iff G ∪ (Σ′ \ S) is a cover of Σ.

Proof Follows directly from Lemma 8.

Clearly G ∪ (Σ′ \ S) is a cover of Σ iff G ∪ (Σ′ \ S) ⊨ Σ′ ∩ S, which allows us to perform this test quickly.
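The test of Lemma 10 might be sketched as follows. The sketch assumes that S is autonomous and that singular FDs are modelled as pairs (frozenset of attributes, attribute); the names `fd`, `closure` and `is_partial_cover` are hypothetical. The data are the FDs of Example 8, with S the equivalence classes EQAD and EQAE identified there.

```python
def fd(lhs, rhs):
    """Build an FD from a string of one-letter attribute names (hypothetical helper)."""
    return frozenset(lhs), rhs


def closure(attrs, fds):
    """Attribute closure: fire applicable FDs until nothing changes."""
    result, fds = set(attrs), list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if rhs not in result and lhs <= result:
                result.add(rhs)
                changed = True
    return result


def is_partial_cover(g, s, atomic_cover):
    """Check whether G together with the cover's FDs outside S implies the cover's FDs inside S.

    Assumes S is autonomous and `atomic_cover` is an atomic cover.
    """
    g, s, atomic_cover = set(g), set(s), set(atomic_cover)
    premises = g | (atomic_cover - s)
    return all(rhs in closure(lhs, premises) for lhs, rhs in atomic_cover & s)


# Atomic cover and two equivalence classes taken from Example 8.
sigma = {fd("AB", "C"), fd("AC", "B"), fd("AD", "C"), fd("AE", "C"), fd("BE", "A")}
eq_ad = {fd("AD", "C"), fd("AD", "B")}
eq_ae = {fd("AE", "B"), fd("AE", "C"), fd("BE", "A"), fd("BE", "C")}
```

Each test costs one closure computation per FD in Σ′ ∩ S, which is why the check is cheap once an atomic cover is at hand.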

We will identify some autonomous (but not necessarily minimal) sets of CC(Σ). Theorem 4 relates autonomous sets to the minimal transversals of CC(Σ). The following lemmas establish some results about the form of these minimal transversals.

Lemma 11 Let S ⊆ Σ∗a be a minimal transversal of CC(Σ) and X → A ∈ S. Then S̄ = Σ∗a \ S is not a cover of Σ, but S̄ ∪ {X → A} is.

Proof By Lemma 9, S̄ is not a cover of Σ since it does not intersect with S. If S̄ ∪ {X → A} were not a cover, then every cover would contain a FD in

\[ \overline{\overline{S} \cup \{X \to A\}} = S \setminus \{X \to A\} \]

Thus S \ {X → A} would be a transversal, which contradicts the minimality of S.

Definition 17 The sets of attributes X and Y are equivalent under a set of FDs Σ, written X ↔ Y, if X → Y and Y → X lie in Σ∗.

Lemma 12 Let X → A, Y → B be contained in a common minimal transversal S ⊆ Σ∗a of CC(Σ). Then X and Y are equivalent under S̄ = Σ∗a \ S.

Proof By Lemma 11 we have

\[ \overline{S} \nvDash Y \to B, \qquad \overline{S} \cup \{X \to A\} \vDash Y \to B \tag{3} \]

Let us denote the closure of Y under S̄ by Y∗ with subscript S̄. If X is not contained in this closure then

\[ Y^*_{\overline{S}} = Y^*_{\overline{S} \cup \{X \to A\}} \]

which contradicts (3). Thus S̄ ⊨ Y → X, and by symmetry S̄ ⊨ X → Y.

Definition 18 Let Σ be a set of FDs on R. We denote the set of FDs in Σ∗a with LHS equivalent to X ⊆ R as

\[ EQ_X := \{Y \to Z \in \Sigma^{*a} \mid Y \leftrightarrow X\} \]

The partition of Σ∗a into non-empty equivalence sets is denoted as

\[ EQ := \{EQ_X \mid \exists Y.\ X \to Y \in \Sigma^{*a}\} \]
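Since two attribute sets are equivalent exactly when they have the same closure, the partition EQ can be obtained by grouping the atomic FDs by the closure of their LHS. The following sketch illustrates this on the FDs of Example 8; the representation of FDs as pairs and the names `fd`, `closure` and `equivalence_classes` are assumptions made for illustration only.

```python
from collections import defaultdict


def fd(lhs, rhs):
    """Build an FD from a string of one-letter attribute names (hypothetical helper)."""
    return frozenset(lhs), rhs


def closure(attrs, fds):
    """Attribute closure: fire applicable FDs until nothing changes."""
    result, fds = set(attrs), list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if rhs not in result and lhs <= result:
                result.add(rhs)
                changed = True
    return result


def equivalence_classes(atomic_fds, sigma):
    """Group atomic FDs whose LHSs have the same closure under sigma."""
    classes = defaultdict(set)
    for lhs, rhs in atomic_fds:
        classes[frozenset(closure(lhs, sigma))].add((lhs, rhs))
    return {frozenset(members) for members in classes.values()}


# FDs and atomic closure as given in Example 8.
sigma = {fd("AB", "C"), fd("AC", "B"), fd("AD", "C"), fd("AE", "C"), fd("BE", "A")}
sigma_a = sigma | {fd("AE", "B"), fd("AD", "B"), fd("BE", "C")}
eq = equivalence_classes(sigma_a, sigma)
```

For this input the sketch yields three classes, matching EQAB, EQAD and EQAE of Example 8.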

Theorem 9 Let Σ be a set of FDs on R. Then every set EQX ∈ EQ is autonomous.

Proof By Lemma 12 all FDs in a (maximal) connected component of Tr(CC(Σ)) have equivalent LHSs under Σ. Thus EQX is the union of vertex sets of maximal connected components of Tr(CC(Σ)), and therefore an isolated set of Tr(CC(Σ)). By Theorem 4 isolated sets of Tr(CC(Σ)) are autonomous for CC(Σ).

From this we can quickly obtain the following:

Theorem 10 Let Σ be a set of FDs on R. A set G ⊆ Σ∗a is a cover of Σ iff G ∩ EQX is a partial cover of Σ on EQX for every EQX ∈ EQ.

Proof By Theorem 9 the sets EQX form a partition of Σ∗a into autonomous sets, so the theorem is a special case of Lemma 7.

3.3 Relative covers

Partial covers on a set S of atomic FDs are obtained by taking an atomic cover and intersecting it with S. A different concept which will become useful is that of relative covers, which can be obtained through “relativation” of covers onto sets of attributes.

Definition 19 Let Σ be a set of FDs on R, and ΣH be a set of FDs on H ⊆ R, H̄ := R \ H. We call ΣH a relative cover of Σ on H if for all X, Y ⊆ H we have

\[ \Sigma_H \vDash X \to Y \iff \Sigma \vDash X \cup \overline{H} \to Y \]

Relative covers have been used previously by Saiedian and Spencer in [17] under the name contraction. We will be using them in the context of implication dependencies, which are functional dependencies over an attribute set of functional dependencies, describing implication between them (see Section 3.4). Here the set H̄ will consist of FDs already known from “elsewhere”.

Definition 20 The relativation of a FD X → Y onto an attribute set H is

\[ (X \to Y)\,]H[ \; := \; X \cap H \to Y \cap H \]

The relativation of a set Σ of FDs onto H is

\[ \Sigma\,]H[ \; := \; \{ (X \to Y)\,]H[ \; \mid X \to Y \in \Sigma \} \]

Note that we do allow FDs with empty LHS. They arise naturally when relativating sets of FDs with non-empty LHSs.

We will show that relative covers can be constructed through relativation.

Lemma 13 Let ΣH be a relative cover of Σ on H. Then

\[ \Sigma_H^{*} = \Sigma^{*}\,]H[ \]

Proof By definition we have

\[ \Sigma^{*}\,]H[ \; = \{X \cap H \to Y \cap H \mid X \to Y \in \Sigma^{*}\} \]

Thus for any X, Y ⊆ H we get

\begin{align*}
X \to Y \in \Sigma^{*}\,]H[ \; &\iff \exists X' \subseteq \overline{H} \text{ with } X \cup X' \to Y \in \Sigma^{*} \\
&\iff X \cup \overline{H} \to Y \in \Sigma^{*} \\
&\iff X \to Y \in \Sigma_H^{*}
\end{align*}

The last correspondence holds since ΣH is a relative cover.

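Lemma 14 Let Σ be a set of FDs on R and H ⊆ R.

(a) If Σ ⊨ X → Y then Σ ]H[ ⊨ X ∩ H → Y ∩ H
(b) If Σ ]H[ ⊨ X → Y then Σ ⊨ X ∪ H̄ → Y

Proof

(a) If X → Y ∈ Σ then X ∩ H → Y ∩ H ∈ Σ ]H[. Otherwise X → Y can be derived from Σ using the Armstrong Axioms (1). We show that X ∩ H → Y ∩ H can be derived from Σ ]H[ by induction on the length of the derivation tree used to derive X → Y. This is straightforward, since each axiom application in a derivation from Σ (left) translates into an application of the same axiom in a derivation from Σ ]H[ (right):

\[
\begin{array}{ccc}
\text{derivation from } \Sigma & & \text{derivation from } \Sigma\,]H[ \\[6pt]
\dfrac{}{X \to Y}\;\; Y \subseteq X & \leadsto & \dfrac{}{X \cap H \to Y \cap H}\;\; Y \cap H \subseteq X \cap H \\[10pt]
\dfrac{X \to Y}{XW \to YW} & \leadsto & \dfrac{X \cap H \to Y \cap H}{XW \cap H \to YW \cap H} \\[10pt]
\dfrac{X \to Y \qquad Y \to Z}{X \to Z} & \leadsto & \dfrac{X \cap H \to Y \cap H \qquad Y \cap H \to Z \cap H}{X \cap H \to Z \cap H}
\end{array}
\]

(b) If X → Y ∈ Σ ]H[ then Σ contains a FD X ∪ X′ → Y ∪ Y′ with X′, Y′ ⊆ H̄, which implies X ∪ H̄ → Y. The remaining argument proceeds as for (a).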

Lemma 15 Let Σ be a set of FDs on R and H ⊆ R. Then

\[ \Sigma^{*}\,]H[ \; = (\Sigma\,]H[)^{*} \]

Proof We can show Σ∗ ]H[ ⊆ (Σ ]H[)∗ as follows:

\begin{align*}
X' \to Y' \in \Sigma^{*}\,]H[ \; &\iff \exists X \to Y \in \Sigma^{*} \text{ with } X' = X \cap H,\ Y' = Y \cap H \\
&\iff \Sigma \vDash X \to Y \\
\text{(Lemma 14a)} \quad &\implies \Sigma\,]H[ \; \vDash X \cap H \to Y \cap H \\
&\iff X' \to Y' \in (\Sigma\,]H[)^{*}
\end{align*}

The proof for (Σ ]H[)∗ ⊆ Σ∗ ]H[ is similar:

\begin{align*}
X \to Y \in (\Sigma\,]H[)^{*} &\iff \Sigma\,]H[ \; \vDash X \to Y \\
\text{(Lemma 14b)} \quad &\implies \Sigma \vDash X \cup \overline{H} \to Y \\
&\iff X \cup \overline{H} \to Y \in \Sigma^{*} \\
&\implies X \to Y \in \Sigma^{*}\,]H[
\end{align*}

Using Lemmas 13 and 15 we get:

Theorem 11 Let Σ be a set of FDs on R and H ⊆ R. Then the relativation Σ ]H[ of Σ is a relative cover of Σ on H.

Proof Let ΣH be a relative cover of Σ on H. We need to show that Σ∗H = (Σ ]H[)∗. This is clear by Lemmas 13 and 15:

\[ \Sigma_H^{*} = \Sigma^{*}\,]H[ \; = (\Sigma\,]H[)^{*} \]

Note that a variant of Theorem 11 has been shown previously: It corresponds to Lemma 6 in [17]. Note further that the converse of Theorem 11 does not hold: not every relative cover is the relativation of a cover.

Example 6 Consider the set of FDs

\[ \Sigma = \{AB \to C,\ C \to D,\ B \to D\}. \]

Its relativation onto H = BCD is a relative cover of Σ on H:

\[ \Sigma\,]H[ \; = \{B \to C,\ C \to D,\ B \to D\} \]

The FD B → D is redundant in Σ ]H[ so that the set

\[ \Sigma_H := \{B \to C,\ C \to D\} \]

is also a relative cover of Σ on H. However, every cover of Σ contains B → D or B → BD, so ΣH cannot be the relativation of a cover of Σ.
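Relativation and the relative cover property of Definition 19 can be checked mechanically on Example 6. The sketch below is an illustration only: FDs are modelled as pairs of frozensets (LHS, RHS), the relative cover test simply enumerates all X ⊆ H and compares closures, and the names `fd`, `closure`, `relativate` and `is_relative_cover` are hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD with set-valued RHS from strings of one-letter attributes."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure for FDs with set-valued right-hand sides."""
    result, fds = set(attrs), list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def relativate(fds, h):
    """Intersect both sides of every FD with H (Definition 20)."""
    h = frozenset(h)
    return {(lhs & h, rhs & h) for lhs, rhs in fds}


def is_relative_cover(sigma_h, sigma, h, r):
    """Brute-force check of Definition 19 by comparing closures for every X within H."""
    h, h_bar = frozenset(h), frozenset(r) - frozenset(h)
    for size in range(len(h) + 1):
        for x in combinations(sorted(h), size):
            x = frozenset(x)
            if closure(x, sigma_h) & h != closure(x | h_bar, sigma) & h:
                return False
    return True


sigma = {fd("AB", "C"), fd("C", "D"), fd("B", "D")}
rel = relativate(sigma, "BCD")
reduced = {fd("B", "C"), fd("C", "D")}
```

The enumeration is exponential in |H| and is meant for small examples such as this one, not as a practical test.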

3.4 Implication dependencies

In Section 3.1 we have seen how minimal keys can be computed efficiently. The problem of finding canonical covers is similar: Instead of looking for minimal sets of attributes which determine all other attributes, we are interested in minimal sets of atomic FDs which imply all other atomic FDs. However, in the case of the key finding problem, a set of FDs was used to describe determination between attribute sets, whereas implication of FDs is given implicitly. To utilize our linear resolution algorithm, we need to make implications explicit. This is done as follows.

Definition 21 Let Σ be a set of FDs. We call an expression of the form S ⇒ T where S, T ⊆ Σ an implication dependency (ID).

An ID S ⇒ T is the equivalent of a FD S → T over Σ, where Σ is regarded as attribute set (i.e., we regard the FDs in Σ as independent attributes without any connection). We thus use the terminology defined for FDs for IDs as well, assuming an equivalent definition. In particular, we say that a set Γ of IDs implies an ID S ⇒ T iff S ⇒ T can be derived from Γ using the equivalent of the Armstrong axioms (with → replaced by ⇒).

Definition 22 Let Σ be a set of FDs. We call a set Γ of IDs on Σ an implication cover of Σ if for all sets S, T ⊆ Σ we have

\[ S \Rightarrow T \in \Gamma^{*} \text{ iff } S \vDash T \]

We call Γ sound if S ⇒ T ∈ Γ∗ implies S ⊨ T, and complete for Σ if S ⊨ T implies S ⇒ T ∈ Γ∗. Furthermore, we call ΓH a relative implication cover on H ⊆ Σ if for all S, T ⊆ H we have

\[ S \Rightarrow T \in \Gamma_H^{*} \text{ iff } S \cup (\Sigma \setminus H) \vDash T \]

Note that the relationship of implication covers and relative implication covers is the same as for covers and relative covers: For any implication cover Γ the condition

\[ S \Rightarrow T \in \Gamma_H^{*} \text{ iff } S \cup (\Sigma \setminus H) \vDash T \]

is equivalent to

\[ \Gamma_H \vDash S \Rightarrow T \text{ iff } \Gamma \vDash S \cup (\Sigma \setminus H) \Rightarrow T \]

which corresponds precisely to Definition 19. In particular this gives us Theorem 11 for implication covers:

Corollary 2 Let Σ be a set of FDs with implication cover Γ and H ⊆ Σ. Then Γ ]H[ is a relative implication cover of Σ on H.

Let us now recall Lemma 6. For the hypergraph CC(Σ) it states the following:

Corollary 3 Let Σ be a set of FDs and H ⊆ Σ∗a. A set S ⊆ H is a partial cover on H iff S ∪ (Σ∗a \ H) is a cover of Σ.

The last condition of Corollary 3 is equivalent to S ∪ (Σ∗a \ H) ⊨ Σ∗a, and thus to

\[ S \cup (\Sigma^{*a} \setminus H) \vDash H \]

Comparing this to Definition 22, we can rewrite the condition as

\[ S \Rightarrow H \in \Gamma_H^{*} \]

for any relative implication cover ΓH of Σ∗a on H. This characterizes S as a key of H w.r.t. the set of IDs ΓH, giving us the following lemma.

Lemma 16 Let S ⊆ H ⊆ Σ∗a, and ΓH be a relative implication cover of Σ∗a on H. Then S is a partial cover of Σ∗a on H iff S ⇒ H ∈ Γ∗H.

Proof See above.

The last lemma allows us to find partial canonical covers as follows: We first find a relative implication cover ΓH, then use linear resolution to find all minimal keys w.r.t. ΓH, which are the partial canonical covers needed.

The next theorem allows us to compute a relative implication cover. To make the soundness proof for a (relative) implication cover easier, we first show a simple lemma.

Lemma 17 A set Γ of IDs on Σ is sound iff S ⊨ T for all S ⇒ T ∈ Γ. A set ΓH of IDs on H ⊆ Σ is sound iff S ∪ (Σ \ H) ⊨ T for all S ⇒ T ∈ ΓH.

Proof We only need to show that S ⊨ T for all derived IDs S ⇒ T ∈ Γ∗, and S ∪ (Σ \ H) ⊨ T for IDs S ⇒ T ∈ Γ∗H. For each application of the Armstrong axioms for IDs, it is easy to see that soundness of the premises (i.e., S ⊨ T or S ∪ (Σ \ H) ⊨ T, respectively) implies soundness of the derived IDs.

Theorem 12 Let Σ be a set of FDs, EQ the partition of Σ∗a into equivalence classes, and H = EQX for some set EQX ∈ EQ. Construct ΓX as follows: For every pair of (different) FDs Y → A ∈ H, Z → A ∈ Σ∗a with Σ ⊨ Y → Z let ΓX contain the ID

\[ \{Y \to Z_i \in H \mid Z_i \in Z\} \cup \{Z \to A\} \Rightarrow Y \to A \tag{4} \]

provided Z → A ∈ H, or

\[ \{Y \to Z_i \in H \mid Z_i \in Z\} \Rightarrow Y \to A \tag{5} \]

otherwise. Then ΓX is a relative implication cover of Σ∗a on H.

Proof We first show that ΓX is sound. By Lemma 17 it suffices to show that for every ID in ΓX of the form (4) or (5) we have

\[ \{Y \to Z_i \in H \mid Z_i \in Z\} \cup \{Z \to A\} \cup (\Sigma^{*a} \setminus H) \vDash Y \to A \tag{6} \]

For every Zi ∈ Z \ Y there is a minimal Yi ⊂ Y such that Yi → Zi ∈ Σ∗a. If Yi ≠ Y, then Yi and Y are not equivalent, since Y is a minimal LHS. Thus Yi → Zi ∈ Σ∗a \ H, so that all Yi → Zi are contained in the LHS of (6). Clearly {Yi → Zi | Zi ∈ Z \ Y} ∪ {Z → A} implies Y → A.

To prove that ΓX is complete, let S, T ⊆ H with S ∪ (Σ∗a \ H) ⊨ T. We need to show that S ⇒ T ∈ Γ∗X. Assume the contrary, so that for U′ := S ∪ (Σ∗a \ H) there exists a FD Y′ → A′ ∈ T (and thus Y′ → A′ ∈ H) with (note that U′ ∩ H = S):

\[ U' \vDash Y' \to A' \quad \text{and} \quad U' \cap H \Rightarrow Y' \to A' \notin \Gamma_X^{*} \]

Now let U ⊆ U′ be minimal such that there exists a FD Y → A ∈ H for which

\[ U \vDash Y \to A \quad \text{and} \quad U \cap H \Rightarrow Y \to A \notin \Gamma_X^{*} \]

Consider closure computation for Y under U: Since we have U ⊨ Y → A there must be a FD Z → A ∈ U such that U_A := U \ {Z → A} implies Y → Z. Equivalently U_A ⊨ Y → Zi for all Zi ∈ Z. Since U_A ⊊ U and U was chosen minimal, we get

\[ U_A \cap H \Rightarrow Y \to Z_i \in \Gamma_X^{*} \]

for all Y → Zi ∈ H, Zi ∈ Z. Since U_A ∩ H ⊆ U′ ∩ H = S this gives us

\[ S \Rightarrow \{Y \to Z_i \in H \mid Z_i \in Z\} \in \Gamma_X^{*} \]

If Z → A ∈ H then Z → A ∈ U ∩ H ⊆ S, and since ΓX contains the ID (4) it follows that

\[ S \Rightarrow Y \to A \in \Gamma_X^{*} \]

which contradicts our assumption. For Z → A ∉ H the same follows with the ID (5).
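The construction of Theorem 12 can be illustrated with a short sketch. It is a direct, unoptimized transcription under assumptions: singular FDs are pairs (frozenset of attributes, attribute), an ID is a pair (frozenset of FDs, FD), and the names `fd`, `closure` and `relative_implication_cover` are hypothetical. The data are the FDs and the atomic closure of Example 8.

```python
def fd(lhs, rhs):
    """Build an FD from a string of one-letter attribute names (hypothetical helper)."""
    return frozenset(lhs), rhs


def closure(attrs, fds):
    """Attribute closure: fire applicable FDs until nothing changes."""
    result, fds = set(attrs), list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if rhs not in result and lhs <= result:
                result.add(rhs)
                changed = True
    return result


def relative_implication_cover(h, atomic_fds, sigma):
    """Build the IDs of form (4) and (5) for an equivalence class H."""
    h, ids = set(h), set()
    for y, a in h:
        y_closure = closure(y, sigma)
        for z, a2 in atomic_fds:
            # need a different FD Z -> A with the same RHS and Y -> Z implied
            if a2 != a or z == y or not z <= y_closure:
                continue
            premise = {(y, zi) for zi in z if (y, zi) in h}
            if (z, a) in h:
                premise.add((z, a))            # form (4); otherwise form (5)
            ids.add((frozenset(premise), (y, a)))
    return ids


sigma = {fd("AB", "C"), fd("AC", "B"), fd("AD", "C"), fd("AE", "C"), fd("BE", "A")}
sigma_a = sigma | {fd("AE", "B"), fd("AD", "B"), fd("BE", "C")}
eq_ab = {fd("AB", "C"), fd("AC", "B")}
eq_ad = {fd("AD", "C"), fd("AD", "B")}
eq_ae = {fd("AE", "B"), fd("AE", "C"), fd("BE", "A"), fd("BE", "C")}
gamma_ae = relative_implication_cover(eq_ae, sigma_a, sigma)
```

For the three equivalence classes of Example 8 the sketch produces zero, two and five IDs respectively, in agreement with the relative implication covers listed there.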

The size of the relative implication cover of EQX constructed is clearly polynomial in the size of Σ∗a. We note that using Theorem 10 to split up the problem of finding canonical covers into finding partial canonical covers for equivalence classes of FDs is helpful in two ways. First it allows us to represent the set of all canonical covers in an efficient manner. At the same time it simplifies the problem of finding implication covers by allowing for small relative implication covers. As Example 7 demonstrates, it can happen that every implication cover of Σ∗a is exponential in the size of Σ∗a.

Example 7 Let X1 . . . X2n, Y1 . . . Yn, A be attributes and

\[ X = X_1 \ldots X_{2n}, \quad \overline{X}_i = X \setminus X_i, \quad Y = Y_1 \ldots Y_n, \quad \overline{Y}_i = Y \setminus Y_i \]

be attribute sets. Let further

\[
\Sigma = \left\{
\begin{array}{ll}
\overline{X}_1 \to Y_1, & \overline{X}_2 \to Y_1, \\
\overline{X}_3 \to Y_2, & \overline{X}_4 \to Y_2, \\
\ldots & \ldots \\
\overline{X}_{2n-1} \to Y_n, & \overline{X}_{2n} \to Y_n, \\
Y \to A &
\end{array}
\right\}
\]

and thus (note that X̄i ∪ X̄j = X for i ≠ j)

\[
\Sigma^{*a} = \Sigma \cup \left\{
\begin{array}{ll}
\overline{X}_1 \overline{Y}_1 \to A, & \overline{X}_2 \overline{Y}_1 \to A, \\
\overline{X}_3 \overline{Y}_2 \to A, & \overline{X}_4 \overline{Y}_2 \to A, \\
\ldots & \ldots \\
\overline{X}_{2n-1} \overline{Y}_n \to A, & \overline{X}_{2n} \overline{Y}_n \to A, \\
X \to A &
\end{array}
\right\}
\]

Then every implication cover of Σ∗a contains (at least) the 2^n atomic IDs

\[ \{\overline{Z}_1 \overline{Y}_1 \to A,\ \overline{Z}_2 \to Y_2,\ \overline{Z}_3 \to Y_3,\ \ldots,\ \overline{Z}_n \to Y_n\} \Rightarrow X \to A \]

where each Z̄i is replaced by X̄2i−1 or X̄2i. This is because the FDs in the LHSs do not imply any other FD in Σ∗a \ {X → A}.

3.5 The algorithm

We summarize the algorithm developed below.

Algorithm “divide and resolve”
INPUT: set of FDs Σ
OUTPUT: set of all partial canonical covers CCX for every equivalence class EQX of Σ∗a

compute Σ∗a
partition Σ∗a into equivalence classes EQ
for each EQX ∈ EQ do
    construct relative implication cover ΓX of EQX
    CCX := {minimal keys of EQX w.r.t. ΓX}
end

The sets CCX are the partial canonical covers of Σ on EQX by Lemma 16, and together they represent CC(Σ) as described in Theorem 10.

Note that the partition of Σ∗a into autonomous sets EQ might not be minimal. We will see later in Section 4 that deciding whether a set of FDs is autonomous for CC(Σ) is co-NP-complete, even when given Σ∗a. However, given all minimal autonomous sets, testing whether a set is autonomous can be done in polynomial time by Theorem 2. Thus, unless P=NP, finding the minimal autonomous sets will not be possible in polynomial time. We contented ourselves with the non-minimal partition EQ since it was easy to identify and fast to compute. In Section 3.6 we will discuss an efficient method for computing a finer partition into autonomous sets, although these sets may not be minimal either.

If we want to find the minimal autonomous partition after the sets CCX have been computed, e.g., to store CC(Σ) more efficiently, we can partition the hypergraphs CCX further using the “Recursive Autonomous Partitioning” algorithm from Section 2.

The following example calculation illustrates the algorithm “divide and resolve”.

Example 8 Our goal is to compute all canonical covers for the set of FDs

\[ \Sigma = \{AB \to C,\ AC \to B,\ AD \to C,\ AE \to C,\ BE \to A\} \]

We start by computing the atomic closure Σ∗a of Σ

\[ \Sigma^{*a} = \Sigma \cup \{AE \to B,\ AD \to B,\ BE \to C\} \]

and partitioning Σ∗a into equivalence classes EQ = {EQAB, EQAD, EQAE} with

\begin{align*}
EQ_{AB} &= \{AB \to C,\ AC \to B\} \\
EQ_{AD} &= \{AD \to C,\ AD \to B\} \\
EQ_{AE} &= \{AE \to B,\ AE \to C,\ BE \to A,\ BE \to C\}
\end{align*}

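We then construct the relative implication cover for each equivalence class:

\begin{align*}
\Gamma_{AB} &= \emptyset \\
\Gamma_{AD} &= \left\{ \begin{array}{l} \{AD \to B\} \Rightarrow \{AD \to C\}, \\ \{AD \to C\} \Rightarrow \{AD \to B\} \end{array} \right\} \\
\Gamma_{AE} &= \left\{ \begin{array}{l} \{AE \to B\} \Rightarrow \{AE \to C\}, \\ \{AE \to B,\ BE \to C\} \Rightarrow \{AE \to C\}, \\ \{AE \to C\} \Rightarrow \{AE \to B\}, \\ \{BE \to A,\ AE \to C\} \Rightarrow \{BE \to C\}, \\ \{BE \to A\} \Rightarrow \{BE \to C\} \end{array} \right\}
\end{align*}

The partial canonical covers can now be computed as minimal keys w.r.t. the relative implication covers:

\begin{align*}
CC_{AB} &= \{\{AB \to C,\ AC \to B\}\} \\
CC_{AD} &= \{\{AD \to B\},\ \{AD \to C\}\} \\
CC_{AE} &= \{\{AE \to B,\ BE \to A\},\ \{AE \to C,\ BE \to A\}\}
\end{align*}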
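The step from a relative implication cover to the partial canonical covers can be checked on this example. The sketch below deliberately swaps in a brute-force subset enumeration for the linear resolution used in the paper, which is only reasonable for tiny classes. FDs are treated as opaque string labels (they play the role of attributes here), IDs as pairs (frozenset of labels, label), and the names `id_closure` and `minimal_keys` are hypothetical.

```python
from itertools import combinations


def id_closure(fds, ids):
    """Close a set of FD labels under a set of implication dependencies."""
    result = set(fds)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in ids:
            if conclusion not in result and premise <= result:
                result.add(conclusion)
                changed = True
    return result


def minimal_keys(h, ids):
    """All minimal subsets of H whose closure under `ids` is H (brute force)."""
    h = sorted(h)
    keys = []
    for size in range(len(h) + 1):             # smaller candidates first
        for cand in combinations(h, size):
            cand = frozenset(cand)
            if any(key <= cand for key in keys):
                continue                        # a proper subset is already a key
            if id_closure(cand, ids) >= set(h):
                keys.append(cand)
    return keys


eq_ae = {"AE->B", "AE->C", "BE->A", "BE->C"}
gamma_ae = [
    (frozenset({"AE->B"}), "AE->C"),
    (frozenset({"AE->B", "BE->C"}), "AE->C"),
    (frozenset({"AE->C"}), "AE->B"),
    (frozenset({"BE->A", "AE->C"}), "BE->C"),
    (frozenset({"BE->A"}), "BE->C"),
]
cc_ae = set(minimal_keys(eq_ae, gamma_ae))
cc_ab = set(minimal_keys({"AB->C", "AC->B"}, []))
```

Note that BE → A appears in every key because no ID has it as conclusion, which is why both partial canonical covers in CCAE contain it.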

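Together this gives us all four canonical covers in

\begin{align*}
CC(\Sigma) &= CC_{AB} \vee CC_{AD} \vee CC_{AE} \\
&= \left\{ \begin{array}{l}
\{AB \to C,\ AC \to B,\ AD \to B,\ AE \to B,\ BE \to A\}, \\
\{AB \to C,\ AC \to B,\ AD \to B,\ AE \to C,\ BE \to A\}, \\
\{AB \to C,\ AC \to B,\ AD \to C,\ AE \to B,\ BE \to A\}, \\
\{AB \to C,\ AC \to B,\ AD \to C,\ AE \to C,\ BE \to A\}
\end{array} \right\}
\end{align*}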
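The decomposed representation can be expanded into the full list of canonical covers by taking one partial cover from each class and forming the union. A minimal sketch, with FDs as plain string labels and the hypothetical helper name `combine`:

```python
from itertools import product


def combine(*families):
    """Pick one partial cover per family and unite them, for every combination."""
    return [frozenset().union(*choice) for choice in product(*families)]


cc_ab = [frozenset({"AB->C", "AC->B"})]
cc_ad = [frozenset({"AD->B"}), frozenset({"AD->C"})]
cc_ae = [frozenset({"AE->B", "BE->A"}), frozenset({"AE->C", "BE->A"})]

canonical_covers = combine(cc_ab, cc_ad, cc_ae)
```

The decomposed form stores 1 + 2 + 2 partial covers, while the expanded form has 1 · 2 · 2 = 4 covers; the gap between sum and product is what makes the decomposed representation compact when the number of canonical covers is large.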

3.6 Improvements and complexity analysis

When using linear resolution to find CCX, the most time consuming step is eliminating extraneous attributes from the LHS of an ID, which here amounts to eliminating redundant FDs from a partial atomic cover on EQX. When minimizing the LHS of a FD, we remove one attribute A at a time. We then check whether A is still determined by the remaining attributes by computing their closure, using the FDs in Σ. The corresponding approach for an ID L ⇒ EQX would be to remove one FD Y → A from its LHS L, and compute the closure of the remaining FDs L \ {Y → A} w.r.t. the relative implication cover ΓX.

However, since ΓX can be large compared to Σ, it is usually more efficient to check whether L \ {Y → A} is still a partial cover using Lemma 10. Since we know that L is a partial cover, we only need to check whether Y → A is implied by

\[ (L \setminus \{Y \to A\}) \cup (\Sigma \setminus EQ_X) \]

This gives us the following simple procedure:

proc minimize-partial-cover(L, EQX, Σ)
    for all Y → A ∈ L do
        L′ := (L \ {Y → A}) ∪ (Σ \ EQX)
        Y∗L′ := closure of Y w.r.t. L′
        if A ∈ Y∗L′ then
            L := L \ {Y → A}
    end
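The procedure translates almost line by line into Python. The sketch below assumes that Σ is given as an atomic cover (as required by Lemma 10), models singular FDs as pairs (frozenset of attributes, attribute), and fixes an arbitrary but deterministic order in which FDs are tried; the names `fd`, `closure` and `minimize_partial_cover` are hypothetical.

```python
def fd(lhs, rhs):
    """Build an FD from a string of one-letter attribute names (hypothetical helper)."""
    return frozenset(lhs), rhs


def closure(attrs, fds):
    """Attribute closure: fire applicable FDs until nothing changes."""
    result, fds = set(attrs), list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if rhs not in result and lhs <= result:
                result.add(rhs)
                changed = True
    return result


def minimize_partial_cover(l, eq_x, sigma):
    """Remove FDs from the partial cover L that the rest already implies."""
    l = set(l)
    outside = set(sigma) - set(eq_x)           # FDs of the cover outside EQ_X
    for lhs, rhs in sorted(l, key=lambda f: (sorted(f[0]), f[1])):
        rest = (l - {(lhs, rhs)}) | outside
        if rhs in closure(lhs, rest):
            l.discard((lhs, rhs))              # Y -> A is redundant, drop it
    return l


# Data from Example 8; sigma is itself an atomic cover.
sigma = {fd("AB", "C"), fd("AC", "B"), fd("AD", "C"), fd("AE", "C"), fd("BE", "A")}
eq_ae = {fd("AE", "B"), fd("AE", "C"), fd("BE", "A"), fd("BE", "C")}
eq_ad = {fd("AD", "C"), fd("AD", "B")}
```

Starting from the whole class EQAE, this order of removals ends in {AE → C, BE → A}, one of the two partial canonical covers in CCAE; a different order could end in the other one.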

To establish an upper bound for the complexity of the “divide and resolve” approach, we use the variables

\[ f = |\Sigma^{*a}|, \quad n = |\Sigma|, \quad k = |R| \]

as defined earlier, as well as

\[ c = \max\,\{|CC_X| \mid EQ_X \in EQ\} \]

i.e., c is the maximum number of partial canonical covers over all EQX.

Computing Σ∗a can be done in O(f · k²n²). Partitioning Σ∗a can be done in O(f · kn) by computing the closure of each LHS. Constructing all ΓX takes O(f² · k²) using the closures computed before. Computing the sets CCX by linear resolution without the optimization described above would lead to a complexity of O(c · f⁶), using that |ΓX| is bounded by f². Using the partial cover test instead leads to a worst-case time complexity of O(c · f⁴ · k), which can be argued as follows: The number of LHS minimizations is bounded by c · ∑ |ΓX| ≤ c · f², and each minimization requires at most f redundancy tests. Each redundancy test requires one closure computation relative to a subset of Σ∗a, which can be performed in O(f · k). The overall computation time is therefore bounded by the term

\[ O(c \cdot f^4 \cdot k + f \cdot k^2 n^2 + f^2 \cdot k^2) \]

If we assume that n, k are bounded by f, which holds in all but some “lucky” cases (“lucky” because this means f is very small indeed), this can be simplified to

\[ O(c \cdot f^4 \cdot k) \]

As a final improvement, we wish to find a partition of Σ∗a into autonomous sets which is finer than EQ. For brevity we will just refer to the work of Saiedian and Spencer in [17]. Their algorithm finds autonomous sets for the hypergraph of all minimal keys. By applying it to our partial implication covers, we obtain autonomous sets for the hypergraph of all canonical covers which are often finer than the equivalence classes in EQ. This can be done in polynomial time, but the autonomous sets found here need not be minimal either.

4 An NP-hardness result

While we identified the equivalence classes EQX as autonomous sets of CC(Σ), they need not be minimal. In the following we will show that finding the minimal autonomous sets of CC(Σ) is difficult, given Σ.

Definition 23 We call an atomic FD X → A ∈ Σ∗a essential iff it appears in some canonical cover of Σ. Otherwise, we call it inessential.

We will first show that testing essentiality is NP-hard. We do so by reducing the following problem to it, which is known to be NP-complete [12]:

Problem “prime attribute”
Given a set Σ of FDs on schema R and an attribute A ∈ R, is A a prime attribute, i.e., does A lie in a minimal key of R?

Theorem 13 Given a set Σ of FDs, the problem of deciding whether a FD is essential is NP-complete.

Proof To verify that a FD is essential, we only need to guess the canonical cover containing it. By [5] the size of any canonical cover is polynomial in Σ, so the problem lies in NP.

We prove completeness by reducing the “prime attribute” problem to it. Let G be a set of FDs on R, and A ∉ R an additional attribute. We construct Σ as

\[ \Sigma := G \cup \{A \to A_i \mid A_i \in R\} \]

We claim that Ai is a prime attribute w.r.t. G iff A → Ai is essential w.r.t. Σ. Since A does not appear in any FD in G we have

\[ \Sigma^{*a} = G^{*a} \cup \{A \to A_i \mid A_i \in R\} \]

Now let Σ′ be any canonical cover of Σ, and let Σ′A := {A → Ai ∈ Σ′}. Clearly the FD A → R ∈ Σ∗ is implied by Σ′ iff A → K is implied by Σ′A for some minimal key K of R (w.r.t. G). This is the case iff Σ′A consists of exactly (since Σ′ is non-redundant) those FDs A → Ai for which Ai ∈ K. Thus A → Ai is essential iff Ai is prime.

From this, we can deduce that identifying the (minimal) autonomous sets of CC(Σ) is difficult as well.

Theorem 14 Given a set Σ of FDs and an atomic FD X → A ∈ Σ∗a, the problem of deciding whether the set {X → A} is autonomous for CC(Σ) is co-NP-complete.

Proof {X → A} is autonomous iff X → A appears in all or no canonical covers of Σ. Thus, if {X → A} is not autonomous, we only need to guess one canonical cover which contains X → A, and one which does not. By [5] the size of these canonical covers is polynomial in Σ. This shows that the problem lies in co-NP.

To show co-NP-hardness, we use that it is NP-hard to decide whether a FD X → A is essential. Let Σ′ be any canonical cover of Σ. Such a canonical cover Σ′ can be computed in polynomial time [14]. By definition, X → A is essential iff it appears in some, but not necessarily all canonical covers of Σ. We distinguish two cases.

(1) If X → A ∈ Σ′ then X → A is essential.
(2) If X → A ∉ Σ′ then X → A is essential iff it appears in some but not all canonical covers of Σ, i.e., iff {X → A} is not autonomous.

We thus reduced the NP-hard problem of deciding whether X → A is essential to the problem of deciding whether {X → A} is not autonomous.

The last theorem shows that the autonomous set problem is hard when given Σ. However, when we try to find autonomous sets of CC(Σ), we first compute Σ∗a, which can be exponential in the size of Σ. Thus it might be possible to decide whether a given set is autonomous in time polynomial in the size of Σ∗a. We show next that this is not the case.

Theorem 15 Given a set Σ of FDs, its atomic closure Σ∗a and an atomic FD X → A ∈ Σ∗a, the problem of deciding whether the set {X → A} is autonomous for CC(Σ) is co-NP-complete.

Sketch of Proof In [12] the NP-hardness of the “prime attribute” problem is shown by first reducing the “vertex cover” problem to the “key of cardinality m” problem, and then in turn to “prime attribute”. We will describe a slightly modified version of this reduction.

Let G be the graph for the vertex cover, Σcard the set of FDs for the “key of cardinality m” problem (denoted D[0]′ in [12]), and Σprime the set of FDs for the “prime attribute” problem (denoted D[0] in [12]). For the first reduction, Σcard is constructed as follows:

\[ \Sigma_{card} := \{N(v) \to v \mid v \text{ is vertex in } G\} \]

where N(v) is the neighborhood of v in G.

The second reduction is more complicated. We give a modified version next (the modification occurs in condition (ii) below), which can be shown to be correct as in [12]. Let A′ be the set of attributes occurring in Σcard, A′′ a set of cardinality m < |A′| and b a new attribute. The attribute set for the “prime attribute” problem is then A = A′ ∪ [A′′ × A′] ∪ {b}, and Σprime is constructed as follows:

(i) for E → F ∈ Σcard add {b} ∪ E → F to Σprime
(ii) for e ∈ A′ add {e} → A′′ × {e} to Σprime
(iii) for i ∈ A′′ and e ∈ A′ add {b, (i, e)} → {e} to Σprime
(iv) for i ∈ A′′ and e, f ∈ A′ distinct add {(i, e), (i, f)} → {b} to Σprime
(v) for e ∈ A′ add {e} → {b} to Σprime

Using these constructions, we want to establish some bounds on the size of the LHSs of FDs. This can be done by verifying the following statements.

– The size of the LHS of a FD in Σcard is the degree of the corresponding vertex in G
– The atomic closure of Σcard is Σcard itself
– The LHS of a FD in the atomic closure of Σprime is at most one larger than the LHS of a FD in the atomic closure of Σcard

The reductions from the “prime attribute” problem to the problems “essential FD” and then “autonomous set” do not increase the size of the LHSs of atomic FDs. Thus the maximal size of the LHSs of FDs in Σ∗a is at most one larger than the maximal degree of vertices in G. It has been shown in [4] that the vertex cover problem is still NP-hard for graphs of maximal degree 3. These cases reduce to instances of the “autonomous set” problem for which all FDs in Σ∗a contain at most 4 attributes in their LHS. But there exist fewer than k⁵ such FDs, where k is the number of different attributes occurring in Σ, so the size of Σ∗a is polynomial in the size of Σ, and can therefore be computed in polynomial time using linear resolution [8]. This shows the theorem.


5 Conclusion

We presented a framework for decomposing hypergraphs using autonomous sets. We then applied this to compute a decomposed representation for the set of all canonical covers, and briefly mentioned some applications.

The idea of using autonomous sets to simplify problems has been used before, though not in the context of a general framework, and not always for the hypergraph of canonical covers.

Saiedian and Spencer describe an approach in [17], where they determine autonomous sets for the hypergraph consisting of all minimal keys. The same autonomous sets are used again by Gottlob, Pichler and Wei in [6].

A unique representation for a set of unitary FDs (i.e., FDs with only a single attribute in their left hand side) is given in [10]. This representation is obtained by factoring the attribute determination graph via the equivalence relation on attributes induced by Σ. From this, partial canonical covers for the equivalence classes could be constructed as minimal strongly connected directed graphs. Our work generalizes this to arbitrary functional dependencies.

Maier already noted in [13] that there is a correspondence between the equivalence classes of non-redundant covers. Our work generalizes these results by investigating the projections of (canonical) covers onto arbitrary autonomous sets, and placing them into a more general theoretic framework.

It remains to investigate how autonomous sets can be used to simplify problems in other areas. We suggest that it is most likely to be helpful whenever a large solution space is generated by a small problem instance.

References

1. Armstrong, W.W.: Dependency structures of data base relationships. In: IFIP Congress, pp. 580–583 (1974)
2. Berge, C.: Hypergraphs: Combinatorics of Finite Sets. Elsevier Science Pub. Co. (1989)
3. Biskup, J., Dayal, U., Bernstein, P.A.: Synthesizing independent database schemas. In: SIGMOD Conference, pp. 143–151 (1979)
4. Garey, M.R., Johnson, D.S., Stockmeyer, L.J.: Some simplified NP-complete graph problems. Theor. Comput. Sci. 1(3), 237–267 (1976)
5. Gottlob, G.: On the size of nonredundant FD-covers. Inf. Process. Lett. 24(6), 355–360 (1987)
6. Gottlob, G., Pichler, R., Wei, F.: Tractable database design through bounded treewidth. In: Vansummeren, S. (ed.) PODS, pp. 124–133. ACM (2006)
7. Habib, M., de Montgolfier, F., Paul, C.: A simple linear-time modular decomposition algorithm for graphs, using order extension. In: Hagerup, T., Katajainen, J. (eds.) SWAT, Lecture Notes in Computer Science, vol. 3111, pp. 187–198. Springer (2004)
8. Köehler, H.: Finding faithful Boyce–Codd normal form decompositions. In: Cheng, S.W., Poon, C.K. (eds.) AAIM, Lecture Notes in Computer Science, vol. 4041, pp. 102–113. Springer (2006)
9. Köhler, H.: Autonomous sets – a method for hypergraph decomposition with applications in database theory. In: Hartmann, S., Kern-Isberner, G. (eds.) FoIKS, Lecture Notes in Computer Science, vol. 4932, pp. 78–95. Springer (2008)
10. Lechtenbörger, J.: Computing unique canonical covers for simple FDs via transitive reduction. Inf. Process. Lett. 92(4), 169–174 (2004)
11. Levene, M., Loizou, G.: A Guided Tour of Relational Databases and Beyond. Springer (1999)
12. Lucchesi, C.L., Osborn, S.L.: Candidate keys for relations. J. Comput. Syst. Sci. 17(2), 270–279 (1978)
13. Maier, D.: Minimum covers in the relational database model. J. ACM 27(4), 664–674 (1980)
14. Maier, D.: The Theory of Relational Databases. Computer Science Press (1983)
15. Mannila, H., Räihä, K.J.: The Design of Relational Databases. Addison-Wesley (1987)
16. Osborn, S.L.: Testing for existence of a covering Boyce–Codd normal form. Inf. Process. Lett. 8(1), 11–14 (1979)
17. Saiedian, H., Spencer, T.: An efficient algorithm to compute the candidate keys of a relational database schema. Comput. J. 39(2), 124–132 (1996)
18. Zaniolo, C.: A new normal form for the design of relational database schemata. ACM Trans. Database Syst. 7(3), 489–499 (1982). doi:10.1145/319732.319749