34
3.7 Design of Relational Database Sc hemas We have several times noticed that converting dir ectly from object-oriented ODL designs (and to a lesser extent from E/R designs) leads to problems with the relational database schema. The principa l problem we have identified is redundancy, where a fact is repeated in more than one tuple. Moreov er, we have identified the most common cause for this redundancy: attempts to group into one relat ion both single-valued and multivalued properties of an object. For instance, we saw in Section 3. 2.2 the redundancy that results when we tried to store single-valued information about movies, suc h as their length, with multi-valued properties s uch as the set of stars for a movie. The problems are seen in Fig. 3.27, which we reproduced here a s Fig. 3.30. We found similar redundancy in that section when we tried to store single-valued birt hdate information for a star with a set of addres ses for a star. In this section, we shall tackle the prob lem of design of good relation schemas in the fol lowing stages: 1. We first explore in more detail the pr oblems that arise when our schema is flawed. 2. Then, we introduce the idea of “decomp osition,” breaking a relation schema (set of attr ibutes) into two smaller schemas. 3 7 关关关关关关关关关 我我我我我我我我我我我我 我我我 ODL 我我我我 ( 我我我我我我我我 E/R 我我我我 ) 我我我我我我我我我我我我 我我我我我我 我我我我我我 我我我我 我我我我我我我我我我我我我我 一。, 我我我我我我我我我我我我 我我我 我我我我我我我 一。, 我我我我我我我我我我我我我我 我我我我我我我我我我 我我我我 我我我我我 我我我 我 一。, 3.2.2 我我 我我 我我我我我我我我我我我 ( 我我我 ) 我我我我我 ( 我 我我我 ) 我我我 我我我我我 我我我我我我我我我 我 一,。 我我我我我我 3.27 我我我我我我我我 3.30 我我 我我 我我 我我我我我我我我 我我我我我我我我我我我 一,一 我我 我我我我我我我 我我我我我我我我我我我我 一,。 我我我 我我我我我我我 我我我我我我我我我我我我我我我我我1. 我我 我我我我 我我我我我我我我我我我我我我我我我我我2. 我我 我我我我 “我我”我我我 我我 我我我我我 .一 ( 我我我 ) 我我我我 我我我我我我

3.7 Design of Relational Database Schemas We have several times noticed that converting directly from object-oriented ODL designs (and to a lesser extent

Embed Size (px)

Citation preview

3.7 Design of Relational Database Schemas

We have several times noticed that converting directly from object-oriented ODL designs (and to a lesser extent from E/R designs) leads to problems with the relational database schema. The principal problem we have identified is redundancy, where a fact is repeated in more than one tuple. Moreover, we have identified the most common cause for this redundancy: attempts to group into one relation both single-valued and multivalued properties of an object. For instance, we saw in Section 3.2.2 the redundancy that results when we tried to store single-valued information about movies, such as their length, with multi-valued properties such as the set of stars for a movie. The problems are seen in Fig. 3.27, which we reproduced here as Fig. 3.30. We found similar redundancy in that section when we tried to store single-valued birthdate information for a star with a set of addresses for a star.

In this section, we shall tackle the problem of design of good relation schemas in the following stages:

1. We first explore in more detail the problems that arise when our schema is flawed.

2. Then, we introduce the idea of “decomposition,” breaking a relation schema (set of attributes) into two smaller schemas.

3. 7 关系数据库模式设计 我们已经几次注意到从面向对

象的 ODL 设计出发 ( 以及在少数情况下,从 E/R 设计出发 ) 直接向关系数据库模式转换,会导致关系数据库模式上的一些问题。我们发现主要的问题是冗余性,即一个事实在多个元组中重复。而且,我们发现造成这种冗余的最常见的原因是企图把一个对象的单值和多值持性包含在一个关系中。例如,在 3.2.2小节,当我们企图把电影的单值信息 ( 如长度 ) 和多值特性 ( 如影星集 ) 存储在一起的时候,就导致了信息冗余。出现的问题见图 3.27 。现在把它再现在图 3.30 中。在那一节,当我们试图存储一个影星的单值出生日期信息和一组地址信息时,就发现了类似的冗余现象。

在本节,我们将按如下步骤解决好的关系模式设计可能遇到问题:

1. 首先,我们应该更详细地分析关系模式有缺陷时引起的问题。

2. 然后,我们引入“分解”的思想.把一个关系模式 ( 属性集 ) 分解成两个更小的模式。

Fig 3.30 The relation Movie exhibiting anomalies

title year length filmType studioName starName

Star Wars 1977 124 color Fox Carrier Fisher

Star Wars 1977 124 color Fox Mark Hamill

Star Wars 1977 124 color Fox Harrison Ford

Mighty Ducks 1991 104 color Disney Emilio Estevez

Wayne’s World 1992 95 color Paramount Dana Carvey

Wayne’s World 1992 95 color Paramount Mike Meyers

3. Next, we introduce “Boyce-Codd normal form,” or “BCNF.” a condition on a relation schema that eliminates these problems.

4. These points are tied together when we explain how to assure the BCNF condition by decomposing relation schemas.

3.7.1 Anomalies Problems such as redundancy that occur when we try to cra

m too much into a single relation are called anomalies. The principal kinds of anomalies that we encounter are:

1. Redundancy. Information may be repeated unnecessarily in several tuples. Examples are the length and film type for movies as in Fig. 3.30.

2. Update Anomalies. We may change information in one tuple but leave the same information unchanged in another. For example, if we found that Star Wars was really 125 minutes long, we might carelessly change the length in the first tuple of Fig. 3.30 but not in the second or third tuples. True, we might argue that one should never be so careless. But we shall see that it is possible to redesign relation Movie so that the risk of such mistakes does not exist.

3. Deletion Anomalies. If a set of values becomes empty, we may lose other information as a side effect. For example,

3. 其次,我们引入“ Boyce-Codd 范式”即“ BCNF” 范式,给出一个消除这些问题的关系模式应满足的条件。

4. 在我们解释如何通过分解关系模式保证满足 BCNF 条件时,这几点是相关联的。

3.7.1 异常 当我们企图把太多的信息存放

在一个关系里时,出现的诸如冗余之类的问题称为“异常” 。我们遇到的主要的几种异常是:

1. 冗余。信息可能在多个元组中不必要地重复。如图 3.30 中电影的长度和电影类型。

2. 修改异常。我们可能修改了一个元组中的信息,但另一个元组中的相同的信息却没有修改。例如,如果我们发现“星球大战”的实际长度为 125 分钟,我们可能比较粗心,只修改了图 3.30 中的第一个元组,而没有修改第二和第三个元组。当然,也许有人会争论,不应该如此粗心。但我们将会看到,重新设计关系 Movie 从而避免出现这种错误是可能的。

3. 删除异常。如果一组属性的

should we delete Emilio Estevez from the set of stars of Mighty Ducks, then we have no more stars for that movie in the database. The last tuple for Mighty Ducks in the relation Movie would disappear, and with it information that it is 95 minutes long and in color.

3.7.2 Decomposing Relations The accepted way to eliminate these anomalies is to decom

pose relations. Decomposition of R involves splitting the attributes of R to make the schemas of two new relations. Our decomposition rule also involves a way of populating those relations with tuples by “projecting” the tuples of R.After describing the decomposition process, we shall show how to pick a decomposition that eliminates anomalies.

Given a relation R with schema {A1, A2, …, An}, we may decompose R into two relation S and T with schemas {B1, B2, …, Bm} and {C1, C2, …, Ck}, respectively, such that

1. {A1, A2, …,An} = {B1, B2, …, Bm} {C∪ 1, C2, …, Ck} 2. The tuples in relation S are the projections onto {B1,

B2, …, Bm} of all the tuples in R. That is, for each tuple t in the current instance of R, take the components of t in the attributes B1, B2, …, Bm. These components form a tuple, and this tuple belongs in the current instance of S.

值变为空,带来的副作用可能是丢失—些其他信息。例如如果我们从Mighty Ducks 的影星集中删除了 Emilio Estevez 那么,数据库里就没有这部电影的影星信息了。关系 Movie 中的最后一个 Mighty Ducks 元组将会消失,同时消失的还有它的长度 (104 分钟,原文误为 95 分钟 )和类型 ( 彩色 ) 信息。

3.7.2 关系分解 消除这些异常的一个可行的方

法是“分解”关系。 R 的分解包括把 R 的属性分开,以构成两个新的关系模式。我们的分解规则也包括通过对 R 的元组进行投影而增加新关系的方法,描述了分解过程之后,我们将说明如何选择一个能消除异常的分解方法。

给定一个模式为 {A1, A2, …, An

} 的关系 R ,我们可以把 R 分解为两个关系 S 和 T ,模式分别为 {B1, B2, …, Bm} 和 {C1, C2, …, Ck} 使得:

1. {A1, A2, …,An} = {B1, B2, …, Bm} {C∪ 1, C2, …, Ck}

2. 关系 S 的元组是 R 的所有元组在 {B1, B2, …, Bm} 上的投影。

However, relations are sets, and the same tuple of S could result from projecting two different tuples of R. If som, we put into the current instance of S only one copy of each tuple.

3. Similarly, the tuples in relation T are the projections, onto set of attributes {C1, C2, …, Ck}, of the tuples in the current instance of R.

Example 3.32: Let us decompose the Movie relation of Fig. 3.30. First, we shall decompose the schema. Our choice, whose merit will be seen in Section 3.7.3, is to use

1. A relation called Movie1, whose schema is all the attributes except for starName.

2. A relation called Movie2, whose schema consists of the attributes title, year, and starName.

Now, let us illustrate the process of decomposing relation instances by decomposing the sample data of Fig. 3.30. First, let us construct the projection onto the movie1 schema:

{title, year, length, filmType, studioName} The first three tuples of Fig. 3.30 each have the same comp

onents in these five attributes: {Star Wars, 1977, 124, color, Fox} The fourth tuple yields a different tuple for the first five

也就是说.对于 R 的当前实例中的每个元组 t ,取 t 在属性 B1, B2, …, Bm 上的分量。这些分量构成了一个元组,它属于 S 的当前实例。然而,关系是集合,而 R 中的两个不同元组投影到 S 上,可能导致 S 中的相同元组。如果出现了这种情况,我们在 S 的当前实例中只保留相同元组中的一份。

3. 类似地,关系 T 中的元组是R 的当前实例中的元组在属性集合{C1, C2, …, Ck} 上的投影。

例 3.32: 让我们来分解图 3.30 中的关系 Movie 。首先进行模式分解。我们选择下面两个关系。在 3.7.3 节将会看到这种选择的优点。

1. 一个关系称为 Movie1 ,其模式是除 starName 以外的所有属性。

2. 另—个关系称为 Movie2 ,其模式包括属性 title , year 和 starName 。

现在,我们通过分解图 3.30 中的采样数据来说明分解关系实例的过程;首先,我们构造在 Movie1 模式哈的投影: {title, year, length, filmType, studioName} 。图 3.30 的前三个元组在这五个分量上的投影

components, and the fifth and sixth tuple each yield the same five-component tuple. The resulting relation for Movie1 is shown in Fig. 3.31.

Figure 3.31: The relation Movie1

Figure. 3.32: The relation Movie2

是: {Star Wars, 1977, 124, color, Fox} 。第四个元组的前五个分量构成一个不同的元组。而第五和第六个元组构成相同的五分量元组。结果得到的关系 Movie1 如图 3.31所示。

其次,考虑图 3.30 在 Movie2 模式上的投影。图中的六个元组在属件 title , year 和 starName 中至少有一个分星不同,所以关系 Movie2 的结果如图 3.32所示。 口

title year length filmType studioName

Star Wars 1977 124 color Fox

Mighty Ducks 1991 104 color Disney

Wayne’s World 1992 95 color Paramount

title year starName

Star Wars 1977 Carrie Fisher

Star Wars 1977 Mark Hamill

Star Wars 1977 Harrison Ford

Mighty Ducks 1991 Emilio Estevez

Wayne’s World 1992 Dana Carvey

Wayne’s World 1992 Mike Meyers

Next, consider the projection of Fig. 3.30 onto the schema of Movie2. Each of the six tuples of that figure differ in at least one of the attributes title, year, and starName, so the result is the relation shown in Fig. 3.32.

Notice how this decomposition eliminates the anomalies we mentioned in Section 3.7.1. The redundancy has been eliminated; for example, the length of each film appears only once, in relation Movie1. The risk of an update anomaly is gone. For instance, since we only have to change the length of Star Wars in one tuple of Movie1, we can not wind up with two different length for that movie.

Finally, the risk of a deletion anomaly is gone. If we delete all the stars for Mighty Ducks, say, that deletion makes the movie disappear from Movie2. But all the other information about the movie can still be found in Movie1.

It might appear that Movie2 still has redundancy, since the title and year of a movie can appear several times. However, these two attributes form a key for movies, and there is no more succinct way to represent a movie. Moreover, Movie2 dose not offer an opportunity for an update anomaly. We might suppose that if we changed the year in, say, the Carrie Fisher tuple but not the other two tuples for Star Wars, then there would be an update anomaly. However, there is nothing in our assumed functional dependencies that prevents there from being a

注意,这种分解如何消除了3.7.1 小节提到的异常。冗余已经消除了;比如,每部电影的长度只在关系 Movie1 中出现一次。修改异常的危险也不存在了。例如,因为我们只需要在 Movie1 的一个元组中修改 Star Wars的长度信息,所以该电影就不可能出现两个不同的长度。

最后,删除异常的危险也不再存在。如果我们删除了电影 Mighty Ducks 中的全部影星,也就是说,这个删除操作使得该电影在 Movie2 中消失了。但该电影的其他信息依然可以在 Movie1中找到。

由于一部电影的名称和年份可能出现多次.看起来 Movie2可能还存在冗余信息 , 然而,这两个属性构成了电影的键码,而且没有更简洁的方法来表示一部电影。此外, Movie2并没有提供出现修改异常的机会,我们也许会这样假设:如果我们改变了Star Wars 的 Carrie Fisher 元组的year 属性值,而另外两个元组保持不变,于是就出现了修改异常。

different movie named Star Wars in 1997, and Carrie Fisher may have starred in that one. Thus, we do not want to prevent changing the year in one Star Wars tuple, nor is such a change necessarily incorrect.

3.7.3 Boyce-Codd Normal Form

The goal of decomposition is to replace a relation by several that do not exhibit anomalies. There is, it turns out, a simple condition under which the anomalies discussed above can be guaranteed not to exist. This condition is called Boyce-Codd normal form, or BCNF

. A relation R is in BCNF if and only if: whenever there is a nontrivial dependency A1A2 … An → B for R, it is the case that {A1, A2, …, An} is a superkey for R.

That is, the left side of every nontrivial functional dependency must be a superkey. Recall that a superkey need not be minimal. Thus, an equivalent statement of the BCNF condition is that the left side of every nontrivial functional dependency must contain a key.

When we find a BCNF-violating dependency, it is sometimes useful to find all the other dependencies that have the same left side, whether or not they are BCNF violations. The following is an alternative definition of

然而,在我们假定的函数依赖中并不排除 1997年存在另一部名为 Star Wars 的电影,而 Carrie Fisher 也可能担任该电影的主演。因此,我们不要去阻止修改一个 Star Wars元组的年份。而且这种修改也未必就是错误的。

BCNF in which we look for a set of dependencies with common left side, at least one of which is nontrivial and violates the BCNF condition.

. Relation R is in BCNF if and only if: whenever nontrivial dependency A1A2 … An → B1B2 … Bm holds for R, it must be the case that {A1, A2, …, An} is a superkey for R.

This requirement is equivalent to the original BCNF condition. Recall that the dependency A1A2 … An → B1B2 … B

m is shorthand for the set of dependencies A1A2 … An → Bi for I = 1, 2, …, m. Since there must be at least one Bi that is not among the A’s (or else A1A2 … An → B1B2 … Bm would be trivial), A1A2 … An → Bi is a BCNF violation according to our original definition.

Example 3.33: Relation Movie, as in Fig. 3.30, is not in BCNF. To see why, we first need to determine what sets of attributes are keys. We argued in Example 3.21 why (title, year, starName) is a key. Thus, any set of attributes containing these three is a superkey. The same arguments we followed in Example 3.21 can be used to explain why no set of attributes that does not include all three of title, year, and starName could be a superkey. Thus, we assert that (title, year, starName) is the only key for Movie.

However, consider the functional dependency title year → length filmType studioName

请输入译文

which we know holds in Movie. Recall the reason we can assert this dependency: the original ODL design has key {title, year}, single-valued attributes length and filmType, and single-valued relationship ownedBy leading to the owning studio.

Unfortunately, the left side of the above dependency is not a superkev. In particular, we know that title and year do not functionally determine the sixth attribute, star-Name. Thus. the existence of this dependency violates the BCNF condition and tells us Movie is not in BCNF. Moreover according to the original definition of BCNF, where a single attribute on the right side was required. we can offer any of the three functional dependencies, such as title year → length, as a BCNF violation.

Example 3.34 : On the other hand, Moviel of Fig. 3.31 is in BCNF. Since

title year → length filmType studioName holds in this relation, and we have argued that neither title

nor year by itself functionally determines any of the other attributes, the only key for Moviel is {title, year}. Moreover, the only nontrivial functional dependencies must have at least title and year on the left side, and therefore their left sides must be superkeys. Thus, Moviel is in BCNF.

Example 3.35: We claim that any two-attribute relation is in BCNF. We need to examine the possible nontrivial dependencies with a single attribute on the right. There are

请输入译文

not too many cases to consider, so let us consider them in turn. In what follows, suppose that the attributes are A and B.

1. There are no nontrivial functional dependencies. Then surely the BCNF condition must hold, because only a nontrivial dependency can violate this condition. Incidentally, note that {A,B} is the only key in this case.

2. A → B holds, but B → A does not hold. In this case, A is the only key, and each nontrivial dependency contains A on the left (in fact the left can only be A). Thus there is no violation of the BCNF condition.

3. B → A holds, but A → B does not hold. This case is symmetric to case (2).

4. Both A → B and B → A hold. Then both A and B are keys. Surely any dependency has at least one of these on the left, so there can be no BCNF violation.

It is worth noticing from case (4) above that there may be more than one key for a relation. Further, the BCNF condition only requires that some key be contained in the left side of any nontrivial dependency, not that all keys are contained in the left side. Also observe that a relation with two attributes, each functionally determining the other, is not completely implausible. For example, a company may assign its employees unique employee ID‘s and also record their Social Security numbers. A relation with attributes

请输入译文

empID and ssNo would have each attribute functionally determining the other. Put another way, each attribute is a key, since we don't expect to find two tuples that agree on either attribute.

3.7.4 Decomposition into BCNF By repeatedly choosing suitable decompositions, we can br

eak any relation schema into a collection of subsets of its attributes with the following important properties:

1. These subsets are the schemas of relations in BCNF. 2. The data in the original relation is represented faithfu

lly by the data in the relations that are the result of the decomposition, in a sense to be made precise in Section 3.7.6. Roughly, we need to be able to reconstruct the original relation exactly from the decomposed relations.

Example 3.35 suggests that perhaps all we have to do is break a relation schema into two-attribute subsets, and the result is surely in BCNF. However, such an arbitrary decomposition will not satisfy condition (2), as we shall see in Section 3.7.6. In fact, we must be more careful and use the violating functional dependencies to guide our decomposition.

The decomposition strategy we shall follow is to look for a nontrivial functional dependency A1A2 … An → B1B2 … Bm that violates BCNF; i.e., {A1, A2, …, An} is not a superkey. As a heuristic, we shall generally add to the right

请输入译文

side as many attributes as are functionally determined by

{A1, A2, …, An} Figure 3.33 illustrates how the attributes are broken into tw

o overlapping relation schemas. One is all the attributes involved in the violating dependency, and the other is the left side plus all the attributes not involved in the dependency, i.e., all the attributes except those B's that are not A's.

Figure 3.33: Relation schema decomposition based on a BCNF violation

Example 3.36: Consider our running example, the Movie relation of Fig. 3.30. We saw in Example 3.33 that

title year → length filmType studioName is a BCNF violation. In this case, the right side already

请输入译文

includes all the attributes functionally determined by title and year, so we shall use this BCNF violation to decompose Movie into:

1. The schema with all the attributes of the dependency, that is:

{title, year, length,filmType, studioName} 2. The schema with all of Movie's attributes except the

three that appear on the right of the dependency. Thus, we remove length, filmType, and studioName, leaving the second schema:

{title, year, starName} Notice that these schemas are the ones selected for relati

ons Moviel and Movie2 in Example 3.32. We observed that these are each in BCNF in Example 3.34. Example 3.37: Let us consider the relation MovieStudio that was introduced in Example 3.30. This relation stores information about movies, their owning studios, and the addresses of those studios. The schema and some typical tuples for this relation are shown in Fig. 3.34.

Note that MovieStudio contains redundant information. Because we added to our usual sample data a second movie owned by Paramount, the address of Paramount is stated twice. However, the source of this problem is not the same as in Example 3.36. In the latter example, the problem was that a multivalued relationship (the stars of a given movie)

请输入译文

was being stored with other information about the movie. Here, everything is single-valued: the attribute length for a movie, the relationship ownedBy that relates a movie to its unique owning studio, and the attribute address for studios.

Figure 3.34: The relation MovieStudio

In this case, the problem is that there is a "transitive dependency." That is, as mentioned in Example 3.30, relation movieStudio has the dependencies

title year → studioName studioName → studioAddr We may apply the transitive rule to these to get a new depen

dency title year → studioAddr

请输入译文

title year length filmType studioName studioAddress

Star Wars

1977 124 color Fox Hollywood

Mighty Ducks

1991 104 color Disney Buena Vista

Wayne’s World

1992 95 color Paramount Hollywood

Addams Family

1991 102 color Paramount Hollywood

That is, a title and year (i.e., the key for movies) functionally determine a studio address - the address of the studio that owns the movie. Since

title year → length filmType is another obvious dependency, we conclude that {title, yea

r} is a key for MovieStudio; in fact it is the only key. On the other hand, dependency studioName → studioAddress which is one of those used in the application of the transitiv

e rule above, is nontrivial but its left side is not a superkey. This observation tells us MovieStudio is not in BCNF. We can fix the problem by following the decomposition rule, using the above dependency. The first schema of the decomposition is the attributes of the dependency itself, that is:

{studioName, studioAddr} The second schema is all the attributes of MovieStudio exc

ept for studioAddr, because the latter attribute is on the right of the dependency used in the decomposition. Thus, the other schema is:

{title, year, length, filmType, studioName} The projection of Fig. 3.34 onto these schemas gives us t

he two relations MovieStudiol and MovieStudio2 shown in Figs. 3.35 and 3.36. Each of these is in BCNF. The sole key

请输入译文

for MovieStudio1 is {title,year}, and the sole key for MovieStudio2 is {studioName}. In each case, there are no nontrivial dependencies that do not contain these keys on the left.

Figure 3.35: The relation MovieStudio1

Figure 3.36: The relation MovieStudio2 In each of the previous examples, one judicious applicati

on of the deco9mposition rule is enough to produce a collection of relations that are in BCNF. In general, that is not the case.

请输入译文

title year length filmType studioName

Star Wars 1977 124 color Fox

Mighty Ducks 1991 104 color Disney

Wayne’s World 1992 95 color Paramount

Addams Family 1991 102 color Paramount

studioName studioAddr

Fox Hollywood

Disney Buena Vista

Paramount Hollywood

Example 3.38: We could generalize Example 3.37 to have a chain of functional dependencies longer than two. Consider a relation with schema

{title, year, studioName, president, presAddr} That is, each tuple of this relation tells about a movie, its st

udio, the president of the studio, and the address of the president of the studio. Three functional dependencies that we would assume in this relation are

title year → studioName studioName → president president → presAddr The sole key for this relation is {title, year}. Thus the las

t two dependencies above violate BCNF. We could decompose starting with

studioName → president for example, we should add to the right side of this function

al dependency any other attributes in the closure of studioName. By the transitive rule applied to studioName → president and president → presAddr, we know

studioName → presAddr Combing the two dependencies with studioName on the lef

t, we get: studioName → president presAddr This functional dependency has a maximally expanded

请输入译文

right side, so we shall now decompose into the following two relation schemas.

{title, year, studioName} {studioName, president, presAddr} The first of these is in BCNF. However, the second has

{studioName} for its only key but also has the dependency president → presAddr which is in BCNF violation. Thus, we must decompose aga

in, using the expanded dependency in which president is addedd to the above right side. The resulting three relation schemas, all in BCNF, are:

{title, year, studioName} {studioName, president} {president, presAddr} IN general, we must keep applying the decomposition ru

le as many times as needed, until all our relations are in BCNF. We can be sure of ultimate success, because every time we apply the decomposition rule to a relation R, the two resulting schemas each have fewer attributes than that of R. As we saw in Example 3.35, when we get down to two attributes, the relation is sure to be in BCNF.

To see why decomposition always yields smaller schemas, suppose we have a BCNF violation A1A2 … An → B1B2

… Bm. We may assume this dependency has already been expanded to include among the B’s any other

请输入译文

attribute functionally determined by the A’s and that any of the B’s that are also among the A’s have been removed from the B’s.

One of the schemas of the decomposition is all the attributes of R except for the B’s. There must be at least one B, so this schema does not include all attributes.

The other schema is all the A’s and B’s. This set cannot be all the attributes of R, because if it were, then {A1 , A2 ,

…, An} would be superkey for R (i.e., the A’s would functionally determine all the other attributes of R). No dependency with a superkey on the left is a BCNF violation.

We conclude that both schemas of the decomposition are smaller than the schema of R. Thus, the repeated decomposition process must eventually reach a collection of BCNF relations.

3.7.5 Projecting Functional Dependencies When we decompose a relation schema, we need to check t

hat the resulting schemas are in BCNF. As we saw in Example 3.38 it is possible that one or both of the new schemas themselves have BCNF violations. However, how can we tell whether a relation is in BCNF unless we can determine the functional dependencies that hold for that relation? In Example 3.38 we reasoned about the dependencies that hold for the new relations in an ad hoc way. Fortunately, there is a methodical way to find the

请输入译文

functional dependencies for the results of a decomposition. Suppose we have a relation R, which is decomposed into

relation S and some other relation. Let F be the set of functional dependencies known to hold for R. To compute the functional dependencies that hold in S do the following:

Consider each set of attributes X that is contained in the set of attributes of S. Computer X+. Then for each attribute B such that

1. B is an attribute of S, 2. B is in X+, and 3. B is not in X, the functional dependency X → B holds in S. Example 3.39: Let R have schema R(A, B, C, D), and

suppose the functional dependencies A → B and b → C are given for R. Let S(a, C) be one of the relations in a decomposition of R. We shall compute the dependencies that hold is S.

In principle, we must compute the closure of each subset of {A, C}, which is the set of attributes of S. Let us begin with {A}+. This set is easily seen to be {A, B, C}. Since B is not in the schema of S, we do not claim that A → B is a dependency for S. However, C is in the schema for S, so we assert dependency A → C for S.

Now, we must consider {C}+. Since C is not a left side of a given dependency, we get nothing new in the closure, so

请输入译文

{C}+ = {C}. In general, a set that does not contain at least one left side of a given dependency cannot yield any dependencies for S.

We must also consider {A,C}+ , which is {A, B, C}. We therefore get no new dependency not already found when we considered {A}+. The conclusion is that A → C is the only dependency we need assert for S. Of course there are other dependencies for S that are derived from this one, such as AD → C or the trivial dependency A → A. However, they can be obtained by the rules given in Section 3.6 and need not be stated specifically when we give the functional dependencies for S.

Example 3.40: Now consider R(A, B, C, D, E) decomposed into S(A, B, C) and another relation. Let the functional dependencies of R be A → D, B → E, and DE → C.

First, consider {A}+ = {A, D}. Since D is not in the schema of S, we get no dependencies here. Similarly, {B}+

= {B, E} and {C}+ = {C}, yielding no dependencies for S. Now consider pairs. {A, B}+ = {A, B, C, D, E}. Thus, we

get the dependency AB → C for S. Neither of the other pairs give us any dependencies for S. Of course the set of all three attributes of S, {A, B, C}, cannot yield any nontrivial dependencies for S. Thus, the only dependency we need assert for S is AB → C.

请输入译文

3.7.6 Recovering Information from a Decomposition

Let us now turn our attention to the question of why the decomposition algorithm of Section 3.7.4 preserves the information that was contained in the original relation. The idea is that if we follow this algorithm, then the projections of the original tuples can be “joined” again to produce all and only the original tuples.

To simply the situation, let us consider a relation R with schema {A, B, C} and a functional dependency B → C, which we suppose is a BCNF violation. It is possible, for example, that as in Example 3.37, there is a transitive dependency chain, with another functional dependency A → B. In that case, {A} is the only key, and the left side of B → C clearly is not a superkey. Another possibility is that B → C is the only nontrivial dependency, in which case the only key is {A, B}. Again, the left side of B → C is not a superkey. In either case, the required decomposition based on the dependency B → C separates the attributes into schemas {A, B} and {B, C}.

Let t be a tuple of R. We may write t = (a, b, c), where a, b, and c are the components of t for attributes A, B, and C, respectively. Tuple t projects as (a, b) for the relation with schema {A, B} and as (b, c) for the relation with schema {B, C}.

It is possible to join a tuple from {A, B} with a tuple

请输入译文

from {B, C}, provided they agree in the B component. In particular, (a, b) joins with (b, c) to give us the original tuple t = (a, b, c) back again. That happens regardless of what tuple t we started with; we can always join its projections to get t back.

However, getting back those tuples we started with is not enough to assure that the original relation R is truly represented by the decomposition. What might happen if there were two tuples of R, say t = (a, b, c) and v = (d, b, e)? When we project t onto {A, B} we get u = (a, b), and when we project v onto {B, C} we get w = (b, e), as suggested by Fig. 3.37.

Tuples u and w join, since they agree on their B components. The resulting tuple is x = (a, b, e). Is it possible that x is a bogus tuple? That is, could (a, b, e) not be a tuple of R?

Since we assume the functional dependency B → C for relation R, the answer is “no.” Recall that this dependency says any two tuples of R that agree in their B components must also agree in their C components. Since t and v agree in their B components (they both have b there), they also agree on their C components. That means c =e; i.e., the two values we supposed were different are really the same. Thus, (a, b, e) is really (a, b, c); that is, x = t.

Since t is in R, it must be that x is in R. Put another way, as long as functional dependency B → C holds, the joining

请输入译文

Figure 3.37: Joining two tuples from projected relations

of two projected tuples cannot produce a bogus tuple. Rather, every tuple produced by joining is guaranteed to be a tuple of R.

请输入译文

This argument works in general. We assume A, B, and C were each single attributes, but the same argument would apply if they were any sets of attribute. That is, we take any BCNF-violating dependency, let B be the attributes on the left side, C be the attributes on the right but not the left, and A be the attributes on neither side. We may conclude:

. If we decompose a relation according to the method of Section 3.7.4, then the original relation can be recovered exactly by joining the tuples of the new relations in all possible ways.

If we decompose relation in a way that is not based on a functional dependency, then we might not be able recover the original relation. Here is an example.

Example 3.41: Suppose we have the relation R with schema {A, B, C} as above, but that the dependency B → C does not hold. Then R might consist of the two tuples

The projections of R onto the relations with schemas {A, B} and {B, C} are

请输入译文

A B C

1 2 3

4 2 5

and

respectively. Since all four tuples share the same B-value, 2, each tuple of one relation joins with both tuples of the other relation. Thus, when we try to reconstruct R by joining, we get

请输入译文A B

1 2

4 2

B C

2 3

2 5

A B C

1 2 3

1 2 5

4 2 3

4 2 5

That is, we get “too much”; we get too bogus tuples, (1, 2, 5) and (4, 2, 5) that were not in the original relation R.

3.7.7 Third Normal Form Occasionally, one encounters a relation schema and its dep

endencies that are not in BCNF but that one doesn’t want to decompose further. The following example is typical.

Example 3.42: Suppose we have a relation Bookings with attributes:

1. Title, the name a movie. 2. Theater, the name of a theater where the movie is bein

g shown. 3. City, the city where the theater is located. The intent behind a tuple (m, t, c) is that the movie with titl

e m is currently being shown at theater t in city c. We might reasonably assert the following functional dep

endencies: theater → city

title city → theater The first says that a theater is located in one city. The secon

d is not obvious but is based on the normal practice of not booking a movie into two theaters in the same city. We shall assert this dependency if only for the sake of the

请输入译文

example. Let us first find the keys. No single attribute is a key. For

example, title is not a key because a movie can play in several theaters at once and in several cities at once. Also, theater is not a key, because although theater functionally determines city, there are multiscreen theaters that show many movies at once. Thus, theater does not determine title. Finally, city is not a key because cities usually have more than one theater and more than one movies playing.

On the other hand, two of the three sets of two attributes are keys. Clearly {title, city} is a key because of the given dependency that says these attributes functionally determine theater.

It is also true that {theater, title} is a key. To see why, start the given dependency theater → city. By the augmentation rule of Exercise 3.6.3(a), theater, title → city follows. Intuitively, if theater alone functionally determines city, then surely theater and title together will do so.

The remaining pair of attributes, city and theater, do not functionally determine title, and are therefore not a key. We conclude that the only two keys are

{title, city} {theater, title} Now we immediately see a BCNF violation. We were

请输入译文

given functional dependency theater → city, but its left side, theater, is not a superkey. We are therefore tempted to decompose, using this BCNF-violating dependency, into the two relation schemas:

{theater, city} {theater, title} There is a problem with this decomposition, concerning t

he functional dependency title city → theater. There could be current relations for the decomposed schemas that satisfy the dependency theater → city (which can be checked in the relation {theater,city}) but that, when joined, yield a relation that does not satisfy title city → theater. For instance, the two relations

and

请输入译文

theater city

Guild Menlo Park

Park Menlo Park

theater title

Guild The Net

Park The Net

are permissible according to the functional dependencies that apply to each relation,, but when we join them we get two tuples

that violate the dependency title city → theater. The solution to the above problem is to relax our BCNF

requirement slightly, in order to allow the occasional relation schema, like that of Example 3.42 that cannot be decomposed into BCNF relations without our losing the ability to check each functional dependency within one relation. This relaxed condition is called the third normal form condition:

. A relation R is in third normal form (3NF) if: whenever A1A2 … An → B is a nontrivial dependency, either {A1, A2, …, An} is a superkey, or B is a member of some key.

Note that the difference between this 3NF condition and the BCNF condition is the clause “or B is a member of some key .” This clause “excuses” a dependency like theater → city in Example 3.42, because the right side, city, is a member of a key.

请输入译文

theater city title

Guild Menlo Park The Net

Park Menlo Park The Net

It is beyond the scope of this book to prove that 3NF is in fact adequate for its purposes. That is, we can always decompose a relation schema in a way that does not lose information, into schemas that are in 3NF and allow all functional dependencies to be check. When these relations are not in BCNF, there will be some redundancy left in the schema, however.

It is interesting to observe that the example we found of a relation schema that is in 3NF but not in BCNF is somewhat different from the the non-BCNF examples we have seen earlier. One of the relevant functional dependencies, theater → city, is of the typical form, based on the fact that a theater is a unique object located in one city. However, the other dependency

title city → theater comes from an observation about the movie-distribution po

licies practiced by movie studios. In general, functional dependencies fall into two categories: those based on the fact that we are representing unique objects such as movies into at most one theater per city.

3.7.8 Exercises for Section 3.7 Exercise 3.7.1: For each of the following relation schemas

and sets of functional dependencies: a) R(A, B, C, D) with functional dependencies AB → C,

C → D, and D → A.

请输入译文

b) R(A, B, C, D) with functional dependencies B → C, B → D.

c) R(A, B, C, D, E) with functional dependencies AB → C, DE → C, and B → D.

do the following: i) Indicate all the BCNF violations. Do not forget to

consider dependencies that are not in the given set, but follow from them. However, it is not necessary to give violations that have more than one attribute on the right side.

ii) Decompose the relation, as necessary, into collections of relations that are in BCNF.

iii) Indicate all the 3NF violations. iv) Decompose the relations, as necessary, into

collections of relations that are in 3NF. Exercise 3.7.2: We mentioned is Section 3.7.4 that we

should expand the right side of a functional dependency that is a BCNF violation if possible. However, it was deemed an optional step. Consider a relation R whose schema is the set of attributes {A, B, C, D} with functional dependencies A → B and A → C. Either is a BCNF violation, because the only key for R is {A, D}. Suppose we begin by decomposing R according to A → B. Do we ultimately get the same result as if first expand the BCNF violation to A → BC? Why or why not?

请输入译文

Exercise 3.7.3: Suppose we decompose relation R(A, B, C, D, E) into relation S(A, B, C) and other relation. Give the functional dependencies that hold in S if the dependencies for R are:

a) AB → DE, C → E, D → C, and E → A. b) A → D, BD → E, AC → E, and DE → B. c) AB → D, AC → E, BC → D, D → A, and E → B In each case, it is sufficient to give a minimal basis for the

full set of dependencies of S.

请输入译文