View
215
Download
1
Embed Size (px)
Citation preview
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 1
Database Systems I
Design Theoryfor Relational Databases
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 2
IntroductionTypically, the first database design uses a high-level database model such as the ER model. This model is then translated into a relational schema.Sometimes a relational database schema is developed directly without going through the high-level design.Either way, the initial relational schema has room for improvement, in particular by eliminating redundancy.Redundancies lead to undesirable update and deletion anomalies.Relational database design theory introduces various normal forms of a schema that avoid various types of redundancies and algorithms to convert a relational schema into these normal forms.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 3
Functional DependencyNormal forms are based on the concept of functional dependencies between sets of attributes.A functional dependency (FD) X Y is an assertion about a relation R that whenever two tuples of R agree on all the attributes of set X, then they must also agree on all attributes in set Y.We say “X Y holds in R.”Convention: …, X, Y, Z represent sets of attributes of relation R. A, B, C,… represent single attributes of R.Convention: no parentheses to denote sets of attributes, just ABC, rather than {A,B,C }.A FD X Y is called trivial if .
XY
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 4
Splitting / Combining Rule
XA1A2…An holds for R if and only if each of
XA1, XA2,…, XAn hold for R.
Example: The FD ABC is equivalent to the two FDs AB and AC.
This rule can be used to split a FD into multiple ones with singleton right sides or to combine multiple singleton right side FDs into one FD.
There is no splitting /combining rule for left sides.
We’ll generally express FDs with singleton right sides.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 5
Functional Dependency
Consider the relationMovies1 (title, year, length, genre, studioName, starName).title year length genre studioName holds, assuming that there are not two movies with the same title in the same year.title year starName does not hold, since a movie can have more than one star acting.A FD makes an assertion about all possible instances of a relation, not only about one (such as the current) instance.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 6
Keys
Given a relation R with attributes X = {A1, . . .,
An}.
is a superkey for relation R if K
functionally determines X, i.e. K X.
K is a key for R if K is a superkey, but no
proper subset of K is a superkey.
Keys are a special case of a FD.
Keys can be deduced systematically, if all FDs
for relation R are given.
XK
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 7
Keys
{title, year, starName} is a superkey of Movies1, since title title, year year, title year length,
title year genre,title year studioName,starName starName.
Remember that title year starName does not hold.{title, year}, {year, starName} and {title, starName} are not superkeys.Thus, {title, year, starName} is a key of Movies1.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 8
Closure of Attributes
Given a set of attributes {A1, . . ., An} and a set S of FDs.The closure of {A1, . . ., An} under S is the set of attributes X such that every relation that satisfies all the FDs in S also satisfies {A1, . . ., An} X,i.e. {A1, . . ., An} X follows from the FDs in S.
The closure of set Y is denoted by Y+.
Exampleattribute set {A, B, C}FDs {ABD, DE, BCF, GH}{A,B,C}+ = {A, B, C, D, E, F}
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 9
Computing the Closure of Attributes
Given a set of attributes {A1, . . ., An} and a set
S of FDs.
If necessary, apply the splitting rule to the FDs in S.
Initialize X to {A1, . . ., An}.
Repeat search for some FD B1, . . ., Bm C in S
such that for all and add C to the set Xuntil no more attribute C can be added.
Now X = {A1, . . ., An}+, , return X.
XCXBi i and:
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 10
Computing the Closure of Attributes
Given a set of attributes {A, B, C, D, E, F}
and FDs {AB C, BC AD, D E, CFB}.
What is {A,B} +?
Apply the splitting rule: split BC AD into BC A and BC D.
Initialize X = {A,B} .
Iterationsapply AB C, X = {A,B,C}
apply BC D, X = {A,B,C,D}apply D E, X = {A,B,C,D,E}
Return {A,B} + = {A,B,C,D,E}.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 11
Relational Schema DesignGoal of relational schema design is to avoid anomalies.Redundancies lead to certain forms of anomalies. and redundancy.Update anomalyone occurrence of a fact is changed, but not all occurrences.Deletion anomalyvalid fact is lost when a tuple is deleted.In the following example, consider the relationMovies1 (title, year, length, genre, studioName, starName).
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 12
Relational Schema Designtitle year length genre studioNam
estarName
Star Wars 1977 124 SciFi Fox Carrie Fisher
Star Wars 1977 124 SciFi Fox Mark Hamill
Star Wars 1977 124 SciFi Fox Harrison Ford
Gone with the Wind
1939 231 drama MGM Vivien Leigh
Wayne’s World
1992 95 comedy Paramount Dana Carvey
Wayne’s World
1992 95 comedy Paramount Mike MeyersUpdate anomaly: update the length to 125 (only) for the first Star Wars tuple.Deletion anomaly: delete Vivien Leigh (and the corresponding movie).
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 13
Decomposing Relations
How to eliminate these anomalies? Decompose the relation into two or more relations that together have the same attributes.Given relation R {A1, . . ., An}. A decomposition of R consists of two relations S {B1, . . ., Bm} and T {C1, . . ., Ck} such that
},...,{},...,{},...,{ 111 kmn CCBBAA
)(,...,1RS
mBB
)(,...,1RT
kCC
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 14
Decomposing RelationsDecompose Movies1 (title, year, length, genre, studioName, starName)into Movies2 (title, year, length, genre, studioName)and Movies3 (title, year, starName).
title year length
genre studioName
Star Wars
1977
124 SciFi Fox
Gone with the Wind
1939
231 drama MGM
Wayne’s World
1992
95 comedy
Paramount
title year starName
Star Wars
1977 Carrie Fisher
Star Wars
1977 Mark Hamill
Star Wars
1977 Harrison Ford
Gone with the Wind
1939 Vivien Leigh
Wayne’s World
1992 Dana Carvey
Wayne’s World
1992 Mike Meyers
Movies2
Movies3
The update and deletionanomalies are gone!
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 15
Boyce-Codd Normal Form
There are many ways of decomposing a relation.Which decompositions leads to an anomaly-free relation? Boyce-Codd normal form defines a condition under which the anomalies discussed so far cannot exist.A relation R is in Boyce-Codd normal form (BCNF), if and only if the following condition holds:for every non-trivial FD A1 . . . An B1 . . . Bm for R {A1, . . ., An } is a superkey for R.
I.e., the left side of a FD needs to contain a key.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 16
Boyce-Codd Normal FormConsider Movies1 (title, year, length, genre, studioName, starName).{title, year, starName} is a key, and it is the only one.We have the FD title year length genre studioName.The left side of this FD is not a superkey, i.e. Movies1 is not in BCNF.Consider Movies2 (title, year, length, genre, studioName).{title, year } is its only key, since neither
title length genre studioName nor year length genre studioName hold.
All non-trivial FDs must have at least title and year on the left side. Thus, Movies2 is in BCNF.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 17
Decomposition into BCNF
We want to find a decomposition R into R1, . . ., Rk such that
each of the resulting relations is in BNCF, andthe original relation R can be reconstructed from R1, . . ., Rk .
Look for non-trivial FD that violates BCNF, e.g. A1 . . . An B1 . . . Bm. {A1,. . ., An} is no superkey.Add to the right side all other attributes {C1,. . ., Ck} that functionally depend on {A1,. . ., An} .
Need to compute the closure of {A1,. . ., An} under the given FDs.This step is optional, but leads to a smaller number of component relations.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 18
Decomposition into BCNF
Decompose R into R1 and R2: R1 contains {A1,. . ., An} and {B1,. . ., Bm} and {C1,. . ., Ck} R2 contains {A1,. . ., An} and all attributes not involved in the FD.Continue to decompose the resulting relations until there is no more FD for any of the Ri that violates BCNF.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 19
Decomposition into BCNF
Consider Movies1 (title, year, length, genre, studioName, starName).title year length genre studioName violates BCNF.Decompose Movies1 into Movies2 (title, year, length, genre, studioName)
and Movies3 (title, year, starName).{title, year, starName} is key for Movies3.In general, more than one decomposition necessary to reach a schema in BCNF.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 20
Decomposition into BCNFConsider relation schema {title, year, studioName, president, presAddr}.FDs
title year studioNamestudioName presidentpresident presAddr
{title, year} is the only key.The last two FDs violate BCNF. Assume that we first deal with studioName president.Decompose into
{title, year, studioName}, and {studioName, president, presAddr}.While the first of these relations is in BCNF, the second one is not:
president presAddr, but studioName is key.Decompose the second relation further into {studioName, president} and
{president, presAddr}.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 21
Recoverability of InformationSo far, we know how to decompose R into R1, . . ., Rk such that each of the resulting relations is in BNCF.But can we reconstruct R from R1, . . ., Rk ?
More precisely: is ?Yes!To see why, consider a relation R(A,B,C) and a FD BC that violates BCNF.Decompose R into R1(A,B) and R2(B,C).Let t = (a,b,c) be an arbitrary tuple from R.
Then and consequently .
Thus, all tuples in R are in
kRRR
1
RcbRbaCBBA ,,
),(and),(
21),,( RRcba
.21RR
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 22
Recoverability of InformationLet t = (a, b, d) and v = (d, b, e) be arbitray tuples in R, which implies that
Then Is it also in R?BC and implies d = e.Since (a, b, d) in R, also (a, b, e) in R.Thus, all tuples in are in R. This means that the BCNF decomposition is lossless.
21RR
RebRdbRbdRba
CBCB
BABA
,,
,,
),( and),(),( and),(
.21
),,( RReba
RebRdbCBCB ,,
),( and),(
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 23
Summary
A functional dependency (FD) states that two tuples that agree on some set of attributes also agree on another attribute set.Keys are special cases of functional dependencies.Redundancies in a relational table lead to anomalies such as update and deletion anomalies. A relation is in Boyce-Codd normal form (BCNF), if the left sides of all non-trivial FDs contain a key.A schema in BCNF avoids the above anomalies.A given schema can be decomposed into subsets of attributes such that the resulting tables are all in BCNF and the join of these tables recovers the original table.