Winter 2017
Bojana Bislimovska
CS 338
Functional Dependencies
Major research
• Design Guidelines for Relation Schemas
• Functional Dependency
• Set and Attribute Closure
• Schema Decomposition
• Boyce-Codd Normal Form
Outline
Measures to determine the quality of relation schema design:
• Clear semantics of attributes in the schema
• Reduction of redundant information in tuples
• Reduction of NULL values in tuples
• Disallowment of generating spurious tuples
These measures are not always independent of one another
Design Guidelines for Relation Schemas
Guideline 1: Design a relation schema so that it is easy to explain its meaning.
• Do not combine attributes from multiple entity types and relationship types into a single relation
• Example of violating guideline 1
• Every tuple includes employee and department data
• Redundant repetition of Dname and Manager Ssn for every employee (increases storage space)
• Potential for many NULL values (employee without department, department without employees)
Design Guidelines for Relation Schemas
• Update anomalies
Deleting the last employee in a department should not delete the department
Changing a department's name or manager requires many tuples to be updated
Inserting an employee requires checking that its department name and manager are consistent with existing tuples
• Guideline 2: Design base relation schemas so that no insertion, deletion, or modification anomalies are present
• Consistent with guideline 1
Design Guidelines for Relation Schemas
• NULL values in tuples
• If some attributes do not apply to all tuples in the relation
• Multiple interpretations of NULL values
Guideline 3: Avoid placing attributes in a base relation whose values may frequently be NULL. If NULLs are unavoidable, make sure they apply in exceptional cases, and do not apply to a majority of tuples in the relation.
Design Guidelines for Relation Schemas
• Spurious tuples
Guideline 4: Design relation schemas so that they can be joined with equality conditions on appropriately related attribute pairs (primary key, foreign key) in a way that guarantees that no spurious tuples are generated.
Design Guidelines for Relation Schemas
Functional Dependency
• Constraint between two sets of attributes from the database
• Expresses the fact that in a relation schema (values of) a set of attributes uniquely determine (values of) another set of attributes
Definition: Given relation schema R(A1,A2,…,An) and sets of attributes X ⊆ {A1,A2,…,An}, Y ⊆ {A1,A2,…,An}, X → Y specifies the following constraint (functional dependency): for any tuples t1 and t2 in any valid relation state r of R, if t1[X] = t2[X] then t1[Y] = t2[Y].
• A functional dependency is a property of the semantics (meaning) of the attributes
• Main use in the further description of a relation schema by specifying constraints on its attributes that must always hold
• Given a relation state
Cannot determine which functional dependencies hold
Can state that functional dependency does not hold if there are tuples that show violation of the dependency
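The point about relation states can be illustrated with a small check: a single instance can refute X → Y, but a clean pass over one instance never proves that the dependency holds in general. The following is a minimal sketch; the function name `violates_fd` and the sample rows are made up for illustration.

```python
def violates_fd(rows, lhs, rhs):
    """Return two rows that agree on lhs but differ on rhs, or None if no such pair exists."""
    seen = {}  # maps each combination of lhs values to the first row that had it
    for row in rows:
        x_val = tuple(row[a] for a in lhs)
        y_val = tuple(row[a] for a in rhs)
        if x_val in seen:
            first = seen[x_val]
            if tuple(first[a] for a in rhs) != y_val:
                return first, row  # witness pair: t1[X] = t2[X] but t1[Y] != t2[Y]
        else:
            seen[x_val] = row
    return None

# Hypothetical relation state, invented for this example.
works_on = [
    {"Ssn": 1, "Pnumber": 10, "Hours": 5, "Ename": "Ann"},
    {"Ssn": 1, "Pnumber": 20, "Hours": 8, "Ename": "Ann"},
    {"Ssn": 2, "Pnumber": 10, "Hours": 5, "Ename": "Bob"},
]

print(violates_fd(works_on, ["Ssn"], ["Ename"]))             # no violating pair in this state
print(violates_fd(works_on, ["Ssn"], ["Hours"]) is not None)  # this state refutes Ssn -> Hours
```

A result of `None` only says that this particular state is consistent with the dependency; whether the dependency holds is a statement about the meaning of the attributes.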
Functional Dependency
• Notation: {B1,B2,…,Bi} → {C1,C2,…,Cj}, but the set braces can be omitted if i=1 or j=1, respectively.
• Example
Following functional dependencies hold
Ssn → Ename
Pnumber → {Pname, Plocation}
{Ssn, Pnumber} → Hours
Functional Dependencies and Keys
• Keys (as defined previously):
A superkey is a set of attributes such that no two tuples (in an instance) agree on their values for those attributes.
A candidate key is a minimal superkey.
A primary key is a candidate key chosen by the DBA
• Relating keys and FDs:
If K ⊆ R is a superkey for relation schema R, then dependency K → R holds on R.
If dependency K → R holds on R and we assume that R does not contain duplicate tuples (i.e. relational model), then K ⊆ R is a superkey for relation schema R.
Closure of FD Sets
How do we know what additional FDs hold in a schema?
• The closure of the set of functional dependencies F (denoted F+) is the set of all functional dependencies that are satisfied by every relational instance that satisfies F.
• Informally, F+ includes all of the dependencies in F, plus any dependencies they imply.
Computing Attribute Closures
• We want to derive the maximal set of attributes functionally determined by some X (called the attribute closure of X, written X+).
function ComputeX+(X, F)
begin
  X+ := X;
  while true do
    if there exists (Y → Z) ∈ F such that
      (1) Y ⊆ X+, and
      (2) Z ⊈ X+
    then X+ := X+ ∪ Z
    else exit;
  return X+;
end
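The closure procedure translates almost line for line into Python. This is a sketch under assumptions: the name `attribute_closure` and the representation of F as a list of (left side, right side) set pairs are choices made here, not part of the slides. The FD set is the Ssn/Pnumber example given earlier.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:                      # stop once a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # (1) lhs is already in the closure, (2) rhs would add something new
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

fds = [
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]

print(sorted(attribute_closure({"Ssn"}, fds)))             # ['Ename', 'Ssn']
print(sorted(attribute_closure({"Ssn", "Pnumber"}, fds)))  # all six attributes
```

The loop ends because every productive step adds at least one attribute and the attribute set is finite.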
Computing Attribute Closures
• Let R be a relational schema and F a set of functional dependencies on R. Then:
• Theorem:
X is a superkey of R if and only if ComputeX+(X, F) = R
• Theorem:
X → Y ∈ F+ if and only if Y ⊆ ComputeX+(X, F)
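Both theorems turn into one-line tests once a closure routine is available. The sketch below assumes a schema R made of the seven attributes that appear in the deck's closure example; the helper names `is_superkey` and `implies` are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def is_superkey(x, schema, fds):
    """First theorem: X is a superkey of R iff the closure of X covers R."""
    return attribute_closure(x, fds) >= set(schema)

def implies(fds, lhs, rhs):
    """Second theorem: X -> Y is in F+ iff Y is contained in the closure of X."""
    return set(rhs) <= attribute_closure(lhs, fds)

# Assumed schema: every attribute mentioned in the example FD set.
R = {"Ssn", "EName", "Pnum", "Pname", "Ploc", "Hours", "Allowance"}
F = [
    ({"Ssn"}, {"EName"}),
    ({"Pnum"}, {"Pname", "Ploc"}),
    ({"Ploc", "Hours"}, {"Allowance"}),
]

print(is_superkey({"Pnum", "Hours"}, R, F))         # False: Ssn and EName are not reached
print(is_superkey({"Ssn", "Pnum", "Hours"}, R, F))  # True
print(implies(F, {"Pnum", "Hours"}, {"Allowance"})) # True
print(implies(F, {"Pnum"}, {"Allowance"}))          # False
```

This avoids materializing F+, which can be exponentially large, and answers each question with a single closure computation.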
Attribute Closure Example
Example: F = { Ssn → EName,
Pnum → Pname, Ploc,
Ploc, Hours → Allowance }
ComputeX+({Pnum,Hours},F):
FD                          X+
initial                     Pnum, Hours
Pnum → Pname, Ploc          Pnum, Hours, Pname, Ploc
Ploc, Hours → Allowance     Pnum, Hours, Pname, Ploc, Allowance
Schema Decomposition
Let R be a relation schema (= set of attributes). The collection {R1, . . . , Rn} of relation schemas is a decomposition of R if
R = R1 ∪ R2 ∪ · · · ∪ Rn
A good decomposition does not
• lose information
• complicate checking of constraints
• contain anomalies (or at least contains fewer anomalies)
Lossless-Join Decompositions
• We should be able to reconstruct the instance of the original table from the instances of the tables in the decomposition
Example: Consider replacing the table Marks
Student  Assignment  Group  Mark
Ann      A1          G1     80
Ann      A2          G3     60
Bob      A1          G2     60
by decomposing it into two tables, SGM
Student  Group  Mark
Ann      G1     80
Ann      G3     60
Bob      G2     60
and AM
Assignment  Mark
A1          80
A2          60
A1          60
Lossless-Join Decompositions
Computing the natural join of SGM and AM produces
Student  Assignment  Group  Mark
Ann      A1          G1     80
Ann      A2          G3     60
Ann      A1          G3     60
Bob      A2          G2     60
Bob      A1          G2     60
. . . and we get extra data (spurious tuples). We would therefore lose information if we were to replace the first table, Marks, by SGM and AM.
If re-joining SGM and AM would always produce exactly the tuples in Marks, then we call SGM and AM a lossless-join decomposition.
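The spurious tuples can be reproduced directly from the Marks data. The sketch below uses lists of dictionaries as a stand-in for relations; `project` and `natural_join` are illustrative helpers written for this example, not a database API.

```python
def project(rel, attrs):
    """Projection with duplicate elimination."""
    out = []
    for t in rel:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out

def natural_join(r, s):
    """Join on all attribute names the two relations share."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                merged = {**t, **u}
                if merged not in out:
                    out.append(merged)
    return out

marks = [
    {"Student": "Ann", "Assignment": "A1", "Group": "G1", "Mark": 80},
    {"Student": "Ann", "Assignment": "A2", "Group": "G3", "Mark": 60},
    {"Student": "Bob", "Assignment": "A1", "Group": "G2", "Mark": 60},
]

sgm = project(marks, ["Student", "Group", "Mark"])
am = project(marks, ["Assignment", "Mark"])
rejoined = natural_join(sgm, am)
spurious = [t for t in rejoined if t not in marks]

print(len(rejoined))  # 5
for t in spurious:    # the two tuples that were never in Marks
    print(t["Student"], t["Assignment"], t["Group"], t["Mark"])
```

The join returns more tuples than Marks held, so the original instance can no longer be told apart from the spurious additions.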
Lossless-Join Decompositions
• A decomposition {R1, R2} of R is lossless if and only if the common attributes of R1 and R2 form a superkey for either schema, that is, R1 ∩ R2 → R1 or R1 ∩ R2 → R2
• Example: In the previous example we had
R = {Student, Assignment, Group, Mark},
F = { Student, Assignment → Group, Mark },
R1 = {Student, Group, Mark},
R2 = {Assignment, Mark}
• Decomposition {R1, R2} is lossy because R1 ∩ R2 = {Mark} is not a superkey of either {Student, Group, Mark} or {Assignment, Mark}
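The two-schema test can be checked mechanically with an attribute closure. This is a sketch; `is_lossless` is a hypothetical helper name, and the second decomposition tried below is an alternative chosen here for contrast, not one taken from the slides.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def is_lossless(r1, r2, fds):
    """True if the shared attributes determine all of r1 or all of r2."""
    closure = attribute_closure(set(r1) & set(r2), fds)
    return set(r1) <= closure or set(r2) <= closure

F = [({"Student", "Assignment"}, {"Group", "Mark"})]

# Decomposition from the example: the shared attribute is only Mark.
print(is_lossless({"Student", "Group", "Mark"}, {"Assignment", "Mark"}, F))  # False

# Assumed alternative: both schemas keep the key {Student, Assignment}.
print(is_lossless({"Student", "Assignment", "Group"},
                  {"Student", "Assignment", "Mark"}, F))                     # True
```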
Normal Forms
What is a good relational database schema?
• Rule of thumb: Independent facts in separate tables:
Each relation schema should consist of a primary key and a set of mutually independent attributes
This is achieved by transforming a schema into a normal form.
• Goals:
Intuitive and straightforward transformation
Anomaly-free/Nonredundant representation of data
Boyce-Codd Normal Form (BCNF) - Informal
• BCNF formalizes the goal that in a good database schema, independent relationships are stored in separate tables.
• Given a database schema and a set of functional dependencies for the attributes in the schema, we can determine whether the schema is in BCNF. A database schema is in BCNF if each of its relation schemas is in BCNF.
• Informally, a relation schema is in BCNF if and only if any group of its attributes that functionally determines any others of its attributes functionally determines all others, i.e., that group of attributes is a superkey of the relation.
Formal Definition of BCNF
Let R be a relation schema and F a set of functional dependencies.
Schema R is in BCNF (w.r.t. F) if and only if
whenever (X → Y ) ∈ F + and XY ⊆ R, then either
• (X → Y ) is trivial (i.e., Y ⊆ X), or
• X is a superkey of R
A database schema {R1, . . . , Rn} is in BCNF if each relation schema Ri is in BCNF.
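The definition quantifies over F+, but when F is stated over R itself it is enough to test the dependencies listed in F: if none of them violates the condition, no implied dependency does either. The sketch below relies on that shortcut, so it is not suitable for sub-schemas produced by a decomposition, where dependencies must first be projected. The name `is_bcnf` is hypothetical, and the FD set is the supplier example from this deck.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def is_bcnf(schema, fds):
    """Check each listed FD: it must be trivial or have a superkey on its left side."""
    schema = set(schema)
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = attribute_closure(lhs, fds) >= schema
        if not trivial and not superkey:
            return False
    return True

R = {"Sno", "Sname", "City", "Pno", "Pname", "Price"}
F = [
    ({"Sno"}, {"Sname", "City"}),
    ({"Pno"}, {"Pname"}),
    ({"Sno", "Pno"}, {"Price"}),
]

print(is_bcnf(R, F))                                                   # False
print(is_bcnf({"Sno", "Sname", "City"}, [({"Sno"}, {"Sname", "City"})]))  # True
```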
BCNF and Redundancy
• Why does BCNF avoid redundancy? Consider:
Supplied_Items
Sno Sname City Pno Pname Price
• The following functional dependency holds: Sno → Sname, City
• Therefore, supplier name "Smith" and city "Ajax" must be repeated for each item supplied by supplier S1.
• Assume the above FD holds over a schema R that is in BCNF. This implies that:
Sno is a superkey for R
each Sno value appears on one row only
no need to repeat Sname and City values
Lossless-Join BCNF Decomposition
function DecomposeBCNF(R, F)
begin
  Result := {R};
  while some Ri ∈ Result and (X → Y) ∈ F+
        violate the BCNF condition do begin
    Replace Ri by Ri − (Y − X);
    Add {X, Y} to Result;
  end;
  return Result;
end
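A brute-force Python sketch of this procedure follows. It makes two assumptions the pseudocode leaves open: violations are found by trying every attribute subset X of Ri (exponential, so only for small schemas), and Y is taken to be everything X determines inside Ri apart from X itself. The names `find_violation` and `decompose_bcnf` are hypothetical.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def find_violation(schema, fds):
    """Return (X, Y) where X -> Y is nontrivial on schema and X is not a superkey, else None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            y = (attribute_closure(x, fds) & schema) - x
            if y and x | y != schema:
                return x, y
    return None

def decompose_bcnf(schema, fds):
    pending = [frozenset(schema)]
    result = []
    while pending:
        ri = pending.pop()
        violation = find_violation(ri, fds)
        if violation is None:
            result.append(ri)
        else:
            x, y = violation
            pending.append(ri - y)   # Ri - (Y - X); Y is already disjoint from X here
            pending.append(x | y)    # the new schema {X, Y}
    return result

R = {"Sno", "Sname", "City", "Pno", "Pname", "Price"}
F = [
    ({"Sno"}, {"Sname", "City"}),
    ({"Pno"}, {"Pname"}),
    ({"Sno", "Pno"}, {"Price"}),
]

for r in decompose_bcnf(R, F):
    print(sorted(r))
```

On the supplier schema this yields {Sno, Sname, City}, {Sno, Pno, Price} and {Pno, Pname}. Each split produces two strictly smaller schemas, so the loop terminates; a different search order can produce a different decomposition in general.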
Lossless-Join BCNF Decomposition
• No efficient procedure to do this exists.
• Results depend on sequence of FDs used to decompose the relations.
• It is possible that no lossless-join, dependency-preserving BCNF decomposition exists
Lossless-Join BCNF Decomposition Example
• R = {Sno,Sname,City,Pno,Pname,Price}
• functional dependencies:
Sno → Sname,City
Pno → Pname
Sno,Pno → Price
• This schema is not in BCNF because, for example, Sno determines Sname and City, but is not a superkey of R.
Lossless-Join BCNF Decomposition Example
• The complete schema is now
R1 = {Sno,Sname,City}
R2 = {Sno,Pno,Price}
R3 = {Pno,Pname}
• This schema is a lossless-join, BCNF decomposition of the original schema R.