CSE 480: Database Systems
Lecture 18: Normal Forms and Normalization
Functional Dependencies
A functional dependency (FD) takes the form X → Y, where X and Y are subsets of the attributes of a relation
What does X → Y mean?
The values of the attributes in X determine the values of the attributes in Y;
The values of the attributes in Y depend on the values of the attributes in X;
Suppose t1 and t2 are two tuples in the relation. If t1 and t2 have the same values for attribute set X, then they must also have the same values for attribute set Y
Functional Dependencies
EMP_PRJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation)
{Ssn} → {Ename} is an FD
Ename depends on Ssn
{Pnumber} → {Pname, Plocation} is an FD
Pname and Plocation depend on Pnumber
Two rows with the same Pnumber must have the same values of Pname and Plocation
{Plocation} → {Pnumber} is not an FD
{Ename, Plocation} → {Pnumber} is not an FD
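The FD definition can be tested mechanically against a table instance. Below is a minimal Python sketch, assuming each tuple is a dict keyed by attribute name; `fd_holds` and the `emp_prj` rows are hypothetical illustrations, not part of the lecture. One caveat: an instance can only refute an FD. If no two rows violate X → Y, the current data is consistent with the FD, but that does not make X → Y a constraint of the schema.

```python
from typing import Dict, Iterable, Sequence, Tuple

Row = Dict[str, object]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Check whether the FD lhs -> rhs is satisfied by this particular set of rows."""
    seen: Dict[Tuple[object, ...], Tuple[object, ...]] = {}
    for row in rows:
        x_val = tuple(row[a] for a in lhs)
        y_val = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical EMP_PRJ rows (values invented for illustration)
emp_prj = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 10, "Ename": "Smith", "Pname": "ProductX", "Plocation": "Bellaire"},
    {"Ssn": "111", "Pnumber": 2, "Hours": 5, "Ename": "Smith", "Pname": "ProductY", "Plocation": "Sugarland"},
    {"Ssn": "222", "Pnumber": 1, "Hours": 20, "Ename": "Wong", "Pname": "ProductX", "Plocation": "Bellaire"},
    {"Ssn": "222", "Pnumber": 3, "Hours": 8, "Ename": "Wong", "Pname": "ProductZ", "Plocation": "Bellaire"},
]

ssn_ename = fd_holds(emp_prj, ["Ssn"], ["Ename"])                   # True
pnum_proj = fd_holds(emp_prj, ["Pnumber"], ["Pname", "Plocation"])  # True
ploc_pnum = fd_holds(emp_prj, ["Plocation"], ["Pnumber"])           # False: Bellaire maps to 1 and 3
```

The last check fails because two Bellaire rows carry different project numbers, which is exactly why {Plocation} → {Pnumber} is not an FD.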
Functional Dependencies
Graphical Representation of FDs:
FD1: {SSN, Pnumber} → {Hours}
FD2: {SSN} → {Ename}
FD3: {PNumber} → {PName, PLocation}
Functional Dependencies
A relation may contain many functional dependencies
– How can we derive all of them?
Given a set of functional dependencies of a relation R:
Σ = {AC → B, A → C, D → A}
– Does Σ entail AD → BC (i.e., is AD → BC also an FD of R)?
Inference Rules (Example)
Given Σ = {AC → B, A → C, D → A}
Does Σ entail AD → BC?
1. D → A (given in Σ)
2. AD → A (augmenting (1) with A)
3. A → C (given in Σ)
4. A → AC (augmenting (3) with A)
5. AC → B (given in Σ)
6. AC → BC (augmenting (5) with C)
7. A → BC (transitivity on (4) and (6))
8. AD → BC (transitivity on (2) and (7))
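Deriving an FD by hand, as above, requires choosing the right rule at each step. A common mechanical alternative (a different technique from the step-by-step derivation shown) is the attribute-closure algorithm: compute X⁺, the set of all attributes determined by X, by repeatedly firing any FD whose left side is already contained in the current set. Σ entails X → Y exactly when Y ⊆ X⁺. A minimal Python sketch, assuming single-letter attribute names; the helpers `fd`, `closure`, and `entails` are hypothetical names.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from single-letter attribute names, e.g. fd("AC", "B") for AC -> B."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return X+, the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # lhs is already determined, so rhs is determined too
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def entails(fds: List[FD], lhs: str, rhs: str) -> bool:
    """The FD set entails lhs -> rhs exactly when rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)


sigma = [fd("AC", "B"), fd("A", "C"), fd("D", "A")]
ad_closure = closure("AD", sigma)  # == {"A", "B", "C", "D"}
print(entails(sigma, "AD", "BC"))  # True
```

Here AD⁺ = {A, B, C, D}, which contains BC, so the closure test reaches the same conclusion as the eight-step derivation.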
Normal Forms and Normalization
Functional dependencies can help us analyze whether a relational schema is “good” or “bad”
In the relational model, we don't say that a schema is good or bad. We say it is in 1NF, 2NF, 3NF, etc.
– Properties: the higher the NF, the stricter the conditions placed on the schema; a relation in a higher NF is also in every lower NF, but not vice versa
– A 3NF relation is also in 2NF and 1NF (but not necessarily in BCNF, 4NF, or 5NF)
Normalization:
– The process of decomposing "bad" (lower normal form) relations by breaking up their attributes into smaller relations
First Normal Form
A schema is in 1NF if it permits only atomic (indivisible) attribute values
1NF disallows
– composite attributes
– multivalued attributes
The relational model itself prohibits relations that contain composite and multivalued attributes
– Therefore, all schemas in the relational model are at least in 1NF
Example
Relation is not in 1NF because it has a multivalued attribute (Dlocations)
Normalization into 1NF
Three strategies for normalization into 1NF:
– Place the "offending" attribute in a separate relation:
DEPARTMENT(Dname, Dnumber, Dmgr_ssn), DEPTLOCATIONS(Dnumber, Dlocation)
– Change Dlocations into Dlocation and make {Dnumber, Dlocation} the primary key:
DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dlocation)
– If the maximum number of locations per department is known (e.g., 3), use one attribute per location:
DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dloc1, Dloc2, Dloc3)
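The first strategy can be sketched in Python as follows, assuming rows are dicts and the multivalued Dlocations value is stored as a list. The function `split_multivalued` and the sample rows are hypothetical illustrations.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_multivalued(rows: List[Row], key: str, attr: str, new_attr: str) -> Tuple[List[Row], List[Row]]:
    """Move the multivalued attribute `attr` into a separate relation.

    Returns (base, side): `base` keeps every other attribute of each row, and
    `side` holds one (key, value) row per element of the multivalued attribute.
    """
    base: List[Row] = []
    side: List[Row] = []
    for row in rows:
        base.append({k: v for k, v in row.items() if k != attr})
        for value in row[attr]:
            side.append({key: row[key], new_attr: value})
    return base, side


# Hypothetical non-1NF DEPARTMENT rows: Dlocations holds a list, not an atomic value
departments = [
    {"Dname": "Research", "Dnumber": 5, "Dmgr_ssn": "333445555",
     "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dname": "Administration", "Dnumber": 4, "Dmgr_ssn": "987654321",
     "Dlocations": ["Stafford"]},
]

department, dept_locations = split_multivalued(departments, "Dnumber", "Dlocations", "Dlocation")
# department:     2 rows, no Dlocations attribute
# dept_locations: 4 rows, e.g. {"Dnumber": 5, "Dlocation": "Bellaire"}
```

The side relation has one row per (Dnumber, Dlocation) pair, so its key is {Dnumber, Dlocation}, and every value in both relations is atomic.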
Is 1NF Sufficient?
Key of the relation is the combination of (Dnumber, Dlocation)
Relation is in 1NF, but there are redundancies:
– Two rows with the same Dnumber must have the same Dname and Dmgr_ssn (even though their Dlocation values are different)
2NF (Motivating Example)
Functional dependencies:
– {Dnumber, Dlocation} → {Dname, Dmgr_ssn} (from primary key)
– {Dnumber} → {Dname, Dmgr_ssn}
Consequence: two tuples with the same Dnumber but different Dlocation values will have the same Dname and Dmgr_ssn, which leads to redundancy!
If {Dnumber} → {Dname, Dmgr_ssn} were not an FD, there would be no redundancy problem
2NF (Motivating Example)
This example suggests that if X → Y is an FD where X is the key, then X' → Y, where X' is a proper subset of X, must not also be an FD of the same table; otherwise, there will be redundancies in the table
– We say that X → Y must be a full FD
{Dnumber, Dlocation} → {Dname, Dmgr_ssn} (from primary key)
{Dnumber} → {Dname, Dmgr_ssn}
Full versus Partial Dependencies
X → Y is a full FD if removing any attribute from X means the FD no longer holds
X → Y is a partial FD if there is an FD X' → Y where X' is a proper subset of X
Example:
– {Dnumber, Dlocation} → {Dname, Dmgr_ssn} is a partial FD because {Dnumber} → {Dname, Dmgr_ssn} is also an FD of the schema
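The full/partial distinction can be checked with attribute closures: X → Y is partial if some X minus one attribute still determines Y. A minimal Python sketch using the DEPARTMENT FDs above; the helpers `fd`, `closure`, and `is_full_fd` are hypothetical names.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build an FD from lists of attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attributes functionally determined by attrs (attribute-closure algorithm)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs: Iterable[str], rhs: Iterable[str], fds: List[FD]) -> bool:
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs.

    Dropping one attribute at a time is enough: if a proper subset X' of X
    determines Y, then so does X - {a} for any a in X that is not in X'.
    """
    lhs_set, rhs_set = set(lhs), set(rhs)
    if not rhs_set <= closure(lhs_set, fds):
        raise ValueError("lhs -> rhs does not hold under fds")
    for a in lhs_set:
        if rhs_set <= closure(lhs_set - {a}, fds):
            return False  # a smaller left side suffices: partial FD
    return True


dept_fds = [
    fd(["Dnumber", "Dlocation"], ["Dname", "Dmgr_ssn"]),
    fd(["Dnumber"], ["Dname", "Dmgr_ssn"]),
]
key_fd_full = is_full_fd(["Dnumber", "Dlocation"], ["Dname", "Dmgr_ssn"], dept_fds)  # False
dnum_fd_full = is_full_fd(["Dnumber"], ["Dname", "Dmgr_ssn"], dept_fds)              # True
```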
Prime versus NonPrime Attributes
Prime attribute:
– an attribute that is a member of some candidate key K
– Example (from previous slide): Dnumber, Dlocation
Nonprime attribute:
– an attribute that is not a member of any candidate key
– Example (from previous slide): Dname, Dmgr_ssn
2NF Definition
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the key of R
Since {Dnumber, Dlocation} is the key:
– {Dnumber, Dlocation} → {Dname, Dmgr_ssn} is an FD of the schema
– But {Dnumber} → {Dname, Dmgr_ssn} is also an FD of the schema
The non-prime attributes are not fully functionally dependent on the key
So the schema is not in 2NF
Example
FDs:
– {SSN, Pnumber} → {Hours, Ename, Pname, Plocation}
– {SSN} → {Ename}
– {Pnumber} → {Pname, Plocation}
Example
– {SSN, PNUMBER} → HOURS is a full FD since neither SSN → HOURS nor PNUMBER → HOURS holds
– But {SSN, PNUMBER} → ENAME is a partial dependency since SSN → ENAME also holds
2NF
– Is {SSN, PNUMBER} → {Hours} a full FD? Yes
– Is {SSN, PNUMBER} → {Ename} a full FD? No
– Is {SSN, PNUMBER} → {Pname} a full FD? No
– Is {SSN, PNUMBER} → {Plocation} a full FD? No
Conclusion: The EMP_PROJ relation is not in 2NF
2NF normalization: take the "offending" FDs and create separate relations
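The 2NF check above can be automated for small schemas: find the candidate keys, then look for non-prime attributes that a proper subset of some key already determines. A Python sketch under these assumptions: FDs are listed explicitly, candidate keys are found by brute force over attribute subsets (exponential, so only suitable for toy schemas), and all function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attributes functionally determined by attrs (attribute-closure algorithm)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs: Iterable[str], fds: List[FD]) -> List[Set[str]]:
    """Brute-force search for minimal superkeys, smallest subsets first."""
    all_attrs = set(attrs)
    keys: List[Set[str]] = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= all_attrs:
                keys.append(cand)
    return keys


def partial_dependencies(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Non-prime attributes determined by a proper subset of some candidate key."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    offending: Set[str] = set()
    for key in keys:
        for a in key:
            # Anything non-prime reachable without `a` is only partially dependent
            offending |= closure(key - {a}, fds) - prime
    return offending


emp_proj_attrs = ["Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"]
emp_proj_fds = [
    fd(["Ssn", "Pnumber"], ["Hours"]),
    fd(["Ssn"], ["Ename"]),
    fd(["Pnumber"], ["Pname", "Plocation"]),
]
keys = candidate_keys(emp_proj_attrs, emp_proj_fds)              # [{"Pnumber", "Ssn"}]
violations = partial_dependencies(emp_proj_attrs, emp_proj_fds)  # {"Ename", "Pname", "Plocation"}
in_2nf = not violations                                          # False
```

The result agrees with the slide: only Hours is fully dependent on {SSN, Pnumber}.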
Normalizing into 2NF
{SSN, Pnumber} → {Hours},
{SSN} → {Ename},
{Pnumber} → {Pname, Plocation}
Is 2NF sufficient?
Key is SSN
FDs:
– {SSN} → {Ename, Bdate, Address, Dnumber, Dname, Dmgr_ssn}
– {Dnumber} → {Dname, Dmgr_ssn}
Is the table in 2NF?
– Yes, because every non-prime attribute is fully functionally dependent on the key
Is 2NF sufficient?
Are there still redundancies in the relation? Yes
– Two tuples with the same Dnumber have the same Dname and Dmgr_ssn
What is the “offending” FD that causes redundancy?
Is 2NF sufficient?
Functional dependencies:
– {SSN} → {Ename, Bdate, Address, Dnumber, Dname, Dmgr_ssn}
– {Dnumber} → {Dname, Dmgr_ssn}
Since Dnumber is not a key, two rows can have the same Dnumber, and then their Dname and Dmgr_ssn must also be the same => redundancy!
3NF
A relation schema R is in third normal form (3NF) if
– It is in 2NF, and
– No non-prime attribute in R is transitively dependent on the primary key
If X → Y and Y → Z are FDs, with X as the primary key, we consider Z to be transitively dependent on X only if Y is not a candidate key. If Y is a candidate key, we do not consider this a transitive dependency problem
Example of 3NF
FDs:
– SSN → Ename, Bdate, Address, Dnumber
– SSN → Dnumber
– Dnumber → Dname, Dmgr_ssn
Dname is transitively dependent on the primary key SSN because SSN → Dnumber and Dnumber → Dname are FDs of the relation
– Therefore the relation is not in 3NF
Third Normal Form
Another way to check whether a relation is in 3NF (without checking for partial and transitive dependencies):
– A relation schema R is in 3NF if, whenever a nontrivial FD X → A holds, either X is a superkey of R or A is a prime attribute of R
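The superkey-or-prime test translates directly into code. A minimal Python sketch, applied to the FDs of the example relation (SSN, Ename, Bdate, Address, Dnumber, Dname, Dmgr_ssn); the helper and variable names are hypothetical, and candidate keys are found by brute force, which only suits small schemas. For testing R itself, it is enough to check the given FDs with their right-hand sides split into single attributes rather than every FD in the closure: if some implied FD violates 3NF, the closure computation of its left side must fire a given FD that also violates it.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attributes functionally determined by attrs (attribute-closure algorithm)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs: Iterable[str], fds: List[FD]) -> List[Set[str]]:
    """Brute-force search for minimal superkeys, smallest subsets first."""
    all_attrs = set(attrs)
    keys: List[Set[str]] = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= all_attrs:
                keys.append(cand)
    return keys


def third_nf_violations(attrs: Iterable[str], fds: List[FD]) -> Set[Tuple[FrozenSet[str], str]]:
    """Pairs (X, A) for given FDs X -> A where X is not a superkey and A is not prime."""
    all_attrs = set(attrs)
    prime = set().union(*candidate_keys(all_attrs, fds))
    found: Set[Tuple[FrozenSet[str], str]] = set()
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= all_attrs
        for a in rhs - lhs:  # ignore the trivial part of the FD
            if not is_superkey and a not in prime:
                found.add((lhs, a))
    return found


emp_dept_attrs = ["Ename", "Ssn", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_ssn"]
emp_dept_fds = [
    fd(["Ssn"], ["Ename", "Bdate", "Address", "Dnumber"]),
    fd(["Dnumber"], ["Dname", "Dmgr_ssn"]),
]
emp_dept_violations = third_nf_violations(emp_dept_attrs, emp_dept_fds)
# {(frozenset({"Dnumber"}), "Dname"), (frozenset({"Dnumber"}), "Dmgr_ssn")}
```

Both violations have Dnumber on the left side, the same FD identified as a transitive dependency.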
3NF
FDs:
– SSN → Ename, Bdate, Address
– SSN → Dnumber
– Dnumber → Dname, Dmgr_ssn
But Dnumber is not a superkey, and Dname and Dmgr_ssn are not prime attributes
Therefore the relation is not in 3NF (Dnumber → Dname, Dmgr_ssn is a transitive dependency)
Normalizing into 3NF
Take the “offending” FDs and create separate relations
Is 3NF enough to remove redundancy?
FDs:
– {Student, Course} → Instructor
– Instructor → Course
Relation is in 3NF (but there is still redundancy)
Assume every instructor teaches only one course
Key is (Student, Course)
Instructor → Course does not violate 3NF: Instructor is not a superkey, but Course is a prime attribute
BCNF (Boyce-Codd Normal Form)
A relation schema R is in BCNF if, whenever a nontrivial FD X → A holds in R, X is a superkey of R
FDs:
– {Student, Course} → Instructor
– Instructor → Course
Relation is not in BCNF because Instructor is not a superkey
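Running the 3NF and BCNF tests side by side on the {Student, Course, Instructor} relation shows why it passes one and fails the other: {Student, Instructor} is also a candidate key, so every attribute is prime and the 3NF "A is prime" clause applies, while BCNF has no such clause. A Python sketch, assuming explicitly listed FDs and brute-force candidate keys; the helper names and the `teach_*` variables are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attributes functionally determined by attrs (attribute-closure algorithm)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs: Iterable[str], fds: List[FD]) -> List[Set[str]]:
    """Brute-force search for minimal superkeys, smallest subsets first."""
    all_attrs = set(attrs)
    keys: List[Set[str]] = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue
            if closure(cand, fds) >= all_attrs:
                keys.append(cand)
    return keys


def third_nf_violations(attrs: Iterable[str], fds: List[FD]) -> Set[Tuple[FrozenSet[str], str]]:
    """Given FDs X -> A (A not in X) where X is not a superkey and A is not prime."""
    all_attrs = set(attrs)
    prime = set().union(*candidate_keys(all_attrs, fds))
    return {
        (lhs, a)
        for lhs, rhs in fds
        for a in rhs - lhs
        if not closure(lhs, fds) >= all_attrs and a not in prime
    }


def bcnf_violations(attrs: Iterable[str], fds: List[FD]) -> Set[FrozenSet[str]]:
    """Left sides of nontrivial given FDs that are not superkeys."""
    all_attrs = set(attrs)
    return {lhs for lhs, rhs in fds if rhs - lhs and not closure(lhs, fds) >= all_attrs}


teach_attrs = ["Student", "Course", "Instructor"]
teach_fds = [
    fd(["Student", "Course"], ["Instructor"]),
    fd(["Instructor"], ["Course"]),
]
keys = candidate_keys(teach_attrs, teach_fds)             # [{"Course", "Student"}, {"Instructor", "Student"}]
in_3nf = not third_nf_violations(teach_attrs, teach_fds)  # True: every attribute is prime
bcnf_bad = bcnf_violations(teach_attrs, teach_fds)        # {frozenset({"Instructor"})}
```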
Achieving BCNF by Decomposition
STUD_COURSE
– Key is {Student, Course}
COURSE_INSTRUCT
– Key is {Instructor}
– FD: Instructor → Course
Loses the FD {Student, Course} → Instructor
– But no redundancy
Decomposition 1
Problem: the decomposition is not a lossless join (i.e., it does not have the nonadditive join property)
– i.e., spurious tuples may be generated
Decomposition 2
Dependency preserving? No
– loses the FD {Student, Course} → Instructor
Lossless join? Yes
Decomposition 3
Dependency preserving? No
– loses the FD {Student, Course} → Instructor
Lossless join? No
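Both properties can be checked with attribute closures. For a decomposition into two relations R1 and R2, the join is lossless exactly when (R1 ∩ R2)⁺ contains all of R1 or all of R2. For dependency preservation, a standard test grows Z from X by repeatedly adding (Z ∩ Ri)⁺ ∩ Ri for each piece Ri and checks whether Y is reached, which avoids computing the projected FDs explicitly. The sketch below applies both tests to the three ways of splitting {Student, Course, Instructor} into two relations. The figures that define Decompositions 1 to 3 are not reproduced here, so the pairs are named by their shared attribute rather than by number; all function and variable names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attributes functionally determined by attrs (attribute-closure algorithm)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1: Iterable[str], r2: Iterable[str], fds: List[FD]) -> bool:
    """Binary test: the shared attributes must determine all of R1 or all of R2."""
    s1, s2 = set(r1), set(r2)
    common_plus = closure(s1 & s2, fds)
    return common_plus >= s1 or common_plus >= s2


def preserves_dependencies(pieces: Iterable[Iterable[str]], fds: List[FD]) -> bool:
    """True if every FD in fds is implied by the FDs that hold within the pieces."""
    parts = [set(p) for p in pieces]
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # Attributes of this piece determined by the part of z inside it
                gained = closure(z & part, fds) & part
                if not gained <= z:
                    z |= gained
                    changed = True
        if not rhs <= z:
            return False  # this FD can only be enforced by joining the pieces
    return True


teach_fds = [
    fd(["Student", "Course"], ["Instructor"]),
    fd(["Instructor"], ["Course"]),
]
by_instructor = ({"Instructor", "Course"}, {"Instructor", "Student"})
by_course = ({"Course", "Instructor"}, {"Course", "Student"})
by_student = ({"Student", "Instructor"}, {"Student", "Course"})

for name, (r1, r2) in [("by_instructor", by_instructor), ("by_course", by_course), ("by_student", by_student)]:
    print(name, is_lossless_binary(r1, r2, teach_fds), preserves_dependencies((r1, r2), teach_fds))
```

Only the split that shares Instructor is lossless, because Instructor → Course makes Instructor a key of {Instructor, Course}; none of the three preserves {Student, Course} → Instructor.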
Summary
1st normal form
– No composite/multivalued attributes in relations
2nd, 3rd, and Boyce-Codd normal forms
– Eliminate redundancies based on FDs
More normal forms (see textbook)
– 4th: deals with multivalued dependencies
– 5th: deals with join dependencies