Normalization
Sridhar [email protected]
EMP_PROJ
SSN  PNUMBER  HOURS  ENAME  PNAME       PLOC
E1   P1       20     Joe    CIS Roof    UNCW
E1   P2       20     Joe    Restaurant  Mayfaire
E2   P1       40     Joe    CIS Roof    UNCW
• Something feels wrong about this design
• Try adding a row – Insertion anomaly
• Try deleting a row – Deletion anomaly
• Try updating a row – Update anomaly
• Need a formal way to reason about what is wrong with it and how to fix it
Functional Dependency
• Constraints between attribute sets in a relation
• If X and Y are sets of attributes of a relation R, and whenever two tuples in R have the same X-values they also have the same Y-values, we say that X functionally determines Y.
Functional Dependency
• Written as X -> Y
  – X functionally determines Y
  – Y is functionally determined by X
  – X is the determinant, Y is the dependent
• Examples
  – SSN -> SSN (trivial dependency)
  – PNUMBER -> PNAME
  – SSN -> ENAME
  – SSN, PNUMBER -> HOURS
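The definition can be tested against a concrete relation instance. The sketch below is illustrative only: `fd_holds`, `COLUMNS`, and the `EMP_PROJ` list are names chosen here, not part of the slides, and the rows are the three EMP_PROJ tuples from the opening example. A check like this can refute a dependency but cannot prove one, because a dependency is a statement about every legal instance of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs yet differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

COLUMNS = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"]
EMP_PROJ = [dict(zip(COLUMNS, values)) for values in [
    ("E1", "P1", 20, "Joe", "CIS Roof", "UNCW"),
    ("E1", "P2", 20, "Joe", "Restaurant", "Mayfaire"),
    ("E2", "P1", 40, "Joe", "CIS Roof", "UNCW"),
]]

print(fd_holds(EMP_PROJ, ["SSN"], ["ENAME"]))             # True
print(fd_holds(EMP_PROJ, ["SSN", "PNUMBER"], ["HOURS"]))  # True
print(fd_holds(EMP_PROJ, ["ENAME"], ["SSN"]))             # False: Joe appears with E1 and E2
print(fd_holds(EMP_PROJ, ["HOURS"], ["SSN"]))             # True, but only in this snapshot
```

The last call shows why an instance is not enough: HOURS -> SSN happens to hold for these three rows, yet it is not a constraint anyone would want on the relation.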
Functional Dependency
• Between sets of attributes, not just single attributes
• Holds for all time, not just for a particular instance (snapshot) of a relation
• Formally states constraints that exist for the relation
  – These constraints are in addition to those imposed by primary keys and foreign keys
Functional dependencies and keys
• If X functionally determines all attributes of R, then X is a super key
• If X is irreducible, i.e. every member of X is essential for the functional dependencies to hold, then X is a candidate key.
• Attributes that are part of a candidate key are key (prime) attributes
Examples
Super key:
  – SSN, PNUMBER, PNAME -> SSN, PNUMBER, HOURS, ENAME, PNAME, PLOC
Candidate key:
  – SSN, PNUMBER -> SSN, PNUMBER, HOURS, ENAME, PNAME, PLOC
SSN  PNUMBER  HOURS  ENAME  PNAME       PLOC
E1   P1       20     Joe    CIS Roof    UNCW
E1   P2       20     Joe    Restaurant  Mayfaire
E2   P1       40     Joe    CIS Roof    UNCW
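Whether a set of attributes is a super key or a candidate key can be decided from the functional dependencies alone by computing an attribute closure. The following is a minimal sketch, not a method taken from the slides: the function names are invented here, and the dependency list assumes the EMP_PROJ dependencies SSN -> ENAME, PNUMBER -> {PNAME, PLOC}, and {SSN, PNUMBER} -> HOURS.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """A super key determines every attribute of the relation."""
    return closure(attrs, fds) == set(all_attrs)

def is_candidate_key(attrs, all_attrs, fds):
    """A candidate key is a super key from which no attribute can be removed."""
    attrs = set(attrs)
    return is_superkey(attrs, all_attrs, fds) and not any(
        is_superkey(attrs - {a}, all_attrs, fds) for a in attrs
    )

R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"}
FDS = [  # assumed dependencies for EMP_PROJ
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOC"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(is_superkey({"SSN", "PNUMBER", "PNAME"}, R, FDS))       # True
print(is_candidate_key({"SSN", "PNUMBER", "PNAME"}, R, FDS))  # False: PNAME is not essential
print(is_candidate_key({"SSN", "PNUMBER"}, R, FDS))           # True
```

Removing one attribute at a time is enough for the irreducibility test, because any subset of a non-super-key is also not a super key.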
Redundancy
• If in a relation R, A -> B and A is not a candidate key for R, then R will involve some redundancy.
SSN PNUMBER HOURS ENAME PNAME PLOC
Intuitively, all functional dependencies in a relation should involve candidate keys to eliminate redundancy
Normalization
• A process that uses functional dependencies to identify relation schemas that have an undesirable form (redundancy) and decomposes them into smaller schemas in which the redundancy has been eliminated.
Decomposition
• Decomposition should be
  – Lossless join
    • Allow exact recovery of the original relation (without spurious tuples)
  – Dependency preserving
    • Allow dependencies to be checked without requiring a join
Lossy decomposition
SSN PNUMBER HOURS ENAME
E1 P1 20 Joe
E1 P2 20 Joe
E2 P1 40 Joe
ENAME  PNAME       PLOC
Joe    CIS Roof    UNCW
Joe    Restaurant  Mayfaire
Natural join to recover original
SSN  PNUMBER  HOURS  ENAME  PNAME       PLOC
E1   P1       20     Joe    CIS Roof    UNCW
E1   P1       20     Joe    Restaurant  Mayfaire  (spurious)
E1   P2       20     Joe    CIS Roof    UNCW      (spurious)
E1   P2       20     Joe    Restaurant  Mayfaire
E2   P1       40     Joe    CIS Roof    UNCW
E2   P1       40     Joe    Restaurant  Mayfaire  (spurious)
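The spurious tuples can be reproduced by projecting EMP_PROJ onto the two schemas and joining the projections back on their only shared attribute, ENAME. This is a small sketch under stated assumptions: `project` and `natural_join` are helper names invented here, relations are modelled as lists of dictionaries, and projection removes duplicate tuples as relational projection does.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(distinct)]

def natural_join(left, right):
    """Combine rows that agree on every attribute the two schemas share."""
    common = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

COLUMNS = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"]
EMP_PROJ = [dict(zip(COLUMNS, values)) for values in [
    ("E1", "P1", 20, "Joe", "CIS Roof", "UNCW"),
    ("E1", "P2", 20, "Joe", "Restaurant", "Mayfaire"),
    ("E2", "P1", 40, "Joe", "CIS Roof", "UNCW"),
]]

works_on = project(EMP_PROJ, ["SSN", "PNUMBER", "HOURS", "ENAME"])
projects = project(EMP_PROJ, ["ENAME", "PNAME", "PLOC"])

joined = natural_join(works_on, projects)
spurious = [row for row in joined if row not in EMP_PROJ]

print(len(joined))    # 6
print(len(spurious))  # 3
```

Every original tuple is still present in the join, but so are three tuples that never existed, which is what makes the decomposition lossy: information about which employee works on which project has been lost.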
Heath’s Theorem
• If relation R = {A, B, C}, where A, B, C are attribute sets,
• and A -> B,
• then R1 = {A, B} and R2 = {A, C} form a lossless decomposition
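Heath's theorem can be tried out on EMP_PROJ using the dependency SSN -> ENAME, so A = {SSN}, B = {ENAME}, and C is everything else. The sketch below is an illustration, not a proof: `heath_split`, `project`, and `natural_join` are hypothetical helper names, and the check is run on a single instance only.

```python
def heath_split(attrs, x, y):
    """Given X -> Y (X and Y disjoint), return schemas X ∪ Y and attrs − Y."""
    r1 = [a for a in attrs if a in x or a in y]
    r2 = [a for a in attrs if a not in y]
    return r1, r2

def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(distinct)]

def natural_join(left, right):
    """Combine rows that agree on every attribute the two schemas share."""
    common = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

COLUMNS = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"]
EMP_PROJ = [dict(zip(COLUMNS, values)) for values in [
    ("E1", "P1", 20, "Joe", "CIS Roof", "UNCW"),
    ("E1", "P2", 20, "Joe", "Restaurant", "Mayfaire"),
    ("E2", "P1", 40, "Joe", "CIS Roof", "UNCW"),
]]

r1, r2 = heath_split(COLUMNS, {"SSN"}, {"ENAME"})
rejoined = natural_join(project(EMP_PROJ, r1), project(EMP_PROJ, r2))

print(r1)  # ['SSN', 'ENAME']
print(r2)  # ['SSN', 'PNUMBER', 'HOURS', 'PNAME', 'PLOC']
print(len(rejoined) == len(EMP_PROJ) and all(row in rejoined for row in EMP_PROJ))  # True
```

Because the shared attribute SSN determines ENAME, each row of the second projection matches exactly one row of the first, so the join cannot manufacture new tuples.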
Levels of normalization
• First normal form – 1NF
• Second normal form – 2NF
• Third normal form – 3NF
• Boyce-Codd Normal Form – BCNF
Increasingly stringent requirements
Normal Forms
[Figure: 1NF, 2NF, 3NF, and BCNF shown as nested sets]
First normal form
• A relation is in 1NF if all attribute values are atomic (by definition, all relations are in 1NF)
D_NAME    D_NUM  MGR_SSN    D_LOCATIONS
RESEARCH  5      334619276  {Lumberton, Red Springs, Raeford}
• Assume that a department can have multiple locations, like {Lumberton, Red Springs, Raeford}
• Relation not in 1NF
Resolution?
D_NAME    D_NUM  MGR_SSN    D_LOCATIONS
RESEARCH  5      334619276  Lumberton
RESEARCH  5      334619276  Red Springs
RESEARCH  5      334619276  Raeford
Decomposition
D_NAME  D_NUM  MGR_SSN  D_LOCATIONS
decomposes into
D_NAME  D_NUM  MGR_SSN
D_NUM  D_LOCATIONS
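Both resolutions are simple to express in code. The sketch below is only an illustration: `flatten` and `split_out` are names invented here, and the department row is the single RESEARCH example from the slide, with the set of locations modelled as a Python list.

```python
dept = {
    "D_NAME": "RESEARCH",
    "D_NUM": 5,
    "MGR_SSN": "334619276",
    "D_LOCATIONS": ["Lumberton", "Red Springs", "Raeford"],  # not atomic
}

def flatten(row, attr):
    """First resolution: one atomic-valued row per element of attr."""
    return [{**row, attr: value} for value in row[attr]]

def split_out(row, key, attr):
    """Second resolution: move attr into its own relation keyed by `key`."""
    base = {k: v for k, v in row.items() if k != attr}
    detail = [{key: row[key], attr: value} for value in row[attr]]
    return base, detail

for flat_row in flatten(dept, "D_LOCATIONS"):
    print(flat_row)

department, dept_locations = split_out(dept, "D_NUM", "D_LOCATIONS")
print(department)
print(dept_locations)
```

Flattening repeats D_NAME and MGR_SSN once per location, whereas the split stores them once and keeps only D_NUM alongside each location.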
Second Normal Form: 2NF
• A relation is in 2NF if
  – It is in 1NF, and
  – Every non-key attribute is fully (irreducibly) dependent on the primary key
Example: EMP_PROJ
SSN PNUMBER HOURS ENAME PNAME PLOC
• Functional Dependencies?
  – SSN -> ENAME
  – PNUMBER -> PNAME, PLOC
  – {SSN, PNUMBER} -> HOURS
• Relation not in 2NF
  – Non-key attributes ENAME, PNAME, and PLOC are not fully dependent on the primary key
Solution? Decompose
1a: SSN PNUMBER HOURS (2NF)
1b: SSN PNUMBER ENAME PNAME PLOC (2NF?)
Decompose further…
2a: SSN ENAME (2NF)
2b: SSN PNUMBER PNAME PLOC (2NF?)
And a little more…
3a: PNUMBER PNAME PLOC (2NF)
3b: SSN PNUMBER (3b is a part of 1a, so drop it)
2NF Normalization
1a: SSN PNUMBER HOURS (2NF)
2a: SSN ENAME (2NF)
3a: PNUMBER PNAME PLOC (2NF)
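The same three schemas can be derived mechanically by looking for non-key attributes that depend on only part of the primary key. What follows is a rough sketch rather than a general algorithm: the function names are invented here, and `decompose_2nf` assumes a two-attribute primary key such as {SSN, PNUMBER}, so that the proper subsets of the key cannot overlap.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, all_attrs, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    non_key = set(all_attrs) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = determined
    return found

def decompose_2nf(key, all_attrs, fds):
    """Split off each partial dependency; keep the key with what remains."""
    partial = partial_dependencies(key, all_attrs, fds)
    moved = set().union(*partial.values())
    schemas = [set(subset) | attrs for subset, attrs in partial.items()]
    schemas.append(set(all_attrs) - moved)
    return schemas

R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"}
KEY = {"SSN", "PNUMBER"}
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOC"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

for schema in decompose_2nf(KEY, R, FDS):
    print(sorted(schema))
```

A relation is in 2NF exactly when `partial_dependencies` returns an empty mapping; for EMP_PROJ it reports that SSN alone determines ENAME and PNUMBER alone determines PNAME and PLOC.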
More than one way to get here
SSN PNUMBER HOURS ENAME PNAME PLOC
1a: PNUMBER PNAME PLOC (2NF)
1b: SSN PNUMBER HOURS ENAME (not 2NF)
Decompose further…
2a: SSN PNUMBER HOURS (2NF)
2b: SSN PNUMBER ENAME (not 2NF)
And a little bit more
3a: SSN PNUMBER (redundant)
3b: SSN ENAME (2NF)
3NF Normalization
• A relation is in 3NF if
  – It is in 2NF, and
  – The non-key attributes are mutually independent. That is, no functional dependencies exist between non-key attributes.
Example: EMP_DEPT
• Functional Dependencies?
  – SSN -> {ENAME, DOB, ADDRESS, DNUM}
  – DNUM -> {DNAME, DMGRSSN}
• Redundancy?
• Relation in 1NF?
• 2NF?
• 3NF?
SSN ENAME DOB ADDRESS DNUM DNAME DMGRSSN
3NF Normalization
1a: SSN ENAME DOB ADDRESS DNUM
1b: DNUM DNAME DMGRSSN
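The 3NF condition as stated on the slide (no dependencies between non-key attributes) can be checked directly from the dependency list. This is a simplified sketch that follows that definition only: the function names are hypothetical, SSN is assumed to be the primary key of EMP_DEPT, and the decomposition step is not the general 3NF synthesis algorithm.

```python
def non_key_dependencies(key, fds):
    """Dependencies whose determinant contains no key attribute
    and that determine at least one other non-key attribute."""
    key = set(key)
    return [(lhs, rhs) for lhs, rhs in fds
            if not (lhs & key) and (rhs - key - lhs)]

def decompose_3nf(all_attrs, key, fds):
    """Move each offending dependency into its own schema."""
    remaining = set(all_attrs)
    schemas = []
    for lhs, rhs in non_key_dependencies(key, fds):
        schemas.append(lhs | rhs)
        remaining -= rhs - lhs  # the determinant stays behind as a link
    return [remaining] + schemas

EMP_DEPT = {"SSN", "ENAME", "DOB", "ADDRESS", "DNUM", "DNAME", "DMGRSSN"}
KEY = {"SSN"}
FDS = [
    ({"SSN"}, {"ENAME", "DOB", "ADDRESS", "DNUM"}),
    ({"DNUM"}, {"DNAME", "DMGRSSN"}),
]

print(non_key_dependencies(KEY, FDS))  # only the DNUM dependency is reported
for schema in decompose_3nf(EMP_DEPT, KEY, FDS):
    print(sorted(schema))
```

DNUM remains in the employee schema so that the two resulting relations can still be joined, which matches the decomposition shown above.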
BCNF Normalization
• S# and SNAME – Supplier# and Supplier Name are unique
• FDs
  – S# -> SNAME
  – SNAME -> S#
  – S#, P# -> QTY
  – SNAME, P# -> QTY
• Candidate keys
  – {S#, P#} and {SNAME, P#}
S#  SNAME        P#  QTY
S1  Acme Supply  P1  100
S2  Gem Mfg      P1  200
S1  Acme Supply  P2  400
BCNF Normalization
• Redundancy?
• 1NF?
• 2NF?
• 3NF?
S#  SNAME        P#  QTY
S1  Acme Supply  P1  100
S2  Gem Mfg      P1  200
S1  Acme Supply  P2  400
BCNF
• Relation is in BCNF if and only if the only determinants are candidate keys
• FDs
  – S# -> SNAME
  – SNAME -> S#
  – S#, P# -> QTY
  – SNAME, P# -> QTY
BCNF Normalization
S# P# QTY
S1 P1 100
S2 P1 200
S1 P2 400
S#  SNAME
S1  Acme Supply
S2  Gem Mfg
Two candidate keys:
• S#
• SNAME
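The BCNF condition can be checked by testing whether every determinant determines the whole relation. The sketch below is an illustration under stated assumptions: `closure` and `bcnf_violations` are names invented here, and dependencies are restricted to a sub-schema by simply keeping those whose attributes fall inside it, which is enough for this example but is not a full projection of dependencies.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Nontrivial dependencies on attrs whose determinant is not a super key."""
    attrs = set(attrs)
    # Keep only dependencies that apply inside this schema (simplified restriction).
    local = [(lhs, rhs & attrs) for lhs, rhs in fds
             if lhs <= attrs and (rhs & attrs) - lhs]
    return [(lhs, rhs) for lhs, rhs in local if closure(lhs, local) != attrs]

FDS = [
    ({"S#"}, {"SNAME"}),
    ({"SNAME"}, {"S#"}),
    ({"S#", "P#"}, {"QTY"}),
    ({"SNAME", "P#"}, {"QTY"}),
]

print(bcnf_violations({"S#", "SNAME", "P#", "QTY"}, FDS))  # S# -> SNAME and SNAME -> S#
print(bcnf_violations({"S#", "P#", "QTY"}, FDS))           # []
print(bcnf_violations({"S#", "SNAME"}, FDS))               # []
```

In the original relation, S# and SNAME are determinants without being candidate keys, which is why the supplier name is repeated for every part supplied. After the split, each determinant is a candidate key of its own relation.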