anup-kumar
Relational Database Design
BCNF, 3NF
2
Learning Objectives
Understand the rationale (anomalies) and definition of the main normal
forms based on functional dependencies (2NF, 3NF and BCNF)
Be able to decompose (or synthesize) a schema into BCNF or 3NF, preserving dependencies where possible.
3
Anomalies: Example
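T1
first_name | last_name | address | department | position | salary
Dewi | Srijaya | 12a Jln Lempeng | Toys | clerk | 2000
Izabel | Leong | 10 Outram Park | Sports | trainee | 1200
John | Smith | 107 Clementi Rd | Toys | clerk | 2000
Axel | Bayer | 55 Cuscaden Rd | Sports | trainee | 1200
Winny | Lee | 10 West Coast Rd | Sports | manager | 2500
Sylvia | Tok | 22 East Coast Lane | Toys | manager | 2600
Eric | Wei | 100 Jurong drive | Toys | assistant manager | 2200
? | ? | ? | ? | security guard | 1500
Assume the position determines the salary:
position → salary
Problems with T1:
Redundant storage (e.g., the clerk salary 2000 is stored once per clerk)
Update anomaly (e.g., the two managers are recorded with different salaries, 2500 and 2600)
Potential deletion anomaly (e.g., deleting Eric Wei also deletes the salary of an assistant manager)
Insertion anomaly (e.g., the salary of a security guard cannot be stored without an employee, hence the ? values)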
4
Decomposition Example
T2
first_name | last_name | address | department | position
Dewi | Srijaya | 12a Jln Lempeng | Toys | clerk
Izabel | Leong | 10 Outram Park | Sports | trainee
John | Smith | 107 Clementi Rd | Toys | clerk
Axel | Bayer | 55 Cuscaden Rd | Sports | trainee
Winny | Lee | 10 West Coast Rd | Sports | manager
Sylvia | Tok | 22 East Coast Lane | Toys | manager
Eric | Wei | 100 Jurong drive | Toys | assistant manager
T3
position | salary
clerk | 2000
trainee | 1200
manager | 2500
assistant manager | 2200
security guard | 1500
No redundant storage
No update anomaly
No deletion anomaly
No insertion anomaly
5
Normalization
Normalization is the process of decomposing a relation schema R into fragments (i.e., smaller tables) R1, R2, ..., Rn. Our goals are:
Lossless decomposition: The fragments should contain the same information as the original table. Otherwise, the decomposition results in information loss.
Dependency preservation: Dependencies should be preserved within each Ri; otherwise, checking updates for violations of functional dependencies may require computing joins, which is expensive.
Good form: The fragments Ri should not involve redundancy. Roughly speaking, a table has redundancy if there is an FD whose LHS is not a key (more on this later).
6
A relation is in a particular normal form if it satisfies certain normalization
properties.
There are several normal forms defined:
1NF - First Normal Form
2NF - Second Normal Form
3NF - Third Normal Form
BCNF - Boyce-Codd Normal Form
4NF - Fourth Normal Form
5NF - Fifth Normal Form
Each of these normal forms is stricter than the previous one.
For example, 3NF is better than 2NF because it removes more redundancy (and more anomalies) from the schema than 2NF.
7
Types of NF
8
Lossless Join Decomposition
The decomposition is lossless (a lossless join) if we can recover the initial table by performing a natural join of the fragments.
In general, a decomposition of R into R1 and R2 is lossless if and only if at least one of the following dependencies is in F+:
R1 ∩ R2 → R1
R1 ∩ R2 → R2
In other words, the common attributes of R1 and R2 must form a superkey of R1 or of R2. In our example, the decomposition is lossless because position is a key for T3.
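The binary lossless-join test above can be checked mechanically by computing the attribute closure of R1 ∩ R2. A minimal Python sketch follows; representing FDs as (LHS set, RHS set) pairs and the helper names `closure` and `is_lossless_binary` are assumptions of this example, not part of the slides.

```python
from typing import FrozenSet, List, Set, Tuple

# An FD is a pair (LHS attributes, RHS attributes).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """Attribute closure X+: keep applying FDs whose LHS is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1: Set[str], r2: Set[str], fds: List[FD]) -> bool:
    """R1 ∩ R2 → R1 or R1 ∩ R2 → R2 must be in F+, i.e. the common
    attributes must form a superkey of R1 or of R2."""
    common_plus = closure(r1 & r2, fds)
    return r1 <= common_plus or r2 <= common_plus


# Decomposition of T1 into T2 and T3 under position → salary.
F = [(frozenset({"position"}), frozenset({"salary"}))]
T2 = {"first_name", "last_name", "address", "department", "position"}
T3 = {"position", "salary"}
print(is_lossless_binary(T2, T3, F))                   # True
print(is_lossless_binary({"A", "B"}, {"B", "C"}, []))  # False: B is a key of neither fragment
```

With no FDs at all, only a trivial decomposition would pass, which is why the lossy example on the next slide fails the test.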
9
Example of a Lossy Decomposition
Decompose R = (A,B,C) into R1 = (A,B) and R2 = (B,C).
r:
A B C
α 1 m
α 2 n
β 1 p
ΠA,B(r):
A B
α 1
α 2
β 1
ΠB,C(r):
B C
1 m
2 n
1 p
ΠA,B(r) ⋈ ΠB,C(r):
A B C
α 1 m
α 1 p
α 2 n
β 1 m
β 1 p
It is a lossy decomposition: the join produces two extraneous tuples, (α,1,p) and (β,1,m).
You get more tuples, not fewer, yet you can no longer tell which of them were in r!
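To see the extraneous tuples concretely, the following Python sketch projects r onto (A,B) and (B,C) and natural-joins the projections. The dict-based relation representation and the helpers `as_row`, `project` and `natural_join` are illustrative assumptions.

```python
from itertools import product

# Instance r of R = (A, B, C) from the slide.
r = [
    {"A": "α", "B": 1, "C": "m"},
    {"A": "α", "B": 2, "C": "n"},
    {"A": "β", "B": 1, "C": "p"},
]


def as_row(t):
    """Hashable, order-independent form of a tuple stored as a dict."""
    return tuple(sorted(t.items()))


def project(rel, attrs):
    """Π_attrs(rel) with set semantics (duplicates removed)."""
    return [dict(p) for p in {tuple((a, t[a]) for a in attrs) for t in rel}]


def natural_join(left, right):
    """Pair tuples that agree on all common attributes and merge them."""
    out = set()
    for lt, rt in product(left, right):
        if all(lt[a] == rt[a] for a in lt.keys() & rt.keys()):
            out.add(as_row({**lt, **rt}))
    return out


original = {as_row(t) for t in r}
joined = natural_join(project(r, ["A", "B"]), project(r, ["B", "C"]))
spurious = sorted(tuple(v for _, v in row) for row in joined - original)
print(len(joined), spurious)  # 5 [('α', 1, 'p'), ('β', 1, 'm')]
```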
10
Dependency Preserving Decomposition
The decomposition of a relation scheme R with FDs F is a set of tables
(fragments) Ri with FDs Fi
Fi is the subset of dependencies in F+ (the closure of F) that include only
attributes in Ri.
The decomposition is dependency preserving if and only if
(∪i Fi)+ = F+
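Computing F+ explicitly is exponential. A commonly used alternative tests each FD X → Y in F by growing X using only what each fragment Ri can infer on its own, then checking whether Y was reached. The sketch below assumes FDs are (LHS set, RHS set) pairs; the helper names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Shorthand for single-letter attributes: fd("A", "BC") means {A} → {B, C}."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """Attribute closure X+ with respect to F."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fragments: List[Set[str]], fds: List[FD]) -> bool:
    """True iff every X → Y in F follows from the FDs local to the fragments."""
    for lhs, rhs in fds:
        result = set(lhs)
        while True:
            before = set(result)
            for ri in fragments:
                # Whatever fragment Ri can infer from what we already know.
                result |= closure(result & ri, fds) & ri
            if result == before:
                break
        if not rhs <= result:
            return False  # X → Y would need a join across fragments to check
    return True


F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(preserves_dependencies([{"A", "B"}, {"A", "C"}], F))  # False: {B}→{C} is lost
print(preserves_dependencies([{"A", "B"}, {"B", "C"}], F))  # True
```

Both calls correspond to the two decompositions of R = (A, B, C) discussed on the following slides.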
11
Non-Dependency-Preserving Decomposition Example
R = (A, B, C), F = {{A}→{B}, {B}→{C}, {A}→{C}}. Key: A
There is a dependency {B}→ {C}, where the LHS is not the key, meaning that there can be considerable
redundancy in R.
Solution: Break it into two tables, R1(A,B) and R2(A,C) (normalization).
R:
A B C
1 2 3
2 2 3
3 2 3
4 2 4
R1:
A B
1 2
2 2
3 2
4 2
R2:
A C
1 3
2 3
3 3
4 4
The decomposition is lossless because the common attribute A is a key for R1 (and R2).
The decomposition is not dependency preserving because F1={{A}→{B}}, F2={{A}→{C}} and
(F1∪F2)+≠F+
We lost the FD {B}→{C}.
In practical terms, each FD is implemented as a constraint or assertion, which is checked when there are updates. In the above example, in order to find violations, we have to join R1 and R2, which can be very expensive. For instance, the tuple (4,2,4) violates {B}→{C}, but neither R1 nor R2 alone reveals this.
12
Dependency Preserving Decomposition Example
R = (A, B, C), F = {{A}→{B}, {B}→{C}, {A}→{C}}. Key: A
Break R into two tables, R1(A,B) and R2(B,C).
R:
A B C
1 2 3
2 2 3
3 2 3
4 2 4
R1:
A B
1 2
2 2
3 2
4 2
R2:
B C
2 3
2 4
The decomposition is lossless because the common attribute B is a key for R2
The decomposition is dependency preserving because F1={{A}→{B}}, F2={{B}→{C}} and
(F1∪F2)+=F+
Violations can be found by inspecting the individual tables, without performing a join.
13
Looking for a "Good" Form
Recall that our goals are
Lossless decomposition - necessary in order to ensure correctness of the data
Dependency preservation – not necessary, but desirable in order to achieve efficiency of updates
Good form – in order to avoid redundancy.
But what does it mean for a table to be in good form?
First Normal Form (1NF).
If the domains of all attributes in a table contain only atomic values, then the table is in first normal form (1NF).
In other words, there are no nested tables, multi-valued attributes, or complex structures such as lists.
Relational tables are always in 1NF, according to the definition of the relational model.
14
1NF (contd.)
A relation is in first normal form (1NF) if all its attribute values are
atomic.
That is, a 1NF relation cannot have an attribute value that is:
a set of values (multi-valued attribute)
a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs.
However, object-oriented DBMSs and nested relational DBMSs relax this
constraint.
A relation that is not in 1NF is an unnormalized relation.
15
1NF (contd.)
Two ways to convert a non-1NF relation to a 1NF relation:
1) Splitting Method - Divide the existing relation into two relations: non-repeating attributes and repeating attributes.
• Make a relation consisting of the primary key of the original relation and the repeating attributes. Determine a primary key for this new relation.
• Remove the repeating attributes from the original relation.
2) Flattening Method - Create new tuples for the repeating data combined with the data that does not repeat.
• Introduces redundancy that will later be removed by normalization.
• Determine the primary key for this flattened relation.
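A small sketch of both conversion methods, using a hypothetical employee relation whose `skill` attribute holds a list (a repeating group). The relation, its attribute names, and the functions `split` and `flatten` are invented for illustration.

```python
# Hypothetical non-1NF relation: "skill" holds a repeating group (a list).
unnormalized = [
    {"eno": "E1", "name": "Dewi", "skill": ["SQL", "Java"]},
    {"eno": "E2", "name": "John", "skill": ["Python"]},
]


def split(rel, key, repeating):
    """Splitting method: non-repeating attributes stay in one relation (key: `key`);
    the key plus each repeating value form a second relation (key: key + repeating)."""
    base = [{a: v for a, v in t.items() if a != repeating} for t in rel]
    child = [{key: t[key], repeating: value} for t in rel for value in t[repeating]]
    return base, child


def flatten(rel, repeating):
    """Flattening method: one tuple per repeating value, copying the non-repeating
    attributes into each tuple (redundancy that normalization removes later)."""
    return [
        {**{a: v for a, v in t.items() if a != repeating}, repeating: value}
        for t in rel
        for value in t[repeating]
    ]


base, child = split(unnormalized, "eno", "skill")
flat = flatten(unnormalized, "skill")
print(base)       # [{'eno': 'E1', 'name': 'Dewi'}, {'eno': 'E2', 'name': 'John'}]
print(child)      # [{'eno': 'E1', 'skill': 'SQL'}, {'eno': 'E1', 'skill': 'Java'}, {'eno': 'E2', 'skill': 'Python'}]
print(len(flat))  # 3
```

In the flattened result the primary key becomes (eno, skill), and name is repeated for E1, which is exactly the redundancy the slide mentions.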
16
1NF (contd.)
Converting a non-1NF Relation to 1NF Using Splitting
17
1NF (contd.)
Converting a non-1NF Relation to 1NF Using Flattening
18
Second Normal Form (2NF): Partial FDs Not Permitted
R is a relation schema, with the set F of FDs.
R is in 2NF if and only if, for each FD X → {A} in F+, at least one of the following holds:
• A ∈ X (the FD is trivial), or
• X is not a proper subset of a candidate key for R, or
• A is a prime attribute.
Equivalently, R is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on the relation key(s); no partial dependency of a non-prime attribute on a key is permitted.
A prime attribute is an attribute that is part of a candidate key.
In 2NF, a proper subset of a candidate key cannot determine a non-prime attribute.
HINT: whenever you try to determine the normal form (2NF, 3NF, BCNF)
of a table, you always have to find all candidate keys.
19
2NF (contd.)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary-key (non-prime) attribute is fully functionally dependent on the primary key.
Alternative definition from your text: every nonkey column depends on all candidate keys, not on a subset of any candidate key.
Violations:
Part of key → nonkey
Violations are possible only for composite keys
Note: By definition, any relation whose primary key is a single attribute is always in 2NF.
If a relation is not in 2NF, we divide it into separate relations, each in 2NF, by ensuring that the primary key of each new relation functionally determines all the attributes in the relation.
20
2NF Example-1
Consider the relation scheme {A,B,C,D} with the FDs:
{A,B} → {C,D} and
{A} → {D}
{A,B} is a candidate key, so {A,B} → {C,D} is not a partial dependency
{A} is a proper subset of a candidate key
{D} is not a prime attribute
This scheme is not in 2NF because of {A} → {D}
2NF is of limited practical importance because we can always achieve a better form (3NF) that is lossless, preserves dependencies, and contains less redundancy.
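Following the HINT on the 2NF slide (always find all candidate keys first), this brute-force Python sketch enumerates candidate keys and reports partial dependencies for Example-1. It is exponential in the number of attributes, so it is only meant for slide-sized schemas; the FD representation and all function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (LHS set, RHS set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is all of attrs, smallest first."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= attrs:
                keys.append(cand)
    return keys


def partial_dependencies(attrs, fds):
    """Proper subsets of a candidate key that determine a non-prime attribute."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for sub in combinations(sorted(key), size):
                dependents = closure(set(sub), fds) - set(sub) - prime
                if dependents:
                    found.append((set(sub), dependents))
    return found


# 2NF Example-1: R = {A, B, C, D}, F = {AB → CD, A → D}
R = {"A", "B", "C", "D"}
F = [({"A", "B"}, {"C", "D"}), ({"A"}, {"D"})]
print([sorted(k) for k in candidate_keys(R, F)])                         # [['A', 'B']]
print([(sorted(x), sorted(y)) for x, y in partial_dependencies(R, F)])  # [(['A'], ['D'])]
```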
21
2NF Example-2
fd1 and fd4 are partial functional dependencies. Normalize to:
Emp (eno, ename, title, bdate, salary, supereno, dno)
WorksOn (eno, pno, resp, hours)
Proj (pno, pname, budget)
22
2NF Example-2 (contd.)
23
Third Normal Form (3NF)
Note: 3NF permits neither partial FDs nor transitive FDs of non-prime attributes on keys.
R is a relation schema, with the set F of FDs.
R is in 3NF if and only if, for each FD X → {A} in F+, at least one of the following holds:
• A ∈ X (trivial FD), or
• X is a superkey for R, or
• A is a prime attribute for R
In words: for every nontrivial FD that does not contain extraneous (useless) attributes:
the LHS is a candidate key, or
the RHS is a prime attribute, i.e., an attribute that is part of a candidate key.
24
Third Normal Form (3NF) (contd.)
Third normal form (3NF) is based on the notion of transitive dependency. A transitive dependency A → C is an FD that can be inferred from existing FDs A → B and B → C.
Note that a transitive dependency may involve more than 2 FDs.
A relation is in third normal form (3NF) if it is in 2NF and there is no non-
primary key (non-prime) attribute that is transitively dependent on the
primary key.
Alternate definition from your text: A table is in 3NF if it is in 2NF and each
nonkey column depends only on candidate keys, not on other nonkey
columns
Violations: Nonkey→ Nonkey
Converting a relation to 3NF from 2NF involves the removal of transitive
dependencies. If a transitive dependency exists, we remove the
transitively dependent attributes from the relation and put them in a new
relation along with a copy of the determinant (LHS of FD).
25
3NF Example
R = (B, C, E)
F = {{E}→{B}, {B,C}→{E}}
Remember that you always have to find all candidate keys in order to
determine the normal form of a table
Two candidate keys: BC and EC
{E}→{B}: E is not a superkey (it is a proper subset of the key EC), but B is a prime attribute, so the FD is allowed
{B,C}→{E}: BC is a candidate key, so the FD is allowed
None of the FDs violates the rules of the previous slide. Therefore, R is in 3NF.
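The 3NF conditions can be checked in the same style. This sketch assumes FDs are given as (LHS set, RHS set) pairs and, for R itself (not a fragment of a decomposition), tests only the FDs in F. It confirms that R = (B, C, E) is in 3NF and that the scheme of 2NF Example-1 is not, because of {A} → {D}. Function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (LHS set, RHS set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is R (brute force, small schemas only)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) >= attrs:
                keys.append(cand)
    return keys


def three_nf_violations(attrs, fds):
    """FDs X → Y in F that break 3NF: nontrivial, X not a superkey, and some
    attribute of Y − X is not prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    violations = []
    for lhs, rhs in fds:
        extra = rhs - lhs
        if not extra or closure(lhs, fds) >= attrs:
            continue  # trivial FD, or the LHS is a superkey
        if extra - prime:
            violations.append((lhs, extra - prime))
    return violations


R = {"B", "C", "E"}
F = [({"E"}, {"B"}), ({"B", "C"}, {"E"})]
print(three_nf_violations(R, F))  # []: R is in 3NF

R2 = {"A", "B", "C", "D"}
F2 = [({"A", "B"}, {"C", "D"}), ({"A"}, {"D"})]
print([(sorted(x), sorted(y)) for x, y in three_nf_violations(R2, F2)])  # [(['A'], ['D'])]
```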
26
Redundancy in 3NF
Bank-schema = (Branch B, Customer C, Employee E). Two candidate keys: BC and EC.
F = {{E}→{B}, {B,C}→{E}}
{E}→{B}: e.g., an employee works in a single branch
{B,C}→{E}: e.g., when a customer goes to a certain branch, s/he is always served by the same employee
Branch | Customer | Employee
HKUST | Wong | Au
HKUST | Chin | Au
Central | Wong | Jones
Central | null | Cheng
A 3NF table still has problems:
• redundancy (e.g., we repeat that Au works at the HKUST branch)
• need to use null values (e.g., to represent that Cheng works at Central even though he is not assigned any customers).
27
3NF Example-2
fd2 results in a transitive dependency eno → salary. Remove it.
28
3NF (contd.)
A relation schema R is in 3NF if, for all functional dependencies of the form X → Y that hold on R, at least one of the following holds:
X → Y is trivial (Y ⊆ X)
Y is a prime attribute of R (every attribute of Y − X is prime)
X is a superkey of R
The last condition deals with transitive dependencies. Since X must be a superkey of R, the LHS cannot consist of non-prime attributes alone, and hence we cannot have transitive dependencies.
29
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys.
However, a more general definition considers all candidate keys (not just the primary key we have chosen).
General definition of 2NF:
A relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on every candidate key.
General definition of 3NF:
A relation is in 3NF if it is in 2NF and there is no non-
prime attribute that is transitively dependent on any
candidate key.
Note that a prime attribute is an attribute that is in any key
(candidate or primary).
30
General Definition of 3NF Example
The relation is not in 3NF according to the basic definition
because SSN is not a primary key attribute.
However, there is nothing wrong with this schema (no
anomalies) because the SSN is a candidate key and any
attributes fully functionally dependent on the primary key
will also be fully functionally dependent on the candidate
key.
Thus, the general definition of 2NF and 3NF includes all
candidate keys instead of just the primary key
31
Normalization Question
Consider the universal relation R(A,B,C,D,E,F,G,H,I,J) and
the set of functional dependencies:
F= { A,B → C ; A → D,E ; B → F ; F → G,H ; D → I,J }
List the keys for R.
Decompose R into 2NF and then 3NF relations.
32
Boyce-Codd Normal Form (BCNF)
R is a relation schema, with the set F of FDs
R is in BCNF if and only if
for each FD X → {A} in F+, at least one of the following holds:
• A ∈ X (trivial FD), or
• X is a superkey for R
In words: for every nontrivial FD that does not contain extraneous (useless) attributes, the LHS is a candidate key.
BCNF tables have no redundancy that can be detected using FDs.
If a table is in BCNF it is also in 3NF (and 2NF and 1NF)
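A minimal sketch of the BCNF test on R itself: a nontrivial FD in F violates BCNF when the closure of its LHS is not all of R. As later slides explain, checking only F is enough for R, but not for fragments of a decomposition. The FD representation and function names are assumptions of this example.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (LHS set, RHS set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Nontrivial FDs in F whose LHS is not a superkey of R."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attrs
    ]


R = {"B", "C", "E"}
F = [({"E"}, {"B"}), ({"B", "C"}, {"E"})]
print([(sorted(x), sorted(y)) for x, y in bcnf_violations(R, F)])  # [(['E'], ['B'])]
```

This matches the BCNF example that follows: {B,C}→{E} passes, {E}→{B} does not.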
33
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key.
To test if a relation is in BCNF, we take the determinant of each FD in the relation and determine if it is a candidate key (or, more generally, a superkey).
Special cases not covered by 3NF:
1. Part of key → Part of key
2. Nonkey → Part of key
Special cases are not common.
The difference between 3NF and BCNF is that 3NF allows a FD X → Y to remain in the relation if X is a superkey or Y is a prime attribute. BCNF only allows this FD if X is a superkey.
Thus, BCNF is more restrictive than 3NF. However, in practice most relations in 3NF are also in BCNF.
34
BCNF Example
R = (B, C, E), F = {{E}→{B}, {B,C}→{E}}
Two candidate keys: BC and EC
{B,C}→{E} does not violate BCNF because BC is a key
{E}→{B} violates BCNF because E is not a key; in BCNF the LHS of every nontrivial FD must be a superkey
In order to achieve BCNF we have to decompose the table, but how?
Since the decomposition must be lossless, we only have one option: R1(B,E) and R2(C,E). The common attribute E must be a key of one fragment, here R1.
35
BCNF Example (cont)
Bank-schema = (Branch B, Customer C, Employee E)
F = {{E}→{B}, {B,C}→{E}}. Decompose into R1(B,E) and R2(C,E).
Branch | Customer | Employee
HKUST | Wong | Au
HKUST | Chin | Au
Central | Wong | Jones
Central | null | Cheng
R1:
Branch | Employee
HKUST | Au
Central | Cheng
Central | Jones
R2:
Customer | Employee
Wong | Au
Chin | Au
Wong | Jones
We have avoided the problems of redundancy and null values of 3NF.
36
BCNF Example (cont)
We can regenerate the original table by joining the two fragments; however, we must use an outer join (an outer join fills in null values for tuples that have no join partners):
R1(Branch, Employee) ⟕ R2(Customer, Employee), joined on Employee, produces the original (Branch, Customer, Employee) table, including the row (Central, null, Cheng).
Is the decomposition dependency preserving?
No. We lose {B,C}→{E}.
Can we have a dependency preserving decomposition?
No. No matter how we break the table, we lose {B,C}→{E}, since it involves all attributes.
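The outer join can be sketched in Python with the fragments from the slide as lists of tuples. `left_outer_join` is a hypothetical helper that pads Customer with None (standing in for SQL null) when an employee has no customers; an inner natural join would silently drop Cheng.

```python
# Fragments from the slide: R1(Branch, Employee) and R2(Customer, Employee).
branch_employee = [("HKUST", "Au"), ("Central", "Cheng"), ("Central", "Jones")]
customer_employee = [("Wong", "Au"), ("Chin", "Au"), ("Wong", "Jones")]


def left_outer_join(be, ce):
    """R1 ⟕ R2 on Employee, returning (Branch, Customer, Employee) rows."""
    rows = []
    for branch, emp in be:
        customers = [cust for cust, e in ce if e == emp]
        if customers:
            rows.extend((branch, cust, emp) for cust in customers)
        else:
            rows.append((branch, None, emp))  # no join partner: pad with null
    return rows


for row in left_outer_join(branch_employee, customer_employee):
    print(row)
# ('HKUST', 'Wong', 'Au')
# ('HKUST', 'Chin', 'Au')
# ('Central', None, 'Cheng')
# ('Central', 'Wong', 'Jones')
```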
37
BCNF Example-2
Consider the WorksOn relation where we have the added constraint that, given the hours worked, we know exactly the employee who performed the work (i.e., hours → eno: each employee is functionally determined by the hours they work on projects). Then:
Note that we lose the FD eno,pno → resp, hours.
38
Observations about BCNF
1. Best normal form based on FDs alone
2. Avoids the problems of redundancy and all anomalies caused by FDs
3. There is always a lossless decomposition that generates
BCNF tables
4. However, we may not be able to preserve all
dependencies
Next step: an algorithm for automatically generating BCNF
tables.
39
BCNF versus 3NF
1. We can decompose to BCNF but sometimes we do not
want to if we lose a FD.
2. The decision to use 3NF or BCNF depends on the amount
of redundancy we are willing to accept and the willingness
to lose a functional dependency.
3. Note that we can always preserve the lossless-join property (recovery) with a BCNF decomposition, but we do not always get dependency preservation.
4. In contrast, we get both recovery and dependency
preservation with a 3NF decomposition.
40
Algorithm for BCNF Decomposition
Let R be the initial table with FDs F
S = {R}
until all relation schemes in S are in BCNF:
    for each R in S:
        for each FD X → Y that violates BCNF for R:
            S = (S – {R}) ∪ (R – Y) ∪ (X,Y)
end until
This is a simplified version. In words:
When we find a table R with BCNF violation X→Y we:
1] Remove R from S
2] Add a table that has the same attributes as R except for Y
3] Add a second table that contains the attributes in X and Y
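A runnable Python sketch of the decomposition loop. One deliberate deviation from the simplified pseudocode: violations in a fragment are found with the closure-based subset test described a few slides later, using X → (X+ − X) ∩ Ri, because checking only the FDs in F can miss violations in fragments. The FD representation and function names are assumptions of this sketch.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (LHS set, RHS set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(ri, fds):
    """Return (X, Y) with Y = (X+ − X) ∩ Ri if X+ covers part, but not all,
    of fragment Ri; return None if Ri is in BCNF."""
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            x = set(combo)
            x_plus = closure(x, fds)
            y = (x_plus - x) & ri
            if y and not ri <= x_plus:
                return x, y
    return None


def bcnf_decompose(attrs, fds):
    """Start from S = {R}; replace any fragment Ri with a violation X → Y
    by (Ri − Y) and (X ∪ Y) until every fragment is in BCNF."""
    pending, done = [set(attrs)], []
    while pending:
        ri = pending.pop()
        violation = find_violation(ri, fds)
        if violation is None:
            done.append(ri)
        else:
            x, y = violation
            pending.extend([ri - y, x | y])
    return done


R = set("ABCDE")
F = [(set("A"), set("BE")), (set("C"), set("D"))]
print(sorted("".join(sorted(s)) for s in bcnf_decompose(R, F)))  # ['ABE', 'AC', 'CD']
```

On the example that follows, this reproduces R2=(A,B,E), R3=(A,C), R4=(C,D). On R(A,B,C,D) with {A}→{B}, {B}→{C}, it returns (A,B), (A,D), (B,C), because X+ − X for A contains both B and C; different choices of violation can lead to different, equally valid BCNF decompositions.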
41
BCNF Decomposition Example
Let us consider the relation scheme R=(A,B,C,D,E) and the FDs: {A} → {B,E}, {C} → {D}
Candidate key: AC
Both functional dependencies violate BCNF because the LHS is not a candidate key
Pick {A} → {B,E}
We can also choose {C} → {D} – different choices lead to different decompositions.
(A,B,C,D,E) generates R1=(A,C,D) and R2=(A,B,E)
Do we need to decompose further?
42
BCNF Decomposition Example (cont)
(A,C,D) and (A,B,E)
{A}→{B,E}, {C}→{D}
We need to decompose R1=(A,C,D) because of the FD {C}→{D}
Thus (A,C,D) is replaced with R3=(A,C) and R4=(C,D).
Final decomposition: R2=(A,B,E), R3=(A,C), R4=(C,D)
Is the decomposition lossless?
Yes, the algorithm always creates lossless decompositions. In step S = (S – {R}) ∪ (R – Y) ∪ (X,Y) we replace R with tables (R – Y) and (X,Y) that have X as the common attribute and X→Y, i.e., X is the key of (X,Y).
Is the decomposition dependency preserving?
Yes because F2={{A}→{B,E}}, F3=∅, F4={{C}→{D}} and (F2∪F3∪F4)+=F+
But remember: sometimes we may not be able to preserve dependencies
43
Testing if an FD violates BCNF
Important question: which dependencies to check for BCNF violations? F or F+?
Answer-Part 1: To check if a table R with a given set of FDs F is in BCNF, it suffices to check only the dependencies in F
Consider R (A, B, C, D), with F = {{A}→{B}, {B}→{C}}
The key is {A,D}.
R violates BCNF because of both {A}→{B} and {B}→{C}: neither A nor B is a key.
We can see that by simply using F - we do not need F+ (e.g., we do not need to check the implicit FD {A}→{C})
We can show that if none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.
44
Testing if an FD violates BCNF (cont)
Answer-Part 2: However, using only F is insufficient when testing a fragment in the decomposition of R
Consider again R(A,B,C,D), with F = {{A}→{B}, {B}→{C}} that violates BCNF
Decompose R into R1(A,C,D) and R2(A,B)
There is no FD in F that contains only attributes from R1(A,C,D), so we might be misled into thinking that R1 is in BCNF.
In fact, dependency {A}→{C} in F+ shows that R1 is not in BCNF.
Therefore, for the decomposed relations we also need to consider dependencies in F+(see next slide).
45
Testing if an FD violates BCNF (cont)
To check if a relation Ri in a decomposition of R is in BCNF,
Either test Ri for BCNF with respect to the restriction of F+ to Ri (that is, all FDs in F+ that contain only attributes from Ri)
or use the following test:
1. For every set of attributes X ⊆ Ri, check that X+ either includes no attribute of Ri − X (X determines nothing else in Ri) or includes all attributes of Ri (X is a superkey of Ri).
2. If the condition is violated, the dependency X → (X+ − X) ∩ Ri holds on Ri, and Ri violates BCNF.
We use this dependency in the BCNF algorithm to decompose Ri.
Note: we have seen how to compute X+ in the previous class about FDs.
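The first option above (restricting F+ to Ri) can be sketched by computing X → (X+ ∩ Ri) − X for every subset X of Ri and then applying the superkey test within Ri. This is exponential in the size of Ri and only intended for small examples; the FD representation and helper names are assumptions. It reproduces the conclusions of the next slide for R1(A,C,D) and R2(A,B) under F = {{A}→{B}, {B}→{C}}.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (LHS set, RHS set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(ri, fds):
    """Restriction of F+ to Ri: X → (X+ ∩ Ri) − X for every nonempty X ⊆ Ri."""
    projected = []
    for size in range(1, len(ri) + 1):
        for combo in combinations(sorted(ri), size):
            x = set(combo)
            y = (closure(x, fds) & ri) - x
            if y:
                projected.append((x, y))
    return projected


def fragment_bcnf_violations(ri, fds):
    """FDs of the restriction whose LHS is not a superkey of Ri."""
    return [(x, y) for x, y in project_fds(ri, fds) if not ri <= closure(x, fds)]


F = [(set("A"), set("B")), (set("B"), set("C"))]
print(fragment_bcnf_violations(set("AB"), F))  # []: R2(A,B) is in BCNF
print([(sorted(x), sorted(y)) for x, y in fragment_bcnf_violations(set("ACD"), F)])
# [(['A'], ['C'])]: {A}→{C} makes R1(A,C,D) violate BCNF
```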
46
Testing if an FD violates BCNF - Example
Consider again: R(A,B,C,D), F = {{A}→{B}, {B}→{C}} and the decomposition R1(A,C,D) and R2(A,B)
A+={A,B,C}, B+={B,C}, C+={C}
R2(A,B) is in BCNF because
A+∩R2 ={A,B,C}∩{A,B}={A,B} includes all attributes of R2
B+∩R2 ={B,C}∩{A,B}={B} includes no attributes of R2-{B}
In other words, each attribute (e.g., A) determines everything (it is a key) or nothing (e.g., B).
R1(A,C,D) is not in BCNF because
A+∩R1 = {A,B,C}∩{A,C,D}={A,C} does not include all attributes of R1
Therefore, the dependency {A}→{C} causes a BCNF violation and will be used for further
decomposing R1
Final decomposition: R2(A,B), R3(A,D), R4(A,C)
47
Conversion to BCNF
There is a direct algorithm for converting to BCNF without going through 2NF and 3NF, given relation R with FDs F:
1. Eliminate extraneous columns from the LHSs
2. Remove derived FDs
3. Arrange the FDs into groups with each group having the same determinant.
4. For each FD group, make a table with the determinant as the primary key.
5. Merge tables in which one table contains all columns of the other table.
48
Different BCNF Decompositions
The different possible orders in which we consider FDs violating BCNF in the algorithm may lead to
different decompositions
Previous example: R(A,B,C,D), F = {{A}→{B}, {B}→{C}}
Previous BCNF decomposition: R2(A,B), R3(A,D), R4(A,C)
Question: is the decomposition dependency preserving?
Answer: No – we lost the dependency {B}→{C}
Question: Can you obtain a dependency preserving decomposition?
Answer: Yes – in the first decomposition we first applied violation {A}→{B}. If, instead, we apply {B}→{C}
we obtain:
R1=(A,B,D) and R2=(B,C)
We decompose R1=(A,B,D) further using {A} → {B} to obtain:
R3=(A,D) and R4=(A,B)
The final decomposition R2=(B,C), R3=(A,D), R4=(A,B) is dependency preserving.
49
Third Normal Form: Motivation
We can always obtain a lossless join decomposition in BCNF using the previous
algorithm.
However, there are some situations where
there does not exist a dependency preserving BCNF decomposition, and
efficient checking for FD violation on updates is important
Solution: use the weaker Third Normal Form (3NF).
Allows some redundancy (with related problems)
But FDs can be checked on individual relations without computing a join.
There is always a lossless-join, dependency-preserving decomposition into 3NF (see the next algorithm).
50
Algorithm for 3NF Synthesis
Let R be the initial table with FDs F
Compute the canonical cover Fc of F
S = ∅
for each FD X→Y in the canonical cover Fc:
    S = S ∪ (X,Y)
if no scheme in S contains a candidate key for R:
    choose any candidate key CN
    S = S ∪ table with the attributes of CN
Note: unlike the BCNF algorithm where we break the original relation, in 3NF we
synthesize the tables using the FDs in the canonical cover
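A Python sketch of the synthesis loop. It assumes the canonical cover Fc is already given (computing Fc is not shown), and, when no schema contains a key, it finds a candidate key by greedily dropping attributes from R while the remainder is still a superkey. Applied to the Bank example that follows, it yields the Banker and Customer-Branch schemas. Function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (LHS set, RHS set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_key(attrs, fds):
    """Shrink attrs to one candidate key by dropping attributes that are not needed."""
    key = set(attrs)
    for a in sorted(attrs):
        if closure(key - {a}, fds) >= attrs:
            key.discard(a)
    return key


def synthesize_3nf(attrs, canonical_cover):
    """One schema X ∪ Y per FD X → Y in Fc, plus a key schema if none contains a key."""
    schemas = []
    for lhs, rhs in canonical_cover:
        schema = lhs | rhs
        if schema not in schemas:
            schemas.append(schema)
    if not any(closure(s, canonical_cover) >= attrs for s in schemas):
        schemas.append(find_key(attrs, canonical_cover))
    return schemas


bank = {"branch", "customer", "banker", "office"}
fc = [({"banker"}, {"branch", "office"}), ({"customer", "branch"}, {"banker"})]
for s in synthesize_3nf(bank, fc):
    print(sorted(s))
# ['banker', 'branch', 'office']
# ['banker', 'branch', 'customer']
```

If no synthesized schema contains a key, for example R = (A, B, C) with Fc = {{A}→{B}}, the sketch adds the key schema (A, C).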
51
3NF Example
Bank=(branch-name, customer-name, banker-name, office-number)
Functional dependencies (also the canonical cover):
{banker-name}→{branch-name, office-number}
{customer-name, branch-name}→{banker-name}
Candidate Keys: {customer-name, branch-name} or {customer-name, banker-name}
{banker-name}→{office-number} violates 3NF
3NF tables – for each FD in the canonical cover create a table
Banker = (banker-name, branch-name, office-number)
Customer-Branch = (customer-name, branch-name, banker-name)
Since Customer-Branch contains a candidate key for Bank, we are done.
Question: is the decomposition lossless and dependency preserving?
Answer: Yes – all decompositions generated by this algorithm have these properties
52
BCNF versus 3NF Example
An example of not having dependency preservation with BCNF:
street,city → zipcode and zipcode → city
Two keys: {street,city} and {street, zipcode}
53
BCNF versus 3NF Example
Consider an example instance:
Join tuples with equal zipcodes:
Note that the decomposition did not allow us to enforce the constraint that
street,city → zipcode even though no FDs were violated in the
decomposed relations.
54
Normalization to BCNF Question
Given this schema normalize into BCNF directly.
55
Normalization Question 2
Given this database schema normalize into BCNF.
New FD5 says that the size of the parcel of land determines what
county it is in.
56
Normalization to BCNF Question
Given this schema normalize into BCNF:
R (courseNum, secNum, offeringDept, creditHours, courseLevel, instructorSSN,
semester, year, daysHours, roomNum, numStudents)
courseNum → offeringDept,creditHours, courseLevel
courseNum, secNum, semester, year → daysHours, roomNum, numStudents,
instructorSSN
roomNum, daysHours, semester, year → instructorSSN, courseNum, secNum
57
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent, multi-valued attributes are present in the schema.
A MVD occurs when two independent 1:N relationships are in the relational schema.
When these multi-valued attributes are flattened into a 1NF relation, we must have a tuple for every combination of the values in the two attributes.
It may seem strange that we would want to do this, as it obviously increases the number of tuples and the redundancy.
The reason is that since the two attributes are independent it does not make sense to store some combinations and not the others because all combinations are equally valid. By leaving out some combination, we are unintentionally favoring one combination over the other which should not be the case.
58
Multi-Valued Dependencies Example
Employee may:
- work on many projects
- be in many departments
59
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A, B, C in a relation such that for each value of A there is a set of values B and a set of values C, where the sets of values B and C are independent of each other.
An MVD is denoted as A →→ B and A →→ C, or abbreviated as A →→ B | C.
A trivial MVD A →→ B occurs when either:
B is a subset of A, or
A ∪ B = R
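The MVD definition can be tested on a concrete instance with the tuple-swapping condition: whenever t1 and t2 agree on X, the tuple taking X and Y from t1 and the remaining attributes from t2 must also be present. An instance check like this can refute an MVD but cannot prove it is a constraint of the schema. The employee/project/department data and the function name are invented for illustration, in the spirit of the previous example.

```python
def mvd_holds(rel, x, y):
    """True iff X →→ Y holds on this instance (rel is a non-empty list of dicts)."""
    attrs = list(rel[0])
    z = [a for a in attrs if a not in x and a not in y]  # all the other attributes
    present = {tuple(sorted(t.items())) for t in rel}
    for t1 in rel:
        for t2 in rel:
            if all(t1[a] == t2[a] for a in x):
                # Swap: X and Y values from t1, remaining values from t2.
                t3 = {a: t1[a] for a in list(x) + list(y)}
                t3.update({a: t2[a] for a in z})
                if tuple(sorted(t3.items())) not in present:
                    return False
    return True


# Hypothetical instance: E1 works on projects P1, P2 and is in departments D1, D2.
emp = [
    {"eno": "E1", "proj": "P1", "dept": "D1"},
    {"eno": "E1", "proj": "P1", "dept": "D2"},
    {"eno": "E1", "proj": "P2", "dept": "D1"},
    {"eno": "E1", "proj": "P2", "dept": "D2"},
]
print(mvd_holds(emp, ["eno"], ["proj"]))      # True: every (proj, dept) combination is stored
print(mvd_holds(emp[:3], ["eno"], ["proj"]))  # False: (E1, P2, D2) is missing
```

By complementation, eno →→ dept also holds on the full instance.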
60
Multi-Valued Dependencies Rules
1) Every FD is a MVD.
If X → Y, then swapping the Y values between two tuples that agree on X doesn't change the tuples.
Therefore, the "new" tuples are surely in the relation, and we know X →→ Y.
2) Complementation: If X →→ Y, and Z is all the other attributes, then X
→→ Z.
Note that the splitting rule for FDs does not apply to MVDs.
61
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies.
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies whose LHS is not a superkey.
Formal definition: A relation schema R is in 4NF with respect to a set of dependencies F if, for every nontrivial multi-valued dependency X → → Y, X is a superkey of R.
If X → → Y is a 4NF violation for relation R, we can decompose R using the same technique as for BCNF:
XY is one of the decomposed relations.
All but Y – X is the other.
62
Fourth Normal Form (4NF) Example
63
Lossless-Join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation.
A lossless-join dependency is a property of decomposition which ensures that no spurious tuples are generated when relations are natural joined.
There are cases where it is necessary to decompose a relation into more than two relations to guarantee a lossless-join.
64
Fifth Normal Form (5NF)
Fifth normal form (5NF) is based on join dependencies.
A relation is in fifth normal form (5NF) if and only if every nontrivial join dependency is implied by the superkeys of R.
A join dependency (JD) denoted by JD(R1, R2, …, Rn) on relational schema R specifies a constraint on the states r of R. The constraint states that every legal state r of R is equal to the join of its projections on R1, R2, …, Rn. That is for every such r we have:
ΠR1(r) ∗ ΠR2(r) ∗… ∗ ΠRn(r) = r
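The JD condition ΠR1(r) ∗ ... ∗ ΠRn(r) = r can likewise be checked on an instance. The sketch below uses a small hypothetical Supply(sname, partName, projName) instance that satisfies the constraint of the following 5NF example; the data and the helpers `project`, `natural_join` and `jd_holds` are invented for illustration.

```python
from functools import reduce


def as_row(t):
    """Hashable, order-independent form of a tuple stored as a dict."""
    return tuple(sorted(t.items()))


def project(rel, attrs):
    """Π_attrs(rel) with duplicates removed."""
    return [dict(p) for p in {tuple((a, t[a]) for a in attrs) for t in rel}]


def natural_join(left, right):
    """Combine tuples that agree on every common attribute."""
    out = {}
    for lt in left:
        for rt in right:
            if all(lt[a] == rt[a] for a in lt.keys() & rt.keys()):
                merged = {**lt, **rt}
                out[as_row(merged)] = merged
    return list(out.values())


def jd_holds(rel, components):
    """JD(R1, ..., Rn) holds on this instance iff the join of its projections equals rel."""
    joined = reduce(natural_join, [project(rel, c) for c in components])
    return {as_row(t) for t in joined} == {as_row(t) for t in rel}


# Hypothetical Supply instance (invented data).
supply = [
    {"sname": "s1", "partName": "p1", "projName": "j2"},
    {"sname": "s1", "partName": "p2", "projName": "j1"},
    {"sname": "s2", "partName": "p1", "projName": "j1"},
    {"sname": "s1", "partName": "p1", "projName": "j1"},
]
SP, PJ, SJ = ["sname", "partName"], ["partName", "projName"], ["sname", "projName"]
print(jd_holds(supply, [SP, PJ, SJ]))  # True
print(jd_holds(supply, [SP, PJ]))      # False: (s2, p1, j2) is spurious
```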
65
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname, partName, projName). Add the additional constraint that:
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j,
then supplier s also supplies part p to project j
66
Fifth Normal Form (5NF) Example
Note that only joining all three relations together will get you back to the original relation. Joining any two will create spurious tuples!
Let R be in BCNF and let R have no composite keys. Then R is in 5NF
67
4NF and 5NF in Practice
In practice, 4NF and especially 5NF are rare.
4NF relations are easy to detect because of the many redundant tuples.
5NF violations are so rare that no one really cares about them in practice.
Further, it is hard to detect join dependencies in large-scale designs, so even if they do exist, they often go unnoticed.
The redundancy in 5NF is often tolerable.
The redundancy in 4NF is not acceptable, but good designs starting from conceptual models (such as ER modeling) will rarely produce a non-4NF schema.
68
Normalization Goals
Goal for a relational database design is:
BCNF.
Lossless join.
Dependency preservation.
If we cannot achieve this, we accept one of:
Lack of dependency preservation in BCNF
Redundancy due to use of 3NF
Interestingly, SQL does not provide a direct way of specifying functional dependencies
other than superkeys.
Can specify FDs using assertions/triggers, but they are expensive to test
Normal forms are used to prevent anomalies and redundancy. However, although successive normal forms are better at reducing redundancy, that does not mean they always have to be used.
For example, query execution time may increase because of normalization, as more joins become necessary to answer queries.
69
ER Model and Normalization
When an E-R diagram is carefully designed, the tables generated from the E-R diagram
should not need further normalization.
However, in a real (imperfect) design there can be FDs from non-key attributes of an
entity to other attributes of the entity
E.g. employee entity with attributes department-number and department-address, and
an FD department-number → department-address
Good design would have made department an entity
Normalization and ER modeling are two independent concepts.
• You can use ER modeling to produce an initial relational schema and then use
normalization to remove any remaining redundancies.
• If you are a good ER modeler, it is rare that much normalization will be
required.
• In theory, you can use normalization by itself. This would involve identifying all
attributes, giving them unique names, discovering all FDs and MVDs, then applying
the normalization algorithms.
Since this is a lot harder than ER modeling, most people do not do it.
70
Universal Relation Approach
We start with a single universal relation and we decompose it using the FDs (no ER
diagrams)
Assume Loans(branch-name, loan-number, amount, customer-id, customer-name)
and FDs:
{loan-number} → {branch-name, amount, customer-id}
{customer-id} → {customer-name}
We apply existing decomposition algorithms to generate tables:
Loan(loan-number, branch-name, amount, customer-id)
Customer(customer-id,customer-name)
71
Denormalization for Performance
May want to use non-normalized schema for performance
E.g. displaying customer-name along with loan-number and amount requires join of loan with
customer
Alternative 1: Use denormalized relation containing attributes of loan as well as customer with
all above attributes
faster lookup
Extra space and extra execution time for updates
extra coding work for programmer and possibility of error in extra code
Alternative 2: use a materialized view defined as
loan JOIN customer
Benefits and drawbacks same as above, except no extra coding work for programmer and avoids
possible errors
72
Other Design Issues
Some aspects of database design are not caught by normalization
Examples of bad database design, to be avoided:
Instead of earnings(company-id, year, amount), use
earnings-2000, earnings-2001, earnings-2002, etc., all on the schema (company-id, earnings).
These are in BCNF, but they make querying across years difficult and need a new table each year.
company-year(company-id, earnings-2000, earnings-2001, earnings-2002)
This is also in BCNF, but it also makes querying across years difficult and requires a new attribute each year.