IS 230 Lecture 8 Slide 1
Normalization
Lecture 8
Lecture 8: Normalization
1. Normalization
2. Data redundancy and anomalies
3. Spurious information
4. Functional dependencies
5. Normalization
   first normal form
   second normal form
   third normal form
   BCNF normal form
6. Normalization Methodology: Example
1. Normalization
A technique for producing a set of relations with desirable properties, given the data requirements of the applications.
2. Data redundancy and anomalies
Emp-Dept relation

NI#  Name  DateOfBirth  Dept#  Dname  Manager
21   AA    -            5      CS     91
22   BB    -            5      CS     91
23   CC    -            6      TS     93
24   DD    -            7      PSV    94
25   EE    -            7      PSV    94

Insertion
How do we insert a new department with no employees yet? (What would the key values be?)
Entering employees is difficult, as the department information must be entered correctly and consistently every time.
Deletion
What happens when we delete CC's data? Do we lose department 6?
Modification
If we change the manager of department 5, we must change it in all tuples with Dept# = 5.
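
A minimal Python sketch of the modification and deletion anomalies, using the Emp-Dept tuples above. DateOfBirth is left out, and the new manager number 95 is made up for illustration.

```python
# Emp-Dept relation from the slide, one dict per tuple (DateOfBirth omitted).
emp_dept = [
    {"NI": 21, "Name": "AA", "Dept": 5, "Dname": "CS", "Manager": 91},
    {"NI": 22, "Name": "BB", "Dept": 5, "Dname": "CS", "Manager": 91},
    {"NI": 23, "Name": "CC", "Dept": 6, "Dname": "TS", "Manager": 93},
    {"NI": 24, "Name": "DD", "Dept": 7, "Dname": "PSV", "Manager": 94},
    {"NI": 25, "Name": "EE", "Dept": 7, "Dname": "PSV", "Manager": 94},
]


def managers_of(rows, dept):
    """Return the set of managers recorded for one department."""
    return {r["Manager"] for r in rows if r["Dept"] == dept}


# Modification anomaly: updating only one of the tuples for department 5
# leaves two different managers recorded for the same department.
emp_dept[0]["Manager"] = 95  # 95 is a hypothetical new manager number
managers_dept5 = managers_of(emp_dept, 5)

# Deletion anomaly: CC is the only employee of department 6, so deleting
# CC's tuple also deletes everything we know about department 6.
after_delete = [r for r in emp_dept if r["Name"] != "CC"]
depts_left = {r["Dept"] for r in after_delete}

print(managers_dept5, depts_left)
```
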
3. Spurious information
Avoid breaking up relations in such a way that spurious information is created.

NI#  Name  ProjName  ProjLocation
123  XX    Accounts  London
123  XX    Analysis  Paris
124  YY    PI        London

may be broken into:

NI#  Name  ProjName
123  XX    Accounts
123  XX    Analysis
124  YY    PI

NI#  ProjLocation
123  London
123  Paris
124  London

Joining them back together, we get NEW TUPLES!

NI#  Name  ProjName  ProjLocation
123  XX    Accounts  Paris
123  XX    Analysis  London

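
The spurious tuples can be reproduced with a short Python sketch. It assumes the two smaller relations are plain projections of the original and that they are joined back on NI#.

```python
# Original relation: (NI#, Name, ProjName, ProjLocation)
original = {
    (123, "XX", "Accounts", "London"),
    (123, "XX", "Analysis", "Paris"),
    (124, "YY", "PI", "London"),
}

# The two projections from the slide.
emp_proj = {(ni, name, proj) for ni, name, proj, _ in original}
emp_loc = {(ni, loc) for ni, _, _, loc in original}

# Natural join of the projections on NI#.
joined = {
    (ni, name, proj, loc)
    for ni, name, proj in emp_proj
    for ni2, loc in emp_loc
    if ni == ni2
}

# Tuples that appear after the join but were never in the original.
spurious = joined - original
print(sorted(spurious))
```

The join returns every original tuple plus two that were never stored, so the decomposition loses information even though no row was deleted.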
4. Functional dependencies
Formal concepts that may be used to exhibit the “goodness” and “badness” of individual relational schemas, and to describe relationships between attributes.
Examples of functional dependency:
Name, DateOfBirth and Dept all depend on NI.
Dname and Manager depend on Dept.
ProjLocation depends on ProjName.
Functional Dependence
An attribute, X, of a relation is functionally dependent on attributes A, B, ..., N if the same values of A, ..., N are always associated with the same value of X.
{A, …, N} → X
A, ..., N is called the determinant of the functional dependency.
Full Functional Dependency
X fully depends on A, …, N if it is not dependent on any proper subset of A, ..., N; otherwise we talk of partial dependency.
e.g. age is dependent on NI and Name but only fully dependent on NI.
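
A small Python sketch of the full/partial distinction, checked against sample data. The rows and the helper names `fd_holds` and `is_fully_dependent` are made up for illustration. Note that a data sample can only refute a dependency, never prove that it holds in general.

```python
from itertools import combinations


def fd_holds(rows, determinant, dependent):
    """True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True


def is_fully_dependent(rows, determinant, dependent):
    """True if the FD holds and no proper, non-empty subset of the determinant suffices."""
    if not fd_holds(rows, determinant, dependent):
        return False
    for size in range(1, len(determinant)):
        for subset in combinations(determinant, size):
            if fd_holds(rows, subset, dependent):
                return False  # a smaller determinant works: partial dependency
    return True


# Hypothetical employees; two of them share the name "AA".
people = [
    {"NI": 21, "Name": "AA", "Age": 30},
    {"NI": 22, "Name": "BB", "Age": 41},
    {"NI": 23, "Name": "AA", "Age": 25},
]

print(fd_holds(people, ["NI", "Name"], "Age"))            # dependent on (NI, Name)
print(is_fully_dependent(people, ["NI", "Name"], "Age"))  # but only partially
print(is_fully_dependent(people, ["NI"], "Age"))          # fully dependent on NI
```
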
5. Normalization
Process: taking a set of relations and decomposing them into more relations satisfying some criteria.
Decomposition: essentially a series of projections, chosen so that the original data can be reconstituted using joins.
Normal form: the form of the relations which satisfy the criteria.
5.1. First Normal Form (1)
A relation is in first normal form (1NF) if all values are atomic, i.e. single values - small strings and numbers.
Identify and remove repeating groups (multi-valued attributes)
First Normal Form (2)
DEPARTMENT

Dnumber  Dname  Locations
5        C.S.   { Paris, London }

Two ways of normalising this:

Have a tuple for each location of each department:

Dnumber  Dname  Locations
5        C.S.   Paris
5        C.S.   London

Have a separate relation for (Dnumber, Locations) pairs:

Dnumber  Dname
5        C.S.

Dnumber  Locations
5        Paris
5        London

The latter is better as it avoids redundancy.
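
Both options can be sketched in Python. Representing the multi-valued Locations attribute as a set is an assumption made only for this illustration.

```python
# Non-1NF DEPARTMENT relation: Locations holds a set of values.
department = [
    {"Dnumber": 5, "Dname": "C.S.", "Locations": {"Paris", "London"}},
]

# Option 1: one tuple per location; Dname is repeated for each location.
flat = [
    (d["Dnumber"], d["Dname"], loc)
    for d in department
    for loc in sorted(d["Locations"])
]

# Option 2: a separate relation for (Dnumber, Locations) pairs.
dept = [(d["Dnumber"], d["Dname"]) for d in department]
dept_locations = [
    (d["Dnumber"], loc)
    for d in department
    for loc in sorted(d["Locations"])
]

print(flat)
print(dept, dept_locations)
```

In option 2 the department name is stored once, however many locations the department has.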
First Normal Form (3)
Example of a composite attribute: Room is not atomic, because it includes both Deptname and Room#.

TeacherID  Course  Room
23568      230     CS234
23669      240     IS225
25644      352     IS232
26338      455     CE255

Splitting Room into its parts gives:

TeacherID  Course  Room#
23568      230     234
23669      240     225
25644      352     232
26338      455     255

Room#  Deptname
234    CS
225    IS
232    IS
255    CE
5.2. Second Normal Form
By the definition of the primary key, every other attribute is functionally dependent on it.
If all the other attributes are fully functionally dependent on the primary key, then the relation is in Second Normal Form (2NF).
Clearly, any relation with a single-attribute primary key will be in 2NF.
If there are two primary key attributes, A & B, then each other attribute is either:
dependent on A alone;
dependent on B alone; or
dependent on both.
2NF normalization consists of creating a separate relation for each of the three cases.
Example of 2NF decomposition

EMP_PROJ(SSN, Pnumber, Hours, Ename, Pname, Ploc)

Functional dependencies:
SSN, Pnumber → Hours
SSN → Ename
Pnumber → Pname, Ploc

Decomposition into three 2NF relations:
Work(SSN, Pnumber, Hours)
EMP(SSN, Ename)
Project(Pnumber, Pname, Ploc)
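
A Python sketch of this decomposition as three projections, followed by a join that rebuilds the original relation. The sample rows are made up for illustration.

```python
# EMP_PROJ tuples: (SSN, Pnumber, Hours, Ename, Pname, Ploc) -- hypothetical data
emp_proj = [
    (1, 10, 20, "Smith", "Alpha", "Paris"),
    (1, 20, 5, "Smith", "Beta", "London"),
    (2, 10, 12, "Jones", "Alpha", "Paris"),
]

# One projection per group of functional dependencies.
work = {(ssn, pno, hours) for ssn, pno, hours, _, _, _ in emp_proj}
emp = {(ssn, ename) for ssn, _, _, ename, _, _ in emp_proj}
project = {(pno, pname, ploc) for _, pno, _, _, pname, ploc in emp_proj}

# Rejoin Work with EMP on SSN and with Project on Pnumber.
ename_by_ssn = dict(emp)
proj_by_pno = {pno: (pname, ploc) for pno, pname, ploc in project}
rejoined = {
    (ssn, pno, hours, ename_by_ssn[ssn], *proj_by_pno[pno])
    for ssn, pno, hours in work
}

print(rejoined == set(emp_proj))
```

Each employee name and each project description is now stored once, and the join recovers exactly the original tuples.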
5.3. Third Normal Form
Third Normal Form eliminates transitive dependencies, i.e. those dependencies which hold only because of some intermediary.
An attribute is transitively dependent on the primary key if there is some other attribute on which it is dependent and which is, in turn, dependent on the key.

NI#   Dnumber  Dname
1234  5        C.S.
1235  5        C.S.

Dname is dependent on NI, but only because it is dependent on Dnumber which is, in turn, dependent on NI.
Non-3NF relations are likely to hold redundant information.
A relation is in 3NF if, for any pair of attributes A & B such that A → B, there is no attribute X such that A → X and X → B.

Normalizing this would create:

NI#   Dnumber
1234  5
1235  5

Dnumber  Dname
5        C.S.
Example of 3NF decomposition

EMP_DEPT(SSN, Ename, Bdate, Address, Dnum, Dname, Dman)

Functional dependencies:
SSN → Ename, Bdate, Address, Dnum
Dnum → Dname, Dman

Decomposition into two 3NF relations:
EMPLOYEE(SSN, Ename, Bdate, Address, Dnum)
DEPT(Dnum, Dname, Dman)
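
A simplified Python sketch of this step. It treats any FD whose determinant contains no key attribute as a transitive dependency and moves it into a relation of its own. The helper name `split_transitive` is hypothetical, and the sketch does not handle longer dependency chains or overlapping FDs.

```python
attributes = ["SSN", "Ename", "Bdate", "Address", "Dnum", "Dname", "Dman"]
key = ["SSN"]
fds = [
    (["SSN"], ["Ename", "Bdate", "Address", "Dnum"]),
    (["Dnum"], ["Dname", "Dman"]),
]


def split_transitive(attributes, key, fds):
    """Split off one relation per FD whose determinant is entirely non-key."""
    remaining = list(attributes)
    new_relations = []
    for lhs, rhs in fds:
        if not set(lhs) & set(key):
            # The determinant stays behind as a foreign key;
            # the attributes it determines move to the new relation.
            new_relations.append(lhs + rhs)
            remaining = [a for a in remaining if a not in rhs]
    return remaining, new_relations


employee, new_relations = split_transitive(attributes, key, fds)
print(employee)
print(new_relations)
```
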
General Definitions of Normal Forms
5.4. Boyce-Codd Normal Form
A relation is in BCNF if, for every nontrivial functional dependency X → Y that holds on it, X is a superkey.
Every relation in BCNF is also in 3NF.
A relation in 3NF is not necessarily in BCNF.
A nontrivial FD is one that is not trivial. A trivial functional dependency X → Y is one in which Y is a subset of X.
Example: A, B → B is a trivial FD.
Most relation schemas that are in 3NF are also in BCNF.
Example of not BCNF
The following is not in BCNF:
bor_loan = (customer_id, loan_number, amount)
The following is a functional dependency that may hold:
loan_number → amount
but loan_number is not a superkey of bor_loan.
We decompose into two relations:
R1 = (customer_id, loan_number)
R2 = (loan_number, amount)
R1 and R2 are in BCNF.
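
The superkey test can be sketched in Python using attribute closure. The helper names `closure` and `bcnf_violations` are made up for illustration. As a simplifying assumption, only FDs whose attributes all lie inside a relation are considered for it, which is enough for this example but is not a full projection of the dependencies.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Nontrivial FDs inside `relation` whose determinant is not a superkey."""
    rel = set(relation)
    local = [(l, r) for l, r in fds if set(l) | set(r) <= rel]
    return [
        (l, r)
        for l, r in local
        if not set(r) <= set(l) and not rel <= closure(l, local)
    ]


fds = [(["loan_number"], ["amount"])]
bor_loan = ["customer_id", "loan_number", "amount"]
r1 = ["customer_id", "loan_number"]
r2 = ["loan_number", "amount"]

print(bcnf_violations(bor_loan, fds))  # loan_number -> amount is reported
print(bcnf_violations(r1, fds), bcnf_violations(r2, fds))
```
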
R = (A, B, C, D, E, F)
A, B → D
B → E (not in BCNF)
D → F (not in 3NF)
Normalization Methodology: Example
Consider the following description of a company: The company is divided into departments. Each department is identified by its department number. A department has a name and a manager (an employee). The employees of the company are identified by their National Insurance number. An employee has a name, an address, an age, and works in one department only. An employee is supervised by several supervisors (employees), and a supervisor can supervise several employees (supervisees). An employee can have dependents, where each dependent is described by his/her name and age. The company has a number of running projects. A project is identified by its project number. A project also has a name and a description. Several employees can work on a project, and an employee can work on several projects, each for a fixed number of hours.
1. Functional dependencies
An attribute, X, of a relation is functionally dependent on attributes A, B, ..., N if the same values of A, ..., N are always associated with the same value of X.
{A, …, N} → X

Example

A  B
1  4
1  5
3  7

A → B does NOT hold; B → A does hold.
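
The example can be checked mechanically. The sketch below uses a hypothetical helper `fd_holds` that looks for two rows agreeing on the determinant but differing on the dependent attribute. A sample of data can only show that a dependency fails; that it holds in general follows from the meaning of the attributes.

```python
def fd_holds(rows, determinant, dependent):
    """True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True


# The A/B table from the slide.
rows = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]

a_to_b = fd_holds(rows, ["A"], "B")  # A = 1 appears with B = 4 and B = 5
b_to_a = fd_holds(rows, ["B"], "A")  # every B value has a single A value
print(a_to_b, b_to_a)
```
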
1. Functional dependencies (Cont.)
The company is divided into departments. Each department is identified by its department number. A department has a name and a manager (an employee).
Dnumber → Dname, Manager
The employees of the company are identified by their National Insurance number. An employee has a name, an address, an age, and works in one department only.
NI → Ename, Address, Eage, Dnumber
1. Functional dependencies (Cont.)
An employee is supervised by several supervisors (employees), and a supervisor can supervise several employees (supervisees).
NI → Supervisor is not a valid functional dependency, since an employee can have several supervisors.
An employee can have dependents, where each dependent is described by his/her name and age.
NI, DepName → DepAge
1. Functional dependencies (Cont.)
The company has a number of running projects. A project is identified by its project number. A project also has a name and a description.
Pno → Pname, Description
Several employees can work on a project, and an employee can work on several projects, each for a fixed number of hours.
Pno, NI → Hours
1. All Functional dependencies
Dnumber → Dname, Manager
NI → Ename, Address, Eage, Dnumber
NI, DepName → DepAge
Pno → Pname, Description
Pno, NI → Hours
The Universal Relation:
U(Dnumber, Dname, Manager, NI, Ename, Address, Eage, Pno, Pname, Description, DepName, DepAge, Hours, Supervisor)
2. The Primary key
What is the primary key? (Dnumber, NI, Pno, DepName)?
Supervisor must be part of the primary key because it is a multivalued attribute.
DepName must be part of the primary key since an employee can have several dependents.
Dnumber is not part of the primary key since it can be derived from NI.
Thus the primary key is (NI, Pno, DepName, Supervisor).
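
The choice of key can be checked with an attribute-closure sketch in Python. The functional dependencies are the ones listed for the company example, and the helper name `closure` is made up for illustration.

```python
fds = [
    (["Dnumber"], ["Dname", "Manager"]),
    (["NI"], ["Ename", "Address", "Eage", "Dnumber"]),
    (["NI", "DepName"], ["DepAge"]),
    (["Pno"], ["Pname", "Description"]),
    (["Pno", "NI"], ["Hours"]),
]
universal = {
    "Dnumber", "Dname", "Manager", "NI", "Ename", "Address", "Eage",
    "Pno", "Pname", "Description", "DepName", "DepAge", "Hours", "Supervisor",
}


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


key = ["NI", "Pno", "DepName", "Supervisor"]

# The key determines every attribute of U ...
is_superkey = closure(key, fds) == universal
# ... and dropping any one attribute from it breaks that property.
is_minimal = all(
    closure([a for a in key if a != dropped], fds) != universal
    for dropped in key
)
# The first guess on the slide cannot reach Supervisor.
first_guess_ok = closure(["Dnumber", "NI", "Pno", "DepName"], fds) == universal

print(is_superkey, is_minimal, first_guess_ok)
```
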
3. Is U in 1NF?
A relation is in first normal form (1NF) if all values are atomic, i.e. single values
Supervisor is a multivalued attribute, hence U is not in 1NF
To remove the multivalued attribute, the relation U is decomposed as follows
U1(Dnumber, Dname, Manager, NI, Ename, Address, Eage, Pno, Pname, Description, DepName, DepAge, Hours)
Supervise(NI, Supervisor)
4. Is U in 2NF?
A relation is in Second Normal Form (2NF) if all the attributes are fully functionally dependent on the primary key. A relation with a single-attribute primary key is in 2NF.
Supervise is in 2NF.
U1 is not in 2NF because some attributes are partially functionally dependent on the primary key (e.g. DepAge, Ename, etc.).
We decompose U1 into the following 2NF relations:
Dept_Emp(Dnumber, Dname, Manager, NI, Ename, Address, Eage)
Dependent(NI, DepName, DepAge)
Project(Pno, Pname, Description)
Work(NI, Pno, Hours)
5. Is the previous decomposition in 3NF?
A relation is in 3NF if, for any pair of attributes A & B such that A → B, there is no attribute X such that A → X and X → B.
Supervise is in 3NF.
Among the above relations, only Dept_Emp is not in 3NF, since it has transitive dependencies (e.g. NI → Dnumber → Dname).
We therefore decompose the relation into the following two relations, which are in 3NF:
Department(Dnumber, Dname, Manager)
Employee(NI, Ename, Address, Eage, Dnumber)
6. Complete Schema in 3NF
Department(Dnumber, Dname, Manager)
Employee(NI, Ename, Address, Eage, Dnumber)
Dependent(NI, DepName, DepAge)
Project(Pno, Pname, Description)
Work(NI, Pno, Hours)
Supervise(NI, Supervisor)
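
One possible way to realise this schema, sketched with Python's built-in sqlite3 module. The column types and the foreign-key references are assumptions, since the slides give only relation and attribute names.

```python
import sqlite3

# Column types and REFERENCES clauses are assumed; the slides list only
# relation names, attribute names and (implicitly) the primary keys.
schema = """
CREATE TABLE Department (
    Dnumber INTEGER PRIMARY KEY,
    Dname   TEXT,
    Manager INTEGER REFERENCES Employee(NI)
);
CREATE TABLE Employee (
    NI      INTEGER PRIMARY KEY,
    Ename   TEXT,
    Address TEXT,
    Eage    INTEGER,
    Dnumber INTEGER REFERENCES Department(Dnumber)
);
CREATE TABLE Dependent (
    NI      INTEGER REFERENCES Employee(NI),
    DepName TEXT,
    DepAge  INTEGER,
    PRIMARY KEY (NI, DepName)
);
CREATE TABLE Project (
    Pno         INTEGER PRIMARY KEY,
    Pname       TEXT,
    Description TEXT
);
CREATE TABLE Work (
    NI    INTEGER REFERENCES Employee(NI),
    Pno   INTEGER REFERENCES Project(Pno),
    Hours INTEGER,
    PRIMARY KEY (NI, Pno)
);
CREATE TABLE Supervise (
    NI         INTEGER REFERENCES Employee(NI),
    Supervisor INTEGER REFERENCES Employee(NI),
    PRIMARY KEY (NI, Supervisor)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```
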
End of Chapter