Data Normalisation 2
Objectives
Data normalisation aims to derive record structures that avoid anomalies in:
Insertion
Deletion
Modification
Accessing
Data normalisation ensures single-valuedness of facts
Facts are represented in fields in keyed records
The Process of Normalisation
Usually three steps in industry, giving rise to:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
In academia, further normal forms are also studied:
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
At each step we consider relationships between an entity's attributes
These relationships are known as functional dependencies
Steps in Data Normalisation
UNNORMALISED ENTITY
step1 ... remove repeating groups
1st NORMAL FORM
step2 ... remove partial dependencies
2nd NORMAL FORM
step3 ... remove indirect (transitive) dependencies
3rd NORMAL FORM
step4 ... make every determinant a candidate key
BOYCE-CODD NORMAL FORM
step5 ... remove multivalued dependencies
4th NORMAL FORM
Attributes - Identifiers
An entity identifier uniquely determines an occurrence of the entity
A superkey is a combination of attributes that uniquely identifies an occurrence
When more than one identifier exists we have candidate identifiers (candidate keys); a candidate key is a minimal superkey
Primary key: the candidate key designated as the main identifier
SUPPLIER (Supplier#, Supplier-name, Supp-add)
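These definitions can be tried out on sample data. In the minimal Python sketch below, a set of attributes counts as a superkey of a sample if no two rows agree on all of them, and the candidate keys are the minimal such sets. The SUPPLIER rows and the helpers `is_superkey` and `candidate_keys` are hypothetical, invented for illustration. A sample can only show that a combination happens to be unique; it cannot prove that the combination identifies every possible occurrence.

```python
from itertools import combinations

# Hypothetical SUPPLIER occurrences, invented for illustration only.
SUPPLIERS = [
    {"Supplier#": "S1", "Supplier-name": "Acme", "Supp-add": "Leeds"},
    {"Supplier#": "S2", "Supplier-name": "Bolt Co", "Supp-add": "Leeds"},
    {"Supplier#": "S3", "Supplier-name": "Acme", "Supp-add": "York"},
]
ATTRS = ["Supplier#", "Supplier-name", "Supp-add"]


def is_superkey(rows, attrs):
    """True if no two rows in this sample share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal superkeys: unique combinations with no smaller unique subset."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys


print(candidate_keys(SUPPLIERS, ATTRS))
```

Here (Supplier-name, Supp-add) is unique only by coincidence in three rows; whether it is a genuine candidate identifier is a business rule, which is one reason Supplier# is the natural choice of primary key.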
Attributes - Repeating Groups
When a group of attributes has multiple values, we say there is a repeating group of attributes in the entity
COMPANY  NAME     ADDRESS      BRANCH NAME  BRANCH ADDRESS
A123     ABC Ltd  100 High St  ABC1         Manchester
                               ABC2         London
                               ABC3         Glasgow
(BRANCH_NAME, BRANCH_ADDRESS) is a repeating group
Functional Dependency
A -> B    (for example, PART# -> PART-DESCRIPTION)
B is functionally dependent on A if a value of A uniquely determines
a value of B
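A functional dependency can be tested against a particular set of rows: every pair of rows that agree on A must also agree on B. A minimal Python sketch follows; `fd_satisfied` is a hypothetical helper, the part numbers and descriptions come from the purchase-order example, and order 1024 is invented so that one part appears on two orders.

```python
def fd_satisfied(rows, lhs, rhs):
    """True if rows that agree on every lhs attribute also agree on every rhs attribute."""
    seen = {}
    for row in rows:
        a_value = tuple(row[x] for x in lhs)
        b_value = tuple(row[y] for y in rhs)
        # setdefault stores the first B value seen for this A value and returns it
        if seen.setdefault(a_value, b_value) != b_value:
            return False
    return True


ROWS = [
    {"ORDER#": 1023, "PART#": "1492", "PART-DESCRIPTION": "Bolt"},
    {"ORDER#": 1023, "PART#": "3164", "PART-DESCRIPTION": "Spanner"},
    {"ORDER#": 1024, "PART#": "1492", "PART-DESCRIPTION": "Bolt"},  # hypothetical order
]

print(fd_satisfied(ROWS, ["PART#"], ["PART-DESCRIPTION"]))  # True
print(fd_satisfied(ROWS, ["ORDER#"], ["PART#"]))            # False: order 1023 has two parts
```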
Functional Dependency
A -> B: B is functionally dependent on A (A determines B)
all tuples that have the same value of A have the same value of B
A functional dependency is trivial if it is satisfied by every possible relation, e.g. A -> A
in general, X -> Y is trivial if Y = X or Y is a subset of X
FDs are said to HOLD when every possible (legal) state of the relation complies
FDs are said to be SATISFIED when a particular, stated instance of the relation complies
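The difference between trivial, satisfied and holding FDs can be made concrete. In this sketch the helper names are hypothetical: `is_trivial` applies the subset rule, and `fd_satisfied` checks one instance. The quantity and price pairs from the purchase-order lines satisfy QUANTITY-ORDERED -> PRICE simply because no quantity repeats; no instance check can show that the FD holds, since holding is a statement about every possible state.

```python
def is_trivial(lhs, rhs):
    """X -> Y is trivial when Y is a subset of X (which includes Y == X)."""
    return set(rhs) <= set(lhs)


def fd_satisfied(rows, lhs, rhs):
    """True if this particular instance complies with lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[x] for x in lhs)
        value = tuple(row[y] for y in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# QUANTITY-ORDERED and PRICE values from the purchase-order lines
ITEMS = [
    {"QUANTITY-ORDERED": 150, "PRICE": 15.00},
    {"QUANTITY-ORDERED": 1000, "PRICE": 10.00},
    {"QUANTITY-ORDERED": 10, "PRICE": 5.00},
]

print(is_trivial(["ORDER#", "PART#"], ["PART#"]))            # True
print(fd_satisfied(ITEMS, ["QUANTITY-ORDERED"], ["PRICE"]))  # True, but only for this instance
```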
Example
ORDER NUMBER: 1023
SUPPLIER NUMBER: 500028
ORDER DATE: 09/05/88
DELIVERY DATE: 25/07/88
PART NO.  PART-DESC  QTY-ORD  PRICE
0463      Hook       150      15.00
1492      Bolt       1000     10.00
3164      Spanner    10       5.00
                     TOTAL    30.00
PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, (PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE), TOTAL-PRICE)
First Normal Form
An entity type is in 1NF if there are no repeating groups of attribute types
To transform an un-normalised entity type to 1NF:
Remove all repeating attribute groups
Repeating attribute groups become new entity types in their own right
The identifier of the original entity type must be an attribute (but not necessarily an identifier) of the derived entity type.
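The 1NF rules can be sketched in Python by holding the un-normalised PURCHASE-ORDER as a nested record whose `lines` list is the repeating group. The helper `to_1nf` and the key name `lines` are hypothetical; the values are those on the purchase-order form. Each repeating-group entry becomes a row of its own, and the original identifier ORDER# is copied into it, as the last rule requires.

```python
# Un-normalised PURCHASE-ORDER: the "lines" list is the repeating group
# (PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE).
ORDER = {
    "ORDER#": 1023,
    "SUPPLIER#": 500028,
    "ORDER-DATE": "09/05/88",
    "DELIVERY-DATE": "25/07/88",
    "TOTAL-PRICE": 30.00,
    "lines": [
        {"PART#": "0463", "PART-DESCRIPTION": "Hook", "QUANTITY-ORDERED": 150, "PRICE": 15.00},
        {"PART#": "1492", "PART-DESCRIPTION": "Bolt", "QUANTITY-ORDERED": 1000, "PRICE": 10.00},
        {"PART#": "3164", "PART-DESCRIPTION": "Spanner", "QUANTITY-ORDERED": 10, "PRICE": 5.00},
    ],
}


def to_1nf(record, group, identifier):
    """Split off a repeating group: one parent row plus one child row per group
    entry, each child carrying the parent's identifier."""
    parent = {k: v for k, v in record.items() if k != group}
    children = [{identifier: record[identifier], **entry} for entry in record[group]]
    return parent, children


purchase_order, purchase_item_1 = to_1nf(ORDER, "lines", "ORDER#")
for row in purchase_item_1:
    print(row)
```

After the split no field holds a list, and PURCHASE-ITEM-1 rows are identified by (ORDER#, PART#) rather than PART# alone.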
Example of First Normal Form
PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, (PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE), TOTAL-PRICE)
UN-NORMALISED ENTITY TYPE
Example in 1NF
PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, TOTAL-PRICE)
PURCHASE-ITEM-1 (ORDER#, PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE)
[NOTE: PART# ALONE DOES NOT IDENTIFY PURCHASE-ITEM]
ENTITY TYPES IN 1NF
Example
REGISTRATION FORM
STUDENT NUMBER: S0843215
STUDENT NAME: P. Smith
STUDENT ADDRESS: 1, South Downs Hale
COURSE NO  COURSE     TUTOR NAME  TUTOR NO
PM951      Computing  T. Long     037428
S212       Biology    S. Short    096524
STUDENT (Student#, Student-Name, Student-Address)
ENROLMENT (Student#, Course#, Course-Title, Tutor-Name, Tutor-Staff#)
ENTITY TYPES IN 1NF
Benefits of First Normal Form
Any 'hidden' entities are identified
The process results in separation of different objects
BUT anomalies may still exist
PURCHASE-ITEM-1 (ORDER#, PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE)
PART-DESCRIPTION appears on every PURCHASE-ITEM occurrence
This may result in anomalies when updating or deleting records: changing a part's description means changing every order line for that part, and deleting the last order line for a part loses its description
The problem in the example is that PART-DESCRIPTION is functionally dependent only on PART# (part of the identifier)
Second Normal Form
An entity type is in 2NF if it is in 1NF and each non-identifying attribute depends upon the whole identifier
To transform an entity type in 1NF to 2NF:
Identify functional dependencies
Rewrite entity types so that each non-identifying attribute is functionally dependent on the whole of the identifier
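Finding partial dependencies can be automated once the FDs are written down. This sketch uses attribute closure (the set of attributes a group determines under the listed FDs), a standard technique that these notes do not themselves describe; `closure` and `partial_dependencies` are hypothetical helper names, and the FDs are those of PURCHASE-ITEM-1 from the purchase-order example.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the listed FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(identifier, attributes, fds):
    """(part, attr) pairs: a non-identifying attr determined by a proper subset of the identifier."""
    non_key = set(attributes) - set(identifier)
    found = []
    for size in range(1, len(identifier)):
        for part in combinations(identifier, size):
            for attr in sorted(non_key & closure(part, fds)):
                found.append((part, attr))
    return found


FDS = [
    (("ORDER#", "PART#"), ("QUANTITY-ORDERED", "PRICE")),
    (("PART#",), ("PART-DESCRIPTION",)),
]
ATTRS = ["ORDER#", "PART#", "PART-DESCRIPTION", "QUANTITY-ORDERED", "PRICE"]

print(partial_dependencies(("ORDER#", "PART#"), ATTRS, FDS))
# [(('PART#',), 'PART-DESCRIPTION')]
```

Each pair found suggests a new entity type keyed on the partial identifier, here PART (PART#, PART-DESCRIPTION).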
Functional Dependencies
PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, TOTAL-PRICE)
PURCHASE-ITEM-1 (ORDER#, PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE)
ORDER#, PART# -> QUANTITY-ORDERED, PRICE
PART# -> PART-DESCRIPTION
In 2nd Normal Form
Decompose PURCHASE-ITEM-1 into two entity types
PURCHASE-ITEM (Order#, Part#, Quantity-Ordered, Price)
PART (Part#, Part-Description)
The original entity type is decomposed into three entity types in 2nd normal form:
PURCHASE-ORDER (Order#, Supplier#, Order-Date, Delivery-Date, Total-Price)
PURCHASE-ITEM (Order#, Part#, Quantity-Ordered, Price)
PART (Part#, Part-Description)
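One way to confirm that the 2NF decomposition loses nothing is to project PURCHASE-ITEM-1 onto the two new entity types and join them back on Part#. In this Python sketch relations are modelled as sets of rows, so duplicates disappear on projection; the helper names are hypothetical, and order 1024 is invented so that one part appears on two orders.

```python
def relation(rows):
    """A relation as a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def project(rel, attrs):
    """Keep only attrs; identical rows collapse because rel is a set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}


def natural_join(r, s):
    """Combine every pair of rows that agree on their shared attributes."""
    result = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                result.add(frozenset({**dt, **du}.items()))
    return result


PURCHASE_ITEM_1 = relation([
    {"Order#": 1023, "Part#": "0463", "Part-Description": "Hook", "Quantity-Ordered": 150, "Price": 15.00},
    {"Order#": 1023, "Part#": "1492", "Part-Description": "Bolt", "Quantity-Ordered": 1000, "Price": 10.00},
    {"Order#": 1023, "Part#": "3164", "Part-Description": "Spanner", "Quantity-Ordered": 10, "Price": 5.00},
    {"Order#": 1024, "Part#": "1492", "Part-Description": "Bolt", "Quantity-Ordered": 200, "Price": 2.00},  # hypothetical
])

PURCHASE_ITEM = project(PURCHASE_ITEM_1, {"Order#", "Part#", "Quantity-Ordered", "Price"})
PART = project(PURCHASE_ITEM_1, {"Part#", "Part-Description"})

print(len(PURCHASE_ITEM), len(PART))                         # 4 3
print(natural_join(PURCHASE_ITEM, PART) == PURCHASE_ITEM_1)  # True
```

"Bolt" is now stored once in PART even though part 1492 appears on two orders, which is the redundancy 2NF removes.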
Example in 2NF
STUDENT (Student#, Student-Name, Student-Address)
ENROLMENT (Student#, Course#, Tutor-Name, Tutor-Staff#)
COURSE (Course#, Course-Title)
ENTITY TYPES IN 2NF
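Third Normal Form
An entity type is in 3NF if it is in 2NF and no non-identifying attribute is functionally dependent on another non-identifying attribute (i.e. there are no indirect dependencies on the identifier)
To transform an entity type in 2NF to 3NF:
Determine functional dependencies between non-identifying attributes
Decompose the entity type, moving each such dependency into a new entity type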
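Indirect dependencies can be found with the same attribute-closure idea: look for a non-identifying attribute determined by other non-identifying attributes. A Python sketch with hypothetical helper names; the FDs for ENROLMENT are assumptions (each enrolment has one tutor, and each staff number identifies one tutor).

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the listed FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def transitive_dependencies(identifier, attributes, fds):
    """(X, A) pairs where non-identifying attribute A is determined by a group X
    of other non-identifying attributes."""
    non_key = [a for a in attributes if a not in identifier]
    found = []
    for size in range(1, len(non_key) + 1):
        for group in combinations(non_key, size):
            for attr in non_key:
                if attr not in group and attr in closure(group, fds):
                    found.append((group, attr))
    return found


# Assumed FDs for ENROLMENT: one tutor per enrolment, one name per staff number
FDS = [
    (("Student#", "Course#"), ("Tutor-Staff#",)),
    (("Tutor-Staff#",), ("Tutor-Name",)),
]
ATTRS = ["Student#", "Course#", "Tutor-Name", "Tutor-Staff#"]

print(transitive_dependencies(("Student#", "Course#"), ATTRS, FDS))
# [(('Tutor-Staff#',), 'Tutor-Name')]
```

The pair found corresponds to the new entity type TUTOR (Tutor-Staff#, Tutor-Name).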
Functional Dependencies
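STUDENT (Student#, Student-Name, Student-Address)
ENROLMENT (Student#, Course#, Tutor-Name, Tutor-Staff#)
COURSE (Course#, Course-Title)
Student#, Course# -> Tutor-Staff#
Tutor-Staff# -> Tutor-Name
(Tutor-Name therefore depends on the identifier of ENROLMENT only indirectly, through Tutor-Staff#)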
Example in 3NF
STUDENT (Student#, Student-Name, Student-Address)
ENROLMENT (Student#, Course#, Tutor-Staff#)
COURSE (Course#, Course-Title)
TUTOR (Tutor-Staff#, Tutor-Name)
ENTITY TYPES IN 3NF
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if every determinant is a candidate key
For a relation with only one candidate key, 3NF and BCNF are equivalent
Violation of BCNF is rare; it may occur in a relation that:
contains two (or more) composite candidate keys, and
whose candidate keys overlap, that is, share at least one attribute
BCNF
CLIENT_INTERVIEW
Client_no  Interview_date  Interview_time  Staff_no  Room_no
CR76       13-May-95       10.30           SG5       G101
CR56       13-May-95       12.00           SG5       G101
CR74       13-May-95       12.00           SG37      G102
CR56       10-Jun-95       10.00           SG5       G102
The following FDs hold:
Client_no, Interview_date -> Interview_time, Staff_no, Room_no
Staff_no, Interview_date, Interview_time -> Client_no
Staff_no, Interview_date -> Room_no
(Client_no, Interview_date) and (Staff_no, Interview_date, Interview_time) are composite candidate keys that share the common attribute Interview_date
BCNF
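The relation CLIENT_INTERVIEW is in 3NF but not BCNF: in Staff_no, Interview_date -> Room_no the determinant (Staff_no, Interview_date) is not a candidate key
(Being in 3NF relies on Room_no being part of a candidate key, e.g. if a room hosts only one interview at a given date and time, so that Room_no, Interview_date, Interview_time -> Client_no, Staff_no)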
To transform to BCNF, remove the violating FD and create two relations:
INTERVIEW (Client_no, Interview_date, Interview_time, Staff_no)
STAFF_ROOM (Staff_no, Interview_date, Room_no)
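The BCNF test can be sketched with attribute closure: a non-trivial FD violates BCNF if its determinant does not determine every attribute of the relation, i.e. it is not a superkey and so contains no candidate key. The helper names are hypothetical, and as a simplification the check only examines the three listed FDs (restricted to each relation's attributes) rather than every FD they imply.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the listed FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Listed FDs that apply to this relation and whose determinant is not a superkey."""
    attrs = set(attributes)
    violations = []
    for lhs, rhs in fds:
        if not set(lhs) <= attrs:
            continue  # determinant is not wholly inside this relation
        dependent = (set(rhs) & attrs) - set(lhs)
        if not dependent:
            continue  # trivial (or irrelevant) for this relation
        if not attrs <= closure(lhs, fds):
            violations.append((lhs, tuple(sorted(dependent))))
    return violations


FDS = [
    (("Client_no", "Interview_date"), ("Interview_time", "Staff_no", "Room_no")),
    (("Staff_no", "Interview_date", "Interview_time"), ("Client_no",)),
    (("Staff_no", "Interview_date"), ("Room_no",)),
]
CLIENT_INTERVIEW = ["Client_no", "Interview_date", "Interview_time", "Staff_no", "Room_no"]
INTERVIEW = ["Client_no", "Interview_date", "Interview_time", "Staff_no"]
STAFF_ROOM = ["Staff_no", "Interview_date", "Room_no"]

print(bcnf_violations(CLIENT_INTERVIEW, FDS))  # [(('Staff_no', 'Interview_date'), ('Room_no',))]
print(bcnf_violations(INTERVIEW, FDS))         # []
print(bcnf_violations(STAFF_ROOM, FDS))        # []
```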
Fourth Normal Form
An entity type is in 4NF if it is in BCNF and has no non-trivial multivalued dependencies between its attribute types (other than those whose determinant is a candidate key)
To transform an entity type to 4NF:
Detect any multivalued dependencies
Decompose entity type
Multivalued Dependencies - 1
AUTHOR_NO  BOOK_NO  SUBJECT    BOOK_TITLE  AUTHOR_NAME
A1         B1       Comp. Sc.  Methods     Jones
A1         B1       Maths      Methods     Jones
A2         B1       Comp. Sc.  Methods     Smith
A2         B1       Maths      Methods     Smith
A3         B2       Maths      Calculus    Brown
IN 3rd NORMAL FORM:
AUTHOR (Author_no, Author_name)
BOOK (Book_no, Book_title)
AUTHOR-BOOK-SUBJECT (Author_no, Book_no, Subject)
author_no -> author_name
book_no -> book_title
Multivalued Dependencies - 2
The example models the rule that "each AUTHOR is associated with all the SUBJECTS under which the BOOK is classified"
The attribute SUBJECT contains redundant values: if SUBJECT were deleted from rows 1 & 2, the values could be deduced from rows 3 & 4
This is an anomaly because the same set of SUBJECT values is associated with each AUTHOR of the same BOOK
BOOK_NO ->> AUTHOR_NO (BOOK_NO multidetermines AUTHOR_NO)
BOOK_NO ->> SUBJECT (BOOK_NO multidetermines SUBJECT)
AUTHOR_NO  BOOK_NO  SUBJECT
A1         B1       Comp. Sc.
A1         B1       Maths
A2         B1       Comp. Sc.
A2         B1       Maths
A3         B2       Maths
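A multivalued dependency can also be checked on an instance. This sketch uses a standard formulation: X ->> Y holds in a set of rows if, whenever two rows agree on X, the row taking X and Y from the first and all remaining attributes from the second is also present. `mvd_holds` is a hypothetical helper; the rows are the AUTHOR-BOOK-SUBJECT table above.

```python
ATTRS = ["AUTHOR_NO", "BOOK_NO", "SUBJECT"]
ROWS = [
    dict(zip(ATTRS, values))
    for values in [
        ("A1", "B1", "Comp. Sc."),
        ("A1", "B1", "Maths"),
        ("A2", "B1", "Comp. Sc."),
        ("A2", "B1", "Maths"),
        ("A3", "B2", "Maths"),
    ]
]


def mvd_holds(rows, attributes, x, y):
    """Check X ->> Y on this instance using the row-swapping condition."""
    present = {tuple(row[a] for a in attributes) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # X and Y values from t1, every other attribute from t2
                t3 = {**t2, **{a: t1[a] for a in list(x) + list(y)}}
                if tuple(t3[a] for a in attributes) not in present:
                    return False
    return True


print(mvd_holds(ROWS, ATTRS, ["BOOK_NO"], ["AUTHOR_NO"]))      # True
print(mvd_holds(ROWS[1:], ATTRS, ["BOOK_NO"], ["AUTHOR_NO"]))  # False
```

Dropping the single row (A1, B1, Comp. Sc.) breaks the dependency, which is why adding or removing a subject for book B1 has to touch one row per author.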
Fourth Normal Form
AUTHOR (Author_no, Author_name)
BOOK (Book_no, Book_Title)
AUTHOR-BOOK (Author_no, Book_no)
BOOK-SUBJECT (Book_no, Subject)
IN 4th NORMAL FORM
AUTHOR-BOOK
AUTHOR_NO  BOOK_NO
A1         B1
A2         B1
A3         B2
BOOK-SUBJECT
BOOK_NO  SUBJECT
B1       Comp. Sc.
B1       Maths
B2       Maths
Conclusions
Data Normalisation is a bottom-up technique that ensures the basic properties of the relational model
no duplicate tuples
no nested relations
Data normalisation is often used as the only technique for database design (an implementation view)
A more appropriate approach is to complement conceptual modelling with data normalisation