abhimaurya
8/9/2019 Normalization_is_the_process_of_efficiently_organizing_data_in_a_database
DATABASE & DATABASE DESIGN
What is Normalization?
Normalization is the process of efficiently organizing data
in a database. The normalization process has two goals:
eliminating redundant data and ensuring that the database
design does not suffer from update, delete, or insert
anomalies. Both are worthy goals, as they reduce the amount
of space a database consumes and ensure that data is stored
logically.
Example of Anomalies:
Consider the following database design:
Student
Enroll No   Name      Section   Mailing Address               Club Membership
06BS1256    Shipra    A         CC-89, Xmas Street, Gurgaon   Finance
06BS1909    Krishna   K         A-4555, Christ Rd, Chennai    IT
06BS1256    Shipra    A         CC-89, Xmas Street, Gurgaon   Marketing
06BS1909    Krishna   K         A-4555, Christ Rd, Chennai    HR
06BS1890    Gokul     J         ABC, Saket, N Delhi           Finance
In the Student table, the primary key is the combination of
Enroll No and Club Membership. A student may hold several
club memberships, and the table shows that each additional
membership repeats the student's details, such as name,
section, and mailing address (see rows 1 and 3). This causes
not only redundancy but also a risk of inconsistency, since
any change has to be made in multiple places. If Shipra's
section changes from Section A to Section C, both rows 1 and
3 have to be changed. If row 3 is not updated, the data
becomes inconsistent. This is an Update Anomaly.
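The update anomaly can be reproduced with a small Python sketch. The list-of-dicts layout and the helper names `update_section_first_match` and `sections_for` are illustrative assumptions, not part of the original design; the mailing address column is left out for brevity.

```python
# Rows of the unnormalized Student table (address column omitted for brevity).
students = [
    {"enroll_no": "06BS1256", "name": "Shipra", "section": "A", "club": "Finance"},
    {"enroll_no": "06BS1909", "name": "Krishna", "section": "K", "club": "IT"},
    {"enroll_no": "06BS1256", "name": "Shipra", "section": "A", "club": "Marketing"},
    {"enroll_no": "06BS1909", "name": "Krishna", "section": "K", "club": "HR"},
    {"enroll_no": "06BS1890", "name": "Gokul", "section": "J", "club": "Finance"},
]


def update_section_first_match(rows, enroll_no, new_section):
    """Hypothetical careless update: changes only the first matching row."""
    for row in rows:
        if row["enroll_no"] == enroll_no:
            row["section"] = new_section
            return


def sections_for(rows, enroll_no):
    """Collect every distinct section recorded for one student."""
    return {row["section"] for row in rows if row["enroll_no"] == enroll_no}


update_section_first_match(students, "06BS1256", "C")

# Row 1 now says Section C while row 3 still says Section A.
inconsistent = sections_for(students, "06BS1256")
print(inconsistent)
```

Because the section is stored once per membership, a single missed row is enough to leave two conflicting sections for the same student.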
Insert Anomaly:
Consider a new student who joins IBS Gurgaon in Section D
but does not have any club memberships. Since Club
Membership is part of the primary key, and a primary key
cannot be left blank or hold a NULL value, we cannot insert
the details of the new student until he becomes a member of
at least one club. This is referred to as an Insert Anomaly.
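A minimal sketch of the insert anomaly using Python's built-in `sqlite3` module follows. The table definition, the enrollment number `06BS2001`, and the name `New Student` are hypothetical values chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NOT NULL is spelled out explicitly, because SQLite does not always
# enforce it for primary key columns on its own.
conn.execute(
    """
    CREATE TABLE student (
        enroll_no TEXT NOT NULL,
        name      TEXT,
        section   TEXT,
        club      TEXT NOT NULL,
        PRIMARY KEY (enroll_no, club)
    )
    """
)

try:
    # A new Section D student with no club membership yet (hypothetical values).
    conn.execute(
        "INSERT INTO student (enroll_no, name, section, club) VALUES (?, ?, ?, ?)",
        ("06BS2001", "New Student", "D", None),
    )
    inserted = True
except sqlite3.IntegrityError:
    # The missing club violates the key constraint, so the row is rejected.
    inserted = False

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(inserted, row_count)
```

The insert is rejected and the table stays empty, so the student's name and section cannot be recorded until a club is chosen.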
Delete Anomaly:
Consider the case where Gokul (row 5 of the table) is no
longer a member of the Finance club. We have to delete
Gokul's record, since Club Membership is part of the primary
key and cannot be NULL or blank. If we delete Gokul's record,
we lose all information about Gokul's section, mailing
address, and so on. This is a Delete Anomaly.
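The same loss can be shown in a few lines of Python. This is only a sketch: the list-of-dicts layout and the helper name `drop_membership` are assumptions made for illustration.

```python
# Rows of the unnormalized Student table (address column omitted for brevity).
students = [
    {"enroll_no": "06BS1256", "name": "Shipra", "section": "A", "club": "Finance"},
    {"enroll_no": "06BS1909", "name": "Krishna", "section": "K", "club": "IT"},
    {"enroll_no": "06BS1256", "name": "Shipra", "section": "A", "club": "Marketing"},
    {"enroll_no": "06BS1909", "name": "Krishna", "section": "K", "club": "HR"},
    {"enroll_no": "06BS1890", "name": "Gokul", "section": "J", "club": "Finance"},
]


def drop_membership(rows, enroll_no, club):
    """Remove the whole row for one (student, club) pair.

    The club cannot be blanked out because it is part of the key,
    so deleting the row is the only option.
    """
    return [
        row
        for row in rows
        if not (row["enroll_no"] == enroll_no and row["club"] == club)
    ]


remaining = drop_membership(students, "06BS1890", "Finance")

# Gokul had only one membership, so nothing about him survives the delete.
gokul_known = any(row["enroll_no"] == "06BS1890" for row in remaining)
print(len(remaining), gokul_known)
```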
To prevent problems like these, the database community has
developed a series of guidelines for ensuring that databases
are normalized. These are referred to as normal forms and
are numbered from one (the lowest form of normalization,
referred to as first normal form or 1NF) through five (fifth
normal form or 5NF). In practical applications, you'll often
see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth
normal form is very rarely seen.
It's important to point out that they are guidelines and
guidelines only. Occasionally, it becomes necessary to
stray from them to meet practical business requirements.
However, when variations take place, it's extremely
important to evaluate any possible ramifications they
could have on the system and account for possible
inconsistencies.
First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an
organized database: remove all multivalued attributes. Each
field must hold a single value, so comma-separated values
are not allowed in a single field of the database.
For example:
Customer
Customer ID   First Name   Surname     Telephone Number
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659, 555-776-4100
789           Maria        Fernandez   555-808-9633
This table is not in 1NF, since Telephone Number contains
multiple values in one cell/field.
Remedies:
Customer ID   First Name   Surname     Tel. No. 1     Tel. No. 2     Tel. No. 3
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659   555-776-4100
789           Maria        Fernandez   555-808-9633
One remedy is to add a separate column for each occurrence
of the repeating field Telephone No, as in the table above.
This remedy comes with its own problems, since the maximum
number of telephone numbers for a customer is difficult to
ascertain. In this table, most of the fields for Tel. No. 2
and Tel. No. 3 are blank, which wastes space.

A second remedy is to have separate rows for different
telephone numbers. This contributes to redundancy, since
every attribute except Telephone No is repeated in each
row.

The best remedy is to divide this table into two. This
solves the problems of both redundancy and multivalued
attributes.
Customer
Customer ID   First Name   Surname
123           Robert       Ingram
456           Jane         Wright
789           Maria        Fernandez

Customer Telephone Number
Customer ID   Telephone Number
123           555-861-2025
456           555-403-1659
456           555-776-4100
789           555-808-9633
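The split into two tables can be sketched in Python as follows. The tuple layout and the function name `split_to_1nf` are assumptions made for illustration; the data are the customer rows from the example.

```python
# Unnormalized rows: the last field holds a comma-separated list of numbers.
raw_customers = [
    (123, "Robert", "Ingram", "555-861-2025"),
    (456, "Jane", "Wright", "555-403-1659,555-776-4100"),
    (789, "Maria", "Fernandez", "555-808-9633"),
]


def split_to_1nf(rows):
    """Split rows into a customer table and a one-number-per-row phone table."""
    customers = []
    phones = []
    for customer_id, first_name, surname, numbers in rows:
        customers.append((customer_id, first_name, surname))
        for number in numbers.split(","):
            number = number.strip()
            if number:  # skip empty fragments such as a trailing comma
                phones.append((customer_id, number))
    return customers, phones


customers, phones = split_to_1nf(raw_customers)

for row in phones:
    print(row)
```

Each customer appears once in `customers`, and a customer with several numbers simply gets several rows in `phones`, so no column count has to be guessed in advance.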
Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept
of removing duplicative data:
It applies to tables that have a composite primary key.
It requires that all non-key attributes be fully
functionally dependent on the key attributes.
For the Student table, 2NF requires that all non-key
attributes (Name, Section, Mailing Address) be fully
functionally dependent on the key attributes (Enroll No,
Club Membership). That means no non-key attribute should
get its value from only a part of the primary key.
In the Student table, all three non-key attributes depend
on Enroll No alone and not on Club Membership. For the
table to be in 2NF, they must all derive their values from
the combination of both Enroll No and Club Membership.
To bring this table into 2NF, we divide it further by
moving out the non-key attributes that have a partial
functional dependency on part of the primary key. For
example, Name (a non-key attribute) and Enroll No (part of
the primary key) will be placed in another table. Section
and Mailing Address move with them, since they also depend
only on Enroll No.
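The partial dependency and the resulting decomposition can be sketched in Python. The helper `determines` and the variable names are assumptions made for illustration. The check only shows that the dependency holds in these sample rows; whether it holds in general is a design decision.

```python
# Rows of the unnormalized Student table.
students = [
    {"enroll_no": "06BS1256", "name": "Shipra", "section": "A",
     "address": "CC-89, Xmas Street, Gurgaon", "club": "Finance"},
    {"enroll_no": "06BS1909", "name": "Krishna", "section": "K",
     "address": "A-4555, Christ Rd, Chennai", "club": "IT"},
    {"enroll_no": "06BS1256", "name": "Shipra", "section": "A",
     "address": "CC-89, Xmas Street, Gurgaon", "club": "Marketing"},
    {"enroll_no": "06BS1909", "name": "Krishna", "section": "K",
     "address": "A-4555, Christ Rd, Chennai", "club": "HR"},
    {"enroll_no": "06BS1890", "name": "Gokul", "section": "J",
     "address": "ABC, Saket, N Delhi", "club": "Finance"},
]


def determines(rows, lhs, rhs):
    """Return True if equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# The non-key attributes depend on Enroll No alone: a partial dependency.
partial = determines(students, ["enroll_no"], ["name", "section", "address"])

# Decompose: student details keyed by Enroll No, memberships kept separately.
student_table = {
    row["enroll_no"]: (row["name"], row["section"], row["address"])
    for row in students
}
membership_table = sorted({(row["enroll_no"], row["club"]) for row in students})

print(partial, len(student_table), len(membership_table))
```

After the split, each student's details are stored once, and a student with no clubs can still have a row in `student_table`.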
Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
Meet all the requirements of the second normal form.
Remove columns that are not directly dependent upon the
primary key. There should be no transitive dependency.
For example:
Employee
Employee ID   Dept ID   Dept Name
In this table, Employee ID is the primary key. Employee ID
determines Dept ID (the department for which the employee
works), and Dept ID determines Dept Name. Thus there is a
transitive dependency in this table: Employee ID determines
Dept ID, and Dept ID determines Dept Name. 3NF states that
all non-key attributes should be directly dependent on the
primary key.
To bring this table into 3NF, we move Dept ID and Dept Name
into a separate table and keep Dept ID in the Employee
table as a reference to it.
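A short Python sketch of this decomposition follows. The original table lists no rows, so the sample data, the dictionary layout, and the helper name `dept_name_of` are all hypothetical.

```python
# Hypothetical sample rows: (employee_id, dept_id, dept_name).
employee_rows = [
    (1, 10, "Finance"),
    (2, 20, "IT"),
    (3, 10, "Finance"),
]

# Employee table: Employee ID -> Dept ID (Dept ID acts as a reference).
employee = {emp_id: dept_id for emp_id, dept_id, _ in employee_rows}

# Department table: Dept ID -> Dept Name, stored once per department.
department = {dept_id: dept_name for _, dept_id, dept_name in employee_rows}


def dept_name_of(emp_id):
    """Look up an employee's department name through the Dept ID reference."""
    return department[employee[emp_id]]


# Renaming a department is now a single change in one place.
department[10] = "Corporate Finance"
renamed = [dept_name_of(emp_id) for emp_id in sorted(employee)]
print(renamed)
```

Since the department name is stored once, every employee in department 10 sees the new name without any of the employee rows being touched.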