Normalization_is_the_process_of_efficiently_organizing_data_in_a_database

  • 8/9/2019 Normalization_is_the_process_of_efficiently_organizing_data_in_a_database


    DATABASE & DATABASE DESIGN

    What is Normalization?

    Normalization is the process of efficiently organizing data in a
    database. The normalization process has two goals: eliminating
    redundant data, and ensuring that the database design does not
    suffer from update, delete, and insert anomalies. Both are worthy
    goals, as they reduce the amount of space a database consumes and
    ensure that data is stored logically.

    Example of Anomalies:

    Consider the following database design:

    Student

    Enroll No   Name      Section   Mailing Address               Club Membership
    06BS1256    Shipra    A         CC-89, Xmas Street, Gurgaon   Finance
    06BS1909    Krishna   K         A-4555, Christ Rd, Chennai    IT
    06BS1256    Shipra    A         CC-89, Xmas Street, Gurgaon   Marketing
    06BS1909    Krishna   K         A-4555, Christ Rd, Chennai    HR
    06BS1890    Gokul     J         ABC, Saket, N Delhi           Finance

    In the Student table the primary key is the combination of Enroll No
    and Club Membership. A student may join several clubs, and as the
    table shows, every additional membership repeats the student's
    details such as name, section, and mailing address (see rows 1 and
    3). This causes not only redundancy but also a risk of data
    inconsistency, because any change has to be made in multiple places.
    If Shipra's section changes from Section A to Section C, the change
    has to be made in both row 1 and row 3. If row 3 is not updated, the
    data becomes inconsistent. This is an Update Anomaly.
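
    The update problem can be reproduced in a few lines. The sketch
    below is only an illustration, assuming an in-memory SQLite database
    and hypothetical column names (enroll_no, name, section, club); the
    mailing address is left out for brevity.

```python
import sqlite3

# Hypothetical schema mirroring the Student table; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE student (
           enroll_no TEXT NOT NULL,
           name      TEXT,
           section   TEXT,
           club      TEXT NOT NULL,
           PRIMARY KEY (enroll_no, club)
       )"""
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        ("06BS1256", "Shipra", "A", "Finance"),
        ("06BS1909", "Krishna", "K", "IT"),
        ("06BS1256", "Shipra", "A", "Marketing"),
    ],
)

# A careless update that touches only one of Shipra's two rows.
conn.execute(
    "UPDATE student SET section = 'C' "
    "WHERE enroll_no = '06BS1256' AND club = 'Finance'"
)

# Collect every section now on record for the same student.
sections = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT section FROM student WHERE enroll_no = '06BS1256'"
    )
)
print(sections)
```

    After the partial update the same enrollment number is associated
    with two different sections, which is exactly the inconsistency
    described above.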

    Insert Anomaly:

    Consider a new student who joins IBS Gurgaon in Section D but does
    not have any club memberships. Since Club Membership is part of the
    primary key, and a primary key column cannot be left blank or hold a
    NULL value, we cannot insert the details of the new student until he
    becomes a member of at least one club. This is referred to as an
    Insert Anomaly.
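
    A minimal sketch of the insert problem, again assuming SQLite and
    hypothetical column names. The enrollment number and name of the new
    student are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is declared explicitly: SQLite does not enforce it automatically
# for primary key columns of an ordinary (rowid) table.
conn.execute(
    """CREATE TABLE student (
           enroll_no TEXT NOT NULL,
           name      TEXT,
           section   TEXT,
           club      TEXT NOT NULL,
           PRIMARY KEY (enroll_no, club)
       )"""
)

try:
    # Hypothetical new student in Section D with no club membership yet.
    conn.execute(
        "INSERT INTO student VALUES (?, ?, ?, ?)",
        ("06BS2001", "New Student", "D", None),
    )
    inserted = True
except sqlite3.IntegrityError as exc:
    inserted = False
    print("insert rejected:", exc)

# The table is still empty because the insert was refused.
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

    The database refuses the row, so the student's section cannot be
    recorded until a club is chosen.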

    Delete Anomaly:

    Consider the case where Gokul (row 5 of the table) is no longer a
    member of the Finance club. We have to delete Gokul's record,
    because Club Membership is part of the primary key and cannot be
    NULL or blank. If we delete Gokul's record, we lose all information
    about Gokul's section, mailing address, and so on. This is a Delete
    Anomaly.
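
    The same hypothetical SQLite schema can illustrate the delete
    problem. This is a sketch under the same assumptions as before, with
    the mailing address omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE student (
           enroll_no TEXT NOT NULL,
           name      TEXT,
           section   TEXT,
           club      TEXT NOT NULL,
           PRIMARY KEY (enroll_no, club)
       )"""
)
conn.execute(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    ("06BS1890", "Gokul", "J", "Finance"),
)

# Gokul leaves the Finance club; the only way to record that is to
# delete his single row.
conn.execute(
    "DELETE FROM student WHERE enroll_no = '06BS1890' AND club = 'Finance'"
)

# Looking up Gokul's details now finds nothing at all.
gokul = conn.execute(
    "SELECT name, section FROM student WHERE enroll_no = '06BS1890'"
).fetchone()
print(gokul)
```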

    To prevent problems like these, the database community has developed
    a series of guidelines for ensuring that databases are normalized.
    These are referred to as normal forms and are numbered from one (the
    lowest form of normalization, referred to as first normal form or
    1NF) through five (fifth normal form or 5NF). In practical
    applications, you'll often see 1NF, 2NF, and 3NF, along with the
    occasional 4NF. Fifth normal form is very rarely seen.

    It's important to point out that they are guidelines and guidelines
    only. Occasionally, it becomes necessary to stray from them to meet
    practical business requirements. However, when deviations take
    place, it's extremely important to evaluate any possible
    ramifications they could have on the system and to account for
    possible inconsistencies.

    First Normal Form (1NF)

    First normal form (1NF) sets the very basic rules for an organized
    database: remove all multivalued attributes. No comma-separated
    values are allowed in a single field of the database.

    For example:

    Customer

    Customer ID   First Name   Surname     Telephone Number
    123           Robert       Ingram      555-861-2025
    456           Jane         Wright      555-403-1659, 555-776-4100
    789           Maria        Fernandez   555-808-9633

    This table is not in 1NF, since the Telephone Number field holds
    multiple values in one cell (see the row for customer 456).

    Remedies:

    One remedy is to have a separate column for each occurrence of the
    repeating field Telephone Number:

    Customer ID   First Name   Surname     Tel. No. 1     Tel. No. 2     Tel. No. 3
    123           Robert       Ingram      555-861-2025
    456           Jane         Wright      555-403-1659   555-776-4100
    789           Maria        Fernandez   555-808-9633

    This remedy comes with its own problems, since the maximum number of
    telephone numbers a customer may have is difficult to ascertain. In
    this table, most of the fields for Tel. No. 2 and Tel. No. 3 are
    blank, which wastes space.

    A second remedy is to have a separate row for each telephone number.
    This also contributes to redundancy, since every attribute other
    than Telephone Number is repeated in each row.

    The best remedy is to divide this table into two. This solves both
    the redundancy problem and the multivalued attribute problem.

    Customer

    Customer ID   First Name   Surname
    123           Robert       Ingram
    456           Jane         Wright
    789           Maria        Fernandez

    Customer Telephone Number

    Customer ID   Telephone Number
    123           555-861-2025
    456           555-403-1659
    456           555-776-4100
    789           555-808-9633
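
    The split into two tables can be sketched in plain Python. The
    function name split_customers and the tuple layout are assumptions
    made for this illustration; only the data comes from the tables
    above.

```python
# Rows as they appear in the original, un-normalized Customer table.
raw_customers = [
    (123, "Robert", "Ingram", "555-861-2025"),
    (456, "Jane", "Wright", "555-403-1659,555-776-4100"),
    (789, "Maria", "Fernandez", "555-808-9633"),
]


def split_customers(rows):
    """Return (customers, phones) with one telephone number per phone row."""
    customers, phones = [], []
    for customer_id, first_name, surname, numbers in rows:
        customers.append((customer_id, first_name, surname))
        for number in numbers.split(","):
            number = number.strip()
            if number:  # skip empty fragments such as a trailing comma
                phones.append((customer_id, number))
    return customers, phones


customers, phones = split_customers(raw_customers)
for row in phones:
    print(row)
```

    Each customer appears once in customers, and a customer with several
    numbers simply gets several rows in phones, so no column limit has
    to be guessed in advance.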

    Second Normal Form (2NF)

    Second normal form (2NF) further addresses the concept of removing
    duplicative data:

    It is applicable to tables which have a composite primary key.

    2NF states that all non-key attributes should be fully functionally
    dependent on the key attributes.

    Applied to the Student table, 2NF requires that all non-key
    attributes (Section, Mailing Address, Name) be fully functionally
    dependent on the key attributes (Enroll No, Club Membership). That
    means no non-key attribute should get its value from only a part of
    the primary key. In the Student table all three non-key attributes
    depend on Enroll No alone and not on Club Membership. For the table
    to be in 2NF, they would all have to derive their values from the
    combination of both Enroll No and Club Membership.

    To bring this table into 2NF, we divide it further by taking out the
    non-key attributes that have a partial functional dependency on part
    of the primary key. For example, Name (a non-key attribute) and
    Enroll No (the part of the primary key it depends on) will be placed
    in another table.
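
    One possible way to carry out this division is sketched below in
    plain Python. The function name to_2nf and the data layout are
    assumptions for the illustration, and the mailing address is again
    omitted for brevity.

```python
# (enroll_no, name, section, club) rows taken from the Student table.
student_rows = [
    ("06BS1256", "Shipra", "A", "Finance"),
    ("06BS1909", "Krishna", "K", "IT"),
    ("06BS1256", "Shipra", "A", "Marketing"),
    ("06BS1909", "Krishna", "K", "HR"),
    ("06BS1890", "Gokul", "J", "Finance"),
]


def to_2nf(rows):
    """Split rows into student details and club memberships."""
    students = {}        # enroll_no -> (name, section)
    memberships = set()  # (enroll_no, club) pairs
    for enroll_no, name, section, club in rows:
        details = (name, section)
        # setdefault stores the details on first sight and returns the
        # stored value afterwards, so a mismatch reveals inconsistent input.
        if students.setdefault(enroll_no, details) != details:
            raise ValueError(f"conflicting details for {enroll_no}")
        memberships.add((enroll_no, club))
    return students, sorted(memberships)


students, memberships = to_2nf(student_rows)
print(students)
print(memberships)
```

    Each student's name and section are now stored once, keyed by
    Enroll No alone, while the membership table keeps only the
    combination of Enroll No and Club Membership.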

    Third Normal Form (3NF)

    Third normal form (3NF) goes one large step further:

    Meet all the requirements of the second normal form.

    Remove columns that are not directly dependent upon the primary key.
    There should be no transitive dependency.

    For example:

    Employee

    Employee ID   Dept ID   Dept Name

    In this table Employee ID is the primary key. Employee ID determines
    Dept ID (the department for which the employee works), and Dept ID
    determines Dept Name. Thus the table has a transitive dependency:
    Employee ID determines Dept ID, and Dept ID determines Dept Name.
    3NF states that all non-key attributes should depend directly on the
    primary key.

    To bring this table into 3NF, we move Dept Name into another table
    keyed by Dept ID. Dept ID stays in the Employee table so that each
    employee still refers to a department.
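
    A small sketch of this decomposition in plain Python follows. The
    source gives only the column names, so the sample rows, the function
    to_3nf, and the helper dept_name_of are all hypothetical.

```python
# Hypothetical (employee_id, dept_id, dept_name) rows for illustration.
employee_rows = [
    (1, "D10", "Finance"),
    (2, "D20", "IT"),
    (3, "D10", "Finance"),
]


def to_3nf(rows):
    """Split rows into an employee table and a department table."""
    employees = {}    # employee_id -> dept_id (kept as a reference)
    departments = {}  # dept_id -> dept_name
    for employee_id, dept_id, dept_name in rows:
        employees[employee_id] = dept_id
        # A department must always carry the same name.
        if departments.setdefault(dept_id, dept_name) != dept_name:
            raise ValueError(f"conflicting names for department {dept_id}")
    return employees, departments


employees, departments = to_3nf(employee_rows)


def dept_name_of(employee_id):
    """Look up a department name through the employee's dept_id."""
    return departments[employees[employee_id]]


print(dept_name_of(3))
```

    Renaming a department now means changing one entry in departments,
    rather than every employee row that mentions it.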