
Normalization Notes by Mahendra Patil

7/28/2019 Normalization Notes by Mahendra Patil

http://slidepdf.com/reader/full/normalization-notes-by-mahendra-patil 1/18

Normalization in DBMS, notes prepared by Mahendra Patil

Normalization:

A technique for producing a set of tables with desirable properties that support the requirements of a user or company.

A process of decomposing relations with anomalies to produce smaller, well-structured relations.

Normalisation is a process for deciding which attributes should be grouped together in a relation.

It is used to validate and improve a logical design so that it satisfies certain constraints and avoids unnecessary duplication of data.

Objectives of Normalization:

The basic objectives of normalization are:

1) To reduce redundancy, which means that information is stored only once.
2) To reduce the file storage space required by base tables.
3) To reduce the inconsistency caused by redundancy.
4) To make it feasible to represent any relation in the database.
5) To free relations from undesirable insertion, update, and deletion anomalies.

Properties of Normalized Relations:

a. No data value should be duplicated in different rows unnecessarily.

b. A value must be specified (and required) for every attribute in a row.

c. Each relation should be self-contained. In other words, if a row is deleted from a relation, important information should not be accidentally lost.

d. When a row is added to a relation, other relations in the database should not be affected.

e. A value of an attribute in a tuple may be changed independently of other tuples in the relation and of other relations.

Data redundancy and update anomalies:

Problems associated with data redundancy are illustrated by comparing the Staff and Branch tables with the StaffBranch table.

Fig: StaffBranch Table

The StaffBranch table has redundant data: the details of a branch are repeated for every member of staff.

In contrast, the branch information appears only once for each branch in the Branch table, and only the branch number (branchNo) is repeated in the Staff table, to represent where each member of staff is located.

Tables that contain redundant information may potentially suffer from update anomalies.

Types of update anomalies include:

1) insertion
2) deletion
3) modification (update)

1) Insert Anomalies: To insert details for a new member of staff into StaffBranch, you must also insert branch details that are consistent with the existing details for the same branch. This makes it hard to maintain data consistency with StaffBranch.

2) Delete Anomalies: If you delete the details for a member of staff from StaffBranch, you also lose the branch details in that row (tuple). If that was the last member of staff at the branch, the branch details disappear from the database altogether.

3) Update Anomalies: If you update the value of one of the attributes of a branch, you must also update that information in every row about the same branch.
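The deletion and update anomalies can be reproduced on a handful of rows. The sketch below is a minimal Python illustration on hypothetical StaffBranch data (the staff numbers, names, addresses, and the column names `name` and `bAddress` are assumptions, not the original figure); it contrasts the single flat table with the decomposed Staff/Branch design.

```python
import copy

# Hypothetical StaffBranch rows (assumed data, not the original figure):
# the branch address is repeated for every member of staff at that branch.
staff_branch = [
    {"staffNo": "S1", "name": "Ann", "branchNo": "B1", "bAddress": "1 Main St"},
    {"staffNo": "S2", "name": "Bob", "branchNo": "B1", "bAddress": "1 Main St"},
    {"staffNo": "S3", "name": "Cal", "branchNo": "B2", "bAddress": "9 High St"},
]


def branch_addresses(rows):
    """Return {branchNo: set of addresses recorded for that branch}."""
    result = {}
    for row in rows:
        result.setdefault(row["branchNo"], set()).add(row["bAddress"])
    return result


# Update anomaly: changing B1's address in only one row leaves two
# conflicting addresses recorded for the same branch.
updated = copy.deepcopy(staff_branch)
updated[0]["bAddress"] = "5 New Rd"
print(sorted(branch_addresses(updated)["B1"]))  # ['1 Main St', '5 New Rd']

# Deletion anomaly: deleting the only member of staff at B2 also deletes
# everything recorded about branch B2.
deleted = [row for row in staff_branch if row["staffNo"] != "S3"]
print("B2" in branch_addresses(deleted))  # False

# Decomposed design: each address is stored once in `branch`, so a single
# update is enough and staff rows can be deleted without losing branches.
branch = {row["branchNo"]: row["bAddress"] for row in staff_branch}
staff = [{k: row[k] for k in ("staffNo", "name", "branchNo")} for row in staff_branch]
branch["B1"] = "5 New Rd"
staff = [s for s in staff if s["staffNo"] != "S3"]
print(branch)  # {'B1': '5 New Rd', 'B2': '9 High St'}
```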

Decomposition of Relations:

Two important properties of decomposition:

Lossless-join property: enables us to find any instance of the original relation from the corresponding instances in the smaller relations.

Dependency preservation property: enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations.

The Staff and Branch relations, which are obtained by decomposing StaffBranch, do not suffer from these anomalies.
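A rough way to test the lossless-join property on a concrete instance is to project the relation, join the projections back, and compare with the original. The helpers below (`project`, `natural_join`, `is_lossless_on`) are hypothetical names for this sketch, and the StaffBranch rows are assumed data.

```python
# Minimal relational helpers over lists of dicts (a sketch, not a DBMS API).
def project(rows, attrs):
    """Projection: keep only attrs and remove duplicate rows."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r, s):
    """Natural join: combine rows that agree on every shared attribute."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**x, **y} for x in r for y in s if all(x[a] == y[a] for a in common)]


def as_set(rows):
    """Order-insensitive view of a relation, for comparison."""
    return {frozenset(row.items()) for row in rows}


def is_lossless_on(rows, attr_sets):
    """True if joining the projections on attr_sets gives back exactly rows."""
    joined = project(rows, attr_sets[0])
    for attrs in attr_sets[1:]:
        joined = natural_join(joined, project(rows, attrs))
    return as_set(joined) == as_set(rows)


# Hypothetical StaffBranch rows (assumed data).
staff_branch = [
    {"staffNo": "S1", "name": "Ann", "branchNo": "B1", "bAddress": "1 Main St"},
    {"staffNo": "S2", "name": "Bob", "branchNo": "B1", "bAddress": "1 Main St"},
    {"staffNo": "S3", "name": "Cal", "branchNo": "B2", "bAddress": "9 High St"},
]

# Staff(staffNo, name, branchNo) + Branch(branchNo, bAddress): lossless.
print(is_lossless_on(staff_branch, [["staffNo", "name", "branchNo"], ["branchNo", "bAddress"]]))  # True
# Splitting the staff name away from staffNo: the join invents rows.
print(is_lossless_on(staff_branch, [["staffNo", "branchNo"], ["branchNo", "name", "bAddress"]]))  # False
```

An instance-level check like this can only expose a lossy decomposition; it cannot prove losslessness for every legal instance. The standard sufficient condition is that the common attributes form a key of one of the smaller relations, as branchNo does for Branch.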

Steps in Normalisation:

First normal form: any multivalued attributes (repeating groups) have been removed.
Second normal form: any partial functional dependencies have been removed.
Third normal form: any transitive dependencies have been removed.
Boyce/Codd normal form: any remaining anomalies that result from functional dependencies have been removed.
Fourth normal form: any multivalued dependencies have been removed.
Fifth normal form: any remaining anomalies have been removed.

In practice, normalization usually stops at third normal form.

The following figure shows the process:

Relationship of Normal Forms:

The Process of Normalization:

Given a relation, use the following cycle:

1. Find out which normal form it is in.
2. Transform the relation to the next higher form by decomposing it into simpler relations.
3. Refine the relations further if the decomposition resulted in undesirable properties.

First normal form (1NF):

A relation is in 1NF if and only if all underlying domains contain atomic values only.

Or

A table in which the intersection of every column and record contains only one value.

Steps from UNF to 1NF:

1. Nominate an attribute or group of attributes to act as the key for the unnormalized table.
2. Identify the repeating group(s) in the unnormalized table, which repeat for the key attribute(s).

Fig: Branch table is not in 1NF
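The notes list the first two steps; one common way to finish (an assumption here, since the remaining step is not spelled out above) is to move the repeating group into a separate relation together with a copy of the nominated key. A minimal Python sketch on a hypothetical Branch table with several telephone numbers per branch (the `telNos` column, the function name, and all values are invented for illustration):

```python
# Hypothetical unnormalized Branch table (assumed data): telNos holds a
# repeating group, i.e. several values in one cell, so it is not in 1NF.
branch_unf = [
    {"branchNo": "B1", "address": "1 Main St", "telNos": ["0101", "0102"]},
    {"branchNo": "B2", "address": "9 High St", "telNos": ["0201"]},
]


def remove_repeating_group(rows, key, group, value_name):
    """Split off a repeating group: the main table keeps one row per key,
    and a new table holds one atomic value per row plus a copy of the key."""
    main = [{a: v for a, v in row.items() if a != group} for row in rows]
    detail = [{key: row[key], value_name: v} for row in rows for v in row[group]]
    return main, detail


branch, branch_tel = remove_repeating_group(branch_unf, "branchNo", "telNos", "telNo")
print(branch)      # [{'branchNo': 'B1', 'address': '1 Main St'}, {'branchNo': 'B2', 'address': '9 High St'}]
print(branch_tel)  # [{'branchNo': 'B1', 'telNo': '0101'}, {'branchNo': 'B1', 'telNo': '0102'}, {'branchNo': 'B2', 'telNo': '0201'}]
```

An alternative is to flatten the table by repeating the other columns for each telephone number, in which case the key becomes (branchNo, telNo).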

Second normal form (2NF):

A relation is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key of the relation.

2NF is relevant only to tables with composite primary keys: a 1NF table whose primary key is a single attribute is automatically in 2NF.

Functional Dependency:

A functional dependency describes a relationship between attributes in a relation (columns in a table).

If A and B are columns of table R, B is functionally dependent on A if each value of A in R is associated with exactly one value of B in R. This is written A -> B. We are interested in finding such functional dependencies among database relations.
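The definition translates directly into a check over the rows of a table. This Python sketch (the `fd_holds` function and the Staff rows are hypothetical) tests whether each LHS value is associated with exactly one RHS value. Note that a functional dependency is a constraint on all legal states of a table, so one instance can refute an FD but never prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Check lhs -> rhs on one table instance: each LHS value must map to
    exactly one RHS value. A single instance can refute an FD, not prove it."""
    seen = {}
    for row in rows:
        a = tuple(row[x] for x in lhs)
        b = tuple(row[x] for x in rhs)
        if seen.setdefault(a, b) != b:   # same A-value, different B-value
            return False
    return True


# Hypothetical Staff rows (assumed data).
staff = [
    {"staffNo": "S1", "position": "Manager", "branchNo": "B1"},
    {"staffNo": "S2", "position": "Assistant", "branchNo": "B1"},
    {"staffNo": "S3", "position": "Assistant", "branchNo": "B2"},
]
print(fd_holds(staff, ["staffNo"], ["position"]))  # True
print(fd_holds(staff, ["position"], ["staffNo"]))  # False: 'Assistant' -> S2 and S3
```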

• The determinant of a functional dependency is the attribute or group of attributes on the left-hand side of the arrow.

• B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A; in other words, the determinant maintains the dependency with a minimum number of attributes.
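Building on the same instance-level test, full functional dependency can be checked by also trying every proper subset of the determinant. A sketch with hypothetical rows (assumed data) in which `hours` depends on the whole (staffNo, branchNo) pair while `name` depends on staffNo alone:

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Instance-level check of lhs -> rhs (can refute, not prove, an FD)."""
    seen = {}
    for row in rows:
        a = tuple(row[x] for x in lhs)
        b = tuple(row[x] for x in rhs)
        if seen.setdefault(a, b) != b:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """lhs -> rhs is full if it holds and no proper subset of lhs
    (including the empty set) already determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(len(lhs)):            # sizes 0 .. len(lhs) - 1
        for subset in combinations(lhs, size):
            if fd_holds(rows, list(subset), rhs):
                return False                # a smaller determinant suffices
    return True


# Hypothetical allocation rows (assumed data): weekly hours per staff/branch.
rows = [
    {"staffNo": "S1", "branchNo": "B1", "hours": 16, "name": "Ann"},
    {"staffNo": "S1", "branchNo": "B2", "hours": 24, "name": "Ann"},
    {"staffNo": "S2", "branchNo": "B1", "hours": 10, "name": "Bob"},
]
print(is_full_fd(rows, ["staffNo", "branchNo"], ["hours"]))  # True
print(is_full_fd(rows, ["staffNo", "branchNo"], ["name"]))   # False: staffNo -> name
```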

1NF to 2NF:

Steps:

1. Identify the primary key for the 1NF relation.
2. Identify the functional dependencies in the relation.
3. If partial dependencies on the primary key exist, remove them by placing them in a new relation along with a copy of their determinant.

For example:

Fig: TempStaffAllocation table is not in 2NF
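The three 1NF-to-2NF steps can be sketched at the schema level with attribute closure: for each proper part of the composite key, collect the non-key attributes it determines and move them out with a copy of that part. The attribute list and FDs below for TempStaffAllocation are assumptions standing in for the missing figure, and `closure` and `split_partial_dependencies` are illustrative names.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def split_partial_dependencies(attrs, key, fds):
    """1NF -> 2NF: move attributes that depend on part of the key into a new
    relation together with a copy of that part (their determinant)."""
    non_key = [a for a in attrs if a not in key]
    moved, new_relations = set(), []
    for size in range(1, len(key)):          # proper, non-empty parts of the key
        for part in combinations(key, size):
            determined = closure(part, fds)
            dependents = [a for a in non_key if a in determined and a not in moved]
            if dependents:
                new_relations.append(list(part) + dependents)
                moved.update(dependents)
    remaining = list(key) + [a for a in non_key if a not in moved]
    return [remaining] + new_relations


# Assumed schema and FDs for TempStaffAllocation (the figure is not reproduced).
attrs = ["staffNo", "branchNo", "branchAddress", "name", "position", "hoursPerWeek"]
key = ["staffNo", "branchNo"]
fds = [
    (["staffNo", "branchNo"], ["hoursPerWeek"]),
    (["staffNo"], ["name", "position"]),
    (["branchNo"], ["branchAddress"]),
]
for relation in split_partial_dependencies(attrs, key, fds):
    print(relation)
# ['staffNo', 'branchNo', 'hoursPerWeek']
# ['staffNo', 'name', 'position']
# ['branchNo', 'branchAddress']
```

Each attribute is moved to the smallest part of the key that determines it; any transitive dependencies left inside the new relations are dealt with in the 3NF step.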

Third normal form (3NF):

A relation R is in third normal form if it is in 2NF and every non-key attribute of R is non-transitively dependent on the primary key of R.

For example, consider a table with attributes A, B, and C. If B is functionally dependent on A (A -> B) and C is functionally dependent on B (B -> C), then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

If a transitive dependency on the primary key exists, the table is not in 3NF.

2NF to 3NF:

Steps:

1. Identify the primary key in the 2NF relation.
2. Identify the functional dependencies in the relation.
3. If transitive dependencies on the primary key exist, remove them by placing them in a new relation along with a copy of their determinant.

For example:

Fig: StaffBranch table is not in 3NF

Fig: Converting the StaffBranch table to 3NF
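The 2NF-to-3NF step can be sketched the same way: a determinant that is not a superkey signals a transitive dependency, and its dependents move to a new relation with a copy of the determinant. The StaffBranch attributes and FDs below are assumptions in place of the figure, the function name is illustrative, and the sketch assumes its input is already in 2NF.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def split_transitive_dependencies(attrs, key, fds):
    """2NF -> 3NF sketch: for each FD whose determinant is not a superkey,
    move its non-key dependents into a new relation with a copy of the
    determinant. Assumes the input relation is already in 2NF."""
    moved, new_relations = set(), []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(attrs):   # determinant is a superkey
            continue
        dependents = [a for a in rhs if a not in key and a not in lhs and a not in moved]
        if dependents:
            new_relations.append(list(lhs) + dependents)
            moved.update(dependents)
    remaining = [a for a in attrs if a not in moved]
    return [remaining] + new_relations


# Assumed StaffBranch schema and FDs (the figure is not reproduced).
attrs = ["staffNo", "name", "position", "salary", "branchNo", "bAddress"]
key = ["staffNo"]
fds = [
    (["staffNo"], ["name", "position", "salary", "branchNo"]),
    (["branchNo"], ["bAddress"]),   # staffNo -> branchNo -> bAddress
]
for relation in split_transitive_dependencies(attrs, key, fds):
    print(relation)
# ['staffNo', 'name', 'position', 'salary', 'branchNo']
# ['branchNo', 'bAddress']
```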

Boyce/Codd Normal Form (BCNF):

A relation is in BCNF ⇔ every determinant is a candidate key.

A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent.

For example, consider a relation SJT (Student-Subject-Teacher relation):

S       J        T
Smith   Math     Prof. White
Smith   Physics  Prof. Green
Jones   Math     Prof. White
Jones   Physics  Prof. Brown

1. For each subject (J), each student (S) of that subject is taught by only one teacher (T):

FD: S, J -> T

2. Each teacher (T) teaches only one subject (J):

FD: T -> J

3. Each subject (J) is taught by several teachers, so J does not determine T (J -> T does not hold).

There exists a relation SJT with attributes S (student), J (subject) and T (teacher). The meaning of an SJT tuple is that the specified student is taught the specified subject by the specified teacher. The functional dependencies have two determinants, (S, J) and T; the candidate keys of SJT are (S, J) and (S, T).

Anomalies in update: If the fact that Jones studies physics is deleted, the fact that Professor Brown teaches physics is also lost. This happens because T is a determinant but not a candidate key.
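The claim that T is a determinant but not a candidate key can be checked mechanically. This sketch computes attribute closures under the two FDs of SJT, derives its candidate keys, and reports the FDs whose determinant is not a superkey; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs; fds is a list of (set, set)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            c = set(combo)
            if closure(c, fds) >= attrs and not any(k <= c for k in keys):
                keys.append(c)
    return keys


def bcnf_violations(attrs, fds):
    """Non-trivial FDs whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= attrs]


ATTRS = {"S", "J", "T"}
FDS = [({"S", "J"}, {"T"}), ({"T"}, {"J"})]
print([sorted(k) for k in candidate_keys(ATTRS, FDS)])  # [['J', 'S'], ['S', 'T']]
print([(sorted(l), sorted(r)) for l, r in bcnf_violations(ATTRS, FDS)])  # [(['T'], ['J'])]
```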

Fig: relation ST

Fig: relation TJ

Relations ST (S, T) and TJ (T, J) are in BCNF because all their determinants are candidate keys.
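Whether a decomposition of SJT is lossless can be tested on its four rows by projecting and joining back. In this sketch (helper names are illustrative), projecting onto (S, T) and (T, J) recovers SJT exactly, whereas projecting onto (S, J) and (T, J) produces spurious rows.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicates removed."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r, s):
    """Combine rows that agree on every shared attribute."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**x, **y} for x in r for y in s if all(x[a] == y[a] for a in common)]


def as_set(rows):
    return {frozenset(row.items()) for row in rows}


# The SJT rows from the table above.
sjt = [
    {"S": "Smith", "J": "Math", "T": "Prof. White"},
    {"S": "Smith", "J": "Physics", "T": "Prof. Green"},
    {"S": "Jones", "J": "Math", "T": "Prof. White"},
    {"S": "Jones", "J": "Physics", "T": "Prof. Brown"},
]

st_tj = natural_join(project(sjt, ["S", "T"]), project(sjt, ["T", "J"]))
sj_tj = natural_join(project(sjt, ["S", "J"]), project(sjt, ["T", "J"]))
print(as_set(st_tj) == as_set(sjt), len(st_tj))  # True 4
print(as_set(sj_tj) == as_set(sjt), len(sj_tj))  # False 6
```

(S, T) and (T, J) join losslessly because T -> J makes T a key of the (T, J) relation. The price is that S, J -> T can no longer be enforced inside a single relation.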

BCNF vs 3NF:

It should be noted that most relations that are in 3NF are also in BCNF. Infrequently, a 3NF relation is not in BCNF, and this happens only if

(a) the candidate keys in the relation are composite keys (that is, they are not single attributes),
(b) there is more than one candidate key in the relation, and
(c) the keys are not disjoint, that is, some attributes in the keys are common.

BCNF differs from 3NF only when there is more than one candidate key and the keys are composite and overlapping.

• BCNF: For every functional dependency X -> Y in a set F of functional dependencies over relation R, either:
  – Y is a subset of X, or
  – X is a superkey of R.

• 3NF: For every functional dependency X -> Y in a set F of functional dependencies over relation R, either:
  – Y is a subset of X, or
  – X is a superkey of R, or
  – every attribute of Y that is not in X is part of some candidate key of R.
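The two formal tests can be written almost word for word in terms of attribute closure. A minimal Python sketch (the function names are illustrative), applied to the Account/Client/Office schema of the example that follows, using its two FDs as given:

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs; fds is a list of (set, set)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            c = set(combo)
            if closure(c, fds) >= attrs and not any(k <= c for k in keys):
                keys.append(c)
    return keys


def is_bcnf(attrs, fds):
    """Every X -> Y is trivial (Y within X) or X is a superkey."""
    return all(rhs <= lhs or closure(lhs, fds) >= attrs for lhs, rhs in fds)


def is_3nf(attrs, fds):
    """As BCNF, but also accept X -> Y when every attribute of Y - X is
    prime (belongs to some candidate key)."""
    prime = set().union(*candidate_keys(attrs, fds))
    return all(
        rhs <= lhs or closure(lhs, fds) >= attrs or (rhs - lhs) <= prime
        for lhs, rhs in fds
    )


ATTRS = {"Account", "Client", "Office"}
FDS = [
    ({"Client", "Office"}, {"Client", "Office", "Account"}),
    ({"Account"}, {"Office"}),
]
print(is_3nf(ATTRS, FDS), is_bcnf(ATTRS, FDS))  # True False
```

Checking only the given FDs, rather than their full closure, is sufficient when testing the whole relation R, as here; testing a relation obtained by decomposition needs the projected dependencies instead.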

For example, consider a 3NF schema that is not in BCNF:

Client, Office -> Client, Office, Account

Account -> Office

Account  Client  Office
A        Joe     1
B        Mary    1
A        John    1
C        Joe     2

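3NF has some redundancy (office 1 is recorded twice for account A); BCNF does not.

Unfortunately, a BCNF decomposition is not always dependency preserving, whereas a lossless, dependency-preserving decomposition into 3NF always exists. In this example, splitting the relation into (Account, Office) and (Account, Client) loses the dependency Client, Office -> Account, which can no longer be checked within a single relation.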

Account  Office
A        1
B        1
C        2

Account  Client
A        Joe
B        Mary
A        John
C        Joe

(No non-trivial FDs hold in the (Account, Client) relation.)
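The lost dependency shows up as soon as the decomposed relations are updated independently. In this sketch, a hypothetical new account D at office 1 for Joe is accepted by both smaller relations, yet their join violates Client, Office -> Account. The function names are illustrative.

```python
# Decomposed relations from the example, as (Account, Office) and
# (Account, Client) pairs.
ACCOUNT_OFFICE = {("A", 1), ("B", 1), ("C", 2)}
ACCOUNT_CLIENT = {("A", "Joe"), ("B", "Mary"), ("A", "John"), ("C", "Joe")}


def join(account_office, account_client):
    """Natural join on Account; Account -> Office makes the lookup unique."""
    office_of = dict(account_office)
    return {(acct, client, office_of[acct]) for acct, client in account_client}


def violates_client_office_fd(rows):
    """True if some (Client, Office) pair maps to more than one Account."""
    seen = {}
    for acct, client, office in rows:
        if seen.setdefault((client, office), acct) != acct:
            return True
    return False


print(violates_client_office_fd(join(ACCOUNT_OFFICE, ACCOUNT_CLIENT)))  # False

# Hypothetical new account D at office 1 for Joe. Each relation accepts it on
# its own (Account is still a key of the first, and the second has no
# non-trivial FDs), yet the join now breaks Client, Office -> Account.
new_office = ACCOUNT_OFFICE | {("D", 1)}
new_client = ACCOUNT_CLIENT | {("D", "Joe")}
print(violates_client_office_fd(join(new_office, new_client)))  # True
```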

Multi-valued Dependency:

Given a relation R with attributes A, B and C, the multi-valued dependency R.A →→ R.B holds ⇔ the set of B-values matching a given (A-value, C-value) pair in R depends only on the A-value and is independent of the C-value.
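The definition can be checked on an instance by grouping rows on A and testing that the B-values and C-values present for each A-value occur in every combination. A Python sketch (the `mvd_holds` name is illustrative), exercised on the CTX rows from the 4NF example that follows:

```python
from collections import defaultdict


def mvd_holds(rows, a, b):
    """Check A ->-> B on one instance, where C is all remaining attributes:
    for each A-value, every combination of its B-values and C-values
    must be present."""
    if not rows:
        return True
    c = [x for x in rows[0] if x not in a and x not in b]
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[x] for x in a)
        groups[key].add((tuple(row[x] for x in b), tuple(row[x] for x in c)))
    for pairs in groups.values():
        b_values = {p[0] for p in pairs}
        c_values = {p[1] for p in pairs}
        # pairs is always a subset of b_values x c_values, so equal sizes
        # means every combination appears.
        if len(pairs) != len(b_values) * len(c_values):
            return False
    return True


CTX_ROWS = [
    ("Physics", "Prof. Green", "Basic Mechanics"),
    ("Physics", "Prof. Green", "Principles of Optics"),
    ("Physics", "Prof. Brown", "Basic Mechanics"),
    ("Physics", "Prof. Brown", "Principles of Optics"),
    ("Physics", "Prof. Black", "Basic Mechanics"),
    ("Physics", "Prof. Black", "Principles of Optics"),
    ("Math", "Prof. White", "Modern Algebra"),
    ("Math", "Prof. White", "Projective Geometry"),
]
ctx = [dict(zip(("Course", "Teacher", "Text"), t)) for t in CTX_ROWS]
print(mvd_holds(ctx, ["Course"], ["Teacher"]))  # True

# Dropping one combination breaks the MVD.
ctx_missing = [r for r in ctx if (r["Teacher"], r["Text"]) != ("Prof. Black", "Principles of Optics")]
print(mvd_holds(ctx_missing, ["Course"], ["Teacher"]))  # False
```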

Fourth Normal Form (4NF): A relation is in 4NF ⇔ whenever there exists a non-trivial multi-valued dependency (MVD), say A →→ B, all attributes are also functionally dependent on A, i.e. A → X for every attribute X of the relation.

For Ex: Relation CTX (not in 4NF)

Course   Teacher      Text
Physics  Prof. Green  Basic Mechanics
Physics  Prof. Green  Principles of Optics
Physics  Prof. Brown  Basic Mechanics
Physics  Prof. Brown  Principles of Optics
Physics  Prof. Black  Basic Mechanics
Physics  Prof. Black  Principles of Optics
Math     Prof. White  Modern Algebra
Math     Prof. White  Projective Geometry

A tuple (C, T, X) appears in CTX ⇔ course C can be taught by teacher T and uses X as a reference. For a given course, all possible combinations of teacher and text appear; that is, CTX satisfies the constraint: if tuples (C, T1, X1) and (C, T2, X2) both appear, then tuples (C, T1, X2) and (C, T2, X1) both appear also. CTX contains redundancy.

CTX is in BCNF, as it is all key and there are no other functional determinants.

But CTX is not in 4NF, as it involves an MVD (Course →→ Teacher) that is not an FD at all, let alone an FD in which the determinant is a candidate key.

Anomalies in insert: For example, to add the information that the physics course uses a new text called Advanced Mechanism, it is necessary to create three new tuples, one for each of the three teachers.

Fig: Relation CT

Fig: Relation CX
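A short check of the CT/CX decomposition on the CTX rows above: joining CT and CX on Course gives back exactly CTX, and recording the same new physics text needs three CTX rows but a single CX row. Sets of tuples stand in for the relations; this is a sketch, not a DBMS feature.

```python
# CTX rows as (Course, Teacher, Text) tuples, copied from the table above.
ctx = {
    ("Physics", "Prof. Green", "Basic Mechanics"),
    ("Physics", "Prof. Green", "Principles of Optics"),
    ("Physics", "Prof. Brown", "Basic Mechanics"),
    ("Physics", "Prof. Brown", "Principles of Optics"),
    ("Physics", "Prof. Black", "Basic Mechanics"),
    ("Physics", "Prof. Black", "Principles of Optics"),
    ("Math", "Prof. White", "Modern Algebra"),
    ("Math", "Prof. White", "Projective Geometry"),
}
ct = {(c, t) for c, t, _ in ctx}   # projection onto (Course, Teacher)
cx = {(c, x) for c, _, x in ctx}   # projection onto (Course, Text)

# Lossless: joining CT and CX on Course gives back exactly CTX.
rejoined = {(c, t, x) for c, t in ct for c2, x in cx if c == c2}
print(rejoined == ctx)  # True

# Insert anomaly: one new physics text needs one CTX row per physics
# teacher, but only one CX row.
new_text = "Advanced Mechanism"
ctx_rows_needed = {("Physics", t, new_text) for c, t in ct if c == "Physics"}
print(len(ctx_rows_needed))  # 3
cx.add(("Physics", new_text))
```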

4NF is an improvement over BCNF, in that it eliminates another form of undesirable structure.

Fifth Normal Form (5NF) / Projection-Join Normal Form:

Join dependency: relation R satisfies the JD (X, Y, …, Z) ⇔ R is the join of its projections on X, Y, …, Z, where X, Y, …, Z are subsets of the set of attributes of R.
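The definition suggests a direct test: project R onto each component, join the projections, and compare with R. The sketch below uses a small assumed supplier/part/project instance (not from these notes) that satisfies a three-way join dependency even though none of its pairs of binary projections joins back losslessly; `jd_holds` is an illustrative name.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicates removed."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r, s):
    """Combine rows that agree on every shared attribute."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**x, **y} for x in r for y in s if all(x[a] == y[a] for a in common)]


def as_set(rows):
    return {frozenset(row.items()) for row in rows}


def jd_holds(rows, components):
    """R satisfies the JD over these attribute subsets iff R equals the
    join of its projections onto them."""
    joined = project(rows, components[0])
    for attrs in components[1:]:
        joined = natural_join(joined, project(rows, attrs))
    return as_set(joined) == as_set(rows)


# Assumed instance: supplier S supplies part P to project J.
spj = [dict(zip("SPJ", t)) for t in [
    ("S1", "P1", "J2"), ("S1", "P2", "J1"), ("S2", "P1", "J1"), ("S1", "P1", "J1"),
]]
print(jd_holds(spj, [["S", "P"], ["P", "J"], ["J", "S"]]))  # True
print(jd_holds(spj, [["S", "P"], ["P", "J"]]))              # False
```

As with the other instance-level checks, this only shows that one state satisfies the JD; whether the JD is a real constraint, and therefore whether the relation is in 5NF, depends on the business rules.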

A relation is in 5NF/PJNF (projection-join normal form) ⇔ every join dependency in R is implied by the candidate keys of R.

5NF is the ultimate normal form with respect to projection and join.

For Ex:

Summary:

• Relations are categorized as a normal form based on which modification anomalies or other problems they are subject to: