Lecture 10

Theory of Database Systems

Lecture 10. The process of normalization I.

Normalization

• Normalization is a technique for producing a set of suitable relations that support the data requirements of an enterprise.

Suitable set of relations

• Characteristics of a suitable set of relations include:

– the minimal number of attributes necessary to support the data requirements of the enterprise;

– attributes with a close logical relationship are found in the same relation;

– minimal redundancy, with each attribute represented only once, with the important exception of attributes that form all or part of foreign keys.

Benefits of suitable set of relations

• The benefits of using a database that has a suitable set of relations are that the database will:

– be easier for the user to access and maintain;

– take up minimal storage space on the computer.

How Normalization Supports Database Design

• Normalization is a bottom-up approach to DB design that begins by examining the relationships between attributes.

• However, a top-down approach can also be used; it begins by identifying the main entities and relationships and uses normalization as a validation technique.

The Process of Normalization

• Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between the attributes of that relation.

• It is often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.

Normalization

• The four most commonly used normal forms are first (1NF), second (2NF), and third (3NF) normal forms, and Boyce–Codd normal form (BCNF).

• Normalization is based on functional dependencies among the attributes of a relation.

• A relation can be normalized to a specific form to prevent the possible occurrence of update anomalies.


The relationship between the normal forms. It shows that some 1NF relations are also in 2NF, some 2NF relations are also in 3NF, and so on.


Unnormalized Form (UNF)

• Before discussing first normal form, we first define the state that precedes it.

• Unnormalized form is a table that contains one or more repeating groups.

• To create an unnormalized table, transform the data from the information source (e.g. a form) into table format with columns and rows.

• In this format, the table is in unnormalized form (UNF).

Repeating group

• A repeating group is an attribute, or group of attributes, within a table that occurs with multiple values for a single occurrence of the nominated key attribute(s) of that table.

• Nominated key: refers to the attribute(s) that uniquely identify each row within the unnormalized table.

Example: Form

Collection of DreamHome leases. In the example, it is assumed that a client rents a given property only once and cannot rent more than one property at any one time.

UNF example

• Sample data is taken from two leases for two different clients and is transferred into table format with rows and columns.

• This is an unnormalized table.

ClientRental unnormalized table.

UNF example

• We identify the key attribute for the ClientRental unnormalized table as clientNo.

• Next we identify the repeating group in the unnormalized table:

Repeating Group = (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

• As a consequence, there are multiple values at the intersection of certain rows and columns.
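One way to picture the unnormalized table is as one row per clientNo with the repeating group nested inside it. The sketch below assumes that layout; the row values are made-up sample data (not the contents of the lecture's figure), and the names `unf_rows` and `clients_with_repeating_group` are hypothetical.

```python
# Illustrative UNF rows: (clientNo, cName, repeating group).
# Each group entry is (propertyNo, pAddress, rentStart, rentFinish, rent,
# ownerNo, oName). All values are made-up sample data.
unf_rows = [
    ("CR76", "John Kay", [
        ("PG4", "6 Lawrence St", "2012-07-01", "2013-08-31", 350, "CO40", "Tina Murphy"),
        ("PG16", "5 Novar Dr", "2013-09-01", "2014-09-01", 450, "CO93", "Tony Diamond"),
    ]),
    ("CR56", "Aline Stewart", [
        ("PG4", "6 Lawrence St", "2011-09-01", "2012-06-10", 350, "CO40", "Tina Murphy"),
        ("PG36", "2 Manor Rd", "2012-10-10", "2013-12-01", 375, "CO93", "Tony Diamond"),
        ("PG16", "5 Novar Dr", "2014-11-01", "2015-08-10", 450, "CO93", "Tony Diamond"),
    ]),
]


def clients_with_repeating_group(rows):
    """Return the key values whose row holds more than one group entry."""
    return [client_no for client_no, _c_name, group in rows if len(group) > 1]


print(clients_with_repeating_group(unf_rows))
```

Every client listed by the function has several values at the intersection of its row and the group columns, which is exactly what first normal form forbids.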

First Normal Form (1NF)

• A relation in which the intersection of each row and column contains one and only one value.

UNF to 1NF

• To transform the unnormalized table to first normal form we identify and remove repeating groups within the table.

– Nominate an attribute or group of attributes to act as the key for the unnormalized table.

– Identify the repeating group(s) in the unnormalized table that repeat for the key attribute(s).

• There are two common approaches to removing repeating groups from unnormalized tables.

Method 1

• We remove the repeating group by entering appropriate data into the empty columns of rows containing the repeating data (‘flattening’ the table). We fill in the blanks by duplicating the nonrepeating data.

• The resulting relation contains atomic values at the intersection of each row and column, and is therefore in 1NF.

• With this approach redundancy is introduced into the resulting relation.

Method 1 example

• Remove the repeating group by entering the appropriate client data into each row.

• The resulting relation ClientRental is in 1NF as there is a single value at the intersection of each row and column.
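The flattening step can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the rows are made-up sample data, the repeating group is cut down to three attributes to keep the example short, and `flatten` is a hypothetical helper name.

```python
# Illustrative UNF rows: (clientNo, cName, [repeating group entries]).
# Each entry is (propertyNo, rentStart, rent); values are made-up sample data.
unf_rows = [
    ("CR76", "John Kay", [("PG4", "2012-07-01", 350),
                          ("PG16", "2013-09-01", 450)]),
    ("CR56", "Aline Stewart", [("PG4", "2011-09-01", 350),
                               ("PG36", "2012-10-10", 375),
                               ("PG16", "2014-11-01", 450)]),
]


def flatten(rows):
    """Method 1: copy the non-repeating client data onto every group entry."""
    return [(client_no, c_name, *entry)
            for client_no, c_name, group in rows
            for entry in group]


client_rental = flatten(unf_rows)
for row in client_rental:
    print(row)
```

Each output row holds a single value per column, at the cost of repeating the client data once per rental.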

Method 1 example

• We identify the candidate keys for the ClientRental relation as being composite keys:

– (clientNo, propertyNo)

– (clientNo, rentStart)

– (propertyNo, rentStart)

• We select (clientNo, propertyNo) as the primary key.

• The relation contains data describing clients, property rented, and property owners, which is repeated several times. As a result, the relation contains significant data redundancy.
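As a sanity check, the candidate keys can be tested for uniqueness against sample rows. The sketch below uses made-up data and a hypothetical helper `is_unique_on`. Uniqueness in one sample can only rule a key out, never prove it; the keys themselves follow from the stated assumptions about how clients rent properties.

```python
# Illustrative 1NF rows restricted to the key attributes.
# Values are made-up sample data.
COLUMNS = ("clientNo", "propertyNo", "rentStart")
rows = [
    ("CR76", "PG4", "2012-07-01"),
    ("CR76", "PG16", "2013-09-01"),
    ("CR56", "PG4", "2011-09-01"),
    ("CR56", "PG36", "2012-10-10"),
    ("CR56", "PG16", "2014-11-01"),
]


def is_unique_on(rows, columns):
    """True if no two rows share the same values for the given columns."""
    idx = [COLUMNS.index(c) for c in columns]
    keys = [tuple(row[i] for i in idx) for row in rows]
    return len(keys) == len(set(keys))


for candidate in [("clientNo", "propertyNo"), ("clientNo", "rentStart"),
                  ("propertyNo", "rentStart"), ("clientNo",)]:
    print(candidate, is_unique_on(rows, candidate))
```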

Method 2

• We remove the repeating group by placing the repeating data along with a copy of the original key attribute(s) into a separate relation.

• A primary key is identified for the new relation.

• This approach produces relations in at least 1NF with less redundancy.

Method 2 example

• Using the second approach, we remove the repeating group by placing the repeating data along with a copy of the original key attribute (clientNo) into a separate table, called PropertyRentalOwner.

Method 2 example

• Then we identify a primary key for the new table (clientNo, propertyNo).

• The format of the resulting 1NF relations is as follows:

Client (clientNo, cName)

PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

• Both the Client and PropertyRentalOwner tables are in 1NF, but the PropertyRentalOwner table contains significant redundancy.
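The second approach can be sketched as a split of each unnormalized row into two lists. The data is made up, the repeating group is shortened to three attributes for brevity, and `split_repeating_group` is a hypothetical name used only for this illustration.

```python
# Illustrative UNF rows: (clientNo, cName, [repeating group entries]).
# Each entry is (propertyNo, rentStart, rent); values are made-up sample data.
unf_rows = [
    ("CR76", "John Kay", [("PG4", "2012-07-01", 350),
                          ("PG16", "2013-09-01", 450)]),
    ("CR56", "Aline Stewart", [("PG4", "2011-09-01", 350),
                               ("PG36", "2012-10-10", 375),
                               ("PG16", "2014-11-01", 450)]),
]


def split_repeating_group(rows):
    """Method 2: move the repeating data, plus a copy of the key, elsewhere."""
    client = []
    property_rental_owner = []
    for client_no, c_name, group in rows:
        client.append((client_no, c_name))
        for entry in group:
            # The copied clientNo links each entry back to its client.
            property_rental_owner.append((client_no, *entry))
    return client, property_rental_owner


client, property_rental_owner = split_repeating_group(unf_rows)
print(client)
print(property_rental_owner)
```

The client name is now stored once per client, while the property details still repeat in the second list.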

Second Normal Form (2NF)

• Second normal form is based on the concept of full functional dependency.

• Full functional dependency: if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A.

• A functional dependency A → B is a full functional dependency if removal of any attribute from A results in the dependency no longer holding.
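The definition can be checked mechanically once a set of functional dependencies is written down. The sketch below uses the standard attribute-closure computation; the two dependencies listed are assumptions chosen for illustration, and `closure` and `is_full_fd` are hypothetical helper names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """A -> B is full if it holds and no proper subset of A determines B."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # A -> B does not hold at all
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if rhs <= closure(subset, fds):
                return False  # a proper subset of A already determines B
    return True


# Assumed dependencies, for illustration only.
fds = [
    ({"clientNo", "propertyNo"}, {"rentStart", "rentFinish"}),
    ({"clientNo"}, {"cName"}),
]

print(is_full_fd({"clientNo", "propertyNo"}, {"rentStart"}, fds))
print(is_full_fd({"clientNo", "propertyNo"}, {"cName"}, fds))
```

Under these assumptions, rentStart needs both key attributes, while cName is already determined by clientNo alone, so its dependency on the composite key is not full.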

Second Normal Form (2NF)

• A relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key.

– Second normal form applies to relations with composite keys (the primary key composed of two or more attributes).

– A relation with a single attribute primary key is automatically in at least 2NF.

1NF to 2NF

• Identify the primary key for the 1NF relation.

• Identify the functional dependencies in the relation.

• If partial dependencies exist on the primary key, remove them by placing the partially dependent attributes in a new relation along with a copy of their determinant.

Partial dependency

• A functional dependency A → B is a partial dependency if there is some attribute that can be removed from A and the dependency still holds.

2NF example

Consider the ClientRental relation.

• This ClientRental table is in 1NF. The primary key of the table is (clientNo, propertyNo).

• In order to move this table to a 2NF solution, we must identify and remove the partial dependencies from the table.

Functional dependencies in ClientRental relation

• The functional dependencies (fd) for the ClientRental relation are as follows:

• The presence of partial dependencies shows that the table is not in 2NF.

– cName is partially dependent on the primary key, in other words, on only the clientNo attribute.

– Property attributes are also partially dependent on the primary key.
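The partial dependencies can be picked out by looking for determinants that are a proper subset of the primary key. The sketch below is a simplification that inspects only the declared dependencies. The dependency list is an assumed reading of the ClientRental example rather than a copy of the lecture's figure, and `partial_dependencies` is a hypothetical name.

```python
# Assumed functional dependencies for ClientRental, written as
# (determinant, dependents). This list is an illustration only.
PRIMARY_KEY = {"clientNo", "propertyNo"}
fds = [
    ({"clientNo", "propertyNo"}, {"rentStart", "rentFinish"}),
    ({"clientNo"}, {"cName"}),
    ({"propertyNo"}, {"pAddress", "rent", "ownerNo", "oName"}),
    ({"ownerNo"}, {"oName"}),
]


def partial_dependencies(primary_key, fds):
    """Return declared FDs whose determinant is a proper subset of the key."""
    return [(lhs, rhs) for lhs, rhs in fds if lhs < primary_key]


for lhs, rhs in partial_dependencies(PRIMARY_KEY, fds):
    print(sorted(lhs), "->", sorted(rhs))
```

The dependency on ownerNo is not reported because ownerNo is not part of the primary key; it is a different kind of dependency and is not a 2NF concern.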

Transform the ClientRental relation into 2NF

• To remove the partial dependencies, we remove the partially dependent non-primary-key columns from the table and place them in new tables, along with a copy of the part of the primary key on which they are fully functionally dependent.

• This results in the creation of three new relations called Client, Rental, and PropertyOwner.

2NF relations derived from ClientRental relation

• The three tables, Client, Rental and PropertyOwner are in 2NF because every non-primary-key column is fully functionally dependent on the primary key of the table.
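The decomposition amounts to projecting the 1NF relation onto three column sets and discarding duplicate tuples. In the sketch below the rows are made-up sample data, the attribute lists of the three relations are an assumption consistent with the dependencies described in the text, and `project` is a hypothetical helper.

```python
# Illustrative 1NF ClientRental rows; all values are made-up sample data.
COLUMNS = ("clientNo", "cName", "propertyNo", "pAddress", "rentStart",
           "rentFinish", "rent", "ownerNo", "oName")
client_rental = [
    ("CR76", "John Kay", "PG4", "6 Lawrence St", "2012-07-01", "2013-08-31",
     350, "CO40", "Tina Murphy"),
    ("CR76", "John Kay", "PG16", "5 Novar Dr", "2013-09-01", "2014-09-01",
     450, "CO93", "Tony Diamond"),
    ("CR56", "Aline Stewart", "PG4", "6 Lawrence St", "2011-09-01", "2012-06-10",
     350, "CO40", "Tina Murphy"),
    ("CR56", "Aline Stewart", "PG36", "2 Manor Rd", "2012-10-10", "2013-12-01",
     375, "CO93", "Tony Diamond"),
    ("CR56", "Aline Stewart", "PG16", "5 Novar Dr", "2014-11-01", "2015-08-10",
     450, "CO93", "Tony Diamond"),
]


def project(rows, columns):
    """Project rows onto the named columns, dropping duplicate tuples."""
    idx = [COLUMNS.index(c) for c in columns]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))


client = project(client_rental, ("clientNo", "cName"))
rental = project(client_rental,
                 ("clientNo", "propertyNo", "rentStart", "rentFinish"))
property_owner = project(client_rental,
                         ("propertyNo", "pAddress", "rent", "ownerNo", "oName"))

print(len(client), len(rental), len(property_owner))
```

Client and property details now appear once per client and once per property, instead of once per rental.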

Remarks

• Although 2NF relations have less redundancy than those in 1NF, they may still suffer from update anomalies.

• For example, if we want to update the name of an owner, such as Tony Diamond, we have to update two tuples in the PropertyOwner relation.

• If we update only one tuple and not the other, the database would be in an inconsistent state.

• This update anomaly is caused by a transitive dependency.

• We need to remove such dependencies by progressing to third normal form.
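The update anomaly can be reproduced on a small PropertyOwner sample. In the sketch below the property numbers, owner numbers, and the other owner's name are made-up values, and `violates_fd` is a hypothetical helper that reports whether two rows agree on the determinant but differ on the dependent attribute.

```python
# Illustrative PropertyOwner rows; values are made-up sample data in which
# one owner (Tony Diamond) appears in two tuples.
property_owner = [
    {"propertyNo": "PG4", "ownerNo": "CO40", "oName": "Tina Murphy"},
    {"propertyNo": "PG16", "ownerNo": "CO93", "oName": "Tony Diamond"},
    {"propertyNo": "PG36", "ownerNo": "CO93", "oName": "Tony Diamond"},
]


def violates_fd(rows, lhs, rhs):
    """True if two rows agree on lhs but differ on rhs (lhs -> rhs is broken)."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for each lhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return True
    return False


before = violates_fd(property_owner, "ownerNo", "oName")

# Update the owner's name in only one of the two tuples.
property_owner[1]["oName"] = "Anthony Diamond"
after = violates_fd(property_owner, "ownerNo", "oName")

print(before, after)
```

After the partial update, the same ownerNo maps to two different names, which is the inconsistent state described above.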
