32
IS 230 Lecture 8 Slide 1 Normalization Lecture 9

IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

Embed Size (px)

Citation preview

Page 1: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 1

Normalization

Lecture 9

Page 2: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 2IS 230 Lecture 8 Slide 2

Lecture 8: Normalization1. Normalization2. Data redundancy and anomalies3. Spurious information4. Functional dependencies5. Normalization

first normal form second normal form third normal form BCNF normal form

6. Normalization Methodology: Example

Page 3: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 3IS 230 Lecture 8 Slide 3

1. Normalization A technique for producing a set of

relations with desirable properties, given the data requirements of the applications

Page 4: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 4IS 230 Lecture 8 Slide 4

2. Data redundancy and anomalies

Emp-Dept relation

NI# Name DateOfBirth Dept# Dname Manager 21 AA - 5 CS 91 22 BB - 5 CS 91 23 CC - 6 TS 93 24 DD - 7 PSV 94 25 EE - 7 PSV 94

Insertion

How do we insert a new department with no employees yet? (keys?) Entering employees is difficult as department information must be

entered correctly.

Deletion What happens when we delete CC's data - do we lose department 6?

Modification If we change the manager of department 5, we must change it for

tuples with Dept# = 5.

Page 5: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 5IS 230 Lecture 8 Slide 5

3. Spurious information Avoid breaking up relations in such a

way that spurious information is createdNI# Name ProjName ProjLocation 123 XX Accounts London 123 XX Analysis Paris 124 YY PI London

may be broken into:

Joining them back together, we get NEW TUPLES!

NI# Name ProjName 123 XX Accounts 123 XX Analysis 124 YY PI

NI# Name ProjName ProjLocation 123 XX Accounts Paris 123 XX Analysis London

NI# ProjLocation 123 London 123 Paris 124 London

Page 6: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 6IS 230 Lecture 8 Slide 6

4. Functional dependencies Formal concepts that may be used to

exhibit “goodness” and “badness” of individual relational schemas, and describe relationships between attributes

Examples of Functional Dependency

Name, DateOfBirth, Dept all depend on NI

Dname & Manager depend on Dept

ProjLocation depends on ProjName.

Page 7: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 7IS 230 Lecture 8 Slide 7

Functional Dependence An attribute, X, of a relation is functionally

dependent on attributes A, B, ..., N if the same values of A, ..., N are always associated with the same value of X

{A,…,N} X

A, ..., N is called the determinant of the functional dependency

Page 8: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 8IS 230 Lecture 8 Slide 8

Full Functional Dependency X fully depends on A, …, N if it is

not dependent on any subset of A, ..., N; otherwise we talk of partial dependency

e.g. age is dependent on NI and Name but only fully dependent on NI.

Page 9: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 9IS 230 Lecture 8 Slide 9

5. Normalization Process

taking a set of relations and decomposing them into more relations satisfying some criteria.

Decomposition essentially a series of projections so that the

original data can be reconstituted using joins.

Normal form form of the relations which satisfy the

criteria

Page 10: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 10IS 230 Lecture 8 Slide 10

5.1. First Normal Form (1)

A relation is in first normal form (1NF) if all values are atomic, i.e. single values - small strings and numbers.

Identify and remove repeating groups (multi-valued attributes)

Page 11: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 11IS 230 Lecture 8 Slide 11

First Normal Form (2) DEPARTMENT

Dnumber Dname Locations 5 C.S. { Paris, London }

Two ways of normalising this:

Have a tuple for each location of each department:

Have a separate relation for (Dnumber, Locations) pairs:

The latter is better as it avoids redundancy.

Dnumber Dname Locations 5 C.S. Paris 5 C.S. London

Dnumber Dname 5 C.S

Dnumber Locations 5 Paris 5 London

Page 12: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 12IS 230 Lecture 8 Slide 12

First Normal Form (3) Example of composite attribute:

Room is not atomic Room includes Deptname and

Room#

TeacherID Course Room 23568 230 CS234 23669 240 IS225 25644 352 IS232 26338 455 CE255

TeacherID Course Room# 23568 230 234 23669 240 225 25644 352 232 26338 455 255

Room# Deptname 234 CS 225 IS 232 IS 255 CE

Page 13: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 13IS 230 Lecture 8 Slide 13

5.2. Second Normal Form By the definition of the primary key, every other attribute

is functionally dependent on it.

If all the other attributes are fully functionally dependent then the relation is in Second Normal Form (2NF).

Clearly, any relation with a single primary key will be 2NF.

If there are two primary key attributes, A & B, then each other attribute is either

dependent on A alone; dependent on B alone; or dependent on both.

2NF Normal consists of creating a separate relation for each of the three cases.

Page 14: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 14IS 230 Lecture 8 Slide 14

Example of 2NF decomposition

EMP_PROJ(SSN, Pnumber, Hours, Ename, Pname, Ploc)

SSN Pnumber

Hours

Ename Pname Ploc

Decomposition into three 2NF relations:

Work(SSN,Pnumber,Hours)EMP(SSN,Ename)Project(Pnumber,Pname,Ploc)

SSN EnamePnumber Pname,PlocSSN, Pnumber Hours

Page 15: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 15IS 230 Lecture 8 Slide 15

5.3. Third Normal Form Third Normal Form eliminates transitive dependencies - i.e.

those dependencies which hold only because of some intermediary. An attribute is transitively dependent on the primary key if there is

some other attribute which is dependent on and which is, in turn, dependent on the key.

NI# Dnumber Dname 1234 5 C.S. 1235 5 C.S.

Dname is dependent on NI, but only because it is dependent on Dnumber which is, in turn, dependent on NI.

Non-3NF relations are likely to hold redundant information. A relation is in 3NF if for any pair of attribute A & B such that A

B, there is no attribute such that A X and X B.

NI# Dnumber 1234 5 1235 5

Dnumber Dname 5 C.S.

Normalizing this would create:

Page 16: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 16IS 230 Lecture 8 Slide 16

Example of 3NF decomposition EMP_DEPT(SSN, Ename, Bdate, Address, Dnum, Dname, Dman)

Ename Bdate Address Dnum

Dname Dman

SSN

Decomposition into two 3NF relations:

EMPLOYEE(SSN, Ename, Bdate, Address, Dnum)DEPT(Dnum, Dname, Dman)

SSN Ename, Bdate, Address, DnumDnum Dname, Dman

Page 17: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 17IS 230 Lecture 8 Slide 17

General Definitions of Normal Forms

Page 18: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 18IS 230 Lecture 8 Slide 18

5.4. Boyce-Codd Normal Form Every relation in BCNF is also in 3NF

Relation in 3NF is not necessarily in BCNF

Nontrivial FD means not trivial FD A trivial functional dependency X Y

is one in which Y is a subset of X Example: A, B B is a trivial FD

Most relation schemas that are in 3NF are also in BCNF

Page 19: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 19IS 230 Lecture 8 Slide 19

Example of not BCNF The following is not in BCNF:

bor_loan = (customer_id, loan_number, amount)

The following is a functional dependency that may hold:

loan_numberamount but loan_number is not a superkey of bor_loan

We decompose into two relations: R1=(customer_id, loan_number) R2=(loan_number, amount) R1 and R2 are in BCNF

Page 20: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 20IS 230 Lecture 8 Slide 20

R=(A, B, C, D, E, F) A, B D B E (not in BCNF) D F (not in 3NF)

Page 21: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 21IS 230 Lecture 8 Slide 21

Normalization Methodology: Example

Consider the following description of a company: The company is divided into departments. Each department is identified by its department number. A department has a name and a manager (an employee). The employees of the company are identified by their National Insurance number. An employee has a name, an address, an age, and work in one department only. An employee is supervised by several supervisors (employees), and a supervisor can supervise several employees (supervisees). An employee can have dependents, where each dependent is described by his/her name and age. The company has a number of running projects. A project is identified by its project number. A project has also a name and a description. Several employees can work on a project, and an employee can work on several projects, each a fixed number of hours.

Page 22: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 22IS 230 Lecture 8 Slide 22

A B1 41 53 7

1. Functional dependencies An attribute, X, of a relation is

functionally dependent on attributes A, B, ..., N if the same values of A, ..., N are always associated with the same value of X

{A,…,N} X

Example

A B does NOT hold, B A does hold

Page 23: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 23IS 230 Lecture 8 Slide 23

1. Functional dependencies (Cont.)

The company is divided into departments. Each department is identified by its department number. A department has a name and a manager (an employee).

Dnumber Dname, Manager

The employees of the company are identified by their National Insurance number. An employee has a name, an address, an age, and work in one department only.

NI Ename, Address, Eage, Dnumber

Page 24: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 24IS 230 Lecture 8 Slide 24

1. Functional dependencies (Cont.)

An employee is supervised by several supervisors (employees), and a supervisor can supervise several employees (supervisees).

NI Supervisor (is not a valid functional dependency since an employee can have several supervisors)

An employee can have dependents, where each dependent is described by his/her name and age.

NI, DepName DepAge

Page 25: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 25IS 230 Lecture 8 Slide 25

1. Functional dependencies (Cont.)

The company has a number of running projects. A project is identified by its project number. A project has also a name and a description.

Pno Pname, Description

Several employees can work on a project, and an employee can work on several projects, each a fixed number of hours.

Pno, NI Hours

Page 26: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 26IS 230 Lecture 8 Slide 26

1. All Functional dependencies

Dnumber Dname, ManagerNI Ename, Address, Eage, DnumberNI, DepName DepAgePno Pname, Description Pno, NI Hours

The Universal Relation:U(Dnumber, Dname, Manager, NI, Ename, Address, Eage, Pno, Pname, Description, DepName, DepAge, Hours, Supervisor)

Page 27: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 27IS 230 Lecture 8 Slide 27

2. The Primary key

What is the primary key?(Dnumber, NI, Pno, DepName)?

Supervisor must be part of the primary key because it is a multivalued attribute

DepName must be part of the primary key since there can be several dependents to an employee

Dnumber is not part of the primary key since it can be derived from NI

Thus the primary key is (NI, Pno, DepName, Supervisor)

Page 28: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 28IS 230 Lecture 8 Slide 28

3. Is U in 1NF?

A relation is in first normal form (1NF) if all values are atomic, i.e. single values

Supervisor is a multivalued attribute, hence U is not in 1NF

To remove the multivalued attribute, the relation U is decomposed as follows

U1(Dnumber, Dname, Manager, NI, Ename, Address, Eage, Pno, Pname, Description, DepName, DepAge, Hours)

Supervise(NI, Supervisor)

Page 29: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 29IS 230 Lecture 8 Slide 29

4. Is U in 2NF? A relation is in Second Normal Form (2NF) if all the attributes

are fully functionally dependent on the primary key. A relation with a single primary key is in 2NF.

Supervise is in 2NF U1 is not in 2NF because some attributes are partially

functionally dependent on the primary key (e.g. DepAge, Ename, etc).

We decompose U1 into the following 2NF relations:

Dept_Emp(Dnumber, Dname, Manager, NI, Ename, Address, Eage)

  Dependent(NI, DepName, DepAge)  Project(Pno, Pname, Description)  Work(NI, Pno, Hours)

Page 30: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 30IS 230 Lecture 8 Slide 30

5. Is the previous decomposition in 3NF? A relation is in 3NF if for any pair of attributes A

& B such that A B, there is no attribute such that A X and X B

Supervise is in 3NF Among the above relations, only Dept_Emp is

not in 3NF, since it has transitive dependencies (e.g. NI Dnumber Dname).

We therefore decompose the relation into the following two relations, which are in 3NF:

Department(Dnumber, Dname, Manager)

Employee(NI, Ename, Address, Eage, Dnumber)

Page 31: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 31IS 230 Lecture 8 Slide 31

6. Complete Schema in 3NF

Department(Dnumber, Dname, Manager) Employee(NI, Ename, Address, Eage,Dnumber) Dependent(NI, DepName, DepAge) Project(Pno, Pname, Description) Work(NI, Pno, Hours) Supervise(NI, Supervisor)

Page 32: IS 230Lecture 8Slide 1 Normalization Lecture 9. IS 230Lecture 8Slide 2 Lecture 8: Normalization 1. Normalization 2. Data redundancy and anomalies 3. Spurious

IS 230 Lecture 8 Slide 32

End of Chapter