27
Nov 11, 2003 Murali Mani Normalization B term 2004: lecture 7, 8, 9

Nov 11, 2003Murali Mani Normalization B term 2004: lecture 7, 8, 9

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

Nov 11, 2003 Murali Mani

Normalization

B term 2004: lecture 7, 8, 9

Nov 11, 2003 Murali Mani

Anomalies

Insert, delete and update anomalies

sNumber sName pNumber pName

s1 Dave p1 MM

s2 Greg p2 ER

Student

Note: We cannot insert a professor who has no students.

Insert Anomaly

Nov 11, 2003 Murali Mani

Delete Anomaly

sNumber sName pNumber pName

s1 Dave p1 MM

s2 Greg p2 ER

Student

Note: We cannot delete a student that is the only student of a professor.

Note: In both cases, minimum cardinality of Professor in the correspondingER schema is 0

Student

sNumber

sName

Professor

pNumber

pName

HasAdvisor

(1,1) (0,*)

years

Nov 11, 2003 Murali Mani

Update Anomaly

sNumber sName pNumber pName

s1 Dave p1 MM

s2 Greg p1 MM

Student

Note: To update the name of a professor, we have to update in multiple tuples.Update anomalies are due to redundancy.

Student

sNumber

sName

Professor

pNumber

pName

HasAdvisor

(1,1) (0,*)

years

Nov 11, 2003 Murali Mani

Keys (primary, candidate and superkeys) A key for a relation R (a1, a2, …, an) is a set of

attributes that uniquely determine the values for all attributes of R.

A key K is minimal if no subset of K is a key. One of the minimal keys is assigned as primary

key, and others are candidate keys. A superkey need not be minimal

Note: a primary or candidate key is by default a superkey, but a superkey need not be a primary or candidate key.

Note: For a relation R, the set of all its attributes is a superkey.

Nov 11, 2003 Murali Mani

Keys: Example

sNumber sName address

1 Dave 144FL

2 Greg 320FL

Student

Primary Key: sNumberCandidate key: sNameSome superkeys: {<sNumber, address>, <sName>, <sName, address>,

<sNumber>}

Nov 11, 2003 Murali Mani

From Set Theory: More on Functions

Consider a function f : X Y, where X is domain, and Y is range

f is one-to-one if no two x X, are related to the same y Y (also called injection)

f is onto if for every y Y, there is some x X that it is related to (also called surjection)

f is said to be a bijection if it is both one-to-one and onto

Nov 11, 2003 Murali Mani

Functional Dependencies (FDs)

sNumber sName address

1 Dave 144FL

2 Greg 320FL

Student

Suppose we have the FD sName address• for any two rows in the Student relation with the same value for sName, the value for address must be the same.• Note: we cannot have a tuple such as (3, null, a1), that is we cannot have a tuple with address value, but with no sName value. (this looks similar to onto functions, but not really).

Note: Any key (primary, candidate or superkey) of a relation R functionally determines all attributes of R

Nov 11, 2003 Murali Mani

Properties of FDs

A B, A C, then A BC If A BC, then A B, A C (assumption:

no nulls) Reflexive (also called trivial FD), if A B,

then A B is always true Transitive, if A B, and B C, then A C If A B, then AZ BZ is always true

Nov 11, 2003 Murali Mani

Inferring FDs

Why? Suppose we have a relation R (A, B, C) and we

have functional dependencies A B, B C, C A what is a key for R? Should we split R into multiple relations?

We can infer A BC, B AC, C AB. Hence A, B, C are all keys.

Nov 11, 2003 Murali Mani

Algorithm for inference of FDs

Based on transitivity, i.e., if A B and B C, then A C

Computing the closure of set of attributes {A1, A2, …, An}, denoted {A1, A2, …, An}+

1. Let X = {A1, A2, …, An}2. If there exists a FD B1, B2, …, Bm C, such

that every Bi X, then X = X C3. Repeat step 2 till no more attributes can be

added.4. {A1, A2, …, An}+ = X

Nov 11, 2003 Murali Mani

Inferring FDs: Example

Given R (A, B, C), and FDs A B, B C, C A, what are possible keys for R

We can computer the closure of attributes as {A}+ = {B, C} {B}+

= {A, C}

{C}+ = {A, B}

So minimal keys are <A>, <B>, <C>

Nov 11, 2003 Murali Mani

Decomposing Relations

sNumber sName pNumber pName

s1 Dave p1 MM

s2 Greg p2 MM

Student

FDs: pNumber pName

sNumber sName pNumber

s1 Dave p1

s2 Greg p2

Student

pNumber pName

p1 MM

p2 MM

Professor

Nov 11, 2003 Murali Mani

Bad Decomposition Example Generation of spurious tuples (also called

lossless join property)

sNumber sName pName

S1 Dave MM

S2 Greg MM

Student

pNumber pName

p1 MM

p2 MM

Professor

sNumber sName pNumber pName

s1 Dave p1 MM

s1 Dave p2 MM

s2 Greg p1 MM

s2 Greg p2 MM

Student

Nov 11, 2003 Murali Mani

Normalization Step Consider relation R with set of attributes AR.

Consider a FD A B (such that no other attribute in (AR – A – B) is functionally determined by A). Let A be not a key for R. We may decompose R as: Remove attributes B from R, in other words create

R’ (AR – B) Create R’’ with attributes A B Key for R’’ = A

Nov 11, 2003 Murali Mani

Normal Forms

First Normal Form (1NF): Attributes of a relation are simple.

Second Normal Form (2NF): No non-prime attribute a is partially dependent on any minimal key (i.e., no subset of a minimal key functionally determines a). Note: An attribute is prime attribute if it is part of

some minimal key.

Nov 11, 2003 Murali Mani

2NF - example

Lot (propNo, county, lotNum, area, price, taxRate)

Candidate key: <county, lotNum>FDs: county taxRatearea price

The FD county taxRate violates 2NF

Decomposition:

Lot (propNo, county, lotNum, area, price)

County (county, taxRate)

Nov 11, 2003 Murali Mani

Normal Forms

Third Normal Form (3NF): For every non-trivial FD X a in R, either a is a prime attribute or X is a superkey of R.

Boyce Codd Normal Form (BCNF): For every non-trivial FD X a in R, X is a superkey of R.

Nov 11, 2003 Murali Mani

3NF ExampleLot (propNo, county, lotNum, area, price)

County (county, taxRate)

Candidate key for Lot: <county, lotNum>

FDs: area price

The FD area price violates 3NF

Decomposition:

Lot (propNo, county, lotNum, area)

County (county, taxRate)

Area (area, price)

Nov 11, 2003 Murali Mani

BCNF example

SCI (student, course, instructor)

FDs: student, course instructorinstructor course

Decomposition:SI (student, instructor)

Instructor (instructor, course)

Nov 11, 2003 Murali Mani

Dependency PreservationWe might want to ensure that all specified FDs are captured.BCNF does not necessarily preserve FDs.2NF, 3NF preserve FDs.

student instructor

Dave MM

Greg ER

SI

instructor course

MM DB 1

ER DB 1

Instructor

student instructor course

Dave MM DB 1

Dave ER DB 1

Greg MM DB 1

Greg ER DB 1

SCI (from SI and Instructor)

SCI violates the FDstudent, course instructor

Nov 11, 2003 Murali Mani

Worst Case ExampleConsider relation R (A, B, C, D) with primary key (A, B, C), and FDs B D, and C D. We see that R violates 2NF. Decomposing it, we get 3 relations as:

R1 (A, B, C), R2 (B, D), R3 (C, D)

Let us consider an instance where we need these 3 relations and how we do a natural join ⋈

A B C

a1 b1 c1

a2 b2 c1

a3 b1 c2

B D

b1 d1

b2 d2

R1R2

C D

c1 d1

c2 d2

R3

A B C D

a1 b1 c1 d1

a2 b2 c1 d2

a3 b1 c2 d1

R1 R2: violates ⋈ C D

A B C D

a1 b1 c1 d1

R1 R2 R3: no FD is violated⋈ ⋈

Nov 11, 2003 Murali Mani

Define Foreign Keys? Consider the normalization step: A relation R

with set of attributes AR, and a FD A B, and A is not a key for R, we may decompose R as: Remove attributes B from R, in other words create

R’ (AR – B) Create R’’ with attributes A B Key for R’’ = A

We can also define foreign key R’ (A) references R’’ (A)

Nov 11, 2003 Murali Mani

Multivalued Dependencies (MVDs) Consider relation R with attributes AR.

Consider A = {A1, A2, …, An}, and B = {B1, B2, …, Bm}, where A, B AR. We say that A ↠ B if the set of values of B is dependent only on the value of A.

cNumber title professor textbook

3431 DB 1 MM UW

3431 DB 1 MM GR

3431 DB 1 ER UW

3431 DB 1 ER GR

CPT

cNumber professor↠cNumber textbook↠

Nov 11, 2003 Murali Mani

Properties of MVDs

Consider a relation R with set of attributes AR. Consider A, B AR. If we have the MVD A ↠ B, then the MVD A ↠ AR – A – B is true.

If A ↠ B and A ↠ C, then A ↠ BC However, if A ↠ BC, then A ↠ B and A ↠ C is

not true (contrast with FDs) Any FD is a MVD: so reflexive rule holds, i.e,

if A B, then A ↠ B is always true (Transitive): If A ↠ B and B ↠ C, then A ↠ C

Nov 11, 2003 Murali Mani

4NF (Fourth Normal Form)

Consider a relation R with attributes AR. If there exists a MVD A ↠ B, where A, B AR, A B AR, A B = , and A is not a superkey, then R is not in 4NF.

If R is not in 4NF, we decompose R as: Create R’ with attributes (AR – B) Create R’’ with attributes (A, B), the multivalued

dependency is defined in R’’. Key for R’’ is (A, B)

Nov 11, 2003 Murali Mani

4NF ExampleCPT

Decomposing, we get 3 relations as:

cNumber title

3431 DB 1

R1

cNumber professor

3431 MM

3431 ER

R2

course textbook

3431 UW

3431 GR

R3

cNumber title professor textbook

3431 DB 1 MM UW

3431 DB 1 MM GR

3431 DB 1 ER UW

3431 DB 1 ER GR

cNumber professor↠cNumber textbook↠