21
Further Normalization II: Higher Normal Forms Prof. Yin-Fu Huang Prof. Yin-Fu Huang CSIE, NYUST CSIE, NYUST Chapter 13 Chapter 13

13.1 Introduction

  • Upload
    arlene

  • View
    48

  • Download
    0

Embed Size (px)

DESCRIPTION

13.1 Introduction. Multi-valued dependency (MVD)  4NF Join dependency (JD)  5NF. 13.2 Multi-valued Dependencies and Fourth Normal Form (1/5). Example (See Fig. 13.1) Assumptions: 1. For a given course c , there can be any number m of - PowerPoint PPT Presentation

Citation preview

Page 1: 13.1 Introduction

Further Normalization II:

Higher Normal Forms

Prof. Yin-Fu HuangProf. Yin-Fu Huang

CSIE, NYUST CSIE, NYUST

Chapter 13Chapter 13

Page 2: 13.1 Introduction

Advanced Database System Yin-Fu Huang

Multi-valued dependency (MVD) 4NF Join dependency (JD) 5NF

13.113.1 Introduction Introduction

Page 3: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.213.2 Multi-valued Dependencies and Fourth Normal Form (1/5)

Example (See Fig. 13.1)

Assumptions:1. For a given course c, there can be any number m of

corresponding teachers and any number n of corresponding

texts (m>0, n>0).

2. Teachers and texts are quite independent of one another.

3. A given teacher or a given text can be associated with any

number of courses.

Page 4: 13.1 Introduction

Advanced Database System Yin-Fu Huang

Eliminating the relation-valued attributes

(See Fig. 13.2)

13.213.2 Multi-valued Dependencies and Fourth Normal Form (2/5)

CTX satisfies the constraint

If tuples (c, t1, x1), (c, t2, x2) both appear

then tuples (c, t1, x2), (c, t2, x1) both appear also.

Page 5: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.213.2 Multi-valued Dependencies and Fourth Normal Form (3/5)

CTX involves a good deal of redundancy update anomalies

In fact, CTX is in BCNF, since it is “all key.” Multi-valued dependencies (MVDs) are a generalization of functional dependencies.

Two MVDs in CTX: Course →> Teacher

Course →> Text Multi-valued dependence: Let R be a relvar, and let A, B, and C be subsets of the attribut

es of R. Then B is multi-dependent on A, A→>B, if and only if, in every legal value of R, the set of B values matching a given AC value pair depends only on the A value and is independent of the C value.

Page 6: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.213.2 Multi-valued Dependencies and Fourth Normal Form (4/5)

MVDs always go together in pairs.

e.g., Course →> Teacher|Text The two projections CT and CX do not involve any such MVDs.

(See Fig. 13.3)

Page 7: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.213.2 Multi-valued Dependencies and Fourth Normal Form (5/5)

Theorem: Let R{A, B, C} be a relvar, where A, B, and C are sets of attributes. Then R is equal to the join of its projections on {A, B} and {A, C} if and only if R satisfies the MVDs A→>B|C.

Fourth normal form: Relvar R is in 4NF if and only if, whenever there exist subsets A and B of the attributes of R such that the nontrivial MVD A→>B is satisfied, then all attributes of R are also functionally dependent on A.

In other words, the only nontrivial dependencies in R are of the form K→X.

If we start with a relvar involving two or more independent RVAs, it is better to separate the RVAs first.

Page 8: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.313.3 Join Dependencies and Fifth Normal Form (1/5)

A relvar is “n-decomposable” if it can be nonloss-decomposed into n projections but not into m, where 1 < m and m < n.

(See Fig. 13.4)

Page 9: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.313.3 Join Dependencies and Fifth Normal Form (2/5)

The constraint (Constraint 3D):

if the pair (s1, p1) appears in SP

and the pair (p1, j1) appears in PJ

and the pair (j1, s1) appears in JS

then the triple (s1, p1, j1) appears in SPJ

if (s1, p1, j2), (s2, p1, j1), (s1, p2, j1) appear in SPJ

then (s1, p1, j1) appears in SPJ also

A relvar will be n-decomposable for some n > 2 if and only if it satisfies some such (n-way) cyclic constraint.

Page 10: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.313.3 Join Dependencies and Fifth Normal Form (3/5)

Join dependency: Let R be a relvar, and let A, B, ..., Z be subsets of the attributes of R. Then R satisfies the JD

* {A, B, ..., Z} if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Relvar SPJ suffers from a number of update anomalies, anomalies that disappear when it is 3-decomposed.

(See Fig. 13.5)

Page 11: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.313.3 Join Dependencies and Fifth Normal Form (4/5)

An MVD is just a special case of a JD, or that JDs are a generalization of MVDs.

JDs are the most general form of dependency possible.

Fifth normal form: A relvar R is in 5NF, also called projection-join normal form (PJ/NF) if and only if every nontrivial join dependency that is satisfied by R is implied by the candidate keys of R.

Relvar SPJ is in 4NF, but not in 5NF.

Relvar SPJ can be 3-decomposable and

3-decomposability is not implied by the fact that the

combination {S#, P#, J#} is a candidate key.

Page 12: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.313.3 Join Dependencies and Fifth Normal Form (5/5)

Example: * { {S#, Sname, Status}, {S#, City} }

This JD is implied by the fact that {S#} is a candidate key.

Example: * { {S#, Sname}, {S#, Status}, {Sname, City} }

This JD is implied by the fact that {S#} and {Sname} are

both candidate keys.

Given a relvar R, we can tell if R is in 5NF as long as we know all candidate keys and all JDs in R.

However, discovering all of those JDs might itself be a nontrivial exercise.

5NF is the ultimate normal form with respect to projection and join.

e.g., the supplier relvar S

Page 13: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.413.4 The Normalization Procedure Summarized (1/3)

Given some 1NF relvar R and some set of FDs, MVDs, and JDs that apply to R, we systematically reduce R to a collection of “smaller” relvars that are equivalent to R.

The overall process:

1. 1NF 2NF: to eliminate FDs that are not irreducible

2. 2NF 3NF: to eliminate transitive FDs

3. 3NF BCNF: to eliminate remaining FDs in which the determinant is not a candidate key.

4. BCNF 4NF: to eliminate MVDs that are not also FDs

5. 4NF 5NF: to eliminate JDs that are not implied by the candidate key.

Page 14: 13.1 Introduction

Advanced Database System Yin-Fu Huang

Several points:

1. Done in a nonloss way, and preferably in a dependency-preserving way as well.

2. There is a very attractive parallelism among the definitions of BCNF, 4NF, and 5NF.

3. The overall objectives:

a. To eliminate certain kinds of redundancy

b. To avoid certain update anomalies

c. To produce a design that is “good” representation of the

real world

d. To simplify the enforcement of certain integrity constraints

13.413.4 The Normalization Procedure Summarized (2/3)

Page 15: 13.1 Introduction

Advanced Database System Yin-Fu Huang

4. The normalization guidelines are only guidelines, and

occasionally there might be good reasons for not normalizing “all the way.”

5. The notions of dependency and further normalization are

semantic in nature.

6. The ideas of normalization are not a panacea.

a. JDs, MVDs and FDs are not the only kinds of constraints

that can arise in practice.

b. The decomposition might not be unique.

c. The BCNF and dependency preservation objectives can

be in conflict.

d. Not all redundancies can be eliminated in the

normalization procedure.

13.413.4 The Normalization Procedure Summarized (3/3)

Page 16: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.513.5 A Note on Denormalization

It is often claimed that “denormalization” is necessary to achieve good performance. Full normalization lots of logically separate relvars lots of physically separate stored files lots of I/O More specifically, the objective is to reduce the number of joins that need to be done at run time by doing some of those j

oins ahead of time, as part of the database design. Example (See Fig. 13.6)

Page 17: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.513.5 A Note on Denormalization (Cont.) Some problems:

a. Once we start denormalizing, it is not clear where we should stop.

b. There can be retrieval problems too. e.g., Summarize P Per P{Color}

Add Avg (Weight) As Avwt

Summarize PSQ {P#, Color, Weight} Per PSQ{Color}

Add Avg (Weight) As Avwt

c. When we say that denormalization is good for performance, what we really mean is that it is good for the performance of specific applications

Page 18: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.613.6 Orthogonal Design (A Digression)(See Fig. 13.7)

The Principle of Orthogonal Design (initial version): Within a given database, no two distinct base relvars should have overlapping meanings.

Page 19: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.613.6 Orthogonal Design (A Digression) (Cont.)

(See Fig. 13.8)

The Principle of Orthogonal Design (final version): Let A and B be distinct base relvars. Then there must not exist nonloss decompositions of A and B into A1, A2, …, Am and B1, B2, …, Bn such that some projection Ai in the set A1, A2, …, Am and some projection Bj in the set B1, B2, …, Bn have overlapping meanings.

Page 20: 13.1 Introduction

Advanced Database System Yin-Fu Huang

13.713.7 Other Normal Forms

Dependency theory Domain-key normal form “Restriction-union” normal form

(3,3)NF 4NF Sixth normal form

Page 21: 13.1 Introduction

Advanced Database System Yin-Fu Huang

The End.