Fundamentals/ICY: Databases 2012/13. WEEK 11 – 4th Normal Form (optional material). John Barnden, Professor of Artificial Intelligence, School of Computer Science, University of Birmingham, UK


Fundamentals/ICY: Databases 2012/13

WEEK 11 – 4th Normal Form (optional material)

John Barnden, Professor of Artificial Intelligence, School of Computer Science, University of Birmingham, UK


Fourth Normal Form (4NF)

4NF is about a different sort of issue from 2NF/3NF/BCNF.

Those NFs are concerned with redundancy arising from functional dependencies (FDs).

4NF is concerned with redundancy resulting from multivalued dependencies (MVDs).


Fourth Normal Form (4NF), contd.

A multivalued dependency of some attribute X on an attribute-set D is like a functional dependency except that X sometimes has several values for a given value of D.

The crucial point is that once the D value is specified, the X values are independent of other attributes.

So, we can think of X as a multivalued attribute implemented by putting different values in different rows, where the set of X values is fully determined by just the value of D.

E.g.: imagine multivalued car-colour being determined by just the make and year of the car.
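The independence condition above can be checked mechanically. The following is a minimal sketch, not taken from the slides: the function name mvd_holds, the column names MAKE, YEAR, COLOUR, SIZE and all data values are assumptions made for illustration. It treats a table as a list of rows and tests whether, for each value of D, the X values combine freely with the values of the remaining attributes.

```python
from collections import defaultdict
from itertools import product

def mvd_holds(rows, d_attrs, x_attrs):
    """Return True if the MVD d_attrs ->> x_attrs holds in rows (a list of dicts)."""
    if not rows:
        return True
    # "rest" = every attribute that is neither in D nor in X.
    rest = [a for a in rows[0] if a not in d_attrs and a not in x_attrs]
    groups = defaultdict(set)
    for r in rows:
        d = tuple(r[a] for a in d_attrs)
        x = tuple(r[a] for a in x_attrs)
        z = tuple(r[a] for a in rest)
        groups[d].add((x, z))
    # For each D value, the stored (X, rest) pairs must be the full cross
    # product of the X values and the rest values seen with that D value.
    for pairs in groups.values():
        xs = {x for x, _ in pairs}
        zs = {z for _, z in pairs}
        if pairs != set(product(xs, zs)):
            return False
    return True

# Hypothetical car data: colour depends only on make and year.
cars = [
    {"MAKE": "Ford", "YEAR": 2012, "COLOUR": "red",  "SIZE": "small"},
    {"MAKE": "Ford", "YEAR": 2012, "COLOUR": "blue", "SIZE": "small"},
    {"MAKE": "Ford", "YEAR": 2012, "COLOUR": "red",  "SIZE": "large"},
    {"MAKE": "Ford", "YEAR": 2012, "COLOUR": "blue", "SIZE": "large"},
]

print(mvd_holds(cars, ["MAKE", "YEAR"], ["COLOUR"]))       # True
print(mvd_holds(cars[:-1], ["MAKE", "YEAR"], ["COLOUR"]))  # False: blue/large row missing
```

Dropping one row breaks the dependency because the colours would then no longer be independent of SIZE for that make and year.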


Notes re Multivalued Dependencies

Caution: some books take functional dependencies to be just a special case of multivalued dependencies. So all dependencies are technically “multivalued”, but some actually involve multiplicity and some don’t.

The determinant D in a (truly) multivalued dependency cannot be a superkey, because if it were then there could only be one X value per D value.

The D/X association doesn’t violate 2NF, 3NF or BCNF because it’s not a functional dependency.

“Trivial” multivalued dependencies include those where D together with X forms a superkey (including the case where there are no other attributes). Trivial MVDs avoid the problem on the next slide.


Fourth Normal Form

[R,C&C and R&C:] A table is in 4NF if

It is in 3NF and

It does not have multiple multivalued dependencies

[Garcia-Molina et al.:] A table is in 4NF if

It is in BCNF and

It does not have any non-trivial multivalued dependencies


Example of Multiple MVDs

Example: an employee may be assigned to several work assignments and may, independently of that, help several different charitable organizations.

If we try to use one table, we have

a multivalued dependency of assignment on (say) employee-id

a multivalued dependency of charitable-org on employee-id
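To see why a single table is awkward here, consider a rough sketch of one possible encoding in which every assignment is paired with every charitable organization for the same employee. The employee number, assignment codes and organization codes below are invented for illustration, and this is only one of several ways the single table could be filled in.

```python
from itertools import product

# Hypothetical data: one employee with three assignments and three charities.
assignments = {10123: ["P1", "P2", "P3"]}
charities = {10123: ["RC", "UW", "SA"]}

def single_table(assignments, charities):
    """Encode both MVDs in one table by storing every combination per employee."""
    rows = []
    for emp, assigns in assignments.items():
        for a, c in product(assigns, charities.get(emp, [])):
            rows.append((emp, a, c))
    return rows

rows = single_table(assignments, charities)
# Number of independent facts: one per assignment plus one per charity.
facts = sum(len(v) for v in assignments.values()) + sum(len(v) for v in charities.values())

print(len(rows))  # 9 rows stored
print(facts)      # 6 underlying facts
```

The row count grows as the product of the two list lengths, whereas the number of facts grows only as their sum. Note also that an employee with no charities would get no rows at all under this encoding, which is where NULLs or special values start to creep in.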


Three Ways of Trying to Encode the two multivalued dependencies

(Figure no. shown is from R&C 6th ed. It is 5.10 in 7th ed, and Fig. 7.10 in R,C&C.)


Problems with those Multiple MVDs

Those methods cause wasted space, redundancy, and/or additional manipulation complexity (with a distinct possibility of getting the manipulation wrong).

Because of NULL values it may be difficult to define a good PK, or any PK at all. We may need to replace NULLs by some other special value.


A Set of Tables in 4NF

(Figure no. shown is from 6th ed. of textbook. It is 5.11 in 7th ed., and 7.11 in R,C&C.)


Notes on the Resulting Tables

1) Tables ASSIGNMENT and SERVICE_V1 are bridging tables.

2) The PK of SERVICE_V1 consists of both its attributes.

3) The PK of ASSIGNMENT is meant to be ASSIGN_NUM. But note that the other 2 columns also form a candidate key.

4) Each of the tables in the diagram is in 4NF, under both definitions of 4NF.

A. Each table is in BCNF (and hence 3NF), and

B. The only tables containing MVDs are ASSIGNMENT and SERVICE_V1, and

C. In each of these tables, there is only one MVD, with determinant = EMP_NUM, and

D. Each of these MVDs is trivial: the attributes involved in it (the “D” together with the “X”) form a superkey.
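As a rough illustration of why splitting into two bridging tables loses nothing, the sketch below projects a combined table onto two smaller ones and joins them back on EMP_NUM. The column names PROJ and ORG and all values are hypothetical, and the surrogate key ASSIGN_NUM is left out to keep the example short.

```python
# Hypothetical single-table rows of the form (EMP_NUM, PROJ, ORG).
combined = {
    (10123, "P1", "RC"), (10123, "P1", "UW"),
    (10123, "P2", "RC"), (10123, "P2", "UW"),
    (10124, "P3", "RC"),
}

def project(rows, positions):
    """Project each row onto the given column positions, dropping duplicates."""
    return {tuple(row[i] for i in positions) for row in rows}

assignment = project(combined, (0, 1))  # plays the role of ASSIGNMENT: (EMP_NUM, PROJ)
service = project(combined, (0, 2))     # plays the role of SERVICE_V1: (EMP_NUM, ORG)

# Natural join of the two projections on EMP_NUM.
rejoined = {(e, p, o) for (e, p) in assignment for (e2, o) in service if e == e2}

print(len(combined), len(assignment), len(service))  # 5 3 3
print(rejoined == combined)                          # True
```

The join reproduces the combined table exactly because both MVDs hold in it; each small table stores every employee/assignment or employee/organization fact only once.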


Problems even with a Single MVD

1) Suppose there is an attribute Z (different from X), such as SIZE, that is not determined by D together with X. (Hence, also, Z is not determined by D by itself.) Then there are different represented objects (e.g. cars) with different values of Z but the same value of D and X, and each such object needs to have rows in the table to cover all the different values of X (e.g., red, blue and green) associated with that value of D.

So we get redundancy of representation of the D/X association (same problem as with e.g. partial and transitive dependencies, but now worse because of the multi-valuedness of X).

Notice that the above situation can only happen if the MVD is non-trivial. If the MVD were trivial you wouldn’t be able to have a Z as above.
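Problem 1 can be made concrete by counting how often each D/X pair is stored. In this small sketch the table layout (MAKE, YEAR, COLOUR, SIZE) and the values are assumptions for illustration, with MAKE and YEAR playing the role of D, COLOUR the role of X, and SIZE the role of Z.

```python
from collections import Counter

# Hypothetical rows: (MAKE, YEAR, COLOUR, SIZE).
rows = [
    ("Ford", 2012, "red",   "small"),
    ("Ford", 2012, "blue",  "small"),
    ("Ford", 2012, "green", "small"),
    ("Ford", 2012, "red",   "large"),
    ("Ford", 2012, "blue",  "large"),
    ("Ford", 2012, "green", "large"),
]

# Count how many rows repeat each (D, X) combination.
dx_counts = Counter((make, year, colour) for make, year, colour, _size in rows)

for pair, count in sorted(dx_counts.items()):
    print(pair, count)
```

Each make/year/colour association appears once per SIZE value, so adding a third size would mean adding another full set of colour rows.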


Problems with a Single MVD, contd.

2) Just the problem covered earlier in the module concerning car-colour: if there is another attribute Y in the table and Y is determined by D, then:

either it has a value repeated in all the different rows holding the different X values for a single D value, so we get redundancy of the representation of the D/Y association

or if, say, NULLs are used instead of repeating the Y value, we get extra manipulation complexity in handling/maintaining Y.


Problems with a Single MVD, contd.

But note that problem 2 is prevented from arising if the table is in BCNF, because D would have to be a non-superkey determinant (of Y), and this is disallowed by BCNF.

Similarly, we get some such protection from problem 2 if the table is in 3NF or just 2NF.

But BCNF etc. don’t prevent either problem 1 or special problems arising from the interaction of different multivalued dependencies.