

Fundamentals/ICY: Databases 2013/14

Week 10 – Monday – Normalization, contd

John Barnden
Professor of Artificial Intelligence

School of Computer Science
University of Birmingham, UK

Reminder

Second Normal Form

An entity type is in second normal form (2NF) if:

It is in 1NF and

It includes no partial dependencies
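The 2NF condition above can be sketched as a small check. This is only an illustration under stated assumptions: a partial dependency is taken to be one whose determinant is a proper subset of the PK and which determines at least one attribute outside the PK, and the entity type ASSIGNMENT with its attribute names is hypothetical, not taken from the slides.

```python
def partial_dependencies(primary_key, fds):
    """Return the FDs whose determinant is a proper subset of the primary key.

    primary_key: set of attribute names.
    fds: list of (determinant, dependents) pairs, each a set of attribute names.
    """
    pk = frozenset(primary_key)
    found = []
    for determinant, dependents in fds:
        det = frozenset(determinant)
        outside_pk = frozenset(dependents) - pk
        # Assumed definition: part of the PK (but not all of it) determines
        # at least one attribute that lies outside the PK.
        if det < pk and outside_pk:
            found.append((det, outside_pk))
    return found


def is_2nf(primary_key, fds):
    """A 1NF entity type is in 2NF when it has no partial dependencies."""
    return not partial_dependencies(primary_key, fds)


# Hypothetical entity type:
# ASSIGNMENT(proj_num, emp_num, proj_name, emp_name, hours)
pk = {"proj_num", "emp_num"}
fds = [
    ({"proj_num", "emp_num"}, {"hours"}),
    ({"proj_num"}, {"proj_name"}),   # partial: only part of the PK is needed
    ({"emp_num"}, {"emp_name"}),     # partial: only part of the PK is needed
]
print(is_2nf(pk, fds))
```

The check assumes the entity type is already in 1NF and that the listed functional dependencies are the ones the designer has identified.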

Conversion to 2NF

For each determinant D involved in a partial dependency in the original entity type T:

use D also as the PK of a new entity type NT(D),

and move the attributes X determined by D out of T into NT(D).

D itself stays in T as well as being copied into NT(D).
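The conversion steps above can be sketched in a few lines. This is a minimal illustration, not a definitive procedure: the function name convert_to_2nf and the example entity type are hypothetical, and a partial dependency is again assumed to have a determinant that is a proper subset of the PK.

```python
def convert_to_2nf(attributes, primary_key, fds):
    """Split out one new entity type NT(D) per partial-dependency determinant D.

    Returns (remaining attributes of T, {D: attributes of NT(D)}).
    """
    pk = frozenset(primary_key)
    remaining = set(attributes)
    new_types = {}
    for determinant, dependents in fds:
        det = frozenset(determinant)
        moved = frozenset(dependents) - pk
        if det < pk and moved:
            # NT(D) has D as its PK and receives the attributes X that D determines.
            new_types.setdefault(det, set(det)).update(moved)
            # X leaves T; D itself stays in T because it is part of T's PK.
            remaining -= moved
    return remaining, new_types


# Hypothetical entity type:
# ASSIGNMENT(proj_num, emp_num, proj_name, emp_name, hours)
attributes = {"proj_num", "emp_num", "proj_name", "emp_name", "hours"}
pk = {"proj_num", "emp_num"}
fds = [
    ({"proj_num", "emp_num"}, {"hours"}),
    ({"proj_num"}, {"proj_name"}),
    ({"emp_num"}, {"emp_name"}),
]
remaining, new_types = convert_to_2nf(attributes, pk, fds)
print(sorted(remaining))
for det, attrs in new_types.items():
    print(sorted(det), "->", sorted(attrs))
```

In this example T keeps proj_num, emp_num and hours, while the project and employee details move into two new entity types keyed by proj_num and emp_num respectively.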

Reminder: Partial and Transitive Dependencies

Second Normal Form (2NF) Conversion: results for the example on the previous slide

New

Third Normal Form

An entity type is in third normal form (3NF) if:

It is in 2NF and

It contains no transitive dependencies

Entity type in 2NF but not in 3NF because of a “transitive” dependency

Transitive Dependencies

A prime attribute is one that is within some candidate key (not necessarily the primary key).

So a non-prime attribute is, in particular, not within the PK.

A transitive dependency is one where the determinant D is at least partially outside the PK and is not a superkey,

and the determined attribute X is non-prime (the reason for this restriction is given on a later slide).

E.g., the previous figure shows the simple case of a simple (= one-attribute) determinant.

The above definition is partly based on Garcia-Molina, Ullman & Widom 2009. It is more general than the account in our textbook.
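The three conditions in this definition (D partly outside the PK, D not a superkey, X non-prime) can be tested mechanically. The sketch below is only an illustration: the superkey test uses the standard attribute-closure computation, the candidate keys are supplied by the caller, and the EMPLOYEE entity type with its attribute names is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for determinant, dependents in fds:
            if set(determinant) <= result and not set(dependents) <= result:
                result |= set(dependents)
                changed = True
    return result


def transitive_dependencies(attributes, primary_key, candidate_keys, fds):
    """Return (D, X) pairs that are transitive under the definition above."""
    all_attrs = set(attributes)
    pk = set(primary_key)
    # A prime attribute lies within some candidate key.
    prime = set().union(*candidate_keys)
    found = []
    for determinant, dependents in fds:
        det = set(determinant)
        partly_outside_pk = not det <= pk
        is_superkey = closure(det, fds) >= all_attrs
        non_prime_dependents = set(dependents) - prime - det
        if partly_outside_pk and not is_superkey and non_prime_dependents:
            found.append((frozenset(det), frozenset(non_prime_dependents)))
    return found


# Hypothetical entity type: EMPLOYEE(emp_num, emp_name, job_class, chg_hour)
attributes = {"emp_num", "emp_name", "job_class", "chg_hour"}
pk = {"emp_num"}
candidate_keys = [{"emp_num"}]
fds = [
    ({"emp_num"}, {"emp_name", "job_class", "chg_hour"}),
    ({"job_class"}, {"chg_hour"}),  # determinant outside the PK, not a superkey
]
result = transitive_dependencies(attributes, pk, candidate_keys, fds)
print(result)
```

Here only job_class → chg_hour qualifies: job_class is outside the PK, it does not determine every attribute, and chg_hour is in no candidate key.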

Conversion to 3NF

For each determinant D involved in a transitive dependency in the original entity type T:

make D the PK of a new entity type NT(D),

and move the attributes X transitively determined by D out of T into NT(D).

NB: the determinants themselves stay in T as well.
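To see what this conversion does to actual data, the sketch below projects a small table onto the two new attribute sets. The rows, the attribute names and the helper function project are hypothetical, and the transitive dependency job_class → chg_hour is an assumption made for the example.

```python
def project(rows, columns):
    """Project rows onto columns, dropping duplicate tuples (order preserved)."""
    seen = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.append(values)
    return [dict(zip(columns, values)) for values in seen]


# Hypothetical 2NF table with the assumed transitive dependency
# job_class -> chg_hour (the PK is emp_num).
employee = [
    {"emp_num": 101, "emp_name": "Ng", "job_class": "Engineer", "chg_hour": 85.0},
    {"emp_num": 102, "emp_name": "Rao", "job_class": "Analyst", "chg_hour": 60.0},
    {"emp_num": 103, "emp_name": "Li", "job_class": "Engineer", "chg_hour": 85.0},
]

# NT(D): the determinant job_class becomes the PK and takes chg_hour with it.
job = project(employee, ["job_class", "chg_hour"])

# T keeps the determinant job_class, which now refers to a row of NT(D).
employee_3nf = project(employee, ["emp_num", "emp_name", "job_class"])

print(job)
print(employee_3nf)
```

The charge rate for each job class is now stored once in job instead of once per employee, which is the redundancy that the transitive dependency caused.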

Third Normal Form (3NF) Conversion: results for the previous example

The Boyce-Codd Normal Form (BCNF)

Determinants of partial and transitive functional dependencies are not superkeys.

So the corresponding normalization gets rid of some non-superkey determinants used in functional dependencies.

Normalization into BCNF gets rid of all such determinants.

An entity type is in BCNF if it’s in 1NF and every determinant in a non-trivial functional dependency is a superkey,

i.e., every attribute-set that determines any other attribute determines all the attributes, so there’s no redundancy problem.
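The BCNF condition can be checked by computing, for each determinant, the set of attributes it determines and comparing that with the full attribute set. The sketch below is an illustration under assumptions: the function names are hypothetical, and the example relation R(A, B, C, D) with its dependencies is invented for the purpose.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for determinant, dependents in fds:
            if set(determinant) <= result and not set(dependents) <= result:
                result |= set(dependents)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    all_attrs = set(attributes)
    violations = []
    for determinant, dependents in fds:
        det = set(determinant)
        if set(dependents) <= det:
            continue  # trivial dependency, never a violation
        # A superkey determines every attribute of the entity type.
        if not closure(det, fds) >= all_attrs:
            violations.append((frozenset(det), frozenset(dependents)))
    return violations


# Hypothetical relation R(A, B, C, D) with PK (A, B)
attributes = {"A", "B", "C", "D"}
fds = [
    ({"A", "B"}, {"C", "D"}),
    ({"C"}, {"B"}),  # C determines B but does not determine A or D
]
violations = bcnf_violations(attributes, fds)
print(violations)
```

An empty result means every listed determinant is a superkey, so the entity type satisfies the BCNF condition with respect to those dependencies.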

An Entity Type in 3NF but not in BCNF

The dependency is NOT TRANSITIVE since B is prime

Decomposition to BCNF

The middle diagram shows that changing the PK so as to include C instead of B changes the dependency into a partial one, which can then be removed in the usual way.
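The two-step route described here (re-key, then remove the resulting partial dependency) can be traced in code. The diagram itself did not survive in this text, so the following is a sketch under assumptions: the entity type is taken to be T(A, B, C, D) with PK (A, B) and dependencies A,B → C,D and C → B. Only the names B and C come from the slide; A, D and the function name are hypothetical.

```python
def convert_to_2nf(attributes, primary_key, fds):
    """Move attributes determined by a proper subset D of the PK into NT(D)."""
    pk = frozenset(primary_key)
    remaining = set(attributes)
    new_types = {}
    for determinant, dependents in fds:
        det = frozenset(determinant)
        moved = frozenset(dependents) - pk
        if det < pk and moved:
            new_types.setdefault(det, set(det)).update(moved)
            remaining -= moved
    return remaining, new_types


# Assumed entity type T(A, B, C, D): in 3NF but not in BCNF, because C -> B
# has a non-superkey determinant and a prime dependent attribute.
attributes = {"A", "B", "C", "D"}
fds = [
    ({"A", "B"}, {"C", "D"}),
    ({"C"}, {"B"}),
]

# Step 1: change the PK from (A, B) to (A, C). This is still a key, since
# C gives B, and then A, B give C and D.
new_pk = {"A", "C"}

# Step 2: C -> B is now a partial dependency on the new PK, so it is
# removed in the usual 2NF way.
remaining, new_types = convert_to_2nf(attributes, new_pk, fds)
print(sorted(remaining))
print({tuple(sorted(d)): sorted(a) for d, a in new_types.items()})
```

The result is T(A, C, D) with PK (A, C) plus a new entity type with PK C that holds B, and in both every determinant is a superkey.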

((ASIDE: A Simple Form of BCNF))

Any simple (= one-attribute) superkey is a candidate key.

So BCNF requires, in particular, all simple determinants to be candidate keys.

Some books (incl. our textbook) define BCNF merely to mean in effect that all simple determinants are candidate keys.

This is a simpler, less general form of BCNF.

A table could be in simple-BCNF but not be in full BCNF.

My definition of (full) BCNF is from Garcia-Molina, Ullman & Widom, Database Systems: The Complete Book, 2nd. Ed., Pearson, 2009.

This book also gives a process for conversion to full BCNF.

BCNF versus 3NF

BCNF implies that there are no partial or transitive dependencies, so a table that is in BCNF is also in 3NF.

((If a table is in 3NF but not BCNF then each of the non-superkey determinants D is partly outside the PK and determines only prime attributes.

If also the PK is the only candidate key, then:

the attributes determined by each D must all be in the PK;

but they cannot cover all of the PK (otherwise D would be a superkey). So the PK must be composite.))

((A Reason for Prime-X Exclusion in Transitive Dependencies))

Earlier we said that in a transitive dependency the determined attribute X is non-prime (i.e. not within a candidate key). The reason is:

In removing a transitive dependency, we delete the dependent attribute X from the original entity type. If X were within the primary key (special case of candidate key), that key would therefore be disrupted, and this would affect other entity types referencing the table.

But non-primary candidate keys are also sometimes used for such referencing, and are then called secondary keys. So if X were in such a key, the conversion to 3NF would disrupt the referencing.

So, to keep things simple for the purposes of 3NF, all prime Xs are banned from being transitively dependent.

((Inter-Table Reference Disruption contd.))

NB: Conversion to 2NF can, and conversion from 3NF to BCNF does, remove dependent prime attributes, so it is potentially disruptive of references between entity types.

But I assume that in practice it’s rarely a problem in conversion to 2NF, because, in partial dependencies, the dependent attributes are rarely prime. In particular, they cannot be in the PK.

By contrast, if a 3NF table is not in BCNF then the troublesome dependencies necessarily involve prime Xs (see a previous slide).

((3NF and Reference Disruption contd.))

Some textbooks (e.g., Connolly and Begg, Database Systems, Pearson, 2010) only require the transitively dependent attributes to be outside the primary key, rather than outside every candidate key. In that case, conversion to 3NF can disrupt references using a secondary key. But at least the cases of 2NF and 3NF are now more similar to each other.

I haven’t seen a version of 2NF that is only concerned with non-prime Xs. But don’t be too surprised if you come across that!

Material on 4NF: in Week 11 if there’s time (or in Revision Week)

Normal Forms Overall

Let “<<” mean “provides less protection than”. Then:

1NF << 2NF << 3NF << BCNF ((and 3NF << 4NF))

((Also BCNF << 4NF under the second definition of 4NF.))

BCNF and 4NF guard against relatively unusual situations. BCNF is more disruptive to achieve than 2NF or 3NF.

Merely requiring 2NF is now unusual.

So 3NF is a reasonable target.

Non-Normalization/Denormalization

Normalization leads to more tables.

Joining a larger number of tables requires additional disk input/output (I/O) operations and more complex manipulation, and may cause substantial communication delays.

Conflicts among design principles, information requirements, and processing speed are often resolved through compromises that may include ending up with some non-normalized tables.
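The cost of joining mentioned above comes from having to recombine the normalized tables whenever the original, wider view of the data is needed. The sketch below shows such a join using Python's built-in sqlite3 module and an in-memory database; the tables employee and job and all their data are hypothetical.

```python
import sqlite3

# Hypothetical normalized schema: the charge rate lives in job, not employee.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (
        job_class TEXT PRIMARY KEY,
        chg_hour  REAL
    );
    CREATE TABLE employee (
        emp_num   INTEGER PRIMARY KEY,
        emp_name  TEXT,
        job_class TEXT REFERENCES job(job_class)
    );
    INSERT INTO job VALUES ('Engineer', 85.0), ('Analyst', 60.0);
    INSERT INTO employee VALUES
        (101, 'Ng', 'Engineer'),
        (102, 'Rao', 'Analyst'),
        (103, 'Li', 'Engineer');
""")

# Rebuilding the original wide rows now takes a join across both tables.
rows = conn.execute("""
    SELECT e.emp_num, e.emp_name, e.job_class, j.chg_hour
    FROM employee AS e
    JOIN job AS j ON j.job_class = e.job_class
    ORDER BY e.emp_num
""").fetchall()
print(rows)
conn.close()
```

A non-normalized design would store chg_hour directly in employee and avoid the join, at the price of repeating each charge rate in every matching row.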

Non-/Denormalization (continued)

Unnormalized tables in a production database tend to have these defects:

Data updates are less efficient to the extent that programs that read and update tables must deal with larger tables

((Indexing is much more cumbersome))

((Unnormalized tables yield no simple strategies for creating virtual tables known as views))

Summary: Normalization and Database Design

Normalization helps eliminate data redundancies and some other aspects of poor structure.

Normalization focusses on problems in individual entity types.

It is difficult to separate normalization from the overall ER modelling process.

Normalization cannot, by itself, guarantee good designs.

3NF is often enough, but BCNF, 4NF etc. may also need to be considered.

Non-normalized entity types may be desirable in some cases, to increase processing speed and/or reduce conceptual complexity of operations.