Fundamentals/ICY: Databases 2013/14 WEEK 9 –Friday John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK




Reminder


Relation from a Table

The relation at the moment is the set of tuples:

{ <‘9568876A’, ‘Chopples’, 37>, <‘2544799Z’, ‘Blurp’, NULL>, <‘1698674F’, ‘Rumpel’, 88> }

People

PERS-ID    NAME      AGE
9568876A   Chopples  37
2544799Z   Blurp
1698674F   Rumpel    88
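As a minimal illustration (a Python sketch, not part of the original slides), the rows above can be modelled as tuples and the relation as the set of those tuples. Modelling NULL as Python's None is a simplifying assumption: SQL's NULL does not compare equal to itself, whereas None does.

```python
# Each row of the People table at one moment, as a tuple
# (PERS-ID, NAME, AGE). None stands in for NULL (a simplification).
people_rows = [
    ("9568876A", "Chopples", 37),
    ("2544799Z", "Blurp", None),
    ("1698674F", "Rumpel", 88),
]

# The relation is just the set of these tuples; attribute names,
# domains and key specifications are not part of it.
people_relation = frozenset(people_rows)

# Listing the rows in a different order gives the same relation.
reordered = frozenset(reversed(people_rows))

print(people_relation == reordered)  # True
print(len(people_relation))          # 3
```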


A Table as a Relation?

People loosely talk about tables being relations.

This is mathematically inaccurate for several reasons:

1) Properly speaking, the table includes not just the rows but also the attribute names themselves, their domains, the specification of primary and foreign keys, etc.

2) It’s only the rows at any given moment that form a relation. When a value in the table changes or a row is added or deleted, the mathematical relation is replaced by a different one.

3) A relation is a set, so it cannot contain the same tuple twice: relations do not cater for tables with repeated rows.

• ((ASIDE: But see next slide for a way out.))

But it's OK to talk this way if you know what you (and those people) mean.
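Reasons 2) and 3) can be sketched in Python, again assuming tuples stand for rows and a frozenset for the relation. The changed age (89) and the repeated row are hypothetical; since the People table has a primary key, a repeated row would only arise in a table without one, or in a query result.

```python
rows = [
    ("9568876A", "Chopples", 37),
    ("2544799Z", "Blurp", None),
    ("1698674F", "Rumpel", 88),
]
before = frozenset(rows)

# Reason 2: a (hypothetical) update replaces the relation by a different one.
rows[2] = ("1698674F", "Rumpel", 89)
after = frozenset(rows)
print(before == after)  # False

# Reason 3: a set cannot hold the same tuple twice, so a table
# with a repeated row has more rows than its set of tuples has members.
table_with_repeat = rows + [("2544799Z", "Blurp", None)]
print(len(table_with_repeat))             # 4
print(len(frozenset(table_with_repeat)))  # 3
```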


New (last on maths for now)


((ASIDE: “Bags” in Maths))

A variant of sets called “bags” (or “multisets”) is used in maths (and CS) and allows repeated members. There are union and other operations on bags that respect the repetitions.

So bags and their operations fit DB tables, and notably repetition-respecting operations such as UNION ALL, better than sets and their operations do.

But bags are non-standard and they’re not normally covered at an introductory level.

See the databases textbook by Garcia-Molina et al. (2009) for bags and their use in the DB area.
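Python's `collections.Counter` behaves as a bag, so it can illustrate the difference between set union and repetition-respecting union. The two single-column query results below are hypothetical. Counter's `+` adds multiplicities, which corresponds to SQL UNION ALL; Counter's `|` takes the maximum multiplicity instead, which is a different bag union.

```python
from collections import Counter

# Two hypothetical single-column query results, with repeats.
r1 = ["Blurp", "Chopples", "Blurp"]
r2 = ["Blurp", "Rumpel"]

# Set union (like SQL UNION): repetitions are discarded.
set_union = set(r1) | set(r2)

# Bag union that adds multiplicities (like SQL UNION ALL).
bag_union = Counter(r1) + Counter(r2)

print(sorted(set_union))        # ['Blurp', 'Chopples', 'Rumpel']
print(bag_union["Blurp"])       # 3
print(sum(bag_union.values()))  # 5
```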


— Back to Database Design —

NORMALIZATION


Normalization

Normalization is often used within ER modeling, to help produce a good database design.

It evaluates entity types and, when appropriate, creates new entity types and adjusts the attributes of existing ones, mainly to minimize certain types of data redundancy and, in some cases, to avoid certain types of complexity.

Some situations require non-normalization or denormalization for efficiency reasons:

Normalization generally increases the number of tables and makes many queries more elaborate (in straightforward ways, though).


Normal Forms

Normalization can be divided into a series of stages called normal forms, giving more and more protection:

• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd normal form (BCNF)
• ((Fourth normal form (4NF)))
• Yet others!

1NF is mandatory and we have implicitly already covered it.


First Normal Form (1NF)

1NF just insists on some restrictions we have already explicitly or implicitly imposed on entity types and tables:

In the entity type there is a candidate key whose attributes never have NULL values, and one such key has been chosen as the primary key.

There are no “repeating groups” in the table implementing the entity type:

A repeating group is a group of related rows that have some empty cells that are to be thought of as copying values from some other row in the group.

That’s my definition. It is more usually expressed in terms of cells holding multiple values, but I think that is inaccurate and misleading.


A Sample Report Layout with “repeating groups”


Another Unusual Feature of that Table

The table has another feature that departs from DB-style tables.

What is it?


(Partially) Corresponding Attempt at a DB-Style Table


The Problem with Repeating Groups

Q: Why are they a problem?

A: First reason: Rows in a DB-style table are unordered, so how do you know which row(s) to “copy” PROJ_NUM and PROJ_NAME values from/to? (The previous diagram is deceptive in this respect.)

A: Second reason: Even if you could work out which row(s) to copy from/to, the copying would make many queries much more complex.


That Table put into 1NF (assuming there is a PK)
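The conversion can be sketched in Python under exactly the assumption the previous slide warns against: that the report is read in its printed order, so an empty cell copies the value from the row above. Only PROJ_NUM and PROJ_NAME come from the slides; EMP_NUM, HOURS, the data values and the helper `to_1nf` are hypothetical, and (PROJ_NUM, EMP_NUM) is assumed to be the PK.

```python
# Hypothetical report rows in printed order: (PROJ_NUM, PROJ_NAME, EMP_NUM, HOURS).
# Empty strings mark cells meant to "copy" the value from the row above.
report = [
    ("15", "Alpha", "103", 23.8),
    ("", "", "101", 19.4),
    ("", "", "105", 35.7),
    ("18", "Beta", "114", 24.6),
    ("", "", "118", 45.3),
]

def to_1nf(rows, group_cols=(0, 1)):
    """Fill empty group cells from the previous row (relies on printed order)."""
    result = []
    last = {}
    for row in rows:
        filled = list(row)
        for c in group_cols:
            if filled[c] == "":
                if c not in last:
                    raise ValueError("first row of a group has an empty cell")
                filled[c] = last[c]
            else:
                last[c] = filled[c]
        result.append(tuple(filled))
    return result

table_1nf = to_1nf(report)
for r in table_1nf:
    print(r)
```

After the conversion every row is self-contained, so the table no longer depends on row order.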


Dependencies and Determinants

These concepts are needed for most of the remaining normal forms.

Any set S of attributes in an entity type “determines” each attribute within it, i.e.:

Each attribute in S is “functionally dependent” on the whole set S.

But in the following discussion of normalization…

When we say X is functionally dependent on S – i.e. S determines X – we will mainly be talking about non-trivial cases—cases where X is outside S (though still in the same entity type).

A [non-trivial] “determinant” will be a set of attributes D in a table such that it determines some attribute X outside D in the same entity type.
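Using the standard definition (S determines X when any two rows that agree on every attribute in S also agree on X), a hypothetical helper `fd_holds` can check a dependency against one table instance. An instance can only refute a dependency; whether D really determines X is a fact about the entity type that the designer has to supply. The attribute names other than PROJ_NUM and PROJ_NAME, and all data values, are hypothetical.

```python
def fd_holds(rows, D, X):
    """Check, in this instance only, that attribute X is functionally
    dependent on the attribute set D: no two rows agree on D but differ on X.

    rows: list of dicts mapping attribute name -> value
    D:    list of attribute names (the candidate determinant)
    X:    a single attribute name
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in D)
        if key in seen:
            if seen[key] != row[X]:
                return False  # same D values, different X value
        else:
            seen[key] = row[X]
    return True


# Hypothetical instance.
rows = [
    {"PROJ_NUM": "15", "PROJ_NAME": "Alpha", "EMP_NUM": "103", "HOURS": 23.8},
    {"PROJ_NUM": "15", "PROJ_NAME": "Alpha", "EMP_NUM": "101", "HOURS": 19.4},
    {"PROJ_NUM": "18", "PROJ_NAME": "Beta", "EMP_NUM": "114", "HOURS": 24.6},
]

print(fd_holds(rows, ["PROJ_NUM"], "PROJ_NAME"))            # True
print(fd_holds(rows, ["PROJ_NUM"], "HOURS"))                # False
print(fd_holds(rows, ["PROJ_NUM", "EMP_NUM"], "PROJ_NUM"))  # True (trivial)
```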


1NF can have Undesirable Dependencies

1NF entity types can contain “partial,” “transitive” and other generally undesirable functional dependencies of an attribute X on a determinant D.

By “undesirable” I will mean mainly that the determinant D is not a superkey, so that at least one attribute Y in the entity type is not determined by D,

so several rows can have equal D values while differing in their Y values,

so redundancy on DX (repetition of the association between D and X values across those rows) can arise.
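The chain of reasoning above can be checked on a hypothetical instance in Python. Here D = {PROJ_NUM} determines X = PROJ_NAME but is not a superkey, because it does not determine EMP_NUM (playing the role of Y), so the PROJ_NUM/PROJ_NAME association is stored once per row. The helpers `is_superkey` and `dx_repetitions`, the extra attributes and the data are all assumptions for illustration.

```python
from collections import Counter

ATTRS = ["PROJ_NUM", "PROJ_NAME", "EMP_NUM", "HOURS"]

# Hypothetical instance with PK (PROJ_NUM, EMP_NUM).
rows = [
    {"PROJ_NUM": "15", "PROJ_NAME": "Alpha", "EMP_NUM": "103", "HOURS": 23.8},
    {"PROJ_NUM": "15", "PROJ_NAME": "Alpha", "EMP_NUM": "101", "HOURS": 19.4},
    {"PROJ_NUM": "15", "PROJ_NAME": "Alpha", "EMP_NUM": "105", "HOURS": 35.7},
    {"PROJ_NUM": "18", "PROJ_NAME": "Beta", "EMP_NUM": "114", "HOURS": 24.6},
]

def is_superkey(rows, D, attrs):
    """True if, in this instance, D determines every attribute in attrs."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in D)
        full = tuple(r[a] for a in attrs)
        if seen.setdefault(key, full) != full:
            return False
    return True

def dx_repetitions(rows, D, X):
    """Count how many times each (D values, X value) association is stored."""
    return Counter((tuple(r[a] for a in D), r[X]) for r in rows)

print(is_superkey(rows, ["PROJ_NUM"], ATTRS))             # False
print(is_superkey(rows, ["PROJ_NUM", "EMP_NUM"], ATTRS))  # True
reps = dx_repetitions(rows, ["PROJ_NUM"], "PROJ_NAME")
print(reps[(("15",), "Alpha")])                           # 3
```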


Partial and Transitive Dependencies


1NF can have Partial Dependencies

Partial dependency: where the determinant is part but not all of the primary key (and NB: is therefore not a superkey)

The determined attribute X is necessarily outside the whole PK—exercise: why?


Second Normal Form

An entity type is in second normal form (2NF) if:

It is in 1NF and

It includes no partial dependencies
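A sketch of the 2NF test in Python, working from declared dependencies rather than data. Each dependency is a hypothetical (determinant set, attribute) pair; the PK {PROJ_NUM, EMP_NUM} and the attributes EMP_NUM, EMP_NAME and HOURS are assumed for illustration, and the entity type is assumed already to be in 1NF.

```python
PK = frozenset({"PROJ_NUM", "EMP_NUM"})

# Hypothetical declared dependencies: (determinant, determined attribute).
FDS = [
    (frozenset({"PROJ_NUM"}), "PROJ_NAME"),
    (frozenset({"EMP_NUM"}), "EMP_NAME"),
    (frozenset({"PROJ_NUM", "EMP_NUM"}), "HOURS"),
]

def partial_dependencies(pk, fds):
    """Return the dependencies whose determinant is part but not all of the PK."""
    return [(d, x) for d, x in fds if d < pk]  # '<' on sets is proper subset

def is_2nf(pk, fds):
    """Assumes 1NF already holds; 2NF then requires no partial dependencies."""
    return not partial_dependencies(pk, fds)

print(len(partial_dependencies(PK, FDS)))  # 2
print(is_2nf(PK, FDS))                     # False
print(is_2nf(PK, [FDS[2]]))                # True
```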


Conversion to 2NF

For each determinant D involved in a partial dependency in the original entity type T,

also use D as the PK of a new entity type NT(D),

and move the attributes X determined by D out of T and into NT(D).

D itself stays in T as well as being copied into NT(D).
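The conversion steps above can be sketched on attribute lists in Python, using a hypothetical entity type with PK {PROJ_NUM, EMP_NUM}. The helper `to_2nf`, the attributes other than PROJ_NUM and PROJ_NAME, and the declared dependencies are assumptions; the sketch ignores complications such as one attribute being determined by two different partial determinants.

```python
T_ATTRS = ["PROJ_NUM", "EMP_NUM", "PROJ_NAME", "EMP_NAME", "HOURS"]
PK = frozenset({"PROJ_NUM", "EMP_NUM"})
FDS = [
    (frozenset({"PROJ_NUM"}), "PROJ_NAME"),
    (frozenset({"EMP_NUM"}), "EMP_NAME"),
    (frozenset({"PROJ_NUM", "EMP_NUM"}), "HOURS"),
]

def to_2nf(t_attrs, pk, fds):
    """Return (attributes left in T, {D: attributes of NT(D)})."""
    new_types = {}
    moved = set()
    for d, x in fds:
        if d < pk:  # partial dependency D -> X
            # D becomes the PK of NT(D); X moves into NT(D).
            new_types.setdefault(d, sorted(d)).append(x)
            moved.add(x)
    # D itself stays in T; only the determined attributes leave.
    remaining = [a for a in t_attrs if a not in moved]
    return remaining, new_types

remaining, new_types = to_2nf(T_ATTRS, PK, FDS)
print(remaining)                           # ['PROJ_NUM', 'EMP_NUM', 'HOURS']
print(new_types[frozenset({"PROJ_NUM"})])  # ['PROJ_NUM', 'PROJ_NAME']
print(new_types[frozenset({"EMP_NUM"})])   # ['EMP_NUM', 'EMP_NAME']
```

Because PROJ_NUM and EMP_NUM stay in T, each can act as a foreign key referencing its new entity type.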


Reminder: Partial and Transitive Dependencies


Second Normal Form (2NF) Conversion: results on the example from the previous slide