Normalization: Kroenke Chapters 3 and 4

Page 1: Normalization:  Kroenke  Chapters 3 and 4

Normalization: Kroenke Chapters 3 and 4

Page 2: Normalization:  Kroenke  Chapters 3 and 4

A relation is categorized by one of several normal forms.
An aid to design.
Helps characterize relations that experience anomalies in update operations.
Higher normal forms TEND to be better design, but not guaranteed.

Page 3: Normalization:  Kroenke  Chapters 3 and 4

Remember the one fact-one place theme!

Deletion anomaly: deleting one fact inadvertently deletes another.

Insertion anomaly: inserting one fact is not possible without inserting another, seemingly unrelated, fact.

Page 4: Normalization:  Kroenke  Chapters 3 and 4

First Normal Form – 1NF

A relation is 1NF if each attribute is atomic. That is, attributes are simple types (int, float, string, char, etc.).

Page 5: Normalization:  Kroenke  Chapters 3 and 4

Second Normal Form – 2NF

How about this as a base table?
[Figure: table with its primary key marked]

Page 6: Normalization:  Kroenke  Chapters 3 and 4

Definition: R is a relation; X and Y are attributes (or sets of attributes) of R. Y is functionally dependent on X iff each X-value in R has precisely one Y-value in R associated with it.

A common notation is X → Y.
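The definition can be turned into a small check over a table instance. The sketch below is illustrative only: the function name `holds_fd` and the sample rows are assumptions, not part of the slides. Note that a check like this can only refute a dependency on the data at hand; whether X → Y really holds depends on the meaning of the data.

```python
def holds_fd(rows, x, y):
    """Return True if no two rows agree on X but differ on Y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # setdefault stores the first Y-value seen for this X-value
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical sample rows for the supplier table S
S = [
    {"S#": "S1", "Status": 20, "City": "London"},
    {"S#": "S2", "Status": 10, "City": "Paris"},
    {"S#": "S3", "Status": 10, "City": "Paris"},
    {"S#": "S4", "Status": 20, "City": "London"},
    {"S#": "S5", "Status": 30, "City": "Athens"},
]

print(holds_fd(S, ["S#"], ["City"]))  # True: each S# has one City
print(holds_fd(S, ["City"], ["S#"]))  # False: London has S1 and S4
```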

Page 7: Normalization:  Kroenke  Chapters 3 and 4

Example:
In the supplier table S, Status, City, and Name are functionally dependent on S#.
In the SP table, Qty is functionally dependent on the combined attributes S# and P#.

S# → Status, City, Name

(S#, P#) → Qty

Page 8: Normalization:  Kroenke  Chapters 3 and 4

City and status are not functionally dependent on each other.
There may be several entries containing ‘London’ but with different status values.
There may be several entries containing a status of 50 but with different cities.

Page 9: Normalization:  Kroenke  Chapters 3 and 4

QTY is not functionally dependent on either P# or S# alone.
S1 might have multiple QTY values for different parts.
Similarly for P1.

Page 10: Normalization:  Kroenke  Chapters 3 and 4

Def: Y is Fully Functionally Dependent on X if X → Y but Y is not functionally dependent on any proper subset of X.
In S: (S#, Status) → City, but not fully, because S# → City.
In SP: (S#, P#) → Qty, fully, because neither S# nor P# by itself determines Qty. The functional dependence requires BOTH S# and P#.
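A minimal sketch of the full-dependence test, assuming small in-memory tables: it checks X → Y and then tries every non-empty proper subset of X. The helper names and the sample rows are hypothetical.

```python
from itertools import combinations

def holds_fd(rows, x, y):
    """Return True if no two rows agree on X but differ on Y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

def is_full_fd(rows, x, y):
    """X -> Y holds, and no non-empty proper subset of X determines Y."""
    if not holds_fd(rows, x, y):
        return False
    for size in range(1, len(x)):
        for subset in combinations(x, size):
            if holds_fd(rows, list(subset), y):
                return False
    return True

# Hypothetical sample rows
S = [
    {"S#": "S1", "Status": 20, "City": "London"},
    {"S#": "S2", "Status": 10, "City": "Paris"},
    {"S#": "S3", "Status": 10, "City": "Paris"},
]
SP = [
    {"S#": "S1", "P#": "P1", "Qty": 300},
    {"S#": "S1", "P#": "P2", "Qty": 200},
    {"S#": "S2", "P#": "P1", "Qty": 300},
    {"S#": "S2", "P#": "P2", "Qty": 400},
]

print(is_full_fd(S, ["S#", "Status"], ["City"]))  # False: S# alone suffices
print(is_full_fd(SP, ["S#", "P#"], ["Qty"]))      # True
```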

Page 11: Normalization:  Kroenke  Chapters 3 and 4

Functional dependence is a semantic notion. You must understand the meaning of the data; it is NOT a consequence of the table data.
For example, suppose that all suppliers in the same city in S have the same status.
Is it coincidence or by design?

Page 12: Normalization:  Kroenke  Chapters 3 and 4

Why is this important?

What if we combined the relations S and SP into a single relation, First, as in the table a few slides back?

First(S#, P#, Status, City, QTY)
The primary key (underlined on the slide) is (S#, P#).

Page 13: Normalization:  Kroenke  Chapters 3 and 4

Insertion Anomalies:

Cannot enter the fact that a supplier is located in a city unless that supplier already supplies some part. Why?

Page 14: Normalization:  Kroenke  Chapters 3 and 4

Deletion Anomalies

Suppose S3 no longer supplies P2.
Delete (S3, P2, 10, Paris, 200). If that is the ONLY part S3 supplied, you lose the fact that S3 is in Paris.

Page 15: Normalization:  Kroenke  Chapters 3 and 4

Update Anomalies

S1 moves from London to Amsterdam. May have to update many entries.
Violates the “one fact, one place” guideline.
These problems are caused by dependencies on a proper subset of the primary key.
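The three anomalies can be reproduced on a small in-memory copy of First. This is only a sketch using Python's built-in `sqlite3` module; the table and column names (`first`, `s_no`, `p_no`) and all rows except (S3, P2, 10, Paris, 200) are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE first (
        s_no   TEXT NOT NULL,
        p_no   TEXT NOT NULL,
        status INTEGER,
        city   TEXT,
        qty    INTEGER,
        PRIMARY KEY (s_no, p_no)
    )
""")
conn.executemany("INSERT INTO first VALUES (?, ?, ?, ?, ?)", [
    ("S1", "P1", 20, "London", 300),  # hypothetical row
    ("S1", "P2", 20, "London", 200),  # hypothetical row
    ("S3", "P2", 10, "Paris", 200),   # the tuple used on the slide
])

# Insertion anomaly: a supplier with no part has no value for p_no,
# which is part of the primary key, so the row is rejected.
try:
    conn.execute(
        "INSERT INTO first (s_no, status, city) VALUES ('S5', 30, 'Athens')"
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Deletion anomaly: removing S3's only shipment also removes its city.
conn.execute("DELETE FROM first WHERE s_no = 'S3' AND p_no = 'P2'")
s3_cities = conn.execute(
    "SELECT city FROM first WHERE s_no = 'S3'"
).fetchall()

# Update anomaly: one fact (S1's city) is stored in several rows.
updated = conn.execute(
    "UPDATE first SET city = 'Amsterdam' WHERE s_no = 'S1'"
).rowcount

print(insert_rejected, s3_cities, updated)
```

In the S and SP design, each of these operations touches exactly one row of S.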

Page 16: Normalization:  Kroenke  Chapters 3 and 4

See also Kroenke’s example on page 95 and the text on page 96.

Page 17: Normalization:  Kroenke  Chapters 3 and 4

Second Normal Form (2NF)

A relation is 2NF iff it is 1NF and every non-key attribute is fully functionally dependent on the primary key.
That is, no non-key attribute is dependent on a proper subset of the primary key.

Page 18: Normalization:  Kroenke  Chapters 3 and 4

Table First is NOT 2NF. Some non-key attributes are not fully dependent on the primary key (S#, P#).
Some are dependent on S# only.
The S and SP tables ARE 2NF.
They are a better design in this case.
Similar example in Fig 3-10 on page 106 of the text.

Page 19: Normalization:  Kroenke  Chapters 3 and 4

How about this table?
Does it contain redundancy?
Are there update anomalies?

Page 20: Normalization:  Kroenke  Chapters 3 and 4

Transitive dependencies

Suppose a supplier’s status is determined by the supplier’s city.
That is, City → Status.
Since S# → City also holds, S# → Status follows from these two dependencies.
A transitive dependency exists, as shown below.

S# → City → Status
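One way to see which dependencies follow from others is to compute an attribute closure: start from a set of attributes and keep adding everything the known dependencies determine. The sketch below assumes dependencies are given as (left side, right side) pairs of sets; the function name `closure` is hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already determined
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [({"S#"}, {"City"}), ({"City"}, {"Status"})]

# Status is in the closure of S#, so S# -> Status holds transitively.
print(sorted(closure({"S#"}, fds)))    # ['City', 'S#', 'Status']
print(sorted(closure({"City"}, fds)))  # ['City', 'Status']
```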

Page 21: Normalization:  Kroenke  Chapters 3 and 4

Similarly, a Housing table that links a student with a dorm and a residence fee would also likely have a transitive dependency.

SID → dorm → Fee

Page 22: Normalization:  Kroenke  Chapters 3 and 4

Insertion anomalies:

Cannot state the fact that a supplier in Rome must have a status of 50 unless there is already a supplier there.
Cannot state the fact that a dorm has a specific cost unless there is already a student there.

Page 23: Normalization:  Kroenke  Chapters 3 and 4

Deletion anomalies

Delete (S5, 30, Athens). If it’s the ONLY Athens entry, you lose the fact that the status for Athens must be 30.
Delete (100, Randolph, $3200) from the Housing table. If that’s the only “Randolph” entry, you lose the connection between dorm and cost.

Page 24: Normalization:  Kroenke  Chapters 3 and 4

Update anomalies

“Change status of London supplier” may mean multiple updates.
Violates the “one fact, one place” rule, i.e. that each fact should be stored in one place.

Page 25: Normalization:  Kroenke  Chapters 3 and 4

Third Normal Form (3NF)

A relation is 3NF iff it is 2NF and every non-key attribute is nontransitively dependent on the primary key, i.e. non-key attributes are mutually independent.
Again, it’s a consequence of the meaning of the data, not the data itself.

Page 26: Normalization:  Kroenke  Chapters 3 and 4

Suppose all London suppliers had a status of 50.
Is that coincidence?
Is it by design?

Page 27: Normalization:  Kroenke  Chapters 3 and 4

Question:

Is 3NF better than 2NF? Maybe. In the cases presented here, probably so. An employee table where EmpID → Address → ZipCode is not 3NF. We may not care about Address → ZipCode unless it’s a UPS or Post Office application.

Page 28: Normalization:  Kroenke  Chapters 3 and 4

Table Decomposition

Dividing a table into 2 or more tables to achieve a higher normal form.
Previously we divided First into tables S and SP to achieve 2NF.

Page 29: Normalization:  Kroenke  Chapters 3 and 4

Now we find that S is not 3NF, so we should decompose S into two tables.
We have options:
1. SS(S#, Status) and CS(City, Status)
2. SC(S#, City) and SS(S#, Status) or
3. SC(S#, City) and CS(City, Status)

Which is best?

Page 30: Normalization:  Kroenke  Chapters 3 and 4

Need to ask:
Does the decomposition result in a loss of information?
For example, can we still relate the attributes that have been separated into two tables?
Are the two relations independent of each other?

Page 31: Normalization:  Kroenke  Chapters 3 and 4

Option 1:

SS(S#, Status) and CS(City, Status)

Cannot get the city of a supplier. Can you see why?

Page 32: Normalization:  Kroenke  Chapters 3 and 4

Option 2:

SC(S#, City) and SS(S#, Status)

Relations not independent. If two suppliers are in the same city, must make sure they have the same status. Requires monitoring of changes, possibly the use of triggers. Extra work.

Page 33: Normalization:  Kroenke  Chapters 3 and 4

CAN get the status of a city, but ONLY if there’s a supplier there. Otherwise there’s a loss of information.
Can’t store the status of a city unless there’s a supplier there.

Page 34: Normalization:  Kroenke  Chapters 3 and 4

Option 3:

SC(S#, City) and CS(City, Status)

The two relations are independent.
No loss of information.
Best option.
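The "loss of information" question for the three options can be tried out on sample data: project S onto each pair of tables, join them back, and compare with the original. The sketch below uses hypothetical rows in which City → Status holds and two cities share a status; the helper names are assumptions. It only tests whether the join reproduces S on this instance, so it separates Option 1 from the others but says nothing about independence, which is what distinguishes Option 3 from Option 2.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, dropping duplicates."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two sets of frozenset-encoded rows."""
    joined = set()
    for l_row in left:
        for r_row in right:
            l_dict, r_dict = dict(l_row), dict(r_row)
            common = l_dict.keys() & r_dict.keys()
            if all(l_dict[a] == r_dict[a] for a in common):
                joined.add(frozenset({**l_dict, **r_dict}.items()))
    return joined

def rejoins_exactly(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly rows."""
    original = {frozenset(row.items()) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# Hypothetical rows; London and Madrid happen to share status 20
S = [
    {"S#": "S1", "City": "London", "Status": 20},
    {"S#": "S2", "City": "Paris",  "Status": 10},
    {"S#": "S3", "City": "Paris",  "Status": 10},
    {"S#": "S4", "City": "Madrid", "Status": 20},
    {"S#": "S5", "City": "Athens", "Status": 30},
]

print(rejoins_exactly(S, ["S#", "Status"], ["City", "Status"]))  # Option 1: False
print(rejoins_exactly(S, ["S#", "City"], ["S#", "Status"]))      # Option 2: True
print(rejoins_exactly(S, ["S#", "City"], ["City", "Status"]))    # Option 3: True
```

In Option 1 the join on Status pairs S1 with both London and Madrid, which is why the city of a supplier cannot be recovered.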

Page 35: Normalization:  Kroenke  Chapters 3 and 4

Decompose the Housing table into one of:
1. SD(SID, Dorm) and DF(Dorm, Fee)
2. SD(SID, Dorm) and SF(SID, Fee)
3. SF(SID, Fee) and DF(Dorm, Fee)

Which is better?
Construct a similar argument.

SID → dorm → Fee

Page 36: Normalization:  Kroenke  Chapters 3 and 4

Best decomposition frequently follows the FD arrows.
This is a guideline, not an absolute rule.

Page 37: Normalization:  Kroenke  Chapters 3 and 4

Determinants

Consider SMA(SID, MID, AID) where a student has one advisor for a major and an advisor advises for one major.
This table is 3NF since there is only one non-key attribute.
S2 drops Physics and you may lose the fact that A3 advises for Physics.

SID  MID   AID
S1   Math  A1
S1   Phys  A2
S2   Math  A1
S2   Phys  A3

(SID, MID) → AID
AID → MID

Page 38: Normalization:  Kroenke  Chapters 3 and 4

Def: If Y is fully functionally dependent on X then X is a determinant.
Def: A tuple is an entry from a relation. The name is rooted in the historical development by E. F. Codd, who used mathematical models to describe relations.
Def: An attribute (or set of attributes) is a candidate key if it uniquely identifies a tuple. A primary key is chosen from the list of candidate keys.
Every candidate key is a determinant.

Page 39: Normalization:  Kroenke  Chapters 3 and 4

Boyce-Codd Normal Form (BCNF)

A relation is BCNF if every determinant is a candidate key.
SMA is NOT BCNF since AID is a determinant but not a candidate key.
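The BCNF test for SMA can be sketched with attribute closures. The code below assumes the two dependencies implied by the stated rules, (SID, MID) → AID and AID → MID, and uses the common formulation that the left side of every non-trivial dependency must determine all attributes of the relation. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Left sides of non-trivial FDs that do not determine every attribute."""
    return [
        lhs for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]

sma_attrs = {"SID", "MID", "AID"}
sma_fds = [({"SID", "MID"}, {"AID"}), ({"AID"}, {"MID"})]

print(bcnf_violations(sma_attrs, sma_fds))  # [{'AID'}]
```

AID determines only AID and MID, so it cannot identify a tuple, which matches the statement that AID is a determinant but not a candidate key.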

Page 40: Normalization:  Kroenke  Chapters 3 and 4

Possible decompositions:

SA(SID, AID) and AM(AID, MID). No loss, but the relations are not independent.
How do you “Find the major of S1”? It requires a search of two tables, which seems somewhat counterintuitive.

Page 41: Normalization:  Kroenke  Chapters 3 and 4

SM(SID, MID) and AM(AID, MID). Cannot get advisor of a student.

Page 42: Normalization:  Kroenke  Chapters 3 and 4

SA(SID, AID) and SM(SID, MID). Cannot get who advises what.

None of the three possible decompositions seems satisfactory.

Page 43: Normalization:  Kroenke  Chapters 3 and 4

Solution: Look at the bigger picture (E-R diagram).

[E-R diagram: entities Student (S), Advisor (A), and Major (M); the label “redundant” appears in the diagram]

Relations: S, M, A (with a foreign key matching the primary key in M), SM, and SA to implement the many-many relationships.

Page 44: Normalization:  Kroenke  Chapters 3 and 4

NOTE: With BOTH SM and SA, it is possible for inconsistency to occur. Could have (S1, M5) in SM; (S1, A3) in SA; and have M8 as a foreign key for advisor A3 in the Advisor table. Would need software or triggers to assure consistency which adds to overhead.

Page 45: Normalization:  Kroenke  Chapters 3 and 4

On the other hand, the relationship between S and M is derived from the relationships between S and A and between A and M. This provides an argument that the relationship between S and M should not be shown as a separate relationship.

Page 46: Normalization:  Kroenke  Chapters 3 and 4

Of course, then the fact that “a student is majoring in something” is NOT explicitly stored. The design is based on business rules which we assume to be correct. That may not always be the case.

Page 47: Normalization:  Kroenke  Chapters 3 and 4

Maybe the business rule that states “a student is majoring in something” is flawed.
It allows a student to choose a major without having an advisor first.

Page 48: Normalization:  Kroenke  Chapters 3 and 4

Perhaps a better rule is “a student has an advisor, which determines the major”. It would be a model that forces a student to choose an advisor, which may be a better rule since many students do NOT seek out advisors in a timely fashion.

Page 49: Normalization:  Kroenke  Chapters 3 and 4

Multivalued Dependencies (MVDs)

Consider SMA(Student, Major, Activity).
A student can have multiple majors and participate in multiple activities.

Page 50: Normalization:  Kroenke  Chapters 3 and 4

This relation is BCNF vacuously (there are no determinants).
Can’t store the major of a student unless that student has an activity.

Student  Major  Activity
S1       Math   Swimming
S1       Math   Football
S2       Phys   Baseball
S2       Math   Baseball

Page 51: Normalization:  Kroenke  Chapters 3 and 4

Another example: CIX(Courses, Instructor, teXt)

To implement training programs or corporate-sponsored courses.
A course can be taught by many instructors, and an instructor can lead many courses.
Similar for texts and courses.
Instructors do NOT choose texts.
There are NO determinants in this table.
Can’t store the text for a course unless there is an instructor.

Page 52: Normalization:  Kroenke  Chapters 3 and 4

Yet another example on page 95-96.

Page 53: Normalization:  Kroenke  Chapters 3 and 4

Def: Suppose A, B, and C are attributes of a relation R. A multivalued dependency (MVD) A →→ B holds in R if for each A value there is a set of B values which is independent of any C values.

Page 54: Normalization:  Kroenke  Chapters 3 and 4

Fourth Normal Form (4NF)

A relation is 4NF if it is BCNF and has no multivalued dependencies.
SMA is NOT 4NF. Would decompose into SM and SA. No loss since there’s no connection between M and A.
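The "no loss" claim for SMA(Student, Major, Activity) can be checked on the four sample rows from the earlier slide: split them into SM and SA, join on Student, and compare. This is a minimal sketch with Python sets; the student S3 used at the end is hypothetical.

```python
# The four rows of SMA(Student, Major, Activity) from the sample table
SMA = {
    ("S1", "Math", "Swimming"),
    ("S1", "Math", "Football"),
    ("S2", "Phys", "Baseball"),
    ("S2", "Math", "Baseball"),
}

# Decompose into SM(Student, Major) and SA(Student, Activity)
SM = {(s, m) for s, m, _ in SMA}
SA = {(s, a) for s, _, a in SMA}

# Join the two projections back together on Student
rejoined = {(s, m, a) for s, m in SM for s2, a in SA if s == s2}
print(rejoined == SMA)  # True: nothing lost, nothing spurious

# A major can now be stored for a student with no activity
SM.add(("S3", "Chem"))  # hypothetical student and major
```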

Page 55: Normalization:  Kroenke  Chapters 3 and 4

CIX is NOT 4NF. Would decompose into CI and CX. No loss since there’s no connection between I and X.

Page 56: Normalization:  Kroenke  Chapters 3 and 4

There is a 5NF, but we will not cover it. Cases that require it rarely occur in practice.

Page 57: Normalization:  Kroenke  Chapters 3 and 4

Domain/Key Normal Form (DK/NF)

Landmark paper: Ronald Fagin, “A Normal Form for Relational Databases That Is Based on Domains and Keys”, ACM Transactions on Database Systems, September 1981.

Page 58: Normalization:  Kroenke  Chapters 3 and 4

In this paper he:
Defined DK/NF.
Proved that a relation in DK/NF has NO modification anomalies.
Proved that a relation having no modification anomalies must be in DK/NF.

Page 59: Normalization:  Kroenke  Chapters 3 and 4

What is it? First, some definitions:

Constraint: a rule governing static values of attributes. e.g. rules such as 0<=gpa<=4; credits >=0; functional dependencies; multivalued dependencies.

Page 60: Normalization:  Kroenke  Chapters 3 and 4

key: unique identifier of a tuple.

domain: description of an attribute’s allowable values

Page 61: Normalization:  Kroenke  Chapters 3 and 4

Def: A relation is DK/NF (Domain Key Normal Form) if every constraint is a logical consequence of the definition of its keys and domains.
Without an example this probably makes little sense.

Page 62: Normalization:  Kroenke  Chapters 3 and 4

Ex. (from a previous edition of Kroenke):

Track students, faculty, and who advises whom.Possible relations:

Student(SID, Sname, FID) and

Faculty(FID, Fname, FacStatus)

Page 63: Normalization:  Kroenke  Chapters 3 and 4

Constraints

FacStatus = 0 or 1 (undergrad/graduate).
FID begins with 1.
SID must not begin with 1.
SID of grad students begins with 9.
Only graduate faculty can advise graduate students.

Page 64: Normalization:  Kroenke  Chapters 3 and 4

Alternative constraint statement: “Grad student must be advised by Grad Faculty”, or “If SID starts with 9 then FacStatus of the advisor must be 1”.

Difficult to enforce through the database design since the relevant data lies in two distinct tables.
Each relation is still 1NF through 4NF.

Page 65: Normalization:  Kroenke  Chapters 3 and 4

Decomposing tables:

Kroenke discussed themes. Each relation has a theme. 3 themes here:

Faculty
Grad advising
Undergrad advising

Page 66: Normalization:  Kroenke  Chapters 3 and 4

Possible Tables:
Faculty(FID, Fname, FacStatus)
G-ADV(GSID, Sname, GFID)
UG-ADV(UGSID, Sname, FID)

Page 67: Normalization:  Kroenke  Chapters 3 and 4

Domain Definitions

FID in CDDD where C=1; D=decimal digit

This is a generic notation for our purposes here. In Access you’d write: FID like “1###”. In SQL Server you’d write: FID like “1[0-9][0-9][0-9]”. (See the F-Adv table in the university database.)

Page 68: Normalization:  Kroenke  Chapters 3 and 4

Fname in CHAR(30)
FacStatus in [0, 1]
GSID in CDDD where C=9; D=decimal digit
UGSID in CDDD where C!=1; C!=9; D=decimal digit
See the G-Adv and UG-ADV tables for exact syntax.

Page 69: Normalization:  Kroenke  Chapters 3 and 4

Sname in CHAR(30)
GFID in {Select FID of Faculty, where FacStatus=1} (assuming the DBMS supports this type of constraint)
There is a trigger in G-Adv to implement the equivalent of this.
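As a rough illustration outside any particular DBMS, the CDDD domains can be written as regular expressions and the GFID domain as a lookup against the Faculty data. The pattern names, the helper functions, and the sample faculty entries below are all hypothetical; this is not the syntax used in the G-Adv or UG-ADV tables.

```python
import re

# CDDD domains: one leading character C followed by three decimal digits
FID_DOMAIN = re.compile(r"1[0-9]{3}")        # C = 1
GSID_DOMAIN = re.compile(r"9[0-9]{3}")       # C = 9
UGSID_DOMAIN = re.compile(r"[02-8][0-9]{3}") # C is a digit other than 1 or 9

def in_domain(pattern, value):
    """True if the whole value matches the domain pattern."""
    return pattern.fullmatch(value) is not None

# Hypothetical Faculty data: FID -> FacStatus (1 = graduate faculty)
faculty = {"1001": 1, "1002": 0}

def valid_gfid(fid):
    """GFID must be the FID of a faculty member with FacStatus = 1."""
    return in_domain(FID_DOMAIN, fid) and faculty.get(fid) == 1

print(in_domain(GSID_DOMAIN, "9123"))   # True
print(in_domain(UGSID_DOMAIN, "9123"))  # False: grad IDs start with 9
print(valid_gfid("1001"))               # True
print(valid_gfid("1002"))               # False: not graduate faculty
```

Because the graduate-advising rule is expressed as the domain of GFID, it no longer needs a separate cross-table constraint.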

Page 70: Normalization:  Kroenke  Chapters 3 and 4

Relations & Keys:

All constraints are met by enforcing key and domain restrictions, i.e. it is DK/NF.
This relation is guaranteed to have NO modification anomalies.

Page 71: Normalization:  Kroenke  Chapters 3 and 4

Examples:

A company hires student interns to work on various projects under the guidance of company employees. Semantics are as follows:

A student intern can work on several projects and a project can use several interns.
A project can have several team leaders (or co-leaders) which are company employees, but an employee works on only one project.
For each project on which an intern participates there is one team leader to which that intern must report.
Consider a table, IPE, consisting of 3 attributes: Intern ID (I#), Project ID (P#), and employee ID (E#). So for example, if (I4, P5, E3) is an entry in this table then it means that Intern I4 is working on project P5 and must report to team leader E3 for that project.

Page 72: Normalization:  Kroenke  Chapters 3 and 4

List FDs; what should the primary key be? What is the lowest normal form that is violated?

(I#, P#) → E# and E# → P#
Primary key should be (I#, P#).
Violates BCNF.

Page 73: Normalization:  Kroenke  Chapters 3 and 4

(I#, P#) → E# and E# → P#

Consider three possible decompositions of the above relation as follows. Primary keys are underlined on the slides.

Table IE(I#, E#) and IP(I#, P#).
IE might contain (I2, E4) and (I2, E6); IP might contain (I2, P3) and (I2, P5).
What project does E4 work on?
Lose the project to which an employee is assigned.

Page 74: Normalization:  Kroenke  Chapters 3 and 4

(I#, P#) → E# and E# → P#

Table IE(I#, E#) and EP(E#, P#).
Since E# → P#, we can get the project that an intern is working on through a join of these two tables.

Page 75: Normalization:  Kroenke  Chapters 3 and 4

(I#, P#) → E# and E# → P#

Table EP(E#, P#) and IP(I#, P#).
EP might contain (E2, P4) and (E4, P4); IP might contain (I2, P4).
To which employee does I2 report?
Lose the employee to which an intern reports.
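The three IPE decompositions can be compared by projecting a small instance and joining the pieces back. The rows below are hypothetical (they reuse I2, E4, E6, P3, and P5 from the first decomposition and add one more intern) and respect both (I#, P#) → E# and E# → P#.

```python
# Hypothetical IPE rows as (intern, project, employee)
IPE = {("I2", "P3", "E4"), ("I2", "P5", "E6"), ("I3", "P3", "E7")}

# The three binary projections used in the decompositions
IE = {(i, e) for i, p, e in IPE}
IP = {(i, p) for i, p, e in IPE}
EP = {(e, p) for i, p, e in IPE}

# Join each pair of projections on their shared attribute
join_ie_ip = {(i, p, e) for i, e in IE for i2, p in IP if i == i2}
join_ie_ep = {(i, p, e) for i, e in IE for e2, p in EP if e == e2}
join_ep_ip = {(i, p, e) for e, p in EP for i, p2 in IP if p == p2}

print(join_ie_ip == IPE)  # False: cannot tell which leader goes with which project
print(join_ie_ep == IPE)  # True: E# -> P# makes the join on E# exact
print(join_ep_ip == IPE)  # False: cannot tell which leader an intern reports to
```

Only the decomposition that follows the dependency E# → P# rebuilds the original rows.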

Page 76: Normalization:  Kroenke  Chapters 3 and 4

Assume the following scenario in a university in which a student is paid by a department to do work for a faculty member. Semantics are as follows:

A department can hire many students and a faculty member can have many students working for him/her.
Each student can work for only one faculty member and is paid through the faculty member’s department budget.
A department has many faculty members but each faculty member is a member of one department.
There is a table consisting of 3 attributes: Student ID (SID), Department ID (DID), and Faculty ID (FID). So for example, if (S4, D5, F3) is an entry in this table then it means that Student S4 is working for faculty member F3 who, in turn, is a member of department D5.

Page 77: Normalization:  Kroenke  Chapters 3 and 4

List FDs; what should the primary key be? What is the lowest normal form that is violated?

SID → FID → DID
Primary key should be SID.
Violates 3NF.

Page 78: Normalization:  Kroenke  Chapters 3 and 4

SID → FID → DID

Consider three possible decompositions of the above relation as follows:

SF(SID, FID) and SD(SID, DID): By doing a join between these tables, you can get the department of a faculty member.
But only IF the faculty member has a student employee.
Also, the tables are not independent.

Page 79: Normalization:  Kroenke  Chapters 3 and 4

SID → FID → DID

SF(SID, FID) and FD(FID, DID): By doing a join between these tables, you can get the department that is paying the student.

Page 80: Normalization:  Kroenke  Chapters 3 and 4

SID → FID → DID

FD(FID, DID) and SD(SID, DID).
Can you construct an example that shows you may not get the faculty member for whom a student is working?

Page 81: Normalization:  Kroenke  Chapters 3 and 4

From a previous exam

An organization needs to track many ongoing projects, the department responsible for each project, and which employees are project leaders. Rules are as follows:

Each project is the responsibility of a single department.
Each project has one project leader who is a member of the department responsible for the project.
A department can have many employees and be responsible for many projects.
An employee can be a project leader for several projects.
Each employee is assigned to one department.

Proceed as in the previous slides