CS 338 Functional Dependencies · Cannot determine which functional dependencies hold ... Lossless-Join BCNF Decomposition Example R = {Sno,Sname,City,Pno,Pname,Price }

Winter 2017

Bojana Bislimovska

CS 338

Functional Dependencies

Major research

• Design Guidelines for Relation Schemas

• Functional Dependency

• Set and Attribute Closure

• Schema Decomposition

• Boyce-Codd Normal Form

Outline

Major research

Measures to determine the quality of relation schema design:

• Clear semantics of attributes in the schema

• Reduction of redundant information in tuples

• Reduction of NULL values in tuples

• Disallowment of generating spurious tuples

Above-mentioned measures are not always independent of one another

Design Guidelines for Relation Schemas

Major research

Guideline 1: Design a relation schema so that is easy to explain its meaning.

• Do not combine attributes from multiple entity types and relationship types into a single relation

• Example of violating guideline 1

• Every tuple includes employee and department data

• Redundant repetition of Dname and Manager Ssn for every employee (increases storage space)

• Potential for many NULL values (employee without department, department without employees)


Major research

• Update anomalies

Deleting the last employee in a dept should not delete dept

Changing the dept name/mgr requires many tuples to be updated

Inserting employees requires checking for consistency of its dept name and manager

• Guideline 2: Design base relation schemas so that no insertion, deletion, or modification anomalies are present

• Consistent with guideline 1


Major research

• NULL values in tuples

• If some attributes do not apply to all tuples in the relation

• Multiple interpetation of NULL values

Guideline 3: Avoid placing attributes in a base relation whose values may frequently be NULL. If NULLs are unavoidable, make sure they apply in exceptional cases, and do not apply to a majority of tuples in the relation.


Major research

• Spurious tuples

Guideline 4: Design relation schemas so that they can be joined with equality conditions on appropirately related attribute pairs (primary key, foreign key) that guarantees that no spurious tuples are generated.


Major research

Functional Dependency

• Constraint between two sets of attributes from the database

• Expresses the fact that in a relation schema (values of) a set of attributes uniquely determine (values of) another set of attributes

Definition: Given relation schema R(A1,A2,…,An) and sets of attributes X {A1,A2,…,An}, Y {A1,A2,…,An}, X Y specifies the following constraint (functional dependency): for any tuples t1 and t2 in any valid relation state r of R, if t1[X] = t2[X] then t1[Y] = t2[Y] .

• A functional dependency is a property of the semantics (meaning) of the attributes

• Main use in the further description of a relation schema by specifying constraints on its attributes that must always hold

• Given a relation state

Cannot determine which functional dependencies hold

Can state that functional dependency does not hold if there are tuples that show violation of the dependency

Major research

Functional Dependency

• Notation: {B1,B2,…,Bi} {C1,C2,…,Cj} but can omit set braces if i=1 or j=1, respectively.

• Example

Following functional dependencies hold

Ssn Ename

Pnumber {Pname, Plocation}

{Ssn, Pnumber} Hours

Major research

Functional Dependencies and Keys

• Keys (as defined previously):

A superkey is a set of attributes such that no two tuples (in an instance) agree on their values for those attributes.

A candidate key is a minimal superkey.

A primary key is a candidate key chosen by the DBA

• Relating keys and FDs:

If K ⊆ R is a superkey for relatio s he a R, the depe de K → R holds on R.

If depe de K → R holds o R a d we assu e that R does ot o tai duplicate tuples (i.e. relational model) then K ⊆ R is a superkey for relation schema R

Major research

Closure of FD Sets

How do we know what additional FDs hold in a schema?

• The closure of the set of functional dependencies F (denoted F +) is the set of all functional dependencies that are satisfied by every relational instance that satisfies F.

• Informally, F + includes all of the dependencies in F, plus any dependencies they imply.

Major research

Computing Attribute Closures

• If we want to derive the maximal set of attributes functionally determined by some X (called the attribute closure of X).

function ComputeX +(X, F)

begin

X + := X;

while true do

if there e ists Y → ) ∈ F such that

(1) Y ⊆ X +, and

(2) Z ⊆ X +

then X + := X + ∪ Z

else exit;

return X +;

end

Major research

Computing Attribute Closures

• Let R be a relational schema and F a set of functional dependencies on R. Then

• Theorem:

X is a superkey of R if and only if ComputeX +(X, F) = R

• Theorem:

X → Y ∈ F + if and only if Y ⊆ ComputeX +(X, F)

Major research

Attribute Closure Example

Example: F = { Ssn → EName

Pnum → Pname, Ploc

Ploc, Hours → Allowa e }

ComputeX +({Pnum,Hours},F):

FD X+

initial Pnum, Hours

Pnum→ Pname, Ploc Pnum, Hours, Pname, Ploc

Ploc, Hours→ Allowa e Pnum, Hours, Pname, Ploc, Allowance

function ComputeX +(X, F)

begin

X + := X;

while true do

if there e ists Y → ) ∈ F such that

(1) Y ⊆ X +, and

(2) Z ⊆ X +

then X + := X + ∪ Z

else exit;

return X +;

end

Major research

Schema Decomposition

Let R be a relation schema (= set of attributes). The collection {R1, . . . , Rn} of relation schemas is a decomposition of R if

R = R1 ∪ R2 ∪ · · · ∪ Rn

A good decomposition does not

• lose information

• complicate checking of constraints

• contain anomalies (or at least contains fewer anomalies)

Major research

Lossless-Join Decompositions

• We should be able to reconstruct the instance of the original table from the instances of the tables in the decomposition

Example: Consider replacing

By decomposing into two tables

Student Assignment Group Mark

Ann A1 G1 80

Ann A2 G3 60

Bob A1 G2 60

Student Group Mark

Ann G1 80

Ann G3 60

Bob G2 60

Assignment Mark

A1 80

A2 60

A1 60

Major research


Computing natural join of SGM and AM produces

. . . and we get extra data (spurious tuples). We would therefore lose information if we were to replace the first table, Marks, by SGM and AM.

If re-joining SGM and AM would always produce exactly the tuples in Marks, then we call SGM and AM a lossless-join decomposition

Student Assignment Group Mark

Ann A1 G1 80

Ann A2 G3 60

Ann A1 G3 60

Bob A2 G2 60

Bob A1 G2 60

Major research


• A decomposition {R1, R2} of R is lossless if and only if the common attributes of R1 and R2 form a superkey for either s he a, that is R ∩ R → R or R ∩ R → R2

• Example: In the previous example we had

R = {Student, Assignment, Group, Mark} ,

F = { “tude t, Assig e t → Group, Mark } , R1 = {Student, Group, Mark} ,

R2 = {Assignment, Mark}

• Decomposition {R1, R2} is lossy e ause R ∩ R = {Mark} is ot a superkey of either {Student, Group, Mark} or {Assignment, Mark}

Major research

Normal Forms

What is a good relatio al data ase s he a? • Rule of thumb: Independent facts in separate tables:

Each relation schema should consist of a primary key and a set of mutually independent attri utes

This is achieved by transforming a schema into a normal form.

• Goals:

Intuitive and straightforward transformation

Anomaly-free/Nonredundant representation of data

Major research

Boyce-Codd Normal Form (BCNF) - Informal

• BCNF formalizes the goal that in a good database schema, independent relationships are stored in separate tables.

• Given a database schema and a set of functional dependencies for the attributes in the schema, we can determine whether the schema is in BCNF. A database schema is in BCNF if each of its relation schemas is in BCNF.

• Informally, a relation schema is in BCNF if and only if any group of its attributes that functionally determines any others of its attributes functionally determines all others, i.e., that group of attributes is a superkey of the relation.

Major research

Formal Definition of BCNF

Let R be a relation schema and F a set of functional dependencies.

Schema R is in BCNF (w.r.t. F) if and only if

whenever (X → Y ) ∈ F + and XY ⊆ R, then either

• (X → Y ) is trivial (i.e., Y ⊆ X), or

• X is a superkey of R

A database schema {R1, . . . , Rn} is in BCNF if each relation schema Ri is in BCNF.

Major research

BCNF and Redundancy

• Why does BCNF avoid redundancy? Consider:

Supplied_Items

• The following functional dependency holds: Sno → Sname, City

• Therefore, supplier name “ ith a d it Aja ust e repeated for ea h ite supplied supplier S1.

• Assume the above FD holds over a schema R that is in BCNF. This implies that:

Sno is a superkey for R

each Sno value appears on one row only

no need to repeat Sname and City values

Sno Sname City Pno Pname Price

Major research

Lossless-Join BCNF Decomposition

function DecomposeBCNF(R, F)

begin

Result := {R};

while some Ri ∈ Result a d X → Y ∈ F +

violate the BCNF condition do begin

Replace Ri by Ri − Y − X ; Add {X, Y } to Result;

end;

return Result;

end

Major research

Lossless-Join BCNF Decomposition

• No efficient procedure to do this exists.

• Results depend on sequence of FDs used to decompose the relations.

• It is possible that no lossless join dependency preserving BCNF decomposition exists

Major research

Lossless-Join BCNF Decomposition Example

• R = {Sno,Sname,City,Pno,Pname,Price}

• functional dependencies:

Sno → Sname,City

Pno → Pname

Sno,Pno → Pri e • This schema is not in BCNF because, for example, Sno determines Sname and

City, but is not a superkey of R.

Major research

Lossless-Join BCNF Decomposition Example

• The complete schema is now

R1 = {Sno,Sname,City}

R2 = {Sno,Pno,Price}

R3 = {Pno,Pname}

• This schema is a lossless-

join, BCNF decomposition of

the original schema R.

Documents

CS 338 Functional Dependencies · Cannot determine which functional dependencies hold ... Lossless-Join BCNF Decomposition Example R = {Sno,Sname,City,Pno,Pname,Price }