Upload
hadar
View
43
Download
0
Embed Size (px)
DESCRIPTION
Chapter 7: Relational Database Design. Where are we? We can build an ER model for information in an enterprise convert it to relation schema (tables) Are we done? (no) Relational database design requires that we find a “good” collection of relation schemas Design goals: - PowerPoint PPT Presentation
Citation preview
1.1
Chapter 7: Relational Database DesignChapter 7: Relational Database Design
Where are we?
We can build an ER model for information in an enterprise
convert it to relation schema (tables)
Are we done? (no)
Relational database design requires that we find a “good” collection of relation schemas
Design goals: avoid redundant data
ensure that relationships among entities are represented
facilitate checking updates for violation of integrity constraints
1.2
ExampleExampleLending_schema =
(branch_name, branch_city, assets, customer_name, loan_number, amount)
Redundancy: data for branch_name, branch_city, assets are repeated for each loan that a branch makes wasted space
complicated updating with possibility of inconsistency of assets value
Using nulls to store information about a branch if no loans exist
Normalization: decomposing into more relations, e.g.
(branch_name, branch_city, assets) and
(branch_namecustomer_name, loan_number, amount)
1.3
7.1.1 Design Alternative: Larger Schemas7.1.1 Design Alternative: Larger Schemas(not usually a good idea)(not usually a good idea)
Suppose we combine borrower and loan to get
bor_loan = (customer_id, loan_number, amount )
Result is possible repetition of information (L-100 in example below)
Normally db design goes from the one table to two
1.4
More Important: Decomposing SchemasMore Important: Decomposing Schemas
Start with bor_loan = (customer_id, loan_number, amount )
If more than one customer has the same loan, the amount is repeated
Option: split (decompose) it into borrower (customer_id, loan_number) and loan= (loan_number, amount )
How would we know to do this?
The value of amount depends on loan_number
called a functional dependency:
loan_number amount
This suggests decomposing bor_loan into borrower and loan
Nice property: borrower loan = bor_loan
1.5
Not all Decompositions are GoodNot all Decompositions are Good
employee = (employee_id, employee_name,telephone_number,
start_date) decomposes into
employee1
= (employee_id, employee_name)
employee2
= (employee_name, telephone_number,
start_date)
We lose information:
employee1 employee2 ≠ employee
Can’t reconstruct the original employee
relation
This is a lossy decomposition
1.6
Normal FormsNormal Forms
Requirements on relations that reflect good design
Most basic: first normal form
A domain is atomic if its elements are considered indivisible units non-atomic: a set of names, composite attributes
Relational schema R is in first normal form if the domains of all attributes are atomic
Non-atomic values encourage redundant data and complicate storage e.g. multivalued attribute children might be stored with each parent
example from text: employeeID concatenates dept# and unique ID
These are two different pieces of information that might need to be split apart
Assume all relations are in first normal form e.g. person(name, birthday) should be defined as
person(name, b_month, b_year, b_day)
1.7
Decide what good form means
If R is not in good form, decompose it into a set of relations
{R1, R2, ..., Rn}
such that
each Ri is in good form
the decomposition is a lossless-join decomposition
preferably, the decomposition is dependency preserving
Theory is based on functional dependencies generalization of the notion of a key
allow us to decompose a relation meaningfully
Goal:Goal: A Theory A Theory
1.8
Functional DependenciesFunctional Dependencies
Constraints on the set of legal relations
When the value for a set of attributes uniquely determines the value for another set of attributes e.g. city and state imply the value for sales_tax in
(city_name, state_name, sales_tax)
expressed as
city_name, state_name sales_tax
Does every relation have a functional dependency? primary key any set of attributes
trivial dependencies (satisfied by all instances of a relation):
customer_name, loan_number customer_name
customer_name customer_name
is trivial if
1.9
Functional Dependencies (cont.)Functional Dependencies (cont.)
Let R be a relation schema and R , R
The functional dependency
holds on R if and only if for any legal relations r(R), when any two tuples t1 and t2 of r agree on the attributes , they also agree on the attributes :
t1[] = t2 [] t1[ ] = t2 [ ]
Example: Consider r(A,B ) with the following instance of r
On this instance, A B does not hold, but B A does
Does this make B A a functional dependency?
branch_name branch_city holds on slide 2. Will it always?
1 41 53 7
1.10
Functional Dependencies (cont.)Functional Dependencies (cont.)
Revisiting candidate and superkeys:
K is a superkey for relation schema R iff K R
K is a candidate key for R iff
K R, and
for no K, R
Some constraints cannot be expressed using superkeys.
Loan_info_schema =
(branch_name, loan_number, customer_name, amount)
might have functional dependency
customer_name branch_name
but customer_name is not a superkey
Functional dependencies can express constraints such as this
Updates to the db can be tested to see if they satisfy the given set of functional dependencies
1.11
Functional Dependencies (cont.)Functional Dependencies (cont.)
A specific instance of a schema may satisfy a functional dependency, even if the functional dependency does not hold on all legal instances. e.g., some, but not all, instances of loan_schema may satisfy
loan_number customer_name.
Database design goal: identify and enforce FDs that should hold
1.12
Boyce-Codd Normal FormBoyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form
, where R and R,
at least one of the following holds: is trivial (i.e., )
is a superkey for R
Example: R = (A, B, C), key = {A}
F = {A B B C}
R is not in BCNF (why?)
Decompose R into R1 = (A, B), R2 = (B, C)
R1 and R2 are in BCNF (why?)
This decomposition is dependency-preserving
1.13
Decomposing a Schema into BCNFDecomposing a Schema into BCNF
Suppose we have a schema R and a non-trivial dependency causes a violation of BCNF. Decompose R into: (U )
( R - ( - ) )
Back to the example
R = (A, B, C), key = {A}
F = {A B, B C}
B C causes the violation, so
= B
= C
and R = (A, B, C) is replaced by
(U ) = ( B,C )
( R - ( - ) ) = ( A, B )
1.14
BCNF and Dependency PreservationBCNF and Dependency Preservation
Consistency constraints need to be checked with database update
E.g. attribute types
uniqueness of the primary key
foreign key/primary key consistency
others expressed in SQL: assertions, triggers, check constraints
functional dependencies (dependency preserving)
Functional dependencies that span more than one relation are costly to check
Example: R = (A, B, C), key = {A} F = {A B, B C}:
R1(A,B), R2(B,C) – can check dependencies easily
R1(A,B), R2(A,C) – need to join R1 and R2 to check B C
1.15
BCNF ExampleBCNF Example
R = (branch_name, branch_city, assets, customer_name, loan_number, amount)
F = {branch_name assets, branch_city loan_number amount, branch_name}
Key = {loan_number, customer_name}
1. R is not BCNF, use branch_name assets, branch_city to obtain
R1 = (branch_name, branch_city, assets)
R2 = (branch_name, customer_name, loan_number, amount)
2. R2 is not BCNF, use loan_number amount, branch_name
R3 = (branch_name, loan_number, amount)
R4 = (customer_name, loan_number)
3. Final decomposition: R1, R3, R4
1.16
Third Normal Form: MotivationThird Normal Form: Motivation
In some situations BCNF is not dependency preserving, and
efficient checking for FD violation on updates is important
Example
R = (J, K, L) F = {JK L, L K}candidate keys are JK and JL R is not in BCNF (L is not a superkey)
possible decomposition: R1 = (L, K), R2 = (L, J)
does not preserve JK L
Solution: a weaker normal form, third normal form (3NF) allows some redundancy (with resultant problems)
but functional dependencies can be checked on individual relations without computing a join
There is always a 3NF lossless-join, dependency-preserving decomposition
1.17
Third Normal FormThird Normal Form
Relation schema R is in third normal form (3NF) if for all:
in F+
at least one of the following holds: is trivial (i.e., )
is a superkey for R
each attribute A in – is contained in a candidate key for R.
(each attribute may be in a different candidate key)
If a relation is in BCNF it is in 3NF one of the first two conditions above always holds in BCNF
Third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later)
1.18
3NF Example3NF Example
R = (J, K, L )F = {JK L, L K }
Candidate keys JK, JL
R is in 3NF JK L JK is a superkey
L K K is contained in a candidate key
Possible problems in this schema: repetition of information
(e.g., the relationship l1, k1)
need to use null values (e.g., to represent the relationship l2, k2 where there is no corresponding value for J)
Jj1
j2
j3
null
Ll1
l1
l1
l2
Kk1
k1
k1
k2
1.19
Design GoalsDesign Goals
Goal for a relational database design is: BCNF
lossless join
dependency preservation.
If we cannot achieve this, we accept one of lack of dependency preservation
redundancy due to use of 3NF
Next step: functional-dependency theory a formal theory that tells us which functional dependencies are
implied logically by a given set of functional dependencies
algorithms to generate lossless decompositions into BCNF and 3NF
algorithms to test if a decomposition is dependency-preserving
1.20
Closure of a Set of FDsClosure of a Set of FDs
Given a set F set of FDs, other FDs are logically implied by F example: If A B and B C, we can infer that A C
The set of all functional dependencies logically implied by F is the closure of F, denoted F+.
We can find all of F+ by applying Armstrong’s Axioms: if , then (reflexivity)
if , then (augmentation)
if , and , then (transitivity)
These rules are sound (generate only functional dependencies that actually hold) and
complete (generate all functional dependencies that hold)
1.21
ExampleExample
R = (A, B, C, G, H, I)F = { A B, A C, CG H, CG I, B H}
Some members of F+
A H
by transitivity from A B and B H
AG I
by augmenting A C with G, to get AG CG and then transitivity with CG I
CG HI
by augmenting CG I to infer CG CGI,
and augmenting of CG H to infer CGI HI,
and then transitivity
1.22
Procedure for Computing FProcedure for Computing F++
To compute the closure of a set of functional dependencies F:
F + = Frepeat
for each functional dependency f in F+
apply reflexivity and augmentation rules on f add the resulting functional dependencies to F +
for each pair of functional dependencies f1and f2 in F +
if f1 and f2 can be combined using transitivity then add the resulting functional dependency to
F +
until F + does not change any further
Alternative procedure for this task later
CS 325 spring ‘02
1.23
Example AgainExample Again
R = (A, B, C, G, H, I)F = { A B
A CCG HCG I B H}
reflexivity and augmentation give AA, ... ,AG BG, AG CG,
CGI HI, CG CGI,
transitivity gives A H from A B and B H
CG HI from CG CGI, and CGI HI
... (takes forever)
1.24
Functional Dependencies Functional Dependencies Simplify manual computation of F+ by additional rules
If and hold, then holds (union)
If holds, then holds and holds (decomposition)
If and hold, then holds (pseudotransitivity)
These rules can be inferred from Armstrong’s axioms
The closure under F of a set of attributes, (denoted by +) as the set of attributes that are functionally determined by under F
Algorithm to compute +, the closure of under F
result := ;while (changes to result) do
for each in F dobeginif result then result := result end
1.25
Example of Attribute Set ClosureExample of Attribute Set Closure
R = (A, B, C, G, H, I)
F = {A BA C CG HCG IB H}
(AG)+
1. result = AG
2. result = ABCG (A C and A B)
3. result = ABCGH (CG H and CG AGBC)
4. result = ABCGHI (CG I and CG AGBCH)
Is AG a candidate key? 1. Is AG a super key?
2. Does AG R? == Is (AG)+ R
Is any subset of AG a superkey?1. Does A R? == Is (A)+ R
2. Does G R? == Is (G)+ R
1.26
Uses of Attribute ClosureUses of Attribute Closure
Testing for superkey: to test if is a superkey, compute +, and check if + contains all
attributes of R
Testing functional dependencies to check if a functional dependency holds (or, in other words,
is in F+), just check if +.
that is, compute + by using attribute closure, and then check if it contains .
a simple, useful, and cheap testl
Computing F+
for each R, find the closure +, and for each S +, output a functional dependency S.
CS 325 spring ‘02
1.27
Finding F+Finding F+
R = (A, B, C, G, H, I)F = { A B, A C, CG H, CG I, B H}
Find the closure of attribute combinations A+ = ABCH
B+ = BH
C+ = C
G+ = G
H+ = H
I+ = I
CG+ = CGHI ...
Subset of F+
A ABCH
B BH
CG CGHI
1.28
Canonical CoverCanonical Cover
Sets of functional dependencies may have redundant dependencies that can be inferred from the others For example: A C is redundant in: {A B, B C}
Parts of a functional dependency may be redundant
E.g.: on RHS: {A B, B C, A CD} can be simplified to {A B, B C, A D}
E.g.: on LHS: {A B, B C, AC D} can be simplified to {A B, B C, A D}
Intuitively, a canonical cover of F is a minimal set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies
Why do we want this? To have a minimal set to check when performing an update.
1.29
Extraneous AttributesExtraneous Attributes
Consider a set F of functional dependencies and the functional dependency in F. Attribute A is extraneous in if A
and F logically implies (F – { }) {( – A) }.
Attribute A is extraneous in if A and the set of functional dependencies (F – { }) { ( – A)} logically implies F.
Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always implies a weaker one
Example: F = {A C, AB C } B is extraneous in AB C because {A C, AB C} logically
implies A C (I.e. the result of dropping B from AB C).
Example: F = {A C, AB CD} C is extraneous in AB CD since AB C can be inferred even
after deleting C
1.30
Testing if an Attribute is ExtraneousTesting if an Attribute is Extraneous
Consider a set F of functional dependencies and the functional dependency in F.
To test if attribute A is extraneous in
1. compute ({} – A)+ using the dependencies in F
2. check that ({} – A)+ contains ; if it does, A is extraneous in
To test if attribute A is extraneous in
1. compute + using only the dependencies in F’ = (F – { }) { ( – A)},
2. check that + contains A; if it does, A is extraneous in
1.31
Canonical CoverCanonical Cover
A canonical cover for F is a set of dependencies Fc such that F logically implies all dependencies in Fc, and
Fc logically implies all dependencies in F, and
no functional dependency in Fc contains an extraneous attribute, and
each left side of functional dependency in Fc is unique.
To compute a canonical cover for F:repeat
Use the union rule to replace any dependencies in F 1 1 and 1 2 with 1 1 2
Find a functional dependency with an extraneous attribute either in or in
If an extraneous attribute is found, delete it from until F does not change
(Union rule may apply again after deleting extraneous attributes)
1.32
Computing a Canonical CoverComputing a Canonical Cover
R = (A, B, C)F = {A BC
B C A BAB C}
Combine A BC and A B into A BC Set is now {A BC, B C, AB C}
A is extraneous in AB C check if the result of deleting A from AB C is implied by the other dependencies
yes: in fact, B C is already present!
set is now {A BC, B C}
C is extraneous in A BC check if A C is logically implied by A B and the other dependencies
yes: using transitivity on A B and B C.
can use attribute closure of A in more complex cases
The canonical cover is: A BB C
1.33
Lossless-join DecompositionLossless-join Decomposition
For the decomposition R = (R1, R2), we require that for all
possible relations r on schema R
r = R1 (r ) R2 (r )
(lossless join)
A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+:
R1 R2 R1
R1 R2 R2
1.34
ExampleExample
R = (A, B, C)
F = {A B, B C) can be decomposed in two different ways
R1 = (A, B), R2 = (B, C)
lossless-join decomposition:
R1 R2 = {B} and B BC
dependency preserving
R1 = (A, B), R2 = (A, C)
lossless-join decomposition:
R1 R2 = {A} and A AB
not dependency preserving (cannot check B C without computing R1 R2)
1.35
Dependency PreservationDependency Preservation
Let Fi be the set of dependencies F + that include only attributes in Ri.
A decomposition is dependency preserving, if
(F1 F2 … Fn )+ = F +
Why is this nice? each dependency can be checked using only one relation
Using F is not sufficient:
R = (A, B, C )
F = {A BC}
R1 = (A, B), R2 = (A, C)
(F1 F2 )+ = (φ) +
But: F + includes A B and A C
(F1 F2 )+ =( (A B) (A C) ) + = F +
1.36
Dependency PreservationDependency Preservation
Computing F + is expensive.
Alternate: check each dependency in F
To check if is preserved in the decomposition of R into R1, R2,
…, Rn :
result = while (changes to result) do
for each Ri in the decomposition
t = (result Ri)+ Ri //whatever in Ri is dependent on
result = result t
If result contains all attributes in , then the functional dependency is preserved
1.37
ExampleExample
R = (A, B, C )
F = {A BC B C}
R is not in BCNF because of B C
R1 = (A, B), R2 = (B, C) is BCNF.
Check dependency preserving for A BC:
result = A
Use R1: t = (result Ri)+ Ri = (A (A,B))+ (A, B)
= (A, B, C) (A, B) = (A, B)
result = (A,B)
Use R2: t = (result R2)+ R2 = ((A,B) (B,C))+ (B, C)
= (B)+ (B, C) = (B, C) (B, C) = (B, C)
result = (A,B) (B, C) = (A,B,C) which contains BC
result = while (changes to result) do
for each Ri in the decomposition (attribute closure is wrt F) t = (result Ri)+ Ri
result = result t
If result contains all attributes in , then the functional dependency
is preserved
1.38
Testing for BCNFTesting for BCNF
To check if a non-trivial dependency causes a violation of BCNF compute + (the attribute closure of ), and verify that it includes all attributes of R (it is a superkey of R).
Simplified test: check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+.
Even simpler: use Fc
Problem: using only Fc is incorrect for testing relations in a decomposition of R example: R = (A, B, C, D, E), with F = { A B, BC D}
decompose R into R1 = (A,B) and R2 = (A,C,D, E)
neither of the dependencies in F contain only attributes from (A,C,D,E) so we might think R2 satisfies BCNF
dependency AC D in F+ shows R2 is not in BCNF
1.39
Testing A Decomposition for BCNFTesting A Decomposition for BCNF
To check if a relation Ri in a decomposition of R is in BCNF,
test Ri for BCNF with respect to the restriction of F+ to Ri (all FDs in F+ that contain only attributes
from Ri)
or use the original set F with the following test:
for every set of attributes Ri, check that + (the attribute closure of ) either includes no attribute of Ri- , or includes all attributes of Ri.
if the condition is violated by some in F,
the dependency (+ - ) Ri holds on Ri and
Ri violates BCNF.
use to decompose Ri
Apply to R = (A, B, C, D, E), with F = { A B, BC D} and
R1 = (A,B) and R2 = (A,C, D, E): a possible problem is AC in R2
(AC)+ = (ABCD), thus AC D is a dependency.
Decompose R2 to R3 = (ACD) and R4 =(ACB)
1.40
BCNF Decomposition AlgorithmBCNF Decomposition Algorithm
result := {R };done := false;
compute F +;while (not done) do
if (there is a schema Ri in result that is not in BCNF)
then beginlet be a nontrivial functional dependency that holds
on Ri such that Ri is not in F +,
and = ; result := (result – Ri ) (Ri – ) (, );
endelse done := true;
Each Ri is now in BCNF and the decomposition is lossless-join
1.41
Example of BCNF DecompositionExample of BCNF Decomposition
Original relation R and functional dependency F
R = (A, B, C, D, E, F)
F = {B A, ACD, BC E, BC AF}
Key = {BC}
Canonical cover Fc:
{B A, ACD, BC EF}
Use B A to decompose R:
R1 = (A,B )
R2 = (B,C,D,E,F)
Use BC EF to decompose R2
R3 = (BCEF)
R4 = (BCD)
Use attribute closure tests to see if done, e.g. (BC) on R4
Final decomposition: R1, R3, R4
1.42
BCNF SummaryBCNF Summary
Algorithm says decompose using F +
This takes forever
Easier: use Fc
But this may miss some FDs
So, use Fc but check attribute closure:
for every set of attributes Ri, check that + (the attribute closure of ) either includes no attribute of Ri- , or includes all attributes of Ri.
1.43
3NF Decomposition Algorithm3NF Decomposition AlgorithmLet Fc be a canonical cover for F;i := 0;for each functional dependency in Fc do
if none of the schemas Rj, 1 j i contains then begin
i := i + 1;Ri := ;
end if none of the schemas Rj, 1 j i contains a candidate key for R
then begin i := i + 1; Ri := any candidate key for R;
end return (R1, R2, ..., Ri)
Each relation schema Ri is in 3NF (Ri := : → is in the canonical cover, the last Ri contains a candidate key and nothing else so can't be decomposed
Decomposition is dependency preserving (each → in the canonical cover is in one of the Ri
Decomposition is lossless-join (one of the Ri s contains a candidate key)
1.44
3NF Decomposition3NF Decomposition
Original relation R and functional dependency F
R = (A, B, C, D, E, F)
F = {B A, ACD, BC E, BC AF}
Key = {BC}
Canonical cover Fc: {B A, ACD, BC EF}
Decomposition
R1 = (A,B )
R2 = (A,C,D)
R3 = (BCEF)
R3 contains a candidate key so we are done
Compare with BCNF
R1 = (AB)
R3 =(BCEF)
R3 =(BCD) which is not dependency preserving
1.45
3NF Decomposition: Example3NF Decomposition: Example
Relation schema:
cust_banker_branch = (customer_id, employee_id, branch_name, type )
The functional dependencies for this relation schema are:
customer_id, employee_id branch_name, type
employee_id branch_name
customer_id, branch_name employee_id
Compute a canonical cover:
branch_name is extraneous in the r.h.s. of the 1st dependency
no other attribute is extraneous, so we get FC =
customer_id, employee_id type employee_id branch_name
customer_id, branch_name employee_id
1.46
3NF Decomposition Example (Cont.)3NF Decomposition Example (Cont.)
The for loop generates following 3NF schema:
(customer_id, employee_id, type )
(employee_id, branch_name)
(customer_id, branch_name, employee_id)
(customer_id, employee_id, type ) contains a candidate key of the original schema, so no further relation schema needs be added
The second schema is redundant
Minor extension of the 3NF decomposition algorithm: at end of for loop, detect and delete schemas which are subsets of other schemas
If the FDs were considered 1st, 3rd, 2nd (employee_id, branch_name) would not be included in the decomposition because it is a subset of (customer_id, branch_name, employee_id)
Final result will not depend on the order in which FDs are considered
1.47
Comparison of BCNF and 3NFComparison of BCNF and 3NF
It is always possible to decompose a relation into a set of relations that are in 3NF such that: the decomposition is lossless
the dependencies are preserved
It is always possible to decompose a relation into a set of relations that are in BCNF such that: the decomposition is lossless
it may not be possible to preserve dependencies
1.48
ER Model and NormalizationER Model and Normalization
When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated from the E-R diagram should not need further normalization
In a real (imperfect) design, there can be functional dependencies from non-key attributes of an entity example: an employee entity with attributes department_number
and department_address, and a functional dependency department_number department_address
good design would have made department an entity
1.49
Denormalization for PerformanceDenormalization for Performance
May want to use non-normalized schema for performance when queries span two relations (avoid a join)
Alternative 1: Use denormalized relation containing attributes that are usually combined faster lookup
extra space and extra execution time for updates
extra coding work for programmer and possibility of error in extra code
Alternative 2: use a materialized view with the specific attributes benefits and drawbacks same as above, except no extra coding work for
programmer and avoids possible errors
view: a virtual table representing the result of a query. A query or update of
the view's table is against the underlying base tables.
materialized view: stored as a concrete table that is infrequently updated
from the original base tables