THEORY OF DEPENDENCIES IN RELATIONAL DATABASE
Overview
• Introduction
• Characteristics Of “BAD” Schema
• What Is Functional Dependency?
• Armstrong’s Inference Rules
• Equivalence & Minimal Cover
• Normalization
• Normalization Types And Details
• BCNF
• Higher Normal Forms
• De-Normalization
• Multi-valued Dependencies (MVD)
• Join Dependencies
• Inclusion Dependencies
• Conclusion
• References
INTRODUCTION
• The main aim of database design is to come up with a “GOOD” schema.
• Problems-
1. How do we characterize the “GOODNESS” of a schema?
2. If two or more alternative schemas are available, how do we compare them?
3. What are the problems with a “BAD” schema?
• An example-
Characteristics of “BAD” schema
• Redundant storage of data: Office Phone and HOD info are stored redundantly, which wastes disk space.
• A program that updates the Office Phone of a department must change it in several places, which takes more running time and is error-prone.
ANOMALIES-
a. Insertion anomaly: There is no way to insert information about a new department unless we also enter details of a (dummy) student in that department.
b. Deletion anomaly: If all students of a certain department leave and we delete their tuples, the information about the department itself is lost.
c. Update anomaly: Updating the office phone of a department means that
1. the value must be changed in several tuples;
2. if a tuple is missed, the data becomes inconsistent.
What is functional dependency?
• An FD X → Y holds on a relation if any two tuples that agree on X also agree on Y.
• Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs.
• FDs and keys are used to define normal forms for relations.
NORMAL FORMS -
1. Each NF specifies certain conditions.
2. If the schema satisfies the conditions, certain kinds of problems are avoided.
Consider the schema Student(s.name, rollno., gender, dept, h.name, roomno.)
Since rollno. is a key, rollno. → {s.name, gender, dept, h.name, roomno.}
Suppose each student is given a hostel room of their own. Then {h.name, roomno.} → rollno.
More about functional dependency…
Armstrong’s inference rules
Sound & complete inference rules
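• Armstrong showed that Rules 1, 2, 3 (reflexivity: if Y ⊆ X then X → Y; augmentation: if X → Y then XZ → YZ; transitivity: if X → Y and Y → Z then X → Z) are sound and complete.
• These are called Armstrong’s Axioms (AA).
SOUNDNESS-
• Every new FD X → Y derived from a given set of FDs F using AA is such that F ╞ {X → Y}.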
Sound & complete inference rules(2)
COMPLETENESS-
• Any FD X → Y logically implied by F (i.e. F ╞ {X → Y}) can be derived from F using AA.
CLOSURE OF A SET OF FDs-
• The closure of a set of FDs F is the set F+ of all the FDs that can be inferred from F.
• The closure of a set of attributes X w.r.t. F is the set X+ of all attributes that are functionally determined by X.
Ex- Relation P{a, b, c, d, e, f} with the following set of FDs F on it:
F = {a → d, b → {e, f}, {a, b} → c}
Attribute closures w.r.t. F:
a+ = {a, d}
b+ = {b, e, f}
{a, b}+ = {a, b, c, d, e, f}
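The attribute closures in the example can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name attribute_closure and the representation of FDs as (lhs, rhs) pairs of Python sets are assumptions made for illustration.

```python
def attribute_closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    (Illustrative representation, assumed for this sketch.)
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the closure contains the whole left side,
            # every attribute on the right side is determined too.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# F = {a → d, b → {e, f}, {a, b} → c} from the example
F = [({"a"}, {"d"}), ({"b"}, {"e", "f"}), ({"a", "b"}, {"c"})]

print(sorted(attribute_closure({"a"}, F)))       # ['a', 'd']
print(sorted(attribute_closure({"b"}, F)))       # ['b', 'e', 'f']
print(sorted(attribute_closure({"a", "b"}, F)))  # ['a', 'b', 'c', 'd', 'e', 'f']
```

Since {a, b}+ contains every attribute of P, {a, b} is a superkey of P under F.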
Equivalence & minimal cover
• EQUIVALENCE of sets of FDs: Two sets of FDs F & G are equivalent if F+ = G+, i.e. every FD in F can be inferred from G and every FD in G can be inferred from F.
• EXTRANEOUS ATTRIBUTE: An attribute in an FD is extraneous if removing it does not change F+. Ex- Given F = {A → C, AB → C}, B is extraneous in AB → C because A → C logically implies AB → C.
• MINIMAL COVER: A minimal cover of a set of FDs G is a minimal set of dependencies F that is equivalent to G. Here F+ = G+, the RHS of each FD in F is a single attribute, and if we modify F by deleting an FD or by deleting an attribute from an FD in F, the closure changes.
Ex- {A → B, ABCD → E, EF → GH, ACDF → EG} has the following minimal cover: {A → B, ACD → E, EF → G, EF → H}
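One common way to compute a minimal cover is: split every RHS into single attributes, drop extraneous LHS attributes, then drop FDs implied by the rest. The sketch below follows those three steps. The function names and the encoding of attributes as single characters are assumptions for illustration, and a set of FDs can have more than one minimal cover, so this shows one possible result.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """`fds` is a list of (lhs, rhs) strings, one character per attribute."""
    # Step 1: split each right-hand side into single attributes.
    cover = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: remove extraneous attributes from each left-hand side.
    for i, (lhs, rhs) in enumerate(cover):
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, cover):
                lhs = reduced
                cover[i] = (lhs, rhs)

    # Step 3: remove FDs that can be inferred from the remaining ones.
    result = list(dict.fromkeys(cover))  # drop exact duplicates, keep order
    for fd in list(result):
        rest = [other for other in result if other != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result


# G = {A → B, ABCD → E, EF → GH, ACDF → EG} from the example
G = [("A", "B"), ("ABCD", "E"), ("EF", "GH"), ("ACDF", "EG")]
for lhs, rhs in minimal_cover(G):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# ACD -> E
# EF -> G
# EF -> H
```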
Normalization
• 1NF: functional dependency of nonkey attributes on the primary key; atomic values only
• 2NF: full functional dependency of nonkey attributes on the primary key
• 3NF: no transitive dependency between nonkey attributes
• Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency
Normalization (2)
• Un-normalized relations: The first step in normalization is to convert the data into a two-dimensional table. Data can be repeated within a column.
• First Normal Form (1NF): Only atomic values at each row and column.
• Second Normal Form (2NF): A relation is in Second Normal Form when every non-key attribute is fully functionally dependent on the primary key.
2NF matters when the key is composite: a composite key allows partial FDs, which 2NF forbids. To reach 2NF, we decompose the relation into smaller relation schemas.
After decomposition, we must verify whether it is lossless.
Normalization (3) – 2 NF
• Full Functional Dependency:
An FD X → Y is a FULL FD if, after removal of any attribute from X, the FD no longer holds.
• Partial Functional Dependency:
An FD X → Y is a partial FD if (X − {A}) → Y also holds for some attribute A in X.
• Decomposition:
Let R = (A, B, C, D) and
X = {P, Q, S, T} such that R = P ∪ Q ∪ S ∪ T.
Replacing R by P, Q, S, T is the process of decomposing R.
Normalization(4)-2 NF
DESIRABLE PROPERTIES OF DECOMPOSITION:
• Not all decompositions of a schema are useful.
• We require two properties to be satisfied.
Lossless join property: The information in an instance r of R must be preserved in the instances of the decomposed schemas, so that their natural join gives back exactly r.
* If R is decomposed into P and Q, the decomposition is lossless if (P ∩ Q) → P or (P ∩ Q) → Q holds. P ∩ Q ≠ Φ is necessary but not sufficient.
Dependency preserving property: If a set F of dependencies holds on R, it should be possible to enforce F by enforcing appropriate dependencies on each decomposed relation.
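The lossless join property can be tested for a two-way decomposition with attribute closures: the split of R into P and Q is lossless when the common attributes P ∩ Q functionally determine all of P or all of Q. The sketch below applies that test. The relation R(A, B, C) with the single FD A → B is a hypothetical example, not taken from the slides, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(p, q, fds):
    """True if (P ∩ Q) → P or (P ∩ Q) → Q follows from `fds`."""
    determined = closure(set(p) & set(q), fds)
    return set(p) <= determined or set(q) <= determined


# Hypothetical relation R(A, B, C) with F = {A → B}
fds = [({"A"}, {"B"})]

print(is_lossless_binary({"A", "B"}, {"A", "C"}, fds))  # True:  A → AB
print(is_lossless_binary({"A", "B"}, {"B", "C"}, fds))  # False: B determines neither side
```

The second decomposition shares the attribute B, yet it is not guaranteed to be lossless, which shows that a non-empty intersection alone is not enough.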
2 NF - Example
• EID → Name, Address, Birthdate
• EID, PName → StartDate
• Candidate key is {EID, PName}.
• The nonprime attributes are Name, Address, Birthdate, StartDate.
• Nonprime attributes Name, Address, Birthdate violate 2NF because they are functionally dependent on EID alone, which is only part of the candidate key.
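The 2NF violations in this example can be found by computing the closure of every proper subset of the key and looking for nonprime attributes in it. This is a rough sketch under the assumption that the relation has a single candidate key; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Map each nonprime attribute to a proper subset of `key` that determines it.

    Assumes `key` is the only candidate key of the relation.
    """
    key = set(key)
    nonprime = set(attributes) - key
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in closure(subset, fds) & nonprime:
                found.setdefault(attr, set(subset))
    return found


attributes = {"EID", "PName", "Name", "Address", "Birthdate", "StartDate"}
fds = [
    ({"EID"}, {"Name", "Address", "Birthdate"}),
    ({"EID", "PName"}, {"StartDate"}),
]
violations = partial_dependencies({"EID", "PName"}, attributes, fds)
print(sorted(violations))  # ['Address', 'Birthdate', 'Name']
```

StartDate is not reported because it depends on the full key {EID, PName}.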
Normalization(5)-3 NF
• 2NF, plus no transitive functional dependencies.
• Given three attributes A, B, C in a relation, if A → B and B → C, this forms a transitive functional dependency.
• Avoid transitive dependencies for 3NF.
Ex-
Here, Customer_ID → Salesperson and Salesperson → Region cause a transitive dependency.
Solution:
Boyce-Codd normal form (BCNF)
• Most 3NF relations are also BCNF relations.
• A 3NF relation can fail to be in BCNF only if all of the following hold:
Candidate keys in the relation are composite keys (they are not single attributes).
There is more than one candidate key in the relation.
The keys are not disjoint, that is, some attributes in the keys are common.
Patient #   Patient Name    Patient Address
1111        John White      15 New St. New York, NY
1234        Mary Jones      10 Main St. Rye, NY
2345        Charles Brown   Dogwood Lane Harrison, NY
4876        Hal Kane        55 Boston Post Road, Chester,
5123        Paul Kosher     Blind Brook Mamaroneck, NY
6845        Ann Hood        Hilton Road Larchmont, NY
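The BCNF condition (every determinant is a candidate key, or more precisely a superkey) can be checked with attribute closures. The sketch below uses a hypothetical relation R(Student, Course, Instructor), which is not from the slides, with two composite, overlapping candidate keys: {Student, Course} and {Student, Instructor}. The function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    all_attrs = set(attributes)
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        is_superkey = closure(lhs, fds) >= all_attrs
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


# Hypothetical example: each instructor teaches exactly one course.
attributes = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]
print(bcnf_violations(attributes, fds))  # [({'Instructor'}, {'Course'})]
```

This relation is in 3NF because Course is part of a candidate key, but Instructor → Course breaks BCNF because Instructor alone is not a superkey.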
Multi-valued dependencies(MVD)
Higher normal forms
Fourth Normal Form (4NF)
• A relation is in Fourth Normal Form if it is in BCNF and every non-trivial multi-valued dependency X →→ Y has a superkey X as its determinant.
• Eliminate other non-trivial multi-valued dependencies by projecting into simpler tables.
JOIN DEPENDENCIES
• A join dependency, denoted by JD(R1, R2, R3, …, Rn), specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a non-additive join decomposition into R1, R2, …, Rn.
NOTE - An MVD is a special case of a JD where n = 2, i.e. a JD denoted as JD(R1, R2) implies an MVD (R1 ∩ R2) →→ (R1 − R2).
Fifth Normal Form (5NF)
• A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation.
• This implies that relations that have been decomposed in previous normal forms can be recombined via natural joins to recreate the original relation.
De-normalization
• De-normalization is the process of modifying a normalized database design for performance reasons.
• It is a natural and necessary part of database design, but it must come after proper normalization.
• It reintroduces redundancy, so it can make the system less flexible and its updates less efficient.
So de-normalize as needed, but not frivolously.
De-normalization (2)
Before:
Customer(ID, Address, Name, Telephone)
Order(Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID)
After:
Customer(ID, Address, Name, Telephone)
Order(Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name)
Inclusion dependency
• The foreign key (or referential integrity) constraint cannot be specified as a functional or multi-valued dependency because it relates attributes across relations.
• An inclusion dependency (ID) R.X < S.Y between two sets of attributes, X of relation schema R and Y of relation schema S, specifies the constraint that at any specific time, when r is a relation state of R and s is a relation state of S, we have
π_X(r(R)) ⊆ π_Y(s(S))
Conditions:
• X of R and Y of S must have the same number of attributes.
• The domains of each pair of corresponding attributes should be compatible.
So far, no normal forms have been developed based on inclusion dependencies.
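On concrete relation states, an inclusion dependency R.X < S.Y can be checked by comparing the two projections as sets of tuples. The following is a small sketch with made-up data: the students and departments rows, the attribute names, and the function names are all hypothetical.

```python
def project(rows, attrs):
    """Projection of `rows` (a list of dicts) onto `attrs`, as a set of tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


def inclusion_holds(r_rows, x, s_rows, y):
    """Check R.X < S.Y, i.e. that the projection of r on X is contained
    in the projection of s on Y."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number of attributes")
    return project(r_rows, x) <= project(s_rows, y)


# Hypothetical relation states: students.dept refers to departments.dept_id
departments = [{"dept_id": 1}, {"dept_id": 2}]
students = [{"rollno": 10, "dept": 1}, {"rollno": 11, "dept": 2}]

print(inclusion_holds(students, ["dept"], departments, ["dept_id"]))  # True

# A student in a department that does not exist violates the dependency.
students.append({"rollno": 12, "dept": 3})
print(inclusion_holds(students, ["dept"], departments, ["dept_id"]))  # False
```

This is the check a foreign key constraint performs. The sketch only compares the number of attributes and does not check domain compatibility.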
Conclusion
• After we have the ER diagrams, each relation in the schema must be independently reviewed and normalized when needed.
• Functional dependencies are the building blocks that enable the analysis of data redundancy and, through the process of normalization, the elimination of the anomalies that redundancy causes.
• Normalization is a technique for systematically validating which attributes belong in a relation schema, from the perspective of data redundancy.
• This process gives us the final opportunity to correct errors and establish a robust design before implementing the database system.
References
• Fundamentals of Database Systems, 5th edition, by Ramez Elmasri, Shamkant B. Navathe
• Database System Concepts by A. Silberschatz, H. Korth, S. Sudarshan
• An Introduction to Database Systems by C.J. Date
• Lotito, J. (2001). Concepts of Database Design and Management. Retrieved September 2007 from http://www.sitepoint.com/article/database-design-management
• Scamell, R.W., & Umanath, N.S. (2007). Data Modeling and Database Design. Boston, MA: Thomson