INFO 340
Lecture 7: Functional Dependency, Normalization
DeMorgan’s Theorem
• NOT (A AND B) = (NOT A) OR (NOT B)
• NOT (A OR B) = (NOT A) AND (NOT B)
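De Morgan's laws can be checked exhaustively over the four true/false combinations, and the same rewrite applies inside a SQL WHERE clause. The sketch below uses Python's standard-library sqlite3 module with an in-memory database. The table `flags` and its columns `a` and `b` are hypothetical, and NULLs are left out to keep the example small.

```python
import itertools
import sqlite3

# Exhaustive truth-table check of both De Morgan laws over plain booleans.
for a, b in itertools.product([False, True], repeat=2):
    assert (not (a and b)) == ((not a) or (not b))
    assert (not (a or b)) == ((not a) and (not b))

# The same rewrite inside a WHERE clause (hypothetical table "flags").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flags (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO flags (a, b) VALUES (?, ?)",
    [(0, 0), (0, 1), (1, 0), (1, 1)],
)

# NOT (A AND B) ...
original = conn.execute(
    "SELECT id FROM flags WHERE NOT (a = 1 AND b = 1) ORDER BY id"
).fetchall()
# ... is equivalent to (NOT A) OR (NOT B).
rewritten = conn.execute(
    "SELECT id FROM flags WHERE NOT (a = 1) OR NOT (b = 1) ORDER BY id"
).fetchall()

print(original)               # [(1,), (2,), (3,)]
print(original == rewritten)  # True
```

Only the row where both flags are 1 (id 4) is excluded, whichever form of the condition is used.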
“Spreadsheet Syndrome”
• When you use a spreadsheet program, you only really have one table.
• This leads to duplication of data.
Normalization
• Goal: Every non-key column is directly dependent on the key, the whole key, and nothing but the key
• Goal: Reduce redundancy, reduce update anomalies, and improve efficiency.
Data Redundancy & Update Anomalies
• Example relation: Staff# | sName | position | salary | branch# | bAddress
• Insertion anomaly
– Adding a new staff member means entering the branch address (bAddress) again, creating an opportunity for error.
– Adding a new branch with no staff means entering nulls for the staff columns.
• Deletion anomaly
– Deleting the last staff member of a branch also deletes the details of that branch.
• Modification anomaly
– Updating the details of a particular branch must be done in every row for that branch, creating an opportunity for error.
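A minimal Python sketch of the modification and deletion anomalies, assuming a cut-down version of the staff/branch relation (position and salary are omitted). The staff names, branch numbers, and addresses are made up for illustration, and the helper `addresses_per_branch` is hypothetical.

```python
# Hypothetical rows of the un-normalized staff/branch relation.
staff_branch = [
    {"staffNo": "S1", "sName": "Ann",  "branchNo": "B1", "bAddress": "1 Main St"},
    {"staffNo": "S2", "sName": "Bob",  "branchNo": "B1", "bAddress": "1 Main St"},
    {"staffNo": "S3", "sName": "Cara", "branchNo": "B2", "bAddress": "9 Elm St"},
]

def addresses_per_branch(rows):
    """Map each branchNo to the set of addresses recorded for it."""
    result = {}
    for row in rows:
        result.setdefault(row["branchNo"], set()).add(row["bAddress"])
    return result

# Modification anomaly: branch B1 moves, but only one of its rows is updated,
# so the table now records two conflicting addresses for B1.
staff_branch[0]["bAddress"] = "2 Oak Ave"
print(addresses_per_branch(staff_branch)["B1"])

# Deletion anomaly: removing the only B2 staff member erases B2's address too.
staff_branch = [r for r in staff_branch if r["staffNo"] != "S3"]
print("B2" in addresses_per_branch(staff_branch))  # False
```

Storing each branch address once, in a separate branch table, removes both problems.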
Functional Dependency & Normalization
• Learn to identify the most commonly used normal forms: First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF).
What happens if normalization hasn’t occurred?
• Data duplication
• Multiple, conflicting versions of the same fact
• Difficult to query
Full functional dependency
• A functional dependency A → B is full if removing any attribute from the determinant A means the dependency no longer holds. In other words, no proper subset of A determines B.
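The definition can be tested against sample data with a small Python sketch. The helper names `fd_holds` and `is_full_fd` are hypothetical, and the rows reuse the Student/Class/Location data from the 2NF example. A table instance can only disprove a functional dependency. Because a dependency must hold for all time, sample data that satisfies it is evidence, not proof.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if, in this instance, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs also determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(len(lhs)):  # every proper subset, including the empty set
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True

rows = [
    {"Student": "John",  "Class": "CSE 143",  "Location": "EEB 103"},
    {"Student": "John",  "Class": "EE 131",   "Location": "EEB 103"},
    {"Student": "Susie", "Class": "INFO 340", "Location": "MGH 238"},
    {"Student": "Susie", "Class": "MATH 124", "Location": "PAR 104"},
    {"Student": "Susie", "Class": "EE 131",   "Location": "EEB 103"},
]

print(is_full_fd(rows, ("Student", "Class"), ("Location",)))  # False
print(is_full_fd(rows, ("Class",), ("Location",)))            # True
```

The first call returns False because the smaller determinant {Class} already determines Location. That partial dependency is exactly what breaks 2NF.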
Transitive Dependency
• Transitive dependency describes a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
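The standard attribute-closure algorithm follows transitive dependencies by applying known dependencies repeatedly until nothing new is added. This Python sketch assumes dependencies are given as (left-hand side, right-hand side) pairs of attribute sets. `attribute_closure` is a hypothetical helper name, and A, B, C are the attributes from the definition.

```python
def attribute_closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs of frozensets.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole determinant is already known, add its right-hand side.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

# A -> B and B -> C, as in the definition.
fds = [
    (frozenset({"A"}), frozenset({"B"})),
    (frozenset({"B"}), frozenset({"C"})),
]

print(sorted(attribute_closure({"A"}, fds)))  # ['A', 'B', 'C']: C is reached via B
print("A" in attribute_closure({"B"}, fds))   # False: A is not dependent on B
```

The second check covers the proviso in the definition: A must not be functionally dependent on B or C.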
Functional Dependency & Normalization
• Main characteristics of functional dependencies used in normalization:
– Each value of the attribute(s) on the left-hand side (the determinant) is associated with exactly one value of the attribute(s) on the right-hand side.
– Holds for all time.
– The determinant has the minimal number of attributes necessary to maintain the dependency with the attribute(s) on the right-hand side.
Normalization
• Formal technique for analyzing a relation based on its primary key and the functional dependencies between the attributes of that relation.
• Formal method to cross-check your work – “sanity check”
• Often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.
• As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
1st Normal Form
• A relation in which the intersection of each row and column contains one and only one value.
• Atomicity: based on your requirements, each column holds only one value.
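A minimal Python sketch of bringing a table into 1NF, assuming a hypothetical source where each student's classes were stored as one comma-separated string. Splitting the list produces one row per (Student, Class) pair, matching the Student | Class table in the 2NF example. The helper `to_1nf` is hypothetical.

```python
# Hypothetical non-1NF data: the Classes cell holds several values.
unnormalized = [
    {"Student": "John",  "Classes": "CSE 143, EE 131"},
    {"Student": "Susie", "Classes": "INFO 340, MATH 124, EE 131"},
]

def to_1nf(rows):
    """Return one (Student, Class) row per class so every cell holds one value."""
    return [
        {"Student": row["Student"], "Class": cls.strip()}
        for row in rows
        for cls in row["Classes"].split(",")
    ]

for row in to_1nf(unnormalized):
    print(row["Student"], "|", row["Class"])
```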
2nd Normal Form
• Based on the concept of full functional dependency.
• A relation that is in 1NF and every non-primary-key attribute is fully functionally dependent on the primary key.
2NF example
Student | Class | Location
John | CSE 143 | EEB 103
John | EE 131 | EEB 103
Susie | INFO 340 | MGH 238
Susie | MATH 124 | PAR 104
Susie | EE 131 | EEB 103
• This relation is in 1NF but not in 2NF. Candidate key: {Student, Class}.
• Location is not fully functionally dependent on the key, because it depends only on Class.
Student | Class
John | CSE 143
John | EE 131
Susie | INFO 340
Susie | MATH 124
Susie | EE 131
Class | Location
CSE 143 | EEB 103
EE 131 | EEB 103
INFO 340 | MGH 238
MATH 124 | PAR 104
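The 2NF decomposition can be sketched in Python with sets standing in for relations. Each projection keeps the listed columns and drops duplicate rows, and a natural join on Class checks that no information was lost. This is an illustration on the sample rows only, not a general lossless-join test.

```python
# Rows of the original Student | Class | Location relation.
enrollment = [
    ("John", "CSE 143", "EEB 103"),
    ("John", "EE 131", "EEB 103"),
    ("Susie", "INFO 340", "MGH 238"),
    ("Susie", "MATH 124", "PAR 104"),
    ("Susie", "EE 131", "EEB 103"),
]

# Projection: keep only the listed columns; a set drops duplicate rows.
student_class = {(s, c) for s, c, _ in enrollment}
class_location = {(c, loc) for _, c, loc in enrollment}

# Natural join on Class to rebuild the original relation.
rejoined = {
    (s, c, loc)
    for s, c in student_class
    for c2, loc in class_location
    if c == c2
}

print(len(class_location))          # 4: "EE 131" is stored once, not twice
print(rejoined == set(enrollment))  # True: no rows lost or invented
```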
3rd Normal Form
• Based on the concept of transitive dependency.
• A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.
3NF example
PublisherID | Name | Address | City | State | ZIP
1 | Apress | 2560 Ninth Street, Station 219 | Berkeley | CA | 94710
• This looks good, but notice that City and State really depend on ZIP, not on PublisherID.
• A good way to find transitive dependencies is to ask yourself: “If I update this column, do I need to update others?”
• In this case, updating the City column would require you to update the ZIP column and possibly the State column.
• This example also hints at one of the dangers of normalization: you can sometimes go too far, since every address lookup now needs an extra join.
PublisherID | Name | Address | ZIP
1 | Apress | 2560 Ninth Street, Station 219 | 94710
ZIP | City | State
94710 | Berkeley | CA
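Here is a sketch of the 3NF schema using Python's built-in sqlite3 module and an in-memory database. The column types are assumptions: ZIP is stored as TEXT so that codes with leading zeros survive. The table name `Zip` is hypothetical. Joining on ZIP reconstructs the original wide row, while City and State are stored only once per ZIP code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Zip (
        ZIP   TEXT PRIMARY KEY,
        City  TEXT NOT NULL,
        State TEXT NOT NULL
    );
    CREATE TABLE Publisher (
        PublisherID INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        Address     TEXT,
        ZIP         TEXT REFERENCES Zip(ZIP)
    );
    INSERT INTO Zip VALUES ('94710', 'Berkeley', 'CA');
    INSERT INTO Publisher VALUES (1, 'Apress', '2560 Ninth Street, Station 219', '94710');
""")

# Reconstruct the original un-normalized row with a join on ZIP.
row = conn.execute("""
    SELECT p.PublisherID, p.Name, p.Address, z.City, z.State, p.ZIP
    FROM Publisher AS p JOIN Zip AS z ON p.ZIP = z.ZIP
""").fetchone()
print(row)
# (1, 'Apress', '2560 Ninth Street, Station 219', 'Berkeley', 'CA', '94710')
```

City and State now live in a single row of `Zip`, so correcting a city name takes one UPDATE rather than one per publisher. The cost is the extra join shown above.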
MidTerm Overview
• Limitations of file-based systems
• Difference between a DDL & a DML
• Advantages/disadvantages of DBMSs
• Differences between the external, conceptual, & internal levels of a DBMS
• Data independence
• Functions of a DBMS
• Relation, attribute, domain, cardinality, degree
• Attribute domains
• Cartesian product
• Properties of a relation
• Keys – super, candidate, primary, foreign
• Null
• Entity integrity, referential integrity
• Sets – union, intersection, difference
• Joins – inner, right outer, left outer
• SQL – selects, updates, inserts, aggregates, group by, order by
• Wild cards
• Nested query
• DeMorgan’s Theorem in an SQL query
• Relational algebra – difference between a selection & a projection
• Entity relationship diagrams
• Multiplicity
• Functional dependency
• Definitions of First, Second, & Third Normal Form
• Be able to identify if a relation is in 1NF, 2NF, or 3NF
• Difference between integer types in MySQL
Homework
Complete Mini-Project work
Prepare for mid-term