Upload
yael
View
38
Download
0
Embed Size (px)
DESCRIPTION
INFYS540. Normalizing Your Database and Why you WANT to do it!. Lesson 7 Chapter 5 Appendix. Data Redundancy Problems. Redundancy breeds errors Same data defined in multiple places is BAD Spelling/typographical error prone Lack of data integrity Inability to perform simple queries - PowerPoint PPT Presentation
Citation preview
04/21/23 1
Normalizing Your Database and
Why you WANT to do it!
INFYS540
Lesson 7
Chapter 5 Appendix
04/21/23 3
Data Redundancy Problems
• Redundancy breeds errors– Same data defined in multiple places is BAD– Spelling/typographical error prone– Lack of data integrity
• Inability to perform simple queries
• Inflexibility and inscalability
• Impossible to MAINTAIN!
04/21/23 5
Relational DatabasePROJECTCHIEFProject Project ChiefComputing 333-22-1111Intranet 987-65-4321Contracting 123-45-6789CAT 333-22-1111
DEPARTMENTSDept Dept. Director RoomMLD 181-94-5676 B115C2G 987-65-4321 123M&B 123-45-6789 147
EMPLOYEESLName FName SSN DeptJones Mike 123-45-6789 M&BSmith Tony 987-65-4321 C2GLee Bruce 567-89-1234 MLDDoodle Yankee 333-22-1111 M&B
1
1
What is a candidate key? What is a primary key?What is a foreign key?
04/21/23 11
Purpose of Normalization• Take advantage of the powerful tools
available in a DBMS• There are five levels of Normalization
– The higher the Normal Form the “better” and more efficient the database
– But, increasing the levels of Normal Form takes time and effort
– For most applications, 3rd Normal Form will solve most potential problems with a DB
04/21/23 12
Normalizing Database
• Process of creating well-structured tables.
• Improve performance, integrity of data
• 5-step process (w/ 2 rules) to achieve Third Normal Form (3NF)
• First two steps put DB into a form so you can normalize it
04/21/23 13
Rule #1 in Databases
Never design redundant data into a Databaseduplicate data is not consistent
duplicate data wastes space
04/21/23 14
Step 1. Primary Keys• A primary key is one or more data fields (columns)
that uniquely identify each record in the table• What would the primary key be below?
– “table of employees, assigned to a department.”
EMPLOYEESLName FName SSN DeptJones Mike 123-45-6789 MathSmith Tony 987-65-4321 M&BLee Bruce 567-89-1234 Science
04/21/23 15
Step 1. Primary Keys
• Answer: The SSN
• It is the only “guaranteed” unique column in the table. Names are easily repeated.
EMPLOYEESLName FName SSNDeptJones Mike 123-45-6789 MathSmith Tony 987-65-4321 M&BLee Bruce 567-89-1234 Science
04/21/23 16
Step 1. Primary Keys
• Now try the following example:• “A table of projects assigned to employees, listing
the project name and the employee’s function on the project.”
EmpProjCounter SSN Project Function 1 123-45-6789 Dining Designer 2 123-45-6789 Computing Designer 3 987-65-4321 Contracting Designer 4 444-55-6666 Intranet Webmaster 5 222-99-7777 Dining Overwatch
A Counter --The MS Access Default Key
04/21/23 17
Step 1. Primary Keys
• It is the combination of the SSN and the Project fields. Why?
EMPLOYEES’ PROJECTSCounter SSN Project Function 1 123-45-6789 Dining Designer 2 123-45-6789 Computing Designer 3 987-65-4321 Contracting Designer 4 444-55-6666 Intranet Webmaster 5 222-99-7777 Dining Overwatch
04/21/23 18
Step 1. Primary Keys
• Because, you can have the following:
EMPLOYEES’ PROJECTSCounter SSN Project Function1 123-45-6789 Dining Designer2 123-45-6789 Dining Designer3 987-65-4321 Intranet Designer4 444-55-6666 Intranet Webmaster5 222-99-7777 Dining Overwatch
• Redundant records! (Redundancy = BAD)
04/21/23 19
Rule #2 about Databases
NEVER Use a
Counter as a
Primary Key
04/21/23 20
Step 2: Eliminate Many-to-Many Relationships
• What is wrong with the following table? • “a table of personnel authorized access to a project”
PROJECTS QUERY ACCESSProject Access_1 Access_2 Access_3Dining 222-99-7777 181-94-5676Computing 222-99-7777 181-94-5676Intranet 987-65-4321 818-49-6765 123-45-6789
04/21/23 21
Step 2: Eliminate Many-to-Many Relationships
• Here’s essentially what this table looks like within the Access relationships diagram:
Projects:Project
Project ChiefDepartmentAccess_1Access_2Access_3
Employees:SSN
Last NameFirst Name
....
has access to info about
04/21/23 22
Step 2: Eliminate Many-to-Many Relationships
• Here’s how you model it in a database:– Break it up into two one-to-many relationships
Projects:Project
Project ChiefDepartment
....
Employees:SSN
Last NameFirst Name
....Access to
Project Info:ProjectSSN
1 1
04/21/23 23
Step 2: Eliminate Many-to-Many Relationships
• How to do it:– The primary key of the new table is the
composite of the primary keys of the existing tables.
• Primary key of Projects = Project Name
• Primary key of Employees = SSN
• New table primary key of Project Name and SSN
04/21/23 24
Step 2: Eliminate Many-to-Many Relationships
– No artificial restrictions on number of people with access
– You can add attributes about the types of access granted
– You can easily query who has access to information about each project
EMPLOYEELName FName SSNJones Mike 123-45-6789 Smith Tony 987-65-4321 Lee Bruce 567-89-1234 Doodle Yankee 333-22-1111
PROJ QUERY ACCESSProject SSNDining 222-99-7777Dining 181-94-5676Computing 222-99-7777Computing 181-94-5676Intranet 987-65-4321Intranet 818-49-6765Intranet 123-45-6789
PROJECTProject ProjectChief DeptComputing 333-22-1111 MATHIntranet 987-65-4321 M&BContracting 123-45-6789 M&BCAT 333-22-1111 Admin
04/21/23 25
What is wrong with the following?
“A table of PCs, which are loaded with many different applications, and assigned to a user.”
PCSerial# LoadedSoftware Assigned 10291 Word, Powerpoint, ccMail Jones 10301 Word, Powerpoint, Lotus Notes Smith 10311 Word, LotusNotes, Borland C++ Hacker
04/21/23 26
• “Atomic” - the data occupying a field cannot be further broken down.– i.e., no multi-data entries– i.e., “No attributes can have more than one value for a single
instance of an entity”
PCSerial# LoadedSoftware Assigned 10291 Word, Powerpoint, ccMail Jones
• If not atomic, updating is complex and error prone• If not atomic, can not easily query the database
Step 3: Achieving 1NF:All Data must be Atomic
04/21/23 27
Step 3 Answer
PCSerial# LoadedSoftware Assigned 10291 Word Jones 10291 Powerpoint Jones 10291 ccMail Jones 10301 Word Smith 10301 Powerpoint Smith 10301 LotusNotes Smith 10311 Word Hacker 10311 LotusNotes Hacker 10311 Borland C++ Hacker
04/21/23 28
Step 3. Achieving 1NF:All Data must be Atomic
Another source of redundancy: calculated fieldsTotalYTDAgeDaysRemaining
Solution: Use a Query!
Remove all calculated fields from table and create a query
...then use the query whenever you need up-to-date data
04/21/23 29
Step 4. Achieving 2NF:Eliminate Partial Dependencies
• What is a partial dependency?
– Look at the table. What’s redundant?– “A table of functions an employee is assigned to for a project, and the project chief.”
EMPLOYEES’ PROJECTSSSN Project Function Project Chief123-45-6789 Dining Designer 222-99-7777123-45-6789 Computing Designer 333-88-5656123-45-6789 Intranet Member 987-65-4321987-65-4321 Intranet Designer 987-65-4321444-55-6666 Intranet Webmaster 987-65-4321222-99-7777 Dining Overwatch 222-99-7777
04/21/23 30
Step 4. Achieving 2NF:Eliminate Partial Dependencies
• Function depends on the entire primary key: SSN and Project.• ProjectChief is dependent on just a portion of the primary key
EMPLOYEES’ PROJECTSSSN Project Function ProjectChief123-45-6789 Dining Designer 222-99-7777123-45-6789 Computing Designer 333-88-5656123-45-6789 Intranet Member 987-65-4321987-65-4321 Intranet Designer 987-65-4321444-55-6666 Intranet Webmaster 987-65-4321222-99-7777 Dining Overwatch 222-99-7777
04/21/23 31
Step 4. Achieving 2NF:Eliminate Partial Dependencies
• Why is this bad?– Well, what’s wrong with the following?
EMPLOYEES’ PROJECTSSSN Project Function Project Chief123-45-6789 Dining Designer 222-99-7777123-45-6789 Computing Designer 333-88-5656123-45-6789 Intranet Member 987-65-4321987-65-4321 Intranet Designer 987-65-4321444-55-6666 Intranet Webmaster 222-99-7777222-99-7777 Dining Overwatch 222-99-7777
04/21/23 32
Step 4. Achieving 2NF:Eliminate Partial Dependencies
• A partial dependency (PD) occurs when a non-key field depends on only a part of the primary key, and not the whole primary key.
• PDs are a relation. So, we need a new table.....EMPLOYEES’ PROJECTSSSN Project Function Project Chief123-45-6789 Dining Designer 222-99-7777123-45-6789 Computing Designer 333-88-5656123-45-6789 Intranet Member 987-65-4321987-65-4321 Intranet Designer 987-65-4321444-55-6666 Intranet Webmaster 987-65-4321222-99-7777 Dining Overwatch 222-99-7777
04/21/23 33
Step 4. Achieving 2NF:Eliminate Partial Dependencies
• Here’s how it should look......EMPLOYEES’ PROJECTSSSN Project Function123-45-6789 Dining Designer123-45-6789 Computing Designer123-45-6789 Intranet Member987-65-4321 Intranet Designer444-55-6666 Intranet Webmaster222-99-7777 Dining OverwatchPROJECTSProject Project ChiefDining 222-99-7777Computing 333-88-5656Intranet 987-65-4321
04/21/23 34
Step 5: Achieving 3NF:Eliminate Transitive Dependencies
• What is wrong with the following table?
PROJECTSProject Project Chief Dept. Dept. Director RoomDining 222-99-7777 Admin 181-94-5676 B115Computing 333-88-5656 Admin 181-94-5676 B115Intranet 987-65-4321 M&B 818-49-6765 123Contracting 187-87-8787 M&B 818-49-6765 123CAT 333-22-1111 Grounds 123-45-6789 147
04/21/23 35
Step 5: Achieving 3NF:Eliminate Transitive Dependencies
• We have fields dependent on a non-key field:– The Director and Room fields clearly relate to the Dept., and have nothing
to do with the project. (Dept is a “determinant” that is not a candidate key)
PROJECTSProject Project Chief Dept. Dept. Director RoomDining 222-99-7777 Admin 181-94-5676 B115Computing 333-88-5656 Admin 181-94-5676 B115Intranet 987-65-4321 M&B 818-49-6765 123Contracting 187-87-8787 M&B 818-49-6765 123CAT 333-22-1111 GRND 123-45-6789 147
04/21/23 36
Step 5: Achieving 3NF:Eliminate Transitive Dependencies
• A transitive dependency occurs when a non-key field depends on another non-key field.
• Why is this bad?.
– A typo appeared in the Contracting line. A database without the transitive dependency would not have allowed this to happen.
PROJECTSProject Project Chief Dept. Dept. Director RoomDining 222-99-7777 Admin 181-94-5676 B115Computing 333-88-5656 Admin 181-94-5676 B115Intranet 987-65-4321 M&B 818-49-6765 123Contracting 187-87-8787 M&B 818-49-6765 124CAT 333-22-1111 GRND 123-45-6789 147
04/21/23 37
Step 5: Achieving 3NF:Eliminate Transitive Dependencies
• How to do it:a. Which fields are dependent on a non-key field
in the table? (Director, Room)
b. Which fields are these dependent on? (Dept)
c. Create a new table with (b) as the primary key.
d. Put (a) in the new table.
e. Remove (a) from the old table.
04/21/23 38
Step 5: Achieving 3NF:Eliminate Transitive Dependencies
• Here are the new tables.
PROJECTSProject Project Chief Dept.Dining 222-99-7777 AdminComputing 333-88-5656 AdminIntranet 987-65-4321 M&BContracting 187-87-8787 M&BCAT 333-22-1111 GRNDDEPARTMENTSDept. Name Dept. Director RoomAdmin 181-94-5676 B115M&B 818-49-6765 123GRND 123-45-6789 147
04/21/23 39
Data Analysis: Normalization• An entity is in first normal form (1NF) if there are no
attributes that can have more than one value for a single instance of the entity.
• An entity is in second normal form (2NF) if it is already in 1NF, and if the values of all non-primary key attributes are dependent on the full primary key – not just part of it.
• An entity is in third normal form (3NF) if it is already in 2NF, and if the values of its non-primary key attributes are not dependent on any other non-primary key attributes.
04/21/23 41
Conclusion
• Rule1: Never design redundant data into a database • Rule2: Never use a counter as Primary Key• Identify proper primary keys (1NF)• Break up many-to-many relationships (1NF)• 1NF: Break all data into atomic components• 2NF: Identify/eliminate partial dependencies• 3NF: Eliminate transitive dependencies• Common sense test
04/21/23 42
What is a Good Data Model?– A good data model is simple.
• As a general rule, the data attributes that describe an entity should
describe only that entity.
– A good data model is essentially non-redundant. • This means that each data attribute, other than foreign keys, describes
at most one entity.
– A good data model should be flexible and adaptable to future needs.
• We should make the data models as application-independent as possible to encourage database structures that can be extended or modified without impact to current programs.
04/21/23 49
• Data and Referential Integrity– There are at least three types of data integrity that
must be designed into any database - key integrity, domain integrity and referential integrity.
– Key Integrity:• Every table should have a primary key (which may be
concatenated). – The primary key must be controlled such that no two records
in the table have the same primary key value.
– The primary key for a record must never be allowed to have a NULL value.
04/21/23 50
• Data and Referential Integrity– Domain Integrity:
• Appropriate controls must be designed to ensure that no field takes on a value that is outside of the range of legal values.
– Referential Integrity:• A referential integrity error exists when a foreign
key value in one table has no matching primary key value in the related table.
04/21/23 51
• Referential Integrity:• Referential integrity is specified in the form of deletion rules
as follows:– No restriction.
• Any record in the table may be deleted without regard to any records in any other tables.
– Delete:Cascade.
• A deletion of a record in the table must be automatically followed by the deletion of matching records in a related table.
– Delete:Restrict.
• A deletion of a record in the table must be disallowed until any matching records are deleted from a related table.
– Delete:Set Null.
• A deletion of a record in the table must be automatically followed by setting any matching keys in a related table to the value NULL.
04/21/23 54
• Database Capacity Planning– A database is stored on disk.
• The database administrator will want an estimate of disk capacity for the new database to ensure that sufficient disk space is available.
– Database capacity planning can be calculated with simple arithmetic as follows.
1 For each table, sum the field sizes. – This is the record size for the table.
2 For each table, multiply the record size times the number of entity instances to be included in the table.
– This is the table size.
04/21/23 55
• Database Capacity Planning– Database capacity planning can be calculated
with simple arithmetic as follows. (continued)3 Sum the table sizes.
– This is the database size.
4 Optionally, add a slack capacity buffer (e.g., 10%) to account for unanticipated factors or inaccurate estimates above.
– This is the anticipated database capacity.