Normalization CSC 3800 Fall 2008


Normalization

CSC 3800

Fall 2008

Database Normalization

Database normalization is the process of removing redundant data from your tables to improve storage efficiency, data integrity, and scalability.

In the relational model, methods exist for quantifying how efficient a database is. These classifications are called normal forms (or NF), and there are algorithms for converting a given database between them.

Normalization generally involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query is issued.

History

Edgar F. Codd first proposed the process of normalization and what came to be known as the 1st normal form in his paper “A Relational Model of Data for Large Shared Data Banks”. Codd stated: “There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition nonsimple domains are replaced by ‘domains whose elements are atomic (nondecomposable) values.’”

Normal Form

Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF. There are now others that are generally accepted, but 3NF is widely considered to be sufficient for most applications. Most tables when reaching 3NF are also in BCNF (Boyce-Codd Normal Form).

Why Normalize?

• Flexibility
    • Structure supports many ways to look at the data
• Data Integrity
    • “Modification Anomalies”: Deletion, Insertion, Update
• Efficiency
    • Eliminate redundant data and save space

Normalization Defined

“In relational database design, the process of organizing data to minimize duplication.

Normalization usually involves dividing a database into two or more tables and defining relationships between the tables.

The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.” - Webopedia, http://webopedia.internet.com/TERM/n/normalization.html

The Normal Forms

A series of logical steps to take to normalize data tables

• First Normal Form
• Second
• Third
• Boyce Codd
• There’s more, but beyond scope of this class

Un-normalized Table

OrderDate   Customer   Items
11/30/1998  Joe Smith  Hammer, Saw, Nails

or

OrderDate   Customer   Item1   Item2  Item3
11/30/1998  Joe Smith  Hammer  Saw    Nails

First Normal Form

• Remove horizontal redundancies
    • No two columns hold the same information
    • No single column holds more than a single item
• Each row must be unique
    • Use a primary key
• Benefits
    • Easier to query/sort the data
    • More scalable
    • Each row can be identified for updating

First Normal Form

• All columns (fields) must be atomic
    • Means: no repeating items in columns

OrderDate   Customer   Items
11/30/1998  Joe Smith  Hammer, Saw, Nails

OrderDate   Customer   Item1   Item2  Item3
11/30/1998  Joe Smith  Hammer  Saw    Nails

• Solution: make a separate table for each set of attributes with a primary key (parser, append query)

Customers: CustomerID, Name
Orders: OrderID, Item, CustomerID, OrderDate

First Normal Form Tables

Customers

CustomerID  Name
1           Joe Smith

Orders

OrderID  Item    CustomerID  OrderDate
1        Hammer  1           11/30/1998
1        Saw     1           11/30/1998
1        Nails   1           11/30/1998
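The 1NF tables above can be sketched in SQL. This is a minimal illustration using Python's built-in sqlite3 module with an in-memory database. The column types and the composite key (OrderID, Item) are assumptions, since the slides do not state them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    -- One row per item ordered; (OrderID, Item) is assumed to be the key.
    CREATE TABLE Orders (
        OrderID    INTEGER NOT NULL,
        Item       TEXT    NOT NULL,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        OrderDate  TEXT    NOT NULL,
        PRIMARY KEY (OrderID, Item)
    );
    INSERT INTO Customers VALUES (1, 'Joe Smith');
    INSERT INTO Orders VALUES (1, 'Hammer', 1, '11/30/1998');
    INSERT INTO Orders VALUES (1, 'Saw',    1, '11/30/1998');
    INSERT INTO Orders VALUES (1, 'Nails',  1, '11/30/1998');
""")

# With atomic columns, finding who ordered a saw is a plain equality test,
# not a substring search through "Hammer, Saw, Nails".
saw_buyers = conn.execute(
    "SELECT c.Name FROM Orders o "
    "JOIN Customers c ON c.CustomerID = o.CustomerID "
    "WHERE o.Item = 'Saw'"
).fetchall()
print(saw_buyers)
```

Note that OrderDate and CustomerID still repeat on every row of Orders, which is the clue that leads to second normal form.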

Second Normal Form (2NF)

• In 1NF and every non-key column is fully dependent on the (entire) primary key
    • Means: Do(es) the key field(s) imply the rest of the fields? Do we need to know both OrderID and Item to know the Customer and Date?
    • Clue: repeating fields

Orders

OrderID  Item    CustomerID  OrderDate
1        Hammer  1           11/30/1998
1        Saw     1           11/30/1998
1        Nails   1           11/30/1998

Second Normal Form

• Table must be in First Normal Form
• Remove vertical redundancy
    • The same value should not repeat across rows
• Composite keys
    • All columns in a row must refer to BOTH parts of the key
• Benefits
    • Increased storage efficiency
    • Less data repetition

Second Normal Form (2NF)

• Solution: Remove to a separate table (Make Table)

OrderDetails: OrderID, Item
Orders: OrderID, CustomerID, OrderDate

Second Normal Form Tables

Orders

OrderID  CustomerID  OrderDate
1        1           11/30/1998

OrderDetails

OrderID  Item
1        Hammer
1        Saw
1        Nails

Customers

CustomerID  Name
1           Joe Smith
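As a sketch of the 2NF design above, the three tables can be created and re-joined as follows. It uses Python's sqlite3 module; the column types are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customers(CustomerID),
        OrderDate  TEXT
    );
    CREATE TABLE OrderDetails (
        OrderID INTEGER REFERENCES Orders(OrderID),
        Item    TEXT,
        PRIMARY KEY (OrderID, Item)
    );
    INSERT INTO Customers VALUES (1, 'Joe Smith');
    INSERT INTO Orders VALUES (1, 1, '11/30/1998');
    INSERT INTO OrderDetails VALUES (1, 'Hammer'), (1, 'Saw'), (1, 'Nails');
""")

# The join rebuilds the 1NF rows, but the date and customer are stored once.
rows = conn.execute("""
    SELECT d.OrderID, d.Item, o.CustomerID, o.OrderDate
    FROM OrderDetails d
    JOIN Orders o ON o.OrderID = d.OrderID
    ORDER BY d.Item
""").fetchall()
for row in rows:
    print(row)
```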

Third Normal Form (3NF)

• In 2NF and every non-key column is mutually independent
    • Means: Calculations

Item    Quantity  Price  Total
Hammer  2         $10    $20
Saw     5         $40    $200
Nails   8         $1     $8

Third Normal Form

• Table must be in Second Normal Form
    • If your table is 2NF, there is a good chance it is 3NF
• All columns must relate directly to the primary key
• Benefits
    • No extraneous data

Third Normal Form (3NF)

• Solution: Put calculations in queries and forms

OrderDetails: OrderID, Item, Quantity, Price

Put expression in text control or in query: =Quantity * Price

Third Normal Form (3NF)

OrderDetails

OrderID  Item    Quantity  Price
1        Hammer  2         $10
1        Saw     5         $40
1        Nails   8         $1

Put expression in text control or in query: =Quantity * Price

SELECT OrderID, Item, Quantity, Price, Price * Quantity AS Total FROM OrderDetails;
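To see why the Total column is better computed than stored, here is a small sketch using Python's sqlite3 module. Prices are stored as whole dollars, which is an assumption made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OrderDetails (
        OrderID  INTEGER,
        Item     TEXT,
        Quantity INTEGER,
        Price    INTEGER,  -- whole dollars (assumed for illustration)
        PRIMARY KEY (OrderID, Item)
    );
    INSERT INTO OrderDetails VALUES
        (1, 'Hammer', 2, 10), (1, 'Saw', 5, 40), (1, 'Nails', 8, 1);
""")

QUERY = "SELECT Item, Price * Quantity AS Total FROM OrderDetails ORDER BY Item"
totals = conn.execute(QUERY).fetchall()

# Changing a quantity cannot leave a stale total behind,
# because the total is derived every time the query runs.
conn.execute("UPDATE OrderDetails SET Quantity = 3 WHERE Item = 'Hammer'")
totals_after = conn.execute(QUERY).fetchall()
print(totals, totals_after)
```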

Boyce-Codd Normal Form (BCNF) - Examples

A more restricted version of 3NF (known as Boyce-Codd Normal Form) requires that the determinant of every functional dependency in a relation be a key - for every FD: X => Y, X is a key

Consider the following relation:
STU-MAJ-ADV (Student-Id, Major, Advisor)
Advisor => Major, but Advisor is not a key

Boyce-Codd Normal Form for above:
STU-ADV (Student-Id, Advisor)
ADV-MAJ (Advisor, Major)

Kumar Madurai: http://www.mgt.buffalo.edu/courses/mgs/404/mfc/lecture4.ppt
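The BCNF decomposition above can be sketched with Python's sqlite3 module. Once Advisor is the key of ADV-MAJ, the database itself enforces Advisor => Major. The table names use underscores in place of hyphens, and the student IDs, advisor names, and majors are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Advisor is the key here, so each advisor maps to exactly one major.
    CREATE TABLE ADV_MAJ (Advisor TEXT PRIMARY KEY, Major TEXT NOT NULL);
    CREATE TABLE STU_ADV (
        StudentId INTEGER,
        Advisor   TEXT REFERENCES ADV_MAJ(Advisor),
        PRIMARY KEY (StudentId, Advisor)
    );
    INSERT INTO ADV_MAJ VALUES ('Cauchy', 'Math'), ('Bohr', 'Physics');
    INSERT INTO STU_ADV VALUES (100, 'Cauchy'), (100, 'Bohr'), (300, 'Cauchy');
""")

# A second major for the same advisor now violates the primary key.
try:
    conn.execute("INSERT INTO ADV_MAJ VALUES ('Cauchy', 'Physics')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Joining the two tables recovers each student's majors.
majors = conn.execute("""
    SELECT s.StudentId, m.Major
    FROM STU_ADV s
    JOIN ADV_MAJ m ON m.Advisor = s.Advisor
    ORDER BY s.StudentId, m.Major
""").fetchall()
print(rejected, majors)
```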

Primary Key

• Unique identifier for every row in the table
• Integers rather than text, to save memory and increase speed
• Can be “composite”
• Surrogate is best bet!
    • Meaningless, numeric column acting as primary key in lieu of something like SSN or phone number (both can be reissued!)

Relationships

One to many to enforce “Referential Integrity”

Two “foreign” keys make a composite primary key and “relate” many to many tables

A look up table - it doesn’t reference any others

Table Prefixes Aid Development

• First, we’ll replace the text PK with a number
• The Items table is a “look up” table with the tlkp prefix
    • tlkp: “lookup” table (no “foreign keys”)
• OrderDetails is renamed “trelOrderItem”, a “relational” table
    • trel: “relational” (or junction or linking) table
    • Two foreign keys make a primary key

tblOrders: OrderID, CustomerID, OrderDate
trelOrderItem (formerly OrderDetails: OrderID, Item): OrderID, ItemID
tlkpItems: ItemID, ItemName

Referential Integrity

• Every piece of “foreign” key data has a primary key on the “one” side of the relationship
    • No “orphan” records. Every child has a parent
    • Can’t delete records from the primary table if they are used in a related table
• Benefits - Data Integrity and Propagation
    • If fields are updated in the main table, the change is reflected in all queries
    • Can’t add a record in a related table without adding it to the main table
    • Cascade Delete: if a record is deleted from the primary table, all children are deleted - use with care! Better idea to “archive”
    • Cascade Update: if the primary key field changes, the foreign key changes with it
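The rules above can be tried out with Python's sqlite3 module. This is a sketch, not the only way to set it up. It reuses the tblOrders, tlkpItems, and trelOrderItem names from the earlier slide, and the choice to cascade deletes from orders only is an assumption. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE tblOrders (OrderID INTEGER PRIMARY KEY, OrderDate TEXT);
    CREATE TABLE tlkpItems (ItemID INTEGER PRIMARY KEY, ItemName TEXT);
    CREATE TABLE trelOrderItem (
        OrderID INTEGER REFERENCES tblOrders(OrderID) ON DELETE CASCADE,
        ItemID  INTEGER REFERENCES tlkpItems(ItemID),
        PRIMARY KEY (OrderID, ItemID)
    );
    INSERT INTO tblOrders VALUES (1, '11/30/1998');
    INSERT INTO tlkpItems VALUES (1, 'Hammer'), (2, 'Saw');
    INSERT INTO trelOrderItem VALUES (1, 1), (1, 2);
""")


def rejected(sql):
    """Run sql and report whether an integrity check refused it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# No "orphans": order 99 does not exist, so the child row is refused.
orphan_blocked = rejected("INSERT INTO trelOrderItem VALUES (99, 1)")
# A lookup row that is still referenced cannot be deleted.
delete_blocked = rejected("DELETE FROM tlkpItems WHERE ItemID = 1")

# Cascade delete: removing the order removes its child rows too.
with conn:
    conn.execute("DELETE FROM tblOrders WHERE OrderID = 1")
children_left = conn.execute("SELECT COUNT(*) FROM trelOrderItem").fetchone()[0]
print(orphan_blocked, delete_blocked, children_left)
```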

When Not to Normalize

• Want to keep tables simple so user can make their own queries
    • Avoid processing multiple tables
• Archiving Records
    • If no need to perform complex queries or “resurrect”
    • Flatten and store in one or more tables
• Testing shows Normalization has poorer performance
    • “Sounds Like” field example
    • Can also try temp tables produced from Make Table queries

Real World - School Data

Student Last  Student First  Parent 1        Parent 2        Previous Teacher  Current Teacher
Smith         Renee          Ann Jones       Theodore Smith  Hamil             Burke
Mills         Lucy           Barbara Mills   Steve Mills     Hamil             Burke
Jones         Brendan        Jennifer Jones  Stephen Jones   Hamil             Burke
….

Street Address     City       State     Postal Code  Home Phone
5551 Private Hill  Annandale  Virginia  22003-       (703) 323-0893
4902 Acme Ct       Annandale  Virginia  22003-       (703) 764-5829
5304 Gains Street  Fairfax    Virginia  22032-       (703) 978-1083
….

Program  First Year Enrolled  Last Year Attended  Birthday  Age in Sept  Map Coord  Notes
PF       / 0                  0                   6/25/93   5            22 A-3
PF       96/97                0                   8/14/93   5            21 F-3
PH       96/97                0                   6/13/94   4            21 A-4

One Possible Design

Examples

1. Eliminate Repeating Groups

In the original member list, each member name is followed by any databases that the member has experience with. Some might know many, and others might not know any. To answer the question, "Who knows DB2?" we need to perform an awkward scan of the list looking for references to DB2. This is inefficient and an extremely untidy way to store information.

Moving the known databases into a separate table helps a lot. Separating the repeating groups of databases from the member information results in first normal form. The MemberID in the database table matches the primary key in the member table, providing a foreign key for relating the two tables with a join operation. Now we can answer the question by looking in the database table for "DB2" and getting the list of members.

Original Table

1NF Tables
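The "Who knows DB2?" question can be sketched against the 1NF tables with Python's sqlite3 module. The table name MemberDb, the member names, and the ID values are hypothetical; the original tables are not reproduced in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT);
    -- The "database table" from the text: one row per member/database pair.
    CREATE TABLE MemberDb (
        MemberID     INTEGER REFERENCES Member(MemberID),
        DatabaseID   INTEGER,
        DatabaseName TEXT,
        PRIMARY KEY (MemberID, DatabaseID)
    );
    INSERT INTO Member VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO MemberDb VALUES
        (1, 10, 'DB2'), (1, 20, 'Oracle'), (3, 10, 'DB2');
""")

# One equality test plus a join replaces the awkward scan of the list.
db2_members = [name for (name,) in conn.execute("""
    SELECT m.Name
    FROM MemberDb d
    JOIN Member m ON m.MemberID = d.MemberID
    WHERE d.DatabaseName = 'DB2'
    ORDER BY m.Name
""")]
print(db2_members)
```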

2. Eliminate Redundant Data

In the Database Table, the primary key is made up of the MemberID and the DatabaseID. This makes sense for other attributes like "Where Learned" and "Skill Level", since they will be different for every member/database combination. But the database name depends only on the DatabaseID. The same database name will appear redundantly every time its associated ID appears in the Database Table.

Suppose you want to reclassify a database - give it a different DatabaseID. The change has to be made for every member that lists that database! If you miss some, you'll have several members with the same database under different IDs. This is an update anomaly.

Or suppose the last member listing a particular database leaves the group. His records will be removed from the system, and the database will not be stored anywhere! This is a delete anomaly. To avoid these problems, we need second normal form.

To achieve this, separate the attributes depending on both parts of the key from those depending only on the DatabaseID. This results in two tables: "Database" which gives the name for each DatabaseID, and "MemberDatabase" which lists the databases for each member.

Now we can reclassify a database in a single operation: look up the DatabaseID in the "Database" table and change its name. The result will instantly be available throughout the application.
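A short sketch of the single-operation change, using Python's sqlite3 module. The table names follow the text ("Database" and "MemberDatabase"); the sample rows, the SkillLevel values, and the new name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Database" (DatabaseID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE MemberDatabase (
        MemberID   INTEGER,
        DatabaseID INTEGER REFERENCES "Database"(DatabaseID),
        SkillLevel TEXT,
        PRIMARY KEY (MemberID, DatabaseID)
    );
    INSERT INTO "Database" VALUES (10, 'DB2'), (20, 'Oracle');
    INSERT INTO MemberDatabase VALUES
        (1, 10, 'Expert'), (1, 20, 'Novice'), (3, 10, 'Novice');
""")

# The name lives in exactly one row, however many members list the database.
cur = conn.execute(
    'UPDATE "Database" SET Name = ? WHERE DatabaseID = ?', ("IBM Db2", 10)
)
rows_changed = cur.rowcount

names = conn.execute("""
    SELECT md.MemberID, d.Name
    FROM MemberDatabase md
    JOIN "Database" d ON d.DatabaseID = md.DatabaseID
    ORDER BY md.MemberID, d.DatabaseID
""").fetchall()
print(rows_changed, names)
```

The same layout avoids the delete anomaly: removing every MemberDatabase row for DatabaseID 10 would leave the "Database" row in place.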

3. Eliminate Columns Not Dependent On Key

The Member table satisfies first normal form - it contains no repeating groups. It satisfies second normal form - since it doesn't have a composite (multi-attribute) key. But the key is MemberID, and the company name and location describe only a company, not a member. To achieve third normal form, they must be moved into a separate table. Since they describe a company, CompanyCode becomes the key of the new "Company" table.

The motivation for this is the same as for second normal form: we want to avoid update and delete anomalies. For example, suppose no members from IBM were currently stored in the database. With the previous design, there would be no record of its existence, even though 20 past members were from IBM!

BCNF. Boyce-Codd Normal Form

Boyce-Codd Normal Form states mathematically that:

A relation R is said to be in BCNF if whenever X -> A holds in R, and A is not in X, then X is a candidate key for R.

BCNF covers very specific situations where 3NF misses inter-dependencies between non-key (but candidate key) attributes. Typically, any relation that is in 3NF is also in BCNF. However, a 3NF relation won't be in BCNF if (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes, and (c) there are common attributes between the keys.

Basically, a humorous way to remember BCNF is that all functional dependencies are:

"The key, the whole key, and nothing but the key, so help me Codd."

Example 2

Example 2

The First Normal Form

For a table to be in first normal form, data must be broken up into the smallest units possible. For example, the following table is not in first normal form.

Name          Address                                       Phone
Sally Singer  123 Broadway New York, NY, 11234              (111) 222-3345
Jason Jumper  456 Jolly Jumper St. Trenton NJ, 11547        (222) 334-5566

Example 2

To conform to first normal form, this table would require additional fields. The name field should be divided into first and last name, and the address should be divided into street, city, state, and zip, like this.

ID First Last Street City State Zip Phone

564 Sally Singer 123 Broadway New York NY 11234 (111) 222-3345

565 Jason Jumper 456 Jolly Jumper St. Trenton NJ 11547 (222) 334-5566

Example 2

In addition to breaking data up into the smallest meaningful values, tables in first normal form should not contain repeating groups of fields such as in the following table.

Rep ID Representative Client 1 Time 1 Client 2 Time 2 Client 3 Time 3

TS-89 Gilroy Gladstone US Corp. 14 hrs Taggarts 26 hrs Kilroy Inc. 9 hrs

RK-56 Mary Mayhem Italiana 67 hrs Linkers 2 hrs    

Example 2

The problem here is that each representative can have multiple clients, and not all will have three. Some may have fewer, as is the case in the second record, tying up storage space in your database that is not being used, and some may have more, in which case there are not enough fields. The solution to this is to add a record for each new piece of information.

Rep ID  Rep First Name  Rep Last Name  Client       Time With Client
TS-89   Gilroy          Gladstone      US Corp      14 hrs
TS-89   Gilroy          Gladstone      Taggarts     26 hrs
TS-89   Gilroy          Gladstone      Kilroy Inc.  9 hrs
RK-56   Mary            Mayhem         Italiana     67 hrs
RK-56   Mary            Mayhem         Linkers      2 hrs

Notice the splitting of the first and last name fields again.

Example 2

This table is now in first normal form. Note that by avoiding repeating groups of fields, we have created a new problem in that there are identical values in the primary key field, violating the rules of the primary key. In order to remedy this, we need to have some other way of identifying each record. This can be done with the creation of a new key called client ID.

Rep ID*  Rep First Name  Rep Last Name  Client ID*  Client       Time With Client
TS-89    Gilroy          Gladstone      978         US Corp      14 hrs
TS-89    Gilroy          Gladstone      665         Taggarts     26 hrs
TS-89    Gilroy          Gladstone      782         Kilroy Inc.  9 hrs
RK-56    Mary            Mayhem         221         Italiana     67 hrs
RK-56    Mary            Mayhem         982         Linkers      2 hrs

This new field can now be used in conjunction with the Rep ID field to create a multiple field primary key. This will prevent confusion if ever more than one Representative were to serve a single client.

Example 2

Second Normal Form

The second normal form applies only to tables with multiple field primary keys.  Take the following table for example.

Rep ID*  Rep First Name  Rep Last Name  Client ID*  Client       Time With Client
TS-89    Gilroy          Gladstone      978         US Corp      14 hrs
TS-89    Gilroy          Gladstone      665         Taggarts     26 hrs
TS-89    Gilroy          Gladstone      782         Kilroy Inc.  9 hrs
RK-56    Mary            Mayhem         221         Italiana     67 hrs
RK-56    Mary            Mayhem         982         Linkers      2 hrs
RK-56    Mary            Mayhem         665         Taggarts     4 hrs

This table is already in first normal form.  It has a primary key consisting of Rep ID and Client ID since neither alone can be considered a unique value.  

Example 2

The second normal form states that each field in a multiple-field primary key table must be directly related to the entire primary key. In other words, each non-key field should be a fact about all the fields in the primary key. Only fields that are absolutely necessary should show up in our table; all other fields should reside in different tables.

To find out which fields are necessary, we should ask a few questions of our database. In our preceding example, I should ask the question "What information is this table meant to store?" Currently, the answer is not obvious. It may be meant to store information about individual clients, or it could be holding data for employees' time cards.

As a further example, if my database is going to contain records of employees, I may want a table of demographics and a table for payroll. The demographics table will have all the employees' personal information and will assign each of them an ID number. I should not have to enter the data twice; the payroll table should refer to each employee only by their ID number. I can then link the two tables by a relationship and will then have access to all the necessary data.

Example 2

In the table of the preceding example we are devoting three fields to the identification of the employee and two to the identification of the client. I could identify them with only one field each -- the primary key. I can then take out the extraneous fields and put them in their own tables. My database would then look like the following.

Rep ID* Client ID* Time With Client

TS-89 978 14 hrs

TS-89 665 26 hrs

TS-89 782 9 hrs

RK-56 221 67 hrs

RK-56 982 2 hrs

RK-56 665 4 hrs

The above table contains time card information.

Example 2

Rep ID* First Name Last Name

TS-89 Gilroy Gladstone

RK-56 Mary Mayhem

The above table contains Employee Information.

Example 2

Client ID* Client Name

978 US Corp

665 Taggarts

782 Kilroy Inc.

221 Italiana

982 Linkers

The above table contains Client Information

These tables are now in normal form. By splitting off the unnecessary information and putting it in its own tables, we have eliminated redundancy and put our first table in second normal form. These tables are now ready to be linked to each other through relationships.
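To illustrate how the three tables link back together, here is a sketch using Python's sqlite3 module. The table names Rep, Client, and TimeCard, and the choice to store hours as integers, are assumptions; the rows are the ones shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Rep (RepID TEXT PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Client (ClientID INTEGER PRIMARY KEY, ClientName TEXT);
    CREATE TABLE TimeCard (
        RepID    TEXT    REFERENCES Rep(RepID),
        ClientID INTEGER REFERENCES Client(ClientID),
        Hours    INTEGER,
        PRIMARY KEY (RepID, ClientID)
    );
    INSERT INTO Rep VALUES
        ('TS-89', 'Gilroy', 'Gladstone'), ('RK-56', 'Mary', 'Mayhem');
    INSERT INTO Client VALUES
        (978, 'US Corp'), (665, 'Taggarts'), (782, 'Kilroy Inc.'),
        (221, 'Italiana'), (982, 'Linkers');
    INSERT INTO TimeCard VALUES
        ('TS-89', 978, 14), ('TS-89', 665, 26), ('TS-89', 782, 9),
        ('RK-56', 221, 67), ('RK-56', 982, 2), ('RK-56', 665, 4);
""")

# Who has worked with Taggarts, and for how long?
taggarts = conn.execute("""
    SELECT r.FirstName, r.LastName, t.Hours
    FROM TimeCard t
    JOIN Rep r    ON r.RepID = t.RepID
    JOIN Client c ON c.ClientID = t.ClientID
    WHERE c.ClientName = 'Taggarts'
    ORDER BY t.Hours DESC
""").fetchall()

# Total hours per representative, straight from the time card table.
hours_by_rep = dict(
    conn.execute("SELECT RepID, SUM(Hours) FROM TimeCard GROUP BY RepID")
)
print(taggarts, hours_by_rep)
```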

Example 2

Third Normal Form

Third normal form applies the same idea to every table, including those that have a single field as their primary key: each non-key field in the table should be a fact about the primary key, and not about another non-key field. Either of the preceding two tables acts as an example of third normal form, since all the fields in each table are necessary to describe the primary key.

Once all the tables in a database have been taken through the third normal form, we can begin to set up relationships. 

Example 3

Repeating Groups And Normalization To First Normal Form (1NF)

SALES-INFORMATION: Invoice#, Date, Customer#, Salesperson, Region, Item#, Quantity, Price, Description

INVOICES (2NF): Invoice#, Date, Customer#, Salesperson, Region

INVOICE-ITEMS (1NF): Invoice#, Item#, Price, Description, Quantity

Sample values:

Invoice#     1001    1002    1003
Date         7/1/92  7/1/92  7/1/92
Customer#    456     329     897
Salesperson  John    Mary    Al
Region       West    East    West
Item#        121     348     540
Quantity     45105
Price        $2.25   $3.70   $0.40
Description  Widget  Gear    Bolt

What Is The Problem With Description/Price?

• Insert anomalies
• Delete anomalies
• Update anomalies

Decomposition Of A First-Normal-Form (1NF) Table

INVOICE-ITEMS (1NF): Invoice#, Item#, Price, Description, Quantity

ITEMS (2NF): Item#, Price, Description

INVOICE-ITEMS-QTY (2NF): Invoice#, Item#, Quantity

You can only have a 2nd Normal Form problem if there is a composite primary key.

Database Normalization

Functional dependency is key in understanding the process of normalization. Functional dependency means that if there is only one possible value of Y for every value of X, then Y is functionally dependent on X.

Database Normalization

Think of an invoice table. Two fields would be invoice # and date. Which field is functionally dependent on the other?

INVOICE # DATE

Date is functionally dependent on invoice number.
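The definition can be checked mechanically on sample data. The helper below is a minimal sketch in plain Python; the function name `determines` and the sample rows (including the 7/2/92 date) are made up for illustration. Sample rows can only refute a dependency, never prove it for all future data.

```python
def determines(rows, x, y):
    """True if each value of column x maps to exactly one value of column y."""
    mapping = {}
    for row in rows:
        # setdefault stores the first y seen for this x, then returns it.
        if mapping.setdefault(row[x], row[y]) != row[y]:
            return False
    return True


invoices = [
    {"Invoice#": 1001, "Date": "7/1/92"},
    {"Invoice#": 1002, "Date": "7/1/92"},
    {"Invoice#": 1003, "Date": "7/2/92"},
]

# Each invoice number has one date, so Date depends on Invoice#.
date_depends_on_invoice = determines(invoices, "Invoice#", "Date")
# One date can cover several invoices, so the reverse does not hold.
invoice_depends_on_date = determines(invoices, "Date", "Invoice#")
print(date_depends_on_invoice, invoice_depends_on_date)
```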

Dependencies

Functional Dependency is “good”. With functional dependency the primary key (Attribute A) determines the value of all the other non-key attributes (Attributes B,C,D,etc.)

Transitive dependency is “bad”. Transitive dependency exists if the primary key (Attribute A) determines non-key Attribute B, and Attribute B determines non-key Attribute C.

Decomposition Of A Second-Normal-Form (2NF) Table

SALES (2NF): Invoice#, Date, Customer#, Salesperson, Region

Invoice# determines Salesperson, and Salesperson determines Region. This is a transitive dependency which must be eliminated for 3NF.

INVOICES (3NF): Invoice#, Date, Customer#, Salesperson

SALESPERSON-REGION (3NF): Salesperson, Region
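The effect of removing the transitive dependency can be sketched with Python's sqlite3 module. Identifiers are adapted because "#" and "-" are not valid in unquoted SQL names (InvoiceNo, CustomerNo, SALESPERSON_REGION), and the move of John to a 'North' region is a made-up scenario.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SALESPERSON_REGION (
        Salesperson TEXT PRIMARY KEY,
        Region      TEXT
    );
    CREATE TABLE INVOICES (
        InvoiceNo   INTEGER PRIMARY KEY,
        InvoiceDate TEXT,
        CustomerNo  INTEGER,
        Salesperson TEXT REFERENCES SALESPERSON_REGION(Salesperson)
    );
    INSERT INTO SALESPERSON_REGION VALUES
        ('John', 'West'), ('Mary', 'East'), ('Al', 'West');
    INSERT INTO INVOICES VALUES
        (1001, '7/1/92', 456, 'John'),
        (1002, '7/1/92', 329, 'Mary'),
        (1003, '7/1/92', 897, 'Al');
""")

# Region is stored once per salesperson, so a reassignment is a single-row
# update no matter how many invoices that salesperson has.
conn.execute(
    "UPDATE SALESPERSON_REGION SET Region = 'North' WHERE Salesperson = 'John'"
)

sales = conn.execute("""
    SELECT i.InvoiceNo, i.Salesperson, s.Region
    FROM INVOICES i
    JOIN SALESPERSON_REGION s ON s.Salesperson = i.Salesperson
    ORDER BY i.InvoiceNo
""").fetchall()
print(sales)
```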

Summary Of 3NF Relations For Sales Database

ITEMS (3NF): Item#, Price, Description

SALESPERSON-REGION (3NF): Salesperson, Region

INVOICES (3NF): Invoice#, Date, Customer#, Salesperson

INVOICE-ITEMS-QTY (3NF): Invoice#, Item#, Quantity

Sample values:

Invoice#     1001    1002    1003
Date         7/1/92  7/1/92  7/1/92
Customer#    456     329     897
Salesperson  John    Mary    Al
Region       West    East    West
Item#        121     348     540
Quantity     45105
Price        $2.25   $3.70   $0.40
Description  Widget  Gear    Bolt

END

Other Slides

Table 1

Title                      Author1               Author2         ISBN        Subject           Pages  Publisher
Database System Concepts   Abraham Silberschatz  Henry F. Korth  0072958863  MySQL, Computers  1168   McGraw-Hill
Operating System Concepts  Abraham Silberschatz  Henry F. Korth  0471694665  Computers         944    McGraw-Hill

Orders Table Problems

First, this table is not very efficient with storage.

Second, this design does not protect data integrity.

Third, this table does not scale well.

First Normal Form

In our Table, we have two violations of First Normal Form:

First, we have more than one author field,

Second, our subject field contains more than one piece of information. With more than one value in a single field, it would be very difficult to search for all books on a given subject.

First Normal Table

Table 2

Title                      Author                ISBN        Subject    Pages  Publisher
Database System Concepts   Abraham Silberschatz  0072958863  MySQL      1168   McGraw-Hill
Database System Concepts   Henry F. Korth        0072958863  Computers  1168   McGraw-Hill
Operating System Concepts  Henry F. Korth        0471694665  Computers  944    McGraw-Hill
Operating System Concepts  Abraham Silberschatz  0471694665  Computers  944    McGraw-Hill

Additional Problems

We now have two rows for a single book. Additionally, we would be violating the Second Normal Form…

A better solution to our problem would be to split the data into separate tables (an Author table and a Subject table) to store our information, removing that information from the Book table:

Second Normal Tables

Subject Table

Subject_ID  Subject
1           MySQL
2           Computers

Author Table

Author_ID  Last Name     First Name
1          Silberschatz  Abraham
2          Korth         Henry

Book Table

ISBN        Title                      Pages  Publisher
0072958863  Database System Concepts   1168   McGraw-Hill
0471694665  Operating System Concepts  944    McGraw-Hill

Additional Problems

Each table has a primary key, used for joining tables together when querying the data. A primary key value must be unique within the table (no two books can have the same ISBN), and a primary key is also an index, which speeds up data retrieval based on the primary key.

Now to define relationships between the tables.

Relationships

Book_Author Table

ISBN        Author_ID
0072958863  1
0072958863  2
0471694665  1
0471694665  2

Book_Subject Table

ISBN        Subject_ID
0072958863  1
0072958863  2
0471694665  2
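The linking tables above resolve the many-to-many relationships. The sketch below, using Python's sqlite3 module, loads the rows shown in the slides and answers the earlier question of finding all books on a given subject. Column types are assumptions, Last_Name and First_Name stand in for the "Last Name" and "First Name" headers, ISBN is stored as text to keep its leading zero, and the helper `titles_on` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (ISBN TEXT PRIMARY KEY, Title TEXT, Pages INTEGER,
                       Publisher TEXT);
    CREATE TABLE Author (Author_ID INTEGER PRIMARY KEY, Last_Name TEXT,
                         First_Name TEXT);
    CREATE TABLE Subject (Subject_ID INTEGER PRIMARY KEY, Subject TEXT);
    CREATE TABLE Book_Author (
        ISBN      TEXT    REFERENCES Book(ISBN),
        Author_ID INTEGER REFERENCES Author(Author_ID),
        PRIMARY KEY (ISBN, Author_ID)
    );
    CREATE TABLE Book_Subject (
        ISBN       TEXT    REFERENCES Book(ISBN),
        Subject_ID INTEGER REFERENCES Subject(Subject_ID),
        PRIMARY KEY (ISBN, Subject_ID)
    );
    INSERT INTO Book VALUES
        ('0072958863', 'Database System Concepts', 1168, 'McGraw-Hill'),
        ('0471694665', 'Operating System Concepts', 944, 'McGraw-Hill');
    INSERT INTO Author VALUES (1, 'Silberschatz', 'Abraham'), (2, 'Korth', 'Henry');
    INSERT INTO Subject VALUES (1, 'MySQL'), (2, 'Computers');
    INSERT INTO Book_Author VALUES
        ('0072958863', 1), ('0072958863', 2),
        ('0471694665', 1), ('0471694665', 2);
    INSERT INTO Book_Subject VALUES
        ('0072958863', 1), ('0072958863', 2), ('0471694665', 2);
""")


def titles_on(subject):
    """Titles of all books filed under the given subject name."""
    return [title for (title,) in conn.execute("""
        SELECT b.Title
        FROM Book b
        JOIN Book_Subject bs ON bs.ISBN = b.ISBN
        JOIN Subject s       ON s.Subject_ID = bs.Subject_ID
        WHERE s.Subject = ?
        ORDER BY b.Title
    """, (subject,))]


mysql_titles = titles_on("MySQL")
computer_titles = titles_on("Computers")
print(mysql_titles, computer_titles)
```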

Second Normal Form

As the First Normal Form deals with redundancy of data across a horizontal row, Second Normal Form (or 2NF) deals with redundancy of data in vertical columns.

As stated earlier, the normal forms are progressive, so to achieve Second Normal Form, the tables must already be in First Normal Form.

The Book Table will be used for the 2NF example

2NF Table

Publisher Table

Publisher_ID  Publisher Name
1             McGraw-Hill

Book Table

ISBN        Title                      Pages  Publisher_ID
0072958863  Database System Concepts   1168   1
0471694665  Operating System Concepts  944    1

2NF

Here we have a one-to-many relationship between the book table and the publisher. A book has only one publisher, and a publisher will publish many books. When we have a one-to-many relationship, we place a foreign key in the Book Table, pointing to the primary key of the Publisher Table.

The other requirement for Second Normal Form is that you cannot have any data in a table with a composite key that does not relate to all portions of the composite key.

Third Normal Form

Third normal form (3NF) requires that there are no functional dependencies of non-key attributes on something other than a candidate key.

A table is in 3NF if all of the non-primary key attributes are mutually independent

There should not be transitive dependencies

Boyce-Codd Normal Form

BCNF requires that the table is in 3NF and that the only determinants are the candidate keys.