DB SYSTEMS

Chapter 3: Data Normalization

Based on G. Post, DBMS: Designing & Building Business Applications

University of Manitoba, Asper School of Business

3500 DBMS, Bob Travica

Updated 2010


Normalization

The process of putting data into the format of relational databases (that is, organizing data for relational databases).

In practice, it boils down to defining tables so that:

a) problems (anomalies) with insertion, deletion, and modification of data are avoided;

b) data quality (completeness, integrity) is preserved;

c) redundancy is reduced.


Relational Database Terminology

Relational database: A collection of tables (relations). Tables store atomic data.

Table: A collection of columns (attributes, properties, fields) describing an entity (class).

A table is also a collection of rows (records), each with the same number of columns.

Each row stores data on one object (entity instance).

Entity (class): Employee. Table: Employee. Columns are attributes/properties; rows are objects.

EmployeeID  TaxpayerID   LastName   FirstName  HomePhone       Address
12512       888-22-5552  Cartom     Abdul      (603) 323-9893  252 South Street
15293       222-55-3737  Venetiaan  Roland     (804) 888-6667  937 Paramaribo Ln
22343       293-87-4343  Johnson    John       (703) 222-9384  234 Main Street
29387       837-36-2933  Stenheim   Susan      (410) 330-9837  8934 W. Maple


Relational Database Terminology – Primary Key

Every table has a primary key (key): an attribute that uniquely identifies each row (e.g., EmployeeID in the Employee table above).

A primary key can span more than one column; such a key is called a combined (composite, concatenated) key.

Note: Watch for data types (e.g., number vs. text) and naming rules (arbitrary but consistent).

OrderItem

OrderID  ItemID  Quantity
1        229     2
1        253     4
2        229     1
2        555     4

A primary key can also be generated automatically by the DBMS; such a key is called a surrogate key.

Other attributes are called non-key columns.
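The key concepts above can be tried out directly. Below is a minimal sketch using Python's built-in sqlite3 module and an in-memory SQLite database; the SQL column types and the two Employee rows inserted without IDs are illustrative assumptions, not part of the original. It declares OrderItem with the combined key OrderID + ItemID, loads the four OrderItem rows shown above, confirms that a second row for order 1 and item 229 is rejected, and lets SQLite generate surrogate EmployeeID values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Combined (composite) key: the pair (OrderID, ItemID) must be unique.
# SQLite does not imply NOT NULL for composite primary key columns, so it is stated explicitly.
conn.execute("""
    CREATE TABLE OrderItem (
        OrderID  INTEGER NOT NULL,
        ItemID   INTEGER NOT NULL,
        Quantity INTEGER,
        PRIMARY KEY (OrderID, ItemID)
    )
""")
conn.executemany("INSERT INTO OrderItem VALUES (?, ?, ?)",
                 [(1, 229, 2), (1, 253, 4), (2, 229, 1), (2, 555, 4)])

# A second row for order 1 / item 229 repeats the key and is refused.
try:
    conn.execute("INSERT INTO OrderItem VALUES (1, 229, 7)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Surrogate key: SQLite fills an INTEGER PRIMARY KEY column when no value is given.
conn.execute("""
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,
        LastName   TEXT,
        FirstName  TEXT
    )
""")
conn.execute("INSERT INTO Employee (LastName, FirstName) VALUES ('Cartom', 'Abdul')")
conn.execute("INSERT INTO Employee (LastName, FirstName) VALUES ('Johnson', 'John')")
generated_ids = [row[0] for row in conn.execute("SELECT EmployeeID FROM Employee ORDER BY EmployeeID")]

print(duplicate_rejected, generated_ids)  # True [1, 2]
```

Other DBMSs express surrogate keys differently (for example, identity or auto-increment columns), so the SQLite syntax here is one option rather than the only one.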


Relational Database Shorthand Notation

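Customer(CustomerID, LastName, FirstName, Address, City, State, ZipPostalCode, TelephoneNumber)

The table name comes first, followed by its columns in parentheses. In this notation the primary key is underlined (here, CustomerID); the remaining columns are non-key columns.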

• Note: Telephone number can be used as a “backup key.”
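The shorthand maps directly onto a table definition. Here is a minimal sketch with Python's sqlite3 module, assuming TEXT types and treating the "backup key" as a UNIQUE constraint; how the backup key is declared is an assumption, since the slide does not say. Storing ZipPostalCode and TelephoneNumber as text follows the earlier note on data types, and the clashing second customer is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        CustomerID      INTEGER PRIMARY KEY,   -- the (underlined) primary key
        LastName        TEXT,
        FirstName       TEXT,
        Address         TEXT,
        City            TEXT,
        State           TEXT,
        ZipPostalCode   TEXT,
        TelephoneNumber TEXT UNIQUE            -- assumed form of the "backup key"
    )
""")
conn.execute("INSERT INTO Customer (CustomerID, LastName, TelephoneNumber) "
             "VALUES (1, 'Washington', '502-777-7575')")

# Hypothetical second customer reusing the same phone number: the UNIQUE constraint refuses it.
try:
    conn.execute("INSERT INTO Customer (CustomerID, LastName, TelephoneNumber) "
                 "VALUES (2, 'Lasater', '502-777-7575')")
    phone_clash_rejected = False
except sqlite3.IntegrityError:
    phone_clash_rejected = True

# The backup key locates a customer just as the primary key does.
found = conn.execute("SELECT CustomerID FROM Customer WHERE TelephoneNumber = ?",
                     ("502-777-7575",)).fetchone()
print(phone_clash_rejected, found)  # True (1,)
```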


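Order Management Application

[Figure: Normalized Tables Diagram (Schema). Customer (1) to Order (*); Salesperson (1) to Order (*); Order (1) to OrderItem (*); Item (1) to OrderItem (*).]

[Figure: Non-Normalized Class Diagram. Customer (1) to Order (*); Salesperson (1) to Order (*); Order (*) to Item (*), with OrderItem as the association class (also named ItemOrdered, OrderDetail, etc.).]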


Shorthand Notation for Normalized Tables Diagram – Foreign Key

Customer(CustomerID, Name, Address, City, Phone)

Salesperson(EmployeeID, Name, DateHired)

Order(OrderID, OrderDate, CustomerID, EmployeeID)

OrderItem(OrderID, ItemID, Quantity)

Item(ItemID, Description, ListPrice)

• Foreign key: an attribute that is a key in another table (e.g., CustomerID in Order).

• Logic and naming of OrderItem: the Order-Item many-to-many relationship is replaced with two 1:M relationships.

• OrderItem has a combined key: OrderID + ItemID.
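The foreign keys in this schema can be declared explicitly. The following is a minimal sketch in SQLite through Python's sqlite3 module; the column types, the sample rows, the hypothetical salesperson, and the dates are assumptions. Two details are worth noting: ORDER is an SQL keyword, so the table name Order must be quoted, and SQLite checks foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Customer    (CustomerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT, City TEXT, Phone TEXT);
    CREATE TABLE Salesperson (EmployeeID INTEGER PRIMARY KEY, Name TEXT, DateHired TEXT);
    CREATE TABLE Item        (ItemID INTEGER PRIMARY KEY, Description TEXT, ListPrice REAL);
    -- ORDER is an SQL keyword, so the table name is quoted.
    CREATE TABLE "Order" (
        OrderID    INTEGER PRIMARY KEY,
        OrderDate  TEXT,
        CustomerID INTEGER REFERENCES Customer(CustomerID),
        EmployeeID INTEGER REFERENCES Salesperson(EmployeeID)
    );
    -- Two 1:M relationships (Order-OrderItem, Item-OrderItem) replace the M:N Order-Item.
    CREATE TABLE OrderItem (
        OrderID  INTEGER NOT NULL REFERENCES "Order"(OrderID),
        ItemID   INTEGER NOT NULL REFERENCES Item(ItemID),
        Quantity INTEGER,
        PRIMARY KEY (OrderID, ItemID)
    );
""")
conn.execute("INSERT INTO Customer VALUES (1, 'Washington', '95 Easy Street', NULL, '502-777-7575')")
conn.execute("INSERT INTO Salesperson VALUES (10, 'Hypothetical Rep', '2009-01-15')")
conn.execute("""INSERT INTO "Order" VALUES (100, '2010-03-01', 1, 10)""")

# CustomerID 99 does not exist in Customer, so the foreign key refuses this order.
try:
    conn.execute("""INSERT INTO "Order" VALUES (101, '2010-03-02', 99, 10)""")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```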


Video Store Transaction Management System (VTMS): Classes, Columns & Business Rules

Customer table. Key: CustomerID. Attributes: Name, Address, Phone

Video table. Key: VideoID. Attributes: Title, RentalFee, Rating…

RentalTransaction table. Key: TransactionID. Attributes: CustomerID, Date

VideoRented table. Key: TransactionID + VideoID. Attributes: Copy#

“Static (master) data”: market and inventory entities (Customer, Video), which don’t change often.

“Dynamic data” (transaction data): operations entities (RentalTransaction, VideoRented), which change more often.

Business rules: A customer can have many transactions… Each transaction can include many videos… A transaction can include only one copy of a particular video...


Normalized Schema for VTMS in Shorthand Notation

Customer(CustomerID, LastName, FirstName, Address, City, …)

VideoRented(TransID, VideoID, Copy#)

Video(VideoID, Title, RentalFee)

RentalTransaction(TransID, RentDate, CustomerID)

Transaction data are stored in two tables, RentalTransaction and VideoRented, due to the business rule that a rental transaction can include just one copy of a particular video.
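As a minimal sketch (Python's sqlite3 module; types, ISO-format dates, and sample rows are assumptions), the VTMS schema can be declared so that the combined key of VideoRented enforces the one-copy rule. Copy# is renamed to the hypothetical column name CopyNo, because SQLite, like many SQL dialects, does not accept '#' in an unquoted identifier. The rows come from the Washington rental shown later in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT,
                           Address TEXT, City TEXT);
    CREATE TABLE Video (VideoID INTEGER PRIMARY KEY, Title TEXT, RentalFee REAL);
    CREATE TABLE RentalTransaction (
        TransID    INTEGER PRIMARY KEY,
        RentDate   TEXT,
        CustomerID INTEGER REFERENCES Customer(CustomerID)
    );
    -- Copy# renamed CopyNo (hypothetical) to avoid '#' in an identifier.
    CREATE TABLE VideoRented (
        TransID INTEGER NOT NULL REFERENCES RentalTransaction(TransID),
        VideoID INTEGER NOT NULL REFERENCES Video(VideoID),
        CopyNo  INTEGER,
        PRIMARY KEY (TransID, VideoID)  -- at most one copy of a given video per transaction
    );
""")
conn.execute("INSERT INTO Customer VALUES (3, 'Washington', NULL, '95 Easy Street', NULL)")
conn.executemany("INSERT INTO Video VALUES (?, ?, ?)",
                 [(1, '2001: A Space Odyssey', 1.50), (6, 'Clockwork Orange', 1.50)])
conn.execute("INSERT INTO RentalTransaction VALUES (1, '2002-04-18', 3)")
conn.execute("INSERT INTO VideoRented VALUES (1, 1, 2)")
conn.execute("INSERT INTO VideoRented VALUES (1, 6, 3)")

# A second copy of video 6 in the same transaction repeats the key (1, 6).
try:
    conn.execute("INSERT INTO VideoRented VALUES (1, 6, 4)")
    second_copy_rejected = False
except sqlite3.IntegrityError:
    second_copy_rejected = True
print(second_copy_rejected)  # True
```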


Why Normalize – Avoiding Data Anomalies

• How do we get from the business rules to those four tables?

• Are these two tables not enough?

[Class diagram: Customer (*) rents (*) Video]

Partial schema for this class diagram:

Customer(CustomerID, LastName, FirstName, … VideoID, Date)

Video(VideoID, Title, RentalFee)

Not good because:

• Transaction data would have to be part of table Customer (or Video), which repeats the Customer data for each transaction: redundancy.

• Deleting transaction data also deletes customer data: deletion anomaly.

• New customers cannot be added before they rent something, because VideoID, as part of the key of this Customer table, cannot be empty: insertion anomaly.
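The two anomalies can be reproduced on the rejected design. This is a minimal sketch with Python's sqlite3 module; the key CustomerID + VideoID + RentDate is an assumption (the slide does not spell out the key), the rows are a small subset of the sample rentals used in this chapter, and "NewCustomer" is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized design: rental data stored inside Customer.
# Assumed key: CustomerID + VideoID + RentDate (NOT NULL stated explicitly for SQLite).
conn.execute("""
    CREATE TABLE Customer (
        CustomerID INTEGER NOT NULL,
        LastName   TEXT,
        VideoID    INTEGER NOT NULL,
        RentDate   TEXT NOT NULL,
        PRIMARY KEY (CustomerID, VideoID, RentDate)
    )
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)", [
    (3, 'Washington', 1, '2002-04-18'),
    (3, 'Washington', 6, '2002-04-18'),  # customer data repeated: redundancy
    (7, 'Lasater', 8, '2002-04-30'),
])

# Deletion anomaly: removing Lasater's only rental here removes Lasater as a customer.
conn.execute("DELETE FROM Customer WHERE CustomerID = 7 AND VideoID = 8")
lasater_left = conn.execute("SELECT COUNT(*) FROM Customer WHERE CustomerID = 7").fetchone()[0]

# Insertion anomaly: a customer with no rental has no VideoID, and the key forbids NULL.
try:
    conn.execute("INSERT INTO Customer VALUES (9, 'NewCustomer', NULL, NULL)")
    insert_ok = True
except sqlite3.IntegrityError:
    insert_ok = False
print(lasater_left, insert_ok)  # 0 False
```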


Normalization

Rule of thumb: Each many-to-many relationship must be replaced by two one-to-many relationships (see Customer-Order-Item above).

[Diagram 1: Customer (*) rents (*) Video becomes Customer (1) has (*) RentalTransaction (*) includes (*) Video. This is still M:M: how can different copies of the same video be tracked?]

[Diagram 2: Customer (1) has (*) RentalTransaction (1) contains (*) VideoRented (*) is rented (1) Video.]

Table VideoRented tracks each copy of a particular video. The multiplicity on the Video side is forced down to 1, which, together with the combined key TransID + VideoID, enforces the business rule that only one copy of a video can be rented out in a transaction (see the VTMS business rules above).


Normalization – Step by Step

1) Interview users and understand the output they need. Put the data into one large table (RentalForm).

2) Pick out attributes.

3) Find repeating groups.

4) Look for potential keys.

5) Identify computed values.

RentalForm(TransID, RentDate, (CustomerID, Name, Address, City, State, …), (VideoID, Copy#, Title, RentalFee))
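The first step can be sketched in Python: each rental form is a nested record with one customer section and a repeating video section, and putting the forms into one large table yields one row per rented video. The field names follow the notation above, the values are the sample rentals from the repeating-groups table in this chapter, and the function name flatten is hypothetical.

```python
# Each rental form: transaction header, one customer section, a repeating video section.
rental_forms = [
    {"TransID": 1, "RentDate": "4/18/02",
     "Customer": {"CustomerID": 3, "LastName": "Washington",
                  "Phone": "502-777-7575", "Address": "95 Easy Street"},
     "Videos": [
         {"VideoID": 1, "Copy#": 2, "Title": "2001: A Space Odyssey", "RentalFee": 1.50},
         {"VideoID": 6, "Copy#": 3, "Title": "Clockwork Orange", "RentalFee": 1.50},
     ]},
    {"TransID": 2, "RentDate": "4/30/02",
     "Customer": {"CustomerID": 7, "LastName": "Lasater",
                  "Phone": "615-888-4474", "Address": "67 S. Ray Drive"},
     "Videos": [
         {"VideoID": 8, "Copy#": 1, "Title": "Hopscotch", "RentalFee": 1.50},
         {"VideoID": 2, "Copy#": 1, "Title": "Apocalypse Now", "RentalFee": 2.00},
         {"VideoID": 6, "Copy#": 1, "Title": "Clockwork Orange", "RentalFee": 1.50},
     ]},
]


def flatten(forms):
    """Put every form into one large table: one row per rented video."""
    rows = []
    for form in forms:
        for video in form["Videos"]:
            rows.append({"TransID": form["TransID"], "RentDate": form["RentDate"],
                         **form["Customer"], **video})
    return rows


big_table = flatten(rental_forms)
print(len(big_table))  # 5
# Customer 7's data is copied once per rented video: the redundancy of repeating groups.
print(sum(1 for r in big_table if r["CustomerID"] == 7))  # 3
```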


Problems with Repeating Groups (Sections)

RentalForm(TransID, RentDate, (CustomerID, Phone, Name, Address, City, State, …), (VideoID, Copy#, Title, Rent))

TransID  RentDate  CustomerID  LastName    Phone         Address          VideoID  Copy#  Title                  Rent
1        4/18/02   3           Washington  502-777-7575  95 Easy Street   1        2      2001: A Space Odyssey  $1.50
1        4/18/02   3           Washington  502-777-7575  95 Easy Street   6        3      Clockwork Orange       $1.50
2        4/30/02   7           Lasater     615-888-4474  67 S. Ray Drive  8        1      Hopscotch              $1.50
2        4/30/02   7           Lasater     615-888-4474  67 S. Ray Drive  2        1      Apocalypse Now         $2.00
2        4/30/02   7           Lasater     615-888-4474  67 S. Ray Drive  6        1      Clockwork Orange       $1.50

The transaction and customer data repeat for every video rented: these are the repeating groups.

Problems:

• Insertion anomaly: Inserting a customer leaves the video and transaction columns blank. With VideoID as part of the key, customer and video data must be inserted at the same time.

• Deletion anomaly: Deleting transaction data also deletes customer and video data.

• Useless redundancy and wasted storage.

• If there are repeating sections, the table is not in first normal form (1NF).


First Normal Form (1NF)

1NF: A table is in 1NF if it does not have repeating sections.

Normalization Procedure: Remove repeating sections by splitting the initial table into new tables. Link new tables on the key from the initial table.

RentalTransaction(TransID, RentDate)   (remainder of the initial table)

Video(TransID, VideoID, Copy#, Title, RentalFee)   (new)

Customer(TransID, CustomerID, Phone, Name, Address, City, State, ZipCode)   (new)
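The 1NF split above can be sketched as projections of the large table, each keeping TransID as the link and dropping duplicate rows. This minimal Python version uses the sample rentals with only a subset of columns (LastName stands in for the customer section); the helper name project is hypothetical.

```python
# Rows of the large table: (TransID, RentDate, CustomerID, LastName, VideoID, Copy#, Title, RentalFee)
rows = [
    (1, "4/18/02", 3, "Washington", 1, 2, "2001: A Space Odyssey", 1.50),
    (1, "4/18/02", 3, "Washington", 6, 3, "Clockwork Orange", 1.50),
    (2, "4/30/02", 7, "Lasater", 8, 1, "Hopscotch", 1.50),
    (2, "4/30/02", 7, "Lasater", 2, 1, "Apocalypse Now", 2.00),
    (2, "4/30/02", 7, "Lasater", 6, 1, "Clockwork Orange", 1.50),
]


def project(rows, positions):
    """Keep only the given column positions and drop duplicate rows, preserving order."""
    seen, out = set(), []
    for row in rows:
        picked = tuple(row[i] for i in positions)
        if picked not in seen:
            seen.add(picked)
            out.append(picked)
    return out


rental_transaction = project(rows, [0, 1])   # (TransID, RentDate): remainder of the initial table
customer = project(rows, [0, 2, 3])          # (TransID, CustomerID, LastName): new
video = project(rows, [0, 4, 5, 6, 7])       # (TransID, VideoID, Copy#, Title, RentalFee): new

print(rental_transaction)  # [(1, '4/18/02'), (2, '4/30/02')]
print(customer)            # [(1, 3, 'Washington'), (2, 7, 'Lasater')]
print(len(video))          # 5
```

Video still repeats the title and fee of Clockwork Orange; that leftover redundancy is what the next normal form deals with.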


Problems with First Normal Form

There are problems concerning the relationship between the key and non-keys.

Concept of functional dependence: An attribute depends on another attribute if the value of the latter determines the value of the former, that is, each value of the latter goes with exactly one value of the former. The key column must be sufficient for determining the values of the non-key columns.

This check applies only to tables with concatenated keys, such as Video:

TransID  VideoID  Copy#  Title                  RentalFee
1        1        2      2001: A Space Odyssey  $1.50
1        6        3      Clockwork Orange       $1.50
2        8        1      Hopscotch              $1.50
2        2        1      Apocalypse Now         $2.00
2        6        1      Clockwork Orange       $1.50
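Functional dependence can be tested against the rows of a table: X determines Y when no two rows agree on X but differ on Y. Here is a minimal Python sketch over the Video table above; the function name holds and its parameters are hypothetical.

```python
# Video table in 1NF, copied from the example above.
COLUMNS = ["TransID", "VideoID", "Copy#", "Title", "RentalFee"]
VIDEO = [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50),
]


def holds(rows, columns, lhs, rhs):
    """Return True if, in these rows, each value of the lhs columns goes with exactly one rhs value."""
    position = {name: i for i, name in enumerate(columns)}
    seen = {}
    for row in rows:
        left = tuple(row[position[c]] for c in lhs)
        right = row[position[rhs]]
        # Remember the first rhs value for this lhs value; a different one breaks the dependency.
        if seen.setdefault(left, right) != right:
            return False
    return True


print(holds(VIDEO, COLUMNS, ["VideoID"], "Title"))             # True
print(holds(VIDEO, COLUMNS, ["TransID"], "Title"))             # False
print(holds(VIDEO, COLUMNS, ["TransID", "VideoID"], "Copy#"))  # True
```

Sample rows can only refute a dependency, never prove it: a dependency such as VideoID → Title comes from the business rules, and the data merely has to be consistent with it.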


Problems with First Normal Form (cont.)

If any non-key column depends just on a part of the key (there is partial functional dependence), the table is not in 2NF.

Video(TransID, VideoID, Copy#, Title, RentalFee)

• VideoID alone is sufficient to determine Title and RentalFee, so these columns depend on just part of the combined key: partial functional dependency.

• Copy# is determined only by the full key (TransID + VideoID combined): full functional dependency on the key.
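The 2NF test can be sketched by checking every proper part of the combined key against every non-key column. This minimal Python version reuses the sample Video rows; the function names holds and partial_dependencies are hypothetical, and with so few rows the result only reflects what the sample data allows.

```python
from itertools import combinations

COLUMNS = ["TransID", "VideoID", "Copy#", "Title", "RentalFee"]
KEY = ("TransID", "VideoID")
VIDEO = [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50),
]


def holds(rows, lhs, rhs):
    """True if every value of the lhs columns goes with exactly one rhs value in these rows."""
    position = {name: i for i, name in enumerate(COLUMNS)}
    seen = {}
    for row in rows:
        left = tuple(row[position[c]] for c in lhs)
        if seen.setdefault(left, row[position[rhs]]) != row[position[rhs]]:
            return False
    return True


def partial_dependencies(rows, key):
    """For each non-key column, list the proper parts of the key that already determine it."""
    found = {}
    for col in COLUMNS:
        if col in key:
            continue
        for size in range(1, len(key)):          # proper subsets only
            for part in combinations(key, size):
                if holds(rows, part, col):
                    found.setdefault(col, []).append(part)
    return found


print(partial_dependencies(VIDEO, KEY))
# {'Title': [('VideoID',)], 'RentalFee': [('VideoID',)]}
```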


Second Normal Form (2NF)

2NF: A table is in 2NF if it is (a) in 1NF and (b) all non-key columns depend on the entire key.

Normalization procedure: Move TransID and Copy# into a new table, VideoRented. Preserve a link between Video and VideoRented by importing VideoID into table VideoRented.

Video(TransID, VideoID, Copy#, Title, RentalFee): TransID and Copy# are moved; VideoID is exported.

VideoRented(TransID, VideoID, Copy#)   (new table)

Video(VideoID, Title, RentalFee)   (resulting Video table)
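The 2NF split can be checked in a few lines of Python: project the 1NF Video rows into VideoRented and Video, then join them back on VideoID to confirm that nothing was lost (a lossless decomposition). This is a minimal sketch over the sample rows; the variable names are hypothetical.

```python
# Video table in 1NF: (TransID, VideoID, Copy#, Title, RentalFee)
VIDEO_1NF = [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50),
]

# New table VideoRented(TransID, VideoID, Copy#): VideoID stays as the link.
video_rented = sorted({(t, v, c) for t, v, c, _, _ in VIDEO_1NF})
# Resulting Video(VideoID, Title, RentalFee): one row per video.
video = sorted({(v, title, fee) for _, v, _, title, fee in VIDEO_1NF})

# Join back on VideoID to confirm the split lost no information.
title_fee = {v: (title, fee) for v, title, fee in video}
rejoined = sorted((t, v, c) + title_fee[v] for t, v, c in video_rented)

print(len(video_rented), len(video))  # 5 4
print(rejoined == sorted(VIDEO_1NF))  # True
```

Clockwork Orange's title and fee are now stored once, so a fee change touches a single row.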


Table Customer must also be brought into 2NF by moving TransID into table RentalTransaction (where it already exists) and exporting CustomerID into RentalTransaction.

Customer(TransID, CustomerID, Phone, Name, Address, City, State,…): TransID is moved; CustomerID is exported.

RentalTransaction(TransID, RentDate, CustomerID)   (completed)

Customer(CustomerID, LastName, FirstName, Address, City, …)   (resulting Customer table)

This finalizes 2NF.


Third Normal Form (3NF)

Problem addressed by 3NF: If any non-key column depends on another non-key column, there is a transitive dependency and the table is not in 3NF.

3NF: A table is in 3NF if it is (a) in 2NF and (b) each non-key attribute depends on the key only.

Our design is already in 3NF!

Customer(CustomerID, LastName, FirstName, Address, City, …)

VideoRented(TransID, VideoID, Copy#)

Video(VideoID, Title, RentalFee)

RentalTransaction(TransID, RentDate, CustomerID)


3NF Example

Table in 2NF: Sales(CustomerID, CustomerName, Salesperson, Region)

• Violation of 3NF: Region (a non-key) depends on Salesperson (another non-key).

• Solution: split the table into 2 tables:

Sales(CustomerID, CustomerName, Salesperson)

Salesperson(Salesperson, Region)

• Forms beyond the third are very rare, and reaching 3NF is sufficient for practical purposes.
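The Sales example can be sketched in Python: confirm the transitive chain CustomerID → Salesperson → Region on some rows, then split off the Salesperson table. The rows, customer names, salespeople, and regions are hypothetical, and the function name determines is likewise illustrative.

```python
# Hypothetical rows for Sales(CustomerID, CustomerName, Salesperson, Region).
SALES = [
    (1, "Customer A", "Jones", "East"),
    (2, "Customer B", "Jones", "East"),
    (3, "Customer C", "Kim", "West"),
]
CUSTOMER_ID, CUSTOMER_NAME, SALESPERSON, REGION = range(4)


def determines(rows, a, b):
    """True if column a determines column b in these rows (a -> b)."""
    mapping = {}
    return all(mapping.setdefault(row[a], row[b]) == row[b] for row in rows)


# Transitive dependency: the key determines Salesperson, and the non-key Salesperson determines Region.
transitive = (determines(SALES, CUSTOMER_ID, SALESPERSON)
              and determines(SALES, SALESPERSON, REGION))

# 3NF split: Region moves to a Salesperson table keyed by Salesperson.
sales = [(cid, name, sp) for cid, name, sp, _ in SALES]
salesperson = sorted({(sp, region) for _, _, sp, region in SALES})

print(transitive)   # True
print(salesperson)  # [('Jones', 'East'), ('Kim', 'West')]
```

After the split, Jones's region is stored once instead of once per customer.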


Schema for VTMS Allowing Multiple Copies per Transaction

Customer(CustomerID, LastName, FirstName, Address, City, …)

Video(VideoID, Title, RentalFee)

RentalTransaction(TransID, CustomerID, VideoID, RentDate)

Note: The Video key can be made unique for every copy, e.g., VideoID = 85.1 (the decimal place designates the copy) or 85c1 (text type); alternatively, use a bar code for each video and copy (ItemID).


Normalization Summary (Must know!)

1) If a table has repeating sections, there is huge redundancy and different classes are mixed together. Split the table so that classes are clearly differentiated. Result: 1NF.

2) If a table has a combined key, non-key columns may depend on just a part of the primary key, and so there is partial functional dependency. Split the table so that in new tables non-keys depend on the entire key. Result: 2NF.

3) If a non-key depends on another non-key, there is transitive dependency. Split the table so that in new tables each non-key depends on the key and nothing but the key. Result: 3NF.

1NF: A table is in 1NF if it does not have repeating sections.

2NF: A table is in 2NF if it is in 1NF and non-key columns depend on the entire key.

3NF: A table is in 3NF if it is in 2NF and all non-keys depend on the key only.