7/31/2019 8.1. Normalization
Prof. Margita Kon-Popovska 2006
DATABASE SYSTEMS
Fall 2006
Normalization
Relation
Definition: A relation is a named, two-dimensional table of data
A table is made up of rows (records) and columns (attributes or fields)
Not all tables qualify as relations. Requirements:
Every relation has a unique name.
Every attribute value is atomic (not multivalued, not composite)
Every row is unique (can't have two rows with exactly the same values for all their fields)
Attributes (columns) in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
NOTE: all relations are in 1st Normal Form
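The requirements above can be checked mechanically for a table held in memory. A minimal Python sketch, assuming each row is a list of values and treating any Python list, tuple, set, or dict as a multivalued (non-atomic) value; composite values hidden inside strings, such as a full address, cannot be detected this way. The function name and the sample employee rows are hypothetical.

```python
def relation_problems(columns, rows):
    """Return the ways a table breaks the relation requirements (empty list = OK)."""
    problems = []
    # Attribute (column) names must be unique.
    if len(set(columns)) != len(columns):
        problems.append("duplicate column names")
    seen = []
    for row in rows:
        # Every value must be atomic: a container means a multivalued attribute.
        for col, value in zip(columns, row):
            if isinstance(value, (list, tuple, set, dict)):
                problems.append(f"multivalued {col}: {value!r}")
        # Every row must be unique (list comparison also works for unhashable values).
        if list(row) in seen:
            problems.append(f"duplicate row: {row!r}")
        seen.append(list(row))
    return problems


# Hypothetical employee/course data: the first table stores several courses in one cell.
cols = ["Emp_ID", "Name", "Course_Title"]
unnormalized = [[100, "Emp A", ["Course X", "Course Y"]], [140, "Emp B", "Course Z"]]
flat = [[100, "Emp A", "Course X"], [100, "Emp A", "Course Y"], [140, "Emp B", "Course Z"]]

print(relation_problems(cols, unnormalized))  # one problem: Course_Title is multivalued
print(relation_problems(cols, flat))          # []
```

Order of rows and columns is deliberately not checked, since it is irrelevant for a relation.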
Example
Question: Is this a relation? Answer: No, due to repeating groups
Question: What's the primary key? Answer: Emp_ID
Question: Is this a relation? Is it in 1st NF?
Answer: Yes, unique rows and no multivalued attributes
Question: What's the primary key? Answer: Composite primary key Emp_ID, Course_Title
Data Normalization
Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data, i.e., reduce redundancy
Achieve a design that is highly flexible
Ensure that the design is free of certain update, insertion and deletion anomalies
The process of decomposing relations with anomalies to produce smaller, well-structured relations
Normalization
The process of decomposing complex data structures into simple relations according to a set of dependency rules.
McFadden and Hoffer
Well-Structured Relations
A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies
Goal is to avoid anomalies
Insertion anomaly: adding new rows forces the user to create duplicate data
Deletion anomaly: deleting rows may cause a loss of data that would be needed for other future rows
Modification anomaly: changing data in a row forces changes to other rows because of duplication
General rule of thumb: a table should not pertain to more than one entity type
Possible inconsistencies on updating
[Figure: sample table annotated with: null key, insertion anomaly, update anomaly, potential deletion anomaly, duplication]
Normalization
Progressively putting the relation into a higher normal form to get well-structured relations:
1NF → 2NF → 3NF → BCNF → 4NF → 5NF → DK/NF (Domain-Key Normal Form)
Steps in normalization
First Normal Form (1NF)
No multivalued attributes: every attribute value is atomic
[Figure: a table that is in 1NF, and a table that is not in 1NF (multivalued attributes), so it is not a relation]
Example: Invoice
How to get Relations (tables) in 1NF
Stereos To Go Invoice
Order No.: 10001    Date: 5/8/03    Account No.: 0000-000-0000-0
Customer: John Smith
Address: 2036-26 Street, Sacramento, CA 95819

Item Number | Product Code | Product Description/Manufacturer | Qty | Price
1 | SAGX730 | Pioneer Remote A/V Receiver | 1 | 569.95
2 | AT10 | Cerwin Vega Loudspeakers | 1 | 359.95
3 | CDPC725 | Sony Disc-Jockey CD Changer | 1 | 399.95
4 | | | |
5 | | | |

Date Delivered: 6/8/03
Subtotal: 1329.85
Shipping & Handling: 100.00
Sales Tax: 103.06
Total: 1532.91
Payment: 0000 000 0000 0, 1/05, John Smith
Unnormalized table
How would we get 1NF of invoice?
(Invoice-no, Invoice-date, Date-delivered, Cust-account, Cust-name, Cust-addr, Cust-city, Cust-state, Zip-code, Item1, Item_descrip1, Item_qty1, Item_price1, Item2, Item_descrip2, Item_qty2, Item_price2, . . . , Item7, Item_descrip7, Item_qty7, Item_price7)
(the Item, Item_descrip, Item_qty, Item_price columns form a repeating group)
A relation is in first normal form if and only if every attribute is single-valued for each tuple.
Remove all repeating groups
Create a flat file
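Removing the repeating group can be sketched in Python. The sketch assumes the unnormalized invoice is a dict whose repeating group uses numbered keys (Item1, Item_descrip1, ..., up to Item7 as in the schema above); the function name is hypothetical and only a few invoice columns are filled in.

```python
def flatten_invoice(record, max_items=7):
    """Split one unnormalized invoice into flat 1NF rows, one per item."""
    # Invoice-level columns are everything outside the repeating group.
    invoice_cols = {k: v for k, v in record.items() if not k.startswith("Item")}
    rows = []
    for i in range(1, max_items + 1):
        if f"Item{i}" not in record:  # unused slot in the repeating group
            continue
        rows.append({
            **invoice_cols,  # invoice data is repeated on every row
            "Item": record[f"Item{i}"],
            "Item_descrip": record[f"Item_descrip{i}"],
            "Item_qty": record[f"Item_qty{i}"],
            "Item_price": record[f"Item_price{i}"],
        })
    return rows


invoice = {
    "Invoice-no": 10001, "Cust-name": "John Smith",
    "Item1": "SAGX730", "Item_descrip1": "Pioneer Remote A/V Receiver", "Item_qty1": 1, "Item_price1": 569.95,
    "Item2": "AT10", "Item_descrip2": "Cerwin Vega Loudspeakers", "Item_qty2": 1, "Item_price2": 359.95,
    "Item3": "CDPC725", "Item_descrip3": "Sony Disc-Jockey CD Changer", "Item_qty3": 1, "Item_price3": 399.95,
}

for row in flatten_invoice(invoice):
    print(row["Invoice-no"], row["Item"], row["Item_qty"], row["Item_price"])
```

The repeated invoice columns on every row are exactly the redundancy that 2NF and 3NF later remove.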
Unnormalized to 1NF
Nominated group of attributes to serve as the key (they form a unique combination)
Repeating groups eliminated
Each row retains data for one item
If a person bought 5 items, we would have five tuples
(Invoice-no, Invoice-date, Date-delivered, Cust-account, Cust-name, Cust-addr, Cust-city, Cust-state, Zip-code, Item, Item_descrip, Item_qty, Item_price)
1NF
Flat file (one row per item):
Invoice-no | Cust-account | Cust-name | Item | Item_descrip | Item_qty | Item_price
10001 | 123456 | John Smith | SAGX730 | Pioneer Remote A/V Rec | 1 | 569.95
10001 | 123456 | John Smith | AT10 | Cerwin Vega Loudspeakers | 1 | 359.95
10001 | 123456 | John Smith | CDPC725 | Sony Disc Jockey CD | 1 | 399.95
10001 | 123456 | John Smith | S/H | Shipping | 1 | 100.00
10001 | 123456 | John Smith | Tax | Sales Tax | 1 | 103.06
Functional Dependencies
Functional dependency: the value of one attribute (or set of attributes), the determinant, determines the value of another attribute (or set of attributes)
B is functionally dependent on A (written A → B) if each value of A is associated with exactly one value of B
While a primary key is always a determinant, a determinant is not necessarily a primary key
[Figure: Attribute A (determinant) → Attribute B]
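Whether a given set of rows violates A → B can be checked directly from the definition: rows that agree on A must agree on B. A minimal Python sketch with a hypothetical function name, run on a few rows of the invoice flat file. Sample data can only refute a dependency; whether it holds in general is a statement about the meaning of the data.

```python
def fd_holds(rows, lhs, rhs):
    """Check lhs -> rhs on a sample: rows that agree on lhs must agree on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[c] for c in lhs)
        dependent = tuple(row[c] for c in rhs)
        # setdefault stores the first rhs value seen for this determinant and returns it.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


rows = [
    {"Invoice_no": 10001, "Item": "SAGX730", "Item_descrip": "Pioneer Remote A/V Rec"},
    {"Invoice_no": 10001, "Item": "AT10", "Item_descrip": "Cerwin Vega Loudspeakers"},
    {"Invoice_no": 10001, "Item": "CDPC725", "Item_descrip": "Sony Disc Jockey CD"},
]
print(fd_holds(rows, ["Item"], ["Item_descrip"]))  # True
print(fd_holds(rows, ["Invoice_no"], ["Item"]))    # False: one invoice, several items
```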
Full Functional Dependencies
Full functional dependency: B is fully functionally dependent on A if it is functionally dependent on all of A and not on any part (proper subset) of A
[Figure: Attributes A1, A2, ..., An (determinant) → Attribute B]
Functional Dependencies and Keys
Candidate key: a unique identifier. One of the candidate keys will become the primary key
E.g., if a table contains both a credit card number and an SS#, both are candidate keys
Non-key attribute: an attribute that is not part of any candidate key
Each non-key attribute is functionally dependent on every candidate key (1NF)
Each non-key attribute is fully functionally dependent on every candidate key (2NF)
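Candidate keys can be derived from a set of functional dependencies by computing attribute closures: a set of attributes is a superkey if every attribute of the relation is in its closure, and a candidate key if no smaller subset of it is a superkey. A brute-force Python sketch (exponential in the number of attributes, so only for small examples); the function names are hypothetical, and the FDs encode the invoice dependencies used in this lecture on a reduced attribute list.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, not a candidate key
            if closure(s, fds) == attributes:
                keys.append(s)
    return keys


attributes = {"Invoice_number", "Invoice_date", "Cust_account",
              "Item", "Item_descrip", "Item_qty", "Item_price"}
fds = [
    ({"Invoice_number"}, {"Invoice_date", "Cust_account"}),
    ({"Item"}, {"Item_descrip"}),
    ({"Invoice_number", "Item"}, {"Item_qty", "Item_price"}),
]
print([sorted(k) for k in candidate_keys(attributes, fds)])  # [['Invoice_number', 'Item']]
```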
Second Normal Form (2NF)
1NF + every non-key attribute is fully functionally dependent on the ENTIRE key
Every non-key attribute must be defined by the entire key, not by only part of the key
No partial functional dependencies
So table EMPLOYEE2 is NOT in 2NF (see next slide)
Functional Dependencies Example
EmpID, CourseTitle → DateCompleted (dependency on the entire primary key)
EmpID → Name, DeptName, Salary (dependency on only part of the key)
Therefore, NOT in 2nd Normal Form!!
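A partial dependency can be found mechanically: for each proper subset of the key, compute its closure and see which non-key attributes it already determines. A Python sketch with hypothetical function names, using the two EMPLOYEE2 dependencies above as input.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FD list."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_key, fds):
    """Map each non-key attribute to the smallest proper part of the key that determines it."""
    found = {}
    for size in range(1, len(key)):  # proper subsets only
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & non_key):
                found.setdefault(attr, part)  # keep the first (smallest) part found
    return found


fds = [
    ({"EmpID", "CourseTitle"}, {"DateCompleted"}),
    ({"EmpID"}, {"Name", "DeptName", "Salary"}),
]
key = {"EmpID", "CourseTitle"}
non_key = {"Name", "DeptName", "Salary", "DateCompleted"}
print(sorted(partial_dependencies(key, non_key, fds).items()))
# [('DeptName', ('EmpID',)), ('Name', ('EmpID',)), ('Salary', ('EmpID',))]
```

An empty result means no partial dependencies, i.e. the relation satisfies the 2NF condition for that key.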
Getting it into 2nd Normal Form
Decomposed into two separate relations
Both are full functional dependencies
Example: Invoice
How to get Relations (tables) in 2NF
Some of the attributes depend upon Invoice_number for their values, some depend upon Invoice_number and Item together, and others depend upon Item only.
Those that depend on only Invoice_number or only Item are not fully functionally dependent on the entire key.
Using Invoice_number and Item as the key...
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price)
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
   depends on Invoice_number alone (part of the composite key), so it becomes its own relation
(Invoice_number, Item, Item_descrip, Item_qty, Item_price)
   composite key Invoice_number + Item; partial dependency Item → Item_descrip
(Item, Item_descrip)
   the partial dependency moved into its own relation
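A decomposition must not lose information: projecting the flat 1NF rows onto the new relations and natural-joining them back should give exactly the original rows. A Python sketch on the flat-file rows from the 1NF slide, with only the customer columns shown there; the helper names and the relation names Invoice, Invoice_items, and Items are hypothetical here. In general this decomposition is lossless because each join attribute is a key of one side (Invoice_number of the invoice relation, Item of the item relation); the code only confirms it for this sample.

```python
def project(rows, cols):
    """Projection: keep only cols; a set removes duplicate tuples, as in relational algebra."""
    return {frozenset((c, r[c]) for c in cols) for r in rows}


def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = set()
    for lt in left:
        a = dict(lt)
        for rt in right:
            b = dict(rt)
            if all(a[c] == b[c] for c in a.keys() & b.keys()):
                out.add(frozenset({**a, **b}.items()))
    return out


COLS = ("Invoice_number", "Cust_account", "Cust_name", "Item", "Item_descrip", "Item_qty", "Item_price")
flat = [dict(zip(COLS, t)) for t in [
    (10001, 123456, "John Smith", "SAGX730", "Pioneer Remote A/V Rec", 1, 569.95),
    (10001, 123456, "John Smith", "AT10", "Cerwin Vega Loudspeakers", 1, 359.95),
    (10001, 123456, "John Smith", "CDPC725", "Sony Disc Jockey CD", 1, 399.95),
    (10001, 123456, "John Smith", "S/H", "Shipping", 1, 100.00),
    (10001, 123456, "John Smith", "Tax", "Sales Tax", 1, 103.06),
]]

invoice = project(flat, ["Invoice_number", "Cust_account", "Cust_name"])
invoice_items = project(flat, ["Invoice_number", "Item", "Item_qty", "Item_price"])
items = project(flat, ["Item", "Item_descrip"])

rejoined = natural_join(natural_join(invoice, invoice_items), items)
print(len(invoice), len(invoice_items), len(items))      # 1 5 5
print(rejoined == {frozenset(r.items()) for r in flat})  # True
```

The customer data, repeated five times in the flat file, is now stored once.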
Third Normal Form (3NF)
Transitive dependency: one attribute functionally determines a second, which functionally determines a third
2NF + no transitive dependencies: a relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key. Remove transitive dependencies.
Each nonkey attribute must depend upon the key, the whole key, and nothing but the key.
Kent, 1978
Example: Relation with transitive dependency
(a) SALES relation with sample data
Removing a transitive dependency
(a) Decomposing the SALES relation
Relations in 3NF
Now, there are no transitive dependencies
Both relations are in 3rd NF
CustID → Name
CustID → Salesperson
Salesperson → Region
Example: Invoice
How to get Relations (tables) in 3NF
From 2NF to 3NF
Which attributes are dependent on others?
Is there a problem?
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)
Transitive Dependencies and Anomalies
Insertion anomalies
To add a new row, all customer data (name, address, city, state, zip code, phone) and product data (description) must be consistent with previous entries
Deletion anomalies
By deleting a row, a customer or product may cease to exist
Modification anomalies
To modify a customer's or product's data in one row, the same modification must be carried out in all other rows that hold it
Deletion Anomaly
Invoice number | Cust_name | Cust_city | Cust_state | Zip_code
4377182 | John Smith | Sacramento | CA | 95831
4398711 | Al Gore | Davis | CA | 95691
4578461 | Gray Davis | Sacramento | CA | 95831
4873179 | Lisa Carr | Reno | NV | 89557
By deleting customer Al Gore, we would also delete the only record of the town Davis, California (zip code 95691).
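The deletion anomaly can be reproduced with Python's built-in sqlite3 module. This is a simplified sketch with hypothetical table names: the first table mirrors the rows above, and the decomposed version keeps the zip-code facts in a separate table, in the spirit of the 3NF design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Un-decomposed table: city and state are stored with every invoice row.
cur.execute("CREATE TABLE invoice_customer (invoice_number INTEGER PRIMARY KEY, "
            "cust_name TEXT, city TEXT, state TEXT, zip_code TEXT)")
rows = [
    (4377182, "John Smith", "Sacramento", "CA", "95831"),
    (4398711, "Al Gore", "Davis", "CA", "95691"),
    (4578461, "Gray Davis", "Sacramento", "CA", "95831"),
    (4873179, "Lisa Carr", "Reno", "NV", "89557"),
]
cur.executemany("INSERT INTO invoice_customer VALUES (?, ?, ?, ?, ?)", rows)
cur.execute("DELETE FROM invoice_customer WHERE cust_name = 'Al Gore'")
lost = cur.execute("SELECT city, state FROM invoice_customer WHERE zip_code = '95691'").fetchall()
print(lost)  # [] : the fact that 95691 is Davis, CA is gone

# Decomposed version: zip-code facts live in their own table.
cur.execute("CREATE TABLE zip_code (zip_code TEXT PRIMARY KEY, city TEXT, state TEXT)")
cur.execute("CREATE TABLE invoice (invoice_number INTEGER PRIMARY KEY, cust_name TEXT, "
            "zip_code TEXT REFERENCES zip_code (zip_code))")
# INSERT OR IGNORE skips the second 95831 row: each zip code is stored once.
cur.executemany("INSERT OR IGNORE INTO zip_code VALUES (?, ?, ?)", [(z, c, s) for _, _, c, s, z in rows])
cur.executemany("INSERT INTO invoice VALUES (?, ?, ?)", [(i, n, z) for i, n, _, _, z in rows])
cur.execute("DELETE FROM invoice WHERE cust_name = 'Al Gore'")
kept = cur.execute("SELECT city, state FROM zip_code WHERE zip_code = '95691'").fetchall()
print(kept)  # [('Davis', 'CA')]
```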
Transitive Dependencies
Invoice_number → Invoice_date, Date_delivered, Cust_account
Cust_account → Cust_name, Cust_addr, Zip_code
Zip_code → Cust_city, Cust_state
Invoice_number + Item → Item_qty, Item_price
Item → Item_descrip
A condition where A, B, C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Why Should City and State Be Separated from Customer Relation?
City and state depend on the zip code for their values, not on the customer's identifier (i.e., the key).
Zip_code → City, State
Otherwise,
Cust_account → Cust_addr, Zip_code → City, State
in which case you have a transitive dependency.
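The zip-code dependency can be detected mechanically with attribute closures: a non-key attribute B that does not determine the key, but does determine another non-key attribute C, gives key → B → C. A Python sketch with hypothetical function names, using the two customer dependencies on this slide; it only looks for single-attribute intermediates B, which is enough for this example.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FD list."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(key, attributes, fds):
    """(B, C) pairs with key -> B -> C, where B and C are non-key and B does not determine the key."""
    non_key = attributes - key
    found = []
    for b in sorted(non_key):
        reach = closure({b}, fds)
        if key <= reach:
            continue  # B determines the key, so key -> B -> C is not transitive
        for c in sorted((reach & non_key) - {b}):
            found.append((b, c))
    return found


customer = {"Cust_account", "Cust_name", "Cust_addr", "Zip_code", "Cust_city", "Cust_state"}
fds = [
    ({"Cust_account"}, {"Cust_name", "Cust_addr", "Zip_code"}),
    ({"Zip_code"}, {"Cust_city", "Cust_state"}),
]
print(transitive_dependencies({"Cust_account"}, customer, fds))
# [('Zip_code', 'Cust_city'), ('Zip_code', 'Cust_state')]
```

After moving City and State into a separate Zip_code relation, both remaining relations return an empty list.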
3NF
Invoice Relation(Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation(Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation(Zip_code, City, State)
Invoice_items Relation(Invoice_number, Item, Item_qty, Item_price)
Items Relation(Item, Item_descrip)
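The five 3NF relations map directly onto SQL tables, each with its own primary key and with foreign keys linking them. A sketch using Python's built-in sqlite3 module, assuming lowercase table and column names derived from the relations above and column types chosen for illustration; only the data from the sample invoice is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE zip_code (zip_code TEXT PRIMARY KEY, city TEXT NOT NULL, state TEXT NOT NULL);
CREATE TABLE customer (
    cust_account TEXT PRIMARY KEY,
    cust_name    TEXT NOT NULL,
    cust_addr    TEXT,
    zip_code     TEXT REFERENCES zip_code (zip_code)
);
CREATE TABLE invoice (
    invoice_number INTEGER PRIMARY KEY,
    invoice_date   TEXT,
    date_delivered TEXT,
    cust_account   TEXT REFERENCES customer (cust_account)
);
CREATE TABLE items (item TEXT PRIMARY KEY, item_descrip TEXT);
CREATE TABLE invoice_items (
    invoice_number INTEGER REFERENCES invoice (invoice_number),
    item           TEXT REFERENCES items (item),
    item_qty       INTEGER,
    item_price     REAL,
    PRIMARY KEY (invoice_number, item)  -- composite key
);
""")

# Data from the Stereos To Go invoice.
conn.execute("INSERT INTO zip_code VALUES ('95819', 'Sacramento', 'CA')")
conn.execute("INSERT INTO customer VALUES ('0000-000-0000-0', 'John Smith', '2036-26 Street', '95819')")
conn.execute("INSERT INTO invoice VALUES (10001, '5/8/03', '6/8/03', '0000-000-0000-0')")
lines = [("SAGX730", "Pioneer Remote A/V Receiver", 1, 569.95),
         ("AT10", "Cerwin Vega Loudspeakers", 1, 359.95),
         ("CDPC725", "Sony Disc-Jockey CD Changer", 1, 399.95)]
conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, d) for i, d, _, _ in lines])
conn.executemany("INSERT INTO invoice_items VALUES (10001, ?, ?, ?)", [(i, q, p) for i, _, q, p in lines])

# Joining the 3NF tables rebuilds the invoice view.
subtotal = conn.execute("""
    SELECT c.cust_name, z.city, SUM(ii.item_qty * ii.item_price)
    FROM invoice i
    JOIN customer c       ON c.cust_account = i.cust_account
    JOIN zip_code z       ON z.zip_code = c.zip_code
    JOIN invoice_items ii ON ii.invoice_number = i.invoice_number
    WHERE i.invoice_number = 10001
    GROUP BY c.cust_name, z.city
""").fetchone()
print(subtotal[0], subtotal[1], round(subtotal[2], 2))  # John Smith Sacramento 1329.85

# With foreign keys on, an invoice for an unknown customer is rejected.
try:
    conn.execute("INSERT INTO invoice VALUES (10002, NULL, NULL, 'no-such-account')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```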
Further Anomalies
Item | Item_descrip
DVD-A110 | Panasonic
PV-4210 | Panasonic
PV-4250 | Panasonic
CT-32S35 | PAN
Inconsistency:
DVD-A110 | Panasonic
PV-4210 | PanaSonic
PV-4250 | Pana Sonic
CT-32S35 | PAN
Item_descrip contains the manufacturer name (Item → Item_descrip), so:
changing the manufacturer name of all Panasonic products to Panasonic USA requires updating every such row;
inserting a new Panasonic product requires typing the manufacturer name again, which invites spellings like those above.
3NF
Invoice Relation(Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation(Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation(Zip_code, City, State)
Invoice_items Relation(Invoice_number, Item, Item_qty, Item_price)
Items Relation(Item, Item_descrip)
Since the Items relation contains the manufacturer's name in the description, a separate Manufacturers relation can be created
Manufacturers Relation(Manuf_code, Manuf_name)
First to Third Normal Form (1NF - 3NF)
1NF: A relation is in first normal form if and only if every attribute is single-valued for each tuple (remove the repeating or multivalued attributes and create a flat file)
2NF: A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key (remove partial dependencies)
3NF: A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key (remove transitive dependencies)
Example (Employee)
EMPLOYEE (EmpId, Name, Dept, Salary, Course1, DateTook1, Fee1, Course2, DateTook2, Fee2, ...)
If employees can take a course more than once: TOOK_COURSE (EmpId, Course, DateTook)
2NF
Example (Hospital)
Unnormalized (one row per patient; multivalued cells are separated by semicolons):
Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St., New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St., Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane, Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook, Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road, Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline | Fever
1NF
Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug admin | Side Effect
1111 | 145 | 01-Jan-95 | John White | 15 New St., New York, NY | Beth Little | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | John White | 15 New St., New York, NY | Michael Diamond | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St., Rye, NY | Charles Field | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Mary Jones | 10 Main St., Rye, NY | Patricia Gold | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Charles Brown | Dogwood Lane, Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Paul Kosher | Blind Brook, Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | 05-Apr-94 | Ann Hood | Hilton Road, Larchmont, NY | Charles Field | Eye Cornea Replacement | Tetracycline | Fever
6845 | 243 | 15-Dec-84 | Ann Hood | Hilton Road, Larchmont, NY | Charles Field | Eye cataract removal | none | none
2NF
Patient # | Patient Name | Patient Address
1111 | John White | 15 New St., New York, NY
1234 | Mary Jones | 10 Main St., Rye, NY
2345 | Charles Brown | Dogwood Lane, Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester, CN
5123 | Paul Kosher | Blind Brook, Mamaroneck, NY
6845 | Ann Hood | Hilton Road, Larchmont, NY

Surgeon # | Surgeon Name
145 | Beth Little
189 | David Rosen
243 | Charles Field
311 | Michael Diamond
467 | Patricia Gold
Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin | Side Effects
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Gallstones Removal | none | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline | Fever
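The key of the surgery table can be sanity-checked against the 1NF data. A Python sketch with hypothetical helper names and shortened column names: it shows that Patient # and Surgeon # alone do not identify a row (patient 6845 had two operations by surgeon 243), that adding the surgery date does, and that the sample is consistent with Patient # → Patient Name and Surgeon # → Surgeon Name. As with any sample, this can refute a dependency but not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Rows that agree on lhs must agree on rhs."""
    seen = {}
    for r in rows:
        a, b = tuple(r[c] for c in lhs), tuple(r[c] for c in rhs)
        if seen.setdefault(a, b) != b:
            return False
    return True


def is_unique(rows, cols):
    """True if no two rows share the same values in cols."""
    keys = [tuple(r[c] for c in cols) for r in rows]
    return len(keys) == len(set(keys))


COLS = ("patient", "surgeon", "date", "patient_name", "surgeon_name", "surgery")
DATA = [
    (1111, 145, "01-Jan-95", "John White", "Beth Little", "Gallstones removal"),
    (1111, 311, "12-Jun-95", "John White", "Michael Diamond", "Kidney stones removal"),
    (1234, 243, "05-Apr-94", "Mary Jones", "Charles Field", "Eye Cataract removal"),
    (1234, 467, "10-May-95", "Mary Jones", "Patricia Gold", "Thrombosis removal"),
    (2345, 189, "08-Jan-96", "Charles Brown", "David Rosen", "Open Heart Surgery"),
    (4876, 145, "05-Nov-95", "Hal Kane", "Beth Little", "Cholecystectomy"),
    (5123, 145, "10-May-95", "Paul Kosher", "Beth Little", "Gallstones Removal"),
    (6845, 243, "05-Apr-94", "Ann Hood", "Charles Field", "Eye Cornea Replacement"),
    (6845, 243, "15-Dec-84", "Ann Hood", "Charles Field", "Eye cataract removal"),
]
rows = [dict(zip(COLS, d)) for d in DATA]

print(is_unique(rows, ["patient", "surgeon"]))          # False: 6845/243 appears twice
print(is_unique(rows, ["patient", "surgeon", "date"]))  # True
print(fd_holds(rows, ["patient"], ["patient_name"]))    # True
print(fd_holds(rows, ["surgeon"], ["surgeon_name"]))    # True
```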
Example
Work on project
3NF
3NF
Find candidate keys, primary keys, and all functional dependencies, and transform the following relations into 3NF
Housing
StId Dorm Fee
100 B1101 1000
101 B1102 1100
102 B2101 1000
StId → Dorm, StId → Fee
Not in 3NF because of Dorm → Fee (Fee is transitively dependent on StId via Dorm)
1.
EXAM(S#, Name, P#, NameCourse, Mark)
STUDENT(S#, Name, P#, NameCourse)
STUDENT(S#, P#, Name, NameCourse, Gender, Mark, Data)
4.
Explain update anomalies
PRODUCTION(P#, M#, E#)
P1 M1 E1
P2 M2 E3
P3 M1 E1
P4 M1 E1
P5 M3 E2
P6 M4 E1
P# → M#, P# → E#
M# → E#
P stands for Product
M stands for Machine
E stands for Employee
Will insertion produce inconsistency?
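One way to see the insertion problem: because M# → E#, every new row for an existing machine must repeat that machine's employee, and nothing in the single PRODUCTION table prevents a conflicting value. A Python sketch with a hypothetical helper name that checks a candidate insert against M# → E#, then stores the employee once per machine, as a decomposition removing the transitive dependency P# → M# → E# would.

```python
# PRODUCTION(P#, M#, E#) rows from the exercise.
production = [("P1", "M1", "E1"), ("P2", "M2", "E3"), ("P3", "M1", "E1"),
              ("P4", "M1", "E1"), ("P5", "M3", "E2"), ("P6", "M4", "E1")]
P, M, E = 0, 1, 2  # column positions


def violates(rows, new_row, lhs, rhs):
    """True if new_row breaks lhs -> rhs: same lhs value as an existing row, different rhs."""
    return any(r[lhs] == new_row[lhs] and r[rhs] != new_row[rhs] for r in rows)


print(violates(production, ("P7", "M1", "E2"), M, E))  # True: M1 is already paired with E1
print(violates(production, ("P7", "M1", "E1"), M, E))  # False

# Decomposed: product -> machine, and machine -> employee stored once per machine.
product_machine = {p: m for p, m, _ in production}
machine_operator = {m: e for _, m, e in production}
print(machine_operator)  # {'M1': 'E1', 'M2': 'E3', 'M3': 'E2', 'M4': 'E1'}
```

In the decomposed form, adding product P7 on machine M1 only needs the pair (P7, M1), so there is no E# value to get wrong.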
Determine the candidate keys, the primary key, and the functional dependencies, and see whether this relation could be normalized
Advisory
St Id Major Advisor
100 Math Smith
100 Language Ringo
101 Language Yung
101 Math Smith
102 Math Peris