IT-501 Database Management Systems
By Jesmin Akhter, Assistant Professor, IIT, Jahangirnagar University
Lecture 07: Relational Database Design
Normalization, Part 1

Outline
o Overview of Relational DBMS
o Normalization (1st lecture)

Normalization
The aim of normalization is to eliminate various anomalies (or undesirable aspects) of a relation in order to obtain “better” relations.
The following four problems might exist in a relation scheme:
o Repetition anomaly
o Update anomaly
o Insertion anomaly
o Deletion anomaly

Repetition Anomaly
The ENAME, TITLE, SAL attribute values are repeated for each project that the employee is involved in.
o Waste of space
o Complicates updates
o Contrary to the spirit of databases

EMP
ENO  ENAME      TITLE        SAL    PNO  RESP        DUR
E1   J. Doe     Elect. Eng.  40000  P1   Manager     12
E2   M. Smith   Analyst      34000  P1   Analyst     24
E2   M. Smith   Analyst      34000  P2   Analyst     6
E3   A. Lee     Mech. Eng.   27000  P3   Consultant  10
E3   A. Lee     Mech. Eng.   27000  P4   Engineer    48
E4   J. Miller  Programmer   24000  P2   Programmer  18
E5   B. Casey   Syst. Anal.  34000  P2   Manager     24
E6   L. Chu     Elect. Eng.  40000  P4   Manager     48
E7   R. Davis   Mech. Eng.   27000  P3   Engineer    36
E8   J. Jones   Syst. Anal.  34000  P3   Manager     40

Update Anomaly
If any attribute of an employee (say SAL) is updated, multiple tuples have to be updated to reflect the change.
Insertion Anomaly
It may not be possible to store information about a new project until an employee is assigned to it.

Deletion Anomaly
If an engineer, who is the only employee on a project, leaves the company, his personal information cannot be deleted, or the information about that project is lost.
May have to delete many tuples.
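
The repetition and update anomalies can be made concrete with a small sketch. The rows are transcribed from the lecture's EMP table; the helper names `tuples_to_update` and `redundant_employee_rows` are illustrative assumptions, not part of the lecture.

```python
# EMP rows from the lecture: (ENO, ENAME, TITLE, SAL, PNO, RESP, DUR)
EMP = [
    ("E1", "J. Doe", "Elect. Eng.", 40000, "P1", "Manager", 12),
    ("E2", "M. Smith", "Analyst", 34000, "P1", "Analyst", 24),
    ("E2", "M. Smith", "Analyst", 34000, "P2", "Analyst", 6),
    ("E3", "A. Lee", "Mech. Eng.", 27000, "P3", "Consultant", 10),
    ("E3", "A. Lee", "Mech. Eng.", 27000, "P4", "Engineer", 48),
    ("E4", "J. Miller", "Programmer", 24000, "P2", "Programmer", 18),
    ("E5", "B. Casey", "Syst. Anal.", 34000, "P2", "Manager", 24),
    ("E6", "L. Chu", "Elect. Eng.", 40000, "P4", "Manager", 48),
    ("E7", "R. Davis", "Mech. Eng.", 27000, "P3", "Engineer", 36),
    ("E8", "J. Jones", "Syst. Anal.", 34000, "P3", "Manager", 40),
]


def tuples_to_update(rows, eno):
    """Update anomaly: how many tuples must change if this employee's SAL changes."""
    return sum(1 for row in rows if row[0] == eno)


def redundant_employee_rows(rows):
    """Repetition anomaly: stored (ENO, ENAME, TITLE, SAL) copies beyond the distinct ones."""
    employee_facts = [row[:4] for row in rows]
    return len(employee_facts) - len(set(employee_facts))


print(tuples_to_update(EMP, "E2"))
print(redundant_employee_rows(EMP))
```

Any employee who works on more than one project contributes repeated ENAME, TITLE and SAL values, and each repeated copy has to be changed when the salary changes.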

What to do?
Take each relation individually and “improve” it in terms of the desired characteristics.
Normal forms
o Atomic values (1NF)
o Can be defined according to keys and dependencies.
o Functional dependencies (2NF, 3NF, BCNF)
o Multivalued dependencies (4NF)

Normalization
o Normalization is a process of concept separation which applies a top-down methodology for producing a schema by subsequent refinements and decompositions.
o Do not combine unrelated sets of facts in one table; each relation should contain an independent set of facts.
o Universal relation assumption

Normalization Issues
How do we decompose a schema into a desirable normal form?
What criteria should the decomposed schemas follow in order to preserve the semantics of the original schema?
o Reconstructability: recover the original relation (no spurious joins)
o Lossless decomposition: no information loss
o Dependency preservation: the constraints (i.e., dependencies) that hold on the original relation should be enforceable by means of the constraints (i.e., dependencies) defined on the decomposed relations.

A Combined Schema Without Repetition
Consider combining relations sec_class(sec_id, building, room_number) and section(course_id, sec_id, semester, year) into one relation section(course_id, sec_id, semester, year, building, room_number).
No repetition in this case.

What About Smaller Schemas?
Suppose we had started with inst_dept. How would we know to split up (decompose) it into instructor and department?
Write a rule “if there were a schema (dept_name, building, budget), then dept_name would be a candidate key”.
Denote as a functional dependency: dept_name → building, budget
In inst_dept, because dept_name is not a candidate key, the building and budget of a department may have to be repeated. This indicates the need to decompose inst_dept.

Not all decompositions are good. Suppose we decompose employee(ID, name, street, city, salary) into
employee1 (ID, name)
employee2 (name, street, city, salary)
The next slide shows how we lose information: we cannot reconstruct the original employee relation, and so this is a lossy decomposition.

A Lossy Decomposition
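
A minimal sketch of why this decomposition is lossy, assuming two employees who happen to share a name. The tuples below are hypothetical illustration data, not taken from the lecture, and `project`, `natural_join` and `as_set` are helper names introduced here.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in distinct]


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    joined = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                joined.append({**l, **r})
    return joined


def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


# Hypothetical data: two different employees with the same name.
employee = [
    {"ID": 1, "name": "Kim", "street": "Main", "city": "Dhaka", "salary": 75000},
    {"ID": 2, "name": "Kim", "street": "North", "city": "Savar", "salary": 67000},
]

employee1 = project(employee, ["ID", "name"])
employee2 = project(employee, ["name", "street", "city", "salary"])
rejoined = natural_join(employee1, employee2)

print(len(employee), len(rejoined))
print(as_set(rejoined) == as_set(employee))
```

The join is on name, the only shared attribute, so each ID is paired with both addresses. The rejoined relation contains the original tuples plus spurious ones, and there is no way to tell which are genuine.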

Example of Lossless-Join Decomposition
Lossless join decomposition
Decomposition of R = (A, B, C): R1 = (A, B), R2 = (B, C)
[Figure: a two-tuple relation r on R (B values 1 and 2; C values A and B), its projections Π_{A,B}(r) and Π_{B,C}(r), and the join of the two projections.]
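
The project-then-join check for R = (A, B, C) can be sketched as follows. The B and C values follow the slide; the A values `a1` and `a2` are placeholders assumed here because the originals did not survive extraction.

```python
# r on R = (A, B, C); "a1" and "a2" are assumed placeholder values for column A.
r = {("a1", 1, "A"), ("a2", 2, "B")}

# Projections onto R1 = (A, B) and R2 = (B, C).
r1 = {(a, b) for (a, b, c) in r}
r2 = {(b, c) for (a, b, c) in r}

# Natural join of the projections on the shared attribute B.
rejoined = {(a, b, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

print(rejoined == r)
```

Here each B value appears in exactly one tuple, so joining on B pairs every (A, B) with its own (B, C) and nothing spurious is produced. This shows the property for this one instance only.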

Stages of Normalization
Unnormalized form (UNF)
    Remove repeating groups
First normal form (1NF)
    Remove partial dependencies
Second normal form (2NF)
    Remove transitive dependencies
Third normal form (3NF)
    Remove remaining functional dependency anomalies
Boyce-Codd normal form (BCNF)
    Remove multivalued dependencies
Fourth normal form (4NF)
    Remove remaining anomalies
Fifth normal form (5NF)

Repeating Groups
A repeating group is an attribute (or set of attributes) that can have more than one value for a primary key value.
Example: We have the following relation that contains staff and department details and a list of telephone contact numbers for each member of staff.

staffNo  job       dept  dname       city       contactNumber
SL10     Salesman  10    Sales       Stratford  018111777, 018111888, 079311122
SA51     Manager   20    Accounts    Barking    017111777
DS40     Clerk     20    Accounts    Barking    Null
OS45     Clerk     30    Operations  Barking    079311555

Repeating groups are not allowed in a relational design, since all attributes have to be ‘atomic’, i.e., there can only be one value per cell in a table!
Multivalued Attributes (or repeating groups): non-key attributes, or groups of non-key attributes, whose values are not uniquely identified, directly or indirectly, by the value of the Primary Key or a part of it (that is, they are not functionally dependent on it).

Repeating Groups

STUDENT (Course_ID holds a repeating group)
Stud_ID  Name     Course_ID         Units
101      Lennon   MSI 250, MSI 415  3.00
125      Johnson  MSI 331           3.00

STUDENT (one value per cell)
Stud_ID  Name     Course_ID  Units
101      Lennon   MSI 250    3.00
101      Lennon   MSI 415    3.00
125      Johnson  MSI 331    3.00
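
Removing a repeating group amounts to emitting one row per value. A minimal sketch using the staff contact numbers from the example above; the function name `to_1nf` and the use of an empty list for the Null entry are assumptions made for illustration.

```python
# Staff relation with contactNumber as a repeating group (a list per row).
staff_unf = [
    {"staffNo": "SL10", "job": "Salesman", "dept": 10, "dname": "Sales",
     "city": "Stratford",
     "contactNumber": ["018111777", "018111888", "079311122"]},
    {"staffNo": "SA51", "job": "Manager", "dept": 20, "dname": "Accounts",
     "city": "Barking", "contactNumber": ["017111777"]},
    {"staffNo": "DS40", "job": "Clerk", "dept": 20, "dname": "Accounts",
     "city": "Barking", "contactNumber": []},  # Null in the slide
    {"staffNo": "OS45", "job": "Clerk", "dept": 30, "dname": "Operations",
     "city": "Barking", "contactNumber": ["079311555"]},
]


def to_1nf(rows, group_attr):
    """Emit one row per value of the repeating group so every cell is atomic."""
    flat = []
    for row in rows:
        values = row[group_attr] or [None]  # keep staff who have no number
        for value in values:
            flat.append({**row, group_attr: value})
    return flat


staff_1nf = to_1nf(staff_unf, "contactNumber")
for row in staff_1nf:
    print(row["staffNo"], row["contactNumber"])
```

Every cell is now atomic, but the job, dept, dname and city values of SL10 are repeated three times, which is the kind of repetition the later normal forms address.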

Functional Dependency
Formal Definition: Attribute B is functionally dependent upon attribute A (or a collection of attributes) if a value of A determines a single value of attribute B at any one time.
Formal Notation: A → B. This should be read as ‘A determines B’ or ‘B is functionally dependent on A’. A is called the determinant and B is called the object of the determinant.
Example:

staffNo  job       dept  dname
SL10     Salesman  10    Sales
SA51     Manager   20    Accounts
DS40     Clerk     20    Accounts
OS45     Clerk     30    Operations

Functional Dependencies
staffNo → job
staffNo → dept
staffNo → dname
dept → dname
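
The definition can be checked mechanically against a relation instance: A → B is violated as soon as two rows agree on A but differ on B. A minimal sketch over the staff rows above; `fd_holds` is a name introduced here for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on lhs but differ on rhs, else True."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


staff = [
    {"staffNo": "SL10", "job": "Salesman", "dept": 10, "dname": "Sales"},
    {"staffNo": "SA51", "job": "Manager", "dept": 20, "dname": "Accounts"},
    {"staffNo": "DS40", "job": "Clerk", "dept": 20, "dname": "Accounts"},
    {"staffNo": "OS45", "job": "Clerk", "dept": 30, "dname": "Operations"},
]

print(fd_holds(staff, ["staffNo"], ["job"]))
print(fd_holds(staff, ["dept"], ["dname"]))
print(fd_holds(staff, ["job"], ["dept"]))
```

A single instance can only refute a dependency. A result of True means the data is consistent with the dependency, not that the dependency is a rule of the schema; that has to come from the meaning of the attributes.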
Functional Dependency
Composite Determinants: If more than one attribute is necessary to determine another attribute in an entity, then such a determinant is termed a composite determinant.
Full Functional Dependency: Only of relevance with composite determinants. This is the situation when it is necessary to use all the attributes of the composite determinant to identify its object uniquely.
Example:

order#  line#  qty  price
A001    001    10   200
A002    001    20   400
A002    002    20   800
A004    001    15   300

Full Functional Dependencies
(order#, line#) → qty
(order#, line#) → price

Functional Dependency
Partial Functional Dependency: This is the situation that exists if only a subset of the attributes of the composite determinant is needed to identify its object uniquely.

student#  unit#  room   grade
9900100   A01    TH224  2
9900010   A01    TH224  14
9901011   A02    JS075  3
9900001   A01    TH224  16

Full Functional Dependencies
(student#, unit#) → grade
Partial Functional Dependencies
unit# → room
Repetition of data!

Partial Dependency – when a non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key.
CUSTOMER
Cust_ID Name Order_ID
101 AT&T 1234
101 AT&T 156
125 Cisco 1250
Partial Dependency
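
Given a primary key and a list of functional dependencies, partial dependencies can be found by looking for determinants that are proper subsets of the key. A minimal sketch using the student/unit example, assuming (student#, unit#) is the primary key; `partial_dependencies` is a name introduced here.

```python
def partial_dependencies(key, fds):
    """List FDs whose determinant is a proper part of the key and whose
    right-hand side contains a non-key attribute."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        non_key_targets = set(rhs) - key
        if frozenset(lhs) < key and non_key_targets:  # '<' is proper subset
            found.append((sorted(lhs), sorted(non_key_targets)))
    return found


# FDs from the slide; the composite key (student#, unit#) is an assumption.
fds = [
    ({"student#", "unit#"}, {"grade"}),
    ({"unit#"}, {"room"}),
]

print(partial_dependencies({"student#", "unit#"}, fds))
```

With a single-attribute key the proper-subset test can never succeed, which matches the observation that partial dependencies need a composite key.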
Functional Dependency
Transitive Dependency
Definition: A transitive dependency exists when there is an intermediate functional dependency.
Formal Notation: If A → B and B → C, then it can be stated that the following transitive dependency exists: A → B → C
Example:

staffNo  job       dept  dname
SL10     Salesman  10    Sales
SA51     Manager   20    Accounts
DS40     Clerk     20    Accounts
OS45     Clerk     30    Operations

Transitive Dependencies
staffNo → dept
dept → dname
staffNo → dept → dname
Repetition of data!

Transitive Dependency – when a non-key attribute determines another non-key attribute.
EMPLOYEE
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct
122 Sarah Smith 2 Mktg
Transitive Dependency
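
Using the working definition above (a non-key attribute determines another non-key attribute), a transitive dependency can be spotted from the key and the list of functional dependencies. This sketch assumes Emp_ID is the primary key of EMPLOYEE and that Dept_ID → Dept_Name holds; both are assumptions read off the table, and `transitive_dependencies` is a name introduced here.

```python
def transitive_dependencies(key, fds):
    """List (determinant, attribute) pairs where a determinant made only of
    non-key attributes determines another non-key attribute."""
    key = set(key)
    found = []
    for lhs, rhs in fds:
        if not set(lhs) & key:
            for attr in sorted(set(rhs) - key - set(lhs)):
                found.append((sorted(lhs), attr))
    return found


# Assumed FDs for EMPLOYEE(Emp_ID, F_Name, L_Name, Dept_ID, Dept_Name).
employee_fds = [
    ({"Emp_ID"}, {"F_Name", "L_Name", "Dept_ID"}),
    ({"Dept_ID"}, {"Dept_Name"}),
]

print(transitive_dependencies({"Emp_ID"}, employee_fds))
```

The chain is Emp_ID → Dept_ID → Dept_Name, so Dept_Name would be repeated for every employee in the same department.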
Normal Forms: Review
Unnormalized – There are multivalued attributes or repeating groups
1NF – No multivalued attributes or repeating groups
2NF – 1NF plus no partial dependencies
3NF – 2NF plus no transitive dependencies

Example 1: Determine NF
ISBN → Title
ISBN → Publisher
Publisher → Address
BOOK (ISBN, Title, Publisher, Address)

All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1NF.
The relation is at least in 1NF. There is no COMPOSITE primary key, therefore there can’t be partial dependencies. Therefore, the relation is at least in 2NF.
Publisher is a non-key attribute, and it determines Address, another non-key attribute. Therefore, there is a transitive dependency, which means that the relation is NOT in 3NF.
We know that the relation is at least in 2NF, and it is not in 3NF. Therefore, we conclude that the relation is in 2NF.
In your solution you will write the following justification:
1) No M/V attributes, therefore at least 1NF
2) No partial dependencies, therefore at least 2NF
3) There is a transitive dependency (Publisher → Address), therefore not 3NF
Conclusion: The relation is in 2NF

Example 2: Determine NF
Product_ID → Description
ORDER (Order_No, Product_ID, Description)

All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1NF.
The relation is at least in 1NF. There is a COMPOSITE Primary Key (PK) (Order_No, Product_ID), therefore there can be partial dependencies. Product_ID, which is a part of the PK, determines Description; hence, there is a partial dependency. Therefore, the relation is not in 2NF. There is no need to check for transitive dependencies.
We know that the relation is at least in 1NF, and it is not in 2NF. Therefore, we conclude that the relation is in 1NF.
In your solution you will write the following justification:
1) No M/V attributes, therefore at least 1NF
2) There is a partial dependency (Product_ID → Description), therefore not in 2NF
Conclusion: The relation is in 1NF
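
The reasoning used in the two worked examples can be sketched as a small classifier. It follows the lecture's simplified tests only (partial dependency rules out 2NF, a non-key determinant of a non-key attribute rules out 3NF), assumes the relation already has atomic values, and does not consider BCNF or higher forms. The function name `normal_form` and taking ISBN as the key of BOOK are assumptions made for illustration.

```python
def normal_form(key, fds):
    """Classify a relation with atomic values as '1NF', '2NF' or '3NF'
    using the lecture's simplified partial/transitive dependency tests."""
    key = frozenset(key)
    # Partial dependency: part of a composite key determines a non-key attribute.
    has_partial = any(
        frozenset(lhs) < key and set(rhs) - key for lhs, rhs in fds
    )
    if has_partial:
        return "1NF"
    # Transitive dependency: a non-key determinant of another non-key attribute.
    has_transitive = any(
        not (set(lhs) & key) and set(rhs) - key - set(lhs) for lhs, rhs in fds
    )
    if has_transitive:
        return "2NF"
    return "3NF"


book_fds = [
    ({"ISBN"}, {"Title"}),
    ({"ISBN"}, {"Publisher"}),
    ({"Publisher"}, {"Address"}),
]
order_fds = [({"Product_ID"}, {"Description"})]

print(normal_form({"ISBN"}, book_fds))
print(normal_form({"Order_No", "Product_ID"}, order_fds))
```

The checks run in the same order as the written justification: once a partial dependency is found there is no need to look for transitive ones.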
Thank You