8.1. Normalization


  • 7/31/2019 8.1. Normalization

    1/59

    Prof. Margita Kon-Popovska 2006

    DATABASE SYSTEMS

    Fall 2006

    Normalization


    Relation

    Definition: A relation is a named, two-dimensional table of data

    A table is made up of rows (records) and columns (attributes or fields)

    Not all tables qualify as relations. Requirements:

    Every relation has a unique name.

    Every attribute value is atomic (not multivalued, not composite)

    Every row is unique (can't have two rows with exactly the same values for all their fields)

    Attributes (columns) in tables have unique names

    The order of the columns is irrelevant

    The order of the rows is irrelevant

    NOTE: all relations are in 1st Normal Form


    Example

    Question: Is this a relation? Answer: No, due to repeating groups

    Question: What's the primary key? Answer: Emp_ID


    Question: Is this a relation? Is it in 1NF?

    Answer: Yes, unique rows and no multivalued attributes

    Question: What's the primary key? Answer: Composite primary key (Emp_ID, Course_Title)


    Data Normalization

    Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data, i.e. reduce redundancy

    Achieve a design that is highly flexible

    Ensure that the design is free of certain update, insertion and deletion anomalies

    The process of decomposing relations with anomalies to produce smaller, well-structured relations


    Normalization

    The process of decomposing complex data structures into simple relations according to a set of dependency rules.

    McFadden and Hoffer


    Well-Structured Relations

    A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies

    Goal is to avoid anomalies

    Insertion Anomaly: adding new rows forces the user to create duplicate data

    Deletion Anomaly: deleting rows may cause a loss of data that would be needed for other future rows

    Modification Anomaly: changing data in a row forces changes to other rows because of duplication

    General rule of thumb: a table should not pertain to more than one entity type
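The modification and deletion anomalies described above can be reproduced in a few lines. The following is a minimal Python sketch, not taken from the slides: the table, its column names and its rows are hypothetical, and it mixes two entity types (employee and course) to provoke the anomalies.

```python
# Hypothetical table mixing two entity types: employees and the courses they took.
rows = [
    {"emp_id": 100, "name": "Ann", "dept": "Sales", "course": "SQL"},
    {"emp_id": 100, "name": "Ann", "dept": "Sales", "course": "Excel"},
    {"emp_id": 140, "name": "Bob", "dept": "IT", "course": "SQL"},
]


def departments_of(table, emp_id):
    """Return the set of departments recorded for one employee."""
    return {r["dept"] for r in table if r["emp_id"] == emp_id}


# Modification anomaly: changing the department in only one of Ann's rows
# leaves two contradictory values for the same employee.
rows[0]["dept"] = "Marketing"
inconsistent = departments_of(rows, 100)

# Deletion anomaly: removing Bob's only course row also removes every
# trace of Bob as an employee.
remaining = [r for r in rows if not (r["emp_id"] == 140 and r["course"] == "SQL")]
bob_still_known = any(r["emp_id"] == 140 for r in remaining)
```

Splitting the table into one relation per entity type would store each employee's department exactly once, which is the rule of thumb stated above.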


    Possible inconsistencies on updating

    (Figure: sample table annotated with the labels null key, insertion anomaly, update anomaly, potential deletion anomaly, duplication)


    Normalization

    Progressively putting the relation into a higher normal form to get well-structured relations:

    1NF → 2NF → 3NF → BCNF → 4NF → 5NF → Domain-Key NF (DK/NF)


    Steps in normalization


    First Normal Form (1NF)

    No multivalued attributes: every attribute value is atomic

    A table whose attribute values are all atomic is in 1NF

    A table with multivalued attributes is not in 1NF; it is not a relation


    Example: Invoice

    How to get Relations (tables) in 1NF


    Stereos To Go Invoice

    Order No.: 10001
    Date: 5/8/03
    Account No.: 0000-000-0000-0
    Customer: John Smith
    Address: 2036-26 Street
    City: Sacramento   State: CA   Zip Code: 95819
    Date Delivered: 6/8/03

    Item Number  Product Code  Product Description/Manufacturer  Qty  Price
    1            SAGX730       Pioneer Remote A/V Receiver       1    569.95
    2            AT10          Cerwin Vega Loudspeakers          1    359.95
    3            CDPC725       Sony Disc-Jockey CD Changer       1    399.95
    4
    5

    Subtotal: 1329.85
    Shipping & Handling: 100.00
    Sales Tax: 103.06
    Total: 1532.91


    Unnormalized table

    How would we get 1NF of invoice?

    (Invoice-no, Invoice-date, Date-delivered, Cust-account, Cust-name, Cust-addr, Cust-city, Cust-state, Zip-code, Item1, Item_descrip1, Item_qty1, Item_price1, Item2, Item_descrip2, Item_qty2, Item_price2, . . . , Item7, Item_descrip7, Item_qty7, Item_price7)

    The Item, Item_descrip, Item_qty, Item_price attributes form a repeating group

    A relation is in first normal form if and only if every attribute is single-valued for each tuple.

    Remove all repeating groups

    Create a flat file


    Unnormalized to 1NF

    A nominated group of attributes serves as the key (it forms a unique combination)

    Repeating groups eliminated.

    Each row retains data for one item.

    If a person bought 5 items, we would have five tuples

    (Invoice-no, Invoice-date, Date-delivered, Cust-account, Cust-name, Cust-addr, Cust-city, Cust-state, Zip-code, Item, Item_descrip, Item_qty, Item_price)
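The step from the unnormalized invoice to 1NF can be sketched in Python. This is only an illustration under stated assumptions: the nested dictionary layout and the helper name `to_1nf` are hypothetical, and only a few of the invoice attributes are carried along; the item codes and prices are the ones on the sample invoice.

```python
def to_1nf(invoice):
    """Return one flat row per item, repeating the invoice-level attributes."""
    header = {k: v for k, v in invoice.items() if k != "items"}
    return [{**header, **item} for item in invoice["items"]]


# Unnormalized form: the items are a repeating group nested inside the invoice.
invoice = {
    "invoice_no": 10001,
    "cust_name": "John Smith",
    "items": [
        {"item": "SAGX730", "item_qty": 1, "item_price": 569.95},
        {"item": "AT10", "item_qty": 1, "item_price": 359.95},
        {"item": "CDPC725", "item_qty": 1, "item_price": 399.95},
    ],
}

flat = to_1nf(invoice)

# (invoice_no, item) is the nominated key: it must be unique across the flat rows.
keys = [(row["invoice_no"], row["item"]) for row in flat]
```

Each flat row now holds only atomic values, at the price of repeating the invoice and customer data on every row, which is the redundancy the later normal forms remove.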


    1NF

    Flat File

    Invoice-no  Cust-account  Cust-name   Item     Item Description              Item Quantity  Item Price
    10001       123456        John Smith  SAGX730  Pioneer Remote A/V Rec        1              569.95
    10001       123456        John Smith  AT10     Cerwin Vega Loudspeakers      1              359.95
    10001       123456        John Smith  CDPC725  Sony Disc Jockey CD           1              399.95
    10001       123456        John Smith  S/H      Shipping                      1              100.00
    10001       123456        John Smith  Tax      Sales Tax                     1              103.06


    Functional Dependencies

    Functional Dependency: The value of one attribute or set of attributes (the determinant) determines the value of another attribute or set of attributes

    B is functionally dependent on A (written A → B) if each value of A is associated with exactly one value of B

    While a primary key is always a determinant, a determinant is not necessarily a primary key

    Attribute A (determinant) → Attribute B
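The definition of a functional dependency translates directly into a check over sample rows. The sketch below is a minimal Python illustration; the helper name `fd_holds` and the employee rows are hypothetical. A sample can only refute a dependency: whether A → B really holds is a statement about the business rules, not about one set of rows.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to exactly one
    combination of rhs values in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows.
employees = [
    {"emp_id": 100, "name": "Ann", "dept": "Sales"},
    {"emp_id": 140, "name": "Bob", "dept": "IT"},
    {"emp_id": 110, "name": "Cid", "dept": "Sales"},
]

id_determines_name = fd_holds(employees, ["emp_id"], ["name"])
dept_determines_name = fd_holds(employees, ["dept"], ["name"])
```

Here `emp_id` is a determinant of `name`, while `dept` is not, because the value "Sales" is associated with two different names.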


    Full Functional Dependencies

    Full functional dependency: B is fully functionally dependent on A if it is functionally dependent on the entire A and not on any part of A

    Attributes A1, A2, ..., An (determinant) → Attribute B


    Functional Dependencies and Keys

    Candidate Key: A unique identifier. One of the candidate keys will become the primary key

    E.g. perhaps there is both credit card number and SS# in a table; in this case both are candidate keys

    Non-key attribute: an attribute that is not part of any candidate key

    Each non-key attribute is functionally dependent on every candidate key (1NF)

    Each non-key attribute is fully functionally dependent on every candidate key (2NF)


    Second Normal Form (2NF)

    1NF + every non-key attribute is fully functionally dependent on the ENTIRE key

    Every non-key attribute must be defined by the entire key, not by only part of the key

    No partial functional dependencies

    So Table EMPLOYEE2 is NOT in 2NF

    (see next slide)


    Functional Dependencies Example

    Dependency on entire primary key: EmpID, CourseTitle → DateCompleted

    Dependency on only part of the key: EmpID → Name, DeptName, Salary

    Therefore, NOT in 2nd Normal Form!!
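The difference between a full and a partial dependency can be tested mechanically. The following Python sketch assumes a few hypothetical rows for the EMPLOYEE2 table (the names and dates are invented); `fd_holds` and `is_full_fd` are illustrative helper names, not part of any library.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by the rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """Return True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, list(subset), rhs):
                return False  # a part of the key is enough: partial dependency
    return True


# Hypothetical sample rows for EMPLOYEE2.
employee2 = [
    {"EmpID": 100, "CourseTitle": "SPSS", "Name": "Simpson", "DateCompleted": "2006-06-19"},
    {"EmpID": 100, "CourseTitle": "Surveys", "Name": "Simpson", "DateCompleted": "2006-10-07"},
    {"EmpID": 140, "CourseTitle": "SPSS", "Name": "Beeton", "DateCompleted": "2006-12-08"},
]

key = ["EmpID", "CourseTitle"]
date_is_full = is_full_fd(employee2, key, ["DateCompleted"])
name_is_full = is_full_fd(employee2, key, ["Name"])
```

`DateCompleted` needs the whole key, whereas `Name` is already determined by `EmpID` alone, which is the partial dependency that keeps the table out of 2NF.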


    Getting it into 2nd Normal Form

    Decomposed into two separate relations

    Both are full functional dependencies


    Example: Invoice

    How to get Relations (tables) in 2NF


    Some of the attributes are dependent upon invoice_number for their values, some are dependent upon invoice_number and item, and others are dependent upon item only.

    In the first and the last case, they are not functionally dependent on the entire key.

    Using Invoice number and Item as the key...

    (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price)


    Composite key: (Invoice_number, Item)

    Partial dependency on Invoice_number: split off the invoice and customer attributes

    (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

    (Invoice_number, Item, Item_descrip, Item_qty, Item_price)

    Partial dependency on Item: split off the item description

    (Item, Item_descrip)

    (Invoice_number, Item, Item_qty, Item_price)
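Removing partial dependencies amounts to projecting the 1NF relation onto smaller attribute sets. The Python sketch below illustrates this with three rows of the flat invoice file, reduced to a handful of attributes for brevity; the helper name `project` is hypothetical.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


# Three rows of the 1NF flat file, with only some of its attributes.
flat = [
    {"Invoice_number": 10001, "Cust_name": "John Smith", "Item": "SAGX730",
     "Item_descrip": "Pioneer Remote A/V Rec", "Item_qty": 1, "Item_price": 569.95},
    {"Invoice_number": 10001, "Cust_name": "John Smith", "Item": "AT10",
     "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 1, "Item_price": 359.95},
    {"Invoice_number": 10001, "Cust_name": "John Smith", "Item": "CDPC725",
     "Item_descrip": "Sony Disc Jockey CD", "Item_qty": 1, "Item_price": 399.95},
]

# Attributes that depend on Invoice_number only.
invoices = project(flat, ["Invoice_number", "Cust_name"])
# Attributes that depend on Item only.
items = project(flat, ["Item", "Item_descrip"])
# Attributes that depend on the whole composite key.
invoice_items = project(flat, ["Invoice_number", "Item", "Item_qty", "Item_price"])
```

The customer name, repeated three times in the flat file, is now stored once in `invoices`.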


    Third Normal Form (3NF)

    Transitive dependency: one attribute functionally determines a second, which functionally determines a third

    2NF + no transitive dependencies

    A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key. Remove transitive dependencies.

    Each nonkey attribute must depend upon the key, the whole key, and nothing but the key.

    Kent, 1978


    Example: Relation with transitive dependency

    (a) SALES relation with simple data


    Removing a transitive dependency

    (a) Decomposing the SALES relation


    Relations in 3NF

    Now, there are no transitive dependencies

    Both relations are in 3rd NF

    CustID → Name

    CustID → Salesperson

    Salesperson → Region
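A decomposition that removes the transitive dependency CustID → Salesperson → Region should lose no information. The Python sketch below checks this on hypothetical SALES rows (the customer numbers, names and regions are invented, and the names `sales1` and `sperson` for the two resulting relations are assumptions).

```python
# Hypothetical SALES rows: (CustID, Name, Salesperson, Region).
sales = [
    (8023, "Anderson", "Smith", "South"),
    (9167, "Bancroft", "Hicks", "West"),
    (7924, "Hobbs", "Smith", "South"),
    (6837, "Tucker", "Hernandez", "East"),
]

# Relation 1: attributes that depend directly on the key CustID.
sales1 = sorted({(cust, name, sp) for cust, name, sp, _ in sales})

# Relation 2: Salesperson -> Region, stored once per salesperson.
sperson = sorted({(sp, region) for _, _, sp, region in sales})

# Natural join on Salesperson rebuilds the original relation.
region_of = dict(sperson)
rejoined = {(cust, name, sp, region_of[sp]) for cust, name, sp in sales1}
```

The region of salesperson Smith is stored once instead of twice, and the join of the two relations gives back exactly the original rows.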


    Example: Invoice

    How to get Relations (tables) in 3NF


    From 2NF to 3NF

    Which attributes are dependent on others?

    Is there a problem?

    (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

    (Invoice_number, Item, Item_qty, Item_price)

    (Item, Item_descrip)


    Transitive Dependencies and Anomalies

    Insertion anomalies

    To add a new row, all customer (name, address, city, state, zip code, phone) and product (description) data must be consistent with previous entries

    Deletion anomalies

    By deleting a row, a customer or product may cease to exist

    Modification anomalies

    To modify a customer's or product's data in one row, all modifications must be carried out to all others


    Deletion Anomaly

    4377182 John Smith Sacramento CA 95831

    4398711 Al Gore Davis CA 95691

    4578461 Gray Davis Sacramento CA 95831

    4873179 Lisa Carr Reno NV 89557

    By deleting customer Al Gore, we would also be deleting Town Davis, and State California.

    Invoice number


    Transitive Dependencies

    Invoice_number → Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code

    Item → Item_descrip

    Invoice_number + Item → Item_qty, Item_price

    A condition where A, B, C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).


    Why Should City and State Be Separated from Customer Relation?

    City and state are dependent on zip code for their values and not on the customer's identifier (i.e., key).

    Zip_code → City, State

    Otherwise,

    Cust_account → Cust_addr, Zip_code → City, State

    In which case, you have a transitive dependency.


    3NF

    Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account)

    Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code)

    Zip_code Relation (Zip_code, City, State)

    Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price)

    Items Relation (Item, Item_descrip)
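The five 3NF relations can be written down as table definitions. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the column types and the foreign-key clauses are assumptions, since the slides give only the attribute names and keys. The inserted customer is the one on the sample invoice.

```python
import sqlite3

# Column types and REFERENCES clauses are assumptions for illustration.
SCHEMA = """
CREATE TABLE zip_code (
    zip_code TEXT PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE customer (
    cust_account TEXT PRIMARY KEY, cust_name TEXT, cust_addr TEXT,
    zip_code TEXT REFERENCES zip_code(zip_code));
CREATE TABLE invoice (
    invoice_number INTEGER PRIMARY KEY, invoice_date TEXT, date_delivered TEXT,
    cust_account TEXT REFERENCES customer(cust_account));
CREATE TABLE items (
    item TEXT PRIMARY KEY, item_descrip TEXT);
CREATE TABLE invoice_items (
    invoice_number INTEGER REFERENCES invoice(invoice_number),
    item TEXT REFERENCES items(item),
    item_qty INTEGER, item_price REAL,
    PRIMARY KEY (invoice_number, item));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# City and state are stored once per zip code, not once per customer.
conn.execute("INSERT INTO zip_code VALUES ('95819', 'Sacramento', 'CA')")
conn.execute(
    "INSERT INTO customer VALUES ('0000-000-0000-0', 'John Smith', '2036-26 Street', '95819')"
)

tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
city = conn.execute(
    "SELECT z.city FROM customer c JOIN zip_code z ON z.zip_code = c.zip_code"
).fetchone()[0]
```

A customer's city is recovered through a join on `zip_code`, so changing the city of a zip code touches a single row.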


    Further Anomalies

    Item      Item_descrip
    DVD-A110  Panasonic
    PV-4210   Panasonic
    PV-4250   Panasonic
    CT-32S35  PAN

    Inconsistency

    Item      Item_descrip
    DVD-A110  Panasonic
    PV-4210   PanaSonic
    PV-4250   Pana Sonic
    CT-32S35  PAN

    The manufacturer name is contained in Item_descrip

    It would be useful to change the manufacturer name of all Panasonic products to Panasonic USA

    Insert a new Panasonic product


    3NF

    Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account)

    Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code)

    Zip_code Relation (Zip_code, City, State)

    Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price)

    Items Relation (Item, Item_descrip)

    Since the Items relation contains the manufacturer's name in the description, a separate Manufacturers relation can be created

    Manufacturers Relation (Manuf_code, Manuf_name)


    First to Third Normal Form (1NF - 3NF)

    1NF: A relation is in first normal form if and only if every attribute is single-valued for each tuple (remove the repeating or multi-valued attributes and create a flat file)

    2NF: A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key (remove partial dependencies)

    3NF: A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key (remove transitive dependencies)


    Example (Employee)

    EMPLOYEE (EmpId, Name, Dept, Salary, Course1, DateTook1, Fee1, Course2, DateTook2, Fee2, ...)


    If employees can take a course more than once: TOOK_COURSE (EmpId, Course, DateTook)

    2NF


    Example (Hospital)

    Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects
    1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St. New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
    1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St. Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
    2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
    4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
    5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
    6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline | Fever


    1NF

    Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug admin | Side Effect
    1111 | 145 | 01-Jan-95 | John White | 15 New St. New York, NY | Beth Little | Gallstones removal | Penicillin | rash
    1111 | 311 | 12-Jun-95 | John White | 15 New St. New York, NY | Michael Diamond | Kidney stones removal | none | none
    1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St. Rye, NY | Charles Field | Eye Cataract removal | Tetracycline | Fever
    1234 | 467 | 10-May-95 | Mary Jones | 10 Main St. Rye, NY | Patricia Gold | Thrombosis removal | none | none
    2345 | 189 | 08-Jan-96 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
    4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
    5123 | 145 | 10-May-95 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
    6845 | 243 | 05-Apr-94 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement | Tetracycline | Fever
    6845 | 243 | 15-Dec-84 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye cataract removal | none | none


    2NF

    Patient # | Patient Name | Patient Address
    1111 | John White | 15 New St. New York, NY
    1234 | Mary Jones | 10 Main St. Rye, NY
    2345 | Charles Brown | Dogwood Lane Harrison, NY
    4876 | Hal Kane | 55 Boston Post Road, Chester,
    5123 | Paul Kosher | Blind Brook Mamaroneck, NY
    6845 | Ann Hood | Hilton Road Larchmont, NY

    Surgeon # | Surgeon Name
    145 | Beth Little
    189 | David Rosen
    243 | Charles Field
    311 | Michael Diamond
    467 | Patricia Gold


    Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin | Side Effects
    1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin | rash
    1111 | 311 | 12-Jun-95 | stones removal | none | none
    1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline | Fever
    1234 | 467 | 10-May-95 | Thrombosis removal | none | none
    2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin | none
    4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin | none
    5123 | 145 | 10-May-95 | Gallstones Removal | none | none
    6845 | 243 | 15-Dec-84 | Eye cataract removal | none | none
    6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline | Fever


    Example

    Work on project

    3NF


    3NF

    Find candidate keys, primary keys, and all functional dependencies, and transform the following relations into 3NF

    1. Housing

    StId  Dorm   Fee
    100   B1101  1000
    101   B1102  1100
    102   B2101  1000

    StId → Dorm, StId → Fee

    Not in 3NF because of Dorm → Fee
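One possible answer for the Housing exercise can be sketched in Python. It assumes the dependency Dorm → Fee stated above and splits the relation into two; the names `student_dorm` and `dorm_fee` for the resulting relations are hypothetical.

```python
# Housing rows from the exercise: (StId, Dorm, Fee).
housing = [
    (100, "B1101", 1000),
    (101, "B1102", 1100),
    (102, "B2101", 1000),
]

# StId -> Dorm stays with the student.
student_dorm = sorted({(st_id, dorm) for st_id, dorm, _ in housing})

# Dorm -> Fee moves to its own relation, keyed by Dorm.
dorm_fee = {dorm: fee for _, dorm, fee in housing}

# Joining on Dorm rebuilds the original relation, so nothing is lost.
rejoined = sorted((st_id, dorm, dorm_fee[dorm]) for st_id, dorm in student_dorm)

# A fee change is now a single update, however many students live in the dorm.
dorm_fee["B1102"] = 1200
```

In this decomposition the fee of a dorm can also be recorded before any student is assigned to it, which the original relation did not allow.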


    EXAM(S#, Name, P#, NameCourse, Mark)

    STUDENT(S#, Name, P#, NameCourse)

    STUDENT(S#, P#, Name, NameCourse, Gender, Mark, Data)

    4.


    Explain update anomalies

    PRODUCTION(P#, M#, E#)

    P#  M#  E#
    P1  M1  E1
    P2  M2  E3
    P3  M1  E1
    P4  M1  E1
    P5  M3  E2
    P6  M4  E1

    P# → M#, P# → E#, M# → E#

    P stands for Product

    M stands for Machine

    E stands for Employee

    Will insertion produce inconsistency?


    Determine candidate keys and primary key, functional dependencies, and see if this relation could be normalized

    Advisory

    StId Major Advisor

    100 Math Smith

    100 Language Ringo

    101 Language Yung

    101 Math Smith

    102 Math Peris
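As a starting point for the Advisory exercise, candidate dependencies can be tested against the five sample rows. The Python sketch below uses an illustrative helper, `fd_holds`, which is not part of any library. A sample can only rule dependencies out; the ones that survive still have to be confirmed against the intended meaning of the data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by the rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# The Advisory rows from the exercise.
advisory = [
    {"StId": 100, "Major": "Math", "Advisor": "Smith"},
    {"StId": 100, "Major": "Language", "Advisor": "Ringo"},
    {"StId": 101, "Major": "Language", "Advisor": "Yung"},
    {"StId": 101, "Major": "Math", "Advisor": "Smith"},
    {"StId": 102, "Major": "Math", "Advisor": "Peris"},
]

results = {
    "StId -> Major": fd_holds(advisory, ["StId"], ["Major"]),
    "StId -> Advisor": fd_holds(advisory, ["StId"], ["Advisor"]),
    "Major -> Advisor": fd_holds(advisory, ["Major"], ["Advisor"]),
    "Advisor -> Major": fd_holds(advisory, ["Advisor"], ["Major"]),
    "StId, Major -> Advisor": fd_holds(advisory, ["StId", "Major"], ["Advisor"]),
}
```

On this sample only Advisor → Major and (StId, Major) → Advisor survive, so the sample is consistent with (StId, Major) being a candidate key while Advisor is a determinant that is not a key.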