Slide 12 - Database Design & Normalization

  • Upload
    ayisha


  • 8/14/2019 Slide 12 - Database Design & Normalization

    1/29

    Database Design & Normalization

    2/29

    Why?

    Why? Why? Why?

    Why do we need to talk about database design?

    3/29

    Let's start with an example.

    Say you need a sales report that looks something like this:

    Customer  Customer  Customer      Catalog               Unit              Qty    Actual  Extended
    No.       Name      Address       No.      Description  Price  Date       Sold   Price   Price
    131       Jo Blo    13 May St     3A21     T-Shirt      12.49  03/01/98   45     10.00   450.00
    179       Yo Yo     271 OK Ave    1B77     Sweats       15.00  01/03/98   12     15.00   180.00
    212       Mu Mu     32 Saddle Rd  4X21     Pants        23.47  12/11/98   5      21.00   105.00
    . . .

    4/29

    What the uninitiated (read: amateur) database designer tends to do is build a relational table that mimics this report. That is, it has the same columns as the report.

    But what would we call this class?

    The best name would probably be something like Sales or Sales Analysis.

    But . . .

    5/29

    The problem is that we have three kinds of data in this report.

    We have:

    Data that describes a Customer (Cust No./Name/Address)

    Data that describes a Product (Cat No./Description/Unit Price)

    And data that describes a Sale (Date/Quantity/Actual and Extended Prices)

    Compare this situation with all the earlier models we have looked at. You'll see that Customer, Product and Sale should each be a separate class . . .
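    As a rough sketch of that split, the three classes could become three tables. The table and column names below are illustrative assumptions (the slides only name the classes and their attributes), and SQLite is used purely for convenience.

```python
import sqlite3

# Hypothetical schema: names are assumptions, not taken from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        cust_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        address TEXT NOT NULL
    );
    CREATE TABLE product (
        cat_no      TEXT PRIMARY KEY,
        description TEXT NOT NULL,
        unit_price  REAL NOT NULL
    );
    CREATE TABLE sale (
        sale_id      INTEGER PRIMARY KEY,
        cust_no      INTEGER NOT NULL REFERENCES customer(cust_no),
        cat_no       TEXT    NOT NULL REFERENCES product(cat_no),
        sale_date    TEXT    NOT NULL,
        qty_sold     INTEGER NOT NULL,
        actual_price REAL    NOT NULL
    );
""")

# First line of the sample report, stored once per class.
conn.execute("INSERT INTO customer VALUES (131, 'Jo Blo', '13 May St')")
conn.execute("INSERT INTO product VALUES ('3A21', 'T-Shirt', 12.49)")
conn.execute(
    "INSERT INTO sale (cust_no, cat_no, sale_date, qty_sold, actual_price) "
    "VALUES (131, '3A21', '03/01/98', 45, 10.00)"
)

# The report line is rebuilt on demand by joining the three tables.
report_row = conn.execute("""
    SELECT c.cust_no, c.name, c.address,
           p.cat_no, p.description, p.unit_price,
           s.sale_date, s.qty_sold, s.actual_price,
           s.qty_sold * s.actual_price AS extended_price
    FROM sale s
    JOIN customer c ON c.cust_no = s.cust_no
    JOIN product  p ON p.cat_no  = s.cat_no
""").fetchone()
print(report_row)
```

    The report still exists, but only as a query result; each fact about a customer or a product is stored in exactly one row.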

    6/29

    The maintenance horror of the poorly designed database

    A customer may keep buying several kinds of products, so that customer's details are repeated on every one of those rows.

    What if the customer changes their name?

    What if the price of a product is increased or decreased?

    What if a customer changes address?
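    The questions above can be made concrete with a small sketch. It assumes a single flat table shaped like the report (the column names are illustrative), and the second purchase by customer 131 is invented for the demonstration. A rename that misses one row leaves the same customer with two different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flat table that mimics the report; column names are assumptions.
conn.execute("""
    CREATE TABLE sales_report (
        cust_no INTEGER, cust_name TEXT, cust_address TEXT,
        cat_no TEXT, description TEXT, unit_price REAL,
        sale_date TEXT, qty_sold INTEGER, actual_price REAL
    )
""")
rows = [
    (131, "Jo Blo", "13 May St", "3A21", "T-Shirt", 12.49, "03/01/98", 45, 10.00),
    # Second purchase by the same customer: invented for illustration.
    (131, "Jo Blo", "13 May St", "1B77", "Sweats", 15.00, "04/01/98", 2, 15.00),
]
conn.executemany("INSERT INTO sales_report VALUES (?,?,?,?,?,?,?,?,?)", rows)

# A careless rename that touches only one of the customer's rows.
conn.execute(
    "UPDATE sales_report SET cust_name = 'Jo Blow' "
    "WHERE cust_no = 131 AND cat_no = '3A21'"
)

# The same customer number now carries two different names.
names = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT cust_name FROM sales_report "
        "WHERE cust_no = 131 ORDER BY cust_name"
    )
]
print(names)
```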

    7/29

    What is the problem with the amateur's database design?

    This structure does not allow our database to answer any query that could possibly be dreamed up against that data.

    Some queries can be answered, but only very inefficiently.

    8/29

    The un-normalized structure that mimicked the report will run into problems a few months or years down the line, when it has to answer queries that the database designer did not foresee.

    This is what I refer to as that most dreaded of all database phenomena:

    Unanticipated Queries

    9/29

    Normalization

    What normalization is for is to make sure that each database table carries only the attributes that actually describe what is needed.

    10/29

    Normalization

    Definition: Normalization is the process of structuring a relational database schema such that most ambiguity is removed. The stages of normalization are referred to as normal forms and progress from the least restrictive (First Normal Form) through the most restrictive (Fifth Normal Form). Generally, most database designers do not attempt to implement anything higher than Third Normal Form or Boyce-Codd Normal Form.

    11/29

    A simpler explanation of normalization

    There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored.

    12/29

    Normal forms

    The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF).

    In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.

    13/29

    Normal form hierarchy

    First normal form (1NF) sets the very basic rules for an organized database:
      • Eliminate duplicative columns from the same table.
      • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

    Second normal form (2NF) further addresses the concept of removing duplicative data:
      • Meet all the requirements of the first normal form.
      • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
      • Create relationships between these new tables and their predecessors through the use of foreign keys.

    Third normal form (3NF) goes one large step further:
      • Meet all the requirements of the second normal form.
      • Remove columns that are not dependent upon the primary key.

    Finally, fourth normal form (4NF) has one additional requirement:
      • Meet all the requirements of the third normal form.
      • A relation is in 4NF if it has no multi-valued dependencies.

    14/29

    1st NF

    Eliminate duplicative columns from the same table.

    Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

    15/29

    A classic example

    Consider a table within a human resources database that stores the manager-subordinate relationship.

    For the purposes of our example, we'll impose the business rule that each manager may have one or more subordinates while each subordinate may have only one manager.

    16/29

    An intuitive table

    Manager  Subordinate1  Subordinate2  Subordinate3  Subordinate4
    Bob      Jim           Mary          Beth
    Mary     Mike          Jason         Carol         Mark
    Jim      Alan

    17/29

    Why is it not even in 1st NF?

    Recall the first rule imposed by 1NF: eliminate duplicative columns from the same table. Clearly, the Subordinate1-Subordinate4 columns are duplicative.

    Jim has only one subordinate, so the Subordinate2-Subordinate4 columns in his row are simply wasted storage space.

    Furthermore, Mary already has 4 subordinates. What happens if she takes on another employee? The whole table structure would require modification.
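    A minimal sketch of these problems, assuming the intuitive table is stored as-is in SQLite (the table name `reports_wide` is made up). Even a simple lookup has to name every Subordinate column, and the empty slots can be counted directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the column layout follows the intuitive table.
conn.execute("""
    CREATE TABLE reports_wide (
        manager TEXT,
        subordinate1 TEXT, subordinate2 TEXT,
        subordinate3 TEXT, subordinate4 TEXT
    )
""")
conn.executemany(
    "INSERT INTO reports_wide VALUES (?, ?, ?, ?, ?)",
    [
        ("Bob", "Jim", "Mary", "Beth", None),
        ("Mary", "Mike", "Jason", "Carol", "Mark"),
        ("Jim", "Alan", None, None, None),
    ],
)

# Finding one person's manager means listing every repeated column.
carols_manager = conn.execute(
    "SELECT manager FROM reports_wide "
    "WHERE 'Carol' IN (subordinate1, subordinate2, subordinate3, subordinate4)"
).fetchone()[0]

# Count the slots that hold nothing at all.
unused_slots = conn.execute(
    "SELECT SUM((subordinate1 IS NULL) + (subordinate2 IS NULL) + "
    "(subordinate3 IS NULL) + (subordinate4 IS NULL)) FROM reports_wide"
).fetchone()[0]

print(carols_manager, unused_slots)
```

    Giving Mary a fifth subordinate would require adding a `subordinate5` column and revisiting every query that lists the columns.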

    18/29

    A second bright idea

    Let's try something like this:

    Manager  Subordinates
    Bob      Jim, Mary, Beth
    Mary     Mike, Jason, Carol, Mark
    Jim      Alan

    This solution is closer, but it also falls short of the mark. The Subordinates column is still duplicative and non-atomic. What happens when we need to add or remove a subordinate? We need to read and write the entire contents of that manager's Subordinates value. That's not a big deal in this situation, but what if one manager had one hundred employees? Also, it complicates the process of selecting data from the database in future queries.
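    The read-and-rewrite cost can be sketched as follows. The table name and the helper `remove_subordinate` are hypothetical; the point is only that the database cannot edit one name inside a packed value, so the application has to unpack and repack the list itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the comma-separated version of the data.
conn.execute("CREATE TABLE reports_to (manager TEXT, subordinates TEXT)")
conn.executemany(
    "INSERT INTO reports_to VALUES (?, ?)",
    [
        ("Bob", "Jim, Mary, Beth"),
        ("Mary", "Mike, Jason, Carol, Mark"),
        ("Jim", "Alan"),
    ],
)


def remove_subordinate(conn, manager, name):
    """Read the whole list, edit it in application code, write it all back."""
    (packed,) = conn.execute(
        "SELECT subordinates FROM reports_to WHERE manager = ?", (manager,)
    ).fetchone()
    remaining = [s.strip() for s in packed.split(",") if s.strip() != name]
    conn.execute(
        "UPDATE reports_to SET subordinates = ? WHERE manager = ?",
        (", ".join(remaining), manager),
    )


remove_subordinate(conn, "Mary", "Carol")
marys_team = conn.execute(
    "SELECT subordinates FROM reports_to WHERE manager = 'Mary'"
).fetchone()[0]
print(marys_team)
```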

    19/29

    Here is a table that satisfies the first rule of 1NF:

    Manager  Subordinate
    Bob      Jim
    Bob      Mary
    Bob      Beth
    Mary     Mike
    Mary     Jason
    Mary     Carol
    Mary     Mark
    Jim      Alan
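    With one row per manager-subordinate pair, the earlier pain points reduce to single-row operations. This is a minimal sketch; the table name is an assumption and "Dana" is a made-up fifth employee for Mary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports_to (manager TEXT, subordinate TEXT)")
pairs = [
    ("Bob", "Jim"), ("Bob", "Mary"), ("Bob", "Beth"),
    ("Mary", "Mike"), ("Mary", "Jason"), ("Mary", "Carol"), ("Mary", "Mark"),
    ("Jim", "Alan"),
]
conn.executemany("INSERT INTO reports_to VALUES (?, ?)", pairs)

# Mary takes on a fifth employee: one INSERT, no change to the table structure.
# "Dana" is an invented name used only for this illustration.
conn.execute("INSERT INTO reports_to VALUES ('Mary', 'Dana')")

# Each cursor row is a (manager, count) pair, so dict() can consume it directly.
team_sizes = dict(
    conn.execute("SELECT manager, COUNT(*) FROM reports_to GROUP BY manager")
)
carols_manager = conn.execute(
    "SELECT manager FROM reports_to WHERE subordinate = 'Carol'"
).fetchone()[0]
print(team_sizes, carols_manager)
```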

    20/29

    Not finished yet

    Now, what about the second rule: identify each row with a unique column or set of columns (the primary key)?

    You might take a look at the table above and suggest the use of the subordinate column as a primary key. In fact, the subordinate column is a good candidate for a primary key, because our business rules specified that each subordinate may have only one manager.

    However, the data that we have chosen to store in our table makes this a less than ideal solution. What happens if we hire another employee named Jim? How do we store his manager-subordinate relationship in the database?

    21/29

    Finally, the 1st NF

    It's best to use a truly unique identifier (like an employee ID or SSN) as a primary key. Our final table would look like this:

    Manager  Subordinate
    182      143
    182      201
    182      123
    201      156
    201      041
    201      187
    201      196
    143      202
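    One way to let the database enforce the business rule is to declare the subordinate ID as the primary key. This is a sketch under assumptions: the column names are invented, and the IDs are stored as text so that a value such as 041 keeps its leading zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column names; IDs are text to preserve leading zeros.
conn.execute("""
    CREATE TABLE reports_to (
        subordinate_id TEXT PRIMARY KEY,
        manager_id     TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO reports_to (manager_id, subordinate_id) VALUES (?, ?)",
    [
        ("182", "143"), ("182", "201"), ("182", "123"),
        ("201", "156"), ("201", "041"), ("201", "187"), ("201", "196"),
        ("143", "202"),
    ],
)

# Trying to give employee 143 a second manager violates the primary key.
try:
    conn.execute(
        "INSERT INTO reports_to (manager_id, subordinate_id) VALUES ('201', '143')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM reports_to").fetchone()[0]
print(rejected, row_count)
```

    Two employees who share the name Jim no longer collide, because each has a distinct ID.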

    22/29

    Towards 2NF

    Definition: In order to be in Second Normal Form, a relation must first fulfill the requirements to be in First Normal Form.

    Additionally, each nonkey attribute in the relation must be functionally dependent upon the primary key.

    http://databases.about.com/library/glossary/bldef-relation.htm
    http://databases.about.com/library/glossary/bldef-1nf.htm
    http://databases.about.com/library/glossary/bldef-attribute.htm
    http://databases.about.com/library/glossary/bldef-primarykey.htm

    23/29

    An example

    The relation is in First Normal Form, but not Second Normal Form:

    Order #  Customer         Contact Person   Total
    1        Acme Widgets     John Doe         $134.23
    2        ABC Corporation  Fred Flintstone  $521.24
    3        Acme Widgets     John Doe         $1042.42
    4        Acme Widgets     John Doe         $928.53

    In the table above, the order number serves as the primary key. Notice that the customer and total amount are dependent upon the order number: this data is specific to each order. However, the contact person is dependent upon the customer, not upon the individual order.
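    The dependency described above can be checked mechanically against the sample rows. The helper `determines` below is a hypothetical name, not a library function. Note that sample data can only refute a functional dependency; it cannot prove that one holds for all future rows.

```python
from collections import defaultdict

# The four sample orders from the table above.
orders = [
    {"order_no": 1, "customer": "Acme Widgets", "contact": "John Doe", "total": 134.23},
    {"order_no": 2, "customer": "ABC Corporation", "contact": "Fred Flintstone", "total": 521.24},
    {"order_no": 3, "customer": "Acme Widgets", "contact": "John Doe", "total": 1042.42},
    {"order_no": 4, "customer": "Acme Widgets", "contact": "John Doe", "total": 928.53},
]


def determines(rows, lhs, rhs):
    """Return True if every lhs value maps to exactly one rhs value in rows."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return all(len(values) == 1 for values in seen.values())


# Contact person is fixed once the customer is known ...
customer_determines_contact = determines(orders, "customer", "contact")
# ... but the total is not: it varies from order to order.
customer_determines_total = determines(orders, "customer", "total")
print(customer_determines_contact, customer_determines_total)
```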

    24/29

    Two tables to satisfy 2NF

    Customer         Contact Person
    Acme Widgets     John Doe
    ABC Corporation  Fred Flintstone

    Order #  Customer         Total
    1        Acme Widgets     $134.23
    2        ABC Corporation  $521.24
    3        Acme Widgets     $1042.42
    4        Acme Widgets     $928.53

    25/29

    Comments

    The creation of two separate tables eliminates the dependency problem experienced in the previous case.

    In the first table, contact person is dependent upon the primary key, customer name. The second table only includes the information unique to each order.

    Someone interested in the contact person for each order could obtain this information by performing a JOIN operation.

    http://databases.about.com/library/glossary/bldef-join.htm
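    A minimal sketch of such a JOIN, using the two tables from the previous slide. The table and column names are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical names for the two tables shown on the previous slide.
conn.executescript("""
    CREATE TABLE customers (
        customer       TEXT PRIMARY KEY,
        contact_person TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_no INTEGER PRIMARY KEY,
        customer TEXT NOT NULL REFERENCES customers(customer),
        total    REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Acme Widgets", "John Doe"), ("ABC Corporation", "Fred Flintstone")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "Acme Widgets", 134.23),
        (2, "ABC Corporation", 521.24),
        (3, "Acme Widgets", 1042.42),
        (4, "Acme Widgets", 928.53),
    ],
)

# Contact person for each order, recovered by joining on the customer name.
contacts = conn.execute("""
    SELECT o.order_no, c.contact_person
    FROM orders o
    JOIN customers c ON c.customer = o.customer
    ORDER BY o.order_no
""").fetchall()
print(contacts)
```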

    26/29

    3rd NF

    Definition: In order to be in Third Normal Form, a relation must first fulfill the requirements to be in Second Normal Form. Additionally, all attributes that are not dependent upon the primary key must be eliminated.

    http://databases.about.com/library/glossary/bldef-relation.htm
    http://databases.about.com/library/glossary/bldef-2nf.htm
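    One possible illustration, which is an assumption and not taken from the slides: in the sales report at the start of this deck, Extended Price is simply Qty Sold times Actual Price. It therefore depends on two other nonkey columns and not directly on the key, so a design aiming for 3NF could drop it and compute it on demand. The table and column names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sale table with no stored extended price.
conn.execute("""
    CREATE TABLE sale (
        sale_id      INTEGER PRIMARY KEY,
        qty_sold     INTEGER NOT NULL,
        actual_price REAL    NOT NULL
    )
""")
# Quantities and actual prices from the three sample report lines.
conn.executemany(
    "INSERT INTO sale (qty_sold, actual_price) VALUES (?, ?)",
    [(45, 10.00), (12, 15.00), (5, 21.00)],
)

# The derived value is computed when needed and never stored.
extended = [
    row[0]
    for row in conn.execute(
        "SELECT qty_sold * actual_price FROM sale ORDER BY sale_id"
    )
]
print(extended)
```

    The computed values match the Extended Price column of the sample report, so nothing is lost by not storing it.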

    27/29

    28/29

    To go or not to go higher?

    This may seem overly complex for daily applications, and indeed it may be. Database designers should always keep in mind the tradeoffs between higher-level normal forms and the resource issues that complexity creates.

    29/29

    An exercise

    (20 )

    S#  SNAME  CITY1

    P#  PNAME  COLOR  WEIGHT  CITY2  QTY