C8 Normalization

  • DataBase course notes 8

    DataBases

    DataBase Design

    Normalization Process

  • Database Design

    Conceptual Data Modeling

    Logical Database Design

    Normalization Process

    Implementing Base Table Structures

  • NORMALIZATION PROCESS

  • Normalization

    process of taking entities and attributes that have been discovered and making them suitable for the relational database

    the process does this by removing redundancies and shaping the data in the manner that the relational engine desires

  • Normalization

    based on a set of levels, each of which achieves a level of correctness or adherence to a particular set of rules

    the rules are formally known as forms, normal forms

    starts with First Normal Form (1NF), which eliminates data redundancy, and continues through to Fifth Normal Form (5NF), which deals with decomposition of ternary relationships

  • Normalization

    each level of normalization indicates an increasing degree of adherence to the recognized standards of database design

    as you increase the degree of normalization of your data, you'll naturally tend to create an increasing number of tables of decreasing width (fewer columns)

  • Why Normalize?

    eliminate data that's duplicated, since duplication increases the chance it won't match when you need it

    avoid unnecessary coding needed to keep duplicated data in sync

    keep tables thin: increase the number of values that will fit on a page (8K) and decrease the number of reads that will be needed

    maximize use of clustered indexes to allow for optimum data access and joins

    lower the number of indexes per table - indexes are costly to maintain

  • Eliminating duplicated data

    any piece of data that occurs more than once in the database => increased probability for errors to happen

    Eliminating anomalies: INSERT, DELETE, UPDATE

    Easy to keep the database consistent; easy to preserve the integrity of the database

  • Functional dependencies

    Consider R(A1, A2, ..., An) a relation schema and X, Y ⊆ {A1, A2, ..., An}

    Definition: The attribute X functionally determines the attribute Y, X -> Y, if and only if for any value of X, there is only one value of Y corresponding to X.

    The functional dependency X -> Y is total if there isn't any Z ⊂ X such that Z -> Y; otherwise, it is partial

    Observations:

    If X -> Y, then, for any Z ⊆ Y, we have X -> Z

    If X -> Y and X is a simple attribute, then Y is totally (functionally) dependent on X.

    If Y is totally dependent on Z, then we have X -> Y for every composed attribute X that contains Z.
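
    The definitions above can be tried out on sample rows. The following is a minimal sketch, not part of the course notes: the helper names fd_holds and is_total and the sample data are hypothetical. Note that a table instance can only refute a dependency; whether X -> Y holds for the schema is a design decision.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_total(rows, lhs, rhs):
    """Return True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    return not any(
        fd_holds(rows, subset, rhs)
        for size in range(1, len(lhs))
        for subset in combinations(lhs, size)
    )


# Hypothetical sample rows (values are placeholders)
rows = [
    {"isbn": "b1", "ssn": "a1", "title": "T1", "royalty": 60},
    {"isbn": "b1", "ssn": "a2", "title": "T1", "royalty": 40},
    {"isbn": "b2", "ssn": "a1", "title": "T2", "royalty": 100},
]

print(fd_holds(rows, ["isbn"], ["title"]))           # True
print(is_total(rows, ["isbn", "ssn"], ["title"]))    # False: isbn alone determines title
print(is_total(rows, ["isbn", "ssn"], ["royalty"]))  # True
```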

  • Armstrong's axioms

    A1 (Reflexivity) If Y ⊆ X => X -> Y

    A2 (Augmentation) If X -> Y => XZ -> YZ

    A3 (Transitivity) If X -> Y and Y -> Z => X -> Z
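
    One common use of these axioms is computing the closure of a set of attributes, that is, everything the set functionally determines. A minimal sketch follows; the function name closure and the sample dependencies are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under the given dependencies.

    fds is a list of (lhs, rhs) pairs of frozensets. Starting from attrs itself
    (reflexivity), a right-hand side is added whenever its left-hand side is
    already covered (transitivity), until nothing changes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Hypothetical dependencies in the style of the book/publisher example
fds = [
    (frozenset({"BookIsbnNumber"}), frozenset({"Title", "PublisherName"})),
    (frozenset({"PublisherName"}), frozenset({"PublisherCity"})),
]

print(sorted(closure({"BookIsbnNumber"}, fds)))
# ['BookIsbnNumber', 'PublisherCity', 'PublisherName', 'Title']
print(sorted(closure({"PublisherName"}, fds)))
# ['PublisherCity', 'PublisherName']
```

    If the closure of X contains every attribute of the relation, X is a key or includes a key.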

  • Process of Normalization

    take entities that are complex and extract simpler entities from them

    continues until every table in the database represents one thing (a simple entity) and every column describes that thing

  • 3 categories of normalization steps

    entity and attribute shape

    relationships between attributes

    multi-valued and join dependencies in entities

  • Entity and attribute shape

    First Normal Form

    all attributes must be atomic, that is, only a single value represented in a single attribute in a single instance of an entity

    all instances of an entity must contain the same number of values

    all instances of an entity must be different

  • First Normal Form

    violations => data handling not optimal:

    having to decode multiple values stored where a single one should be

    having duplicated rows that cannot be distinguished from one another

  • for example, consider a group of data like 1, 2, 3, 5, 7

    it likely represents five separate values

    the test for atomicity is to consider whether you would ever need to deal with part of a column without the other parts of the data in that same column

    if the list 1, 2, 3, 5, 7 is always treated as a single value, it might be acceptable to store the value in a single column

    if you might need to deal with the value 3 individually, then the list is definitely not in First Normal Form

    if there is no plan to use list elements individually, you should still consider whether it is better to store each value individually to allow for possible future usage
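
    As an illustration of the point above, a list stored in one column can be split into one row per element. This is a minimal sketch; the function name split_list_column and the column names Id and Values are hypothetical.

```python
def split_list_column(rows, key, column, sep=","):
    """Turn each delimited list into one (key, item) pair per element."""
    return [
        (row[key], item.strip())
        for row in rows
        for item in row[column].split(sep)
        if item.strip()
    ]


# One row holding the list from the slide in a single column
rows = [{"Id": 1, "Values": "1, 2, 3, 5, 7"}]

pairs = split_list_column(rows, "Id", "Values")
print(pairs)
# [(1, '1'), (1, '2'), (1, '3'), (1, '5'), (1, '7')]
print((1, "3") in pairs)  # True: the value 3 can now be addressed on its own
```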

  • E-Mail Addresses

    name1@domain1.com

    AccountName: name1

    Domain: domain1.com

  • E-Mail Addresses

    if all you'll ever do is send e-mail, then a single column is perfectly acceptable

    if you need to consider what domains you have e-mail addresses stored for => access individual parts, then it's a completely different matter
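
    When the parts do matter, the address can be stored or processed as account name and domain. A minimal sketch, assuming a simple account@domain shape; split_email and the extra sample addresses are hypothetical.

```python
from collections import Counter


def split_email(address):
    """Split an address into (account name, domain) at the last '@'."""
    account, sep, domain = address.rpartition("@")
    if not (sep and account and domain):
        raise ValueError(f"not an e-mail address: {address!r}")
    return account, domain


# Placeholder addresses; only the first one appears in the notes
addresses = ["name1@domain1.com", "name2@domain1.com", "name3@domain2.com"]

print(split_email("name1@domain1.com"))  # ('name1', 'domain1.com')

# Which domains do we have addresses stored for, and how many of each?
domains = Counter(split_email(a)[1] for a in addresses)
print(domains["domain1.com"])  # 2
```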

  • Telephone Numbers

    AAA-EEE-NNNN (XXXX):

    AAA: area code - indicates a calling area located within a state

    EEE: exchange - indicates a set of numbers within an area code

    NNNN: number - used to make individual phone numbers unique

    XXXX: extension - number that must be dialed after connecting

  • Mailing Addresses

  • All instances in entity contain same number of values

    entities have a fixed number of attributes, and tables have a fixed number of columns

    entities should be designed such that every attribute has a fixed number of values associated with it

    an example of a violation of this rule is an entity that has several attributes with the same base name suffixed (or prefixed) with a number, such as Payment1, Payment2, and so on
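
    Numbered columns such as Payment1, Payment2 can be turned into one row per payment. The sketch below assumes a hypothetical CustomerId key column and a helper named unpivot_payments; neither comes from the notes.

```python
def unpivot_payments(row, key="CustomerId", prefix="Payment"):
    """Return one record per filled-in numbered payment column."""
    payments = []
    for name, value in row.items():
        suffix = name[len(prefix):]
        if name.startswith(prefix) and suffix.isdigit() and value is not None:
            payments.append(
                {key: row[key], "PaymentNumber": int(suffix), "Amount": value}
            )
    return sorted(payments, key=lambda p: p["PaymentNumber"])


# A wide row with a fixed number of payment slots, one of them unused
wide_row = {"CustomerId": 7, "Payment1": 100.0, "Payment2": 250.0, "Payment3": None}

for payment in unpivot_payments(wide_row):
    print(payment)
# {'CustomerId': 7, 'PaymentNumber': 1, 'Amount': 100.0}
# {'CustomerId': 7, 'PaymentNumber': 2, 'Amount': 250.0}
```

    With one row per payment, a customer can have any number of payments without adding columns.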

  • Programming Anomalies avoided by First Normal Form

    modifying lists in a single column

    modifying multipart values

    dealing with a variable number of facts in an instance

  • Clues that design is not in First Normal Form

    string data that contains separator-type characters

    attribute names with numbers at the end

    tables with no or poorly defined keys

  • Relationships Between Attributes

    Second Normal Form: relationships between non-key attributes and part of the primary key

    Third Normal Form: relationships between non-key attributes

    BCNF (Boyce-Codd Normal Form): relationships between non-key attributes and any key

    Non-key attributes must provide a detail about the key, the whole key, and nothing but the key.

  • Second Normal Form

    entity must be in First Normal Form

    each attribute must be a fact describing the entire key

    technically relevant only when a composite key (a key composed of two or more columns) exists in the entity

    Definition: A relation R is in the second normal form (FN2) if it is in FN1 and every non-key attribute is totally dependent on every key of the relation

  • Each non-key attribute must describe entire key

  • BookIsbnNumber attribute uniquely identifies the book

    AuthorSocialSecurityNumber uniquely identifies the author

    the two columns together create a key that uniquely identifies an author for a book

    BookTitle describes the book, but doesn't describe the author at all

    AuthorFirstName and AuthorLastName describe the author, but not the book

  • BookIsbnNumber -> BookTitle

    AuthorSocialSecurityNumber -> AuthorFirstName

    AuthorSocialSecurityNumber -> AuthorLastName

    BookIsbnNumber, AuthorSocialSecurityNumber -> RoyaltyPercentage
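
    These dependencies suggest splitting the entity into one table per determinant. The sketch below shows that split as plain projections; the helper name project and the sample values are hypothetical.

```python
COLUMNS = (
    "BookIsbnNumber", "BookTitle", "AuthorSocialSecurityNumber",
    "AuthorFirstName", "AuthorLastName", "RoyaltyPercentage",
)

# Placeholder rows of the unnormalized book/author entity
raw = [
    ("b1", "T1", "a1", "F1", "L1", 60),
    ("b1", "T1", "a2", "F2", "L2", 40),
    ("b2", "T2", "a1", "F1", "L1", 100),
]
book_author = [dict(zip(COLUMNS, values)) for values in raw]


def project(rows, attrs):
    """Return the distinct rows restricted to attrs, in first-seen order."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# One table per determinant on the left-hand side of the dependencies
books = project(book_author, ["BookIsbnNumber", "BookTitle"])
authors = project(
    book_author,
    ["AuthorSocialSecurityNumber", "AuthorFirstName", "AuthorLastName"],
)
royalties = project(
    book_author,
    ["BookIsbnNumber", "AuthorSocialSecurityNumber", "RoyaltyPercentage"],
)

print(len(books), len(authors), len(royalties))  # 2 2 3
```

    Each author's name is now stored once, however many books the author has written.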

  • Programming problems avoided

    all programming issues that arise with Second Normal Form (as well as Third and Boyce-Codd Normal Forms) deal with functional dependencies that can end up corrupting data

  • the same author's information would have to be duplicated amongst all books

    cannot delete only the book and keep the author around

    cannot insert only the author without a book

  • Anomalies

    UPDATE: duplicate data, have to update multiple rows

    INSERT: cannot insert data for an entity without a relationship to another entity

    DELETE: cannot delete data for an entity without the risk of losing info about a related entity

  • Clues that entity is not in Second Normal Form

    repeating key attribute name prefixes, indicating that values are probably describing some additional entity

    data in repeating groups, showing signs of functional dependencies between attributes

    composite keys without a foreign key, which might be a sign you have key values that identify multiple things

  • Third Normal Form

    entity must be in Second Normal Form

    non-key attributes cannot describe other non-key attributes

    Definition: A relation R is in the third normal form (FN3) if it is in FN2 and none of the non-key attributes is functionally dependent on another non-key attribute of the relation.

  • non-key attributes cannot describe other non-key attributes

    PublisherName -> PublisherCity

  • Title defines the title for the book defined by BookIsbnNumber

    Price indicates the price of the book

    PublisherName describes the book's publisher

    PublisherCity also sort of describes something about the book, in that it tells where the publisher was located

    this doesn't make sense in this context, because the location of the publisher is directly dependent on what publisher is represented by PublisherName

  • Anomalies

    INSERT - cannot register a publisher unless there is a book that belongs to that publisher

    DELETE - if we delete the only book of a certain publisher, we lose all the information referring to that publisher

    UPDATE - the information referring to a certain publisher is redundant; if we want to update the information of a publisher, we must perform the same operation for all the books that belong to that publisher

  • Publisher entity has data concerning only the publisher

    Book entity has book information

    now if we want to add information to our schema concerning the publisher, such as contact information or an address, it's obvious where we add that information

    City attribute now clearly identifies the publisher, not the book
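
    The split into Publisher and Book can be sketched with Python's built-in sqlite3 module. The table and column names follow the slides; the column types and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Publisher (
        PublisherName TEXT PRIMARY KEY,
        City          TEXT NOT NULL
    );
    CREATE TABLE Book (
        BookIsbnNumber TEXT PRIMARY KEY,
        Title          TEXT NOT NULL,
        Price          REAL NOT NULL,
        PublisherName  TEXT NOT NULL REFERENCES Publisher (PublisherName)
    );
""")

# INSERT: a publisher can be registered before it has any book
conn.execute("INSERT INTO Publisher VALUES ('PublisherA', 'CityX')")
conn.execute("INSERT INTO Publisher VALUES ('PublisherB', 'CityY')")
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?, ?)",
    [("b1", "T1", 10.0, "PublisherA"), ("b2", "T2", 20.0, "PublisherA")],
)

# UPDATE: the city is stored once, so one row changes regardless of the number of books
conn.execute("UPDATE Publisher SET City = 'CityZ' WHERE PublisherName = 'PublisherA'")
cities = conn.execute(
    "SELECT DISTINCT p.City FROM Book b "
    "JOIN Publisher p ON p.PublisherName = b.PublisherName"
).fetchall()
print(cities)  # [('CityZ',)]

# DELETE: removing every book leaves the publisher information in place
conn.execute("DELETE FROM Book")
publisher_count = conn.execute("SELECT COUNT(*) FROM Publisher").fetchone()[0]
print(publisher_count)  # 2
```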

  • Clues that entities are not in Third Normal Form

    multiple attributes with the same prefix, much like Second Normal Form, only this time not in the key

    repeating groups of data

    summary data that refers to data in a different entity altogether, for example Price in Invoice as SUM(Quantity*ProductCost) from LineItems

  • Boyce-Codd Normal Form

    Ray Boyce, Edgar F. Codd

    entity is in First Normal Form

    all attributes are fully dependent on a key

    every determinant is a key

  • Entity is in BCNF if every determinant is a key

    Determinant: any attribute or combination of attributes on which any other attribute or combination of attributes is functionally dependent.

    BCNF extends the previous normal forms by saying that each entity might have many keys, and all attributes must be dependent on one of these keys

  • a Third Normal Form table which does not have multiple overlapping candidate keys is guaranteed to be in BCNF

    a Third Normal Form table with two or more overlapping candidate keys may or may not be in BCNF

    Definition: A relation R is in the Boyce-Codd Normal Form (BCNF) if, for every functional dependency X -> A from R, where A is an attribute that doesn't belong to X, X is a key or includes a key from R.

  • Court Bookings

    Court  Start Time  End Time  Rate Type
    1      09:30       10:30     SAVER
    1      11:00       12:00     SAVER
    1      14:00       15:30     STANDARD
    2      10:00       11:30     PREMIUM-B
    2      11:30       13:30     PREMIUM-B
    2      15:00       16:30     PREMIUM-A

  • Court Bookings

    hard court (Court 1) and grass court (Court 2)

    a booking is defined by the Court and the period for which the Court is reserved

    each booking has a Rate Type associated:

    SAVER for hard court bookings made by members

    STANDARD for hard court bookings made by non-members

    PREMIUM-A for grass court bookings made by members

    PREMIUM-B for grass court bookings made by non-members

  • Court Bookings - candidate keys

    {Court, Start Time}

    {Court, End Time}

    {Rate Type, Start Time}

    {Rate Type, End Time}

  • table adheres to both 2NF and 3NF

    table does not adhere to BCNF because of the dependency Rate Type -> Court, in which the determining attribute (Rate Type) is neither a candidate key nor a superset of a candidate key
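
    The BCNF violation can be checked against the sample bookings. This is a minimal sketch that only inspects the six rows from the slides (an instance can refute a dependency but not prove it for the schema); the helper names pick, fd_holds and is_superkey are hypothetical.

```python
COLUMNS = ("Court", "Start Time", "End Time", "Rate Type")

bookings = [
    (1, "09:30", "10:30", "SAVER"),
    (1, "11:00", "12:00", "SAVER"),
    (1, "14:00", "15:30", "STANDARD"),
    (2, "10:00", "11:30", "PREMIUM-B"),
    (2, "11:30", "13:30", "PREMIUM-B"),
    (2, "15:00", "16:30", "PREMIUM-A"),
]


def pick(row, attrs):
    """Return the values of the named columns of one row."""
    return tuple(row[COLUMNS.index(a)] for a in attrs)


def fd_holds(rows, lhs, rhs):
    """True if rows that agree on lhs also agree on rhs."""
    seen = {}
    return all(
        seen.setdefault(pick(r, lhs), pick(r, rhs)) == pick(r, rhs) for r in rows
    )


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    return len({pick(r, attrs) for r in rows}) == len(rows)


print(fd_holds(bookings, ["Rate Type"], ["Court"]))      # True: the dependency holds
print(is_superkey(bookings, ["Rate Type"]))              # False: its determinant is not a key
print(is_superkey(bookings, ["Court", "Start Time"]))    # True
```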

  • Rate Types

    Rate Type  Court  Member Flag
    SAVER      1      Yes
    STANDARD   1      No
    PREMIUM-A  2      Yes
    PREMIUM-B  2      No

    Court Bookings

    Court  Start Time  End Time  Member Flag
    1      09:30       10:30     Yes
    1      11:00       12:00     Yes
    1      14:00       15:30     No
    2      10:00       11:30     No
    2      11:30       13:30     No
    2      15:00       16:30     Yes

  • candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag}

    candidate keys for the Court Bookings table are {Court, Start Time} and {Court, End Time}

    both tables are in BCNF

    having one Rate Type associated with two different Courts is now impossible

    the anomaly affecting the original table has been eliminated

  • Multivalue Dependencies

    Third Normal Form is generally considered the pinnacle of proper database design

    serious problems might still remain in the logical design

  • Definition: We say that there exists a multi-value dependency of the attribute Z on Y, or that Y performs a multi-determination on Z, Y ->-> Z, if, for all values x1, x2, y, z1, z2, where x1 ≠ x2 and z1 ≠ z2, such that the tuples (x1, y, z1) and (x2, y, z2) belong to R, the tuples (x1, y, z2) and (x2, y, z1) also belong to R.

  • Fourth Normal Form

    entity must be in BCNF

    there must not be more than one multivalue dependency between an attribute and the key of the entity

    Definition: A relation R is in the fourth normal form if, for every multivalue dependency X ->-> Y, X is a key or includes a key in R.

  • Fourth Normal Form

    a table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X ->-> Y, X is a superkey, that is, X is either a candidate key or a superset thereof

  • Fourth Normal Form violations

    ternary relationships

    lurking multivalued attributes

  • Example: Restaurant, Pizza Variety, Delivery Area

    Restaurant        Pizza Variety  Delivery Area
    A1 Pizza          Thick Crust    Springfield
    A1 Pizza          Thick Crust    Shelbyville
    A1 Pizza          Thick Crust    Capital City
    A1 Pizza          Stuffed Crust  Springfield
    A1 Pizza          Stuffed Crust  Shelbyville
    A1 Pizza          Stuffed Crust  Capital City
    Elite Pizza       Thin Crust     Capital City
    Elite Pizza       Stuffed Crust  Capital City
    Vincenzo's Pizza  Thick Crust    Springfield
    Vincenzo's Pizza  Thick Crust    Shelbyville
    Vincenzo's Pizza  Thin Crust     Springfield
    Vincenzo's Pizza  Thin Crust     Shelbyville

  • table has no non-key attributes

    meets all normal forms up to BCNF

    not in 4NF because of the non-trivial multivalued dependencies:

    {Restaurant} ->-> {Pizza Variety}

    {Restaurant} ->-> {Delivery Area}

    eliminate possibility of anomalies
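
    The two multivalued dependencies can be checked on the twelve rows of the pizza table. A minimal sketch, assuming a three-column relation; the function name mvd_holds is hypothetical. For each restaurant, the rows must contain every combination of its varieties and its delivery areas.

```python
from itertools import product

# (Restaurant, Pizza Variety, Delivery Area), as in the table above
deliveries = [
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
]


def mvd_holds(rows, det, dep, rest):
    """Check column det ->-> column dep in a three-column relation.

    det, dep and rest are column positions. For every value of det, the
    (dep, rest) pairs must form the full cross product of the dep values
    and the rest values seen with that det value.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[det], set()).add((row[dep], row[rest]))
    for pairs in groups.values():
        deps = {d for d, _ in pairs}
        rests = {r for _, r in pairs}
        if pairs != set(product(deps, rests)):
            return False
    return True


print(mvd_holds(deliveries, 0, 1, 2))      # True: Restaurant ->-> Pizza Variety
print(mvd_holds(deliveries, 0, 2, 1))      # True: Restaurant ->-> Delivery Area
print(mvd_holds(deliveries[1:], 0, 1, 2))  # False once one combination is missing
```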

  • Anomalies

    INSERT: If we add a certain kind of pizza, delivered by a certain restaurant, then we have to repeat this information for every delivery area corresponding to that restaurant

    DELETE: If we delete the information that corresponds to the only pizza delivered by a certain restaurant, then we also lose the information about all the areas that restaurant delivers to

    UPDATE: If we want to update the name of a pizza delivered by a certain restaurant, then we have to update this name for all the corresponding delivery areas of that restaurant

  • 4th NORMAL FORM (4NF) - OK

    Restaurant        Pizza Variety
    A1 Pizza          Thick Crust
    A1 Pizza          Stuffed Crust
    Elite Pizza       Thin Crust
    Elite Pizza       Stuffed Crust
    Vincenzo's Pizza  Thick Crust
    Vincenzo's Pizza  Thin Crust

    Restaurant        Delivery Area
    A1 Pizza          Springfield
    A1 Pizza          Shelbyville
    A1 Pizza          Capital City
    Elite Pizza       Capital City
    Vincenzo's Pizza  Springfield
    Vincenzo's Pizza  Shelb
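
    Joining the two 4NF tables on Restaurant gives back the combinations of the three-column table. A minimal sketch: the function name natural_join and the added "Cheese Crust" variety are hypothetical, and the last delivery area is spelled out as Shelbyville, as it appears in the three-column table.

```python
varieties = [
    ("A1 Pizza", "Thick Crust"),
    ("A1 Pizza", "Stuffed Crust"),
    ("Elite Pizza", "Thin Crust"),
    ("Elite Pizza", "Stuffed Crust"),
    ("Vincenzo's Pizza", "Thick Crust"),
    ("Vincenzo's Pizza", "Thin Crust"),
]

areas = [
    ("A1 Pizza", "Springfield"),
    ("A1 Pizza", "Shelbyville"),
    ("A1 Pizza", "Capital City"),
    ("Elite Pizza", "Capital City"),
    ("Vincenzo's Pizza", "Springfield"),
    ("Vincenzo's Pizza", "Shelbyville"),
]


def natural_join(varieties, areas):
    """Join the two tables on Restaurant, giving (restaurant, variety, area) rows."""
    return sorted(
        (restaurant, variety, area)
        for restaurant, variety in varieties
        for other, area in areas
        if restaurant == other
    )


joined = natural_join(varieties, areas)
print(len(joined))  # 12, the size of the original three-column table

# Adding a variety is now a single-row insert; the join supplies every delivery area
varieties.append(("A1 Pizza", "Cheese Crust"))
print(len(natural_join(varieties, areas)))  # 15
```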

  • in contrast, if the pizza varieties offered by a restaurant sometimes did legitimately vary from one delivery area to another, the original three-column table would satisfy 4NF

  • Fifth Normal Form

    not every ternary relationship can be broken down into two entities related to a third

    the aim of 5NF is to ensure that any ternary relationships that still exist in 4NF can be decomposed into entities without loss of information

    eliminates problems with update anomalies due to multivalued dependencies

  • Decomposition

    R = (Professor, Discipline, Language), assumed to be in the 4th normal form

    R1 = (Professor, Discipline)

    R2 = (Professor, Language)

    R = R1 ⋈ R2 ⋈ R3 => *(R1, R2, R3) is a join dependency on the relation R

  • A relation is in FN5 if and only if every join dependency that exists in the relation is implied by a key of the relation

    Evidence(Professor, Student, Discipline, Language, Mark)

    Key: Student, Discipline

    decomposed, without loss of information, into:

    SDP(Student, Discipline, Professor)

    SDL(Student, Discipline, Language)

    SDM(Student, Discipline, Mark)

  • Denormalization

    used primarily to improve performance in cases where over-normalized structures are causing overhead to the query processor

    consider whether a slightly slower (but 100 percent accurate) application is not preferable to a faster application of lower accuracy

    during logical modeling, we should never step back from our normalized structures to performance-tune our applications proactively