Data Management: Databases and Organizations Richard Watson





<p>Data Management: Databases and Organizations. Richard Watson. Summary of Chapter 7 and Basic Structures prepared by Kirk Scott.</p> <p>Data Modeling and SQL. Chapter 7. Data Modeling. Reference: Basic Structures.</p> <p>Chapter 7. Data Modeling</p> <p>The building blocks of data modeling should be familiar to you: entities, attributes, relationships, and identifiers (keys). The next five overheads, taken from chapter 7, review the ER notation for these things.</p> <p>A model is the starting point for creating a database. No table need be created before the model is complete. Quality of the data model is essential. The model should be well formed: it should follow the basic rules for entities, attributes, relationships, and keys. The following overhead summarizes the characteristics of a well-formed model.</p> <p>A quality data model should be high-fidelity. This means that it has to accurately and completely model the situation in the problem domain. A model which is well formed but does not model the problem domain is useless from a practical point of view.</p> <p>The phrase "quality improvement" in the context of data models means this: It is unrealistic to assume that a good data model can be created on the first try. A data model will evolve as technical mistakes are caught. More importantly, it will evolve as a result of interaction with users as the problem domain and requirements are more completely understood.</p> <p>The Stock Example</p> <p>A simple data model for nations and stocks is given on the next overhead. Superficially, it seems OK. It could be verbally summarized as "Nations have stocks."</p> <p>The book now introduces the following additional textual information. Stocks are listed on stock exchanges (a new entity). A nation may have &gt;1 stock exchange. A given stock may be listed on &gt;1 exchange, but it has 1 home exchange. Stocks can be listed on the exchanges of &gt;1 country. Notice that the abstraction of a listing is repeated in this description. That suggests that a listing 
itself will be an entity. The next overhead shows a revised model that takes into account the new assumptions.</p> <p>The Geography Example</p> <p>Next the book gives a simple example that's supposed to model the relationships between nations, administrative units (states), and cities. See the next overhead for a straightforward model of this.</p> <p>The book next observes that exceptions are the bane of a good model. If you presume to model these globally, then your model should accommodate all possible situations. The book asks, "How many errors can you find in the initial data model?" See the table on the following overheads for answers.</p> <p>The next overhead shows a nations, administrative units, cities data model that has been revised to take into account these exceptional cases/errors in the initial model. This revised model may seem needlessly complex. However, the complexity is not needless. This is an accurate model of the situation that covers all cases. The initial model was insufficiently complex. It was wrong.</p> <p>The Women, Men, Marriage, and People Examples</p> <p>This topic was touched on all the way back in unit one. Capturing the relationships among people is a very common problem that leads to some familiar challenges and design/model choices. On the following overhead is an ER diagram of the relationship between married men and women.</p> <p>The foregoing model is obviously hilariously limited in the kind of relationship it can capture. In addition, the book points out the following characteristics of the model which might indicate that a different model would be better. 1. The labeling of the model indicates that this is a marriage, but there is nothing in the fields that spells this out. In particular, you might think that there would be a date field, a marriage license number, something among the fields that was specific to marriage.</p> <p>2. 
The Man and Woman tables have the same set of attributes, differing only in being named manX or womanX. This might suggest that we are dealing with one entity type, person, rather than two distinct entity types, man and woman.</p> <p>3. The last observation concerns the fields manoname and womanoname. These stand for "other name". As the model stands, a person can only have one other name. Alternatively, if the other name field is text, it might be filled with multiple values, which is not an ideal solution. A complete treatment of people and other names might introduce another table so that there could be a one-to-many relationship between people and their various names.</p> <p>The book doesn't solve all of these problems, but it does come up with a second model. If there were two types to begin with and you combine them into one, you frequently get a new field in the result. A person now has a gender field. Also, the labeling of the relationship could be made more generic. See the following overhead.</p> <p>Next the book tackles the topic of multiple marriages. If you're dealing with a Person table, then the table is in a many-to-many relationship with itself. To distinguish between multiple marriages, potentially between the same partners, beginning and ending date fields can be added to the table in the middle. See the next overhead for the third version of the model.</p> <p>In the long run, some sort of arbitrary numbering scheme might be desirable. A marriage license number might work, but the book points out that legally speaking it might also be desirable to record common law marriages. Notice in general that a lot of data integrity questions start to arise with a model like this. See the next overhead for the fourth version of the model.</p> <p>Next the book considers adding children to the model. Children are modeled as the result of marriage. Of course, this is not always the case. As long as the marriageno field in the person table can be null, the model accommodates that. Still, it doesn't allow you to 
record who a person's parents are if the person wasn't the result of marriage. See the next overhead for the fifth version of the model.</p> <p>The person model could be developed even further. This model barely scratches the surface of the variety of human relationships. It is already moderately complex but could become more complex. A model is complete when it contains everything needed in practice for a given problem. The model is unsuitable if it isn't complex enough. It is also unsuitable if it contains detail that isn't needed.</p> <p>The Book Example</p> <p>The book entitles this "When's a book not a book?" In other words, the example is an invitation to clarify what you mean when you refer to entities in a design. Are you referring to individual objects? Are you referring to kinds of objects? What elements of a design make it possible to distinguish between these meanings? A simplistic initial design is given on the next overhead.</p> <p>The book observes that a library may have more than one copy of a book. You might be tempted to model this by adding a copy number to the book record. The problem with that solution is that the basic book information would be repeated for every copy. The solution is to treat a book as an abstract entity and a copy as a separate, concrete entity. Such a design is shown on the next overhead.</p> <p>You may have noticed that although the ISBN should be a unique identifier for a book (not a copy), it is not used as a primary key in these designs. The problem is that books before a certain date did not have ISBNs. Also, you may have hand-crafted modern books that weren't commercially published and don't have ISBNs.</p> <p>The Employment History Example</p> <p>This example starts out simply enough. A given company has divisions. The divisions have departments. Departments have employees. This is shown in the ER diagram on the next overhead.</p> <p>Next, the author observes that over time a given employee may hold different positions. These positions may be in different departments. Like 
marriages, the distinguishing features of positions may include a beginning and ending date. This is shown in the ER diagram on the next overhead.</p> <p>Next the author introduces the concept of a payslip into the record-keeping that the model includes. It's not fully fleshed out in the next example, but when you look at the diagram you may have an inkling that the treatment of payslips is reminiscent of the treatment of line items. This is shown in the ER diagram on the next overhead.</p> <p>The final design treats payslips exactly like the line item example. A payslip is like a bill of sale. Pay slip text is like an item. The table in the middle, PaySlipLine, is like LineItem. The pk of PaySlipLine is the concatenation of the pk of Payslip embedded as a fk, plus a pay slip number (payslipno). The pk of PslText is embedded separately as a fk. This is shown in the ER diagram on the next overhead.</p> <p>The Aircraft Leasing Example</p> <p>In the previous set of overheads the first design containing a cycle cropped up. This example also contains a cycle. There are three base tables and three tables in the middle. Each of the base tables is in a many-to-many relationship with each of the others. Overall, the tables are in a many-to-many-to-many relationship. This is shown in the ER diagram on the next overhead.</p> <p>How to properly model a situation becomes an important question in the next chapter, on normalization. In the meantime, the following observation can be made: An aircraft lease is an abstract entity that seems to be part of the business problem. However, it doesn't appear in the design. This isn't just a problem in a theoretical sense.</p> <p>First of all, it's clear that in order to get complete information about a lease from this design a 6-way join would be needed. That's inconvenient. Also, leases themselves may have attributes like starting and ending dates. There is no place to record them. An improved, star-like design for the problem is shown on the next overhead.</p> <p>The Project 
Management Example</p> <p>This example addresses the question of where something could or should be modeled. It impinges on the question of how the model has to be changed to capture a more detailed business situation. The first model is given on the next overhead. It should be relatively self-explanatory.</p> <p>Now consider the altered model on the following overhead. The planned hours attribute has been moved from the Activity entity to the Daily Work entity. This small change in location of a field has a clear and logical outcome. The planning of project hours is done on a daily basis, not an activity basis.</p> <p>Cardinality and Modality</p> <p>Cardinality refers to the count of the number of instances of entities in a relationship. Modality is a fancy way of saying that there can be 0 entities in a relationship. In other words, one end of a relationship is optional. This condition obtains, for example, when a pk in one table has no fk entries in another. It also obtains when a fk value is null. The book gives the table shown on the next overhead summarizing cardinality and modality.</p> <p>The author now enhances the notation for ER diagrams. Unlike in UML, it is not customary to mark actual digits at the ends of crow's feet. Instead, a short vertical bar marks the end of a relationship where an instance of an entity is mandatory. An "o" marks the end of a relationship where an instance of an entity is optional.</p> <p>The Nation and Stock Example</p> <p>The following diagram of the 1-m relationship between nations and stocks illustrates this new notation. Nation has a bar. Stock has an "o". Every stock has to have a nation. The nation code field in the Stock table can't be null. A nation doesn't have to have a stock. There can be nation code values in the Nation table where no such nation code appears in the Stock table.</p> <p>The Sale, Item, and Lineitem Example</p> <p>The following diagram of the m-n relationship between sales and items also illustrates this new notation. A sale has to have at least one line item. A line item has 
to belong to a sale. A line item has to have an item. An item doesn't have to be part of a sale.</p> <p>These verbal statements can be translated into null/not null and existence/non-existence requirements for fields and rows in tables. The new thing illustrated by this example is that if you have a row for a sale in the Sale table, the ER diagram now states that it has to have a corresponding record in the Lineitem table. This is not something that can be enforced by the database using referential integrity, for example. It is a new kind of data integrity constraint.</p> <p>The Department and Employee Example</p> <p>The following diagram of the 1-m and 1-1 relationships between departments and employees also illustrates this new notation. An employee has to have a department. A department doesn't have to have employees. A department has to have a boss. An employee doesn't have to be the boss of a department.</p> <p>It is worth paying close attention to the 1-1 relationship. It looks a little odd to have a line with no crow's foot with a bar at one end and an "o" at the other. Recall that in order to reduce the number of nulls, the 1-1 relationship was captured by embedding the pk of Employee as a fk in Department. The notation means that the fk can't be null. It also means that not every pk of Employee has to appear as a fk value.</p> <p>Recall that when presented earlier, the Employee-Department diagram grew to include the (recursive) relationship telling which employee was which other employee's boss. In the following diagram, this line has o's at both ends. This means it's possible to have employees who are not bosses. It also means that the embedded fk field can be null. In other words, there can be employees who don't have bosses.</p> <p>The Monarch Example</p> <p>The monarch succession relationship can also be marked for modality. The first monarch would have no predecessor. The current monarch would have no successor (yet). Both ends of the relationship are optional. This is shown in the ER diagram on the 
following overhead.</p> <p>The Product Assembly Example</p> <p>Modality can also be added to the product-assembly example. If there is an assembly entry, it has to have a super-product. Likewise, if there is an assembly entry, it has to have a sub-product. On the other hand, there can be products that are neither super-products nor sub-products. It is interesting to note that in this situation the vertical bars repeat information that can be inferred from the rest of the diagram. The + signs on the crow's feet mean that the embedded foreign keys are also primary keys. As primary keys, they can't be null. As foreign keys, referential integrity states that their values have to occur in the corresponding primary key table. Therefore, the corresponding super-product or sub-product entry has to exist. It is mandatory.</p> <p>Entity Types</p> <p>The author categorizes entities into the following types: independent, weak or dependent, associative, aggregate, and subordinate.</p> <p>Independent entities</p> <p>The following ER diagram shows two independent entities. Instances of each can exist regardless of the existence of matching instances of the other. Although a pk is embedded as a fk, the pk may have no matches and the fk may be null. Independent entities are usually the easiest base tables to recognize in a problem domain.</p> <p>Weak or Dependent...</p>
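<p>Looking back at the modality discussion for the Nation and Stock and the Sale and Lineitem examples, the null/not null and existence requirements can be sketched roughly as follows. This is only an illustration using Python's sqlite3 module, not the book's own code. The table and column names (nation, stock, sale, lineitem, natcode, stkcode, saleno, lineno) and the helper functions are assumptions made up for the sketch. A mandatory end becomes a NOT NULL foreign key, an optional end needs no constraint at all, and the "a sale has at least one line item" rule is only checked by a query, since referential integrity alone does not enforce it.</p>

```python
import sqlite3


def build_demo():
    """Create a small in-memory schema (hypothetical names, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves fk checks off by default
    conn.executescript("""
        CREATE TABLE nation (natcode TEXT PRIMARY KEY, natname TEXT);
        -- Bar at the Nation end: every stock must have a nation (fk NOT NULL).
        -- "o" at the Stock end: a nation may have no stocks (no constraint needed).
        CREATE TABLE stock (
            stkcode TEXT PRIMARY KEY,
            natcode TEXT NOT NULL REFERENCES nation(natcode)
        );
        CREATE TABLE sale (saleno INTEGER PRIMARY KEY);
        -- A line item must belong to a sale; the fk is part of the pk.
        CREATE TABLE lineitem (
            lineno INTEGER,
            saleno INTEGER NOT NULL REFERENCES sale(saleno),
            PRIMARY KEY (lineno, saleno)
        );
    """)
    return conn


def try_insert(conn, sql):
    """Return True if the insert is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


def sales_without_lines(conn):
    """Find sales that break the 'at least one line item' rule.

    Referential integrity cannot express this rule, so it is checked by a query.
    """
    rows = conn.execute("""
        SELECT saleno FROM sale
        WHERE NOT EXISTS (
            SELECT 1 FROM lineitem WHERE lineitem.saleno = sale.saleno
        )
        ORDER BY saleno
    """).fetchall()
    return [row[0] for row in rows]
```

<p>In this sketch, a stock row with a null or unknown nation code is rejected by the database, a nation with no stocks is accepted, and a sale inserted without any line items is accepted but shows up in sales_without_lines until a line item is added.</p>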