Database_Design_-_A_Practical_Guide


    arch 3: Database Design, a Practical Guide

    Gus Björklund

    Progress Exchange 2007

    10-13 June, Phoenix, AZ, USA

    © 2007 Progress Software Corporation

    Database Design, a Practical Guide

    Gus Björklund, Wizard, Progress Software Corporation

    Brandon Gibbs, Manager, US-East, Solution Engineer

    Progress Software Corporation

    Rules are made to be broken

    To every rule, there is an exception!

    If you thought this talk was going to be about indexing

    It isn't. Nor is it about performance.

    Topics

    Theory:

        What is Database Design

        Basic Elements

        Representing the Model as Tables

    Practice:

        An Example

        Some Other Topics

    First, a little theory


    What do we mean by database design?

    A process for defining a model of a subset of the real [1] world, then representing it as data in tables in a relational database

    At least, that's the definition we will use for the purposes of this talk.

    [1] Well, for small values of real, anyway.

    Basic Elements

    What do we put in our model?

    Just 3 things:

        Entities

        Attributes

        Relationships

    The entity-relationship model was described by Peter Chen in 1976.

    See http://bit.csc.lsu.edu/~chen/chen.html

    Basic Elements: Entities

    Can be thought of as nouns

        People

            author, composer, performer, seller, buyer

        Places

            home, IP address, URL, destination, factory, store

        Things

            song, recording, instrument, car, invoice

    Is telephone number a place or a thing?

    Basic Elements: Attributes

    Entities have attributes

    Can be thought of as adjectives (but only loosely):

        Length

        Color

        Horsepower

        Part number

        Song Title

        Publication Date

        Size

        Fabric

        Owner

    Is telephone number an attribute or an entity?

    Basic Elements: Relationships

    Entities are connected by relationships

    Can be thought of as verbs:

        has a

        owns

        contains

        supervises

        performs

        called

        sold

        purchased

        proved

    Is telephone number a relationship?

    Relationships have attributes too

    In May, 1995, Andrew Wiles published a proof of Fermat's Last Theorem

        Andrew Wiles: entity

        a proof of Fermat's Last Theorem: entity

        published: relationship

        In May, 1995: attribute

    What goes in an entity

    Identifying attributes

        Must be able to uniquely identify the entity

        Can have more than one way to id

        Id can be composite

    Descriptive attributes

        the values you need to keep track of

        generally should be simple, not complex

    What to include in your model

    The things your application has to keep track of

        Telephones, wires, switches

    The actions your application or its users perform

        Make calls, send telephone bills, collect payments

    Some attributes of the things and actions

        Originating number, date and time of call, duration, called number

    Keep it simple

    Be accurate

    Keep it up to date

    What to include in your model

    Consider the goals of the system

    Everything you include should be there for a reason you can state in no more than two sentences

    Everything should have a clear name

        if you can't name it, it doesn't belong

    Talk to the stakeholders!!!

    What to leave out of your model

    The real world has properties that don't matter (to your application)

    The real world has relationships that don't matter

    Things happen in the real world that don't matter

    Keep it simple

    If you can't say why you need it, leave it out

    Logical vs Physical Data Models

    Logical entities often require multiple tables to represent them

    Tables can be thought of as logical or physical

        It depends on your point of view

    There is also the physical storage database layout

        storage areas

        data extents

        disks

        etc.

    We aren't going to talk about the physical database layout

    We will talk about tables

    Mapping Your Model to a Database

    Simply put,

    Entities become tables

        Identifiers become indexes

    Attributes become columns

        Data types: pick appropriate ones

    Relationships become tables or foreign keys
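A minimal sketch of this mapping, using Python's built-in sqlite3 module rather than a Progress database. The table and column names (manufacturer, disc, disc_upc and so on) are illustrative assumptions, and treating upc as the identifier of a disc title is a simplification that the example later in the talk calls into question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
    -- Entity -> table; attributes -> typed columns.
    CREATE TABLE manufacturer (
        manufacturer_id INTEGER PRIMARY KEY,
        name            TEXT NOT NULL
    );
    CREATE TABLE disc (
        disc_id         INTEGER PRIMARY KEY,
        upc             TEXT NOT NULL,
        title           TEXT NOT NULL,
        -- Relationship "disc is made by manufacturer" -> foreign key.
        manufacturer_id INTEGER NOT NULL REFERENCES manufacturer(manufacturer_id)
    );
    -- Identifier -> unique index (assumes one row per disc title).
    CREATE UNIQUE INDEX disc_upc ON disc (upc);
""")
conn.execute("INSERT INTO manufacturer VALUES (1, 'Sony BMG')")
conn.execute("INSERT INTO disc VALUES (1, '8697-07416-2', 'The Essential Joshua Bell', 1)")


def try_insert(row):
    """Return True if the disc row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO disc VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Same upc again: rejected by the unique index on the identifier.
duplicate_upc = try_insert((2, "8697-07416-2", "Another title", 1))
# Manufacturer 99 does not exist: rejected by the foreign key.
unknown_maker = try_insert((3, "314-510347-2", "Some disc", 99))
```

Both inserts at the end are expected to be refused, which is the point: the identifier and the relationship are enforced by the database, not by application code.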

    In theory, there is no difference between theory and practice, but in practice there is.

    Jan van de Snepscheut

    Now for some practice.


    An example

    Music store

        Buys compact disc recordings from distributors

        Has inventory

        Allows customers to search for what they want

            Maybe in an in-store kiosk or on the web

        Sells compact discs to customers

    What should we do first?


    Activities

    We buy discs from a distributor

    Orders are sent to a distributor

    Orders are delivered to the store

    Orders may be cancelled

    We sell discs to customers in sales transactions

    Customers buy discs in sales transactions

    Customers search for what they want to buy

    Which of these must be remembered by the system?


    What do we need to keep track of?

    Discs we have

    Discs we sold

    Discs we know about and can get

    Discs we have ordered

    Information needed to do our income tax

        what we paid for stock

        when we bought it

        what we sold it for

        when we sold it

    Disc entities

    UPC Code: 8697-07416-2

    Manufacturer: Sony BMG

    Cost to us: $ 2.00

    Price charged: $ 17.95

    Tax charged: $ 0.80

    Date purchased: March 19, 2007

    Date sold: June 9, 2007

    Disc table might look like this

    upc            manuf            cost   price   tax    datePurch    dateSold
    8697-07416-2   Sony BMG         2.00   17.95   0.90   2007-03-19   2007-06-09
    8697-07416-2   Sony BMG         2.00   ?       ?      2007-06-09   ?
    314-510347-2   Island Records   2.21   15.95   0.80   2006-01-12   2007-02-14
    314-510347-2   Island Records   2.21   ?       ?      2006-01-12   ?

    What's wrong?

    Is upc a unique identifier?

    Might have bought from a distributor

    Have no information about what is on the disc

        How do customers search?

    Don't know when disc was made

    Could be more than one tax jurisdiction

        provincial tax, city tax

    Don't know if disc is on order

    Don't know who bought it

    Duplicated data

    Etc., etc.

    Disc entities take 2

    UPC Code: 8697-07416-2

    Manufacturer: Sony BMG

    Distributor: Bob's Wholesale CDs

    Cost to us: $ 2.00

    Price charged: $ 17.95

    Tax charged: $ 0.80

    Date ordered: March 19, 2007

    Date received: March 20, 2007

    Date sold: June 9, 2007

    Disc Title: The Essential Joshua Bell

    Artist: Joshua Bell

    Track 1: Danse Russe

    Track 2: Violin Concerto in E Minor

    Track 3: Nocturne in C-sharp Minor

    etc.

    Example: Now What's wrong?

    This is getting messy

    Activities combined with disc attributes

    Have duplicated information

    How many tracks can there be?

    What if there is more than one artist?

    Don't have all the information a customer might want to use to search

    Discs revisited

    Discs have titles

    Discs have pictures on the cover

    Discs contain tracks

    Discs are made by manufacturers

    Discs are purchased from distributors

    Discs are ordered from distributors

    Discs are delivered to the store

    Discs are sold to customers

    Discs contain tracks

    Tracks contain songs

    Tracks occur in order

    Tracks have a duration

    Songs are performed in performances

    Songs have performers (usually)

    Songs have composers

    Songs have names (titles)

    Songs have a key (but not always)

    Performances are done by performers

    Performers can be groups (bands, orchestras, etc.)

    Performances are performed in a location or venue


    We seem to need these entities

    Discs

    Manufacturers

    Distributors

    Orders

    Customers

    Inventory

    Tracks

    Songs

    Performers

    Groups?

    Songs have names (titles).

    Are names properties of songs?

    Or are they entities related to songs?

    Or are they something else?


    Song data (track 1)

    Title: Danse Russe from Swan Lake, Op. 20

    Time: 4:30

    Composer: Peter Tchaikovsky

    Category: Classical, violin, orchestra

    Performers: Joshua Bell, Michael Tilson Thomas, Berlin Philharmonic Orchestra

    Track number: 1

    Disc upc: 8697-07416-2

    Song data (track 2)

    Title: Violin Concerto in E Minor, Op. 64

    Time: 6:27

    Composer: Felix Mendelssohn

    Category: Classical, violin, orchestra

    Performers: Joshua Bell, Sir Roger Norrington, Camerata Salzburg

    Track number: 2

    Disc upc: 8697-07416-2

    Performance data

    Title: Violin Concerto in E Minor, Op. 64

    Time: 6:27

    Composer: Felix Mendelssohn

    Category: Classical, violin, orchestra

    Performers: Joshua Bell, Sir Roger Norrington, Camerata Salzburg

    Performance data take 2

    Title: Violin Concerto in E Minor, Op. 64

    Time: 6:27

    Composer: Felix Mendelssohn

    Category: Classical, violin, orchestra

    Performers: Joshua Bell, Sir Roger Norrington, Camerata Salzburg

    Performance Date: ?

    Performance Location: ?

    Performer data

    id   name
    1    Joshua Bell
    2    Sir Roger Norrington
    3    Camerata Salzburg
    4    Michael Tilson Thomas
    5    Berlin Philharmonic
    6    Bono
    7    The Edge
    8    Adam Clayton
    9    Larry Mullen

    Performance to Performer Relationship

    performance id   performer id
    1                1
    1                2
    1                3
    2                1
    2                4
    2                5
    325              6
    325              7
    325              8
    325              9
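The relationship table above can be tried out directly. The following is a sketch using Python's sqlite3 module; the table names performer and performance_performer and the two helper functions are assumptions made for illustration, while the rows are the ones shown on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE performer (
        performer_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL
    );
    -- Many-to-many relationship: one row per (performance, performer) pair.
    CREATE TABLE performance_performer (
        performance_id INTEGER NOT NULL,
        performer_id   INTEGER NOT NULL REFERENCES performer(performer_id),
        PRIMARY KEY (performance_id, performer_id)
    );
""")
conn.executemany("INSERT INTO performer VALUES (?, ?)", [
    (1, "Joshua Bell"), (2, "Sir Roger Norrington"), (3, "Camerata Salzburg"),
    (4, "Michael Tilson Thomas"), (5, "Berlin Philharmonic"),
    (6, "Bono"), (7, "The Edge"), (8, "Adam Clayton"), (9, "Larry Mullen"),
])
conn.executemany("INSERT INTO performance_performer VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (2, 5),
    (325, 6), (325, 7), (325, 8), (325, 9),
])


def performers_of(performance_id):
    """Names of everyone who took part in one performance."""
    rows = conn.execute(
        """SELECT p.name
           FROM performance_performer AS pp
           JOIN performer AS p ON p.performer_id = pp.performer_id
           WHERE pp.performance_id = ?
           ORDER BY p.performer_id""",
        (performance_id,))
    return [name for (name,) in rows]


def performances_of(performer_id):
    """Ids of every performance one performer took part in."""
    rows = conn.execute(
        """SELECT performance_id FROM performance_performer
           WHERE performer_id = ? ORDER BY performance_id""",
        (performer_id,))
    return [pid for (pid,) in rows]
```

The same table answers the question in both directions: performers_of(325) lists a band, and performances_of(1) lists everything one performer appears in. Neither side needs a repeated or multi-valued column.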

    Performance data take 3

    Performance id: 2

    Title: Violin Concerto in E Minor, Op. 64

    Time: 6:27

    Composer: Felix Mendelssohn

    Category: Classical, violin, orchestra

    Track to Performance Relationship

    Disc upc       Track Num   Performance id
    8697-07416-2   1           1
    8697-07416-2   2           2
    314-510347-2   1           325
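As a sketch of how this table gets used, the query below rebuilds a disc's track listing by joining tracks to performances, again with Python's sqlite3 module. The table and column names are assumptions. Pairing performance 1 with the Danse Russe title is inferred from the track 1 data shown earlier. Performance 325 is left out because the slides give no title for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE performance (
        performance_id INTEGER PRIMARY KEY,
        title          TEXT NOT NULL
    );
    -- A track is identified by the disc it is on plus its position.
    CREATE TABLE track (
        disc_upc       TEXT    NOT NULL,
        track_num      INTEGER NOT NULL,
        performance_id INTEGER NOT NULL REFERENCES performance(performance_id),
        PRIMARY KEY (disc_upc, track_num)
    );
""")
conn.executemany("INSERT INTO performance VALUES (?, ?)", [
    (1, "Danse Russe from Swan Lake, Op. 20"),
    (2, "Violin Concerto in E Minor, Op. 64"),
])
conn.executemany("INSERT INTO track VALUES (?, ?, ?)", [
    ("8697-07416-2", 1, 1),
    ("8697-07416-2", 2, 2),
])


def track_listing(disc_upc):
    """Return (track number, title) pairs for one disc, in playing order."""
    rows = conn.execute(
        """SELECT t.track_num, p.title
           FROM track AS t
           JOIN performance AS p ON p.performance_id = t.performance_id
           WHERE t.disc_upc = ?
           ORDER BY t.track_num""",
        (disc_upc,))
    return rows.fetchall()
```

The title is stored once, on the performance, and the track row only points at it, so the same performance could appear on a second disc without copying its data.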


    Relationships (so far):

    [Diagram: disc, tracks, performances and performers, connected by relationships labelled one to one, one to many and many to many]

    What happened to Songs?


    Relationships (take 2):

    [Diagram: disc, tracks, song, performances and performers, connected by relationships labelled one to one, one to many and many to many; song and performances are joined one to many]

    Relationships (take 3):

    [Diagram: disc, tracks, performances, performers and songs]

    What about business entities?

    Where are they?

    Business entities

    [Diagram: disc, tracks, performances, performers and songs]

    Should you use arrays?


    Indexes

    Enforce uniqueness

    Make searches faster

    Enable fast retrieval of entities by their identities

    Enable finding entities with certain attributes

    What indexes do we need for the music store database?

    Tables

    0) Discs

    1) Tracks

    2) Songs

    3) Performers

    4) Performances

    5) Tracks of discs

    6) Performances of songs

    7) Performers of performances

    What indexes do we need?

    0) Indexes for identifying attributes

    1) A unique row identifier

    2) Indexes for the queries you will do
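The three kinds of index can be sketched on a songs table, using Python's sqlite3 module. Treating (title, composer) as the identifying attributes of a song is an assumption made only for this illustration, as are the index names and the helper functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE song (
        song_id  INTEGER PRIMARY KEY,   -- 1) a unique row identifier
        title    TEXT NOT NULL,
        composer TEXT NOT NULL
    );
    -- 0) Index on the identifying attributes (assumed: title + composer).
    CREATE UNIQUE INDEX song_identity ON song (title, composer);
    -- 2) Index for a query we expect to run: search by composer.
    CREATE INDEX song_composer ON song (composer);
""")
conn.execute("INSERT INTO song (title, composer) VALUES (?, ?)",
             ("Violin Concerto in E Minor, Op. 64", "Felix Mendelssohn"))


def is_rejected(title, composer):
    """True if inserting this song violates a uniqueness constraint."""
    try:
        conn.execute("INSERT INTO song (title, composer) VALUES (?, ?)",
                     (title, composer))
        return False
    except sqlite3.IntegrityError:
        return True


def query_plan(sql, params=()):
    """Ask SQLite how it would run a query; the last column is the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)


duplicate_rejected = is_rejected("Violin Concerto in E Minor, Op. 64",
                                 "Felix Mendelssohn")
plan = query_plan("SELECT title FROM song WHERE composer = ?",
                  ("Felix Mendelssohn",))
```

The plan text names the index SQLite chose, which is a cheap way to check that an index you created for a query is actually the one being used.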

    What should we do next?

    Other Topics

    Normalization

    Unique keys

    Word indexes

    Naming

    Customisation


    Normalization

    Oversimplified, it means:

        Don't duplicate data

        Attributes should

            be simple

            have only one value

            be necessary

            not be derived data

            not repeat

        Complicated attributes are often entities in their own right

            For example, addresses might be

    Unique keys

    EVERY table must have a unique key

    EVERY row needs a unique identifier that never changes

        even if moved to another database (i.e. if you replicate)

    Often, users don't need to see it

    Use a UUID or sequence or maybe datetime

    Unique key is the ONLY way to identify rows unambiguously

        ROWIDs are temporary and can change

    Use the same method throughout

        You'll be glad you did
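A small sketch of the UUID option, using Python's uuid and sqlite3 modules. The two in-memory databases stand in for two sites that are later merged. The names store_east, store_west, new_key and customer_key are made up for the illustration.

```python
import sqlite3
import uuid


def new_key():
    """Generate a key without asking any database for the next number."""
    return str(uuid.uuid4())


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (customer_key TEXT PRIMARY KEY, name TEXT NOT NULL)")
    return conn


# Two independent databases, each creating rows on its own.
store_east = make_db()
store_west = make_db()
store_east.execute("INSERT INTO customer VALUES (?, ?)", (new_key(), "Bob"))
store_west.execute("INSERT INTO customer VALUES (?, ?)", (new_key(), "Alice"))

# Replicate west into east: the rows keep the keys they were created with.
west_rows = store_west.execute("SELECT customer_key, name FROM customer").fetchall()
store_east.executemany("INSERT INTO customer VALUES (?, ?)", west_rows)

merged = store_east.execute(
    "SELECT customer_key, name FROM customer ORDER BY name").fetchall()
```

With a per-database sequence, both sites could have issued key 1 and the merge would need renumbering. With keys generated this way a collision is extremely unlikely, so rows keep their identity when they move.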

    Word indexes

    Can be used to hold multiple status or attribute values

        Conflicts with normalisation

        Flexible

        Easy to add new ones

        Queries are fast

    Example:

        Category: classical, violin, orchestral, concerto
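To show the idea behind a word index, here is a toy version in plain Python: a map from each word to the set of rows containing it. It is a sketch of the concept only, not a description of how the database implements word indexes. The row ids and the third category string are made up for the illustration.

```python
import re
from collections import defaultdict


def build_word_index(rows):
    """Map each word to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(row_id)
    return index


def contains_all(index, *words):
    """Row ids whose text contains every one of the given words."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()


# Category column of three hypothetical performance rows.
categories = {
    1: "classical, violin, orchestral",
    2: "classical, violin, orchestral, concerto",
    3: "rock, guitar",
}
index = build_word_index(categories)

violin_rows = contains_all(index, "violin")
violin_concertos = contains_all(index, "violin", "concerto")
```

Adding a new category value needs no schema change, which is the flexibility the slide mentions. The cost is that the column holds several values at once, which is why it conflicts with normalisation.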


    Naming

    What is in the column GL01262?

    Table and column names should have clear meanings everyone can understand

        GL01262 vs dateEntered

    Names with dashes cause inconvenience with SQL

        order-date

    Booleans should be named for truth value

        backOrdered

    No double negations

        notOutOfStock

    Good names are crucial to understanding

    Making tables customizable

    We will look at 3 ways:

        Spare columns

        Separate table with spare columns

        Separate table with name/value pairs

    Spare columns in table

    custnum   name    city      extra1   extra2   extra3
    001       Bob     Phoenix   frozen   ?        0.0
    002       Alice   Boston    ?        125.46   0.12
    003       Eve     Denver    ?        ?        ?

    What data types should you use?

    How many spare columns?

    Wasted columns when not used

    How do you know what each spare got used for?

    How do you know how many unused spares you have?

    Separate table for spare columns

    custnum   name    city
    001       Bob     Phoenix
    002       Alice   Boston
    003       Eve     Denver

    custnum   extra1   extra2   extra3
    001       frozen   ?        0.0
    002       ?        125.46   0.12

    Separate table for spare columns

    custnum   name    city
    001       Bob     Phoenix
    002       Alice   Boston
    003       Eve     Denver

    custnum   status   owed     discount
    001       frozen   ?        0.0
    002       ?        125.46   0.12

    Separate table with name/value pairs

    custnum   name    city
    001       Bob     Phoenix
    002       Alice   Boston
    003       Eve     Denver

    custnum   name       value
    001       status     frozen
    002       owed       125.46
    002       discount   0.12
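The name/value layout can be sketched with Python's sqlite3 module. The table name customer_extra and the helper extras are assumptions. Storing every value as text is one possible choice, made here to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        custnum TEXT PRIMARY KEY,
        name    TEXT NOT NULL,
        city    TEXT NOT NULL
    );
    -- One row per customised attribute; new attributes need no schema change.
    CREATE TABLE customer_extra (
        custnum TEXT NOT NULL REFERENCES customer(custnum),
        name    TEXT NOT NULL,
        value   TEXT NOT NULL,          -- every value is stored as text
        PRIMARY KEY (custnum, name)
    );
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("001", "Bob", "Phoenix"), ("002", "Alice", "Boston"), ("003", "Eve", "Denver"),
])
conn.executemany("INSERT INTO customer_extra VALUES (?, ?, ?)", [
    ("001", "status", "frozen"),
    ("002", "owed", "125.46"),
    ("002", "discount", "0.12"),
])


def extras(custnum):
    """All customised attributes of one customer as a name -> value dict."""
    rows = conn.execute(
        "SELECT name, value FROM customer_extra WHERE custnum = ?", (custnum,))
    return dict(rows.fetchall())


alice_extras = extras("002")
owed = float(alice_extras["owed"])  # the caller has to know the intended type
```

Compared with spare columns, nothing is wasted for customers like Eve who have no extras, and each value carries its own name. The trade-off visible in the last line is that the database no longer knows the data type of a value.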

    Modeling Tools

    PCase

    Enterprise Architect

    Power Designer

    ConceptDraw

    Erwin

    Rational

    Pencil and paper!

    Blackboard!

    Summary

    Understand the requirements

    Leave out what is not needed

    Review the design with stakeholders

    Evolve the design as changes come up

    Test to make sure it works

    Can it do everything that is needed?

    Does it perform adequately?

    Expect changes to come

    Homework

    Papers

        Wiles, A.: "Modular elliptic curves and Fermat's Last Theorem", Annals of Mathematics 141 (3): 443-551

        Chen, P.: "The Entity-Relationship Model -- Toward a Unified View of Data", ACM TODS, Vol 1, No 1, 1976

    Wikipedia articles to start from:

        entity-relationship model

        data model

    Books:

        Teorey, Lightstone, Nadeau: Database Modeling and Design, Morgan Kaufmann.

    Questions
