ARCH-3: Database Design, a Practical Guide

Gus Björklund, Wizard, Progress Software Corporation

© 2007 Progress Software Corporation2 ARCH-3: Database Design A Practical Guide

Ask questions as we go if I am not being clear.

Warning: there is a mistake in these slides.

Rules are made to be broken

To every rule, there is an exception!

If you thought this talk was going to be about indexing …

It isn’t. Nor is it about performance.

Topics

Theory:
• What is Database Design
• Basic Elements
• Representing the Model as Tables

Practice:
• An Example

Some Other Topics

First, a little theory

What do we mean by database design?

A process for defining a model of a subset of the “real” [1] world, then representing it as data in tables in a relational database.

At least, that’s the definition we will use for the purposes of this talk.

[1] Well, for small values of real, anyway.

Basic Elements

What do we put in our model?

Just 3 things:
• Entities
• Attributes
• Relationships

The “entity-relationship model” was described by Peter Chen in 1976.

See http://bit.csc.lsu.edu/~chen/chen.html

Basic Elements: Entities

Can be thought of as nouns:
• People
  – author, composer, performer, seller, buyer
• Places
  – home, IP address, URL, destination, factory, store
• Things
  – song, recording, instrument, car, invoice

Is “telephone number” a place or a thing?

Basic Elements: Attributes

Entities have attributes

Can be thought of as adjectives (but only loosely):
• Length
• Color
• Horsepower
• Part number
• Song Title
• Publication Date
• Size
• Fabric
• Owner

Is “telephone number” an attribute or an entity?

Basic Elements: Relationships

Entities are connected by relationships

Can be thought of as verbs:
• has a
• owns
• contains
• supervises
• performs
• called
• sold
• purchased
• proved

Is “telephone number” a relationship?

Relationships have attributes too

In May, 1995, Andrew Wiles published a proof of Fermat’s Last Theorem

• entity: Andrew Wiles
• relationship: published
• entity: a proof of Fermat’s Last Theorem
• attribute: In May, 1995
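One way to picture this is a small Python sketch in which the relationship is a record of its own, so it has somewhere to keep its date. The class and field names here are illustrative assumptions, not part of the talk, and the day of the month is a placeholder because the slide gives only the month.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:            # entity
    name: str


@dataclass(frozen=True)
class Proof:             # entity
    theorem: str


@dataclass(frozen=True)
class Published:         # relationship between a Person and a Proof
    author: Person
    work: Proof
    when: date           # attribute that belongs to the relationship itself


wiles = Person("Andrew Wiles")
proof = Proof("Fermat's Last Theorem")
# The day (1) is a placeholder; only "May 1995" comes from the slide.
event = Published(author=wiles, work=proof, when=date(1995, 5, 1))
```

The date describes neither the person nor the proof alone, which is why it sits on the relationship.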

What goes in an entity

Identifying attributes
• Must be able to uniquely identify the entity
• Can have more than one way to identify it
• Id can be composite

Descriptive attributes
• the values you need to keep track of
• generally should be simple, not complex

What to include in your model

The things your application has to keep track of
• Telephones, wires, switches

The actions your application or its users perform
• Make calls, send telephone bills, collect payments

Some attributes of the things and actions
• Originating number, date and time of call, duration, called number

Keep it simple
Be accurate
Keep it up to date

What to include in your model

Consider the goals of the system
Everything you include should be there for a reason you can state
• in no more than two sentences
Everything should have a clear name
• if you can’t name it, it doesn’t belong
Talk to the stakeholders!

What to leave out of your model

The real world has properties that don’t matter (to your application)

The real world has relationships that don’t matter

Things happen in the real world that don’t matter

Keep it simple
• If you can’t say why you need it, leave it out

Logical vs Physical Data Models

Logical entities often require multiple tables to represent them
• Tables can be thought of as logical or physical
• It depends on your point of view

There is also the physical storage database layout
• storage areas
• data extents
• disks
• etc.

We aren’t going to talk about the physical database layout

We will talk about tables

Mapping Your Model to a Database

Simply put,

Entities become tables
• Identifiers become indexes

Attributes become columns
• Data types: pick appropriate ones

Relationships become tables or foreign keys
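A minimal sketch of that mapping, using Python's built-in sqlite3 module rather than a Progress database. The customer and phone tables and their columns are assumptions made up for illustration; the point is only which model element ends up where.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE customer (               -- entity -> table
    custnum INTEGER PRIMARY KEY,      -- identifier -> unique index
    name    TEXT NOT NULL,            -- attribute -> column with a suitable type
    city    TEXT
);
CREATE TABLE phone (                  -- another entity
    number  TEXT PRIMARY KEY,
    custnum INTEGER NOT NULL
            REFERENCES customer(custnum)  -- "customer has a phone" -> foreign key
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Bob', 'Phoenix')")
conn.execute("INSERT INTO phone VALUES ('555-0100', 1)")

# A phone that points at a customer who does not exist should be refused.
try:
    conn.execute("INSERT INTO phone VALUES ('555-0199', 42)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

phone_count = conn.execute("SELECT COUNT(*) FROM phone").fetchone()[0]
```

A one-to-many relationship fits in a foreign key column like this; a many-to-many relationship needs a table of its own.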

“In theory, there is no difference between theory and practice, but in practice there is.”

Jan van de Snepscheut

Now for some practice.

An example

Music store
• Buys compact disc recordings from distributors
• Has inventory
• Allows customers to search for what they want
  – Maybe in an in-store kiosk or on the web
• Sells compact discs to customers

What should we do first?

Activities

We buy discs from a distributor
Orders are sent to a distributor
Orders are delivered to the store
Orders may be cancelled
We sell discs to customers in sales transactions
Customers buy discs in sales transactions
Customers search for what they want to buy

Which of these must be remembered by the system?

What do we need to keep track of

Discs we have
Discs we sold
Discs we know about and can get
Discs we have ordered
Information needed to do our income tax
• what we paid for stock
• when we bought it
• what we sold it for
• when we sold it

Disc entities

UPC Code: 8697-07416-2
Manufacturer: Sony BMG
Cost to us: $ 2.00
Price charged: $ 17.95
Tax charged: $ 0.80
Date purchased: March 19, 2007
Date sold: June 9, 2007

Disc table might look like this

upc           manuf           cost  price  tax   datePurch   dateSold
8697-07416-2  Sony BMG        2.00  17.95  0.90  2007-03-19  2007-06-09
8697-07416-2  Sony BMG        2.00  ?      ?     2007-06-09  ?
314-510347-2  Island Records  2.21  15.95  0.80  2006-01-12  2007-02-14
314-510347-2  Island Records  2.21  ?      ?     2006-01-12

What’s wrong?

Is upc a unique identifier?
Might have bought from a distributor
Have no information about what is on the disc
• How do customers search?
Don’t know when disc was made
Could be more than one tax jurisdiction
• provincial tax, city tax
Don’t know if disc is on order
Don’t know who bought it
Duplicated data
Etc., etc.
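The first question can be checked mechanically. The sketch below (Python with sqlite3, an assumption; the talk itself shows no code) declares upc as the key of a cut-down disc table and then tries to load two copies of the same title, taken from the first two rows of the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE disc (
        upc       TEXT PRIMARY KEY,   -- the assumption under test
        manuf     TEXT,
        cost      REAL,
        datePurch TEXT
    )
""")

rows = [
    ("8697-07416-2", "Sony BMG", 2.00, "2007-03-19"),
    ("8697-07416-2", "Sony BMG", 2.00, "2007-06-09"),  # second copy, same title
]

try:
    conn.executemany("INSERT INTO disc VALUES (?, ?, ?, ?)", rows)
    upc_is_unique = True
except sqlite3.IntegrityError:
    # The second copy collides with the first: upc names a title, not a physical disc.
    upc_is_unique = False
```

Here upc_is_unique ends up False, which suggests that the title and the individual copy in stock are two different entities.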

Disc entities take 2

UPC Code: 8697-07416-2
Manufacturer: Sony BMG
Distributor: Bob’s Wholesale CD’s
Cost to us: $ 2.00
Price charged: $ 17.95
Tax charged: $ 0.80
Date ordered: March 19, 2007
Date received: March 20, 2007
Date sold: June 9, 2007
Disc Title: “The Essential Joshua Bell”
Artist: Joshua Bell
Track 1: “Danse Russe”
Track 2: “Violin Concerto in E Minor”
Track 3: “Nocturne in C-sharp Minor”
etc.

Example: Now What’s wrong?

This is getting messy
Activities combined with disc’s attributes
Have duplicated information
How many tracks can there be?
What if there is more than one artist?
Don’t have all the information a customer might want to use to search

Discs revisited

Discs have titles
Discs have pictures on the cover
Discs contain tracks
Discs are made by manufacturers
Discs are purchased from distributors
Discs are ordered from distributors
Discs are delivered to the store
Discs are sold to customers

“Discs contain tracks …”

Tracks contain songs
Tracks occur in order
Tracks have a duration
Songs are performed in performances
Songs have performers (usually)
Songs have composers
Songs have names (titles)
Songs have a key (but not always)
Performances are done by performers
Performers can be groups (bands, orchestras, etc.)
Performances are performed in a location or venue

We seem to need these entities

Discs
Manufacturers
Distributors
Orders
Customers
Inventory

Tracks
Songs
Performers
Groups ?

Songs have names (titles).

Are names properties of songs?

Or are they entities related to songs?

Or are they something else?

Song data (track 1)

Title         “Danse Russe” from Swan Lake, Op.20
Time          4:30
Composer      Peter Tchaikovsky
Category      Classical, violin, orchestra
Performers    Joshua Bell, Michael Tilson Thomas, Berlin Philharmonic Orchestra
Track number  1
Disc upc      8697-07416-2

Song data (track 2)

Title         Violin Concerto in E Minor, Op. 64
Time          6:27
Composer      Felix Mendelssohn
Category      Classical, violin, orchestra
Performers    Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Track number  2
Disc upc      8697-07416-2

Performance data

Title       Violin Concerto in E Minor, Op. 64
Time        6:27
Composer    Felix Mendelssohn
Category    Classical, violin, orchestra
Performers  Joshua Bell, Sir Roger Norrington, Camerata Salzburg

Performance data take 2

Title                 Violin Concerto in E Minor, Op. 64
Time                  6:27
Composer              Felix Mendelssohn
Category              Classical, violin, orchestra
Performers            Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Performance Date      ?
Performance Location  ?

Performer data

id  name
1   Joshua Bell
2   Sir Roger Norrington
3   Camerata Salzburg
4   Michael Tilson Thomas
5   Berlin Philharmonic
6   Bono
7   The Edge
8   Adam Clayton
9   Larry Mullen

Performance to Performer Relationship

performance id  performer id
1               1
1               2
1               3
1               …
2               1
2               4
2               5
2               …
325             6
325             7
325             8
325             9
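A rough sketch of how such a relationship table can be queried in both directions, again using Python's sqlite3 as a stand-in database. The table and function names are assumptions; the rows are the first few from the two tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE performer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE performance_performer (       -- the many-to-many relationship
    performance_id INTEGER NOT NULL,
    performer_id   INTEGER NOT NULL REFERENCES performer(id),
    PRIMARY KEY (performance_id, performer_id)   -- composite identifier
);
""")
conn.executemany("INSERT INTO performer VALUES (?, ?)", [
    (1, "Joshua Bell"), (2, "Sir Roger Norrington"), (3, "Camerata Salzburg"),
    (4, "Michael Tilson Thomas"), (5, "Berlin Philharmonic"),
])
conn.executemany("INSERT INTO performance_performer VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (2, 5),
])


def performers_of(performance_id):
    """Names of everyone who took part in one performance."""
    cur = conn.execute(
        "SELECT p.name FROM performer p "
        "JOIN performance_performer pp ON pp.performer_id = p.id "
        "WHERE pp.performance_id = ? ORDER BY p.id",
        (performance_id,),
    )
    return [name for (name,) in cur]


def performances_of(performer_id):
    """Ids of every performance one performer took part in."""
    cur = conn.execute(
        "SELECT performance_id FROM performance_performer "
        "WHERE performer_id = ? ORDER BY performance_id",
        (performer_id,),
    )
    return [pid for (pid,) in cur]
```

With the sample rows, performers_of(1) lists three names and performances_of(1) returns both performance ids, and no performer name is stored more than once.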

Performance data take 3

Performance id  2
Title           Violin Concerto in E Minor, Op. 64
Time            6:27
Composer        Felix Mendelssohn
Category        Classical, violin, orchestra

Track to Performance Relationship

Disc upc      Track Num  Performance id
8697-07416-2  1          1
8697-07416-2  2          2
…             …          …
314-510347-2  1          325

Relationships (so far):

[Diagram: one disc linked to several tracks, each track linked to a performance, several performances linked to several performers]

• disc to track: one to many
• track to performance: one to one
• performance to performer: many to many

What happened to Songs?

Relationships (take 2):

[Diagram: the previous diagram with song added, one song linked to several performances]

• disc to track: one to many
• track to performance: one to one
• performance to performer: many to many
• song to performance: one to many

Relationships (take 3):

[Diagram: disc, track, performance, performer, and song entities and the links between them]

What about “business entities”?

Where are they?

Business entities

[Diagram, shown in three successive builds: the disc, track, performance, performer, and song entities]

Should you use arrays?

Indexes

Enforce uniqueness
Make searches faster
Enable fast retrieval of entities by their identities
Enable finding entities with certain attributes

What indexes do we need for the music store database?

Tables

0) Discs
1) Tracks
2) Songs
3) Performers
4) Performances
5) Tracks of discs
6) Performances of songs
7) Performers of performances

What indexes do we need

0) Indexes for identifying attributes
1) A unique row identifier
2) Indexes for the queries you will do
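As a sketch of the three kinds on one table, here is a hypothetical song table in SQLite (via Python's sqlite3). Treating title plus composer as the identifying attributes is an assumption made for the example, as are all the names. EXPLAIN QUERY PLAN is SQLite's way of reporting which index a query would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE song (
    id       INTEGER PRIMARY KEY,    -- 1) a unique row identifier
    title    TEXT NOT NULL,
    composer TEXT NOT NULL
);
-- 0) index on the identifying attributes (assumed here: title + composer)
CREATE UNIQUE INDEX idx_song_identity ON song (title, composer);
-- 2) index for a query we expect to run: "all songs by this composer"
CREATE INDEX idx_song_composer ON song (composer);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM song WHERE composer = ?",
    ("Felix Mendelssohn",),
).fetchall()
# The last column of each plan row is a human-readable description.
plan_text = " ".join(row[3] for row in plan)
```

The plan text should name idx_song_composer, which is a quick way to confirm that a query you care about is covered before the table grows.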

What should we do next?

Other Topics

Normalization
Unique keys
Word indexes
Naming
Customization

Normalization

Oversimplified, it means:
• Don’t duplicate data

Attributes should be simple
• have only one value
• be necessary
• not derived data
• don’t repeat

Complicated attributes are often entities in their own right
• For example, addresses might be
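A small Python sketch of the "only one value" rule, applied to the multi-valued Performers attribute from the song data earlier. The function name and the dictionary layout are assumptions for illustration only.

```python
raw_performances = [
    {"id": 1, "title": "Danse Russe",
     "performers": "Joshua Bell, Michael Tilson Thomas, Berlin Philharmonic Orchestra"},
    {"id": 2, "title": "Violin Concerto in E Minor, Op. 64",
     "performers": "Joshua Bell, Sir Roger Norrington, Camerata Salzburg"},
]


def normalize(rows):
    """Split a comma-separated performers attribute into one-value rows.

    Returns (performers, links): performers maps each distinct name to an id,
    links is a list of (performance_id, performer_id) pairs.
    """
    performers = {}
    links = []
    for row in rows:
        for name in (part.strip() for part in row["performers"].split(",")):
            # A name gets a new id only the first time it is seen.
            performer_id = performers.setdefault(name, len(performers) + 1)
            links.append((row["id"], performer_id))
    return performers, links


performers, links = normalize(raw_performances)
```

After the split, "Joshua Bell" is stored once and referenced twice, instead of being repeated inside two text fields.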

Unique keys

EVERY table must have a unique key
EVERY row needs a unique identifier
• that never changes even if moved to another database (i.e. if you replicate)
Often, users don’t need to see it
Use a UUID or sequence or maybe datetime
Unique key is the ONLY way to identify rows unambiguously
ROWIDs are temporary and can change
Use the same method throughout
• You’ll be glad you did
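A minimal sketch of the UUID option, using Python's uuid and sqlite3 modules as stand-ins. The customer table and the add_customer helper are hypothetical. The second in-memory database plays the role of a replica, to show that the key still finds the same row after the data is copied.

```python
import sqlite3
import uuid

DDL = "CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT NOT NULL)"

conn = sqlite3.connect(":memory:")
conn.execute(DDL)


def add_customer(name):
    """Insert a customer under a freshly generated UUID and return the key."""
    key = str(uuid.uuid4())
    conn.execute("INSERT INTO customer VALUES (?, ?)", (key, name))
    return key


bob = add_customer("Bob")
alice = add_customer("Alice")

# "Replicate" into a second database: the keys travel with the rows.
replica = sqlite3.connect(":memory:")
replica.execute(DDL)
replica.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    conn.execute("SELECT id, name FROM customer").fetchall(),
)
found = replica.execute(
    "SELECT name FROM customer WHERE id = ?", (bob,)
).fetchone()
```

Because the key is generated by the application rather than by one database's storage layer, it does not depend on where the row happens to live.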

Word indexes

Can be used to hold multiple status or attribute values
• Conflicts with normalization
• Flexible
Easy to add new ones
Queries are fast

Example:
• Category: classical, violin, orchestral, concerto
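Word indexes are a built-in database feature, so nothing like the following needs to be written by hand. This Python sketch only illustrates the idea behind them: each word in a field points back to the rows that contain it. The sample rows and function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical category values keyed by performance id.
categories = {
    1: "classical, violin, orchestra",
    2: "classical, violin, orchestra, concerto",
    325: "rock",
}


def build_word_index(field_by_row):
    """Map each word to the set of row ids whose field contains it."""
    index = defaultdict(set)
    for row_id, text in field_by_row.items():
        for word in text.replace(",", " ").lower().split():
            index[word].add(row_id)
    return index


def rows_with_all(index, *words):
    """Row ids whose field contains every one of the given words."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()


index = build_word_index(categories)
violin_concertos = rows_with_all(index, "violin", "concerto")
```

Adding a new category value is just another word in the field, which is the flexibility the slide refers to, and also why it conflicts with the one-value-per-attribute rule.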

Naming

Good names are crucial to understanding
• What is in the column “GL01262”?

Table and column names should have clear meanings everyone can understand
• “GL01262” vs “dateEntered”

Names with dashes cause inconvenience with SQL
• “order-date”

Booleans should be named for truth value
• “backOrdered”

No double negations
• “notOutOfStock”
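The inconvenience with dashes is easy to reproduce. In this sketch (Python's sqlite3; the orders table and the try_query helper are assumptions), the same date is stored under a dashed name and a dash-free name, and each is queried in turn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The dashed name has to be quoted even to create the column.
conn.execute('CREATE TABLE orders ("order-date" TEXT, orderDate TEXT)')
conn.execute("INSERT INTO orders VALUES ('2007-06-09', '2007-06-09')")


def try_query(sql):
    """Run a query and return its rows, or the error message if it fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


unquoted = try_query("SELECT order-date FROM orders")    # parsed as order minus date
quoted = try_query('SELECT "order-date" FROM orders')    # works, but needs quotes everywhere
plain = try_query("SELECT orderDate FROM orders")        # works as written
```

The unquoted form fails because SQL reads the dash as a minus sign, so every query that touches the column has to remember the quotes.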

Making tables customizable

We will look at 3 ways:

Spare columns
Separate table with spare columns
Separate table with name/value pairs

Spare columns in table

custnum  name   city     extra1  extra2  extra3
001      Bob    Phoenix  frozen  ?       0.0
002      Alice  Boston   ?       125.46  0.12
003      Eve    Denver   ?       ?       ?

What data types should you use?
How many spare columns?
Wasted columns when not used
How do you know what each spare got used for?
How do you know how many unused spares you have?

Separate table for spare columns

custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  extra1  extra2  extra3
001      frozen  ?       0.0
002      ?       125.46  0.12

Separate table for spare columns

custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  status  owed    discount
001      frozen  ?       0.0
002      ?       125.46  0.12

Separate table with name/value pairs

custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  name      value
001      status    frozen
002      owed      125.46
002      discount  0.12
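A sketch of the name/value approach using Python's sqlite3, loaded with the rows shown above. The table name customer_extra and the extras helper are assumptions. Every value is stored as text here, which is one of the costs of this layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custnum TEXT PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE customer_extra (
    custnum TEXT NOT NULL REFERENCES customer(custnum),
    name    TEXT NOT NULL,     -- which custom attribute this row holds
    value   TEXT,              -- always text: the type lives in the application
    PRIMARY KEY (custnum, name)
);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("001", "Bob", "Phoenix"), ("002", "Alice", "Boston"), ("003", "Eve", "Denver"),
])
conn.executemany("INSERT INTO customer_extra VALUES (?, ?, ?)", [
    ("001", "status", "frozen"), ("002", "owed", "125.46"), ("002", "discount", "0.12"),
])


def extras(custnum):
    """All custom attributes of one customer as a name -> value dict."""
    cur = conn.execute(
        "SELECT name, value FROM customer_extra WHERE custnum = ?", (custnum,)
    )
    return dict(cur.fetchall())
```

A customer with no custom attributes simply has no rows, so extras("003") is an empty dict, and a new attribute needs no schema change.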

Modeling Tools

PCase
Enterprise Architect
Power Designer
ConceptDraw
Erwin
Rational

Pencil and paper!

Blackboard!

Summary

Understand the requirements
Leave out what is not needed
Review the design with stakeholders
Evolve the design as changes come up
Test to make sure it works
• Can it do everything that is needed?
• Does it perform adequately?

Expect changes to come

Homework

Papers:
• Wiles, A.: “Modular elliptic curves and Fermat’s Last Theorem”, Annals of Mathematics 141 (3): 443-551
• Chen, P.: “The Entity-Relationship Model -- Toward a Unified View of Data”, ACM TODS Vol 1, No 1, 1976

Wikipedia articles to start from:
• entity-relationship model
• data model

Books:
• Teorey, Lightstone, Nadeau: “Database Modeling and Design”, Morgan Kaufmann.

Questions