SYSTEMS DESIGN ANALYSIS
Chapter 17
Data Modeling
Jerry Post
Copyright © 1997

Data Files
Disk drives
  Platters spin
  Head moves/rotates
  Tracks
  Sectors
File Systems
  Files
  Directories/Folders
  File Allocation Table
    Maps logical files to sectors
    Every file is split into pieces
    Scattered across the disk
  Minimum sector size
    Old DOS: 32K on 2GB
Drive always retrieves fixed-length sectors.
[Figure: disk platter showing the disk spinning, the head moving, tracks, and sectors]
Directory Entry
  File Name
  File Extension
  File attribute
  Time of update
  Date of update
  Beginning disk cluster/sector
  File size
  Security attributes
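As a rough illustration of how a directory entry and an allocation table work together, the sketch below follows a chain of clusters starting from the entry's beginning cluster. The file name, the cluster numbers, and the end-of-chain marker are all hypothetical; only the 32K allocation size comes from the slide.

```python
import math

CLUSTER_SIZE = 32 * 1024  # 32K allocation unit, as in the "Old DOS" example
END_OF_FILE = -1          # hypothetical end-of-chain marker

# Hypothetical directory: file name -> beginning cluster and size in bytes.
directory = {"REPORT.TXT": {"first_cluster": 2, "size": 70000}}

# Hypothetical allocation table: cluster number -> next cluster of the same file.
# The pieces of the file are scattered (2, then 9, then 4).
allocation_table = {2: 9, 9: 4, 4: END_OF_FILE}

def clusters_of(name):
    """Follow the chain from the directory entry's first cluster to the end."""
    chain = []
    cluster = directory[name]["first_cluster"]
    while cluster != END_OF_FILE:
        chain.append(cluster)
        cluster = allocation_table[cluster]
    return chain

chain = clusters_of("REPORT.TXT")
clusters_needed = math.ceil(directory["REPORT.TXT"]["size"] / CLUSTER_SIZE)
wasted_bytes = len(chain) * CLUSTER_SIZE - directory["REPORT.TXT"]["size"]
```

Because space is handed out in whole allocation units, the last unit of a file is usually only partly used, which is why a large minimum size wastes space on small files.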

Disks: RAID
Speed limitations
  Rotational speed
  Drive head speed
  Bus transfer rates
RAID: Redundant Array of Independent (Inexpensive) Drives
  Store file across many drives (striping)
  Some deliberate duplication
  Speed/parallel
  Parallel searches
RAID: seen as one drive
[Figure: a file split into Sector 1, Sector 2, Sector 3, Sector 4, Sector 5, ... and spread across the drives]
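Striping can be sketched as dealing the sectors of a file out to the drives in turn. This is a minimal illustration that assumes simple round-robin placement with no duplication or parity; the function name and the drive count of three are made up for the example.

```python
def stripe(sectors, drive_count):
    """Distribute sectors round-robin across drive_count drives."""
    drives = [[] for _ in range(drive_count)]
    for i, sector in enumerate(sectors):
        drives[i % drive_count].append(sector)
    return drives

file_sectors = ["Sector 1", "Sector 2", "Sector 3", "Sector 4", "Sector 5"]
layout = stripe(file_sectors, 3)
# layout[0] holds sectors 1 and 4, layout[1] holds 2 and 5, layout[2] holds 3,
# so three drives can read their pieces of the file at the same time.
```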

Data Files
Master
  Sorted
  Current data totals
  e.g., Inventory
Transaction
  Change log
  Updates to master
Report files (output)
Initialization files (software)
Need to separate data storage from physical location.
  Backup and Restore
  Changes in physical drive
File Organization
  Sequential
  Indexed
  Linked Lists
  Hashed

Sequential Storage
Common uses
  When large portions of the data are always used at one time (e.g., 25%).
  When table is huge and space is expensive.
  When transporting / converting data to a different system.

ID  LastName   FirstName  DateHired
1   Reeves     Keith      1/29/96
2   Gibson     Bill       3/31/96
3   Reasoner   Katy       2/17/96
4   Hopkins    Alan       2/8/96
5   James      Leisha     1/6/96
6   Eaton      Anissa     8/23/96
7   Farris     Dustin     3/28/96
8   Carpenter  Carlos     12/29/96
9   O'Connor   Jessica    7/23/96
10  Shields    Howard     7/13/96

Indexed Sequential
Common uses
  Large tables.
  Need many sequential lists.
  Some random search, with one or two key columns.
  Hold index in RAM if possible, for speed.

Data rows, with the storage address of each row:
Address  ID  LastName   FirstName  DateHired
A11      1   Reeves     Keith      1/29/96
A22      2   Gibson     Bill       3/31/96
A32      3   Reasoner   Katy       2/17/96
A42      4   Hopkins    Alan       2/8/96
A47      5   James      Leisha     1/6/96
A58      6   Eaton      Anissa     8/23/96
A63      7   Farris     Dustin     3/28/96
A67      8   Carpenter  Carlos     12/29/96
A78      9   O'Connor   Jessica    7/23/96
A83      10  Shields    Howard     7/13/96

Index on LastName:
LastName   Pointer
Carpenter  A67
Eaton      A58
Farris     A63
Gibson     A22
Hopkins    A42
James      A47
O'Connor   A78
Reasoner   A32
Reeves     A11
Shields    A83
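The index above can be sketched as a sorted list of (LastName, address) pairs held in memory, searched with a binary search and then followed to the stored row. The dictionary standing in for disk storage and the function names are assumptions made for illustration; the rows and addresses are the ones on the slide.

```python
import bisect

# Stand-in for disk storage: address -> row, in the original (unsorted) order.
storage = {
    "A11": (1, "Reeves", "Keith", "1/29/96"),
    "A22": (2, "Gibson", "Bill", "3/31/96"),
    "A32": (3, "Reasoner", "Katy", "2/17/96"),
    "A42": (4, "Hopkins", "Alan", "2/8/96"),
    "A47": (5, "James", "Leisha", "1/6/96"),
    "A58": (6, "Eaton", "Anissa", "8/23/96"),
    "A63": (7, "Farris", "Dustin", "3/28/96"),
    "A67": (8, "Carpenter", "Carlos", "12/29/96"),
    "A78": (9, "O'Connor", "Jessica", "7/23/96"),
    "A83": (10, "Shields", "Howard", "7/13/96"),
}

# The index: (LastName, address) pairs kept sorted by LastName.
index = sorted((row[1], address) for address, row in storage.items())
index_keys = [name for name, _ in index]

def find(last_name):
    """Random access: binary-search the index, then fetch the row by address."""
    pos = bisect.bisect_left(index_keys, last_name)
    if pos < len(index_keys) and index_keys[pos] == last_name:
        return storage[index[pos][1]]
    return None

def in_name_order():
    """Sequential access: walk the index to list rows sorted by LastName."""
    return [storage[address] for _, address in index]
```

A second index on another column (DateHired, for example) would give another sequential ordering without moving or copying the data rows.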

Linked Lists
Separate each element/key.
Pointers to next element.
Starting point.

Key        Address  Next
Carpenter  B87      B29
Gibson     B38      00 (end of list)
Eaton      B29      B71
Farris     B71      B38
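Following the next pointers from the starting point visits the keys in sorted order even though they are stored in no particular order. A minimal sketch, assuming the list starts at Carpenter (B87) and that the slide's 00 marks the end of the list (represented here as None):

```python
# address -> (key, address of next key); None stands for the slide's 00.
nodes = {
    "B87": ("Carpenter", "B29"),
    "B38": ("Gibson", None),
    "B29": ("Eaton", "B71"),
    "B71": ("Farris", "B38"),
}

def walk(start):
    """Collect keys by following next pointers until the end marker."""
    keys = []
    address = start
    while address is not None:
        key, next_address = nodes[address]
        keys.append(key)
        address = next_address
    return keys

ordered = walk("B87")
```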

Insert into a Linked List
Get space/location with address.
  Data: Save row (A97).
  Key: Save key and pointer to data (B14).
Find insert location.
  Eccles would be after Eaton and before Farris.
From prior key (Eaton), put next address (B71) into new key, next pointer.
Put new address (B14) in prior key, next pointer.

Key     Address  Next                  Data
Eaton   B29      B71, changed to B14   A58
Eccles  B14      B71                   A97
Farris  B71      B38                   A63

NewData = new (. . .)
NewKey = new (. . .)
NewKey->Key = "Eccles"
NewKey->Data = NewData
FindInsertPoint(List, PriorKey, NewKey)
NewKey->Next = PriorKey->Next
PriorKey->Next = NewKey
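The insert steps can be sketched in Python as follows. The class and function names are illustrative rather than taken from the slides, object references play the role of the B addresses, and the data field simply stores the A address as a string.

```python
class KeyNode:
    """One key in the list: the key value, a pointer to its data, and a next pointer."""
    def __init__(self, key, data, next_node=None):
        self.key = key
        self.data = data
        self.next = next_node

def find_insert_point(head, new_key):
    """Return the last node whose key sorts before new_key (None if there is none)."""
    prior = None
    node = head
    while node is not None and node.key < new_key:
        prior = node
        node = node.next
    return prior

def insert(head, key, data):
    """Insert a new key in sorted position and return the (possibly new) head."""
    new_node = KeyNode(key, data)
    prior = find_insert_point(head, key)
    if prior is None:              # new key belongs at the front of the list
        new_node.next = head
        return new_node
    new_node.next = prior.next     # new key points at what the prior key pointed at
    prior.next = new_node          # prior key now points at the new key
    return head

def keys_of(head):
    result = []
    while head is not None:
        result.append(head.key)
        head = head.next
    return result

head = KeyNode("Carpenter", "A67",
       KeyNode("Eaton", "A58",
       KeyNode("Farris", "A63",
       KeyNode("Gibson", "A22"))))
head = insert(head, "Eccles", "A97")
```

Only two pointers change during the insert, and no existing key or data row has to move.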

Direct Access / Hashed
Convert key value directly to location (relative or absolute).
Use prime modulus
  Choose prime number greater than expected database size (n).
  Divide and use remainder.
Set aside spaces (fixed-length) to hold each row.
Collision/overflow space for keys that convert to the same location.
Extremely fast retrieval.
Very poor sequential access.
Reorganize if out of space!
Example
  Prime = 101
  Key = 528
  Modulus = 23 (the remainder of 528 divided by 101)
[Figure: fixed-length row spaces with a separate overflow/collisions area]
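A minimal sketch of the prime-modulus method, using the slide's prime of 101. The slot list, the single shared overflow list, and the function names are assumptions made for illustration; real systems handle overflow in several different ways.

```python
PRIME = 101                  # prime larger than the expected number of rows
slots = [None] * PRIME       # one fixed-length space per possible location
overflow = []                # shared area for rows whose location is taken

def location(key):
    """Divide by the prime and use the remainder as the relative location."""
    return key % PRIME

def store(key, row):
    loc = location(key)
    if slots[loc] is None or slots[loc][0] == key:
        slots[loc] = (key, row)
    else:
        overflow.append((key, row))   # collision: a different key already lives here

def fetch(key):
    loc = location(key)
    if slots[loc] is not None and slots[loc][0] == key:
        return slots[loc][1]          # the usual case: one direct lookup
    for stored_key, row in overflow:  # otherwise scan the overflow area
        if stored_key == key:
            return row
    return None

store(528, "row for key 528")         # 528 = 5 * 101 + 23, so location 23
store(629, "row for key 629")         # 629 = 6 * 101 + 23, collides at location 23
```

Retrieval by key needs no search in the common case, but the rows sit in remainder order rather than key order, which is why sequential access is poor.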

Why Normalization?
Need standardized data definition
  Advantages of DBMS require careful design
  Define data correctly and the rest is much easier
  It especially makes it easier to expand database later
  Method applies to most models and most DBMS
    Similar to Entity-Relationship
    Similar to Objects (without inheritance and methods)
Goal: Define tables carefully
  Save space
  Minimize redundancy
  Protect data

Notation
Customer(CustomerID, Phone, Name, Address, City, State, ZipCode)
  Table name: Customer
  Primary key is underlined: CustomerID
  Table columns: listed inside the parentheses

CustomerID  Phone         LastName    FirstName  Address             City               State  ZipCode
1           502-666-7777  Johnson     Martha     125 Main Street     Alvaton            KY     42122
2           502-888-6464  Smith       Jack       873 Elm Street      Bowling Green      KY     42101
3           502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
4           502-333-9494  Adams       Samuel     746 Brown Drive     Alvaton            KY     42122
5           502-474-4746  Rabitz      Victor     645 White Avenue    Bowling Green      KY     42102
6           615-373-4746  Steinmetz   Susan      15 Speedway Drive   Portland           TN     37148
7           615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
8           615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
9           502-222-4351  Chavez      Juan       673 Industry Blvd.  Caneyville         KY     42721
10          502-444-2512  Rojo        Maria      88 Main Street      Cave City          KY     42127

Sample: Video Database
[Figure: sample video rental form, annotated with its repeating section and possible keys]

Initial Objects
Customers
  Key: Assign a CustomerID
  Sample Properties: Name, Address, Phone
Videos
  Key: Assign a MovieID
  Sample Properties: Title, RentalPrice, Rating, Description
RentalTransaction
  Event/Relationship
  Key: Assign TransactionID
  Sample Properties: CustomerID, Date
VideosRented
  Event/Repeating list
  Keys: TransactionID + MovieID
  Sample Properties: VideoCopy#

Initial Form Evaluation
Collect forms from users
Write down properties
Find repeating groups ( . . .)
Look for potential keys (underlined in the notation)
Identify computed values
Notation makes it easier to identify and solve problems
Results equivalent to diagrams, but will fit on one or two pages

RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent))

Problems with Repeating Sections
RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent))

TransID  RentDate  CustomerID  LastName    Phone         Address             VideoID  Copy#  Title                  Rent
1        4/18/95   3           Washington  502-777-7575  95 Easy Street      1        2      2001: A Space Odyssey  $1.50
1        4/18/95   3           Washington  502-777-7575  95 Easy Street      6        3      Clockwork Orange       $1.50
2        4/30/95   7           Lasater     615-888-4474  67 S. Ray Drive     8        1      Hopscotch              $1.50
2        4/30/95   7           Lasater     615-888-4474  67 S. Ray Drive     2        1      Apocalypse Now         $2.00
2        4/30/95   7           Lasater     615-888-4474  67 S. Ray Drive     6        1      Clockwork Orange       $1.50
3        4/18/95   8           Jones       615-452-1162  867 Lakeside Drive  9        1      Luggage Of The Gods    $2.50
3        4/18/95   8           Jones       615-452-1162  867 Lakeside Drive  15       1      Fabulous Baker Boys    $2.00
3        4/18/95   8           Jones       615-452-1162  867 Lakeside Drive  4        1      Boy And His Dog        $2.50
4        4/18/95   3           Washington  502-777-7575  95 Easy Street      3        1      Blues Brothers         $2.00
4        4/18/95   3           Washington  502-777-7575  95 Easy Street      8        1      Hopscotch              $1.50
4        4/18/95   3           Washington  502-777-7575  95 Easy Street      13       1      Surf Nazis Must Die    $2.50
4        4/18/95   3           Washington  502-777-7575  95 Easy Street      17       1      Witches of Eastwick    $2.00

Repeating section: VideoID, Copy#, Title, Rent
Causes duplication: the transaction and customer columns repeat on every row.
Storing data in this raw form would not work very well. For example, repeating sections will cause problems.
Note the duplication of data.
Also, what if a customer has not yet checked out a movie--where do we store that customer’s data?

Problems with Repeating Sections
Store repeating data
  Allocate space
    How much?
    Can't be short
    Wasted space
  e.g., How many videos will be rented at one time?
A better definition eliminates this problem.

Customer Rentals (not in First Normal Form)
Name
Phone
Address
City
State
ZipCode
    VideoID  Copy#  Title             Rent
1.  6        1      Clockwork Orange  1.50
2.  8        2      Hopscotch         1.50
3.
4.
5.
{Unused Space}

First Normal Form
Remove repeating sections
  Split into two tables
  Bring key from main and repeating section
RentalLine(TransID, VideoID, Copy#, . . .)
  Each transaction can have many videos (key VideoID)
  Each video can be rented on many transactions (key TransID)
  For each TransID and VideoID, only one Copy# (no key on Copy#)

RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent))
is split into:
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
RentalLine(TransID, VideoID, Copy#, Title, Rent)
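The split into first normal form can be sketched as follows, using two of the sample transactions with a shortened set of customer columns. The nested "Videos" list is a hypothetical way of representing the repeating section, and the function name is illustrative.

```python
# Each form carries its repeating section as a nested list under "Videos".
rental_forms = [
    {"TransID": 1, "RentDate": "4/18/95", "CustomerID": 3, "Name": "Washington",
     "Videos": [
         {"VideoID": 1, "Copy#": 2, "Title": "2001: A Space Odyssey", "Rent": 1.50},
         {"VideoID": 6, "Copy#": 3, "Title": "Clockwork Orange", "Rent": 1.50},
     ]},
    {"TransID": 2, "RentDate": "4/30/95", "CustomerID": 7, "Name": "Lasater",
     "Videos": [
         {"VideoID": 8, "Copy#": 1, "Title": "Hopscotch", "Rent": 1.50},
         {"VideoID": 2, "Copy#": 1, "Title": "Apocalypse Now", "Rent": 2.00},
         {"VideoID": 6, "Copy#": 1, "Title": "Clockwork Orange", "Rent": 1.50},
     ]},
]

def to_first_normal_form(forms):
    """Split each form into one main row plus one row per repeating entry."""
    rental_form2 = []
    rental_line = []
    for form in forms:
        # Main table: everything except the repeating section.
        rental_form2.append({k: v for k, v in form.items() if k != "Videos"})
        # Repeating table: bring the main key (TransID) along with each entry.
        for video in form["Videos"]:
            rental_line.append({"TransID": form["TransID"], **video})
    return rental_form2, rental_line

rental_form2, rental_line = to_first_normal_form(rental_forms)
```

Each transaction now appears once in rental_form2, and rental_line can grow to any number of videos per transaction without reserving space in advance.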

First Normal Form Problems (Data)
1NF splits repeating groups
Still have problems
  Replication
  Hidden dependency: If a video has not been rented yet, then what is its title?

TransID  RentDate  CustID  Phone         LastName    FirstName  Address             City               State  ZipCode
1        4/18/95   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
2        4/30/95   7       615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
3        4/18/95   8       615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
4        4/18/95   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171

TransID  VideoID  Copy#  Title                  Rent
1        1        2      2001: A Space Odyssey  $1.50
1        6        3      Clockwork Orange       $1.50
2        8        1      Hopscotch              $1.50
2        2        1      Apocalypse Now         $2.00
2        6        1      Clockwork Orange       $1.50
3        9        1      Luggage Of The Gods    $2.50
3        15       1      Fabulous Baker Boys    $2.00
3        4        1      Boy And His Dog        $2.50
4        3        1      Blues Brothers         $2.00
4        8        1      Hopscotch              $1.50
4        13       1      Surf Nazis Must Die    $2.50
4        17       1      Witches of Eastwick    $2.00

Second Normal Form Definition
Each non-key column must depend on the entire key.
  Only applies to concatenated keys
  Some columns only depend on part of the key
  Split those into a new table.
Dependence (definition)
  If given a value for the key you always know the value of the property in question, then that property is said to depend on the key.
  If you change part of a key and the questionable property does not change, then the table is not in 2NF.

RentalLine(TransID, VideoID, Copy#, Title, Rent)
  Title, Rent: depend only on VideoID
  Copy#: depends on both TransID and VideoID
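The dependence definition can be tried against sample data. The helper below is a hypothetical check, not part of the method itself: it can show that a column does not depend on a set of columns, but a passing result only means the sample rows contain no counterexample, since real dependencies come from business rules.

```python
COLUMNS = ("TransID", "VideoID", "Copy#", "Title", "Rent")
rental_line = [dict(zip(COLUMNS, values)) for values in [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50),
]]

def depends_on(rows, determinant, column):
    """True if every combination of determinant values maps to a single value of column."""
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in determinant)
        if seen.setdefault(key, row[column]) != row[column]:
            return False   # same determinant values, different column value
    return True

title_on_video = depends_on(rental_line, ("VideoID",), "Title")           # part of the key is enough
copy_on_video = depends_on(rental_line, ("VideoID",), "Copy#")            # VideoID 6 has copies 3 and 1
copy_on_whole_key = depends_on(rental_line, ("TransID", "VideoID"), "Copy#")
```

Title and Rent are determined by VideoID alone, which is the sign that RentalLine is not in 2NF; Copy# needs the whole key.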

Second Normal Form Example
Title depends only on VideoID
  Each VideoID can have only one title
Rent depends on VideoID
  This statement is actually a business rule.
  It might be different at different stores.
  Some stores might charge a different rent for each video depending on the day (or time).
Each non-key column depends on the whole key.

RentalLine(TransID, VideoID, Copy#, Title, Rent)
is split into:
VideosRented(TransID, VideoID, Copy#)
Videos(VideoID, Title, Rent)
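One way to picture the 2NF split is as two projections of RentalLine, each keeping a subset of the columns and dropping duplicate rows. The `project` helper is an illustrative name, and only five of the sample rows are used.

```python
COLUMNS = ("TransID", "VideoID", "Copy#", "Title", "Rent")
rental_line = [dict(zip(COLUMNS, values)) for values in [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50),
]]

def project(rows, columns):
    """Keep only the named columns, dropping duplicate rows and preserving order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[name] for name in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result

videos_rented = project(rental_line, ("TransID", "VideoID", "Copy#"))
videos = project(rental_line, ("VideoID", "Title", "Rent"))
```

Clockwork Orange was rented twice but appears once in videos, so its title and rent are stored in a single place.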

Second Normal Form Example (Data)

VideosRented(TransID, VideoID, Copy#)
TransID  VideoID  Copy#
1        1        2
1        6        3
2        2        1
2        6        1
2        8        1
3        4        1
3        9        1
3        15       1
4        3        1
4        8        1
4        13       1
4        17       1

Videos(VideoID, Title, Rent)
VideoID  Title                        Rent
1        2001: A Space Odyssey        $1.50
2        Apocalypse Now               $2.00
3        Blues Brothers               $2.00
4        Boy And His Dog              $2.50
5        Brother From Another Planet  $2.00
6        Clockwork Orange             $1.50
7        Gods Must Be Crazy           $2.00
8        Hopscotch                    $1.50

(Unchanged)
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)

Second Normal Form Problems (Data)
Even in 2NF, problems remain
  Replication
  Hidden dependency
  If a customer has not rented a video yet, where do we store their personal data?
Solution: split table.

RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
TransID  RentDate  CustID  Phone         LastName    FirstName  Address             City               State  ZipCode
1        4/18/95   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
2        4/30/95   7       615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
3        4/18/95   8       615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
4        4/18/95   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171

Third Normal Form Definition
Each non-key column must depend on nothing but the key.
  Some columns depend on columns that are not part of the key.
  Split those into a new table.
  Example: Customer's name does not change for every transaction.
Dependence (definition)
  If given a value for the key you always know the value of the property in question, then that property is said to depend on the key.
  If you change the key and the questionable property does not change, then the table is not in 3NF.

RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
  Phone, Name, Address, City, State, ZipCode: depend only on CustomerID
  RentDate, CustomerID: depend on TransID

Third Normal Form Example
Customer attributes depend only on CustomerID
  Split them into new table (Customer)
  Remember to leave CustomerID in Rentals table. We need to be able to reconnect tables.
3NF is sometimes easier to see if you identify primary objects at the start. Then you would recognize that Customer was a separate object.

RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
is split into:
Rentals(TransID, RentDate, CustomerID)
Customers(CustomerID, Phone, Name, Address, City, State, ZipCode)
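A small sketch of the 3NF split, using three of the sample transactions and only the Phone and Name customer columns to keep it short. The variable names are illustrative. It also shows the payoff for the earlier question about customers who have not rented yet: once Customers is its own table, such a customer has somewhere to be stored.

```python
rental_form2 = [
    {"TransID": 1, "RentDate": "4/18/95", "CustomerID": 3, "Phone": "502-777-7575", "Name": "Washington"},
    {"TransID": 2, "RentDate": "4/30/95", "CustomerID": 7, "Phone": "615-888-4474", "Name": "Lasater"},
    {"TransID": 4, "RentDate": "4/18/95", "CustomerID": 3, "Phone": "502-777-7575", "Name": "Washington"},
]

CUSTOMER_COLUMNS = ("Phone", "Name")   # columns that depend only on CustomerID

# Rentals keeps its key, RentDate, and CustomerID so the tables can be reconnected.
rentals = [
    {"TransID": row["TransID"], "RentDate": row["RentDate"], "CustomerID": row["CustomerID"]}
    for row in rental_form2
]

# Customers is keyed by CustomerID, so each customer is stored once.
customers = {}
for row in rental_form2:
    customers[row["CustomerID"]] = {name: row[name] for name in CUSTOMER_COLUMNS}

# A customer with no rentals can now be stored without a transaction.
customers[1] = {"Phone": "502-666-7777", "Name": "Johnson"}
```

Washington rented twice, but the phone number is stored once, so a change of phone number is a single update.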

Third Normal Form Example Data

Rentals(TransID, RentDate, CustomerID)
TransID  RentDate  CustomerID
1        4/18/95   3
2        4/30/95   7
3        4/18/95   8
4        4/18/95   3

Customers(CustomerID, Phone, Name, Address, City, State, ZipCode)
CustomerID  Phone         LastName    FirstName  Address             City               State  ZipCode
1           502-666-7777  Johnson     Martha     125 Main Street     Alvaton            KY     42122
2           502-888-6464  Smith       Jack       873 Elm Street      Bowling Green      KY     42101
3           502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
4           502-333-9494  Adams       Samuel     746 Brown Drive     Alvaton            KY     42122
5           502-474-4746  Rabitz      Victor     645 White Avenue    Bowling Green      KY     42102
6           615-373-4746  Steinmetz   Susan      15 Speedway Drive   Portland           TN     37148
7           615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
8           615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
9           502-222-4351  Chavez      Juan       673 Industry Blvd.  Caneyville         KY     42721
10          502-444-2512  Rojo        Maria      88 Main Street      Cave City          KY     42127

(Unchanged)
VideosRented(TransID, VideoID, Copy#)
Videos(VideoID, Title, Rent)

Third Normal Form Tables (3NF)
Rentals(TransID, RentDate, CustomerID )
Customers(CustomerID, Phone, Name, Address, City, State, ZipCode )
VideosRented(TransID, VideoID, Copy#)
Videos(VideoID, Title, Rent)

Checking Your Work (Quality Control)
Look for one-to-many relationships.
  Many side should be keyed (underlined).
  e.g., VideosRented(TransID, VideoID, . . .).
  Check each column and ask if it should be 1 : 1 or 1 : M.
  If you add a key, renormalize.
Verify no repeating sections (1NF)
Check 3NF
  Check each column and ask: Does it depend on the whole key and nothing but the key?
Verify that the tables can be reconnected (joined) to form the original tables (draw lines).
Each table represents one object.
Enter sample data and look for replication.
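The reconnection check can be tried with a small in-memory database. This sketch uses Python's built-in sqlite3 module, a reduced set of customer columns, and a few sample rows. The column name CopyNum stands in for Copy# and the foreign-key clauses are assumptions added for illustration, not part of the original table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Phone TEXT, Name TEXT);
CREATE TABLE Rentals (TransID INTEGER PRIMARY KEY, RentDate TEXT,
                      CustomerID INTEGER REFERENCES Customers(CustomerID));
CREATE TABLE Videos (VideoID INTEGER PRIMARY KEY, Title TEXT, Rent REAL);
CREATE TABLE VideosRented (TransID INTEGER REFERENCES Rentals(TransID),
                           VideoID INTEGER REFERENCES Videos(VideoID),
                           CopyNum INTEGER,
                           PRIMARY KEY (TransID, VideoID));
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                 [(3, "502-777-7575", "Washington"), (7, "615-888-4474", "Lasater")])
conn.executemany("INSERT INTO Rentals VALUES (?, ?, ?)",
                 [(1, "4/18/95", 3), (2, "4/30/95", 7)])
conn.executemany("INSERT INTO Videos VALUES (?, ?, ?)",
                 [(1, "2001: A Space Odyssey", 1.5), (6, "Clockwork Orange", 1.5),
                  (8, "Hopscotch", 1.5)])
conn.executemany("INSERT INTO VideosRented VALUES (?, ?, ?)",
                 [(1, 1, 2), (1, 6, 3), (2, 8, 1), (2, 6, 1)])

# Join the four tables back together on their shared keys to rebuild the form rows.
rows = conn.execute("""
    SELECT r.TransID, r.RentDate, c.CustomerID, c.Name,
           v.VideoID, vr.CopyNum, v.Title, v.Rent
    FROM Rentals r
    JOIN Customers c ON c.CustomerID = r.CustomerID
    JOIN VideosRented vr ON vr.TransID = r.TransID
    JOIN Videos v ON v.VideoID = vr.VideoID
    ORDER BY r.TransID, v.VideoID
""").fetchall()
```

If any column of the original form cannot be reached through these joins, a key was lost during one of the splits and the design should be revisited.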