32
CS 4432 lecture #4 1 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

  • View
    217

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 1

CS4432: Database Systems IILecture #5

Professor Elke A. Rundensteiner

Page 2: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 2

TODAY

• Lecture on chapter 3 for ~30 min

• Then introduction to project 1 ~20 min

• Project-1 team formation later today (watch for it on my.wpi)

Page 3: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 3

Storage Layout :How to lay out data on disk. ( chapter 3)

Page 4: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 4

Data Items

Records

Blocks

Files

Memory

Overview

Page 5: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 5

Record - Collection of related data

items (called FIELDS)

E.g.: Employee record:name field,salary field,date-of-hire field,

...

Page 6: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 6

Types of records:

• Main choices:– FIXED vs VARIABLE FORMAT– FIXED vs VARIABLE LENGTH

Page 7: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 7

A SCHEMA contains information such as:

- # fields (attributes)- type of each field (length)- order of attributes in record- meaning of each field

(domain) - constraints (primary key, etc).

Fixed format

Not associated with each record.

Page 8: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 8

Example: fixed format and length

Employee record(1) E#, 2 byte integer(2) E.name, 10 char. Schema(3) Dept, 2 byte code

We can simply concatenate fields.

55 s m i t h 02

83 j o n e s 01

Records

Page 9: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 9

• What : Not all fields are included in the record, and possibly in different orders.

• Then : Record itself must contain format, i.e., it is “Self Describing”:

Variable format

Page 10: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 10

Why Variable format ?

• “sparse” records• repeating fields• evolving formats

Page 11: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 11

Example: variable format and length

4I52 4S DROF46

Field name codes could also be strings, i.e. TAGS

# F

ield

s

Cod

e id

enti

fyin

g

field

as

E#

Inte

ger

typ

e

Cod

e f

or

En

am

eS

trin

g t

yp

eLe

ng

th o

f st

r.

Page 12: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 12

• EXAMPLE: variable format record with repeating

fieldse.g., Employee has one or more children

3 E_name: Fred Child: SallyChild: Tom

• Do repeating fields always require variable format and length?

Page 13: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 13

e.g., a person and her hobbies.

Mary Sailing Chess --

•Then must allocate maximum number of

repeating fields (if not used, set to null)

Page 14: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 14

Many variants betweenfixed - variable format:

Example1: Include record type in record

record type record lengthtells me whatto expect(i.e., points to schema)

5 27 . . . .

Page 15: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 15

Record header - data at beginning

that describes record

May contain:- pointer to schema (record type)- length of record - time stamp (create time, mod.

time)- other stuff (e.g., ROW-ID in

Oracle)

Page 16: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 16

Example2: Variant btw FIXED/VAR format

• Hybrid format : one part is fixed, other is variable

E.g.: All employees have E#, name, dept; andother fields vary.

25 Smith Toy 2 retiredHobby:chess

# of varfields

Page 17: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 17

Also, many variations in internal

organization of record

Just to show one: length of field

3 F310 F1 5 F2 12 * * *

3 32 5 15 20 F1 F2 F3

total size

offsets

0 1 2 3 4 5 15 20

Page 18: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 18

Question:We have seen examples for :

* Fixed format and length records* Variable format and length records

(a) Does fixed format and variable length

make sense?(b) Does variable format and fixed length

make sense?

Page 19: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 19

Data Items

Records

Blocks

Files

Memory

Next:

Page 20: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 20

Next: placing records into blocks

blocks ...

a file

assume fixedlength blocks

assume a single file (for now)

Page 21: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 21

(1) separating records(2) spanned vs. unspanned(3) mixed record types – clustering(4) split records(5) sequencing(6) indirection

Options for storing records in blocks:

Page 22: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 22

Block

(a) no need to separate - fixed size recs.(b) special marker(c) give record lengths (or offsets)

- within each record- in block header

(1) Separating records

R2R1 R3

Page 23: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 23

• Unspanned: records must be within one block

block 1 block 2

...

• Spannedblock 1 block 2

...

(2) Spanned vs. Unspanned

R1 R2

R1

R3 R4 R5

R2 R3(a)

R3(b) R6R5R4 R7

(a)

Page 24: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 24

• Unspanned is much simpler, but may waste space…

• Spanned essential if record size > block size

Spanned vs. unspanned:

Page 25: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 25

Example

106 recordseach of size 2,050 bytes (fixed)block size = 4096 bytes

block 1 block 2

2050 bytes wasted 2046 2050 bytes wasted 2046

R1 R2

• Utiliz = 50% -> ½ of space is wasted

Page 26: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 26

• Mixed - records of different types(e.g. EMPLOYEE, DEPT)allowed in same block

e.g., a block

(3) Mixed record types

EMP e1 DEPT d1 DEPT d2

Page 27: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 27

Why do we want to mix?

• Answer: CLUSTERING

Records that are frequently accessed together should bein the same block

• Problems

Creates variable length records in block

Must avoid duplicates (how to cluster?)

Insert/deletes are harder

Page 28: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 28

Example Clustering

Q1: select A#, C_NAME, C_CITY, …from DEPOSIT, CUSTOMERwhere DEPOSIT.C_NAME =

CUSTOMER.C.NAME

a blockCUSTOMER,NAME=SMITH

DEPOSIT,NAME=SMITH

DEPOSIT,NAME=SMITH

Page 29: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 29

• If Q1 frequent, clustering good• But if Q2 frequent

Q2: SELECT * FROM CUSTOMER

CLUSTERING IS COUNTER PRODUCTIVE

Page 30: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 30

Compromise:

No mixing, but keep relatedrecords in same cylinder ...

Page 31: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #4 31

(1) Separating records(2) Spanned vs. Unspanned(3) Mixed record types - Clustering(4) Split records(5) Sequencing(6) Indirection

Recap: Storing records in blocks

Page 32: CS 4432lecture #41 CS4432: Database Systems II Lecture #5 Professor Elke A. Rundensteiner

CS 4432 lecture #5 32

Now on to Project 1 …