Spring 2007 Midterm 1 Review Lectures 2-10 Cow book Chapters 1,3,4,5,8,9,10,11


Administrivia

• Midterm 1: in class this Thursday!
  – Closed book examination
  – You will be allowed one 8.5” x 11” sheet of notes (double sided).

• Sample questions on class web site


Review Outline

• Relational Data Model, Algebra, Calculus and SQL

• Storage, Buffer Management and Indexes

Review: DBMS components

Database application
• Talks to the DBMS to manage data for a specific task
  -> e.g. an app to withdraw/deposit money or provide a history of the account

Query Optimization and Execution
• Figures out the best way to answer a question
  -> There is always more than one way to skin a cat!

Relational Operators
• Provides generic ways to combine data
  -> Do you want a list of customers and accounts, or the total account balance of all customers?

Access Methods
• Provides efficient ways to extract data
  -> Do you need 1 record or a bunch?

Buffer Management
• Makes efficient use of RAM
  -> Think 1,000,000 simultaneous requests!

Disk Space Management
• Makes efficient use of disk space
  -> Think 300,000,000 accounts!

DB


Review: ACID properties

• A DBMS ensures a database has ACID properties:

• Atomicity – nothing is ever half baked; database changes either happen or they don’t.

• Consistency – you can’t peek at the data until it is baked; database changes aren’t visible until they are committed

• Isolation – concurrent operations have an explainable outcome; multiple users can operate on a database without conflicting

• Durability – what’s done is done; once a database operation completes, it remains even if the database crashes

Review: Relational Data Model

• Most widely used data model.
• Relation: made up of 2 parts:
  – Schema: specifies name of relation, plus name and type of each column.
    • e.g. Students(sid: string, name: string, login: string, age: integer, gpa: real)
  – Instance: a table, with rows and columns described by the schema
• Introduced data independence
  – Data layout on disk can change without affecting applications using the data
• Keys contribute to data independence
  – Relationships are determined by field value, not physical pointers!

Review: Bank of Middle Earth

Customers
CustomerID  Name           Address      AccountID
314159      Frodo Baggins  BagEnd       112358
271828      Sam Gamgee     BagShot Row  132124
42          Bilbo Baggins  Rivendell    112358

Accounts
AccountID  Balance
112358     4500.00
132124     2000.00

Give me an example of…
• A super key for Accounts
• Good primary key choices for both
• A foreign key
• A possible check constraint

ALTER TABLE ACCOUNTS
    ADD CONSTRAINT CHECK_BAL
    CHECK (BALANCE >= 0);
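The key and constraint questions above can be tried out directly. The following is a minimal sketch using Python's built-in `sqlite3` module, not the course's own setup: the constraints are declared inline at table creation, and the extra customer (Gollum) and account number 999 are made-up values used only to provoke a violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Accounts (
    AccountID INTEGER NOT NULL PRIMARY KEY,
    Balance   REAL CHECK (Balance >= 0)
);
CREATE TABLE Customers (
    CustomerID INTEGER NOT NULL PRIMARY KEY,
    Name       VARCHAR(128),
    Address    VARCHAR(256),
    AccountID  INTEGER REFERENCES Accounts (AccountID)
);
INSERT INTO Accounts VALUES (112358, 4500.00), (132124, 2000.00);
INSERT INTO Customers VALUES
    (314159, 'Frodo Baggins', 'BagEnd', 112358),
    (271828, 'Sam Gamgee', 'BagShot Row', 132124),
    (42, 'Bilbo Baggins', 'Rivendell', 112358);
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Check constraint: a balance may not go negative.
negative_balance = rejected("UPDATE Accounts SET Balance = -1 WHERE AccountID = 112358")
# Foreign key: a customer may not point at an account that does not exist (999 is hypothetical).
dangling_account = rejected("INSERT INTO Customers VALUES (7, 'Gollum', 'Misty Mountains', 999)")
# Primary key: AccountID values must be unique.
duplicate_key = rejected("INSERT INTO Accounts VALUES (112358, 0.0)")
```

All three statements are refused, which is one way to see the difference between a key (uniqueness), a foreign key (the referenced value must exist), and a check constraint (a predicate on the row).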

Review: Query Languages

• Query languages provide 2 key advantages:
  – Less work for user asking query
  – More opportunities for optimization
• Algebra and safe calculus are simple and powerful models for query languages for the relational model
  – Have same expressive power
  – Algebra is more operational; calculus is more declarative
• SQL can express every query that is expressible in relational algebra/calculus (and more).
• Two sublanguages:
  – DDL: Data Definition Language
    • Define and modify schema (at all 3 levels)
  – DML: Data Manipulation Language
    • Queries and IUD (insert, update, delete)

Review: Basic DDL

Customers
CustomerID  Name           Address      AccountID
314159      Frodo Baggins  BagEnd       112358
271828      Sam Gamgee     BagShot Row  132124
42          Bilbo Baggins  Rivendell    112358

Accounts
AccountID  Balance
112358     4500.00
132124     2000.00

CREATE TABLE CUSTOMERS (
    CustomerID INTEGER NOT NULL,
    Name VARCHAR(128),
    Address VARCHAR(256),
    AccountID INTEGER,
    PRIMARY KEY (CustomerID),
    FOREIGN KEY (AccountID) REFERENCES ACCOUNTS
);

CREATE TABLE ACCOUNTS (
    AccountID INTEGER NOT NULL,
    Balance DOUBLE,
    PRIMARY KEY (AccountID)
);

• Why do we need NOT NULL?
• What would happen if I executed these commands in this order?

Relational Algebra Review

Sailors
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

Boats
bid  bname      color
101  Interlake  Blue
102  Interlake  Red
103  Clipper    Green
104  Marine     Red

Reserves
sid  bid  day
22   101  10/10/96
58   103  11/12/96

Basic operations:
• Selection (σ): gives a subset of rows.
• Projection (π): deletes unwanted columns.
• Cross-product (×): combines two relations.
• Set-difference (−): tuples in relation 1, but not in relation 2.
• Union (∪): tuples in relation 1 together with tuples in relation 2.

Additional operations:
• Intersection (∩): tuples that appear in both relations.
• Join (⋈): like × but only keep tuples where common fields are equal.
• Division (/): tuples from relation 1 that have matches for every tuple in relation 2.

Relational Algebra Review

Sailors
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

Boats
bid  bname      color
101  Interlake  Blue
102  Interlake  Red
103  Clipper    Green
104  Marine     Red

Reserves
sid  bid  day
22   101  10/10/96
58   103  11/12/96

Find names of sailors who’ve reserved a green boat

π_sname( π_sid( π_bid( σ_color=‘Green’(Boats) ) ⋈ Reserves ) ⋈ Sailors )
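To make the evaluation order of the green-boat expression concrete, here is a small sketch that models relations as lists of Python dicts. The helper names `select`, `project`, and `natural_join` are illustrative choices, not part of any library, and the implementation favors clarity over efficiency.

```python
def select(rel, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection: keep only the named columns and drop duplicate rows."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

def natural_join(r1, r2):
    """Natural join: pair up rows that agree on all columns they share."""
    out = []
    for a in r1:
        for b in r2:
            if all(a[c] == b[c] for c in set(a) & set(b)):
                out.append({**a, **b})
    return out

sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7, "age": 45.0},
    {"sid": 31, "sname": "lubber", "rating": 8, "age": 55.5},
    {"sid": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]
boats = [
    {"bid": 101, "bname": "Interlake", "color": "Blue"},
    {"bid": 102, "bname": "Interlake", "color": "Red"},
    {"bid": 103, "bname": "Clipper", "color": "Green"},
    {"bid": 104, "bname": "Marine", "color": "Red"},
]
reserves = [
    {"sid": 22, "bid": 101, "day": "10/10/96"},
    {"sid": 58, "bid": 103, "day": "11/12/96"},
]

# Evaluate the expression from the inside out.
green_bids = project(select(boats, lambda b: b["color"] == "Green"), ["bid"])
green_sids = project(natural_join(green_bids, reserves), ["sid"])
names = project(natural_join(green_sids, sailors), ["sname"])
print(names)  # [{'sname': 'rusty'}]
```

Projecting down to `bid` and then `sid` before each join keeps the intermediate results small, which is the kind of rewrite a query optimizer looks for.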

Relational Algebra Review

Find names of sailors who’ve reserved all boats
• First use division and renaming to find sids of sailors who reserved all boats
• Then join result with sailors and project to get their names

ρ(Tempsids, (π_sid,bid Reserves) / (π_bid Boats))
π_sname(Tempsids ⋈ Sailors)

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    103  9/12
2    103  9/13
3    103  9/14
3    101  9/12
1    103  9/13

Boats
bid  bname  color
101  Nina   red
103  Pinta  blue

Division step:

π_sid,bid Reserves   /   π_bid Boats   =   Tempsids
sid  bid                 bid               sid
1    103                 101               3
2    103                 103
3    101
3    103
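The division step can be written out in a few lines. This is a sketch that models relations as Python sets of tuples; `divide` is an illustrative helper, not a library function.

```python
def divide(r, s):
    """r is a set of (x, y) pairs and s a set of y values.
    Return every x that is paired in r with all y values in s."""
    candidates = {x for x, _ in r}
    return {x for x in candidates if all((x, y) in r for y in s)}

sailors = [(1, "Frodo", 7, 22), (2, "Bilbo", 2, 39), (3, "Sam", 8, 27)]
reserves = [(1, 103, "9/12"), (2, 103, "9/13"), (3, 103, "9/14"),
            (3, 101, "9/12"), (1, 103, "9/13")]
boats = [(101, "Nina", "red"), (103, "Pinta", "blue")]

sid_bid = {(sid, bid) for sid, bid, _ in reserves}   # projection of Reserves on (sid, bid)
bids = {bid for bid, _, _ in boats}                  # projection of Boats on bid

tempsids = divide(sid_bid, bids)
names = {sname for sid, sname, _, _ in sailors if sid in tempsids}
print(tempsids, names)  # {3} {'Sam'}
```

Only sailor 3 is paired with both boat 101 and boat 103, so the result is Sam.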

Relational Calculus Review

• Variables
  – TRC: Variables are bound to tuples.
  – DRC: Variables are bound to domain elements (= column values)
• Constants
  – 7, “Foo”, 3.14159, etc.
• Comparison operators
  – =, <>, <, >, etc.
• Logical connectives
  – ¬ (not), ∧ (and), ∨ (or), ⇒ (implies), ∈ (is a member of)
• Quantifiers
  – ∀X(p(X)): For every X, p(X) must be true
  – ∃X(p(X)): There exists at least one X such that p(X) is true

Relational Calculus Review

Find names of sailors who have reserved a green boat

Sailors
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

Reserves
sid  bid  day
22   101  10/10/96
58   103  11/12/96

Boats
bid  bname      color
101  Interlake  Blue
102  Interlake  Red
103  Clipper    Green
104  Marine     Red

{ N | ∃S ∈ Sailors (S.sname = N.sname ∧ ∃R ∈ Reserves (S.sid = R.sid ∧ ∃B ∈ Boats (B.color = “Green” ∧ B.bid = R.bid))) }

Result N
sname
rusty

Relational Calculus Review

Find names of sailors who’ve reserved all boats

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    103  9/12
2    103  9/13
3    103  9/14
3    101  9/12
1    103  9/13

Boats
bid  bname  color
101  Nina   red
103  Pinta  blue

{ N | ∃S ∈ Sailors (S.sname = N.sname ∧ ∀B ∈ Boats (∃R ∈ Reserves (S.sid = R.sid ∧ B.bid = R.bid))) }

Result N
sname
Sam
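The two calculus queries map closely onto Python's `any` (there exists) and `all` (for every). The sketch below is only an analogy for reading the quantifiers; the helper `rows` and the function names are illustrative.

```python
def rows(columns, *tuples):
    """Build a relation as a list of dicts from a column list and raw tuples."""
    return [dict(zip(columns, t)) for t in tuples]

def reserved_some_green(sailors, reserves, boats):
    # exists R in Reserves (S.sid = R.sid and exists B in Boats (B.color = "Green" and B.bid = R.bid))
    return {s["sname"] for s in sailors
            if any(s["sid"] == r["sid"] and
                   any(b["color"] == "Green" and b["bid"] == r["bid"] for b in boats)
                   for r in reserves)}

def reserved_all(sailors, reserves, boats):
    # for all B in Boats (exists R in Reserves (S.sid = R.sid and B.bid = R.bid))
    return {s["sname"] for s in sailors
            if all(any(s["sid"] == r["sid"] and b["bid"] == r["bid"] for r in reserves)
                   for b in boats)}

# Instance from the green-boat slide.
sailors_a = rows(["sid", "sname"], (22, "dustin"), (31, "lubber"), (58, "rusty"))
reserves_a = rows(["sid", "bid"], (22, 101), (58, 103))
boats_a = rows(["bid", "color"], (101, "Blue"), (102, "Red"), (103, "Green"), (104, "Red"))

# Instance from the all-boats slide.
sailors_b = rows(["sid", "sname"], (1, "Frodo"), (2, "Bilbo"), (3, "Sam"))
reserves_b = rows(["sid", "bid"], (1, 103), (2, 103), (3, 103), (3, 101), (1, 103))
boats_b = rows(["bid", "color"], (101, "red"), (103, "blue"))

green = reserved_some_green(sailors_a, reserves_a, boats_a)
everything = reserved_all(sailors_b, reserves_b, boats_b)
print(green, everything)  # {'rusty'} {'Sam'}
```

Swapping the inner `any` for `all` is the only structural change between the two queries, mirroring the change from ∃B to ∀B.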

Basic SQL Query

SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification

relation-list: A list of relation names, possibly with a range-variable after each name

target-list: A list of attributes of tables in relation-list

DISTINCT: optional keyword indicating answer should not contain duplicates.

In SQL, default is that duplicates are not eliminated! (Result is called a “multiset”)

qualification: Comparisons combined using AND, OR and NOT. Comparisons are Attr op const or Attr1 op Attr2, where op is one of =, <>, <, >, <=, >=, etc.

Set Operators in SQL

• UNION
  – Returns the union of two sets (with same arity), eliminating duplicates
  – UNION ALL retains duplicates in result
• INTERSECT
  – Returns the intersection of two sets (with same arity)
• EXCEPT
  – Set difference: A EXCEPT B returns tuples in A but not B
• IN/NOT IN
  – A IN B is true if A is a member of B
• EXISTS/NOT EXISTS
  – True if expression evaluates to a set with at least one member
• UNIQUE/NOT UNIQUE
  – True if expression evaluates to a set with no duplicates
• Value <comparison op> ANY/ALL
  – Value > ANY A is true if A contains at least one member that makes the comparison true
  – Value > ALL A is true if all members of A make the comparison true

Set operators are almost always used with nested queries

SQL Review: Nested query

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    103  9/12
2    103  9/13
1    103  9/13

Find the names of sailors who’ve reserved boat #103 exactly once

SELECT S.sname
FROM Sailors S
WHERE UNIQUE (SELECT sid, bid
              FROM Reserves R
              WHERE R.bid=103 AND S.sid=R.sid)

• UNIQUE is also true when the subquery returns no rows, so as written the query also returns sailors with no reservation for boat #103 (Sam here); it really answers “at most once”.
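The UNIQUE predicate is not widely implemented (to my knowledge SQLite does not support it), so the sketch below expresses the same correlated subquery with COUNT(*) instead. This is a substitute technique, not the slide's query, run through Python's `sqlite3` module on the slide's data. It also shows the difference between "exactly once" and "at most once", since UNIQUE is true for an empty subquery result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age INTEGER);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Sailors VALUES (1, 'Frodo', 7, 22), (2, 'Bilbo', 2, 39), (3, 'Sam', 8, 27);
INSERT INTO Reserves VALUES (1, 103, '9/12'), (2, 103, '9/13'), (1, 103, '9/13');
""")

# The subquery is re-evaluated for each sailor S (a correlated nested query).
QUERY = """
SELECT S.sname
FROM Sailors S
WHERE (SELECT COUNT(*)
       FROM Reserves R
       WHERE R.bid = 103 AND R.sid = S.sid) {condition}
ORDER BY S.sid
"""

exactly_once = [name for (name,) in conn.execute(QUERY.format(condition="= 1"))]
at_most_once = [name for (name,) in conn.execute(QUERY.format(condition="<= 1"))]
print(exactly_once)  # ['Bilbo']
print(at_most_once)  # ['Bilbo', 'Sam']
```

Frodo reserved boat 103 twice and is excluded either way; Sam never reserved it, so he appears only under the "at most once" reading.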

Aggregate Operators

• Very powerful; enables computations over sets of tuples
• COUNT: returns a count of tuples in the set
• AVG: returns average of column values in the set
• SUM: returns sum of column values in the set
• MIN, MAX: returns min (max) value of column values in a set.
• DISTINCT can be added to COUNT, AVG, SUM to perform computation only over distinct values.

SELECT AVG (S.age)
FROM Sailors S
WHERE S.rating=10

SELECT COUNT (*)
FROM Sailors S

SELECT AVG (DISTINCT S.age)
FROM Sailors S
WHERE S.rating=10

Often used with GROUP BY and HAVING clauses

Sailors who have reserved all boats

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    102  9/12
2    102  9/12
2    101  9/14
1    102  9/10
2    103  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red

SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid
GROUP BY S.sname, S.sid
HAVING COUNT (DISTINCT R.bid) =
       (SELECT COUNT (*) FROM Boats)

Subquery result (number of boats):
count
3

Join of Sailors and Reserves:
sname  sid  bid
Frodo  1    102
Bilbo  2    101
Bilbo  2    102
Frodo  1    102
Bilbo  2    103

After GROUP BY S.sname, S.sid:
sname  sid  bid
Frodo  1    102, 102
Bilbo  2    101, 102, 103

COUNT (DISTINCT R.bid) per group:
sname  sid  count
Frodo  1    1
Bilbo  2    3
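The GROUP BY / HAVING walk-through above can be reproduced with Python's `sqlite3` module. This is a sketch on the slide's data; the second query is added only to expose the per-group counts that HAVING compares against the number of boats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age INTEGER);
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, bname TEXT, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Sailors VALUES (1, 'Frodo', 7, 22), (2, 'Bilbo', 2, 39), (3, 'Sam', 8, 27);
INSERT INTO Boats VALUES (101, 'Nina', 'red'), (102, 'Pinta', 'blue'), (103, 'Santa Maria', 'red');
INSERT INTO Reserves VALUES
    (1, 102, '9/12'), (2, 102, '9/12'), (2, 101, '9/14'), (1, 102, '9/10'), (2, 103, '9/13');
""")

# Intermediate step: distinct boats reserved by each sailor.
per_sailor = conn.execute("""
    SELECT S.sname, S.sid, COUNT(DISTINCT R.bid)
    FROM Sailors S, Reserves R
    WHERE S.sid = R.sid
    GROUP BY S.sname, S.sid
    ORDER BY S.sid
""").fetchall()

# Final query: keep only the groups whose count equals the number of boats.
reserved_all = conn.execute("""
    SELECT S.sname
    FROM Sailors S, Reserves R
    WHERE S.sid = R.sid
    GROUP BY S.sname, S.sid
    HAVING COUNT(DISTINCT R.bid) = (SELECT COUNT(*) FROM Boats)
""").fetchall()

print(per_sailor)    # [('Frodo', 1, 1), ('Bilbo', 2, 3)]
print(reserved_all)  # [('Bilbo',)]
```

DISTINCT matters here: Frodo reserved boat 102 twice, and without DISTINCT his count would be 2 rather than 1.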

Review: Storage

Query Optimization and Execution

Relational Operators

Files and Access Methods

Buffer Management

Disk Space Management

DB

• A DBMS is like an ogre; it has layers

Disks and Files

• DBMS stores information on disks. Why?
• To work with information, DBMS moves data to RAM.
  – READ: transfer data from disk to main memory (RAM).
  – WRITE: transfer data from RAM to disk.
• READ and WRITE are expensive. Why?
  – Must be planned carefully!
  – DBMS architecture is designed to minimize both

The Storage Hierarchy

[Figure: storage hierarchy, from smaller and faster at the top to bigger and slower at the bottom. Source: Operating Systems Concepts 5th Edition]

– Main memory (RAM) for currently used data.
– Disk for the main database (secondary storage).
– Tapes for archiving older versions of the data (tertiary storage).

Components of a Disk

[Figure: disk anatomy, labeling the spindle, platters, tracks, sector, disk head, arm assembly, and arm movement]

• The platters spin (say, 120 rps).
• The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!).
• Only one head reads/writes at any one time.
• Block size is a multiple of sector size (which is fixed).

Disks are slow. Why?

• Time to access (read/write) a disk block:
  – seek time (moving arms to position disk head on track)
  – rotational delay (waiting for block to rotate under head)
  – transfer time (actually moving data to/from disk surface)

[Figure: disk diagram marking arm movement, seek time, rotational delay, and transfer time]
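A back-of-the-envelope calculation shows how the three components add up. In this sketch only the 120 rps spin rate comes from the slides; the 9 ms seek and 1 ms per-block transfer figures are assumed values chosen for illustration, and the average rotational delay is taken to be half a rotation.

```python
def access_time_ms(seek_ms, rotations_per_sec, transfer_ms_per_block, blocks=1):
    """Time to read `blocks` consecutive blocks after one seek and one rotational wait."""
    rotational_delay_ms = (1000.0 / rotations_per_sec) / 2  # on average, half a rotation
    return seek_ms + rotational_delay_ms + blocks * transfer_ms_per_block

SEEK_MS = 9.0       # assumed average seek time
TRANSFER_MS = 1.0   # assumed transfer time per block
RPS = 120           # spin rate used in the slides

one_block = access_time_ms(SEEK_MS, RPS, TRANSFER_MS)
random_100 = 100 * one_block                                            # 100 scattered blocks
sequential_100 = access_time_ms(SEEK_MS, RPS, TRANSFER_MS, blocks=100)  # 100 adjacent blocks

print(f"one block: {one_block:.2f} ms")
print(f"100 random blocks: {random_100:.0f} ms, 100 sequential blocks: {sequential_100:.0f} ms")
```

Under these assumptions the sequential read is more than ten times faster, because the seek and rotational delay are paid once instead of once per block.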

Disk Space Manager

• Lowest layer of DBMS software manages space on disk (using OS file system or not?).
• Higher levels call upon this layer to:
  – allocate/de-allocate a page
  – read/write a page
• Best if a request for a sequence of pages is satisfied by pages stored sequentially on disk!
  – Responsibility of disk space manager.
  – Higher levels don’t know how this is done, or how free space is managed.
  – Though they may make performance assumptions!
    • Hence disk space manager should do a decent job.

Buffer Management in a DBMS

[Figure: page requests from higher levels arrive at the BUFFER POOL in MAIN MEMORY, whose frames either hold a disk page or are free; pages move between the pool and the DB on DISK; the choice of frame is dictated by the replacement policy]

• Buffer pool information table contains: <frame#, pageid, pin_count, dirty>

Buffer Management

• Keeps a group of disk pages in memory
• Records whether each is pinned
  – What happens when all pages are pinned?
  – What happens when a page is unpinned?
• Keeps track of whether pages are dirty


Buffer Management – Replacement

• What if all frames are used, but not pinned, and a new page is requested?

• What pages are candidates for replacement?

• How is the replaced page chosen?


Replacement Policies

• Least Recently Used (LRU)

• Most Recently Used (MRU)

• Clock

• Advantages? Disadvantages?
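To make pinning, dirty bits, and replacement concrete, here is a toy buffer pool with an LRU policy. It is a sketch under simplifying assumptions, not Minibase's implementation: the "disk" is a plain dict, pages are strings, and the class and method names are illustrative.

```python
from collections import OrderedDict

class BufferPool:
    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk               # dict page_id -> contents, standing in for the disk
        self.frames = OrderedDict()    # page_id -> frame info, least recently used first
        self.reads = 0
        self.writes = 0

    def pin(self, page_id):
        """Bring the page into the pool if needed and mark it as in use."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)          # now the most recently used
        else:
            if len(self.frames) >= self.num_frames:
                self._evict()
            self.frames[page_id] = {"data": self.disk[page_id], "pin_count": 0, "dirty": False}
            self.reads += 1
        frame = self.frames[page_id]
        frame["pin_count"] += 1
        return frame

    def unpin(self, page_id, dirty=False):
        """Release one use of the page; the caller says whether it modified the page."""
        frame = self.frames[page_id]
        assert frame["pin_count"] > 0, "page is not pinned"
        frame["pin_count"] -= 1
        frame["dirty"] = frame["dirty"] or dirty

    def _evict(self):
        """Replace the least recently used page whose pin_count is zero."""
        victim = next((pid for pid, f in self.frames.items() if f["pin_count"] == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        frame = self.frames.pop(victim)
        if frame["dirty"]:                            # dirty pages go back to disk first
            self.disk[victim] = frame["data"]
            self.writes += 1

disk = {1: "page one", 2: "page two", 3: "page three"}
pool = BufferPool(num_frames=2, disk=disk)

pool.pin(1); pool.unpin(1, dirty=True)
pool.pin(2); pool.unpin(2)
pool.pin(1); pool.unpin(1)        # page 1 is now more recently used than page 2
pool.pin(3)                       # pool is full: page 2 is the LRU victim (clean, no write)
resident_after_lru = sorted(pool.frames)

pool.pin(2)                       # evicts page 1, which is dirty, so it is written back
try:
    pool.pin(1)                   # pages 2 and 3 are both pinned: nothing can be replaced
    all_pinned_error = False
except RuntimeError:
    all_pinned_error = True
```

Only frames with `pin_count == 0` are candidates for replacement, and a dirty victim costs an extra write. Switching to MRU would mean choosing the last unpinned entry instead of the first.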

What is in Database Pages?

• Database contains files, which are made up of…
• Pages, which are made up of…
• Records, which are made up of…
• Fields, which hold single values.

How are records organized?

• It depends on whether fields are variable or fixed length
• In Minibase, array of type/offsets, followed by data.

[Figure: record layout, an array of field offsets followed by fields F1 F2 F3 F4]

How are pages organized?

• It depends on whether records are variable or fixed length.
• In Minibase, slot array at beginning of page, records compacted at end of page.
• What happens if record deleted?

[Figure: page i holding records Rid = (i,1), Rid = (i,2), ..., Rid = (i,N); a SLOT DIRECTORY with slots N ... 2 1 (entries 20, 16, 24), the number of slots N, and a pointer to the start of free space]
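One possible answer to the deletion question is sketched below: a slotted page whose directory maps slot numbers to (offset, length) pairs, so a record id stays valid when other records are deleted. This is an illustration under assumptions, not Minibase's actual layout: slot numbers start at 0, each directory entry is assumed to cost 4 bytes, and space freed by a deleted record is not compacted.

```python
SLOT_BYTES = 4  # assumed size of one slot directory entry

class SlottedPage:
    def __init__(self, page_no, size=4096):
        self.page_no = page_no
        self.data = bytearray(size)
        self.slots = []          # slot directory: (offset, length), or None if deleted
        self.free_end = size     # records are packed from the end of the page backward

    def insert(self, record):
        """Store the record and return its rid = (page number, slot number)."""
        # Reuse the first empty slot, otherwise grow the directory by one entry.
        slot_no = next((n for n, s in enumerate(self.slots) if s is None), len(self.slots))
        directory_bytes = SLOT_BYTES * max(len(self.slots), slot_no + 1)
        if self.free_end - len(record) < directory_bytes:
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        entry = (self.free_end, len(record))
        if slot_no == len(self.slots):
            self.slots.append(entry)
        else:
            self.slots[slot_no] = entry
        return (self.page_no, slot_no)

    def get(self, rid):
        entry = self.slots[rid[1]]
        if entry is None:
            raise KeyError("record was deleted")
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, rid):
        # Only the directory entry is cleared, so other rids remain valid.
        self.slots[rid[1]] = None

page = SlottedPage(page_no=7)
rid_frodo = page.insert(b"frodo")
rid_sam = page.insert(b"sam")
page.delete(rid_frodo)
rid_bilbo = page.insert(b"bilbo")   # reuses the slot freed by the delete
```

Because lookups go through the slot directory, the page is free to move records around (for example to compact free space) without changing any rid held by an index.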

How are files organized?

• Unordered Heap File: chained directory pages, containing records that point to data pages.

[Figure: DIRECTORY starting at a Header Page, with entries pointing to Data Page 1, Data Page 2, ..., Data Page N]

Several possible file organizations

• Heap Files
• Sorted Files
• Clustered Indexes
• Unclustered Index + regular file
• What are the tradeoffs?
  – Scan
  – Sort
  – Equality Search
  – Range Search
  – Insertion/Deletion

Indexes

• Can be used to store data records (alt 1), or be an auxiliary data structure that refers to an existing file of records (alt 2, 3)

• Many types of index (B-Tree, Hash Table, R-Tree, etc.)

• How do you choose the right index?

• Difference between clustered and unclustered indexes?

Clustered vs. Unclustered Index

• Suppose that Alternative (2) is used for data entries, and that the data records are stored in a Heap file.
  – To build clustered index, first sort the Heap file (with some free space on each block for future inserts).
  – Overflow blocks may be needed for inserts. (Thus, order of data recs is “close to”, but not identical to, the sort order.)

[Figure: CLUSTERED vs. UNCLUSTERED index. In both, index entries direct the search for data entries (index file), and the data entries point to data records (data file)]


B-Trees: a common, flexible index

• What is a B-Tree?

• What goes in an index (interior) node?

• What goes in a leaf node?

• How do insertions and deletions work?


Any Questions?


See you here on Thursday…