Spring 2007 Midterm 1 Review: Lectures 2-10
Cow book Chapters 1,3,4,5,8,9,10,11
Administrivia
• Midterm 1 – in class this Thursday!
  – Closed book examination
  – You will be allowed one 8.5" x 11" sheet of notes (double sided).
• Sample questions on class web site
Review Outline
• Relational Data Model, Algebra, Calculus and SQL
• Storage, Buffer Management and Indexes
Review: DBMS components
Query Optimization and Execution
Relational Operators
Access Methods
Buffer Management
Disk Space Management
DB
•Makes efficient use of disk space
-> Think 300,000,000 accounts!
•Makes efficient use of RAM
-> Think 1,000,000 simultaneous requests!
•Provides generic ways to combine data
-> Do you want a list of customers and accounts or the total account balance of all customers?
•Figures out the best way to answer a question
-> There is always more than 1 way to skin a cat…!
•Provides efficient ways to extract data
-> Do you need 1 record or a bunch?
Database application
•Talks to DBMS to manage data for a specific task
-> e.g. app to withdraw/deposit money or provide a history of the account
Review: ACID properties
• A DBMS ensures a database has ACID properties:
• Atomicity – nothing is ever half baked; database changes either happen or they don’t.
• Consistency – the rules are always obeyed; each transaction takes the database from one state that satisfies its integrity constraints to another such state
• Isolation – you can't peek at the data til it is baked; concurrent operations have an explainable outcome, and one user's changes aren't visible to others until they are committed
• Durability – what’s done is done; once a database operation completes, it remains even if the database crashes
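A minimal sketch of atomicity, assuming Python's built-in sqlite3 module. The accounts table, the transfer function and the amounts are hypothetical and only serve the illustration. The transfer credits one account and then debits the other; when the debit fails, the credit is rolled back as well, so nothing is left half baked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " account_id INTEGER PRIMARY KEY,"
    " balance REAL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(112358, 4500.0), (132124, 2000.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

# The debit would overdraw account 132124, so the earlier credit is undone too.
ok = transfer(conn, 132124, 112358, 5000.0)
balances = dict(conn.execute("SELECT account_id, balance FROM accounts"))
```

After the failed call, both balances are exactly what they were before it.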
Review: Relational Data Model
• Most widely used data model.
• Relation: made up of 2 parts:
  – Schema: specifies name of relation, plus name and type of each column.
    • e.g. Students(sid: string, name: string, login: string, age: integer, gpa: real)
  – Instance: a table, with rows and columns described by the schema
• Introduced data independence
  – Data layout on disk can change without affecting applications using the data
• Keys contribute to data independence
  – Relationships are determined by field value, not physical pointers!
Review: Bank of Middle Earth
Customers
CustomerID  Name           Address      AccountID
314159      Frodo Baggins  BagEnd       112358
271828      Sam Gamgee     BagShot Row  132124
42          Bilbo Baggins  Rivendell    112358

Accounts
AccountID  Balance
112358     4500.00
132124     2000.00
Give me an example of…
• A super key for Accounts
• Good primary key choices for both
• A foreign key
• A possible check constraint

ALTER TABLE ACCOUNTS
ADD CONSTRAINT CHECK_BAL CHECK (BALANCE >= 0)
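A sketch of how the three kinds of constraint behave, assuming Python's built-in sqlite3 module. Two things here are assumptions of the sketch, not part of the slides: SQLite has no ALTER TABLE ... ADD CONSTRAINT, so the check is declared inside CREATE TABLE, and the violates helper is a hypothetical name. Note that Accounts is created before Customers, which references it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Accounts (
    AccountID INTEGER NOT NULL PRIMARY KEY,
    Balance   REAL CHECK (Balance >= 0)
);
CREATE TABLE Customers (
    CustomerID INTEGER NOT NULL PRIMARY KEY,
    Name       VARCHAR(128),
    Address    VARCHAR(256),
    AccountID  INTEGER REFERENCES Accounts(AccountID)
);
INSERT INTO Accounts VALUES (112358, 4500.00), (132124, 2000.00);
INSERT INTO Customers VALUES (314159, 'Frodo Baggins', 'BagEnd', 112358);
""")

def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Primary key: AccountID 112358 already exists.
dup_key = violates("INSERT INTO Accounts VALUES (112358, 10.0)")
# Foreign key: there is no account 999 for this customer to refer to.
bad_fk = violates("INSERT INTO Customers VALUES (42, 'Bilbo Baggins', 'Rivendell', 999)")
# Check constraint: a balance may not go negative.
negative = violates("UPDATE Accounts SET Balance = -1 WHERE AccountID = 132124")
```

All three statements are rejected, while a customer row that points at an existing account is accepted.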
Review: Query Languages
• Query languages provide 2 key advantages:
  – Less work for user asking query
  – More opportunities for optimization
• Algebra and safe calculus are simple and powerful models for query languages for relational model
  – Have same expressive power
  – Algebra is more operational; calculus is more declarative
• SQL can express every query that is expressible in relational algebra/calculus. (and more)
• Two sublanguages:
  – DDL – Data Definition Language
    • Define and modify schema (at all 3 levels)
  – DML – Data Manipulation Language
    • Queries and IUD (insert update delete)
Review: Basic DDL

Customers
CustomerID  Name           Address      AccountID
314159      Frodo Baggins  BagEnd       112358
271828      Sam Gamgee     BagShot Row  132124
42          Bilbo Baggins  Rivendell    112358

Accounts
AccountID  Balance
112358     4500.00
132124     2000.00
CREATE TABLE CUSTOMERS (
    CustomerID INTEGER NOT NULL,
    Name VARCHAR(128),
    Address VARCHAR(256),
    AccountID INTEGER,
    PRIMARY KEY (CustomerID),
    FOREIGN KEY (AccountID) REFERENCES ACCOUNTS);

CREATE TABLE ACCOUNTS (
    AccountID INTEGER NOT NULL,
    Balance DOUBLE,
    PRIMARY KEY (AccountID));
• Why do we need NOT NULL?
• What would happen if I executed these commands in this order?
Relational Algebra Review

Basic operations:
• Selection ( σ ): gives a subset of rows.
• Projection ( π ): deletes unwanted columns.
• Cross-product ( × ): combine two relations.
• Set-difference ( − ): tuples in relation 1, but not 2
• Union ( ∪ ): tuples in relation 1 appended with tuples in relation 2.

Additional operations:
• Intersection ( ∩ ): tuples that appear in both relations.
• Join ( ⋈ ): like × but only keep tuples where common fields are equal.
• Division ( / ): tuples from relation 1 with matches in relation 2

Sailors
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

Reserves
sid  bid  day
22   101  10/10/96
58   103  11/12/96

Boats
bid  bname      color
101  Interlake  Blue
102  Interlake  Red
103  Clipper    Green
104  Marine     Red
Relational Algebra Review
Find names of sailors who've reserved a green boat

π_sname (π_sid (π_bid (σ_color='Green' Boats) ⋈ Reserves) ⋈ Sailors)
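The green-boat query can be traced by hand with a small sketch in Python, treating each relation as a set of named tuples. The helper names select and project are hypothetical stand-ins for σ and π, and the two joins are written out as membership tests.

```python
from collections import namedtuple

Sailor = namedtuple("Sailor", "sid sname rating age")
Boat = namedtuple("Boat", "bid bname color")
Reserve = namedtuple("Reserve", "sid bid day")

sailors = {Sailor(22, "dustin", 7, 45.0), Sailor(31, "lubber", 8, 55.5),
           Sailor(58, "rusty", 10, 35.0)}
boats = {Boat(101, "Interlake", "Blue"), Boat(102, "Interlake", "Red"),
         Boat(103, "Clipper", "Green"), Boat(104, "Marine", "Red")}
reserves = {Reserve(22, 101, "10/10/96"), Reserve(58, 103, "11/12/96")}

def select(rel, pred):
    """Selection: keep the rows that satisfy pred."""
    return {t for t in rel if pred(t)}

def project(rel, *attrs):
    """Projection: keep only the named columns (duplicates collapse in a set)."""
    return {tuple(getattr(t, a) for a in attrs) for t in rel}

# Innermost step: bids of the green boats.
green_bids = project(select(boats, lambda b: b.color == "Green"), "bid")
# Join with Reserves on bid, then project sid.
sids = {(r.sid,) for r in reserves if (r.bid,) in green_bids}
# Join with Sailors on sid, then project sname.
names = {(s.sname,) for s in sailors if (s.sid,) in sids}
```

Projecting early keeps the intermediate results small: only one bid and one sid ever reach the final join.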
Relational Algebra Review

Find names of sailors who've reserved all boats
• First use division and renaming to find sids of sailors who reserved all boats
• Then join result with sailors and project to get their names

ρ (Tempsids, (π_sid,bid Reserves) / (π_bid Boats))
π_sname (Tempsids ⋈ Sailors)

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    103  9/12
2    103  9/13
3    103  9/14
3    101  9/12
1    103  9/13

Boats
bid  bname  color
101  Nina   red
103  Pinta  blue

Division step:
π_sid,bid(Reserves)  /  π_bid(Boats)  =  Tempsids
sid  bid                bid              sid
1    103                101              3
2    103                103
3    101
3    103
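Division is the least intuitive operator, so here is a minimal sketch of it in Python over sets of tuples. The function name divide is hypothetical, and the data is the projected Reserves and Boats instance from this example.

```python
def divide(a, b):
    """A / B for A(x, y) and B(y): the x values that are paired with every y in B."""
    xs = {x for (x, _) in a}
    # An x is disqualified as soon as one required (x, y) pair is missing from A.
    disqualified = {x for x in xs for y in b if (x, y) not in a}
    return xs - disqualified

reserves_sid_bid = {(1, 103), (2, 103), (3, 103), (3, 101)}  # projection of Reserves on sid, bid
boat_bids = {101, 103}                                       # projection of Boats on bid
sailor_names = {1: "Frodo", 2: "Bilbo", 3: "Sam"}            # sid -> sname

tempsids = divide(reserves_sid_bid, boat_bids)
names = {sailor_names[sid] for sid in tempsids}
```

Sailors 1 and 2 are disqualified because neither reserved boat 101, which leaves sid 3 (Sam).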
Relational Calculus Review
• Variables
  TRC: Variables are bound to tuples.
  DRC: Variables are bound to domain elements (= column values)
• Constants
  7, "Foo", 3.14159, etc.
• Comparison operators
  =, <>, <, >, etc.
• Logical connectives
  ¬ not, ∧ and, ∨ or, ⇒ implies, ∈ is a member of
• Quantifiers
  ∀X(p(X)): For every X, p(X) must be true
  ∃X(p(X)): There exists at least one X such that p(X) is true
Relational Calculus Review

Find names of sailors who have reserved a green boat
(Sailors, Reserves and Boats instances as in the Relational Algebra Review.)

{ N | ∃S ∈ Sailors (S.sname = N.sname ∧ ∃R ∈ Reserves (S.sid = R.sid ∧ ∃B ∈ Boats (B.color = "Green" ∧ B.bid = R.bid))) }

Result:
sname
rusty
Relational Calculus Review

Find names of sailors who've reserved all boats
(Sailors, Reserves and Boats instances as in the division example of the Relational Algebra Review.)

{ N | ∃S ∈ Sailors (S.sname = N.sname ∧ ∀B ∈ Boats (∃R ∈ Reserves (S.sid = R.sid ∧ B.bid = R.bid))) }

Result:
sname
Sam
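The quantifiers map directly onto Python's any (∃) and all (∀), which gives a quick way to check a calculus expression by hand. This is only a sketch over the small Frodo/Bilbo/Sam instance; the tuple layouts are assumptions noted in the comments.

```python
sailors = [(1, "Frodo", 7, 22), (2, "Bilbo", 2, 39), (3, "Sam", 8, 27)]   # sid, sname, rating, age
reserves = [(1, 103, "9/12"), (2, 103, "9/13"), (3, 103, "9/14"),
            (3, 101, "9/12"), (1, 103, "9/13")]                          # sid, bid, day
boats = [(101, "Nina", "red"), (103, "Pinta", "blue")]                   # bid, bname, color

# For all boats B there exists a reservation R linking sailor S to B.
reserved_all = {
    s[1]
    for s in sailors
    if all(any(r[0] == s[0] and r[1] == b[0] for r in reserves) for b in boats)
}

# There exists a reservation R and a blue boat B that R refers to.
reserved_blue = {
    s[1]
    for s in sailors
    if any(r[0] == s[0] and any(b[0] == r[1] and b[2] == "blue" for b in boats)
           for r in reserves)
}
```

The set comprehension plays the role of the free variable N: it collects one name for every S that makes the formula true.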
Basic SQL Query
SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
relation-list : A list of relation names, possibly with a range-variable after each name
target-list : A list of attributes of tables in relation-list
DISTINCT: optional keyword indicating answer should not contain duplicates.
In SQL, default is that duplicates are not eliminated! (Result is called a “multiset”)
qualification : Comparisons combined using AND, OR and NOT. Comparisons are Attr op const or Attr1 op Attr2, where op is one of =, <>, <, >, <=, >=, etc.
Set Operators in SQL
• UNION
  – Returns the UNION of two sets (with same arity); duplicates are eliminated
  – UNION ALL retains duplicates in result
• INTERSECT
  – Returns the INTERSECTION of two sets (with same arity)
• EXCEPT
  – Set difference: A EXCEPT B returns tuples in A but not B
• IN/NOT IN
  – A IN B is true if A is a member of B
• EXISTS/NOT EXISTS
  – True if expression evaluates to a set with at least one member
• UNIQUE/NOT UNIQUE
  – True if expression evaluates to a set with no duplicates
• Value <comparison op> ANY/ALL
  – Value > ANY A is true if A contains at least one member that makes the comparison true
  – Value > ALL A is true if all members of A make the comparison true
Set operators are almost always used with nested queries
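A small sketch of these operators in action, assuming Python's built-in sqlite3 module and two hypothetical one-column tables A and B. SQLite does not implement UNIQUE or ANY/ALL, so the ANY comparison is rewritten here with EXISTS; that rewrite is a choice of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (x INTEGER);
CREATE TABLE B (x INTEGER);
INSERT INTO A VALUES (1), (2), (2), (3);
INSERT INTO B VALUES (2), (3), (4);
""")

def q(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

union = q("SELECT x FROM A UNION SELECT x FROM B")            # [1, 2, 3, 4]
union_all = q("SELECT x FROM A UNION ALL SELECT x FROM B")    # [1, 2, 2, 2, 3, 3, 4]
intersect = q("SELECT x FROM A INTERSECT SELECT x FROM B")    # [2, 3]
except_ = q("SELECT x FROM A EXCEPT SELECT x FROM B")         # [1]
in_b = q("SELECT x FROM A WHERE x IN (SELECT x FROM B)")      # [2, 2, 3]
not_exists = q("SELECT x FROM A WHERE NOT EXISTS "
               "(SELECT 1 FROM B WHERE B.x = A.x)")           # [1]
# "A.x > ANY (SELECT x FROM B)" expressed with EXISTS:
gt_any = q("SELECT x FROM A WHERE EXISTS "
           "(SELECT 1 FROM B WHERE A.x > B.x)")               # [3]
```

Note how IN keeps the duplicate 2 from A, because without DISTINCT a query result is a multiset, while UNION, INTERSECT and EXCEPT return sets.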
SQL Review: Nested query

Find the names of sailors who've reserved boat #103 exactly once

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    103  9/12
2    103  9/13
1    103  9/13

SELECT S.sname
FROM Sailors S
WHERE UNIQUE (SELECT sid, bid
              FROM Reserves R
              WHERE R.bid=103 AND S.sid=R.sid)

Note: UNIQUE is also true for an empty result, so as written the query returns sailors with at most one reservation of boat #103 (Sam qualifies along with Bilbo).
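Many systems do not implement the UNIQUE predicate (SQLite is one of them), so a common substitute is a correlated subquery that counts the matching rows. The sketch below assumes Python's built-in sqlite3 module and the instance from this slide; it runs the count-based version for both "exactly once" and "at most once".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Sailors VALUES (1, 'Frodo', 7, 22), (2, 'Bilbo', 2, 39), (3, 'Sam', 8, 27);
INSERT INTO Reserves VALUES (1, 103, '9/12'), (2, 103, '9/13'), (1, 103, '9/13');
""")

# The inner query is re-evaluated for each sailor S (it is correlated on S.sid).
template = """
SELECT S.sname
FROM Sailors S
WHERE (SELECT COUNT(*)
       FROM Reserves R
       WHERE R.bid = 103 AND R.sid = S.sid) {cmp} 1
ORDER BY S.sname
"""

exactly_once = [row[0] for row in conn.execute(template.format(cmp="="))]
at_most_once = [row[0] for row in conn.execute(template.format(cmp="<="))]
```

Frodo reserved boat 103 twice and drops out of both answers; Sam never reserved it, so he appears only in the "at most once" answer.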
Aggregate Operators
• Very powerful; enables computations over sets of tuples
• COUNT: returns a count of tuples in the set
• AVG: returns average of column values in the set
• SUM: returns sum of column values in the set
• MIN, MAX: returns min (max) value of column values in a set.
• DISTINCT can be added to COUNT, AVG, SUM to perform computation only over distinct values.

SELECT COUNT (*)
FROM Sailors S

SELECT AVG (S.age)
FROM Sailors S
WHERE S.rating=10

SELECT AVG (DISTINCT S.age)
FROM Sailors S
WHERE S.rating=10
Often used with GROUP BY and HAVING clauses
Sailors who have reserved all boats

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    102  9/12
2    102  9/12
2    101  9/14
1    102  9/10
2    103  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red

SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid
GROUP BY S.sname, S.sid
HAVING COUNT(DISTINCT R.bid) = (SELECT COUNT (*) FROM Boats)

• The subquery counts the boats:
count
3

• The join of Sailors and Reserves gives:
sname  sid  bid
Frodo  1    102
Bilbo  2    101
Bilbo  2    102
Frodo  1    102
Bilbo  2    103

• Grouping by sname, sid gives:
sname  sid  bid
Frodo  1    102, 102
Bilbo  2    101, 102, 103

• Counting distinct bids per group gives:
sname  sid  count
Frodo  1    1
Bilbo  2    3

Only Bilbo's count equals the number of boats, so the result is Bilbo.
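The query can be run as is against the instance on this slide. The sketch below assumes Python's built-in sqlite3 module; the second query is an addition of the sketch that exposes the per-group counts the HAVING clause compares against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, bname TEXT, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Sailors VALUES (1, 'Frodo', 7, 22), (2, 'Bilbo', 2, 39), (3, 'Sam', 8, 27);
INSERT INTO Boats VALUES (101, 'Nina', 'red'), (102, 'Pinta', 'blue'), (103, 'Santa Maria', 'red');
INSERT INTO Reserves VALUES (1, 102, '9/12'), (2, 102, '9/12'), (2, 101, '9/14'),
                            (1, 102, '9/10'), (2, 103, '9/13');
""")

all_boats = [row[0] for row in conn.execute("""
    SELECT S.sname
    FROM Sailors S, Reserves R
    WHERE S.sid = R.sid
    GROUP BY S.sname, S.sid
    HAVING COUNT(DISTINCT R.bid) = (SELECT COUNT(*) FROM Boats)
""")]

# Same grouping without the HAVING filter, to see each sailor's distinct-boat count.
counts = conn.execute("""
    SELECT S.sname, S.sid, COUNT(DISTINCT R.bid)
    FROM Sailors S, Reserves R
    WHERE S.sid = R.sid
    GROUP BY S.sname, S.sid
    ORDER BY S.sid
""").fetchall()
```

DISTINCT matters here: Frodo has two reservations, but both are for boat 102, so his count is 1 rather than 2.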
Review: Storage
Query Optimization and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
• A DBMS is like an ogre; it has layers
Disks and Files
• DBMS stores information on disks. Why?
• To work with information, DBMS moves data to RAM.
  – READ: transfer data from disk to main memory (RAM).
  – WRITE: transfer data from RAM to disk.
• READ and WRITE are expensive. Why?
  – must be planned carefully!
  – DBMS architecture is designed to minimize both
The Storage Hierarchy
Source: Operating Systems Concepts 5th Edition
–Main memory (RAM) for currently used data.
–Disk for the main database (secondary storage).
–Tapes for archiving older versions of the data (tertiary storage).
Smaller, Faster
Bigger, Slower
Components of a Disk
Figure labels: platters, spindle, tracks, sector, disk head, arm assembly, arm movement.
• The platters spin (say, 120 rps).
• The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!).
• Only one head reads/writes at any one time.
• Block size is a multiple of sector size (which is fixed).
Disks are slow. Why?
• Time to access (read/write) a disk block:
  – seek time (moving arms to position disk head on track)
  – rotational delay (waiting for block to rotate under head)
  – transfer time (actually moving data to/from disk surface)
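A back-of-the-envelope sketch of the three components in Python. Only the 120 rps rotation speed echoes a figure from these slides; the 9 ms seek time, 4 KB block size and 50 MB/s transfer rate are assumed numbers chosen for illustration, and the function name is hypothetical.

```python
def block_access_ms(seek_ms, rps, block_bytes, mb_per_s):
    """Return (seek, average rotational delay, transfer) in milliseconds for one block."""
    rotational_ms = 0.5 * 1000.0 / rps                        # on average, wait half a rotation
    transfer_ms = block_bytes / (mb_per_s * 1_000_000) * 1000.0
    return seek_ms, rotational_ms, transfer_ms

seek, rot, xfer = block_access_ms(seek_ms=9.0, rps=120, block_bytes=4096, mb_per_s=50.0)

one_block = seek + rot + xfer              # a single random block read
random_100 = 100 * one_block               # 100 blocks scattered over the disk
sequential_100 = seek + rot + 100 * xfer   # 100 adjacent blocks: position once, then stream
```

Under these assumptions seek time and rotational delay dominate, and reading 100 scattered blocks costs roughly 60 times as much as reading 100 adjacent ones, which is why sequential layout matters so much.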
Disk Space Manager
• Lowest layer of DBMS software manages space on disk (using OS file system or not?).
• Higher levels call upon this layer to:
  – allocate/de-allocate a page
  – read/write a page
• Best if a request for a sequence of pages is satisfied by pages stored sequentially on disk!
  – Responsibility of disk space manager.
  – Higher levels don't know how this is done, or how free space is managed.
  – Though they may make performance assumptions!
    • Hence disk space manager should do a decent job.
Buffer Management in a DBMS
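Figure: page requests from higher levels arrive at the BUFFER POOL in MAIN MEMORY, whose frames either hold a disk page or are free; pages move between the buffer pool and the DB on DISK. The choice of frame is dictated by the replacement policy.
• Buffer pool information table contains: <frame#, pageid, pin_count, dirty>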
Buffer Management
• Keeps a group of disk pages in memory
• Records whether each is pinned
  – What happens when all pages are pinned?
  – What happens when a page is unpinned?
• Keeps track of whether pages are dirty
Buffer Management – Replacement
• What if all frames are used, but not pinned, and a new page is requested?
• What pages are candidates for replacement?
• How is the replaced page chosen?
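One way to think through these questions is a toy buffer pool in Python. This is a sketch under stated assumptions, not Minibase's implementation: the class and method names are hypothetical, a dict stands in for the disk, and the replacement policy is LRU ordered by the time of the last pin.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool with pin counts, dirty bits and LRU replacement."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk              # dict: page_id -> contents (stands in for the disk)
        self.frames = OrderedDict()   # page_id -> frame info, least recently used first

    def pin(self, page_id):
        """Return the frame holding page_id, reading the page in if necessary."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)      # now the most recently used
        else:
            if len(self.frames) >= self.num_frames:
                self._evict()
            self.frames[page_id] = {"data": self.disk[page_id],
                                    "pin_count": 0, "dirty": False}
        frame = self.frames[page_id]
        frame["pin_count"] += 1
        return frame

    def unpin(self, page_id, dirty=False):
        """Release one pin; the caller says whether it modified the page."""
        frame = self.frames[page_id]
        frame["pin_count"] -= 1
        frame["dirty"] = frame["dirty"] or dirty

    def _evict(self):
        """Replace the least recently used unpinned page, writing it back if dirty."""
        victim = next((pid for pid, f in self.frames.items()
                       if f["pin_count"] == 0), None)
        if victim is None:
            raise RuntimeError("all pages are pinned")
        if self.frames[victim]["dirty"]:
            self.disk[victim] = self.frames[victim]["data"]
        del self.frames[victim]

disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(num_frames=2, disk=disk)
pool.pin(1)["data"] = "A"       # modify page 1 in memory only
pool.unpin(1, dirty=True)
pool.pin(2)
pool.unpin(2)
pool.pin(3)                     # pool is full: page 1 is unpinned and LRU, so it is replaced
```

Only unpinned pages are candidates for replacement, and a dirty victim is written back before its frame is reused; if every page is pinned, the request cannot be satisfied.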
Replacement Policies
• Least Recently Used (LRU)
• Most Recently Used (MRU)
• Clock
• Advantages? Disadvantages?
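Clock approximates LRU with one reference bit per frame instead of a full ordering. A minimal sketch in Python, with a hypothetical function name and plain lists standing in for the frame table:

```python
def clock_replace(ref_bits, pinned, hand):
    """Pick a victim frame using the clock policy.

    ref_bits: list of 0/1 reference bits, one per frame (modified in place)
    pinned:   list of booleans, True if the frame is pinned
    hand:     index where the clock hand currently points
    Returns (victim_index, new_hand).
    """
    n = len(ref_bits)
    for _ in range(2 * n):          # two sweeps are enough if any frame is unpinned
        i = hand
        hand = (hand + 1) % n
        if pinned[i]:
            continue                # pinned frames are never candidates
        if ref_bits[i]:
            ref_bits[i] = 0         # recently used: give it a second chance
        else:
            return i, hand          # unpinned and not recently used: replace it
    raise RuntimeError("all frames are pinned")

bits = [1, 0, 1]
victim, hand = clock_replace(bits, pinned=[False, False, False], hand=0)
```

Frame 0 had its reference bit set, so the hand clears the bit and moves on; frame 1 is the first frame found with a cleared bit and becomes the victim. The bookkeeping per access is a single bit, which is the usual argument for Clock over exact LRU.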
What is in Database Pages?
• Database contains files, which are made up of…
• Pages, which are made up of…
• Records, which are made up of…
• Fields, which hold single values.
How are records organized?
• It depends on whether fields are variable or fixed length
• In Minibase, array of type/offsets, followed by data.
F1 F2 F3 F4
Array of Field Offsets
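A sketch of a variable-length record with an array of field offsets, using Python's struct module. The exact layout (a 2-byte field count, then one 2-byte end offset per field, then the field bytes) is an assumption made for this sketch and is not Minibase's actual format; the function names are hypothetical.

```python
import struct

def pack_record(fields):
    """Encode byte-string fields as: field count, array of end offsets, field data."""
    header_size = 2 + 2 * len(fields)      # uint16 count + one uint16 offset per field
    offsets, pos = [], header_size
    for f in fields:
        pos += len(f)
        offsets.append(pos)                # offset just past the end of this field
    header = struct.pack(f"<H{len(fields)}H", len(fields), *offsets)
    return header + b"".join(fields)

def get_field(record, i):
    """Fetch field i directly from its offsets, without scanning earlier fields."""
    (n,) = struct.unpack_from("<H", record, 0)
    offsets = struct.unpack_from(f"<{n}H", record, 2)
    start = 2 + 2 * n if i == 0 else offsets[i - 1]
    return record[start:offsets[i]]

rec = pack_record([b"314159", b"Frodo Baggins", b"", b"BagEnd"])
```

The offset array gives direct access to any field and represents an empty field naturally: its end offset simply equals the previous one.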
How are pages organized?
• It depends on whether records are variable or fixed length.
• In Minibase, slot array at beginning of page, records compacted at end of page.
• What happens if a record is deleted?

Figure: Page i holding records Rid = (i,1), Rid = (i,2), ..., Rid = (i,N); a SLOT DIRECTORY with slots 1, 2, ..., N (entries 20, 16, 24 shown), the number of slots N, and a pointer to the start of free space.
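A toy slotted page in Python makes the deletion question concrete. This is a simplified sketch, not Minibase's page format: the class name is hypothetical, the slot directory is kept as a Python list instead of being stored in the page bytes, and the space it would occupy is ignored.

```python
class SlottedPage:
    """Toy slotted page: records are packed at the end, slots identify them."""

    def __init__(self, size=128):
        self.data = bytearray(size)
        self.slots = []          # slot number -> (offset, length), or None if free
        self.free_end = size     # records occupy data[free_end:]

    def insert(self, record):
        """Store a record and return its slot number (the N in Rid = (page, N))."""
        if len(record) > self.free_end:
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        entry = (self.free_end, len(record))
        for n, slot in enumerate(self.slots):    # reuse a freed slot if there is one
            if slot is None:
                self.slots[n] = entry
                return n
        self.slots.append(entry)
        return len(self.slots) - 1

    def get(self, n):
        offset, length = self.slots[n]
        return bytes(self.data[offset:offset + length])

    def delete(self, n):
        """Delete record n and compact; other records keep their slot numbers."""
        offset, length = self.slots[n]
        # Shift every record stored below the hole toward the end of the page.
        self.data[self.free_end + length:offset + length] = self.data[self.free_end:offset]
        self.slots[n] = None
        self.slots = [s if s is None or s[0] > offset else (s[0] + length, s[1])
                      for s in self.slots]
        self.free_end += length

page = SlottedPage(size=32)
a = page.insert(b"Frodo")
b = page.insert(b"Sam")
c = page.insert(b"Bilbo")
page.delete(b)
```

Deleting a record moves bytes inside the page but changes only slot offsets, so the Rids of the surviving records stay valid; the freed slot number can be handed out again by a later insert.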
How are files organized?
• Unordered Heap File: chained directory pages, containing records that point to data pages.
Figure: a Header Page starts the DIRECTORY, whose entries point to Data Page 1, Data Page 2, ..., Data Page N.
Several possible file organizations
• Heap Files
• Sorted Files
• Clustered Indexes
• Unclustered Index + regular file
• What are the tradeoffs?
  – Scan
  – Sort
  – Equality Search
  – Range Search
  – Insertion/Deletion
Indexes
• Can be used to store data records (alt 1), or be an auxiliary data structure that refers to an existing file of records (alt 2, 3)
• Many types of index (B-Tree, Hash Table, R-Tree, etc.)
• How do you choose the right index?
• Difference between clustered and unclustered indexes?
Clustered vs. Unclustered Index
• Suppose that Alternative (2) is used for data entries, and that the data records are stored in a Heap file.
  – To build clustered index, first sort the Heap file (with some free space on each block for future inserts).
  – Overflow blocks may be needed for inserts. (Thus, order of data recs is "close to", but not identical to, the sort order.)

Figure: CLUSTERED vs. UNCLUSTERED. In both, index entries direct the search for data entries (index file), and data entries point to data records (data file).
B-Trees: a common, flexible index
• What is a B-Tree?
• What goes in an index (interior) node?
• What goes in a leaf node?
• How do insertions and deletions work?
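As a starting point for the first three questions, here is a search-only sketch in Python. It assumes the B+-tree variant, where interior nodes hold only separator keys and child pointers and leaves hold the data entries; the Node class, the example keys and the (page, slot) rids are hypothetical. Insertion and deletion, with their splits and merges, are left out.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None, values=None):
        self.keys = keys            # sorted search keys
        self.children = children    # interior node: len(keys) + 1 child pointers
        self.values = values        # leaf node: one data entry per key

def search(node, key):
    """Follow child pointers from the root down to a leaf, then look up the key there."""
    while node.children is not None:
        # keys[i] separates children[i] (keys < keys[i]) from children[i + 1] (keys >= keys[i])
        node = node.children[bisect_right(node.keys, key)]
    if key in node.keys:
        return node.values[node.keys.index(key)]
    return None

leaves = [
    Node([5, 13], values=[("p1", 1), ("p4", 2)]),
    Node([17, 24], values=[("p2", 1), ("p1", 3)]),
    Node([30, 39], values=[("p3", 2), ("p2", 4)]),
]
root = Node([17, 30], children=leaves)

found = search(root, 24)
missing = search(root, 29)
```

Every search touches one node per level, so the cost is the height of the tree, and because the keys in the leaves are in sorted order a range search can continue from the leaf where the equality search ends.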
Any Questions?
See you here on Thursday…