Database Management Systems, R. Ramakrishnan and J. Gehrke
Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation:
❶ Relational Algebra: More operational, very useful for representing execution plans.
❷ Relational Calculus: Lets users describe what they want, rather than how to compute it. (Non-operational, declarative.)
☛ Understanding Algebra & Calculus is key to understanding SQL and query processing!
Basic operations:
• Selection ( σ ): gives a subset of rows.
• Projection ( π ): deletes unwanted columns.
• Cross-product ( × ): combines two relations.
• Set-difference ( − ): tuples in relation 1, but not in relation 2.
• Union ( ∪ ): tuples in relation 1 or in relation 2.

Additional operations:
• Intersection ( ∩ ): tuples in both relations.
• Join ( ⋈ ): like × but only keep tuples where common fields are equal.
• Division ( / ): tuples from relation 1 with matches in relation 2.

Example instances of Reserves and Boats:

Reserves
sid  bid  day
22   101  10/10/96
58   103  11/12/96

Boats
bid  bname      color
101  Interlake  Blue
102  Interlake  Red
103  Clipper    Green
104  Marine     Red
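The basic operators can be sketched in a few lines of Python. This is only an illustration, assuming relations are modeled as sets of tuples and columns are addressed by position; the function names are made up for this sketch.

```python
# Relations modeled as Python sets of tuples (sets never hold duplicates).
boats = {  # (bid, bname, color)
    (101, "Interlake", "Blue"),
    (102, "Interlake", "Red"),
    (103, "Clipper", "Green"),
    (104, "Marine", "Red"),
}
reserves = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}  # (sid, bid, day)

def select(rel, pred):
    """Selection: keep only the rows that satisfy pred."""
    return {t for t in rel if pred(t)}

def project(rel, cols):
    """Projection: keep only the listed column positions."""
    return {tuple(t[c] for c in cols) for t in rel}

def cross(r1, r2):
    """Cross-product: pair every row of r1 with every row of r2."""
    return {a + b for a in r1 for b in r2}

red_boats = select(boats, lambda t: t[2] == "Red")
boat_names = project(boats, [1])
pairs = cross(reserves, boats)
not_red = boats - red_boats      # set-difference
all_again = red_boats | not_red  # union
```

Projection removes the duplicate "Interlake" because the result is a set, which matches the set semantics of relational algebra.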
Query Optimization and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Prediction: These relational operators are going to look hauntingly familiar when we get to them…!
Find names of sailors who've reserved a green boat
π_sname((σ_color='Green' Boats) ⋈ Reserves ⋈ Sailors)
Or better yet:
π_sname(π_sid(π_bid(σ_color='Green' Boats) ⋈ Reserves) ⋈ Sailors)
* Given the previous algebra, a query optimizer would replace it with this!
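As a rough check that the two expressions for the green-boat query agree, here is a small Python sketch. The Sailors rows are an assumption made for this example (the instance is not shown on the slide), and the helper functions are hypothetical, not part of any library.

```python
boats = [
    {"bid": 101, "bname": "Interlake", "color": "Blue"},
    {"bid": 102, "bname": "Interlake", "color": "Red"},
    {"bid": 103, "bname": "Clipper", "color": "Green"},
    {"bid": 104, "bname": "Marine", "color": "Red"},
]
reserves = [
    {"sid": 22, "bid": 101, "day": "10/10/96"},
    {"sid": 58, "bid": 103, "day": "11/12/96"},
]
# Assumed rows: the slide does not show the Sailors instance.
sailors = [{"sid": 22, "sname": "Dustin"}, {"sid": 58, "sname": "Rusty"}]

def select(rel, pred):
    return [t for t in rel if pred(t)]

def project(rel, cols):
    out = []
    for t in rel:
        row = {c: t[c] for c in cols}
        if row not in out:  # projection removes duplicates
            out.append(row)
    return out

def natural_join(r1, r2):
    out = []
    for a in r1:
        for b in r2:
            common = a.keys() & b.keys()
            if all(a[c] == b[c] for c in common):
                out.append({**a, **b})
    return out

green = select(boats, lambda t: t["color"] == "Green")

# First expression: join everything, then project.
plan1 = project(natural_join(natural_join(green, reserves), sailors), ["sname"])

# Second expression: project early so intermediate rows stay narrow.
bids = project(green, ["bid"])
sids = project(natural_join(bids, reserves), ["sid"])
plan2 = project(natural_join(sids, sailors), ["sname"])
```

Both plans produce the same answer; the second one simply carries fewer columns through each join.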
SELECT S.sid, MIN(R.day)
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red' AND S.rating > 5
GROUP BY S.sid
HAVING COUNT(*) >= 2

For each sailor with a rating > 5 who has reserved at least 2 red boats, find the sailor id and the earliest date on which the sailor has a reservation for a red boat.

[Figure: query plan, top to bottom: π S.sid, MIN(R.day); HAVING COUNT(*) >= 2; GROUP BY S.sid; σ B.color = 'red' ∧ S.rating > 5; joins over Sailors, Reserves and Boats.]
Relational algebra equivalences allow us to choose different join orders and to 'push' selections and projections ahead of joins.
Selections can be cascaded:
σ_{c1 ∧ … ∧ cn}(R) ≡ σ_c1(…(σ_cn(R))…)
[Figure: plan for the query above in which σ B.color = 'red' stays above the joins of Sailors, Reserves and Boats, while σ S.rating > 5 is applied directly to Sailors; GROUP BY S.sid, HAVING COUNT(*) >= 2 and π S.sid, MIN(R.day) remain on top.]
• Can apply these predicates separately
• Can 'push' S.rating > 5 down to Sailors
Selections can be commuted:
σ_c1(σ_c2(R)) ≡ σ_c2(σ_c1(R))
[Figure: the same query plan with the selections σ S.rating > 5 and σ B.color = 'red' applied in the opposite order.]
Can apply these predicates in a different order
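Both selection rules (cascading and commuting) are easy to check on a small relation. The sketch below assumes a tiny Sailors instance and two illustrative predicates; `sigma` is a hypothetical helper, not a library function.

```python
# Sailors(sid, sname, rating, age): a small illustrative instance.
sailors = [(1, "Frodo", 7, 22), (2, "Bilbo", 2, 39), (3, "Sam", 8, 27)]

def sigma(pred, rel):
    """Selection: keep the rows that satisfy pred."""
    return [t for t in rel if pred(t)]

c1 = lambda t: t[2] > 5   # rating > 5
c2 = lambda t: t[3] < 25  # age < 25 (an assumed second predicate)

combined = sigma(lambda t: c1(t) and c2(t), sailors)  # one conjunctive selection
cascaded = sigma(c1, sigma(c2, sailors))              # cascade: c2 first, then c1
commuted = sigma(c2, sigma(c1, sailors))              # commute: c1 first, then c2
```

All three forms return the same rows, which is what lets an optimizer split a WHERE clause and place each predicate where it is cheapest to apply.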
Projections can be cascaded:
π_{a1}(R) ≡ π_{a1}(…(π_{a1, …, an}(R))…)
[Figure: plan with π S.sid applied to Sailors after σ S.rating > 5, below the joins with Reserves and Boats; above the joins are σ B.color = 'red', GROUP BY S.sid, HAVING COUNT(*) >= 2 and π S.sid, MIN(R.day).]
Can project S.sid to reduce size of tuples
Eager projection
◦ Can cascade and "push" some projections through selection
◦ Can cascade and "push" some projections below one side of a join
◦ Rule of thumb: can project anything not needed "downstream"
[Figure: the query plan with projections pushed below the joins: π R.sid, R.bid, R.day on Reserves, π B.bid, B.color on Boats, π S.sid on Sailors (after σ S.rating > 5), and π S.sid, R.day before GROUP BY S.sid; the remaining projection lists are marked "??" for the reader to fill in.]
(R1 ⋈ R2) ⋈ R3 ≡ R1 ⋈ (R2 ⋈ R3)
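Reading the identity above as join associativity, the following sketch compares the two groupings with Python's built-in sqlite3 module. The table contents are a small illustrative instance, and the subqueries are only there to force each grouping to be written out explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age INTEGER);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
CREATE TABLE Boats (bid INTEGER, bname TEXT, color TEXT);
INSERT INTO Sailors VALUES (1,'Frodo',7,22), (2,'Bilbo',2,39), (3,'Sam',8,27);
INSERT INTO Reserves VALUES (1,102,'9/12'), (2,103,'9/13');
INSERT INTO Boats VALUES (101,'Nina','red'), (102,'Pinta','blue'), (103,'Santa Maria','red');
""")

# (Sailors JOIN Reserves) JOIN Boats
left_first = conn.execute("""
SELECT SR.sid, SR.bid, B.color
FROM (SELECT S.sid AS sid, R.bid AS bid
      FROM Sailors S JOIN Reserves R ON S.sid = R.sid) AS SR
JOIN Boats B ON SR.bid = B.bid
ORDER BY SR.sid
""").fetchall()

# Sailors JOIN (Reserves JOIN Boats)
right_first = conn.execute("""
SELECT S.sid, RB.bid, RB.color
FROM Sailors S
JOIN (SELECT R.sid AS sid, R.bid AS bid, B.color AS color
      FROM Reserves R JOIN Boats B ON R.bid = B.bid) AS RB
ON S.sid = RB.sid
ORDER BY S.sid
""").fetchall()
```

The answers match, so an optimizer is free to pick whichever join order it estimates to be cheaper.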
A domain is referred to in a relation schema by the domain name and has a set of associated values.
◦ Students(sid: string, name: string, login: string, age: integer, gpa: real)
◦ The set of values associated with domain string is the set of all character strings.
SQL is the most widely used relational query language.
Standardized (although most systems add their own "special sauce", including PostgreSQL).
We will study SQL92, a basic subset.
Two sublanguages:
◦ DDL (Data Definition Language): define and modify schema (at all 3 levels)
◦ DML (Data Manipulation Language): queries and IUD (insert, update, delete)
DBMS is responsible for efficient evaluation.
◦ Relational completeness means we can define precise semantics for relational queries.
◦ Optimizer can re-order operations, without affecting query answer.
◦ Choices driven by "cost model"
Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    102  9/12
2    102  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
CREATE TABLE Sailors (
  sid INTEGER NOT NULL,
  sname CHAR(20),
  rating INTEGER,
  age REAL,
  PRIMARY KEY (sid))

CREATE TABLE Boats (
  bid INTEGER NOT NULL,
  bname CHAR(20),
  color CHAR(10),
  PRIMARY KEY (bid))

CREATE TABLE Reserves (
  sid INTEGER NOT NULL,
  bid INTEGER NOT NULL,
  day DATE NOT NULL,
  PRIMARY KEY (sid, bid, day),
  FOREIGN KEY (sid) REFERENCES Sailors,
  FOREIGN KEY (bid) REFERENCES Boats)
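To see the key constraints in action, the schema can be loaded into an in-memory SQLite database through Python's sqlite3 module. This is a sketch under assumptions: SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and the rejected row (sid 99) is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;  -- SQLite enforces foreign keys only when enabled
CREATE TABLE Sailors (
  sid INTEGER NOT NULL, sname CHAR(20), rating INTEGER, age REAL,
  PRIMARY KEY (sid));
CREATE TABLE Boats (
  bid INTEGER NOT NULL, bname CHAR(20), color CHAR(10),
  PRIMARY KEY (bid));
CREATE TABLE Reserves (
  sid INTEGER NOT NULL, bid INTEGER NOT NULL, day DATE NOT NULL,
  PRIMARY KEY (sid, bid, day),
  FOREIGN KEY (sid) REFERENCES Sailors,
  FOREIGN KEY (bid) REFERENCES Boats);
INSERT INTO Sailors VALUES (1,'Frodo',7,22), (2,'Bilbo',2,39), (3,'Sam',8,27);
INSERT INTO Boats VALUES (101,'Nina','red'), (102,'Pinta','blue'), (103,'Santa Maria','red');
INSERT INTO Reserves VALUES (1,102,'9/12'), (2,102,'9/13');
""")

# A reservation for a sailor that does not exist (sid 99 is hypothetical).
try:
    conn.execute("INSERT INTO Reserves VALUES (99, 102, '9/14')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

reservation_count = conn.execute("SELECT COUNT(*) FROM Reserves").fetchone()[0]
```

The dangling reservation is refused and the table keeps its two valid rows.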
A foreign key constraint is an Integrity Constraint:
◦ a condition that must be true for any instance of the database;
◦ specified when schema is defined;
◦ checked when relations are modified.
Primary and foreign key constraints are the most common, but databases support more general constraints as well.
◦ e.g. domain constraints like: Rating must be between 1 and 10
ALTER TABLE SAILORS ADD CONSTRAINT RATING CHECK (RATING >= 1 AND RATING <= 10)
Or even more complex (and potentially nonsensical):
ALTER TABLE SAILORS ADD CONSTRAINT RATING CHECK (RATING*AGE/4 <= SID)
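A CHECK constraint like the rating rule can be tried out with Python's sqlite3 module. One assumption to flag: as far as I know SQLite does not accept ALTER TABLE ... ADD CONSTRAINT, so this sketch declares the constraint inside CREATE TABLE instead. The constraint name `rating_range` and the helper `try_insert` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Sailors (
  sid INTEGER PRIMARY KEY,
  sname TEXT,
  rating INTEGER,
  CONSTRAINT rating_range CHECK (rating >= 1 AND rating <= 10))
""")

def try_insert(sid, sname, rating):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Sailors VALUES (?, ?, ?)", (sid, sname, rating))
        return True
    except sqlite3.IntegrityError:
        return False

ok_mid = try_insert(1, "Frodo", 7)
ok_top = try_insert(2, "Bilbo", 10)    # 10 is allowed because the bound is inclusive
bad_zero = try_insert(3, "Sam", 0)     # 0 violates the lower bound
```

The inclusive upper bound matters: with `rating < 10` a sailor rated 10 would be rejected even though the rule says ratings run from 1 to 10.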
Specify them on CREATE or ALTER TABLE statements.

Column Constraints: expressions for a column constraint must produce boolean results and reference the related column's value only.
  NOT NULL | NULL | UNIQUE | PRIMARY KEY | CHECK (expression) |
  FOREIGN KEY (column) REFERENCES referenced_table [ ON DELETE action ] [ ON UPDATE action ]
  action is one of: NO ACTION, CASCADE, SET NULL, SET DEFAULT

Table Constraints:
  UNIQUE ( column_name [, ... ] ) |
  PRIMARY KEY ( column_name [, ... ] ) |
  CHECK ( expression ) |
  FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ON DELETE action ] [ ON UPDATE action ]
Here, expressions, keys, etc. can include multiple columns.
DBMSs have fairly sophisticated support for constraints!
…but they have drawbacks:
◦ Expensive
◦ Can't always return a meaningful error back to the application. e.g. What if you saw this error when you enrolled in a course online? "A violation of the constraint imposed by a unique index or a unique constraint occurred".
◦ Can be inconvenient. e.g. What if the 'Sailing Class' application wants to register new (unrated) sailors with rating 0?
So they aren't widely used
◦ Software developers often prefer to keep the integrity logic in applications instead
DML includes 4 main statements: SELECT (query), INSERT, UPDATE and DELETE. We'll spend a lot of time on SELECT.

e.g. To find the names of all 19 year old students:

SELECT S.name
FROM Students S
WHERE S.age = 19

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@ee    18   3.2
53650  Smith  smith@math  19   3.8

In relational algebra terms, the WHERE clause is a SELECT (σ) and the target list is a PROJECT (π).
Can specify a join over two tables as follows:

SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'B'

Enrolled
sid    cid          grade
53831  Carnatic101  C
53831  Reggae203    B
53650  Topology112  A
53666  History105   B

result =
S.name  E.cid
Jones   History105

In relational algebra terms, this query combines a JOIN (S.sid = E.sid), a SELECT (E.grade = 'B') and a PROJECT (S.name, E.cid).
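The join example can be reproduced with Python's sqlite3 module. The sketch loads the Students and Enrolled instances shown on the slides into an in-memory database; column types are chosen for convenience and are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid INTEGER, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Enrolled (sid INTEGER, cid TEXT, grade TEXT);
INSERT INTO Students VALUES
  (53666,'Jones','jones@cs',18,3.4),
  (53688,'Smith','smith@ee',18,3.2),
  (53650,'Smith','smith@math',19,3.8);
INSERT INTO Enrolled VALUES
  (53831,'Carnatic101','C'),
  (53831,'Reggae203','B'),
  (53650,'Topology112','A'),
  (53666,'History105','B');
""")

# Single-table query: names of all 19 year old students.
nineteen = conn.execute(
    "SELECT S.name FROM Students S WHERE S.age = 19").fetchall()

# Two-table join: students who got a B, with the course id.
b_grades = conn.execute("""
SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'B'
""").fetchall()
```

Student 53831 also has a B, but has no matching row in Students, so the join drops that enrollment.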
SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification

• relation-list: A list of relation names, possibly with a range-variable after each name.
• target-list: A list of attributes of tables in relation-list.
• DISTINCT: optional keyword indicating that the answer should not contain duplicates. In SQL, the default is that duplicates are not eliminated! (Result is called a "multiset".)
• qualification: Comparisons combined using AND, OR and NOT. Comparisons are Attr op const or Attr1 op Attr2, where op is one of <, >, =, ≤, ≥, ≠, etc.
Semantics of an SQL query are defined in terms of the following conceptual evaluation strategy:
1. FROM clause: compute cross-product of all tables.
2. WHERE clause: check conditions, discard tuples that fail (called "selection").
3. SELECT clause: delete unwanted fields (called "projection").
4. If DISTINCT specified, eliminate duplicate rows.
Probably the least efficient way to compute a query!
◦ An optimizer will find more efficient strategies to get the same answer.
Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    102  9/12
2    103  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
SELECT sname
FROM Sailors, Reserves
WHERE Sailors.sid = Reserves.sid AND bid = 103

Sailors × Reserves
sid  sname  rating  age  sid  bid  day
1    Frodo  7       22   1    102  9/12
1    Frodo  7       22   2    103  9/13
2    Bilbo  2       39   1    102  9/12
2    Bilbo  2       39   2    103  9/13
3    Sam    8       27   1    102  9/12
3    Sam    8       27   2    103  9/13
Question: If |S| is the cardinality of Sailors, and |R| is the cardinality of Reserves, what is the cardinality of Sailors × Reserves?
Answer: |S| * |R|. Here |Sailors × Reserves| = 3 × 2 = 6.
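The conceptual evaluation strategy can be traced step by step in plain Python. This sketch follows the four steps literally (cross-product, selection, projection, optional duplicate elimination) for the query that asks for the names of sailors who reserved boat 103; columns are addressed by position, which is an assumption of the sketch.

```python
from itertools import product

sailors = [(1, "Frodo", 7, 22), (2, "Bilbo", 2, 39), (3, "Sam", 8, 27)]
reserves = [(1, 102, "9/12"), (2, 103, "9/13")]

# Step 1 (FROM): cross-product of all tables.
# Resulting columns: 0 sid, 1 sname, 2 rating, 3 age, 4 sid, 5 bid, 6 day
cross = [s + r for s, r in product(sailors, reserves)]

# Step 2 (WHERE): keep rows with Sailors.sid = Reserves.sid AND bid = 103.
selected = [t for t in cross if t[0] == t[4] and t[5] == 103]

# Step 3 (SELECT): keep only the sname column.
projected = [t[1] for t in selected]

# Step 4 (DISTINCT, if requested): drop duplicates, keeping first occurrences.
distinct = list(dict.fromkeys(projected))
```

Only one of the six cross-product rows survives the WHERE clause, which is why a real optimizer avoids materializing the cross-product at all.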
Range variables:
• Used for shorthand
• Needed when ambiguity could arise, e.g. two tables with the same column name:

SELECT sname
FROM Sailors, Reserves
WHERE Sailors.sid = Reserves.sid AND Reserves.bid = 103

SELECT sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid AND R.bid = 103

Question: do range variables remind you of anything?
➢ Variables in relational calculus
e.g. a self-join:

SELECT R1.bid, R1.day
FROM Reserves R1, Reserves R2
WHERE R1.bid = R2.bid AND R1.day = R2.day AND R1.sid != R2.sid

Reserves (used twice, as R1 and as R2)
sid  bid  day
1    102  9/12
3    103  9/12
4    103  9/13
2    103  9/12

Result (one row for each ordered pair of matching reservations)
bid  day
103  9/12
103  9/12
What are we computing?
Boats reserved on the same day by different sailors
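The self-join can be run against the four-row Reserves instance with Python's sqlite3 module. This is a small sketch; the column is named `day` throughout, and dates are stored as plain text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Reserves VALUES
  (1,102,'9/12'), (3,103,'9/12'), (4,103,'9/13'), (2,103,'9/12');
""")

query = """
SELECT {distinct} R1.bid, R1.day
FROM Reserves R1, Reserves R2
WHERE R1.bid = R2.bid AND R1.day = R2.day AND R1.sid != R2.sid
"""

# Without DISTINCT each matching pair appears twice: once as (R1, R2)
# and once with the two roles swapped.
pairs = conn.execute(query.format(distinct="")).fetchall()
unique = conn.execute(query.format(distinct="DISTINCT")).fetchall()
```

Sailors 3 and 2 both reserved boat 103 on 9/12, so the pair shows up in both orders unless DISTINCT is added.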
• Can use arithmetic expressions in the SELECT clause (and other operations we'll discuss later):

SELECT S.age, S.age-5 AS age1, 2*S.age AS age2
FROM Sailors S
WHERE S.sname = 'Dustin'

• Can use AS to provide column names.

• Can also have expressions in the WHERE clause:

SELECT S1.sname AS name1, S2.sname AS name2
FROM Sailors S1, Sailors S2
WHERE 2*S1.rating = S2.rating - 1

• Can use "*" if you want all columns:

SELECT *
FROM Sailors x
WHERE x.age > 20

• "LIKE" is used for string matching. `_' stands for any one character and `%' stands for 0 or more arbitrary characters.

SELECT S.age, S.age-5 AS age1, 2*S.age AS age2
FROM Sailors S
WHERE S.sname LIKE 'B_l%o'
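To try the LIKE pattern and the AS column names, the query can be run with Python's sqlite3 module against the three-sailor instance used elsewhere in these slides. One caveat worth stating as an assumption: SQLite's LIKE is case-insensitive for ASCII letters by default, which other systems may not share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age INTEGER);
INSERT INTO Sailors VALUES (1,'Frodo',7,22), (2,'Bilbo',2,39), (3,'Sam',8,27);
""")

cur = conn.execute("""
SELECT S.age, S.age-5 AS age1, 2*S.age AS age2
FROM Sailors S
WHERE S.sname LIKE 'B_l%o'
""")
rows = cur.fetchall()

# Column names as reported by the cursor; AS sets the second and third.
aliases = [col[0] for col in cur.description][1:]
```

'Bilbo' matches because `_` consumes the "i" and `%` consumes the "b" between "l" and the final "o".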
Find sailors that have reserved at least one boat

SELECT DISTINCT S.sid
FROM Sailors S, Reserves R
WHERE S.sid = R.sid

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    102  9/12
2    103  9/12
2    102  9/13

Result
sid
1
2

How about:

SELECT S.sid
FROM Sailors S, Reserves R
WHERE S.sid = R.sid

Result
sid
1
2
2
How about:

SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27
4    Bilbo  5       32

Reserves
sid  bid  day
1    102  9/12
2    103  9/13
4    105  9/13

Result
sname
Frodo
Bilbo
Bilbo

vs:

SELECT DISTINCT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid

Result
sname
Frodo
Bilbo

Do we find all sailors that reserved at least one boat?
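The difference between the two sname queries is easy to see by running them with Python's sqlite3 module on the four-sailor instance. The results are sorted in Python because SQL gives no ordering guarantee without ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age INTEGER);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Sailors VALUES
  (1,'Frodo',7,22), (2,'Bilbo',2,39), (3,'Sam',8,27), (4,'Bilbo',5,32);
INSERT INTO Reserves VALUES (1,102,'9/12'), (2,103,'9/13'), (4,105,'9/13');
""")

plain = sorted(r[0] for r in conn.execute("""
SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid
"""))

distinct = sorted(r[0] for r in conn.execute("""
SELECT DISTINCT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid
"""))
```

With DISTINCT the two different sailors named Bilbo collapse into one row, so the name list no longer tells us how many sailors made reservations.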
ANDs, ORs, UNIONs and INTERSECTs

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    102  9/12
2    103  9/13
4    105  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
105  Titanic      green

With OR:

SELECT R.sid
FROM Boats B, Reserves R
WHERE (B.color = 'red' OR B.color = 'green')
  AND R.bid = B.bid

Result
sid
2
4

With AND:

SELECT R.sid
FROM Boats B, Reserves R
WHERE (B.color = 'red' AND B.color = 'green')
  AND R.bid = B.bid
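Running both versions with Python's sqlite3 module shows what changes when OR becomes AND. The sketch uses the Reserves and Boats rows from this slide; results are sorted in Python since the queries have no ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
CREATE TABLE Boats (bid INTEGER, bname TEXT, color TEXT);
INSERT INTO Reserves VALUES (1,102,'9/12'), (2,103,'9/13'), (4,105,'9/13');
INSERT INTO Boats VALUES
  (101,'Nina','red'), (102,'Pinta','blue'),
  (103,'Santa Maria','red'), (105,'Titanic','green');
""")

with_or = sorted(r[0] for r in conn.execute("""
SELECT R.sid FROM Boats B, Reserves R
WHERE (B.color = 'red' OR B.color = 'green') AND R.bid = B.bid
"""))

with_and = sorted(r[0] for r in conn.execute("""
SELECT R.sid FROM Boats B, Reserves R
WHERE (B.color = 'red' AND B.color = 'green') AND R.bid = B.bid
"""))
```

The AND version returns nothing: the condition is tested on one Boats row at a time, and a single boat cannot be both red and green.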
ANDs and ORs

Use INTERSECT instead of AND:

SELECT R.sid
FROM Boats B, Reserves R
WHERE B.color = 'red'
  AND R.bid = B.bid
INTERSECT
SELECT R.sid
FROM Boats B, Reserves R
WHERE B.color = 'green'
  AND R.bid = B.bid

Reserves
sid  bid  day
1    101  9/12
2    103  9/13
1    105  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
105  Titanic      green

Result: {1, 2} (red) ∩ {1} (green) = {1}
Exercise: try to rewrite this query using a self join instead of INTERSECT!
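Here is one way to check the INTERSECT query, together with one possible self-join rewrite, using Python's sqlite3 module. The rewrite is only a sketch of an answer to the exercise, not necessarily the intended one: it pairs a "red" reservation with a "green" reservation by the same sailor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
CREATE TABLE Boats (bid INTEGER, bname TEXT, color TEXT);
INSERT INTO Reserves VALUES (1,101,'9/12'), (2,103,'9/13'), (1,105,'9/13');
INSERT INTO Boats VALUES
  (101,'Nina','red'), (102,'Pinta','blue'),
  (103,'Santa Maria','red'), (105,'Titanic','green');
""")

with_intersect = conn.execute("""
SELECT R.sid FROM Boats B, Reserves R
WHERE B.color = 'red' AND R.bid = B.bid
INTERSECT
SELECT R.sid FROM Boats B, Reserves R
WHERE B.color = 'green' AND R.bid = B.bid
""").fetchall()

# Possible rewrite: join Reserves (and Boats) with itself on the sailor id.
with_self_join = conn.execute("""
SELECT DISTINCT R1.sid
FROM Boats B1, Reserves R1, Boats B2, Reserves R2
WHERE R1.sid = R2.sid
  AND R1.bid = B1.bid AND B1.color = 'red'
  AND R2.bid = B2.bid AND B2.color = 'green'
""").fetchall()
```

DISTINCT is needed in the rewrite because INTERSECT removes duplicates automatically while a plain join does not.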
Could also use UNION for the OR query:

SELECT R.sid
FROM Boats B, Reserves R
WHERE B.color = 'red' AND R.bid = B.bid
UNION
SELECT R.sid
FROM Boats B, Reserves R
WHERE B.color = 'green' AND R.bid = B.bid

Reserves
sid  bid  day
1    102  9/12
2    103  9/13
4    105  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
105  Titanic      green

Result: {2} (red) ∪ {4} (green) = {2, 4}
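The UNION form can be compared with the OR form using Python's sqlite3 module. The sketch below uses the Reserves and Boats rows from this slide and sorts the results in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
CREATE TABLE Boats (bid INTEGER, bname TEXT, color TEXT);
INSERT INTO Reserves VALUES (1,102,'9/12'), (2,103,'9/13'), (4,105,'9/13');
INSERT INTO Boats VALUES
  (101,'Nina','red'), (102,'Pinta','blue'),
  (103,'Santa Maria','red'), (105,'Titanic','green');
""")

with_union = sorted(r[0] for r in conn.execute("""
SELECT R.sid FROM Boats B, Reserves R
WHERE B.color = 'red' AND R.bid = B.bid
UNION
SELECT R.sid FROM Boats B, Reserves R
WHERE B.color = 'green' AND R.bid = B.bid
"""))

with_or = sorted(r[0] for r in conn.execute("""
SELECT R.sid FROM Boats B, Reserves R
WHERE (B.color = 'red' OR B.color = 'green') AND R.bid = B.bid
"""))
```

On this instance both forms agree. In general UNION also removes duplicates, so the two can differ when a sailor has several qualifying reservations.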
Find sids of sailors who have not reserved a boat

SELECT S.sid
FROM Sailors S
EXCEPT
SELECT S.sid
FROM Sailors S, Reserves R
WHERE S.sid = R.sid

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    102  9/12
2    103  9/13
1    105  9/13

First find the set of sailors who have reserved a boat… and then compare it with the rest of the sailors.

Result
sid
3
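The EXCEPT query can be checked with Python's sqlite3 module on the instance from this slide. The intermediate query is included only to show the set that gets subtracted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age INTEGER);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Sailors VALUES (1,'Frodo',7,22), (2,'Bilbo',2,39), (3,'Sam',8,27);
INSERT INTO Reserves VALUES (1,102,'9/12'), (2,103,'9/13'), (1,105,'9/13');
""")

# Sailors who have reserved at least one boat.
have_reserved = sorted(r[0] for r in conn.execute("""
SELECT DISTINCT S.sid FROM Sailors S, Reserves R WHERE S.sid = R.sid
"""))

# All sailors minus those who have reserved a boat.
never_reserved = conn.execute("""
SELECT S.sid FROM Sailors S
EXCEPT
SELECT S.sid FROM Sailors S, Reserves R WHERE S.sid = R.sid
""").fetchall()
```

Sam (sid 3) is the only sailor left after subtracting the sailors who appear in Reserves.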