Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
Introduction to Data Management
Lecture #13 (Relational Calculus, Continued)
Instructor: Mike Carey [email protected]
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
It’s time for another installment of....
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3
Announcements v HW and exams:
§ HW #4 info • Underway, due on Tuesday (6 PM)
§ Midterm Exam 1 • Grading is underway, goal is 2-week turnaround
v Today’s plan: § Relational calculus, Episode 2 § On to SQL (if time)!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4
Ex: Wisconsin Sailing Club Database
sid sname rating age
22 Dustin 7 45.0
29 Brutus 1 33.0
31 Lubber 8 55.5
32 Andy 8 25.5
58 Rusty 10 35.0
64 Horatio 7 35.0
71 Zorba 10 16.0
74 Horatio 9 35.0
85 Art 4 25.5
95 Bob 3 63.5
sid bid date
22 101 10/10/98
22 102 10/10/98
22 103 10/8/98
22 104 10/7/98
31 102 11/10/98
31 103 11/6/98
31 104 11/12/98
64 101 9/5/98
64 102 9/8/98
74 103 9/8/93
bid bname color
101 Interlake blue
102 Interlake red
103 Clipper green
104 Marine red
Sailors Reserves Boats
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5
Tuple Relational Calculus
v Query in TRC has the form: { t(a1,a2,...) | P(t) }
v Answer includes all possible tuples t with the specified schema that make formula P(t) true.
v Formula is recursively defined, starting with simple atomic formulas and building up bigger formulas using logical connectives.
∃∀∈∉¬∧∨⇒≤≥≠
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6
Review: Find sailors with a rating > 7
v Simplest answer, if all Sailors attributes desired: { s | s ∈ Sailors ∧ s.rating > 7 }
v Also equivalent to a more general formulation: { t(sid, sname, rating, age) | ∃s ∈ Sailors
( t.sid = s.sid ∧ t.sname = s.sname ∧ t.rating=s.rating ∧ t.age = s.age ∧ s.rating > 7 ) } (Notice how each specifies the answer’s schema and values.)
Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7 ...
Ex: TRC Query Semantics
sid sname rating age
22 Dustin 7 45.0
29 Brutus 1 33.0
31 Lubber 8 55.5
32 Andy 8 25.5
58 Rusty 10 35.0
64 Horatio 7 35.0
71 Zorba 10 16.0
74 Horatio 9 35.0
85 Art 4 25.5
95 Bob 3 63.5
sid bid date
22 101 10/10/98
bid bname color
101 Interlake blue
Sailors
sid sname rating age
22 Dustin 7 45.0
nid nname nvalue
1 Pi 3.14159...
sid sname rating age
58 Rusty 10 35.0
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8
Review: Find ids of sailors who are older than 30.0 or who have a rating under 8 and are named “Horatio”
{ t(sid) | ∃s ∈ Sailors ( (s.age > 30.0 ∨ (s.rating < 8 ∧ s.sname = “Horatio”)) ∧ t.sid = s.sid ) } v Things to notice:
§ Again, how result schema and values are specified § Use of Boolean formula to specify the query constraints § Highly declarative nature of this form of query language!
Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9
Unsafe Queries and Expressive Power
v It is possible to write syntactically correct calculus queries that have an infinite number of answers! Such queries are called unsafe. § E.g., s | s ∉ Sailors
v It is known that every query that can be expressed in relational algebra can be expressed as a safe query in DRC / TRC; the converse is also true.
v Relational Completeness: Query language (e.g., SQL) can express every query that is expressible in relational algebra/calculus.
∃∀∈∉¬∧∨⇒≤≥≠
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10
Find names of sailors who’ve reserved a red boat Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)
{ t(sname) | ∃s ∈ Sailors (t.sname = s.sname ∧ ∃r ∈ Reserves (r.sid = s.sid ∧
∃b ∈ Boats (b.bid = r.bid ∧ b.color = ‘red’))) } v Things to notice:
§ Again, how result schema and values are specified § How joins appear here as value-matching predicates § Highly declarative nature of this form of query language!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11
Find ids of sailors who’ve reserved a red boat or a green boat
Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)
{ t(sid) | ∃s ∈ Sailors (t.sid = s.sid ∧ ∃r ∈ Reserves (r.sid = s.sid ∧
∃b ∈ Boats (b.bid = r.bid ∧ (b.color = ‘red’ ∨ b.color = ‘green’ )))) }
v Things to notice:
§ Just had to add a disjunctive test to the previous query!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12
Find ids of sailors who’ve reserved a red boat and a green boat
Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)
{ t(sid) | ∃s ∈ Sailors (t.sid = s.sid ∧ ∃r1 ∈ Reserves (r1.sid = s.sid ∧
∃b1 ∈ Boats (b1.bid = r1.bid ∧ b1.color = ‘red’)) ∧ ∃r2 ∈ Reserves (r2.sid = s.sid ∧
∃b2 ∈ Boats (b2.bid = r2.bid ∧ b2.color = ‘green’)))} v Things to notice:
§ This required several more variables! (Q: Why?) § Q: Could we have done this with just s, r, b1, and b2? (And why?) § Think of tuple variables as fingers pointing at the tables’ rows
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13
sid sname rating age
22 Dustin 7 45.0
29 Brutus 1 33.0
31 Lubber 8 55.5
32 Andy 8 25.5
58 Rusty 10 35.0
64 Horatio 7 35.0
71 Zorba 10 16.0
74 Horatio 9 35.0
85 Art 4 25.5
95 Bob 3 63.5
sid bid date
22 101 10/10/98
22 102 10/10/98
22 103 10/8/98
22 104 10/7/98
31 102 11/10/98
31 103 11/6/98
31 104 11/12/98
64 101 9/5/98
64 102 9/8/98
74 103 9/8/93
bid bname color
101 Interlake blue
102 Interlake red
103 Clipper green
104 Marine red
Sailors
Reserves
Boats
Example: Tuple Variable Bindings
s
r1 r2
b1 b2
(Bindings at one point in time J)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14
Find the names of sailors who’ve reserved all boats
Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)
{ t(sname) | ∃s ∈ Sailors (t.sname = s.sname ∧ ∀b ∈ Boats (∃r ∈ Reserves (r.sid = s.sid ∧ b.bid = r.bid) ) ) }
v Things to notice:
§ How universal quantification addresses the “all” query use case § Highly declarative nature of this form of query language!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15
Find the names of sailors who’ve reserved all Interlake boats
Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)
{ t(sname) | ∃s ∈ Sailors (t.sname = s.sname ∧
∀b ∈ Boats (b.bname = ‘Interlake’ ⇒ ( ∃r ∈ Reserves (r.sid = s.sid ∧ b.bid = r.bid) ) ) ) }
v Or, if you prefer:
{ t(sname) | ∃s ∈ Sailors (t.sname = s.sname ∧
∀b ∈ Boats (b.bname ≠ ‘Interlake’ ∨ ( ∃r ∈ Reserves (r.sid = s.sid ∧ b.bid = r.bid) ) ) ) }
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16
Relational Calculus Summary
v Relational calculus is non-operational, and users define queries in terms of what they want, not in terms of how to compute it. (Declarativeness: “What, not how!”)
v Algebra and safe calculus subset have same expressive power, leading to the notion of relational completeness for query languages.
v Two calculus variants: TRC (tuple relational calculus, which we’ve studied here) and DRC (domain relational calculus).
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17
SQL-Based DBMSs
v Commercial RDBMS choices include § DB2 (IBM) § Oracle § SQL Server (Microsoft) § Teradata
v Open source RDBMS options include § MySQL § PostgreSQL
v And for so-called “Big Data”, we also have § Apache Hive (on Hadoop) + newer wannabees
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18
Example Instances
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
sid bid day22 101 10/10/9658 103 11/12/96
R1
S1
S2
v We’ll use these instances of our usual Sailors and Reserves relations in our examples.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19
Example Data in MySQL
Sailors Reserves
Boats
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 20
Basic SQL Query
v relation-list A list of relation names (possibly with a range-variable after each name).
v target-list A list of attributes of relations in relation-list v qualification Comparisons (Attr op const or Attr1 op
Attr2, where op is one of <, <=, =, >, >=, <>) combined using AND, OR and NOT.
v DISTINCT is an optional keyword indicating that the answer should not contain duplicates. Default is that duplicates are not eliminated! (Bags, not sets.)
SELECT [DISTINCT] target-list FROM relation-list WHERE qualification
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 21
Conceptual Evaluation Strategy
v Semantics of an SQL query defined in terms of the following conceptual evaluation strategy: § Compute the cross-product of relation-list. (✕) § Discard resulting tuples if they fail qualifications. (σ) § Project out attributes that are not in target-list. (π) § If DISTINCT is specified, eliminate duplicate rows. (δ)
v This strategy is probably the least efficient way to compute a query! An optimizer will find more efficient strategies to compute the same answers.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 22
Example of Conceptual Evaluation SELECT S.sname FROM Sailors S, Reserves R ß using S1 WHERE S.sid=R.sid AND R.bid=103
(sid) sname rating age (sid) bid day 22 dustin 7 45.0 22 101 10/10/96 22 dustin 7 45.0 58 103 11/12/96 31 lubber 8 55.5 22 101 10/10/96 31 lubber 8 55.5 58 103 11/12/96 58 rusty 10 35.0 22 101 10/10/96
58 rusty 10 35.0 58 103 11/12/96
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 23
A Note on Range Variables
v Really needed only if the same relation appears twice (or more) in the FROM clause. The previous query can also be written as: SELECT sname FROM Sailors S, Reserves R WHERE S.sid=R.sid AND bid=103
SELECT sname FROM Sailors, Reserves WHERE Sailors.sid=Reserves.sid AND bid=103
It is good style, however, to use range variables always!
OR
Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 24
Find sailors who’ve reserved at least one boat
v Would adding DISTINCT to this query make a difference? (With our data? With possible data?)
v What is the effect of replacing S.sid by S.sname in the SELECT clause? Would adding DISTINCT to this variant of the query make a difference?
SELECT S.sid FROM Sailors S, Reserves R WHERE S.sid=R.sid
Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 25
Expressions and Strings
v Illustrates use of arithmetic expressions and string pattern matching: Find triples (of ages of sailors and two fields defined by expressions) for sailors whose names begin and end with B and contain at least three characters.
v AS provides a way to (re)name fields in result. v LIKE is used for string matching. `_’ stands for any
one character and `%’ stands for 0 or more arbitrary characters. (See MySQL docs for more info...)
SELECT S.sname, S.age, 7 * S.age AS dogyears FROM Sailors S WHERE S.sname LIKE ‘B_%B’
Sailors(sid, sname, rating, age)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 26
To Be Continued!
v More SQL ahead!