13
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Introduction to Data Management Lecture #13 (Relational Calculus, Continued) Instructor: Mike Carey [email protected] Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2 It’s time for another installment of....

Introduction to Data Management Lecture #13 (Relational

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Introduction to Data Management

Lecture #13 (Relational Calculus, Continued)

Instructor: Mike Carey [email protected]

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2

It’s time for another installment of....

Page 2: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3

Announcements v  HW and exams:

§  HW #4 info • Underway, due on Tuesday (6 PM)

§  Midterm Exam 1 • Grading is underway, goal is 2-week turnaround

v  Today’s plan: §  Relational calculus, Episode 2 §  On to SQL (if time)!

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4

Ex: Wisconsin Sailing Club Database

sid sname rating age

22 Dustin 7 45.0

29 Brutus 1 33.0

31 Lubber 8 55.5

32 Andy 8 25.5

58 Rusty 10 35.0

64 Horatio 7 35.0

71 Zorba 10 16.0

74 Horatio 9 35.0

85 Art 4 25.5

95 Bob 3 63.5

sid bid date

22 101 10/10/98

22 102 10/10/98

22 103 10/8/98

22 104 10/7/98

31 102 11/10/98

31 103 11/6/98

31 104 11/12/98

64 101 9/5/98

64 102 9/8/98

74 103 9/8/93

bid bname color

101 Interlake blue

102 Interlake red

103 Clipper green

104 Marine red

Sailors Reserves Boats

Page 3: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5

Tuple Relational Calculus

v  Query in TRC has the form: { t(a1,a2,...) | P(t) }

v  Answer includes all possible tuples t with the specified schema that make formula P(t) true.

v  Formula is recursively defined, starting with simple atomic formulas and building up bigger formulas using logical connectives.

∃∀∈∉¬∧∨⇒≤≥≠

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6

Review: Find sailors with a rating > 7

v  Simplest answer, if all Sailors attributes desired: { s | s ∈ Sailors ∧ s.rating > 7 }

v  Also equivalent to a more general formulation: { t(sid, sname, rating, age) | ∃s ∈ Sailors

( t.sid = s.sid ∧ t.sname = s.sname ∧ t.rating=s.rating ∧ t.age = s.age ∧ s.rating > 7 ) } (Notice how each specifies the answer’s schema and values.)

Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)

Page 4: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7 ...

Ex: TRC Query Semantics

sid sname rating age

22 Dustin 7 45.0

29 Brutus 1 33.0

31 Lubber 8 55.5

32 Andy 8 25.5

58 Rusty 10 35.0

64 Horatio 7 35.0

71 Zorba 10 16.0

74 Horatio 9 35.0

85 Art 4 25.5

95 Bob 3 63.5

sid bid date

22 101 10/10/98

bid bname color

101 Interlake blue

Sailors

sid sname rating age

22 Dustin 7 45.0

nid nname nvalue

1 Pi 3.14159...

sid sname rating age

58 Rusty 10 35.0

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8

Review: Find ids of sailors who are older than 30.0 or who have a rating under 8 and are named “Horatio”

{ t(sid) | ∃s ∈ Sailors ( (s.age > 30.0 ∨ (s.rating < 8 ∧ s.sname = “Horatio”)) ∧ t.sid = s.sid ) } v  Things to notice:

§  Again, how result schema and values are specified §  Use of Boolean formula to specify the query constraints §  Highly declarative nature of this form of query language!

Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)

Page 5: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9

Unsafe Queries and Expressive Power

v  It is possible to write syntactically correct calculus queries that have an infinite number of answers! Such queries are called unsafe. §  E.g., s | s ∉ Sailors

v  It is known that every query that can be expressed in relational algebra can be expressed as a safe query in DRC / TRC; the converse is also true.

v  Relational Completeness: Query language (e.g., SQL) can express every query that is expressible in relational algebra/calculus.

∃∀∈∉¬∧∨⇒≤≥≠

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10

Find names of sailors who’ve reserved a red boat Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)

{ t(sname) | ∃s ∈ Sailors (t.sname = s.sname ∧ ∃r ∈ Reserves (r.sid = s.sid ∧

∃b ∈ Boats (b.bid = r.bid ∧ b.color = ‘red’))) } v  Things to notice:

§  Again, how result schema and values are specified §  How joins appear here as value-matching predicates §  Highly declarative nature of this form of query language!

Page 6: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11

Find ids of sailors who’ve reserved a red boat or a green boat

Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)

{ t(sid) | ∃s ∈ Sailors (t.sid = s.sid ∧ ∃r ∈ Reserves (r.sid = s.sid ∧

∃b ∈ Boats (b.bid = r.bid ∧ (b.color = ‘red’ ∨ b.color = ‘green’ )))) }

v  Things to notice:

§  Just had to add a disjunctive test to the previous query!

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12

Find ids of sailors who’ve reserved a red boat and a green boat

Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)

{ t(sid) | ∃s ∈ Sailors (t.sid = s.sid ∧ ∃r1 ∈ Reserves (r1.sid = s.sid ∧

∃b1 ∈ Boats (b1.bid = r1.bid ∧ b1.color = ‘red’)) ∧ ∃r2 ∈ Reserves (r2.sid = s.sid ∧

∃b2 ∈ Boats (b2.bid = r2.bid ∧ b2.color = ‘green’)))} v  Things to notice:

§  This required several more variables! (Q: Why?) §  Q: Could we have done this with just s, r, b1, and b2? (And why?) §  Think of tuple variables as fingers pointing at the tables’ rows

Page 7: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13

sid sname rating age

22 Dustin 7 45.0

29 Brutus 1 33.0

31 Lubber 8 55.5

32 Andy 8 25.5

58 Rusty 10 35.0

64 Horatio 7 35.0

71 Zorba 10 16.0

74 Horatio 9 35.0

85 Art 4 25.5

95 Bob 3 63.5

sid bid date

22 101 10/10/98

22 102 10/10/98

22 103 10/8/98

22 104 10/7/98

31 102 11/10/98

31 103 11/6/98

31 104 11/12/98

64 101 9/5/98

64 102 9/8/98

74 103 9/8/93

bid bname color

101 Interlake blue

102 Interlake red

103 Clipper green

104 Marine red

Sailors

Reserves

Boats

Example: Tuple Variable Bindings

s

r1 r2

b1 b2

(Bindings at one point in time J)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14

Find the names of sailors who’ve reserved all boats

Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)

{ t(sname) | ∃s ∈ Sailors (t.sname = s.sname ∧ ∀b ∈ Boats (∃r ∈ Reserves (r.sid = s.sid ∧ b.bid = r.bid) ) ) }

v  Things to notice:

§  How universal quantification addresses the “all” query use case §  Highly declarative nature of this form of query language!

Page 8: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15

Find the names of sailors who’ve reserved all Interlake boats

Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)

{ t(sname) | ∃s ∈ Sailors (t.sname = s.sname ∧

∀b ∈ Boats (b.bname = ‘Interlake’ ⇒ ( ∃r ∈ Reserves (r.sid = s.sid ∧ b.bid = r.bid) ) ) ) }

v  Or, if you prefer:

{ t(sname) | ∃s ∈ Sailors (t.sname = s.sname ∧

∀b ∈ Boats (b.bname ≠ ‘Interlake’ ∨ ( ∃r ∈ Reserves (r.sid = s.sid ∧ b.bid = r.bid) ) ) ) }

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16

Relational Calculus Summary

v  Relational calculus is non-operational, and users define queries in terms of what they want, not in terms of how to compute it. (Declarativeness: “What, not how!”)

v  Algebra and safe calculus subset have same expressive power, leading to the notion of relational completeness for query languages.

v  Two calculus variants: TRC (tuple relational calculus, which we’ve studied here) and DRC (domain relational calculus).

Page 9: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17

SQL-Based DBMSs

v  Commercial RDBMS choices include §  DB2 (IBM) §  Oracle §  SQL Server (Microsoft) §  Teradata

v  Open source RDBMS options include §  MySQL §  PostgreSQL

v  And for so-called “Big Data”, we also have §  Apache Hive (on Hadoop) + newer wannabees

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18

Example Instances

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

sid bid day22 101 10/10/9658 103 11/12/96

R1

S1

S2

v  We’ll use these instances of our usual Sailors and Reserves relations in our examples.

Page 10: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19

Example Data in MySQL

Sailors Reserves

Boats

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 20

Basic SQL Query

v  relation-list A list of relation names (possibly with a range-variable after each name).

v  target-list A list of attributes of relations in relation-list v  qualification Comparisons (Attr op const or Attr1 op

Attr2, where op is one of <, <=, =, >, >=, <>) combined using AND, OR and NOT.

v  DISTINCT is an optional keyword indicating that the answer should not contain duplicates. Default is that duplicates are not eliminated! (Bags, not sets.)

SELECT [DISTINCT] target-list FROM relation-list WHERE qualification

Page 11: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 21

Conceptual Evaluation Strategy

v  Semantics of an SQL query defined in terms of the following conceptual evaluation strategy: §  Compute the cross-product of relation-list. (✕) §  Discard resulting tuples if they fail qualifications. (σ) §  Project out attributes that are not in target-list. (π) §  If DISTINCT is specified, eliminate duplicate rows. (δ)

v  This strategy is probably the least efficient way to compute a query! An optimizer will find more efficient strategies to compute the same answers.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 22

Example of Conceptual Evaluation SELECT S.sname FROM Sailors S, Reserves R ß using S1 WHERE S.sid=R.sid AND R.bid=103

(sid) sname rating age (sid) bid day 22 dustin 7 45.0 22 101 10/10/96 22 dustin 7 45.0 58 103 11/12/96 31 lubber 8 55.5 22 101 10/10/96 31 lubber 8 55.5 58 103 11/12/96 58 rusty 10 35.0 22 101 10/10/96

58 rusty 10 35.0 58 103 11/12/96

Page 12: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 23

A Note on Range Variables

v  Really needed only if the same relation appears twice (or more) in the FROM clause. The previous query can also be written as: SELECT sname FROM Sailors S, Reserves R WHERE S.sid=R.sid AND bid=103

SELECT sname FROM Sailors, Reserves WHERE Sailors.sid=Reserves.sid AND bid=103

It is good style, however, to use range variables always!

OR

Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 24

Find sailors who’ve reserved at least one boat

v  Would adding DISTINCT to this query make a difference? (With our data? With possible data?)

v  What is the effect of replacing S.sid by S.sname in the SELECT clause? Would adding DISTINCT to this variant of the query make a difference?

SELECT S.sid FROM Sailors S, Reserves R WHERE S.sid=R.sid

Sailors(sid, sname, rating, age) Reserves(sid, bid, day) Boats(bid, bname, color)

Page 13: Introduction to Data Management Lecture #13 (Relational

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 25

Expressions and Strings

v  Illustrates use of arithmetic expressions and string pattern matching: Find triples (of ages of sailors and two fields defined by expressions) for sailors whose names begin and end with B and contain at least three characters.

v  AS provides a way to (re)name fields in result. v  LIKE is used for string matching. `_’ stands for any

one character and `%’ stands for 0 or more arbitrary characters. (See MySQL docs for more info...)

SELECT S.sname, S.age, 7 * S.age AS dogyears FROM Sailors S WHERE S.sname LIKE ‘B_%B’

Sailors(sid, sname, rating, age)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 26

To Be Continued!

v  More SQL ahead!