
CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS


Page 1: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

1

CHAPTER 4

RELATIONAL ALGEBRA AND CALCULUS

Page 2: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

2

Introduction

- We discuss here two mathematical formalisms that can serve as the basis for stating and evaluating queries in a specific user-oriented language.

- SQL is based on relational algebra.

- QBE (not included in the new edition of the text) and Datalog (see chapter 24) are based on relational calculus.

- The mathematical formalisms introduce various mathematical operators whose application yields answers to the queries.

Page 3: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

3

PRELIMINARIES

- The inputs and outputs of a query are relational instances.

- We shall present the mathematical operators as operating on relations (sets of rows). However, the cost of eliminating duplicate rows has led in practice to the use of “multisets” (also called bags) rather than sets. We shall therefore also present the necessary operations on relations that consist of multisets of rows.

- We shall always illustrate the mathematical operations through examples. To that effect we shall use a sample database defined by the schema (shown on p. 101) and the instances shown in figures 4.1, 4.2, 4.3. In some cases we shall use larger instances under the same schema.

- Note that a query can usually be expressed by several different expressions. We shall often compare these alternatives.

- We shall give unique identifying numbers to our illustrative queries.

Page 4: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

4

SAMPLE SCHEMAS AND INSTANCES

The Schemas:

Sailors(sid: integer, sname: string, rating: integer, age: real)

Boats(bid: integer, bname: string, color: string)

Reserves(sid: integer, bid: integer, day: date)

The Instances:

Page 5: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

5

SEMANTICS OF THE SAMPLE RELATIONS

• Sailors: Entity set; lists the relevant properties of sailors.

• Boats: Entity set; lists the relevant properties of boats.

• Reserves: Relationship set: links sailors and boats by describing the boat number and date for which a sailor made a reservation.

Example of the declarative sentences for which rows stand:

Row 1: “Sailor ‘22’ reserved boat number ‘101’ on 10/10/98”.

N.B. The declarative sentence is obvious in this case, but this may not always be the case, especially for relations which express complex relationship sets with complicated constraints. For example we might add the constraint that a sailor can only reserve one boat on any given date; or, worse yet that a sailor can only reserve up to two boats on any given date.

Page 6: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

6

Relational Algebra

Page 7: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

7

Selection and Projection

• Selection Operator: σ_{rating>8}(S2)

Retrieves from the current instance of the relation named S2 those rows where the value of the attribute ‘rating’ is greater than 8. The subscript (here rating>8) is the selection condition.

Applying the above selection operator to the sample instance of S2 shown in figure 4.2 yields the relational instance shown in figure 4.4.

Page 8: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

8

• Projection Operator: π_{sname,rating}(S2)

Retrieves from the current instance of the relation named S2 those columns whose names are ‘sname’ and ‘rating’.

Applying the above operator to the sample instance of S2 shown in figure 4.2 yields the relational instance shown in figure 4.5.

N.B.: Projection can map several input rows to the same output row. Under set semantics these duplicates are eliminated from the resulting instance.

Page 9: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

9

Remark Concerning Duplicate Rows

• Duplicate rows are not permitted in relational algebra and calculus.

• Duplicate rows can occur in SQL, though they may be controlled by explicit keywords.

Page 10: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

10

- Projection Operator (cont’d)

Similarly, π_{age}(S2) yields the following relational instance:

Page 11: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

11

FUNCTIONAL APPLICATION OF SELECTION AND PROJECTION OPERATORS

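The selection and projection operators can be composed as many times as desired, using the usual functional notation, because each operator takes a relation as input and returns a relation.

Thus the expression

π_{sname,rating}(σ_{rating>8}(S2))

yields the relational instance obtained by first selecting the rows of S2 with a rating above 8 and then keeping only their sname and rating columns.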
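As a rough illustration of how selection and projection compose, here is a minimal Python sketch that treats a relation as a set of named tuples. The helper names `select` and `project` and the sample rows are assumptions made for this example; the rows are not the S2 instance of figure 4.2.

```python
from typing import NamedTuple


class Sailor(NamedTuple):
    sid: int
    sname: str
    rating: int
    age: float


def select(pred, relation):
    """Selection: keep the rows that satisfy the predicate."""
    return {row for row in relation if pred(row)}


def project(fields, relation):
    """Projection: keep the named columns; the set drops duplicate rows."""
    return {tuple(getattr(row, f) for f in fields) for row in relation}


# Hypothetical instance, invented for illustration only.
s2 = {
    Sailor(1, "Ann", 9, 35.0),
    Sailor(2, "Ben", 8, 55.5),
    Sailor(3, "Cy", 5, 35.0),
    Sailor(4, "Dee", 10, 35.0),
}

# Projection applied to the result of a selection.
names_ratings = project(("sname", "rating"), select(lambda r: r.rating > 8, s2))

# Projecting on age alone collapses the three rows that share age 35.0.
ages = project(("age",), s2)
```

Because both helpers return a Python set, the output of one can be fed directly to the other, which is the property the functional notation relies on.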

Page 12: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

12

SET OPERATIONS

The following set operations are also available in relational algebra:

• Union*

• Intersection*

• Set-difference*

• Cross-product

N.B.

(1) The asterisks indicate operations whose operand relations must be union-compatible. Two relation instances are said to be union-compatible if:

- they have the same number of fields,

- corresponding fields have the same domains.

- they have the same semantics.

(2) Set operations give different results on sets and on multisets, so we shall examine the two cases separately.

Page 13: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

13

EXAMPLES OF SET OPERATIONS ON RELATIONS

• Union: Given the sample instances S1 and S2

The union of S1 and S2, i.e. S1 ∪ S2 is shown below

Page 14: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

14

• Given the two sample relational instances

We can form:

Intersection: S1 ∩ S2

Set-Difference: S1 − S2

Page 15: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

15

Given the two sample relational instances S1 and R1

We can form the cross-product S1 × R1:

Page 16: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

16

MULTISET OPERATIONS

Examples: Given two multisets

M1 = {a, a, b, b, c, d, f}

M2 = {a, b, b, d, e, e}

We can form

MU = M1 ∪ M2 = {a, a, a, b, b, b, b, c, d, d, f, e, e} (multiplicities are added)

MI = M1 ∩ M2 = {a, b, b, d} (the smaller multiplicity is kept)

MD = M1 − M2 = {a, c, f} (multiplicities are subtracted)

MC = M1 × M2 = {<a, a>, <a, b>, <a, b>, <a, d>, <a, e>, <a, e>, <a, a>, <a, b>, <a, b>, <a, d>, <a, e>, <a, e>, <b, a>, <b, b>, <b, b>, <b, d>, <b, e>, <b, e>, <b, a>, <b, b>, <b, b>, <b, d>, <b, e>, <b, e>, <c, a>, <c, b>, <c, b>, <c, d>, <c, e>, <c, e>, <d, a>, <d, b>, <d, b>, <d, d>, <d, e>, <d, e>, <f, a>, <f, b>, <f, b>, <f, d>, <f, e>, <f, e>} (7 × 6 = 42 pairs)
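The multiset results for M1 and M2 can be checked mechanically. The sketch below uses Python's `collections.Counter` as a bag; the variable names are chosen for this example. Note that `Counter` also offers a `|` operator, which keeps the larger multiplicity and is therefore a different notion of union from the additive one used on this slide.

```python
from collections import Counter
from itertools import product

m1 = Counter("aabbcdf")   # M1 = {a, a, b, b, c, d, f}
m2 = Counter("abbdee")    # M2 = {a, b, b, d, e, e}

union = m1 + m2           # additive bag union: multiplicities are summed
intersection = m1 & m2    # minimum of the two multiplicities
difference = m1 - m2      # multiplicities subtracted, floored at zero

# Cross-product: every copy in M1 paired with every copy in M2.
cross = Counter(product(m1.elements(), m2.elements()))
```

Here `sorted(union.elements())` lists 13 elements (7 from M1 plus 6 from M2), and the cross-product holds 42 pairs in total, of which `<a, b>` occurs four times.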

Page 17: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

17

RENAMING OPERATOR ρ

Purpose: to avoid name conflicts and to permit naming anonymous relational instances (such as query answers).

Example: the expression

ρ(C(1 → sid1, 5 → sid2), S1 × R1)

returns a relational instance that contains the tuples shown in figure 4.11 and has the schema

C(sid1: integer, sname: string, rating: integer, age: real, sid2: integer, bid: integer, day: date)

The parts of the expression are:

- C: name of the new relation.

- 1 → sid1, 5 → sid2: attributes being renamed (the 1st and the 5th).

- S1 × R1: instance being renamed (can be a name or an operation).

N.B. The second operand may be a simple relational instance name, thereby renaming that relational instance.

Page 18: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

18

SPECIAL RELATIONAL OPERATORS

The following operators are peculiar to relations:

- Join operators

There are several kinds of join operators. We consider only three of them here (others will be considered when we discuss null values; see section 5.6.4):

- (1) Condition Joins

- (2) Equijoins

- (3) Natural Joins

- Division

Page 19: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

19

JOIN OPERATORS

Condition Joins:

- Defined as a cross-product followed by a selection:

R ⋈_c S = σ_c(R × S)

where c is the condition (the symbol ⋈ is called the bow tie).

- Example: Given the sample relational instances S1 and R1

The condition join S1 ⋈_{S1.sid<R1.sid} R1 yields

Page 20: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

20

Equijoin:

Special case of the condition join where the join condition consists solely of equalities between two fields in R and S, connected by the logical AND operator (∧).

Example: Given the two sample relational instances S1 and R1

The operator S1 ⋈_{R.sid=S.sid} R1 yields

Page 21: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

21

Natural Join

- Special case of equijoin where equalities are implicitly specified on all fields having the same name in R and S.

- The condition c is now left out, so that the “bow tie” operator by itself signifies a natural join.

- N. B. If the two relations have no attributes in common, the natural join is simply the cross-product.
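The three join variants can be sketched in a few lines of Python, with rows represented as dictionaries. The function names and the rows of `s1` and `r1` are assumptions made for this illustration (the figures for S1 and R1 are not reproduced here), and lists are used only because dictionaries cannot be placed in a Python set.

```python
def condition_join(r, s, cond):
    """Cross-product of r and s, keeping the pairs that satisfy cond."""
    return [(a, b) for a in r for b in s if cond(a, b)]


def natural_join(r, s):
    """Equate all same-named fields and keep a single copy of each."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out


# Hypothetical rows, invented for illustration only.
s1 = [
    {"sid": 22, "sname": "Dustin"},
    {"sid": 31, "sname": "Lubber"},
    {"sid": 58, "sname": "Rusty"},
]
r1 = [{"sid": 22, "bid": 101}, {"sid": 58, "bid": 103}]

less_than = condition_join(s1, r1, lambda a, b: a["sid"] < b["sid"])
equi = condition_join(s1, r1, lambda a, b: a["sid"] == b["sid"])
natural = natural_join(s1, r1)

# With no common field names the natural join is the full cross-product.
no_common = natural_join([{"x": 1}], [{"y": 2}, {"y": 3}])
```

The equijoin keeps both copies of sid (one per operand), whereas the natural join merges them into a single field.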

Page 22: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

22

DIVISION

- The division operator is used for queries which involve the ‘all’ quantifier, such as “Find the names of sailors who have reserved all boats”.

- The division operator is a bit tricky to explain, and perhaps best approached through examples as will be done here.

Page 23: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

23

EXAMPLES OF DIVISION

Consider the relational instances A, B1, B2, and B3. The division operation A/Bi yields three relational instances (figure 4.14), which are constructed as follows: for each value of A.sno, collect the A.pno values of all rows of A with that sno; if they include every B.pno, that A.sno belongs in the answer relation.

Thus, B1 contains only p2, so we find in A rows 2, 6, 7, 8 for which A.pno = p2. Therefore we must include in the answer (A/B1) the A.sno values in rows 2, 6, 7, 8. Next, B2 contains p2, p4. So we locate in A rows 2 and 4 which contain respectively p2 and p4 and which have the same value of A.sno (namely s1); then we locate in A rows 8, 9 which again contain respectively p2 and p4 with the same value of A.sno (namely s4); so we place in the answer s1 and s4. Next, B3 contains p1, p2, p4; so we search in A for three rows with the same sno, containing respectively p1, p2, p4. We find only rows 1, 2, 4 for which A.sno is s1. So s1 goes in the answer.

Figure 4.14 Examples Illustrating Division

Page 24: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

24

DIVISION

Interpretation of the division operation A/B:

- Divide the attributes of A into 2 sets: A1 and A2.

- Divide the attributes of B into 2 sets: B2 and B3.

- The sets A2 and B2 must have the same attributes.

- Take the set of B2 values that occur in B.

- Search in A for the groups of rows (each group having the same A1 values) whose A2 values, taken together, include every one of those B2 values.

- For every group of rows in A that satisfies the above search, pick out its A1 values and put them in the answer.

Page 25: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

25

DIVISION

Example: Find the names of sailors who have reserved all boats:

(1) A = π_{sid,bid}(Reserves), with A1 = π_{sid}(Reserves) and A2 = π_{bid}(Reserves).

(2) B2 = π_{bid}(Boats); B3 is the rest of B.

Thus, B2 = {101, 102, 103, 104}

(3) Find the rows of A that share the same A.sid and whose combined A.bid values include all of B2.

Thus the qualifying A1 values are {22}

(4) Look up in Sailors the sname corresponding to each sid found: {Dustin}

Page 26: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

26

FORMAL DEFINITION OF DIVISION

The formal definition of division is as follows:

A/B = π_x(A) − π_x((π_x(A) × B) − A)

where x denotes the attributes of A that do not appear in B.
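The formal definition can be followed step by step in code. The sketch below is one possible Python rendering for a binary relation A(sno, pno) divided by a unary relation B(pno). The instance `a` is reconstructed from the row numbers quoted in the walkthrough of figure 4.14; rows 3 and 5 and the sno values of rows 6 and 7 are not stated there and are assumptions.

```python
def divide(a, b):
    """A/B for A(x, y) and B(y): the x values paired in A with every y in B."""
    xs = {x for x, _ in a}                        # projection of A on x
    all_pairs = {(x, y) for x in xs for y in b}   # that projection crossed with B
    missing = all_pairs - a                       # required pairs absent from A
    disqualified = {x for x, _ in missing}        # x values lacking some y
    return xs - disqualified


# Rows 1..9 of A; partly assumed, see the note above.
a = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}

b1 = {"p2"}
b2 = {"p2", "p4"}
b3 = {"p1", "p2", "p4"}

results = [divide(a, b) for b in (b1, b2, b3)]
```

The idea is to compute the disqualified values first: any x for which some pair (x, y) with y in B is absent from A is removed from the candidate set.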

Page 27: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

27

EXAMPLES OF ALGEBRA QUERIES

In the rest of this chapter we shall illustrate queries using the following new instances: S3 of Sailors, R2 of Reserves and B1 of Boats.

Page 28: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

28

QUERY Q1

Given the relational instances:

(Q1) Find the names of sailors who have reserved boat 103

π_{sname}((σ_{bid=103} Reserves) ⋈ Sailors)

The answer is thus the following relational instance

{<Dustin>, <Lubber>, <Horatio>}

Page 29: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

29

QUERY Q1 (cont’d)

There are of course several ways to express Q1 in relational algebra.

Here is another:

π_{sname}(σ_{bid=103}(Reserves ⋈ Sailors))

Which of these expressions should we use?

That is a question of optimization. Indeed, when we describe how to state queries in SQL, we can leave it to the optimizer in the DBMS to select the best approach.
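To make the optimization question concrete, the following Python sketch evaluates both forms of Q1 and reports how many rows the join produced in each case. The rows are hypothetical (they are not the instances S3 and R2), and the function names are assumptions made for this example.

```python
# Hypothetical rows, invented for illustration only.
sailors = [(22, "Dustin"), (31, "Lubber"), (58, "Rusty"),
           (64, "Horatio"), (74, "Horatio")]                 # (sid, sname)
reserves = [(22, 101), (22, 103), (31, 103), (64, 101), (74, 103)]  # (sid, bid)


def natural_join(res, sail):
    """Join Reserves-like rows with Sailors-like rows on the shared sid."""
    return [(sid, bid, sname)
            for sid, bid in res
            for s_sid, sname in sail
            if sid == s_sid]


def select_then_join():
    """First form of Q1: select bid = 103, then join with Sailors."""
    selected = [(sid, bid) for sid, bid in reserves if bid == 103]
    joined = natural_join(selected, sailors)
    return {sname for _, _, sname in joined}, len(joined)


def join_then_select():
    """Second form of Q1: join first, then select bid = 103."""
    joined = natural_join(reserves, sailors)
    selected = [row for row in joined if row[1] == 103]
    return {sname for _, _, sname in selected}, len(joined)


answer_a, joined_rows_a = select_then_join()
answer_b, joined_rows_b = join_then_select()
```

Both plans return the same set of names, but on this data the join-first plan materializes more joined rows, which is the kind of difference an optimizer weighs.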

Page 30: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

30

QUERY Q2

(Q2) Find the names of sailors who have reserved a red boat.

π_{sname}((σ_{color=‘red’} Boats) ⋈ Reserves ⋈ Sailors)

Page 31: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

31

QUERY Q3

(Q3) Find the colors of boats reserved by Lubber.

π_{color}((σ_{sname=‘Lubber’} Sailors) ⋈ Reserves ⋈ Boats)

Page 32: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

32

QUERY Q4

(Q4) Find the names of sailors who have reserved at least one boat.

π_{sname}(Sailors ⋈ Reserves)

Page 33: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

33

QUERY Q5

(Q5) Find the names of sailors who have reserved a red or a green boat.

ρ(Tempboats, (σ_{color=‘red’} Boats) ∪ (σ_{color=‘green’} Boats))

π_{sname}(Tempboats ⋈ Reserves ⋈ Sailors)

Page 34: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

34

QUERY Q6

(Q6) Find the names of sailors who have reserved a red and a green boat.

It seems tempting to use the expression used in Q5, replacing ∪ simply by ∩. However, this won’t work, for such an expression requests the names of sailors who have reserved a boat that is both red and green! The correct expression is as follows:

ρ(Tempred, π_{sid}((σ_{color=‘red’} Boats) ⋈ Reserves))

ρ(Tempgreen, π_{sid}((σ_{color=‘green’} Boats) ⋈ Reserves))

π_{sname}((Tempred ∩ Tempgreen) ⋈ Sailors)
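The difference between intersecting boats and intersecting sailor ids shows up clearly on a small example. The Python sketch below uses hypothetical rows (not the instances B1, R2 and S3), and the helper name `sids_reserving` is an assumption made for this illustration.

```python
# Hypothetical rows, invented for illustration only.
boats = {(101, "blue"), (102, "red"), (103, "green"), (104, "red")}  # (bid, color)
reserves = {(22, 102), (22, 103), (31, 102), (64, 103)}              # (sid, bid)
sailors = {(22, "Dustin"), (31, "Lubber"), (64, "Horatio")}          # (sid, sname)


def sids_reserving(color):
    """Sids of sailors who reserved at least one boat of the given color."""
    bids = {bid for bid, c in boats if c == color}
    return {sid for sid, bid in reserves if bid in bids}


tempred = sids_reserving("red")
tempgreen = sids_reserving("green")

# Correct form: intersect the sid sets, then join with Sailors.
both = {sname for sid, sname in sailors if sid in (tempred & tempgreen)}

# Tempting but wrong form: intersect the boat selections themselves.
red_and_green_boats = ({bid for bid, c in boats if c == "red"}
                       & {bid for bid, c in boats if c == "green"})
```

No single boat is both red and green, so the second intersection is empty and the naive query would return no sailors at all.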

Page 35: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

35

QUERY Q7

(Q7) Find the names of sailors who have reserved at least two boats.

ρ(Reservations, π_{sid,sname,bid}(Sailors ⋈ Reserves))

ρ(Reservationpairs(1 → sid1, 2 → sname1, 3 → bid1, 4 → sid2, 5 → sname2, 6 → bid2), Reservations × Reservations)

π_{sname1}(σ_{(sid1=sid2) ∧ (bid1≠bid2)} Reservationpairs)

Page 36: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

36

QUERY Q8

(Q8) Find the sids of sailors with age over 20 who have not reserved a red boat.

π_{sid}(σ_{age>20} Sailors) − π_{sid}((σ_{color=‘red’} Boats) ⋈ Reserves ⋈ Sailors)

Page 37: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

37

QUERY Q9

(Q9) Find the names of sailors who have reserved all boats.

ρ(Tempsids, (π_{sid,bid} Reserves)/(π_{bid} Boats))

π_{sname}(Tempsids ⋈ Sailors)

Page 38: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

38

QUERY Q10

(Q10) Find the names of sailors who have reserved all boats called Interlake.

ρ(Tempsids, (π_{sid,bid} Reserves)/(π_{bid}(σ_{bname=‘Interlake’} Boats)))

π_{sname}(Tempsids ⋈ Sailors)

Page 39: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

39

RELATIONAL CALCULUS

Page 40: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

40

Introduction

- Relational algebra is procedural: it specifies the procedure to be followed in order to get the answer to the query.

- Relational calculus is declarative: it describes (declares) the answer to the query without specifying how to get it.

- Relational calculus strongly resembles First Order Predicate Logic, or simply first order logic.

- There are two variants of relational calculus:

- Tuple relational calculus (TRC)

- Domain relational calculus (DRC)

Page 41: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

41

TUPLE RELATIONAL CALCULUS

- A query statement in TRC is a set declaration having the form:

{ P | first-order logic formula }

- This is to be read as ‘the set of all tuples P for which the specified first-order logic formula is true’.

- Thus a TRC query is a request (to the DBMS) to produce a set of tuples corresponding to the tuples of the relational answer in SQL.

- Example

Given the following query:

(Q11) Find all sailors with a rating above 7.

The TRC statement of this query is

{S | S ∈ Sailors ∧ S.rating > 7}.

Page 42: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

42

SYNTAX AND SEMANTICS OF TRC

• The syntax and semantics of TRC are those of first-order logic. They are stated quite precisely in the text and there is no need to repeat them here. Instead we shall examine a few query applications.

Page 43: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

43

QUERY Q12

(Q12) Find the names and ages of sailors with a rating above 7.

{P | ∃ S ∈ Sailors (S.rating > 7 ∧ P.name = S.sname ∧ P.age = S.age)}

Remarks

1. The fact that the tuple variable P occurs with two attributes (using the dot notation) means that solely these two attributes are required in the answer relation.

2. The symbols used are the usual first-order logic symbols:

∀: for all    ∃: there exists    ∧: and    ∨: or    ¬: not    ⇒: implies
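A TRC query has the same shape as a set comprehension, so Q11 and Q12 can be mimicked in Python as a rough analogy. The class names and the sample rows are assumptions made for this sketch. Unlike the calculus, a comprehension does fix an evaluation strategy (iterating over the relation), so the analogy covers only the form of the query.

```python
from typing import NamedTuple


class Sailor(NamedTuple):
    sid: int
    sname: str
    rating: int
    age: float


class NameAge(NamedTuple):
    name: str
    age: float


# Hypothetical instance, invented for illustration only.
sailors = {
    Sailor(1, "Ann", 9, 35.0),
    Sailor(2, "Ben", 7, 55.5),
    Sailor(3, "Cy", 10, 25.5),
}

# Q11: the tuples S that are in Sailors and have rating above 7.
q11 = {s for s in sailors if s.rating > 7}

# Q12: the tuples P with fields name and age for which some S in Sailors
# has rating above 7, P.name = S.sname and P.age = S.age.
q12 = {NameAge(s.sname, s.age) for s in sailors if s.rating > 7}
```

In Q12 the result tuple P is built from only the two fields mentioned in the formula, mirroring remark 1.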

Page 44: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

44

QUERIES 1,2,7,9,14

The TRC statements for these queries are pretty well self explanatory, especially with the added English statements of how to read them.

Page 45: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

45

DOMAIN RELATIONAL CALCULUS (1)

- The form of a DRC query is as follows:

{<X1, X2, … , Xn> | logical DRC formula}

signifying that the system must construct (and output) a set of all the tuples which satisfy the stated logical DRC formula in terms of the n attributes X1, X2, … ,Xn. Thus, the answer is a relational instance with attributes X1, X2, … , Xn, these attributes corresponding to those of some of the relations in the database.

- Again, the approach used by the system is left unspecified.

- The Syntax and the semantics of the DRC are explicitly and precisely described in the text.

Page 46: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

46

DOMAIN RELATIONAL CALCULUS (2)

Example:

(Q11) Find all sailors with a rating above 7.

{ <I, N, T, A> | <I, N, T, A> ∈ Sailors ∧ T > 7 }
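In DRC the variables range over field values rather than whole tuples, which corresponds loosely to tuple unpacking in a Python comprehension. The sketch below uses hypothetical rows. The second query, which returns only names and ages, is an addition for illustration and corresponds to the DRC form {<N, A> | ∃ I, T (<I, N, T, A> ∈ Sailors ∧ T > 7)}.

```python
# Hypothetical instance, invented for illustration: (sid, sname, rating, age).
sailors = {
    (1, "Ann", 9, 35.0),
    (2, "Ben", 7, 55.5),
    (3, "Cy", 10, 25.5),
}

# Q11 in DRC style: each of i, n, t, a is bound to one field value.
q11 = {(i, n, t, a) for (i, n, t, a) in sailors if t > 7}

# Names and ages only: i and t are still bound, but are not output.
names_ages = {(n, a) for (_i, n, t, a) in sailors if t > 7}
```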

Other queries are illustrated and described in the text with all necessary explanation.

Page 47: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

47

EXPRESSIVE POWER OF ALGEBRA AND CALCULUS (1)

Safety

- Certain queries stated in the relational calculus may lead to answers which contain an infinite number of tuples (or at least as many as the system can handle).

Example:

Consider the TRC query {S | ¬(S ∈ Sailors)}. Since there is a quasi-infinite number of tuples that can be created with the attributes of Sailors, the answer is (quasi-)infinite.

- A query which yields a (quasi)-infinite answer is said to be unsafe, and, of course, should not be allowed by the system.

- It is possible to define a safe formula in TRC (see text, section 4.4).

Page 48: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

48

EXPRESSIVE POWER OF ALGEBRA AND CALCULUS (2)

• A query language is said to be relationally complete if it can express all the queries that can be expressed in relational algebra.

• SQL is relationally complete.

• Every query that can be expressed using a safe relational calculus query can also be expressed as a relational algebra query.

• SQL provides additional expressive power beyond relational algebra.

Page 49: CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

49

QUERY Q1