Topics: The Relational Algebra 1 Relational Database Systems 1 DR. Eng. Ramez Alkhatib ramezalkhatib @hotmail.com

Relational Database Systems 1 · 2018. 10. 24. · relational algebra. Key to understanding SQL and query processing and optimization. { SQL is, roughly speaking, a generalization



Relational Data Manipulation Languages

• A variety of languages are used by relational database management systems:

– Procedural languages: the user tells the system how to manipulate the data, e.g. Relational Algebra

– Declarative languages: the user states what data is needed but not exactly how it is to be located, e.g. Relational Calculus and SQL

– Graphical languages: the user gives an example or an illustration of what data should be found, e.g. QBE


Queries and query languages

• Query: A question about the data in a database.

• Query: A statement requesting the retrieval of information from a database.

• Example: Find the names of students who are taking DB1.

• Query language: a language in which queries are expressed.

• Query languages versus programming languages:

– Query languages are not intended to be used for complex calculations.

– Query languages support easy and efficient access to large data sets.


Relational Algebra

• Relational Algebra is a set of 6 operators that act on tables to produce tables.

• Just as we operate on numbers with arithmetic, we operate on tables with relational algebra.

• Key to understanding SQL and query processing and optimization.

– SQL is, roughly speaking, a generalization of relational algebra.

– Internal language: an SQL query is rewritten as a relational algebra expression, which can in turn be rewritten into a more efficient form and evaluated using well-developed algorithms.


Relational Algebra

• A set of operations (functions), each of which takes a relation (or relations) as input and produces a relation as output.

• Basic operations: using these we can build up sophisticated database queries.

– Projection

– Selection

– Union

– Difference

– Product

– Renaming

• Additional operations: Intersection, Join, Division.


Preliminaries

• Review: a relational database is a collection of data.

• A query is applied to relation instances, and the result of a query is also a relation instance.

– Schemas of input relations for a query are fixed. The query will run regardless of instances.

– The schema for a result of a given query is also fixed. It will be determined by the query.

• Example schemas:

Student(sid:int, sname:string, gpa:real)

Course(cid:string, cname:string, credit:integer, teacher:string)

Enroll(sid:int, cid:string, grade:string)


Example Instances

Student

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

Enroll

sid  cid  grade
1    501  A
2    502  A
3    501  C
3    502  B

Course

cid  cname  credits  teacher
501  db     6        slim
502  tc     6        haytham


Projection

• Given a list of column names A and a relation R.

• π_A(R): extracts the columns in A from the relation R.

• Example:

Student

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7

π_{sid, gpa}(Student)

sid  gpa
1    1.0
2    2.3
3    0.7

• The result of the projection can be visualized as a vertical partition of the relation into two relations.

• Questions:

– What is the schema of the result? Recall that Student has schema Student(sid:int, sname:string, gpa:real).

– What is the query (in English)?


Projection - continued

• Suppose the result of π_A(R) has duplicate values.

• Example:

Student

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

π_{gpa}(Student)

gpa
1.0
2.3
0.7

• In relational algebra, the answer is always a set (duplicates have to be eliminated).

• However, SQL and some other languages return, by default, a bag (duplicates are not eliminated).
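The difference between set and bag semantics can be sketched in a few lines of Python. This is only an illustration: the helper names project_bag and project_set are made up for this sketch, and rows are modeled as dictionaries, which is an assumption of the sketch rather than anything prescribed by the slides.

```python
# Student instance from the slides; each row is a dict (an assumption of this sketch).
student = [
    {"sid": 1, "sname": "Dina", "gpa": 1.0},
    {"sid": 2, "sname": "Ahmed", "gpa": 2.3},
    {"sid": 3, "sname": "Maria", "gpa": 0.7},
    {"sid": 4, "sname": "Ali", "gpa": 1.0},
]


def project_bag(rows, attrs):
    """Keep only the columns in attrs; duplicates survive (bag semantics)."""
    return [tuple(row[a] for a in attrs) for row in rows]


def project_set(rows, attrs):
    """Keep only the columns in attrs and drop duplicates (set semantics)."""
    return set(project_bag(rows, attrs))


bag_result = project_bag(student, ["gpa"])  # 1.0 appears twice
set_result = project_set(student, ["gpa"])  # 1.0 appears once
print(bag_result)
print(sorted(set_result))
```

The bag version roughly corresponds to what a plain SQL SELECT returns, and the set version to SELECT DISTINCT.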


Selection

• Given a condition C and a relation R.

• σ_C(R): extracts those rows from the relation R that satisfy C.

• Example:

Student

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

σ_{gpa ≤ 2.0}(Student)

sid  sname  gpa
1    Dina   1.0
3    Maria  0.7
4    Ali    1.0

• The result of the selection can be visualized as a horizontal partition of the relation into two sets of tuples.

• Questions:

– What is the schema of the result? Recall that Student has schema Student(sid:int, sname:string, gpa:real).

– What is the query (in English)?


Selection – What can go into the condition?

• Condition C in σ_C(R) is built up from

– Comparison operations on the field names: <, ≤, =, ≠, ≥, >. Example: gpa ≤ 2.0, sname = 'Ali'.

– Predicates constructed from these using ∧ (and), ∨ (or), ¬ (not).

• Question: What is the result of σ_{gpa ≤ 2.0 ∧ sname = 'Ali'}(Student)?

Student

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

σ_{gpa ≤ 2.0 ∧ sname = 'Ali'}(Student)

sid  sname  gpa
4    Ali    1.0
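As a rough sketch, selection can be modeled in Python as filtering rows with a Boolean function. The helper name select and the dictionary row layout are assumptions of this sketch; the condition is passed as a lambda that combines comparisons with and, or, not.

```python
# Student instance from the slides; each row is a dict (an assumption of this sketch).
student = [
    {"sid": 1, "sname": "Dina", "gpa": 1.0},
    {"sid": 2, "sname": "Ahmed", "gpa": 2.3},
    {"sid": 3, "sname": "Maria", "gpa": 0.7},
    {"sid": 4, "sname": "Ali", "gpa": 1.0},
]


def select(rows, condition):
    """Keep the rows for which condition(row) is true; the schema is unchanged."""
    return [row for row in rows if condition(row)]


# gpa <= 2.0
low_gpa = select(student, lambda r: r["gpa"] <= 2.0)

# gpa <= 2.0 AND sname = 'Ali'
low_gpa_ali = select(student, lambda r: r["gpa"] <= 2.0 and r["sname"] == "Ali")

print([r["sname"] for r in low_gpa])
print(low_gpa_ali)
```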


Set Operations

• Set operations: S ∪ T, S − T, S ∩ T

– Union (S ∪ T): a relation that includes all tuples that are either in S or in T or in both S and T. Duplicate tuples are eliminated.

– Intersection (S ∩ T): a relation that includes all tuples that are in both S and T.

– Difference (S − T): a relation that includes all tuples that are in S but not in T.

• Condition: the operands of all these operations must be union-compatible (i.e., they must consist of the same attributes).

• Question:

– Recall Student and Course given above. Can we write Student ∪ Course?

– What is the schema of the result of a set operation?


Set Operations – Union

Student1

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

Student2

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
5    Amira  1.0

Student1 ∪ Student2

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0
5    Amira  1.0


Set Operations – Intersection

Student1

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

Student2

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
5    Amira  1.0

Student1 ∩ Student2

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7


Set Operations – Difference

Student1

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

Student2

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
5    Amira  1.0

Student1 − Student2

sid  sname  gpa
4    Ali    1.0

Student2 − Student1

sid  sname  gpa
5    Amira  1.0


Set Operations – Intersection

• In relational algebra, the basic set operations are union and set difference only.

• We can implement the other set operations using those basic operations.

• For example, for any relations S and T, we can already express S ∩ T:

S ∩ T = S − (S − T)

• It is mathematically nice to have fewer operators; however, operations like set difference may be less efficient than intersection.
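The three set operations and the identity S ∩ T = S − (S − T) can be checked on the Student1 and Student2 instances with Python's built-in sets. This is a minimal sketch that models each tuple as a Python tuple in the attribute order (sid, sname, gpa); that modeling choice is an assumption of the sketch.

```python
# Student1 and Student2 from the slides, as sets of (sid, sname, gpa) tuples.
student1 = {(1, "Dina", 1.0), (2, "Ahmed", 2.3), (3, "Maria", 0.7), (4, "Ali", 1.0)}
student2 = {(1, "Dina", 1.0), (2, "Ahmed", 2.3), (3, "Maria", 0.7), (5, "Amira", 1.0)}

union = student1 | student2          # S ∪ T, duplicates eliminated automatically
difference = student1 - student2     # S − T
intersection = student1 & student2   # S ∩ T, using the built-in operator

# The same intersection expressed with set difference only: S − (S − T).
intersection_via_difference = student1 - (student1 - student2)

print(sorted(union))
print(sorted(difference))
print(intersection == intersection_via_difference)
```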


Product

• Product S × T connects two relations S and T that are not necessarily union-compatible.

Student

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7

Course

cid  cname  credits  teacher
501  db     6        slim
502  tc     6        haytham

Student × Course

sid  sname  gpa  cid  cname  credits  teacher
1    Dina   1.0  501  db     6        slim
2    Ahmed  2.3  501  db     6        slim
3    Maria  0.7  501  db     6        slim
1    Dina   1.0  502  tc     6        haytham
2    Ahmed  2.3  502  tc     6        haytham
3    Maria  0.7  502  tc     6        haytham


Cartesian Product S × T

• Each row of S is paired with each row of T .

• Schema of the result has one field per field of S and T .

• Example: The schema of Student×Course

(sid, sname, gpa, cid, cname, credits, teacher)

• Question:

– What is the primary key of S × T in general? Answer: the combination of the primary key of S and the primary key of T.

– Cardinality: Suppose that S has n rows and T has m rows. What is the cardinality of S × T? Answer: n × m
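The schema and cardinality claims above can be checked with a small Python sketch using itertools.product. The dictionary row layout and the variable names are assumptions of this sketch; it only works this simply because Student and Course share no attribute names.

```python
import itertools

# Student (3 rows) and Course (2 rows) as shown on the Product slide.
student = [
    {"sid": 1, "sname": "Dina", "gpa": 1.0},
    {"sid": 2, "sname": "Ahmed", "gpa": 2.3},
    {"sid": 3, "sname": "Maria", "gpa": 0.7},
]
course = [
    {"cid": "501", "cname": "db", "credits": 6, "teacher": "slim"},
    {"cid": "502", "cname": "tc", "credits": 6, "teacher": "haytham"},
]

# Pair each Student row with each Course row and merge their fields.
student_x_course = [{**s, **c} for s, c in itertools.product(student, course)]

print(len(student_x_course))      # n × m rows
print(list(student_x_course[0]))  # one field per field of S and T
```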


Product - continued

• What happens when we form a product of two relations with columns having the same name?

Student

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7

Enroll

sid  cid  grade
1    501  A
2    502  A

• May vary among systems. A common answer is to suffix the attribute names with 1 and 2:

Student × Enroll

sid:1  sname  gpa  sid:2  cid  grade
1      Dina   1.0  1      501  A
2      Ahmed  2.3  1      501  A
3      Maria  0.7  1      501  A
1      Dina   1.0  2      502  A
...    ...    ...  ...    ...  ...


Product - continued

• Products are rarely used alone; they are typically used in conjunction with a selection.

• Example: σ_{sid:1 = sid:2 ∧ cid = 501}(Student × Enroll)

sid:1  sname  gpa  sid:2  cid  grade
1      Dina   1.0  1      501  A

• What does this query do (in English)?

• Suppose we want to find the names and grades of students who are taking 501. How do we write the query?

π_{sname, grade}(σ_{sid:1 = sid:2 ∧ cid = 501}(Student × Enroll))

sname  grade
Dina   A
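The product, selection, and projection steps of this query can be traced in Python. This is a sketch under stated assumptions: the product helper and its ":1" / ":2" suffixing mimic the naming convention shown on the slide, rows are dictionaries, and cid is stored as a string as in the example schema.

```python
import itertools

student = [
    {"sid": 1, "sname": "Dina", "gpa": 1.0},
    {"sid": 2, "sname": "Ahmed", "gpa": 2.3},
    {"sid": 3, "sname": "Maria", "gpa": 0.7},
]
enroll = [
    {"sid": 1, "cid": "501", "grade": "A"},
    {"sid": 2, "cid": "502", "grade": "A"},
]


def product(left, right):
    """Pair every left row with every right row; clashing names get :1 / :2."""
    rows = []
    for l, r in itertools.product(left, right):
        row = {}
        for name, value in l.items():
            row[name + ":1" if name in r else name] = value
        for name, value in r.items():
            row[name + ":2" if name in l else name] = value
        rows.append(row)
    return rows


# Selection: sid:1 = sid:2 AND cid = 501
selected = [
    row for row in product(student, enroll)
    if row["sid:1"] == row["sid:2"] and row["cid"] == "501"
]

# Projection on sname, grade (as a set, so duplicates would be eliminated).
result = {(row["sname"], row["grade"]) for row in selected}
print(result)
```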


Joins – Conditional Join

• The combination of a selection and a product is so common that it has a special symbol and name.

S ⋈_C T is defined to be σ_C(S × T)

• Example: Student ⋈_{sid:1 = sid:2} Enroll is

sid:1  sname  gpa  sid:2  cid  grade
1      Dina   1.0  1      501  A
2      Ahmed  2.3  2      502  A

• Questions:

– What is the result schema? Assume S(A1, ..., An) and T(B1, ..., Bm); the join S ⋈_C T results in a relation with the attributes (A1, ..., An, B1, ..., Bm).

– Conditional join is in general more efficient than cross product. Why?

• The condition C in a conditional join is usually an equality or a conjunction of equalities (EquiJoin).
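One possible way to see the efficiency point is to compare the definition σ_C(S × T) with a hash-based equijoin in Python. This is only a sketch: the function names theta_join and hash_equijoin are hypothetical, rows are plain tuples, and hashing is just one of several join algorithms a real system might pick.

```python
# Student and Enroll rows as tuples: (sid, sname, gpa) and (sid, cid, grade).
student = [(1, "Dina", 1.0), (2, "Ahmed", 2.3), (3, "Maria", 0.7)]
enroll = [(1, "501", "A"), (2, "502", "A")]


def theta_join(left, right, condition):
    """Literal definition: form every pair, keep those satisfying the condition."""
    return [l + r for l in left for r in right if condition(l, r)]


def hash_equijoin(left, right, left_col, right_col):
    """Equijoin via a hash index on the right input; non-matching pairs are never built."""
    index = {}
    for r in right:
        index.setdefault(r[right_col], []).append(r)
    return [l + r for l in left for r in index.get(l[left_col], [])]


by_definition = theta_join(student, enroll, lambda s, e: s[0] == e[0])
by_hashing = hash_equijoin(student, enroll, 0, 0)
print(by_definition)
print(by_definition == by_hashing)
```

The first function inspects all n × m pairs, while the second touches each input row once plus the matching pairs, which is why a system that knows the join condition can avoid materializing the full product.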


Natural Join

• S ⋈ T: special case of conditional join, with equality on the common fields of S and T

– Equality condition only

– On all common fields

– Leave only one copy of these fields in the resulting relation.

• Example:

S

A  B  C
1  2  a
1  2  b
1  3  c
2  1  g

T

A  B  D
1  2  d
1  2  e
1  4  d

S ⋈ T

A  B  C  D
1  2  a  d
1  2  a  e
1  2  b  d
1  2  b  e

• Question: What if S and T have no fields in common?

Answer: Cartesian Product


Natural Join – Example

Student

sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7

Enroll

sid  cid  grade
1    501  A
2    502  A

Student ⋈ Enroll

sid  sname  gpa  cid  grade
1    Dina   1.0  501  A
2    Ahmed  2.3  502  A
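A natural join can be sketched in Python as a nested loop that compares all common fields and keeps one copy of them. The function name natural_join and the dictionary row layout are assumptions of this sketch, and the nested loop is chosen for clarity rather than speed.

```python
student = [
    {"sid": 1, "sname": "Dina", "gpa": 1.0},
    {"sid": 2, "sname": "Ahmed", "gpa": 2.3},
    {"sid": 3, "sname": "Maria", "gpa": 0.7},
]
enroll = [
    {"sid": 1, "cid": "501", "grade": "A"},
    {"sid": 2, "cid": "502", "grade": "A"},
]
course = [
    {"cid": "501", "cname": "db", "credits": 6, "teacher": "slim"},
    {"cid": "502", "cname": "tc", "credits": 6, "teacher": "haytham"},
]


def natural_join(left, right):
    """Join on equality of all common fields, keeping a single copy of them."""
    if not left or not right:
        return []
    common = [name for name in left[0] if name in right[0]]
    result = []
    for l in left:
        for r in right:
            if all(l[name] == r[name] for name in common):
                result.append({**l, **r})  # common fields are merged into one
    return result


student_enroll = natural_join(student, enroll)
print(student_enroll)

# Student and Course share no field, so the join degenerates to the product.
print(len(natural_join(student, course)))
```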


Queries – Example (I)

Student(sid:int, sname:string, gpa:real)

Course(cid:string, cname:string, credit:integer, teacher:string)

Enroll(sid:int, cid:string, grade:string)

1. Find the names of the students:

π_{sname}(Student)

2. Find the courses taught by Slim:

σ_{teacher = 'Slim'}(Course)

3. Find the titles of courses taught by Slim:

π_{cname}(σ_{teacher = 'Slim'}(Course))

• These queries involve a single relation: unary operations

• The result of a query is also a relation and therefore can be used as input of another query.


Queries – Example (II)

Student(sid:int, sname:string, gpa:real)

Course(cid:string, cname:string, credit:integer, teacher:string)

Enroll(sid:int, cid:string, grade:string)

Find the names of students who are taking 501.

• Two relations: use (natural) join or product

• Fields: projection

• Condition: selection

Solutions:

• π_{sname}(σ_{cid = 501}(Student ⋈ Enroll))

• π_{sname}(Student ⋈ σ_{cid = 501}(Enroll))
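Both solutions can be run side by side on the example instances to confirm that they return the same answer. The sketch below assumes dictionary rows and a hypothetical natural_join helper; it also records the size of the intermediate results, which hints at why an optimizer might prefer the second form.

```python
student = [
    {"sid": 1, "sname": "Dina", "gpa": 1.0},
    {"sid": 2, "sname": "Ahmed", "gpa": 2.3},
    {"sid": 3, "sname": "Maria", "gpa": 0.7},
    {"sid": 4, "sname": "Ali", "gpa": 1.0},
]
enroll = [
    {"sid": 1, "cid": "501", "grade": "A"},
    {"sid": 2, "cid": "502", "grade": "A"},
    {"sid": 3, "cid": "501", "grade": "C"},
    {"sid": 3, "cid": "502", "grade": "B"},
]


def natural_join(left, right):
    """Join on equality of all common fields, keeping a single copy of them."""
    if not left or not right:
        return []
    common = [name for name in left[0] if name in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[name] == r[name] for name in common)
    ]


# Solution 1: join first, then select cid = 501, then project sname.
joined = natural_join(student, enroll)
solution1 = {row["sname"] for row in joined if row["cid"] == "501"}

# Solution 2: select cid = 501 on Enroll first, then join, then project sname.
filtered = [row for row in enroll if row["cid"] == "501"]
solution2 = {row["sname"] for row in natural_join(student, filtered)}

print(sorted(solution1), len(joined))
print(sorted(solution2), len(filtered))
```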


Renaming – Another Operator

• It is simpler to break down a complex sequence of operations by specifying intermediate result relations.

• Example:

π_{sname}(Student ⋈ σ_{cid = 501}(Enroll))

is equivalent to

– temp1 ← σ_{cid = 501}(Enroll)

– temp2 ← Student ⋈ temp1

– Result ← π_{sname}(temp2)

• The same technique can be used to rename the attributes in the intermediate and result relations.

• Example:

R(firstName) ← π_{sname}(temp2)


Renaming – Another Operator

• The general Rename operation, when applied to a relation R of degree n, is denoted by one of the following three forms:

– ρ_{S(B1, ..., Bn)}(R): renames both the relation and the attributes

– ρ_S(R): renames the relation only

– ρ_{(B1, ..., Bn)}(R): renames the attributes only

• ρ denotes the rename operator

• S is the new relation name

• B1, ..., Bn are the new attribute names

• If the attributes of R are A1, ..., An in that order, then each Ai is renamed as Bi.
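The attribute-only form of renaming can be sketched in Python as a positional mapping from old names to new names. The function name rename is hypothetical, and the sketch relies on Python dictionaries preserving insertion order (Python 3.7 and later) to stand in for the attribute order A1, ..., An.

```python
def rename(rows, new_names):
    """Rename attributes positionally: the i-th attribute Ai becomes new_names[i]."""
    renamed = []
    for row in rows:
        if len(row) != len(new_names):
            raise ValueError("need exactly one new name per attribute")
        renamed.append(dict(zip(new_names, row.values())))
    return renamed


# R(firstName) <- projection of temp2 on sname, with illustrative rows for temp2.
temp2_names = [{"sname": "Dina"}, {"sname": "Maria"}]
r = rename(temp2_names, ["firstName"])
print(r)

# Renaming all three attributes of a Student row (new names are made up here).
student = [{"sid": 1, "sname": "Dina", "gpa": 1.0}]
print(rename(student, ["id", "name", "average"]))
```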


Queries - Example (III)


Division - Example

R

A   B
a1  b1
a2  b1
a3  b1
a4  b1
a1  b2
a3  b2
a2  b3
a3  b3
a4  b3
a1  b4
a2  b4
a3  b4

S

B
b1
b4

T = R/S

A
a1
a2
a3


Division - Another Operator

Find the sids of students who are taking all courses.

π_{sid, cid}(Enroll) / π_{cid}(Course)

In general: R/S

• The schema of S must be a proper subset of the schema of R, e.g. {cid} ⊂ {cid, sid}.

• The schema of the result is the set difference of the schema of R and the schema of S.

• For every tuple t in the result and every tuple s in S, ts (t concatenated with s) is in the first relation R, and the result contains every t with this property.
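The definition can be tried out on the R and S instances of the division example. The sketch below is an illustration under stated assumptions: R is modeled as a set of (A, B) pairs, S as a set of B values, and the function names are made up. It also checks the standard rewriting of division in terms of the basic operators, R/S = π_A(R) − π_A((π_A(R) × S) − R).

```python
# R(A, B) and S(B) from the division example.
r = {
    ("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
    ("a1", "b2"), ("a3", "b2"),
    ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
    ("a1", "b4"), ("a2", "b4"), ("a3", "b4"),
}
s = {"b1", "b4"}


def divide(r, s):
    """Return every A value that is paired in R with all B values of S."""
    candidates = {a for a, _ in r}
    return {a for a in candidates if all((a, b) in r for b in s)}


def divide_with_basic_operators(r, s):
    """Same result using only projection, product, and difference."""
    candidates = {a for a, _ in r}                        # projection of R on A
    all_pairs = {(a, b) for a in candidates for b in s}   # candidates × S
    missing = all_pairs - r                               # pairs that R lacks
    disqualified = {a for a, _ in missing}                # projection on A
    return candidates - disqualified


t = divide(r, s)
print(sorted(t))
print(t == divide_with_basic_operators(r, s))
```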


Division – Example


What We Cannot Compute with Relational Algebra?

• Arithmetic operations, e.g. 3 + 3.

• Aggregates, e.g. the number of students who are taking CSEN501, or the average GPA of all students.

In SQL, these are possible: SQL has numerous extensions to relational algebra.

• Recursive queries: given a relation parent(), compute the ancestor relation.

These are not possible in basic SQL either (recursive queries were added to the standard in SQL:1999 via WITH RECURSIVE).

• Complex structures, e.g. lists, arrays, nested relations, ...

SQL cannot handle complex structures either, but they are possible in object-oriented data models and query languages.


Summary – What you should remember!

• What are query languages?

• Relational Algebra: A set of operations (functions), each of which takes a relation (or relations) as input and produces a relation as output.

• Basic Operations: projection, selection, union, difference, product, renaming

• Additional Operations: intersection, division, join (very useful)

• What we cannot do with relational algebra.


Translation of Relational Algebra Exp.

MOVIE(id, name not null, year, type, remark)

COUNTRY(movie, country not null)

π_{country, name}(COUNTRY ⋈_{COUNTRY.movie = MOVIE.id} σ_{year = 1893 ∧ type = 'cinema'}(MOVIE))

From which countries are the movies of the year 1893, and what are their names?
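Assuming the intended translation target is SQL, the expression above might be written as a join with a WHERE clause. The sketch below runs such a query with Python's built-in sqlite3 module. The column types and all sample rows are invented purely for illustration and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    year INTEGER, type TEXT, remark TEXT
);
CREATE TABLE country (movie INTEGER, country TEXT NOT NULL);

-- Invented sample rows, for illustration only.
INSERT INTO movie VALUES
    (1, 'Sample Film A', 1893, 'cinema', NULL),
    (2, 'Sample Film B', 1950, 'cinema', NULL),
    (3, 'Sample Film C', 1893, 'tv', NULL);
INSERT INTO country VALUES
    (1, 'Country X'), (2, 'Country Y'), (3, 'Country Z');
""")

# Projection -> SELECT DISTINCT, join condition -> ON, selection -> WHERE.
rows = conn.execute("""
    SELECT DISTINCT c.country, m.name
    FROM country AS c
    JOIN movie AS m ON c.movie = m.id
    WHERE m.year = 1893 AND m.type = 'cinema'
""").fetchall()

print(rows)
conn.close()
```

DISTINCT is used here because relational algebra projection returns a set, whereas SQL returns a bag by default.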
