
Database Systems I: Foundations of Databases

Summer term 2010

Melanie Herschel

Database Systems Group, University of Tübingen

Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen

Chapter 5: Relational Algebra

1. Introduction
2. ER-Modeling
3. Relational model(ing)
4. Relational algebra
5. SQL
6. Programming
7. Advanced topics

• After completing this chapter, you should be able to

‣ enumerate and explain the operations of relational algebra (there is a core of 5 relational algebra operators),

‣ write relational algebra queries of the type join–select–project,

‣ discuss correctness and equivalence of given relational algebra queries.

Chapter 5: Relational algebra

• Introduction

• Unary Operators: Selection, Projection

• Binary Operators: Cartesian Product, Join, Outer Join

• Set Operations

• Combining Operators

• Formal Definitions, A Bit of Theory

Example Database

Sample Movie Database Tables

Actor
AID  Name   DOB
1    Jolie  4.6.75
2    Pitt   18.12.63

Movie
MID  Title                Year  Rating
1    Babel                2006  7
2    Inglorious Bastards  2009  8
3    Wanted               2008  3

Role
AID  MID  Name
1    3    Fox
2    1    Richard Jones
2    2    Lt. Aldo Raine

Relational Algebra

• Relational algebra (RA) is a query language for the relational model with a solid theoretical foundation.

• Relational algebra is not visible at the user interface level (not in any commercial RDBMS, at least).

• However, almost any RDBMS uses RA to represent queries internally (for query optimization and execution).

• Knowledge of relational algebra will help in understanding SQL and relational database systems in general.

Relational Algebra

• In mathematics, an algebra is a

‣ set (the carrier), and

‣ operations that are closed with respect to the set.

• Example: (ℕ, {∗, +}) forms an algebra.

• In the case of RA, the carrier is the set of all finite relations.

• We will get to know the operations of RA in the sequel (one such operation is, for example, ∪).

Relational Algebra

• Another operation of relational algebra is selection.

• In contrast to operations like + in ℕ, the selection σ is parameterized by a simple predicate.

• For example, the operation σAID=2 selects all tuples in the input relation that have the value 2 in column AID.

Selection

Role
AID  MID  Name
1    3    Fox
2    1    Richard Jones
2    2    Lt. Aldo Raine

σAID=2(Role) =
AID  MID  Name
2    1    Richard Jones
2    2    Lt. Aldo Raine

Relational Algebra

• Since the output of any RA operation is some relation R again, R may be the input for another RA operation.

• The operations of RA nest to arbitrary depth such that complex queries can be evaluated. The final result will always be a relation.

• A query is a term (or expression) in this relational algebra.

A query

πTitle,Name(Role ⋈ σRating>5(Movie)) =

Title                Name
Babel                Richard Jones
Inglorious Bastards  Lt. Aldo Raine

Relational Algebra

• There are some differences between the two query languages RA and SQL:

‣ Null values are usually excluded in the definition of relational algebra, except when operations like outer join are defined.

‣ Relational algebra treats relations as sets, i.e., duplicate tuples will never occur in the input/output relations of an RA operator.

Remember: In SQL, relations are multisets (bags) and may contain duplicates. Duplicate elimination is explicit in SQL (SELECT DISTINCT).

Relational Algebra

• Relational algebra is the query language when it comes to the study of relational query languages (DB Theory):

‣ The semantics of RA is much simpler than that of SQL. RA features five basic operations (and can be completely defined on a single page, if you will).

‣ RA is also a yardstick for measuring the expressiveness of query languages. If a query language QL can express all possible RA queries, then QL is said to be relationally complete.

SQL is relationally complete. Vice versa, every SQL query (without null values, aggregation, and duplicates) can also be written in RA.


Selection

The selection σφ selects a subset of the tuples of a relation, namely those which satisfy predicate φ. Selection acts like a filter on a set.

Selection

Movie
MID  Title                Year  Rating
1    Babel                2006  7
2    Inglorious Bastards  2009  8
3    Wanted               2008  3

σRating > 5(Movie) =
MID  Title                Year  Rating
1    Babel                2006  7
2    Inglorious Bastards  2009  8

Selection

• A simple selection predicate φ has the form

    ⟨Term⟩ ⟨ComparisonOperator⟩ ⟨Term⟩

• ⟨Term⟩ is an expression that can be evaluated to a data value for a given tuple:

‣ an attribute name,

‣ a constant value,

‣ an expression built from attributes, constants, and data type operations like +, −, ∗, /.

Selection

• ⟨ComparisonOperator⟩ is

‣ = (equals), ≠ (not equals),

‣ < (less than), > (greater than), ≤, ≥,

‣ or other data type-dependent predicates (e.g., LIKE).

• Examples for simple selection predicates:

‣ Name = 'Fox'

‣ Rating > 5

‣ Movie.MID = Role.MID

Selection

• σφ(R) may be implemented as follows.

“Naive” selection

    create a new temporary relation T;
    foreach t ∈ R do
        p ← φ(t);
        if p then
            insert t into T;
        fi
    od
    return T;

• If index structures are present (e.g., a B-tree index), it is possible to evaluate σφ(R) without reading every tuple of R.
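The naive selection procedure can be sketched in Python as follows. This is an illustration only: representing a relation as a list of dicts, and the names `select` and `high_rated`, are assumptions made for the example and do not come from the slides.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
Movie = [
    {"MID": 1, "Title": "Babel", "Year": 2006, "Rating": 7},
    {"MID": 2, "Title": "Inglorious Bastards", "Year": 2009, "Rating": 8},
    {"MID": 3, "Title": "Wanted", "Year": 2008, "Rating": 3},
]


def select(predicate, relation):
    """Naive selection: scan every tuple and keep those satisfying the predicate."""
    result = []                 # the temporary relation T
    for t in relation:          # foreach t in R
        if predicate(t):        # p <- phi(t); if p then ...
            result.append(t)    # insert t into T
    return result


# sigma_{Rating > 5}(Movie)
high_rated = select(lambda t: t["Rating"] > 5, Movie)
print([t["Title"] for t in high_rated])
```

The scan touches every tuple of the input, which is exactly the cost that an index-based evaluation tries to avoid.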

Selection

A few corner cases

Let R be the relation

A  B
1  3
1  4
2  5

• σC=1(R) is a schema error: R has no attribute C.

• σA=A(R) = R, since the predicate holds for every tuple.

• σ1=2(R) is the empty relation with schema (A, B), since the predicate holds for no tuple.

Selection

• σφ(R) corresponds to the following SQL query:

    SELECT *
    FROM   R
    WHERE  φ

• A different relational algebra operation, projection, corresponds to the SELECT clause of SQL. This naming clash is a common source of confusion.

Selection

• More complex selection predicates may be built using the Boolean connectives:

‣ φ1 ∧ φ2 (“and”), φ1 ∨ φ2 (“or”), ¬φ1 (“not”).

‣ Note: σφ1 ∧ φ2 (R) = σφ1 (σφ2 (R)).

• The selection predicate must permit evaluation for each input tuple in isolation.

Thus, exists (∃), for all (∀), and nested relational algebra queries are not permitted in selection predicates. Actually, such predicates would not add to the expressiveness of RA.

∨ and ¬

Are the Boolean connectives ∨, ¬ strictly needed?

Projection

The projection πL eliminates all attributes (columns) of the input relation but those mentioned in the projection list L.

Projection

Movie
MID  Title                Year  Rating
1    Babel                2006  7
2    Inglorious Bastards  2009  8
3    Wanted               2008  3

πTitle,Year(Movie) =
Title                Year
Babel                2006
Inglorious Bastards  2009
Wanted               2008

Projection

• The projection πAi1,...,Aik (R) produces for each input tuple (A1 : d1, ..., An : dn) an output tuple (Ai1 : di1, ..., Aik : dik).

• π may be used to reorder columns.

• “σ discards rows, π discards columns.”

• DB slang: “All attributes not in L are projected away.”

• In general, the cardinalities of the input and output relations are not equal.

Projection eliminates duplicates

Input relation:
MID  Title           Year  Rating
1    Babel           2006  7
2    Inglorious ...  2009  8
3    Wall-E          2009  3

πYear of the input relation =
Year
2006
2009

Projection

• πAi1,...,Aik (R) may be implemented as follows.

“Naive” projection

    create a new temporary relation T;
    foreach t = (A1 : d1, ..., An : dn) ∈ R do
        u ← (Ai1 : di1, ..., Aik : dik);
        insert u into T;
    od
    eliminate duplicate tuples in T;
    return T;

• The necessary duplicate elimination makes πL one of the more costly operations in RDBMSs. Thus, query optimizers try hard to “prove” that the duplicate elimination step is not necessary.
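A minimal Python sketch of projection with duplicate elimination is given below. It assumes relations are lists of dicts, and it deviates slightly from the naive procedure: duplicates are filtered while inserting (using a hash set) instead of in a separate pass at the end. The function name `project` is hypothetical.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
Movie = [
    {"MID": 1, "Title": "Babel", "Year": 2006, "Rating": 7},
    {"MID": 2, "Title": "Inglorious ...", "Year": 2009, "Rating": 8},
    {"MID": 3, "Title": "Wall-E", "Year": 2009, "Rating": 3},
]


def project(attributes, relation):
    """Keep only the listed attributes and eliminate duplicate result tuples."""
    result = []
    seen = set()                                   # value combinations already emitted
    for t in relation:
        u = tuple(t[a] for a in attributes)        # u <- (Ai1: di1, ..., Aik: dik)
        if u not in seen:                          # duplicate elimination
            seen.add(u)
            result.append(dict(zip(attributes, u)))
    return result


# pi_{Year}(Movie): three input tuples, two output tuples
years = project(["Year"], Movie)
print(years)
```

Listing the attributes in a different order, for example `project(["Year", "Title"], Movie)`, reorders the columns of the result.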

Projection

• If RA is used to formalize the semantics of SQL, the format of the projection list is often generalized:

• Attribute renaming:

    πB1←Ai1,...,Bk←Aik (R)

• Computations (e.g., string concatenation via ||) to derive the value in new columns, e.g.,

    πSID,Name←First || ' ' || Last (Producer)

• Such generalized π operators are also referred to as map operators (as in functional programming languages).

Projection

• πA1,...,Ak (R) corresponds to the SQL query:

    SELECT DISTINCT A1, ..., Ak
    FROM   R

• πB1←Ai1,...,Bk←Aik (R) is equivalent to the SQL query:

    SELECT DISTINCT Ai1 [AS] B1, ..., Aik [AS] Bk
    FROM   R

Selection vs. Projection

Selection σ: filters some rows, preserves all columns.

Projection π: filters some columns, preserves all rows (up to duplicate elimination).


Cartesian Product

• In general, queries need to combine information from several tables.

• In RA, such queries are formulated using ×, the Cartesian product.

Cartesian Product

The Cartesian product R × S of two relations R, S is computed by concatenating each tuple t ∈ R with each tuple u ∈ S (◦ denotes tuple concatenation).

Cartesian Product

Actor
AID1  AName  DOB
1     Jolie  4.6.75
2     Pitt   18.12.63

Role
AID2  MID  RName
1     3    Fox
2     1    Jones
2     2    Raine

Actor × Role =
AID1  AName  DOB       AID2  MID  RName
1     Jolie  4.6.75    1     3    Fox
1     Jolie  4.6.75    2     1    Jones
1     Jolie  4.6.75    2     2    Raine
2     Pitt   18.12.63  1     3    Fox
2     Pitt   18.12.63  2     1    Jones
2     Pitt   18.12.63  2     2    Raine

Cartesian Product

• If t = (A1 : a1, ..., An : an) and u = (B1 : b1, ..., Bm : bm), then t ◦ u = (A1 : a1, ..., An : an, B1 : b1, ..., Bm : bm).

• The Cartesian product can be implemented as follows:

Cartesian Product: Nested Loops

    create a new temporary relation T;
    foreach t ∈ R do
        foreach u ∈ S do
            insert t ◦ u into T;
        od
    od
    return T;
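The nested-loops procedure translates almost line by line into Python. The sketch below assumes relations are lists of dicts and models tuple concatenation t ◦ u as a dict merge; the function name `product` and the explicit check for clashing attribute names are additions made for illustration.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
Actor = [
    {"AID1": 1, "AName": "Jolie", "DOB": "4.6.75"},
    {"AID1": 2, "AName": "Pitt", "DOB": "18.12.63"},
]
Role = [
    {"AID2": 1, "MID": 3, "RName": "Fox"},
    {"AID2": 2, "MID": 1, "RName": "Jones"},
    {"AID2": 2, "MID": 2, "RName": "Raine"},
]


def product(r, s):
    """Cartesian product via nested loops; tuple concatenation is a dict merge."""
    result = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if shared:
                # Attribute names must be unique within a tuple.
                raise ValueError(f"shared attribute names: {sorted(shared)}")
            result.append({**t, **u})   # insert t ◦ u into T
    return result


pairs = product(Actor, Role)
print(len(pairs))   # |Actor| * |Role|
```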

Cartesian Product and Renaming

• Since attribute names must be unique within a tuple, the Cartesian product may only be applied if R, S do not share any attribute names. (This is no real restriction because we have renaming in π.)

• R × S may be computed by the equivalent SQL query (SQL does not impose the unique column name restriction; a column A of relation R may be uniquely identified as R.A):

    SELECT *
    FROM   R, S

• In RA, this is often formalized by means of a renaming operator ρX(R). If sch(R) = (A1 : D1, ..., An : Dn), then

    ρX(R) ≡ πX.A1←A1,...,X.An←An(R)

Join

• The intermediate result generated by a Cartesian product may be quite large in general (|R| = n, |S| = m ⇒ |R × S| = n ∗ m).

• Since the combination of Cartesian product and selection in queries is common, a special operator join has been introduced.

Join

The (theta-)join R ⋈θ S between relations R, S is defined as

    R ⋈θ S ≡ σθ(R × S)

The join predicate θ may refer to attribute names of R and S.

Join

ρA(Actor) ⋈A.AID=R.AID ρR(Role), assuming no keys and foreign keys are defined

Actor
AID  AName  DOB
1    Jolie  4.6.75
2    Pitt   18.12.63
3    Ford   13.7.42

Role
AID  MID  RName
1    3    Fox
2    1    Jones
2    2    Raine

ρA(Actor) ⋈A.AID=R.AID ρR(Role) =
A.AID  A.AName  A.DOB     R.AID  R.MID  R.RName
1      Jolie    4.6.75    1      3      Fox
2      Pitt     18.12.63  2      1      Jones
2      Pitt     18.12.63  2      2      Raine

Note: actor Ford does not appear in the join result.

Join

• R ⋈θ S can be evaluated by “folding” the procedures for σ, ×:

Nested Loops Join

    create a new temporary relation T;
    foreach t ∈ R do
        foreach u ∈ S do
            if θ(t ◦ u) then
                insert t ◦ u into T;
            fi
        od
    od
    return T;
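As a rough sketch, the nested loops join can be written in Python as below. The list-of-dicts representation, the function name `theta_join`, and the convention of storing already-renamed attributes such as `A.AID` as dict keys are assumptions made for this example.

```python
# Assumption: relations are lists of dicts; attributes are already renamed
# (rho_A, rho_R) so that the two inputs share no attribute names.
Actor = [
    {"A.AID": 1, "A.AName": "Jolie", "A.DOB": "4.6.75"},
    {"A.AID": 2, "A.AName": "Pitt", "A.DOB": "18.12.63"},
    {"A.AID": 3, "A.AName": "Ford", "A.DOB": "13.7.42"},
]
Role = [
    {"R.AID": 1, "R.MID": 3, "R.RName": "Fox"},
    {"R.AID": 2, "R.MID": 1, "R.RName": "Jones"},
    {"R.AID": 2, "R.MID": 2, "R.RName": "Raine"},
]


def theta_join(r, s, theta):
    """Nested loops join: keep t ◦ u only if the join predicate holds."""
    result = []
    for t in r:
        for u in s:
            tu = {**t, **u}        # t ◦ u
            if theta(tu):          # if theta(t ◦ u) then ...
                result.append(tu)  # insert t ◦ u into T
    return result


joined = theta_join(Actor, Role, lambda tu: tu["A.AID"] == tu["R.AID"])
print([(tu["A.AName"], tu["R.RName"]) for tu in joined])
```

Ford has no matching Role tuple and therefore does not show up in `joined`, in line with the note on the example slide.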

Join

• Join combines tuples from two relations and acts like a filter: tuples without a join partner are removed.

• Note: if the join is used to follow a foreign key relationship, then no tuples are filtered.

• There are join variants which act like filters only: left and right semijoin (⋉, ⋊):

    R ⋉θ S ≡ πsch(R)(R ⋈θ S),

or do not filter at all: outer join (see below).

Natural Join

• The natural join provides another useful abbreviation (“RA macro”).

• In the natural join R ⋈ S, the join predicate θ is defined to be a conjunctive equality comparison of attributes sharing the same name in R, S.

• Natural join handles the necessary attribute renaming and projection.

Natural Join

Assume R(A, B, C) and S(B, C, D). Then:

    R ⋈ S = πA,B,C,D(σB=B′ ∧ C=C′ (R × πB′←B,C′←C,D(S)))

(Note: shared columns occur once in the result.)
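The natural join definition can be illustrated with a short Python sketch. The sample tuples for R(A, B, C) and S(B, C, D) are invented for the example, and `natural_join` is a hypothetical helper; merging the two dicts stands in for the renaming and projection steps, since shared attributes carry equal values and thus occur once.

```python
# Assumption: relations are lists of dicts; the tuples below are made-up sample data.
R = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 5, "C": 6},
]
S = [
    {"B": 2, "C": 3, "D": 7},
    {"B": 2, "C": 9, "D": 8},
]


def natural_join(r, s):
    """Join on equality of all attributes that share a name in r and s."""
    result = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                # Shared columns have equal values, so they occur once in the merge.
                result.append({**t, **u})
    return result


print(natural_join(R, S))
```

If the inputs share no attribute name at all, the condition is vacuously true and the sketch degenerates to the Cartesian product.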

Joins in SQL

• In SQL, R ⋈θ S is normally written using one of the following notations:

‣ Classic notation: SELECT * FROM R, S WHERE θ

‣ SQL-92 notation: SELECT * FROM R JOIN S ON θ

• Note: the classic-notation query is exactly the SQL equivalent of σθ(R × S) we have seen before.

SQL is a declarative language: it is the task of the SQL optimizer to infer that this query may be evaluated using a join instead of a Cartesian product.

Algebraic Laws

• The join satisfies the associativity condition

    (R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T).

In “join chains”, parentheses are thus superfluous:

    R ⋈ S ⋈ T.

• Join is not commutative unless it is followed by a projection, i.e., a column reordering:

    πL(R ⋈ S) ≡ πL(S ⋈ R).

Algebraic Laws

• A significant number of further algebraic laws hold, which are heavily utilized by the query optimizer.

• Example: selection push-down. If predicate φ refers to attributes in S only, then

    σφ(R ⋈ S) ≡ R ⋈ σφ(S).

• (Such efficiency considerations are the subject of “Datenbanken II” (Databases II).)

Selection push-down

Why is selection push-down considered one of the most significant algebraic optimizations?
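A small experiment can make the selection push-down law concrete. The Python sketch below (list-of-dicts relations, hypothetical helpers `select` and `natural_join`) evaluates σRating>5(Role ⋈ Movie) and Role ⋈ σRating>5(Movie) on the sample tables and compares the results. It only checks the equivalence on one instance; it is not a proof.

```python
# Assumption: relations are lists of dicts; data taken from the sample movie database.
Movie = [
    {"MID": 1, "Title": "Babel", "Year": 2006, "Rating": 7},
    {"MID": 2, "Title": "Inglorious Bastards", "Year": 2009, "Rating": 8},
    {"MID": 3, "Title": "Wanted", "Year": 2008, "Rating": 3},
]
Role = [
    {"AID": 1, "MID": 3, "Name": "Fox"},
    {"AID": 2, "MID": 1, "Name": "Richard Jones"},
    {"AID": 2, "MID": 2, "Name": "Lt. Aldo Raine"},
]


def select(predicate, relation):
    return [t for t in relation if predicate(t)]


def natural_join(r, s):
    return [
        {**t, **u}
        for t in r
        for u in s
        if all(t[a] == u[a] for a in t.keys() & u.keys())
    ]


def good(t):
    return t["Rating"] > 5          # refers to attributes of Movie only


late = select(good, natural_join(Role, Movie))    # filter after the join
early = natural_join(Role, select(good, Movie))   # filter pushed below the join
print(late == early)
```

In the pushed-down variant the join sees only two Movie tuples instead of three, so the nested loops compare fewer tuple pairs before producing the same result.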

A Common Query Pattern

• The following operator tree structure is very common:

[Figure: operator tree with πA1,...,Ak at the root, σφ below it, and below that a chain of joins ⋈θ1, ⋈θ2, ..., ⋈θn−1 over the relations R1, R2, ..., Rn−1, Rn]

1. Join all tables needed to answer the query,

2. Select the relevant tuples,

3. Project away all irrelevant columns.

• The select-project-join query

    πA1,...,Ak (σφ(R1 ⋈θ1 R2 ⋈θ2 ··· ⋈θn−1 Rn))

has the obvious SQL equivalent

    SELECT DISTINCT A1, ..., Ak
    FROM   R1, ..., Rn
    WHERE  φ AND θ1 AND ··· AND θn−1

• It is a common source of errors to forget a join condition: think of the scenario R(A, B), S(B, C), T(C, D) when attributes A, D are relevant for the query output.
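The R(A, B), S(B, C), T(C, D) scenario can be played through on a tiny instance. In the Python sketch below, all tuples are invented sample data and the variable names are hypothetical; it contrasts the intended select-project-join query with a variant in which S and both of its join conditions are left out because only A and D are wanted in the output.

```python
# Assumption: relations are lists of dicts; all tuples are made-up sample data.
R = [{"A": 1, "B": 10}, {"A": 2, "B": 20}]
S = [{"B": 10, "C": 100}]
T = [{"C": 100, "D": "x"}, {"C": 200, "D": "y"}]

# Intended query: pi_{A,D}(R join S join T), with both join conditions present.
correct = {
    (r["A"], t["D"])
    for r in R
    for s in S
    for t in T
    if r["B"] == s["B"] and s["C"] == t["C"]
}

# Faulty query: S is dropped, so nothing connects R and T any more and the
# result is the Cartesian product of the A and D values.
forgotten = {(r["A"], t["D"]) for r in R for t in T}

print(sorted(correct))
print(sorted(forgotten))
```

The faulty variant still returns a well-formed relation over (A, D), which is why this kind of mistake tends to go unnoticed.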


Relational Algebra Quiz (Level: Novice)

Sample Movie Database Tables: Actor(AID, Name, DOB), Movie(MID, Title, Year, Rating), Role(AID, MID, Name).

Formulate equivalent queries in RA

Print all Movie titles produced after 2000 that have a high rating (equal or above 8).

Print all Movie titles and Role names where Ford plays a role.

Print all Movie titles and years of movies that appeared in 2010 featuring young actors (born after 1990).

Self Joins

Sometimes it is necessary to refer to more than one tuple of the same relation at the same time.

• Example: “Which movies are remakes of movie A? These movies are identified by an equal title but a higher year.”

• To answer this query, we need to compare two tuples t, u of the relation Movie:

1. tuple t corresponding to a movie with title A,

2. tuple u corresponding to another movie for which u.Year > t.Year AND t.Title = u.Title.

Self Joins

• This requires a generalization of the select-project-join query pattern, in which two instances of the same relation are joined (the attributes in at least one instance must be renamed first):

    πX.MID(ρX(Movie) ⋈X.Title = Y.Title ∧ X.Year < Y.Year ρY(Movie))

As written, the projection returns X.MID, i.e., the earlier movie of each pair; projecting on Y.MID instead returns the remakes.

• Such joins are commonly referred to as self joins.
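To illustrate the self join, the following Python sketch renames two copies of a Movie relation and joins them on equal title and increasing year. The movie tuples are invented for the example (titles "A" and "B" are placeholders), and `rename` is a hypothetical helper standing in for ρ.

```python
# Assumption: relations are lists of dicts; the tuples are made-up sample data.
Movie = [
    {"MID": 1, "Title": "A", "Year": 1960},
    {"MID": 2, "Title": "A", "Year": 2001},
    {"MID": 3, "Title": "B", "Year": 1999},
]


def rename(prefix, relation):
    """rho_X(R): prefix every attribute name with 'X.'."""
    return [{f"{prefix}.{a}": v for a, v in t.items()} for t in relation]


# rho_X(Movie) join_{X.Title = Y.Title and X.Year < Y.Year} rho_Y(Movie)
pairs = [
    {**x, **y}
    for x in rename("X", Movie)
    for y in rename("Y", Movie)
    if x["X.Title"] == y["Y.Title"] and x["X.Year"] < y["Y.Year"]
]

earlier_movies = sorted({p["X.MID"] for p in pairs})   # pi_{X.MID}
remakes = sorted({p["Y.MID"] for p in pairs})          # pi_{Y.MID}
print(earlier_movies, remakes)
```

Which side of the pair is projected decides whether the query lists the earlier movies or their remakes.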

Outer Join

• Join (⋈) eliminates tuples without a partner:

    A   B        B   C        A   B   C
    a1  b1   ⋈   b2  c2   =   a2  b2  c2
    a2  b2       b3  c3

• The left outer join (⟕) preserves all tuples in its left argument, even if a tuple does not team up with a partner in the join:

    A   B        B   C        A   B   C
    a1  b1   ⟕   b2  c2   =   a1  b1  (null)
    a2  b2       b3  c3       a2  b2  c2

Outer Join

• The right outer join (⟖) preserves all tuples in its right argument:

    A   B        B   C        A       B   C
    a1  b1   ⟖   b2  c2   =   a2      b2  c2
    a2  b2       b3  c3       (null)  b3  c3

• The full outer join (⟗) preserves all tuples in both arguments:

    A   B        B   C        A       B   C
    a1  b1   ⟗   b2  c2   =   a1      b1  (null)
    a2  b2       b3  c3       a2      b2  c2
                              (null)  b3  c3

Outer Join

R ⟕θ S

    create a new temporary relation T;
    foreach t ∈ R do
        haspartner ← false;
        foreach u ∈ S do
            if θ(t ◦ u) then
                insert t ◦ u into T;
                haspartner ← true;
            fi
        od
        if ¬haspartner then
            insert t ◦ (null, ..., null) into T;
        fi
    od
    return T;
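The left outer join procedure can be sketched in Python along the same lines. The sketch assumes list-of-dicts relations and uses `None` for null; the function name `left_outer_join`, the explicit `s_attributes` parameter (needed to pad partnerless tuples), and the renamed column `MID2` are assumptions of this example.

```python
# Assumption: relations are lists of dicts; None plays the role of null.
Movie = [
    {"MID": 3, "Title": "Wanted"},
    {"MID": 4, "Title": "Planet Earth"},
]
Role = [
    {"MID2": 3, "Name": "Fox"},      # MID renamed to MID2 to avoid a name clash
    {"MID2": 3, "Name": "Sloan"},
]


def left_outer_join(r, s, theta, s_attributes):
    """Like the nested loops join, but pad tuples of r that find no partner."""
    result = []
    for t in r:
        has_partner = False
        for u in s:
            if theta(t, u):
                result.append({**t, **u})
                has_partner = True
        if not has_partner:
            # t ◦ (null, ..., null)
            result.append({**t, **{a: None for a in s_attributes}})
    return result


joined = left_outer_join(Movie, Role, lambda t, u: t["MID"] == u["MID2"], ["MID2", "Name"])
rows = [(t["MID"], t["Title"], t["Name"]) for t in joined]
print(rows)
```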

Outer Join

Outer Join Example

Prepare a full list of movies (ids and titles suffice) and their associated role names, including those movies that have no associated roles (e.g., the documentary (‘Planet Earth’, 2000, 5)).

    πMID,Title,Name(Movie ⟕MID = MID′ πMID′←MID,Name(Role))

Movie
MID  Title         Year  Rating
3    Wanted        2008  3
4    Planet Earth  2000  5

Role
AID  MID  Name
1    3    Fox
10   3    Sloan

Query result
MID  Title         Name
3    Wanted        Fox
3    Wanted        Sloan
4    Planet Earth  null


Set Operations

• Relations are sets (of tuples). The “usual” family of binary set operations can also be applied to relations.

• It is a requirement that both input relations have the same schema.

Set operations

The set operations of relational algebra are R ∪ S, R ∩ S, and R − S (union, intersection, difference).

A minimal set of operations

Which of these set operations is redundant (i.e., may be derived using an alternative RA expression, just like join)?

Set Operations

[Figure: Venn diagrams of two relations R and S illustrating R ∪ S, R ∩ S, R − S, and S − R]

Set Operations

• R ∪ S may be implemented as follows:

Union

    create a new temporary relation T;
    foreach t ∈ R do
        insert t into T;
    od
    foreach t ∈ S do
        insert t into T;
    od
    remove duplicates in T;
    return T;

Set Operations

• R − S may be implemented as follows:

Difference

    create a new temporary relation T;
    foreach t ∈ R do
        remove ← false;
        foreach u ∈ S do
            remove ← remove ∨ (t = u);
        od
        if ¬remove then
            insert t into T;
        fi
    od
    return T;
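The union and difference procedures can be sketched in Python as follows, assuming list-of-dicts relations with identical schemas. The sample tuples and the function names are invented for illustration. The last lines also show one way to express intersection using difference alone, which relates to the question of which set operation is redundant.

```python
# Assumption: relations are lists of dicts with the same schema; made-up sample data.
R = [{"A": 1}, {"A": 2}]
S = [{"A": 2}, {"A": 3}]


def union(r, s):
    """Insert all tuples of r and s, skipping tuples that are already present."""
    result = []
    for t in r + s:
        if t not in result:      # duplicate removal
            result.append(t)
    return result


def difference(r, s):
    """Keep the tuples of r that are equal to no tuple of s."""
    return [t for t in r if t not in s]


def intersection(r, s):
    """R ∩ S expressed with difference only: R − (R − S)."""
    return difference(r, difference(r, s))


print(union(R, S), difference(R, S), intersection(R, S))
```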

Union

• In RA queries, a typical application for union is case analysis.

Case analysis using union

The following query assigns movie categories based on ratings.

    πMID,Category←‘Favorite movies’ (σRating ≥ 9(Movie))
    ∪ πMID,Category←‘Good movies’ (σRating ≥ 7 ∧ Rating < 9(Movie))
    ∪ πMID,Category←‘Average movies’ (σRating ≥ 5 ∧ Rating < 7(Movie))

Union

• In SQL, ∪ is directly supported: keyword UNION. UNION may be placed between two SELECT-FROM-WHERE blocks:

SQL’s UNION

    SELECT MID, 'Favorite Movies'
    FROM   Movie
    WHERE  Rating >= 9
    UNION
    SELECT MID, 'Good Movies'
    FROM   Movie
    WHERE  Rating >= 7 AND Rating < 9
    UNION
    SELECT MID, 'Average Movies'
    FROM   Movie
    WHERE  Rating >= 5 AND Rating < 7

Set Difference

• Note: the RA operators σ, π, ×, ⋈, ∪ are monotonic by definition, e.g.:

    R ⊆ S ⟹ σφ(R) ⊆ σφ(S).

• Then it follows that every query Q that exclusively uses the above operators behaves monotonically:

‣ Let I1 be a database state, and let I2 = I1 ∪ {t} (database state after insertion of tuple t).

‣ Then every tuple u contained in the answer to Q in state I1 is also contained in the answer to Q in state I2.

Database insertion never invalidates a correct answer.
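Monotonicity can be observed on a small instance. The Python sketch below compares a join-based query with a difference-based query before and after inserting one tuple. The relation contents are sample data, the inserted Role tuple is hypothetical, and the helper names are invented for the example.

```python
# Assumption: relations are lists of dicts; the inserted tuple is hypothetical.
Actor = [
    {"AID": 1, "Name": "Jolie"},
    {"AID": 2, "Name": "Pitt"},
    {"AID": 3, "Name": "Depp"},
]
roles_before = [{"AID": 1, "MID": 3}, {"AID": 2, "MID": 1}, {"AID": 2, "MID": 2}]
roles_after = roles_before + [{"AID": 3, "MID": 1}]   # state I2 = I1 ∪ {t}


def with_role(actors, roles):
    """Join and projection only: a monotonic query."""
    return {a["Name"] for a in actors for r in roles if a["AID"] == r["AID"]}


def without_role(actors, roles):
    """Uses set difference: not monotonic."""
    return {a["Name"] for a in actors} - with_role(actors, roles)


# The old answer of the monotonic query survives the insertion ...
print(with_role(Actor, roles_before) <= with_role(Actor, roles_after))
# ... while the old answer of the difference query does not.
print(without_role(Actor, roles_before) <= without_role(Actor, roles_after))
```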

Set Difference

• If we pose non-monotonic queries, e.g.,

‣ “Which actor has not played any role?”

‣ “What movies featuring Actor1 have the highest rating?”

then it is obvious that σ, π, ×, ⋈, ∪ are not sufficient to formulate the query. Such queries require set difference (−).

A non-monotonic query

“Which actor has not played any role? (Print name and date of birth)”

(Example database tables repeated on next slide)

Example Database

Sample Movie Database Tables

Actor
AID  Name   DOB
1    Jolie  4.6.75
2    Pitt   18.12.63
3    Depp   9.6.1963

Movie
MID  Title                Year  Rating
1    Babel                2006  7
2    Inglorious Bastards  2009  8
3    Wanted               2008  3

Role
AID  MID  Name
1    3    Fox
2    1    Richard Jones
2    2    Lt. Aldo Raine

Set Difference

A correct solution?

    πName,DOB(Actor ⋈AID ≠ AID2 πAID2←AID(Role))

A correct solution?

    πAID,Name,DOB(Actor − πAID(Role))

Correct solution!

Set Difference

• A typical RA query pattern involving set difference is the anti-join.

• Given R(A, B) and S(B, C), retrieve the tuples of R that do not have a (natural) join partner in S (note: sch(R) ∩ sch(S) = {B}):

R ⋈ (πB(R) − πB(S)).

(The following is equivalent: R − πsch(R)(R ⋈ S).)

• There is no common symbol for this anti-join, but an overlined semi-join symbol, R ⋉̅ S, seems appropriate (complemented semi-join).
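The two anti-join formulations can be compared on a small instance. This is a sketch under the assumption that R(A, B) and S(B, C) are Python sets of tuples; the sample data and function names are hypothetical.

```python
# R(A, B) and S(B, C) as sets of tuples; the sample data is hypothetical.
R = {(1, "x"), (2, "y"), (3, "z")}
S = {("x", 10), ("x", 11), ("z", 12)}


def natural_join_r_s(r, s):
    """R ⋈ S on the shared attribute B; result schema (A, B, C)."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


def antijoin_via_projection(r, s):
    """R ⋈ (π_B(R) − π_B(S))."""
    unmatched_b = {b for (_a, b) in r} - {b for (b, _c) in s}
    return {(a, b) for (a, b) in r if b in unmatched_b}


def antijoin_via_join(r, s):
    """R − π_sch(R)(R ⋈ S)."""
    return r - {(a, b) for (a, b, _c) in natural_join_r_s(r, s)}


print(antijoin_via_projection(R, S))
print(antijoin_via_join(R, S))
```

Both functions return the tuples of R whose B value does not occur in S.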

Set Difference

• Suppose that the relations R, S have been computed as:

‣ R := SELECT A1, ..., An FROM R1, ..., Rm WHERE φ1

‣ S := SELECT B1, ..., Bn FROM S1, ..., Sk WHERE φ2

Set difference R − S in SQL

SELECT A1, ..., An
FROM R1, ..., Rm
WHERE φ1 AND NOT EXISTS
  (SELECT *
   FROM S1, ..., Sk
   WHERE φ2 AND B1 = A1 AND ··· AND Bn = An)

NOT EXISTS (...) evaluates to TRUE if the subquery in parentheses returns 0 tuples.
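The NOT EXISTS pattern can be run against the sample movie tables. The sketch below uses Python's standard sqlite3 module with an in-memory database; the choice of SQLite and the concrete query (actors without a role) are assumptions made for illustration. SQL's EXCEPT operator is shown alongside for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Actor (AID INTEGER, Name TEXT, DOB TEXT);
    CREATE TABLE Role  (AID INTEGER, MID INTEGER, Name TEXT);
    INSERT INTO Actor VALUES (1, 'Jolie', '4.6.75'), (2, 'Pitt', '18.12.63'),
                             (3, 'Depp', '9.6.1963');
    INSERT INTO Role VALUES (1, 3, 'Fox'), (2, 1, 'Richard Jones'),
                            (2, 2, 'Lt. Aldo Raine');
""")

# π_AID(Actor) − π_AID(Role), written with NOT EXISTS as in the pattern above.
not_exists_rows = conn.execute("""
    SELECT DISTINCT a.AID
    FROM Actor a
    WHERE NOT EXISTS (SELECT * FROM Role r WHERE r.AID = a.AID)
""").fetchall()

# The same difference using SQL's EXCEPT operator.
except_rows = conn.execute("""
    SELECT AID FROM Actor
    EXCEPT
    SELECT AID FROM Role
""").fetchall()

print(not_exists_rows, except_rows)
```

The two forms agree here. They can differ when NULL values are involved, because EXCEPT treats two NULLs as equal while the comparison with = does not.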

Set Difference

• Note that the availability of ∪, − (and ∩) renders complex selection predicates superfluous:

Predicate simplification rules

σφ1∧φ2(Q) = σφ1(Q) ∩ σφ2(Q)

σφ1∨φ2(Q) = σφ1(Q) ∪ σφ2(Q)

σ¬φ(Q) = Q − σφ(Q)

RDBMS implement complex selection predicates anyway. Why?
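The three simplification rules can be checked on a concrete relation. This is a minimal sketch, assuming Q is a Python set of (MID, Year, Rating) tuples and using two hypothetical predicates φ1 and φ2.

```python
# Q is modeled as a set of (MID, Year, Rating) tuples; predicates are hypothetical.
Q = {(1, 2006, 7), (2, 2009, 8), (3, 2008, 3)}


def select(pred, rel):
    """σ_pred(rel): keep the tuples that satisfy pred."""
    return {t for t in rel if pred(t)}


def phi1(t):
    return t[1] >= 2008      # Year >= 2008


def phi2(t):
    return t[2] > 5          # Rating > 5


# σ_{φ1∧φ2}(Q) = σ_φ1(Q) ∩ σ_φ2(Q)
conj_ok = select(lambda t: phi1(t) and phi2(t), Q) == select(phi1, Q) & select(phi2, Q)
# σ_{φ1∨φ2}(Q) = σ_φ1(Q) ∪ σ_φ2(Q)
disj_ok = select(lambda t: phi1(t) or phi2(t), Q) == select(phi1, Q) | select(phi2, Q)
# σ_{¬φ1}(Q) = Q − σ_φ1(Q)
neg_ok = select(lambda t: not phi1(t), Q) == Q - select(phi1, Q)

print(conj_ok, disj_ok, neg_ok)
```

One plausible answer to the question on the slide: the combined predicate needs a single pass over Q, whereas each set-operation form evaluates Q twice and then combines the results.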

Relational Algebra Quiz (Level: Intermediate)

Formulate equivalent queries in RA

Which movies (print MID) starring the actor with AID 1 have the highest rating?

Relational Algebra Quiz (Level: Intermediate)

Table pivot

Below are two alternatives to represent award recipients per category. Find RA expressions to transform between the two representations.

Award_1

Award  Year  BestMovie      BestActor
Oscar  2010  Great Movie    John Doe
Oscar  2009  Fabulous Film  Mr. Smith
Oscar  2008  Go see it!     Invisible Man

Award_2

Award  Year  Category    Winner
Oscar  2010  Best Movie  Great Movie
Oscar  2009  Best Movie  Fabulous Film
Oscar  2008  Best Movie  Go see it!
Oscar  2010  Best Actor  John Doe
Oscar  2009  Best Actor  Mr. Smith
Oscar  2008  Best Actor  Invisible Man

Summary

The five basic operations of relational algebra are:

1. Selection

2. Projection

3. Cartesian Product

4. Union

5. Difference

• Derived (and thus redundant) operations: Theta-Join, Natural Join, Semi-Join, Renaming, and Intersection.

• Extensions to the basic relational algebra: left outer join, right outer join, full outer join.

Combining Operations

•Since the result of any relational algebra operation is a relation again, this intermediate result may be the input of a subsequent RA operation.

Example: πTitle,Name(Role ⋈ σRating>5(Movie))

• We can think of the intermediate result as being stored in a named temporary relation (or as a macro definition):

HighRatings := σRating>5(Movie);
MR := Role ⋈ HighRatings;
πTitle,Name(MR);
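The step-by-step evaluation with named intermediate results can be mirrored directly. The sketch below assumes relations are Python sets of tuples holding the sample Movie and Role tables; the variable names follow the macro definitions, everything else is illustrative.

```python
# Sample tables as sets of tuples (an assumed encoding).
MOVIE = {(1, "Babel", 2006, 7), (2, "Inglorious Bastards", 2009, 8),
         (3, "Wanted", 2008, 3)}                       # (MID, Title, Year, Rating)
ROLE = {(1, 3, "Fox"), (2, 1, "Richard Jones"),
        (2, 2, "Lt. Aldo Raine")}                      # (AID, MID, Name)

# HighRatings := σ_Rating>5(Movie)
high_ratings = {m for m in MOVIE if m[3] > 5}

# MR := Role ⋈ HighRatings (natural join on MID);
# result schema (AID, MID, Name, Title, Year, Rating)
mr = {(aid, mid, name, title, year, rating)
      for (aid, mid, name) in ROLE
      for (mid2, title, year, rating) in high_ratings
      if mid == mid2}

# π_Title,Name(MR)
result = {(title, name) for (_aid, _mid, name, title, _year, _rating) in mr}

# The same query written as one nested expression, without temporaries.
nested = {(title, name)
          for (_aid, mid, name) in ROLE
          for (mid2, title, _year, rating) in MOVIE
          if rating > 5 and mid == mid2}

print(sorted(result))
```

Because every intermediate value is again a set of tuples, it can be fed into the next operation, which is the closure property the slide relies on.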

Combining Operations

• Composite RA expressions are typically depicted as operator trees:

      πTitle,Name
           |
           ⋈
         /   \
     Role     σRating>5
                  |
                Movie

• In these trees, computation proceeds bottom-up. The evaluation order of sibling branches is not predetermined.

Combining Operations

• SQL-92 permits the nesting of queries (the result of an SQL query may be used in place of a relation name):

Nested SQL Query

SELECT DISTINCT Title, Name
FROM (SELECT *
      FROM Role,
           (SELECT *
            FROM Movie
            WHERE Rating > 5) AS HighRatings
      WHERE Role.MID = HighRatings.MID) AS MR

• Note that this is not the typical style of SQL querying!

Combining Operations

• Instead, a single SQL query is equivalent to an RA operator tree containing σ, π, and (multiple) × (see below):

SELECT-FROM-WHERE Block

SELECT DISTINCT Title, Name
FROM Role, Movie
WHERE Movie.MID = Role.MID AND Rating > 5

• Really complex queries may be constructed step by step using SQL's view mechanism; a view may then be used like a relation:

SQL View Definition

CREATE VIEW MR AS
SELECT *
FROM Role, Movie
WHERE Movie.MID = Role.MID AND Rating > 5
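The flat query, the nested query, and the view-based query can be compared on the sample tables. The sketch below uses Python's standard sqlite3 module (an assumption; the slides do not name a system). It deviates from the slide in two labeled ways: the view and the inner query list their columns explicitly instead of SELECT *, since some systems reject a view with two columns named MID, and the derived table is aliased MR2 to keep it apart from the view MR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (MID INTEGER, Title TEXT, Year INTEGER, Rating INTEGER);
    CREATE TABLE Role  (AID INTEGER, MID INTEGER, Name TEXT);
    INSERT INTO Movie VALUES (1, 'Babel', 2006, 7),
                             (2, 'Inglorious Bastards', 2009, 8),
                             (3, 'Wanted', 2008, 3);
    INSERT INTO Role VALUES (1, 3, 'Fox'), (2, 1, 'Richard Jones'),
                            (2, 2, 'Lt. Aldo Raine');
    -- Explicit column list so that the view has no duplicate MID column.
    CREATE VIEW MR AS
        SELECT Role.AID, Role.MID, Role.Name, Movie.Title, Movie.Year, Movie.Rating
        FROM Role, Movie
        WHERE Movie.MID = Role.MID AND Rating > 5;
""")

# Single SELECT-FROM-WHERE block.
flat = conn.execute("""
    SELECT DISTINCT Title, Name
    FROM Role, Movie
    WHERE Movie.MID = Role.MID AND Rating > 5
    ORDER BY Title
""").fetchall()

# Nested formulation mirroring the RA operator tree.
nested = conn.execute("""
    SELECT DISTINCT Title, Name
    FROM (SELECT Role.Name, HighRatings.Title
          FROM Role,
               (SELECT * FROM Movie WHERE Rating > 5) AS HighRatings
          WHERE Role.MID = HighRatings.MID) AS MR2
    ORDER BY Title
""").fetchall()

# Query against the view, used like a relation.
via_view = conn.execute(
    "SELECT DISTINCT Title, Name FROM MR ORDER BY Title").fetchall()

print(flat)
```

All three formulations return the same two (Title, Name) pairs on this instance.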

Syntax

• Let the following be given:

‣ A set 𝒟 of data type names and, for each D ∈ 𝒟, a set val(D) of values.

‣ A set 𝒜 of valid attribute names (identifiers).

Relational database schema

A relational database schema 𝒮 consists of

• a finite set of relation names ℛ, and

• for every R ∈ ℛ a relation schema sch(R).

(We will ignore constraints here.)

Syntax

• The set of syntactically correct RA expressions or queries is defined recursively, together with the resulting schema of each expression.

Syntax of RA (base cases)

1. R (relation name): For every R ∈ ℛ, R is an RA expression with schema sch(R).

2. {(A1:d1, ..., An:dn)} (relation constant): A relation constant is an RA expression if A1, ..., An ∈ 𝒜 and di ∈ val(Di) for 1 ≤ i ≤ n with D1, ..., Dn ∈ 𝒟. The schema of this expression is (A1:D1, ..., An:Dn).

Syntax

• Let Q be an RA expression with schema s = (A1:D1, ..., An:Dn).

Syntax of RA (recursive cases)

3. σAi=Aj(Q) for i, j ∈ {1, ..., n} is an RA expression with schema s.

4. σAi=d(Q) for i ∈ {1, ..., n} and d ∈ val(Di) is an RA expression with schema s.

5. πB1←Ai1,...,Bm←Aim(Q) for i1, ..., im ∈ {1, ..., n} and B1, ..., Bm ∈ 𝒜 such that Bj ≠ Bk for j ≠ k is an RA expression with schema (B1:Di1, ..., Bm:Dim).

Syntax

• Let Q1, Q2 be RA expressions with the same schema s.

Syntax of RA (recursive cases)

6. Q1 ∪ Q2 and

7. Q1 − Q2

are RA expressions with schema s.

• Let Q1, Q2 be RA expressions with schemas (A1:D1, ..., An:Dn) and (B1:E1, ..., Bm:Em), respectively.

Syntax of RA (recursive cases)

8. Q1 × Q2 is an RA expression with schema (A1:D1, ..., An:Dn, B1:E1, ..., Bm:Em) if {A1, ..., An} ∩ {B1, ..., Bm} = ∅.

Semantics

Database state

A database state I (instance) defines a relation I(R) for every relation name R in the database schema 𝒮.

• The result of a query Q, i.e., an RA expression, in a database state I is a relation. This relation is denoted by I[Q] and is defined recursively, following the syntactic structure of Q.

Semantics

I[Q]

• If Q is a relation name R, then I[Q] := I(R).

• If Q is a constant relation {(A1:d1, ..., An:dn)}, then I[Q] := {(d1, ..., dn)}.

• If Q has the form σAi=Aj(Q1), then I[Q] := {(d1, ..., dn) ∈ I[Q1] | di = dj}.

• If Q has the form σAi=d(Q1), then I[Q] := {(d1, ..., dn) ∈ I[Q1] | di = d}.

• If Q has the form πB1←Ai1,...,Bm←Aim(Q1), then I[Q] := {(di1, ..., dim) | (d1, ..., dn) ∈ I[Q1]}.

Semantics

I[Q] (continued)

• If Q has the form Q1 ∪ Q2, then I[Q] := I[Q1] ∪ I[Q2].

• If Q has the form Q1 − Q2, then I[Q] := I[Q1] − I[Q2].

• If Q has the form Q1 × Q2, then I[Q] := {(d1, ..., dn, e1, ..., em) | (d1, ..., dn) ∈ I[Q1], (e1, ..., em) ∈ I[Q2]}.
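The recursive definition of I[Q] translates almost line by line into a small interpreter. The following is a sketch, not a reference implementation: the nested-tuple encoding of expressions, the operator tags, and the 0-based attribute positions are assumptions of this example, and attribute renaming in π is ignored because tuples are positional.

```python
def evaluate(q, state):
    """Compute I[Q] by recursion on the structure of Q.

    q is a nested tuple (an encoding assumed for this sketch); state maps
    relation names to sets of tuples. Attribute positions are 0-based.
    """
    op = q[0]
    if op == "rel":                       # relation name R: I(R)
        return state[q[1]]
    if op == "const":                     # constant relation {(d1, ..., dn)}
        return {q[1]}
    if op == "sel_attr":                  # σ_{Ai=Aj}(Q1)
        _, i, j, q1 = q
        return {t for t in evaluate(q1, state) if t[i] == t[j]}
    if op == "sel_const":                 # σ_{Ai=d}(Q1)
        _, i, d, q1 = q
        return {t for t in evaluate(q1, state) if t[i] == d}
    if op == "proj":                      # π over the listed positions of Q1
        _, positions, q1 = q
        return {tuple(t[i] for i in positions) for t in evaluate(q1, state)}
    if op == "union":                     # Q1 ∪ Q2
        return evaluate(q[1], state) | evaluate(q[2], state)
    if op == "diff":                      # Q1 − Q2
        return evaluate(q[1], state) - evaluate(q[2], state)
    if op == "prod":                      # Q1 × Q2: concatenate tuples
        return {t + u for t in evaluate(q[1], state) for u in evaluate(q[2], state)}
    raise ValueError(f"unknown operator: {op}")


STATE = {
    "Actor": {(1, "Jolie"), (2, "Pitt"), (3, "Depp")},   # (AID, Name)
    "Role": {(1, 3), (2, 1), (2, 2)},                    # (AID, MID)
}

# π_Name(σ_{Actor.AID=Role.AID}(Actor × Role)): names of actors with a role.
with_role = ("proj", (1,),
             ("sel_attr", 0, 2, ("prod", ("rel", "Actor"), ("rel", "Role"))))
# π_AID(Actor) − π_AID(Role): AIDs of actors without a role.
without_role = ("diff", ("proj", (0,), ("rel", "Actor")),
                ("proj", (0,), ("rel", "Role")))

print(evaluate(with_role, STATE))
print(evaluate(without_role, STATE))
```

Since every case either returns a stored relation or combines the results of strictly smaller subexpressions, the recursion terminates for every expression and state.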

Monotonicity

Smaller database state

A database state I1 is smaller than (or equal to) a database state I2, written I1 ⊆ I2, iff I1(R) ⊆ I2(R) for all relation names R ∈ ℛ of schema 𝒮.

Theorem: RA \ {−} is monotonic

If an RA expression Q does not contain the − (set difference) operator, then the following holds for all database states I1, I2:

I1 ⊆ I2 ⟹ I1[Q] ⊆ I2[Q].

Formulate the proof by induction on the syntactic structure of Q (“structural induction”).

Equivalence

Equivalence of RA expressions

Two RA expressions Q1 and Q2 are equivalent iff they have the same (result) schema and for all database states I, the following holds:

I[Q1] = I[Q2]

• Examples:

‣ σφ1(σφ2(Q)) = σφ2(σφ1(Q))

‣ (Q1 × Q2) × Q3 = Q1 × (Q2 × Q3)

‣ If A is an attribute in the result schema of Q1, then σA=d(Q1 × Q2) = (σA=d(Q1)) × Q2.

• Theorem: The equivalence of (arbitrary) relational algebra expressions is undecidable.
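The selection push-down rule from the examples can be tried on one database state. The sketch below reads the binary operator as Cartesian product and models Q1 and Q2 as Python sets of tuples with made-up data; note that agreement on a single state can refute an equivalence but never prove it.

```python
from itertools import product

Q1 = {(1, "Jolie"), (2, "Pitt"), (3, "Depp")}      # schema (AID, Name)
Q2 = {(1, "Babel"), (3, "Wanted")}                 # schema (MID, Title)


def cartesian(r, s):
    """r × s: concatenate every tuple of r with every tuple of s."""
    return {t + u for t, u in product(r, s)}


def select_eq(pos, value, rel):
    """σ_{A=value}(rel), where A is the attribute at 0-based position pos."""
    return {t for t in rel if t[pos] == value}


lhs = select_eq(0, 2, cartesian(Q1, Q2))     # σ_{AID=2}(Q1 × Q2)
rhs = cartesian(select_eq(0, 2, Q1), Q2)     # (σ_{AID=2}(Q1)) × Q2

print(lhs == rhs)
print(sorted(lhs))
```

The right-hand side builds a product of 1 × 2 tuples instead of 3 × 2, which is why query optimizers like to apply this rule from left to right.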

Limitations of RA

• Let R be a relation name and assume sch(R) = (A:D, B:D), i.e., both columns share the same data type D. Let val(D) be infinite.

• The transitive closure of I(R) is the set of all (d, e) ∈ val(D) × val(D) such that there are n ∈ ℕ, n ≥ 1, and d0, d1, ..., dn ∈ val(D) with d = d0, e = dn, and (di−1, di) ∈ I(R) for i = 1, ..., n.

Example of transitive closure

R

from  to
a     b
b     c
c     d

(Figure: directed graph over the nodes a, b, c, d with one edge per tuple of R.)

Limitations of RA

• Theorem: There is no RA expression Q such that I[Q] is the transitive closure of I(R) for all database states I.

• An n-fold self-join will find all paths in the graph of length n + 1. Computing the transitive closure for arbitrary graphs, i.e., for all database states I, is impossible in RA.

Example of transitive closure

In the directed graph example, one self-join (of R with itself) is needed to follow two consecutive edges in the graph:

πS.from,T.to(ρS(R) ⋈S.to=T.from ρT(R))

R

from  to
a     b
b     c
c     d
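The difference between a fixed number of self-joins and the full transitive closure can be made concrete. The sketch below models R as a Python set of (from, to) pairs; the fixpoint loop is ordinary Python iteration, which is exactly the ingredient a single RA expression lacks, and the function names are hypothetical.

```python
R = {("a", "b"), ("b", "c"), ("c", "d")}    # edges (from, to)


def self_join(left, right):
    """π_{S.from,T.to}(ρ_S(left) ⋈_{S.to=T.from} ρ_T(right))."""
    return {(x, z) for (x, y) in left for (y2, z) in right if y == y2}


# One self-join: all paths of length 2.
paths_len2 = self_join(R, R)


def transitive_closure(edges):
    """Repeat the join until no new pair appears (a fixpoint loop, not RA)."""
    closure = set(edges)
    while True:
        new_pairs = self_join(closure, edges) - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


print(sorted(paths_len2))
print(sorted(transitive_closure(R)))
```

The number of loop iterations depends on the longest path in the data, whereas any given RA expression contains a fixed number of joins; this is the intuition behind the theorem.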

Limitations of RA

•This of course implies that relational algebra is not computationally complete.

•There are functions from database states to relations (query results), for which we could write a program using our favorite programming language, but we will not be able to find an equivalent RA expression to do the same.

• However, computational completeness would have been truly unexpected and actually unwanted, because we want a guarantee that query evaluation always terminates. This is guaranteed for RA.

Otherwise, we would have solved the halting problem.

Limitations of RA

•All RA queries can be evaluated in time that is polynomial in the size of the database state.

•This implies that certain “complex problems” cannot be formulated in relational algebra.

For example, if you find a way to formulate the Traveling Salesman problem in RA, you have solved the famous P = NP problem. (With a solution that nobody expects; contact me to collect your PhD.)

• As the transitive closure example shows, not even all problems of polynomial complexity can be formulated in “classical RA.”

Expressive Power

Relational completeness

A query language L for the relational model is called strong relationally complete if, for every DB schema 𝒮 and for every RA expression Q1 with respect to 𝒮, there is a query Q2 ∈ L such that for all database states I with respect to 𝒮 the two queries produce the same results:

I[Q1] = I[Q2]

Read as: “It is possible to write an RA-to-L query compiler.”

Expressive Power

•SQL is strong relationally complete.

• If we can even write RA-to-L as well as L-to-RA compilers, both query languages are equivalent.

•SQL and RA are not equivalent. SQL contains concepts, e.g., the aggregate COUNT, which cannot be simulated in RA.

Equivalent query languages

•Relational algebra,

•SQL without aggregations and with mandatory duplicate elimination,

•Tuple relational calculus,

•Datalog (a Prolog variant) without recursion.


Summary

• Relational algebra is a query language over relational data.

• Five basic operators: selection, projection, Cartesian product, union, set difference.

• Derived operators: intersection, join, semijoin, outer joins.

• Operators can be nested and form tree-shaped query plans.

• Theoretical background: syntax, semantics, monotonicity, equivalence, limitations.