Upload
jeremy-sanders
View
245
Download
0
Tags:
Embed Size (px)
Citation preview
1
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Chapter 3
Relational algebra and calculus
2
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Query languages for relational databases
• Operations on databases:– queries: "read" data from the database– updates: change the content of the database
• Both can be modeled as functions from databases to databases• Foundations can be studied with reference to query languages:
– relational algebra, a "procedural" language– relational calculus, a "declarative" language– (briefly) Datalog, a more powerful language
• Then, we will study SQL, a practical language (with declarative and procedural features) for queries and updates
3
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Relational algebra
• A collection of operators that– are defined on relations– produce relations as results
and therefore can be combined to form complex expressions• Operators
– union, intersection, difference– renaming– selection– projection– join (natural join, cartesian product, theta join)
4
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Union, intersection, difference
• Relations are sets, so we can apply set operators• However, we want the results to be relations (that is,
homogeneous sets of tuples)• Therefore:
– it is meaningful to apply union, intersection, difference only to pairs of relations defined over the same attributes
5
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Union
Number Surname Age7274 Robinson 377432 O'Malley 399824 Darkes 38
Number Surname Age9297 O'Malley 567432 O'Malley 399824 Darkes 38
Graduates
Managers
Number Surname Age7274 Robinson 377432 O'Malley 399824 Darkes 389297 O'Malley 56
Graduates Managers
6
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Intersection
Number Surname Age7274 Robinson 377432 O'Malley 399824 Darkes 38
Number Surname Age9297 O'Malley 567432 O'Malley 399824 Darkes 38
Graduates
ManagersNumber Surname Age
7432 O'Malley 399824 Darkes 38
Graduates Managers
7
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Difference
Number Surname Age7274 Robinson 377432 O'Malley 399824 Darkes 38
Number Surname Age9297 O'Malley 567432 O'Malley 399824 Darkes 38
Graduates
ManagersNumber Surname Age
7274 Robinson 37
Graduates - Managers
8
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Paternity Maternity ???
Father ChildAdam CainAdam Abel
Abraham IsaacAbraham Ishmael
Paternity Mother ChildEve CainEve Seth
Sarah IsaacHagar Ishmael
Maternity
A meaningful but impossible union
• the problem: Father and Mother are different names, but both represent a "Parent"
• the solution: rename attributes
9
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Renaming
• unary operator• "changes attribute names" without changing values• removes the limitations associated with set operators• notation:
Y X.(r)
• example: Parent Father.(Paternity)
• if there are two or more attributes involved then ordering is meaningful: Location,Pay Branch,Salary.(Employees)
10
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Renaming, example
Father ChildAdam CainAdam Abel
Abraham IsaacAbraham Ishmael
Paternity
Father ChildAdam CainAdam Abel
Abraham IsaacAbraham Ishmael
Parent Father.(Paternity)
11
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Renaming and union
Father ChildAdam CainAdam Abel
Abraham IsaacAbraham Ishmael
Paternity Mother ChildEve CainEve Seth
Sarah IsaacHagar Ishmael
Maternity
Parent Father.(Paternity) Parent Mother.(Maternity)
Parent ChildAdam CainAdam Abel
Abraham IsaacAbraham Ishmael
Eve CainEve Seth
Sarah IsaacHagar Ishmael
12
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Renaming and union, with more attributes
Surname Branch SalaryPatterson Rome 45Trumble London 53
Employees
Location,Pay Branch,Salary (Employees) Location,Pay Factory, Wages (Staff)
Surname Location PayPatterson Rome 45Trumble London 53Cooke Chicago 33Bush Monza 32
Surname Factory WagesPatterson Rome 45Trumble London 53
Staff
13
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Selection and projection
• Two unary operators, in a sense orthogonal:– selection for "horizontal" decompositions– projection for "vertical" decompositions
A B C
A B C
Selection
A B C
Projection
A B
14
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Selection
• Produce results – with the same schema as the operand – with a subset of the tuples (those that satisfy a condition)
• Notation: F(r)
• Semantics: F(r) = { t | t r and t satisfies F}
15
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Selection, example
Surname FirstName Age SalarySmith Mary 25 2000Black Lucy 40 3000Verdi Nico 36 4500Smith Mark 40 3900
Employees
Surname FirstName Age SalarySmith Mary 25 2000Verdi Nico 36 4500
Age<30 Salary>4000 (Employees)
16
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Selection, another example
Surname FirstName PlaceOfBirth ResidenceSmith Mary Rome MilanBlack Lucy Rome RomeVerdi Nico Florence FlorenceSmith Mark Naples Florence
Citizens
PlaceOfBirth=Residence (Citizens)
Surname FirstName PlaceOfBirth ResidenceBlack Lucy Rome RomeVerdi Nico Florence Florence
17
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Projection
• Produce results – over a subset of the attributes of the operand – with values from all its tuples
• Notation (given a relation r(X) and a subset Y of X): Y(r)
• Semantics: Y(r) = { t[Y] | t r }
18
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Projection, example
Surname FirstName Department HeadSmith Mary Sales De RossiBlack Lucy Sales De RossiVerdi Mary Personnel FoxSmith Mark Personnel Fox
Employees
Surname FirstNameSmith MaryBlack LucyVerdi MarySmith Mark
Surname, FirstName(Employees)
19
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Projection, another example
Surname FirstName Department HeadSmith Mary Sales De RossiBlack Lucy Sales De RossiVerdi Mary Personnel FoxSmith Mark Personnel Fox
Employees
Department HeadSales De Rossi
Personnel Fox
Department, Head (Employees)
20
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Cardinality of projections
• The result of a projections contains at most as many tuples as the operand
• It can contain fewer, if several tuples "collapse" Y(r) contains as many tuples as r if and only if Y is a superkey
for r; – this holds also if Y is "by chance" (not defined as a superkey
in the schema, but superkey for the specific instance), see the example
21
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Tuples that collapse
RegNum Surname FirstName BirthDate DegreeProg284328 Smith Luigi 29/04/59 Computing296328 Smith John 29/04/59 Computing587614 Smith Lucy 01/05/61 Engineering934856 Black Lucy 01/05/61 Fine Art965536 Black Lucy 05/03/58 Fine Art
Students
Surname DegreeProgSmith ComputingSmith EngineeringBlack Fine Art
Surname, DegreeProg (Students)
22
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Tuples that do not collapse, "by chance"
Students
Surname, DegreeProg (Students) Surname DegreeProgSmith ComputingSmith EngineeringBlack Fine ArtBlack Engineering
RegNum Surname FirstName BirthDate DegreeProg296328 Smith John 29/04/59 Computing587614 Smith Lucy 01/05/61 Engineering934856 Black Lucy 01/05/61 Fine Art965536 Black Lucy 05/03/58 Engineering
23
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Join
• The most typical operator in relational algebra• allows to establish connections among data in different
relations, taking into advantage the "value-based" nature of the relational model
• Two main versions of the join:– "natural" join: takes attribute names into account– "theta" join
• They are all denoted by the symbol
24
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
A natural join
Employee DepartmentSmith salesBlack productionWhite production
Department Headproduction Mori
sales Brown
Employee Department HeadSmith sales BrownBlack production MoriWhite production Mori
r1 r2
r1 r2
25
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Natural join: definition
• r1 (X1), r2 (X2)
• r1 r2 (natural join of r1 and r2) is a relation on X1X2 (the union of the two sets):
{ t on X1X2 | t [X1] r1 and t [X2] r2 }
or, equivalently
{ t on X1X2 | exist t1 r1 and t2 r2 with t [X1] = t1 and t [X2] = t2 }
26
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Natural join: comments
• The tuples in the result are obtained by combining tuples in the operands with equal values on the common attributes
• The common attributes often form a key of one of the operands (remember: references are realized by means of keys, and we join in order to follow references)
27
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Another natural join
Offences Code Date Officer Dept Registartion143256 25/10/1992 567 75 5694 FR987554 26/10/1992 456 75 5694 FR987557 26/10/1992 456 75 6544 XY630876 15/10/1992 456 47 6544 XY539856 12/10/1992 567 47 6544 XY
Cars Registration Dept Owner …6544 XY 75 Cordon Edouard …7122 HT 75 Cordon Edouard …5694 FR 75 Latour Hortense …6544 XY 47 Mimault Bernard …
Code Date Officer Dept Registration Owner …143256 25/10/1992 567 75 5694 FR Latour Hortense …987554 26/10/1992 456 75 5694 FR Latour Hortense …987557 26/10/1992 456 75 6544 XY Cordon Edouard …630876 15/10/1992 456 47 6544 XY Cordon Edouard …539856 12/10/1992 567 47 6544 XY Cordon Edouard …
Offences Cars
28
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Yet another join
• Compare with the union: – the same data can be combined in various ways
Father ChildAdam CainAdam Abel
Abraham IsaacAbraham Ishmael
Paternity Mother ChildEve CainEve Seth
Sarah IsaacHagar Ishmael
Maternity
Paternity Maternity
Father Child MotherAdam Cain Eve
Abraham Isaac SarahAbraham Ishmael Hagar
29
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Joins can be "incomplete"
• If a tuple does not have a "counterpart" in the other relation, then it does not contribute to the join ("dangling" tuple)
Employee DepartmentSmith salesBlack productionWhite production
Department Headproduction Moripurchasing Brown
Employee Department HeadBlack production MoriWhite production Mori
r1 r2
r1 r2
30
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Joins can be empty
• As an extreme, we might have that no tuple has a counterpart, and all tuples are dangling
Employee DepartmentSmith salesBlack productionWhite production
Department Headmarketing Moripurchasing Brown
Employee Department Head
r1 r2
r1 r2
31
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
The other extreme
• If each tuple of each operand can be combined with all the tuples of the other, then the join has a cardinality that is the product of the cardinalities of the operands
Employee ProjectSmith ABlack AWhite A
Project HeadA MoriA Brown
Employee Project HeadSmith A MoriBlack A BrownWhite A MoriSmith A BrownBlack A MoriWhite A Brown
r1 r2
r1 r2
32
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
How many tuples in a join?
Given r1 (X1), r2 (X2)
• the join has a cardinality between zero and the products of the cardilnalities of the operands:
0 | r1 r2 | | r1 | | r2|
(| r | is the cardinality of relation r)• moreover:
– if the join is complete, then its cardinality is at least the maximum of | r1 | and | r2|
– if X1X2 contains a key for r2, then | r1 r2 | | r1|
– if X1X2 is the primary key for r2, and there is a referential constraint between X1X2 in r1 and such a key, then | r1 r2 | = | r1|
33
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Outer joins
• A variant of the join, to keep all pieces of information from the operands
• It "pads with nulls" the tuples that have no counterpart• Three variants:
– "left": only tuples of the first operand are padded– "right": only tuples of the second operand are padded– "full": tuples of both operands are padded
34
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Outer joinsEmployee Department
Smith salesBlack productionWhite production
Department Headproduction Moripurchasing Brown
Employee Department HeadSmith Sales NULL
Black production MoriWhite production Mori
r1 r2
r1 LEFTr2
Employee Department HeadBlack production MoriWhite production MoriNULL purchasing Brown
r1 RIGHT r2
Employee Department HeadSmith Sales NULL
Black production MoriWhite production MoriNULL purchasing Brown
r1 FULL r2
35
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
N-ary join
• The natural join is
– commutative: r1 r2 = r2 r1
– associative: (r1 r2) r3 = r1 (r2 r3)
• Therefore, we can write n-ary joins without ambiguity:
r1 r2 … rn
36
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
N-ary joinEmployee Department
Smith salesBlack productionBrown marketingWhite production
Department Divisionproduction Amarketing Bpurchasing B
r1 r2
Division HeadA MoriB Brown
r3
Employee Department Division HeadBlack production A MoriBrown marketing B BrownWhite production A Mori
r1 r2 r3
37
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Cartesian product
• The natural join is defined also when the operands have no attributes in common
• in this case no condition is imposed on tuples, and therefore the result contains tuples obtained by combining the tuples of the operands in all possible ways
38
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Cartesian product: example
Employee ProjectSmith ABlack ABlack B
Code NameA VenusB Mars
Employee Project Code NameSmith A A VenusBlack A A VenusBlack B A VenusSmith A B MarsBlack A B MarsBlack B B Mars
Employees Projects
Employes Projects
39
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Theta-join
• In most cases, a cartesian product is meaningful only if followed by a selection:– theta-join: a derived operator
r1 F r2 = F(r1 r2)
– if F is a conjunction of equalities, then we have an equi-join
40
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Equi-join: example
Employee ProjectSmith ABlack ABlack B
Code NameA VenusB Mars
Employee Project Code NameSmith A A VenusBlack A A VenusBlack B B Mars
Employees Projects
Employes Project=Code Projects
41
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Queries
• A query is a function from database instances to relations• Queries are formulated in relational algebra by means of
expressions over relations
42
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
A database for the examples
Number Name Age Salary101 Mary Smith 34 40103 Mary Bianchi 23 35104 Luigi Neri 38 61105 Nico Bini 44 38210 Marco Celli 49 60231 Siro Bisi 50 60252 Nico Bini 44 70301 Steve Smith 34 70375 Mary Smith 50 65
Employees
Head Employee210 101210 103210 104231 105301 210301 231375 252
Supervision
43
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Example 1
• Find the numbers, names and ages of employees earning more than 40 thousand.
Number Name Age104 Luigi Neri 38210 Marco Celli 49231 Siro Bisi 50252 Nico Bini 44301 Steve Smith 34375 Mary Smith 50
44
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Example 2
• Find the registration numbers of the supervisors of the employees earning more than 40 thousand
Head210301375
45
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Example 3
• Find the names and salaries of the supervisors of the employees earning more than 40 thousand
NameH SalaryHMarco Celli 60Steve Smith 70Mary Smith 65
46
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Example 4
• Find the employees earning more than their respective supervisors, showing registration numbers, names and salaries of the employees and supervisors
Number Name Salary NumberH NameH SalaryH104 Luigi Neri 61 210 Marco Celli 60252 Nico Bini 70 375 Mary Smith 65
47
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Example 5
• Find the registration numbers and names of the supervisors whose employees all earn more than 40 thousand
Number Name301 Steve Smith375 Mary Smith
48
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Algebra with null values
Age>30 (People)
• which tuples belong to the result?• The first yes, the second no, but the third?
Name Age SalaryAldo 35 15
Andrea 27 21Maria NULL 42
People
49
Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus
McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999
Material on relational calculus and Datalog will be provided in the near future.
Please contact Paolo Atzeni ([email protected])
for more information