
ER Modeling - ETH Z · g) Which reader has borrowed all the books (by ISBN, not copies) from the author "Ephraim Kishon"?(*) h) For which of the books there is at least one copy available?(*)



ETH Zurich, Systems Group, Prof. D. Kossmann | Data Modelling and Databases, FS 2014 | Exercise Sheet 1 (discussion: March 4 / 7)

ER Modeling

1 Library System

Assume there is a library system with the following properties:

• The library contains one or several copies of the same book.

• Every copy of a book has a copy number and is located at a specified location in a shelf. A copy is identified by the copy number and the ISBN number of the book.

• Every book has a unique ISBN, a publication year, a title, an author, and a number of pages.

• Books are published by publishers.

• A publisher has a name as well as a location.

• Within the library system, books are assigned to one or several categories.

• A category can be a subcategory of exactly one other category. A category has a name and no further properties.

• Each reader needs to provide his/her family name, his/her first name, his/her city, and his/her date of birth to register at the library. Each reader gets a unique reader number.

• Readers borrow books.

• Upon borrowing, the return date is stored.

Create an ER diagram of this mini world.

2 Relationships in ER

Model the following relationships in ER.

a) An apartment is located in a house in a street in a city in a country.

b) Two teams play football against each other. A referee makes sure the rules are followed.

c) Men and women have a father and a mother each.

3 Modeling ER in ER

Model ER in ER.


ETH Zurich, Systems Group, Prof. D. Kossmann | Data Modelling and Databases, FS 2014 | Exercise Sheet 1 (discussion: March 4 / 7)

ER Modeling – Solutions

1 Library System

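ER diagram:

• Entities: Reader (ReaderNr, FamilyName, Firstname, City, Birthdate); Book (ISBN, Title, Author, PubYear, NumPages); Copy, a weak entity (CopyNr, Shelf, Position); Category (Catname); Publisher (Pubname, Pubcity).

• Borrows: N:M between Reader and Copy, with attribute ReturnDate.

• Available: Copy (N) to Book (1); identifying relationship of the weak entity Copy.

• InCat: Book (N) to Category (M).

• Contains: Category (1) to Category (N), i.e. a category contains its subcategories.

• Publishes: Book (N) to Publisher (1).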

Alternative solutions:

• Borrows as entity (what is the key of this entity?)

• Author as its own entity

• Shelf as its own entity

• Publishes relationship as N:M (books are published by several publishers)


2 Relationships in ER

a)

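ER diagram:

• Appartement (N) Located in House (1).

• House (N) Located in Street (1).

• Street (N) Located in City (1).

• City (N) Located in Country (1).

Appartement, House, Street and City are weak entities, each identified via its Located in relationship.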

b)

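ER diagram: entities Team and Referee; Plays is a ternary relationship between Team in the role Home (Heim, M), Team in the role Visiting (Gast, N) and Referee (1).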

c)

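First solution (ternary relationships):

• Is Son: Man in the role Son (N), Man in the role Father (1), Women in the role Mother (1).

• Is Daughter: Women in the role Daughter (N), Man in the role Father (1), Women in the role Mother (1).

Second solution (binary relationships, shown for sons only):

• Is Son: Man in the role Son (N) to Man in the role Father (1).

• Is Son: Man in the role Son (N) to Women in the role Mother (1).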

The second solution is shown only partially: the daughter relationships have to be added accordingly. Both solutions are correct, but the second solution is more precise. In the second solution, the mother can be derived directly, whereas in the first solution one has to know the father.


3 Modeling ER in ER

[ER diagram: meta-model of the ER model. Elements: Entity (Name), Weak Entity, Relationship (Name), Relationship of Weak Entity, Attribute of Entity (Name, IsPartOfKey), Attribute of Relationship (Name); relationships Participate (with attributes Functionality, Role, Min, Max), Has, is-a.]

Not expressed:

• Relationship of weak entity can only be N:1 or 1:1

• It is not possible to specify that relationships are weak entities.

• It is not possible to specify that "generalisation" implies "inheritance", i.e. that the set of attributes of a super-entity is a subset of the set of attributes of each of its sub-entities.


ETH Zurich, Systems Group, Prof. D. Kossmann | Data Modelling and Databases, FS 2014 | Exercise Sheet 2 (discussion: March 11 / 14)

Relational Modeling

1 At Least One Key

Why does every relation in a relational schema have at least one key?

2 Big Relation

What happens if you implement the ER diagram from exercise 1.1 (Library System) using a single relation (instead of using one relation per entity)?

3 Schema Training

Convert the following ER diagrams into relational schemas.

3.1 Library System

(Same ER diagram as the solution to Exercise Sheet 1, Task 1, Library System.)


3.2 Inheritance

(Same ER diagrams as the solution to Exercise Sheet 1, Task 2 c.)

3.3 Football

(Same ER diagram as the solution to Exercise Sheet 1, Task 2 b.)

3.4 Address

(Same ER diagram as the solution to Exercise Sheet 1, Task 2 a.)


3.5 Trains

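ER diagram:

• Entities: Trainstation (Name, NumTracks); City (Name, Canton); Train (TrainNr, Length).

• LocatedIn: Trainstation (N) to City (1).

• Start: Train (N) to Trainstation (1).

• Destination: Train (N) to Trainstation (1).

• Connects: relationship between a Train and two Trainstations (from and to), with attributes Departure and Arrival.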

3.6 Hospital

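ER diagram:

• Entities: Worker (PersNr, Name); Doctor (Expertise, Degree) and Nurse (Skills), both is_a Worker; Station (StationNr, Name); Room, a weak entity (RoomNr, #Beds); Patient (PatientNr, Name, Illness).

• Works: Worker (N) to Station (1).

• At: Room to Station; identifying relationship of Room.

• Treats: Doctor (N) to Patient (M).

• LocatedAt: Patient (N) to Room (1), with attributes From and To.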


ETH Zurich, Systems Group, Prof. D. Kossmann | Data Modeling and Databases, FS 2014 | Exercise Sheet 2

Relational Model – Solutions

1 At Least One Key

In the relational model, a relation is a set of tuples, so all tuples of one relation have to be different. Consequently, the set of all attributes of a relation always distinguishes its tuples, i.e. it is a superkey. This superkey does not have to be minimal: by removing attributes as long as the remaining ones still distinguish all tuples, one arrives at a subset from which no further attribute can be removed. This minimal subset is a key, so every relation has at least one key.
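The argument is constructive, so it can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: relations are lists of dicts, the helper names `is_superkey` and `shrink_to_key` and the sample readers are hypothetical, and uniqueness is checked on one sample instance only. A key is really a constraint on the schema, so an instance can refute a key but never prove one.

```python
# Sketch: shrink the full attribute set (always a superkey, since a relation
# is a set) to a minimal key. Checked against one sample instance only.

def is_superkey(rows, attrs):
    """True if the projection of rows onto attrs still has no duplicates."""
    projected = {tuple(row[a] for a in attrs) for row in rows}
    return len(projected) == len(rows)


def shrink_to_key(rows, attrs):
    """Drop attributes one by one while the remaining ones stay unique.

    A single pass suffices: if key - {a} is not a superkey, no subset of it
    is one either, so a kept attribute never becomes removable later.
    """
    key = list(attrs)
    if not is_superkey(rows, key):
        raise ValueError("duplicate tuples: not a relation in the set sense")
    for attr in list(key):
        candidate = [a for a in key if a != attr]
        if candidate and is_superkey(rows, candidate):
            key = candidate
    return key


# Hypothetical sample data: readers 1 and 4 share name and city.
readers = [
    {"ReaderNr": 1, "FamilyName": "Muster", "Firstname": "Max", "City": "Zurich"},
    {"ReaderNr": 2, "FamilyName": "Muster", "Firstname": "Erika", "City": "Bern"},
    {"ReaderNr": 3, "FamilyName": "Schmoker", "Firstname": "Lemmi", "City": "Zurich"},
    {"ReaderNr": 4, "FamilyName": "Muster", "Firstname": "Max", "City": "Zurich"},
]
print(shrink_to_key(readers, ["ReaderNr", "FamilyName", "Firstname", "City"]))
# ['ReaderNr']
```

Because two different readers share family name, first name and city, ReaderNr cannot be dropped, while all other attributes can.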

2 Big Relation

As an example, we show the result of combining a publisher and a book.

ISBN   Title                       Author  NumPages  PubYear  PubName     PubCity
12345  Database systems            Kemper  504       1999     Oldenbourg  Munich
78912  Databases in Corporations   Pernul  650       2003     Oldenbourg  Munich

For each book we now also have to store the publisher information. This redundancy has a number of consequences:

• Wasted storage space: The publisher city has to be stored for each book instead of only once.

• Update problems: When the publisher city changes, every book of this publisher has to be updated. This is error prone (some books may be overlooked) and costly (one update per book instead of only one for the publisher).

• Insert problems: Since the publisher city is entered manually with every book, typing mistakes can easily occur, which produce inconsistent entries.
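The update problem can be reproduced with SQLite from Python's standard library. A minimal sketch: the table name `BookPublisher`, the helper `publisher_cities` and the publisher's move to Berlin are hypothetical; the two rows are the ones from the table above.

```python
import sqlite3

# One "big relation" holding book and publisher data together.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE BookPublisher (
           ISBN TEXT PRIMARY KEY, Title TEXT, Author TEXT, NumPages INTEGER,
           PubYear INTEGER, PubName TEXT, PubCity TEXT)"""
)
conn.executemany(
    "INSERT INTO BookPublisher VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("12345", "Database systems", "Kemper", 504, 1999, "Oldenbourg", "Munich"),
        ("78912", "Databases in Corporations", "Pernul", 650, 2003, "Oldenbourg", "Munich"),
    ],
)

# Hypothetical move of the publisher; the update overlooks one of its books.
conn.execute("UPDATE BookPublisher SET PubCity = 'Berlin' WHERE ISBN = '12345'")


def publisher_cities(conn, pubname):
    """All cities currently stored for one publisher (should be exactly one)."""
    rows = conn.execute(
        "SELECT DISTINCT PubCity FROM BookPublisher WHERE PubName = ?", (pubname,)
    ).fetchall()
    return sorted(city for (city,) in rows)


print(publisher_cities(conn, "Oldenbourg"))  # ['Berlin', 'Munich']
```

With a separate Publisher (Pubname, Pubcity) relation, the move is a single-row update, and two different cities for one publisher cannot arise.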

3 Schema Training

3.1 Library System

Translating the entities leads to the following relations:

Reader (ReaderNr, FamilyName, Firstname, City, Birthdate) (1)
Book (ISBN, Title, Author, NumPages, PubYear) (2)
Publisher (Pubname, Pubcity) (3)
Category (Catname) (4)
Copy (ISBN, CopyNr, Shelf, Position) (5)

For the relationships we create these relations:


Borrows (ReaderNr, ISBN, CopyNr, ReturnDate) (6)
Available (ISBN, CopyNr) (7)
Contains (Catname, ContainedIn) (8)
InCat (ISBN, Catname) (9)
Publishes (ISBN, Pubname) (10)

Finally, we combine the relation of a binary relationship with the entity relation that has the same key, if the relationship is of type 1:N, N:1, or 1:1 ( (7) → (5), (8) → (4), (10) → (2) ):

Reader (ReaderNr, FamilyName, Firstname, City, Birthdate)
Book (ISBN, Title, Author, NumPages, PubYear, Pubname)
Publisher (Pubname, Pubcity)
Category (Catname, ContainedIn)
Copy (ISBN, CopyNr, Shelf, Position)
Borrows (ReaderNr, ISBN, CopyNr, ReturnDate)
InCat (ISBN, Catname)

Note: Weak relationships are often not explicitly shown as relations, because they disappear anyway when combined as shown above (e.g. relation (7) in this example). Take care, however, not to forget attributes when using such shortcuts.
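The final schema can be written as SQL DDL. This is a sketch in SQLite syntax, run from Python. The key underlining of the original schema is not visible in this copy, so the primary keys below (for example (ISBN, CopyNr) for the weak entity Copy and (ReaderNr, ISBN, CopyNr) for the N:M relationship Borrows) are assumptions derived from the ER diagram, as are the sample values.

```python
import sqlite3

# Primary keys are assumptions; foreign keys follow the ER relationships.
SCHEMA = """
CREATE TABLE Publisher (Pubname TEXT PRIMARY KEY, Pubcity TEXT);
CREATE TABLE Reader (ReaderNr INTEGER PRIMARY KEY, FamilyName TEXT,
                     Firstname TEXT, City TEXT, Birthdate TEXT);
CREATE TABLE Book (ISBN TEXT PRIMARY KEY, Title TEXT, Author TEXT,
                   NumPages INTEGER, PubYear INTEGER,
                   Pubname TEXT REFERENCES Publisher (Pubname));
CREATE TABLE Category (Catname TEXT PRIMARY KEY,
                       ContainedIn TEXT REFERENCES Category (Catname));
CREATE TABLE Copy (ISBN TEXT REFERENCES Book (ISBN), CopyNr INTEGER,
                   Shelf TEXT, Position TEXT,
                   PRIMARY KEY (ISBN, CopyNr));
CREATE TABLE Borrows (ReaderNr INTEGER REFERENCES Reader (ReaderNr),
                      ISBN TEXT, CopyNr INTEGER, ReturnDate TEXT,
                      PRIMARY KEY (ReaderNr, ISBN, CopyNr),
                      FOREIGN KEY (ISBN, CopyNr) REFERENCES Copy (ISBN, CopyNr));
CREATE TABLE InCat (ISBN TEXT REFERENCES Book (ISBN),
                    Catname TEXT REFERENCES Category (Catname),
                    PRIMARY KEY (ISBN, Catname));
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only if enabled
conn.executescript(SCHEMA)
conn.executescript("""
INSERT INTO Publisher VALUES ('Oldenbourg', 'Munich');
INSERT INTO Book VALUES ('12345', 'Database systems', 'Kemper', 504, 1999, 'Oldenbourg');
INSERT INTO Copy VALUES ('12345', 1, 'A', '3');
INSERT INTO Reader VALUES (1, 'Muster', 'Max', 'Zurich', '1990-01-01');
""")

# Copy 2 of book 12345 does not exist, so this loan violates the foreign key.
try:
    conn.execute("INSERT INTO Borrows VALUES (1, '12345', 2, '2014-03-15')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Merging (7) into (5) is what lets Copy carry ISBN as both part of its key and a foreign key to Book; no separate Available table is needed.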

3.2 Inheritance

1st version:
Man (MName)
Women (WName)
IsSon (Son, Father, Mother) or (Son, Father, Mother)
IsDaughter (Daughter, Father, Mother) or (Daughter, Father, Mother)

2nd version:
Man (MName, Father, Mother)
Women (WName, Father, Mother)

A solution where Son and Daughter are shown in their own relations is also possible.

3.3 Football

Solution:
Team (Teamname)
Referee (RefereeName)
Plays (RefereeName, Home, Visiting)

3.4 Address

Solution:
Country (CoName)
City (CoName, CiName)
Street (CoName, CiName, SName)
House (CoName, CiName, SName, HName)
Appartment (CoName, CiName, SName, HName, AName)
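Because every weak entity inherits the key of the entity it is located in, the keys grow along the chain. A sketch in SQLite (via Python), assuming each relation's key consists of all its attributes and using hypothetical sample addresses, shows two consequences: the same street name can exist in different cities, and all apartments of one city can be selected from Appartment alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Country (CoName TEXT PRIMARY KEY);
CREATE TABLE City (CoName TEXT REFERENCES Country (CoName), CiName TEXT,
                   PRIMARY KEY (CoName, CiName));
CREATE TABLE Street (CoName TEXT, CiName TEXT, SName TEXT,
                     PRIMARY KEY (CoName, CiName, SName),
                     FOREIGN KEY (CoName, CiName) REFERENCES City (CoName, CiName));
CREATE TABLE House (CoName TEXT, CiName TEXT, SName TEXT, HName TEXT,
                    PRIMARY KEY (CoName, CiName, SName, HName),
                    FOREIGN KEY (CoName, CiName, SName)
                        REFERENCES Street (CoName, CiName, SName));
CREATE TABLE Appartment (CoName TEXT, CiName TEXT, SName TEXT, HName TEXT, AName TEXT,
                         PRIMARY KEY (CoName, CiName, SName, HName, AName),
                         FOREIGN KEY (CoName, CiName, SName, HName)
                             REFERENCES House (CoName, CiName, SName, HName));

-- Hypothetical data: the same street name exists in two cities.
INSERT INTO Country VALUES ('Switzerland');
INSERT INTO City VALUES ('Switzerland', 'Zurich'), ('Switzerland', 'Bern');
INSERT INTO Street VALUES ('Switzerland', 'Zurich', 'Hauptstrasse'),
                          ('Switzerland', 'Bern', 'Hauptstrasse');
INSERT INTO House VALUES ('Switzerland', 'Zurich', 'Hauptstrasse', '1'),
                         ('Switzerland', 'Bern', 'Hauptstrasse', '1');
INSERT INTO Appartment VALUES ('Switzerland', 'Zurich', 'Hauptstrasse', '1', 'A'),
                              ('Switzerland', 'Zurich', 'Hauptstrasse', '1', 'B'),
                              ('Switzerland', 'Bern', 'Hauptstrasse', '1', 'A');
""")

# Every key contains the keys of all enclosing entities, so no join is needed.
zurich = conn.execute(
    "SELECT SName, HName, AName FROM Appartment "
    "WHERE CoName = 'Switzerland' AND CiName = 'Zurich' ORDER BY AName"
).fetchall()
print(zurich)  # [('Hauptstrasse', '1', 'A'), ('Hauptstrasse', '1', 'B')]
```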

3.5 Trains

Initially you get the following relations for entities:

City (Name, Canton) (1)
Trainstation (Name, NumTracks) (2)
Train (TrainNr, Length) (3)

For the relationships we get:

LocatedIn (TrainstationName, CityName, Canton) (4)
Start (TrainNr, StartTrainstationName) (5)
Destination (TrainNr, DestTrainstationName) (6)
Connects (FromTrainstation, ToTrainstation, TrainNr, Departure, Arrival) or (FromTrainstation, ToTrainstation, TrainNr, Departure, Arrival) (7)


Next, we combine the relation of a binary relationship with the entity relation that has the same key, if the relationship is of type 1:N, N:1, or 1:1 ( (4) → (2), (5) → (3), (6) → (3) ):

City (Name, Canton)
Trainstation (Name, NumTracks, CityName, Canton)
Train (TrainNr, Length, StartTrainstationName, DestTrainstationName)
Connects (FromTrainstation, ToTrainstation, TrainNr, Departure, Arrival) or (FromTrainstation, ToTrainstation, TrainNr, Departure, Arrival)

3.6 Hospital

Initially you get the following relations for entities:

Worker (PersNr, Name) (1)
Station (StationNr, Name) (2)
Doctor (PersNr, Expertise, Degree) (3)
Nurse (PersNr, Skills) (4)
Patient (PatientNr, Name, Illness) (5)
Room (StationNr, RoomNr, NumBeds) (6)

For the relationships we get:

Works (StationNr, PersNr) (7)
Treats (PatientNr, PersNr) (8)
LocatedAt (StationNr, RoomNr, PatientNr, From, To) (9)
At (StationNr, RoomNr) (10)

And finally, we combine relations with the same key ( (7) and (1/3/4), (5) and (9), (6) and (10) ):

Worker (PersNr, Name, StationNr)
Station (StationNr, Name)
Doctor (PersNr, Name, StationNr, Expertise, Degree)
Nurse (PersNr, Name, StationNr, Skills)
Patient (PatientNr, Name, Illness, StationNr, RoomNr, From, To)
Room (StationNr, RoomNr, NumBeds)
Treats (PatientNr, PersNr)


ETH Zurich, Systems Group, Prof. D. Kossmann | Data Modeling and Databases, FS 2014 | Exercise Sheet 3

Relational Algebra and SQL

This exercise sheet will be discussed on 18.03.2014 and 21.03.2014.

1 Relational Algebra - Task 1

Consider the following relational schema:

Reader (RDNR, Surname, Firstname, City, Birthdate)
Book (ISBN, Title, Author, NoPages, PubYear, PublisherName)
Publisher (PublisherName, PublisherCity)
Category (CategoryName, BelongsTo)
Copy (ISBN, CopyNumber, Shelf, Position)
Loan (ReaderNr, ISBN, Copy, ReturnDate)
BookCategory (ISBN, CategoryName)

Formulate the following queries in relational algebra:

a) Which are the last names of the readers in Zurich?

b) Which books (Author, Title) are from publishers in Zurich, Bern or New York?

c) Which books (Author, Title) has the reader Lemmi Schmoker borrowed?

d) Which books in the category "Alps" do not belong to the category "Switzerland"? Do not take into account subcategories!

e) Which readers (Surname, Firstname) have borrowed books that were published in their home town?

f) Which readers (Surname, Firstname) have borrowed at least one book that has also been borrowed by the reader Lemmi Schmoker (the reader Lemmi Schmoker should not be included in the results)?

2 Relational Algebra - Task 2

Consider the following relational schema:

Cities (Name, State)
Stations (Name, NoPlatforms, CityName, State)
Itinerary (ItNr, Length, StartStation, DestinationStation)
Connections (FromStation, ToStation, ItNr, Departure, Arrival)

Suppose that the relation "Connections" already contains the transitive closure. For example, if there is a train from Zurich via Bern and Fribourg to Geneva, then there exist tuples for Zurich→Bern, Zurich→Fribourg and Zurich→Geneva (and likewise for the other intermediate stations).

Formulate the following queries in relational algebra:

a) Find all the direct connections from Zurich to Geneva

b) Find all the single-transfer connections from Zurich to Locarno. The transfer station can be any of the stations, but the connecting trains should run on the same day. (You can use a function DAY() on the attributes Departure and Arrival in order to determine the day.)

c) What changes if the relation "Connections" does not contain summary tuples for the transitive closure? For example, the route Zurich→Geneva is represented only by Zurich→Bern, Bern→Fribourg and Fribourg→Geneva.

3 Relational Algebra - Task 3

Express the outer joins using the basic operations of relational algebra.

4 SQL - Task 1

Formulate the questions of the first exercise in SQL.

5 SQL - Task 2

Given the relational schema of the first exercise, express the following questions in SQL:

a) List all the publishers and their respective books.

b) Which book has the maximum number of pages?

c) Which authors have written more than 5 books?

d) Which book has more pages than twice the average of the number of pages of all books?

e) Which categories do not have any subcategories?

f) Which author has written the most books? (*)

g) Which reader has borrowed all the books (by ISBN, not copies) from the author "Ephraim Kishon"? (*)

h) For which books is there at least one copy available? (*)

i) Which are the ten oldest books?(**)

j) Which are all the subcategories (direct or indirect) of the "Sport" category? (**)

(*) Difficult task, optional, solvable with the knowledge from the lecture

(**) Difficult task, optional, additional knowledge of SQL is necessary

6 SQL - Task 3

Formulate in SQL the following modifications to the database of the first exercise:

a) The reader Max Muster borrows the copy with CopyNumber 4 of the book with ISBN 123456.

b) Delete the books that are published after 2013.

c) Change the return date of all the books in the category "Databases" that should be returned before 15.03.2013 so that they can be kept for 30 days longer (assume that you can add days to dates in SQL).


ETH Zurich, Systems Group, Prof. D. Kossmann | Data Modeling and Databases, FS 2014 | Exercise Sheet 3

Relational Algebra and SQL - Solutions

1 Relational Algebra - Task 1

Formulate the following queries in relational algebra:

a) Which are the last names of the readers in Zurich?

$\Pi_{Surname}(\sigma_{City='Zurich'}(READER))$

b) Which books (Author, Title) are from publishers in Zurich, Bern or New York?

$\Pi_{Author, Title}(BOOK \bowtie \sigma_{PublisherCity='Zurich' \lor PublisherCity='Bern' \lor PublisherCity='New\ York'}(PUBLISHER))$

c) Which books (Author, Title) has the reader Lemmi Schmoker borrowed?

$\Pi_{Author, Title}(BOOK \bowtie LOAN \bowtie_{ReaderNr=RDNR} \sigma_{Surname='Schmoker' \land Firstname='Lemmi'}(READER))$

d) Which books in the category "Alps" do not belong to the category "Switzerland"? Do not take into account subcategories!

$\Pi_{ISBN}(\sigma_{CategoryName='Alps'}(BOOKCATEGORY)) - \Pi_{ISBN}(\sigma_{CategoryName='Switzerland'}(BOOKCATEGORY))$

e) Which readers (Surname, Firstname) have borrowed books that were published in their home town?

$\Pi_{Firstname, Surname}(\sigma_{City=PublisherCity}(PUBLISHER \bowtie BOOK \bowtie LOAN \bowtie_{ReaderNr=RDNR} READER))$

f) Which readers (Surname, Firstname) have borrowed at least one book that has also been borrowed by the reader Lemmi Schmoker (the reader Lemmi Schmoker should not be included in the results)?

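$\Pi_{R1.Firstname, R1.Surname}\Big(\big(\rho_{R1}(READER) \bowtie_{R1.RDNR=L1.ReaderNr} \rho_{L1}(LOAN)\big) \bowtie_{R1.RDNR \neq R2.RDNR \land L1.ISBN=L2.ISBN} \big(\rho_{R2}(\sigma_{Surname='Schmoker' \land Firstname='Lemmi'}(READER)) \bowtie_{R2.RDNR=L2.ReaderNr} \rho_{L2}(LOAN)\big)\Big)$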

2 Relational Algebra - Task 2

Formulate the following queries in relational algebra:

a) Find all the direct connections from Zurich to Geneva


$(\rho_{FromName \leftarrow Name}(\Pi_{Name}(\sigma_{CityName='Zurich'}(STATIONS)))) \bowtie_{FromName=FromStation} CONNECTIONS \bowtie_{ToName=ToStation} (\rho_{ToName \leftarrow Name}(\Pi_{Name}(\sigma_{CityName='Geneva'}(STATIONS))))$

b) Find all the single-transfer connections from Zurich to Locarno. The transfer station can be any of the stations, but the connecting trains should run on the same day. (You can use a function DAY() on the attributes Departure and Arrival in order to determine the day.)

$(\rho_{FromName \leftarrow Name}(\Pi_{Name}(\sigma_{CityName='Zurich'}(STATIONS)))) \bowtie_{FromName=c1.FromStation} \rho_{c1}(CONNECTIONS)$

$\bowtie_{c1.ToStation=c2.FromStation \land c1.Arrival<c2.Departure \land DAY(c1.Arrival)=DAY(c2.Departure) \land c1.ItNr \neq c2.ItNr} \rho_{c2}(CONNECTIONS)$

$\bowtie_{ToName=c2.ToStation} (\rho_{ToName \leftarrow Name}(\Pi_{Name}(\sigma_{CityName='Locarno'}(STATIONS))))$

c) What changes if the relation "Connections" does not contain summary tuples for the transitive closure? For example, the route Zurich→Geneva is represented only by Zurich→Bern, Bern→Fribourg and Fribourg→Geneva.

Here the transitive closure has to be computed separately. In relational algebra this is only partially possible: one can find the connections for a fixed number of transfers, but not the transitive closure for an arbitrary (unbounded) number of transfers. In other programming languages this could be solved by recursion. Below is a possible solution with at most two intermediate stops (the join with STATIONS is not shown).

$\sigma_{FromStation='ZurichHB' \land ToStation='GenevaHB'}(CONNECTIONS)$

$\cup\ \sigma_{c1.FromStation='ZurichHB' \land c2.ToStation='GenevaHB'}(\rho_{c1}(CONNECTIONS) \bowtie_{c1.ToStation=c2.FromStation \land c1.ItNr=c2.ItNr} \rho_{c2}(CONNECTIONS))$

$\cup\ \sigma_{c1.FromStation='ZurichHB' \land c3.ToStation='GenevaHB'}(\rho_{c1}(CONNECTIONS) \bowtie_{c1.ToStation=c2.FromStation \land c1.ItNr=c2.ItNr} \rho_{c2}(CONNECTIONS) \bowtie_{c2.ToStation=c3.FromStation \land c2.ItNr=c3.ItNr} \rho_{c3}(CONNECTIONS))$

This is a union of the direct connections, the connections with one intermediate station, and the connections with two intermediate stations. The general case of an arbitrary number of intermediate stations is not covered.
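Outside relational algebra, SQL:1999 added recursive queries, which compute the closure for any number of intermediate stations. A sketch in SQLite's `WITH RECURSIVE` syntax (via Python) that keeps the same-train condition c1.ItNr = c2.ItNr used above; the itinerary is hypothetical, and departure/arrival times are left NULL because they are not needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Connections (FromStation TEXT, ToStation TEXT, ItNr INTEGER,
                          Departure TEXT, Arrival TEXT);
-- Hypothetical itinerary 1, stored only as single hops (no summary tuples).
INSERT INTO Connections VALUES
  ('ZurichHB',   'BernHB',     1, NULL, NULL),
  ('BernHB',     'FribourgHB', 1, NULL, NULL),
  ('FribourgHB', 'GenevaHB',   1, NULL, NULL);
""")

# Recursive step: extend a reachable pair by one more hop of the same train.
CLOSURE = """
WITH RECURSIVE Reach (FromStation, ToStation, ItNr) AS (
    SELECT FromStation, ToStation, ItNr FROM Connections
    UNION                     -- UNION (not UNION ALL) also terminates on cycles
    SELECT r.FromStation, c.ToStation, c.ItNr
    FROM Reach r JOIN Connections c
      ON r.ToStation = c.FromStation AND r.ItNr = c.ItNr
)
SELECT ToStation FROM Reach WHERE FromStation = ? ORDER BY ToStation
"""
print([s for (s,) in conn.execute(CLOSURE, ("ZurichHB",))])
# ['BernHB', 'FribourgHB', 'GenevaHB']
```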

3 Relational Algebra - Task 3

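$R ⟕ S = \Pi_{R \cup S}(R - \Pi_R(R \bowtie S)) \cup (R \bowtie S)$

$R ⟖ S = \Pi_{R \cup S}(S - \Pi_S(R \bowtie S)) \cup (R \bowtie S)$

$R \bowtie_p S = \sigma_p(R \times S)$

$R ⟗ S = (R \bowtie S) \cup \Pi_{R \cup S}(R - \Pi_R(R \bowtie S)) \cup \Pi_{R \cup S}(S - \Pi_S(R \bowtie S))$

Here $\Pi_{R \cup S}$, applied to a relation that lacks some attributes of $R \cup S$, fills the missing attributes with NULL.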
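The left outer join formula can be checked with a tiny in-memory relational algebra in Python. A sketch under stated assumptions: relations are pairs (attribute tuple, set of value tuples), None stands for NULL, the inputs are assumed NULL-free (Python treats None == None as true, SQL does not), and all function names as well as the publisher "Example Press" are hypothetical.

```python
# Relations are (attribute tuple, set of value tuples); None plays NULL.

def natural_join(r, s):
    (ra, rt), (sa, st) = r, s
    common = [a for a in ra if a in sa]
    attrs = ra + tuple(a for a in sa if a not in ra)
    rows = set()
    for x in rt:
        for y in st:
            dx, dy = dict(zip(ra, x)), dict(zip(sa, y))
            if all(dx[a] == dy[a] for a in common):
                merged = {**dx, **dy}
                rows.add(tuple(merged[a] for a in attrs))
    return attrs, rows


def project(r, attrs):
    ra, rt = r
    idx = [ra.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rt}


def difference(r, s):
    assert r[0] == s[0], "difference needs identical schemas"
    return r[0], r[1] - s[1]


def union(r, s):
    assert r[0] == s[0], "union needs identical schemas"
    return r[0], r[1] | s[1]


def pad(r, attrs):
    """Pi_{R u S} on a relation lacking some attributes: fill them with None."""
    ra, rt = r
    return tuple(attrs), {tuple(dict(zip(ra, t)).get(a) for a in attrs) for t in rt}


def left_outer_join(r, s):
    """Pi_{R u S}(R - Pi_R(R join S)) union (R join S)."""
    joined = natural_join(r, s)
    dangling = difference(r, project(joined, r[0]))
    return union(pad(dangling, joined[0]), joined)


publishers = (("PublisherName", "PublisherCity"),
              {("Oldenbourg", "Munich"), ("Example Press", "Zurich")})
books = (("ISBN", "Title", "PublisherName"),
         {("12345", "Database systems", "Oldenbourg")})
attrs, rows = left_outer_join(publishers, books)
print(attrs)
print(rows)  # "Example Press" appears with ISBN and Title set to None
```

This is exactly the shape of SQL Task 2 a: publishers without books survive the outer join with NULL book attributes.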

4 SQL - Task 1

a) Which are the last names of the readers in Zurich?

SELECT DISTINCT Surname
FROM Reader
WHERE City = 'Zurich'
ORDER BY Surname DESC


b) Which books (Author, Title) are from publishers in Zurich, Bern or New York?

SELECT Author, Title
FROM Book B, Publisher P
WHERE B.PublisherName = P.PublisherName
  AND (P.PublisherCity = 'Zurich'
       OR P.PublisherCity = 'Bern'
       OR P.PublisherCity = 'New York')

c) Which books (Author, Title) has the reader Lemmi Schmoker borrowed?

SELECT B.Author, B.Title
FROM Reader R, Loan L, Book B
WHERE R.Surname = 'Schmoker'
  AND R.Firstname = 'Lemmi'
  AND R.RDNR = L.ReaderNr
  AND L.ISBN = B.ISBN

d) Which books in the category "Alps" do not belong to the category "Switzerland" at the same time? Do not take into account subcategories!

(SELECT ISBN
 FROM BookCategory
 WHERE CategoryName = 'Alps')
EXCEPT
(SELECT ISBN
 FROM BookCategory
 WHERE CategoryName = 'Switzerland')

e) Which readers (Surname, Firstname) have borrowed books that were published in their home town?

SELECT R.Firstname, R.Surname
FROM Reader R, Loan L, Book B, Publisher P
WHERE R.RDNR = L.ReaderNr
  AND L.ISBN = B.ISBN
  AND B.PublisherName = P.PublisherName
  AND R.City = P.PublisherCity

f) Which readers (Surname, Firstname) have borrowed at least one book that has also been borrowed by the reader Lemmi Schmoker (the reader Lemmi Schmoker should not be included in the results)?

SELECT DISTINCT R1.Firstname, R1.Surname
FROM Reader R1, Loan L1, Loan L2, Reader R2
WHERE R2.Firstname = 'Lemmi'
  AND R2.Surname = 'Schmoker'
  AND L2.ReaderNr = R2.RDNR
  AND L1.ISBN = L2.ISBN
  AND R1.RDNR = L1.ReaderNr
  AND R1.RDNR <> R2.RDNR

5 SQL - Task 2

Given the relational schema of the first exercise, express the following questions in SQL:

a) List all the publishers and their respective books.

SELECT P.PublisherName, B.Title
FROM Book B RIGHT OUTER JOIN Publisher P ON B.PublisherName = P.PublisherName
ORDER BY P.PublisherName ASC, B.Title ASC;

b) Which book has the maximum number of pages?

SELECT Title
FROM Book
WHERE NoPages IN
      (SELECT MAX(NoPages) FROM Book);

This query can return several records if several books have the maximum number of pages.

c) Which authors have written more than 5 books?

SELECT Author, COUNT(Title) AS number_books
FROM Book
GROUP BY Author
HAVING COUNT(Title) > 5

The number of books per author can only be determined after the grouping. Thus, HAVING is necessary in this case.

d) Which book has more pages than twice the average of the number of pages of all books?

SELECT Title
FROM Book
WHERE NoPages > 2 *
      (SELECT AVG(NoPages) FROM Book);

e) Which categories do not have any subcategories?

SELECT C1.CategoryName FROM Category C1 WHERE NOT EXISTS
  (SELECT CategoryName
   FROM Category C2
   WHERE C2.BelongsTo = C1.CategoryName);

Alternative solution:

SELECT C1.CategoryName
FROM Category C1 LEFT OUTER JOIN Category C2 ON C1.CategoryName = C2.BelongsTo
WHERE C2.CategoryName IS NULL

f) Which author has written the most books? (*) First we build a table A with the number of books for each author, and then we find the maximum in this table.

WITH A AS (
  SELECT Author, COUNT(Title) AS number_books
  FROM Book
  GROUP BY Author)
SELECT A.Author
FROM A
WHERE A.number_books = (SELECT MAX(number_books) FROM A);

Alternative solution with HAVING:

SELECT Author, COUNT(Title) AS number_books
FROM Book
GROUP BY Author
HAVING COUNT(Title) >= ALL (
  SELECT COUNT(Title) FROM Book
  GROUP BY Author)
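A quick check of the WITH formulation in SQLite (via Python). The Book table is reduced to the columns the query needs, and the rows are hypothetical; ties are returned, since two authors share the maximum here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (ISBN TEXT PRIMARY KEY, Title TEXT, Author TEXT)")
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?)",
    [("1", "T1", "Ephraim Kishon"), ("2", "T2", "Ephraim Kishon"),
     ("3", "T3", "Kemper"), ("4", "T4", "Kemper"), ("5", "T5", "Pernul")],
)

# Table A is defined once and referenced twice: in FROM and in the MAX subquery.
MOST_BOOKS = """
WITH A AS (SELECT Author, COUNT(Title) AS number_books
           FROM Book GROUP BY Author)
SELECT Author FROM A
WHERE number_books = (SELECT MAX(number_books) FROM A)
ORDER BY Author
"""
print([a for (a,) in conn.execute(MOST_BOOKS)])  # ['Ephraim Kishon', 'Kemper']
```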

g) Which reader has borrowed all the books (by ISBN, not copies) from the author "Ephraim Kishon"? (*)

SELECT Firstname, Surname
FROM Loan, Reader, Book
WHERE Reader.RDNR = Loan.ReaderNr AND
      Loan.ISBN = Book.ISBN AND
      Author = 'Ephraim Kishon'
GROUP BY Reader.RDNR, Firstname, Surname
HAVING COUNT(Loan.ISBN) =
       (SELECT COUNT(ISBN)
        FROM Book
        WHERE Author = 'Ephraim Kishon')

This solution requires that no reader has borrowed two copies of the same book. If we cannot assume this, the problem can be solved with COUNT(DISTINCT Loan.ISBN) or with a subquery that finds, for every reader, the different ISBNs of the books he or she has borrowed.
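The duplicate-copy caveat can be made concrete. A sketch in SQLite (via Python) with hypothetical readers and a reduced schema: Beat borrowed two copies of the same Kishon book, which is enough to fool COUNT but not COUNT(DISTINCT). The helper `readers_with_all_kishon` is hypothetical and splices only the two fixed count expressions into the SQL text, never user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reader (RDNR INTEGER PRIMARY KEY, Surname TEXT, Firstname TEXT);
CREATE TABLE Book (ISBN TEXT PRIMARY KEY, Title TEXT, Author TEXT);
CREATE TABLE Loan (ReaderNr INTEGER, ISBN TEXT, Copy INTEGER, ReturnDate TEXT);
-- Hypothetical data: Kishon wrote K1 and K2.
INSERT INTO Reader VALUES (1, 'Meier', 'Anna'), (2, 'Huber', 'Beat');
INSERT INTO Book VALUES ('K1', 'T1', 'Ephraim Kishon'), ('K2', 'T2', 'Ephraim Kishon'),
                        ('X1', 'T3', 'Kemper');
-- Anna borrowed both Kishon books; Beat borrowed two copies of K1 only.
INSERT INTO Loan VALUES (1, 'K1', 1, NULL), (1, 'K2', 1, NULL),
                        (2, 'K1', 2, NULL), (2, 'K1', 3, NULL);
""")


def readers_with_all_kishon(conn, count_expr):
    """count_expr is either COUNT(L.ISBN) or COUNT(DISTINCT L.ISBN)."""
    sql = f"""
        SELECT R.Firstname, R.Surname
        FROM Loan L JOIN Reader R ON R.RDNR = L.ReaderNr
                    JOIN Book B ON B.ISBN = L.ISBN
        WHERE B.Author = 'Ephraim Kishon'
        GROUP BY R.RDNR, R.Firstname, R.Surname
        HAVING {count_expr} = (SELECT COUNT(*) FROM Book
                               WHERE Author = 'Ephraim Kishon')
        ORDER BY R.Surname"""
    return conn.execute(sql).fetchall()


print(readers_with_all_kishon(conn, "COUNT(L.ISBN)"))           # Beat wrongly included
print(readers_with_all_kishon(conn, "COUNT(DISTINCT L.ISBN)"))  # [('Anna', 'Meier')]
```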

h) For which books is there at least one copy available? (*) It is assumed that the Loan table contains only current loans. Otherwise, the loans would also have to be filtered by their return date.

SELECT Title FROM Book WHERE ISBN IN
  (SELECT ISBN FROM
     (SELECT CopyNumber, ISBN FROM Copy
      EXCEPT
      SELECT Copy, ISBN FROM Loan) AS FreeCopies
  );

From the existing copies we select only those that are not in the loans.
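The EXCEPT formulation runs unchanged in SQLite (via Python). The rows are hypothetical and, as in the text, Loan is assumed to hold only current loans; note that the EXCEPT matches columns by position, so CopyNumber pairs with Copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Book (ISBN TEXT PRIMARY KEY, Title TEXT);
CREATE TABLE Copy (ISBN TEXT, CopyNumber INTEGER, Shelf TEXT, Position TEXT);
CREATE TABLE Loan (ReaderNr INTEGER, ISBN TEXT, Copy INTEGER, ReturnDate TEXT);
-- Hypothetical data: A has 2 copies (1 lent), B has 1 copy (lent), C has 1 copy.
INSERT INTO Book VALUES ('111', 'Book A'), ('222', 'Book B'), ('333', 'Book C');
INSERT INTO Copy VALUES ('111', 1, NULL, NULL), ('111', 2, NULL, NULL),
                        ('222', 1, NULL, NULL), ('333', 1, NULL, NULL);
INSERT INTO Loan VALUES (7, '111', 1, NULL), (7, '222', 1, NULL);
""")

AVAILABLE = """
SELECT Title FROM Book
WHERE ISBN IN (SELECT ISBN
               FROM (SELECT CopyNumber, ISBN FROM Copy
                     EXCEPT
                     SELECT Copy, ISBN FROM Loan) AS FreeCopies)
ORDER BY Title
"""
print([t for (t,) in conn.execute(AVAILABLE)])  # ['Book A', 'Book C']
```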


Alternative solution:

SELECT ISBN
FROM Copy C GROUP BY C.ISBN HAVING COUNT(*) >
  (SELECT COUNT(*)
   FROM Loan L
   WHERE L.ISBN = C.ISBN)

The copies of books are grouped by ISBN, and the number of copies for each ISBN is compared with the number of borrowed copies of the same ISBN. The query can also be rewritten so that the copies and the loans are grouped separately and then joined by the predicate L.number < C.number.

i) Which are the ten oldest books?(**)

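-- SQL Server
SELECT TOP 10 ISBN, Author, Title FROM Book ORDER BY PubYear

-- MySQL, PostgreSQL
SELECT ISBN, Author, Title FROM Book ORDER BY PubYear LIMIT 10

-- MySQL
SELECT ISBN, Author, Title FROM Book ORDER BY PubYear LIMIT 0, 10

-- DB2
SELECT ISBN, Author, Title FROM Book ORDER BY PubYear FETCH FIRST 10 ROWS ONLY

-- Oracle
SELECT ISBN, Author, Title FROM
  (SELECT ISBN, Author, Title FROM Book
   ORDER BY PubYear)
WHERE ROWNUM <= 10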

Unfortunately, there is no single syntax.

j) Which are all the subcategories (direct or indirect) of the "Sport" category? (**)

(SELECT CategoryName
 FROM Category
 CONNECT BY PRIOR CategoryName = BelongsTo START WITH BelongsTo = 'Sport')
MINUS
(SELECT CategoryName
 FROM Category
 WHERE CategoryName = 'Sport');

This solution is specific to Oracle. DB2 and SQL Server offer other approaches. If the depth of the hierarchy is known and fixed, the problem can be solved similarly to the lectures.
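In systems that support SQL:1999 recursive queries, the same result can be obtained without CONNECT BY. A sketch in SQLite's `WITH RECURSIVE` syntax (via Python; other systems may spell the clause slightly differently) with a hypothetical category hierarchy. The recursion starts at the direct children, so "Sport" itself never appears and no set difference is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (CategoryName TEXT PRIMARY KEY, BelongsTo TEXT);
-- Hypothetical hierarchy
INSERT INTO Category VALUES ('Sport', NULL), ('Football', 'Sport'),
  ('Tennis', 'Sport'), ('Swiss Football', 'Football'),
  ('Databases', NULL), ('SQL', 'Databases');
""")

SUBCATEGORIES = """
WITH RECURSIVE Sub (CategoryName) AS (
    SELECT CategoryName FROM Category WHERE BelongsTo = ?   -- direct children
    UNION
    SELECT C.CategoryName                                   -- children of found ones
    FROM Category C JOIN Sub S ON C.BelongsTo = S.CategoryName
)
SELECT CategoryName FROM Sub ORDER BY CategoryName
"""
print([c for (c,) in conn.execute(SUBCATEGORIES, ("Sport",))])
# ['Football', 'Swiss Football', 'Tennis']
```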

6 SQL - Task 3

Formulate in SQL the following modifications to the database of the first exercise:

a) The reader Max Muster borrows the copy with CopyNumber 4 of the book with ISBN 123456.

INSERT INTO Loan (ReaderNr, ISBN, Copy)
SELECT RDNR, 123456, 4
FROM Reader
WHERE Firstname = 'Max' AND Surname = 'Muster'


b) Delete the books that are published after 2013.

DELETE FROM Book WHERE PubYear > 2013

c) Change the return date of all the books in the category "Databases" that should be returned before 15.03.2013 so that they can be kept for 30 days longer (assume that you can add days to dates in SQL).

UPDATE Loan SET ReturnDate = ReturnDate + 30
WHERE ReturnDate < '2013-03-15'
  AND ISBN IN
      (SELECT ISBN
       FROM BookCategory
       WHERE CategoryName = 'Databases')
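The solution assumes that `ReturnDate + 30` adds days. In SQLite that expression does not perform date arithmetic, so this sketch (via Python, hypothetical rows) uses SQLite's `date()` function with a `'+30 days'` modifier instead; dates are stored as ISO-8601 text so that the comparison with '2013-03-15' orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Loan (ReaderNr INTEGER, ISBN TEXT, Copy INTEGER, ReturnDate TEXT);
CREATE TABLE BookCategory (ISBN TEXT, CategoryName TEXT);
-- Hypothetical data; ISO-8601 text dates compare correctly as strings.
INSERT INTO BookCategory VALUES ('db1', 'Databases'), ('x1', 'Alps');
INSERT INTO Loan VALUES (1, 'db1', 1, '2013-03-10'),
                        (2, 'db1', 2, '2013-03-20'),
                        (3, 'x1', 1, '2013-03-01');
""")

# Only loan 1 is both due before 15.03.2013 and a "Databases" book.
conn.execute("""
    UPDATE Loan SET ReturnDate = date(ReturnDate, '+30 days')
    WHERE ReturnDate < '2013-03-15'
      AND ISBN IN (SELECT ISBN FROM BookCategory WHERE CategoryName = 'Databases')
""")
print(conn.execute("SELECT ReaderNr, ReturnDate FROM Loan ORDER BY ReaderNr").fetchall())
# [(1, '2013-04-09'), (2, '2013-03-20'), (3, '2013-03-01')]
```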


ETH Zurich, Systems Group, Prof. D. Kossmann | Data Modeling and Databases, FS 2014 | Exercise Sheet 4 (discussion: April 1 / 4)

Integrity Constraints

1 Foreign Keys and Constraints: Theory

1. What is a foreign key constraint?

2. Why are such constraints important?

3. What is referential integrity?

2 Foreign Keys and Constraints: Practice

Consider the following relational schema:

Reader (readerId, firstName, lastName, address, city, dateOfBirth)
Book (isbn, title, author, numberOfPages, yearOfPublication, publisherName)
Publisher (publisherName, placeOfPublication)
Categories (categoryName, includedIn)
Copy (ISBN, copyNumber, shelf, position)
Loan (readerId, ISBN, copyNumber, returnDate)
BookCategory (ISBN, categoryName)

1. What problems may appear if this schema is realized in SQL without considering integrity constraints?

2. Which foreign keys are necessary, and which strategies have to be applied when they change? Please explain why you chose a particular strategy.

3. What is the implication of deleting a publisher? What is the consequence of updating a readerId?In both cases, take into account the keys and rules of your solution for question 2.2.

4. Enforce that only readers who live in Zurich can be inserted.

5. Enforce that a reader can borrow at most 20 books! Give one solution using CHECK and one using a trigger.


3 Foreign Keys and Constraints: Practice 2

Consider the following relational schema:

Emp (eid: integer, ename: string, age: integer, salary: real)
Works (eid: integer, did: integer, pcttime: integer)
Dept (did: integer, dname: string, budget: real, managerid: integer)

1. Give an example of a foreign key constraint that involves the Dept relation. What are the options for enforcing this constraint when a user attempts to delete a Dept tuple?

2. Write the SQL statements to create the preceding relations, including appropriate versions of all primary and foreign key integrity constraints.

3. Define the Dept relation in SQL so that every department is guaranteed to have a manager.

4. Write an SQL statement to delete the Toy department. Given the referential integrity constraints you chose for this schema, explain what happens when this statement is executed.

4 Bonus: More SQL Practice

Consider the following relational table ‘Employees’ that stores employee data:

EmployeeID   EmployeeName       ManagerID   Salary    . . .
1            Peter Muller       15          85,000    . . .
2            Marco Berlusconi   NULL        350,000   . . .
3            Stephan Meier      2           210,000   . . .
. . .        . . .              . . .       . . .     . . .

The ManagerID attribute is a foreign key that references the EmployeeID of the direct superior of a given employee. Marco Berlusconi is the CEO and has no direct superior, hence the NULL value in the corresponding record.

Task: Write an SQL query that calculates the total salary of employees that work under the direct supervision of a specific manager, i.e., that produces an ordered result of the following form (highest amount of total money at the top):

EmployeeID   EmployeeName       TotalMoney
2            Marco Berlusconi   2,498,750
3            Stephan Meier      798,900
8            Petra Hunziker     678,950
. . .        . . .              . . .


FS 2014 ETH Zurich Data Modeling and Databases Systems Group Exercise Sheet 4 Prof. D. Kossmann Discussion: April 1/4

Integrity Constraints

1 Foreign Keys and Constraints: Theory

1. What is a foreign key constraint? Solution:

• A foreign key is an attribute (or a set of attributes) of a relation that references the primary key of another relation. Thus, a foreign key implies a referential constraint between two relations.

2. Why are such constraints important? Solution:

• Foreign key constraints are important to ensure consistency between relations, i.e., to prevent an inconsistent state of the database.

3. What is referential integrity? Solution:

• Referential integrity means that for every value of one attribute, the same value must exist in another attribute of a different relation, i.e., in a relational database, any field in a table that is declared a foreign key either contains a reference to an existing primary key of another table, or a NULL value (if the field was not explicitly defined as NOT NULL).
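The following minimal sketch makes this definition concrete with Python's built-in sqlite3 module. The two tables are cut-down, hypothetical versions of Publisher and Book that keep only the columns needed here. The publisher and book values are made up, and the helper try_insert is hypothetical. SQLite only checks foreign keys after PRAGMA foreign_keys = ON, so the snippet enables it explicitly.

```python
import sqlite3

def try_insert(conn, sql, params):
    """Execute one INSERT; return True if accepted, False if an integrity constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is enabled
conn.execute("CREATE TABLE Publisher (publisherName TEXT PRIMARY KEY, placeOfPublication TEXT)")
conn.execute("""CREATE TABLE Book (
    isbn TEXT PRIMARY KEY,
    title TEXT,
    publisherName TEXT REFERENCES Publisher(publisherName))""")
conn.execute("INSERT INTO Publisher VALUES ('PubA', 'Zurich')")  # hypothetical sample row

insert_book = "INSERT INTO Book VALUES (?, ?, ?)"
ok_existing = try_insert(conn, insert_book, ("111", "Title 1", "PubA"))       # existing key
ok_null = try_insert(conn, insert_book, ("222", "Title 2", None))             # NULL reference
ok_dangling = try_insert(conn, insert_book, ("333", "Title 3", "NoSuchPub"))  # no such publisher

print(ok_existing, ok_null, ok_dangling)  # True True False
```

SQLite accepts both a reference to an existing key and a NULL reference. It rejects the dangling reference with an IntegrityError.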

2 Foreign Keys and Constraints: Practice

Consider the following relational schema:

Table Name     Attributes
Reader         (readerId, firstName, lastName, address, city, dateOfBirth)
Book           (isbn, title, author, numberOfPages, yearOfPublication, publisherName)
Publisher      (publisherName, placeOfPublication)
Categories     (categoryName, includedIn)
Copy           (ISBN, copyNumber, shelf, position)
Loan           (readerId, ISBN, copyNumber, returnDate)
BookCategory   (ISBN, categoryName)

1. What problems may appear if this schema is realized in SQL without considering integrity constraints? Solution:

• If primary keys are not used, duplicates of the same data item are possible, e.g., several categories with the same name could exist.


• Without foreign key constraints, dangling references after deleting or updating are not prevented, e.g., books that no longer exist could still be referenced by the ‘Copy’ table.

• Semantically incorrect values in certain columns cannot be forbidden: the data types alone do not restrict the domain, e.g., they cannot prohibit birthdays in the future (this requires CHECK constraints).

2. What foreign keys are necessary and which strategies have to be applied when changing them? Please give an explanation why you chose a particular strategy. Solution:

• E.g. Book: publisherName references Publisher(publisherName): ON DELETE SET NULL or NO ACTION, ON UPDATE NO ACTION or CASCADE. If a publisher is deleted, the books published by this publisher should continue to exist without the publisher entry (the publisher entry could be set to NULL). One can also prevent the deletion. Changing the name of a publisher should be prevented (NO ACTION) or propagated (CASCADE).

• E.g. Categories: includedIn references another category: ON DELETE SET NULL, ON UPDATE NO ACTION or CASCADE. If a super-category is deleted, the sub-category should still exist without it. Changing the name of a super-category should be propagated (CASCADE) or prevented (NO ACTION).

• E.g. BookCategory: ISBN references Book, categoryName references Categories. For both: ON DELETE CASCADE, ON UPDATE CASCADE. If a referenced tuple in an N:M relationship is deleted, all references to it should be deleted.

• E.g. Copy: ISBN references Book: ON DELETE CASCADE, ON UPDATE CASCADE. If a book tuple is deleted, the dependent Copy tuples must be deleted as well.

• E.g. Loan: readerId references Reader, (ISBN, copyNumber) references Copy. For both: ON DELETE CASCADE, ON UPDATE CASCADE. If a reader or copy tuple is deleted, the dependent Loan tuples must be deleted as well.

3. What is the implication of deleting a publisher? What is the consequence of updating a readerId? In both cases, take into account the keys and rules of your solution for question 2.2. Solution:

• If there are no books of this publisher, the publisher is deleted and there are no other effects. If there are books, the deletion is either rejected (NO ACTION concerning the books) or the publisher of these books is set to NULL.

• Updating a readerId also updates the readerId in every Loan tuple of that reader (ON UPDATE CASCADE). If the DBMS does not support ON UPDATE CASCADE, the update is prevented.
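Both consequences can be reproduced in a small sketch with Python's sqlite3 module. The sketch assumes simplified, hypothetical tables that keep only the key columns involved; in particular, Loan here references only Reader, not Copy. It uses the strategies ON DELETE SET NULL for Book.publisherName and ON UPDATE CASCADE for Loan.readerId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite
conn.executescript("""
CREATE TABLE Publisher (publisherName TEXT PRIMARY KEY);
CREATE TABLE Book (
    isbn TEXT PRIMARY KEY,
    publisherName TEXT REFERENCES Publisher(publisherName)
        ON DELETE SET NULL ON UPDATE CASCADE);
CREATE TABLE Reader (readerId INTEGER PRIMARY KEY);
CREATE TABLE Loan (
    readerId INTEGER REFERENCES Reader(readerId)
        ON DELETE CASCADE ON UPDATE CASCADE,
    isbn TEXT);
-- hypothetical sample rows
INSERT INTO Publisher VALUES ('PubA');
INSERT INTO Book VALUES ('111', 'PubA');
INSERT INTO Reader VALUES (1);
INSERT INTO Loan VALUES (1, '111');
""")

# Deleting a publisher keeps its books but clears their publisherName (ON DELETE SET NULL).
conn.execute("DELETE FROM Publisher WHERE publisherName = 'PubA'")
book_publisher = conn.execute("SELECT publisherName FROM Book WHERE isbn = '111'").fetchone()[0]

# Changing a readerId is propagated to the reader's loans (ON UPDATE CASCADE).
conn.execute("UPDATE Reader SET readerId = 42 WHERE readerId = 1")
loan_reader = conn.execute("SELECT readerId FROM Loan").fetchone()[0]

print(book_publisher, loan_reader)  # None 42
```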

4. Enforce that only readers who live in Zurich can be inserted.

Solution: In the definition (SQL DDL) of the table "Reader": CHECK (city IN ('Zurich'))
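Here is a minimal sketch of this CHECK constraint in SQLite, using Python's sqlite3 module. It assumes a reduced, hypothetical Reader table with only three columns, along with made-up reader data and a hypothetical helper register.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of Reader: only the columns needed for the example.
conn.execute("""CREATE TABLE Reader (
    readerId INTEGER PRIMARY KEY,
    lastName TEXT,
    city TEXT CHECK (city IN ('Zurich')))""")

def register(reader_id, last_name, city):
    """Try to insert a reader; return False if the CHECK constraint rejects the row."""
    try:
        conn.execute("INSERT INTO Reader VALUES (?, ?, ?)", (reader_id, last_name, city))
        return True
    except sqlite3.IntegrityError:
        return False

accepted_zurich = register(1, "Meier", "Zurich")
accepted_basel = register(2, "Muller", "Basel")
print(accepted_zurich, accepted_basel)  # True False
```

A CHECK condition that evaluates to unknown does not reject the row, so a reader whose city is NULL would still be accepted. Declaring city NOT NULL as well closes that gap.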

5. Enforce that a reader can only borrow up to 20 books! Give a solution using CHECK and a solution using a trigger.

Solution:

With CHECK, in the definition of Loan:

CHECK (NOT EXISTS (SELECT COUNT(*) FROM Loan GROUP BY readerId HAVING COUNT(*) > 20))

Note that this is not supported by certain DBMSs, like DB2, which does not allow subqueries in CHECK constraints.

With a trigger:

CREATE TRIGGER AMAX NO CASCADE
BEFORE INSERT ON Loan
REFERENCING NEW AS newl
FOR EACH ROW
WHEN ((SELECT COUNT(*) FROM Loan WHERE Loan.readerId = newl.readerId) >= 20)
BEGIN ATOMIC
  SIGNAL SQLSTATE '-1'
  SET MESSAGE_TEXT = 'Illegal Insert - too many books per reader';
END

The SQLSTATE is given as an example; a valid SQLSTATE value should be chosen as described in the DBMS documentation. This trigger only handles the case where the number of borrowed books is increased by an INSERT ("BEFORE INSERT"). It is still possible that another reader borrows the books and then the reader number is changed; in this case, an additional trigger for UPDATE is needed. This trigger works in DB2.
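SQLite, like DB2, rejects subqueries inside CHECK constraints, so the trigger variant is the one that can be tried there. The sketch below is an assumed SQLite translation of the DB2 trigger, not the original. SQLite has no SIGNAL statement and uses RAISE(ABORT, ...) instead. It also refers to the new row directly as NEW. The loan data is hypothetical.

```python
import sqlite3

MAX_LOANS = 20  # limit from the exercise

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
CREATE TABLE Loan (readerId INTEGER, ISBN TEXT, copyNumber INTEGER, returnDate TEXT);

-- Assumed SQLite counterpart of the DB2 trigger AMAX: reject a reader's 21st loan.
CREATE TRIGGER AMAX BEFORE INSERT ON Loan
FOR EACH ROW
WHEN (SELECT COUNT(*) FROM Loan WHERE readerId = NEW.readerId) >= {MAX_LOANS}
BEGIN
    SELECT RAISE(ABORT, 'Illegal Insert - too many books per reader');
END;
""")

accepted = 0
last_error = None
for copy_number in range(25):  # hypothetical: reader 7 tries to borrow 25 copies
    try:
        conn.execute("INSERT INTO Loan VALUES (?, ?, ?, ?)",
                     (7, "111", copy_number, "2014-04-30"))
        accepted += 1
    except sqlite3.DatabaseError as err:  # RAISE(ABORT, ...) surfaces as a database error
        last_error = str(err)

print(accepted)  # 20
```

The loop tries 25 loans for the same reader. The trigger lets the first 20 through and aborts each of the remaining inserts. As with the DB2 version, a second trigger would be needed to cover UPDATE statements that change readerId.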

3 Foreign Keys and Constraints: Practice 2

Consider the following relational schema:

Table Name   Attributes
Emp          (eid: integer, ename: string, age: integer, salary: real)
Works        (eid: integer, did: integer, pcttime: integer)
Dept         (did: integer, dname: string, budget: real, managerid: integer)

1. Give an example of a foreign key constraint that involves the Dept relation. What are the options for enforcing this constraint when a user attempts to delete a Dept tuple?

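Solution: These foreign key constraints are necessary:

• Works: FOREIGN KEY (did) REFERENCES Dept(did)

• Dept: FOREIGN KEY (managerid) REFERENCES Emp(eid)

When deleting a Dept tuple, we need to remove the respective Works tuple(s). This can be done with the ON DELETE CASCADE rule. The other options are to reject the deletion (NO ACTION or RESTRICT) or to set the referencing did values to NULL or to a default value (SET NULL, SET DEFAULT); SET NULL is not usable here, because did is part of the primary key of Works.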

2. Write the SQL statements to create the preceding relations, including appropriate versions of all primary and foreign key integrity constraints.

Solution:
CREATE TABLE Emp (eid INT, ename VARCHAR, age INT, salary REAL,
  PRIMARY KEY (eid));

CREATE TABLE Dept (did INT, dname VARCHAR, budget REAL, managerId INT,
  PRIMARY KEY (did),
  FOREIGN KEY (managerId) REFERENCES Emp(eid));

CREATE TABLE Works (eid INT, did INT, pcttime INT,
  PRIMARY KEY (eid, did),
  FOREIGN KEY (eid) REFERENCES Emp(eid) ON DELETE CASCADE,
  FOREIGN KEY (did) REFERENCES Dept(did) ON DELETE CASCADE);

Alternative: if we allow some departments to not have a manager, we could add ON DELETE SET NULL to the foreign key of the Dept table: FOREIGN KEY (managerId) REFERENCES Emp(eid) ON DELETE SET NULL.

3. Define the Dept relation in SQL so that every department is guaranteed to have a manager.


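Solution: managerId INT NOT NULL (together with the foreign key on managerId, every department then refers to an existing employee as its manager)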

4. Write an SQL statement to delete the Toy department. Given the referential integrity constraints you chose for this schema, explain what happens when this statement is executed.

Solution: DELETE FROM Dept WHERE dname = 'Toy'; The tuple of the Toy department is deleted. Additionally, all tuples of the Works relation that reference the Toy department are deleted as well (ON DELETE CASCADE).
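The cascading delete can be checked with a short sketch in Python's sqlite3 module. It uses the DDL from question 2, adapted to SQLite types and with the NOT NULL manager from question 3. The employee, department and Works rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Emp (eid INTEGER PRIMARY KEY, ename TEXT, age INTEGER, salary REAL);
CREATE TABLE Dept (did INTEGER PRIMARY KEY, dname TEXT, budget REAL,
                   managerId INTEGER NOT NULL,
                   FOREIGN KEY (managerId) REFERENCES Emp(eid));
CREATE TABLE Works (eid INTEGER, did INTEGER, pcttime INTEGER,
                    PRIMARY KEY (eid, did),
                    FOREIGN KEY (eid) REFERENCES Emp(eid) ON DELETE CASCADE,
                    FOREIGN KEY (did) REFERENCES Dept(did) ON DELETE CASCADE);
-- hypothetical sample data
INSERT INTO Emp VALUES (1, 'Anna', 40, 90000), (2, 'Beat', 35, 70000);
INSERT INTO Dept VALUES (10, 'Toy', 500000, 1), (20, 'Shoe', 300000, 2);
INSERT INTO Works VALUES (1, 10, 100), (2, 10, 50), (2, 20, 50);
""")

conn.execute("DELETE FROM Dept WHERE dname = 'Toy'")
remaining_works = conn.execute("SELECT eid, did FROM Works ORDER BY eid").fetchall()
remaining_emps = conn.execute("SELECT COUNT(*) FROM Emp").fetchone()[0]
print(remaining_works)  # [(2, 20)]
print(remaining_emps)   # 2 (employees themselves are not deleted)
```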

4 Bonus: More SQL Practice

Consider the following relational table ‘Employees’ that stores employee data:

EmployeeID   EmployeeName       ManagerID   Salary    . . .
1            Peter Muller       15          85,000    . . .
2            Marco Berlusconi   NULL        350,000   . . .
3            Stephan Meier      2           210,000   . . .
. . .        . . .              . . .       . . .     . . .

The ManagerID attribute is a foreign key that references the EmployeeID of the direct superior of a given employee. Marco Berlusconi is the CEO and has no direct superior, hence the NULL value in the corresponding record.

Task: Write an SQL query that calculates the total salary of employees that work under the direct supervision of a specific manager, i.e., that produces an ordered result of the following form (highest amount of total money at the top):

EmployeeID   EmployeeName       TotalMoney
2            Marco Berlusconi   2,498,750
3            Stephan Meier      798,900
8            Petra Hunziker     678,950
. . .        . . .              . . .

Solution:

SELECT a.EmployeeID, a.EmployeeName, SUM(b.Salary) AS TotalMoney
FROM Employees AS a INNER JOIN Employees AS b ON a.EmployeeID = b.ManagerID
GROUP BY a.EmployeeID, a.EmployeeName
ORDER BY SUM(b.Salary) DESC;
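To see what the self-join produces, the query can be run unchanged against a small, hypothetical Employees table in SQLite through Python's sqlite3 module. The names and salaries are made up and do not come from the exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,
    EmployeeName TEXT,
    ManagerID INTEGER REFERENCES Employees(EmployeeID),
    Salary INTEGER)""")
# Hypothetical data: Alice is the CEO, Bob and Eve report to her, Carol and Dave to Bob.
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)", [
    (1, "Alice", None, 300000),
    (2, "Bob", 1, 150000),
    (3, "Carol", 2, 90000),
    (4, "Dave", 2, 80000),
    (5, "Eve", 1, 120000),
])

query = """
SELECT a.EmployeeID, a.EmployeeName, SUM(b.Salary) AS TotalMoney
FROM Employees AS a INNER JOIN Employees AS b ON a.EmployeeID = b.ManagerID
GROUP BY a.EmployeeID, a.EmployeeName
ORDER BY SUM(b.Salary) DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Alice', 270000), (2, 'Bob', 170000)]
```

Employees without direct reports (Carol, Dave, Eve) do not appear, because the inner join only keeps managers that have at least one subordinate.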


ETH Zurich FS 2014 Systems Group Data Modeling and Databases Prof. D. Kossmann Exercise Sheet 5

Normal Forms I

1 Functional Dependencies and Keys

Consider the following relation:

Order (Product ID, Product Name, Customer ID, Customer Name, Date, Item Price, Amount, VAT, Gross Total, Net Total)

Note:

• The tax value can vary from product to product (e.g. 8% for books, 16% for luxury items)

• Customer orders on the same day are combined. We only have one order per customer and per day

Questions:

1. Determine all non-trivial functional dependencies of the relation Order.

2. What are the key candidates?

2 Closure of Attributes

Given a relation (A,B,C,D,E,G) with the following eight functional dependencies F:

AB → C, D → EG, C → A, BE → C, BC → D, CG → BD, ACD → B, CE → AG

We define α = BD. Find the closure of attributes α+ of (F, α).

3 Minimal Basis

Find a minimal basis of the following sets of functional dependencies.

1. (a) A→ BC

(b) B → C

(c) A→ B

(d) AB → C


2. (a) AB → C

(b) C → A

(c) BC → D

(d) ACD → B

(e) BE → C

(f) CE → FA

(g) CF → BD

(h) D → EF

4 Decomposition

Consider the following relation R and its decomposition into R1 and R2.

Table Name   Attributes
R            (Student ID, Date Enrolled, Course ID, Room NR, Professor)
R1           (Student ID, Date Enrolled, Course ID)
R2           (Date Enrolled, Room NR, Professor)

1. Show that the decomposition of R into R1 and R2 is lossy. It helps to find the functional dependencies first.

2. Find and explain a lossless decomposition of the same relation.

For this exercise, assume that courses are taught by only one professor and take place in just one room.


ETH Zurich FS 2014 Systems Group Data Modeling and Databases Prof. D. Kossmann Exercise Sheet 5

Normal Forms I - Solutions

1 Functional Dependencies and Keys

Consider the following relation:

Order (Product ID, Product Name, Customer ID, Customer Name, Date, Item Price, Amount, VAT, Gross Total, Net Total)

Note:

• The tax value can vary from product to product (e.g. 8% for books, 16% for luxury items)

• Customer orders on the same day are combined. We only have one order per customer and per day

Questions:

1. Determine all non-trivial functional dependencies of the relation Order.

Product ID → Product Name, Item Price, VAT
Customer ID → Customer Name
Product ID, Customer ID, Date → Amount
Item Price, Amount → Net Total
Net Total, VAT → Gross Total
Product Name → Product ID
Customer Name → Customer ID

The last two are valid only if the product/customer names are unique.

Because of the correlation between Net Total, VAT and Gross Total, one could consider the following functional dependencies as well:

Gross Total, VAT → Net Total
Gross Total, Net Total → VAT

2. What are the key candidates?

{Product ID,Customer ID,Date}

If product/customer names are unique, we have three additional key candidates:

{Product ID, Customer Name, Date}
{Product Name, Customer Name, Date}
{Product Name, Customer ID, Date}


2 Closure of Attributes

Given a relation (A,B,C,D,E,G) with the following eight functional dependencies F:

AB → C, D → EG, C → A, BE → C, BC → D, CG → BD, ACD → B, CE → AG

We define α = BD. Find the closure of attributes α+ of (F, α).

Solution: To compute the closure of attributes, we use the algorithm presented in the lecture. The table below shows the iterations of the algorithm:

FD used     Result
(start)     Result := BD
D → EG      BDEG
BE → C      BCDEG
CG → BD     no change
CE → AG     ABCDEG
AB → C      no change
C → A       no change
BC → D      no change
ACD → B     no change
D → EG      no change
BE → C      no change
CG → BD     no change
CE → AG     no change

The solution is α+ = {A,B,C,D,E,G}
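The closure algorithm itself is short enough to sketch in Python. This is a straightforward fixed-point version: it applies every FD whose left side is already contained in the result until nothing changes. It may visit the FDs in a different order than the table above, but it reaches the same closure. Encoding FDs as pairs of attribute strings is a choice made only for this example.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (left side, right side) pairs."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The eight FDs of this exercise, encoded as (left side, right side) strings.
F = [("AB", "C"), ("D", "EG"), ("C", "A"), ("BE", "C"),
     ("BC", "D"), ("CG", "BD"), ("ACD", "B"), ("CE", "AG")]

print("".join(sorted(closure("BD", F))))  # ABCDEG
```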

3 Minimal Basis

Find a minimal basis of the following sets of functional dependencies.

1. (a) A→ BC

(b) B → C

(c) A→ B

(d) AB → C

Reduction of left sides:
AB → C can be reduced to A → C because {C} ⊆ Closure(F, A) = (A), (AB), (ABC)

Reduction of right sides:
A → BC can be reduced to A → B because {C} ⊆ Closure((F \ {A → BC}) ∪ {A → B}, A) = (A), (AB), (ABC)
We now have the FD A → B twice. One of them can be removed.

A → C can be reduced to A → ∅ because {C} ⊆ Closure(F \ {A → C}, A) = (A), (AB), (ABC)

As minimal basis remains {A→ B, B → C}
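The reduction steps can be sketched in Python as follows, assuming FDs are encoded as pairs of attribute strings. This sketch visits attributes alphabetically and FDs in the given order. That order is an arbitrary choice, so for some inputs the sketch can produce a different, equally valid minimal basis. For the first set above, left reduction removes A rather than B from AB → C, but the final result is the same.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_basis(fds):
    """Left reduction, right reduction, removal of X -> ∅, union of equal left sides."""
    fds = [(set(lhs), set(rhs)) for lhs, rhs in fds]  # mutable copies

    # Left reduction: drop a from X if Y is still in the closure of X - {a}.
    for lhs, rhs in fds:
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, fds):
                lhs.discard(a)

    # Right reduction: tentatively remove b from Y; keep it out if X still derives b.
    for lhs, rhs in fds:
        for b in sorted(rhs):
            rhs.discard(b)
            if b not in closure(lhs, fds):
                rhs.add(b)  # b is not derivable without this FD, so restore it

    # Remove FDs of the form X -> ∅ and merge FDs with the same left side.
    merged = {}
    for lhs, rhs in fds:
        if rhs:
            merged.setdefault(frozenset(lhs), set()).update(rhs)
    return sorted(("".join(sorted(l)), "".join(sorted(r))) for l, r in merged.items())


F1 = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
print(minimal_basis(F1))  # [('A', 'B'), ('B', 'C')]
```

For the second set of this exercise, the same function returns AB → C, C → A, BC → D, BE → C, CE → F, CF → B, D → EF. This is the basis given in the solution to part 2.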


2. (a) AB → C

(b) C → A

(c) BC → D

(d) ACD → B

(e) BE → C

(f) CE → FA

(g) CF → BD

(h) D → EF

Reduction of left sides:
ACD → B can be reduced to CD → B because {B} ⊆ Closure(F, CD) = (CD), (CDEF), (CDEFB)

Reduction of right sides:
CD → B can be reduced to CD → ∅ because {B} ⊆ Closure(F \ {CD → B}, CD) = (CD), (CDEF), (CDEFB)

CE → FA can be reduced to CE → F because {A} ⊆ Closure((F \ {CE → FA}) ∪ {CE → F}, CE) = (CE), (CEF), (CEFA)

CF → BD can be reduced to CF → B because {D} ⊆ Closure((F \ {CF → BD}) ∪ {CF → B}, CF) = (CF), (CFB), (CFBD)

As minimal basis remains:

AB → C, C → A, BC → D, BE → C, CE → F, CF → B, D → EF

An alternative solution would be to right-reduce CF → BD instead of CD → B.

4 Decomposition

Consider the following relation R and its decomposition into R1 and R2.

Table Name   Attributes
R            (Student ID, Date Enrolled, Course ID, Room NR, Professor)
R1           (Student ID, Date Enrolled, Course ID)
R2           (Date Enrolled, Room NR, Professor)

1. Show that the decomposition of R into R1 and R2 is lossy. It helps to find the functional dependencies first.

The functional dependencies are:
Course ID → Room NR, Professor
Student ID, Course ID → Date Enrolled

The key of the relation is {Student ID, Course ID}.

R1 ∩ R2 = {Date Enrolled}, but Date Enrolled functionally determines neither R1 nor R2, so the condition of the decomposition lemma does not hold: joining R1 and R2 on Date Enrolled can produce spurious tuples.
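A concrete instance shows the loss of information. The sketch below uses two hypothetical enrollments that satisfy the functional dependencies above and happen to share the same enrollment date. It projects them onto R1 and R2 and joins them back on Date Enrolled, using plain Python sets of tuples. The join yields tuples that were never in R.

```python
# (student, date enrolled, course, room, professor): hypothetical sample rows
R = [
    ("S1", "2014-02-17", "C1", "R1", "P1"),
    ("S2", "2014-02-17", "C2", "R2", "P2"),
]

R1 = {(s, d, c) for s, d, c, _, _ in R}                # projection on R1
R2 = {(d, room, prof) for _, d, _, room, prof in R}    # projection on R2

# Natural join of R1 and R2 on the only common attribute, Date Enrolled.
joined = {(s, d, c, room, prof)
          for (s, d, c) in R1
          for (d2, room, prof) in R2 if d == d2}
spurious = joined - set(R)

print(len(joined), len(spurious))  # 4 2
```

Among the spurious tuples, one pairs course C1 with room R2 and professor P2, which even contradicts Course ID → Room NR, Professor.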

2. Find and explain a lossless decomposition of the same relation.


R can be decomposed into:
R1(Student ID, Date Enrolled, Course ID)
R2(Course ID, Room NR, Professor)

R1 ∩ R2 = {Course ID} and Course ID → R2, so by the decomposition lemma the decomposition is lossless.

For this exercise, assume that courses are taught by only one professor and take place in just one room.
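Both decompositions can also be checked mechanically with the decomposition lemma for two relations: compute the closure of R1 ∩ R2 and test whether it contains all of R1 or all of R2. Here is a minimal Python sketch; attribute names are written without spaces (StudentID and so on) purely for convenience.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Decomposition lemma for two relations: lossless if the common attributes
    functionally determine all attributes of R1 or all attributes of R2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure

FDS = [({"CourseID"}, {"RoomNR", "Professor"}),
       ({"StudentID", "CourseID"}, {"DateEnrolled"})]

R1 = {"StudentID", "DateEnrolled", "CourseID"}
R2_lossy = {"DateEnrolled", "RoomNR", "Professor"}
R2_lossless = {"CourseID", "RoomNR", "Professor"}

print(is_lossless(R1, R2_lossy, FDS), is_lossless(R1, R2_lossless, FDS))  # False True
```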


ETH Zurich FS 2014 Systems Group Data Modeling and Databases Prof. D. Kossmann Exercise Sheet 6

Normal Forms II - Solution

1 Normal Forms: Theory Questions

1. 2NF requires that non-key attributes depend on all the attributes comprising a key (and not a proper subset). What additional restriction does 3NF impose?
Clarification: A superkey is a superset of a minimal key. In 3NF, non-key attributes must depend only on all the attributes comprising a key and nothing else; they are not allowed to depend on non-key attributes alone. So, let's say we have R(ABCD) with AB being the key. Some of the following functional dependencies are allowed to exist without violating the 3NF rules.

(a) AB → C: allowed

(b) A → D: not allowed

(c) B → A: allowed, A is part of a key

(d) B → D: not allowed

(e) ABD → C: allowed

2. (a) Does a relation that complies with 3NF also comply with BCNF?
Not necessarily. The difference is that for BCNF-compliant relations, all attributes (key and non-key attributes) must depend on a whole key and nothing else.

(b) Can any schema be decomposed into BCNF? At what price?
Every schema can be decomposed into BCNF while maintaining the lossless join property by using the decomposition algorithm. However, it is not guaranteed that the functional dependencies present in the initial table will be preserved. So, a dependency-preserving BCNF decomposition is not always achievable.

(c) A relation with attributes A, B, C and the following functional dependencies: AB → C, C → B is not in BCNF. Why not?
The keys of the relation are {A, B} (AB → C) and {A, C} (C → B, hence CA → B). However, the attribute B, which is part of a key, depends on C alone and not on the whole key AC. Since C is not a superkey, the FD C → B violates BCNF.

(d) If we try to decompose it, what will happen?
If (according to the decomposition algorithm) we decompose R(A,B,C) into the relations (A,C) and (B,C), the functional dependency AB → C cannot be preserved.

3. (a) How are multivalued dependencies different from functional dependencies regarding the information they provide about the schema?
Functional dependencies provide information about what kind of tuples are not allowed to exist: if A → B, then we can't have two tuples with A=3, B=6 and A=3, B=7. There can be only one value of B for every given value of A. Multivalued dependencies, on the other hand, provide the information that a set of tuples must exist in a relation.


(b) What conditions must the multivalued dependencies (if any) satisfy for a relation to be in 4NF?
A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X →→ Y, X is a superkey, that is, X is either a candidate (minimal) key or a superset thereof.

(c) Is a decomposition to 4NF always dependency preserving and/or lossless?
If we use the algorithm presented in the lecture, we can always preserve the lossless join property. However, dependency preservation is not always achievable.
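The difference described in 3(a) can be illustrated with a small Python sketch. An FD check looks for pairs of tuples that must not coexist. An MVD check looks for tuples that must be present. The relation (Course, Teacher, Book) and its rows are a hypothetical textbook-style example and are not part of the exercise.

```python
from itertools import product

ATTRS = ["Course", "Teacher", "Book"]

def fd_holds(rows, x, y):
    """X -> Y: no two tuples agree on X but differ on Y."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in x)
        y_val = tuple(t[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

def mvd_holds(rows, x, y, attrs=ATTRS):
    """X ->> Y: for all t1, t2 agreeing on X, the tuple with X and Y from t1
    and all remaining attributes from t2 must also be present."""
    present = {tuple(t[a] for a in attrs) for t in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            t3 = tuple(t1[a] if a in x or a in y else t2[a] for a in attrs)
            if t3 not in present:
                return False
    return True

def rel(*tuples):
    return [dict(zip(ATTRS, t)) for t in tuples]

# Hypothetical data: the teachers and the books of a course are independent of each other.
full = rel(("DB", "T1", "B1"), ("DB", "T1", "B2"), ("DB", "T2", "B1"), ("DB", "T2", "B2"))
missing = full[:3]  # drop ("DB", "T2", "B2")

print(fd_holds(full, ["Course"], ["Teacher"]))      # False: a course has several teachers
print(mvd_holds(full, ["Course"], ["Teacher"]))     # True
print(mvd_holds(missing, ["Course"], ["Teacher"]))  # False: a required tuple is missing
```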

4. (a) Give an example of a relation that is in second normal form but not in third normal form. Draw the relational diagram and list all functional dependencies. Explain why it is in 2NF and not in 3NF.

Answer:

SUPPLIERS(supplier no, status, city)

Functional Dependencies: supplier no→ status, supplier no→ city, city → status

Comments:

Lacks mutual independence among non-key attributes.

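Mutual dependence is reflected in the transitive dependency supplier no → city, city → status. The relation is in 2NF because its key (supplier no) consists of a single attribute, so no non-key attribute can depend on a proper subset of it. It is not in 3NF because status depends on the key only transitively, via the non-key attribute city.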

Anomalies:

INSERT: We cannot record that a particular city has a particular status until we have a supplier in that city.

DELETE: If we delete a supplier which happens to be the last row for a given city value, we lose the fact that the city has the given status.

UPDATE: The status for a given city occurs many times, therefore leading to multiple updates and possible loss of consistency.

(b) Give an example of a relation that is not in 2NF. Draw the relational diagram and list all functional dependencies. Explain why it is not in 2NF. Explain how it can be transformed into a table that is in 2NF.

Answer:

SUPPLIER(supplier no, status, city, part no, quantity)

Functional Dependencies: (supplier no, part no) → quantity, (supplier no) → status, (supplier no) → city, city → status (Supplier's status is determined by location)

Comments:

Non-key attributes are not mutually independent (city → status).

Non-key attributes are not fully functionally dependent on the primary key (i.e., status and city are dependent on just part of the key, namely supplier no).

Anomalies:

INSERT: We cannot enter the fact that a given supplier is located in a given city until that supplier supplies at least one part (otherwise, we would have to enter a null value for a column participating in the primary key, which violates the definition of a relation).

DELETE: If we delete the last (only) row for a given supplier, we lose the information that the supplier is located in a particular city.

UPDATE: The city value appears many times for the same supplier. This can lead to inconsistency or the need to change many values of city if a supplier moves.

Transformation into 2NF: splitting the relation into one relation (supplier no, status, city) and one relation (supplier no, part no, quantity) removes the partial dependency on supplier no. Both relations are then in 2NF; the first is still not in 3NF because of city → status.


2 2NF

Consider the following relational schema for LINEITEM: LINEITEM (OrderNumber, ItemNumber, Description, Price, Quantity)

1. Find the functional dependencies and a key of the relation above.
ItemNumber → Description, Price
OrderNumber, ItemNumber → Quantity
{OrderNumber, ItemNumber} is a key.

2. What normal form is the above LINEITEM relation in?
The description and the price depend on part of the key (just the ItemNumber) and not on the whole key. Therefore the relation is not in 2NF. It is in 1NF, though.

3. What are some disadvantages of this choice of schema?
The item description and price are stored unnecessarily for each instance of the particular item in the LINEITEM relation, which might lead to inconsistencies. On the other hand, a positive effect of this is that, in order to find the total price of items, no join is needed.

3 3NF

Consider the following relational schema (CONCERT) describing musical events in Switzerland:

Venue   Year   Singer           Genre
X       1999   Cher             pop
Z       1999   Cher             pop
Y       2001   Cher             pop
Y       2001   Porcupine Tree   rock

Is this in 2NF? 3NF? Determine functional dependencies to prove your point (show your working).

Singer → Genre
The key is {Singer, Year, Venue}. The genre depends on the singer, which is only part of the key. So the relation is not even in 2NF.

Now look at this slightly different schema:

Venue   Year   Singer           Number of Attendees
X       1999   Cher             10 000
Z       1999   Cher             8 000
Y       2001   Cher             9 0000
Y       2001   Porcupine Tree   10 000

What about this? Is this in 2NF? How about 3NF?
Venue, Year, Singer → Number of Attendees. Yes, it is in both 2NF and 3NF: the only non-key attribute (Number of Attendees) depends on the whole key and nothing else.

The point of this exercise is to show the difference between storing superfluous information in a table (genre of music) and information that is really needed because it differs from tuple to tuple (number of attendees).

Assumptions: One singer per different concert, and singers stick to just one genre of music. Also, singers do not visit the same venue twice in the same year.

4 Synthesis Algorithm

Consider the following relation:

R(A,B,C,D)


with the following functional dependencies (there are no further non-trivial functional dependencies).

A → D
A,B → C
A,C → B

1. Specify the candidate key of this relation.

Answer

Closure(F, AB) = (AB), (ABD), (ABCD) ⇒ AB = Super-Key
Closure(F, A) = (A), (AD)
Closure(F, B) = (B)
⇒ AB = Candidate Key

Closure(F, AC) = (AC), (ACD), (ABCD) ⇒ AC = Super-Key
Closure(F, C) = (C)
⇒ AC = Candidate Key
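For small schemas, the closure-based key search above can be automated by brute force. The sketch tests attribute sets in order of increasing size and keeps those whose closure is the whole relation and which contain no smaller key. It is exponential in the number of attributes and is only meant for exercise-sized relations. FDs are encoded as pairs of attribute strings.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute force: test attribute sets by increasing size, keep minimal superkeys."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a smaller key is contained, so this set is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return ["".join(sorted(key)) for key in keys]

F = [("A", "D"), ("AB", "C"), ("AC", "B")]
print(candidate_keys("ABCD", F))  # ['AB', 'AC']
```

For S2(A,B,C) with AB → C and C → A from section 5, candidate_keys("ABC", [("AB", "C"), ("C", "A")]) returns ['AB', 'BC'].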

2. Apply the synthesis algorithm to transform the schema into 3NF (lossless and dependency preserving).

Answer

Canonical cover FC = F:
A → D ⇒ R1 = (A, D)
AB → C ⇒ R2 = (A, B, C)
AC → B ⇒ R3 = (A, C, B)
R3 ⊆ R2 ⇒ R1 = (A, D); R2 = (A, B, C)
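The synthesis steps can be sketched in Python. The sketch assumes that the canonical cover and one candidate key have already been computed, so step 1 and the key search are not repeated here. Relations are plain sets of attribute names. FDs with equal left sides are combined before the relations are created, following the usual formulation of the algorithm.

```python
def synthesize_3nf(canonical_cover, key):
    """3NF synthesis from a canonical cover given as (lhs, rhs) attribute lists."""
    # Step 2: one relation per left side (FDs with equal left sides are combined).
    groups = {}
    for lhs, rhs in canonical_cover:
        groups.setdefault(frozenset(lhs), set()).update(rhs)
    schemas = [set(lhs) | rhs for lhs, rhs in groups.items()]

    # Step 3: if no relation contains a candidate key, add one consisting of the key.
    if not any(set(key) <= s for s in schemas):
        schemas.append(set(key))

    # Step 4: eliminate relations that are contained in another relation.
    result = []
    for s in schemas:
        if any(s <= t for t in result):
            continue
        result = [t for t in result if not t <= s]
        result.append(s)
    return [sorted(s) for s in result]

cover = [(["A"], ["D"]), (["A", "B"], ["C"]), (["A", "C"], ["B"])]
print(synthesize_3nf(cover, ["A", "B"]))  # [['A', 'D'], ['A', 'B', 'C']]
```

Section 6 lists a canonical cover for the Shipping relation with key {ShipName, Date}. Applied to that cover, the same function yields the relations R1, R2 and R3 given there.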

5 Decomposition Algorithm

Consider the following relations and functional dependencies:

1. S1(A,B,C,D) with functional dependencies:

A,C → D, A→ B

2. R1(A,B,D,E); R2(A,C, F ) with functional dependencies:

A → B,E;  A → D;  F → A;  A,C → F;  B,C → E;  C → A

3. S2(A,B,C) with functional dependencies:

A,B → C;  C → A

(a) Determine all the candidate keys

(b) In which normal form are the relations?

(c) Transfer the relations in 3NF

(d) Are the resulting relations in BCNF? If not, convert them to BCNF.


Answer

1. S1(A,B,C,D) with functional dependencies:

A,C → D, A→ B

The only key candidate is the pair (A,C).
S1 is in 1NF as there are no multi-value attributes.
S1 is not in 2NF as B is not entirely dependent on (A,C).

The decomposition of S1(A,B,C,D) into S11(A,B) and S12(A,C,D) is in 2NF, 3NF and BCNF.

2. R1(A,B,D,E); R2(A,C, F ) with functional dependencies:

A → B,E;  A → D;  F → A;  A,C → F;  B,C → E;  C → A

(a) To determine the key candidates, let’s first compute the minimal basis:

Left reduction:
Reduce AC → F to C → F because {F} ⊆ Closure(FD, C) = (C), (AC), (ACF)
Reduce BC → E to C → E because {E} ⊆ Closure(FD, C) = (C), (AC), (ABCE)

Right reduction:
Reduce C → E to C → ∅ because {E} ⊆ Closure(FD \ {C → E}, C) = (C), (AC), (ABCE)
Reduce C → A to C → ∅ because {A} ⊆ Closure(FD \ {C → A}, C) = (C), (CF), (ACF)

After applying the union rule to FDs with the same left side, the remaining minimal basis is:

A → BDE,  F → A,  C → F

A is the candidate key of R1, C is the candidate key of R2.

(b) Normal forms:

R1:
1NF yes
2NF yes - BDE is dependent on A
3NF yes - A is a superkey
BCNF yes - A is a superkey

R2:
1NF yes
2NF yes - AF is dependent on C
3NF no - F → A is not trivial, A is not part of the candidate key, and F is not a superkey
BCNF no - not in 3NF

The decomposition of R2(A,C, F ) into R21(C,F ) and R22(A,F ) is in 3NF and BCNF.

Note: The synthesis algorithm is not defined for two or more relations. An alternative solution would be to join the relations R1 and R2 and then apply the algorithm. The result would be identical.

3. S2(A,B,C) with functional dependencies:

A,B → C;  C → A


The two key candidates for S2 are (A,B) and (B,C).
S2 is in 1NF as there are no multi-value attributes.
S2 is in 2NF as there are no non-key attributes.
S2 is in 3NF as A,B is a superkey, and A is part of a candidate key.
S2 is not in BCNF as in FD C → A, C is not a superkey.

The decomposition of S2(A,B,C) into S21(B,C) and S22(A,C) is in BCNF. The decomposition is lossless (the common attribute C determines S22, since C → A), but the FD AB → C is not preserved.
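The BCNF test used for S2 asks whether the left side of every non-trivial FD is a superkey. It can be sketched in Python using the attribute closure. The sketch only examines the FDs given for the relation, which is sufficient for that relation itself. For relations produced by a decomposition, the projected dependencies would have to be derived first.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs X -> Y of the relation whose left side X is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FDs never violate BCNF
        if closure(lhs, fds) != set(attributes):
            violations.append((lhs, rhs))
    return violations

print(bcnf_violations("ABC", [("AB", "C"), ("C", "A")]))  # [('C', 'A')]
```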

6 Normalization up to BCNF

Consider the following relation:

Shipping(ShipName, ShipType, TripId, Cargo, Port, Date)

and the following functional dependencies:

ShipName → ShipType
TripId → ShipName, Cargo
ShipName, Date → TripId, Port

or, expanded to a canonical set:

ShipName → ShipType
TripId → ShipName
TripId → Cargo
ShipName, Date → TripId
ShipName, Date → Port

and also, we can infer

TripId,Date→ Port

1. Find the candidate key(s).
{ShipName, Date} and {TripId, Date}.

2. Apply the Synthesis Algorithm for 3NF. (How would you apply the Decomposition Algorithm to get 3NF?)
This relation is neither in 3NF nor in 2NF, because ShipType depends on part of a candidate key (ShipName, Date) and Cargo depends on part of another candidate key (TripId, Date), which violates 2NF. Note that it is okay for ShipName to depend on TripId for 2NF, because it is an attribute that belongs to a key.

Step 1: Define the minimal basis as above.

Step 2: According to the synthesis algorithm, we first create one relation for each functional dependency in the minimal basis (FDs with the same left side are combined), as follows:
R1(ShipName, ShipType) with ShipName → ShipType - 3NF
R2(TripId, ShipName, Cargo) with TripId → ShipName, Cargo - 3NF
R3(ShipName, Date, TripId, Port) with ShipName, Date → TripId, Port and TripId → ShipName - 3NF

Note that R3 is indeed in 3NF. R3 has two candidate keys: {ShipName, Date} and {TripId, Date}. Both TripId and ShipName are key attributes (each is part of a candidate key), so TripId → ShipName causes no 3NF violation. The relation, though, is not in BCNF.

Step 3: Since one of the keys of Shipping is contained in at least one of the relations, this step is not applicable in this case.

Step 4: Since there is no Ri ⊆ Rj, no elimination is feasible.


3. Apply the Decomposition Algorithm for BCNF.
There is still a problem, because ShipName depends on TripId (TripId being part of a candidate key). Thus, we need to further split R3. According to the decomposition algorithm, we first identify the "evil" dependency in the relation, which in this case is TripId → ShipName. Then, we define R31(TripId, ShipName) = {TripId} ∪ {ShipName} and R32(Date, TripId, Port) = R3 - {ShipName}. R31 is excluded from the final result since it is contained in R2. Finally, we are left with the following relations, renamed accordingly:

SHIPS(ShipName, ShipType) with ShipName → ShipType for R1
TRIPS(TripId, ShipName, Cargo) with TripId → ShipName, Cargo for R2
TRIPPORTS(Date, TripId, Port) with TripId, Date → Port for R32

Note that here, for this particular BCNF decomposition, the functional dependency ShipName, Date → TripId is not preserved. Neither is ShipName, Date → Port.


ETH Zurich FS 2014 Systems Group Data Modeling and Databases Prof. D. Kossmann Exercise Sheet 6

Normal Forms II

1 Normal Forms: Theory Questions

1. 2NF requires that non-key attributes (attributes that are not a part of any candidate key) depend on all the attributes comprising a key of the relation (and not a proper subset).

(a) What additional restriction does 3NF impose?

(b) So, let's say we have a relation R(ABCD) with AB being the key. Some of the following functional dependencies are allowed to exist without violating the 3NF rules. Which, and why?

i. AB → C

ii. A → D

iii. B → A

iv. B → D

v. ABD → C

2. (a) Does a relation that complies with 3NF also comply with BCNF?

(b) Can any schema be decomposed into BCNF? At what price?

(c) A relation with attributes A, B, C and the following functional dependencies: AB → C, C → B is not in BCNF. Why not?

(d) If we try to decompose it using the known algorithm, what will happen?

3. (a) How are multivalued dependencies different from functional dependencies regarding the information they provide about the schema?

(b) What conditions must the multivalued dependencies (if any) satisfy for a relation to be in 4NF?

(c) Is a decomposition to 4NF always dependency preserving and/or lossless?

4. (a) Give an example of a relation that is in second normal form but not in third normal form. Draw the relational diagram and list all functional dependencies. Explain why it is in 2NF and not in 3NF.

(b) Give an example of a relation that is not in 2NF. Draw the relational diagram and list all functional dependencies. Explain why it is not in 2NF. Explain how it can be transformed into a table that is in 2NF.


2 2NF

Consider the following relational schema for LINEITEM:

LINEITEM (OrderNumber, ItemNumber, Description, Price, Quantity)

1. Find the functional dependencies and a key of the relation above (Price refers to the price of one item).

2. What normal form is the above LINEITEM relation in?

3. What are some advantages/disadvantages of this choice of schema?

3 3NF

Consider the following relational schema (CONCERT) describing musical events in Switzerland:

Venue | Year | Singer         | Genre
X     | 1999 | Cher           | pop
Y     | 2001 | Cher           | pop
Z     | 1999 | Cher           | pop
Y     | 2001 | Porcupine Tree | rock

Is this in 2NF? Determine the functional dependencies to prove your point (show your working). Assumptions: there is one singer per concert, and singers stick to just one genre of music for their entire career. Also, singers do not visit the same venue twice in the same year.

Now look at this slightly different schema:

Venue | Year | Singer         | Number of Attendees
X     | 1999 | Cher           | 10 000
Y     | 2001 | Cher           | 9 0000
Z     | 1999 | Cher           | 8 000
Y     | 2001 | Porcupine Tree | 10 000

What about this? Is this in 2NF? 3NF?

4 Synthesis Algorithm

Consider the following relation:

R(A,B,C,D)

with the following functional dependencies (there are no further non-trivial functional dependencies).

A → D; A,B → C; A,C → B

1. Specify the candidate key of this relation.

2. Apply the synthesis algorithm to transform the schema into 3NF (lossless and dependency preserving).


5 Decomposition Algorithm

Consider the following relations and functional dependencies:

1. S1(A,B,C,D) with functional dependencies:

A,C → D; A → B

2. R1(A,B,D,E); R2(A,C, F ) with functional dependencies:

A → B,E; A → D; F → A; A,C → F; B,C → E; C → A

3. S2(A,B,C) with functional dependencies:

A,B → C; C → A

(a) Determine all the candidate keys

(b) In which normal form are the relations?

(c) Transform the relations into 3NF.

(d) Are the resulting relations in BCNF? If not, convert them to BCNF.

6 Normalization up to BCNF

Consider the following relation:

Shipping (ShipName, ShipType, TripId, Cargo, Port, Date)

and the following functional dependencies:

ShipName → ShipType

TripId → ShipName, Cargo

ShipName,Date → TripId, Port

1. Find the candidate key(s).

2. Apply the Synthesis Algorithm for 3NF. (How would you apply the Decomposition Algorithm to get 3NF?)

3. Apply the Decomposition Algorithm for BCNF.


ETH Zurich, Systems Group, Prof. D. Kossmann
FS 2014, Data Modeling and Databases, Exercise Sheet 7, Discussion: May 6 / 9

Query Processing

1 Index Access (Random vs. Sequential IO)

Given is the following SQL Query:

SELECT * FROM emp WHERE salary = 200;

Assume that the "emp" table contains 100 Mio tuples that are stored in 1 Mio pages on the hard disk (on average 100 tuples per page). There exists a B-tree index on the column "emp.salary". We have an infinite amount of memory available, and the B-tree resides completely in memory, while none of the table pages are in memory yet. Let's suppose a random access to the disk takes 28 ms and a sequential access takes 0.28 ms. Note that nowadays these numbers are much smaller and vary from one disk type to another.

1. Determine the cost (cumulative disk access time) to answer the SQL query if the B-tree is used. Specify a formula that takes as parameter the number of tuples that have a salary of 200.

2. For what parameter (number of employees with salary 200) is the usage of the B-tree faster compared to the alternative of scanning the entire table? How much faster is the table scan if all 100 Mio employees have a salary of 200?

2 Memory Management

Given are the following operators:

• Nested-loop Join

• Grace Hash Join

• Sort Merge Join

• Table Scan

• Index Scan (Access the table with a B-tree. Two buffers are required: one for the tree, another for the pages of the table)

Specify for each of these operators the lowest amount of buffer space (in number of pages) that needs to be allocated to execute the operator. The answer might depend on the size of the input relation(s). Also specify the highest amount of buffer space required for the optimal / fastest execution of each operator.


For some operators, the buffer replacement policy might impact performance. If so, specify the strategy you would use.

3 Query Processing

Assume we have the following relational schema:

Customer(Cid, Name)
Order(Oid, Customer, Volume)

There are 1000 Customer tuples and 100000 Order tuples. The size of each tuple is 100 bytes. Additionally, assume we have the following query, which asks for the volume of orders of a customer called Wutz:

SELECT sum(o.Volume) FROM Customer c, Order o WHERE c.Cid = o.Customer AND c.Name = 'Wutz';

1. Translate this SQL Query to a relational algebra expression. (Hint: you may use the sum function.)

2. Explain roughly how you would implement each operator, i.e., use a keyword that determines which algorithm you would use to implement an operator (e.g., 2-Phase External Sort).

3. For each operator, give the amount of main memory that you would allocate and explain roughly why. How much memory do you need to process the whole query?

4 Alternative Query Plans

Assume we have the following Query:

SELECT * FROM R, S, T WHERE R.rid = S.sid AND S.sid = T.tid AND T.tid = R.rid

1. Give 3 different query plans for this query (including the join method).

2. For each plan given in the previous part, specify the size of each table such that the plan would be optimal.

3. Take one of the plans from the previous part and assume none of the tables fit in main memory, i.e., memory is at most half the size of the smallest table. Under these conditions, how do you allocate buffers? What will your page replacement policy be?


ETH Zurich, Systems Group, Prof. D. Kossmann
FS 2014, Data Modeling and Databases, Exercise Sheet 7

Query Processing - Solution

1 Index Access (Random vs. Sequential IO)

Given is the following SQL Query:

SELECT * FROM emp WHERE salary = 200;

Assume that the "emp" table contains 100 Mio tuples that are stored in 1 Mio pages on the hard disk (on average 100 tuples per page). There exists a B-tree index on the column "emp.salary". We have an infinite amount of memory available, and the B-tree resides completely in memory, while none of the table pages are in memory yet. Let's suppose a random access to the disk takes 28 ms and a sequential access takes 0.28 ms. Note that nowadays these numbers are much smaller and vary from one disk type to another.

1. Determine the cost (cumulative disk access time) to answer the SQL query if the B-tree is used. Specify a formula that takes as parameter the number of tuples that have a salary of 200.

Index scan: To determine the disk access time x (in ms) of an index scan, we use the formula from [1]. We get the number of accessed pages by multiplying the total number of pages m with the probability that a page contains at least one of the k employees with salary 200. Each of these pages is read via random I/O:

$$x = m \cdot \left(1 - \left(1 - \frac{1}{m}\right)^{k}\right) \cdot 28$$

[1] S. B. Yao. Approximating Block Accesses in Database Organizations. Communications of the ACM, 20(4):260-261, 1977.

2. For what parameter (number of employees with salary 200) is the usage of the B-tree faster compared to the alternative of scanning the entire table? How much faster is the table scan if all 100 Mio employees have a salary of 200?

Table scan: In a table scan, the first page is accessed via random I/O. Thereafter, all following pages can be read sequentially. The disk access time y (in ms) is therefore the same for every number k of employees with salary 200:

$$y = 28 + (1\,000\,000 - 1) \cdot 0.28 \approx 280\,028$$

A table scan is better than an index scan if y ≤ x. This is the case for k ≥ 10 052 (the break-even point lies at k ≈ 10 051.3).


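If the selectivity of the query is greater than 0.01% (i.e., more than 0.01% of all tuples match the predicate), we should perform a table scan. If all the employees had a salary of 200, we compute x for $m = 10^6$ and $k = 100 \cdot 10^6$. Since $(1 - 1/m)^k \approx e^{-100} \approx 0$, we get $x \approx 28 \cdot 10^6$ ms, and the table scan is r times faster, where

$$r = \frac{x}{y} \approx \frac{28 \cdot 10^{6}}{280\,028} \approx 100$$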
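The two cost formulas can be evaluated numerically. A small Python sketch, assuming exactly the exercise's parameters (m = 10^6 pages, 100 tuples per page, 28 ms per random and 0.28 ms per sequential access); the function names are illustrative. It searches for the smallest k at which the table scan is no slower than the index scan; the fractional break-even point is k ≈ 10 051.3, so the first integer value is 10 052.

```python
M = 1_000_000     # pages of emp
RANDOM_MS = 28.0  # one random page access
SEQ_MS = 0.28     # one sequential page access


def index_scan_ms(k: int, m: int = M) -> float:
    """Yao's estimate of the pages touched by k matching tuples, one random I/O each."""
    pages = m * (1.0 - (1.0 - 1.0 / m) ** k)
    return pages * RANDOM_MS


def table_scan_ms(m: int = M) -> float:
    """First page via random I/O, all remaining pages sequentially."""
    return RANDOM_MS + (m - 1) * SEQ_MS


def break_even(m: int = M) -> int:
    """Smallest k for which the table scan is not slower than the index scan.
    The index-scan cost grows monotonically with k, so binary search works."""
    lo, hi = 0, 100 * m
    while lo < hi:
        mid = (lo + hi) // 2
        if index_scan_ms(mid, m) >= table_scan_ms(m):
            hi = mid
        else:
            lo = mid + 1
    return lo


k_star = break_even()
ratio = index_scan_ms(100 * M) / table_scan_ms()  # all 10^8 tuples match
print(k_star)          # 10052
print(f"{ratio:.1f}")  # 100.0
```

Changing `RANDOM_MS` and `SEQ_MS` to numbers for a modern disk or SSD shows how strongly the break-even point depends on the ratio between random and sequential access cost.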

2 Memory Management

Given are the following operators:

• Nested-loop Join

• Grace Hash Join

• Sort Merge Join

• Table Scan

• Index Scan (Access the table with a B-tree. Two buffers are required: one for the tree, another for the pages of the table)

Specify for each of these operators the lowest amount of buffer space (in number of pages) that needs to be allocated to execute the operator. The answer might depend on the size of the input relation(s). Also specify the highest amount of buffer space required for the optimal / fastest execution of each operator. For some operators, the buffer replacement policy might impact performance. If so, specify the strategy you would use.

Operator         | Minimal buffer size                                            | Maximal buffer size                                     | Replacement policy
Nested-Loop Join | 2 pages                                                        | All pages of the inner relation                         | Most-Recently-Used
Grace Hash Join  | Square root of the number of pages of the inner relation       | All pages of the inner relation                         | -
Sort Merge Join  | Sum of the square roots of the numbers of pages of both relations | All pages of both relations                          | -
Table Scan       | 1 page                                                         | Number of pages that can be read sequentially from disk | -
Index Scan       | 2 pages                                                        | Entire B-tree + all pages of the relation               | Least-Frequently-Used
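The table recommends Most-Recently-Used replacement for the nested-loop join. The reason is that the inner relation is scanned repeatedly in the same order: with fewer frames than inner pages, LRU always evicts exactly the page that will be needed next, while MRU keeps a stable set of pages resident. The small Python simulation below illustrates this under simplifying assumptions (one full scan of the inner relation per outer page, page IDs 0 … P−1, hypothetical function name `buffer_hits`).

```python
from collections import OrderedDict


def buffer_hits(inner_pages: int, buffer_pages: int, passes: int, policy: str) -> int:
    """Count buffer hits while the inner relation is scanned `passes` times
    through `buffer_pages` frames using LRU or MRU replacement."""
    frames = OrderedDict()  # page id -> None, ordered least -> most recently used
    hits = 0
    for _ in range(passes):
        for page in range(inner_pages):
            if page in frames:
                hits += 1
                frames.move_to_end(page)
                continue
            if len(frames) >= buffer_pages:
                # LRU evicts the oldest frame, MRU the most recently used one.
                frames.popitem(last=(policy == "MRU"))
            frames[page] = None
    return hits


print(buffer_hits(10, 5, 3, "LRU"))  # 0
print(buffer_hits(10, 5, 3, "MRU"))  # 10
```

With LRU, every access is a miss ("sequential flooding"); with MRU, roughly half of the frames survive from one scan to the next.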

3 Query Processing

Assume we have the following relational schema:

Customer(Cid, Name)
Order(Oid, Customer, Volume)

There are 1000 Customer tuples and 100000 Order tuples. The size of each tuple is 100 bytes. Additionally, assume we have the following query, which asks for the volume of orders of a customer called Wutz:

SELECT sum(o.Volume) FROM Customer c, Order o WHERE c.Cid = o.Customer AND c.Name = 'Wutz';

1. Translate this SQL Query to a relational algebra expression. (Hint: you may use the sum function.)

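$$\mathrm{SUM}\left(\pi_{\mathit{Volume}}\left(\mathit{ORDER} \bowtie_{\mathit{Cid}=\mathit{Customer}} \pi_{\mathit{Cid}}\left(\sigma_{\mathit{Name}=\text{'Wutz'}}(\mathit{CUSTOMER})\right)\right)\right)$$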

2. Explain roughly how you would implement each operator, i.e., use a keyword that determines which algorithm you would use to implement an operator (e.g., 2-Phase External Sort).

for σ: index scan if an index is available; otherwise, a table scan that reads out the tuples matching the condition.
for π: read out the Volume from all the tuples.
for ⋈: Nested Loop Join.
for SUM: add up the Volume values of all incoming tuples.

3. For each operator, give the amount of main memory that you would allocate and explain roughly why. How much memory do you need to process the whole query?

for σ: 100 bytes (stream of tuples)
for π: 100 bytes (stream of tuples)
for ⋈: 200 bytes (two tuples: one for Order and one for Customer)
for SUM: 100 bytes (stream of tuples)
In total: 200 bytes is the minimum that we can allocate to process this query.

Note that the output is not considered in this kind of memory allocation. You can also argue that each operator needs an additional 100 bytes for its output tuple.
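The streaming (pipelined) evaluation that these memory figures assume can be sketched with Python generators: each operator pulls one tuple at a time from its input, so it only ever holds the current tuple. This is an illustrative sketch with hypothetical in-memory sample data, not an implementation from the exercise; the inner input of the nested-loop join is passed as a function so it can be rescanned for every outer tuple.

```python
from typing import Callable, Iterable, Iterator, Tuple

Customer = Tuple[int, str]      # (Cid, Name)
Order = Tuple[int, int, float]  # (Oid, Customer, Volume)


def select_cids(customers: Iterable[Customer], name: str) -> Iterator[int]:
    """sigma_{Name=name} followed by pi_{Cid}; holds one tuple at a time."""
    for cid, cname in customers:
        if cname == name:
            yield cid


def nested_loop_join(outer: Iterable[Order],
                     inner: Callable[[], Iterable[int]]) -> Iterator[Order]:
    """ORDER join_{Cid=Customer} inner: the inner input is rescanned for
    every outer tuple, so only one tuple per input is held in memory."""
    for order in outer:
        for cid in inner():
            if order[1] == cid:
                yield order


def project_volume(orders: Iterable[Order]) -> Iterator[float]:
    for _oid, _customer, volume in orders:
        yield volume


def total(volumes: Iterable[float]) -> float:
    result = 0.0
    for v in volumes:  # SUM keeps only a running total
        result += v
    return result


# Hypothetical sample data, not from the exercise.
customers = [(1, "Wutz"), (2, "Meier")]
orders = [(10, 1, 5.0), (11, 2, 7.0), (12, 1, 2.5)]

answer = total(project_volume(
    nested_loop_join(orders, lambda: select_cids(customers, "Wutz"))))
print(answer)  # 7.5
```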

4 Alternative Query Plans

Assume we have the following Query:

SELECT * FROM R, S, T WHERE R.rid = S.sid AND S.sid = T.tid AND T.tid = R.rid

1. Give 3 different query plans for this query (including the join method).

(a) R ⋈1 (S ⋈2 T), where ⋈1: Nested Loop Join and ⋈2: Grace Hash Join

(b) (R ⋈1 S) ⋈2 T, where ⋈1: Nested Loop Join and ⋈2: Sort Merge Join

(c) (R ⋈1 T) ⋈2 S, where ⋈1: Indexed Nested Loop Join and ⋈2: Grace Hash Join

2. For each plan given in the previous part, specify the size of each table such that the plan would be optimal.

(a) R is small, S and T are big, but their join results in a small intermediate table.

(b) R and S are small tables, T is a big Table.

(c) T is small, and R has an index on the join attribute, so R can be big; S is big.

3. Take one of the plans from the previous part and assume none of the tables fit in main memory, i.e., memory is at most half the size of the smallest table. Under these conditions, how do you allocate buffers? What will your page replacement policy be?

We take plan (a): R ⋈1 (S ⋈2 T), where ⋈1: Nested Loop Join and ⋈2: Grace Hash Join.

First, we process the inner join S ⋈ T (the Grace Hash Join). In the previous part, we assumed that S and T are big tables. We thus allocate buffers according to the solution of the second exercise of this sheet: for the Grace Hash Join, we allocate as many pages as the square root of the number of pages of the inner table, and the replacement policy does not matter. For the Nested Loop Join, we keep 2 pages in memory (one of the outer and one of the inner relation), and the replacement policy is Most-Recently-Used.
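The two phases of the Grace Hash Join referred to above can be sketched in Python as follows. This is a minimal in-memory illustration: Python lists stand in for the partition files that a real system writes to disk, and `num_partitions` stands in for the number of buckets that the available buffer allows (following the memory table of this sheet, roughly the square root of the inner relation's page count). The function and parameter names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

BuildT = TypeVar("BuildT")
ProbeT = TypeVar("ProbeT")


def grace_hash_join(build: Sequence[BuildT], probe: Sequence[ProbeT],
                    build_key: Callable[[BuildT], Hashable],
                    probe_key: Callable[[ProbeT], Hashable],
                    num_partitions: int) -> List[Tuple[BuildT, ProbeT]]:
    # Phase 1: partition both inputs with the same hash function, so that
    # matching tuples always land in partitions with the same index.
    build_parts: List[List[BuildT]] = [[] for _ in range(num_partitions)]
    probe_parts: List[List[ProbeT]] = [[] for _ in range(num_partitions)]
    for t in build:
        build_parts[hash(build_key(t)) % num_partitions].append(t)
    for t in probe:
        probe_parts[hash(probe_key(t)) % num_partitions].append(t)

    # Phase 2: for each partition pair, build an in-memory hash table on the
    # build side and probe it with the tuples of the other side.
    result: List[Tuple[BuildT, ProbeT]] = []
    for b_part, p_part in zip(build_parts, probe_parts):
        table = defaultdict(list)
        for t in b_part:
            table[build_key(t)].append(t)
        for t in p_part:
            for match in table.get(probe_key(t), []):
                result.append((match, t))
    return result


S_rows = [(1, "a"), (2, "b"), (3, "c")]
T_rows = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
pairs = grace_hash_join(S_rows, T_rows, lambda s: s[0], lambda t: t[0], num_partitions=2)
print(sorted(pairs))
```

Only one build partition's hash table has to fit in memory at a time, which is why the buffer requirement grows with the square root of the input size rather than with the input size itself.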


ETH Zurich, Systems Group, Prof. D. Kossmann
FS 2014, Data Modeling and Databases, Exercise Sheet 8

Query Processing II

1 TID Concept

In the TID Concept, a “forward” is created whenever a record is moved to a different page:

(a) Under which circumstances does a record move to another page?

(b) Which basic assumption does the TID Concept make?

(c) What happens if a record moves twice?

(d) How would you implement a table scan in a system that makes use of the TID Concept?

(e) Consider Exercise 1 from Exercise Sheet 7. How does the presence of the TID Concept change the equations and the trade-offs of using an index vs. a table scan for this query?

2 Freespace Management

(a) Explain how a B(+)-tree can be used to organize a table (a so-called index-organized table). Sketch how the records of the following Emp table would be stored if you keep indexes on both columns, name and salary:

create table Emp (
    name   varchar(256) primary key,
    salary int
);

(b) Give an alternative way to store the tuples of the Emp table (independent of the existence of indexes). How would you insert a new Emp record into the database using your scheme?

(c) Compare the approaches of (a) and (b) in terms of storage utilization and time to find free space for a new record.

3 Blobs

Explain how a B(+) tree index can be used to store a Blob (binary large object such as a video).


4 Query Optimization

Enumerate all possible ways to order the joins of the following query:

SELECT * FROM A, B, C WHERE A.a = B.a AND A.a = C.a AND B.a = C.a;

Can you give a formula that estimates the number of join orders for an “n-way” join?

5 Serializability

Consider the following two transactions:

T1         | T2
read(A);   | read(A);
A := A-N;  | A := A+M;
write(A);  | write(A);
read(B);   |
B := B+N;  |
write(B);  |

Find all possible transaction schedules (histories) and state for each possible schedule whether it is serializable.


ETH Zurich, Systems Group, Prof. D. Kossmann
FS 2014, Data Modeling and Databases, Exercise Sheet 8

Query Processing II - Solution

1 TID Concept

In the TID Concept, a “forward” is created whenever a record is moved to a different page:

(a) Under which circumstances does a record move to another page?

• When an update to a variable-length field (e.g., of type VARCHAR) causes a record to grow and there is no more room on the page, the record needs to be moved to a different page.

(b) Which basic assumption does the TID Concept make?

• It is assumed that a “forward” is always smaller than a record.

(c) What happens if a record moves twice?

• The “forward” is simply updated to point to the new location of the record.

(d) How would you implement a table scan in a system that makes use of the TID Concept?

• Following “forwards” would result in random I/O and is not necessary, since all pages will be scanned anyway. Therefore, simply ignoring “forwards” during table scans leads to a fast implementation because it allows sequential I/O.

(e) Consider Exercise 1 from Exercise Sheet 7. How does the presence of the TID Concept change the equations and the trade-offs of using an index vs. a table scan for this query?

• Index scan: With no “forwards”, the equation does not change. For each of the k records that we select, a “forward” would cause an additional random I/O request. If for a total of N records there are F “forwards”, then the probability for each of the k records to have a “forward” is F/N, and thus the total number of additional random I/O requests can be estimated as k · F/N. The modified equation from Exercise Sheet 7 is:

$$x = \left(m \cdot \left(1 - \left(1 - \frac{1}{m}\right)^{k}\right) + k \cdot \frac{F}{N}\right) \cdot 28$$

• Table scan: Since we are ignoring “forwards” (see 1.d), the equation does not change.
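The behaviour described in (c) and (d) can be illustrated with a toy page store in Python. This is a hypothetical sketch, not the layout of any particular system: pages are dictionaries from slot numbers to entries, and a `Forward` entry replaces a record in its home slot once it has moved. A second move only updates the forward, so an index lookup costs at most one extra page access, while a table scan skips forwards and still sees every record exactly once.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Tuple

TID = Tuple[int, int]  # (page number, slot number)


@dataclass
class Forward:
    """Entry left in a record's home slot after the record has moved."""
    target: TID


@dataclass
class TidStore:
    """Toy page store: page number -> {slot number -> record or Forward}."""
    pages: Dict[int, Dict[int, object]] = field(default_factory=dict)

    def _put(self, tid: TID, entry: object) -> None:
        self.pages.setdefault(tid[0], {})[tid[1]] = entry

    def insert(self, tid: TID, record: object) -> None:
        self._put(tid, record)

    def move(self, home: TID, new: TID) -> None:
        """Move the record whose (unchanging) TID is `home` to slot `new`.
        On a second move only the forward in the home slot is updated, so
        there is never a chain of forwards."""
        entry = self.pages[home[0]][home[1]]
        if isinstance(entry, Forward):
            old_page, old_slot = entry.target
            record = self.pages[old_page].pop(old_slot)
        else:
            record = entry
        self._put(new, record)
        self._put(home, Forward(new))

    def lookup(self, tid: TID) -> Tuple[object, int]:
        """Return the record and the number of pages accessed (1 or 2)."""
        entry = self.pages[tid[0]][tid[1]]
        if isinstance(entry, Forward):
            page, slot = entry.target
            return self.pages[page][slot], 2
        return entry, 1

    def table_scan(self) -> Iterator[object]:
        """Read pages in order and skip forwards; every record is still
        found exactly once, at its current location."""
        for page in sorted(self.pages):
            for slot in sorted(self.pages[page]):
                entry = self.pages[page][slot]
                if not isinstance(entry, Forward):
                    yield entry


store = TidStore()
store.insert((1, 0), "alice")
store.insert((1, 1), "bob")
store.move((1, 0), (2, 0))  # first move: a forward is created in (1, 0)
store.move((1, 0), (3, 0))  # second move: the forward is updated
print(store.lookup((1, 0)))      # ('alice', 2)
print(list(store.table_scan()))  # ['bob', 'alice']
```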


2 Freespace Management

(a) Explain how a B(+)-tree can be used to organize a table (a so-called index-organized table). Sketch how the records of the following Emp table would be stored if you keep indexes on both columns, name and salary:

create table Emp (
    name   varchar(256) primary key,
    salary int
);

• In an index-organized table, the entire record (as opposed to a mere record ID (RID)) is stored together with the key in a leaf of a B(+)-tree. The leaves of the index-organized table would contain tuples of the form <name, salary>. The index on salary would be implemented as a secondary index whose leaves contain keys of the index-organized table, i.e., in this case, tuples of the form <salary, name>.

(b) Give an alternative way to store the tuples of the Emp table (independent of the existence of indexes). How would you insert a new Emp record into the database using your scheme?

• An alternative would be a so-called heap file. A heap file is a simple list of pages. Inserts are very efficient, since records are simply added to the last page of the heap.

(c) Compare the approaches of (a) and (b) in terms of storage utilization and time to find free space for a new record.

• Storage utilization:

– Index-organized table: The B(+)-tree guarantees a storage utilization of at least 50% (besides the root page, B(+)-tree pages are always at least half full).

– Heap file: There are no guarantees with respect to storage utilization. In the absence of deletes, storage utilization is close to 100%. However, note that deleted records are typically just marked as deleted, since reorganizing the entire heap would be too expensive. Thus, excessive deleting could lead to poor storage utilization.

• Insert performance:

– Index-organized table: The complexity is O(log N), where N is the total number of keys. To find the right page for the insert, we first need to traverse the tree (O(log N)). Inserting a new record could then cause a split of the page, which could potentially propagate all the way to the root (O(log N)).

– Heap file: The complexity is O(1). With a reference to the last page of the heap file, we need one I/O request to read that page and another one to write it back to disk (if the last page does not have enough space, an additional new page needs to be allocated).
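The heap-file properties used in this comparison (constant-time inserts into the last page, deletes that only mark slots) can be made concrete with a toy Python class. Page capacity is counted in records, and the class and method names are hypothetical; a real heap file would also track free space per page.

```python
from typing import List, Optional, Tuple

RID = Tuple[int, int]  # (page number, slot number)


class HeapFile:
    """Toy heap file: a list of fixed-capacity pages."""

    def __init__(self, page_capacity: int) -> None:
        self.page_capacity = page_capacity
        self.pages: List[List[Optional[object]]] = []

    def insert(self, record: object) -> RID:
        # O(1): only the last page is inspected; a new page is allocated
        # when the last one is full.
        if not self.pages or len(self.pages[-1]) == self.page_capacity:
            self.pages.append([])
        self.pages[-1].append(record)
        return len(self.pages) - 1, len(self.pages[-1]) - 1

    def delete(self, rid: RID) -> None:
        # Only mark the slot as deleted; the heap is not reorganized.
        page, slot = rid
        self.pages[page][slot] = None

    def utilization(self) -> float:
        live = sum(r is not None for page in self.pages for r in page)
        return live / (len(self.pages) * self.page_capacity)


heap = HeapFile(page_capacity=4)
rids = [heap.insert(f"emp{i}") for i in range(8)]
print(len(heap.pages), heap.utilization())  # 2 1.0
for rid in rids[:6]:
    heap.delete(rid)
print(heap.utilization())  # 0.25
```

After the deletes, the heap still occupies two pages although only two records are live, which is the poor utilization the comparison warns about.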

3 Blobs

Explain how a B(+) tree index can be used to store a Blob (binary large object such as a video).

• The Blob can be partitioned into multiple pages. The B(+)-tree is then used to index, for each page, the byte position in the Blob of the first byte stored in that page. In the following example, a Blob is partitioned over 1 KB pages:

[Figure: B(+)-tree whose inner node holds the keys 1024, 2048, … and points to leaf pages holding bytes 0-1023, bytes 1024-2047, bytes 2048-3071, …]
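Locating a byte offset in such a partitioned Blob amounts to finding the last page whose start offset is not larger than the requested offset. The Python sketch below uses `bisect` over a sorted list of start offsets as a stand-in for the B(+)-tree descent; the function names and the 1 KB page size mirror the example but are otherwise illustrative assumptions.

```python
import bisect
from typing import List, Tuple


def split_blob(blob: bytes, page_size: int = 1024) -> Tuple[List[int], List[bytes]]:
    """Cut the Blob into pages; starts[i] is the offset of the first byte
    of page i, i.e., the key the B(+)-tree would index."""
    pages = [blob[i:i + page_size] for i in range(0, len(blob), page_size)]
    starts = [i * page_size for i in range(len(pages))]
    return starts, pages


def read_range(starts: List[int], pages: List[bytes], offset: int, length: int) -> bytes:
    """Return blob[offset:offset + length]: find the first page by binary
    search over the start offsets, then read consecutive pages."""
    i = bisect.bisect_right(starts, offset) - 1
    out = bytearray()
    pos, end = offset, offset + length
    while pos < end and 0 <= i < len(pages):
        chunk = pages[i][pos - starts[i]:end - starts[i]]
        if not chunk:
            break
        out += chunk
        pos += len(chunk)
        i += 1
    return bytes(out)


blob = bytes(range(256)) * 12  # 3072 bytes of sample data
starts, pages = split_blob(blob)
print(starts)                                                 # [0, 1024, 2048]
print(read_range(starts, pages, 1000, 100) == blob[1000:1100])  # True
```

Because the keys are byte positions, reading a byte range (e.g., seeking in a video) touches only the pages that overlap the range.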


4 Query Optimization

Enumerate all possible ways to order the joins of the following query:

SELECT * FROM A, B, C WHERE A.a = B.a AND A.a = C.a AND B.a = C.a;

1. (A ⋈ B) ⋈ C

2. (A ⋈ C) ⋈ B

3. (B ⋈ A) ⋈ C

4. (C ⋈ A) ⋈ B

5. (B ⋈ C) ⋈ A

6. (C ⋈ B) ⋈ A

7. A ⋈ (B ⋈ C)

8. A ⋈ (C ⋈ B)

9. B ⋈ (A ⋈ C)

10. C ⋈ (A ⋈ B)

11. B ⋈ (C ⋈ A)

12. C ⋈ (B ⋈ A)

Can you give a formula that estimates the number of join orders for an “n-way” join?

• There are (n + 1)! possible ways to order the relations in an “n-way” join (a join of n + 1 relations). This can be seen as follows: the first relation can be at any of the n + 1 positions in the sequence of all relations of the join, there are then n remaining positions for the second relation, and so on, i.e., (n + 1) · n · (n − 1) · … · 1 = (n + 1)!.

• The Catalan number

$$C_n = \frac{1}{n+1}\binom{2n}{n} = \frac{1}{n+1} \cdot \frac{(2n)!}{n!\,(2n-n)!} = \frac{(2n)!}{(n+1)!\,n!}$$

determines the number of different ways n + 1 factors can be completely parenthesized.

• Hence, the total number of join orders is the number of permutations of the relations times the number of ways to parenthesize a permutation:

$$N_{\#\mathrm{relations}-1} = N_n = (n+1)! \cdot \frac{(2n)!}{(n+1)!\,n!} = \frac{(2n)!}{n!}$$

• We already computed by hand, at the beginning of this exercise, that a 2-way join can be ordered in $N_2 = 4!/2! = 12$ different ways. For a 3-way join there exist $N_3 = 6!/3! = 120$ alternatives, and for a 4-way join $N_4 = 8!/4! = 1680$ alternatives.
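The formula $N_n = (2n)!/n!$ can be cross-checked by brute force: enumerate every permutation of the relations and every complete parenthesization of each permutation, and count the distinct join trees. The following Python sketch does exactly that for small n (the function names are illustrative; like the formula, it counts all trees, including those that contain cross products).

```python
from itertools import permutations
from math import factorial
from typing import List, Union

Tree = Union[str, tuple]


def trees_for_sequence(rels: List[str]) -> List[Tree]:
    """All complete parenthesizations of a fixed left-to-right sequence."""
    if len(rels) == 1:
        return [rels[0]]
    result: List[Tree] = []
    for split in range(1, len(rels)):
        for left in trees_for_sequence(rels[:split]):
            for right in trees_for_sequence(rels[split:]):
                result.append((left, right))
    return result


def count_join_orders(num_relations: int) -> int:
    rels = [chr(ord("A") + i) for i in range(num_relations)]
    orders = set()
    for perm in permutations(rels):                     # (n + 1)! orderings
        orders.update(trees_for_sequence(list(perm)))   # C_n trees each
    return len(orders)


def formula(n: int) -> int:
    """N_n = (2n)! / n! for an n-way join, i.e., n + 1 relations."""
    return factorial(2 * n) // factorial(n)


for n in range(1, 5):
    print(n, count_join_orders(n + 1), formula(n))
```

For n = 2 the enumeration yields exactly the 12 orders listed above; the counts then grow to 120 and 1680, which is why optimizers prune the search space instead of enumerating it.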

5 Serializability

Consider the following two transactions:

T1         | T2
read(A);   | read(A);
A := A-N;  | A := A+M;
write(A);  | write(A);
read(B);   |
B := B+N;  |
write(B);  |

Find all possible transaction schedules (histories) and state for each possible schedule whether it is serializable.


• In the following, we omit the operations that only change the local state (e.g., assignments to local variables). Furthermore, ri(A) denotes a read operation on A by transaction Ti, and wj(B) denotes a write operation on B by transaction Tj.

• The trivial histories H1 and H2 are the ones where the transactions T1 and T2 are executed one after the other. Histories H8, …, H15 are not serializable because of cycles in the serializability graph. In H8, for instance, r1(A), w2(A) indicates that T1 → T2; however, the subsequent operations w2(A), w1(A) indicate that T2 → T1.

History | Operations                               | Serialization
H1      | r2(A), w2(A), r1(A), w1(A), r1(B), w1(B) | T2, T1
H2      | r1(A), w1(A), r1(B), w1(B), r2(A), w2(A) | T1, T2
H3      | r1(A), w1(A), r2(A), w2(A), r1(B), w1(B) | T1, T2
H4      | r1(A), w1(A), r2(A), r1(B), w2(A), w1(B) | T1, T2
H5      | r1(A), w1(A), r2(A), r1(B), w1(B), w2(A) | T1, T2
H6      | r1(A), w1(A), r1(B), r2(A), w2(A), w1(B) | T1, T2
H7      | r1(A), w1(A), r1(B), r2(A), w1(B), w2(A) | T1, T2
H8      | r2(A), r1(A), w2(A), w1(A), r1(B), w1(B) | not serializable
H9      | r2(A), r1(A), w1(A), w2(A), r1(B), w1(B) | not serializable
H10     | r2(A), r1(A), w1(A), r1(B), w2(A), w1(B) | not serializable
H11     | r2(A), r1(A), w1(A), r1(B), w1(B), w2(A) | not serializable
H12     | r1(A), r2(A), w2(A), w1(A), r1(B), w1(B) | not serializable
H13     | r1(A), r2(A), w1(A), w2(A), r1(B), w1(B) | not serializable
H14     | r1(A), r2(A), w1(A), r1(B), w2(A), w1(B) | not serializable
H15     | r1(A), r2(A), w1(A), r1(B), w1(B), w2(A) | not serializable
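The table can be reproduced mechanically: generate all interleavings of T1's four and T2's two read/write operations (there are $\binom{6}{2} = 15$), build the conflict (serializability) graph of each, and test it for cycles. A minimal Python sketch, with local assignments omitted as in the solution; the helper names are illustrative.

```python
from itertools import combinations
from typing import Iterator, List, Set, Tuple

Op = Tuple[str, int, str]  # (kind "r"/"w", transaction, object)

T1: List[Op] = [("r", 1, "A"), ("w", 1, "A"), ("r", 1, "B"), ("w", 1, "B")]
T2: List[Op] = [("r", 2, "A"), ("w", 2, "A")]


def interleavings(a: List[Op], b: List[Op]) -> Iterator[List[Op]]:
    """All merges of a and b that keep each transaction's own order."""
    n = len(a) + len(b)
    for b_positions in combinations(range(n), len(b)):
        ia, ib = iter(a), iter(b)
        yield [next(ib) if i in b_positions else next(ia) for i in range(n)]


def precedence_edges(history: List[Op]) -> Set[Tuple[int, int]]:
    """Ti -> Tj if an operation of Ti precedes a conflicting one of Tj
    (same object, different transactions, at least one write)."""
    edges = set()
    for i, (k1, t1, x1) in enumerate(history):
        for k2, t2, x2 in history[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return edges


def conflict_serializable(history: List[Op]) -> bool:
    """Acyclicity test: repeatedly remove nodes without incoming edges."""
    edges = precedence_edges(history)
    nodes = {t for _, t, _ in history}
    while nodes:
        sources = {t for t in nodes
                   if not any(v == t and u in nodes for u, v in edges)}
        if not sources:
            return False  # every remaining node has an incoming edge: cycle
        nodes -= sources
    return True


histories = list(interleavings(T1, T2))
print(len(histories), sum(conflict_serializable(h) for h in histories))  # 15 7
```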


ETH Zurich, Systems Group, Prof. D. Kossmann
FS 2014, Data Modeling and Databases, Exercise Sheet 9

Transactions

1 2PL and Snapshot Isolation

Let T1, T2, T3, T4 be transactions that operate on objects A, B, C, D, E. Now consider the following histories:

H1 r1(A), r2(B), r3(B), r3(C), w2(A), r2(D), r1(A), w1(B), w2(D), r1(A), w2(C), w2(B), r2(B), w3(B), r2(B), c2, w4(C), r4(C), w4(A), c1, c4, c3

H2 r1(A), r2(C), w3(D), w1(A), r1(D), w2(A), r2(B), r2(C), w2(B), w3(C), r2(A), w1(B), r1(B), r3(D), w1(B), c1, r3(B), c2, c3

H3 r1(E), r2(B), r2(A), w2(B), w2(A), w1(B), r2(D), r2(E), r3(E), r2(A), r2(C), w2(A), w2(D), r1(A), w2(C), w1(A), r1(C), r2(E), r3(D), r1(A), w3(D), w1(A), r3(A), w1(C), r3(A), w1(B), r3(C), r3(B), r3(C), w3(A), c1, c2, c3

H4 r3(A), r2(C), r1(B), w1(A), r1(C), r2(A), a1, w2(C), c2, r3(C), c3

For each of the above histories, answer the following questions:

1. Is the history serializable?

2. If yes, what is the equivalent serial history?

3. Is it possible to execute the given history using 2PL? How does 2PL behave?

4. How does snapshot isolation behave? If the history is accepted, are there any inconsistencies?

2 Serializability reloaded

Is the following history serializable? If yes, what is the equivalent serial history? If it is not serializable, indicate why.

1: T1 select sum(balance) from account;
2: T2 insert into account(no, balance) values (4711, 0);
3: T2 commit
4: T1 select avg(balance) from account;
5: T1 commit


3 ACID I

Why is it reasonable to check the integrity constraints at the end of a transaction? Which of the ACID properties is required for this?

4 ACID II

The following is a true story. On May 30th 2009, D.A.K. flew from Zurich to San Francisco via Munich. At the Munich airport, there were electronic barriers equipped with bar-code readers. At the gate, each passenger would insert his/her boarding pass into the bar-code reader, the barrier would open, the passenger could pass, and the barrier would close again. Unfortunately, just when D.A.K. inserted his boarding pass, the electronic barrier broke down. Hence, he tried again at a different barrier, only to find out that he was rejected because the same boarding pass had allegedly already been used. Nonetheless, the staff let D.A.K. enter the airplane.

Later on, when D.A.K. was already seated in the airplane, it was announced that the take-off was delayed by 30 minutes; some passenger's luggage had to be removed from the airplane because he or she apparently had not boarded the airplane. It was only in San Francisco that D.A.K. realized that it was actually his luggage that had been removed from the airplane.

This only happened because an important principle of transaction management was violated. Which one was it?

5 2PL vs. Snapshot Isolation

Give an example of a history that is accepted by Snapshot Isolation but rejected by 2PL. Is your history serializable? Does Snapshot Isolation lead to inconsistencies?


ETH Zurich, Systems Group, Prof. D. Kossmann
FS 2014, Data Modeling and Databases, Exercise Sheet 9

Transactions - Solution

1 2PL and Snapshot Isolation

Let T1, T2, T3, T4 be transactions that operate on objects A, B, C, D, E. Now consider the following histories:

H1 r1(A), r2(B), r3(B), r3(C), w2(A), r2(D), r1(A), w1(B), w2(D), r1(A), w2(C), w2(B), r2(B), w3(B), r2(B), c2, w4(C), r4(C), w4(A), c1, c4, c3

H2 r1(A), r2(C), w3(D), w1(A), r1(D), w2(A), r2(B), r2(C), w2(B), w3(C), r2(A), w1(B), r1(B), r3(D), w1(B), c1, r3(B), c2, c3

H3 r1(E), r2(B), r2(A), w2(B), w2(A), w1(B), r2(D), r2(E), r3(E), r2(A), r2(C), w2(A), w2(D), r1(A), w2(C), w1(A), r1(C), r2(E), r3(D), r1(A), w3(D), w1(A), r3(A), w1(C), r3(A), w1(B), r3(C), r3(B), r3(C), w3(A), c1, c2, c3

H4 r3(A), r2(C), r1(B), w1(A), r1(C), r2(A), a1, w2(C), c2, r3(C), c3

For each of the above histories, answer the following questions:

1. Is the history serializable?

2. If yes, what is the equivalent serial history?

3. Is it possible to execute the given history using 2PL? How does 2PL behave?

4. How does snapshot isolation behave? If the history is accepted, are there any inconsistencies?

Solution

H1

1. The history is not serializable. The subsequence r1(A), w2(A) indicates T1 → T2; however, r2(B), w1(B) indicates T2 → T1. Hence, there is a cycle in the dependency graph.

3. • 2PL: Non-serializable histories cannot be generated by 2PL.

• SI: The history could not have been generated by Snapshot Isolation; both T1 and T2 write to B, and T2 commits after the beginning of transaction T1 but before the commit of T1.


H2

1. The history is not serializable. The subsequence w1(A), …, w2(A) indicates T1 → T2, whereas the subsequence r2(B), …, w1(B) indicates T2 → T1. Hence, there is a cycle in the dependency graph.

3. • 2PL: Non-serializable histories cannot be generated by 2PL.

• SI: The history could not have been generated by Snapshot Isolation; T2 needs to be aborted, since both T1 and T2 write A and B, and T1 commits after the beginning but before the commit of transaction T2.

H3

1. The history is serializable. The corresponding dependency graph is:

[Figure: dependency graph over T1, T2 and T3 (acyclic)]

2. Hence, an equivalent serial history is: T2 → T1 → T3.

3. • SI: The history could not have been generated by Snapshot Isolation: both T1 and T2 write A and B, and T1 committed after the beginning of transaction T2 but before the commit of T2.

• 2PL: The history could have been generated by 2PL. In the following, LS(A) and LX(A) denote the attempt to acquire a shared or an exclusive lock on data object A, respectively; US(A) and UX(A) denote the release of the shared and the exclusive lock:


T1 T2 T3BOTLS(E), r1(E) BOT

LX(A), LX(B), LX(C), LX(D), LS(E)r2(B), r2(A), w2(B), w2(A), UX(B)

LX(B), w1(B)r2(D), r2(E) BOT

LS(E), r3(E)r2(A), r2(C), w2(A), w2(D)UX(A), UX(D)

LX(A), r1(A)w2(C), UX(C)

LX(C), w1(A), r1(C)US(E)

r2(E), US(E)LX(D), r3(D)

r1(A)w3(D)

w1(A), UX(A)LX(A), r3(A)

w1(C), UX(C)r3(A)

w1(B), UX(B)LS(B), LS(C)r3(C), r3(B), r3(C), w3(A)UX(A), UX(D), US(B), US(C), US(E)

CommitCommit

Commit
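Two small checks for H3, sketched in Python with illustrative function names. First, an equivalent serial order is a topological order of the acyclic dependency graph; because the figure did not survive, the edges T2 → T1, T1 → T3 and T2 → T3 used below are our assumption, read off from where one transaction releases a lock (on B, A and D respectively) that another transaction later acquires. Second, a transaction obeys 2PL if it acquires no lock after its first unlock; this is checked on T2's lock actions as we read them from the table above.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def serial_order(edges):
    """Topologically sort a dependency graph given as (before, after) pairs.

    Raises graphlib.CycleError if the graph has a cycle (not serializable)."""
    preds = {}
    for before, after in edges:
        preds.setdefault(after, set()).add(before)
        preds.setdefault(before, set())
    return list(TopologicalSorter(preds).static_order())


def is_two_phase(lock_actions):
    """True if no lock (L...) is requested after the first unlock (U...)."""
    shrinking = False
    for action in lock_actions:
        if action.startswith("U"):
            shrinking = True        # the shrinking phase has started
        elif action.startswith("L") and shrinking:
            return False            # a lock acquired while shrinking violates 2PL
    return True


# Assumed dependency edges for H3 (the original figure is not available).
h3_edges = [("T2", "T1"), ("T1", "T3"), ("T2", "T3")]
print(" -> ".join(serial_order(h3_edges)))  # T2 -> T1 -> T3

# Lock actions of T2, in the order listed in the table above.
t2_locks = ["LX(A)", "LX(B)", "LX(C)", "LX(D)", "LS(E)",
            "UX(B)", "UX(A)", "UX(D)", "UX(C)"]
print(is_two_phase(t2_locks))                     # True
print(is_two_phase(["LX(A)", "UX(A)", "LX(B)"]))  # False
```

Because the assumed graph forces a unique order, static_order() returns T2, T1, T3, which matches the serial history given above.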

H4

1. The history is not serializable: the subsequence r3(A), w1(A), r2(A) indicates the dependency T3 → T1 → T2, and the subsequence r2(C), r1(C), w2(C), c2, r3(C) indicates the dependency T1 → T2 → T3. The resulting dependency graph thus already contains a cycle; however, it is not yet complete. An abort of a transaction is similar to a write on all objects that have been written by the aborted transaction. Hence, since T1 was aborted, the history H4 can be rewritten by replacing a1 by w1(A):

H4': r3(A), r2(C), r1(B), w1(A), r1(C), r2(A), w1(A), w2(C), c2, r3(C), c3

Now, the subsequence r2(A), w1(A) indicates an additional dependency T2 → T1, resulting in the following dependency graph:

[Figure: dependency graph with nodes T1, T2, T3]

3. • 2PL: The history is not serializable and thus could not have been generated by 2PL.

• SI: The history can be generated by SI. T1 is aborted, so it can be neglected. T3 is read-only; therefore, T2 and T3 do not conflict.
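The abort rewrite used for H4 (treating a1 as a write on every object T1 wrote before aborting) and its effect on the dependency graph can be reproduced with the Python sketch below. H4 itself is reconstructed by reversing the stated rewrite, i.e. putting a1 back where the second w1(A) appears in H4'. The function names are our own.

```python
from itertools import combinations


def rewrite_aborts(history):
    """Replace each abort a<i> by writes w<i>(X) on every object X that
    transaction i wrote before aborting."""
    written, result = {}, []
    for op in history:
        kind, txn = op[0], op[1:].split("(")[0]
        if kind == "w":
            item = op.split("(")[1].rstrip(")")
            items = written.setdefault(txn, [])
            if item not in items:
                items.append(item)
        if kind == "a":
            result.extend(f"w{txn}({item})" for item in written.get(txn, []))
        else:
            result.append(op)
    return result


def conflict_edges(history):
    """Edges Ti -> Tj between conflicting reads/writes (commits/aborts ignored)."""
    ops = []
    for op in history:
        if op[0] in "rw":
            txn, item = op[1:].split("(")
            ops.append((op[0], txn, item.rstrip(")")))
    return {
        (f"T{t1}", f"T{t2}")
        for (k1, t1, x1), (k2, t2, x2) in combinations(ops, 2)
        if x1 == x2 and t1 != t2 and "w" in (k1, k2)
    }


h4 = ["r3(A)", "r2(C)", "r1(B)", "w1(A)", "r1(C)", "r2(A)", "a1",
      "w2(C)", "c2", "r3(C)", "c3"]
h4_prime = rewrite_aborts(h4)
print(", ".join(h4_prime))  # same sequence as H4' above
print(sorted(conflict_edges(h4)))
# [('T1', 'T2'), ('T2', 'T3'), ('T3', 'T1')]
print(sorted(conflict_edges(h4_prime)))
# [('T1', 'T2'), ('T2', 'T1'), ('T2', 'T3'), ('T3', 'T1')]
```

The only edge gained by the rewrite is T2 → T1, coming from r2(A) followed by the abort-induced w1(A).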


2 Serializability reloaded

Is the following history serializable? If yes, what is the equivalent serial history? If it is not serializable, indicate why.

1: T1 select sum(balance) from account;
2: T2 insert into account(no, balance) values (4711, 0);
3: T2 commit
4: T1 select avg(balance) from account;
5: T1 commit

Solution

Two histories are equivalent if

• all read-operations of committed transactions return the same result, and

• at the end, the state of the database is the same.

The above history is equivalent to a serial history in which T2 is executed before T1: the database state is the same for both histories, and the read operations (select sum and select avg) return the same results (the newly inserted balance is 0, so it does not change the sum). Although the above history and the serial history T2, T1 do not order the conflicting operations in the same way (the sum over the accounts precedes the insert in the above history but follows it in the serial one), they are equivalent. This shows that the condition in the lecture's lemma on history equivalence is sufficient but not necessary.
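One way to convince oneself of this is to replay the histories on a toy account table. In the sketch below the table is reduced to a list of balances, and the starting balances are made up; only the inserted balance of 0 comes from the exercise. A third run with T1 entirely before T2 shows that the order does matter for avg.

```python
def run(initial, order):
    """Replay the statements in the given order; return (sum, avg, final balances)."""
    balances = list(initial)
    results = {}
    for step in order:
        if step == "sum":       # T1: select sum(balance) from account
            results["sum"] = sum(balances)
        elif step == "avg":     # T1: select avg(balance) from account
            results["avg"] = sum(balances) / len(balances)
        elif step == "insert":  # T2: insert ... values (4711, 0); commit
            balances.append(0)
    return results["sum"], results["avg"], balances


initial = [100, 250, 50]  # hypothetical balances
interleaved = run(initial, ["sum", "insert", "avg"])
serial_t2_t1 = run(initial, ["insert", "sum", "avg"])
serial_t1_t2 = run(initial, ["sum", "avg", "insert"])

print(interleaved == serial_t2_t1)  # True
print(interleaved == serial_t1_t2)  # False: avg differs
```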

3 ACID I

Why is it reasonable to check the integrity constraints at the end of a transaction? Which of the ACID properties is required for this?

Solution

The integrity constraints are only checked at the end of a transaction because, during the lifetime of a transaction, the database could be in an inconsistent state. A transaction should transform a consistent database state into another consistent database state (the answer to the second question is therefore “consistency”).
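Checking constraints only at commit can be sketched as a transaction object that buffers its writes and evaluates the constraint on the would-be final state. The Transaction class, the accounts A and B and the constraint "A + B == 800" are all hypothetical. After the first write of the transfer the constraint is violated, yet the transaction still commits because its final state is consistent; a transaction whose final state is inconsistent is rejected.

```python
class Transaction:
    """Buffers writes and checks an integrity constraint only at commit."""

    def __init__(self, db, constraint):
        self.db = db                  # shared database state (a dict)
        self.constraint = constraint  # predicate over a database state
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.db[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        new_state = {**self.db, **self.writes}
        if not self.constraint(new_state):
            raise ValueError("integrity constraint violated; transaction aborted")
        self.db.update(self.writes)


def total_is_800(state):
    """Hypothetical constraint: money is neither created nor destroyed."""
    return state["A"] + state["B"] == 800


db = {"A": 500, "B": 300}

t1 = Transaction(db, total_is_800)
t1.write("A", t1.read("A") - 100)  # intermediate state: A + B == 700
t1.write("B", t1.read("B") + 100)  # consistent again: A + B == 800
t1.commit()
print(db)  # {'A': 400, 'B': 400}

t2 = Transaction(db, total_is_800)
t2.write("A", t2.read("A") - 100)  # no matching credit to B
try:
    t2.commit()
except ValueError as err:
    print(err)
print(db)  # still {'A': 400, 'B': 400}
```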

4 ACID II

The following is a true story. On May 30th 2009, D.A.K. flew from Zurich to San Francisco via Munich. At the Munich airport, there were electronic barriers equipped with bar-code readers. At the gate, each passenger would insert his/her boarding pass into the bar-code reader, the barrier would open, the passenger could pass and the barrier would close again. Unfortunately, just when D.A.K. inserted his boarding pass, the electronic barrier broke down. Hence, he tried again at a different barrier, only to find out that he was rejected because the same boarding pass had allegedly already been used. Nonetheless, the staff let D.A.K. enter the airplane.

Later on, when D.A.K. was already seated in the airplane, it was announced that the take-off was delayed for 30 minutes; some passenger's luggage had to be removed from the airplane, because he or she apparently had not boarded the airplane. It was only in San Francisco that D.A.K. realized that it was actually his luggage that had been removed from the airplane.

This only happened because an important principle of transaction management was violated. Which one was it?


Solution

Atomicity: the last part of the transaction (passing the barrier) was not executed, but the transaction was nevertheless committed to the database.
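Read as a transaction, boarding consists of two steps, marking the boarding pass as used and letting the passenger through the barrier, and atomicity requires that either both take effect or neither does. The sketch below models this with a hypothetical board() function, a set of used boarding passes and a made-up pass identifier; if the barrier fails, the first step is undone, so the retry at a different barrier succeeds.

```python
class BarrierError(Exception):
    """Raised when the barrier hardware fails."""


def board(pass_id, used_passes, open_barrier):
    """Check in a boarding pass atomically: both steps happen, or neither does."""
    if pass_id in used_passes:
        raise ValueError("boarding pass already used")
    used_passes.add(pass_id)          # step 1: record the pass as used
    try:
        open_barrier()                # step 2: let the passenger through
    except BarrierError:
        used_passes.discard(pass_id)  # roll back step 1
        raise


def broken_barrier():
    raise BarrierError("barrier broke down")


def working_barrier():
    pass


used = set()
try:
    board("DAK-0530", used, broken_barrier)
except BarrierError:
    pass
print(used)  # set(): the failed attempt left no trace
board("DAK-0530", used, working_barrier)
print(used)  # {'DAK-0530'}
```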

5 2PL vs. Snapshot Isolation

Give an example of a history that is accepted by Snapshot Isolation but rejected by 2PL. Is your history serializable? Does Snapshot Isolation lead to inconsistencies?

Solution

History H4 from exercise 1 is an example; Snapshot Isolation cannot generate inconsistencies in the given history because, after reading C (the only object that could generate inconsistencies, because it is concurrently written by T2), T3 makes no changes to the database. Another example:

r2(A), r1(B), w1(A), w2(B)

This history could not have been generated by 2PL, because it leads to a deadlock: T1 waits on A (shared-locked by T2) and T2 waits on B (shared-locked by T1). However, the history could have been generated by Snapshot Isolation, since the two transactions write to different objects. Inconsistencies could arise if, for example, in transaction T2 the value written to B depends on the value read for A (A was changed in the meantime by T1).
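The inconsistency can be made concrete with a small simulation. Assume, hypothetically, that T1 sets A := B + 1 and T2 sets B := A + 1, both starting from A = B = 0. Under Snapshot Isolation each transaction reads from its own snapshot, and since the write sets {A} and {B} are disjoint, both commit; the resulting state matches neither serial execution.

```python
def run_si(initial):
    """Replay r2(A), r1(B), w1(A), w2(B) under Snapshot Isolation."""
    snapshot_t1 = dict(initial)    # both transactions start before either writes
    snapshot_t2 = dict(initial)
    a_seen_by_t2 = snapshot_t2["A"]  # r2(A)
    b_seen_by_t1 = snapshot_t1["B"]  # r1(B)
    db = dict(initial)
    db["A"] = b_seen_by_t1 + 1       # w1(A): hypothetical rule A := B + 1
    db["B"] = a_seen_by_t2 + 1       # w2(B): hypothetical rule B := A + 1
    # Disjoint write sets, so first-committer-wins lets both transactions commit.
    return db


def run_serial(initial, order):
    """Run T1 and T2 one after the other with the same hypothetical rules."""
    db = dict(initial)
    for txn in order:
        if txn == "T1":
            db["A"] = db["B"] + 1
        else:
            db["B"] = db["A"] + 1
    return db


start = {"A": 0, "B": 0}
print(run_si(start))                    # {'A': 1, 'B': 1}
print(run_serial(start, ["T1", "T2"]))  # {'A': 1, 'B': 2}
print(run_serial(start, ["T2", "T1"]))  # {'A': 2, 'B': 1}
```

This anomaly is commonly called write skew.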
