142
CSC 261/461 Database Systems Lecture 14 Spring 2017 MW 3: 25 pm – 4: 40 pm January 18 – May 3 Dewey 1101

CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Embed Size (px)

Citation preview

Page 1: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

CSC 261/461 – Database SystemsLecture 14

Spring 2017MW 3:25 pm – 4:40 pm

January 18 – May 3Dewey 1101

Page 2: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Announcement

• Midterm on this Wednesday

• Please arrive 5 minutes early; We will start at 3:25 sharp

• I may post a sitting arrangement tomorrow.

• Start studying for MT

CSC261,Spring2017,UR

Page 3: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Are you ready?

CSC261,Spring2017,UR

Page 4: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Night before the exam

CSC261,Spring2017,UR

Page 5: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

During the exam

CSC261,Spring2017,UR

Page 6: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Or Finishing too early

CSC261,Spring2017,UR

Page 7: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

MIDTERM REVIEW

CSC261,Spring2017,UR

Page 8: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Chapters to Read

• Chapter1• Chapter2• Chapter3• Chapter4(Just thebasics.OnlyISArelationships. Evenstudying theslides isfine)• Chapter5,6,7(SQL)• Chapter8• Chapter9• Chapter14(14.1-14.5)• Chapter15(15.1-15.3)

8

Page 9: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Exam Structure

• Problem 1 (20 pts)– Short answers, True/False, or One liner

• Other problems (40 pts)

• Total: 60 pts

• 2 bonus points for writing the class id correctly.

CSC261,Spring2017,UR

Page 10: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Time distribution

• Time crunch– Not really– Should take more than 60 minutes to finish.– Not as relaxed as the quiz

CSC261,Spring2017,UR

Page 11: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

MATERIALS COVERED

CSC261,Spring2017,UR

Page 12: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Questions to ponder

• Why not Lists? Why database?• How related tables avoid problems associated with lists?

CSC261,Spring2017,UR

Page 13: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Problems with Lists

• Multiple Concepts or Themes:– Microsoft Excel vs Microsoft Access

• Redundancy• Anomalies:

– Deletion anomalies– Update anomalies– Insertion anomalies

CSC261,Spring2017,UR

Page 14: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

List vs Database

• Lists do not provide information about relations!

• Break lists into tables

• Facilitates:– Insert– Delete – Update

• Input and Output interface (Forms and Reports)

• Query!

CSC261,Spring2017,UR

Page 15: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Again, Why database?

• To store data• To provide structure• Mechanism for querying, creating, modifying and deleting

data. • CRED (Create, Read, Update, Delete)• Store information and relationships

CSC261,Spring2017,UR

Page 16: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Simplified Database System Environment

CSC261,Spring2017,UR

Page 17: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

SQL

CSC261,Spring2017,UR

Page 18: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

General form SQL

CSC261,Spring2017,UR

SELECT SFROM R1,…,RnWHERE C1GROUP BY a1,…,akHAVING C2

Evaluationsteps:1. EvaluateFROM-WHERE:applycondition

C1 ontheattributesinR1,…,Rn

2. GROUPBYtheattributesa1,…,ak3. ApplyconditionC2 toeachgroup (may

haveaggregates)4. ComputeaggregatesinSandreturnthe

result

Page 19: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

19

Grouping and Aggregation

1.ComputetheFROM andWHERE clauses

2.GroupbytheattributesintheGROUPBY

3.ComputetheSELECT clause:groupedattributesandaggregates

Semanticsofthequery:

Page 20: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

20

1. Compute the FROM and WHERE clauses

Product Date Price QuantityBagel 10/21 1 20Bagel 10/25 1.50 20

Banana 10/3 0.5 10Banana 10/10 1 10

SELECT product, SUM(price*quantity) AS TotalSalesFROM PurchaseWHERE date > ‘10/1/2005’GROUP BY product

FROM

Page 21: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Product Date Price QuantityBagel 10/21 1 20Bagel 10/25 1.50 20

Banana 10/3 0.5 10Banana 10/10 1 10

21

2. Group by the attributes in the GROUP BY

SELECT product, SUM(price*quantity) AS TotalSalesFROM PurchaseWHERE date > ‘10/1/2005’GROUP BY product

GROUP BY Product Date Price Quantity

Bagel10/21 1 2010/25 1.50 20

Banana10/3 0.5 10

10/10 1 10

Page 22: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

22

3. Compute the SELECT clause: grouped attributes and aggregates

SELECT product, SUM(price*quantity) AS TotalSalesFROM PurchaseWHERE date > ‘10/1/2005’GROUP BY product

Product TotalSales

Bagel 50

Banana 15

SELECTProduct Date Price Quantity

Bagel10/21 1 2010/25 1.50 20

Banana10/3 0.5 10

10/10 1 10

Page 23: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

23

HAVING Clause

Samequeryasbefore,exceptthatweconsideronlyproductsthathavemorethan100buyers

HAVINGclausescontainsconditions onaggregates

SELECT product, SUM(price*quantity)FROM PurchaseWHERE date > ‘10/1/2005’GROUP BY productHAVING SUM(quantity) > 100

WhereasWHEREclausesconditiononindividual tuples…

Page 24: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

24

Null Values

Unexpected behavior:

SELECT *FROM PersonWHERE age < 25 OR age >= 25

SomePersonsarenotincluded !

Page 25: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

25

Null Values

Can test for NULL explicitly:– x IS NULL– x IS NOT NULL

SELECT *FROM PersonWHERE age < 25 OR age >= 25

OR age IS NULL

NowitincludesallPersons!

Page 26: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

26

Inner Joins

By default, joins in SQL are “inner joins”:

SELECT Product.name, Purchase.storeFROM Product JOIN Purchase ON Product.name = Purchase.prodName

SELECT Product.name, Purchase.storeFROM Product, PurchaseWHERE Product.name = Purchase.prodName

Product(name, category)Purchase(prodName, store)

Bothequivalent:BothINNERJOINS!

Page 27: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

27

Inner Joins + NULLS = Lost data?

By default, joins in SQL are “inner joins”:

However:Productsthatneversold(withnoPurchasetuple)willbelost!

SELECT Product.name, Purchase.storeFROM Product JOIN Purchase ON Product.name = Purchase.prodName

SELECT Product.name, Purchase.storeFROM Product, PurchaseWHERE Product.name = Purchase.prodName

Product(name, category)Purchase(prodName, store)

Page 28: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

28

name category

Gizmo gadget

Camera Photo

OneClick Photo

prodName store

Gizmo Wiz

Camera Ritz

Camera Wiz

name store

Gizmo Wiz

Camera Ritz

Camera Wiz

Product PurchaseINNER JOIN:

SELECT Product.name, Purchase.storeFROM Product INNER JOIN Purchase

ON Product.name = Purchase.prodName

Note:anotherequivalentwaytowriteanINNERJOIN!

Page 29: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

29

name category

Gizmo gadget

Camera Photo

OneClick Photo

prodName store

Gizmo Wiz

Camera Ritz

Camera Wiz

name store

Gizmo Wiz

Camera Ritz

Camera Wiz

OneClick NULL

Product PurchaseLEFT OUTER JOIN:

SELECT Product.name, Purchase.storeFROM Product LEFT OUTER JOIN Purchase

ON Product.name = Purchase.prodName

Page 30: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

30

Other Outer Joins

• Left outer join:– Include the left tuple even if there’s no match

• Right outer join:– Include the right tuple even if there’s no match

• Full outer join:– Include the both left and right tuples even if there’s no match

Page 31: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Also read

• Triggers• Views• Operations on Databases

– Create, Drop, and Alter

• Operations of Tables– Create, Delete and Update

• Constraint:– Key constraint– Foreign key– On Delete cascade, Set Null, Set Default;

CSC261,Spring2017,UR

Page 32: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

DATABASE DESIGN

CSC261,Spring2017,UR

Page 33: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Database Design Process

1. Requirements analysis– What is going to be stored?

– How is it going to be used?

– What are we going to do with the data?

– Who should access the data?

Technicalandnon-technicalpeopleareinvolved

1.Requirements Analysis 2.Conceptual Design 3.Logical,Physical,Security,etc.

CSC261,Spring2017,UR

Page 34: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Database Design Process

CSC261,Spring2017,UR

2. Conceptual Design

– A high-level description of the database

– Sufficiently precise that technical people can understand it

– But, not so precise that non-technical people can’t participate

ThisiswhereE/Rfitsin.

1.Requirements Analysis 2.Conceptual Design 3.Logical,Physical,Security,etc.

Page 35: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Database Design Process

CSC261,Spring2017,UR

1.Requirements Analysis 2.Conceptual Design 3.Logical,Physical,Security,etc.

3. Implementation:

• LogicalDatabaseDesign

• PhysicalDatabaseDesign

• SecurityDesign

Page 36: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Thisprocessisiteratedmany

times

E/RisavisualsyntaxforDBdesignwhichispreciseenough fortechnicalpoints,butabstractedenough fornon-technical

people

Database Design Process1.Requirements Analysis 2.Conceptual Design 3.Logical,Physical,Security,etc.

MakesProduct

name category

price

Company

name

E/RModel&Diagrams used

CSC261,Spring2017,UR

Page 37: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Requirements Become the E-R Data Model

CSC261,Spring2017,UR

• After the requirements have been gathered, they are transformed into an Entity Relationship (E-R) Data Model

• E-R Models consist of1. Entities2. Attributes

a) Identifiers (Keys)b) Non-key attributes

3. Relationships

Page 38: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

1. E/R BASICS: ENTITIES & RELATIONS

CSC261,Spring2017,UR

Page 39: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Entities and Entity Sets

• An entity set has attributes– Represented by ovals

attached to an entity set

Product

name category

price

Shapesareimportant.Colorsarenot.

CSC261,Spring2017,UR

Page 40: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Keys

• A key is a minimal set of attributes that uniquely identifies an entity.

Product

name category

price

Denoteelements oftheprimarykey byunderlining.

Here,{name,category}isnot akey(itisnotminimal).

TheE/Rmodelforcesustodesignateasingleprimary key,thoughtheremaybemultiplecandidatekeys

CSC261,Spring2017,UR

Page 41: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Relationships

• A relationship connects two or more entity sets.• It is represented by a diamond, with lines to each of the

entity sets involved.• The degree of the relationship defines the number of entity

classes that participate in the relationship– Degree 1 is a unary relationship– Degree 2 is a binary relationship– Degree 3 is a ternary relationship

CSC261,Spring2017,UR

Page 42: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Conceptual Unary Relationship

CSC261,Spring2017,UR

Person Marries

Page 43: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Conceptual Binary Relationship

Person CarsOwns

CSC261,Spring2017,UR

Page 44: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Conceptual Ternary Relationship

Patient DrugPrescription

Doctor

CSC261,Spring2017,UR

Page 45: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

The R in E/R: Relationships

• E, R and A together:

Product

name category

priceCompany

name

Makes

CSC261,Spring2017,UR

Page 46: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

What is a Relationship?

MakesProduct

name category

price

Company

name

Arelationship betweenentitysetsPandCisasubsetofallpossiblepairsofentitiesinPandC,withtuplesuniquelyidentifiedbyPandC’skeys

CSC261,Spring2017,UR

Page 47: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

What is a Relationship?

name category price

Gizmo Electronics $9.99

GizmoLite Electronics $7.50

Gadget Toys $5.50

name

GizmoWorks

GadgetCorp

ProductCompanyC.name P.name P.category P.price

GizmoWorks Gizmo Electronics $9.99

GizmoWorks GizmoLite Electronics $7.50

GizmoWorks Gadget Toys $5.50

GadgetCorp Gizmo Electronics $9.99

GadgetCorp GizmoLite Electronics $7.50

GadgetCorp Gadget Toys $5.50

Company C×Product P

MakesProduct

name categoryprice

Company

name

Arelationship betweenentitysetsPandCisasubsetofallpossiblepairsofentitiesinPandC,withtuplesuniquelyidentifiedbyPandC’skeys

CSC261,Spring2017,UR

Page 48: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

What is a Relationship?

name category price

Gizmo Electronics $9.99

GizmoLite Electronics $7.50

Gadget Toys $5.50

name

GizmoWorks

GadgetCorp

ProductCompanyC.name P.name P.category P.price

GizmoWorks Gizmo Electronics $9.99

GizmoWorks GizmoLite Electronics $7.50

GizmoWorks Gadget Toys $5.50

GadgetCorp Gizmo Electronics $9.99

GadgetCorp GizmoLite Electronics $7.50

GadgetCorp Gadget Toys $5.50

Company C ×Product P

C.name P.name

GizmoWorks Gizmo

GizmoWorks GizmoLite

GadgetCorp Gadget

MakesMakesProduct

name categoryprice

Company

name

Arelationship betweenentitysetsPandCisasubsetofallpossiblepairsofentitiesinPandC,withtuplesuniquelyidentifiedbyPandC’skeys

CSC261,Spring2017,UR

Page 49: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Product

name category

priceCompany

name

Makes

since

• Relationshipsmayhaveattributesaswell.

Forexample:“since”recordswhencompanystartedmakingaproduct

Note:“since” isimplicitlyuniqueperpairhere!Why?

Relationships and Attributes

CSC261,Spring2017,UR

Page 50: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Summary

• Summary

CSC261,Spring2017,UR

IdentifyingRelationship

Page 51: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Design Theory (ER model to Relations)

Page 52: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Entity Sets to Tables

CSC261,Spring2017,UR

Employees

lot

name

ssn

ssn name lot

123-22-3333 Alex 23

234-44-6666 Bob 44

567-88-9787 John 12

CREATE TABLE Employees ( ssn char(11),name varchar(30),lot Integer,PRIMARY KEY (ssn))

Page 53: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Other Conversions (ER model to Tables)

• Relationships:• Many to Many • One to Many• One to One

CSC261,Spring2017,UR

Page 54: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Scenario

CSC261,Spring2017,UR

• One customer can have at max 2 loans. One loan can be given to multiple customers.

What it really means: – One customer can have (0,2) loans– One loan can be given to (1,n) customer – This is a many to many scenario

Page 55: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

CSC261,Spring2017,UR

Crow’sfootNotation

ChenNotation

Page 56: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

FUNCTIONAL DEPENDECIES

CSC261,Spring2017,UR

Page 57: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Prime and Non-prime attributes

• A Prime attribute must be a member of some candidate key

• A Nonprime attribute is not a prime attribute—that is, it is not a member of any candidate key.

CSC261,Spring2017,UR

Page 58: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

58

Back to Conceptual Design

Now that we know how to find FDs, it’s a straight-forward process:

1. Search for “bad” FDs

2. If there are any, then keep decomposing the table into sub-tablesuntil no more bad FDs

3. When done, the database schema is normalized

Page 59: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Boyce-Codd Normal Form (BCNF)

• Main idea is that we define “good” and “bad” FDs as follows:

– X à A is a “good FD” if X is a (super)key• In other words, if A is the set of all attributes

– X à A is a “bad FD” otherwise

• We will try to eliminate the “bad” FDs!– Via normalization

Page 60: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Normalizing into 2NF and 3NF

CSC261,Spring2017,UR

Page 61: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Figure 14.12 Normalization into 2NF and 3NF

Slide 14- 61

Figure 14.12Normalization into 2NF and 3NF. (a) The LOTS relation with its functional dependencies FD1 through FD4. (b) Decomposing into the 2NF relations LOTS1 and LOTS2. (c) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B. (d) Progressive normalization of LOTS into a 3NF design.

Page 62: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Normal Forms Defined Informally

• 1st normal form– All attributes depend on the key

• 2nd normal form– All attributes depend on the whole key

• 3rd normal form– All attributes depend on nothing but the key

Slide 14- 62

Page 63: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

General Definition of 2NF and 3NF (For Multiple Candidate Keys)

• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R

• A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on any key of R

Slide 14- 63

Page 64: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

1. BOYCE-CODD NORMAL FORM

64

Page 65: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Figure 14.14 A relation TEACH that is in 3NF but not in BCNF

Slide 14- 65

• Two FDs exist in the relation TEACH:– fd1: { student, course} -> instructor– fd2: instructor -> course

• {student, course} is a candidate key for this relation

• So this relation is in 3NF but not inBCNF

• A relation NOT in BCNF should be decomposed so as to meet this property, – while possibly forgoing the preservation

of all functional dependencies in the decomposed relations.

Page 66: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Achieving the BCNF by Decomposition (2)

n Three possible decompositions for relation TEACHnD1: {student, instructor} and {student, course}nD2: {course, instructor } and {course, student}nD3: {instructor, course } and {instructor, student} ü

Slide 14- 66

Page 67: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

4.3 Interpreting the General Definition of Third Normal Form (2)

n ALTERNATIVE DEFINITION of 3NF: We can restate the definition as:

A relation schema R is in third normal form (3NF) if, whenever a nontrivial FD XàA holds in R, either

a) X is a superkey of R or b) A is a prime attribute of R

The condition (b) takes care of the dependencies that “slip through” (are allowable to) 3NF but are “caught by” BCNF which we discuss next.

Slide 14- 67

Page 68: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

1. BOYCE-CODD NORMAL FORM

68

Page 69: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

What you will learn about in this section

1. Boyce-Codd NormalForm

2. TheBCNFDecompositionAlgorithm

69

Page 70: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

5. BCNF (Boyce-Codd Normal Form)

• Definition of 3NF: • A relation schema R is in 3NF if, whenever a nontrivial FD XàA

holds in R, eithera) X is a superkey of R or b) A is a prime attribute of R

• A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X → A holds in R, then

a) X is a superkey of Rb) There is no b

• Each normal form is strictly stronger than the previous one– Every 2NF relation is in 1NF– Every 3NF relation is in 2NF– Every BCNF relation is in 3NF

Slide 14- 70

Page 71: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Slide 14- 71

Boyce-Codd normal form

Figure 14.13Boyce-Codd normal form. (a) BCNF normalization of

LOTS1A with the functional dependency FD2 being lost in the decomposition. (b) A schematic relation with FDs; it is

in 3NF, but not in BCNF due to the f.d. C → B.

Page 72: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

A relation TEACH that is in 3NF but not in BCNF

Slide 14- 72

• Two FDs exist in the relation TEACH:

– {student, course} à instructor– instructor à course

• {student, course} is a candidate key for this relation

• So this relation is in 3NF but not inBCNF

• A relation NOT in BCNF should be decomposed

X à A

Page 73: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Achieving the BCNF by Decomposition

• Three possible decompositions for relation TEACH– D1: {student, instructor} and {student, course}

– D2: {course, instructor } and {course, student}

– D3: {instructor, course } and {instructor, student}

Slide 14- 73

ü

Page 74: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

74

Boyce-Codd Normal Form

BCNFisasimpleconditionforremovinganomaliesfromrelations:

Inotherwords:thereareno“bad”FDs

ArelationRisinBCNF if:

if{X1,...,Xn}à A isanon-trivial FDinR

then{X1,...,Xn}isasuperkey forR

Page 75: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

75

Example

Whatisthekey?{SSN,PhoneNumber}

Name SSN PhoneNumber CityFred 123-45-6789 206-555-1234 SeattleFred 123-45-6789 206-555-6543 SeattleJoe 987-65-4321 908-555-2121 WestfieldJoe 987-65-4321 908-555-1234 Westfield

{SSN} à {Name,City}

⟹Not inBCNF

ThisFDisbadbecauseitisnot asuperkey

Page 76: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

76

Example

Name SSN CityFred 123-45-6789 SeattleJoe 987-65-4321 Madison

SSN PhoneNumber123-45-6789 206-555-1234123-45-6789 206-555-6543987-65-4321 908-555-2121987-65-4321 908-555-1234

Let’scheckanomalies:• Redundancy?• Update?• Delete?

{SSN} à {Name,City}

NowinBCNF!

ThisFDisnowgoodbecauseitisthekey

Page 77: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

BCNF Decomposition

CSC261,Spring2017,UR

BCNFDecomp(R):

IfXà AcausesBCNFviolation:DecomposeRintoR1=XAR2=R–A(Note:XispresentinbothR1andR2)

ReturnBCNFDecomp(R1),BCNFDecomp(R2)

Page 78: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

R(A,B,C,D,E)

{A} à {B,C}{C} à {D}

Example

BCNFDecomp(R):

IfXà AcausesBCNFviolation:DecomposeRintoR1=XAR2=R–A(Note:XispresentinbothR1andR2)

ReturnBCNFDecomp(R1),BCNFDecomp(R2)

Page 79: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

79

Example

R(A,B,C,D,E){A}+ ={A,B,C,D}≠{A,B,C,D,E}

R1(A,B,C,D){C}+ ={C,D}≠{A,B,C,D}

R2(A,E)R11(C,D) R12(A,B,C)

R(A,B,C,D,E)

{A} à {B,C}{C} à {D}

Page 80: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

2. DECOMPOSITIONS

80

Page 81: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Recap: Decompose to remove redundancies

1. We saw that redundancies in the data (“bad FDs”) can lead to data anomalies

2. We developed mechanisms to detect and remove redundancies by decomposing tables into BCNF1. BCNF decomposition is standard practice- very powerful &

widely used!

3. However, sometimes decompositions can lead to more subtle unwanted effects…

81

Whendoesthishappen?

Page 82: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

82

Decompositions in General

R1 =theprojectionofRonA1,...,An,B1,...,Bm

R(A1,...,An,B1,...,Bm,C1,...,Cp)

R1(A1,...,An,B1,...,Bm) R2(A1,...,An,C1,...,Cp)

R2 =theprojectionofRonA1,...,An,C1,...,Cp

Page 83: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

83

Theory of Decomposition

Name Price CategoryGizmo 19.99 Gadget

OneClick 24.99 CameraGizmo 19.99 Camera

Name PriceGizmo 19.99

OneClick 24.99Gizmo 19.99

Name CategoryGizmo Gadget

OneClick CameraGizmo Camera

I.e.itisaLosslessdecomposition

Sometimesadecompositionis“correct”

Page 84: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

84

Lossy Decomposition

Name Price CategoryGizmo 19.99 Gadget

OneClick 24.99 CameraGizmo 19.99 Camera

Name CategoryGizmo Gadget

OneClick CameraGizmo Camera

Price Category19.99 Gadget24.99 Camera19.99 Camera

What’swronghere?

Howeversometimesitisn’t

Page 85: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Lossless Decompositions

AdecompositionRto(R1,R2)islossless ifR=R1JoinR2

R(A1,...,An,B1,...,Bm,C1,...,Cp)

R1(A1,...,An,B1,...,Bm) R2(A1,...,An,C1,...,Cp)

Page 86: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Lossless Decompositions

86

BCNFdecompositionisalwayslossless.Why?

Note:don’tneed{A1,...,An}à {C1,...,Cp}

If {A1,...,An}à {B1,...,Bm}Thenthedecompositionislossless

R(A1,...,An,B1,...,Bm,C1,...,Cp)

R1(A1,...,An,B1,...,Bm) R2(A1,...,An,C1,...,Cp)

Page 87: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

A relation TEACH that is in 3NF but not in BCNF

Slide 14- 87

• Two FDs exist in the relation TEACH:

– {student, course} à instructor– instructor à course

• {student, course} is a candidate key for this relation

• So this relation is in 3NF but not inBCNF

• A relation NOT in BCNF should be decomposed

X à A

Page 88: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Achieving the BCNF by Decomposition (2)

• Three possible decompositions for relation TEACH– D1: {student, instructor} and {student, course}

– D2: {course, instructor } and {course, student}

– D3: {instructor, course } and {instructor, student}

Slide 14- 88

ü

Page 89: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

A problem with BCNF

Problem:ToenforceaFD,mustreconstructoriginalrelation—oneachinsert!

Page 90: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

90

A Problem with BCNF

{Unit} à {Company}{Company,Product} à {Unit}

WedoaBCNFdecompositionona“bad”FD:{Unit}+ = {Unit, Company}

WelosetheFD{Company,Product} à {Unit}!!

Unit Company Product… … …

Unit Company… …

Unit Product… …

{Unit} à {Company}

Page 91: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

91

So Why is that a Problem?

Noproblemsofar.Alllocal FD’saresatisfied.

Unit CompanyGalaga99 UWBingo UW

Unit ProductGalaga99 DatabasesBingo Databases

Unit Company ProductGalaga99 UW DatabasesBingo UW Databases

Let’sputallthedatabackintoasingletableagain:

{Unit} à {Company}

ViolatestheFD{Company,Product} à {Unit}!!

Page 92: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

92

The Problem

• We started with a table R and FDs F

• We decomposed R into BCNF tables R1, R2, …with their own FDs F1, F2, …

• We insert some tuples into each of the relations—which satisfy their local FDs but when reconstruct it violates some FD across tables!

PracticalProblem:ToenforceFD,mustreconstructR—oneachinsert!

Page 93: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

93

Possible Solutions

• Various ways to handle so that decompositions are all lossless / no FDs lost– For example 3NF- stop short of full BCNF decompositions.

• Usually a tradeoff between redundancy / data anomalies and FD preservation…

BCNFstillmostcommon-withadditionalstepstokeeptrackoflostFDs…

Page 94: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

RELATIONAL ALGEBRA & CALCULUS

CSC261,Spring2017,UR

Page 95: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

• Returns all tuples which satisfy a condition

• Notation: σc(R)• Examples

– σSalary > 40000 (Employee)– σname = “Smith” (Employee)

• The condition c can be =, <, ≤, >, ≥, <>

SELECT *FROM StudentsWHERE gpa > 3.5;

SQL:

RA:𝜎%&'().+(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)

Students(sid,sname,gpa)

1. Selection (𝜎)

CSC261,Spring2017,UR

Page 96: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

σSalary >40000 (Employee)

SSN Name Salary1234545 John 2000005423341 Smith 6000004352342 Fred 500000

SSN Name Salary5423341 Smith 6000004352342 Fred 500000

Anotherexample:

CSC261,Spring2017,UR

Page 97: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

• Eliminates columns, then removes duplicates

• Notation: Π A1,…,An (R)• Example: project social-

security number and names:– Π SSN, Name (Employee)– Output schema:

Answer(SSN, Name)

SELECT DISTINCTsname,gpa

FROM Students;

SQL:

RA:Π67'89,%&'(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)

Students(sid,sname,gpa)

2. Projection (Π)

CSC261,Spring2017,UR

Page 98: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Π Name,Salary (Employee)

SSN Name Salary1234545 John 2000005423341 John 6000004352342 John 200000

Name SalaryJohn 200000John 600000

Anotherexample:

CSC261,Spring2017,UR

Page 99: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Note that RA Operators are Compositional!

SELECT DISTINCTsname,gpa

FROM StudentsWHERE gpa > 3.5;

Students(sid,sname,gpa)

HowdowerepresentthisqueryinRA?

Π67'89,%&'(𝜎%&'().+(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠))

𝜎%&'().+(Π67'89,%&'(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠))

Aretheselogicallyequivalent?

CSC261,Spring2017,UR

Page 100: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

• Each tuple in R1 with each tuple in R2

• Notation: R1 × R2• Example:

– Employee × Dependents

• Rare in practice; mainly used to express joins

SELECT *FROM Students, People;

SQL:

RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠×𝑃𝑒𝑜𝑝𝑙𝑒

Students(sid,sname,gpa)People(ssn,pname,address)

3. Cross-Product (×)

CSC261,Spring2017,UR

Page 101: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

ssn pname address1234545 John 216 Rosse

5423341 Bob 217 Rosse

sid sname gpa001 John 3.4

002 Bob 1.3

𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠×𝑃𝑒𝑜𝑝𝑙𝑒

×

ssn pname address sid sname gpa1234545 John 216 Rosse 001 John 3.4

5423341 Bob 217 Rosse 001 John 3.4

1234545 John 216 Rosse 002 Bob 1.3

5423341 Bob 216 Rosse 002 Bob 1.3

People StudentsAnotherexample:

CSC261,Spring2017,UR

Page 102: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

• Changes the schema, not the instance

• A ‘special’ operator- neither basic nor derived

• Notation: ρ B1,…,Bn (R)

• Note: this is shorthand for the proper form (since names, not order matters!):– ρ A1àB1,…,AnàBn (R)

SELECTsid AS studId,sname AS name,gpa AS gradePtAvg

FROM Students;

SQL:

RA:𝜌6@ABCB ,7'89,%D'B9E@FG%(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)

Students(sid,sname,gpa)

Wecareaboutthisoperatorbecauseweareworkinginanamedperspective

Renaming (𝜌)

CSC261,Spring2017,UR

Page 103: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

sid sname gpa001 John 3.4

002 Bob 1.3

𝜌6@ABCB,7'89,%D'B9E@FG%(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)

Students

studId name gradePtAvg001 John 3.4

002 Bob 1.3

Students

Anotherexample:

CSC261,Spring2017,UR

Page 104: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

• Notation: R1⋈R2

• Joins R1 and R2 on equality of all shared attributes– If R1 has attribute set A, and R2 has attribute

set B, and they share attributes A⋂B = C, can also be written: R1 ⋈ 𝐶R2

• Our first example of a derived RAoperator:– Meaning: R1⋈ R2 = ΠA U B(σC=D(𝜌K→M(R1) × R2))– Where:

• The rename 𝜌K→M renames the shared attributes in one of the relations

• The selection σC=D checks equality of the shared attributes

• The projection ΠA U B eliminates the duplicate common attributes

SELECT DISTINCTsid, S.name, gpa,ssn, address

FROM Students S,People P

WHERE S.name = P.name;

SQL:

RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈ 𝑃𝑒𝑜𝑝𝑙𝑒

Students(sid,name,gpa)People(ssn,name,address)

Natural Join (⋈) Note: Textbook notation is *

CSC261,Spring2017,UR

Page 105: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

ssn P.name address1234545 John 216 Rosse

5423341 Bob 217 Rosse

sid S.name gpa001 John 3.4

002 Bob 1.3

𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈ 𝑃𝑒𝑜𝑝𝑙𝑒

sid S.name gpa ssn address001 John 3.4 1234545 216 Rosse

002 Bob 1.3 5423341 216 Rosse

PeoplePStudentsSAnotherexample:

CSC261,Spring2017,UR

Page 106: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Natural Join

• Given schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈S ?

• Given R(A, B, C), S(D, E), what is R ⋈S ?

• Given R(A, B), S(A, B), what is R ⋈S ?

CSC261,Spring2017,UR

Page 107: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Example: Converting SFW Query -> RA

SELECT DISTINCTgpa,address

FROM Students S,People P

WHERE gpa > 3.5 ANDS.name = P.name;

HowdowerepresentthisqueryinRA?

Π%&','BBD966(𝜎%&'().+(𝑆 ⋈ 𝑃))

Students(sid,name,gpa)People(ssn,name,address)

CSC261,Spring2017,UR

Page 108: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Logical Equivalece of RA Plans

• Given relations R(A,B) and S(B,C):

– Here, projection & selection commute:

• 𝜎FN+(ΠF(𝑅)) = ΠF(𝜎FN+(𝑅))

– What about here?

• 𝜎FN+(ΠQ(𝑅))?= ΠQ(𝜎FN+(𝑅))

CSC261,Spring2017,UR

Page 109: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

• Five basic operators:

1. Selection: σ2. Projection: Π3. Cartesian Product: ×4. Union: ∪5. Difference: -

• Derived or auxiliary operators:

– Intersection– Joins (natural,equi-join, theta join, semi-join)– Renaming: ρ– Division

RelationalAlgebra(RA)

We’lllookatthese

Andalsoatsomeofthesederivedoperators

CSC261,Spring2017,UR

Page 110: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

1. Union (∪) and 2. Difference (–)

• R1 ∪ R2• Example:

– ActiveEmployees∪ RetiredEmployees

• R1 – R2• Example:

– AllEmployees -- RetiredEmployees

R1 R2

R1 R2

CSC261,Spring2017,UR

Page 111: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

What about Intersection (∩) ?

• It is a derived operator• R1 ∩ R2 = R1 – (R1 – R2)• Also expressed as a join!• Example

– UnionizedEmployees ∩ RetiredEmployees

R1 R2

CSC261,Spring2017,UR

Page 112: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Theta Join (⋈θ)

• A join that involves a predicate• R1 ⋈θ R2 = σ θ (R1 × R2)• Here θ can be any condition

SELECT *FROM

Students,PeopleWHERE θ;

SQL:

RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈S 𝑃𝑒𝑜𝑝𝑙𝑒

Students(sid,sname,gpa)People(ssn,pname,address)

Notethatnaturaljoin isathetajoin+aprojection.

CSC261,Spring2017,UR

Page 113: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Equi-join (⋈A=B)

• A theta join where θ is an equality• R1 ⋈A=B R2 = σ A=B (R1 × R2)• Example:

– Employee ⋈SSN=SSN Dependents SELECT *FROM

Students S,People P

WHERE sname = pname;

SQL:

RA:𝑆 ⋈67'89N&7'89 𝑃

Students(sid,sname,gpa)People(ssn,pname,address)

Mostcommonjoininpractice!

CSC261,Spring2017,UR

Page 114: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Semijoin (⋉)

• R ⋉ S = Π A1,…,An (R ⋈ S)• Where A1, …, An are the attributes in R• Example:

– Employee ⋉Dependents SELECT DISTINCTsid,sname,gpa

FROM Students,People

WHEREsname = pname;

SQL:

RA:

𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋉ 𝑃𝑒𝑜𝑝𝑙𝑒

Students(sid,sname,gpa)People(ssn,pname,address)

CSC261,Spring2017,UR

Page 115: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Divison (÷)

– T(Y) = R(Y,X) ÷ S(X)

– Y is the set of attributes of R that are not attributes of S.

– For a tuple t to appear in the result T of the Division, the values in t must appear in R in combination with every tuple in S.

CSC261,Spring2017,UR

Page 116: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Example

CSC261,Spring2017,UR

https://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/

R(Y,X) T(Y)S(X)÷ =

SELECT PS1.pilot_nameFROM PilotSkills AS PS1, Hangar AS H1WHERE PS1.plane_name = H1.plane_nameGROUP BY PS1.pilot_name HAVING COUNT(PS1.plane_name) =

(SELECT COUNT(plane_name) FROM Hangar);

Page 117: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Complete Set of Relational Operations

• The set of operations including • Select σ, • Project π• Union ∪• Difference –• Rename ρ, and • Cartesian Product X

– is called a complete set– because any other relational algebra expression can be expressed

by a combination of these five operations.

• For example: – R ∩ S = (R ∪ S ) – ((R − S) ∪ (S − R))– R⋈<join condition>S = σ <join condition> (R X S)

CSC261,Spring2017,UR

Page 118: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Table 8.1 Operations of Relational Algebra

continued on next slideCSC261,Spring2017,UR

Page 119: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Table 8.1 Operations of Relational Algebra (continued)

CSC261,Spring2017,UR

Page 120: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Slide 8- 120

Examples of Queries in Relational Algebra : Procedural Form

• Q1:RetrievethenameandAddressofallEmployeeswhoworkforthe‘Research’department.

Research_dept← σ Dname=’Research’(Department)Research_emps← (RESEARCH_DEPDNumber=Dno Employee)

Result ←π Fname,Lname,Address(Research_emps)

Asasingleexpression,thisquerybecomes:

π Fname,Lname,Address (σ Dname=‘Research’ (Department) Dnumber=Dno Employee)⋈

Page 121: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Slide 8- 121

Examples of Queries in Relational Algebra : Procedural Form

• Q6:RetrievethenamesofEmployeeswhohavenodependents.

ALL_EMPS←π SSN(Employee)

EMPS_WITH_DEPS(SSN)←π Essn(DEPENDENT)EMPS_WITHOUT_DEPS← (ALL_EMPS- EMPS_WITH_DEPS)

RESULT←π Lname,Fname (EMPS_WITHOUT_DEPS* Employee)

Asasingleexpression,thisquerybecomes:

π Lname,Fname((π Ssn (Employee) − ρ Ssn (πEssn (Dependent))) ∗ Employee)

Page 122: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Division

• T(Y) = R(Y,X)÷ S(X)

• Thecompletedivisionexpression:• 𝑅 ÷ 𝑆 = 𝜋W𝑅 − 𝜋W 𝜋W 𝑅 ×𝑆 − 𝑅

Ignoringtheprojections,therearejustthreesteps:• Computeallpossibleattributepairings• Removetheexistingpairings• Removethenon–answersfromthepossibleanswers

CSC261,Spring2017,UR

https://www2.cs.arizona.edu/~mccann/research/divpresentation.pdf

Page 123: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Division Example

Y X

y1 x1

y1 x2

y2 x1

y3 x1

y3 x2

y3 x3

CSC261,Spring2017,UR

X

x1

x2

Y X

y1 x1

y1 x2

y2 x1

y2 x2

y3 x1

y3 x2

Y X

y2 x2

Y

y2

Y

y1

y3

R(Y,X) S(X) 𝜋W 𝑅 ×𝑆 𝜋W 𝑅 ×𝑆 − 𝑅

𝜋W 𝜋W 𝑅 ×𝑆 − 𝑅𝑅 ÷ 𝑆 =

𝜋W𝑅 − 𝜋W 𝜋W 𝑅 ×𝑆 − 𝑅

Page 124: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

What is First Order Logic (Informally)

CSC261,Spring2017,UR

Page 125: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Relational Algebra vs Relational Calculus

• In a calculus expression, there is no order of operations to specify how to retrieve the query result.

• A calculus expression specifies only what information the result should contain.

• A calculus expression specifies what is to be retrieved rather than how to retrieve it.

CSC261,Spring2017,UR

Page 126: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Relational Algebra vs Relational Calculus (cont.)

• Relational calculus is considered to be a nonprocedural or declarative language.

• This differs from relational algebra

• In relational algebra, we need to write a sequence of operations to specify a retrieval request

• hence– relational algebra can be considered as a procedural way of stating

a query.– relational calculus can be considered as a declarative way of

stating a query.

CSC261,Spring2017,UR

Page 127: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Slide 8- 127

Tuple Relational Calculus

• The tuple relational calculus is based on specifying a number of tuple variables.

• Each tuple variable usually ranges over a particular database relation, meaning that the variable may take as its value any individual tuple from that relation.

• A simple tuple relational calculus query is of the form{t | COND(t)}

– where t is a tuple variable and COND (t) is a conditional expression or formula involving t.

– The result of such a query is the set of all tuples t that satisfy COND (t).

Page 128: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Formula and Atoms

• A formula is made up of atoms• An atom can be one of the following:

– R(t) • where R is a relation and t is a tuple variable

– ti.A op tj.B• op can be: =,<, ≤, >, ≥,≠

– ti.A op c• c is a constant.

CSC261,Spring2017,UR

Page 129: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Formula

• A formula (boolean condition) is made up of one or more atoms.

• Atoms are connected via AND,OR,andNOT

• Rules:• Rule 1: Every atom is a formula• Rule 2: If F1 and F2 are formulas, then so are:

– (F1 AND F2),– (F1 OR F2),– NOT(F1),– NOT(F2)

CSC261,Spring2017,UR

Page 130: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Slide 8- 130

Tuple Relational Calculus (continued)

• Example: Find the first and last names of all Employees whose salary is above $50,000

• Tuple calculus expression:{t.Fname,t.Lname |Employee(t)AND t.Salary >50000}

πFname,Lname σ SALARY>50000

Page 131: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

The Existential and Universal Quantifiers

• Two special symbols called quantifiers can appear in formulas; these are the universal quantifier (∀) and the existential quantifier (∃).

• Informally, a tuple variable t is bound if it is quantified– meaning that it appears in an (∀ t) or (∃ t) clause– otherwise, it is free.

Page 132: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Rule 3 and 4

• If F is a formula, then so are (∃ t)(F) and (∀ t)(F)

• The formula (∃ t)(F) is true

– if the formula F evaluates to true for some (at least one) tuple assigned to free occurrences of t in F

– otherwise (∃ t)(F) is false.

• The formula (∀ t)(F) is true

– if the formula F evaluates to true for every tuple assigned to free occurrences of t in F

– otherwise (∀ t)(F) is false.

CSC261,Spring2017,UR

Page 133: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Slide 8- 133

The Existential and Universal Quantifiers (continued)

• ∀ is called the universal or “for all” quantifier

• ∃ is called the existential or “there exists” quantifier

Page 134: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Slide 8- 134

Example Query Using Existential Quantifier

• Example: List the name of each employee who works on some project controlled by department number 5.

• Tuple calculus expression:

{e.Lame,e.Fname|Employee(e)AND((∃𝑥)(∃w)(Project(x)ANDWorks_on(w)ANDx.Dnum=5ANDw.Essn=e.SsnANDx.pNumber=w.Pno))}

Page 135: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Example Query Using Universal Quantifier

• Example: List the name of each employee who works on all project controlled by department number 5.

• Tuple calculus expression:{

e.Lame,e.Fname|Employee(e)AND(

(∀𝑥)(NOT(Project(x))ORNOT(x.Dnum=5)

OR((∃w)(Works_on(w)ANDw.Essn=e.SsnANDx.pNumber=w.Pno))

))

}

Page 136: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Example Query Using Universal Quantifier

• Example: List the name of each employee who works on all project controlled by department number 5.

• Tuple calculus expression:{

e.Lame,e.Fname|Employee(e)ANDF’}whereF’=(

(∀𝑥)(NOT(Project(x))ORNOT(x.Dnum=5)

OR((∃w)(Works_on(w)ANDw.Essn=e.SsnANDx.pNumber=w.Pno))

))

Page 137: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Example Query Using Universal Quantifier

• Example: List the name of each employee who works on all project controlled by department number 5.

• Tuple calculus expression:{

e.Lame,e.Fname|Employee(e)ANDF’}whereF’=(

(∀𝑥)(NOT(Project(x))ORF1))WhereF1=

NOT(x.Dnum=5)OR

((∃w)(Works_on(w)ANDw.Essn=e.SsnANDx.pNumber=w.Pno))

Page 138: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Example Query Using Universal Quantifier

• Example: List the name of each employee who works on all project controlled by department number 5.

• Tuple calculus expression:{

e.Lame,e.Fname|Employee(e)ANDF’}whereF’=(

(∀𝑥)(NOT(Project(x))ORF1))WhereF1=

NOT(x.Dnum=5)OR

((∃w)(Works_on(w)ANDw.Essn=e.SsnANDx.pNumber=w.Pno))

Page 139: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

A à B

• 𝐴 → 𝐵 = 𝑁𝑂𝑇𝐴𝑂𝑅𝐵

• XY→ 𝑍 = 𝑁𝑂𝑇𝑋𝑂𝑅𝑁𝑂𝑇𝑌𝑂𝑅𝑍

CSC261,Spring2017,UR

A B AàB NOTAORB

0 0 1 1

0 1 1 1

1 0 0 0

1 1 1 1

Page 140: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Example Query Using Universal Quantifier

• Example: List the name of each employee who works on all project controlled by department number 5.

• Tuple calculus expression:{

e.Lame,e.Fname|Employee(e)AND F’}whereF’=(

(∀𝑥)((Project(x)ANDx.Dnum=5) →

((∃w)(Works_on(w)ANDw.Essn=e.SsnANDx.pNumber=w.Pno))

Page 141: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Slide 8- 141

The Domain Relational Calculus

• Another variation of relational calculus is called the – domain relational calculus

• Domain calculus differs from tuple calculus in the type of variables used in formulas:– Rather than having variables range over tuples, the variables range

over single values from domains of attributes.

Page 142: CSC 261/461 – Database Systems Lecture 14 · CSC 261/461 – Database Systems Lecture 14 ... True/False, or One liner ... Database Design Process 1. Requirements analysis

Acknowledgement

• Some of the slides in this presentation are taken from the slides provided by the authors.

• Many of these slides are taken from cs145 course offered byStanford University.

• Thanks to YouTube, especially to Dr. Daniel Soper for his useful videos.

CSC261,Spring2017,UR