CSC 261/461 Database Systems Lecture 5 Fall 2017

CSC 261/461 – Database Systems, Lecture 5 · 2017. 9. 20.


CSC 261/461 – Database Systems, Lecture 5

Fall 2017

MULTISET OPERATIONS IN SQL

UNION

    SELECT R.A
    FROM R, S
    WHERE R.A = S.A
    UNION
    SELECT R.A
    FROM R, T
    WHERE R.A = T.A

With Q1 the first SELECT and Q2 the second, the result is $\{r.A \mid r.A = s.A\} \cup \{r.A \mid r.A = t.A\}$.

Why aren't there duplicates?

What if we want duplicates?

UNION ALL

    SELECT R.A
    FROM R, S
    WHERE R.A = S.A
    UNION ALL
    SELECT R.A
    FROM R, T
    WHERE R.A = T.A

With Q1 the first SELECT and Q2 the second, the result is $\{r.A \mid r.A = s.A\} \cup \{r.A \mid r.A = t.A\}$, taken as a multiset union.

ALL indicates multiset operations.

EXCEPT

    SELECT R.A
    FROM R, S
    WHERE R.A = S.A
    EXCEPT
    SELECT R.A
    FROM R, T
    WHERE R.A = T.A

With Q1 the first SELECT and Q2 the second, the result is $\{r.A \mid r.A = s.A\} \setminus \{r.A \mid r.A = t.A\}$.
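The three operators above can be compared side by side. The following is a minimal sketch using Python's built-in sqlite3 module; the contents of R, S and T are made-up sample values, not data from the lecture.

```python
import sqlite3

# Hypothetical sample data: R, S and T each have one integer column A.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A INTEGER);
    CREATE TABLE S (A INTEGER);
    CREATE TABLE T (A INTEGER);
    INSERT INTO R VALUES (1), (2), (3);
    INSERT INTO S VALUES (1), (2);
    INSERT INTO T VALUES (2), (3);
""")

q1 = "SELECT R.A FROM R, S WHERE R.A = S.A"  # yields 1, 2
q2 = "SELECT R.A FROM R, T WHERE R.A = T.A"  # yields 2, 3


def run(operator):
    """Combine Q1 and Q2 with the given operator and return the sorted values."""
    rows = conn.execute(f"{q1} {operator} {q2}").fetchall()
    return sorted(a for (a,) in rows)


union_rows = run("UNION")          # set semantics: duplicates removed
union_all_rows = run("UNION ALL")  # multiset semantics: duplicates kept
except_rows = run("EXCEPT")        # values of Q1 that do not appear in Q2

print(union_rows)      # [1, 2, 3]
print(union_all_rows)  # [1, 2, 2, 3]
print(except_rows)     # [1]
```

The value 2 is produced by both Q1 and Q2, so it appears once under UNION and twice under UNION ALL.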

Nested queries: Sub-queries Returning Relations

Company(name, city)
Product(name, maker)
Purchase(id, product, buyer)

    SELECT DISTINCT c.city
    FROM Company c
    WHERE c.name IN (
        SELECT pr.maker
        FROM Purchase p, Product pr
        WHERE p.product = pr.name
          AND p.buyer = 'Joe Blow')

"Cities where one can find companies that manufacture products bought by Joe Blow"
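To see the IN sub-query at work, here is a small sketch that runs it in SQLite through Python's sqlite3 module. The rows (companies, cities, buyers) are invented for illustration only.

```python
import sqlite3

# Hypothetical rows for the three relations used in the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (name TEXT, city TEXT);
    CREATE TABLE Product (name TEXT, maker TEXT);
    CREATE TABLE Purchase (id INTEGER, product TEXT, buyer TEXT);
    INSERT INTO Company VALUES ('Gizmo-Works', 'Rochester'), ('Canon', 'Tokyo');
    INSERT INTO Product VALUES ('Gizmo', 'Gizmo-Works'), ('Camera', 'Canon');
    INSERT INTO Purchase VALUES (1, 'Gizmo', 'Joe Blow'), (2, 'Camera', 'Jane Doe');
""")

# The inner query returns the makers of products Joe Blow bought;
# the outer query keeps the cities of exactly those companies.
cities = [city for (city,) in conn.execute("""
    SELECT DISTINCT c.city
    FROM Company c
    WHERE c.name IN (
        SELECT pr.maker
        FROM Purchase p, Product pr
        WHERE p.product = pr.name
          AND p.buyer = 'Joe Blow')
""")]

print(cities)  # ['Rochester']
```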

Another example:

Subqueries Returning Relations

Product(name, price, category, maker)

You can also use operations of the form:
• s > ALL R
• s < ANY R
• EXISTS R

Ex: Find products that are more expensive than all those produced by "Gizmo-Works"

    SELECT name
    FROM Product
    WHERE price > ALL (
        SELECT price
        FROM Product
        WHERE maker = 'Gizmo-Works')

ANY and ALL are not supported by SQLite.
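Since SQLite lacks ALL, one possible workaround is to rephrase "more expensive than all" as "there is no Gizmo-Works product that costs at least as much". The sketch below assumes made-up product rows and non-NULL prices; it is one rewrite among several, not the lecture's prescribed form.

```python
import sqlite3

# Hypothetical products; prices are assumed to be non-NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (name TEXT, price REAL, category TEXT, maker TEXT);
    INSERT INTO Product VALUES
        ('Gizmo',  10, 'gadget', 'Gizmo-Works'),
        ('Gadget', 20, 'gadget', 'Gizmo-Works'),
        ('Phone',  15, 'phone',  'Acme'),
        ('Camera', 25, 'photo',  'Canon');
""")

# price > ALL (...) rewritten with NOT EXISTS: keep p only if no
# Gizmo-Works product g has a price greater than or equal to p's price.
expensive = [name for (name,) in conn.execute("""
    SELECT p.name
    FROM Product p
    WHERE NOT EXISTS (
        SELECT 1
        FROM Product g
        WHERE g.maker = 'Gizmo-Works'
          AND g.price >= p.price)
    ORDER BY p.name
""")]

print(expensive)  # ['Camera']
```

Like ALL over an empty set, the NOT EXISTS form keeps every product when Gizmo-Works makes nothing, which a comparison against MAX(price) would not.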

Subqueries Returning Relations

Product(name, price, category, maker)

You can also use operations of the form:
• s > ALL R
• s < ANY R
• EXISTS R

Ex: Find 'copycat' products, i.e. products made by competitors with the same names as products made by "Gizmo-Works"

    SELECT p1.name
    FROM Product p1
    WHERE p1.maker = 'Gizmo-Works'
      AND EXISTS (
        SELECT p2.name
        FROM Product p2
        WHERE p2.maker <> 'Gizmo-Works'
          AND p1.name = p2.name)

<> means !=

Nested queries as alternatives to INTERSECT and EXCEPT

INTERSECT:

    (SELECT R.A, R.B FROM R)
    INTERSECT
    (SELECT S.A, S.B FROM S)

can be written as

    SELECT R.A, R.B
    FROM R
    WHERE EXISTS (
        SELECT *
        FROM S
        WHERE R.A = S.A AND R.B = S.B)

EXCEPT:

    (SELECT R.A, R.B FROM R)
    EXCEPT
    (SELECT S.A, S.B FROM S)

can be written as

    SELECT R.A, R.B
    FROM R
    WHERE NOT EXISTS (
        SELECT *
        FROM S
        WHERE R.A = S.A AND R.B = S.B)
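The correspondence can be checked on a small instance. This sketch uses invented rows in which R contains a duplicate, to show one subtlety: INTERSECT and EXCEPT return sets, while the EXISTS form keeps R's duplicates unless DISTINCT is added. The parentheses around each SELECT are dropped because SQLite does not accept them in compound queries.

```python
import sqlite3

# Hypothetical data: (1, 1) appears twice in R.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A INTEGER, B INTEGER);
    CREATE TABLE S (A INTEGER, B INTEGER);
    INSERT INTO R VALUES (1, 1), (1, 1), (2, 2), (3, 3);
    INSERT INTO S VALUES (1, 1), (2, 2);
""")


def rows(sql):
    """Run a query and return its rows in sorted order."""
    return sorted(conn.execute(sql).fetchall())


# Correlated sub-query shared by the EXISTS and NOT EXISTS forms.
match = "SELECT * FROM S WHERE R.A = S.A AND R.B = S.B"

intersect_rows = rows("SELECT R.A, R.B FROM R INTERSECT SELECT S.A, S.B FROM S")
exists_rows = rows(f"SELECT R.A, R.B FROM R WHERE EXISTS ({match})")
exists_distinct_rows = rows(f"SELECT DISTINCT R.A, R.B FROM R WHERE EXISTS ({match})")

except_rows = rows("SELECT R.A, R.B FROM R EXCEPT SELECT S.A, S.B FROM S")
not_exists_rows = rows(f"SELECT R.A, R.B FROM R WHERE NOT EXISTS ({match})")

print(intersect_rows)        # [(1, 1), (2, 2)]
print(exists_rows)           # [(1, 1), (1, 1), (2, 2)]
print(exists_distinct_rows)  # [(1, 1), (2, 2)]
print(except_rows)           # [(3, 3)]
print(not_exists_rows)       # [(3, 3)]
```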

Correlated Queries

Movie(title, year, director, length)

Find movies whose title appears more than once.

    SELECT DISTINCT title
    FROM Movie AS m
    WHERE year <> ANY (
        SELECT year
        FROM Movie
        WHERE title = m.title)

Note the scoping of the variables!
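Because SQLite does not support ANY, the same correlated idea can be tried there with EXISTS. The sketch below is an assumed rewrite with made-up movies; the alias m2 is introduced here to make the scoping explicit.

```python
import sqlite3

# Hypothetical movies: one title occurs in two different years.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (title TEXT, year INTEGER, director TEXT, length INTEGER);
    INSERT INTO Movie VALUES
        ('King Kong', 1933, 'Cooper',  100),
        ('King Kong', 2005, 'Jackson', 187),
        ('Alien',     1979, 'Scott',   117);
""")

# The inner query refers to m.title and m.year from the outer query,
# so it is re-evaluated for each outer tuple m.
repeated = [title for (title,) in conn.execute("""
    SELECT DISTINCT m.title
    FROM Movie AS m
    WHERE EXISTS (
        SELECT 1
        FROM Movie AS m2
        WHERE m2.title = m.title
          AND m2.year <> m.year)
""")]

print(repeated)  # ['King Kong']
```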

Basic SQL Summary

• SQL provides a high-level declarative language for manipulating data (DML)

• The workhorse is the SFW block

• Set operators are powerful but have some subtleties

• Powerful nested queries are also allowed.

2. AGGREGATION & GROUP BY

What you will learn about in this section

1. Aggregation operators

2. GROUP BY

3. GROUP BY: with HAVING, semantics

Aggregation

• SQL supports several aggregation operations:
  • SUM, COUNT, MIN, MAX, AVG

    SELECT AVG(price)
    FROM Product
    WHERE maker = 'Toyota'

Except COUNT, all aggregations apply to a single attribute.

    SELECT COUNT(*)
    FROM Product
    WHERE year > 1995

Aggregation: COUNT

• COUNT applies to duplicates, unless otherwise stated

    SELECT COUNT(category)
    FROM Product
    WHERE year > 1995

Note: Same as COUNT(*), as long as category is never NULL. Why?

We probably want:

    SELECT COUNT(DISTINCT category)
    FROM Product
    WHERE year > 1995
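The three COUNT variants can be compared on one table. This is a small sketch with invented Product rows, one of which has a NULL category to show where COUNT(category) and COUNT(*) diverge.

```python
import sqlite3

# Hypothetical products; 'D' has no category, 'E' is too old to pass the filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (name TEXT, category TEXT, year INTEGER);
    INSERT INTO Product VALUES
        ('A', 'gadget', 2000),
        ('B', 'gadget', 2001),
        ('C', 'photo',  1999),
        ('D', NULL,     2002),
        ('E', 'photo',  1990);
""")

count_all, count_category, count_distinct = conn.execute("""
    SELECT COUNT(*), COUNT(category), COUNT(DISTINCT category)
    FROM Product
    WHERE year > 1995
""").fetchone()

print(count_all)       # 4: every row that passes the WHERE clause
print(count_category)  # 3: duplicates counted, the NULL category skipped
print(count_distinct)  # 2: 'gadget' and 'photo'
```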

More Examples

Purchase(product, date, price, quantity)

    SELECT SUM(price * quantity)
    FROM Purchase

    SELECT SUM(price * quantity)
    FROM Purchase
    WHERE product = 'bagel'

What do these mean?

Simple Aggregations

Purchase
Product   Date    Price   Quantity
bagel     10/21   1       20
banana    10/3    0.5     10
banana    10/10   1       10
bagel     10/25   1.50    20

    SELECT SUM(price * quantity)
    FROM Purchase
    WHERE product = 'bagel'

Result: 50 (= 1*20 + 1.50*20)

Grouping and Aggregation

Purchase(product, date, price, quantity)

Find total sales after 10/1/2005 per product.

    SELECT product,
           SUM(price * quantity) AS TotalSales
    FROM Purchase
    WHERE date > '10/1/2005'
    GROUP BY product

Let's see what this means…

Grouping and Aggregation

Semantics of the query:

1. Compute the FROM and WHERE clauses

2. Group by the attributes in the GROUP BY

3. Compute the SELECT clause: grouped attributes and aggregates

1. Compute the FROM and WHERE clauses

    SELECT product, SUM(price*quantity) AS TotalSales
    FROM Purchase
    WHERE date > '10/1/2005'
    GROUP BY product

Result of the FROM and WHERE clauses:

Product   Date    Price   Quantity
Bagel     10/21   1       20
Bagel     10/25   1.50    20
Banana    10/3    0.5     10
Banana    10/10   1       10

2. Group by the attributes in the GROUP BY

    SELECT product, SUM(price*quantity) AS TotalSales
    FROM Purchase
    WHERE date > '10/1/2005'
    GROUP BY product

Before grouping:

Product   Date    Price   Quantity
Bagel     10/21   1       20
Bagel     10/25   1.50    20
Banana    10/3    0.5     10
Banana    10/10   1       10

After GROUP BY:

Product   Date    Price   Quantity
Bagel     10/21   1       20
          10/25   1.50    20
Banana    10/3    0.5     10
          10/10   1       10

3. Compute the SELECT clause: grouped attributes and aggregates

    SELECT product, SUM(price*quantity) AS TotalSales
    FROM Purchase
    WHERE date > '10/1/2005'
    GROUP BY product

Grouped input:

Product   Date    Price   Quantity
Bagel     10/21   1       20
          10/25   1.50    20
Banana    10/3    0.5     10
          10/10   1       10

After SELECT:

Product   TotalSales
Bagel     50
Banana    15

HAVING Clause

Same query as before, except that we consider only products with a total quantity sold of more than 100.

    SELECT product, SUM(price*quantity)
    FROM Purchase
    WHERE date > '10/1/2005'
    GROUP BY product
    HAVING SUM(quantity) > 100

HAVING clauses contain conditions on aggregates,

whereas WHERE clauses condition on individual tuples…

General form of Grouping and Aggregation

    SELECT S
    FROM R1,…,Rn
    WHERE C1
    GROUP BY a1,…,ak
    HAVING C2

• S = can ONLY contain attributes a1,…,ak and/or aggregates over other attributes

• C1 = any condition on the attributes in R1,…,Rn

• C2 = any condition on the aggregate expressions

Why?

General form of Grouping and Aggregation

    SELECT S
    FROM R1,…,Rn
    WHERE C1
    GROUP BY a1,…,ak
    HAVING C2

Evaluation steps:

1. Evaluate FROM-WHERE: apply condition C1 on the attributes in R1,…,Rn

2. GROUP BY the attributes a1,…,ak

3. Apply condition C2 to each group (may have aggregates)

4. Compute aggregates in S and return the result
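The four evaluation steps can be traced by hand in plain Python and then checked against SQLite. This is only a sketch of the logical semantics, not of how a real engine executes the query; the Purchase rows are invented, and dates are written in ISO format (an assumption) so that string comparison orders them correctly.

```python
import sqlite3
from collections import defaultdict

# Hypothetical Purchase rows: (product, date, price, quantity).
purchases = [
    ("Bagel",  "2005-10-21", 1.0, 20),
    ("Bagel",  "2005-10-25", 1.5, 20),
    ("Bagel",  "2005-11-02", 1.0, 70),
    ("Banana", "2005-09-15", 1.0, 200),
    ("Banana", "2005-10-03", 0.5, 10),
    ("Banana", "2005-10-10", 1.0, 10),
]

# Step 1: FROM-WHERE, apply C1 to individual tuples.
filtered = [row for row in purchases if row[1] > "2005-10-01"]

# Step 2: GROUP BY product.
groups = defaultdict(list)
for product, _date, price, quantity in filtered:
    groups[product].append((price, quantity))

# Step 3: HAVING, apply C2 (total quantity > 100) to each group.
kept = {p: rows for p, rows in groups.items()
        if sum(q for _, q in rows) > 100}

# Step 4: compute the aggregate in S for each surviving group.
manual_result = sorted((p, sum(price * q for price, q in rows))
                       for p, rows in kept.items())

# The same query evaluated by SQLite for comparison.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchase (product TEXT, date TEXT, price REAL, quantity INTEGER)")
conn.executemany("INSERT INTO Purchase VALUES (?, ?, ?, ?)", purchases)
sql_result = conn.execute("""
    SELECT product, SUM(price * quantity)
    FROM Purchase
    WHERE date > '2005-10-01'
    GROUP BY product
    HAVING SUM(quantity) > 100
    ORDER BY product
""").fetchall()

print(manual_result)  # [('Bagel', 120.0)]
print(sql_result)     # [('Bagel', 120.0)]
```

The September banana purchase of 200 units is removed in step 1, before grouping, so it cannot help the Banana group pass the HAVING condition.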

Group-by vs. Nested Query

Author(login, name)
Wrote(login, url)

• Find authors who wrote ≥ 10 documents:

• Attempt 1: with nested queries

    SELECT DISTINCT Author.name
    FROM Author
    WHERE (SELECT COUNT(Wrote.url)
           FROM Wrote
           WHERE Author.login = Wrote.login) >= 10

This is SQL by a novice

Group-by vs. Nested Query

• Find all authors who wrote at least 10 documents:

• Attempt 2: SQL style (with GROUP BY)

    SELECT Author.name
    FROM Author, Wrote
    WHERE Author.login = Wrote.login
    GROUP BY Author.name
    HAVING COUNT(Wrote.url) >= 10

No need for DISTINCT: it follows automatically from GROUP BY

This is SQL by an expert

Group-by vs. Nested Query

Which way is more efficient?

• Attempt #1 - With nested: How many times do we do a SFW query over the Wrote relation?

• Attempt #2 - With group-by: How about when written this way?

The GROUP BY version can be much more efficient!
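Both attempts can be run on the same data to confirm they agree. The sketch below uses invented authors and documents and the "at least 10" threshold (>= 10); the nested attempt is written as a scalar sub-query, which is the form SQLite accepts. It checks only that the answers match, not their relative cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Author (login TEXT, name TEXT);
    CREATE TABLE Wrote (login TEXT, url TEXT);
    INSERT INTO Author VALUES ('alice', 'Alice'), ('bob', 'Bob');
""")

# Hypothetical data: alice wrote 12 documents, bob wrote 3.
conn.executemany(
    "INSERT INTO Wrote VALUES (?, ?)",
    [("alice", f"doc{i}") for i in range(12)]
    + [("bob", f"note{i}") for i in range(3)],
)

# Attempt 1: the sub-query is evaluated once per Author tuple.
nested = [name for (name,) in conn.execute("""
    SELECT DISTINCT Author.name
    FROM Author
    WHERE (SELECT COUNT(Wrote.url)
           FROM Wrote
           WHERE Author.login = Wrote.login) >= 10
""")]

# Attempt 2: one join, one grouping pass, then a filter on each group.
grouped = [name for (name,) in conn.execute("""
    SELECT Author.name
    FROM Author, Wrote
    WHERE Author.login = Wrote.login
    GROUP BY Author.name
    HAVING COUNT(Wrote.url) >= 10
""")]

print(nested)   # ['Alice']
print(grouped)  # ['Alice']
```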

Topics Covered

1. NULLs

2. Outer Joins

3. WITH and CASE

4. Constraints

5. Schema Change Statements

Comparisons Involving NULL and Three-Valued Logic (cont’d.)

NULLS in SQL

• Whenever we don’t have a value, we can put a NULL

• Can mean many things:
  – Value does not exist
  – Value exists but is unknown
  – Value not applicable
  – Etc.

• The schema specifies for each attribute whether it can be null (nullable attribute) or not

• Each individual NULL value is considered to be different from every other NULL value

• SQL uses a three-valued logic:
  – TRUE, FALSE, and UNKNOWN (like Maybe)

• A NULL = NULL comparison is avoided (it evaluates to UNKNOWN, not TRUE)

• How does SQL cope with tables that have NULLs?

Null Values

Unexpected behavior:

    SELECT *
    FROM Person
    WHERE age < 25 OR age >= 25

Some Persons are not included!

Null Values

Can test for NULL explicitly:
  – x IS NULL
  – x IS NOT NULL

    SELECT *
    FROM Person
    WHERE age < 25 OR age >= 25
       OR age IS NULL

Now it includes all Persons!
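The effect of three-valued logic is easy to reproduce. This sketch uses a made-up Person table with one NULL age and runs both versions of the query in SQLite; Python's sqlite3 module reports an UNKNOWN/NULL result as None.

```python
import sqlite3

# Hypothetical persons; Cat's age is unknown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (name TEXT, age INTEGER);
    INSERT INTO Person VALUES ('Ann', 20), ('Bob', 30), ('Cat', NULL);
""")

# Both comparisons are UNKNOWN for a NULL age, so Cat is dropped.
without_null_test = [name for (name,) in conn.execute("""
    SELECT name FROM Person
    WHERE age < 25 OR age >= 25
    ORDER BY name
""")]

# The explicit IS NULL test evaluates to TRUE for Cat.
with_null_test = [name for (name,) in conn.execute("""
    SELECT name FROM Person
    WHERE age < 25 OR age >= 25 OR age IS NULL
    ORDER BY name
""")]

# NULL = NULL is UNKNOWN, which SQLite returns as NULL (None in Python).
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(without_null_test)  # ['Ann', 'Bob']
print(with_null_test)     # ['Ann', 'Bob', 'Cat']
print(null_equals_null)   # None
```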

RECAP: Inner Joins

Product(name, category)
Purchase(prodName, store)

By default, joins in SQL are “inner joins”:

    SELECT Product.name, Purchase.store
    FROM Product JOIN Purchase
      ON Product.name = Purchase.prodName

    SELECT Product.name, Purchase.store
    FROM Product, Purchase
    WHERE Product.name = Purchase.prodName

Both equivalent: Both INNER JOINS!

Inner Joins + NULLS = Lost data?

Product(name, category)
Purchase(prodName, store)

By default, joins in SQL are “inner joins”:

    SELECT Product.name, Purchase.store
    FROM Product JOIN Purchase
      ON Product.name = Purchase.prodName

    SELECT Product.name, Purchase.store
    FROM Product, Purchase
    WHERE Product.name = Purchase.prodName

However: Products that never sold (with no Purchase tuple) will be lost!

Outer Joins

• An outer join also returns tuples from one joined relation that don’t have a corresponding tuple in the other relation
  – I.e. if we join relations A and B on a.X = b.X, and there is an entry in A with X=5, but none in B with X=5…
  • A LEFT OUTER JOIN will return a tuple (a, NULL)!

• Left outer joins in SQL:

    SELECT Product.name, Purchase.store
    FROM Product LEFT OUTER JOIN Purchase
      ON Product.name = Purchase.prodName

Now we’ll get products even if they didn’t sell

INNER JOIN:

Product
name       category
Gizmo      gadget
Camera     Photo
OneClick   Photo

Purchase
prodName   store
Gizmo      Wiz
Camera     Ritz
Camera     Wiz

    SELECT Product.name, Purchase.store
    FROM Product INNER JOIN Purchase
      ON Product.name = Purchase.prodName

Note: another equivalent way to write an INNER JOIN!

Result
name       store
Gizmo      Wiz
Camera     Ritz
Camera     Wiz

LEFT OUTER JOIN:

Product
name       category
Gizmo      gadget
Camera     Photo
OneClick   Photo

Purchase
prodName   store
Gizmo      Wiz
Camera     Ritz
Camera     Wiz

    SELECT Product.name, Purchase.store
    FROM Product LEFT OUTER JOIN Purchase
      ON Product.name = Purchase.prodName

Result
name       store
Gizmo      Wiz
Camera     Ritz
Camera     Wiz
OneClick   NULL
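The two result tables can be reproduced with the Product and Purchase rows shown on the slides. This sketch runs both joins in SQLite via Python's sqlite3 module; the ORDER BY clause is added here only to make the output order deterministic.

```python
import sqlite3

# Rows taken from the Product and Purchase tables on the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (name TEXT, category TEXT);
    CREATE TABLE Purchase (prodName TEXT, store TEXT);
    INSERT INTO Product VALUES
        ('Gizmo', 'gadget'), ('Camera', 'Photo'), ('OneClick', 'Photo');
    INSERT INTO Purchase VALUES
        ('Gizmo', 'Wiz'), ('Camera', 'Ritz'), ('Camera', 'Wiz');
""")

inner_rows = conn.execute("""
    SELECT Product.name, Purchase.store
    FROM Product INNER JOIN Purchase
      ON Product.name = Purchase.prodName
    ORDER BY Product.name, Purchase.store
""").fetchall()

left_rows = conn.execute("""
    SELECT Product.name, Purchase.store
    FROM Product LEFT OUTER JOIN Purchase
      ON Product.name = Purchase.prodName
    ORDER BY Product.name, Purchase.store
""").fetchall()

print(inner_rows)
# [('Camera', 'Ritz'), ('Camera', 'Wiz'), ('Gizmo', 'Wiz')]
print(left_rows)
# [('Camera', 'Ritz'), ('Camera', 'Wiz'), ('Gizmo', 'Wiz'), ('OneClick', None)]
```

OneClick has no matching Purchase tuple, so it is missing from the inner join and padded with NULL (None in Python) in the left outer join.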

Other Outer Joins

• Left outer join:
  – Include the left tuple even if there’s no match

• Right outer join:
  – Include the right tuple even if there’s no match

• Full outer join:
  – Include both the left and right tuples even if there’s no match


Acknowledgement

• Some of the slides in this presentation are taken from the slides provided by the authors.

• Many of these slides are taken from the cs145 course offered by Stanford University.

• Thanks to YouTube, especially to Dr. Daniel Soper for his useful videos.

CSC 261, Spring 2017, UR