
M.P. Johnson, DBMS, Stern/NYU, Spring 2008

C20.0046: Database Management Systems
Lecture #11

M.P. Johnson

Stern School of Business, NYU

Spring, 2008

Agenda
  Nulls & outer joins
  Grouping & aggregation

Recall: correlated subqueries
  Acc(name, bal, type, …)
  Q2: Find the holder of the largest account of each type

    SELECT name, type
    FROM Acc a1
    WHERE bal >= ALL (SELECT bal
                      FROM Acc
                      WHERE type = a1.type)

  Correlation: the subquery refers to a1.type from the outer query
  Note:
    1. scope of variables
    2. this can still be expressed as a single SFW
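A minimal sketch of the correlated subquery, run through Python's sqlite3 module. The sample rows are hypothetical, not from the lecture. SQLite has no ALL quantifier, so the sketch swaps in "bal = (SELECT MAX(bal) ...)", which should give the same answer as long as bal contains no nulls.

```python
import sqlite3

# Hypothetical sample rows for Acc(name, bal, type); not from the lecture.
ROWS = [
    ("Ann", 500, "checking"),
    ("Bob", 900, "checking"),
    ("Cy", 300, "savings"),
    ("Di", 700, "savings"),
]


def largest_per_type(rows):
    """Return (name, type) for the holder of the largest account of each type."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Acc (name TEXT, bal INTEGER, type TEXT)")
    conn.executemany("INSERT INTO Acc VALUES (?, ?, ?)", rows)
    # The subquery is correlated: it refers to a1.type from the outer query,
    # so it is re-evaluated (conceptually) for every outer row a1.
    query = """
        SELECT name, type
        FROM Acc a1
        WHERE bal = (SELECT MAX(bal) FROM Acc WHERE type = a1.type)
        ORDER BY type
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(largest_per_type(ROWS))
```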

New topic: Nulls in SQL
  If we don’t have a value, we can put a NULL
  Null can mean several things:
    Value does not exist
    Value exists but is unknown
    Value not applicable
  But null is not the same as 0
    See Douglas Foster Wallace…

Null Values
  If x = NULL, then 4*(3-x)/7 = NULL
  If x = NULL, then x + 3 - x = NULL
  If x = NULL, then 3 + (x-x) = NULL
  If x = NULL, then x = 'Joe' is UNKNOWN
  In general: no row in which a null field appears in the selection test will pass the test
    With one exception
  Pace Boole, SQL has three boolean values:
    FALSE = 0
    TRUE = 1
    UNKNOWN = 0.5
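The null-propagation rules above can be checked with a small sketch using Python's sqlite3 module. This assumes SQLite's behavior, which reports both a NULL value and an UNKNOWN truth value as SQL NULL (Python None).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# x is NULL in every expression below. Arithmetic on NULL yields NULL,
# and the comparison x = 'Joe' yields UNKNOWN, which SQLite shows as NULL.
row = conn.execute(
    "SELECT 4*(3-x)/7, x + 3 - x, 3 + (x - x), x = 'Joe' "
    "FROM (SELECT NULL AS x)"
).fetchone()
conn.close()

print(row)
```

Note that x - x is not simplified to 0: with x unknown, the result stays NULL.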

Null values in boolean expressions
  C1 AND C2 = min(C1, C2)
  C1 OR C2 = max(C1, C2)
  NOT C1 = 1 - C1

    SELECT *
    FROM Person
    WHERE (age < 25) AND (height > 6 OR weight > 190)

  E.g. age = 20, height = NULL, weight = 180:
    height > 6 = UNKNOWN
    UNKNOWN OR weight > 190 = UNKNOWN
    (age < 25) AND UNKNOWN = UNKNOWN
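The min/max/1-minus rules can be sketched directly in Python, using 0, 0.5 and 1 as on the previous slide. The helper names (and3, or3, not3, compare) are made up for this illustration.

```python
from operator import gt, lt

FALSE, UNKNOWN, TRUE = 0.0, 0.5, 1.0


def and3(c1, c2):
    return min(c1, c2)


def or3(c1, c2):
    return max(c1, c2)


def not3(c1):
    return 1 - c1


def compare(op, left, right):
    """Comparison that is UNKNOWN when either operand is None (NULL)."""
    if left is None or right is None:
        return UNKNOWN
    return TRUE if op(left, right) else FALSE


# The row from the slide: age = 20, height = NULL, weight = 180.
age, height, weight = 20, None, 180
result = and3(
    compare(lt, age, 25),
    or3(compare(gt, height, 6), compare(gt, weight, 190)),
)
# A WHERE clause keeps a row only when the condition is TRUE.
selected = result == TRUE

print(result, selected)
```

The row is not selected: the condition evaluates to UNKNOWN, and only TRUE passes the WHERE test.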

Comparing null and non-nulls
  The schema specifies whether null is allowed for each attribute
    NOT NULL to forbid
    Nulls are allowed by default
  Unexpected behavior:

    SELECT *
    FROM Person
    WHERE age < 25 OR age >= 25

  Some Persons are not included!
  The “trichotomy law” does not hold!

Testing for null values
  Can test for NULL explicitly:
    x IS NULL
    x IS NOT NULL
  But: x = NULL is never true

    SELECT *
    FROM Person
    WHERE age < 25 OR age >= 25 OR age IS NULL

  Now it includes all Persons
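A small sketch of the failing trichotomy law and the IS NULL fix, using Python's sqlite3 module. The Person rows and the names() helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# age allows nulls by default; name is declared NOT NULL.
conn.execute("CREATE TABLE Person (name TEXT NOT NULL, age INTEGER)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [("Ann", 20), ("Bob", 30), ("Cy", None)],  # hypothetical rows
)


def names(where):
    """Names of the Persons that pass the given WHERE condition."""
    sql = f"SELECT name FROM Person WHERE {where} ORDER BY name"
    return [r[0] for r in conn.execute(sql)]


without_null_test = names("age < 25 OR age >= 25")
with_null_test = names("age < 25 OR age >= 25 OR age IS NULL")
equals_null = names("age = NULL")

print(without_null_test, with_null_test, equals_null)
```

Cy, whose age is NULL, is missing from the first result, appears in the second, and is not matched by age = NULL.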

Null/logic review
  TRUE AND UNKNOWN = ?
  TRUE OR UNKNOWN = ?
  UNKNOWN OR UNKNOWN = ?
  X = NULL = ?

Next: Outer join
  Like inner join except that dangling tuples are included, padded with nulls
  Left outerjoin: dangling tuples from left are included
    Nulls appear “on the right”
  Right outerjoin: dangling tuples from right are included
    Nulls appear “on the left”

Cross join - example

  MovieStar
    Name    Address       Gender  Birthdate
    Hanks   123 Palm Rd   M       01/01/60
    Taylor  456 Maple Av  F       02/02/40
    Lucas   789 Oak St    M       03/03/55

  MovieExec
    Name       Address       Networth
    Spielberg  246 Palm Rd   10M
    Taylor     456 Maple Av  20M
    Lucas      789 Oak St    30M

    Name    Address       G.  Birthdate  Name       Address       Net
    Hanks   123 Palm Rd   M   01/01/60
    Taylor  456 Maple Av  F   02/02/40   Taylor     456 Maple Av  20M
    Lucas   789 Oak St    M   03/03/55   Lucas      789 Oak St    30M
                                         Spielberg  246 Palm Rd   10M

Outer Join - Example

    SELECT *
    FROM MovieStar LEFT OUTER JOIN MovieExec
      ON MovieStar.name = MovieExec.name

    SELECT *
    FROM MovieStar RIGHT OUTER JOIN MovieExec
      ON MovieStar.name = MovieExec.name

    Name    Address       G.    Birthdate  Name       Address       Net
    Hanks   123 Palm Rd   M     01/01/60   Null       Null          Null
    Taylor  456 Maple Av  F     02/02/40   Taylor     456 Maple Av  20M
    Lucas   789 Oak St    M     03/03/55   Lucas      789 Oak St    30M
    Null    Null          Null  Null       Spielberg  246 Palm Rd   10M

  The LEFT OUTER JOIN returns the first three rows; the RIGHT OUTER JOIN returns the last three

Outer Join - Example
  MovieStar and MovieExec as in the cross join example

    SELECT *
    FROM MovieStar FULL OUTER JOIN MovieExec
      ON MovieStar.name = MovieExec.name

    Name    Address       G.    Birthdate  Name       Address       Net
    Hanks   123 Palm Rd   M     01/01/60   Null       Null          Null
    Taylor  456 Maple Av  F     02/02/40   Taylor     456 Maple Av  20M
    Lucas   789 Oak St    M     03/03/55   Lucas      789 Oak St    30M
    Null    Null          Null  Null       Spielberg  246 Palm Rd   10M
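A sketch of the left and right outer joins on the MovieStar/MovieExec data, using Python's sqlite3 module. RIGHT and FULL OUTER JOIN are only available in SQLite 3.39 and later, so this sketch gets the right outer join by swapping the operands of a LEFT OUTER JOIN; that substitution is an assumption of the example, not the lecture's method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MovieStar (name TEXT, address TEXT, gender TEXT, birthdate TEXT)"
)
conn.execute("CREATE TABLE MovieExec (name TEXT, address TEXT, networth TEXT)")
conn.executemany(
    "INSERT INTO MovieStar VALUES (?, ?, ?, ?)",
    [
        ("Hanks", "123 Palm Rd", "M", "01/01/60"),
        ("Taylor", "456 Maple Av", "F", "02/02/40"),
        ("Lucas", "789 Oak St", "M", "03/03/55"),
    ],
)
conn.executemany(
    "INSERT INTO MovieExec VALUES (?, ?, ?)",
    [
        ("Spielberg", "246 Palm Rd", "10M"),
        ("Taylor", "456 Maple Av", "20M"),
        ("Lucas", "789 Oak St", "30M"),
    ],
)

# Left outer join: every MovieStar row survives; Hanks is padded with NULLs.
left_join = conn.execute("""
    SELECT MovieStar.name, MovieExec.name, MovieExec.networth
    FROM MovieStar LEFT OUTER JOIN MovieExec
      ON MovieStar.name = MovieExec.name
    ORDER BY MovieStar.name
""").fetchall()

# Right outer join, written as a left outer join with the tables swapped:
# every MovieExec row survives; Spielberg is padded with NULLs.
right_join = conn.execute("""
    SELECT MovieStar.name, MovieExec.name
    FROM MovieExec LEFT OUTER JOIN MovieStar
      ON MovieStar.name = MovieExec.name
    ORDER BY MovieExec.name
""").fetchall()

print(left_join)
print(right_join)
```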

New-style outer joins
  Outer joins may be left, right, or full
    FROM A LEFT [OUTER] JOIN B;
    FROM A RIGHT [OUTER] JOIN B;
    FROM A FULL [OUTER] JOIN B;
  OUTER is optional
    If OUTER is included, then FULL is the default
  Q: How to remember left v. right?
  A: It indicates the side whose rows are always included

Next: Grouping & Aggregation
  In SQL:
    aggregation operators in SELECT
    grouping in GROUP BY clause
  Recall aggregation operators:
    sum, avg, min, max, count
      strings, numbers, dates
    Each applies to scalars
    Count also applies to rows: count(*)
    Can put DISTINCT inside an aggregation op: count(DISTINCT x)
  Grouping: group rows that agree on single value
    Each group becomes one row in result

Aggregation functions
  Numerical: SUM, AVG, MIN, MAX
  Char: MIN, MAX
    In lexicographic/alphabetic order
  Any attribute: COUNT
    Number of values

    A  B
    1  2
    3  4
    1  2
    1  2

  SUM(B) = 10
  AVG(A) = 1.5
  MIN(A) = 1
  MAX(A) = 3
  COUNT(A) = 4
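The five values above can be reproduced with a short sketch in Python's sqlite3 module. The table name R and the three strings in the second query are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 2), (3, 4), (1, 2), (1, 2)])

# Numerical aggregation over the four rows of the slide's table.
numeric = conn.execute(
    "SELECT SUM(B), AVG(A), MIN(A), MAX(A), COUNT(A) FROM R"
).fetchone()

# MIN and MAX also apply to character data, in alphabetic order.
chars = conn.execute("""
    SELECT MIN(name), MAX(name)
    FROM (SELECT 'Lucas' AS name UNION ALL SELECT 'Hanks' UNION ALL SELECT 'Taylor')
""").fetchone()
conn.close()

print(numeric)
print(chars)
```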

Straight aggregation
  In R.A.: γ_{sum(x)→total}(R)
  In SQL:

    SELECT SUM(x) total
    FROM R

  Just put the aggregation op in SELECT
  NB: aggreg. ops applied to each non-null val
    count(x) counts the number of non-null vals in field x
    Use count(*) to count the number of rows
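A sketch of how nulls affect aggregation, using Python's sqlite3 module. The four values of x are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (x INTEGER)")
# Hypothetical data: two real values and two nulls.
conn.executemany("INSERT INTO R VALUES (?)", [(10,), (None,), (20,), (None,)])

# COUNT(x), SUM(x) and AVG(x) skip the nulls; COUNT(*) counts rows.
count_x, count_rows, total, average = conn.execute(
    "SELECT COUNT(x), COUNT(*), SUM(x) total, AVG(x) FROM R"
).fetchone()
conn.close()

print(count_x, count_rows, total, average)
```

AVG divides by the number of non-null values (2 here), not by the number of rows.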

Straight aggregation example
  COUNT applies to duplicates, unless otherwise stated:

    SELECT Count(category)
    FROM Product
    WHERE year > 1995

    same as Count(*), except excludes nulls
  Better:

    SELECT COUNT(DISTINCT category)
    FROM Product
    WHERE year > 1995

  Can we say:

    SELECT category, COUNT(category)
    FROM Product
    WHERE year > 1995

Straight aggregation example
  Purchase(product, date, price, quantity)
  Q: Find total sales for the entire database:

    SELECT SUM(price * quantity)
    FROM Purchase

  Q: Find total sales of bagels:

    SELECT SUM(price * quantity)
    FROM Purchase
    WHERE product = 'bagel'

Largest balance again
  Acc(name, bal, type)
  Q: Who has the largest balance?
  Q: Who has the largest balance of each type?
  Can we do these with aggregation functions?

Straight grouping
  Group rows together by field values
  Produces one row for each group
    I.e., by each (combin. of) grouped val(s)
  Don’t select non-grouped fields
  Reduces to DISTINCT selections:

    SELECT product
    FROM Purchase
    GROUP BY product

    SELECT DISTINCT product
    FROM Purchase

Grouping & aggregation
  Sometimes want to group and compute aggregations by group
    Aggregation op applied to rows in group, not to all rows in table
  Purchase(product, date, price, quantity)
  Find total sales for products that sold for > 0.50:

    SELECT product, SUM(price*quantity) total
    FROM Purchase
    WHERE price > .50
    GROUP BY product

Illustrated G&A example

  Purchase
    Product  Date   Price  Quantity
    Bagel    10/21  0.85   15
    Banana   10/22  0.52   7
    Banana   10/19  0.52   17
    Bagel    10/20  0.85   20

Illustrated G&A example
  First compute the FROM-WHERE
  Then GROUP BY product:

    Product  Date   Price  Quantity
    Banana   10/19  0.52   17
    Banana   10/22  0.52   7
    Bagel    10/20  0.85   20
    Bagel    10/21  0.85   15

Illustrated G&A example
  Finally, aggregate and select:

    SELECT product, SUM(price*quantity) total
    FROM Purchase
    WHERE price > .50
    GROUP BY product

    Product  TotalSales
    Bagel    $29.75
    Banana   $12.48
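The illustrated example can be replayed with Python's sqlite3 module, using the four Purchase rows from the slides. ROUND and ORDER BY are additions of this sketch, there only to keep the floating-point output tidy and the row order fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchase (product TEXT, date TEXT, price REAL, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO Purchase VALUES (?, ?, ?, ?)",
    [
        ("Bagel", "10/21", 0.85, 15),
        ("Banana", "10/22", 0.52, 7),
        ("Banana", "10/19", 0.52, 17),
        ("Bagel", "10/20", 0.85, 20),
    ],
)

# FROM-WHERE first, then GROUP BY product, then aggregate each group.
totals = conn.execute("""
    SELECT product, ROUND(SUM(price * quantity), 2) total
    FROM Purchase
    WHERE price > .50
    GROUP BY product
    ORDER BY product
""").fetchall()
conn.close()

for product, total in totals:
    print(f"{product}: ${total:.2f}")
```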

Illustrated G&A example
  GROUP BY may be reduced to (a possibly more complicated) subquery:

    SELECT product, SUM(price*quantity) total
    FROM Purchase
    WHERE price > .50
    GROUP BY product

    SELECT DISTINCT x.product,
           (SELECT SUM(y.price*y.quantity)
            FROM Purchase y
            WHERE x.product = y.product AND y.price > .50) total
    FROM Purchase x
    WHERE x.price > .50
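A sketch that runs both forms and compares the results, using Python's sqlite3 module. The Gum row is a hypothetical addition so that the price filter has something to exclude; ROUND and ORDER BY are also additions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchase (product TEXT, date TEXT, price REAL, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO Purchase VALUES (?, ?, ?, ?)",
    [
        ("Bagel", "10/21", 0.85, 15),
        ("Banana", "10/22", 0.52, 7),
        ("Banana", "10/19", 0.52, 17),
        ("Bagel", "10/20", 0.85, 20),
        ("Gum", "10/23", 0.25, 4),  # hypothetical row, filtered out by price
    ],
)

grouped = conn.execute("""
    SELECT product, ROUND(SUM(price * quantity), 2) total
    FROM Purchase
    WHERE price > .50
    GROUP BY product
    ORDER BY product
""").fetchall()

# Same result without GROUP BY: one correlated subquery per outer row,
# with DISTINCT collapsing the repeated (product, total) pairs.
correlated = conn.execute("""
    SELECT DISTINCT x.product,
           (SELECT ROUND(SUM(y.price * y.quantity), 2)
            FROM Purchase y
            WHERE x.product = y.product AND y.price > .50) total
    FROM Purchase x
    WHERE x.price > .50
    ORDER BY x.product
""").fetchall()
conn.close()

print(grouped)
print(correlated)
```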

Multiple aggregations
  For every product, what is the total sales and max quantity sold?

    SELECT product, SUM(price * quantity) SumSales,
           MAX(quantity) MaxQuantity
    FROM Purchase
    WHERE price > .50
    GROUP BY product

    Product  SumSales  MaxQuantity
    Banana   $12.48    17
    Bagel    $29.75    20

Another grouping/aggregation e.g.
  Movie(title, year, length, studioName)
  Q: How many total minutes of film have been produced by each studio?
  Strategy: Divide movies into groups per studio, then add lengths per group

Another grouping/aggregation e.g.

    Title                Year  Length  Studio
    Star Wars            1977  120     Fox
    Jedi                 1980  105     Fox
    Aviator              2004  800     Miramax
    Pulp Fiction         1995  110     Miramax
    Lost in Translation  2003  95      Universal

    SELECT studio, sum(length) totalLength
    FROM Movies
    GROUP BY studio

    Studio     Length
    Fox        225
    Miramax    910
    Universal  95
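The per-studio totals can be checked with a short sketch in Python's sqlite3 module, using the five rows from the slide and the table and column names from the slide's query. ORDER BY is an addition to fix the row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movies (title TEXT, year INTEGER, length INTEGER, studio TEXT)"
)
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?, ?, ?)",
    [
        ("Star Wars", 1977, 120, "Fox"),
        ("Jedi", 1980, 105, "Fox"),
        ("Aviator", 2004, 800, "Miramax"),
        ("Pulp Fiction", 1995, 110, "Miramax"),
        ("Lost in Translation", 2003, 95, "Universal"),
    ],
)

# One group per studio; sum(length) is computed within each group.
studio_totals = conn.execute("""
    SELECT studio, sum(length) totalLength
    FROM Movies
    GROUP BY studio
    ORDER BY studio
""").fetchall()
conn.close()

print(studio_totals)
```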

Grouping/aggregation example
  StarsIn(SName, Title, Year)
  Q: Find the year of each star’s first movie

    SELECT sname, min(year) firstyear
    FROM StarsIn
    GROUP BY sname

  Q: Find the span of each star’s career
    Look up first and last movies

Account types again
  Acc(name, bal, type)
  Q: Who has the largest balance of each type?
  Can we do this with grouping/aggregation?
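One possible answer, sketched with Python's sqlite3 module: grouping by type gives the largest balance per type, but name is not a grouped field, so the holder has to be recovered by joining the grouped result back to Acc. The sample rows and the alias names (a, m, maxbal) are hypothetical, and this is only one of several ways to write the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Acc (name TEXT, bal INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO Acc VALUES (?, ?, ?)",
    [  # hypothetical rows
        ("Ann", 500, "checking"),
        ("Bob", 900, "checking"),
        ("Cy", 300, "savings"),
        ("Di", 700, "savings"),
    ],
)

# The inner query groups by type; the join matches each account
# against the maximum balance of its own type.
holders = conn.execute("""
    SELECT a.name, a.type, a.bal
    FROM Acc a
    JOIN (SELECT type, MAX(bal) AS maxbal
          FROM Acc
          GROUP BY type) m
      ON a.type = m.type AND a.bal = m.maxbal
    ORDER BY a.type
""").fetchall()
conn.close()

print(holders)
```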

G & A for constructed relations
  Movie(title, year, producerSsn, length)
  MovieExec(name, ssn, netWorth)
  Can do the same thing for larger, non-atomic relations
  Q: How many mins. of film did each producer make?

    SELECT name, sum(length) total
    FROM Movie, MovieExec
    WHERE producerSsn = ssn
    GROUP BY name

  What happens to non-producer movie-execs?
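A sketch of what happens to non-producer execs, using Python's sqlite3 module. All rows are hypothetical; Casey is an exec who produced nothing. The second query, with a left outer join, is an addition of this sketch for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movie (title TEXT, year INTEGER, producerSsn INTEGER, length INTEGER)"
)
conn.execute("CREATE TABLE MovieExec (name TEXT, ssn INTEGER, netWorth INTEGER)")
conn.executemany(
    "INSERT INTO MovieExec VALUES (?, ?, ?)",
    [("Avery", 1, 30000000), ("Blake", 2, 10000000), ("Casey", 3, 20000000)],
)
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?)",
    [("Film A", 1977, 1, 120), ("Film B", 1980, 1, 105), ("Film C", 1975, 2, 124)],
)

# The slide's query: Casey matches no Movie row, so Casey forms no group.
inner = conn.execute("""
    SELECT name, sum(length) total
    FROM Movie, MovieExec
    WHERE producerSsn = ssn
    GROUP BY name
    ORDER BY name
""").fetchall()

# With a left outer join Casey survives as a dangling tuple,
# and sum over only-null lengths is NULL.
outer = conn.execute("""
    SELECT name, sum(length) total
    FROM MovieExec LEFT OUTER JOIN Movie ON producerSsn = ssn
    GROUP BY name
    ORDER BY name
""").fetchall()
conn.close()

print(inner)
print(outer)
```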

HAVING clauses
  Sometimes want to limit which rows may be grouped
  Q: How many mins. of film did each rich producer make?
    Rich = netWorth > 10000000

    SELECT name, sum(length) total
    FROM Movie, MovieExec
    WHERE producerSsn = ssn
    GROUP BY name
    HAVING netWorth > 10000000

  Q: Is HAVING necessary here?
  A: No, could just add rich req. to WHERE

HAVING clauses
  Sometimes want to limit which rows may be grouped
  Q: How many mins. of film did each old producer make?
    Old = made movies before 1930

    SELECT name, sum(length) total
    FROM Movie, MovieExec
    WHERE producerSsn = ssn
    GROUP BY name
    HAVING min(year) < 1930

  Q: Is HAVING necessary here?
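A sketch suggesting why HAVING is needed for the "old producer" question, using Python's sqlite3 module. All rows are hypothetical. Moving the year test into WHERE changes the answer, because it drops individual movies before grouping instead of dropping whole groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movie (title TEXT, year INTEGER, producerSsn INTEGER, length INTEGER)"
)
conn.execute("CREATE TABLE MovieExec (name TEXT, ssn INTEGER, netWorth INTEGER)")
conn.executemany(
    "INSERT INTO MovieExec VALUES (?, ?, ?)",
    [("Avery", 1, 30000000), ("Blake", 2, 10000000)],
)
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?)",
    [("Film A", 1925, 1, 90), ("Film B", 1940, 1, 100), ("Film C", 1950, 2, 110)],
)

# HAVING tests each group: Avery's earliest movie is from 1925, so the
# whole group is kept and both of Avery's movies are summed.
with_having = conn.execute("""
    SELECT name, sum(length) total
    FROM Movie, MovieExec
    WHERE producerSsn = ssn
    GROUP BY name
    HAVING min(year) < 1930
""").fetchall()

# WHERE tests each row: only the 1925 movie survives to be summed.
with_where = conn.execute("""
    SELECT name, sum(length) total
    FROM Movie, MovieExec
    WHERE producerSsn = ssn AND year < 1930
    GROUP BY name
""").fetchall()
conn.close()

print(with_having)
print(with_where)
```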

Review
  Examples from sqlzoo.net

    SELECT L
    FROM R1, …, Rn
    WHERE C

    π_L(σ_C(R1 × … × Rn))