
Murach’s SQL Server 2008, C6 © 2008, Mike Murach & Associates, Inc. Slide 1

Chapter 6

How to code subqueries

Objectives

Applied

Code SELECT statements that require subqueries.

Code SELECT statements that use common table expressions (CTEs) to define the subqueries.

Knowledge

Describe the way subqueries can be used in the WHERE, HAVING, FROM and SELECT clauses of a SELECT statement.

Describe the difference between a correlated subquery and a noncorrelated subquery.

Describe the use of common table expressions (CTEs).

Objectives (continued)

Explain the difference between a correlated subquery and a noncorrelated subquery.

Given a SELECT statement that uses a subquery, explain how the result set of the subquery will affect the final result set.

How to use subqueries

A subquery is a SELECT statement that’s coded within another SQL statement.

A subquery can return a single value, a result set that contains a single column, or a result set that contains one or more columns.

A subquery that returns a single value can be coded, or introduced, anywhere an expression is allowed.

A subquery that returns a single column can be introduced in place of a list of values, such as the values for an IN phrase.

A subquery that returns one or more columns can be introduced in place of a table in the FROM clause.

A subquery that’s used in a WHERE or HAVING clause is called a subquery search condition or a subquery predicate. This is the most common use for a subquery.

How to use subqueries (continued)

The syntax for a subquery is the same as for a standard SELECT statement. However, a subquery doesn’t typically include the GROUP BY or HAVING clause, and it can’t include an ORDER BY clause unless the TOP phrase is used.

Subqueries can be nested within other subqueries. However, subqueries that are nested more than two or three levels deep can be difficult to read and can result in poor performance.
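
The ORDER BY restriction described above can be illustrated with a minimal sketch. The table variable @Invoices and its four rows are hypothetical sample data, not the chapter's actual Invoices table. The subquery is allowed to contain ORDER BY only because it also contains TOP.

```sql
-- Hypothetical sample data standing in for the Invoices table.
DECLARE @Invoices TABLE (InvoiceID int, VendorID int, InvoiceTotal money);
INSERT INTO @Invoices VALUES
    (1, 10, 100.00), (2, 10, 500.00), (3, 20, 50.00), (4, 30, 900.00);

-- The subquery may use ORDER BY only because it also uses TOP.
-- It returns the vendors behind the two largest invoices (vendors 30 and 10).
SELECT InvoiceID, VendorID, InvoiceTotal
FROM @Invoices
WHERE VendorID IN
    (SELECT TOP 2 VendorID
     FROM @Invoices
     ORDER BY InvoiceTotal DESC)
ORDER BY InvoiceID;
```

With this sample data, the outer query returns invoices 1, 2, and 4. Removing TOP 2 from the subquery while keeping its ORDER BY clause should cause SQL Server to reject the statement.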

Four ways to introduce a subquery in a SELECT statement

1. In a WHERE clause as a search condition

2. In a HAVING clause as a search condition

3. In the FROM clause as a table specification

4. In the SELECT clause as a column specification
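
Of the four placements listed above, the HAVING clause is the one that the later examples do not show, so here is a minimal sketch of it. The table variable @Invoices and its rows are hypothetical sample data used only to make the example runnable.

```sql
-- Hypothetical sample data standing in for the Invoices table.
DECLARE @Invoices TABLE (InvoiceID int, VendorID int, InvoiceTotal money);
INSERT INTO @Invoices VALUES
    (1, 10, 100.00), (2, 10, 500.00), (3, 20, 50.00), (4, 30, 900.00);

-- The subquery in the HAVING clause returns a single value (the overall
-- average, 387.50 here) that each group's average is compared against.
SELECT VendorID, AVG(InvoiceTotal) AS AvgInvoice
FROM @Invoices
GROUP BY VendorID
HAVING AVG(InvoiceTotal) >
    (SELECT AVG(InvoiceTotal)
     FROM @Invoices)
ORDER BY VendorID;
```

With this sample data, only vendor 30 has an average invoice total above the overall average, so it is the only row returned.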

A SELECT statement that uses a subquery in the WHERE clause

SELECT InvoiceNumber, InvoiceDate, InvoiceTotal
FROM Invoices
WHERE InvoiceTotal >
    (SELECT AVG(InvoiceTotal)
     FROM Invoices)
ORDER BY InvoiceTotal

The value returned by the subquery

1879.7413

The result set

(21 rows)

How subqueries compare to joins

Like a join, a subquery can be used to code queries that work with two or more tables.

Most subqueries can be restated as joins and most joins can be restated as subqueries.

Advantages of joins

The result set of a join can include columns from both tables. The result set of a query with a subquery can only include columns from the table named in the outer query, not in the subquery.

A join tends to be more intuitive when it uses an existing relationship between the two tables, such as a primary key to foreign key relationship.

A query with a join typically performs faster than the same query with a subquery, especially if the query uses only inner joins.

Advantages of subqueries

You can use a subquery to pass an aggregate value to the outer query.

A subquery tends to be more intuitive when it uses an ad hoc relationship between the two tables.

Long, complex queries can sometimes be easier to code using subqueries.

A query that uses an inner join

SELECT InvoiceNumber, InvoiceDate, InvoiceTotal
FROM Invoices JOIN Vendors
    ON Invoices.VendorID = Vendors.VendorID
WHERE VendorState = 'CA'
ORDER BY InvoiceDate

The same query restated with a subquery

SELECT InvoiceNumber, InvoiceDate, InvoiceTotal
FROM Invoices
WHERE VendorID IN
    (SELECT VendorID
     FROM Vendors
     WHERE VendorState = 'CA')
ORDER BY InvoiceDate

The result set returned by both queries

(40 rows)

The syntax of a WHERE clause that uses an IN phrase with a subquery

WHERE test_expression [NOT] IN (subquery)

How to use subqueries with the IN operator

You can introduce a subquery with the IN operator to provide the list of values that are tested against the test expression.

When you use the IN operator, the subquery must return a single column of values.

A query that uses the NOT IN operator with a subquery can typically be restated using an outer join.

A query that returns vendors without invoices

SELECT VendorID, VendorName, VendorState
FROM Vendors
WHERE VendorID NOT IN
    (SELECT DISTINCT VendorID
     FROM Invoices)

The result of the subquery

(34 rows)

The result set of vendors without invoices

(88 rows)

The query restated without a subquery

SELECT Vendors.VendorID, VendorName, VendorState
FROM Vendors LEFT JOIN Invoices
    ON Vendors.VendorID = Invoices.VendorID
WHERE Invoices.VendorID IS NULL

The syntax of a WHERE clause that compares an expression with the value returned by a subquery

WHERE expression comparison_operator [SOME|ANY|ALL] (subquery)

How to compare the result of a subquery with an expression

You can use a comparison operator in a search condition to compare an expression with the results of a subquery.

If you code a search condition without the SOME, ANY, and ALL keywords, the subquery must return a single value.

A query that returns invoices with a balance due less than the average

SELECT InvoiceNumber, InvoiceDate, InvoiceTotal,
    InvoiceTotal - PaymentTotal - CreditTotal AS BalanceDue
FROM Invoices
WHERE InvoiceTotal - PaymentTotal - CreditTotal > 0
    AND InvoiceTotal - PaymentTotal - CreditTotal <
        (SELECT AVG(InvoiceTotal - PaymentTotal - CreditTotal)
         FROM Invoices
         WHERE InvoiceTotal - PaymentTotal - CreditTotal > 0)
ORDER BY InvoiceTotal DESC

The result of the subquery (the average balance due)

2910.9472

The result set

(9 rows)

How the ALL keyword works

Condition          Equivalent expression     Description
x > ALL (1, 2)     x > 2                     x must be greater than the maximum value returned by the subquery.
x < ALL (1, 2)     x < 1                     x must be less than the minimum value returned by the subquery.
x = ALL (1, 2)     (x = 1) AND (x = 2)       This condition evaluates to True only if the subquery returns a single value or if all the values returned by the subquery are the same.
x <> ALL (1, 2)    (x <> 1) AND (x <> 2)     This condition is equivalent to: x NOT IN (1, 2)

How the ALL keyword works (continued)

The ALL keyword tests that a comparison condition is true for all of the values returned by a subquery.

If no rows are returned by the subquery, a comparison that uses ALL is always true.

If all of the rows returned by the subquery contain a null value, a comparison that uses ALL is always false.
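
The three cases described above can be tried with a minimal sketch. The table variables @T and @S and their values are hypothetical, chosen to match the (1, 2) list used in the table of equivalent expressions.

```sql
-- Hypothetical values: x comes from @T, the subquery reads y from @S.
DECLARE @T TABLE (x int);
DECLARE @S TABLE (y int);
INSERT INTO @T VALUES (1), (2), (3);
INSERT INTO @S VALUES (1), (2);

-- Non-empty subquery: x must exceed every value (1 and 2),
-- so only x = 3 qualifies.
SELECT x FROM @T WHERE x > ALL (SELECT y FROM @S);

-- Empty subquery: the ALL comparison is true for every row,
-- so 1, 2, and 3 are all returned.
SELECT x FROM @T WHERE x > ALL (SELECT y FROM @S WHERE y > 100);

-- Subquery that returns only a null value: the comparison is never true,
-- so no rows are returned.
SELECT x FROM @T WHERE x > ALL (SELECT CAST(NULL AS int) AS y);
```

The second query is the case that tends to surprise: a filter in the subquery that happens to match nothing makes the outer condition succeed for every row.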

A query that returns invoices that are larger than the largest invoice for vendor 34

SELECT VendorName, InvoiceNumber, InvoiceTotal
FROM Invoices JOIN Vendors
    ON Invoices.VendorID = Vendors.VendorID
WHERE InvoiceTotal > ALL
    (SELECT InvoiceTotal
     FROM Invoices
     WHERE VendorID = 34)
ORDER BY VendorName

The result of the subquery (the invoice totals for vendor 34)

The result set

(25 rows)

How the ANY and SOME keywords work

Condition          Equivalent expression     Description
x > ANY (1, 2)     x > 1                     x must be greater than the minimum value returned by the subquery.
x < ANY (1, 2)     x < 2                     x must be less than the maximum value returned by the subquery.
x = ANY (1, 2)     (x = 1) OR (x = 2)        This condition is equivalent to: x IN (1, 2)
x <> ANY (1, 2)    (x <> 1) OR (x <> 2)      This condition will evaluate to True for any non-empty result set containing at least one non-null value that isn’t equal to x.

How the ANY and SOME keywords work (continued)

The ANY or SOME keyword tests that a condition is true for at least one of the values returned by a subquery. SOME is the ANSI-standard keyword, but ANY is more commonly used.

If no rows are returned by the subquery or all of the rows returned by the subquery contain a null value, a comparison that uses ANY or SOME is always false.
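
A minimal sketch of these rules follows. As before, the table variables @T and @S and their values are hypothetical and mirror the (1, 2) list in the table of equivalent expressions.

```sql
-- Hypothetical values: x comes from @T, the subquery reads y from @S.
DECLARE @T TABLE (x int);
DECLARE @S TABLE (y int);
INSERT INTO @T VALUES (1), (2), (3);
INSERT INTO @S VALUES (1), (2);

-- x must exceed at least one value, which means exceeding the minimum (1),
-- so 2 and 3 are returned.
SELECT x FROM @T WHERE x > ANY (SELECT y FROM @S);

-- SOME can be used in place of ANY and returns the same rows.
SELECT x FROM @T WHERE x > SOME (SELECT y FROM @S);

-- Empty subquery: the comparison is false for every row,
-- so no rows are returned.
SELECT x FROM @T WHERE x > ANY (SELECT y FROM @S WHERE y > 100);
```

Note that the empty-subquery case behaves the opposite way from ALL, which is true for every row when the subquery returns nothing.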

A query that returns invoices smaller than the largest invoice for vendor 115

SELECT VendorName, InvoiceNumber, InvoiceTotal
FROM Vendors JOIN Invoices
    ON Vendors.VendorID = Invoices.VendorID
WHERE InvoiceTotal < ANY
    (SELECT InvoiceTotal
     FROM Invoices
     WHERE VendorID = 115)

The result of the subquery (the invoice totals for vendor 115)

The result set

(17 rows)

How to code correlated subqueries

A correlated subquery is a subquery that is executed once for each row processed by the outer query. A noncorrelated subquery is executed only once.

A correlated subquery refers to the value of a column in the outer query. Because that value varies depending on the row that’s being processed, each execution of the subquery returns a different result.

To refer to a value in the outer query, a correlated subquery uses a qualified column name that includes the table name from the outer query.

If the subquery uses the same table as the outer query, an alias, or correlation name, must be assigned to one of the tables to remove ambiguity.

A query that uses a correlated subquery to calculate the average invoice total for each vendor

SELECT VendorID, InvoiceNumber, InvoiceTotal
FROM Invoices AS Inv_Main
WHERE InvoiceTotal >
    (SELECT AVG(InvoiceTotal)
     FROM Invoices AS Inv_Sub
     WHERE Inv_Sub.VendorID = Inv_Main.VendorID)
ORDER BY VendorID, InvoiceTotal

The value returned by the subquery for vendor 95

28.5016

The result set for the query with the correlated subquery

(36 rows)

The syntax of a subquery that uses the EXISTS operator

WHERE [NOT] EXISTS (subquery)

How to use the EXISTS operator

You can use the EXISTS operator to test that one or more rows are returned by the subquery.

You can use NOT along with the EXISTS operator to test that no rows are returned by the subquery.

A subquery with the EXISTS operator doesn’t return any rows. It returns an indication of whether any rows meet the condition.

Because no rows are returned by the subquery, it doesn’t matter what columns you specify in the SELECT clause.

The EXISTS operator is used most often with correlated subqueries.
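
As a minimal sketch of the positive form of the operator, the following query lists vendors that have at least one invoice. The table variables @Vendors and @Invoices and their rows are hypothetical sample data. Table variables can't be used as column qualifiers, so the aliases V and I are needed for the correlation.

```sql
-- Hypothetical sample data standing in for the Vendors and Invoices tables.
DECLARE @Vendors TABLE (VendorID int, VendorName varchar(50));
DECLARE @Invoices TABLE (InvoiceID int, VendorID int, InvoiceTotal money);
INSERT INTO @Vendors VALUES (10, 'Vendor A'), (20, 'Vendor B'), (30, 'Vendor C');
INSERT INTO @Invoices VALUES (1, 10, 100.00), (2, 10, 500.00), (3, 30, 900.00);

-- EXISTS only tests whether the correlated subquery finds a row, so the
-- select list (the constant 1 here) has no effect on the result.
SELECT VendorID, VendorName
FROM @Vendors AS V
WHERE EXISTS
    (SELECT 1
     FROM @Invoices AS I
     WHERE I.VendorID = V.VendorID)
ORDER BY VendorID;
```

With this sample data, vendors 10 and 30 are returned. Changing EXISTS to NOT EXISTS returns vendor 20 instead.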

A query that returns vendors without invoices

SELECT VendorID, VendorName, VendorState
FROM Vendors
WHERE NOT EXISTS
    (SELECT *
     FROM Invoices
     WHERE Invoices.VendorID = Vendors.VendorID)

The result set

(88 rows)

How to code subqueries in the FROM clause

A subquery that’s coded in the FROM clause returns a result set called a derived table.

When you create a derived table, you must assign an alias to it. Then, you can use the derived table within the outer query just as you would any other table.

When you code a subquery in the FROM clause, you must assign names to any calculated values in the result set.

Derived tables are most useful when you need to further summarize the results of a summary query.
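
The idea of summarizing a summary can be sketched as follows. The table variable @Invoices, its rows, and the names VendorSummary and VendorTotal are hypothetical and exist only for this illustration.

```sql
-- Hypothetical sample data standing in for the Invoices table.
DECLARE @Invoices TABLE (InvoiceID int, VendorID int, InvoiceTotal money);
INSERT INTO @Invoices VALUES
    (1, 10, 100.00), (2, 10, 500.00), (3, 20, 50.00), (4, 30, 900.00);

-- Inner query: one summary row per vendor. The calculated column must be
-- named (VendorTotal) and the derived table must have an alias (VendorSummary).
-- Outer query: summarizes those per-vendor totals a second time.
SELECT COUNT(*) AS VendorCount,
       AVG(VendorTotal) AS AvgVendorTotal,
       MAX(VendorTotal) AS MaxVendorTotal
FROM (SELECT VendorID, SUM(InvoiceTotal) AS VendorTotal
      FROM @Invoices
      GROUP BY VendorID) AS VendorSummary;
```

With this sample data, the derived table holds three rows (vendor totals of 600, 50, and 900), and the outer query collapses them into a single summary row.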

A query that uses a derived table to retrieve the top 5 vendors by average invoice total

SELECT Invoices.VendorID, MAX(InvoiceDate) AS LatestInv,
    AVG(InvoiceTotal) AS AvgInvoice
FROM Invoices JOIN
    (SELECT TOP 5 VendorID, AVG(InvoiceTotal) AS AvgInvoice
     FROM Invoices
     GROUP BY VendorID
     ORDER BY AvgInvoice DESC) AS TopVendor
    ON Invoices.VendorID = TopVendor.VendorID
GROUP BY Invoices.VendorID
ORDER BY LatestInv DESC

The derived table generated by the subquery that retrieves the top 5 vendors

The result set

How to code subqueries in the SELECT clause

When you code a subquery for a column specification in the SELECT clause, the subquery must return a single value.

A subquery that’s coded within a SELECT clause is usually a correlated subquery.

Subqueries are seldom coded in the SELECT clause. Joins are used instead because they’re generally faster and more readable.

A query that uses a correlated subquery in its SELECT clause to retrieve the most recent invoice for each vendor

SELECT DISTINCT VendorName,
    (SELECT MAX(InvoiceDate)
     FROM Invoices
     WHERE Invoices.VendorID = Vendors.VendorID) AS LatestInv
FROM Vendors
ORDER BY LatestInv DESC

The result set

(122 rows)

The same query using a join instead of a subquery in the SELECT clause

SELECT VendorName, MAX(InvoiceDate) AS LatestInv
FROM Vendors JOIN Invoices
    ON Vendors.VendorID = Invoices.VendorID
GROUP BY VendorName
ORDER BY LatestInv DESC

A procedure for building complex queries

1. State the problem to be solved by the query in English.

2. Use pseudocode to outline the query. The pseudocode should identify the subqueries used by the query and the data they return. It should also include aliases used for any derived tables.

3. If necessary, use pseudocode to outline each subquery.

4. Code the subqueries and test them to be sure that they return the correct data.

5. Code and test the final query.

The problem to be solved

Which vendor in each state has the largest invoice total?

Pseudocode for the query

SELECT Summary1.VendorState, Summary1.VendorName, TopInState.SumOfInvoices
FROM (Derived table returning VendorState, VendorName, SumOfInvoices) AS Summary1
    JOIN (Derived table returning VendorState, MAX(SumOfInvoices)) AS TopInState
    ON Summary1.VendorState = TopInState.VendorState AND
       Summary1.SumOfInvoices = TopInState.SumOfInvoices
ORDER BY Summary1.VendorState

Pseudocode for the TopInState subquery

SELECT Summary2.VendorState, MAX(Summary2.SumOfInvoices)
FROM (Derived table returning VendorState, VendorName, SumOfInvoices) AS Summary2
GROUP BY Summary2.VendorState

The code for the Summary1 and Summary2 subqueries

SELECT V_Sub.VendorState, V_Sub.VendorName,
    SUM(I_Sub.InvoiceTotal) AS SumOfInvoices
FROM Invoices AS I_Sub JOIN Vendors AS V_Sub
    ON I_Sub.VendorID = V_Sub.VendorID
GROUP BY V_Sub.VendorState, V_Sub.VendorName

The result of the Summary1 and Summary2 subqueries

(34 rows)

The result of the TopInState subquery

(10 rows)

All of the code for the query with three subqueries

SELECT Summary1.VendorState, Summary1.VendorName, TopInState.SumOfInvoices
FROM
    (SELECT V_Sub.VendorState, V_Sub.VendorName,
         SUM(I_Sub.InvoiceTotal) AS SumOfInvoices
     FROM Invoices AS I_Sub JOIN Vendors AS V_Sub
         ON I_Sub.VendorID = V_Sub.VendorID
     GROUP BY V_Sub.VendorState, V_Sub.VendorName) AS Summary1
    JOIN
    (SELECT Summary2.VendorState,
         MAX(Summary2.SumOfInvoices) AS SumOfInvoices
     FROM
         (SELECT V_Sub.VendorState, V_Sub.VendorName,
              SUM(I_Sub.InvoiceTotal) AS SumOfInvoices
          FROM Invoices AS I_Sub JOIN Vendors AS V_Sub
              ON I_Sub.VendorID = V_Sub.VendorID
          GROUP BY V_Sub.VendorState, V_Sub.VendorName) AS Summary2
     GROUP BY Summary2.VendorState) AS TopInState
    ON Summary1.VendorState = TopInState.VendorState AND
       Summary1.SumOfInvoices = TopInState.SumOfInvoices
ORDER BY Summary1.VendorState

The result set of the query with three subqueries

(10 rows)

The syntax of a CTE

WITH cte_name1 AS (query_definition1)
[, cte_name2 AS (query_definition2)]
[...]
sql_statement

How to code a CTE

A common table expression (CTE) is an expression that creates one or more temporary tables that can be used by the following query.

To use a CTE with a query, you code the WITH keyword followed by the definition of the CTE. Then, immediately after the CTE, you code the statement that uses it.

Separate multiple CTEs with commas. Each CTE can refer to itself and any previously defined CTEs in the same WITH clause.

Two CTEs and a query that uses them

WITH Summary AS
(
    SELECT VendorState, VendorName, SUM(InvoiceTotal) AS SumOfInvoices
    FROM Invoices JOIN Vendors
        ON Invoices.VendorID = Vendors.VendorID
    GROUP BY VendorState, VendorName
),
TopInState AS
(
    SELECT VendorState, MAX(SumOfInvoices) AS SumOfInvoices
    FROM Summary
    GROUP BY VendorState
)
SELECT Summary.VendorState, Summary.VendorName, TopInState.SumOfInvoices
FROM Summary JOIN TopInState
    ON Summary.VendorState = TopInState.VendorState AND
       Summary.SumOfInvoices = TopInState.SumOfInvoices
ORDER BY Summary.VendorState

The result set

(10 rows)

How to code a recursive CTE

A recursive query is a query that is able to loop through a result set and perform processing to return a final result set.

A recursive CTE can be used to create a recursive query.

A recursive CTE must contain at least two query definitions, an anchor member and a recursive member, and these members must be connected by the UNION ALL operator.
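
The anchor member and recursive member can be seen in isolation in a minimal sketch that needs no tables. The CTE name Numbers, the column N, and the limit of 5 are all hypothetical choices made for illustration.

```sql
-- Hypothetical example: generate the integers 1 through 5.
WITH Numbers AS
(
    -- Anchor member: supplies the starting row.
    SELECT 1 AS N
    UNION ALL
    -- Recursive member: refers to the CTE itself and runs repeatedly
    -- until its WHERE clause stops producing new rows.
    SELECT N + 1
    FROM Numbers
    WHERE N < 5
)
SELECT N
FROM Numbers
ORDER BY N;
```

The query returns the five rows 1 through 5. The WHERE clause in the recursive member is what ends the recursion, so a recursive CTE generally needs some condition of this kind.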

The Employees table

A recursive CTE that returns hierarchical data

WITH EmployeesCTE AS
(
    -- Anchor member
    SELECT EmployeeID,
        FirstName + ' ' + LastName AS EmployeeName,
        1 AS Rank
    FROM Employees
    WHERE ManagerID IS NULL
    UNION ALL
    -- Recursive member
    SELECT Employees.EmployeeID,
        FirstName + ' ' + LastName,
        Rank + 1
    FROM Employees JOIN EmployeesCTE
        ON Employees.ManagerID = EmployeesCTE.EmployeeID
)
SELECT *
FROM EmployeesCTE
ORDER BY Rank, EmployeeID

The final result set