Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 13: Query Processing

Database System Concepts, 5th Ed.

©Silberschatz, Korth and SudarshanSee www.db-book.com for conditions on re-use

Chapter 13: Query ProcessingChapter 13: Query Processing

©Silberschatz, Korth and Sudarshan13.2Database System Concepts - 5th Edition, Aug 27, 2005.

Chapter 13: Query ProcessingChapter 13: Query Processing

Overview

Measures of Query Cost

Join Operation


Basic Steps in Query ProcessingBasic Steps in Query Processing

1. Parsing and translation

2. Optimization

3. Evaluation


Basic Steps in Query Processing Basic Steps in Query Processing (Cont.)(Cont.)

Parsing and translation

translate the query into its internal form. This is then translated into relational algebra.

Parser checks syntax, verifies relations

Evaluation

The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.


Basic Steps in Query Processing : Basic Steps in Query Processing : OptimizationOptimization

A relational algebra expression may have many equivalent expressions

E.g., balance2500(balance(account)) is equivalent to

balance(balance2500(account))

Each relational algebra operation can be evaluated using one of several different algorithms

Correspondingly, a relational-algebra expression can be evaluated in many ways.

Annotated expression specifying detailed evaluation strategy is called an evaluation-plan.

E.g., can use an index on balance to find accounts with balance < 2500,

or can perform complete relation scan and discard accounts with balance 2500


Evaluation PlanEvaluation Plan

An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated.


Basic Steps: Optimization (Cont.)Basic Steps: Optimization (Cont.)

Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest cost.

Cost is estimated using statistical information from the database catalog

e.g. number of tuples in each relation, size of tuples, etc.


Measures of Query CostMeasures of Query Cost

Cost is generally measured as total elapsed time for answering query

Many factors contribute to time cost

disk accesses, CPU, or even network communication

Typically disk access is the predominant cost, and is also relatively easy to estimate. Measured by taking into account

Number of seeks * average-seek-cost

Number of blocks read * average-block-read-cost

Number of blocks written * average-block-write-cost


Measures of Query Cost (Cont.)Measures of Query Cost (Cont.)

For simplicity we just use the number of block transfers from disk as the cost measures

tT – time to transfer one block

Cost for b block transfers b * tT


Join OperationJoin Operation

Several different algorithms to implement joins

1. Nested-loop join

2. Block nested-loop join

3. Indexed nested-loop join

4. Merge-join

5. Hash-join

Choice based on cost estimate


Join OperationJoin Operation

- Join StrategiesJoin Strategies

Consider

deposit(branch-name, account-#, customer-name, balance)

customer(customer-name, c-st, c-city)

Consider deposit customer

= 10,000

= 200

Simple Iteration (Nested-loop join) :

Assume no indices.

It must examine 10,000 * 200 = 2,000,000 pairs of tuples

Expensive since it examines every pair of tuples in the two relations.

dn

cn


1. Nested-Loop Join1. Nested-Loop Join

(case 1)(case 1)

for each tuple d in deposit do

begin

for each tuple c in customer do

begin

test pair(d,c) to see if a tuple should

be added to the result

end

end

If the tuples of deposit are stored together physically (assume 20

tuples fit in one block), reading deposit requires 10,000/20=500

block accesses

(cf. in the worst case, 10,000 block access)


1. Nested-Loop Join (Cont.)1. Nested-Loop Join (Cont.)

As for the customer, 200/20 = 10 accesses per tuple of deposit if it is stored together physically.

Thus 10 * 10,000 = 100,000 block accesses to customer are needed to process the query.

∴ total : 100,500 block accesses


1. Nested-Loop Join (Cont.)1. Nested-Loop Join (Cont.)

(case 2)(case 2)

Assume that customer in the outer loop and deposit in the inner loop.

100,000 accesses to deposit (200 * (10,000/20) = 100,000) + 10

accesses to read the customer (200/20 = 10)

∴ total 100,010 block accesses

Thus the choice of inner and outer loop relations can have a

dramatic effect on the cost of evaluating queries.


2. Block Nested-Loop Join2. Block Nested-Loop Join

Variant of nested-loop join in which every block of inner relation is paired with every block of outer relation.

Block-Oriented Iteration :

for each block Bd of deposit do

begin

for each block Bc of customer do

begin

for each tuple d in Bd do

begin

for each tuple c in Bc do

begin

test pair(d,c) to see if a tuple

should be added to the result

end

end

end

end


2. Block Nested-Loop Join (Cont.)2. Block Nested-Loop Join (Cont.)

per-block basis(not per-tuple basis) saving in block accesses.

Assume deposit & customer are stored together physically.

Instead of reading the customer relation once for each tuple of

deposit, we read the customer relation one for each block of

deposit.

5,500 accesses = ( 5,000(=500(200/20) ) accesses to customer block + 500(=10,000/20) accesses to deposit blocks)


2. Block Nested-Loop Join (Cont.)2. Block Nested-Loop Join (Cont.)

Think customer : outer loop

deposit : inner loop

(10 (10,000/20) = 10 500

= 5,000 access to deposit + 10 (200/20 = 10) accesses to customer)

= 5,000+10 =5,010 accesses.

A major advantage to use of the smaller relation(customer) in

the inner loop is that it may be possible to store the entire

relation in main memory temporarily.

If customer fit in M.M, 500 block access to read deposit + 10

blocks to read customer 510 accesses


3. Merge-Join3. Merge-Join

Merge-Join :

Assume that both relations are in sorted order on the join attributes and are stored together physically

deposit customer 510 block accesses

Merge-Join allows us to compute the join by reading each block exactly once.

500 block accesses to read deposit (10,000/20 = 500) + 10 block accesses to read customer (200/20 = 10)

510 block accesses


3. Merge-Join (Cont.)3. Merge-Join (Cont.)

Algorithms :

- A group of tuples of one relation with the same value on the join attributes is read.

- The corresponding tuples of the other relation are read.

- Since the relations are in sorted order, tuples with the same value on the join attributes are in consecutive order. This allows us to read each tuple only once.


3. Merge-Join (Cont.)3. Merge-Join (Cont.)


4. Indexed nested-loop join4. Indexed nested-loop join

Simple iteration (Nested-loop join)

deposit customer 10,000 X 200 = 2,000,000

block accesses (no physical clustering of tuples)

Merge-join requires sorted order.

Block-oriented iteration requires that tuples of each relation be stored physically together.

But there are no restrictions on the simple iteration (nested-loop join).


4. Indexed nested-loop join4. Indexed nested-loop join

If an index exists on customer for customer-name, then 10,000

block accesses to read deposit + 10,000 3 block accesses

( 2 for index block, 1 to read the customer tuple itself)

40,000 block accesses

Given a tuple d in deposit, it is no longer necessary to read the

entire customer relation. Instead, the index is used to look up

tuples in customer for which the customer-name value is

d[customer-name]. Only one tuples in customer table for which

d[c-name] = c[c-name] since c-name is a primary key for

customer.


5. Hash-Join5. Hash-Join

Hash Join :

A hash function h is used to hash tuples of both relations on the

basis of join attributes.

Let d be a tuple in deposit, c be a tuple in customer.

If h(c) ≠ h(d), then c & d must have different values for

customer-name. If h(c) = h(d), check.


5. Hash-Join (Cont.)5. Hash-Join (Cont.)

h: customer-name { 0, 1, 2, .... , Max }

denote buckets of pointers to customer.

denote buckets of pointers to deposit.

rd : the set of deposit tuples that hash to bucket i.

rc - the set of customer tuples that hash to bucket i.

rd rc

Total 510(for hashing) + 510(perform rd rc) = 1,020 block accesses. Assume that deposit and customer tuples are stored together physically, respectively.

max10 ,...,, ccc HHH

max10 ,...,, ddd HHH


5. Hash-Join (Cont.)5. Hash-Join (Cont.)


Three-Way JoinThree-Way Join

Consider

branch(branch-name, assets, b-city)

deposit(branch-name, account-#, customer-name, balance)

customer(customer-name, c-st, c-city)

Consider

branch deposit customer

Where

ndeposit = 10,000

ncustomer = 200

n branch = 50

Consider a choice of which join to compute first.



It is associative :

Estimation of the size of a natural join

Let and be relations

① If then

② If is a key for

then the number of tuples is the number of tuples in .

(a tuple of will join with exactly one tuple from )

Ex)

)( 11 Rr )( 22 Rr

21 RR

21 RR ,1R

1r2r

2r



Strategy 1.

① deposit customer first

since c-name is a key for customer, at most 10,000 tuples.

② build an index an branch for b-name.

compute branch (deposit customer)

For each t ∈ deposit customer, look up the tuple in

branch with a branch-name value of t[branch-name].

Since b-name is a key for branch, examine only one branch

tuples for each of 10,000 tuples in (deposit customer).

※ If R1 ∩ R2 is a key for R1,

the # of tuple in r1 r2 ≤ the # of tuples in r2.



Strategy 2.

50 * 10,000 * 200 possibilities, without constructing indices at all.

Strategy 3.

build two indices :

on branch for b-name.

on customer for c-name.

Consider each t ∈ deposit, look up the corresponding tuple in customer and the corresponding tuple in branch.

Thus, we examine each tuple of deposit exactly once.

Database System Concepts, 5th Ed.

©Silberschatz, Korth and SudarshanSee www.db-book.com for conditions on re-use

End of Chapter 13End of Chapter 13

Documents

Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 13: Query Processing