CPSC404, Laks V.S. Lakshmanan 1 Relational Query Optimization Chapter 15 Ramakrishnan & Gehrke (Sections 15.1-15.6)


Relational Query Optimization

Chapter 15 Ramakrishnan & Gehrke

(Sections 15.1-15.6)


What you will learn from this lecture

• Cost-based query optimization (System R)
– Plan space considered
– How cost is estimated
– Selectivity estimation using histograms


Highlights of System R Optimizer

• Impact:
– Most widely used currently; works well for < 10 joins.
• Cost estimation: approximations used.
– Statistics, maintained in system catalogs, are used to estimate the cost of operations and result sizes.
– System R considers a combination of CPU and I/O costs.
• Plan space: too large, must be pruned.
– Only the space of left-deep plans is considered.
– Left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation.
– Cartesian products are avoided.


Schema for Examples

• Ratings(uid: integer, sid: integer, time: date, rating: integer)
– Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
• Songs(sid: integer, sname: string, genre: string, year: date)
– Each tuple is 50 bytes long, 80 tuples per page, 500 pages.


Query Blocks: Units of Optimization

• A SQL query is parsed into a collection of query blocks, and these are optimized one block at a time.
• Nested blocks are usually treated as calls to a subroutine, made once per outer tuple. (This is an over-simplification, but serves for now.)

SELECT S.sname
FROM Songs S
WHERE S.sid IN
    (SELECT R.sid
     FROM Ratings R
     WHERE R.rating > 7 AND R.time > S.year)

Outer block: the SELECT over Songs. Nested block: the parenthesized SELECT over Ratings.

• For each block, the plans considered are:
– All available access methods, for each relation in the FROM clause.
– All left-deep join trees, i.e., all ways to join the relations one at a time, with the inner relation in the FROM clause*, considering all relation permutations and join methods.

* i.e., we pipeline what is being computed.


Cost Estimation (Recap 1)

• For each plan considered, must estimate cost:
– Must estimate the cost of each operation in the plan tree.
  Depends on input cardinalities.
  We've already discussed how to estimate the cost of operations (sequential scan, index scan, joins, etc.).
– Must estimate the size of the result for each operation in the tree!
  Use information about the input relations.
  For selections and joins, assume independence of predicates.
• We'll discuss the System R cost estimation approach.
– Inexact, but works OK in practice.
– More sophisticated techniques are known now.


Statistics and Catalogs (Recap 2)

• Need information about the relations and indexes involved. Catalogs typically contain at least:
– # tuples (NTuples) and # pages (NPages) for each relation.
– # distinct key values (NKeys) and NPages for each index.
– Index height and low/high key values (Low/High) for each tree index.
• Catalogs are updated periodically.
– Updating whenever data changes is too expensive; there is lots of approximation anyway, so slight inconsistency is OK.
• More detailed information (e.g., histograms of the values in some field) is sometimes stored.


Size Estimation and Reduction Factors (Recap 3)

• Consider a query block:

SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk

• Maximum # tuples in the result is the product of the cardinalities of the relations in the FROM clause.
• The reduction factor (RF) associated with each term reflects the impact of the term in reducing result size.
• Result cardinality = Max # tuples * product of all RFs.
– Implicit assumption that terms are independent!
– Term col=value has RF 1/NKeys(I), given index I on col.
– Term col1=col2 has RF 1/MAX(NKeys(I1), NKeys(I2)).
– Term col>value has RF (High(I)-value)/(High(I)-Low(I)+1).
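The reduction-factor rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and the NKeys value of 40,000 for both sid indexes is an assumption, not a catalog value given in the slides. The tuple counts follow from the example schema (80 x 500 for Songs, 100 x 1000 for Ratings).

```python
from math import prod

def rf_equals_value(nkeys):
    """RF for `col = value`, given NKeys(I) for an index I on col."""
    return 1.0 / nkeys

def rf_equals_column(nkeys1, nkeys2):
    """RF for `col1 = col2`, given NKeys of indexes on both columns."""
    return 1.0 / max(nkeys1, nkeys2)

def rf_greater_than(value, low, high):
    """RF for `col > value`, using the slide's (High - value) / (High - Low + 1)."""
    return (high - value) / (high - low + 1)

def estimate_cardinality(relation_sizes, reduction_factors):
    """Max # tuples (product of cardinalities) times the product of all RFs."""
    return prod(relation_sizes) * prod(reduction_factors)

# Songs JOIN Ratings ON sid, restricted to one genre.
# Assumptions for illustration: 40,000 distinct sid values in both indexes, 10 genres.
songs, ratings = 80 * 500, 100 * 1000
estimate = estimate_cardinality(
    [songs, ratings],
    [rf_equals_column(40_000, 40_000), rf_equals_value(10)],
)
print(round(estimate))  # roughly 10,000 tuples under the independence assumption
```

Because every term contributes an independent multiplicative factor, correlated predicates can make the estimate badly off; that is the implicit independence assumption noted above.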


Histograms

• Example: Songs(sid, sname, genre, year). Suppose the recorded years span [1994, 2008] and there are 45 tuples. Suppose the year values occur in the following pattern (frequency table):

year  frequency
94    2
95    3
96    3
97    1
98    2
99    1
00    3
01    8
02    4
03    2
04    0
05    1
06    2
07    4
08    9

• For year > 07, what does our uniformity assumption tell us?
• RF = 1/15, so 45 x 1/15 = 3 tuples qualify. But in reality, 9 tuples do!
• Can we improve the estimate by maintaining a small amount of additional information?
• Idea: repeat what we did for the interval [94,08] on subintervals!


Histogram Generation & Use

• Suppose we want to create a histogram with 5 buckets (bins) from the year frequency table above.
• Can slice up the frequency table using different schemes. Each scheme leads to a type of histogram.
• Split up the range into subintervals of equal width.
• Make the uniformity assumption inside each bucket.
– E.g., the bucket 94-96 holds 8 tuples, so 95 is estimated to occur 8/3 = 2.67 times (approx.).


Equiwidth Histograms

• Equiwidth: split [94,08] into 5 buckets (i.e., bins) of equal width; sum up frequencies bin-wise.

bucket  frequency
94-96   8
97-99   4
00-02   15
03-05   3
06-08   15

Equiwidth histogram with 5 buckets (bins).

• The bucket frequencies sum to NTuples (45).
• Note: this is already much closer to the actual distribution.
• What's our current estimate for year > 07? 1/3 x 15 = 5 tuples. Compare with the actual data (9 tuples) and with the uniformity assumption (3 tuples).
• Uniformity assumption = histogram with 1 bucket!
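As a rough sketch of the equiwidth construction just described, the snippet below builds the buckets from the slide's frequency table and estimates `year > 07`. The function names are illustrative, and the sketch only handles key ranges that divide evenly into the requested number of buckets.

```python
# Frequency table from the slides: year -> number of Songs tuples.
FREQ = {1994: 2, 1995: 3, 1996: 3, 1997: 1, 1998: 2, 1999: 1, 2000: 3, 2001: 8,
        2002: 4, 2003: 2, 2004: 0, 2005: 1, 2006: 2, 2007: 4, 2008: 9}

def equiwidth(freq, num_buckets):
    """Return [(low, high, count)] buckets of equal width over the key range."""
    low, high = min(freq), max(freq)
    span = high - low + 1
    if span % num_buckets != 0:
        raise ValueError("sketch only supports evenly divisible ranges")
    width = span // num_buckets
    buckets = []
    for start in range(low, high + 1, width):
        end = start + width - 1
        count = sum(freq.get(y, 0) for y in range(start, end + 1))
        buckets.append((start, end, count))
    return buckets

def estimate_greater_than(buckets, value):
    """Estimate # tuples with key > value, assuming uniformity inside each bucket."""
    total = 0.0
    for lo, hi, count in buckets:
        qualifying = max(0, hi - max(lo - 1, value))  # bucket values that exceed `value`
        total += count * qualifying / (hi - lo + 1)
    return total

print(equiwidth(FREQ, 5))
print(estimate_greater_than(equiwidth(FREQ, 5), 2007))  # 5 buckets: 5.0
print(estimate_greater_than(equiwidth(FREQ, 1), 2007))  # 1 bucket (uniformity): 3.0
```

Calling `equiwidth(FREQ, 1)` reproduces the plain uniformity assumption, which is why the one-bucket estimate matches the earlier figure of 3 tuples.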


Equidepth Histogram Generation

• Want to make a 5-bucket histogram again, from the same year frequency table as above.
• Can slice up the frequency table using different schemes. Each scheme leads to a type of histogram.
• Split up the range into subintervals such that each contains about the same total # occurrences, i.e., 45/5 = 9 in our example.


Equidepth Histograms

Equidepth:

bucket  frequency
94-97   9
98-00   6
01-02   12
03-07   9
08      9

• Estimate for year > 07 is 9.
• Even better than the equiwidth HG.
• Happens to be exact in this case.
• What is the maximum error in the estimate for the EW HG and the ED HG in the present example?
• What are the main differences between equiwidth and equidepth HGs?
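One possible reading of the maximum-error question is the worst error of a per-year (equality) estimate. The sketch below computes that for both histograms; the bucket lists are copied from the slides, while the function names and the choice of error measure are assumptions made for illustration.

```python
# Actual frequencies and the two 5-bucket histograms from the slides.
FREQ = {1994: 2, 1995: 3, 1996: 3, 1997: 1, 1998: 2, 1999: 1, 2000: 3, 2001: 8,
        2002: 4, 2003: 2, 2004: 0, 2005: 1, 2006: 2, 2007: 4, 2008: 9}
EQUIWIDTH = [(1994, 1996, 8), (1997, 1999, 4), (2000, 2002, 15),
             (2003, 2005, 3), (2006, 2008, 15)]
EQUIDEPTH = [(1994, 1997, 9), (1998, 2000, 6), (2001, 2002, 12),
             (2003, 2007, 9), (2008, 2008, 9)]

def point_estimate(buckets, year):
    """Estimated # tuples with key = year: bucket count spread evenly over its width."""
    for lo, hi, count in buckets:
        if lo <= year <= hi:
            return count / (hi - lo + 1)
    return 0.0

def max_point_error(buckets, freq):
    """Largest absolute gap between the estimate and the true frequency."""
    return max(abs(point_estimate(buckets, y) - f) for y, f in freq.items())

print("equiwidth max error:", max_point_error(EQUIWIDTH, FREQ))  # about 4 (year 08)
print("equidepth max error:", max_point_error(EQUIDEPTH, FREQ))  # about 2.2 (year 07)
```

Under this measure the equidepth histogram does better here because its narrow buckets fall where the frequencies are high; other error measures (for example, over range queries) may rank the two differently.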


A slightly different slicing

• Some of the bars have been moved. The slicing cannot be perfect in general.
• (Same year frequency table as above.)


Equidepth Histograms

Equidepth (second slicing):

bucket  frequency
94-98   11
99-01   12
02-05   7
06-07   6
08      9

First ED HG:

bucket  frequency
94-97   9
98-00   6
01-02   12
03-07   9
08      9

• How does this HG compare with the previous ED HG?
• Which HG would you put your money on?


Histogram Use

• How do you calculate RF using a histogram?
• Straightforward for Attr = val.
• What about a range selection such as Attr > val? (We use the 1st ED HG below.)
• E.g., year > 98:
– #tuples_uniform = 10/15 x 45 = 30. RF_uniform = 30/45 = 10/15 = 2/3.
– #tuples_ewhg = 1 x 4/3 + 33 = 34.33. RF_ewhg = 34.33/45.
– #tuples_edhg = 2 x 6/3 + 30 = 34. RF_edhg = 34/45.
– Actual # (tuples with year > 98) = 34. Actual RF = 34/45.
• In general, can define optimal histograms (given # buckets) to minimize expected error.
– Interesting algorithmic problem with lots of research literature.
• HGs are useful for selectivity estimation for multi-dimensional queries.
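The range estimates above can be reproduced with a small overlap-based estimator. This is a sketch under the same per-bucket uniformity assumption; `estimate_range` and the constant names are illustrative, and the query `year > 98` is expressed as the closed range [1999, 2008].

```python
NTUPLES = 45
UNIFORM = [(1994, 2008, 45)]
EQUIWIDTH = [(1994, 1996, 8), (1997, 1999, 4), (2000, 2002, 15),
             (2003, 2005, 3), (2006, 2008, 15)]
EQUIDEPTH = [(1994, 1997, 9), (1998, 2000, 6), (2001, 2002, 12),
             (2003, 2007, 9), (2008, 2008, 9)]

def estimate_range(buckets, q_low, q_high):
    """Estimate # tuples with q_low <= key <= q_high.

    Each bucket contributes its count scaled by the fraction of its
    integer key values that fall inside the query range.
    """
    total = 0.0
    for lo, hi, count in buckets:
        overlap = max(0, min(hi, q_high) - max(lo, q_low) + 1)
        total += count * overlap / (hi - lo + 1)
    return total

for name, hg in [("uniform", UNIFORM), ("equiwidth", EQUIWIDTH), ("equidepth", EQUIDEPTH)]:
    tuples = estimate_range(hg, 1999, 2008)  # year > 98
    print(f"{name}: {tuples:.2f} tuples, RF = {tuples / NTUPLES:.3f}")
```

The same function covers `Attr < val` by passing the range from the lowest key up to `val - 1`.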


Relational Algebra Equivalences

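• Allow us to choose different join orders and to 'push' selections and projections ahead of joins.
• Selections:
– $\sigma_{c_1 \wedge \dots \wedge c_n}(R) \equiv \sigma_{c_1}(\dots(\sigma_{c_n}(R))\dots)$ (Cascade)
– $\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$ (Commute)
• Projections:
– $\pi_{a_1}(R) \equiv \pi_{a_1}(\dots(\pi_{a_n}(R))\dots)$ when $a_1 \subseteq a_2 \subseteq \dots \subseteq a_n$ (Cascade)
• Joins:
– $R \bowtie (S \bowtie T) \equiv (R \bowtie S) \bowtie T$ (Associative)
– $R \bowtie S \equiv S \bowtie R$ (Commutative)
• Show that: $R \bowtie (S \bowtie T) \equiv (T \bowtie R) \bowtie S$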
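For the "show that" exercise on this slide, one possible derivation uses only the commutative and associative laws for joins listed above:

```latex
\begin{align*}
R \bowtie (S \bowtie T)
  &\equiv R \bowtie (T \bowtie S) && \text{(commutativity, applied to } S \bowtie T\text{)} \\
  &\equiv (R \bowtie T) \bowtie S && \text{(associativity)} \\
  &\equiv (T \bowtie R) \bowtie S && \text{(commutativity, applied to } R \bowtie T\text{)}
\end{align*}
```

The practical consequence is that any join order over the same relations is reachable, which is what lets the optimizer consider all permutations.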


More Equivalences

A projection commutes with a selection that only uses attributes retained by the projection.

Selection between attributes of the two arguments of a cross-product converts cross-product to a join.

A selection on just attributes of R commutes with $R \bowtie S$, i.e., $\sigma(R \bowtie S) \equiv \sigma(R) \bowtie S$.

Similarly, if a projection follows a join $R \bowtie S$, we can 'push' it by retaining only attributes of R (and S) that are needed for the join or are kept by the projection.


Some more equivalences

• Join vs. Group-by:
$\gamma_{R.sid:\ max(time)}(Songs \bowtie_{R.sid=S.sid} Ratings)$
$\equiv \pi_{R.sid,\ max(time)}([\gamma_{R.sid:\ max(time)}(Ratings)] \bowtie_{R.sid=S.sid} Songs)$
$\equiv \gamma_{R.sid:\ max(time)}(Ratings)$,
provided the RIC $Ratings.sid \subseteq Songs.sid$ holds (needed only for the second equivalence).
• Does the equivalence hold for other aggregate functions, e.g., avg(rating) or count(time)?


One more on Group-By and Join.

• $\gamma_{A:\ agg(D)}[r(A,B) \bowtie s(B,C,D)] \equiv \pi_{A,\ agg(D)}(r(A,B) \bowtie \gamma_{B:\ agg(D)}(s(B,C,D)))$.
• Does this equivalence always hold? What constraints, if any, are needed for it to hold? If the constraint is relaxed, what then?


Enumeration of Alternative Plans

• There are two main cases:
– Single-relation plans
– Multiple-relation plans
• For queries over a single relation, queries consist of a combination of selects, projects, and aggregate ops:
– Each available access path (file [i.e., table] scan / index scan / index probe) is considered, and the one with the least estimated cost is chosen.
– The different operations are essentially carried out together (e.g., if an index is used for a selection, projection is done for each retrieved tuple, and the resulting tuples are pipelined into the aggregate computation).


Cost Estimates for Single-Relation Plans

• Index I on primary key matches equality selection:
– Cost is Height(I)+1 for a B+ tree, about 1.2 for a hash index.
– Additional cost if alternative 2 or 3 is used.
• Clustered index I matching one or more selects:
– (NPages(I)+NPages(R)) * product of RFs of matching selects. [This is mainly for secondary keys.]
• Non-clustered index I matching one or more selects:
– (NPages(I)+NTuples(R)) * product of RFs of matching selects.
• Sequential scan of file:
– NPages(R).
• Note: Typically, no duplicate elimination on projections! (Exception: done on answers if the user says DISTINCT.)


Example

SELECT S.sid
FROM Songs S
WHERE S.genre = 'friendship'

• If we have an index on genre:
– (1/NKeys(I)) * NTuples(R) = (1/10) * 40000 tuples retrieved. [Assuming # distinct genres = 10.]
– Suppose index size NPages(I) = 50 pages.
– Clustered index: (1/NKeys(I)) * (NPages(I)+NPages(R)) = (1/10) * (50+500) pages are retrieved. (This is the cost.)
– Unclustered index: (1/NKeys(I)) * (NPages(I)+NTuples(R)) = (1/10) * (50+40000) pages are retrieved.
• Doing a file scan:
– We retrieve all file pages (500).
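The arithmetic in this example can be checked with a short sketch of the single-relation cost formulas. The function names are illustrative; the numbers (500 pages, 80 tuples per page, 50 index pages, 10 distinct genres) are the ones used on the slide.

```python
def clustered_index_cost(npages_index, npages_rel, nkeys):
    """Pages read via a clustered index for an equality selection (RF = 1/NKeys)."""
    return (npages_index + npages_rel) / nkeys

def unclustered_index_cost(npages_index, ntuples_rel, nkeys):
    """Unclustered index: charge up to one page I/O per qualifying tuple."""
    return (npages_index + ntuples_rel) / nkeys

def file_scan_cost(npages_rel):
    """A sequential scan reads every page of the relation."""
    return npages_rel

# Songs: 500 pages x 80 tuples/page = 40,000 tuples; 10 distinct genres; 50 index pages.
NPAGES_R, NTUPLES_R, NPAGES_I, NKEYS = 500, 500 * 80, 50, 10

costs = {
    "clustered index": clustered_index_cost(NPAGES_I, NPAGES_R, NKEYS),       # 55.0
    "unclustered index": unclustered_index_cost(NPAGES_I, NTUPLES_R, NKEYS),  # 4005.0
    "file scan": file_scan_cost(NPAGES_R),                                    # 500
}
for name, cost in costs.items():
    print(f"{name}: {cost} page I/Os")
```

The clustered and unclustered figures are alternative scenarios for the same index: if the index on genre is unclustered, the plain file scan (500 I/Os) is the cheaper access path.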


An Example with Aggregation

SELECT year, COUNT(*)
FROM Songs
WHERE year > 2000 AND genre = 'friendship'
GROUP BY year
HAVING COUNT(DISTINCT sname) > 2

PROJ_{year, count(sname)}(
  HAVING_{count distinct(sname) > 2}(
    GROUP BY_{year: count distinct(sname)}(
      PROJ_{year, sname}(
        SEL_{year>2000 & genre='friendship'}(Songs))))).


Plan w/o indexes

• Scan Songs file; apply SEL/PROJ on the fly.
• Write out result into temp.
• Sort by year (for GROUP BY); aggregate during merge; do HAVING on the fly.
• What's the cost?
– 500 I/Os for scan.
– RF for year>2000 is 0.5 (approx.); RF for genre='friendship' is 0.1 (used default value).
– RF = 0.5 x 0.1; PROJ reduces size by 0.5 (if all fields have the same size). Temp. size = 500 x 0.5 x 0.1 x 0.5 = 12.5, i.e., 13 pages (approx.), so 13 I/Os for writing it.
– 3 x 13 = 39 I/Os for sorting it. [Can be less if we have sufficient buffer space.]
– Total = 500 + 13 + 39 = 552 I/Os.
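The cost breakdown for the plan without indexes can be reproduced with a short sketch. The function and parameter names are illustrative; the factor of 3 for sorting is taken directly from the slide rather than derived from a buffer-size model.

```python
import math

def plan_without_indexes(npages, selection_rfs, projection_factor, sort_factor=3):
    """I/O cost of: scan, write temp, sort temp (sort_factor I/Os per temp page)."""
    # Pages surviving the selections and the projection, rounded up to whole pages.
    temp_pages = math.ceil(npages * math.prod(selection_rfs) * projection_factor)
    scan = npages
    write_temp = temp_pages
    sort = sort_factor * temp_pages
    return {"scan": scan, "write_temp": write_temp, "sort": sort,
            "total": scan + write_temp + sort}

# Songs has 500 pages; RFs 0.5 (year > 2000) and 0.1 (genre); projection halves tuple size.
cost = plan_without_indexes(500, [0.5, 0.1], 0.5)
print(cost)  # scan 500 + write 13 + sort 39 = 552 I/Os
```

Lowering `sort_factor` models the bracketed remark that sorting can cost less when enough buffer space is available.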


Plans w/ indexes

• Pick one index (whichever the query optimizer estimates is best) among those available; probe the index and verify unmatched selection conditions as pages are read in.
• Use multiple indexes; do a rid intersection; then fetch resulting pages.
• If GROUP-BY attrs are a prefix of a B+tree index, use the index to retrieve tuples in sort order (simplifies GROUP-BY). Rest of the ops done on each tuple, then each group.
• Suppose there is a dense index that includes all attrs mentioned in the query. Then we only need to scan the index's data entries. If the index matches some of the SEL conditions, even better. If the index is a B+tree and the GROUP-BY attrs form a prefix, can avoid sorting!
– Does it matter if this index is clustered or not? (What is a dense index? And what is a sparse index?)
• Note: we can keep attr-values as data entries even if they are not indexed! E.g., index on genre, but data entry = (genre, year). Why/when would this help?


Plans w/ indexes – Example

• Revisit query; data entries = (key, rid) pairs. B+tree index on year, hash index on genre, B+tree index on (year, genre, sname).

1. Fetch data entries using hash index on genre; then records; then apply remaining ops. Does clustering make a difference?

2. Use both B+tree index on year and hash index on genre; intersect rids and fetch data.

3. Use only B+tree index on year (in sorted order). Helps greatly for GROUP-BY.

4. Scan only the B+tree index on (year, genre, sname) and fetch data entries with year>2000 in sorted order. Then apply remaining ops. (Index could be unclustered.) The index must be dense for this to work!


Queries Over Multiple Relations

• Fundamental decision in System R: only left-deep join trees are considered.
– As the number of joins increases, the number of alternative plans grows rapidly; we need to restrict the search space.
– Left-deep trees allow us to generate all fully pipelined plans.
  Intermediate results not written to temporary files.
  Not all left-deep plans are fully pipelined (e.g., Sort-Merge join).

[Figure: three example join trees over relations A, B, C, D.]


Enumeration of Left-Deep Plans

• Left-deep plans differ only in the order of relations, the access method for each relation, and the join method for each join.
• Enumerated using N passes (if N relations joined):
– Pass 1: Find best 1-relation plan for each relation.
– Pass 2: Find best way to join result of each 1-relation plan (as outer) to another relation. (All 2-relation plans.)
– Pass N: Find best way to join result of a (N-1)-relation plan (as outer) to the N-th relation. (All N-relation plans.)
• For each subset of relations, retain only:
– Cheapest plan overall, plus
– Cheapest plan for each interesting order of the tuples.
  Where does this help?
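The pass structure above can be sketched as a small dynamic program over subsets of relations. This is a deliberately simplified illustration, not the System R implementation: the cost model (total number of intermediate tuples) is a stand-in assumption, access methods, join methods, and interesting orders are ignored, and all names and the toy statistics are hypothetical.

```python
from itertools import combinations

def best_left_deep_plan(cards, join_sel):
    """cards: {relation: cardinality}; join_sel: {frozenset({a, b}): selectivity}.

    Returns (cost, join_order, result_cardinality) for the cheapest left-deep
    plan that never forms a Cartesian product, or None if no such plan exists.
    """
    # Pass 1: best 1-relation plan for each relation (here simply "scan", cost 0).
    best = {frozenset([r]): (0.0, (r,), float(n)) for r, n in cards.items()}
    names = sorted(cards)
    for k in range(2, len(names) + 1):            # Pass k: all k-relation plans
        for subset in map(frozenset, combinations(names, k)):
            for inner in subset:                  # relation joined last, as the inner
                outer = subset - {inner}
                if outer not in best:
                    continue                      # outer needs a cross product; skip
                sels = [s for pair, s in join_sel.items()
                        if inner in pair and (pair - {inner}) <= outer]
                if not sels:
                    continue                      # no join condition: avoid Cartesian product
                cost, order, card = best[outer]
                out_card = card * cards[inner]
                for s in sels:
                    out_card *= s
                new_cost = cost + out_card        # toy cost: tuples produced so far
                if subset not in best or new_cost < best[subset][0]:
                    best[subset] = (new_cost, order + (inner,), out_card)
    return best.get(frozenset(names))

# Hypothetical statistics: A-B and B-C have join predicates, A-C does not.
cards = {"A": 1000, "B": 100, "C": 10}
join_sel = {frozenset({"A", "B"}): 0.01, frozenset({"B", "C"}): 0.1}
print(best_left_deep_plan(cards, join_sel))
```

Keeping only one entry per subset is the pruning step; a fuller version would keep one entry per (subset, interesting order) pair, which is what allows a slightly more expensive sorted plan to survive and pay off in a later merge join or GROUP BY.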


Enumeration of Plans (Contd.)

ORDER BY, GROUP BY, aggregates etc. handled as a final step, using either an `interestingly ordered’ plan or an additional sorting operator. Of course, hashing is an option too.

An (N-1)-way plan is not combined with an additional relation unless there is a join condition between them, or all predicates in WHERE have been used up.
– i.e., avoid Cartesian products if possible.

In spite of pruning plan space, this approach is still exponential in the # of tables.


Example

Query plan: $\pi_{sname}(\sigma_{uid=100 \wedge year>2000}(Ratings \bowtie_{sid=sid} Songs))$

Indexes:
– Songs: B+ tree on year; hash on sid.
– Ratings: B+ tree on uid.

• Pass 1:
– Songs: B+ tree matches year>2000, and is probably cheapest. However, if this selection is expected to retrieve a lot of tuples, and the index is unclustered, a file scan may be cheaper.
  Still, the B+ tree plan is kept (because tuples are in year order).
– Ratings: B+ tree on uid matches uid=100; perhaps cheapest.
• Pass 2:
– We consider each plan retained from Pass 1 as the outer, and consider how to join it with the (only) other relation.
  E.g., Ratings as outer: the hash index can be used to get Songs tuples that satisfy sid = outer tuple's sid value. What other plans might be considered?


Example (contd.)

• If we have > 2 relations to join, the same reasoning applies for pass 2 → pass 3 (indeed, for pass k → pass k+1).
– Use the same criteria as before for deciding which plans to keep for the join of each subset of k relations. [The (k+1)^th relation is the inner for the join plan.]
– For each [k relations] Join [chosen (k+1)^th relation], consider all possible strategies and choose the cheapest plans and/or plans w/ interesting tuple orders.


Nested Queries

SELECT S.sname
FROM Songs S
WHERE S.sid IN
    (SELECT R.sid
     FROM Ratings R
     WHERE R.rating > 7 AND R.time > S.year)

• Nested block is optimized independently, with the outer tuple considered as providing a selection condition.
• Outer block is optimized with the cost of 'calling' a nested block computation taken into account.
• Implicit ordering of these blocks means that some good strategies are not considered. The non-nested version of the query is typically optimized better.
– Not all queries can be flattened this way!

Nested block to optimize:

SELECT R.sid
FROM Ratings R
WHERE R.rating > 7 AND R.time > outer value

Equivalent non-nested query:

SELECT S.sname
FROM Songs S, Ratings R
WHERE S.sid = R.sid AND R.rating > 7 AND R.time > S.year
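To make the "nested block as a subroutine" picture concrete, the sketch below evaluates the nested and the flattened form over a few made-up tuples. The data is invented for illustration, and time and year are simplified to integers so that they can be compared directly.

```python
# Toy data (hypothetical): Songs(sid, sname, year), Ratings(uid, sid, time, rating).
songs = [(1, "Alpha", 2000), (2, "Beta", 2005), (3, "Gamma", 2010)]
ratings = [(100, 1, 2003, 9), (101, 1, 2004, 8), (100, 2, 2004, 9), (102, 3, 2012, 5)]

def nested_block(outer_year):
    """The nested block, with the outer tuple's year acting as a selection constant."""
    return {sid for (_, sid, time, rating) in ratings
            if rating > 7 and time > outer_year}

def nested_query():
    # The nested block is 'called' once per outer Songs tuple.
    return [sname for (sid, sname, year) in songs if sid in nested_block(year)]

def flattened_query():
    # Non-nested form: a join of Songs and Ratings with all conditions in one WHERE.
    return [sname
            for (sid, sname, year) in songs
            for (_, rsid, time, rating) in ratings
            if sid == rsid and rating > 7 and time > year]

print(nested_query())     # ['Alpha']
print(flattened_query())  # ['Alpha', 'Alpha']: one row per qualifying rating
```

On this data both forms select the same songs, but the join yields one row per qualifying rating, so the flattened form matches the nested one only up to duplicates unless DISTINCT is applied. The nested form also rescans Ratings for every Songs tuple, which is the fixed evaluation order the optimizer is stuck with when it treats the block as a subroutine.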


Summary

• Query optimization is an important task in a relational DBMS.
• Must understand optimization in order to understand the performance impact of a given database design (relations, indexes) on a workload (set of queries).
• Two parts to optimizing a query:
– Consider a set of alternative plans.
  Must prune search space; typically, left-deep plans only.
– Must estimate cost of each plan that is considered.
  Must estimate size of result and cost for each plan node.
  Key issues: Statistics, indexes, operator implementations.


Summary (Contd.)

• Single-relation queries:
– All access paths considered, cheapest is chosen.
– Issues: Selections that match index, whether index key has all needed fields and/or provides tuples in a desired order. Is the index dense/sparse, clustered/unclustered? (Not all of this matters all the time.)
• Multiple-relation queries:
– All single-relation plans are first enumerated.
  Selections/projections considered as early as possible.
– Next, for each 1-relation plan, all ways of joining another relation (as inner) are considered.
– Next, for each 2-relation plan that is 'retained', all ways of joining another relation (as inner) are considered, etc.
– At each level, for each subset of relations, only the best plan for each interesting order of tuples is 'retained', in addition to the global cheapest plan (which may or may not come with an interesting order).