Upload
others
View
23
Download
0
Embed Size (px)
Citation preview
Cost-based Query Optimization
Christoph Koch
2
Computing the cost of a query plan
We know how to estimate the cost of each individual operator.But for doing this we need to know the size of the input.
Compute the size of each relation produced by some operatorBottom-upWe need to know estimates of selectivities/reduction factors for both selection and join conditions.
Given a fixed query plan, if we exchange a particular operator implementation (e.g. NL join against Hashjoin), the output does not change and we do not have to recompute output sizes.
3
Model of Query plans
We use a very powerful model.Ingredients:
Algebra treeFor each operation, a choice of implementationFor some binary operations (e.g. NL joins), an annotation saying which of the two inputs is the outer and which is the inner loop.A choice of whether an operator’s output is to be written back to disk or pipelined into the next operator (sometimes there is no choice).An allocation of memory buffer pages to operators and their input/output lines.
In case of pipelining, the allocation may not be operator by operator but span several operations.
4
A Remark on the cost function for block NL joins
On this slides I use the formula
pages_outer +round_up(pages_outer / (buffer_pages – 1)) *(pages_inner - 1) + 1
I explained this formula in class but it’s not in the book.You can use the formula from the book instead:
pages_outer +round_up(pages_outer / (buffer_pages – 1)) *pages_inner
5
A University Database
roomgradnameid
226FullSocrates2125
7FullKant2137
36FullCurie2136
309AssocAugustinus2134
52AssocPopper2133
310AssocKopernikus2127
232FullRussel2126
Professor (p)
semesternamesid
18Xenokrates24002
2Feuerbach29555
2Theophrastos29120
3Carnap28106
6Schopenhauer27550
8Aristoxenos26830
10Fichte26120
12Jonas25403
Student (s)
21374The three critiques4630
21342CS4325022
21332The Vienna Circle5259
taught_bycreditsnamecid
21374Foundations5001
21262Bioethics5216
21263Theory of Science5052
21254Logics4052
21252Maieutics5049
21263Epistemology5043
21254Ethics5041
Course (v)
521628106
502225403
502229555
504929120
504129120
cidsid
500126120
500129120
525928106
505228106
504128106
405227550
500127550
Takes (h)
6
Assumptions#tuples per page:
p: b1024/50c =20
s: b1024/50c = 20
v: b1024/100c = 10
h: b1024/16c = 64
(page overhead is ignored)
#pages:
p: 800/20 = 40
s: 38000/20 = 1900
v: 2000/10 = 200
h: d60000/64e = 938
(we ignore the overhead of the primary index.)
Relation sizes (#tuples)|p|=800|s|=38000|v|=2000|h|=60000
Avg. Tuple sizep: 50 Bytess: 50 Bytesv: 100 Bytesh: 16 Bytes
selectivitiesSel[sh] = 2.6 * 10^-5Sel[hv] = 5 * 10^-4Sel[vp] = 1.25 * 10^-3Sel[p.Name=...] =1.25 * 10^-3
Page size 1024 BytesMain memory buffers m = 20+1 pages
7
Example 1
Access(p)
πs.semester
h
v
s
(with duplicates)
Joins.sid=h.sid
Joinp.id=v.taught_by
Joinv.cid=h.cidSelp.name=´Socrates´
8
Example 1
Access(p)
πs.semester
s h
v
(with duplicates)
Joinp.id=v.taught_by
Joinv.cid=h.cid
Joins.sid=h.sid
Selp.name=´Socrates´
9
Example 1
Access(p)
πs.semester
s h
v
(with duplicates)
Joins.sid=h.sid
Joinp.id=v.taught_by
Joinv.cid=h.cidSelp.name=´Socrates´
10
Example 1
Access(p)
πs.semester
s h
v
(with duplicates)
Joins.sid=h.sid
Joinp.id=v.taught_by
Joinv.cid=h.cidSelp.name=´Socrates´
11
Example 1
Access(p)
πs.semester
s h
v
(with duplicates)
Joins.sid=h.sid
Joinp.id=v.taught_by
Joinv.cid=h.cidSelp.name=´Socrates´
12
Example 1
Access(p)
πs.semester
s h
v
(with duplicates)
And now we have to allocate memory buffers...
Joins.sid=h.sid
Joinp.id=v.taught_by
Joinv.cid=h.cidSelp.name=´Socrates´
13
Example 1a
Access(p)NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
π s.semester
Selp.name=´Socrates´
NL-Joinv.cid=h.cid
s h
v
(with duplicates)
1 9
Buffer assignments in this color
pipl.
pipl.
10
0
pipl.
91
0
(inner: recompute!)
14
Example 1a
Access(p)
π s.semester
s h
v
(with duplicates)
pipl.
pipl.
10
0
pipl.
1 9
91
0
(inner: recompute!)
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cidSelp.name=´Socrates´
15
Example 1a
s h
vAccess(p)
π s.semester(with duplicates)
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cidSelp.name=´Socrates´
16
Example 1b
Access(p)
π s.semester
s h
v
(with duplicates)
1 9
Buffer assignments in this color
pipl.
pipl.
10
0
pipl.
9 1
0
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cidSelp.name=´Socrates´
17
Example 1b
Access(p)
π s.semester
s h
v
(with duplicates)
pipl.
pipl.
10
0
pipl.
1 9
0
9 1
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cidSelp.name=´Socrates´
18
Example 1b
s h
vAccess(p)
π s.semester(with duplicates)
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cidSelp.name=´Socrates´
19
Example 1c
Access(p)
π s.semester
s h
v
(with duplicates)
1 19
1
Buffer assignments in this color
pipl.
pipl.
10
0
pipl.
181
0
Store intermediateresult
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cidSelp.name=´Socrates´
20
Example 1c
Access(p)
π s.semester
s h
v
(with duplicates)
1 19
1pipl.
pipl.
10
0
pipl.
181
0
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cidSelp.name=´Socrates´
21
Example 1c
s h
vAccess(p)
π s.semester(with duplicates)
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cidSelp.name=´Socrates´
22
Example 2
s
h
v
Access(p)
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
π s.semester
NL-Joinv.cid=h.cid
pipl.
pipl.
1 0
1 0
5 0
Selp.name=´Socrates´
23
Example 2
s
h
v
Access(p)
pipl.
1 0
pipl.
(Same query, differentJoin order)
1 0
5 0
π s.semester
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cid
Selp.name=´Socrates´
24
Example 2
s
h
v
Access(p)
1 0
pipl.
pipl.
1 0
5 0
π s.semester
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cid
Selp.name=´Socrates´
25
Example 2
s
h
v
Access(p)
1 0
5
pipl.
1 0
pipl.
0
π s.semester
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cid
Selp.name=´Socrates´
26
Example 2
s
h
v
Access(p)
5 0
pipl.
1 0
pipl.
1 0
π s.semester
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cid
Selp.name=´Socrates´
27
Example 2
s
h
v
Access(p)
pipl.
1 0
pipl.
1 0
5 0
π s.semester
NL-Joins.sid=h.sid
NL-Joinp.id=v.taught_by
NL-Joinv.cid=h.cid
Selp.name=´Socrates´
28
Example 3a
Access(p)
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Merge-Joinv.cid=h.cid
π s.semester
Sortv.cid Sorth.cid
v
h
s
Selp.name=´Socrates´
taught_by
29
Example 3a
20+1
pipl.
1
1
pipl.
0
5 e.g. 2
pipl.
Access(p)
π s.semester
Sortv.cid Sorth.cid
v
h
s
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Merge-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
30
Example 3a
20+1
pipl.
1
1
pipl.
0
e.g. 2
pipl.
5
Access(p)
π s.semester
Sortv.cid Sorth.cid
v
h
s
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Merge-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
31
Example 3a
20+1
pipl.
1
1
pipl.
0
e.g. 2
pipl.
5
Access(p)
π s.semester
Sortv.cid Sorth.cid
v
h
s
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Merge-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
32
Example 3a
20+1
pipl.
1
1
pipl.
0
e.g. 2
pipl.
5
Access(p)
πs.semester
Sortv.cid Sorth.cid
v
h
s
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Merge-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
33
Example 3a
20+1
pipl.
1
1
pipl.
0
e.g. 2
pipl.
5
Access(p)
πs.semester
Sortv.cid Sorth.cid
v
h
s
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Merge-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
34
Example 3a
pipl.18
pipl.
20+1
pipl.
1
1
pipl.
0
e.g. 2
pipl.
5
Access(p)
π s.semester
Sortv.cid Sorth.cid
v
h
s
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Merge-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
35
Example 3a
pipl.18
pipl.
20+1
pipl.
1
1
pipl.
0
e.g. 2
pipl.
5
Access(p)
π s.semester
Sortv.cid Sorth.cid
v
h
s
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Merge-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
36
Example 3a
20+1
pipl.
1
1
pipl.
0
e.g. 2
pipl.
5
Access(p)
π s.semester
Sortv.cid Sorth.cid
v
h
s
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Merge-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
37
Example 3b
pipl.3
pipl.
20+1
pipl.
1
1
pipl.
0
e.g. 2
pipl.
5
Access(p)
π s.semester
Sortv.cid Sorth.cid
v
h
s
IndexDup
Hash
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Merge-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
38
Example 3b
pipl.
pipl.3
pipl.
20+1
pipl.
1
1
pipl.
0
e.g. 25
Access(p)
π s.semester
Sortv.cid Sorth.cid
v
h
s
IndexDup
Hash
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Merge-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
39
Example 3b
pipl.
pipl.3
pipl.
20+1
pipl.
1
1
pipl.
0
e.g. 25
Access(p)
π s.semester
Sortv.cid Sorth.cid
v
h
s
IndexDup
Hash
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Merge-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
40
Example 3c
pipl.
pipl.3
pipl.
pipl.
1
e.g. 25
1
(15 hashbuckets)
Access(p)
π s.semester
v
h
s
IndexDup
Hash
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Hash-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
41
Example 3d
pipl.
pipl.3
pipl.
pipl.
1
e.g. 2
1 1
5
Access(p)
π s.semester
v
h
IndexDup
Hash
s
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
NL-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
42
Example 3e
pipl.
pipl.3
pipl.
pipl.
1
e.g. 2
1
5
e.g. 2
Access(p)
π s.semester
v
h
IndexDup
Hash
s
Index-Joinp.id=v.taught_by
Index-Joins.sid=h.sid
Index-Joinv.cid=h.cid
Selp.name=´Socrates´
taught_by
43
Observations
Join order has BIG impact on query evaluation time!
Cost of Operations and size of intermediate result can be computed separately.
The selection of a join operator has only local influence on the runtime of the query, except if it destroys properties of the data (such as sort order) that are important for operations later on in the pipeline or if it requires main memory buffers that are allocated to other operators.
44
Example 4: Selectivity estimates
Access(p)
v
Now no selectivities are given, but we know that
• Professors (p) are in a 1:n relationship with Courses (v) (via foreign key taught_by).
• Each Professor has either rank „Full“ or „Assoc“.
Selectp.grad = ´Full´
Index-Joinp.id=v.taught_by
(UnclusteredB+-tree on taught_by
45
Example 4a: Selectivity estimates
Access(p)
v
1
taught_by
Index-Joinp.id=v.taught_by
Selectp.grad = ´Full´
46
Example 4b: Selectivity estimates
Access(p)
v
It is better NOT toUse the index!
Cost:
(UnclusteredB+-tree on taught_by
NL-Joinp.id=v.taught_by
Selectp.grad = ´Full´
47
Using Histograms
0
50
100
150
200
250
300
350
400
16-17 18-19 20-21 22-23 24-25 26-27
Student.Age
Histogram computed by„analyze table“ statement
48
Example 5: (co-)Clusters
p vProf 1
Course 1.1Course 1.2
Prof 2Course 2.1
Prof 3Course 3.1Course 3.2Course 3.3
1 1
1
NL-Joinp.id=v.taught_by
Selectp.grad = ´Full´
49
Example 6: Seeks
p
v
(UnclusteredB+-tree on taught_by
NL-Joinp.id=v.taught_by
Selectp.grad = ´Full´