Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
scans vs indexesprof. Stratos Idreos
HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/
class 13
/24CS165, Fall 2017 Stratos Idreos 2
1,2,3… 12,15,17 20,… …
35,50
35,…12,20 50,…
b-tree - dynamic tree - always balanced
/24CS165, Fall 2017 Stratos Idreos 3
(secondary) index vs scan: the eternal battle
A A
select … from R where A<v and ….
/24CS165, Fall 2017 Stratos Idreos 3
(secondary) index vs scan: the eternal battle
A A
select … from R where A<v and ….
Just having indexes in the system is or can be useless… or even bad for performance
Knowing when to use an index is key
/24CS165, Fall 2017 Stratos Idreos 3
(secondary) index vs scan: the eternal battle
A A
select … from R where A<v and ….
Just having indexes in the system is or can be useless… or even bad for performance
Knowing when to use an index is key
Primary index vs secondary vs scan?
/24CS165, Fall 2017 Stratos Idreos 4
design/implement numerous possible algorithms + data representations
choose the bestdata source, algorithms and path for each query
database kernel
data data data
algo
rithm
s/op
erat
ors
applications
sql
/24CS165, Fall 2017 Stratos Idreos 5
scan
secondary index scan
/24CS165, Fall 2017 Stratos Idreos 5
scan
secondary index scan
/24CS165, Fall 2017 Stratos Idreos 6
a1 a2 a3 a4 a5
b1 b2 b3 b4 b5
c1 c2 c3 c4 c5
A B Ca5 a3 a2 a1 a4
Asecondary index on A
values out of order with base data
a query that select on A and then needs B
intermediate out of order
/24CS165, Fall 2017 Stratos Idreos 6
a1 a2 a3 a4 a5
b1 b2 b3 b4 b5
c1 c2 c3 c4 c5
A B Ca5 a3 a2 a1 a4
Asecondary index on A
values out of order with base data
5 3 2 1 4
a query that select on A and then needs B
intermediate out of order
/24CS165, Fall 2017 Stratos Idreos 6
a1 a2 a3 a4 a5
b1 b2 b3 b4 b5
c1 c2 c3 c4 c5
A B Ca5 a3 a2 a1 a4
Asecondary index on A
values out of order with base data
5 3 2 1 4
a5 a3 a2 a1 a4
A5 3 2 1 4
select2 1 4
b1 b2 b3 b4 b5
Ba query that select on A and then needs B
intermediate out of order
/24CS165, Fall 2017 Stratos Idreos 7
covering index:contains all columns needed for a set of queries
A A B
no need to go to base data but…
/24CS165, Fall 2017 Stratos Idreos 8
A A
random access to traverse the tree
& need to sort result
sequential access pattern but needs to
access all data
Vs.
/24CS165, Fall 2017 Stratos Idreos 9
/24CS165, Fall 2017 Stratos Idreos 10
the standard solution1) maintain statistics, 2) optimizer chooses access path depending on estimated selectivity
/24CS165, Fall 2017 Stratos Idreos 10
the standard solution1) maintain statistics, 2) optimizer chooses access path depending on estimated selectivity
what is wrong with that
/24CS165, Fall 2017 Stratos Idreos 11
Motivation
0
2
4
6
8
10
12
14
16
18
20
0 20 40 60 80 100
Exec
uti
on
tim
e (s
ec)
Tho
usa
nd
s
Result selectivity (%)
Index Scan
Full Scan
0
200
400
600
800
1000
1200
1400
0 20 40 60 80 100
Exec
uti
on
tim
e (
sec)
Result selectivity (%)
Index Scan
Full Scan
Motivation TPCH (SF10) 2/2
0.1
1
10
100
1000
Q1 Q3 Q5 Q7 Q9 Q11 Q13 Q16 Q19 Q22
Nor
mal
ized
exec
utio
n ti
me
TPC-H Query
Original
Tuned
/24CS165, Fall 2017 Stratos Idreos 11
Motivation
0
2
4
6
8
10
12
14
16
18
20
0 20 40 60 80 100
Exec
uti
on
tim
e (s
ec)
Tho
usa
nd
s
Result selectivity (%)
Index Scan
Full Scan
0
200
400
600
800
1000
1200
1400
0 20 40 60 80 100
Exec
uti
on
tim
e (
sec)
Result selectivity (%)
Index Scan
Full Scan
Motivation TPCH (SF10) 2/2
0.1
1
10
100
1000
Q1 Q3 Q5 Q7 Q9 Q11 Q13 Q16 Q19 Q22
Nor
mal
ized
exec
utio
n ti
me
TPC-H Query
Original
Tuned
ROBUSTNESS
/24CS165, Fall 2017 Stratos Idreos 12
All together
0
50
100
150
200
0 20 40 60 80 100
Exec
utio
n tim
e (s
ec)
Result selectivity (%)
Full ScanIndex ScanOptimizer decisionAvg. statistics collection
0
50
100
150
200
0 20 40 60 80 100Result selectivity (%)
0
50
100
150
200
0 20 40 60 80 100Result selectivity (%)
All together
0
50
100
150
200
0 20 40 60 80 100
Exec
utio
n tim
e (s
ec)
Result selectivity (%)
Full ScanIndex ScanOptimizer decisionAvg. statistics collection
0
50
100
150
200
0 20 40 60 80 100Result selectivity (%)
0
50
100
150
200
0 20 40 60 80 100Result selectivity (%)
All together
0
50
100
150
200
0 20 40 60 80 100
Exec
utio
n tim
e (s
ec)
Result selectivity (%)
Full ScanIndex ScanOptimizer decisionAvg. statistics collection
0
50
100
150
200
0 20 40 60 80 100Result selectivity (%)
0
50
100
150
200
0 20 40 60 80 100Result selectivity (%)
basic stats per column for pair
can we just recompute the statistics?
/24CS165, Fall 2017 Stratos Idreos 13
2012, somewhere in Germany
if I keep 30 data systems researchers “trapped” in a castle for a week, we might be able to
define “robust query processing” and find a few solutions
/24CS165, Fall 2017 Stratos Idreos 14
robust query processing (best definition to date by Goetz) graceful degradation when the environment changes 14
/24CS165, Fall 2017 Stratos Idreos 15
Renata Borovica University of Melbourne
Campbell Fraser Google
Marcin Zukowski Snowflake
selectivity
resp
onse
tim
eindex
scan
Can we avoid bad access path selection(secondary index vs scan)
when we have stale (or no) statistics?
/24CS165, Fall 2017 Stratos Idreos 16
select min(A) from R where B<10 and C<80
logical plan
optimizer
physical plan execution
mid query reoptimization
/24CS165, Fall 2017 Stratos Idreos 17
SWITCH SCANwhile index probing switch to scan if cardinality > estimation
good: avoids worst case bad: performance cliff
SMOOTH SCANgoal avoid performance cliff close to optimal
Cardinality Estimate based SS
Cardinality
Co
st
Index Scan
~ TS
co
st
TS c
ost
Table Scan
𝑋 ∗ 𝐸𝐶
Smooth Scan
Switch Scan
Design smooth scan
/24CS165, Fall 2017 Stratos Idreos 18
/24CS165, Fall 2017 Stratos Idreos 18
/24CS165, Fall 2017 Stratos Idreos 19
Extra: Efficient mid-query re-optimization of sub-optimal query execution plansNavin Kabra and David DeWitt ACM SIGMOD International Conference on Management of Data, 1998
Browse: Smooth Scan: Statistics-Oblivious Access PathsRenata Borovica, Stratos Idreos, Anastasia Ailamaki, Marcin Zukowski and Campbell Fraser IEEE International Conference on Data Engineering (ICDE), 2015
Read: Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?
Mike. Kester, Manos Athanassoulis, and Stratos Idreos ACM SIGMOD International Conference on Management of Data, 2017
DATA SYSTEMSprof. Stratos Idreos
class 13
scans vs indexes