Index Interactions in Physical Design Tuning: Modeling, Analysis, and Applications
Karl Schnaitter, UC Santa Cruz
Neoklis Polyzotis, UC Santa Cruz
Lise Getoor, Univ. of Maryland
VLDB 2009, Lyon, France
University of California, Santa Cruz

Index Selection
• Index selection problem:
  – Given a query workload
  – Choose indices that improve workload performance
• Does index benefit depend on other indices?
  – If so, this is called index interaction
• Index “benefit” is a key concept
  – Informally, for an index i,
    [benefit of i] = [exec cost without i] − [exec cost with i]

Related Work
• Interactions are a key concern in physical tuning
  – [Whang et al. 1981] make assumptions implying that indices on different tables do not interact
  – [Finkelstein et al. 1988] assume that indices do not interact if they are relevant to separate queries
  – [Bruno and Chaudhuri 2007] explicitly account for some interactions in on-line index selection
  – Many more…
• These studies treat interactions as a secondary issue, and often rely on ad hoc assumptions

Index Interactions
• Let S be a set of indices relevant to a query Q
• cost(X) = cost of Q if only X ⊆ S is available
• benefit(Y, X) = cost(X) − cost(Y ∪ X)
[Figure: the costs cost(X), cost(X ∪ {a}), cost(X ∪ {b}), cost(X ∪ {a,b}) and the benefits benefit({a}, X), benefit({a}, X ∪ {b})]
• Indices a, b are independent with respect to X
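
The definitions of cost and benefit above can be tried on a toy cost table. This is a minimal sketch, not code from the talk: the cost values are made up, and the helper name `independent` is an assumption that simply compares the benefit of a with and without b.

```python
import math

# Hypothetical costs of one query Q for every subset of S = {a, b}.
# These numbers are invented for illustration; they are not from the talk.
COSTS = {
    frozenset(): 100.0,
    frozenset({"a"}): 70.0,
    frozenset({"b"}): 80.0,
    frozenset({"a", "b"}): 50.0,
}


def cost(x):
    """cost(X): cost of Q if only the indices in X are available."""
    return COSTS[frozenset(x)]


def benefit(y, x):
    """benefit(Y, X) = cost(X) - cost(Y ∪ X)."""
    return cost(x) - cost(frozenset(y) | frozenset(x))


def independent(a, b, x):
    """True if the benefit of a is the same with and without b on top of X."""
    return math.isclose(benefit({a}, x), benefit({a}, set(x) | {b}))


print(benefit({"a"}, set()))         # benefit of a alone
print(benefit({"a"}, {"b"}))         # benefit of a when b is already present
print(independent("a", "b", set()))
```

With this table both benefits equal 30, so a and b come out as independent with respect to X = ∅.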

Index Interactions
[Figure: the same costs and benefits as on the previous slide]
• Indices a, b positively interact with respect to X

Index Interactions
[Figure: the same costs and benefits as on the previous slide]
• Indices a, b negatively interact with respect to X

Degree of Interaction
• doi(a,b,X) = degree of interaction between a, b with respect to X
  doi(a,b,X) = |benefit({a}, X) − benefit({a}, X ∪ {b})| / cost(X ∪ {a,b})
             = |cost(X ∪ {a}) − cost(X) − cost(X ∪ {a,b}) + cost(X ∪ {b})| / cost(X ∪ {a,b})
[Figure: lattice of the index sets X, X ∪ {a}, X ∪ {b}, X ∪ {a,b}]
• doi is symmetric
• doi(a,b) = max over X ⊆ S of doi(a,b,X)
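
The degree of interaction for a single X can be written as a small function. The sketch below is illustrative: the cost table is hypothetical, the name `doi_x` is an assumption, and the numerator is taken in absolute value, which is what makes the measure symmetric in a and b.

```python
# Hypothetical cost table for S = {a, b}; values are illustrative only.
COSTS = {
    frozenset(): 100.0,
    frozenset({"a"}): 60.0,
    frozenset({"b"}): 60.0,
    frozenset({"a", "b"}): 50.0,
}


def cost(x):
    """cost(X): cost of the query if only the indices in X are available."""
    return COSTS[frozenset(x)]


def doi_x(a, b, x):
    """Degree of interaction between a and b with respect to X."""
    x = frozenset(x)
    both = cost(x | {a, b})
    return abs(cost(x | {a}) - cost(x) - both + cost(x | {b})) / both


d_ab = doi_x("a", "b", set())
d_ba = doi_x("b", "a", set())
print(d_ab, d_ba)  # the two values agree because doi is symmetric
```

Here the numerator is |60 − 100 − 50 + 60| = 30 and the denominator is 50, so both calls give 0.6.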

Problem Statement
• Which indices in S interact?
• How strong are the interactions?
• The Degree of Interaction Problem:
  Compute doi(a,b) for all a,b ∈ S

Outline
• Properties of Query Optimization
• Degree of Interaction Algorithm
• Applying Interaction Information

Query Optimization
• Computing doi(a,b) is not practical if the optimizer is totally arbitrary
  – Need to compute doi(a,b,X) for all X ⊆ S
• In practice, query optimization is not arbitrary
  – E.g., we expect cost(∅) ≥ cost({a})
• We put mild assumptions on query optimization:
  – Plans are selected from some fixed space P
  – Optimizer chooses the cheapest feasible plan from P
  – Ties are broken consistently

Index Benefit Graph
• An Index Benefit Graph (IBG) encodes the selection of optimal plans for a query
  – Introduced by [Frank, Omiecinski, and Navathe 1992]
• Example IBG when S = {a,b,c,d}
[Figure: IBG in which each node shows an index set, the indices used in the optimal plan, and the cost of that plan. Node costs: {a,b,c,d} = 20, {a,b,c} = 45, {b,c,d} = 50, {a,c} = 80, {b,c} = 50, {c,d} = 65, {c} = 80, {d} = 80]
  – There are 16 subsets of S
  – IBG has 8 nodes
  – But IBG can compute cost(X) for all X ⊆ S
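
One way an 8-node IBG can answer cost(X) for all 16 subsets is a walk from the root that drops used indices missing from X. The sketch below is an assumption-laden illustration, not the authors' code: the node costs follow the example figure, but the `used` sets did not survive in the slide text and are guessed here so that each node's children are obtained by removing one used index. The names `IBGNode` and `ibg_cost` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet


@dataclass(frozen=True)
class IBGNode:
    cost: float
    used: FrozenSet[str]  # indices used by the optimal plan at this node


def fs(names: str) -> FrozenSet[str]:
    return frozenset(names)


# Costs as in the example figure; the `used` sets are ASSUMED for illustration.
IBG = {
    fs("abcd"): IBGNode(20, fs("ad")),
    fs("abc"): IBGNode(45, fs("ab")),
    fs("bcd"): IBGNode(50, fs("bd")),
    fs("ac"): IBGNode(80, fs("")),
    fs("bc"): IBGNode(50, fs("b")),
    fs("cd"): IBGNode(65, fs("cd")),
    fs("c"): IBGNode(80, fs("")),
    fs("d"): IBGNode(80, fs("")),
}
ROOT = fs("abcd")


def ibg_cost(ibg, root, x):
    """Walk down from the root until the node's plan uses only indices in X."""
    x = frozenset(x)
    node = root
    while True:
        missing = ibg[node].used - x
        if not missing:
            return ibg[node].cost
        # Drop one used index that X lacks; that child node still contains X.
        node = node - {min(missing)}


SUBSETS = [frozenset(c) for r in range(5) for c in combinations("abcd", r)]
ALL_COSTS = {x: ibg_cost(IBG, ROOT, x) for x in SUBSETS}
print(len(IBG), len(ALL_COSTS))
```

Under these assumed `used` sets, the walk resolves every one of the 16 subsets using only the 8 stored nodes.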

Naive Algorithm
• Recall that we want the degree of interaction between all pairs of indices in S
• Each doi(a,b) may be computed directly

For all a,b ∈ S
    Initialize T[a,b] = 0
    For all X ⊆ S
        Let d = |cost(X ∪ {a}) − cost(X) − cost(X ∪ {a,b}) + cost(X ∪ {b})| / cost(X ∪ {a,b})
        Assign T[a,b] = max(d, T[a,b])

• Upon termination, T[a,b] = doi(a,b) for all a,b
• Can save time using an IBG as a cache of cost function
• Downside: iteration over all subsets of S
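
The naive algorithm translates almost line by line into Python. This is a sketch under stated assumptions: the cost model is hypothetical (a and b are redundant, c helps independently), and the function names are not from the talk. The loop over all subsets of S is what makes it exponential.

```python
from itertools import combinations


def cost(x):
    """Hypothetical cost model: a and b are redundant, c helps on its own."""
    x = frozenset(x)
    total = 100.0
    if "a" in x or "b" in x:
        total -= 40.0
    if "c" in x:
        total -= 10.0
    return total


def all_subsets(s):
    """Yield every subset of s as a frozenset (2^|s| sets)."""
    items = sorted(s)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def naive_doi_table(cost_fn, s):
    # Initialize T[a,b] = 0 for every unordered pair of indices.
    table = {frozenset(pair): 0.0 for pair in combinations(sorted(s), 2)}
    for pair in table:
        a, b = sorted(pair)
        for x in all_subsets(s):
            both = cost_fn(x | {a, b})
            d = abs(cost_fn(x | {a}) - cost_fn(x) - both + cost_fn(x | {b})) / both
            table[pair] = max(d, table[pair])
    return table


T = naive_doi_table(cost, {"a", "b", "c"})
for pair, value in sorted(T.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), value)
```

In this toy model only the pair (a, b) interacts; its maximum is reached at X = {c}, where the numerator is 40 and the denominator is 50.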

The QINTERACT Algorithm
• We should avoid evaluating doi(a,b,X) for all X ⊆ S
• QINTERACT algorithm processes two index sets per IBG node

Naive Algorithm (condensed)
For all a,b ∈ S
    Initialize T[a,b] = 0
    For all X ⊆ S
        Assign T[a,b] = max(doi(a,b,X), T[a,b])

QINTERACT Algorithm
For all a,b ∈ S
    Initialize T[a,b] = 0
    For all IBG nodes Y
        Construct two index sets X1, X2 ⊆ S (see paper)
        Assign T[a,b] = max(doi(a,b,X1), doi(a,b,X2), T[a,b])

QINTERACT Example
• Let’s calculate doi(a,b) on the graph below
• What happens on iteration Y = {u}?
[Figure: IBG over S = {a,b,u,v} with node costs {a,b,u,v} = 20, {a,u,v} = 30, {b,u,v} = 30, {a,u} = 40, {u,v} = 40, {b,v} = 40, {u} = 50, {v} = 50; node Y is highlighted. Two groups of cost terms are shown beside it: cost(∅), cost(a), cost(b), cost(ab) and cost(u), cost(ua), cost(ub), cost(aub)]
  X1 = {u}: doi(a,b,X1) = |40 − 50 − 20 + 30| / 20 = 0
  X2 = ∅: doi(a,b,X2) = |40 − 50 − 20 + 40| / 20 = 0.5
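
The arithmetic for this iteration can be checked directly. The sketch below only plugs the four cost values quoted on the slide into the doi formula; the function and variable names are assumptions, and the construction of X1 and X2 themselves is left to the paper.

```python
def doi_from_costs(cost_x, cost_xa, cost_xb, cost_xab):
    """doi for one X, given the four costs that appear in the formula."""
    return abs(cost_xa - cost_x - cost_xab + cost_xb) / cost_xab


# X1 = {u}: the slide uses 40 - 50 - 20 + 30 in the numerator.
doi_x1 = doi_from_costs(cost_x=50, cost_xa=40, cost_xb=30, cost_xab=20)

# X2 = ∅: the slide uses 40 - 50 - 20 + 40 in the numerator.
doi_x2 = doi_from_costs(cost_x=50, cost_xa=40, cost_xb=40, cost_xab=20)

# Table update for this iteration, starting from T[a,b] = 0.
t_ab = max(doi_x1, doi_x2, 0.0)
print(doi_x1, doi_x2, t_ab)
```

After this iteration the running maximum for the pair (a, b) is 0.5, contributed by X2.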

Interleaved IBG Processing
• In QINTERACT, the IBG is built, then analyzed
  – I.e., IBG construction and analysis are serial
• We can discover interactions in a partial IBG
• IBG construction and analysis may be interleaved
  – Improves accuracy of doi over time
[Figure: the example IBG for S = {a,b,c,d}, shown partially built]
  doi(b,d,{a,c}) = |45 − 80 − 20 + 20| / 20 = 1.75
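
Spelled out term by term, the value 1.75 follows from the doi formula with X = {a,c}. The assignment of the four numbers to cost terms below is inferred from the order of terms in the formula, and the result is only positive if the numerator is taken in absolute value.

```latex
\[
\mathrm{doi}(b,d,\{a,c\})
  = \frac{\bigl|\,\mathrm{cost}(\{a,b,c\}) - \mathrm{cost}(\{a,c\})
      - \mathrm{cost}(\{a,b,c,d\}) + \mathrm{cost}(\{a,c,d\})\,\bigr|}
         {\mathrm{cost}(\{a,b,c,d\})}
  = \frac{|45 - 80 - 20 + 20|}{20}
  = \frac{35}{20}
  = 1.75
\]
```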

Outline
• Properties of Query Optimization
• Degree of Interaction Algorithm
• Applying Interaction Information
  – Visualizing Index Interactions
  – Scheduling Index Creation

Visualizing Index Interactions
• We can visualize the doi function as a graph
  – Nodes correspond to indices
  – Edge between a and b has weight doi(a,b)
[Figure: interaction graph for TPC-H Query 7. Node labels: O(CK,OK), C(CK,NK), C(NK,CK), LI(SK,SD,D,EP,OK), LI(SD,D), LI(SD,Q), S(NK,N,SK), S(NK,SK), S(SK,NK). Edge weights shown: 0.01, 0.02, 0.04, 0.02, 0.03, 0.09, 0.02, 0.01, 0.02]

Interaction Graph
• The connected components have special meaning
[Figure: interaction graph partitioned into connected components C1, C2, C3]
  1. The benefit of any X ⊆ Ci does not depend on S − Ci
  2. Refining the partition loses property (1)
  3. This is the only partition with properties (1) and (2)
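
Building the interaction graph and extracting its connected components takes only a few lines. This is a minimal sketch: the doi values are invented, the function names are assumptions, and the optional `threshold` parameter is an illustrative way to drop minor interactions before the components are computed.

```python
def interaction_graph(indices, doi_table, threshold=0.0):
    """Adjacency map with an edge a-b of weight doi(a,b) when it exceeds threshold."""
    graph = {i: {} for i in indices}
    for (a, b), weight in doi_table.items():
        if weight > threshold:
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


def connected_components(graph):
    """Return the connected components as a list of sets (depth-first search)."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(n for n in graph[node] if n not in component)
        seen |= component
        components.append(component)
    return components


# Hypothetical doi values for five indices; not taken from the talk.
INDICES = ["a", "b", "c", "d", "e"]
DOI = {("a", "b"): 0.8, ("b", "c"): 0.2, ("d", "e"): 0.05, ("a", "d"): 0.0}

print(connected_components(interaction_graph(INDICES, DOI)))
print(connected_components(interaction_graph(INDICES, DOI, threshold=0.1)))
```

Without a threshold the toy data splits into {a, b, c} and {d, e}; raising the threshold to 0.1 removes the weak d-e edge and leaves d and e as singleton components.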

Scheduling Index Creation
• Suppose we want to materialize new indices
• In what order should they be created?
[Figure: three charts of benefit (y-axis) against materialized indices (x-axis), one per schedule: a,b,c (∅, a, a,b, a,b,c); b,a,c (∅, b, a,b, a,b,c); c,a,b (∅, c, a,c, a,b,c)]
• Choose the first schedule to maximize benefit over time (shaded area)

Scheduling Index Creation
• We define an optimization problem
  – M = preexisting indices
  – {a1, …, an} = new indices to create
  – Permute new indices as t1, …, tn to maximize
    ∑_{i=1}^{n} benefit({t1, …, ti}, M)
• This problem is computationally hard
  – There is a connection to the Set Cover problem, since each new index “covers” more benefit
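
The objective can be evaluated for a given schedule, and for a handful of indices the best permutation can be found by brute force. The sketch below is illustrative only: the cost table is hypothetical, M is taken to be empty, and the function names are assumptions. Exhaustive search is used purely to make the objective concrete, since the problem is hard in general.

```python
from itertools import permutations

# Hypothetical costs for every subset of the new indices {a, b, c}, with M = ∅.
COSTS = {
    frozenset(): 100, frozenset("a"): 90, frozenset("b"): 95, frozenset("c"): 85,
    frozenset("ab"): 40, frozenset("ac"): 80, frozenset("bc"): 82,
    frozenset("abc"): 30,
}


def cost(x):
    return COSTS[frozenset(x)]


def schedule_value(preexisting, schedule):
    """Sum over i of benefit({t1, ..., ti}, M) for the given creation order."""
    m = frozenset(preexisting)
    built = set()
    total = 0
    for index in schedule:
        built.add(index)
        total += cost(m) - cost(m | built)
    return total


def best_schedule(preexisting, new_indices):
    """Exhaustive search over all n! orders; only viable for tiny n."""
    return max(permutations(sorted(new_indices)),
               key=lambda s: schedule_value(preexisting, s))


for order in [("a", "b", "c"), ("b", "a", "c"), ("c", "a", "b")]:
    print(order, schedule_value((), order))
print(best_schedule((), {"a", "b", "c"}))
```

For this table the three orders score 140, 135, and 105, and the exhaustive search returns a, b, c.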

Greedy Scheduling
• We are tempted to use a greedy heuristic
• This results in the third schedule
[Figure: the same three benefit charts, for schedules a,b,c; b,a,c; and c,a,b]
• Greedy schedule can be suboptimal by a factor of about (n − 1)
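
A greedy heuristic can be sketched as follows, assuming that "greedy" means picking, at each step, the index that lowers the cost the most given what has already been built. That reading, the cost table, and the function names are assumptions for illustration; the table is chosen so that a and b only pay off together, which is where a greedy choice goes wrong.

```python
# Hypothetical costs for every subset of the new indices {a, b, c}, with M = ∅.
COSTS = {
    frozenset(): 100, frozenset("a"): 90, frozenset("b"): 95, frozenset("c"): 85,
    frozenset("ab"): 40, frozenset("ac"): 80, frozenset("bc"): 82,
    frozenset("abc"): 30,
}


def cost(x):
    return COSTS[frozenset(x)]


def schedule_value(schedule):
    """Sum of benefit({t1, ..., ti}, ∅) over the prefixes of the schedule."""
    built = set()
    total = 0
    for index in schedule:
        built.add(index)
        total += cost(set()) - cost(built)
    return total


def greedy_schedule(new_indices):
    """Repeatedly build the index that gives the lowest cost right now."""
    built = set()
    remaining = sorted(new_indices)
    schedule = []
    while remaining:
        best = min(remaining, key=lambda i: cost(built | {i}))
        schedule.append(best)
        built.add(best)
        remaining.remove(best)
    return schedule


greedy = greedy_schedule({"a", "b", "c"})
print(greedy, schedule_value(greedy))
print(["a", "b", "c"], schedule_value(["a", "b", "c"]))
```

On this table greedy builds c first because it has the largest stand-alone benefit, and ends with a total of 105, while the order a, b, c reaches 140.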

Interaction-Aware Scheduling
• Scheduling can use interaction graph
[Figure: interaction graph with connected components C1, C2, C3]
• Idea: First find optimal sub-schedules for each Ci
• Then choose the best interleaving of sub-schedules
• This heuristic avoids the pitfalls of greedy scheduling
• We can also show stronger performance guarantees
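
The two-step idea can be illustrated with brute force on a tiny example. The slides do not say how sub-schedules or interleavings are found, so everything below is an assumption made for illustration: the cost model (a and b interact, c is independent), the exhaustive searches, and all function names.

```python
from itertools import permutations


def cost(x):
    """Hypothetical cost: {a, b} form one component, {c} another."""
    x = set(x)
    saved = 0
    if {"a", "b"} <= x:
        saved += 60
    elif "a" in x:
        saved += 10
    elif "b" in x:
        saved += 5
    if "c" in x:
        saved += 15
    return 100 - saved


def schedule_value(schedule):
    """Sum of benefit({t1, ..., ti}, ∅) over the prefixes of the schedule."""
    built = set()
    total = 0
    for index in schedule:
        built.add(index)
        total += cost(set()) - cost(built)
    return total


def interleavings(seqs):
    """Yield every merge of the sequences that keeps each one's internal order."""
    seqs = [s for s in seqs if s]
    if not seqs:
        yield ()
        return
    for i, s in enumerate(seqs):
        rest = seqs[:i] + [s[1:]] + seqs[i + 1:]
        for tail in interleavings(rest):
            yield (s[0],) + tail


def interaction_aware_schedule(components):
    # Step 1: best order inside each component (exhaustive, for illustration).
    subs = [max(permutations(sorted(c)), key=schedule_value) for c in components]
    # Step 2: best interleaving of those sub-schedules.
    return max(interleavings(subs), key=schedule_value)


best = interaction_aware_schedule([{"a", "b"}, {"c"}])
print(best, schedule_value(best))
print(("c", "a", "b"), schedule_value(("c", "a", "b")))
```

In this example the component-wise search returns a, b, c with a total of 145, whereas starting with c (the largest stand-alone benefit) gives 115.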

Conclusions
• Index interactions provide useful insights for physical design tuning
• The doi metric is an effective characterization of interaction relationships
• We can analyze interactions efficiently when the Index Benefit Graph has limited size
• Future work?
Thank You

Performance Evaluation
• QINTERACT implementation in Java
  – Uses JDBC to connect to IBM DB2 database
• Experiments use 22 TPC-H benchmark queries
• We generate indices based on the DB2 advisor
  – S_ALL = all indices recommended by DB2
  – S_1C = indices in S_ALL with first column only
• We monitor the progress of the “serial” and “interleaved” approaches over time

Experimental Results
[Figure: two plots, one for the S_ALL index set and one for the S_1C index set, each with a 0.1 threshold]

Applications
• QINTERACT returns doi(a,b) for all a,b
• We propose two applications of this information
  – Visualizing index interactions
    • Illustrates the global interactions as a graph
    • Useful when manually tuning the index set
  – Scheduling index construction
    • Want to choose when new indices will be created
    • Goal is to increase performance as quickly as possible
    • Knowledge of index interactions can help

Problem Statement
• Which indices in S interact?
• How strong are the interactions?
• The Degree of Interaction Problem:
  Compute doi(a,b) for all a,b ∈ S
• It may be useful to ignore “minor” interactions
• A threshold-based variant:
  Decide if doi(a,b) > τ for all a,b ∈ S

Index Selection
• Index selection problem:
  W = a query workload
  S = a set of indices relevant to W
  cost(M) = cost of W when indices M ⊆ S are available
  Want to find M ⊆ S to minimize cost(M)
• We can quantify the benefit of an index:
  a = any index
  X = set of other indices
  benefit(a, X) = cost(X) − cost(X ∪ {a})
• Does benefit(a, X) depend on X?
  – If so, this is called index interaction

Future Work
• Expand our support for updates
• Implementation of visualization tool
• Experiments with materialization scheduling
• Incremental updates to doi function
• Exploring stronger assumptions on query optimization
  – Efficient upper bounds on doi function?