Fall 2008 Parallel Query Optimization 1
Parallel Query Optimization
Bucket Sizes and I/O Costs
Bucket B does not fit in the memory in its entirety. It must be loaded several times.
[Figure: Bucket A, read one tuple at a time; Memory; Bucket B]
Fit in Memory
Bucket B fits in memory. It needs to be loaded only once.
[Figure: Buckets A(1), A(2), A(3), read one tuple at a time; Memory; Buckets B(1), B(2), B(3)]
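The case where Bucket B fits in memory can be sketched as a simple build-and-probe join of one bucket pair. This is a minimal illustration, not the presentation's own code: the function name `join_bucket_pair`, the tuple layout, and the choice of the first field as join key are all assumptions.

```python
from collections import defaultdict

def join_bucket_pair(bucket_a, bucket_b, key_a=0, key_b=0):
    """Join one bucket pair, assuming bucket_b fits in memory (hypothetical helper)."""
    # Build: load Bucket B exactly once into an in-memory hash table.
    table = defaultdict(list)
    for tup_b in bucket_b:
        table[tup_b[key_b]].append(tup_b)
    # Probe: stream Bucket A one tuple at a time against the table.
    result = []
    for tup_a in bucket_a:
        for tup_b in table.get(tup_a[key_a], []):
            result.append(tup_a + tup_b)
    return result

# Toy data; the first field of each tuple is the join key.
bucket_a = [(1, "a1"), (2, "a2"), (2, "a3")]
bucket_b = [(2, "b1"), (3, "b2")]
joined = join_bucket_pair(bucket_a, bucket_b)
print(joined)
```

If Bucket B were larger than memory, the build step could not hold the whole table at once, which is why the bucket would have to be loaded in several passes.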
Hash-Based Join
GRACE Algorithm
Data Skew
System performance is very sensitive to the skewness in tuple distribution.
Zipf-like Distribution
Total: 1,000,000 tuples
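To make the skew concrete, the sketch below spreads 1,000,000 tuples over buckets with Zipf-like weights proportional to 1/rank^theta. Only the total comes from the slide. The number of buckets, the exponent `theta`, and the function name `zipf_bucket_sizes` are assumptions chosen for illustration.

```python
def zipf_bucket_sizes(total_tuples, num_buckets, theta=1.0):
    """Split total_tuples into bucket sizes proportional to 1 / rank**theta.

    num_buckets and theta are illustrative assumptions, not values from the slides.
    """
    weights = [1.0 / (rank ** theta) for rank in range(1, num_buckets + 1)]
    norm = sum(weights)
    sizes = [int(total_tuples * w / norm) for w in weights]
    # Give the rounding remainder to the largest bucket so the total is exact.
    sizes[0] += total_tuples - sum(sizes)
    return sizes

sizes = zipf_bucket_sizes(1_000_000, num_buckets=10)
print(sizes)
print(sizes[0] / sizes[-1])  # ratio of largest to smallest bucket
```

With `theta=0.0` every weight is equal and the distribution is uniform. Larger values concentrate more tuples in the first few buckets.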
Partition Tuning
Best Fit Decreasing Strategy:
In this partition tuning strategy, the hash buckets are first sorted into decreasing order according to size.
In each iteration, the currently largest bucket is assigned to the currently smallest partition (or PN).
This process is repeated until all the buckets have been allocated.
This is a dynamic load balancing technique.
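The strategy described above can be sketched with a min-heap that always yields the currently smallest partition. The bucket names, sizes, and the function name `best_fit_decreasing` are hypothetical, and ties between equally loaded partitions are broken by partition number, which the slides do not specify.

```python
import heapq

def best_fit_decreasing(bucket_sizes, num_partitions):
    """Assign buckets to partitions: largest bucket to the smallest partition."""
    # Heap of (current load, partition id); the smallest load is popped first.
    heap = [(0, pn) for pn in range(num_partitions)]
    heapq.heapify(heap)
    assignment = {pn: [] for pn in range(num_partitions)}
    # Sort the buckets into decreasing order of size.
    ordered = sorted(bucket_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for bucket_id, size in ordered:
        load, pn = heapq.heappop(heap)           # currently smallest partition
        assignment[pn].append(bucket_id)
        heapq.heappush(heap, (load + size, pn))  # partition grows by this bucket
    loads = {pn: sum(bucket_sizes[b] for b in bs) for pn, bs in assignment.items()}
    return assignment, loads

# Toy example with made-up bucket sizes (in tuples).
bucket_sizes = {"b0": 50, "b1": 30, "b2": 20, "b3": 20, "b4": 10}
assignment, loads = best_fit_decreasing(bucket_sizes, num_partitions=2)
print(assignment)
print(loads)
```

Handling the large buckets first leaves the small ones to even out the remaining differences between partitions.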
Best Fit Decreasing Strategy
Adaptive Load Balancing (ABJ+)
ABJ+ vs. GRACE
L_LBO in Multi-way Join Queries
L_LBO: Linear Tree with Load Balancing
A multi-way join query is treated as a sequence of two-way (single) joins, each performed using ABJ+.
B_NLB in Multi-way Join Queries
B_NLB: Bushy Tree without Load Balancing
It tries to join as many pairs of relations as possible.
Split Phase: Each PN partitions its portion of each relation into small subbuckets, and each subbucket is transferred to the PN corresponding to its bucket ID.
Join Phase: Each PN performs the local joins.
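The split phase can be sketched as a hash repartitioning step. This sketch assumes one bucket per PN, so the bucket ID is the destination PN, and it simulates the transfer with in-memory lists. The function name `split_phase` and the sample relations are hypothetical.

```python
from collections import defaultdict

def split_phase(local_portions, num_pns, key=0):
    """Repartition one relation; local_portions[pn] holds the tuples stored at that PN.

    Assumes one bucket per PN (bucket ID == destination PN) and integer join keys.
    """
    received = defaultdict(list)  # destination PN -> tuples that arrived there
    for source_pn, tuples in local_portions.items():
        # Each PN partitions its own portion into small subbuckets.
        subbuckets = defaultdict(list)
        for tup in tuples:
            bucket_id = hash(tup[key]) % num_pns
            subbuckets[bucket_id].append(tup)
        # Each subbucket is "transferred" to the PN that owns its bucket ID.
        for bucket_id, subbucket in subbuckets.items():
            received[bucket_id].extend(subbucket)
    return dict(received)

# Two relations R and S, initially spread over two PNs in arbitrary order.
r_portions = {0: [(1, "r1"), (2, "r2")], 1: [(3, "r3"), (4, "r4")]}
s_portions = {0: [(4, "s4")], 1: [(1, "s1"), (2, "s2")]}
r_at = split_phase(r_portions, num_pns=2)
s_at = split_phase(s_portions, num_pns=2)
print(r_at)
print(s_at)
```

After the split, tuples with equal join keys sit on the same PN, so the join phase needs no further communication. Because no partition tuning takes place, a skewed key distribution translates directly into uneven PN loads.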
NLBO in Multi-way Join Queries
NLBO: No Load Balancing Optimization
Like B_NLB, it tries to join as many pairs of relations as possible.
Hash Phase: Each PN partitions its portion of each relation into small subbuckets and stores them back to its own disks.
Partition Tuning Phase: It allocates the buckets to the PNs using the Best Fit Decreasing Strategy.
Join Phase: Each PN performs the local joins.
LBO in Multi-way Join Queries
LBO: Load Balancing Optimization
Hash Phase: The relations are hashed into buckets and stored back on the local disks.
Optimization Phase: The best fit decreasing strategy and a greedy algorithm are used to select the joins that will be executed concurrently.
Executing Phase:
Stage 1: Tune the partitions.
Stage 2: Perform the join operation.
Stage 3: Update the join graph, then go to Optimization Phase.
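The loop between the Optimization Phase and the Executing Phase can be sketched as follows. The slides do not state the greedy criterion, so this sketch assumes "cheapest estimated join first, no relation used twice in a round". The cost numbers, the function names, and the reuse of old costs after a merge are illustrative assumptions. Partition tuning and the joins themselves are omitted.

```python
def select_concurrent_joins(edges):
    """Greedily pick joins that share no relation (assumed criterion: cheapest first)."""
    chosen, used = [], set()
    for cost, a, b in sorted(edges):
        if a not in used and b not in used:
            chosen.append((a, b))
            used.update((a, b))
    return chosen

def lbo_schedule(edges):
    """Return the rounds of joins; the joins within one round would run concurrently."""
    rounds = []
    edges = list(edges)
    while edges:
        chosen = select_concurrent_joins(edges)  # Optimization Phase
        rounds.append(chosen)  # Stages 1 and 2 (tune partitions, join) would run here
        # Stage 3: update the join graph by merging each joined pair into one node.
        merged = {}
        for a, b in chosen:
            merged[a] = merged[b] = f"({a} JOIN {b})"
        new_edges = {}
        for cost, a, b in edges:
            na, nb = merged.get(a, a), merged.get(b, b)
            if na == nb:
                continue  # this join was executed in the current round
            pair = tuple(sorted((na, nb)))
            # Placeholder: keep the old estimate; a real system would re-estimate.
            new_edges[pair] = min(cost, new_edges.get(pair, cost))
        edges = [(c, a, b) for (a, b), c in new_edges.items()]
    return rounds

# Join graph for a chain query R - S - T - U, as (estimated cost, relation, relation).
edges = [(3, "R", "S"), (5, "S", "T"), (2, "T", "U")]
rounds = lbo_schedule(edges)
for number, joins in enumerate(rounds, start=1):
    print(number, joins)
```

For this chain query the first round runs two independent joins side by side and the second round joins their results, which gives a bushy execution plan.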
Optimization Phase of LBO
Effect of Bucket Skew
LBO-FR
LBO-FR: LBO with Fragment & Replicate Feature
LBO-FR is similar to LBO, except that it partitions bucket pairs into subbucket pairs if those buckets are too large.
Example: suppose bucket pair (S1, R1) is too large and |S1| > |R1|.
[Figure: bucket pair (S1, R1) split into subbucket pairs (S1,1, R1), (S1,2, R1), and, with three fragments, (S1,1, R1), (S1,2, R1), (S1,3, R1)]
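A minimal sketch of the fragment-and-replicate idea for one oversized bucket pair: the larger bucket (here S1) is cut into fragments and the smaller bucket (R1) is paired, unchanged, with every fragment. The round-robin split, the helper names, and the toy data are assumptions. The slides do not say how the fragments are formed.

```python
def hash_join(bucket_s, bucket_r):
    """Plain in-memory hash join on the first field (illustrative helper)."""
    table = {}
    for r in bucket_r:
        table.setdefault(r[0], []).append(r)
    return [s + r for s in bucket_s for r in table.get(s[0], [])]

def fragment_and_replicate(bucket_s, bucket_r, num_fragments):
    """Fragment the larger bucket S; pair each fragment with all of R.

    Assumes |S| > |R|, as in the example; the round-robin split is an assumption.
    """
    fragments = [bucket_s[i::num_fragments] for i in range(num_fragments)]
    return [(fragment, bucket_r) for fragment in fragments]

s1 = [(k, f"s{k}") for k in range(6)]    # larger bucket: fragmented
r1 = [(0, "r0"), (3, "r3"), (5, "r5")]   # smaller bucket: replicated
pairs = fragment_and_replicate(s1, r1, num_fragments=3)

# Each subbucket pair can be joined independently, for example on different PNs.
fr_result = [row for frag, rep in pairs for row in hash_join(frag, rep)]
print(len(pairs), fr_result)
```

Because every tuple of S1 lands in exactly one fragment and every fragment sees all of R1, the union of the subbucket joins equals the join of the original pair.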
LBO-SFR
LBO-SFR: LBO with Symmetric Fragment & Replicate Feature
[Figure: subbucket pairs (S1,i,j, R1,i,j) produced by repeated partitioning; labels: |S1| > |R1| (partition S1), |S1,1,1| < |R1,1,1| (partition R1), |S1,1,1| > |R1,1,1| (partition S1)]
Effect of Bucket Skew