WAIM 2015
SALA: A Skew-avoiding and Locality-aware Algorithmfor MapReduce-based Join
Authors: Ziyu Lin, Minxing Cai, Ziming Huang, Yongxuan Lai
Speaker: Minxing Cai
Date: 2015-06-10
dblab.xmu.edu.cn
Background
Join operation
R join S on R.uid = S.uid
Dataset R:
uid  name
1    Jacky
2    Lucy
3    Tom
4    Kevin
5    Richard
…    …

Dataset S:
uid  page
1    /book
1    /music
2    /music
4    /movie
5    /book
…    …

Results of join:
uid  name     page
1    Jacky    /book
1    Jacky    /music
2    Lucy     /music
4    Kevin    /movie
5    Richard  /book
…    …        …
Background
MapReduce-based join (repartition join)
• Redistribute the data to partitions based on the join key.
• Key-value pairs that have the same key are distributed to the same partition.
• The join operation is performed in the reduce phase.
Partition 1 (Reducer 1)

Part of dataset R:
uid  name
1    Jacky
3    Tom
5    Richard

Part of dataset S:
uid  page
1    /book
1    /music
5    /book

Part of join results:
uid  name     page
1    Jacky    /book
1    Jacky    /music
5    Richard  /book

Partition 2 (Reducer 2)

Part of dataset R:
uid  name
2    Lucy
4    Kevin
…    …

Part of dataset S:
uid  page
2    /music
4    /movie
…    …

Part of join results:
uid  name   page
2    Lucy   /music
4    Kevin  /movie
…    …      …
Background
MapReduce-based join (repartition join):
The process of repartition join on datasets R and S
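The repartition join described above can be sketched as a few lines of single-process Python (a toy stand-in for a real MapReduce job, using the small R and S datasets from the earlier example):

```python
# Repartition join sketch: map tags each record with its source dataset,
# shuffle groups records by the join key, reduce cross-joins the two sides.
from collections import defaultdict

R = [(1, "Jacky"), (2, "Lucy"), (4, "Kevin")]     # (uid, name)
S = [(1, "/book"), (1, "/music"), (4, "/movie")]  # (uid, page)

# Map phase: emit (join key, (source tag, payload)).
mapped = [(uid, ("R", name)) for uid, name in R] + \
         [(uid, ("S", page)) for uid, page in S]

# Shuffle phase: pairs with the same key end up in the same partition.
partitions = defaultdict(list)
for key, value in mapped:
    partitions[key].append(value)

# Reduce phase: join the R-side and S-side records for each key.
joined = []
for uid, values in partitions.items():
    r_side = [v for tag, v in values if tag == "R"]
    s_side = [v for tag, v in values if tag == "S"]
    joined += [(uid, name, page) for name in r_side for page in s_side]
```

Only uids present on both sides survive, so the result here is the three rows (1, Jacky, /book), (1, Jacky, /music), (4, Kevin, /movie).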
Problems
MapReduce-based join suffers performance degradation from partitioning skew when handling skewed data.
• Partitioning skew describes an uneven distribution of key-value pairs across reducers.
[Figure: key-value pairs distributed across Reducers 1-4; one reducer receives a skewed partition]
• The default partitioning scheme of MapReduce is hash partitioning: hash(key) mod R (R: the number of reducers).
• Hash partitioning cannot guarantee a uniform distribution of data.
Problems
Execution time of a MapReduce job
• The reduce phase begins only after the shuffle phase finishes.
• Transferring the skewed partition delays the shuffle phase.
[Figure: map/shuffle/reduce timeline for Reducers 1-4; the skewed partition requires more time to fetch its intermediate results]
Problems
Execution time of a MapReduce job
• The whole execution time is determined by the slowest reducer.
• The skewed partition requires more computing time and therefore delays the whole job.
[Figure: map/shuffle/reduce timeline for Reducers 1-4; the skewed partition requires more computing time to perform the join]
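As a toy illustration (the per-reducer times below are made up, not measurements from the paper), the job's reduce time is the maximum over all reducers, so a single skewed partition dominates:

```python
# Hypothetical reduce-phase durations; reducer-4 holds the skewed partition.
reduce_times = {"reducer-1": 10, "reducer-2": 11, "reducer-3": 10, "reducer-4": 35}

# The reduce phase ends only when the slowest reducer finishes.
job_reduce_time = max(reduce_times.values())
print(job_reduce_time)  # 35: dominated by the skewed reducer
```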
Existing Approaches
Dynamic task splitting
• Dynamically splits slow tasks and reassigns the pieces to idle nodes;
• but adds a lot of complexity.
A better partitioning scheme
• Builds a better partitioning from the keys' frequency distribution to achieve load balance;
• but requires extra time to obtain the keys' frequencies.
A Simple Approach: SALA
Key idea: distribute intermediate results based on the distribution of each key's frequency and location.
Utilizing data locality to reduce the amount of intermediate results transferred across the network improves performance.
[Figure: the intermediate results of the map phase, split into Partitions 1-4, flow to Reducers 1-4 on Nodes 1-4; some partitions are transferred locally, the rest across the network. "Moving computation is cheaper than moving data."]
A Simple Approach: SALA
Key scheme: volume/locality-aware partitioning, which achieves better load balance and greater data locality.
The scheme adopts a greedy selection strategy:
• (Volume) Process first the key values that have the larger amounts of intermediate results.
• (Locality) Each key value is distributed, with higher priority, to the node on which most intermediate results of this key are located.
Volume/Locality-aware Partitioning
A simple example
Node 1:
Join key  Number of KV pairs
3         12
4         2
5         9
6         12
8         35

Node 2:
Join key  Number of KV pairs
1         7
2         22
4         18
8         13
9         10

Node 3:
Join key  Number of KV pairs
1         29
2         16
5         11
8         14
Distribution of intermediate results (each node has 70 KV pairs)
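Summing the three node tables gives each key's total number of intermediate KV pairs, the T values used in the partitioning steps below (a quick sanity check in Python):

```python
# Per-node counts of intermediate KV pairs, copied from the tables above.
node1 = {3: 12, 4: 2, 5: 9, 6: 12, 8: 35}
node2 = {1: 7, 2: 22, 4: 18, 8: 13, 9: 10}
node3 = {1: 29, 2: 16, 5: 11, 8: 14}

# Total pairs per join key across all three nodes.
totals = {}
for counts in (node1, node2, node3):
    for key, n in counts.items():
        totals[key] = totals.get(key, 0) + n
# e.g. totals[8] == 62, the T8 value used in the later slides
```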
Volume/Locality-aware Partitioning
A simple example: partitioning skew happens when using hash partitioning.
[Figure: hash partitioning sends keys 3, 6, 9 to Reducer 1; keys 1, 4 to Reducer 2; keys 2, 5, 8 to Reducer 3. Too much data is distributed to Reducer 3; locality: 45%]
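With three reducers and small integer keys, hash(key) mod 3 reduces to key mod 3, which reproduces the assignment in the figure; attaching the per-key pair counts from the node tables makes the skew concrete (a minimal sketch):

```python
# Per-key totals of intermediate KV pairs, summed from the node tables above.
counts = {1: 36, 2: 38, 3: 12, 4: 20, 5: 20, 6: 12, 8: 62, 9: 10}
n_reducers = 3

# Hash partitioning: hash(key) mod R (for small ints, hash(k) == k in CPython).
reducers = {r: [] for r in range(n_reducers)}
for k in sorted(counts):
    reducers[hash(k) % n_reducers].append(k)

# Total KV pairs landing on each reducer.
loads = {r: sum(counts[k] for k in ks) for r, ks in reducers.items()}
# reducers: {0: [3, 6, 9], 1: [1, 4], 2: [2, 5, 8]}
# loads:    {0: 34, 1: 56, 2: 120} -> the third reducer is heavily skewed
```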
Volume/Locality-aware Partitioning
A walkthrough of volume/locality-aware partitioning.
A simple example
Volume/Locality-aware Partitioning
Volume = total amount of KV pairs / number of reducers
(in this example, volume = 210 / 3 = 70)
A simple example
[Figure: Reducers 1-3, each starting with idle volume 70]
Volume/Locality-aware Partitioning
• Extract all (key, node, sum) tuples and sort them by sum in descending order: (8, N1, 35), (1, N3, 29), (2, N2, 22), (4, N2, 18), (2, N3, 16), (8, N3, 14), (8, N2, 13), (6, N1, 12), (5, N3, 11), (9, N2, 10)……
A simple example
T8 = 62 (total amount of key-8 pairs); V1 = 70 (idle volume of Reducer 1).
Since T8 < V1, key 8 is distributed to Reducer 1.
[Figure: Reducer 1 now holds key 8; Reducers 2 and 3 still have idle volume 70]
Volume/Locality-aware Partitioning
A simple example
Since T1 = 36 < V3 = 70, key 1 is distributed to Reducer 3.
[Figure: Reducer 1 holds key 8 (idle volume 8); Reducer 3 holds key 1; Reducer 2 still has idle volume 70]
Volume/Locality-aware Partitioning
A simple example
T6 = 12, V1 = 8. Since T6 > V1, the idle volume of the target reducer is not enough: mark key 6 and go on.
[Figure: Reducer 1 holds key 8 (idle volume 8); Reducer 2 holds keys 2 and 4 (idle volume 12); Reducer 3 holds key 1 (idle volume 34)]
Volume/Locality-aware Partitioning
• After the traversal, some key values may remain unpartitioned: (3, N1, 12)
A simple example
In this case, find the reducer with the most idle volume (Reducer 1 at this point) and distribute that key to it; key 3 therefore goes to Reducer 1.
[Figure: Reducer 1 holds keys 8 and 3; Reducer 2 holds keys 2, 4, 9 (idle volume 2); Reducer 3 holds keys 1, 5, 6 (idle volume 2)]
Volume/Locality-aware Partitioning
A simple example
Partitioning results using the volume/locality-aware partitioning scheme:
[Figure: Reducer 1 holds keys 8, 3; Reducer 2 holds keys 2, 4, 9; Reducer 3 holds keys 1, 5, 6 — load balance achieved, larger locality: 62%]
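The whole walkthrough condenses into a short Python sketch. This is my reconstruction from the slides, not the paper's code: it assumes reducer r runs on node r, and it breaks ties between equal sums toward the larger key so the processing order matches the slides.

```python
# Volume/locality-aware partitioning, run on the deck's example data.
nodes = {
    1: {3: 12, 4: 2, 5: 9, 6: 12, 8: 35},   # Node 1
    2: {1: 7, 2: 22, 4: 18, 8: 13, 9: 10},  # Node 2
    3: {1: 29, 2: 16, 5: 11, 8: 14},        # Node 3
}
n_reducers = 3

# T_key: total number of intermediate KV pairs per join key.
totals = {}
for counts in nodes.values():
    for key, n in counts.items():
        totals[key] = totals.get(key, 0) + n
volume = sum(totals.values()) // n_reducers   # 210 / 3 = 70

# (key, node, sum) tuples, largest sum first; ties broken toward larger key.
tuples = sorted(
    ((k, node, n) for node, counts in nodes.items() for k, n in counts.items()),
    key=lambda t: (-t[2], -t[0]),
)

idle = {r: volume for r in range(1, n_reducers + 1)}
assignment = {}   # join key -> reducer
marked = []       # keys whose preferred reducer was already too full

for key, node, _ in tuples:
    if key in assignment or key in marked:
        continue
    # The first tuple seen for a key names the node holding most of its pairs,
    # so assigning there maximizes locality (reducer `node` runs on that node).
    if totals[key] <= idle[node]:
        assignment[key] = node
        idle[node] -= totals[key]
    else:
        marked.append(key)   # not enough idle volume: mark it and go on

# Leftover keys go to whichever reducer has the most idle volume.
for key in marked:
    target = max(idle, key=idle.get)
    assignment[key] = target
    idle[target] -= totals[key]
```

Running this reproduces the slide's result: Reducer 1 gets keys {8, 3}, Reducer 2 gets {2, 4, 9}, and Reducer 3 gets {1, 5, 6}, with loads of 74, 68, and 68 KV pairs respectively.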
The process of SALA algorithm
• Phase 1: sample the dataset and pre-compute the partitioning results.
• Phase 2: perform the repartition join, but partition the intermediate results directly according to the results of phase 1.
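Phase 1's sampling step might look like the following sketch. This is a hypothetical illustration, using a deterministic 10% systematic sample rather than whatever sampler the paper actually uses:

```python
# Estimate per-key frequencies from a sample, scaled up by the sampling rate;
# these estimates would feed the volume/locality-aware partitioning of phase 2.
data = [1] * 500 + [2] * 300 + [3] * 200   # toy skewed key column on one node
rate = 0.1
sample = data[::int(1 / rate)]             # every 10th record
estimated = {k: round(sample.count(k) / rate) for k in set(sample)}
# estimated == {1: 500, 2: 300, 3: 200}: matches the true frequencies here
```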
The process of SALA algorithm
We implemented the SALA algorithm and ran experiments on AliCloud to verify the efficiency of our approach.
• SALA achieves better load balance while reducing network overhead.
Experiments
Data distribution of different join algorithms
Experiments
• SALA speeds up the join operation when handling skewed data.
• SALA performs much better under low bandwidth.
Response time of different join algorithms
Conclusion
• Based on our study of MapReduce-based joins, we propose the SALA join algorithm, which uses volume/locality-aware partitioning to distribute intermediate results.
• SALA guarantees a uniform distribution of data and avoids the partitioning-skew problem.
• SALA takes full advantage of data locality to reduce network overhead.
• Experiments show that SALA handles skewed data efficiently.
Thanks for your time.
dblab.xmu.edu.cn