Handling Data Skew in Parallel Joins in Shared-Nothing Systems
Yu Xu, Pekka Kostamaa, Xin Zhou (Teradata), Liang Chen (University of California)
SIGMOD’08
Presented by Kisung Kim
Introduction
Parallel processing continues to be important in large data warehouses
Shared-nothing architecture
– Multiple nodes communicate via a high-speed interconnect network
– Each node has its own private memory and disks
Parallel Unit (PU)
– Virtual processors doing the scans, joins, locking, transaction management, …
Relations are horizontally partitioned across all PUs
– Hash partitioning is commonly used
[Figure: PUs and their local data partitions]
2 / 28
Introduction
Partitioning column
– R: x
– S: y
Hash function
– h(i) = i mod 3 + 1
3 / 28
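The hash-partitioning scheme on this slide can be sketched in a few lines. This is an illustrative single-process simulation, not Teradata's implementation; the function and column names are assumptions.

```python
# A minimal sketch of hash partitioning across PUs, using the slide's
# example hash function h(i) = i mod 3 + 1 (3 PUs, numbered 1..3).
# Relation R is partitioned on column x; row layout is illustrative.

def h(i, n_pus=3):
    """Hash function from the slides: maps a key to a PU number 1..n_pus."""
    return i % n_pus + 1

def hash_partition(rows, key, n_pus=3):
    """Horizontally partition rows across PUs by hashing the partitioning column."""
    pus = {pu: [] for pu in range(1, n_pus + 1)}
    for row in rows:
        pus[h(row[key], n_pus)].append(row)
    return pus

R = [{"x": 1, "payload": "a"}, {"x": 2, "payload": "b"}, {"x": 3, "payload": "c"}]
parts = hash_partition(R, "x")
# The row with x = 3 lands on PU h(3) = 3 mod 3 + 1 = 1
```

S would be partitioned the same way on its column y, so equal join-key values generally land on different PUs unless both relations hash on the join attribute.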
Two Join Geographies
Redistribution plan
– Redistribute the tables based on join attributes if they are not partitioned by the join attributes
– Join is performed on each PU in parallel
4 / 28
Two Join Geographies
Duplication plan
– Duplicate the tuples of the smaller relation to all PUs
– Join is performed on each PU in parallel
5 / 28
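The two join geographies can be contrasted in a small simulation. This is a sketch under assumptions: `pu_of`, the relations, and the nested-loop join are illustrative stand-ins for the system's hash redistribution and local join operators.

```python
# Illustrative single-process simulation of the two join geographies.
from collections import defaultdict

N_PUS = 3
def pu_of(v):                      # hash-redistribution target for a join-key value
    return v % N_PUS

def redistribution_join(R, S):
    """Redistribute both relations on the join attribute, then join per PU."""
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for a, payload in R:
        r_parts[pu_of(a)].append((a, payload))
    for b, payload in S:
        s_parts[pu_of(b)].append((b, payload))
    out = []
    for pu in range(N_PUS):        # each PU joins its partitions in parallel
        out += [(a, pr, ps) for a, pr in r_parts[pu] for b, ps in s_parts[pu] if a == b]
    return out

def duplication_join(R, S):
    """Duplicate the smaller relation (S) to every PU; R stays where it is."""
    r_parts = defaultdict(list)
    for i, row in enumerate(R):    # R keeps its existing placement
        r_parts[i % N_PUS].append(row)
    out = []
    for pu in range(N_PUS):        # every PU sees a full copy of S
        out += [(a, pr, ps) for a, pr in r_parts[pu] for b, ps in S if a == b]
    return out

R = [(1, "r1"), (2, "r2"), (4, "r4")]
S = [(1, "s1"), (2, "s2")]
assert sorted(redistribution_join(R, S)) == sorted(duplication_join(R, S))
```

Both plans compute the same join; they differ in how much data moves (both relations partially vs. one relation fully to every PU).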
Redistribution Skew
Hot PU
– After redistribution, some PUs hold many more tuples than others
– Performance bottleneck for the whole system
– Caused by relations with many rows sharing the same value in the join attributes
Adding more nodes will not solve the skew problem
Examples
– In travel booking industry, a big customer often makes a large number of reservations on behalf of its end users
– In online e-commerce, a few professionals make millions of transactions a year
– …
6 / 28
Redistribution Skew
Relations in these applications are almost evenly partitioned
When the join attribute is a non-partitioning column, severe redistribution skew occurs
The duplication plan is a solution only when one join relation is fairly small
Our solution
– Partial Redistribution & Partial Duplication (PRPD) join
7 / 28
PRPD Join
Assumptions
– DBAs evenly partition their data for efficient parallel processing
– Skewed rows tend to be evenly partitioned across the PUs
– The system knows the set of skewed values
Intuition
– Deal with the skewed rows and the non-skewed rows of R differently
8 / 28
PRPD
L1: set of skewed values in R.a
L2: set of skewed values in S.b
Step 1
– Scan Ri and split its rows into three sets
  Ri^{2-loc}: all skewed rows of Ri (R.a value in L1) — kept locally
  Ri^{2-dup}: all rows of Ri whose R.a value matches a value in L2 — duplicated to all PUs
  Ri^{2-redis}: all other rows of Ri — hash-redistributed on R.a
– Three spools on each PUi
  Ri^{loc}: all rows from Ri^{2-loc}
  Ri^{dup}: all rows of R duplicated to PUi
  Ri^{redis}: all rows of R redistributed to PUi
– Similarly for S
9 / 28
PRPD: Example
L1 = {1}, L2 = {2}
10 / 28
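The Step 1 split on one PU can be sketched directly from its definition, using the example skew lists above. The function name, column names, and row layout are illustrative assumptions.

```python
# A minimal sketch of PRPD Step 1 on one PU, using the slides' example
# skew lists L1 = {1} (skewed values of R.a) and L2 = {2} (skewed values of S.b).

def prpd_split(Ri, own_skew, other_skew, key="a"):
    """Split the local fragment Ri into the three PRPD sets:
    loc   - skewed rows (key in own_skew), kept locally
    dup   - rows matching the other relation's skewed values, duplicated to all PUs
    redis - all other rows, hash-redistributed on the join attribute
    """
    loc, dup, redis = [], [], []
    for row in Ri:
        if row[key] in own_skew:
            loc.append(row)
        elif row[key] in other_skew:
            dup.append(row)
        else:
            redis.append(row)
    return loc, dup, redis

L1, L2 = {1}, {2}
R1 = [{"a": 1}, {"a": 2}, {"a": 3}]          # fragment of R on PU1
loc, dup, redis = prpd_split(R1, L1, L2)
# loc = [{"a": 1}], dup = [{"a": 2}], redis = [{"a": 3}]
```

For a fragment of S the skew lists swap roles: `prpd_split(Si, L2, L1, key="b")`.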
PRPD Step 1
[Figure: the Ri^{2-loc} sets on PU1–PU3 are stored locally in the Ri^{loc} spools]
Ri^{2-loc}: Store Locally
11 / 28
PRPD Step 1
[Figure: the Ri^{2-dup} sets are duplicated into the Ri^{dup} spools on all PUs]
Ri^{2-dup}: Duplicate
12 / 28
PRPD Step 1
[Figure: the Ri^{2-redis} sets are hash-redistributed on R.a into the Ri^{redis} spools]
Ri^{2-redis}: Redistribute
13 / 28
PRPD Step 1
14 / 28
PRPD Step 2
On each PUi, join the matching spool pairs and union the results:
– Ri^{loc} ⋈ Si^{dup}
– Ri^{dup} ⋈ Si^{loc}
– Ri^{redis} ⋈ Si^{redis}
15 / 28
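The per-PU Step 2 join can be sketched as follows. A simple nested-loop join stands in for the system's local join operator; spool contents continue the slides' example (L1 = {1}, L2 = {2}), and all names are illustrative.

```python
# A minimal sketch of PRPD Step 2 on one PU: join the matching spool pairs
# and union the results.

def local_join(R_rows, S_rows, rkey="a", skey="b"):
    return [(r, s) for r in R_rows for s in S_rows if r[rkey] == s[skey]]

def prpd_step2(R_loc, R_dup, R_redis, S_loc, S_dup, S_redis):
    """Per-PU join: skewed R rows meet duplicated S rows, skewed S rows meet
    duplicated R rows, and the non-skewed remainders meet via redistribution."""
    return (local_join(R_loc, S_dup)
            + local_join(R_dup, S_loc)
            + local_join(R_redis, S_redis))

# Spools on PU1 for the example:
R_loc   = [{"a": 1}]            # skewed R rows kept locally
R_dup   = [{"a": 2}]            # R rows matching S's skewed values, duplicated here
R_redis = [{"a": 3}]            # non-skewed R rows hashed to this PU
S_loc   = [{"b": 2}]
S_dup   = [{"b": 1}]
S_redis = [{"b": 3}]
result = prpd_step2(R_loc, R_dup, R_redis, S_loc, S_dup, S_redis)
# three matching pairs: (a=1, b=1), (a=2, b=2), (a=3, b=3)
```

Because every skewed R value meets a full copy of the matching S rows (and vice versa), no result tuple is lost or duplicated across the three spool pairs.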
PRPD
All sub-steps within each step can run in parallel
Overlapping skewed values
– Should an overlapping skewed value go to Ri^{2-loc} or Ri^{2-dup}?
– The system includes the overlapping skewed value in only one of L1 and L2
– It estimates the size of the affected rows and chooses the smaller side
16 / 28
Comparison with Redistribution Plan
Uses more total spool space than the redistribution plan
– PRPD duplicates some rows
Less networking cost
– Skewed rows are kept locally
– PRPD does not send all skewed rows to a single PU
Ri^{2-loc}: kept locally, less network cost
Ri^{2-dup}: duplicated, more spool space
Ri^{2-redis}: handled the same as in the redistribution plan
17 / 28
Comparison with Duplication Plan
Less spool space than the duplication plan
– Only partial duplication
More networking cost
– When data skew is not significant, the PRPD plan needs to redistribute a large relation
Less join cost
– The duplication plan always joins a complete copy of the duplicated relation
18 / 28
PRPD: Hybrid of Two Plans
L1 = Ø, L2 = Ø
– Same as the redistribution plan
L1 = Uniq(R.a) ⊃ Uniq(S.b)
– Same as duplication plan (duplicate S)
19 / 28
PRPD: Hybrid of Two Plans
n: the number of PUs
x: fraction of skewed rows in relation R
Number of rows of R per PU after redistribution in the redistribution plan
– Hot PU: x·|R| + (1 − x)·|R| / n
– Non-hot PU: (1 − x)·|R| / n
Number of rows of R per PU after redistribution in PRPD
– Hot PU: x·|R| / n + (1 − x)·|R| / n = |R| / n
Ratio of the number of rows on the hot PU in the redistribution plan over the number of rows per PU in PRPD: n·x + (1 − x)
20 / 28
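The hot-PU row counts the slide describes can be checked numerically. This is a sketch under the slides' assumptions (skewed rows all hash to one PU under redistribution, but stay evenly spread under PRPD); the example values of |R|, n, and x are illustrative, not the paper's settings.

```python
# Numeric check of the hot-PU row counts per plan.

def redistribution_hot_pu(R_size, n, x):
    # all skewed rows hash to one PU, plus that PU's share of non-skewed rows
    return x * R_size + (1 - x) * R_size / n

def prpd_hot_pu(R_size, n, x):
    # skewed rows stay evenly spread; non-skewed rows are redistributed evenly
    return x * R_size / n + (1 - x) * R_size / n   # = R_size / n

R_size, n, x = 1_000_000, 80, 0.05                 # 5% skew on 80 PUs
ratio = redistribution_hot_pu(R_size, n, x) / prpd_hot_pu(R_size, n, x)
# ratio simplifies to n*x + (1 - x) = 80 * 0.05 + 0.95 = 4.95
```

The ratio grows linearly in both the skew fraction and the number of PUs, which matches the experimental trend on the later slides.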
Experimental Evaluation
Compare PRPD with the redistribution plan
– Redistribution plan is more widely used than duplication plan
Schema & test query
21 / 28
Generating Skewed Data
Originally 25 unique nations in TPC-H
We increased the number of unique nations to 1000
5% skewness
22 / 28
Query Execution Time
10 nodes, 80 PUs
Node
– Pentium IV 3.6 GHz CPUs, 4 GB memory, 8 PUs
1 million rows for the Supplier relation
1 million rows for the Customer relation
The query result is around 1 billion rows
23 / 28
Query Execution Time
1 Hot PU
24 / 28
Query Execution Time
2 Hot PUs
25 / 28
Different Number of PUs
Speedup ratio of PRPD over the redistribution plan
As the skewness increases, the speedup ratio increases
The larger the system, the larger the speedup
26 / 28
Conclusions
Effectively handling data skew in joins
– An important challenge in parallel DBMSs
We propose the PRPD join
– A hybrid of the redistribution and duplication plans
– PRPD can also be used in multiple joins
27 / 28
Thank you
28 / 28