Upload
others
View
24
Download
0
Embed Size (px)
Citation preview
Clarinet: WAN-Aware Optimization for Analytics Queries
Raajay Viswanathan, Ganesh Ananthanarayanan, Aditya Akella
1
Overview
2
• Web apps hosted on multiple DCs Low latency access to end-user
Overview
2
• Web apps hosted on multiple DCs Low latency access to end-user
Overview
2
• Web apps hosted on multiple DCs Low latency access to end-user
Overview
2
• Web apps hosted on multiple DCs Low latency access to end-user
Overview
2
• Web apps hosted on multiple DCs Low latency access to end-user• Need efficient methods to analyze data located in multiple data centers
Centralized Aggregation is Wasteful
3
Centralized Aggregation is Wasteful
3
Intra-data centerAnalytics
Framework
SELECT * … FROM .. WHERE .. ;
Centralized Aggregation is Wasteful
3
Intra-data centerAnalytics
Framework
SELECT * … FROM .. WHERE .. ;
• Available WAN bandwidth is limited Aggregation latency overhead
0
50
100
150
200
250
300
350
400
450
500
1 11 21 31 41 51 61 71 81
Ban
dw
idth
(M
bp
s)
Directional WAN links sorted by bandwidth
Measured pairwise bandwidth between EC2 regions
450 Mbps
20 Mbps
Centralized Aggregation is Wasteful
3
Intra-data centerAnalytics
Framework
SELECT * … FROM .. WHERE .. ;
• Available WAN bandwidth is limited Aggregation latency overhead
• WAN links are expensive High data transfer cost
$$$$
$$$$
$$$
Geo-distributed Analytics
4
Geo-distributed Analytics
4
Analytics framework
One Logical Datacenter
Geo-distributed Analytics
4
Analytics framework
One Logical Datacenter
Distributed Storage Layer
Geo-distributed Analytics
4
Analytics framework
One Logical Datacenter
Distributed Storage Layer
Distributed Execution Layer
SELECT * … FROM .. WHERE .. ;
Query Optimizer
Multi-stage parallelizable jobs
Geo-distributed Analytics
4
Analytics framework
One Logical Datacenter
Distributed Storage Layer
Distributed Execution Layer
SELECT * … FROM .. WHERE .. ;
Query Optimizer
Multi-stage parallelizable jobs
Geo-distributed
Requires WAN-aware optimization
Geo-distributed Analytics
4
Analytics framework
One Logical Datacenter
Distributed Storage Layer
Distributed Execution Layer
SELECT * … FROM .. WHERE .. ;
Query Optimizer
Multi-stage parallelizable jobs
Geo-distributed
Requires WAN-aware optimization
Iridium [SIGCOMM 15]GeoDe [NSDI 15]
Geo-distributed Analytics
4
Analytics framework
One Logical Datacenter
Distributed Storage Layer
Distributed Execution Layer
SELECT * … FROM .. WHERE .. ;
Query Optimizer
Multi-stage parallelizable jobs
Geo-distributed
Requires WAN-aware optimization
Clarinet
Geo-distributed Analytics
4
Analytics framework
One Logical Datacenter
Distributed Storage Layer
Distributed Execution Layer
SELECT * … FROM .. WHERE .. ;
Query Optimizer
Multi-stage parallelizable jobs
Geo-distributed
Requires WAN-aware optimization
Clarinet
2.7x reduction in query runtime
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
5
T1, T2, T3: Tables storing click logs
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
80 Gbps 40 Gbps
100 Gbps
5
T1, T2, T3: Tables storing click logs
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
80 Gbps 40 Gbps
100 Gbps
5
QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
AND T1.device == T2.device == T3.device == “mobile”;
T1, T2, T3: Tables storing click logs
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
80 Gbps 40 Gbps
100 Gbps
5
⋈
T2 T3
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
AND T1.device == T2.device == T3.device == “mobile”;
T1, T2, T3: Tables storing click logs
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
80 Gbps 40 Gbps
100 Gbps
5
⋈
T2 T3
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
⋈
T1
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
AND T1.device == T2.device == T3.device == “mobile”;
T1, T2, T3: Tables storing click logs
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
80 Gbps 40 Gbps
100 Gbps
5
10 GB 200 GB
200 GB 200 GB⋈
T2 T3
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
⋈
T1
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
AND T1.device == T2.device == T3.device == “mobile”;
T1, T2, T3: Tables storing click logs
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
80 Gbps 40 Gbps
100 Gbps
5
10 GB 200 GB
200 GB 200 GB⋈
T2 T3
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
⋈
T1
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
Plan running time: 41 s
QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
AND T1.device == T2.device == T3.device == “mobile”;
T1, T2, T3: Tables storing click logs
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
80 Gbps 40 Gbps
100 Gbps
5
10 GB 200 GB
200 GB 200 GB⋈
T2 T3
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
⋈
T1
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
40 s1 s
Plan running time: 41 s
QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
AND T1.device == T2.device == T3.device == “mobile”;
T1, T2, T3: Tables storing click logs
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
80 Gbps 40 Gbps
100 Gbps
5
10 GB 200 GB
200 GB 200 GB⋈
T2 T3
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
⋈
T1
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
40 s1 s
Plan running time: 41 s
QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
AND T1.device == T2.device == T3.device == “mobile”;
T1, T2, T3: Tables storing click logs
WAN-only bottleneck
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
80 Gbps 40 Gbps
100 Gbps
5
10 GB 200 GB
200 GB 200 GB⋈
T2 T3
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
⋈
T1
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
40 s1 s
Plan running time: 41 s
QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
AND T1.device == T2.device == T3.device == “mobile”;
T1, T2, T3: Tables storing click logs
⋈
⋈12 GB
200 GB 200 GB
T2 T1
T3
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 ⋈
⋈16 GB
200 GB 200 GB
T1 T3
T2
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
Plan A Plan B Plan C
Plan running time: 20.96 s Plan running time: 17.6 s
WAN-only bottleneck
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
80 Gbps 40 Gbps
100 Gbps
5
10 GB 200 GB
200 GB 200 GB⋈
T2 T3
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
⋈
T1
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
40 s1 s
Plan running time: 41 s
QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
AND T1.device == T2.device == T3.device == “mobile”;
T1, T2, T3: Tables storing click logs
⋈
⋈12 GB
200 GB 200 GB
T2 T1
T3
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 ⋈
⋈16 GB
200 GB 200 GB
T1 T3
T2
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
Plan A Plan B Plan C
Plan running time: 20.96 s Plan running time: 17.6 s
Chosen by network agnostic query optimizer
WAN-only bottleneck
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
80 Gbps 40 Gbps
100 Gbps
5
10 GB 200 GB
200 GB 200 GB⋈
T2 T3
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
⋈
T1
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
40 s1 s
Plan running time: 41 s
QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
AND T1.device == T2.device == T3.device == “mobile”;
T1, T2, T3: Tables storing click logs
⋈
⋈12 GB
200 GB 200 GB
T2 T1
T3
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 ⋈
⋈16 GB
200 GB 200 GB
T1 T3
T2
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
Plan A Plan B Plan C
Plan running time: 20.96 s Plan running time: 17.6 s
Chosen by network agnostic query optimizer
WAN-only bottleneck
WAN Aware Query Optimization
T2
DC2
T3
DC3
T1
DC1
80 Gbps 40 Gbps
100 Gbps
5
10 GB 200 GB
200 GB 200 GB⋈
T2 T3
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
⋈
T1
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
40 s1 s
Plan running time: 41 s
QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
AND T1.device == T2.device == T3.device == “mobile”;
T1, T2, T3: Tables storing click logs
⋈
⋈12 GB
200 GB 200 GB
T2 T1
T3
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 ⋈
⋈16 GB
200 GB 200 GB
T1 T3
T2
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝑀𝑜𝑏𝑖𝑙𝑒
Plan A Plan B Plan C
Plan running time: 20.96 s Plan running time: 17.6 s
Chosen by network agnostic query optimizer
WAN-only bottleneck
WAN-aware query optimizer that uses network transfer duration to choose query plans
Outline
1. Motivation
2. Challenges in choosing query plan based on WAN transfer durations
3. Solution• Single query
• Multiple simultaneous queries
4. Experimental Evaluation
6
Other factors also affect query plan run time
7
80 Gbps
40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
Other factors also affect query plan run time
7
80 Gbps
40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
⋈200 GB 200 GB
T1 T2
𝜎𝐶 𝜎𝐶
Other factors also affect query plan run time
7
80 Gbps
40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
⋈200 GB 200 GB
T1 T2
𝜎𝐶 𝜎𝐶
T1 T2
MAP: SELECT MAP: SELECT
REDUCE: JOIN
200 GB200 GB
Map Reduce Job
Other factors also affect query plan run time
7
80 Gbps
40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
⋈200 GB 200 GB
T1 T2
𝜎𝐶 𝜎𝐶20 s
T1 T2
MAP: SELECT MAP: SELECT
REDUCE: JOIN
200 GB200 GB
Map Reduce Job
Tasks placed in single DC
Other factors also affect query plan run time
7
80 Gbps
40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
⋈200 GB 200 GB
T1 T2
𝜎𝐶 𝜎𝐶
10 s
T1 T2
MAP: SELECT MAP: SELECT
REDUCE: JOIN
200 GB200 GB
Map Reduce Job
Tasks are placed uniformly across DC1 and DC2
Other factors also affect query plan run time
7
80 Gbps
40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
⋈200 GB 200 GB
T1 T2
𝜎𝐶 𝜎𝐶
T1 T2
MAP: SELECT MAP: SELECT
REDUCE: JOIN
200 GB200 GB
Map Reduce Job
1. Plan A: 41 s2. Plan B: 20.963. Plan C: 17.6 s
Tasks are placed uniformly across DC1 and DC2
While evaluating different query plans
Other factors also affect query plan run time
7
80 Gbps
40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
⋈200 GB 200 GB
T1 T2
𝜎𝐶 𝜎𝐶
T1 T2
MAP: SELECT MAP: SELECT
REDUCE: JOIN
200 GB200 GB
Map Reduce Job
1. Plan A: 41 s2. Plan B: 20.963. Plan C: 17.6 s
20.5 s11.2 s
Tasks are placed uniformly across DC1 and DC2
While evaluating different query plans
Other factors also affect query plan run time
7
80 Gbps
40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
⋈200 GB 200 GB
T1 T2
𝜎𝐶 𝜎𝐶
T1 T2
MAP: SELECT MAP: SELECT
REDUCE: JOIN
200 GB200 GB
Map Reduce Job
1. Plan A: 41 s2. Plan B: 20.963. Plan C: 17.6 s
20.5 s11.2 s
Tasks are placed uniformly across DC1 and DC2
While evaluating different query plans
Used by high priority application
Other factors also affect query plan run time
7
80 Gbps
40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
⋈200 GB 200 GB
T1 T2
𝜎𝐶 𝜎𝐶
T1 T2
MAP: SELECT MAP: SELECT
REDUCE: JOIN
200 GB200 GB
Map Reduce Job
1. Plan A: 41 s2. Plan B: 20.963. Plan C: 17.6 s
20.5 s11.2 s
Tasks are placed uniformly across DC1 and DC2
While evaluating different query plans
Used by high priority application
Other factors also affect query plan run time
7
80 Gbps
40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
⋈200 GB 200 GB
T1 T2
𝜎𝐶 𝜎𝐶
T1 T2
MAP: SELECT MAP: SELECT
REDUCE: JOIN
200 GB200 GB
Map Reduce Job
1. Plan A: 41 s2. Plan B: 20.963. Plan C: 17.6 s
20.5 s11.2 s
Tasks are placed uniformly across DC1 and DC2
While evaluating different query plans
Used by high priority application
Choose query plan based on:1. Best available task placements
Other factors also affect query plan run time
7
80 Gbps
40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
⋈200 GB 200 GB
T1 T2
𝜎𝐶 𝜎𝐶
T1 T2
MAP: SELECT MAP: SELECT
REDUCE: JOIN
200 GB200 GB
Map Reduce Job
1. Plan A: 41 s2. Plan B: 20.963. Plan C: 17.6 s
20.5 s11.2 s
Tasks are placed uniformly across DC1 and DC2
While evaluating different query plans
Used by high priority application
Choose query plan based on:1. Best available task placements2. Schedule of network transfers
Joint plan selection, placement and scheduling
8
Joint plan selection, placement and scheduling
Query Optimizer Multiple query plans (join orders) per query
SELECT * FROM … WHERE.. ;
8
Joint plan selection, placement and scheduling
Query Optimizer Multiple query plans (join orders) per query
SELECT * FROM … WHERE.. ;
8
Logical plan to physical plan Assign parallelism for each stage
Joint plan selection, placement and scheduling
Clarinet
Query Optimizer
Network aware task placement and scheduling for each query plan
Multiple query plans (join orders) per query
Choose plan with smallest run time for execution
SELECT * FROM … WHERE.. ;
8
Logical plan to physical plan Assign parallelism for each stage
Joint plan selection, placement and scheduling
Clarinet
Query Optimizer
Network aware task placement and scheduling for each query plan
Multiple query plans (join orders) per query
Choose plan with smallest run time for execution
SELECT * FROM … WHERE.. ;
8
Logical plan to physical plan Assign parallelism for each stageClarinet binds query to plan lower in the stack
Network aware placement and scheduling
9
T1 T3
T2SELECT SELECT
SELECTJOIN
JOIN
Network aware placement and scheduling
• Task placement decided greedily one stage at a time• Minimize per stage run time
9
T1 T3
T2SELECT SELECT
SELECTJOIN
JOIN
Network aware placement and scheduling
• Task placement decided greedily one stage at a time• Minimize per stage run time
• Scheduling of network transfers• Determines start times of inter-DC network transfers
9
T1 T3
T2SELECT SELECT
SELECTJOIN
JOIN
Network aware placement and scheduling
• Task placement decided greedily one stage at a time• Minimize per stage run time
• Scheduling of network transfers• Determines start times of inter-DC network transfers• Formulate a Binary Integer Linear Program to solve
scheduling• Factors transfer dependencies
9
T1 T3
T2SELECT SELECT
SELECTJOIN
JOIN
10
How to extend the late-binding strategy to multiple queries?
Queries affect each others’ run time
11
80 Gbps 40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
Queries affect each others’ run time
11
80 Gbps 40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
QUERY 1
SELECT …
device == “mobile”
…;
QUERY 2
SELECT …
genre == “pc”
…;
Queries affect each others’ run time
11
⋈
⋈16 GB
200 GB 200 GB
T1 T3
T2
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝐶
80 Gbps 40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
QUERY 1
SELECT …
device == “mobile”
…;
QUERY 2
SELECT …
genre == “pc”
…;
⋈
⋈16 GB
200 GB 200 GB
T1 T3
T2
200 GB
𝜎𝑃𝐶 𝜎𝑃𝐶
𝜎𝑅
Same query plan (Plan C) for Query 1 and Query 2
Queries affect each others’ run time
11
⋈
⋈16 GB
200 GB 200 GB
T1 T3
T2
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝐶
80 Gbps 40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
QUERY 1
SELECT …
device == “mobile”
…;
QUERY 2
SELECT …
genre == “pc”
…;
⋈
⋈16 GB
200 GB 200 GB
T1 T3
T2
200 GB
𝜎𝑃𝐶 𝜎𝑃𝐶
𝜎𝑅
Same query plan (Plan C) for Query 1 and Query 2
Contention increases query run time
Queries affect each others’ run time
11
80 Gbps 40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
QUERY 1
SELECT …
device == “mobile”
…;
QUERY 2
SELECT …
genre == “pc”
…;
⋈
⋈16 GB
200 GB 200 GB
T1 T3
T2
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝐶 ⋈
⋈12 GB
200 GB 200 GB
T2 T1
T3
200 GB
𝜎𝑃𝐶 𝜎𝑃𝐶
𝜎𝐶
Different query plans for Query 1 (Plan C) and Query 2 (Plan B)
Queries affect each others’ run time
11
80 Gbps 40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
QUERY 1
SELECT …
device == “mobile”
…;
QUERY 2
SELECT …
genre == “pc”
…;
⋈
⋈16 GB
200 GB 200 GB
T1 T3
T2
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝐶 ⋈
⋈12 GB
200 GB 200 GB
T2 T1
T3
200 GB
𝜎𝑃𝐶 𝜎𝑃𝐶
𝜎𝐶
Different query plans for Query 1 (Plan C) and Query 2 (Plan B)No contention of network links
Queries affect each others’ run time
11
80 Gbps 40 Gbps
100 Gbps
T2
DC2
T3
DC3
T1
DC1
QUERY 1
SELECT …
device == “mobile”
…;
QUERY 2
SELECT …
genre == “pc”
…;
⋈
⋈16 GB
200 GB 200 GB
T1 T3
T2
200 GB
𝜎𝑀𝑜𝑏𝑖𝑙𝑒 𝜎𝑀𝑜𝑏𝑖𝑙𝑒
𝜎𝐶 ⋈
⋈12 GB
200 GB 200 GB
T2 T1
T3
200 GB
𝜎𝑃𝐶 𝜎𝑃𝐶
𝜎𝐶
Different query plans for Query 1 (Plan C) and Query 2 (Plan B)No contention of network links
Choosing execution plans jointly for multiple queries improves performance
Iterative Shortest Job First
QO
QUERY A QUERY B
QO
QUERY C
QO• Best combination minimize average
completion• Computationally intractable
12
Iterative Shortest Job First
Clarinet
QO
QUERY A QUERY B
QO
QUERY C
QO• Best combination minimize average
completion• Computationally intractable
• Iterative Shortest Job First (SJF) scheduling heuristic
1. Pick shortest physical query plan in each iteration
12
Iterative Shortest Job First
Clarinet
QO
QUERY A QUERY B
QO
QUERY C
QO• Best combination minimize average
completion• Computationally intractable
• Iterative Shortest Job First (SJF) scheduling heuristic
1. Pick shortest physical query plan in each iteration
10 1218 5 8 20 30Iter 1:
12
Iterative Shortest Job First
Clarinet
QO
QUERY A QUERY B
QO
QUERY C
QO• Best combination minimize average
completion• Computationally intractable
• Iterative Shortest Job First (SJF) scheduling heuristic
1. Pick shortest physical query plan in each iteration
10 1218 5 8 20 30Iter 1:
12
Iterative Shortest Job First
Clarinet
QO
QUERY A QUERY B
QO
QUERY C
QO• Best combination minimize average
completion• Computationally intractable
• Iterative Shortest Job First (SJF) scheduling heuristic
1. Pick shortest physical query plan in each iteration
• Reserve bandwidth to guarantee completion time
10 1218 5 8 20 30Iter 1:
0 t12
B1Link 1
Link 2
5
Iterative Shortest Job First
Clarinet
QO
QUERY AQUERY B
QO
QUERY C
QO• Best combination minimize average
completion• Computationally intractable
• Iterative Shortest Job First (SJF) scheduling heuristic
1. Pick shortest physical query plan in each iteration
• Reserve bandwidth to guarantee completion time
0 t12
B1Link 1
Link 2
5
Iterative Shortest Job First
Clarinet
QO
QUERY AQUERY B
QO
QUERY C
QO• Best combination minimize average
completion• Computationally intractable
• Iterative Shortest Job First (SJF) scheduling heuristic
1. Pick shortest physical query plan in each iteration
• Reserve bandwidth to guarantee completion time
15 1718 25 30Iter 2:
0 t12
B1Link 1
Link 2
5
Iterative Shortest Job First
Clarinet
QO
QUERY AQUERY B
QO
QUERY C
QO• Best combination minimize average
completion• Computationally intractable
• Iterative Shortest Job First (SJF) scheduling heuristic
1. Pick shortest physical query plan in each iteration
• Reserve bandwidth to guarantee completion time
15 1718 25 30Iter 2:
0 t12
B1Link 1
Link 2
5
Iterative Shortest Job First
Clarinet
QO
QUERY AQUERY B
QO
QUERY C
QO• Best combination minimize average
completion• Computationally intractable
• Iterative Shortest Job First (SJF) scheduling heuristic
1. Pick shortest physical query plan in each iteration
• Reserve bandwidth to guarantee completion time
15 1718 25 30Iter 2:
A1
A2
0 t12
B1Link 1
Link 2
5 157
Avoid fragmentation and improve completion time
13
Avoid fragmentation and improve completion time
• SJF & reservation leads to bandwidth fragmentation
13
Avoid fragmentation and improve completion time
• SJF & reservation leads to bandwidth fragmentation
13
A1
A2
0 t
B1Link 1
Link 2
1210
Scheduled in SJF order
22
Avoid fragmentation and improve completion time
• SJF & reservation leads to bandwidth fragmentation
13
A1
A2
0 t
B1Link 1
Link 2
1210
Scheduled in SJF order
22
Dominant transfers execute sequentially
Avoid fragmentation and improve completion time
• SJF & reservation leads to bandwidth fragmentation
13
Extended idling
A1
A2
0 t
B1Link 1
Link 2
1210
Scheduled in SJF order
22
Dominant transfers execute sequentially
Avoid fragmentation and improve completion time
• SJF & reservation leads to bandwidth fragmentation
13
Alternate schedule with same query plans
0 t
B1A1
A2
122
Extended idling
A1
A2
0 t
B1Link 1
Link 2
1210
Scheduled in SJF order
22
Dominant transfers execute sequentially
Avoid fragmentation and improve completion time
• SJF & reservation leads to bandwidth fragmentation
13
Alternate schedule with same query plans
0 t
B1A1
A2
122
Extended idling
A1
A2
0 t
B1Link 1
Link 2
1210
Scheduled in SJF order
22
Dominant transfers execute sequentiallyRe-arranging transfers resulting in deviation from
SJF schedule can help
k-Shortest Jobs First Heuristic
14
Link 1
Link 2
Link nOffline schedule
t
k-Shortest Jobs First Heuristic
14
Link 1
Link 2
• Identify transfers of k-shortest yet incomplete jobs
Link nOffline schedule
t
k-Shortest Jobs First Heuristic
14
Link 1
Link 2
• Identify transfers of k-shortest yet incomplete jobs• Relax transfer schedule Start as soon as link is free and task is available
Link nOffline schedule
t
k-Shortest Jobs First Heuristic
14
Link 1
Link 2
• Identify transfers of k-shortest yet incomplete jobs• Relax transfer schedule Start as soon as link is free and task is available• Best ’k’ Prior observations (or) through offline simulations
Link nOffline schedule
t
Clarinet Implementation
15
QUERY 1 QUERY 2 QUERY 3
QO QO QO Existing Query Optimizers
Batch of queries
Clarinet Implementation
15
QUERY 1 QUERY 2 QUERY 3
QO QO QO Existing Query Optimizers• Modified Hive to generate multiple
plans
Batch of queries
Clarinet Implementation
15
QUERY 1 QUERY 2 QUERY 3
QO QO QO Existing Query Optimizers• Modified Hive to generate multiple
plans• QOs control set of generated plans• Existing optimizations are applied
• Push down Select• Partition pruning
Batch of queries
Clarinet Implementation
15
QUERY 1 QUERY 2 QUERY 3
QO QO QO Existing Query Optimizers• Modified Hive to generate multiple
plans• QOs control set of generated plans• Existing optimizations are applied
• Push down Select• Partition pruning
Batch of queries
Enforces Clarinet’s schedule
Clarinet
Execution framework
Clarinet Implementation
15
QUERY 1 QUERY 2 QUERY 3
QO QO QO Existing Query Optimizers• Modified Hive to generate multiple
plans• QOs control set of generated plans• Existing optimizations are applied
• Push down Select• Partition pruning
Batch of queries
Enforces Clarinet’s schedule• Modified Tez’s DAGScheduler
Clarinet
Execution framework
Clarinet Implementation
15
QUERY 1 QUERY 2 QUERY 3
QO QO QO Existing Query Optimizers• Modified Hive to generate multiple
plans• QOs control set of generated plans• Existing optimizations are applied
• Push down Select• Partition pruning
Batch of queriesOnline query arrivals
Enforces Clarinet’s schedule• Modified Tez’s DAGScheduler• Fairness guarantees
Clarinet
Execution framework
16
Evaluation
Compare Clarinet with following GDA approaches:
16
Evaluation
1. Hive2. Hive + Iridium3. Hive + Reducers in single DC
Compare Clarinet with following GDA approaches:
16
Evaluation
1. Hive2. Hive + Iridium3. Hive + Reducers in single DC
: WAN agnostic task placement + scheduling
Compare Clarinet with following GDA approaches:
16
Evaluation
1. Hive2. Hive + Iridium3. Hive + Reducers in single DC
: WAN agnostic task placement + scheduling: WAN aware task placement across DCs
Compare Clarinet with following GDA approaches:
16
Evaluation
1. Hive2. Hive + Iridium3. Hive + Reducers in single DC
: WAN agnostic task placement + scheduling: WAN aware task placement across DCs: Distributed filtering + central aggregation
Compare Clarinet with following GDA approaches:
16
Evaluation
1. Hive2. Hive + Iridium3. Hive + Reducers in single DC
: WAN agnostic task placement + scheduling: WAN aware task placement across DCs: Distributed filtering + central aggregation
Compare Clarinet with following GDA approaches:
• Geo-Distributed Analytics stack across 10 EC2 regions
16
Evaluation
1. Hive2. Hive + Iridium3. Hive + Reducers in single DC
: WAN agnostic task placement + scheduling: WAN aware task placement across DCs: Distributed filtering + central aggregation
Compare Clarinet with following GDA approaches:
• Geo-Distributed Analytics stack across 10 EC2 regions
• Workload:• 30 batches of 12 randomly chosen TPC-DS queries
Evaluation: Reduction in average completion time
17
GDA ApproachVs. Hive
Average Gains
Clarinet 2.7x
Hive + Iridium 1.5x
Hive + Reducers in single DC
0.6x
Evaluation: Reduction in average completion time
17
GDA ApproachVs. Hive
Average Gains
Clarinet 2.7x
Hive + Iridium 1.5x
Hive + Reducers in single DC
0.6x
Clarinet chooses a different plan for 75% of queries
Evaluation: Reduction in average completion time
17
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 6 11 16 21 26 31 36 41 46 51 56
CD
F
Link ID sorted by bandwidth
WAN bandwidth distribution
Hive bytesdistribution
Clarinet bytesdistribution
Data from a single batch 12 queries
GDA ApproachVs. Hive
Average Gains
Clarinet 2.7x
Hive + Iridium 1.5x
Hive + Reducers in single DC
0.6x
Clarinet chooses a different plan for 75% of queries
Evaluation: Reduction in average completion time
17
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 6 11 16 21 26 31 36 41 46 51 56
CD
F
Link ID sorted by bandwidth
WAN bandwidth distribution
Hive bytesdistribution
Clarinet bytesdistribution
Data from a single batch 12 queries
GDA ApproachVs. Hive
Average Gains
Clarinet 2.7x
Hive + Iridium 1.5x
Hive + Reducers in single DC
0.6x
Clarinet chooses a different plan for 75% of queries
Evaluation: Reduction in average completion time
17
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 6 11 16 21 26 31 36 41 46 51 56
CD
F
Link ID sorted by bandwidth
WAN bandwidth distribution
Hive bytesdistribution
Clarinet bytesdistribution
Data from a single batch 12 queries
GDA ApproachVs. Hive
Average Gains
Clarinet 2.7x
Hive + Iridium 1.5x
Hive + Reducers in single DC
0.6x
Clarinet chooses a different plan for 75% of queries
18
Evaluation: Optimization overhead
18
Evaluation: Optimization overhead
1. Generate multiple query plans
2. Iterative multi-query plan selection
18
Evaluation: Optimization overhead
1. Generate multiple query plans• Up to 64 plans in less than 5 s
2. Iterative multi-query plan selection
18
Evaluation: Optimization overhead
1. Generate multiple query plans• Up to 64 plans in less than 5 s
2. Iterative multi-query plan selection• Max. 15 s for batches with 12 queries
18
Evaluation: Optimization overhead
1. Generate multiple query plans• Up to 64 plans in less than 5 s
2. Iterative multi-query plan selection• Max. 15 s for batches with 12 queries
Insignificant w.r.t. query running times (order of 10’s of minutes)
Summary
• WAN-awareness in QO + cross-layer optimization
19
Distributed Storage Layer
Distributed Execution Layer
Query Optimizer Clarinet
Summary
• WAN-awareness in QO + cross-layer optimization
• Presented a scalable way to implement multi-query optimization with minimal overhead
19
Distributed Storage Layer
Distributed Execution Layer
Query Optimizer Clarinet
Summary
• WAN-awareness in QO + cross-layer optimization
• Presented a scalable way to implement multi-query optimization with minimal overhead
19
Distributed Storage Layer
Distributed Execution Layer
Query Optimizer Clarinet
2.7xReduction in average completion time