Locus: Shuffling Fast and Slow on Serverless Architecture
Qifan Pu, Shivaram Venkataraman, Ion Stoica
Serverless Computing
Serverless Analytics
[Diagram: user function (e.g., Map()) + data -> many F() invocations -> results]

Launch short-lived cloud workers with transparent elasticity and fine-grained usage billing.
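The pattern on this slide (ship a user function to many short-lived workers over chunks of data) can be sketched locally; this is a toy stand-in that uses a thread pool in place of real cloud functions, and `serverless_map` is a hypothetical name, not a PyWren API:

```python
from concurrent.futures import ThreadPoolExecutor

def serverless_map(user_fn, chunks, max_workers=100):
    """Toy stand-in for a serverless map: each chunk is handed to a
    short-lived worker that runs the user function and returns a result.
    (Real systems like PyWren invoke one cloud function per chunk.)"""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(user_fn, chunks))

# e.g., Map(): compute the size of each chunk of a dataset
chunks = [["a", "b", "a"], ["b", "b"], ["c"]]
counts = serverless_map(len, chunks)
print(counts)  # [3, 2, 1]
```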
Serverless Analytics
So far, a great fit for embarrassingly parallel analytics:
ExCamera (NSDI’17): video encoding
PyWren (SoCC’17): scaling python functions
NumPyWren: large-block matrix computation
Stanford gg compiler: distributed compiling
AWS Redshift Spectrum: ETL
General Serverless Analytics
To enable general analytics, one needs to implement shuffle.
- Key operation for join() and groupby()
- 70+% of TPC-DS queries use shuffle
- Most expensive operation in production clusters
[Diagram: query stages connected by shuffle steps, ending in the query results]
How to perform cost-efficient shuffle on a serverless architecture?
A Shuffle Example: CloudSort
Lowest cost to sort 100TB of data
• Record: Apache Spark, 50 min / $144
[Diagram: partition tasks shuffle records to merge tasks, each merge task owning a key range, e.g., 1…100, 100…201, …, 10000…10100]
Traditional Analytics
Data is communicated directly between the servers that execute tasks.

[Diagram: mapper tasks send shuffle data directly from their servers to reducer tasks on other servers]
How to shuffle in serverless?
Tasks are short-lived.

[Diagram: mapper tasks exit before reducer tasks start, so data cannot be sent directly between tasks]
How to shuffle in serverless?
Shuffling directly between tasks is difficult in serverless.
How about using S3?
- Many frameworks already use it for input/results.
- Cheap capacity and elastic bandwidth.
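The S3-based approach can be sketched with a plain dict standing in for the object store; the key names and helper functions here are illustrative, not from Locus (a real version would issue one `put_object`/`get_object` call per key):

```python
# Toy object store: key -> list of records. A real implementation would
# call S3 (put_object / get_object); the key scheme below is illustrative.
store = {}

def mapper(map_id, records, num_reducers):
    """Partition records by hash of key and write one object per reducer."""
    parts = [[] for _ in range(num_reducers)]
    for k, v in records:
        parts[hash(k) % num_reducers].append((k, v))
    for r, part in enumerate(parts):
        store[f"shuffle/{map_id}/{r}"] = part  # one write per reducer

def reducer(reduce_id, num_mappers):
    """Read this reducer's object from every mapper and combine."""
    out = []
    for m in range(num_mappers):
        out.extend(store[f"shuffle/{m}/{reduce_id}"])
    return out

# 2 mappers x 2 reducers -> num_mappers * num_reducers intermediate objects
mapper(0, [("x", 1), ("y", 2)], 2)
mapper(1, [("x", 3)], 2)
all_pairs = reducer(0, 2) + reducer(1, 2)
print(len(store), sorted(all_pairs))  # 4 objects; all records recovered
```

The quadratic object count (mappers × reducers) is exactly what makes this pattern expensive at scale.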
How to shuffle in serverless?
Shuffle with S3: mapper tasks write shuffle data to S3; reducer tasks read it back.

[Plot: aggregate S3 bandwidth (GB/s) vs. number of Lambda workers (0-4000); both read and write bandwidth keep scaling]
How to shuffle in serverless?
Shuffle with S3: how much does it cost for CloudSort? $$$
How many requests does it take?
A Shuffle Example: CloudSort

[Diagram: every partition task writes one file per merge task]

• Total number of transfers:
- Assuming 1GB serverless containers
- Num partition/merge tasks = 100TB / 1GB = 10^5
- Number of files: (10^5)^2 = 10^10 = 10 billion files
- At $0.000005/op -> $50k in request fees
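The request arithmetic on this slide is easy to check (the per-request price is the slide's figure; real S3 pricing varies by request type and region):

```python
total_data = 100 * 10**12         # 100 TB to sort
worker_mem = 1 * 10**9            # 1 GB serverless containers
tasks = total_data // worker_mem  # partition tasks (= merge tasks)
files = tasks ** 2                # every partition task writes to every merge task
cost = files / 1_000_000 * 5      # $5 per million requests

print(tasks, files, cost)  # 100000 10000000000 50000.0
```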
A Shuffle Example: CloudSort

[Plot: achieved S3 IOPS vs. number of Lambda workers (0-300); throughput flattens around 4400 ops/sec]

Bottlenecked at 4400 ops/sec!
Takes 10^10 / 4400 seconds ≈ 26 days.
S3 lacks cheap and elastic IOPS.
Slow storage / fast storage by provider:
AWS S3 / ElastiCache (Redis)
Azure Blob / Azure Cache
Google Cloud Storage / Google MemoryStore
…
A Shuffle Example: CloudSort

[Diagram: partition tasks shuffle through a Redis cluster to merge tasks]

Requires 158 large (r5.24xlarge) instances! Costs $1.6K/hour.
Capacity alone is 10x more expensive than the record.
Redis capacity is too expensive for large intermediate data.
Cloud Storage
Service       $/Mo/GB   $/million writes
AWS S3        0.023     5
GCS           0.026     5
Azure Blob    0.023     6.25
ElastiCache   7.9       0
MemoryStore   16.5      0
Azure Cache   11.6      0
…
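One way to read the table: holding 100TB for a month at the listed rates (decimal GB, illustrative list prices only), fast storage is hundreds of times more expensive per byte:

```python
gb = 100 * 1000                 # 100 TB expressed in GB
s3_month = gb * 0.023           # S3 capacity cost per month
redis_month = gb * 7.9          # ElastiCache capacity cost per month

# S3: ~$2,300/mo; ElastiCache: ~$790,000/mo -> roughly 343x per byte
print(round(s3_month), round(redis_month), round(redis_month / s3_month))
```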
               Capacity   IOPS
Slow Storage   High       Low
Fast Storage   Low        High
Can we leverage the strengths of the two storage types?
Locus
• Hybrid slow and fast cloud storage to achieve cost-efficient shuffle performance
– Intuition: use fast storage to absorb IOPS and create bigger chunks for slow storage.
Hybrid sort (100TB)

Round 1:  partition (5TB) -> Redis cache -> merge
Round 2:  partition (5TB) -> Redis cache -> merge
…
Round 20: partition (5TB) -> Redis cache -> merge
(clean the cache after each round; a final merge combines round outputs by key range: 1…100, 100…201, …, 10000…10100)

num S3 reqs per round = num mergers = 10^5
• Total number of S3 reqs: 20×10^5 (vs. 10^10)
• Total memory cache needed: 5TB (vs. 100TB)
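The savings can be tallied the same way as the all-S3 case, using the round and task counts from this slide:

```python
rounds = 20
mergers_per_round = 10**5            # merge tasks, each writing one large S3 object
s3_reqs = rounds * mergers_per_round # hybrid shuffle's total S3 requests
s3_only_reqs = 10**10                # all-S3 shuffle from the earlier slide
cache_tb = 5                         # Redis capacity per round, vs 100TB total data

print(s3_reqs, s3_only_reqs // s3_reqs)  # 2000000 requests: 5000x fewer
```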
Hybrid sort (100TB)
• Spark record: 50 min / $144
• Locus: 50 min / $163

Locus achieves the same performance with 5X fewer resources.
Locus
• Hybrid slow and fast cloud storage to achieve cost-efficient shuffle performance
– Intuition: use fast storage to absorb IOPS and create bigger chunks for slow storage.
• When to use hybrid shuffle? It interacts with many other configuration choices:
– Amount of fast cache
– Level of parallelism
– Worker size
Locus Performance Model
Navigating cost-performance with different compute and storage configurations is difficult.
– How many serverless workers? Which size?

[Plot: shuffle time (s, log scale 1-10000) vs. worker memory size (0-2 GB), one curve per shuffle size: 20GB, 200GB, 1TB]
Locus Performance Model
Performance model example: slow-storage only shuffle
Variables:
S – shuffle size
B – per-worker bandwidth
P – parallelism (number of workers)
W – worker size
I – storage IOPS

T1 = S / (P × B)       (bandwidth-bound time)
T2 = S² / (W² × I)     (request-bound time: (S/W)² files served at I ops/sec)
T = max(T1, T2)
Evaluation
• Implemented Locus on top of PyWren
– AWS Lambda
– S3 (slow) + Redis (fast)
• Workloads:
– CloudSort
– TPC-DS
– Big Data Benchmark
• Baselines:
– PyWren
– Apache Spark on VMs
– Redshift
Evaluation
Reduce resource usage by up to 59% while achieving comparable performance!

[Charts: TPC-DS queries Q1, Q16, Q94, Q95 — cluster time (core·sec) for Locus vs. Spark (down to -58% / -59%), and average query time (s) for Locus vs. Spark]
Related Work
Pocket (OSDI'18)
• New elastic store that scales to application demand
Locus
• By judiciously combining slow and fast cloud storage, one can achieve cost-efficient shuffle for serverless analytics.
• Locus achieves this by:
– Designing algorithms that leverage the characteristics of each storage type
– Building a performance model that captures the performance and cost of shuffle operations
• 4×-500× performance improvement over baselines, and up to 59% lower resource usage at performance comparable to traditional analytics