35
Qifan Pu Shivaram Venkataraman Ion Stoica Locus Shuffling Fast and Slow on Serverless Architecture 1

Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Qifan Pu Shivaram Venkataraman Ion Stoica

LocusShuffling Fast and Slow

on Serverless Architecture

1

Page 2: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Serverless Computing

2

Page 3: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

3

Serverless Analytics

User function

+

Data

i.e., Map()

F()

Results

Launch short-lived cloud workers with transparent elasticity and fine-grain usage billing

Page 4: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

4

Serverless AnalyticsSo far, a great fit for embarrassingly parallel analytics

ExCamera (NSDI’17): video encoding

PyWren (SoCC’17): scaling python functions

NumPyWren: large-block matrix computation

Stanford gg compiler: distributed compiling

AWS Redshift Spectrum: ETL

Page 5: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

5

General Serverless Analtyics

To enable general analytics, one needs to implement shuffle.

- Key operation for join() and groupby()

- 70+% of TPC-DS queries use shuffle

- Most expensive operation in production clusters

… … …Query Results

Shuffle Shuffle

How to perform cost-efficient shuffle on a serverless architecture?

Page 6: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

An Shuffle Example: Cloud Sort

6

Lowest cost to sort 100TB of data• Record: Apache Spark, 50min / $144

PartitionTask

PartitionTask

PartitionTask

mergetask

mergetask

mergetask

……

1…100

100…201

10000…10100

Page 7: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Traditional AnalyticsData communicated directly between servers that execute tasks.

mappertask

mappertask

mappertask

reducertask

reducertask

Server

Server

Server

7

Page 8: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Traditional AnalyticsData communicated directly between servers that execute tasks.

reducertask

reducertask

Server

Server

Server

8

Page 9: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

How to shuffle in serverless?

Tasks are short-lived

mappertask

mappertask

mappertask

reducertask

reducertask

Server

Server

Server

9

Page 10: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Tasks are short-lived

reducertask

reducertask

Server

Server

Server

10

How to shuffle in serverless?

Shuffling directly between tasks is difficult in serverless.

Page 11: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

How about using S3?- Many frameworks already use it for input/results.- Cheap capacity and elastic bandwidth

11

How to shuffle in serverless?

0

40

80

120

0 1000 2000 3000 4000

Agg

rega

te B

W (G

B/s

)

Number of Lambda Workers

write

read

Page 12: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Shuffle with S3

mappertask

mappertask

mappertask

reducertask

reducertask

Server

Server

Server

Compute

S3

12

How to shuffle in serverless?

Page 13: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Shuffle with S3

reducertask

reducertask

Server

Server

Server

Compute 13

How to shuffle in serverless?

S3How much does it cost for CloudSort?

Page 14: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Shuffle with S3

mappertask

mappertask

mappertask

reducertask

reducertask

Server

Server

Server

Compute

S3

14

How to shuffle in serverless?

$$$

How many requests does it take?

Page 15: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

An Shuffle Example: Cloud Sort

15

PartitionTask …

PartitionTask …

PartitionTask …

mergetask

mergetask

mergetask

• Total number of transfers:- Assuming 1GB serverless containers- Num partition/merge tasks = 100TB/1GB = 10#- Number of files: 10$% = 10 billion files

Page 16: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

An Shuffle Example: Cloud Sort

16

PartitionTask …

PartitionTask …

PartitionTask …

mergetask

mergetask

mergetask

• Total number of transfers:- Assuming 1GB serverless containers- Num partition/merge tasks = 100TB/1GB = 10#- Number of files: 10$% = 10 billion files- $0.000005/op -> $50k

Page 17: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

An Shuffle Example: Cloud Sort

17

PartitionTask …

PartitionTask …

PartitionTask …

mergetask

mergetask

mergetask

……

0

4000

8000

0 100 200 300

IOPS

Number of Lambda Workers

Bottlenecked at 4400 ops/sec!

Takes !"#$

%%"" seconds = 26 days

Page 18: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

An Shuffle Example: Cloud Sort

18

PartitionTask …

PartitionTask …

PartitionTask …

mergetask

mergetask

mergetask

……

0

4000

8000

0 100 200 300

IOPS

Number of Lambda Workers

Bottlenecked at 4400 ops/sec!

Takes !"#$

%%"" seconds = 26 daysS3 lacks cheap and elastic IOPS.

AWS s3 ElastiCache (Redis)Azure BlobAzure CacheGoogle Cloud StorageGoogle MemoryStore…

Page 19: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

An Shuffle Example: Cloud Sort

PartitionTask …

PartitionTask …

PartitionTask …

mergetask

mergetask

mergetask

……

Require 158 large (r5.24xlarge) instances!Costs $1.6K/hour.

Capacity alone is 10x more expansive than record.

Page 20: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

An Shuffle Example: Cloud Sort

PartitionTask …

PartitionTask …

PartitionTask …

mergetask

mergetask

mergetask

……

Require 158 large (r5.24xlarge) instances!Costs $1.6K/hour.

Capacity alone is 10x more expansive than record.

Redis capacity is too expensive for large intermediate data.

Page 21: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Cloud Storage

Service $/Mo/GB $/million writes

AWS S3 0.023 5

GCS 0.026 5

Azure Blob 0.023 6.25

ElastiCache 7.9 0

MemoryStore 16.5 0

Azure Cache 11.6 0

… … …

Capacity IOPS

Slow Storage

Fast Storage

High Low

Low High

Can we leverage the strengths of two storages types?

Page 22: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Locus

22

• Hybrid slow and fast cloud storage to achieve cost-efficient shuffle performance – Intuition: use fast storage to absorb IOPS and create bigger

chunks for slow storage.

Page 23: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Hybrid sort (100TB)

Round1:

Round2:

Round20:

Clean cache after each round

finalmerge

5TB 5TB 5TB

num reqs = num mergers = 10#partition merge

• Total number of S3 reqs: 20×10# ('(. 10*+)• Total memory cache needed: 5TB (vs. 100TB)

1…100

100…201

10000…10100…5TB 5TB 5TBpartition merge

5TB 5TB 5TBpartition merge

Page 24: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Hybrid sort (100TB)

Round1:

Round2:

Round20:

Clean cache after each round

5TB 5TB 5TB

num reqs = num mergers = 10#partition merge

• Spark record: 50min + $144• Locus: 50min + $163

1…100

100…201

10000…10100…5TB 5TB 5TBpartition merge

5TB 5TB 5TBpartition merge

finalmerge

Page 25: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Hybrid sort (100TB)

Round1:

Round2:

Round20:

Clean cache after each round

5TB 5TB 5TB

num reqs = num mergers = 10#partition merge

• Spark record: 50min + $144• Locus: 50min + $163

1…100

100…201

10000…10100…5TB 5TB 5TBpartition merge

5TB 5TB 5TBpartition merge

Locus achieves same performance with 5X less resource.

finalmerge

Page 26: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Locus

26

• Hybrid slow and fast cloud storage to achieve cost-efficient shuffle performance – Intuition: use fast storage to absorb IOPS and create bigger

chunks for slow storage.

• When to use hybrid shuffle?– Alongside, many other configurations

– Amount of fast cache– Level of parallelism– Worker size

Page 27: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Locus Performance Model

27

Navigating cost-performance with different compute and storage configurations is difficult.

Page 28: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

28

Navigating cost-performance with different compute and storage configurations is difficult.– How many serverless workers? Which size?

1

10

100

1000

10000

0 0.5 1 1.5 2

Shuf

fle T

ime

(s)

Worker Memory Size (GB)

20GB

Locus Performance Model

Page 29: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

29

Navigating cost-performance with different compute and storage configurations is difficult.– How many serverless workers? Which size?

1

10

100

1000

10000

0 0.5 1 1.5 2

Shuf

fle T

ime

(s)

Worker Memory Size (GB)

20GB

1TB

Locus Performance Model

Page 30: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

30

Navigating cost-performance with different compute and storage configurations is difficult.– How many serverless workers? Which size?

1

10

100

1000

10000

0 0.5 1 1.5 2

Shuf

fle T

ime

(s)

Worker Memory Size (GB)

20GB200GB1TB

Locus Performance Model

Page 31: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Performance model example: slow-storage only shuffle

31

! – IOPSS – shuffle sizeW – worker size

"2 = $%&%×(

" = max "1, "2

mappertask

mappertask

mappertask

Server

Server

Server

S – shuffle sizeB – worker BWP – parallelism

"1 = /0 × 1

Page 32: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Evaluation

32

• Implement Locus on top of PyWren– AWS Lambda– S3 (slow) – Redis (fast)

• Workloads:– CloudSort– TPC-DS– Big data benchmark

• Baseline:– PyWren– Apache Spark on VMs– Redshift

Page 33: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Evaluation

33

Reduce resource usage by up to 59% while achieving comparable performance!

Q1 Q16 Q94 Q95

clus

tert

ime

(cor

e·se

c)

Locus

Spark-58%

-59%

Q1 Q16 Q94 Q95av

erag

e qu

ery

time

(s)

Locus

Spark

TPC-DS

Page 34: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Related Works

34

Pocket (OSDI’18)• New elastic store that scales to application demand

Page 35: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage

Locus

35

• By judiciously combining slow and fast cloud storage, one can achieve cost-efficient shuffle for serverless analytics.

• Locus achieves this by:– Design algorithms that leverage storage characteristics– A performance model that captures the performance and cost

metrics of shuffle operations.

• 4×-500× performance improvements over baseline and reduce resource usage by up to 59% while achieving comparable performance with traditional analytics