39
@xaprb Quantifying Scalability With the Universal Scalability Law Baron Schwartz - October 2017

Quantifying Scalability With The USL

Embed Size (px)

Citation preview

Page 1: Quantifying Scalability With The USL

@xaprb

Quantifying ScalabilityWith the Universal Scalability LawBaron Schwartz - October 2017

Page 2: Quantifying Scalability With The USL

@xaprb

About MeFounder of VividCortex

Wrote High Performance MySQL

Love to hear from you: @xaprb and [email protected]

2

Page 3: Quantifying Scalability With The USL

@xaprb

How Systems Fail Under Load

You’ve seen systems become sluggish under high load

How can we describe and reason about what’s happening?

3

Page 4: Quantifying Scalability With The USL

@xaprb

Failure Boundaries

Cook and Rasmussen describe failure boundaries around the operating domain

One such is the unacceptable workload boundary

4

Page 5: Quantifying Scalability With The USL

@xaprb

Workload Failure Isn’t CrispUnacceptable workload is not sharply defined, it’s a gradient

Cook lists 18 precepts of system failure in “How Complex Systems Fail”

#5: Complex systems run in degraded mode

5

Page 6: Quantifying Scalability With The USL

@xaprb

Workload Failure Isn’t CrispCook introduces error margin. What’s the workload margin?

What if you drift into it?

6

Page 7: Quantifying Scalability With The USL

@xaprb

Capacity

Systems can, and do, function beyond their capacity limits

How can we define and reason about system capacity?

7

Page 8: Quantifying Scalability With The USL

@xaprb

Queueing Theory

There’s a branch of operations research called queueing theory

It analyzes what happens to customers when systems get busy

Difficult to apply in “the real world” of capacity & ops

8

Page 9: Quantifying Scalability With The USL

0.0 0.2 0.4 0.6 0.8 1.0

05

1015

2025

Utilization

Res

iden

ce T

ime

0.0 0.2 0.4 0.6 0.8 1.0

05

1015

2025

The hockey stick curve is difficult to

use in practicevery nonlinear and

hard for humans to intuit

Page 10: Quantifying Scalability With The USL

@xaprb

0 2 4 6 8 10

05000

15000

nodes

throughput

Scaling A System: IdealSuppose a clustered system can do X work per unit of time

Ideally, if you double the cluster size, it can do 2X work

10

Page 11: Quantifying Scalability With The USL

@xaprb

0 2 4 6 8 10

05000

15000

nodes

throughput

EquationThe linear scalability equation:

where 𝜆 is the slope of the line

11

Page 12: Quantifying Scalability With The USL

@xaprb

But Our Cluster Isn’t PerfectSpeedup by executing tasks in parallel, e.g. ~ scatter-gather

What happens to performance if some portion isn’t parallelizable?

12

Page 13: Quantifying Scalability With The USL

@xaprb

0 2 4 6 8 10

05000

15000

nodes

throughput

0 2 4 6 8 10

05000

15000

nodes

throughput

Amdahl’s LawAmdahl’s Law describes the fraction 𝜎 that can’t be done in parallel

Adding nodes provides some speedup, but there’s a ceiling

13

Page 14: Quantifying Scalability With The USL

@xaprb

But What If Workers Coordinate?Suppose the parallel workers have dependencies on each other?

14

Page 15: Quantifying Scalability With The USL

@xaprb

N Workers = N(N-1) Pairs

15

Page 16: Quantifying Scalability With The USL

@xaprb 16

Represent crosstalk (coherence) penalty by coefficient 𝜅

The system get less work done as it gets more load!

0 2 4 6 8 10

05000

15000

nodes

throughput

0 2 4 6 8 10

05000

15000

nodes

throughput

Universal Scalability Law

Page 17: Quantifying Scalability With The USL

@xaprb

Crosstalk Penalty Grows Fast

17

𝜅

𝜎

when we reach saturation, 𝜅 is growing very rapidly, againcreating very nonlinear behavior

Page 18: Quantifying Scalability With The USL

@xaprb

Experiment Interactivelydesmos.com/calculator/3cycsgdl0b

Page 19: Quantifying Scalability With The USL

@xaprb

What is Scalability?The USL is a mathematical definition of scalability

It’s a function that turns workload into throughput

19

Page 20: Quantifying Scalability With The USL

@xaprb

But What Is Load?In most circumstances we care about, load is concurrency

Concurrency is the number of requests in progress

It’s surprisingly easy to measure: sum(latency)/interval

Many systems emit it as telemetry

• MySQL: SHOW STATUS LIKE ‘Threads_running’

• Apache: active worker count

20

Page 21: Quantifying Scalability With The USL

@xaprb

The Failure Boundary Is NonlinearThis region is highly nonlinear and unintuitive

21

Page 22: Quantifying Scalability With The USL

Four Great Uses Of The USL

Page 23: Quantifying Scalability With The USL

@xaprb

0 20 40 60 80

05000

10000

15000

Size

Throughput

1. Forecast Workload Failure Boundary

The USL can reveal the workload failure boundary approaching

23

Page 24: Quantifying Scalability With The USL

@xaprb

0 10 20 30 40 50 60

04000

8000

12000

Size

Throughput

1. Forecast Workload Failure BoundaryYou can use regression to extract the coefficients, then plot

Or pot and eyeball to see if you’re getting near the edge

24

Page 25: Quantifying Scalability With The USL

@xaprb

1. Forecast Workload Failure BoundaryCoda Hale wrote a thing about the USL

https://codahale.com/usl4j-and-you/

25

Page 26: Quantifying Scalability With The USL

@xaprb

2. Characterize Non-Scalability

Why doesn’t your system scale perfectly?

The USL reveals amount of serialization vs crosstalk

26

Page 27: Quantifying Scalability With The USL

@xaprb

2. Characterize Non-ScalabilityPaypal’s NodeJS vs Java benchmarks are a good example!

27

https://www.vividcortex.com/blog/2013/12/09/analysis-of-paypals-node-vs-java-benchmarks/

Page 28: Quantifying Scalability With The USL

@xaprb

3. How Scalable SHOULD It Be?

The USL is a framework for making systems look really bad

Many 10+ node MPP databases barely do anything per-node

Calculate per-node a) clients b) data size c) throughput

One 18-node database: 4000 QPS ~220 QPS/node, 5ms latency

28

Page 29: Quantifying Scalability With The USL

@xaprb

3. How Scalable SHOULD It Be?This is an animation of how Citus’s distributed database worksFor the record: Citus isn’t one of the terribly unscalable DB’s

29

Page 30: Quantifying Scalability With The USL

@xaprb

4. See Your Teams As Systems

30

Page 31: Quantifying Scalability With The USL

@xaprb

4. See Your Teams As Systems

“To go fast, go alone. To go far, go together.”Adrian Colyer wrote a good blog post about teams-as-systems and USL

https://blog.acolyer.org/2015/04/29/applying-the-universal-scalability-law-to-organisations/

31

Page 32: Quantifying Scalability With The USL

@xaprb

4. See Your Teams As Systems

The USL isn’t novel in that sense…

Paging Allspaw, Dr. Allspaw,Dr. Allspaw to the operating

theater please

32

Page 33: Quantifying Scalability With The USL

@xaprb

What Else Can The USL Illuminate?

Open-plan offices: My work takes more work when others are nearby

Map-Reduce: That’s a whole lotta overhead, but it sure is scalable

Mutexes: Theoretically just serialize, but those damn OS schedulers

33

Page 34: Quantifying Scalability With The USL

@xaprb

0 5000 10000 15000

0.000

0.002

0.004

Throughput

Latency

What’s NOT Scalability?I commonly see throughput-vs-latency charts

This seems legit till you get systems under high load

34

Page 35: Quantifying Scalability With The USL

@xaprb

Scalability Isn’t Throughput-vs-Latency

The throughput-vs-latency equation has two solutions

35

Page 36: Quantifying Scalability With The USL

@xaprb

0 5 10 15 20 25 30

0.00

120.

0016

0.00

20

Concurrency

Res

pons

e Ti

me

Concurrency-vs-Latency is OKIt’s a simple quadratic per Little’s Law, and is quite useful

36

Page 37: Quantifying Scalability With The USL

@xaprb

Some ResourcesI wrote a book.

I created an Excel sheet.

37

Page 38: Quantifying Scalability With The USL

@xaprb

Conclusions

Workload failure boundaries are usually elastic

You can learn a ton about your systems near the boundaries

Boundary behavior is much less linear than you expect

Don’t fear the load, let it teach you resiliency

38

Page 39: Quantifying Scalability With The USL

@xaprb

Further Reading/References

• https://www.vividcortex.com/resources/ for ebook, Excel worksheet

• http://www.perfdynamics.com/Manifesto/USLscalability.html for the original source

39