Building Resource Efficient Distributed Systems At Scale Michael Pellon (@p3ll0n) Operations Engineer

Acug datafiniti pellon_sept2013

Building Resource Efficient Distributed Systems At Scale

Michael Pellon (@p3ll0n), Operations Engineer

In the ideal world . . .

. . . we want to be here

[Chart axes: cost, work]

But in the “real” world . . .

. . . we usually find ourselves here

[Chart axes: cost, work]

Big “jumps” are possible in a relatively short timeframe!

[Chart axes: requests per second, joules; periods marked ~ 2009 - 2012 and ~ 2013 - ???]

RPS/dollar: 4.1x
RPS/joule: 4.3x
RPS/rack: 10.4x

Avoid “density without value”!

“Respect the problem.”

- Theo Schlossnagle, OmniTI

There is no free lunch.

Tradeoffs cannot be solved by marketing.

How to play with the “big boys” when you are not as “big” as them ...

Lesson #1

Understand deeply the relationship between latency, bandwidth and capacity across all levels of your infrastructure.

< disk seeks = higher performance

> caching = higher performance

We end up using an ever-increasing amount of our cheap DRAM to hide the terrible latency of our cheap storage.

This growing split between the bandwidth and latency of our storage systems only becomes apparent at large scale.

            CPU    DRAM   LAN    Disk
Bandwidth   1.50   1.27   1.39   1.28
Latency     1.17   1.07   1.12   1.09

Annual Bandwidth and Latency Improvements (Patterson, 2004)

* Extracted from leading commodity components over the last 25 years; the figures reported are the multiplicative performance increase per year.

➔ CPU is the fastest to change and DRAM is the slowest.

➔ Latency is driven by physical limits whereas bandwidth can be addressed through parallelism.

➔ Bountiful bandwidth with lagging latency!
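To make the compounding concrete, here is a minimal sketch that turns the annual factors from the table into doubling times. Only the rates come from the table; the function name `doubling_time_years` and the dictionary layout are illustrative assumptions.

```python
import math

# Annual multiplicative improvements from the table (Patterson, 2004).
BANDWIDTH = {"CPU": 1.50, "DRAM": 1.27, "LAN": 1.39, "Disk": 1.28}
LATENCY = {"CPU": 1.17, "DRAM": 1.07, "LAN": 1.12, "Disk": 1.09}


def doubling_time_years(annual_factor: float) -> float:
    """Years needed for a metric to improve 2x at a fixed annual factor."""
    return math.log(2) / math.log(annual_factor)


for part in BANDWIDTH:
    bw = doubling_time_years(BANDWIDTH[part])
    lat = doubling_time_years(LATENCY[part])
    print(f"{part:>4}: bandwidth doubles every {bw:.1f} y, latency every {lat:.1f} y")
```

For every component in the table, latency takes several times longer to double than bandwidth does, which is the "lagging latency" the slide refers to.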

            CPU    DRAM   LAN    Disk
Bandwidth   1.50   1.27   1.39   1.28
Capacity    --     1.52   --     1.48

Annual Bandwidth and Capacity Improvements (Patterson, 2004)

* Extracted from leading commodity components over the last 25 years; the figures reported are the multiplicative performance increase per year.

➔ Widening gap between bandwidth and capacity.

➔ Time to read a complete disk with random IO is increasing 22x / decade or 36% / year.

➔ Now our applications cannot afford to have a cache miss!
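A rough check of the read-time figure, as a sketch only. It assumes that the time to read a full disk with random IO grows as capacity growth divided by latency improvement (1.48 / 1.09), and with sequential IO as capacity growth divided by bandwidth improvement (1.48 / 1.28). The slides do not state that derivation, and the function name `read_time_growth` is hypothetical.

```python
# Annual multiplicative improvements for disk, from the tables (Patterson, 2004).
DISK_CAPACITY = 1.48
DISK_BANDWIDTH = 1.28
DISK_LATENCY = 1.09


def read_time_growth(capacity_factor: float, access_factor: float, years: int = 1) -> float:
    """Factor by which full-disk read time grows over `years` (assumed model)."""
    return (capacity_factor / access_factor) ** years


random_io_per_year = read_time_growth(DISK_CAPACITY, DISK_LATENCY)
random_io_per_decade = read_time_growth(DISK_CAPACITY, DISK_LATENCY, years=10)
sequential_per_decade = read_time_growth(DISK_CAPACITY, DISK_BANDWIDTH, years=10)

print(f"random IO: +{(random_io_per_year - 1) * 100:.0f}% per year, "
      f"{random_io_per_decade:.1f}x per decade")
print(f"sequential IO: {sequential_per_decade:.1f}x per decade")
```

Under this assumption the random-IO growth comes out at about 36% per year and roughly 21x per decade, in line with the slide's rounded 22x. Sequential reads degrade far more slowly.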

Solutions

Caching, prediction and replication.

Tape is dead. Disk is tape. Flash is disk.

RAM locality is king.

- Jim Gray, Microsoft (2006)

Requires very careful attention to durability.
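One common way to keep RAM-resident data durable is a write-through cache: every write goes to the durable store before the in-memory copy is updated. The sketch below is illustrative only. The class `WriteThroughCache` is hypothetical, a plain dict stands in for the slow durable store, and eviction is simple LRU.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Small LRU cache in front of a slower backing store (illustrative only)."""

    def __init__(self, store: dict, capacity: int = 2):
        self.store = store          # stands in for disk or a remote database
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> value, least recently used first
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as most recently used
            return self.cache[key]
        self.misses += 1                  # a miss pays the slow-store latency
        value = self.store[key]
        self._remember(key, value)
        return value

    def put(self, key, value):
        self.store[key] = value           # write the durable copy first
        self._remember(key, value)

    def _remember(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
```

Write-through trades write latency for safety. A write-back design would be faster but can lose data that only lives in RAM when a node fails.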

Solutions

Caching, prediction and replication.

Expend bandwidth to reduce apparent latency.
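Read-ahead is a simple instance of this idea: on a miss, fetch more blocks than were asked for, betting that the next reads are sequential. The sketch below is a toy under stated assumptions. `ReadAheadReader` and `fake_disk` are hypothetical names, and one call to `fetch_blocks` stands for one slow round trip.

```python
class ReadAheadReader:
    """Sequential read-ahead: fetch extra blocks on a miss (illustrative only)."""

    def __init__(self, fetch_blocks, window: int = 4):
        # fetch_blocks(start, count) -> list of blocks; one call = one slow round trip
        self.fetch_blocks = fetch_blocks
        self.window = window
        self.buffer = {}        # block index -> data
        self.round_trips = 0

    def read(self, index: int):
        if index not in self.buffer:
            # Spend bandwidth: pull `window` blocks although only one was asked for.
            blocks = self.fetch_blocks(index, self.window)
            self.round_trips += 1
            self.buffer = {index + i: b for i, b in enumerate(blocks)}
        return self.buffer[index]


def fake_disk(start: int, count: int):
    """Hypothetical backing store returning labelled blocks."""
    return [f"block-{i}" for i in range(start, start + count)]


reader = ReadAheadReader(fake_disk, window=4)
data = []
for i in range(8):
    data.append(reader.read(i))
print(reader.round_trips)  # 2 round trips instead of 8
```

The bet only pays off when access is predictable. On a random pattern the extra blocks are wasted bandwidth.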

Solutions

Caching, prediction and replication.

Expend capacity to reduce apparent latency.
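With several copies of the data, a reader can ask all of them and use whichever answers first. This is a minimal sketch of that pattern. The replicas are simulated with `time.sleep`, and the names `make_replica` and `read_from_fastest` are assumptions for illustration. Error handling is deliberately omitted: if the first replica to finish raises, the exception propagates.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_replica(name: str, delay_s: float):
    """Hypothetical replica: returns the same value after a simulated delay."""
    def read(key: str):
        time.sleep(delay_s)
        return name, f"value-of-{key}"
    return read


def read_from_fastest(replicas, key: str):
    """Send the read to every replica and return the first answer that arrives."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(replica, key) for replica in replicas]
    try:
        for future in as_completed(futures):
            return future.result()
    finally:
        pool.shutdown(wait=False)  # do not block on the slower replicas


replicas = [make_replica("far", 0.30), make_replica("near", 0.02), make_replica("mid", 0.10)]
winner, value = read_from_fastest(replicas, "user:42")
print(winner, value)
```

The apparent latency becomes that of the fastest copy, at the cost of storing the data several times and issuing redundant reads.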

Avoid the problem entirely by using more servers with cheaper, lower powered processors that more closely match the capabilities of the memory subsystem.

➔ Leverages the massive volume economics of the smart device (e.g., cell phones and tablets) market.

➔ Most workloads are not pushing CPU limits but are IO (disk, network or memory) bound so spending more on a faster CPU will not deliver results.

➔ Price/performance in the device market is far better than in current generation server CPUs. Because there is far less competition in server processors, prices tend to be higher and price/performance relatively low.

➔ Server CPU = ~$300 - ~$1000

➔ ARM CPU = ~$15 / Intel Atom S1200 = ~$65

➔ ~25% the processing rate @ ~10% the cost!

➔ Volume of the device ecosystem fuels innovation so the performance gap shrinks each generation!
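The "~25% the processing rate @ ~10% the cost" bullet works out as follows. This is a back-of-the-envelope sketch that looks at CPU price only. It ignores boards, memory, networking and the fact that not every workload splits cleanly across more machines. The function name is hypothetical.

```python
def relative_price_performance(relative_rate: float, relative_cost: float) -> float:
    """Work per dollar of the small CPU, relative to the server CPU (1.0 = parity)."""
    return relative_rate / relative_cost


# Figures from the slide: ~25% of the processing rate at ~10% of the cost.
advantage = relative_price_performance(0.25, 0.10)
cpus_to_match_one_server = 1 / 0.25

print(f"{advantage:.1f}x the work per CPU dollar")
print(f"{cpus_to_match_one_server:.0f} small CPUs match one server CPU "
      f"at {cpus_to_match_one_server * 0.10:.0%} of its price")
```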

➔ These machines also help with one of the biggest, and most certainly the fastest growing, costs of any data center -- power!

➔ Your typical 8-core server uses ~200W idle and above 600W TDP (full tilt boogie)!

➔ Bringing 30A @ 208V to each rack gives you a 6.2 kW rack (and I know of folks provisioning 12 - 14 kW racks just to fill them up 50%!)

➔ If you can save a lot on op-ex by spending a little more on cap-ex it’s a great bargain! (ask your CFO!)
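The rack numbers above can be checked with simple arithmetic. This sketch assumes the full nominal circuit (amps times volts) is usable, with no derating or power-factor adjustment, and uses the ~200W idle and ~600W full-load figures from the slides. The helper names are hypothetical.

```python
def rack_budget_watts(amps: float, volts: float) -> float:
    """Nominal power available on a single circuit (no derating applied)."""
    return amps * volts


def servers_per_rack(budget_watts: float, watts_per_server: float) -> int:
    """Whole servers that fit under the power budget."""
    return int(budget_watts // watts_per_server)


budget = rack_budget_watts(30, 208)           # 30A @ 208V from the slide
at_full_load = servers_per_rack(budget, 600)  # ~600W per server at full load
at_idle = servers_per_rack(budget, 200)       # ~200W per server at idle

print(f"{budget / 1000:.2f} kW per rack")
print(f"{at_full_load} servers at full load, {at_idle} if they only ever idled")
```

Since the rack has to be provisioned for full load, power rather than physical space is what limits density here.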

➔ People costs dominate the enterprise players’ data centers, but it is very easy and cheap to not let them dominate your costs.

➔ The barrier to entry into automation tools (Puppet, Chef, etc) has never been lower and their penetration into existing systems (networking devices, etc) has never been higher.

Lesson #2

Understand that distributed systems are fundamentally about dealing with distance and having more than one thing.

Currently, writing distributed applications is usually distinguishable from writing non-distributed applications.

Despite the non-zero probability of failure within nearly every aspect of modern computers, developers of non-distributed applications do not routinely maintain a concept of failing hardware.
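Why "having more than one thing" forces the issue: a per-component failure probability that is safe to ignore on one machine compounds quickly across many. The sketch below assumes independent failures and a purely hypothetical 1% failure probability per component over some period. Neither assumption comes from the talk.

```python
def p_any_failure(p_single: float, n: int) -> float:
    """Probability that at least one of n independent components fails."""
    return 1 - (1 - p_single) ** n


P_SINGLE = 0.01  # hypothetical per-component failure probability

for n in (1, 10, 100, 1000):
    print(f"{n:>4} components: {p_any_failure(P_SINGLE, n):.3f}")
```

At 100 components a failure somewhere is more likely than not, so the application has to carry a concept of failing hardware.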

complexity

instructions

behaviors

programming language

hardware limitations

The difference between an entire data center and a single computer should only be quantitative, not qualitative.

Since software development is an entirely quantitative pursuit we should be able to conceal the entire complexity of the Internet within software.

A clear trajectory in the same direction …

➔ Erlang OTP (Ericsson) and GoCircuit (Tumblr).

➔ General-purpose distributed file systems (and protocols) spanning multiple globally distributed data centers.

➔ Datacenter-scale job schedulers also abound (Google’s Borg/Omega, Apache Mesos, Airbnb’s Chronos, etc.)

➔ nanomsg scalability protocols (M. Sustrik).

➔ Not only possible but the clear “silent” choice of the majority!

So how to play “big” when you’re “small”?

➔ You need to understand your technical substrate both broadly and deeply so you know where to focus all your resources most effectively.

➔ That understanding will allow you to operate at economies of scale that free up your most important resource -- people.

➔ But remember, the focus of our resources is not necessarily where your resources should be focused, nor is anyone else’s.

➔ Look for areas where a qualitative difference could easily become merely a quantitative difference.

➔ Quantitative problems are easy to solve through technology; qualitative problems, however, are largely intractable through technology alone.