Building Resource Efficient Distributed Systems At Scale
Michael Pellon (@p3ll0n), Operations Engineer
In the ideal world . . . we want to be here.
[Chart: work vs. cost, ideal curve]
But in the “real” world . . . we usually find ourselves here.
[Chart: work vs. cost, real curve]
Big “jumps” are possible in a relatively short timeframe!
[Chart: requests per second vs. joules, ~2009 - 2012 vs. ~2013 - ???]
RPS/dollar: 4.1x
RPS/joule: 4.3x
RPS/rack: 10.4x
Avoid “density without value”!
“Respect the problem.”
- Theo Schlossnagle, OmniTI
There is no free lunch.
Tradeoffs cannot be solved by marketing.
How to play with the “big boys” when you are not as “big” as them ...
Lesson #1
Understand deeply the relationship between latency, bandwidth and capacity
across all levels of your infrastructure.
Fewer disk seeks = higher performance.
More caching = higher performance.
We end up with an ever-increasing amount of our cheap DRAM being used to hide the terrible latency of our cheap storage.
This growing split between the bandwidth and latency of our storage systems only becomes apparent at large scale.
            CPU    DRAM   LAN    Disk
Bandwidth   1.50   1.27   1.39   1.28
Latency     1.17   1.07   1.12   1.09
Annual Bandwidth and Latency Improvements (Patterson, 2004)
* Extracted from leading commodity components over the last 25 years; reported values are the multiplicative performance increase per year.
➔ CPU is the fastest to change and DRAM is the slowest.
➔ Latency is driven by physical limits, whereas bandwidth can be addressed through parallelism.
➔ Bountiful bandwidth with lagging latency!
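The compounding effect of these annual rates is easy to underestimate. A quick sketch, using the rates from the Patterson table above, shows that over a decade bandwidth improves by at least the square of the latency improvement in every component class:

```python
# Annual multiplicative improvement rates (Patterson, 2004).
rates = {
    "CPU":  {"bandwidth": 1.50, "latency": 1.17},
    "DRAM": {"bandwidth": 1.27, "latency": 1.07},
    "LAN":  {"bandwidth": 1.39, "latency": 1.12},
    "Disk": {"bandwidth": 1.28, "latency": 1.09},
}

for name, r in rates.items():
    bw_decade = r["bandwidth"] ** 10   # bandwidth gain over 10 years
    lat_decade = r["latency"] ** 10    # latency gain over 10 years
    print(f"{name:4s}: bandwidth {bw_decade:6.1f}x, "
          f"latency {lat_decade:4.1f}x per decade")
```

For example, CPU bandwidth grows ~58x per decade while CPU latency improves only ~4.8x; the gap is the whole story of the next several slides.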
            CPU    DRAM   LAN    Disk
Bandwidth   1.50   1.27   1.39   1.28
Capacity    --     1.52   --     1.48
Annual Bandwidth and Capacity Improvements (Patterson, 2004)
* Extracted from leading commodity components over the last 25 years; reported values are the multiplicative performance increase per year.
➔ Widening gap between bandwidth and capacity.
➔ Time to read a complete disk with random IO is increasing 22x per decade, or 36% per year.
➔ Now our applications cannot afford to have a cache miss!
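The 22x-per-decade figure follows directly from the 36%-per-year rate by compounding:

```python
# 36% annual growth in full-disk random-read time, compounded over a decade.
growth_per_year = 1.36
growth_per_decade = growth_per_year ** 10
print(f"{growth_per_decade:.1f}x per decade")  # ~21.6x, i.e. roughly 22x
```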
Solutions
Caching, prediction and replication.
Tape is dead. Disk is tape. Flash is disk.
RAM locality is king.
- Jim Gray, Microsoft (2006)
Requires very careful attention to durability.
Expend bandwidth to reduce apparent latency.
Expend capacity to reduce apparent latency.
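As a concrete illustration of the caching solution, here is a minimal read-through cache sketch in Python. The `slow_fetch` backend is a stand-in for any high-latency store (disk, remote database); the names are illustrative, not from the talk:

```python
import time

def slow_fetch(key):
    """Stand-in for a high-latency backing store (disk, remote DB, ...)."""
    time.sleep(0.01)  # simulate ~10 ms of seek/network latency
    return f"value-for-{key}"

cache = {}

def cached_fetch(key):
    # Expend DRAM capacity to hide backing-store latency.
    if key not in cache:
        cache[key] = slow_fetch(key)  # miss: pay full latency once
    return cache[key]                 # hit: DRAM-speed access

cached_fetch("a")            # cold read: ~10 ms
print(cached_fetch("a"))     # warm read: microseconds
```

This is exactly the move the previous slides warn about: the cache is cheap to add, but as the bandwidth/capacity gap widens, a miss becomes something the application cannot afford.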
Avoid the problem entirely by using more servers with cheaper, lower-powered processors that more closely match the capabilities of the memory subsystem.
➔ Leverages the massive volume economics of the smart-device (e.g., cell phones and tablets) market.
➔ Most workloads are not pushing CPU limits but are IO (disk, network or memory) bound, so spending more on a faster CPU will not deliver results.
➔ Price/performance in the device market is far better than for current-generation server CPUs; with far less competition in server processors, prices tend to be higher and price/performance relatively low.
➔ Server CPU = ~$300 - ~$1000
➔ ARM CPU = ~$15 / Intel Atom S1200 = ~$65 ➔ ~25% the processing rate @ ~10% the cost!
➔ Volume of the device ecosystem fuels innovation, so the performance gap shrinks each generation!
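The economics of the low-power-core argument can be sanity-checked with the rough figures above (treat these as order-of-magnitude estimates, not benchmarks):

```python
# Rough figures from the slide: a low-power part delivering ~25% of the
# processing rate of a midrange server CPU at ~10% of the price.
server_price, server_perf = 650.0, 1.00   # midpoint of the ~$300-$1000 range
wimpy_price,  wimpy_perf  = 65.0, 0.25

server_perf_per_dollar = server_perf / server_price
wimpy_perf_per_dollar  = wimpy_perf / wimpy_price

advantage = wimpy_perf_per_dollar / server_perf_per_dollar
print(f"perf/$ advantage: {advantage:.1f}x")  # 2.5x, for IO-bound fleets
```

A 2.5x perf-per-dollar edge only materializes if the workload really is IO-bound, which is why the bullet about CPU limits comes first.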
➔ These machines also help with one of the biggest, and certainly the fastest growing, costs of any data center -- power!
➔ Your typical 8-core server uses ~200W idle and above 600W TDP (full tilt boogie)!
➔ Bringing 30A @ 208V to each rack gives a 6.2 kW rack (and I know of folks provisioning 12 - 14 kW racks just to fill them up 50%!)
➔ If you can save a lot on op-ex by spending a little more on cap-ex, it’s a great bargain! (ask your CFO!)
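The rack-power figure is simple arithmetic: power (W) = current (A) × voltage (V). Combining it with the ~200 W idle figure above gives a rough rack-density ceiling:

```python
amps, volts = 30, 208
rack_kw = amps * volts / 1000.0
print(f"{rack_kw:.2f} kW")  # 6.24 kW -- the ~6.2 kW rack above

# At ~200 W idle per 8-core server, that power budget caps a rack at roughly:
servers_at_idle = rack_kw * 1000 // 200
print(int(servers_at_idle), "servers at idle")  # 31
```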
➔ People costs dominate the enterprise players’ data centers, but it is very easy and cheap to keep them from dominating your costs.
➔ The barrier to entry for automation tools (Puppet, Chef, etc.) has never been lower, and their penetration into existing systems (networking devices, etc.) has never been higher.
Lesson #2
Understand that distributed systems are fundamentally about dealing with
distance and having more than one thing.
Today, writing distributed applications is nothing like writing non-distributed applications.
Despite the non-zero probability of failure within nearly every aspect of modern computers, developers of non-distributed applications do not routinely maintain a concept of failing hardware.
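A minimal sketch of what "maintaining a concept of failing hardware" means in code: the retry-with-backoff discipline that every remote call needs and a local function call never does. The names here are illustrative:

```python
import random
import time

class RemoteError(Exception):
    pass

def flaky_remote_call():
    """Stand-in for an RPC that fails with some non-zero probability."""
    if random.random() < 0.3:
        raise RemoteError("peer unreachable")
    return "ok"

def call_with_retries(fn, attempts=5, base_delay=0.01):
    # A local call either returns or raises; a remote call can also time
    # out, partially complete, or reach a dead peer -- so the caller must
    # own an explicit retry/backoff policy.
    for i in range(attempts):
        try:
            return fn()
        except RemoteError:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)  # exponential backoff

print(call_with_retries(flaky_remote_call))
```

This boilerplate appearing in distributed code and nowhere else is precisely the qualitative gap the next slides argue should be made merely quantitative.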
[Diagram: complexity — a programming language maps instructions to behaviors, within hardware limitations]
The difference between an entire data center and a single computer should only be quantitative, not qualitative.
Since software development is an entirely quantitative pursuit, we should be able to conceal the entire complexity of the Internet within software.
A clear trajectory in the same direction …
➔ Erlang OTP (Ericsson) and GoCircuit (Tumblr).
➔ General-purpose distributed file systems (and protocols) spanning multiple globally distributed data centers.
➔ Datacenter-scale job schedulers also abound (Google’s Borg/Omega, Apache Mesos, Airbnb’s Chronos, etc.)
➔ nanomsg scalability protocols (M. Sustrik).
➔ Not only possible but the clear “silent” choice of the majority!
So how to play “big” when you’re “small”?
➔ You need to understand your technical substrate both broadly and deeply so you know where to focus all your resources most effectively.
➔ That understanding will allow you to operate at economies of scale that free up your most important resource -- people.
➔ But remember: where we focus our resources is not necessarily where your resources should be focused, nor is anyone else’s.
So how to play “big” when you’re “small”?
➔ Look for areas where a qualitative difference could easily become merely a quantitative difference.
➔ Quantitative problems are easy to solve through technology; qualitative problems, however, are largely intractable through technology alone.