WHAT I WISH I KNEW BEFORE SCALING UBER TO 1,000 SERVICES
MATT RANNEY

GOTO Chicago 2016

As of April 2016:

Uber
Cities Worldwide: 400+
Countries: 70
Employees: 6,000+

LIFE LESSONS

MICROSERVICES
Immutable?
Append Only?

WHY MICROSERVICES?
Move and Release Independently
Own your Uptime
Use the “Best” tool for the job

WHAT ARE THE COSTS?
Now you have a distributed system
Everything is an RPC
What if it breaks?

LESS OBVIOUS COSTS
Everything is a tradeoff
You can build around problems
Might trade complexity for politics
You get to keep your biases

Pre-history: PHP (outsourced)
Dispatch: Node.js, moving to Go
Core Services: Python, moving to Go
Maps: Python and Java
Data: Python and Java
Metrics: Go

LANGUAGES
Hard to share code
Hard to move between teams
WIWIK (what I wish I knew): Fragments the culture

RPC
HTTP/REST gets complicated
JSON needs a schema
RPCs are slower than PCs (local procedure calls)
WIWIK: servers are not browsers
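
One way to read "JSON needs a schema" is that each service should reject payloads that do not match a declared shape instead of discovering the mismatch deep in its handler. A minimal sketch, assuming a hypothetical `EstimateRequest` message and field names that are not from the talk:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EstimateRequest:
    """Hypothetical RPC message, used only for illustration."""
    rider_id: str
    pickup_lat: float
    pickup_lng: float


# The declared shape of the message: field name -> expected type.
SCHEMA = {"rider_id": str, "pickup_lat": float, "pickup_lng": float}


def decode(raw: str) -> EstimateRequest:
    """Parse JSON and reject anything that does not match SCHEMA."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    if data.keys() != SCHEMA.keys():
        raise ValueError(f"expected fields {sorted(SCHEMA)}, got {sorted(data)}")
    for name, expected in SCHEMA.items():
        value = data[name]
        if expected is float:
            # JSON has one number type; accept ints, but not booleans.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected)
        if not ok:
            raise ValueError(f"field {name!r} must be {expected.__name__}")
    return EstimateRequest(
        data["rider_id"], float(data["pickup_lat"]), float(data["pickup_lng"])
    )


print(decode('{"rider_id": "r-1", "pickup_lat": 41.88, "pickup_lng": -87.63}'))
```

In practice an interface definition language or a schema library would likely replace the hand-written checks; the point of the sketch is only that the contract is explicit and enforced at the boundary.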

HOW MANY REPOS
Many is good
One is good
Many is bad
One is bad

APRIL 2016

MAY 2016

OPERATIONAL
What happens when things break?
Can other teams release your service?
Understand a service in the larger context

PERFORMANCE
Depends on language tools

PERFORMANCE
Doesn’t matter until it does
Probably want at least simple perf requirements
WIWIK: “good” not required, but “known” is
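
A simple perf requirement can be as small as a stated percentile budget that is checked against measured latencies. A minimal sketch, assuming the nearest-rank percentile definition and made-up sample data; the function names and the 100 ms budget are illustrative, not from the talk:

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_budget(samples, budget_ms, p=99):
    """True when the measured percentile is within the stated latency budget."""
    return percentile(samples, p) <= budget_ms


# Hypothetical latency samples in milliseconds: mostly fast, with a slow tail.
latencies_ms = [1] * 98 + [40, 1000]

print("p50:", percentile(latencies_ms, 50), "ms")
print("p99:", percentile(latencies_ms, 99), "ms")
print("within 100 ms p99 budget:", meets_budget(latencies_ms, 100))
```

The check does not say whether the service is fast, only whether its latency is known and within what was promised.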

FANOUT
overall latency ≥ latency of slowest
1ms avg, 1000ms p99
use 1: 1% at least 1000ms
use 100: 63% at least 1000ms
1.0 - 0.99^100 = 0.634 = 63.4%
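
The fanout arithmetic can be reproduced for other fanout sizes and percentiles. This is a sketch that assumes every downstream call is independent and equally likely to land in the slow tail, which real systems only approximate; `fraction_slow` is an illustrative name:

```python
def fraction_slow(fanout: int, fast_fraction: float) -> float:
    """Chance that at least one of `fanout` independent calls is in the slow tail.

    `fast_fraction` is the share of calls that are fast, e.g. 0.99 for p99.
    """
    return 1.0 - fast_fraction ** fanout


FANOUTS = [2 ** k for k in range(11)]  # 1, 2, 4, ..., 1024
TAILS = {"p95": 0.95, "p99": 0.99, "p99.9": 0.999}

for n in FANOUTS:
    row = "  ".join(
        f"{name}: {fraction_slow(n, q):6.1%}" for name, q in TAILS.items()
    )
    print(f"fanout {n:5d}  {row}")

# The slide's own example: 100 calls, each with a 1% chance of being slow.
print(f"{fraction_slow(100, 0.99):.3f}")
```

Under these assumptions, a request that touches 100 services is slow more often than not, even though each individual service is slow only 1% of the time.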

[Chart: share of requests that are slow (0% to 100%) against processes used (1 to 1024, doubling at each step), with curves for p95, p99, and p99.9]

TRACING
Lots of ways to get this
Best way to understand fanout

TRACING
Probably want sampling
WIWIK: cross-lang context propagation
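
Sampling and context propagation fit together: the sampling decision is made once where the request enters, and every hop forwards the same identifiers. A minimal sketch, assuming plain string headers so that services in any language can read them; the header names and function names are hypothetical, not a real tracing library's API:

```python
import random
import uuid

# Hypothetical header names, chosen for this example only.
TRACE_HEADER = "x-example-trace-id"
SAMPLED_HEADER = "x-example-sampled"


def start_or_continue(headers, sample_rate=0.01, rng=random.random):
    """Return the trace context for an incoming request.

    If the caller sent a trace ID, continue that trace and keep its
    sampling decision. Otherwise start a new trace and decide once,
    at the edge, whether to sample it.
    """
    if TRACE_HEADER in headers:
        return {
            "trace_id": headers[TRACE_HEADER],
            "sampled": headers.get(SAMPLED_HEADER) == "1",
        }
    return {"trace_id": uuid.uuid4().hex, "sampled": rng() < sample_rate}


def outgoing_headers(ctx):
    """Headers to attach to every downstream call made for this request."""
    return {
        TRACE_HEADER: ctx["trace_id"],
        SAMPLED_HEADER: "1" if ctx["sampled"] else "0",
    }


# Edge service starts a trace; a downstream service continues it.
edge = start_or_continue({})
downstream = start_or_continue(outgoing_headers(edge))
print(edge["trace_id"] == downstream["trace_id"])  # same trace on both hops
```

Keeping the context in headers rather than in a language-specific object is what lets a trace survive a hop between, say, a Node.js service and a Go service.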

LOGGING
Need consistent, structured logging
Multiple languages make this hard
Logging floods can amplify problems
WIWIK: Accounting
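
Consistent, structured logging usually means every service emits the same machine-readable fields. A minimal sketch using only Python's standard `logging` module; the field names, the `fields` convention for extra data, and the service name are assumptions made for this example:

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with a fixed set of core fields."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        # Caller-supplied fields go in first so they cannot overwrite core fields.
        entry = dict(getattr(record, "fields", {}))
        entry.update(
            ts=record.created,
            level=record.levelname,
            service=self.service,
            msg=record.getMessage(),
        )
        return json.dumps(entry, sort_keys=True)


def make_logger(service, stream):
    """Build a logger that writes one JSON line per event to `stream`."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output through the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]
    return logger


log = make_logger("dispatch-demo", sys.stdout)  # hypothetical service name
log.info("trip dispatched", extra={"fields": {"trip_id": "t-1"}})
```

Because every line carries a `service` field, log volume can be counted per service, which is one plausible reading of the accounting point on the slide.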

LOAD TESTING
Need to test against production
Without breaking metrics
Preferably all the time
WIWIK: all systems need to handle “test” traffic
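
Handling “test” traffic in every system implies two things: the test marker has to travel with the request, and metrics have to keep test and production counts apart. A minimal sketch, assuming a hypothetical header name and an in-memory stand-in for a metrics client:

```python
from collections import Counter

# Hypothetical header name, chosen for this example only.
TEST_HEADER = "x-example-test-traffic"


class Metrics:
    """In-memory stand-in for a metrics client."""

    def __init__(self):
        self.counters = Counter()

    def incr(self, name, is_test):
        # Test traffic gets its own namespace so production metrics stay clean.
        prefix = "test." if is_test else "prod."
        self.counters[prefix + name] += 1


def handle(headers, metrics):
    """Count the request and return the headers to forward downstream."""
    is_test = headers.get(TEST_HEADER) == "1"
    metrics.incr("requests", is_test)
    # Forward the marker so downstream services also treat this as a test.
    return {TEST_HEADER: "1"} if is_test else {}


metrics = Metrics()
handle({}, metrics)                   # a production request
handle({TEST_HEADER: "1"}, metrics)   # a load-test request
print(dict(metrics.counters))
```

A service that drops the marker on an outgoing call would turn test load into apparent production load for everything downstream, which is why the slide says all systems need to handle it.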

FAILURE TESTING
WIWIK: people won’t like it

MIGRATIONS
Old stuff still has to work
What happened to immutable?
WIWIK: mandates are bad

OPEN SOURCE
Build/buy tradeoff is hard
Commoditization
WIWIK: this will make people sad

POLITICS
Services allow people to play politics
Company > Team > Self

TRADEOFFS
Everything is a tradeoff
Try to make them intentionally

THANKS