Microservices Practitioner Summit Jan '15 - Scaling Uber from 1 to 100s of Services - Matt...

SCALING UBER

MATT RANNEY

As of January 2016:

Uber Cities Worldwide: 361
Countries: 67
Employees: 5,400
Engineers: 1,700
US Driver Payments Jan-Oct 2015: $3.5B

UBER ENGINEERING HISTORY

2009-2010 Outsourced PHP + MySQL

Jan 2011 "dispatch" - Node.JS/MongoDB

Jan 2011 “API” - Python/SQLAlchemy/MySQL

Feb 2012 Dispatch swaps MongoDB for Redis

May 2012 Dispatch adds ON fallback

Jan 2013 First non-API Python services

Feb 2013 API switched to Postgres

Mar 2014 New Python services use MySQL

Mar 2014 Schemaless begins, must finish before pg collapse

Sep 2014 First Schemaless - trips out of Postgres

Aug 2015 Dispatch X.0 / Ringpop / Riak

Jan 2016 Go, Java, Cloud, More Abstractions

TECHNICAL DEBT

Credit: NASA, ESA, and R. Thompson (Univ. Arizona)

Credit: NASA, ESA, and Z. Levay (STScI/AURA)

MICROSERVICES

Immutable?
Append Only?

Node.JS
Python
Go
Java

SCALING NODE

Getting out of the HTTP+JSON business
HTTP is slow, complex, and inconsistent
JSON is hard to validate, awkward in non-node
Thrift is OK, but generated code is bad

SERVICE DISCOVERY

Lots of services, lots of instances
Mostly Node.JS and Python
Call graph unknowable
Self-inflicted DoS
Cascading failures

[Two diagrams, each showing a load balancer, service A, and two instances of service B]

horizontal scalability
zipkin tracing
circuit breaking
rate limiting
failure testable
almost no configuration
as available as possible
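As an illustration of the "circuit breaking" item only, here is a minimal sketch of the general pattern. It is not a description of Uber's implementation; the class name `CircuitBreaker`, the parameters `max_failures` and `reset_after`, and the exception `CircuitOpenError` are all hypothetical names chosen for this example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Hypothetical breaker: opens after consecutive failures, retries after a cool-down."""

    def __init__(self, max_failures=3, reset_after=5.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling more load on a struggling service.
                raise CircuitOpenError("circuit open, call rejected")
            # Cool-down elapsed: let one trial call through (half-open state).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast while the circuit is open is one way a caller can avoid contributing to the self-inflicted DoS and cascading failures listed under SERVICE DISCOVERY.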

overall latency ≥ latency of slowest component
1ms avg, 1000ms p99
use 1: 1% at least 1000ms
use 100: 63% at least 1000ms
1.0 - 0.99^100 = 0.634 = 63.4%
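The arithmetic on this slide can be checked with a short sketch. It assumes each call is independently slow with probability 1% (the p99 case) and that a request is slow if any one of its calls is slow; the function name `p_slow` is hypothetical.

```python
def p_slow(fanout: int, slow_fraction: float = 0.01) -> float:
    """Probability that at least one of `fanout` independent calls is slow."""
    return 1.0 - (1.0 - slow_fraction) ** fanout


for n in (1, 100):
    print(f"use {n}: {p_slow(n):.1%} at least 1000ms")
# use 1: 1.0% at least 1000ms
# use 100: 63.4% at least 1000ms
```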

LATENCY

[Chart: requests that are slow (0% to 100%) versus Processes Used (1 to 1024, doubling at each step), with series for p95, p99, and p99.9]
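The chart's values are not recoverable from the slide text, but a table in the same shape can be sketched under stated assumptions: each process is independently slow, and the p95, p99, and p99.9 series correspond to slow fractions of 5%, 1%, and 0.1%. The names `SERIES`, `PROCESSES`, and `chart_rows` are hypothetical.

```python
# Assumed mapping from series label to the fraction of calls that are slow.
SERIES = {"p95": 0.05, "p99": 0.01, "p99.9": 0.001}

# X axis from the slide: 1, 2, 4, ..., 1024 processes used.
PROCESSES = [2 ** k for k in range(11)]


def fraction_slow(processes: int, slow_fraction: float) -> float:
    """Share of requests that hit at least one slow process, assuming independence."""
    return 1.0 - (1.0 - slow_fraction) ** processes


def chart_rows():
    """One row per process count: (count, {series label: fraction of slow requests})."""
    return [
        (n, {label: fraction_slow(n, f) for label, f in SERIES.items()})
        for n in PROCESSES
    ]


if __name__ == "__main__":
    print(f"{'processes':>9}  " + "  ".join(f"{label:>6}" for label in SERIES))
    for n, row in chart_rows():
        print(f"{n:>9}  " + "  ".join(f"{row[label]:>6.1%}" for label in SERIES))
```

Under these assumptions every series climbs toward 100% as the process count grows; the rarer the slow case, the more processes it takes to get there.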

CULTURAL CHANGES

FAILURE TESTING

RETRIES

[Diagram: partner app, dispatch DC1, and dispatch DC2, with flows labeled Location Updates, State Digest, Location Updates, and State Request]

THANKS
