SCALING UBER
MATT RANNEY
As of January 2016:
Uber Cities Worldwide: 361
Countries: 67
Employees: 5,400
Engineers: 1,700
US Driver Payments Jan-Oct 2015: $3.5B
UBER ENGINEERING HISTORY
2009-2010 Outsourced PHP + MySQL
Jan 2011 "dispatch" - Node.JS/MongoDB
Jan 2011 “API” - Python/SQLAlchemy/MySQL
Feb 2012 Dispatch swaps MongoDB for Redis
May 2012 Dispatch adds ON fallback
Jan 2013 First non-API Python services
Feb 2013 API switched to Postgres
Mar 2014 New Python services use MySQL
Mar 2014 Schemaless begins, must finish before pg collapse
Sep 2014 First Schemaless - trips out of Postgres
Aug 2015 Dispatch X.0 / Ringpop / Riak
Jan 2016 Go, Java, Cloud, More Abstractions
TECHNICAL DEBT
Credit: NASA, ESA, and R. Thompson (Univ. Arizona)
Credit: NASA, ESA, and Z. Levay (STScI/AURA)
MICROSERVICES
Immutable?
Append Only?
Node.JS
Python
Go
Java
SCALING NODE
Getting out of the HTTP+JSON business
HTTP is slow, complex, and inconsistent
JSON is hard to validate, awkward in non-node
Thrift is OK, but generated code is bad
SERVICE DISCOVERY
Lots of services, lots of instances
Mostly Node.JS and Python
Call graph unknowable
Self-inflicted DoS
Cascading failures
[Two diagrams: load balancer, service A, and two instances of service B]
horizontal scalability
zipkin tracing
circuit breaking
rate limiting
failure testable
almost no configuration
as available as possible
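Of the goals listed above, circuit breaking is the one most directly aimed at the cascading failures mentioned earlier. The sketch below is a generic illustration of the pattern, not Uber's implementation: the class name `CircuitBreaker` and the parameters `max_failures`, `reset_after`, and `clock` are all hypothetical. After a run of consecutive failures the breaker "opens" and rejects calls immediately, then allows a single trial call once a cool-down has elapsed.

```python
import time


class CircuitBreaker:
    """Hypothetical minimal circuit breaker: opens after consecutive failures."""

    def __init__(self, max_failures=3, reset_after=5.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds before a trial call
        self.clock = clock                # injectable clock, handy for testing
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling more load on a struggling service.
                raise RuntimeError("circuit open: failing fast")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch, url)` where `fetch` is whatever function performs the remote call, and treat the `RuntimeError` as a signal to use a fallback.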
overall latency ≥ latency of slowest component
1ms avg, 1000ms p99
use 1: 1% at least 1000ms
use 100: 63% at least 1000ms
1.0 - 0.99^100 = 0.634 = 63.4%
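The fan-out arithmetic above can be checked with a few lines of Python. This is a small sketch that assumes each downstream call is slow independently with the same probability; the function name `p_slow` is made up for illustration.

```python
def p_slow(fanout: int, p_fast: float = 0.99) -> float:
    """Probability that at least one of `fanout` independent calls is slow.

    p_fast is the chance a single call stays under the p99 latency (0.99).
    """
    return 1.0 - p_fast ** fanout


for n in (1, 100):
    print(f"use {n}: {p_slow(n):.1%} of requests take at least the p99 latency")
# use 1: 1.0% of requests take at least the p99 latency
# use 100: 63.4% of requests take at least the p99 latency
```

Under the same independence assumption, passing `p_fast=0.999` shows how much a request that fans out widely depends on the tail of every component it touches.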
LATENCY
[Chart: requests that are slow (0% to 100%) vs. Processes Used (1 to 1024), with curves for p95, p99, and p99.9]
CULTURAL CHANGES
FAILURE TESTING
RETRIES
[Diagram: partner app, dispatch DC1, dispatch DC2; arrows labeled Location Updates and State Digest (DC1), Location Updates and State Request (DC2)]
THANKS