AUTOMATED JVM TUNING WITH BAYESIAN OPTIMIZATION

Twitter Engineering, Advanced Technology Group

Jianqiao Liu @trackpoint10
Ramki Ramakrishna @ysr1729
Alex Wiltschko @awiltsch

ACKNOWLEDGEMENTS

Material developed with:
Ian Brown @igb
Kevin Swersky @kswersk
Jasper Snoek @latentjasper
Ryan Adams @ryan_p_adams
Hugo Larochelle @hugolarochelle
Dave Barr @davebarr
John Coomes @john_coomes
Ian Downes @ndwns
Dave Robinson @daverobinson
Chris Regado @chrisregado
Todd Stumpf @stumpf
WHO WE ARE

• Jianqiao Liu: Graduate Student, ECE at Purdue University (work done while a summer intern at Twitter San Francisco)
• Ramki Ramakrishna: Staff Engineer, JVM Engineering at Twitter San Francisco
• Alex Wiltschko: Research Engineer, Advanced Technology Group at Twitter Boston
TWITTER RUNS ON MICRO-SERVICES

• O(10^3) services
• O(10^5) service instances
• Heterogeneous hardware
• Varying resources

Source: "How we built a metering and chargeback system to incentivize higher resource utilization of Twitter infrastructure", Micheal Arul, Vinu Charanya, LinuxCon 2016, Toronto, August 22-24, 2016.
A PERFORMANCE STACK AT TWITTER
A simplified view

[Diagram: Microservices A and B each run on a JVM inside a Mesos container, on top of kernel + OS services and hardware. Each layer has its own tunables: hardware (h1, h2, …), kernel (k1, k2, …), Mesos (m1, m2, …), JVM (j1, j2, …), and service (s1, s2, …). Overall performance is a function f(h, k, m, j, s) of all of them.]
TUNING AT THE JVM LAYER

• Hotspot JVM has hundreds of tunable knobs:

  $ java -XX:+PrintFlagsFinal -version | grep "="
  uintx AdaptiveSizePolicyWeight     = 10     {product}
  uintx AdaptiveSizeThroughPutPolicy = 0      {product}
  uintx AdaptiveTimeWeight           = 25     {product}
   bool AdjustConcurrency            = false  {product}
   bool AggressiveOpts               = false  {product}
   intx AliasLevel                   = 3      {C2 product}
   bool AlignVector                  = false  {C2 product}
  …

  $ java -XX:+PrintFlagsFinal -version | grep "=" | wc -l
  757

• A large variety of parameters:
  • performance-sensitivity
  • hardware-dependency
  • mutual (in)dependency
PERFORMANCE OPTIMIZATION
Needs to be continuous

• Hand-tuning doesn't scale:
  • few parameters handled manually
  • time-consuming, labor-intensive, error-prone
• Cargo-culted configurations
• Upgrades make optimality fleeting
• Hypothesis: most micro-services operate below optimality
• Preview: 80% improvement on a large service
PERFORMANCE TUNING
As a formal optimization problem

• Given a function f(x1, x2, …, xn) defined over domain X
• Find a configuration A = (a1, a2, …, an) that maximizes f
PERFORMANCE TUNING
As a constrained optimization problem

• Simple constraints:
  • x1 <= x2 : e.g. NewSize <= HeapSize
  • a <= x3 <= b : e.g. 0 <= MaxTenuringThreshold <= 15
• More complex constraints:
  • g(x1, x2) <= h(x3, x4)
• Constraints on behavior:
  • w(X) < k : e.g. 99th percentile of response latency < 5 ms
  • r(X) = t(X) : e.g. no requests result in errors
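A minimal sketch of how the simple constraints above can be enforced when sampling candidate configurations, by rejection; the parameter names are real HotSpot knobs, but the ranges and the dictionary layout are illustrative:

  import random

  def sample_config():
      # Rejection-sample until the ordering constraint NewSize <= HeapSize holds.
      while True:
          heap_mb = random.randint(512, 8192)
          new_mb = random.randint(64, 8192)
          tenuring = random.randint(0, 15)   # 0 <= MaxTenuringThreshold <= 15
          if new_mb <= heap_mb:              # NewSize <= HeapSize
              return {"HeapSize_mb": heap_mb,
                      "NewSize_mb": new_mb,
                      "MaxTenuringThreshold": tenuring}

Behavioral constraints (latency percentiles, error rates) cannot be checked before running the experiment; they are evaluated from the measurements instead.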
PERFORMANCE TUNING
As optimization of a noisy, non-stationary cost function

The environment may introduce hidden, uncontrollable, and possibly time-varying parameters, e.g.:
• inter-container cross-talk
• seasonal/diurnal environmental conditions or load
• heterogeneous hardware
PERFORMANCE TUNING

• Design (and refine) a suitable performance metric
• Decide on (and refine) knobs to tune
• Use an iterative strategy to tune these knobs
PERFORMANCE TUNING
The manual approach

[Diagram: a performance engineer in a loop with the system being tuned: measure performance with new parameter settings, analyze the results, and pick new parameters to test based on the results obtained.]
PERFORMANCE TUNING
Using an automation assistant

[Diagram: the same loop with a black-box tuning assistant in place of the performance engineer: the system returns an "evaluation" of each parameter setting, and the assistant returns the next "suggestion" to try.]
HOW SHOULD WE BUILD AN AUTOMATION ASSISTANT?

A technique from machine learning: Bayesian Optimization
• A machine learning approach to black-box optimization
• A method to learn (potentially noisy) cost functions
  • iteratively
  • efficiently
• Finds very good answers very quickly on a wide variety of problems

I'll show you how it works in practice.
BAYESIAN OPTIMIZATION

• Each experiment we run with a different setting of our parameters is expensive
• If choosing what experiments to run is important, how do we do it well?

[A sequence of animation slides follows: a probabilistic model is fit to the evaluations gathered so far, and the model is used to pick the next experiment.]
BAYESIAN OPTIMIZATION
BayesOpt in action

[Animation: the "Expected Improvement" acquisition function guides each new experiment until the global optimum is discovered.]

BayesOpt works in much higher dimensions than humans do.
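The acquisition function in those slides is expected improvement: under the model's posterior, how much better than the best score seen so far do we expect a candidate to be? A minimal sketch of one suggestion step, assuming a scikit-learn Gaussian process surrogate (an illustration, not Twitter's production engine):

  import numpy as np
  from scipy.stats import norm
  from sklearn.gaussian_process import GaussianProcessRegressor
  from sklearn.gaussian_process.kernels import Matern

  def expected_improvement(candidates, gp, best_y):
      # Posterior mean and stddev of the surrogate at each candidate point.
      mu, sigma = gp.predict(candidates, return_std=True)
      sigma = np.maximum(sigma, 1e-9)      # guard against zero variance
      z = (mu - best_y) / sigma            # we are maximizing the score
      return (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)

  def suggest_next(X_seen, y_seen, bounds, n_candidates=10000):
      # Fit a GP surrogate to the (noisy) evaluations gathered so far.
      gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-4,
                                    normalize_y=True)
      gp.fit(X_seen, y_seen)
      # Score random candidates; suggest the one with the highest EI.
      lo, hi = np.asarray(bounds, dtype=float).T
      candidates = lo + (hi - lo) * np.random.rand(n_candidates, len(bounds))
      ei = expected_improvement(candidates, gp, y_seen.max())
      return candidates[np.argmax(ei)]

Each call proposes the next configuration to test; the measured score is appended to (X_seen, y_seen) and the loop repeats.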
BUILDING AN AUTOTUNING SERVICE
What should an ideal autotuning system look like?

What do we want an implementation of BayesOpt to look like in practice?
• Easy to use
• Minimal coding required by the user
• Support for multiple languages
• Running concurrent experiments should be trivial

BUILDING AN AUTOTUNING SERVICE
An example API call
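As a stand-in for the slide's example, here is a hypothetical client interaction in the suggestion/evaluation style described earlier; the endpoint, field names, and the run_benchmark helper are all invented for illustration, not the actual Twitter API:

  import requests

  BASE = "http://bayesopt.example.com/api"   # placeholder endpoint

  def run_benchmark(params):
      # User-supplied: launch the service with these params, return a score.
      ...

  # Describe the experiment: a name plus the parameters and their bounds.
  exp = requests.post(BASE + "/experiments", json={
      "name": "jvm-tuning-demo",
      "parameters": [
          {"name": "new_size_mb", "min": 64, "max": 4096},
          {"name": "max_tenuring", "min": 0, "max": 15},
      ],
  }).json()

  # Ask for a suggestion, run the experiment, report the evaluation.
  sug = requests.get(BASE + "/experiments/{}/suggestion".format(exp["id"])).json()
  score = run_benchmark(sug["params"])
  requests.post(BASE + "/experiments/{}/evaluation".format(exp["id"]),
                json={"suggestion_id": sug["id"], "score": score})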
BUILDING AN AUTOTUNING SERVICE
The service layout of BayesOpt at Twitter

[Diagram: clients in several languages (Scala, Python, Matlab, Lua) and a CLI client talk to a web server; middleware passes requests to a worker (queue and state manager), which drives the BayesOpt engine running in an auto-scaling group on Mesos.]
ALTERNATIVE APPROACHES
BayesOpt isn't the only way to do it, but it's by far our favorite

• Random Search (Bergstra 2012; shockingly good for zero effort!)
• Parzen Trees (Bergstra 2011)
• Random Forests (Hutter et al., 2011)
• Reinforcement Learning (e.g., Google data center cooling)

We prefer BayesOpt because it's:
• Robust
• Extensible
• Battle-tested on many types of real-world, high-impact problems
BAYESOPT WINS AT TWITTER
We're just getting started

• Spam detection (+8%)
• Abuse detection (+6%)
• All deep learning applications ("set it and forget it" prototyping)
• Vine video recommendations (+30% user engagement on recs)
• Hadoop cost reduction (-80% cost)
• Revenue applications
• JVM performance (take it away, Jianqiao!)
A SAMPLING OF JVM PARAMETERS

• Garbage Collector Type
• New Generation Size
• Survivor Ratio
• Parallel GC Threads
• Concurrent GC Threads
• Pre-fetch Interval Size
• Clip In-lining
• Biased Locking
• …and dozens more
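To make the mapping concrete, a hedged sketch of how such knobs translate into HotSpot flags; the flag names are real HotSpot options, but the parameter dictionary and the particular choices are illustrative, not the tuned optimum:

  def render_flags(params):
      # Render a suggested configuration as HotSpot command-line flags.
      return [
          "-XX:+UseParallelGC" if params["use_parallel_gc"]
              else "-XX:+UseConcMarkSweepGC",
          "-XX:NewSize={}m".format(params["new_size_mb"]),
          "-XX:SurvivorRatio={}".format(params["survivor_ratio"]),
          "-XX:ParallelGCThreads={}".format(params["parallel_gc_threads"]),
          "-XX:ConcGCThreads={}".format(params["conc_gc_threads"]),
          "-XX:MaxTenuringThreshold={}".format(params["max_tenuring"]),
      ]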
A TYPICAL MICRO-SERVICE STACK

[Diagram: a service with parameters S1, S2, …, Sm runs on a JVM with parameters J1, J2, …, Jn, over a kernel with parameters K1, K2, …, Ko and hardware with parameters H1, H2, …, Hp. Performance is F(J, S, K, H, …).]
SPECJBB2015
A JVM benchmark

[Diagram: the same stack with SPECjbb2015 as the workload; only the JVM parameters J1, J2, …, Jn are tuned, and the objective is F(J).]
A MICROSERVICE ON THE JVM

[Diagram: the same stack with Microservice A as the workload, again tuning the JVM parameters J1, J2, …, Jn to maximize F(J).]
MICROSERVICE "A"

• A large production service
• Access to User Objects via a Thrift interface
• Why this service?
  • Mature, does not undergo frequent redeploys
  • Large number of service instances
THE SET-UP

• Environment
  • Microservice staging environment
  • Real production traffic (dark read)
  • Portion of production workload
• Performance metric
  • RPS: requests per second
  • GC_cost: wall-clock time spent in GC
  • Perf = RPS / GC_cost
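A minimal sketch of that objective as code, assuming the metrics service reports requests served, wall-clock seconds, and GC seconds for an instance (the function and argument names are ours):

  def perf_score(requests_served, wall_secs, gc_secs):
      # Perf = RPS / GC_cost: reward throughput, penalize time spent in GC.
      rps = requests_served / wall_secs
      gc_cost = gc_secs / wall_secs      # fraction of wall-clock time in GC
      return rps / gc_cost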
[Diagram: the JVM tuning service sits between the BayesOpt service, a version-controlled file store service, the Aurora scheduler on Mesos, and the observability/metrics service. Suggestions flow in from BayesOpt; configs go to the file store; stop/restart commands go to Aurora, which manages the shards (#0-#4) of Microservice A; baseline and experiment scores come back from the metrics service as a score ratio.]

1. Get a new parameter suggestion from BayesOpt
2. Generate JVM configuration
3. Upload new configuration to File Store Service
4. Get baseline platform information
5. Stop and restart the test instance on specific hardware
6. Test for a fixed duration
7. Get valid baselines
8. Obtain baseline performance score
9. Obtain performance score from experiment
10. Compute the ratio and inform BayesOpt
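Stitched together, one iteration of the ten steps above might look like the following sketch; every helper (get_suggestion, file_store, aurora, metrics, report_evaluation) is a hypothetical stand-in for the corresponding Twitter service, and render_flags and perf_score are the sketches shown earlier:

  import time
  from statistics import mean

  TEST_SHARD = 0                 # illustrative shard id
  TEST_DURATION_SECS = 3600      # illustrative fixed test duration

  def tuning_iteration(experiment):
      params = get_suggestion(experiment)                   # 1. new suggestion
      config = file_store.upload(render_flags(params))      # 2-3. generate + upload config
      platform = aurora.platform_of(TEST_SHARD)             # 4. baseline platform info
      aurora.restart(TEST_SHARD, config, platform)          # 5. stop-and-restart on same hardware
      time.sleep(TEST_DURATION_SECS)                        # 6. test for a fixed duration
      baselines = metrics.valid_baseline_shards(platform)   # 7. valid baselines, matching hardware
      # metrics.fetch(shard) is assumed to return (requests, wall_secs, gc_secs)
      base = mean(perf_score(*metrics.fetch(s)) for s in baselines)   # 8. baseline score
      expt = perf_score(*metrics.fetch(TEST_SHARD))         # 9. experiment score
      report_evaluation(experiment, params, expt / base)    # 10. ratio back to BayesOpt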
EVALUATION
BayesOpt in action

[Chart: performance score ratio (y-axis, 0 to 2.5) over 90 iterations (x-axis) of BayesOpt on Microservice A.]
REQUESTS PER SECOND
Of the optimum BayesOpt result

[Chart: performance of the optimum result against the baseline.]
Performance ratio: 590.253 / 324.237 = 182.0%
GC COST
Of the optimum BayesOpt result

GC_cost ratio: 15.3 / 27.9 = 54.8%, a 45% reduction
PRACTICAL EFFORT

• Apples-to-apples comparison:
  • over 50 different platforms in Mesos; ~12 major platforms
• System services or Python bugs could cause failures:
  • timeouts (subprocess)
  • empty content returned
  • exceptions thrown
• The Mesos/Aurora scheduler might interrupt and restart an instance
• Authentication service ticket timeouts
NEXT STEPS

• Several concurrent evaluations
  • trade off against longer experiment duration
  • one experiment set per hardware platform
  • terminate obviously poor suggestions early
• Stress test/validation of optimal configuration(s)
• General framework/service for optimizing an arbitrary micro-service
• Clean up service authentication and use more robust, official APIs
  • replace Python subprocess calls with direct calls to Python APIs
LESSONS FROM OPTIMAL RESULT

Optimal parameter settings:
• Similar new generation size
• Smaller tenuring threshold, smaller survivor spaces
• Larger prefetch read interval for GC scan
• Old generation promotion allocator filter parameters
• More GC threads
• Higher compilation size threshold

Performance gains:
• GC overhead
• Tail response latency
• Data center footprint
MORE LESSONS

• Choice of performance function
• Choice of parameters to tune
• Duration of evaluation runs
• Concurrent evaluations
• Factor out hardware effects
• Protect against noise
• Use baseline configurations
• Long-range effects
• Stress-testing to filter optima
AUTOMATED PERFORMANCE TUNING
Leverage existing micro-services infrastructure

• BayesOpt suggestions/evaluations may be sub-optimal
• Reliability and redundancy are designed into the micro-services architecture
• Existing monitors/alarms/alerts/sensors/telemetry
CONCLUSION
Automated performance optimization in the DevOps deployment workflow

• Continuous, inexpensive, automated optimization of micro-services is possible, even inevitable
• BayesOpt reduces the number of costly experiments needed to find a near-optimal setting quickly
• Existing micro-services and DevOps frameworks already have most of the infrastructure to support this
QUESTIONS ?
THANK YOU !