Page 1: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

© 2014 SpringOne 2GX. All rights reserved. Do not distribute without permission.

Performance tuning Grails applications

by Lari Hotari @lhotari

Page 2: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

- Donald Knuth, 1974

Page 3: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Mature performance optimisation

• Find out the quality requirements of your solution. Keep learning about them and keep them up to date. It's a moving target.
• Keep up the clarity and the consistency of your solution.
• Don't introduce accidental complexity.
• Don't do things just "because this is faster" or because someone thinks so.
• Start doing mature performance tuning and optimisation today!

Page 4: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

How do we define application performance?

Page 5: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Performance aspects

• Latency of operations
• Throughput of operations
• Quality of operations: efficiency, usability, responsiveness, correctness, consistency, integrity, reliability, availability, resilience, robustness, recoverability, security, safety, maintainability

Page 6: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Amdahl's law
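
The transcript keeps only the name of the law here. For reference, a common textbook statement is sketched below. The symbols are notation assumed for this note and are not taken from the slides: p is the fraction of the work that can run in parallel, n is the number of parallel workers, and S(n) is the resulting speedup.

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

For example, if 90% of the work can run in parallel (p = 0.9), the speedup can never exceed 1 / (1 - 0.9) = 10, no matter how many threads are added.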

Page 7: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Little's law

MeanNumberInSystem = MeanThroughput * MeanResponseTime
→ MeanThroughput = MeanNumberInSystem / MeanResponseTime

L = λW (L: mean number in system, λ: mean throughput, W: mean response time)

Page 8: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Lari's Grails Performance Tuning Method™

TL;DR

Page 9: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Lari's Grails Performance Tuning Method™

• Look for 3 things:
  o Slow database operations: use a profiler that shows SQL statements
  o Thread blocking: shows up as high object monitor usage in the profiler
  o Exceptions used in normal program flow: easy to check in a profiler
• Pick the low-hanging fruit
• Find the most limiting bottleneck and eliminate it
• Iterate

Page 10: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

The goals of Grails application tuning

What are we aiming for?

Page 11: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

What's the goal of performance tuning?

• The primary goal of performance tuning is to assist in fulfilling the quality requirements and constraints of your system.

• Meeting the quality requirements makes you and your stakeholders happy: your customers, your business owners, and you the dev&ops.

Page 12: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Performance - Quality of operations

• efficiency
• usability
• responsiveness
• correctness
• consistency / integrity / reliability
• availability
• resilience / robustness / recoverability
• security / safety
• maintainability

Page 13: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Operational efficiency

• Tuning your system to meet its quality requirements at optimal cost
• Optimising the cost of running your system: operational efficiency

Page 14: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

The continuous improvement strategy for performance tuning anything

How do you succeed in performance tuning?

Page 15: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Performance tuning improvement cycle

• Measure & profile
  o Start with the tools you have available. You can add more tools and methods in the next iteration.
• Think & learn, analyse and plan the next change
  o Find tools and methods to measure what you want to know more about in the next iteration.
• Implement a change

[Figure: performance tuning feedback cycle: Measure & profile → Think and learn → Do tuning and fixes]

Page 16: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Iterate, Iterate, Iterate

• Iterate: do a lot of iterations and change one thing at a time

• learn gradually about your system's performance and operational aspects

Page 17: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Feedback from production

• Set up a different feedback cycle for production environments.

• Don't forget that usually it's irrelevant if the system performs well on your laptop.

• If you are not involved in operations, use innovative means to set up a feedback cycle.

Page 18: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

More specific derived goals

Page 19: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

If your requirement is to lower latency

• Amdahl's law: you won't be able to effectively speed up a single computation task if you cannot parallelize it.
• In an ordinary synchronous blocking Servlet API programming model, you have to make sure that the use of shared locks and resources is minimised.
• Reducing thread blocking (object monitor usage) is a key principle for improving performance. Amdahl's law explains why: time spent waiting for a shared lock is serial work that adding more threads cannot speed up.
• The ideal is lock-free request handling when the synchronous Servlet API is used.

Page 20: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Understand Little's law in your context

• With Little's law you can do calculations and reasoning about programming models that fit your requirements and available resources

• the traditional Servlet API thread-per-request model could fit your requirements and you can still make it "fast" (low latency) in most cases.

Page 21: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Cons of the thread-per-request model in the light of Little's law and Amdahl's law

• From Little's law: MeanNumberInSystem = MeanThroughput * MeanResponseTime

• In the thread-per-request model, the upper bound for MeanNumberInSystem is the maximum number of request handling threads. This might limit the throughput of the system, especially when response times get higher or request handling threads get blocked and hang.

• Shared locks and resources might set the upper bound to a very low value. Such problems get worse under error conditions.
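
To make the bound concrete, here is a minimal Java sketch of the arithmetic that follows from Little's law in the thread-per-request model. The class name, method name and the pool size of 200 threads are illustrative assumptions and do not come from the slides.

```java
public class LittlesLaw {

    /**
     * Little's law rearranged: throughput = numberInSystem / responseTime.
     * In the thread-per-request model the number of requests in the system
     * cannot exceed the number of request handling threads, so this is an
     * upper bound for the sustainable throughput.
     */
    public static double maxThroughputPerSecond(int maxRequestThreads,
                                                double meanResponseTimeSeconds) {
        if (maxRequestThreads <= 0 || meanResponseTimeSeconds <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return maxRequestThreads / meanResponseTimeSeconds;
    }

    public static void main(String[] args) {
        int threads = 200; // hypothetical size of the request thread pool
        // The bound shrinks as the mean response time grows.
        for (double responseTime : new double[] {0.25, 0.5, 5.0}) {
            System.out.printf("W = %.2f s -> at most %.0f requests/s%n",
                    responseTime, maxThroughputPerSecond(threads, responseTime));
        }
    }
}
```

The same pool that sustains hundreds of requests per second at sub-second response times is limited to a few dozen once a slow backend pushes the mean response time to several seconds, which is why blocked or hanging threads hurt throughput so badly.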

Page 22: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Advantages of thread-per-request model

• We are used to debugging the thread-per-request model: adding breakpoints, attaching the debugger and going through the stack
• The synchronous blocking procedural programming model is something that programmers are used to.
• There is friction in switching to different programming models and paradigms.

Page 23: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

KillerApp for non-blocking async model

• Responsive streaming to a high number of clients on a single box
• continuously connected real-time apps where low latency and high availability are requirements
• limited resources (must be efficient/optimal)

Page 24: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Profiling concepts and tools

Page 25: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

JVM code profiler concepts

• Sampling
  o Information about the execution is collected through the JVM profiling interfaces at a given time interval, for example every 100 milliseconds. Statistical methods are used to calculate values based on the samples.
  o Unreliable results, but certainly useful in some cases since the overhead of sampling is minimal compared to instrumentation
  o Usually helps to get a better understanding of the problem if you learn to look past the numeric values returned from measurements.
• Instrumentation
  o exact measurements of method execution details

Page 26: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Load testing tools and services

• Simple command line tools
  o wrk (https://github.com/wg/wrk): a modern HTTP benchmarking tool with Lua scripting support for doing things like verifying the reply
• Load testing toolkits and service providers
  o Support testing of full use cases and stateful flows
  o Toolkits: JMeter (http://jmeter.apache.org/), Gatling (http://gatling.io/)

Page 27: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Common pitfalls in profiling Grails

• Measuring wall clock time
• Measuring CPU time
• Instrumentation usually provides false results because of JIT compilation and other reasons like spin locks
• Lack of proper JVM warmup
• Relying on gut feeling and being lazy

Page 28: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Ground your feet

• Find a way to review production performance graphs regularly, especially after making changes to the system
  o system utilisation over time (CPU load, IO load & wait, memory usage), system input workload (requests) over time, etc.
• In the cloud, use tools like New Relic to get a view into operations
  o CloudFoundry based Pivotal Web Services and IBM Bluemix have New Relic available
• In the development environment, use a profiler and debugger to gain understanding. You can use the grails-melody plugin to get insight into the SQL that's executed.

Page 29: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Grails - The low hanging fruit

• Improper JVM config
• Slow SQL
• Blocking caused by caching
• Bad regexps
• Unnecessary database transactions
• Watch out for blocking in the Java API: Hashtable

Page 30: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Environment related problems

• Improper JVM configuration for Grails apps
  o out-of-the-box Tomcat parameters
  o a single JVM running with a huge heap on a big box: if you have a big powerful box, it's better to run multiple small JVMs and put a load balancer in front of them

Page 31: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Example of proper Tomcat config for *nix

Create a file setenv.sh in the tomcat_home/bin directory:

    export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_60
    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8
    CATALINA_OPTS="$CATALINA_OPTS -server -noverify"
    CATALINA_OPTS="$CATALINA_OPTS -XX:MaxPermSize=256M -Xms768M -Xmx768M" # tune heap size
    CATALINA_OPTS="$CATALINA_OPTS -Djava.net.preferIPv4Stack=true" # disable IPv6 if not used
    # set default file encoding and locale
    CATALINA_OPTS="$CATALINA_OPTS -Dfile.encoding=UTF-8 -Duser.language=en -Duser.country=US"
    CATALINA_OPTS="$CATALINA_OPTS -Duser.timezone=CST" # set default timezone
    CATALINA_OPTS="$CATALINA_OPTS -Dgrails.env=production" # set grails environment
    # set timeouts for JVM URL handler
    CATALINA_OPTS="$CATALINA_OPTS -Dsun.net.client.defaultConnectTimeout=10000 -Dsun.net.client.defaultReadTimeout=10000"
    CATALINA_OPTS="$CATALINA_OPTS -Duser.dir=$CATALINA_HOME" # set user.dir
    export CATALINA_OPTS
    export CATALINA_PID="$CATALINA_HOME/logs/tomcat.pid"

Page 32: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

JVM heap size

• Assumption: optimising throughput and latency at the cost of memory consumption
• Set the minimum and maximum heap size to the same value to prevent compaction (that causes full GC)
• Look at the presentation recording of "Tuning Large scale Java platforms" by Emad Benjamin and Jamie O'Meara for more.
• Rule-of-thumb recommendation for heap size: 3 to 4 times the survivor space size, and don't exceed the NUMA node's local memory size for your server configuration (use "numactl --hardware" to find out the NUMA node size on Linux).

Page 33: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

The most common problem: SQL

• SQL and database related bottlenecks: learn how to profile SQL queries and tune your database queries and your database
• The grails-melody plugin can be used to spot costly SQL queries in development and testing environments. Nothing prevents its use in production; however, there is a risk that running it in a production environment has negative side effects.
• New Relic in CloudFoundry (works for production environments)

Page 34: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Use a non-blocking cache implementation

• Guava LoadingCache is a good candidate https://code.google.com/p/guava-libraries/wiki/CachesExplained

• "While the new value is loading the previous value (if any) will continue to be returned by get(key) unless it is evicted. If the new value is loaded successfully it will replace the previous value in the cache; if an exception is thrown while refreshing the previous value will remain, and the exception will be logged and swallowed." (http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/cache/LoadingCache.html#refresh(K))
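
The quoted refresh behaviour can be illustrated without Guava. The following is only a sketch that uses java.util.concurrent and Java 8 lambdas; it is not Guava's implementation, and the class name StaleWhileRefreshCache and its constructor parameters are hypothetical. It shows the key property: readers keep getting the previous value while a refresh runs elsewhere, and a failed refresh leaves the previous value in place.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

/** Hypothetical sketch of "serve the old value while refreshing"; not Guava's implementation. */
public class StaleWhileRefreshCache<K, V> {
    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final Executor refreshExecutor;

    public StaleWhileRefreshCache(Function<K, V> loader, Executor refreshExecutor) {
        this.loader = loader;
        this.refreshExecutor = refreshExecutor;
    }

    /** Returns the cached value; only the very first load of a key waits for the loader. */
    public V get(K key) {
        return values.computeIfAbsent(key, loader);
    }

    /** Reloads the value on the executor; get() keeps returning the previous value meanwhile. */
    public void refresh(K key) {
        refreshExecutor.execute(() -> {
            try {
                V fresh = loader.apply(key);
                if (fresh != null) {
                    values.put(key, fresh);
                }
            } catch (RuntimeException e) {
                // Keep the previous value, as in the quoted behaviour; real code would log this.
            }
        });
    }
}
```

In a real application the executor would typically be a small thread pool, so that request handling threads never wait for a reload of a key that is already cached.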

Page 35: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Some regexps are CPU hogs

https://twitter.com/lhotari/status/474591343923449856

Page 36: SpringOne2014-LariHotari-PerformancetuningGrailsapplications
Page 37: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Verify regexps against catastrophic backtracking

• Verify regexps that are used a lot
  o use the profiler's CPU time measurement to spot them
  o search the code for candidate regexps
• Use a regexp analyser to check regexps with different input sizes (jRegExAnalyser/RegexBuddy).
• Make sure valid input doesn't trigger "catastrophic backtracking".
• Understand what it is.
  o http://www.regular-expressions.info/catastrophic.html
  o "The solution is simple. When nesting repetition operators, make absolutely sure that there is only one way to match the same match"
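
As a small illustration of the advice above, the Java sketch below compares a well-known nested-repetition pattern with an equivalent pattern that has only one way to match. The class and method names are made up for this example, and the input sizes are arbitrary; actual timings depend on the JVM version, so none are claimed here.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class BacktrackingDemo {
    // Nested repetition: a run of x's can be split between the two x+ parts in many ways.
    public static final Pattern RISKY = Pattern.compile("(x+x+)+y");
    // Accepts the same strings (two or more x's followed by y) with only one way to match.
    public static final Pattern SAFE = Pattern.compile("xx+y");

    /** Builds a string of n copies of c. */
    public static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    /** Measures one full match attempt in nanoseconds. */
    static long timeNanos(Pattern pattern, String input) {
        long start = System.nanoTime();
        pattern.matcher(input).matches();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 22; n += 4) {
            String input = repeat('x', n); // no trailing y, so every match attempt must fail
            System.out.printf("n=%d risky=%d us safe=%d us%n", n,
                    timeNanos(RISKY, input) / 1000, timeNanos(SAFE, input) / 1000);
        }
    }
}
```

Running the main method with growing n is a quick way to see whether a candidate regexp's cost rises much faster than its input size; the same loop can be pointed at patterns found in your own code.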

Page 38: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Eliminate unnecessary database transactions in Grails

• Use "static transactional = false" in services that don't need transactions
• Don't call transactional services from GSP taglibs (or GSP views): that might cause a large number of short transactions during view rendering

Page 39: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

JDK has a lot of unnecessary blocking

• java.util.Hashtable/Properties is blocking
• These calls block: System.getProperty("some.config.value", "some.default"), Boolean.getBoolean("some.feature.flag")
• Instantiation of PrintWriter, Locale, NumberFormats, CurrencyFormats etc.: a lot of them have blocking problems because of System.getProperty calls.
• Consider monkey patching the JDK's Hashtable class: https://github.com/stephenc/high-scale-lib
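
For application code that reads system properties on hot paths, a less invasive option than patching the JDK is to read them once and serve later lookups from an immutable copy. The sketch below assumes that approach; ConfigSnapshot and its methods are hypothetical names, not an existing API, and it does nothing about System.getProperty calls made inside the JDK itself.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: copies system properties once so hot paths avoid the synchronized table. */
public final class ConfigSnapshot {
    private final Map<String, String> values;

    private ConfigSnapshot(Map<String, String> values) {
        this.values = Collections.unmodifiableMap(values);
    }

    /** Reads the system properties a single time, for example at application startup. */
    public static ConfigSnapshot capture() {
        Map<String, String> copy = new HashMap<>();
        for (String name : System.getProperties().stringPropertyNames()) {
            copy.put(name, System.getProperty(name));
        }
        return new ConfigSnapshot(copy);
    }

    /** Lock-free replacement for System.getProperty(name, defaultValue). */
    public String get(String name, String defaultValue) {
        String value = values.get(name);
        return value != null ? value : defaultValue;
    }

    /** Lock-free replacement for Boolean.getBoolean(name). */
    public boolean getBoolean(String name) {
        return Boolean.parseBoolean(values.get(name));
    }
}
```

The trade-off is that properties changed after capture() are not seen, which is usually acceptable for startup configuration and feature flags.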

Page 40: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Misc Grails tips

• Use singleton scope in controllers
  o grails.controllers.defaultScope = 'singleton'
  o this has been the default for new apps for a long time, but might be a problem for upgraded apps
  o when changing, make sure that you previously didn't use controller fields for request state handling (that was OK for prototype scope)
• Use controller methods (replace closures with methods in upgraded apps)

Page 41: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Tools for performance environments

Page 42: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Simple inspection in production environments

• kill -3 <PID> or jstack <PID>
  o Makes a thread dump of all threads. kill -3 writes it to the JVM's System.out, which ends up in catalina.out in the default Tomcat config; jstack prints it to its own standard output.
  o The java process keeps running; it doesn't get terminated.
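
When shell access to the server is not available, similar information can be collected from inside the JVM through the standard java.lang.management API. The sketch below is one way to do it; the ThreadDumper class is a hypothetical helper, and its output format is simplified compared to what kill -3 or jstack print.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper that builds a simplified thread dump of the current JVM. */
public class ThreadDumper {

    public static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Only ask for lock details when the JVM supports reporting them.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                // The lock this thread is blocked on or waiting for, useful for spotting contention.
                out.append(" waiting on ").append(info.getLockName());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Many threads in the BLOCKED state waiting on the same lock are the thread blocking symptom that the tuning method described earlier looks for.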

Page 43: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Java Mission Control & Flight Recorder

• Oracle JDK 7 and 8 include Java Mission Control since 1.7.0_40.

• JAVA_HOME/bin/jmc executable for launching the client UI for jmc

• JMC includes Java Flight Recorder which has been designed to be used in production.

• JFR can record data without the UI and store events in a circular buffer for investigation of production problems.

Page 44: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

JFR isn't free

• JFR is a commercial non-free feature, available only in Oracle JVMs (originally from JRockit).

• You must buy a license from Oracle for each JVM using it.

• "... require Oracle Java SE Advanced or Oracle Java SE Suite licenses for the computer running the observed JVM", http://www.oracle.com/technetwork/java/javase/documentation/java-se-product-editions-397069.pdf, page 5

Page 45: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Controlling JFR

• Enabling JFR with default continuous "black box" recording:

    export _JAVA_OPTIONS="-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:FlightRecorderOptions=defaultrecording=true"

• Runtime control using jcmd commands
  o help for the commands is available with:

    jcmd <pid> help JFR.start
    jcmd <pid> help JFR.stop
    jcmd <pid> help JFR.dump
    jcmd <pid> help JFR.check

Page 46: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Demo

Page 47: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

wrk http load testing tool sample output

    Running 10s test @ http://localhost:8080/empty-test-app/empty/index
      10 threads and 10 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     1.46ms    4.24ms  17.41ms   93.28%
        Req/Sec     2.93k     0.90k    5.11k    85.67%
      Latency Distribution
         50%  320.00us
         75%  352.00us
         90%  406.00us
         99%   17.34ms
      249573 requests in 10.00s, 41.22MB read
      Socket errors: connect 1, read 0, write 0, timeout 5
    Requests/sec:  24949.26
    Transfer/sec:      4.12MB

• Check the latency: the max and its distribution
• Requests/sec is the total throughput

https://github.com/lhotari/grails-perf-testapps/empty-test-app

Page 48: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Questions?

Page 49: SpringOne2014-LariHotari-PerformancetuningGrailsapplications

Thanks!

Lari Hotari @lhotari

Pivotal Software, Inc.