Confidential, Dynatrace LLC Top Performance Bottleneck Pattern Deep Dive Andreas Grabner - @grabnerandi [email protected]

Top Performance Bottleneck Pattern Deep Dive · 2017. 10. 5. · 1 0.5% 7.2s 1 63% 5.2s 1 4 1kb 60ms 2 3 8kb 100ms 1 0.6% 4.2s 5 75% 2.5s Build 35 testNewsAlert - testSearch OK -

Why Performance?

700 deployments / YEAR

10 + deployments / DAY

50 – 60 deployments / DAY

Every 11.6 SECONDS

„We increased from monthly to 250+ deployments per week“

dev.otto.de

Not only fast delivered but also delivering fast!

Response Time | Conversions

-1000ms | +2%

-1000ms | +10%

+100ms | -1%

#1: Which Geo has which “User Experience”?

#2: Who are these users?

User Experience == Load Time + Error Rate + Bandwidth

New Deployment + Mkt Push

Increase # of unhappy users!

Decline in Conversion Rate

Overall increase of Users!

Spikes in FRUSTRATED Users!

Performance == Response Time | UX + Resource Usage

CPU & Mem Impact of updated Dependency Injection Lib

App with Regular Load supported by 10 Containers

Twice the Load but 48 (=4.8x!) Containers! App doesn’t scale!!

Performance == Scalability! What’s the Cost per User?
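
The cost-per-user question can be worked through with the container counts from this slide. This is a minimal sketch that assumes container count is a fair stand-in for cost and load a fair stand-in for users. The function name and the "load unit" are illustrative and not from the talk.

```python
def containers_per_load_unit(containers: int, load_units: float) -> float:
    """Containers needed per unit of load (a rough proxy for cost per user)."""
    return containers / load_units


# Numbers from the slide: 10 containers at regular load, 48 at twice the load.
regular = containers_per_load_unit(containers=10, load_units=1.0)
doubled = containers_per_load_unit(containers=48, load_units=2.0)

# An app that scales linearly would keep this ratio constant.
cost_increase = doubled / regular
print(f"containers per load unit: {regular:.1f} -> {doubled:.1f} ({cost_increase:.1f}x)")
```

Under these assumptions, serving one user costs about 2.4 times as much at double load, which is the signal that the app does not scale.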

Performance --> User Behavior: Click Heatmap Analysis

How to analyze perf?

Time: Wall Clock, CPU, I/O, Wait/Sync, Susp, Page Load

Throughput: # of Requests per Time Interval

Resources: CPU Cycles, Memory, I/O, Log Messages, ...

Pools and Queues: Sizes, Utilization, Acquisition Time, # Publishers vs # Subscribers, Process Time

Interactions: # SQLs, # Messages, # Services, # Images, # CSS

Errors: Exceptions, HTTPs, TCP Packet Loss

AND MANY MORE

0.02ms

0.01ms

https://dynatrace.github.io/ufo/

„In Your Face“ Data!!

Where do your Stories come from?

Share Your PurePath - http://bit.ly/sharepurepath

[Architecture diagram; labels:] 3rd parties, Akamai, Cloudfront, Synthetic, Apache, IIS, Node.js, nginx, Java, .NET, PHP, IBM, WMQ, ESBs, MongoDB, Hbase, Cassandra, CICs, IMS, ORACLE, MSSQL, MySQL, DB2, Mobile, Collector, Plugins, Dynatrace Server, Hosts, Session Storage, Splunk, Elasticsearch, Solr, Rich Client, Web Interface, Web

Dev/Arch

Method Level Hotspots

+ Exceptions, Logs, Memory Allocation, Threads, Actual Code ...

20%

80%

Frontend Performance

„We are getting FATer!“

m.pepsi.com during Super Bowl

434 Resources in total on that page: 230 JPEGs, 75 PNGs, 50 GIFs, …

Total size of ~ 20MB

Tip: Make F12 your friend!

Tip: Monitor your Real End Users

#1: Which Geo has which “User Experience”?

#2: Who are these users?

Key Metrics

# of Resources

Size of Resources

Total Size of Content

HTTP 3xx, 4xx, 5xx

# of Domains
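
These frontend metrics are easy to compute once the resource list of a page load is available, for example exported from the browser's F12 tools. The sketch below assumes each resource is a `(url, status, size_bytes)` tuple. That shape, the function name, and the sample data are illustrative and not from the talk.

```python
from collections import Counter
from urllib.parse import urlparse


def frontend_key_metrics(resources):
    """Summarize one page load given (url, status, size_bytes) tuples."""
    status_classes = Counter(f"{status // 100}xx" for _, status, _ in resources)
    return {
        "resource_count": len(resources),
        "total_bytes": sum(size for _, _, size in resources),
        "largest_bytes": max((size for _, _, size in resources), default=0),
        "http_3xx_4xx_5xx": {k: status_classes.get(k, 0) for k in ("3xx", "4xx", "5xx")},
        "domain_count": len({urlparse(url).netloc for url, _, _ in resources}),
    }


# Hypothetical sample data, not taken from the slides.
sample = [
    ("https://example.com/index.html", 200, 12_000),
    ("https://cdn.example.com/hero.jpg", 200, 850_000),
    ("https://example.com/old.css", 301, 0),
    ("https://ads.example.net/pixel.gif", 404, 0),
]
metrics = frontend_key_metrics(sample)
print(metrics)
```

Tracking these numbers per build makes a page that is "getting FATer" visible before real users notice it.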

Backend Performance

“The Usual Suspects”

88% cite the database as the most common challenge or issue with application performance

#1: Watch out for Slow Single Statement

Slow Single SQL Issue: The root cause can be single SQL queries that block connections for a long time

Tip: Analyze Execution Plans

Focus on: Rows Processed, Bytes Read, Impact of Indices, …
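
As a small illustration of how an index changes an execution plan, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module. SQLite stands in for whatever database is actually in use. The `orders` table, the index name, and the `query_plan` helper are assumptions made for this example. Other databases expose plans through their own commands and report more detail, such as rows processed and bytes read.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan detail text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # 4th column holds the human-readable step


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index the database has to scan the whole table.
plan_without_index = query_plan(conn, sql, (42,))

# With an index on the filter column it can search instead of scan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = query_plan(conn, sql, (42,))

print(plan_without_index)
print(plan_with_index)
```

Comparing the two plans shows the switch from a full table scan to an index search for the same statement.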

Tip: Analyze other Database Activity

Tip: Watch out for CPU, Disk, Rows Processed, Execution Time, Locks, …

#2: Too many Queries – Too much Data

Excessive SQL: 24889! Calls to JDBC API

Database Heavy: 66.51% (40.27s) Time Spent in SQL Execs

#3: N+1 Query Access Pattern

Inefficient Pool Access: 12444! individual connections

N+1 Query Problem: Same SQL with different bind value

Individual SQL are fast (<1ms): No need to optimize on the database side

#3: N+1 Query Access Pattern (cont.)

N+1 Query Problem + Excessive SQL: Lazy Loading in Hibernate Executes 4k+ Statements

Database Heavy: 2 SQL Queries executed 4k+ times totaling to 6s
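
The N+1 pattern is easiest to see by counting statements. The sketch below is an illustration in Python with an in-memory SQLite database, not the Hibernate case from the slides. The `clubs` and `members` tables, the `CountingConnection` wrapper, and both loader functions are made up for this example.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many statements the application sends."""

    def __init__(self, conn):
        self.conn = conn
        self.calls = 0

    def execute(self, sql, params=()):
        self.calls += 1
        return self.conn.execute(sql, params)


raw = sqlite3.connect(":memory:")
raw.executescript("""
    CREATE TABLE clubs (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE members (id INTEGER PRIMARY KEY, club_id INTEGER, name TEXT);
""")
raw.executemany("INSERT INTO clubs (id, name) VALUES (?, ?)",
                [(i, f"club-{i}") for i in range(1, 51)])
raw.executemany("INSERT INTO members (club_id, name) VALUES (?, ?)",
                [(i % 50 + 1, f"member-{i}") for i in range(200)])


def members_n_plus_one(db):
    """1 query for the clubs, then 1 more query per club (same SQL, new bind value)."""
    result = {}
    for club_id, club in db.execute("SELECT id, name FROM clubs").fetchall():
        rows = db.execute("SELECT name FROM members WHERE club_id = ?", (club_id,)).fetchall()
        result[club] = sorted(r[0] for r in rows)
    return result


def members_single_join(db):
    """The same data fetched with one JOIN."""
    result = {}
    rows = db.execute(
        "SELECT c.name, m.name FROM clubs c LEFT JOIN members m ON m.club_id = c.id"
    ).fetchall()
    for club, member in rows:
        result.setdefault(club, [])
        if member is not None:
            result[club].append(member)
    return {club: sorted(names) for club, names in result.items()}


chatty = CountingConnection(raw)
slow = members_n_plus_one(chatty)

joined = CountingConnection(raw)
fast = members_single_join(joined)

print(f"N+1: {chatty.calls} statements, JOIN: {joined.calls} statement")
```

Each individual query in the loop is fast, so the database side looks healthy. The problem only shows up in the statement count per request.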

#4: Prepared vs Unprepared

Candidate for Prepared Execution: Watch out for a single SQL that gets executed more than once per request

Unprepared Statements: Only 7 out of 4904 total executions across multiple requests got prepared!

Not a single case: Happens for many SQLs in this example

Tip: Impact on Execution Plan Generation

CPU intensive operation: A new execution plan is generated for each unique SQL statement

Tip: Oracle v$ metrics to monitor overhead of parsing

High CPU caused by execution plan generation

Time Spent Breakdown shows up to 25% of the time spent on Parsing

Tip: „Ad hoc“ vs Prepared Statements

Ad hoc: The database will generate a new execution plan for every value of customer_id

Prepared: The database will generate one single execution plan and reuse it for every value of customer_id
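
The SQL shown on the original slide is not part of this transcript, so the following is only a sketch of the idea, with Python's `sqlite3` module standing in for JDBC and Oracle. The `orders` table and the query are assumptions. The point is how many distinct statement texts the database gets to see, since a plan cache is typically keyed on that text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 10, float(i)) for i in range(100)],
)

customer_ids = range(10)

# Ad hoc: the value is pasted into the SQL text, so every customer_id
# produces a different statement that has to be parsed and planned.
adhoc_texts = set()
adhoc_totals = []
for cid in customer_ids:
    sql = f"SELECT SUM(total) FROM orders WHERE customer_id = {int(cid)}"
    adhoc_texts.add(sql)
    adhoc_totals.append(conn.execute(sql).fetchone()[0])

# Bind variable: the SQL text never changes, only the bound value does,
# so one parsed statement and plan can be reused.
bound_sql = "SELECT SUM(total) FROM orders WHERE customer_id = ?"
bound_texts = set()
bound_totals = []
for cid in customer_ids:
    bound_texts.add(bound_sql)
    bound_totals.append(conn.execute(bound_sql, (cid,)).fetchone()[0])

print(f"distinct ad hoc statements: {len(adhoc_texts)}, with bind variable: {len(bound_texts)}")
```

Both loops return the same totals. Binding values instead of concatenating them also removes the usual SQL injection risk.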

Proof: No Parsing Overhead!

Low CPU usage

Parsing doesn’t show up anymore on the Time Spent Breakdown

Proof: Significant Performance Improvement

Response Time with Ad-Hoc Queries per Transaction: 177.9 ms

Using Bind Variables, Response Time reduced by 56% (to 78.64 ms)

Tip: Analyze by SQL Type (SELECT, INSERT, UPDATE, ...)

Which Types take how long?

When do you have spikes?

Tip: Does our Data Caching work?

Do we see an increase in AVG # of SQL Executions over Time when we have constant load?

Does TOTAL # of SQL Executions increase with load? Shouldn’t it flatten due to CACHES?
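
The "flattening" a working cache should produce can be shown with a toy simulation. Everything here is an assumption for illustration: the loader functions, the 20 distinct products, and the use of `functools.lru_cache` as the cache.

```python
from functools import lru_cache

sql_executions = 0


def load_product_from_db(product_id):
    """Stand-in for a real SQL query; it only counts how often it runs."""
    global sql_executions
    sql_executions += 1
    return {"id": product_id, "name": f"product-{product_id}"}


@lru_cache(maxsize=128)
def load_product_cached(product_id):
    return load_product_from_db(product_id)


def simulate(requests, loader):
    """Run `requests` lookups over 20 distinct products; return # of SQL executions."""
    global sql_executions
    sql_executions = 0
    load_product_cached.cache_clear()  # start every run with a cold cache
    for i in range(requests):
        loader(i % 20)
    return sql_executions


loads = (100, 200, 400)
uncached = [simulate(n, load_product_from_db) for n in loads]
cached = [simulate(n, load_product_cached) for n in loads]

print("without cache:", uncached)
print("with cache:   ", cached)
```

Without the cache the SQL count grows with load. With it, the count stays at the number of distinct items. If production numbers look like the first line, the cache is not doing its job.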

Lessons Learned – Don’t Assume …

• … you know what the code you inherited is doing!!

• … you are not making mistakes like this

• Explore the Right Tools

• Built-In Database Analysis Tools

• “Logging” options of Frameworks such as Hibernate, …

• JMX, Perf Counters, … of your Application Servers

• Performance Tracing Tools: Dynatrace, Ruxit, NewRelic, AppDynamics, Your Profiler of Choice …

Key Metrics

# of SQL Calls

# of same SQL Execs (1+N)

# of Connections

Rows/Data Transferred

Pools & Queues

„Proper Sizing“

„Proper Usage“

#1: Pick Proper Pool Sizes

Do we have enough DB CONNECTIONS per pool?

#2: Monitoring Pool Usage and Impact

How long to wait for a new connection?

How long are connections in use?
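
Both questions can be answered by timing the checkout and the release of a pooled resource. This is a minimal sketch of such instrumentation, not a real connection pool. The `MonitoredPool` class, the two fake connections, and the 50 ms "query" are assumptions made for this example.

```python
import queue
import threading
import time


class MonitoredPool:
    """Fixed-size pool that records wait time and in-use time per checkout."""

    def __init__(self, resources):
        self._free = queue.Queue()
        for resource in resources:
            self._free.put(resource)
        self._lock = threading.Lock()
        self.wait_times = []  # seconds spent waiting for a free resource
        self.use_times = []   # seconds the resource was held

    def run(self, work):
        requested = time.perf_counter()
        resource = self._free.get()  # blocks while the pool is exhausted
        acquired = time.perf_counter()
        try:
            return work(resource)
        finally:
            released = time.perf_counter()
            self._free.put(resource)
            with self._lock:
                self.wait_times.append(acquired - requested)
                self.use_times.append(released - acquired)


def slow_query(conn):
    time.sleep(0.05)  # pretend the statement takes 50 ms
    return conn


pool = MonitoredPool(["conn-1", "conn-2"])  # hypothetical pool of 2 connections

# 6 concurrent requests compete for 2 connections.
threads = [threading.Thread(target=pool.run, args=(slow_query,)) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

max_wait = max(pool.wait_times)
avg_use = sum(pool.use_times) / len(pool.use_times)
print(f"max wait: {max_wait * 1000:.0f} ms, avg in use: {avg_use * 1000:.0f} ms")
```

When wait time grows while in-use time stays flat, the pool is too small for the load. When in-use time grows, the statements themselves are the place to look.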

#3: Correct Load Balancing

#4: Excessive Thread Usage per Request

Excessive Remoting: 40! RMI Calls on 20! parallel threads

Excessive SQL: 2790! SQL Calls Total

Worker Thread Exhaustion: Excessive use of threads in App Server also causes backup of requests in Web Server.

Watch your Worker Thread Pools!

Tip: Follow Threads through PurePath

Thread Name: Count Unique Thread IDs

Async/Sync Nodes: RMI calls done on background threads

Tip: Watch out for Sync / Wait

1.63s in Object.wait means that this thread is put on hold, waiting for the next Connection to become available!

Key Metrics

Pool and Queue Sizes

Time in Sync & Wait

# of Threads per Request

Balanced Requests

When Logging „Impacts Performance“

LOG

#1: Log Hotspots in Frameworks!

callAppenders clear CPU and I/O Hotspot

Excessive logging through Spring Framework

#2: Debug Log and outdated log4j library

#1: Top Problem: log4j.callAppenders -> 71% Sync Time

#2: Most of logging done from fillDetail method

#3: Doing “DEBUG” log output: Is this necessary?
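
One common way to keep DEBUG output cheap is to check the log level before building the message. The sketch below uses Python's standard `logging` module as a stand-in for log4j. The logger name, the `describe` helper, and the two `fill_detail_*` functions are made up for this example and are not the code from the slide. It covers only the cost of building messages, not the synchronization inside the appenders.

```python
import logging

logger = logging.getLogger("fill_detail_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)                   # DEBUG is switched off
logger.addHandler(logging.NullHandler())

expensive_calls = 0


def describe(detail):
    """Stand-in for costly message building (serialization, string concat, ...)."""
    global expensive_calls
    expensive_calls += 1
    return ", ".join(f"{key}={value}" for key, value in sorted(detail.items()))


def fill_detail_eager(detail):
    # The message is built on every call, even though DEBUG is disabled.
    logger.debug("detail: " + describe(detail))


def fill_detail_guarded(detail):
    # The message is only built when DEBUG output is actually enabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("detail: %s", describe(detail))


for _ in range(1000):
    fill_detail_eager({"id": 1, "name": "x"})
eager_cost = expensive_calls

expensive_calls = 0
for _ in range(1000):
    fill_detail_guarded({"id": 1, "name": "x"})
guarded_cost = expensive_calls

print(f"message builds with DEBUG off: eager={eager_cost}, guarded={guarded_cost}")
```

The guard removes the hidden work when DEBUG is off. Whether DEBUG should be on at all in production is the separate question the slide raises.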

#3: Overhead caused by Exceptions

fillInStackTrace is Top 2 in CPU Hotspots

All these Exceptions that never show up in a log file are consuming all CPU

Tip: Too Many Exceptions vs Log Messages

2-5 Log Messages per 5 Min: Looking at the important (SEVERE, FATAL, …) log messages written

Up to 20000 Custom Exceptions: That’s about 4000x the number of Exceptions per Log Message

Key Metrics

# of Log Entries

Size of Logs per Use Case

Ratio Exceptions to Logs

„(Micro-)Service Migration Mistakes“

Online Sports Club Search Service

[Chart: Response Time and Users over time; time axis: 20xx, 2014, 2015, 2016+]

1) Started as a small project

2) Slowly growing user base

3) Expanding to new markets – 1st performance degradation!

4) Adding more markets – performance becomes a business impact

5) Potentially start losing users

Can‘t scale vertically endlessly!

2.68s Load Time

94.09% CPU Bound

Early 2015: Monolithic App

Proposal: Service approach!

Front End to Cloud

Scale Backend in Containers!

Testing the Backend Service alone scales well …

7:00 a.m.: Low Load and Service running on minimum redundancy

12:00 p.m.: Scaled up service during peak load with failover of problematic node

7:00 p.m.: Scaled down again to lower load and move to different geo location

GO LIVE DAY – 7:00 a.m.

GO LIVE DAY – 12:00 p.m.

What Went Wrong?

26.7s Load Time

5kB Payload

33! Service Calls

99kB - 3kB for each call!

171! Total SQL Count

Architecture Violation: Direct access to DB from frontend service

Single search query end-to-end

2.5s (vs 26.7)

5kB Payload

1! (vs 33!) Service Call

3kB (vs 99) Payload!

3! (vs 171) Total SQL Count

The fixed end-to-end use case

“Re-architect” vs. “Migrate” to Service-Orientation

Key Metrics

# of Service Calls

Payload of Service Calls

# of Involved Threads

1+N Service Call Pattern!
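
The 1+N service call pattern mirrors the N+1 query pattern one level up. The sketch below simulates it with a hypothetical in-process `ClubService` that counts calls and payload bytes. The class, its three methods, and the payload accounting are assumptions. The 32 clubs are chosen so the chatty variant lands on 33 calls like the search example, but nothing else is taken from the real system.

```python
class ClubService:
    """Hypothetical backend service that counts calls and bytes returned."""

    def __init__(self, clubs):
        self._clubs = clubs
        self.calls = 0
        self.payload_bytes = 0

    def _respond(self, body):
        self.calls += 1
        self.payload_bytes += len(repr(body).encode())
        return body

    def search_ids(self, term):
        """Fine-grained endpoint: returns only the matching ids."""
        return self._respond([cid for cid, name in self._clubs.items() if term in name])

    def get_club(self, club_id):
        """Fine-grained endpoint: returns one club per call."""
        return self._respond({"id": club_id, "name": self._clubs[club_id]})

    def search_clubs(self, term):
        """Coarse-grained endpoint: one call returns the full result."""
        return self._respond(
            [{"id": cid, "name": name} for cid, name in self._clubs.items() if term in name]
        )


clubs = {i: f"club-{i}" for i in range(32)}

# 1+N: one search call, then one detail call per hit.
chatty = ClubService(clubs)
chatty_result = [chatty.get_club(cid) for cid in chatty.search_ids("club")]

# Re-architected: the service answers the whole use case in one call.
batched = ClubService(clubs)
batched_result = batched.search_clubs("club")

print(f"chatty: {chatty.calls} calls, batched: {batched.calls} call")
```

Both variants return the same data. Only the number of round trips differs, which is why # of Service Calls per use case is worth tracking as a metric.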

You measure it! from Dev (to) Ops

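Scenario: Monolithic App with 2 Key Features

Re-architecture into „Services“ + Performance Fixes

Metrics from and for Dev(to)Ops

Column groups: Use Case Tests and Monitors (Build #, Use Case, Stat) · Service & App Metrics (# API Calls, # SQL, Payload, CPU) · Ops (# ServInst, Usage, RT)

Build # | Use Case | Stat | # API Calls | # SQL | Payload | CPU | # ServInst | Usage | RT
Build 17 | testNewsAlert | OK | 1 | 5 | 2kb | 70ms | 1 | 0.5% | 7.2s
Build 17 | testSearch | OK | 1 | 40 | 5kb | 120ms | 1 | 63% | 5.2s
Build 25 | testNewsAlert | OK | 1 | 4 | 1kb | 60ms | | |
Build 25 | testSearch | OK | 34 | 171 | 104kb | 550ms | | |
Build 26 | testNewsAlert | OK | 1 | 4 | 1kb | 60ms | 1 | 0.6% | 4.2s
Build 26 | testSearch | OK | 2 | 3 | 8kb | 100ms | 5 | 75% | 2.5s
Build 35 | testNewsAlert | - | - | - | - | - | - | - | -
Build 35 | testSearch | OK | 2 | 3 | 10kb | 150ms | 8 | 80% | 2.0s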
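
Metrics like these can be compared automatically from build to build. The sketch below uses the Build 17 and Build 25 values as read from the flattened slide text. The `find_regressions` function, the dictionary layout, and the 2x growth threshold are assumptions for illustration, not a Dynatrace feature.

```python
def find_regressions(previous, current, max_growth=2.0):
    """Return (use_case, metric, old, new) for metrics that grew more than max_growth times."""
    findings = []
    for use_case, old_metrics in previous.items():
        new_metrics = current.get(use_case, {})
        for metric, old in old_metrics.items():
            new = new_metrics.get(metric)
            if new is not None and old > 0 and new / old > max_growth:
                findings.append((use_case, metric, old, new))
    return findings


build_17 = {
    "testNewsAlert": {"api_calls": 1, "sql": 5, "payload_kb": 2, "cpu_ms": 70},
    "testSearch": {"api_calls": 1, "sql": 40, "payload_kb": 5, "cpu_ms": 120},
}
build_25 = {
    "testNewsAlert": {"api_calls": 1, "sql": 4, "payload_kb": 1, "cpu_ms": 60},
    "testSearch": {"api_calls": 34, "sql": 171, "payload_kb": 104, "cpu_ms": 550},
}

findings = find_regressions(build_17, build_25)
for use_case, metric, old, new in findings:
    print(f"{use_case}: {metric} went from {old} to {new}")
```

Both functional tests still report OK in Build 25, so only a metric comparison like this flags the architectural regression in testSearch before it reaches Ops.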

https://dynatrace.github.io/ufo/

„In Your Face Data“

Questions

Slides: slideshare.net/grabnerandi

Get Tools: bit.ly/dtpersonal

YouTube Tutorials: bit.ly/dttutorials

PurePerformance Podcast: bit.ly/purepef

Contact Me: [email protected]

Follow Me: @grabnerandi

Read More: blog.dynatrace.com

Andreas Grabner

Dynatrace Developer Advocate

@grabnerandi

http://blog.dynatrace.com