An Experimental Study of
Rapidly Alternating Bottleneck
in n-Tier Applications
Qingyang Wang, Yasuhiko Kanemasa, Jack Li,
Deepal Jayasinghe, Toshihiro Shimizu, Masazumi
Matsubara, Motoyuki Kawaba, Calton Pu
Scaling Web Applications
On-Demand in Cloud
2
Good performance + Cost efficiency
BottleneckBottleneck
High throughput + low
response time
High resource
utilization
What If No Bottleneck Was Detected?
3
CPU util. 79.2%
Disk I/O 0.5%
Memory 30%
……
CPU util. 78.1%
Disk I/O 0.5%
Memory 50%
…… CPU util. 34.6%
Disk I/O 0.5%
Memory 50%
……
CPU util. 26.7%
Disk I/O 0.5%
Memory 50%
……
How to scale a web application while no bottleneck is identified?
Bad
performance
Rapidly Alternating Bottlenecks
4
Web server
App server
DB server
t0 t1 t2 t3 t
Bottleneck
Bottleneck
Bottleneck
Bottleneck
workload
Time
2. Duration of each bottleneck is short (e.g., < 100ms)
1. Throughput is limited with no saturated resources
Experimental Setup
5
RUBBoS benchmark: a bulletin board
system like Slashdot
24 web interactions
CPU intensive
Workload consists of emulated clients
Intel Xeon E5607
2 quad-core 2.26 GHz
16 GB memory
Motivational Example
6
Response time & throughput of a 3-minute benchmark
on the 4-tier application with increasing workloads.
WL 14,000
WL 14,000
Resp
on
se t
ime [
s]
Workload [# users x1000]
Th
rou
gh
pu
t [
req
/s]
Workload [# users x1000]
No Obvious Bottleneck is Detected
at WL 14,000
7
CPU util. 79.2% CPU util. 78.1%
Workload is CPU intensive
Disk I/O utilization (<5%), network I/O utilization (<
20%), Memory usage (<40%);
CPU util. 34.6% CPU util. 26.7%
CJDBC
Sources: We find that other than bursty
workload, system environmental conditions:
JVM garbage collection
VM collocation
Detection and Visualization: We implement
a fine-grained monitoring method based on
passive network tracing.
Negligible monitoring overhead for running
applications
Rapidly Alternating Bottleneck:
Sources and Detection
8
Introduction & Motivation
Detection and Visualization
Fine-grained load/throughput analysis
Two Observations of Rapidly Alternating
Bottlenecks
JVM garbage collection (JVM GC)
VM collocation
Conclusion & Future Works
Outline
9
Two Steps for Detecting Rapidly Alternating
bottlenecks
10
1. Find the participating servers that present transient
bottlenecks(e.g., 50ms)
2. Check whether the transient bottlenecks of each
participating server occur in an alternating pattern
Passive Network Tracing Infrastructure
11
Collect interaction messages in the system using
SysViz to measure fine-grained active load and
throughput on each server.
Active load: The # of concurrent requests in a server
Throughput: The # of completed requests of a server
Web AP DB
Network switch
Web server AP server DB server
01:58:20.039 01:58:20.053
01:58:20.072
01:58:20.081
01:58:20.090
01:58:20.124
01:58:20.134
01:58:20.142
01:58:20.154
01:58:20.161
01:58:20.182
01:58:20.193
SysViz box
Fine-Grained Active Load
Calculation in a Server
SysViz
monitoring for
MySQL
50ms 50ms
12
time arrival
timestamp
departure
timestamp
Saturation
area
Active load in a server
Active-Load/Throughput
Correlation Analysis
13
Serv
er
thro
ugh
put
Saturation
point N*
TPmax
Non-
Saturation
area
Active-Load/Throughput Analysis
for MySQL at WL 14,000
14
MySQL active load [#]
MyS
QL
th
rou
gh
pu
t
[req
/s]
Time [s] Time [s]
MySQL active load [#]
MySQL active load (every 50ms) MySQL throughput (every 50ms)
Acti
ve lo
ad
[#
]
Th
rou
gh
pu
t [r
eq
/s]
1
2 3
1
2
3
1
2
3
N*
Introduction & Motivation
Detection and Visualization
Fine-grained load/throughput analysis
Two Observations of Rapidly Alternating
Bottlenecks
JVM garbage collection (JVM GC)
VM collocation
Conclusion & Future Works
Outline
15
Active-Load/Throughput Analysis
at Workload 7,000
16
WL 7,000
Syst
em
Th
rou
gh
pu
t [r
eq
/s]
System is far from
saturation at WL 7,000
Workload [# users x1000]
Tomcat active load [#]
To
mcat
thro
ugh
pu
t [r
eq
/s]
MySQL active load [#]
MyS
QL
th
rou
gh
pu
t [r
eq
/s]
Point of Interest (POI)
Active-Load/Throughput Analysis
at Workload 14,000
17
Syst
em
Th
rou
gh
pu
t [r
eq
/s]
System achieves the
maximum throughput at
WL14,000
Workload [# users x1000]
WL 14,000
Tomcat active load [#]
To
mcat
thro
ugh
pu
t [r
eq
/s]
MySQL active load [#]
MyS
QL
th
rou
gh
pu
t [r
eq
/s]
POI
Timeline Analysis at Workload 14,000
18
Acti
ve lo
ad
[#
]
Th
rou
gh
pu
t [r
eq
/s]
Tomcat scatter graph
4 POI
4
Time (s)
Acti
ve lo
ad
[#
]
To
mcat
GC
run
nin
g r
ati
o [
%]
Time (s)
Timeline Analysis at Workload 14,000 (Cont.)
19
Acti
ve lo
ad
[#
]
Time (s)
Acti
ve lo
ad
[#
]
Th
rou
gh
pu
t [r
eq
/s]
Th
rou
gh
pu
t [r
eq
/s]
MySQL scatter graph
Tomcat scatter graph
Time (s)
4
4’
4
4’
POI
Correlation Analysis of Rapidly Alternating
Bottlenecks
20
To
mcat
acti
ve lo
ad
[#
]
MyS
QL
acti
ve lo
ad
[#
]
Time (s)
Correlation coefficient: -0.42, negative correlation
suggests rapidly alternating bottleneck.
Introduction & Motivation
Detection and Visualization
Fine-grained load/throughput analysis
Two Observations of Rapidly Alternating
Bottlenecks
JVM garbage collection (JVM GC)
VM collocation
Conclusion & Future Works
Outline
21
Rapidly alternating bottlenecks can cause
non-trivial performance loss in an n-tier
system.
We proposed a rapidly alternating bottleneck
detection and visualization method through
fine-grained active-load/throughput analysis
Ongoing work: more analysis of different
types of workloads and more system factors
that cause rapidly alternating bottlenecks.
Conclusion & Future Work
22
Resolving Rapidly Alternating Bottlenecks
25
MySQL active load [#]
MyS
QL
th
rou
gh
pu
t [r
eq
/s]
MySQL active load [#]
MyS
QL
th
rou
gh
pu
t [r
eq
/s]
Tomcat active load [#]
To
mcat
thro
ugh
pu
t [r
eq
/s]
To
mcat
thro
ugh
pu
t [r
eq
/s]
Tomcat active load [#]
Performance Gain After Resolving Rapidly
Alternating Bottlenecks
26
Resp
on
se t
ime[s
]
Workload [# users x1000] T
hro
ugh
pu
t [r
eq
/s]
Workload [# users x1000]