
Comprehensive Depiction of Configuration-dependent Performance Anomalies in Distributed Server Systems

Christopher Stewart, Ming Zhong, Kai Shen, and Thomas O’Neill

University of Rochester

Presented at the 2nd USENIX Workshop on Hot Topics in System Dependability


Context

- Distributed server systems; example: J2EE application servers
- Many system configurations: switches that control runtime execution
- Wide range of workload conditions: exogenous demands for system resources

Example (J2EE) runtime conditions:
- System configurations: concurrency limit, component placement
- Workload conditions: request rate


Presumptions

- Performance expectations based on knowledge of system design are reasonable
  - Lead developers: high-level algorithms
  - Administrators: day-to-day experience
- Example expectation: Little's Law
  - The average number of requests in the system equals the average arrival rate times the average time a request spends in the system (written out below)
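In standard queueing notation (our notation, not from the slides), Little's Law is

    L = \lambda \cdot W

where L is the average number of requests in the system, \lambda is the average request arrival rate, and W is the average time a request spends in the system. For example, at \lambda = 200 requests/second and W = 0.05 seconds, the expected average population is L = 200 x 0.05 = 10 requests.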


Real Performance Anomalies

[Figure: measured throughput versus component placement strategy, with an "Actual" curve plotted against the "Expectation" curve; the placements where actual throughput falls below the expectation are marked as anomalies.]

Problem Statement

- Dependable performance is important for system management: QoS scheduling, SLA negotiations
- Performance anomalies (runtime conditions in which performance falls below expectations) are not uncommon


Goals

- Previous work: anomaly characterization can aid the debugging process and guide online avoidance [AGU-SOSP99, QUI-SOSP05, CHE-NSDI04, COH-SOSP05, KEL-WORLDS05]
  - Focused on specific runtime conditions (e.g., those encountered during a particular execution)
- We wish to depict all anomalous conditions
- Comprehensive depictions can:
  - Aid the debugging of production systems before distribution
  - Enable preemptive avoidance of anomalies in live systems


Approach

Our depictions are derived in a 3-step process (a minimal sketch of the pipeline follows this slide):

1. Generate performance expectations by building a comprehensive whole-system performance model
2. Search for anomalous runtime conditions
3. Extrapolate a comprehensive anomaly depiction

Challenges:
- The model must consider a wide range of system configurations
- A systematic method is needed to determine the anomaly error threshold
- An appropriate method is needed to detect correlations between runtime conditions and anomalies
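A minimal sketch of steps 1 and 2, assuming hypothetical predict/measure callables (the interfaces are ours, not the paper's); the labeled conditions it produces feed the decision-tree step described later:

    # Sketch of steps 1-2 of the pipeline (hypothetical interfaces,
    # not the authors' implementation).
    from typing import Callable, Dict, List, Tuple

    Condition = Dict[str, float]  # configuration settings plus workload parameters

    def label_conditions(
        predict: Callable[[Condition], float],   # step 1: whole-system performance model
        measure: Callable[[Condition], float],   # observed performance of the real system
        conditions: List[Condition],
        error_threshold: float,
    ) -> List[Tuple[Condition, bool]]:
        """Flag each runtime condition whose relative expectation error
        exceeds the threshold as anomalous (step 2)."""
        labeled = []
        for cond in conditions:
            expected = predict(cond)
            actual = measure(cond)
            error = abs(actual - expected) / max(expected, 1e-9)
            labeled.append((cond, error > error_threshold))
        return labeled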


Outline

- Performance expectations for a wide range of configuration settings
- Determination of the anomaly error threshold
- Decision-tree based anomaly depiction
- Preliminary results
- Discussion / Conclusion


Comprehensive Performance Expectations

- Modeling the configuration space is hard
  - Configurations have complex effects on performance
  - Considering a wide range of configurations increases model complexity
- Our modeling methodology
  - Build performance models as a hierarchy of sub-models
  - Sub-models can be independently adjusted to consider new system configurations


Rules for Our Sub-Model Hierarchies

- The output of each sub-model is a workload property
  - Workload property: internal demands for resources (e.g., CPU consumption)
- The inputs to each sub-model are either (1) workload properties or (2) system configuration settings
- Sub-models on the highest level produce performance expectations
- Workload properties at the lowest level, the canonical workload properties, can be measured independently of system configurations


A Hierarchy of Sub-Models

- We leverage the workload properties of earlier work [STE-NSDI05]
- Advantages: sub-models have meaning
- Limitations: configuration dependencies may make sub-models complex

[Figure: hierarchy of sub-models for J2EE application servers. Canonical workload properties (component CPU usage without caching, component communication need without caching) and configuration settings (cache coherence, component placement, remote invocation method, service concurrency level) feed six sub-models: (1) average request CPU usage at each component, (2) average request communication need at each component, (3) average request CPU usage at each machine, (4) average request communication need at each machine, (5) average request response time, and (6) system throughput.]
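A minimal sketch of how two such sub-models might be composed, following the stated rules (inputs are workload properties and configuration settings; the output is a workload property or, at the top level, a performance expectation). The formulas are illustrative placeholders, not the authors' models:

    # Illustrative sub-model composition following the hierarchy rules.
    # The formulas are simplified placeholders, not the authors' models.
    from typing import Dict

    def cpu_per_machine(
        component_cpu: Dict[str, float],   # canonical workload property: CPU seconds/request per component
        placement: Dict[str, int],         # configuration: component -> machine id
    ) -> Dict[int, float]:
        """Sub-model: average request CPU usage at each machine."""
        per_machine: Dict[int, float] = {}
        for comp, cpu in component_cpu.items():
            per_machine[placement[comp]] = per_machine.get(placement[comp], 0.0) + cpu
        return per_machine

    def throughput_expectation(
        machine_cpu: Dict[int, float],     # workload property produced by the sub-model above
    ) -> float:
        """Top-level sub-model: expected throughput, limited by the busiest machine."""
        return 1.0 / max(machine_cpu.values())

    # Usage: the canonical property is measured once; placements can change freely.
    component_cpu = {"C1": 0.010, "C2": 0.030, "C3": 0.020}
    placement = {"C1": 1, "C2": 2, "C3": 2}
    expected_tput = throughput_expectation(cpu_per_machine(component_cpu, placement))  # 20 req/s

Because each sub-model depends only on its immediate inputs, considering a new configuration (for instance, a different placement) requires adjusting only the sub-model that consumes it.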


Outline

- Performance expectations for a wide range of configuration settings
- Determination of the anomaly error threshold
- Decision-tree based anomaly depiction
- Preliminary results
- Discussion / Conclusion


Determination of the Anomaly Error Threshold

- Sometimes slight discrepancies between actual and expected performance should be tolerated
- Leniency depends on the end use of the depiction
- For online avoidance: focus on error magnitude
  - Large errors may induce poor management decisions
  - Sensitivity analysis of system management functions (a sketch follows this slide)
- For debugging: focus on targeted performance bugs
  - Noisy depictions will mislead debuggers
  - Group anomalies with the same root cause
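One way such a sensitivity analysis could look, as a rough sketch under our own assumptions (a toy management function and a simple error sweep; the paper does not specify this procedure): the threshold is the largest expectation error that still leaves the management decision unchanged.

    # Illustrative sensitivity analysis for an online-avoidance threshold
    # (toy management function and procedure; not the authors' method).
    from typing import Callable

    def max_tolerable_error(
        decide: Callable[[float], str],   # management function, e.g., a QoS scheduling choice
        expected_perf: float,
        step: float = 0.01,
        limit: float = 1.0,
    ) -> float:
        """Largest relative error for which a performance shortfall of that
        size still yields the same management decision as the expectation."""
        baseline = decide(expected_perf)
        error = 0.0
        while error + step <= limit:
            if decide(expected_perf * (1.0 - (error + step))) != baseline:
                break
            error += step
        return error

    # Usage with a toy decision rule: add a server when throughput drops below 900 req/s.
    decide = lambda tput: "add_server" if tput < 900 else "steady"
    threshold = max_tolerable_error(decide, expected_perf=1000.0)   # roughly 0.10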


Anomaly Error Threshold for Debugging

- Observation: anomaly manifestations due to the same cause are more likely to share similar error magnitudes than unrelated anomaly manifestations
- Root causes can therefore be grouped by clustering on the expectation error:


Anomaly Error Threshold for Debugging

- Knee points mark cluster boundaries (a simple knee-detection sketch follows the figure)
- Knee-point selection
  - A higher-magnitude knee emphasizes large anomalies
  - A lower-magnitude knee captures multiple anomalies
- Validation: we notice that knee points disappear when problems are resolved

[Figure: expectation error clustering. Expectation error (0% to 100%) plotted against sample runtime conditions sorted on expectation error (0 to 1600), with separate curves for response time and throughput; knee points in the curves mark the cluster boundaries.]
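A crude way to locate such a knee, as an illustrative sketch (not the authors' clustering procedure): sort the expectation errors and place the threshold in the largest gap between consecutive values.

    # Illustrative knee detection on sorted expectation errors
    # (a simplification; not the authors' clustering procedure).
    from typing import List

    def knee_threshold(errors: List[float]) -> float:
        """Place the anomaly error threshold in the largest gap between
        consecutive sorted expectation errors (a crude knee point)."""
        ordered = sorted(errors)
        gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
        _, i = max(gaps)
        return (ordered[i] + ordered[i + 1]) / 2.0

    # Usage: relative expectation errors for sampled runtime conditions.
    errors = [0.02, 0.03, 0.04, 0.05, 0.31, 0.33, 0.35]
    threshold = knee_threshold(errors)   # 0.18, between the two clusters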


Outline

- Performance expectations for a wide range of configuration settings
- Determination of the anomaly error threshold
- Decision-tree based anomaly depiction
- Preliminary results
- Discussion / Conclusion


Decision Tree Based Anomaly Depictions

- Decision trees correlate anomalies to problematic runtime conditions
- Interpretable, unlike neural nets, SVMs, and perceptrons
- No prior knowledge required, unlike Bayesian trees [COH-OSDI04]
- Versatile
  - White-box usage for debugging: the tree yields hints such as "if a=0: anomaly; if a=1, b=0: normal; if a=1, b=1: anomaly"; prefer shorter, easily interpreted trees
  - Black-box usage for avoidance: query the tree with a runtime condition (a=0, b=1, c=2, ...) to predict anomaly or normal; prefer longer, more precise trees

[Figure: an example decision tree branching on runtime conditions a, b, and c, with leaves labeled "Anomaly, 80% prob.", "Normal, 70% prob.", and "Anomaly, 90% prob.".]
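A minimal sketch of learning such a depiction with scikit-learn's DecisionTreeClassifier (the library choice and the toy feature encoding are ours; the paper does not prescribe them):

    # Illustrative decision-tree depiction with scikit-learn
    # (library and feature encoding are our assumptions).
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Each row encodes a runtime condition: [concurrency_limit, placement_id, request_rate].
    conditions = [
        [10, 0, 200], [10, 1, 200], [50, 0, 400],
        [50, 1, 400], [100, 0, 800], [100, 1, 800],
    ]
    is_anomaly = [0, 1, 0, 1, 0, 1]   # labels produced by the anomaly error threshold

    # A shallow tree favors white-box (debugging) use; removing the depth
    # limit yields the longer, more precise tree preferred for avoidance.
    tree = DecisionTreeClassifier(max_depth=2).fit(conditions, is_anomaly)
    print(export_text(tree, feature_names=["concurrency_limit", "placement_id", "request_rate"]))

    # Black-box query: is a new runtime condition predicted to be anomalous?
    print(tree.predict([[50, 1, 600]]))   # -> [1], i.e., anomaly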


Design Recap

We wish to depict performance anomalies across a wide range of system configurations and workload conditions:

1. Derive performance expectations via a hierarchy of sub-models
2. Search for anomalous runtime conditions with a carefully selected anomaly error threshold
3. Use decision trees to extrapolate a comprehensive anomaly depiction


Outline

- Performance expectations for a wide range of configuration settings
- Determination of the anomaly error threshold
- Decision-tree based anomaly depiction
- Preliminary results
- Discussion / Conclusion


Depiction Assisted Debugging

- System: JBoss
- 8 runtime conditions (including application type); 4-machine cluster, 2.66 GHz CPUs
- Found and fixed 3 performance anomalies; one is shown in detail below

[Figure: depiction of a real performance anomaly. The decision tree splits on application type (container-managed persistence, CMP) and then on component placement strategy; the four placements shown, assigning components 1-5 to nodes 1-3 ({2}/{1,3,5}/{4}, {4,5}/{1,2,3}/none, none/{1,2,4}/{3,5}, and {5}/{1,2,4}/{3}), are labeled 79%, 68%, 87%, and 88% anomalous.]

The anomaly: a misunderstood J2EE configuration which manifests when multiple components are placed on node 2.


Discovered Anomalies

1. Misunderstood J2EE configuration caused remote invocations to unintentionally execute locally

2. A mishandled out-of-memory error under high concurrency caused the Tomcat 5.0 servlet container to drop requests

3. Circular dependency in the component invocation sequences caused connection timeouts under certain component placement strategies


Outline

- Performance expectations for a wide range of configuration settings
- Determination of the anomaly error threshold
- Decision-tree based anomaly depiction
- Preliminary results
- Discussion / Conclusion


Discussion

Limitations:
- Cannot detect non-deterministic anomalies
- Is it model inaccuracy or a performance anomaly? Requires manual investigation, but the model is much less complex than the system
- Debugging is still a manual process

Future work:
- Short term: investigate more system configurations
- Short term: depict anomalies in more systems
- Long term: more systematic depiction-assisted debugging methods


Take Away

- Comprehensive depictions of performance anomalies across a wide range of runtime conditions can aid debugging and avoidance
- We have designed and implemented an approach to:
  - Model a wide range of system configurations
  - Determine anomalous conditions
  - Depict the anomalies in an easy-to-interpret fashion
- We have already used our approach to find 3 performance bugs
