Automated Detection of Performance Regressions Using Regression Models on Clustered Performance Counters

Weiyi Shang, Ahmed E. Hassan (Queen’s University, Kingston, Canada); Mohamed Nasser, Parminder Flora (BlackBerry, Waterloo, Canada)

Page 1: Icpe2015 weiyi shang (1)

Automated Detection of Performance Regressions Using Regression Models on Clustered Performance Counters

Weiyi Shang, Ahmed E. Hassan
Queen’s University, Kingston, Canada

Mohamed Nasser, Parminder Flora
BlackBerry, Waterloo, Canada

Page 2: Icpe2015 weiyi shang (1)

What is a performance regression?

[Diagram: old version compared with new version]

Does the new version have worse performance than the old version?

Page 3: Icpe2015 weiyi shang (1)

How to detect performance regression?

[Diagram: requests are sent to the old version and the new version, and performance counters are collected from each]

Page 4: Icpe2015 weiyi shang (1)

Large software systems generate a large number of performance counters

Page 5: Icpe2015 weiyi shang (1)

Performance engineers pick counters to compare using a T-test

Are these counters significantly different between the old and new versions?

Picking the wrong counters will fail to detect a performance regression!
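The per-counter comparison described on this slide can be sketched as follows. This is a minimal illustration, not the tooling used in the study: it assumes Welch's unequal-variance t statistic and, to stay within the Python standard library, approximates the p-value with a normal distribution, which is only reasonable for large samples. The counter values and the 0.05 significance level are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(old, new):
    """Return (t, approximate two-sided p-value) for two samples of one counter."""
    standard_error = sqrt(variance(old) / len(old) + variance(new) / len(new))
    t = (mean(new) - mean(old)) / standard_error
    # Normal approximation of the t distribution (assumption: large samples).
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical CPU utilisation samples (%): the new version uses more CPU.
cpu_old = [40 + (i % 5) for i in range(50)]
cpu_new = [46 + (i % 5) for i in range(50)]
# Hypothetical memory samples (MB): practically unchanged between versions.
mem_old = [512 + (i % 7) for i in range(50)]
mem_new = [512 + ((i + 3) % 7) for i in range(50)]

ALPHA = 0.05  # assumed significance level
results = {}
for name, old, new in [("cpu", cpu_old, cpu_new), ("memory", mem_old, mem_new)]:
    t, p = welch_t_test(old, new)
    results[name] = p < ALPHA
    print(f"{name}: t={t:.2f}, p={p:.4f}, flagged={results[name]}")
```

In this made-up data only the CPU counter is flagged, so an engineer who chose to compare memory alone would miss the regression.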

Page 6: Icpe2015 weiyi shang (1)

Performance engineers build a model using all counters

Target counter: CPU

Independent counters: Memory, I/O read, I/O write

Page 7: Icpe2015 weiyi shang (1)

Model-based performance regression detection

[Diagram: using counters from the old version and the new version: select target counter → target counter → build model → model → verify model → prediction error]

Target counter is selected by experience!

Page 8: Icpe2015 weiyi shang (1)

Selecting the wrong target counter will fail to detect performance regressions

Model: CPU (target counter) predicted from Memory, I/O read, and I/O write

Memory has a low correlation with CPU

I/O write has a high correlation with CPU

Memory-related performance regressions will not be detected by this model!
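A small sketch of why a CPU-targeted model can stay silent about a memory regression. It is a deliberate simplification: memory is left out of the model altogether, as an extreme stand-in for a counter with low correlation to CPU, and CPU is predicted from I/O write alone with a one-variable least-squares fit. All counter values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_relative_error(slope, intercept, x, y):
    """Mean of |predicted - actual| / actual over all samples."""
    return sum(abs(slope * a + intercept - b) / b for a, b in zip(x, y)) / len(x)


# Hypothetical old-version counters: CPU tracks I/O write closely.
io_write_old = [10 + (i % 10) for i in range(40)]
cpu_old = [20 + 2 * w + (i % 3) for i, w in enumerate(io_write_old)]
memory_old = [500 + (i % 4) for i in range(40)]

# Hypothetical new version: memory doubles, CPU and I/O write are unchanged.
io_write_new = list(io_write_old)
cpu_new = list(cpu_old)
memory_new = [2 * m for m in memory_old]

slope, intercept = fit_line(io_write_old, cpu_old)
cpu_model_error = mean_relative_error(slope, intercept, io_write_new, cpu_new)
memory_growth = sum(memory_new) / sum(memory_old)

print(f"CPU model error on new version: {cpu_model_error:.1%}")
print(f"Memory growth: {memory_growth:.1f}x")
```

The prediction error of the CPU model stays small even though memory usage has doubled, which is the failure mode the slide warns about.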

Page 9: Icpe2015 weiyi shang (1)

Our approach

[Diagram: reduce counters → reduced counters → cluster counters → clusters of counters. For each cluster: select target counter → target counter → build model → model → verify model → prediction error, using counters from the old version and the new version]

Page 10: Icpe2015 weiyi shang (1)

Step 1: Reduce counters

Removing zero-variance counters
Example: Available CPU cores: 4, 4, 4, …, 4, 4

Removing redundant counters
Example: I/O write byte/sec = a * I/O write op/sec + b * CPU
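The two reduction rules can be sketched as below. Two things are assumptions rather than the authors' procedure: redundancy is approximated by a pairwise Pearson correlation check (the slide's example expresses a counter as a linear combination of several others, which this simplification does not cover), and the 0.95 threshold is arbitrary. The sample values are invented; only the counter names come from the slide.

```python
from math import sqrt
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def reduce_counters(counters, redundancy_threshold=0.95):
    """Drop zero-variance counters, then counters redundant with a kept one."""
    # Rule 1: a counter that never changes carries no information.
    varying = {name: v for name, v in counters.items() if pvariance(v) > 0}
    # Rule 2 (simplified): drop a counter that is almost a linear function
    # of a counter that has already been kept.
    kept = {}
    for name, values in varying.items():
        redundant = any(
            abs(pearson(values, other)) >= redundancy_threshold
            for other in kept.values()
        )
        if not redundant:
            kept[name] = values
    return kept


# Hypothetical samples for the counters named on the slide.
counters = {
    "Available CPU cores": [4] * 30,
    "CPU": [30 + (7 * i) % 13 for i in range(30)],
    "I/O write op/sec": [100 + (5 * i) % 17 for i in range(30)],
}
# Bytes written are an exact multiple of write operations in this toy data.
counters["I/O write byte/sec"] = [4096 * ops for ops in counters["I/O write op/sec"]]

reduced = reduce_counters(counters)
print(sorted(reduced))  # prints ['CPU', 'I/O write op/sec']
```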

Page 11: Icpe2015 weiyi shang (1)

Step 2: Cluster counters

Hierarchical clustering → dendrogram cutting

[Diagram: two dendrograms over CPU, I/O, and Memory counters, one for hierarchical clustering and one for dendrogram cutting]
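A compact sketch of hierarchical clustering followed by a dendrogram cut. The slide does not state the distance measure, the linkage, or how the cut height is chosen, so the choices here (distance 1 - |Pearson correlation|, average linkage, a fixed cut height of 0.5) are assumptions, and the counter names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cluster_counters(counters, cut_height=0.5):
    """Agglomerative clustering that stops merging at the cut height."""
    names = list(counters)
    dist = {
        frozenset((a, b)): 1.0 - abs(pearson(counters[a], counters[b]))
        for a, b in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > cut_height:
            break  # further merges would sit above the cut in the dendrogram
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical counters built from three unrelated base signals.
cpu_signal = [(7 * i) % 13 for i in range(30)]
io_signal = [(5 * i) % 17 for i in range(30)]
mem_signal = [(3 * i) % 11 for i in range(30)]
counters = {
    "cpu_user": [30 + x for x in cpu_signal],
    "cpu_system": [5 + 0.5 * x for x in cpu_signal],
    "io_read": [100 + y for y in io_signal],
    "io_write": [50 + 2 * y for y in io_signal],
    "memory": [500 + z for z in mem_signal],
}

clusters = cluster_counters(counters)
print(clusters)
```

With this toy data the two CPU counters end up together, the two I/O counters end up together, and memory stays in a cluster of its own.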

Page 12: Icpe2015 weiyi shang (1)

Step 3: Select target counter

[Diagram: CPU from old version and CPU from new version → Kolmogorov–Smirnov test → p-value]

We select the counter whose difference between the two versions is significant with the highest certainty, that is, the counter with the smallest p-value.
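The selection rule can be sketched with a two-sample Kolmogorov–Smirnov test written against the standard library. The p-value uses the asymptotic Kolmogorov series, which is an approximation (a statistics package would normally be used instead), and the helper names and counter values are hypothetical.

```python
from bisect import bisect_right
from math import exp, sqrt


def ks_two_sample(x, y):
    """Return (D, approximate p-value) of the two-sample KS test."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # D is the largest gap between the two empirical distribution functions.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m) for v in xs + ys)
    lam = d * sqrt(n * m / (n + m))
    if lam < 0.1:
        return d, 1.0  # the series below does not converge for tiny lam
    series = sum((-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * series))


def select_target_counter(old, new):
    """Pick the counter with the smallest KS p-value between versions."""
    p_values = {name: ks_two_sample(old[name], new[name])[1] for name in old}
    return min(p_values, key=p_values.get), p_values


# Hypothetical counters of one cluster, measured on both versions.
base = [(5 * i) % 17 for i in range(30)]
old = {"io_read": [100 + b for b in base], "io_write": [50 + 2 * b for b in base]}
new = {
    "io_read": [v + 1 for v in old["io_read"]],     # barely changed
    "io_write": [v + 20 for v in old["io_write"]],  # clearly shifted
}

target, p_values = select_target_counter(old, new)
print(target, p_values)
```

Here `io_write` is selected, because its distribution differs most clearly between the two versions.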

Page 13: Icpe2015 weiyi shang (1)

Step 3: Build model

We build a linear regression model.

CPU = a * Memory + b * I/O read + c * I/O write
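A minimal sketch of fitting such a model by ordinary least squares, using the normal equations and Gaussian elimination so that no third-party library is needed. It also fits an intercept, which the slide does not show; that and all sample values are assumptions made for illustration.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_model(target, independents):
    """Return [intercept, a, b, ...] minimising the squared prediction error."""
    rows = [[1.0, *values] for values in zip(*independents)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical old-version samples of the independent counters.
memory = [50 + (3 * i) % 11 for i in range(30)]
io_read = [10 + (5 * i) % 17 for i in range(30)]
io_write = [5 + (7 * i) % 13 for i in range(30)]
# Toy target counter generated from known coefficients, without noise.
cpu = [2.0 + 0.1 * m + 0.5 * r + 1.5 * w for m, r, w in zip(memory, io_read, io_write)]

coefficients = fit_linear_model(cpu, [memory, io_read, io_write])
print([round(c, 3) for c in coefficients])
```

Because the toy target is an exact linear function of the other counters, the fit recovers the generating coefficients up to rounding.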

Page 14: Icpe2015 weiyi shang (1)

Step 4: Verify model

[Diagram: for each cluster (cluster 1, …), the new-version counters left in the cluster are passed to that cluster's linear regression model to calculate the prediction error]

Mean prediction error < threshold: no performance regression detected

Mean prediction error > threshold: performance regression detected
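The verification step can be sketched as follows. Several details are assumptions: the prediction error is taken as the mean relative absolute error, a regression is reported when any cluster's error exceeds the threshold, and the 15% threshold, model coefficients, and counter values are invented.

```python
def predict(coefficients, row):
    """coefficients = [intercept, a, b, ...]; row = independent counter values."""
    intercept, *weights = coefficients
    return intercept + sum(w * v for w, v in zip(weights, row))


def mean_prediction_error(coefficients, target, independents):
    """Mean of |predicted - actual| / |actual| over all samples (assumed metric)."""
    rows = list(zip(*independents))
    errors = [abs(predict(coefficients, r) - t) / abs(t) for r, t in zip(rows, target)]
    return sum(errors) / len(errors)


def verify_models(models, new_version, threshold):
    """Return per-cluster errors and whether any of them exceeds the threshold."""
    errors = {
        cluster: mean_prediction_error(coeffs, *new_version[cluster])
        for cluster, coeffs in models.items()
    }
    return errors, any(e > threshold for e in errors.values())


# Hypothetical models built on the old version, one per cluster.
models = {
    "cluster 1": [2.0, 0.5],    # cpu    = 2.0 + 0.5 * io_write
    "cluster 2": [100.0, 2.0],  # memory = 100.0 + 2.0 * threads
}

# Hypothetical new-version counters: (target, [independent counters]).
io_write = [10 + i % 5 for i in range(20)]
threads = [8 + i % 3 for i in range(20)]
new_version = {
    "cluster 1": ([2.0 + 0.5 * v for v in io_write], [io_write]),            # unchanged
    "cluster 2": ([1.5 * (100.0 + 2.0 * t) for t in threads], [threads]),    # 50% more memory
}

errors, regression = verify_models(models, new_version, threshold=0.15)
print(errors, regression)
```

In this toy run the first cluster's model still predicts its target exactly, while the second cluster's error is about one third, so a regression is reported.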

Page 15: Icpe2015 weiyi shang (1)

Case study: subject systems

Subject systems: DELL DVD Store, Enterprise Application

Performance regressions studied: CPU overhead, memory overhead, I/O overhead, removing text index, removing column index, real-life performance regression

Page 16: Icpe2015 weiyi shang (1)

Accuracy: Can our approach detect performance regressions?

Applicability: How many target counters does our approach pick?

Comparison: Is our approach better than traditional approaches?

Page 17: Icpe2015 weiyi shang (1)

Our approach picks a small number of target counters

DELL DVD Store

Our approach picks 2 to 4 target counters.

Performance engineers cannot pick target counters based on experience

Picked target counters are different across runs of our approach.

Page 18: Icpe2015 weiyi shang (1)

Our approach picks a small number of target counters.

Page 20: Icpe2015 weiyi shang (1)

Our approach can detect performance regressions

DELL DVD Store
Max prediction error without regression: 4% to 11%
Max prediction error with regression: 24% to 1622%

Our approach is not heavily impacted by the choice of threshold value.

Without performance regression
Max prediction error: 3%, 9%

With performance regression
Max prediction error: 2%, 916%

Page 21: Icpe2015 weiyi shang (1)

Our approach can detect performance regressions and is not heavily impacted by threshold value.

Page 23: Icpe2015 weiyi shang (1)

Our approach outperforms picking one target counter to build a model

DELL DVD Store
Without performance regression: 0%
With performance regression: 0%

Building one model using memory as the target counter may fail to detect performance regressions

Page 24: Icpe2015 weiyi shang (1)

T-test does not perform well in detecting performance regressions.

DELL DVD Store, without performance regression: 3 to 6 counters are flagged

Enterprise Application, without performance regression: 32% of counters are flagged

A large number of counters show significant differences in the T-test results even though no regressions exist.

Page 25: Icpe2015 weiyi shang (1)

Accuracy: Can our approach detect performance regressions?
Our approach can detect performance regressions and is not heavily impacted by the threshold value.

Applicability: How many target counters does our approach pick?
Our approach picks a small number of target counters.

Comparison: Is our approach better than traditional approaches?
Our approach outperforms traditional approaches.
