21
1 © Copyright 2012 EMC Corporation. All rights reserved. Testing the Accuracy of Query Optimizers Zhongxian Gu UC Davis Mohamed A. Soliman Greenplum/EMC Florian M. Waas Greenplum/EMC

Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

1© Copyright 2012 EMC Corporation. All rights reserved.

Testing the Accuracy of

Query Optimizers

Zhongxian Gu UC Davis

Mohamed A. Soliman Greenplum/EMC

Florian M. Waas Greenplum/EMC

Page 2: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

2© Copyright 2012 EMC Corporation. All rights reserved.

Evaluating Query Optimizer

Expressiveness

Efficiency

Accuracy

Page 3: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

3© Copyright 2012 EMC Corporation. All rights reserved.

Evaluating Query Optimizer

How can we measure the accuracy of a

commercial query optimizer?

Page 4: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

4© Copyright 2012 EMC Corporation. All rights reserved.

Page 5: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

5© Copyright 2012 EMC Corporation. All rights reserved.

Quality Assurance Challenge

• Functional testing– Correctness tests

– Testing for crashes/asserts, internal errors

• Performance/Optimality– Typically eye-balled by expert for „suspicious‟ plans

– Tested with trivial examples where optimal plan is known

Optimizers are hard to test;

generally under-tested;

Page 6: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

6© Copyright 2012 EMC Corporation. All rights reserved.

Example

• Commercial DB Vendor

• Regular regressions– 400,000 test cases from real-world app stack

– Used for both functional and performance testing

• Major optimizer extension/rework done

• Same regression test used– Ensures same functionality

– Used to demonstrate significant performance improvements

Tests did not reveal sub-optimality for years!

Page 7: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

7© Copyright 2012 EMC Corporation. All rights reserved.

Common Approaches

Approach Pros Cons

Eye-balling • Intuitive

• Suitable for

troubleshooting

• Manual

• Cannot be automated

• Not exhaustive

“Freezing” of plans • Finds all possible

regressions

• Requires manual

verification of changes

• Not scalable

• Hinders innovation in the

long-term

Large test suites • May represent

customer workloads

• Large coverage

• Merely cement current

behavior

• Do not test for optimality

Randomize tests • Test corner cases • Test “plumbing” only

• Do not test for optimality

• Do not test for correctness

Page 8: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

8© Copyright 2012 EMC Corporation. All rights reserved.

Measuring Optimizer Accuracy

Plan search space

p5

p1

p6

p2

p4

p3

p7

23

14

34

112

200

147

9

33

3

20

43

111

187

17

Given two plan alternatives, can the optimizer distinguish

them based on cost?

Page 9: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

10© Copyright 2012 EMC Corporation. All rights reserved.

Measuring Optimizer Accuracy

• Compute a correlation score between the plan ranking

based on estimated cost and that of actual cost– Plan discordance

– Plan relevance

– Pairwise distance

Estimated Cost

p1

p2

p3

p4

Actual Cost

p2

p1

p4

p3

Page 10: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

11© Copyright 2012 EMC Corporation. All rights reserved.

Generating Plan Sample

• Commercial optimizers provide switches to influence plan

choice for a given query– Example: enable/disable Hash Join alternative

– Usually achieved by issuing simple commands, or by incorporating

special constructs into query text

• The existence of optimizer switches allows forcing a number

of plan alternatives

Page 11: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

12© Copyright 2012 EMC Corporation. All rights reserved.

TAQO

JDBC Interface

DBnDB1 . . .

Configuration

Generator

Execution

Tracker

Plan

Deduplicator

Ranker

Input:

• Workload

• DB info.

• Opt switches

TAQO

Output:

• Accuracy

estimate

• Test report

• Plan distribution

<? xml ?>

Page 12: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

13© Copyright 2012 EMC Corporation. All rights reserved.

Evaluation

• 4 commercial optimizers

• TPC-H

• Compare results– Accuracy: Ranking of estimated vs. actual

– Optimality: default plan is optimal

• The tests were automatically conducted using TAQO

Page 13: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

14© Copyright 2012 EMC Corporation. All rights reserved.

Plan Samples

Page 14: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

15© Copyright 2012 EMC Corporation. All rights reserved.

Optimizers Comparison

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

q02 q03 q04 q05 q07 q08 q09 q10 q12 q13 q14 q19 q20 q21

Opt-A Opt-B Opt-C Opt-D

Correlation scores for different TPC-H queries. Low score indicates high accuracy

Page 15: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

16© Copyright 2012 EMC Corporation. All rights reserved.

Accuracy vs. Optimality

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.6 0.7 0.8 0.9 1

Op

tim

alit

y

Accuracy

Opt-A

Opt-B

Opt-C

Opt-D

Page 16: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

17© Copyright 2012 EMC Corporation. All rights reserved.

Summary

• TAQO: a new framework for testing the accuracy of query

optimizers

• Leveraging rank correlation metrics for evaluating optimizer‟s

accuracy

• Evaluation of major commercial query optimizers suggest that

the abilities of different optimizers differ widely

• TAQO is part of Greenplum‟s development and test cycles

Page 17: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

18© Copyright 2012 EMC Corporation. All rights reserved.

Page 18: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

19© Copyright 2012 EMC Corporation. All rights reserved.

Rank Correlation Score

• Given a sample of plans, compute a rank correlation score based on the

rank of estimate costs and actual costs

• Encompass quantitative value– Discordance of plan pairs

– Relevance of plan

– Pairwise of distance

• Rank correlation score:

Page 19: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

20© Copyright 2012 EMC Corporation. All rights reserved.

Normalization Across Systems

• Consider the actual number of plans

• Apply k-medoids algorithm to find the k representatives

– Robust to noise and outliers

Page 20: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

21© Copyright 2012 EMC Corporation. All rights reserved.

In-depth Analysis

Plan distribution plots for Query-14

Page 21: Testing the Accuracy of Query Optimizersdbtest2012.comp.polyu.edu.hk/ppt/76-TAQO_2.pdf · Title: TITLE 44 POINT META NORMAL LF ALL CAPS Author: Gautam Created Date: 6/18/2012 11:10:27

22© Copyright 2012 EMC Corporation. All rights reserved.

TAQO- Testing Accuracy of Query Optimizers

• How can we measure the accuracy of an optimizer?

• How can we compare among different optimizers?

• TAQO provides an automatic and general approach to

accomplish the job

• Evaluation on commercial database optimizers