TPC-H Studies Joe Chang jchang6@yahoo.com www.qdpma.com


About Joe Chang

SQL Server Execution Plan Cost Model

True cost structure by system architecture

Decoding statblob (distribution statistics)

SQL Clone – statistics-only database

Tools

ExecStats – cross-reference index use by SQL-execution plan

Performance Monitoring, Profiler/Trace aggregation

TPC-H

DSS – 22 queries, geometric mean

60X range in plan cost, comparable actual range

Power – single stream

Tests ability to scale parallel execution plans

Throughput – multiple streams

Scale Factor 1 – Line item data is 1GB

875MB with DATE instead of DATETIME

Only single column indexes allowed, Ad-hoc

SF 10, test studies

Not valid for publication

Auto-Statistics enabled, Excludes compile time

Big Queries – Line Item Scan

Super Scaling – Mission Impossible

Small Queries & High Parallelism

Other queries, negative scaling

Did not apply T2301, or disallow page locks

Big Q: Plan Cost vs Actual

[Chart: Plan Cost @ 10GB for Q1, Q9, Q13, Q18, Q21 at DOP 1, 2, 4, 8, 16]

Plan Cost reduction from DOP 1 to 16/32: Q1 28%, Q9 44%, Q18 70%, Q21 20%

Plan Cost says scaling is poor except for Q18, memory affects Hash IO onset

[Chart: Actual Query time in seconds for Q1, Q9, Q13, Q18, Q21 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

Plan Cost is poor indicator of true parallelism scaling

Q18 & Q21 > 3X Q1, Q9

Big Query: Speed Up and CPU

[Chart: Speed up relative to DOP 1 for Q1, Q9, Q13, Q18, Q21 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

[Chart: CPU time in seconds for Q1, Q9, Q13, Q18, Q21 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

Chart annotation: Holy Grail

Q13 has slightly better than perfect scaling?

In general, excellent scaling to DOP 8-24, weak afterwards

Super Scaling

Suppose at DOP 1, a query runs for 100 seconds, with one CPU fully pegged

CPU time = 100 sec, elapsed time = 100 sec

What is best case for DOP 2? Assuming nearly zero Repartition Streams cost

CPU time = 100 sec, elapsed time = 50?

Super Scaling: CPU time decreases going from Non-Parallel to Parallel plan!

No, I have not started drinking, yet
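The DOP 1 versus DOP 2 reasoning above can be put into a small calculation. This is only a sketch: the function name `scaling_metrics` and the "lucky" numbers are made up for illustration and are not measurements from these tests. Speedup is elapsed time at DOP 1 divided by elapsed time at the higher DOP; super scaling is flagged when total CPU time at the higher DOP is below the DOP 1 CPU time.

```python
def scaling_metrics(elapsed_dop1, cpu_dop1, elapsed_dopn, cpu_dopn, dop):
    """Summarize how a query scaled from DOP 1 to a higher DOP."""
    speedup = elapsed_dop1 / elapsed_dopn   # > 1 means faster than DOP 1
    cpu_ratio = cpu_dopn / cpu_dop1         # total CPU normalized to DOP 1
    return {
        "speedup": speedup,
        "efficiency": speedup / dop,        # 1.0 would be perfect scaling
        "cpu_ratio": cpu_ratio,
        "super_scaling": cpu_ratio < 1.0,   # parallel plan used less CPU
    }

# Best case from the slide: CPU stays at 100 sec, elapsed time halves.
ideal = scaling_metrics(100.0, 100.0, 50.0, 100.0, dop=2)

# Hypothetical super-scaling run: the parallel plan needs less total CPU.
lucky = scaling_metrics(100.0, 100.0, 30.0, 60.0, dop=2)

print(ideal)
print(lucky)
```

With these definitions the "best case" is speedup 2.0 at efficiency 1.0. Anything with a CPU ratio below 1.0 is the super-scaling case.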

Super Scaling

[Chart: CPU normalized to DOP 1 for Q7, Q8, Q11, Q21, Q22 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

CPU-sec goes down from DOP 1 to 2 and higher (typically 8)

[Chart: Speed up relative to DOP 1 for Q7, Q8, Q11, Q21, Q22 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

3.5X speedup from DOP 1 to 2 (Normalized to DOP 1)

CPU and Query time in seconds

[Chart: CPU time for Q7, Q8, Q11, Q21, Q22 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

[Chart: Query time for Q7, Q8, Q11, Q21, Q22 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

Super Scaling Summary

Most probable cause: Bitmap Operator in Parallel Plan

Bitmap Filters are great, Question for Microsoft:

Can I use Bitmap Filters in OLTP systems with non-parallel plans?

Small Queries – Plan Cost vs Actual

Query 3 and 16 have lower plan cost than Q17, but not included

[Chart: Plan Cost for Q2, Q4, Q6, Q15, Q17, Q20 at DOP 1, 2, 4, 8, 16]

[Chart: Query time for Q2, Q4, Q6, Q15, Q17, Q20 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

Q4, Q6, Q17 great scaling to DOP 4, then weak

Negative scaling also occurs

Small Queries CPU & Speedup

What did I get for all that extra CPU?

Interpretation: sharp jump in CPU means poor scaling, disproportionate means negative scaling

[Chart: CPU time for Q2, Q4, Q6, Q15, Q17, Q20 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

[Chart: Speed up for Q2, Q4, Q6, Q15, Q17, Q20 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

Query 2 negative at DOP 2, Q4 is good, Q6 gets speedup, but at CPU premium, Q17 and Q20 negative after DOP 8

High Parallelism – Small Queries

Why? Almost No value

TPC-H geometric mean scoring

Small queries have as much impact as large

Linear sum of weights large queries

OLTP with 32, 64+ cores

Parallelism good if super-scaling

Default max degree of parallelism 0

Seriously bad news, especially for small Q

Increase cost threshold for parallelism?

Sometimes you do get lucky
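Why the geometric mean makes small queries count as much as large ones can be seen with a few numbers. The sketch below is a simplification (the real TPC-H power metric also includes refresh functions), and the query times are hypothetical, chosen only to show the effect. Halving any one query changes the geometric mean by the same factor, whether that query is big or small; an arithmetic mean (a linear sum) is dominated by the big query.

```python
import math

def geometric_mean(values):
    """Geometric mean via logs, as used in geometric-mean scoring."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arithmetic_mean(values):
    """Linear average: large values dominate."""
    return sum(values) / len(values)

# Hypothetical query times in seconds: one big query, three small ones.
baseline = [60.0, 1.0, 1.0, 1.0]
big_halved = [30.0, 1.0, 1.0, 1.0]     # big query made 2X faster
small_halved = [60.0, 0.5, 1.0, 1.0]   # one small query made 2X faster

for name, times in [("baseline", baseline),
                    ("big halved", big_halved),
                    ("small halved", small_halved)]:
    print(name, round(geometric_mean(times), 4), arithmetic_mean(times))
```

Both halved cases give the same geometric mean, so chasing a 2X gain on a one-second query scores exactly like a 2X gain on a one-minute query.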

Queries that go Negative

[Chart: Query time for Q17, Q19, Q20, Q22 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

[Chart: “Speedup” for Q17, Q19, Q20, Q22 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

CPU

[Chart: CPU for Q17, Q19, Q20, Q22 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

Other Queries – CPU & Speedup

[Chart: CPU time for Q3, Q5, Q10, Q12, Q14, Q16 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

[Chart: Speedup for Q3, Q5, Q10, Q12, Q14, Q16 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

Q3 has problems beyond DOP 2

Other - Query Time seconds

[Chart: Query time for Q3, Q5, Q10, Q12, Q14, Q16 at DOP 1, 2, 4, 8, 16, 24, 30, 32]

Scaling Summary

Some queries show excellent scaling

Super-scaling, better than 2X

Sharp CPU jump on last DOP doubling

Need strategy to cap DOP

To limit negative scaling

Especially for some smaller queries?

Other anomalies
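One possible way to turn "cap DOP" into a rule is sketched below. It is an illustration only, not something the tests above implemented: the function `pick_dop_cap`, the 1.2 threshold and the timings are all assumptions. The idea is to walk the measured DOPs upward and stop at the first step that no longer buys a meaningful drop in elapsed time.

```python
def pick_dop_cap(elapsed_by_dop, min_gain=1.2):
    """Return the largest DOP worth using for one query.

    elapsed_by_dop maps DOP -> measured elapsed seconds.
    min_gain is the minimum elapsed-time ratio (previous / next) required
    to accept the next DOP; 1.2 means it must be at least 20% faster.
    """
    dops = sorted(elapsed_by_dop)
    cap = dops[0]
    for prev, nxt in zip(dops, dops[1:]):
        gain = elapsed_by_dop[prev] / elapsed_by_dop[nxt]
        if gain < min_gain:
            break           # weak or negative scaling from here on
        cap = nxt
    return cap

# Hypothetical measurements (seconds), not read off the charts.
small_query = {1: 2.0, 2: 1.1, 4: 0.6, 8: 0.55, 16: 0.7, 32: 0.9}
big_query = {1: 64.0, 2: 33.0, 4: 17.0, 8: 9.0, 16: 5.0, 32: 4.5}

print(pick_dop_cap(small_query))
print(pick_dop_cap(big_query))
```

With these made-up timings the small query is capped at DOP 4 and the big one at DOP 16, which matches the general pattern in the summary: small queries stop benefiting early, big ones keep scaling longer.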

Compression

PAGE

Compression Overhead - Overall

40% overhead for compression at low DOP, 10% overhead at max DOP???

[Chart: Query time compressed relative to uncompressed at DOP 1, 2, 4, 8, 16, 24, 30, 32]

[Chart: CPU time compressed relative to uncompressed at DOP 1, 2, 4, 8, 16, 24, 30, 32]

[Chart: Query time compressed relative to uncompressed for Q1 through Q22 at DOP 1, 2, 4, 8, 16, 24, 32]

[Chart: CPU time compressed relative to uncompressed for Q1 through Q22 at DOP 1, 2, 4, 8, 16, 24, 32]

Compressed Table

LINEITEM – real data may be more compressible

Uncompressed: 8,749,760KB, Average Bytes per row: 149

Compressed: 4,819,592KB, Average Bytes per row: 82
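The two pairs of numbers above can be cross-checked with simple arithmetic. The sketch below assumes that average bytes per row is roughly total table size divided by row count, which ignores page and allocation overhead, so the row estimate is approximate.

```python
KB = 1024

uncompressed_kb = 8_749_760
compressed_kb = 4_819_592
uncompressed_row_bytes = 149
compressed_row_bytes = 82

# Ratio of compressed to uncompressed, computed two independent ways.
size_ratio = compressed_kb / uncompressed_kb
row_ratio = compressed_row_bytes / uncompressed_row_bytes

# Rough row count implied by the uncompressed figures (assumption:
# table size is approximately rows * average bytes per row).
approx_rows = uncompressed_kb * KB / uncompressed_row_bytes

print(f"size ratio: {size_ratio:.3f}")
print(f"row ratio:  {row_ratio:.3f}")
print(f"approx rows: {approx_rows:,.0f}")
```

Both ratios come out at about 0.55, so compression saves roughly 45% of the space, and the implied row count is about 60 million, which is in line with Line Item at SF 10.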

Partitioning

Orders and Line Item on Order Key

Partitioning Impact - Overall

[Chart: Query time partitioned relative to not partitioned at DOP 1, 2, 4, 8, 16, 24, 30, 32]

[Chart: CPU time partitioned relative to not partitioned at DOP 1, 2, 4, 8, 16, 24, 30, 32]

[Chart: Query time partitioned relative to not partitioned for Q1 through Q22 at DOP 1, 2, 4, 8, 16, 24, 32]

[Chart: CPU time partitioned relative to not partitioned for Q1 through Q22 at DOP 1, 2, 4, 8, 16, 24, 32]

Plan for Partitioned Tables

Scaling DW Summary

Massive IO bandwidth

Parallel options for data load, updates etc

Investigate Parallel Execution Plans

Scaling from DOP 1, 2, 4, 8, 16, 32 etc

Scaling with and w/o HT

Strategy for limiting DOP with multiple users

Fixes from Microsoft Needed

Contention issues in parallel execution

Table scan, Nested Loops

Better plan cost model for scaling

Back-off on parallelism if gain is negligible

Fix throughput degradation with multiple users running big DW queries

Sybase and Oracle, Throughput is close to Power or better

Query Plans

Big Queries

Q1 Pricing Summary Report

Q1 Plan

Non-Parallel

Parallel

Parallel plan cost is 28% lower than non-parallel; IO is 70% of plan cost and gets no parallel plan cost reduction

Q9 Product Type Profit Measure

IO from 4 tables contribute 58% of plan cost, parallel plan is 39% lower

Non-Parallel Parallel

Q9 Non-Parallel Plan

Table/Index Scans comprise 64%, IO from 4 tables contribute 58% of plan cost

Join sequence: Supplier, (Part, PartSupp), Line Item, Orders

Q9 Parallel Plan

Non-Parallel: (Supplier), (Part, PartSupp), Line Item, Orders

Parallel: Nation, Supplier, (Part, Line Item), Orders, PartSupp

Q9 Non-Parallel Plan details

Table Scans comprise 64%, IO from 4 tables contribute 58% of plan cost

Q9 Parallel reg vs Partitioned

Q13

Why does Q13 have perfect scaling?

Q18 Large Volume Customer

Non-Parallel

Parallel

Q18 Graphical Plan

Non-Parallel Plan: 66% of cost in Hash Match, reduced to 5% in Parallel Plan

Q18 Plan Details

Non-Parallel

Parallel

Non-Parallel Plan Hash Match cost is 1245 IO, 494.6 CPU

DOP 16/32: size is below IO threshold, CPU reduced by >10X

Q21 Suppliers Who Kept Orders Waiting

Note 3 references to Line Item

Non-Parallel Parallel

Q21 Non-Parallel Plan

[Plan diagram annotations: H1, H2, H3]

Q21 Parallel

Q21

3 full Line Item clustered index scans

Plan cost is approx 3X Q1, single “scan”

Super Scaling

Q7 Volume Shipping

Non-Parallel Parallel

Q7 Non-Parallel Plan

Join sequence: Nation, Customer, Orders, Line Item

Q7 Parallel Plan

Join sequence: Nation, Customer, Orders, Line Item

Q8 National Market Share

Non-Parallel Parallel

Q8 Non-Parallel Plan

Join sequence: Part, Line Item, Orders, Customer

Q8 Parallel Plan

Join sequence: Part, Line Item, Orders, Customer

Q11 Important Stock Identification

Non-Parallel Parallel

Q11

Join sequence: A) Nation, Supplier, PartSupp, B) Nation, Supplier, PartSupp

Q11

Join sequence: A) Nation, Supplier, PartSupp, B) Nation, Supplier, PartSupp

Small Queries

Query 2 Minimum Cost Supplier

Wordy, but only touches the small tables, second lowest plan cost (after Q15)

Q2

Clustered Index Scan on Part and PartSupp have highest cost (48%+42%)

Q2

PartSupp is now Index Scan + Key Lookup

Q6 Forecasting Revenue Change

Not sure why this blows CPU

Scalar values are pre-computed, pre-converted

Q20?

This query may get a poor execution plan

Date functions are usually written as

because Line Item date columns are “date” type

CAST helps DOP 1 plan, but gets bad plan for parallel

Q20

Q20 alternate - parallel

Statistics estimation error here

Penalty for mistake applied here

Other Queries

Q3

Q12 Random IO?

Will this generate random IO?

Query 12 Plans

Non-Parallel

Parallel

Queries that go Negative

Q17 Small Quantity Order Revenue

Q17

Table Spool is a concern

Q17

the usual suspects

Q19

Q22

[Chart: CPU relative to DOP 1 for Q1 through Q22 and total at DOP 2, 4, 8, 16, 24, 32]

[Chart: Speedup from DOP 1 query time for Q1 through Q22 and total at DOP 2, 4, 8, 16, 24, 32]