Diagnosing and Improving Query Performance through EXPLAIN plan analysis
Calisto [email protected]
Agenda
• Basics
• Some Not-So-Good Plans
• Analyzing Plans
• Diagnostics
• Cardinality Estimation Correction
• Expensive Operations
• An Example
• Summary
Optimizer Concepts 101
• Evaluates operation order
  • Joins, predicate application, aggregation
• Evaluates the implementation to use:
  • table scan vs. index scan
  • nested-loop join vs. sort-merge join vs. hash join
• Evaluates parallel join strategies:
  • co-located join vs. directed join vs. broadcast join vs. repartitioned join
• Enumerates alternative plans and chooses the optimum
  • Costs the alternatives based on estimated costs
Is The Optimizer Perfect?

• No, not really … we are always improving it
• It is a model
• There are thousands of variables
• It depends on the accuracy of the information provided as input
• If you can provide better information to the optimizer, it has a better chance of doing a good job
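The enumerate-and-cost idea described above can be sketched in a few lines. This is a toy model with made-up cost formulas, not DB2's actual costing (which also weighs CPU speed, I/O and communication speed from the Database Context):

```python
# Toy cost model: pick the cheapest join implementation for given input
# sizes. The formulas are illustrative stand-ins, not DB2's real model.
import math

def nested_loop_cost(outer_rows, inner_rows):
    # outer scanned once; inner probed once per outer row
    return outer_rows + outer_rows * inner_rows

def sort_merge_cost(outer_rows, inner_rows):
    # sort both inputs, then one merge pass
    def sort_cost(n):
        return n * math.log2(n) if n > 1 else 1.0
    return sort_cost(outer_rows) + sort_cost(inner_rows) + outer_rows + inner_rows

def hash_join_cost(outer_rows, inner_rows):
    # build a hash table on the smaller input, probe with the larger
    return min(outer_rows, inner_rows) * 2 + max(outer_rows, inner_rows)

def choose_join(outer_rows, inner_rows):
    alternatives = {
        "NLJOIN": nested_loop_cost(outer_rows, inner_rows),
        "MSJOIN": sort_merge_cost(outer_rows, inner_rows),
        "HSJOIN": hash_join_cost(outer_rows, inner_rows),
    }
    return min(alternatives, key=alternatives.get)

# A large equality join favours the hash join in this toy model,
# while a tiny input favours the nested-loop join.
print(choose_join(1_000_000, 50_000))
print(choose_join(1, 1))
```

The point of the sketch is the shape of the decision, not the numbers: bad cardinality inputs feed bad costs into exactly this kind of comparison, which is why the estimates matter so much.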
How does the EXPLAIN facility help?

• Offers clues as to why the optimizer made particular decisions
• Allows the user to maintain a history of problem-query access plans during key transition periods:
  • New index additions
  • Large data updates/additions
  • RUNSTATS changes
  • Release-to-release migration
  • Significant DB or DBM configuration changes
• Problem determination is easier, and often faster, with a reference plan to compare against
Invoking db2exfmt
• Create the EXPLAIN tables: ~/sqllib/misc/EXPLAIN.DDL
• Populate the EXPLAIN tables:
  • EXPLAIN PLAN FOR <SQL statement>
  • For dynamic SQL … SET CURRENT EXPLAIN MODE
  • For static SQL … set the EXPLAIN bind option to YES
• Invoke db2exfmt:
  • db2exfmt -d <dbname> -1 -o <filename>
Global Information

DB2_VERSION: 09.05.4
SOURCE_NAME: SQLC2G15
SOURCE_SCHEMA: NULLID
SOURCE_VERSION:
EXPLAIN_TIME: 2010-03-10-15.29.00.983210
EXPLAIN_REQUESTER: CALISTO
Database Context:
----------------
Parallelism: Inter-Partition Parallelism
CPU Speed: 2.479807e-07
Comm Speed: 100
Buffer Pool size: 114500
Sort Heap size: 20000
Database Heap size: 3615
Lock List size: 30000
Maximum Lock List: 80
Average Applications: 1
Locks Available: 1536000
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability
STATEMENT 1 SECTION 203
------------------------
QUERYNO: 1
QUERYTAG:
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1
Original and Rewritten Statement
Original Statement:
------------------
SELECT *
from V1
where V1.C1=(select C1
from CALISTO.T3
where C2=0)
Optimized Statement:
-------------------
SELECT Q7.$C0 AS "C1", Q7.$C1 AS "C2"
FROM (SELECT Q1.C1
FROM CALISTO.T3 AS Q1
WHERE (Q1.C2 = 0)) AS Q2,
(SELECT Q3.C1, Q3.C2
FROM CALISTO.T2 AS Q3
UNION ALL
SELECT Q5.C1, Q5.C2
FROM CALISTO.T1 AS Q5 ) AS Q7
WHERE (Q7.$C0 = Q2.$C0)
Plan Graph

Access Plan:
-----------
Total Cost: 121.438

(each operator is annotated with estimated rows, cumulative cost, and I/O)

RETURN ( 1)
└─ NLJOIN ( 2)  rows 10.96, cost 591.438, I/O 51
   ├─ TBSCAN ( 3)  rows 1, cost 38.8478, I/O 1
   │  └─ TABLE: CALISTO.T3  (121 rows)
   └─ FILTER ( 4)  rows 10.96, cost 512.183, I/O 50
      └─ UNION ( 5)  rows 9900, cost 492.8183, I/O 50
         ├─ TBSCAN ( 6)  rows 8530, cost 126.422, I/O 30
         │  └─ TABLE: CALISTO.T2  (1530 rows)
         └─ TBSCAN ( 7)  rows 1370, cost 360.409, I/O 20
            └─ TABLE: CALISTO.T1  (1370 rows)
EXPLAIN Diagnostic Messages
Extended Diagnostic Information:
--------------------------------

Diagnostic Identifier: 1
Diagnostic Details: EXP0148W The following MQT or statistical view was
considered in query matching: "CALISTO"."REPLICATED_DIM_TIME".

Diagnostic Identifier: 2
Diagnostic Details: EXP0148W The following MQT or statistical view was
considered in query matching: "CALISTO"."MQT_FCT_CUR_MONTH".

Diagnostic Identifier: 3
Diagnostic Details: EXP0149W The following MQT was used (from those
considered) in query matching: "CALISTO"."REPLICATED_DIM_TIME".
The RETURN Operator

1) RETURN: (Return Result)
   Cumulative Total Cost: 489328
   Cumulative CPU Cost: 1.68991e+12
   Cumulative I/O Cost: 56936.9
   Cumulative Re-Total Cost: 249.906
   Cumulative Re-CPU Cost: 1.00777e+09
   Cumulative Re-I/O Cost: 0
   Cumulative First Row Cost: 84081.3
   Cumulative Comm Cost: 7.42953e+07
   Cumulative First Comm Cost: 6.14019e+07
   Estimated Bufferpool Buffers: 4843

   Arguments:
   ---------
   BLDLEVEL: (Build level)
      DB2 v9.5.0.4
   ENVVAR  : (Environment Variable)
      DB2_ANTIJOIN = EXTEND
   ENVVAR  : (Environment Variable)
      DB2_REDUCED_OPTIMIZATION = YES
   HEAPOVER: (Overcommitted on concurrent sortheap usage)
      FALSE
   HEAPUSE : (Maximum Statement Heap Usage)
      5808 Pages
   PREPNODE: (Prepare Node Number)
      1
   PREPTIME: (Statement prepare time)
      2004 milliseconds
   SHEAPCAP: (Cap on concurrent sortheap usage)
      100 %
   STMTHEAP: (Statement heap size)
      200000
Operator Details

2) HSJOIN: (Hash Join)
   Cumulative Total Cost: 2079.53
   Cumulative CPU Cost: 3.95132e+07
   Cumulative I/O Cost: 126
   Cumulative Re-Total Cost: 2079.53
   Cumulative Re-CPU Cost: 3.95132e+07
   Cumulative Re-I/O Cost: 126
   Cumulative First Row Cost: 2079.53
   Estimated Bufferpool Buffers: 126

   Arguments:
   ---------
   BITFLTR : (Hash Join Bit Filter used)
      FALSE
   EARLYOUT: (Early Out flag)
      LEFT
   HASHCODE: (Hash Code Size)
      24 BIT
   OUTERJN : (Outer Join type)
      LEFT (ANTI)
   TEMPSIZE: (Temporary Table Page Size)
      4096

   Predicates:
   ----------
   2) Predicate used in Join
      Relational Operator: Equal (=)
      Subquery Input Required: No
      Filter Factor: 0.0001
      Predicate Text:
      --------------
      (Q2.C1 = Q1.C1)

   Input Streams:
   -------------
   2) From Operator #3
      Estimated number of rows: 10000
      Number of columns: 3
      Subquery predicate ID: Not Applicable
      Column Names:
      ------------
      +Q2.C3+Q2.C2+Q2.C1

   4) From Operator #4
      Estimated number of rows: 10000
      Number of columns: 1
      Subquery predicate ID: Not Applicable
      Column Names:
      ------------
      +Q1.C1

   Output Streams:
   --------------
   5) To Operator #1
      Estimated number of rows: 0.0001
      Number of columns: 3
      Subquery predicate ID: Not Applicable
      Column Names:
      ------------
      +Q4.C3+Q4.C2+Q4.C1
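The Filter Factor shown in the predicate details is what drives the optimizer's cardinality arithmetic. As a rough sketch (a textbook simplification, not DB2's full model), the filter factor of an equality join predicate is commonly estimated as 1 / max(column cardinalities), and the join output estimate is |outer| × |inner| × FF:

```python
# Textbook join-cardinality arithmetic (simplified; not DB2's exact
# formulas). For an equality join predicate, a common filter-factor
# estimate is 1 / max(colcard of the two join columns).

def equality_join_filter_factor(colcard_outer, colcard_inner):
    return 1.0 / max(colcard_outer, colcard_inner)

def join_cardinality(outer_rows, inner_rows, filter_factor):
    return outer_rows * inner_rows * filter_factor

# With 10000 distinct values on the join column, FF = 0.0001 —
# the same filter factor shown for the HSJOIN predicate above.
ff = equality_join_filter_factor(10000, 10000)
est = join_cardinality(10000, 10000, ff)
```

Note that the operator's output stream above shows 0.0001 rows rather than this formula's result: the LEFT (ANTI) outer-join argument means this HSJOIN is an anti-join, whose estimate is derived differently.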
Object Details

Name: TIME_DIM
Schema: CALISTO
Number of Columns: 50
Number of Pages with Rows: 20
Number of Pages: 20
Number of Rows: 5120
Table Overflow Record Count: 0
Width of Rows: 590
Time of Creation: 2009-12-16-23.20.43.601285
Last Statistics Update: 2010-02-15-03.25.54.394067
Primary Tablespace: TEST
Tablespace for Indexes: TEST
Tablespace for Long Data: NULLP
Number of Referenced Columns: 2
Number of Indexes: 2
Volatile Table: No
Number of Active Blocks: -1
Number of Column Groups: 0
Number of Data Partitions: 1
Average Row Compression Ratio: 0.000000
Percent Rows Compressed: 0.000000
Average Compressed Row Size: 0
Statistics Type: U
Identifying Expensive Operators
• Cardinality is the number above the operator name
• Cost is the number just below the operator number
  • An operator's own cost is its cumulative cost minus the sum of the cumulative costs of its input operators
• The number of I/Os is just below the cost
(each operator is annotated with cardinality, cumulative cost, and I/O)

>NLJOIN ( 10)  cardinality 3.00579e+6, cost 1.54341e+06, I/O 548903
├─ HSJOIN ( 11)  cardinality 3.00579e+6, cost 1.23169e+06, I/O 284686
│  ├─ TBSCAN ( 12)  cardinality 1.01317e+07, cost 116551, I/O 93464
│  │  └─ TABLE: CALISTO.TBL_E  (1.01317e+07 rows)  Q3
│  └─ BTQ ( 13)  cardinality 215511, cost 1.07116e+06, I/O 181971
│     └─ TBSCAN ( 14)  cardinality 215511, cost 1.07016e+06, I/O 181971
│        └─ TABLE: CALISTO.TBL_D  (215511 rows)
└─ IXSCAN ( 15)  cardinality 1, cost 311718, I/O 0
   └─ INDEX: CALISTO.IDX_TBL_F  (1317 entries)  Q5
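The "own cost" rule can be checked mechanically from the cumulative costs in the plan. The sketch below uses the NLJOIN ( 10) numbers from the fragment above; the dictionary layout is just an illustration, not anything db2exfmt emits:

```python
# Derive an operator's own cost from cumulative costs: own cost equals
# the operator's cumulative cost minus the sum of its inputs' cumulative
# costs. Numbers taken from the NLJOIN(10) plan fragment above.

cumulative_cost = {
    "NLJOIN(10)": 1.54341e+06,
    "HSJOIN(11)": 1.23169e+06,
    "IXSCAN(15)": 311718.0,
}
inputs = {
    "NLJOIN(10)": ["HSJOIN(11)", "IXSCAN(15)"],
    "HSJOIN(11)": [],  # own inputs omitted here for brevity
    "IXSCAN(15)": [],
}

def own_cost(op):
    return cumulative_cost[op] - sum(cumulative_cost[i] for i in inputs[op])

# The join operator itself is nearly free: almost all of its cumulative
# cost comes from its two inputs.
print(round(own_cost("NLJOIN(10)"), 1))  # → 2.0
```

This is exactly how to spot the expensive step: the interesting operator is the one whose own cost (not cumulative cost) is large.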
NLJOIN ( 11)  cardinality 33467.5, cost 15491.5, I/O 10529.4
├─ BTQ ( 12)  cardinality 267.277, cost 884.75, I/O 376.972
│  └─ FETCH ( 13)  cardinality 13.3638, cost 884.666, I/O 376.972
│     ├─ RIDSCN ( 14)  cardinality 4686.66, cost 529.869, I/O 68
│     │  └─ SORT ( 15)  cardinality 4686.66, cost 529.869, I/O 68
│     │     └─ IXSCAN ( 16)  cardinality 4686.66, cost 528.787
│     └─ TABLE: CALISTO.TIMEDIM  (18722 rows)  Q9
└─ FETCH ( 17)  cardinality 125.216, cost 55.7205, I/O 37.9624
   ├─ RIDSCN ( 18)  cardinality 125.216, cost 46.1653, I/O 5.96244
   │  └─ SORT ( 19)  cardinality 125.216, cost 46.1651, I/O 5.96244
   │     └─ IXSCAN ( 20)  cardinality 125.216, cost 46.1508, I/O 5.96244
   │        └─ INDEX: CALISTO.TRANSFACT_IDX  (3.16068e+08 entries)  Q10
   └─ DP-TABLE: CALISTO.TRANSFACT  (3.16068e+08 rows)  Q10
Example: Try To Avoid A SORT On The Inner Of A NLJN

• Problem?
  • "List prefetch" chosen because of poor clustering
• Solution?
  • Index-only access?
  • REORG?
TBSCAN ( 2)  cardinality 7.96127, cost 3312.15, I/O 702
└─ TABLE: CALISTO.BIGTABLE  (298854 rows)  Q1
FETCH ( 2)  cardinality 7.96127, cost 230.236, I/O 6.9672
├─ RIDSCN ( 3)  cardinality 7.96127, cost 30.2756, I/O 4
│  ├─ SORT ( 4)  cardinality 2, cost 15.138, I/O 2
│  │  └─ IXSCAN ( 5)  cardinality 2, cost 15.1373, I/O 2
│  │     └─ INDEX: SAPP29.INDX~0  (298854 entries)  Q1
│  └─ SORT ( 6)  cardinality 6, cost 15.138, I/O 2
│     └─ IXSCAN ( 7)  cardinality 6, cost 15.1373, I/O 2
│        └─ INDEX: SAPP29.INDX~0  (298854 entries)  Q1
└─ TABLE: CALISTO.BIGTABLE  (298854 rows)  Q1
Example: OR Predicates

• Problem
  • (C1 = 5 AND C2 = 10) OR (C1 = 6 AND C2 = 12)
  • Large table scan to fetch a few rows
• Solution
  • Consider indexes for an index-ORing plan
TBSCAN ( 2)  cardinality 8, cost 3312.15, I/O 702
└─ TABLE: CALISTO.BIGTABLE  (298854 rows)  Q1
NLJOIN ( 9)  cardinality 8, cost 122.22, I/O 16
├─ TBSCAN ( 10)  cardinality 4, cost 0.014061, I/O 0
│  └─ TABFNC: SYSIBM.GENROW  (4 rows)  Q3
└─ FETCH ( 13)  cardinality 2, cost 30.3206, I/O 4
   ├─ IXSCAN ( 3)  cardinality 2, cost 30.2756, I/O 3
   │  └─ INDEX: SAPP29.INDX~0  (298854 entries)  Q1
   └─ TABLE: CALISTO.BIGTABLE  (298854 rows)  Q1
IN Predicates

• Problem
  • C1 IN (10, 100, 500, 10000)
  • Large table scan to fetch a few rows
• Solution
  • Consider an index with the column
  • For example: an index on (C1, C3)
  • An index on (C2, C1) is also good if the query has an equality predicate on C2
>NLJOIN ( 10)  cardinality 3.00579e-5, cost 1.54341e+06, I/O 548903
├─ HSJOIN ( 11)  cardinality 3.00579e-5, cost 1.23169e+06, I/O 284686
│  ├─ TBSCAN ( 12)  cardinality 1.01317e+07, cost 116551, I/O 93464
│  │  └─ TABLE: CALISTO.TBL_E  (1.01317e+07 rows)  Q3
│  └─ BTQ ( 13)  cardinality 215511, cost 1.07116e+06, I/O 181971
│     └─ TBSCAN ( 14)  cardinality 215511, cost 1.07016e+06, I/O 181971
│        └─ TABLE: CALISTO.TBL_D  (215511 rows)
└─ TBSCAN ( 15)  cardinality 3.22724e-6, cost 311718, I/O 264217
   └─ TABLE: CALISTO.TBL_F  (1.01317e+07 rows)  Q5
Low Cardinality Estimates

• Problem
  • Unexpectedly low cardinality … 3.00579e-5 !!
  • Table scan on the inner of a NLJN
• Solution
  • Consider column group statistics on the columns used in the HSJOIN ( 11) predicates
  • Consider indexes on TBL_F
RUNSTATS ON TABLE CALISTO.TBL_E ON ALL COLUMNS AND COLUMNS ((d_year, d_qtr_name)) WITH DISTRIBUTION AND INDEXES ALL;
Join Underestimation – Overloaded Dimensions

SELECT MIN(d_date), MAX(d_date) FROM inventory
→ 1998-01-01 to 2002-12-26 … a 5-year spread between MIN and MAX

SELECT MIN(d_date), MAX(d_date) FROM date_dim
→ 1900-01-02 to 2100-01-01 … a 200-year spread between MIN and MAX

SELECT Inventory.*
FROM Inventory, date_dim
WHERE d_date_sk = inv_date_sk AND d_quarter_name = '2000Q1';

ESTIMATE: ~1/800th of the fact table, assuming that the date keys for the year 2000 have the same probability of appearing in the fact table as those for the year 1955
REALITY: ~1/20th of the fact table (20 quarters of data)
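The 1/800 vs. 1/20 gap falls straight out of the uniformity assumption, as this bit of illustrative arithmetic shows:

```python
# Why the optimizer lands on ~1/800: date_dim covers 200 years
# (about 800 quarters), and with uniform statistics every quarter's
# date keys are assumed equally likely to appear in the fact table.
# In reality the fact table only covers 20 quarters.

dim_years = 200                  # date_dim spans 1900..2100
quarters_in_dim = dim_years * 4  # ~800 quarters of date keys
quarters_in_fact = 20            # inventory spans ~5 years

estimated_fraction = 1 / quarters_in_dim   # optimizer: 1/800
actual_fraction = 1 / quarters_in_fact     # reality:   1/20

# The join cardinality is underestimated by a factor of 40.
underestimation = actual_fraction / estimated_fraction
```

A factor-of-40 underestimate is easily enough to flip the join method and join order chosen for everything above this join in the plan.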
HSJOIN ( 8)  cardinality 7241.95, cost 3349.26, I/O 1526.6
├─ TBSCAN ( 9)  cardinality 9091, cost 614.897, I/O 603
│  └─ TABLE: TPCDS.ITEM  (9091 rows)  Q1
└─ NLJOIN ( 10)  cardinality 7241.95  <<<< UNDERESTIMATED!, cost 2732.68, I/O 923.603
   ├─ BTQ ( 11)  cardinality 89.5342, cost 691.788, I/O 655
   │  └─ TBSCAN ( 12)  cardinality 44.7671, cost 691.717, I/O 655
   │     └─ TABLE: TPCDS.DATE_DIM  (36592 rows)  Q2
   └─ FETCH ( 13)  cardinality 80.8848, cost 22.8145, I/O 3
      ├─ IXSCAN ( 14)  cardinality 80.8848, cost 15.2153, I/O 2
      │  └─ INDEX: TPCDS.INV_INVDATE  (5.91947e+06 entries)  Q3
      └─ TABLE: TPCDS.INVENTORY  (5.91947e+06 rows)  Q3
CREATE VIEW V_inventory_date AS (SELECT date_dim.*, inv_date_sk FROM inventory, date_dim WHERE inv_date_sk = d_date_sk );
ALTER VIEW V_inventory_date ENABLE QUERY OPTIMIZATION;
RUNSTATS ON TABLE tpcds.v_inventory_date WITH DISTRIBUTION;
^HSJOIN ( 8)  cardinality 299498, cost 7098.88, I/O 2814.62
├─ NLJOIN ( 9)  cardinality 299498  << BETTER, cost 6453.95, I/O 2211.62
│  ├─ MBTQ ( 10)  cardinality 89.5342, cost 691.788, I/O 655
│  │  └─ TBSCAN ( 12)  cardinality 44.7671, cost 691.717, I/O 655
│  │     └─ TABLE: TPCDS.DATE_DIM  (36592 rows)  Q2
│  └─ FETCH ( 14)  cardinality 3345.06, cost 22.8145, I/O 3
│     ├─ IXSCAN ( 14)  cardinality 80.8848, cost 15.2153, I/O 2
│     │  └─ INDEX: TPCDS.INV_INVDATE  (5.91947e+06 entries)  Q3
│     └─ TABLE: TPCDS.INVENTORY  (5.91947e+06 rows)  Q3
└─ TBSCAN ( 18)  cardinality 9091, cost 614.897, I/O 603
   └─ TABLE: TPCDS.ITEM  (9091 rows)  Q1
SORT Spills
• The details for a SORT operator will indicate whether the SORT spilled
• The I/Os below the SORT indicate that there was spilling associated with it
• Minimize spills by considering indexes and by balancing SORTHEAP, SHEAPTHRES and the buffer pools (also discussed later)
SORT ( 16)  cost 6.14826e+06, I/O 1.30119e+06
└─ TBSCAN ( 17)  cardinality 3.65665e+07, cost 2.00653e+06, I/O 1.14286e+06
   └─ TABLE: TPCD.ORDERS  (3.74999e+07 rows)
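Whether a sort is likely to spill can be sanity-checked with back-of-the-envelope arithmetic: when the data to be sorted exceeds the sort heap, temporary pages are written, which shows up as I/O on the SORT. The sketch below is a deliberate simplification (it ignores per-row overheads and merge phases); the 20-byte key width is an assumed value:

```python
# Rough spill check: a sort spills when rows * row_width exceeds the
# sort heap (SORTHEAP is measured in 4 KB pages). Simplified — real
# behaviour also involves per-row overheads and multi-pass merges.

PAGE_SIZE = 4096

def sort_spills(rows, row_width_bytes, sortheap_pages):
    return rows * row_width_bytes > sortheap_pages * PAGE_SIZE

# ~36.6M rows of even 20-byte sort keys cannot fit in the 20000-page
# sort heap from the Database Context, so a sort like SORT(16) above
# is expected to spill.
print(sort_spills(36_566_500, 20, 20_000))  # → True
```

The same arithmetic run in reverse tells you how much SORTHEAP would be needed to keep a given sort in memory, which is the trade-off the SORT Spills slide is pointing at.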
Diagnostic Step 1 - Basics
• Look for the obvious diagnostic messages
  • No RUNSTATS on a table?
• Look at the global settings
  • BUFFERPOOL
  • SORTHEAP
  • OPTIMIZATION LEVEL
  • Settings in the RETURN operator
• Any anomalies?
Diagnostic Step 2 – Cardinality Underestimation

• Look for cardinality underestimation clues
• Get rid of very small numbers like 3.00579e-5 where possible, using column group statistics when multiple predicates are applied to a table
• Get rid of underestimation caused by skew in the fact-table join columns
• Get rid of underestimation caused by overloaded dimensions
  • Range predicates on dates [DATECOL >= '2010-05-15']
• Try increasing the number of quantile statistics if estimates look off
• Try not to put expressions around columns if possible
• Re-EXPLAIN the query once you have made changes, to verify that the cardinality estimates improved
Diagnostic Step 3 – Operator Costs
• Typically, look for operators that are expensive relative to the total cost
• Depending on the operator, there are different possibilities
• If there is nothing you can easily do, consider the next most expensive operator
• Let us consider some specific operators in the next few slides
Anomaly With Merge Join or Nested Loop Join?
• Occasionally the MSJN or NLJN cost may be less than the sum of its input costs !!
• This can happen because, with sorted inputs, the join can stop reading an input early once the join keys pass the overlapping range
• Example:
  • T1.C1 … values from 1 to 100
  • T2.C1 … values from 100 to 200
  • SELECT * FROM T1, T2 WHERE T1.C1 = T2.C1
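The anomaly is easiest to see by simulating a merge join on the two ranges: with sorted inputs the join finishes as soon as one input is exhausted, so most of T2 is never read and the executed cost can undercut the sum of the full input costs. A minimal sketch (unique join keys assumed):

```python
# Merge join over sorted inputs, counting loop iterations. With
# T1 = 1..100 and T2 = 100..200, the join ends as soon as T1 runs out,
# so most of T2 is never touched.

def merge_join(t1, t2):
    i = j = steps = 0
    out = []
    while i < len(t1) and j < len(t2):
        steps += 1
        if t1[i] == t2[j]:
            out.append((t1[i], t2[j]))
            i += 1
            j += 1
        elif t1[i] < t2[j]:
            i += 1
        else:
            j += 1
    return out, steps

t1 = list(range(1, 101))    # values 1..100
t2 = list(range(100, 201))  # values 100..200

result, steps = merge_join(t1, t2)
# → one matching row, after only 100 steps for 201 input rows
```

The optimizer's cost model accounts for this early-out, which is why the join's cumulative cost can legitimately be below the sum of its inputs' full-scan costs.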
Expensive Table Scan

• Are all the columns needed?
  • … not much you can do
• Are only a few columns needed?
  • … is index-only access possible?
• Do the predicates filter significantly?
  • … does an ISCAN-FETCH make sense?
  • … are indexes for IN or OR predicates a possibility?
• Is this a fact table with aggregation?
  • Could we define a Materialized Query Table?
• Are similar local predicates frequently used on this table?
  • Could it be partitioned by range?
  • Could this be a multi-dimensional clustered (MDC) table?
  • Both MDC and range partitioning?
Expensive ISCAN and FETCH
• Is the FETCH cost very high? OR
• Are there list-prefetch plans on the inner of nested-loop joins?
• Look at the CLUSTERRATIO (or CLUSTERFACTOR)
  • If it is not close to 100 (or 1), consider a REORG against this index if it is the key index used in most queries (this does not apply to multi-dimensional clustered tables)
• Is index-only access possible?
• If good-filtering, commonly used predicates are applied at the FETCH
  • Consider adding the column or columns to the index
Expensive SORT
• Is there spilling?
  • Could you increase SORTHEAP or the buffer pool? … do not increase SORTHEAP too much if there are concurrent hash joins or sorts in this query, especially in a multi-user environment
• Could you create an index to avoid the sort?
  • Note that an ISCAN-FETCH may be more expensive
  • Perhaps index-only access?
Expensive SORT (Continued)

• Do you have a DISTINCT in a subquery?
  • The optimizer sometimes postpones the duplicate removal
  • Consider using a GROUP BY instead of the DISTINCT
SORT ( 7)  cardinality 4.90699e+08, cost 7.86401e+07, I/O 1.58631e+07   ← DISTINCT (postponed duplicate removal)
└─ HSJOIN ( 8)  cardinality 4.90699e+08, cost 35685.3, I/O 34146
   ├─ HSJOIN ( 9)  cardinality 3.25208e+06, cost 12161.1, I/O 12484
   │  ├─ ^HSJOIN ( 10)  cardinality 606370, cost 10764.8, I/O 11038
   │  │  ├─ TBSCAN ( 11)  cardinality 606370, cost 10667.2, I/O 11025
   │  │  │  └─ TABLE: FACT  (606370 rows)
   │  │  └─ IXSCAN ( 12)  cardinality 7671, cost 61.6409, I/O 13
   │  │     └─ INDEX: TIME_IDX  (7671 entries)
   │  └─ TBSCAN ( 13)  cardinality 24719, cost 1322.24, I/O 1446
   │     └─ TABLE: PRODUCT  (39033 rows)
   └─ TBSCAN ( 14)  cardinality 84979, cost 19697.9, I/O 21662
      └─ TABLE: CUSTOMER  (84979 rows)
Expensive Nested Loop Join

• A TBSCAN or SORT on the inner should be avoided as far as possible
  • Consider an index
  • Consider clustering if the SORT is for a list-prefetch plan
• Is the outer large, with an ISCAN-FETCH on the inner?
  • Even if each ISCAN-FETCH is not expensive on its own, it could be executed thousands of times
  • See the previous slide on expensive ISCAN and FETCH
• Is there an expression on a join predicate?
  • Could you avoid the expression on the inner table's column, so that an index with start-stop keys can be used?
  • Could you use generated columns with that expression?
Expensive Hash Join
• Is there spilling?
  • The optimizer estimates spilling if the HSJN I/O cost > the sum of the I/O costs of its inputs
• A hash join uses SORTHEAP to keep its hash tables
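The spill heuristic above can be applied mechanically to the cumulative I/O numbers in a db2exfmt plan. A small sketch (the sample I/O figures are illustrative; real spill behaviour also depends on SHEAPTHRES and concurrent consumers):

```python
# Heuristic from the slide: the optimizer is anticipating a hash-join
# spill when the HSJOIN's cumulative I/O exceeds the sum of the I/O of
# its inputs — the extra I/O is temporary-table traffic.

def hsjoin_spill_expected(hsjoin_io, build_io, probe_io):
    return hsjoin_io > build_io + probe_io

# No I/O beyond what the inputs already cost:
# the hash table is expected to fit in SORTHEAP.
in_memory = hsjoin_spill_expected(126, 100, 26)

# Cumulative I/O well above the inputs' total: spilling expected.
spilling = hsjoin_spill_expected(5000, 1500, 2000)
```

When the check fires, the remedies are the same as for sorts: more SORTHEAP (within reason) or a smaller build input, for example via earlier filtering.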
Expensive Communication Between Partitions ?
• Large table queues (TQs)?
  • Consider replicated dimension tables
  • Could a large dimension table that is commonly joined be partitioned the same way as the fact table?
TPCH Query Example

SELECT nation, o_year, SUM(amount) AS sum_profit
FROM (SELECT n_name AS nation,
             YEAR(o_orderdate) AS o_year,
             l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
      FROM part, supplier, lineitem, partsupp, orders, nation
      WHERE s_suppkey = l_suppkey
        AND ps_suppkey = l_suppkey
        AND ps_partkey = l_partkey
        AND p_partkey = l_partkey
        AND o_orderkey = l_orderkey
        AND s_nationkey = n_nationkey
        AND p_name LIKE '%coral%') AS profit
GROUP BY nation, o_year
ORDER BY nation, o_year DESC
Overall Access Plan

Access Plan:
-----------
(each operator is annotated with estimated rows)

RETURN ( 1)
└─ GRPBY ( 2)  rows 225
   └─ MDTQ ( 3)  rows 36000
      └─ GRPBY ( 4)  rows 225
         └─ TBSCAN ( 5)  rows 225
            └─ SORT ( 6)  rows 225
               └─ HSJOIN ( 7)  rows 2.2704e+07
                  ├─ DTQ ( 8)  rows 2.2704e+07
                  │  └─ HSJOIN ( 9)  rows 2.2704e+07
                  │     ├─ TBSCAN ( 10)  rows 5.07707e+07
                  │     │  └─ TABLE: TPCD.PARTSUPP  (5.07707e+07 rows)
                  │     └─ DTQ ( 11)  rows 2.45672e+07
                  │        └─ HSJOIN ( 12)  rows 2.45672e+07
                  │           ├─ TBSCAN ( 13)  rows 9.5221e+07
                  │           │  └─ TABLE: TPCD.ORDERS  (9.5221e+07 rows)
                  │           └─ HSJOIN ( 14)  rows 2.45672e+07
                  │              ├─ TBSCAN ( 15)  rows 3.80893e+08
                  │              │  └─ TABLE: TPCD.LINEITEM  (3.80893e+08 rows)
                  │              └─ BTQ ( 16)  rows 1.30986e+08
                  │                 └─ TBSCAN ( 17)  rows 818664
                  │                    └─ TABLE: TPCD.PART  (1.26927e+07 rows)
                  └─ HSJOIN ( 18)  rows 633503
                     ├─ TBSCAN ( 19)  rows 633503
                     │  └─ TABLE: TPCD.SUPPLIER  (633503 rows)
                     └─ BTQ ( 20)  rows 25
                        └─ FETCH ( 21)  rows 25
                           ├─ IXSCAN ( 22)  rows 25
                           │  └─ INDEX: TPCD.N_NK  (25 entries)
                           └─ TABLE: TPCD.NATION  (25 rows)
Summary
• The db2exfmt output provides very detailed information about the access plan
• There are some diagnostics provided by DB2
• Correct major cardinality estimation errors
• If the estimates are reasonable, look at the operator costs and consider ways to reduce the cost of the plan
Calisto [email protected]