DB2 SQL Tuning and BMC Catalog Manager

DB2 z/OS v8 - SQL Tuning

Overview

– Understanding the DB2 Optimizer
– SQL Coding Strategies & Guidelines
– Filter Factor
– Stage 1 & Stage 2 Predicates
– Explain table
– How to interpret the Explain tables
– Using monitoring tools to understand the performance of SQL statements: BMC Apptune, BMC SQL Explorer

SQL Coding Strategies & Guidelines

[Diagram: SQL enters the cost-based DB2 Optimizer, which uses query cost formulas and the DB2 Catalog to produce an optimized access path]

The DB2 Optimizer:
– Determines database navigation
– Parses SQL statements for tables and columns which must be accessed
– Queries statistics from the DB2 Catalog (populated by the RUNSTATS utility)
– Determines the least expensive access path
– Checks authorization

The DB2 Optimizer is Cost Based and chooses the least expensive access path


– Avoid unnecessary execution of SQL.
– Consider accomplishing as much as possible with a single call, so as to minimize table access.
– Limit the data selected (rows and columns) using SQL, and avoid filtering in application programs.
– As far as possible, code predicates on indexable columns.
– Use equivalent data types for comparison. This avoids the data type conversion overhead.
– JOIN tables on indexed columns.
– Avoid Cartesian products.
– The DISTINCT, ORDER BY, GROUP BY and UNION clauses can involve a SORT operation. Use these clauses only if absolutely necessary.


Cursor Usage Tips
– Use singleton SELECT statements if you need to retrieve one row only. This gives far better performance than cursors.
  SELECT … INTO :<host variables>
– Cursors should be used when more than one row has to be retrieved. Cursors have the overhead of OPEN, FETCH and CLOSE.
– To update rows using a cursor, use the FOR UPDATE OF clause.
– Use the FOR FETCH ONLY clause when the cursor is used for data retrieval only. The FOR READ ONLY clause provides the same functionality and is ODBC compliant.
– Use the WITH HOLD clause if you don't want DB2 to automatically close the cursor when the application issues a COMMIT statement.

Static vs. Dynamic SQL
– The access paths for dynamic SQL are determined at run time, which results in additional overhead. Also, users need to have direct access to the tables.
– The access paths for static SQL are determined at bind time and reused at run time. Users need only the EXECUTE privilege on the plan.


UNION and UNION ALL
– Predicates combined with OR can prevent DB2 from using a matching index. Consider rewriting the query as the UNION of two SELECT statements, making index access possible.
– UNION ALL keeps duplicates, and hence does not involve the SORT that UNION needs to remove them.
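The difference between OR, UNION and UNION ALL can be checked on a small data set. The sketch below uses Python's built-in sqlite3 module purely as a runnable stand-in (it is not DB2, and the emp table and its rows are made up for illustration); it shows only the result-set semantics, not DB2's access path.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; it is not DB2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, dept TEXT, job TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "D01", "MGR"), (2, "D01", "CLERK"), (3, "D02", "MGR"), (4, "D03", "CLERK")],
)

# Original form: predicates on two different columns combined with OR.
or_rows = conn.execute(
    "SELECT empno FROM emp WHERE dept = 'D01' OR job = 'MGR' ORDER BY empno"
).fetchall()

# UNION removes the duplicate (empno 1 satisfies both branches).
union_rows = conn.execute(
    "SELECT empno FROM emp WHERE dept = 'D01' "
    "UNION SELECT empno FROM emp WHERE job = 'MGR' ORDER BY empno"
).fetchall()

# UNION ALL keeps the duplicate, so no duplicate-removal step is needed.
union_all_rows = conn.execute(
    "SELECT empno FROM emp WHERE dept = 'D01' "
    "UNION ALL SELECT empno FROM emp WHERE job = 'MGR'"
).fetchall()

print(or_rows, union_rows, sorted(union_all_rows))
```

UNION ALL is only a safe replacement for the OR form when the application can tolerate (or the data rules out) rows that satisfy both branches.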

The BETWEEN clause
– BETWEEN is usually more efficient than using the <= and >= operators, except when comparing a host variable to two columns:
  Stage 2: WHERE :hostvar BETWEEN col1 AND col2
  Stage 1: WHERE col1 <= :hostvar AND col2 >= :hostvar
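That the two forms return the same rows can be confirmed with a quick check. This is a minimal sketch using sqlite3 as a stand-in for DB2; the rate_band table, its columns and the value bound to the parameter are hypothetical, and the Stage 1 / Stage 2 distinction itself is DB2-specific and not visible here.

```python
import sqlite3

# SQLite stand-in; the table and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rate_band (band TEXT, low_amt INTEGER, high_amt INTEGER)")
conn.executemany(
    "INSERT INTO rate_band VALUES (?, ?, ?)",
    [("A", 0, 99), ("B", 100, 499), ("C", 500, 999)],
)

amount = 100  # plays the role of :hostvar

# Host variable compared to two columns with BETWEEN.
between_form = conn.execute(
    "SELECT band FROM rate_band WHERE ? BETWEEN low_amt AND high_amt",
    (amount,),
).fetchall()

# Equivalent rewrite using <= and >=.
rewritten_form = conn.execute(
    "SELECT band FROM rate_band WHERE low_amt <= ? AND high_amt >= ?",
    (amount, amount),
).fetchall()

print(between_form, rewritten_form)
```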


Use IN Instead of LIKE
– If you know that only a certain number of values exist and can be put in a list, use IN or BETWEEN:
  IN ('Value1', 'Value2', 'Value3')
  BETWEEN :valuelow AND :valuehigh
  rather than:
  LIKE 'Value_'

Use LIKE With Care
– Avoid the % or the _ at the beginning of the pattern, because it prevents DB2 from using a matching index and may cause a scan.
– Use the % or the _ at the end of the pattern to encourage index usage.


Use the NOT Operator With Care
– Predicates formed using NOT (except NOT EXISTS) are Stage 1, but are not indexable.
– For subqueries, when using negation logic, use NOT EXISTS instead of NOT IN.

Code the Most Restrictive Predicate First
– After the indexed predicates, place the predicate that will eliminate the greatest number of rows.

Avoid Arithmetic in Predicates
– An index is not used for a column when the column is in an arithmetic expression.
– Such a predicate is evaluated at Stage 1 but is not indexable.
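One common way to follow this guideline is to move the arithmetic to the host-variable side, so the column stands alone in the predicate. The sketch below illustrates the idea with sqlite3 as a stand-in: the emp table, the ix_salary index and the values are hypothetical, and SQLite's planner is not the DB2 optimizer, so the plan text only shows the general effect.

```python
import sqlite3

# SQLite is used purely as a runnable stand-in; its planner is not the DB2 optimizer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, salary INTEGER)")
conn.execute("CREATE INDEX ix_salary ON emp (salary)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, 900), (2, 1500), (3, 2100), (4, 3000)],
)

hv = 2000  # plays the role of the host variable

# Arithmetic on the column: the index on salary cannot be matched.
with_arithmetic = "SELECT empno FROM emp WHERE salary + 500 > ?"
# Arithmetic moved to the host-variable side and computed by the application.
rewritten = "SELECT empno FROM emp WHERE salary > ?"

rows_a = sorted(conn.execute(with_arithmetic, (hv,)).fetchall())
rows_b = sorted(conn.execute(rewritten, (hv - 500,)).fetchall())


def plan(sql, arg):
    """Return the planner's description text (last column of EXPLAIN QUERY PLAN)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (arg,)).fetchall()
    return " ".join(row[-1] for row in rows)


plan_a = plan(with_arithmetic, hv)
plan_b = plan(rewritten, hv - 500)
print(rows_a, rows_b)
print(plan_a)
print(plan_b)
```

Both statements return the same rows; only the rewritten one lets the stand-in planner use the index on salary.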


Nested loop join is efficient when:
– The outer table is small. Predicates with a small filter factor reduce the number of qualifying rows in the outer table.
– The number of data pages accessed in the inner table is also small.
– A highly clustered index is available on the join columns of the inner table.
This join method is efficient when filtering for both tables (outer and inner) is high. It is the most common join method.

Merge scan join is used when:
– The numbers of qualifying rows of the inner and outer tables are large and the join predicates do not provide much filtering.
– The tables are large and have no indexes with matching columns.

Hybrid join is used when:
– A non-clustered index is available on the join column of the inner table and there are duplicate qualifying rows in the outer table.
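The first two join methods can be sketched in a few lines of Python to show why their costs differ. This is a simplified illustration, not DB2's implementation: rows are (join_key, value) tuples, the "index" on the inner table is a plain dictionary, and all names are hypothetical.

```python
from collections import defaultdict


def nested_loop_join(outer, inner_index):
    """For each outer row, probe the inner table through an index on the join column."""
    result = []
    for o_key, o_val in outer:
        for i_val in inner_index.get(o_key, []):
            result.append((o_key, o_val, i_val))
    return result


def merge_scan_join(outer, inner):
    """Sort both inputs on the join column, then advance through them together."""
    outer = sorted(outer)
    inner = sorted(inner)
    result = []
    i = 0
    for o_key, o_val in outer:
        # Skip inner rows whose key is smaller than the current outer key.
        while i < len(inner) and inner[i][0] < o_key:
            i += 1
        # Join every inner row that shares the current outer key.
        j = i
        while j < len(inner) and inner[j][0] == o_key:
            result.append((o_key, o_val, inner[j][1]))
            j += 1
    return result


outer_rows = [(10, "A"), (20, "B"), (20, "C")]
inner_rows = [(20, "x"), (10, "y"), (30, "z"), (20, "w")]

# Dictionary standing in for an index on the inner table's join column.
inner_index = defaultdict(list)
for key, val in inner_rows:
    inner_index[key].append(val)

nl_result = nested_loop_join(outer_rows, inner_index)
ms_result = merge_scan_join(outer_rows, inner_rows)
print(sorted(nl_result) == sorted(ms_result))
```

The nested loop version does one index probe per outer row, which is cheap when the outer table is small; the merge scan version pays for two sorts up front and then reads each input once, which suits large inputs with little filtering.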


Join Types & Join Predicate Considerations

– Provide accurate JOIN predicates.
– Avoid a JOIN without a predicate (Cartesian join).
– Join on indexed columns.
– Use joins over subqueries.
– When the results of a join must be sorted:
  • Limiting the ORDER BY to columns of a single table can sometimes avoid a sort.
  • Specifying ORDER BY columns from multiple tables involves a sort.
– Favor coding LEFT OUTER joins over RIGHT OUTER joins, as DB2 always converts RIGHT joins to LEFT joins before executing them.


Sub-Query Guidelines

– If there are efficient indexes available on the tables in the subquery, then a correlated subquery is likely to be the most efficient kind of subquery.

– If there are no efficient indexes available on the tables in the subquery, then a non-correlated subquery would likely perform better.

– If there are multiple subqueries in any parent query, make sure that the subqueries are ordered in the most efficient manner.


Techniques for Performance Improvement

Use OPTIMIZE FOR n ROWS
– DB2 assumes that only the stated number of rows will be retrieved by the query when choosing the access path. It is basically like giving a hint to the DB2 Optimizer.
– This does not stop the user from accessing the entire result set.
– This is not useful when DB2 has to gather the whole result set before returning the first n rows.
– With this clause, DB2 optimizes the query for quicker response.

Updating catalog tables
– If RUNSTATS is costly or cannot be executed, the catalog tables should be updated manually.

Enhanced Techniques for Performance Improvement


Influencing access path – Add extra Predicate
– DB2 evaluates the access path based on information available in the catalog tables.
– Wrong or unavailable catalog information may result in selection of the wrong access path.
– A wrong access path could be caused by a wrong index selection, or by index selection where a tablespace scan would be more effective.
– Code extra predicates or change the predicate to make DB2 use a different access path.
– Adding an extra predicate may also influence the selection of the join method.
– If you have an extra predicate, nested loop join may be selected, as DB2 assumes that more rows will be filtered out. The proper type of predicate to add is WHERE T1.C1 = T1.C1.
– Hybrid join is a costlier method. Outer joins do not use hybrid join. So if hybrid join is used by DB2, convert the inner join to an outer join and add extra predicates to remove unneeded rows.



General recommendations

Make sure that:
– The queries are as simple as possible.
– Unused rows are not fetched. Filtering is to be done by DB2, not in the application program.
– Unused columns are not selected.
– There is no unnecessary ORDER BY or GROUP BY clause.
– Page-level locking is used and lock duration is minimized.
– Mass updates are avoided.
– Indexable predicates are used wherever possible.
– Redundant predicates are not coded.
– The declared length of the host variable is not greater than the length attribute of the data column.
– If there are efficient indexes available on the tables in the subquery, a correlated subquery will perform better. Otherwise, a non-correlated subquery will perform better.
– If there are multiple subqueries, they are ordered in an efficient manner.

Summary

Filter Factor (FF)

The optimizer assigns a “Filter Factor” (FF) to each predicate or predicate combination.

– A number between 0 and 1 that provides the estimated filtering percentage. An FF of 0.25 means 25% of the rows are estimated to qualify.

– Calculated using available statistics from the catalog tables:
  • Column cardinality (COLCARDF)
  • HIGH2KEY/LOW2KEY
  • Frequency statistics (FREQUENCYF in SYSCOLDIST)

RUNSTATS

RUNSTATS is a DB2 utility which provides catalog statistics used by the optimizer and statistics related to the organization of an object (TS / TB / IX / CO)

Accurate Statistics are a critical factor for performance of the SQL.

Updates the DB2 catalog and reports the statistics.

Some catalog statistics updated by RUNSTATS for use by the optimizer can be manually updated with appropriate authorization (SYSADM).

Stats Used for Access Path Determination

SYSCOLDIST
– COLVALUE
– FREQUENCYF
– TYPE
– CARDF
– COLGROUPCOLNO
– NUMCOLUMNS

SYSCOLUMNS
– COLCARDF
– HIGH2KEY
– LOW2KEY

SYSINDEXES
– CLUSTERING
– CLUSTERRATIOF
– FIRSTKEYCARDF
– FULLKEYCARDF
– NLEAF
– NLEVELS

SYSINDEXPART
– LIMITKEY

SYSTABLES
– CARDF
– EDPROC
– NPAGES
– PCTROWCOMP

Stage 1 vs. Stage 2 Predicates

Stage 1 predicates may use an available Index. Stage 2 predicates cannot use any Index.

Wherever possible, prefer to use Stage 1 (Sargable) predicates in the where clause. These are conditions that can be evaluated in the Data Manager of DB2, before the results are passed to Relational Data System (RDS). The more conditions that can be evaluated early on, the more efficient data retrieval is.

Stage 1 – refers to the Data Manager (DM)
– A suitable index must exist!
– Reduces I/O from disk and buffer pool activity

Stage 2 – refers to the Relational Data System (RDS)

How does the optimizer calculate Filter Factors?

The lower the filter factor, the lower the cost and, in general, the more efficient the query will be.
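As a rough illustration of how catalog statistics turn into a filter factor, the sketch below uses the commonly documented default formulas for a uniformly distributed column: 1/COLCARDF for an equality predicate, and linear interpolation between LOW2KEY and HIGH2KEY for a range predicate. It is a simplification under those assumptions: the real optimizer also uses frequency statistics and other inputs, and the function names and sample statistics are hypothetical.

```python
def ff_equal(colcardf):
    """Filter factor for COL = value, assuming values are evenly distributed."""
    return 1.0 / colcardf


def ff_less_or_equal(value, low2key, high2key):
    """Filter factor for COL <= value, by linear interpolation between LOW2KEY and HIGH2KEY."""
    ff = (value - low2key) / (high2key - low2key)
    return min(1.0, max(0.0, ff))  # keep the result between 0 and 1


def estimated_rows(cardf, *filter_factors):
    """Rows expected to qualify when ANDed predicates are treated as independent."""
    rows = cardf
    for ff in filter_factors:
        rows *= ff
    return rows


# Hypothetical statistics: 1,000,000 rows, 50 distinct DEPT values,
# AMOUNT ranging from 0 (LOW2KEY) to 1000 (HIGH2KEY).
ff_dept = ff_equal(50)                       # DEPT = :hv
ff_amount = ff_less_or_equal(250, 0, 1000)   # AMOUNT <= 250
rows = estimated_rows(1_000_000, ff_dept, ff_amount)
print(ff_dept, ff_amount, rows)
```

With these numbers the equality predicate keeps an estimated 2% of the rows and the range predicate 25%, so the combined estimate is about 5,000 of the 1,000,000 rows.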

Explain

– A tool that shows the access path used by a query.
– Results of Explain are stored in the table PLAN_TABLE.
– Explain can be run for a single query outside a program or for all queries in a program.
– For all queries in a program: use the EXPLAIN(YES) parameter during BIND.
– A PLAN_TABLE must already exist, based on the OWNER parameter value of the BIND, or on the current SQLID for dynamic SQL.

[Figure: Sample Explain Table Output]

Explain can also be run against dynamic SQL:

DELETE FROM PLAN_TABLE WHERE QUERYNO = 999;

EXPLAIN PLAN SET QUERYNO = 999 FOR
<SELECT STATEMENT GOES HERE - USE ? IN PLACE OF HOST VARIABLES>;

SELECT * FROM PLAN_TABLE WHERE QUERYNO = 999 ORDER BY QBLOCKNO, PLANNO;

Don’t forget to Explain everything

Plan_Table is where all the tuning starts

Interpreting the Plan Table/Analyzing Access Paths

Non-Matching Index Scan (ACCESSTYPE = I and MATCHCOLS = 0)

Scans all leaf pages of the index selected by the optimizer, selecting one or more qualifying rows. The scan can be with or without data access.

The predicate does not match the leading columns of the index.

SELECT COUNT(*) FROM TABLEA

SELECT MAX(COL1) FROM TABLEA

SELECT COL1 FROM TABLEA WHERE COL2 = :HV

[Diagram: Non-Matching Index Scan. Root page, non-leaf pages 1 and 2, leaf pages 1 to 4]

Matching Index scan (MATCHCOLS > 0)

Scans one or more leaf pages of the index selected by the optimizer, selecting one or more qualifying rows. The index match is based on one or more key columns of the selected index. The scan can be with or without data access. The predicates match the leading columns of the index.

SELECT COL1 FROM TABLEA WHERE COL2 = :HV

SELECT COL2 FROM TABLEA WHERE COL1 = :HV (host variable length longer than COL1)
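Whether a scan is matching or non-matching depends on how the predicate columns line up with the leading columns of the index. The sketch below is a deliberately simplified model of that rule: it considers equality predicates only (range and IN-list predicates, which DB2 can also match, are ignored), and the index definitions are hypothetical.

```python
def matchcols(index_columns, equal_predicate_columns):
    """Count the leading index columns that have an equality predicate.

    Simplified model: counting stops at the first index column without
    an equality predicate; range and IN-list predicates are not modelled.
    """
    count = 0
    for col in index_columns:
        if col not in equal_predicate_columns:
            break
        count += 1
    return count


# WHERE COL2 = :HV against a hypothetical index on (COL1, COL2):
# the leading column COL1 has no predicate, so the scan is non-matching.
non_matching = matchcols(("COL1", "COL2"), {"COL2"})

# The same predicate against a hypothetical index on (COL2, COL1):
# the leading column is matched, so MATCHCOLS would be 1.
matching_one = matchcols(("COL2", "COL1"), {"COL2"})

# WHERE COL1 = :HV AND COL2 = :HV against the index on (COL1, COL2).
matching_two = matchcols(("COL1", "COL2"), {"COL1", "COL2"})

print(non_matching, matching_one, matching_two)
```

This is why the same statement can appear as a non-matching scan with one index and as a matching scan with another.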


[Diagram: Matching Index Scan. Root page, non-leaf pages 1 and 2, leaf pages 1 to 4, and the data pages they point to]


One Fetch Index Access (ACCESSTYPE = I1)

In certain circumstances this can be the most efficient access path in DB2. It may need to access only one leaf page, but may still need to traverse the index tree path.

Requires only one row be retrieved ( Min or Max column function)

SELECT MIN(COL1) FROM TABLEA

SELECT MIN(COL2) FROM TABLEA WHERE COL1 = :HV (will still be I1 BUT with matchcols = 1)


IN List Index Scan (ACCESSTYPE = N)

Scan one or more leaf pages of index selected by optimizer selecting one OR more qualifying rows.

Index match based on one or more key columns of selected index. At least one key column incorporates an IN list.

SELECT * FROM TABLEA WHERE COL1 = :HV AND COL2 IN ('A','B','C')

SELECT COL3 FROM TABLEA WHERE COL1 IN ('12345','56789') AND COL2 = :HV


Table-space scan (ACCESSTYPE = R)

Scan against partitioned tablespace or simple tablespace with one table scans all pages including pages which are empty or contain purely deleted rows.

Scan against simple tablespace containing more than one table includes scanning of tables within that tablespace not necessarily included in the query.

Scan against segmented tablespace includes only pages containing data.

SELECT * FROM TABLEA

SELECT * FROM TABLEA WHERE COL6 = 0

SELECT * FROM TABLEA WHERE COL1 <> :HV


[Diagram: Tablespace Scan. Data pages 1 to 4 are read in sequence]


DB2 I/O Assisted Mechanisms

Prefetch: To read data ahead in anticipation of its use. Prefetch can read up to 32 4K pages for applications, and up to 64 4K pages for utilities.

Sequential Prefetch: In DB2 UDB for OS/390, a mechanism that triggers consecutive asynchronous I/O operations. Pages are fetched before they are required, and several pages are read with a single I/O operation. This action is determined at bind time and can be detected by a value of “S” in the PREFETCH column of the plan table. If index and data are both required for the SQL, prefetch can occur for both object types.

Dynamic Prefetch: Using the same approach as sequential prefetch, the mechanism is triggered at run time if DB2 detects that access to the index and/or data pages is sequential in nature but the pages are distributed in a nonconsecutive manner.

List Prefetch: An access method that takes advantage of prefetching even in queries that do not access data sequentially. This is done by scanning the index and collecting RIDs in advance of accessing any data pages. These RIDs are then sorted in page number order, and then data is prefetched using this list.
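The list prefetch steps described above (collect RIDs from the index, sort them by page number, then read the pages in batches) can be sketched as follows. This is an illustrative model only: a RID is represented as a hypothetical (page_number, slot) tuple, the index as a list of (key, RID) pairs, and the default batch size of 32 pages simply reuses the application prefetch quantity quoted above.

```python
def list_prefetch_plan(index_entries, low, high, pages_per_io=32):
    """Return the sorted RID list and the page batches that would be prefetched."""
    # Step 1: scan the index and collect the RIDs of qualifying keys.
    rids = [rid for key, rid in index_entries if low <= key <= high]
    # Step 2: sort the RIDs in page-number order.
    rids.sort()
    # Step 3: read each needed page once, several pages per I/O.
    pages = sorted({page for page, _slot in rids})
    batches = [pages[i:i + pages_per_io] for i in range(0, len(pages), pages_per_io)]
    return rids, batches


# Index entries in key order; the data pages they point to are scattered.
index_entries = [
    (5, (9, 1)),
    (6, (2, 3)),
    (7, (9, 0)),
    (8, (4, 2)),
    (20, (1, 1)),
]

# A tiny batch size is used here only to make the grouping visible.
rids, batches = list_prefetch_plan(index_entries, low=5, high=8, pages_per_io=2)
print(rids)
print(batches)
```

Without the sort, the four qualifying rows would be fetched in key order (pages 9, 2, 9, 4), visiting page 9 twice; after the sort, each of the three pages is read once and in ascending order.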

DB2 Explain Columns

QUERY Number (QUERYNO) – Identifies the SQL statement in the PLAN_TABLE (any number you assign; the example uses the numeric part of the userid).

BLOCK (QBLOCKNO) – Query block within the query number, where 1 is the top-level SELECT. Subselects, unions, materialized views, and nested table expressions will show multiple query blocks. Each QBLOCK has its own access path.

PLAN (PLANNO) – Indicates the order in which the tables will be accessed.

METHOD – Shows which JOIN technique was used:
– 00: First table accessed, continuation of previous table accessed, or not used.
– 01: Nested Loop Join. For each row of the present composite table, matching rows of a new table are found and joined.
– 02: Merge Scan Join. The present composite table and the new table are scanned in the order of the join columns, and matching rows are joined.
– 03: Sorts needed by ORDER BY, GROUP BY, SELECT DISTINCT, UNION, a quantified predicate, or an IN predicate. This step does not access a new table.
– 04: Hybrid Join. The current composite table is scanned in the order of the join-column rows of the new table. The new table is accessed using list prefetch.

TNAME – Name of the table whose access this row refers to. Either a table in the FROM clause, or a materialized VIEW name.

TYPE (ACCESSTYPE) – Indicates whether an index was chosen:
– I = index
– I1 = one-fetch index scan
– N = index using IN list
– R = tablespace scan (reads every data page of the table once)
– M = multiple index scan
– MX = names one of the indexes used
– MI = intersect multiple indexes
– MU = union multiple indexes

MC (MATCHCOLS) – Number of columns of the matching index scan.

ANAME (ACCESSNAME) – Name of the index.

IO (INDEXONLY) – Y = index alone satisfies the data request; N = table must be accessed also.

Sort flags – There are eight sort flags, in two groups of four. Each flag indicates why a sort is necessary. Usually, a sort will cause the statement to run longer.
– UNIQ: DISTINCT option or UNION was part of the query, or IN list for subselect
– JOIN: sort for join
– ORDERBY: ORDER BY option was part of the query
– GROUPBY: GROUP BY option was part of the query

Sort flags for 'new' (inner) tables:
– SNU (SORTN_UNIQ): Y = remove duplicates, N = no sort
– SNJ (SORTN_JOIN): Y = sort table for join, N = no sort
– SNO (SORTN_ORDERBY): Y = sort for order by, N = no sort
– SNG (SORTN_GROUPBY): Y = sort for group by, N = no sort

Sort flags for 'composite' (outer) tables:
– SCU (SORTC_UNIQ): Y = remove duplicates, N = no sort
– SCJ (SORTC_JOIN): Y = sort table for join, N = no sort
– SCO (SORTC_ORDERBY): Y = sort for order by, N = no sort
– SCG (SORTC_GROUPBY): Y = sort for group by, N = no sort

PF (PREFETCH) – Indicates whether data pages were read in advance by prefetch:
– S = pure sequential prefetch
– L = prefetch through a RID list
– Blank = unknown, or not applicable

MIXOPSEQ – The sequence number of a step in a multiple index operation.

PAGE_RANGE – Whether the table qualifies for page range screening, so that plans scan only the partitions that are needed. Y = yes; blank = no.

COLUMN_FN_EVAL – When an SQL aggregate function is evaluated. R = while the data is being read from the table or index; S = while performing a sort to satisfy a GROUP BY clause; blank = after data retrieval and after any sorts.

QBLOCK_TYPE – For each query block, an indication of the type of SQL operation performed.

JOIN_TYPE – The type of join:
– F = FULL OUTER JOIN
– L = LEFT OUTER JOIN
– S = STAR JOIN
– blank = INNER JOIN or no join
A RIGHT OUTER JOIN is converted to a LEFT OUTER JOIN when you use it, so that JOIN_TYPE contains L.
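The column codes described above can be combined into a one-line reading of each PLAN_TABLE row. The helper below is a hypothetical convenience, not part of DB2 or any BMC tool: it assumes each row has already been fetched into a dictionary keyed by the PLAN_TABLE column names METHOD, ACCESSTYPE, MATCHCOLS, INDEXONLY and PREFETCH, and it covers only the codes listed in this section.

```python
# Code tables taken from the column descriptions in this section.
ACCESS_TYPES = {
    "I": "index scan",
    "I1": "one-fetch index scan",
    "N": "index scan using IN list",
    "R": "tablespace scan",
    "M": "multiple index scan",
    "MX": "index used in a multiple index scan",
    "MI": "intersection of multiple indexes",
    "MU": "union of multiple indexes",
}
JOIN_METHODS = {
    0: "first table accessed",
    1: "nested loop join",
    2: "merge scan join",
    3: "sort step",
    4: "hybrid join",
}
PREFETCH = {"S": "sequential prefetch", "L": "list prefetch"}


def describe_step(row):
    """Turn one PLAN_TABLE row (as a dict) into a short description."""
    access = row["ACCESSTYPE"]
    access_text = ACCESS_TYPES.get(access, "unrecognized access type " + repr(access))
    if access == "I":
        kind = "matching" if row["MATCHCOLS"] > 0 else "non-matching"
        access_text = f"{kind} {access_text} (MATCHCOLS={row['MATCHCOLS']})"
    parts = [JOIN_METHODS.get(row["METHOD"], "unknown method"), access_text]
    if row.get("INDEXONLY") == "Y":
        parts.append("index-only")
    prefetch = PREFETCH.get(row.get("PREFETCH", "").strip())
    if prefetch:
        parts.append(prefetch)
    return "; ".join(parts)


# Two made-up rows for illustration.
step1 = describe_step(
    {"METHOD": 0, "ACCESSTYPE": "I", "MATCHCOLS": 0, "INDEXONLY": "Y", "PREFETCH": "S"}
)
step2 = describe_step(
    {"METHOD": 1, "ACCESSTYPE": "R", "MATCHCOLS": 0, "INDEXONLY": "N", "PREFETCH": " "}
)
print(step1)
print(step2)
```

Reading rows in QBLOCKNO, PLANNO order with a helper like this gives the sequence of access steps for a statement.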

EXPLAIN Statements with examples.doc

Performance Tools Overview
– BMC APPTUNE
– BMC SQL EXPLORER

BMC APPTUNE
– Use Option 4 - Performance Products
– Use Option Q - Apptune and Index components
– Option 1 - SQL Workload

Setting Options in BMC APPTUNE
– Use Workload Analysis
– Choose 6. Data source and 5. Time interval

Viewing Reports in APPTUNE
– Use various options to generate reports
– Reports generated for programs

Viewing SQLs in APPTUNE
– Use Option S to show SQLs
– Use Option X to EXPLAIN SQLs

Example of EXPLAIN Result in BMC APPTUNE
– Cost calculated by optimizer
– Matching index scan performed
– Matching columns used by index
– Table & index names used by access path

BMC SQL EXPLORER
– Use Option S - SQL Explorer
– Use Option 1 - Explain

Setting Options in BMC SQL EXPLORER
– Plans, packages or DBRMs can be analyzed
– Package options
– Analysis run in batch mode

More references

\BMC SQL EXPLORER.doct

steps to get to Apptune.doc

Run-through of an Actual SQL Tuning Exercise

Set up Development Environment
– Use Option 7 - Migrate Access Path Statistics

Example of the SQL Tuning Process - Development

Step 1.3: Import Statistics From Production to Development

Step 2: Identification of Problem SQL – Identify problem SQL

– SQL statement being analysed.
– Tool warns that cardinality is missing.
– Predicate mismatch is also detected.


Step 2: Identification of Problem SQL – Check SQL Best Practices

– No tool is available for checking best practices. This needs to be checked manually using the SQL Best Practices document already published.
– A snippet of the related best practice from the SQL Guidelines document.


Step 3: SQL Optimization – SQL Rewrite

– No tool is available to automatically rewrite SQL statements. The SQL needs to be rewritten manually, and the subsequent steps for checking the new access path then have to be performed.


Step 3: SQL Optimization – Compare Access paths

– Access paths can be compared.
– Notice the change in the estimated indicative cost. A different index is being used now.


Bibliography

Redbooks at www.redbooks.ibm.com

DB2 UDB for z/OS V8 Everything you ever wanted to know… SG24-6079

DB2 UDB for z/OS V8 Performance Topics SG24-6465

DB2 for z/OS Application Design for High Performance and Availability SG24-7134 10/05

DB2 UDB for Z/OS V8 Application Programming and SQL Guide

SQL Tuning Best Practices & Guidelines Document

In the IM Project & Document Database Process Document section

1) Database 'IM Project and Document Database'

2) Select the ‘Process Document’ Section

3) Select ‘By Process Category’

4) Select ‘Best Practices’

5) View ‘Table of Contents '

6) Select document 'Database Access - SQL Tuning Best Practice & Guidelines’
