© IBM Corporation 2005 Informix User Forum 2005 John F. Miller III Explaining SQLEXPLAIN ®

© IBM Corporation 2005

Informix User Forum 2005

John F. Miller III

Explaining SQLEXPLAIN

®


Overview• How to view optimizer decision

– How to enable the optimizer explain file– What parts of the optimizer explain are always

enabled• Overview of operators• Different sections of the optimizer explain file• Controlling the Optimizer


How to Enable the Optimizer Explain

• Enabling and Disabling the Optimizer Explain file– Turning on in application– Dynamically enabling– Enabling statistics


Enable in the Application• Enable in the application

– SET EXPLAIN ON– SET EXPLAIN ON AVOID_EXECUTE– SET EXPLAIN OFF– SET EXPLAIN STATISTICS

• Location of the file– UNIX

• File name sqexplain.out• The current directory if you have a local client• Home directory on remote host if the client is remote

– Windows• %INFORMIXDIR%\sqexpln\{username}.out


Dynamically Enabling the Explain File

• Enable using onmode -Y– onmode –Y {session id} 1

– onmode –Y {session id} 0

• Can not turn on SQL statistics dynamically• Only queries optimized after explain has been enabled

will be displayed in the explain file.• Explain output displayed each time a query is

optimized


View Real Time Query Statistics

onstat –g pqs {sid}– Viewing current progress of query

– Real time data

– Always enabled

– Only while query is executing

Query Operators :Opaddr opname phase rows time in1 in2-----------------------------------------------------------------b444d60 scan done 52 0:0:0 0 0b45cd60 scan done 4 0:0:0 0 0b45e2f0 join done 92 0:0:0 b444d60 b45cd60b460d60 scan done 13 0:0:0 0 0b45e208 join done 795 0:0:0 b45e2f0 b460d60


View Real Time Query StatisticsQuery Operators :Opaddr opname phase rows time in1 in2-----------------------------------------------------------------b444d60 scan done 52 0:0:0 0 0b45cd60 scan done 4 0:0:0 0 0b45e2f0 join done 92 0:0:0 b444d60 b45cd60b460d60 scan done 13 0:0:0 0 0b45e208 join done 795 0:0:0 b45e2f0 b460d60

Scan Scan

ScanJoin

Join


Sysmaster Real Time Optimizer Information

• sysmaster:syssqexplain table–Optimizer plan information always present–Profile information about the query requires

•Environment variable SQLSTATS = 1

–Both a client and server environment variable

–Set before the server starts; all sessions track information

–Set before starting the client; only information for that client will be tracked

–Query Timing and Profile information•Environment variable SQLSTATS = 2


Sysmaster Syssqexplain Column Information

– Columns that are always enabled– sqx_sessionid sqx_sdbno sqx_iscurrent

sqx_conbno sqx_ismain sqx_selflagsqx_estcost sqx_estrows sqx_seqscan sqx_srtscan sqx_autoindex sqx_index

sqx_remsql sqx_mrgjoinsqx_dynhashjoin sqx_keyonly sqx_tempfile

sqx_tempview sqx_secthreads sqx_sqlstatement

– Columns which require SQLSTATS >= 1 – sqx_bufreads sqx_pagereads sqx_bufwrites

sqx_pagewrites sqx_totsorts sqx_dsksortssqx_sortspmax

– Columns which required SQLSTATS = 2 – sqx_executions sqx_cumtime


Major Operator Types• Scan Operators

– The actual operation of reading data.

• Join Operators– Used when a query involves more than one

table– Rows from one table are matched to rows

of the second table

• Filters– The search criteria of a query used to

determine which rows are to be selected by the query.


Scan Types

• Sequential Table Scan– Data rows are read in

sequential order as they exist in the table.

• Index Scan– Data in a table is accessed

through an index.

• ROWID Scan– The data rows physical

location is given and read directly

• Skip Duplicate Index Scan– Special type of index scan in

which duplicate rows are skipped


Scan Types (con’t)• First Row Scan

– Data is scanned until the first row satisfies the search condition is found, then the scan is terminated

• Auto Index Path– Add an index on the fly if one does not exist on a

required column

• Key-Only Scan– If all columns required by the query are available

in the index, then the data row is not retrieved.


Join Types• Nested Loop Join

– Scans the inner table once for every row in the outer table.

– The inner table can use either an index or table scan.

• Dynamic Hash Join– Scans the smaller table and builds a hash table by

applying a hash function on the join key. – The larger table is passed by the hash function to see

if a match occurs.

• Cartesian Product Join– All rows of the inner table with all rows of the outer

table.


Filters• Lower-index filter

– A single filter or set of filters which provide a lower bound or starting position for an index scan.

– Example: column >= 10

Start scanning the ascending index from value 10

• Upper-Index Filter– A single filter or set of filters which

provide an upper bound or stopping position for an index scan


Filters (continued)

• Key-First Filter– Index key filters are applied first to see if the

corresponding data row needs to be retrieved.

– Example: SELECT * FROM tab where c1>=1 and c2=2

If an index exists on c1 and c2 then the filter c2=2 will be applied to each index key first. Only if this is satisfied will the corresponding data row be fetched

– C1>=1 is used as a lower index filter to start the index scan


Five Sections of the SQEXPLAIN File

• Overview• Access Order• Detailed

Access Plan • UDR info• Optimizer

Statistics (optional)

QUERY-----SELECT tab1.c2 FROM tab1,tab2 WHERE tab1.c2 = tab2.c2 AND tab1.c2 MATCHES

"systable*"Estimated Cost: 321Estimated # of Rows Returned: 148

1) miller3.tab1: SEQUENTIAL SCAN

Filters: miller3.tab1.c2 MATCHES 'systable*'

2) miller3.tab2: INDEX PATH

(1) Index Keys: c2 (Key-Only) (Serial, fragments: ALL)

Lower Index Filter: miller3.tab2.c2 MATCHES 'systable*'

DYNAMIC HASH JOIN Dynamic Hash Filters: miller3.tab1.c2 =

miller3.tab2.c2


Overview Section• The SQL statement for the query• An estimate of the query cost in units the

optimizer uses to compare plans• An estimate for the number of rows that the

query is expected to return• Maximum number of threads used to execute the

query if PDQPRIORITY is set.• Any temporary files required for executing order

by and/or group by• Directives followed or ignored


Access Order

• The tables are listed in the order in which they are accessed. – Top of the list is the first table accessed

• For each table the following information is listed:– The accessed table’s name

– The access plan by which the server reads the table

• Sequential scan, Index path,…

• The active fragments to be scanned.

– For each pair of tables the join plan is listed• Nested loop join, dynamic hash join


Detailed Access Plan

• This section is repeated for each table in the Access Section.

• Depending on the type of access different information is displayed– Sequential Scans

• Filters to be applied

– Index and auto-index Scans• The index keys on which filters are applied• Key-Only, Aggregate and/or Key-First • Lower index filter• Upper index filter

• Key-First Filters


UDR Info

• Information about UDRs used in the query.– UDR Name

– UDR id• The UDR id is given because UDRs can have the

same name but take different arguments.

Example:UDRs in query:-------------- UDR id : 234 UDR name: x


Optimizer Statistics• A list of the tables and their internal names• For each operator

– Operator type– Internal table name– Rows produced by the operator– Optimizer predicted rows produce by the

operator– Rows scanned– Estimated cost of this operator


Example of Optimizer StatisticsQUERYSELECT tab1.c2 FROM tab1,tab2 WHERE tab1.c2 = tab2.c2 AND tab1.c2 MATCHES "systable*"Estimated Cost: 321Estimated # of Rows Returned: 148

1) miller3.tab1: SEQUENTIAL SCAN

Filters: miller3.tab1.c2 MATCHES 'systable*'

2) miller3.tab2: INDEX PATH

(1) Index Keys: c2 (Key-Only) (Serial, fragments: ALL) Lower Index Filter: miller3.tab2.c2 MATCHES

'systable*'

DYNAMIC HASH JOIN Dynamic Hash Filters: miller3.tab1.c2 = miller3.tab2.c2


Optimizer Statistics

Table map :----------------------------Internal name Table name----------------------------t1 tab1t2 tab2

type table rows_prod est_rows time rows_scan est_cost-------------------------------------------------------------------------------scan t1 92 92 0:0:0 5244 282

type table rows_prod est_rows time rows_scan est_cost-------------------------------------------------------------------------------scan t2 92 92 0:0:0 92 5

type time rows_bld rows_prb novrflo est_cost-------------------------------------------------------------------------------hjoin 8464 148 0:0:0 92 92 0 321

est_rowsrows_prod

Optimizer predictions

Actual Numbers


Examining the Running QueryNo Statistics VS Medium Statistics

No StatisticsQUERY:------select * from t1 where c1 > 20200

Estimated Cost: 20888Estimated # of Rows Returned: 6760

1) miller3.t1: SEQUENTIAL SCAN Filters: miller3.t1.c1 > 20200

No StatisticsQUERY:------select * from t1 where c1 > 20200


1) miller3.t1: SEQUENTIAL SCAN Filters: miller3.t1.c1 > 20200

Medium StatisticsQUERY:------select * from t1 where c1 > 20200


1) miller3.t1: INDEX PATH (1) Index Keys: c1 (Serial, fragments: ALL) Lower Index Filter: t1.c1 > 20250



1) miller3.t1: INDEX PATH (1) Index Keys: c1 (Serial, fragments: ALL) Lower Index Filter: t1.c1 > 20250

Total Row ReturnedActual 30 estimated 6760 for no statistics 19 for medium

Access MethodChanged from Sequential Scan to Index Path

Estimated CostHuge drop in Estimated Cost

Overall performance improved The estimates were more accurate

The query plan changed


Examining the Running QueryMedium Statistics VS High Statistics

High StatisticsQUERY:------select * from t1 where c1 > 20200


1) miller3.t1: INDEX PATH (1) Index Keys: c1 Lower Index Filter: t1.c1 > 20250

High StatisticsQUERY:------select * from t1 where c1 > 20200



Total Row ReturnedActual 30, estimated 19 for medium statistics, 30 for high statisticsThe estimated from medium is fairly close

Access Method No change in Access MethodEstimated CostNo significant difference

Overall performance did not change The estimates were slightly more accurate

The query plan did not change








Controlling the Optimizer• External Directives

– Allows DBA to modify the query plan w/o changing the application

– Directives not required to be in the application

– To create external directives use the SAVE EXTERNAL DIRECTIVE statement

• SET OPTIMIZATION– LOW, HIGH,

– FIRST_ROWS, ALL_ROWS

• Update Statistics– Statistics and distributions

• Directives– Positive and negative directives

• Other command and/or environment variables– OPTCOMIND, PDQPRIORITY, IFX_DW_OPT


Other Information

• An Overview of the IBM Informix Dynamic Server Optimizer

– http://www.ibm.com/developerworks/db2/zones/informix/library/techarticle/0211desai/0211desai.html

• Understanding and Tuning Update Statistics – http://www-.ibm.com/developerworks/db2/zones/informix/library/techarticle/miller/0203m

iller.html

• Predicate Inference in Informix Dynamic Server – http://www.ibm.com/developerworks/db2/zones/informix/library/techarticle/0206goswami/

0206goswami.html

• IBM Informix Performance Manual• IBM Informix SQL Reference Manual


Questions

Documents

© IBM Corporation 2005 Informix User Forum 2005 John F. Miller III Explaining SQLEXPLAIN ®