Thomas+Niewel+ +Oracletuning

Monitoring and Tuning Oracle for z/OS andOracle for z/Linux

Thomas NiewelOracle Deutschland GmbH

[email protected]

Page 3

Agenda

• Tuning Why ?

• Reasons for bad Response Time

• Statspack

• Diagnosing reasons for bad response Times

• SQL Tuning– TKPROF

– Explain Plan

• WLM

Page 4

Why do we need to tune ?

• Users report „bad“ response times because of

– CPU Time + Wait Time

– Poor performing queries

– SQL-Tuning

– „bad“ Database parameters

– Bottlenecks in „System“ (Operating System, WLM, IO/Subsystem etc.)

Page 5

What can be the reasons for “bad” Response Time

• High CPU Usage

• High I/O Usage

• Memory Usage

• Network problems

• „idle“ System

• Operating System (WLM, VM)

Page 6

Diagnose from the Oracle point of view

Statspack

A short overview

Page 7

Statspack – a short overview

spcreate.sql - installs Statspack (run only once)

statspack.snap - data capture (procedure)

spreport.sql - reporting

spdoc.txt - user documentation

sppurge.sql - delete Statspack data

spdrop.sql - drop Statspack

Page 8

Capturing data

• Prerequisite: timed_statistics=true

• Use stored procedure statspack.snap

SQL> execute statspack.snap;

Page 9

Capturing data

• Get a baseline for future comparisons

• Capture snapshots– across peak load– across batch window – The time between snapshots should be <= 30 minutes

• Capture can be automated– Use OS utility e.g. cron– Use dbms_job

– spauto.sql shipped as example

Page 10

Reporting with Statspack

• All data is held in an Oracle database

• Report between two or more snapshots– cannot report across instance startup

• Spreport.sql creates a report

Page 11

Reporting with Statspack

SQL> @spreport DB Id DB Name Instance# Instance ----------- ---------- ---------- ---------- 1361567071 DB21 1 MAIL

Completed Snapshots

Instance DB Name SnapId Snap Started Snap Level ---------- ---------- ------ ---------------------- ---------- DB21 DB21 1 17 Aug 2003 10:00:16 5 2 17 Aug 2003 10:30:28 5 Enter beginning Snap Id: 1 Enter ending Snap Id: 2

Enter name of output file [sp_1_2] : <enter name or return>

Page 12

Analyzing a Statspack report

• Top down analysis

• Summary page

– Enviroment – Load profile– Instance efficiency– Shared pool usage– Top 5 Timed Events

• Top SQL

Page 13

Environment section

STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host

------------ ----------- ------------ -------- ----------- ------- ------------

RECONPRD 1403107896 RECONPRD 1 9.2.0.2.0 NO lin390t1

Snap Id Snap Time Sessions Curs/Sess Comment

------- ------------------ -------- --------- -------------------

Begin Snap: 2 03-Mar-03 11:28:01 10 5.1

End Snap: 31 04-Mar-03 11:58:04 17 5.5

Elapsed: 30.05 (mins)

Cache Sizes (end)

~~~~~~~~~~~~~~~~~

Buffer Cache: 256M Std Block Size: 16K

Shared Pool Size: 48M Log Buffer: 128K

Page 14

Load profile

• Contains a number of common ratios

• Allows characterisation of the application

• Can point to problems– high hard parse rate

– high IO rate

– high login rate

Page 15

Load profile

• Useful if you have a comparable baseline

• What has changed?

– txn/sec change implies changed workload

– redo size/txn implies changed transaction mix

– physical reads/txn implies changed SQL or plan

Page 16

Load profile

Load Profile~~~~~~~~~~~~ Per Second Per Transaction --------------- --------------- Redo size: 19,057.68 20,937.67 Logical reads: 2,408.15 2,645.70 Block changes: 98.64 108.37 Physical reads: 990.47 1,088.18 Physical writes: 6.92 7.61 User calls: 76.40 83.93 Parses: 7.08 7.78 Hard parses: 0.02 0.02 Sorts: 29.22 32.10 Logons: 24.73 27.17 Executes: 63.79 70.08 Transactions: 0.91

% Blocks changed per Read: 4.10 Recursive Call %: 72.76 Rollback per transaction %: 36.52 Rows per Sort: 153.46

Page 17

Instance Efficiency

• Gives an overview of how the instance is performing

• Can also be used with a comparable baseline

• Shared pool Statistics allow quick identification of cursor sharing problems

Page 18

Instance Efficiency

Instance Efficiency Percentages (Target 100%)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Buffer Nowait %: 99.99 Redo NoWait %: 99.97 Buffer Hit %: 59.00 In-memory Sort %: 99.99 Library Hit %: 99.94 Soft Parse %: 99.69 Execute to Parse %: 88.89 Latch Hit %: 99.98Parse CPU to Parse Elapsd %: 56.55 % Non-Parse CPU: 99.93

Shared Pool Statistics Begin End ------ ------ Memory Usage %: 38.86 66.81 % SQL with executions>1: 43.41 87.22 % Memory for SQL w/exec>1: 39.28 80.21

Page 19

Top 5 Timed Events

• CPU time – real work

• Shows where Oracle sessions are waiting

• Compare Wait Time to elapsed time

• % Total Wait Time shows potential benefits

• Use as basis for directed drilldown

% TotalEvent Waits Time (s) Ela Time------------------------------- ------------ ----------- --------CPU time 78,588 50.24enqueue 1,560,523 59,961 38.33db file sequential read 1,635,253 6,324 4.04db file scattered read 14,620,725 5,907 3.78control file parallel write 32,816 1,396 .89

Page 20

Top 5 Timed Events

• Sample drilldowns

– CPU Time „on CPU“

– enqueue e.g TX Enqueue

– db file sequential read

Index Access

– db file scattered read Scan Operationscontrol file parallel write

Page 21

Top SQL

• Helps to find problem statements

– SQL ordered by Gets

– SQL ordered by Reads

– SQL ordered by Executions

– SQL ordered by Parse Calls

Page 22

Top SQL

CPU Elapsd

Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value

--------------- ------------ -------------- ------ -------- --------- ----------

79,562,398 8,114 9,805.6 34.6 27182.71 28127.71 1525844323

Module: SQL*Plus

SELECT MAX(STMT_BKG_DATE_CLOSE) FROM GAH_T_STATEMENTS WHERE S

TMT_ACCT_ID = :b1 AND ((:b2 = :b3 AND STMT_CARRIER != :b4 AND

STMT_MSG_TYPE != :b5 AND (:b6 IS NULL OR :b6 = STMT_CARRIER )

AND ((:b8 IS NULL AND STMT_MSG_TYPE != :b9 ) OR (:b8 IS NOT NU

LL AND :b8 = STMT_MSG_TYPE ))) OR (:b2 = :b13 AND STMT_CARRIER

Page 23

I/O Statistics

• Help to find I/O Problems

– Tablespace IO Stats

– File IO Stats

Page 24

I/O Statistics

Tablespace

------------------------------

Av Av Av Av Buffer Av Buf

Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)

-------------- ------- ------ ------- ------------ -------- ---------- ------

GAH_TS00_DT_MEDIUM

15,242,896 160 0.4 6.1 41,066 0 22,468 18.4

GAH_TS00_IX_ITEM

210,346 2 11.2 1.0 130,299 1 9 15.6

GAH_TS00_IX_MEDIUM

207,433 2 6.9 1.0 86,699 1 39 43.8

RECONPRD_TS00_TEMP

185,865 2 1.7 1.6 101,560 1 0 0.0

GAH_TS00_IX_ITEM_REF

155,027 2 8.4 1.0 34,867 0 1 0.0

Page 25

Diagnosing high CPU usage

• High CPU Usage

• High I/O utilization

• Memory Usage




Page 26

Diagnosing high CPU usage-Operating System-

• Linux/390– sar -u 3 3333

– iostat -x 3

– vmstat 3

– top

– Etc.

• Z/OS– SDSF

– RMF

– Omegamon

– etc.

Page 27


• What can be the reason for „high CPU“ Usage ?

– Shared_Pool / SQL-Cache

– db_file_multiblock_read_count

– Buffer_Cache/ Buffer_Pool

– How can Statements with a great # of buffergets be seperated ?

– Statspack– SQL Script

Page 28


CPU Elapsd

Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value

--------------- ------------ -------------- ------ -------- --------- ----------

4,494,662 155 28,997.8 2.0 1049.63 2414.11 3961361411

SELECT * FROM GAH_T_STATEMENTS WHERE STMT_ACCT_ID = :b1 AND

((:b2 = :b3 AND STMT_CARRIER != :b4 AND STMT_MSG_TYPE != :b5

AND (:b6 IS NULL OR :b6 = STMT_CARRIER ) AND ((:b8 IS NULL AND

STMT_MSG_TYPE != :b9 ) OR (:b8 IS NOT NULL AND :b8 = STMT_MSG_

TYPE ))) OR (:b2 = :b13 AND STMT_CARRIER = :b14 AND STMT_MSG_T

Module: SQL*Plus

Page 29


spool cpu_users.lst

select buffer_gets,disk_reads,executions,ratio_to_report(buffer_gets) over () * 100 buffer_ratio,ratio_to_report(disk_reads) over () * 100 disk_ratio,sql_text from v$sqlareaorder by buffer_ratio desc;

spool off

Page 30


BUFFER_GETS DISK_READS EXECUTIONS BUFFER_RATIO DISK_RATIO

----------- ---------- ---------- ------------ ----------

SQL_TEXT

----------------------------------------------------------------------------------------

19564429 154 46908 65.9945773 5.40350877

select t.schema, t.name, t.flags, q.name from system.aq$_queue_tables t, ys.aq$_queue_table_affinities aft, system.aq$_queues q where aft.table_objno = t.objno and aft.owner_instance = :1 and q.table_objno = t.objno and q.usage = 0 and bitand(t.flags, 4+16+32+64+128+256) = 0 for update of t.name, aft.table_objno skip locked

Page 31

SQL Tuning

• Check Object Statsitics

– Use DBMS_STATS

• Analyze Execution Plan

– Explain Query / V$SQL_PLAN

– Optimize Query

– Optimize Indexes

– Index Only Access, Function Based Indexes

Page 32

Diagnose

• High CPU Usage


• Memory Usage




Page 33

High I/O utilization

• Linux/390– sar -d 3 33333

– iostat -x 3

– vmstat 3

• Z/OS– RMF

– Omegamon etc

Page 34

High I/O utilization• Disk I/O

– Disk access is slower than memory access (Factor 5000 to 100000)

– One physical disk is able to perform 100-150 I/O´s per Second

– Disk Reponse Times (Read operations)

– 2ms (Read from disk cache)

– 10ms – 15ms (Physical Reads)

Page 35


• Reasons for High I/O utilization

– Database Cache too small (DB_CACHE_SIZE)

– Sortarea too small (sort_area_size)

– Hasharea too small (hash_area_size)

– Too many Checkpoints

– Ineffective Execution Plans (e.g. Full-Table-Scans which are not necessary)

Page 36


• Increase Cache Size

– Reduces physical I/O Operations

– Z/OS

– Limited by 31 Bit Arcitecture

– Multiple Adress Spaces help to improve the Memory management

Page 37


Single Shared SGA Across Address Spaces

AS1 AS2 AS3 ASn

• An Oracle server instance has a single SGA regardless of the number of address spaces or regions configured.

• The user context is distributed across all AS

Page 38


• Linux/390

– The default maximum SGA size on Linux/390 is 750 MB without changing the base adress

– the maximum SGA size to 1 GB by changing the SGA base address

Page 39

% Total

Event Waits Time (s) Wt Time

-------------------------------------------- ------------ ------------ -------

db file sequential read 89,086,819 11,009 93.13

db file scattered read 9,875,076 776 6.56

file open 505,227 23 .19

log file sync 440,409 8 .07

latch free 11,042,510 3 .03


Top 5 Timed Events

Page 40

Tablespace

Av Av Av Av Buffer Av Buf

Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)

-------------- ------- ------ ------- ------------ -------- ---------- ------

RECEIVABLE_T_01

18,398,460 213 12.0 1.6 59,325 1 4,892,686 0.0

SO_T_03

6,827,475 79 13.2 1.6 27,462 0 4,506 0.0

SO_I_01

5,356,393 62 9.0 1.3 18,388 0 35,935 0.0

PO_I_01

4,641,732 1400 21.7 1.8 72,563 1 217,799 0.0

Tablespace I/O Stats:High I/O utilization

Page 41

D I R E C T A C C E S S D E V I C E A C T I V I T Y

DEVICE AVG AVG AVG AVG AVG AVG AVG AVG % % % AVG % % STORAGE DEV DEVICE VOLUME LCU ACTIVITY RESP IOSQ DPB CUB DB PEND DISC CONN DEV DEV DEV NUMBER ANY MT GROUP NUM TYPE SERIAL RATE TIME TIME DLY DLY DLY TIME TIME TIME CONN UTIL RESV ALLOC ALLOC PEND DBORACLE 7651 33903 LEOR00 008F 0.817 4 0 0.0 0.0 0.0 0.2 2.6 0.8 0.06 0.28 0.0 1.0 100.0 0.0 DBORACLE 7652 33903 LEOR01 008F 0.878 9 0 0.0 0.0 0.0 0.2 0.3 8.7 0.76 0.79 0.0 3.0 100.0 0.0 DBORACLE 7653 33903 LEOR02 008F 0.502 2 0 0.0 0.0 0.0 0.2 0.0 1.5 0.08 0.08 0.0 6.0 100.0 0.0 DBORACLE 7654 33903 LEOR03 008F 108.968 56 52 0.0 0.0 0.0 0.2 2.4 0.8 0.08 0.32 0.0 1.0 100.0 0.0 DBORACLE 7655 33903 LEOR04 008F 0.828 3 0 0.0 0.0 0.0 0.2 2.3 0.8 0.06 0.25 0.0 1.0 100.0 0.0 DBORACLE 7656 33903 LEOR05 008F 98.779 50 48 0.0 0.0 0.0 0.2 1.7 0.8 0.13 0.42 0.0 1.0 100.0 0.0 DBORACLE 7657 33903 LEOR06 008F 2.768 2 0 0.0 0.0 0.0 0.3 1.3 0.7 0.20 0.56 0.0 1.0 100.0 0.0 DBORACLE 7658 33903 LEOR07 008F 0.943 3 0 0.0 0.0 0.0 0.2 2.3 0.7 0.07 0.28 0.0 1.0 100.0 0.0 DBORACLE 7659 33903 LEOR08 008F 1.003 4 0 0.0 0.0 0.0 0.2 3.5 0.8 0.08 0.43 0.0 1.0 100.0 0.0 DBORACLE 765A 33903 LEOR09 008F 0.945 3 0 0.0 0.0 0.0 0.2 2.2 0.8 0.07 0.28 0.0 1.0 100.0 0.0 DBORACLE 765B 33903 LEOR0A 008F 0.217 3 0 0.0 0.0 0.0 0.2 2.2 0.8 0.02 0.06 0.0 1.0 100.0 0.0 DBORACLE 765C 33903 LEOR0B 008F 0.833 4 0 0.0 0.0 0.0 0.2 2.5 0.8 0.06 0.28 0.0 2.0 100.0 0.0 DBORACLE 765D 33903 LEOR0C 008F 0.963 4 0 0.0 0.0 0.0 0.2 2.7 0.9 0.09 0.35 0.0 1.0 100.0 0.0 DBORACLE 765E 33903 LEOR0D 008F 0.013 3 0 0.0 0.0 0.0 0.2 2.6 0.5 0.00 0.00 0.0 1.0 100.0 0.0 DBORACLE 765F 33903 LEOR0E 008F 0.935 4 0 0.0 0.0 0.0 0.2 3.0 0.8 0.07 0.35 0.0 1.0 100.0 0.0

RMF Report (Monitor 1; RMF Postprocessor)


Page 42

• RMF Report – Explanations

– IOSQ TIME = UCB Queueing time

– Avg Pend Time = ms, all Path´s to logical volume are busy

– AVG Resp Time = Connect Time + Dicsonnect Time + Pending Time + IOSQ


Page 43

SQL Tuning

• Check Object Statsitics

– Use DBMS_STATS

• Analyze Execution Plan

– Explain Query

– Optimize Query

– Optimize Indexes

– Index Only Access, Function Based Indexes

Page 44

Diagnose

• High CPU Usage


• Memory Usage




Page 45

Memory Problems

• How to determine Paging/Swapping

– Linux/390

– VMSTAT

– Z/OS

– RMF

– OMEGAMON

• Reasons for Paging/Swapping

– Too many processes/users

– Database Parameters which are too generously

– DB_CACHE_SIZE

– HASH_SIZE

– SQL_CACHE

Page 46


• High CPU Usage


• Memory Usage




Page 47

Diagnosing Network problems

• Latency– LAN: < 1ms

– WAN: < 10ms - 500ms

– ISDN: < 50ms

– VPN: 100-500 ms

• Badwidth– 11-18 Mbit (Copper)

– 100 Mbit (Copper, fibre)

– 1 Gbit (fibre)

• Great number of small packets– tcp_nodelay

– SDU, TDU-Parameters (not available on z/os)

Page 48


• High CPU Usage


• Memory Usage




Page 49

Idle System

• One CPU is 100% used – All other CPU´s are idle

– Reason

– dedicated Server

– Only one process is running

– Solution

– Parallel Query – Not useful for OLTP Aplications

– Split work - run more Processes

Page 50

Idle System

• Latch Contentions – Use Statspack to diagnose

• Enqueue Waits– Use Statspack to diagnose

– Often Block Contentions because of too small initrans, Freelist, Freelist goup settings

• Parsing because the use of Literals– Use Statspack to diagnose

– Use CURSOR SHARING

– Use Bind Variables

Page 51

Idle System

Top 5 Timed Events

~~~~~~~~~~~~~~~~~~ % Total

Event Waits Time (s) Ela Time

-------------------------------------------- ------------ ----------- --------

enqueue 1,560,523 78,588 50.24

CPU time 59,961 38.33

db file sequential read 1,635,253 6,324 4.04

db file scattered read 14,620,725 5,907 3.78

control file parallel write 32,816 1,396 .89

-------------------------------------------------------------

Page 52

Idle System

Enqueue activity for DB: RECONPRD Instance: RECONPRD Snaps: 2 -31

-> Enqueue stats gathered prior to 9i should not be compared with 9i data

-> ordered by Wait Time desc, Waits desc

Avg Wt Wait

Eq Requests Succ Gets Failed Gets Waits Time (ms) Time (s)

-- ------------ ------------ ----------- ----------- ------------- ------------

TX 438,961 438,941 20 114 512,902.49 58,471

TC 34,530 34,530 0 6,904 369.61 2,552

PS 9,526,323 9,386,524 139,799 1,517,315 .25 381

CF 42,761 42,751 10 23 897.57 21

CI 55,594 55,594 0 12 6.17 0

HW 11,356 11,356 0 8 .13 0

-------------------------------------------------------------

Page 53

SQL-Tuning

• Prerequisites– Use Cost based optimizer

– DBMS_STATS (important)

• Explain Query– Create Plan Table: UTLXPLAN

• Visualize Execution Plan– UTLXPLS

– UTLXPLP Note: Scripts are located in xxxxxxxx.yyyyyyyy.SQL library (z/OS)

$ORACLE_HOME/rdbms/admin (Linux/Unix)

Page 54

SQL-TuningSQL> explain plan for select a.* from scott.emp a, scott.dept b where a.deptno=b.deptno;

Explained.

SQL> save explain

Created file explain.sql

SQL> @?/rdbms/admin/utlxpls

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

--------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost |

--------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 14 | 560 | 2 |

| 1 | NESTED LOOPS | | 14 | 560 | 2 |

| 2 | TABLE ACCESS FULL | EMP | 14 | 518 | 2 |

|* 3 | INDEX UNIQUE SCAN | PK_DEPT | 1 | 3 | |

--------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

3 - access("A"."DEPTNO"="B"."DEPTNO")

Page 55

SQL-Tuning

• Optimizer features which help to improve execution plans

– Function based indexes (very important)

– SELECT * From emp where upper(ename) = ´SMITH´

– Bitmap indexes (Useful in case of Read Only)

– Useful for Low Cardinality columns

– Parameter: Optimizer_index_cost_adj

– Optimizer access path selection can be adjusted to be more index friendly

Page 56

SQL-Tuning

• SQLTRACE– Prerequisite: timed_statistics=true

– Activate

– Alter Session set SQL_trace=true

– dbms_system.set_sql_trace_in_session

– Use TKPROF to show execution statistics

– sys=no,explain=uid/pw

Page 57

z/OS WLM• Everything works fine without peaks (e.g.CPU 30%)

• Common Problems we had with WLM(during peak periods) – The „Everything is important syndrom“

– User didn´t classify any discretionary goals

– Everything had the same importance

– Enclave(Sess) with response time goals

– Enclave goes to last period (which was discretionary) shortly after Logon

– No default service class for OSDI

– Mistake in classification rules will result in SYSOTHER being used – discretionary goal

Page 58

Oracle for Linux /390

• We had tuning work

– Linux on an LPAR

– Linux under VM

• We did not have any VM related problems

• The reasons for performance bottlenecks were

– Execution plan of a few SQL Queries

– I/O Subsystem

Page 59

Oracle for z/OS

• The reasons for performance bottlenecks were

– WLM configuration

– Execution plan of a few SQL Queries

– I/O Subsystem

– Variances in disc response time

Page 60

?

Technology

Thomas+Niewel+ +Oracletuning