Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux
Thomas Niewel, Oracle Deutschland GmbH
Page 3
Agenda
• Why tune?
• Reasons for bad response time
• Statspack
• Diagnosing reasons for bad response times
• SQL Tuning
– TKPROF
– Explain Plan
• WLM
Page 4
Why do we need to tune?
• Users report „bad“ response times because of
– CPU Time + Wait Time
– Poor performing queries
– SQL-Tuning
– „bad“ Database parameters
– Bottlenecks in „System“ (Operating System, WLM, IO/Subsystem etc.)
Page 5
What can be the reasons for “bad” Response Time
• High CPU Usage
• High I/O Usage
• Memory Usage
• Network problems
• „idle“ System
• Operating System (WLM, VM)
Page 6
Diagnose from the Oracle point of view
Statspack
A short overview
Page 7
Statspack – a short overview
spcreate.sql - installs Statspack (run only once)
statspack.snap - data capture (procedure)
spreport.sql - reporting
spdoc.txt - user documentation
sppurge.sql - delete Statspack data
spdrop.sql - drop Statspack
Page 8
Capturing data
• Prerequisite: timed_statistics=true
• Use stored procedure statspack.snap
SQL> execute statspack.snap;
Page 9
Capturing data
• Get a baseline for future comparisons
• Capture snapshots
– across peak load
– across batch window
– The time between snapshots should be <= 30 minutes
• Capture can be automated
– Use OS utility e.g. cron
– Use dbms_job
– spauto.sql shipped as example
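A minimal sketch of automating the capture with dbms_job. It assumes SQL*Plus, a connection as the Statspack owner (PERFSTAT by default) and job_queue_processes greater than 0. The 30-minute interval follows the guideline on this slide; the shipped spauto.sql remains the reference example.

```sql
-- Submit a job that calls statspack.snap every 30 minutes.
variable jobno number
begin
  dbms_job.submit(
    job       => :jobno,
    what      => 'statspack.snap;',
    next_date => sysdate,
    interval  => 'sysdate + 30/1440');  -- 30 minutes, expressed in days
  commit;
end;
/
print jobno
```

The job can be removed again with dbms_job.remove, passing the job number printed above.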
Page 10
Reporting with Statspack
• All data is held in an Oracle database
• Report between two or more snapshots
– cannot report across instance startup
• spreport.sql creates a report
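To check which snapshots can be paired, the snapshot table can be queried directly. This sketch assumes the default Statspack schema, in which STATS$SNAPSHOT records the instance startup time with every snapshot. Two snapshots are only comparable when their startup time is identical.

```sql
-- List captured snapshots together with the instance startup they belong to.
select snap_id,
       to_char(snap_time,    'DD Mon YYYY HH24:MI:SS') as snap_started,
       to_char(startup_time, 'DD Mon YYYY HH24:MI:SS') as instance_started,
       snap_level
  from stats$snapshot
 order by snap_id;
```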
Page 11
Reporting with Statspack
SQL> @spreport

   DB Id    DB Name     Instance# Instance
----------- ---------- ---------- ----------
 1361567071 DB21                1 MAIL

Completed Snapshots

Instance   DB Name    SnapId Snap Started           Snap Level
---------- ---------- ------ ---------------------- ----------
DB21       DB21            1 17 Aug 2003 10:00:16            5
                           2 17 Aug 2003 10:30:28            5

Enter beginning Snap Id: 1
Enter ending Snap Id: 2
Enter name of output file [sp_1_2] : <enter name or return>
Page 12
Analyzing a Statspack report
• Top down analysis
• Summary page
– Environment
– Load profile
– Instance efficiency
– Shared pool usage
– Top 5 Timed Events
• Top SQL
Page 13
Environment section
STATSPACK report for
DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
RECONPRD 1403107896 RECONPRD 1 9.2.0.2.0 NO lin390t1
Snap Id Snap Time Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
Begin Snap: 2 03-Mar-03 11:28:01 10 5.1
End Snap: 31 04-Mar-03 11:58:04 17 5.5
Elapsed: 30.05 (mins)
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 256M Std Block Size: 16K
Shared Pool Size: 48M Log Buffer: 128K
Page 14
Load profile
• Contains a number of common ratios
• Allows characterisation of the application
• Can point to problems
– high hard parse rate
– high IO rate
– high login rate
Page 15
Load profile
• Useful if you have a comparable baseline
• What has changed?
– txn/sec change implies changed workload
– redo size/txn implies changed transaction mix
– physical reads/txn implies changed SQL or plan
Page 16
Load profile
Load Profile
~~~~~~~~~~~~                 Per Second   Per Transaction
                        ---------------   ---------------
           Redo size:         19,057.68         20,937.67
       Logical reads:          2,408.15          2,645.70
       Block changes:             98.64            108.37
      Physical reads:            990.47          1,088.18
     Physical writes:              6.92              7.61
          User calls:             76.40             83.93
              Parses:              7.08              7.78
         Hard parses:              0.02              0.02
               Sorts:             29.22             32.10
              Logons:             24.73             27.17
            Executes:             63.79             70.08
        Transactions:              0.91

  % Blocks changed per Read:    4.10    Recursive Call %:    72.76
 Rollback per transaction %:   36.52       Rows per Sort:   153.46
Page 17
Instance Efficiency
• Gives an overview of how the instance is performing
• Can also be used with a comparable baseline
• Shared pool Statistics allow quick identification of cursor sharing problems
Page 18
Instance Efficiency
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.99       Redo NoWait %:   99.97
               Buffer Hit %:   59.00    In-memory Sort %:   99.99
              Library Hit %:   99.94        Soft Parse %:   99.69
         Execute to Parse %:   88.89         Latch Hit %:   99.98
Parse CPU to Parse Elapsd %:   56.55     % Non-Parse CPU:   99.93

 Shared Pool Statistics        Begin   End
                               ------  ------
             Memory Usage %:   38.86   66.81
    % SQL with executions>1:   43.41   87.22
  % Memory for SQL w/exec>1:   39.28   80.21
Page 19
Top 5 Timed Events
• CPU time – real work
• Shows where Oracle sessions are waiting
• Compare Wait Time to elapsed time
• % Total Wait Time shows potential benefits
• Use as basis for directed drilldown
                                                          % Total
Event                                  Waits    Time (s) Ela Time
------------------------------- ------------ ----------- --------
CPU time                                          78,588    50.24
enqueue                            1,560,523      59,961    38.33
db file sequential read            1,635,253       6,324     4.04
db file scattered read            14,620,725       5,907     3.78
control file parallel write           32,816       1,396      .89
Page 20
Top 5 Timed Events
• Sample drilldowns
– CPU time: „on CPU“
– enqueue: e.g. TX enqueue
– db file sequential read: index access
– db file scattered read: scan operations
– control file parallel write
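Outside of a Statspack report, a rough picture of the top wait events can be taken from V$SYSTEM_EVENT. This is a sketch only: the figures are cumulative since instance startup rather than per snapshot interval, idle events such as "SQL*Net message from client" are not filtered out, and CPU time is not included.

```sql
-- Top 5 wait events since instance startup.
-- time_waited is reported in centiseconds.
select *
  from (select event,
               total_waits,
               round(time_waited / 100) as time_waited_s
          from v$system_event
         order by time_waited desc)
 where rownum <= 5;
```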
Page 21
Top SQL
• Helps to find problem statements
– SQL ordered by Gets
– SQL ordered by Reads
– SQL ordered by Executions
– SQL ordered by Parse Calls
Page 22
Top SQL
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
79,562,398 8,114 9,805.6 34.6 27182.71 28127.71 1525844323
Module: SQL*Plus
SELECT MAX(STMT_BKG_DATE_CLOSE) FROM GAH_T_STATEMENTS WHERE S
TMT_ACCT_ID = :b1 AND ((:b2 = :b3 AND STMT_CARRIER != :b4 AND
STMT_MSG_TYPE != :b5 AND (:b6 IS NULL OR :b6 = STMT_CARRIER )
AND ((:b8 IS NULL AND STMT_MSG_TYPE != :b9 ) OR (:b8 IS NOT NU
LL AND :b8 = STMT_MSG_TYPE ))) OR (:b2 = :b13 AND STMT_CARRIER
Page 23
I/O Statistics
• Help to find I/O Problems
– Tablespace IO Stats
– File IO Stats
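The file-level figures behind these sections can also be inspected live. The following sketch assumes timed_statistics=true and reads cumulative counters since instance startup; temporary files are held in V$TEMPSTAT and are not covered here.

```sql
-- Physical reads/writes and average read time per data file.
-- readtim is in centiseconds, so it is multiplied by 10 to get ms.
select d.name,
       f.phyrds,
       f.phywrts,
       round(decode(f.phyrds, 0, null, f.readtim * 10 / f.phyrds), 1) as avg_read_ms
  from v$filestat f,
       v$datafile d
 where f.file# = d.file#
 order by f.phyrds desc;
```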
Page 24
I/O Statistics
Tablespace
------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
GAH_TS00_DT_MEDIUM
15,242,896 160 0.4 6.1 41,066 0 22,468 18.4
GAH_TS00_IX_ITEM
210,346 2 11.2 1.0 130,299 1 9 15.6
GAH_TS00_IX_MEDIUM
207,433 2 6.9 1.0 86,699 1 39 43.8
RECONPRD_TS00_TEMP
185,865 2 1.7 1.6 101,560 1 0 0.0
GAH_TS00_IX_ITEM_REF
155,027 2 8.4 1.0 34,867 0 1 0.0
Page 25
Diagnosing high CPU usage
• High CPU Usage
• High I/O utilization
• Memory Usage
• Network problems
• „idle“ System
• Operating System (WLM, VM)
Page 26
Diagnosing high CPU usage-Operating System-
• Linux/390
– sar -u 3 3333
– iostat -x 3
– vmstat 3
– top
– etc.
• z/OS
– SDSF
– RMF
– Omegamon
– etc.
Page 27
Diagnosing high CPU usage
• What can be the reason for „high CPU“ usage?
– Shared_Pool / SQL-Cache
– db_file_multiblock_read_count
– Buffer_Cache / Buffer_Pool
– How can statements with a large number of buffer gets be identified?
– Statspack
– SQL Script
Page 28
Diagnosing high CPU usage
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
4,494,662 155 28,997.8 2.0 1049.63 2414.11 3961361411
SELECT * FROM GAH_T_STATEMENTS WHERE STMT_ACCT_ID = :b1 AND
((:b2 = :b3 AND STMT_CARRIER != :b4 AND STMT_MSG_TYPE != :b5
AND (:b6 IS NULL OR :b6 = STMT_CARRIER ) AND ((:b8 IS NULL AND
STMT_MSG_TYPE != :b9 ) OR (:b8 IS NOT NULL AND :b8 = STMT_MSG_
TYPE ))) OR (:b2 = :b13 AND STMT_CARRIER = :b14 AND STMT_MSG_T
Module: SQL*Plus
Page 29
Diagnosing high CPU usage
spool cpu_users.lst
select buffer_gets,
       disk_reads,
       executions,
       ratio_to_report(buffer_gets) over () * 100 buffer_ratio,
       ratio_to_report(disk_reads) over () * 100 disk_ratio,
       sql_text
  from v$sqlarea
 order by buffer_ratio desc;
spool off
Page 30
Diagnosing high CPU usage
BUFFER_GETS DISK_READS EXECUTIONS BUFFER_RATIO DISK_RATIO
----------- ---------- ---------- ------------ ----------
SQL_TEXT
----------------------------------------------------------------------------------------
19564429 154 46908 65.9945773 5.40350877
select t.schema, t.name, t.flags, q.name from system.aq$_queue_tables t, sys.aq$_queue_table_affinities aft, system.aq$_queues q where aft.table_objno = t.objno and aft.owner_instance = :1 and q.table_objno = t.objno and q.usage = 0 and bitand(t.flags, 4+16+32+64+128+256) = 0 for update of t.name, aft.table_objno skip locked
Page 31
SQL Tuning
• Check Object Statistics
– Use DBMS_STATS
• Analyze Execution Plan
– Explain Query / V$SQL_PLAN
– Optimize Query
– Optimize Indexes
– Index Only Access, Function Based Indexes
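The first two steps can be sketched as follows. The schema name APPOWNER is a placeholder, since the owner of GAH_T_STATEMENTS is not given in the slides. The hash value is the one from the Top SQL listing, and V$SQL_PLAN is assumed to be available (Oracle9i or later).

```sql
-- Refresh optimizer statistics for the table and its indexes.
-- 'APPOWNER' is a hypothetical schema name.
begin
  dbms_stats.gather_table_stats(
    ownname => 'APPOWNER',
    tabname => 'GAH_T_STATEMENTS',
    cascade => true);
end;
/

-- Show the execution plan of a cached statement by its hash value.
select id, parent_id, operation, options, object_name, cost
  from v$sql_plan
 where hash_value = 1525844323
 order by child_number, id;
```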
Page 32
Diagnose
• High CPU Usage
• High I/O utilization
• Memory Usage
• Network problems
• „idle“ System
• Operating System (WLM, VM)
Page 33
High I/O utilization
• Linux/390
– sar -d 3 33333
– iostat -x 3
– vmstat 3
• z/OS
– RMF
– Omegamon etc.
Page 34
High I/O utilization
• Disk I/O
– Disk access is slower than memory access (factor 5,000 to 100,000)
– One physical disk is able to perform 100-150 I/Os per second
– Disk response times (read operations)
– 2 ms (read from disk cache)
– 10 ms - 15 ms (physical reads)
Page 35
High I/O utilization
• Reasons for High I/O utilization
– Database Cache too small (DB_CACHE_SIZE)
– Sortarea too small (sort_area_size)
– Hasharea too small (hash_area_size)
– Too many Checkpoints
– Ineffective Execution Plans (e.g. Full-Table-Scans which are not necessary)
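Before changing any of these settings it helps to see their current values. A small sketch using V$PARAMETER with the parameter names listed on this slide:

```sql
-- Current values of the memory parameters that influence physical I/O.
select name, value
  from v$parameter
 where name in ('db_cache_size', 'sort_area_size', 'hash_area_size')
 order by name;
```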
Page 36
High I/O utilization
• Increase Cache Size
– Reduces physical I/O Operations
– Z/OS
– Limited by 31-bit architecture
– Multiple address spaces help to improve memory management
Page 37
High I/O utilization
Single Shared SGA Across Address Spaces
AS1 AS2 AS3 ASn
• An Oracle server instance has a single SGA regardless of the number of address spaces or regions configured.
• The user context is distributed across all AS
Page 38
High I/O utilization
• Linux/390
– The default maximum SGA size on Linux/390 is 750 MB without changing the SGA base address
– The maximum SGA size can be increased to 1 GB by changing the SGA base address
Page 39
High I/O utilization
Top 5 Timed Events
                                                                       % Total
Event                                               Waits     Time (s) Wt Time
-------------------------------------------- ------------ ------------ -------
db file sequential read                        89,086,819       11,009   93.13
db file scattered read                          9,875,076          776    6.56
file open                                         505,227           23     .19
log file sync                                     440,409            8     .07
latch free                                     11,042,510            3     .03
Page 40
High I/O utilization
Tablespace I/O Stats:
Tablespace
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
RECEIVABLE_T_01
18,398,460 213 12.0 1.6 59,325 1 4,892,686 0.0
SO_T_03
6,827,475 79 13.2 1.6 27,462 0 4,506 0.0
SO_I_01
5,356,393 62 9.0 1.3 18,388 0 35,935 0.0
PO_I_01
4,641,732 1400 21.7 1.8 72,563 1 217,799 0.0
Page 41
High I/O utilization
RMF Report (Monitor 1; RMF Postprocessor)

               D I R E C T   A C C E S S   D E V I C E   A C T I V I T Y

                                   DEVICE  AVG  AVG  AVG  AVG  AVG  AVG  AVG  AVG     %     %     %    AVG     %     %
STORAGE  DEV  DEVICE VOLUME LCU  ACTIVITY RESP IOSQ  DPB  CUB   DB PEND DISC CONN   DEV   DEV   DEV NUMBER   ANY    MT
GROUP    NUM  TYPE   SERIAL          RATE TIME TIME  DLY  DLY  DLY TIME TIME TIME  CONN  UTIL  RESV  ALLOC ALLOC  PEND
DBORACLE 7651 33903  LEOR00 008F    0.817    4    0  0.0  0.0  0.0  0.2  2.6  0.8  0.06  0.28   0.0    1.0 100.0   0.0
DBORACLE 7652 33903  LEOR01 008F    0.878    9    0  0.0  0.0  0.0  0.2  0.3  8.7  0.76  0.79   0.0    3.0 100.0   0.0
DBORACLE 7653 33903  LEOR02 008F    0.502    2    0  0.0  0.0  0.0  0.2  0.0  1.5  0.08  0.08   0.0    6.0 100.0   0.0
DBORACLE 7654 33903  LEOR03 008F  108.968   56   52  0.0  0.0  0.0  0.2  2.4  0.8  0.08  0.32   0.0    1.0 100.0   0.0
DBORACLE 7655 33903  LEOR04 008F    0.828    3    0  0.0  0.0  0.0  0.2  2.3  0.8  0.06  0.25   0.0    1.0 100.0   0.0
DBORACLE 7656 33903  LEOR05 008F   98.779   50   48  0.0  0.0  0.0  0.2  1.7  0.8  0.13  0.42   0.0    1.0 100.0   0.0
DBORACLE 7657 33903  LEOR06 008F    2.768    2    0  0.0  0.0  0.0  0.3  1.3  0.7  0.20  0.56   0.0    1.0 100.0   0.0
DBORACLE 7658 33903  LEOR07 008F    0.943    3    0  0.0  0.0  0.0  0.2  2.3  0.7  0.07  0.28   0.0    1.0 100.0   0.0
DBORACLE 7659 33903  LEOR08 008F    1.003    4    0  0.0  0.0  0.0  0.2  3.5  0.8  0.08  0.43   0.0    1.0 100.0   0.0
DBORACLE 765A 33903  LEOR09 008F    0.945    3    0  0.0  0.0  0.0  0.2  2.2  0.8  0.07  0.28   0.0    1.0 100.0   0.0
DBORACLE 765B 33903  LEOR0A 008F    0.217    3    0  0.0  0.0  0.0  0.2  2.2  0.8  0.02  0.06   0.0    1.0 100.0   0.0
DBORACLE 765C 33903  LEOR0B 008F    0.833    4    0  0.0  0.0  0.0  0.2  2.5  0.8  0.06  0.28   0.0    2.0 100.0   0.0
DBORACLE 765D 33903  LEOR0C 008F    0.963    4    0  0.0  0.0  0.0  0.2  2.7  0.9  0.09  0.35   0.0    1.0 100.0   0.0
DBORACLE 765E 33903  LEOR0D 008F    0.013    3    0  0.0  0.0  0.0  0.2  2.6  0.5  0.00  0.00   0.0    1.0 100.0   0.0
DBORACLE 765F 33903  LEOR0E 008F    0.935    4    0  0.0  0.0  0.0  0.2  3.0  0.8  0.07  0.35   0.0    1.0 100.0   0.0
Page 42
High I/O utilization
• RMF Report – Explanations
– IOSQ TIME = UCB queueing time
– AVG PEND TIME = time (ms) during which all paths to the logical volume are busy
– AVG RESP TIME = Connect Time + Disconnect Time + Pending Time + IOSQ Time
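As a worked example, the formula can be checked against volume LEOR03 (device 7654) in the RMF report, using its reported CONN, DISC, PEND and IOSQ times. The small difference to the reported response time of 56 ms is assumed to come from rounding in the report.

```latex
\begin{align*}
\text{AVG RESP TIME} &= \text{CONN} + \text{DISC} + \text{PEND} + \text{IOSQ} \\
                     &= 0.8 + 2.4 + 0.2 + 52 = 55.4\ \text{ms} \approx 56\ \text{ms}
\end{align*}
```

Almost all of the response time on this volume is IOSQ time, that is, requests queueing for the device.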
Page 43
SQL Tuning
• Check Object Statistics
– Use DBMS_STATS
• Analyze Execution Plan
– Explain Query
– Optimize Query
– Optimize Indexes
– Index Only Access, Function Based Indexes
Page 44
Diagnose
• High CPU Usage
• High I/O utilization
• Memory Usage
• Network problems
• „idle“ System
• Operating System (WLM, VM)
Page 45
Memory Problems
• How to determine Paging/Swapping
– Linux/390
– VMSTAT
– Z/OS
– RMF
– OMEGAMON
• Reasons for Paging/Swapping
– Too many processes/users
– Database parameters that are set too generously
– DB_CACHE_SIZE
– HASH_AREA_SIZE
– SQL_CACHE
Page 46
Diagnose
• High CPU Usage
• High I/O utilization
• Memory Usage
• Network problems
• „idle“ System
• Operating System (WLM, VM)
Page 47
Diagnosing Network problems
• Latency
– LAN: < 1 ms
– WAN: < 10 ms - 500 ms
– ISDN: < 50 ms
– VPN: 100-500 ms
• Bandwidth
– 11-18 Mbit (Copper)
– 100 Mbit (Copper, fibre)
– 1 Gbit (fibre)
• Great number of small packets
– tcp_nodelay
– SDU, TDU parameters (not available on z/OS)
Page 48
Diagnose
• High CPU Usage
• High I/O utilization
• Memory Usage
• Network problems
• „idle“ System
• Operating System (WLM, VM)
Page 49
Idle System
• One CPU is 100% used, all other CPUs are idle
– Reason
– Dedicated server
– Only one process is running
– Solution
– Parallel Query (not useful for OLTP applications)
– Split work: run more processes
Page 50
Idle System
• Latch contention
– Use Statspack to diagnose
• Enqueue waits
– Use Statspack to diagnose
– Often block contention because of too small INITRANS, FREELISTS or FREELIST GROUPS settings
• Parsing caused by the use of literals
– Use Statspack to diagnose
– Use CURSOR_SHARING
– Use bind variables
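The difference between literals and bind variables can be sketched in SQL*Plus with the SCOTT demo schema; the empno value is only an example. Setting CURSOR_SHARING is shown at session level and is a workaround for applications that cannot be changed, not a replacement for bind variables.

```sql
-- Literal: every distinct value creates a new cursor and a hard parse.
select ename from scott.emp where empno = 7369;

-- Bind variable: one shared cursor, regardless of the value supplied.
variable empno number
execute :empno := 7369
select ename from scott.emp where empno = :empno;

-- Workaround when the application SQL cannot be changed:
-- let Oracle replace literals with system-generated binds.
alter session set cursor_sharing = force;
```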
Page 51
Idle System
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
enqueue 1,560,523 78,588 50.24
CPU time 59,961 38.33
db file sequential read 1,635,253 6,324 4.04
db file scattered read 14,620,725 5,907 3.78
control file parallel write 32,816 1,396 .89
-------------------------------------------------------------
Page 52
Idle System
Enqueue activity for DB: RECONPRD Instance: RECONPRD Snaps: 2 -31
-> Enqueue stats gathered prior to 9i should not be compared with 9i data
-> ordered by Wait Time desc, Waits desc
Avg Wt Wait
Eq Requests Succ Gets Failed Gets Waits Time (ms) Time (s)
-- ------------ ------------ ----------- ----------- ------------- ------------
TX 438,961 438,941 20 114 512,902.49 58,471
TC 34,530 34,530 0 6,904 369.61 2,552
PS 9,526,323 9,386,524 139,799 1,517,315 .25 381
CF 42,761 42,751 10 23 897.57 21
CI 55,594 55,594 0 12 6.17 0
HW 11,356 11,356 0 8 .13 0
-------------------------------------------------------------
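While TX enqueue waits are occurring, the sessions involved can be identified from V$LOCK. This is a sketch: it shows only the current situation, whereas the Statspack section above covers the whole snapshot interval.

```sql
-- Sessions holding (block = 1) or waiting for (request > 0) a TX enqueue.
-- Holder and waiter of the same lock share id1 and id2.
select sid, type, id1, id2, lmode, request, block
  from v$lock
 where type = 'TX'
   and (block = 1 or request > 0)
 order by id1, id2, request;
```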
Page 53
SQL-Tuning
• Prerequisites
– Use Cost based optimizer
– DBMS_STATS (important)
• Explain Query
– Create Plan Table: UTLXPLAN
• Visualize Execution Plan
– UTLXPLS
– UTLXPLP
Note: Scripts are located in xxxxxxxx.yyyyyyyy.SQL library (z/OS) or $ORACLE_HOME/rdbms/admin (Linux/Unix)
Page 54
SQL-Tuning
SQL> explain plan for select a.* from scott.emp a, scott.dept b where a.deptno=b.deptno;
Explained.
SQL> save explain
Created file explain.sql
SQL> @?/rdbms/admin/utlxpls
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 560 | 2 |
| 1 | NESTED LOOPS | | 14 | 560 | 2 |
| 2 | TABLE ACCESS FULL | EMP | 14 | 518 | 2 |
|* 3 | INDEX UNIQUE SCAN | PK_DEPT | 1 | 3 | |
--------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
3 - access("A"."DEPTNO"="B"."DEPTNO")
Page 55
SQL-Tuning
• Optimizer features which help to improve execution plans
– Function based indexes (very important)
– SELECT * FROM emp WHERE upper(ename) = 'SMITH'
– Bitmap indexes (Useful in case of Read Only)
– Useful for Low Cardinality columns
– Parameter: Optimizer_index_cost_adj
– Optimizer access path selection can be adjusted to be more index friendly
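A sketch of the function-based index for the query on this slide, using the EMP demo table. The index name is made up for illustration, and the value 50 for optimizer_index_cost_adj is only an example (the default is 100). Depending on the release, function-based indexes may additionally need query_rewrite_enabled=true and current optimizer statistics before the optimizer uses them.

```sql
-- Index on the expression used in the WHERE clause.
-- 'emp_upper_ename_ix' is a hypothetical index name.
create index emp_upper_ename_ix on emp (upper(ename));

-- The predicate can now be resolved through the index.
select * from emp where upper(ename) = 'SMITH';

-- Make index access paths look cheaper to the optimizer (default 100).
alter session set optimizer_index_cost_adj = 50;
```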
Page 56
SQL-Tuning
• SQLTRACE
– Prerequisite: timed_statistics=true
– Activate
– Alter Session set SQL_trace=true
– dbms_system.set_sql_trace_in_session
– Use TKPROF to show execution statistics
– sys=no,explain=uid/pw
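The steps above can be sketched as follows. The user name and the sid and serial# values are placeholders; real values come from V$SESSION. The trace file is written to the directory named by user_dump_dest and is then formatted with TKPROF from the operating system prompt.

```sql
-- Trace the current session.
alter session set timed_statistics = true;
alter session set sql_trace = true;
-- ... run the statements to be analysed ...
alter session set sql_trace = false;

-- Trace another session: look up its identifiers first.
select sid, serial#, username, program
  from v$session
 where username = 'SCOTT';

-- 12 and 345 are placeholder values for sid and serial#.
execute sys.dbms_system.set_sql_trace_in_session(12, 345, true);

-- Afterwards, from the OS prompt (file names are placeholders):
--   tkprof <tracefile> <outputfile> sys=no explain=uid/pw
```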
Page 57
z/OS WLM
• Everything works fine without peaks (e.g. CPU 30%)
• Common problems we had with WLM (during peak periods)
– The „Everything is important“ syndrome
– User didn't classify any discretionary goals
– Everything had the same importance
– Enclave (session) with response time goals
– Enclave goes to last period (which was discretionary) shortly after logon
– No default service class for OSDI
– A mistake in the classification rules results in SYSOTHER being used, which has a discretionary goal
Page 58
Oracle for Linux /390
• We did tuning work on
– Linux on an LPAR
– Linux under VM
• We did not have any VM related problems
• The reasons for performance bottlenecks were
– Execution plan of a few SQL Queries
– I/O Subsystem
Page 59
Oracle for z/OS
• The reasons for performance bottlenecks were
– WLM configuration
– Execution plan of a few SQL Queries
– I/O Subsystem
– Variances in disk response time
Page 60
?