Oracle Wait Events That Everyone Should Know

Oracle Wait Events

That Everyone Should Know

Kerry OsborneSenior Oracle Guy

What Are Wait Events? Basically a section of code Uses gettimeofday All time is stuck in a bucket with a name Multiple things can be stuck in the same bucket Oracle is very well instrumented (but It’s not 100%) There are 41 classes of wait events in 10.2.0.3 There are 878 wait events in 10.2.0.3

209 enqueue events 29 latch events 41 I/O events

What do you do with Wait Events?

See where Oracle is spending it’s time Create a Resource Profile

Aggregate the time by Wait Event Order the profile by time Include # of occurrences and avg. time/event Shows where to focus Shows how much improvement can be made

Profiles

Basic Tool – maybe the most important Shows up in Statspack / AWR reports Shows up in tkprof output (10g) Can be pulled out of trace files Can be pulled from V$ tables

(v$system_event) Can be pulled from ash tables

Oracle is very well instrumented

Profiles

Top 5 Timed Events~~~~~~~~~~~~~~~~~~ % TotalEvent Waits Time (s) Ela Time-------------------------------------------- ------------ ----------- --------CPU time 37,297 52.27SQL*Net message from dblink 7,065,802 11,312 15.85async disk IO 943,852 5,265 7.38db file sequential read 13,042,432 3,493 4.90direct path write 108,862 3,410 4.78 -------------------------------------------------------------

Why should developers know this stuff?

1. Because you need to know where your code is spending it’s time

2. Scalability is Important

1. These are the events you should know by heart 2. The other 260 or so you can figure out 3. You need to know what causes them 4. You need to know what values are reasonable 5. You need to know what you can do to fix them

Top 10 List of Wait Events

select name, parameter1, parameter2, parameter3from v$event_namewhere name like nvl('&event_name',name)order by name

Not a wait event – but often included in a profile Time spent but not accounted for in other buckets Primarily time spent doing lio (we hope) We want it to be on top It may be an indicator of inefficient plans

CPU

Single block read Usually index block or data block via rowid Can be done by fts to get chained rows P1=file, P2=block, P3=# of blocks (always 1) Always a user’s server process reading into buffer

cache Reasonable e values: <10ms

DB File Sequential Read

WAIT #14: nam='db file sequential read' ela= 36745 file#=1 block#=67745 blocks=1 obj#=84920

Do Less Work tune SQL

reduce lio’s 7 / obj rule of thumb Explain plan lies (see notes on this slide)

bigger buffer cache

Do the Work Faster Speed up I/O

Call Jack

DB File Sequential Read – “fixes”

Explain plan appears to executes a separate code path Even if it didn’t there is no guarantee TEST matches PROD Best bet is to execute and look at V$SQL_PLAN This is easy in 10g with dbms_xplan

select * from table(dbms_xplan.display_cursor('&sql_id','&child_no',''))

See an example in the notes section for this slide

Digression – Explain Plan Lies

Multi block read Usually full table scan or index fast full scan P1=file, P2=block, P3=# of blocks (always < MBRC) Always a user’s server process reading into buffer

cache Reasonable ela values: <10ms Relevant parameters:

DBFMBRC, System Stats: MBRC, _db_file_exec_read_count, _db_file_optimizer_read_count

DB File Scattered Read

WAIT #14: nam='db file scattered read' ela= 19389 file#=139 block#=44 blocks=5 obj#=83580

Do Less Work (fewer multi-block reads) MBRC Better Indexes System Stats bigger buffer cache Tune SQL

Do the Work Faster Speed up I/O

DB File Scattered Read – “fixes”

Usually sorting to Temp Can also be parallel query Could also be insert append, etc… Reasonable ela values: <20ms Relevant parameters:

DBFMBRC _db_file_direct_io_count (1M)

Direct Path Read/Write

WAIT #10: nam='direct path write' ela= 4475 p1=401 p2=1518353 p3=57WAIT #10: nam='direct path read' ela= 16770 p1=401 p2=1482031 p3=63

Adjust PGA_AGGREGATE_TARGET Turn off PX query Call Randy

Direct Path Read/Write – “fixes”

User process wait for LGWR to flush dirty buffers on commit

Synchronous write event P1=log buffer block Reasonable ela values: <4ms

Log File Sync

WAIT #16: nam='log file sync' ela= 1286 buffer#=11910 p2=0 p3=0 obj#=84920

Do Less Work (fewer commits) Autocommit? Row at a time processing with commits?

Do the Work Faster Speed Up I/O

No RAID 5 No Contention (other db, arch, ASM) SS Disk

Improve LGWR Scheduling Renice

Log File Sync – “fixes”

Wait for LGWR to write to current log files Not necessarily synchronous write event System I/O event Reasonable ela values: <4ms Occurs as follows:

Every 3 seconds Log_buffer is 1/3 full or redo = 1M ( default _log_io_size) Commit or rollback by user session RAC needs to transfer a dirty block

Log File Parallel Write

Often fixed by fixing Log File Sync (i.e. speed up i/o, etc)

Force LGWR to write more often

Log File Parallel Write – “fixes”

Happens when database can’t switch to next log file Freezes the whole database Several Flavors

Checkpoint incomplete Archiving needed

Reasonable ela values: 0ms

Log file switch …

WAIT #9: nam='log file switch (checkpoint incomplete)' ela= 986212 p1=0 p2=0 p3=0

Do less work Make total online log space bigger Force more frequent incremental checkpoints Speed up i/o

Log file switch … – “fixes”

Contention event for the same block Read by other session (new event in 10g) Held by other session in incompatible mode

New event in 10g – read by other session Multiple sections of code throw time in this bucket P1=file, P2=block, P3=reason code See metalink note (34405.1) for details

Buffer Busy Waits

WAIT #7: nam='read by other session' ela= 3251 file#=4 block#=188557 class#=1 obj#=53707 tim=1183096831514887

It’s complicated – but basically - eliminate the contention

For Read by Other Session Do Less Work (tune SQL to do less lio) Eliminate i/o – bigger buffer cache, etc… Speed up i/o

For Others Find type of hot objects and statements suffering Address issue - ASSM, Freelist groups (RAC), etc..

Buffer Busy Waits - “fixes”

No place to put a new block

Free Buffer Waits

Do Less Work Reduce the amount of lio

Make more buffer space available Add memory to the buffer cache Speed up DBWR to flush blocks more quickly

Async_io More dbwr processes Renice

Free Buffer Waits - “fixes”

Database is idle – waiting on next request from client

Often marked as an idle event which it may be Time distribution can be skewed Common technique to ignore anything over 1 sec Reasonable e values: <2ms Protocol should be tcp (or beq for local processes)

SQL*Net message from client

WAIT #1: nam='SQL*Net message from client' ela= 336 driver id=1650815232 #bytes=1 p3=0 obj#=83660

Speed up app server Reduce the number of calls Use the right protocol (tcp to beq) Call network guy Get the users to type faster

SQL*Net message from client – “fixes”

More than 1 packet required to send SQL statement

SQL*Net more data from client

Don’t send 50K SQL statements Use packages

SQL*Net more data from client – “fixes”

Time to send a SQL*Net message to a client (sort of) Actually time it took to write the return data from Oracle’s

userland SDU buffer to OS kernel-land TCP socket buffer (Tanel Poder)

Often marked as an idle event which it may be Reasonable e values: <1ms (micro seconds actually) P1=bytes (usually 1 ???)

SQL*Net message to Client

WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=83660

Use correct protocol Fix network issue Call Andy

SQL*Net message to Client – “fixes”

When more than one packet required Rows spanning block LOB’s Array fetches

Often marked as an idle event which it may be Reasonable e values: <1ms P1=bytes (usually 1 ???)

SQL*Net more data to client

WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=83660

Fix chained/migrated rows Don’t use LOBs Nothing to fix

SQL*Net more data to client – “fixes”

Locks for objects Que up in order

9i lumped them all together 10g has 209 separate events

Enqueues

P1 has encoded type and mode

Enqueue in 9i

SELECT chr(bitand(p1,-16777216)/16777215)||

chr(bitand(p1, 16711680)/65535) Type, mod(p1,16) lmodefrom v$session_wait where event=‘enqueue’;

Several Major Categories TM – table modification TX – Transaction locks UL – user lock CI – Cross Instance CU – Cursor Bind HW – High Water RO – Reusable Object ST – Space Transaction TS – Temporary Space

Parameters depend on type

Enqueue in 10g

User can’t modify block until other user commits

Enqueue: TX row lock

Use ash tables to find statements waiting Fix the code

Or do what one of our clients did:write some code to kill any processes blocking for more than 60 seconds – lot’s easier than changing the app

Enqueue: TX row lock – “fixes”

Waiting on exclusive access to a sequence Seems minor but can actually be a big issue on

some systems

Enqueue: SQ

Cache the sequence Use NOORDER in RAC

Enqueue: SQ – “fixes”

Locks internal memory structures Not ordered Very light weight Kid Asking for Candy Model Spins vs. Sleeps (_spin_count, _latch_class_X) Wait events do not include time spent spinning

9i lumped them all together 10g has 28 separate events

Latches

All dumped in the Latch Free event P2 has latch#

Latches in 9i

WAIT #9: nam='latch free' ela= 185 p1=-4611686003650090584 p2=157 p3=0

SQL> Select name from v$latch where latch#=157;

NAME----------------------------------------library cache

Primarily Deal with major SGA areas Shared Pool

Latch: shared pool Latch: library cache Latch row cache

Buffer Cache Latch: cache buffer chains Latch: cache buffers lru chain

Log Buffer Latch: redo …

Parameters depend on type (see v$event_name)

Latches in 10g

Each block hashes to a chain Several chains per latch Several latches per instance (default based on cache

size)

Latch: cache buffers chains

Find hot blocks Fix bad SQL Maybe Increase number of latches Maybe decrease rows per block

Latch: cache buffers chains –”fixes”

Shared Pool Contention Due primarily to parsing See also latch: library cache …

Latch: shared pool

WAIT #28: nam='latch: shared pool' ela= 751 address=1611562656 number=213 tries=1 j#=46024

Use Bind Variables CURSOR_SHARING = FORCE Increase SHARED_POOL

– this is where Darcy came in and sat in my office so I didn’t get much more done –

You can thank Darcy later!

Latch: shared pool – “fixes”

Some tools show this (Hotsos Profiler) Indicates lost time due to rounding (small) Or lost time due to CPU contention (large)

Unaccounted for

Tracing alter session set events '10046 trace name

context forever, level 8';

Spits out wait events, plans (real), stats Level 12 includes bind variables – makes file very

big Puts file in user_dump_dest (ls –altr) Can be executed from logon trigger, in-line, etc… Scoping is important (i.e. when it’s turned on and

off) Note: bug in early versions of 9.2, fixed in 9.2.0.6

Trace File Contents WAIT FETCH EXEC PARSE STAT XCTEND PARSING IN CURSOR

Raw Trace File /opt/oracle/admin/LAB102/udump/lab102_ora_4207.trc Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production ORACLE_HOME = /opt/oracle/product/db/10.2.0/db1 System name: Linux Node name: homer Release: 2.6.9-34.ELhugemem Version: #1 SMP Fri Feb 24 17:04:34 EST 2006 Machine: i686 Instance name: LAB102 Redo thread mounted by this instance: 1 Oracle process number: 20 Unix process pid: 4207, image: oracle@homer (TNS V1-V3)

*** ACTION NAME:() 2007-08-16 13:48:14.571 *** MODULE NAME:(SQL*Plus) 2007-08-16 13:48:14.571 *** SERVICE NAME:(SYS$USERS) 2007-08-16 13:48:14.571 *** SESSION ID:(143.189) 2007-08-16 13:48:14.571 ===================== PARSING IN CURSOR #7 len=68 dep=0 uid=61 oct=42 lid=61 tim=1159462982979740 hv=740818757 ad='30663e48' alter session set events '10046 trace name context forever, level 8' END OF STMT EXEC #7:c=0,e=269,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1159462982979717 WAIT #7: nam='SQL*Net message to client' ela= 7 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1159462982980778 WAIT #7: nam='SQL*Net message from client' ela= 119 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1159462982981008 ===================== PARSING IN CURSOR #8 len=44 dep=0 uid=61 oct=3 lid=61 tim=1159463023994427 hv=761757617 ad='54738434' select avg(col1) from skew where rownum < 10 END OF STMT PARSE #8:c=4000,e=3904,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1159463023994411 EXEC #8:c=0,e=185,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1159463023994844 WAIT #8: nam='SQL*Net message to client' ela= 9 driver id=1650815232 #bytes=1 p3=0 obj#=53707 tim=1159463023994980 WAIT #8: nam='db file scattered read' ela= 218 file#=4 block#=21900 blocks=5 obj#=53707 tim=1159463023995544 FETCH #8:c=0,e=665,p=5,cr=4,cu=0,mis=0,r=1,dep=0,og=1,tim=1159463023995732 WAIT #8: nam='SQL*Net message from client' ela= 157 driver id=1650815232 #bytes=1 p3=0 obj#=53707 tim=1159463023996048 FETCH #8:c=0,e=7,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1159463023996183 WAIT #8: nam='SQL*Net message to client' ela= 7 driver id=1650815232 #bytes=1 p3=0 obj#=53707 tim=1159463023996312 WAIT #8: nam='SQL*Net message from client' ela= 298 driver id=1650815232 #bytes=1 p3=0 obj#=53707 tim=1159463023996669 XCTEND rlbk=0, rd_only=1 STAT #8 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=4 pr=5 pw=0 time=736 us)' STAT #8 id=2 cnt=9 pid=1 pos=1 obj=0 op='COUNT STOPKEY (cr=4 pr=5 pw=0 time=846 us)' STAT #8 id=3 cnt=9 pid=2 pos=1 obj=53707 op='TABLE ACCESS FULL SKEW (cr=4 pr=5 pw=0 time=639 us)' WAIT #0: nam='log file sync' ela= 680 buffer#=4862 p2=0 p3=0 obj#=53707 tim=1159463039852003

Questions?

[email protected]

mailto:[email protected]

Documents

Oracle Wait Events That Everyone Should Know