Direct Contention Identification Using Oracle’s Session Wait Virtual Tables

Craig A. Shallahamer

OraPub, Inc. (http://www.europa.com/~orapub)

1. Abstract

With the introduction of the session wait family of virtual performance tables, standard performance tuning practices have been radically changed. No longer is it necessary to gather volumes of data, calculate a seemingly ever growing number of performance ratios, and then finally perform analysis. Through the session wait views, Oracle database systems “tell” performance specialists specifically where contention resides and what is causing the contention. This dramatically reduces problem identification time so more time can be spent on problem resolution analysis. This paper explains how one can use the session wait views to quickly identify contention, discusses various scripts to quickly empower one, and presents techniques to shift one’s thinking from traditional ratio analysis to direct problem identification.

2. Introduction

An optimized computing system benefits many. Users are able to perform their work in a timely manner and without interruption, computing system administrators can concentrate on other work, and the business can function without anyone thinking about the information system. Unfortunately this is rarely the case.

With the introduction of Oracle7, a new and direct method of detecting contention and diagnosing problems was made available. The desire for an optimized system, the professional community’s seemingly endless affair with traditional contention identification methods, and the introduction of new methods have left the database administrator somewhat confused and surely misdirected, resulting in wasted time and money, non-optimal computing system performance, and decreased system stability.

Exceptional performance tuning papers are extremely rare. Virag Saksena’s papers (see references) are truly exceptional; they cover areas not mentioned in this paper and briefly touch upon some of the areas covered in this paper. This paper was written specifically to inform performance specialists about the new method of contention identification using Oracle’s session wait virtual tables.

To accomplish this task, this paper first presents details about the session wait family of virtual performance tables, then discusses the methodology differences between using session wait statistics and the traditional ratio methods, and finally presents, dissects, and analyzes a very real case study.

3. Session Wait Statistics

A baby and an adult both have a sore throat. When asked by the doctor, “What’s wrong?” the baby will look at the doctor and scream (trust me, I know), whereas the adult will present attributes about the pain. This not so subtle difference directs the doctor towards different problem identification methods.

With the baby, the doctor must perform many tests, gather the data, consolidate the data, perform analysis, determine the problem, and make recommendations. With an adult, he simply asks a few questions and can then immediately begin a more detailed examination.

Starting with Oracle7, the Oracle server “acts like an adult.” The session wait statistics tell one exactly “what hurts.” With earlier versions of the server, one had to perform costly ratio based analysis to determine where the “pain” was originating.

3.1 “There is always a bottleneck”

Don’t fall into the trap of saying, “My system’s slow but there is no bottleneck.” To summarize, you’re wrong. There is always a bottleneck. If not, computing systems would run infinitely fast. The fastest car on the biggest highway continually faces bottlenecks. It could be the number of cars on the freeway, the number of lanes, the weather, an accident ahead, the type of petrol, the skill of the driver, the car design, the engine design, the wind direction, one’s personal risk quotient, one’s respect for others, and the list goes on. But there is always a bottleneck.

In an Oracle system, even a highly tuned Oracle system, there is always a bottleneck. Processes are always waiting for some resource. It could be data already in the SGA, data residing on disk, a latch (serializing execution for a piece of server kernel code), an enqueue (access to an internal structure), a lock (access to a user defined structure), or many other items. But the bottom line is, there is always a bottleneck, a session is always waiting for something (either to start something or waiting for it to finish), and the session wait statistics tell one exactly what a specific session is waiting for.

3.2 Session Wait Views

There is actually a family of session wait based views. Each view has a special purpose and they complement each other, providing direction for performance specialists. The virtual tables are v$system_event, v$session_event, and v$session_wait. Each virtual table along with an actual script and real-life output is presented below.

3.2.1 High level system perspective using v$system_event

The v$system_event view looks at sessions from a high-level perspective. It doesn’t report which specific session is waiting for which specific resource or object. It sums all the waits, by wait category, since the database instance was last started. As with all “instance accumulating” statistics (e.g., v$sysstat), collecting and reporting the differences over periods of time is much more useful than a single query. [Total Performance Management]
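The idea of reporting differences rather than instance-lifetime totals can be sketched as follows. This is a minimal illustration, not one of the paper's tools: the EventTotals type, the snapshot_delta function, and all of the numbers are hypothetical, and the two dictionaries stand in for two queries of v$system_event taken some minutes apart.

```python
from typing import Dict, NamedTuple


class EventTotals(NamedTuple):
    """Cumulative counters for one wait event, as read from v$system_event."""
    total_waits: int
    time_waited: int  # same time unit as the view reports


def snapshot_delta(before: Dict[str, EventTotals],
                   after: Dict[str, EventTotals]) -> Dict[str, EventTotals]:
    """Return, per event, the activity that occurred between two snapshots."""
    delta = {}
    for event, end in after.items():
        # An event missing from the first snapshot had no waits at that time.
        start = before.get(event, EventTotals(0, 0))
        waits = end.total_waits - start.total_waits
        waited = end.time_waited - start.time_waited
        if waits > 0 or waited > 0:
            delta[event] = EventTotals(waits, waited)
    return delta


# Hypothetical snapshots (illustrative numbers only, not from the case study).
before = {"enqueue": EventTotals(1000, 50000),
          "latch free": EventTotals(400, 900)}
after = {"enqueue": EventTotals(1600, 86000),
         "latch free": EventTotals(400, 900),
         "buffer busy waits": EventTotals(25, 300)}

# Report the interval's activity, largest time waited first.
for event, d in sorted(snapshot_delta(before, after).items(),
                       key=lambda item: item[1].time_waited, reverse=True):
    print(f"{event:20s} waits={d.total_waits:6d} time_waited={d.time_waited:8d}")
```

Events with no activity during the interval drop out of the report, which is the point of the technique: only what happened while one was watching remains. The sketch assumes the instance was not restarted between snapshots.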

There are only five columns in the v$system_event view. Each is presented below.

• Event. This is simply the general name of the event. While there are many wait events, the most common events are latch waits, db file scattered read, db file sequential read, enqueue wait, buffer busy wait, and free buffer waits. While I will be detailing the most common events, presenting each wait event is out of scope for this paper. To get details about other events one is experiencing, contact an Oracle representative or search the WWW.

• Total_waits. This is the total number of waits, i.e., a count, for the specific event since the database instance has started.

• Total_timeouts. This is the total number of wait time-outs, i.e., a count, for the specific event since the database instance has started. Not all waits result in a time-out. Some waits will try “forever” (an exponential back-off algorithm is sometimes used) while others will try once or a few times, stop, and perform some coded action.

• Time_waited. This is the total time, in hundredths of a second, all sessions have waited for the specific event since the database instance has started.

• Average_wait. This is the average time, in hundredths of a second, all sessions have waited for the specific event since the database instance has started. As one would expect, this should equal time_waited divided by total_waits.

Figure 1 presents a v$system_event based tool and sample output. Figure 1 clearly shows sessions waiting for enqueues, busy database block buffers, latches, and database files. Each of these common waits will be discussed in this paper. Once the overall system has been examined, one drills down into each one of the significant wait areas, which we begin presenting below.

3.2.2 High level session perspective using v$session_event

The v$session_event virtual view presents exactly the same information as the v$system_event virtual view except it includes the session id for each event and the information is “zeroed” out after a session logs off. For example, if session ten’s statistics suddenly drop, that’s because session ten just logged off and another session logged in and was assigned session number ten.

This view is very useful for determining which sessions are causing or experiencing the most waits and then drilling down into the contention specifics.

Figure 2 below presents a v$session_event based tool and sample output. In situations where the session id is not known or is not relevant, one could execute the Figure 2 script without the session id filter. In the example shown, session id is asked for and supplied.

Figure 2 clearly shows that session ten is experiencing significant enqueue, buffer busy, and latch waits. As we will see, this combination points towards excessive buffer activity typically caused by poorly tuned SQL, poor application design, or both.

3.2.3 Low level session perspective using v$session_wait

The v$session_wait view is the most complicated and the most misunderstood of the session wait family. I suppose this is because, unlike the v$system_event and v$session_event views, the v$session_wait view reports session level wait information in real-time. It does not store or summarize information, but rather dumps out the raw information as it’s occurring.

Activity in an Oracle database occurs very, very fast (as opposed to other database vendors of course), so even quickly re-running a v$session_wait query may return results that look totally different than a split-second before. Don’t be alarmed by this, but rather understand why this occurs.

In addition to presenting information in real-time, the v$session_wait view provides detailed and specific information about the actual waits. For example, it will direct one to sessions experiencing latch contention and show which latch the sessions are waiting for. Or, if a session is waiting for access to a block, the actual database file number and data block number are provided.

The v$session_wait view also presents timing details, much like the v$system_event and v$session_event views. However, because of v$session_wait’s real-time reporting, the times presented mean different things depending on the wait situation. This is explained in more detail below.

The v$session_wait columns are presented below.

• Sid. The session id number. The same as the session id number in v$session_event and v$session.

• Seq#. The internal sequence number of the wait for this session. Use this column to determine the number of waits, i.e., counts, the session has experienced.

• Event. This is simply the general name of the event. While there are many wait events, the most common events are latch waits, db file scattered read, db file sequential read, enqueue wait, buffer busy wait, and free buffer waits. While I will be detailing the most common events, presenting each wait event is out of scope for this paper. To get details about other events one is experiencing, contact an Oracle representative or search the WWW.

• P[1-3]. These three parameters are pointers to more details about the specific wait. The parameters are foreign keys to other views and are wait event dependent. For example, for latch waits, p2 is the latch number, which is a foreign key to v$latch. But for db file sequential read (indexed read, yes an indexed read), p1 is the file number (foreign key to v$filestat or dba_data_files) and p2 is the actual block number (related to dba_extents, sys.uet$, sys.fet$). To use these parameters, one needs a list of the waits with their associated parameters. Again, for parameter specifics contact an Oracle representative or search the WWW. The p[1-3]text column may provide some information.

• P[1-3]raw. This is the raw representation of p[1-3]. This is not used very often and none of the tools presented in this paper reference it.

• P[1-3]text. Sometimes the name of the parameter, p[1-3], is provided. Don’t count on it.

• State. This is a very important parameter because it tells one how to interpret the next two columns discussed below, wait_time and seconds_in_wait. If one misinterprets the state or chooses to ignore it, one will most likely misinterpret the wait_time and seconds_in_wait numbers. The state column has four possible values.
    • Waiting. The session is currently waiting for the event.
    • Waited unknown time. The instance parameter timed_statistics is set to false.
    • Waited short time. The session did not wait even one clock tick to get what it wanted, so no wait time was recorded.
    • Waited known time. Once the session has waited and then received what it wanted, the state turns from waiting to waited known time.

• Wait_time. The value is dependent on the state column discussed above. If the state column value is:
    • Waiting, then the wait_time value is bogus.
    • Waited unknown time, then the wait_time value is bogus.
    • Waited short time, then the wait_time value is bogus.
    • Waited known time, then the actual wait time, in hundredths of a second, is shown. One is very lucky to see this value, because if the session begins waiting for another resource, i.e., the state will now be waiting, the wait_time value turns bogus.

• Seconds_in_wait. The value is dependent on the state column. If the state column value is:
    • Waiting, then the actual time, in seconds, is shown. If one does see a value, hopefully one won’t see it again when re-querying. If one does, the session is having to wait a relatively long time. If one repeatedly queries, one will get a good idea of how long a session waits when it does wait.
    • Waited unknown time, then the seconds_in_wait value is bogus.
    • Waited short time, then the seconds_in_wait value is bogus.
    • Waited known time, then the seconds_in_wait value is bogus.
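The state rules above amount to a small decision table, which can be sketched in code. This is only an illustration of the interpretation described in this paper: the function name interpret_wait and the sample rows are hypothetical, and the state strings are assumed to appear as listed above (compared here without regard to case).

```python
from typing import Optional, Tuple


def interpret_wait(state: str, wait_time: int,
                   seconds_in_wait: int) -> Tuple[str, Optional[int]]:
    """Return (meaning, usable_value) for one v$session_wait row.

    usable_value is None whenever both timing columns should be ignored.
    """
    normalized = state.strip().upper()
    if normalized == "WAITING":
        # Only seconds_in_wait can be trusted while the session is waiting.
        return ("currently waiting; use seconds_in_wait", seconds_in_wait)
    if normalized == "WAITED KNOWN TIME":
        # Only wait_time can be trusted once the wait has completed.
        return ("last wait finished; use wait_time", wait_time)
    if normalized == "WAITED SHORT TIME":
        return ("last wait was shorter than one clock tick", None)
    if normalized == "WAITED UNKNOWN TIME":
        return ("timed_statistics is false; no timing available", None)
    raise ValueError(f"unexpected state: {state!r}")


# Hypothetical rows: (sid, state, wait_time, seconds_in_wait).
rows = [
    (10, "WAITING", 0, 4),
    (11, "WAITED KNOWN TIME", 7, 0),
    (12, "WAITED SHORT TIME", -1, 0),
]
for sid, state, wait_time, seconds_in_wait in rows:
    meaning, value = interpret_wait(state, wait_time, seconds_in_wait)
    print(sid, meaning, value)
```

Returning None for the bogus cases forces the caller to check the state before using a number, which is exactly the discipline the state column demands.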

Figure 3 shows the source code and an example using the v$session_wait view to gather timing information, whereas Figure 4 shows the code and an example of using the v$session_wait view to gather “drill-down” information.

4. Changing One’s Approach

As presented above, using session wait statistics enables the performance specialist to pursue a different and much more productive performance tuning approach. While the general performance management approach, as outlined in the Total Performance Management paper (see reference section), remains the same, the drill-down strategy has been radically changed. This section is aimed at contrasting the two basic drill-down approaches to help one understand the differences so one can make the switch to the more powerful of the two.

When drilling down using session wait statistics, one typically starts out at the high level using the v$system_event view. Based upon the statistics gathered, one immediately begins to drill down using the v$session_event and the v$session_wait views. One continues this cycle until the contention information required is found.

Think about the power. In the past, when a person reported a problem, one began by looking at ratios, tracing their specific session, etc. Now one can look to see exactly why their session is waiting. One can even say “You are waiting for the X latch because everyone wants to update block number x, in file number y.” Not that anyone would actually say this, but the information is available.

Now let’s outline the traditional ratio based approach. There are many sub-systems within an Oracle system. There are many latches, caches, hash chains, lists, processes, etc. To determine if a particular area is “doing well,” one had to sample some statistics over time and calculate the ratios which should point one in the right direction. Usually the statistics do point the performance specialist in the right direction, but one must truly be a performance specialist, with many years of training, before this method becomes effective. What usually happens is various miscellaneous system changes are made which seemed to work in the past. If someone asked why a specific change was made, they would typically receive a response like, “Well if something works for me, I stick with it!”

Now let’s contrast the two methods. The ratio method is like the newborn baby, who just screams when it hurts. The doctor has a very difficult problem to solve because he cannot effectively communicate with the child. With an adult, effective communication is available. Likewise, with the session wait method, one simply performs a few simple queries and lets the system tell the doctor exactly where it hurts. It’s obviously more difficult than that, but I think you get the point.

So next time you look at the database block buffer cache hit ratio and it’s 60% but there are no related session waits, don’t worry about it. And when the database block buffer cache hit ratio is 95% but there are no related session waits, don’t worry about it. And when sessions begin to wait, don’t worry about it, just let the statistics guide you down the critical path towards productive and effective performance tuning.

5. Case Study

What is presented below is real life. Nothing is made up and everything is real. Notice no ratios are used during the analysis, yet the problem and its related effects were quickly discovered and analyzed.

Our general method is the Total Performance Method, utilizing the holistic problem identification methodology to approach the problem from an Oracle server, application, and operating system perspective. The Oracle server approach uses session wait statistics exclusively. The application section only references the most resource consuming SQL, and the operating system was diagnosed using only the UNIX sar utility. All are standard tools available to everyone. So here we go…

5.1 Oracle Server Investigation

5.1.1 High level review

To begin our analysis, we start with examining the overall situation. We execute sw5.sql which references the v$system_event view. The results are shown in Figure 5.

This system is experiencing serious contention and the users must be furious. Five different wait types vividly show themselves: enqueue waits, buffer busy waits, SQL*Net message from client, rdbms ipc message, and latch free waits. The enqueue waits are by far the most prevalent, but experience tells me all the significant waits are closely related.

I’m not interested in the SQL*Net message wait, based upon my experiences and conversations with my peers, and because SQL*Net is not being used in the system being monitored. Because db file sequential read is one of the most common waits in many systems, it will be discussed below.

5.1.2 Enqueue Waits

We will begin our quest by investigating the enqueue waits. Enqueues are probably one of the most misunderstood areas in the Oracle server. Much of the problem stems from the fact that there are many different types of enqueues, and determining which enqueue a process is waiting for, what the enqueue means, and what to do about it is difficult, as is finding relevant documentation. Therefore, before we dig into how to diagnose enqueue waits, a discussion about protecting Oracle internal structures is in order.

User implemented locks protect user defined structures (e.g., gl.gl_code_combinations), enqueues protect Oracle internal structures (e.g., uet$ and fet$), and latches ensure specific sections of Oracle kernel code are executed serially by a single session. For example, allowing a process to update the uet$ table while another process updates the fet$ table could have disastrous effects.

A difference between latches and enqueues is that latches maintain no sequence or ordering. The latch algorithm uses either timers to sleep, wake up, and retry (both single and multi-CPU environments), or, in multiprocessor environments, a process can “spin” and retry. Since all latch waiters are concurrently retrying, any process may get the latch. While unsettling, it is theoretically possible that the first process to try to get a latch may be the last process to get the latch. Latching algorithms and operating system schedulers obviously try to minimize this occurrence. However, in an extremely latch wait laden environment, this type of problem can and has occurred.

An enqueue is a sophisticated locking mechanism which permits several processes to share known resources to varying degrees. Any object which can be concurrently used can be protected with an enqueue. For example, Oracle allows varying levels of sharing on tables: two processes can lock a table in share mode or in share update mode.

Enqueues are platform-specific locking mechanisms. An enqueue allows a server-side process to store a value in a lock, that is, the requested lock mode. An operating system lock manager manages the locked resources. If a process cannot be granted a lock because the lock mode requested is incompatible with the locks already granted, the operating system places the requesting process in a FIFO wait queue.

The instance parameter enqueue_resources sets the number of resources that can be locked by the lock manager. Since multiple DML locks can be placed on an object, there need to be enough enqueues to cover the number of DML locks. For example, if three users are modifying data in one table, then three DML locks will be required. If three users are modifying data in two tables, then six DML locks will be required. As of Oracle7, all DDL locks are handled internally by the Oracle kernel and do not require an enqueue. Also, the Oracle kernel needs around twenty enqueues for overhead. Therefore, the maximum sensible enqueue_resources value would be twenty plus the sum, over all objects, of the number of DML transactions against each object. Typically, enqueue_resources is set much lower. In fact, by default, enqueue_resources is generally set to the instance parameter sessions times five.
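The arithmetic in the paragraph above can be worked through in a few lines. This is a sketch of the rule of thumb as stated here, not an official sizing formula: the function names are hypothetical, the table name teller is invented for illustration, and the overhead of twenty is the approximate figure given in the text.

```python
from typing import Dict

KERNEL_OVERHEAD_ENQUEUES = 20  # "around twenty" per the text; approximate


def dml_locks_needed(dml_sessions_per_table: Dict[str, int]) -> int:
    """One DML lock per session per table that the session modifies."""
    return sum(dml_sessions_per_table.values())


def max_sensible_enqueue_resources(dml_sessions_per_table: Dict[str, int]) -> int:
    """Kernel overhead plus every DML lock that could be held at once."""
    return KERNEL_OVERHEAD_ENQUEUES + dml_locks_needed(dml_sessions_per_table)


def default_enqueue_resources(sessions: int) -> int:
    """The default described in the text: sessions times five."""
    return sessions * 5


# Three users modifying one table, then the same three users on two tables.
print(dml_locks_needed({"account": 3}))
print(dml_locks_needed({"account": 3, "teller": 3}))
print(max_sensible_enqueue_resources({"account": 3, "teller": 3}))
print(default_enqueue_resources(sessions=100))
```

The first two calls reproduce the three and six DML locks from the example in the text; the third adds the kernel overhead to give the upper bound for that small workload.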

Back to our investigation. Figure 6 shows a very useful yet somewhat cryptic script. The specific enqueue waits are derived from the p1 parameter. Each enqueue wait can be drilled down even further by using the p2 and p3 parameters. For example, the TX enqueue’s p2 and p3 parameters point one to the undo segment, its slot, and its sequence number. In most cases, drilling down just to the specific enqueue wait name will provide one with plenty of information.
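How an enqueue name can be derived from p1 is sketched below. The layout is an assumption not stated in this paper: it follows the commonly documented encoding in which the two high-order bytes of the 32-bit p1 value hold the ASCII characters of the enqueue name and the low-order 16 bits hold the requested mode. The function name decode_enqueue_p1 is hypothetical, and the layout should be verified against the documentation for one's own release.

```python
from typing import Tuple


def decode_enqueue_p1(p1: int) -> Tuple[str, int]:
    """Split an enqueue wait's p1 into (enqueue name, requested mode).

    Assumed layout: byte 3 and byte 2 are ASCII characters, and the low
    16 bits are the mode number.
    """
    first_char = chr((p1 >> 24) & 0xFF)
    second_char = chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return first_char + second_char, mode


# 1415053318 is 0x54580006: 0x54 is "T", 0x58 is "X", and the mode is 6.
print(decode_enqueue_p1(1415053318))  # ('TX', 6)
```

Under this assumption a single integer from v$session_wait is enough to tell a transaction enqueue from, say, a space management enqueue without any further lookup.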

As v$system_event forewarned, there is significant enqueue waiting occurring. This script output doesn’t show this, but nearly three-quarters of the users on this system are waiting for the same enqueue resource, a transaction enqueue. More specifically, the transaction enqueue is held in exclusive mode when a transaction initiates its first change and held until the transaction performs either a commit or a rollback. This directs us to be on the lookout for intense DML activity and/or poorly configured rollback segments. In this particular situation, the rollback segments are configured to handle the application’s expected workload. If, however, we address the DML concern and transaction enqueue waits persist, then the rollback segments will need to be reconfigured.

5.1.3 Buffer Busy Waits

The next wait to investigate is buffer busy waits. A buffer is “busy” when another process is reading it into the data block buffer cache or the related block(s) are being “converted” into a “cleaned-up” block state. If a block is being repeatedly updated, locating and accessing the buffer(s) could cause a session to wait for the block(s).

When a session updates row data in a block, the Oracle kernel performs the least amount of work necessary for the session to continue performing other useful work. Later on, the block(s) will be converted back into a stable state. However, if a specific block is repeatedly updated, “copies” of the block are made to ensure a user is not held waiting and internal Oracle integrity is maintained. While this process increases performance from one perspective, the time involved to locate the busy block will increase, thereby decreasing performance from another perspective. (See the latch waits section below.) This process can cause a buffer(s) to become busy and induce a buffer busy wait.

We determine the actual buffer being requested by its file number and its block number. Figure 7 shows the sw3.sql script being used to show the parameters for all the waits, for all the sessions. This much detail can only come from the v$session_wait view. On a very large system, the output could be pages long. We will also use this output for diagnosing the latch wait activity.

For buffer busy waits, db file sequential reads (index reads), and db file scattered reads (full table scans), p1 is the file number and p2 is the block number. Using this information combined with dba_data_files and dba_extents, we can determine the actual operating system file name and database object these sessions are waiting for. In this case study, nearly every busy buffer is for file number five, block number eight. File number five is /u03/oradata/fox/users01.dbf and block number eight is in the account table.
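The lookup from a file number and block number to a database object comes down to a range test against each extent. The sketch below assumes rows shaped like dba_extents (owner, segment_name, file_id, block_id as the first block of the extent, and blocks as its length); the Extent type, the find_segment function, the owner APP, and the extent boundaries are all hypothetical, chosen only so that file five, block eight lands in the account table as in the case study.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    """A few dba_extents columns for one extent."""
    owner: str
    segment_name: str
    file_id: int
    block_id: int  # first block of the extent
    blocks: int    # number of blocks in the extent


def find_segment(extents: List[Extent], file_id: int,
                 block: int) -> Optional[Extent]:
    """Return the extent containing the given file and block, if any."""
    for ext in extents:
        in_file = ext.file_id == file_id
        in_range = ext.block_id <= block < ext.block_id + ext.blocks
        if in_file and in_range:
            return ext
    return None


# Hypothetical extent map (illustrative values only).
extents = [
    Extent("APP", "ACCOUNT", 5, 2, 10),     # blocks 2 through 11 of file 5
    Extent("APP", "ACCOUNT_PK", 5, 12, 5),  # blocks 12 through 16 of file 5
]

hit = find_segment(extents, file_id=5, block=8)  # p1=5, p2=8 from the wait
print(hit.segment_name if hit else "no extent found")
```

The same range test can be written as a query against dba_extents; the point of the sketch is only that p1 and p2 together identify exactly one extent and therefore one object.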

Let’s review. The enqueue waits are directing one to be on the lookout for heavy DML activity or poorly configured rollback segments. The buffer busy waits direct one to a specific block, in the account table and in the users01.dbf database file. If there is an I/O bottleneck, the users01.dbf file will most likely be very “hot.” If not, there will most likely be heavy SGA data block buffer activity. The picture is becoming clearer.

5.1.4 Latch Waits

Figure 7 shows many sessions waiting on the same latch. The latch number is provided by p2, which references the latch number in the v$latch view. The script swlatch.sql, shown in Figure 9, shows the cache buffer chain latch is the latch sessions are waiting for.

We have already discussed latching in the previous two sections, but there are still more details to present to better understand the latching process. Certain parts of the Oracle kernel code should only be run by a single process at a time to ensure certain structures are properly maintained (e.g., one wouldn't want to query a node from a linked list if someone else was simultaneously deleting the node’s parent). A latch is a data structure and a protocol to ensure only one session accesses certain parts of the kernel code at a time.

One latch in particular is the cache buffer chain latch. To allow an Oracle server-side process to quickly determine whether a desired database block resides in the buffer cache, Oracle keeps a hash table in the SGA. This hash table consists of several hash chains (there are _db_block_hash_buckets hash chains in the hash table). Each hash chain can be viewed as a hash bucket with a sequential linked list of blocks related to the specific bucket, hence the term hash chain.

Each database block is uniquely identified by its file number and its block number. The hashing algorithm “hashes” each data block about to reside in the data block buffer cache. The hash function inputs are the file number, the block number, and the block type. The hash function output is the appropriate hash chain.

Any time a server-side process needs to reference a block, for any reason, the block can be quickly referenced by hashing the block and then serially searching the appropriate hash chain for the block’s SGA address. So each time a block is inserted, deleted, or just examined, the appropriate hash chain must be accessed and sequentially searched. To maintain internal integrity, any Oracle server code which touches the cache buffer chain must only be executed by a single session at a time. To accomplish this serial execution, the Oracle kernel code uses the cache buffer chain latch.
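The hash table of hash chains can be modeled in a few lines to show why a long chain means a long latch hold. This is a toy model under stated assumptions, not Oracle's data structure: BufferCacheIndex and its methods are hypothetical, Python's built-in hash stands in for the real hash function (which the text does not describe), and a single threading.Lock stands in for the cache buffer chain latch.

```python
import threading
from typing import List, Optional, Tuple

BlockKey = Tuple[int, int, int]  # (file number, block number, block type)


class BufferCacheIndex:
    """Toy hash table of hash chains guarded by one latch-like lock."""

    def __init__(self, bucket_count: int) -> None:
        self.chains: List[List[Tuple[BlockKey, int]]] = [
            [] for _ in range(bucket_count)
        ]
        self.latch = threading.Lock()  # stands in for the chain latch

    def _chain_for(self, key: BlockKey) -> List[Tuple[BlockKey, int]]:
        # Hypothetical hash function: the hash output selects the chain.
        return self.chains[hash(key) % len(self.chains)]

    def insert(self, key: BlockKey, sga_address: int) -> None:
        with self.latch:
            self._chain_for(key).append((key, sga_address))

    def lookup(self, key: BlockKey) -> Tuple[Optional[int], int]:
        """Return (SGA address or None, number of chain entries examined)."""
        with self.latch:
            examined = 0
            # The latch is held for the whole sequential walk of the chain.
            for entry_key, address in self._chain_for(key):
                examined += 1
                if entry_key == key:
                    return address, examined
            return None, examined


# One bucket forces every block onto the same chain (the worst case).
index = BufferCacheIndex(bucket_count=1)
for block_no in range(1, 9):
    index.insert((5, block_no, 1), 0x1000 + block_no)

print(index.lookup((5, 8, 1)))   # found after walking all eight entries
print(index.lookup((5, 99, 1)))  # not cached: the whole chain is walked
```

The number of entries examined is a stand-in for how long the latch is held: the longer the chain, the longer every other session that needs the latch must wait.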

If multiple sessions want to run a section of kernel code, then one or more sessions must wait for the latch to become available. Until it does become available, they must wait. This wait shows up as a session wait: the latch wait.

As discussed in the previous two sections, the cache buffer chain latch wait tells one a process is waiting to acquire the cache buffer chain latch. If a block is being updated repeatedly, “copies” of the block are made to allow the updating processes to continue performing useful work and to allow other processes to access the block. Unfortunately, when this occurs, the related hash chain becomes “long.” So to confirm the existence of the desired block in the SGA, the cache buffer chain latch must be held longer than normal. This can cause the cache buffer chain latch wait.

We have completed our investigation from an Oracle server perspective. The evidence shows most sessions are performing an update, updating the same block over and over again.

5.2 Operating System Perspective

No analysis is complete without looking at the computing system from an operating system perspective. Presented below is a grossly abbreviated operating system investigation; the focus of this paper is on session wait statistics, not on the operating system.

In most cases, the operating system investigation will support findings from both an Oracle server and application perspective. As one will see, this case is no different.

An operating system should be investigated from four angles: the I/O subsystem, the CPU subsystem, the memory subsystem, and the network subsystem. This particular computing system resides on a single host, so there is no network activity occurring. While not shown below in Figure 10, memory was checked and was not a problem. That leaves CPU and I/O. The CPU and I/O subsystems were very quickly investigated by running the UNIX sar -u command. The results are shown in Figure 10; thirty seconds of statistics do not prove anything, but they are indicative. The output shows a CPU bottleneck, with the CPU waiting for information from the I/O subsystem less than one percent of the time.

This shows the CPU is very busy, possibly concentrating predominately on memory manipulation, such as very intense SGA activity. Again this supports the buffer busy, the latch, and the enqueue wait activity we discovered.
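As a rough illustration of how the sar -u samples lead to the "CPU bottleneck, almost no I/O wait" reading, the Python sketch below averages the Figure 10 samples and applies a crude classification. The function names and both thresholds are illustrative assumptions, not part of the original investigation.

```python
# Figure 10 sample lines, copied by hand (time, %usr, %sys, %wio, %idle).
SAR_OUTPUT = """\
14:53:17 %usr %sys %wio %idle
14:53:22 87 13 0 0
14:53:27 93 7 0 0
14:53:32 88 12 0 0
14:53:37 88 12 0 0
14:53:42 87 13 0 0
"""


def average_sar(text):
    """Return the mean of each percentage column, keyed by column name."""
    rows = [line.split() for line in text.strip().splitlines()]
    names = [name.lstrip("%") for name in rows[0][1:]]
    samples = [[int(value) for value in row[1:]] for row in rows[1:]]
    return {
        name: sum(sample[i] for sample in samples) / len(samples)
        for i, name in enumerate(names)
    }


def classify(avg, busy_pct=90.0, wio_pct=1.0):
    """Crude triage; both thresholds are assumptions chosen for this sketch."""
    if avg["wio"] >= wio_pct:
        return "I/O wait present"
    if avg["usr"] + avg["sys"] >= busy_pct:
        return "CPU bound"
    return "no obvious CPU or I/O bottleneck"


averages = average_sar(SAR_OUTPUT)
print(averages)
print(classify(averages))
```

With the five samples above, the averages round to the 89 percent user and 11 percent system figures that sar reports on its Average line.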

When there is intense SGA activity, it is not uncommon to see CPU system time greater than CPU user time. Enqueue activity uses the operating system kernel code for queuing, and latching uses the operating system kernel code for “sleeping.” Oracle kernel code is predominately used during latch wait spinning, which shows up as CPU user time.

5.3 Application Perspective

So far our investigation shows something is causing serious buffer activity resulting in latch waits, buffer busy waits, enqueue waits, and a very high CPU utilization. Just as with the operating system investigation above, this application investigation is grossly abbreviated, but serves our purpose.

Since we suspect a single SQL statement is causing the problems, looking at what SQL statements are currently being run may bring our investigation full circle. Figure 11 lists all the currently running SQL statement hash values by querying from the v$session view. As we suspected, a large percentage of users are running the exact same SQL statement.

Now that we have the SQL statement’s hash value, we will query v$sqltext to show the actual SQL statement. And there’s our culprit! A very simple update statement.
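Spotting the dominant statement by eye works for a short listing; for a longer one, a simple tally does the job. The Python sketch below counts sessions per SQL hash value using a hand-copied subset of Figure 11 (not the full listing). The function name is hypothetical, and treating a hash value of 0 as "no current statement" is an assumption.

```python
from collections import Counter

# (sid, sql_hash_value) pairs: a hand-copied subset of Figure 11.
SESSIONS = [
    (1, 0), (2, 0),
    (5, 2020739258), (6, 1724283224), (8, 1724283224),
    (10, -7079893), (11, -7079893), (12, -489192978), (13, -489192978),
    (14, -7079893), (15, -7079893), (19, -598940500), (23, -7079893),
    (27, -598940500), (35, -7079893), (37, -7079893),
]


def rank_statements(sessions):
    """Return (hash_value, session_count, percent) sorted by popularity."""
    # Assumption: a hash value of 0 means the session has no current SQL.
    counts = Counter(h for _sid, h in sessions if h != 0)
    total = sum(counts.values())
    return [(h, n, 100.0 * n / total) for h, n in counts.most_common()]


for hash_value, n, pct in rank_statements(SESSIONS):
    print(f"{hash_value:>12} {n:>3} sessions {pct:5.1f}%")
```

The hash value at the top of the ranking is the one to pass to the v$sqltext lookup.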

6. Final Analysis

This analysis was very straightforward, with each piece of evidence nicely building our case. Let’s review our evidence.

• Enqueue TX waits direct one to be on the lookout for DML activity that many sessions are running and that accesses the same table(s). This could also indicate poorly configured rollback segments.

• Buffer busy waits direct one to a specific block, in the account table and in the users01.dbf database file.

• Cache buffer chain latch waits tell one that many sessions want to touch the cache buffer chain. A high update concurrency rate on a single block could cause this.

• CPU subsystem saturation with most of the time spent in user mode directs one towards Oracle kernel code based CPU activity. Intense cache buffer chain latch activity, along with other SGA manipulation activity such as enqueue and buffer busy waits, is just the thing to cause this.

• I/O subsystem not doing much confirms that the users01.dbf database file is not being accessed much from an I/O perspective, but the blocks in the database file are being heavily manipulated in the SGA.

• A single SQL update statement updating a single block is being performed by a large percentage of the sessions.

Now that the problem has been identified, we need to ask, “What the heck is going on? Why is the same bloody update statement being executed over and over?” Unfortunately, this situation is not that uncommon.

Two situations arise which could easily cause this pattern of activity. One is using a table for sequence number generation, and the other is low budget and worthless benchmarks or stress tests. The case study shows that even a single SQL statement, strategically placed and executed, can cause a tremendous amount of contention.

Besides the application and benchmark design issues raised by this example, using a ratio performance tuning approach would have brought one to the same conclusion, but not nearly as quickly and not with such exactness and persuasion.

7. Conclusion

This paper was written for those who find performance tuning books and nearly all published performance tuning papers inadequate, antiquated, and not supplying performance specialists with the most effective methods to tune Oracle based systems. I hope this paper, through the scripts, the prose, the examples, and the case study, has provided you with the information you need to begin significantly increasing your performance optimization productivity using Oracle virtual session wait based tables. Thank you for your time.

8. References

Chatterjee, S.; Price, B. Regression Analysis by Example. John Wiley & Sons, 1991. ISBN 0-471-88479-0

Cook, D.; Dudar, M.; Shallahamer, C. The Ratio Modeling Technique. Oracle Corporation White Paper, 1997. http://www.europa.com/~orapub

Jain, R. The Art of Computer Systems Performance Analysis. John Wiley & Sons, 1991. ISBN 0-471-50336-3

Levin, R.; Kirkpatrick, C.; Rubin, D. Quantitative Approaches to Management. McGraw-Hill Book Company, 1982. ISBN 0-07-037436-8

Menascé, D.; Almeida, V.; Dowdy, L. Capacity Planning and Performance Modeling. PTR Prentice Hall, Englewood Cliffs NJ, 1994. ISBN 0-13-035494-5

Michalko, M. Thinkertoys. Ten Speed Press, 1991. ISBN 0-89815-408-1

Millsap, C. Designing Your System To Meet Your Requirements. Oracle Corporation White Paper, 1995. http://www.europa.com/~orapub

Rubinstein, M.; Firstenberg, I. Patterns Of Problem Solving. Prentice Hall, 1995. ISBN 0-13-122706-8

Saksena, V. Identifying Resource Intensive SQL In A Production Environment. Oracle Corporation White Paper, 1996. http://www.europa.com/~orapub

Saksena, V. Tuning The Oracle Server - Identifying Internal Contention. Oracle Corporation White Paper, 1996. http://www.europa.com/~orapub

Shallahamer, C. Avoiding A Database Reorganization. Oracle Corporation White Paper, 1995. http://www.europa.com/~orapub

Shallahamer, C. Predicting Computing System Throughput and Capacity. Oracle Corporation White Paper, 1995. http://www.europa.com/~orapub

Shallahamer, C. Total Performance Management. Oracle Corporation White Paper, 1994. http://www.europa.com/~orapub

9. About the Presenter/Author

Mr. Shallahamer's ten-plus years of experience in the IT marketplace brings a unique balance of controlled creativity to any person, team, or classroom. As the President of OraPub, Inc., his objective is to empower Oracle performance specialists and technical architects. Recently, Mr. Shallahamer directed the global technical training efforts for Oracle Consulting's technical consultants. Previously he managed the Western area of Oracle Services System Performance Group. Since joining Oracle in 1989, Mr. Shallahamer has co-founded three highly respected technical consulting groups (National Product Specialist Team, Core Technologies, System Performance Group) and has worked at hundreds of client sites around the world. His specializations include business management, training/education, performance optimization and management, performance prediction/modeling/planning, and system/technical architecture design related research, training, coaching, and consulting. As a result, Mr. Shallahamer has had the pleasure to publish and present a number of papers at the EOUG, OAUG, IOUG, Openworld, and in Oracle Magazine.

10. Figures

All tools shown below are available, for free, from OraPub’s web-site (http://www.europa.com/~orapub).

-- file sw5.sql

col event format a25      heading "Wait Event" trunc
col tws   format 99999999 heading "Total|Waits"
col tt    format 99999999 heading "Total|Timouts"
col tw    format 99999.9  heading "Time(sec)|Waited"
col avgw  format 9999     heading "Avg (ms)|Wait"

select event,
       total_waits tws,
       total_timeouts tt,
       time_waited/1000 tw,
       average_wait avgw
from   v$system_event
order by time_waited desc;

SQL> @sw5

                              Total     Total Time(sec) Avg (ms)
Wait Event                    Waits   Timouts    Waited     Wait
------------------------- --------- --------- --------- --------
enqueue                     5804940       635   15780.5        3
buffer busy waits           2710683     10031    8469.5        3
SQL*Net message from clie      3624         0    3980.5     1098
rdbms ipc message             57584      2987    3332.3       58
latch free                  1046312   1024085    2511.7        2
pmon timer                    16262      6114    2326.1      143
smon timer                       81        77    2322.9    #####
log file parallel write      267082         0    1644.0        6
db file parallel write        21188      3366    1253.2       59
rdbms ipc reply                8849      2154     775.5       88
PL/SQL lock timer              1473      1347     270.8      184
db file scattered read      1145592         0     183.7        0
free buffer waits              1820      1608     164.5       90
write complete waits           3482       492     122.1       35
log file sync                  8831        59     111.9       13
db file sequential read      319650         0      55.8        0
log buffer space                846       219      47.4       56
log file switch completio      1141        22      39.9       35
control file parallel wri      8052         0      39.2        5
db file single write           3110         0      10.2        3
log file single write           523         0       2.6        5
control file sequential r     11578         0        .3        0
row cache lock                   12         0        .3       23
log file sequential read        264         0        .3        1
SQL*Net message to client      3625         0        .1        0
process startup                   3         0        .0        2
SQL*Net break/reset to cl        42         0        .0        0
instance state change             1         0        .0        0

Figure 1.

-- file sw4.sql

col sid   format 9999    heading "Sess|ID"
col event format a25     heading "Wait Event" trunc
col tws   format 9999999 heading "Total|Waits"
col tt    format 99999   heading "Total|Timouts"
col tw    format 9999999 heading "Time (ms)|Waited"
col avgw  format 9999    heading "Avg (ms)|Wait"

select sid,
       event,
       total_waits tws,
       total_timeouts tt,
       time_waited tw,
       average_wait avgw
from   v$session_event
where  sid = &sid
order by time_waited desc, event;

SQL> @sw4
Enter value for sid: 10

 Sess                              Total   Total Time (ms) Avg (ms)
   ID Wait Event                   Waits Timouts    Waited     Wait
----- ------------------------- -------- ------- --------- --------
   10 enqueue                     537484      54   1241548        2
   10 buffer busy waits           244341     755    621615        3
   10 latch free                   80715   78496    120973        1
   10 write complete waits           207      16      5447       26
   10 free buffer waits               55      52      5258       96
   10 log file switch completio       75       1      2446       33
   10 log buffer space                34       8      1738       51
   10 SQL*Net message from clie       36       0         7        0
   10 SQL*Net message to client       36       0         0        0
   10 db file sequential read         12       0         0        0

Figure 2

-- file sw2.sql

col event format a25     heading "Wait Event" trunc
col state format a15     heading "Wait State" trunc
col siw   format 99999   heading "Waited So|Far (ms)"
col wt    format 9999999 heading "Time Waited|(ms)"

select event,
       state,
       seconds_in_wait siw,
       wait_time wt
from   v$session_wait
where  sid = &sid
order by event;

SQL> @sw2
Enter value for sid: 10

                                          Waited So Time Waited
Wait Event                Wait State       Far (ms)        (ms)
------------------------- --------------- --------- -----------
latch free                WAITING                 2           0

SQL> /
Enter value for sid: 10

                                          Waited So Time Waited
Wait Event                Wait State       Far (ms)        (ms)
------------------------- --------------- --------- -----------
enqueue                   WAITED SHORT TI         1          -1  (both values bogus)

SQL> /
Enter value for sid: 10

                                          Waited So Time Waited
Wait Event                Wait State       Far (ms)        (ms)
------------------------- --------------- --------- -----------
buffer busy waits         WAITING                 1           2  (bogus value)

SQL> /
Enter value for sid: 10

                                          Waited So Time Waited
Wait Event                Wait State       Far (ms)        (ms)
------------------------- --------------- --------- -----------
buffer busy waits         WAITING                 0           0

Figure 3

-- file sw3.sql

col sid   format 9999          heading "Sess|ID"
col event format a10           heading "Wait Event" wrap
col state format a10           heading "Wait State" trunc
col siw   format 99999         heading "W'd So|Far (ms)"
col wt    format 9999999       heading "Time|W'd (ms)"
col p1    format 9999999999999 heading "P1"
col p2    format 999999999     heading "P2"
col p3    format 99999         heading "P3"

select sid,
       event,
       state,
       seconds_in_wait siw,
       wait_time wt,
       p1, p2, p3
from   v$session_wait;

SQL> @sw3

 Sess                         W'd So     Time
   ID Wait Event Wait State Far (ms) W'd (ms)             P1         P2     P3
----- ---------- ---------- -------- -------- -------------- ---------- ------
   45 latch free WAITING           0        0     3759137796          9      0
   63 latch free WAITED KNO        0        1     3759137704          9      0
    1 pmon timer WAITING           1        0            300          0      0
    6 enqueue    WAITING           0        0     1398013958          0      0
   15 enqueue    WAITING           0        0     1415053318     327711  53651
…
   11 enqueue    WAITING           0        0     1415053318     327711  53651
   14 buffer bus WAITING           0        0              5          8   1016
      y waits
…
   23 buffer bus WAITING           0        0              5          8   1016
      y waits
   28 db file se WAITED SHO        0       -1              5         31      1
      quential r
      ead
   12 db file sc WAITED SHO        0       -1              5         29     10
      attered re
      ad
   19 db file sc WAITED SHO        0       -1              5         26      5
      attered re
      ad
    4 smon timer WAITING          94        0            300          0      0
    7 SQL*Net me WAITED SHO        0       -1     1650815232          1      0
      ssage from
       client
   22 SQL*Net me WAITING          43        0     1650815232          1      0
      ssage from
       client

Figure 4

SQL> @sw5

                              Total     Total Time(sec) Avg (ms)
Wait Event                    Waits   Timouts    Waited     Wait
------------------------- --------- --------- --------- --------
enqueue                     5804940       635   15780.5        3
buffer busy waits           2710683     10031    8469.5        3
SQL*Net message from clie      3624         0    3980.5     1098
rdbms ipc message             57584      2987    3332.3       58
latch free                  1046312   1024085    2511.7        2
pmon timer                    16262      6114    2326.1      143
smon timer                       81        77    2322.9    #####
log file parallel write      267082         0    1644.0        6
db file parallel write        21188      3366    1253.2       59
rdbms ipc reply                8849      2154     775.5       88
PL/SQL lock timer              1473      1347     270.8      184
db file scattered read      1145592         0     183.7        0
free buffer waits              1820      1608     164.5       90
write complete waits           3482       492     122.1       35
log file sync                  8831        59     111.9       13
db file sequential read      319650         0      55.8        0
log buffer space                846       219      47.4       56
log file switch completio      1141        22      39.9       35
control file parallel wri      8052         0      39.2        5
db file single write           3110         0      10.2        3
log file single write           523         0       2.6        5
control file sequential r     11578         0        .3        0
row cache lock                   12         0        .3       23
log file sequential read        264         0        .3        1
SQL*Net message to client      3625         0        .1        0
process startup                   3         0        .0        2
SQL*Net break/reset to cl        42         0        .0        0
instance state change             1         0        .0        0

Figure 5

-- file swenque.sql

col sid  format 9999    heading "Sid"
col enq  format a4      heading "Enq."
col edes format a30     heading "Enqueue Name"
col md   format a10     heading "Lock Mode"
col p2   format 9999999 heading "ID 1"
col p3   format 9999999 heading "ID 2"

select sid,
       chr(bitand(p1,-16777216)/16777215)||
       chr(bitand(p1, 16711680)/65535) enq,
       decode(chr(bitand(p1,-16777216)/16777215)||
              chr(bitand(p1, 16711680)/65535),
              'TX','Transaction accessing a rbs',
              'ST','Space Mgt activity (e.g., uet$, fet$)',
              'SS','Sort Segment activity',
              'TM','DML activity, prevents object DDL',
              'UL','User Defined',
              chr(bitand(p1,-16777216)/16777215)||
              chr(bitand(p1, 16711680)/65535)) edes,
       decode(bitand(p1,65535),
              1,'Null',2,'Sub-Share',3,'Sub-Exclusive',
              4,'Share',5,'Share/Sub-Exclusive',6,'Exclusive','Other') md,
       p2, p3
from   v$session_wait
where  event = 'enqueue'
/

SQL>@swenq

  Sid Enq. Enqueue Name                   Lock Mode      ID 1     ID 2
----- ---- ------------------------------ ---------- -------- --------
   10 TX   RBS Transaction                Exclusive    131072    54387
   11 TX   RBS Transaction                Exclusive    131072    54387
   14 TX   RBS Transaction                Exclusive    131072    54387
   23 TX   RBS Transaction                Exclusive    131072    54387
   15 TX   RBS Transaction                Exclusive    131072    54387
   61 TX   RBS Transaction                Exclusive    131072    54387
   57 TX   RBS Transaction                Exclusive    131072    54387
   55 TX   RBS Transaction                Exclusive    131072    54387
   50 TX   RBS Transaction                Exclusive    131072    54387
   45 TX   RBS Transaction                Exclusive    131072    54387
   44 TX   RBS Transaction                Exclusive    131072    54387
   42 TX   RBS Transaction                Exclusive    131072    54387
   40 TX   RBS Transaction                Exclusive    131072    54387
   70 TX   RBS Transaction                Exclusive    131072    54387
   66 TX   RBS Transaction                Exclusive    131072    54387
   63 TX   RBS Transaction                Exclusive    131072    54387
   62 TX   RBS Transaction                Exclusive    131072    54387
   39 TX   RBS Transaction                Exclusive    131072    54387
   37 TX   RBS Transaction                Exclusive    131072    54387
   35 TX   RBS Transaction                Exclusive    131072    54387
   33 TX   RBS Transaction                Exclusive    131072    54387

Figure 6
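The bit manipulation in swenque.sql can be hard to read. The Python sketch below performs the same decoding of an enqueue wait's P1 value outside the database: the top two bytes are the two-character enqueue name and the low 16 bits are the requested lock mode. The function name is hypothetical; the mode names are the ones used in the script's decode, and the sample P1 values are taken from the sw3 output shown in the figures.

```python
LOCK_MODES = {
    1: "Null",
    2: "Sub-Share",
    3: "Sub-Exclusive",
    4: "Share",
    5: "Share/Sub-Exclusive",
    6: "Exclusive",
}


def decode_enqueue_p1(p1):
    """Split an enqueue wait's P1 into (enqueue name, lock mode)."""
    p1 &= 0xFFFFFFFF                      # treat P1 as an unsigned 32-bit value
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF                    # low 16 bits hold the lock mode
    return name, LOCK_MODES.get(mode, "Other")


for p1 in (1415053318, 1398013958):       # P1 values seen in the sw3 output
    print(p1, decode_enqueue_p1(p1))
```

Shifting and masking avoids the division by 16777215 and 65535 that the SQL version relies on, which only works there because chr() discards the fractional part.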

SQL> @sw3

 Sess                         W'd So     Time
   ID Wait Event Wait State Far (ms) W'd (ms)             P1         P2     P3
----- ---------- ---------- -------- -------- -------------- ---------- ------
   28 latch free WAITED KNO        0        8     3759327632         11      5
   48 latch free WAITING           0        0     3759327632         11      6
   68 latch free WAITING           0        0     3759326280         11      5
   74 latch free WAITING           0        0     3759334496         11      0
   75 latch free WAITING           0        0     3759337616         11      0
   72 latch free WAITING           0        0     3759327632         11      7
   10 enqueue    WAITING           0        0     1415053318     131101  73620
   33 enqueue    WAITING           1        0     1415053318     262169  73576
   11 enqueue    WAITING           1        0     1415053318     262169  73576
   40 enqueue    WAITING           1        0     1415053318     262169  73576
   70 enqueue    WAITING           1        0     1415053318     262169  73576
   61 enqueue    WAITING           1        0     1415053318     262169  73576
   45 enqueue    WAITING           0        0     1415053318     131101  73620
   15 enqueue    WAITING           1        0     1415053318     262169  73576
   14 buffer bus WAITING           0        0              5          8   1016
      y waits
   37 buffer bus WAITING           0        0              5          8   1016
      y waits
   42 buffer bus WAITING           0        0              5          8   1016
      y waits
   55 buffer bus WAITING           0        0              5          8   1016
      y waits
   63 buffer bus WAITING           1        0              5          8   1016
      y waits
   62 buffer bus WAITING           0        0              5          8   1016
      y waits
   66 buffer bus WAITING           1        0              5          8   1016
      y waits
   57 buffer bus WAITING           0        0              5          8   1016
      y waits
   50 buffer bus WAITING           0        0              5          8   1016
      y waits
   44 buffer bus WAITING           0        0              5          8   1016
      y waits
   39 buffer bus WAITING           0        0              5          8   1016
      y waits
   12 db file se WAITED SHO        2       -1              5         16      1
      quential r
      ead
    6 db file sc WAITED SHO        0       -1              5        299      4
      attered re
      ad

Figure 7

-- file swdbb.sql

col owner format a15 heading "Obj Owner" wrap
col sname format a20 heading "Obj Name" wrap
col stype format a10 heading "Obj Type" wrap
col tblsp format a10 heading "TBS Name" wrap
col fname format a40 heading ""

select owner,
       segment_name sname,
       segment_type stype,
       e.tablespace_name tblsp,
       file_name fname
from   dba_extents e,
       dba_data_files f
where  e.file_id = f.file_id
  and  e.file_id = &file_id
  and  e.block_id <= &block_id
  and  e.block_id + e.blocks > &block_id
/

SQL> @swdbb
Enter value for file_id: 5
Enter value for block_id: 8
Enter value for block_id: 8

Obj Owner       Obj Name             Obj Type   TBS Name
--------------- -------------------- ---------- ----------
CSHALLAH        ACCOUNT              TABLE      USERS
/u03/oradata/fox/users01.dbf

Figure 8
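The where clause in swdbb.sql is a range test: an extent owns a block when the extent's first block is at or before it and the extent's end is past it. A minimal Python sketch of that same test follows. The Extent type, the function name, and the extent boundaries in the sample data are hypothetical; only the owner, segment name, file number, and block number come from Figure 8.

```python
from typing import NamedTuple, Optional


class Extent(NamedTuple):
    owner: str
    segment_name: str
    file_id: int
    block_id: int   # first block of the extent
    blocks: int     # number of blocks in the extent


def find_extent(extents, file_id, block_no) -> Optional[Extent]:
    """Return the extent containing (file_id, block_no), or None."""
    for extent in extents:
        # Same test as swdbb.sql: block_id <= block < block_id + blocks.
        if (extent.file_id == file_id
                and extent.block_id <= block_no < extent.block_id + extent.blocks):
            return extent
    return None


# Hypothetical extent map; the boundaries are made up for illustration.
EXTENTS = [
    Extent("CSHALLAH", "ACCOUNT", 5, 2, 10),
    Extent("CSHALLAH", "OTHER_TABLE", 5, 12, 10),
]

print(find_extent(EXTENTS, 5, 8))
```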

-- file swlatch.sql

col lid format 9999 heading "Latch #"
col lnm format a40  heading "Latch Name"

select latch# lid,
       name lnm
from   v$latch
where  latch# = &latch_number
/

SQL> @swlatch
Enter value for latch_number: 11

Latch # Latch Name
------- ----------------------------------------
     11 cache buffers chains

Figure 9

$ sar -u 5 5

SunOS orapub1 5.5.1 Generic sun4u

14:53:17    %usr    %sys    %wio   %idle
14:53:22      87      13       0       0
14:53:27      93       7       0       0
14:53:32      88      12       0       0
14:53:37      88      12       0       0
14:53:42      87      13       0       0

Average       89      11       0       0

Figure 10

SQL> select sid, sql_hash_value from v$session;

 Sess
   ID SQL_HASH_VALUE
----- --------------
    1              0
    2              0
    3              0
    4     -1.866E+09
    5     2020739258
    6     1724283224
    7              0
    8     1724283224
   10       -7079893
   11       -7079893
   12     -489192978
   13     -489192978
   14       -7079893
   15       -7079893
   19     -598940500
   22     -1.699E+09
   23       -7079893
   27     -598940500
   28     -489192978
   30     -489192978
   33     -1.945E+09
   35       -7079893
   37       -7079893
   39       -7079893
   40       -7079893
   42       -7079893
   44       -7079893
   45       -7079893
   48       -7079893
   50       -7079893
   55       -7079893
   57       -7079893
   58              0
   61       -7079893
   62       -7079893
   63       -7079893
   66       -7079893
   68     -598940500
   70       -7079893
   72     -489192978
   74     -944108210
   75     -598940500
   76              0

SQL> @sqls2 -7079893

Database: op01                                     1998 Jul 08 11:21pm
Report:   sstmt2.sql                                            Page 1

                 SQL Statement Text (Ident=-7079893)

  Line SQL Statement
------ -----------------------------------------------------------------
     0 UPDATE ACCOUNT SET BROKER_ID=BROKER_ID WHERE ID = '25372'

Figure 11