Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1

www.enkitec.com 1

Troubleshoo4ng the Most Complex Performance Issue I’ve ever seen

Tanel Poder hAp://blog.tanelpoder.com

hAp://tech.e2sn.com

www.enkitec.com 2

Intro: About me

•  Tanel Põder Oracle Database Performance geek Exadata Performance geek Hadoop Performance geek

•  Enkitec •  Consultant •  Researcher •  Technology Evangelist

•  Just moved to Dallas •  ANer Tallinn -‐> Stockholm -‐> London -‐>

Cancun -‐> Singapore

Expert Oracle Exadata book

(with Kerry Osborne and Randy Johnson of Enkitec)

www.enkitec.com 3

Two issues -‐ actually

•  For warm-‐up: •  cursor pin: S wait events and sporadic CPU spikes •  Read more from my blog entry:

•  hAp://blog.tanelpoder.com/2010/04/21/cursor-‐pin-‐s-‐waits-‐sporadic-‐cpu-‐spikes-‐and-‐systema4c-‐troubleshoo4ng/

•  Or just google for “cursor pin s”

www.enkitec.com 4

Environment

•  High-‐concurrency, high-‐visibility OLTP database •  Oracle 11.1.0.7 single-‐instance, dedicated server processes •  HP-‐UX on Itanium •  32 CPUs, 128 GB RAM

•  Thousands of end users •  Mul4ple WebLogic applica4on servers talking to database via

connec4on pools

www.enkitec.com 5

The problem

•  Sporadic extreme slowness of Oracle DB and the server •  Slowness lasts for 1 .. 20 minutes at a 4me… •  Queries don’t answer or extremely slow •  Can’t even log on to OS during that 4me

•  New SSH connec4ons succeded once the spike was over •  It takes minutes to run simple OS commands during the problem 4me

•  This is a global server-‐wide problem – everyone complains! So, the scope of this problem

is global, server-‐wide. Therefore we can use global,

server-‐wide metrics to diagnose the problem.

www.enkitec.com 6

Let’s pick and diagnose one occurrence of this problem

•  The database response 4mes extremely bad again around 18:10 and this lasted for about 5 minutes…

•  If it’s the users who report the problem (as opposed to applica4on side measurements), then there may be some discrepancies in the user reported 4mes vs actual problem 4me

www.enkitec.com 7

Ini4al AWR Report

Snap Id Snap Time Sessions Curs/Sess --------- ------------------- -------- ---------

Begin Snap: 61921 30-Oct-10 18:00:10 2,383 28.9

End Snap: 61922 30-Oct-10 18:20:20 2,863 24.7

Elapsed: 20.17 (mins) DB Time: 559.31 (mins)

Avg % DB

Event Waits Time(s) (ms) time Wait Class

--------------------------- --------- ----------- ------ ------ ----------

db file sequential read 2,135,668 21,468 10 64.0 User I/O

DB CPU 5,860 17.5 log file sync 92,720 1,498 16 4.5 Commit

read by other session 91,676 1,307 14 3.9 User I/O

SQL*Net message from dblink 525 1,132 2155 3.4 Network

Host CPU (CPUs: 32 Cores: 32 Sockets: 32) ~~~~~~~~ Load Average

Begin End %User %System %WIO %Idle

--------- --------- --------- --------- --------- ---------

0.37 0.31 16.3 17.6 12.1 66.1

Using a 20 minute report for diagnosing a 5 minute problem?!

Number of sessions has grown by ~500!

This 66% idle is an average over 20

minutes!

www.enkitec.com 8

ASH data (shown in OEM)

•  Average ac4ve sessions showed something different •  Note that this data is from another period of 4me when a similar spike happened

•  In worst 4mes there were up to 220 ac4ve sessions trying to be on CPU! •  Thanks to beAer granularity we see the spikes instead of some 20-‐minute or hourly

averages… •  The problem with ASH samples is that it looks into session state from inside Oracle •  Perhaps the starva5on is due to some other applica5on / instance in the server?

www.enkitec.com 9

How many logons were done?

Snap Id Snap Time Sessions Curs/Sess --------- ------------------- -------- --------- Begin Snap: 61921 30-Oct-10 18:00:10 2,383 28.9 End Snap: 61922 30-Oct-10 18:20:20 2,863 24.7 Elapsed: 20.17 (mins) DB Time: 559.31 (mins) Statistic Total per Second per Trans -------------------------------- ------------------ -------------- ------------- index fetch by key 24,174,148 19,971.0 43.6 index scans kdiixs1 24,565,055 20,293.9 44.3 leaf node 90-10 splits 5,865 4.9 0.0 leaf node splits 14,529 12.0 0.0 lob reads 34,480 28.5 0.1 lob writes 1,623,273 1,341.0 2.9 lob writes unaligned 1,623,266 1,341.0 2.9 logons cumulative 2,550 2.1 0.0 messages received 133,740 110.5 0.2 messages sent 133,740 110.5 0.2 min active SCN optimization appl 538,358 444.8 1.0 no buffer to keep pinned count 6,331 5.2 0.0 no work - consistent read gets 146,703,542 121,196.1 264.7 opened cursors cumulative 4,168,700 3,443.9 7.5

Number of sessions has grown by ~500!

2.1 logons per second, but we don’t know how these logons are distributed over the 20 minute

period!

www.enkitec.com 10

OS level metrics don’t lie (well, they do, but less ;-‐)

| | | Phys | Phys |Memory|Pg Out | VM Pg | Date | Time |CPU % | IO Rt | KB Rt | % | Rate | Scan Rt | 10/30/2010|17:45:00| 25.06| 7149.5| 145817.6| 74.07| 0.0| 0.0|

10/30/2010|17:50:00| 24.77| 5334.8| 60928.0| 73.98| 0.0| 0.0|

10/30/2010|17:55:00| 24.60| 7176.4| 186368.0| 73.98| 0.0| 0.0|

10/30/2010|18:00:00| 25.95| 7556.2| 192307.2| 74.11| 0.0| 0.0|

10/30/2010|18:05:00| 22.88| 5379.5| 67584.0| 74.15| 0.0| 0.0|

10/30/2010|18:10:00| 66.89| 4544.6| 58060.8| 76.97| 0.0| 0.0| 10/30/2010|18:15:00| 24.51| 7544.9| 159334.4| 76.40| 0.0| 0.0|

10/30/2010|18:20:00| 25.47| 5144.0| 59187.2| 75.04| 0.0| 0.0|

10/30/2010|18:25:00| 28.38| 10139.5| 151552.0| 74.37| 0.0| 0.0|

1)  What does the Time 18:10:00 mean, beginning of the monitoring interval or end?

2)  66.89% busy during 5 minutes may actually mean 100% busy during ~3 minutes out of 5, but we don’t know that for sure without measuring in more detail (beAer granularity)…

www.enkitec.com 11

Measuring CPU u4liza4on in more detail The spike lasted from 18:11 to 18:14 (3 min)

Around 90% in Kernel mode!!!

www.enkitec.com 12

Checkpoint – measured evidence so far

•  Fact: We have a 100% CPU u4liza4on spike, las4ng 3 minutes

•  Fact: 90% of it is spent in KERNEL mode

•  Fact: We have over 2500 logons done during 20 minute period •  2.1 logons / second on average (which doesn’t sound bad)

•  Kernel mode CPU usage is usually caused by system calls •  …or some internal kernel thread ac4vity 100% CPU usage doesn’t always automa4cally mean you have a serious CPU starva4on problem. The CPU runqueue length would indicate you how much

starva4on (wai4ng for CPU service) there is. However seeing 90% of CPU used in KERNEL mode is definitely not normal for an Oracle database server.

www.enkitec.com 13

Diagnosing 90% kernel-‐mode CPU usage spikes…

1.  Systema4c •  Break down this 90% of Kernel mode CPU usage •  Profiling! •  Oh, this is a produc4on system and the problem is acute & ongoing •  On Solaris, I’d have used Dtrace stack() probe to record OS kernel

stack traces most common on CPU (google for dstackprof) •  Or lockstat as it reports spins on spinlocks (which consume kernel CPU)

•  But this was HP-‐UX and I didn’t know the tools needed •  But I knew what numbers I wanted to see! •  We sent a request to HP-‐UX support:

“How do we measure & break down where is kernel mode CPU used?”

2.  Check for usual suspects •  Fast, cheap checks to rule out or find known troublemakers

www.enkitec.com 14

Kernel mode CPU usage spikes – the usual suspects

•  Before star4ng the systema4c troubleshoo4ng & drilldown, do quick checks for usual suspects •  Remember, the client has a business problem, 4me is of essence…

1.  Logon (or logoff) storms •  Spawning, ini4alizing new processes, opening files and aAaching to

SGA means system calls, kernel CPU usage

2.  Oracle code gevng into some crazy loop (due to a bug) •  Semop(), yield(), read /proc/…, getrusage(), etc loop

3.  OS kernel spinlock conten4on •  Variety of reasons… •  ONen due to bugs in OS or some kernel module

Not so usual suspects really…

www.enkitec.com 15

Measuring logon storms

•  Use the AUD$ records or “logons cumula@ve” number from V$SYSSTAT or AWR, right? •  Wrong!

•  logons cumula5ve number is incremented by the session itself – aNer it has logged on, the same applies to audit records! 1.  ANer the listener connec4on has been established… 2.  The process has been started… 3.  It has aAached to SGA SHM segments… 4.  Audit file has been wriAen (if needed) … 5.  Process, session SGA structures have been created

•  Memory from OS and shared pool allocated (shared pool latches!) 6.  Session has been authen4cated 7.  Then the logons cumula4ve is incremented!

The logon storm may have started way

before these logons finally succeeded!

www.enkitec.com 16


•  Logon storms should be measured at the listener level •  Process listener.log using a script:

$ tail listener.log

30-OCT-2010 23:22:03 * (CONNECT_DATA=…) * establish * E2SNDB * 0



30-OCT-2010 23:22:09 * service_update * E2SNDB * 0

$ fgrep "30-OCT-2010" listener.log | fgrep "establish" | \

awk '{ print $1 " " $2 }' | awk -F: '{ print $1 }' | \

sort | uniq –c

…

5564 30-OCT-2010 20

5571 30-OCT-2010 21

5703 30-OCT-2010 22

4828 30-OCT-2010 23

…

5703 logons during 22:00 – 22:59

www.enkitec.com 17


•  Logon storms should be measured at the listener level •  Process listener.log using a script:

$ fgrep "30-OCT-2010 22:" listener.log | fgrep "establish" | \ awk '{ print $1 " " $2 }' | awk -F: '{ print $1 ":" $2 }' | \ sort | uniq –c … 88 30-OCT-2010 22:00 120 30-OCT-2010 22:01 94 30-OCT-2010 22:02 94 30-OCT-2010 22:03 95 30-OCT-2010 22:04 120 30-OCT-2010 22:05 79 30-OCT-2010 22:06 101 30-OCT-2010 22:07 85 30-OCT-2010 22:08 100 30-OCT-2010 22:09 85 30-OCT-2010 22:10 89 30-OCT-2010 22:11 83 30-OCT-2010 22:12 93 30-OCT-2010 22:13

How many connec4ons listener established every minute (this data from non-‐

problem 4me)

www.enkitec.com 18

Con4nuing the OS kernel mode CPU usage diagnosis

# tusc -cp 8021

( Attached to process 8021 ("oracleXYZ (LOCAL=NO)") [64-bit] )

( Detaching from process 8021 ("oracleXYZ (LOCAL=NO)") )

Syscall Seconds Calls Errors

open 1.05 3 3 ----- ----- ----- -----

Total 1.05 3 3

Let’s trace system calls by one CPU

heavy Oracle process

Interes4ngly 3 open() syscalls take over a second in total. Could this be caused by the general CPU starva4on in the

server?

The next step should have been to check which file did we try to open (but the spike ended before we could do

that)

Seeing errors isn’t a problem itself, as this is how Oracle and libs some4mes check for existence of a file…

www.enkitec.com 19

Con4nuing the OS kernel mode CPU usage diagnosis

•  HP-‐UX support got back to us and recommended the use of runki tool for measuring kernel CPU u4liza4on in detail •  It had to be installed as root •  It measured a lot of things happening in kernel, wri4ng a big output

file •  on Solaris there’s a tool called TNF trace, AIX has trace command for such full kernel tracing

•  Now we had to wait for the problem to happen again •  Didn’t have to wait for too long… •  We sent the raw trace dump to HP Support, so they could run

something like “tkprof” on that tracefile •  It basically just summed up the kernel spinlock wait, spin events by lock, object locked etc

www.enkitec.com 20

kiprof – profiled runki output

Total Hardclock traces: 30239 ================================ State Count Percent USER 7130 23.58 SYS 22965 75.94 IDLE 24 0.08 SSYS 120 0.40

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Kernel Functions executed during profile Count Pct State Function ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 15615 51.64% SYS timed_preArbitration 7130 23.58% USER OTHER 1275 4.22% SYS spinlock 1008 3.33% SYS wait_for_lock_spinner 488 1.61% SYS vx_dirbread 464 1.53% SYS vx_bc_getblk 417 1.38% SYS preArbitration 338 1.12% SYS vx_dirscan 291 0.96% SYS vx_bc_brelse

That’s basically spinning for locks

(adap4ve decisions to spin or wait)

Self-‐explanatory. We are spinning for a lock

VxFS directory block read?

VxFS directory contents scan !!!

www.enkitec.com 21

Drilling down to kernel spinlock spinning

The main spinlock experiencing spinning was related to VxFS:

spn%kern cumpct seconds spn%cpu lock name

7.24 7.24 71.96 3.77 FS:vxfs:bc_freelist_lock spin 0.11 7.36 1.14 0.06 FS:vxfs:inode spin for sleep lock

0.04 7.40 0.44 0.02 FS:vxfs:i_spinspin lock

0.04 7.44 0.42 0.02 Sleep Queue lock

0.01 7.45 0.12 0.01 v_count_lock

Oracle processes were the main ones spinning:

spn%spn cumpct spnsec usrsec kernsec spn%kern process name

99.17 99.17 73.82 880.32 543.70 13.58 oracle 0.60 99.77 0.45 0.00 7.37 6.11 vxfsd

0.08 99.85 0.06 1.42 11.52 0.52 tnslsnr

0.05 99.91 0.04 0.00 402.60 0.01 [IDLE]

0.03 99.93 0.02 0.00 0.50 4.00 xyz

0.03 99.96 0.02 0.00 0.43 4.65 sadc

www.enkitec.com 22

Which system calls?

$ /opt/ktools/bin/kiinfo -kitrace pid=28679,syscalls -ts 1103_1307 | grep open | sort -r -n -k 13,13 | more

2.550292 cpu=6 seqcnt=43808107 pid=28679 ktid=526366 utid=1 open [5] err=17 ret1=12 ret2=1 syscallbeg= 3.295138 ktcsys= 0.000000 A0=0x9fffffff A1=

0xffffceb0 A2=0 A3=0xd02 A4=0 A5=432 fname='/opt/apps/oracle/admin/xyzdb/adump/xyzdb_ora_28679_21.aud'







www.enkitec.com 23

Audit file des4ona4on

•  New audit file name format in 11g… •  A new file is created for each audit file where the SPID collides with a

previous file… •  Every @me when crea@ng a new audit file, Oracle has to check whether

such file already exists with suffix _1, then _2, _3, etc..

$ cd /u01/app/oracle/admin/E2SNDB/adump $ ls -l | head total 4788 -rw-r----- 1 oracle dba 735 Feb 28 16:06 e2sndb_ora_10028_1.aud -rw-r----- 1 oracle dba 710 Oct 16 17:58 e2sndb_ora_10082_1.aud -rw-r----- 1 oracle dba 735 Oct 16 17:58 e2sndb_ora_10082_2.aud -rw-r----- 1 oracle dba 735 Feb 27 17:53 e2sndb_ora_10095_1.aud -rw-r----- 1 oracle dba 736 Oct 16 17:58 e2sndb_ora_10120_1.aud -rw-r----- 1 oracle dba 740 Oct 16 17:58 e2sndb_ora_10125_1.aud -rw-r----- 1 oracle dba 735 Feb 28 16:07 e2sndb_ora_10158_1.aud -rw-r----- 1 oracle dba 735 Feb 24 17:44 e2sndb_ora_10206_1.aud -rw-r----- 1 oracle dba 735 Dec 22 21:28 e2sndb_ora_10482_1.aud

www.enkitec.com 24

Shouldn’t the audit files be created only for SYSDBA and SYSOPER access?

•  In theory, yes. •  In prac4ce in our case, no.

•  Bug 9744092: EXCESSIVE AMOUNT OF AUD FILES BEING GENERATED IN 11.1 •  Oracle generated a new .aud file for every new database connec4on! •  Not just SYSOPER/SYSDBA like normally •  This is all despite having audit_trail = DB

•  Normally the .aud files in audit_dump_dest are not related to regular audit trail, but for SYSOPER/SYSDBA logon, startup/shutdown audi4ng

•  …and if AUDIT_SYS_OPERATIONS = TRUE then all commands issued as SYS

www.enkitec.com 25

Bug 9744092: EXCESSIVE AMOUNT OF AUD FILES BEING GENERATED IN 11.1

PROBLEM: -------- After upgrade to 11.1 the system creates 10 - 16 *aud files per minute in audit_file_dest. Out of 9528 *.aud files that the customer uploaded, 9124 files recorded ACTION:[3] "102". DIAGNOSTIC ANALYSIS: -------------------- The change of behavior (move audit action 102 from aud$ to audit_file_dest when audit_trail=DB.) is due to a fix for an unpublished bug 5476184 in 11.1. It is not an intended feature for 11G. WORKAROUND: ----------- Manual delete of audit files

www.enkitec.com 26

Diagnosis

1.  Thanks to bug 9744092 and a behavior change in Oracle 11.1 a new audit file was created for each new connec4on to DB •  If a file already existed, Oracle checked if a similar file name with

larger suffix value (_2, _3, _4 etc) existed 2.  The audit_file_dest eventually had over 1.5 M files in it!

•  For each logon, mul4ple file existence checks (open() syscalls) had to be done

3.  Checking whether a file existed (open syscall -‐> directory entry scan) became very slow – and it’s done in kernel mode •  A spinlock was held during the directory entry scan •  Other new Oracle processes also wanted to do the directory scan,

resul4ng in spinlock conten4on and further Kernel mode CPU usage 4.  When the DB got slow – app servers fired up hundreds of

new connec4ons to “make things faster” •  This all fed back to the problem – even more conten4on & spinning

www.enkitec.com 27

Limi4ng logon storms

Use Oracle Listener connec4on rate limiter (11gR1+) listener.ora: LISTENER= (ADDRESS_LIST= (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521)(RATE_LIMIT=5)) (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1522)(RATE_LIMIT=10)) (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1523)) )

Oracle Documenta5on: Oracle Net Listener Parameters (listener.ora) hQp://download.oracle.com/docs/cd/B28359_01/network.111/b28317/listener.htm

Also, it is possible to limit logoff storm rate _logout_storm_rate parameter (instance-‐wide)

www.enkitec.com 28

Troubleshoo4ng sporadic system performance issues

Right Data !!!

•  Right scope – if your problem lasts for seconds, this should be the granularity of your data too

•  OS level data, in addi4on to the database metrics •  Ideally OS level metrics sampled mul4ple 4mes per minute

www.enkitec.com 29

Conclusions

•  Logon storms are evil! •  They will amplify any performance hiccups as they cause extra load

just when the resources are scarcest

•  Connec4on pools firing up hundreds of new connec4ons are evil! •  Know your limits (both max connec4ons and max connect rate / sec)

•  Here’s a thought: •  If you have planned the servers’ capacity to support N-‐thousand connec4ons anyway (by allowing connec4on pools grow that high), why not create this amount of connec4ons right away?

•  This would avoid logon storms during worst 4mes as all connec4ons have already been created!