
DB2 problem determination using db2top utility

Optimize performance and prevent problems in complex DB2 environments

Skill Level: Introductory

Tao Wang ([email protected]), DB2 Advanced Technical Support, IBM

Shen Li ([email protected]), DB2 RAS/PD Software Developer, IBM

04 Dec 2008

Get the best possible performance in complex IBM® DB2® for Linux® and UNIX® environments with the db2top utility. In this article, you'll learn about the advantages this tool offers, and see how to use it for monitoring and troubleshooting. In addition, you can follow two sample cases that illustrate how to use this tool to diagnose real problems in a production environment.

Introduction

There are several methods to collect information and diagnose DB2 system performance issues. The snapshot monitor is one of the most commonly used tools for collecting information to narrow down a problem. However, most snapshot entries are cumulative values that show the condition of the system at a single point in time, so manual work is needed to compute the delta value of each entry from one snapshot to the next.

The db2top tool comes with DB2 and calculates the delta values for those snapshot entries in real time. It provides a text-based GUI in command-line mode, so that users can better understand each entry as they read it.

DB2 problem determination using db2top utility© Copyright IBM Corporation 1994, 2008. All rights reserved. Page 1 of 40

This tool also integrates multiple types of DB2 snapshots, categorizes them, and presents them on different screens of the GUI.

This article introduces some commonly used db2top screens for daily performance monitoring and troubleshooting work. You'll examine several examples that show how to use this tool to narrow down problems in real cases. After reading this article, you will be able to:

• Understand how the db2top utility works

• Interpret the most useful entries in several of the most commonly used screens

• Monitor system performance, recognize when something is abnormal in daily operations, and solve the problem by using db2top.

Read on, or link directly to the section that interests you:

• db2top command syntax

• How to start db2top

• Run db2top in interactive mode

• Run db2top in batch mode

• What can be monitored by db2top?

• Database (d)

• Tablespace (t)

• Dynamic SQL (D)

• Session (l)

• Bufferpool (b)

• Lock (U)

• Table (T)

• Bottlenecks (B)

• Case analysis

• Case 1: Lock waiting analysis in interactive mode

• Case 2: Performance analysis in replay mode

• Conclusion

Most entries or elements of interest are highlighted in red on figures or in bold text.

developerWorks® ibm.com/developerWorks


All the screenshots are captured from running db2top in interactive mode.

In this article, database "sample" will be used in each example and screenshot.

db2top command syntax

This article does not discuss the db2top command syntax in detail. The detailed command syntax and user manual can be found in the DB2 Information Center.

Usage: db2top [-d dbname] [-n nodename] [-u username] [-p password] [-V schema]
              [-i interval] [-P [part]] [-a] [-B] [-R] [-k] [-x]
              [-f file [+time] [/HH:MM:SS]]
              [-b options [-s [sample]] [-D separator] [-X] -o outfile]
              [-C] [-m duration]
       db2top -h

-d : Database name (default DB2DBDFT)
-n : Node name
-u : User name
-p : User password
-V : Default explain schema
-i : Interval in seconds between snapshots
-b : background mode
     option: d=database, l=sessions, t=tablespaces, b=bufferpools, T=tables,
             D=Dynamic SQL, s=Statements, U=Locks, u=Utilities, F=Federation,
             m=Memory
     -X=XML Output, -L=Write queries to ALL.sql,
     -A=Performance analysis
-o : output file for background mode
-a : Monitor only active objects
-B : enable bold
-R : Reset snapshot at startup
-k : Display cumulated counters
-x : Extended display
-P : Partition snapshot (number or current)
-f : Replay monitoring session from snapshot data collector file,
     can skip entries when +seconds is specified
-D : Delimiter for -b option
-C : Run db2top in snapshot data collector mode
-m : Max duration in minutes for -b and -C
-s : Max # of samples for -b
-h : this help

Parameters can be set in $HOME/.db2toprc; type w in db2top to generate the resource configuration file.

How to start db2top

db2top can be run in two modes: interactive mode or batch mode. In interactive mode, the user enters commands directly at the terminal's text user interface and waits for the system to respond. Note that the left and right arrow keys on the keyboard can be used to scroll columns to the left or right, so that you can see the hidden columns on many screens in interactive mode. In batch mode, on the other hand, a series of jobs is executed without user interaction.

Run db2top in interactive mode

Enter the following command from a command line to start db2top in interactive mode:

db2top -d sample

Figure 1. To run db2top in interactive mode

In Figure 1, field values are returned at the top of the screen:

[\]15:38:20, refresh=2secs(0.003) AIX,part=[1/1],SHENLI:SAMPLE

• [/]: When rotating, it means that db2top is waiting between two snapshots; otherwise, it means db2top is waiting for an answer from DB2.

• 15:38:20: Current time

• refresh=2secs: Time interval

• refresh=!secs: The exclamation mark means that the time DB2 takes to process the snapshot is longer than the refresh interval. In this case, db2top increases the interval by 50 percent. If this occurs too often because the system is too busy, you can increase the snapshot interval (option I), monitor a single database partition (option P), or turn off extended display mode (option x).

• 0.003: Time spent inside DB2 to process the snapshot

• AIX: Platform on which DB2 is running

• Inactive: Shown when the database has not been activated; when it is absent, the database is activated.

• part=[1/1]: Number of active database partitions versus the total number of database partitions. For example, part=[2/3] means that one database partition out of three is down (2 active, 3 total).

• SHENLI: Instance name

• SAMPLE: Database name
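Two of the status fields above follow simple rules that can be checked by hand:

- the refresh=! back-off, where the interval grows by 50 percent when the snapshot takes longer than the interval;
- the part=[active/total] counter.

The POSIX shell sketch below models both rules as this article describes them. It is not db2top's own logic, and the function names next_refresh_interval and partitions_down are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that model two fields of the db2top status line.

# next_refresh_interval <interval_secs> <snapshot_secs>
# When DB2 needs longer to produce a snapshot than the refresh interval
# (displayed as refresh=!secs), db2top is described as growing the interval
# by 50 percent; this function applies that rule once.
next_refresh_interval() {
    awk -v i="$1" -v s="$2" 'BEGIN {
        if (s + 0 > i + 0) i = i * 1.5
        printf "%g\n", i
    }'
}

# partitions_down <status line>
# Finds part=[active/total] and prints total - active (0 means none down).
# Returns a non-zero status if the line has no part=[a/t] field.
partitions_down() {
    printf '%s\n' "$1" | awk '
        match($0, /part=\[[0-9]+\/[0-9]+]/) {
            split(substr($0, RSTART + 6, RLENGTH - 7), p, "/")
            print p[2] - p[1]
            found = 1
        }
        END { if (!found) exit 1 }'
}
```

For example, next_refresh_interval 2 3.1 prints 3, because a 3.1-second snapshot exceeds the 2-second interval. Applied to a status line containing part=[2/3], partitions_down prints 1.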

[d=Y,a=N,e=N,p=ALL] [qp=off]

• d=Y/N: Delta or cumulative snapshot indicator (command option -k or interactive option k)

• a=Y/N: Active-only or all-objects indicator (command option -a or interactive option i)

• e=Y/N: Extended display indicator

• p=ALL: All database partitions

• p=CUR: Current database partition (-P command option with no partition number specified)

• p=3: Target database partition number (3 in this example)

• db2top can be used to monitor a DPF environment. If the -P command option is not specified, a global snapshot is captured.

• qp=off/on: Query Patroller indicator (DYNMGMT database configuration parameter) for the database partition to which db2top is attached

Below the status fields, a user manual is displayed; the screens and options it lists can be selected by pressing the corresponding keys.

Run db2top in batch mode

You can use db2top in batch mode to monitor a running database unattended. Users can record performance information with db2top in the background and store the historical data for later analysis.


The following code listing shows how to run db2top in collection mode over a long period (for example, eight hours in total, with a 15-second interval between snapshots):

db2top -d sample -f collect.file -C -m 480 -i 15

[11:36:02] Starting DB2 snapshot data collector, collection every 15 second(s),
           max duration 480 minute(s), max file growth/hour 100.0M,
           hit [CTRL+C] to cancel...
[11:36:02] Writing to 'collect.file',
           should I create a named pipe instead of a file [N/y]? N

Make sure to answer N to this question.

After the data has been collected into the file, users can run db2top in replay mode with the following command to analyze the data gathered during the collection period:

db2top -d sample -f collect.file -b l -A

Option -A enables automatic performance analysis. Combined with -b l (sessions), the command above therefore analyzes the most active sessions, that is, those consuming the most CPU.

The following command runs db2top in replay mode, jumping to the time of interestto analyze.

db2top -d sample -f collect.file /HH:MM:SS

For example, the user restarts db2top in replay mode and it jumps to exactly 2:00 a.m.:

db2top -d sample -f collect.file /02:00:00

Then, the user enters l to see what each session was doing at that time.
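The collection and replay options involve some simple arithmetic and formatting. The hypothetical helpers below do two things:

- collection_plan estimates how many snapshots a -C run with -m and -i produces and, assuming the "max file growth/hour" value in the collector banner acts as a cap, the largest file size to expect;
- replay_arg builds the /HH:MM:SS replay argument.

Both are a sketch under those assumptions, not part of db2top.

```shell
#!/bin/sh
# Hypothetical helpers for planning a db2top -C collection and its replay.

# collection_plan <minutes (-m)> <interval_secs (-i)> <growth_mb_per_hour>
# Prints the expected number of snapshots and, if the collector's
# "max file growth/hour" figure is a cap, the largest file size in MB.
collection_plan() {
    awk -v m="$1" -v i="$2" -v g="$3" 'BEGIN {
        printf "samples=%d max_size_mb=%g\n", (m * 60) / i, g * m / 60
    }'
}

# replay_arg <hour> <minute> <second>
# Builds the /HH:MM:SS argument that makes a replay jump to that time.
# awk is used so that inputs such as 08 are not parsed as octal.
replay_arg() {
    awk -v h="$1" -v m="$2" -v s="$3" 'BEGIN {
        if (h !~ /^[0-9]+$/ || m !~ /^[0-9]+$/ || s !~ /^[0-9]+$/ ||
            h + 0 > 23 || m + 0 > 59 || s + 0 > 59) {
            print "replay_arg: invalid time" > "/dev/stderr"
            exit 1
        }
        printf "/%02d:%02d:%02d\n", h, m, s
    }'
}
```

With the values used in the collection example, collection_plan 480 15 100 prints samples=1920 max_size_mb=800. The command db2top -d sample -f collect.file "$(replay_arg 2 0 0)" is equivalent to the /02:00:00 replay command.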

What can be monitored by db2top?


Database (d)

Figure 2. Database screen

On the database screen, db2top provides a set of performance monitoring elementsfor the entire database.

Users can monitor active sessions (MaxActSess), sort memory (SortMemory), log space (LogUsed), and FCM memory usage (FCM BufLow). These monitoring elements help users identify the current percentage of usage for each resource. If one of them approaches or even reaches 100 percent, users should start investigating what happened.

The elapsed time between the database Start Time and the current time shows how long the database has been active. This value can be very useful when combined with other monitoring elements to investigate issues that have persisted over a period of time.

Lock usage (LockUsed) and lock escalations (LockEscals) can be very helpful in narrowing down locking issues. If a large number of lock escalations is observed, it is a good idea to increase the LOCKLIST and MAXLOCKS database configuration parameters, or to start looking for bad queries that request a large number of locks.

L_Reads, P_Reads, and A_Reads represent logical reads, physical reads, and asynchronous reads. Combined with the hit ratio (HitRatio), these values are very important for evaluating whether most reads were satisfied in memory or required disk I/O. Since disk I/O is much slower than memory access, users should access data in memory as much as possible. When users see the HitRatio dropping, it is a good time to check whether the bufferpools are too small, or whether a bad query is performing excessive table scans and flushing other pages out of the bufferpool.

Similar to reads, A_Writes represents asynchronous writes, that is, data pages written by an asynchronous page cleaner agent before the bufferpool space is required. Knowing how many writes happened during the db2top refresh interval also tells users how many write requests the database made. From this, users can calculate the average time cost per write, which may help when analyzing performance issues caused by an I/O bottleneck. For the best write I/O performance, the ratio A_Writes/Writes should be as high as possible.
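Both ratios in the previous paragraph are plain divisions over one refresh interval. The sketch below makes these assumptions:

- you already have the A_Writes and Writes deltas;
- for the average, you have a total write time in milliseconds for the same interval, for example from the pool_write_time snapshot element.

The function names are hypothetical.

```shell
#!/bin/sh
# async_write_pct <a_writes> <writes>
# Share of writes done asynchronously by page cleaners (higher is better).
async_write_pct() {
    awk -v a="$1" -v w="$2" 'BEGIN {
        if (w + 0 == 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * a / w
    }'
}

# avg_write_ms <total_write_ms> <writes>
# Average time per write over the same interval.
avg_write_ms() {
    awk -v t="$1" -v w="$2" 'BEGIN {
        if (w + 0 == 0) { print "n/a"; exit }
        printf "%.2f\n", t / w
    }'
}
```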

SortOvf represents sort overflows. If this number goes very high, it may be worth examining the queries. A sort overflow happens when SORTHEAP is not large enough, so a SORT or hash join operation overflows its data into temporary space. Sometimes the value can be reduced by increasing SORTHEAP. In other cases this may not help much, because the data set being sorted is much larger than the memory that can be allocated to SORTHEAP, and sort overflows can then become a major bottleneck. If the amount of data is larger than what the bufferpool for the temporary tablespace can hold, the SORT or hash join requires physical I/O to proceed. Therefore, optimizing queries to reduce the number of sort overflows can significantly improve system performance.
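To put SortOvf in context, it helps to express overflows as a percentage of all sorts in the same interval. The sketch below assumes that both deltas come from one interval, for example from the sort_overflows and total_sorts snapshot elements. The function name is hypothetical.

```shell
#!/bin/sh
# sort_overflow_pct <sort_overflows> <total_sorts>
# Percentage of sorts in the interval that overflowed SORTHEAP into
# temporary space; prints "n/a" when no sorts ran.
sort_overflow_pct() {
    awk -v o="$1" -v n="$2" 'BEGIN {
        if (n + 0 == 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * o / n
    }'
}
```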

The last four entries on the Database screen show the average physical read time (AvgPRdTime), average direct read time (AvgDRdTime), average physical write time (AvgPWrTime), and average direct write time (AvgDWrTime). These four entries directly reflect the performance of the I/O subsystem. If users observe unexpectedly long times for each read or write operation, the I/O subsystem should be investigated further.
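Whether an average I/O time is "unexpectedly long" depends on the storage, so any automated check needs a site-specific threshold. The sketch below assumes the values are in milliseconds and that you choose the threshold yourself; it is not a DB2 default. The function name is hypothetical.

```shell
#!/bin/sh
# slow_io <threshold_ms> NAME=VALUE...
# Prints every average I/O time above the threshold, e.g. for the
# AvgPRdTime, AvgDRdTime, AvgPWrTime, and AvgDWrTime values.
slow_io() {
    limit=$1
    shift
    for pair in "$@"; do
        name=${pair%%=*}     # text before the first '='
        value=${pair#*=}     # text after the first '='
        if awk -v v="$value" -v l="$limit" 'BEGIN { if (v + 0 > l + 0) exit 0; exit 1 }'; then
            printf '%s=%s exceeds %s ms\n' "$name" "$value" "$limit"
        fi
    done
}
```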

Tablespace (t)

Figure 3. Tablespace screen


The Tablespace screen provides detailed information for each tablespace. The Hit Ratio% and Async Read% columns can be very important to many users. Monitoring only the bufferpool hit ratio at the database level may not give precise enough information: in an environment with many tablespaces, a bad query in one tablespace can be obscured when the hit ratio is averaged over all tablespaces. Monitoring Hit Ratio% and Async Read% at the tablespace level helps analyze in detail how a system works.

Delta logical reads (writes) and delta physical reads (writes), shown as Delta l_reads(writes) and Delta p_reads(writes), illustrate how "busy" each tablespace is. Some tablespaces may have a low bufferpool hit ratio but also very little activity. In most cases, it is better to put tuning effort into the tablespaces with more activity than into idle ones.
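The prioritization above (a low hit ratio matters most where activity is high) can be expressed as a filter plus a sort. The POSIX shell sketch below assumes you have copied each tablespace's name, Delta l_reads, and Hit Ratio% into a whitespace-separated list. That input layout and the function name are assumptions, not a db2top output format.

```shell
#!/bin/sh
# busy_low_hit <max_hit_pct> <min_delta_l_reads>
# stdin: one tablespace per line: <name> <delta_l_reads> <hit_ratio_pct>
# Prints tablespaces whose hit ratio is below max_hit_pct and whose activity
# is at least min_delta_l_reads, busiest first.
busy_low_hit() {
    awk -v h="$1" -v r="$2" '$3 + 0 < h + 0 && $2 + 0 >= r + 0 { print $2, $1 }' |
        sort -rn |
        awk '{ print $2 }'
}
```

An almost idle tablespace with a poor hit ratio is filtered out by the activity floor, which matches the advice to focus tuning effort on busy tablespaces.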

The left and right arrow keys on the keyboard can be used to scroll columns to the left or right. The Tablespace screen and some other screens have more columns than can be displayed on a single screen. By pressing the left or right arrow keys, users can scroll the screen to display more columns.

By pressing the left arrow key, users can see more read/write entries. The average read/write times (Avg RdTime / Avg WrTime) show the average time cost per read or write in the tablespace.

The Space Used, Total Size, and % Full entries make it easy to see the size of each tablespace and its utilization.


There are also several more columns that show the type of each tablespace, for example DMS or SMS, and whether CIO/DIO is enabled.

Dynamic SQL (D)

Figure 4. Dynamic SQL screen

The Dynamic SQL screen provides detailed information for each cached SQL statement. Users can also use this screen to generate db2expln and db2exfmt output for a specific query.

Number of Executions (Num Execution) and Average Execution Time (Avg ExecTime) show how many times the query has been executed and its average running time. Comparing Average CPU Time (Avg CpuTime) with Avg ExecTime shows what percentage of the time is spent on CPU activity, and how much is spent waiting, for example for locks or I/O.
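The comparison of Avg CpuTime with Avg ExecTime is a simple ratio. In this sketch, both arguments must be in the same unit, and the non-CPU remainder lumps together all waiting (locks, I/O, and so on). The function name is hypothetical.

```shell
#!/bin/sh
# cpu_share <avg_cpu_time> <avg_exec_time>
# Splits the average execution time into CPU and non-CPU (waiting) parts.
cpu_share() {
    awk -v c="$1" -v e="$2" 'BEGIN {
        if (e + 0 <= 0) { print "n/a"; exit }
        pct = 100 * c / e
        printf "cpu=%.2f%% other=%.2f%%\n", pct, 100 - pct
    }'
}
```

A low CPU share, such as cpu=25.00% other=75.00%, suggests looking at lock waits and I/O for that statement before tuning its access plan.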

Rows read and Rows written are useful for understanding the behavior of a query. For example, if users see a SELECT query associated with a large number of rows written, the query may have a sort (or hash join) overflow and may need further tuning to avoid overflowing data into temporary space.

The db2top utility also calculates the hit ratio (Hit%) for data, index, and temporary logical reads, to help users assess whether the bufferpool size needs to be tuned. Average Sorts per Execution (AvgSort PerExec) and Sort Time are two good indicators of how many sorts were done during execution.


The db2top utility can also generate a db2expln or db2exfmt report without manually running the commands. Entering a capital L on the Dynamic SQL screen prompts you for an SQL hash string, which is the string shown in the first column of the table, for example "00000005429283171301468277". Users can copy the string, paste it at the prompt, and press Enter, as shown in Figure 5:

Figure 5. Dynamic SQL screen -- Query text

Then, choosing the e option on this screen generates db2expln output, and choosing the x option generates db2exfmt output if the explain tables have already been created in the database from EXPLAIN.DDL.

An empty screen is shown if the explain tables do not exist or are under a different schema than the one currently being used. If necessary, users can run the following commands to create the explain tables:

db2 connect to [dbname]
db2 set current schema [schema name]
db2 -tvf [instance home directory]/sqllib/misc/EXPLAIN.DDL
db2 terminate

Session (l)

Figure 6. Session screen


The Session screen provides detailed information for each application session. The first column shows the application handle, and the following three columns, CPU% Total, IO% Total, and Mem% Total, represent the percentage of each resource that the application is consuming. In most cases, each session represents one connection from the application side.

Application status and some statistics on rows read and written are displayed after these columns. Users can also see LocksHeld, Sorts(sec), and LogUsed information on this screen. LogUsed can be helpful when the transaction log is running out of space, because it shows which applications are consuming the most log space.

The Session screen contains information similar to the Database screen, but broken down by application. It is usually good to combine data from different screens for performance analysis. For example, a high number of reads on the Database screen can be investigated further on the Session and Dynamic SQL screens to narrow it down to a particular application or SQL statement.

Bufferpool (b)

Figure 7. Bufferpool screen


On this screen, db2top provides utilization information for each bufferpool. Users can see basic information, such as reads, writes, and size, as well as more advanced metrics, such as bufferpool Hit Ratio% and Async Reads%.

Generally speaking, the bufferpool hit ratio can be defined as follows:

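$$
\text{Hit Ratio} = \left( 1 - \frac{\texttt{pool\_data\_p\_reads} + \texttt{pool\_xda\_p\_reads} + \texttt{pool\_index\_p\_reads} + \texttt{pool\_temp\_data\_p\_reads} + \texttt{pool\_temp\_xda\_p\_reads} + \texttt{pool\_temp\_index\_p\_reads}}{\texttt{pool\_data\_l\_reads} + \texttt{pool\_xda\_l\_reads} + \texttt{pool\_index\_l\_reads} + \texttt{pool\_temp\_data\_l\_reads} + \texttt{pool\_temp\_xda\_l\_reads} + \texttt{pool\_temp\_index\_l\_reads}} \right) \times 100\%
$$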
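The formula above can be evaluated directly from snapshot counters. This POSIX shell sketch works as follows:

- it reads NAME=VALUE lines, which is an assumed input format;
- it sums only the twelve counters named in the formula;
- it prints the ratio.

Applied to deltas from one interval, it gives a per-interval hit ratio. The function name is hypothetical.

```shell
#!/bin/sh
# bp_hit_ratio
# stdin: NAME=VALUE lines using the snapshot element names from the formula.
# Other counters (e.g. pool_async_data_reads) are ignored.
bp_hit_ratio() {
    awk -F= '
        $1 ~ /^pool_(temp_)?(data|xda|index)_p_reads$/ { p += $2 }
        $1 ~ /^pool_(temp_)?(data|xda|index)_l_reads$/ { l += $2 }
        END {
            if (l == 0) print "n/a"
            else printf "%.2f\n", (1 - p / l) * 100
        }'
}
```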

Lock (U)

Figure 8. Lock screen


A locking issue is one of the most common issues seen during application diagnosis. With the db2top utility, users can easily list the locks held by applications.

Lock waiting problems are also easier to analyze with db2top. Figures 9, 10, and 11 were captured in a test scenario where one db2bp application is waiting for another db2bp session.

Figure 9. Lock waiting -- Application status


In Figure 9, two agents (agent 24 and agent 9) are listed in the first column, Agent Id(State). The third column, Application Status, shows that one of the agents (agent 24) is stuck in Lock Waiting status.

Figure 10. Lock waiting -- Lock status


To see more information on the Lock screen, press the left arrow key to display more columns, as shown in Figure 10. In the Lock Status column, all locks are in Granted status except one: the lock with "-" status is the lock being blocked. The Lock Mode column displays both the requested lock mode (S) and the lock that is being held (IX).

Figure 11. Lock waiting -- Table name

In this particular example, as seen in Figure 11, agent 24 is requesting an S lock on table TAOEWANG.T1 and is blocked by agent 9, which holds an IX lock on the object.

Another very useful feature that db2top provides on this screen is lock chain analysis. It is not always easy to figure out the lock waiting relationships when multiple applications are involved in a problem. The db2top utility dynamically draws the lock chain, which makes it much easier for users to understand the locking relationships between applications.

Entering a capital L displays the lock chain. Example output could look similar to Figure 12:

Figure 12. Lock waiting -- Lock chain
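The lock chain that db2top draws can be thought of as following "waiter is blocked by holder" links until an application that is not waiting is reached. The POSIX shell sketch below illustrates that idea on pairs such as those shown in the Locked By column. It is not db2top's implementation, the function name is hypothetical, and it keeps only one holder per waiter.

```shell
#!/bin/sh
# lock_chain <start_appid>
# stdin: lines "<waiting appid> <holding appid>".
# Follows the holder links from start_appid and prints the chain, flagging
# a cycle (which would indicate a deadlock).
lock_chain() {
    awk -v start="$1" '
        { holder[$1] = $2 }
        END {
            chain = start; cur = start; seen[cur] = 1
            while (cur in holder) {
                cur = holder[cur]
                chain = chain " -> " cur
                if (cur in seen) { chain = chain " (cycle: possible deadlock)"; break }
                seen[cur] = 1
            }
            print chain
        }'
}
```

For example, if appid 12 waits on 11 and 11 waits on 7, the chain is 12 -> 11 -> 7, which points at appid 7 as the root holder. In practice, DB2's deadlock detector normally resolves a genuine cycle by rolling back one of the participants.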


Table (T)

Figure 13. Table screen

The Table screen shows table information for the database. Idle tables that are not accessed during the elapsed time are shown in white; tables that are being accessed (active) are shown in green.

The Delta RowsRead(Written)/s columns represent the rows read and written during the elapsed time divided by the time interval. These numbers show how often a particular table is used during the period.

There is also information about the table itself. The Data Pages and Index Pages columns show how many data and index pages the table has. Table Type and Table Size are also useful for understanding the properties of the table.

Another important column is Rows Overflows/s, which shows how many row overflows happened per second during the elapsed time. Overflow rows indicate that data fragmentation has occurred. If this number is high, users can improve table performance by reorganizing the table with the REORG utility, which cleans up this fragmentation.
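The Rows Overflows/s value is a delta divided by the refresh interval, so a threshold check is straightforward. The sketch below assumes an input of table names and overflow deltas for one interval, plus a threshold you pick yourself. It only prints suggested REORG commands for review and does not run them. The function name is hypothetical.

```shell
#!/bin/sh
# reorg_candidates <interval_secs> <max_overflows_per_sec>
# stdin: <schema.table> <delta_row_overflows> per line, for one interval.
# Converts each delta to a per-second rate and prints a suggested REORG
# command for every table above the threshold.
reorg_candidates() {
    awk -v i="$1" -v t="$2" '
        i + 0 > 0 && $2 / i > t + 0 {
            printf "db2 \"REORG TABLE %s\"  # %.2f overflows/s\n", $1, $2 / i
        }'
}
```

Running REORGCHK on the candidate tables first can help confirm that a reorganization is actually warranted.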

Bottlenecks (B)

Figure 14. Bottlenecks

Bottleneck analysis is something that a DBA cannot ignore. DBAs want to know which agent (application) is severely limiting the performance or capacity of a specific component in the DB2 system. db2top answers this need by displaying the main consumer of each critical server resource. The agent ID consuming the most resources in each category is shown on the screen.

The square box right under the title "Bottleneck" is for the timing analysis of variousdatabase operations:

The elapsed time used to calculate the percentage of each operation is:

elapsed time = wait_lock_time + sort_time + bp_read_time + bp_write_time + async_read_time + async_write_time + prefetch_wait_time + direct_read_time + direct_write_time

In this example, the estimated percentage for each operation is:

• wait lock ms: (wait lock time)/(elapsed time) = 80%

• sort ms: (sort time)/(elapsed time) = 0%

• bp r/w ms: (buffer pool read and write time)/(elapsed time) = 10%

• async r/w ms: (async read and write time)/(elapsed time) = 6%

• pref wait ms: (prefetch_wait_time)/(elapsed time) = 2%

• dir r/w ms: (direct read and write time)/(elapsed time) = 2%
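The percentages above are each operation's time divided by the total elapsed time. The hypothetical op_shares function below reproduces that calculation for any list of name/milliseconds pairs. The test uses illustrative values of 800, 0, 100, 60, 20, and 20 ms, chosen to match the 80/0/10/6/2/2 percent split in this example.

```shell
#!/bin/sh
# op_shares
# stdin: lines "<operation> <milliseconds>".
# Prints each operation's share of the summed time, in input order.
op_shares() {
    awk '
        { name[NR] = $1; ms[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s %.1f%%\n", name[i], (total > 0 ? 100 * ms[i] / total : 0)
        }'
}
```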

The main body of the "Bottleneck" screen shows which agent is the bottleneck ineach server resource.

The first column, Server Resource, in the screen "Bottlenecks" shows what kind ofserver resource is monitored:

• Cpu: Which agent consumes the most CPU time.

• SessionCpu: Which application session consumes the most CPU time.

• IO r/w: Which agent consumes the most I/O read and write.

• Memory: Which agent consumes the most memory.

• Lock: Which agent is holding the most locks.

• Sorts: Which agent has executed the most sorts.

• Sort Times: Which agent consumes the longest sorting time.

• Log Used: Which agent consumes the most log space in the most recent unit of work.

• Overflows: Which agent has the most sort overflows.


• RowsRead: Which agent has read the most rows.

• RowsWritten: Which agent has written the most rows.

• TQ r/w: Which agent has sent and received the most rows on table queues.

• MaxQueryCost: Which agent has the highest SQL execution time estimated by the compiler.

• XDAPages: Which agent has the most pages for XDA data (available in V9.1 GA and later releases).

For example, Figure 14 shows that agent 683, which is a db2bp (DB2 back-end process), is apparently the bottleneck.

As for memory usage bottleneck analysis, you can see the following in Figure 14:

=> Memory 7 17.11% 832.0K db2bp

This says that among all the agents, agent 7, which is another db2bp (DB2 back-end process), consumes the most memory: 17.11 percent, or 832.0K.

Case analysis

Now that you've looked at the meaning of useful entries on some screens, here are two sample cases that illustrate how to use db2top in a working environment to quickly narrow down the root cause of problems in a system.

The first case is about lock waiting. In this scenario, a heavy workload is running in the background, and a simulation program is trying to delete rows from a table, causing other sessions to be stuck in lock waiting status.

The second case illustrates how to use db2top in replay mode to capture performance information over a period of time, so that a DBA can review the information afterward.

Case 1: Lock waiting analysis in interactive mode

By looking at the Bottleneck screen in db2top, you observe heavy lock waiting, as shown in Figure 15:

Figure 15. Case 1 -- Lock waiting


By looking at the box at the top of the screen, it is clear that the "wait lock ms" entry took the most time compared to the other operations. This screenshot tells you that some applications are stuck in lock waiting mode, waiting for locks to be released.

In this scenario, it is usually useful to find out which application is holding most of the locks. In Figure 15, application ID (appid) 7 is shown under the Top Agent column in the Locks row, and the Resource Usage column shows that this application holds 99.84% of the locks in the entire database.

Now it is useful to look into this application to understand exactly what it was doing (by entering a), and it can also be helpful to look at the Session screen to see which application is waiting for locks (by entering l).

Entering a on the Bottleneck screen prompts users to input the appid. In this case, "7" is entered, which leads to the screen shown in Figure 16:

Figure 16. Case 1 -- Lock holding application


Figure 16 shows the query that was run by appid 7. In this case, the query is "DELETE FROM T1 WHERE EMPNO='000210'".

It is also necessary to confirm whether this query is the one blocking other applications. Sometimes a lock wait is caused by a table lock rather than row locks, and that lock may be held by an application holding very few locks.

Enter r to go back to the Bottleneck screen, and enter U to go to the Locks screen,as shown in Figure 17.

Figure 17. Case 1 -- Locks


In Figure 17, appid 7 shows the "UOW Waiting" status and appid 11 is in the Lock Waiting status. Pressing the left arrow key scrolls the screen to the view shown in Figure 18:

Figure 18. Case 1 Lock waiting

In Figure 18, appid 7 is holding more than 5000 locks. Because the application was deleting rows from the table, it holds 5119 X (exclusive) row locks.

For appid 11, the Locked By column shows that the lock appid 11 is requesting is held by appid 7. In the second column, Lock Mode, "NS [X]" means that the application holds an NS lock on one row and is trying to convert it into an X lock, and the Lock Status column shows "-", which means that the lock has not been granted. Therefore, the Locked By column shows that appid 7 is holding the lock and blocking appid 11 from getting it.
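With more than two applications involved, the Locked By column forms a chain of waits, and the application at the head of that chain is the one worth investigating. The sketch below follows such a chain. It assumes the column has already been read into a hypothetical mapping of waiting appid to holding appid. A loop in the chain would be a deadlock, which DB2's own deadlock detector normally resolves, so the sketch only reports it.

```python
from typing import Dict, Optional


def root_blocker(locked_by: Dict[int, Optional[int]], waiter: int) -> int:
    """Follow Locked By links from a waiting appid to the appid at the head of the chain.

    locked_by is a hypothetical mapping: appid -> appid holding the lock it waits for.
    An appid that is missing or mapped to None is not waiting on anyone.
    """
    seen = set()
    current = waiter
    while True:
        if current in seen:
            # A cycle means the applications are deadlocked.
            raise ValueError(f"lock-wait cycle detected at appid {current}")
        seen.add(current)
        holder = locked_by.get(current)
        if holder is None:
            return current
        current = holder


# The case above: appid 11 waits on appid 7, which is not waiting itself.
print(root_blocker({11: 7, 7: None}, 11))  # 7
```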

Now it is much clearer what happened to the system. Users may want to know what appid 11 is doing in order to decide whether to let appid 7 continue holding the lock or to force it.

Entering a again, and then 11, makes db2top show the query that was executed by appid 11, as shown in Figure 19.

Figure 19. Case 1 -- Lock waiting application

In Figure 19, appid 11 appears to be querying the entire table (SELECT * FROM T1). The advice is to release the locks by forcing appid 7, which is running the query DELETE FROM T1 WHERE EMPNO='000210'. To do so, users can enter r to get back to the previous screen, enter a and then 7 at the prompt to switch back to appid 7, and enter f to force the application.

Case 2: Performance analysis in replay mode

Users can run db2top with the -C option to capture snapshot information over a period of time and later analyze it in replay mode:

db2top -d sample -C -i 15 -m 240

The above command captures a snapshot every 15 seconds for 240 minutes. The output file is saved in the current directory with the default name db2snap-[dbname]-[platform][bit].bin.
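As a quick sanity check on collection size, the interval and duration determine roughly how many snapshots end up in the file. The default file name follows the pattern given above. The helpers below are a hypothetical sketch. They assume one snapshot per interval with no start-up or shutdown effects, and "AIX" and 64 are only example platform and bit values.

```python
def snapshot_count(interval_s: int, duration_min: int) -> int:
    """Approximate number of snapshots for: db2top -C -i <interval_s> -m <duration_min>.

    Assumption: exactly one snapshot per interval for the whole duration.
    """
    return (duration_min * 60) // interval_s


def default_snap_file(dbname: str, platform: str, bits: int) -> str:
    """Build the default name db2snap-[dbname]-[platform][bit].bin."""
    return f"db2snap-{dbname}-{platform}{bits}.bin"


print(snapshot_count(15, 240))              # 960
print(default_snap_file("sample", "AIX", 64))  # db2snap-sample-AIX64.bin
```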

Users can analyze the output data with db2top, or export the data into delimited format, where the columns are separated by the ";" character.
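Once exported, the ";"-separated files can be read with any CSV reader. The sketch below uses Python's csv module. It assumes that the first line of a dumped file is a header row. The column names in the sample text are hypothetical placeholders, not db2top's actual headers, so check a real output file before relying on specific names.

```python
import csv
import io
from typing import Dict, List


def read_db2top_delimited(text: str) -> List[Dict[str, str]]:
    """Parse ';'-separated db2top batch output into a list of row dictionaries.

    Assumption: the first line is a header row naming the columns.
    Extra trailing fields (stored under key None by DictReader) are dropped.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [
        {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        for row in reader
    ]


# Hypothetical sample; real column names come from the dumped file.
sample = "Time;Physical Reads;Async Writes\n18:52:59;120;30\n18:53:14;135;28\n"
rows = read_db2top_delimited(sample)
print(rows[0]["Physical Reads"])  # 120
```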

In this example, a user program that was run while a batch job was running caused performance degradation. The data captured by db2top is used to narrow down which program caused the problem.

After the data is collected, the following command can be used to dump it into delimited format:

db2top -d [dbname] -f [filename] -b [screen suboption]

For example, the following script dumps each screen into a separate file. The files can be used to analyze the data, or loaded into a table or Microsoft Excel:

db2top -d sample -f db2snap-sample-AIX64.bin -b d > dbout
db2top -d sample -f db2snap-sample-AIX64.bin -b l > sessionout
db2top -d sample -f db2snap-sample-AIX64.bin -b t > tbspaceout
db2top -d sample -f db2snap-sample-AIX64.bin -b b > bpout
db2top -d sample -f db2snap-sample-AIX64.bin -b T > tbout
db2top -d sample -f db2snap-sample-AIX64.bin -b D > sqlout
db2top -d sample -f db2snap-sample-AIX64.bin -b s > stmtout
db2top -d sample -f db2snap-sample-AIX64.bin -b U > lockout
db2top -d sample -f db2snap-sample-AIX64.bin -b u > utilout
db2top -d sample -f db2snap-sample-AIX64.bin -b F > fedout
db2top -d sample -f db2snap-sample-AIX64.bin -b m > memout
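If the dumps are part of a larger analysis pipeline, the same eleven commands can be driven from a script. The sketch below builds the command lines from the screen-to-file mapping used in the shell commands and runs each one with subprocess, redirecting output to the matching file. The function names are hypothetical. Only the db2top flags already used in the shell commands appear here.

```python
import subprocess
from typing import Dict, List

# Screen suboption -> output file, mirroring the shell commands.
SCREENS: Dict[str, str] = {
    "d": "dbout", "l": "sessionout", "t": "tbspaceout", "b": "bpout",
    "T": "tbout", "D": "sqlout", "s": "stmtout", "U": "lockout",
    "u": "utilout", "F": "fedout", "m": "memout",
}


def dump_commands(db: str, snapfile: str) -> List[List[str]]:
    """Build one db2top batch-mode command per screen suboption."""
    return [["db2top", "-d", db, "-f", snapfile, "-b", opt] for opt in SCREENS]


def run_dumps(db: str, snapfile: str) -> None:
    """Run each command, writing its standard output to the matching file."""
    for argv in dump_commands(db, snapfile):
        with open(SCREENS[argv[-1]], "w") as out:
            subprocess.run(argv, stdout=out, check=True)


# Dry run: print the commands instead of executing them.
for cmd in dump_commands("sample", "db2snap-sample-AIX64.bin"):
    print(" ".join(cmd))
```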

There are several ways to narrow down the problem from this data. db2top provides a useful option, -A, for automatic performance analysis, as shown in Figure 20.

db2top -d sample -f db2snap-sample-AIX64.bin -b l -A

Figure 20. Case 2 -- Auto analysis

Figure 20 shows the output of the -b l option, which is used for session analysis.

The first section shows the top 20 applications consuming the most CPU. In this case, appid 716 consumed almost 100 percent of the CPU from 18:58:59 to 19:14:46.

The second section of the report (Figure 20) shows the top five applications consuming the most CPU in intervals of about five minutes.

Between 18:52:59 and 18:58:14, no application consumed significantly high CPU. However, between 18:58:14 and 19:13:31, appid 716 stayed at the top of the list, consuming 100 percent of the CPU. This indicates that appid 716 was doing something unusual and needed further analysis.
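The per-interval view in the second section amounts to bucketing CPU samples by time and picking the top consumer in each bucket. The following sketch reproduces that idea on exported data. It assumes a hypothetical sample layout of (seconds since start, appid, CPU percent) and ranks applications by average CPU within each 300-second bucket. db2top's own -A report may aggregate differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (seconds since capture start, appid, CPU percent).
Sample = Tuple[int, int, float]


def top_per_interval(samples: Iterable[Sample], interval_s: int = 300) -> Dict[int, Tuple[int, float]]:
    """For each time bucket, return (appid, average CPU percent) of the top consumer."""
    buckets: Dict[int, Dict[int, List[float]]] = defaultdict(lambda: defaultdict(list))
    for t, appid, cpu in samples:
        buckets[t // interval_s][appid].append(cpu)
    result: Dict[int, Tuple[int, float]] = {}
    for bucket, apps in sorted(buckets.items()):
        averages = {a: sum(v) / len(v) for a, v in apps.items()}
        best = max(averages, key=averages.get)
        result[bucket] = (best, averages[best])
    return result


# Illustrative samples only.
data = [(0, 1, 5.0), (10, 2, 3.0), (300, 716, 99.0), (310, 716, 97.0), (320, 1, 1.0)]
print(top_per_interval(data))  # {0: (1, 5.0), 1: (716, 98.0)}
```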

More detailed information can be obtained by loading the delimited output into a database or Microsoft Excel.

Figure 21 was generated in Microsoft Excel from the file dbout, which contains the Database screen data:

Figure 21. Case 2 -- I/O spike

In Figure 21, two lines show a spike. The red line represents physical reads, and the blue line represents asynchronous writes.

Therefore, you can conclude that the database became very busy during the same period in which appid 716 drove CPU usage high, which makes it very likely that appid 716 caused both the high CPU and the high I/O usage.
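Spotting a spike by eye works for one chart. For many exported columns, a simple statistical rule can flag the same intervals automatically, and their times can then be compared with the high-CPU window. The sketch below uses one common heuristic, mean plus k population standard deviations, as a swapped-in technique of our own. It is not something db2top provides, and the sample values are illustrative.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def spike_indices(values: Sequence[float], k: float = 2.0) -> List[int]:
    """Return indices whose value exceeds mean + k * population standard deviation.

    A simple heuristic; raises statistics.StatisticsError if values is empty.
    """
    mu, sigma = mean(values), pstdev(values)
    return [i for i, v in enumerate(values) if v > mu + k * sigma]


# Illustrative physical-read counts per interval, with one obvious spike.
reads = [10, 12, 11, 10, 11, 12, 10, 500, 11, 10]
print(spike_indices(reads))  # [7]
```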

Next, it is useful to understand exactly what appid 716 was doing when the problem occurred. db2top replay mode is helpful in this situation. From Figure 20, pick a time when the CPU was busy because of appid 716 (in this example, 19:03:30 was chosen), and then run the following command:

db2top -d sample -f db2snap-sample-AIX64.bin/19:03:30

Switching to the Sessions screen (by entering l) displays the information shown in Figure 22:

Figure 22. Case 2 -- Session

In Figure 22, it is clear that appid 716 was consuming a large amount of CPU and I/O.

Then, entering t to go to the Tablespaces screen (Figure 23) shows that usage of the temporary table space (TEMPSPACE1) was high.

Figure 23. Case 2 -- Tablespace

Next, pressing T to go to the Table screen (Figure 24) shows that the temporary table at the top of the list, [716][SHENLI ].TEMP [00001_00002], has fairly high I/O. The name of the table shows that it was used by appid 716.
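When scanning many rows of the Table screen output, the owning appid can be pulled out of temporary table names like the one above. The regular expression below is a hypothetical pattern inferred from that single example ([appid][schema].name). It has not been checked against every form db2top may print, and it returns None for names that do not match.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern inferred from "[716][SHENLI ].TEMP [00001_00002]".
_TEMP_NAME = re.compile(r"^\[(\d+)\]\[([^\]]*)\]\.(.+)$")


def parse_temp_table(name: str) -> Optional[Tuple[int, str, str]]:
    """Split a temp table name into (appid, schema, table); None if it does not match."""
    m = _TEMP_NAME.match(name.strip())
    if m is None:
        return None
    appid, schema, table = m.groups()
    # Schema names may be shown padded with trailing blanks.
    return int(appid), schema.strip(), table.strip()


print(parse_temp_table("[716][SHENLI ].TEMP [00001_00002]"))
# (716, 'SHENLI', 'TEMP [00001_00002]')
```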

Figure 24. Case 2 -- Table

It is also helpful to understand what appid 716 was doing. After entering a and then 716, as shown in Figure 25, db2top displays the query that was executed by this application: SELECT * FROM T1 ORDER BY EMPNO.

Figure 25. Case 2 -- Statement

The remaining question is: why did this statement cause such high CPU and I/O usage?

Entering x on this screen generates db2exfmt output, as shown in Figure 26.

Figure 26. Case 2 -- db2exfmt

The explain output (Figures 26 and 27) shows that a table scan (TBSCAN) was used against table T1 and that a SORT operation was performed on column EMPNO.

Figure 27. Case 2 -- db2exfmt1

In Figure 27 (part of the explain output), the NUMROWS entry shows "1412163," which indicates that the SORT operation must sort all 1412163 rows to produce the result. The SPILLED entry shows 154056, which represents a large amount of page spilling for the sort operation. At the top of the db2exfmt output, Sort Heap shows only "16," which indicates that the db2agent was trying to sort all 1412163 rows in a 16-page sort heap, which clearly cannot hold all of the data. As a result, the sort spilled and the temporary space was overused. In other words, the SORT operation caused the high CPU usage, and the spilling caused the high I/O usage in the temporary space.
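To put the explain figures in perspective, the page counts can be converted to sizes. The arithmetic below assumes 4 KB pages for both the sort heap and the spilled pages. That is an assumption for illustration only, because the temporary table space could use a different page size.

```python
PAGE_BYTES = 4096  # assumption: 4 KB pages for the sort heap and the spilled pages


def pages_to_mib(pages: int, page_bytes: int = PAGE_BYTES) -> float:
    """Convert a page count to mebibytes."""
    return pages * page_bytes / (1024 * 1024)


sort_heap_mib = pages_to_mib(16)    # Sort Heap from the db2exfmt header
spilled_mib = pages_to_mib(154056)  # SPILLED estimate from Figure 27
print(sort_heap_mib)   # 0.0625
print(spilled_mib)     # 601.78125
print(154056 / 16)     # 9628.5 spilled pages per sort heap page
```

Under that assumption, a 64 KB sort heap is being asked to sort roughly 600 MB worth of spilled pages, which is consistent with the heavy temporary space I/O seen on the Tablespaces and Table screens.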

Finally, how can this problem be solved? The db2advis utility can be used to get advice for this query. A typical db2advis output looks similar to the following:

Command:

db2advis -d sample -s "SELECT * FROM T1 ORDER BY EMPNO" -m IMCP

Output:

-- LIST OF RECOMMENDED INDEXES
-- ===========================
-- index[1], 0.095MB
CREATE INDEX "SHENLI"."IDX810261919380000" ON "SHENLI "."T1"
("EMPNO" ASC, "COMM" ASC, "BONUS" ASC, "SALARY" ASC,
"BIRTHDATE" ASC, "SEX" ASC, "EDLEVEL" ASC, "JOB" ASC,
"HIREDATE" ASC, "PHONENO" ASC, "WORKDEPT" ASC, "LASTNAME"
ASC, "MIDINIT" ASC, "FIRSTNME" ASC) ALLOW REVERSE
SCANS ;
COMMIT WORK ;
RUNSTATS ON TABLE "SHENLI "."T1" FOR
INDEX "SHENLI "."IDX810261919380000" ;
COMMIT WORK ;

The recommendation is to create the index on table T1 shown in the output and then collect statistics for it with RUNSTATS.

Conclusion

The concept behind db2top is very different from that of the DB2 Health Monitor. The DB2 Health Monitor sets up a group of thresholds and keeps monitoring those metrics. Once any of the thresholds is reached, it triggers an alarm. db2top, by contrast, is basically a tool to periodically capture snapshots and allow users to read the results visually instead of parsing snapshot files.
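The difference can be sketched in a few lines. A threshold check answers only "has this value crossed the limit?", while db2top's periodic approach turns cumulative snapshot counters into per-interval changes that show when activity happened. The helpers below are a hypothetical illustration of both ideas, not the implementation of either tool. They assume that a counter decreasing between snapshots means it was reset.

```python
from typing import List, Sequence


def deltas(cumulative: Sequence[int]) -> List[int]:
    """Per-interval change between consecutive cumulative snapshot values.

    Assumption: a decrease means the counter was reset, so the new value is the delta.
    """
    return [b - a if b >= a else b for a, b in zip(cumulative, cumulative[1:])]


def threshold_alarm(value: float, threshold: float) -> bool:
    """Threshold-style check: alarm only when the value reaches the threshold."""
    return value >= threshold


reads = [100, 150, 150, 400]
print(deltas(reads))                          # [50, 0, 250]
print([threshold_alarm(d, 200) for d in deltas(reads)])  # [False, False, True]
```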

The db2top utility allows users to monitor a DB2 system through a text-based graphical interface. It can be used to identify whether there was a problem during a period of time and to narrow down the root cause of that problem. Users will find it a handy utility for real-time system monitoring and problem debugging in their daily work.

Acknowledgement

Special thanks to Jacques Milman, who provided helpful advice during the writing of this article.

About the authors

Tao Wang

Tao Wang is an IBM Certified Advanced Database Administrator - DB2 for Linux, UNIX, and Windows. Tao currently works with the DB2 Advanced Support - Down System Division (DSD) team and has in-depth knowledge of the engine area.

Shen Li

Shen Li works on the DB2 RAS/PD development team based at the IBM Toronto lab, specializing in DB2 reliability, availability, serviceability, and problem determination.
