
First Things First: Software-based Performance Tuning

Is your system experiencing high CPU use, memory faulting, poor response time, high disk activity, long-running batch jobs, or poorly performing SQL, client/server, or Web requests? Are you about to spend thousands of dollars on more memory, faster disk drives, or other upgrades? Before you do, make sure you know why you're having performance issues. A couple of days gathering data from your system and analyzing it from the highest level down to individual lines of code is time well spent - and will likely yield a result you may not have considered. New hardware may have little, if any, effect on system responsiveness, because the root cause of 90 percent of all performance issues is excessive I/O in one or more applications on your system.

This white paper will focus on part of the process of identifying and solving the root causes of these application performance issues. The sample data was collected using a toolset for the iSeries from MB Software called the Workload Performance Series. It collects data from a system and presents it in a way that lets you analyze your environment from various perspectives and different levels of detail. The data is gleaned from a real analysis of an actual system, so it accurately reflects what you might expect to find in your own environment. But regardless of whether you use the Workload Performance Series or another data-gathering solution, what's important is how you use the data to address the root causes of performance issues within your environment.


Subsystem Analysis

First, look at a high-level view of what's going on in your system. (Figure 1) shows CPU use by subsystem. This particular system had significant response time issues, and when we look at the data over a 24-hour period, we find that QBATCH consumed 55 percent of the CPU on this system. MIMIXSBS consumed another 15 percent, and QINTER took 11 percent. Typically, we find that a one-to-one relationship exists between CPU use and I/O performed, and the subsystem that consumes the most CPU also performs the most I/O. We explore this relationship in more detail later on; but if we can reduce these 8.5 billion I/Os, we're going to shorten job durations, consume less CPU, and see better response time, less memory use, less memory faulting, and less disk activity.

Physical I/O is one of the slowest things on a box, relative to accessing data that's already in memory or being used internally in your programs. We want to block data as efficiently as possible, so whenever disk arms are physically going to disk to retrieve data, we want to make sure it's being done with the fastest disk drives, the most disk arms, and the newest caching algorithms. But we also want to make sure that we're only performing I/O when absolutely necessary to accomplish a task. Looking at the same data in another way (Figure 2) confirms the one-to-one relationship between CPU and I/O. QBATCH, which was responsible for 55 percent of the CPU consumption, is responsible for 52 percent of the I/O.

On the other hand, sorting the subsystem data by number of jobs (Figure 3) shows that more jobs run in the QPGMBATCH subsystem than in QBATCH. Initiating and terminating jobs creates a tremendous amount of overhead, even if the job only takes one CPU second each time it's run. It starts a new job number, opens 15 files, and loads 12 programs into memory just to do a tiny bit of work; then it has to remove all those programs from memory, close all the files, terminate the job, and perhaps even generate a job log each time.

In QPGMBATCH, that happened 288,793 times in a week - that could have been one job sitting in the background, waiting on a data queue. Those files could have been left open and the programs left in memory; then QPGMBATCH would only be responsible for one percent of the jobs on the system, not 23 percent. In turn, you would have that much less I/O and that much less CPU.

Figure 1

Figure 2

Figure 3


Figure 4

Job Identification

If we look at the same data by job name, we find more areas to drill down into for root cause. (Figure 4) shows the top 10 CPU-consuming jobs on the system. Thousands of jobs run on this system all day long, but one job (PCC7C8619) was responsible for 22 percent of all CPU consumed on this system - that's a great opportunity. If you optimize that process to consume only 1 percent of the CPU, the whole CPU usage curve that rises at 8 a.m. and drops at 5 p.m. in a typical bell curve would drop by 21 percent. That's a dramatic improvement not only in CPU and system capacity, but also in I/O - this job performs almost two billion physical I/Os (remember, I/O typically is the largest resource-consuming function on the system). The second-largest CPU hog (CL1072CL) takes up 14 percent, which isn't much better.

Figure 5

The same job data, sorted by physical I/O (Figure 5), shows that PCC7C8619 also performs the most I/O, clearly causing the CPU issue. However, CL1072CL isn't here; maybe that job doesn't have an I/O issue, but another problem, such as excessive initiation and termination. From the previous figures, we know that it impacts the system negatively, but we'll have to dig deeper to find the reason.
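If you don't have a commercial collection tool in place, you can approximate this kind of job-level ranking with a query. The following is a minimal sketch, assuming a current IBM i release where the QSYS2.ACTIVE_JOB_INFO service is available; it shows only jobs that are active right now (a week's worth of history, as in these figures, requires Collection Services data or a tool such as the Workload Performance Series), and the exact column names vary by release.

-- Top 10 active jobs by accumulated CPU time (milliseconds),
-- with their disk I/O counts alongside for comparison.
SELECT JOB_NAME, SUBSYSTEM, AUTHORIZATION_NAME,
       CPU_TIME, TOTAL_DISK_IO_COUNT
  FROM TABLE (QSYS2.ACTIVE_JOB_INFO()) AS X
 ORDER BY CPU_TIME DESC
 FETCH FIRST 10 ROWS ONLY;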

Figure 6

Sorting the data by number of jobs (Figure 6) shows that the jobs running under the name ROBOTCNL comprise 38 percent of all jobs on the system. The QZRCSRVS jobs are remote procedure calls coming from other systems; the chart shows that 60,126 times during the test period, another platform did a remote procedure call and triggered a call to a native program - that's something that could have been done more efficiently. It's simpler to just use the built-in Client Access capability for doing these remote procedure calls, but when you discover what kind of impact it has on the system from a performance standpoint, you might consider doing things more efficiently. For example, you could implement a TCP/IP socket client and server capability within that application, or use data queues instead of remote procedure calls - either could have consumed a lot less resource.

QZDASOINIT jobs are ODBC requests, responsible for 10 percent of the jobs on the system. All day long, these jobs start connections, open SQL requests, do a little bit of work, and close the requests. Over and over again, these requests start and stop jobs, open and close files, and load and unload programs into and out of memory.


Many of these jobs could have been left active all day instead of being initiated and terminated thousands of times throughout the day.

User Utilization

Typically, we believe that the end user using the WRKQRY command, SQL, or some type of client/server application like MS Access is the performance issue on our box. But when we look at utilization by user, we may find that what's being done in operations is actually the culprit. Regardless of what application you're running, or what tools you're using to schedule jobs or replicate data for high availability, you must address the underlying root causes of issues within your environment.

Figure 7

(Figure 7) immediately shows that two user IDs (ROBOT and MIMIXOWN) are responsible for a significant amount of the CPU resource consumed on this box. The third, TRANSFER, is probably transferring data between platforms. BATCH and EDIOPR round out the top five resource consumers. But the hundreds of other users on the system represent a very small piece of what's actually being utilized from a resource standpoint. (Figure 8) shows once again that the top CPU consumer is the top I/O performer, and hammers home the old message: focus on the I/O and you will address many of the performance issues that you experience on your systems.

Figure 8

(Figure 9) shows the data sorted by number of jobs, and immediately highlights an area for improvement. Note that the EDI process - EDIOPR - accounts for 57 percent of the jobs on the system. This job might be running every three seconds to check for data to send or receive through EDI. If you have a job that's running every three seconds, checking, checking, and checking again all day long, but never actually doing work, you're wasting a lot of resource. Using triggers, data queues, or other better-performing techniques for detecting transactions at the proper status could dramatically reduce the amount of resource consumed by this process.

Figure 9

DASD Analysis

Continue drilling down into the data: which files are most often accessed, and which could be archived or even purged? The key thing to remember is that you only want to process the data that will be selected - don't read a million records if you're only going to select 1,000. If you have seven years' worth of history in a general ledger transaction history file, and your accounting users only use it to look for recent transactions to close the books at month's end, you need to streamline that file. Move the old data from the current history file into an archived history file. Don't delete it - it's hard to get users to agree to delete anything. Instead, create a file that has only the last month's data in it for daily use, and move the previous six years' data to an archived file. If your users want current information, they can access one file, and if they want old history (which doesn't happen nearly as often) they can access the other.
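As a hedged sketch of what that split might look like in SQL - the PRODLIB library, the GLHIST and GLHIST_ARC files, and the TRANDATE column are hypothetical names, and in practice you would run this under commitment control during a quiet period:

-- Copy everything older than the last month into the archive file...
INSERT INTO PRODLIB.GLHIST_ARC
  SELECT *
    FROM PRODLIB.GLHIST
   WHERE TRANDATE < CURRENT DATE - 1 MONTH;

-- ...then remove it from the current file that users hit every day.
DELETE FROM PRODLIB.GLHIST
 WHERE TRANDATE < CURRENT DATE - 1 MONTH;

The daily programs and queries keep reading the now much smaller GLHIST, while month-end or historical reporting points at GLHIST_ARC.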

Figure 10

(Figure 10) shows that reorganizing database files could help reduce I/O of the applications that access them. One main production library (MHSFLP) is responsible for 43 percent of the storage on the system, using almost a terabyte of disk space and comprising nearly 2 billion records. How much of that library could be archived?

Figure 11

The same data, when sorted by deleted records, shows that library IHA440FP has 700 million active records and 72 million deleted ones (Figure 11). Interestingly, it's not even the main production data library - in fact, it's only a fraction of the size - but it accounts for 70 percent of the deleted records on the system. Performing some basic system management tasks, such as database file reorgs, could dramatically improve performance of all processes that access these files.

Figure 12

(Figure 12) goes even deeper into the data, to the user level. It shows that user QSECOFR is responsible for creating database files that account for 46 percent of the storage, TESTBENCH accounts for 18 percent, and ROBOT accounts for 13 percent. But SMARINO, an end user, is consuming 300 GB of disk; is this end user going wild with queries? You should keep a close eye on this data.

Figure 13

Looking at file size by database name (Figure 13) yields a few surprises as well. The CLMFILESAV file immediately jumps out; this is a save file that's apparently never been used, but it's 112 GB - 13 percent of the disk on the system. OBNMAS accounts for 100 GB, but has only been used 16 times in its history; is this another 100 GB of wasted space?

The files in (Figure 14) offer a great opportunity for improvement through simple file reorgs. IER200P1 has 29 million deleted records and only 9 million active ones. IIT001W03 is an even better opportunity, simply because it will take less time to clean up - it only has 88,000 active records.

Figure 14
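If you want to find these reorganization candidates yourself, the database catalog already tracks active and deleted record counts. The following is a minimal sketch, assuming an IBM i release that provides the QSYS2.SYSPARTITIONSTAT catalog view; the one-million-record threshold is an arbitrary starting point.

-- Physical files carrying the most deleted records, largest first.
SELECT TABLE_SCHEMA, TABLE_NAME,
       NUMBER_ROWS, NUMBER_DELETED_ROWS, DATA_SIZE
  FROM QSYS2.SYSPARTITIONSTAT
 WHERE NUMBER_DELETED_ROWS > 1000000
 ORDER BY NUMBER_DELETED_ROWS DESC;

Files that surface at the top of this list can then be reorganized (for example, with RGZPFM) during a maintenance window to reclaim the space and shrink the I/O.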

Query Analysis

Looking at queries provides an even more detailed level of analysis. This example (Figure 15) shows the system during normal business hours. QINTER had 13 billion skipped records; put another way, 13 billion records were read and not selected, and therefore didn't need to be read. If you read a record and don't select it, that's a full table scan, and full table scans unnecessarily consume tons of resource.

This example is particularly bad. In one day, users performed a total of 3,000 queries - which resulted in 14 billion I/Os - to select 8,000 records. This seems like a ridiculous example, but it's real. This invisible use of resource occurs every day in many environments. It's not truly "invisible," but it happens so quickly that most people never see it. These 3,000 requests might have appeared only momentarily, and by the time you have refreshed your screen they were gone. The 14 billion unnecessary I/Os could have paralyzed the system from an interactive response time standpoint, but were individually invisible until we analyzed the data this way.

CO405JTCP isn't much better; it performed three billion I/Os via almost 3,000 queries to select 38 million records. However, another issue exists in this subsystem - why is it selecting so much? What in the world are they doing with so many records for each query? This problem results from applications that preload entire data sets into memory, while the user only accesses the first small set of records. When the user pages down, the application accesses the next page of data out of memory. From a coding perspective, it's much easier to preload 10,000 records into memory and let the user page through them; from a performance perspective - with hundreds of simultaneous users - it's deadly. For system performance, it's worth the additional time and effort to code the application properly - the application should read 10 records at a time, and not a single record more. When the user pages down, read 10 more records. It's horribly inefficient to load 10,000 records into memory that the user will never even think of reading.
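As a hedged illustration of the read-10-at-a-time approach in SQL - the PRODLIB library, the ORDHIST file, its columns, and the host variables are hypothetical stand-ins for whatever the application actually displays:

-- First page: fetch only the 10 rows the user can see right now.
SELECT ORDNUM, ORDDATE, ORDTOTAL
  FROM PRODLIB.ORDHIST
 WHERE CUSTNO = :custno
 ORDER BY ORDNUM
 FETCH FIRST 10 ROWS ONLY;

-- Page down: restart just past the last key shown, instead of having
-- preloaded thousands of rows the user may never look at.
SELECT ORDNUM, ORDDATE, ORDTOTAL
  FROM PRODLIB.ORDHIST
 WHERE CUSTNO = :custno
   AND ORDNUM > :last_ordnum_shown
 ORDER BY ORDNUM
 FETCH FIRST 10 ROWS ONLY;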

Figure 15

Figure 16

Let's look at it another way (Figure 16), sorted by records selected. Remember CO405JTCP? Not surprisingly, it accounts for 89 percent of all of the selected records. QBATCH, which made an appearance in our high-level analysis, is number two. The worst performer, however, is QINTER. It reads 13 billion records to select only 8,000.

Figure 17

(Figure 17) shows the same data organized by number of queries, and again, QINTER stands out; it accounts for half of all the queries on the system. Unnecessary initiation and termination is the likely culprit here.

Figure 18

(Figure 18) shows the data by job. Remember PCC7C8619? It was our largest CPU consumer, our largest physical I/O performer, and now we see that it has the most skipped records: 36 percent of the total. That's more than 3 billion records skipped to select 38 million. This job is consuming 22 percent of our CPU, and now you see why.

We need to address a few issues here. First, we must make sure that this job has the proper permanent indexes so that it reads 38 million records instead of 3 billion. Next, we should find out why it's selecting so many records each time.

Finally, we need to ask ourselves, "Why is it running so many times?" Most likely, you have 15 users throughout the company all wanting a copy of a single report, so they're all running the job individually. It would've been a lot more efficient to run the job once and print 15 copies of that report.

Ultimately, you want to get to the level of detail that shows which database file needs tuning. (Figure 19) shows what looks to be the master file (ENBMAS), which can clearly benefit from additional indexing. The application performed 2,000 queries to select 300,000 records, and it skipped 7 billion records along the way.

Figure 19


Figure 20

Figure 21

Fixing this problem won't involve building 1,000 new logical files on your system. Some people are very reluctant to build any new logical files, and in fact some corporate standards prohibit it. What a mistake that is from a performance standpoint. You don't want redundant indexes, of course, but you sometimes need a new one, and developers or database administrators must be allowed to properly index and tune databases so that applications can perform well. In this example, we see three databases that desperately need indexing - they're reading billions of records to select just a few million records.

(Figure 20) shows the data sorted by selected records, while (Figure 21) shows it sorted by queries. Notice that the same files are showing up in each of these graphs; when you see this happen, the areas that need tweaking become obvious. From a program standpoint, (Figure 22) shows that IOMEM001 is skipping almost 12 billion records in 3,000 queries to select 3,000 records. That's 3,000 full table scans in one day to select one record each time.

Even beyond the database level is the index level. (Figure 23) seems confusing at first, but it's saying this: if the file ENPMAS had a new permanent logical file keyed by MEMBNO, GRPNUM, EFFDAT, and RECTYP, you would eliminate almost 7 billion I/Os on this system. These I/Os were occurring because an access path either didn't exist as a permanent logical file or didn't get used. Further analysis will show what's causing that - look at the job log of a job that's running queries with debug turned on, and see what the operating system is doing. See what decisions it's making.
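In SQL terms, the permanent access path that (Figure 23) is recommending is simply an index over those four columns. A minimal sketch, assuming the file lives in a library called PRODLIB and using a made-up index name (a keyed logical file built over the same fields would accomplish the same thing):

-- Permanent index matching the selection and ordering columns, so the
-- optimizer can use it instead of scanning ENPMAS or building
-- temporary access paths on every query.
CREATE INDEX PRODLIB.ENPMAS_IX1
    ON PRODLIB.ENPMAS (MEMBNO, GRPNUM, EFFDAT, RECTYP);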

Figure 22

Figure 23


Figure 24

(Figure 24) sorts the query data by selected records. SVCMAS is another master file that's missing an index keyed by MODCOD. Because of that, there is 10 times more I/O than is necessary. But another issue exists here as well: why did we select 40 million records? The reason is that we probably neglected to put additional criteria in this query. We selected by MODCOD and downloaded 10 million records, then did another pass to further subset the data. If we had done all of that in one pass, we might have been able to select just a few thousand records.
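As a hedged sketch of the one-pass approach - SVCMAS and MODCOD come from the example above, while the PRODLIB qualifier, the STATUS column, and its value stand in for whatever criteria the second pass applied on the client:

-- Apply every selection criterion in a single query, so the database
-- returns only the rows that are actually needed instead of millions
-- of rows that get filtered again after they've been downloaded.
SELECT *
  FROM PRODLIB.SVCMAS
 WHERE MODCOD = :modcod
   AND STATUS = 'A';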

Use Good Judgment

This white paper focuses primarily on SQL-related issues. You also may have code issues in your RPG, COBOL, CL, or C code. Or, you may have 2 GB of journal receivers per day that your high availability tool is having trouble processing. Other MB Software white papers focus on these other areas.

No matter what analysis you do, you must use good judgment. You would never want to build 1,000 new permanent logical files. You have to know when to stop. Look at different periods of time and different samples of data; maybe the numbers vary dramatically during month-end, weekend, or day-end processing. Look at the data during a variety of key business times, and correct the issues that will have the greatest impact on your system.