Turning The Knobs: Practical AIX Tuning
Susan Schreitmueller
pSeries, Advanced Technical Support
Disclaimer
• The suggestions contained in this presentation are general suggestions formulated by the author, not recommendations from IBM.
• These recommendations should be carefully examined for your environment and tested rigorously before implementing in production.
• All environments differ, and requirements vary with application and system nuances. Always use YOUR best judgment.
Unless you:
• Put fewer cars on the road
• Widen the road
• Reroute the cars
You just move the bottleneck to a different location!
Unless resources are actually ADDED to the system, much of tuning is moving a bottleneck from one place to another and balancing the trade-off of which application or request gets the available resource. It is important, however, to identify the cause rather than just the bottleneck in order to effectively manage a constraint.
We must concentrate on the entire picture. From point A to point C there may be many, many factors, and understanding the throughput from A to C is crucial. Careful striping, disk layout, and file access tuning are defeated when the real cause of the bottleneck is the slow transfer speed of a low-cost, low-throughput adapter.
[Diagram: throughput path from point A to point C, with a slow adapter at point B. Even with disk technology, disk layout, logical track grouping (LTG), LVM mapping, RAID levels, disk access, VMM tuned for optimal memory allocation, and async I/O and application access all tuned for high throughput, the bottleneck continues to be the slow adapter. Examine throughput from Point A to Point C.]
Performance - pieces of the puzzle
[Diagram: system tuning as interlocking pieces - Network (no, nfso), Virtual Memory Management (vmo, PSALLOC), CPU and # of procs (schedo), Workload Manager, Async I/O, and LVM tuning and filesystem layout (ioo).]
Critical Resources: Four Bottleneck Causes
• CPU - number of processes, process priorities, WLM managed. Tools: tprof, pprof, sar -P -u -q, ps aux, ps, topas
• Memory - real memory, paging, memory leaks, WLM managed. Tools: svmon, vmtune/vmo, ipcs, PSALLOC
• Disk I/O - disk balancing, types of disks, LVM policies, WLM managed. Tools: lvmstat, iostat, LVM map, wlmmon, wlmstat, ioo
• Network - NFS used to load applications, network type, network traffic. Tools: netpmon, no, lsattr, nfso, entstat, netstat
[Flowchart: start with the CPU. If CPU% is high, suspect a CPU constraint (tools: topas, vmstat, sar -q | -u | -P, tprof, pprof, wlmstat). If not, check memory: high paging indicates a possible memory constraint (tools: vmstat, ps gvc, vmtune, svmon, topas, wlmstat, wlmmon). If not, check disk: if the disks are not balanced, balance them; if they are balanced and I/O is still slow, suspect a disk/SCSI constraint (tools: sar -d, topas, iostat, filemon, lvmstat; examine disk layout, transfer rates, and sequential/parallel access). For networking: netstat -v -s -m, netpmon, netperf, lsattr -El ent1.]
Networking
Monitor: iptrace, ipfilter, ipreport; netpmon; netstat, nfsstat (entstat)
Tune: no, ISNO; nfso; adapter attributes (chdev on the ent# devices)
Memory
Monitor: svmon, vmstat, sar; wlmstat, wlmmon; tprof
Tune: vmo; paging controls
CPU/Kernel General
Monitor: topas, vmstat, sar; wlmstat, wlmmon; xmperf/PTX; tprof, pprof; nmon (download); CURT, SPLAT
Tune: schedo, system parms
Applications
Monitor: profiling with tprof, pprof, Xprof; fdpr; CURT, SPLAT
Tune: database calls; file calls; good programming
Disk I/O
Monitor: filemon, fileplace; LVM mapping, lvmstat, iostat; wlmstat, wlmmon
Tune: ioo; AIO max/min servers; adapter spread, file layout
Network Options
Network options are set with the no command.
• Prior to AIX 5.2, these options should be placed where they will be re-executed at boot, e.g., in an /etc/rc.tune file or /etc/rc.local.
• AIX 5.2 supports permanent and reboot value retention in /etc/tunables (/etc/tunables/nextboot and /etc/tunables/lastboot).
• Interface Specific Network Options (ISNO) allow some options to be configured differently for the following network interfaces: 10/100/1000 BaseT, 10/100 BaseT, ATM, Gigabit Ethernet.
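A minimal sketch of setting one option both ways (tcp_sendspace and its value are used purely as an illustration):

   # pre-AIX 5.2: takes effect now, lost at reboot unless re-run from /etc/rc.tune or /etc/rc.local
   no -o tcp_sendspace=262144
   # AIX 5.2: -p applies the value now and records it in /etc/tunables/nextboot
   no -p -o tcp_sendspace=262144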
Network Tuning - Choose the 'knobs' to turn
[Sample no -a output: the full list of network tunables and their current values, including thewall, sb_max, somaxconn, sockthresh, tcp_sendspace, tcp_recvspace, udp_sendspace, udp_recvspace, rfc1323, tcp_mssdflt, tcp_pmtu_discover, udp_pmtu_discover, ipforwarding, tcp_ephemeral_low/high, udp_ephemeral_low/high, use_isno, nbc_* network buffer cache settings, and many more.]
NFS should be tuned also - sample nfso -a output:
portcheck = 0
udpchecksum = 1
nfs_socketsize = 600000
nfs_tcp_socketsize = 600000
nfs_setattr_error = 0
nfs_gather_threshold = 4096
nfs_repeat_messages = 0
nfs_udp_duplicate_cache_size = 5000
nfs_tcp_duplicate_cache_size = 5000
nfs_server_base_priority = 0
nfs_dynamic_retrans = 1
nfs_iopace_pages = 0
nfs_max_connections = 0
nfs_max_threads = 3891
nfs_use_reserved_ports = 0
nfs_device_specific_bufs = 1
nfs_server_clread = 1
nfs_rfc1323 = 1
nfs_max_write_size = 32768
nfs_max_read_size = 32768
nfs_allow_all_signals = 0
Tuning the I/O Subsystem
• Disk layout: check with LVM mapping
• Disk technology: check LTG selection and lsattr -El hdisk#
• Check file access (sequential or parallel) and concurrent access patterns
• Review async I/O needs and tuning for the application
• Use filemon, iostat, and sar -d to monitor (see the sketch below)
Layers to consider: physical layer, LVM layout, application access, async I/O | parallel or sequential.
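A minimal monitoring sketch for the tools named above (output file name and intervals are arbitrary):

   filemon -o /tmp/fmon.out -O lv,pv   # start trace-based collection for logical and physical volumes
   sleep 60                            # let the workload run
   trcstop                             # stop the trace; filemon writes its report to /tmp/fmon.out
   iostat 5 3                          # three 5-second interval samples
   sar -d 5 3                          # per-disk activity over the same window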
Virtual Memory Management
A (very) short tutorial!
[Diagram: real memory is divided into 4 KB frames; virtual address space is divided into 256 MB segments. Working segments are backed by paging space; persistent segments are backed by the JFS file system; client segments (client pages) are backed by NFS and JFS2.]
[Diagram: the Page Frame Table (PFT). Each entry records a frame's real address, segment type (Working, Persistent, or Client), and modified/referenced bits. When the free list runs low, page stealing flushes modified pages to their backing store (working pages to paging space, persistent pages to the file system, client pages to NFS/JFS2) and returns the frames to the free list, yielding the new PFT.]
vmo - A beginning look
• Let's look at two examples of controlling memory usage:
  - maxfree/minfree - control the levels at which the page replacement algorithm begins and stops stealing pages
  - maxperm/minperm - control which types of pages (file pages or computational pages) are stolen first
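Before turning either knob, it helps to capture the current settings (AIX 5.2 syntax):

   vmo -a | egrep 'minfree|maxfree'     # page replacement thresholds
   vmo -a | egrep 'minperm|maxperm'     # file vs. computational page controls
   vmstat -v | grep numperm             # current percentage of memory holding file pages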
minfree - when the number of frames on the free list falls to this value, the Page Replacement Algorithm wakes up and begins stealing pages.
maxfree - VMM page stealing continues until the maxfree value is reached.
Defaults: minfree = 120, maxfree = 120 + 8 = 128.
On a large memory system or SMP, the defaults of 120 and 128 are a very small fraction of the real memory available. If memory demand continues after the minfree value is reached, processes may be suspended or killed. Once the number of free pages is equal to or greater than maxfree, the algorithm stops freeing pages, and with thresholds this small there will be insufficient free pages, relative to total system memory, to satisfy demand.
Controlling Memory Selections
[Diagram: memory on a 0-100% scale split between computational pages and file pages, with the minperm and maxperm thresholds (around 50% in the illustration) and numperm marking the current percentage of real memory occupied by file pages relative to maxperm.]
vmo & ioo vs. vmtune at AIX V5.2
• With the AIX V5.2 release, vmtune was replaced by the tuning commands vmo and ioo, and schedtune was replaced by schedo. All of these commands (along with no and nfso) support retaining tuning parameters in the /etc/tunables files.
• Although vmtune and schedtune can still be run, the appropriate vmo, ioo, or schedo command should be used. vmtune and schedtune remain available for backward compatibility but have limited functionality.
vmtune - maxfree & minfree
• Set minfree = 120 * # of CPUs * # of memory pools (some recommend a starting value of (120 + 4) * # of memory pools)
• Set maxfree = minfree + maxpgahead * # of CPUs
The number of memory pools can be determined with:
• vmtune -a (pre-5.2)
• vmstat -v (AIX 5.2)
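A worked sketch of the formulas above under assumed values (4 CPUs, 2 memory pools, maxpgahead = 8 - all illustrative):

   # minfree = 120 * 4 * 2 = 960; maxfree = 960 + (8 * 4) = 992
   vmo -o minfree=960 -o maxfree=992            # AIX 5.2
   /usr/samples/kernel/vmtune -f 960 -F 992     # pre-5.2 equivalent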
Using nmon & nmon_analyzer
nmon and nmon_analyzer are free tools from IBM that are useful for displaying and analyzing AIX performance.
The nmon tool is similar to topas, which displays real-time AIX performance statistics, but nmon presents more information and can also capture data for later analysis and presentation.
The nmon_analyzer tool analyzes the captured performance data and can create a spreadsheet showing graphs of performance trends.
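A typical capture run (interval and snapshot count are arbitrary here):

   nmon -f -s 60 -c 120    # -f: record to a .nmon file; 60-second intervals, 120 snapshots (2 hours)
   # load the resulting hostname_date_time.nmon file into the nmon_analyzer spreadsheet to graph it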
NMON Analyzer performs analyses of the nmon data to produce the following:
• Calculation of weighted averages for hot-spot analysis
• Distribution of CPU utilization by processor over the collection interval - useful in identifying single-threaded processes
• Additional sections for ESS vpaths showing device busy, read transfer size, and write transfer size by time of day
• Total system data rate by time of day, adjusted to exclude double-counting of EMC hdiskpower devices - useful in identifying I/O subsystem and SAN bottlenecks
• Separate sheets for EMC hdiskpower and FAStT dac devices
• Analysis of memory utilization to show the split between computational and non-computational pages
• Total data rates for each network adapter by time of day
• Summary data for the TOP section showing average CPU and memory utilization for each command
Examining numperm/minperm/maxperm with nmon
Additional information in nmon_analyzer
Initial Tuning - vmtune or vmo
If the load on the system is relatively unknown, the values below can be considered a starting point:
• minperm = 15
• maxperm = 60
• numfsbufs = 186
• minfree = (120 + 4) * # of memory pools (the number of memory pools is found with the vmstat -v command)
• maxfree = minfree + (maxpgahead (or j2maxpgahead) * # of memory pools)
• hd_pbuf_cnt = (# of disks attached to the server (physical or LUNs) + 4) * 120
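A sketch of applying these starting values with AIX 5.2 commands, assuming 2 memory pools and maxpgahead = 8 (both illustrative):

   vmo -o minperm%=15 -o maxperm%=60
   ioo -o numfsbufs=186                  # takes effect only for filesystems mounted afterwards
   vmo -o minfree=248 -o maxfree=264     # (120+4)*2 = 248; 248 + 8*2 = 264
   # hd_pbuf_cnt: vmtune -B on pre-5.2 systems; on 5.2, check ioo -a for the pbuf tunable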
I/O Tuning Parameters
• numfsbufs (vmtune -b) specifies the number of file system buffer structures. This value is critical, as VMM will put a process on the wait list if there are insufficient free buffer structures.
• Run vmtune -a (pre-5.2) or vmstat -v (5.2 and later) and monitor fsbufwaitcnt, which is incremented each time an I/O operation has to wait for file system buffer structures.
• A general technique is to double the numfsbufs value (up to a maximum of 512) until fsbufwaitcnt no longer increases. Because the setting only affects filesystems mounted after it is changed, it should be re-executed at boot prior to any mount all command.
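A monitoring-and-doubling sketch on AIX 5.2 (the starting value of 186 comes from the previous slide; the doubled value is illustrative):

   vmstat -v | grep 'filesystem I/Os blocked with no fsbuf'   # sample this under peak load
   ioo -o numfsbufs=372                                       # double it; remount the filesystems to pick it up
   # repeat the check: if the blocked count still climbs, double again (staying at or below 512)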
I/O Tuning Parameters (cont.)
• hd_pbuf_cnt (vmtune -B) determines the number of pbufs assigned to LVM. pbufs are pinned memory buffers used to hold pending I/O requests.
• Again, examine vmtune -a and review the pbuf wait counter (hd_pendqblked; the psbufwaitcnt counter tracks paging-space buffers, not LVM pbufs). If it is increasing, multiply the current hd_pbuf_cnt by 2 until the counter stops incrementing.
• Because hd_pbuf_cnt can only be reduced via a reboot (this is pinned memory), be frugal when increasing this value.
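A check-then-double sketch (the counter name differs by release; the new value is illustrative):

   /usr/samples/kernel/vmtune -a | grep hd_pendqblked           # pre-5.2 pbuf wait counter
   # AIX 5.2: vmstat -v | grep 'pending disk I/Os blocked with no pbuf'
   /usr/samples/kernel/vmtune -B 1024                           # pinned memory - increase frugally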
I/O Tuning
• Over 35% I/O wait should be investigated.
• Oracle databases like async I/O; DB2 and Sybase do not care. A good place to start would be AIO parms of MINSERVERS = 80, MAXSERVERS = 200, MAXREQUESTS = 8192 (see the sketch below).
• Recent-technology disks will support higher LTG numbers.
• lvmstat (must be enabled prior to usage) provides detailed information on I/O contention.
• filemon is an excellent I/O tool (trace-based - ensure you turn it off).
• Adjust numfsbufs and hd_pbuf_cnt to reduce the wait counts reported by vmtune -a or vmstat -v.
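A sketch of those starting AIO parms plus lvmstat enablement (the volume group name is a placeholder; the aio0 attribute corresponding to MAXREQUESTS is maxreqs):

   chdev -l aio0 -a minservers=80 -a maxservers=200 -a maxreqs=8192
   lvmstat -e -v datavg     # enable lvmstat collection for volume group datavg
   lvmstat -v datavg 5 3    # then take three 5-second samples of per-LV activity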
VMSTAT AIX 5
# vmstat -I -t 1 10
kthr     memory            page                      faults       cpu         time
----- ------------ ------------------------ ------------ ----------- --------
 r  b  p   avm   fre  fi fo pi po fr sr  in  sy  cs us sy id wa hr mi se
 0  0  0 35169 98866   0  0  0  0  0 16 118 231  30  0  1 99  0 12:41:52
 0  1  0 35169 98863   0  0  0  0  0  0 222 100  27  0  0 99  0 12:41:53
 1  1  0 35169 98863   5  0  0  0  0  0 229  88  38  2  0 91  7 12:41:54
 1  1  0 35169 98863   6  0  0  0  0  0 218  58  26  4  5 91  0 12:41:55
 1  1  0 35169 98863   7  0  0  0  0  0 227  58  30  6  0 94  0 12:41:56
 0  1  0 35169 98863   4  0  0  0  0  0 236  72  34  0  0 99  0 12:41:57
 0  1  0 35169 98863   0  9  0  0  0  0 223  72  34  0  0 99  0 12:41:58
 0  1  2 35169 98863  20  7  0  0  0  0 221  60  28  1  0 89 10 12:41:59
 0  1  2 35169 98863  18  4  0  0  0  0 213  58  30  1  5 84 10 12:42:00
 0  1  0 35169 98863   0  0  0  0  0  0 221  72  34  0  0 99  0 12:42:01
VMSTAT AIX 5
/@test1 $ vmstat hdisk0 hdisk1 1 10
kthr     memory            page                       faults        cpu      disk xfer
----- ------------ ------------------------ ------------ ----------- -----------
 r  b   avm    fre  re pi po fr sr cy  in   sy   cs us sy id wa  1  2  3  4
 1  1 51459 110720   0  0  0  0  0  0 208 2484 1177 26 10 64  0  0  0
 3  0 51465 110714   0  0  0  0  0  0 303 5371 1609 26 11 64  0  0  0
 1  0 51465 110714   0  0  0  0  0  0 300 5502 1725 27  8 65  0  0  0
 3  0 51465 110714   0  0  0  0  0  0 305 5273 1613 27  9 64  0  0  0
 1  0 51466 110713   0  0  0  0  0  0 310 5330 1654 21 15 65  0  0  0
 1  1 51467 110712   0  0  0  0  0  0 308 5341 1643 28  7 65  0 10  0
 1  1 51467 110712   0  0  0  0  0  0 313 5392 1665 28 10 62  0  5  4
 1  1 51467 110712   0  0  0  0  0  0 308 5421 1677 23 13 64  0  0  8
 2  0 51467 110712   0  0  0  0  0  0 308 5271 1635 27  8 66  0  0  0
 1  0 51467 110712   0  0  0  0  0  0 302 5432 1697 29  9 62  0  0  0

The disk xfer columns show the number of transfers per second to the specified physical volumes during the sample interval. One to four physical volume names can be specified, and transfer statistics are given for each specified drive in the order specified. The count represents requests to the physical device; it does not imply an amount of data read or written, since several logical requests can be combined into one physical request.
A look at the changes at AIX 5.2 and beyond
AIX 5.2 Performance Tools - What's Changed?
Performance management and debugging tools:
• Template-based AIX performance tuning via a stanza-based file, /etc/tunables
  - Supports no, nfso, schedo (schedtune), and vmo (vmtune)
  - Supports persistent values for no and nfso across reboot
  - File can be exported and imported to multiple servers
• Consolidated access to performance tuning values in SMIT and Web-based System Manager
• Perf toolbox and iostat support for ESS vpaths
• Xprofiler (GUI-based profiling tool) included in AIX base
• Performance tools support for LPAR, large pages, and memory affinity
• New thread analysis tools: CURT and SPLAT
• tprof enhancements: support for emulation and alignment interrupts, improved threads support, multiple process profiling
• kdb enhancements for crash, lldb functions
Command consolidation: vmtune becomes ioo & vmo; schedtune becomes schedo; no and nfso keep their names.
• Command consistency
• Options for display or change
• Ability to control changes now, at next boot, or both
• Ability to return to defaults, check consistency, save or propagate
• Commands supported from SMIT or WSM
Old vs. New
Command            AIX 5.1     AIX 5.2
VMM Tuning         vmtune      vmo & ioo
I/O Tuning         vmtune      ioo
Scheduler Tuning   schedtune   schedo
Network Tuning     no          no (new syntax), no -a
NFS Tuning         nfso        nfso (new syntax), nfso -a
AIX 5.2/5.3 Tuning: /etc/tunables
• tuncheck - checks ranges and dependencies, runs bosboot if required
• tunsave - saves current values to a file (optionally nextboot)
• tunrestore - restores from a file (now or at reboot)
• tundefault - restores the default values
Benefits: promotes reusability; flags are now consistent; automatic saving of parameters; callable from SMIT/WSM.
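A save-validate-propagate sketch (the stanza file name is arbitrary):

   tunsave -f /etc/tunables/mytuning      # snapshot the current tunables to a stanza file
   tuncheck -f /etc/tunables/mytuning     # verify ranges and dependencies
   tunrestore -f /etc/tunables/mytuning   # apply now; copy the file to other servers and tunrestore there to propagate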
Common flags for vmo, ioo, schedo, no, and nfso include -a (show all values), -o tunable[=value] (show or set one tunable), -d and -D (reset one or all tunables to defaults), -p (apply now and persist across reboot), -r (apply at the next reboot), and -L (list tunables with ranges and dependencies).
General Database Goodness
• Async I/O
• Buffering vs. AIX filesystems
• Controlling logs
  - Size to avoid constant switches
  - Watch placement on volumes
• Use the correct tuning parameters
• Set system parameters
  - Number of processes per user
lparstat - monitoring mode
Metrics displayed:
• CPU utilization (%user, %sys, %idle, %wait)
• Percentage of time spent in the hypervisor (%hypv) and number of hcalls (hcalls) [both optional]
• Additional shared-mode-only metrics:
  - Physical Processor Consumed (physc)
  - Percentage of Entitlement Consumed (%entc)
  - Logical CPU Utilization (%lbusy)
  - Available Pool Processors (app)
  - Number of virtual context switches (vcsw), i.e., virtual processor hardware preemptions
  - Number of phantom interrupts (phint), i.e., interrupts received for other partitions

Example:
# lparstat 5 10

System configuration: type=Shared mode=Capped smt=On lcpu=2 mem=2048 psize=1.0 ent=0.50

%user %sys %wait %idle physc %entc lbusy  app  vcsw phint
----- ---- ----- ----- ----- ----- ------ ---  ---- -----
  4.8  1.2   0.0  94.0  0.04   7.0    1.7 0.9  1378     0
 21.8  1.8   0.0  76.4  0.13  26.3   13.3 0.8  1580     0
 31.2  2.2   0.0  66.5  0.19  37.0   16.1 0.8  1461     0
 84.9  5.4   0.0   9.7  0.50  99.4   48.7 0.5  1472     0
 85.1  5.4   0.0   9.5  0.50  99.5   48.1 0.5  1477     0
 77.1  4.9   0.0  18.0  0.45  90.2   44.5 0.4  1546     0
  2.9  6.2   0.0  90.9  0.06  11.4    2.2 0.9  1425     1
  4.8 13.6   0.0  81.6  0.11  22.6   10.0 0.8  1810     0
  4.4 12.3   0.0  83.3  0.10  20.5   10.5 0.9  1773     1
lparstat has 3 modes: information mode (-i) shows static configuration information; detailed hypervisor mode (-H) gives a breakdown of hypervisor time by hcall type; monitoring mode is the default.

topas - main screen update
• New CPU-section metrics on physical processing resources consumed:
  - Physc: amount consumed, in fractional numbers of processors
  - %Entc: amount consumed, as a percentage of entitlement
[Sample topas screen for host specweb8 showing the new "Physc = 0.01" and "%Entc = 1.2" fields under the CPU bars, alongside the usual EVENTS/QUEUES, FILE/TTY, Network, Disk, process, PAGING, MEMORY, NFS, and PAGING SPACE sections.]
• New metrics are added automatically when running in shared mode.
• CPU utilization metrics are automatically calculated using new PURR-based data and formulas when running in SMT or shared mode.
Filesystem buffers: insufficient buffers will degrade I/O performance. The default AIX setting for these buffers is typically too low for database servers. JFS and JFS2 are tuned separately: JFS uses vmtune -b, while JFS2 uses vmtune -Z.
Be careful not to set the value too high if running a 32-bit kernel with a large number of filesystems (50+). The buffer setting is per filesystem, and you can run out of kernel memory if it is set too high. (This does not apply to the 64-bit kernel, which supports larger kernel memory sizes.)
Tune the filesystem buffers when the system is under peak load. Run the following command multiple times:
AIX 5.1: /usr/samples/kernel/vmtune -a | grep fsbufwaitcnt
AIX 5.2: vmstat -v | grep "filesystem I/Os blocked with no fsbuf"
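A sketch of watching the counter while raising the JFS2 buffer count (the ioo tunable name corresponds to vmtune -Z at 5.2; the value is illustrative):

   # JFS counter: 'filesystem I/Os blocked with no fsbuf'
   # JFS2 counter: 'external pager filesystem I/Os blocked with no fsbuf'
   vmstat -v | grep 'blocked with no fsbuf'       # run repeatedly under peak load
   ioo -o j2_nBufferPerPagerDevice=1024           # per filesystem, so raise cautiously on 32-bit kernels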
Disk layout: the most important I/O tuning step is to spread all of the data over all of the physical disk drives.* If you have a SAN, work closely with the SAN administrator to understand the logical-to-physical disk layout; in a SAN, two or more hdisks may reside on the same physical disk. (*The one exception is when you back up to disk: be sure the backup disks are on a separate storage system to avoid having a single point of failure.)
Queue depth for fibre channel adapter: this setting depends on the storage vendor. For IBM Shark storage, I set this around 100. If using non-IBM storage, check with your vendor for their queue depth recommendation; high queue depth settings have been known to cause data corruption on some non-IBM storage. If unsure, use the default value. (See the sketch below.)
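A hedged sketch for inspecting and staging a queue depth change (the hdisk number and value are placeholders; confirm the value with your storage vendor first):

   lsattr -El hdisk4 -a queue_depth        # current value
   chdev -l hdisk4 -a queue_depth=100 -P   # -P records the change in the ODM; it takes effect at the next reboot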
Asynch I/O: improves write performance for JFS and JFS2 file systems; it does not apply to raw partitions. AIO is implemented differently in AIX 5.1 and 5.2: in 5.1 the min/max AIO settings are for the entire system, while in 5.2 the AIO settings are per CPU.
In AIX 5.1, I set "max server" to 1000; on a 5.2 system, divide 1000 by the number of CPUs. I tend to over-configure AIO, as changing it requires a reboot. Over-configuring "max server" doesn't use any extra resources, as AIO servers are only created when needed; "max server" just sets the maximum, not the actual number used. If you plan to use DLPAR and dynamically add CPUs, contact Supportline to discuss the implications.
Change / Show Characteristics of Operating System
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Maximum number of PROCESSES allowed per user [10000] ***
Maximum number of pages in block I/O BUFFER CACHE [20]
Maximum Kbytes of real memory allowed for MBUFS [0]
Automatically REBOOT system after a crash false
Continuously maintain DISK I/O history true ***
HIGH water mark for pending write I/Os per file [0]
LOW water mark for pending write I/Os per file [0]
Amount of usable physical memory in Kbytes 524288
State of system keylock at boot time normal
Enable full CORE dump false
Use pre-430 style CORE dump false
CPU Guard enable ***
ARG/ENV list size in 4K byte blocks [6]
Default max processes per user (128) too low
Continuously maintain disk I/O - sar and iostat to record disk
CPU Guard - CPU Deallocation
Change / Show Characteristics of an Ethernet Adapter
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Ethernet Adapter ent0
Description IBM 10/100 Mbps Ethern
Status Available
Location 10-60
TRANSMIT queue size [8192]
HARDWARE RECEIVE queue size [256]
RECEIVE buffer pool size [384]
Media Speed 10/100, full-duplex ****
Inter-Packet Gap [96]
Enable ALTERNATE ETHERNET address no
ALTERNATE ETHERNET address [0x000000000000]
Enable Link Polling no
Time interval for Link Polling [500]
Apply change to DATABASE only no
Avoid use of autonegotiate…
General Oracle Tuning Tips
• Enable async I/O (Oracle: init.ora, through the SMIT menu).
• Read tips on the number of AIO servers enabled (maxservers).
• Setting minservers higher will ensure that these servers run at a favored priority and with round-robin policy; otherwise, servers created after minservers (up to maxservers) run with whatever priority and scheduling policy the process that issued the AIO (e.g., Oracle) has.
• Set tcp.nodelay = true in the protocol.ora file - very important if using large MTUs (e.g., SP systems). (See the fragment below.)
• You may want to run fixed priority for all Oracle processes and increase the timeslice with schedtune.
• dbwriter and logwriter may be run at more favored priorities.
• Size Oracle log files to avoid frequent log switches.
• Use raw LVs for your db files rather than filesystem files if the database size is much larger than the real memory size. If you have to use filesystem files, then use a lot of files - you are asking for trouble if you put all of your tables into a single file.
• Separate your Oracle log files from all other files (i.e., do not mix Oracle sequential I/O with any random I/O).
• timed_statistics can be turned off for better performance.
• Check that any non-essential traces are off: e.g., SQL trace, SQL*Net trace, or Oracle Otrace.
• Increase the Oracle spincount parameter on faster hardware.
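A minimal protocol.ora fragment for the tcp.nodelay tip above (the path is the usual Oracle Net location, stated here as an assumption; check your Oracle home):

   # $ORACLE_HOME/network/admin/protocol.ora
   tcp.nodelay = true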
pSeries/AIX SAP hints
• Configuration of the aio0 device
  - Symptom of a problem is slow application I/O but fast I/O at the AIX PV level
  - Rule of thumb for starters (see the sketch below):
    • maxservers = 125% of the datafile (container) count
    • minservers = 1/2 maxservers
• Filesystem buffers - numfsbufs
  - Symptom is consistent increases in:
    • vmstat -v "filesystem I/Os blocked with no fsbuf" (AIX 5.2)
    • vmtune -a "fsbufwaitcnt" (AIX 4.3.3)
  - Increase using ioo on AIX 5.2 (or vmtune with 5.1)
• JFS i-node serialization with change-intensive workloads
  - Symptom is slow I/O at the application, fast at the filemon PV level
    • Perfpmr traces to confirm
  - The JFS2 cio option removes i-node serialization
  - With JFS, limit the size of datafiles (containers) on change-intensive tablespaces to 2 GB.
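A worked sketch of the aio0 rule of thumb, assuming 200 datafile containers (a purely illustrative count):

   # maxservers = 125% of 200 = 250; minservers = 250 / 2 = 125
   chdev -l aio0 -a maxservers=250 -a minservers=125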
General Backup Goodness
• Large TCP/IP buffer sizes
• Watch concurrent access (disk thrashing)
• Remember TCP/IP parameters are inherited from the parent process
Redbook References
• Managing AIX Server Farms, SG24-6606-00
• AIX 5L Differences Guide, SG24-5765-02
• AIX Version 4.3 to 5L Migration Guide, SG24-6924-00
• AIX 5L Performance Tools Handbook, SG24-6039-00
• Database Performance Tuning on AIX, SG24-5511
• Understanding IBM eServer pSeries Performance and Sizing, SG24-4810
Other References
• Performance Management Guide: http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base/aix52.htm
• AIX 5L Performance Tools Handbook, SG24-6039-00
• Direct I/O: http://www-106.ibm.com/developerworks/eserver/articles/DirectIO.html
• Concurrent I/O: http://www-1.ibm.com/servers/aix/whitepapers/db_perf_aix.pdf
SUMMARY
• Each environment will have different challenges.
• Rules of thumb are just that: suggestions.
• If you don't know what your performance was BEFORE you made the change, you won't know what effect you had on performance.
• Carefully define ALL the boundaries that you must operate under. Best-case throughput is always controlled by the slowest common denominator!