1
OpenVMS “Marvel” EV7 Proof Points of Olympic Proportions
Tech Update - Sept 2003
2
OpenVMS Marvel EV7 Proof Points of Olympic Proportions
• Live, mission-critical, production systems
• Multi-dimensional
• Before and after comparisons
• Upgrades from GS140 & GS160 to GS1280
• Proof of impact on maximum headroom
3
Can your enterprise benefit from an upgrade to a GS1280?
• Systems with high MPsynch
• Systems with high primary CPU interrupt load
• Poor SMP scaling
• Heavy locking
• Heavy IO: Direct, Buffered, Mailbox
• Heavy use of Oracle, TCPIP, Multinet
• Look closer if:
– Systems with poor response time
– Systems with insufficient peak period throughput
(See the sketch below for one way to screen your T4 data for these symptoms.)
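One way to apply this checklist to your own data is sketched below: a small script that counts how many T4 sample intervals exceed symptom thresholds. This is our own illustrative sketch, not part of the T4 kit; the file name, column names, and thresholds are hypothetical examples.

```python
# Hedged sketch (ours, not part of the T4 kit): screen a T4-style interval CSV for the
# symptoms listed above. File name, column names, and thresholds are hypothetical.
import csv

THRESHOLDS = {
    "Mp Synch": 30.0,            # percent of one CPU spent in MPsynch
    "Cpu 00 Inter mode": 50.0,   # percent of CPU 0 spent in interrupt mode
    "Compute": 5.0,              # average compute-queue depth
}

flags = {name: 0 for name in THRESHOLDS}
with open("my_system_t4.csv") as f:          # hypothetical export of T4 interval data
    for row in csv.DictReader(f):
        for name, limit in THRESHOLDS.items():
            if float(row.get(name, 0) or 0) > limit:
                flags[name] += 1

for name, count in flags.items():
    print(f"{name}: {count} intervals above {THRESHOLDS[name]}")
```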
4
T4 - Data Sources
• Data for these comparisons was collected using the internally developed T4 (tabular timeline tracking tool) suite of coordinated collection utilities and analyzed with TLViz
• The T4 kit & TLViz have consistently proved themselves invaluable for this kind of before and after comparison project. We have now made T4 publicly available for download (will ship with OpenVMS 7.3-2 in SYS$ETC:)
• T4 could be a useful adjunct to your performance management program.
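To give a feel for what the before and after comparison looks like in practice, the sketch below overlays one metric from two T4-style CSV timelines, similar in spirit to the side-by-side view TLViz provides interactively. It is our own generic example; the file names and column names are hypothetical.

```python
# Hedged sketch: overlay one metric from two T4-style interval CSVs (before vs. after
# an upgrade). File and column names are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

before = pd.read_csv("gs140_before.csv")     # hypothetical T4 CSV from before the upgrade
after = pd.read_csv("gs1280_after.csv")      # hypothetical T4 CSV from after the upgrade

metric = "Cpu Busy"   # any column both files share: compute queue, MP synch, direct I/O rate...
plt.plot(range(len(before)), before[metric], label="GS140 (before)")
plt.plot(range(len(after)), after[metric], label="GS1280 (after)")
plt.xlabel("sample interval")
plt.ylabel(metric)
plt.legend()
plt.show()
```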
5
Would you like to participate in our Marvel Proof Point Program?
• Contact [email protected] for more information about how you can take part
• Download T4 kit from public web site: http://h71000.www7.hp.com/OpenVMS/products/t4/index.html
• Start creating a compact, portable T4 based performance history of your most important systems
• The T4 data will create a common and efficient language for our discussions. We can then work with you and help you evaluate your unique pattern of use and the degree to which the advantages of Marvel EV7 on OpenVMS can most benefit you.
6
Want even more detail?
• The electronic version of this presentation contains extensive captions and notes on each slide for your further study, reflection, and review.
7
Case 1 – Production System
12P GS140 700 MHz vs. 16P GS1280 1.15 GHz
Tremendous Gains in Headroom for an Oracle Database Server with Multinet
8
Compute Queue Completely Evaporates with GS1280
[Chart: MON.STAT Compute queue, node HNAM, 08:00–12:00 (3-Mar-2003), y-axis 0–55]
Green is GS140 at 700 MHz with 12 CPUs
Red is GS1280 at 1.15 GHz with 16 CPUs
Peak queues of 57 drop to queues of 1 or 2
9
CPU 0 Idle Time
[Chart: MON.MODES CPU 00 Idle, node HNAM, 08:00–12:00 (3-Mar-2003), y-axis 0–90%]
Green is GS140 at 700 MHz with 12 CPUs
Red is GS1280 at 1.15 GHz with 16 CPUs
With the GS1280, there is 73% spare CPU 0 capacity during the absolute peak. With the GS140, CPU 0 is completely consumed during peaks (e.g., at 11 AM)
10
Almost 4 to 1 reduction in CPU Busy with GS1280
[Chart: MON.SYST CPU Busy, node HNAM, 08:00–12:00 (3-Mar-2003), y-axis 0–1,150%]
Green is GS140 at 700 MHz with 12 CPUs
Red is GS1280 at 1.15 GHz with 16 CPUs
The GS140 is nearly maxed out at more than 1,150% busy of a possible 1,200%, while the GS1280 is cruising along at 250% to 350% busy of a possible 1,600%
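A back-of-the-envelope sketch of the arithmetic behind the "almost 4 to 1" figure and the headroom claim, using the busy percentages read off the chart above; the assumption that CPU busy is the limiting resource is ours.

```python
# Back-of-the-envelope headroom arithmetic for Case 1, using numbers read off the chart.
gs140_capacity = 12 * 100      # 12 CPUs -> 1,200% maximum CPU busy
gs1280_capacity = 16 * 100     # 16 CPUs -> 1,600% maximum CPU busy
gs140_peak_busy = 1150         # percent, nearly maxed out
gs1280_peak_busy = 300         # percent, midpoint of the 250-350 range for slightly more work

reduction = gs140_peak_busy / gs1280_peak_busy
print(f"CPU busy reduction: about {reduction:.1f} to 1")         # ~3.8, i.e. almost 4 to 1

# Crude CPU-only headroom: how many multiples of today's load would saturate the GS1280?
headroom = gs1280_capacity / gs1280_peak_busy
print(f"GS1280 CPU headroom: roughly {headroom:.1f}x the current load")   # ~5.3x
```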
11
Direct IO (includes network traffic)
[Chart: MON.SYST Direct I/O Rate, node HNAM, 08:00–12:00 (3-Mar-2003), y-axis 0–11,500/sec]
Green is GS140 at 700 MHz with 12 CPUs
Red is GS1280 at 1.15 GHz with 16 CPUs
The GS1280 is able to push to higher peaks when the load gets heavy, while still having huge spare capacity for more work. The GS140 is close to maxed out at 10,000 direct IOs per second
12
MPsynch
[Chart: MON.MODE MP Synch, node HNAM, 08:00–12:00 (3-Mar-2003), y-axis 0–105%]
Green is GS140 at 700 MHz with 12 CPUs
Red is GS1280 at 1.15 GHz with 16 CPUs
MPsynch drops from 90% to under 10%, leaving plenty of room for further scaling
13
Packets Per Second Sent – a key throughput metric. Estimated actual maximum rate for the GS1280: more than 20,000/sec
[Chart: NET EWA0:/EWC0: Packets Sent/Sec, node HNAM, 08:00–12:00 (3-Mar-2003), y-axis 0–6,000/sec]
Blue is GS140 at 700 MHz with 12 CPUs
Red is GS1280 at 1.15 GHz with 16 CPUs
The GS140 maxes out at about 5,000 packets per second with little or no spare capacity. The GS1280 reaches 6,000 with substantial spare capacity
14
Case 1 Summary: 12P GS140 to 16P GS1280
• GS1280 delivers an estimated increase in headroom of at least 4X
• Eliminates CPU 0 bottleneck
• Drastically cuts MPsynch
• Able to handle higher peaks as they arrive
• Almost 4 to 1 reduction in CPU use while doing slightly more work
15
16
Case 2 – Production System
10P GS140 700 MHz vs. 8P GS1280 1.15 GHz
Tremendous Gains in Headroom for an Oracle Database Server despite reduced CPU count
Poised to Scale
17
Compute Queue Completely Evaporates with GS1280 and the current workload demand
[Chart: MON.STAT Compute queue, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–32]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
Peak queues of 32 drop to 3
18
CPU 0 Idle Time
[Chart: MON.MODES CPU 00 Idle, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–90%]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
With the GS1280, there is 69% spare CPU 0 capacity during the absolute peak with this workload. With the GS140, CPU 0 is completely consumed during peaks (e.g., at 10:30 for many minutes at a time)
19
More than 3 to 1 reduction in CPU Busy with GS1280
[Chart: MON.SYST CPU Busy, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–950%]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
The GS140 is completely maxed out at more than 1,000% busy, while the GS1280 is cruising along at 200% to 350% busy of a possible 800%
20
Direct IO (includes network traffic)
[Chart: MON.SYST Direct I/O Rate, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–10,500/sec]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
The GS1280 is able to push to higher peaks of 10,500 when the load temporarily gets heavier, while still having huge spare capacity for more work (approximately 5 CPUs). The 10P GS140 is maxed out at slightly over 8,000 direct IOs per second.
21
MPsynch (more than a 9 to 1 reduction with this workload)
[Chart: MON.MODE MP Synch, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–65%]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
MPsynch drops from peaks of 67% to peaks of only 7%, leaving plenty of room for further scaling
22
Packets Per Second Sent – a key throughput metric. Estimated actual maximum rate for the 8P GS1280: more than 11,000/sec; with 16P this would rise to 20,000/sec
[Chart: NET EWA0: Packets Sent/Sec, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–4,800/sec]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
The 10P GS140 maxes out at about 4,200 packets per second with no spare capacity. The 8P GS1280 reaches 4,800 with more than 4.5 CPUs to spare
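One plausible way to arrive at estimates like these from the figures on this slide is sketched below; the assumption that packet throughput scales with the CPU available to the workload is ours, not a measured result.

```python
# Rough extrapolation of maximum packet rate for Case 2 (numbers from the slide above).
# Assumption (ours): packets/sec scales roughly with the CPU the workload can consume.
observed_pkts_per_sec = 4_800          # 8P GS1280 during the measured peak
cpus_total = 8
cpus_spare = 4.5                       # "more than 4.5 CPUs to spare"
cpus_used = cpus_total - cpus_spare    # ~3.5 CPUs doing the observed work

pkts_per_cpu = observed_pkts_per_sec / cpus_used
print(f"Estimated max on 8P:  ~{pkts_per_cpu * 8:,.0f} packets/sec")    # ~11,000
print(f"Estimated max on 16P: ~{pkts_per_cpu * 16:,.0f} packets/sec")   # ~22,000
```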
23
CPU 0 interrupt – well poised for scaling to 8, 12, and even more CPUs with the GS1280
[Chart: MON.MODES CPU 00 Interrupt mode, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–55%]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
During peak periods, despite the fact that the 8P GS1280 is doing slightly more work, it uses a factor of 3.5X less CPU 0 time for interrupt activity. At peaks of only 20%, the GS1280 stands ready to handle substantially higher workloads
24
Disk operations rate – shows the same head and shoulders pattern as direct IO and packets per second
[Chart: MON.DISK Operation Rate, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–2,800/sec]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
During peak periods, the 10P GS140 maxes out at 2,200 disk operations per second. With this workload, the 8P GS1280 is able to reach 2,900 per second with lots of room to spare. As the load demand on the GS1280 increases, this 8P model looks capable of driving the disk op rate to 6,000/sec
25
Interrupt load during peak periods drops by a factor of almost 5 to 1, from 240% to 50%
[Chart: MON.MODE Interrupt State, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–240%]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
This is another excellent sign of the potential future scalability of this GS1280 to 8 CPUs, 12 CPUs, and beyond.
26
Microseconds of CPU per Direct IO
[Chart: microseconds of CPU per direct IO (derived T4 statistic), nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–1,300]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
Normalized statistics like this show that the relative power of each GS1280 CPU at 1.15 GHz is between 3 and 4 times that of the GS140's 700 MHz CPUs
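The sketch below shows how a normalized statistic like "microseconds of CPU per direct IO" can be derived from raw per-interval counters. It is our own hedged example; the file and column names are hypothetical and may not match the actual T4 CSV layout.

```python
# Minimal sketch of deriving "microseconds of CPU per direct IO" from T4-style counters.
# File and column names are hypothetical.
import csv

def usec_cpu_per_dirio(cpu_busy_percent: float, dirio_per_sec: float) -> float:
    """CPU busy is in percent-of-one-CPU units (100% = one fully busy CPU), so each
    1% of busy corresponds to 10,000 microseconds of CPU consumed per second."""
    if dirio_per_sec <= 0:
        return 0.0
    return cpu_busy_percent * 10_000 / dirio_per_sec

with open("t4_interval_data.csv") as f:                          # hypothetical file name
    for row in csv.DictReader(f):
        value = usec_cpu_per_dirio(float(row["Cpu Busy"]),        # hypothetical columns
                                   float(row["Direct I/O Rate"]))
        print(row["Sample Time"], f"{value:.0f} usec CPU per direct IO")
```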
27
Disk Reads Per Second
[Chart: XFC Read IOs/Sec, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–2,200/sec]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
This shows the same head and shoulders pattern, but even more pronounced than what we saw with network packets
28
Case 2 Summary: 10P GS140 to 8P GS1280
• GS1280 with fewer CPUs delivers an estimated headroom increase of more than 2X
• Eliminates CPU busy bottleneck
• Drastically cuts MPsynch
• Able to handle higher peaks as they arrive
• Well positioned to scale to 8, 12, or more CPUs and achieve headroom increases of 3.5X or even higher
29
30
Proof Point Patterns
• Dramatic cuts in MPsynch
• Large drops in Interrupt mode
• Higher, short-lived bursts of throughput
– Direct IO, packets per second, etc.
– The “HEAD and SHOULDERS” pattern (see the sketch below)
• Large increase in spare capacity and headroom
– Overall CPU, primary CPU
Where the workload stays relatively flat at the point of transition, the overall throughput numbers are not that different, but the shape of the new curve, with its sharp peaks, tells an important story
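A minimal sketch of how the "head and shoulders" signature could be checked numerically: compare mean versus peak throughput for the same metric before and after the upgrade. The sample values are illustrative only, not measured data.

```python
# Illustrative check for the "head and shoulders" pattern: after an upgrade the sustained
# (average) throughput may barely change while the short-lived peaks get noticeably taller.
from statistics import mean

def head_and_shoulders(before: list[float], after: list[float]) -> None:
    """Compare mean vs. peak throughput for two timeline samples of the same metric."""
    mean_ratio = mean(after) / mean(before)
    peak_ratio = max(after) / max(before)
    print(f"mean throughput ratio (after/before): {mean_ratio:.2f}")
    print(f"peak throughput ratio (after/before): {peak_ratio:.2f}")
    if peak_ratio > 1.2 and abs(mean_ratio - 1.0) < 0.2:
        print("-> head-and-shoulders signature: flat demand, but higher short-lived bursts")

# Illustrative direct-IO samples (per-second rates at successive intervals).
head_and_shoulders(
    before=[7800, 8000, 8100, 7900, 8000, 8050],
    after=[7900, 8200, 10400, 8100, 10500, 8000],
)
```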
31
32
Case 3 – Stress Test Marvel 32P – RMS1
• This case shows a segment of our RMS1 testing on the 32P Marvel EV7 @ 1.15 GHz
• Using Multiple 4 GB Ramdisks
• Started at 16P, ramped up workload
• Then increased to 24P, throughput dropped
• Then affinitized jobs, throughput jumped
• Combines timeline data from t4, spl, bmesh
33
Background to this test
• RMS1 is based on a customer-developed database benchmark test originally written using Rdb and converted to carry out the same task with RMS
• To generate extremely high rates of IO in order to discover the limits of Marvel 32P performance, we ran multiple copies of RMS1, each using its own dedicated RAMdisk
• Caution: The net effect is a test that generates an extremely heavy load, but that cannot be considered to mirror any typical production environment.
34
Timing of Changes
• 12:05 16 CPUs
• 12:11 Start ramp-up with 4 GB RAMdisks
• 12:30 Increase to 24 CPUs
• 12:38 Set process affinity
• 12:55 Turn off dedicated lock manager
<Observe how timelines help make sense of this complicated situation; one way to mark these events on a timeline is sketched below>
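As one way to make such a timeline self-explanatory, the sketch below marks the change events listed above on a throughput plot. This is our own illustration, not part of the T4 kit; the data file and column names are hypothetical.

```python
# Hedged sketch (ours): annotate a throughput timeline with the change events listed above
# so the cause of each shift in the curve is visible at a glance.
import matplotlib.pyplot as plt
import pandas as pd

events = {                       # times from the slide above (20-Feb-2003)
    "12:05": "16 CPUs",
    "12:11": "start ramp-up, 4 GB RAMdisks",
    "12:30": "increase to 24 CPUs",
    "12:38": "set process affinity",
    "12:55": "dedicated lock manager off",
}

df = pd.read_csv("prf31a_t4.csv", parse_dates=["Sample Time"])   # hypothetical file/columns
plt.plot(df["Sample Time"], df["Direct I/O Rate"], label="Direct I/O per second")

for hhmm, label in events.items():
    t = pd.Timestamp(f"2003-02-20 {hhmm}")
    plt.axvline(t, linestyle="--", alpha=0.5)
    plt.text(t, plt.ylim()[1] * 0.95, label, rotation=90, va="top", fontsize=8)

plt.legend()
plt.show()
```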
35
Direct IO up to 70,000 per second!
[Chart: MON.SYST Direct I/O Rate, node PRF31A, 12:20–12:50 (20-Feb-2003), y-axis 5,000–70,000/sec]
For the RMS1 workload, the rate of direct IO per second is a key metric of maximum throughput.
Increasing to 24 CPUs at 12:30 does not increase throughput.
Turning on affinity causes throughput to jump from 55,000 to over 70,000, an increase of approximately 30% (1.3X)
36
Kernel & MPsynch switch roles
12:30 is when we jumped from 16 CPUs to 24 CPUs. Note how MPsynch (green) jumps up substantially at that time to over 950%.
At 12:37, we started affinitizing the different processes to CPUs we believed to be close to where their associated RAMdisk was located.
Note how MPsynch and Kernel mode cross over at that point.
[Chart: MON.MODE Kernel Mode and MP Synch, node PRF31A, 12:15–12:55 (20-Feb-2003), y-axis 0–950%]
37
Lock Busy % from T4 shows jump with affinity
We had the dedicated lock manager turned on for this test, which creates a very heavy locking load.
Note that there is no change when the number of CPUs is increased at around 12:30.
Note the big jump in Lock % busy that happens when we affinitize.
At over 90% busy, locking is a clear primary bottleneck that will prevent further increases in throughput even with more CPUs.
[Chart: LCK73 Busy %, node PRF31A, 12:15–12:55 (20-Feb-2003), y-axis 0–90%]
38
Lock requests per second vs. XFC writes per second – A True Linear Relationship
Scatter Diagram for Data Collection on node PRF31A between 20-FEB-2003 12:10:49.12 and 20-FEB-2003 12:55:04.05
[Scatter plot: x-axis [XFC]Write IOs/Sec, 0–12,000; y-axis [LCK73]Req Count/Sec, 0–450,000]
The maximum rate of lock requests is an astounding 450,000 per second.
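A minimal sketch of checking the linear relationship shown in this scatter diagram numerically, by fitting lock requests per second against XFC writes per second. The file and column names are hypothetical stand-ins for the T4 data.

```python
# Hedged sketch: quantify the linear relationship between lock requests/sec and XFC writes/sec.
# File and column names are hypothetical.
import numpy as np
import pandas as pd

df = pd.read_csv("prf31a_t4.csv")                     # hypothetical file
x = df["XFC Write IOs/Sec"].to_numpy(dtype=float)     # hypothetical column names
y = df["LCK73 Req Count/Sec"].to_numpy(dtype=float)

slope, intercept = np.polyfit(x, y, 1)                # least-squares straight line
r = np.corrcoef(x, y)[0, 1]                           # Pearson correlation coefficient

print(f"lock requests ≈ {slope:.1f} * XFC writes + {intercept:.0f}")
print(f"correlation coefficient: {r:.3f}")            # ~1.0 indicates a true linear relationship
```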
39
Case 3 - SUMMARY
• These are by far the best throughput numbers we have ever seen on this workload for:
– Direct IO, lock requests per second
• Performance is great out of the box
• New tools simplify bottleneck identification
• Straightforward tuning pushes to even higher values with a surprisingly large upward jump
• Workloads show consistent ratios between key statistics (e.g., lock requests per direct IO)
• Spinlock-related bottlenecks remain with us, albeit at dramatically higher throughput levels
40
41
Case 4 – Production System
• Upgrade from 16 CPU Wildfire EV68 running at 1.224 GHz (the fastest Wildfire)
• Compared to 16 CPU Marvel EV7 running at 1.15 GHz
• Oracle, TCPIP, Mixed Database Server and Application Server
42
CPU Busy cut in half
Note Color Switch!!!
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
[Chart: MON_SYST CPU Busy, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 300–1,400%]
43
CPU 0 Interrupt is cut by a factor of more than 3 to 1
[Chart: MON_MODES CPU 00 Interrupt mode, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 15–85%]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
44
Buffered IO – sustained higher peaks
[Chart: MON_SYST Buffered I/O Rate, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 4,400–10,600/sec]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
45
Direct IO – sustained higher peaks
[Chart: MON_SYST Direct I/O Rate, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 500–2,500/sec]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
46
System Wide Interrupt diminished by a factor of 4 to 1
[Chart: MON_MODE Interrupt State, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 30–190%]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
47
MPsynch shrinks by more than 8 to 1
[Chart: MON_MODE MP Synch, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 10–290%]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
48
Kernel Mode decreases from 260 to 150
[Chart: MON_MODE Kernel Mode, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 140–310%]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
49
User Mode decreases from about 480 to 240
[Chart: MON_MODE User Mode, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 140–660%]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
50
Compute Queue disappears
[Chart: MON_STAT Compute queue, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 0–22]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
51
Packets per second – head and shoulders with higher peaks
[Chart: NET_MON EWA0: Packets Sent/Sec, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 2,400–4,700/sec]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
52
Mailbox Writes – head and shoulders with higher peaks
[Chart: MON_IO Mailbox Write Rate, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 1,000–3,100/sec]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
53
Dedicated Lock Manager Busy drops from 18% down to about 6%
[Chart: LCK73_MON Busy %, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 2–28%]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
54
Case 4 - SUMMARY
• The GS160 with 16 CPUs had been highly tuned, yet was unable to handle the heaviest peak loads presented
• The bottleneck was related to reaching maximum TCPIP throughput, the related MPsynch, and limits on the maximum buffered IO rate
• The GS1280 immediately, without further adjustment, provided a dramatic increase in maximum throughput and a huge improvement in spare capacity and headroom
55
56
Case 5 – Production System
• NOTE color switch in slides
• Upgrade 12P GS140 to 16P GS1280
• Mixed Application and Database Server
57
MPsynch almost disappears
Drops from 130% to under 10%
[Chart: MON.MODE MP Synch, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 10–210%]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
58
Kernel Mode shrinks by more than 5 to 1
[Chart: MON.MODE Kernel Mode, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 20–220%]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
59
System Wide Interrupt also shrinks by more than 5 to 1
[Chart: MON.MODE Interrupt State, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 10–120%]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
60
User Mode is cut in half
[Chart: MON.MODE User Mode, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 150–650%]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
61
CPU busy drops by almost 3 to 1
[Chart: MON.SYST CPU Busy, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 200–1,100%]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
62
CPU 0 Interrupt almost disappears, dropping by more than 6 to 1
[Chart: MON.MODES CPU 00 Interrupt mode, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 5–75%]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
63
Buffered IO – shows consistently higher peaks. There was a real backlog of work waiting to be serviced
[Chart: MON.SYST Buffered I/O Rate, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 2,500–9,500/sec]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
64
Direct IO shows substantially higher peaks, which are short-lived
[Chart: MON.SYST Direct I/O Rate, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 100–850/sec]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
65
Mailbox write increases from 1400 to over 2400
[Chart: MON.IO Mailbox Write Rate, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 1,000–4,400/sec]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
66
Compute Queue evaporates
[Chart: MON.STAT Compute queue, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 0–21]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
67
Case 5 - SUMMARY
• Huge backlog of work can now be handled successfully during long-lasting peak periods, as demonstrated by higher buffered IO and other throughput metrics
• Substantial further reserves of spare capacity
• Large changes in key performance metrics such as MPsynch, interrupt.
68
Proof Point Summary
• Marvel EV7 GS1280 systems are the best performing VMS systems ever.
• Excellent out-of-the-box performance
• Superior SMP scaling
• Huge increases in maximum throughput, some realized immediately, the rest held in reserve as spare capacity
• Marvel provides the headroom for future growth
69
Can your enterprise benefit from an upgrade to a GS1280?
• Systems with high MPsynch
• Systems with high primary CPU interrupt load
• Poor SMP scaling
• Heavy locking
• Heavy IO: Direct, Buffered, Mailbox
• Heavy use of Oracle, TCPIP, Multinet
• Look closer if:
– Systems with poor response time
– Systems with insufficient peak period throughput
70
Would you like to participate in our Marvel Proof Point Program?
• Contact [email protected] for more information about how you can take part
• Download T4 kit from public web site: http://h71000.www7.hp.com/OpenVMS/products/t4/index.html
• Start creating a compact, portable, T4 based performance history of your most important systems
• The T4 data will create a common and efficient language for our discussions. We can then work with you and help you evaluate your unique pattern of use and the degree to which the advantages of Marvel EV7 on OpenVMS can most benefit you.
71