1
OpenVMS “Marvel” EV7 Proof Points of Olympic Proportions
Tech Update - Sept 2003
2
OpenVMS Marvel EV7 Proof Points of Olympic Proportions
• Live, mission-critical, production systems
• Multi-dimensional
• Before and after comparisons
• Upgrades from GS140 & GS160 to GS1280
• Proof of impact on maximum headroom
3
Can your enterprise benefit from an upgrade to a GS1280?
• Systems with high MPsynch
• Systems with high primary CPU interrupt load
• Poor SMP scaling
• Heavy locking
• Heavy IO: Direct, Buffered, Mailbox
• Heavy use of Oracle, TCPIP, Multinet
• Look closer if:
– Systems with poor response time
– Systems with insufficient peak period throughput
(See the sketch below for one way to screen your T4 data for these symptoms.)
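One way to apply this checklist to your own data is sketched below: a small script that counts how many T4 sample intervals exceed symptom thresholds. This is our own illustrative sketch, not part of the T4 kit; the file name, column names, and thresholds are hypothetical examples.

```python
# Hedged sketch (ours, not part of the T4 kit): screen a T4-style interval CSV for the
# symptoms listed above. File name, column names, and thresholds are hypothetical.
import csv

THRESHOLDS = {
    "Mp Synch": 30.0,            # percent of one CPU spent in MPsynch
    "Cpu 00 Inter mode": 50.0,   # percent of CPU 0 spent in interrupt mode
    "Compute": 5.0,              # average compute-queue depth
}

flags = {name: 0 for name in THRESHOLDS}
with open("my_system_t4.csv") as f:          # hypothetical export of T4 interval data
    for row in csv.DictReader(f):
        for name, limit in THRESHOLDS.items():
            if float(row.get(name, 0) or 0) > limit:
                flags[name] += 1

for name, count in flags.items():
    print(f"{name}: {count} intervals above {THRESHOLDS[name]}")
```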
4
T4 - Data Sources
• Data for these comparisons was collected using the internally developed T4 (tabular timeline tracking tool) suite of coordinated collection utilities and analyzed with TLViz
• The T4 kit & TLViz have consistently proved themselves invaluable for this kind of before and after comparison project. We have now made T4 publicly available for download (will ship with OpenVMS 7.3-2 in SYS$ETC:)
• T4 could be a useful adjunct to your performance management program.
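To give a feel for what the before and after comparison looks like in practice, the sketch below overlays one metric from two T4-style CSV timelines, similar in spirit to the side-by-side view TLViz provides interactively. It is our own generic example; the file names and column names are hypothetical.

```python
# Hedged sketch: overlay one metric from two T4-style interval CSVs (before vs. after
# an upgrade). File and column names are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

before = pd.read_csv("gs140_before.csv")     # hypothetical T4 CSV from before the upgrade
after = pd.read_csv("gs1280_after.csv")      # hypothetical T4 CSV from after the upgrade

metric = "Cpu Busy"   # any column both files share: compute queue, MP synch, direct I/O rate...
plt.plot(range(len(before)), before[metric], label="GS140 (before)")
plt.plot(range(len(after)), after[metric], label="GS1280 (after)")
plt.xlabel("sample interval")
plt.ylabel(metric)
plt.legend()
plt.show()
```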
5
Would you like to participate in our Marvel Proof Point Program?
• Contact [email protected] for more information about how you can take part
• Download T4 kit from public web site: http://h71000.www7.hp.com/OpenVMS/products/t4/index.html
• Start creating a compact, portable T4 based performance history of your most important systems
• The T4 data will create a common and efficient language for our discussions. We can then work with you and help you evaluate your unique pattern of use and the degree to which the advantages of Marvel EV7 on OpenVMS can most benefit you.
6
Want even more detail?
• The electronic version of this presentation contains extensive captions and notes on each slide for your further study, reflection, and review.
7
Case 1 – Production System
12P GS140 700 MHz vs. 16P GS1280 1.15 GHz
Tremendous Gains in Headroom for an Oracle Database Server with Multinet
8
Compute Queue Completely Evaporates with GS1280
[Chart: MON.STAT Compute queue, node HNAM, 08:00–12:00 (3-Mar-2003), y-axis 0–55]
Green is GS140 at 700 MHz with 12 CPUs
Red is GS1280 at 1.15 GHz with 16 CPUs
Peak queues of 57 drop to queues of 1 or 2
9
CPU 0 Idle Time
[Chart: MON.MODES CPU 00 Idle, node HNAM, 08:00–12:00 (3-Mar-2003), y-axis 0–90%]
Green is GS140 at 700 MHz with 12 CPUs
Red is GS1280 at 1.15 GHz with 16 CPUs
With the GS1280, there is 73% spare CPU 0 capacity during the absolute peak. With the GS140, CPU 0 is completely consumed during peaks (e.g., at 11 AM)
10
Almost 4 to 1 reduction in CPU Busy with GS1280
[Chart: MON.SYST CPU Busy, node HNAM, 08:00–12:00 (3-Mar-2003), y-axis 0–1,150%]
Green is GS140 at 700 MHz with 12 CPUs
Red is GS1280 at 1.15 GHz with 16 CPUs
The GS140 is nearly maxed out at more than 1,150% busy of a possible 1,200%, while the GS1280 is cruising along at 250% to 350% busy of a possible 1,600%
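A back-of-the-envelope sketch of the arithmetic behind the "almost 4 to 1" figure and the headroom claim, using the busy percentages read off the chart above; the assumption that CPU busy is the limiting resource is ours.

```python
# Back-of-the-envelope headroom arithmetic for Case 1, using numbers read off the chart.
gs140_capacity = 12 * 100      # 12 CPUs -> 1,200% maximum CPU busy
gs1280_capacity = 16 * 100     # 16 CPUs -> 1,600% maximum CPU busy
gs140_peak_busy = 1150         # percent, nearly maxed out
gs1280_peak_busy = 300         # percent, midpoint of the 250-350 range for slightly more work

reduction = gs140_peak_busy / gs1280_peak_busy
print(f"CPU busy reduction: about {reduction:.1f} to 1")         # ~3.8, i.e. almost 4 to 1

# Crude CPU-only headroom: how many multiples of today's load would saturate the GS1280?
headroom = gs1280_capacity / gs1280_peak_busy
print(f"GS1280 CPU headroom: roughly {headroom:.1f}x the current load")   # ~5.3x
```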
11
Direct IO (includes network traffic)
[Chart: MON.SYST Direct I/O Rate, node HNAM, 08:00–12:00 (3-Mar-2003), y-axis 0–11,500/sec]
Green is GS140 at 700 MHz with 12 CPUs
Red is GS1280 at 1.15 GHz with 16 CPUs
The GS1280 is able to push to higher peaks when the load gets heavy, while still having huge spare capacity for more work. The GS140 is close to maxed out at 10,000 direct IOs per second
12
MPsynch
[Chart: MON.MODE MP Synch, node HNAM, 08:00–12:00 (3-Mar-2003), y-axis 0–105%]
Green is GS140 at 700 MHz with 12 CPUs
Red is GS1280 at 1.15 GHz with 16 CPUs
MPsynch drops from 90% to under 10%, leaving plenty of room for further scaling
13
Packets Per Second Sent – a key throughput metric. Estimated actual maximum rate for the GS1280: more than 20,000/sec
[Chart: NET EWA0:/EWC0: Packets Sent/Sec, node HNAM, 08:00–12:00 (3-Mar-2003), y-axis 0–6,000/sec]
Blue is GS140 at 700 MHz with 12 CPUs
Red is GS1280 at 1.15 GHz with 16 CPUs
The GS140 maxes out at about 5,000 packets per second with little or no spare capacity. The GS1280 reaches 6,000 with substantial spare capacity
14
Case 1 Summary: 12P GS140 to 16P GS1280
• GS1280 delivers an estimated increase in headroom of at least 4X
• Eliminates CPU 0 bottleneck
• Drastically cuts MPsynch
• Able to handle higher peaks as they arrive
• Almost 4 to 1 reduction in CPU use while doing slightly more work
15
16
Case 2 – Production System
10P GS140 700 MHz vs. 8P GS1280 1.15 GHz
Tremendous Gains in Headroom for an Oracle Database Server despite reduced CPU count
Poised to Scale
17
Compute Queue Completely Evaporates with GS1280 and the current workload demand
[Chart: MON.STAT Compute queue, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–32]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
Peak queues of 32 drop to 3
18
CPU 0 Idle Time
[Chart: MON.MODES CPU 00 Idle, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–90%]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
With the GS1280, there is 69% spare CPU 0 capacity during the absolute peak with this workload. With the GS140, CPU 0 is completely consumed during peaks (e.g., at 10:30 for many minutes at a time)
19
More than 3 to 1 reduction in CPU Busy with GS1280
[Chart: MON.SYST CPU Busy, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–950%]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
The GS140 is completely maxed out at more than 1,000% busy, while the GS1280 is cruising along at 200% to 350% busy of a possible 800%
20
Direct IO (includes network traffic)
[Chart: MON.SYST Direct I/O Rate, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–10,500/sec]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
The GS1280 is able to push to higher peaks of 10,500 when the load temporarily gets heavier, while still having huge spare capacity for more work (approximately 5 CPUs). The 10P GS140 is maxed out at slightly over 8,000 direct IOs per second.
21
MPsynch (more than a 9 to 1 reduction with this workload)
[Chart: MON.MODE MP Synch, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–65%]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
MPsynch drops from peaks of 67% to peaks of only 7%, leaving plenty of room for further scaling
22
Packets Per Second Sent – a key throughput metric. Estimated actual maximum rate for the 8P GS1280: more than 11,000/sec; with 16P this would rise to 20,000/sec
[Chart: NET EWA0: Packets Sent/Sec, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–4,800/sec]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
The 10P GS140 maxes out at about 4,200 packets per second with no spare capacity. The 8P GS1280 reaches 4,800 with more than 4.5 CPUs to spare
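One plausible way to arrive at estimates like these from the figures on this slide is sketched below; the assumption that packet throughput scales with the CPU available to the workload is ours, not a measured result.

```python
# Rough extrapolation of maximum packet rate for Case 2 (numbers from the slide above).
# Assumption (ours): packets/sec scales roughly with the CPU the workload can consume.
observed_pkts_per_sec = 4_800          # 8P GS1280 during the measured peak
cpus_total = 8
cpus_spare = 4.5                       # "more than 4.5 CPUs to spare"
cpus_used = cpus_total - cpus_spare    # ~3.5 CPUs doing the observed work

pkts_per_cpu = observed_pkts_per_sec / cpus_used
print(f"Estimated max on 8P:  ~{pkts_per_cpu * 8:,.0f} packets/sec")    # ~11,000
print(f"Estimated max on 16P: ~{pkts_per_cpu * 16:,.0f} packets/sec")   # ~22,000
```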
23
CPU 0 interrupt – well poised for scaling to 8, 12, and even more CPUs with the GS1280
[Chart: MON.MODES CPU 00 Interrupt mode, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–55%]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
During peak periods, despite the fact that the 8P GS1280 is doing slightly more work, it uses a factor of 3.5X less CPU 0 time for interrupt activity. At peaks of only 20%, the GS1280 stands ready to handle substantially higher workloads
24
Disk operations rate – shows the same head and shoulders pattern as direct IO and packets per second
[Chart: MON.DISK Operation Rate, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–2,800/sec]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
During peak periods, the 10P GS140 maxes out at 2,200 disk operations per second. With this workload, the 8P GS1280 is able to reach 2,900 per second with lots of room to spare. As the load demand on the GS1280 increases, this 8P model looks capable of driving the disk op rate to 6,000/sec
25
Interrupt load during peak periods drops by a factor of almost 5 to 1, from 240% to 50%
[Chart: MON.MODE Interrupt State, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–240%]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
This is another excellent sign of the potential future scalability of this GS1280 to 8 CPUs, 12 CPUs, and beyond.
26
Microseconds of CPU per Direct IO
[Chart: microseconds of CPU per direct IO (derived T4 statistic), nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–1,300]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
Normalized statistics like this show that the relative power of each GS1280 CPU at 1.15 GHz is between 3 and 4 times that of the GS140's 700 MHz CPUs
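The sketch below shows how a normalized statistic like "microseconds of CPU per direct IO" can be derived from raw per-interval counters. It is our own hedged example; the file and column names are hypothetical and may not match the actual T4 CSV layout.

```python
# Minimal sketch of deriving "microseconds of CPU per direct IO" from T4-style counters.
# File and column names are hypothetical.
import csv

def usec_cpu_per_dirio(cpu_busy_percent: float, dirio_per_sec: float) -> float:
    """CPU busy is in percent-of-one-CPU units (100% = one fully busy CPU), so each
    1% of busy corresponds to 10,000 microseconds of CPU consumed per second."""
    if dirio_per_sec <= 0:
        return 0.0
    return cpu_busy_percent * 10_000 / dirio_per_sec

with open("t4_interval_data.csv") as f:                          # hypothetical file name
    for row in csv.DictReader(f):
        value = usec_cpu_per_dirio(float(row["Cpu Busy"]),        # hypothetical columns
                                   float(row["Direct I/O Rate"]))
        print(row["Sample Time"], f"{value:.0f} usec CPU per direct IO")
```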
27
Disk Reads Per Second
[Chart: XFC Read IOs/Sec, nodes RED DB (8P GS1280) and GREEN DB (10P GS140), 10:00–11:45 (3-Mar-2003), y-axis 0–2,200/sec]
Red is GS1280 at 1.15 GHz with 8 CPUs
Green is GS140 at 700 MHz with 10 CPUs
This shows the same head and shoulders pattern, but even more pronounced than what we saw with network packets
28
Case 2 Summary: 10P GS140 to 8P GS1280
• GS1280 with fewer CPUs delivers an estimated headroom increase of more than 2X
• Eliminates CPU busy bottleneck
• Drastically cuts MPsynch
• Able to handle higher peaks as they arrive
• Well positioned to scale to 8, 12, or more CPUs and achieve headroom increases of 3.5X or even higher
29
30
Proof Point Patterns
• Dramatic cuts in MPsynch
• Large drops in Interrupt mode
• Higher, short-lived bursts of throughput
– Direct IO, packets per second, etc.
– The “HEAD and SHOULDERS” pattern (see the sketch below)
• Large increase in spare capacity and headroom
– Overall CPU, primary CPU
Where the workload stays relatively flat at the point of transition, the overall throughput numbers are not that different, but the shape of the new curve, with its sharp peaks, tells an important story
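A minimal sketch of how the "head and shoulders" signature could be checked numerically: compare mean versus peak throughput for the same metric before and after the upgrade. The sample values are illustrative only, not measured data.

```python
# Illustrative check for the "head and shoulders" pattern: after an upgrade the sustained
# (average) throughput may barely change while the short-lived peaks get noticeably taller.
from statistics import mean

def head_and_shoulders(before: list[float], after: list[float]) -> None:
    """Compare mean vs. peak throughput for two timeline samples of the same metric."""
    mean_ratio = mean(after) / mean(before)
    peak_ratio = max(after) / max(before)
    print(f"mean throughput ratio (after/before): {mean_ratio:.2f}")
    print(f"peak throughput ratio (after/before): {peak_ratio:.2f}")
    if peak_ratio > 1.2 and abs(mean_ratio - 1.0) < 0.2:
        print("-> head-and-shoulders signature: flat demand, but higher short-lived bursts")

# Illustrative direct-IO samples (per-second rates at successive intervals).
head_and_shoulders(
    before=[7800, 8000, 8100, 7900, 8000, 8050],
    after=[7900, 8200, 10400, 8100, 10500, 8000],
)
```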
31
32
Case 3 – Stress Test Marvel 32P – RMS1
• This case shows a segment of our RMS1 testing on the 32P Marvel EV7 @ 1.15 GHz
• Using Multiple 4 GB Ramdisks
• Started at 16P, ramped up workload
• Then increased to 24P, throughput dropped
• Then affinitized jobs, throughput jumped
• Combines timeline data from t4, spl, bmesh
33
Background to this test
• RMS1 is based on a customer-developed database benchmark test originally written using Rdb and converted to carry out the same task with RMS
• To generate extremely high rates of IO in order to discover the limits of Marvel 32P performance, we ran multiple copies of RMS1, each using its own dedicated RAMdisk
• Caution: The net effect is a test that generates an extremely heavy load, but that cannot be considered to mirror any typical production environment.
34
Timing of Changes
• 12:05 16 CPUs
• 12:11 Start ramp-up with 4 GB RAMdisks
• 12:30 Increase to 24 CPUs
• 12:38 Set process affinity
• 12:55 Turn off dedicated lock manager
<Observe how timelines help make sense of this complicated situation; one way to mark these events on a timeline is sketched below>
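As one way to make such a timeline self-explanatory, the sketch below marks the change events listed above on a throughput plot. This is our own illustration, not part of the T4 kit; the data file and column names are hypothetical.

```python
# Hedged sketch (ours): annotate a throughput timeline with the change events listed above
# so the cause of each shift in the curve is visible at a glance.
import matplotlib.pyplot as plt
import pandas as pd

events = {                       # times from the slide above (20-Feb-2003)
    "12:05": "16 CPUs",
    "12:11": "start ramp-up, 4 GB RAMdisks",
    "12:30": "increase to 24 CPUs",
    "12:38": "set process affinity",
    "12:55": "dedicated lock manager off",
}

df = pd.read_csv("prf31a_t4.csv", parse_dates=["Sample Time"])   # hypothetical file/columns
plt.plot(df["Sample Time"], df["Direct I/O Rate"], label="Direct I/O per second")

for hhmm, label in events.items():
    t = pd.Timestamp(f"2003-02-20 {hhmm}")
    plt.axvline(t, linestyle="--", alpha=0.5)
    plt.text(t, plt.ylim()[1] * 0.95, label, rotation=90, va="top", fontsize=8)

plt.legend()
plt.show()
```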
35
Direct IO up to 70,000 per second!
[Chart: MON.SYST Direct I/O Rate, node PRF31A, 12:20–12:50 (20-Feb-2003), y-axis 5,000–70,000/sec]
For the RMS1 workload, the rate of direct IO per second is a key metric of maximum throughput.
Increasing to 24 CPUs at 12:30 does not increase throughput.
Turning on affinity causes throughput to jump from 55,000 to over 70,000, an increase of approximately 30% (1.3X)
36
Kernel & MPsynch switch roles
12:30 is when we jumped from 16 CPUs to 24 CPUs. Note how MPsynch (green) jumps up substantially at that time to over 950%.
At 12:37, we started affinitizing the different processes to CPUs we believed to be close to where their associated RAMdisk was located.
Note how MPsynch and Kernel mode cross over at that point.
[Chart: MON.MODE Kernel Mode and MP Synch, node PRF31A, 12:15–12:55 (20-Feb-2003), y-axis 0–950%]
37
Lock Busy % from T4 shows jump with affinity
We had the dedicated lock manager turned on for this test, which creates a very heavy locking load.
Note that there is no change when the number of CPUs is increased at around 12:30.
Note the big jump in Lock % busy that happens when we affinitize.
At over 90% busy, locking is a clear primary bottleneck that will prevent further increases in throughput even with more CPUs.
[Chart: LCK73 Busy %, node PRF31A, 12:15–12:55 (20-Feb-2003), y-axis 0–90%]
38
Lock requests per second vs. XFC writes per second – A True Linear Relationship
Scatter Diagram for Data Collection on node PRF31A between 20-FEB-2003 12:10:49.12 and 20-FEB-2003 12:55:04.05
[Scatter plot: x-axis [XFC]Write IOs/Sec, 0–12,000; y-axis [LCK73]Req Count/Sec, 0–450,000]
The maximum rate of lock requests is an astounding 450,000 per second.
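A minimal sketch of checking the linear relationship shown in this scatter diagram numerically, by fitting lock requests per second against XFC writes per second. The file and column names are hypothetical stand-ins for the T4 data.

```python
# Hedged sketch: quantify the linear relationship between lock requests/sec and XFC writes/sec.
# File and column names are hypothetical.
import numpy as np
import pandas as pd

df = pd.read_csv("prf31a_t4.csv")                     # hypothetical file
x = df["XFC Write IOs/Sec"].to_numpy(dtype=float)     # hypothetical column names
y = df["LCK73 Req Count/Sec"].to_numpy(dtype=float)

slope, intercept = np.polyfit(x, y, 1)                # least-squares straight line
r = np.corrcoef(x, y)[0, 1]                           # Pearson correlation coefficient

print(f"lock requests ≈ {slope:.1f} * XFC writes + {intercept:.0f}")
print(f"correlation coefficient: {r:.3f}")            # ~1.0 indicates a true linear relationship
```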
39
Case 3 - SUMMARY
• These are by far the best throughput numbers we have ever seen on this workload for:
– Direct IO, lock requests per second
• Performance is great out of the box
• New tools simplify bottleneck identification
• Straightforward tuning pushes to even higher values with a surprisingly large upward jump
• Workloads show consistent ratios between key statistics (e.g., lock requests per direct IO)
• Spinlock-related bottlenecks remain with us, albeit at dramatically higher throughput levels
40
41
Case 4 – Production System
• Upgrade from 16 CPU Wildfire EV68 running at 1.224 GHz (the fastest Wildfire)
• Compared to 16 CPU Marvel EV7 running at 1.15 GHz
• Oracle, TCPIP, Mixed Database Server and Application Server
42
CPU Busy cut in half
Note Color Switch!!!
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
[Chart: MON_SYST CPU Busy, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 300–1,400%]
43
CPU 0 Interrupt is cut by a factor of more than 3 to 1
[Chart: MON_MODES CPU 00 Interrupt mode, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 15–85%]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
44
Buffered IO – sustained higher peaks
[Chart: MON_SYST Buffered I/O Rate, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 4,400–10,600/sec]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
45
Direct IO – sustained higher peaks
[Chart: MON_SYST Direct I/O Rate, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 500–2,500/sec]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
46
System Wide Interrupt diminished by a factor of 4 to 1
[Chart: MON_MODE Interrupt State, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 30–190%]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
47
MPsynch shrinks by more than 8 to 1
[Chart: MON_MODE MP Synch, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 10–290%]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
48
Kernel Mode decreases from 260 to 150
[Chart: MON_MODE Kernel Mode, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 140–310%]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
49
User Mode decreases from about 480 to 240
[Chart: MON_MODE User Mode, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 140–660%]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
50
Compute Queue disappears
[Chart: MON_STAT Compute queue, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 0–22]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
51
Packets per second – head and shoulders with higher peaks
[Chart: NET_MON EWA0: Packets Sent/Sec, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 2,400–4,700/sec]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
52
Mailbox Writes – head and shoulders with higher peaks
[Chart: MON_IO Mailbox Write Rate, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 1,000–3,100/sec]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
53
Dedicated Lock Manager Busy drops from 18% down to about 6%
[Chart: LCK73_MON Busy %, node MILP1, 09:30–11:30 (31-Mar-2003), y-axis 2–28%]
Red is GS1280 with 16 CPUs at 1.15 GHz
Green is GS160 with 16 CPUs at 1.224 GHz
54
Case 4 - SUMMARY
• The GS160 with 16 CPUs had been highly tuned, yet was unable to handle the heaviest peak loads presented
• The bottleneck was related to reaching maximum TCPIP throughput, the related MPsynch, and limits on the maximum buffered IO rate
• The GS1280 immediately, without further adjustment, provided a dramatic increase in maximum throughput and a huge improvement in spare capacity and headroom
55
56
Case 5 – Production System
• NOTE color switch in slides
• Upgrade 12P GS140 to 16P GS1280
• Mixed Application and Database Server
57
MPsynch almost disappears
Drops from 130% to under 10%
[Chart: MON.MODE MP Synch, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 10–210%]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
58
Kernel Mode shrinks by more than 5 to 1
[Chart: MON.MODE Kernel Mode, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 20–220%]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
59
System Wide Interrupt also shrinks by more than 5 to 1
[Chart: MON.MODE Interrupt State, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 10–120%]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
60
User Mode is cut in half
[Chart: MON.MODE User Mode, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 150–650%]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
61
CPU busy drops by almost 3 to 1
[Chart: MON.SYST CPU Busy, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 200–1,100%]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
62
CPU 0 Interrupt almost disappears, dropping by more than 6 to 1
[Chart: MON.MODES CPU 00 Interrupt mode, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 5–75%]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
63
Buffered IO – shows consistently higher peaks. There was a real backlog of work waiting to be serviced
[Chart: MON.SYST Buffered I/O Rate, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 2,500–9,500/sec]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
64
Direct IO shows substantially higher peaks, which are short-lived
[Chart: MON.SYST Direct I/O Rate, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 100–850/sec]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
65
Mailbox write increases from 1400 to over 2400
[Chart: MON.IO Mailbox Write Rate, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 1,000–4,400/sec]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
66
Compute Queue evaporates
[Chart: MON.STAT Compute queue, node ALCOR, 08:30–10:30 (6-May-2003), y-axis 0–21]
Red is GS140 with 12 CPUs
Green is GS1280 with 16 CPUs
67
Case 5 - SUMMARY
• Huge backlog of work can now be handled successfully during long-lasting peak periods, as demonstrated by higher buffered IO and other throughput metrics
• Substantial further reserves of spare capacity
• Large changes in key performance metrics such as MPsynch, interrupt.
68
Proof Point Summary
• Marvel EV7 GS1280 systems are the best performing VMS systems ever.
• Excellent out-of-the-box performance
• Superior SMP scaling
• Huge increases in maximum throughput, some realized immediately, the rest held in reserve as spare capacity
• Marvel provides the headroom for future growth
69
Can your enterprise benefit from an upgrade to a GS1280?
• Systems with high MPsynch
• Systems with high primary CPU interrupt load
• Poor SMP scaling
• Heavy locking
• Heavy IO: Direct, Buffered, Mailbox
• Heavy use of Oracle, TCPIP, Multinet
• Look closer if:
– Systems with poor response time
– Systems with insufficient peak period throughput
70
Would you like to participate in our Marvel Proof Point Program?
• Contact [email protected] for more information about how you can take part
• Download T4 kit from public web site: http://h71000.www7.hp.com/OpenVMS/products/t4/index.html
• Start creating a compact, portable, T4 based performance history of your most important systems
• The T4 data will create a common and efficient language for our discussions. We can then work with you and help you evaluate your unique pattern of use and the degree to which the advantages of Marvel EV7 on OpenVMS can most benefit you.
71