

Performance Brief

HP BladeSystem delivers no-compromise low latency, high throughput for next-generation server needs

October 2013

Executive summary

The fundamental nature of data center computing is rapidly changing. Server virtualization, cloud computing, and everything-as-a-service (XaaS) imperatives are altering data center traffic flows and escalating bandwidth and performance demands. The next-generation data center requires an architecture that can deliver both high throughput and low, predictable latency for demanding applications.

Applications are changing; architecture matters more than ever

For HP BladeSystem deployments, HP Virtual Connect delivers direct server-to-server connectivity within the rack, enabling wire-speed, machine-to-machine communications for delay-sensitive, bandwidth-intensive traffic. In addition, HP Virtual Connect modules can be used to dynamically fine-tune application-specific performance across server and storage networks to improve scale and make the best use of shared connectivity resources. In contrast, the Cisco UCS blade architecture is a hierarchical network design that integrates management in the top-of-rack switch, or Fabric Interconnect (FI), creating a dependence on a Cisco-only approach.

Network latency test comparison

HP recently conducted a test to compare network latency and throughput between an HP BladeSystem configured with ProLiant BL460c Gen8 Server Blades in a BladeSystem c7000 platinum enclosure and a Cisco UCS system configured with B200 M3 blade servers in a 5108 enclosure. The Netperf 2.6.0 application was used to benchmark the two systems. Netperf is a benchmark used to measure the performance of networking infrastructure. It provides tests for both unidirectional throughput and end-to-end latency. Results for latency, throughput, and CPU load were recorded. All testing was performed by HP in August 2013.

In this test, traffic ran between two servers within the same enclosure and on the same fabric side, as shown in Figure 1 below.

Figure 1. Setup for Cisco UCS and HP BladeSystem used to measure performance

Key takeaways from the Netperf benchmark test:

• HP BladeSystem delivers BOTH low latency and high throughput numbers without compromise in its next-generation architectural design.

• Cisco’s UCS architectural design forces its customers to compromise: low latency with low throughput numbers OR high latency with high throughput numbers.

How we did it

HP BladeSystem

The HP BladeSystem infrastructure is a modular, future-proof architecture with common components, tools, and processes that includes:

• HP Virtual Connect that simplifies connecting LANs and SANs with wire-once technology.

• HP Networking portfolio that delivers a better architectural approach with a virtualized, flatter, more efficient solution.

• HP FlexFabric that is the industry’s most complete software-defined network fabric for cloud.


Interrupt coalescing

Network cards routinely send interrupt requests (IRQs) to software. If the hardware generates too many IRQs, the interrupts can disrupt the intended network traffic resulting in decreased network and application performance. Interrupt coalescing allows users to tune network performance by combining multiple IRQs into a single interrupt.

HP’s BladeSystem network solution offers an Adaptive Mode that dynamically adjusts IRQ coalescing to maximize overall performance. Therefore, no network tuning is required by the user.
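The trade-off described above can be illustrated with a toy model. The sketch below is not HP's or Cisco's implementation. It assumes a simple fixed timer that is armed by the first packet after an idle period and raises one interrupt for everything received before it expires. The 125-microsecond value is the Cisco default quoted later in this brief; the arrival pattern (one packet every 10 microseconds) and the function name `coalesce` are hypothetical.

```python
def coalesce(arrival_times_us, timer_us):
    """Toy fixed-timer interrupt coalescing model (an illustrative assumption).

    arrival_times_us must be sorted in ascending order. The first packet
    after an idle period arms the timer; a single interrupt fires when the
    timer expires and delivers every packet received in the meantime.
    Returns (interrupt_count, mean_wait_added_per_packet_us).
    """
    interrupts = 0
    total_wait_us = 0.0
    fire_at = None  # time at which the pending interrupt will fire
    for t in arrival_times_us:
        if fire_at is None or t > fire_at:
            # No interrupt pending: this packet starts a new batch.
            fire_at = t + timer_us
            interrupts += 1
        # Each packet waits until its batch's interrupt fires.
        total_wait_us += fire_at - t
    return interrupts, total_wait_us / len(arrival_times_us)


# Hypothetical traffic: 100 packets, one every 10 microseconds.
arrivals = [10 * i for i in range(100)]
for timer in (0, 125):
    count, wait = coalesce(arrivals, timer)
    print(f"timer={timer:>3} us  interrupts={count:>3}  mean added wait={wait:.1f} us")
```

In this model a timer of 0 raises one interrupt per packet and adds no wait, while a longer timer raises far fewer interrupts at the cost of added per-packet delay. An adaptive mode, as described above, would adjust the timer between these extremes according to load.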

Test results

Figure 2. Cisco UCS results for default and optimized interrupt coalescing settings (see full-sized graphs in Appendix D).

Chart 1) Latency: Lower is better. Chart 2) Throughput: Higher is better. Chart 3) CPU load: Lower is better.

Cisco UCS. Two test cases were completed with the Cisco UCS system. For the first test, latency, CPU usage, and throughput were measured with the default system BIOS settings. To improve Cisco’s network latency and eliminate Cisco’s hockey-stick effect, a second test case was run in which the interrupt coalescing setting for the Cisco VIC 1240 network interface card was reduced from the default of 125 microseconds to 0 microseconds (see the Interrupt coalescing explanation above).

The test results showed that the out-of-the-box default settings for Cisco’s VIC 1240 resulted in very high latency. When the Cisco UCS system was tuned with optimized interrupt coalescing settings, the latency decreased significantly to an expected level. On the other hand, tuning the Cisco UCS system resulted in higher system processor (CPU) load, which would affect application performance.

As a side effect, HP observed that for both settings, the single-threaded TCP throughput measured by Netperf was much lower than expected. With jumbo frames of 9000 bytes, which is another optimization, the results returned to a normal level.

In summary, HP found that Cisco’s default latency is very high, roughly 2.5 times the latency measured with the optimized setting (125 usec versus ~50 usec). Tuning the Cisco UCS system decreases latency but causes a higher processor load and lower TCP throughput. Therefore, Cisco UCS customers must choose between low latency and high throughput. Customers should not have to tune network settings immediately upon system installation.
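To see why frame size matters here, the following back-of-the-envelope sketch counts how many frames per second a sender must process to fill a link at each MTU. It does not reproduce HP's measurements. It assumes a 10 Gb/s link, standard Ethernet per-frame overhead with no VLAN tag, and 40 bytes of IPv4 and TCP headers with no options; the function names are hypothetical.

```python
# Per-frame bytes on the wire beyond the MTU-sized payload (assumed, untagged):
# preamble + start delimiter (8), Ethernet header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD_BYTES = 8 + 14 + 4 + 12


def frames_per_second(link_bps, mtu_bytes):
    """Frames per second needed to saturate the link with full-sized frames."""
    wire_bits = (mtu_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_bps / wire_bits


def tcp_goodput_bps(link_bps, mtu_bytes, ip_tcp_header_bytes=40):
    """Upper bound on TCP payload rate, assuming 40 bytes of IP + TCP headers."""
    payload_bits = (mtu_bytes - ip_tcp_header_bytes) * 8
    return frames_per_second(link_bps, mtu_bytes) * payload_bits


LINK_BPS = 10e9  # assumed 10 Gb/s link
for mtu in (1500, 9000):
    fps = frames_per_second(LINK_BPS, mtu)
    goodput = tcp_goodput_bps(LINK_BPS, mtu) / 1e9
    print(f"MTU {mtu}: {fps:,.0f} frames/s at line rate, goodput <= {goodput:.2f} Gb/s")
```

Under these assumptions, filling the link at MTU 1500 takes almost six times as many frames per second as at MTU 9000, so any per-frame cost (interrupts, header processing) weighs far more heavily on a single TCP stream at the smaller MTU.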

Figure 2 annotations: very high latency when using default settings; more CPU load due to less-optimized IRQ behavior; very low throughput with the default MTU of 1500 bytes (chart series: MTU 9000, MTU 1500).


Figure 3. HP results for default setting compared to Cisco UCS tuned for reduced latency setting (See full-sized graphs in Appendix E).

Chart 1) Latency: Lower is better. Chart 2) Throughput: Higher is better. Chart 3) CPU load: Lower is better.

HP BladeSystem. A similar test, measuring latency, throughput, and CPU load, was completed for HP BladeSystem using default settings. The graphs in Figure 3 show the network latency test results for HP BladeSystem with default settings and compare them to the Cisco tuned-for-reduced-latency results (from Figure 2).

Test results showed that, out of the box, the default settings for HP BladeSystem produced latencies nearly identical to the tuned Cisco UCS result, while HP delivered lower CPU load and much higher throughput. Unlike the Cisco UCS design, HP does not force a compromise and requires no initial tuning to get good latency results.

What this means for customers

With HP BladeSystem low-latency network performance, customers will not experience long delays. Excessive latency creates bottlenecks that can last a few seconds or can be constant. Maintaining consistent performance with high throughput on a network provides HP customers with the assurance they need for their mission-critical requirements. Cisco customers must choose either lower latency with lower throughput or higher throughput with higher latency. There is no compromise with HP: customers receive high throughput with low latency by default.

Furthermore, customers who are truly sensitive to latency, and are willing to tune their network settings, would benefit from a higher-speed InfiniBand solution. HP provides an open network infrastructure where customers can choose InfiniBand. However, the only fabric from the Cisco UCS blade chassis to the Fabric Interconnect is Ethernet.

Summary

As confirmed in the HP Netperf benchmark testing, HP BladeSystem provides high throughput with the lowest latency when compared to the results of the Cisco UCS blade multi-tier network design. Cisco’s UCS hierarchical approach to server networking adds hops and latency even for blades within the same enclosure. Repeated testing with the documented setup and configurations validates HP’s claim to network performance without compromise, offering real-world benefits for HP customers.

For more information on HP BladeSystem, see www.hp.com/go/bladesystem

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. Intel and Intel Xeon are trademarks of Intel Corporation in the U.S. and other countries. All other product, brand, or trade names used in this publication are the trademarks or registered trademarks of their respective trademark owners. Benchmark results as of October 1, 2013.

Figure 3 annotations: nearly identical, comparable latency numbers (chart series: HP default; Cisco with rx IRQ coalescing tuned); without any tuning, HP shows less CPU load; HP delivers much higher throughput.


Appendix A. System under Test (SUT) setup

Latency and Speed Testing Scenario

• Network latency for two servers’ communication in the same blade enclosure

• One vNIC used in each server

• Source and destination server IP addresses on the same subnet

• Host Operating Environment: SUSE Linux Enterprise Server 11 SP2

Test Tool

Netperf Version 2.6.0 http://www.netperf.org/netperf/

Netperf is a benchmark used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and end-to-end latency.

Test Parameters

For Latency

netperf -i30,3 -t TCP_RR -H <destination ip> -v 2 -- -r <packet size>
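A TCP_RR test reports a transaction rate, where one transaction is a request followed by its response. The brief does not state how its latency figures were derived, so the conversion below is only a sketch: it assumes a single outstanding transaction at a time, in which case the mean round-trip time is the reciprocal of the rate. The function name and the sample rates are hypothetical, chosen to match the 125 usec and ~50 usec figures quoted earlier.

```python
def mean_round_trip_usec(transactions_per_sec):
    """Mean request/response round trip in microseconds.

    Assumes one transaction in flight at a time, so each transaction
    occupies exactly 1 / rate seconds on average.
    """
    if transactions_per_sec <= 0:
        raise ValueError("transaction rate must be positive")
    return 1_000_000 / transactions_per_sec


# Hypothetical transaction rates, not values taken from the test report.
for rate in (8_000, 20_000):
    print(f"{rate} trans/s -> {mean_round_trip_usec(rate):.1f} usec round trip")
```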

For Throughput

netperf -i30,3 -t TCP_STREAM -c -C -H <destination ip> -- -m <packet size>
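The two command lines above take a destination address and a packet size, which suggests a sweep over several sizes. A minimal sketch of how such a sweep might be scripted follows. The destination address (taken from the documentation-only 192.0.2.0/24 range), the list of sizes, and the function names are all assumptions; the brief does not list the sizes that were tested. The netperf options themselves are copied unchanged from the commands above, and the option comments reflect the netperf manual as the editor understands it.

```python
import shlex


def latency_cmd(dest_ip, size):
    """TCP_RR command line from this appendix for one request/response size."""
    # -i30,3: maximum,minimum iterations; -v 2: verbosity; -r: request/response size
    return ["netperf", "-i30,3", "-t", "TCP_RR", "-H", dest_ip,
            "-v", "2", "--", "-r", str(size)]


def throughput_cmd(dest_ip, size):
    """TCP_STREAM command line from this appendix for one send message size."""
    # -c / -C: report local / remote CPU utilization; -m: send message size
    return ["netperf", "-i30,3", "-t", "TCP_STREAM", "-c", "-C",
            "-H", dest_ip, "--", "-m", str(size)]


DEST_IP = "192.0.2.10"       # hypothetical destination server
SIZES = [64, 1024, 16384]    # hypothetical packet sizes in bytes

# Dry run: print the commands. To execute them on a host with netperf
# installed, pass each list to subprocess.run(cmd, check=True).
for size in SIZES:
    print(shlex.join(latency_cmd(DEST_IP, size)))
    print(shlex.join(throughput_cmd(DEST_IP, size)))
```

Building each command as a list rather than a single string avoids shell quoting problems if the commands are later run through `subprocess`.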

Appendix B. UCS Configuration

Blade Environment

Fabric Interconnect: 2 x UCS 6296 UP

Enclosure: UCS 5108

Interconnect bays: 2 x UCS 2208XP

Server blades: 2 x UCS B200 M3

Server CPU: each 2 x ucs-cpu-e5-2680

Server NIC: each 1 x VIC 1240

Appendix C. BladeSystem Virtual Connect Configuration

Blade Environment

Enclosure: C7000 with 2 x Onboard Administrator

Interconnect bays: 2 x Virtual Connect Flex-10

Server blades: 2 x BL460c Gen8

Server CPU: each 2 x Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz

Server NIC: each 1 x FlexFabric 10Gb 2-port 554FLB

Onboard Administrator Build: 3.71 Dec 07 2012

Virtual Connect Build: 4.01-16 (r206377) May 21 2013

BL460c Gen8 BIOS Setting

The BIOS settings on the BL460c Gen8 Server Blade were left at their defaults.

Appendix D. Cisco UCS results for default and optimized interrupt coalescing settings

Chart 1) Latency: Lower is better

Chart 2) Throughput: Higher is better (chart series: MTU 9000, MTU 1500)

Chart 3) CPU load: Lower is better

Appendix E. HP results for default setting compared to Cisco UCS tuned for reduced latency setting

Chart 1) Latency: Lower is better (chart series: HP default; Cisco with rx IRQ coalescing tuned)

Chart 2) Throughput: Higher is better (chart series: Cisco, HP)

Chart 3) CPU load: Lower is better (chart series: Cisco, HP)