Performance Brief
HP BladeSystem delivers no-compromise low latency, high throughput for next-generation server needs

October 2013
Executive summary

The fundamental nature of data center computing is rapidly changing. Server virtualization, cloud computing, and everything-as-a-service (XaaS) imperatives are altering data center traffic flows and escalating bandwidth and performance demands. The next-generation data center requires an architecture that can deliver both high throughput and low, predictable latency for demanding applications.
Applications are changing; architecture matters more than ever
For HP BladeSystem deployments, HP Virtual Connect delivers direct server-to-server connectivity within the rack, enabling wire-speed, machine-to-machine communications for delay-sensitive, bandwidth-intensive traffic. In addition, HP Virtual Connect modules can dynamically fine-tune application-specific performance across server and storage networks to improve scale and make the best use of shared connectivity resources. In contrast, the Cisco UCS blade architecture is a hierarchical network design that integrates management in the top-of-rack switch, or Fabric Interconnect (FI), creating a dependence on a Cisco-only approach.
Network latency test comparison
HP recently conducted a test to compare network latency and throughput between an HP BladeSystem, configured with a ProLiant BL460c Gen8 Server Blade and the BladeSystem c7000 Platinum enclosure, and a Cisco UCS B200 M3 blade server with a 5108 enclosure. Netperf 2.6.0 was used to benchmark the two systems. Netperf is a benchmark used to measure the performance of networking infrastructure; it provides tests for both unidirectional throughput and end-to-end latency. Results for latency, throughput, and CPU load were recorded. All testing was performed by HP in August 2013.
In this test, networking was limited to two servers within the same enclosure chassis and on the same fabric side, as shown in Figure 1 below.
Figure 1. Setup for Cisco UCS and HP BladeSystem used to measure performance
Key takeaways from the Netperf benchmark test:
• HP BladeSystem delivers BOTH low latency and high throughput numbers without compromise in its next-generation architectural design.
• Cisco’s UCS architectural design forces its customers to compromise: low latency with low throughput OR high latency with high throughput.
How we did it

HP BladeSystem
The HP BladeSystem infrastructure is a modular, future-proof architecture with common components, tools, and processes that includes:
• HP Virtual Connect that simplifies connecting LANs and SANs with wire-once technology.
• HP Networking portfolio that delivers a better architectural approach with a virtualized, flatter, more efficient solution.
• HP FlexFabric that is the industry’s most complete software-defined network fabric for cloud.
Interrupt coalescing
Network cards routinely send interrupt requests (IRQs) to software. If the hardware generates too many IRQs, the interrupts can disrupt the intended network traffic resulting in decreased network and application performance. Interrupt coalescing allows users to tune network performance by combining multiple IRQs into a single interrupt.
HP’s BladeSystem network solution offers an Adaptive Mode that dynamically adjusts IRQ coalescing to maximize overall performance. Therefore, no network tuning is required by the user.
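On a Linux host, interrupt coalescing can typically be inspected and adjusted manually with ethtool, as sketched below. The device name and values are illustrative assumptions for this brief, not settings taken from the test itself.

```shell
# Show the current interrupt coalescing parameters for a NIC
# (eth0 is a placeholder device name).
ethtool -c eth0

# Disable receive interrupt coalescing: fire an IRQ as soon as a packet
# arrives (rx-usecs 0), trading higher CPU load for lower latency.
ethtool -C eth0 rx-usecs 0

# A middle ground: wait up to 50 microseconds before interrupting,
# batching several packets per IRQ to reduce CPU load.
ethtool -C eth0 rx-usecs 50
```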
Test results
Figure 2. Cisco UCS results for default and optimized interrupt coalescing settings (See full-sized graphs in Appendix D).
Chart 1 ) Latency: Lower is better
Chart 2 ) Throughput: Higher is better
Chart 3 ) CPU load: Lower is better
Cisco UCS. Two test cases were completed with the Cisco UCS system. For the first test, latency, CPU usage, and throughput were measured with the default system BIOS settings. To improve Cisco’s network latency and eliminate Cisco’s hockey-stick effect, a second test case was run in which the interrupt coalescing setting for the Cisco VIC 1240 network interface card was tuned down from the default 125 microseconds to 0 microseconds (see the Interrupt coalescing explanation).
The test results showed that the out-of-the-box default settings for Cisco’s VIC 1240 resulted in very high latency. When the Cisco UCS system was tuned with optimized interrupt coalescing settings, the latency decreased significantly to an expected level. On the other hand, tuning the Cisco UCS system resulted in higher system processor (CPU) load, which would affect application performance.
As a side effect, HP observed that for both settings, the single-thread TCP throughput measured by Netperf was much lower than expected. With 9000-byte Jumbo Frames, another optimization, the results returned to a normal level.
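On a Linux host, a 9000-byte MTU can typically be enabled per interface as sketched below. The interface name is a placeholder, and every device in the path, including switch ports, must also be configured for jumbo frames for this to take effect end to end.

```shell
# Set a 9000-byte MTU (jumbo frames) on the interface
# (eth0 is a placeholder name).
ip link set dev eth0 mtu 9000

# Verify the new MTU took effect.
ip link show dev eth0
```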
In summary, HP found that Cisco’s default latency is very high, roughly 2.5 times that of the tuned configuration (from ~125 usec by default to ~50 usec tuned). Tuning the Cisco UCS system decreases latency but causes higher processor load and lower TCP throughput. Therefore, Cisco UCS customers must choose between low latency and high throughput. Customers should not have to tune network settings immediately upon system installation.
(Chart annotations: very high latency with default settings; more CPU load due to less-optimized IRQ behavior; very low throughput at the default MTU of 1500 bytes; throughput series shown for MTU 9000 and MTU 1500.)
Figure 3. HP results for default setting compared to Cisco UCS tuned for reduced latency setting (See full-sized graphs in Appendix E).
Chart 1 ) Latency: Lower is better
Chart 2 ) Throughput: Higher is better
Chart 3 ) CPU load: Lower is better

HP BladeSystem. A similar test was completed for HP BladeSystem using default settings, measuring latency, throughput, and CPU load. The following graphs show the network latency test results for HP BladeSystem with default settings and compare the HP results to the Cisco tuned-for-reduced-latency results (from Figure 2).
Test results showed that, out of the box, the default settings for HP BladeSystem produced nearly identical latencies to the tuned-latency Cisco UCS result, while delivering lower CPU load and much higher throughput. Unlike the Cisco UCS design, HP does not force a compromise and requires no initial tuning to achieve good latency results.
What this means for customers
With HP BladeSystem’s low-latency network performance, customers will not experience long delays. Excessive latency creates bottlenecks that can last a few seconds or can be constant. Maintaining consistent, high-throughput performance on the network gives HP customers the assurance they need for their mission-critical requirements. Cisco customers must choose between lower latency with lower throughput, or higher throughput with higher latency. There is no such compromise with HP: customers receive high throughput with low latency by default.
Furthermore, customers who are truly latency-sensitive, and willing to tune their network settings, would benefit from a higher-speed InfiniBand solution. HP provides an open network infrastructure in which customers can choose InfiniBand. With Cisco UCS, however, the only fabric from the blade chassis to the Fabric Interconnect is Ethernet.
Summary

As confirmed in HP’s Netperf benchmark testing, HP BladeSystem delivers high throughput with the lowest latency when compared to the Cisco UCS blade multi-tier network design. Cisco’s UCS hierarchical approach to server networking adds hops and latency even for blades within the same enclosure. Repeated testing with the documented setup and configurations validates HP’s claim of network performance without compromise, offering real-world benefits for HP customers.
For more information on HP BladeSystem, see www.hp.com/go/bladesystem

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. Intel and Intel Xeon are trademarks of Intel Corporation in the U.S. and other countries. All other product, brand, or trade names used in this publication are the trademarks or registered trademarks of their respective trademark owners. Benchmark results as of October 1, 2013.
Appendix A. System under Test (SUT) setup
Latency and Speed Testing Scenario
• Network latency for two servers communicating in the same blade enclosure
• One vNIC used in each server
• Source and destination server IP addresses on the same subnet
• Host Operating Environment: SUSE Linux Enterprise Server 11 SP2
Test Tool
Netperf Version 2.6.0 http://www.netperf.org/netperf/
Netperf is a benchmark used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput and end-to-end latency.
Test Parameters
For Latency
netperf -i30,3 -t TCP_RR -H <destination ip> -v 2 -- -r <packet size>
For Throughput
netperf -i30,3 -t TCP_STREAM -c -C -H <destination ip> -- -m <packet size>
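Netperf’s TCP_RR test reports a transaction (request/response) rate rather than latency directly; with one transaction outstanding at a time, the mean round-trip latency is the reciprocal of that rate. A minimal sketch of the conversion, with an illustrative function name and sample rate (not figures from this test):

```python
def tcp_rr_rate_to_latency_us(transactions_per_sec: float) -> float:
    """Convert a Netperf TCP_RR transaction rate to mean round-trip latency.

    Each TCP_RR transaction is one request/response exchange, so the mean
    round-trip time in microseconds is 1e6 divided by the transaction rate.
    """
    return 1_000_000.0 / transactions_per_sec


# Example: 20,000 transactions/sec corresponds to a 50-microsecond round trip.
print(tcp_rr_rate_to_latency_us(20_000))  # 50.0
```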
Appendix B. UCS Configuration
Blade Environment
Fabric Interconnect: 2 x UCS 6296UP
Enclosure: UCS 5108
Interconnect bays: 2 x UCS 2208XP
Server blades: 2 x UCS B200 M3
Server CPU: each 2 x ucs-cpu-e5-2680
Server NIC: each 1 x VIC 1240
Appendix C. BladeSystem Virtual Connect Configuration
Blade Environment
Enclosure: c7000 with 2 x Onboard Administrator
Interconnect bays: 2 x Virtual Connect Flex-10
Server blades: 2 x BL460c Gen8
Server CPU: each 2 x Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Server NIC: each 1 x FlexFabric 10Gb 2-port 554FLB
Onboard Administrator Build: 3.71 Dec 07 2012
Virtual Connect Build: 4.01-16 (r206377) May 21 2013
BL460c Gen8 BIOS Setting
The BIOS settings on the BL460c Gen8 Server Blade were set to defaults.
Appendix D. Cisco UCS results for default and optimized interrupt coalescing settings

Chart 1 ) Latency: Lower is better
Chart 2 ) Throughput: Higher is better
(Chart series: MTU 1500 and MTU 9000.)
Chart 3 ) CPU load: Lower is better
Appendix E. HP results for default setting compared to Cisco UCS tuned for reduced latency setting
Chart 1 ) Latency: Lower is better
(Chart series: HP default and Cisco with rx IRQ coalescing tuned.)
Chart 2 ) Throughput: Higher is better
Chart 3 ) CPU load: Lower is better