
Achieving Lowest Latencies at Highest Message Rates: Solarflare & Intel webcast



Page 1: Achieving Lowest Latencies at Highest Message Rates: Solarflare & Intel webcast

Thank you for joining!

The Webinar will begin shortly

Page 2

Achieving Lowest Latencies at Highest Message Rates with Intel Xeon E5-2600 and Solarflare

June 7, 2012

Page 3

June 7, 2012 Slide 3

AGENDA

• Intel

– Xeon® Processor E5-2600

– Platform I/O enhancements

• Solarflare

– 10GbE server adapters

– OpenOnload

• How to achieve the best performance

– Intel Xeon E5-2600 + Solarflare SFN6122F: winning combination

• Q&A

Page 4

Intel® Xeon® Processor E5-2600 Product Family


Flexible & Efficient: advanced features automate power consumption across the platform

Best combination of performance, power efficiency, and cost

Leading Performance: up to 80% performance boost over Intel® Xeon® processor 5600 series-based servers¹

The Heart of a Next-Generation Data Center

1 Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012. Configuration details in backup

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, go to intel.com/performance.

Page 5

Intel® Xeon® Processor E5-2600 Product Family


Reduce Bottlenecks With Intel® Integrated I/O

Intel® Integrated I/O: “Would you put a racecar engine in this… or this?”

[Figure: block diagram of the Xeon E5-2600 showing cores 1 to 8, a shared cache, and integrated PCI Express* 3.0]

* Other names and brands may be claimed as the property of others

Page 6

Intel® Xeon® Processor E5-2600 Product Family

New Intel® Integrated I/O

1st server processor with Intel® Integrated I/O

Reduces I/O latency by as much as 30%¹

Improves I/O bandwidth by as much as 2x with PCI Express* 3.0 support²

1 Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing Intel® Xeon® processor E5-2600 product family (230 ns) vs. Intel® Xeon® processor 5500 series (340 ns). See notes in backup for configuration details

2 Source: 8 GT/s and 128b/130b encoding in the PCIe* 3.0 specification enables double the interconnect bandwidth over the PCIe* 2.0 specification (www.pcisig.com/news_room/November_18_2010_Press_Release/).
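Both footnoted figures can be checked with back-of-envelope arithmetic. The sketch below assumes the standard PCIe 2.0 parameters (5 GT/s with 8b/10b encoding), which the slide does not state. It counts only line-encoding overhead and ignores packet and protocol overhead, so it is a rough illustration rather than a bandwidth model.

```python
# Back-of-envelope check of the slide's I/O footnotes (illustrative sketch only).
# Assumption: PCIe 2.0 = 5 GT/s with 8b/10b encoding (standard value, not on the slide).
# Only line-encoding overhead is counted; TLP/DLLP protocol overhead is ignored.


def pcie_lane_gbps(gt_per_s: float, payload_bits: int, line_bits: int) -> float:
    """Usable per-lane, per-direction bandwidth in Gb/s after line encoding."""
    return gt_per_s * payload_bits / line_bits


def latency_reduction(old_ns: float, new_ns: float) -> float:
    """Fractional reduction when latency drops from old_ns to new_ns."""
    return (old_ns - new_ns) / old_ns


gen2 = pcie_lane_gbps(5.0, 8, 10)      # PCIe 2.0 lane
gen3 = pcie_lane_gbps(8.0, 128, 130)   # PCIe 3.0 lane

if __name__ == "__main__":
    print(f"PCIe 2.0 lane: {gen2:.2f} Gb/s")
    print(f"PCIe 3.0 lane: {gen3:.2f} Gb/s ({gen3 / gen2:.2f}x)")
    print(f"I/O read latency, 340 ns -> 230 ns: {latency_reduction(340, 230):.1%} lower")
```

Under these assumptions, a PCIe 3.0 lane carries about 7.88 Gb/s against 4 Gb/s for PCIe 2.0, just under 2x. Going from 340 ns to 230 ns is roughly a 32% reduction, which is consistent with the "as much as 30%" wording.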


Intel® Integrated I/O


Page 7

Intel® Xeon® Processor E5-2600 Product Family


[Chart: I/O transactions per second, Xeon 5600 series vs. Xeon E5-2600 family]

• Sends I/O directly to and from processor cache for all I/O traffic types

• Can allow system memory to remain in a low-power state

• Reduces latency by eliminating unneeded trips to memory

New Intel® Data Direct I/O Technology (Intel® DDIO)

Can more than double I/O performance¹

1 Up to 2.3x I/O performance, based on single-socket (1S) Xeon processor 5600 series vs. 1S Xeon processor E5-2600 data for an L2 forwarding test using 8x10GbE ports. See notes in backup for configuration details.


Page 8

Intel® Xeon® Processor E5-2600 Product Family


Up to 4 channels of DDR3 1600 MHz memory

Up to 8 cores

Up to 20 MB cache

Integrated PCI Express* 3.0: up to 40 lanes per socket

The Heart of a Next-Generation Data Center

1 Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012.

2 Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing Intel® Xeon® processor E5-2600 product family (230 ns) vs. Intel® Xeon® processor 5500 series (340 ns). See notes in backup for configuration details


• Up to 80% performance boost vs. prior generation¹

• Dramatically reduce compute time with Intel® Advanced Vector Extensions

• Performance when you need it with Intel® Turbo Boost Technology 2.0

• Intel® Integrated I/O with Intel® Data Direct I/O cuts latency² while adding capacity & bandwidth


Page 9


• Focused on high-performance network solutions

– Server adapters and software

– Supporting mission critical applications

• Trading / Market Data

• HPC Storage

• Cloud / Virtualization

• Big Data

• Leader in financial services

– Powering Tier 1 global exchanges

– Many top commercial banks / trading firms

• Growing position in Media / HPC / Oil & Gas

• World class delivery

– Global OEM/VAR and distributors

– Direct 24x7 Global support

“Solarflare’s product, EnterpriseOnload is a robust, rigorously tested and fully supported solution that addresses our demanding support and service level requirements. In addition to providing the highest-performance, lowest-latency hardware, Solarflare’s unique and innovative application acceleration software can be used to deploy quickly without any need to re-write our applications.”

Andrew Bach Senior Vice President of Network Services for NYSE Euronext

Introducing Solarflare

Page 10


Solarflare Server Adapters

[Figure: Solarflare adapter family: Single Port SFP+; Dual Port SFP+; Single Port 10GBASE-T; Dual Port 10GBASE-T; Dual Port SFP+ Precision Time; Quad Port IBM Mezzanine Card; Dual Port Dell DCS Card; HP Blade Mezz Card]

• Full range of products

– Common driver support

– Onload Server Adapter product line

• Delivers best latency performance

– Performant Server Adapter product line

• Optimized for virtualization, cloud, HPC, grid

• High performance

– Rich set of stateless offloads

• LRO, TSO, RSS, RFS

– Microarchitecture designed for low latency

– Cut-through, state-machine-centric data path

• Highly scalable virtualized architecture

– 2048 virtual NIC instances

– SR-IOV

• Lowest power in the industry

– <2.5 W/port (SFP+)
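RSS (receive-side scaling) spreads incoming flows across receive queues, and therefore across cores, by hashing each packet's addresses and ports. The sketch below implements the Toeplitz hash and uses the default key published in Microsoft's RSS specification. Whether a particular adapter and driver use this key, and how they map hash values onto queues, depends on configuration and is assumed here. The modulo in `pick_queue` (a hypothetical helper) simplifies the indirection table that real hardware uses.

```python
import ipaddress
import struct

# Default 40-byte RSS key published in Microsoft's RSS specification.
# Assumption: the adapter/driver is configured with this key; many allow changing it.
MS_RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for every set input bit (MSB first), XOR in the
    32-bit window of the key that starts at that bit position."""
    key_bits = len(key) * 8
    if len(data) * 8 + 32 > key_bits:
        raise ValueError("key too short for input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def ipv4_tcp_tuple(src: str, dst: str, sport: int, dport: int) -> bytes:
    """RSS input for IPv4/TCP: source addr, destination addr, source port, destination port."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!HH", sport, dport))


def pick_queue(rss_hash: int, n_queues: int) -> int:
    """Simplified queue choice; real NICs index an indirection table with low hash bits."""
    return rss_hash % n_queues
```

Every packet of a given TCP connection produces the same hash, so the flow stays on one receive queue and one core. This is why the later advice about pinning consumers to the right core pays off. The function can be checked against the verification examples in the RSS specification.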

Page 11


Precision Time Adapters

• Adapters implement IEEE 1588 PTP to provide precision host clock synchronization

– Hardware time stamping of PTP packets

– Stratum 3 oscillator maintains a high degree of precision

– Solarflare-provided (and maintained) PTPd stack

– Open platform (compatible with third-party PTPd stacks)

– Compatible with standard Solarflare drivers

• Two-stage approach provides unmatched accuracy and stability

– Server clock synchronized to the precision Stratum 3 adapter clock

– Adapter clock synchronized to the PTP master clock

– Maintains accuracy within ±200 ns

• SFN6322F PTP server adapter

– Based on SFN6122F

• Same performance and latency characteristics

• Compatible with OpenOnload

SFN6322F
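PTP synchronization rests on the standard IEEE 1588 delay request-response exchange. The sketch below shows that textbook calculation, not Solarflare's PTPd code. From the four timestamps of one exchange it recovers the slave clock's offset and the mean path delay, assuming the path delay is the same in both directions. Hardware time stamping helps because the timestamps are taken at the adapter rather than after variable software delays. The class and function names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DelayReqExchange:
    """Timestamps (ns) from one IEEE 1588 Sync / Delay_Req exchange."""
    t1: int  # master sends Sync            (master clock)
    t2: int  # slave receives Sync          (slave clock)
    t3: int  # slave sends Delay_Req        (slave clock)
    t4: int  # master receives Delay_Req    (master clock)


def offset_and_delay(x: DelayReqExchange) -> tuple[float, float]:
    """Return (slave offset from master, mean one-way path delay) in ns.

    Assumes the network path delay is the same in both directions.
    """
    master_to_slave = x.t2 - x.t1   # = path delay + offset
    slave_to_master = x.t4 - x.t3   # = path delay - offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay
```

Take a slave clock running 150 ns ahead over a 1,000 ns path. It would record t1=0, t2=1150, t3=5000, t4=5850, and `offset_and_delay` returns (150.0, 1000.0). Any asymmetry in the path appears directly as offset error, which is one reason timestamp placement matters at the ±200 ns level.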

Page 12


OpenOnload® Application Acceleration Software

• Application Acceleration

• TCP/IP, UDP and multicast acceleration

• Streamlines and reduces interrupts, context switches and data copies

• Reduces latency by 50%, increases message rates 3x or more

• Seamlessly integrates into existing infrastructure

• Binary compatible with industry standard APIs

• No software modifications are needed

• Standards-based solution uses TCP/IP and UDP

• No specialized protocols needed

• Compatible with existing Ethernet infrastructure

• Open source GPLv2 / LGPL

• Global 24x7 support available
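OpenOnload is binary compatible with the standard sockets API, so an ordinary sockets program needs no changes. The sketch below is a plain UDP ping-pong latency test in Python, with hypothetical function names. It defaults to loopback so that it runs anywhere, but loopback figures say nothing about NIC latency. A meaningful test would run the echo side on a second host across the Solarflare adapters and start the unmodified script under the `onload` launcher, e.g. `onload python3 pingpong.py` (the file name is hypothetical).

```python
import socket
import threading
import time


def _echo(sock: socket.socket, count: int) -> None:
    """Reflect each datagram back to its sender."""
    for _ in range(count):
        data, peer = sock.recvfrom(2048)
        sock.sendto(data, peer)


def udp_ping_pong(count: int = 1000, payload: bytes = b"x" * 32) -> list:
    """Return round-trip times in microseconds for `count` UDP exchanges.

    Uses only the standard BSD sockets API; nothing here is Onload-specific.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))            # let the OS choose a free port
    server_addr = server.getsockname()
    echo = threading.Thread(target=_echo, args=(server, count), daemon=True)
    echo.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)                   # fail rather than hang if a datagram is lost
    rtts_us = []
    try:
        for _ in range(count):
            start = time.perf_counter_ns()
            client.sendto(payload, server_addr)
            client.recvfrom(2048)
            rtts_us.append((time.perf_counter_ns() - start) / 1000.0)
    finally:
        client.close()
    echo.join(timeout=2.0)
    server.close()
    return rtts_us


if __name__ == "__main__":
    samples = sorted(udp_ping_pong())
    print(f"median RTT: {samples[len(samples) // 2]:.1f} us")
```

Only one message is in flight at a time, so this measures latency, not message rate. A throughput test in the style of sfnt-stream would keep many messages outstanding.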

Page 13


SFN6122F & Xeon E5-2600 Deliver Winning Combination

• SFN6122F single-stream latency is superb across all message rates on Romley platforms, right up to the point where the CPU core is fully utilized

• Ultra-low jitter (sub-microsecond at the 99th percentile)

• Benefits from Intel® Data Direct I/O (DDIO) and chipset I/O-to-memory bandwidth

• Message rate headroom: 20 Mpps with 4x sfnt-stream instances

[Chart: “Lowest latency at highest message rate”, sfnt-stream / openonload-201109-u2. “Westmere” = 2x Xeon 5687 (3.6GHz); “Romley” = 2x E5-2687W (3.1GHz), DDR 1333]
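Jitter claims such as "sub-microsecond at the 99th percentile" come from percentiles of many latency samples, not from averages. The sketch below computes nearest-rank percentiles from any list of latency samples. Definitions of jitter vary, so reporting p99 minus the median is an assumption made here for illustration. For scale, if the 20 Mpps were spread evenly over the four streams (also an assumption), each stream would carry 5 Mpps, one message every 200 ns.

```python
import math


def percentile(sorted_samples: list, p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of an already-sorted list."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def jitter_summary(latencies_us: list) -> dict:
    """Latency distribution summary; 'jitter' here is p99 minus median (one common choice)."""
    s = sorted(latencies_us)
    p50, p99 = percentile(s, 50), percentile(s, 99)
    return {"min": s[0], "p50": p50, "p99": p99, "max": s[-1], "jitter_p99_p50": p99 - p50}
```

The maximum is reported separately because a single outlier, such as an interrupt or an SMI, can dominate it while leaving p99 untouched.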

Page 14


The lowest TCP and UDP latency

Page 15


Bonding + VLANs + epoll and the lowest jitter

Page 16


The highest message rates

Page 17


• Resource contention

– Threads fighting for access to CPU

– Threads fighting for access to critical sections

– Running out of memory!

– Fix this by dedicating resources to critical threads, including:

• Memory

• CPU cores

• Onload stacks

• Queuing delays

– If you’re keeping up with the incoming rate, latency is generally good

– If you fall behind, you get queuing delays

– Fix this by:

• Making each thread more efficient (hard)

• Going parallel / hardware assist (very hard)

What are the causes of latency jitter?
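The difference between keeping up and falling behind can be made concrete with a deterministic single-server FIFO model that has a fixed gap between arrivals and a fixed processing time per message. This is a hypothetical toy with made-up parameters, not a measurement. Real feeds are bursty, so queues also build briefly even when the average rate is sustainable.

```python
def queueing_delays(n: int, interarrival_ns: float, service_ns: float) -> list:
    """Waiting time of each of n messages in a single-server FIFO queue
    with deterministic arrivals and deterministic service."""
    delays = []
    server_free_at = 0.0
    for i in range(n):
        arrival = i * interarrival_ns
        start = max(arrival, server_free_at)   # wait if the thread is still busy
        delays.append(start - arrival)
        server_free_at = start + service_ns
    return delays


if __name__ == "__main__":
    keeping_up = queueing_delays(1000, interarrival_ns=200, service_ns=150)
    falling_behind = queueing_delays(1000, interarrival_ns=200, service_ns=250)
    print(max(keeping_up), falling_behind[-1])
```

When messages arrive every 200 ns and each needs 150 ns of work, no message ever waits. At 250 ns of work the backlog grows by 50 ns per message, so after 1,000 messages the last one waits almost 50 µs. Small per-message inefficiencies therefore turn into large latency once the arrival rate is exceeded.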

Page 18


Moving to the new platform?

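[Figure: Westmere 2xCPU vs. Romley 2xCPU. On Westmere, sockets S1 and S2 reach NICs N1 and N2 through a shared IOH; on Romley, each NIC attaches directly to one socket.]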

• Switching from SFN5xxx to SFN6xxx, or from Westmere to Romley?

– To first order, nothing changes

• Same methodology for Onload tuning

– But be aware of PCIe slot affinity

• Westmere two-processor machines: shared IOH, symmetric performance

• Romley two-processor machines: asymmetric performance, because each PCIe slot is local to one socket

Page 19


Additional Romley Tuning

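[Figure: Westmere 2xCPU vs. Romley 2xCPU. On Westmere, sockets S1 and S2 reach NICs N1 and N2 through a shared IOH; on Romley, each NIC attaches directly to one socket.]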

• Check that the NIC is plugged into a PCIe slot that is NUMA-local to the application threads processing data from that NIC

• If using interrupts, check that they are directed to a core on the same NUMA node

• If running a real-time (RT) kernel, ensure softirq threads are pinned to the same core as the interrupts (start with nothing pinned!)
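The first two checks can be scripted on Linux from sysfs and procfs. The sketch assumes a Linux host. `/sys/class/net/<ifname>/device/numa_node`, `/sys/devices/system/node/node<N>/cpulist` and `/proc/irq/<irq>/smp_affinity_list` are standard kernel files, while the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as '0-7,16-23' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nic_local_cpus(ifname: str, sysfs: Path = Path("/sys")) -> set[int] | None:
    """CPUs on the NUMA node the NIC's PCIe slot hangs off, or None if unknown."""
    node_file = sysfs / "class/net" / ifname / "device/numa_node"
    try:
        node = int(node_file.read_text())
    except (OSError, ValueError):
        return None                    # no such interface, or not a PCI device
    if node < 0:                       # -1: kernel reports no NUMA affinity
        return None
    cpulist = sysfs / f"devices/system/node/node{node}/cpulist"
    return parse_cpulist(cpulist.read_text())


def irq_is_numa_local(irq: int, local_cpus: set[int]) -> bool:
    """True if every CPU this IRQ may be delivered to is in local_cpus."""
    text = Path(f"/proc/irq/{irq}/smp_affinity_list").read_text()
    return parse_cpulist(text) <= local_cpus
```

A typical use is `nic_local_cpus("eth2")`, where `eth2` is a hypothetical interface name. The application threads should then be pinned to CPUs in the returned set. The NIC's IRQ numbers can be found in `/proc/interrupts` and checked with `irq_is_numa_local`.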

Page 20


Maximizing Performance involves “System Level” optimizations

• OEM BIOS settings: SMI, Hyper-Threading, C-states: all off

– Experiment with EIST (Enhanced Intel SpeedStep) and Turbo on/off

• On the application: maximize your resources by…

1. Pinning threads, interrupts, and processes to individual cores by CPU ID

2. Placing threads that perform “communication” functions on adjacent cores

3. Using Intel PCM (Performance Counter Monitor) to measure L3 cache misses, and keeping data in L3 cache

http://software.intel.com/file/41604

4. Compiling with performance settings, using profile-guided optimization (PGO), and evaluating IPP / SSE 4.2 string instructions

http://software.intel.com/en-us/articles/using-avx-without-writing-avx-code/

• Determine how many cores your trading strategy requires

1. Can it run on 8 cores? If so, match one CPU socket and its local NIC to each strategy

https://access.redhat.com/knowledge/solutions/53031

How to achieve the best performance - Intel

Enlist Solarflare and Intel for help. We are eager to engage.
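Step 1 (pinning) can be done from the shell with `taskset -c <cpu> <command>`, or from inside the application. The following is a minimal Linux-only sketch using Python's `os.sched_setaffinity`. The helper names `pin_current_thread` and `run_pinned` are hypothetical. On Linux, a pid of 0 refers to the calling thread, so each worker can pin itself without changing the affinity of the rest of the process. Interrupts are steered separately, by writing a CPU list to `/proc/irq/<irq>/smp_affinity_list`.

```python
import os
import threading


def pin_current_thread(cpu: int) -> None:
    """Restrict the calling thread to one CPU (Linux only).

    On Linux, pid 0 in sched_setaffinity means "the calling thread", so a
    worker can pin itself without changing the affinity of other threads.
    """
    os.sched_setaffinity(0, {cpu})


def run_pinned(cpu: int, fn, *args):
    """Run fn(*args) in a new thread pinned to `cpu`; return (result, affinity seen)."""
    out = {}

    def worker() -> None:
        pin_current_thread(cpu)
        out["affinity"] = os.sched_getaffinity(0)
        out["result"] = fn(*args)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return out["result"], out["affinity"]


if __name__ == "__main__":
    cpu = min(os.sched_getaffinity(0))   # a CPU this process is allowed to use
    print(run_pinned(cpu, sum, range(1000)))
```

In a real deployment, the CPU would come from the NUMA node local to the NIC rather than from `min()`. Critical threads would also get dedicated cores that nothing else is scheduled on.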

Page 21


• Find support from Intel and others at @finteligent

• Debate critical industry questions

• Interact with your peers across the globe.

Join The Conversation & Find Support

Page 22

Q & A

Page 23

Thank You!

For Joining this Event

(A recording will be available later)