Presenters: Seth Howell, Ziye Yang Company: Intel


Page 1

Presenters: Seth Howell, Ziye Yang

Company: Intel

Page 2

Notices and Disclaimers

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration.

No computer system can be absolutely secure.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. For more complete information about performance and benchmark results, visit http://www.intel.com/benchmarks .

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/benchmarks .

Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system.

Intel® Advanced Vector Extensions (Intel® AVX)* provides higher throughput to certain processor operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration and you can learn more at http://www.intel.com/go/turbo.

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

© 2018 Intel Corporation. Intel, the Intel logo, and Intel Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as property of others.

Page 3

Agenda

• SPDK NVMe-oF development history & status
• SPDK RDMA transport enhancement
• SPDK TCP transport introduction
• Conclusion

Page 4

Agenda

• SPDK NVMe-oF development history & status
• SPDK RDMA transport enhancement
• SPDK TCP transport introduction
• Conclusion

Page 5

SPDK NVMe-oF Target Timeline

• July 2016: Released with RDMA transport support
• 17.03 – 17.07: Functional hardening
• 17.11 – 18.11: RDMA transport improvements
• 19.01: TCP transport released
• 19.04: Continuing improvement

Page 6

SPDK NVMe-oF Host Timeline

• Dec 2016: Released with RDMA transport support
• 17.03 – 17.07: Functional hardening (e.g., interoperability testing with the kernel target)
• 17.11 – 18.11: RDMA transport improvements
• 19.01: TCP transport released
• 19.04: Continuing improvement

Page 7

SPDK NVMe-oF target design highlights

NVMe* over Fabrics Target Feature | Performance Benefit
Utilizes the user-space NVM Express* (NVMe) polled-mode driver | Reduced overhead per NVMe I/O
Group polling on each SPDK thread (bound to a CPU core) for multiple transports | No interrupt overhead
Connections pinned to a dedicated SPDK thread | No synchronization overhead
Asynchronous NVMe command handling across the entire life cycle | No locks in the NVMe command data handling path
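To make the group-polling and connection-pinning model above concrete, here is a minimal conceptual sketch in C. It is not the actual SPDK target code: the poll_group and conn types and the transport_poll callback are illustrative assumptions, and the real target drives this loop through SPDK's thread and poller framework.

    /* Conceptual sketch of the per-core polling model described above.
     * The names here (poll_group, conn, transport_poll) are illustrative
     * assumptions, not real SPDK APIs. */
    #include <stdbool.h>
    #include <stddef.h>

    struct conn {
        int (*transport_poll)(struct conn *c);  /* e.g., drain an RDMA CQ or a TCP socket */
        void *transport_ctx;
    };

    struct poll_group {
        struct conn  *conns;      /* connections owned exclusively by this core's thread */
        size_t        num_conns;
        volatile bool running;
    };

    /* Busy-poll loop run on each SPDK thread (one thread per CPU core).
     * There are no interrupts and no locks: every connection is pinned to
     * exactly one poll group, so command handling needs no synchronization. */
    void poll_group_run(struct poll_group *pg)
    {
        while (pg->running) {
            for (size_t i = 0; i < pg->num_conns; i++) {
                pg->conns[i].transport_poll(&pg->conns[i]);
            }
        }
    }

Because each connection belongs to exactly one poll group, the data path needs no locks, which corresponds to the "no synchronization overhead" row in the table above.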

Page 8

Agenda

• SPDK NVMe-oF development history & status
• SPDK RDMA transport enhancement
• SPDK TCP transport introduction
• Conclusion

Page 9

Shared Receive Queue Support

• Disaggregate requests and buffers from queue pairs (see the verbs sketch below).
• Perform the majority of allocations once, at startup.
• Scaling connections becomes much cheaper.
• Can increase the cost of core scaling.
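At the verbs level, "disaggregating receives from queue pairs" means attaching every queue pair on a device to one shared receive queue (SRQ) and replenishing receive buffers there. The sketch below shows the relevant libibverbs/librdmacm calls; it is illustrative only (sizes are arbitrary, error handling and memory registration are omitted) and is not the SPDK RDMA transport code.

    /* Minimal ibverbs sketch: one SRQ shared by many QPs. */
    #include <stdint.h>
    #include <infiniband/verbs.h>
    #include <rdma/rdma_cma.h>

    struct ibv_srq *create_shared_recv_queue(struct ibv_pd *pd, uint32_t depth)
    {
        struct ibv_srq_init_attr attr = {
            .attr = {
                .max_wr  = depth,   /* receives shared across all connections */
                .max_sge = 1,
            },
        };
        return ibv_create_srq(pd, &attr);
    }

    /* When accepting a new connection, point its QP at the shared SRQ instead
     * of giving it a private receive queue. */
    int create_qp_on_srq(struct rdma_cm_id *id, struct ibv_pd *pd,
                         struct ibv_cq *cq, struct ibv_srq *srq)
    {
        struct ibv_qp_init_attr qp_attr = {
            .send_cq = cq,
            .recv_cq = cq,
            .srq     = srq,          /* receives are consumed from the SRQ */
            .qp_type = IBV_QPT_RC,
            .cap     = { .max_send_wr = 128, .max_send_sge = 1 },
        };
        return rdma_create_qp(id, pd, &qp_attr);
    }

    /* Receive buffers are replenished on the SRQ, not per connection, so the
     * receive-side memory footprint no longer grows with the connection count. */
    int repost_recv(struct ibv_srq *srq, struct ibv_sge *sge, uint64_t wr_id)
    {
        struct ibv_recv_wr wr = { .wr_id = wr_id, .sg_list = sge, .num_sge = 1 };
        struct ibv_recv_wr *bad_wr = NULL;
        return ibv_post_srq_recv(srq, &wr, &bad_wr);
    }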

Page 10

NVMe-oF RDMA Transport Target Architecture

[Diagram] Shared resources at the RDMA transport level: buffers for IOVs.
• Network Device 1 (SRQ support)
  - Poller 1 (Poll Group 1). Shared resources: recvs, reqs, in-capsule data buffers
    - QP1. Resources: qpair struct
    - QP2. Resources: qpair struct
  - Poller 2 (Poll Group 2). Shared resources: recvs, reqs, in-capsule data buffers
    - QP3. Resources: qpair struct
• Network Device 2 (no SRQ support)
  - Poller 3 (Poll Group 1). Shared resources: none
    - QP1. Resources: qpair struct, recvs, reqs, in-capsule data buffers

Page 11

Scaling Connections and Cores

[Chart] Connection scaling, traditional vs. SRQ: memory overhead (MiB) vs. number of qpairs (8 to 64), series Traditional and Shared.
[Chart] Core scaling, traditional vs. SRQ: memory overhead (MiB) vs. number of cores (1 to 8), series Traditional and SRQ.

Page 12

Further Enhancements

Completed
• Send With Invalidate support
• Multi-SGE in both target and initiator (see the verbs sketch after this list)

Ongoing and Future
• Support for an increasing number of NICs from multiple vendors
• SEND and RECV operation batching
• Automated performance testing
• Extended verbs interface
• NVMe-oF offload support
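Both completed items map onto standard libibverbs features. The sketch below posts a send work request that gathers from two SGEs, and a response send that also invalidates the host's remote key; it is illustrative only and does not reproduce SPDK's RDMA transport code (QP setup, memory registration, and error handling are omitted).

    /* Illustrative libibverbs usage for the two completed items above. */
    #include <string.h>
    #include <infiniband/verbs.h>

    /* Multi-SGE: one work request gathering its payload from two registered buffers. */
    int post_multi_sge_send(struct ibv_qp *qp,
                            struct ibv_sge *sge0, struct ibv_sge *sge1)
    {
        struct ibv_sge sgl[2] = { *sge0, *sge1 };
        struct ibv_send_wr wr;
        struct ibv_send_wr *bad_wr = NULL;

        memset(&wr, 0, sizeof(wr));
        wr.opcode     = IBV_WR_SEND;
        wr.sg_list    = sgl;
        wr.num_sge    = 2;            /* gather from both SGEs in one WR */
        wr.send_flags = IBV_SEND_SIGNALED;
        return ibv_post_send(qp, &wr, &bad_wr);
    }

    /* Send With Invalidate: the response SEND also invalidates the host's rkey,
     * saving the host a separate invalidation round trip. */
    int post_send_with_invalidate(struct ibv_qp *qp, struct ibv_sge *sge,
                                  uint32_t remote_rkey)
    {
        struct ibv_send_wr wr;
        struct ibv_send_wr *bad_wr = NULL;

        memset(&wr, 0, sizeof(wr));
        wr.opcode          = IBV_WR_SEND_WITH_INV;
        wr.invalidate_rkey = remote_rkey;
        wr.sg_list         = sge;
        wr.num_sge         = 1;
        wr.send_flags      = IBV_SEND_SIGNALED;
        return ibv_post_send(qp, &wr, &bad_wr);
    }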

Page 13

Agenda

• SPDK NVMe-oF development history & status
• SPDK RDMA transport enhancement
• SPDK TCP transport introduction
• Conclusion

Page 14

General design and implementation

• Follow the SPDK transport abstraction:
  • Host side code: lib/nvme/nvme_tcp.c
  • Target side code: lib/nvmf/tcp.c

[Diagram] Transport abstraction spanning the FC, RDMA, and TCP transports; the TCP transport can run on either the POSIX or the VPP socket stack.

Page 15

Performance design considerations for the TCP transport on the target side

Ingredient | Methodology
Design framework | Follow the general SPDK NVMe-oF framework (e.g., polling groups)
TCP connection optimization | Use the SPDK encapsulated socket API, preparing to integrate other stacks, e.g., VPP (a sketch follows below)
NVMe/TCP PDU handling | Track it with a state machine
NVMe/TCP request life cycle | Track it with a state machine (purpose: easy to debug and useful for further performance improvement)
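As a rough illustration of the encapsulated socket API row above, the sketch below accepts new TCP connections and polls them through an spdk_sock_group on one thread. It follows SPDK's public sock.h interface, but exact signatures can differ between releases, and all error handling is omitted.

    /* Sketch of the SPDK encapsulated socket API (spdk/sock.h) used for group
     * polling of NVMe/TCP connections. Simplified and not the actual target code. */
    #include "spdk/sock.h"

    /* Callback run by spdk_sock_group_poll() when a socket has data. */
    static void tcp_sock_cb(void *arg, struct spdk_sock_group *group,
                            struct spdk_sock *sock)
    {
        char buf[8192];
        ssize_t n;

        (void)arg;
        (void)group;
        n = spdk_sock_recv(sock, buf, sizeof(buf));
        if (n > 0) {
            /* Feed the bytes into the PDU-receiving state machine (next slides). */
        }
    }

    /* One iteration of the target's TCP poller on a given SPDK thread. */
    void tcp_target_poll_once(struct spdk_sock *listen_sock,
                              struct spdk_sock_group *group)
    {
        /* Accept any pending connection and pin it to this thread's group. */
        struct spdk_sock *new_sock = spdk_sock_accept(listen_sock);

        if (new_sock != NULL) {
            spdk_sock_group_add_sock(group, new_sock, tcp_sock_cb, NULL);
        }

        /* Poll every socket in the group; callbacks run on this thread only. */
        spdk_sock_group_poll(group);
    }

Because the transport only talks to spdk_sock, swapping the kernel (POSIX) stack for VPP is a matter of selecting a different sock implementation rather than rewriting the transport.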

Page 16

TCP PDU receive handling for each connection

[State diagram] Ready -> Handle CH (common header) -> Handle PSH (PDU-specific header) -> Handle payload (if the PDU has a payload) -> back to Ready; PDUs with no payload return to Ready after the PSH. Any failure follows the error path into the error state. A sketch of this state machine follows below.
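A minimal sketch of such a receive state machine is shown below. The enum values mirror the states in the diagram; the tcp_conn type and the read_* helpers are illustrative stubs, not the actual definitions in lib/nvmf/tcp.c.

    /* Conceptual sketch of the per-connection PDU receive state machine above. */
    #include <stdbool.h>

    enum pdu_recv_state {
        PDU_RECV_READY,          /* waiting for the next PDU */
        PDU_RECV_HANDLE_CH,      /* reading the common header */
        PDU_RECV_HANDLE_PSH,     /* reading the PDU-specific header */
        PDU_RECV_HANDLE_PAYLOAD, /* reading the payload, if the PDU carries one */
        PDU_RECV_ERROR,          /* error path: tear down the connection */
    };

    struct tcp_conn {
        enum pdu_recv_state state;
        bool has_payload;        /* learned while parsing the headers */
        /* ... socket, partially assembled PDU, etc. ... */
    };

    /* Stub parsers: each would consume bytes from the socket and return true
     * once its part of the PDU is complete, or false to wait for more data. */
    static bool read_ch(struct tcp_conn *c)      { (void)c; return true; }
    static bool read_psh(struct tcp_conn *c)     { c->has_payload = true; return true; }
    static bool read_payload(struct tcp_conn *c) { (void)c; return true; }
    static bool pdu_is_valid(struct tcp_conn *c) { (void)c; return true; }

    void conn_process_rx(struct tcp_conn *c)
    {
        switch (c->state) {
        case PDU_RECV_READY:
            c->state = PDU_RECV_HANDLE_CH;
            break;
        case PDU_RECV_HANDLE_CH:
            if (!read_ch(c))
                break;                           /* wait for more bytes */
            c->state = pdu_is_valid(c) ? PDU_RECV_HANDLE_PSH : PDU_RECV_ERROR;
            break;
        case PDU_RECV_HANDLE_PSH:
            if (!read_psh(c))
                break;
            if (!pdu_is_valid(c))
                c->state = PDU_RECV_ERROR;
            else
                c->state = c->has_payload ? PDU_RECV_HANDLE_PAYLOAD
                                          : PDU_RECV_READY;   /* PDU complete */
            break;
        case PDU_RECV_HANDLE_PAYLOAD:
            if (!read_payload(c))
                break;
            c->state = PDU_RECV_READY;           /* PDU complete */
            break;
        case PDU_RECV_ERROR:
            /* Error path: close the connection and recycle its resources. */
            break;
        }
    }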

Page 17

SPDK NVMe-oF TCP request life cycle for each connection on the target side

Page 18

[State diagram] TCP request life cycle on the target side: Free -> New -> Need Buffer (if a buffer is needed) -> Ready to Execute in NVMe -> Executing in NVMe driver -> Executed in NVMe driver -> Ready to Complete -> Transfer response PDU to host -> Completed -> resources recycled back to Free. Write commands that require an R2T pass through Pending R2T (send R2T) and Transfer Data from Host (get the data for the write command) before execution; read commands and writes without an R2T proceed directly. Invalid commands and data-transfer errors take the error path. A sketch of these states follows below.
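One way to encode that life cycle is a per-request state enum, sketched below. The names roughly follow the states in the diagram and are illustrative, not the actual constants in lib/nvmf/tcp.c.

    /* Illustrative request life-cycle states for the diagram above. The real
     * constants in lib/nvmf/tcp.c may differ in naming and granularity. */
    enum tcp_req_state {
        TCP_REQ_FREE = 0,                /* sitting in the free pool */
        TCP_REQ_NEW,                     /* command capsule received */
        TCP_REQ_NEED_BUFFER,             /* waiting for a data buffer */
        TCP_REQ_PENDING_R2T,             /* write command: an R2T must be sent */
        TCP_REQ_TRANSFER_DATA_FROM_HOST, /* receiving write data from the host */
        TCP_REQ_READY_TO_EXECUTE,        /* handed to the NVMe layer next */
        TCP_REQ_EXECUTING,               /* in flight in the NVMe driver */
        TCP_REQ_EXECUTED,                /* completion received from the driver */
        TCP_REQ_READY_TO_COMPLETE,       /* response PDU being built */
        TCP_REQ_TRANSFERRING_RESPONSE,   /* response PDU being sent to the host */
        TCP_REQ_COMPLETED,               /* done; resources recycled to FREE */
        TCP_REQ_ERROR,                   /* invalid command or data-transfer error */
    };

Tracking every request with an explicit state makes hangs easy to debug (dump the state of each outstanding request) and exposes the hot transitions that later performance work can target, which is the stated purpose of the state-machine design.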

Page 19

SPDK Target side (TCP transport): I/O scaling

System configuration: (1) Target: server platform: SuperMicro SYS2029U-TN24R4T; 2x Intel® Xeon® Platinum 8180 CPU @ 2.50 GHz, Intel® Speed Step enabled, Intel® Turbo Boost Technology enabled, 4x 2GB DDR4 2666 MT/s, 1 DIMM per channel; 2x 100GbE Mellanox ConnectX-5 NICs; Fedora 28, Linux kernel 5.05, SPDK 19.01.1; 6x Intel® P4600TM P4600x 2.0TB; (2) initiator: Server platform: SuperMicro SYS-2028U TN24R4T+; 44x Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz (HT off); 1x 100GbE Mellanox ConnectX-4 NIC; Fedora 28, Linux kernel 5.05, SPDK 19.0.1. (3) : Fio ver: fio-3.3; Fio workload: blocksize=4k, iodepth=1, iodepth_batch=128, iodepth_low=256, ioengine=libaio or SPDK bdevengine, size=10G, ramp_time=0, run_time=300, group_reporting, thread, direct=1, rw=read/write/rw/randread/randwrite/randrw

[Chart] 4K 100% random, 70% read / 30% write, QD=1: throughput (IOPS in k) and average latency (usec) vs. number of CPU cores (1 to 4). Throughput: 120.5, 151.2, 153.5, 158.2; average latency: 132.4, 105.4, 103.8, 100.8.

[Chart] 4K 100% random, 70% read / 30% write, QD=32: throughput (IOPS in k) and average latency (usec) vs. number of CPU cores (1 to 4). Throughput: 229.4, 475.2, 672.0, 812.6; average latency: 2224.0, 1067.5, 754.1, 624.5.

Page 20

SPDK host side (TCP transport): I/O scaling

System configuration: (1) Target: server platform: SuperMicro SYS2029U-TN24R4T; 2x Intel® Xeon® Platinum 8180 CPU @ 2.50 GHz, Intel® Speed Step enabled, Intel® Turbo Boost Technology enabled, 4x 2GB DDR4 2666 MT/s, 1 DIMM per channel; 2x 100GbE Mellanox ConnectX-5 NICs; Fedora 28, Linux kernel 5.05, SPDK 19.01.1; 6x Intel® P4600TM P4600x 2.0TB; (2) initiator: Server platform: SuperMicro SYS-2028U TN24R4T+; 44x Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz (HT off); 1x 100GbE Mellanox ConnectX-4 NIC; Fedora 28, Linux kernel 5.05, SPDK 19.0.1. (3) : Fio ver: fio-3.3; Fio workload: blocksize=4k, iodepth=1, iodepth_batch=128, iodepth_low=256, ioengine=libaio or SPDK bdevengine, size=10G, ramp_time=0, run_time=300, group_reporting, thread, direct=1, rw=read/write/rw/randread/randwrite/randrw

[Chart] 4K random, 70% read / 30% write, QD=256: throughput (IOPS in k) and average latency (usec) vs. number of CPU cores (1 to 4). Throughput: 192.1, 401.4, 603.8, 845.4; average latency: 1324.5, 1233.9, 1217.4, 1140.8.

Page 21

Latency comparison between SPDK and the kernel

System configuration: (1) Target: server platform: SuperMicro SYS2029U-TN24R4T; 2x Intel® Xeon® Platinum 8180 CPU @ 2.50 GHz, Intel® Speed Step enabled, Intel® Turbo Boost Technology enabled, 4x 2GB DDR4 2666 MT/s, 1 DIMM per channel; 2x 100GbE Mellanox ConnectX-5 NICs; Fedora 28, Linux kernel 5.05, SPDK 19.01.1; 6x Intel® P4600TM P4600x 2.0TB; (2) initiator: Server platform: SuperMicro SYS-2028U TN24R4T+; 44x Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz (HT off); 1x 100GbE Mellanox ConnectX-4 NIC; Fedora 28, Linux kernel 5.05, SPDK 19.0.1. (3) : Fio ver: fio-3.3; Fio workload: blocksize=4k, iodepth=1, iodepth_batch=128, iodepth_low=256, ioengine=libaio or SPDK bdevengine, size=10G, ramp_time=0, run_time=300, group_reporting, thread, direct=1, rw=read/write/rw/randread/randwrite/randrw

Average latency comparisons, SPDK target vs. kernel target (latency in usecs):

Workload | SPDK Target | Kernel Target
4k 70% reads / 30% writes | 52.86 | 66.79
4k 100% random writes | 41.09 | 51.43
4k 100% random reads | 50.42 | 59.82

Average latency comparisons, SPDK initiator vs. kernel initiator (latency in usecs):

Workload | SPDK Initiator | Kernel Initiator
4k 70% reads / 30% writes | 40.99 | 52.86
4k 100% random writes | 31.14 | 41.09
4k 100% random reads | 35.58 | 50.42

Page 22

IOPS per core comparison between SPDK and the kernel on the target side

System configuration: (1) Target: server platform: SuperMicro SYS2029U-TN24R4T; 2x Intel® Xeon® Platinum 8180 CPU @ 2.50 GHz, Intel® Speed Step enabled, Intel® Turbo Boost Technology enabled, 4x 2GB DDR4 2666 MT/s, 1 DIMM per channel; 2x 100GbE Mellanox ConnectX-5 NICs; Fedora 28, Linux kernel 5.05, SPDK 19.01.1; 6x Intel® P4600TM P4600x 2.0TB; (2) initiator: Server platform: SuperMicro SYS-2028U TN24R4T+; 44x Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz (HT off); 1x 100GbE Mellanox ConnectX-4 NIC; Fedora 28, Linux kernel 5.05, SPDK 19.0.1. (3) : Fio ver: fio-3.3; Fio workload: blocksize=4k, iodepth=1, iodepth_batch=128, iodepth_low=256, ioengine=libaio or SPDK bdevengine, size=10G, ramp_time=0, run_time=300, group_reporting, thread, direct=1, rw=read/write/rw/randread/randwrite/randrw

[Chart] 4k random, 70% reads / 30% writes, QD=32: relative performance (IOPS per core) vs. number of connections (1, 4, 16, 32), with the kernel target normalized to 100% at each point. SPDK: 237.75%, 255.12%, 241.00%, 240.96%.

Page 23

Further development plan

• Continue enhancing functionality
  • Including compatibility testing against the Linux kernel solution
• Performance tuning
  • Deeper integration with the user-space stack: VPP + DPDK
  • Use NIC hardware features for performance improvement (e.g., VMA on Mellanox NICs, load-balancing features on Intel's 100 Gbit NICs)
  • Figure out offloading methods with hardware, e.g., FPGAs and SmartNICs

Page 24

Agenda

• SPDK NVMe-oF development history & status
• SPDK RDMA transport enhancement
• SPDK TCP transport introduction
• Conclusion

Page 25

Conclusion

• The SPDK NVMe-oF solution is well adopted by the industry. This presentation introduced:
  • RDMA transport enhancements
  • The newly developed TCP transport
• Further development
  • Continue following the NVMe-oF spec and adding more features
  • Continue performance enhancements and integration with other solutions
• Call for activity in the community
  • Bug reports, idea discussions, and patch submissions for NVMe-oF are welcome

Page 26

Page 27

Backup

Page 28

SPDK NVMe-oF Timeline

• NVMe over Fabrics Target
  • July 2016: Released with RDMA transport support
  • July 2016 – Oct 2018:
    • Hardening: e.g., Intel test infrastructure construction, discovery simplification, correctness & kernel interop
    • Performance improvements: e.g., read latency improvement, scalability validation, event framework enhancements, multiple-connection performance improvement with group polling
  • Jan 2019: Released with TCP transport support
    • Compatibility: supports both the kernel and SPDK hosts on the TCP transport
    • Optimization: supports integrating different TCP stacks (e.g., kernel, VPP)
• NVMe over Fabrics Host (Initiator)
  • Dec 2016: Released with RDMA transport support
  • Dec 2016 – Oct 2018:
    • Hardening: interoperability testing with the kernel/SPDK NVMe-oF target
    • Performance improvements: e.g., zero copy, SGL support
  • Jan 2019: Released with TCP transport support
    • Compatibility: supports both the kernel and SPDK targets on the TCP transport

Page 29

General design and implementation

• It is very easy to add a new transport to the SPDK NVMe-oF solution; we only need to follow the SPDK transport abstraction (a sketch follows below):
  • Host side: implement the transport functions defined in lib/nvme/nvme_transport.c; the code is in lib/nvme/nvme_tcp.c
  • Target side: implement the transport-related functions defined in lib/nvmf/transport.h; the code is in lib/nvmf/tcp.c

[Diagram] Transport abstraction spanning the FC, RDMA, and TCP transports; the TCP transport can run on either the POSIX or the VPP socket stack.
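To illustrate the shape of that abstraction, here is a simplified, hypothetical ops table for a target-side transport. It mimics the pattern of lib/nvmf/transport.h, but the struct and field names below are assumptions for illustration, not the exact SPDK definitions.

    /* Hypothetical, simplified transport ops table illustrating the abstraction
     * described above. The real interface lives in lib/nvmf/transport.h and its
     * names and fields differ; this is only a sketch of the pattern. */
    #include <stdint.h>

    struct transport;        /* one transport instance (e.g., the TCP transport) */
    struct transport_qpair;  /* one connection/queue pair */
    struct transport_pgroup; /* per-thread polling group */
    struct transport_req;    /* one NVMe-oF request */

    struct transport_ops {
        const char *name;                                        /* "RDMA", "TCP", "FC", ... */

        struct transport *(*create)(void *opts);
        void (*destroy)(struct transport *t);

        int  (*listen)(struct transport *t, const char *addr, uint16_t port);
        void (*accept)(struct transport *t);                     /* poll for new connections */

        struct transport_pgroup *(*poll_group_create)(struct transport *t);
        int  (*poll_group_add)(struct transport_pgroup *pg, struct transport_qpair *qp);
        int  (*poll_group_poll)(struct transport_pgroup *pg);    /* called from the SPDK thread */

        int  (*req_complete)(struct transport_req *req);         /* send the response to the host */
        void (*qpair_fini)(struct transport_qpair *qp);
    };

    /* A new transport (such as NVMe/TCP) plugs in by providing one such table;
     * the generic NVMe-oF target code never needs to know transport details. */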

Page 30

General design and implementation

• Performance design considerations for the TCP transport on the target side:
  • Follow the general SPDK NVMe-oF framework, e.g., an independent polling group on each thread.
  • TCP connection optimization (e.g., read/write) for each qpair.
    • Use the SPDK encapsulated socket API.
    • Purpose: preserve an interface so that users can plug in a different TCP stack (e.g., VPP) for further optimization.
  • Use a state machine to track the NVMe/TCP PDU receive status.
  • NVMe command handling on each qpair.
    • Use a state machine to track the life cycle of each NVMe request.
    • Benefit: easy to debug and a basis for further performance improvement.

Page 31

SPDK NVMe-oF TCP request life cycle for each connection on the target side

[State diagram] The same request life cycle shown on page 18: Free -> New -> Need Buffer -> (Pending R2T and Transfer Data from Host for write commands that need an R2T) -> Ready to Execute -> Executing in NVMe driver -> Executed -> Ready to Complete -> Transfer response PDU to host -> Completed -> resources recycled; invalid commands and data-transfer errors take the error path.

Page 32

Benchmarks – latency (to be updated)

System configuration: 44x Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz (HT off); Cores per socket: 22; 16x Samsung 8GB DDR4 @2400 1x Intel SSD DC P3700 Series 375GB @ FW E2010320 SPDK:18.10; Host Dist/Kernel: Fedora 28/Kernel 4.18.16-200; Guest Dist/Kernel: Fio ver: fio-3.6-34; Fio workload: blocksize=4k, iodepth=1, iodepth_batch=128, iodepth_low=256, ioengine=libaio, size=10G, ramp_time=0, run_time=300, group_reporting, thread, numjobs=1, direct=1, rw=read/write/rw/randread/randwrite/randrw

[Chart] Latency (us), QD=1, MLX5, I/O size = 4KB, RDMA vs. TCP vs. TCP(VMA):

Transport | Read | Write | RW | Randread | Randwrite | Randrw
RDMA | 77.25 | 10.59 | 48.9 | 21.59 | 10.34 | 88.37
TCP | 46.76 | 34.92 | 89.62 | 94.09 | 44.24 | 156.52
TCP(VMA) | 13.45 | 13.89 | 29.34 | 63.12 | 13.69 | 87.44

Page 33

Benchmarks – IOPS (to be updated)

System configuration: 44x Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz (HT off); Cores per socket: 22; 16x Samsung 8GB DDR4 @2400 1x Intel SSD DC P3700 Series 375GB @ FW E2010320 SPDK:18.10; Host Dist/Kernel: Fedora 28/Kernel 4.18.16-200; Guest Dist/Kernel: Fio ver: fio-3.6-34; Fio workload: blocksize=4k, iodepth=1, iodepth_batch=128, iodepth_low=256, ioengine=libaio, size=10G, ramp_time=0, run_time=300, group_reporting, thread, numjobs=1, direct=1, rw=randread/randwrite/randrw(50% read, 50% write)

[Chart] IOPS (K), QD=128, MLX5, I/O size = 4KB, RDMA vs. TCP vs. TCP(VMA):

Transport | Read | Write | RW | Randread | Randwrite | Randrw
RDMA | 705 | 421 | 381 | 479 | 348 | 300.2
TCP | 97 | 137 | 104.9 | 87 | 89.7 | 101.6
TCP(VMA) | 338 | 375 | 290 | 316 | 340 | 276

Page 34