

© 2018 ABC SYSTEMS AG. All Rights reserved. 11.4.18

"NVMe Takes It All, SCSI Has To Fall", freely adapted from ABBA

Brave New Storage World

Alexander Ruebensaal

Lugano April 2018


ABC Systems AG

Design, Implementation, Support & Operation of optimized IT Infrastructures

- HA & HP - allowing for fail-safe Transportation of the Applications … since 1981


64GB/s

In the Year 2012 …

NVMe PCIe SSD


EDSFF Enterprise & DataCenter SFF

[ Ruler ]

NGSFF Next Generation SFF

[ M.3 ]

Six Years After …

Non-Volatile Memory Express NVMe SSD

NVMe is a host controller interface for using SSDs natively over PCIe. Its main benefit is parallelism, which reduces I/O overhead and latency.

M.2

PCIe

U.2

2.5’’


EDSFF Enterprise & DataCenter SFF

[ Ruler ]

Why NGSFF and EDSFF?

NGSFF Next Generation SFF

[ M.3 ]

U.2 [ 2.5’’ ]

• Less complicated chassis
• Reduced component cost per SSD
• Simple hot swap with high density capabilities

• No costly drive cages with failure points
• No cables to SSDs
• Eliminate the backplane with cooling holes
• Simplified thermal implementation


INTEL OPTANE NVMe

PCIe AIC U.2 2.5’’


NVMe SSD against SAS SSD …

10x NVMe, 1U

24x NVMe, 2U

48x NVMe, 2U


NVMe Storage Protocol is designed to take full Advantage of Flash

NVMe supports 64K commands per queue (SAS 256, SATA 32) and up to 64K queues. These queues are designed such that I/O commands and responses to those commands operate on the same processor core and can take advantage of the parallel processing capabilities of multi-core processors. Each application or thread can have its own independent queue, so no I/O locking is required. NVMe also supports MSI-X and interrupt steering, which prevents bottlenecking at the CPU level and enables massive scalability as systems expand.

NVMe has a streamlined and simple command set that uses less than half the number of CPU instructions to process an I/O request that SAS or SATA does, providing higher IOPS per CPU instruction cycle and lower I/O latency in the host software stack. NVMe also supports enterprise features such as reservations and client features such as power management, extending the improved efficiency beyond just I/O.

Text & Graphics from http://nvmexpress.org
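
The queue figures quoted above can be put side by side with a little arithmetic. This is only an illustrative sketch: it assumes "64K" means 65,536 and treats SAS and SATA as single-queue interfaces, which is how the comparison on the slide reads. The names `INTERFACES` and `max_outstanding` are made up for this example.

```python
# Queue limits as quoted on the slide; "64K" is assumed to mean 65,536.
K = 1024

INTERFACES = {
    # name: (number of queues, commands per queue)
    "NVMe": (64 * K, 64 * K),
    "SAS": (1, 256),   # assumption: one command queue
    "SATA": (1, 32),   # assumption: one command queue
}


def max_outstanding(queues: int, depth: int) -> int:
    """Upper bound on commands in flight: queues times queue depth."""
    return queues * depth


for name, (queues, depth) in INTERFACES.items():
    total = max_outstanding(queues, depth)
    print(f"{name:5s} {queues:>6d} queue(s) x {depth:>6d} commands = {total:,d}")
```

The point of the comparison is less the absolute number than the shape: with one queue per core or thread, no queue has to be shared and therefore none has to be locked.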


PCIe 3.0 Bus - 64GB/s

A Single NVMe


NVMe uses CPU Lanes directly

CPU – Bus – NVMe Flash

or

CPU – Bus – FC-HBA – Switch(es) – FC-HBA – RAID Ctrl – SAS Enclosure – Disk

8x SAS SSDs already saturate the SAS bus …

-> Effect is reduced to Electronics vs Mechanics!

1 NVMe uses 4 CPU Lanes

Broadwell 40 Lanes
Skylake 48 Lanes
EPYC 128 Lanes
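
As a rough sketch of what these lane counts mean for direct-attached flash, the snippet below divides the lanes per CPU by the 4 lanes each NVMe drive uses. The `reserved_lanes` parameter is a hypothetical knob for lanes set aside for network adapters or other slots; it is not a figure from the slides.

```python
LANES_PER_NVME = 4  # from the slide: 1 NVMe uses 4 CPU lanes

# PCIe lanes per CPU as listed on the slide.
CPU_LANES = {"Broadwell": 40, "Skylake": 48, "EPYC": 128}


def max_direct_nvme(cpu_lanes: int, reserved_lanes: int = 0) -> int:
    """Drives attachable directly to CPU lanes after reserving some lanes."""
    return max(cpu_lanes - reserved_lanes, 0) // LANES_PER_NVME


for cpu, lanes in CPU_LANES.items():
    print(f"{cpu}: up to {max_direct_nvme(lanes)} NVMe drives per CPU")
```

In practice some lanes always go to networking, so the usable drive count is lower than this upper bound.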


NVMe – new Level of Performance


Intel Xeon Scalable Processors –F (With OmniPath)

• Single on-package OmniPath interface
• Incremental to existing 48 PCIe Lanes
• Single cable connection to QSFP I/O module
• Same socket for Skylake & Skylake-F processors


Are the NVMe too strong, are the CPU too weak …

Lanes: 1 CPU 48 – 1 NVMe 4


AMD EPYC CPU

8~32 “Zen” Cores

TDP 120W~180W

8 Memory Channels

Up to 2TB per CPU

Dedicated Security Engine

Lanes of High Bandwidth I/O 128

How to use them?

• 1x NVMe, 4 Lanes: < 3’500MB/s / 3’938MB/s, 89%
• 1x SATA SSD, 1 Lane (max. 32 directly supported by CPU): < 540MB/s / 985MB/s, 55%
• 1x 100GbE, 16 Lanes: < 12’500MB/s / 15’754MB/s, 79%
• 2x 25GbE, 8 Lanes: < 6’250MB/s / 7’877MB/s, 79%
• 2x 10GbE, 8 Lanes [standard Interface for comparison]: < 2’500MB/s / 7’877MB/s, 32%
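
The percentages above appear to be device throughput divided by the raw bandwidth of the assigned lanes. The sketch below reproduces them under that reading, assuming PCIe 3.0 signalling (8 GT/s per lane with 128b/130b encoding, roughly 984.6 MB/s per lane). The function and constant names are illustrative only.

```python
# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, 8 bits per byte.
PCIE3_MB_PER_LANE = 8e9 * (128 / 130) / 8 / 1e6  # about 984.6 MB/s

DEVICES = [
    # (device, lanes, device throughput in MB/s as listed on the slide)
    ("1x NVMe", 4, 3500),
    ("1x SATA SSD", 1, 540),
    ("1x 100GbE", 16, 12500),
    ("2x 25GbE", 8, 6250),
    ("2x 10GbE", 8, 2500),
]


def lane_utilization(lanes: int, device_mb_s: float) -> tuple[float, float]:
    """Return (lane bandwidth in MB/s, utilization in percent)."""
    lane_bandwidth = lanes * PCIE3_MB_PER_LANE
    return lane_bandwidth, 100 * device_mb_s / lane_bandwidth


for device, lanes, mb_s in DEVICES:
    bandwidth, percent = lane_utilization(lanes, mb_s)
    print(f"{device:12s} {lanes:2d} lanes: {mb_s} / {bandwidth:.0f} MB/s = {percent:.0f}%")
```

Read this way, 2x 10GbE on 8 lanes leaves about two thirds of the lane bandwidth unused, which is the waste the next slide warns about.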


Storage Centric Solution Design – Don’t waste Lanes!

Balanced Designs for Multi-Socket Server Solutions, regardless of CPU Vendor, is a huge Optimization Challenge!


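Assumption: 112 net Lanes available of 128. Real-world performance is, of course, application, workload and file system dependent.

Component Parameter (K IOPS Random Reads / Writes, GB/s Sequential Reads / Writes)

• NVDIMM: 32GB, 1100 K IOPS, 17 GB/s
• NVMe SSD: 4 Lanes, 11TB, 800 / 95 K IOPS, 3.35 / 2.4 GB/s
• SATA SSD (max. 32/CPU): 1 Lane, 8TB, 93 / 74 K IOPS, 0.54 / 0.52 GB/s
• 100GbE: 16 Lanes
• 25GbE: 4 Lanes; 16 Lanes 2x; 8 Lanes 2x (8 Lanes for 2x 25GbE)
• full bandwidth

Allocation of the 112 Lanes (Lanes, TB, Mio. IOPS, GB/s, Gbps)

CAPACITY
• SATA SSD: 32 Lanes, 192 TB, 2.9 Mio. IOPS, 3 GB/s
• 100GbE: 32 Lanes, 200 Gbps
• PCIe Slots: 48 Lanes
• Total: 112 Lanes, 192 TB, 2.9 Mio. IOPS, 3 GB/s, 200 Gbps

IOPS
• NVMe SSD: 64 Lanes, 176 TB, 12.8 Mio. IOPS, 53.6 GB/s
• 100GbE: 200 Gbps
• PCIe Slots: 48 Lanes
• Total: 112 Lanes, 176 TB, 29.8 Mio. IOPS, 53.6 GB/s, 200 Gbps

THROUGHPUT
• NVMe SSD: 48 Lanes, 132 TB, 9.6 Mio. IOPS, 40.2 GB/s
• 100GbE: 48 Lanes, 300 Gbps
• PCIe Slots: 16 Lanes
• Total: 112 Lanes, 132 TB, 9.6 Mio. IOPS, 40.2 GB/s, 300 Gbps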
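
The NVMe totals in the table follow from the per-drive figures (4 lanes, 11TB, 800 K random read IOPS, 3.35 GB/s sequential read) multiplied by the number of drives that fit into the lanes assigned to them. A minimal sketch of that arithmetic, with hypothetical names (`Drive`, `aggregate`) and read figures only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    lanes: int          # PCIe lanes per drive
    capacity_tb: float  # capacity per drive in TB
    read_kiops: float   # random read, thousands of IOPS
    read_gb_s: float    # sequential read in GB/s


# Per-drive NVMe SSD figures as listed in the table.
NVME_SSD = Drive(lanes=4, capacity_tb=11, read_kiops=800, read_gb_s=3.35)


def aggregate(drive: Drive, lanes_assigned: int) -> dict:
    """Scale per-drive figures to the drives that fit in the assigned lanes."""
    count = lanes_assigned // drive.lanes
    return {
        "drives": count,
        "capacity_tb": count * drive.capacity_tb,
        "read_mio_iops": round(count * drive.read_kiops / 1000, 1),
        "read_gb_s": round(count * drive.read_gb_s, 1),
    }


print(aggregate(NVME_SSD, 64))  # lanes given to NVMe in the IOPS design
print(aggregate(NVME_SSD, 48))  # lanes given to NVMe in the throughput design
```

These are linear upper bounds; as the slide notes, real-world results depend on application, workload and file system.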


Conceptual NVMe-Server Design

Universal Department Store Servers
• might be over- or wrong-sized for SDS

Purpose-built Servers for Efficiency
• cost, performance, power, space etc. effective
• allow for lean Architecture
• … HPC, 10’000s in biggest DCs

e.g. choose from > 100 NVMe Servers, AMD and INTEL CPUs


36x NVMe NGSFF

• 2x Intel Xeon Scalable CPU
• 3x UPI, <10.4GT/s
• 24x DIMM up to 3TB
• 2x PCIe x16, 1x PCIe x8
• 2x 10GBase-T

NVMe < 576TB < 352TB < 1’080TB

< 10 million IOPS

32x NVMe U.2

• 2x Intel Xeon Scalable CPU
• 3x UPI, <10.4GT/s
• 24x DIMM up to 3TB
• 2x PCIe x16
• 2x 10GBase-T

32x NVMe EDSFF

• 2x Intel Xeon Scalable CPU
• 3x UPI, <10.4GT/s
• 24x DIMM up to 3TB
• 2x PCIe x16
• 2x 10GBase-T

1U 1U

From Storage-Server to Server-Storage


48x NVMe U.2 Dual Port
2x Nodes:
• 2x Intel Xeon E5-2600v4 CPU
• 2x QPI, <9.6GT/s
• 16x DIMM up to 2TB
• 1x PCIe x16, 1x PCIe x8
• 2x 10GBase-T, SIOM

24x NVMe U.2
4x Nodes:
• 2x Intel Xeon Scalable CPU
• 3x UPI, <10.4GT/s
• 24x DIMM up to 3TB
• 2x PCIe x16
• 2x 10GBase-T

NVMe < 528TB < 264TB < 528TB

2U 2U

48x NVMe U.2

• 2x Intel Xeon E5-2600v4 CPU
• 2x QPI, <9.6GT/s
• 24x DIMM up to 3TB
• 2x PCIe x16, 1x PCIe x8, SIOM
• SIOM (e.g. 2x 25GbE, 2x 10GbE)

From Storage-Server to Server-Storage


From Storage-Server to Server-Storage

20x NVMe U.2 7mm

• 2x Intel Xeon Scalable CPU
• 3x UPI, <10.4GT/s
• 24x DIMM up to 3TB
• 2x PCIe x8
• 2x 25GbE

< 80TB

1U All-NVMe & GPU Server on ABC booth

4x V100, P100, P40, M10 ...

Storage changes to

- SERVER-CENTRIC

- SOFTWARE-DEFINED

RAID Protection

JBOF Just a Bunch of Flash

NVMe-oF - NVMe over Fabrics

Holistic Data Management

HCI Hyper Converged Infrastructure


IBM Spectrum Scale Acceleration

Achieved with 24x NVMe:
• The only sub-millisecond overall response time, at 0.69ms ORT!
• 2.5x more builds than other Spectrum Scale storage options.
• Higher IOPS and throughput than all other SPEC SFS2014_swbuild results.
Solution available as Appliance or Software only.

NVMe-oF - NVMe over Fabric

RDMA, FC

The goal of NVMe over Fabrics is to provide distance connectivity to NVMe devices with no more than 10 micro-seconds (µs) of additional latency over a native NVMe device inside a server.

Use Cases

- A storage system comprised of many NVMe devices, using NVMe over Fabrics with either an RDMA or Fibre Channel interface, making a complete end-to-end NVMe storage solution. This system would provide extremely high performance while maintaining the very low latency available via NVMe.

- Usage of NVMe over Fabrics to achieve the low latency while connected to a storage subsystem that uses more traditional protocols internally to handle I/O to each of the SSDs in that system. This would gain the benefits of the simplified host software stack and lower latency over the wire, while taking advantage of existing storage subsystem technology.


Text & Graphics from http://nvmexpress.org
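
On a Linux host, a remote NVMe-oF subsystem is typically attached with the `nvme-cli` tool. The sketch below only builds the `nvme connect` command line for an RDMA transport and does not execute it; the NQN and address are placeholders, and port 4420 is the registered default for NVMe over Fabrics. The helper `nvme_connect_argv` is a made-up name for this example.

```python
import shlex


def nvme_connect_argv(nqn: str, traddr: str, trsvcid: str = "4420",
                      transport: str = "rdma") -> list[str]:
    """Build an nvme-cli 'connect' command line (not executed here)."""
    return [
        "nvme", "connect",
        f"--transport={transport}",  # fabric type, e.g. rdma
        f"--nqn={nqn}",              # NVMe Qualified Name of the subsystem
        f"--traddr={traddr}",        # target address
        f"--trsvcid={trsvcid}",      # target port
    ]


# Placeholder values; replace with the target's real NQN and address.
argv = nvme_connect_argv("nqn.2018-04.com.example:jbof1", "192.0.2.10")
print(shlex.join(argv))
```

Running the printed command requires root privileges, the `nvme-cli` package and an RDMA-capable adapter; once connected, the remote namespaces show up as local NVMe block devices.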

Low Latency Networking

Storage Accelerations, leveraging hardware offloads for NVMe
- ConnectX adapters support NVMe-oF <100Gbps
- BlueField (SoC) Smart NIC 2x 25GbE combines ConnectX5 with ARM CPU

NVMesh Reference Architecture
- near server-local performance in a linear scale-out remote standard NVMe solution.

NVMesh RA provides the flexibility to create and manage a single, centralized pool of storage, create “right-sized” logical volumes, and even share storage resources with existing compute resources. It also supports existing applications without changes.

NVMe-oF FC-HBA


All-NVMe JBOF

4x PCIe-Bus Extension PCIe 3.0 x16

Client Network 2x EDR IB 100Gbps / 100/40GbE PCIe 3.0 x16

Cluster Interconnect 2x EDR IB 100Gbps PCIe 3.0 x16

Sync. Mirror

64 GB/s
> 36 Mio. IOPS

• 64, 128TB
• …
• 256TB
• 512TB
• 1’024TB

1U 32-bay JBOF Just a Bunch Of Flash

NVMe SSD U2 hot-swap NVMe SSD EDSFF hot-swap

Capacity Drives, Cache Drives

10x more Performance with 3D XPoint™ OPTANE Technology than NAND via PCIe* NVMe*


The Supermicro JBOF supports up to 12 direct attached hosts, making this the go-to storage platform for any high-performance computing application.

Alternatively, the dual PCI-E 3.0 x16 slots can support dual NVMe-oF add-on-cards to enable additional deployment scenarios.

4 Mini-SAS HD x16 ports, 2 PCI-E 3.0 x16 Slots, 2 IPMI ports

Supermicro JBOF 32x NVMe SSD U.2


RAID Protection

With a 3.2x lower Annualized Failure Rate (AFR) of SAS/SATA SSDs compared to HDDs, IT departments will spend less time and expense replacing or upgrading storage devices.


Flash Technology

• More reliable, less Replacements

• Higher Throughput, faster Rebuilds

RAID Approach

• Hardware-Defined (RAID-Controller)

• Software-Defined (SDS)

• Hybrid: Intel VROC Virtual RAID on CPU
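
To make "faster Rebuilds" concrete, a lower bound for a rebuild is the time needed to write one full replacement drive at its sequential write rate. The sketch below uses the per-drive figures from the lane table earlier in this deck (NVMe SSD: 11TB at 2.4 GB/s write; SATA SSD: 8TB at 0.52 GB/s write) and ignores parity computation, host load and controller overhead, so real rebuilds take longer. The function name is illustrative.

```python
def min_rebuild_minutes(capacity_tb: float, write_gb_s: float) -> float:
    """Lower bound: time to write one full drive at its sequential write rate."""
    seconds = capacity_tb * 1e12 / (write_gb_s * 1e9)
    return seconds / 60


# Per-drive figures taken from the lane table earlier in this deck.
nvme_minutes = min_rebuild_minutes(11, 2.4)
sata_minutes = min_rebuild_minutes(8, 0.52)

print(f"NVMe SSD 11TB: at least {nvme_minutes:.0f} minutes")
print(f"SATA SSD  8TB: at least {sata_minutes:.0f} minutes")
```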


RAID Function in VMD (Volume Management Device)

Intel Virtual RAID on CPU – VROC


HCI Hyper Converged Infrastructure

8x 2U 4-Node Server. Dual CPU. 3x UPI, <10.4GT/s. 24x DIMM. 6x NVMe U.2. 1x PCIe Extension x16. 2x 10GbE …

1U JBOF 32x NVMe. 4x Mini-SAS HD x16 ports. 2x PCI-E 3.0 x16 Slots


Data Management – not only Data Storing

Conceptual Optimization

NVMe

LTO LTFS


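Directions of Movements (Main-stream)

Parameter: up to … TB/drive; TB in 1U Server or JBOF; TB in 4U Server or JBOD

Storage-Technology

RAM
• Volatile
• Non-Volatile

Flash, NVMe
• EDSFF [Ruler]: 32 TB/drive, 1080 TB in 1U Server or JBOF
• NGSFF [M.3]: 16 TB/drive, 576 TB in 1U Server or JBOF
• U.2 2.5": 11 TB/drive
• M.2: 2 TB/drive
• AIC Add-in Card: 8 TB/drive

Flash, SSD 2.5"
• SAS: 8 TB/drive
• SATA: 11 TB/drive

Disk, SAS 2.5"
• 15K
• 10K: 1.8 TB/drive

Disk, NL SAS / SATA
• 2.5": 2 TB/drive
• 3.5": 12 TB/drive, 1080 TB in 4U Server or JBOD

Tape
• LTO: 12 - 30 TB/drive
• IBM TS1150 [Jaguar]: 10 TB/drive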


Gotthardpost, 1873
Johann Rudolf Koller, 1828-1905

https://en.wikipedia.org/wiki/Rudolf_Koller

Change Horses, add Horses

or use the Gotthard Tunnel …

NVMe Takes It All, SCSI Has To Fall


NVMe Takes It All, SCSI Has To Fall

Hot Data: NVMe

Somewhat Hot Data: SAS, FC, SATA SSD

Lukewarm Data: NL SAS, SATA

Cold Data: LTO [ LTFS ]


• The PCIe Bus is in the Server
• NVMe is the Protocol for Flash

• 50-100TB NVMe
• PCIe 4.0

"Flat screens vs Displays"


Headquarter Zurich: Ruetistrasse 28, CH-8952 Schlieren, Tel +41 43 433 6 433

Branch Office Berne: Giessereiweg 9, CH-3007 Bern, Tel +41 31 3 700 600

http://www.ABCsystems.ch [email protected]

Alexander Ruebensaal [email protected]

… simplify and win with us and our Partners

Other names and brands may be claimed as property of others.

Spectrum Scale, Spectrum Protect
