NVMe/TCP Standards-Based, Fault-Tolerant Clustered Storage … · 2020-02-19 · System Architect...

NVMe/TCP Standards-Based, Fault-Tolerant Clustered Storage with LightOS

Alex Shpiner
System Architect, Lightbits Labs
alex@lightbitslabs.com

● Founded

● Key milestones:

● 80 employees:

● Locations

● Funding

We are hiring!

● Optional hardware acceleration for SSD management and data services
● High performance, low latency Global Flash Translation Layer (FTL) with data services
● High performance, low latency NVMe/TCP target

[Diagram: storage server stack with an NVMe/TCP target over the Global FTL with Rich Data Services]

● Standard TCP/IP network (no RDMA required)
● Standard NVMe/TCP client driver

[Diagram: clients using the standard NVMe/TCP driver reach multiple storage servers over a standard TCP/IP network; each server runs an NVMe/TCP target and the Global FTL with Rich Data Services]
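Because the client side is the standard Linux NVMe/TCP driver, attaching a volume needs only stock tooling. The sketch below, assuming nvme-cli is installed on the host, only builds the `nvme connect` argument list for a TCP target and executes nothing. The helper name, the address 192.0.2.10 and the NQN are hypothetical placeholders; 4420 is the IANA-registered NVMe-oF port, which a given deployment may override.

```python
# Hypothetical helper: builds (but does not run) the nvme-cli command a Linux
# host could use to attach to an NVMe/TCP target. Address and NQN values used
# below are placeholders, not real LightOS identifiers.
from typing import List

NVME_TCP_DEFAULT_PORT = 4420  # IANA-registered port for NVMe over Fabrics


def nvme_tcp_connect_cmd(traddr: str, nqn: str, port: int = NVME_TCP_DEFAULT_PORT) -> List[str]:
    """Return the argv for `nvme connect` over the TCP transport."""
    return [
        "nvme", "connect",
        "--transport=tcp",     # plain TCP/IP, no RDMA-capable NIC required
        f"--traddr={traddr}",  # IP address of the storage server
        f"--trsvcid={port}",   # TCP port of the NVMe/TCP target
        f"--nqn={nqn}",        # NVMe Qualified Name of the subsystem
    ]


if __name__ == "__main__":
    cmd = nvme_tcp_connect_cmd("192.0.2.10", "nqn.2014-08.org.example:subsys1")
    print(" ".join(cmd))
```

Building the argv as a list (rather than a shell string) keeps it safe to pass to `subprocess.run` without shell quoting issues.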

● SSD level protection: SSD failure via Global FTL Erasure Coding (v1.x)
● Storage server level protection: storage server failure via LightOS Clustering (v2.x)
○ With application replication: v1.x, since the application does replicate
○ No application replication: v2.x
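To illustrate how erasure coding lets the Global FTL survive an SSD failure, here is a minimal single-parity (XOR) code. This is an assumed, simplified stand-in: the slides do not say which erasure code LightOS uses, and the function names are hypothetical.

```python
# Illustrative single-parity (XOR) erasure code, RAID-5 style. ASSUMPTION:
# this is not the Global FTL's actual scheme, which the slides do not specify;
# it only shows how one lost SSD's block can be rebuilt from the survivors.
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: List[bytes]) -> List[bytes]:
    """Return the stripe: the data blocks followed by one parity block."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Restore at most one missing block (marked None) in a stripe."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one failed SSD")
    survivors = [block for block in stripe if block is not None]
    if missing:
        # The XOR of all blocks (data + parity) is zero, so the missing
        # block equals the XOR of every surviving block.
        survivors.insert(missing[0], xor_blocks(survivors))
    return survivors


if __name__ == "__main__":
    stripe = encode([b"\x01\x02", b"\x10\x20", b"\xff\x00"])  # 3 data SSDs + parity
    damaged: List[Optional[bytes]] = list(stripe)
    damaged[1] = None  # simulate failure of the second SSD
    print(rebuild(damaged) == stripe)
```

Single parity tolerates exactly one failed device per stripe; codes with more parity blocks trade extra capacity for tolerating more concurrent SSD failures.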

[Diagram: three storage servers, each running an NVMe/TCP target and the Global FTL with Rich Data Services; one server fails]

All clients continue working!

● Inherit storage services from LightOS 1.x
● High performance and low latency
○ Single-hop reads
○ Two-hop writes (user write + replication)
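The single-hop read and two-hop write paths can be modeled with a toy primary/secondary volume. This sketch assumes that a write is acknowledged only after every secondary holds the data and that the primary forwards to all secondaries in parallel; the class names and the hop counter are illustrative, not LightOS internals.

```python
# Toy model of primary-based replication (ASSUMED behavior, hypothetical names):
# writes go client -> primary -> secondaries (two hops), reads are served by
# the primary alone (one hop).
from typing import Dict, List


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}


class ReplicatedVolume:
    def __init__(self, primary: Replica, secondaries: List[Replica]) -> None:
        self.primary = primary
        self.secondaries = secondaries
        self.hops = 0  # network hops used by the most recent operation

    def write(self, lba: int, data: bytes) -> None:
        self.primary.blocks[lba] = data  # hop 1: client -> primary
        for secondary in self.secondaries:
            # hop 2: primary -> secondaries, assumed to be sent in parallel,
            # so together they count as a single second hop
            secondary.blocks[lba] = data
        self.hops = 2
        # Only now would the write be acknowledged to the client.

    def read(self, lba: int) -> bytes:
        self.hops = 1  # client -> primary only
        return self.primary.blocks[lba]


if __name__ == "__main__":
    vol = ReplicatedVolume(Replica("server3"), [Replica("server1"), Replica("server2")])
    vol.write(0, b"hello")
    print(vol.read(0), vol.hops)
```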

[Diagram: LightOS Cluster of six storage servers, each running an NVMe/TCP target and the Global FTL with Rich Data Services]

● Standard unmodified clients and network
○ Leveraging standard NVMe 1.4 and NVMe-oF 1.1
○ Transparent failover via multipath with Asymmetric Namespace Access (ANA)

● Distributed and fault-tolerant storage servers
○ Automatic volume assignment
○ Failure domains
○ Management
○ Discovery service

● Multi-replica volumes

● Each replica is stored on a separate storage server
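One way to satisfy the rule that each replica lives on a separate storage server is a least-loaded greedy choice over distinct servers. The sketch assumes load is measured as the number of replicas a server already hosts; LightOS's actual automatic volume assignment policy is not described in the slides, and the function name is hypothetical.

```python
# Sketch of distinct-server replica placement. ASSUMPTIONS: "load" is the number
# of replicas a server already hosts, and the least-loaded servers are chosen;
# the real LightOS assignment policy is not described in these slides.
from typing import Dict


def place_replicas(volume: str, n_replicas: int, load: Dict[str, int]) -> Dict[str, str]:
    """Map each replica of `volume` to a different storage server."""
    if n_replicas > len(load):
        raise ValueError("not enough storage servers for distinct replicas")
    # Sort by (load, name) so ties are broken deterministically.
    chosen = sorted(load, key=lambda srv: (load[srv], srv))[:n_replicas]
    for srv in chosen:
        load[srv] += 1  # record the new replica on its server
    return {f"{volume}_replica_{i + 1}": srv for i, srv in enumerate(chosen)}


if __name__ == "__main__":
    servers = {"srv1": 2, "srv2": 0, "srv3": 1, "srv4": 0}
    print(place_replicas("vol1", 3, servers))
```

Because `sorted` over the dictionary keys yields each server once, the slice can never pick the same server twice, which is exactly the one-replica-per-server constraint.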

[Diagram: LightOS Cluster of ten storage servers; vol1_replica_1, vol1_replica_2 and vol1_replica_3 are each stored on a different server]

● Groups of storage servers can be affected together by common elements that form a shared point of failure:
○ Network
○ Power
○ Geography

● Users assign servers to specific failure domain groups.

● Configured via labels assigned to servers that reflect their common dependencies:
○ rack_01, rack_02, …
○ power_0, power_1, ...

● Replicas are placed in different failure domains.
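Failure domains can be layered on top by treating each server's labels as a set and refusing to co-locate replicas on servers that share any label. This greedy sketch is an assumption about how label-based placement could work, not the LightOS algorithm; the function name and the server names are hypothetical.

```python
# Sketch of label-based failure-domain placement (ASSUMED approach): servers
# whose label sets overlap (same rack, same power feed, ...) never both host
# replicas of the same volume. Greedy, so it may miss a valid placement.
from typing import Dict, List, Set


def place_in_domains(n_replicas: int, labels: Dict[str, Set[str]]) -> List[str]:
    """Pick n servers whose label sets are pairwise disjoint."""
    if n_replicas <= 0:
        return []
    chosen: List[str] = []
    used: Set[str] = set()
    for server in sorted(labels):
        if labels[server].isdisjoint(used):
            chosen.append(server)
            used |= labels[server]  # this rack and power feed are now taken
            if len(chosen) == n_replicas:
                return chosen
    raise ValueError("cannot place replicas in distinct failure domains")


if __name__ == "__main__":
    cluster = {
        "srv1": {"rack_01", "power_0"},
        "srv2": {"rack_01", "power_1"},
        "srv3": {"rack_02", "power_0"},
        "srv4": {"rack_03", "power_1"},
        "srv5": {"rack_04", "power_2"},
    }
    print(place_in_domains(3, cluster))
```

A single greedy pass can fail even when a valid placement exists (for example, picking a server with two labels can block two others that would have fit together), so a production placer might need backtracking.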

[Diagram: LightOS Cluster with storage servers grouped into failure domains rack_01 through rack_05; vol1_replica_1, vol1_replica_2 and vol1_replica_3 are placed in different racks]

[Diagram: LightOS Cluster of three storage servers (secondary, secondary, primary); the NVMe/TCP client sends writes and reads to the primary]

● Partial rebuild

[Diagram: the same cluster (secondary, secondary, primary) with a temporary failure of one storage server]

“Partial rebuild”: only the necessary data is sent.
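Partial rebuild can be sketched by having the primary remember which blocks changed while a replica was down, then resending only those blocks when it returns. The dirty-set bookkeeping and all names below are assumptions made for illustration; the slides state only that just the necessary data is sent.

```python
# Sketch of partial rebuild (ASSUMED mechanism, hypothetical names): the
# primary tracks LBAs written while a replica is offline and resends only
# those blocks when the replica comes back, instead of the whole volume.
from typing import Dict, Set


class PrimaryReplica:
    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.missed: Dict[str, Set[int]] = {}  # offline replica -> changed LBAs

    def mark_offline(self, replica: str) -> None:
        self.missed.setdefault(replica, set())

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data
        for changed in self.missed.values():
            changed.add(lba)  # remember what each offline replica missed

    def partial_rebuild(self, replica: str, replica_blocks: Dict[int, bytes]) -> int:
        """Copy only the missed blocks to a returning replica; return how many."""
        changed = self.missed.pop(replica, set())
        for lba in changed:
            replica_blocks[lba] = self.blocks[lba]
        return len(changed)


if __name__ == "__main__":
    primary = PrimaryReplica()
    for lba in range(1000):
        primary.write(lba, b"v1")
    secondary = dict(primary.blocks)  # in sync before the failure
    primary.mark_offline("secondary")
    primary.write(42, b"v2")
    print(primary.partial_rebuild("secondary", secondary), "block(s) resent")
```

Overwriting the same LBA several times during the outage still resends it once, because the tracking structure is a set.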

○ Symmetric
○ Asymmetric
● LightOS leverages NVMe ANA for clustering
○ Failure handling

[Diagram: LightOS Cluster of three storage servers (secondary, secondary, primary) serving an NVMe/TCP client]

[Diagram: the storage server holding the primary replica fails]

[Diagram: after failover the roles become secondary, primary, secondary; a surviving secondary is now the primary and the failed server is demoted]
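The failover sequence above relies on the host choosing a path by its ANA state. The sketch models only three ANA states (the NVMe specification defines more, such as persistent loss and change) and a simple preference order: optimized first, then non-optimized, never inaccessible. It follows the general multipath idea rather than the exact Linux kernel logic, and the server names are hypothetical.

```python
# Sketch of ANA-based path selection on the host (simplified, ASSUMED model).
# Each path to a storage server's controller reports an ANA state; the host
# prefers optimized paths and never sends I/O to inaccessible ones.
from enum import Enum
from typing import Dict, Optional


class AnaState(Enum):
    OPTIMIZED = "optimized"
    NON_OPTIMIZED = "non-optimized"
    INACCESSIBLE = "inaccessible"


def select_path(paths: Dict[str, AnaState]) -> Optional[str]:
    """Return the preferred usable path, or None if none is usable."""
    for wanted in (AnaState.OPTIMIZED, AnaState.NON_OPTIMIZED):
        for name in sorted(paths):
            if paths[name] is wanted:
                return name
    return None  # no usable path: I/O would be queued or failed


if __name__ == "__main__":
    # Initial state: the primary (server3) is the only optimized path.
    before = {"server1": AnaState.INACCESSIBLE, "server2": AnaState.INACCESSIBLE,
              "server3": AnaState.OPTIMIZED}
    # After failover: server2 is promoted and reports optimized.
    after = {"server1": AnaState.INACCESSIBLE, "server2": AnaState.OPTIMIZED,
             "server3": AnaState.INACCESSIBLE}
    print(select_path(before), "->", select_path(after))
```

Because the cluster only changes the ANA state each controller reports, an unmodified host switches paths on its own, which is what makes the failover transparent to clients.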

[Diagram: LightOS Cluster of three storage servers with cluster management, discovery and API services, serving an NVMe/TCP client; panels: initial state, missing, failover]

Contact information

○ lsblk, nvme list

● Path ANA states: optimized, inaccessible

○ nvme list-subsys <dev>
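The same per-path information that `nvme list-subsys` shows can also be collected from a script. The sketch assumes the Linux kernel exposes a per-path `ana_state` sysfs attribute at /sys/class/nvme/nvmeX/nvmeXcYnZ/ana_state when native NVMe multipath is enabled; verify this layout on your kernel before relying on it. The function name is hypothetical.

```python
# Sketch: list the ANA state of each NVMe path on a Linux host. ASSUMPTION:
# the kernel exposes /sys/class/nvme/nvmeX/nvmeXcYnZ/ana_state for each path
# when native NVMe multipath is enabled; check the layout on your system.
import glob
import os
from typing import Dict


def ana_states(sysfs_root: str = "/sys/class/nvme") -> Dict[str, str]:
    """Map each path device (e.g. nvme0c1n1) to its reported ANA state."""
    states: Dict[str, str] = {}
    pattern = os.path.join(sysfs_root, "nvme*", "nvme*c*n*", "ana_state")
    for attr in sorted(glob.glob(pattern)):
        path_dev = os.path.basename(os.path.dirname(attr))
        with open(attr) as f:
            states[path_dev] = f.read().strip()
    return states


if __name__ == "__main__":
    for dev, state in ana_states().items():
        print(f"{dev}: {state}")
```

On a host without NVMe multipath paths the glob simply matches nothing and the function returns an empty mapping.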

Recommended