
  • UCX enhanced NVMe-over-Fabric Target in SPDK

    Xinle Du, Dai Zhang, Minhu Wang
    Tsinghua University Team

    August 2020

  • Background: NVMe and NVMe-over-Fabrics (NVMe-oF)

    The NVM Express* (NVMe) block protocol is designed to access local SSD
    block devices over PCIe in a fast and efficient way.

    Having such a good local protocol, the natural next step is to extend it
    for remote access. That extension is NVMe-oF.

    The NVMe over Fabrics (NVMe-oF) protocol extends the parallelism and
    efficiency of the NVMe block protocol over network fabrics such as
    RDMA (iWARP, RoCE), InfiniBand™, Fibre Channel, TCP, and Intel®
    Omni-Path, which allows us, for example, to read and write a *remote*
    SSD at low latency.

  • Background: SPDK

    The Storage Performance Development Kit (SPDK) is a library that:

    Implements the NVMe and NVMe-oF protocols.

    Allows high-performance, scalable, user-mode storage applications to be
    written easily.

    Achieves high performance through zero-copy, polled-mode, asynchronous,
    and lockless design (sketched below).

    We focus on the NVMe-oF part, which is the SPDK NVMe-oF target library
    (lib/nvmf).

    The NVMe-oF specification is designed to allow for many different network
    fabrics, so the SPDK NVMe-oF target library implements a plugin system for
    easily adding new fabrics and network transports in the future.

    The API a new transport must implement is located in lib/nvmf/transport.h.

    An example NVMe-oF target application can be found in app/nvmf_tgt.
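
    To give a feel for the polled-mode, asynchronous style, here is a minimal
    sketch (ours, not from SPDK's sources) of an application that registers a
    poller on the SPDK app framework. The names my_poll and app_start are
    illustrative, and details such as the poller return values and the
    spdk_app_opts_init() argument list vary slightly across SPDK releases.

    /* Minimal sketch of SPDK's polled-mode style: the app framework starts a
     * reactor thread, and work is done by pollers that the reactor calls in a
     * tight loop instead of blocking in epoll/select. */
    #include "spdk/event.h"
    #include "spdk/thread.h"

    static struct spdk_poller *g_poller;

    /* Called repeatedly by the reactor; must never block. */
    static int
    my_poll(void *arg)
    {
        /* Check for work, submit asynchronous I/O, reap completions... */
        return SPDK_POLLER_IDLE; /* SPDK_POLLER_BUSY when work was done */
    }

    /* Runs on the first reactor once the framework is up. */
    static void
    app_start(void *arg)
    {
        g_poller = SPDK_POLLER_REGISTER(my_poll, NULL, 0 /* run every iteration */);
    }

    int
    main(int argc, char **argv)
    {
        struct spdk_app_opts opts = {};

        spdk_app_opts_init(&opts); /* newer releases add a size argument */
        opts.name = "poll_demo";
        spdk_app_start(&opts, app_start, NULL);
        spdk_app_fini();
        return 0;
    }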

  • Competition Task: What we need to do…

    Extend the SPDK NVMe-oF target library with a new transport (a concept in
    the NVMe-oF spec) built on UCX.

    However, this is quite challenging, because we need to:

    Have an understanding of the concepts in the NVMe-oF spec.

    Know the workflow of SPDK, including the logic inside the NVMe-oF target
    library and how it interoperates with the other parts of the project.

    Be familiar with the tricks and design patterns in product-level code that
    improve code reuse, performance, and resource efficiency.

    Understand the framework and interfaces of UCX so we can apply it and fit
    it into SPDK.

  • Dive in… Concepts in the NVMe-oF Spec and the SPDK Implementation

    The NVMe-oF specification defines many concepts that help us understand
    the SPDK implementation:

    NVMe-oF target: the abstraction of the whole SSD storage server.
    (struct spdk_nvmf_tgt).

    NVMe-oF subsystem and namespace: access-control-related concepts; we
    don't care much about them here. (struct spdk_nvmf_subsystem and
    struct spdk_nvmf_ns).

    NVMe-oF transport: an abstraction for a network fabric. The NVMe-oF
    specification defines multiple network transports (the "Fabrics" in NVMe
    over Fabrics), and SPDK has an extensible system for adding new fabrics
    in the future. (struct spdk_nvmf_transport).

    Currently, SPDK implements an RDMA transport and a TCP transport
    (lib/nvmf/rdma.c, lib/nvmf/tcp.c).

    NVMe-oF queue pair: defined by the NVMe-oF specification; queue pairs map
    1:1 to network connections. A queue pair is similar to a socket in the
    scope of TCP. (struct spdk_nvmf_qpair).

  • Dive in… Concepts in the NVMe-oF Spec and the SPDK Implementation (cont.)

    Poll group: an abstraction for a collection of network connections that
    can be polled as a unit. SPDK chooses to check for incoming data on
    groups of connections rather than checking each one individually (e.g.
    with epoll) in order to improve efficiency and scale to large numbers of
    connections; poll groups provide a generic abstraction for that. All new
    qpairs assigned to a poll group are given their own RDMA send and receive
    queues but share a common completion queue: the SPDK NVMe-oF RDMA
    transport allocates a single RDMA completion queue per poll group, as
    sketched below. (struct spdk_nvmf_poll_group).

    NVMe-oF listener: listens on a network address at which the target will
    accept new connections. (struct spdk_nvmf_listener).

    NVMe-oF host: an NVMe-oF NQN representing a host (initiator) system.
    This is used for access control. (struct spdk_nvmf_host).
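
    Why does a shared completion queue scale? The sketch below is our
    simplified reading of the idea in lib/nvmf/rdma.c: one ibv_poll_cq() call
    per poll-group iteration reaps completions for every qpair in the group,
    and the wr_id field identifies the owner. handle_completion() here is a
    hypothetical dispatch helper, not an SPDK function.

    #include <infiniband/verbs.h>
    #include <stdint.h>

    #define MAX_WC 32

    /* Hypothetical helper: route one completion back to its qpair/request. */
    static void handle_completion(void *ctx, struct ibv_wc *wc);

    static int
    poll_group_poll(struct ibv_cq *shared_cq)
    {
        struct ibv_wc wc[MAX_WC];
        int n, i;

        /* One poll covers every connection in the group; no per-connection
         * epoll/recv round trips. */
        n = ibv_poll_cq(shared_cq, MAX_WC, wc);
        if (n < 0) {
            return -1; /* device error */
        }
        for (i = 0; i < n; i++) {
            /* wr_id was set at post time to point at the owning context. */
            handle_completion((void *)(uintptr_t)wc[i].wr_id, &wc[i]);
        }
        return n; /* completions processed this iteration */
    }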

  • Dive in… Workflow of the SPDK NVMe-oF target library (overview)

    How NVMe-oF works, using the terminology defined before.

    The client and server in the NVMe-oF context are called the initiator and
    the target, respectively.

    The SPDK NVMe-oF target uses the SPDK user-space, polled-mode NVMe driver
    to submit and complete I/O requests to NVMe devices.

    The host system uses the initiator to establish a connection and submit
    I/O requests to an NVMe subsystem within an NVMe-oF target.

    The SPDK NVMe-oF target and initiator use the InfiniBand/RDMA verbs API
    to access an RDMA-capable NIC.

  • Dive in… Workflow of the SPDK NVMe-oF target library (interface)

    How NVMe-oF works, using the terminology defined before. A sketch of the
    call flow follows this slide.

    A user of the NVMe-oF target library begins by creating a target using
    spdk_nvmf_tgt_create(), setting up a set of addresses on which to accept
    connections by calling spdk_nvmf_tgt_listen(), then creating a subsystem
    using spdk_nvmf_subsystem_create(). Namespaces, which represent bdevs,
    can be added to the subsystem with spdk_nvmf_subsystem_add_ns().

    Once a subsystem exists and the target is listening on an address, new
    connections may be accepted by polling spdk_nvmf_tgt_accept().

    When spdk_nvmf_tgt_accept() detects a new connection, it constructs a new
    struct spdk_nvmf_qpair object and calls the user-provided callback for
    each new qpair. The user must assign the qpair to a poll group by calling
    spdk_nvmf_poll_group_add() so that its I/O requests get processed.

    All I/O to a subsystem is driven by a poll group, which polls for
    incoming network I/O. Poll groups may be created by calling
    spdk_nvmf_poll_group_create(). They automatically begin polling upon
    creation, on the thread from which they were created.
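
    To make that call sequence concrete, here is a compressed sketch of a
    target bring-up using the functions named above. Argument lists are
    abbreviated: the real functions take option structs and, in some SPDK
    versions, completion callbacks, and a subsystem also has to be started
    before it serves I/O. Treat this as the shape of the flow, not copy-paste
    code.

    #include "spdk/nvmf.h"

    /* Called by spdk_nvmf_tgt_accept() for every new connection. */
    static void
    new_qpair_cb(struct spdk_nvmf_qpair *qpair, void *cb_arg)
    {
        struct spdk_nvmf_poll_group *group = cb_arg;

        /* Hand the connection to a poll group so its I/O gets processed. */
        spdk_nvmf_poll_group_add(group, qpair);
    }

    static void
    target_bringup(const struct spdk_nvme_transport_id *trid, struct spdk_bdev *bdev)
    {
        struct spdk_nvmf_tgt *tgt;
        struct spdk_nvmf_subsystem *subsys;
        struct spdk_nvmf_poll_group *group;

        tgt = spdk_nvmf_tgt_create(NULL);         /* 1. create the target */
        spdk_nvmf_tgt_listen(tgt, trid);          /* 2. listen on an address */

        subsys = spdk_nvmf_subsystem_create(tgt, "nqn.2020-08.io.spdk:demo",
                                            SPDK_NVMF_SUBTYPE_NVME, 1);
        spdk_nvmf_subsystem_add_ns(subsys, bdev, NULL, 0, NULL); /* 3. expose a bdev */

        group = spdk_nvmf_poll_group_create(tgt); /* 4. polling context */
        for (;;) {
            /* 5. poll for new connections; new_qpair_cb runs per new qpair. */
            spdk_nvmf_tgt_accept(tgt, new_qpair_cb, group);
        }
    }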

  • Dive in… Transport Abstraction in SPDK

    SPDK enables separate implementations for different transports.

    The API a new transport must implement is located in lib/nvmf/transport.h:

    * Construct / destruct the controller
    * Get info / statistics
    * Create / delete I/O queues
    * Submit requests
    * Process completions

    We focus on the transport implementations; a sketch of the registration
    follows below.
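
    As a taste of what our UCX transport will eventually provide, here is a
    trimmed sketch of the function table a transport registers. The member
    names follow the RDMA/TCP transports we read; the exact set, the
    signatures, and the registration mechanism differ between SPDK versions,
    and the stub bodies are placeholders only.

    #include "spdk/nvmf_transport.h"

    /* Placeholder hooks: a real UCX transport would implement each of these
     * on top of UCP workers and endpoints. */
    static struct spdk_nvmf_transport *
    ucx_create(struct spdk_nvmf_transport_opts *opts) { return NULL; }

    static int
    ucx_destroy(struct spdk_nvmf_transport *t) { return 0; }

    static int
    ucx_listen(struct spdk_nvmf_transport *t,
               const struct spdk_nvme_transport_id *trid) { return 0; }

    static void
    ucx_accept(struct spdk_nvmf_transport *t, new_qpair_fn cb_fn, void *cb_arg) { }

    static struct spdk_nvmf_transport_poll_group *
    ucx_poll_group_create(struct spdk_nvmf_transport *t) { return NULL; }

    static int
    ucx_poll_group_add(struct spdk_nvmf_transport_poll_group *g,
                       struct spdk_nvmf_qpair *q) { return 0; }

    static int
    ucx_poll_group_poll(struct spdk_nvmf_transport_poll_group *g) { return 0; }

    static int
    ucx_req_complete(struct spdk_nvmf_request *req) { return 0; }

    static void
    ucx_qpair_fini(struct spdk_nvmf_qpair *q) { }

    static const struct spdk_nvmf_transport_ops ucx_ops = {
        .name              = "UCX",
        .create            = ucx_create,            /* construct the transport  */
        .destroy           = ucx_destroy,           /* destruct the transport   */
        .listen            = ucx_listen,            /* listen on a new address  */
        .accept            = ucx_accept,            /* poll for new connections */
        .poll_group_create = ucx_poll_group_create, /* per-thread I/O context   */
        .poll_group_add    = ucx_poll_group_add,    /* attach a new qpair       */
        .poll_group_poll   = ucx_poll_group_poll,   /* submit + complete I/O    */
        .req_complete      = ucx_req_complete,      /* finish one request       */
        .qpair_fini        = ucx_qpair_fini,        /* tear down a connection   */
    };

    SPDK_NVMF_TRANSPORT_REGISTER(ucx, &ucx_ops);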

  • TBC: Where we have reached and what the future work is

    We now largely understand the transport part of the project and its
    position inside the whole project, and we have a preliminary feeling for
    which parts we need to modify and change.

    But there are still many other issues:

    * The tricks and design patterns in product-level code
    * Interoperating with the other parts of the project
    * Resource management
    * Multi-connection and multi-processing
    * Error handling
    * Performance optimizations

  • Thanks!