Remote Persistent Memory Block Translation Table: Persistent memory is byte addressable. Existing software

  • View
    0

  • Download
    0

Embed Size (px)

Text of Remote Persistent Memory Block Translation Table: Persistent memory is byte addressable. Existing...

  • Remote Persistent Memory Tom Talpey

  • Outline

     Persistent Memory and Windows PMEM Support  A brief summary, for better awareness

     RDMA Persistent Memory Extensions  And their motivation/use by Storage Protocols

     Example Application Scenarios for Persistent Memory Operations  RDMA Operation Behavior

    2

  • Persistent Memory

  • Memory & Storage Convergence

     Volatile and non-volatile technologies are continuing to converge

    Source: Gen-Z Consortium 2016

    *PM = Persistent Memory

    **OPM = On-Package Memory

    HMC

    HBM

    RRAM

    3DXPointTM

    Memory

    MRAM

    PCM

    Low Latency NAND

    Managed DRAM

    New and Emerging Memory Technologies

    Today

    Memory

    Storage Disk/SSD

    PM*

    Disk/SSD

    PM*

    Disk/SSD

    PM*

    Disk/SSD

    DRAM DRAM DRAM/OPM** DRAM/OPM**

    4

  • 5

    Persistent Memory Brings Storage

    To Memory Slots

    Fast Like Memory

    Persistent Like Storage

    • For system acceleration

    • For real-time data capture, analysis and intelligent response

    Persistent Memory (PM) Vision

  • Persistent Memory (PM) is a type of Non-Volatile Memory (NVM)

     Disk-like non-volatile memory  Persistent RAM disk

     Appears as disk drives to applications

     Accessed as traditional array of blocks

     Usually software arbitrated e.g. RAMdisk-like driver, or simply NVMe

     Memory-like non-volatile memory (PM)  Appears as memory to applications

     Applications store data directly in byte-addressable memory

     No IO or even DMA is required

    6

  • Persistent Memory (PM) Characteristics

     Byte addressable from programmer’s point of view

     Provides Load/Store access via CPU instructions

     Resides on the memory bus

     Has Memory-like performance  Ultra-low latency

     Supports DMA including RDMA

     Not prone to unexpected latencies

    7

  • Persistent Memory Terminology

     Terms used to describe the hardware:  Storage Class Memory (SCM)

     Byte Addressable Storage (BAS)

     Persistent Memory (PM)  Industry converging on this term

     “PMEM” also commonly used

     Non-Volatile Memory  “NVM” term is moving to indicate block mode access

     i.e. not byte-addressable, not a memory interface

     E.g. NVMe, NVMoF, etc.

  • Windows Persistent Memory Support

    9

  • Windows Persistent Memory Support

     Persistent Memory is supported in Windows 10 and Windows Server 2016  PM support is foundational in Windows and is SKU-independent

     Support for JEDEC-defined NVDIMM-N devices available in  Windows Server 2016

     Windows 10 (Anniversary Update – Fall 2016)

     Access methods:  Direct Access (DAX) Filesystem

     Mapped files with load/store/flush paradigm

     Cached and noncached with read/write paradigm

     Block-mode (“persistent ramdisk”)

     Raw disk paradigm

     Application interfaces

     Mapped and traditional file

     NVM Programming Library

     “PMEM-aware” open coded

    10

  • Direct Access Architecture

     Overview

     Support in Windows Server 2016 and Windows 10 Anniversary Update (Fall 2016)

     App has direct access to Storage Class Memory (SCM/PM) via existing memory- mapping semantics

     Updates directly modify memory, the storage stack not involved

     DAX volumes identified through new flag

     Characteristics  True device performance (no software overhead)

     Byte-Addressable

     Filter Drivers relying on I/O may not work or attach – no I/O, new volume flag

     AV Filters can still operate (Windows Defender already updated)

    11

    SCM Disk DriverSCM Bus Driver

    Block Mode

    Application

    Standard File API

    PMEM

    DirectAccess

    Application

    Load/Store

    Operations

    SCM-Aware File System (NTFS -

    DAX)

    Application

    requests memory-

    mapped file

    Enumerates NVDIMM

    User Mode

    Kernel Mode

    Memory

    Mapped

    Region

    Memory

    Mapped

    Region

    Load/Store

    Operations

    Direct Access

    Data Path

    Direct Access

    Setup Path

  • IO in DAX mode

     Memory Mapped Access  This is true zero-copy access to storage

     An application has direct access to persistent memory

     Important  No paging reads or paging writes will be generated

     Cached IO Access  The cache manager creates a cache map that maps directly to PM hardware

     The cache manager copies directly between user’s buffer and persistent memory

     Cached IO has one-copy access to persistent storage

     Cached IO is coherent with memory mapped IO

     As in memory mapped IO, no paging reads or paging writes are generated

     No Cache Manager Lazy Writer thread

     Non-Cached IO Access  Is simply converted to cached IO by the file system

     Cache manager copies directly between user’s buffer and persistent memory

     Is coherent with cached and memory mapped IO

    12

  • Backward App Compatibility on PM Hardware

     Block Mode Volumes  Maintains existing storage semantics

     All IO operations traverse the storage stack to the PM disk driver

     Sector atomicity guaranteed by the PM disk driver

     Has shortened path length through the storage stack to reduce latency

     No storport or miniport drivers

     No SCSI translations

     Fully compatible with existing applications

     Supported by all Windows file systems

     Works with existing file system filters

     Block mode vs. DAX mode is chosen at format time

    13

  • Performance Comparison

     4K random writes

     1 Thread, single core, QD=1

    14

    IOPS Avg Latency (ns) MB / Sec

    NVMe SSD 14,553 66,632 56.85

    Block Mode NVDIMM

    148,567 6,418 580.34

    DAX Mode NVDIMM

    1,112,007 828 4,343.78

  • Using DAX in Windows

     DAX Volume Creation  format n: /dax /q

     Format-Volume –DriveLetter n –IsDAX $true

     DAX Volume Identification  Is it a DAX volume?

     call GetVolumeInformation(“C:\”, ...)

     check lpFileSystemFlags for FILE_DAX_VOLUME (0x20000000)

     Is the file on a DAX volume?  call GetVolumeInformationByHandleW(hFile, ...)

     check lpFileSystemFlags for FILE_DAX_VOLUME (0x20000000)

    15

  • Using DAX in Windows

     Memory Mapping  HANDLE hMapping = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 0, NULL);

     LPVOID baseAddress = MapViewOfFile(hMapping, FILE_MAP_WRITE, 0, 0, size);

     memcpy(baseAddress + writeOffset, dataBuffer, ioSize);

     FlushViewOfFile(baseAddress, 0);

     OR … use non-temporal instructions for NVDIMM-N devices for better performance  HANDLE hMapping = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 0, NULL);

     LPVOID baseAddress = MapViewOfFile(hMapping, FILE_MAP_WRITE, 0, 0, size);

     RtlCopyMemoryNonTemporal(baseAddress + writeOffset, dataBuffer, ioSize );

     Future … use open source NVM Programming Library

    16

  • Linux Kernel 4.4+ NVDIMM-N Support

    BTT (Block, Atomic)

    PMEM

    DAX

    BLK

    File system extensions to bypass the page cache and block layer to memory map persistent memory, from a PMEM block device, directly into a process address space.

    A system-physical-address range where writes are persistent. A block device composed of PMEM is capable of DAX. A PMEM address range may span an interleave of several DIMMs.

    Block Translation Table: Persistent memory is byte addressable. Existing software may have an expectation that the power-fail-atomicity of writes is at least one sector, 512 bytes. The BTT is an indirection table with atomic update semantics to front a PMEM/BLK block device driver and present arbitrary atomic sector sizes.

    A set of one or more programmable memory mapped apertures provided by a DIMM to access its media. This indirection precludes the performance benefit of interleaving, but enables DIMM- bounded failure modes.

     Linux 4.2 added support of NVDIMMs. Mostly stable from 4.4

     NVDIMM modules presented as device links: /dev/pmem0, /dev/pmem1

     QEMU support (experimental)

     XFS-DAX and EXT4-DAX available

    17

  • Remote Access to Persistent Memory

    18

  • Going Remote

     One local copy of storage isn’t storage at all  Basically, temp data

     Enterprise-grade storage requires replication  Multi-device quorum

     In addition to integrity, privacy, manageability, … (requirements vary)

     Remote access is required

     PM value is all about LATENCY  Single digit microsecond remote latency goal

     Which btw is 2-3 orders of magnitude better than today’s block storage

     We