ICE3028: Embedded Systems Design, Fall 2019, Dongkun Shin ([email protected])
Solid State Storage Technologies
Dongkun Shin ([email protected])
Embedded Software Laboratory
Sungkyunkwan University
http://nyx.skku.ac.kr/
NVMe (1)
• The industry standard interface for
high-performance NVM storage
– NVMe 1.0 specification in 2011 (now 1.3)
– Supported by major OSes: Windows, Linux, Solaris, …
• PCIe-based
– Low latency: direct connection to CPU
– Scalable performance: 1GB/s per lane, up to 32 lanes
– No HBA required: reduced power & cost
• Form factors
– Add-in-Card, M.2, BGA, etc.
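As a rough illustration of the per-lane scaling quoted above, the sketch below multiplies the slide's nominal 1 GB/s per lane by a few lane counts. The function name and the chosen lane counts are assumptions for illustration; real throughput depends on the PCIe generation and protocol overhead.

```python
# Nominal figure from the slide: about 1 GB/s per PCIe lane.
GB_PER_SEC_PER_LANE = 1.0

def nominal_bandwidth_gb_per_s(lanes: int) -> float:
    """Return the nominal link bandwidth in GB/s for a given lane count."""
    if not 1 <= lanes <= 32:
        raise ValueError("the slide quotes up to 32 lanes")
    return lanes * GB_PER_SEC_PER_LANE

for lanes in (1, 4, 8, 16, 32):
    print(f"x{lanes}: {nominal_bandwidth_gb_per_s(lanes):.0f} GB/s")
```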
NVMe (2)
• Deep queue: 64K commands/queue, up to 64K
queues
• Streamlined command set: only 13 required
commands
• One register write to issue a command
(“doorbell”)
• Support for MSI-X and interrupt aggregation
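A minimal sketch of the doorbell mechanism listed above: the host places commands in a submission queue ring, then performs one register write to tell the controller the new tail. This is a toy model, not driver code; the names (`QueuePair`, `submit`, `controller_process`, `poll`) and the queue depth are assumptions made for illustration.

```python
class QueuePair:
    """Toy model of one NVMe submission/completion queue pair (not a driver)."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self.sq = [None] * depth   # submission queue ring in host memory
        self.sq_tail = 0           # host-side producer index
        self.sq_head = 0           # controller-side consumer index
        self.doorbell = 0          # tail value last written to the doorbell
        self.cq = []               # completions posted by the controller

    def submit(self, *commands):
        """Place commands in the ring, then ring the doorbell once."""
        for cmd in commands:
            next_tail = (self.sq_tail + 1) % self.depth
            if next_tail == self.sq_head:
                raise RuntimeError("submission queue full")
            self.sq[self.sq_tail] = cmd
            self.sq_tail = next_tail
        self.doorbell = self.sq_tail   # the single register write

    def controller_process(self):
        """Controller fetches every command up to the doorbell value."""
        while self.sq_head != self.doorbell:
            cmd = self.sq[self.sq_head]
            self.sq_head = (self.sq_head + 1) % self.depth
            self.cq.append(("done", cmd))

    def poll(self):
        """Host drains and returns all posted completions."""
        done, self.cq = self.cq, []
        return done

qp = QueuePair()
qp.submit("READ lba=0", "WRITE lba=8")
qp.controller_process()
print(qp.poll())
```

The controller never sees a command until the doorbell value moves past it, which is why a single register write is enough to issue work.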
User-level NVMe Drivers
• e.g., Intel SPDK (Storage Performance Development Kit)
– All I/O operations issued in user-land
– Polling or interrupt (signal to user process)
User-level NVMe Drivers
• NVMeDirect framework
– User-level access +
kernel-level access
[Figure: NVMeDirect architecture across the User, Kernel, and HW layers. Labels: NVMeDirect Library (I/O handles, block cache, I/O scheduler, I/O completion thread), admin tool, NVMeDirect API, I/O queues, NVMe driver, default queues, user queues, NVMe controller]
H.-J. Kim, Y.-S. Lee, and J.-S. Kim, “NVMeDirect: A User-space I/O Framework for Application-specific Optimization on NVMe SSDs,” HotStorage, 2016.
All-Flash Array
• Interfaces
– 10Gb/40Gb Ethernet (iSCSI) or
16Gb Fibre Channel or PCIe
– SAS or NVMe SSDs
• Functionalities
– Volume management
– Virtualization support
– RAID
– Snapshot
– Deduplication
– Compression, …
Traditional Block Interface
• SATA/SCSI/SAS
– Read (sector #, length)
Write (sector #, length, data)
– No block-level liveness information
– No high-level semantics on data
– Several “unwritten contracts” do not hold for SSDs
  • Sequential accesses are several tens of times better than random accesses
  • Distant LBNs lead to longer seek times
  • Data written is equal to data issued
  • …
[Figure: Host (file system -> block device driver) -> Block I/F -> SSD (FTL -> Flash I/F -> NAND flash)]
Extending Block I/F
• TRIM command
– “The data in the specified sectors is
no longer needed”
– ATA interface standard
(T13 technical committee)
– Non-queued command
– SATA 3.1 introduces the Queued
TRIM command
[Figure: Host (file system -> block device driver) -> Block I/F + SSD-specific I/F -> SSD (FTL -> Flash I/F -> NAND flash)]
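To make the effect of TRIM concrete, here is a minimal sketch of a page-mapped FTL in which TRIM drops the mappings for sectors the host no longer needs, so those pages stop counting as live data. `ToyFTL` and its fields are hypothetical names; it models only the mapping table, with no real flash, garbage collection, or wear leveling.

```python
class ToyFTL:
    """Page-level mapping table only; no real flash, GC, or wear leveling."""

    def __init__(self):
        self.l2p = {}        # logical sector -> physical page
        self.valid = set()   # physical pages holding live data
        self.next_page = 0   # flash is written out-of-place, append-only

    def write(self, sector: int):
        old = self.l2p.get(sector)
        if old is not None:
            self.valid.discard(old)      # the overwritten copy becomes stale
        self.l2p[sector] = self.next_page
        self.valid.add(self.next_page)
        self.next_page += 1

    def trim(self, start: int, count: int):
        """Host says sectors [start, start + count) are no longer needed."""
        for sector in range(start, start + count):
            page = self.l2p.pop(sector, None)
            if page is not None:
                self.valid.discard(page)

ftl = ToyFTL()
for s in range(4):
    ftl.write(s)
ftl.trim(0, 2)           # e.g., the file system deleted a file
print(len(ftl.valid))    # live pages a garbage collector would still have to copy
```

Without TRIM, the device would keep treating the deleted sectors as live and copy them during garbage collection.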
Atomic Write
• Transaction support for multi-block writes
– Simplifies file systems and DBMSes
X. Ouyang, et al., “Beyond Block I/O: Rethinking Traditional Storage Primitives,” HPCA, 2011.
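One generic way to obtain all-or-nothing multi-block writes is to write the new data out-of-place and switch the mapping table at a single commit point. The sketch below illustrates that idea only; it is not necessarily the mechanism of the cited paper, and `AtomicWriteFTL` and the `crash_before_commit` flag are hypothetical names used to simulate a failure.

```python
class AtomicWriteFTL:
    """Toy out-of-place FTL with a single commit point per multi-block write."""

    def __init__(self):
        self.flash = []    # append-only list of written pages
        self.l2p = {}      # committed mapping: LBA -> index into flash

    def read(self, lba):
        return self.flash[self.l2p[lba]]

    def atomic_write(self, blocks: dict, crash_before_commit: bool = False) -> bool:
        staged = {}
        for lba, data in blocks.items():
            self.flash.append(data)              # new data never overwrites old
            staged[lba] = len(self.flash) - 1
        if crash_before_commit:
            return False                         # staged pages are orphaned
        self.l2p.update(staged)                  # all mappings switch together
        return True

dev = AtomicWriteFTL()
dev.atomic_write({0: "a", 1: "b"})
dev.atomic_write({0: "x", 1: "y"}, crash_before_commit=True)
print(dev.read(0), dev.read(1))   # the old versions are still visible
```

Because the host sees either all new blocks or none, a file system or DBMS on top does not need its own journal for this update.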
Multi-streamed SSD (1)
• Previous write patterns (= current state) matter
Multi-streamed SSD (2)
• Map data with different lifetimes to different
streams
• Standardized in T10 SCSI/SAS (2015), NVMe 1.3
(2017)
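A small simulation of why separating lifetimes helps: when short-lived and long-lived pages share an erase block, garbage collection must copy the survivors before erasing; with one stream per lifetime class, whole blocks die together. The block size, the workload, and the function names are assumptions for illustration.

```python
from collections import defaultdict

PAGES_PER_BLOCK = 4

def place(writes, use_streams):
    """Fill erase blocks in arrival order and return the list of blocks.

    writes: list of (name, stream_id). With use_streams=False all data
    shares one open block, as on a conventional SSD.
    """
    open_blocks = defaultdict(list)
    closed = []
    for name, stream in writes:
        key = stream if use_streams else 0
        open_blocks[key].append(name)
        if len(open_blocks[key]) == PAGES_PER_BLOCK:
            closed.append(open_blocks.pop(key))
    closed.extend(b for b in open_blocks.values() if b)
    return closed

def gc_copies(blocks, dead):
    """Valid pages that must be copied to erase every block holding dead data."""
    return sum(
        sum(1 for page in block if page not in dead)
        for block in blocks
        if any(page in dead for page in block)
    )

# Hypothetical workload: short-lived "hot" and long-lived "cold" pages interleaved.
writes = [(f"hot{i}", 1) if i % 2 == 0 else (f"cold{i}", 2) for i in range(16)]
dead = {name for name, _ in writes if name.startswith("hot")}

print("copies without streams:", gc_copies(place(writes, use_streams=False), dead))
print("copies with streams:   ", gc_copies(place(writes, use_streams=True), dead))
```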
Multi-streamed SSD (3)
• Cassandra with Multi-streamed SSD
Multi-streamed SSD (4)
• Cassandra’s normalized update throughput with
5 streams
Open-Channel SSD (1)
• Why is an Open-Channel SSD required?
– I/O predictability & isolation
[Figure: read latency under mixed read/write workloads]
  • 0% writes: latency is consistent
  • 20% writes: big impact on read latency
  • 50% writes: can make SSDs as slow as spinning drives
  • I/O performance is unpredictable due to writes being buffered
Open-Channel SSD (2)
• Why is an Open-Channel SSD required?
– Log-on-log, indirection, and narrow I/O
[Figure: log-on-log software stack]
  • User space: log-structured database (e.g., RocksDB) with its own metadata mgmt., address mapping, and garbage collection
  • pread/pwrite
  • Kernel space: VFS, log-structured file system (metadata mgmt., address mapping, garbage collection), block layer
  • Read/Write/Trim
  • HW: solid-state drive (metadata mgmt., address mapping, garbage collection)
• Black-boxed SSD
– SSD state is hidden due to the narrow I/O interface
– Data placement + buffering → best effort
• We need application-driven SSDs!
Open-Channel SSD (3)
• Open-Channel SSD exposes its geometry
• LightNVM: Open-Channel SSD subsystem in Linux Kernel
– Functionalities of Flash Translation Layer (FTL)
– Administration of drive instances
– Interface between application/filesystem and Open-Channel SSD
[Figure: a traditional block I/O SSD vs. an Open-Channel SSD]
Open-Channel SSD (4)
                        Traditional SSD       Open-Channel SSD
User visibility         X                     O
Command format          Read/Write            Program/Read/Erase
Address                 LBA                   PPA (Physical Page Address)
Timing info.            None                  Program/read/erase timing
L2P mapping table       On device             On host OS
DRAM buffer             On device             On host OS
Bad block management    On device             On host OS
Write handling          Firmware on device    Thread on host OS
Garbage collection      Firmware on device    Thread on host OS
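Since the host addresses an Open-Channel SSD by physical page address, a sketch of packing and unpacking such an address may help. The field order (channel, parallel unit, block, page) and the bit widths below are hypothetical; a real device reports its own geometry, and these helpers are not part of any LightNVM API.

```python
from typing import NamedTuple

class Geometry(NamedTuple):
    # Hypothetical bit widths; a real device reports its own geometry.
    channel_bits: int = 4
    pu_bits: int = 3
    block_bits: int = 12
    page_bits: int = 9

class PPA(NamedTuple):
    channel: int
    pu: int
    block: int
    page: int

def encode(ppa: PPA, geo: Geometry = Geometry()) -> int:
    """Pack the fields into one integer, channel in the most significant bits."""
    value = 0
    for field, bits in zip(ppa, geo):
        if not 0 <= field < (1 << bits):
            raise ValueError("field out of range for this geometry")
        value = (value << bits) | field
    return value

def decode(value: int, geo: Geometry = Geometry()) -> PPA:
    """Unpack an integer produced by encode() back into its fields."""
    fields = []
    for bits in reversed(geo):
        fields.append(value & ((1 << bits) - 1))
        value >>= bits
    return PPA(*reversed(fields))

addr = PPA(channel=3, pu=5, block=1000, page=17)
print(hex(encode(addr)), decode(encode(addr)))
```

Exposing the channel and parallel-unit fields is what lets the host place data from different tenants on different units.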
Open-Channel SSD (5)
• Experiment – multi-tenant workloads
[Figure: NVMe SSD vs. OC-SSD]
Open-Channel SSD (6)
• Experiment – Predictable Latency
– 4K reads during 64K concurrent writes
– Read PU and Write PU are separated at OC-SSD
– Consistent low latency at the 99.99th, 99.999th, and 99.9999th percentiles
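Tail percentiles like these are computed from the sorted latency samples. The sketch below uses the nearest-rank definition on made-up latencies (not the experiment's data) to show how a handful of slow reads dominates the high percentiles while the median stays unchanged.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = min(len(ordered), max(1, math.ceil(p / 100 * len(ordered))))
    return ordered[rank - 1]

# Hypothetical read latencies in microseconds: 20 slow reads out of 100,000.
latencies_us = [100] * 99_980 + [5000] * 20

for p in (50, 99.9, 99.99, 99.999):
    print(f"p{p}: {percentile(latencies_us, p)} us")
```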
Open-Channel SSD (7)
• DIDACache: A deep integration of device and
application for flash-based key-value caching. (FAST’17)
– Integrate the Key-value cache system with FTL
– A single-level direct mapping from keys to physical flash
memory locations
– An integrated garbage collection
– Throughput ↑ and Latency ↓
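A minimal sketch of the single-level mapping idea: keys point directly at a (block, slot) flash location, with no LBA layer in between. The class name, the slot-based layout, and the reclaim policy (dropping a block's live items rather than copying them, which a cache can afford) are illustrative simplifications, not the paper's actual design.

```python
class DirectMappedKVCache:
    """Keys map straight to (block, slot) flash locations; no LBA layer in between."""

    def __init__(self, slots_per_block: int = 4):
        self.slots_per_block = slots_per_block
        self.blocks = [[]]        # append-only flash blocks
        self.index = {}           # key -> (block number, slot number)

    def put(self, key, value):
        if len(self.blocks[-1]) == self.slots_per_block:
            self.blocks.append([])                 # open a fresh block
        self.blocks[-1].append((key, value))
        self.index[key] = (len(self.blocks) - 1, len(self.blocks[-1]) - 1)

    def get(self, key):
        block, slot = self.index[key]
        return self.blocks[block][slot][1]

    def reclaim(self, block: int):
        """Erase one block; live items in it are evicted from the cache."""
        for slot, (key, _) in enumerate(self.blocks[block]):
            if self.index.get(key) == (block, slot):
                del self.index[key]
        self.blocks[block] = []

cache = DirectMappedKVCache(slots_per_block=2)
cache.put("a", 1)
cache.put("b", 2)
cache.put("a", 3)          # the update goes out-of-place into a new block
cache.reclaim(0)           # stale "a" is discarded, live "b" is evicted
print(cache.get("a"), "b" in cache.index)
```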
KeyValue-SSD (1)
• Internally manages variable length key-value pairs
• Provides an interface similar to that of a conventional
host-side KV store
• Offloads the key-value management layer to the SSD
– reduces host system resource usage
KeyValue-SSD (2)
KAML: fixed size 8B key,
additional log and translation
for variable size key in host
KeyValue-SSD (3)
In-Storage Computing (1)
• Samsung ISC SSD Prototype
– Commodity SSD: Samsung PM1725 NVMe with the ISC
feature
– PCIe 3.0 x4
– 800 GB
• Software
– C++11
– C++ STL
– G++
– Software emulator
In-Storage Computing (2)
• ISC Application Development Process
In-Storage Computing (3)
• ISC Dataflow Programming Model
In-Storage Computing (4)
• Example: Simple Key-Value Store
In-Storage Computing (5)
In-Storage Computing (6)
• FPGA SSD
In-Storage Computing (7)
• Cognitive SSD (ATC’19)