File Systems of Solid-State Drives
A Brief Teaching by Mark Veronda
Tuesday, November 4, 2008
Outline
• Solid-State Drives (SSD)
• History and Motivations
• Properties
• Implications
• Log-Structured (LS) File Systems
• What they are
• Why they are good for SSDs
• Implementations in Current Use
• Background
File Systems
• Storing
• Organizing
• Easy to find/access/allocate files
File Systems
• What have we been shown?
• Hierarchical Directory
• Unix and inodes
• Mainly stuff that is good for old-fashioned, slow, slow, slow disks
Block Devices
• Two (rough) Categories of I/O-devices
• Character Devices
• e.g., Keyboard
• Block Devices
• e.g., A hard-drive
Solid-State Drive
• Wikipedia Definition: “A data storage device that uses solid-state memory to store persistent data.”
• Aren’t magnets “solid-state?”
• No moving parts!
Outline
• Solid-State Drives (SSD)
• History and Motivations
• Properties
• Implications
✓ Background
• Log-Structured (LS) File Systems
• What they are
• Why they are good for SSDs
• Implementations in Current Use
History
• First Solid-State Drive?
• 1976!
• Magnetic Core Memory
• Yes, that’s why we call it “Core Dump” today
• An “SSD” from the Dataram Corporation would cost $95,000 (in 2008 dollars) for the amazing amount of 1 MB
History
• Bubble Memory
• Hyped by Intel in the early 1980s.
• Used in the Atari system
• Too expensive
History
• 1991
• DEC builds servers with SSDs
• 1995
• Sun begins to offer them as well
• 1999
• 11 manufacturers of Solid-State Drives
• 2001
• 22 manufacturers
Why Use SSD?
• No moving parts
• Reduced energy usage
• Better reliability
• Faster
Properties of an SSD
• Two types of SSDs in use today:
• Non-Volatile RAM
• NAND-Gate Flash memory
Non-Volatile RAM
• Treat RAM as a block device.
• Usually, a battery pack provides the non-volatility.
Non-Volatile RAM
• Pros:
• Speed
• Purely random I/O
• Cons:
• Expen$$$ive!!!
• Not the best way to ensure non-volatility
NAND-Gate
• NAND - Not AND
The Problem with NAND
• Floating-Gate Transistors
• Wear leveling...
Flash Memory (with NAND-gates)
• A flash memory package consists of:
• One or more dies (AKA: chips)
• Shares an 8-bit serial I/O bus
• Contains 8,192 blocks
• Organized as 4 planes of 2,048 blocks
• Each block has 64 4KB pages
• A page has an additional 128 bytes for metadata (ID and ECC)
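Taking the slide’s geometry at face value, a quick back-of-the-envelope calculation (a sketch; the per-die numbers are exactly the ones quoted above) gives the data capacity of a single die:

```python
# Flash package geometry quoted on the slide (per die).
planes_per_die = 4
blocks_per_plane = 2048
pages_per_block = 64
page_data_bytes = 4 * 1024     # 4KB of data per page
page_spare_bytes = 128         # metadata: ID and ECC

blocks_per_die = planes_per_die * blocks_per_plane        # 8,192 blocks
pages_per_die = blocks_per_die * pages_per_block
data_bytes_per_die = pages_per_die * page_data_bytes      # 2 GiB of data
spare_bytes_per_die = pages_per_die * page_spare_bytes    # 64 MiB of metadata
```

So the 8,192 blocks quoted above work out to 2 GiB of user data per die, plus 64 MiB of out-of-band metadata.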
[Diagram: an I/O serial connection feeds Die0 (Chip0), which contains Plane0–Plane3; each plane has a 4KB register and an array of blocks]
Flash Memory
• Reading:
• Fast
• Writing:
• Slow
• Changes bits from 1 to 0
• Done at the page level
• Erasing:
• Much slower
• Changes bits from 0 to 1
• Must be done at the block level
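A toy model makes the program/erase asymmetry concrete. This is a sketch, not real firmware: pages are single bytes, and `program`/`erase_block` are hypothetical names.

```python
ERASED = 0xFF  # an erased NAND cell reads as all 1s

def program(old: int, new: int) -> int:
    """A program operation can only flip bits from 1 to 0."""
    if new & ~old & 0xFF:
        raise ValueError("cannot flip a 0 back to 1; the block must be erased")
    return new

def erase_block(block: list) -> list:
    """Erasing resets every page in the block to all 1s (the slow operation)."""
    return [ERASED] * len(block)

block = erase_block([0] * 4)                  # a freshly erased 4-page "block"
block[0] = program(block[0], 0b10101010)      # fine: only 1 -> 0 transitions
try:
    block[0] = program(block[0], 0b11111111)  # would need 0 -> 1: not allowed
except ValueError:
    block = erase_block(block)                # must erase the whole block
```

This is why in-place updates are impossible on NAND: rewriting even one page of stale data forces an erase of the entire block that contains it.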
Another Problem with Flash
• Some quick statistics:
• Page read to register: 25 µs
• Write page from register: 200 µs
• Block erase: 1.5 ms
• Serial access to register (data bus): 100 µs
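These timings imply rough single-plane throughput ceilings. The sketch below assumes one page in flight at a time with no pipelining between the serial bus and the flash array, which is the simplest (and most pessimistic) model:

```python
PAGE_BYTES = 4 * 1024
T_READ  = 25e-6    # page read from array into the register
T_WRITE = 200e-6   # program a page from the register into the array
T_BUS   = 100e-6   # move one page over the serial data bus

# One read = array-to-register + register-to-host transfer.
read_mb_per_s = PAGE_BYTES / (T_READ + T_BUS) / 1e6     # ~32.8 MB/s
# One write = host-to-register transfer + program.
write_mb_per_s = PAGE_BYTES / (T_BUS + T_WRITE) / 1e6   # ~13.7 MB/s
```

Note that the 100 µs bus transfer, not the cell array, dominates read time, and that the 1.5 ms block erase is an additional cost that must be amortized over many page writes.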
[Figure diagram: Host Interface Logic (host interconnect, buffer manager), processor, RAM, and a flash demux/mux connecting the SSD controller to multiple flash packages]
Figure 3: SSD Logic Components
…address to physical flash location. The processor, buffer-manager, and multiplexer are typically implemented in a
discrete component such as an ASIC or FPGA, and data
flow between these logic elements is very fast. The pro-
cessor, and its associated RAM, may be integrated, as
is the case for simple USB flash-stick devices, or stan-
dalone as for designs with more substantial processing
and memory requirements.
As described in Section 2, flash packages export an
8-bit wide serial data interface with a similar number of
control pins. A 32GB SSD with 8 of the Samsung parts
would require 136 pins at the flash controller(s) just for
the flash components. With such a device, it might be
possible to achieve full interconnection between the flash
controller(s) and flash packages, but for larger configura-
tions this is not likely to remain feasible. For the mo-
ment, we assume full interconnection between data path,
control logic, and flash. We return to the issue of inter-
connect density in Section 3.3.
This paper is primarily concerned with the organiza-
tion of the flash array and the algorithms needed to man-
age mappings between logical disk and physical flash ad-
dresses. It is beyond the scope of this paper to tackle the
many important issues surrounding the design and layout
of SSD logic components.
3.1 Logical Block Map
As pointed out by Birrell et al. [2], the nature of NAND
flash dictates that writes cannot be performed in place as
on a rotating disk. Moreover, to achieve acceptable per-
formance, writes must be performed sequentially when-
ever possible, as in a log. Since each write of a single
logical-disk block address (LBA) corresponds to a write
of a different flash page, even the simplest SSD must
maintain some form of mapping between logical block
address and physical flash location. We assume that the
logical block map is held in volatile memory and recon-
structed from stable storage at startup time.
We frame the discussion of logical block maps us-
ing the abstraction of an allocation pool to think about
how an SSD allocates flash blocks to service write re-
quests. When handling a write request, each target log-
ical page (4KB) is allocated from a pre-determined pool
of flash memory. The scope of an allocation pool might
be as small as a flash plane or as large as multiple flash
packages. When considering the properties of allocation
pools, the following variables come to mind.
• Static map. A portion of each LBA constitutes a
fixed mapping to a specific allocation pool.
• Dynamic map. The non-static portion of a LBA is
the lookup key for a mapping within a pool.
• Logical page size. The size for the referent of a
mapping entry might be as large as a flash block
(256KB), or as small as a quarter-page (1KB).
• Page span. A logical page might span related pages
on different flash packages thus creating the poten-
tial for accessing sections of the page in parallel.
These variables are then bound by three constraints:
• Load balancing. Optimally, I/O operations should
be evenly balanced between allocation pools.
• Parallel access. The assignment of LBAs to phys-
ical addresses should interfere as little as possible
with the ability to access those LBAs in parallel. So,
for example if LBA0..LBAn are always accessed at
the same time, they should not be stored on a com-
ponent that requires each to be accessed in series.
• Block erasure. Flash pages cannot be re-written
without first being erased. Only fixed-size blocks of
contiguous pages can be erased.
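The static/dynamic split described above can be sketched in a few lines. Everything here (the pool count, the modulo split, dict-based maps) is illustrative, not the paper’s actual layout:

```python
# Hypothetical SSD with 4 allocation pools; the low bits of the LBA select the
# pool (static map), and the rest is a lookup key within the pool (dynamic map).
NUM_POOLS = 4

class AllocationPool:
    def __init__(self):
        self.next_free = 0   # next free physical page, allocated log-style
        self.table = {}      # dynamic map: key -> physical page

    def write(self, key: int) -> int:
        self.table[key] = self.next_free   # old page (if any) becomes garbage
        self.next_free += 1
        return self.table[key]

pools = [AllocationPool() for _ in range(NUM_POOLS)]

def write_lba(lba: int) -> tuple[int, int]:
    pool_id = lba % NUM_POOLS                       # static portion of the LBA
    page = pools[pool_id].write(lba // NUM_POOLS)   # dynamic portion
    return pool_id, page
```

Writing the same LBA twice lands on two different physical pages; the superseded page must eventually be reclaimed by a block erase, which is exactly the cleaning problem discussed below.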
SSD Internals
Current State of SSDs
• Two Manufacturers of note
• Samsung
• Texas Memory Systems
• Both offer proprietary “optimizations” to ensure wear-leveling.
Implications of SSD
• Ramón Cáceres, Fred Douglis, Kai Li, Brian Marsh (1993)
• Predicts that future (mobile) computers will consist of NV-RAM for running applications and a backing store of flash memory for persistent data.
Effects on File Systems
• No need to “cluster” data for locality
• Need to avoid writes to the same block
• File System cache unnecessary because data and metadata can be quickly accessed.
Outline
✓ Solid-State Drives (SSD)
✓ History and Motivations
✓ Properties
✓ Implications
✓ Background
• Log-Structured (LS) File Systems
• What they are
• Why they are good for SSDs
• Implementations in Current Use
Log-Structured File System
• Concept introduced (and implemented) by Rosenblum and Ousterhout in 1991
• Main premises:
• CPU speeds are increasing exponentially.
• RAM sizes are increasing exponentially.
• Hard-drive speeds have remained relatively constant.
• Conclusion?
• I/O will primarily consist of writes.
Motivating Example
• The BSD FFS (Fast File System), as of 1992, is terribly inefficient for writes.
• e.g., it takes 5 seeks to add one small file to a folder.
• What about for SSD with no seek-time?
• 5 locations will be rewritten
Log-Structured Overview
• Contains concepts similar to the file systems that we have seen.
• e.g., inodes and directories.
• The log is the only data structure.
• Perform write operations in a batch
• Pro: reduces number of seeks.
• Con: increases potential for data-loss.
Two Issues
• 1) How to retrieve information from the log?
• 2) How to manage free space so that there is always a large enough free extent to perform each write as a single operation?
(1) Information Retrieval
• With FFS :
• inode number --> disk address of inode.
• With a log-structured file system :
• Introduce new data structure: inode map.
Inode Map
• The inode map is divided into blocks that are written to the log.
• Fix (to a location on the disk) a checkpoint region of all the inode map blocks.
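A lookup then takes two hops: checkpoint region to inode-map block, inode-map block to inode. A sketch with dictionaries standing in for disk reads (the addresses and the entries-per-block constant are made up for illustration):

```python
ENTRIES_PER_MAP_BLOCK = 512

# Fixed checkpoint region: inode-map block number -> current disk address.
checkpoint = {0: 1000}
# Simulated disk: address -> inode-map block (inode number -> inode address).
disk = {1000: {7: 5040}}

def inode_address(inode_number: int) -> int:
    map_block = inode_number // ENTRIES_PER_MAP_BLOCK
    map_addr = checkpoint[map_block]        # always findable: fixed location
    return disk[map_addr][inode_number]     # the map block itself lives in the log
```

When an inode-map block is rewritten to the log, only its entry in the checkpoint region changes; the lookup path stays the same.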
Information Retrieval
[Excerpt from M. Rosenblum and J. K. Ousterhout, “The Design and Implementation of a Log-Structured File System”]
[Figure 1 diagram: the single log write performed by Sprite LFS vs. the scattered disk blocks written by Unix FFS for dir1/file1 and dir2/file2]
Fig. 1. A comparison between Sprite LFS and Unix FFS. This example shows the modified disk blocks written by Sprite LFS and Unix FFS when creating two single-block files named dir1/file1 and dir2/file2. Each system must write new data blocks and inodes for file1 and file2, plus new data blocks and inodes for the containing directories. Unix FFS requires ten nonsequential writes for the new information (the inodes for the new files are each written twice to ease recovery from crashes), while Sprite LFS performs the operations in a single large write. The same number of disk accesses will be required to read the files in the two systems. Sprite LFS also writes out new inode map blocks to record the new inode locations.
[Figure 2 diagram: a threaded log (old log end, new log end, old and new data blocks, previously deleted blocks) contrasted with copy and compact]
Fig. 2. Possible free space management solutions for log-structured file systems. In a log-structured file system, free space for the log can be generated either by copying the old blocks or by threading the log around the old blocks. The left side of the figure shows the threaded log approach where the log skips over the active blocks and overwrites blocks of files that have been deleted or overwritten. Pointers between the blocks of the log are maintained so that the log can be followed during crash recovery. The right side of the figure shows the copying scheme where log space is generated by reading the section of disk after the end of the log and rewriting the active blocks of that section along with the new data into the newly generated space.
be no faster than traditional file systems. The second alternative is to copy
live data out of the log in order to leave large free extents for writing. For
this paper we will assume that the live data is written back in a compacted
form at the head of the log; it could also be moved to another log-structured
file system to form a hierarchy of logs, or it could be moved to some totally
different file system or archive. The disadvantage of copying is its cost,
particularly for long-lived files; in the simplest case where the log works
circularly across the disk and live data is copied back into the log, all of the
long-lived files will have to be copied in every pass of the log across the disk.
Sprite LFS uses a combination of threading and copying. The disk is
divided into large fixed-size extents called segments. Any given segment is
always written sequentially from its beginning to its end, and all live data
must be copied out of a segment before the segment can be rewritten.
However, the log is threaded on a segment-by-segment basis; if the system
ACM Transactions on Computer Systems, Vol. 10, No. 1, February 1992.
(2) Free-Space
• Need to maintain large enough extents to write new data.
• Two approaches:
• Threading
• Copy and Compact
Threading
• Leave live data in place
• “Thread” the log through the free spaces between extents of live data.
• Drawback?
• Severe fragmentation
Copy and Compact
• Copy live data out of log
• Write the new data
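In miniature, copy and compact looks like this (purely illustrative; in a real LFS the `live` set comes from segment summary information, as described below):

```python
def copy_and_compact(segment: list, live: set, new_data: list) -> list:
    """Rewrite the live blocks of a cleaned segment, compacted together with
    new data, at the head of the log; the old segment then becomes free."""
    return [b for b in segment if b in live] + new_data
```

The dead blocks simply drop out, so each pass trades copying cost for large contiguous free extents.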
Free Space management
operation is complete, the segments that were read are marked as clean, and they can be used for new data or for additional cleaning.
As part of segment cleaning it must be possible to identify which blocks of each segment are live, so that they can be written out again. It must also be possible to identify the file to which each block belongs and the position of the block within the file; this information is needed in order to update the file’s inode to point to the new location of the block. Sprite LFS solves both of these problems by writing a segment summary block as part of each segment. The summary block identifies each piece of information that is written in the segment; for example, for each file data block the summary block contains the file number and block number for the block. Segments can contain multiple segment summary blocks when more than one log write is needed to fill the segment. (Partial-segment writes occur when the number of dirty blocks buffered in the file cache is insufficient to fill a segment.) Segment summary blocks impose little overhead during writing, and they are useful during crash recovery (see Section 4) as well as during cleaning.
Sprite LFS also uses the segment summary information to distinguish live blocks from those that have been overwritten or deleted. Once a block’s identity is known, its liveness can be determined by checking the file’s inode or indirect block to see if the appropriate block pointer still refers to this block. If it does, then the block is live; if it doesn’t, then the block is dead. Sprite LFS optimizes this check slightly by keeping a version number in the inode map entry for each file; the version number is incremented whenever the file is deleted or truncated to length zero. The version number combined with the inode number form a unique identifier (uid) for the contents of the file. The segment summary block records this uid for each block in the segment; if the uid of a block does not match the uid currently stored in the inode map when the segment is cleaned, the block can be discarded immediately without examining the file’s inode.
This approach to cleaning means that there is no free-block list or bitmap in Sprite. In addition to saving memory and disk space, the elimination of these data structures also simplifies crash recovery. If these data structures existed, additional code would be needed to log changes to the structures and restore consistency after crashes.
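The uid shortcut described above can be sketched as follows (the data shapes are illustrative dictionaries and tuples, not Sprite’s on-disk format):

```python
# Current version number per inode, as kept in the inode map.
inode_map_version = {3: 2, 9: 1}

def maybe_live(uid: tuple[int, int]) -> bool:
    """A block whose recorded (inode, version) uid no longer matches the inode
    map is certainly dead; a matching uid still needs the full inode check."""
    inode_number, version = uid
    return inode_map_version.get(inode_number) == version

# Segment summary uids for three blocks; the middle one was written before its
# file was deleted and re-created (version bumped from 1 to 2), so it is dead.
summary = [(3, 2), (3, 1), (9, 1)]
survivors = [uid for uid in summary if maybe_live(uid)]
```

Only the survivors are then verified against their inodes and copied out, so whole runs of stale blocks are discarded without any inode reads.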
3.4. Segment cleaning policies
Given the basic mechanism described above, four policy issues must be addressed:
(1) When should the segment cleaner execute? Some possible choices are for it to run continuously in background at low priority, or only at night, or only when disk space is nearly exhausted.
(2) How many segments should it clean at a time? Segment cleaning offers an opportunity to reorganize data on disk; the more segments cleaned at once, the more opportunities to rearrange.
(3) Which segments should be cleaned? An obvious choice is the ones that are most fragmented, but this turns out not to be the best choice.
(4) How should the live blocks be grouped when they are written out? One possibility is to try to enhance the locality of future reads, for example by grouping files in the same directory together into a single output segment. Another possibility is to sort the blocks by the time they were last modified and group blocks of similar age together into new segments; we call this approach age sort.
Pros (for SSD)
• Better write efficiency
• Provides wear-leveling
JFFS 1/2
• JFFS - Journalling Flash File System
• Perfect!
• Optimized for NOR-gate flash chips.
• Implements some functionality that is not necessary for NAND-gate chips.
NOR-Gate
• NOR - Not OR
• Low density
• Slower read/write speeds
• Expensive (compared to NAND)
YAFFS
• YAFFS - Yet Another Flash File System
• Specifically Designed for NAND
• Must be perfect!
• Right?
• Maximum file system size: 8 GB
✓ Vote
✓ Solid-State Drives (SSD)
✓ History and Motivations
✓ Properties
✓ Implications
✓ Background
✓ Log-Structured (LS) File Systems
✓ What they are
✓ Why they are good for SSDs
✓ Implementations in Current Use
(May the best File-System for this nation win!)
Thank you for Learning