Architectural Techniques for
Improving NAND Flash Memory Reliability
Thesis Proposal
Yixin Luo
B.S., Computer Engineering, University of Michigan
B.S., Electrical Engineering, Shanghai Jiao Tong University
Thesis Prospectus Committee
Prof. Onur Mutlu (Chair)
Prof. Phillip B. Gibbons
Prof. James C. Hoe
Dr. Yu Cai
Dr. Erich F. Haratsch
July 12, 2016
Carnegie Mellon University
Pittsburgh, PA
Contents

1 Introduction
  1.1 Thesis Statement
  1.2 Our Approach
2 Background
  2.1 SSD Organization
  2.2 NAND Flash Block Organization
  2.3 NAND Flash Errors
  2.4 NAND Flash Operations
3 Related Work
  3.1 Threshold Voltage Distribution Modeling
  3.2 Flash Media Reliability Optimization
  3.3 Flash Channel Noise Tolerance Techniques
  3.4 Flash Reliability Management Techniques
  3.5 Host-Side Data Management Techniques
  3.6 Self-Healing Effect
4 Preliminary Work: WARM—Write-hotness Aware Retention Management
  4.1 Partitioning Data Using Write-Hotness
    4.1.1 Identifying Write-Hot and Write-Cold Data
    4.1.2 Partitioning the Flash Device
    4.1.3 Tuning the Partition Boundary
  4.2 Flash Management Policies
    4.2.1 WARM-Only Management
    4.2.2 Combining WARM with Refresh
  4.3 Summary of Results
5 Proposed Work 1: Online Characterization and Modeling of NAND Flash Memory Errors
6 Proposed Work 2: Model-Driven Flash Management Policies
7 Proposed Work 3: Characterization and Utilization of NAND Flash Memory Self-Healing Effect
8 Timeline
9 Conclusion
1. Introduction
Raw bit errors are common in NAND flash memory and will increase in the future. These
flash errors are inevitable in nature because they are caused by various sources of circuit-level
noise [3, 4, 5, 6, 7, 8, 10, 35]. To ensure reliable operation in the presence of these errors,
flash memory deploys error-correcting codes (ECC), which can correct errors up to a certain raw
bit error rate. However, the raw bit error rate increases as flash memory becomes increasingly
vulnerable to circuit noise due to increased wearout. These errors exceed the ECC correction
capability at a point beyond which reliable operation of the flash memory can no longer be
guaranteed. The duration until this point is known as the flash memory lifetime. Exacerbating the flash error count is the
continued increase in flash density through feature size scaling, multi-level cell (MLC), and 3D
NAND technology, all of which trade off flash reliability for significant cost reduction and capacity
increase. Due to the degraded flash reliability, the ECCs used for today’s flash memory can only
sustain a shorter lifetime for future flash memory with higher density.
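To make the lifetime notion concrete, the following sketch computes the P/E cycle count at which a modeled raw bit error rate crosses the ECC correction limit. The exponential growth model and all constants are illustrative only, not measured values.

```python
def flash_lifetime(ecc_limit, rber_at_cycle):
    """Return the first P/E cycle count at which the raw bit error rate
    (RBER) exceeds the correction capability of the deployed ECC."""
    cycles = 0
    while rber_at_cycle(cycles) <= ecc_limit:
        cycles += 1
    return cycles

# Illustrative RBER model: errors grow with wearout (constants made up).
rber = lambda pe_cycles: 1e-6 * (1.002 ** pe_cycles)

# A stronger ECC (tolerating a higher RBER) sustains a longer lifetime.
weak_ecc_lifetime = flash_lifetime(1e-3, rber)
strong_ecc_lifetime = flash_lifetime(1e-2, rber)
```

In this model, a stronger code extends the lifetime, but only at the hardware cost of a more capable ECC encoder/decoder.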
Our goal in this proposal is to improve flash reliability by 1) reducing the raw bit error rate
or 2) tolerating more errors at lower cost and with lower performance overhead. To reduce the
error rate during read and write (or program) operations, we can increase the precision of these
operations. However, this comes at a cost of higher read or write latency. To tolerate more errors,
we can use a stronger ECC. Today’s flash memory typically uses an ECC encoding whose coding
rate is already approaching the theoretical limit [32]. Thus, stronger error correction capability can
only be achieved through higher redundancy or larger coding granularity, both of which increase
the hardware overhead of the ECC encoder/decoder [8, 9]. As we can see, naive approaches that reduce
or tolerate raw bit errors for improving flash reliability do not meet our goal of low cost and low
performance overhead. Thus, to achieve our goal, we propose to perform research that provides
evidence for the following thesis statement.
1.1. Thesis Statement
NAND flash memory reliability can be improved at low cost and with low per-
formance overhead by deploying various architectural techniques that are aware of
higher-level application behavior and underlying flash device characteristics.
1.2. Our Approach
Our approach is to understand flash error characteristics and workload behavior through charac-
terization, and to design smart flash controller algorithms that utilize this understanding to improve
flash reliability. Compared with existing techniques that focus on only a part of the flash memory
system (e.g., circuit optimization, ECC code improvement, or file system design), our approach has
the following three advantages. First, we can take advantage of higher-level application behavior
such as write frequency and locality, and deploy the best suited flash reliability techniques for differ-
ent data. Second, we can take advantage of underlying device characteristics, such as variations in
latency, retention time, or errors, to develop more efficient flash-aware reliability techniques. Third,
we can take advantage of the unused computing resources in the flash controller during idle time,
or potentially in the host, to enable more effective flash reliability techniques. In this proposal, we
propose to investigate four directions to enable these three advantages and efficiently improve flash
reliability.
Direction 1: Our preliminary work devises a technique that reduces the overhead of relaxing
the flash retention time guarantee, i.e., the duration for which data can be held reliably. This
is an example of our approach that exploits application-level write-hotness and device-level
retention characteristics to improve flash lifetime. We find that a small portion of data that is very
frequently written to (i.e., write-hot data) needs only a very short retention time guarantee to be
reliable. However, write-hot data consumes a large fraction of flash endurance and increases the
write amplification factor if such data is not managed properly. We also find that relaxing the
flash retention time guarantee reduces the number of retention errors that must be tolerated, and
thus significantly improves flash endurance. Taking advantage of these two findings, we design a
mechanism that physically partitions write-hot data from write-cold data. By doing this, our mechanism
enables heterogeneous management of each partition—we can relax the retention time constraint
for write-hot data without any refresh overhead, which is required for write-cold data. Our key
results show that this mechanism can improve overall flash lifetime by 3.24× over a conventional
management policy without refresh. More details are in Section 4 and our MSST 2015 paper [27].
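The core idea of write-hotness-aware partitioning can be sketched as follows; the simple counter-based hotness test and the threshold value are illustrative simplifications, not WARM's actual identification mechanism (Section 4):

```python
from collections import Counter

class WriteHotnessPartitioner:
    """Sketch: divide logical pages into a write-hot pool (relaxed retention
    guarantee, no refresh) and a write-cold pool (full guarantee, refreshed)."""

    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold  # illustrative tuning knob
        self.write_counts = Counter()

    def record_write(self, logical_page):
        self.write_counts[logical_page] += 1

    def partition(self, logical_page):
        # Frequently rewritten data is overwritten long before retention
        # errors accumulate, so it only needs a short retention guarantee.
        if self.write_counts[logical_page] >= self.hot_threshold:
            return "hot"    # relaxed retention, no refresh overhead
        return "cold"       # full retention guarantee, periodic refresh
```

For example, a page written five times would land in the hot pool, while a page written once stays cold.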
Direction 2: Our first proposed work aims to improve flash reliability by developing an un-
derstanding of flash errors through characterization of modern flash chips, and by constructing a
flash device model in the flash controller. This will be an example of our approach that exploits
the unused computing resources in the flash controller and enables greater device-awareness. De-
pending on the availability of real chips we can test, we expect to profile modern NAND flash chips
(e.g., 3D, MLC/TLC/QLC NAND flash chips) to construct and validate our model. We find that a
flash device model can be leveraged to predict raw bit error rate and improve many aspects of flash
reliability and performance [4, 15, 42, 48, 49, 52]. We also find, however, that existing models are
either too simple and inaccurate or too complex to be constructed and used online. Based on these
two findings, we intend to devise a new, accurate, and easy-to-compute flash device model online,
while the flash device is operating. We will especially focus on cutting-edge devices, including 3D
NAND, TLC, and very aggressive MLC devices.
Direction 3: In our second proposed work, we aim to improve flash reliability by designing a
device-aware flash controller that exploits the online-constructed flash model in various ways. This
will be an example of our approach that exploits device awareness in the flash controller. We find
that an accurate online threshold voltage distribution model can be used to accurately estimate
many flash parameters for optimizing flash reliability and performance. These parameters include
the optimal read reference voltage, remaining flash endurance, and likelihood ratio used in the ECC
code (i.e., the probability ratio of a certain threshold voltage range according to its distribution).
By exploiting these estimations in the flash controller, we can improve flash reliability and lifetime
with low cost and low performance overhead.
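As an illustration of one such use, the sketch below derives an optimal read reference voltage between two neighboring states from a modeled threshold voltage distribution. The Gaussian shape and equal state probabilities are simplifying assumptions, not claims about real devices:

```python
import math

def misread_rate(v_read, mu0, sigma0, mu1, sigma1):
    """Modeled fraction of cells misread when the lower state (mu0) and the
    higher state (mu1) are equally likely Gaussians and read at v_read."""
    def cdf(x, mu, sigma):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    err_low = 1.0 - cdf(v_read, mu0, sigma0)   # lower-state cells above v_read
    err_high = cdf(v_read, mu1, sigma1)        # higher-state cells below v_read
    return 0.5 * (err_low + err_high)

def optimal_read_voltage(mu0, sigma0, mu1, sigma1, steps=200):
    """Sweep candidate voltages between the two means and pick the one the
    model predicts yields the fewest raw bit errors."""
    candidates = [mu0 + (mu1 - mu0) * i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda v: misread_rate(v, mu0, sigma0, mu1, sigma1))
```

For two equal-variance states the optimum falls at the midpoint of their means; with unequal variances it shifts toward the tighter distribution.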
Direction 4: To improve flash reliability, our third proposed work aims to understand and
exploit the flash self-healing effect [11, 21, 37, 56], the phenomenon in which flash memory cells
gradually recover a fraction of their wear over time. This will be an example of our approach that
improves flash reliability by exploiting device-level behavior. We find an opportunity to significantly
improve flash lifetime by exploiting the heat-accelerated self-healing effect in flash memory [11, 21,
37, 56]. We expect to prove or disprove this opportunity through quantitative experimental analysis
of real flash chips. Exploiting the self-healing effect is challenging because high temperatures may
induce retention errors that can damage the data stored in flash memory, forcing the data to be
migrated away from flash memory during self-healing operations. To mitigate this problem, we
expect to design a device and workload aware mechanism that maximizes the benefit of self-healing
while minimizing expensive data migration. By doing this, we aim to unleash the extended flash
lifetime due to the self-healing effect with minimal performance loss. The results of this work and
the possibility of designing such a mechanism are heavily dependent on the nature and characteristics
of the data that we collect and analyze.
We believe the findings of this thesis will also be applicable to other memory technologies.
For example, the idea of using architectural techniques that take advantage of device-level and
application-level characteristics to improve memory reliability can also be applied to DRAM. To
achieve better cost-effectiveness in data centers, we have proposed to enable flexible memory reli-
ability provisioning using Heterogeneous-Reliability Memory techniques [28, 29]. Our key results
show that our techniques can reduce server hardware cost by 4.7% while achieving 99.90% single
server reliability. This work is an example of our approach that exploits application-level memory
error tolerance as well as the device-level memory error characteristics.
In this thesis, we expect and hope to make the following new contributions:
• Develop a new mechanism to relax flash retention time constraints for write-hot data, such that
we can eliminate unnecessary refresh operations and extend flash lifetime.
• Construct a new, easy-to-compute, and accurate flash device model by experimentally charac-
terizing real modern flash chips (e.g., 3D, MLC, TLC, QLC chips).
• Develop new mechanisms to improve flash reliability and lifetime by exploiting our new online
flash device models.
• Examine and quantitatively evaluate the self-healing effect in real flash chips.
• Develop new techniques that take advantage of the self-healing effect to improve flash memory
reliability with low performance overhead.
2. Background
The reliability of NAND flash-based Solid-State Drives (SSDs) is becoming increasingly impor-
tant in modern computing systems because of three reasons. First, SSDs have become common
in today’s datacenters, laptops, and PCs as they provide significant performance benefit over a
traditional hard disk drive (HDD). Second, SSDs are typically used as persistent (or non-volatile)
storage devices, and a reliability failure can lead to the loss of precious user data. Third, SSDs are
becoming more tightly coupled with the system. While many of today's SSDs are connected to the host through
an I/O bus interface such as Serial ATA (SATA), future SSDs will be connected through the PCI
Express (PCIe) or DDR memory bus to reduce access latency. As an example, Figure 1 shows how
the SSD fits into today’s system. In this figure, the SSD is first connected to the southbridge (or
I/O controller hub) through the SATA bus, then to the northbridge (or memory controller hub)
through an internal bus, and to the host CPU through the front-side bus (FSB). In today’s system,
the data from the SSD is first copied to DRAM, which is connected to the northbridge through
the memory bus, then becomes accessible by the host CPU. In contrast, future PCIe or DDR SSDs
bypass the southbridge and can be directly accessed by the host CPU.
Figure 1. How SSD fits in today's system.
In this section, we will provide background information on how a NAND flash-based SSD works
and how NAND flash errors are induced to affect SSD reliability. We will demonstrate the internal
organization of an SSD from its high-level architecture (Section 2.1) to each individual flash cell
that stores its data (Section 2.2). Then we will describe how different types of circuit-level noise
induce flash errors (Section 2.3). We will also describe three basic operations that enable storing
and retrieving data in modern NAND flash memory (Section 2.4).
2.1. SSD Organization
Overview: Figure 2 plots the organization of a typical NAND flash-based SSD. The SSD
stores its data in flash chips, each of which can be accessed independently. The flash chips
communicate with the host through a flash controller. The flash controller consists of various
logical and physical modules that handle raw bit errors in the flash chips and bridge the gap
between the host interface and the flash interface. The ECC encoder/decoder module encodes all
the data stored in the flash chip and detects and, if possible, corrects any raw bit error when the
data is accessed. To minimize latency and energy consumption, the ECC encoder/decoder module
is typically implemented as an application-specific integrated circuit (ASIC). The other modules
are typically implemented in the firmware that runs on an embedded processor within the flash
controller. These modules include the host and the flash interface, and the flash translation
layer (FTL). Some high-end SSDs contain DRAM to allow buffering and to speed up the flash
controller. Next, we will introduce relevant components in turn.
Figure 2. SSD organization.
Flash chip: Each flash chip can be read or written at the granularity of a flash page, which is
typically 8–16 KB in today's flash devices. Before a flash page can be overwritten with new data,
the old data contained in that page has to be erased. Due to limitations in its circuit design [36],
NAND flash memory performs erase operations at the granularity of a flash block, which typically
consists of 256–512 flash pages. Figure 3 plots a typical example of how flash pages are organized
within a flash chip. In this example, each flash chip consists of two flash dies, each of which
consists of two flash planes. Within a flash plane, all flash pages share the same sense amplifier,
which buffers the data when accessing a page from the plane.
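For illustration, a flat page index can be decomposed into the example geometry above (2 dies, 2 planes per die, 1,024 blocks per plane, 256 pages per block); real chips use vendor-specific layouts, so the ordering of the fields here is an assumption:

```python
def decode_flash_address(page_index,
                         pages_per_block=256, blocks_per_plane=1024,
                         planes_per_die=2, dies_per_chip=2):
    """Split a flat page index into (die, plane, block, page) coordinates."""
    page = page_index % pages_per_block
    block = (page_index // pages_per_block) % blocks_per_plane
    plane = (page_index // (pages_per_block * blocks_per_plane)) % planes_per_die
    die = page_index // (pages_per_block * blocks_per_plane * planes_per_die)
    assert die < dies_per_chip, "page index exceeds chip capacity"
    return die, plane, block, page
```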
Host and flash interfaces: The host interface, such as SCSI [53], SATA [33], or NVMe [1], is
distinct from the flash interface. These interfaces are each handled by separate interface handlers
in the embedded processor. The host interface handles conventional sector read and write
requests issued by the file system. The flash interface contains flash-specific commands necessary
for managing or optimizing flash performance and reliability (e.g., erase, access special registers,
etc.), which are typically hidden from the host. The flash interface also enforces several NAND
flash-specific constraints. For example, a flash block must be erased before writing new data, and
flash pages within a block must be written in page order. To bridge the gap between the host and
flash interfaces, the FTL, which sits in between the two interfaces, translates the incoming requests
into corresponding flash commands and addresses. Also hidden from the host interface are the
raw flash errors in the flash chips. To correct these errors, all data transferred through the flash
Figure 3. Flash chip organization.
interface to or from the flash chips is encoded or decoded by the ECC encoder/decoder unit.
Flash translation layer (FTL): A modern FTL performs three major tasks (or flash man-
agement policies) to bridge the host and flash interface. First, the FTL maintains a mapping table
which translates each requested logical page number from the host interface to a physical page
number in the flash chips. If the host request is a write request, the FTL updates the table entry
for the requested logical page with the next available physical page number, then invalidates the
physical page that contains the old data. Second, the FTL performs garbage collection, which
finds and erases invalid flash blocks such that the block can be reused. If necessary, the FTL moves
valid data to create reusable flash blocks. Third, the FTL records the amount of wear on each flash
block and performs wear-leveling, which aims to evenly distribute writes across all flash blocks.
These three tasks within FTL are typically implemented in the firmware running on the embedded
processor within the flash controller.
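The address translation task can be sketched as follows. Garbage collection and wear-leveling are omitted for brevity, and the flat free list is an illustrative simplification, not how a production FTL allocates pages:

```python
class SimpleFTL:
    """Sketch of page-level address translation with out-of-place updates."""

    def __init__(self, num_physical_pages):
        self.mapping = {}                        # logical page -> physical page
        self.free = list(range(num_physical_pages))
        self.invalid = set()                     # stale pages awaiting GC

    def write(self, logical_page):
        new_phys = self.free.pop(0)              # next available physical page
        old_phys = self.mapping.get(logical_page)
        if old_phys is not None:
            self.invalid.add(old_phys)           # old copy becomes garbage
        self.mapping[logical_page] = new_phys
        return new_phys

    def read(self, logical_page):
        return self.mapping[logical_page]
```

Overwriting a logical page thus never rewrites the old physical page in place; it remaps the logical page and leaves the stale copy for garbage collection.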
2.2. NAND Flash Block Organization
As we show in Section 2.1, a flash chip is organized into multiple flash blocks. Each flash block
consists of a two-dimensional array of flash cells (or floating-gate transistors), which are the basic
storage components of flash memories. A single-level cell (SLC) can store one bit; a multi-level
cell (MLC) can store two bits; a triple-level cell (TLC) can store three bits. In this section, we
explain the organization of an MLC flash block as an example.
Figure 4 plots the internal logical structure of an MLC flash block. Within the block, all the
cells in the same row share a common wordline (i.e., WL in Figure 4), which typically spans 32K
to 64K cells. In an MLC flash block, each cell stores a 2-bit value, which we represent as a tuple
(LSB, MSB) in this section, where LSB is the least significant bit and MSB is the most significant
bit. The LSBs stored in each wordline form an LSB page, and the MSBs stored in each wordline
form an MSB page. Within a block, all cells in the same column are connected in series to form a
bitline or string (i.e., BL in Figure 4). Each string is connected at one end to ground (GND)
through a transistor controlled by the ground select line (GSL), and at the other end to the sense amplifier
(also shown in Figure 3) through a transistor controlled by the string select line (SSL). Multiple strings
in a flash plane are connected in parallel to a common sense amplifier. During a read operation,
the sense amplifier on each string decodes the two-bit value stored in one cell selected by the SSL
and the WL signal. Note that the same flash block organization shown in Figure 4 applies to 3D
NAND.
Figure 4. Flash block organization.
2.3. NAND Flash Errors
To understand the nature of flash errors, we first explain how data is stored in flash cells. We
then explain how various types of circuit-level noise affect flash memories and lead to flash errors.
Threshold Voltage Distribution: NAND flash memory represents data using the threshold
voltage of each flash cell. For an MLC NAND flash memory, each flash cell stores a two-bit value,
and can be programmed to one of four threshold voltage states, which we call the ER, P1, P2, and
P3 states. Each state represents a different two-bit value, and is assigned a voltage window within
the range of all possible threshold voltages. Due to variation across different program operations,
the threshold voltage of flash cells programmed to the same state is initially distributed across this
voltage window. This is called the threshold voltage distribution of a state.
Figure 5 illustrates the threshold voltage distribution of an MLC NAND flash memory chip,
for all possible (four) states. The x-axis shows the threshold voltage (Vth), which spans a certain
voltage range. The y-axis shows the probability density at each voltage level across all flash memory
cells. The threshold voltage distribution of each threshold voltage state can be represented as a
probability density curve that spans its own voltage window. We label the distribution curve for
each state with the name of that state and its corresponding two-bit value (i.e., LSB and MSB
values). The boundaries between neighboring threshold voltage windows are referred to as read
reference voltages, labeled Va, Vb, and Vc in Figure 5. These voltages are used by the flash
controller to identify the voltage window (i.e., state) of each cell. For example, a flash cell is
considered to be in the P1 state, representing a value of 10, if the threshold voltage of this cell is
between Va and Vb.
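Reading a cell thus amounts to locating its threshold voltage among the read reference voltages, as the following sketch shows (the voltage values used are illustrative parameters):

```python
def decode_cell(vth, va, vb, vc):
    """Map a cell's threshold voltage to its MLC state and (LSB, MSB) value,
    using the state-to-value assignment described above."""
    if vth < va:
        return "ER", (1, 1)
    elif vth < vb:
        return "P1", (1, 0)
    elif vth < vc:
        return "P2", (0, 0)
    else:
        return "P3", (0, 1)
```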
Figure 5. Threshold voltage distribution of an MLC NAND flash memory chip.
Noise and Errors: Over time, the threshold voltage distributions shift, as shown in Figure 6,
and can overlap with each other. These shifts can happen due to various types of noise in flash
memory. Such noise includes: (1) program/erase (P/E) cycling noise, causing the flash cells with
higher wear out to end up with higher threshold voltages right after programming [3, 4, 48]; (2)
cell-to-cell program interference noise, increasing the threshold voltage of neighboring flash cells in
adjacent rows during a program operation [3, 7, 10, 35], (3) read disturb noise, slightly increasing
the threshold voltage of all other flash pages within a block during a read operation [5, 13, 35],
and (4) retention noise, decreasing the threshold voltage of a flash cell over time due to charge
leakage [6, 8, 9, 35].
If the threshold voltage of a flash cell shifts into the threshold voltage window of a state other
than the one it was originally programmed in, the cell can be misread, leading to a flash error.
For example, a cell in the rightmost part of the ER state distribution in Figure 6 will be misread
as being in the P1 state. Similarly, any cell that shifts across the original read reference voltages
(Va, Vb, Vc) induces a raw bit error. As such, the raw bit errors (i.e., the shaded area in Figure 6)
increase as a result of threshold voltage shifts.
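The effect of a distribution shift on raw bit errors can be illustrated with a small sketch; the voltages below are arbitrary example values, not characterization data:

```python
def count_raw_bit_errors(programmed_states, vth_values, va, vb, vc):
    """Count cells whose (possibly shifted) threshold voltage is read back
    as a state other than the one they were programmed to."""
    def read_state(v):
        if v < va:
            return "ER"
        if v < vb:
            return "P1"
        if v < vc:
            return "P2"
        return "P3"
    return sum(1 for state, v in zip(programmed_states, vth_values)
               if read_state(v) != state)

# Three ER cells programmed just below Va = 1.0 (illustrative values).
states, vth = ["ER", "ER", "ER"], [0.70, 0.80, 0.95]
before = count_raw_bit_errors(states, vth, 1.0, 2.0, 3.0)
# After a +0.2 V shift, two cells cross Va and are misread as P1.
after = count_raw_bit_errors(states, [v + 0.2 for v in vth], 1.0, 2.0, 3.0)
```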
Figure 6. Threshold voltage distributions after the voltages shift, leading to raw bit errors.
2.4. NAND Flash Operations
NAND flash memory supports three basic operations: read, program, and erase. We first
explain how read operations are performed in modern flash chips. We then explain how erase and
program operations are performed. Recall from Section 2.1 that read and program operations are
performed at the page granularity, whereas erase operations are performed at the block granularity.
Read Operation: Data can be read from NAND flash memory by applying read reference
voltages onto the control gate of each cell, to sense the cell’s threshold voltage. To read the LSB of
the cell, we only need to distinguish the states with an LSB value of 1 (ER and P1) from those with
an LSB value of 0 (P2 and P3). As Figure 5 shows, we only need to use one read reference voltage,
Vb, to read the LSB page. To read the MSB page, we need to distinguish the states with an MSB
value of 1 (ER and P3) from those with an MSB value of 0 (P1 and P2). Therefore, we need to
determine whether or not the threshold voltage of the cell falls between Va and Vc, requiring us to
apply both read reference voltages, one at a time, to determine the MSB data.
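These two page reads can be sketched as threshold comparisons (the read reference voltages are parameters here):

```python
def read_lsb(vth, vb):
    """LSB read: one comparison against Vb, since ER and P1 (below Vb)
    store LSB = 1 while P2 and P3 (above Vb) store LSB = 0."""
    return 1 if vth < vb else 0

def read_msb(vth, va, vc):
    """MSB read: two comparisons, since MSB = 0 exactly when the threshold
    voltage lies between Va and Vc (states P1 and P2)."""
    return 0 if va <= vth < vc else 1
```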
As we discussed in Section 2.3, the threshold voltage distributions of each state can shift over
time, causing some cells to move into neighboring voltage windows. To reduce the number of errors
that occur when a large number of cells experience threshold voltage shifts, modern NAND flash
chips support the read-retry mechanism, which adapts the read reference voltages to correspond
to the distribution shifts [4, 6]. The read-retry operation allows the flash controller to increase or
decrease the read reference voltages by multiples of a minimal voltage step (Vstep), iterating over
several potential values of each read reference voltage to empirically find the voltage value that
yields the lowest raw bit error rate. As we show in Figure 7, the new boundaries between the
shifted threshold voltage distributions, V′a, V′b, and V′c, yield the fewest raw bit errors (i.e., the size
of the shaded areas is the smallest), thus we consider them the optimal read reference voltages.
By trying to re-read the same flash page with different read reference voltages after a read failure,
the flash controller increases the chances of reading the data correctly. For example, in Figure 6, the
original read reference voltage Va, between the ER and P1 states, misreads many of the flash cells
(e.g., many cells actually belonging to the ER state would be incorrectly identified as belonging to
the P1 state), but the flash controller can move the read reference voltage to V′a (as shown
in Figure 7), using the read-retry mechanism, thereby significantly reducing the number of errors.
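The read-retry loop can be sketched as follows; `read_page` and `check_ecc` are hypothetical stand-ins for the controller's raw page read and ECC decode, and the step schedule is illustrative:

```python
def read_retry(read_page, check_ecc, v_default, v_step, max_retries=8):
    """After a read failure, step the read reference voltage down and up by
    multiples of v_step until ECC decoding succeeds, as in read-retry."""
    data = read_page(v_default)
    if check_ecc(data):
        return v_default, data
    for i in range(1, max_retries + 1):
        for v in (v_default - i * v_step, v_default + i * v_step):
            data = read_page(v)
            if check_ecc(data):
                return v, data          # found a voltage ECC can handle
    raise IOError("uncorrectable: all read-retry attempts failed")
```

A controller could additionally remember the last successful voltage per block, so that later reads start near the optimum instead of at the default.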
Figure 7. Threshold voltage distributions after the voltages shift, resulting in overlapping distributions.
Erase and Program Operations: In NAND flash, data can be programmed only into an
erased flash cell. Since the negative gate voltage can be applied only to all flash cells in a block
at once, a flash block must be erased in its entirety [36]. The erase operation resets the threshold voltage state of
all cells in the flash block to the ER state.
When data is programmed, charge is transferred into the floating gate of a flash cell by repeat-
edly pulsing the programming voltage, in a procedure known as incremental-step-pulse programming
(ISPP) [51]. In order to reduce the impact of interference caused by the programming process on
the neighboring cells (called cell-to-cell program interference [7]), two-step programming is employed
for MLC NAND flash: the LSB is first programmed into a cell, and then the MSB is programmed
only after partial data is programmed into neighboring cells [47]. In the first step, a flash cell is
partially programmed based on its LSB value, either staying in the ER state or moving to a tempo-
rary state (TP), whose mean voltage falls between states P1 and P2. In the second step, the LSB
data is first read back into an internal buffer to determine the cell's current threshold voltage,
and then further programming pulses are applied based on the MSB data, to increase the cell’s
threshold voltage into its final state.
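The verify-and-pulse loop of ISPP can be sketched as follows; the model is idealized in that each pulse raises the threshold voltage by exactly one step with no program noise:

```python
def ispp_program(current_vth, target_vth, v_step=0.25, max_pulses=64):
    """Repeatedly apply a program pulse, then a verify read, until the
    cell's threshold voltage reaches the target level (idealized ISPP)."""
    pulses = 0
    while current_vth < target_vth and pulses < max_pulses:
        current_vth += v_step   # one program pulse raises Vth by ~v_step
        pulses += 1             # each pulse is followed by a verify read
    return current_vth, pulses
```

A smaller step yields a tighter final threshold voltage distribution at the cost of more pulses, i.e., higher program latency, which is the precision/latency trade-off noted in Section 1.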
3. Related Work
Many prior works have proposed to improve flash reliability through characterization, mod-
eling, and optimization techniques. We classify these works into different groups based on their
similarities.
3.1. Threshold Voltage Distribution Modeling
Characterization: Prior work has experimentally characterized the threshold voltage dis-
tribution, and how various types of noise can change that distribution. Prior work makes the
following conclusions about threshold voltage shifts: 1) Threshold voltage distribution shifts to the
right (i.e., towards higher threshold voltages) and becomes wider as the program/erase (P/E)
cycle count increases [4, 48]; 2) Threshold voltage distribution shifts to the right when neighboring cells are
programmed to higher voltage states [7, 10]; 3) The distribution of the lower voltage states shifts
to the right when other pages in the same block are read repeatedly [5, 43]; 4) Threshold voltage
distribution shifts to the left (i.e., towards lower threshold voltages) over time due to retention
loss [6, 8, 9]. These experimental characterization works focus mostly on state-of-the-art MLC
NAND flash chips. In contrast, our proposal plans to characterize newer generation TLC or 3D
NAND flash chips, which can be more susceptible to errors than MLC NAND flash due to a smaller
voltage margin (for TLC NAND) or an extra dimension for interference (for 3D NAND).
Prior work also provides several detailed field studies of flash errors in data centers [34, 41, 50].
Unlike experimental characterization using few flash chips, these large-scale field studies cover a
large number of flash devices from many different flash vendors and technologies. However, these
field studies can only characterize the average correctable bit error rate for each drive. They cannot
obtain the threshold voltage distribution, cannot obtain the raw bit error rate for each flash block/page,
cannot correlate errors with an accurate P/E cycle count, retention time, or read count for each block, and
do not have enough data for the latest flash technology. In contrast, by performing experimental
characterization, our approach obtains much more detail about threshold voltage distribution and
flash errors than large-scale studies.
Modeling: Based on these experimental characterization results, prior work has also proposed
models of the threshold voltage distribution and the distribution shifts [4, 48]. These models
are based on different distributions, such as Gaussian distribution [4], Normal-Laplace mixture
distribution [48], and other distributions [4], to achieve higher accuracies. On the other hand, some
prior work models the distribution based on idealized circuit-level models [15, 38, 42]. These models
capture some desired behavior, but are less accurate than those derived from real characterization.
While these previous models can be used for offline analysis, they are unsuitable for online evaluation
because they are either prohibitively complex to implement or not accurate enough. In contrast,
our work plans to propose an online threshold voltage distribution model that is both simple to
implement and accurate.
Applications of the models: Prior work has proposed and evaluated techniques that use
a threshold voltage distribution model for different purposes, such as (1) constructing a raw bit
error rate model offline [48, 49], (2) finding the optimal read reference voltage [6, 43, 52], and (3)
optimizing ECC offline [15]. These use cases can only optimize flash reliability offline because they
are limited by the high overhead of the characterization and modeling techniques, as we discussed
before. Our work plans to reduce these overheads such that these use cases can become suitable
for online usage, and potentially develop new online techniques to further optimize flash lifetime.
3.2. Flash Media Reliability Optimization
One way to improve flash reliability is by reducing noise (i.e., raw bit errors) in the raw flash
media. Improving the quality of raw flash media often requires circuit-level changes to the flash
chip. For example, to reduce overall read noise, prior work proposes to improve the precision of
read operations using the read-retry technique [4]. The read-retry technique can also be combined
with knowledge of how the threshold voltage shifts (see Section 3.1) to adjust the read reference
voltages to account for P/E cycling [44, 48], cell-to-cell program interference [7, 10, 44], read disturb [43],
and retention [6]. To mitigate read disturb noise, prior work proposes to add simple logic in the
flash chip to enable dynamic pass-through voltage tuning [5]. To mitigate P/E cycling and pro-
gram interference noise, prior work also proposes to use charge-trap flash instead of traditional
floating-gate cells [16]. While these techniques directly optimize flash media reliability by exploit-
ing underlying device characteristics, they do not take advantage of application behavior like our
proposed architectural techniques do.
3.3. Flash Channel Noise Tolerance Techniques
Another way to improve flash reliability is by tolerating more noise in the unreliable flash
channel using error-correcting codes (ECC). To more efficiently tolerate all kinds of noise, state-of-
the-art SSDs have started to use stronger ECC such as low-density parity-check (LDPC) codes [18]
instead of traditional BCH codes [2]. LDPC codes are known to be capacity-approaching codes,
which can tolerate bit error rates near the theoretical maximum (the Shannon limit) [32]. Prior work
has studied how to reach this limit by exploiting soft information in LDPC decoders using read-
retry and an iterative decoding algorithm [54]. Prior work has also studied ECC techniques that
tolerate flash error variation [22] and process variation [39, 40]. However, stronger ECC requires
large storage and logic overhead, and has diminishing returns in reliability improvement [8]. On
the other hand, many opportunities to exploit flash error characteristics in ECC design remain
largely unexplored; pursuing them is part of this proposal.
3.4. Flash Reliability Management Techniques
Flash reliability varies significantly depending on the usage pattern. Thus, to improve flash
reliability and lifetime, prior work also proposes better flash management techniques to ensure
friendly usage patterns by optimizing flash translation layer (FTL) algorithms. For example, to
reduce unnecessary erase operations, prior work optimizes page allocation policy to achieve higher
spatial locality of write operations [19, 30, 31, 45]. Prior work also proposes techniques to minimize
endurance cycles consumed by FTL metadata [14, 46]. To mitigate program interference errors,
prior work proposes to have the FTL enforce certain program sequences instead of allowing random
writes [7, 10, 46]. To mitigate read disturb errors, prior work proposes to redistribute
read-hot pages across different flash blocks [25]. To mitigate retention errors, prior work proposes to
use various flash refresh techniques that periodically rewrite all data with a high retention age [8, 9].
Prior work also proposes techniques to allow the host to provide more information to the FTL
for better management of flash reliability. For example, to achieve longer flash lifetime, prior work
proposes to allow the file system to issue TRIM commands such that the FTL, given more free
space, can perform garbage collection more efficiently [17]. This technique is now widely deployed
in today’s systems. Recent work also proposes a multi-stream technique to further optimize flash
lifetime by allowing the file system to provide hints about write-hotness of the data and the FTL to
map data accordingly [20]. As these prior works demonstrate, exploiting both device characteristics
and application behavior in the FTL often leads to significant reliability improvements with
low overhead. As part of this proposal, our work aims to develop new flash reliability management
techniques to exploit newly-discovered device characteristics and application behavior.
3.5. Host-Side Data Management Techniques
Because flash memory wears out as more data is written, we can also improve flash reliability
and lifetime by reducing the total amount of data written to the flash memory using host-side data
management techniques. For example, prior work has proposed to eliminate redundant writes due
to small updates (i.e., when updating a small piece of data within a page, only the updated data
needs to be written) [12, 26, 55]. Other recent work has also proposed to use compression [23] and
deduplication [24] to reduce the data size written to the flash memory. Compared to these pieces
of work, our proposal aims to bring flash device awareness to the architecture level so that we can
further improve flash reliability in an efficient way.
3.6. Self-Healing Effect
Prior work has proposed idealized circuit-level models for the self-healing effect [37, 56], demonstrating
significant opportunities for using the self-healing effect to improve flash reliability and
lifetime. Based on assumptions about how the self-healing effect works, prior work has also proposed
techniques that exploit this effect to improve flash lifetime, such as heal-leveling [11], write
throttling [21], and heat-accelerated self-healing [56]. However, these previous results are not yet
convincing enough to show that the self-healing effect can successfully improve flash lifetime on real
devices, because they lack real experimental data and evidence supporting the self-healing effect on
modern flash devices. Our work aims to be the first to demonstrate and comprehensively evaluate
the benefit of the self-healing effect using experimental data from real flash chips.
4. Preliminary Work: WARM—Write-hotness Aware Retention
Management
WARM is our preliminary work that exploits application-level write-hotness and device-level
retention characteristics to improve flash lifetime. Our goal is to eliminate redundant refreshes
for write-hot pages with minimal storage and performance overhead. This work proposes a write-
hotness-aware flash memory retention management policy, WARM. The first key idea of WARM
is to effectively partition pages stored in flash into two groups based on the write frequency of
the pages. The second key idea of WARM is to apply different management policies to the two
different groups of pages/blocks. We first discuss a novel, lightweight approach to dynamically
identifying and partitioning write-hot versus write-cold pages (Section 4.1). We then describe
how WARM optimizes flash management policies, such as garbage collection and wear-leveling,
in a partitioned flash memory, and show how WARM integrates with a refresh mechanism to
provide further flash lifetime improvements (Section 4.2). We also summarize our evaluation of
flash lifetime improvement delivered by WARM, and our analysis of hardware and performance
overhead to implement WARM in Section 4.3.
4.1. Partitioning Data Using Write-Hotness
4.1.1. Identifying Write-Hot and Write-Cold Data
Figure 8 illustrates the high-level concept of our write-hot data identification mechanism. We
maintain two virtual queues, one for write-hot data and another for write-cold data, which order
all of the hot and cold data, respectively, by the time of the last write. The purpose of the virtual
queues is to partition write-hot and write-cold data in a space-efficient way. The partitioning
mechanism provides methods of promoting data from the cold virtual queue to the hot virtual
queue, and for demoting data from the hot virtual queue to the cold virtual queue. The promotion
and demotion decisions are made such that write-hot pages are quickly identified (after two writes
in quick succession to the page), and write-cold pages are seldom misidentified as write-hot pages
(and are quickly demoted if they are). Note that the cold virtual queue is divided into two parts,
with the part closer to the tail known as the cooldown window. The purpose of the cooldown
window is to identify those pages that are most recently written to. The pages in the cooldown
window are the only ones that can be immediately promoted to the hot virtual queue (as soon as
they receive a write request). We walk through examples for both of these migration decisions.
[Figure: hot virtual queue (with a hot window) and cold virtual queue (with a cooldown window at its tail), each ordered from tail to head; circled labels ①–⑥ mark the migration actions described in the text.]
Figure 8. Write-hot data identification algorithm using two virtual queues and monitoring windows.
Initially, all data is stored in the cold virtual queue. Any data stored in the cold virtual queue
is defined to be cold. When data (which we call Page C) is first identified as cold, a corresponding
queue entry is pushed into the tail of the cold virtual queue (①). This entry progresses forward in
the queue as other cold data is written. If Page C is written to again after it leaves the cooldown
window (②), then its queue entry will be removed from the cold virtual queue and reinserted at
the queue tail (①). This allows the queue to maintain ordering based on the time of the most
recent write to each page.
If a cold page starts to become hot (i.e., it starts being written to frequently), a cooldown
window at the tail end of the cold virtual queue provides these pages with a chance to be promoted
into the hot virtual queue. The cooldown window monitors the most recently inserted (i.e., most
recently written) cold data. Let us assume that Page C has just been inserted into the tail of
the cold virtual queue (①). If Page C is written to again while it is still within the cooldown
window, it will be immediately promoted to the hot virtual queue (③). If, on the other hand,
Page C is not written to again, then Page C will eventually be pushed out of the cooldown window
portion of the cold virtual queue, at which point Page C is determined to be cold. Requiring a
two-step promotion process from cold to hot (with the use of a cooldown window) allows us to
avoid incorrectly promoting cold pages due to infrequent writes. This is important for two reasons:
(1) hot storage capacity is limited, and (2) promoted pages will not be refreshed, which for cold
pages could result in data loss. With our two-step approach, if Page C is cold and is written to
only once, it will remain in the cold queue, though it will be moved into the cooldown window (②)
to be monitored for subsequent write activity.
Any data stored in the hot virtual queue is identified as hot. Newly-identified hot data, which
we call Page H, is inserted into the tail of the hot virtual queue (④). The hot virtual queue length is
maximally bounded by a hot window size to ensure that the most recent writes to all hot data pages
were performed within a given time period. (We discuss how this window is sized in Section 4.1.3.)
The assumption here is that infrequently-written pages in the hot virtual queue will eventually
progress to the head of the queue (⑤). If the entry for Page H in the hot virtual queue reaches the
head of the queue and must now be evicted, we demote Page H into the cooldown window of the
cold virtual queue (①), and move the page out of the hot virtual queue. In contrast, a write to a
page in the hot virtual queue simply moves that page to the tail of the hot virtual queue (⑥).
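The promotion and demotion rules above can be sketched in code as follows. This is a simplified, page-granularity model: the class name, default window sizes, and the use of ordered dictionaries as queues are illustrative choices for the sketch, not details specified by WARM. The comments refer to the circled actions in Figure 8.

```python
from collections import OrderedDict

class WriteHotnessIdentifier:
    """Simplified sketch of the two-virtual-queue write-hotness classifier.
    Window sizes are illustrative parameters, not values from the proposal."""

    def __init__(self, hot_window=4, cooldown_window=4):
        self.hot_window = hot_window            # max length of the hot virtual queue
        self.cooldown_window = cooldown_window  # tail portion of the cold queue
        self.hot = OrderedDict()    # tail (most recent write) = last item
        self.cold = OrderedDict()

    def is_hot(self, page):
        return page in self.hot

    def on_write(self, page):
        if page in self.hot:                    # (6) hit in hot queue: move to tail
            self.hot.move_to_end(page)
        elif self._in_cooldown(page):           # (3) hit in cooldown window: promote
            del self.cold[page]
            self.hot[page] = None
            if len(self.hot) > self.hot_window:
                # (5) evict the oldest hot page, (1) demote it into the cold tail
                demoted, _ = self.hot.popitem(last=False)
                self.cold[demoted] = None
        else:                                   # (1)/(2) (re)insert at the cold tail
            self.cold.pop(page, None)
            self.cold[page] = None

    def _in_cooldown(self, page):
        # The cooldown window is the most recently written tail of the cold queue.
        return page in list(self.cold)[-self.cooldown_window:]
```

As the text describes, a page becomes hot only after two writes in quick succession: the first write places it in the cooldown window, and a second write while it is still there promotes it.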
4.1.2. Partitioning the Flash Device
Figure 9 shows how we apply the identification mechanism from Section 4.1.1 to perform physical
page partitioning inside flash, with labels that correspond to the actions from Figure 8. We first
separate all of the flash blocks into two allocation pools, one for hot data and another for cold data.
The hot pool contains enough blocks to store every page in the hot virtual queue (whose sizing is
described in Section 4.1.3), as well as some extra blocks to tolerate management overhead (e.g.,
erasing on garbage collection). The cold pool contains all of the remaining flash blocks. Note that
blocks can be moved between the two pools when the queues are resized.
[Figure: hot block pool (hot window) and cold block pool with a cooldown window; arrows mark promotion to the cooldown window (②), a hit in the cooldown window (③), migration of valid pages (⑤), and a hit in the hot window (⑥).]
Figure 9. Write-hotness aware retention management policy overview.
To simplify the hardware required to implement the virtual queues, we exploit the fact that
pages are written sequentially into the hot pool blocks. Consecutive writes to hot pages will be
placed in the same block, which means that a single block in the hot virtual queue will hold all of
the oldest pages. As a result, we can track the hot virtual queue at a block granularity instead of
a page granularity, which allows us to significantly reduce the size of the hot virtual queue.
4.1.3. Tuning the Partition Boundary
Since the division between hot and cold data can be dependent on both application and phase
characteristics, we need to provide a method for dynamically adjusting the size of our hot and cold
pools periodically. Every block is allocated to one of the two pools, so any increase in the hot pool
size will always be paired with a corresponding decrease in the cold pool size, and vice versa. Our
dynamic sizing mechanism must ensure that: (1) the hot pool size is such that every page in the hot
pool is written to at least once within the hot pool retention time (which is relaxed, as the
hot pool does not employ refresh), and (2) the lifetime of the blocks in the cold pool is maximized.
To this end, we describe an algorithm that tunes the partitioning of blocks between the hot and
cold pools.
The partitioning algorithm starts by setting an upper bound for the hot window, to ensure that
every page in the window is written to at least once within the fixed hot pool retention time.
Our goal is to relax the hot pool retention time guarantee to provide greater endurance. We estimate
this size by collecting the number of writes to the hot pool, to find the average write frequency
and estimate the time it takes to fill the hot window. We compare the time to fill the window to
the hot pool retention time, and if the fill time exceeds the retention time, we shrink the hot pool
size to reduce the required fill time. This hot pool size determines the initial partition boundary
between the hot pool and the cold pool.
We then tune this partition boundary to maximize the lifetime of the cold pool, since we do not
relax retention time for the blocks in the cold pool. Assuming that wear-leveling evenly distributes
the page writes within the cold pool, we can use the endurance capacity metric (i.e., the total
number of writes the cold pool can service), which is the product of the remaining endurance of a
block¹ and the cold pool size, to estimate the lifetime of blocks in the cold pool:
Endurance Capacity = Remaining Endurance × Cold Pool Size    (1)

Lifetime = Endurance Capacity / Cold Write Frequency ∝ Cold Pool Size / Cold Write Frequency    (2)
We divide the endurance capacity by the cold write frequency (writes per day) to determine
the number of days remaining before the cold pool is worn out. We use hill climbing to find
the partition boundary at which the cold pool size maximizes the flash lifetime. The cold write
frequency is dependent on cold pool size, because as the cold pool size increases, the hot pool size
correspondingly shrinks, shifting writes of higher frequency into the cold pool.
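The boundary search above can be sketched as hill climbing on Equations (1) and (2). The function names below, and the `cold_write_freq` callback standing in for the measured cold-pool write statistics, are illustrative assumptions of this sketch:

```python
def estimated_lifetime(cold_pool_size, remaining_endurance, cold_write_freq):
    """Equations (1)-(2): days until the cold pool wears out.
    `cold_write_freq(size)` models how writes shift into the cold pool as it
    grows (an assumed stand-in for the measured write frequency)."""
    endurance_capacity = remaining_endurance * cold_pool_size  # Eq. (1)
    return endurance_capacity / cold_write_freq(cold_pool_size)  # Eq. (2)

def tune_partition(total_blocks, max_hot_pool, remaining_endurance, cold_write_freq):
    """Hill climbing on the cold pool size to maximize estimated lifetime.
    The cold pool can never shrink below total_blocks - max_hot_pool."""
    min_cold = total_blocks - max_hot_pool

    def life(size):
        return estimated_lifetime(size, remaining_endurance, cold_write_freq)

    best = min_cold  # start from the upper-bounded hot pool size
    while True:
        neighbors = [s for s in (best - 1, best + 1) if min_cold <= s < total_blocks]
        better = [s for s in neighbors if life(s) > life(best)]
        if not better:
            return best  # local maximum of the lifetime estimate
        best = max(better, key=life)
```

Because shrinking the hot pool shifts higher-frequency writes into the cold pool, the cold write frequency grows with cold pool size, which is what gives the lifetime estimate an interior maximum for the hill climber to find.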
Finally, once the partition boundary converges to obtain the maximum lifetime, we must adjust
what portion of the cold pool belongs in the cooldown window. We size this window to minimize
¹Due to wear-leveling, the remaining endurance (i.e., the number of P/E operations that can still be performed on the block) is the same across all of the blocks.
the ping-ponging of requests between the hot and cold pools. For this, we want to maximize the
number of hot virtual queue hits (⑥ in Figure 8), while minimizing the number of requests evicted
from the hot window (⑤ in Figure 8). We maintain a counter of each of these events, and then use
hill climbing on the cooldown window size to maximize the utility function Utility = (⑥ − ⑤).
In our work, we limit the hot pool size to the number of over-provisioned blocks within the flash
device (i.e., the extra blocks beyond the visible capacity of the device). While the hot pages are
expected to represent only a small portion of the total flash capacity, there may be rare cases where
the size limit prevents the hot pool from holding all of the hot data (i.e., the hot pool is significantly
undersized). In such a case, some less-hot pages are forced to reside in the cold pool, and lose the
benefits of WARM (i.e., endurance improvements from relaxed retention times). WARM will not,
however, incur any further write overhead from keeping the less-hot pages in the cold pool. For
example, the dynamic sizing of the cooldown window prevents the less-hot pages from going back
and forth between the hot and cold pools.
4.2. Flash Management Policies
WARM partitions all of the blocks in a flash device into two pools, storing write-hot data in
the blocks belonging to the hot pool, and storing write-cold data in the blocks belonging to the cold
pool. Because of the different degrees of write-hotness of the data in each pool, WARM also applies
different management policies (i.e., refresh, garbage collection, and wear-leveling) to each pool, to
best extend their lifetime. We next describe these management policies for each pool, both when
WARM is applied alone and when WARM is applied along with refresh.
4.2.1. WARM-Only Management
WARM relaxes the internal retention time of only the blocks in the hot pool, without requiring
a refresh mechanism for the hot pool. Within the cold pool, WARM applies conventional garbage
collection (i.e., finding the block with the fewest valid pages to minimize unnecessary data move-
ment) and wear-leveling policies. Since the flash blocks in the cold pool contain data with much
lower write frequencies, they (1) consume a smaller number of P/E cycles, and (2) experience much
lower fragmentation (which only occurs when a page is updated), thus reducing garbage collec-
tion activities. As such, the lifetime of blocks in the cold pool increases even when conventional
management policies are applied.
Within the hot pool, WARM applies simple, in-order garbage collection (i.e., finding the oldest
block) and no wear-leveling policies. WARM performs writes to hot pool blocks in block order
(i.e., it starts on the block with the lowest ID number, and then advances to the block with the
next lowest ID number) to maintain a sequential ordering by write time. Writing pages in block
order enables garbage collection in the hot pool to also be performed in block order. Due to the
higher write frequency in the hot pool, all data in the hot pool is valid for a shorter amount of
time. Most of the pages in the oldest block are already invalid when the block is garbage collected,
increasing garbage collection efficiency. Since both writing and garbage collection are performed
in block order, each of the blocks will be naturally wear-leveled, as they will all incur the same
number of P/E cycles. Thus, we do not need to apply any additional wear-leveling policy.
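The in-order write and garbage collection policy can be sketched as a circular log over the hot pool blocks. The block count, pages-per-block value, and class structure below are illustrative; the sketch only shows why block-order writes make garbage collection trivial and naturally wear-level the pool:

```python
class HotPool:
    """Circular-log sketch of WARM's hot pool management. The block count and
    pages-per-block values are illustrative, not parameters from the proposal."""

    def __init__(self, num_blocks=4, pages_per_block=2):
        self.pages_per_block = pages_per_block
        self.blocks = [[] for _ in range(num_blocks)]  # each block holds page IDs
        self.write_blk = 0            # next block to write, in block order
        self.gc_blk = 0               # oldest block, always the next GC victim
        self.erase_counts = [0] * num_blocks
        self.live = {}                # page ID -> block holding its valid copy

    def write(self, page):
        # Advance in block order; garbage-collect the oldest block when the
        # write pointer wraps around to it.
        while len(self.blocks[self.write_blk]) == self.pages_per_block:
            self.write_blk = (self.write_blk + 1) % len(self.blocks)
            if self.write_blk == self.gc_blk:
                self._collect_oldest()
        self.blocks[self.write_blk].append(page)
        self.live[page] = self.write_blk   # any older copy becomes invalid

    def _collect_oldest(self):
        victim = self.gc_blk
        # For hot data, most pages were rewritten and are already invalid.
        survivors = [p for p in self.blocks[victim] if self.live.get(p) == victim]
        self.blocks[victim] = []
        self.erase_counts[victim] += 1
        self.gc_blk = (self.gc_blk + 1) % len(self.blocks)
        for p in survivors:                # keep still-valid pages, in order
            self.blocks[victim].append(p)
            self.live[p] = victim
```

Because both the write pointer and the GC victim advance strictly in block order, every block is erased the same number of times over a full rotation, which is the natural wear-leveling the text describes.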
4.2.2. Combining WARM with Refresh
WARM can also be used in conjunction with a refresh mechanism to reap additional endurance
benefits. WARM, on its own, can significantly extend the lifetime of a flash device by enabling re-
tention time relaxation on only the write-hot pages. However, these benefits are limited, as the cold
pool blocks will eventually exhaust their endurance at the original internal retention time. While
WARM cannot enable retention time relaxation on the cold pool blocks due to infrequent writes to
such blocks, a refresh mechanism can enable the relaxation, greatly extending the endurance of the
cold pool blocks. WARM still provides benefits over a refresh mechanism for the hot pool blocks,
since it avoids unnecessary write operations that refresh operations would incur.
When WARM and refresh are combined, we split the lifetime of the flash device into two phases.
The flash device starts in the pre-refresh phase, during which the same management policies as
WARM-only are applied. Note that during this phase, internal retention time is only relaxed for
the hot pool blocks. Once the endurance at the original retention time is exhausted, we enter
the refresh phase, during which the same management policies as WARM-only are applied and a
refresh policy (such as FCR [8]) is applied to the cold pool to avoid data loss. During this phase,
the retention time is relaxed for all blocks. Note that during both phases, the internal retention
time for hot pool blocks is always relaxed without the need for a refresh policy.
During the refresh phase, WARM also performs global wear-leveling to prevent the hot pool
from being prematurely worn out. The global wear-leveling policy rotates the entire hot pool to a
new set of physical flash blocks (which were previously part of the cold pool) every 1K hot block
P/E cycles. Over time, this rotation will use all of the flash blocks in the device for the hot pool
for one 1K P/E cycle interval. Thus, WARM wears out all of the flash blocks equally despite the
heterogeneity in write-frequency between the two pools.
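The rotation step can be sketched as follows. Representing blocks as (block ID, P/E count) pairs and choosing the least-worn blocks as the new hot pool are illustrative assumptions of the sketch; the proposal itself only specifies that the hot pool rotates to blocks previously in the cold pool:

```python
def rotate_hot_pool(hot_pool, cold_pool, hot_pe_cycles, interval=1000):
    """Sketch of the global wear-leveling rotation during the refresh phase.
    Blocks are (block_id, pe_count) pairs; picking the least-worn blocks for
    the new hot pool is an illustrative policy choice."""
    if hot_pe_cycles == 0 or hot_pe_cycles % interval != 0:
        return hot_pool, cold_pool          # rotate only every `interval` cycles
    merged = sorted(hot_pool + cold_pool, key=lambda b: b[1])  # by P/E count
    new_hot = merged[:len(hot_pool)]        # least-worn blocks host the hot pool
    return new_hot, merged[len(hot_pool):]
```

Repeating this rotation every interval cycles the hot pool through all physical blocks over time, equalizing wear despite the write-frequency gap between the pools.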
4.3. Summary of Results
We evaluate lifetime improvement using an I/O trace based simulator, which simulates a NAND
flash-based SSD with different flash management policies. Figure 10 plots the flash lifetime provided by
WARM alone (WARM), the adaptive-rate flash correct and refresh mechanism (ARFCR) [8], and WARM
combined with refresh (WARM+ARFCR), normalized to a conventional management policy without
WARM or refresh (Baseline). Using these results, we show that, when applied alone, WARM
improves overall flash lifetime by an average of 3.24× over Baseline. When WARM is applied
together with an adaptive refresh mechanism, the average lifetime improves by 12.9× over Baseline
(a 1.21× improvement over adaptive refresh alone). We also analyze the hardware and performance
overhead of WARM. WARM
requires four hardware counters and 1056B of memory overhead. In the worst case, WARM has a
performance penalty of 5.8% over Baseline due to flash management overhead. On average across
all workloads, this overhead is negligible (<2%). In conclusion, WARM can improve flash lifetime
significantly while requiring minimal hardware and performance overhead.
[Figure: bar chart of normalized lifetime improvement (up to 16×) for Baseline, WARM, ARFCR, and WARM+ARFCR.]
Figure 10. Normalized lifetime improvement when WARM is applied on top of Baseline and ARFCR.
5. Proposed Work 1: Online Characterization and Modeling of
NAND Flash Memory Errors
Motivation: NAND flash memory errors are common in raw flash chips and they significantly
impact flash reliability. To guarantee the reliable operation of NAND flash memory, strong ECC
codes are applied to mask these errors from the user, leading to significant hardware and capacity
overhead [2, 8, 18, 54]. Understanding these errors through offline characterization and modeling
can enable more cost-effective ways to tolerate them than uniformly applying stronger ECC codes
regardless of the error properties. We expect to examine modern NAND flash chips such as 3D,
multi-level cell (MLC), or triple-level cell (TLC) flash chips to understand the nature of these
errors, depending on the availability of the NAND flash chips we can test. Based on these results,
we expect to construct an accurate threshold voltage distribution model online, which will enable
other mechanisms to exploit the knowledge for improving flash reliability. In this work, we hope to
arrive at a new online mechanism to characterize and model the threshold voltage distribution of
flash cells during system operation at low cost and low latency. We divide this proposed work into
four major directions.
First, we expect to perform a thorough characterization of the threshold voltage distribution.
Such characterization is enabled by the existing read-retry capability of raw NAND flash chips,
which allows us to sweep the read reference voltage and accurately obtain the threshold voltage
for each flash cell. Using this methodology, we expect to study the properties of the threshold
voltage distribution, especially those that affect NAND flash error rates such as the tail distribution
(i.e., the part of the distribution that is far from its mean). We expect to study the effects of
wear out on the distribution by programming different flash blocks to different P/E cycles before
the characterization. We expect to analyze the effects of temperature, stored data pattern, and
retention on the threshold voltage distribution to achieve higher accuracy.
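The read-retry-based sweep can be sketched as follows. Here `read_at` is a hypothetical stand-in for issuing a read-retry command at a given reference voltage; the real command interface is vendor-specific and not part of this sketch:

```python
def characterize_thresholds(read_at, vref_steps):
    """Bin each cell's threshold voltage by sweeping the read reference voltage.

    `read_at(vref)` stands in for a read-retry read: it returns, for every cell
    in the page, True if that cell's threshold voltage lies below `vref`.
    A cell's bin is the index of the first (lowest) step at which it reads as
    'below'; cells above the highest step get bin len(vref_steps)."""
    num_cells = len(read_at(vref_steps[-1]))
    bins = [len(vref_steps)] * num_cells       # default: above the highest step
    for i, vref in enumerate(vref_steps):      # sweep from lowest to highest
        for cell, below in enumerate(read_at(vref)):
            if below and bins[cell] == len(vref_steps):
                bins[cell] = i                 # first step that captures the cell
    return bins
```

Collecting these per-cell bins over many pages yields a histogram of the threshold voltage distribution, including the tail regions that drive the error rates discussed above.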
Second, we expect to model the threshold voltage distribution from real characterization data.
We start by statically fitting various distribution models to the characterized data under each P/E
cycle count. We expect to evaluate and compare the accuracy of these different static models by
comparing their estimated NAND flash error rates as well as their modeling error rates. Once we
have determined the best static model, we then expect to model the dynamic shift of threshold
voltage distribution over P/E cycles. We expect to evaluate the accuracy of the dynamic model
by showing how well it can predict future threshold voltage distribution and flash errors using
only data obtained at lower P/E cycle counts. To make model construction practical, we also
expect to develop techniques that minimize the computation and hardware overhead of constructing
these models, by selecting an easy-to-compute model and designing hardware to accelerate it.
Third, we expect to understand different factors affecting the accuracy of such a model. In
particular, we would like to develop an understanding of how temperature, stored data pattern,
and retention age affect the distribution and model accuracy.
Fourth, in order to examine the evolution of flash errors and the accuracy of models for them
in newer generation NAND flash chips, we expect to study error patterns in 3D NAND flash
devices. These include error patterns for P/E cycling errors, retention errors, read disturb errors,
and program interference errors. We also expect to perform similar characterization and modeling
(as described above) on 3D NAND flash chips. We expect to understand different factors affecting
the accuracy of our model. In particular, we would like to develop an understanding of how process
variation across layers, P/E cycling, retention, and read disturb affect the distribution and model
accuracy. We also expect to study how three-dimensional program interference impacts NAND
flash reliability.
Towards these four directions, we expect to answer at least the following research questions:
• How does flash wear out affect threshold voltage distribution and flash error rates? How do
other effects (retention and read disturb) affect threshold voltage distribution and flash error
rates?
• How can we accurately model the threshold voltage distribution under any static amount of
wear out? What are the properties of the tail distribution (i.e., the part of the distribution far
away from the mean), and which models can be used to represent the shape of the tail?
• How can we model the dynamic shifts of the threshold voltage distribution under wear out?
• Can we combine the dynamic and static models of the threshold voltage distribution to increase
the prediction accuracy of NAND flash error rate?
• How can we minimize the computation and hardware overhead of characterizing and modeling the
threshold voltage distribution in flash controllers?
• How do temperature, stored data pattern, and retention age affect the threshold voltage distribution
and accuracy of the models we develop?
• What are the flash error characteristics in 3D NAND devices for P/E cycling errors, retention
errors, read disturb errors, and program interference errors?
• How do process variation, P/E cycling, retention, read disturb, and program interference in 3D
NAND affect threshold voltage distribution and our model?
• How can we characterize three-dimensional program interference in 3D NAND? Does three-dimensional
program interference affect data reliability in neighboring flash blocks?
6. Proposed Work 2: Model-Driven Flash Management Policies
Motivation: Today’s flash controllers manage multiple flash chips based on a set of fixed, con-
servatively estimated flash parameters provided by the flash vendor. These parameters, such as
read reference voltages, ECC strength, flash memory health, etc., are not specifically tuned for the
NAND flash chips connected to the flash controller and therefore cannot adjust to the amount of
wear on each flash block to improve flash reliability and performance. In this work, however, we
expect to take advantage of the threshold voltage distribution model constructed in Section 5. Our
goal is to show that our online model can be exploited in various ways to improve flash reliability.
Our approach can be divided into three steps.
First, we aim to find out which flash parameters to estimate. We expect to explore those
parameters that can be estimated using our proposed model. Among these parameters, we expect
to select those that can be helpful for improving flash reliability. We expect to estimate the optimal
read reference voltage of each read using the predicted threshold voltage distribution. We expect to
estimate the raw bit error rate under different assumptions of read reference voltages. We expect
to estimate the expected remaining lifetime of each flash block without counting P/E cycles. We
expect to estimate the log-likelihood ratio of the distribution, which can be used for improving
ECC coding efficiency. We also expect to quantitatively evaluate and compare the accuracy of
these estimations using the different models we develop in Section 5.
Second, we aim to develop techniques to utilize these estimations to improve flash reliability. We
expect to quantitatively evaluate and show how these techniques improve flash reliability in different
ways. With the optimal read reference voltage estimation, we can adapt the read reference voltage
to minimize the raw bit error rate before applying the read-retry technique. With the raw bit error rate
estimation, we can provide the right amount of ECC protection with the lowest overhead. With the
expected lifetime estimation, we can fully utilize the lifetime of each flash block without suffering
from loss of flash memory capacity. With the more accurate log-likelihood ratio estimation, we can
improve the efficiency of existing ECC codes.
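As one illustration of such an estimation, the optimal read reference voltage between two neighboring states can be found by minimizing the expected raw bit errors predicted by the distribution model. The sketch below assumes a simple per-state Gaussian model for concreteness; the proposal considers more accurate models, and the parameter values are illustrative:

```python
import math

def gaussian_cdf(x, mu, sigma):
    """P(threshold voltage <= x) under a Gaussian model of one state."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def optimal_vref(lo, hi, candidates):
    """Pick the read reference voltage that minimizes expected raw bit errors
    between two neighboring threshold-voltage states. Each state is modeled
    here as a (mean, sigma) Gaussian pair; this model choice, and the
    candidate list, are assumptions of the sketch."""
    def expected_errors(v):
        misread_lo = 1.0 - gaussian_cdf(v, *lo)  # lo-state cells read as hi
        misread_hi = gaussian_cdf(v, *hi)        # hi-state cells read as lo
        return misread_lo + misread_hi
    return min(candidates, key=expected_errors)
```

The same modeled probabilities also directly give the log-likelihood ratios used by a soft-decision LDPC decoder, which is how a more accurate distribution model can improve ECC coding efficiency.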
Third, we aim to develop techniques for 3D NAND flash chips based on the characterization
and modeling we perform in Section 5. In particular, we expect to develop techniques to mitigate
any potential new reliability issues in 3D NAND, such as three-dimensional program interference.
We also expect to develop techniques to tolerate process variation across layers for 3D NAND chips.
To this end, we expect to answer the following research questions:
• How can we use our proposed models to predict flash parameters such as optimal read reference
voltage, raw bit error rate, remaining flash lifetime, and optimal ECC parameters? How does
the accuracy of the model affect these estimations?
• How often and at what granularity shall we predict the optimal read reference voltage to minimize
flash read error rate and read latency with low overhead?
• How can we efficiently adapt the amount of ECC protection to the predicted raw bit error rate?
How can we provide appealing flash reliability or lifetime benefits using such techniques?
• How can we adjust flash management policies to adapt to the expected remaining flash lifetime,
instead of P/E cycle counts? How often and for which flash blocks shall we estimate the remaining
flash lifetime to maximize flash lifetime and minimize performance overhead?
• How does the accuracy of log-likelihood ratio estimation affect error correcting capability?
• How can we mitigate three-dimensional program interference in 3D NAND?
7. Proposed Work 3: Characterization and Utilization of NAND Flash Memory Self-Healing Effect
Motivation: The self-healing effect is a phenomenon in which NAND flash memory cells gradually
recover a fraction of their wear over time [37, 56]; this recovery can be accelerated by high
temperature. As we discussed in Section 3.6, no prior work verifies its model on modern NAND flash
chips or demonstrates a successful self-healing operation in real flash chips. In this work, we
strive to characterize and understand the self-healing effect and design techniques that utilize
this effect to improve flash reliability. Our approach has two steps.
First, we expect to comprehensively characterize different aspects of the self-healing effect using
real NAND flash chips. We expect to investigate the effectiveness of the self-healing effect (i.e.,
whether it can be used to improve flash reliability) by comparing raw bit error rates before and
after heat-accelerated self-healing at different P/E cycle counts. We expect to study whether the
self-healing effect persists after P/E cycling (i.e., whether it can be used to improve the overall
lifetime of the flash memory) by comparing total P/E cycle endurance with and without self-healing.
We also expect to study whether the self-healing effect is repeatable (i.e., whether it can be
applied multiple times to further improve flash lifetime) by comparing the endurance improvement
of the first self-healing operation, the second self-healing operation, and so on. Dwell time, the
duration between two consecutive P/E cycles during which a flash memory cell can recover, directly
affects the effectiveness of the self-healing operation. Similar to retention loss, the recovery
that occurs during the dwell time can be accelerated by high temperature according to Arrhenius
Law [6], allowing faster recovery of flash memory cells. We expect to study the relation between
dwell time and the effectiveness of the self-healing operation.
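As a minimal sketch of the Arrhenius Law calculation, the code below computes the temperature acceleration factor between a baking temperature and a baseline (room) temperature; the activation energy is an assumed, device-dependent parameter, so the resulting numbers are illustrative only.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def acceleration_factor(temp_accel_c, temp_base_c, ea_ev=1.1):
    """Arrhenius acceleration factor between an elevated (baking)
    temperature and a baseline temperature, both in Celsius.
    ea_ev is an assumed activation energy in eV (device dependent)."""
    t_accel = temp_accel_c + 273.15   # convert to Kelvin
    t_base = temp_base_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_base - 1.0 / t_accel))

# Effective dwell time: one hour of baking at 85C "counts as" this many
# hours of dwell at 25C room temperature (under the assumed ea_ev).
equivalent_hours = acceleration_factor(85.0, 25.0)
```

Under such a model, a short high-temperature bake can stand in for a much longer room-temperature dwell, which is what makes controlled heat-accelerated self-healing experiments feasible.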
Second, we expect to design techniques to utilize the self-healing effect to improve flash reliability
and lifetime. We expect to investigate the feasibility of heating NAND flash memory at different
granularities using an internal or external heat source. We expect to design mechanisms to trigger
self-healing operations that maximize flash lifetime and minimize performance overhead. As a heat-
accelerated self-healing operation also accelerates retention loss, the data currently stored in
the flash memory can be damaged. We expect to design mechanisms that avoid data corruption due to
retention loss by moving the data, while minimizing the performance overhead. We expect to predict
workload idle periods and the effectiveness of each self-healing operation, so that we can schedule
self-healing when it is most effective and interferes least with host requests. We expect to
evaluate the flash lifetime improvement and the performance penalty of our proposed techniques
based on our characterization results.
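The triggering policy could be sketched as follows; this is a simplified, assumed model (the block metadata layout, timing parameters, and the greedy most-worn-first order are all illustrative), not a finalized mechanism.

```python
def schedule_self_healing(blocks, idle_budget_s, heal_time_s, wear_threshold):
    """Select blocks to heal during a predicted idle window: heal the
    most-worn blocks first, as long as the remaining idle time still
    covers a full heal operation. Each block is a dict with an "id"
    and a "pe_cycles" count; valid data is assumed to have been
    relocated beforehand, since heating accelerates retention loss."""
    candidates = sorted(
        (b for b in blocks if b["pe_cycles"] >= wear_threshold),
        key=lambda b: b["pe_cycles"], reverse=True)
    scheduled = []
    for block in candidates:
        if idle_budget_s < heal_time_s:
            break                      # not enough idle time left
        scheduled.append(block["id"])
        idle_budget_s -= heal_time_s
    return scheduled
```

A real mechanism would also weigh the predicted endurance gain per block (from the characterization in the first step) against the relocation cost, rather than using wear alone.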
To this end, we expect to answer the following research questions:
• Can heat-accelerated self-healing operation effectively reduce raw bit error rate in real flash
chips?
• How well does the benefit of a self-healing operation persist over P/E cycles? By how much can
a self-healing operation improve flash lifetime when performed at different P/E cycle counts?
• Can we repeat the self-healing operation to further extend flash lifetime?
• How does the self-healing effect correlate with dwell time? How do we design experiments to
characterize this effect?
• How can we utilize the self-healing effect to improve flash reliability and lifetime? How can we
design online/offline mechanisms to trigger the self-healing effect?
8. Timeline
Depending on the success of the different ideas presented in this proposal and the availability
of time, we will aim to explore as many ideas as possible. My goal is to graduate in the Summer of
2017. Table 1 lists my tentative timeline for pursuing the ideas proposed in this document. Note
that the success of some of our ideas depends heavily on the data from our experimental results.
Duration Description
Apr-Jun 2016 Work on model-driven flash management policies (Potential milestone: submission to JSAC).
Jul-Sep 2016 Work on characterization and modeling of 3D NAND flash memory errors (Potential milestone: submission to SIGMETRICS).
Oct 2016-Mar 2017 Work on characterization and utilization of the self-healing effect (Potential milestone: submission to MICRO).
Apr-Jul 2017 Defend and submit thesis.
Table 1. Timeline for this proposal.
9. Conclusion
In this proposal, our goal is to improve NAND flash memory reliability with a multitude of
low-cost architectural techniques. To this end, we first describe a mechanism that we have already
worked on: WARM, a technique that manages flash retention differently for write-hot data and
write-cold data, and improves flash lifetime at low cost and low performance overhead. For our
future work, we propose to explore three new directions. The first direction proposes to develop an
online technique to characterize and model flash errors. The second direction proposes to develop
flash management policies that improve flash lifetime by exploiting our online model. The third
direction proposes to understand and develop new techniques that utilize the flash self-healing effect.
We hope that this research will demonstrate that NAND flash memory reliability can be improved
at low cost and with low performance overhead by deploying various architectural techniques that
are aware of higher-level application behavior and underlying flash device characteristics.
References
[1] NVM Express 1.2a, 2015. http://www.nvmexpress.org/wp-content/uploads/NVM-Express-1_2a.pdf.
[2] R. C. Bose and D. K. Ray-Chaudhuri. On A Class of Error Correcting Binary Group Codes. Information and Control, 1960.
[3] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai. Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis. In DATE, 2012.
[4] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai. Threshold Voltage Distribution in NAND Flash Memory:Characterization, Analysis, and Modeling. In DATE, 2013.
[5] Y. Cai, Y. Luo, S. Ghose, and O. Mutlu. Read Disturb Errors in MLC NAND Flash Memory: Characterization, Mitigation, and Recovery. In DSN, 2015.
[6] Y. Cai, Y. Luo, E. F. Haratsch, K. Mai, and O. Mutlu. Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery. In HPCA, 2015.
[7] Y. Cai, O. Mutlu, E. F. Haratsch, and K. Mai. Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation. In ICCD, 2013.
[8] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, A. Cristal, O. Unsal, and K. Mai. Flash Correct and Refresh: Retention Aware Management for Increased Lifetime. In ICCD, 2012.
[9] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, A. Cristal, O. Unsal, and K. Mai. Error Analysis and Retention-Aware Error Management for NAND Flash Memory. Intel Technology Journal (ITJ), 2013.
[10] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, O. Unsal, A. Cristal, and K. Mai. Neighbor Cell Assisted Error Correction in MLC NAND Flash Memories. In SIGMETRICS, 2014.
[11] Y.-M. Chang, Y.-H. Chang, J.-J. Chen, T.-W. Kuo, H.-P. Li, and H.-T. Lue. On Trading Wear-Leveling With Heal-Leveling. In DAC, 2014.
[12] F. Chen, T. Luo, and X. Zhang. CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of Flash Memory based Solid State Drives. In FAST, 2011.
[13] J. Cooke. The Inconvenient Truths of NAND Flash Memory. Flash Memory Summit, 2007.
[14] N. Dayan, P. Bonnet, and S. Idreos. GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices. In SIGMOD, 2016.
[15] G. Dong, N. Xie, and T. Zhang. Enabling NAND Flash Memory Use Soft-Decision Error Correction Codes at Minimal Read Latency Overhead. IEEE Trans. on Circuits and Systems, 2013.
[16] B. Eitan. Non-Volatile Semiconductor Memory Cell Utilizing Asymmetrical Charge Trapping, 1998.US Patent 5,768,192.
[17] T. Frankie, G. Hughes, and K. Kreutz-Delgado. SSD TRIM Commands Considerably Improve Overprovisioning. Flash Memory Summit, 2011.
[18] R. G. Gallager. Low-Density Parity-Check Codes. Information Theory, IRE Transactions on, 1962.
[19] A. Gupta, Y. Kim, and B. Urgaonkar. DFTL: A Flash Translation Layer Employing Demand-based Selective Caching of Page-level Address Mappings. In ASPLOS, 2009.
[20] J.-U. Kang, J. Hyun, H. Maeng, and S. Cho. The Multi-Streamed Solid-State Drive. In HotStorage, 2014.
[21] S. Lee, T. Kim, K. Kim, and J. Kim. Lifetime Management of Flash-Based SSDs Using Recovery-Aware Dynamic Throttling. In FAST, 2012.
[22] J. Li, K. Zhao, J. Ma, and T. Zhang. Realizing Unequal Error Correction for NAND Flash Memory at Minimal Read Latency Overhead. Circuits and Systems II: Express Briefs, IEEE Transactions on, 2014.
[23] J. Li, K. Zhao, X. Zhang, J. Ma, M. Zhao, and T. Zhang. How Much Can Data Compressibility Help to Improve NAND Flash Memory Lifetime? In FAST, 2015.
[24] W. Li, G. Jean-Baptise, J. Riveros, G. Narasimhan, and M. Zhao. CacheDedup: In-line Deduplication for Flash Caching. In FAST, 2016.
[25] C.-Y. Liu, Y.-M. Chang, and Y.-H. Chang. Read Leveling for Flash Storage Systems. In SYSTOR, 2015.
[26] Y. Lu, J. Shu, and W. Zheng. Extending the Lifetime of Flash-Based Storage Through Reducing Write Amplification from File Systems. In FAST, 2013.
[27] Y. Luo, Y. Cai, S. Ghose, J. Choi, and O. Mutlu. WARM: Improving NAND Flash Memory Lifetime with Write-Hotness Aware Retention Management. In MSST, 2015.
[28] Y. Luo, S. Ghose, T. Li, S. Govindan, B. Sharma, B. Kelly, A. Boroumand, and O. Mutlu. CREAM (Capacity- and Reliability-Adaptive Memory): Enabling the Use of ECC DRAM to Increase Memory Capacity. Under submission to MICRO, 2016.
[29] Y. Luo, S. Govindan, B. Sharma, M. Santaniello, J. Meza, A. Kansal, J. Liu, B. Khessib, K. Vaid, and O. Mutlu. Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory. In DSN, 2014.
[30] D. Ma, J. Feng, and G. Li. LazyFTL: A Page-Level Flash Translation Layer Optimized for NAND Flash Memory. In SIGMOD, 2011.
[31] D. Ma, J. Feng, and G. Li. A Survey of Address Translation Technologies for Flash Memories. CSUR, 2014.
[32] D. J. MacKay and R. M. Neal. Near Shannon Limit Performance of Low Density Parity Check Codes. Electronics Letters, 1996.
[33] Serial ATA International Organization. Serial ATA Revision 3.0, 2009.
[34] J. Meza, Q. Wu, S. Kumar, and O. Mutlu. A Large-Scale Study of Flash Memory Failures in The Field.In SIGMETRICS, 2015.
[35] N. Mielke, T. Marquart, N. Wu, J. Kessenich, H. Belgal, E. Schares, and F. Triverdi. Bit Error Rate in NAND Flash Memories. In IRPS, 2008.
[36] V. Mohan. Modeling The Physical Characteristics of NAND Flash Memory. PhD thesis, University of Virginia, 2010.
[37] V. Mohan, T. Siddiqua, S. Gurumurthi, and M. R. Stan. How I Learned to Stop Worrying and Love Flash Endurance. In HotStorage, 2010.
[38] R. Motwani. Estimation of Flash Memory Level Distributions Using Interpolation Techniques for Optimizing the Read Reference. In GLOBECOM, 2015.
[39] R. Motwani and C. Ong. Design of LDPC Coding Schemes for Exploitation of Bit Error Rate Diversity Across Dies in NAND Flash Memory. In ICNC, 2013.
[40] R. Motwani and C. Ong. Soft Decision Decoding of RAID Stripe for Higher Endurance of Flash Memory Based Solid State Drives. In ICNC, 2015.
[41] I. Narayanan, D. Wang, M. Jeon, B. Sharma, L. Caulfield, A. Sivasubramaniam, B. Cutler, J. Liu, B. Khessib, and K. Vaid. SSD Failures in Datacenters: What, When and Why? In SIGMETRICS, 2016.
[42] Y. Pan, G. Dong, and T. Zhang. Exploiting Memory Device Wear-Out Dynamics to Improve NAND Flash Memory System Performance. In FAST, 2011.
[43] N. Papandreou, T. Parnell, H. Pozidis, T. Mittelholzer, E. Eleftheriou, C. Camp, T. Griffin, G. Tressler, and A. Walls. Using Adaptive Read Voltage Thresholds to Enhance The Reliability of MLC NAND Flash Memory Systems. In GLSVLSI, 2014.
[44] N. Papandreou, T. Parnell, H. Pozidis, T. Mittelholzer, E. Eleftheriou, C. Camp, T. Griffin, G. Tressler, and A. Walls. Enhancing the Reliability of MLC NAND Flash Memory Systems by Read Channel Optimization. TODAES, 2015.
[45] D. Park, B. Debnath, and D. Du. CFTL: A Convertible Flash Translation Layer Adaptive to Data Access Patterns. In SIGMETRICS, 2010.
[46] J. Park, J. Jeong, S. Lee, Y. Song, and J. Kim. Improving Performance and Lifetime of NAND Storage Systems Using Relaxed Program Sequence. In DAC, 2016.
[47] K.-T. Park, M. Kang, D. Kim, S.-W. Hwang, B. Y. Choi, Y.-T. Lee, C. Kim, and K. Kim. A Zeroing Cell-To-Cell Interference Page Architecture With Temporary LSB Storing and Parallel MSB Program Scheme for MLC NAND Flash Memories. JSSC, 2008.
[48] T. Parnell, N. Papandreou, T. Mittelholzer, and H. Pozidis. Modelling of the Threshold Voltage Distributions of Sub-20nm NAND Flash Memory. In GLOBECOM, 2014.
[49] A. Prodromakis, S. Korkotsides, and T. Antonakopoulos. MLC NAND Flash Memory: Aging Effect and Chip/Channel Emulation. Microprocessors and Microsystems, 2015.
[50] B. Schroeder, R. Lagisetty, and A. Merchant. Flash Reliability in Production: The Expected and The Unexpected. In FAST, 2016.
[51] K.-D. Suh, B.-H. Suh, Y.-H. Lim, J.-K. Kim, Y.-J. Choi, Y.-N. Koh, S.-S. Lee, S.-C. Suk-Chon, B.-S. Choi, J.-S. Yum, et al. A 3.3 V 32 Mb NAND Flash Memory With Incremental Step Pulse Programming Scheme. Solid-State Circuits, IEEE Journal of, 1995.
[52] H. Tabrizi, B. Peleato, R. Agarwal, and J. Ferreira. Improving NAND Flash Read Performance Through Learning. In ICC, 2015.
[53] Seagate Technology. Serial Attached SCSI (SAS) Interface Manual, 2009. http://www.seagate.com/staticfiles/support/disc/manuals/Interface%20manuals/100293071c.pdf.
[54] J. Wang, K. Vakilinia, T.-Y. Chen, T. Courtade, G. Dong, T. Zhang, H. Shankar, and R. Wesel. Enhanced Precision Through Multiple Reads for LDPC Decoding in Flash Memories. Selected Areas in Communications, IEEE Journal on, 2014.
[55] G. Wu and X. He. Delta-FTL: Improving SSD Lifetime via Exploiting Content Locality. In EuroSys,2012.
[56] Q. Wu, G. Dong, and T. Zhang. Exploiting Heat-Accelerated Flash Memory Wear-Out Recovery to Enable Self-Healing SSDs. In HotStorage, 2011.