Architectural Techniques for
Improving NAND Flash Memory Reliability
Thesis Proposal
Yixin Luo
B.S., Computer Engineering, University of Michigan
B.S., Electrical Engineering, Shanghai Jiao Tong University
Thesis Prospectus Committee
Prof. Onur Mutlu (Chair)
Prof. Phillip B. Gibbons
Prof. James C. Hoe
Dr. Yu Cai
Dr. Erich F. Haratsch
July 12, 2016
Carnegie Mellon University
Pittsburgh, PA
Contents

1 Introduction
  1.1 Thesis Statement
  1.2 Our Approach
2 Background
  2.1 SSD Organization
  2.2 NAND Flash Block Organization
  2.3 NAND Flash Errors
  2.4 NAND Flash Operations
3 Related Work
  3.1 Threshold Voltage Distribution Modeling
  3.2 Flash Media Reliability Optimization
  3.3 Flash Channel Noise Tolerance Techniques
  3.4 Flash Reliability Management Techniques
  3.5 Host-Side Data Management Techniques
  3.6 Self-Healing Effect
4 Preliminary Work: WARM—Write-hotness Aware Retention Management
  4.1 Partitioning Data Using Write-Hotness
    4.1.1 Identifying Write-Hot and Write-Cold Data
    4.1.2 Partitioning the Flash Device
    4.1.3 Tuning the Partition Boundary
  4.2 Flash Management Policies
    4.2.1 WARM-Only Management
    4.2.2 Combining WARM with Refresh
  4.3 Summary of Results
5 Proposed Work 1: Online Characterization and Modeling of NAND Flash Memory Errors
6 Proposed Work 2: Model-Driven Flash Management Policies
7 Proposed Work 3: Characterization and Utilization of NAND Flash Memory Self-Healing Effect
8 Timeline
9 Conclusion
1. Introduction
Raw bit errors are common in NAND flash memory and will increase in the future. These
flash errors are inevitable in nature because they are caused by various sources of circuit-level
noise [3, 4, 5, 6, 7, 8, 10, 35]. To ensure reliable operation in the presence of these errors,
flash memory deploys error-correcting codes (ECC), which can correct errors up to a certain raw
bit error rate. However, the raw bit error rate increases as flash memory becomes increasingly
vulnerable to circuit noise due to increased wearout. These errors exceed the ECC correction
capability at a point beyond which reliable operation of the flash memory can no longer be
guaranteed. The duration until this point is known as the flash memory lifetime. Exacerbating the flash error count is the
continued increase in flash density through feature size scaling, multi-level cell (MLC), and 3D
NAND technology, all of which trade off flash reliability for significant cost reduction and capacity
increase. Due to the degraded flash reliability, the ECCs used for today’s flash memory can only
sustain a shorter lifetime for future flash memory with higher density.
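To make the lifetime notion concrete, the following sketch computes the P/E cycle count at which a modeled raw bit error rate crosses the ECC correction limit. The exponential growth model and all constants are illustrative only, not measured values.

```python
def flash_lifetime(ecc_limit, rber_at_cycle):
    """Return the first P/E cycle count at which the raw bit error rate
    (RBER) exceeds the correction capability of the deployed ECC."""
    cycles = 0
    while rber_at_cycle(cycles) <= ecc_limit:
        cycles += 1
    return cycles

# Illustrative RBER model: errors grow with wearout (constants made up).
rber = lambda pe_cycles: 1e-6 * (1.002 ** pe_cycles)

# A stronger ECC (tolerating a higher RBER) sustains a longer lifetime.
weak_ecc_lifetime = flash_lifetime(1e-3, rber)
strong_ecc_lifetime = flash_lifetime(1e-2, rber)
```

In this model, a stronger code extends the lifetime, but only at the hardware cost of a more capable ECC encoder/decoder.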
Our goal in this proposal is to improve flash reliability by 1) reducing the raw bit error rate
or 2) tolerating more errors at lower cost and with lower performance overhead. To reduce the
error rate during read and write (or program) operations, we can increase the precision of these
operations. However, this comes at a cost of higher read or write latency. To tolerate more errors,
we can use a stronger ECC. Today’s flash memory typically uses an ECC encoding whose coding
rate is already approaching the theoretical limit [32]. Thus, stronger error correction capability can
only be achieved through higher redundancy or larger coding granularity, both of which increase
the hardware overhead of the ECC encoder/decoder [8, 9]. As we can see, naive approaches that reduce
or tolerate raw bit errors for improving flash reliability do not meet our goal of low cost and low
performance overhead. Thus, to achieve our goal, we propose to perform research that provides
evidence for the following thesis statement.
1.1. Thesis Statement
NAND flash memory reliability can be improved at low cost and with low per-
formance overhead by deploying various architectural techniques that are aware of
higher-level application behavior and underlying flash device characteristics.
1.2. Our Approach
Our approach is to understand flash error characteristics and workload behavior through charac-
terization, and to design smart flash controller algorithms that utilize this understanding to improve
flash reliability. Compared with existing techniques that focus on only a part of the flash memory
system (e.g., circuit optimization, ECC code improvement, or file system design), our approach has
the following three advantages. First, we can take advantage of higher-level application behavior
such as write frequency and locality, and deploy the best suited flash reliability techniques for differ-
ent data. Second, we can take advantage of underlying device characteristics, such as variations in
latency, retention time, or errors, to develop more efficient flash-aware reliability techniques. Third,
we can take advantage of the unused computing resources in the flash controller during idle time,
or potentially in the host, to enable more effective flash reliability techniques. In this proposal, we
propose to investigate four directions to enable these three advantages and efficiently improve flash
reliability.
Direction 1: Our preliminary work devises a technique that reduces the overhead of relaxing
the flash retention time guarantee, i.e., the duration for which data can be held reliably. This
is an example of our approach that exploits application-level write-hotness and device-level
retention characteristics to improve flash lifetime. We find that a small portion of data that is very
frequently written to (i.e., write-hot data) needs only a very short retention time guarantee to be
reliable. However, write-hot data consumes a large fraction of flash endurance and increases the
write amplification factor if such data is not managed properly. We also find that relaxing the
flash retention time guarantee reduces the number of retention errors that must be tolerated, and
thus significantly improves flash endurance. Taking advantage of these two findings, we design a
mechanism that physically partitions write-hot data from write-cold data. By doing this, our mechanism
enables heterogeneous management of each partition—we can relax the retention time constraint
for write-hot data without any refresh overhead, which is required for write-cold data. Our key
results show that this mechanism can improve overall flash lifetime by 3.24× over a conventional
management policy without refresh. More details are in Section 4 and our MSST 2015 paper [27].
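The core idea of write-hotness-aware partitioning can be sketched as follows; the simple counter-based hotness test and the threshold value are illustrative simplifications, not WARM's actual identification mechanism (Section 4):

```python
from collections import Counter

class WriteHotnessPartitioner:
    """Sketch: divide logical pages into a write-hot pool (relaxed retention
    guarantee, no refresh) and a write-cold pool (full guarantee, refreshed)."""

    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold  # illustrative tuning knob
        self.write_counts = Counter()

    def record_write(self, logical_page):
        self.write_counts[logical_page] += 1

    def partition(self, logical_page):
        # Frequently rewritten data is overwritten long before retention
        # errors accumulate, so it only needs a short retention guarantee.
        if self.write_counts[logical_page] >= self.hot_threshold:
            return "hot"    # relaxed retention, no refresh overhead
        return "cold"       # full retention guarantee, periodic refresh
```

For example, a page written five times would land in the hot pool, while a page written once stays cold.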
Direction 2: Our first proposed work aims to improve flash reliability by developing an un-
derstanding of flash errors through characterization of modern flash chips, and by constructing a
flash device model in the flash controller. This will be an example of our approach that exploits
the unused computing resources in the flash controller and enables greater device-awareness. De-
pending on the availability of real chips we can test, we expect to profile modern NAND flash chips
(e.g., 3D, MLC/TLC/QLC NAND flash chips) to construct and validate our model. We find that a
flash device model can be leveraged to predict raw bit error rate and improve many aspects of flash
reliability and performance [4, 15, 42, 48, 49, 52]. We also find, however, that existing models are
either too simple and inaccurate or too complex to be constructed and used online. Based on these
two findings, we intend to devise a new, accurate, and easy-to-compute flash device model online,
while the flash device is operating. We will especially focus on cutting-edge devices, including 3D
NAND, TLC, and very aggressive MLC devices.
Direction 3: In our second proposed work, we aim to improve flash reliability by designing a
device-aware flash controller that exploits the online-constructed flash model in various ways. This
will be an example of our approach that exploits device awareness in the flash controller. We find
that an accurate online threshold voltage distribution model can be used to accurately estimate
many flash parameters for optimizing flash reliability and performance. These parameters include
the optimal read reference voltage, remaining flash endurance, and likelihood ratio used in the ECC
code (i.e., the probability ratio of a certain threshold voltage range according to its distribution).
By exploiting these estimations in the flash controller, we can improve flash reliability and lifetime
with low cost and low performance overhead.
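As an illustration of one such use, the sketch below derives an optimal read reference voltage between two neighboring states from a modeled threshold voltage distribution. The Gaussian shape and equal state probabilities are simplifying assumptions, not claims about real devices:

```python
import math

def misread_rate(v_read, mu0, sigma0, mu1, sigma1):
    """Modeled fraction of cells misread when the lower state (mu0) and the
    higher state (mu1) are equally likely Gaussians and read at v_read."""
    def cdf(x, mu, sigma):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    err_low = 1.0 - cdf(v_read, mu0, sigma0)   # lower-state cells above v_read
    err_high = cdf(v_read, mu1, sigma1)        # higher-state cells below v_read
    return 0.5 * (err_low + err_high)

def optimal_read_voltage(mu0, sigma0, mu1, sigma1, steps=200):
    """Sweep candidate voltages between the two means and pick the one the
    model predicts yields the fewest raw bit errors."""
    candidates = [mu0 + (mu1 - mu0) * i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda v: misread_rate(v, mu0, sigma0, mu1, sigma1))
```

For two equal-variance states the optimum falls at the midpoint of their means; with unequal variances it shifts toward the tighter distribution.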
Direction 4: To improve flash reliability, our third proposed work aims to understand and
exploit the flash self-healing effect [11, 21, 37, 56], the phenomenon in which flash memory cells
gradually recover a fraction of their wear over time. This will be an example of our approach that
improves flash reliability by exploiting device-level behavior. We find an opportunity to significantly
improve flash lifetime by exploiting the heat-accelerated self-healing effect in flash memory [11, 21,
37, 56]. We expect to prove or disprove this opportunity through quantitative experimental analysis
of real flash chips. Exploiting the self-healing effect is challenging because high temperatures may
induce retention errors that can damage the data stored in flash memory, forcing the data to be
migrated away from flash memory during self-healing operations. To mitigate this problem, we
expect to design a device and workload aware mechanism that maximizes the benefit of self-healing
while minimizing expensive data migration. By doing this, we aim to unleash the extended flash
lifetime due to the self-healing effect with minimal performance loss. The results of this work and
the possibility of designing such a mechanism are heavily dependent on the nature and characteristics
of the data that we collect and analyze.
We believe the findings of this thesis will also be applicable to other memory technologies.
For example, the idea of using architectural techniques that take advantage of device-level and
application-level characteristics to improve memory reliability can also be applied to DRAM. To
achieve better cost-effectiveness in data centers, we have proposed to enable flexible memory reli-
ability provisioning using Heterogeneous-Reliability Memory techniques [28, 29]. Our key results
show that our techniques can reduce server hardware cost by 4.7% while achieving 99.90% single
server reliability. This work is an example of our approach that exploits application-level memory
error tolerance as well as the device-level memory error characteristics.
In this thesis, we expect and hope to make the following new contributions:
• Develop a new mechanism to relax flash retention time constraints for write-hot data, such that
we can eliminate unnecessary refresh operations and extend flash lifetime.
• Construct a new, easy-to-compute, and accurate flash device model by experimentally charac-
terizing real modern flash chips (e.g., 3D, MLC, TLC, QLC chips).
• Develop new mechanisms to improve flash reliability and lifetime by exploiting our new online
flash device models.
• Examine and quantitatively evaluate the self-healing effect in real flash chips.
• Develop new techniques that take advantage of the self-healing effect to improve flash memory
reliability with low performance overhead.
2. Background
The reliability of NAND flash-based Solid-State Drives (SSDs) is becoming increasingly impor-
tant in modern computing systems because of three reasons. First, SSDs have become common
in today’s datacenters, laptops, and PCs as they provide significant performance benefit over a
traditional hard disk drive (HDD). Second, SSDs are typically used as persistent (or non-volatile)
storage devices, and a reliability failure can lead to the loss of precious user data. Third, SSDs are
becoming more tightly coupled with the system. While many of today's SSDs are connected to the host through
an I/O bus interface such as Serial ATA (SATA), future SSDs will be connected through the PCI
Express (PCIe) or DDR memory bus to reduce access latency. As an example, Figure 1 shows how
the SSD fits into today’s system. In this figure, the SSD is first connected to the southbridge (or
I/O controller hub) through the SATA bus, then to the northbridge (or memory controller hub)
through an internal bus, and to the host CPU through the front-side bus (FSB). In today’s system,
the data from the SSD is first copied to DRAM, which is connected to the northbridge through
the memory bus, then becomes accessible by the host CPU. In contrast, future PCIe or DDR SSDs
bypass the southbridge and can be directly accessed by the host CPU.
Figure 1. How SSD fits in today's system.
In this section, we will provide background information on how a NAND flash-based SSD works
and how NAND flash errors are induced to affect SSD reliability. We will demonstrate the internal
organization of an SSD from its high-level architecture (Section 2.1) to each individual flash cell
that stores its data (Section 2.2). Then we will describe how different types of circuit-level noise
induce flash errors (Section 2.3). We will also describe three basic operations that enable storing
and retrieving data in modern NAND flash memory (Section 2.4).
2.1. SSD Organization
Overview: Figure 2 plots the organization of a typical NAND flash-based SSD. The SSD
stores its data in flash chips, each of which can be accessed independently. The flash chips
communicate with the host through a flash controller. The flash controller consists of various
logical and physical modules that handle raw bit errors in the flash chips and bridge the gap
between the host interface and the flash interface. The ECC encoder/decoder module encodes all
the data stored in the flash chip and detects and, if possible, corrects any raw bit error when the
data is accessed. To minimize latency and energy consumption, the ECC encoder/decoder module
is typically implemented as an application-specific integrated circuit (ASIC). The other modules
are typically implemented in the firmware that runs on an embedded processor within the flash
controller. These modules include the host and the flash interface, and the flash translation
layer (FTL). Some high-end SSDs contain DRAM to allow buffering and to speed up the flash
controller. Next, we will introduce relevant components in turn.
Figure 2. SSD organization.
Flash chip: Each flash chip can be read or written at the granularity of a flash page, which is
typically 8–16 KB in today's flash devices. Before a flash page can be overwritten with new data,
the old data contained in that page has to be erased. Due to limitations in its circuit design [36],
NAND flash memory performs erase operations at the granularity of a flash block, which typically
consists of 256–512 flash pages. Figure 3 plots a typical example of how flash pages are organized
within a flash chip. In this example, each flash chip consists of two flash dies, each of which
consists of two flash planes. Within a flash plane, all flash pages share the same sense amplifier,
which buffers the data when accessing a page from the plane.
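For illustration, a flat page index can be decomposed into the example geometry above (2 dies, 2 planes per die, 1,024 blocks per plane, 256 pages per block); real chips use vendor-specific layouts, so the ordering of the fields here is an assumption:

```python
def decode_flash_address(page_index,
                         pages_per_block=256, blocks_per_plane=1024,
                         planes_per_die=2, dies_per_chip=2):
    """Split a flat page index into (die, plane, block, page) coordinates."""
    page = page_index % pages_per_block
    block = (page_index // pages_per_block) % blocks_per_plane
    plane = (page_index // (pages_per_block * blocks_per_plane)) % planes_per_die
    die = page_index // (pages_per_block * blocks_per_plane * planes_per_die)
    assert die < dies_per_chip, "page index exceeds chip capacity"
    return die, plane, block, page
```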
Host and flash interfaces: The host interface, such as SCSI [53], SATA [33], or NVMe [1], is
distinct from the flash interface. These interfaces are each handled by separate interface handlers
in the embedded processor. The host interface handles conventional sector read and write
requests issued by the file system. The flash interface contains flash-specific commands necessary
for managing or optimizing flash performance and reliability (e.g., erase, access special registers,
etc.), which are typically hidden from the host. The flash interface also enforces several NAND
flash-specific constraints. For example, a flash block must be erased before writing new data, and
flash pages within a block must be written in page order. To bridge the gap between the host and
flash interfaces, the FTL, which sits in between the two interfaces, translates the incoming requests
into corresponding flash commands and addresses. Also hidden from the host interface are the
raw flash errors in the flash chips. To correct these errors, all data transferred through the flash
Figure 3. Flash chip organization.
interface to or from the flash chips is encoded or decoded by the ECC encoder/decoder unit.
Flash translation layer (FTL): A modern FTL performs three major tasks (or flash man-
agement policies) to bridge the host and flash interface. First, the FTL maintains a mapping table
which translates each requested logical page number from the host interface to a physical page
number in the flash chips. If the host request is a write request, the FTL updates the table entry
for the requested logical page with the next available physical page number, then invalidates the
physical page that contains the old data. Second, the FTL performs garbage collection, which
finds and erases invalid flash blocks such that the block can be reused. If necessary, the FTL moves
valid data to create reusable flash blocks. Third, the FTL records the amount of wear on each flash
block and performs wear-leveling, which aims to evenly distribute writes across all flash blocks.
These three tasks within FTL are typically implemented in the firmware running on the embedded
processor within the flash controller.
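The address translation task can be sketched as follows. Garbage collection and wear-leveling are omitted for brevity, and the flat free list is an illustrative simplification, not how a production FTL allocates pages:

```python
class SimpleFTL:
    """Sketch of page-level address translation with out-of-place updates."""

    def __init__(self, num_physical_pages):
        self.mapping = {}                        # logical page -> physical page
        self.free = list(range(num_physical_pages))
        self.invalid = set()                     # stale pages awaiting GC

    def write(self, logical_page):
        new_phys = self.free.pop(0)              # next available physical page
        old_phys = self.mapping.get(logical_page)
        if old_phys is not None:
            self.invalid.add(old_phys)           # old copy becomes garbage
        self.mapping[logical_page] = new_phys
        return new_phys

    def read(self, logical_page):
        return self.mapping[logical_page]
```

Overwriting a logical page thus never rewrites the old physical page in place; it remaps the logical page and leaves the stale copy for garbage collection.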
2.2. NAND Flash Block Organization
As we show in Section 2.1, a flash chip is organized into multiple flash blocks. Each flash block
consists of a two-dimensional array of flash cells (or floating-gate transistors), which are the basic
storage components of flash memories. A single-level cell (SLC) can store one bit; a multi-level
cell (MLC) can store two bits; a triple-level cell (TLC) can store three bits. In this section, we
explain the organization of an MLC flash block as an example.
Figure 4 plots the internal logical structure of an MLC flash block. Within the block, all the
cells in the same row share a common wordline (i.e., WL in Figure 4), which typically spans 32K
to 64K cells. In an MLC flash block, each cell stores a 2-bit value, which we represent as a tuple
(LSB, MSB) in this section, where LSB is the least significant bit and MSB is the most significant
bit. The LSBs stored in each wordline form an LSB page, and the MSBs stored in each wordline
form an MSB page. Within a block, all cells in the same column are connected in series to form a
bitline or string (i.e., BL in Figure 4). Each string is connected at one end to ground (GND)
through a transistor controlled by the ground select line (GSL), and at the other end to the sense amplifier
(also shown in Figure 3) through a transistor controlled by the string select line (SSL). Multiple strings
in a flash plane are connected in parallel to a common sense amplifier. During a read operation,
the sense amplifier on each string decodes the two-bit value stored in one cell selected by the SSL
and the WL signal. Note that the same flash block organization shown in Figure 4 applies to 3D
NAND.
Figure 4. Flash block organization.
2.3. NAND Flash Errors
To understand the nature of flash errors, we first explain how data is stored in flash cells. We
then explain how various types of circuit-level noise affect flash memories and lead to flash errors.
Threshold Voltage Distribution: NAND flash memory represents data using the threshold
voltage of each flash cell. For an MLC NAND flash memory, each flash cell stores a two-bit value,
and can be programmed to one of four threshold voltage states, which we call the ER, P1, P2, and
P3 states. Each state represents a different two-bit value, and is assigned a voltage window within
the range of all possible threshold voltages. Due to variation across different program operations,
the threshold voltage of flash cells programmed to the same state is initially distributed across this
voltage window. This is called the threshold voltage distribution of a state.
Figure 5 illustrates the threshold voltage distribution of an MLC NAND flash memory chip,
for all possible (four) states. The x-axis shows the threshold voltage (Vth), which spans a certain
voltage range. The y-axis shows the probability density at each voltage level across all flash memory
cells. The threshold voltage distribution of each threshold voltage state can be represented as a
probability density curve that spans its own voltage window. We label the distribution curve for
each state with the name of that state and its corresponding two-bit value (i.e., LSB and MSB
values). The boundaries between neighboring threshold voltage windows are referred to as read
reference voltages, labeled Va, Vb, and Vc in Figure 5. These voltages are used by the flash
controller to identify the voltage window (i.e., state) of each cell. For example, a flash cell is
considered to be in the P1 state, representing a value of 10, if the threshold voltage of this cell is
between Va and Vb.
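Reading a cell thus amounts to locating its threshold voltage among the read reference voltages, as the following sketch shows (the voltage values used are illustrative parameters):

```python
def decode_cell(vth, va, vb, vc):
    """Map a cell's threshold voltage to its MLC state and (LSB, MSB) value,
    using the state-to-value assignment described above."""
    if vth < va:
        return "ER", (1, 1)
    elif vth < vb:
        return "P1", (1, 0)
    elif vth < vc:
        return "P2", (0, 0)
    else:
        return "P3", (0, 1)
```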
Figure 5. Threshold voltage distribution of an MLC NAND flash memory chip.
Noise and Errors: Over time, the threshold voltage distributions shift, as shown in Figure 6,
and can overlap with each other. These shifts can happen due to various types of noise in flash
memory. Such noise includes: (1) program/erase (P/E) cycling noise, causing the flash cells with
higher wear out to end up with higher threshold voltages right after programming [3, 4, 48]; (2)
cell-to-cell program interference noise, increasing the threshold voltage of neighboring flash cells in
adjacent rows during a program operation [3, 7, 10, 35], (3) read disturb noise, slightly increasing
the threshold voltage of all other flash pages within a block during a read operation [5, 13, 35],
and (4) retention noise, decreasing the threshold voltage of a flash cell over time due to charge
leakage [6, 8, 9, 35].
If the threshold voltage of a flash cell shifts into the threshold voltage window of a state other
than the one it was originally programmed in, the cell can be misread, leading to a flash error.
For example, a cell in the rightmost part of the ER state distribution in Figure 6 will be misread
as being in the P1 state. Similarly, any cell that shifts across the original read reference voltages
(Va, Vb, Vc) induces a raw bit error. As such, the raw bit errors (i.e., the shaded area in Figure 6)
increase as a result of threshold voltage shifts.
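The effect of a distribution shift on raw bit errors can be illustrated with a small sketch; the voltages below are arbitrary example values, not characterization data:

```python
def count_raw_bit_errors(programmed_states, vth_values, va, vb, vc):
    """Count cells whose (possibly shifted) threshold voltage is read back
    as a state other than the one they were programmed to."""
    def read_state(v):
        if v < va:
            return "ER"
        if v < vb:
            return "P1"
        if v < vc:
            return "P2"
        return "P3"
    return sum(1 for state, v in zip(programmed_states, vth_values)
               if read_state(v) != state)

# Three ER cells programmed just below Va = 1.0 (illustrative values).
states, vth = ["ER", "ER", "ER"], [0.70, 0.80, 0.95]
before = count_raw_bit_errors(states, vth, 1.0, 2.0, 3.0)
# After a +0.2 V shift, two cells cross Va and are misread as P1.
after = count_raw_bit_errors(states, [v + 0.2 for v in vth], 1.0, 2.0, 3.0)
```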
Figure 6. Threshold voltage distributions after the voltages shift, leading to raw bit errors.
2.4. NAND Flash Operations
NAND flash memory supports three basic operations: read, program, and erase. We first
explain how read operations are performed in modern flash chips. We then explain how erase and
program operations are performed. Recall from Section 2.1 that read and program operations are
performed at the page granularity, whereas erase operations are performed at the block granularity.
Read Operation: Data can be read from NAND flash memory by applying read reference
voltages onto the control gate of each cell, to sense the cell’s threshold voltage. To read the LSB of
the cell, we only need to distinguish the states with an LSB value of 1 (ER and P1) from those with
an LSB value of 0 (P2 and P3). As Figure 5 shows, we only need to use one read reference voltage,
Vb, to read the LSB page. To read the MSB page, we need to distinguish the states with an MSB
value of 1 (ER and P3) from those with an MSB value of 0 (P1 and P2). Therefore, we need to
determine whether or not the threshold voltage of the cell falls between Va and Vc, requiring us to
apply both read reference voltages, one at a time, to determine the MSB data.
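These two page reads can be sketched as threshold comparisons (the read reference voltages are parameters here):

```python
def read_lsb(vth, vb):
    """LSB read: one comparison against Vb, since ER and P1 (below Vb)
    store LSB = 1 while P2 and P3 (above Vb) store LSB = 0."""
    return 1 if vth < vb else 0

def read_msb(vth, va, vc):
    """MSB read: two comparisons, since MSB = 0 exactly when the threshold
    voltage lies between Va and Vc (states P1 and P2)."""
    return 0 if va <= vth < vc else 1
```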
As we discussed in Section 2.3, the threshold voltage distributions of each state can shift over
time, causing some cells to move into neighboring voltage windows. To reduce the number of errors
that occur when a large number of cells experience threshold voltage shifts, modern NAND flash
chips support the read-retry mechanism, which adapts the read reference voltages to correspond
to the distribution shifts [4, 6]. The read-retry operation allows the flash controller to increase or
decrease the read reference voltages by multiples of a minimal voltage step (Vstep), iterating over
several potential values of each read reference voltage to empirically find the voltage value that
yields the lowest raw bit error rate. As we show in Figure 7, the new boundaries between the
shifted threshold voltage distributions, V′a, V′b, and V′c, yield the fewest raw bit errors (i.e., the size
of the shaded areas is the smallest), thus we consider them the optimal read reference voltages.
By trying to re-read the same flash page with different read reference voltages after a read failure,
the flash controller increases the chances of reading the data correctly. For example, in Figure 6, the
original read reference voltage Va, between the ER and P1 states, misreads many of the flash cells
(e.g., many cells actually belonging to the ER state would be incorrectly identified as belonging to
the P1 state), but the flash controller can move the read reference voltage to V′a (as shown
in Figure 7), using the read-retry mechanism, thereby significantly reducing the number of errors.
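The read-retry loop can be sketched as follows; `read_page` and `check_ecc` are hypothetical stand-ins for the controller's raw page read and ECC decode, and the step schedule is illustrative:

```python
def read_retry(read_page, check_ecc, v_default, v_step, max_retries=8):
    """After a read failure, step the read reference voltage down and up by
    multiples of v_step until ECC decoding succeeds, as in read-retry."""
    data = read_page(v_default)
    if check_ecc(data):
        return v_default, data
    for i in range(1, max_retries + 1):
        for v in (v_default - i * v_step, v_default + i * v_step):
            data = read_page(v)
            if check_ecc(data):
                return v, data          # found a voltage ECC can handle
    raise IOError("uncorrectable: all read-retry attempts failed")
```

A controller could additionally remember the last successful voltage per block, so that later reads start near the optimum instead of at the default.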
Figure 7. Threshold voltage distributions after the voltages shift, resulting in overlapping distributions.
Erase and Program Operations: In NAND flash, data can be programmed only into an
erased flash cell. Since the negative gate voltage can be applied only to all flash cells in a block
at once, a flash block must be erased in its entirety [36]. The erase operation resets the threshold voltage state of
all cells in the flash block to the ER state.
When data is programmed, charge is transferred into the floating gate of a flash cell by repeat-
edly pulsing the programming voltage, in a procedure known as incremental-step-pulse programming
(ISPP) [51]. In order to reduce the impact of interference caused by the programming process on
the neighboring cells (called cell-to-cell program interference [7]), two-step programming is employed
for MLC NAND flash: the LSB is first programmed into a cell, and then the MSB is programmed
only after partial data is programmed into neighboring cells [47]. In the first step, a flash cell is
partially programmed based on its LSB value, either staying in the ER state or moving to a tempo-
rary state (TP), whose mean voltage falls between states P1 and P2. In the second step, the LSB
data is first read back into an internal buffer to determine the cell's current threshold voltage,
and then further programming pulses are applied based on the MSB data, to increase the cell’s
threshold voltage into its final state.
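The verify-and-pulse loop of ISPP can be sketched as follows; the model is idealized in that each pulse raises the threshold voltage by exactly one step with no program noise:

```python
def ispp_program(current_vth, target_vth, v_step=0.25, max_pulses=64):
    """Repeatedly apply a program pulse, then a verify read, until the
    cell's threshold voltage reaches the target level (idealized ISPP)."""
    pulses = 0
    while current_vth < target_vth and pulses < max_pulses:
        current_vth += v_step   # one program pulse raises Vth by ~v_step
        pulses += 1             # each pulse is followed by a verify read
    return current_vth, pulses
```

A smaller step yields a tighter final threshold voltage distribution at the cost of more pulses, i.e., higher program latency, which is the precision/latency trade-off noted in Section 1.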
3. Related Work
Many prior works have proposed to improve flash reliability through characterization, mod-
eling, and optimization techniques. We classify these works into different groups based on their
similarities.
3.1. Threshold Voltage Distribution Modeling
Characterization: Prior work has experimentally characterized the threshold voltage dis-
tribution, and how various types of noise can change that distribution. Prior work makes the
following conclusions about threshold voltage shifts: 1) Threshold voltage distribution shifts to the
right (i.e., towards higher threshold voltages) and becomes wider as the program/erase (P/E)
cycle count increases [4, 48]; 2) Threshold voltage distribution shifts to the right when neighboring cells are
programmed to higher voltage states [7, 10]; 3) The distribution of the lower voltage states shifts
to the right when other pages in the same block are read repeatedly [5, 43]; 4) Threshold voltage
distribution shifts to the left (i.e., towards lower threshold voltages) over time due to retention
loss [6, 8, 9]. These experimental characterization works focus mostly on state-of-the-art MLC
NAND flash chips. In contrast, our proposal plans to characterize newer generation TLC or 3D
NAND flash chips, which can be more susceptible to errors than MLC NAND flash due to a smaller
voltage margin (for TLC NAND) or an extra dimension for interference (for 3D NAND).
Prior work also provides several detailed field studies of flash errors in data centers [34, 41, 50].
Unlike experimental characterization using few flash chips, these large-scale field studies cover a
large number of flash devices from many different flash vendors and technologies. However, these
field studies can only characterize the average correctable bit error rate for each drive. They cannot
obtain the threshold voltage distribution, cannot obtain the raw bit error rate for each flash block/page,
cannot correlate errors with an accurate P/E cycle count, retention time, or read count for each block, and
do not have enough data for the latest flash technology. In contrast, by performing experimental
characterization, our approach obtains much more detail about threshold voltage distribution and
flash errors than large-scale studies.
Modeling: Based on these experimental characterization results, prior work has also proposed
models of the threshold voltage distribution and the distribution shifts [4, 48]. These models
are based on different distributions, such as Gaussian distribution [4], Normal-Laplace mixture
distribution [48], and other distributions [4], to achieve higher accuracies. On the other hand, some
prior work models the distribution based on idealized circuit-level models [15, 38, 42]. These models
capture some desired behavior, but are less accurate than those derived from real characterization.
While these previous models can be used for offline analysis, they are unsuitable for online evaluation
because they are either prohibitively complex to implement or not accurate enough. In contrast,
our work plans to propose an online threshold voltage distribution model that is both simple to
implement and accurate.
Applications of the models: Prior work has proposed and evaluated techniques that use
a threshold voltage distribution model for different purposes, such as (1) constructing a raw bit
error rate model offline [48, 49], (2) finding the optimal read reference voltage [6, 43, 52], and (3)
optimizing ECC offline [15]. These use cases can only optimize flash reliability offline because they
are limited by the high overhead of the characterization and modeling techniques, as we discussed
before. Our work plans to reduce these overheads such that these use cases can become suitable
for online usage, and potentially develop new online techniques to further optimize flash lifetime.
3.2. Flash Media Reliability Optimization
One way to improve flash reliability is by reducing noise (i.e., raw bit errors) in the raw flash
media. Improving the quality of raw flash media often requires circuit-level changes to the flash
chip. For example, to reduce overall read noise, prior work proposes to improve the precision of
read operations using the read-retry technique [4]. The read-retry technique can also be combined
with knowledge of how the threshold voltage shifts (see Section 3.1) to adjust the read reference
voltages to account for P/E cycling [44, 48], cell-to-cell program interference [7, 10, 44], read disturb [43],
and retention [6]. To mitigate read disturb noise, prior work proposes to add simple logic in the
flash chip to enable dynamic pass-through voltage tuning [5]. To mitigate P/E cycling and pro-
gram interference noise, prior work also proposes to use charge-trap flash instead of traditional
floating-gate cells [16]. While these techniques directly optimize flash media reliability by exploit-
ing underlying device characteristics, they do not take advantage of application behavior like our
proposed architectural techniques do.
3.3. Flash Channel Noise Tolerance Techniques
Another way to improve flash reliability is by tolerating more noise in the unreliable flash
channel using error-correcting codes (ECC). To more efficiently tolerate all kinds of noise, state-of-
the-art SSDs have started to use stronger ECC such as low-density parity-check (LDPC) codes [18]
instead of traditional BCH codes [2]. LDPC codes are known to be capacity-approaching codes,
which can tolerate bit error rates near the theoretical maximum (the Shannon limit) [32]. Prior work
has studied how to reach this limit by exploiting soft information in LDPC decoders using read-
retry and an iterative decoding algorithm [54]. Prior work has also studied ECC techniques that
tolerate flash error variation [22] and process variation [39, 40]. However, stronger ECC requires
large storage and logic overhead, and has diminishing returns in reliability improvement [8]. On
the other hand, many opportunities to exploit flash error characteristics in ECC design remain
largely unexplored; pursuing them is part of this proposal.
3.4. Flash Reliability Management Techniques
Flash reliability varies significantly depending on the usage pattern. Thus, to improve flash
reliability and lifetime, prior work also proposes better flash management techniques to ensure
friendly usage patterns by optimizing flash translation layer (FTL) algorithms. For example, to
reduce unnecessary erase operations, prior work optimizes page allocation policy to achieve higher
spatial locality of write operations [19, 30, 31, 45]. Prior work also proposes techniques to minimize
endurance cycles consumed by FTL metadata [14, 46]. To mitigate program interference errors,
prior work proposes to have the FTL enforce certain program sequences instead of allowing random
writes [7, 10, 46]. To mitigate read disturb errors, prior work proposes to redistribute
read-hot pages across different flash blocks [25]. To mitigate retention errors, prior work proposes to
use various flash refresh techniques that periodically rewrite all data with a high retention age [8, 9].
Prior work also proposes techniques to allow the host to provide more information to the FTL
for better management of flash reliability. For example, to achieve longer flash lifetime, prior work
proposes to allow the file system to issue TRIM commands such that the FTL, given more free
space, can perform garbage collection more efficiently [17]. This technique is now widely deployed
in today’s systems. Recent work also proposes a multi-stream technique to further optimize flash
lifetime by allowing the file system to provide hints about write-hotness of the data and the FTL to
map data accordingly [20]. As these prior works demonstrate, exploiting both device characteristics
and application behavior in the FTL often leads to significant reliability improvements with
low overhead. As part of this proposal, our work aims to develop new flash reliability management
techniques to exploit newly-discovered device characteristics and application behavior.
3.5. Host-Side Data Management Techniques
Because flash memory wears out as more data is written, we can also improve flash reliability
and lifetime by reducing the total amount of data written to the flash memory using host-side data
management techniques. For example, prior work has proposed to eliminate redundant writes due
to small updates (i.e., when updating a small piece of data within a page, only the updated data
needs to be written) [12, 26, 55]. Other recent work has also proposed to use compression [23] and
deduplication [24] to reduce the data size written to the flash memory. Compared to these pieces
of work, our proposal aims to bring flash device awareness to the architecture level so that we can
further improve flash reliability in an efficient way.
3.6. Self-Healing Effect
Prior work has proposed idealized circuit-level models for the self-healing effect [37, 56], demonstrating
significant opportunities for using the self-healing effect to improve flash reliability and
lifetime. Based on assumptions about how the self-healing effect works, prior work has also proposed
techniques that exploit this effect to improve flash lifetime, such as heal-leveling [11], write
throttling [21], and heat-accelerated self-healing [56]. However, these previous results are not yet
convincing enough to show that the self-healing effect can successfully improve flash lifetime on real
devices, because they lack real experimental data and evidence supporting the self-healing effect on
modern flash devices. Our work aims to be the first to demonstrate and comprehensively evaluate
the benefit of the self-healing effect using experimental data from real flash chips.
4. Preliminary Work: WARM—Write-hotness Aware Retention
Management
WARM is our preliminary work that exploits application-level write-hotness and device-level
retention characteristics to improve flash lifetime. Our goal is to eliminate redundant refreshes
for write-hot pages with minimal storage and performance overhead. This work proposes a write-
hotness-aware flash memory retention management policy, WARM. The first key idea of WARM
is to effectively partition pages stored in flash into two groups based on the write frequency of
the pages. The second key idea of WARM is to apply different management policies to the two
different groups of pages/blocks. We first discuss a novel, lightweight approach to dynamically
identifying and partitioning write-hot versus write-cold pages (Section 4.1). We then describe
how WARM optimizes flash management policies, such as garbage collection and wear-leveling,
in a partitioned flash memory, and show how WARM integrates with a refresh mechanism to
provide further flash lifetime improvements (Section 4.2). We also summarize our evaluation of
flash lifetime improvement delivered by WARM, and our analysis of hardware and performance
overhead to implement WARM in Section 4.3.
4.1. Partitioning Data Using Write-Hotness
4.1.1. Identifying Write-Hot and Write-Cold Data
Figure 8 illustrates the high-level concept of our write-hot data identification mechanism. We
maintain two virtual queues, one for write-hot data and another for write-cold data, which order
all of the hot and cold data, respectively, by the time of the last write. The purpose of the virtual
queues is to partition write-hot and write-cold data in a space-efficient way. The partitioning
mechanism provides methods of promoting data from the cold virtual queue to the hot virtual
queue, and for demoting data from the hot virtual queue to the cold virtual queue. The promotion
and demotion decisions are made such that write-hot pages are quickly identified (after two writes
in quick succession to the page), and write-cold pages are seldom misidentified as write-hot pages
(and are quickly demoted if they are). Note that the cold virtual queue is divided into two parts,
with the part closer to the tail known as the cooldown window. The purpose of the cooldown
window is to identify those pages that are most recently written to. The pages in the cooldown
window are the only ones that can be immediately promoted to the hot virtual queue (as soon as
they receive a write request). We walk through examples for both of these migration decisions.
[Figure: hot virtual queue (with a hot window) and cold virtual queue (with a cooldown window at its tail), each ordered from tail to head; circled labels ①–⑥ mark the migration actions described in the text.]
Figure 8. Write-hot data identification algorithm using two virtual queues and monitoring windows.
Initially, all data is stored in the cold virtual queue. Any data stored in the cold virtual queue
is defined to be cold. When data (which we call Page C) is first identified as cold, a corresponding
queue entry is pushed into the tail of the cold virtual queue (①). This entry progresses forward in
the queue as other cold data is written. If Page C is written to again after it leaves the cooldown
window (②), then its queue entry will be removed from the cold virtual queue and reinserted at
the queue tail (①). This allows the queue to maintain ordering based on the time of the most
recent write to each page.
If a cold page starts to become hot (i.e., it starts being written to frequently), a cooldown
window at the tail end of the cold virtual queue provides these pages with a chance to be promoted
into the hot virtual queue. The cooldown window monitors the most recently inserted (i.e., most
recently written) cold data. Let us assume that Page C has just been inserted into the tail of
the cold virtual queue (①). If Page C is written to again while it is still within the cooldown
window, it will be immediately promoted to the hot virtual queue (③). If, on the other hand,
Page C is not written to again, then Page C will eventually be pushed out of the cooldown window
portion of the cold virtual queue, at which point Page C is determined to be cold. Requiring a
two-step promotion process from cold to hot (with the use of a cooldown window) allows us to
avoid incorrectly promoting cold pages due to infrequent writes. This is important for two reasons:
(1) hot storage capacity is limited, and (2) promoted pages will not be refreshed, which for cold
pages could result in data loss. With our two-step approach, if Page C is cold and is written to
only once, it will remain in the cold queue, though it will be moved into the cooldown window (②)
to be monitored for subsequent write activity.
Any data stored in the hot virtual queue is identified as hot. Newly-identified hot data, which
we call Page H, is inserted into the tail of the hot virtual queue (④). The hot virtual queue length is
maximally bounded by a hot window size to ensure that the most recent writes to all hot data pages
were performed within a given time period. (We discuss how this window is sized in Section 4.1.3.)
The assumption here is that infrequently-written pages in the hot virtual queue will eventually
progress to the head of the queue (⑤). If the entry for Page H in the hot virtual queue reaches the
head of the queue and must now be evicted, we demote Page H into the cooldown window of the
cold virtual queue (①), and move the page out of the hot virtual queue. In contrast, a write to a
page in the hot virtual queue simply moves that page to the tail of the hot virtual queue (⑥).
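The promotion and demotion rules above can be sketched in code as follows. This is a simplified, page-granularity model: the class name, default window sizes, and the use of ordered dictionaries as queues are illustrative choices for the sketch, not details specified by WARM. The comments refer to the circled actions in Figure 8.

```python
from collections import OrderedDict

class WriteHotnessIdentifier:
    """Simplified sketch of the two-virtual-queue write-hotness classifier.
    Window sizes are illustrative parameters, not values from the proposal."""

    def __init__(self, hot_window=4, cooldown_window=4):
        self.hot_window = hot_window            # max length of the hot virtual queue
        self.cooldown_window = cooldown_window  # tail portion of the cold queue
        self.hot = OrderedDict()    # tail (most recent write) = last item
        self.cold = OrderedDict()

    def is_hot(self, page):
        return page in self.hot

    def on_write(self, page):
        if page in self.hot:                    # (6) hit in hot queue: move to tail
            self.hot.move_to_end(page)
        elif self._in_cooldown(page):           # (3) hit in cooldown window: promote
            del self.cold[page]
            self.hot[page] = None
            if len(self.hot) > self.hot_window:
                # (5) evict the oldest hot page, (1) demote it into the cold tail
                demoted, _ = self.hot.popitem(last=False)
                self.cold[demoted] = None
        else:                                   # (1)/(2) (re)insert at the cold tail
            self.cold.pop(page, None)
            self.cold[page] = None

    def _in_cooldown(self, page):
        # The cooldown window is the most recently written tail of the cold queue.
        return page in list(self.cold)[-self.cooldown_window:]
```

As the text describes, a page becomes hot only after two writes in quick succession: the first write places it in the cooldown window, and a second write while it is still there promotes it.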
4.1.2. Partitioning the Flash Device
Figure 9 shows how we apply the identification mechanism from Section 4.1.1 to perform physical
page partitioning inside flash, with labels that correspond to the actions from Figure 8. We first
separate all of the flash blocks into two allocation pools, one for hot data and another for cold data.
The hot pool contains enough blocks to store every page in the hot virtual queue (whose sizing is
described in Section 4.1.3), as well as some extra blocks to tolerate management overhead (e.g.,
erasing on garbage collection). The cold pool contains all of the remaining flash blocks. Note that
blocks can be moved between the two pools when the queues are resized.
[Figure: hot block pool (hot window) and cold block pool with a cooldown window; arrows mark promotion to the cooldown window (②), a hit in the cooldown window (③), migration of valid pages (⑤), and a hit in the hot window (⑥).]
Figure 9. Write-hotness aware retention management policy overview.
To simplify the hardware required to implement the virtual queues, we exploit the fact that
pages are written sequentially into the hot pool blocks. Consecutive writes to hot pages will be
placed in the same block, which means that a single block in the hot virtual queue will hold all of
the oldest pages. As a result, we can track the hot virtual queue at a block granularity instead of
a page granularity, which allows us to significantly reduce the size of the hot virtual queue.
4.1.3. Tuning the Partition Boundary
Since the division between hot and cold data can be dependent on both application and phase
characteristics, we need to provide a method for dynamically adjusting the size of our hot and cold
pools periodically. Every block is allocated to one of the two pools, so any increase in the hot pool
size will always be paired with a corresponding decrease in the cold pool size, and vice versa. Our
dynamic sizing mechanism must ensure that: (1) the hot pool size is such that every page in the hot
pool is written to at least once within the hot pool retention time (which is relaxed, as the
hot pool does not employ refresh), and (2) the lifetime of the blocks in the cold pool is maximized.
To this end, we describe an algorithm that tunes the partitioning of blocks between the hot and
cold pools.
The partitioning algorithm starts by setting an upper bound for the hot window, to ensure that
every page in the window is written to at least once within the fixed hot pool retention time.
Our goal is to relax the hot pool retention time guarantee to provide greater endurance. We estimate
this size by collecting the number of writes to the hot pool, to find the average write frequency
and estimate the time it takes to fill the hot window. We compare the time to fill the window to
the hot pool retention time, and if the fill time exceeds the retention time, we shrink the hot pool
size to reduce the required fill time. This hot pool size determines the initial partition boundary
between the hot pool and the cold pool.
We then tune this partition boundary to maximize the lifetime of the cold pool, since we do not
relax retention time for the blocks in the cold pool. Assuming that wear-leveling evenly distributes
the page writes within the cold pool, we can use the endurance capacity metric (i.e., the total
number of writes the cold pool can service), which is the product of the remaining endurance of a
block¹ and the cold pool size, to estimate the lifetime of blocks in the cold pool:
Endurance Capacity = Remaining Endurance × Cold Pool Size    (1)

Lifetime = Endurance Capacity / Cold Write Frequency ∝ Cold Pool Size / Cold Write Frequency    (2)
We divide the endurance capacity by the cold write frequency (writes per day) to determine
the number of days remaining before the cold pool is worn out. We use hill climbing to find
the partition boundary at which the cold pool size maximizes the flash lifetime. The cold write
frequency is dependent on cold pool size, because as the cold pool size increases, the hot pool size
correspondingly shrinks, shifting writes of higher frequency into the cold pool.
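The boundary search above can be sketched as hill climbing on Equations (1) and (2). The function names below, and the `cold_write_freq` callback standing in for the measured cold-pool write statistics, are illustrative assumptions of this sketch:

```python
def estimated_lifetime(cold_pool_size, remaining_endurance, cold_write_freq):
    """Equations (1)-(2): days until the cold pool wears out.
    `cold_write_freq(size)` models how writes shift into the cold pool as it
    grows (an assumed stand-in for the measured write frequency)."""
    endurance_capacity = remaining_endurance * cold_pool_size  # Eq. (1)
    return endurance_capacity / cold_write_freq(cold_pool_size)  # Eq. (2)

def tune_partition(total_blocks, max_hot_pool, remaining_endurance, cold_write_freq):
    """Hill climbing on the cold pool size to maximize estimated lifetime.
    The cold pool can never shrink below total_blocks - max_hot_pool."""
    min_cold = total_blocks - max_hot_pool

    def life(size):
        return estimated_lifetime(size, remaining_endurance, cold_write_freq)

    best = min_cold  # start from the upper-bounded hot pool size
    while True:
        neighbors = [s for s in (best - 1, best + 1) if min_cold <= s < total_blocks]
        better = [s for s in neighbors if life(s) > life(best)]
        if not better:
            return best  # local maximum of the lifetime estimate
        best = max(better, key=life)
```

Because shrinking the hot pool shifts higher-frequency writes into the cold pool, the cold write frequency grows with cold pool size, which is what gives the lifetime estimate an interior maximum for the hill climber to find.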
Finally, once the partition boundary converges to obtain the maximum lifetime, we must adjust
what portion of the cold pool belongs in the cooldown window. We size this window to minimize
¹Due to wear-leveling, the remaining endurance (i.e., the number of P/E operations that can still be performed on the block) is the same across all of the blocks.
the ping-ponging of requests between the hot and cold pools. For this, we want to maximize the
number of hot virtual queue hits (⑥ in Figure 8), while minimizing the number of requests evicted
from the hot window (⑤ in Figure 8). We maintain a counter of each of these events, and then use
hill climbing on the cooldown window size to maximize the utility function Utility = (⑥ − ⑤).
In our work, we limit the hot pool size to the number of over-provisioned blocks within the flash
device (i.e., the extra blocks beyond the visible capacity of the device). While the hot pages are
expected to represent only a small portion of the total flash capacity, there may be rare cases where
the size limit prevents the hot pool from holding all of the hot data (i.e., the hot pool is significantly
undersized). In such a case, some less-hot pages are forced to reside in the cold pool, and lose the
benefits of WARM (i.e., endurance improvements from relaxed retention times). WARM will not,
however, incur any further write overhead from keeping the less-hot pages in the cold pool. For
example, the dynamic sizing of the cooldown window prevents the less-hot pages from going back
and forth between the hot and cold pools.
4.2. Flash Management Policies
WARM partitions all of the blocks in a flash device into two pools, storing write-hot data in
the blocks belonging to the hot pool, and storing write-cold data in the blocks belonging to the cold
pool. Because of the different degrees of write-hotness of the data in each pool, WARM also applies
different management policies (i.e., refresh, garbage collection, and wear-leveling) to each pool, to
best extend their lifetime. We next describe these management policies for each pool, both when
WARM is applied alone and when WARM is applied along with refresh.
4.2.1. WARM-Only Management
WARM relaxes the internal retention time of only the blocks in the hot pool, without requiring
a refresh mechanism for the hot pool. Within the cold pool, WARM applies conventional garbage
collection (i.e., finding the block with the fewest valid pages to minimize unnecessary data move-
ment) and wear-leveling policies. Since the flash blocks in the cold pool contain data with much
lower write frequencies, they (1) consume a smaller number of P/E cycles, and (2) experience much
lower fragmentation (which only occurs when a page is updated), thus reducing garbage collec-
tion activities. As such, the lifetime of blocks in the cold pool increases even when conventional
management policies are applied.
Within the hot pool, WARM applies simple, in-order garbage collection (i.e., finding the oldest
block) and no wear-leveling policies. WARM performs writes to hot pool blocks in block order
(i.e., it starts on the block with the lowest ID number, and then advances to the block with the
next lowest ID number) to maintain a sequential ordering by write time. Writing pages in block
order enables garbage collection in the hot pool to also be performed in block order. Due to the
higher write frequency in the hot pool, all data in the hot pool is valid for a shorter amount of
time. Most of the pages in the oldest block are already invalid when the block is garbage collected,
increasing garbage collection efficiency. Since both writing and garbage collection are performed
in block order, each of the blocks will be naturally wear-leveled, as they will all incur the same
number of P/E cycles. Thus, we do not need to apply any additional wear-leveling policy.
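The in-order write and garbage collection policy can be sketched as a circular log over the hot pool blocks. The block count, pages-per-block value, and class structure below are illustrative; the sketch only shows why block-order writes make garbage collection trivial and naturally wear-level the pool:

```python
class HotPool:
    """Circular-log sketch of WARM's hot pool management. The block count and
    pages-per-block values are illustrative, not parameters from the proposal."""

    def __init__(self, num_blocks=4, pages_per_block=2):
        self.pages_per_block = pages_per_block
        self.blocks = [[] for _ in range(num_blocks)]  # each block holds page IDs
        self.write_blk = 0            # next block to write, in block order
        self.gc_blk = 0               # oldest block, always the next GC victim
        self.erase_counts = [0] * num_blocks
        self.live = {}                # page ID -> block holding its valid copy

    def write(self, page):
        # Advance in block order; garbage-collect the oldest block when the
        # write pointer wraps around to it.
        while len(self.blocks[self.write_blk]) == self.pages_per_block:
            self.write_blk = (self.write_blk + 1) % len(self.blocks)
            if self.write_blk == self.gc_blk:
                self._collect_oldest()
        self.blocks[self.write_blk].append(page)
        self.live[page] = self.write_blk   # any older copy becomes invalid

    def _collect_oldest(self):
        victim = self.gc_blk
        # For hot data, most pages were rewritten and are already invalid.
        survivors = [p for p in self.blocks[victim] if self.live.get(p) == victim]
        self.blocks[victim] = []
        self.erase_counts[victim] += 1
        self.gc_blk = (self.gc_blk + 1) % len(self.blocks)
        for p in survivors:                # keep still-valid pages, in order
            self.blocks[victim].append(p)
            self.live[p] = victim
```

Because both the write pointer and the GC victim advance strictly in block order, every block is erased the same number of times over a full rotation, which is the natural wear-leveling the text describes.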
4.2.2. Combining WARM with Refresh
WARM can also be used in conjunction with a refresh mechanism to reap additional endurance
benefits. WARM, on its own, can significantly extend the lifetime of a flash device by enabling re-
tention time relaxation on only the write-hot pages. However, these benefits are limited, as the cold
pool blocks will eventually exhaust their endurance at the original internal retention time. While
WARM cannot enable retention time relaxation on the cold pool blocks due to infrequent writes to
such blocks, a refresh mechanism can enable the relaxation, greatly extending the endurance of the
cold pool blocks. WARM still provides benefits over a refresh mechanism for the hot pool blocks,
since it avoids unnecessary write operations that refresh operations would incur.
When WARM and refresh are combined, we split the lifetime of the flash device into two phases.
The flash device starts in the pre-refresh phase, during which the same management policies as
WARM-only are applied. Note that during this phase, internal retention time is only relaxed for
the hot pool blocks. Once the endurance at the original retention time is exhausted, we enter
the refresh phase, during which the same management policies as WARM-only are applied and a
refresh policy (such as FCR [8]) is applied to the cold pool to avoid data loss. During this phase,
the retention time is relaxed for all blocks. Note that during both phases, the internal retention
time for hot pool blocks is always relaxed without the need for a refresh policy.
During the refresh phase, WARM also performs global wear-leveling to prevent the hot pool
from being prematurely worn out. The global wear-leveling policy rotates the entire hot pool to a
new set of physical flash blocks (which were previously part of the cold pool) every 1K hot block
P/E cycles. Over time, this rotation will use all of the flash blocks in the device for the hot pool
for one 1K P/E cycle interval. Thus, WARM wears out all of the flash blocks equally despite the
heterogeneity in write-frequency between the two pools.
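The rotation step can be sketched as follows. Representing blocks as (block ID, P/E count) pairs and choosing the least-worn blocks as the new hot pool are illustrative assumptions of the sketch; the proposal itself only specifies that the hot pool rotates to blocks previously in the cold pool:

```python
def rotate_hot_pool(hot_pool, cold_pool, hot_pe_cycles, interval=1000):
    """Sketch of the global wear-leveling rotation during the refresh phase.
    Blocks are (block_id, pe_count) pairs; picking the least-worn blocks for
    the new hot pool is an illustrative policy choice."""
    if hot_pe_cycles == 0 or hot_pe_cycles % interval != 0:
        return hot_pool, cold_pool          # rotate only every `interval` cycles
    merged = sorted(hot_pool + cold_pool, key=lambda b: b[1])  # by P/E count
    new_hot = merged[:len(hot_pool)]        # least-worn blocks host the hot pool
    return new_hot, merged[len(hot_pool):]
```

Repeating this rotation every interval cycles the hot pool through all physical blocks over time, equalizing wear despite the write-frequency gap between the pools.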
4.3. Summary of Results
We evaluate lifetime improvement using an I/O trace based simulator, which simulates a NAND
flash-based SSD with different flash management policies. Figure 10 plots the flash lifetime provided by
WARM alone (WARM), the adaptive-rate flash correct and refresh mechanism (ARFCR) [8], and WARM
combined with refresh (WARM+ARFCR), normalized to a conventional management policy without
WARM or refresh (Baseline). Using these results, we show that, when applied alone, WARM
improves overall flash lifetime by an average of 3.24× over Baseline. When WARM is applied
together with an adaptive refresh mechanism, the average lifetime improves by 12.9× over Baseline
(a 1.21× improvement over adaptive refresh alone). We also analyze the hardware and performance
overhead of WARM. WARM
requires four hardware counters and 1056B of memory overhead. In the worst case, WARM has a
performance penalty of 5.8% over Baseline due to flash management overhead. On average across
all workloads, this overhead is negligible (<2%). In conclusion, WARM can improve flash lifetime
significantly while requiring minimal hardware and performance overhead.
[Figure: bar chart of normalized lifetime improvement (up to 16×) for Baseline, WARM, ARFCR, and WARM+ARFCR.]
Figure 10. Normalized lifetime improvement when WARM is applied on top of Baseline and ARFCR.
5. Proposed Work 1: Online Characterization and Modeling of
NAND Flash Memory Errors
Motivation: NAND flash memory errors are common in raw flash chips and they significantly
impact flash reliability. To guarantee the reliable operation of NAND flash memory, strong ECC
codes are applied to mask these errors from the user, leading to significant hardware and capacity
overhead [2, 8, 18, 54]. Understanding these errors through offline characterization and modeling
can enable more cost-effective ways to tolerate them than uniformly applying stronger ECC codes
regardless of the error properties. We expect to examine modern NAND flash chips such as 3D,
multi-level cell (MLC), or triple-level cell (TLC) flash chips to understand the nature of these
errors, depending on the availability of the NAND flash chips we can test. Based on these results,
we expect to construct an accurate threshold voltage distribution model online, which will enable
other mechanisms to exploit the knowledge for improving flash reliability. In this work, we hope to
arrive at a new online mechanism to characterize and model the threshold voltage distribution of
flash cells during system operation at low cost and low latency. We divide this proposed work into
four major directions.
First, we expect to perform a thorough characterization of the threshold voltage distribution.
Such characterization is enabled by the existing read-retry capability of raw NAND flash chips,
which allows us to sweep the read reference voltage and accurately obtain the threshold voltage
for each flash cell. Using this methodology, we expect to study the properties of the threshold
voltage distribution, especially those that affect NAND flash error rates such as the tail distribution
(i.e., the part of the distribution that is far from its mean). We expect to study the effects of
wear out on the distribution by programming different flash blocks to different P/E cycles before
the characterization. We expect to analyze the effects of temperature, stored data pattern, and
retention on the threshold voltage distribution to achieve higher accuracy.
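The read-retry-based sweep can be sketched as follows. Here `read_at` is a hypothetical stand-in for issuing a read-retry command at a given reference voltage; the real command interface is vendor-specific and not part of this sketch:

```python
def characterize_thresholds(read_at, vref_steps):
    """Bin each cell's threshold voltage by sweeping the read reference voltage.

    `read_at(vref)` stands in for a read-retry read: it returns, for every cell
    in the page, True if that cell's threshold voltage lies below `vref`.
    A cell's bin is the index of the first (lowest) step at which it reads as
    'below'; cells above the highest step get bin len(vref_steps)."""
    num_cells = len(read_at(vref_steps[-1]))
    bins = [len(vref_steps)] * num_cells       # default: above the highest step
    for i, vref in enumerate(vref_steps):      # sweep from lowest to highest
        for cell, below in enumerate(read_at(vref)):
            if below and bins[cell] == len(vref_steps):
                bins[cell] = i                 # first step that captures the cell
    return bins
```

Collecting these per-cell bins over many pages yields a histogram of the threshold voltage distribution, including the tail regions that drive the error rates discussed above.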
Second, we expect to model the threshold voltage distribution from real characterization data.
We start by statically fitting various distribution models to the characterized data under each P/E
cycle count. We expect to evaluate and compare the accuracy of these different static models by
comparing their estimated NAND flash error rates as well as their modeling error rates. Once we
have determined the best static model, we then expect to model the dynamic shift of threshold
voltage distribution over P/E cycles. We expect to evaluate the accuracy of the dynamic model
by showing how well it can predict future threshold voltage distribution and flash errors using
only data obtained at lower P/E cycle counts. To make model construction practical, we also
expect to develop techniques that minimize the computation and hardware overhead of constructing
these models, by selecting an easy-to-compute model and designing hardware to accelerate it.
Third, we expect to understand different factors affecting the accuracy of such a model. In
particular, we would like to develop an understanding of how temperature, stored data pattern,
and retention age affect the distribution and model accuracy.
Fourth, in order to examine the evolution of flash errors and the accuracy of models for them
in newer generation NAND flash chips, we expect to study error patterns in 3D NAND flash
devices. These include error patterns for P/E cycling errors, retention errors, read disturb errors,
and program interference errors. We also expect to perform similar characterization and modeling
(as described above) on 3D NAND flash chips. We expect to understand different factors affecting
the accuracy of our model. In particular, we would like to develop an understanding of how process
variation across layers, P/E cycling, retention, and read disturb affect the distribution and model
accuracy. We also expect to study how three-dimensional program interference impacts NAND
flash reliability.
Towards these four directions, we expect to answer at least the following research questions:
• How does flash wear out affect threshold voltage distribution and flash error rates? How do
other effects (retention and read disturb) affect threshold voltage distribution and flash error
rates?
• How can we accurately model the threshold voltage distribution under any static amount of
wear out? What are the properties of the tail distribution (i.e., the part of the distribution far
away from the mean), and which models can be used to represent the shape of the tail?
• How can we model the dynamic shifts of the threshold voltage distribution under wear out?
• Can we combine the dynamic and static models of the threshold voltage distribution to increase
the prediction accuracy of NAND flash error rate?
• How can we minimize the computation and hardware overhead of characterizing and modeling the
threshold voltage distribution in flash controllers?
• How do temperature, stored data pattern, and retention age affect the threshold voltage distribution
and accuracy of the models we develop?
• What are the flash error characteristics in 3D NAND devices for P/E cycling errors, retention
errors, read disturb errors, and program interference errors?
• How do process variation, P/E cycling, retention, read disturb, and program interference in 3D
NAND affect threshold voltage distribution and our model?
• How can we characterize three-dimensional program interference in 3D NAND? Does three-dimensional
program interference affect data reliability in neighboring flash blocks?
6. Proposed Work 2: Model-Driven Flash Management Policies
Motivation: Today’s flash controllers manage multiple flash chips based on a set of fixed, con-
servatively estimated flash parameters provided by the flash vendor. These parameters, such as
read reference voltages, ECC strength, flash memory health, etc., are not specifically tuned for the
NAND flash chips connected to the flash controller and therefore cannot adjust to the amount of
wear on each flash block to improve flash reliability and performance. In this work, however, we
expect to take advantage of the threshold voltage distribution model constructed in Section 5. Our
goal is to show that our online model can be exploited in various ways to improve flash reliability.
Our approach can be divided into three steps.
First, we aim to find out which flash parameters to estimate. We expect to explore those
parameters that can be estimated using our proposed model. Among these parameters, we expect
to select those that can be helpful for improving flash reliability. We expect to estimate the optimal
read reference voltage of each read using the predicted threshold voltage distribution. We expect to
estimate the raw bit error rate under different assumptions of read reference voltages. We expect
to estimate the expected remaining lifetime of each flash block without counting P/E cycles. We
expect to estimate the log-likelihood ratio of the distribution, which can be used for improving
ECC coding efficiency. We also expect to quantitatively evaluate and compare the accuracy of
these estimations using the different models we develop in Section 5.
Second, we aim to develop techniques to utilize these estimations to improve flash reliability. We
expect to quantitatively evaluate and show how these techniques improve flash reliability in different
ways. With the optimal read reference voltage estimation, we can adapt the read reference voltage
to minimize the raw bit error rate before applying the read-retry technique. With the raw bit error rate
estimation, we can provide the right amount of ECC protection with the lowest overhead. With the
expected lifetime estimation, we can fully utilize the lifetime of each flash block without suffering
from loss of flash memory capacity. With the more accurate log-likelihood ratio estimation, we can
improve the efficiency of existing ECC codes.
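As one illustration of such an estimation, the optimal read reference voltage between two neighboring states can be found by minimizing the expected raw bit errors predicted by the distribution model. The sketch below assumes a simple per-state Gaussian model for concreteness; the proposal considers more accurate models, and the parameter values are illustrative:

```python
import math

def gaussian_cdf(x, mu, sigma):
    """P(threshold voltage <= x) under a Gaussian model of one state."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def optimal_vref(lo, hi, candidates):
    """Pick the read reference voltage that minimizes expected raw bit errors
    between two neighboring threshold-voltage states. Each state is modeled
    here as a (mean, sigma) Gaussian pair; this model choice, and the
    candidate list, are assumptions of the sketch."""
    def expected_errors(v):
        misread_lo = 1.0 - gaussian_cdf(v, *lo)  # lo-state cells read as hi
        misread_hi = gaussian_cdf(v, *hi)        # hi-state cells read as lo
        return misread_lo + misread_hi
    return min(candidates, key=expected_errors)
```

The same modeled probabilities also directly give the log-likelihood ratios used by a soft-decision LDPC decoder, which is how a more accurate distribution model can improve ECC coding efficiency.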
Third, we aim to develop techniques for 3D NAND flash chips based on the characterization
and modeling we perform in Section 5. In particular, we expect to develop techniques to mitigate
any potential new reliability issues in 3D NAND, such as three-dimensional program interference.
We also expect to develop techniques to tolerate process variation across layers for 3D NAND chips.
To this end, we expect to answer the following research questions:
• How can we use our proposed models to predict flash parameters such as optimal read reference
voltage, raw bit error rate, remaining flash lifetime, and optimal ECC parameters? How does
the accuracy of the model affect these estimations?
• How often and at what granularity shall we predict the optimal read reference voltage to minimize
flash read error rate and read latency with low overhead?
• How can we efficiently adapt the amount of ECC protection to the predicted raw bit error rate?
How can we provide appealing flash reliability or lifetime benefits using such techniques?
• How can we adjust flash management policies to adapt to the expected remaining flash lifetime,
instead of P/E cycle counts? How often and for which flash blocks shall we estimate the remaining
flash lifetime to maximize flash lifetime and minimize performance overhead?
• How does the accuracy of log-likelihood ratio estimation affect error correcting capability?
• How can we mitigate three-dimensional program interference in 3D NAND?
7. Proposed Work 3: Characterization and Utilization of NAND Flash Memory Self-Healing Effect
Motivation: The self-healing effect is a phenomenon in which NAND flash memory cells gradually
recover a fraction of their wear over time [37, 56]; this recovery can be accelerated by high
temperature. As we discussed in Section 3.6, no prior work verifies its model on modern NAND flash
chips or demonstrates a successful self-healing operation in real flash chips. In this work, we
strive to characterize and understand the self-healing effect and design techniques that utilize
this effect to improve flash reliability. Our approach has two steps.
First, we expect to comprehensively characterize different aspects of the self-healing effect using
real NAND flash chips. We expect to investigate the effectiveness of the self-healing effect (i.e.,
whether it can be used to improve flash reliability) by comparing raw bit error rates before and
after heat-accelerated self-healing at different P/E cycle counts. We expect to study whether the
self-healing effect persists after P/E cycling (i.e., whether it can be used to improve the overall
lifetime of the flash memory) by comparing total P/E cycle endurance with and without self-healing.
We also expect to study whether the self-healing effect is repeatable (i.e., whether it can be
applied multiple times to further improve flash lifetime) by comparing the endurance improvement
of the first self-healing operation, the second self-healing operation, and so on. Dwell time, the
duration between two consecutive P/E cycles during which a flash memory cell can recover, directly
affects the effectiveness of the self-healing operation. Similar to retention loss, the recovery
that occurs during the dwell time can be accelerated by high temperature according to Arrhenius
Law [6], allowing faster recovery of flash memory cells. We expect to study the relation between
dwell time and the effectiveness of the self-healing operation.
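As a minimal sketch of the Arrhenius Law calculation, the code below computes the temperature acceleration factor between a baking temperature and a baseline (room) temperature; the activation energy is an assumed, device-dependent parameter, so the resulting numbers are illustrative only.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def acceleration_factor(temp_accel_c, temp_base_c, ea_ev=1.1):
    """Arrhenius acceleration factor between an elevated (baking)
    temperature and a baseline temperature, both in Celsius.
    ea_ev is an assumed activation energy in eV (device dependent)."""
    t_accel = temp_accel_c + 273.15   # convert to Kelvin
    t_base = temp_base_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_base - 1.0 / t_accel))

# Effective dwell time: one hour of baking at 85C "counts as" this many
# hours of dwell at 25C room temperature (under the assumed ea_ev).
equivalent_hours = acceleration_factor(85.0, 25.0)
```

Under such a model, a short high-temperature bake can stand in for a much longer room-temperature dwell, which is what makes controlled heat-accelerated self-healing experiments feasible.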
Second, we expect to design techniques to utilize the self-healing effect to improve flash reliability
and lifetime. We expect to investigate the feasibility of heating NAND flash memory at different
granularities using an internal or external heat source. We expect to design mechanisms to trigger
self-healing operations that maximize flash lifetime and minimize performance overhead. As a heat-
accelerated self-healing operation also accelerates retention loss, the data currently stored in
the flash memory can be damaged. We expect to design mechanisms that avoid data corruption due to
retention loss by moving the data, while minimizing the performance overhead. We expect to predict
workload idle periods and the effectiveness of each self-healing operation, so that we can schedule
self-healing when it is most effective and interferes least with host requests. We expect to
evaluate the flash lifetime improvement and the performance penalty of our proposed techniques
based on our characterization results.
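The triggering policy could be sketched as follows; this is a simplified, assumed model (the block metadata layout, timing parameters, and the greedy most-worn-first order are all illustrative), not a finalized mechanism.

```python
def schedule_self_healing(blocks, idle_budget_s, heal_time_s, wear_threshold):
    """Select blocks to heal during a predicted idle window: heal the
    most-worn blocks first, as long as the remaining idle time still
    covers a full heal operation. Each block is a dict with an "id"
    and a "pe_cycles" count; valid data is assumed to have been
    relocated beforehand, since heating accelerates retention loss."""
    candidates = sorted(
        (b for b in blocks if b["pe_cycles"] >= wear_threshold),
        key=lambda b: b["pe_cycles"], reverse=True)
    scheduled = []
    for block in candidates:
        if idle_budget_s < heal_time_s:
            break                      # not enough idle time left
        scheduled.append(block["id"])
        idle_budget_s -= heal_time_s
    return scheduled
```

A real mechanism would also weigh the predicted endurance gain per block (from the characterization in the first step) against the relocation cost, rather than using wear alone.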
To this end, we expect to answer the following research questions:
• Can heat-accelerated self-healing operation effectively reduce raw bit error rate in real flash
chips?
• How well does the benefit of a self-healing operation persist over P/E cycles? By how much can
a self-healing operation improve flash lifetime when performed at different P/E cycle counts?
• Can we repeat the self-healing operation to further extend flash lifetime?
• How does the self-healing effect correlate with dwell time? How do we design experiments to
characterize this effect?
• How can we utilize the self-healing effect to improve flash reliability and lifetime? How can we
design online/offline mechanisms to trigger the self-healing effect?
8. Timeline
Depending on the success of the different ideas presented in this proposal and the availability
of time, we will aim to explore as many ideas as possible. My goal is to graduate in the Summer of
2017. Table 1 lists my tentative timeline for pursuing the ideas proposed in this document. Note
that the success of some of our ideas depends heavily on the data from our experimental results.
Duration Description
Apr-Jun 2016 Work on model-driven flash management policies (Potential milestone: submission to JSAC).
Jul-Sep 2016 Work on characterization and modeling of 3D NAND flash memory errors (Potential milestone: submission to SIGMETRICS).
Oct 2016-Mar 2017 Work on characterization and utilization of the self-healing effect (Potential milestone: submission to MICRO).
Apr-Jul 2017 Defend and submit thesis.
Table 1. Timeline for this proposal.
9. Conclusion
In this proposal, our goal is to improve NAND flash memory reliability with a multitude of
low-cost architectural techniques. To this end, we first describe a mechanism that we have already
worked on: WARM, a technique that manages flash retention differently for write-hot data and
write-cold data, and improves flash lifetime at low cost and low performance overhead. For our
future work, we propose to explore three new directions. The first direction proposes to develop an
online technique to characterize and model flash errors. The second direction proposes to develop
flash management policies that improve flash lifetime by exploiting our online model. The third
direction proposes to understand and develop new techniques that utilize the flash self-healing effect.
We hope that this research will demonstrate that NAND flash memory reliability can be improved
at low cost and with low performance overhead by deploying various architectural techniques that
are aware of higher-level application behavior and underlying flash device characteristics.
References
[1] NVM Express 1.2a, 2015. http://www.nvmexpress.org/wp-content/uploads/NVM-Express-1_2a.pdf.
[2] R. C. Bose and D. K. Ray-Chaudhuri. On A Class of Error Correcting Binary Group Codes. Information and Control, 1960.
[3] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai. Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis. In DATE, 2012.
[4] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai. Threshold Voltage Distribution in NAND Flash Memory:Characterization, Analysis, and Modeling. In DATE, 2013.
[5] Y. Cai, Y. Luo, S. Ghose, and O. Mutlu. Read Disturb Errors in MLC NAND Flash Memory: Characterization, Mitigation, and Recovery. In DSN, 2015.
[6] Y. Cai, Y. Luo, E. F. Haratsch, K. Mai, and O. Mutlu. Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery. In HPCA, 2015.
[7] Y. Cai, O. Mutlu, E. F. Haratsch, and K. Mai. Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation. In ICCD, 2013.
[8] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, A. Cristal, O. Unsal, and K. Mai. Flash Correct and Refresh: Retention Aware Management for Increased Lifetime. In ICCD, 2012.
[9] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, A. Cristal, O. Unsal, and K. Mai. Error Analysis and Retention-Aware Error Management for NAND Flash Memory. Intel Technology Journal (ITJ), 2013.
[10] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, O. Unsal, A. Cristal, and K. Mai. Neighbor Cell Assisted Error Correction in MLC NAND Flash Memories. In SIGMETRICS, 2014.
[11] Y.-M. Chang, Y.-H. Chang, J.-J. Chen, T.-W. Kuo, H.-P. Li, and H.-T. Lue. On Trading Wear-Leveling With Heal-Leveling. In DAC, 2014.
[12] F. Chen, T. Luo, and X. Zhang. CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of Flash Memory based Solid State Drives. In FAST, 2011.
[13] J. Cooke. The Inconvenient Truths of NAND Flash Memory. Flash Memory Summit, 2007.
[14] N. Dayan, P. Bonnet, and S. Idreos. GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices. In SIGMOD, 2016.
[15] G. Dong, N. Xie, and T. Zhang. Enabling NAND Flash Memory Use Soft-Decision Error Correction Codes at Minimal Read Latency Overhead. IEEE Trans. on Circuits and Systems, 2013.
[16] B. Eitan. Non-Volatile Semiconductor Memory Cell Utilizing Asymmetrical Charge Trapping, 1998.US Patent 5,768,192.
[17] T. Frankie, G. Hughes, and K. Kreutz-Delgado. SSD TRIM Commands Considerably Improve Overprovisioning. Flash Memory Summit, 2011.
[18] R. G. Gallager. Low-Density Parity-Check Codes. Information Theory, IRE Transactions on, 1962.
[19] A. Gupta, Y. Kim, and B. Urgaonkar. DFTL: A Flash Translation Layer Employing Demand-based Selective Caching of Page-level Address Mappings. In ASPLOS, 2009.
[20] J.-U. Kang, J. Hyun, H. Maeng, and S. Cho. The Multi-Streamed Solid-State Drive. In HotStorage, 2014.
[21] S. Lee, T. Kim, K. Kim, and J. Kim. Lifetime Management of Flash-Based SSDs Using Recovery-Aware Dynamic Throttling. In FAST, 2012.
[22] J. Li, K. Zhao, J. Ma, and T. Zhang. Realizing Unequal Error Correction for NAND Flash Memory at Minimal Read Latency Overhead. Circuits and Systems II: Express Briefs, IEEE Transactions on, 2014.
[23] J. Li, K. Zhao, X. Zhang, J. Ma, M. Zhao, and T. Zhang. How Much Can Data Compressibility Help to Improve NAND Flash Memory Lifetime? In FAST, 2015.
[24] W. Li, G. Jean-Baptise, J. Riveros, G. Narasimhan, and M. Zhao. CacheDedup: In-line Deduplication for Flash Caching. In FAST, 2016.
[25] C.-Y. Liu, Y.-M. Chang, and Y.-H. Chang. Read Leveling for Flash Storage Systems. In SYSTOR, 2015.
[26] Y. Lu, J. Shu, and W. Zheng. Extending the Lifetime of Flash-Based Storage Through Reducing Write Amplification from File Systems. In FAST, 2013.
[27] Y. Luo, Y. Cai, S. Ghose, J. Choi, and O. Mutlu. WARM: Improving NAND Flash Memory Lifetime with Write-Hotness Aware Retention Management. In MSST, 2015.
[28] Y. Luo, S. Ghose, T. Li, S. Govindan, B. Sharma, B. Kelly, A. Boroumand, and O. Mutlu. CREAM (Capacity- and Reliability-Adaptive Memory): Enabling the Use of ECC DRAM to Increase Memory Capacity. Under submission to MICRO, 2016.
[29] Y. Luo, S. Govindan, B. Sharma, M. Santaniello, J. Meza, A. Kansal, J. Liu, B. Khessib, K. Vaid, and O. Mutlu. Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory. In DSN, 2014.
[30] D. Ma, J. Feng, and G. Li. LazyFTL: A Page-Level Flash Translation Layer Optimized for NAND Flash Memory. In SIGMOD, 2011.
[31] D. Ma, J. Feng, and G. Li. A Survey of Address Translation Technologies for Flash Memories. CSUR, 2014.
[32] D. J. MacKay and R. M. Neal. Near Shannon Limit Performance of Low Density Parity Check Codes. Electronics Letters, 1996.
[33] Serial ATA International Organization. Serial ATA Revision 3.0, 2009.
[34] J. Meza, Q. Wu, S. Kumar, and O. Mutlu. A Large-Scale Study of Flash Memory Failures in The Field.In SIGMETRICS, 2015.
[35] N. Mielke, T. Marquart, N. Wu, J. Kessenich, H. Belgal, E. Schares, and F. Triverdi. Bit Error Rate in NAND Flash Memories. In IRPS, 2008.
[36] V. Mohan. Modeling The Physical Characteristics of NAND Flash Memory. PhD thesis, University of Virginia, 2010.
[37] V. Mohan, T. Siddiqua, S. Gurumurthi, and M. R. Stan. How I Learned to Stop Worrying and Love Flash Endurance. In HotStorage, 2010.
[38] R. Motwani. Estimation of Flash Memory Level Distributions Using Interpolation Techniques for Optimizing the Read Reference. In GLOBECOM, 2015.
[39] R. Motwani and C. Ong. Design of LDPC Coding Schemes for Exploitation of Bit Error Rate Diversity Across Dies in NAND Flash Memory. In ICNC, 2013.
[40] R. Motwani and C. Ong. Soft Decision Decoding of RAID Stripe for Higher Endurance of Flash Memory Based Solid State Drives. In ICNC, 2015.
[41] I. Narayanan, D. Wang, M. Jeon, B. Sharma, L. Caulfield, A. Sivasubramaniam, B. Cutler, J. Liu, B. Khessib, and K. Vaid. SSD Failures in Datacenters: What, When and Why? In SIGMETRICS, 2016.
[42] Y. Pan, G. Dong, and T. Zhang. Exploiting Memory Device Wear-Out Dynamics to Improve NAND Flash Memory System Performance. In FAST, 2011.
[43] N. Papandreou, T. Parnell, H. Pozidis, T. Mittelholzer, E. Eleftheriou, C. Camp, T. Griffin, G. Tressler, and A. Walls. Using Adaptive Read Voltage Thresholds to Enhance The Reliability of MLC NAND Flash Memory Systems. In GLSVLSI, 2014.
[44] N. Papandreou, T. Parnell, H. Pozidis, T. Mittelholzer, E. Eleftheriou, C. Camp, T. Griffin, G. Tressler, and A. Walls. Enhancing the Reliability of MLC NAND Flash Memory Systems by Read Channel Optimization. TODAES, 2015.
[45] D. Park, B. Debnath, and D. Du. CFTL: A Convertible Flash Translation Layer Adaptive to Data Access Patterns. In SIGMETRICS, 2010.
[46] J. Park, J. Jeong, S. Lee, Y. Song, and J. Kim. Improving Performance and Lifetime of NAND Storage Systems Using Relaxed Program Sequence. In DAC, 2016.
[47] K.-T. Park, M. Kang, D. Kim, S.-W. Hwang, B. Y. Choi, Y.-T. Lee, C. Kim, and K. Kim. A Zeroing Cell-To-Cell Interference Page Architecture With Temporary LSB Storing and Parallel MSB Program Scheme for MLC NAND Flash Memories. JSSC, 2008.
[48] T. Parnell, N. Papandreou, T. Mittelholzer, and H. Pozidis. Modelling of the Threshold Voltage Distributions of Sub-20nm NAND Flash Memory. In GLOBECOM, 2014.
[49] A. Prodromakis, S. Korkotsides, and T. Antonakopoulos. MLC NAND Flash Memory: Aging Effect and Chip/Channel Emulation. Microprocessors and Microsystems, 2015.
[50] B. Schroeder, R. Lagisetty, and A. Merchant. Flash Reliability in Production: The Expected and The Unexpected. In FAST, 2016.
[51] K.-D. Suh, B.-H. Suh, Y.-H. Lim, J.-K. Kim, Y.-J. Choi, Y.-N. Koh, S.-S. Lee, S.-C. Suk-Chon, B.-S. Choi, J.-S. Yum, et al. A 3.3 V 32 Mb NAND Flash Memory With Incremental Step Pulse Programming Scheme. Solid-State Circuits, IEEE Journal of, 1995.
[52] H. Tabrizi, B. Peleato, R. Agarwal, and J. Ferreira. Improving NAND Flash Read Performance Through Learning. In ICC, 2015.
[53] Seagate Technology. Serial Attached SCSI (SAS) Interface Manual, 2009. http://www.seagate.com/staticfiles/support/disc/manuals/Interface%20manuals/100293071c.pdf.
[54] J. Wang, K. Vakilinia, T.-Y. Chen, T. Courtade, G. Dong, T. Zhang, H. Shankar, and R. Wesel. Enhanced Precision Through Multiple Reads for LDPC Decoding in Flash Memories. Selected Areas in Communications, IEEE Journal on, 2014.
[55] G. Wu and X. He. Delta-FTL: Improving SSD Lifetime via Exploiting Content Locality. In EuroSys,2012.
[56] Q. Wu, G. Dong, and T. Zhang. Exploiting Heat-Accelerated Flash Memory Wear-Out Recovery to Enable Self-Healing SSDs. In HotStorage, 2011.