A Semi-Preemptive Garbage Collector for Solid State Drives
Junghee Lee, Youngjae Kim, Galen M. Shipman, Sarp Oral, Feiyi Wang, and Jongman Kim
Presented by Junghee Lee
2
High Performance Storage Systems
• Server-centric services
– File, web & media servers, transaction processing servers
• Enterprise-scale storage systems
– Information technology focusing on storage, protection, and retrieval of data in large-scale environments
[Figure: Google's massive server farms as an example of high performance storage systems; the storage unit is the hard disk drive]
3
Spider: A Large-scale Storage System
• Jaguar
– Peta-scale computing machine
– 25,000 nodes with 250,000 cores and over 300 TB of memory
• Spider storage system
– The largest center-wide Lustre-based file system
– Over 10.7 PB of RAID 6 formatted capacity (13,400 x 1 TB HDDs)
– 192 Lustre I/O servers with over 3 TB of memory
4
Emergence of NAND Flash based SSD
• NAND Flash vs. Hard Disk Drives
– Pros:
• Semiconductor technology, no mechanical parts
• Lower access latencies: μs for SSDs vs. ms for HDDs
• Lower power consumption
• Higher robustness to vibration and temperature
– Cons:
• Limited lifetime: 10K - 1M erases per block
• High cost: about 8X more expensive than current hard disks
• Performance variability
5
Outline
• Introduction
• Background and Motivation
– NAND Flash and SSD
– Garbage Collection
– Pathological Behavior of SSDs
• Semi-Preemptive Garbage Collection
• Evaluation
• Conclusion
6
NAND Flash based SSD
[Figure: the write path from application to flash]
• Application: a process issues fwrite(file, data)
• OS: the file system (FAT, Ext2, NTFS, ...) and block device driver issue block write(LBA, size) over a block interface (SATA, SCSI, etc.)
• Device: inside the SSD, the CPU (running the FTL) and memory translate this into page write(bank, block, page) to the flash packages
7
NAND Flash OrganizationNAND Flash Organization
Package
Plane 0
Block 0
Page 0
Page 63
Block 2047
Page 0
Page 63
…
…
…
Die 0
Pla
ne
0
Pla
ne
1
Pla
ne
2
Pla
ne
3Register Read
0.025 ms
Write
0.200 ms
Erased
Erase
1.500 ms
Die 1P
lan
e 0
Pla
ne
1
Pla
ne
2
Pla
ne
3
8
Out-Of-Place Write
[Figure: logical-to-physical address mapping table and physical blocks, with pages marked V (valid), I (invalid), or E (erased)]
• Example: LPN0 - LPN3 map to PPN1, PPN4, PPN2, PPN5
• A write to LPN2 does not overwrite PPN2 in place; instead the FTL:
– invalidates PPN2
– writes the data to an erased page, PPN3
– updates the mapping table so LPN2 points to PPN3 (sketched in code below)
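To make the mechanism concrete, below is a minimal sketch of a page-level FTL performing an out-of-place write. All names (PageState, FTL, free) are illustrative, not taken from the talk or any real firmware.

```python
from enum import Enum

class PageState(Enum):
    ERASED = "E"   # free: the only state a flash page can be written in
    VALID = "V"    # holds the live copy of some logical page
    INVALID = "I"  # stale copy; reclaimed only when its block is erased

class FTL:
    def __init__(self, num_pages):
        self.l2p = {}                                # LPN -> PPN mapping table
        self.state = [PageState.ERASED] * num_pages  # per-physical-page state
        self.free = list(range(num_pages))           # erased pages, in order

    def write(self, lpn, data):
        ppn = self.free.pop(0)                   # never overwrite in place:
        old = self.l2p.get(lpn)                  # pick a fresh erased page
        if old is not None:
            self.state[old] = PageState.INVALID  # invalidate the stale copy
        self.state[ppn] = PageState.VALID        # (actual programming elided)
        self.l2p[lpn] = ppn                      # update the mapping table

ftl = FTL(num_pages=8)
ftl.write(2, b"v1")   # LPN2 -> PPN0
ftl.write(2, b"v2")   # LPN2 -> PPN1; PPN0 becomes invalid garbage
```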
9
Garbage Collection
[Figure: GC over physical blocks whose pages are marked V (valid), I (invalid), or E (erased)]
• Select a victim block (here a block whose valid pages are P1 and P3, the rest invalid)
• Move the valid pages to erased pages elsewhere (P1 and P3 relocated to P6 and P7)
• Erase the victim block (P0 - P3 become erased)
• Cost: 2 reads + 2 writes + 1 erase = 2*0.025 + 2*0.200 + 1.5 = 1.950 ms!
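As a rough sketch of where that 1.950 ms comes from, assuming a greedy fewest-valid-pages victim policy (the slide does not name one):

```python
T_READ, T_WRITE, T_ERASE = 0.025, 0.200, 1.500  # ms, from the organization slide

def gc_pass_cost_ms(valid_pages):
    """Each valid page costs one read + one write; the block costs one erase."""
    return valid_pages * (T_READ + T_WRITE) + T_ERASE

def collect(blocks):
    # Greedy assumption: the block with the fewest valid pages minimizes copying.
    victim = min(blocks, key=lambda b: b["valid"])
    cost = gc_pass_cost_ms(victim["valid"])
    victim["valid"] = 0  # valid pages relocated, block erased
    return victim["id"], cost

blocks = [{"id": 0, "valid": 2}, {"id": 1, "valid": 4}]
print(collect(blocks))  # (0, 1.95): 2*(0.025 + 0.200) + 1.5 ms
```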
10
Pathological Behavior of SSDs
• Does GC have an impact on the foreground operations?
– If so, we can observe a sudden bandwidth drop
– More drop with more write requests
– More drop with more bursty workloads
• Experimental setup
– SSD devices
• Intel (SLC) 64GB SSD
• SuperTalent (MLC) 120GB SSD
– I/O generator
• Used the libaio asynchronous I/O library for block-level testing
11
Bandwidth Drop for Write-Dominant Workloads
• Experiments
– Measured bandwidth for 1 MB requests while varying the read-write ratio
[Figure: bandwidth over time, Intel SLC SSD and SuperTalent MLC SSD]
Performance variability increases as we increase the write percentage of workloads.
12
Performance Variability for Bursty Workloads
• Experiments
– Measured SSD write bandwidth at queue depths (qd) of 8 and 64
– Normalized I/O bandwidth with a Z distribution
[Figure: normalized bandwidth distributions, Intel SLC SSD and SuperTalent MLC SSD]
Performance variability increases as we increase the arrival rate of requests (bursty workloads).
13
Lessons Learned
• From the empirical study, we learned:
– Performance variability increases as the percentage of writes in workloads increases.
– Performance variability increases with the arrival rate of write requests.
• This is because:
– Any incoming request during GC must wait until the on-going GC ends.
– GC is not preemptive (a worked number follows below).
14
Outline
• Introduction
• Background and Motivation
• Semi-Preemptive Garbage Collection
– Semi-Preemption
– Further Optimization
– Level of Allowed Preemption
• Evaluation
• Conclusion
15
Technique #1: Semi-Preemption
[Figure: GC timeline Rx Wx Ry Wy E with an incoming request Wz. Legend: Rx = read page x, Wx = write page x, E = erase a block; each operation includes a data transfer and a metadata update phase]
• Non-preemptive GC: Wz must wait until the entire Rx Wx Ry Wy E sequence completes
• Semi-preemptive GC: Wz is serviced at the next preemption point between GC page operations (see the sketch below)
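Below is a minimal sketch of the semi-preemption loop. The hooks (io_queue, service, read_page, write_page, erase_block) are hypothetical stand-ins for device firmware, not the paper's code:

```python
from collections import deque

def semi_preemptive_gc(victim_pages, io_queue, service,
                       read_page, write_page, erase_block):
    """Copy valid pages one at a time, draining foreground I/O in between."""
    for page in victim_pages:
        while io_queue:                  # preemption point between page copies
            service(io_queue.popleft())  # foreground request jumps ahead of GC
        write_page(read_page(page))      # each single flash op stays atomic
    while io_queue:                      # last preemption point before the
        service(io_queue.popleft())      # long, non-preemptible erase
    erase_block()

# Usage sketch: two valid pages to move, one request arriving mid-GC.
q = deque(["Wz"])
semi_preemptive_gc(["P1", "P3"], q,
                   service=lambda r: print("serviced", r),
                   read_page=lambda p: p, write_page=lambda d: None,
                   erase_block=lambda: print("erased victim"))
```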
16
Technique #2: Merge
[Figure: GC timeline Rx Wx Ry Wy E with an incoming request Ry; legend as above]
• When an incoming request targets the same page the GC is about to operate on (here the read Ry), it is merged with the GC's own operation rather than serviced separately (see the sketch below)
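Continuing with the same hypothetical hooks, a merge check piggybacks an incoming read of the page the GC is already reading, so one flash read serves both; an incoming write to that page could analogously replace the GC's stale copy:

```python
def copy_page_with_merge(page, io_queue, read_page, write_page):
    data = read_page(page)                    # GC's own read of the valid page
    for req in list(io_queue):
        if req["kind"] == "read" and req["lpn"] == page["lpn"]:
            io_queue.remove(req)
            req["done"](data)                 # merged: no separate flash read
    write_page(data)                          # GC relocation proceeds as usual

# Usage sketch with one pending read of the same logical page.
q = [{"kind": "read", "lpn": 7, "done": lambda d: print("Ry served:", d)}]
copy_page_with_merge({"lpn": 7, "data": "old"}, q,
                     read_page=lambda p: p["data"], write_page=lambda d: None)
```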
17
Technique #3: Pipeline
[Figure: GC timeline Rx Wx Ry Wy E with an incoming request Rz; legend as above]
• An incoming request on a different page (here the read Rz) is pipelined with the on-going GC operation, so the two overlap instead of queuing back to back (a timing illustration follows)
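A rough timing illustration using the slide's latencies; the idealized full overlap below is an assumption, since the real gain depends on how data transfers and flash-array operations interleave:

```python
T_READ, T_WRITE = 0.025, 0.200   # ms, from the NAND organization slide

# Serial: the incoming read Rz queues behind the GC's on-going write Wx.
serial = T_WRITE + T_READ        # Rz completes after ~0.225 ms

# Pipelined: Rz is issued while Wx's program operation occupies the array,
# so the shorter operation hides under the longer one (idealized overlap).
pipelined = max(T_WRITE, T_READ) # Rz completes after 0.200 ms

print(round(serial, 3), pipelined)  # 0.225 0.2
```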
18
Level of Allowed Preemption
• Drawback of PGC: the completion time of GC is delayed → may incur a lack of free blocks → sometimes preemption must be prohibited
• States of PGC (O = active/allowed, X = inactive/disallowed; see the sketch below):

                     State 0   State 1   State 2   State 3
Garbage collection      X         O         O         O
Read requests           O         O         O         X
Write requests          O         O         X         X
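A sketch of how these states might be driven by the free-block count; the thresholds and the function itself are illustrative assumptions, since the slide only gives the state table:

```python
def pgc_state(gc_active, free_blocks, soft=10, hard=5):
    """Map GC status and free-block pressure to the states above."""
    if not gc_active:
        return 0        # State 0: no GC in progress
    if free_blocks > soft:
        return 1        # State 1: reads and writes may preempt GC
    if free_blocks > hard:
        return 2        # State 2: only reads preempt; writes would consume
                        # the very free pages GC is trying to reclaim
    return 3            # State 3: non-preemptive; GC must finish first
```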
19
Outline
• Introduction
• Background and Motivation
• Semi-Preemptive Garbage Collection
• Evaluation
– Setup
– Synthetic Workloads
– Realistic Workloads
• Conclusion
20
Setup
• Simulator
– MSR's SSD simulator based on DiskSim
• Workloads
– Synthetic workloads: used the synthetic workload generator in DiskSim
– Realistic workloads:

Workload    Avg. request size (KB)   Read ratio (%)   Arrival rate (IOPS)
Financial          7.09                  18.92               47.19
Cello              7.06                  19.63               74.24
TPC-H             31.62                  91.80              172.73
OpenMail           9.49                  63.30              846.62

Financial and Cello are write dominant; TPC-H and OpenMail are read dominant.
21
Performance Improvements for Synthetic Workloads
• Varied four parameters: request size, inter-arrival time, sequentiality, and read/write ratio
• Varied one at a time while fixing the others
[Figure: average response time (ms) and its standard deviation for NPGC and PGC across request sizes of 8, 16, 32, and 64 KB]
22
Performance Improvement for Synthetic Workloads (cont'd)
[Figures: the same comparison across inter-arrival times (10, 5, 3, 1 ms; burstier to the right), probability of sequential access (0.8 down to 0.2; random-dominant to the right), and probability of read access (0.8 down to 0.2; write-dominant to the right)]
23
Performance Improvement for Realistic Workloads
• Average response time
[Figure: normalized average response time for Financial, Cello, TPC-H, and OpenMail]
Improvement of average response time by 6.5% and 66.6% for Financial and Cello, respectively.
• Variance of response times
[Figure: normalized standard deviation of response time for Financial, Cello, TPC-H, and OpenMail]
Improvement of variance of response time by 49.8% and 83.3% for Financial and Cello, respectively.
24
Conclusions
• Solid state drives
– Fast access speed
– Performance variation due to garbage collection
• Semi-preemptive garbage collection
– Services incoming requests during GC
– Reduces average response time and performance variation by up to 66.6% and 83.3%, respectively
25
Questions?
Contact info:
Junghee Lee, [email protected]
Electrical and Computer Engineering, Georgia Institute of Technology
26
Thank you!