A Semi-Preemptive Garbage Collector for Solid State Drives
Junghee Lee, Youngjae Kim, Galen M. Shipman, Sarp Oral, Feiyi Wang, and Jongman Kim
Presented by Junghee Lee
2
High Performance Storage Systems
• Server-centric services
– File, web & media servers, transaction processing servers
• Enterprise-scale storage systems
– Information technology focusing on storage, protection, and retrieval of data in large-scale environments
[Figure: Google's massive server farms as an example of high performance storage systems; the storage unit is the hard disk drive]
3
Spider: A Large-scale Storage System
• Jaguar
– Peta-scale computing machine
– 25,000 nodes with 250,000 cores and over 300 TB of memory
• Spider storage system
– The largest center-wide Lustre-based file system
– Over 10.7 PB of RAID 6 formatted capacity (13,400 x 1 TB HDDs)
– 192 Lustre I/O servers with over 3 TB of memory
4
Emergence of NAND Flash based SSD
• NAND Flash vs. Hard Disk Drives
– Pros:
• Semiconductor technology, no mechanical parts
• Lower access latencies: μs for SSDs vs. ms for HDDs
• Lower power consumption
• Higher robustness to vibration and temperature
– Cons:
• Limited lifetime: 10K - 1M erases per block
• High cost: about 8X more expensive than current hard disks
• Performance variability
5
Outline
• Introduction
• Background and Motivation
– NAND Flash and SSD
– Garbage Collection
– Pathological Behavior of SSDs
• Semi-Preemptive Garbage Collection
• Evaluation
• Conclusion
6
NAND Flash based SSD
[Figure: the write path from application to flash]
• Application: a process issues fwrite(file, data)
• OS: the file system (FAT, Ext2, NTFS, ...) and block device driver issue block write(LBA, size) over a block interface (SATA, SCSI, etc.)
• Device: inside the SSD, the CPU (running the FTL) and memory translate this into page write(bank, block, page) to the flash packages
7
NAND Flash OrganizationNAND Flash Organization
Package
Plane 0
Block 0
Page 0
Page 63
Block 2047
Page 0
Page 63
…
…
…
Die 0
Pla
ne
0
Pla
ne
1
Pla
ne
2
Pla
ne
3Register Read
0.025 ms
Write
0.200 ms
Erased
Erase
1.500 ms
Die 1P
lan
e 0
Pla
ne
1
Pla
ne
2
Pla
ne
3
8
Out-Of-Place Write
[Figure: logical-to-physical address mapping table and physical blocks, with pages marked V (valid), I (invalid), or E (erased)]
• Example: LPN0 - LPN3 map to PPN1, PPN4, PPN2, PPN5
• A write to LPN2 does not overwrite PPN2 in place; instead the FTL:
– invalidates PPN2
– writes the data to an erased page, PPN3
– updates the mapping table so LPN2 points to PPN3 (sketched in code below)
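To make the mechanism concrete, below is a minimal sketch of a page-level FTL performing an out-of-place write. All names (PageState, FTL, free) are illustrative, not taken from the talk or any real firmware.

```python
from enum import Enum

class PageState(Enum):
    ERASED = "E"   # free: the only state a flash page can be written in
    VALID = "V"    # holds the live copy of some logical page
    INVALID = "I"  # stale copy; reclaimed only when its block is erased

class FTL:
    def __init__(self, num_pages):
        self.l2p = {}                                # LPN -> PPN mapping table
        self.state = [PageState.ERASED] * num_pages  # per-physical-page state
        self.free = list(range(num_pages))           # erased pages, in order

    def write(self, lpn, data):
        ppn = self.free.pop(0)                   # never overwrite in place:
        old = self.l2p.get(lpn)                  # pick a fresh erased page
        if old is not None:
            self.state[old] = PageState.INVALID  # invalidate the stale copy
        self.state[ppn] = PageState.VALID        # (actual programming elided)
        self.l2p[lpn] = ppn                      # update the mapping table

ftl = FTL(num_pages=8)
ftl.write(2, b"v1")   # LPN2 -> PPN0
ftl.write(2, b"v2")   # LPN2 -> PPN1; PPN0 becomes invalid garbage
```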
9
Garbage Collection
[Figure: GC over physical blocks whose pages are marked V (valid), I (invalid), or E (erased)]
• Select a victim block (here a block whose valid pages are P1 and P3, the rest invalid)
• Move the valid pages to erased pages elsewhere (P1 and P3 relocated to P6 and P7)
• Erase the victim block (P0 - P3 become erased)
• Cost: 2 reads + 2 writes + 1 erase = 2*0.025 + 2*0.200 + 1.5 = 1.950 ms!
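As a rough sketch of where that 1.950 ms comes from, assuming a greedy fewest-valid-pages victim policy (the slide does not name one):

```python
T_READ, T_WRITE, T_ERASE = 0.025, 0.200, 1.500  # ms, from the organization slide

def gc_pass_cost_ms(valid_pages):
    """Each valid page costs one read + one write; the block costs one erase."""
    return valid_pages * (T_READ + T_WRITE) + T_ERASE

def collect(blocks):
    # Greedy assumption: the block with the fewest valid pages minimizes copying.
    victim = min(blocks, key=lambda b: b["valid"])
    cost = gc_pass_cost_ms(victim["valid"])
    victim["valid"] = 0  # valid pages relocated, block erased
    return victim["id"], cost

blocks = [{"id": 0, "valid": 2}, {"id": 1, "valid": 4}]
print(collect(blocks))  # (0, 1.95): 2*(0.025 + 0.200) + 1.5 ms
```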
10
Pathological Behavior of SSDs
• Does GC have an impact on the foreground operations?
– If so, we can observe a sudden bandwidth drop
– More drop with more write requests
– More drop with more bursty workloads
• Experimental setup
– SSD devices
• Intel (SLC) 64GB SSD
• SuperTalent (MLC) 120GB SSD
– I/O generator
• Used the libaio asynchronous I/O library for block-level testing
11
Bandwidth Drop for Write-Dominant Workloads
• Experiments
– Measured bandwidth for 1 MB requests while varying the read-write ratio
[Figure: bandwidth over time, Intel SLC SSD and SuperTalent MLC SSD]
Performance variability increases as we increase the write percentage of workloads.
12
Performance Variability for Bursty Workloads
• Experiments
– Measured SSD write bandwidth at queue depths (qd) of 8 and 64
– Normalized I/O bandwidth with a Z distribution
[Figure: normalized bandwidth distributions, Intel SLC SSD and SuperTalent MLC SSD]
Performance variability increases as we increase the arrival rate of requests (bursty workloads).
13
Lessons Learned
• From the empirical study, we learned:
– Performance variability increases as the percentage of writes in workloads increases.
– Performance variability increases with the arrival rate of write requests.
• This is because:
– Any incoming request during GC must wait until the on-going GC ends.
– GC is not preemptive (a worked number follows below).
14
Outline
• Introduction
• Background and Motivation
• Semi-Preemptive Garbage Collection
– Semi-Preemption
– Further Optimization
– Level of Allowed Preemption
• Evaluation
• Conclusion
15
Technique #1: Semi-Preemption
[Figure: GC timeline Rx Wx Ry Wy E with an incoming request Wz. Legend: Rx = read page x, Wx = write page x, E = erase a block; each operation includes a data transfer and a metadata update phase]
• Non-preemptive GC: Wz must wait until the entire Rx Wx Ry Wy E sequence completes
• Semi-preemptive GC: Wz is serviced at the next preemption point between GC page operations (see the sketch below)
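Below is a minimal sketch of the semi-preemption loop. The hooks (io_queue, service, read_page, write_page, erase_block) are hypothetical stand-ins for device firmware, not the paper's code:

```python
from collections import deque

def semi_preemptive_gc(victim_pages, io_queue, service,
                       read_page, write_page, erase_block):
    """Copy valid pages one at a time, draining foreground I/O in between."""
    for page in victim_pages:
        while io_queue:                  # preemption point between page copies
            service(io_queue.popleft())  # foreground request jumps ahead of GC
        write_page(read_page(page))      # each single flash op stays atomic
    while io_queue:                      # last preemption point before the
        service(io_queue.popleft())      # long, non-preemptible erase
    erase_block()

# Usage sketch: two valid pages to move, one request arriving mid-GC.
q = deque(["Wz"])
semi_preemptive_gc(["P1", "P3"], q,
                   service=lambda r: print("serviced", r),
                   read_page=lambda p: p, write_page=lambda d: None,
                   erase_block=lambda: print("erased victim"))
```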
16
Technique #2: Merge
[Figure: GC timeline Rx Wx Ry Wy E with an incoming request Ry; legend as above]
• When an incoming request targets the same page the GC is about to operate on (here the read Ry), it is merged with the GC's own operation rather than serviced separately (see the sketch below)
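Continuing with the same hypothetical hooks, a merge check piggybacks an incoming read of the page the GC is already reading, so one flash read serves both; an incoming write to that page could analogously replace the GC's stale copy:

```python
def copy_page_with_merge(page, io_queue, read_page, write_page):
    data = read_page(page)                    # GC's own read of the valid page
    for req in list(io_queue):
        if req["kind"] == "read" and req["lpn"] == page["lpn"]:
            io_queue.remove(req)
            req["done"](data)                 # merged: no separate flash read
    write_page(data)                          # GC relocation proceeds as usual

# Usage sketch with one pending read of the same logical page.
q = [{"kind": "read", "lpn": 7, "done": lambda d: print("Ry served:", d)}]
copy_page_with_merge({"lpn": 7, "data": "old"}, q,
                     read_page=lambda p: p["data"], write_page=lambda d: None)
```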
17
Technique #3: Pipeline
[Figure: GC timeline Rx Wx Ry Wy E with an incoming request Rz; legend as above]
• An incoming request on a different page (here the read Rz) is pipelined with the on-going GC operation, so the two overlap instead of queuing back to back (a timing illustration follows)
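A rough timing illustration using the slide's latencies; the idealized full overlap below is an assumption, since the real gain depends on how data transfers and flash-array operations interleave:

```python
T_READ, T_WRITE = 0.025, 0.200   # ms, from the NAND organization slide

# Serial: the incoming read Rz queues behind the GC's on-going write Wx.
serial = T_WRITE + T_READ        # Rz completes after ~0.225 ms

# Pipelined: Rz is issued while Wx's program operation occupies the array,
# so the shorter operation hides under the longer one (idealized overlap).
pipelined = max(T_WRITE, T_READ) # Rz completes after 0.200 ms

print(round(serial, 3), pipelined)  # 0.225 0.2
```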
18
Level of Allowed Preemption
• Drawback of PGC: the completion time of GC is delayed → may incur a lack of free blocks → sometimes preemption must be prohibited
• States of PGC (O = active/allowed, X = inactive/disallowed; see the sketch below):

                     State 0   State 1   State 2   State 3
Garbage collection      X         O         O         O
Read requests           O         O         O         X
Write requests          O         O         X         X
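A sketch of how these states might be driven by the free-block count; the thresholds and the function itself are illustrative assumptions, since the slide only gives the state table:

```python
def pgc_state(gc_active, free_blocks, soft=10, hard=5):
    """Map GC status and free-block pressure to the states above."""
    if not gc_active:
        return 0        # State 0: no GC in progress
    if free_blocks > soft:
        return 1        # State 1: reads and writes may preempt GC
    if free_blocks > hard:
        return 2        # State 2: only reads preempt; writes would consume
                        # the very free pages GC is trying to reclaim
    return 3            # State 3: non-preemptive; GC must finish first
```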
19
Outline
• Introduction
• Background and Motivation
• Semi-Preemptive Garbage Collection
• Evaluation
– Setup
– Synthetic Workloads
– Realistic Workloads
• Conclusion
20
Setup
• Simulator
– MSR's SSD simulator based on DiskSim
• Workloads
– Synthetic workloads: used the synthetic workload generator in DiskSim
– Realistic workloads:

Workload    Avg. request size (KB)   Read ratio (%)   Arrival rate (IOPS)
Financial          7.09                  18.92               47.19
Cello              7.06                  19.63               74.24
TPC-H             31.62                  91.80              172.73
OpenMail           9.49                  63.30              846.62

Financial and Cello are write dominant; TPC-H and OpenMail are read dominant.
21
Performance Improvements for Synthetic Workloads
• Varied four parameters: request size, inter-arrival time, sequentiality, and read/write ratio
• Varied one at a time while fixing the others
[Figure: average response time (ms) and its standard deviation for NPGC and PGC across request sizes of 8, 16, 32, and 64 KB]
22
Performance Improvement for Synthetic Workloads (cont'd)
[Figures: the same comparison across inter-arrival times (10, 5, 3, 1 ms; burstier to the right), probability of sequential access (0.8 down to 0.2; random-dominant to the right), and probability of read access (0.8 down to 0.2; write-dominant to the right)]
23
Performance Improvement for Realistic Workloads
• Average response time
[Figure: normalized average response time for Financial, Cello, TPC-H, and OpenMail]
Improvement of average response time by 6.5% and 66.6% for Financial and Cello, respectively.
• Variance of response times
[Figure: normalized standard deviation of response time for Financial, Cello, TPC-H, and OpenMail]
Improvement of variance of response time by 49.8% and 83.3% for Financial and Cello, respectively.
24
Conclusions
• Solid state drives
– Fast access speed
– Performance variation due to garbage collection
• Semi-preemptive garbage collection
– Services incoming requests during GC
– Reduces average response time and performance variation by up to 66.6% and 83.3%, respectively
25
Questions?
Contact info:
Junghee Lee, [email protected]
Electrical and Computer Engineering, Georgia Institute of Technology
26
Thank you!