Upload
lykhanh
View
431
Download
4
Embed Size (px)
Citation preview
Become a SSD expert in minutes!
Ryan Smith
408-205-8889
2 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
What is a SSD?
• SSD = Solid State Drive • RAM-based introduced in 1970’s • Flash-based version in 1990’s • Today, it typically uses NAND Flash • 2012 is a big year for SSDs • Don’t complicate it.. it’s just a really fast drive!
0
5
10
15
20
25
30
35
2010 2011 2012
Mil
lio
ns
# of SSDs sold
PC Server Storage
Source : Samsung
3 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
Why an SSD?
• Three things that dictate the speed of your PC/Server:
• CPU, DRAM, and HDD
Everything is speeding up.. Except the HDD
Processor:
•Multi-core
•Higher bandwidth
Memory:
•Larger footprint
•Higher bandwidth
Storage:
•Minor throughput improvements
•Currently solved with spindles
Time
Pe
rfo
rma
nce
Closing the gap with Solid State
Storage
4 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
Why an SSD?
• Lower response times (latency) • Higher IOPS and Throughput • Lower Power • No RVI Issues, More reliable
Random Performance (IOPS) Power Consumption (Watt)
Read Write 70:30
Test Environment : Intel SR2600UR Server / IOMeter2008 Test Environment : Intel SR2600UR Server / IOMeter2008 / 4KB RND R70:W30
Active Idle
Source : Samsung
SM825
15K RPM HDD
SM825
15K RPM HDD
X100
X60
X30
-87%
-75% 8.5
12.6
3.2
1.1
11K
23K
43K
5 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
So what’s there to know about an SSD?
SSD Key Characteristics
SSD Components
NAND Characteristics
P/E Cycles
WAF
TBW
SMART
Host Interface
Sustained vs. Peak Performance
Benchmarking
SSD Influencers
TRIM
Over-provisioning
Changing Workload
MLC 1 0
1 0
3,000
User Area Reserved O/P
7 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
SSD Components
• Host/NAND Controller • Firmware • NAND Flash • DRAM • Capacitors (optional)
NAND
DRAM
Firmware
Controller
DRAM
NAND Flash
Controller
Firmware
Host Interface
All components work closely together SSD Image Source : Anandtech
8 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
NAND Characteristics
• Types of NAND
• TLC
• MLC
• E-MLC
• SLC
Geometry / Lithography
• 4xnm, 3xnm, 2xnm
• Smaller = Less Cost
NAND Hierarchy
• Pages: Smallest unit that can be read/written (e.g., 8KB)
• Erase block: Groups of pages (e.g., 64 pages @ 8KB = 512KB)
500-1K P/E Cycles 1 year retention
3-5K P/E Cycles 1 year retention
10-30K P/E Cycles 3 month retention
90-100K P/E Cycles 3 mo – 1 yr retention
TLC 1 0
1 0
1 0
MLC 1 0
1 0
E-MLC
SLC 1 0
1 0
1 0
PC Enterprise
1 0
1 0
1 0
1 0
1 0
1 0
9 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
P/E Cycles
• As geometries shrink, error correction must get better • It’s like a car warranty!
• 3 years or 50,000 miles
• 3 years or 3,000 P/E Cycles
Not a useful characteristic by itself
3,000
3xnm 2xnm 2ynm
ECC Requirements
Program / Erase Cycles
The # of times a given NAND cell can be programmed & erased
10 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
Write Amplification Factor (WAF)
• WAF 1 means 1MB from host writes 1MB to NAND • WAF 5 means 1MB from host writes 5MB to NAND
• Factors that can affect WAF:
Write Amplification Factor
Bytes written to NAND versus bytes written from PC/Server
Controller
Flash Translation Layer (FTL)
Wear Leveling
Over-provisioning
Garbage Collection
Host Application Write Profile (Ran vs. Seq)
Free user space / TRIM
Bytes written to NAND
Bytes written from Host
WAF =
11 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
Write Amplification (WAF) Example
• Below example illustrates WAF of 6
Z
A
LBA 0
Host
SS
D
Time
B
C D
E F
LBA 0 A B
C D
E F
Z B C D E F C
ach
e
Fla
sh
Z Z B C D E F
Z B C D E F
Z B
C D
E F
Host wants to update LBA 0
No more free pages Need to erase entire block Read existing data to Cache
Erase block Write modified page and
old pages back to Flash
4KB from Host
24KB to NAND
12 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
TBW
Examples: ((128GB / 1000) * 3000) / 5 = 76.8 TBW ((128GB / 1000) * 3000) / 2.5 = 153.6 TBW ((256GB / 1000) * 3000) / 5 = 153.6 TBW ((128GB / 1000) * 30000) / 5 = 768 TBW
TeraBytes Written
# of terabytes you can write to the drive over it’s useful life
(Capacity GB/1000) x PE Cycles
WAF
TBW =
13 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
SMART
ID Attribute Name
5 Reallocated Sector Count
9 Power-on Hours
12 Power-on Count
177 Wear Leveling Count
179 Used Reserved Block Count
180 Unused Reserved Block Count
181 Program Fail Count
182 Erase Fail Count
187 Uncorrectable Error Count
195 ECC Error Count
199 CRC Error Count
241 Total LBA Written
Look at health and various statistics Allows for predictable maintenance windows Calculate WAF, TBW
Host GB written = [ID241] / (2/1024/1024)
NAND GB written = [ID177] * Capacity GB WAF = NAND GB / Host GB
Expected Life (yrs) = Warranty PE * ([ID9]/24/365) / [ID177]
14 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
Host Interface
• This is how you communicate to the SSD • So many choices..
• SATA
• SAS
• PCIe (NVMe, SCSIe, SATAe, Proprietary)
Which is right for you?
PC Server External Storage
SATA PCIe
SATA SAS PCIe
SATA + SAS bridge
SAS
15 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
Sustained vs. Peak Performance
• There can be significant differences in sustained vs. peak • Run enterprise benchmark (e.g., SNIA RTP 2.0) • Or even better, run your own workload (or simulated)
4KB Ran. R/W 100/0 (NCQ=16)
4KB Ran. R/W 65/35 (NCQ=16)
4KB Ran. R/W 0/100 (NCQ=16)
[IOPS] [Ran. Performance @ 4KB]
PM830 128GB
Value SSD
SM825 200GB
Mainstream SSD Vendor “X” 160GB
Value SSD
1MB Seq. R/W 100/0 (NCQ=16)
1MB Seq. R/W 65/35 (NCQ=16)
1MB Seq. R/W 0/100 (NCQ=16)
[MBs] [Seq. Performance @ 1MB]
PM830 128GB SM825 200GB Vendor “X” 160GB
Source : Samsung / SNIA RTP2.0 Benchmark
99% below Peak
94% below Peak
Samsung PM830 vs Vendor “X” 11x Sustained Random Writes
Samsung PM830 vs Vendor “X” 2x Sustained Sequential Writes
95% below Peak
95% below Peak
Over 10,000 IOPS!
There is a BIG difference between “Value” and “Mainstream/Enterprise”
SSDs when you have any degree of writes in your workload
16 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
Benchmarking
Benchmark URL
SNIA RTP 2.0 http://www.snia.org/tech_activities/standards/curr_standards/pts
Iometer http://sourceforge.net/projects/iometer/
ATTO Disk http://www.attotech.com/products/product.php?sku=Disk_Benchmark
CrystalDiskMark http://crystalmark.info/software/CrystalDiskMark/index-e.html
HD Tune Pro http://www.hdtune.com/
AS SSD (SSD) http://alex-is.de/PHP/fusion/downloads.php?download_id=9
Anvil (SSD) http://thessdreview.com/latest-buzz/anvil-storage-utilities-releases-new-storage-and-ssd-benchmark/
Scripts Have multiple “dd” running with best guess workload, capturing timing/speeds
Real Workload Capture trace during real workload and playback (ioapps, blktrace/btereplay)
Synthetic or actual workload & take measurements
19 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
TRIM
• Helps the SSD know which blocks aren’t used • Widely supported standard: Windows, Mac OS X, Linux, hdparm • Better sustained performance and extends TBW • Without TRIM, SSD only knows block isn’t used once the same
LBA is written to
Hi
Hi
Bye
Hi Bye
No TRIM needed
Hi
Hi
Bye
Hi Bye
TRIM makes SSD aware
LBA 0 LBA 0
LBA 0 LBA 0
Host
SS
D ?
TRIM
Hi Bye
LBA 0
LBA 0 LBA 0 LBA 1
LBA 1 LBA 0
LBA 1
Time Time
20 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
Over-Provisioning
• Helps a few things:
• Improves Write Performance
• Reduces WAF, Increases TBW
User Area Reserved O/P
128GB
100GB 128GB Base-2 to Base-10 conversion:
137,438,953,472 to 128,000,000,0000 (6.9%)
28GB 28% O/P
Sample 128GB SSD 120GB 100GB
Over-Provisioning 7% 28%
Random Read (8K) IOPS 80K 80K
Random Write (8K) IOPS 1,800 6,300
Sequential Read (64K) MB/s 500 500
Sequential Write (64K) MB/s 400 400
4KB Random WAF 5 1.35
4KB Random TBW 15 45
-73%
3x
3.5x
These performance numbers are fictitious but do represent the actual benefits seen during tests
21 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
Change Write Workload
• Write sequentially instead of random to reduce WAF
• If you have control of the I/O to the disk, this will pay off
Align your writes with the page boundaries (e.g., 8KB)
If alignment is too hard to implement, just increase your IO size
Only 1 Page needed 2 Pages needed
Random Sequential
MLC 512GB SSD 60 TBW 1250 TBW 20x
8K
8K 8K 8K
8K
LBA 0 LBA 16
LBA 8
LBA 0 LBA 16
8K 8K 8K
LBA
16
Change block alignment
Host
SS
D
23 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
HDD Replacement
• Replace boot drive or main storage • Fastest and easiest way to experience SSDs
SSD HDD
Server
HDD
SSD
Storage
24 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
Caching Appliance
• Read and/or Write Cache • Sits between servers and storage, typically in a SAN • Used to speed up legacy or slower storage
HDD
SSD
Servers Cache
Storage
25 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
Tiered Storage
• An external storage device (NAS, SAN) • Only puts “hot” or “critical” data on SSD • Most of the storage is still on HDD
SSD
Servers
HDD HDD
Storage
26 / ? YYYY.MM.DD / 홍길동 책임 / xxxxxx팀
All Flash Storage
• External storage based on 100% SSD/Flash • Typically uses MLC and de-duplication/compression to
achieve better pricing • Designers of these systems are Flash experts
SSD SSD SSD
Servers
Storage