Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks
Rodolfo Azevedo¹, John D. Davis², Karin Strauss², Parikshit Gopalan², Mark Manasse², Sergey Yekhanin²
¹University of Campinas & ²Microsoft Research
The “End” of the Road for DRAM
• DRAM scaling wall
• Fabrication limitations
• Variability
• Increasing error correction overhead (more transient errors)
• Increasing active/standby/refresh power
• Industry is looking for byte-addressable alternatives, but the main gating factor is memory lifetime
Coming on the Horizon: NEW *RAM!
• Phase Change Memory (PCM), CBRAM, memristors, etc.
• Fabrication friendly
• Value stability
• “Zero” standby power
• Shorter lifetime (10⁸ writes) vs. DRAM (10¹⁵ writes)
• Mismatch in memory cell failure mechanisms
Cell Failure Remediation Mismatch
(Figure: a 4 KB page, a dead page, and a zombie page)
I am NOT Dead Yet!
• Not all dead things are bad for you!
• Lots of good cells in “dead” pages
Why Should You Care About Zombies?
• Single-level cell (SLC) & multi-level cell (MLC) mechanisms
• The first resistance drift + cell failure mechanism for MLC PCM
• Adaptive error correction mechanisms
• Maximizes memory capacity over the lifetime
Zombies in the Paper
                                           SLC                                MLC
Error sources                              Wearout                            Wearout + drift
Mechanisms                                 ZombieECP, ZombieERC, ZombieXOR    ZombieMLC
Lifetime improvement                       58%-92%                            11x-17x
Service lifetime (baseline → Zombie)       ~2.2 years → 3.5-4.3 years         ~5 months → ~5 years
Performance impact                         0-25%                              0-25%
Outline
• Block Pairing
• Zombie Memory
  • Zombie ECP
  • Zombie ERC
  • Zombie XOR
  • Zombie MLC
• How Long do Zombies Live? (Evaluation)
• Conclusions
(Figure: Single-Level Cell: Zombie ECP, Zombie ERC | Multi-Level Cell)
• Reintegrating Zombies back into the memory system
The Basics
• Phase Change Memory + 6 Error Correcting Pointers (ECP)
• Other error correction schemes can be used
• 512-bit blocks + 64 bits of error correction, 64 blocks per 4 KB page
• Differential writes
• Simulation details in the paper, SPEC CPU2006
(Figure: primary page and zombie page)
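A quick Python sketch of the block geometry and of differential writes as listed on this slide. The constants come from the slide; the function name `differential_write` and the per-cell `wear` counter are hypothetical illustrations, not the simulator used in the paper.

```python
BLOCK_DATA_BITS = 512   # data bits per block (from the slide)
BLOCK_EC_BITS = 64      # error-correction bits per block (from the slide)
BLOCKS_PER_PAGE = 64    # blocks per 4 KB page (from the slide)

# 64 blocks x 512 bits = 32768 bits = 4096 bytes of data per page
PAGE_DATA_BYTES = BLOCKS_PER_PAGE * BLOCK_DATA_BITS // 8
# EC overhead relative to data: 64 / 512 = 12.5%
EC_OVERHEAD = BLOCK_EC_BITS / BLOCK_DATA_BITS


def differential_write(cells: list[int], wear: list[int], new: list[int]) -> int:
    """Program only the cells whose stored bit differs from the new data.

    Hypothetical helper: updates `cells` and the per-cell `wear` counters in
    place and returns how many cells were actually programmed. Unchanged
    cells are never written, so they accumulate no wear.
    """
    if not (len(cells) == len(wear) == len(new)):
        raise ValueError("cells, wear and new must have the same length")
    programmed = 0
    for i, bit in enumerate(new):
        if cells[i] != bit:
            cells[i] = bit
            wear[i] += 1
            programmed += 1
    return programmed
```

Writing `[1, 0, 1, 0, 0, 0, 0, 0]` over an all-zero 8-cell block programs 2 cells; rewriting it as `[1, 0, 1, 0, 0, 0, 0, 1]` then programs only 1 more.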
Error Correcting Pointers Review
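• Use a pointer + replacement bit for each failed cell
• 9-bit pointer (addresses all 2⁹ = 512 cells in the block) + 1 replacement bit
• Additional metadata
• ISCA ’10
(Figure: a 512-bit block; good block vs. worn block; each failed cell is covered by an ECP entry; 12% EC overhead)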
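The pointer mechanism reviewed above can be modeled with a toy Python class. It is a simplified sketch, assuming stuck-at failures whose values are known after a verify-read and one ECP entry per failed cell; it ignores the additional metadata bits and the entry-precedence rules of the ISCA ’10 design, and `ECPBlock` is a hypothetical name.

```python
BLOCK_BITS = 512
POINTER_BITS = 9               # 2**9 = 512, so a pointer can name any cell
ENTRY_BITS = POINTER_BITS + 1  # pointer + replacement bit = 10 bits
MAX_ENTRIES = 6                # ECP-6, as on the slides


class ECPBlock:
    """Toy model (hypothetical) of one 512-bit block protected by ECP entries."""

    def __init__(self) -> None:
        self.cells = [0] * BLOCK_BITS
        self.stuck: dict[int, int] = {}    # failed cell position -> stuck-at value
        self.entries: dict[int, int] = {}  # ECP entries: pointer -> replacement bit

    def fail_cell(self, pos: int, value: int) -> None:
        """Mark a cell as permanently stuck at `value`."""
        self.stuck[pos] = value
        self.cells[pos] = value

    def write(self, data: list[int]) -> bool:
        """Store `data`; return False once failures outnumber the entries."""
        if len(self.stuck) > MAX_ENTRIES:
            return False  # the block is dead: a candidate for zombie reuse
        for pos, bit in enumerate(data):
            if pos not in self.stuck:
                self.cells[pos] = bit
        # One entry per failed cell: the pointer names the cell and the
        # replacement bit holds the value that cell should have stored.
        self.entries = {pos: data[pos] for pos in self.stuck}
        return True

    def read(self) -> list[int]:
        """Read raw cells, then let each entry override its failed cell."""
        out = list(self.cells)
        for pos, bit in self.entries.items():
            out[pos] = bit
        return out
```

Six 10-bit entries account for 60 of the 64 EC bits per block; the full 64 bits over 512 data bits is the roughly 12% overhead quoted above.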
Adaptive Block Pairing
• Pairing with different-sized spare blocks
• EC bits in the primary point to the spare
• Reuse intrinsic error correction in the spare block
• Re-pairing at the sub-block and block levels
  • Re-pair with different spare blocks
  • Gives a Zombie a second chance
(Figure: primary blocks paired with spare blocks drawn from zombie block pools; legend: good block, worn block, spare block)
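One way to picture the zombie block pools is as free lists of spares keyed by size. The Python sketch below assumes 1/4-, 1/2-, and full-block spares (the sizes listed for Zombie ECP & ERC), a hypothetical `EXTRA_ENTRIES_PER_QUARTER` capacity rule, and a smallest-spare-that-fits policy; all three are illustrative assumptions rather than the paper's allocator.

```python
from collections import deque

PRIMARY_ENTRIES = 6              # ECP-6 built into every block (from the slides)
EXTRA_ENTRIES_PER_QUARTER = 12   # hypothetical: entries gained per 1/4 block of spare
SPARE_SIZES = (1, 2, 4)          # spare size in quarter blocks: 1/4, 1/2, full


class ZombiePools:
    """Hypothetical free lists of spare (zombie) blocks, keyed by size."""

    def __init__(self) -> None:
        self.free = {size: deque() for size in SPARE_SIZES}

    def add_spare(self, size: int, spare_id: str) -> None:
        self.free[size].append(spare_id)

    def pair(self, failed_cells: int):
        """Pick the smallest pooled spare whose extra entries cover the failures.

        Returns (0, None) if the primary's own entries suffice, (size, spare_id)
        for a successful pairing, or None if no pooled spare is large enough
        (in this sketch the primary would then be retired).
        """
        if failed_cells <= PRIMARY_ENTRIES:
            return 0, None
        for size in SPARE_SIZES:
            capacity = PRIMARY_ENTRIES + size * EXTRA_ENTRIES_PER_QUARTER
            if failed_cells <= capacity and self.free[size]:
                return size, self.free[size].popleft()
        return None

    def release(self, size: int, spare_id: str) -> None:
        """Return a spare to its pool, e.g. before re-pairing its primary."""
        self.free[size].append(spare_id)
```

Releasing a spare and calling `pair` again models re-pairing a primary with a different spare.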
Zombie XOR
• Pairs primary and spare blocks, XORing aligned bits to produce the data
• Bias wear to the spare block to maximize primary lifetime
• Reuse the spare's error correction bits to correct aligned cell failures in the primary and spare
• Re-pair with a “new” spare
(Figure: a primary block paired with spare blocks; legend: good block, worn block, spare block, failed cell, ECP entry, pairing pointer)
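The XOR pairing can be sketched bit by bit in Python. This assumes stuck-at failures with known values and models the spare's error correction as a position-to-bit map; the greedy rule of leaving healthy primary cells untouched is one simple way to bias wear toward the spare, not necessarily the paper's exact policy.

```python
def xor_write(primary, spare, p_stuck, s_stuck, data):
    """Set primary/spare bits so that primary[i] ^ spare[i] == data[i].

    p_stuck and s_stuck map failed positions to their stuck-at values.
    Returns positions the pairing cannot fix (aligned failures whose XOR is
    wrong); in this sketch those must be covered by the spare's ECP entries.
    """
    needs_ecp = []
    for i, d in enumerate(data):
        if i in p_stuck and i in s_stuck:
            # Aligned failure: both cells are frozen, so only ECP can help.
            primary[i], spare[i] = p_stuck[i], s_stuck[i]
            if primary[i] ^ spare[i] != d:
                needs_ecp.append(i)
        elif i in p_stuck:
            primary[i] = p_stuck[i]
            spare[i] = d ^ primary[i]   # the spare compensates
        elif i in s_stuck:
            spare[i] = s_stuck[i]
            primary[i] = d ^ spare[i]   # the primary compensates
        else:
            # Both healthy: keep the primary bit and absorb the change in the
            # spare (assumed policy for biasing wear toward the spare).
            spare[i] = d ^ primary[i]
    return needs_ecp


def xor_read(primary, spare, ecp):
    """Data is primary XOR spare, with ECP replacement bits applied last."""
    data = [p ^ s for p, s in zip(primary, spare)]
    for pos, bit in ecp.items():
        data[pos] = bit
    return data
```

Only aligned failures whose stuck values disagree with the data consume error-correction entries; every other failed cell is absorbed by its partner.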
Zombie MLC
• Must handle drift and cell failures
• Rank modulation* to handle drift
(Figure: fixed guard bands vs. relative cell values for the 2-bit levels 11, 10, 01, 00; number string → codeword)
*N. Papandreou et al., IMW 2011; reprint of D. Ielmini et al., IEDM 2007
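A small Python sketch of why relative (rank-based) reads tolerate drift. Assumptions, all hypothetical: four levels with the log-resistance values in `BASE_LOG_R`, power-law drift with per-level exponents `DRIFT_NU` (higher levels drift faster), and codewords in which every level appears equally often, so levels can be assigned purely by rank. This balanced-string variant illustrates the principle; it is not the encoding from the paper or from the cited work.

```python
import math

LEVELS = 4
BASE_LOG_R = [3.0, 4.0, 5.0, 6.0]    # hypothetical log10(R) right after programming
DRIFT_NU = [0.01, 0.03, 0.10, 0.12]  # hypothetical drift exponents per level
THRESHOLDS = [3.5, 4.5, 5.5]         # fixed guard-band boundaries between levels


def program(symbols):
    """Write each symbol as its level's nominal log-resistance."""
    return [BASE_LOG_R[s] for s in symbols]


def drift(log_r, symbols, t_over_t0):
    """Power-law drift: log10 R(t) = log10 R0 + nu * log10(t / t0)."""
    return [r + DRIFT_NU[s] * math.log10(t_over_t0) for r, s in zip(log_r, symbols)]


def read_fixed(log_r):
    """Absolute read: count how many fixed thresholds each cell exceeds."""
    return [sum(r > th for th in THRESHOLDS) for r in log_r]


def read_ranked(log_r):
    """Relative read: assumes every level appears len(log_r) / LEVELS times."""
    n = len(log_r)
    if n % LEVELS:
        raise ValueError("codeword length must be a multiple of LEVELS")
    per_level = n // LEVELS
    order = sorted(range(n), key=lambda i: log_r[i])
    out = [0] * n
    for rank, i in enumerate(order):
        out[i] = rank // per_level  # lowest cells -> level 0, and so on
    return out
```

For the word `[2, 0, 3, 1, 1, 3, 0, 2]`, after 10⁶ t₀ the fixed-threshold read misreads the level-2 cells as level 3, while the rank-based read still returns the original word, because drift moved the levels without reordering them.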
Zombie MLC
• Must handle drift and cell failures
• Rank modulation* to handle drift
• Anchor symbols are added to handle cell failures
• Known anchor location and/or known values
• Optimal encoding: # replacement cells = # failed cells
(Figure, 1 cell stuck-at 0: rows 0 1 0 2 3 0 1 2 3 / 1 2 1 3 0 1 2 3 0 / 2 3 2 0 1 2 3 0 1; labels: original string, anchor, codeword, anchors)
(Figure, 2 cells stuck-at 0: strings 1 2 3 3 0 0 3 3 0 0 3 3 and 3 3 1 0 3 3 0 0 3 3 2 0 over bit positions 1-12; labels: original non-uniform string over a finite field, coordinate shuffle equation, codeword)
See the paper for the mechanism that handles 3 stuck-at cells.
*N. Papandreou et al., IMW 2011
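For the single stuck-cell case, the shifted rows in the 1-cell example can be reproduced with a mod-4 shift plus one anchor cell. The sketch assumes a hypothetical layout in which the anchor is a healthy cell at position 0 whose pre-shift value is known to be 0; the two- and three-cell constructions over a finite field are not reproduced here, and the interaction with rank-based reading is left out.

```python
Q = 4  # symbols per 2-bit MLC cell


def encode_one_stuck(data, stuck_pos, stuck_val):
    """Shift every symbol mod Q so the one landing on the stuck cell matches it.

    Hypothetical layout: codeword[0] is a healthy anchor whose pre-shift value
    is known to be 0, so after shifting it simply holds the shift amount.
    `stuck_pos` indexes the data cells (0-based).
    """
    shift = (stuck_val - data[stuck_pos]) % Q
    return [shift] + [(x + shift) % Q for x in data]


def decode_one_stuck(codeword):
    """The anchor's known pre-shift value (0) reveals the shift to undo."""
    shift = codeword[0]
    return [(x - shift) % Q for x in codeword[1:]]
```

With the slide's string 0 1 0 2 3 0 1 2 3, a cell stuck at 0 at data position 4 gives shift 1 and the row 1 2 1 3 0 1 2 3 0; position 3 gives shift 2 and 2 3 2 0 1 2 3 0 1. One anchor cell covers one failed cell, matching "# replacement cells = # failed cells".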
Zombie ECP & ERC
• Pairing + existing error correction mechanisms
• Adaptive: 1/4, 1/2, and full-block pairing
• ECP [ISCA ’10]: Use the spare block to add more Error Correcting Pointers to the primary block
• ERC [PIT ’74, HPCA ’13]: Change the model to an erasure model
  • Instead of correcting ⌊(d-1)/2⌋ errors (error model), can correct d-1 erasures
• Bias wear to the spare block to maximize primary lifetime
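A length-3 repetition code (minimum distance d = 3) makes the error versus erasure gap concrete. The sketch assumes failed-cell positions are known, as with stuck-at cells found by verify-after-write; it is a textbook illustration of the two bounds, not the ERC construction from the cited work.

```python
def correctable(d):
    """Return (errors, erasures) correctable by a code of minimum distance d."""
    return (d - 1) // 2, d - 1


def decode_rep3_errors(word):
    """Error model: bad positions are unknown, so take a majority vote."""
    return 1 if sum(word) >= 2 else 0


def decode_rep3_erasures(word, erased):
    """Erasure model: failed positions are known, so any surviving copy
    of the bit is trustworthy."""
    survivors = [b for i, b in enumerate(word) if i not in erased]
    if not survivors:
        raise ValueError("more than d - 1 = 2 erasures")
    return survivors[0]
```

If a stored 1 (`111`) has two cells stuck at 0, majority voting returns 0, but treating the two known-bad cells as erasures recovers 1.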
How Long do Zombies Live?
Zombie SLC Write Capacity
• 58% longer life
• 92% longer life
Zombie SLC Performance
• < 0.5% slowdown on SPEC workloads
• < 6% slowdown on SPEC workloads
I’m NOT Dead YET!
I’m Still NOT Dead YET!
I’m STILL NOT Dead YET!
Squeezed Blood From a Turnip!
Zombie MLC Write Capacity
• 17X longer life
• 11X longer life
Zombie MLC Performance
< 4% slowdown on SPEC workloads
Zombies Can Be Rehabilitated!
• Zombie framework
  • Using dead blocks to extend memory lifetime
  • Versatile and adaptive
  • Low implementation overhead
• MLC: first drift + cell failure solution
  • Using fixed positions and/or fixed values for anchors
  • Lifetime improvement of 11X-17X
• SLC: multiple mechanisms
  • Maximize lifetime or capacity
  • Lifetime improvement of 58-92%
Questions? For more details, read the paper, read the tech report, and/or talk to the authors ({john.d, kstrauss, parik, manasse, yekhanin}@microsoft.com).
More About Zombie…
Zombie SLC Performance
Zombie MLC Performance
Mitigating Drift-Induced Soft Errors
• Previous assumptions:
  • Fixed guard band for cell value
  • Uniform distribution of resistance values
  • ~2 seconds of data lifetime
• Relaxing the drift-induced soft error constraint:
  • Rank modulation (no fixed guard band)
  • Non-uniform distribution of resistance values
  • Cluster the low levels and spread apart the high levels
  • ~5 days of data lifetime (worst-case wear is 5 seconds)
• More knobs:
  • Tighten the resistance distribution
  • Use different drift coefficients
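The placement argument can be sketched as a worst-case calculation. Assumptions, all hypothetical: each level is programmed within `half_width` decades of its center, each level's drift exponent lies in a known range that widens for higher levels, and a rank-based read first fails when any lower-level cell can overtake a higher-level one. The result is in decades of time, i.e. log10(t/t0); the numbers are illustrative and are not meant to reproduce the slide's 2-second or 5-day figures.

```python
import itertools
import math

HALF_WIDTH = 0.3  # hypothetical programming spread, in decades of resistance
# Hypothetical (nu_min, nu_max) drift-exponent ranges, widening for higher levels
NU_RANGES = [(0.0, 0.01), (0.0, 0.04), (0.02, 0.08), (0.04, 0.12)]
UNIFORM = [3.0, 4.0, 5.0, 6.0]      # evenly spaced level centers (log10 R)
NON_UNIFORM = [3.0, 3.8, 4.8, 6.0]  # low levels clustered, high levels spread


def decades_until_swap(centers, half_width, nu_ranges):
    """Worst-case log10(t/t0) before any lower level can overtake a higher one.

    For levels i < j, the fastest cell of level i starts at the top of its band
    and the slowest cell of level j at the bottom of its band; under power-law
    drift their gap closes linearly in log10(t/t0).
    """
    worst = math.inf
    for i, j in itertools.combinations(range(len(centers)), 2):
        gap = (centers[j] - half_width) - (centers[i] + half_width)
        if gap <= 0:
            return 0.0  # bands already overlap right after programming
        closing_rate = nu_ranges[i][1] - nu_ranges[j][0]
        if closing_rate > 0:
            worst = min(worst, gap / closing_rate)
    return worst
```

With these numbers, clustering the low levels and spreading the high ones raises the worst case from about 10 to about 15 decades. Shrinking `HALF_WIDTH` or narrowing `NU_RANGES` corresponds to the "more knobs" listed on the slide.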