Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks

Rodolfo Azevedo¹, John D. Davis², Karin Strauss², Parikshit Gopalan², Mark Manasse², Sergey Yekhanin²

University of Campinas¹ & Microsoft Research²


The “End” of the Road for DRAM

• DRAM scaling wall
• Fabrication limitations
• Variability
• Increasing error correction overhead (more transient errors)
• Increasing active/standby/refresh power
• Industry is looking for byte-addressable alternatives… but the main gating factor is memory lifetime


Coming on the Horizon: NEW *RAM!

• Phase Change Memory (PCM), CBRAM, Memristors, etc.
• Fabrication friendly
• Value stability
• “Zero” standby power
• Shorter lifetime (10^8 writes) vs. DRAM (10^15 writes)
• Mismatch in memory cell failure mechanisms

[Figure legend: 4 KB Page, Dead Page, Zombie Page]


Cell Failure Remediation Mismatch

I am NOT Dead Yet!


Why Should You Care About Zombies?

• Not all dead things are bad for you!
• Lots of good cells in “dead” pages
• Single-level cell (SLC) & multi-level cell (MLC) mechanisms
• The first resistance drift + cell failure mechanism for MLC PCM
• Adaptive error correction mechanisms
• Maximizes memory capacity over the lifetime


Zombies in the Paper

                       SLC                               MLC
Error sources          Wearout                           Wearout + drift
Mechanisms             ZombieECP, ZombieERC, ZombieXOR   ZombieMLC
Lifetime improvement   58%-92%                           11x-17x
Service lifetime       ~2.2 years → 3.5-4.3 years        ~5 months → ~5 years
Performance impact     0-25%                             0-25%


Outline

• Block Pairing

• Zombie Memory
• Zombie ECP (single-level cell)
• Zombie ERC (single-level cell)
• Zombie XOR (single-level cell)
• Zombie MLC (multi-level cell)

• How Long do Zombies Live? (Evaluation)

• Conclusions


The Basics

• Reintegrating Zombies back into the memory system
• Phase Change Memory + 6 Error Correcting Pointers (ECP)
• Other error correction schemes can be used
• 512-bit blocks + 64 bits error correction, 64 blocks / 4 KB page
• Differential writes
• Simulation details in the paper, SPEC CPU2006

[Figure: Primary Page, Zombie Page]
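The block and page sizes listed for this slide, together with the idea of differential writes, can be checked with a few lines of Python. This is a minimal sketch: the constants come from the slide, while the function names and the use of a Python integer as a 512-bit vector are illustrative assumptions.

```python
BLOCK_BITS = 512        # data bits per block (from the slide)
EC_BITS = 64            # error-correction bits per block (from the slide)
BLOCKS_PER_PAGE = 64    # blocks per 4 KB page (from the slide)


def page_bytes() -> int:
    """Data capacity of one page in bytes."""
    return BLOCKS_PER_PAGE * BLOCK_BITS // 8


def differential_write(old: int, new: int):
    """Return (stored value, number of cells actually programmed).

    A block is modelled as an int used as a 512-bit vector. Only cells
    whose value changes are programmed, so the wear of one write is the
    popcount of old XOR new, not the full block width.
    """
    mask = (1 << BLOCK_BITS) - 1
    changed = (old ^ new) & mask
    return new & mask, bin(changed).count("1")
```

For example, rewriting `0b1010` as `0b1001` programs two cells instead of all 512, which is where the wear saving comes from.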


Error Correcting Pointers Review

• Use pointer + replacement bit for cell failure
• 9 bits pointer + 1 bit
• Additional metadata
• ISCA ’10

[Figure: 512-bit block with ECP entries; legend: Good Block, Worn Block, Failed Cell, ECP Entry; 12% EC overhead]
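The pointer-plus-replacement-bit idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the ISCA ’10 hardware design: the class and method names are hypothetical, stuck cells are injected through a dictionary purely for simulation, and the "additional metadata" is assumed to be a single full-flag bit, which gives 6 × 10 + 1 = 61 bits, roughly 12% of a 512-bit block.

```python
BLOCK_BITS = 512
POINTER_BITS = 9                 # 2**9 == 512, enough to address any cell
ENTRY_BITS = POINTER_BITS + 1    # pointer + 1 replacement bit
NUM_ENTRIES = 6


def ecp_overhead_bits(entries=NUM_ENTRIES, full_flag_bits=1):
    """Bits of ECP state per block (full_flag_bits is an assumption)."""
    return entries * ENTRY_BITS + full_flag_bits


class EcpBlock:
    def __init__(self, stuck=None):
        self.stuck = dict(stuck or {})   # cell index -> stuck value (simulation only)
        self.cells = [self.stuck.get(i, 0) for i in range(BLOCK_BITS)]
        self.entries = {}                # failed cell index -> replacement bit

    def write(self, data):
        """Write data; return False once the block runs out of ECP entries."""
        for pos, bit in enumerate(data):
            if pos in self.entries:
                self.entries[pos] = bit          # replacement bit holds the value
            elif pos in self.stuck:
                if self.stuck[pos] != bit:       # write-verify would fail here
                    if len(self.entries) == NUM_ENTRIES:
                        return False
                    self.entries[pos] = bit      # allocate a new pointer entry
            else:
                self.cells[pos] = bit
        return True

    def read(self):
        """Read raw cells, then patch in the replacement bits."""
        out = list(self.cells)
        for pos, bit in self.entries.items():
            out[pos] = bit
        return out
```

A block with a seventh failed cell returns `False` from `write`; that is the point at which a conventional scheme would declare the block dead and the Zombie mechanisms would take over.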


Adaptive Block Pairing

• Pairing with different sized spare blocks
• EC bits in the primary point to the spare
• Reuse intrinsic error correction in the spare block
• Re-pairing at the sub-block and block levels
• Re-pair with different spare blocks
• Gives Zombie a second chance

[Figure: primary blocks paired with spare blocks drawn from Zombie block pools; legend: Good Block, Worn Block, Spare Block]


Zombie XOR

• Pairs primary and spare blocks, XORing aligned bits to produce the data
• Bias wear to spare block to maximize primary lifetime
• Reuse spare error correction bits to correct aligned cell failures in the primary and spare
• Re-pair with “new” spare

[Figure: primary block paired with spare blocks; legend: Good Block, Worn Block, Spare Block, Failed Cell, ECP Entry, Pairing Pointer]
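A minimal Python sketch of the XOR pairing described on this slide, assuming one bit per cell and that stuck-cell positions are already known. The function names and the dictionaries of stuck cells are hypothetical; positions where both aligned cells are stuck at the wrong combination are only reported, since the slide leaves those to the spare's error correction bits.

```python
def xor_write(data, primary, spare, primary_stuck, spare_stuck):
    """Store data so that primary[i] ^ spare[i] == data[i] where possible.

    primary_stuck / spare_stuck map a cell index to the value that cell
    is stuck at (the lists must already hold those values). Returns the
    positions that still need an error-correction entry.
    """
    needs_ec = []
    for i, bit in enumerate(data):
        if i not in spare_stuck:
            # Bias wear to the spare: the primary cell is left untouched.
            spare[i] = bit ^ primary[i]
        elif i not in primary_stuck:
            # Spare cell is stuck, so compensate in the primary instead.
            primary[i] = bit ^ spare[i]
        elif primary[i] ^ spare[i] != bit:
            # Aligned failures in both blocks with the wrong XOR.
            needs_ec.append(i)
    return needs_ec


def xor_read(primary, spare):
    """Data is the XOR of aligned primary and spare bits."""
    return [p ^ s for p, s in zip(primary, spare)]
```

With a stuck primary cell and a healthy spare cell at the same position, the write still succeeds without consuming an error-correction entry, which is why pairing revives a block that would otherwise be dead.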


Zombie MLC

• Must handle drift and cell failures
• Rank modulation* to handle drift

[Figure: fixed guard bands (levels 11, 10, 01, 00) vs. relative cell values (0, 1); labels: Number, String, Codeword]

*N. Papandreou et al., IMW 2011. Reprint of D. Ielmini et al., IEDM 2007
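Why relative cell values tolerate drift can be shown with a small Python sketch. It uses generic permutation-based rank modulation, not the specific number/string/codeword mapping of the talk, and every constant is a hypothetical choice: four cells, resistance levels a decade apart, and a single shared drift exponent `nu` in the power law R(t) = R0 · (t/t0)^nu. Real cells drift at level-dependent rates, so this is the idealized case.

```python
from math import factorial


def index_to_perm(index, n):
    """Map an integer in [0, n!) to a permutation of range(n)."""
    items = list(range(n))
    perm = []
    for k in range(n, 0, -1):
        q, index = divmod(index, factorial(k - 1))
        perm.append(items.pop(q))
    return perm


def perm_to_index(perm):
    """Inverse of index_to_perm."""
    items = sorted(perm)
    index = 0
    for k, p in enumerate(perm):
        q = items.index(p)
        index += q * factorial(len(perm) - 1 - k)
        items.pop(q)
    return index


def encode(index, n=4):
    """Write resistances so the data lives only in their relative order."""
    levels = [1e3 * 10 ** rank for rank in range(n)]   # hypothetical ohms
    resist = [0.0] * n
    for rank, cell in enumerate(index_to_perm(index, n)):
        resist[cell] = levels[rank]
    return resist


def drift(resist, t, nu=0.1, t0=1.0):
    """Idealized power-law drift with one exponent for all cells."""
    return [r * (t / t0) ** nu for r in resist]


def decode(readings):
    """Recover the data from the ordering of the readings alone."""
    perm = sorted(range(len(readings)), key=lambda c: readings[c])
    return perm_to_index(perm)
```

After drift the absolute resistances have moved, so a fixed-threshold read could misclassify them, while the ordering, and hence the decoded value, is unchanged.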


Zombie MLC

• Must handle drift and cell failures
• Rank modulation* to handle drift
• Anchor symbols are added to handle cell failures
• Known anchor location and/or known values
• Optimal encoding: # replacement cells = # failed cells

1 Cell Stuck-at 0 (figure labels: Original string, Anchor, Codeword, Anchors):

0 1 0 2 3 0 1 2 3
1 2 1 3 0 1 2 3 0
2 3 2 0 1 2 3 0 1

2 Cells Stuck-at 0 (figure labels: Original non-uniform string*, Coordinate shuffle equation over a finite field, Codeword, Bit positions):

1 2 3 3 0 0 3 3 0 0 3 3
3 3 1 0 3 3 0 0 3 3 2 0
1 2 3 4 5 6 7 8 9 10 11 12

See the paper for 3 stuck-at cells mechanism.

*N. Papandreou et al., IMW 2011
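The three 9-symbol rows in the single stuck-at example are consistent with adding a constant to every symbol modulo 4. The Python sketch below builds on that reading, which is an inference from the slide and not the paper's stated encoding: a known anchor symbol is placed first, the encoder picks the shift that makes the codeword agree with the stuck cell, and the decoder recovers the shift from the anchor. Function names and the anchor position are assumptions.

```python
Q = 4  # four levels per cell (2-bit MLC), so symbols are taken modulo 4


def shift(symbols, s):
    """Add s to every symbol modulo Q."""
    return [(x + s) % Q for x in symbols]


def encode(data, stuck_pos, stuck_val, anchor=0):
    """Prepend the anchor, then shift so the stuck cell already holds
    the value the codeword needs at stuck_pos (index includes the anchor)."""
    word = [anchor] + list(data)
    s = (stuck_val - word[stuck_pos]) % Q
    return shift(word, s)


def decode(codeword, anchor=0):
    """Read the shift off the known anchor and undo it."""
    s = (codeword[0] - anchor) % Q
    return shift(codeword, -s)[1:]
```

One extra cell is spent to tolerate one stuck cell, which matches the slide's "# replacement cells = # failed cells" for this case.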


Zombie ECP & ERC

• Pairing + existing error correction mechanisms
• Adaptive: 1/4, 1/2, and full block pairing
• ECP [ISCA ’10]: Use spare block to add more Error Correcting Pointers to the primary block
• ERC [PIT ’74, HPCA ’13]: Change the model to an erasure model
• Instead of correcting (d-1)/2 errors at unknown locations (error model), can correct d-1 errors at known locations (erasure model)
• Bias wear to spare block to maximize primary lifetime
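The gain from the erasure model can be illustrated with the smallest possible code. The Python sketch below uses a single even-parity bit (minimum distance d = 2) as a stand-in; it is not the code used by Zombie ERC. Under the error model such a code corrects (d-1)/2 = 0 errors, yet it corrects d-1 = 1 failure once the failed position is known, as it is for a cell already identified as stuck.

```python
def max_errors(d):
    """Correctable errors at unknown positions for minimum distance d."""
    return (d - 1) // 2


def max_erasures(d):
    """Correctable failures at known positions for minimum distance d."""
    return d - 1


def encode(bits):
    """Append one even-parity bit (a code with d = 2)."""
    return list(bits) + [sum(bits) % 2]


def correct_erasure(codeword, erased_pos):
    """Rebuild the symbol at a known failed position from the others."""
    others = sum(b for i, b in enumerate(codeword) if i != erased_pos)
    fixed = list(codeword)
    fixed[erased_pos] = others % 2
    return fixed
```

For a stronger code the same ratio holds: knowing where the failed cells are roughly doubles the number of failures the existing error-correction bits can absorb.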


How Long do Zombies Live?


Zombie SLC Write Capacity

[Figure: Zombie SLC write capacity, built up over four slides; annotations: 58% longer life, 92% longer life]

Zombie SLC Performance

• < 0.5% slowdown on SPEC workloads
• < 6% slowdown on SPEC workloads


I’m NOT Dead YET!


I’m Still NOT Dead YET!


I’m STILL NOT Dead YET!


Squeezed Blood From a Turnip!


Zombie MLC Write Capacity

[Figure: Zombie MLC write capacity, built up over three slides; annotations: 17X longer life, 11X longer life]

Zombie MLC Performance

< 4% slowdown on SPEC workloads


Zombies Can Be Rehabilitated!

• Zombie framework
• Using dead blocks to extend memory lifetime
• Versatile and adaptive
• Low implementation overhead

• MLC: First drift + cell failure solution
• Using fixed positions and/or fixed values for anchors
• Lifetime improvement 11X-17X

• SLC: Multiple mechanisms
• Maximize lifetime or capacity
• Lifetime improvement of 58-92%


Questions?

For more details: read the paper, read the tech report, and/or talk to [email protected] & {john.d, kstrauss, parik, manasse, yekhanin}@microsoft.com


More About Zombie…


Zombie SLC Performance


Zombie MLC Performance


Mitigating Drift-Induced Soft Errors

• Previous assumptions:
• Fixed guard band for cell value
• Uniform distribution of resistance values
• ~2 second data lifetime…

• Relaxing the drift-induced soft error constraint
• Rank modulation (no fixed guard band)
• Non-uniform distribution of resistance values
• Cluster the low levels and spread apart the high levels
• ~5 days of data lifetime (worst-case wear is 5 seconds)

• More knobs:
• Tighten resistance distribution
• Use different drift coefficients