Phase Change Memories: the technology promises
• Non-volatile RAM:
  More scalable than DRAM (up to 4X)
  No leakage
  Read access time in the same range as DRAM, or at least close
• But limited write endurance: 10 Mwrites? 100 Mwrites? 1 Gwrites?
ISCA 2009 (June)
• 3 papers on using PCM memories as main memory:
  Concentrated on showing that simple mechanisms would allow a PCM main memory to accommodate conventional applications for the computer lifetime
  Did not even notice the security breach
• Overwrite attack:
  Can just physically destroy the memory
  Can be run by any user without any privilege
  « I just want my machine to be replaced before the end of the 3-year warranty »
• Main memory should resist overwrite attacks for YEARS
[Figure: the memory controller performs PA-to-PCMA translation, mapping the physical address space onto the PCM address space of four PCM banks.]
Start-Gap scheme, Micro 2009 (Dec.)
• Still targeting « normal » user applications:
  Physical address to PCM address translation is dynamically changed at runtime
  Randomization to avoid « hot write cells » associated with spatial locality
  Security as a by-product of randomization
• First study to consider a possible malicious attack: Region-Based Start-Gap (RBSG) scheme
[Figure: memory controller performing PA-to-PCMA translation between the physical address space and the PCM address space (four PCM banks). The PCM address is invisible.]
Start-Gap Wear Leveling
Two registers (Start & Gap) + 1 line (GapLine) to support movement. Move GapLine every G writes to memory.
[Figure: lines A, B, C, D stored in PCM locations 0 to 4, with the Start and Gap pointers.]
PCMAddr = (Start + Addr); if (PCMAddr >= Gap) PCMAddr++
Storage overhead: less than 8 bytes (GapLine taken from spares)
Write overhead: one extra write every G writes, i.e. 1% (G=100)
Randomized address space to avoid “hot regions” and predictability
Courtesy of Moinuddin Qureshi
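The mechanism on this slide can be sketched as a small Python model. This is only an illustration under stated assumptions, not the Micro 2009 implementation: the class name `StartGap`, the `wear` counters, the `mod N` on the translated address and the wrap-around step (moving the last line back to location 0 when Gap reaches 0) are added here to make the sketch runnable; the slide itself only gives the two registers, the GapLine and the translation rule.

```python
class StartGap:
    """Toy model of Start-Gap wear leveling: N lines stored in N + 1 PCM locations."""

    def __init__(self, n_lines, gap_interval):
        self.n = n_lines                  # number of logical lines
        self.g = gap_interval             # move the gap every G demand writes
        self.start = 0                    # Start register
        self.gap = n_lines                # Gap register: index of the empty location
        self.mem = [None] * (n_lines + 1)
        self.wear = [0] * (n_lines + 1)   # writes received by each PCM location
        self.writes = 0

    def translate(self, addr):
        # Translation rule from the slide, plus an assumed wrap-around (mod N).
        pcm_addr = (addr + self.start) % self.n
        if pcm_addr >= self.gap:
            pcm_addr += 1
        return pcm_addr

    def _move_gap(self):
        if self.gap == 0:
            # Assumed wrap-around: the last location moves to location 0,
            # the gap returns to the end and Start advances by one.
            src, dst = self.n, 0
            self.start = (self.start + 1) % self.n
            self.gap = self.n
        else:
            # The line just before the gap moves into the gap.
            src, dst = self.gap - 1, self.gap
            self.gap -= 1
        self.mem[dst] = self.mem[src]
        self.mem[src] = None
        self.wear[dst] += 1               # the extra write induced by the move

    def write(self, addr, value):
        pcm_addr = self.translate(addr)
        self.mem[pcm_addr] = value
        self.wear[pcm_addr] += 1
        self.writes += 1
        if self.writes % self.g == 0:
            self._move_gap()

    def read(self, addr):
        return self.mem[self.translate(addr)]


if __name__ == "__main__":
    sg = StartGap(n_lines=4, gap_interval=1)
    for a, v in enumerate("ABCD"):
        sg.write(a, v)
    for _ in range(100):                  # hammer a single address
        sg.write(0, "A")
    print([sg.read(a) for a in range(4)], sg.wear)
```

Hammering one address spreads the writes over all locations, but the location sequence is fully predictable, which is what the attacks discussed in the following slides exploit.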
The security of RBSG
• W: the write endurance
• On a given region of S blocks, the PA-to-PCMA address translation of one block is changed every Gap writes: this induces an extra PCM block write
• For a given physical block, the PA-to-PCMA translation is guaranteed to change every Gap*S writes, with $Gap \times S < W$
• For a given physical block, the PA-to-PCMA translation is periodic with period $Gap \times S^2$
• $Gap \times S^2$ is long
RBSG (Micro 2009)
• W = 32M
• S = 256K blocks, Gap = 100
• 4 GHz, write access time of 4K cycles: 1 Mwrite/s
  Basing security on low write bandwidth (256 Mbytes/s)?
• Resists overwriting the same physical block for 4 months (77 days by my count!!)
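The orders of magnitude on this slide can be cross-checked with a short calculation. The sketch below assumes that K and M denote 2^10 and 2^20 and that the write rate is exactly 10^6 writes per second; both are assumptions, and the variable names are illustrative. It computes the number of writes before a block is remapped (Gap*S) and the duration of one full translation period (Gap*S^2), which comes out close to 80 days under these assumptions. It does not try to reproduce the exact 77-day count.

```python
W = 32 * 2**20                  # write endurance per block (32M writes), assuming M = 2**20
S = 256 * 2**10                 # region size in blocks (256K), assuming K = 2**10
GAP = 100                       # one line moved every GAP writes
WRITES_PER_SECOND = 1_000_000   # 4 GHz / 4K cycles, rounded to 1 Mwrite/s

# Writes a physical block can receive before its translation must change.
writes_before_remap = GAP * S

# Writes needed before the translation of a block returns to the same PCM block.
period_in_writes = GAP * S**2
period_in_days = period_in_writes / WRITES_PER_SECOND / 86_400

if __name__ == "__main__":
    print(f"Gap*S   = {writes_before_remap:,} writes (W = {W:,})")
    print(f"Gap*S^2 = {period_in_writes:,} writes, about {period_in_days:.1f} days")
```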
Birthday Paradox Attack (BPA)
• In a group of 24 people, it is likely (p > 1/2) that at least two share the same birthday.
• In a sequence of 9645 randomly selected elements from a set of 64M memory blocks, it is likely that one element appears twice.
Micro 2009 RBSG hypothesis + 4 GB/s write bandwidth:
  Should resist 4 years at full bandwidth
  + interleaving 16 sequences of 32M writes on 16 different addresses: 4 1/2 hours of write endurance (first failure)
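The two birthday-paradox figures above can be checked numerically. The sketch below computes the exact probability of a repeat among uniform random draws, plus the usual square-root approximation of the number of draws giving p = 1/2. The function names are illustrative, and 64M is taken as 64 * 2^20 blocks (an assumption).

```python
import math


def collision_probability(n_draws, n_items):
    """Probability that n_draws uniform picks among n_items contain a repeat."""
    if n_draws > n_items:
        return 1.0
    # log of P(no repeat) = sum over i of log(1 - i / n_items)
    log_no_repeat = sum(math.log1p(-i / n_items) for i in range(n_draws))
    return -math.expm1(log_no_repeat)


def half_collision_draws(n_items):
    """Approximate number of draws for which a repeat has probability 1/2."""
    return math.sqrt(2 * n_items * math.log(2))


if __name__ == "__main__":
    print(collision_probability(24, 365))
    print(collision_probability(9645, 64 * 2**20))
    print(half_collision_draws(64 * 2**20))
```

With 9645 draws among 64M blocks the probability sits right at the 1/2 threshold, which is why so few target blocks are enough for the attack.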
Sandbagging RBSG against BPA
• Reduce region size S, reduce Gap: S*Gap << W
  S=128K, Gap=64: optimized BPA 11.5 days, RAA 48 days
  S=64K, Gap=64: optimized BPA 97 days, RAA 24 days
BUT ...
Combined BPA-RAA
1/16th of the bandwidth for RAA, 15/16th for BPA
• S=64K, Gap=64: 14.25 days
• S=256K, Gap=8: 61 days, but 10% write overhead
But no page mode?
RBSG + page mode
• The PA-to-PCMA translation granularity is a page
  4KB pages: write overhead of 16 blocks
  Gap=128 (12.5% write overhead), S=32K pages: 4 1/2 days
And spare lines?
• Main memories are implemented with spare blocks to provide some permanent fault tolerance.
  Any spare line can replace any memory line
  Gap=100, 64K spares, no page mode: RAA-BPA: 51 days
Spare lines + page mode
Gap=128:
  1K spares: 7.75 days, S=32K pages
  64K spares: 16 days, S=64K pages
+ Endurance = 128M writes:
  1K spares: 65 days, S=128K pages
  64K spares: 110 days, S=128K pages
S-PCM memory
• Security as a first-class citizen
• Should resist attacks for a sizeable fraction of the expected lifetime
Principles for a secure PCM main memory
• Invisible PA-to-PCMA translation:
  A malicious user cannot figure out the PA-to-PCMA translation
• Complete « randomization » of the PA-to-PCMA translation changes:
  Any physical block could be mapped onto any PCM block
  Defeats RAA
• Frequent changes of the PA-to-PCMA translation:
  Defeats BPA
  Experimentally, the translation change frequency must be much higher than 1/W to reach 50% of the expected memory lifetime (256/W in practice)
Implementation principles
• Use of a PA-to-PCMA translation table:
  One entry for a region of $R = 2^r$ blocks
  A physical region is mapped onto a PCM region
  A block can be mapped onto any block in the target region
• PA-to-PCMA translation change:
  Only on writes
  Randomly triggered with frequency F
  No counter: only a random number generator
  Swaps two PA-to-PCMA translations
Some implementation constraints
• A region must be larger than a page
  16 GB memory, 4KB pages: 4M pages ...
  Regions should be large:
    256KB: 64K entries
    4MB: 4K entries
• A PA-to-PCMA translation change induces 2R memory block reads and 2R memory block writes:
  To limit write overhead, the frequency F should be limited
Dealing with the constraints
• W = 32M, 16 GB memory, 256-byte blocks
• 1 extra write per 8 writes
• F = 256/W: 50% of total write endurance
  Extra write bandwidth: 2S*F = 1/8
  S = 8K blocks: 8K 26-bit translation table entries
  – 26 Kbytes, not a huge table!!
  52% of total write endurance
  4 GB/s: 2 years of endurance against BPA or RAA
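The sizing on this slide follows from simple arithmetic, sketched below. The function names are illustrative, and the sketch assumes that a table entry holds a region address plus a displacement inside the region, so that its width equals the number of bits of a block address (26 bits for 64M blocks); K, M and G are taken as powers of two.

```python
def translation_table(memory_bytes, block_bytes, region_blocks):
    """Return (number of entries, bits per entry, table size in bytes)."""
    n_blocks = memory_bytes // block_bytes
    n_regions = n_blocks // region_blocks
    # Assumed entry layout: region address + displacement = one full block address.
    entry_bits = (n_blocks - 1).bit_length()
    return n_regions, entry_bits, n_regions * entry_bits / 8


def extra_write_fraction(region_blocks, swap_freq):
    """Extra writes per demand write: a swap rewrites 2 regions of S blocks."""
    return 2 * region_blocks * swap_freq


if __name__ == "__main__":
    W = 32 * 2**20                       # write endurance
    S = 8 * 2**10                        # region size in blocks
    print(translation_table(16 * 2**30, 256, S))
    print(extra_write_fraction(S, 256 / W))
```

With the same assumptions, 4K-block and 64K-block regions give tables of 52 Kbytes and 3.25 Kbytes, the sizes quoted on the endurance slides.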
Initializing the translation table
• The translation table has to set up a one-to-one mapping
  Boot-time initialization? With a « random » mapping?
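[Figure: translation of block X in physical region B through the table entry T(B), with fields T(B).addr (region) and T(B).disp (displacement), from the physical memory address space to the PCM address space. Figure labels: R_init, D_init, « Initialized at boot-time », « Initialized with zeros at boot-time ».]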
Swapping two translations blocks
• T(A).addr = oldT(B).addr BA
• T(A).disp = oldT(A).addr RAND
• T(B).disp = oldT(B).addr RAND
Randomizing the displacement is needed to avoid attacks on a fixed position in the region
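To make the table-based translation and the swap more concrete, here is a toy Python model. It is a sketch under several assumptions, not the scheme as specified on the slides, whose update formulas are partly lost in this transcript: the displacement is combined with the block offset by XOR, the written region is swapped with a uniformly chosen other region, both displacements are redrawn at random on a swap, and the table starts from a shuffled mapping with zero displacements. The class and method names are hypothetical.

```python
import random


class SecurePCMSketch:
    """Toy model: one table entry (addr, disp) per region of R blocks."""

    def __init__(self, n_regions, region_blocks, swap_freq, seed=0):
        assert region_blocks & (region_blocks - 1) == 0, "R must be a power of two"
        self.rng = random.Random(seed)
        self.n_regions = n_regions
        self.r = region_blocks
        self.f = swap_freq                     # probability of a swap per write
        self.addr = list(range(n_regions))     # assumed boot-time state:
        self.rng.shuffle(self.addr)            # shuffled one-to-one region mapping
        self.disp = [0] * n_regions            # and zero displacements
        self.pcm = [[None] * region_blocks for _ in range(n_regions)]
        self.wear = [[0] * region_blocks for _ in range(n_regions)]

    def translate(self, pa):
        # Table read + XOR (XOR is an assumption of this sketch).
        region, offset = divmod(pa, self.r)
        return self.addr[region], offset ^ self.disp[region]

    def read(self, pa):
        row, col = self.translate(pa)
        return self.pcm[row][col]

    def _store(self, pa, value):
        row, col = self.translate(pa)
        self.pcm[row][col] = value
        self.wear[row][col] += 1

    def _swap(self, a, b):
        # 2R block reads: save the contents of both regions.
        saved = {reg: [self.read(reg * self.r + x) for x in range(self.r)]
                 for reg in (a, b)}
        self.addr[a], self.addr[b] = self.addr[b], self.addr[a]
        for reg in (a, b):
            self.disp[reg] = self.rng.randrange(self.r)
        # 2R block writes: store both regions under the new translation.
        for reg in (a, b):
            for x, value in enumerate(saved[reg]):
                self._store(reg * self.r + x, value)

    def write(self, pa, value):
        self._store(pa, value)
        # No counter: only a random number generator triggers the change.
        if self.rng.random() < self.f:
            a = pa // self.r
            b = self.rng.randrange(self.n_regions)
            if b != a:
                self._swap(a, b)


if __name__ == "__main__":
    mem = SecurePCMSketch(n_regions=8, region_blocks=4, swap_freq=0.25, seed=1)
    for pa in range(32):
        mem.write(pa, pa)
    for _ in range(2000):                      # overwrite attack on one block
        mem.write(5, 5)
    print(max(max(row) for row in mem.wear))
```

In this model the attacked block keeps landing on freshly chosen PCM cells, so the wear spreads without the attacker being able to predict the next location.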
Managing region swaps
• Large regions have to be swapped on PA-to-PCMA translation changes:
  Normal reads and writes should not be stopped
  Randomly triggered PA-to-PCMA translation changes
• The memory controller must interleave normal access flows with region swapping:
  In practice, a random priority biased towards the normal access flow limits the buffer of regions to be swapped.
Endurance of the secure PCM memory
• 16 GB memory, 256B blocks, 4K-block regions: 52 Kbytes translation table
Expected lifetime under attack:
Write overhead \ Endurance    32M    64M    128M   256M
3.125%                        42%    53%    66%    74%
12.5%                         62%    69%    74%    79%
Endurance of the secure PCM memory
• 16 GB memory, 256B blocks, 64K-block regions: 3.25 Kbytes translation table
Expected lifetime under attack:
Write overhead \ Endurance    32M               64M     128M    256M
3.125%                        3 min             0.4%    7.4%    19%
12.5%                         7.4% (3 months)   19%     38%     51%
Annotation: 2 years
And « normal » applications?
• Region swap after 1/F writes (on average)
• In a swap interval:
  Malicious attacks: one block gets 1/F writes, the other blocks no writes
  « Normal » applications: a total of 1/F writes on different blocks in the same region
• For a single PCM block: the swap frequency is much higher than F
• Endurance is very close to theoretical
S-PCM
• + Years of endurance
• + Address translation:
– Table read + XOR
• - Hardware logic for region swapping
RBSG
• - Days of endurance
• - Address Translation:
– 1st logic + table read + 2nd logic
• + Simple logic for page moving
Conclusion
• If PCM technology delivers, then secure PCM main memory will be possible
• Wear leveling comes for free with security
• Main overhead costs:
  Hardware logic to interleave region swapping with the normal access flow
  Random number generator
  Will fix write overhead to less than 1% for « normal » workloads (just adapt ideas from Moinuddin)
• No need for « monstrous » cell endurance
Disclaimer
There might be other forms of attacks:
  Probably not on the scheme itself:
  • randomization is quite a good defense
  Side-channel attacks against specific hardware implementations:
  E.g. concentrating the attack on a single bank
repeat A (x N), Random (x M)
With Moinuddin’s parameters: N=84, M=1792, Gap=min(128, d), LRU stack of 4 entries
Same block written 22M times before the PA-to-PCMA translation change
+ BPA: 7 days and that is it!!
But that might be corrected
• Decrease the gap factor: Gap = min(128, d/32), 3.5M consecutive writes
• Decrease the region size: Gap = min(128, d), 512K regions, 2.75M consecutive writes
Concern
• Each new attack generates a new countermeasure:
  Extra hardware complexity
  New opportunity for new attacks
  Possibility of snowball effects