1

Towards Phase Change Memory as a Secure Main Memory

André Seznec

IRISA/INRIA

2

Phase Change Memories: the technology promises

• Non-volatile RAM:
– More scalable than DRAM (up to 4X)
– No leakage
– Read access time in the same range as DRAM, or at least close

• But limited write endurance: 10M writes? 100M writes? 1G writes?

3

ISCA 2009 (June)

• 3 papers on using PCM as main memory:
– Concentrate on showing that simple mechanisms would allow a PCM main memory to accommodate conventional applications for the computer's lifetime
– Did not even notice the security breach

• Overwrite attack:
– Can simply physically destroy the memory
– Can be run by any user without any privilege
– « I just want my machine to be replaced before the end of the 3-year warranty »

Main memory should resist overwrite attacks for YEARS

4

Memory Controller: PA-to-PCMA translation

[Figure: the memory controller maps the physical address space onto the PCM address space, which is spread over four PCM banks]

5

Start-Gap scheme, Micro 2009 (dec)

• Still targeting « normal » user applications:
– Physical address to PCM address translation is dynamically changed at runtime
– Randomization to avoid « hot write cells » associated with spatial locality
– Security as a by-product of randomization

• First study to consider a possible malicious attack: Region-Based Start-Gap (RBSG) scheme

6

Memory Controller: PA-to-PCMA translation

[Figure: the memory controller maps the physical address space onto the PCM address space, which is spread over four PCM banks]

PCM address is invisible

7

Start-Gap Wear Leveling

Two registers (Start & Gap) + 1 line (GapLine) to support movement. Move GapLine every G writes to memory.

[Figure: five PCM lines (0 to 4) holding blocks A, B, C and D plus the gap line, with the START and GAP pointers]

PCMAddr = (Start + Addr) mod N; if (PCMAddr >= Gap) PCMAddr++ (N = number of lines, not counting the gap line)

Storage overhead: less than 8 bytes (GapLine taken from spares)
Write overhead: one extra write every G writes, i.e. 1% (G=100)

Randomized address space to avoid “hot region” and predictability

Courtesy of Moinuddin Qureshi
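The mapping and the gap movement described on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions, not the hardware design: the class name `StartGap`, the Python list standing in for PCM lines, and the wrap-around step (copy the last line into line 0, reset Gap, increment Start) follow the usual description of Start-Gap and are not taken from the slide. The randomization layer mentioned above is left out.

```python
class StartGap:
    """Toy model of Start-Gap wear leveling over n_lines lines plus one gap line."""

    def __init__(self, n_lines, gap_interval):
        self.n = n_lines
        self.g = gap_interval           # move the gap every g writes
        self.start = 0
        self.gap = n_lines              # the spare line is the initial gap
        self.lines = [None] * (n_lines + 1)
        self.writes = 0

    def translate(self, addr):
        """PCMAddr = (Start + Addr) mod N, skipping over the gap line."""
        pcm_addr = (self.start + addr) % self.n
        if pcm_addr >= self.gap:
            pcm_addr += 1
        return pcm_addr

    def _move_gap(self):
        if self.gap == 0:
            # Assumed wrap-around step: the last line moves to line 0,
            # the gap returns to the end and Start advances by one.
            self.lines[0] = self.lines[self.n]
            self.lines[self.n] = None
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # Copy the line just before the gap into the gap (one extra write).
            self.lines[self.gap] = self.lines[self.gap - 1]
            self.lines[self.gap - 1] = None
            self.gap -= 1

    def write(self, addr, value):
        self.lines[self.translate(addr)] = value
        self.writes += 1
        if self.writes % self.g == 0:
            self._move_gap()

    def read(self, addr):
        return self.lines[self.translate(addr)]
```

With `gap_interval=100` the sketch performs one extra line copy per 100 writes, which corresponds to the 1% write overhead quoted on the slide.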

8

The security of RBSG

• W: the write endurance

• On a given region of S blocks, the PA-to-PCMA address translation of one block is changed every Gap writes: this induces an extra PCM block write

• For a given physical block, the PA-to-PCMA translation is guaranteed to change every Gap*S writes: Gap*S < W

• For a given physical block, the PA-to-PCMA translation is periodic with period Gap*S^2: Gap*S^2 is long

9

RBSG (Micro 2009)

• W = 32M

• S = 256K blocks, Gap = 100

• 4 GHz, write access time of 4K cycles: 1M writes/s
– Basing security on low write bandwidth (256 Mbytes/s)?

• Resists overwriting the same physical block for 4 months (77 days by my count!!)
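The figures on this slide can be cross-checked with a short calculation. The sketch below is a back-of-the-envelope check only: the 256-byte block size is an assumption (it is what makes 1M writes/s correspond to roughly 256 Mbytes/s), "K" and "M" are read as powers of two, and the variable names are illustrative. It does not attempt to reproduce the 77-day estimate.

```python
CLOCK_HZ = 4e9                 # 4 GHz
WRITE_CYCLES = 4 * 1024        # write access time: 4K cycles
BLOCK_BYTES = 256              # assumed block size (not stated on this slide)
W = 32 * 2**20                 # write endurance: 32M writes
S = 256 * 2**10                # region size: 256K blocks
GAP = 100

# Sustained write rate when writes are fully serialized.
writes_per_second = CLOCK_HZ / WRITE_CYCLES
bandwidth_bytes_per_second = writes_per_second * BLOCK_BYTES

# Writes to a region before the translation of a given block must change,
# and the full period of the translation (Gap*S and Gap*S^2).
writes_between_changes = GAP * S
period_writes = GAP * S * S

print(f"write rate: {writes_per_second:,.0f} writes/s")
print(f"bandwidth:  {bandwidth_bytes_per_second / 1e6:,.0f} MB/s")
print(f"Gap*S = {writes_between_changes:,} (< W: {writes_between_changes < W})")
print(f"Gap*S^2 = {period_writes:,} writes")
```

Under these assumptions Gap*S stays below W, which is the condition RBSG relies on.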

10

Birthday Paradox Attack (BPA)

• In a group of 24 people, it is likely (p > 1/2) that at least two share the same birthday.

• In a sequence of 9645 elements randomly selected from a set of 64M memory blocks, it is likely that the same element appears twice.

Micro 2009 RBSG hypotheses + 4 GB/s write bandwidth:
– Should resist 4 years at full bandwidth
– Interleaving 16 sequences of 32M writes on 16 different addresses: 4 1/2 hours of write endurance (first failure)
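The birthday bound quoted above can be checked numerically. The sketch below assumes uniform, independent draws; the function names are illustrative. It gives both the usual approximation sqrt(2 N ln 2) and an exact count obtained by multiplying out the probability that all draws are distinct.

```python
import math


def approx_birthday_threshold(n_items):
    """Approximate number of uniform draws for a ~50% chance of a repeat."""
    return math.sqrt(2 * n_items * math.log(2))


def exact_birthday_threshold(n_items):
    """Smallest number of draws whose probability of a repeat exceeds 1/2."""
    p_all_distinct = 1.0
    draws = 0
    while p_all_distinct >= 0.5:
        # The next draw must avoid the `draws` items already seen.
        p_all_distinct *= 1.0 - draws / n_items
        draws += 1
    return draws


print(exact_birthday_threshold(365))          # 23 people suffice for birthdays
print(approx_birthday_threshold(64 * 2**20))  # about 9645 for 64M blocks
```

For 64M blocks the approximation evaluates to about 9645, in line with the figure on the slide; the exact count lands within a draw or two of it.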

11

Sandbagging RBSG against BPA

• Reduce region size S, reduce Gap: S*Gap << W
– S=128K, Gap=64: optimized BPA 11.5 days, RAA (Repeat Address Attack) 48 days
– S=64K, Gap=64: optimized BPA 97 days, RAA 24 days

BUT ...

12

Combined BPA-RAA

1/16th of the bandwidth for RAA, 15/16th for BPA

• S=64K, Gap=64: 14.25 days

• S=256K, Gap=8: 61 days, but 10% write overhead

But no page mode?

13

RBSG + page mode

• The PA-to-PCMA translation granularity is a page:
– 4KB pages: write overhead of 16 blocks
– Gap=128 (12.5% write overhead), S=32K pages: 4 1/2 days

14

And spare lines?

• Main memories are implemented with spare blocks to get some tolerance to permanent faults.
– Any spare line can replace any memory line
– Gap=100, 64K spares, no page mode: RAA-BPA: 51 days

15

Spare lines + page mode

Gap=128:
– 1K spares: 7.75 days, S=32K pages
– 64K spares: 16 days, S=64K pages

+ Endurance = 128M writes:
– 1K spares: 65 days, S=128K pages
– 64K spares: 110 days, S=128K pages

16

Still want to use PCM main memory and guarantee the hardware for 3 years?

17

Or

18

S-PCM memory

• Security as a first-class citizen

• Should resist attacks for a sizeable fraction of the expected lifetime

19

Principles for a secure PCM main memory

• Invisible PA-to-PCMA translation: a malicious user cannot figure out the PA-to-PCMA translation

• Complete « randomization » of the PA-to-PCMA translation changes:
– Any physical block could be mapped onto any PCM block
– Defeats RAA

• Frequent changes of the PA-to-PCMA translation:
– Defeats BPA
– Experimentally, the translation change frequency must be much higher than 1/W to reach 50% of the expected memory lifetime (256/W in practice)

20

Implementation principles

• Use of a PA-to-PCMA translation table:
– One entry for a region of R = 2^r blocks
– A physical region is mapped onto a PCM region
– A block can be mapped onto any block in the target region

• PA-to-PCMA translation change:
– Only on writes
– Randomly triggered with frequency F
– No counter: only a random number generator
– Swap two PA-to-PCMA translations

21

Some implementation constraints

• A region must be larger than a page:
– 16 GB memory, 4KB pages: 4M pages ...
– Regions should be large: 256KB regions give 64K entries, 4MB regions give 4K entries

• A PA-to-PCMA translation change induces 2R memory block reads and 2R memory block writes:
– To limit the write overhead, the frequency F should be limited

22

Dealing with the constraints

• W = 32M, 16 GB memory, 256-byte blocks

• 1 extra write per 8 writes

• F = 256/W: 50% of total write endurance
– Extra write bandwidth: 2S*F = 1/8
– S = 8K blocks
– 8K 26-bit translation table entries: 26 Kbytes, not a huge table!!
– 52% of total write endurance
– 4 GB/s: 2 years of endurance to BPA or RAA
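The numbers on this slide follow from simple arithmetic, sketched below. Several points are assumptions made for the illustration: "K", "M" and "G" are read as powers of two, each table entry is taken to hold a full block address (region number plus displacement, 26 bits for 64M blocks), and the ideal lifetime assumes perfectly even wear at full write bandwidth. The function names are hypothetical.

```python
MEM_BYTES = 16 * 2**30        # 16 GB memory
BLOCK_BYTES = 256             # 256-byte blocks
W = 32 * 2**20                # write endurance: 32M writes per block
WRITE_BW = 4 * 2**30          # 4 GB/s write bandwidth (assumed power of two)
SECONDS_PER_YEAR = 365.25 * 86400


def spcm_sizing(region_blocks, swap_freq):
    """Table size and write overhead for a given region size and frequency F."""
    n_blocks = MEM_BYTES // BLOCK_BYTES
    n_regions = n_blocks // region_blocks
    entry_bits = n_blocks.bit_length() - 1    # log2(n_blocks), assumed entry width
    return {
        "entries": n_regions,
        "entry_bits": entry_bits,
        "table_bytes": n_regions * entry_bits / 8,
        # Each swap rewrites two whole regions: 2 * S * F extra writes per write.
        "write_overhead": 2 * region_blocks * swap_freq,
    }


def ideal_lifetime_years():
    """Lifetime if every block absorbed exactly W writes at full bandwidth."""
    n_blocks = MEM_BYTES // BLOCK_BYTES
    writes_per_second = WRITE_BW / BLOCK_BYTES
    return n_blocks * W / writes_per_second / SECONDS_PER_YEAR


sizing = spcm_sizing(region_blocks=8 * 2**10, swap_freq=256 / W)
print(sizing)
print(f"ideal lifetime: {ideal_lifetime_years():.2f} years")
print(f"52% of ideal:   {0.52 * ideal_lifetime_years():.2f} years")
```

Under these assumptions the sketch gives 8K entries of 26 bits (26 Kbytes), a write overhead of 1/8, and a bit over two years at 52% of the ideal lifetime, consistent with the slide.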

23

Initializing the translation table

• The translation table must define a one-to-one mapping

Boot-time initialization? With a « random » mapping?

24

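[Figure: a physical memory address (region B, displacement X) is translated into a PCM address (address, disp) through table entry T(B), with fields T(B).addr and T(B).disp and initial values R_init and D_init; figure callouts: "Initialized at boot-time" and "Initialized with zeros at boot-time"]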

25

Swapping two translation blocks

• T(A).addr = oldT(B).addr

• T(A).disp = oldT(A).addr RAND

• T(B).disp = oldT(B).addr RAND

Randomizing the displacement is needed to avoid attacks on a fixed position in the region
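The randomly triggered region swap can be illustrated with a small simulation. This is a sketch under stated assumptions, not the author's hardware: the class name `SPCMSketch` is hypothetical, the displacement is applied with XOR (the comparison slide later lists "Table read + XOR" for address translation), the swap partner is a uniformly random region, the table starts as the identity instead of a random boot-time mapping, and region sizes are powers of two.

```python
import random


class SPCMSketch:
    """Toy model: one table entry (addr, disp) per region, swaps on random writes."""

    def __init__(self, n_regions, region_blocks, swap_prob, seed=0):
        self.n = n_regions
        self.r = region_blocks                 # assumed to be a power of two
        self.f = swap_prob                     # frequency F of translation changes
        self.rng = random.Random(seed)
        self.addr = list(range(n_regions))     # identity mapping for simplicity
        self.disp = [0] * n_regions
        self.pcm = [[None] * region_blocks for _ in range(n_regions)]
        self.extra_writes = 0

    def translate(self, region, offset):
        """Table read + XOR (the XOR displacement is an assumption)."""
        return self.addr[region], offset ^ self.disp[region]

    def read(self, region, offset):
        pcm_region, pcm_offset = self.translate(region, offset)
        return self.pcm[pcm_region][pcm_offset]

    def write(self, region, offset, value):
        pcm_region, pcm_offset = self.translate(region, offset)
        self.pcm[pcm_region][pcm_offset] = value
        # No counter: a random draw decides whether this write triggers a swap.
        if self.rng.random() < self.f:
            self._swap(region, self.rng.randrange(self.n))

    def _swap(self, a, b):
        if a == b:
            return
        # 2R block reads ...
        data_a = [self.read(a, o) for o in range(self.r)]
        data_b = [self.read(b, o) for o in range(self.r)]
        # ... exchange the two translations and draw fresh displacements ...
        self.addr[a], self.addr[b] = self.addr[b], self.addr[a]
        self.disp[a] = self.rng.randrange(self.r)
        self.disp[b] = self.rng.randrange(self.r)
        # ... then 2R block writes under the new translations.
        for o in range(self.r):
            region_a, offset_a = self.translate(a, o)
            self.pcm[region_a][offset_a] = data_a[o]
            region_b, offset_b = self.translate(b, o)
            self.pcm[region_b][offset_b] = data_b[o]
        self.extra_writes += 2 * self.r
```

Because each swap costs 2R extra block writes, the average write overhead of this model is about 2 * R * F, which is why the frequency F has to be bounded.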

26

Managing region swaps

• Large regions have to be swapped on PA-to-PCMA translation changes:
– Normal reads and writes should not be stopped
– Randomly triggered PA-to-PCMA translation changes

• The memory controller must interleave normal access flows with region swapping:
– In practice, a random priority biased toward the normal access flow limits the buffer of regions to be swapped.

27

Endurance of the secure PCM memory

• 16 GB memory, 256 B blocks, 4K-block regions: 52 Kbytes translation table

Expected lifetime under attack:

                  Endurance
Write overhead    32M     64M     128M    256M
3.125%            42%     53%     66%     74%
12.5%             62%     69%     74%     79%

28

Endurance of the secure PCM memory

• 16 GB memory, 256 B blocks, 64K-block regions: 3.25 Kbytes translation table

Expected lifetime under attack:

                  Endurance
Write overhead    32M                64M     128M    256M
3.125%            3 min              0.4%    7.4%    19%
12.5%             7.4% (3 months)    19%     38%     51%

(additional callout on the slide: 2 years)

29

And « normal » applications?

• Region swap after 1/F writes (on average)

• In a swap interval:
– Malicious attacks: one block gets 1/F writes, the other blocks get no writes
– « Normal » applications: a total of 1/F writes on different blocks in the same region

For a single PCM block: swap frequency is much higher than F

Endurance is very close to theoretical

30

S-PCM

• + Years of endurance

• + Address translation:

– Table read + XOR

• - Hardware logic for region swapping

RBSG

• - Days of endurance

• - Address Translation:

– 1st logic + table read + 2nd logic

• + Simple logic for page moving

31

Conclusion

• If PCM technology delivers, then secure PCM main memory will be possible

• Wear leveling comes for free with security

• Main overhead costs:
– Hardware logic to interleave region swapping with normal access flow
– Random number generator
– Will fix write overhead to less than 1% for « normal » workloads (just adapt ideas from Moinuddin)

• No need for « monstrous » cell endurance

32

Disclaimer

There might be other forms of attacks:
– Probably not on the scheme by itself: randomization is quite a good defense
– Side-channel attacks against specific hardware implementations: e.g. concentrating the attack on a single bank

33

An attack against Moinuddin’s new scheme

34

repeat: A (x N), Random (x M)

With Moinuddin’s parameters: N=84, M=1792, Gap = min(128, d), LRU stack of 4 entries

Same block written 22M times before the PA-to-PCMA translation changes

+ BPA: 7 days and that is it!!

35

But that might be corrected

• Decrease the gap factor: Gap = Min(128, d/32), 3.5M consecutive writes

• Decrease the region size: Gap = Min(128, d), 512K regions, 2.75M consecutive writes

36

Concern

• Each new attack generates a new countermeasure:
– Extra hardware complexity
– New opportunity for new attacks
– Possibility of snowball effects

37

New attack opportunities

• Decrease the gap factor: Gap = Min(128, d/32), 3.5M consecutive writes
– Combined with an RAA: 4 months

• Decrease the region size: Gap = Min(128, d), 512K-block regions, 2.75M consecutive writes
– RAA is improved by an 8x factor