48
Tolerating and Correcting Memory Errors in C and C++ Ben Zorn Microsoft Research In collaboration with: Emery Berger and Gene Novark, UMass - Amherst Karthik Pattabiraman, UIUC Vinod Grover and Ted Hart, Microsoft Research Ben Zorn, Microsoft Research 1 Tolerating and Correcting Memory Errors in C and C++

Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Tolerating and Correcting

Memory Errors in C and C++

Ben ZornMicrosoft Research

In collaboration with:

Emery Berger and Gene Novark, UMass - Amherst

Karthik Pattabiraman, UIUC

Vinod Grover and Ted Hart, Microsoft Research

Ben Zorn, Microsoft Research 1Tolerating and Correcting Memory Errors in C and C++

Page 2: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Buffer overflow

char *c = malloc(100);

c[100] = ‘a’;

Dangling reference

char *p1 = malloc(100);

char *p2 = p1;

free(p1);

p2[0] = ‘x’;

a

Focus on Heap Memory Errors

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 2

c

0 99

p1

0 99

p2

x

Page 3: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

Approaches to Memory Corruptions

Rewrite in a safe language

Static analysis / safe subset of C or C++

SAFECode [Adve], PREfix, SAL, etc.

Runtime detection, fail fast

Jones & Lin, CRED [Lam], CCured [Necula], etc.

Tolerate Corruption and Continue

Failure oblivious [Rinard] (unsound)

Rx, Boundless Memory Blocks, ECC memory

DieHard / Exterminator, Samurai

3Tolerating and Correcting Memory Errors in C and C++

Page 4: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Fault Tolerance and Platforms

Platforms necessary in computing ecosystem

Extensible frameworks provide lattice for 3rd parties

Tremendously successful business model

Examples: Window, iPod, browser, etc.

Platform power derives from extensibility

Tension between isolation for fault tolerance,

integration for functionality

Platform only as reliable as weakest plug-in

Tolerating bad plug-ins necessary by design

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 4

Page 5: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Research Vision

Increase robustness of installed code base

Potentially improve millions of lines of code

Minimize effort – ideally no source mods, no

recompilation

Reduce requirement to patch

Patches are expensive (detect, write, deploy)

Patches may introduce new errors

Enable trading resources for robustness

E.g., more memory implies higher reliability

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 5

Page 6: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

Outline

Motivation

Exterminator Collaboration with Emery Berger, Gene Novark

Automatically corrects memory errors

Suitable for large scale deployment

Critical Memory / Samurai Collaboration with Karthik Pattabiraman, Vinod Grover

New memory semantics

Source changes to explicitly identify and protect critical data

Conclusion

6Tolerating and Correcting Memory Errors in C and C++

Page 7: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

DieHard Allocator in a Nutshell

With Emery Berger (PLDI’06)

Existing heaps are packed tightly to minimize space Tight packing increases

likelihood of corruption

Predictable layout is easier for attacker to exploit

Randomize and overprovisionthe heap Expansion factor determines how

much empty space

Does not change semantics

Replication increases benefits

Enables analytic reasoning

7

Normal Heap

DieHard Heap

Ben Zorn, Microsoft Research

Tolerating and Correcting Memory Errors in

C and C++

Page 8: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

DieHard in Practice

DieHard (non-replicated) Windows, Linux version implemented by Emery Berger

Try it right now! (http://www.diehard-software.org/)

Adaptive, automatically sizes heap

Mechanism automatically redirects malloc calls to DieHard DLL

Application: Firefox & Mozilla Known buffer in version 1.7.3 overflow crashes browser

Experience Usable in practice – no perceived slowdown

Roughly doubles memory consumption with 2x expansion

FireFox: 20.3 Mbytes vs. 44.3 Mbytes with DieHard

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 8

Page 9: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

DieHard Caveats

Primary focus is on protecting heap

Techniques applicable to stack data, but requires

recompilation and format changes

Trades space, processors for memory safety

Not applicable to applications with large footprint

Applicability to server apps likely to increase

In replicated mode, DieHard requires determinism

Replicas see same input, shared state, etc.

DieHard is a brute force approach Improvements possible (efficiency, safety, coverage, etc.)

9Tolerating and Correcting Memory Errors in C and C++

Page 10: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Exterminator Motivation

DieHard limitations Tolerates errors probabilistically, doesn’t fix them

Memory and CPU overhead

Provides no information about source of errors

“Ideal” solution addresses the limitations Program automatically detects and fixes memory errors

Corrected program has no memory, CPU overhead

Sources of errors are pinpointed, easier for human to fix

Exterminator = correcting allocator Joint work with Emery Berger, Gene Novark

Plan: isolate / patch bugs while tolerating them

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 10

Page 11: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Exterminator Components

Architecture of Exterminator dictated by solving

specific problems

How to detect heap corruptions effectively?

DieFast allocator

How to isolate the cause of a heap corruption

precisely?

Heap differencing algorithms

How to automatically fix buggy C code without

breaking it?

Correcting allocator + hot allocator patches

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 11

Page 12: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

DieFast Allocator Randomized, over-provisioned heap

Canary = random bit pattern fixed at startup

Leverage extra free space by inserting canaries

Inserting canaries

Initialization – all cells have canaries

On allocation – no new canaries

On free – put canary in the freed object with prob. P

Checking canaries

On allocation – check cell returned

On free – check adjacent cells

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 12

100101011110

Page 13: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

1 2

Installing and Checking Canaries

Ben Zorn, Microsoft Research

Tolerating and Correcting Memory Errors in

C and C++ 13

Allocate Allocate

Install canaries

with probability PCheck canary Check canary

Free

Initially, heap full of canaries

1

Page 14: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Heap Differencing

Strategy

Run program multiple times with different randomized

heaps

If detect canary corruption, dump contents of heap

Identify objects across runs using allocation order

Insight: Relation between corruption and object

causing corruption is invariant across heaps

Detect invariant across random heaps

More heaps => higher confidence of invariant

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 14

Page 15: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

1 2

Attributing Buffer Overflows

Ben Zorn, Microsoft Research

Tolerating and Correcting Memory Errors in

C and C++ 15

One candidate!

4 3

corrupted

canary

Which object caused?

delta is constant but unknown?

12 4 3

Run 2

Run 1

Now only 2 candidates

2 4

41 3

Run 3

2 44

Precision increases exponentially with number of runs

Page 16: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Detecting Dangling Pointers (2 cases)

Dangling pointer read/written (easy)

Invariant = canary in freed object X has same

corruption in all runs

Dangling pointer only read (harder)

Sketch of approach (paper explains details)

Only fill freed object X with canary with probability P

Requires multiple trials: ≈ log2(number of callsites)

Look for correlations, i.e., X filled with canary => crash

Establish conditional probabilities

Have: P(callsite X filled with canary | program crashes)

Need: P(crash | filled with canary), guess “prior” to compute

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 16

Page 17: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Correcting Allocator

Group objects by allocation site

Patch object groups at allocate/free time

Associate patches with group

Buffer overrun => add padding to size request

malloc(32) becomes malloc(32 + delta)

Dangling pointer => defer free

free(p) becomes defer_free(p, delta_allocations)

Fixes preserve semantics, no new bugs created

Correcting allocation may != DieFast or DieHard

Correction allocator can be space, CPU efficient

“Patches” created separately, installed on-the-fly

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 17

Page 18: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Deploying Exterminator

Exterminator can be deployed in different modes

Iterative – suitable for test environment

Different random heaps, identical inputs

Complements automatic methods that cause crashes

Replicated mode

Suitable in a multi/many core environment

Like DieHard replication, except auto-corrects, hot patches

Cumulative mode – partial or complete deployment

Aggregates results across different inputs

Enables automatic root cause analysis from Watson dumps

Suitable for wide deployment, perfect for beta release

Likely to catch many bugs not seen in testing lab

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 18

Page 19: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

0

0.5

1

1.5

2

2.5

No

rmalized

Execu

tio

n T

ime

GNU libc Exterminator

allocation-intensive SPECint2000

DieFast Overhead

Ben Zorn, Microsoft Research

Tolerating and Correcting Memory Errors in

C and C++ 19

Page 20: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Exterminator Effectiveness

Squid web cache buffer overflow

Crashes glibc 2.8.0 malloc

3 runs sufficient to isolate 6-byte overflow

Mozilla 1.7.3 buffer overflow (recall demo)

Testing scenario - repeated load of buggy page

23 runs to isolate overflow

Deployed scenario – bug happens in middle of

different browsing sessions

34 runs to isolate overflow

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 20

Page 21: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

Outline

Motivation

Exterminator Collaboration with Emery Berger, Gene Novark

Automatically corrects memory errors

Suitable for large scale deployment

Critical Memory / Samurai Collaboration with Karthik Pattabiraman, Vinod Grover

New memory semantics

Source changes to explicitly identify and protect critical data

Conclusion

21Tolerating and Correcting Memory Errors in C and C++

Page 22: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

The Problem: A Dangerous Mix

Ben Zorn, Microsoft Research

Tolerating and Correcting Memory Errors in

C and C++ 22

Danger 1:

Flat, uniform

address space

0xFE00

0xADE0

int *p = 0xFE00;

*p = 555;

int A[1]; // at 0xADE0

A[1] = 777; // off by 1

My CodeDanger 2:

Unsafe

programming

languages

Danger 3:

Unrestricted

3rd party code

int *p = 0x8000;

*p = 888;

// forge pointer

// to my data

int *q = 0xADE0;

*q= 999;

Library Code

0x8000555

777

888

999Result: corrupt data, crashes

security risks

Page 23: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Critical Memory

Approach

Identify critical program data

Protect it with isolation & replication

Goals:

Harden programs from both SW and HW errors

Unify existing ad hoc solutions

Enable local reasoning about memory state

Leverage powerful static analysis tools

Allow selective, incremental hardening of apps

Provide compatibility with existing libraries, apps

Ben Zorn, Microsoft Research 23Tolerating and Correcting Memory Errors in C and C++

Page 24: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Critical Memory: Idea

Identify and mark some

data as “critical

Type specifier like const

Shadow critical data in

parallel address space

(critical memory)

New operations on

critical data

cload – read

cstore - write

24

critical int balance;

balance += 100;

if (balance < 0) {

chargeCredit();

} else {

// use x, y, etc.

}

balance

Datax, y,

other

non-critical

data critical

data

Ben Zorn, Microsoft Research

Tolerating and Correcting Memory Errors in

C and C++

Code

Page 25: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Critical Memory: Example

25

int buffer[10];

critical int balance ;map_critical(&balance);

temp1 = 100;

cstore(&balance, temp1);

temp = load ( (buffer+40));

store( (buffer+40), temp+200);

temp2 = cload(&balance);

if (temp2 > 0) { … }

balance = 100;

buffer[10] += 200;

…..

if (balance < 0) {

0

0

100

100

100

100

300

100

100

100

Normal

Mem

Critical

Mem

balance

Ben Zorn, Microsoft Research

Tolerating and Correcting Memory Errors in

C and C++

buffer

overflow

into

balance

Page 26: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Third-party Libraries/Untrusted Code

Library code does not need to be critical memory aware If library does not update

critical data, no changes required

If library needs to modify critical data Allow normal stores to

critical memory in library

Explicitly “promote” on return

Copy-in, copy-out semantics

critical int balance = 100;

library_foo(&balance);

promote balance;

__________________

// arg is not critical int *

void library_foo(int *arg)

{

*arg = 10000;

return;

}

Ben Zorn, Microsoft Research 26

Tolerating and Correcting Memory Errors in C and

C++

Page 27: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Samurai: Heap-based Critical Memory

Software critical memory for heap objects

Critical objects allocated with crit_malloc, crit_free

Approach

Replication – base copy + 2 shadow copies

Redundant metadata

Stored with base copy, copy in hash table

Checksum, size data for overflow detection

Robust allocator as foundation

DieHard, unreplicated

Randomizes locations of shadow copies

Ben Zorn, Microsoft Research 27Tolerating and Correcting Memory Errors in C and C++

Page 28: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Samurai Implementation

28

baseBase

Object

Replica 1

Replica 2

shadow pointer 2

shadow pointer 1

Heapregular store causes

memory error !

Vote

Critical load

checks 2 copies,

detects/repairs on

mismatch

•Two replicas

•Shadow pointers in

metadata

•Randomized to reduce

correlated errors

Update

Critical store writes

to all copies

Metadata

•Metadata protected

with checksums/backup

•Protection is only

probabilistic

Ben Zorn, Microsoft Research

Tolerating and Correcting Memory Errors in

C and C++

Page 29: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Samurai Experimental Results

Implementation

Automated Phoenix pass to instrument loads and stores

Runtime library for critical data allocation/de-allocation (C++)

Protected critical data in 5 applications (mostly SPEC)

Chose data that is crucial for end-to-end correctness of program

Evaluation of performance overhead by instrumentation

Fault-injections into critical and non-critical data (for propagation)

Protected critical data in libraries

STL List Class: Backbone of list structure (link pointers)

Memory allocator: Heap meta-data (object size + free list)

Ben Zorn, Microsoft Research 29Tolerating and Correcting Memory Errors in C and C++

Page 30: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Samurai Performance Overheads

30

1.03 1.08 1.01 1.08

2.73

0

0.5

1

1.5

2

2.5

3

vpr crafty parser rayshade gzip

Slo

wd

ow

n

Benchmark

Performance Overhead

Baseline Samurai

Ben Zorn, Microsoft Research

Tolerating and Correcting Memory Errors in

C and C++

Page 31: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Samurai: STL Class + WebServer

STL List Class

Modified memory

allocator for class

Modified member

functions insert, erase

Modified custom

iterators for list objects

Added a new call-back

function for direct

modifications to list

data

Webserver

Used STL list class for

maintaining client

connection information

Made list critical – one

thread/connection

Evaluated across

multiple threads and

connections

Max performance

overhead = 9%

31Ben Zorn, Microsoft Research

Tolerating and Correcting Memory Errors in

C and C++

Page 32: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Samurai: Protecting Allocator Metadata

32

0

20

40

60

80

100

120

140

espresso cfrac p2c Lindsay Boxed-Sim Mudlle Average

Performance Overheads

Kingsley Samurai

Ben Zorn, Microsoft Research

Tolerating and Correcting Memory Errors in

C and C++

Average = 10%

Page 33: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

Conclusion

Programs written in C / C++ can execute safely

and correctly despite memory errors

Research vision

Improve existing code without source modifications

Reduce human generated patches required

Increase reliability, security by order of magnitude

Current projects

DieHard / Exterminator: automatically detect and

correct memory errors (with high probability)

Critical Memory / Samurai: enable local reasoning,

allow selective hardening, compatibility

ToleRace: replication to hide data races

33Tolerating and Correcting Memory Errors in C and C++

Page 34: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

Hardware Trends (1) Reliability

Hardware transient faults are increasing

Even type-safe programs can be subverted in presence of HW errors Academic demonstrations in Java, OCaml

Soft error workshop (SELSE) conclusions Intel, AMD now more carefully measuring

“Not practical to protect everything”

Faults need to be handled at all levels from HW up the software stack

Measurement is difficult How to determine soft HW error vs. software error?

Early measurement papers appearing

34Tolerating and Correcting Memory Errors in C and C++

Page 35: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

Hardware Trends (2) Multicore

DRAM prices dropping 2Gb, Dual Channel PC 6400 DDR2

800 MHz $85

Multicore CPUs Quad-core Intel Core 2 Quad, AMD

Quad-core Opteron

Eight core Intel by 2008?

Challenge:

How should we use all this

hardware?

35Tolerating and Correcting Memory Errors in C and C++

Page 36: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Additional Information

Web sites: Ben Zorn: http://research.microsoft.com/~zorn

DieHard: http://www.diehard-software.org/

Exterminator: http://www.cs.umass.edu/~gnovark/

Publications Emery D. Berger and Benjamin G. Zorn, "DieHard:

Probabilistic Memory Safety for Unsafe Languages", PLDI’06.

Karthik Pattabiraman, Vinod Grover, and Benjamin G. Zorn, "Samurai: Protecting Critical Data in Unsafe Languages", Eurosys 2008.

Gene Novark, Emery D. Berger and Benjamin G. Zorn, “Exterminator: Correcting Memory Errors with High Probability", PLDI’07.

Lvin, Novark, Berger, and Zorn, "Archipelago: Trading Address

Space for Reliability and Security", ASPLOS 2008.

Ben Zorn, Microsoft Research 36Tolerating and Correcting Memory Errors in C and C++

Page 37: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Backup Slides

Ben Zorn, Microsoft Research 37Tolerating and Correcting Memory Errors in C and C++

Page 38: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

DieHard: Probabilistic Memory Safety

Collaboration with Emery Berger

Plug-compatible replacement for malloc/free in C lib

We define “infinite heap semantics”

Programs execute as if each object allocated with

unbounded memory

All frees ignored

Approximating infinite heaps – 3 key ideas

Overprovisioning

Randomization

Replication

Allows analytic reasoning about safety

38Tolerating and Correcting Memory Errors in C and C++

Page 39: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Overprovisioning, Randomization

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 39

Expand size requests by a factor of M (e.g., M=2)

1 2 3 4 5

1 2 3 4 5

Randomize object placement

12 34 5

Pr(write corrupts) = ½ ?

Pr(write corrupts) = ½ !

Page 40: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Replication (optional)

Ben Zorn, Microsoft Research Tolerating and Correcting Memory Errors in C and C++ 40

Replicate process with different randomization seeds

1 234 5

P2

12 345

P3

input

Broadcast input to all replicas

Compare outputs of replicas, kill when replica disagrees

1 23 45

P1

Voter

Page 41: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

DieHard Implementation Details

Multiply allocated memory by factor of M

Allocation

Segregate objects by size (log2), bitmap allocator

Within size class, place objects randomly in address

space

Randomly re-probe if conflicts (expansion limits probing)

Separate metadata from user data

Fill objects with random values – for detecting uninit reads

Deallocation

Expansion factor => frees deferred

Extra checks for illegal free

41Tolerating and Correcting Memory Errors in C and C++

Page 42: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Segregated size classes

- Static strategy pre-allocates size classes

- Adaptive strategy grows each size class incrementally

Ben Zorn, Microsoft Research

Over-provisioned, Randomized Heap

2

H = max heap size, class i

L = max live size ≤

H/2F = free = H-L

4 5 3 1 6

object size = 16object size = 8

42Tolerating and Correcting Memory Errors in C and C++

Page 43: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

Randomness enables Analytic Reasoning

Example: Buffer Overflows

k = # of replicas, Obj = size of overflow

With no replication, Obj = 1, heap no more

than 1/8 full:

Pr(Mask buffer overflow), = 87.5%

3 replicas: Pr(ibid) = 99.8%

43Tolerating and Correcting Memory Errors in C and C++

Page 44: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

DieHard CPU Performance (no replication)

Runtime on Windows

0

0.2

0.4

0.6

0.8

1

1.2

1.4

cfrac espresso lindsay p2c roboop Geo. Mean

No

rma

lize

d r

un

tim

e

malloc DieHard

44Tolerating and Correcting Memory Errors in C and C++

Page 45: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

DieHard CPU Performance (Linux)

45Tolerating and Correcting Memory Errors in C and C++

0

0.5

1

1.5

2

2.5cfr

ac

esp

resso

lind

sa

y

rob

oo

p

Ge

o. M

ea

n

16

4.g

zip

17

5.v

pr

17

6.g

cc

18

1.m

cf

18

6.c

raft

y

19

7.p

ars

er

25

2.e

on

25

3.p

erlb

mk

25

4.g

ap

25

5.v

ort

ex

25

6.b

zip

2

30

0.t

wo

lf

Ge

o. M

ea

n

No

rma

lize

d r

un

tim

e

malloc GC DieHard (static) DieHard (adaptive)

alloc-intensive general-purpose

Page 46: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

Correctness Results

Tolerates high rate of synthetically injected errors in SPEC programs

Detected two previously unreported benign bugs (197.parser and espresso)

Successfully hides buffer overflow error in Squid web cache server (v 2.3s5)

But don’t take my word for it…

46Tolerating and Correcting Memory Errors in C and C++

Page 47: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Experiments / Benchmarks

vpr: Does place and route on FPGAs from netlist Made routing-resource graph critical

crafty: Plays a game of chess with the user Made cache of previously-seen board positions critical

gzip: Compress/Decompresses a file Made Huffman decoding table critical

parser: Checks syntactic correctness of English sentences based on a dictionary Made the dictionary data structures critical

rayshade: Renders a scene file Made the list of objects to be rendered critical

Ben Zorn, Microsoft Research 47Tolerating and Correcting Memory Errors in C and C++

Page 48: Tolerating and Correcting Memory Errors in C and C++...Software critical memory for heap objects Critical objects allocated with crit_malloc, crit_free Approach Replication –base

Ben Zorn, Microsoft Research

Related Work Conservative GC (Boehm / Demers / Weiser)

Time-space tradeoff (typically >3X)

Provably avoids certain errors

Safe-C compilers Jones & Kelley, Necula, Lam, Rinard, Adve, …

Often built on BDW GC

Up to 10X performance hit

N-version programming Replicas truly statistically independent

Address space randomization (as in Vista)

Failure-oblivious computing [Rinard] Hope that program will continue after memory error with no

untoward effects

48Tolerating and Correcting Memory Errors in C and C++