Error detection with AN encoding Norman A. Rink Technische Universität Dresden [email protected] 6 March 2017

Outline

1. Introduction – hardware faults and software-based fault tolerance

2. AN encoding

3. Memory error detection

4. Summary

Hardware faults and soft errors

- Faults are a long-standing and recurring issue.
- More recently: large-scale studies (co-)authored by, e.g., Google (SIGMETRICS ’09) and Microsoft (EuroSys ’11).
- What causes hardware faults?
  - cosmic radiation
  - “dim silicon” (near-threshold computing to save energy)
  - temperature variations, process variations
  → These typically lead to transient hardware faults, aka soft errors.
- Non-negligible fault rates in emerging computing paradigms:
  - silicon-/carbon-based nano-technologies, graphene
  - chemical information processing, bio-inspired/quantum computing

[Figure taken from S. Borkar, “Designing reliable systems from unreliable components: …,” IEEE Micro, vol. 25, no. 6, 2005.]

Hardware faults and soft errors – cont’d

- The trade-off of computing with faulty hardware:

[Figure: cost (runtime, energy) plotted against error probability/output degradation. Labeled application classes: safety-/security-critical applications, e.g. automotive, operating system (kernels); non-critical user applications, processes that can be restarted; approximate computing applications, e.g. image processing, machine learning.]

→ Hardware-implemented error handling (aka fault tolerance) is not flexible.

Error detection through redundant code

- Typical approach to detecting/correcting hardware faults:
  - replicate dataflow (duplication is sufficient for error detection)
  - insert checks

Original code:

    %3 = add i64 %0, %1
    %4 = mul i64 %3, %2

Fault-tolerant code (after the IR transformation):

    ; duplicated dataflow
    %3 = add i64 %0, %1
    %r3 = add i64 %r0, %r1
    %4 = mul i64 %3, %2
    %r4 = mul i64 %r3, %r2
    ; error check
    %f0 = icmp eq i64 %4, %r4
    br i1 %f0, label %continue, label %recover

- The runtime overhead of duplicated dataflow is typically < 2x.
- This cannot detect errors in memory!
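The duplicate-and-compare idea can be mimicked at the C level. This is only a rough sketch: the function name `dmr_add_mul` and the `fault_mask` parameter (used solely to simulate a bit flip in the primary copy) are illustrative assumptions and do not come from the slides.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical sketch: computes (a + b) * c twice and compares the results.
 * fault_mask is XORed into the primary intermediate value to simulate a
 * transient bit flip; pass 0 for a fault-free run.
 * Returns true if both copies agree (result written to *out),
 * false if the error check detects a mismatch. */
bool dmr_add_mul(uint64_t a, uint64_t b, uint64_t c,
                 uint64_t fault_mask, uint64_t *out)
{
    /* volatile discourages the compiler from merging the two copies */
    volatile uint64_t r0 = a, r1 = b, r2 = c;  /* shadow copies of the inputs */

    uint64_t t  = (a + b) ^ fault_mask;        /* primary dataflow, with injected fault */
    uint64_t rt = r0 + r1;                     /* duplicated dataflow */

    uint64_t v  = t * c;
    uint64_t rv = rt * r2;

    if (v != rv) {
        return false;                          /* error check failed: go to recovery */
    }
    *out = v;
    return true;
}
```

Calling `dmr_add_mul(2, 3, 4, 0, &out)` models a fault-free run, while a non-zero `fault_mask` models a fault in one copy of the dataflow. The values are passed in registers or locals here, which mirrors the limitation that a shared, unduplicated memory location would not be covered.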

AN encoding

- Correctness condition: integer values are multiples of a fixed constant A.
  → Check for faults like this:

    if (n % A != 0) { exit(AN_ERROR_CODE); }

- Advantages over replication:
  - Data in memory is encoded (automatically protected): no need to replicate loads/stores.
  - Suitable for multithreaded and shared-memory applications.
- Disadvantages: large runtime overheads (up to and over several 10x).
  - Checking is expensive (modulo operation).
  - Decoding is expensive (division by A).
  - Decoding is often required, e.g., for address operands.
  → Trade frequencies of these operations for runtime.
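A minimal C sketch of encoding, checking, and decoding follows. The constant `AN_A` and the helper names `an_encode`, `an_check`, `an_decode`, and `flip_bit` are assumptions made for illustration; the slides do not fix a value for A. Flipping bit k changes a word by ±2^k, which is never a multiple of an odd A > 1, so under that assumption every single bit flip in a valid code word fails the check.

```c
#include <stdint.h>
#include <stdbool.h>

#define AN_A 58659  /* assumed constant A; any odd A > 1 works for this sketch */

/* Encode: a valid code word is a multiple of A (caller must avoid overflow). */
static inline int64_t an_encode(int64_t x) { return x * AN_A; }

/* Check: a word that is not a multiple of A signals a fault. */
static inline bool an_check(int64_t n) { return n % AN_A == 0; }

/* Decode: recover the original value by dividing by A. */
static inline int64_t an_decode(int64_t n) { return n / AN_A; }

/* Flip bit `bit` (0..62) of n to simulate a transient fault. */
static inline int64_t flip_bit(int64_t n, unsigned bit)
{
    return (int64_t)((uint64_t)n ^ (UINT64_C(1) << bit));
}
```

Unlike the slide's check, `an_check` returns a flag instead of calling `exit`, which keeps the sketch easy to test; a real instrumentation pass would branch to an error handler.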

AN encoding – cont’d

- Make a program fault-tolerant by transforming it into an AN-encoded program.

1. Value encoding:

    integer constant c  →  integer constant c*A

2. Operation encoding, e.g.:

   multiplication:

    %3 = mul i64 %0, %1
    →
    %t3 = mul i64 %0, %1
    %3 = div i64 %t3, A

   load:

    %1 = load i64* %0
    →
    %t0 = ptrtoint i64* %0 to i64
    %t1 = div i64 %t0, A
    %t3 = inttoptr i64 %t1 to i64*
    %1 = load i64* %t3

3. Check insertion:
   - Where non-encoded values enter/leave the scope of AN encoding.
     → Often referred to as synchronization points.
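The operation encodings can be sketched in C as follows. The names `an_add`, `an_mul`, and `an_load`, as well as the value of `AN_A`, are hypothetical. As a simplification, the load decodes an encoded array index rather than an encoded pointer.

```c
#include <stdint.h>

#define AN_A 58659  /* assumed constant A; not taken from the slides */

/* Addition needs no correction: A*x + A*y == A*(x + y). */
static inline int64_t an_add(int64_t xe, int64_t ye) { return xe + ye; }

/* Multiplication yields A*A*x*y, so one factor of A is divided out. */
static inline int64_t an_mul(int64_t xe, int64_t ye) { return (xe * ye) / AN_A; }

/* A load needs a plain address: the encoded index is decoded first.
 * Simplification: the slides decode the pointer itself. */
static inline int64_t an_load(const int64_t *base, int64_t idx_e)
{
    return base[idx_e / AN_A];
}
```

The intermediate product in `an_mul` carries a factor of A squared, so the usable value range shrinks accordingly; a production implementation would need a wider intermediate type or range checks.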

Results: runtime overheads

Test programs:

    label   test case
    A-C     bubblesort
    D       CRC (cyclic redundancy checker)
    E       DES encryption
    F       Dijkstra
    G-I     lex, parse, eval
    J       Fibonacci
    K       integer matrix multiply
    L       memcopy
    M-O     quicksort

[Figure: runtime overheads of the test programs.]

→ Small but visible reduction of overheads for profiles 2 and 3.

Results: silent data corruption (sdc)

[Figure: relative frequencies of sdc responses for test programs A-O; legend entries 1-4; vertical axis from 0.00 to 0.14.]

Protecting memory

- Encode integer values before storing to memory:

    %e = mul i64 %0, A          ; encode
    store i64 %e, i64* %p

- Decode and check after loading from memory:

    %1 = load i64* %p
    %2 = srem i64 %1, A         ; remainder must be 0 for a valid code word
    ...
    %3 = sdiv i64 %1, A         ; decode the loaded value

→ This leads to an overhead of about 1.5x (on the same test programs as before).
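The memory-only scheme can be sketched in C as a pair of store and load helpers. The names `an_store` and `an_load_checked` and the value of `AN_A` are assumptions for illustration; values stay unencoded in registers and are encoded only while they reside in memory.

```c
#include <stdint.h>
#include <stdbool.h>

#define AN_A 58659  /* assumed constant A (odd, > 1); not taken from the slides */

/* Encode just before the store, so only encoded values live in memory. */
static inline void an_store(int64_t *p, int64_t x) { *p = x * AN_A; }

/* Check and decode right after the load.
 * Returns false if the loaded word is not a multiple of A. */
static inline bool an_load_checked(const int64_t *p, int64_t *out)
{
    int64_t n = *p;
    if (n % AN_A != 0) {
        return false;          /* memory error detected */
    }
    *out = n / AN_A;
    return true;
}
```

Arithmetic on the decoded values runs at native speed in this sketch, which is consistent with the much lower overhead reported for protecting memory only.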

Protecting memory – cont’d

- Have implemented AN encoding at the level of LLVM IR.
- This leads to unprotected memory accesses:
  - The compiler backend inserts memory accesses (below the LLVM IR level): register spills, function calls etc.

DMR in the compiler backend

- Example: register spills

  Original spill and reload:

    mov rcx, -0x30(rbp)
    ...
    mov -0x30(rbp), rcx
    add rcx, rsi

  With a duplicated spill slot and a check after the reload:

    mov rcx, -0x38(rbp)
    mov rcx, -0x30(rbp)
    ...
    mov -0x30(rbp), rcx
    cmp -0x38(rbp), rcx
    jne <exit>
    add rcx, rsi

- Analogously for (on x86):
  - return address
  - function arguments
  - callee-saved registers, frame pointer
  - (jump tables)
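The duplicated spill slot can be modeled in C, purely as an illustration of the check's logic. The type `spill_t` and the functions `spill` and `reload` are hypothetical; the actual transformation operates on machine instructions in the backend, not on C source.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of a duplicated spill slot: two memory copies of one register. */
typedef struct {
    volatile uint64_t slot;    /* models -0x30(rbp) */
    volatile uint64_t shadow;  /* models -0x38(rbp) */
} spill_t;

/* Spill: write the register value to both slots. */
static inline void spill(spill_t *s, uint64_t reg)
{
    s->shadow = reg;
    s->slot = reg;
}

/* Reload: read the primary slot and compare it against the shadow copy.
 * Returns false on mismatch (the `jne <exit>` path in the assembly). */
static inline bool reload(const spill_t *s, uint64_t *reg)
{
    uint64_t v = s->slot;
    if (v != s->shadow) {
        return false;
    }
    *reg = v;
    return true;
}
```

A bit flip in either copy makes the comparison fail, so the fault is detected but cannot be attributed to a specific slot; this is detection only, not correction.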

Results: AN encoding plus DMR in the backend

- Single bit flips in memory (i.e. in the results of load operations).

Summary

- AN encoding leads to large overheads (10x-100x) when used to protect operations in the CPU.
- Much lower overheads when AN encoding protects memory only (1.5x).
  - This is also more in the original spirit of encoding (for fault-tolerant communication).
- Making software fault-tolerant at higher abstraction levels (source code/IR) is likely to miss vulnerabilities.
  - And it misses bad vulnerabilities (return address, frame pointer).
  - Hence, one must address lower abstraction levels (i.e. machine instructions).

What can be done about this?

Backup

References

1) N. A. Rink, D. Kuvaiskii, J. Castrillon, C. Fetzer. Compiling for Resilience: the Performance Gap. ERPP ’15 (co-located with ParCo ’15).

2) N. A. Rink, J. Castrillon. Improving Code Generation for Software-based Error Detection. REES ’15 (co-located with ESWeek ’15).

3) M. Hoffmann, P. Ulbrich, C. Dietrich, H. Schirmeier, D. Lohmann, W. Schröder-Preikschat. A practitioner’s guide to software-based soft-error mitigation using AN-codes. HASE ’14.

4) M. Shafique, S. Garg, J. Henkel, D. Marculescu. The EDA challenges in the dark silicon era: Temperature, reliability, and variability perspectives. DAC ’14.

5) E. B. Nightingale, J. R. Douceur, V. Orgovan. Cycles, cells and platters: An empirical analysis of hardware failures on a million consumer PCs. EuroSys ’11.

6) S. Feng, S. Gupta, A. Ansari, S. Mahlke. Shoestring: Probabilistic soft error reliability on the cheap. ASPLOS ’10.

7) B. Schroeder, E. Pinheiro, W.-D. Weber. DRAM errors in the wild: A large-scale field study. SIGMETRICS ’09.

8) J. Yu, M. J. Garzarán, M. Snir. ESoftCheck: Removal of non-vital checks for fault tolerance. CGO ’09.

9) G. A. Reis, J. Chang, N. Vachharajani, R. Rangan, D. I. August. SWIFT: Software implemented fault tolerance. CGO ’05.

10) S. Borkar. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation. IEEE Micro, 25(6):10–16, 2005.

11) R. Baumann. Soft errors in advanced computer systems. IEEE Design & Test of Computers, 22(3):258–266, 2005.

Worked example of AN encoding

Source statement (A = 3):

    *p = foo( (*p) + 42 );

Original IR:

    %1 = load i64* %0
    %2 = add i64 %1, 42
    %3 = call i64 @foo(i64 %2)
    store i64 %3, i64* %0

After value encoding (the constant 42 becomes 42 * 3 = 126):

    %1 = load i64* %0
    %2 = add i64 %1, 126
    %3 = call i64 @foo(i64 %2)
    store i64 %3, i64* %0

After operation encoding:

    %t0 = ptrtoint i64* %0 to i64
    %t1 = div i64 %t0, 3
    %t2 = inttoptr i64 %t1 to i64*
    %1 = load i64* %t2
    %2 = add i64 %1, 126
    %t3 = div i64 %2, 3
    %t4 = call i64 @foo(i64 %t3)
    %3 = mul i64 %t4, 3
    %t5 = ptrtoint i64* %0 to i64
    %t6 = div i64 %t5, 3
    %t7 = inttoptr i64 %t6 to i64*
    store i64 %3, i64* %t7

After check insertion:

    call void @check(i64 %0)
    %t0 = ptrtoint i64* %0 to i64
    %t1 = div i64 %t0, 3
    %t2 = inttoptr i64 %t1 to i64*
    %1 = load i64* %t2
    call void @check(i64 %1)
    %2 = add i64 %1, 126
    call void @check(i64 %2)
    %t3 = div i64 %2, 3
    %t4 = call i64 @foo(i64 %t3)
    %3 = mul i64 %t4, 3
    call void @check(i64 %0)
    %t5 = ptrtoint i64* %0 to i64
    %t6 = div i64 %t5, 3
    %t7 = inttoptr i64 %t6 to i64*
    call void @check(i64 %3)
    store i64 %3, i64* %t7
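The same transformation can be written out by hand in C as a rough sketch. The body of `foo` is a stand-in (the slides treat it as an external, non-encoded function), and `check`, `encoded_update`, and `EX_A` are hypothetical names. As a simplification, the pointer `p` itself is left unencoded.

```c
#include <stdint.h>
#include <stdlib.h>

#define EX_A 3  /* A = 3, as in the worked example */

/* Stand-in for the external, non-encoded function foo (assumption). */
static int64_t foo(int64_t x) { return 2 * x; }

/* Abort on an invalid code word (models the @check calls). */
static void check(int64_t n) { if (n % EX_A != 0) abort(); }

/* Encoded version of `*p = foo((*p) + 42);` where *p holds an encoded value. */
static void encoded_update(int64_t *p)
{
    int64_t v = *p;              /* load the encoded value */
    check(v);
    int64_t s = v + 42 * EX_A;   /* value encoding: constant 42 becomes 126 */
    check(s);
    int64_t r = foo(s / EX_A);   /* decode at the call boundary */
    int64_t e = r * EX_A;        /* re-encode the result */
    check(e);
    *p = e;                      /* store the encoded value */
}
```

The call to `foo` is a synchronization point in the sense used earlier: the argument is decoded before the call and the return value is re-encoded afterwards.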

Results: effectiveness of AN encoding

[Figure: test program responses to faults. Relative frequencies (0.0 to 1.0) of the responses correct, detected, hang, crash, and sdc, with bars labeled X and X1-X4 for each test program X in A-O.]