Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Page 1: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Error Detection in Hardware

VO Hardware-Software-Codesign

Philipp Jahn

Page 2: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

6.6.2007 Error Detection in Hardware 2

Error detection

How to detect errors with hardware methods during system operation

Conditions
- Coverage (probability that an error is detected)
- Latency (time between the start of an error and its detection)
- Performance

Slide from VO „Echtzeitsysteme“, H. Kopetz

Page 3: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Hardware-based error detection

Hardware redundancy
- Passive (TMR, majority voting)
- Active (duplication and comparison, standby)
- Hybrid

Information redundancy
- Parity
- Checksums
- Arithmetic codes

Time redundancy

Watchdog timers

Checking
- Capability checking
- Consistency checking
- Control-flow checking
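
The two hardware-redundancy styles named above can be sketched in a few lines. This is an illustrative software model only, not a circuit from the lecture; the function names `majority_vote` and `duplicate_and_compare` are hypothetical. Passive redundancy (TMR) masks a single faulty module by voting, while active redundancy (duplication and comparison) only signals that the two copies disagree.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three module outputs (TMR voter)."""
    return (a & b) | (a & c) | (b & c)


def duplicate_and_compare(a: int, b: int) -> bool:
    """Return True if the two module outputs disagree (error detected)."""
    return a != b


correct = 0b1011
faulty = correct ^ 0b0100  # one module delivers a flipped bit

# TMR masks the single faulty module.
print(bin(majority_vote(correct, faulty, correct)))

# Duplication detects the mismatch but cannot tell which copy is wrong.
print(duplicate_and_compare(correct, faulty))
```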

Page 4: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Information redundancy (1)

Detection / Correction

Hamming distance
- X = (1001), Y = (0111)
- d(X,Y) = 3

SEC-DED (single error correction, double error detection)
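
The Hamming distance example can be checked with a short sketch. The helper names are hypothetical; the two capability formulas are the standard ones for a code with minimum distance d (detect up to d-1 errors, correct up to floor((d-1)/2) errors).

```python
def hamming_distance(x: str, y: str) -> int:
    """Number of bit positions in which two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))


def detectable_errors(min_distance: int) -> int:
    """Errors guaranteed to be detected by a code with this minimum distance."""
    return min_distance - 1


def correctable_errors(min_distance: int) -> int:
    """Errors guaranteed to be corrected by a code with this minimum distance."""
    return (min_distance - 1) // 2


print(hamming_distance("1001", "0111"))  # 3, as on the slide
print(detectable_errors(3), correctable_errors(3))
```

A SEC-DED code needs a minimum distance of 4: one error is corrected, and a second error is still detected.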

Page 5: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Information redundancy (2)

Parity
- One extra bit (even / odd)
- Decoding circuit (set of XOR gates)
- Routine checking in buses, memory and registers
- Detects single-bit errors (no stuck-at faults)
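
A minimal sketch of the parity scheme, assuming the data word is given as a list of bits; the names `parity_bit` and `parity_ok` are hypothetical. The XOR reduction plays the role of the XOR-gate tree in the decoding circuit.

```python
from functools import reduce
from operator import xor


def parity_bit(bits: list[int], even: bool = True) -> int:
    """Extra bit that makes the number of ones even (or odd)."""
    p = reduce(xor, bits, 0)
    return p if even else p ^ 1


def parity_ok(word: list[int], even: bool = True) -> bool:
    """Check a word that already includes its parity bit."""
    return reduce(xor, word, 0) == (0 if even else 1)


data = [1, 0, 1, 1]
word = data + [parity_bit(data)]
print(parity_ok(word))            # word as stored

one_flip = word.copy()
one_flip[2] ^= 1                  # single-bit error
print(parity_ok(one_flip))        # detected

two_flips = one_flip.copy()
two_flips[0] ^= 1                 # second error cancels the first
print(parity_ok(two_flips))       # passes the check, so it goes undetected
```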

Page 6: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Information redundancy (3)

- Overlapping parity
- m-of-n codes
- Duplication codes
- Cyclic redundancy checks (CRC)
  - Sender and receiver agree upon a generator polynomial G(x)
  - Append the checksum (n-k bits) to the end of the data frame (k bits)
  - Received frame (data plus checksum) mod G(x) = 0 → correct
  - Simple implementation (linear feedback shift register and XOR gates)
  - Detects single-bit errors, multiple adjacent bit errors affecting fewer than n-k bits, and burst transient errors
  - Highly successful in serial transmission (communication channels: Ethernet, Token Ring)
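
The CRC procedure can be sketched as modulo-2 polynomial division. This is a bit-list model for illustration, not an LFSR implementation; the function names and the small generator 1011 (x^3 + x + 1) are assumptions chosen for the example, not values from the lecture.

```python
def crc_remainder(bits: list[int], generator: list[int]) -> list[int]:
    """Remainder of bits divided by generator, both read as polynomials over GF(2)."""
    work = list(bits)
    r = len(generator) - 1          # number of check bits = degree of G(x)
    for i in range(len(work) - r):
        if work[i]:                 # leading bit set: subtract (XOR) the generator
            for j, g in enumerate(generator):
                work[i + j] ^= g
    return work[-r:]


def crc_encode(data: list[int], generator: list[int]) -> list[int]:
    """Append the checksum so that the whole frame is divisible by G(x)."""
    r = len(generator) - 1
    return data + crc_remainder(data + [0] * r, generator)


def crc_ok(frame: list[int], generator: list[int]) -> bool:
    """Receiver check: a zero remainder means no error was detected."""
    return not any(crc_remainder(frame, generator))


G = [1, 0, 1, 1]                                  # assumed example generator
data = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
frame = crc_encode(data, G)
print(frame[-3:], crc_ok(frame, G))

corrupted = frame.copy()
corrupted[5] ^= 1                                 # single-bit transmission error
print(crc_ok(corrupted, G))
```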

Page 7: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Information redundancy (4)

Checksums

Page 8: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Information redundancy (5)

Arithmetic codes
- Detect errors in arithmetic units (parity would not be preserved)
- Separate or nonseparate
- Examples
  - AN codes
  - Residue codes
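
Both example codes can be sketched for addition. The constants A = 3 and M = 3 and all function names are assumptions for illustration. An AN code is nonseparate (the operand itself is multiplied by A), while a residue code is separate (a small checker computes on residues in parallel with the main adder).

```python
A = 3  # assumed AN-code constant
M = 3  # assumed residue-code modulus


def an_encode(n: int) -> int:
    """Nonseparate code: the code word is A * n."""
    return A * n


def an_check(code_word: int) -> bool:
    """Valid code words are multiples of A; sums of code words stay valid."""
    return code_word % A == 0


def residue_checked_add(x: int, y: int, fault_mask: int = 0) -> tuple[int, bool]:
    """Separate code: compare the adder result with a residue-only checker."""
    total = (x + y) ^ fault_mask         # main adder, optionally with an injected bit flip
    expected = (x % M + y % M) % M       # independent check on the residues
    return total, total % M == expected


s = an_encode(5) + an_encode(7)          # 15 + 21 = 36 = A * (5 + 7)
print(an_check(s), an_check(s ^ 1))      # a flipped bit breaks divisibility by A

print(residue_checked_add(5, 7))
print(residue_checked_add(5, 7, fault_mask=1))
```

With A = 3 every single-bit error changes the value by a power of two, which is never a multiple of 3, so all single-bit errors are detected.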

Page 9: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Time redundancy (1)

- Repeat the computation two or more times and compare the results (detection, or correction by majority)
- Error detected → maybe retry
- Good for detecting transient faults
- Does not protect against errors resulting from permanent faults
- No extra hardware needed, but longer processing time
- Suited to non-time-critical applications
- Alternating logic also detects permanent faults (self-checking circuits with f(x) = f'(x'), where ' denotes the complement)
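
A small model of plain repetition and comparison. The helper names and the simulated faults are hypothetical: a transient fault corrupts only one of the runs, while a permanent fault corrupts every run in the same way.

```python
from collections import Counter


def run_with_time_redundancy(compute, x, runs: int = 3):
    """Repeat compute(x); return (majority result, whether any run disagreed)."""
    results = [compute(x) for _ in range(runs)]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= runs // 2:
        raise RuntimeError("no majority: error detected but not correctable")
    return value, votes < runs


def make_transient_square(faulty_call: int):
    """Squaring unit whose result has bit 0 flipped on exactly one call."""
    calls = 0

    def square(x: int) -> int:
        nonlocal calls
        calls += 1
        result = x * x
        return result ^ 1 if calls == faulty_call else result

    return square


def stuck_square(x: int) -> int:
    """Squaring unit with a permanent fault: output bit 0 stuck at 1."""
    return (x * x) | 1


print(run_with_time_redundancy(make_transient_square(2), 7))  # transient fault is outvoted
print(run_with_time_redundancy(stuck_square, 6))              # all runs agree on the wrong value
```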

Page 10: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Time redundancy (2)

- Handle permanent faults by encoding the second computation (the encoding must not alter the calculation)
  - e.g. k-shift
  - Errors in k-1 consecutive bits of an arithmetic or logical operation are detected
  - Additional hardware (two shifters, storage register, comparator)

Page 11: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Watchdog timers

- Implemented in hardware (external timer) or software (process)
- If the timer expires → system reset or recovery
- Detects only a very specific type of error: control-flow errors
- If an error occurs but the timer is still reset → no detection
- Difficult to determine the runtime
- High detection latency
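
A tick-based simulation of a watchdog timer. The class, the `simulate` helper and all parameter values are hypothetical; a real watchdog would be an external timer or a separate process. The monitored program must reset ("kick") the timer before it runs out.

```python
class WatchdogTimer:
    """Counts down once per tick; expires unless it is reset in time."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.expired = False

    def kick(self) -> None:
        """Called by the monitored program to signal that it is still running."""
        self.remaining = self.timeout_ticks

    def tick(self) -> None:
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True   # here the system would be reset or recovered


def simulate(total_ticks: int, kick_period: int, hang_at=None, timeout_ticks: int = 3):
    """Return the tick at which the watchdog expires, or None if it never does."""
    watchdog = WatchdogTimer(timeout_ticks)
    for t in range(1, total_ticks + 1):
        program_alive = hang_at is None or t < hang_at
        if program_alive and t % kick_period == 0:
            watchdog.kick()
        watchdog.tick()
        if watchdog.expired:
            return t
    return None


print(simulate(10, kick_period=2))             # healthy program: never expires
print(simulate(10, kick_period=2, hang_at=5))  # program hangs at tick 5
```

A program that computes wrong results but still kicks the timer on schedule is never caught, which is the coverage limitation noted above.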

Page 12: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Capability & Consistency Checking

- Capability checking limits access to objects (e.g. memory segments) to authorized users (processes)
  - Implemented in hardware (error traps) or software (firewall)
  - e.g. checking of address validity by the MMU
- Consistency checking determines whether states or results are reasonable
  - e.g. range checking, address checking, opcode checking
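
Both checks can be illustrated in software. Everything here is a hypothetical model: `Segment`, `AccessViolation`, `check_access` and `in_range` are invented for the example and do not describe a real MMU interface.

```python
from dataclasses import dataclass


class AccessViolation(Exception):
    """Stands in for the error trap raised by the hardware."""


@dataclass(frozen=True)
class Segment:
    base: int
    size: int
    owners: frozenset  # processes authorized to use this segment


def check_access(segment: Segment, process: str, address: int) -> None:
    """Capability check: only authorized processes, only valid addresses."""
    if process not in segment.owners:
        raise AccessViolation(f"{process} is not authorized")
    if not segment.base <= address < segment.base + segment.size:
        raise AccessViolation(f"address {address:#x} is outside the segment")


def in_range(value: float, low: float, high: float) -> bool:
    """Consistency check: is the result within its plausible range?"""
    return low <= value <= high


stack = Segment(base=0x1000, size=0x100, owners=frozenset({"task_a"}))
check_access(stack, "task_a", 0x1080)      # passes silently
print(in_range(21.5, -40.0, 125.0))        # plausible sensor reading
print(in_range(900.0, -40.0, 125.0))       # implausible, flagged as an error
```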

Page 13: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Control-Flow Checking (1)

Hardware scheme
- Divide the application program into blocks
- Each block has a single entry and exit point
- A reference signature represents an encoding of the correct execution
- A watchdog processor validates the application program by comparing the runtime signature with the reference signature
- 70% of transient faults lead to control-flow errors

Limitations
- Only suitable for processors running single programs (not multiple processes or threads)
- Reduced coverage if transmission errors occur on the bus to the watchdog processor
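
The signature comparison can be sketched as follows. This is a software model under loud assumptions: the block names and instruction words are made up, and `zlib.crc32` merely stands in for the cyclic-code signature generator of a watchdog processor.

```python
import zlib


def signature(instructions: list[int]) -> int:
    """Compress a sequence of 32-bit instruction words into one signature."""
    data = b"".join(word.to_bytes(4, "big") for word in instructions)
    return zlib.crc32(data)


# Hypothetical program, split into blocks with a single entry and exit point.
program = {
    "B0": [0x1001, 0x2002, 0x3003],
    "B1": [0x4004, 0x5005],
}

# Reference signatures, computed once before execution.
reference = {name: signature(block) for name, block in program.items()}


def watchdog_check(block_name: str, executed: list[int]) -> bool:
    """Compare the runtime signature of the executed stream with the reference."""
    return signature(executed) == reference[block_name]


print(watchdog_check("B0", program["B0"]))            # correct execution
print(watchdog_check("B0", [0x1001, 0x2002, 0x3007])) # one instruction word corrupted
print(watchdog_check("B0", program["B1"]))            # control flow ended up in the wrong block
```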

Page 14: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Control-Flow Checking (2)

Signatured Instruction Stream (SIS)
- Hardware: watchdog processor with cyclic code signature generator
- Software: modified assembler and loader

Control-flow checking using shadow processing

Page 15: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

Summary

- Hardware: low error latency
- Hardware is more expensive
  - e.g. massively parallel multiprocessors
- Combine error detection mechanisms

Page 16: Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn

References

- Ravishankar K. Iyer, Zbigniew Kalbarczyk - Hardware and Software Error Detection - Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign
- Hermann Kopetz - Real-Time Systems: Design Principles for Distributed Embedded Applications - 1997, 356 p., Hardcover, ISBN: 978-0-7923-9894-3
- Alireza Vahdatpour, Mahdi Fazeli, Seyed Ghassem Miremadi - Transient Error Detection in Embedded Systems Using Reconfigurable Components - IES, October 2006
- M. Dal Cin, W. Hohl, E. Michel, A. Pataricza - Error Detection Mechanisms for Massively Parallel Multiprocessors - IEEE Proceedings, 1993
- Evaluation of error detection coverage and fault-tolerance of digital plant protection system in nuclear power plants
- http://robotics.ee.uwa.edu.au/courses/faulttolerant/notes/FT2b.pdf
- A. Steininger, C. Scherrer - Identifying Efficient Combinations of Error Detection Mechanisms Based on Results of Fault Injection Experiments - IEEE Transactions on Computers, Vol. 51, No. 2, February 2002