http://www.cs.virginia.edu/~krw7c/avf.html
Particle Strike Causes Bit Flip!
  Bit read?
    no: benign fault, no error
    yes: Bit has error protection?
      No protection: Does bit matter?
        no: benign fault, no error
        yes: Silent Data Corruption
      Detection only: Does bit matter?
        no: False Detected Unrecoverable Error
        yes: True Detected Unrecoverable Error
      Detection & Correction: benign fault, no error
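The fault-outcome flowchart can be read as a small decision function. The sketch below is one reading of that chart, not the authors' code: the names `Protection` and `classify_bit_flip` are hypothetical, and the mapping of each branch to an outcome is an assumption reconstructed from the chart's labels.

```python
from enum import Enum


class Protection(Enum):
    """Kinds of error protection a bit may have (labels taken from the chart)."""
    NONE = "none"
    DETECTION_ONLY = "detection only"
    DETECTION_AND_CORRECTION = "detection & correction"


def classify_bit_flip(bit_read: bool, protection: Protection, bit_matters: bool) -> str:
    """Return the outcome of a particle-induced bit flip (assumed branch order)."""
    if not bit_read:
        # A flipped bit that is never read cannot affect the program.
        return "benign fault, no error"
    if protection is Protection.DETECTION_AND_CORRECTION:
        # The flip is detected and repaired before it can matter.
        return "benign fault, no error"
    if protection is Protection.DETECTION_ONLY:
        # Detected but not repairable: "true" only if the bit mattered.
        if bit_matters:
            return "True Detected Unrecoverable Error"
        return "False Detected Unrecoverable Error"
    # No protection: the flip goes unnoticed by hardware.
    return "Silent Data Corruption" if bit_matters else "benign fault, no error"


print(classify_bit_flip(True, Protection.NONE, True))
```

Under this reading, only an unprotected bit that is read and matters leads to silent data corruption.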
Outliers
We identify strong correlations between structural AVF values and a small set of processor metrics.
Using linear and quadratic regression, we determined an AVF characterization that uses only a few easily measurable variables. These characterizations can be used to predict AVF accurately.
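As a rough illustration of the regression step, the sketch below fits a linear or quadratic least-squares model of AVF against a single processor metric. It is a minimal sketch and not the authors' characterization: the function names, the single-metric simplification, and the sample data are all assumptions, and the data points are synthetic.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] so that y ~ c0 + c1*x + c2*x^2 ...
    """
    size = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(ata, aty)


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Synthetic example: a hypothetical metric sampled at five points, with AVF
# generated from a known quadratic so the fit can be checked by eye.
metric_values = [0.0, 0.25, 0.5, 0.75, 1.0]
avf_values = [0.05 + 0.5 * x + 0.2 * x * x for x in metric_values]

quadratic = fit_polynomial(metric_values, avf_values, degree=2)
print(quadratic)
print(predict(quadratic, 0.6))
```

A characterization that uses several metrics at once would extend the same normal-equation approach with one column per variable.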
FIT = Failure in Time = 1 failure in a billion hours
Intel Corporation
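The FIT definition converts directly to a mean time between failures (MTBF). The sketch below shows that arithmetic under the usual assumption of a constant failure rate; the function names and the 1000-year example value are illustrative assumptions, not figures taken from the poster's data.

```python
HOURS_PER_BILLION = 1e9          # 1 FIT = 1 failure per 10^9 device-hours
HOURS_PER_YEAR = 24 * 365        # ignoring leap years


def fit_to_mtbf_hours(fit: float) -> float:
    """MTBF in hours for a given FIT rate (constant failure rate assumed)."""
    if fit <= 0:
        raise ValueError("FIT rate must be positive")
    return HOURS_PER_BILLION / fit


def mtbf_years_to_fit(mtbf_years: float) -> float:
    """FIT rate that corresponds to a target MTBF given in years."""
    if mtbf_years <= 0:
        raise ValueError("MTBF must be positive")
    return HOURS_PER_BILLION / (mtbf_years * HOURS_PER_YEAR)


# Illustrative target: an MTBF of 1000 years is roughly 114 FIT.
print(round(mtbf_years_to_fit(1000), 1))
```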
$$\mathrm{AVF}_{\text{bit}} = \text{Probability that a Bit Matters} = \frac{\text{\# of Visible Errors}}{\text{\# of Bit Flips from Particle Strikes}}$$
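The AVF ratio is simple to compute once the two counts are known. The following is a minimal sketch; the function name `avf_bit` and the example counts are hypothetical.

```python
def avf_bit(visible_errors: int, bit_flips: int) -> float:
    """AVF of a bit: visible errors divided by particle-induced bit flips."""
    if bit_flips <= 0:
        raise ValueError("need at least one bit flip")
    if not 0 <= visible_errors <= bit_flips:
        raise ValueError("visible errors must lie between 0 and the number of bit flips")
    return visible_errors / bit_flips


# Hypothetical counts: 30 of 100 injected bit flips produced a visible error.
print(avf_bit(30, 100))  # 0.3
```

An AVF of 100 percent corresponds to the case where every bit flip becomes a visible error, which is the assumption behind total redundancy.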
As soft errors become more of a problem, protection will be needed even for everyday PCs.
Providing total redundancy is too expensive and assumes that AVF is 100%.
Our work shows that AVF varies over time and across applications.
Kristen Walcott, Greg Humphreys, Sudhanva Gurumurthi
University of Virginia
{walcott, humper, gurumurthi}@cs.virginia.edu
Dynamic Prediction of Architectural Vulnerability
What bits matter?
Computer Science at the UNIVERSITY of VIRGINIA
Rising Problem
Dynamic AVF Prediction
Calculating Vulnerability
Future Work
With an accurate predictor, redundancy may be turned on only when vulnerability is high. Preliminary results show that partial redundancy provides a significant performance boost over full redundancy. Next we will perform a more rigorous exploration of the design space of partial redundant multithreading implementations and investigate redundancy toggling policies.
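One very simple toggling policy is a fixed threshold on the predicted AVF. The sketch below is purely illustrative and is not a policy proposed or evaluated here: the function names, the per-interval granularity, and the 0.3 threshold are all assumptions.

```python
def redundancy_schedule(predicted_avf, threshold=0.3):
    """Return one boolean per interval: True means redundancy is switched on.

    `threshold` is a hypothetical tuning knob, not a value from the poster.
    """
    return [avf >= threshold for avf in predicted_avf]


def protected_fraction(schedule):
    """Fraction of intervals that run with redundancy enabled."""
    if not schedule:
        return 0.0
    return sum(schedule) / len(schedule)


# Hypothetical per-interval AVF predictions for a short program run.
predictions = [0.1, 0.5, 0.3, 0.05]
schedule = redundancy_schedule(predictions)
print(schedule)                      # [False, True, True, False]
print(protected_fraction(schedule))  # 0.5
```

A lower threshold protects more intervals at a higher performance cost, and that trade-off is what a study of toggling policies would explore.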
Prediction Results
Challenge
2 SimPoints of bzip2
galgel benchmark
• Transient faults due to particle strikes are a key challenge in microprocessor design.
• As transistor counts increase exponentially, per-chip faults are a growing burden.
• Spatial and temporal redundancy techniques are used to protect against faults.
• Redundancy techniques assume that any fault will result in a visible program error (i.e., the Architectural Vulnerability Factor (AVF) is 100 percent).
• Over-design can hurt performance and drain power.
[Figure: Correlation to AVF (y-axis) for Microarchitectural Metrics (x-axis)]
[Figure: Data Corruption FIT (y-axis, log scale from 1 to 100000) by year, 2003 to 2012 (x-axis); legend: 1000 MTBF Goal, Error Rate]