Mahdi Fazeli, Seyed Ghassem Miremadi, Hossein Asadi, Seyed Nematollah Ahmadian. A Fast and Accurate Multi-Cycle Soft Error Rate Estimation Approach to Resilient Embedded Systems Design. Presenter: Saman Aliari, University of Illinois at Urbana-Champaign.
A FAST AND ACCURATE MULTI-CYCLE SOFT ERROR RATE ESTIMATION
APPROACH TO RESILIENT EMBEDDED SYSTEMS DESIGN
Department of Computer Engineering, Sharif University of Technology
Tehran, IRAN
Mahdi Fazeli, Seyed Ghassem Miremadi, Hossein Asadi, Seyed Nematollah
Ahmadian
Presenter: Saman Aliari
University of Illinois at Urbana-Champaign
2
OUTLINE
Soft Errors
SER Modeling in Multi-Cycle Operation
SER Modeling in Single Cycle Operation
Proposed SER Modeling in Multi Cycle Operation
Tool Overview
Experimental Results and Discussions
Conclusions
3
WHAT IS A SOFT ERROR?
• Transient faults due to radiation events
• Bit flips: 1 → 0 or 0 → 1
• Caused by alpha particles or neutrons
• Affect memory, flip-flops, and combinational logic
[Figure: an energetic particle striking a microprocessor (cache, arithmetic & logic unit, register file, control unit) and flipping stored bits.]
4
EVIDENCE OF PARTICLE STRIKES
• 2000 [Forbes Magazine’00]: SUN Enterprise servers crash due to a cache problem
• 2001 [ITRS’01]: Soft errors as a major issue in chip design
• 2003 [EE Times’04]: Cisco routers fail due to soft errors
• 2004 [Xilinx.com]: Xilinx FPGAs highly sensitive to soft errors
• 2005 [Selse.org]: Soft error workshop (70% industry attendees)
• 2011 [ZeroSoft’06]: Expected 70% of chips to fail in a year
5
MULTI-CYCLE SOFT ERROR PROPAGATION
[Figure: example sequential circuit (nodes A to F, H, I, J, one flip-flop, primary output PO) shown in two consecutive clock cycles with its logic values; the annotation reads "Erroneous value is captured".]
First cycle: the SET does not propagate to the primary output (PO)
Second cycle: the error propagates to the primary output (PO)
6
SER MODELING IN SINGLE CYCLE
SER = Nominal FIT × Logic Derating × Timing Derating × Electrical Derating
• Nominal FIT: occurrence rate of cosmic rays at the error site; computed once for library characterization
• Logical derating
• Timing derating
• Electrical derating
[Figure: small example circuit with gates A to E feeding a clocked flip-flop (FF).]
LOGICAL DERATING MODELING
7
The main idea:
• Traverse structural paths from the SEU site to POs and FFs
• Use signal probabilities (SP) for off-path signals
• SPA: probability of gate “A” having logic value “1”
• Effective techniques are available for SP computation
Example (EPP: Error Propagation Probability), with off-path signals SPB = 0.2 and SPC = 0.4:
EPP(A→D) = SPB = 0.2
EPP(A→E) = EPP(A→D) × (1 − SPC) = 0.2 × 0.6 = 0.12
[Figure: error site A with on-path signals D and E, off-path signals B and C, and a flip-flop (FF); glitch annotations w, t and w', t'.]
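The path traversal in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `epp_along_path` is hypothetical, and the gate types (an AND-like gate at D, an OR-like gate at E) are inferred from the arithmetic on the slide, not stated by the authors. The sketch covers a single path without reconvergent fan-out.

```python
def epp_along_path(path):
    """Multiply per-gate propagation probabilities along one structural path.

    path: list of (gate_type, off_path_sps) tuples, where off_path_sps holds
    the signal probabilities (probability of logic 1) of the off-path inputs.
    Gate types here are an assumption made for illustration.
    """
    epp = 1.0
    for gate_type, off_path_sps in path:
        for sp in off_path_sps:
            if gate_type in ("AND", "NAND"):
                epp *= sp            # off-path input must be 1 to let the error through
            elif gate_type in ("OR", "NOR"):
                epp *= 1.0 - sp      # off-path input must be 0 to let the error through
            else:
                raise ValueError(f"unsupported gate type: {gate_type}")
    return epp


# Values from the slide: SPB = 0.2 (off-path input of D), SPC = 0.4 (off-path input of E)
epp_a_to_d = epp_along_path([("AND", [0.2])])
epp_a_to_e = epp_along_path([("AND", [0.2]), ("OR", [0.4])])
print(round(epp_a_to_d, 2), round(epp_a_to_e, 2))
```

With the slide's signal probabilities this reproduces EPP(A→D) = 0.2 and EPP(A→E) = 0.12.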
8
PROPAGATION RULES: ON-PATH GATES
Reconvergent paths
• The error is propagated to two or more inputs of a gate
• The polarity of the propagated error matters!
Four logic values are needed to represent the state of each line:
• 0, 1: no error propagation (error masked)
• a: error propagation with the same polarity as the error site
• ā: error propagation with the opposite polarity to the error site
Corresponding probabilities for each line Ui: Pa(Ui), Pā(Ui), P1(Ui), P0(Ui)
Error Propagation Probability (EPP) rules were developed for all logic gates
9
PROPAGATION RULES
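On-path gates: Pa(Ui) + Pā(Ui) + P1(Ui) + P0(Ui) = 1
Off-path gates: P1(Ui) + P0(Ui) = 1
Gate rules, AND gate with inputs X_1, ..., X_n:
$$P_1(out) = \prod_{i=1}^{n} P_1(X_i)$$
$$P_a(out) = \prod_{i=1}^{n} \left[ P_1(X_i) + P_a(X_i) \right] - P_1(out)$$
$$P_{\bar{a}}(out) = \prod_{i=1}^{n} \left[ P_1(X_i) + P_{\bar{a}}(X_i) \right] - P_1(out)$$
$$P_0(out) = 1 - \left[ P_1(out) + P_a(out) + P_{\bar{a}}(out) \right]$$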
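As a rough illustration, the AND-gate rule can be written as a small Python function over the four states. The dictionary representation and the names `and_rule` and `off_path` are assumptions made for this sketch; the key "na" stands for ā.

```python
from math import prod

STATES = ("a", "na", "1", "0")  # "na" stands for a-bar (opposite polarity)


def off_path(sp):
    """Off-path line: only the error-free states 1 and 0 are possible."""
    return {"1": sp, "0": 1.0 - sp}


def and_rule(inputs):
    """Four-value probabilities at the output of an AND gate."""
    xs = [{s: p.get(s, 0.0) for s in STATES} for p in inputs]
    p1 = prod(x["1"] for x in xs)
    pa = prod(x["1"] + x["a"] for x in xs) - p1
    pna = prod(x["1"] + x["na"] for x in xs) - p1
    p0 = 1.0 - (p1 + pa + pna)
    return {"a": pa, "na": pna, "1": p1, "0": p0}


# An error arriving with certainty, gated by an off-path input with SP = 0.7
out = and_rule([{"a": 1.0}, off_path(0.7)])

# Reconvergence with opposite polarities: a AND a-bar is always 0, so the error is masked
masked = and_rule([{"a": 1.0}, {"na": 1.0}])
```

The first call gives 0.7(a) + 0.3(0), a result of the same form as the P(E) entry in the case study. The second call shows why polarity must be tracked: without the distinction between a and ā, the reconvergent error would wrongly be counted as propagated.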
10
TIMING DERATING MODELING
• Find all possible propagated waveforms
  • Enhanced static timing analysis
  • Record all possible transitions at each reachable gate due to a glitch at the error site
• How?
  • Create a glitch of width w, represented by two events: (a, t), (ā, t+w)
  • For both positive and negative glitches:
    1. Inject two events (a, t), (ā, t+w) at the error site
    2. Find all events at the outputs of all on-path gates
    3. Calculate the error propagation probabilities Pa, Pā for each event
    4. Continue the propagation until a PO or FF is reached
  • Error propagation probabilities are computed for all possible waveforms
  • For each waveform, the latching probability (LP) is computed as follows:
$$LP = \frac{S + H + W}{T}$$
S: setup time, H: hold time, W: glitch width, T: clock period
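The two-event representation of a glitch described above can be sketched as follows. The `Event` class, the function names, and the fixed per-gate delay are assumptions made for illustration; the slides do not specify a data structure, and "na" stands for ā.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    polarity: str  # "a" or "na" (na stands for a-bar)
    time: float


def inject_glitch(t, w):
    """A glitch of width w starting at time t: events (a, t) and (a-bar, t + w)."""
    return [Event("a", t), Event("na", t + w)]


def through_gate(events, delay, inverting=False):
    """Shift events by the gate delay; an inverting gate swaps the polarity."""
    flip = {"a": "na", "na": "a"}
    return [
        Event(flip[e.polarity] if inverting else e.polarity, e.time + delay)
        for e in events
    ]


glitch = inject_glitch(t=0, w=1)
after_buffer = through_gate(glitch, delay=3)                    # hypothetical delay of 3
after_inverter = through_gate(glitch, delay=3, inverting=True)
```

A full analysis would also attach the probabilities Pa and Pā to each event and merge events arriving over different paths; this sketch only shows how event times and polarities move through a gate.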
11
TIMING LOGIC DERATING
Different Glitches may propagate to the POs or FFs due to re-convergent fan-out
12
ELECTRICAL DERATING MODELING
Algorithm: computing electrical masking while propagating events
  Vomin(Gj, inputk): minimum voltage of input k of Gj
  Vomax(Gj, inputk): maximum voltage of input k of Gj
  Vomin(Gj): minimum voltage of Gj output
  Vomax(Gj): maximum voltage of Gj output
  PWo: output pulse width
  For each gate Gj in List(Gi) do
    For each valid waveform (Wl) in Event List(Gj) do
      Vomin(inputs) = Max(Vomin of gate inputs on waveform Wl);
      Vomax(inputs) = Min(Vomax of gate inputs on waveform Wl);
      Compute Vomin(Gj)
      Compute Vomax(Gj)
      Compute PWo using computed Vomin(Gj) and Vomax(Gj)
    end
  end
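The loop structure of this algorithm can be sketched in Python. The slides do not say how Vomin(Gj), Vomax(Gj) and PWo are computed, so the sketch takes them as caller-supplied functions; the toy lambdas in the usage part are placeholders with no physical meaning, and all names are hypothetical.

```python
def electrical_masking(list_gi, event_list, compute_vomin, compute_vomax, compute_pwo):
    """Walk the gates in List(Gi) and their valid waveforms, as in the algorithm.

    event_list maps each gate to {waveform_name: [(Vomin, Vomax) per gate input]}.
    The three compute_* callables stand in for the cell models, which the
    slides do not describe.
    """
    results = {}
    for gj in list_gi:
        for name, input_ranges in event_list[gj].items():
            vomin_in = max(lo for lo, _ in input_ranges)   # Max of the input Vomin values
            vomax_in = min(hi for _, hi in input_ranges)   # Min of the input Vomax values
            vomin = compute_vomin(gj, vomin_in, vomax_in)
            vomax = compute_vomax(gj, vomin_in, vomax_in)
            results[(gj, name)] = (vomin, vomax, compute_pwo(vomin, vomax))
    return results


# Toy stand-ins (assumptions, not physical models): pass voltages through unchanged
# and scale a 50 ps pulse by the remaining voltage swing.
toy = electrical_masking(
    ["G1"],
    {"G1": {"W1": [(0.0, 1.0), (0.1, 0.9)]}},
    compute_vomin=lambda gate, lo, hi: lo,
    compute_vomax=lambda gate, lo, hi: hi,
    compute_pwo=lambda lo, hi: 50.0 * (hi - lo),
)
```

Passing the cell models in as functions keeps the traversal separate from the library characterization data that a real tool would use.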
A CASE STUDY: ERROR PROPAGATION FOR TWO CLOCK CYCLES
[Figure: the example circuit unrolled over two clock cycles (Clock=1 and Clock=2), with nodes A to F, H, I, J, flip-flops FF1 and FF2, and the primary output PO. An SET is injected at node B. The slide is built up over several animation frames; the values below are those of the final frame. The overbars distinguishing a from ā on individual events were lost in extraction.]
SP values annotated in the figure:
Clock=1: 0.7, 0.3, 0.2, 0.4, 0.5, 0.2, 0.5, 0.1, 0.5
Clock=2: 0.7, 0.3, 0.2, 0.4, 0.5, 0.2, 0.5, 0.3, 0.5

Clock=1 (all three deratings may occur):
T=0, T=1: P(B) = 1(a)
T=3, T=4: P(F) = 1(a)
T=5, T=6: P(E) = 0.7(a) + 0.3(0)
T=8, T=9: P(H) = 0.2(a) + 0.8(0)
T=10, T=11: P(I) = 0.42(a) + 0.4(1) + 0.18(0)
  ELPP(a) = 0.42*0.42 = 0.176, ELPP(ā) = 0, LP = 0.5
  SFP1mcycle(B) = 0.176*0.5 = 0.088
T=10, T=11: P(J) = 0.1(1) + 0.63(a) + 0.27(0)
T=13, T=14: P(J) = 0.2(1) + 0.16(a) + 0.64(0)
  ELPP1(a) = 0.63*0.63*0.84*0.84 = 0.28
  ELPP1(ā) = 0.37*0.37*0.16*0.16 = 0.003
  ELPP2(a) = 0.63*0.37*0.16*0.84 = 0.031
  ELPP2(ā) = 0.37*0.63*0.84*0.16 = 0.031
  Latching probabilities: 0.5 (applied to ELPP1) and 0.7 (applied to ELPP2)
  P(a) = 0.28*0.5 + 0.031*0.7 = 0.161
  P(ā) = 0.003*0.5 + 0.031*0.7 = 0.023

Clock=2 (only logical derating may occur):
  P(a) = 0.7*0.161 = 0.112, P(ā) = 0.7*0.023 = 0.016
  P(a) = 0.8*0.112 = 0.089, P(ā) = 0.8*0.016 = 0.012
  SFP2mcycle(B) = 1-(1-0.088)*(1-0.101) = 0.18, where 0.101 = 0.089 + 0.012
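The arithmetic of the case study can be checked with a short Python sketch. The function names are hypothetical; the formulas are read directly off the slide: a latched probability is the sum of ELPP × LP over the waveforms, and the multi-cycle failure probability is one minus the product of the per-cycle survival probabilities.

```python
def latched_probability(waveforms):
    """Sum of ELPP * LP over all waveforms reaching a flip-flop."""
    return sum(elpp * lp for elpp, lp in waveforms)


def combine_cycles(per_cycle_failure):
    """Failure over several cycles: 1 - product of (1 - p) for each cycle."""
    survive = 1.0
    for p in per_cycle_failure:
        survive *= 1.0 - p
    return 1.0 - survive


# Values from the slide
p_a = latched_probability([(0.28, 0.5), (0.031, 0.7)])     # slide lists 0.161
p_na = latched_probability([(0.003, 0.5), (0.031, 0.7)])   # slide lists 0.023
sfp_two_cycles = combine_cycles([0.088, 0.089 + 0.012])    # slide lists 0.18
print(round(p_a, 4), round(p_na, 4), round(sfp_two_cycles, 3))
```

The slide appears to truncate intermediate results to three decimals (for example 0.1617 is shown as 0.161), so small differences in the last digit are expected.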
14
THE TOOL: MLET MULTI-CYCLE LOGICAL-ELECTRICAL-TIMING DERATING
[Flowchart of the MLET flow. Box labels, in approximate flow order:]
START
Read design; read technology library cells
Characterize library cells
Calculate injected pulse width for library cells
Extract netlist adjacency list (Gate_List)
Extract SPs using MC simulation
Start traversing Gate_List
Decision: End of Gate_List? (Yes: compute overall design SER, then End)
Extract forward cone of gate Gi (List_Gi)
Sort List_Gi using a topological sort algorithm
Start traversing List_Gi
Decision: CLK=1?
  Yes: compute ELPPs, Vo_min, Vo_max, latching probability for each DFF and output pulse width of gate output (Gj); propagate computed values to gate fanout signals
  No: compute error propagation probabilities (logic derating only); propagate computed values to gate fanout signals
Decision: End of List_Gi?
Compute failure probabilities for all FFs, increment the clock (CLK)
Compute SFPCi(Gi)
Decision: |SFPCi(Gi) - SFPCi-1(Gi)| < e?
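The stopping rule in the flowchart, |SFPCi(Gi) - SFPCi-1(Gi)| < e, suggests an iteration over clock cycles that ends once the accumulated failure probability stops changing. A minimal sketch, assuming a caller-supplied function that returns the failure probability contributed in each cycle (the function names, the default threshold, and the decaying toy model are all assumptions):

```python
def multi_cycle_sfp(per_cycle_failure, epsilon=1e-3, max_cycles=100):
    """Accumulate the system failure probability cycle by cycle.

    per_cycle_failure(clk) returns the probability that the error reaches an
    observable output in cycle clk. Iteration stops when the change in SFP
    between two consecutive cycles falls below epsilon.
    """
    sfp_prev = 0.0
    survive = 1.0
    for clk in range(1, max_cycles + 1):
        survive *= 1.0 - per_cycle_failure(clk)
        sfp = 1.0 - survive
        if abs(sfp - sfp_prev) < epsilon:
            return sfp, clk
        sfp_prev = sfp
    return sfp_prev, max_cycles


# Toy model (an assumption): the per-cycle contribution halves every cycle
sfp, cycles = multi_cycle_sfp(lambda clk: 0.1 * 0.5 ** (clk - 1))
print(round(sfp, 3), cycles)
```

The `max_cycles` bound is a safeguard added for the sketch so that the loop terminates even if the per-cycle contributions never fall below the threshold.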
15
EXPERIMENTAL RESULTS: RUN TIME
Execution times for MC simulation approach, SP computation, and MLET approach
• On average, 4 orders of magnitude faster than MC-based simulation
• The time required to compute SPs is also 5 orders of magnitude less than MC-based simulation
16
EXPERIMENTAL RESULTS: ACCURACY
Difference of derating factors obtained by MLET using various SP variances compared to MC simulations (for an injected pulse width of 50 ps)
• MLET has an accuracy of about 97% compared to the MC fault injection approach
17
MULTI-CYCLE SERS
Multi-cycle SER estimation of s820 and s832 ISCAS’89 circuits using MLET
18
CONCLUSIONS & FUTURE WORK
• SER estimation is very challenging, as it requires dynamic analysis of transients.
• Existing SER estimation approaches investigate error propagation probabilities for only a single cycle, which results in an inaccurate system failure rate.
• We have proposed a very fast and accurate analytical approach, called MLET, which has four main features:
1. It runs very fast.
2. All three masking factors are considered.
3. The effects of error propagation in re-convergent fan-outs are modeled.
4. The effect of multi-cycle error propagation on overall circuit SER is considered.
19
CONCLUSIONS & FUTURE WORK CONT’D
Experimental results for some ISCAS’89 benchmark circuits show that MLET:
• Is 4 orders of magnitude faster than the MC simulation based fault injection method
• Has an accuracy of about 97%
Future work: estimating the SER of a circuit in the presence of Multiple Event Transients (METs), a reliability concern in ultra-deep sub-micron technologies
20
THANK YOU FOR YOUR ATTENTION
21
RELATED WORK: SER MODELING
Circuit/Logic-Level Approach
• Fault injection
• SERA by Zhang et al. [ICCAD’04]
• SEAT-LA by Rajaraman et al. [VLSID’06]
• Mohanram et al. [ITC’03]
• Maheshwari et al. [DFT’03]
• Asadi et al. [DSN’03] [PRDC’04]
• Seifert et al. [TDMR’04]
• Probabilistic Transfer Matrices (PTM): Krishnaswamy et al. [DATE’05]
• Binary Decision Diagram (BDD): FASER by Zhang et al. [ISQED’06] [SELSE’05]