
[IEEE Comput. Soc. Press Tenth International Conference on VLSI Design - Hyderabad, India (4-7 Jan. 1997)] Proceedings Tenth International Conference on VLSI Design - Optimal design


Optimal Design of Checksum-Based Checkers for Fault Detection in Linear Analog Circuits

Heebyung Yoon, Abhijit Chatterjee and Joseph L.A. Hughes
School of Electrical and Computer Engineering
Georgia Institute of Technology
Atlanta, GA 30332-0250, U.S.A.

Abstract

Traditionally, built-in self-test (BIST) techniques have assumed access to only the input and output nodes of the circuit under test (CUT). It has been shown earlier that checksum-based checkers can be designed to perform on-line fault detection in linear analog circuits using access to certain internal nodes of the CUT. In this paper, we address the problem of optimizing the checker circuitry to maximize the detectability of faults in the CUT. The optimization problem is solved as a linear programming problem. The resulting checker can be used to perform both BIST of the CUT and on-line error detection. Faults in the checker hardware are taken into account during the checker optimization process.

1. Introduction

The advent of multi-chip module technology has enabled the integration of complex digital and analog functions into single electronic packages. Such mixed-signal systems, as they are called, are hard to test and debug due to the massive complexity of the circuits involved and also due to the presence of analog signals, which are inherently imprecise in nature.

In this paper, we discuss the optimal design of built-in self-test (BIST) hardware for linear analog circuits that often form the core of complex mixed-signal systems. There has been significant work in the past on the use of checksum codes for fault detection in digital circuits. The concept of algorithm-based fault tolerance for matrix operations using checksum codes for detecting and correcting errors in computer arithmetic was first discussed by Huang and Abraham in [4]. Fault-tolerant matrix arithmetic and concurrent error detection schemes for N-point FFT networks were studied by Jou and Abraham in [5] and [6]. In [7], Nair and Abraham describe the use of real-number checksum codes for detecting and correcting errors in matrix-vector computations.

Recently, Chatterjee applied continuous checksums to the problem of on-line detection and correction of errors in linear analog circuits [1], [2]. While Chatterjee [1], [2] gives necessary and sufficient conditions for a fault to be detectable with specified checksum code parameters, he does not solve the problem of finding the best checksum code parameters that maximize overall fault coverage. In this paper, we show that the optimal checker design can be obtained by solving a linear programming problem. First, we present the basic concepts.

This work was funded by the Packaging Research Center at the Georgia Institute of Technology under NSF grant number EEC-9402723 and in part by NSF grant number MIP-9309740.

0-8186-7755-4/96 $05.00 © 1996 IEEE

2. Basic concepts and theory

The electrical input-output behavior of linear analog circuits can be described by a state equation of the form [1]

$$X(s) = [A]\,\frac{X(s)}{s} + [B]\,\frac{U(s)}{s}, \qquad (1)$$

where $X(s)$ is the Laplace transform of the circuit state variables $x_i(t)$ (order N by 1), the matrix $A = [a_{ij}]$ is an N by p matrix and the matrix $B = [b_{ij}]$ is an N by m matrix.

It is possible to encode the matrices A and B above using checksum codes as follows. The matrix A of Equation (1) is encoded by the addition of a row $R = [r_1, r_2, \ldots, r_p]$ to A, and the matrix B of Equation (1) is encoded by the addition of a row $Q = [q_1, q_2, \ldots, q_m]$ to B [1]. The row $R = CV \cdot A$ and the row $Q = CV \cdot B$, where $CV = [\alpha_1, \alpha_2, \ldots, \alpha_p]$ is called the coding vector and $\alpha_i$, $1 \le i \le p$, are real numbers. The modified state equation can be written as follows:

$$X_M(s) = \begin{bmatrix} A \\ R \end{bmatrix} \frac{X(s)}{s} + \begin{bmatrix} B \\ Q \end{bmatrix} \frac{U(s)}{s}. \qquad (2)$$

The vector $X_M(s) = [X(s), c(s)]^T$ is obtained by the addition of the check variable $c(s)$ to the vector $X(s)$ of Equation (1). It has been shown that in the modified state equation, $c(s) = CV \cdot X(s)$ [1]. By using hardware to compute and compare the left and right hand sides of the latter equation, it is possible to detect faults in the circuit under test (CUT).

The error signal can be obtained using the technique proposed by Chatterjee [1] as follows. Since $c(s) = CV \cdot X(s) = \sum_{i=1}^{p} \alpha_i x_i(s)$ and $c(s) = \sum_{i=1}^{p} r_i x_i(s)/s + \sum_{i=1}^{m} q_i u_i(s)/s$ by the encoding of matrices A and B, the error signal of Equation (3) is obtained by comparing these two expressions for $c(s)$.
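As a minimal sketch of the encoding (not the authors' implementation), the check rows can be computed and the checksum relation verified numerically. The sketch assumes the time-domain form of Equation (1) with zero initial conditions, dx/dt = A x + B u, so that d(CV·x)/dt = R·x + Q·u holds exactly in the fault-free circuit. The 2-state matrices, the test point and all helper names are hypothetical and chosen only for illustration.

```python
# Hypothetical 2-state example; the matrices are NOT taken from the paper.

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def mat_vec(M, v):
    """Matrix times column vector."""
    return [dot(row, v) for row in M]

def row_times_matrix(r, M):
    """Row vector times matrix, i.e. r . M."""
    return [sum(r[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

A = [[-2.0, -1.0], [1.0, 0.0]]   # nominal state matrix (assumed)
B = [[1.0], [0.0]]               # nominal input matrix (assumed)
CV = [1.0, 1.0]                  # coding vector

R = row_times_matrix(CV, A)      # check row appended to A
Q = row_times_matrix(CV, B)      # check row appended to B

def checksum_residual(A_actual, x, u):
    """d(CV.x)/dt of the actual circuit minus the value predicted by R and Q."""
    x_dot = [a + b for a, b in zip(mat_vec(A_actual, x), mat_vec(B, u))]
    return dot(CV, x_dot) - (dot(R, x) + dot(Q, u))

x, u = [0.5, -0.25], [1.0]
A_faulty = [[-2.2, -1.0], [1.0, 0.0]]   # 10% deviation in one entry

residual_ok = checksum_residual(A, x, u)
residual_faulty = checksum_residual(A_faulty, x, u)
print(R, Q, residual_ok, residual_faulty)
```

The residual vanishes for the nominal matrices and becomes non-zero once an entry of A deviates, which is the behavior the error detection circuit relies on.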

Example 1: Consider the lowpass leapfrog filter shown in Figure 1. The outputs of op-amps OA1, OA3, OA4 and OA6 correspond to the state variables $x_1$, $x_2$, $x_3$ and $x_4$ of the filter, respectively.

Figure 1. Lowpass leapfrog filter

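After modification with $[\alpha_1, \alpha_2, \alpha_3, \alpha_4] = [1, 1, 1, 1]$, with $\omega_1 = 1/RC = 2\omega_2$, R = 10 kΩ and C = 0.01 µF, the modified state equation of the filter of Figure 1 becomes Equation (4), in which the check row appended to A is

$$R = [\,\omega_2 - \omega_1,\ -\omega_1 - \omega_2,\ \omega_1 + \omega_2,\ -\omega_1 - \omega_2\,].$$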

For the lowpass leapfrog filter of Figure 1, we obtain:

$$\mathrm{error}(s) = -x_1 - x_2 - x_3 - x_4 - \frac{x_1}{sCR} - \frac{3x_2}{sCR} + \frac{3x_3}{sCR} - \frac{3x_4}{sCR} - \frac{2u_1}{sCR} - \frac{Ke}{sCR}. \qquad (5)$$

The last term on the right side of Equation (5) is necessary for stability of the error detection circuit (EDC) [1].

Using Equation (5), the EDC of the lowpass leapfrog filter with CV = [1, 1, 1, 1] can be designed [1] as shown in Figure 2.


Figure 2. Error detection circuit for lowpass leapfrog filter

The output error(t) of Figure 2 is ideally zero in the fault-free case and non-zero otherwise.

3. Problem definition and fault models

We assume a single fault model, namely that only a single parameter of the analog CUT is out of its tolerance box [3]. To determine the optimal checksum code, we need only consider "soft" parameter deviations from the nominal [3], as shorts and opens will be detected automatically.

From analyses of the circuit specifications, we assume that acceptable ranges of individual circuit design parameters have been obtained under the single fault model. Let the sensitivity of the error signal to the circuit parameter $P_i$ be expressed as $S^{error}_{P_i}(\alpha_1, \alpha_2, \ldots, \alpha_p)$. Consider three parameters $P_i$, $P_j$ and $P_k$ such that $S^{error}_{P_i}(\alpha_1, \alpha_2, \ldots, \alpha_p) > S^{error}_{P_j}(\alpha_1, \alpha_2, \ldots, \alpha_p) > S^{error}_{P_k}(\alpha_1, \alpha_2, \ldots, \alpha_p)$ for some given $\alpha_1, \alpha_2, \ldots, \alpha_p$. Let the threshold voltage of the error signal above which a fault is indicated be T. For argument's sake, let us assume that the acceptable ranges of $P_i$, $P_j$ and $P_k$ are all within 10% of their nominal values.

There are three possible choices of T, as shown in Figure 3. In case (a), T is chosen equal to the error voltage given by a 10% deviation in $P_i$ (Figure 3(a)); in case (b), T is chosen equal to the error voltage given by a 10% deviation in $P_k$ (Figure 3(b)); and in case (c), T is chosen equal to the error voltage given by a 10% deviation in $P_j$ (Figure 3(c)). Case (a) leads to a loss in the coverage of faults in both $P_j$ and $P_k$, since for each of them there will be a range of values for which the parameter is outside its acceptable 10% deviation range but the error signal is less than T. Case (b) leads to the possibility of a false alarm, as there will now be a range of values of both $P_i$ and $P_j$ for which the parameter is within its acceptable 10% deviation range but the error signal is larger than T. Our objective is to find T and CV such that false alarms are minimized or eliminated and fault coverage is maximized.

[Figure 3 panels: (a) Case 1, (b) Case 2, (c) Case 3. Panel annotations: T = max |e(t)|, min P(FA); T = min |e(t)|, max P(FA); min |e(t)| < T < max |e(t)|.]

Figure 3. Three possible choices of threshold T
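The trade-off between the three threshold choices can be illustrated with a small sketch. It assumes that the error voltage grows linearly with the percentage deviation of a parameter (the paper makes this observation in Section 4); the sensitivity values and all names below are hypothetical and serve only to reproduce the qualitative behavior of cases (a), (b) and (c).

```python
import math

TOLERANCE = 10.0  # acceptable deviation from nominal, in percent

# Hypothetical sensitivities in volts of error signal per percent deviation,
# ordered so that P_i is the most sensitive and P_k the least sensitive.
SENSITIVITY = {"P_i": 0.5, "P_j": 0.25, "P_k": 0.125}

def trip_point(threshold, sensitivity):
    """Deviation (percent) at which the error signal reaches the threshold."""
    return threshold / sensitivity

def classify(threshold):
    """Label each parameter by what the chosen threshold does to it."""
    result = {}
    for name, s in SENSITIVITY.items():
        d = trip_point(threshold, s)
        if math.isclose(d, TOLERANCE):
            result[name] = "exact"            # flagged exactly at the tolerance limit
        elif d > TOLERANCE:
            result[name] = "missed faults"    # deviations between 10% and d escape
        else:
            result[name] = "false alarms"     # deviations between d and 10% are flagged
    return result

case_a = classify(SENSITIVITY["P_i"] * TOLERANCE)  # T set by the most sensitive parameter
case_b = classify(SENSITIVITY["P_k"] * TOLERANCE)  # T set by the least sensitive parameter
case_c = classify(SENSITIVITY["P_j"] * TOLERANCE)  # T set by the middle parameter
print(case_a, case_b, case_c, sep="\n")
```

With these assumed numbers, case (a) misses faults in P_j and P_k, case (b) raises false alarms for P_i and P_j, and case (c) does one of each.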

4. Fault detection and BIST

For BIST purposes, an impulse is chosen as the test stimulus. It can easily be shown that, under application of an impulse in the time domain, if $h'(t)$, the impulse response of the faulty circuit, is different from $h(t)$, the impulse response of the fault-free circuit, then $\mathrm{error}(t) \neq 0$ for some $t > 0$. In practice, however, a short pulse of finite width and height is applied to approximate an impulse.

In the following, we show how the checker is designed when the input stimulus is an impulse (or a short pulse). The resulting checker can be used for BIST or on-line error detection. We make the following observations from Figures 4 and 5. From Figure 4, we observe that the largest value of the error signal, given by $|e(t)|$, in response to a pulse stimulus, increases linearly with the percentage deviation from the nominal of every circuit component. From Figure 5, we observe that for a given percentage deviation from the nominal of a circuit component, $|e(t)|$ is a linear function of $\alpha_1$. This can be shown, in general, to be true of the other elements of CV as well.

Let the magnitude $|e(t)|$ of the error signal due to a fault in component $c_i$ of the CUT be given by $y_i$. In general, $y_i$ may depend on one or more of the elements $\alpha_1, \alpha_2, \ldots, \alpha_p$ of CV. Let us assume that $y_i$ depends only on $\alpha_1$. It is possible to represent the relationship of Figures 4 and 5 using Equation (6):

$$y_i = c_1 D_i \alpha_1. \qquad (6)$$

In Equation (6), $c_1$ is a constant of proportionality and $D_i$ is the percentage deviation in the value of $c_i$ from the nominal. In general, when $y_i$ depends on more than one element of CV, we can write Equation (7):

$$y_i = D_i \left( \sum_{j=1}^{k} c_j \alpha_j \right). \qquad (7)$$

In Equation (7), k is the number of elements of CV that $y_i$ depends upon.

Figure 4. Error signal for leapfrog filter (R1) with fixed tolerances

Figure 5. Error signal for leapfrog filter (R1) with coding vector $\alpha_1$

Table 1 shows the elements of CV that must be optimized for faults in the respective components. Note that to detect a fault in R1 one needs to optimize only the value of $\alpha_1$, but to detect a fault in R26 one needs to optimize the values of $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$.

Table 1. Faults in components vs. coding vectors

Coding vector    Faults in components
$\alpha_1$       R1, R2, R6, R17, R18, R19, R24

Every parameter of the CUT has a range of values over which the circuit specifications are not violated. Let this range for the parameter $P_i$ (which may represent the value of a passive component) be given by $S_i^- \le P_i \le S_i^+$. For simplicity, let us assume that $S_i^+ - P_i = P_i - S_i^-$ and that $D_i$ for every circuit component ($D_i = \frac{P_i - S_i^-}{P_i} \times 100\%$) is 10%. We do a preliminary design of the checker with the basic CV = [1, 1, ..., 1]. The average noise on the error signal is computed and the threshold T is chosen to be just larger than this noise level. The problem then is to choose CV such that the probability of a false alarm is near zero and fault coverage is maximized. The optimum solution is obtained by solving the following linear programming problem.

[objective function] subject to the P constraints (P = total number of circuit parameters (elements))

Figure 6. Formulation of the optimization problem

The P constraints ensure that under fault-free conditions T is always greater than $|e(t)|$, thereby eliminating false alarms. The objective function maximizes fault coverage by minimizing the range of values about the nominal for which component faults are not detected.

Example 2: Using $D_i$ = 10% and T = 3.58 volts, the optimization problem becomes (Figure 7):

After solving the above optimization problem, the optimal coding vector (OCV) is $[\alpha_1, \alpha_2, \alpha_3, \alpha_4] = [1, 1.5857, 2.0487, 1.0581]$. Using the OCV and Equation (3), the equation of the optimal EDC becomes:

$$\mathrm{error}(s) = -x_1 - 1.5857x_2 - 2.0487x_3 - 1.0581x_4 - \frac{0.4143x_1}{sCR} - \frac{4.0487x_2}{sCR} + \frac{3.7019x_3}{sCR} - \frac{4.1649x_4}{sCR} - \frac{2u_1}{sCR} - \frac{e}{10sCR}. \qquad (8)$$

Figure 8 shows the optimal EDC for the lowpass leapfrog filter.

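$$\text{Minimize } (15.611\alpha_1 + 4.9934\alpha_2 + 4.1885\alpha_3 + 15.152\alpha_4 - 114.56)$$

subject to the constraints

$$\begin{aligned}
\alpha_1 &\le 1 \\
\alpha_2 &\le 3.817 \\
\alpha_3 &\le 3.837 \\
\alpha_4 &\le 1.946 \\
0.8453\alpha_1 + 1.7853\alpha_4 &\le 3.58 \\
1.0004\alpha_2 + 1.8842\alpha_4 &\le 3.58 \\
0.8526\alpha_3 + 1.7326\alpha_4 &\le 3.58 \\
1.7253\alpha_1 + 0.9053\alpha_3 &\le 3.58 \\
1.8656\alpha_1 - 0.9512\alpha_2 &\le 3.58 \\
0.1785\alpha_1 + 0.0915\alpha_2 + 0.0305\alpha_3 + 0.0565\alpha_4 &\le 3.58 \\
0.1744\alpha_1 + 0.1024\alpha_2 + 0.0364\alpha_3 + 0.0554\alpha_4 &\le 3.58 \\
0.0465\alpha_1 + 0.0185\alpha_2 + 0.0025\alpha_3 + 0.0145\alpha_4 &\le 3.58 \\
0.2052\alpha_1 + 0.1102\alpha_2 + 0.0342\alpha_3 + 0.0752\alpha_4 &\le 3.58 \\
\alpha_i &\ge 0, \quad i = 1, 2, 3, 4
\end{aligned}$$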

Figure 7. Optimization problem with T = 3.58 volts and Di = 10%
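The linear program of Figure 7 is small enough to be checked by brute force. The sketch below is not the authors' solver: it enumerates the vertices of the feasible region (every intersection of four constraint boundaries) and keeps the best feasible one. Two points are assumptions. First, with non-negative coding-vector elements and positive weights, a literal minimization of the printed objective would drive every element to zero, so the sketch assumes the intended direction is to maximize the weighted sum (equivalently, to minimize 114.56 minus that sum). Second, the variable in the constraint with coefficient 1.7253 is read as alpha_1. All function and variable names are hypothetical.

```python
from itertools import combinations

# Objective weights and constraints transcribed from Figure 7.
WEIGHTS = [15.611, 4.9934, 4.1885, 15.152]
T = 3.58  # threshold in volts

# Each entry is (coefficients of alpha_1..alpha_4, upper bound).
ROWS = [
    ([1, 0, 0, 0], 1.0),
    ([0, 1, 0, 0], 3.817),
    ([0, 0, 1, 0], 3.837),
    ([0, 0, 0, 1], 1.946),
    ([0.8453, 0, 0, 1.7853], T),
    ([0, 1.0004, 0, 1.8842], T),
    ([0, 0, 0.8526, 1.7326], T),
    ([1.7253, 0, 0.9053, 0], T),   # variable read as alpha_1 (assumption)
    ([1.8656, -0.9512, 0, 0], T),
    ([0.1785, 0.0915, 0.0305, 0.0565], T),
    ([0.1744, 0.1024, 0.0364, 0.0554], T),
    ([0.0465, 0.0185, 0.0025, 0.0145], T),
    ([0.2052, 0.1102, 0.0342, 0.0752], T),
]
# Non-negativity written in the same "row . alpha <= bound" form.
ROWS += [([-1 if j == i else 0 for j in range(4)], 0.0) for i in range(4)]

def solve_square(M, b):
    """Gauss-Jordan elimination with partial pivoting; None if singular."""
    n = len(b)
    aug = [[float(v) for v in M[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def best_vertex():
    """Return the feasible vertex that maximizes the weighted sum."""
    best, best_val = None, float("-inf")
    for combo in combinations(ROWS, 4):
        x = solve_square([r for r, _ in combo], [b for _, b in combo])
        if x is None:
            continue
        if all(sum(c * v for c, v in zip(r, x)) <= b + 1e-9 for r, b in ROWS):
            val = sum(w * v for w, v in zip(WEIGHTS, x))
            if val > best_val:
                best, best_val = x, val
    return best

ocv = best_vertex()
print([round(v, 4) for v in ocv])
```

Under these assumptions the best vertex agrees with the OCV reported in Example 2 to about three decimal places. For larger problems a dedicated LP solver would be the usual choice; vertex enumeration is used here only because it needs no external library.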

Figure 8. Optimal EDC for lowpass leapfrog filter

Figure 9 shows the value of $|e(t)|$ for the optimized and unoptimized checking circuits for faults in different circuit components numbered 1 through 33.

Figure 9. Comparison of error signals for unoptimized and optimized checking circuits

Fault coverage is defined as the probability that $|e(t)| > T$ for parametric (soft) or catastrophic (hard) faults in the circuit under test. As an example, consider a fault in the component $R_i$. If $a \le R_i \le b$ under fault, then the fault is assumed to be parametric; otherwise it is deemed to be catastrophic. The value of a is typically determined by circuit specifications, and the value of b is specified by the designer or is some value for which there is gross circuit malfunction (which is easily detectable).

We assume a uniform distribution for $R_i$ in the range [a, b]. Under this assumption, the probability of detecting a parametric fault in $R_i$ is x/y, from Figure 10. Let the probability of detecting a parametric fault in the ith component, as determined above, be given by $P_{par}(i)$. Also, let the probability of detecting the ith hard fault be given by $P_{hard}(i)$, out of j hard faults possible. Then the fault coverage FC is given by Equation (9).

Figure 10. Fault coverage model for bad components

Table 2. Hard faults in passive components

Fault              Error signal |e(t)| (short/open)
R1 short/open      11.52/13.01
R2 short/open      13.02/10.77
R4 short/open      9.65/5.13
R7 short/open      11.29/10.67
R13 short/open     10.67/13.18
R15 short/open     13.4/9.1
Rw short/open      12.94/12.91
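The coverage figure quoted in Example 3, which follows, can be checked with a few lines of arithmetic. The sketch takes the equal weighting of hard-fault and parametric-fault coverage directly from Equation (10); the function and parameter names are hypothetical, and the numerical inputs are the ones given in Example 3.

```python
def fault_coverage(hard_coverage, sum_p_par, n_components, w_hard=0.5):
    """Weighted combination of hard-fault and parametric-fault coverage.

    hard_coverage : fraction of hard faults detected (1.0 means all detected)
    sum_p_par     : sum of the parametric detection probabilities P_par(i)
    n_components  : number of components n
    w_hard        : weight of the hard-fault term (0.5 in Equation (10))
    """
    return w_hard * hard_coverage + (1.0 - w_hard) * (sum_p_par / n_components)

# Values from Example 3: all hard faults detected, sum of P_par(i) = 30.06745, n = 33.
fc = fault_coverage(1.0, 30.06745, 33)
print(f"{fc:.4f} ({fc * 100:.2f}%)")
```

The average parametric detection probability is 30.06745/33, roughly 0.911, so the weighted result rounds to the 95.56% reported in the example.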

Example 3: Assume that the range of the uniform distribution is between 11k and 100k (between a and b in Figure 10) and the total number of components n is 33. Using the optimal EDC of Figure 8, $\sum_{i=1}^{33} P_{par}(i) = 30.06745$. Therefore, the fault coverage FC becomes

$$FC = 0.5(1) + 0.5\left(\frac{30.06745}{33}\right) = 0.9556\ (95.56\%). \qquad (10)$$

Also, we simulated a large number of hard faults, all of which were detected. As an example, for b = 100 · (nominal value), the fault coverage for the unoptimized checking circuit was 98.13%, and for the optimized checker it was 98.22%.

5. Conclusion

In this paper we have presented a novel algorithm that generates an optimal coding vector (OCV) for the error detection circuit (EDC). To the best of our knowledge, this is the first time an algorithm for determining the fault threshold T (i.e., the magnitude of the error signal at which a fault is determined to have occurred in the linear analog circuit) and the OCV has been proposed.

We consider all faults in the circuit under test (CUT) as well as in the EDC, since the error signal also depends on the number of faulty components in the EDC (checker). It is seen that the choice of the checksum code given by the coding vector significantly impacts the performance of the checker.

References

[1] A. Chatterjee. Concurrent error detection in linear analog and switched-capacitor state variable systems using continuous checksums. International Test Conference, pages 582-591, 1991.

[2] A. Chatterjee. Concurrent error detection and fault-tolerance in linear analog circuits using continuous checksums. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1(2):138-150, June 1993.

[3] N. B. Hamida and B. Kaminska. Analog circuit testing based on sensitivity computation and new circuit modeling. International Test Conference, pages 652-661, 1993.

[4] K.-H. Huang and J. A. Abraham. Algorithm-based fault tolerance for matrix operations. IEEE Transactions on Computers, C-33(6):518-528, June 1984.

[5] J.-Y. Jou and J. A. Abraham. Fault-tolerant matrix arithmetic and signal processing on highly concurrent computing structures. Proceedings of the IEEE, 74(5):732-741, May 1986.

[6] J.-Y. Jou and J. A. Abraham. Fault-tolerant FFT networks. IEEE Transactions on Computers, 37(5):548-561, May 1988.

[7] V. Nair and J. A. Abraham. Real-number codes for fault-tolerant matrix operations on processor arrays. IEEE Transactions on Computers, 39(4):426-435, April 1990.