
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

A Low Overhead High Test Compression Technique Using Pattern Clustering With n-Detection Test Support

Seongmoon Wang, Wenlong Wei, and Zhanglei Wang

Abstract—This paper presents a test data compression scheme that can be used to further improve compressions achieved by linear-feedback shift register (LFSR) reseeding. The proposed compression technique can be implemented with very low hardware overhead. The test data to be stored in the automatic test equipment (ATE) memory are much smaller than those for previously published schemes, and the number of test patterns that need to be generated is smaller than for other weighted random pattern testing schemes. The proposed technique can be extended to generate test patterns that achieve high n-detection fault coverage. This technique compresses a regular 1-detection test cube set instead of an n-detection test cube set, which is typically n times larger. Hence, the volume of compressed test data for n-detection test is comparable to that for 1-detection test. Experimental results on a large industry design show that over 1600X compression is achievable by the proposed scheme with a test sequence length comparable to that of highly compacted deterministic patterns. Experimental results on n-detection test show that test patterns generated by the proposed decompressor can achieve very high 5-detection stuck-at fault coverage and high compression for large benchmark circuits.

Index Terms—Linear-feedback shift register (LFSR) reseeding, linear decompression, n-detection testing, test data compression.

I. INTRODUCTION

ASCERTAINING high quality of test for complex chips requires huge test data. Hence, test data volumes for complex chips often exceed the memory capacity of automatic test equipment (ATE). A number of different techniques to compress test data have been developed. Several test data compression techniques based on linear-feedback shift register (LFSR) reseeding (also called linear decompression) [1], [2] have been published since Könemann showed that it can efficiently compress test patterns [3]. Several commercial tools based on LFSR reseeding are also available [2], [4], [5]. LFSR reseeding techniques take advantage of the fact that typical scan test patterns have very few specified bits. Specified bits are those bits that are assigned binary values during test pattern generation. All other bits are not specified, i.e., don't cares.

International Technology Roadmap for Semiconductors (ITRS) 2005 [6] predicts that about 1000X compression will be required around 2013. Achieving 1000X compression by LFSR reseeding alone is very difficult.

Manuscript received December 09, 2008; revised June 14, 2009.
S. Wang and W. Wei are with NEC Laboratories America, Princeton, NJ 08540 USA (e-mail: [email protected]; [email protected]).
Z. Wang is with Cisco Systems, Inc., San Jose, CA 95134 USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/TVLSI.2009.2026420

Weighted random pattern testing has been developed as a technique to improve fault coverage in random pattern-based built-in self-test (BIST) [7], [8]. Recently, the application of weighted random pattern

testing techniques to test data compression was presented in [9]-[12]. Unlike other weighted random pattern testing schemes [9]-[11], the technique proposed in [12], which is based on 3-weight weighted random BIST (or hybrid BIST [13]-[15]), requires no on-chip memory to store weight sets. In contrast to conventional weighted random pattern BIST, where various weights, e.g., 0, 0.25, 0.5, 0.75, and 1.0, can be assigned to the outputs of test pattern generators (TPGs), in 3-weight weighted random BIST only three weights, 0, 0.5, and 1, are assigned. Due to its simplicity, it can be implemented with low hardware overhead.

The technique proposed in [12] enhances compressions achieved by simple LFSR reseeding with a 3-weight weighted random BIST technique. However, since this technique requires two LFSRs, each of which must be loaded with a separate seed for each weight set, the additional compression achieved by this technique is limited. The decompressor proposed in [16], which is a preliminary version of this paper, needs only one seed for each weight set to achieve even higher compression. The proposed method requires no special automatic test pattern generator (ATPG) that is customized for the proposed compression scheme, and hence can be used to compress test patterns generated by any ATPG tool. The proposed technique includes efficient algorithms that compute the minimum number of weight sets (generators). In addition, two variations of the decompressor architecture are proposed to satisfy different objectives. This paper extends our previous paper [16] with a technique to uncompact densely specified test cubes to balance the numbers of specified bits in weight sets (see Section IV-A), and a procedure to compute weight sets for designs with multiple scan chains (see Section VI-B).

Achieving high single stuck-at fault coverage does not guarantee high quality of testing, especially for chips fabricated with nanometer processes. n-detection testing has been studied by several researchers [17]-[20] as an effective test technique to improve unmodeled defect coverage and reduce defective parts since it was proposed by Ma et al. [21]. An n-detection test set is developed to detect each fault by n different test patterns in the test set. Generating n-detection test sets by ATPG techniques has several serious difficulties that hinder wide adoption of this test method by the industry. First, the size of an n-detection test set grows approximately linearly with n [19]. Volumes of even traditional (1-detection) test data for today's complex chips often exceed the memory capacity of ATEs. Since n times more test


patterns are applied, the test application time will also increase about n times. Test data volume and test application time are the two major factors that determine the overall test cost. Most ATPG-based techniques [17], [20] generate an n-detection test set by generating n different test sets, and then eliminating unnecessary test patterns from the set. Hence, the total test generation time can increase significantly.

Although one of the main objectives of test compression is improvement of test quality, to the best of our knowledge, there are very few published papers that directly address both high test data compression and high n-detection coverage. A straightforward approach to reducing the large volume of an n-detection test set, which was possibly generated by an ATPG-based n-detection test generation technique, is to apply an existing test compression technique to the n-detection test set. This approach will reduce the volume of the n-detection test set. However, since most test compression techniques based on LFSR reseeding [2], [3] and broadcast scan [22], [23] compress each test pattern separately, i.e., n test patterns are converted into n different compressed data, the volume of compressed data for an n-detection test set will also be n times larger than that of compressed data for the 1-detection test set if the same compression method is used. Pomeranz and Reddy [19] proposed a test compression technique for n-detection test. Their decompressor is basically a large decoder. Area overhead for decompressors of large designs, which require a large number of test patterns for high fault coverage, will be significant. Further, if test patterns are regenerated due to last-minute design changes, then the decompressor must also be redesigned and resynthesized. A technique to enhance the probability of detecting unmodeled defects by utilizing don't cares existing in test patterns is proposed by Tang et al. [24]. This technique can be used in conjunction with a test data compression scheme.

With little modification, the proposed test data compression technique can generate test patterns that achieve high n-detection coverage. This part of the study is also presented in our prior paper [25]. The proposed technique compresses a 1-detection test set rather than an n-detection test set to generate n-detection test patterns. Hence, even though the proposed technique can achieve very high n-detection coverage, the volume of compressed test data for n-detection test is comparable to that of compressed test data for 1-detection test. Unlike [19], the decompressor need not be redesigned for design changes unless there are drastic design changes.

The rest of this paper is organized as follows. Section II illustrates the generator and the conceptual decompressor for the proposed compression method. In Section III, the architecture of the proposed decompressor is described. In Section IV, the algorithm for computing generators is described. Section V describes two variations of the proposed decompressor. Section VI extends the proposed method to multiple scan chain designs. The application of the proposed method to n-detection testing is presented in Section VII. Experimental results are shown in Section VIII, and Section IX concludes the paper.

II. PRELIMINARIES

In this paper, compressions achieved by traditional LFSR reseeding are enhanced by compressing multiple test cubes into one seed. This is achieved by merging multiple test cubes into a weight set as in [13], and then compressing the weight set by LFSR reseeding. Merged test cubes are recovered by 3-weight weighted random BIST during test application.

Fig. 1. Test cube set and generator.

A test cube is a test pattern that has unspecified bits. A generator G for a circuit with m inputs, which is derived from a set of test cubes, is represented by an m-bit tuple G = (g_1, g_2, ..., g_m), where g_i is one of 0, 1, X, or C. If input i is always assigned an X or a 1 (0) in every test cube in the test cube set and assigned a 1 (0) in at least one test cube, then g_i is assigned a 1 (0) in the corresponding generator. If input i is never assigned a binary value 1 or 0 in any test cube in the test cube set, then g_i is assigned an X in the corresponding generator. Finally, if input i is assigned a 1 (0) in one test cube and a 0 (1) in another test cube in the test cube set, then the former test cube is said to conflict with the latter test cube at input i, and g_i is assigned a C in the generator. Inputs that are assigned Cs in G are called conflicting inputs of G.

Example 1: In Fig. 1, D = {d_1, d_2, d_3} is a deterministic test cube set that is merged into a generator G. In D, two of the inputs are assigned only X or 0. Hence, weight 0 is given to these inputs in G. Note that even if we fix them to 0s, we can still detect all faults that are detected by d_1, d_2, and d_3. Since another input is always assigned an X or a 1 in every test cube, weight 1 is assigned to it, i.e., it is fixed to 1. On the other hand, two inputs are assigned 0 in some test cubes and 1 in other test cubes. Hence, unlike the fixed inputs, we cannot fix these two inputs to binary values, and weight 0.5 is assigned to them (the symbol C, which denotes weight 0.5, is given to these inputs in G). Finally, since the value at the remaining input is a don't care in every test cube, it is assigned an X in G.
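The merging rule above is easy to state in code. The following Python sketch is illustrative only (the cube values are made up, not the cubes of Fig. 1): it derives a generator string from equal-length test cubes using the four symbols '0', '1', 'X' (don't care), and 'C' (conflict, weight 0.5).

    def derive_generator(cubes):
        """Merge equal-length test cubes into one generator string."""
        gen = []
        for i in range(len(cubes[0])):
            vals = {c[i] for c in cubes} - {'X'}
            if not vals:                # never specified -> don't care
                gen.append('X')
            elif vals == {'0'}:         # only 0 occurs -> weight 0
                gen.append('0')
            elif vals == {'1'}:         # only 1 occurs -> weight 1
                gen.append('1')
            else:                       # both 0 and 1 occur -> conflict
                gen.append('C')
        return ''.join(gen)

    # Three illustrative 7-bit cubes merged into one generator.
    print(derive_generator(['0X1X10X', '0110X0X', 'X01110X']))  # -> 0C1C10X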

The three test cubes that are merged into G can be recovered by the conceptual decompressor shown in Fig. 1. The S-TPG and the F-TPG are controlled by the ATE during test application to generate desired patterns, while the R-TPG is a free-running random pattern generator. The S-TPG controls the select input of the multiplexer; if g_i is assigned a C (a 0 or a 1), then the output of the S-TPG is set to a 1 (0) at the ith scan cycle to select the R-TPG (F-TPG) as the pattern source for input i. The F-TPG generates the values for the inputs that are assigned binary values in G. The F-TPG can be implemented with any linear test pattern generator such as an LFSR, a cellular automaton, or even a ring generator of embedded deterministic test (EDT) [2]. If g_i is assigned a 1 (0) in G, then the output of the F-TPG should be set to a 1 (0) at the cycle when the value for input i is scanned into the scan chain. If four test patterns are generated from G, and the R-TPG generates 00, 01, 10, and 11 for the two conflicting inputs, respectively, in the four test patterns, then all faults detected by d_1, d_2, and d_3 are also detected by the four test patterns. In this paper, the values required at the output of the S-TPG are represented by the S-pattern, while the values required at the output of the F-TPG are represented by the F-pattern.


Fig. 2. Decompressor architecture.

Assume that 100X compression is achieved by reseeding the F-TPG, i.e., compressing each individual test cube into a seed achieves 100X compression. Suppose the proposed technique merges on average three test cubes into a generator and requires about 10% additional data, which include data for the S-TPG and additionally specified bits (since multiple test cubes are merged into a generator, the number of specified bits in a generator can be larger than the number of specified bits in any individual test cube that was merged into the generator). Then the proposed method achieves approximately 100 x 3/1.1, or about 270X, compression. In other words, the proposed method improves the compression achieved by LFSR reseeding by a factor of about 3.

III. ARCHITECTURE OF THE PROPOSED DECOMPRESSOR

Fig. 2 depicts the architecture of the decompressor for the proposed method, generating test patterns for the generator G shown in Fig. 1. During test application, if every test cube that is merged into G, i.e., d_1, d_2, and d_3, covers at least one test pattern generated by the decompressor, then it is guaranteed that the test patterns generated by the decompressor detect all faults detected by d_1, d_2, and d_3. An m-bit test cube t_a is said to cover another m-bit test cube t_b if: 1) t_b,i = t_a,i at every position i where t_a,i is specified (t_a,i = 0 or 1), where 1 <= i <= m; and 2) t_b,i may take any value at the positions where t_a,i = X. In the example shown in Fig. 2, each of d_1, d_2, and d_3 covers at least one of the four test patterns generated from G.
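The cover relation reduces to a one-line check in Python (illustrative, using the cube strings of the earlier sketch):

    def covers(cube_a, cube_b):
        """True if cube_a covers cube_b: every specified bit (0 or 1) of
        cube_a must match cube_b; 'X' positions impose no constraint."""
        return all(a == 'X' or a == b for a, b in zip(cube_a, cube_b))

    print(covers('0X1X10X', '0011100'))  # True: all specified bits match
    print(covers('0X1X10X', '1011100'))  # False: they differ at position 0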

The maximum number of conflicting inputs, or Cs, allowed in a generator is denoted by C_max. If a large number of Cs is allowed, i.e., a large C_max is used, then a large number of test cubes can be merged into each generator, thus leading to higher compression. However, since random patterns generated by the R-TPG are applied to the conflicting inputs, if a large C_max is used, in general, more than 2^C_max patterns should be generated from each generator to make every test cube merged into the generator cover at least one test pattern generated by the decompressor for it. Hence, we use a small C_max to keep test sequences short.

Before each scan shift operation for G, the F-TPG is loaded with the same seed, which was computed from G by solving a set of linear equations [1], and repeatedly generates the same F-pattern shown in Fig. 2. The S-TPG comprises a modulo-7 counter, a 2 x 3 FIFO, a multiplexer, and a comparator. The modulo-7 counter is reset to 0 in every capture cycle, and then increments by 1 at every shift cycle. The FIFO is loaded with the locations of the scan inputs that are assigned Cs in G (since the scan inputs at locations 1 and 3 are assigned Cs in the generator G shown in Fig. 1, the FIFO is loaded with 1 and 3). The output s of the comparator is set to 1 when the content of the modulo-7 counter equals the first (topmost) entry of the FIFO, and set to 0 in all other cycles. Hence, s will be set to 1 (and all entries in the FIFO rotate by one entry) in the cycles when the content of the counter is 1 and 3, and 0 in all other cycles. This is repeated for all four test patterns, p_1, p_2, p_3, and p_4.

entries in the FIFO rotate by one entry) in the cycles when thecontent of the counter is 1 and 3, and 0 in all other cycles. Thisis repeated for all four test patterns, , , , and

Since very small number of s are allowed, the number of storage elements required for the S-FIFO inside the S-TPG isalso very small. The total number of storage bits for the S-FIFO,

i.e., the number of bits required to store locations of conflictingscan inputs of a generator, is given by , whereis the number of scan flip-flops in the scan chain or scan depth.

Let us consider a scan design with a scan chain that comprises130 000 scan flip-flops. Assume that to compress the test set byregular LFSR reseeding, a 650-stage (0.5% of total number of 

scan flip-flops) LFSR is required (this will achieve 200X com-pression). If for the proposed decompressor, then thetotal data overhead for the S-TPG isfor each generator, less than 10% of the test data volume for

LFSR reseeding.The S-FIFO needs 51 (depth 3 and width 17) storage ele-

ments. The depth of the S-FIFO is independent of the size of design for which the decompressor is designed. The width of the S-FIFO is logarithmically proportional to the number of 

scan flip-flops in the design. Hence, the number of storage el-ements (also, hardware overhead) for the S-TPG will not in-crease significantly even for large designs. In fact, the ratio of 

the number of storage elements for the S-FIFO to the numberof storage elements for the F-TPG will be even lower for largerdesigns. Other hardware components required to implement theproposed decompressor besides the S-FIFO and the F-TPG in-

clude a -stage modulo counter and a -bit com-parator (a 17-stage counter and a 17-bit comparator when). Combined area overhead for these components is neg-

ligible, considering the size of the design that has 130 000 flip-flops (since the R-TPG can be shared with the F-TPG, it is notconsidered as additional hardware).

IV. COMPUTING GENERATORS

If an LFSR is used to implement the F-TPG, the number of bits to be stored in the ATE memory for F-TPG data is roughly given by the number of generators times the number of specified bits in the most densely specified generator (since data for the S-TPG are small, as described before, F-TPG seeds will dominate the overall test data volume for the proposed method). Hence, to minimize the overall test data volume, we minimize the


number of specified bits in the most densely specified generator

and the total number of generators.

  A. Uncompacting Densely Specified Test Cubes

The number of care bits of a generator is tightly related to the numbers of care bits of the test cubes that were merged into the generator. The proposed technique compresses test cubes generated by a regular (commercial) ATPG tool, instead of generating test cubes suitable for compression with a special ATPG. Test data compaction [26], [27], which is employed by most ATPG tools to reduce the number of test patterns, increases the number of specified bits in test cubes. Static compaction reduces the number of test patterns by merging several compatible test cubes into one test cube. Hence, it is likely that very densely specified test cubes were created by merging several test cubes during test data compaction. We uncompact (reverse compact) a few very densely specified test cubes into less densely specified test cubes. This will efficiently reduce the size of the LFSR and the overall test data volume without a significant increase in the test sequence length, since only very few (say, 5%) densely specified test cubes are uncompacted.

To efficiently uncompact a densely specified test cube d into two test cubes d_1 and d_2, the following should be satisfied. First, the specified bits in d should be evenly divided between d_1 and d_2. Second, the overlap of specified bits between d_1 and d_2 should be minimized. In other words, if an input is specified in d_1, then it should not be specified in d_2, and vice versa. The divided test cubes d_1 and d_2 should together detect all faults that the original test cube d detects. To satisfy these requirements, in this paper, each very densely specified test cube d is divided into two less densely specified test cubes d_1 and d_2 by the following procedure.

1) Identify the set O of all outputs at which at least one fault in the set F_d is observed, where F_d is the set of faults that are detected by d. In the example of Fig. 3, every output observes at least one fault in F_d, so O contains all outputs.
2) For every output o in O, find the set of inputs I_o that are in the fanin cone of output o and are specified in d (the input sets for the example are shown in Fig. 3).
3) Initialize all bits of two test cubes d_1 and d_2 with Xs; both have the same number of bits as d.
4) Select the output o at which the largest number of faults in F_d are observed, remove it from O, and mark the faults observed at o. For every input in I_o (note that all inputs in I_o are specified), if the input is assigned v, where v = 0 or 1, in d, set the corresponding input to v in d_1. In the example of Fig. 3, the output observing the largest number of faults (four) is selected and removed from O, those four faults are marked, and the values 1, 0, 0, and 0 that the inputs of its input set are assigned in d are copied into d_1.
5) If the number of specified bits in d_1 is greater than the number of specified bits in d_2, then set d_cur = d_2 (this balances the numbers of specified bits in d_1 and d_2); otherwise, set d_cur = d_1. Select an output o from O whose input set I_o includes the fewest inputs that are already specified in d_cur among all remaining input sets (this step minimizes the overlap of specified inputs between d_1 and d_2). Remove o from O and mark all faults in F_d that are observed at the selected output. For every input in I_o, if the input is assigned a binary value in d, then set the corresponding input to that value in d_cur. If there are no unmarked faults in F_d, go to step 6; otherwise, repeat step 5. For the test cube d shown in Fig. 3, step 5 is iterated three times, alternately extending d_1 and d_2 according to the balancing rule and the minimum-overlap rule, until no unmarked faults remain in F_d.

Fig. 3. Uncompacting test cube d.

6) Run fault simulation with the divided test cubes d_1 and d_2, and identify the lists of faults F_d1 and F_d2 that are detected by d_1 and d_2, respectively (the resulting fault lists for the example are shown in Fig. 3).

If the number of specified bits in either of the partitioned test cubes d_1 and d_2 is still large, then that test cube is further divided into another pair of test cubes. A simplified sketch of the splitting step follows.
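The following Python sketch approximates steps 3-5 (illustrative only: it visits outputs in decreasing fault count and applies the balancing rule, omitting the minimum-overlap tie-break and the fault-marking stop condition; the cube, cones, and fault names below are hypothetical):

    def uncompact(d, cone, faults_at):
        """Split cube d (a string over '0', '1', 'X') into two sparser cubes.
        cone[o]: input positions specified in d that feed output o.
        faults_at[o]: faults of d observed at output o."""
        d1, d2 = ['X'] * len(d), ['X'] * len(d)
        for o in sorted(cone, key=lambda o: len(faults_at[o]), reverse=True):
            # balancing rule: extend the currently sparser cube (d1 on ties)
            tgt = d1 if sum(b != 'X' for b in d1) <= sum(b != 'X' for b in d2) else d2
            for i in cone[o]:
                tgt[i] = d[i]
        return ''.join(d1), ''.join(d2)

    d = '100010X0'
    cone = {'o1': {0, 1, 2, 3}, 'o2': {3, 4}, 'o3': {5, 7}}
    faults_at = {'o1': {'f1', 'f2', 'f3', 'f4'}, 'o2': {'f4', 'f5'}, 'o3': {'f6'}}
    print(uncompact(d, cone, faults_at))  # -> ('1000XXXX', 'XXX010X0')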

  B. Algorithms to Compute Generators

Let the set of test cubes to be compressed by the proposed method be D. Test cubes in D are grouped into test cube subsets T_k, where k = 1, 2, .... A generator G_k is derived from each test cube subset T_k according to the procedure described in Section II. Each test cube subset T_k is formed by moving test cubes from D into T_k until adding any more test cubes


Fig. 4. Constructing test cube subsets. (a) Original test cube set D. (b) Partitioned test cube subsets.

into T_k would make the number of care bits (0, 1, C) in the corresponding generator greater than a predefined number S_max, or the number of conflicting inputs in the generator greater than another predefined number C_max.

Example 2: Fig. 4 illustrates computing generators from a set of test cubes D that has 12 test cubes. Assume that S_max is set to 6 and C_max to 2, and that the F-TPG is implemented with an LFSR that has S_max plus a small margin of stages, added to ensure that the linear equations are solvable [3]. First, we run fault simulation with the entire set of test cubes in D and identify the set of faults F_i that is detected by each test cube d_i. The set of faults F_i is called the target fault list of d_i, and the faults in F_i are called the target faults of d_i. Then, we start constructing test cube subsets, starting from T_1, by moving test cubes from D one test cube at a time. The |F_i| column in Fig. 4 shows the number of faults in each target fault list. First, an empty set T_1 is created, and generator G_1 is initialized to all Xs. The test cube that has the largest number of target faults is selected as the first test cube to be moved into T_1, and G_1 is updated accordingly (see Section II). Next, the test cube that will cause the minimum number of conflicting inputs in G_1 when added into T_1 is selected from D; in Fig. 4, the selected cube causes only one conflicting input and six care bits in G_1.

Typically, even if some specified bits in a test cube generated by an ATPG tool are relaxed to Xs (don't cares), all faults that are detected by the original test cube can still be detected. When these overspecified bits are relaxed, more test cubes can be merged into each generator, which reduces the total number of generators. In this paper, we try to relax a specified input only if the corresponding bit is a C in the current generator, or the input is assigned one binary value in the test cube but the opposite binary value in the current generator. Relaxed inputs are denoted by underlines in Fig. 4(b). Assume that no specified bits can be relaxed to Xs in the second selected test cube without making one of its target faults undetected; it is therefore added into T_1 as it is, and G_1 is updated. At this point, either of two remaining test cubes would cause only one additional conflicting input in G_1 when added into T_1, and neither of them makes the number of care bits in G_1 greater than S_max. Assume that one of them is selected as the next test cube and that it has no overspecified bits. Hence, it is added into T_1 as it is, and G_1 is updated again.

Since the number of care bits and the number of conflicting inputs that would be incurred by adding a test cube are computed before overspecified bits in the test cube are relaxed, some test cubes that would make the number of care bits in the generator greater than S_max, or the number of conflicting inputs greater than C_max, before relaxation can actually be added without exceeding S_max or C_max once some of their overspecified bits are relaxed to Xs. Hence, we introduce margins dS and dC to compensate for this inaccuracy. If no test cube in D can be added into T_k without exceeding S_max or C_max (before relaxation), then we select a test cube in D that does not make the number of care bits in the generator greater than S_max + dS or the number of conflicting inputs greater than C_max + dC, and relax overspecified bits in that test cube. Assume that the margins dS and dC are both set to 1. In the example, no test cube remaining in D can be added into T_1 without exceeding S_max or C_max before relaxation. However, adding one of the remaining cubes would make the number of specified bits 7 (not greater than S_max + dS) and the number of conflicting inputs 2; hence, it is selected as the next candidate. Assume that a 1 assigned at one of its inputs is relaxed to an X. The cube is then added into T_1, and adding it does not change G_1. Since adding the next candidate cube into T_1 would make the number of conflicting inputs in G_1 3 and the number of care bits 7 (both within the margins), it is selected as well; however, assume that no specified bits in this cube can be relaxed. Hence, it cannot be added into T_1 and is returned to D. No more test cubes from D can be added into T_1 without making the number of specified bits in G_1 greater than S_max or the number of conflicting inputs greater than C_max. Hence, the formation of T_1 is complete.

We obtain the F-pattern for G_1 from the binary values assigned in G_1. Next, a seed for G_1 is computed by using a linear solver. We load the F-TPG with the computed seed and load the S-FIFO with the locations of the conflicting inputs of G_1, i.e., 2 and 6. Then 2^C_max = 4 patterns are generated by the decompressor. If any test cube in T_1 covers no test pattern in the set of test patterns generated by the decompressor for G_1, then more test patterns are generated by the decompressor until all four test cubes cover at least one generated test pattern. We run fault simulation with the generated test patterns, which are fully specified, and drop all detected faults from the target fault lists of the test cubes remaining in D. Note that the target fault lists of some test cubes shrink due to the dropped faults. This process is repeated until all test cubes are removed from D; in the example, the 12 test cubes in D are merged into four generators, G_1 through G_4.

C. Overall Algorithm

Now, the procedure to compute generators from a set of test cubes D is summarized in the following steps; a simplified code skeleton follows the list.

1) Apply the uncompaction process (see Section IV-A) to a few exceptionally densely specified test cubes in D. Define the target fault list of every test cube in D, and set k = 1.
2) Unmark all test cubes in D. Create an empty test cube subset T_k and initialize G_k to all Xs. Select the test cube that has the largest number of faults in its target fault list from D, relax overspecified bits in the test cube, and move it to T_k. Update G_k accordingly.

3) If D is empty, go to step 5. If there is at least one test cube in D that can be added into the current test cube subset T_k without making the number of Cs in G_k greater than C_max or the number of care bits in G_k greater than S_max, then select, among those test cubes, the test cube that will cause the minimum number of new Cs in G_k, add it into T_k after relaxing its overspecified bits, and update G_k accordingly; repeat step 3. Otherwise, go to step 4.
4) If there is at least one unmarked test cube in D that does not make the number of Cs in G_k greater than C_max + dC or the number of care bits in G_k greater than S_max + dS when it is added into T_k, then select a test cube d randomly among these test cubes and relax the overspecified bits in d; otherwise, go to step 5. If the relaxed test cube d can now be added into T_k without making the number of Cs greater than C_max or the number of care bits in G_k greater than S_max, then add d into T_k and update G_k accordingly; otherwise, mark d and put it back into D. Repeat step 4.
5) Compute a seed for G_k, generate test patterns by simulating the decompressor, fault simulate the generated test patterns, drop the detected faults from the target fault list of every test cube in D, and eliminate test cubes whose target fault lists become empty. Set k = k + 1. If D is empty, then exit. Otherwise, go to step 2.
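A runnable Python skeleton of this loop is given below. It is a sketch, not the authors' implementation: relaxation of overspecified bits is omitted, and the fault simulation of decompressor patterns is stubbed out.

    def merge(gen, bits):
        """Generator obtained by adding one cube to the current subset."""
        out = []
        for g, b in zip(gen, bits):
            if b == 'X' or g == b:
                out.append(g)
            elif g == 'X':
                out.append(b)
            else:                      # opposite binary values, or g == 'C'
                out.append('C')
        return ''.join(out)

    def compute_generators(cubes, s_max, c_max):
        """cubes: list of {'bits': str, 'faults': set} records."""
        generators = []
        while cubes:
            # step 2: seed the subset with the cube targeting the most faults
            seed = max(cubes, key=lambda c: len(c['faults']))
            cubes.remove(seed)
            gen = seed['bits']
            while True:                # step 3: greedy growth within limits
                cands = []
                for c in cubes:
                    g2 = merge(gen, c['bits'])
                    if g2.count('C') <= c_max and sum(x != 'X' for x in g2) <= s_max:
                        cands.append((g2.count('C') - gen.count('C'), c, g2))
                if not cands:
                    break
                _, best, gen = min(cands, key=lambda t: t[0])
                cubes.remove(best)
            generators.append(gen)
            detected = set()           # step 5 stub: fault simulate the patterns
            for c in cubes[:]:
                c['faults'] -= detected
                if not c['faults']:
                    cubes.remove(c)
        return generators

    cubes = [{'bits': '0X1X10X', 'faults': {'f1', 'f2'}},
             {'bits': '0110X0X', 'faults': {'f3'}},
             {'bits': 'X01110X', 'faults': {'f4'}}]
    print(compute_generators(cubes, s_max=6, c_max=2))  # -> ['0C1C10X']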

V. VARIATIONS OF THE S-TPG

 A. Scheme to Generate More Variable Patterns

In Fig. 2, since only two inputs are assigned Cs in G, all four test patterns will differ only at those two inputs. Hence, only a few new faults will be detected by each test pattern after the first one. This leads to an increase in the number of generators. The decompressor shown in Fig. 5(a) can generate test patterns with more variation. Because of the toggle flip-flop inserted between the output of the comparator and the select signal of the multiplexer, for the generator shown in Fig. 5(a), the select signal is set to 1 at the fifth (tenth) scan cycle and stays at 1 until the eighth (14th) scan cycle. As a consequence, the set of consecutive inputs scanned in during the first interval and another set of consecutive inputs scanned in during the second interval can be assigned different values in each test pattern.

 B. Scheme to Reduce Test Sequence Length

In the test cube subset T_2 shown in Fig. 4(b), the two conflicting inputs, which are assigned Cs in G_2, are assigned 0X in the first of its three test cubes, 11 in the second, and 10 in the third. In order to guarantee detecting all faults that are detected by these three test cubes, the decompressor should continue generating test patterns using the same generator until it has generated three test patterns that, respectively, assign 01 or 00, 11, and 10 to the two conflicting inputs. Since the R-TPG is free running, it may not generate test patterns that assign the desired values to the conflicting inputs of the generator for a long period of time. Even though this does not increase the test data volume, it can increase the test application time, which is also one of the important factors that determine test cost.

Fig. 5. Variations of the proposed decompressor. (a) Scheme to generate more variable patterns. (b) Scheme to reduce test sequence length.

Using the decompressor shown in Fig. 5(b), the number of test patterns generated for each generator can be reduced. Note that the R-FIFO inside the R-TPG is loaded with 00, 11, and 10, which are covered, respectively, by 0X, 11, and 10 (the values the conflicting inputs are assigned in the three test cubes of T_2). In each capture cycle, the first entry of the R-FIFO is loaded into the shift register, and the other entries in the R-FIFO are shifted up by one entry. Then, in every scan shift cycle when the counter value equals the output value of the S-FIFO, i.e., s is set to 1, the last bit in the shift register is shifted into the scan chain. If K test cubes are merged into a generator, this version of the decompressor needs to generate only K test patterns for it.
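The R-FIFO contents can be derived from the merged cubes in one step (an illustrative Python helper; the cube values below are made up so that the two conflicting inputs carry 0X, 11, and 10, as in the example above):

    def r_fifo_entries(cubes, conflict_locs):
        """One entry per merged cube: its values at the conflicting inputs,
        with any X replaced by 0 so the cube covers the stored entry."""
        return [''.join(c[i] if c[i] != 'X' else '0' for i in conflict_locs)
                for c in cubes]

    print(r_fifo_entries(['001X1XX', '0111XXX', '011010X'], [1, 3]))
    # -> ['00', '11', '10']: one decompressor pattern per merged cube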

VI. EXTENSION TO MULTIPLE SCAN CHAINS

  A. Decompressor Architecture

Fig. 6 depicts an implementation of the proposed decompressor for a circuit with multiple (512) scan chains. For convenience of illustration, assume without any loss of generality that each of the 512 scan chains comprises 256 scan flip-flops (hence, the design has a total of 512 x 256 = 131 072 scan flip-flops). The scan chains are organized into 64 groups. Although in this particular example every group contains the same number (8) of scan chains, it is not necessary for every group to have the same number of scan chains. A multiplexer is inserted before the input of each scan chain to select a scan pattern source between the output of the F-TPG and the output of the R-TPG. The select inputs of all eight multiplexers in each group are driven by the output of a common two-input AND gate. Each entry of the S-FIFO for the multiple scan chain version is divided into two sections: one for the identification number of the group to which a conflicting input belongs (the group ID) and the other for the location of the conflicting input in the scan chain. For example, the first (topmost) entry in the FIFO shown in Fig. 6 has 1 for the group ID and 13 for the scan input location. The group ID


Fig. 6. Decompressor for 512 scan chains.

of the first entry is input to the decoder to generate the 64 group-select signals, each of which controls the AND gate of the corresponding group, and the scan input location of the first entry is input to the comparator. The output of the comparator is set to 1 if the content of the first entry equals the counter value; otherwise, it is set to 0.

The main purpose of organizing scan chains into groups is to reduce the hardware overhead of the decoder (if w scan chains are grouped into each group, then the number of outputs of the decoder is reduced by a factor of w). Grouping can also reduce the test data volume to be stored in the ATE memory and the number of storage elements required for the S-FIFO. If we reduce the number of chains in each group from eight to four for the decompressor shown in Fig. 6, then the total number of groups increases from 64 to 128. The 6-to-64 decoder must be replaced by a 7-to-128 decoder. We also need 64 more two-input AND gates and extra routing to connect the 64 additional AND gates to the outputs of the decoder. In addition, each group ID section of the S-FIFO needs one more bit. As an extreme case, if scan chains are not grouped, i.e., each group has only one scan chain, then a decoder with 512 outputs is required and the S-FIFO needs 9 bits for its group ID section.

In Fig. 6, the S-FIFO is loaded to generate test patterns for a generator that has three conflicting inputs (Cs): the 13th scan inputs of two scan chains that belong to group 1, and the 224th scan input of a scan chain in another group. The 13th scan inputs of the remaining scan chains in group 1 are assigned Xs in the generator. Likewise, the 224th scan inputs of all scan chains in the second group, except the conflicting one, are assigned Xs. The S-FIFO is loaded with two valid entries, (1, 13) and the entry for the 224th scan input (the last entry is not valid). Since the group ID field of the first entry is 1, initially the decoder output for group 1 is set to 1 and all the other outputs of the decoder are set to 0. In the 13th scan shift cycle, the output of the comparator is set to 1 and the output of the AND gate for group 1 transitions to 1. Therefore, in the 13th shift cycle, all eight scan chains in group 1 are loaded with the values generated by the R-TPG (all the other scan chains in the design are loaded with the values generated by the F-TPG). Then, the entries of the FIFO are rotated up by one entry, and the entry for the 224th scan input becomes the first entry. In the 224th scan shift cycle, the scan chains in its group are loaded with the values generated by the R-TPG, and the entries of the FIFO are again rotated up by one entry. When the scan test pattern is fully loaded into the scan chains (at the 256th scan shift cycle), the counter is reset to 0. This makes the entries of the S-FIFO rotate up again, and the S-FIFO returns to its initial state. This is repeated for all test patterns generated from the generator.

Fig. 7. Updating generator values.

 B. Computing Generators for Multiple Scan Chain Design

Since all eight multiplexers in a group are controlled by the same signal, the lth scan inputs of all eight scan chains in the group receive their scan values from the same TPG, either the R-TPG or the F-TPG, in any scan shift cycle. Hence, during the process of computing generators (see Section IV), if adding a test cube into the current test cube subset causes a conflict at the lth input of a scan chain that belongs to a group, i.e., changes the lth input of that chain in the current generator to a C from v, where v = 0 or 1, then the generator values assigned at the lth inputs of the other scan chains that belong to the same group and are currently assigned binary values (0 or 1) in the current generator must also be changed to Cs. (If the lth scan input of a scan chain in the same group is currently assigned an X in the current generator, its value need not be changed.)

Fig. 7 illustrates the addition of a new test cube into the current test cube subset. The 13th scan input of one scan chain is assigned a 1 in the generator G, while the 13th scan input of another scan chain in the same group is assigned a 0 in G before the new test cube d is added. Now, the test cube d, in which the first of these inputs is assigned a 0, is added into the current test cube subset and causes a conflict. Therefore, the 1 at that input is changed to a C (the generator after d is added is denoted by G' in Fig. 7). Even though the second input is assigned an X in d (an X does not cause a conflict with any value), the 0 at that input is also changed to a C in G' due to the C at the first input. On the other hand, a third scan chain in the group, whose 13th input is assigned an X in G before d is added and also an X in d, holds its previous value in G' even after d is added.
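This group rule is a one-pass update, sketched below in Python (the data layout and chain names are illustrative, not from the paper):

    def propagate_group_conflict(gen, group, j):
        """After position j of some chain in `group` becomes 'C', force every
        other chain's binary value at position j to 'C'; Xs are left alone."""
        for chain in group:
            if gen[chain][j] in '01':
                gen[chain][j] = 'C'

    gen = {'c8': list('X1XX'), 'c9': list('X0XX'), 'c15': list('XXXX')}
    gen['c8'][1] = 'C'                     # new cube conflicts at input 1 of c8
    propagate_group_conflict(gen, ['c8', 'c9', 'c15'], 1)
    print({k: ''.join(v) for k, v in gen.items()})
    # {'c8': 'XCXX', 'c9': 'XCXX', 'c15': 'XXXX'}: the X of c15 is unchanged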

For the reason described in the previous paragraph, if each group contains a large number of scan chains, then the number of conflicting inputs will quickly reach C_max, and the number of test cubes that can be added into each test cube subset will decrease. This will, in turn, increase the total number of generators and decrease compression. On the contrary, if the number of scan chains in each group is too small, then hardware overhead and test data volume will increase. The optimal number of scan chains in a group should be determined by considering the number of specified bits in the test cubes; if test cubes are sparsely specified, large groups will be preferable.


C. Hardware Overhead 

The extra hardware required to implement the proposed decompressor shown in Fig. 6, excluding the F-TPG (which is also required for a regular LFSR reseeding technique), comprises the S-TPG, 64 two-input AND gates, and 64 2-to-1 multiplexers. The S-TPG is, in turn, composed of a 3 x 14 (6-bit group identification and 8-bit scan flip-flop location) FIFO, a 6-to-64 decoder, an eight-stage counter, and an eight-bit comparator. Since the R-TPG can be shared with the F-TPG, hardware overhead for the R-TPG is not considered. The gate equivalent (the number of two-input NAND gates) for the 6-to-64 decoder is 385, and the gate equivalent for the 8-bit comparator is 116 (we synthesized the decoder and the comparator with Synopsys Design Compiler). If we assume that the gate equivalent for a storage element is 6, then the total gate equivalent for the 3 x 14 FIFO is 252. The eight-stage counter can be implemented with 48 NAND gates. Since a 2-to-1 multiplexer can be implemented with four NAND gates, the gate equivalent for the 64 2-to-1 multiplexers is 256. The total gate equivalent for all of the aforementioned components is about 1100. Considering the size of the design, which has more than 130 000 flip-flops, the overhead of 1100 two-input NAND gates is almost negligible (if we assume that the gate equivalent of a scan flip-flop is 10, the overall overhead of the proposed decompressor is about 1100/1 300 000, roughly 0.08%). Note that this does not consider the combinational part and memory of the design; if we consider them, the overhead will be much lower. The width of the S-TPG (and also of the counter and the comparator) is logarithmically proportional to the scan depth (the number of scan flip-flops in the longest scan chain). Hence, hardware overhead will not increase significantly even for larger designs.
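The gate-equivalent (GE) tally is easy to reproduce (Python; the 1 GE assumed for a two-input AND gate is our assumption, not stated in the text):

    ge = {
        'decoder_6to64':   385,         # synthesized figure quoted above
        'comparator_8bit': 116,         # synthesized figure quoted above
        'fifo_3x14':       3 * 14 * 6,  # 6 GE per storage element -> 252
        'counter_8stage':  48,
        'muxes_64x2to1':   64 * 4,      # 4 NAND gates per 2-to-1 mux -> 256
        'and_gates_64':    64 * 1,      # assumption: 1 GE per two-input AND
    }
    total = sum(ge.values())            # about 1100
    print(total, total / (130_000 * 10))  # ~0.08% of the scan flip-flop area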

VII. APPLICATION FOR HIGH n-DETECTION COVERAGE

A. n-Detection Property of the Proposed Decompressor

In Fig. 8(a), assume that the deterministic test cubes d_1, d_2, and d_3, which are merged into generator G, detect faults f_1, f_2, and f_3, respectively. Typically, a fault can be detected by many different test cubes. Assume that one other test cube detects f_2 and another detects f_3. The decompressor generates patterns p_1, p_2, p_3, and p_4 from G, as shown in Fig. 8(b) (only the inputs that are assigned binary values in G are specified). Although the inputs that are assigned binary values in G take the same values in all four test patterns, all faults f_1, f_2, and f_3 are detected. This implies that p_1 through p_4 are highly correlated with each other, and it will be easy to make fault(s) that are detected by a test pattern p_i, i = 1, 2, 3, or 4, detected by another test pattern p_j, where j differs from i, by carefully assigning binary values to the inputs that are assigned Xs or Cs in G. Since two inputs are assigned Xs in G, all faults that are detected by d_1, d_2, and d_3 can be detected independent of the values assigned to these two inputs. On the other hand, the two inputs that are assigned Cs in G should be assigned the proper values to make the patterns detect the faults detected by d_1, d_2, and d_3, i.e., the conflicting inputs should be assigned each of the values required by d_1, d_2, and d_3 in at least one pattern p_i. The conceptual decompressor shown in Fig. 8(d) generates such patterns. Assume that the U-TPG generates 10, 01, 11, and 00 for the two conflicting inputs, respectively, in p_1, p_2, p_3, and p_4. If the two inputs that are assigned Xs in G

Fig. 8. Generating 2-detection test patterns.

are assigned 10, 01, 00, and 11, respectively, in p_1, p_2, p_3, and p_4, then all faults f_1, f_2, and f_3 are detected by two different patterns each, as shown in Fig. 8(c).

The proposed decompressor [except the variation-R shown in Fig. 5(b)] generates 2^C_max test patterns from every generator. Let the number of deterministic test cubes merged into a generator be K, and let the set of faults detected by these test cubes be F. Typically, K is smaller than 2^C_max. Note that generating only K patterns by the decompressor from the generator is enough to detect all faults in F once. Hence, the remaining 2^C_max - K patterns can be used to detect hard faults, i.e., faults that have been detected by fewer than n test patterns. For example, in Fig. 8(c), generating p_1, p_2, and p_3 detects all of the faults f_1, f_2, and f_3. Hence, p_4 is generated specifically to detect two of the hard faults by one more test pattern each. If the remaining fault, instead of those two, is a hard fault, then we will generate a p_4 that detects it.

B. Decompressor Architecture for High n-Detection Coverage

Fig. 9(b) shows an implementation of the proposed n-detection decompressor for the generator shown in Fig. 9(a). Like the variation-I shown in Fig. 5, the S-FIFO inside the S-TPG is loaded with the locations of pairs of inputs i_a and i_b, where a < b, i_a is assigned a C, i_b is assigned a non-X value, i.e., 0, 1, or C, and all the inputs between i_a and i_b are assigned Xs in G. For example, in the generator shown in Fig. 9(a), the input at location 8 is the first input that is assigned a non-X value after the conflicting input at location 5 (the inputs between them are assigned Xs). Hence, the locations 5 and 8 of this input pair are loaded. Likewise, the locations of another input pair, 10 and 14, are loaded after the pair 5 and 8.

In each capture cycle, both the T and the D flip-flops are reset to 0, so TD = 00. The modulo-16 counter is reset to 0 in the same cycle, and then increments by 1 at every shift cycle. When TD = 00 or 01, patterns generated by the F-TPG are scanned into the scan chain. The F-TPG is loaded with a seed for G before test patterns are generated for G. The T flip-flop flips its state when the counter value equals the output of the S-FIFO. Since the first entry of the S-FIFO is 5, the T flip-flop flips to 1 in the fifth scan shift cycle, and TD becomes 10. Then, all entries in the S-FIFO rotate up by one entry (the first entry becomes 8). When TD = 10, the U-TPG is selected as the pattern source for the scan chain. The bit stored in the last stage of the shift register of the U-TPG is scanned into the scan chain (here, a 1 is scanned in), and the shift register shifts right by 1 bit. The D flip-flop is set to 1 one cycle


Fig. 9. n-detection decompressor architecture. (a) Optimizing generator. (b) Decompressor. (c) Timing diagram.

after the T flip-flop is set to 1. The T and D flip-flops hold their states at TD = 11, and the R-TPG is selected as the pattern source for the scan chain until the counter value becomes 8. When the content of the counter becomes 8, the T flip-flop flips back to 0, and the entries of the S-FIFO rotate up by one entry again (10 becomes the first entry). Since TD = 01, test patterns generated by the F-TPG are scanned into the scan chain until the content of the counter becomes 10. In the tenth scan shift cycle, the T flip-flop flips to 1, and the U-TPG is selected as the pattern source to scan the 1 at the last stage of the shift register into the scan chain. In the next cycle, since the D flip-flop is set to 1 (TD = 11), the R-TPG is selected as the pattern source. The T and D flip-flops hold their states at 11 until the content of the counter becomes 14. After a scan pattern is fully loaded into the scan chain, the scan chain captures the response to the loaded scan pattern. In the same cycle, the shift register inside the U-TPG is loaded with the first entry in the U-FIFO, and the entries in the U-FIFO shift up by one entry. This is repeated for the other patterns, p_2, p_3, and p_4. Fig. 9(c) presents the timing diagram for the related signals.

The compression scheme for n-detection testing requires more storage bits than that for 1-detection testing. However, the increase in test data volume due to U-FIFO data is very small; if C_max = 3 (at most eight test cubes can be merged into each generator), then the maximum data for the U-FIFO are only 8 x 3 = 24 bits per generator. For the same reason, the increase in hardware overhead due to the U-FIFO is very small.

C. Computing Generators for High n-Detection Coverage

Like generators for 1-detection testing, generators for n-detection testing are computed from 1-detection test cubes generated by a regular ATPG tool, using the same algorithm described in Section IV. The computed generators are divided into two groups according to the number of test cubes from which each generator is derived: single cube generators (SCGs) and multiple cube generators (MCGs). As the name implies, the test cube subset for each SCG contains only one test cube (because adding any other test cube into the test cube subset would make the number of conflicting inputs greater than C_max or the number of specified bits greater than S_max). Note that SCGs have no Cs. In contrast, MCGs are derived from test cube subsets that contain more than one test cube.

After the computed generators are divided into MCGs and SCGs, each MCG is expanded into n patterns, which represent the patterns generated by the decompressor during test application and are shown in Fig. 9(a). These expanded patterns are only partially specified; only two different types of inputs are specified. The first type consists of the inputs that are assigned binary values in the generator: if an input is assigned a binary value in the generator, then it is assigned that value in all patterns the decompressor generates from the generator [in Fig. 9(a), for example, the inputs specified in the generator carry the same values in every expanded pattern]. The second type consists of the inputs whose values come from the R-TPG: if the inputs lying between an input that is assigned a U and the adjacent specified input are all don't cares in the generator, then these inputs are specified in all of the expanded patterns for the generator. The binary values assigned to this second type of inputs in the expanded patterns are determined by simulating the R-TPG (during test application, the values for these inputs are provided by the R-TPG). Unlike the first type of specified inputs, the second type of specified inputs are assigned different values in different expanded patterns. The values of the remaining unspecified inputs in the expanded patterns are assigned later to maximize n-detection coverage.
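To illustrate the expansion, the sketch below turns one generator, written as a string over '0', '1', 'X' (don't care), and 'U' (conflicting input), into n partially specified patterns; the R-TPG is stood in for by a seeded pseudorandom generator, which is an assumption of this sketch rather than the paper's R-TPG.

    import random

    # Expand one generator into n partially specified patterns: bits
    # specified in the generator are copied into every pattern, the
    # don't-care run after each 'U' is filled by the R-TPG stand-in
    # (differently in each pattern), and 'U' bits plus the remaining
    # don't cares stay unspecified for now.
    def expand_generator(gen, n, r_seed=1):
        patterns = []
        for k in range(n):
            rtpg = random.Random(r_seed + k)   # stand-in for the real R-TPG
            pat = list(gen)
            i = 0
            while i < len(pat):
                if pat[i] == 'U':
                    j = i + 1
                    while j < len(pat) and pat[j] == 'X':
                        pat[j] = str(rtpg.randint(0, 1))
                        j += 1
                    i = j
                else:
                    i += 1
            patterns.append(''.join(pat))
        return patterns

For example, expand_generator('1XUXXX01', 4) fills positions 3-5 with different pseudorandom values in each of the four expanded patterns, while the generator-specified bits and the U stay the same in all of them.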

After every generator is expanded into n partially specified patterns, for each fault f in the fault list we compute the detection count, i.e., the number of patterns that detect f, by fault simulating the expanded patterns of all MCGs and the test cubes of all SCGs (at this stage, SCGs are not expanded). During the fault simulation, faults that are detected by n patterns are dropped from the fault list. Note that if a fault is detected by n different expanded patterns, which are only partially specified, then the test patterns generated by the decompressor during the actual test application detect the fault n or more times.
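The detection counting with n-drop is the usual fault-simulation loop; a minimal sketch, assuming a generic predicate detects(pattern, fault) backed by a real fault simulator:

    # n-drop detection counting (sketch). `patterns` holds the expanded
    # MCG patterns plus the SCG test cubes; `detects` stands in for a
    # real fault simulator.
    def count_detections(patterns, fault_list, n, detects):
        counts = {f: 0 for f in fault_list}
        remaining = set(fault_list)
        for pat in patterns:
            for f in list(remaining):
                if detects(pat, f):
                    counts[f] += 1
                    if counts[f] >= n:
                        remaining.discard(f)   # dropped after n detections
        return counts, remaining               # `remaining` = hard faults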

Next, some inputs that are currently don't cares (X inputs, for short) in the expanded patterns are specified to detect hard faults. This procedure is similar to the dynamic compaction procedure [26] of the ATPG process. The test patterns after this procedure are represented in Fig. 9(a). We first select an expanded pattern of a generator and imply the values assigned in it into the internal circuit lines (since the pattern is partially specified, only some internal circuit lines will be set to binary values). Then we select a hard fault, i.e., a fault that is detected by fewer than n expanded patterns, from the fault list and try to detect it by specifying X inputs of the expanded pattern. If it is found that no combination of input assignments can detect the fault, then we select another hard fault from the fault list and repeat the process for the selected fault. Otherwise (specifying some inputs in the expanded pattern detects the fault), we update the generator and its other expanded patterns to reflect the additionally specified inputs.


For example, if an input is specified to 1 (0) in one expanded pattern [see Fig. 9(a)], the corresponding entry of the generator is set to 1 (0) accordingly; note that this input is then specified to 1 (0) in all expanded patterns of the generator.
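The update rule is small enough to state directly in code; a sketch with illustrative names, assuming the generator and its expanded patterns are mutable sequences:

    # Once the dynamic-compaction step fixes an X input to a binary value
    # in one expanded pattern, the generator entry is updated and every
    # expanded pattern inherits the new value.
    def specify_input(generator, expanded_patterns, position, value):
        generator[position] = value
        for pat in expanded_patterns:
            pat[position] = value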

The process above is repeated for the other three expanded patterns of the generator in sequence, until the number of specified bits in a pattern reaches its limit. After all X inputs of the expanded patterns of a generator have been tried, the same procedure is repeated for the next generator, and so on, until the expanded patterns of all MCGs have been tried. Typically, specifying additional inputs detects only a few or no hard faults in most expanded patterns. Hence, most of the run time is spent trying to prove that a selected fault cannot be detected by any combination of input assignments in each expanded pattern. If there are many generators and many hard faults in the fault list, then this procedure requires very long run time.

To reduce the run time, we quickly filter out the faults that cannot be detected by a given expanded pattern, without wasting time on trying to detect them. If a binary value v is implied at line l by an expanded pattern, then no combination of input assignments can detect the stuck-at-v fault at l; these faults are filtered out first. Then we filter out the faults whose unique sensitization path [28] is blocked: to detect a fault, its fault effect must pass through the unique sensitization path of that fault. We identify the unique sensitization paths for all hard faults and store them in a preprocessing step. Finally, we apply the X-path check [28] to the remaining faults: if there is no path from line l to any output along which all lines are currently at X, then the faults at l cannot be detected by any combination of input assignments.
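A sketch of the three filters in the order given above; the three predicates are assumed helpers backed by a real implication engine, and faults are modeled as (line, stuck_value) pairs:

    # Keep only the faults worth handing to the ATPG step for this
    # expanded pattern; the rest provably cannot be detected by any
    # assignment of its X inputs.
    def filter_detectable(pattern, hard_faults, implied_value,
                          unique_path_blocked, x_path_exists):
        survivors = []
        for line, stuck_value in hard_faults:
            v = implied_value(pattern, line)
            if v is not None and v == stuck_value:
                continue          # line already implied to the stuck value
            if unique_path_blocked(pattern, line, stuck_value):
                continue          # stored unique sensitization path blocked
            if not x_path_exists(pattern, line):
                continue          # no all-X path from the line to an output
            survivors.append((line, stuck_value))
        return survivors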

Next, for every SCG, we generate additional test cubes that detect hard faults by a modified ATPG, stopping before the number of Us in the generator or the number of care bits exceeds its limit. After the additional test cubes are added into the test cube subsets of the SCGs, we apply the same procedure used to specify inputs in the expanded patterns of MCGs to the new MCGs (note that once an additional test cube is added for an SCG, the SCG becomes an MCG). Next, we compute an F-TPG seed for every generator and specify more of the remaining don't cares in the expanded patterns by simulating the F-TPG with the seed. Then we update the detection counts of all faults remaining in the fault list and drop the faults that are detected by n different patterns.

Now the expanded patterns of every generator have only a few unspecified inputs left, namely the conflicting inputs of each generator. We now assign the best binary values to these inputs in each expanded pattern so as to detect more hard faults. This procedure is described in Fig. 10. For each expanded pattern, we generate one candidate test pattern for every combination of binary values at the inputs that are assigned Us, as shown in Fig. 10. Then, we fault simulate the circuit with these candidate patterns and select, for each expanded pattern, the candidate that detects the most hard faults. When this procedure is complete, the fully specified test patterns that will be generated by the decompressor are determined. This procedure is repeated for the other generators.
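Because the number of conflicting inputs per generator is tightly bounded, the best values for the U inputs can simply be enumerated; a sketch reusing the detects() stand-in from above:

    from itertools import product

    # Try every combination of binary values at the U positions of one
    # expanded pattern and keep the candidate that detects the most hard
    # faults (2**c candidates for c conflicting inputs).
    def best_u_assignment(pattern, u_positions, hard_faults, detects):
        best_pat, best_hits = None, -1
        for combo in product('01', repeat=len(u_positions)):
            cand = list(pattern)
            for pos, bit in zip(u_positions, combo):
                cand[pos] = bit
            cand = ''.join(cand)
            hits = sum(1 for f in hard_faults if detects(cand, f))
            if hits > best_hits:
                best_pat, best_hits = cand, hits
        return best_pat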

The overall algorithm for computing generators for high n-detection coverage is summarized as follows (a structural code sketch follows the list).

Fig. 10. Selecting the best binary values for Us.

1) Generators are computed from a set of test cubes, and the computed generators are divided into MCGs and SCGs according to the number of test cubes in the test cube subset from which each generator is derived.

2) The algorithm expands each MCG into n partially specified patterns, and assigns binary values to the inputs that are assigned binary values in the generator and to the inputs whose scan values are provided by the R-TPG.

3) The algorithm computes the detection count of every fault in the fault list by applying the expanded patterns of the MCGs and the test cubes of the SCGs to the design. The first MCG is selected.

4) The ATPG specifies inputs in each of the expanded patterns of the selected MCG to detect hard faults.

5) If there are expanded patterns of the selected MCG whose inputs are specified in step 4, update the generator and the other expanded patterns according to the values specified in these patterns.

6) Update the detection counts of the faults that are detected in step 4. If there exists any MCG that has not been processed, select the next MCG and go to step 4.

7) For each SCG, keep adding test cubes that detect hard faults to the corresponding test cube subset while the number of conflicting inputs and the number of specified bits remain within their limits. Update all SCGs according to the added test cubes.

8) Do steps 4-6 on the new MCGs that were made by adding more test cubes to the SCGs in step 7.

9) For every expanded pattern generated from each generator (now all generators are MCGs), the algorithm finds the best binary values for the inputs that are assigned Us in the generator, so as to detect the most hard faults. Exit.
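Read as code, the nine steps reduce to a short driver loop. The sketch below wires together the illustrative fragments from earlier (expand_generator, count_detections, best_u_assignment); split_generators, atpg_specify, add_cubes_to_scg, and all_faults are additional assumed helpers for the ATPG-dependent steps, so this is a structural outline, not the paper's implementation.

    # Driver for steps 1)-9) (structural sketch only).
    def compute_n_detection_generators(test_cubes, n, detects):
        mcgs, scgs = split_generators(test_cubes)                     # step 1
        expanded = {g: expand_generator(g.pattern, n) for g in mcgs}  # step 2
        flat = [p for pats in expanded.values() for p in pats]
        counts, hard = count_detections(                              # step 3
            flat + [s.cube for s in scgs], all_faults(), n, detects)
        for g in mcgs:                                                # steps 4-6
            atpg_specify(g, expanded[g], hard)
        for s in scgs:                                                # steps 7-8
            g = add_cubes_to_scg(s, hard)                             # SCG -> MCG
            expanded[g] = expand_generator(g.pattern, n)
            atpg_specify(g, expanded[g], hard)
            mcgs.append(g)
        for g in mcgs:                                                # step 9
            expanded[g] = [best_u_assignment(p, g.u_positions, hard, detects)
                           for p in expanded[g]]
        return expanded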

 D. Multiple Scan Chain Design

Fig. 11 depicts the n-detection version of the decompressor for a circuit with 512 scan chains. Like the 1-detection version shown in Fig. 6, the 512 scan chains are organized into 64 groups to reduce hardware overhead. Initially, the F-TPG is selected as the test pattern source for all scan chains. Since the group ID of the first entry of the S-FIFO is 1, the select signal of the first group is set to 1 and the select signals of all other groups are set to 0. Since the scan input location of the first entry of the S-FIFO is 13, the T flip-flop inside the S-TPG flips to 1 in the 13th scan shift cycle.


Fig. 11. n-detection decompressor for 512 scan chains.

The scan value generated by the U-TPG, i.e., the 0 stored in the last stage of the shift register in the U-TPG, is then scanned into all the scan chains of the first group (the scan chains of the other groups are loaded with scan values generated by the F-TPG). Then, the S-FIFO is shifted up by one entry and the second entry becomes the first entry. Note that some of the 14th scan inputs of the other groups are assigned binary values; hence, the scan values for these scan inputs must be provided by the F-TPG. On the other hand, the 14th to 83rd scan inputs of the scan chains in the first group are assigned only don't cares in the generator; hence, these scan inputs can be assigned values generated by the R-TPG, and the entry for the 84th scan input of the first group is selected as the second entry of the S-FIFO. In the next cycle, the 1 at the T flip-flop shifts into the D flip-flop, and both flip-flops are set to 1 for the first group. Hence, the scan chains in the first group receive their scan data from the R-TPG until the content of the counter becomes 84. In the 84th shift cycle, the T flip-flop flips to 0 and all scan chains in the design receive their scan data from the F-TPG. The operation described above for this pair of entries is repeated for the other pairs of entries located below them in the S-FIFO. In the 133rd cycle, the scan chains of the group indicated by the second-to-last entry receive the value 1 from the U-TPG. The F-TPG then continues providing test patterns to all scan chains until the scan pattern is fully loaded in the 256th shift cycle. Then, the response to the test pattern is captured into the scan flip-flops, and the entries in the U-FIFO shift up by one entry.

Note that the scan inputs of all eight scan chains in a group that are assigned Us in a generator, e.g., the 13th scan inputs of the scan chains in one group and the 113th scan inputs of the scan chains in another, are assigned the same value from the U-TPG while test patterns are generated for that generator. (In contrast, in the 1-detection version shown in Fig. 6, these conflicting inputs are assigned different values from the R-TPG.) Hence, if the jth inputs of the scan chains in a group are assigned a U in a generator, then the jth inputs of all scan chains in that group should be assigned identical values in every test cube of the corresponding subset (an X is identical with any value: 0, 1, or X).

Fig. 12. Generator for a multiple scan chain design.

In Fig. 12, assume that the two scan chains shown belong to the same group. First, a test cube is added into the current test cube subset. Note that the 0 assigned to the 13th input of one scan chain is not identical with the 1 assigned to the 13th input of the other scan chain. Adding the test cube into the subset thus causes a conflict at the 13th inputs and updates the corresponding generator entries to U. Consequently, the two inputs would be assigned the same value during test application even though they are assigned opposite values in the test cube. This problem is solved if we can relax the 1 (or the 0) at the 13th input to a don't care in the test cube (it can be relaxed if all faults that are detected by the original test cube are still detected after the relaxation). Otherwise, i.e., if neither the 0 nor the 1 can be relaxed, the test cube is removed from the subset, placed back into the set of unmerged test cubes, and the generator is updated accordingly. If there are more test cubes in which the 13th input of some scan chain is assigned a value that is not identical to the 13th inputs of the other scan chains in the group, then we repeat this procedure for those test cubes until the 13th inputs of the scan chains in the group are assigned identical values in every test cube of the subset. Now assume that another test cube is added into the current subset and causes a conflict at the 121st inputs. Since the 121st inputs of all scan chains in the group are assigned identical values in every test cube of the subset (assume that the values at the 121st inputs of the remaining scan chains in the group are also identical), we do not have to relax the 121st input of any scan chain or remove any test cube from the subset.
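The consistency requirement can be checked mechanically when a cube is considered for merging; a sketch, assuming cubes are indexed as cube[chain][position] over '0'/'1'/'X':

    # A position that is (or becomes) U in the generator must carry
    # identical values across all scan chains of a group in every test
    # cube of the subset; 'X' is compatible with anything.
    def group_consistent(subset, new_cube, group_chains, u_positions):
        for pos in u_positions:
            for cube in subset + [new_cube]:
                values = {cube[ch][pos] for ch in group_chains} - {'X'}
                if len(values) > 1:    # two chains disagree at a U position
                    return False       # caller must relax a bit or reject
        return True

When this check fails, the procedure above first tries to relax one of the disagreeing bits to a don't care and, failing that, returns the offending cube to the unmerged set.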

VIII. EXPERIMENTAL RESULTS

  A. 1-Detection Testing

Table I compares the compression achieved by using only LFSR reseeding (columns under the heading Only LFSR Reseeding), by the proposed method along with LFSR reseeding, and by other recent compression techniques [1], [2], [11]. For LFSR reseeding, we used our proprietary high-compression LFSR reseeding technique. Scan cells were routed into ten scan chains in every benchmark circuit. The columns # pat give the number of test patterns applied, and the columns # store (bits) give the total number of (compressed) data bits that need to be stored in the ATE memory. For the proposed method, we show results obtained by using all three decompressor schemes: the basic scheme (see Fig. 2), variation-I [see Fig. 5(a)], and variation-R [see Fig. 5(b)].


TABLE I
EXPERIMENTAL RESULTS

TABLE II
RESULTS ON INDUSTRIAL DESIGNS (VARIATION-R)

For LFSR reseeding alone (the heading Only LFSR Reseeding) and for the proposed method, we first applied a sequence of pseudorandom patterns to drop easy-to-detect faults. The number of initial pseudorandom patterns is shown in parentheses in the column # pat under the heading Only LFSR Reseeding. Then, for the remaining undetected faults, we generated deterministic test patterns with an in-house ATPG and compressed them either by LFSR reseeding alone or by the proposed compression method. The number of pseudorandom patterns applied to drop easy-to-detect faults is included in the total number of test patterns reported in the columns # pat. The columns # Gen show the number of generators.

The results clearly demonstrate that the proposed method can efficiently improve the compression ratios achieved when only LFSR reseeding is used. Large reductions were achieved especially for the ITC benchmark circuits; the numbers of storage bits for the proposed method are only about 1/2 to 1/3 of those for LFSR reseeding for all ITC benchmark circuits. Note that the variation-R scheme reduced the number of storage bits by a factor of about 3.4 for b17s without any increase in the number of patterns. Among the three decompressors, the basic scheme achieved the highest compression, and the variation-R scheme generated the smallest number of patterns (the number of decompressed patterns generated by the variation-R decompressor is always the same as the number of deterministic patterns compressed by the proposed method).

We first compare our results with another hybrid BIST [11]. Since [11] applied very long sequences of pseudorandom patterns (32 000 patterns), we also conducted experiments with long sequences of pseudorandom patterns for a fair comparison; these results are shown in the corresponding columns of Table I. Even though a shorter test sequence was applied, the number of storage bits for the proposed method is much smaller than that of [11] for every circuit except s15850. The numbers of storage bits in [2] are also larger than those of the proposed method for most circuits. However, since the numbers of test patterns are not reported in [2] and the compression depends on the number of test patterns generated, the fairness of the comparison with [2] is limited. We also compare the results of the proposed method with another multilevel compression method [1], in which seeds obtained by LFSR reseeding are further compressed by a seed compression process. Finally, the last column ([29] FDR) reports the compression obtained by using frequency-directed run-length (FDR) codes [30] for circuits whose scan cells are specially routed to further reduce test data volume. Except for s38417, the number of storage bits for the proposed method is much smaller than that of the seed compression technique [1].

Table II shows results of gate delay patterns for industrial designs. The broadside (launch-off-capture) scheme was used to apply delay test patterns in every case. The column # FF gives the number of flip-flops in the circuit. The columns under the heading Determin show results for highly compacted deterministic delay test patterns generated by an in-house ATPG. Results obtained by using the proposed method (variation-R) are given under the heading Proposed. The columns FE% give the achieved fault efficiency, which is given by FE = (number of detected faults)/(number of faults - number of untestable faults) x 100%. The compression obtained by using the proposed scheme is shown in the last column, labeled CR. The compression is calculated as the ratio of the storage required for the highly compacted deterministic patterns to that required by the proposed scheme. Over 1600X compression was achieved for D3 by the proposed method. Note that the number of test patterns increased only about 33% relative to the deterministic test set. About 500X compression was achieved for D2, with only a 45% increase in the number of test patterns. Note that higher compression is achieved for larger designs. The column TR gives the factor of reduction in total test cycles (the number in parentheses in the same column is the number of scan chains in the design). Since test patterns are internally generated in the proposed method, the number of scan chains need not be limited by the number of scan channels that the ATE can provide. For the deterministic test pattern results, we used 16 scan chains for every design. Since the increase in pattern count is small, the proposed compression method can also significantly reduce the test application time.


TABLE III
n-DETECTION RESULTS

B. n-Detection Testing

Experimental results for the n-detection version are reported in Table III. The column # sa flts gives the number of collapsed single stuck-at faults that were used to generate test patterns, while the column # br flts gives the number of bridging faults used for bridging fault simulation. These faults were randomly generated using a nonfeedback AND/OR bridging fault model. Results obtained by the proposed compression technique (columns under the heading Proposed) are compared with results obtained by 5-detection test sets generated by an ATPG tool, which was implemented based on the algorithm proposed by Huang [31] (columns under the heading 5-det ATPG). We also report results of traditional 1-detection test patterns (columns under the heading 1-det ATPG).

The column # stgs gives the number of stages of the F-TPG in the proposed decompressor. The number of patterns generated by the proposed decompressor is a little larger than the number of 5-detection ATPG patterns for most circuits except s13207 and s38417 (see the columns # pat). While for s13207 the proposed decompressor generated even fewer test patterns than the 5-detection ATPG, for s38417 it generated about 5.8 times more patterns than the 5-detection ATPG. The total number of bits that need to be stored in the ATE memory (the column stor bits) includes the F-TPG seeds, the U-FIFO data, and the S-FIFO data. The number of storage bits for the proposed technique is about 1/17 to 1/86 of that for the 5-detection ATPG, i.e., 17X-86X compression is achieved by the proposed compression technique. The results also show that the proposed method can achieve up to 19.4X compression against 1-detection ATPG test sets.
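The stor bits column is thus a straight sum of the three contributions named above; a trivial sketch, with per-generator bit counts as inputs (the field widths themselves are not taken from the paper):

    # Total ATE storage: F-TPG seeds + S-FIFO data + U-FIFO data,
    # each given as a per-generator list of bit counts.
    def ate_storage_bits(seed_bits, s_fifo_bits, u_fifo_bits):
        return sum(seed_bits) + sum(s_fifo_bits) + sum(u_fifo_bits)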

Two further columns show the numbers of faults that are detected by fewer than three and fewer than five test patterns, respectively. The test patterns generated by the proposed decompressor achieved over 99% 3-detection fault coverage for all circuits except s13207, for which 171 faults were detected fewer than three times. This is mainly because the ATPG aborted the generation of test cubes for a large number (76) of faults when generating the 1-detection test cubes to be compressed by the proposed method. The 5-detection ATPG, which was modified from the same ATPG, also gave up generating 5-detection test patterns for many (79) faults of s13207. The test patterns generated by the proposed decompressor achieved very high 5-detection fault coverage (very close to 100%) for the two largest benchmark circuits, s38417 and s38584. Note that the 1-detection test cube sets achieved very low 5-detection fault coverage (see the corresponding column under the heading 1-det ATPG). The columns bridge cov% compare the bridging fault coverage achieved by the test patterns generated by the proposed decompressor and by the 1- and 5-detection ATPG test patterns. The test patterns generated by the proposed decompressor achieved higher bridging fault coverage than the 1-detection ATPG test sets but lower coverage than the 5-detection ATPG test sets.

IX. CONCLUSION

In this paper, a test data compression scheme that can be used to further improve the compression achieved by LFSR reseeding is presented. The proposed method consists of a novel decompressor architecture and an efficient algorithm to compute generators (weight sets) that lead to minimum test data volume. The proposed decompressor can be implemented with very low area overhead. Two variations of the decompressor, which can be adopted for different testing requirements such as short test application time, are also proposed. Unlike most commercial test data compression tools, the proposed method requires no special ATPG that is customized for the proposed compression scheme, and can be used to compress test patterns generated by any ATPG tool.

Experimental results show that the proposed method can effectively improve the compression achieved by LFSR reseeding without significantly increasing the test sequence length. Over 1600X compression was achieved for a large industrial design with only about a 30% increase in the number of test patterns against a highly compacted deterministic test set. The numbers of test patterns generated by the proposed method are comparable to those of highly compacted deterministic test sets for most circuits. The test data to be stored in the ATE memory are much smaller than those for previously published schemes, and the number of test patterns that need to be generated is smaller than for other weighted random pattern testing schemes.

The proposed test data compression scheme is extended to achieve high n-detection coverage with little modification. The n-detection version of the proposed compression technique first merges a 1-detection test set generated by a regular ATPG into several generators. Then, the generators are modified to achieve high n-detection coverage. Since the test data are compressed from a 1-detection test set rather than an n-detection test set, the proposed technique can achieve high compression. Experimental results demonstrate that the proposed technique can achieve high n-detection fault coverage.

REFERENCES

[1] C. V. Krishna and N. A. Touba, “Reducing test data volume using LFSR reseeding with seed compression,” in Proc. Int. Test Conf., 2002, pp. 321–330.


[2] J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, “Embedded deterministic test,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 23, no. 5, pp. 776–792, May 2004.
[3] B. Könemann, “LFSR-coded test patterns for scan designs,” in Proc. Eur. Des. Test Conf., 1991, pp. 237–242.
[4] L.-T. Wang, C.-W. Wu, and X. Wen, VLSI Test Principles and Architectures. San Francisco, CA: Morgan Kaufmann, 2006.
[5] P. Wohl, J. A. Waicukauski, S. Patel, and M. B. Amin, “Efficient compression and application of deterministic patterns in a logic BIST architecture,” in Proc. IEEE-ACM Design Autom. Conf., 2003, pp. 566–569.
[6] International Technology Roadmap for Semiconductors, “Test & Test Equipment,” 2005.
[7] M. Bershteyn, “Calculation of multiple sets of weights for weighted random testing,” in Proc. Int. Test Conf., 1993, pp. 1031–1040.
[8] A. P. Strole and H.-J. Wunderlich, “TESTCHIP: A chip for weighted random pattern generation, evaluation, and test control,” IEEE J. Solid-State Circuits, vol. 26, no. 7, pp. 1056–1063, Jul. 1991.
[9] B. Könemann, “Care bit density and test cube clusters: Multi-level compression opportunities,” in Proc. IEEE Int. Conf. Comput. Des., 2003, pp. 320–325.
[10] B. Könemann, “STAGE: A decoding engine suitable for multi-compressed test data,” in Proc. Asian Test Symp., 2003, pp. 142–145.
[11] A. Jas, C. V. Krishna, and N. A. Touba, “Weighted pseudorandom hybrid BIST,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 12, pp. 1277–1283, Dec. 2004.
[12] S. Wang, K. J. Balakrishnan, and S. T. Chakradhar, “XWRC: Externally-loaded weighted random pattern testing for input test data compression,” presented at the Int. Test Conf., Austin, TX, 2005.
[13] S. Wang, “Low hardware overhead scan based 3-weight weighted random BIST,” in Proc. Int. Test Conf., 2001, pp. 868–877.
[14] I. Pomeranz and S. Reddy, “3-weight pseudo-random test generation based on a deterministic test set for combinational and sequential circuits,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 12, no. 7, pp. 1050–1058, Jul. 1993.
[15] B. Könemann, K. D. Wagner, and J. A. Waicukauski, “Hybrid pattern self-testing of integrated circuits,” U.S. Patent 005 612 963A, May 18, 1997.
[16] S. Wang, W. Wei, and S. T. Chakradhar, “A high compression and short test sequence test compression technique to enhance compressions of LFSR reseeding,” in Proc. Asian Test Symp., 2007, pp. 79–86.
[17] I. Polian, I. Pomeranz, S. M. Reddy, and B. Becker, “Exact computation of maximally dominating faults and its application to n-detection tests for full-scan circuits,” Proc. Inst. Electr. Eng., vol. 151, no. 3, pp. 235–244, May 2004.
[18] C.-W. Tseng, S. Mitra, S. Davidson, and E. J. McCluskey, “An evaluation of pseudo random testing for detecting real defects,” in Proc. VLSI Test Symp., 2001, pp. 404–409.
[19] I. Pomeranz and S. M. Reddy, “On test data compression and n-detection test sets,” in Proc. IEEE-ACM Des. Autom. Conf., 2003, pp. 748–751.
[20] S. Lee, B. Cobb, J. Dworak, M. R. Grimaila, and M. R. Mercer, “A new ATPG algorithm to limit test set size and achieve multiple detections of all faults,” in Proc. Des. Autom. Test Eur., 2002, pp. 94–99.
[21] S. C. Ma, P. Franco, and E. J. McCluskey, “An experimental chip to evaluate test techniques experiment results,” in Proc. Int. Test Conf., 1995, pp. 663–670.
[22] K.-J. Lee, J.-J. Chen, and C.-H. Huang, “Using a single input to support multiple scan chains,” in Proc. IEEE Int. Conf. Comput.-Aided Des., 1998, pp. 74–78.
[23] I. Hamzaoglu and J. H. Patel, “Reducing test application time for full scan embedded cores,” in Dig. Papers 29th Int. Symp. Fault-Tolerant Comput., 1999, pp. 260–267.
[24] H. Tang, G. Chen, C. Wang, J. Rajski, I. Pomeranz, and S. M. Reddy, “Defect aware test patterns,” in Proc. Des. Autom. Test Eur., 2005, pp. 450–455.
[25] S. Wang, Z. Wang, W. Wei, and S. T. Chakradhar, “A low cost test data compression technique for high n-detection fault coverage,” in Proc. Int. Test Conf., 2007, pp. 1–10.
[26] P. Goel and B. C. Rosales, “Test generation & dynamic compaction of tests,” in Dig. Papers Test Conf., 1979, pp. 182–192.
[27] J.-S. Chang and C.-S. Lin, “Test set compaction for combinational circuits,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 14, no. 11, pp. 1370–1378, Nov. 1995.
[28] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design. New York: Computer Science Press, 1990.
[29] S.-J. Wang, K. S.-M. Li, S.-C. Chen, H.-Y. Shiu, and Y.-L. Chu, “Scan-chain partition for high test-data compressibility and low shift power under routing constraint,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 28, no. 5, pp. 716–727, May 2009.
[30] A. Chandra and K. Chakrabarty, “Test data compression and test resource partitioning for system-on-a-chip using frequency-directed run-length (FDR) codes,” IEEE Trans. Comput., vol. 52, no. 8, pp. 1076–1088, Aug. 2003.
[31] Y. Huang, “On N-detect pattern set optimization,” presented at the Int. Symp. Quality Electron. Design, San Jose, CA, 2006.

Seongmoon Wang received the B.S. degree from ChungBuk National University, Cheongju, Korea, in 1988, the M.S. degree from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 1991, and the Ph.D. degree from the University of Southern California, Los Angeles, in 1998, all in electrical engineering.

He was a Design Engineer at GoldStar Electron, Korea, and a DFT Engineer at Syntest Technologies and 3Dfx Interactive. He is currently a Senior Research Staff Member at NEC Laboratories America, Princeton, NJ. His current research interests include design for testability, computer-aided design, and self-repair/diagnosis techniques for very-large-scale integration.

Wenlong Wei received the B.S. degree in biological science from Nanjing University, Nanjing, China, in 1996, and the M.S. degree in electrical engineering from the University of Texas, Arlington, in 2004.

In December 2004, he joined NEC Laboratories America, Princeton, NJ, where he is currently an Associate Research Staff Member. His current research interests include test compression, low-power test, and defect diagnosis.

Zhanglei Wang received the B.Eng. degree from Tsinghua University, Beijing, China, in 2001, and the M.S.E. and Ph.D. degrees in computer and electrical engineering from Duke University, Durham, NC, in 2004 and 2007, respectively.

He is currently a Hardware Engineer at Cisco Systems, Inc., San Jose, CA. His current research interests include test compression, test pattern grading, test generation, high-speed test, and system-level test and diagnosis.