
IMTC 2006 – Instrumentation and Measurement Technology Conference

Sorrento, Italy 24-27 April 2006

A Software-Based Method for Test Vector Compression in Testing System-on-a-Chip

Satyendra Biswas (1) and Sunil R. Das (2,3)

(1) Department of Electrical Engineering and Technology, Georgia Southern University, Statesboro, GA 30460, USA

(2) School of Information Technology and Engineering, Faculty of Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada

(3) Department of Computer and Information Science, Troy University, Montgomery, AL 36103, USA

{sbiswas, das}@site.uottawa.ca

Abstract – A new software-based hybrid test vector compression method for testing system-on-a-chip (SOC) using an embedded processor is presented in this paper. In the proposed approach, a software program is first loaded into the on-chip processor memory core together with the compressed test data set. In order to reduce on-chip storage as well as testing time, the large volume of test data input is compressed in a hybrid fashion before being downloaded into the processor. The method combines a set of adaptive coding techniques for the required test data compression. The compression program, however, need not be loaded into the embedded processor, since only the decompression of test data is necessary for application by the automatic test equipment (ATE). Most importantly, this software-based hybrid scheme requires minimal hardware overhead, while the on-chip embedded processor core can be reused for normal operation after the testing is completed. In the paper, only the compression part of the technique is presented, and the efficiency of the suggested hybrid approach is demonstrated through simulation experiments on ISCAS 85 combinational and ISCAS 89 full-scan sequential benchmark circuits.

Keywords – Automatic test equipment (ATE), Burrows Wheeler transformation (BWT), frequency-directed run-length (FDR) coding, intellectual property (IP) core, system-on-a-chip (SOC) test.

I. INTRODUCTION

Very large-scale integration has added enormous complexity to the process of testing integrated circuits, thereby increasing the cost of electronic components. The complexity of testing system-on-a-chip (SOC) integrated circuits has also increased considerably because of the large number of intellectual property (IP) cores that are now placed on a single piece of silicon. In order to test these systems effectively, each IP core must be adequately exercised with a set of pre-computed test patterns provided by the core vendor (Fig. 1). A higher level of circuit integration results in a larger volume of test data and longer testing time, since the entire set of test vectors for all the component cores in an SOC must be stored and applied. This complexity is due mainly to a reduction in the ratio of externally accessible points (the limited SOC primary inputs and outputs) to internal inaccessible points [1] of these chips. At the same time, the input-output (I/O) channel capacity, speed, and data memory of traditional automatic test equipment (ATE) are still limited, leading to higher cost. Therefore, new techniques to reduce the overall testing time and storage requirements are always sought after.

Fig. 1. A conventional architecture for SOC testing.

Recently, there has been an emerging trend of hybrid SOC design (viz. an SOC containing an embedded processor core) [2-4]. The use of an embedded processor has expanded SOC design flexibility, reduced design risk, and lengthened SOC product lifetime significantly by allowing devices to adapt to changing standards and extra features to be added over time [3]. Another major advantage of this type of SOC is that the embedded processor can be used to run software for decompressing the pre-compressed test data rather than employing special decompression hardware circuitry.


The compression program is used for compressing the test data volume before the data are downloaded into the processor memory along with the decompression software. The processor executes a small decompression program, and the test vectors are then applied to the individual cores in the SOC. Since the compression process has no direct effect on the testing time, compression can be performed in advance, before the compressed test set is downloaded into the embedded processor for testing. However, the compression technique must be efficient enough not only to reduce the total test data volume but also to keep the decompression simple.

Several techniques have been proposed for cost-effective compression of test data and for its decompression. A general procedure for statistically encoding the test vectors of full-scan sequential circuits using selective Huffman coding is presented in [5]. A deterministic test vector compression technique for systems-on-a-chip using block matching has been described by Jas and Touba in [6]. The same authors also proposed a scheme for compression and decompression of test data using cyclical scan chains [7]; it relies on careful ordering of the test set and formation of cyclical scan chains to achieve compression with run-length codes. An effective approach for compressing test data using run-length coding and the Burrows Wheeler transform was presented by Ishida et al. [8]. Iyengar et al. [9] developed the notion of statistically encoding test data and described a BIST scheme for non-scan circuits based on statistical coding using comma codes (very similar to Huffman codes) and run-length coding. Test vector compression techniques based on Golomb codes have been proposed by Chandra and Chakrabarty [10], [11], who also proposed frequency-directed run-length (FDR) coding [12].

In this paper, a new hybrid technique is presented for efficiently implementing data compression and decompression using an embedded processor core. In the proposed compression method, the block-matching technique [6] is employed first in order to distinguish between the low frequency and high frequency sets of test vectors. The Burrows Wheeler transformation is then applied only to the high frequency data sets. The transformed high frequency data, together with the low frequency data sets, are compressed using the Huffman coding algorithm. The decompression program is simple and compact, and it performs decompression quickly, so that both the test data volume and the testing time are significantly reduced.

The paper is organized as follows. Section II provides a brief overview of the Burrows Wheeler transform (BWT) and Huffman coding, both of which are needed to understand the proposed technique. In Section III, the details of the suggested approach and the related algorithms are explained.

Some partial simulation results are given in Section IV, while Section V provides concluding remarks.

II. BACKGROUND

A. Burrows Wheeler Transform

An enormous number of complicated problems has been solved in mathematics, especially in the area of digital signal processing, by transforming them from one form into another. Burrows and Wheeler [13], [14] released the details of a transformation function that opens the door to some revolutionary new data compression techniques. The Burrows Wheeler Transform, or BWT, can transform a block of data into a format that is extremely well suited for compression. Most of today's familiar lossless compression algorithms operate in streaming mode, reading a single byte or a few bytes at a time; the BWT, in contrast, is performed on an entire block of data at once. The transformation takes a block of data and rearranges it using a sorting algorithm. The resulting output block contains exactly the same data elements it started with, differing only in their ordering.

Fig. 2. The original set of strings (S0) associated with the buffer:

    S0   D R D O B B S
    S1   R D O B B S D
    S2   D O B B S D R
    S3   O B B S D R D
    S4   B B S D R D O
    S5   B S D R D O B
    S6   S D R D O B B

Fig. 3. The set of strings after sorting (F marks the first column, L the last):

         F           L
    S4   B B S D R D O
    S5   B S D R D O B
    S2   D O B B S D R
    S0   D R D O B B S
    S3   O B B S D R D
    S1   R D O B B S D
    S6   S D R D O B B


The transformation is a reversible process, meaning that the original ordering of the data elements can be restored with no loss of fidelity. Moreover, a block of data transformed by the Burrows Wheeler transformation (BWT) can be compressed using any standard technique, or a combination of them.

The BWT is explained here using the sample string "DRDOBBS", which contains 7 bytes of data. To perform the BWT, the first step is to treat a string S of length N as if it actually contains N different strings, with each character in the original string being the start of a specific string that is N bytes long. The buffer is also treated as if its last character wraps around back to the first. As shown in Fig. 2, a matrix is formed by rotating the original sequence of the string. The matrix of Fig. 2 is then sorted lexicographically; after sorting, the set of strings is arranged as shown in Fig. 3.

There are two important points to note in Fig. 3. First, although the strings have been sorted, it is important to keep track of which string occupied which position in the original set. Thus string 0 (S0), the original unsorted string, has now moved down to row 4 of the array in Fig. 3. Second, the first and last columns of the resulting matrix are marked with the special designations F and L. Column F contains all the characters of the original string in sorted order, so the original string "DRDOBBS" is represented in F as "BBDDORS". The characters in column L do not appear to be in any particular order, but they have an interesting property: each character in L is the prefix character of the string that starts in the same row in column F.

The actual output of the BWT, oddly enough, consists of two things: a copy of column L and the primary index, an integer indicating which row contains the original first character of the buffer. Performing the BWT on the original string therefore generates the output string L = "OBRSDDB" and a primary index of 5. The integer 5 is found easily enough, since the original first character of the buffer will always be found in column L in the row that contains S1: because S1 is simply S0 rotated left by a single character position, the very first character of the buffer is rotated into the last column of the matrix, and locating S1 is equivalent to locating the buffer's first character position in L.
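As a minimal illustration of the forward transform (a sketch of ours, not the authors' implementation), the following C program builds the N rotations of "DRDOBBS", sorts them with qsort, and prints the last column together with the primary index; it reproduces L = "OBRSDDB" and the index 5 from the example above. A practical implementation would sort rotation indices rather than materializing N string copies.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmp_rot(const void *a, const void *b)
{
    return strcmp(*(const char * const *)a, *(const char * const *)b);
}

int main(void)
{
    const char *s = "DRDOBBS";
    size_t n = strlen(s);
    char **rot = malloc(n * sizeof *rot);
    char **sorted = malloc(n * sizeof *sorted);

    /* Build all N rotations of the input string. */
    for (size_t i = 0; i < n; i++) {
        rot[i] = malloc(n + 1);
        for (size_t j = 0; j < n; j++)
            rot[i][j] = s[(i + j) % n];
        rot[i][n] = '\0';
        sorted[i] = rot[i];
    }

    /* Sort the rotations lexicographically. */
    qsort(sorted, n, sizeof *sorted, cmp_rot);

    /* Emit the last column L and the primary index, i.e. the sorted row
       that holds rotation S1 (the rotation starting at the second
       character), as defined in the example above. */
    int primary = -1;
    printf("L = ");
    for (size_t i = 0; i < n; i++) {
        putchar(sorted[i][n - 1]);
        if (sorted[i] == rot[1])
            primary = (int)i;
    }
    printf("\nprimary index = %d\n", primary);   /* prints OBRSDDB and 5 */

    for (size_t i = 0; i < n; i++) free(rot[i]);
    free(rot);
    free(sorted);
    return 0;
}
```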

B. Huffman Coding

Huffman coding is a lossless data compression technique that uses a small number of bits to encode common characters. It uses a specific method for choosing the representation of each symbol, resulting in a prefix-free code (that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common characters using shorter strings of bits than are used for less common source symbols.

Huffman coding approximates the probability of each character as a power of 1/2 to avoid the complications associated with using a non-integral number of bits per character based on the actual probabilities. The algorithm works on a list of weights {wi} by building an extended binary tree with minimum weighted path length: it finds the two smallest weights, w1 and w2, viewed as external nodes, and replaces them with an internal node of weight w1 + w2. The procedure is repeated stepwise until the root node is reached. An individual external node can then be encoded by a binary string of 0s (for left branches) and 1s (for right branches). Table 1 below shows the result of Huffman coding for a set of test vectors.

Table 1. Huffman codes for the test vectors of circuit s444

    Test vector   Occurrence   Probability of occurrence   Huffman codeword
    000           1631         0.8671                      0
    010            139         0.0738                      10
    001             93         0.0494                      110
    011              7         0.0037                      1110
    110              5         0.0026                      11110
    101              3         0.0015                      111110
    111              2         0.0011                      1111110
    100              1         0.0005                      1111111
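As a rough, illustrative estimate of the gain implied by Table 1 (our own back-of-the-envelope figure, ignoring the cost of storing the code table): the table lists 1631 + 139 + 93 + 7 + 5 + 3 + 2 + 1 = 1881 three-bit symbols, i.e. 5643 bits in total, while encoding them with the listed codewords takes 1631·1 + 139·2 + 93·3 + 7·4 + 5·5 + 3·6 + 2·7 + 1·7 = 2280 bits, or about 1.21 bits per symbol instead of 3, a reduction of roughly 60% from this coding stage alone.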

III. PROPOSED TECHNIQUE

Fig. 4. Block diagram of the proposed method.

Fig. 4 depicts the block diagram of the proposed technique. All the test vectors required for testing the SOC are first compressed in software mode. The compressed test vectors and an efficient decompression program are then loaded into the embedded processor core of the SOC. The processor executes the decompression program and then applies the uncompressed original test vectors to each core of the SOC for generating and analyzing the output responses.

In the compression technique, all the test vectors are divided into several blocks of equal size, where the size of a block depends on the total number of bits in each vector. One of the test vectors is taken as a reference test vector, and each subsequent test vector is represented by storing only those blocks that differ from the previous vector. As shown in Fig. 5, test vector-1 has 7 blocks of 4 bits each. Test vector-2 also has 7 blocks of the same size, but all the bits in its 2nd, 3rd, and 6th blocks are the same as in test vector-1. Similarly, the 3rd, 4th, and 6th blocks of test vector-3 are the same as in test vector-2. If we have the information of the 1st, 4th, 5th, and 7th blocks of test vector-2, along with the reference test vector-1, then we can easily reconstruct the whole of test vector-2.


In the same way, only the information of the 1st, 2nd, 5th, and 7th blocks of test vector-3 is required to regenerate the complete test set. Each successive test vector is thus built from the previous one by replacing the blocks in which the two differ; the blocks in which test vector n + 1 differs from test vector n are the only ones stored, and test vector n + 1 is rebuilt from the nth vector by replacing just those blocks.

Fig. 5. Original test vectors divided into several blocks (a dash marks a block that is unchanged from the preceding test vector):

    Block number     1      2      3      4      5      6      7
    Test vector-1    0100   1100   0001   1000   0110   1000   0111
    Test vector-2    0110   -      -      1011   0011   -      0101
    Test vector-3    0100   1001   -      -      0010   -      0111

The block matching algorithm [6] is applied to a test data set S viewed as an A×B matrix. Each row of data is divided into several blocks of equal size (4 in Fig. 5), except the last one, which may be smaller than the other blocks. Because of the structural relationship among the faults in a circuit, there are many similarities among the test vectors, and the vectors can be ordered in an optimal way such that any two successive test vectors differ only in a small number of blocks. Hence, the amount of information required to store these differences is less than that needed for storing the entire set of test vectors. On completing this block matching process, the high frequency and low frequency blocks are separated for further computation: in Fig. 5, the data sets in columns 1, 5, and 7 form the high frequency group, while those in columns 2, 3, 4, and 6 form the low frequency group. Let the newly computed high frequency data sets Tk, k = 1, 2, 3, ..., n, each form an M×N matrix, where 2 < M ≤ A and N is the block size of the data (4 in Fig. 5). The 1st, 2nd, and 3rd high frequency data sets generated from the test vectors in Fig. 5 are then given as

         | 0100 |          | 0110 |          | 0111 |
    T1 = | 0110 | ,   T2 = | 0011 | ,   T3 = | 0101 |
         | 0100 |          | 0010 |          | 0111 |
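The block-matching step itself is easy to sketch in C. The program below (a simplified illustration of ours, not the authors' implementation) uses the three Fig. 5 vectors, with the unchanged blocks of test vectors 2 and 3 filled in from the preceding vector, and reports which 4-bit blocks differ between consecutive vectors; block positions that change in every transition are treated as the high frequency group, which reproduces columns 1, 5, and 7 above. The exact criterion used here for labeling a position high frequency is our own assumption.

```c
#include <stdio.h>
#include <string.h>

#define NVEC   3
#define NBLK   7
#define BLKSZ  4

int main(void)
{
    /* Fig. 5 vectors as 28-bit strings; unchanged blocks copied from the
       preceding vector. */
    const char *tv[NVEC] = {
        "0100110000011000011010000111",   /* test vector-1 */
        "0110110000011011001110000101",   /* test vector-2 */
        "0100100100011011001010000111"    /* test vector-3 */
    };
    int change_count[NBLK] = {0};

    /* For each consecutive pair, record the blocks that differ; only these
       blocks (plus the reference vector) need to be stored. */
    for (int v = 1; v < NVEC; v++) {
        printf("vector %d differs from vector %d in blocks:", v + 1, v);
        for (int b = 0; b < NBLK; b++) {
            if (memcmp(tv[v] + b * BLKSZ, tv[v - 1] + b * BLKSZ, BLKSZ)) {
                printf(" %d", b + 1);
                change_count[b]++;
            }
        }
        putchar('\n');
    }

    /* Block positions that change in every transition form the
       high-frequency group (columns 1, 5 and 7 here). */
    printf("high-frequency blocks:");
    for (int b = 0; b < NBLK; b++)
        if (change_count[b] == NVEC - 1) printf(" %d", b + 1);
    putchar('\n');
    return 0;
}
```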


The Burrows Wheeler transformation algorithm is then applied to all the high frequency blocks of data. Finally, the Huffman coding algorithm is employed on all the preprocessed hybrid data sets for efficient compression.

Algorithm A

The Burrows Wheeler transformation algorithm is described as follows:

Step 1: Create a list of all possible rotations of the string.
Step 2: Put each rotation in a row of a large, square table.
Step 3: Sort the rows of the table alphabetically, treating each row as a string.
Step 4: Return the last (rightmost) column of the table.

Algorithm B

The inverse Burrows Wheeler transformation algorithm is described as:

Step 1: Create an empty table with no rows or columns.
Step 2: Repeat length(s) times (i.e., perform Steps 3 and 4 once for each character of the transformed string s).
Step 3: Insert s as a new column down the left side of the table.
Step 4: Sort the rows of the table alphabetically.
Step 5: Return the row that ends with the 'EOF' character.
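A minimal C sketch of this naive inversion (ours, for illustration) is given below, applied to the example of Section II.A. Since that example carries a primary index rather than an explicit EOF marker, the final selection step picks the row at the primary index (the row holding S1) instead of searching for an EOF character, a slight variation on Step 5 above; the program reconstructs "DRDOBBS" from L = "OBRSDDB" and index 5.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N 7   /* length of the example buffer "DRDOBBS" */

static char table_buf[N][N + 1];
static char *rows[N];

static int cmp_row(const void *a, const void *b)
{
    return strcmp(*(const char * const *)a, *(const char * const *)b);
}

int main(void)
{
    const char L[] = "OBRSDDB";
    int primary = 5;

    for (int i = 0; i < N; i++) {
        table_buf[i][0] = '\0';
        rows[i] = table_buf[i];
    }

    /* Repeat N times: prepend L[i] to row i of the current table, then
       re-sort the rows (Steps 2-4 of Algorithm B). */
    for (int k = 0; k < N; k++) {
        for (int i = 0; i < N; i++) {
            memmove(rows[i] + 1, rows[i], strlen(rows[i]) + 1);
            rows[i][0] = L[i];
        }
        qsort(rows, N, sizeof rows[0], cmp_row);
    }

    /* rows[] now holds all rotations in sorted order; the row at the
       primary index is S1, so the original string is L[primary] followed
       by the first N-1 characters of that row. */
    printf("%c%.*s\n", L[primary], N - 1, rows[primary]);  /* DRDOBBS */
    return 0;
}
```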

Algorithm C

The Huffman coding algorithm is described as given below.

Step 1: Sort the source outputs in decreasing order of their probabilities.
Step 2: Merge the two least probable outputs into a single output whose probability is the sum of the corresponding probabilities.
Step 3: If the number of remaining outputs is more than two, go to Step 1.
Step 4: Arbitrarily assign 0 and 1 as codewords for the two remaining outputs.
Step 5: If an output is the result of merging two outputs in a preceding step, append the current codeword with a 0 and a 1 to obtain the codewords of the preceding outputs, and repeat Step 5. If no output is preceded by another output in a preceding step, stop.
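The merge procedure can be illustrated with the following C sketch (an illustrative rendering of Algorithm C by us, not the authors' implementation), which applies it to the occurrence counts of Table 1 using a plain array of nodes instead of a priority queue. Because Step 4 assigns 0 and 1 arbitrarily, the printed bit patterns may differ from the codewords listed in Table 1, but the codeword lengths of 1, 2, 3, 4, 5, 6, 7, and 7 bits come out the same.

```c
#include <stdio.h>

#define NSYM 8

int main(void)
{
    /* Symbols and occurrence counts from Table 1; counts order the symbols
       the same way as their probabilities.  Nodes 0..NSYM-1 are leaves and
       merged (internal) nodes are appended after them. */
    const char *sym[NSYM] = {"000","010","001","011","110","101","111","100"};
    long weight[2*NSYM]   = {1631, 139, 93, 7, 5, 3, 2, 1};
    int parent[2*NSYM], bit[2*NSYM], alive[2*NSYM] = {0};
    int nodes = NSYM;

    for (int i = 0; i < NSYM; i++) alive[i] = 1;

    /* Repeatedly merge the two least probable remaining outputs. */
    while (1) {
        int a = -1, b = -1;
        for (int i = 0; i < nodes; i++) {
            if (!alive[i]) continue;
            if (a < 0 || weight[i] < weight[a]) { b = a; a = i; }
            else if (b < 0 || weight[i] < weight[b]) b = i;
        }
        if (b < 0) break;                 /* only the root is left */
        weight[nodes] = weight[a] + weight[b];
        parent[a] = parent[b] = nodes;
        bit[a] = 1; bit[b] = 0;           /* arbitrary 0/1 assignment (Step 4) */
        alive[a] = alive[b] = 0;
        alive[nodes++] = 1;
    }

    /* Read each codeword by walking from the leaf up to the root and
       printing the collected bits in reverse (root-to-leaf) order. */
    for (int i = 0; i < NSYM; i++) {
        char code[2*NSYM]; int len = 0;
        for (int n = i; n != nodes - 1; n = parent[n])
            code[len++] = (char)('0' + bit[n]);
        printf("%s -> ", sym[i]);
        while (len--) putchar(code[len]);
        putchar('\n');
    }
    return 0;
}
```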

IV. EXPERIMENTAL RESULTS

In order to demonstrate the feasibility of the proposed test compression scheme, independent simulations were conducted on various ISCAS 85 combinational and ISCAS 89 full-scan sequential benchmark circuits. An automatic test generation program was first employed to obtain a set of test vectors providing 100% fault coverage (MinTest) [15]. The proposed compression algorithm was implemented in the C programming language on a UNIX machine with dual 900-MHz UltraSPARC III processors and 4 GB of RAM. Because of space constraints, only some partial results for the ISCAS 85 combinational and ISCAS 89 full-scan sequential benchmark circuits are provided in Tables 2 and 3, respectively. The percentage data compression was computed as follows:

    Compression % = ((Original_Bits − Compressed_Bits) / Original_Bits) × 100
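As a quick sanity check of the reported figures, the c432 entry in Table 2 gives (972 − 454) / 972 × 100 ≈ 53.29%, matching the value in the last column of the table.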

The results as presented in Table 2 show that for most ISCAS 85 combinational benchmark circuits, the percentage compression is near or above 70, with few exceptions. Table 3, on the other hand, shows that the compressions for ISCAS 89 full-scan sequential benchmark circuits are in the range of 67% to 78% with only one exception (s444). The partial experimental results, though not comprehensive enough, provide some insight on the performance of the proposed hybrid compression method. More detailed experimental results on the compression technique together with results of the decompression method in embedded processor will be available soon.

Table 2. Test vector compression results for several ISCAS 85 combinational benchmark circuits

    Circuit name   Scan size   Number of test vectors   Total original bits   Compressed output bits   Compression %
    c432           36          27                       972                   454                      53.29
    c499           41          52                       2132                  426                      80.01
    c880           60          15                       900                   272                      69.78
    c1355          41          84                       3444                  892                      74.09
    c1908          33          106                      3498                  1484                     57.58
    c2670          233         44                       10252                 3482                     66.03

Table 3. Test vector compression results for several ISCAS 89 full-scan sequential benchmark circuits

    Circuit name   Scan size   Number of test vectors   Total original bits   Compressed output bits   Compression %
    s420f          35          43                       1505                  488                      67.57
    s444f          24          24                       576                   272                      52.77
    s510f          25          54                       1350                  296                      78.07
    s526f          24          49                       1176                  254                      78.40
    s820f          23          93                       2139                  658                      69.23
    s838f          67          75                       5025                  1306                     74.00

V. CONCLUSIONS

In this paper, a new hybrid method is presented for test data compression that aims to fully utilize the complementary characteristics of block matching, the Burrows Wheeler transformation, and Huffman coding on test sequences. The proposed approach supports the testing of system-on-a-chip designs with embedded processor cores. The technique utilizes the computational power of an embedded processor to perform test data compression and decompression in software mode. It thus reduces both the amount of test storage and the testing time, thereby reducing the tester memory and channel capacity requirements. As the proposed method is mainly software-based, the hardware requirements and the cost of ATE are minimized. The technique is completely lossless, and it is time and space efficient because of its high compression ratio and rapid decompression process. Currently, work is underway on implementing the decompression procedure in the embedded processor, along with automatic application of the test vectors for analyzing fault coverage.

REFERENCES

[1] Y. Zorian, E. J. Marinissen, and S. Dey, "Testing embedded-core based system chips", in Proc. Int. Test Conf., 1998, pp. 130-143.
[2] P. S. Zuchowski, C. B. Reynolds, R. J. Grupp, S. G. Davis, B. Cremen, and B. Troxel, "A hybrid ASIC and FPGA architecture", in Proc. Int. Conf. Computer Aided Design, 2002, pp. 187-194.
[3] M. Abramovici, C. Stroud, and M. Emmert, "Using embedded FPGAs for SOC yield improvement", in Proc. Design Automation Conf., 2002, pp. 713-724.
[4] S. Wilton and R. Saleh, "Programmable logic IP cores in SoC design: opportunities and challenges", in Proc. IEEE Custom Integrated Circuits Conf., 2001, pp. 63-66.
[5] A. Jas, J. Ghosh Dastidar, and N. A. Touba, "Scan vector compression/decompression using statistical coding", in Proc. VLSI Test Symp., 1999, pp. 114-120.
[6] A. Jas and N. A. Touba, "Deterministic test vector compression/decompression for systems-on-a-chip using an embedded processor", J. Electronic Testing: Theory and Applications, vol. 18, pp. 503-514, Aug. 2002.
[7] A. Jas and N. A. Touba, "Test vector decompression via cyclical scan chains and its application to testing core-based designs", in Proc. Int. Test Conf., 1998, pp. 458-464.
[8] M. Ishida, D. S. Ha, and T. Yamaguchi, "COMPACT: a hybrid method for compressing test data", in Proc. VLSI Test Symp., 1998, pp. 62-69.
[9] V. Iyengar, K. Chakraborty, and B. T. Murray, "Built-in self testing of sequential circuits using precomputed test sets", in Proc. VLSI Test Symp., 1998, pp. 418-423.
[10] A. Chandra and K. Chakrabarty, "Test data compression for system-on-a-chip using Golomb codes", in Proc. VLSI Test Symp., 2000, pp. 113-120.
[11] A. Chandra and K. Chakrabarty, "Efficient test data compression and decompression for system-on-a-chip using internal scan chains and Golomb coding", in Proc. Design Automation and Test in Europe, 2001, pp. 145-149.
[12] A. Chandra and K. Chakrabarty, "Frequency-directed run-length codes with application to system-on-a-chip test data compression", in Proc. VLSI Test Symp., 2001, pp. 42-47.
[13] M. R. Nelson, "Data compression with the Burrows Wheeler Transform", Dr. Dobb's Journal, vol. 22, pp. 46-50, Sep. 1996.
[14] M. Burrows and D. J. Wheeler, "A block sorting lossless data compression algorithm", Digital Equipment Corporation, Palo Alto, CA, Res. Rep. 124, 1994.
[15] I. Hamzaoglu and J. H. Patel, "Test set compaction algorithms for combinational circuits", in Proc. Int. Conf. Computer-Aided Design, 1998, pp. 283-289.

_____________

This research was supported in part by the Natural Sciences and Engineering Research Council of Canada under Grant A 4750.
