Chapter 2
AT-SPEED TESTING AND LOGIC BUILT-IN SELF-TEST
2.1 Introduction
This chapter describes the concepts of scan-based testing, the issues in testing, the need
for logic BIST, and trends in VLSI testing. Scan and logic BIST are the primary
design-for-test (DFT) methods used to control test cost. Both ATPG-based and logic
BIST-based pattern generation depend on the scan architecture of the design. The
growth in test data volume makes it prohibitively expensive, and often impractical,
to apply enough patterns to achieve a target fault coverage. Several researchers have
therefore explored multiple techniques for test data volume reduction.
2.2 The Issue: Test Data Volume
2.2.1 ATE Based Testing
Exhaustive IC chip testing after manufacturing is critical to ensure product quality.
Full scan based IC testing using ATE is the most widely used approach. As chip
designs become larger and new process technologies require more complex fault
models, the length of scan chains and the number of scan patterns required have
increased dramatically (Agrawal, 2002). To maintain adequate fault coverage, test
application time and test data volume have also escalated, which has driven up the
cost of test. As test data volume increases, test power also increases rapidly. Scan
chain values currently dominate the total stimulus and observe values of the test
patterns, which constitute the test data volume. With the limited availability of
I/O ports as terminals for scan chains, the number of flip-flops per scan chain has
increased dramatically. As a result, the time required to operate the scan chains -
the test application time - has increased to the extent that it is becoming uneconomic
to employ scan test on complex designs.
Built-in self-test (BIST) has emerged as an alternative to ATE-based external
testing. BIST offers a number of key advantages. It allows pre-computed test sets
to be embedded in the test sequences generated by on-chip hardware, supports test
reuse and at-speed testing, and protects intellectual property. To understand the
impact of shift time on test application time, consider the typical sequence involved
in processing a single scan test pattern shown in Figure 2.1. The shift operations take
as many clock periods as required by the longest scan chain. Optimizations (such
as overlapping of scan operations of adjacent test patterns) still do not adequately
reduce the unwieldy test application time required by the scan operation. For a
scan-based test, test data volume can be approximately expressed as
Test data volume ≈ scan cells ∗ scan patterns (2.1)
Assuming balanced scan chains, the relationship between test time and test data
volume is then
Figure 2.1: Scan-based testing (Wang, 2006)
Test time ≈ (scan cells ∗ scan patterns)/(scan chains ∗ frequency) (2.2)
Consider a circuit having 60000 flip-flops. Assume the number of scan chains is 60
(sc). Each scan chain will have 1000 flip-flops (len). Assume the number of vectors
is 20000 (tv). The time taken to test the IC is given by,
Number of clock cycles for flush test = len + 4 (2.3)
Number of clock cycles for scan test = (tv ∗ (1 + len)) + len (2.4)
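As a quick check, the clock-cycle counts of Equations (2.3) and (2.4) can be evaluated directly for the figures above (a minimal Python sketch; the variable names mirror sc, len and tv from the example):

```python
# Test-time estimate for the worked example: 60 scan chains of
# 1000 flip-flops each, 20000 test vectors, 100 MHz ATE clock.
sc, chain_len, tv, freq_hz = 60, 1000, 20_000, 100e6

flush_cycles = chain_len + 4                       # Equation (2.3)
scan_cycles = (tv * (1 + chain_len)) + chain_len   # Equation (2.4)

test_time_s = (flush_cycles + scan_cycles) / freq_hz
print(f"scan-test cycles: {scan_cycles:,}")   # 20,021,000
print(f"test time: {test_time_s:.3f} s")      # about 0.2 s per die
```

Even a fraction of a second per die adds up across millions of parts, which is why reducing shift cycles dominates test-cost discussions.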
Most automatic test equipment operates at a few MHz to a few tens of MHz. Even
with an ATE running at 100 MHz, the above example takes
approximately 0.2 seconds (by Equation (2.2)), which is still a long test time in
high-volume production. For all fault models, the following metrics are used to
characterize the quality of a test set.
Definition 2.1: The fault coverage (FC) is the percentage of detected faults with
respect to the total number of faults. The terms test coverage (TCov) and fault
coverage (FC) are used interchangeably by different EDA vendors.
Definition 2.2: The fault efficiency (FE) is the percentage of detected faults with
respect to the total number of testable faults.
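The two definitions differ only in their denominator. As a minimal sketch (the fault counts below are invented for illustration):

```python
# Fault coverage (FC) vs. fault efficiency (FE), per Definitions 2.1/2.2.
# Illustrative numbers: 1000 total faults, 20 of them untestable,
# 950 detected by the test set.
total_faults, untestable_faults, detected_faults = 1000, 20, 950

fc = 100.0 * detected_faults / total_faults                        # vs. all faults
fe = 100.0 * detected_faults / (total_faults - untestable_faults)  # vs. testable faults
print(f"FC = {fc:.1f}%  FE = {fe:.2f}%")   # FC = 95.0%  FE = 96.94%
```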
As designs grow larger, maintaining high test coverage becomes increasingly
expensive because the test equipment must store a prohibitively large volume of test
data and the test application time increases. Moreover, a very high and continually
increasing logic-to-pin ratio creates a test data transfer bottleneck at the chip pins.
Accordingly, the overall efficiency of any test scheme strongly depends on the method
employed to reduce the amount of test data.
2.2.2 ATE Limitations over LBIST
Test data storage and test data bandwidth requirements between the tester and
chip grow rapidly, and ATE capability cannot scale indefinitely for increasingly
complex IC designs. Buying a new tester with more memory, more channels, and a
higher speed of operation is also not a good solution: the cost of a high-speed
tester had already reached around 20 million USD by 2010 (ITRS Annual Report, 2011).
Some applications of ATE in the IC back-end process are shown in Figure 2.2. The
difficulty of using ATE can be avoided by eliminating the costly ATE and using BIST,
which relies on on-chip test pattern generation by linear feedback shift registers
(LFSRs).
Figure 2.2: Applications of ATE in the IC back-end process (chromaate, 2004).
2.2.3 The Need for Logic BIST
Any electronic system employed in safety-critical applications is expected to have a
periodic self-checking scheme for sustained error-free operation. For example, medical
electronic devices need to test themselves to assure the continued safety of patients.
LBIST has found use mainly in safety-critical (automotive, medical, military),
mission-critical (deep-space, aviation) and high-availability (telecom) applications.
However, as process technologies plunge below 22 nm, LBIST will become compulsory
for Application-Specific Integrated Circuits (ASICs), Application-Specific Standard
Products (ASSPs) and complex commercial ICs (Nan Li, et al., 2015). Another
example is automotive electronics. With the explosion in the growth of the automo-
tive semiconductors industry comes an associated and intense focus on high silicon
quality and reliability. The last thing anyone wants is a brake system failure due to
a latent silicon defect, and concerns over reliability are driving changes in the testing
requirements for these chips. The electronics must meet certain safety standards set
forth by the automotive manufacturers, such as ISO 26262 and AEC-Q100, which
outline acceptable failure rates for key functionality (Meehl, 2014). DFT methods
and testing strategies need to be restructured to meet these high quality metrics.
The most needed strategy is to test ICs throughout their entire lifespan, and on-chip
testing has emerged as the solution that enables effective testing of these ICs. A
power-on test ensures that the silicon is functioning as required each time the vehicle
is started, with LBIST logic coverage typically falling in the range of 75% to 80%. In a typical
application, if the LBIST test signals a silicon failure, an associated error code is
posted to the vehicle computer and the check engine light comes on. In this way,
LBIST provides the means to quickly and easily identify potential safety issues with-
out needing dedicated and complex equipment at car dealerships. Another benefit
of LBIST is that it provides a low pin interface and very small test data volume
when testing the chip on the tester.
2.2.4 Fault Coverage in Logic BIST
Logic BIST's fault coverage is typically reduced by random-pattern-resistant (RPR)
faults, so the coverage falls well short of 100 percent and very long test sequences
are required. To achieve test coverage close to that achievable by deterministic
ATPG patterns, the number of BIST patterns must be significantly greater, which
forces a tradeoff between increased test application time and reduced test coverage.
Several solutions have been devised to address this problem. The number of faults
detected per random pattern is usually high for the first few patterns and then
decreases with further patterns (Agrawal, 2002). The detected faults
are removed from the total fault list after the application of each test pattern. In
general, after a certain number of patterns, the fault coverage of random patterns
saturates, and even hundreds of further patterns may detect only a few faults. These
faults are the most difficult to detect, and many additional random patterns may
not detect a single new fault in the CUT. Such undetected faults are referred to as
random-pattern-resistant (RPR) faults. Figure 2.3 shows the general trend in fault
coverage with the number of random patterns. It is not known precisely when to
stop generating random patterns; generation normally stops when no further
improvement in fault coverage is seen.
Figure 2.3: Typical logic BIST fault coverage map
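The saturation behavior of Figure 2.3 can be reproduced on a toy example. The sketch below (circuit, net names and fault list all invented for illustration) fault-simulates random patterns against every single stuck-at fault of a small combinational block and records the cumulative coverage:

```python
import random

# Toy coverage curve: random patterns applied to a small circuit,
# out = (a AND b) XOR (c OR d), with stuck-at faults on every net.
NETS = ["a", "b", "c", "d", "n1", "n2", "out"]

def simulate(pattern, fault=None):
    """Evaluate the circuit; fault = (net, stuck_value) or None."""
    def stuck(net, val):
        return fault[1] if fault and fault[0] == net else val
    v = {n: stuck(n, val) for n, val in zip("abcd", pattern)}
    v["n1"] = stuck("n1", v["a"] & v["b"])
    v["n2"] = stuck("n2", v["c"] | v["d"])
    return stuck("out", v["n1"] ^ v["n2"])

faults = [(net, sv) for net in NETS for sv in (0, 1)]
random.seed(1)
detected, coverage = set(), []
for _ in range(100):
    p = tuple(random.randint(0, 1) for _ in range(4))
    good = simulate(p)
    detected |= {f for f in faults if simulate(p, f) != good}
    coverage.append(100.0 * len(detected) / len(faults))
print(coverage[:5], "...", coverage[-1])
```

Coverage climbs steeply over the first patterns and then flattens; at production scale the flat tail is exactly where the random-pattern-resistant faults live.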
2.2.5 At-Speed and On-Chip Testing of Logic BIST
Around 2012, there was a lot of interest in LBIST in mission-critical markets -
military, medical, and especially automotive. LBIST is generally not used for a
one-time manufacturing test but for repeated testing in the field (or in-system
testing) to ensure that a car, a satellite or a medical device is working as expected.
For example, in an electronically-controlled braking system, LBIST could run every
time the car is turned on. On each run, the computed signature is compared against
the expected one; if the signatures do not match, the car's electronic controls can
alert the driver or perhaps prevent the transmission from going into drive. Of course,
logic ICs still run through the usual manufacturing tests, but most likely the LBIST
will not be turned on in production testing.
Meehl (Meehl, 2014) observed that LBIST requires more test patterns than
conventional testing, may take an extra millisecond or two to run, and probably
would not provide high enough test coverage without additional test vectors. For
manufacturing test, the combination of LBIST and ATPG-driven tests is the most
common approach for the majority of users. During ATPG testing, LBIST is turned
off and treated like functional logic.
2.2.6 An Automated Logic BIST Insertion Flow
Two logic BIST configurations are in practice, distinguished by how the LBIST is
accessed from an external device. In the first, a production facility can perform
efficient, first-pass quality checks of the silicon on multiple sites in parallel, because
the test pin interface is reduced to the JTAG pins only. This type, called JTAG
Access LBIST, uses the JTAG interface and protocols to start and execute LBIST.
The other type, called Direct Access LBIST, is executed simply by holding an extra
pin high or low. This method uses the
I/O pins of the device directly to access the LBIST blocks. The Cadence Encounter
Test product has supported automatic test pattern generation (ATPG) for many
years. A newer capability, introduced with the Encounter 13.1 release, is the
automatic insertion of the LBIST macro with the Cadence RTL (register transfer
level) Compiler (RC). The automated flow first builds scan chains, then inserts
compression channels, and then builds the LBIST circuitry. Designers can check
coverage and manually add more test points to improve coverage. Encounter Test
then validates that the PRPG and MISR are working properly, generates the final
signature, and provides the fault coverage for a given LBIST run. The diagram in
Figure 2.4 below shows the LBIST flow in more detail (OPCG is on-product clock
generation). A comparative feature list of the ATE based testing and logic BIST
based testing is given in Table 2.1.
2.3 Trends in VLSI Testing
In the last three decades, the world of electronics (especially microprocessors) has
expanded exponentially in both productivity and performance. Integrated circuits
(ICs) are considered the basis of modern products. The IC industry has followed a
steady path of constantly shrinking device dimensions and hence increasing chip
density.
IC testing is one of the critical processes involved in the life cycle of a product.
Testing difficulty increases with the integration level of the devices. The rule of ten
Figure 2.4: Automated LBIST insertion with Cadence RTL Compiler (RC) and Encounter Test (ET) (Cadence, 2012)
says that the cost of detecting a faulty IC increases by an order of magnitude as
it moves through each stage of manufacturing, from device level to board level to
system level and finally to system operation in the field. Indeed, the rule of ten is
turning into a rule of twenty because chips, boards and systems have become
enormously complex. Feedback obtained from IC testing is the only way to analyze and
isolate many defects in the manufacturing process. Time-to-yield, time-to-market
and time-to-quality are all gated by testing. With the increasing demand for
high-quality electronic products, IC test is used for debugging, diagnosing and
repairing parts of the device. Increasing reliability, availability and serviceability
requirements also result in periodic testing of devices throughout their life cycle.
VLSI testing is very important to designers, product engineers, test engineers,
managers, manufacturers and end-users.
Table 2.1: Differences between LBIST and ATE-based testing.

     Logic BIST                                  ATE-based testing
1    At-speed testing                            Slow (ATE dependent)
2    Reduces test cost                           High ATE cost
3    Less test time                              More test time
4    Random patterns (low pattern efficiency)    Deterministic patterns (high pattern efficiency)
5    Low stuck-at coverage                       High stuck-at coverage
6    On-chip                                     Needs a test house
7    No memory needed                            Large memory needed
8    Any stage of implementation                 Only after manufacturing
9    Debug without removal from the system       Debug on ATE only; needs removal from the system
10   Complex debug                               Relatively simpler
As we move deep into submicron technology, single ICs have been replaced by
SoC-type designs. An SoC is a heterogeneous mixture of a number of cores obtained
from various sources. Conventional test and diagnosis methods have become
inadequate and costly for SoC designs, and the traditional bed-of-nails method is
not able to control and observe every point in the device. The ATE (automatic test
equipment) is computer-controlled equipment that applies test patterns to a DUT
(device under test) and, based on the response collected from the DUT, marks the
device as good or bad. Since core-based systems introduce new complexities in
testing, designers have to search for alternative approaches to chip development in
order to keep up with diminishing time-to-market requirements and stay within
budgets. One alternative is embedding the testing capabilities, or test circuits,
behind the primary I/O ports within the internal structure of the chip. The process
of incorporating test circuits into the
design of a device is called Design for Testability (DFT). DFT techniques (embedded
test) enable the production of higher-quality devices in less time. The use of
embedded test raises margins and significantly reduces the time required for system
verification, test and debug. One of the oldest DFT techniques is the scan chain,
which reduces test complexity to a large extent. In the case of SoCs, boundary scan
and BIST methods are used to perform testing. Test cost contributes a major part
of the total device cost; it includes the cost of ATE equipment, test development,
test vector generation, programming, etc. Scan-based techniques significantly reduce
the cost of test generation, and BIST methods can lower the complexity and cost of
ATE. Today's complex ICs impose three scaling trends on test - increased complexity,
increased performance, and higher density - each of which has a direct impact on
test methods.
2.3.1 Implication on Complexity
Transistor count in an IC keeps increasing from generation to generation, as stated
by Moore's law. The drastic increase in the ratio of transistors per pin continuously
reduces the accessibility of the transistors from the chip pins, creating a big problem
for IC test. The amount of test data needed to test an IC to a certain level of fault
coverage grows with the transistor count of the IC under test. An SoC is a mixed IC
that contains various non-homogeneous circuits (such as analog blocks, digital blocks,
memory elements, etc.) collected from different vendors. Different types of circuits
exhibit distinct defect behavior and need different testing techniques. Typically,
distinct external test equipment is used to test each of these circuits, which is an
expensive process. Embedded tests such as BIST and boundary scan remain a
solution for testing these types of mixed circuits.
2.3.2 Implication on Performance
With the continuous increase in IC internal speed, performance-related defect
coverage has become increasingly important, and complex chips demand a
comprehensive performance test. Performance tests applied from external test
equipment cannot adequately and cost-effectively test at high clock speeds or provide
the necessary performance-related defect coverage, because such equipment is
typically built from older technology than the chips it tests, and higher-speed test
equipment is substantially more expensive. Since an embedded test circuit is built
on the same IC, it operates at the same speed as the chip's internal logic.
2.3.3 Implication on Density
Continuous advancement in technology keeps increasing silicon (device) density.
Increased density reduces defect sizes and hence increases the difficulty of fault
localization. Embedded test hardware - whether in embedded memories, cores,
user-defined logic or analog blocks - can act as the infrastructure to collect fault
data from the block under test, helping to isolate defects quickly.
2.4 Fault Models of Interest
This section describes the logical fault model used in this work. Logical faults
represent the effect of physical defects on the logic behavior of the modeled system.
Restricting the analysis of physical defects to the level of the logic behavior has
several advantages.
The complexity is reduced by transforming a physical problem into a logical one.
The space of physical defects is larger than the space of logical faults, so a single
fault model can cover several physical defect types. Moreover, tests derived for
certain logical faults may cover physical defects for which no accurate fault model
is known (Agrawal, 2002). Also, most logical fault models are technology-independent,
so testing and diagnosis methods developed for them are applicable to many
technologies.
Functional faults are commonly defined at RTL or higher levels (like behavioral
or system level) and they affect the proper execution of the operations used at
these levels. Shorts and opens are two examples of structural faults. A short is
formed by connecting points not intended to be connected, while an open results
from breaking a connection. In this work, only structural, permanent and single
faults of combinational logic are considered. Intermittent, transient, or multiple
faults are not taken into account, and any analog or memory elements that may be
present in the circuit under test are not considered. Under the single-fault
assumption, at most one logical fault is present in the system at any time. This
assumption is justified by the fact that, in most cases, a multiple fault can be
detected by the tests designed for the individual single faults that compose it.
2.5 Stuck-at Faults
The logical fault corresponding to a signal line being stuck at a fixed logic value
(0/1) is referred to as a single stuck-at 0/1 fault. Physical defects which can be
modeled with the help of a stuck-at 0/1 fault on the signal line i include an open on
the fan-out lines driven by the line i, a short to power/ground or an internal error
in the component driving the line i (Gherman, 2006).
Despite the fact that the single stuck-at fault model does not cover all the physical
defects that can appear in a digital circuit, it is very useful due to the following
properties:
Compared to other fault models, the number of single stuck-at faults in a circuit
grows linearly with its size. Moreover, the number of faults that must be explicitly
considered can be reduced by fault collapsing: equivalence-based and dominance-based
fault collapsing can reduce the number of faults to be explicitly analyzed by about
50% and 40%, respectively (Abramovici, 1990).
Test sets generated for single stuck-at faults may detect many faults belonging to
other fault models, and the model is technology-independent. The single stuck-at
fault model and its analysis can be used to construct and analyze other fault models,
such as the transition fault model. A combinational circuit that contains an
undetectable stuck-at fault is said to be redundant, since such a circuit can always
be simplified by removing at least one gate or input. The test generation problem
for stuck-at faults belongs to the class of NP-complete problems (worst-case
behavior) (Ibarra, 1975), and undetectable (redundant) faults are usually the ones
that cause test generation algorithms to exhibit their worst-case behavior
(Abramovici, 1990).
A straightforward extension of the single stuck-at fault model is the multiple
stuck-at fault model, which is more difficult to handle. The fault list for a circuit
having N possible sites for single stuck-at faults can contain up to 2N single and
3^N - 1 multiple stuck-at faults (Abramovici, 1990). Fortunately, the importance of
the multiple stuck-at fault model is reduced by the fact that tests with complete
detection of the single stuck-at faults detect most of the multiple stuck-at faults as
well (Hughes, 1984).
2.6 Test-per-Scan Schemes
Test-per-scan BIST schemes require a scan-based design. In the case of sequential
circuits, this means that all the storage cells can be configured as one or several scan
paths (chains), which are used as serial shift registers in test mode (Figure 2.6). In
this way, each storage device of the CUT becomes easily controllable and observable.
The test stimuli/responses are shifted into/out of the scan paths (Abramovici, 1990)
(Agrawal, 2002) (Wang, 2006). Scan-based design helps to reduce the problem of
testing sequential circuits to the simpler problem of testing combinational circuits.
The BIST control unit (BCU) in Figure 2.5 must contain at least a shift counter
and a pattern counter. The shift counter controls the bit stream which is generated
and shifted into the scan path by a TPG. The pattern counter controls the length
of the test sequence. A system clock cycle (also called capture or functional clock
cycle) is applied to load the CUT response to the current test pattern into the scan
path. During the so-called shift mode (also called scan or test mode) a new test
pattern is shifted into the scan path, while the CUT response to the previous pattern
is shifted out and compressed by a test response evaluator (TRE). A very common
and effective parallel-serial mixed scheme is obtained by partitioning a full scan path
into multiple scan chains (Figure 2.6).
Figure 2.5: Test-per-scan scheme (Wun, 1998)
Figure 2.6: STUMPS architecture for parallel-serial mixed scheme (Wun, 2002)
In Figure 2.6, the test patterns are generated by a pseudo-random pattern generator
(PRPG) and the responses are compacted by a multiple-input signature register
(MISR). Both the PRPG and the MISR are typically implemented as linear feedback
shift registers (LFSRs). Such a scheme is called Self-Test Using MISR and Parallel
Shift register sequence generator (STUMPS) (Bardell, 1982).
The basic design with multiple scan chains suffers from highly correlated patterns.
To solve this problem, XOR trees (phase shifters, PS) may be inserted between the
LFSR and the scan chain inputs (Figure 2.6) (Bardell, 1990) (Rajski, 1998).
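A PRPG of this kind is easy to sketch. The example below models a 4-bit Fibonacci LFSR with the primitive polynomial x^4 + x^3 + 1 (a toy width chosen for illustration; production PRPGs are far wider) and confirms the maximal-length property:

```python
# 4-bit Fibonacci LFSR, recurrence s_n = s_{n-1} XOR s_{n-4}
# (primitive polynomial x^4 + x^3 + 1).
WIDTH = 4

def lfsr_step(state):
    fb = ((state >> (WIDTH - 1)) ^ state) & 1    # s_{n-1} XOR s_{n-4}
    return ((state << 1) | fb) & ((1 << WIDTH) - 1)

def period(seed):
    state, steps = lfsr_step(seed), 1
    while state != seed:
        state, steps = lfsr_step(state), steps + 1
    return steps

# Any nonzero seed cycles through all 2^4 - 1 = 15 nonzero states.
print(period(0b0001))   # 15
```

Feeding adjacent scan chains straight from adjacent LFSR bits hands each chain a time-shifted copy of the same sequence, which is precisely the correlation the phase shifters are inserted to break.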
Test-per-scan schemes have several advantages: (a) high fault/defect coverage,
(b) reduced test data size (compared to sequential test patterns), (c) relatively low
test generation time, (d) reduced test cost (no special requirement for costly ATEs
for functional testing), (e) low impact on system behavior, as only scan paths are
added to the mission logic, and (f) separation of the pattern generator from the
CUT, so that it can be synthesized at a later step of the design flow.
The drawbacks of test-per-scan schemes are: (i) the long test application time
required by the scan mode, (ii) functionally untestable faults can be activated,
(iii) reduced testability for faults whose detection requires pairs of test patterns,
and (iv) reduced system performance if scan elements are introduced into critical
paths. If partial scan paths (Trischler, 1980) are used, such problems can be reduced
and more test patterns may be applied within the same test time.
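On the response side, the MISR can be sketched the same way: an LFSR whose next state also XORs in one slice of CUT response per capture cycle. The 4-bit width, feedback polynomial and response values below are invented for illustration:

```python
# 4-bit MISR (feedback polynomial x^4 + x^3 + 1) compacting one 4-bit
# response slice per clock into a final signature.
WIDTH = 4

def misr_compact(responses, seed=0):
    state = seed
    for slice_ in responses:
        fb = ((state >> (WIDTH - 1)) ^ state) & 1
        state = (((state << 1) | fb) ^ slice_) & ((1 << WIDTH) - 1)
    return state

good = [0b1010, 0b0111, 0b0001, 0b1100, 0b0110]
bad = list(good)
bad[2] ^= 0b0100            # single-bit error in one response slice

print(f"{misr_compact(good):04b} vs {misr_compact(bad):04b}")
# A single-bit error never aliases: the MISR transition is linear and
# invertible, so one nonzero input difference always survives to the
# signature. Aliasing only becomes possible with multi-bit errors.
```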
2.7 Test-per-Clock Schemes
In a test-per-clock scheme (Koenemann, 1979)(Krasniewski, 1989)(Stroele, 1994)
(Wang, 1986), a test pattern is applied to the CUT every clock cycle. This scheme
is best suited for register-based design. This kind of scheme employs a specific
BIST architecture using the built-in logic block observer (BILBO) (Koenemann,
1979), which is a more sophisticated register that can function as a normal state
register, scan register, PRPG or MISR. All functionality of the BILBO depends on
the mode input signals B0 and B1. Signal B0 controls all the registers to switch
between the global and local modes (Figure 2.7).
Figure 2.7: Control signals of a BILBO (Wun, 1998)
Figure 2.8: Test-per-clock scheme (Wun, 2002)
The global mode covers the functional and scan modes. In the local mode, the
registers may act as pattern generators or response evaluators. In order to select
each of these sub-modes associated with the global or local mode, the signal B1
is used. In contrast to signal B0, which is unique for all registers, the signal B1
depends upon the addressed register. Figure 2.8 shows how testing is facilitated by
changing the functionality of the BILBO registers. Initially, the registers R1 and R2
are initialized in scan mode. Then register R1 is set to PRPG mode
for the combinational logic C1, and the test responses are observed by register R2,
which functions in response-evaluation mode as a MISR. The combinational logic C2
is tested after the test outcome contained in R2 is shifted out and the functionalities
of R1 and R2 are interchanged. Finally, the new test outcome contained in R1 is
shifted out.
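The R1/R2 schedule for testing C1 can be sketched end to end: R1 steps as a PRPG, C1 responds, and R2 folds each response into its signature. The 4-bit width, the block C1 and the injected defect are all invented for illustration:

```python
# Test-per-clock session for C1 (Figure 2.8 style): R1 as PRPG, R2 as
# MISR, both 4-bit LFSRs with feedback polynomial x^4 + x^3 + 1.
WIDTH, MASK = 4, 0xF

def step(state, injected=0):
    """One clock: plain LFSR when injected == 0, MISR otherwise."""
    fb = ((state >> (WIDTH - 1)) ^ state) & 1
    return (((state << 1) | fb) ^ injected) & MASK

def c1(x, fault_free=True):
    """Invented combinational block; the defect corrupts one response."""
    y = (x ^ (x >> 1)) & MASK
    if not fault_free and x == 0b0111:
        y ^= 0b0001
    return y

def run_bist(num_patterns, fault_free=True):
    r1, r2 = 0b0001, 0b0000      # seeds loaded via scan mode
    for _ in range(num_patterns):
        r2 = step(r2, injected=c1(r1, fault_free))  # R2 compacts response
        r1 = step(r1)                               # R1 next pattern
    return r2                                       # final signature in R2

golden = run_bist(15)
print(f"golden signature: {golden:04b}")
print(f"faulty signature: {run_bist(15, fault_free=False):04b}")
```

After 15 patterns the maximal-length R1 has exercised every nonzero input once, so the defect is excited exactly once and the faulty signature is guaranteed to differ from the golden one.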
Compared to test-per-scan schemes, test-per-clock schemes have both advantages
and disadvantages. The advantages are: (a) shorter test times and better support
for two-pattern testing (Cockburn, 1998), as a new pattern can be applied in each
clock cycle, and (b) better support for at-speed testing, as no pattern shifting
(which is generally done at a lower speed) is required.
The disadvantages of test-per-clock schemes are: (i) larger hardware overhead and
(ii) stronger impact on system behavior and design flow. The overhead is also
affected by the increased complexity of the test-per-clock schedule, which requires
the synthesis of a rather complex BCU. One reason for these disadvantages is that
additional test registers have to be included, because a normal BILBO register
cannot work as TPG and TRE simultaneously. Wang (1986) introduced a special
type of BILBO register, called the concurrent BILBO, which can perform signature
analysis and pattern generation concurrently.
2.8 Summary
This chapter introduced the concepts of VLSI testing, ATE-based testing, logic
BIST-based testing and the two types of logic BIST access. Most commercial DFT
tools use the STUMPS architecture, which is a test-per-scan scheme. Various
researchers have attempted to make logic BIST-based testing more efficient by
reducing the number of random test patterns in different ways. The test pattern
compression techniques and schemes proposed previously in the literature are
detailed in Chapter 3.