Testing and Qualification[1]

8/3/2019 Testing and Qualification[1]


Chapter 12

TESTING AND QUALIFICATION

S. Kolluru and D. Berleant

12.1 INTRODUCTION

The primary purpose of testing is to assess quality. This assessment can be done with respect to an entire system or device, or with respect to smaller or larger parts of it, as when attempting to find the location of a fault. The assessment can produce a quantitative value, as when chips are to be sorted into speed categories based on the highest clock rate for which each will function properly, or it can be, and often is, simply a qualitative determination of whether something works or not. Assessing quality is obviously important in applications for which avoiding failure is critical. Perhaps less obvious, but no less important, assessing quality can reduce costs. For example, it is costly to sell bad units and have to refund or replace them, and it is costly to complete the fabrication of a unit that could have been discarded due to defects early in the fabrication process.

While the concept of testing is useful in a wide range of applications, the discussion in this chapter is limited to issues related to testing of microelectronic devices, and in particular, advanced electronic packages, such as MCMs.

Testing of advanced electronic packages, like testing of other complex electronic systems, begins with informal critiques of a design concept, ends with verifying repairs to deployed units, and covers numerous intermediate stages. Figure 12.1 outlines the testing stages for MCMs, one type of advanced electronic package.

[Figure 12.1: stages grouped under "design testing stages" and "functional testing stages", listed in order: informal design critiques; functional verification; thermal analysis; design rule checking; electronic rules checking; layout vs. schematic checking; timing analysis; layout simulation; component testing (substrates and dice); staged testing; full functional testing; parametric testing; field testing of field replaceable units.]

Figure 12.1 Stages in the testing of an MCM and their approximate order of application. Feedback paths, where problems found in one stage necessitate revisions in a previous stage, can also occur.

12.1.1 Testing of Highly Integrated Packages: General Considerations

Testing, and the related area of dependability, are well-known and important topics in the field of computing. Issues of dependability and testability are somewhat more acute for highly integrated packages (such as MCMs) than for traditional printed circuit boards due to a general heuristic ("rule of thumb") principle:

Heuristic 1: As component density increases, the individual components tend to become harder to test and fix. This heuristic holds because components get smaller and more concealed by other components and packaging. Fortunately, this is offset by the next heuristic principle.

Heuristic 2: As component density increases, elementary parts become cheaper and more efficiently used. The tendency toward more efficient use of elementary components holds because of a decreased need for components assigned to interfacing, broadly defined to include packaging, bonds, connections, I/O drivers, etc.

The elementary parts to which heuristic 2 refers are classified into four categories:

1. Electronic parts, such as transistors, resistors, capacitors, etc.

2. Electrical nets, which connect other parts together. They share important properties with the other categories of elementary parts, such as finite dependability, nonzero cost, and performance of important duties.

3. Electrical bonds, such as the short wires (wirebonds) that may be used to connect an IC and its pins, or an IC die and an MCM substrate. Bonds also share important properties with other kinds of parts, such as electrical nets, and even perform similar functions, yet differ from nets from the standpoint of fabrication, testing, and reliability.

4. Physical parts, such as pins, physical package parts, etc.

Integration increases component density and, at the same time, reduces the number of elementary parts. For example, integrating the functions that were previously performed by two chips into one chip eliminates the need for some of the interfacing electronics, which, in turn, reduces the number of required nets, electronic parts, and bonds. Having one package instead of two also reduces the number of physical package components like pins and ceramic or plastic parts. Placing two chips on an MCM substrate (a lesser degree of integration than having one new chip with the functionality of the previous two) also reduces the total number of elementary parts such as pins, bonds, and plastic or ceramic parts.

Heuristic 1 suggests that increased integration tends to lead to problems with dependability and testability, and hence, to higher costs. Counteracting this tendency is heuristic 2, which suggests that increased integration tends to lead to improvements in dependability, testability, and cost.

As the technology and experience in support of a given level of integration improve, the balance shifts in favor of heuristic 2, and the degree of integration that is most cost effective tends to increase over time.

This chapter emphasizes multichip modules (MCMs) and other advanced packages, and their testing and testability as compared with functionally equivalent single chip integrated circuits (ICs) on printed wiring boards (PWBs), which is the traditional genre of electronic integration. The heuristic principles are useful because they provide basic concepts that give broad guidance and structure for understanding this area.

12.1.2 Test Issues for Multichip Modules

Testing is currently a serious bottleneck in MCM design, manufacture, and deployment. Testing has always played a major role in electronic systems, yet there are unique characteristics of MCMs that lend a distinctive character to the testing problem (see Fig. 12.2).

As Fig. 12.2 indicates, nets on an MCM are less accessible for probing than is desirable. This is because nets are small and pass through the substrate, rather than large and over the surface as in the case of PWBs. Nevertheless, the accessibility of nets for testing in an MCM is greater than the accessibility of nets in a single chip (or wafer) because a test pad can be built for any given net in an MCM, providing an externally accessible point for probing that net. This is much more difficult with a chip where, as a rule, a net can be made accessible for probing only if an entire pin is connected to that net. Yet probe points are important for electrical testing. For example, during the MCM manufacturing process, it is useful to perform tests on individual die that have just been mounted (see the section on staged testing) and such tests require access to the nets that connect to them.

[Figure 12.2: along an axis of increasing degree of integration: printed circuit boards (all nets can be probed), multichip modules (net testability can be designed), wafer scale integration (paths ending in pins are testable). The same axis is labeled with decreasing availability of probe points and decreasing repairability.]

Figure 12.2 Increased integration correlates with reduced availability of probe points and reduced repairability.


As device complexity increases, it becomes difficult to perform a full functional test, as the number of test vectors required becomes astronomical. This has led to the need to increase the testability of internal circuits. The boundary scan method, BIST (Built In Self Test), adding test points on an MCM substrate exterior (i.e., test pads), and pinning out all internal I/O to test pads are some of the ways of increasing testability [12.1]. MCM testing is broadly divided into two categories: methods based on software simulations and methods applied directly to the devices themselves. Simulation-based test methods help ensure the functionality of the design and its compliance with specifications before manufacturing. Direct test methods perform functional testing on the MCM during and after fabrication.

12.1.3 Testability and Dependability Considerations and Their Interaction

The connection between testability and dependability is that improving dependability tends to reduce the effort and expense needed for testing, and improving testability tends to reduce (but not eliminate) the importance of dependability. Since testing of advanced electronic packages is often challenging, dependability is an important consideration from a testing perspective: the testing burden needs to be controlled, to some degree, by controlling dependability.

While the output of a manufacturing process cannot, in general, be guaranteed to work, different manufacturing lines can and do produce artifacts of widely varying dependabilities. The dependability of an engineered artifact is determined both by the quality of the manufacturing process and by intrinsic properties of the artifact being produced. An important intrinsic property influencing dependability is the complexity of the artifact. High complexity tends to cause lower dependability, and vice versa. Since the complexity of advanced electronic packages is so high, achieving adequate dependability is an important problem. Therefore, the following section reviews dependability from a testing perspective.

12.1.4 Dependability in MCM-Based Systems from a Testing Perspective

Like all electronic systems, MCM-based systems can be viewed at different levels. At the lowest level is analog circuitry at the circuit level [12.2]. The abstraction hierarchy proceeds upward to the system level (see Fig. 12.3). Dependability problems can occur due to faults in the building blocks of any level in the hierarchy, leading to errors and failures of the overall system.

[Figure 12.3: levels from top to bottom: system level, card level, multichip module level, chip level, register-transfer level, gate level, circuit level.]

Figure 12.3 A hierarchy of abstraction levels in an MCM-based system.


A dependable system requires dependability of the building blocks and their interconnections in each level of the hierarchy. For the circuit, gate, and register transfer levels, the issues for MCM-based systems are similar in many ways to those for other integrated circuit based electronics. However, a significant difference exists: for MCMs, the least replaceable unit (LRU) is now an entire MCM, which is more complex and, therefore, more expensive than the least replaceable unit on a printed wiring board.

When the LRU is an MCM, dependability and testing of its components prior to mounting them, and staged testing and reliability at intermediate stages of the assembly process, become more important. Staged testing refers to verifying that components and interactions among components meet standards at intermediate stages during the assembly of an MCM or other system. Reworkability refers to the ease with which a bad component, bad connection, or other defect found during staged testing can be fixed or replaced during the assembly process.

12.1.4.1 Dependability Versus Testing

It is impossible, or nearly so, to repair a faulty chip. This makes it more important than it otherwise might be for chips to work dependably. Chip dependability is even more important when the chip is mounted in an MCM, for two reasons. First, bad chips that are mounted in an MCM are difficult and expensive to replace in comparison to their replacement on ordinary circuit boards. Second, just one bad chip of the several contained in the MCM will usually make the whole MCM bad, and the probability that any one of the several chips is bad is much higher than the probability that a given chip is bad (see Eqs. 12.1 and 12.2 in Sec. 12.7). Compounding the problem, chips are hard to test before they are mounted in an MCM. This problem is of sufficient magnitude to make testing of unmounted chips (bare die) a critical issue in making MCMs economically viable (called the "known good die" problem, see [12.3] and Sec. 12.7).
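The effect of chip count on module dependability can be illustrated numerically. The sketch below is not a reproduction of Eqs. 12.1 and 12.2, which appear in Sec. 12.7; it assumes only that chip defects are statistically independent, and the function names and the 95% per-chip figure are hypothetical values chosen for illustration.

```python
from math import prod


def module_good_probability(chip_good_probabilities):
    """Probability that every chip on the module is good.

    Assumes chip defects are independent (an assumption of this sketch).
    """
    return prod(chip_good_probabilities)


def module_bad_probability(chip_good_probabilities):
    """Probability that at least one chip on the module is bad."""
    return 1.0 - module_good_probability(chip_good_probabilities)


if __name__ == "__main__":
    p_chip_good = 0.95  # hypothetical probability that an untested die is good
    for chip_count in (1, 4, 10):
        p_bad = module_bad_probability([p_chip_good] * chip_count)
        print(f"{chip_count:2d} chips: P(module bad) = {p_bad:.3f}")
```

Under these assumptions, the probability that the module contains at least one bad chip grows quickly with the number of chips, which is why the quality of bare die matters so much for MCMs.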

MCM dependability and testing needs are also impacted by fabrication, operating environment, and maintainability factors. In particular, fabrication factors include the dependabilities and testabilities of the component chips, the bonds which provide electrical connections between chip and wiring, the substrate or board and its wiring, and the bonds which provide electrical contact between the MCM and its pins. Other fabrication related factors include the interconnection technology (e.g., optical vs. electrical), the type of bonding (e.g., flip chip, TAB, or wire bonding), the type of substrate (e.g., MCM-D or MCM Deposited [12.4], MCM-D/C or MCM-thin film copper polyimide Deposited on Ceramic, MCM-C or MCM-Cofired ceramic, and MCM-L or MCM-Laminate substrate), and the type of substrate (e.g., hermetic vs. nonhermetic).

The impact of operating environment is similar in many ways to its effects on printed wiring board dependability, in that many of the same environmental factors are issues in both cases. Such environmental factors include heat and heat cycling, humidity, shock, vibration, and cosmic rays. However, specifics often differ so that existing knowledge of how environmental factors influence printed wiring board dependability must be augmented with results applicable to MCMs.

Maintainability factors include testability, reworkability, and repairability. Rework is important when testing uncovers a defective component of a partially or completely fabricated MCM. For MCMs, rework is a much more difficult and higher technology process than for printed wiring boards. MCM rework ranges from technically feasible for TAB (Tape Automated Bonding) and flip chip bonding technologies, and for the thin film copper polyimide deposited on ceramic and cofired ceramic packaging technologies, to technically more difficult (for wire bonding) or currently uneconomical (for laminate substrates) [12.1].

From the standpoint of repairing failed systems, replacing a failed chip can be done when it is mounted on a fully manufactured and deployed printed wiring board, but is much more difficult with a fully manufactured and deployed MCM.

12.1.5 Fault Tolerance

Considerable progress remains to be made in fault tolerant architectures for MCMs. This is partly because MCM technology, in its present state, is often too expensive for the substantial extra circuitry required for some forms of fault tolerance to be financially feasible. Yet, other forms of fault tolerant design do not require significantly more silicon real estate. The perspective that might profitably be taken is one of optimizing the trade-off between the expense of adding in fault tolerance, and the expense of lowered dependabilities and increased needs for testing of nonfault-tolerant architectures.

The basic idea in fault-tolerant design is to use redundancy to counteract the tendency of individual faults to cause improper functioning of the unit. Previous work on fault tolerance in multichip modules is reported by Carey [12.5] who discusses redundant interconnections, by Pearson and Malek [12.6] who discuss redundancy within individual chips on a specialized MCM design, and by Yamamoto [12.7] who discusses redundant refrigeration units for increased reliability of cryogenic MCMs. More recent work suggests that the great increases in yield achievable by adding redundant chips to an MCM design can be cost effective [12.8]. These various approaches to MCM fault tolerant design are described next.

One approach to maximizing the probability that a chip will work once it is mounted is to include redundant circuitry on the chip that can take over the function of faulty circuitry if and when other circuitry on the chip becomes faulty. Hence, die (unpackaged chips) used for placement in an MCM may have their own built-in fault tolerance. This approach to fault tolerance is efficient in terms of the increase in size it implies in the MCM, since an incremental increase in the size of a die leads to a relatively small increase in the area of the MCM substrate that is required to hold the slightly larger die. However, such redundant designs are highly specific to the particular chip. In summary, on-chip redundancy to enhance yield [12.9] is particularly applicable when chips must be reliable, but are hard to acquire in tested form, as is often true for bare die intended for use in MCMs. An MCM design utilizing this approach is proposed by Pearson and Malek [12.6].

Fault tolerance can also be built into the MCM substrate, in the form of redundant interconnection paths. If the substrate is found to have an open path, for example, there might be another functionally identical path that could be used instead. Actual MCMs have been fabricated implementing this capability [12.5]. This approach need not lead to increased MCM area at all, since, if less than 100% of the substrate's interconnect capacity is needed for a nonfault-tolerant design, the remaining capacity could be used for holding redundant interconnections. In the event that capacity exists for only some interconnections to be duplicated, duplication of longer ones should be preferred since the probability of a fault in a path increases with the length of the path [12.5].

This redundant routing approach has been shown to enhance MCM yields significantly [12.5]. Since the dependability of nets in the MCM substrate decreases as net length increases, Carey [12.5] duplicated long paths in preference to short ones. Since designs will often have some unused routing capacity, why not use what routing capacity is still available for fault-tolerant redundancy?

Redundant conductors have been used in MCMs, not only for routing through the MCM substrate, but also for wire bonds. Redundant wire bonds are described by Hagge and Wagner [12.10]. A large substrate was designed in four quadrants, so that the yield for a relatively smaller quadrant was higher than for a large substrate containing all four sections on one substrate. However, connecting the four quadrants must be done dependably in order for the connected quadrants to compete with the large single substrate design. Connections were done with double wire bonds for increased dependability over single wire bonds. This redundant bond concept could be investigated for use with die-to-substrate connections as well. A potential disadvantage is that double bonds may require larger bond pads. However, bonds would require little or no additional substrate area.

The more chips there are in an MCM design, the more risk there is of lowered yield. However, a design with more chips may actually have a higher yield than one with fewer, if the extra chips are there for the express purpose of providing redundancy, the increment in chip number is modest, and an appropriate staged testing technique is employed. Indeed, Kim and Lombardi [12.8] found that very high yields are possible, and provided analytical results establishing this.

The MCMs of the future may be liquid nitrogen cooled for speed, and eventually, to support superconductivity and its varied benefits. The refrigeration system on which such MCMs depend must be reliable. This motivated a dual refrigeration unit design in the MCM system built by Yamamoto [12.7]. If one refrigerator breaks down, the low required operating temperatures can still be maintained by the other refrigerator.

Finally, MCM fabrication lines must provide reliable control of the manufacturing equipment. An uncontrolled shutdown can have serious negative effects on the facility. When computers are used for control, redundancy should be built into the fabrication line control system to prevent the destructive effects of unanticipated shutdowns due to computer crashes. Such crashes will tend to occur occasionally due to software bugs, because software of significant complexity is almost impossible to produce without bugs.
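Returning to the redundant routing idea discussed earlier in this section, the preference for duplicating long paths can be illustrated with a small calculation. This is a minimal sketch under stated assumptions, not the analysis of [12.5]: it assumes defects occur independently at a constant rate per unit length, that a duplicated net fails only if both copies fail, and the net lengths and defect density are hypothetical.

```python
import math


def path_fault_probability(length_mm, defects_per_mm):
    """Probability that a path of the given length contains at least one defect.

    Assumes a constant, independent defect rate per unit length
    (an assumption of this sketch).
    """
    return 1.0 - math.exp(-defects_per_mm * length_mm)


def substrate_yield(lengths_mm, duplicated, defects_per_mm):
    """Probability that every net has at least one working path.

    `duplicated` is the set of net indices that have a redundant copy.
    """
    result = 1.0
    for index, length in enumerate(lengths_mm):
        p = path_fault_probability(length, defects_per_mm)
        # A duplicated net is lost only if both copies are faulty.
        result *= (1.0 - p * p) if index in duplicated else (1.0 - p)
    return result


if __name__ == "__main__":
    net_lengths = [5.0, 20.0, 80.0]  # hypothetical net lengths in mm
    density = 0.002                  # hypothetical defects per mm
    for label, chosen in (("none", set()), ("shortest", {0}), ("longest", {2})):
        value = substrate_yield(net_lengths, chosen, density)
        print(f"duplicate {label:8s}: yield = {value:.4f}")
```

In this model, duplicating a net with fault probability p multiplies the yield by 1 + p, so a single duplicate raises the yield most when it is spent on the longest net. The sketch ignores the extra routing capacity that a longer duplicate consumes.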

12.2 TESTING: GENERAL CONCEPTS

The following are some important basic definitions:

Fault detection: the action of determining that there is a defect present.

Fault location: the action of determining where a defect is located.

Fault detection coverage: the proportion of defects that a fault detection method can discover.

Fault location coverage: the proportion of faults which can be successfully located. Successful location does not necessarily mean finding the exact location. Usually it means finding a subunit (e.g., chip, board, or other component) which contains the fault, and hence, needs to be replaced.

Destructive testing: any testing method which causes units to fail in order to measure how well they resist failure.

Non-destructive testing: any method of testing which does not intend to cause units to fail.

Defects may occur during the manufacture of any system. In IC manufacturing, defects may occur during any of the various physical, chemical, and thermal processes involved. A defect may occur in the original silicon wafer, by oxidation or diffusion, or during photolithography, metallization, or packaging. Not all manufacturing defects affect circuit operation, and it may not be feasible, or even particularly desirable, to test for such faults. Thus, only those defects which do affect circuit operation are discussed.

12.2.1 Fault Models

Fault analysis can be made independent of the technology by modeling physical faults as logical faults whose effects approximate the effects of common actual faults. Fault models are used to specify well-defined representations of faulty circuits that can then be simulated. Fault models can also be used to assist in generating test patterns [12.11]. A good fault model has the following properties [12.11]:

1. The level of abstraction of the fault model should match the level of abstraction at which it is to be used (Fig. 12.3 exemplifies different levels of abstraction).

2. The computational complexity (amount of computation required to make deductions) of algorithms that use the fault model should be low enough that results can be achieved in a reasonable amount of time.

3. The great majority of actual faults are represented accurately by the fault model.

Typical faults in VLSI circuits are stuck-at faults, opens, and shorts. The ability of a set of test patterns to reveal faults in the circuit is measured by fault coverage. One hundred percent (100%) fault coverage in complex VLSI circuits is usually impractical, since this would require astronomical amounts of testing. In practice, a trade-off exists between the fault coverage and the amount of testing effort expended.

Since, for complex circuits, it is not reasonably possible to apply a large enough set of tests to achieve full fault coverage, a subset of all possible tests must be chosen. A good choice of such a subset provides better fault coverage than a poorer subset of the same size. Various algorithms have been proposed for choosing good tests for various kinds of ICs. The D-algorithm, PODEM (Path Oriented DEcision Making) algorithms [12.12], the FAN algorithm [12.13], the CONT algorithm [12.14], and the subscripted D-algorithm [12.15] are for combinational circuits. Test generation for sequential circuits is more complex than for combinational circuits because they contain memory elements, and they also need to be initialized. Early algorithms for test generation for sequential circuits used iterative combinational circuits to represent them, and employed modified combinational test algorithms [12.16-12.18]. Test patterns for memory devices can be generated by checkerboard algorithms, such as the Static Pattern Sensitive Fault algorithm, etc. [12.19].

No test pattern generation algorithm can ever fully solve the VLSI testing problem because the problem is NP-complete, and thus, is unsolvable in a reasonable time for large examples [12.20]. Partitioning the circuit into modules and testing each module independently is one way to reduce the problem size. However, partitioning is not always a workable approach. As an example, it is nontrivial to test a circuit consisting of a cascade of two devices, from tests for the constituent devices. Another approach is to include circuitry in the design whose purpose it is to facilitate testing of the device. Design for testability methods include BIST (Built In Self Test) and boundary scan, both of which are described later. Now, some well-known fault models are reviewed.

12.2.1.1 Stuck-at Fault Models

Suppose any line in a circuit under test could always have the same logical value (0 or 1) due to a fault. This is a relatively simple fault model, termed the stuck-at fault model. A line that is stuck at a logical value of 1 because of a fault is called stuck-at-1, and a line that is stuck at a logical value of 0 because of a fault is called stuck-at-0. To make test generation computationally tractable, a simpler version of the stuck-at fault model called the single stuck-at fault model assumes that only one line in a circuit is faulty. This is often a reasonable assumption because a faulty circuit often does have just one fault. The single stuck-at fault model is more computationally tractable because there are many fewer faults to consider under this model than under a more complex model (the multiple stuck-at fault model) which allows for more than one fault to be present at once. Consider, as an example, a circuit with k lines. Each line can be either properly working, stuck-at-1, or stuck-at-0, leading to the necessity to consider $3^k - 1$ distinct fault conditions (plus 1 nonfault condition). On the other hand, for the same circuit under the single stuck-at model, each of the k lines can be either working, stuck-at-1, or stuck-at-0, but if one of the lines is stuck, all the others are assumed to be working. This leads to the necessity to consider $2k$ distinct fault conditions, two (stuck-at-1 and stuck-at-0) for each line. Luckily, single fault tests have reasonably high fault detection coverage of multiple faults as well [12.21].

The basic procedure in stuck-at fault testing is to set up the inputs to the circuit so that the line under test will have the opposite logical value from the logical value at which it is hypothesized to be stuck, and further, so that the effect of the line being stuck at the wrong value causes an incorrect logical value downstream at an output line, allowing the faulty circuit operation to be observed. The process of setting the inputs so that the line under test is set to the opposite value is called sensitizing the fault. It might be pointed out that, if a stuck-at fault cannot lead to an observable error in the output, then the circuit is tolerant of that fault and, for many purposes, the fault does not matter.

As an example, consider the circuit shown in Fig. 12.4. A stuck-at fault on input X cannot be detected at the output, as can be seen by tracing logical values through the circuit. For this circuit, the output is determined by input Y. Therefore, a stuck-at fault on line Y can be detected at the output.

[Figure 12.4: circuit diagram with inputs X and Y and output Z.]

Figure 12.4 A stuck-at fault on input Y is detectable at the output, but a stuck-at fault on input X is undetectable. (After [12.22], p. 11.)
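The single stuck-at procedure described above can be sketched as a small fault simulation. The circuit below, Z = (X AND Y) OR Y, is a hypothetical stand-in chosen because its output depends only on Y; it is not claimed to be the circuit of Fig. 12.4. The line name N (the AND gate output) and all function names are assumptions of this sketch.

```python
from itertools import product

LINES = ("X", "Y", "N", "Z")  # N is the hypothetical internal AND output


def simulate(x, y, fault=None):
    """Evaluate Z = (X AND Y) OR Y, optionally with one stuck-at fault.

    `fault` is None or a pair (line_name, stuck_value).
    """
    def line_value(name, value):
        # A faulty line ignores its driven value and holds the stuck value.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    x = line_value("X", x)
    y = line_value("Y", y)
    n = line_value("N", x & y)
    return line_value("Z", n | y)


def detecting_vectors(fault):
    """Input vectors (x, y) whose output differs between good and faulty circuit."""
    return [(x, y) for x, y in product((0, 1), repeat=2)
            if simulate(x, y) != simulate(x, y, fault)]


SINGLE_FAULTS = [(line, value) for line in LINES for value in (0, 1)]

if __name__ == "__main__":
    k = len(LINES)
    print("single stuck-at faults:", len(SINGLE_FAULTS), "(2k)")
    print("multiple stuck-at fault conditions:", 3 ** k - 1, "(3^k - 1)")
    for fault in SINGLE_FAULTS:
        vectors = detecting_vectors(fault)
        print(fault, "detected by", vectors if vectors else "no vector")
```

In this stand-in circuit, neither stuck-at fault on X changes the output for any input vector, while a stuck-at fault on Y is exposed by any vector that drives Y to the opposite value. Exhaustive simulation like this is only practical for tiny circuits.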

12.2.1.2 Bridging Fault Models

Short circuits in VLSI are called bridging faults because of their cause, which is usually improperly present conducting "bridges" between physically adjacent lines. Because the small size of modern circuit components makes lines very close together, bridging faults are fairly common. Bridging fault models typically assume that the effect of a short is to create a logical AND or a logical OR between the lines that are shorted together. An AND results when the circuit characteristics require both inputs to be high for the shorted lines to be forced high. An OR results when the circuit characteristics allow the lines to be forced high if the input to either line is high. Usually, the resistance of a bridge is assumed zero, although this assumption may not actually hold in practice [12.22]. Bridging fault modeling is more complicated when the resistance of the short is to be accounted for. High resistance shorts may result in degraded noise resistance or other degradations in circuit performance without affecting logical levels [12.22]. Sometimes, bridging faults can convert a combinational circuit into a sequential one, leading to oscillations or other sequential behaviors. Stuck-at testing covers many, but not all, bridging faults [12.23].

To illustrate a case where all stuck-at faults can be detected by a set of test vectors but a bridging fault would be missed, consider the circuit of Fig. 12.5. The test vectors 0110, 1001, 0111, and 1110 applied to inputs A, B, C, and D (a test vector describes the value applied to each input) will detect all stuck-at faults. However, since all the test vectors apply the same value to inputs B and C, a bridging fault between B and C will not be detected.

[Figure 12.5: circuit diagram with inputs A, B, C, and D and output Z.]

Figure 12.5 A circuit used to illustrate the effects of a bridging fault (see text).
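The reason the four test vectors miss the bridge between B and C does not depend on the internal structure of the circuit, and can be checked directly. The sketch below models a zero-resistance bridge as a wired AND or wired OR of two inputs, as described above; the function names are hypothetical. A bridge can only be exposed by a vector on which it changes at least one line value, which is a necessary but not sufficient condition for detection at the output.

```python
TEST_VECTORS = ["0110", "1001", "0111", "1110"]  # bit order: A, B, C, D


def apply_bridge(vector, i, j, kind):
    """Return the input values actually seen by the circuit when inputs
    i and j are shorted, modeled as a wired AND or wired OR."""
    bits = [int(ch) for ch in vector]
    merged = (bits[i] & bits[j]) if kind == "AND" else (bits[i] | bits[j])
    bits[i] = bits[j] = merged
    return "".join(str(b) for b in bits)


def vectors_exposing_bridge(vectors, i, j, kind):
    """Vectors on which the bridge changes at least one input value."""
    return [v for v in vectors if apply_bridge(v, i, j, kind) != v]


if __name__ == "__main__":
    B, C = 1, 2
    for kind in ("AND", "OR"):
        exposed = vectors_exposing_bridge(TEST_VECTORS, B, C, kind)
        print(f"B-C bridge modeled as {kind}: exposing vectors = {exposed}")
```

Because B and C carry equal values in every vector, both bridge models leave the applied inputs unchanged, so the output cannot differ from that of the fault-free circuit. Adding any vector with B different from C would at least sensitize the fault.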

12.2.1.3 Open Fault Models

The major VLSI defect types are shorts and opens. Usually, opens are assumed to have infinite resistance. Leakage current can be modeled with a resistance [12.22]. Opens can be modeled with a resistance and a capacitance connected in parallel.

In NMOS circuits, open faults may be modeled as stuck-at faults, but opens in CMOS circuits cannot, and, in fact, such circuits will often have sequential behavior [12.22].

12.2.1.4 Delay Fault Models

A delay fault causes signals to propagate more slowly than they should. Detection may occur when this delay is great enough that signal propagation cannot keep up with the clock rate [12.24]. Two fault models that account for delay faults are the single-gate delay fault model and the path-oriented delay fault model.

Single-gate delay fault models attempt to account for the effects of individual slow gates. Path-oriented delay fault models attempt to account for the cumulative delay in a path through a circuit. Gate-level models often work better for large circuits because the large number of paths that can be present can make path-oriented approaches impractical [12.25].
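The difference between the two delay fault models can be sketched with a small calculation. This is an illustrative simplification, not a method from [12.24] or [12.25]: the gate delays, the per-gate limit, and the clock period are hypothetical, and setup time and clock skew are ignored.

```python
def slow_gates(gate_delays_ns, gate_limit_ns):
    """Single-gate view: indices of gates whose own delay exceeds a limit."""
    return [i for i, delay in enumerate(gate_delays_ns) if delay > gate_limit_ns]


def path_delay_fault(gate_delays_ns, clock_period_ns):
    """Path-oriented view: True if the cumulative delay along the path
    exceeds the clock period."""
    return sum(gate_delays_ns) > clock_period_ns


if __name__ == "__main__":
    clock_period = 4.0  # hypothetical clock period in ns
    gate_limit = 2.0    # hypothetical per-gate delay limit in ns
    nominal = [1.0, 1.2, 0.8]
    degraded = [1.4, 1.6, 1.2]  # every gate slightly slow, none grossly slow
    for name, delays in (("nominal", nominal), ("degraded", degraded)):
        print(name,
              "slow gates:", slow_gates(delays, gate_limit),
              "path fault:", path_delay_fault(delays, clock_period))
```

With these numbers, the degraded path fails the path-oriented check even though no single gate exceeds its own limit, which is the kind of cumulative effect a single-gate model can miss. The cost of the path-oriented view is that the number of paths to examine can grow very large.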

12.2.2 Fault Collapsing

A circuit with P lines can have as many as $3^P - 1$ possible multiple stuck-at faults alone. It is difficult and time consuming to test for a large number of possible faults and, in practical terms, impossible for a circuit of significant size. By "collapsing" equivalent faults into a single fault for which to test, the total number of faults to test for can be decreased. Faults that are equivalent can be collapsed [12.12]. Faults are equivalent if they have the same effects on the outputs and, therefore, cannot be distinguished from each other by examining the outputs. Therefore, a test vector that detects some fault will also detect any equivalent fault. As a simple example, consider a NAND gate with inputs A and B and output Z. Under the stuck-at fault model, each of A, B, and Z may be working, stuck-at-0, or stuck-at-1, implying $3^3 - 1 = 27 - 1 = 26$ possible multiple stuck-at faults (considering a single stuck-at fault to be one variety of a multiple stuck-at fault). But note that, if either input is stuck-at-0, the output Z will have the value 1. Therefore, input A stuck-at-0, input B stuck-at-0, and output Z stuck-at-1 are equivalent, in addition to some multiple stuck-at faults, such as A stuck-at-0 and B stuck-at-0, etc.

No fault detection coverage is lost by collapsing equivalent faults (assuming that only the outputs are accessible). However, it might be desirable to collapse some faults that are not equivalent, saving on testing time at the expense of some loss in coverage. For example, assume the presence of two faults, f1 and f2. If any test for f1 will also detect f2, but a test for f2 does not necessarily detect f1, then f1 dominates f2. Occasionally, this term is used oppositely so that f2 would be said to dominate f1 [12.12]. As an example, consider the NAND gate with input A stuck-at-1. The fault is detectable at the output only by setting A to 0 and B to 1. The output Z should be 1, but the fault makes it 0. Note that the same test detects Z stuck-at-0. Another test for Z stuck-at-0 would be to set B to 0, but this test will not detect A stuck-at-1. Therefore, a stuck-at-1 fault on A dominates a stuck-at-0 fault on Z because every test (of which there is only one) that detects a stuck-at-1 fault on A also detects a stuck-at-0 fault on Z.

Fault equivalence and dominance both guide the "collapsing" of various different faults into one fault so that testing for that one fault also detects the others.Fault collapsing is a useful idea because it reduces the total number of faults thatmust be explicitly tested for in order to obtain a given fault coverage.
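The NAND gate example can be checked by brute force. The sketch below is a minimal illustration restricted to single stuck-at faults; the function names are hypothetical, and "dominates" follows the convention used in the text (every test for the first fault also detects the second, but not the other way around).

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def faulty_nand(a, b, fault):
    """NAND output with one line forced; fault is (line, stuck_value)."""
    line, value = fault
    if line == "A":
        a = value
    if line == "B":
        b = value
    z = nand(a, b)
    if line == "Z":
        z = value
    return z

FAULTS = [(line, v) for line in "ABZ" for v in (0, 1)]
VECTORS = list(product((0, 1), repeat=2))

def detecting_tests(fault):
    """Input vectors (A, B) on which the faulty output differs from the good one."""
    return frozenset(v for v in VECTORS if faulty_nand(*v, fault) != nand(*v))

def equivalence_classes():
    """Group faults that are detected by exactly the same set of tests."""
    classes = {}
    for fault in FAULTS:
        classes.setdefault(detecting_tests(fault), []).append(fault)
    return list(classes.values())

def dominates(f1, f2):
    """True if every test for f1 also detects f2, but not vice versa."""
    return detecting_tests(f1) < detecting_tests(f2)

for group in equivalence_classes():
    print(group, sorted(detecting_tests(group[0])))
print(dominates(("A", 1), ("Z", 0)))
```

For a single-output gate, two faults with the same set of detecting tests produce the same faulty function, so grouping by detecting tests is enough to find the equivalent faults here. The six single faults collapse to four classes, and testing for A stuck-at-1 and B stuck-at-1 also covers Z stuck-at-0.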

12.3 TESTING OF MEMORY CHIPS

Testing of memory chips is a well-defined testing task that, in some respects, serves to exemplify testing of conventional chips. Here are some kinds of faults that can cause failure in the storage cells (faults could also appear in other parts of the memory, such as the address decoder).

Stuck-at fault (SAF)
Transition fault (TF)
Coupling fault (CF)
Neighborhood pattern sensitive fault (NPSF)

In a stuck-at fault, the logic value of a cell is forced by a physical defect to always be zero (stuck-at-0) or one (stuck-at-1). A transition fault is close to a stuck-at fault. A transition fault is present if a memory cell (or a line) will not change value either from 0 to 1 or from 1 to 0. If it won't transition from 0 to 1, it is called an up transition fault, and if it won't transition from 1 to 0, it is called a down transition fault. If a cell is in the state from which it will not transition after power is applied, it acts like a stuck-at fault. Otherwise, it can have one transition, after which it remains stuck. A coupling fault is present if the state of one cell affects the state of another cell. If k cells together can affect the state of some other cell, the coupling fault is called a k-coupling fault. One kind of k-coupling fault is the neighborhood pattern sensitive fault. If a cell's state is influenced by any particular configuration of values, or by changes in values, of neighboring cells, a neighborhood pattern sensitive fault is present.

The following sections discuss some basic tests that have been used to detect memory faults.

12.3.1 The Zero-One Test

This test consists of writing 0s and 1s to the memory. The algorithm is shown in Fig. 12.6. The algorithm is easy to implement, but has low fault coverage. However, this test will detect stuck-at faults if the address decoder is working properly.



write 0 in all cells;
read all cells;
write 1 in all cells;
read all cells;

Figure 12.6 Zero-One algorithm.

12.3.2 The Checkerboard Test

In the checkerboard test, the cells in memory are written with alternating values so that each cell is surrounded on four sides by cells whose value is different. The algorithm for the checkerboard test is shown in Fig. 12.7. The checkerboard test detects stuck-at faults, as well as such coupling faults as shorts between adjacent cells, if the address decoder is working properly.

write 1 in all cells in group 1 and 0 in all cells in group 2;
read all cells;
write 0 in all cells in group 1 and 1 in all cells in group 2;
read all cells;

Figure 12.7 Checkerboard test algorithm.
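The zero-one and checkerboard algorithms can be tried against a simulated memory. The sketch below is only an illustration: the `Memory` class and its fault model (a stuck cell always reads its forced value; a short makes a write to either cell land in both) are simplifying assumptions, and "group 1" is taken to be the cells whose row and column indices sum to an even number.

```python
class Memory:
    """Toy 2-D memory with optional injected faults (hypothetical model)."""

    def __init__(self, rows, cols, stuck=None, shorts=None):
        self.addresses = [(r, c) for r in range(rows) for c in range(cols)]
        self.cells = {a: 0 for a in self.addresses}
        self.stuck = dict(stuck or {})      # {address: forced value}
        self.partner = {}                   # shorted cell pairs
        for a, b in (shorts or []):
            self.partner[a] = b
            self.partner[b] = a

    def write(self, addr, value):
        self.cells[addr] = value
        if addr in self.partner:            # a short drags the neighbor along
            self.cells[self.partner[addr]] = value

    def read(self, addr):
        return self.stuck.get(addr, self.cells[addr])

def apply_pattern(mem, pattern):
    """Write pattern(addr) to every cell, then read all cells back."""
    for a in mem.addresses:
        mem.write(a, pattern(a))
    return [a for a in mem.addresses if mem.read(a) != pattern(a)]

def zero_one_test(mem):
    return apply_pattern(mem, lambda a: 0) + apply_pattern(mem, lambda a: 1)

def checkerboard_test(mem):
    in_group1 = lambda a: (a[0] + a[1]) % 2 == 0
    first = apply_pattern(mem, lambda a: 1 if in_group1(a) else 0)
    second = apply_pattern(mem, lambda a: 0 if in_group1(a) else 1)
    return first + second

stuck_cell = {(1, 1): 0}
short = [((0, 0), (0, 1))]
print(zero_one_test(Memory(4, 4, stuck=stuck_cell)))
print(zero_one_test(Memory(4, 4, shorts=short)))
print(checkerboard_test(Memory(4, 4, shorts=short)))
```

Under this fault model the zero-one test misses the short because both shorted cells are always written with the same value, while the checkerboard pattern forces them to differ.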

12.3.3 The Walking 1/0 Test

In the walking 1/0 test, the memory is written with all 0s (or 1s) except for a "base" cell which contains the opposite logic value. This base cell is "walked" or stepped through the memory. All cells are read for each step. The GALPAT (GALloping PATtern) test is like the walking 1/0 test except that, in GALPAT, after each read, the base cell is also read. Since the base cell is also read, address faults and coupling faults can be located. This test is done first with a background of 0s and a base cell value of 1, and then with a background of 1s and a base cell value of 0.

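A minimal sketch of the walking 1/0 and GALPAT procedures follows. The one-dimensional `CoupledMemory` class and its coupling fault (writing a 1 into an aggressor cell forces a victim cell to 1) are hypothetical, and as a simplification the test rewrites a corrupted cell to the background value once the failure has been recorded.

```python
class CoupledMemory:
    """Toy 1-D memory with an optional coupling fault (hypothetical model)."""

    def __init__(self, size, coupling=None):
        self.cells = [0] * size
        self.coupling = dict(coupling or {})   # {aggressor: victim}

    def write(self, addr, value):
        self.cells[addr] = value
        victim = self.coupling.get(addr)
        if victim is not None and value == 1:  # writing 1 disturbs the victim
            self.cells[victim] = 1

    def read(self, addr):
        return self.cells[addr]

def walking_test(mem, background, galpat=False):
    """Return (base, failing_address) pairs found while walking the base cell."""
    size = len(mem.cells)
    base_value = 1 - background
    failures = []
    for addr in range(size):
        mem.write(addr, background)
    for base in range(size):
        mem.write(base, base_value)
        for addr in range(size):
            if addr != base and mem.read(addr) != background:
                failures.append((base, addr))
                mem.write(addr, background)    # simplification: repair and go on
            # Walking reads the base cell once; GALPAT rereads it after every read.
            if (galpat or addr == base) and mem.read(base) != base_value:
                failures.append((base, base))
        mem.write(base, background)
    return failures

faulty = lambda: CoupledMemory(8, coupling={2: 5})
print(walking_test(faulty(), background=0))
print(walking_test(faulty(), background=1))
```

Each reported pair names the base cell position at which the failure appeared and the cell that read back wrong, which is what allows a coupling fault to be located. With this particular fault model only the background of 0s exposes the fault, which illustrates why the test is run with both backgrounds.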
12.4 DESIGN FOR TESTABILITY

Design for Testability (DFT) attempts to facilitate testing of circuits by incorporating features in the design for the purpose of making verification of the circuit easier. Generally, the strategy is to make points in the circuit controllable and observable. Here is a more specific, albeit still informal, characterization of testability: "A circuit is 'testable' if a set of test patterns can be generated, evaluated, and applied in such a way as to satisfy pre-defined levels of performance, defined in terms of fault-detection, fault-location, and test application criteria, within a pre-defined cost budget and timescale" [12.26].

Factors that affect testability include difficulty of test generation, difficulty of fault coverage estimation, the number of test vectors required, the time needed to apply a particular test, and the cost of test equipment. The more complex the circuit, the lower its testability tends to be because, as was discussed previously, observability and controllability decrease. Here, observability is defined as the ease with which the state of a test point in question can be determined by observing other locations (usually outputs), and controllability is defined as the ease with which a test point in question can be caused to have the value 0 or 1 by controlling circuit inputs [12.27]. There are various methods of design for testability. Some of these methods are reviewed in the following sections.

12.4.1 Scan Design

Scan design uses extra shift registers in the circuit to shift in test input data to points within the circuit and to shift out values inside the circuit. The shift registers provide access to internal points in a circuit. Test vectors may be applied using these points as inputs and responses to tests may be taken using these points as outputs.

The shift register may consist of D flip-flops (i.e., latches) that are used as storage elements in the circuit. They are connected using extra hardware into a "scan chain" so that, in the test mode, test vectors can be shifted in serially, and so that the internal state of the circuit, once latched into the latches in parallel, can be serially shifted back out so that the state can be observed from outside (see Fig. 12.8).

Figure 12.8 Generalized sequential circuit with multiplexed scan design. (Diagram labels: Inputs, Outputs, Combinational logic, Scan input, Scan select, Clock, Scan output.)



Thus, the latches themselves can be tested, the outputs of the latches can be set independently of their inputs, and the inputs to the latches can be observed.

12.4.1.1 Scan Path and Multiplexed Scan Design Technique

A multiplexer is connected to each latch, and an extra control line, the scan select, is used to set the circuit for scan (test) mode. When the scan select line is off, the multiplexers connect the lines from the combinational logic to the latches so that the circuit works normally. When the scan select line is on, the latches are connected together to form a serial in, serial out (SISO) shift register. The test vector can now be input by shifting it in serially. The test output can be read by shifting it serially out of the scan output, that is, the last latch's output.

Here is a summary of the method:

1. Put the circuit into scan mode by inputting a 1 on the scan select line.
2. Test the scan circuitry itself by shifting in a vector of 1s, and then a vector of 0s, to check that none of the latches have stuck-at faults.
3. Shift a test vector in.
4. Put the circuit in normal mode by inputting a 0 on the scan select line. Apply the primary inputs needed for that test vector, and check the outputs.
5. Clock the latches so that they capture their inputs, which are the circuit's internal responses to the test.
6. Put the circuit into scan mode and shift out the captured responses. For efficiency, clock in the next test vector as the responses to the previous one are clocked out. Check the responses for correctness.
7. Apply more test sequences by looping back to step 4.

Scan design has some disadvantages. These include:

a. Additional circuitry is needed for the scan latches and multiplexers.
b. Extra pins are needed for test vector input and output, and for setting the circuit to scan mode or normal mode.
c. The circuit operation is slower than it would otherwise be, because of the extra logic (e.g., multiplexers) which signals must traverse.
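The numbered scan procedure above can be sketched in software. Everything in this sketch is hypothetical: the `ScanCircuit` class, the two-latch `example_logic`, and the helper names are illustrative stand-ins, not a model of any particular chip. The point is only to show the shift-in, apply, capture, shift-out cycle, with the next vector shifted in while the previous responses are shifted out.

```python
class ScanCircuit:
    """Toy sequential circuit whose latches form a single scan chain."""

    def __init__(self, logic, n_latches):
        self.logic = logic              # (inputs, state) -> (outputs, next_state)
        self.latches = [0] * n_latches

    def shift(self, bit_in):
        """One clock in scan mode: shift one bit in, return the bit shifted out."""
        bit_out = self.latches[-1]
        self.latches = [bit_in] + self.latches[:-1]
        return bit_out

    def scan(self, vector):
        """Shift a whole vector in and return the previous latch contents."""
        shifted_out = [self.shift(b) for b in reversed(vector)]
        return list(reversed(shifted_out))

    def outputs(self, inputs):
        """Normal mode: primary outputs for the given inputs and current state."""
        return self.logic(inputs, tuple(self.latches))[0]

    def capture(self, inputs):
        """Normal mode clock: the latches capture the logic's internal responses."""
        self.latches = list(self.logic(inputs, tuple(self.latches))[1])

def chain_ok(circuit):
    """Step 2: shift in all 1s, then all 0s, and check what comes back out."""
    n = len(circuit.latches)
    circuit.scan([1] * n)
    ones = circuit.scan([0] * n)
    zeros = circuit.scan([0] * n)
    return ones == [1] * n and zeros == [0] * n

def run_scan_tests(circuit, tests):
    """Steps 3 to 7 for a list of (state_vector, primary_inputs) tests."""
    results, pending = [], None
    for state_vector, inputs in tests:
        captured = circuit.scan(state_vector)   # shift in; previous response comes out
        if pending is not None:
            results.append((pending, captured))
        pending = circuit.outputs(inputs)       # normal mode: check primary outputs
        circuit.capture(inputs)                 # clock the latches
    if pending is not None:
        results.append((pending, circuit.scan([0] * len(circuit.latches))))
    return results

def example_logic(inputs, state):
    """Hypothetical combinational block: one input x, two state bits."""
    (x,) = inputs
    s0, s1 = state
    return (s0 ^ s1,), (x & s1, x | s0)

circuit = ScanCircuit(example_logic, n_latches=2)
print(chain_ok(circuit))
print(run_scan_tests(circuit, [((1, 0), (1,)), ((1, 1), (0,))]))
```

Each result pairs the primary outputs observed in normal mode with the latch contents shifted out afterward; a tester would compare both against the values expected from a fault-free circuit.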

12.4.1.2 Level-Sensitive Scan Design (LSSD)

In level-sensitive scan design, state changes in the circuit are caused by clock values being high, rather than by transitions in clock values (edges). Level sensitivity can be a useful design criterion because it reduces the possibility that analog properties, such as rise and fall times and propagation delays, can lead to races or hazards. Another positive characteristic of level sensitive design is that steady-state response does not depend on the order of changes to input values [12.26]. The basic storage element used in circuits that adhere to LSSD is shown in Fig. 12.9.

The clock values (note there are three of them) determine whether the storage element is used as a normal circuit component or for test purposes. To form a scan chain, the double latch storage elements are connected into a chain configuration, whereby the L2 output of one element feeds into the L1 input of the next element.


Figure 12.9 Schematic diagram of an LSSD storage element. (Diagram labels: Data input, Clock 1, Scan data input, Scan clock, Latch 1, System data output, Clock 2, Latch 2, Scan data output.)

This chain configuration is activated only during test mode and allows clocking in of a series of values to set the values of the elements in the chain. The scan chain configuration is illustrated in Fig. 12.10.

Figure 12.10 Sequential circuit with level sensitive scan design. (Diagram labels: Inputs, Outputs, Combinational logic, L1 and L2 latches, Scan input, Scan output, Clock 1, Scan clock, Clock 2.)

For proper operation of a level sensitive circuit, certain constraints must be placed on the clocks [12.28], including:

1. Two storage elements may be adjacent in the chain only if their scan-related clocks (Scan Clock and Clock 2 in Fig. 12.10) are different, in order to avoid race conditions.
2. The output of a storage element may enable a clock signal only if the clock driving that element is not derived from the clock signal it is activating [12.28].

12.4.1.3 Random Access Scan

In random access scan, storage elements in the circuit can be addressed individually for reading and writing [12.26]. This is in contrast to other scan design approaches, such as level-sensitive scan design and scan path design, described previously, in which the test values of the storage elements must be read in sequentially and iteratively passed down the shift register, formed by the chain of storage elements, until the register is full. In random access scan design, storage elements are augmented with addressing, scan mode read, and scan mode write capability (see Fig. 12.11).

Figure 12.11 Sequential circuit with random access scan design. (Diagram labels: Inputs, Outputs, Combinational logic, Addressable storage elements, Address decoder, Scan input, Scan output, Scan address, Clock, Scan clock.)

An address decoder selects a storage element which is then readable or writable via the scan input and output lines. A disadvantage of random access scan design is the extra logic required to implement the random access scan capabilities. Another disadvantage is the need for additional primary input lines, for example, the address lines for choosing which storage element to access [12.28].



12.4.1.4 Partial Scan

Fully implemented scan design requires substantial extra chip area (about 30%) for additional circuitry [12.29]. If, however, only some of the storage elements in the circuit are given scan capability, the extra area overhead can be reduced somewhat. Where full scan design involves connecting all latches into a shift register, called the "scan chain," in partial scan, some are excluded from the chain [12.29]. Partial scan test vectors are shorter than those that would be needed for a full scan design, since there are fewer latches to be manipulated. Test sequences tend to be shorter as well: because the test vectors are shorter, there are fewer of them. However, in partial scan, some storage elements in the circuit cannot be read or written via the scan circuitry, and the importance of test access to a latch depends on its role in the circuit, so a particular partial scan design must make an intelligent choice of which storage elements should be in the scan path.

Partial scan, compared to full scan, leads to reduced area and faster circuit operation. The speed-up in circuit operation occurs because the storage elements that are in critical paths may be left out of the scan path so as not to slow down those paths.

12.4.2 Built-In Self-Test

BIST (built-in self-test) is a class of design-for-testability methods involving hardware support within the circuit for generating tests, analyzing test results, and controlling test application for that circuit [12.30]. The purpose is to facilitate testing and maintenance. By building test capability into the hardware, the speed and efficiency of testing can be enhanced. BIST techniques have costs, as well as benefits, however. In particular, the extra circuitry for implementing the BIST capability increases the chip area needed, leading to decreased yield and decreased reliability of the resulting chips. On the other hand, BIST can reduce testing-related costs.

Test vectors may be either stored in read-only memory (ROM) or generated as needed. Storing them in ROM requires large amounts of ROM and may be undesirable for this reason. However, it does potentially provide high fault coverage and advantages in special cases [12.30].

Two ways to generate test vectors, pseudorandom and exhaustive testing, are illustrated in the following sections. Pseudorandom testing picks test vectors without an obvious pattern. Exhaustive testing leads to better fault coverage, but is more time consuming.

12.4.2.1 Pseudorandom Test Generation

A linear feedback shift register (LFSR) can generate apparently random test vectors. An LFSR is typically made of D flip-flops and XOR gates. Each flip-flop feeds into either the next flip-flop, an XOR gate, or both, and each flip-flop takes as its input the output of either the previous flip-flop or of an XOR gate. The overall form of the circuit is a ring of flip-flops and XOR gates with some connections into the XOR gates from across the ring because XOR gates have more than one input. If there is no external input to the circuit, it is called an autonomous linear feedback shift register (ALFSR) and the output is simply the values of the flip-flops (see exercise 9 at the end of the chapter). The pattern generated by an LFSR is determined by the mathematics of LFSR theory (see [12.30] for a brief description and [12.31] for a detailed treatment), and LFSRs can generate test vectors that are pseudorandom (or exhaustive).
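As an illustration, the following sketch simulates a 4-bit autonomous LFSR. The tap positions (flip-flops 3 and 4 feeding one XOR gate whose output enters flip-flop 1) are an assumption chosen because they give a maximal-length sequence; they are not taken from the chapter's figures.

```python
def lfsr_states(seed, taps, count):
    """Yield `count` successive states of an autonomous LFSR.

    seed  -- initial flip-flop values (q1, q2, ..., qn)
    taps  -- 1-based flip-flop positions XORed together to form the feedback
    """
    state = list(seed)
    for _ in range(count):
        yield tuple(state)
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        # Each flip-flop takes the previous flip-flop's value;
        # the first one takes the XOR feedback.
        state = [feedback] + state[:-1]

vectors = list(lfsr_states(seed=(1, 0, 0, 0), taps=(3, 4), count=15))
for v in vectors:
    print(v)
print(len(set(vectors)))
```

With these taps the register steps through all 15 nonzero 4-bit patterns before repeating, so it can serve as a nearly exhaustive generator as well as a pseudorandom one. The all-zero state never appears, because an autonomous LFSR started at zero would stay there.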

12.4.2.2 Pseudoexhaustive Testing

Testing exhaustively requires, given a combinational circuit with $n$ inputs, providing $2^n$ test vectors of $n$ bits each (in other words, every possible input combination). Pseudoexhaustive testing means testing comprehensively, but taking advantage of circuit properties to do this with fewer than $2^n$ input vectors.

If the circuit is such that no output is affected by all $n$ inputs, it is termed a partial dependent circuit and any given output line can be comprehensively tested with fewer than $2^n$ input vectors. The exact number depends on how many inputs affect that output line. If $k$ inputs affect it, then $2^k$ vectors will suffice, comprising every possible combination of values for the inputs that affect that output, with the values for the other input lines being irrelevant (to testing that output line). Each output line may be tested in this way. Thus, if the circuit has 20 inputs and 20 outputs, but each output relies on exactly 10 of the inputs, $2^{10}$ tests for each of the 20 outputs implies that $20 \times 2^{10}$, or approximately 20,000, tests can be comprehensive, compared to $2^{20}$, or approximately 1,000,000, tests for an exhaustive testing sequence, which would be no more comprehensive.

Other pseudoexhaustive techniques can improve on this even more. For example, if there are two input lines which never affect the same output line, they can always be given the same value with no decrement in the comprehensiveness of the test sequence. More generally, test vectors for testing one output line can also be used for other output lines, reducing the number of additional test vectors that must be generated for those other output lines. An approach to doing this is described in [12.32].

As a concrete example, Fig. 12.12 illustrates a partial dependent circuit. The circuit shown has an output f which is determined by inputs w and x, and an output g which is determined by inputs x and y. Neither output is affected by both w and y, so nothing is lost by connecting w and y together so that they both always have the same value. With that done, only four vectors, instead of $2^3 = 8$, provide an exhaustive test sequence.

When a circuit is not partial dependent (that is, some output depends on all inputs), the circuit is termed complete dependent. In this case, pseudoexhaustive testing may be done by a technique involving partitioning the circuit [12.33]. This method is more complex.

Figure 12.12 Example of a partial dependent circuit (after [12.33], p. 543). (Diagram labels: input w with test sequence 1100, input x with test sequence 1010, input y with test sequence 1100, output g.)
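The counting argument and the three-input example can be checked with a short sketch. The helper `covers_all` is a hypothetical name; it only checks that a set of test vectors presents every combination of values to the inputs an output depends on, and it needs no knowledge of the actual logic functions.

```python
from itertools import product

def covers_all(vectors, names, dependent):
    """True if `vectors` exercise every value combination of `dependent` inputs."""
    idx = [names.index(n) for n in dependent]
    seen = {tuple(v[i] for i in idx) for v in vectors}
    return len(seen) == 2 ** len(dependent)

# Test counts for 20 inputs and 20 outputs, each output depending on 10 inputs.
pseudoexhaustive = 20 * 2 ** 10
exhaustive = 2 ** 20
print(pseudoexhaustive, exhaustive)

# Three-input example: one output depends on (w, x), the other on (x, y).
# Inputs w and y never affect the same output, so they share one value.
names = ("w", "x", "y")
tied = [(w, x, w) for w, x in product((0, 1), repeat=2)]
print(tied)
print(covers_all(tied, names, ("w", "x")), covers_all(tied, names, ("x", "y")))
```

The four tied vectors give each output all four combinations of the two inputs it depends on, which is why they are as comprehensive as the eight vectors of a fully exhaustive test.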



12.4.2.3 Output Response Analysis

Consider a circuit with one output line. Checking for faults means checking the response sequence of the circuit to a sequence of tests. One possibility is to have a fault dictionary consisting of the sequence of correct outputs to the tests. However, this is impractical for a complex circuit due to the large amount of data that would need to be stored. One way to address this problem is to compact the response sequence so that it takes less memory to store. The compacted form of an output response pattern is called its signature. This concept is known as response compression [12.34]. Since there are fewer bits in the signature than in the actual output sequence, there are fewer possible signatures than there are actual potential outputs. This results in a problem known as aliasing. In aliasing, the signature of a faulty circuit is the same as the signature of the correct circuit. The faulty output signature is then called an alias. Aliasing leads to a loss of fault coverage. One approach to using compaction is "signature analysis," described next.

12.4.2.4 Signature Analysis

Signature analysis has been a commonly used compaction technique in BIST. An LFSR (linear feedback shift register) may be used to read in an output response and output its signature, a shorter pattern determined by the test output response pattern.

Since the signature is determined by the test output pattern, if a fault results in a different test output pattern, then the fault is likely (but not certain) to have a different signature. If a fault has a different test output pattern, but its signature is the same as that of the proper test output, aliasing is said to have occurred. Aliasing reduces test coverage. Figure 12.13 depicts an LFSR with an input for the test response pattern and contents which form the signature.

Many circuits have multiple output lines, and for these, the way an LFSR is used for signature generation must be changed. One way is to feed the different output lines into different points in the LFSR simultaneously (see Fig. 12.14). An alternative approach uses a multiplexer to feed the value of each output line, in turn, into a one-input LFSR, a process which must be followed for each test input vector.

Figure 12.13 A linear feedback shift register based signature analyzer (DFF = D flip-flop). (After [12.34], p. 26.) (Diagram labels: Output response, Clock, four DFFs whose outputs Q1-Q4 form the LFSR contents, Out.)

Figure 12.14 A linear feedback shift register based signature analyzer for a circuit with four output lines (Z1-Z4). (After [12.34], p. 27.) (Diagram labels: Z1-Z4, DFF, Clock, Q1-Q4.)
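Compaction and aliasing can be demonstrated with a small serial signature register. The 4-bit width, the tap positions (3 and 4), and the 8-bit "good" response are hypothetical choices for illustration only; the register XORs each incoming response bit into the LFSR feedback.

```python
from itertools import product

def signature(bits, taps=(3, 4), width=4):
    """Compact a serial response stream into a `width`-bit signature."""
    state = [0] * width
    for b in bits:
        feedback = b
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return tuple(state)

good = (1, 0, 1, 1, 0, 0, 1, 0)          # hypothetical fault-free response
good_sig = signature(good)

# Every wrong 8-bit response that nevertheless yields the good signature.
aliases = [r for r in product((0, 1), repeat=len(good))
           if r != good and signature(r) == good_sig]

# Responses that differ from the good one in exactly one bit position.
single_bit_errors = [tuple(b ^ (i == j) for j, b in enumerate(good))
                     for i in range(len(good))]

print(good_sig)
print(len(aliases), "of", 2 ** len(good) - 1, "faulty responses alias")
print(all(signature(r) != good_sig for r in single_bit_errors))
```

In this sketch 15 of the 255 possible wrong responses share the good signature, roughly one in $2^4$, which is the sense in which a different output pattern is "likely (but not certain)" to have a different signature. Single-bit errors are always caught by this particular register.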

12.4.2.5 BIST Test Structures I: Built-In Logic Block Observation (BILBO)

BILBO has features of scan path, level-sensitive scan design, and signature analysis. A BILBO register containing three D flip-flops (latches, labeled DFF), one for each input, appears in Fig. 12.15. Z1, Z2, and Z3 are the parallel inputs to the flip-flops and Q1, Q2, and Q3 are the parallel outputs from the flip-flops. Control is provided through lines B1 and B2. If B1 = 1 and B2 = 1, the BILBO register operates in the function (nontest) mode. If B1 = 0 and B2 = 0, the BILBO register operates as a linear shift register and a sequence of bits can be shifted in from Sin to serve, for example, as a scan string. If B1 = 0 and B2 = 1, the BILBO register is in the reset mode and its flip-flops are reset to 0. If B1 = 1 and B2 = 0, the BILBO register is in the signature analysis mode and the MUX is set to select Sout as the input to Sin, forming a linear feedback shift register (LFSR) with external inputs Z1, Z2, and Z3. The reader is referred to Fig. 12.16 and Problem 12.7 at the end of the chapter.

The BILBO approach relies on the suitability of pseudorandom inputs for testing combinational logic. Therefore, when the BILBO control inputs cause it to operate in the signature analysis mode, that is, to be an LFSR, the pseudorandom patterns it produces can be used as test vectors. For example, Fig. 12.17 shows a circuit with two combinational blocks, testable with two BILBO registers.

In Fig. 12.17, the first BILBO is set via input vector pn to generate pseudorandom test vectors for the combinational block into which it feeds. The second BILBO is set via input vector sa for signature analysis purposes. The first BILBO is, therefore, used to apply a sequence of test patterns, after which the second BILBO is used to store the resulting outputs of the combinational block, followed by scanning out of those outputs (the signature). When combinational block 1 has been tested, block 2 can be tested similarly by simply reversing the roles of the BILBO registers.

BILBO has an interesting advantage over many other types of scan discipline. Using BILBO, if N test vectors are applied before scanning out the results, the number of scan outs for those N vectors is 1, compared with the N scan outs required by other scan disciplines. However, BILBO requires more circuitry than LSSD, and it has relatively more signal delays because of the gates connected to the flip-flop inputs [12.28].

Figure 12.15 BILBO register. (Diagram labels: Z1-Z3, DFF, MUX, B1, B2, Sin, Sout, Q1-Q3.)

Figure 12.16 Three BILBO register modes. (Panels (a), (b), and (c); diagram labels: Z1-Z3, DFF, Q1-Q3, Sout.)

Figure 12.17 BILBO registers configured to test combinational logic blocks. (Diagram labels: BILBO, Combinational network 1, BILBO, Combinational network 2, PN Gen, SA Reg.)
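The four BILBO modes can be summarized in a behavioral sketch. This is not a gate-level model of the register in Fig. 12.15: the `Bilbo` class is hypothetical, and the feedback taps (flip-flops 2 and 3) are an assumption chosen to give a maximal-length 3-bit sequence. Only the mapping from (B1, B2) to the four modes follows the text.

```python
class Bilbo:
    """Behavioral sketch of a 3-bit BILBO register (feedback taps assumed)."""

    def __init__(self, width=3, taps=(2, 3)):
        self.q = [0] * width
        self.taps = taps                      # assumption, not from the text

    def clock(self, b1, b2, z=None, s_in=0):
        z = list(z) if z is not None else [0] * len(self.q)
        if (b1, b2) == (1, 1):                # function mode: parallel load
            self.q = z
        elif (b1, b2) == (0, 0):              # linear shift register (scan)
            self.q = [s_in] + self.q[:-1]
        elif (b1, b2) == (0, 1):              # reset mode
            self.q = [0] * len(self.q)
        else:                                 # (1, 0): signature analysis / LFSR
            feedback = 0
            for t in self.taps:
                feedback ^= self.q[t - 1]
            shifted = [feedback] + self.q[:-1]
            self.q = [zi ^ si for zi, si in zip(z, shifted)]
        return tuple(self.q)

reg = Bilbo()
print(reg.clock(0, 0, s_in=1))                # shift in a single 1 as a seed
patterns = [reg.clock(1, 0) for _ in range(7)]
print(patterns)                               # LFSR mode with the Z inputs held at 0
print(reg.clock(1, 0, z=[1, 0, 1]))           # compact one response into the state
print(reg.clock(0, 1))                        # reset
```

With the Z inputs held at 0 the register behaves as an autonomous LFSR and cycles through seven nonzero patterns, which is the pattern-generator role described above. With live Z inputs the same mode compacts responses, so N responses accumulate into one signature that needs only a single scan out.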

12.4.2.6 Circular Self-Test Path (CSTP)

CSTP connects some (or all) storage cells in the circuit together, forming one large circular register [12.35]. A cell of the circular register may contain one D flip-flop or two arranged as a master and slave. The cells form a feedback shift register; hence, the use of the term "circular." The circular path is augmented with a gate at the input of each cell that, during the test mode, XORs the functional input from the circuit, which would be the sole input during nontest circuit operation, with the output of the preceding register in the circular path. This causes the outputs of the flip-flops, during the test mode, to change in a difficult to predict way, so that they can be used as test inputs to the circuit. When operated in the normal mode, the cells feed inputs through to the combinational blocks. When operated in the test mode, the cells feed test values into the combinational blocks. Once the test pattern has propagated through the circuitry, the response is fed into the circular register, which compacts the response into a signature. The test response is combined with its present state via the XOR gates to produce its next state and next output. The circular path can now apply the next test vector, which is its current contents. After repeating this some number of times, the register contents can be checked for correctness. Correctness might be determined by matching against the contents for a known working circuit, for example. The creators of CSTP cite the following as significant advantages of CSTP:

1. The complexity of the on-chip mode control circuitry is minimized by the fact that a full test can be done in one test session.
2. The hardware overhead is low compared to other multifunctional register test methods, like the BILBO technique, because the cells are simpler as they need only be able to load data and compact data. As a caveat, this assumes the circuit can be reset into a known state from which to begin testing.

The test pattern generated by the circular path is neither pseudorandom nor purely random, but instead is determined by the logic of the circuit. The authors defend this by analyzing the effect of this in comparison to exhaustive testing, that is, applying all possible test input vectors. They concluded that, with a testing time four times (4X) that needed for exhaustive testing, 98% of the possible test vectors would be applied, and with a testing time of 8X, 99.9+% of the possible test vectors would be applied. The problem of test pattern repetition must be dealt with because, if it occurs, then the entire preceding sequence of test vectors will also then repeat. Then longer test times will result in no improvement in coverage. The authors of this approach found that this is unlikely to occur, can be identified if it does occur, and can be avoided by changing the initial state of the circular register.
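Figures of this size are what one would expect if successive test vectors behaved like independent, uniformly random draws. That is only an assumption made here for illustration, not necessarily the analysis the CSTP authors used. Under it, the expected fraction of the $2^n$ possible vectors seen after $k \cdot 2^n$ draws is $1 - (1 - 2^{-n})^{k \cdot 2^n} \approx 1 - e^{-k}$, which the following sketch evaluates.

```python
import math

def expected_coverage(k, n_bits):
    """Expected fraction of all n-bit vectors seen after k * 2**n random draws.

    Assumes each draw is independent and uniform (an illustrative assumption).
    """
    total = 2 ** n_bits
    return 1 - (1 - 1 / total) ** (k * total)

for k in (1, 4, 8):
    print(k, round(expected_coverage(k, 16), 5), round(1 - math.exp(-k), 5))
```

For 4X and 8X the estimate comes out at about 98% and above 99.9%, in line with the numbers quoted above.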



12.5 OTHER ASPECTS OF FUNCTIONAL TESTING

The self-test methods described previously facilitate functional testing, in which an actual device is tested to ensure that its behavior conforms to specifications. This contrasts with speed testing, in which properly working circuits are sorted depending upon how fast they will run, and with destructive testing, in which circuits under test are destroyed in a process which aims to find out what the limits of the circuit are. In this section, some additional aspects of functional testing, emphasizing MCMs, are discussed.

Functional testing is important, not only for screening out defective units, but for quality control, production line problem diagnosis, and fault location within larger systems. Functional testing occurs after all design rules are satisfied, all design specifications are met during the simulation and analysis phase, and the physical design goes through part or all of the manufacturing process. In MCMs, functional testing is primarily done at the substrate level, die level, and module level.

Staged testing, in which proper functioning of each die on an MCM is checked after it is mounted but before the next die is mounted, can help catch problems early. Testing of fully assembled units verifies that the completed system works.

12.5.1 Approaches to Testing MCMs

Testing methods can be classified as built-in or external. Built-in (e.g., BIST) approaches may be preferable in some cases. However, this makes the design process more difficult since it requires extra hardware, beyond the die and their connections, on the MCM. External test methods will be preferable in many cases due to lower design and production costs.

Testing methods can alternatively be classified as concurrent or non-concurrent. In concurrent testing, the device is tested as it runs, such as by a testing program that runs using clock cycles that would otherwise go unused. In contrast, non-concurrent testing is run on a unit that is not being used. Concurrent testing makes the design task more difficult, yet can enhance dependability by automatic detection of faults when they occur, as is necessary, e.g., for fault tolerance methods requiring on-the-fly reconfiguration. Non-concurrent testing is easier and will probably have a role in MCM testing indefinitely.

Testing methods can also be classified as static or dynamic. Static testing deals with DC characteristics of devices that are not actually running. In MCMs, this can be used for testing substrates prior to die installation. MCM testing also requires dynamic testing, that is, testing while the MCM is in operation.

Still another way to classify testing methods is functional versus parametric. Functional testing involves testing to see if a device can do the things it is supposed to do, that is, perform its functions. Parametric testing is testing to see whether various parameters fall within range. For example, a parametric test might measure rise and fall times to check that they will support operation at a specified frequency.

In the following sections, staged testing, in which components of MCMs are tested, is discussed. Additionally, some ways of testing various components of MCMs and testing of entire MCMs are introduced.



12.5.2 Staged Testing

The general strategy of testing earlier in the construction of a complex circuit, rather than later, is intended to minimize wasted work, and hence, expense. Taking MCMs as an example, early detection of faults means less likelihood of mounting die on bad substrates, less likelihood of mounting bad die, less chance of sealing MCMs with bad components, less likelihood of selling bad MCMs, less chance of embedding bad MCMs in a larger system, etc. Detection of faults as early as feasible is, thus, an important part of an overall testing philosophy.

Increasing the feasibility of early testing has its own costs. In the case of MCMs, a staged approach to testing, in which each die is tested after it is installed (instead of testing the whole MCM after all the die are installed), requires test pads to be located on the substrate to facilitate test access to each die. This means using potentially valuable substrate area for the pads, a more complex substrate design, and potentially slower operation due to the capacitance and crosstalk increase caused by the extra metal in the pads and the conductance paths that lead to them.

Taking the early testing strategy further, each die might be tested prior to installation. This would not completely eliminate the need for testing it after installation, and hence, the need for test pads, because die can be damaged by the installation process, but it would avoid performing the installation process on a die that is already bad. Unfortunately, the cost of this is high because testing a die prior to installation is a difficult problem in itself. In fact, this problem has a name: the known good die (KGD) problem [12.3]. This important problem is described later in the chapter.

12.5.3 MCM Substrate Testing

MCM substrates are like miniaturized printed wiring boards in that they connect together all the component parts of the MCM, as well as serve as a platform on which to mount the parts. They should be tested for defects before mounting ICs on them, because it is relatively easy to do and because of the substantial cost of going through the rest of the fabrication process. This cost would be wasted if the substrate was bad.

12.5.3.1 Manufacturing Defects in MCM Substrates

The substrate contains nets that should be tested for opens and shorts. These nets terminate at the substrate surface in pads to which components such as die will be connected. The connections may employ wire bonds, flip chip bonding technology, or tape automated bonding (TAB). While many pads are used as connections to die, some are used to connect with the pins of the MCM. A net may be tested for opens, shorts to other nets, and high resistance opens or shorts by probing the test pads. High-frequency test signals can be applied to test for characteristics such as impedance, crosstalk, and signal propagation delays.

There are a number of approaches to testing nets which are reviewed in the following paragraphs. Each has its own advantages and disadvantages. These approaches may be classified into two broad categories: contact and non-contact methods.


12.5.3.2 Contact Testing

In contact testing, a substrate is tested by making physical contact with the pads. Resistance and capacitance measurements are performed using probes to contact the pads and locate opens, shorts, and high resistance defects in the nets. For example, a net demonstrating an unexpectedly low capacitance likely has a break in it. As another example, by moving two probes to two pads, the tester can verify that continuity exists or that no short exists, as desired.

12.5.3.2.1 Bed-of-nails testing. Bed-of-nails testing uses a probe consisting of an array of stiff wires. Each wire contacts a different pad on a device, so that all (or many) of the pads needing to be probed are contacted by a different wire at the same time. Multiplexing allows the testing device to select which wires to use for sending or receiving test signals, allowing measurements of resistance or impedance between a pair of pads or between any two sets of pads.

Suppose there are N nets on a substrate to be tested, and $p_k$ pads in the kth net. The number of tests required to certify the kth net for opens is $p_k - 1$. Therefore, the total number of tests to certify all N nets on the substrate for opens is $\sum_{k=1}^{N} (p_k - 1)$. Given an average of p pads per net, $N(p - 1)$ tests are needed to test for opens. To test for shorts, each net must be checked for infinite resistance to each other net, unless auxiliary information about the spatial layout of the MCM is available which will allow the testing procedure to skip testing nets that are spatially far apart. In the absence of such information, $N(N - 1)/2$ tests for checking shorts on the substrate are needed (provided the nets have no opens). As an example, suppose a substrate has 100 nets with an average of 5 pads per net. Then 100 x (5 - 1) = 400 tests are needed for open-circuit testing, and 100 x (100 - 1)/2 = 4950 tests are needed for short-circuit testing. The number of tests needed for short-circuit testing increases quickly with the number of nets.

As the number of tests becomes high, bed-of-nails test probes save increasing amounts of time because the probe need not be moved from place to place, as each pad is already connected to one of the probes in the bed of nails. Packages for which the test pads form a regular grid with a fixed center are better suited to bed-of-nails testers than idiosyncratic arrangements of pads because idiosyncratic arrangements require the probe head to be custom built [12.36]. Packages with small, densely packed pads are harder to use with bed-of-nails testers because the probe becomes more complex and expensive to make.

Because bed-of-nails testers are relatively complex and expensive, yet the probe need not be mechanically (therefore slowly) moved around for each separate test, bed-of-nails testing is most suited to situations requiring the testing of a large volume of circuits quickly, so that the high cost is distributed over many tested circuits [12.36].
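The two test-count formulas above, the sum of (p_k - 1) for opens and N(N - 1)/2 for shorts, can be evaluated with a few lines of Python. This is only an illustrative sketch: the function names and the small four-net substrate are hypothetical, and the shorts count assumes no layout information is available to prune distant net pairs.

```python
from typing import Sequence


def open_test_count(pads_per_net: Sequence[int]) -> int:
    """Tests needed to certify every net for opens: sum over nets of (p_k - 1)."""
    return sum(p - 1 for p in pads_per_net)


def short_test_count(num_nets: int) -> int:
    """Net-to-net checks for shorts when every pair of nets must be tested."""
    return num_nets * (num_nets - 1) // 2


if __name__ == "__main__":
    # Hypothetical substrate with four nets having 2, 3, 5, and 2 pads.
    pads = [2, 3, 5, 2]
    print("open tests: ", open_test_count(pads))        # 1 + 2 + 4 + 1 = 8
    print("short tests:", short_test_count(len(pads)))  # 4 * 3 / 2 = 6

    # The shorts count grows quadratically with the number of nets.
    for n in (10, 50, 200):
        print(n, "nets ->", short_test_count(n), "short tests")
```

The quadratic growth of the shorts count is what makes a fixed, multiplexed probe array attractive once the number of nets becomes large.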

12.5.3.2.2 Single-probe testing and two-probe testing. Nets are separated by non-conducting, dielectric material. This implies that a capacitance exists between a pair of nets or between a net and the reference plane. If a testing procedure applies an AC signal to a net, typically from 1 kHz to 10 MHz, the impedance can be measured [12.37]. This measurement can be compared with a corresponding measurement from another copy of the same device which is known to be good, or perhaps with a statistical characterization of the corresponding measurement from a number of other copies of the device. Lower than expected capacitance suggests an open circuit, while higher than expected capacitance suggests a short circuit.

To check for shorts, one measurement for each net is required. To check for opens, one measurement for each pad is required. If doubt exists as to whether the flow of current created by application of AC represents only the normal capacitance of the net or includes a high resistance short, an AC signal of a different frequency can be applied. The difference in current flow $I_1 - I_2$ that this creates will be a function of the capacitance $C$, the frequencies $F_1$ and $F_2$, and the resistance $R$. If $R$ is infinite, then $I_1/I_2 = F_1/F_2$, and any deviation from this is due to resistance (and inductance).

Single-probe testing is not as affected as bed-of-nails testing by high pad density or small pad size, but there are also some disadvantages [12.36]. One disadvantage is that, if nominal test values are derived from actual copies of the circuit, design faults will not be detected. Another disadvantage is that, if the substrate has pads on both sides, then it must be turned over during the testing process.

Two-probe testing has all the capabilities of one-probe testing and then some, at the price of a modestly more complex mechanism that can mechanically handle two probes at once. Shorts can be isolated to the two offending nets by probing both of them at once.

With single- and dual-probe testers, the probes must be mechanically moved from pad to pad. This limits the speed of testing [12.36]. To maximize speed, the total travel distance of the probes must be minimized. An optimal minimization requires solving the famous Traveling Salesman Problem, which is NP-hard.

Flying probe technologies are becoming more popular as control of impedances of lines in a substrate becomes more important due to modern high signal frequencies. Flying probe heads provide control over the impedance of the probe itself, to facilitate sensitive measurements of the nets [12.31].
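The two-frequency check described above can be illustrated numerically. The sketch below assumes a deliberately simple model that the text does not spell out: the net is a capacitance C in parallel with a leakage resistance R, inductance is ignored, and the same drive voltage is used at both frequencies. The function names and component values are hypothetical.

```python
import math


def current_magnitude(v_rms: float, freq_hz: float, c_farads: float,
                      r_ohms: float = math.inf) -> float:
    """Current magnitude for a net modeled as C in parallel with leakage R."""
    g = 0.0 if math.isinf(r_ohms) else 1.0 / r_ohms  # leakage conductance
    b = 2.0 * math.pi * freq_hz * c_farads           # capacitive susceptance
    return v_rms * math.hypot(g, b)


def ratio_deviation(f1_hz: float, f2_hz: float, c_farads: float,
                    r_ohms: float = math.inf, v_rms: float = 1.0) -> float:
    """Relative deviation of the measured I1/I2 from the ideal F1/F2."""
    i1 = current_magnitude(v_rms, f1_hz, c_farads, r_ohms)
    i2 = current_magnitude(v_rms, f2_hz, c_farads, r_ohms)
    ideal = f1_hz / f2_hz
    return (i1 / i2 - ideal) / ideal


if __name__ == "__main__":
    c = 10e-12             # assumed 10 pF net capacitance
    f1, f2 = 10e3, 100e3   # two test frequencies within the 1 kHz to 10 MHz range
    print("no leakage:   ", ratio_deviation(f1, f2, c))
    print("1 Mohm leakage:", ratio_deviation(f1, f2, c, r_ohms=1e6))
```

With infinite R the current ratio matches the frequency ratio, so the deviation is zero to within rounding; a finite leakage resistance adds a frequency-independent current component and pushes the ratio away from F1/F2, which is the signature the tester looks for.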

12.5.3.3 Non-Contact Testing

Testing using probes that make mechanical contact with pads on a circuit can damage the pads, which, in turn, can prevent good contact between the pad and a connection to it later in the manufacturing process. This is one reason why a non-contact testing method is attractive. Another reason is that, in some MCM technologies, it is desirable to test the substrate at various stages in its manufacture before pads are present. This may not be practical with mechanical testers due to the small size of the metal areas to be probed. In non-contact testing, electrical properties of nets are tested without making actual physical contact with them.

12.5.3.3.1 Electron beam testing. Electron beam testing (see for example [12.38]) works somewhat like the picture tube on a television set or computer terminal. A hot, negatively charged piece of metal is used as a source of electrons, which are directed toward a target, which is the circuit in the case of a tester or the screen in the case of a television. Magnetic deflection coils or electrostatically charged plates can move the beam back and forth and up and down in the case of a television, or in any direction required for a tester. By directing the electron beam at a particular place on a circuit, a net can be charged up. If the charge then appears on another net, or does not appear on part of the charged net, there is a short or open.


Electron beam testing is similar to single-probe testing in some ways, because the electron beam is analogous to the single probe. However, because there are no moving parts, it can operate much faster than a mechanical device. Another difference is that the electron beam is DC, whereas single-probe testers typically use AC. However, both varieties of tester rely on the capacitance of the circuit structures to hold charge, and thus, both can mistake high resistances for shorts.

A disadvantage of electron beam testing, not shared by contact methods, is the need for the circuit to be in a vacuum chamber. This can mean a delay of minutes to pump out the air in the chamber before the testing process can begin. One solution to this is to have an air lock on the vacuum chamber. The circuit is placed in the relatively small air lock, which can be evacuated much faster than the larger test chamber. After the air lock is evacuated, the circuit is moved into the test chamber proper, which has been in a vacuum all along.

Electron beam testers appear to be entering the current marketplace. Unfortunately, the price is in the million dollar range [12.39].

12.5.3.4 Wear of MCM Substrates

The substrate contains the wiring used to connect all the other components on the MCM. Improper fabrication can lead to gradual corrosion of nets, leading to failure. Once properly manufactured and found to be working, however, substrates have been tested and found to be remarkably reliable. Roy [12.2] subjected MCM-D (deposited), HDI (high-density interconnect) MCM substrates to HAST (highly accelerated stress tests) for thermal, moisture resistance, salt atmosphere, and thin film adhesion reliability characterization and found that MIL-STD-883C and JEDEC-STD-22 reliability standards were easily exceeded, with expected substrate lifetimes of over 20 years.

12.5.4 Die Testing

An MCM is populated with unpackaged chips (bare die) which are mounted on the substrate. These bare die should be good, because, if they are not, there is substantial extra cost involved in removing and replacing them. This is a problem because bare die are not widely available in tested form, since ICs are usually tested by the manufacturer only after they are mounted in a typical one-die package. There is more than one reason for this:

1. It is much easier to test a packaged chip than an unpackaged bare die.
2. Manufacturers make much of their money from the packaging, and so are not very interested in selling the unpackaged die.
3. Manufacturers prefer not to sell untested bare die because they not only risk their reputation for reliability, but also fear that the MCM manufacturer might damage die during their own testing and then blame the die supplier for supplying bad die. Such concerns are real.

ICs intended for mounting on an MCM may also be designed differently from ICs intended for standard use. Because they are so close together, the paths between them will tend to have low capacitance, meaning that the die can be designed with low power drivers. It is more difficult to test such die because their loads must have high impedance to match the drivers [12.36]. Another MCM-specific testing difficulty is that manufacturers sometimes change the chip dimensions without warning, requiring the MCM maker to reactively change their test setup on short notice.

As discussed elsewhere in this book, die yield has a major impact on MCM yield. In fact, the yield of the MCM will be significantly lower than the yield of the die it contains. Furthermore, the rework required in removing and replacing bad die is expensive. So verification of bare die before mounting is important despite the difficulties.
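The effect of die yield on module yield can be made concrete with a common first-order model, which is an assumption here rather than something stated in this section: if die failures are independent and nothing is reworked, the first-pass yield of the module is the product of the yields of the die it contains. The function name and the numbers are hypothetical.

```python
import math
from typing import Iterable


def module_yield(die_yields: Iterable[float]) -> float:
    """First-pass module yield, assuming independent die failures and no rework."""
    return math.prod(die_yields)


if __name__ == "__main__":
    # Hypothetical MCM carrying ten die, each with a 95% probability of being good.
    print(module_yield([0.95] * 10))  # 0.95 ** 10, roughly 0.60
```

Under this model even a fairly high per-die yield compounds into a much lower module yield, which is why verifying bare die before mounting pays off despite its difficulty.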

MCMs are usually intended to operate at high frequencies, and so high-frequency testing is an important part of an MCM test strategy. High-frequency testing is more difficult than standard testing due to the interference posed by the impedances in the test equipment.

12.5.4.1 Chip Carriers

A chip carrier (see Fig. 12.18) is a die package which is close in size to the die it carries. Simple in principle, it connects to densely packed perimeter bond pads on a die and runs leads to a less densely packed area array about the size of the die itself.

Figure 12.18 Ball grid array chip carrier. One side has a perimeter-mounted set of contacts used to connect to the corresponding perimeter pads on a bare die, which is "carried" or mounted on that side of the chip carrier. The other side contains a 2-D array of contacts which are physically more widely spaced, larger, and more robust than the small pads on the chip, facilitating testing and other handling operations. [Courtesy of Pacific Microelectronics Corporation.]


This less densely packed area array package provides a surface mount device (SMD) that is easily assembled into test sockets or directly onto MCM substrates. This provides easier access to the I/O ports for either testing or mounting on MCM substrates than is provided by the bare die, yet does not change the area of the device to be mounted significantly because the area array is layered over the die itself. Easier testing means die that are not yet mounted on a substrate can be tested before mounting, thus helping to address the problem of providing known good die (KGD) [12.3]. If the chip carrier with its mounted die passes the tests, the entire carrier package may be mounted as it is on an MCM substrate, with connections between the substrate and the die mediated by the area array provided by the carrier. The carrier is thus a permanent package which is acceptable for mounting on an MCM because its size is not significantly larger than the die it contains.

Problems include the fact that getting from the die to the MCM substrate then requires two connections, one from the die to the carrier and one from the carrier to the substrate, instead of just one from the die to the substrate as it would be without the carrier. This leads to some loss in reliability of the connections, and hence, lowered yield, since now there are two connections that must be made successfully instead of just one. Chip carrier philosophy and current technology are reviewed by Gilleo [12.40].
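The yield penalty of the second connection can be sketched under a simple assumption that is not taken from the text: every joint succeeds independently with the same probability. A signal path through a carrier then needs two good joints where direct attachment needs one. The function name and the numbers below are hypothetical.

```python
def interconnect_yield(joint_success: float, num_ios: int, joints_per_io: int) -> float:
    """Probability that every joint on every I/O path is good (independent joints)."""
    return joint_success ** (num_ios * joints_per_io)


if __name__ == "__main__":
    joint_success = 0.9999  # assumed per-joint success probability
    num_ios = 200           # assumed number of die I/O connections

    direct = interconnect_yield(joint_success, num_ios, joints_per_io=1)
    carrier = interconnect_yield(joint_success, num_ios, joints_per_io=2)
    print("direct attach:", direct)
    print("via carrier:  ", carrier)  # equals direct ** 2 under this model
```

In this model the carrier route always has the square of the direct-attach interconnect yield, so the loss is small when individual joints are very reliable and grows with the I/O count.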

12.5.5 Bond Testing

Seventy to eighty percent of MCM faults are interconnection faults (assuming use of known good die, which often is not actually the case) [12.41], so this kind of testing is useful, even though it does not directly target faulty die. Interconnection faults are faults in the connections between a die and the substrate. Testing for interconnection faults is the responsibility of the MCM manufacturer.

An open bond could be detected by applying a probe to the net on the MCM substrate to which the bond is supposed to be attached, and measuring the capacitance. A properly working bond will cause the measured capacitance to be the sum of the capacitance of the net, the wire bond or other bond material, and the capacitance of the input gate or output driver on the die to which it makes a connection. The capacitance measurement is facilitated by the fact that resistive current will be negligible in CMOS circuits, which are the most common kind. For ordinary die, input gate capacitances run about 2 pF, whereas output drivers have capacitances on the order of 100 pF. Die made specifically for mounting in MCMs do not need powerful output drivers, so their output capacitances can be significantly less, but such die are not generally available at the present time. Thus, open bonds that disconnect output drivers should be relatively easy to detect. However, if an output driver and an input gate are both bonded to the same net, the presence of the higher capacitance output driver could dominate the capacitance measurement, precluding a reliable conclusion about whether the bond to the input gate is open or not. With respect to a given net, though, a die with an input gate could be mounted on the substrate before a die with an output driver to be bonded to the same net, so that bonds to low capacitance input gates can be capacitance tested before high capacitance output drivers are present, if this bond testing approach is to be used. This would be a form of staged testing.
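The masking effect described above can be illustrated with a small calculation. The gate and driver capacitances are the typical values quoted in the text; the 5 pF net capacitance is a hypothetical figure, the capacitance of the bond itself is ignored, and the function names are illustrative only.

```python
from typing import Sequence

INPUT_GATE_PF = 2.0       # typical input gate capacitance quoted in the text
OUTPUT_DRIVER_PF = 100.0  # typical output driver capacitance quoted in the text


def expected_capacitance_pf(net_pf: float, bonded_loads_pf: Sequence[float]) -> float:
    """Capacitance seen at the net: the net itself plus every load with an intact bond."""
    return net_pf + sum(bonded_loads_pf)


def relative_drop_if_open(net_pf: float, loads_pf: Sequence[float], open_index: int) -> float:
    """Fractional drop in measured capacitance when the bond to one load is open."""
    good = expected_capacitance_pf(net_pf, loads_pf)
    remaining = [c for i, c in enumerate(loads_pf) if i != open_index]
    faulty = expected_capacitance_pf(net_pf, remaining)
    return (good - faulty) / good


if __name__ == "__main__":
    net_pf = 5.0  # assumed capacitance of the substrate net alone

    # Input gate bonded alone (staged testing): an open bond is a large change.
    print(relative_drop_if_open(net_pf, [INPUT_GATE_PF], open_index=0))

    # Input gate sharing the net with an output driver: the change is small.
    print(relative_drop_if_open(net_pf, [INPUT_GATE_PF, OUTPUT_DRIVER_PF], open_index=0))

    # Open bond to the output driver itself: the change is very large.
    print(relative_drop_if_open(net_pf, [INPUT_GATE_PF, OUTPUT_DRIVER_PF], open_index=1))
```

With these assumed values, an open input-gate bond removes well over a quarter of the measured capacitance when the gate is alone on the net, but only about 2% once a driver is present, which is within the range that measurement tolerance could hide.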


This form of staged testing does have its drawbacks, mainly due to the fact that some bonds of a given die will need to be created before, and others after, bonds of other die are made. Process lines are better suited to handling all of the bonds to a die in one stage before going on to another die. Yet flexible, integrated manufacturing lines should become increasingly viable in the future as automation increasingly assists manufacturing processes. A more serious drawback is that this approach will not work with flip chip processes.

Mechanical testing of a particular kind of bond, wire bonds (which are small wires going between a pad on a die and a pad on an interconnect), involves pulling on it to make sure it is physically well attached at both ends.

12.5.6 Testing Assembled MCMs

Even when all components (substrate, die, bonds, pins, etc.) are working properly, they still may not necessarily interact properly. Thus, a complete MCM or other system must be tested even if its components are known to be good.

Working parts may also become bad during the process of assembling a larger system, and so, it is useful for the system to support testing of its parts even if they were tested before assembly, and especially if they were not tested, or not completely tested. MCMs are a good example of such systems due to the difficulty of testing some of the component parts prior to assembly. Components of completed MCMs can be hard to test because it is hard to observe states of the interconnects, which are hidden within the MCM and are much smaller than interconnects on printed wiring boards.

Thus, a module must be tested after it is assembled. This requires testing of components in case they were damaged during assembly, even if they were known to be working prior to assembly. A burn-in may be provided to help detect latent defects.

Various test strategies are suited for testing assembled MCMs. The test strategy chosen will be affected by built-in testability features, if any, die quality, substrate quality, reliability requirements, fault location coverage requirements, and rework cost [12.36].

Testing exhaustively is equivalent to fully testing each module component plus verifying that the components work properly together. This provides high fault coverage. However, it is impractical for all but certain MCM designs, such as a bus-connected microprocessor and SRAM, due to the complexity of fully testing all internal logic from the MCM pins [12.36].

Building the MCM using components with built-in testability features can facilitate MCM testing significantly. Ways to incorporate testability include boundary scan, built-in self-test (BIST), and providing external access to internal I/O (e.g., through test pads, as discussed in the following section). Methods such as these facilitate fault detection coverage, fault location coverage, and faster and simpler testing.

12.5.6.1 Test Pads

The testability of an MCM can be increased by bringing out internal nets to the exterior of the package. This is done by having test pads, which are small contact points on the outside of the MCM package that are each connected to some internal net. This provides observability and controllability to internal connections not connected to any of the MCM pins. This can help test engineers to isolate internal faults. Test pads can be connected directly to each net in an MCM. This makes the MCM testing problem analogous to the printed wiring board test problem, in that all nets are accessible for probing. Unfortunately, while test pads connected to all or many nets in an MCM for testability may be feasible to manufacture, they have some drawbacks. These drawbacks include the following:

• Test pads increase capacitance and crosstalk, adversely affecting performance.
• Test pads can be hard to access for test purposes simply because of their necessarily small size (4 mils might be a typical test pad dimension) and because all such pads must be crammed into the small external area provided by the MCM package.

Each of these issues is addressed in some detail in the following sections.

12.5.6.1.1 Test pads and performance. While test pads are useful in MCM testing, they have the disadvantage of decreasing performance. Therefore, one approach to avoiding this trade-off would be to build test pads on the MCM, use them for testing, and when testing is concluded, remove the pads. A fabrication step that chemically etches away the exposed pad while leaving everything else intact would be one approach.

12.5.6.1.2 Test pad number and accessibility. More test pads means crowded, smaller, and therefore, less accessible test pads. By providing fewer test pads, the ones provided could be made larger and, therefore, more accessible. Thus, there is a trade-off between the number of test points and their accessibility. Consequently, progress on MCM testing via test pads must take one of two broad strategies:

Strategy 1: Better ways to access small pads arranged in dense arrays.
Strategy 2: Dealing with an incomplete set of test pads.

Regarding strategy 1, here are some ways to access pads:

• A small number of probes (for example, two) which can be moved from pad to pad efficiently (the moving probe approach).
• Many probes, one for each test pad, to be applied all at once. This avoids the problem of moving probes around from pad to pad, but at the price of having to build a dense, precisely arranged, expensive set of probes (bed-of-nails) that can reliably connect to their respective pads. A collapsing column approach to constructing each probe is one way to do this.
• Electron beam (E-beam) use. A high technology and nontrivial undertaking.

These methods were discussed in detail in previous sections of this chapter.

With regard to strategy 2, here are some possibilities for maximizing testing effectiveness given limited access to substrate nets. Judicious use of available pads is required. Approac
