ELEC516/10 Lecture 101
ELEC 516 VLSI System Design and Design Automation, Spring 2010
Lecture 10 – Design for Testability
Reading Assignment:
Kang – CMOS Digital Integrated Circuits: Analysis and Design, Chapter 15
Testing your prototype!!!
• Testing is time-consuming and test equipment is very expensive.
• Test cost contributes greatly to the cost of the system (20-30% of the chip cost).
• You must think about test during the design:
– Otherwise you end up with an untestable chip
– Test functionality as well as performance
• If you don’t test it, it won’t work!
Prototype Specification?
Introduction
• Testing is important, probably as important as the design process.
• Testing a chip to make sure it is fully functional is highly complex and time-consuming.
• The cost of chip debugging is much higher than that of board-level debugging, which is in turn much higher than that of system-level debugging.
• In a production environment, many chips must be tested within a short time for timely delivery to customers.
• Therefore design for testability becomes very critical.
Testing Classification
• Diagnostic test
– Used in chip/board level debugging
– Defect localization
• “go/no go” or production test
– Used in chip production
• Parametric test
– Voltage and current test, instead of logic test
– Check other parameters such as noise margin (NM), threshold voltage (Vt), delay time (tp) and temperature (T).
Chip Debugging
• Design errors or fabrication defect?
• Micro-probing the die
• E-beam
• Single-die repair
Testing is Expensive
• A VLSI tester costs several million dollars (US)
• Volume manufacturing requires a large number of testers, plus maintenance
• Often a design company cannot afford this, so a rental model is commonly used. Rent is charged by usage time.
• Tester time costs are in $/sec
• Test cost contributes 20-30% to total chip cost.
Types of Testing
Step | Error Source | Test Type
Design | Design flaws | Design verification
Prototype | Design flaws / prototype flaws | Functional test
Manufacture | Physical defects | Manufacture test
Shipping | Manufacturing test, transport |
System integration | Same | Functional test
Service | Stress, age | Diagnosis
Manufacturing defects
• During manufacturing: misalignment, dust and other particles, “stacking faults”, defects in dielectric, mask scratches, thickness variation. These cause layer-to-layer shorts, discontinuous wires (opens), and circuit sensitivities (Vth, Lchannel). Found during wafer probe of test structures.
• During packaging: defects from scratching in handling, damage during bonding, misalignment (always check the wire bonding), and other defects undetected during wafer probe. Found during test of packaged parts.
• During mounting: damage during board insertion (thermal, ESD), infant mortality (manufacturing defects that show up after a few hours of use), noise problems, susceptibility to latch-up. Found during testing/mounting on board.
• Long term: defects that appear after months or years of use (metal migration, oxide damage during manufacture, impurities). Found by the customer.
• Errors can occur at different stages in the lifetime of a chip.
Testers for volume manufacturing
• Each pin on the chip is driven/observed by a separate set of circuitry which typically can either:
– drive the pin to one data value per cycle
– or observe the value of the pin at a particular point in a clock cycle.
• Timing of input transitions and sampling is controlled by high-resolution timing generators associated with each pin.
• The device under test (DUT) is mounted on the test head.
Test Strategy
• Testing with the tester proceeds in several steps:
– Supply a set of test vectors that specify an input or output value for every pin on every cycle.
– The tester loads the program into the pin cards.
– Run the program and report any miscompares between an output value and the expected value.
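The tester loop described above can be sketched in a few lines. This is only an illustrative model, not how a real tester is programmed: the function `run_test_program`, the pin names, and the two example devices (`good_and`, `faulty_and`) are all hypothetical.

```python
# Minimal model of a tester run: force inputs, compare outputs, log miscompares.
# All names here are illustrative assumptions, not a real tester API.

def run_test_program(dut, vectors):
    """Apply each (force, expect) vector to the DUT, one vector per cycle.

    dut     : function mapping {input pin: value} -> {output pin: value}
    vectors : list of (force dict, expect dict) pairs
    Returns a list of (cycle, pin, expected, observed) miscompares.
    """
    miscompares = []
    for cycle, (force, expect) in enumerate(vectors):
        observed = dut(force)
        for pin, want in expect.items():
            got = observed[pin]
            if got != want:
                miscompares.append((cycle, pin, want, got))
    return miscompares

def good_and(pins):
    # Fault-free 2-input AND gate
    return {"z": pins["a"] & pins["b"]}

def faulty_and(pins):
    # Same gate with output z stuck-at-0
    return {"z": 0}

vectors = [
    ({"a": 0, "b": 0}, {"z": 0}),
    ({"a": 1, "b": 1}, {"z": 1}),
]

good_report = run_test_program(good_and, vectors)
bad_report = run_test_program(faulty_and, vectors)
```

The good device produces an empty report, while the stuck-at-0 device miscompares on the second vector only, which is the vector that tries to drive z to 1.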
Testers for volume manufacturing
[Figure: tester data flow. Labels: Specification, Behavioral model, Design cycle, Test patterns (I/O vectors), Memory, Force/Compare, Compare error.]
How many test vectors do we need?
• For exhaustive test: a digital circuit with 25 inputs and 50 state bits requires 2^75 cycles. Assuming 1 µs/cycle, test time > 10^9 years.
• Exhaustive test is impractical and unnecessary.
– We only need to verify that no faults are present, which may take fewer vectors.
– In fact many vectors can test the same fault.
[Figure: combinational circuit with n inputs: 2^n input vectors required to exhaustively test the circuit. Sequential circuit with n inputs and m state bits: 2^(n+m) input vectors required to exhaustively test the circuit.]
Fault Types and Models
• Testing goal: to detect faults in fabrication, design and failures due to stressful operating conditions and reliability problems
• Test process:
– Input test vectors are applied to the device under test (DUT) as stimuli
– Measured outputs are compared with the expected correct responses to determine correctness
– Difficulty: only system input and output pins are accessible
– Another difficulty: generating correct test vectors to detect all modeled faults and design errors
– Manual or automatic test pattern generation (ATPG) becomes a difficult task
Defect causes
• Physical defects:
– Defects in silicon substrate
– Photolithographic defects
– Mask contamination and scratches
– Process variations and abnormalities
– Oxide defects
• Physical defects -> electrical faults
– Shorts (bridging faults) or opens
– Transistor stuck-on, stuck-open
– Resistive shorts and opens
– Excessive change in threshold voltage and current
• Electrical faults -> logical faults
– Logical stuck-at-0 or stuck-at-1
– Slow transition (delay fault)
– AND-bridging, OR-bridging
Fault models
• Traditional models, first developed for board-level tests, assume that a node gets “stuck” at “0” or “1”, presumably by shorting to GND or Vdd.
• If the output is faulty the entire gate is “stuck”. There are also cases which correspond to a transistor stuck-on or stuck-off.
[Figure: gate implementing F = (A+B)’.] What about Fx (F with a stuck-off fault)?
Fault Models
• Most Popular – “stuck-at” model
Sa0 (output stuck at 0)
Sa1 (input stuck at 1)
• Covers almost all (other) occurring faults, such as opens and shorts
[Figure: example circuit with signals x, y, w, z and three marked defect sites 1, 2, 3.]
1, 3: x sa1
2: y sa0 or x sa0
3: z sa1
Another example
Stuck-at fault
• Single stuck-at fault models are used frequently
– Complexity of test generation is greatly reduced
– Single stuck-at fault is independent of technology, design style
– Single stuck-at tests cover a large percentage of multiple stuck-at faults
– Single stuck-at tests cover a large percentage of unmodeled physical defects
Delay fault
• Cause timing failures at target speed
• Reasons for delay faults
– Improper estimation of on-chip interconnect delay and other timing considerations
– Excessive variation in the fabrication process -> variations in circuit delay and clock skew
– Open in metal line connecting parallel transistors
– Aging effects such as hot-carrier-induced delay increase.
• Detecting delay faults is even more subtle than detecting functional faults in steady state.
Problem with stuck-at model: CMOS open fault
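• Sequential effect: needs two vectors to ensure detection
[Figure: CMOS gate with inputs x, y and output z, with an open fault.]
x | y | z
0 | x | 1
1 | 1 | 0
1 | 0 | z(n-1)
(An x in the y column is a don't-care; z(n-1) is the previous output value.)
• Other options:
– Use stuck-open or stuck-short models
– This requires fault simulation and analysis at the switch or transistor level – very expensive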
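To see why two vectors are needed, the sketch below models the truth table on this slide as a NAND gate whose pull-up path for y is open, so that for x = 1, y = 0 the output floats and keeps its previous value. Reading the gate as a NAND is an assumption drawn from the table; the function names are hypothetical.

```python
# Behavioral model of a CMOS stuck-open fault (assumed: 2-input NAND,
# pull-up transistor driven by y is open).

def nand_good(x, y, z_prev):
    # Fault-free NAND ignores the previous output value
    return 1 - (x & y)

def nand_open_pullup_y(x, y, z_prev):
    if x == 0:
        return 1        # pull-up through the x transistor still works
    if y == 1:
        return 0        # pull-down network conducts
    return z_prev       # x=1, y=0: output floats and retains its old value

def detects(sequence, z_init=1):
    """True if any vector in the sequence makes good and faulty outputs differ."""
    z_good = z_faulty = z_init
    for x, y in sequence:
        z_good = nand_good(x, y, z_good)
        z_faulty = nand_open_pullup_y(x, y, z_faulty)
        if z_good != z_faulty:
            return True
    return False

missed = detects([(0, 0), (1, 0)])   # output was already 1: fault is masked
caught = detects([(1, 1), (1, 0)])   # first vector forces z=0, second exposes it
```

The vector (1, 0) only exposes the fault when the preceding vector has left the output at 0, which is why an initialization vector must come first.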
Problem with stuck-at model: CMOS short fault
• Causes a short circuit between Vdd and GND for A = C = 0 and D = 1 (the figure shows input values 0, 0, 0, 1)
• Possible approach:
– Supply current measurement (IDDQ)
– Not applicable for gigascale integration
[Figure: CMOS gate with inputs A, B, C, D and a short fault.]
Design for Testability
[Figure: combinational function with n inputs: 2^n input vectors required to exhaustively test the circuit. Sequential function with n inputs and m state bits: 2^(n+m) input vectors required to exhaustively test the circuit.]
• Exhaustive test is impossible or impractical.
• We need to find meaningful vectors to test for possible faults.
– Not easy because of limited I/O and increased complexity
– Concept of controllability and observability
Controllability and Observability
• Controllability – a measure of how easily the controller (test engineer) can establish a specific signal value at each node by setting values at the circuit inputs
• Observability – a measure of how easily the controller (test engineer) can determine the signal value at any logic node by controlling the circuit primary inputs and observing the primary outputs
• The degree of controllability and observability (testability) can be measured with respect to whether the test vectors are generated deterministically or randomly.
Path Sensitization
• Step 1: Sensitize the circuit: find input values that produce a value on the faulty node that is different from the value forced by the fault. For our S-A-1 fault example, we want the output of the OR gate to be 0.
• Is this always possible? What would it mean if no such input values exist?
• Is the set of sensitizing input values unique?
• What’s left to do?
Error Propagation
• Step 2: Fault propagation: select a path that propagates the faulty value to an observed output (Z in our example)
• Step 3: Backtracking: find a set of input values that enables the selected path.
• Is this always possible?
• What would it mean if no such input values exist?
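The three steps can be checked by brute force on a small circuit. The original example figure is not available, so the sketch below assumes a hypothetical circuit Z = (A OR B) AND C with the OR output stuck-at-1, plus a second hypothetical circuit Z = A OR (A AND B) to show what it means when no test exists. All names are illustrative.

```python
from itertools import product

def find_tests(circuit, n_inputs, fault):
    """Return every input vector for which good and faulty outputs differ."""
    tests = []
    for vec in product((0, 1), repeat=n_inputs):
        if circuit(vec, None) != circuit(vec, fault):
            tests.append(vec)
    return tests

def or_and(vec, fault):
    # Assumed circuit: Z = (A OR B) AND C, internal node n is the OR output
    a, b, c = vec
    n = a | b
    if fault == ("n", 1):
        n = 1           # inject n stuck-at-1
    return n & c

def redundant(vec, fault):
    # Assumed circuit: Z = A OR (A AND B); the AND term is logically redundant
    a, b = vec
    m = a & b
    if fault == ("m", 0):
        m = 0           # inject m stuck-at-0
    return a | m

tests_or_and = find_tests(or_and, 3, ("n", 1))
tests_redundant = find_tests(redundant, 2, ("m", 0))
```

For the first circuit the only test is A = B = 0 (sensitize the OR output to 0) with C = 1 (enable propagation through the AND gate). For the second circuit the list is empty: the fault never changes the output, which means the faulty node is redundant logic.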
Testability
• Example of a non-testable error:
• For x = 1 we need both a and b equal to 1. Whatever the value of c, one of the three outputs is 1: problem! There are two possible propagation paths for the “1”.
• Problem: fault propagation
Controllability and Observability
Line 7 cannot be tested at the primary output. Thus this circuit is not fully testable. Reason: reconvergent fanout of line 7.
Fault test pattern generation:
– Fault sensitization – input vector that sensitizes a fault
– Fault propagation – condition that propagates the fault to the output so that it can be observed
Controllability and Observability
• Circuits with poor controllability
– Circuits with feedback
– Decoders and clock generators
• Circuits with poor observability
– Sequential circuits with long feedback loops
– Circuits with reconvergent fanouts
– Redundant nodes
– Embedded memories such as RAM, ROM, PLA
• Use self test for circuits with poor observability
Generating and Validating Test-vectors
• Automatic test-pattern generation (ATPG)
– For a given fault, determine an excitation vector (called test vector) that will propagate the error to a primary (observable) output
– Majority of available tools: combinational networks only
– Sequential ATPG available from academic research
• Fault simulation
– Determine fault coverage of proposed test-vector set
– Simulate correct network in parallel with faulty networks
• Both require adequate models of faults in CMOS ICs
ATPG Process
Fault Selection
Fault Observe Point Assessment
Fault Excitation
Vector Generation
Fault Simulation
Fault Dropping
ATG for fanout-free combinational circuits
• 2 steps
– Activate (excite) the fault from the primary inputs
  • For signal l with stuck-at-v fault, set primary input values such that signal l equals v’
  • Called the justification problem – find an assignment of PI values that results in a desired value setting on a specified signal in the circuit
– Propagate the resulting error to a primary output
  • Composite logic values (v/vf), where v and vf are the values of the same signal in N and Nf, the fault-free and faulty circuits respectively
  • Composite logic values (1/0, denoted by D) and (0/1, denoted by D’) represent errors
  • We have this logic behavior: D+D’=1, D.D’=0, D+D=D.D=D, D’+D’=D’.D’=D’, D+0=D, D’+0=D’, …
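The composite-value identities follow directly from evaluating the fault-free and faulty circuits side by side. A minimal sketch, assuming each composite value is stored as a (fault-free, faulty) pair; the names `D`, `DBAR`, `OR`, `AND`, `NOT` are chosen here for illustration.

```python
# Composite logic values as (value in fault-free circuit N, value in faulty circuit Nf)
D = (1, 0)      # 1/0: error
DBAR = (0, 1)   # 0/1: error, written D' on the slide
ONE = (1, 1)    # 1/1
ZERO = (0, 0)   # 0/0

def OR(p, q):
    # Apply OR independently in the good and faulty circuits
    return (p[0] | q[0], p[1] | q[1])

def AND(p, q):
    return (p[0] & q[0], p[1] & q[1])

def NOT(p):
    return (1 - p[0], 1 - p[1])

identities_hold = (
    OR(D, DBAR) == ONE
    and AND(D, DBAR) == ZERO
    and OR(D, D) == D and AND(D, D) == D
    and OR(DBAR, DBAR) == DBAR and AND(DBAR, DBAR) == DBAR
    and OR(D, ZERO) == D
    and OR(DBAR, ZERO) == DBAR
)
```

Because each component is evaluated independently, an error value survives a gate only when the other inputs do not force the same output in both circuits, which is exactly the propagation condition.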
Test generation for the fault l stuck-at-v in a fanout-free circuit
begin
  set all values to x
  Justify(l, v’)
  if v = 0 then Propagate(l, D)
  else Propagate(l, D’)
end
Example
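[Figure: fanout-free example circuit with signals a through j and a stuck-at-0 fault, shown twice. The second copy is annotated with the values found by test generation: 1 1, 0 x 0, D, 0 1, D D.]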
Circuits with Fanout
• Two basic goals: fault activation and error propagation
• Fanout – several ways to propagate an error to a PO
• Fundamental difficulty
– Reconvergent fanout – the resulting line justification problems are no longer independent
Example
[Figure: circuit with inputs a, b, c, d, e, signals f1 and f2, gates G1 to G6, and an s-a-1 fault.]
The only vector that can test the fault is 111x0
Another Example
[Figure: circuit with signals a, b, c, d, e, f, h, k, l, m, n, o, p, q, r, s and an s-a-1 fault.]
Fault Simulation
• Applying a set of vectors to a structural (netlist) description of a design and determining how many and which faults are detected out of the total set of available faults.
• Concurrent fault simulation
– Applies the vectors to many copies of the netlist at the same time.
– Each copy contains one or more faults.
– Each of these simulations is run concurrently with a good circuit simulation
– If a difference is observed at a legal observation point between the good circuit and any faulty circuit simulation, the fault is listed as detected
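A small sketch of the idea: simulate the good netlist and one faulty copy per single stuck-at fault, and mark a fault detected when an output differs. It runs the copies one after another rather than truly concurrently, and the two-gate netlist, signal names, and function names are all assumptions made for illustration.

```python
# Hypothetical netlist: (output, operation, input 1, input 2), in evaluation order
NETLIST = [("n1", "OR", "a", "b"), ("z", "AND", "n1", "c")]
INPUTS = ["a", "b", "c"]
OUTPUTS = ["z"]
OPS = {"AND": lambda p, q: p & q, "OR": lambda p, q: p | q}

def simulate(vector, fault=None):
    """Evaluate the netlist; fault is None or (node, stuck value)."""
    values = dict(zip(INPUTS, vector))
    if fault and fault[0] in values:
        values[fault[0]] = fault[1]          # stuck-at on a primary input
    for out, op, x, y in NETLIST:
        values[out] = OPS[op](values[x], values[y])
        if fault and fault[0] == out:
            values[out] = fault[1]           # stuck-at on a gate output
    return tuple(values[o] for o in OUTPUTS)

def fault_simulate(vectors):
    """Return (detected faults, undetected faults, fault coverage)."""
    nodes = INPUTS + [gate[0] for gate in NETLIST]
    remaining = [(node, v) for node in nodes for v in (0, 1)]
    total = len(remaining)
    detected = []
    for vec in vectors:
        good = simulate(vec)
        for fault in list(remaining):
            if simulate(vec, fault) != good:
                detected.append(fault)
                remaining.remove(fault)      # fault dropping: stop simulating it
    return detected, remaining, len(detected) / total

_, _, coverage_one = fault_simulate([(0, 0, 1)])
_, undetected, coverage_four = fault_simulate(
    [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 0)]
)
```

A single vector covers 4 of the 10 modeled faults here, while the four-vector set covers all of them, well short of the 8 vectors an exhaustive test would use.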
What can we do to increase testability?
• Increase observability:
– Add more pins (can be a problem)
– Add a small “probe” bus and selectively enable different values onto the bus
– Compress a sequence of values (for example a value of a bus over many clock cycles) into a small number of bits for later read-out
• Increase controllability
– Use multiplexers to isolate sub-modules and select sources of test data as inputs
– Provide easy setup of internal states
Test approaches
• Scan-based testing
• Built-in self test
Scan-based technique
• Minimize the use of additional I/O pins for testing
• Use scan registers with both shift and parallel load capabilities.
• Storage cells in registers act as observation points, control points or both.
• Reduce testing of a sequential circuit to that of a combinational circuit
Scan: the idea
• Two modes of operation: normal mode, and a test mode in which all registers are chained into one long shift register which can be loaded and read out serially.
[Figure: scan-based structure. Combinational logic blocks separated by registers; the registers are chained from Scan-in to Scan-out.]
Scan-based structure
Scan-path register
[Figure: scan-path register cell. Labels: scanin, scan, in, out, scanout, keep, load, clock phases 1 and 2.]
Scan-based Test – operation
[Figure: four-stage scan chain of latches with inputs In0 to In3, outputs Out0 to Out3, a Test control signal, scanin and scanout, plus waveforms for Test and clock phases 1 and 2.]
Sequence per test pattern: N-cycle scan-in, 1-cycle evaluation, N-cycle scan-out.
Testing time per test pattern increases due to shifting time in long register.
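The scan-in / evaluate / scan-out sequence can be modeled with a list acting as the shift register. This is a behavioral sketch only: the function `scan_test` and the example next-state logic are hypothetical, and the cycle count follows the N + 1 + N sequence stated on this slide.

```python
def scan_test(comb_logic, pattern):
    """Shift a pattern in, run one evaluation cycle, shift the response out.

    comb_logic : function mapping the list of register values to the next values
    pattern    : list of bits, pattern[i] is the value wanted in cell i
    Returns (response, cycles), with response[i] the value captured in cell i.
    """
    n = len(pattern)
    chain = [0] * n
    cycles = 0

    # Scan-in: one bit per clock; the bit for the last cell goes in first
    for bit in reversed(pattern):
        chain = [bit] + chain[:-1]
        cycles += 1

    # Evaluation: registers capture the combinational outputs in parallel
    chain = list(comb_logic(chain))
    cycles += 1

    # Scan-out: the last cell appears at scanout first
    response = []
    for _ in range(n):
        response.append(chain[-1])
        chain = [0] + chain[:-1]
        cycles += 1
    response.reverse()
    return response, cycles

def example_logic(s):
    # Hypothetical next-state logic for a 3-bit register
    return [s[1], s[2], s[0] ^ s[1]]

response, cycles = scan_test(example_logic, [1, 0, 1])
```

For a 3-bit chain the single pattern costs 7 clock cycles, which illustrates how shifting time dominates as the chain grows.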
Scan-path Testing
[Figure: scan-path testing of a datapath. Labels: six Reg blocks, an adder (+), a comparator (>), inputs in0 and in1, output out, scanin, scanout.]
JTAG – Boundary Scan
• For testing PCBs and multichip modules carrying multiple chips
• Shift registers are placed in each chip close to the I/O pins in order to form a chain around the board for testing the PCB
Scan path
Built-In Self Test – The idea
• Problem: the scan-based approach is very useful for testing combinational logic but can be impractical for testing memory blocks because of the number of separate test values required to get adequate fault coverage.
• Solution: use on-chip circuitry to generate test data and check the results. Can be used at every power-on to verify correct operation!
• Generate pseudo-random data for most circuits using, e.g., a linear feedback shift register (LFSR).
• For pseudo-random input data, compute some output values and compare against the expected value (“signature”) at the end of the test.
Built-in self test
• Parts of the circuit are used to test the circuit itself.
• Essential circuit modules:
– Pseudo random pattern generator (PRPG)
– Output response analyzer (ORA)
PRPG using LFSR
• LFSR – linear feedback shift register
Q0 | Q1 | Q2
1 | 0 | 0
1 | 1 | 0
1 | 1 | 1
0 | 1 | 1
1 | 0 | 1
0 | 1 | 0
0 | 0 | 1
1 | 0 | 0
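The state table above is consistent with a 3-bit shift register (Q0 -> Q1 -> Q2) whose feedback into Q0 is Q0 XOR Q2. That feedback tap is inferred from the table, since the original circuit figure is not available; the sketch below reproduces the sequence under that assumption.

```python
def lfsr_states(seed=(1, 0, 0), steps=7):
    """Generate successive (Q0, Q1, Q2) states of the assumed 3-bit LFSR."""
    q0, q1, q2 = seed
    states = [seed]
    for _ in range(steps):
        # Shift right; new Q0 is the XOR feedback of Q0 and Q2 (inferred tap)
        q0, q1, q2 = q0 ^ q2, q0, q1
        states.append((q0, q1, q2))
    return states

states = lfsr_states()
```

The register visits all 7 non-zero states before returning to the seed, which is the maximum period for 3 bits; the all-zero state is never reached from a non-zero seed.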
Signature Analysis
• To reduce chip area, data compression schemes are used so that compacted test responses are compared instead of the entire raw test data.
• Signature analysis – based on cyclic redundancy checking
• Uses polynomial division: the polynomial representation of the test output data is divided by a characteristic polynomial, and the remainder is the signature.
• The signature is then compared with the expected signature to determine whether the device is faulty or not.
• Sometimes a fault may go undetected: aliasing
ORA by LFSR
• The signature is the content of this register after the last input bit has been sampled.
• The input sequence {a_n} is represented by G(x) and the output sequence by Q(x): G(x) = Q(x)P(x) + R(x), where P(x) is the characteristic polynomial of the LFSR and R(x) is the remainder.
• For the above LFSR we have P(x) = 1 + x^2 + x^4 + x^5
• For the input sequence [11110101], G(x) = x^7 + x^6 + x^5 + x^4 + x^2 + 1 and the remainder is R(x) = x^4 + x^2, which corresponds to the register content [00101]
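The remainder in this example can be verified with polynomial division over GF(2), where subtraction is XOR. In this sketch polynomials are stored as integers with bit i holding the coefficient of x^i; the helper name `poly_divmod_gf2` is an assumption, and the code checks the arithmetic only, not the LFSR hardware.

```python
def poly_divmod_gf2(g, p):
    """Divide polynomial g by p over GF(2); return (quotient, remainder)."""
    q = 0
    deg_p = p.bit_length() - 1
    while g.bit_length() - 1 >= deg_p:
        shift = g.bit_length() - 1 - deg_p
        q ^= 1 << shift        # add x^shift to the quotient
        g ^= p << shift        # subtract (XOR) the shifted divisor
    return q, g

G = 0b11110101   # x^7 + x^6 + x^5 + x^4 + x^2 + 1
P = 0b110101     # x^5 + x^4 + x^2 + 1

Q, R = poly_divmod_gf2(G, P)

# Register content listed from the x^0 coefficient up to x^4
register = [(R >> i) & 1 for i in range(5)]
```

The division gives Q(x) = x^2 + 1 and R(x) = x^4 + x^2, and listing the remainder coefficients from x^0 to x^4 yields [0, 0, 1, 0, 1], matching the register content quoted above.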
ORA
• On-chip storage of a fault dictionary containing all test inputs with the corresponding outputs is too expensive.
• A simple alternative is to compare the outputs of two identical circuits for the same input
– Cannot detect if both circuits have the same faults
• Self-checking design – detects faults autonomously during on-line operation
– A checker circuit is inserted such that the checker generates and sends out a signal when on-line faults occur.
Built-in Logic Block Observer (BILBO)
• A form of ORA, used in each cluster of partitioned registers
• Allows monitoring of circuit operation through exclusive ORing into LFSR at multiple points, which corresponds to the signature analyzer with multiple inputs
C0 | C1 | Mode
0 | 0 | linear shift
1 | 0 | signature analysis
1 | 1 | data latch
0 | 1 | reset
BILBO Application
[Figure: BILBO application. Labels: two combinational logic blocks, BILBO-A, BILBO-B, scanIn, scanOut, In, Out.]
Memory Self-Test
[Figure: memory self-test. Labels: Memory under test, FSM, Signature analysis, Data-in, Data-out, Address and R/W control.]
Patterns:
– Writing/reading 0s, 1s
– Walking 0s, 1s
– Galloping 0s, 1s
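As an illustration of one of these patterns, the sketch below runs a walking-1s test on a toy bit-wide memory model: clear the array, then move a single 1 through every address and read back all cells each time. The slide only names the pattern, so this common formulation, the `FaultyMemory` class, and the stuck-at injection are assumptions.

```python
class FaultyMemory:
    """Toy 1-bit-wide memory; stuck_at maps address -> forced value."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}

    def write(self, addr, bit):
        self.cells[addr] = self.stuck_at.get(addr, bit)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])

def walking_ones(mem, size):
    """Return a list of (walking address, failing address) pairs."""
    failures = []
    for addr in range(size):
        mem.write(addr, 0)                 # background of 0s
    for addr in range(size):
        mem.write(addr, 1)                 # place the walking 1
        for other in range(size):
            expected = 1 if other == addr else 0
            if mem.read(other) != expected:
                failures.append((addr, other))
        mem.write(addr, 0)                 # restore the background
    return failures

good_failures = walking_ones(FaultyMemory(4), 4)
sa0_failures = walking_ones(FaultyMemory(4, stuck_at={2: 0}), 4)
```

Each step reads the whole array, so the pattern takes on the order of N^2 memory accesses, which is part of why an on-chip FSM is used to generate it.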
Current monitoring: IDDQ – The idea
• CMOS logic should draw no current when it’s not switching. So after initializing the circuit and disabling pseudo-NMOS gates, the power supply current should be zero (apart from leakage) once all signals have settled.
• Good for detecting short faults. Need to try several different circuit states to ensure all parts of the chip have been observed.
Current Monitoring IDDQ Test
• Under a bridging fault, static current is drawn from the power supply, much larger than the leakage current
• Tests for different defects
– Gate oxide short
– Channel punch through
– P-n diode leakage
– Transmission-gate defect
• IDDQ test only needs sensitization, not propagation
• It is less effective for open-drain and open-gate defects