Built-In Self-Test for Multipliers
Mary Pulukuri
Dept. of Electrical & Computer Engineering
Auburn University
M. Pulukuri 9/09 VLSI D&T Seminar
Outline of Presentation
  Overview of multiplier architectures
  History of Digital Signal Processor (DSP) architectures in FPGAs
  Overview of Virtex-4 DSP
  Prior testing R&D for multipliers
  Our approach
  Analysis methodology
  Simulation results
  Application to Virtex-4 & 5 DSPs
  Summary and conclusions
Overview of Multipliers
Array multiplier
  Final product calculated using an array of full adders and AND gates
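The array structure described on this slide can be illustrated with a small behavioral sketch. It is a simplified model, not the circuit from the slides: each row of AND gates is added into the running sum with a ripple of full adders, and the names full_adder and array_multiply are chosen here for illustration only.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def array_multiply(a, b, n=8):
    """Unsigned n x n multiply built only from AND gates and full adders."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    acc = [0] * (2 * n)                  # running sum, one bit per product column
    for j in range(n):                   # one row of the array per multiplier bit
        carry = 0
        for i in range(n):
            pp = a_bits[i] & b_bits[j]   # AND gate forms the partial-product bit
            acc[i + j], carry = full_adder(acc[i + j], pp, carry)
        acc[j + n] = carry               # row carry-out lands in the next free column
    return sum(bit << k for k, bit in enumerate(acc))
```

Calling array_multiply(13, 11) returns the same value as 13 * 11.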
Overview of Multipliers
Signed array or Baugh-Wooley multiplier
  Final product calculated using an array of full adders, AND gates, and NAND gates
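A minimal sketch of where the NAND gates come from, assuming the commonly taught "modified" Baugh-Wooley formulation: partial-product bits that involve exactly one sign bit are inverted (NAND instead of AND), and two constant 1s are added at columns n and 2n-1. The adder array itself is abstracted into integer addition here, and the function name is hypothetical.

```python
def baugh_wooley_multiply(a, b, n=8):
    """Signed n x n multiply; a and b are ints in [-2**(n-1), 2**(n-1))."""
    mask = (1 << (2 * n)) - 1
    # Python's >> sign-extends, so these are the two's-complement bits.
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    total = 0
    for i in range(n):
        for j in range(n):
            bit = a_bits[i] & b_bits[j]
            # Exactly one operand bit is a sign bit: NAND instead of AND.
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)
    total += (1 << n) + (1 << (2 * n - 1))   # the two correction constants
    total &= mask                            # keep 2n product bits
    # Interpret the 2n-bit result as a two's-complement number.
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total
```

For in-range operands the result matches ordinary signed multiplication, for example baugh_wooley_multiply(-3, 5) equals -15.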
Overview of Multipliers
Modified Booth multipliers
  Partial products calculated using the modified Booth algorithm
  Modified Booth algorithm uses a binary encoder to calculate partial products using a series of shift operations
  Summation of partial products done using carry look-ahead (CLA) adders
A.R. Cooper, “Parallel architecture modified Booth multiplier” IEEE Proc. Electronic Circuits and Systems, vol. 135, no. 3, pp. 125-128, 1998
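The encoding step can be sketched as follows, assuming the standard radix-4 form of the modified Booth algorithm, in which overlapping 3-bit groups of the multiplier are recoded into digits from {-2, -1, 0, 1, 2}. Each digit then selects a shifted (and possibly negated) copy of the multiplicand. The function names are illustrative, and the CLA summation is replaced by plain integer addition.

```python
def booth_radix4_digits(b, n=8):
    """Recode an n-bit two's-complement multiplier (n even) into n/2 signed digits."""
    bits = [(b >> i) & 1 for i in range(n)]
    digits = []
    prev = 0                                   # implicit bit to the right of the LSB
    for k in range(0, n, 2):
        # digit = b[k-1] + b[k] - 2*b[k+1], always in {-2, -1, 0, 1, 2}
        digits.append(bits[k] + prev - 2 * bits[k + 1])
        prev = bits[k + 1]
    return digits


def booth_multiply(a, b, n=8):
    """Signed multiply: one partial product per Booth digit, shifted by two bits each."""
    product = 0
    for k, d in enumerate(booth_radix4_digits(b, n)):
        product += (d * a) << (2 * k)          # d*a needs only shift and negate in hardware
    return product
```

An 8-bit multiplier therefore produces four partial products instead of eight, which is the reason the scheme is attractive for fast multipliers.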
Overview of Multipliers
Modified Booth/Wallace tree multipliers
  Summation of partial products done using a Wallace tree
  Each column of partial products is summed using a multi-stage setup of half and full adders
  Each multi-stage adder circuit generates a sum and a carry, which form the two final partial products
  The two final-stage partial products from the Wallace tree are added using a CLA adder
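The reduction to two final partial products can be sketched at word level. This is a simplified model under stated assumptions: rows of full adders are modeled as bitwise 3:2 compressors, half adders are omitted, the partial products are plain AND rows rather than Booth-encoded, and the final CLA is replaced by integer addition. All names are hypothetical.

```python
def compress_3_to_2(x, y, z):
    """A row of full adders applied bitwise: three words in, sum and carry words out."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1     # carries move one column left
    return s, c


def wallace_reduce(operands):
    """Reduce non-negative partial products to two words using 3:2 compressor stages."""
    ops = list(operands)
    while len(ops) > 2:
        nxt = []
        for k in range(0, len(ops) - 2, 3):    # compress each full group of three
            nxt.extend(compress_3_to_2(ops[k], ops[k + 1], ops[k + 2]))
        nxt.extend(ops[len(ops) - len(ops) % 3:])   # leftovers pass to the next stage
        ops = nxt
    while len(ops) < 2:
        ops.append(0)
    return ops[0], ops[1]


def wallace_multiply(a, b, n=8):
    """Unsigned multiply: AND-row partial products, tree reduction, one final add."""
    partial_products = [(a << j) if (b >> j) & 1 else 0 for j in range(n)]
    s, c = wallace_reduce(partial_products)
    return s + c                               # final carry-propagate addition
```

The point of the structure is that no carry propagates across columns until the single final addition.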
Xilinx FPGA Architectures
4000/Spartan
  N×N array of unit cells
  Unit cell = CLB + routing
  Fast carry logic in CLBs for adders
Virtex/Spartan-2
  M×N array of unit cells
  Carry logic + AND gate for array multipliers
  4K block RAMs at edges
Virtex-2/Spartan-3
  18K block RAMs in array
  18×18-bit multipliers with each RAM
    "based on modified Booth architecture"
Virtex-4/Virtex-5
  Added 48-bit DSP cores w/multipliers
Virtex-4 DSP Architecture
2 DSP slices per tile
  16-256 tiles in 1-8 columns
Each DSP includes:
  18×18-bit 2's-complement multiplier (w/o adder)
  3-input, 48-bit adder/subtractor
    P = Z ± (X + Y + Cin)
  Optional accumulator register
User-controlled operational modes
  For X, Y, & Z MUXs
Configuration bits control other MUXs
  Pipelining registers
  Accumulator register
  Easily tested
[Figure: DSP tile with two slices. Each slice has inputs A(18) and B(18), X, Y, and Z multiplexers, and output P(48); input C(48) is shown once. Labels: inputs for cascading; outputs w/ dedicated routing.]
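A behavioral sketch of the 3-input, 48-bit adder/subtractor listed on this slide, assuming the function has the form P = Z ± (X + Y + Cin) with wraparound at 48 bits. The multiplexers, pipelining registers, and the way the multiplier output reaches X and Y are not modeled; dsp_adder and its parameters are names chosen for illustration.

```python
MASK48 = (1 << 48) - 1


def dsp_adder(x, y, z, cin=0, subtract=False):
    """48-bit three-input adder/subtractor: P = Z +/- (X + Y + Cin), modulo 2**48."""
    inner = x + y + cin
    p = z - inner if subtract else z + inner
    return p & MASK48                      # result wraps to the 48-bit output width


def multiply_accumulate(pairs):
    """Accumulate A*B products through Z, as with the optional accumulator register."""
    p = 0
    for a, b in pairs:
        p = dsp_adder(a * b, 0, p)         # simplification: product enters on X, Y = 0
    return p
```

Under these assumptions, multiply_accumulate([(2, 3), (4, 5)]) accumulates 6 and then 20.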
BIST Approach for Virtex-5 DSP
Larger multiplier
Multiplier Architectures
Test algorithm depends on architecture
  But architecture is not specified in data sheets
  Eliminate sequential logic architectures
  "Based on modified Booth"
Multiplier choices include:
  Array
  Booth
  Modified Booth
  Modified Booth/Wallace tree
    Our assumption based on area/performance analysis
Our goal: find/develop architecture-independent test algorithm(s)
Modified Booth Test Algorithms
Test algorithm uses 8-bit counter (256 vectors)
"Effective Built-In Self-Test for Booth Multipliers"
  Gizopoulos, Paschalis & Zorian
  IEEE Design & Test of Computers, pp. 105-111, 1998
  Claim fault coverage ~99.8%
4×4 connections to multiplier inputs
  Order of the bits does not matter
Algorithm used in Srinivas Garimella's MS thesis for Virtex-2 multipliers
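[Figure: 4×4 algorithm. The 8-bit counter (MSB to LSB) is split into two 4-bit groups that drive the two n-bit inputs of the n×n multiplier, one of which is the Booth-encoded input; the product is 2n bits.]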
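A sketch of how an 8-bit counter could drive both n-bit multiplier inputs in the 4×4 scheme. The slide states only that four counter bits connect to each input and that bit order does not matter; the periodic wiring used here (input bit i takes counter-group bit i mod 4) and the choice of which nibble feeds which input are assumptions made for illustration, as are the function names.

```python
def expand(group_bits, width):
    """Tile a short list of counter bits across a width-bit multiplier input."""
    k = len(group_bits)
    return sum(group_bits[i % k] << i for i in range(width))


def vectors_4x4(n=8):
    """Yield the 256 (A, B) test vectors for an n x n multiplier."""
    for count in range(256):                       # 8-bit counter
        bits = [(count >> i) & 1 for i in range(8)]
        # Assumed split: upper nibble drives A, lower nibble drives B.
        yield expand(bits[4:], n), expand(bits[:4], n)
```

With this wiring, every 4-bit window of each input sees all 16 combinations during the 256-vector test, regardless of n.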
Modified Booth Test Algorithms
Test algorithm uses 8-bit counter (256 vectors)
"An Effective BIST Architecture for Fast Multiplier Cores"
  Paschalis, Kranitis, Psarakis, Gizopoulos & Zorian
  Proc. Design, Automation and Test in Europe Conf., pp. 117-121, 1999
  Claim fault coverage ~99.8%
5×3 connections with 5 inputs to Booth encoding
  But this was not explicit in paper
    Only shown in figure
  Order of the bits does not matter
Note that this paper is from 1999
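[Figure: 5×3 algorithm. The 8-bit counter (MSB to LSB) is split into a 5-bit group and a 3-bit group that drive the two n-bit inputs of the n×n multiplier, with the 5-bit group on the Booth-encoded input; the product is 2n bits.]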
Modified Booth Test Algorithms
Test algorithm uses 8-bit counter (256 vectors)
"Low Power BIST for Wallace Tree-based Fast Multipliers"
  Bakalis, Kalligeros, Nikolos, Vergos & Alexiou
  Proc. Int. Symp. on Quality of Electronic Design, pp. 433-438, 2000
  Claim fault coverage > 99%
5×3 connections with 5 inputs to Booth encoding
  Specifically stated in paper
  But no data to back up claim that 5×3 is better than 3×5
    Did they just observe it in the Zorian paper?
Note that this paper was published a year later than the Zorian paper
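[Figure: 5×3 algorithm. The 8-bit counter (MSB to LSB) is split into a 5-bit group and a 3-bit group that drive the two n-bit inputs of the n×n multiplier, with the 5-bit group on the Booth-encoded input; the product is 2n bits.]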
Modified Booth Test Algorithms
Test algorithm uses 8-bit counter (256 vectors)
But which side is Booth encoding?
  Xilinx does not specify
Our original approach
  Run 5×3 algorithm
    256 vectors
  and run 3×5 algorithm
    512 vectors
  Include 4×4 if fault coverage improves
    768 vectors
Additional algorithms only require multiplexers to change inputs
  Use same 8-bit counter
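[Figure: 5×3 and 3×5 algorithms. The same 8-bit counter (MSB to LSB) drives the n×n multiplier in both cases; the 5-bit and 3-bit groups swap between the Booth-encoded input and the other input. The product is 2n bits.]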
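The cumulative vector counts on this slide (256, 512, 768) follow from reusing one 8-bit counter for each algorithm in turn, with multiplexers changing how its bits reach the multiplier inputs. A minimal sketch under stated assumptions: the periodic tiling of counter bits across each input and the assignment of the upper counter group to input A are illustrative choices, not details given in the slides, and all names are hypothetical.

```python
# (upper group width, lower group width) of the 8-bit counter for each algorithm
ALGORITHMS = {"5x3": (5, 3), "3x5": (3, 5), "4x4": (4, 4)}


def tile(value, k, width):
    """Repeat the k low bits of value across a width-bit operand."""
    return sum(((value >> (i % k)) & 1) << i for i in range(width))


def bist_vectors(n, algorithms=("5x3", "3x5")):
    """Yield (algorithm, A, B) for each clock cycle of the combined test."""
    for name in algorithms:                    # multiplexer select changes here
        upper, lower = ALGORITHMS[name]
        for count in range(256):               # the shared 8-bit counter
            a_src = count >> lower             # upper counter bits
            b_src = count & ((1 << lower) - 1) # lower counter bits
            yield name, tile(a_src, upper, n), tile(b_src, lower, n)
```

For an 18×18 multiplier, list(bist_vectors(18)) has 512 entries, and adding "4x4" to the algorithm tuple gives 768.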
Methodology for Analysis
Multipliers evaluated
  Unsigned array
  Signed array (Baugh-Wooley)
  Modified Booth
    Carry look-ahead adders sum partial products in every stage
  Modified Booth Wallace tree
    Carry look-ahead adder sums final-stage partial products
    Carry select adder sums final-stage partial products
    Ripple carry adder sums final-stage partial products
Methodology for Analysis
Designed 8-bit models of the multipliers
Fault model: collapsed single stuck-at gate-level faults
Exhaustive testing
  To determine undetectable faults
Test algorithms evaluated
  4×4
  5×3
  3×5
  5×3 & 3×5
  4×4, 5×3 & 3×5
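The methodology can be illustrated on a much smaller circuit. The sketch below runs single stuck-at fault simulation on a gate-level full adder (not one of the 8-bit multiplier models), uses exhaustive testing to find which faults are detectable at all, and reports coverage of a test set relative to that detectable set. Simplifications: faults are placed on every net without collapsing, and fanout branches are not treated as separate fault sites. The netlist and all names are invented for this example.

```python
from itertools import product

# Gate-level full adder: (output net, gate type, input nets), in topological order.
NETLIST = [
    ("x1", "XOR", ("a", "b")),
    ("s", "XOR", ("x1", "cin")),
    ("g1", "AND", ("a", "b")),
    ("g2", "AND", ("x1", "cin")),
    ("cout", "OR", ("g1", "g2")),
]
INPUTS, OUTPUTS = ("a", "b", "cin"), ("s", "cout")
GATES = {"AND": lambda p, q: p & q, "OR": lambda p, q: p | q, "XOR": lambda p, q: p ^ q}


def simulate(vector, fault=None):
    """Evaluate the netlist; fault is (net, stuck_value) or None for fault-free."""
    nets = dict(zip(INPUTS, vector))

    def force(net):
        if fault and fault[0] == net:
            nets[net] = fault[1]               # pin the faulty net to its stuck value

    for net in INPUTS:
        force(net)
    for out, kind, (p, q) in NETLIST:
        nets[out] = GATES[kind](nets[p], nets[q])
        force(out)
    return tuple(nets[o] for o in OUTPUTS)


def detected(faults, vectors):
    """Faults for which at least one vector gives a faulty output response."""
    return {f for f in faults if any(simulate(v, f) != simulate(v) for v in vectors)}


ALL_NETS = list(INPUTS) + [gate[0] for gate in NETLIST]
FAULTS = [(net, value) for net in ALL_NETS for value in (0, 1)]
EXHAUSTIVE = list(product((0, 1), repeat=len(INPUTS)))


def effective_coverage(vectors):
    """Percent of detectable faults (per exhaustive testing) caught by vectors."""
    detectable = detected(FAULTS, EXHAUSTIVE)
    return 100.0 * len(detected(FAULTS, vectors) & detectable) / len(detectable)
```

Faults that even the exhaustive vector set cannot expose are excluded from the denominator, which is what distinguishes effective fault coverage from raw fault coverage.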
Number of faults detected (effective fault coverage, %) for each test algorithm

Multiplier               Total faults  Exhaustive  4×4           5×3           3×5           5×3 & 3×5     5×3, 3×5 & 4×4
Unsigned array           1648          1644 (100)  1644 (100)    1644 (100)    1621 (98.60)  1644 (100)    1644 (100)
Signed array             1648          1644 (100)  1644 (100)    1644 (100)    1644 (100)    1644 (100)    1644 (100)
Mod-Booth                2499          2196 (100)  2180 (99.27)  2168 (98.72)  2179 (99.23)  2182 (99.36)  2193 (99.86)
Mod-Booth Wall-Tree CLA  2184          2090 (100)  2061 (98.61)  2068 (98.95)  2070 (99.04)  2071 (99.09)  2074 (99.23)
Mod-Booth Wall-Tree CSA  2422          2243 (100)  2215 (98.75)  2217 (98.84)  2218 (98.89)  2222 (99.06)  2228 (99.33)
Mod-Booth Wall-Tree RCA  2021          1962 (100)  1937 (98.73)  1944 (99.08)  1944 (99.08)  1944 (99.08)  1947 (99.24)
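The percentages in the table appear to be computed against the number of faults detected by exhaustive testing rather than against the total fault count, which matches the term "effective fault coverage". A short check on the Mod-Booth row, with a helper name chosen here for illustration:

```python
def effective_fault_coverage(detected, detectable):
    """Percent of detectable faults (those found by exhaustive testing) that a test detects."""
    return round(100.0 * detected / detectable, 2)


# Mod-Booth row: 2499 total faults, 2196 detected by exhaustive testing.
MOD_BOOTH_DETECTABLE = 2196
MOD_BOOTH_DETECTED = {"4x4": 2180, "5x3": 2168, "3x5": 2179, "5x3 & 3x5": 2182, "all three": 2193}

coverage = {
    name: effective_fault_coverage(count, MOD_BOOTH_DETECTABLE)
    for name, count in MOD_BOOTH_DETECTED.items()
}
```

The computed values reproduce the table entries for that row; dividing by the 2499 total faults instead would not.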
Application to Virtex-4 & 5 DSPs
In Virtex-4 & 5 DSPs
  Final-stage carry look-ahead adder (CLA) is separated from the multiplier
  5×3 & 3×5 give the same fault coverage for the multiplier alone
  Separate test algorithm for the CLA
  Run both 5×3 and 3×5 to test for bridging faults on the cascade routing between adjacent slices
Mode (Test)    First 256 ccs  Second 256 ccs  Third 256 ccs  Fourth 256 ccs
00 (multiply)  P = A×B        P = A×B         P = A×B+C      P = A:B+C
Summary and Conclusion
If the architecture of the multiplier is not known:
  3×5 algorithm gives best overall fault coverage for most multipliers
    Contradicting the claim of the authors who proposed 5×3
  Running 3×5 & 5×3 gives better fault coverage for all multipliers
  Running all three test algorithms (3×5, 5×3 and 4×4) provides the best fault coverage for all multipliers
    Architecture-independent testing
Virtex-4 & Virtex-5 multipliers
  Original approach was 3×5 and 5×3
  Better approach would be 3×5 and 4×4
Summary and Conclusion
For multipliers in Virtex-2 FPGAs
  Adder not separated from the multiplier
  Run both 3×5 and 5×3 algorithms
    These give highest fault coverage for multiplier & CLA
The 3×5 and 4×4 BIST algorithms should be applied to multipliers in
  Spartan-3A
    Similar to multipliers in Virtex-4
  Spartan-6
    Similar to multipliers in Virtex-4
  Virtex-6
    Similar to multipliers in Virtex-5
  If only 2 algorithms can be applied
    Best results if all 3 can be applied
Summary and Conclusion
Area overhead for different approaches
  In addition to 8-bit counter
  Maximum area overhead for N-bit multiplier:
    One test algorithm: 2N 2:1 multiplexers
    Two test algorithms: 2N 3:1 multiplexers
      1 additional counter bit for control
    All three test algorithms: 2N 4:1 multiplexers
      2 additional counter bits for control
  This is worst case since synthesis tools may reduce multiplexers
    Particularly in case of two and three test algorithms
    Because duplicate counter bits feed the same multiplexers
  Regardless, this is an area-efficient BIST approach
Paper almost finished for JETTA Letter or Trans. IE Corr.
Brad is using 3×5, 5×3 & 4×4 algorithms in test bench for multipliers in Output Response Analyzer (ORA) for mixed-signal BIST
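The worst-case area overhead figures on this slide can be tabulated with a small helper. The counts (2N multiplexers, and 0, 1, or 2 extra counter bits) are taken from the slide; reading the multiplexer size as one normal-operation input plus one input per test algorithm is an interpretation, and the function name and return format are invented for illustration.

```python
def bist_area_overhead(n_bits, num_algorithms):
    """Worst-case overhead beyond the 8-bit counter for an N-bit multiplier."""
    if num_algorithms not in (1, 2, 3):
        raise ValueError("the slides cover one, two, or three test algorithms")
    mux_inputs = num_algorithms + 1            # assumed: normal input + one per algorithm
    return {
        "multiplexers": 2 * n_bits,            # one per bit of each multiplier input
        "mux_size": f"{mux_inputs}:1",
        "extra_counter_bits": num_algorithms - 1,
    }


# Example: the 18-bit multiplier inputs of a Virtex-4 DSP slice.
overhead_18 = {k: bist_area_overhead(18, k) for k in (1, 2, 3)}
```

For N = 18 this gives 36 multiplexers in every case, with only the multiplexer width and the number of control bits growing as algorithms are added. As the slide notes, synthesis may reduce this further.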