32-Bit RISC Processor With Concurrent Error Detection


  • 8/3/2019 32-Bit RISC Processor With Concurrent Error Detection


    A 32-Bit RISC Processor With Concurrent Error Detection

    A. Maamar, G. Russell
    Department of Electrical and Electronic Engineering
    University of Newcastle
    Newcastle upon Tyne, UK

    Abstract

    This paper describes the design and implementation of a 32-bit RISC Processor with a Concurrent Error Detection (CED) capability. The CED scheme uses Dong's Code, in which the error detection capability depends upon the number of checkbits used and not upon the number of data bits; the scheme can therefore be made application specific. The equations used for check symbol prediction of both arithmetic and logical functions are outlined, and their incorporation in a 32-bit Fault-Tolerant RISC processor is described.

    1. Introduction

    As the scale of integration increases, circuits become more susceptible to sources of transient or intermittent faults. The characteristics of intermittent faults, and the increased use of complex VLSI circuits in safety-critical applications, necessitate the use of a test strategy which continuously monitors the operation of circuits and compares it with some known reference. This approach is usually referred to as Concurrent Error Detection (CED). One approach to incorporating a CED capability into VLSI circuits, which has been shown to be viable, is the use of information redundancy (coding techniques); several RISC processors incorporating information redundancy schemes have been designed and fabricated [1,2]. Invariably, the incorporation of CED schemes incurs penalties on a design in terms of area overheads resulting from the additional hardware and routing space necessary to implement the scheme; the area overhead incurred is a function of the number of checkbits used in the coding scheme. Amongst all of the separable codes used in CED schemes, Berger Code [3] is the least redundant separable code capable of detecting all unidirectional errors. There are, however, many applications where the detection of all possible unidirectional errors is unnecessary, and this has led to the development of several versions of the modified Berger Code. One of these versions is Dong's Code [4]; although it requires fewer check bits, it has slightly reduced error detection capabilities. Within the code, the number of checkbits used is a function of the error detection capability required, and not simply of the number of information bits in the data word as in the full Berger Code. This gives a degree of flexibility of application depending upon the type of system into which it is incorporated, trading the area overhead of its implementation against error detection capability. In this paper the mathematical equations for the prediction of the check symbols in Dong's Code are outlined, together with a brief description of the implementation of the check symbol prediction hardware in the design of a 32-bit RISC Processor with a Concurrent Error Detection (CED) capability.

    2. Code Construction

    To construct Dong's Code, it is first necessary to set the maximum weight (m) of the unidirectional errors to be detected by the code, regardless of the number of information bits. The check symbol of the code consists of two parts, referred to as Ck1 and Ck2. The number of bits in Ck1 is j, where $j = \lceil \log_2(m+1) \rceil$. Ck1 is equal to the number of zeros in the information bits, taken modulo (m+1) and represented in binary using j bits. To obtain Ck2, Dong [4] simply complements Ck1 bit by bit. In this paper, however, Ck2 is generated by counting the number of zeros in Ck1 and representing the result in binary form. This reduces the number of bits in Ck2 by at least one, and the saving grows as the number of bits in Ck1 increases, without affecting the error detection capability. The code detects all unidirectional errors except those which affect only the information bits and have weight equal to (m+1) or its multiples [4]. In other words, if m is set to 7, the number of information bits is 32, and the errors affect only the information bits, then the code cannot detect unidirectional errors of weight 8, 16, 24 or 32, but errors of all other weights are detected.

    1089-6503/98 $10.00 © 1998 IEEE

    The code can also detect some other types of errors. For example, if the checkbits are affected by any number of unidirectional errors, then the code can detect all types of errors (unidirectional or bi-directional) affecting the information bits. This follows from the fact that the check symbols of the code form a set of unordered words in which no check symbol can be changed into another by any unidirectional error; this is an advantage over the Berger Code itself. The error detection capability of the code is summarised in Table(1).
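    The construction above can be sketched in a few lines of Python. This is a minimal illustration rather than the hardware implementation: the function names (`zeros`, `encode`, `is_consistent`, `check_words`) are hypothetical, the defaults n = 32 and m = 7 follow the example in the text, the width chosen for Ck2 is an assumption (just enough bits to hold a count from 0 to j), and Ck2 is formed by counting the zeros in Ck1, as proposed in this paper.

```python
from math import ceil, log2


def zeros(value: int, width: int) -> int:
    """Number of 0 bits in the low `width` bits of `value`."""
    return width - bin(value & ((1 << width) - 1)).count("1")


def encode(info: int, n: int = 32, m: int = 7) -> tuple:
    """Return the check symbol (Ck1, Ck2) of an n-bit information word."""
    j = ceil(log2(m + 1))            # number of bits in Ck1
    ck1 = zeros(info, n) % (m + 1)   # zeros count, modulo (m+1)
    ck2 = zeros(ck1, j)              # zeros count of Ck1 (variant used in this paper)
    return ck1, ck2


def is_consistent(info: int, ck1: int, ck2: int, n: int = 32, m: int = 7) -> bool:
    """True if the stored check symbol matches one regenerated from `info`."""
    return encode(info, n, m) == (ck1, ck2)


def check_words(m: int = 7) -> list:
    """All check symbols, packed as Ck1 followed by Ck2."""
    j = ceil(log2(m + 1))
    k = ceil(log2(j + 1))            # assumed width of Ck2: enough bits for 0..j
    return [(c << k) | zeros(c, j) for c in range(m + 1)]


word = 0xFFFF0000                    # 16 zeros, so Ck1 = 16 mod 8 = 0
ck1, ck2 = encode(word)

err3 = word & ~(0x07 << 16)          # three 1 -> 0 errors in the information bits
err8 = word & ~(0xFF << 16)          # eight 1 -> 0 errors (weight m+1)

print(is_consistent(err3, ck1, ck2))   # False: the weight-3 error is detected
print(is_consistent(err8, ck1, ck2))   # True: the weight-8 error escapes detection
```

    The list returned by `check_words()` can be used to confirm the unordered property: for any two distinct packed check symbols, neither has all of its 1 bits contained in the other, so no unidirectional error can turn one into the other.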

    Errors affecting the        Errors affecting the      Errors detected
    information bits            check bits
    --------------------------  ------------------------  ------------------------------
    1 → 0 OR 0 → 1              none                      All, except errors of weight
                                                          (m+1) or its multiples
    1 → 0 OR 0 → 1              1 → 0 OR 0 → 1            All errors
    1 → 0 AND 0 → 1             1 → 0 OR 0 → 1            All errors

    Table(1) Error detection capability of Dong's Code

    3. Check Symbol Prediction (CSP)

    Check symbol prediction is one of the schemes used to perform concurrent error detection in arithmetic and logic functions, and it has been considered for some codes. In this section the mathematical equations for check symbol prediction using Dong's Code, for arithmetic and logic functions, are outlined. The equations are based on the mathematical foundation for the prediction of Berger Codes described in [5]. These equations are used to design the check symbol prediction circuit needed to implement concurrent error detection in the ALU of the processor. Consider an arithmetic or logic operation S = X op Y, where the operands X and Y are coded in Dong's Code. Let $X_{Ck1}$ and $Y_{Ck1}$ be the Ck1 check symbols of the operands. The predicted Ck1 of the result ($S_{Ck1}$) can be computed from X, Y, $X_{Ck1}$ and $Y_{Ck1}$, i.e. $S_{Ck1} = f(X, Y, X_{Ck1}, Y_{Ck1})$. Only Ck1 of the result needs to be predicted; Ck2 is then generated from Ck1. In the equations, $C_{in}$ and $C_{out}$ denote the carry-in and carry-out, and $C_{Ck1}$ the Ck1 check symbol of the internal carries. The equations for computing the check symbols for a range of arithmetic and logic operations are given below:

    3.2 - CSP for Subtraction
    $C_{k1} = (2^{j+1} + X_{Ck1} - Y_{Ck1} - C_{Ck1} - C_{in} + C_{out}) \bmod (m+1)$    (2)

    3.3 - CSP for Arithmetic Shift Left
    $C_{k1} = (2^{j+1} + X_{Ck1} + C_{out}) \bmod (m+1)$    (3)

    3.4 - CSP for Arithmetic Shift Right
    $C_{k1} = (2^{j+1} + X_{Ck1} + C_{out} - X_n) \bmod (m+1)$    (4)
    where $X_n$ is the most significant (sign) bit of X.

    3.5 - CSP for Rotate Operation
    $C_{k1} = X_{Ck1}$    (5)

    3.6 - CSP for Complement Operation
    $C_{k1} = \overline{X}_{Ck1} = (2^{j+1} - X_{Ck1}) \bmod (m+1)$    (6)

    3.7 - CSP for OR Operation
    $C_{k1} = (2^{j+1} + X_{Ck1} + Y_{Ck1} - (X \cap Y)_{Ck1}) \bmod (m+1)$    (7)

    3.8 - CSP for AND Operation
    $C_{k1} = (2^{j+1} + X_{Ck1} + Y_{Ck1} - (X \cup Y)_{Ck1}) \bmod (m+1)$    (8)

    3.9 - CSP for XOR Operation
    $C_{k1} = (2^{j+1} + (X \cup Y)_{Ck1} - (X \cap Y)_{Ck1}) \bmod (m+1)$    (9)

    3.10 - CSP for EX-NOR Operation
    $C_{k1} = (2^{j+1} + (X \cap Y)_{Ck1} + (\overline{X} \cap \overline{Y})_{Ck1}) \bmod (m+1)$    (10)

    3.11 - CSP for Increment Operation
    $C_{k1} = (2^{j+1} + X_{Ck1} + C_{out} - C_{Ck1} - C_{in}) \bmod (m+1)$    (11)

    3.12 - CSP for Decrement Operation
    $C_{k1} = (2^{j+1} + X_{Ck1} + C_{out} - C_{Ck1}) \bmod (m+1)$    (12)

    4. RISC Architecture

    A functional block diagram of the RISC processor is shown in Figure(1). The RISC processor is divided into two main blocks, the information processing block and the check symbol prediction block. The function and composition of the information processing and check symbol prediction blocks are outlined below.
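    The prediction equations of Section 3 can be checked numerically before turning to the hardware blocks. The sketch below is an illustration under stated assumptions, not the circuit described in this paper. It takes n = 32 and m = 7, so that both the word length and the constant $2^{j+1}$ are multiples of (m+1). It treats subtraction as X + NOT(Y) with a carry-in of 1, increment as X + 0 with a carry-in of 1, and decrement as X plus the all-ones word with no carry-in. It takes $C_{Ck1}$ to be the Ck1 symbol of the carries c_1 to c_n (carry-out included) produced by a simple bit-serial adder. The names `ck1`, `add`, `run`, `OPS` and `verify` are hypothetical.

```python
import random
from math import ceil, log2

N, M = 32, 7                      # word width and m, as in the example of Section 2
J = ceil(log2(M + 1))             # number of bits in Ck1 (3 here)
K = 1 << (J + 1)                  # the constant 2^(j+1)
MASK = (1 << N) - 1


def ck1(word: int) -> int:
    """Ck1 of an N-bit word: number of zeros, modulo (m+1)."""
    return (N - bin(word & MASK).count("1")) % (M + 1)


def add(x: int, y: int, c_in: int):
    """Bit-serial adder. Returns (sum, internal carries c_1..c_N, c_out)."""
    total, carries, c = 0, 0, c_in
    for i in range(N):
        a, b = (x >> i) & 1, (y >> i) & 1
        total |= (a ^ b ^ c) << i
        c = (a & b) | (a & c) | (b & c)
        carries |= c << i         # bit i stores carry c_(i+1)
    return total, carries, c


def run(op: str, x: int, y: int = 0):
    """Return (result, predicted Ck1) for one operation on N-bit operands."""
    cx, cy, mod = ck1(x), ck1(y), M + 1
    if op == "sub":               # X - Y as X + NOT(Y) + 1, so C_in = 1
        s, c, c_out = add(x, ~y & MASK, 1)
        return s, (K + cx - cy - ck1(c) - 1 + c_out) % mod
    if op == "inc":               # X + 0 with C_in = 1
        s, c, c_out = add(x, 0, 1)
        return s, (K + cx + c_out - ck1(c) - 1) % mod
    if op == "dec":               # X + (all ones) with C_in = 0
        s, c, c_out = add(x, MASK, 0)
        return s, (K + cx + c_out - ck1(c)) % mod
    if op == "asl":               # C_out is the bit shifted out of the MSB
        return (x << 1) & MASK, (K + cx + (x >> (N - 1))) % mod
    if op == "asr":               # sign bit replicated, LSB shifted out as C_out
        sign, c_out = x >> (N - 1), x & 1
        return (x >> 1) | (sign << (N - 1)), (K + cx + c_out - sign) % mod
    if op == "rot":               # rotation leaves the zeros count unchanged
        return ((x << 1) | (x >> (N - 1))) & MASK, cx
    if op == "not":
        return ~x & MASK, (K - cx) % mod
    if op == "or":
        return x | y, (K + cx + cy - ck1(x & y)) % mod
    if op == "and":
        return x & y, (K + cx + cy - ck1(x | y)) % mod
    if op == "xor":
        return x ^ y, (K + ck1(x | y) - ck1(x & y)) % mod
    if op == "xnor":
        return ~(x ^ y) & MASK, (K + ck1(x & y) + ck1(~x & ~y & MASK)) % mod
    raise ValueError(op)


OPS = ["sub", "inc", "dec", "asl", "asr", "rot", "not", "or", "and", "xor", "xnor"]


def verify(trials: int = 1000, seed: int = 0) -> bool:
    """True if every predicted Ck1 equals the Ck1 recomputed from the result."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.getrandbits(N), rng.getrandbits(N)
        for op in OPS:
            result, predicted = run(op, x, y)
            if predicted != ck1(result):
                return False
    return True
```

    Calling `verify()` compares, for random operands, the predicted Ck1 with the Ck1 regenerated from the result of each operation, which mirrors the comparison made between the predicted and actual check symbols.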

    4.1 Information Processing Block

    The information processing block fetches and executes arithmetic and logic operations, and performs data transfers from/to the register file and the external memory. The block consists of the ALU, the Data Register File (DRF), and the Control Unit.

    a) ALU: The ALU is designed as three separate units, namely the Arithmetic Unit, the Logic Unit, and the Shifter-Rotator Unit. The reason for the separation is that internal carry generation is the time-consuming part of the arithmetic unit; hence, other operations which do not require internal carry generation can be performed faster if they are separated from the arithmetic operations. To speed up the arithmetic unit, the carry generator should be designed to be as fast as possible. There are several different methods of generating the internal carries, and choosing one of them is a trade-off between speed and silicon area. In this design the carry generation circuitry initially adopted was that proposed by Brent and Kung [7], since it is very suitable for a VLSI implementation and takes only 16 gate levels to generate the internal carries for a 32-bit operand. However, during the implementation it was discovered that it is possible to modify the design, reducing the number of gate levels to 12 without affecting its regularity for VLSI implementation.

    b) Control Unit: The control unit is the most complex module in the RISC processor. Its function is to fetch and execute the instructions, and to provide the sequences of operations required by the other blocks in the circuit to perform their functions. The control unit uses several special-purpose registers, such as the program counter and the status register, to perform its task. All the instructions are executed in one clock cycle; however, the number of internal cycles depends upon the operation to be performed. To overcome the processor-memory bottleneck problem, memory access operations are kept to a minimum: only Load/Store instructions are used to communicate with the memory. A Load instruction reads data from the memory and stores it in the register file. A Store instruction moves data from the register file and writes it into the memory. All other instructions are Register-Register instructions.

    c) Register File: The register file consists of two independent register files: the Data Register File (DRF) for storing data, and the Check Symbol Register File (CSRF) for holding the check symbols of the data. Each code word stored in the register file is divided into two parts: the information part, stored in a register of the DRF, and the check symbol part, stored in the corresponding register of the CSRF. Each file has its own address decoding hardware. Thus addressing errors can be detected by a mismatch between the new check symbol generated for the data word and its stored check symbol, since the probability of both registers being in error simultaneously with the same fault is very low. The DRF consists of 32 general-purpose registers of 32 bits each, available to the user, and the CSRF consists of 32 registers of 5 bits each, used to store the check symbols.

    4.2. Check Symbol Prediction Block

    Figure(2) shows a block diagram of the circuitry which generates the predicted check symbols Ck1 and Ck2; the circuit also generates the actual check symbols Ck1 and Ck2 from the result. To predict the check symbol of the result of most operations, the internal carries of the two operands are used. Using the internal carries generated in the ALU would reduce the cost of the hardware, but it is risky: an error in the internal carries would affect both the ALU and the Prediction Unit. To date there is no straightforward way of detecting errors affecting the internal carries; therefore, a decision was made to have a separate internal carry generator for the check symbol prediction circuit. To generate the predicted check symbol of the result of an arithmetic operation, the prediction circuit uses the actual check symbols of the operands and the internal carries generated within the prediction circuitry; to generate the predicted check symbol of the result of a logic operation, the prediction circuit uses only the operand (operands) and the actual check symbols. When the result becomes available from the output of the ALU, the actual check symbol is generated; at the same time the predicted check symbol of the result of the given operation becomes ready at the output of the predicted check symbol unit. The predicted and the actual check symbols are compared using Totally Self-Checking (TSC) checkers; if they match then the result is error free, otherwise an error signal is generated and the execution sequence is halted.

    5. Error Detection Capability of the RISC

    The discussion of the error detection capability of the coding scheme used in the RISC Processor is divided into two sections: a) errors occurring in transferring data from the register file to the ALU, and b) errors occurring in the ALU.

    a) Data Transfer Errors: Before any data word is moved from the DRF to the ALU or an I/O port, a new check symbol for the data word is generated and compared with the check symbol for the data word stored in the CSRF. If they match, the data word and its check symbol can be moved to the ALU or to the I/O port; when an error is detected, an error signal is activated and the transfer is halted. To move the result from the ALU to its destination (DRF or I/O port), its actual check symbol first has to be generated and compared with the predicted check symbol generated by the check symbol prediction circuitry. If no error is detected, the data word and its corresponding check symbol are moved; otherwise an error signal is activated. When an error signal is activated, the control unit is informed so that it stops processing the instruction.

    b) ALU Errors: The detection of errors in the ALU is discussed in relation to four separate cases.

    Case 1: Errors which only affect the information bits: Any unidirectional error of weight not equal to (m+1) or its multiples occurring in the Arithmetic Unit (which affects the result only, and not the predicted check symbol) will increase or decrease the number of zeros in the result. The actual check symbol generated from the information bits will then not match the predicted check symbol, and the error can be detected.

    Figure(1) Block Diagram of the RISC Processor with CED, showing the ALU circuits, the check symbol prediction circuits and the error signals (Note: Control Unit not shown for clarity of the circuit)

    Case 2: Errors which only affect the predicted check symbol: When the check symbol prediction circuit generates an incorrect check symbol due to the occurrence of any number of unidirectional errors, the predicted check symbol will not match the actual check symbol, and an error signal will be generated. This means that any unidirectional error in the prediction circuitry can be detected, since no predicted check symbol can be changed into another predicted check symbol by any number of unidirectional errors.

    Case 3: Errors which affect both information bits and check symbol: If the check symbol is affected by unidirectional errors at the same time as the information bits are affected by either unidirectional or bi-directional errors, then the predicted check symbol will not match the actual check symbol, and any type of error of any weight can be detected, since no unidirectional error can change one check symbol into another.

    Case 4: Errors affecting the internal carries: Two internal carry generators are used, one to generate the carries needed by the arithmetic unit, the other to generate the internal carries needed by the prediction circuit. If an error occurs in the internal carries in the arithmetic unit, this will affect the actual check symbol, but it will not affect the predicted check symbol, as the latter uses internal carries which are generated separately; therefore the actual check symbol and the predicted check symbol will not match and the error can be detected. The only case where an error affecting the internal carries cannot be detected arises if both the internal carries used by the arithmetic unit and the internal carries used by the prediction circuit are affected by the same error; however, the probability of this occurring is very low.

    6. Performance

    The total delay time of the unchecked processor is the delay of reading an operand (operands) from the register file, the delay through the ALU, and the delay of writing the result back into the register file. In the checked RISC processor the total time for any operation is equal to the time needed to read the operand from the register file, the time to check the operand, the ALU time, the time to check the result, and the time to write the result back into the register file. In other words, the extra time needed by the checked processor is the time for checking the operand and the time for checking the result.

    To check the operand, a new check symbol must be generated and then compared with the stored check symbol of the operand; if no error is detected then the operand is transferred to the ALU and its check symbol is transferred to the prediction block. While the ALU is performing the operation on the operand, the prediction circuit is generating the predicted check symbol; the time taken by the ALU to obtain the result is about the same as the time taken by the prediction circuit to generate the predicted check symbol. If the prediction circuit requires more time than the ALU, this will not affect the overall delay, as the predicted check symbol cannot be used before the check symbol of the result obtained from the ALU has been generated. Simulation results have shown that the circuit delay through the prediction circuitry is much less than that through the ALU and Check Symbol Generator.

    From Figure(3) it is seen that the time penalty from introducing Concurrent Error Detection is t1 + t3, where t1 is the time taken by the Check Symbol Generator to generate the check symbol of the operand plus the time delay through the TSC, and t3 is the time taken by the Check Symbol Generator to generate the check symbol of the result plus the time delay through the TSC. The total penalty is therefore equal to 2 x (delay in Check Symbol Generator + delay in TSC). The Check Symbol Generator is a 32-bit mod-8 zeros counter, and its total delay is about 10 gate levels; the TSC delay is 2 gate levels. Therefore the total time penalty is about 24 gate levels. It follows that the ALU has to wait for 12 gate levels before it can start performing the operation, and the result has to wait for another 12 gate levels before it can be moved back to the register file. To eliminate the extra delay a pipeline structure is used in the RISC design: the execution of the current instruction in the ALU can be overlapped with checking the result of the previous instruction, hence the extra time required to check the result can be neglected.
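    The time penalty quoted above follows from a short calculation, written out here as a worked equation. The symbols $t_{\mathrm{CSG}}$ and $t_{\mathrm{TSC}}$ are shorthand introduced only for this illustration, denoting the delays of the Check Symbol Generator and of the TSC checker; the gate-level figures are those given in the text.

```latex
\begin{align*}
t_1 = t_3 &= t_{\mathrm{CSG}} + t_{\mathrm{TSC}} \approx 10 + 2 = 12 \text{ gate levels} \\
t_{\mathrm{penalty}} &= t_1 + t_3 = 2\,(t_{\mathrm{CSG}} + t_{\mathrm{TSC}}) \approx 24 \text{ gate levels}
\end{align*}
```

    With the pipelined arrangement the result check is overlapped with the execution of the next instruction, so under these figures the term $t_3$ no longer adds to the instruction time.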

    7. Hardware Complexity

    The penalty of implementing Concurrent Error Detection using information redundancy is not only with respect to time but also area overheads resulting from extra hardware. The extra hardware used is summarised below:

    1 - Check Symbol Register File (CSRF): This file comprises 32 registers, each 5 bits wide. The CSRF uses three busses to communicate with the other RISC units. Three separate address decoders are used in order to detect any error in reading from the wrong register, or writing to the wrong register, which could not be detected if a common address decoder were used.

    2 - Checkers: There are three checkers, one for each bus. Each checker is made of a 32-bit mod-8 zeros counter and a two-rail Totally Self-Checking (TSC) checker.

    3 - Prediction Block: This block generates the predicted check symbol for the result. It consists of a Ck1 generator (comprising the internal carry generator and 3 adders of 3 bits each), a Ck2 generator (which comprises one full adder and three inverters), and a 5-bit output latch.

    Qualitatively, the area overheads incurred are much less than those of duplication, or of an implementation of Concurrent Error Detection using the full Berger Code.

    Figure(2) Check Symbol Prediction Circuitry (blocks: Operand X, Operand Y, X CS, Y CS, Zeros Counter, Adder / Subtracter, Result from ALU, Predicted CS)

    Figure(3) Delay through RISC Processor (critical path: operand and check symbol reading time; Check Symbol Generator checking time t1, with t1 much less than t2; processing time t2; Check Symbol Generator; check symbol and result writing time)

    8. Conclusions

    The capability of Dong's Code to predict the check symbol for arithmetic and logic operations has been demonstrated. The design of a 32-bit RISC processor with Concurrent Error Detection capability, in which all the RISC processor units, such as the ALU, Register File and Control Unit, incorporate Dong's Code for error detection, has been presented. From the qualitative analysis of the error detection capability of the technique discussed in Section 2, the code detects all single errors and unidirectional errors except those which affect only the information bits and have weight equal to (m+1) or its multiples. The code can also detect some other types of errors, as shown previously in Table(1).

    References

    [1] Russell, G. and Elliot, I.D., "Design of Highly Reliable VLSI Processors Incorporating Concurrent Error Detection and Correction", Proceedings EURO ASIC 91, May 1991, Paris.
    [2] Sayers, I.L. and Russell, G., "A Unified Error Detection Scheme for ASIC Design", Chapter 15 in 'Test Techniques for VLSI and WSI Circuits', Massara, R. (Editor), Peter Peregrinus Ltd, 1989.
    [3] J.M. Berger, "A note on error detection codes for Asymmetric Channels", Information and Control, vol. 4, March 1961, pp 68-73.
    [4] H. Dong, "Modified Berger codes for detection of unidirectional errors", 12th Int. Symp. Fault-Tolerant Comp., June 1982, pp 317-320.
    [5] J. Lo, S. Thanawastien, T. R. Rao, and M. Nicolaidis, "An SFS Berger Check Prediction ALU and its Application to Self-checking Processor Designs", IEEE Trans on CAD, vol 11, no 4, April 1992, pp 525-540.
    [6] A. Maamar and G. Russell, "Checkbit Prediction using Dong's Code for Arithmetic Functions", Proc. of 3rd IEEE Int. On-line Testing Workshop, Greece, July 1997, pp 254-258.
    [7] R.P. Brent and H.T. Kung, "A Regular Layout for Parallel Adders", IEEE Trans. Computers, vol. C-31, March 1982, pp 260-264.
