___________________________________________

* Copyright © 2009 by the Consortium for Computing Sciences in Colleges. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the CCSC copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Consortium for Computing Sciences in Colleges. To copy otherwise, or to republish, requires a fee and/or specific permission.


A COMPLETE DESIGN OF A RISC PROCESSOR FOR PEDAGOGICAL PURPOSES*

Hala ElAarag
Department of Mathematics and Computer Science
Stetson University
421 N. Woodland Blvd.
DeLand, FL 32723
386-822-7554
[email protected]

ABSTRACT

In this paper we present the design of an instruction set architecture that addresses the need to give computer organization and architecture students a simple but realistic hands-on experience. To illustrate the goals of Reduced Instruction Set Computer (RISC) processors, we start with the DLX architecture and show how the RISC philosophy can be applied to reduce the DLX instruction set further. In the end, students can build a complete pipelined processor in the lab at minimal cost using TTL chips. We also recommend the chips needed to build this processor, in the hope that this will be useful to other computer organization and architecture professors who want to give their students deep insight into processor design.

1. INTRODUCTION

At our university, as at many others, we offer one course in computer organization and architecture. This course covers logic design, assembly language programming and computer organization. To give the students hands-on experience in processor design, we used a kit from Yunten Labs [1]. However, this system is a very simple 4-bit architecture with a handful of instructions. Although students enjoyed building their own processor, this was not sufficient for the instructor.

JCSC 25, 2 (December 2009)


The Essentials of Computer Organization and Architecture by Null [2] introduces assembly language through MARIE. MARIE has a very simplistic instruction set and datapath. Hennessy and Patterson's Computer Organization and Design [3], while widely used, is better suited to a university where a series of courses on computer organization is offered. It works best if the students are already familiar with logic design and assembly language, as it focuses on the design of processors. Hennessy and Patterson [3] use MIPS as the example for assembly language programming. While MIPS is a simple assembly language, it is too complicated to digest in one course that covers a broad range of computer organization topics. That was the motivation to create HRISC.

Our goal was to design an instruction set architecture that is neither as simple as MARIE nor as complex as MIPS. HRISC is realistic, yet can easily be built in a lab. In this way, we combine theory and practice and clarify concepts by having students build their own processors from scratch.

DLX is a simplified RISC architecture designed mainly for pedagogical purposes. We take DLX one step further and reduce its instruction set, based on the SPEC benchmarks, to arrive at the HRISC instruction set.

2. RELATED WORK

There are many simplified architectures designed for pedagogical purposes. Some are designed with the goal of prototyping using FPGAs [4], while others are based on Hardware Description Languages (HDLs) [5]. Most of them, however, are designed as simulators; examples are Ant32 [6] and MARS [7].

Unlike related work, we had the following goals for our processor design:
• Simple enough that students can build it physically in hardware.
• Realistic enough that it gives insight into how real processors work.
• Based on the MIPS (or DLX) architecture, so that we can make use of the myriad of simulators that have been developed over the years.

To name a few MIPS simulators, SPIM [8] is the most famous, and WebMIPS [9] illustrates some MIPS architectural features such as pipelining.

3. HRISC OVERVIEW

HRISC is a 32-bit processor based on the DLX architecture. It has the following features:
• Achieves RISC goals.
• Has a simple instruction set.
• Other desired complex instructions are implemented in software.
• Load/store architecture.
• Has a single addressing mode.
• Has two fixed-length instruction formats.
• Opcode and register designators are in the same bit fields within each instruction's encoding, for simple decoding.
• Pipelined design.
• Executes one instruction in one clock cycle.
• Hardwired control unit.

CCSC: Southeastern Conference

4. HRISC INSTRUCTION SET

In this section, we reduce the instruction set of the DLX machine. Benchmarks provided by the Standard Performance Evaluation Corporation (SPEC) are used [3]. The four benchmarks represent compilers, floating point, general integer and business data processing.

From the instruction mixes of the four benchmarks, we noticed that a small number of instructions (conditional branch, add, load and store) dominate across all four programs. If floating-point instructions are not taken into account, only 22 instructions need to be considered: those responsible for more than 2% of the execution in at least one of the benchmarks. Because the other instructions constitute a very small percentage of the instructions executed, they can be eliminated. This was done as a first step. We also noticed that some instructions have blank entries in the instruction mixes. These are the instructions that are not responsible for more than 1.5% of the execution, and whose average may appear as 1% or less because the instruction is not used by all benchmarks. These instructions were eliminated as a second step of the reduction. Although not shown here because of space limitations, the eliminated instructions can be implemented in software using the remaining instructions. From the 53 DLX instructions only 17 are left and 34 are eliminated. Hence the HRISC instruction set has the 17 instructions shown in Table 1. HRISC has two instruction formats.

Table 1: HRISC Instruction Set

Instruction type     Operation              Mnemonics
Control              Branch                 BEQZ, BNEZ
                     Jump                   JR, JALR
Arithmetic/logical   Add                    ADD, ADDI
                     Subtract               SUB
                     And Logical            AND
                     Exclusive Or           XOR
                     Shift Right Logical    SRL
                     Shift Left Logical     SLL
                     Load High Immediate    LHI
                     Set                    SEQ, SLT, SGT
Data Transfer        Load                   LW
                     Store                  SW
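The paper does not reproduce its own software replacements for the eliminated instructions. As a minimal sketch of the idea, the Python model below rebuilds three DLX instructions that are absent from Table 1 (OR, SNE and SUBI) from retained operations only. The function names and the particular identities chosen are illustrative assumptions, not the author's mapping, and the 16-bit range of an immediate is not modelled.

```python
MASK32 = 0xFFFFFFFF  # HRISC registers are 32 bits wide

# Retained HRISC operations (Table 1), modelled on 32-bit unsigned values.
def ADD(a, b):
    return (a + b) & MASK32

def ADDI(a, imm):
    return (a + imm) & MASK32  # imm is a signed Python int here

def AND(a, b):
    return a & b

def XOR(a, b):
    return a ^ b

def SEQ(a, b):
    return 1 if a == b else 0

# Hypothetical software replacements for eliminated DLX instructions.
def OR(a, b):
    # a | b == (a ^ b) + (a & b): the two terms never share a set bit,
    # so the addition cannot produce a carry.
    return ADD(XOR(a, b), AND(a, b))

def SNE(a, b):
    # "Set if not equal" is SEQ with its low bit flipped; the constant 1
    # would itself have to be placed in a register first (e.g. with ADDI).
    return XOR(SEQ(a, b), 1)

def SUBI(a, imm):
    # Subtracting an immediate is adding its negation.
    return ADDI(a, -imm)
```

Each replacement costs one or two extra instructions, which is the trade-off the reduction accepts for instructions that rarely execute.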

5. THE TABLE OF EVENTS

Table 2 shows the table of events of HRISC. The table of events serves as documentation of the steps each instruction goes through in the pipeline. It lists the pipeline stages on the vertical axis and the six instruction groups on the horizontal axis. Register Transfer Language (RTL) is used to describe the operations performed in the pipeline. The arrow used in the RTL means that destinations are loaded on the clock edge at the end of the stage. The intersection of a column and a row shows the operations performed for that instruction in that pipeline stage. All of these operations are done in parallel.

Since all the stages of a pipelined system operate at the same time, care was taken not to cause conflicts between operations in different stages of different instructions.

Figure 1 shows a complete block diagram of HRISC. The block diagram is also divided vertically into the five stages. In the IF stage, the PC can be loaded with one of three values: BTA, RS1 or PC + 1, in the case of a taken branch, a jump instruction or a non-control-transfer instruction, respectively. F1 and F2 are used to select an input to the multiplexer. They are functions of b1 and b2 (two control bits from PROM ID) and the Z line (the result of comparing RS1 with 0; if RS1 = 0 then Z = 0):

$$F_1 = b_1 \cdot b_2' \cdot Z' + b_1 \cdot b_2 \cdot Z, \qquad F_2 = b_1' \cdot b_2$$

where a prime denotes the complement.

In the ID stage, a comparator is used to compare RS1 with 0 and outputs the Z line. An adder is also used to compute the target address of the branch instruction (BTA) early in the pipeline. Registers IR2 and LINK1 take the values of IR1 and PC, respectively. The two source registers are read into registers A and B.
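The selection of the next PC value can be checked with a small truth-level model. The sketch below is written under two assumptions that the text implies but does not state outright: F1 = 1 selects BTA and F2 = 1 selects RS1, with PC + 1 chosen when both are 0. The function names are hypothetical.

```python
def pc_select(b1, b2, z):
    """Return (F1, F2) from the PROM ID bits b1, b2 and the Z line (each 0 or 1)."""
    f1 = (b1 & (1 - b2) & (1 - z)) | (b1 & b2 & z)  # b1.b2'.Z' + b1.b2.Z
    f2 = (1 - b1) & b2                              # b1'.b2
    return f1, f2

def next_pc(pc, bta, rs1, b1, b2):
    """Value loaded into the PC at the end of the IF stage."""
    z = 0 if rs1 == 0 else 1  # the paper's convention: Z = 0 when RS1 = 0
    f1, f2 = pc_select(b1, b2, z)
    if f1:
        return bta     # taken branch
    if f2:
        return rs1     # jump through a register
    return pc + 1      # every other case, including untaken branches
```

With (b1, b2) = (1, 0) the branch is taken only when RS1 is zero, and with (1, 1) only when it is non-zero, which matches the BEQZ and BNEZ entries of PROM ID.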

In the EX stage, OUT1 stores the result of the ALU instructions while OUT3 stores the result of the SET instructions.

Since both OUT1 and OUT3 are connected to OUT2, they should be chosen to be three-state output registers. OUT1 is enabled by the control line -(OUT2 <- OUT1), while OUT3 is enabled by the control line -(OUT2 <- OUT3). Registers IR3 and LINK2 take the values of IR2 and LINK1, respectively. MAR is loaded with the ALU output, while MDR1 is loaded with the contents of register B.

In the MEM stage, registers IR4, OUT2 and LINK3 are loaded with the contents of IR3, OUT1 or OUT3, and LINK2, respectively. For a load, MDR2 is loaded with the contents of data memory[MAR]; for a store, data memory[MAR] is loaded with the contents of MDR1.

The WB stage writes the register file with the contents of OUT2, LINK3 or MDR2. Only one of these registers is enabled by the corresponding control line, and the write operation is performed.
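The way the IR and LINK registers hand an instruction and its return address down the pipeline can be sketched as a single clock-edge function. This is a simplified model under stated assumptions: only the PC, IR1 to IR4 and LINK1 to LINK3 are tracked, no branch or jump is taken, and the names `clock_edge` and `run` are hypothetical.

```python
def clock_edge(state, fetched):
    """Return the register values after one clock edge.

    Every right-hand side reads the OLD state, so the transfers behave as
    if they all happened in parallel, as in the RTL of the table of events.
    """
    return {
        "PC": state["PC"] + 1,   # assumes a non-control-transfer instruction
        "IR1": fetched,          # IF: instruction just fetched
        "IR2": state["IR1"], "LINK1": state["PC"],     # ID
        "IR3": state["IR2"], "LINK2": state["LINK1"],  # EX
        "IR4": state["IR3"], "LINK3": state["LINK2"],  # MEM
    }

def run(instructions):
    """Feed one instruction per clock edge into an initially empty pipeline."""
    names = ["IR1", "IR2", "IR3", "IR4", "LINK1", "LINK2", "LINK3"]
    state = {"PC": 0, **{name: None for name in names}}
    for instruction in instructions:
        state = clock_edge(state, instruction)
    return state
```

Calling `run(["i0", "i1", "i2", "i3"])` leaves `i0` in IR4 with LINK3 equal to 1, the address following `i0`, which is the value a JALR would write back as its return address.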

6. DESIGN OF HRISC

In this section we show how to build HRISC in a lab using off-the-shelf chips. Choosing chips for this design is fairly simple, because the design consists mainly of registers, PROMs, the register file and the ALU. Texas Instruments' TTL family offers a wide array of functions. The 7400 series is the most widespread family of integrated circuits (ICs). A list of the chips that could be used is shown in Table 3.

Other components could be substituted if they provide the same functionality. Most recent 7400 series parts are fabricated in CMOS. For example, the 74HCT258, a high-speed CMOS part with TTL-compatible input thresholds, is functionally similar to the TTL part 74S258. The CMOS chips consume significantly less power than the TTL ones. Also, instead of the 8-bit 374 registers there are 16-bit registers such as the 74FCT16374 and 32-bit registers such as the 74LVTH32374, which could greatly reduce the number of chips used. If 8-bit registers like the ones suggested in Table 3 are used to build the 32-bit processor, the total number of chips is 159. To build this processor in the lab, we suggest two options: either build an 8-bit version of HRISC or use the equivalent 32-bit registers.
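The effect of register width on the chip count is simple arithmetic, sketched below. The register list is gathered from the names used in Section 5 and is an assumption about what Table 3 assigns to '374 parts; the sketch does not attempt to reproduce the total of 159, which also depends on the multiplexers, buffers and other parts in the block diagram.

```python
def chips_needed(datapath_bits, bits_per_chip):
    """Number of identical chips needed to span a datapath (ceiling division)."""
    return -(-datapath_bits // bits_per_chip)

# Registers named in Section 5 (assumed to be the ones built from '374 parts).
REGISTERS_374 = ["IR1", "IR2", "IR3", "IR4", "A", "B",
                 "OUT1", "OUT2", "OUT3", "LINK1", "LINK2", "LINK3",
                 "MDR1", "MDR2", "MAR"]

def register_chip_total(datapath_bits, bits_per_chip):
    """Chips spent on the pipeline registers alone."""
    return len(REGISTERS_374) * chips_needed(datapath_bits, bits_per_chip)
```

Under these assumptions, `register_chip_total(32, 8)` gives 60 chips, while both options suggested above, `register_chip_total(8, 8)` and `register_chip_total(32, 32)`, give 15. The same function shows that a 32-bit ALU needs `chips_needed(32, 4)`, that is 8, of the 4-bit ALU slices.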

7. MICROPROGRAM AND HARDWIRED CONTROL UNIT

We then design the microinstruction format of each PROM and the possible entries for each field. Because of space limitations, we show only the simplest PROM, PROM ID:

PROM ID

b1,b2   Instruction
10      BEQZ
11      BNEZ
01      JR, JALR
00      otherwise
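The contents of PROM ID amount to a four-way lookup, which the following sketch models. It is keyed by mnemonic (spelled as in Table 1) rather than by opcode bits, because the opcode assignment is not reproduced here; the names `PROM_ID` and `prom_id` are hypothetical.

```python
# (b1, b2) for the instructions that PROM ID singles out.
PROM_ID = {
    "BEQZ": (1, 0),
    "BNEZ": (1, 1),
    "JR":   (0, 1),
    "JALR": (0, 1),
}

def prom_id(mnemonic):
    """Return the (b1, b2) control bits; every other instruction gets (0, 0)."""
    return PROM_ID.get(mnemonic, (0, 0))
```

Only 4 of the 17 instructions produce a non-zero output, which illustrates why a few gates can stand in for the PROM.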

Since most entries of the PROMs turn out to be don't cares, and since they have very few control lines, it is better to hardwire the control unit. This leads to lower cost and higher speed. We now illustrate how the HRISC control unit could be hardwired.

Each instruction has a 6-bit opcode whose bits are named a, b, c, d, e and f. To simplify the job, opcodes can be assigned such that each bit of the opcode decides which group the instruction belongs to. That is, bit "a" decides whether the instruction is R-type or I-type, "b" decides whether it is a memory or a control instruction, bits "c" and "d" differentiate between the memory/control instructions, "e" decides whether an ALU R-type instruction is a shift instruction, and "f" differentiates between ALU and SET instructions. Table 4 shows the opcode bit configuration of the instructions.

PROM ID has two control lines, b1 and b2, which can be hardwired as follows:

We similarly show the hardwiring for the other PROMs: PROM EX, PROM MEM and PROM WB.

CONCLUSION

Implementing processors as part of a computer organization course contributes positively to the learning process. Students obtain an insight that they cannot get through theory or simulation. Many students commented on how building their own processor using actual hardware gave them invaluable experience and revealed many intricacies that would otherwise have remained ambiguous. In this paper, we presented a hardware implementation of a pipelined processor that can be used within the scope of one computer organization course.

REFERENCES

1. Computer System Architecture Lab, http://yuntenlabs.com/csalab.htm

2. Linda Null, The Essentials of Computer Organization and Architecture, Jones and Bartlett Publishing, 2006.

3. Patterson and Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 2007.

4. M. Holland, J. Harris, and S. Hauck, "Harnessing FPGAs for Computer Architecture Education", Int. Conf. on Microelectronic Systems Education, June 2003.

5. Nestor J., "Teaching Computer Organization with HDLs: An Incremental Approach", Proceedings of the 2005 IEEE Int. Conf. on Microelectronic Systems Education.

6. Ellard et al., "On the design of a new CPU architecture for pedagogical purposes", Proceedings of the 2002 Workshop on Computer Architecture Education, held in conjunction with the 29th International Symposium on Computer Architecture, May 2002.

7. Vollmar K., "MARS: an education-oriented assembly language simulator", Proceedings of the 37th SIGCSE Technical Symposium on Computer Science Education, March 2006.

8. Larus, J., SPIM: A MIPS32 simulator, http://www.cs.wisc.edu/~larus/spim.html

9. Branovic, I., Giorgi, R. and Martinelli, E., WebMIPS: A New Web-Based MIPS Simulation Environment for Computer Architecture Education, Workshop on Computer Architecture Education, 31st International Symposium on Computer Architecture, Munich, Germany, 2004.

Table 3: Suggested Chips for HRISC

Name                                   Type        Description
Instruction Registers (IR1-IR4)        74AS374     Octal Register with three-state outputs
Program Counter                        74AS575     Octal D-Type Flip-Flop with Synchronous Clear, three-state outputs
4:1 Multiplexer                        74AS253     Dual 4-line to 1-line Data Selector/Multiplexer with three-state outputs
2:1 Multiplexer                        74S258      Quad 2-line to 1-line Data Selector/Multiplexer with inverted three-state outputs
Adder                                  74S283      4-bit Binary Full Adder
Register File                          74ASC3103   16 x 8 edge-triggered 3-port register file
ALU                                    74AS181     4-bit Arithmetic Logic Unit and Function Generator
Carry Lookahead                        74AS882     32-bit Lookahead Carry Generator
Buffer                                 74AS244     Octal Buffer with noninverted three-state outputs
PROMs                                  TBP18S030   256 bits (32 words by 8 bits) Programmable Read-Only Memories
Registers A, B, OUT, LINK, MDR, MAR    74S374      Octal Register with three-state outputs
