Second Progress Report
on
A Reconfigurable VLIW Processor System
Submitted by
PAVAN NAIK PORIKA
(Registration No: 10VL16F)
of
MASTER OF TECHNOLOGY
in
VLSI Design
Under the guidance of
Mrs Aparna.P
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA
SURATHKAL, SRINIVASNAGAR-575025
KARNATAKA, INDIA
DECEMBER 2011
Abstract
Field-Programmable Gate Arrays (FPGAs) are constantly improving in terms of performance and area, and provide a technology platform that allows fast and complex reconfigurable designs. As a result, computer architectures based on reconfigurable hardware are becoming more popular. This project covers the design and implementation of a reconfigurable very long instruction word (VLIW) processor system. The processor is implemented as a softcore in Verilog on an FPGA. This VLIW processor can exploit data-level as well as instruction-level parallelism inherent in an application and make its execution faster. More importantly, we achieve our results by saving expensive FPGA area through the sharing of resources.
Contents
Abstract
Contents
List of Figures
1 INTRODUCTION
  1.1 BASIC PROCESSOR DESIGN
  1.2 INSTRUCTION SET ARCHITECTURE
2 32-BIT RISC PROCESSOR DESIGN
  2.1 ARCHITECTURE
  2.2 INSTRUCTION SET
  2.3 RESULTS
3 TARGET FOR NEXT EVALUATION
  3.1 VLIW PROCESSOR
  3.2 CONTROL UNIT
  3.3 IMPLEMENTATION
List of Figures
1 Block Diagram of Processor
2 Instruction set architecture
3 Schematic of the 32-bit RAM
4 Schematic of the 32-bit ROM
5 Simulated result of the 32-bit RAM
6 Simulated result of the 32-bit ROM
7 Simulated result of addition program
8 Simulated result of counter program
1 INTRODUCTION
VLIW processors are among the main architectures that can exploit instruction-level parallelism (ILP) in a single-core processor. These architectures exploit ILP by issuing multiple operations in each instruction to additional functional units (FUs).
1.1 BASIC PROCESSOR DESIGN
The basic design of a single processor contains physically separate memories for program instructions and data. This implies that the width of the data bus may differ per memory type. This is especially useful for VLIW architectures, because very wide words must be issued from instruction memory. A four-stage design consisting of fetch, decode, execute, and writeback stages is used for this processor. The processor has four Arithmetic Logic Units (ALUs), two Multiplier units (MULs), one Control unit (CTRL), one Memory unit (MEM), a General-purpose Register (GR) file with 64 32-bit registers and a Branch Register (BR) file with 8 1-bit registers.
[Figure 1 labels: pipeline stages FETCH, DECODE, EXECUTE, WRITEBACK; units PC, GR, BR, CTRL, MEM, four ALUs (A), two multipliers (M); INST MEMORY and DATA MEMORY]
Figure 1: Block Diagram of Processor
Figure 1 depicts the organization of a 4-issue processor. The fetch unit fetches a VLIW instruction from the attached instruction memory and passes it on to the decode unit. In the decode stage, the instruction is split into syllables, and the register contents used as operands are fetched from the register files. The actual operations take place either in the execute unit or in one of the parallel CTRL or MEM units. ALU and MUL operations are performed in the execute stage. This stage is parameterized, so that the number of ALU and MUL functional units can be adapted. The processor should have exactly one CTRL unit and one MEM unit, so these units are designed outside the parameterized execute unit. All jump and branch operations are handled by the CTRL unit, and all data memory load and store operations are handled by the MEM unit. To ensure that all results for the GR and BR registers, the external data memory and the internal Program Counter (PC) are written at the same time for each instruction, all write activities are performed in the writeback unit.
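The effect of doing all writes in one writeback step can be illustrated with a small behavioural model. This is only a sketch in Python, not the Verilog design itself; the function name `execute_instruction` and the tuple format for operations are assumptions made for illustration.

```python
# Hypothetical model: every operation of one VLIW instruction reads the
# register state from before the instruction; results are buffered and
# committed together, as the writeback unit described above does.

def execute_instruction(regs, operations):
    """Run one instruction made of (dest, func, src_a, src_b) operations."""
    pending = []  # buffered (dest, value) pairs, not yet visible
    for dest, func, src_a, src_b in operations:
        # Operands always come from the unmodified register file.
        pending.append((dest, func(regs[src_a], regs[src_b])))
    new_regs = list(regs)
    for dest, value in pending:  # single commit step ("writeback")
        new_regs[dest] = value & 0xFFFFFFFF  # 32-bit registers
    return new_regs


def add(a, b):
    return a + b


regs = [0] * 64
regs[1], regs[2] = 5, 7
# Two operations in the same instruction: r1 <- r1 + r2 and r3 <- r1 + r2.
regs = execute_instruction(regs, [(1, add, 1, 2), (3, add, 1, 2)])
```

Both r1 and r3 end up holding 12 here, because the second operation still sees the old value of r1. If the operations were executed one after another with immediate writes, r3 would receive 19 instead.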
1.2 INSTRUCTION SET ARCHITECTURE
Each syllable in this processor is 32 bits wide and each instruction contains 4 syllables, so the default instruction size of the processor is 128 bits, as shown in Figure 2. As the processor contains 4 ALUs, every syllable can issue an ALU operation; the other operations are distributed among the syllables. Syllable 0 can issue CTRL operations, syllables 1 and 2 can issue MUL operations and syllable 3 can issue MEM operations.
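The instruction layout described above can be sketched as follows. The placement of syllable 0 in the least-significant 32 bits is an assumption (the text does not state the bit order), and the names `split_syllables`, `can_issue` and `ALLOWED` are hypothetical.

```python
SYLLABLE_BITS = 32
SYLLABLES_PER_INSTRUCTION = 4

# Operation classes each syllable may issue, as listed in the text above.
ALLOWED = {
    0: {"ALU", "CTRL"},
    1: {"ALU", "MUL"},
    2: {"ALU", "MUL"},
    3: {"ALU", "MEM"},
}


def split_syllables(instruction):
    """Split a 128-bit instruction into four 32-bit syllables.

    Assumption: syllable 0 sits in the least-significant 32 bits.
    """
    total_bits = SYLLABLE_BITS * SYLLABLES_PER_INSTRUCTION
    if not 0 <= instruction < (1 << total_bits):
        raise ValueError("instruction must fit in 128 bits")
    mask = (1 << SYLLABLE_BITS) - 1
    return [(instruction >> (SYLLABLE_BITS * i)) & mask
            for i in range(SYLLABLES_PER_INSTRUCTION)]


def can_issue(syllable_index, op_class):
    """Check whether a syllable position may carry the given operation class."""
    return op_class in ALLOWED[syllable_index]


word = 0x00000003_00000002_00000001_00000000
syllables = split_syllables(word)
```

A scheduler or assembler for such a machine would use a check like `can_issue` to place, for example, a MUL operation only in syllable 1 or 2.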
Figure 2: Instruction set architecture
2 32-BIT RISC PROCESSOR DESIGN
2.1 ARCHITECTURE
A 32-bit RISC processor is designed. It contains a 256 × 32 RAM, a 128 × 32 ROM, 64 general-purpose registers, an ALU which performs operations on 32-bit data, and a control unit which drives all control signals, such as chip select, read, write and branch. The decoder divides each instruction into opcode, mode of operation and registers. By reading the opcode and mode of operation, it selects the operation in the execution unit, and the control unit generates signals such as chip select, read and write.
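The decoder's field split can be sketched in Python. The report does not give the exact bit layout, so the one used here is an assumption inferred from the 7-bit machine codes and 2-bit modes in the instruction tables and from the ADD encoding listed in the results section: 7-bit opcode, 2-bit mode, three 6-bit register numbers (64 registers) and 5 unused bits, most-significant bit first.

```python
def decode(word):
    """Split a 32-bit instruction word into its fields.

    Assumed layout (MSB first): 7-bit opcode, 2-bit mode, then 23 bits
    holding three 6-bit register numbers and 5 unused bits.
    """
    opcode = (word >> 25) & 0x7F
    mode = (word >> 23) & 0x3
    reg_a = (word >> 17) & 0x3F
    reg_b = (word >> 11) & 0x3F
    reg_c = (word >> 5) & 0x3F
    return {"opcode": opcode, "mode": mode, "regs": (reg_a, reg_b, reg_c)}


# ADD reg[1] reg[2] reg[3], using the encoding listed in the results section.
fields = decode(0b00000010000000100001000001100000)
```

Under this assumed layout the word decodes to opcode 0000001 (ADD), mode 00 and registers 1, 2 and 3. Immediate-mode instructions would need a different split of the lower 23 bits, which is not modelled here.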
The RISC processor contains a dedicated instruction memory and a branch memory. The instruction memory contains the machine code of the program, and the program counter (PC) increments after the execution of each instruction so that the next instruction is fetched and executed. The branch memory contains the branch addresses. When a branch instruction is decoded, the branch address is copied to the PC so that execution continues at the specified branch address.
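The PC behaviour described above can be sketched as a small fetch loop. The program representation (a list of mnemonic and operand pairs) and the idea that a JUMP operand indexes the branch memory are assumptions made for this sketch; the names `run` and `branch_memory` are hypothetical.

```python
def run(program, branch_memory, max_steps=100):
    """Return the sequence of PC values visited.

    `program` is a list of (mnemonic, operand) pairs; for "JUMP" the operand
    is an index into `branch_memory`. Both formats are assumptions made for
    this sketch.
    """
    pc = 0
    trace = []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        trace.append(pc)
        mnemonic, operand = program[pc]
        if mnemonic == "END":
            break
        if mnemonic == "JUMP":
            pc = branch_memory[operand]  # branch address copied into the PC
        else:
            pc += 1  # PC increments after each ordinary instruction
    return trace


# INC stands in for any non-branch instruction.
program = [("INC", None), ("JUMP", 0), ("INC", None), ("END", None)]
trace = run(program, branch_memory=[3])
```

In this run the instruction at address 2 is skipped: the PC visits addresses 0 and 1, is loaded with 3 from the branch memory, and stops at END.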
The schematic of the 32-bit RAM, shown in Figure 3, contains two data in/out ports and two address ports, so that two data words can be read or written at a time. It has separate control signals for each port. The register memory uses the same architecture as the RAM.
Figure 3: Schematic of the 32-bit RAM
The schematic of the 32-bit ROM, shown in Figure 4, contains two data out ports and two address ports, so that two data words can be read at a time. It has separate control signals for each port.
Figure 4: Schematic of the 32-bit ROM
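The two-port memories described above can be modelled behaviourally as follows. This is a sketch only: the class name `DualPortRAM`, the port tuple format, the read-before-write behaviour and the rule that port B wins when both ports write the same address are all assumptions, since the report does not specify them.

```python
class DualPortRAM:
    """256 x 32-bit memory with two independent ports (behavioural model)."""

    def __init__(self, depth=256, width=32):
        self.mask = (1 << width) - 1
        self.cells = [0] * depth

    def cycle(self, port_a, port_b):
        """Apply one access per port and return the two read values.

        Each port is (address, write_enable, data_in). Reads return the
        value stored before this cycle; if both ports write the same
        address, port B wins. Both choices are assumptions of this sketch.
        """
        reads = tuple(self.cells[addr] for addr, _, _ in (port_a, port_b))
        for addr, write_enable, data_in in (port_a, port_b):
            if write_enable:
                self.cells[addr] = data_in & self.mask
        return reads


ram = DualPortRAM()
ram.cycle((10, True, 0x1234), (11, True, 0xABCD))   # two writes at once
values = ram.cycle((10, False, 0), (11, False, 0))  # two reads at once
```

A ROM model would follow the same pattern with the write path removed and a depth of 128.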
2.2 INSTRUCTION SET
The processor has 25 different instructions for arithmetic, logical, branch and data transfer operations, with 3 different modes. Mode0 instructions use register-register logic, in which all operations are performed on registers. Mode1 instructions use immediate mode, in which all operations are performed on direct data. Mode2 is used for branch instructions. The instructions of the processor are shown in Tables 1 to 4.
OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
ADD      0000001        00     XXX........XXX
ADDI     0000010        01     XXX........XXX
SUB      0000011        00     XXX........XXX
SUBI     0000100        01     XXX........XXX
INC      0000101        00     XXX........XXX
DEC      0000110        00     XXX........XXX
MUL      0000111        00     XXX........XXX
MULI     0001000        01     XXX........XXX
DIV      0001001        00     XXX........XXX
DIVI     0001010        01     XXX........XXX
Table 1: Arithmetic Instructions
OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
XCHANG   0010100        00     XXX........XXX
MOV      0010000        00     XXX........XXX
MOVI     0010001        01     XXX........XXX
PUSH     0010010        00     XXX........XXX
POP      0010100        00     XXX........XXX
Table 2: Data transfer Instructions
OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
JUMP     0110000        10     XXX........XXX
JUMPI    0110001        10     XXX........XXX
Table 3: Branch Instructions
OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
ASHFTL   0100000        00     XXX........XXX
ASHFTR   0100001        00     XXX........XXX
LSHFTL   0100010        00     XXX........XXX
LSHFTR   0100011        00     XXX........XXX
NOT      0100100        00     XXX........XXX
NOTI     0100101        01     XXX........XXX
NAND     0100110        00     XXX........XXX
NANDI    0100111        01     XXX........XXX
NOR      0101000        00     XXX........XXX
NORI     0101001        01     XXX........XXX
Table 4: Logical Operation Instructions
2.3 RESULTS
The simulated result of the 32-bit RAM is shown in Figure 5.
Figure 5: Simulated result of the 32-bit RAM
The simulated result of the 32-bit ROM is shown in Figure 6.
Figure 6: Simulated result of the 32-bit ROM
The simulated result of an addition program is shown in Figure 7. The program and its machine code are:
MOVI reg[1] 15'd2           (00100010100001000000000000001000)
MOVI reg[2] 15'd1           (00100010100001000000000000000100)
ADD reg[1] reg[2] reg[3]    (00000010000000100001000001100000)
END                         (00000000000000000000000000000000)
Figure 7: Simulated result of addition program
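The intended behaviour of the addition program can be checked with a tiny interpreter. This is a sketch under assumptions, not the simulated Verilog design: it models only MOVI, ADD and END, and it assumes that the third register operand of ADD is the destination. The function name `run_program` is hypothetical.

```python
def run_program(program, num_regs=64):
    """Execute a list of instruction tuples and return the register file."""
    regs = [0] * num_regs
    for instr in program:
        op = instr[0]
        if op == "MOVI":            # Mode1: load an immediate value
            _, dest, imm = instr
            regs[dest] = imm
        elif op == "ADD":           # Mode0: register-register addition
            _, src_a, src_b, dest = instr  # assumed operand order
            regs[dest] = (regs[src_a] + regs[src_b]) & 0xFFFFFFFF
        elif op == "END":
            break
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs


regs = run_program([
    ("MOVI", 1, 2),      # MOVI reg[1] 15'd2
    ("MOVI", 2, 1),      # MOVI reg[2] 15'd1
    ("ADD", 1, 2, 3),    # ADD reg[1] reg[2] reg[3]
    ("END",),
])
```

Under these assumptions reg[3] holds 3 after the program ends, with reg[1] and reg[2] unchanged at 2 and 1.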
The simulated result of a counter program is shown in Figure 8. The program and its machine code are:
MOVI reg[2] 15'd10          (00100010100001000000000000101000)
MOVI reg[3] 15'd0           (00100010100001000000000000000000)
MOVB breg[0] 6'd3           (00101010100001100000000000000000)
INC reg[1]                  (00001010000000100000100000000000)
JUMPC reg[3] 15'b0          (01100011000000000000100001000000)
END                         (00000000000000000000000000000000)
Figure 8: Simulated result of counter program
3 TARGET FOR NEXT EVALUATION
The targets for next evaluation are as follows:
3.1 VLIW PROCESSOR
A 4-issue VLIW processor is to be designed, with an instruction length of 128 bits containing 4 operations. The execution unit contains 4 ALUs and 2 multipliers. As the instruction length is 128 bits, the decoder should divide each instruction into 4 smaller 32-bit instructions so that the operations execute separately and simultaneously. The RAM and ROM are to be designed so that 8 data words can be read from or written to the memory at a time.
3.2 CONTROL UNIT
A dedicated control unit is to be designed. This control unit has to generate the control signals that manage all ALUs, multipliers, general-purpose registers and branch registers.
3.3 IMPLEMENTATION
After the VLIW processor has been designed, its performance will be compared with that of the RISC processor by implementing both processors on an FPGA board.