Project Report 2

Second Progress Report

on

A Reconfigurable VLIW Processor System

Submitted by

PAVAN NAIK PORIKA

(Registration No: 10VL16F)

of

MASTER OF TECHNOLOGY

in

VLSI Design

Under the guidance of

Mrs Aparna.P

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA

SURATHKAL, SRINIVASNAGAR-575025

KARNATAKA, INDIA

DECEMBER 2011

Abstract

Field-Programmable Gate Arrays (FPGAs) are constantly improving in terms of performance and area, and provide a technology platform that allows fast and complex reconfigurable designs. As a result, computer architectures based on reconfigurable hardware are becoming more popular. This project covers the design and implementation of a reconfigurable very long instruction word (VLIW) processor system. The processor is implemented as a softcore in Verilog on a field-programmable gate array (FPGA). The VLIW processor can exploit the data-level as well as the instruction-level parallelism inherent in an application to make its execution faster. More importantly, we achieve our results while saving expensive FPGA area through the sharing of resources.


Contents

Abstract
Contents
List of Figures
1 INTRODUCTION
1.1 BASIC PROCESSOR DESIGN
1.2 INSTRUCTION SET ARCHITECTURE
2 32-BIT RISC PROCESSOR DESIGN
2.1 ARCHITECTURE
2.2 INSTRUCTION SET
2.3 RESULTS
3 TARGET FOR NEXT EVALUATION
3.1 VLIW PROCESSOR
3.2 CONTROL UNIT
3.3 IMPLEMENTATION


List of Figures

1 Block Diagram of Processor
2 Instruction set architecture
3 Schematic of the 32-bit RAM
4 Schematic of the 32-bit ROM
5 Simulated result of the 32-bit RAM
6 Simulated result of the 32-bit ROM
7 Simulated result of addition program
8 Simulated result of counter program


1 INTRODUCTION

The VLIW processor is one of the main architectures that can exploit instruction-level parallelism (ILP) in a single-core processor. This architecture exploits ILP by issuing multiple operations per issue slot to additional functional units (FUs).

1.1 BASIC PROCESSOR DESIGN

The basic design of a single processor contains physically separate memories for program instructions and data. This implies that the width of the data bus may differ per memory type. This is especially useful for VLIW architectures, because we want to issue very wide words from instruction memory. A four-stage design consisting of fetch, decode, execute, and writeback stages is used for this processor. The processor has four Arithmetic Logic Units (ALUs), two Multiplier units (MULs), one Control unit (CTRL), one Memory unit (MEM), a General-purpose Register (GR) file with 64 32-bit registers and a Branch Register (BR) file with 8 1-bit registers.

[Block diagram: FETCH, DECODE, EXECUTE and WRITEBACK stages; PC, GR, BR, CTRL and MEM units; four ALUs (A) and two multipliers (M); instruction memory and data memory]

Figure 1: Block Diagram of Processor

Figure 1 depicts the organization of a 4-issue processor. The fetch unit fetches a VLIW instruction from the attached instruction memory and passes it on to the decode unit. In the decode stage, the instruction is split into syllables, and the register contents used as operands are fetched from the register files. The actual operations take place either in the execute unit or in one of the parallel CTRL or MEM units. ALU and MUL operations are performed in the execute stage. This stage is parametric in design, so that the number of ALU and MUL functional units can be adapted. The processor should have exactly one CTRL unit and one MEM unit, so these units are designed outside the parametric execute unit. All jump and branch operations are handled by the CTRL unit, and all data memory load and store operations are handled by the MEM unit. To ensure that all results to the GR and BR registers, the external data memory and the internal Program Counter (PC) are written at the same time per instruction, all write activities are performed in the writeback unit.

1.2 INSTRUCTION SET ARCHITECTURE

Each syllable in this processor takes 32 bits and each instruction contains 4 different syllables, so the default instruction size of the processor is 128 bits, as shown in Figure 2. As the processor contains 4 ALUs, every syllable is able to issue an ALU operation, and the other operations are distributed among the syllables. Syllable 0 is able to issue CTRL operations, syllables 1 and 2 are able to issue MUL operations and syllable 3 is able to issue MEM operations.

Figure 2: Instruction set architecture
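The syllable layout described above can be sketched in Python as a small behavioural model. This is only an illustration: the names `split_syllables`, `can_issue` and `ISSUE_CAPABILITY` are hypothetical, and placing syllable 0 in the least significant 32 bits of the instruction word is an assumption, since the report does not state the bit ordering.

```python
SYLLABLE_BITS = 32
SYLLABLES_PER_INSTRUCTION = 4

# Operation classes that each syllable may issue, as listed in the text.
ISSUE_CAPABILITY = {
    0: {"ALU", "CTRL"},
    1: {"ALU", "MUL"},
    2: {"ALU", "MUL"},
    3: {"ALU", "MEM"},
}


def split_syllables(instruction):
    """Split a 128-bit instruction into four 32-bit syllables.

    Assumption: syllable 0 occupies the least significant 32 bits.
    """
    total_bits = SYLLABLE_BITS * SYLLABLES_PER_INSTRUCTION
    if not 0 <= instruction < (1 << total_bits):
        raise ValueError("instruction must fit in 128 bits")
    mask = (1 << SYLLABLE_BITS) - 1
    return [
        (instruction >> (SYLLABLE_BITS * index)) & mask
        for index in range(SYLLABLES_PER_INSTRUCTION)
    ]


def can_issue(syllable_index, op_class):
    """Return True if the given syllable may issue this class of operation."""
    return op_class in ISSUE_CAPABILITY[syllable_index]
```

For example, `can_issue(0, "CTRL")` holds while `can_issue(3, "MUL")` does not, which mirrors the distribution of CTRL, MUL and MEM operations over the syllables.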

2 32-BIT RISC PROCESSOR DESIGN

2.1 ARCHITECTURE

A 32-bit RISC processor is designed. It contains a 256 × 32 RAM, a 128 × 32 ROM, 64 general-purpose registers, an ALU that performs operations on 32-bit data, and a control unit that drives all control signals such as chip select, read, write and branch operations. The decoder is designed in such a way that it divides the instruction into opcode, mode of operation and registers. By reading the opcode and mode of operation, it selects the operation in the execution unit, and the control unit generates signals such as chip select, read and write.
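The field split performed by the decoder can be sketched as follows. The field widths used here (a 7-bit opcode in the most significant bits, a 2-bit mode, and 23 remaining bits for registers, immediates or branch addresses) are an assumption inferred from the instruction tables in Section 2.2 and the encodings listed in Section 2.3; the names `Decoded` and `decode` are hypothetical.

```python
from typing import NamedTuple


class Decoded(NamedTuple):
    opcode: int    # assumed bits [31:25]
    mode: int      # assumed bits [24:23]
    operands: int  # assumed bits [22:0]: registers, immediate or branch address


def decode(word):
    """Split a 32-bit instruction word into opcode, mode and operand bits."""
    if not 0 <= word < (1 << 32):
        raise ValueError("instruction word must fit in 32 bits")
    return Decoded(
        opcode=(word >> 25) & 0x7F,
        mode=(word >> 23) & 0x3,
        operands=word & 0x7FFFFF,
    )
```

Applied to the ADD word from the addition program in Section 2.3, this split yields opcode 0000001 and mode 00, which agrees with Table 1.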

The RISC processor contains a dedicated instruction memory and a branch memory. The instruction memory contains the machine code of the program, and the program counter (PC) increments after the execution of each instruction so that the next instruction is fetched and executed. The branch memory contains the branch addresses. When a branch instruction is decoded, the branch address is copied to the PC, so that execution continues from the specified branch address.
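The program counter behaviour described above can be sketched as a single update function. This is a minimal model under stated assumptions: the PC advances by one word per instruction, and the hypothetical `branch_index` argument selects which entry of the branch memory is copied into the PC, a detail the report does not specify.

```python
def next_pc(pc, is_branch, branch_memory, branch_index=0):
    """Return the address of the next instruction to fetch.

    Assumptions: one instruction per address, and a decoded branch copies
    the selected branch-memory entry into the PC.
    """
    if is_branch:
        return branch_memory[branch_index]
    return pc + 1
```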

The schematic of the 32-bit RAM, shown in Figure 3, contains two data in/out ports and two address ports, so that two data words can be read or written at a time, with separate control signals for each port. The register memory has the same architecture as the RAM.

Figure 3: Schematic of the 32-bit RAM
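A behavioural sketch of such a dual-port memory is given below. It is not the Verilog design itself: the names `PortRequest` and `DualPortRAM` are hypothetical, a single `write` flag stands in for the separate read and write signals, and the handling of two writes to the same address in one cycle (port B wins) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PortRequest:
    chip_select: bool = False
    write: bool = False  # True = write, False = read
    address: int = 0
    data_in: int = 0


class DualPortRAM:
    def __init__(self, depth=256, width=32):
        self.mask = (1 << width) - 1
        self.cells = [0] * depth

    def cycle(self, port_a, port_b):
        """Serve both ports in one cycle and return (data_out_a, data_out_b).

        Reads return the contents from before this cycle's writes. A port
        that is deselected or writing returns None. If both ports write the
        same address, port B's data is kept (assumption).
        """
        outputs = []
        for request in (port_a, port_b):
            if request.chip_select and not request.write:
                outputs.append(self.cells[request.address])
            else:
                outputs.append(None)
        for request in (port_a, port_b):
            if request.chip_select and request.write:
                self.cells[request.address] = request.data_in & self.mask
        return tuple(outputs)
```

Writing through both ports in one cycle and reading both locations back in the next cycle exercises the two independent ports.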

The schematic of the 32-bit ROM, shown in Figure 4, contains two data out ports and two address ports, so that two data words can be read at a time, with separate control signals for each port.

Figure 4: Schematic of the 32-bit ROM


2.2 INSTRUCTION SET

The processor has 25 different instructions that perform arithmetic, logical, branch and data transfer operations in 3 different modes. Mode 0 instructions are register-register instructions, in which all operations are performed on registers. Mode 1 instructions use immediate mode, in which operations are performed on direct data. Mode 2 is used for branch instructions. The instructions of the processor are shown in Tables 1, 2, 3 and 4.

OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
ADD      0000001        00     XXX........XXX
ADDI     0000010        01     XXX........XXX
SUB      0000011        00     XXX........XXX
SUBI     0000100        01     XXX........XXX
INC      0000101        00     XXX........XXX
DEC      0000110        00     XXX........XXX
MUL      0000111        00     XXX........XXX
MULI     0001000        01     XXX........XXX
DIV      0001001        00     XXX........XXX
DIVI     0001010        01     XXX........XXX

Table 1: Arithmetic Instructions

OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
XCHANG   0010100        00     XXX........XXX
MOV      0010000        00     XXX........XXX
MOVI     0010001        01     XXX........XXX
PUSH     0010010        00     XXX........XXX
POP      0010100        00     XXX........XXX

Table 2: Data transfer Instructions

OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
JUMP     0110000        10     XXX........XXX
JUMPI    0110001        10     XXX........XXX

Table 3: Branch Instructions


OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
ASHFTL   0100000        00     XXX........XXX
ASHFTR   0100001        00     XXX........XXX
LSHFTL   0100010        00     XXX........XXX
LSHFTR   0100011        00     XXX........XXX
NOT      0100100        00     XXX........XXX
NOTI     0100101        01     XXX........XXX
NAND     0100110        00     XXX........XXX
NANDI    0100111        01     XXX........XXX
NOR      0101000        00     XXX........XXX
NORI     0101001        01     XXX........XXX

Table 4: Logical Operation Instructions
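As an illustration of how the table entries map onto a 32-bit word, the sketch below packs a Mode 0 instruction with three register operands. The opcode values are taken from Table 1. The field positions (three 6-bit register indices after the opcode and mode, followed by five unused zero bits) are an assumption inferred from the ADD encoding listed in Section 2.3, and treating SUB, MUL and DIV as three-register instructions is likewise assumed. The names `OPCODES`, `MODE_REGISTER` and `encode_rrr` are hypothetical.

```python
# Opcode values from Table 1 (register-register forms only).
OPCODES = {
    "ADD": 0b0000001,
    "SUB": 0b0000011,
    "MUL": 0b0000111,
    "DIV": 0b0001001,
}
MODE_REGISTER = 0b00  # Mode 0: register-register


def encode_rrr(mnemonic, r1, r2, r3):
    """Pack a Mode 0 instruction with three 6-bit register indices.

    Assumed layout: opcode[31:25] mode[24:23] r1[22:17] r2[16:11] r3[10:5],
    with bits [4:0] left as zero.
    """
    for reg in (r1, r2, r3):
        if not 0 <= reg < 64:
            raise ValueError("register index must be in 0..63")
    word = OPCODES[mnemonic] << 25
    word |= MODE_REGISTER << 23
    word |= r1 << 17
    word |= r2 << 11
    word |= r3 << 5
    return word
```

With this layout, `format(encode_rrr("ADD", 1, 2, 3), "032b")` reproduces the ADD machine code that appears in the addition program of Section 2.3.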

2.3 RESULTS

The simulated result of the 32-bit RAM is shown in Figure 5.

Figure 5: Simulated result of the 32-bit RAM

The simulated result of the 32-bit ROM is shown in Figure 6.

Figure 6: Simulated result of the 32-bit ROM


The simulated result of an addition program is shown in Figure 7.

MOVI reg[1] 15'd2          (00100010100001000000000000001000)
MOVI reg[2] 15'd1          (00100010100001000000000000000100)
ADD reg[1] reg[2] reg[3]   (00000010000000100001000001100000)
END                        (00000000000000000000000000000000)

Figure 7: Simulated result of addition program
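The intended effect of the addition program can be checked with a small instruction-level model. This is a sketch, not the processor simulation: the `run` function and the tuple-based program format are hypothetical, and it assumes that ADD writes the sum of its first two register operands into the third.

```python
def run(program, num_registers=64):
    """Execute a list of (mnemonic, operand, ...) tuples; return the registers."""
    regs = [0] * num_registers
    for instruction in program:
        op = instruction[0]
        if op == "MOVI":
            _, dest, immediate = instruction
            regs[dest] = immediate & 0xFFFFFFFF
        elif op == "ADD":
            # Assumption: the last operand is the destination register.
            _, src_a, src_b, dest = instruction
            regs[dest] = (regs[src_a] + regs[src_b]) & 0xFFFFFFFF
        elif op == "END":
            break
        else:
            raise ValueError("unsupported instruction: " + op)
    return regs


# The addition program listed above, written as tuples.
ADDITION_PROGRAM = [
    ("MOVI", 1, 2),
    ("MOVI", 2, 1),
    ("ADD", 1, 2, 3),
    ("END",),
]
```

Under these assumptions, `run(ADDITION_PROGRAM)[3]` evaluates to 3, the sum of the two immediates loaded into reg[1] and reg[2].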

The simulated result of a counter program is shown in Figure 8.

MOVI reg[2] 15'd10     (00100010100001000000000000101000)
MOVI reg[3] 15'd0      (00100010100001000000000000000000)
MOVB breg[0] 6'd3      (00101010100001100000000000000000)
INC reg[1]             (00001010000000100000100000000000)
JUMPC reg[3] 15'b0     (01100011000000000000100001000000)
END                    (00000000000000000000000000000000)

Figure 8: Simulated result of counter program


3 TARGET FOR NEXT EVALUATION

The targets for next evaluation are as follows:

3.1 VLIW PROCESSOR

A 4-issue VLIW processor is to be designed in which each instruction is 128 bits long and contains 4 operations. The execution unit contains 4 ALUs and 2 multipliers. As the instruction length is 128 bits, the decoder should divide the 128 bits into 4 smaller instructions so that the operations execute separately and simultaneously. The RAM and ROM are to be designed so that 8 data words can be read from the memory or written into the memory at a time.

3.2 CONTROL UNIT

A special control unit is to be designed. This control unit has to generate the control signals that manage all ALUs, multipliers, general-purpose registers and branch registers.

3.3 IMPLEMENTATION

After the VLIW processor is designed, its performance will be compared with that of the RISC processor by implementing both processors on an FPGA board.


