
© 2010 Altera Corporation - Public

Lutiac – Small Soft Processors for Small Programs

David Galloway and David Lewis
November 18, 2010


ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off. and Altera marks in and outside the U.S.


Introduction

• Lutiac is an experimental soft processor
• Designed for very small programs
    • roughly 200 instructions
    • roughly 200 words of data
• Take a drastic step to reduce the size of the processor
• Measure its area and speed
• Compare to NIOS II


Typical Microprocessor

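[Figure: block diagram of a typical microprocessor. Labeled blocks: PC with +1 incrementer, Instruction Memory, Decoder (to control points), A registers, B registers, ALU, with connections from and to the outside world.]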


Typical Microprocessor

• Typical Microprocessor consists of:
    • data path (registers, ALU, ...)
    • controller (PC, instruction memory, decoder)
• Data path has control inputs
    • register file read addresses
    • register file write address
    • register file write enable
    • instruction is add/subtract/and/or/copy/...


Control Inputs

• Control inputs are driven from the decoder
• Decoder driven from current instruction
• Current instruction determined by program counter
• If instruction memory never changes:
    • current instruction is a constant function of the program counter
    • so control inputs depend entirely on the value of the program counter


Control Inputs Are Function of PC

• If we have small programs (≤ 64 total instructions)
    • program counter only needs 6 bits
• Each control input is a function of 6 PC bits
    • could be replaced by a 6-LUT
• Entire decoder is a set of 6-LUTs
• Instruction memory isn’t needed at all, and can be removed
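The reduction above can be sketched in Python: for a fixed program of at most 64 instructions, each control signal becomes a 64-entry truth table indexed by the 6-bit PC, which is what a single 6-input LUT holds. The three-instruction program, the signal names (`reg_write_enable`, `alu_is_add`) and the helpers `lut_mask` and `read_lut` are hypothetical, chosen only for illustration; they are not Lutiac's actual control points.

```python
# Hypothetical fixed program: one dict of control-signal values per PC value.
PROGRAM = [
    {"reg_write_enable": 1, "alu_is_add": 1},  # pc = 0
    {"reg_write_enable": 0, "alu_is_add": 0},  # pc = 1
    {"reg_write_enable": 1, "alu_is_add": 0},  # pc = 2
]

PC_BITS = 6  # enough for programs of up to 64 instructions


def lut_mask(program, signal):
    """Return the 64-bit truth table of one control signal as a function of pc.

    Bit i of the mask is the signal's value when pc == i. Unused pc values
    are set to 0 here (an assumption; in hardware they are don't-cares).
    """
    if len(program) > 2 ** PC_BITS:
        raise ValueError("program does not fit in a 6-bit program counter")
    mask = 0
    for pc, controls in enumerate(program):
        if controls[signal]:
            mask |= 1 << pc
    return mask


def read_lut(mask, pc):
    """Evaluate the LUT: select bit `pc` of the truth table."""
    return (mask >> pc) & 1


# One LUT mask per control signal; together they play the role of the decoder.
masks = {name: lut_mask(PROGRAM, name) for name in PROGRAM[0]}
for name, mask in masks.items():
    print(f"{name}: 64'h{mask:016x}")
```

Changing the program changes only the masks, which is why the circuit has to be re-synthesized whenever the program changes.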


Drastic Step - Delete Instruction Memory

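[Figure: the typical microprocessor block diagram (PC with +1 incrementer, Instruction Memory, Decoder to control points, A registers, B registers, ALU, outside-world ports) with an X marked on it; per the slide title, the instruction memory is the block being deleted.]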


Lutiac

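[Figure: Lutiac block diagram. Labeled blocks: PC with +1 incrementer, Decoder (to control points), A registers, B registers, ALU, with connections from and to the outside world. There is no instruction memory.]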


Another Way to Think About It

At the point in a normal soft processor where the instruction is read from the instruction memory:

    instruction = instruction_memory[pc];
    if (instruction is this) do this;
    if (instruction is that) do that;
    ...

Replace by a case statement based on the pc:

    case (pc)
        0: do this;
        1: do that;
        2: do the other thing;
        ...


Lutiac Implementation

• Built a very simple prototype 16-bit processor that uses hard-wired programs instead of an instruction memory
• 3 stage pipeline
    • decode: sets read addresses on register file
    • execute: computes results, sets up register file writes
    • write back: register file write
• One cycle per instruction


Lutiac Implementation

• No data memory, just registers
    • no fixed instruction format, so no hard limit on number of registers
• One input port from outside world, one output port
• Simple assembler converts my_program.s file into an equivalent Verilog processor description
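As a rough illustration of what such an assembler might do, the Python sketch below turns a few lines of assembly into a Verilog-style `case (pc)` fragment. The three-operand syntax, the `#` comment character, the register names and the shape of the emitted text are all assumptions made for this example; the slides do not show the real assembler's input or output, and the fragment ignores the pipeline, branches and the I/O ports.

```python
# Hypothetical mapping from mnemonics to Verilog operators.
OPERATORS = {"add": "+", "sub": "-", "and": "&", "or": "|"}


def assemble(source):
    """Translate hypothetical 3-operand assembly into a Verilog-style fragment.

    Each instruction becomes one arm of a case statement selected by pc, so
    the program ends up in logic rather than in an instruction memory.
    """
    lines = ["always @(posedge clk) begin", "  case (pc)"]
    pc = 0
    for raw in source.splitlines():
        text = raw.split("#")[0].strip()  # drop comments and surrounding blanks
        if not text:
            continue
        mnemonic, operands = text.split(None, 1)
        dst, a, b = [name.strip() for name in operands.split(",")]
        lines.append(f"    {pc}: {dst} <= {a} {OPERATORS[mnemonic]} {b};")
        pc += 1
    lines += ["  endcase", "  pc <= pc + 1;", "end"]
    return "\n".join(lines)


SOURCE = """
add r1, r2, r3   # r1 = r2 + r3
or  r4, r1, r2   # r4 = r1 | r2
"""

verilog = assemble(SOURCE)
print(verilog)
```

Because register names pass straight through to the generated text, nothing in this scheme limits how many registers a program may name, which matches the "no fixed instruction format" point above.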


Experiments

• Measure size and speed of Lutiac, varying:
    • number of different kinds of instructions in the program
    • size of the program
    • number of registers used
• Used Quartus 8.0 (2 years ago now)
• Stratix IV chips of various sizes, fastest speed grade
• Each Stratix IV LAB contains 20 FFs + roughly 10 6-LUTs
• Some LABs can be re-configured as 640 bit RAMs
    • known as “MLABs”
• Will compare to NIOS II at the end, but for now, remember that a medium sized NIOS II uses 58 LABs and 11 M9K RAMs


Lutiac Size vs. Instruction Mix

Number  New Type
1       read_data_port
2       add
3       branches
4       copy
5       or
6       asr
7       mul
8       sub
9       and
10      mul_acc

Each program contains 64 random instructions, chosen from the allowed instruction types


Fmax vs. Instruction Mix

Number  New Type
1       read_data_port
2       add
3       branches
4       copy
5       or
6       asr
7       mul
8       sub
9       and
10      mul_acc


Effect of Program Size

Size grows linearly as program size increases beyond 64 instructions, roughly 1 LAB for every 20 additional instructions
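The linear trend can be written as a one-line model. The sketch below assumes a 14 LAB starting point for a 64-instruction program, a figure taken from the 64-instruction, 32-register Lutiac row of the NIOS II comparison table; the function name and its parameters are hypothetical. It models only growth due to program size, not the extra MLABs needed for more registers.

```python
def estimated_labs(instructions, base_labs=14.0, base_instructions=64,
                   instructions_per_lab=20):
    """Estimate LAB count from program size with the slide's linear trend.

    Roughly 1 additional LAB for every 20 instructions beyond 64. The
    14 LAB baseline is an assumption taken from the comparison table.
    """
    extra = max(0, instructions - base_instructions)
    return base_labs + extra / instructions_per_lab


for n in (64, 128, 264, 512):
    print(f"{n:4d} instructions -> about {estimated_labs(n):.1f} LABs")
```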


Effect of Number of Registers

Very large Lutiac (512 random instructions) grows by the number of MLABs needed to hold additional registers

Would save area if we used M9Ks instead of MLABs once we needed more than 96 16-bit registers
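A simple bit-count estimate shows how register count translates into MLABs, using the 640 bits per MLAB quoted on the Experiments slide. This is only a lower bound for a single copy of the register file: the supported MLAB width/depth modes and the number of read ports (the diagrams show separate A and B registers) are not modeled, and the helper name `mlabs_for_registers` is hypothetical.

```python
MLAB_BITS = 640  # capacity of one MLAB, as stated on the Experiments slide


def mlabs_for_registers(num_registers, width_bits=16):
    """Lower bound on MLABs needed to store the registers, by bit count only."""
    total_bits = num_registers * width_bits
    return -(-total_bits // MLAB_BITS)  # integer ceiling division


for regs in (32, 96, 128):
    print(f"{regs:3d} registers -> at least {mlabs_for_registers(regs)} MLAB(s)")
```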


Scalability of Multiple Lutiac Cores

• Chained N identical 64 instruction Lutiac cores together
    • LABs grow by 14.5 per core
    • Fmax drops as Quartus placement worsens
    • Ran out of DSP blocks above 256 cores
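The resource totals for a chain of cores follow directly from the per-core figures. The sketch below uses 14.5 LABs per core from this slide and four 18x18 multipliers per core from the NIOS II comparison slides; the function name is hypothetical, and the capacity of any particular Stratix IV device is deliberately not encoded here.

```python
LABS_PER_CORE = 14.5        # from this slide
MULTIPLIERS_PER_CORE = 4    # 18x18 multipliers per Lutiac, per the comparison table


def chain_resources(cores):
    """Total LABs and 18x18 multipliers for `cores` chained Lutiac cores."""
    return {
        "labs": cores * LABS_PER_CORE,
        "mult_18x18": cores * MULTIPLIERS_PER_CORE,
    }


for n in (1, 64, 256):
    r = chain_resources(n)
    print(f"{n:3d} cores: {r['labs']:.1f} LABs, {r['mult_18x18']} 18x18 multipliers")
```

At 256 cores this comes to 3712 LABs and 1024 multipliers, which illustrates why the multipliers, rather than the LABs, can be the resource that runs out first.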


Comparison to NIOS II

• Very inexact
    • NIOS II is 32 bits, Lutiac is 16 bits
    • NIOS II also has memory interfaces, caches, traps, ...
• Configure NIOS II systems with 4K bytes of RAM
    • allows up to 1K words of instructions or data
• Lutiac has no RAM, all instructions and data in MLABs
• Lutiac and NIOS II both use four 18x18 multipliers (Multiplier/Accumulate mode)



Comparison to NIOS II

Stratix IV NIOS II

Version  Comment   LABs  M9K  18x18s  Cycles/Ins.  Fmax (MHz)
NIOS/e   smallest  37    6    0       5            368
NIOS/s   normal    58    11   4       1.?          235
NIOS/f   fastest   84    16   4       1.?          281

Stratix IV 16-bit Lutiac

Instructions  Registers  LABs  M9K  18x18s  Cycles/Ins.  Fmax (MHz)
64            32         14    0    4       1            198
512           128        48    0    4       1            197


Comparison to NIOS II

• Back of the envelope guess (± factor of 2x)
• Un-optimized 32-bit Lutiac is nearly twice the size of a 16-bit Lutiac (25 LABs); .75 the speed (177 MHz)
• 32-bit Lutiac vs. NIOS II/s
    • speed ratio = (177 / 235)
    • area ratio of Lutiac to NIOS II/s: (25 LABs + DSP) / (58 LABs + 11 M9K RAMs + DSP) = .3
• 32-bit Lutiac vs. NIOS II/s throughput/area
    • (177/235) / .3 = 2.5x
• 32-bit Lutiac vs. NIOS II/e throughput/area
    • NIOS II/e is smallest NIOS, but isn’t pipelined, so has 5 cycles/instruction
    • (177/368 * 5/1) / ((25 LABs + DSP) / (37 LABs + 6 M9K RAMs)) = 4.5x
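The arithmetic behind these ratios can be reproduced with a short Python sketch. The slides do not say how LABs, M9K RAMs and DSP blocks are weighted against each other, so the 0.3 area ratio is taken as given, and the area ratio for the smallest NIOS II is backed out from the quoted 4.5x result rather than computed from block counts. One cycle per instruction is assumed for the mid-size NIOS II, as the slide's own speed ratio implies (the table lists "1.?"). All variable and function names are hypothetical.

```python
def throughput(fmax_mhz, cycles_per_instruction):
    """Millions of instructions per second at a given clock and CPI."""
    return fmax_mhz / cycles_per_instruction


lutiac32 = throughput(177, 1)   # estimated 32-bit Lutiac
nios_s = throughput(235, 1)     # mid-size NIOS II; CPI of 1 is an assumption
nios_e = throughput(368, 5)     # smallest NIOS II, not pipelined

AREA_RATIO_VS_NIOS_S = 0.3      # Lutiac area / mid-size NIOS II area, from the slide

# Throughput per unit area relative to the mid-size NIOS II.
ratio_s = (lutiac32 / nios_s) / AREA_RATIO_VS_NIOS_S

# Against the smallest NIOS II, only the throughput ratio can be computed
# directly; the area ratio below is the value implied by the slide's 4.5x.
speed_ratio_e = lutiac32 / nios_e
implied_area_ratio_e = speed_ratio_e / 4.5

print(f"throughput/area vs mid-size NIOS II: {ratio_s:.2f}x")
print(f"throughput vs smallest NIOS II:      {speed_ratio_e:.2f}x")
print(f"implied area ratio vs smallest:      {implied_area_ratio_e:.2f}")
```

The first result rounds to the 2.5x on the slide; the implied area ratio of roughly 0.53 for the smallest NIOS II is a derived value, not a figure from the slides.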


Lutiac Disadvantages

• Limited to very small programs (200 instructions or so)
• Must re-synthesize circuit every time program changes
    • instruction memory replaced by LUTs
    • would need good simulation tools or a debug version of the processor that did have an instruction memory


Lutiac Advantages

• Circuit is smaller, less complex than standard soft processor
• One less stage in the pipeline
    • no instruction memory read required
• Program contents are exposed to logic synthesis
    • data path components that aren’t used will be removed by synthesis
    • circuit may be smaller and faster


Lutiac Advantages

• Flexible and powerful
    • wide range of useful instructions can be available
    • if not used by program, they will be synthesized away
    • easy to add specialized instructions if needed
• Not limited by a fixed instruction word width or encoding
    • can use as many registers as the program wants


Lutiac Advantages

• Processor self configures based on program
    • no “mega-wizard” needed
    • if multiplier/adder/etc. isn’t used, synthesis will leave it out
• Data path can adapt to the program
• Examples:
    • if program ever references a register immediately after writing to it, create a bypass register; else leave bypass register out of circuit
    • if multiplier and adder were used in parallel, create a separate copy of the register file for the multiplier; else have it share the adder’s register file
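The bypass-register example amounts to a scan of the program for a read that directly follows a write to the same register. The Python sketch below shows one way such a check could look; the `(destination, sources)` tuple representation and the function name are hypothetical, and the scan treats the program as straight-line code, ignoring branches.

```python
def needs_bypass(program):
    """Return True if any instruction reads the register written just before it.

    `program` is a list of (destination, sources) tuples in execution order.
    A destination of None means the instruction writes no register.
    """
    for (dest, _), (_, sources) in zip(program, program[1:]):
        if dest is not None and dest in sources:
            return True
    return False


# r1 is written and then read by the very next instruction: bypass needed.
with_hazard = [("r1", ("r2", "r3")), ("r4", ("r1", "r2"))]

# r1 is read only two instructions after it is written: no bypass needed.
without_hazard = [("r1", ("r2", "r3")), ("r4", ("r2", "r3")), ("r5", ("r1", "r2"))]

print(needs_bypass(with_hazard), needs_bypass(without_hazard))
```

A tool that generates the processor could run a check of this kind and emit the bypass logic only when it returns true.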


Conclusions

• For small programs, it is possible to build 16-bit soft processors using only 12-25 LABs (plus multiplier)
    • smaller and faster than smallest 32-bit NIOS II (37 LABs, 6 M9K RAMs)
    • with instructions/second on the same order as the mid-size NIOS II (58 LABs, 11 M9K RAMs)
    • size advantage over NIOS II disappears as program size approaches 1000 instructions