20
- 1/20 - A Dynamically Reconfigurable Processor Architecture Masa Motomura System ULSI Development Division NEC Corporation [email protected] MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16

Processor Architecture A Dynamically Reconfigurable

  • Upload
    others

  • View
    24

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Processor Architecture A Dynamically Reconfigurable

- 1/20 -

A Dynamically ReconfigurableProcessor Architecture

Masa Motomura

System ULSI Development DivisionNEC Corporation

[email protected]

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16

Page 2: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 2/20 -

Contents

� Introduction

� DRP architecture

� DRP compiler

� DRP Application development system

� Sample applications

� Future roadmap and summary

Page 3: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 3/20 -

� Dynamically Reconfigurable Processor

=> DRP

� H/W-like performance by customized

datapath configurations using array of PEs

� S/W-like scalability by dynamic recon-

figuration of customized datapath planes

� C compiler is available

� Based on in-house high-level

synthesis tool, called Cyber

Introduction: What is DRP?

State Transition Controller

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

MemMem MemMem MemMem MemMem

MemMem MemMem MemMem MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

DRP Core (variable array size)

Array of processingelements (PEs) andmemories (Mems)

Array of processingelements (PEs) andmemories (Mems)

Datapath Plane 3

Datapath Plane 2

Dynamic Reconfiguration

Logic

Datapath Plane 1

MemoryBus

Function Unit

Page 4: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 4/20 -

DRP Architecture

� Fine-grained processor-based programmable array architecture

� Architected for stream data processing, such as NW packet, motion/still picture, and wireless data streams, etc.

� A simple sequencer� A simple sequencer

� Array of configurabledata memories

� Array of configurabledata memories

State Transition Controller

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

MemMem MemMem MemMem MemMem

MemMem MemMem MemMem MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

DRP Core (variable array size)

� Array of byte-orientedprocessing elements

� Fully programmable inter-PE wiring resources

� Array of byte-orientedprocessing elements

� Fully programmable inter-PE wiring resources

Page 5: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 5/20 -

DRP Architecture

� Finite state machine thatcontrols dynamic datapathreconfiguration

� Finite state machine thatcontrols dynamic datapathreconfiguration

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

MemMem MemMem MemMem MemMem

MemMem MemMem MemMem MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

DRP Core (variable array size)

C0 C1 C2 C3

� Sequencer is built-in, and is decoupled from PE array

� Datapath is dynamically reconfigured during runtime

� Customized datapath/memory organization at any time

� Customized datapathplane for each state

� Customized memory structures for each data-path plane (FIFO/table/scratch-pad, etc)

� Customized datapathplane for each state

� Customized memory structures for each data-path plane (FIFO/table/scratch-pad, etc)

Dyn

amic

Rec

onfigura

tion Datapath Plane

for State C0Datapath Plane

for State C0

… for C1

… for C3… for C2

Datapath Planefor State C0

Datapath Planefor State C0

… for C1

… for C3… for C2

Page 6: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 6/20 -

Processing Element (PE)� ALU: ordinary byte arithmetic/logic operations

� DMU (data management unit): handles byte select, shift, mask, constant generation, etc., as well as bit manipulations

� An instruction dictates ALU/DMU operations and inter-PE connections

� Source/destination operands can either from/to

� its own register file

� other PEs (i.e., flow through)

� Instruction pointer (IP) is provided from STC (statetransition controller)

Dat

a_in

(8bx2

)

Dat

a_out

(8b)

Flag_in

Flag_out

Inst

ruct

ions

ALU

Reg

iste

r Fi

le

Data WireFlag WireIP

DM

U

Page 7: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 7/20 -

Computation on DRP Core

� Instruction Pointer(IP) from STCidentifies a datapath plane

PE Array

Add Sel

Add Cmp

Add Add Cmp

Sel

PE

PE ALUDMU

012

Insts.

IP = “1”

1

� When IP changes, datapath plane switches instantaneously

� PE instructions as a collection behave like an extreme VLIW

� Sequencing through instructions=> Dynamic reconfiguration

� Spatial computation with using acustomized datapath plane

Page 8: Processor Architecture A Dynamically Reconfigurable

- 8/20 -

AES

3DES

MD5

SHA-1

Compress

Data In

Control

(task selectionby descriptor)

Example: IPSec on DRP Core

Dynamic Reconfiguration

Data Out

Multiple Datapath Planes

Different datapath plane for different set of algorithmic processing

Page 9: Processor Architecture A Dynamically Reconfigurable

- 9/20 -

AES

K08

128

8 Kr

128

Mix

Colu

mn

(r<10)

SS

SS

Place & Route Place & Route

MD5

32

128

(r<64)

g

+ + + R +

X[k] T[i]

+

2222

2222

&&&& &&&& ||||

&&&& &&&& ||||

IPSEC on DRP Core - Continued

PE

Dynamic Reconfiguration

Page 10: Processor Architecture A Dynamically Reconfigurable

- 10/20 -

Time

Input Packet

Checksum

Field extract

Reassemble

Field Check

Checksum

Checksum

Checksum

Table Lookup

Filed Check

Field Check

Table Lookup

Field Check

Example: Packet Processing

DynamicReconfiguration

1 Cycle

32b, 64b,,,

Different hardware for processing respective chunk of input packet

The compiler automaticallySchedules cycle-by-cycle datapath planes, statically

The compiler automaticallySchedules cycle-by-cycle datapath planes, statically

Page 11: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 11/20 -

DRP Compiler

� High-level synthesis: generates finite state machine and associated datapathplanes from the C source code

� Modify Cyber to cope with DRP architectural features

� Mapper: maps RTL for each datapathplane to individual PEs and memories

� Place&Router: physically locate the PEs and memories, and mutually connect them

High LevelSynthesis

DatapathPlane RTLDatapathPlane RTL

Place&Router

C SourceVerilog

RTL Source(optional)

DatapathPlane RTLDatapathPlane RTLDatapathPlane RTLDatapathPlane RTLDatapathPlane RTLDatapathPlane RTL

FiniteState

Machine

FiniteState

Machine

Mapper

DRP Object Code

PE/MemoryArray Code

STCCode

Compile C source code into DRP object code

Page 12: Processor Architecture A Dynamically Reconfigurable

- 12/20 -

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

PE PE PE PE PE PE PE PE

MemMem MemMem MemMem MemMem

MemMem MemMem MemMem MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

MemMem

C0 C1 C2 C3

Datapath Planefor State C0

Datapath Planefor State C0

… for C1

… for C3… for C2

Datapath Planefor State C0

Datapath Planefor State C0

… for C1

… for C3… for C2

Programming ModelDRP-CoreC source code

� “Finite state machine with datapath planes”

� DRP is architected based on a clear picture of how C code is compiled into hardware

Bit-orientedProcessing

Bit-orientedProcessingif ((a>b)||(c!=d))

DataPath_Func_1();

else if (e==f)

DataPath_Func_2();

else

DataPath_Func_3();

Control Structure=>

Finite State Machie

Control Structure=>

Finite State Machie

Byte-orientedProcessing

Byte-orientedProcessing

Page 13: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 13/20 -

1st Implementation: DRP Tile

� Tile is a minimum unit of DRP array

� 8x8 PEs

� 1-p HMEM and 2-p VMEM

� A PE stores 16 instructions

� Reconfiguration of a tile is controlled by its STC

� External port can attach extendedfunctional units, such as

� External memory controllers

� Peripheral bus controllers(like PCI)

� Complex operation units(like multiplier, divider, etc)

PEPE

VMEM:8b x 256w1-R, 1-R/W

VMEM:8b x 256w1-R, 1-R/W

HMEM: 8b x 8kw, 1-R/WHMEM: 8b x 8kw, 1-R/W

STCSTC

VM-CTRVM-CTR

HM-CTRHM-CTR

External PortExternal Port

Page 14: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 14/20 -

1st Silicon: DRP-1

PCICMUL

MC MULMUL PLLPLL

PLLPLL

PEVMEMHMEM

MUL

Data

CLK CLK

CLK CLK

STC

VM-CTRHM-CTR

PCI IF

Ctrl

MUL MUL

MULMUL

Test

- 8 Tiles. 512 PEs, 160kb VMEMs, 2Mb HMEMs

- 8 STCs - 8 32b Multipliers- 1 SDRAM/SRAM/CAM

controller- 1 PCI controller- 4 PLLs

- 8 Tiles. 512 PEs, 160kb VMEMs, 2Mb HMEMs

- 8 STCs - 8 32b Multipliers- 1 SDRAM/SRAM/CAM

controller- 1 PCI controller- 4 PLLs

To SDRAM/SRAM/CAM

Extended Functional Units

Program

DRP-1 Chip Blockdiagram

Page 15: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 15/20 -

Current Status: HW Platform

PEProcessing Element

State Transition Controller

PCI Controller

Memory Controller

2-port Memory (2kb)

1-port Memory (64kb)

DRP-1 Die Photo � 0.15µm CMOS 8-Al process, 696-pin TBGA package

� 33-133MHz clock

� 22M logic transistors (44K for a PE)

� 1.5Mb configuration memories

� Fully functional w/o respin

� DRP-1 board is now up and running

� Demonstration of DRP as an off-loading engine to a host MPU through PCI or HyperTransport I/F

FPGA DRP-1

PCI

MC

PCI

CAM SRAMSRAM SDRAM

HTTo/from

Host MPU

Page 16: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 16/20 -

Current Status: Compiler SuiteHigh Level Synthesis ViewHigh Level Synthesis View

Place&Router ViewPlace&Router View

Top WindowTop Window

Hierarchy BrowserHierarchy Browser

Memory Access SchedulingMemory Access Scheduling

Report SummaryReport Summary

Source Code Editor/ViewerSource Code Editor/Viewer

C codeC code

Verilog RTL

Verilog RTL

Scheduled DataFlow Graph

Scheduled DataFlow Graph

Critical PathDelay AnalysisCritical Path

Delay Analysis

PEPE

Scheduled StateTransition DiagramScheduled State

Transition Diagram

DRP application development system (HW+SW) is ready

DRP application development system (HW+SW) is ready

Page 17: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 17/20 -

Example: IPv4 Packet Forwarding� RFC1812 (Header check, checksum, longest prefix match, header modification)

� 8x16 PEs, 100MHz Operation

� Table look up: 5 round index search using 100MHz SRAM

� 6 pipe-stages, 5-cycle each. 100MHz x 5Cycle => 20M Packet/second

Dynamic Reconfiguration (5 datapath planes) PE

Page 18: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 18/20 -

Example: AES Encryption

ByteSub(8-bit TableLook-up x16)

ShiftRow

K0

8

128

128

8

Kr

128

128

Output

MixColumn (r<16)

(r<

10)

(r=10: Skip)

SSSS

CriticalPath

� Critical Path: 6.6ns. Throughput: 1.76Gbps (ECB mode)

� Fits in 8x16 PEs, single context (80 ALU/DMUs, 96 RFUs, 16vMEMs)

P & R Results

Page 19: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 19/20 -

DRP Roadmap

2002 2003 2004

DRP-based Product

DRP Core150nm

133-166MHz

Year

150nm100-133MHz

130nm(Cu)166-200MHz

90nm(Cu)250MHz

(Prototype)

DRP ASSPDRP ASSP

DRPCoreDRPCoreDRPCoreDRPCoreHT*

DRP ASSP(2)DRP ASSP(2)

DRPCoreDRPCoreDRPCoreDRPCore

HT*

SPI4/5

* PCI-X/Express mayalso be available

DRP 1st Silicon

Embedded core for NEC ASICs

Application development system

Programmable off-load engine to host MPU

� Faster time to market

� Longer time in market

� Lower risk ASIC design

� Faster time to market

� Longer time in market

� Lower risk ASIC designHostMPUHostMPU

HyperTransport,

PCI

Page 20: Processor Architecture A Dynamically Reconfigurable

MICROPROCESSOR FORUM 2002, Session 4-2, Oct. 16 - 20/20 -

Summary

� DRP architecture features

�Array of PEs and memories for programmable datapath

�FSM-based sequencer, a state transition controller, for the control of dynamic datapathreconfiguration

� DRP application development system (DRP-1 chip & board, and the compiler suite) is ready

� Sample application development is underway

� Launch of DRP-based product planned during Y’03