http://ces.univ-karlsruhe.de/RISPP Bauer, Shafique, Henkel Invited Talk @ SPP-RR Colloquium, 9/25/2009 RISPP: Rotating Instruction Set Processing Platform Lars Bauer , Muhammad Shafique and Jörg Henkel Chair for Embedded Systems (CES) University of Karlsruhe (TH)


RISPP: Rotating Instruction Set Processing Platform

Lars Bauer, Muhammad Shafique and Jörg Henkel

Chair for Embedded Systems (CES)

University of Karlsruhe (TH)


Development of Embedded Systems

Typical: static analysis of hot spots

Building a tightly optimized system

Nowadays: increasing complexity

More functionality

Problem: a statically chosen design point has to match all requirements

Typically inefficient for individual components (e.g. tasks or hot spots)


Possible Solution: Extensible Processors

Figure: design options compared by flexibility (1/time-to-market, …) and efficiency ($/MIPS, mW/MHz, MIPS/area, …)

ASIC ("hardware solution"): non-programmable, highly specialized

General-purpose processor ("software solution")

ASIP (extensible processor): instruction set extension, parameterization, inclusion/exclusion of functional blocks

Reconfigurable Computing: processor with a reconfigurable ISA, i.e. reconfigurable Special Instructions


Realizing Reconfigurable SIs

Figure legend: Special Instruction Container (SIC), reconfigurable area, core pipeline (scaled down)

Partition the reconfigurable fabric into so-called SI Containers

An SI may be loaded into any free container

Problems: fragmentation (internal and external)

Relatively long reconfiguration time

Corresponds to Chimaera, OneChip, Molen, Proteus, etc.
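A minimal sketch of the SI Container scheme, written in Python for illustration: the fabric is modeled as four equally sized containers and each SI occupies one whole container. The container size and the SI areas are hypothetical numbers, chosen only to make internal fragmentation (area left unused inside an occupied container) visible; external fragmentation is not modeled.

```python
from typing import Dict, List, Optional

CONTAINER_AREA = 100  # hypothetical area units per SI Container


def load_si(containers: List[Optional[str]], si: str) -> int:
    """Load an SI into the first free container; return its index (-1 if none is free)."""
    for i, occupant in enumerate(containers):
        if occupant is None:
            containers[i] = si
            return i
    return -1


def internal_fragmentation(containers: List[Optional[str]], si_area: Dict[str, int]) -> int:
    """Area wasted inside occupied containers because an SI is smaller than its container."""
    return sum(CONTAINER_AREA - si_area[si] for si in containers if si is not None)


si_area = {"SATD": 90, "SAD": 40, "DCT": 55}  # hypothetical SI sizes
containers: List[Optional[str]] = [None, None, None, None]
for si in ["SATD", "SAD", "DCT"]:
    load_si(containers, si)

print(containers)                                   # ['SATD', 'SAD', 'DCT', None]
print(internal_fragmentation(containers, si_area))  # 10 + 60 + 45 = 115
```

Even though only three of four containers are used, more than a third of the occupied area is wasted, which is the motivation for the finer-grained Atom Containers discussed next.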


Analysis of Special Instruction Execution

Figure: accumulated SI executions (in thousands) over execution time [K cycles] for four variants: no cISA execution, with cISA execution, with cISA execution & smaller SIs, and with cISA execution & upgrades

cISA: core Instruction Set Architecture (i.e. the ISA that is statically available in the pipeline)

Our RISPP approach: modular Special Instructions supporting upgrades
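The effect of cISA execution and upgrades can be approximated with a toy model: SI executions run back to back, each using the fastest implementation that has finished loading at that moment. All cycle counts (reconfiguration completing at cycle 400 or 1000, latencies of 100, 30 and 10 cycles) are hypothetical, and the function name `total_cycles` is ours, not part of RISPP.

```python
from typing import List, Tuple

Impl = Tuple[int, int]  # (cycle at which the implementation becomes available, latency per execution)


def total_cycles(impls: List[Impl], executions: int) -> int:
    """Run back-to-back SI executions, each with the fastest implementation available at that time."""
    t = 0
    for _ in range(executions):
        ready = [lat for avail, lat in impls if avail <= t]
        if not ready:  # no cISA fallback: stall until the first hardware implementation is loaded
            t = min(avail for avail, _ in impls)
            ready = [lat for avail, lat in impls if avail <= t]
        t += min(ready)
    return t


scenarios = {
    "no cISA exec.": [(1000, 10)],
    "with cISA exec.": [(0, 100), (1000, 10)],
    "with cISA exec. & upgrades": [(0, 100), (400, 30), (1000, 10)],
}
for label, impls in scenarios.items():
    print(label, total_cycles(impls, executions=100))
# no cISA exec. 2000
# with cISA exec. 1900
# with cISA exec. & upgrades 1760
```

The upgrade variant wins because a small, quickly loaded Molecule already replaces the slow cISA execution while the full Molecule is still being reconfigured.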


Fundamental Processor Extension: Atom / Molecule Model

Atom: elementary data path (smaller granularity)

Molecule: combination of Atoms (bigger granularity)

Special Instruction (SI): application-specific assembly instruction

Figure: three SIs (A, B, C), each implemented by several Molecules (A1, A2, A3 and the cISA implementation AcISA; B1, B2 and BcISA; C1, C2 and CcISA), which combine instances of Atoms 1 to 6. The numbers denote the number of Atom instances required for a Molecule. An SI can be implemented by any of its Molecules.
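One way to encode the Atom/Molecule model, sketched under our own assumptions: a Molecule is a vector of required Atom instances, the cISA implementation is treated as a Molecule that needs no Atoms, and Atom sharing between Molecules is captured by the element-wise maximum. The Molecule names follow the figure, but all vectors and latencies are hypothetical.

```python
from typing import Dict, Tuple

Atoms = Tuple[int, ...]  # number of instances per Atom type


def union(a: Atoms, b: Atoms) -> Atoms:
    """Atoms demanded by two Molecules together: shared Atom instances are counted once."""
    return tuple(max(x, y) for x, y in zip(a, b))


def is_available(molecule: Atoms, loaded: Atoms) -> bool:
    """A Molecule can execute if every required Atom instance is currently loaded."""
    return all(need <= have for need, have in zip(molecule, loaded))


def fastest_available(molecules: Dict[str, Tuple[Atoms, int]], loaded: Atoms) -> str:
    """Name of the fastest Molecule of one SI that the loaded Atoms can implement."""
    usable = {name: lat for name, (atoms, lat) in molecules.items() if is_available(atoms, loaded)}
    return min(usable, key=usable.get)


# Hypothetical Molecules over three Atom types: (Atom vector, latency in cycles)
si_a = {"AcISA": ((0, 0, 0), 200), "A1": ((1, 1, 0), 40), "A2": ((2, 2, 0), 20)}
si_b = {"BcISA": ((0, 0, 0), 150), "B1": ((0, 1, 1), 30)}

loaded = (1, 1, 0)
print(fastest_available(si_a, loaded))  # A1
print(fastest_available(si_b, loaded))  # BcISA (B1 still lacks Atom 3)
print(union(si_a["A1"][0], si_b["B1"][0]))  # (1, 1, 1): 3 Atoms instead of 2 + 2
```

Because A1 and B1 both use Atom 2, implementing both SIs needs only three Atom instances, which is the reuse that monolithic SI Containers cannot exploit.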


Example Atom: a transform data path with inputs X00 to X30 and outputs Y00 to Y30, built from additions, subtractions and shifts (>> 1, << 1), whose operation is selected between DCT and HT
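Assuming the example Atom is a 1-D, 4-point H.264 transform butterfly, the sketch below shows how one data path can serve both the forward integer DCT and the Hadamard transform (HT) by combining the same sums and differences differently. The flag `dct` stands in for the DCT/HT control inputs; the `>> 1` shifts in the figure (e.g. for output scaling) are not modeled, and the function name is hypothetical.

```python
from typing import List


def transform_atom(x: List[int], dct: bool) -> List[int]:
    """4-point butterfly shared by the H.264 forward integer DCT and the Hadamard transform."""
    a, b = x[0] + x[3], x[1] + x[2]  # first butterfly stage: sums
    d, c = x[0] - x[3], x[1] - x[2]  # first butterfly stage: differences
    if dct:
        # forward integer DCT rows: [1,1,1,1], [2,1,-1,-2], [1,-1,-1,1], [1,-2,2,-1]
        return [a + b, (d << 1) + c, a - b, d - (c << 1)]
    # Hadamard rows: [1,1,1,1], [1,1,-1,-1], [1,-1,-1,1], [1,-1,1,-1]
    return [a + b, d + c, a - b, d - c]


print(transform_atom([1, 0, 0, 0], dct=True))   # [1, 2, 1, 1]
print(transform_atom([1, 0, 0, 0], dct=False))  # [1, 1, 1, 1]
```

Sharing the first butterfly stage is what makes it attractive to implement both transforms in one reconfigurable Atom rather than in two.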


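Example Special Instruction: a data-flow graph from INPUT to OUTPUT that combines the Atoms QSub, Repack, Transform (with control inputs DCT=0 and HT=0 or HT=1) and SAV (Sum of Absolute Values), whose results are summed by adders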
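The example Special Instruction strings several Atoms together. Assuming it computes a 4x4 SATD-style cost (difference, 2-D Hadamard transform, sum of absolute values), its composition could be sketched as follows. The helper names mirror the Atom names from the figure but are hypothetical, and many encoders additionally halve the final sum, which is omitted here.

```python
from typing import List

Block = List[List[int]]


def qsub(orig: Block, pred: Block) -> Block:
    """QSub-like step: element-wise difference of the original and the predicted 4x4 block."""
    return [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]


def repack(m: Block) -> Block:
    """Repack-like step: transpose, so the next transform pass works on columns."""
    return [list(col) for col in zip(*m)]


def hadamard4(x: List[int]) -> List[int]:
    """Transform Atom in HT mode: 1-D 4-point Hadamard butterfly."""
    a, b, d, c = x[0] + x[3], x[1] + x[2], x[0] - x[3], x[1] - x[2]
    return [a + b, d + c, a - b, d - c]


def sav(m: Block) -> int:
    """SAV: sum of absolute values."""
    return sum(abs(v) for row in m for v in row)


def satd4x4(orig: Block, pred: Block) -> int:
    diff = qsub(orig, pred)
    rows = [hadamard4(r) for r in diff]          # horizontal pass
    cols = [hadamard4(r) for r in repack(rows)]  # vertical pass on the transposed data
    return sav(cols)


zero = [[0] * 4 for _ in range(4)]
ones = [[1] * 4 for _ in range(4)]
print(satd4x4(ones, zero))  # 16
```

In a Molecule, the same data flow is mapped onto a chosen number of Atom instances, so the available Atoms determine how many of these steps run in parallel.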


Example Molecule: an implementation of this Special Instruction with 2 instances each of the Atoms Repack, Transform and SAV, plus adders


Supporting Modular SIs


There is no predetermined maximum of supported SIs

Multiple SIs may share common data paths (i.e. reuse them)

SIs can be upgraded (due to multiple available Molecules)

Significantly reduced fragmentation problem

The decision how many Atom Containers are spent on which SI can adapt at run time, which demands a run-time system



Dynamic System Behavior

Extensible processors: select points in the design space at design time

Reconfigurable processors: typically fix at compile time when and how to deploy the reconfigurable hardware

How to handle situations that are unknown at design and compile time?

For instance, behavior that depends on input data (e.g. different computational paths in a video encoder)


Example: Execution Flow of an H.264 Video Encoder

Iterates on Macro Blocks (MBs), i.e. 16x16 pixels

2 different MB types lead to different computational paths

Intra-frame prediction: I-MB

Inter-frame prediction: P-MB

Figure: encoding engine with three loops over all MBs:

Loop over MB: Motion Estimation (ME: SA(T)D) and Rate-Distortion (RD) decisions, i.e. MB-type decision (I or P) and mode decision (for I or P)

MB encoding loop: if MB_Type = P_MB then MC, DCT / Q, IDCT / IQ; else IPRED, DCT / HT / Q, IDCT / IHT / IQ; followed by CAVLC

Loop over MB: in-loop de-blocking filter


Problem: Input-Dependent Dynamic Application Behavior

Figure: distribution of I-MBs per frame [%] over the frame number (0 to 700) in a CIF video scene (352x288: 396 MBs)

The RISPP run-time system (Rotation Manager) can adapt the SI performance (by choosing different Molecules) depending on the requirements


Run-time System: Simplified Overview and Connections

Figure: the run-time system contains the blocks Decode, Prediction, Selection, Scheduling, Replacing, Execution Control and Monitoring. It is connected to the core pipeline, the instruction memory (including Special Instructions and Forecasts) and the reconfigurable HW via the signals Instruction, Special Instructions & Forecasts, Status / Control and Reconfigure.


Online Monitoring & Prediction

Figure: exemplary control-flow graph (nodes are basic blocks) with an exemplary outer loop and an exemplary inner loop executing SATD, potentially followed by other inner loops. Forecast FC1 predicts the future usage of SATD early enough to leave time for reconfiguration; forecast FC2 predicts that SATD is no longer required in this loop, potentially forecasting other SIs. The figure also marks the error back-propagation.

Monitor the amount of SI executions between FC1 and FC2

Calculate the error between prediction and monitoring

Back-propagate the weight error (based on the Temporal Difference scheme)
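The monitoring and back-propagation steps can be sketched as a temporal-difference style update, assuming the simple form FV ← FV + α · (M − FV), where FV is the forecast value and M the monitored number of SI executions between FC1 and FC2. This only illustrates the idea; how RISPP propagates the error across several forecast points may differ. The α values are those of the measured forecast adaptation, while the input sequence is hypothetical.

```python
def update_forecast(forecast: float, monitored: float, alpha: float) -> float:
    """Move the forecast toward the monitored execution count by a fraction alpha of the error."""
    error = monitored - forecast  # monitored executions minus predicted executions
    return forecast + alpha * error


# Hypothetical input: the number of I-MBs per frame jumps from 50 to 150
for alpha in (0.6, 0.3, 0.1):
    fv = 50.0
    history = []
    for _ in range(5):
        fv = update_forecast(fv, 150, alpha)
        history.append(round(fv, 1))
    print(alpha, history)
```

A larger α tracks sudden changes (such as a scene change that produces many I-MBs) faster, while a smaller α smooths out short-lived fluctuations.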


Measured Forecast Adaptation from Hardware Prototype

Figure: forecast value (expected amount of I-MBs, 0 to 200) over the frame number (150 to 500), comparing the actually executed I-MBs with the predicted I-MBs for α = 0.6, α = 0.3 and α = 0.1



Formalized Molecule Selection

Input to the Selection: requested SIs and their implementing Molecules

Selection: choose a subset S of Molecules

Constraint: choose exactly one Molecule to implement each SI

Constraint: stay within the capacity of the reconfigurable hardware

Complexity:

● Our Selection has similarities to the Knapsack problem

● However, due to Atom sharing it is not identical

● A polynomial reduction from Knapsack to Selection is given in the DAC’08 paper

Selection is NP-hard
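To make the formalization concrete, here is a brute-force sketch: it enumerates every combination of exactly one Molecule per SI, discards combinations whose shared Atom demand exceeds the capacity, and keeps the best one. The objective (forecast-weighted cycles saved relative to the cISA Molecule) and all numbers are our assumptions; the exponential enumeration is only feasible for toy instances, consistent with the Selection being NP-hard.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Molecule = Tuple[Tuple[int, ...], int]  # hypothetical: (Atom-instance vector, latency in cycles)


def combined_atoms(chosen: List[Molecule]) -> int:
    """Atom Containers needed by a set of Molecules: element-wise maximum (shared Atoms count once)."""
    return sum(max(counts) for counts in zip(*(vec for vec, _ in chosen)))


def select_optimal(sis: Dict[str, List[Molecule]], forecasts: Dict[str, int],
                   capacity: int) -> Tuple[Optional[Dict[str, Molecule]], int]:
    names = list(sis)
    best, best_profit = None, -1
    # exactly one Molecule per SI: enumerate the Cartesian product of the SIs' Molecule lists
    for choice in product(*(sis[n] for n in names)):
        if combined_atoms(list(choice)) > capacity:  # must fit into the reconfigurable fabric
            continue
        # assumed objective: cycles saved versus the slowest (cISA) Molecule, weighted by the forecast
        p = sum(forecasts[n] * (max(lat for _, lat in sis[n]) - m[1]) for n, m in zip(names, choice))
        if p > best_profit:
            best, best_profit = dict(zip(names, choice)), p
    return best, best_profit


# Hypothetical Molecules over two Atom types; the first entry of each list is the cISA Molecule
sis = {
    "SATD": [((0, 0), 500), ((1, 0), 100), ((1, 1), 60), ((2, 2), 25)],
    "SAD": [((0, 0), 300), ((0, 1), 50), ((0, 3), 20)],
}
forecasts = {"SATD": 100, "SAD": 100}
best, value = select_optimal(sis, forecasts, capacity=4)
print(best, value)  # {'SATD': ((2, 2), 25), 'SAD': ((0, 1), 50)} 72500
```

The winning pair needs only four Atom Containers because the SAD Molecule reuses an Atom that the SATD Molecule already loads; without sharing it would need five.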


Greedy vs. Optimal Selection

For many parameter pairs, greedy finds the same solution as the optimal Selection

In some (not relevant) cases, the greedy selection even leads to faster execution

Solving the Selection optimally does not necessarily lead to the fastest execution (more problems need to be solved, and the performance still depends on the actual SI execution frequency)

Figure: results of the greedy and the optimal Selection
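A greedy selection can be sketched as follows; it is a hypothetical heuristic, not the Selection algorithm used in RISPP. Starting from cISA execution for every SI, it repeatedly applies the feasible Molecule upgrade with the best forecast-weighted gain per additional Atom Container. The Molecule data and the gain definition are assumptions; on this toy instance an exhaustive check of all 12 combinations yields the same result, in line with the observation that greedy often finds the same solution.

```python
import math
from typing import Dict, Iterable, List, Tuple

Molecule = Tuple[Tuple[int, ...], int]  # hypothetical: (Atom-instance vector, latency in cycles)


def combined_atoms(chosen: Iterable[Molecule]) -> int:
    """Atom Containers needed: element-wise maximum over the chosen Molecules (Atom sharing)."""
    return sum(max(counts) for counts in zip(*(vec for vec, _ in chosen)))


def select_greedy(sis: Dict[str, List[Molecule]], forecasts: Dict[str, int],
                  capacity: int) -> Dict[str, Molecule]:
    # start from the slowest Molecule of each SI (the cISA one, which needs no Atoms)
    chosen = {n: max(ms, key=lambda m: m[1]) for n, ms in sis.items()}
    while True:
        used = combined_atoms(chosen.values())
        best = None  # (efficiency, SI name, Molecule)
        for n, ms in sis.items():
            for m in ms:
                if m[1] >= chosen[n][1]:
                    continue  # only consider upgrades (faster Molecules)
                trial = {**chosen, n: m}
                size = combined_atoms(trial.values())
                if size > capacity:
                    continue
                gain = forecasts[n] * (chosen[n][1] - m[1])
                extra = size - used
                eff = math.inf if extra <= 0 else gain / extra  # free upgrades first
                if best is None or eff > best[0]:
                    best = (eff, n, m)
        if best is None:  # no feasible upgrade left
            return chosen
        chosen[best[1]] = best[2]


sis = {
    "SATD": [((0, 0), 500), ((1, 0), 100), ((1, 1), 60), ((2, 2), 25)],
    "SAD": [((0, 0), 300), ((0, 1), 50), ((0, 3), 20)],
}
forecasts = {"SATD": 100, "SAD": 100}
print(select_greedy(sis, forecasts, capacity=4))
# {'SATD': ((2, 2), 25), 'SAD': ((0, 1), 50)}
```

Each iteration strictly lowers one SI's latency, so the loop terminates; its cost grows polynomially with the number of Molecules instead of exponentially.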



Determining the Atom Reconfiguration Sequence

Figure: design space spanned by the number of instances of Atom A1 and of Atom A2 (1 to 3 each), showing the Molecules m1, m2 and m3, the selected Molecule, its upgrade candidates, and the fastest available Molecule depending on the number of loaded Atoms (1 to 6)

Problem: Reconfiguration is slow

Constraint: At most one reconfiguration at a time
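The upgrade idea can be sketched for a single SI: since at most one Atom is reconfigured at a time, the sequence is built one Atom at a time, and after every step the fastest available Molecule is recorded. The policy used here (move toward the faster upgrade candidate that needs the fewest additional Atoms) is a simple illustration and not necessarily any of the scheduling methods compared in the talk; the Molecules m1 to m3 and all latencies are hypothetical.

```python
from typing import Dict, List, Tuple

Atoms = Tuple[int, ...]  # instances per Atom type (A1, A2, ...)


def covers(loaded: Atoms, need: Atoms) -> bool:
    """True if the loaded Atoms suffice for a Molecule that needs `need`."""
    return all(have >= n for have, n in zip(loaded, need))


def reconfiguration_sequence(molecules: Dict[str, Tuple[Atoms, int]],
                             target: str) -> List[Tuple[str, Atoms, int]]:
    goal = molecules[target][0]
    loaded: Atoms = tuple(0 for _ in goal)
    steps = []
    while loaded != goal:
        current = min(lat for atoms, lat in molecules.values() if covers(loaded, atoms))
        # upgrade candidates: not yet available, contained in the selected Molecule, and faster
        candidates = [
            (sum(max(n - have, 0) for n, have in zip(atoms, loaded)), lat, atoms)
            for atoms, lat in molecules.values()
            if covers(goal, atoms) and not covers(loaded, atoms)
            and (lat < current or atoms == goal)
        ]
        _, _, nxt = min(candidates)  # fewest missing Atoms first, then lowest latency
        # reconfigure exactly one Atom: the first type the chosen candidate still lacks
        i = next(k for k, (n, have) in enumerate(zip(nxt, loaded)) if n > have)
        loaded = loaded[:i] + (loaded[i] + 1,) + loaded[i + 1:]
        fastest = min(lat for atoms, lat in molecules.values() if covers(loaded, atoms))
        steps.append((f"A{i + 1}", loaded, fastest))
    return steps


molecules = {
    "cISA": ((0, 0), 300),  # software execution, needs no Atoms
    "m1": ((1, 1), 120),
    "m2": ((1, 2), 80),
    "m3": ((2, 2), 50),
    "selected": ((3, 3), 30),
}
for atom, state, latency in reconfiguration_sequence(molecules, "selected"):
    print(f"load {atom} -> loaded {state}, fastest available latency {latency}")
```

The SI becomes faster step by step (300, 120, 80, 50, 50, 30 cycles) instead of waiting in cISA execution until all six Atoms of the selected Molecule are loaded.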


Comparing Different Scheduling Methods for 2 Selected SIs

Scheduling methods: “First Select First Reconfigure” (FSFR), “Avoid Software First” (ASF), “Smallest Job First” (SJF)

Figure: design space spanned by the number of instances of Atom A1 and of Atom A2 (1 to 5 each), showing the selected Molecules for SI1 and SI2, the upgrade candidates for SI2, and the reconfiguration sequences of the three scheduling methods


Comparing our Proposed Scheduling Schemes

Figure: execution time [million cycles] over the amount of reconfigurable hardware [#Atom Containers, 5 to 24] for Avoid Software First (ASF), First Select First Reconfigure (FSFR), Smallest Job First (SJF) and Highest Efficiency First (HEF)

Encoding 140 frames (352x288 resolution) with H.264


Detailed Analysis of HEF scheduler

Figure: over the execution time [100K cycles, 0 to 24], bars show the number of SI executions per 100K cycles (0 to 4,000) for DCT, MC, SATD and SAD, and lines show the corresponding SI latency [cycles] on a log scale (1 to 10,000). The continuation of the latency lines for SAD and SATD is omitted for clarity.



Infrastructure for Modular SIs

Figure legend: Atom Container (reconfigurable, scaled down for clarity), Bus Connector (non-reconfigurable), Input, Output


FPGA-based Prototype

Xilinx Virtex-4 LX 160 on Silica/Avnet Board

Audio/Video Module, CF-Card, Touch-Screen LCD

SDRAM, DDR-DRAM, SRAM, Reconfiguration EEPROM


PCB for Reconfiguration EEPROM and Peripherals


FPGA Floorplan/PlanAhead

Leon 2 core (core Instruction Set Architecture)

Static Atoms (i.e. non-reconfigurable) for typical operations like data repacking etc.

10 dynamically reconfigurable Atoms

Periphery IP-Core for Video-In and Video-Out, additionally providing video buffers and a memory-mapped interface to access the buffers

Periphery IP-Core for I2C (touch-screen LCD)

Reconfiguration IP-Core: external EEPROM, FIFO, ICAP (Internal Configuration Access Port)

Atom Framework

Rotation Manager: currently implemented as a hardware block for the Forecasts / Prediction and a MicroBlaze for Selection, Scheduling and Replacement


FPGA Floorplan/PlanAhead (cont’d)


RISPP Simulator: Components, Connections, and GUI

SystemC-based simulator

Input for pipeline is obtained from Instruction Set Simulator (ArchC)

SI information is semi-automatically derived at compile time

Figure: UML class diagram with three areas (pipeline & run-time system, SI management, FPGA management). It contains the Application Binary, Instruction Set Arch., Branch trace, Core Pipeline, SI Execution Unit (manageSIexec()), Prefetching Unit, Online Monitoring, Special Instruction (getFastestAvailableImpl()), SI Implementation (getRequiredDPs()), Data Path (isAvailableOnFPGA()), DP loading queue (pushNextDataPath()), and the FPGA with its two variants SIC FPGA (Special Instruction Container) and DPC FPGA (Data Path Container). An XML file defines the SIs (including instruction format), implementations and data paths; a Special Instruction has many SI Implementations, and an SI Implementation requires multiple Data Paths.
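A Python mirror of the SI-management classes in the diagram, showing how `getFastestAvailableImpl()`, `getRequiredDPs()` and `isAvailableOnFPGA()` could interact. Only these three method names come from the diagram; the fields, the latency values, the `FPGA.loaded` set and the convention that a software (cISA) implementation has no data paths are our assumptions, and the actual simulator is SystemC-based. Multiple instances of the same data path are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class FPGA:
    loaded: Set[str] = field(default_factory=set)  # names of the data paths currently configured


@dataclass
class DataPath:
    name: str
    fpga: FPGA

    def isAvailableOnFPGA(self) -> bool:
        return self.name in self.fpga.loaded


@dataclass
class SIImplementation:
    name: str
    latency: int  # cycles per execution
    data_paths: List[DataPath] = field(default_factory=list)  # empty: cISA (software) implementation

    def getRequiredDPs(self) -> List[DataPath]:
        return list(self.data_paths)


@dataclass
class SpecialInstruction:
    name: str
    implementations: List[SIImplementation]

    def getFastestAvailableImpl(self) -> SIImplementation:
        # an implementation is usable once all of its required data paths are on the FPGA
        ready = [impl for impl in self.implementations
                 if all(dp.isAvailableOnFPGA() for dp in impl.getRequiredDPs())]
        return min(ready, key=lambda impl: impl.latency)


fpga = FPGA()
repack, transform = DataPath("Repack", fpga), DataPath("Transform", fpga)
satd = SpecialInstruction("SATD", [
    SIImplementation("cISA", 500),
    SIImplementation("HW", 40, [repack, transform]),
])
print(satd.getFastestAvailableImpl().name)  # cISA
fpga.loaded.update({"Repack", "Transform"})
print(satd.getFastestAvailableImpl().name)  # HW
```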



Overall System Evaluation

Table: Special Instructions and their implemented Atoms

Motion Estimation (ME)
  SAD: SAD_16
  SATD: QSub, HT_4, Repack, SAV

(Inverse) Transform
  (I)DCT: DCT_4, Repack, (QSub)
  (I)HT_2x2: HT_2
  (I)HT_4x4: HT_4, Repack

Motion Compensation (MC)
  MC_Hz_4: PointFilter, BytePack, Clip3

Intra Prediction (IPred)
  IPred_HDC: PackLBytes, CollapseAdd
  IPred_VDC: CollapseAdd

Loop Filter (LF)
  LF_BS4: Cond, LF_4
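Atom sharing can be read off the table of Special Instructions and implemented Atoms. The sketch below inverts the SI-to-Atom mapping to list the Atoms used by more than one Special Instruction; including the parenthesized QSub of (I)DCT as a regular entry is an assumption about its meaning.

```python
from collections import defaultdict
from typing import Dict, List

# Special Instructions and their implemented Atoms, transcribed from the table
SI_ATOMS: Dict[str, List[str]] = {
    "SAD": ["SAD_16"],
    "SATD": ["QSub", "HT_4", "Repack", "SAV"],
    "(I)DCT": ["DCT_4", "Repack", "QSub"],  # QSub is parenthesized in the table
    "(I)HT_2x2": ["HT_2"],
    "(I)HT_4x4": ["HT_4", "Repack"],
    "MC_Hz_4": ["PointFilter", "BytePack", "Clip3"],
    "IPred_HDC": ["PackLBytes", "CollapseAdd"],
    "IPred_VDC": ["CollapseAdd"],
    "LF_BS4": ["Cond", "LF_4"],
}


def shared_atoms(si_atoms: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each Atom that is used by several SIs to the list of those SIs."""
    users: Dict[str, List[str]] = defaultdict(list)
    for si, atoms in si_atoms.items():
        for atom in atoms:
            users[atom].append(si)
    return {atom: sis for atom, sis in users.items() if len(sis) > 1}


for atom, sis in sorted(shared_atoms(SI_ATOMS).items()):
    print(f"{atom}: {', '.join(sis)}")
# CollapseAdd: IPred_HDC, IPred_VDC
# HT_4: SATD, (I)HT_4x4
# QSub: SATD, (I)DCT
# Repack: SATD, (I)DCT, (I)HT_4x4
```

Repack alone serves three SIs, so loading it once benefits SATD, (I)DCT and (I)HT_4x4 at the same time.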


Compared to the Leon 2 GPP: 26.6x faster

Conservative comparison to a reconfigurable processor with monolithic SIs: still 1.24x faster

Depending on the size/granularity of the SIs, the speedup can be > 7x (e.g. for Proteus; 2.38x in comparison to Molen)

Our approach additionally provides:

Adaptivity for changing control flow (due to input data)

Adaptivity for multi-tasking scenarios (tasks sharing the reconfigurable hardware)


Summary & Conclusion

Hierarchical Special Instruction composition with different area-performance trade-offs: modular SIs

Solved the problem “Parallelism vs. Reconfiguration Overhead”: we can provide both by upgrading the SIs

Achieving noticeably better performance:

Comparison to GPP (Leon 2): 26.6x (using 8 Atom Containers)

Comparison to state-of-the-art ASIPs: up to 3.1x

Comparison to state-of-the-art reconfigurable processors: up to 7.19x (2.38x in comparison to Molen)

Providing very high adaptivity, as demanded by changing control flow or multi-tasking environments

There is a large potential for improving the way current Extensible Processors work


RISPP Publication Excerpt

[ICCAD’09] Run-time energy minimization scheme using a dynamically power-gated instruction set

[CODES’09] Replacement policy for run-time reconfigurable accelerators

[DATE’09] Cross-architectural design-space exploration tool

[JSPS’09] Describing and optimizing the H.264 video encoder application

[FPL’08] Hardware infrastructure that allows reconfiguring Atoms and implementing different Molecules

[TVLSI’08] General overview and comparison with a state-of-the-art ASIP

[DAC’08] Determining the Molecule Selection and comparing our approach with Proteus

[DATE’08] Determining the Atom reconfiguration sequence for the selected Molecules and comparing our approach with Molen

[SASO’07] Online monitoring and fine-tuning of the predicted Special Instruction execution frequencies

[DAC’07] Presentation of the RISPP concept and compile-time preparations (when to start prefetching)