33
A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator Farhad Mehdipour , Hamid Noori, Hiroaki Honda, Koji Inoue, Kazuaki Murakami Kyushu University, Fukuoka, Japan

A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

  • Upload
    posy

  • View
    23

  • Download
    0

Embed Size (px)

DESCRIPTION

A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator. Farhad Mehdipour , Hamid Noori, Hiroaki Honda, Koji Inoue, Kazuaki Murakami Kyushu University, Fukuoka, Japan. O UTLINE. Introduction Problem Definition and Basic Concepts - PowerPoint PPT Presentation

Citation preview

Page 1: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

Farhad Mehdipour, Hamid Noori, Hiroaki Honda, Koji Inoue, Kazuaki Murakami

Kyushu University, Fukuoka, Japan

Page 2: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

2

OUTLINE

Introduction Problem Definition and Basic Concepts Hybrid DSE Approach for Designing RAC Case study: Designing RAC for a Reconfigurable Processor Conclusion

Page 3: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

3

OUTLINE

Introduction Problem Definition and Basic Concepts Hybrid DSE Approach for Designing RAC Case study: Designing RAC for a Reconfigurable Processor Conclusion

Page 4: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

4

Designing Embedded Systems

Embedded Microprocessors Application-Specific Integrated Circuits (ASICs) Application-Specific Instruction set Processors (ASIPs) Extensible Processors

CPU

Instruction Dispatcher

Register File

+ & x LD/ST CFU1 CFU2

LD/ST: Load / Store

CFU: Custom Functional Unit

Page 5: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

5

Extensible Processors

Improving the performance and energy efficiency of the embedded processors

maintaining compatibility and flexibility. Using an accelerator for accelerating frequently executed portions of

applications Accelerator implementations

custom hardware (such as ASIP or Extensible Processors) reconfigurable fine/coarse grain hw

CPU

Instruction Dispatcher

Register File

+ & x LD/ST CFU1 CFU2LD/ST: Load / Store

CFU: Custom Functional Unit

Page 6: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

6

Custom Instructions

Critical segments Most frequently executed portions of the applications

Instruction set customization hardware/software partitioning (Identifying critical segments in applications)

Custom Instructions (CIs) are extracted from critical segments of an application and executed on a Custom Functional Unit (CFU)

A CI is represented as a DFG

1: SUBU R3, R0, R32: ADDU R10, R0, R03: SRA R8, R10, 0x34: SLT R2, R3, R85: BNE R0,400488, R2

ADDU

SRA

SLT

SUBU

BNE

R3R0 R0R0

R10

R8

R2

R2

R30x3

400488

2

3

4

5

1A Custom Instruction

Page 7: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

7

Reconfigurable Processors

Using a reconfigurable accelerator (RAC) instead of custom functional units

Adding and generating custom instructions after fabrication

CPU

Instruction Dispatcher

Register File

+ & x LD/ST CFU1 CFU2RFU Config Mem

CFU: Custom Functional Unit

RFU: Reconfigurable Functional Unit

Page 8: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

8

Baseline Processor

ALU

Register File

RAC...

...

...

ConfigurationMemory

How a Reconfigurable Processor Works

GPP: General Purpose Processor

RAC: Reconfigurable Accelerator

400680 subiu $25,$25,1400688 lbu $13,0($7)400690 lbu $2,0($4)400698 sll $2,$2,0x184006a0 sra $14,$2,0x184006a8 addiu $4,$4,14006b0 srl $8,$2,0x1c4006b8 sll $2,$8,0x24006c0 addu $2,$2,$254006c8 lw $2,0($2)4006d0 xori $13,$13,14006d8 addu $10,$10,$2400680 subiu $25,$25,1400698 sll $2,$2,0x184006a0 sra $14,$2,0x18400688 lbu $13,0($7)4006e0 bgez $10,4006f0 ... Hot Basic Block

Reconfigurable Processor

Page 9: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

9

OUTLINE

Introduction Problem Definition and Basic Concepts Overall Design Approach Hybrid DSE Approach for Designing RAC Case study: Designing RAC for a Reconfigurable Processor Conclusion

Page 10: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

10

Designing a Reconfigurable Accelerator (RAC)

Multitude of design parameters e.g. number of functional units, input/output ports, type of

functional units and etc.

Several design parameters high complexity in the RAC design the requirements for a methodological approach

A major challenge finding the right balance between the different quality

requirements (e.g. speedup, area, energy consumption)

Page 11: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

11

Traditional Design Process

Describing a reference model Verifying the model functional correctness Obtaining a rough estimates of performance Manual or semi-automatic generation of several

alternative designs Choosing the most suitable design based on various

performance metrics

Page 12: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

12

Design Space Exploration

Design Space Exploration (DSE) the process of analyzing several “functionally equivalent”

implementation alternatives to identify an optimal solution Can become too computationally expensive

Example: a design with four tasks on an architecture with three processing

modules and each have four possible configurations results in 500 design alternatives

Our Hybrid (analytical & quantitative) DSE approach reduces substantially the design time and effort

Page 13: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

13

Assumptions

RAC: a matrix of FUs the width equal to w and height equal to h basically has a combinational logic

Basic Elements: Functional Units (logic resources) Multiplexers (routing resources)

FUs in RAC are fully connected Except the lack of connections from lower to upper rows

RAC Input Ports

RAC Output Ports

2

2FU

1

1FU

2

1FU

1

2FU Row1

FU FU FU FU

Col1

Row5

...

...

h

w

Page 14: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

14

Assumptions

CIs (DFGs) are mapped onto the RAC and executed at runtime

1: SUBU R3, R0, R32: ADDU R10, R0, R03: SRA R8, R10, 0x34: SLT R2, R3, R85: BNE R0,400488, R2

ADDU

SRA

SLT

SUBU

BNE

R3R0 R0R0

R10

R8

R2

R2

R30x3

400488

2

3

4

5

1A Custom Instruction

Data Flow Graph

2

3

4

1

RAC Initial Architecture

5

Page 15: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

15

OUTLINE

Introduction Problem Definition and Basic Concepts Hybrid DSE Approach for Designing RAC Case study: Designing RAC for a Reconfigurable Processor Conclusion

Page 16: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

16

The Problem

Designing a RAC while optimizing Speedup & Area Design parameters

the RAC’s width and height the number of FUs, the number of input and output ports

Speedup

CCyclesOnRATotalClock

ocessorseCyclesOnBaTotalClockSpeedup

Pr

Page 17: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

17

Design Methodology- Tool Chain

Our Approach: A Hybrid (Analytical & Quantitative) DSE Approach

CI ExtractorCIs

(DFGs)

CI (DFG)Analyzer

Required Informationfor DSE

Design SpaceExploration

Accelerator FinalSpecification

Adjusting RACArchitecure

Specification ofAccelerator

Application

Page 18: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

18

Problem Formulation

: the fraction of DFGs (CIs) in applications which have the width equal to i and height equal to j.

: percentage of application execution time concerns to all DFGs with the width of 4 and height of 3 is 7%.

ij

W

i

H

j

ij CPUCCCPUCC

1 1

)()(

jiFUCPUCC ij

ij )()(

2

4

6

1 3

5

2

4

5

1 3

ij

%743

Page 19: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

19

Problem Formulation

ij

W

i

wh

H

j

ij

wh RACCCRACCC

1 1

)()(

: the number of clock cycles for executing DFG(i,j) on a RAC (w,h)

)( wh

ij RACCC

if one or both dimensions of a DFGs are greater than RAC’s dimensions,

temporal partitioning: divides a DFG into time exclusive smaller DFGs

reconfiguration overhead time for loading subsequent partitions of a DFG from the configuration memory onto the RAC

Page 20: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

20

Problem Formulation

hjwi ,)3hjwi ,)2

)()( wh

wh

ij RACDRACCC hjwi ,)1

hjwi ,)4

RACh

w

RAC

DFG

RAC

h

w

RAC

DFGh

w

RAC

DFG

Page 21: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

21

Delay of RAC

wkMUXDFUDRACDh

i

ki

h

i

ki

wh ,...,1,)()()(

1

11

)( kjMUXD = delay of mux(

kjs2 to 1)

kjmk

js 2log )1()1( wwjmkjRAC Input Ports

RAC Output Ports

22FU

1

1FU

21FU

1

2FU Row1

FU FU FU FU

Col1

Row5

...

...

h

wRAC’s Delay Path

Page 22: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

22

Area of RAC

= delay of mux(kjs2 to 1)

w

i

h

j

ij

w

i

h

j

ij

wh MUXAFUARACA

1

1

11 1

)(*2)()(

)( ijMUXA

RAC Input Ports

RAC Output Ports

2

2FU

1

1FU

2

1FU

1

2FU Row1

FU FU FU FU

Col1

Row5

...

...

h

w

Page 23: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

23

Optimization Problem

Objective: Maximize ( ) )

ij

W

i

wh

H

j

ij

ij

W

i

H

j

ij

wh

wh

RACCC

CPUCC

RACCC

CPUCCRACS

1 1

1 1

)(

)(

)(

)()(

)( whRACS

Page 24: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

24

OUTLINE

Introduction Problem Definition and Basic Concepts Hybrid DSE Approach for Designing RAC Case study: Designing RAC for a Reconfigurable Processor Conclusion

Page 25: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

25

Designing RAC for a Reconfigurable Processor

AMBER: a reconfigurable processor targeted for embedded systems Main components

a base processor (general RISC processor) Sequencer and a coarse-grained reconfigurable functional unit (RFU)

Register File

ID/EXE Reg

RFU

ConfigurationMemory

ALUs

MUX SequencerSequencer

Table

EXE/MEM Reg

GPP Augmented HW

AMBER’s Architecture

Page 26: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

26

Quantitative Approach

22 applications of Mibench were attempted Applications were executed on Simplescalar and profiled Hot segments and DFGs are extracted from the applications No limitation in the RFU’s initial architecture DFGs are mapped on the initial RFU architecture Feedbacks of mapping

Specification of the designed architecture for AMBER’s RFU

Page 27: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

27

Hybrid DSE Approach

FU and various size multiplexers Synthesized using Hitachi 0.18um Measuring delay and area

DFGs are analyzed to extract required information quantitatively

Reconfiguration penalty time: 1 clock cycle

The base processor clock frequency: 166MHz

Page 28: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

28

Speedup Evaluation

14

710

1316

191 4 7

10

13

16

19

0

0.5

1

1.5

2

2.5

Speedup

Width

Height

0-0.5 0.5-1 1-1.5 1.5-2 2-2.5

Page 29: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

29

Speedup Evaluation

Increasing RAC’s width more parallelism

Widths larger than 6 negative effect of growing the number of muxes and their sizes no more speedup achievable

Increasing RAC’s Height longer delay speedup declines and area increases

Page 30: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

30

Comparison

Page 31: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

31

OUTLINE

Introduction Problem Definition and Basic Concepts Hybrid DSE Approach for Designing RAC Case study: Designing RAC for a Reconfigurable Processor Conclusion

Page 32: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

32

CONCLUSION

Hybrid DSE approach Uses realistic data from the attempted applications Substantially reduces design time and designer efforts Can be used for shrinking a large design space Easily extendable to apply new design parameters More suitable where the new applications are added

Page 33: A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator

Thanks for your attention!