Optimisation of Pipelined MPSoCs Using Integer Linear Programming

Embedded Systems Laboratory

http://www.cse.unsw.edu.au/~esl

Haris Javaid
Research Associate

School of Computer Science and Engineering
The University of New South Wales

AUSTRALIA

An Example of a Pipelined MPSoC

[Figure: JPEG Encoder, shown as a pipeline of Process 0 through Process 5]

• What is a pipelined MPSoC (MultiProcessor System-on-Chip)?
  • Multiple processors connected in a pipeline
• Useful for?
  • Audio and video codecs
• Used in?
  • Portable devices such as phones, tablets, etc.
• Challenge?
  • Minimise their area footprint because of size constraints
  • Less area footprint typically means less power consumption, which will increase battery life

Runtime of a Pipelined MPSoC

Raw image: 256x128 pixels, split into 512 8x8 macroblocks

[Figure: pipeline schedule of Macro Block 1, 2, 3, ... across the pipeline stages, plotted as macroblock number against time. Stage latencies: 1000, 2000, 1300, 900 and 1000 time units. Time markers: 6200 (first macroblock processing time) and 10,200.]

Runtime of a Pipelined MPSoC

$L(p_{critical}) = \max \{ L(p_i) \}, \quad i = 0, 1, 2, \ldots, N-1$

[Figure: pipeline stages with latencies 1000, 2000, 1300, 900 and 1000 time units]
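The numbers on the two runtime slides can be tied together with a small calculation. The following is a minimal sketch in Python, assuming (the slides do not state this formula explicitly) that the first macroblock takes the sum of all stage latencies and that every later macroblock completes one critical-stage latency after the previous one. The function name `pipeline_runtime` is hypothetical.

```python
def pipeline_runtime(latencies, num_items):
    """Estimate the runtime of a pipeline in time units.

    Assumption (not stated on the slides): the first item needs the sum of
    all stage latencies, and each later item completes one critical
    (maximum) stage latency after the previous one.
    """
    if num_items < 1:
        return 0
    first_item = sum(latencies)
    critical = max(latencies)  # L(p_critical) = max over all stages
    return first_item + (num_items - 1) * critical


# Stage latencies from the example figure, in time units.
latencies = [1000, 2000, 1300, 900, 1000]

first_block = pipeline_runtime(latencies, 1)    # 6200
three_blocks = pipeline_runtime(latencies, 3)   # 10200
whole_image = pipeline_runtime(latencies, 512)  # 1028200
print(first_block, three_blocks, whole_image)
```

Under this assumption, one and three macroblocks give 6200 and 10,200 time units, which matches the two time markers in the figure, and the full 512-macroblock image is dominated by the 2000-unit critical stage.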

Research Problem

Minimise area footprint under a runtime constraint, when each processor has a number of configurations

A design point is one combination of processor configurations

[Figure: the configuration options of each processor (labelled A, B, C, ...), with one configuration selected per processor to form a design point. Annotations: "Runtime constraint satisfied", "Minimum area footprint".]

Solution

Binary (0-1) Integer Linear Programming (ILP)

Objective
• Minimise area footprint of the pipelined MPSoC

Constraints
• Only one configuration can be selected for a processor
• Amongst the selected processor configurations, the one with maximum latency will be critical for runtime calculation
• Pipelined MPSoC’s runtime ≤ Rd (runtime constraint provided by the designer)

Binary ILP Formulation

Variables
• $x_{i,j}$ equals 1 if configuration j of processor i is selected
• $c_{m,n}$ equals 1 if processor m with configuration n is selected as the critical configuration of the pipelined MPSoC
• $N$ = total number of processors
• $K_i$ = total number of configurations for processor i
• $L1_{i,j}$, $L_{i,j}$ and $A_{i,j}$ refer to L1, L and area of processor i in configuration j respectively

Objective
• Minimise the area footprint of the pipelined MPSoC

Constraints
• For each processor, one configuration is selected
• Only one processor configuration can be selected as the critical configuration
• Only a processor configuration that is already selected can be selected as the critical configuration
• Pipelined MPSoC’s runtime must be less than or equal to the runtime constraint
• A configuration is critical only if its L is maximum amongst the Ls of all the selected processor configurations, where $c_i = \sum_{j} c_{i,j}$
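The equations on the formulation slides did not survive, so the following is only a sketch of how the stated objective and constraints could be written, not the authors' exact formulation. It assumes a runtime model that the slides do not spell out (the L1 values of all selected configurations, plus the L of the critical configuration for each of the remaining iterations), introduces a hypothetical symbol $M$ for the number of iterations (for example, macroblocks), and assumes $j$ runs from 1 to $K_i$. The last inequality is one possible linear way to force the critical configuration to have the maximum L; the slides' own version, which uses $c_i$, may differ.

```latex
\begin{align*}
\text{minimise}\quad
  & \sum_{i=0}^{N-1} \sum_{j=1}^{K_i} A_{i,j}\, x_{i,j} \\
\text{subject to}\quad
  & \sum_{j=1}^{K_i} x_{i,j} = 1
    && \forall i \\
  & \sum_{i=0}^{N-1} \sum_{j=1}^{K_i} c_{i,j} = 1 \\
  & c_{i,j} \le x_{i,j}
    && \forall i, j \\
  & \sum_{i=0}^{N-1} \sum_{j=1}^{K_i} L1_{i,j}\, x_{i,j}
    + (M - 1) \sum_{i=0}^{N-1} \sum_{j=1}^{K_i} L_{i,j}\, c_{i,j} \le R_d \\
  & \sum_{j=1}^{K_i} L_{i,j}\, x_{i,j}
    \le \sum_{m=0}^{N-1} \sum_{n=1}^{K_m} L_{m,n}\, c_{m,n}
    && \forall i \\
  & x_{i,j},\; c_{i,j} \in \{0, 1\}
    && \forall i, j
\end{align*}
```

All terms are linear in the binary variables because $M$, $R_d$ and the $L1$, $L$ and $A$ values are constants.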

Some Experiments …

Configurations include custom instructions and differing instruction and data caches

Design Space = 4.2 x 10^13 design points

Processor #    # Configurations
Processor 0    144
Processor 1    144
Processor 2    396
Processor 3    144
Processor 4    252
Processor 5    144
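The size of the design space follows from the configuration counts listed for Processor 0 to Processor 5: one configuration is picked per processor, so the counts multiply. A minimal sketch of that arithmetic in Python (the variable names are illustrative only):

```python
from math import prod

# Number of configurations per processor, as listed on the slide.
configurations = {
    "Processor 0": 144,
    "Processor 1": 144,
    "Processor 2": 396,
    "Processor 3": 144,
    "Processor 4": 252,
    "Processor 5": 144,
}

# One configuration is chosen per processor, so the counts multiply.
design_points = prod(configurations.values())

print(f"{design_points:,}")    # 42,908,733,407,232
print(f"{design_points:.2e}")  # 4.29e+13
```

The exact product is about 4.29 x 10^13, in line with the order of magnitude quoted on the slide and far too large to enumerate exhaustively.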

Results

Rd (Clock Cycles)    Solution Time
3500000              3 secs
3200000              17 mins
2900000              35 mins
2800000              No sol. (2 days)
2700000              No sol. (2 days)

To further reduce solution time:
• Better ILP solvers can be used
• Pruning techniques and heuristics can be used
• The ILP formulation can be simplified (fewer variables and constraints)

Simplification of the Binary ILP

Any ideas?

Can we avoid the use of variables for the determination of the critical processor ($c_{i,j}$)?

• Write the first constraint (runtime ≤ Rd) multiple times, once for each processor, each time assuming that processor is critical
• Remove the $c_{i,j}$ variables and the second constraint (the critical configuration has the maximum L)
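Why the simplification is safe can be checked on a toy design space. The sketch below is an illustration in Python, not the authors' implementation: the configuration data, the iteration count `M`, the constraint `RD` and the runtime model (sum of L1 values plus `M - 1` times the critical L) are all assumptions made for the example. It verifies by brute force that one runtime constraint per processor accepts exactly the same design points as a single constraint on the maximum-latency processor, then picks the minimum-area feasible point.

```python
from itertools import product

# Hypothetical toy data: configs[i] lists (L1, L, area) for processor i.
configs = [
    [(1200, 1000, 50), (900, 700, 80)],
    [(2500, 2000, 40), (1500, 1200, 90), (1100, 900, 130)],
    [(1000, 900, 30), (800, 600, 60)],
]
M = 512        # hypothetical number of iterations (e.g. macroblocks)
RD = 650_000   # hypothetical runtime constraint


def runtime_if_critical(selection, k):
    """Runtime under the assumed model if processor k were critical."""
    first = sum(l1 for (l1, _, _) in selection)
    return first + (M - 1) * selection[k][1]


def feasible_simplified(selection):
    """Simplified form: one runtime constraint per processor, no c variables."""
    return all(runtime_if_critical(selection, k) <= RD
               for k in range(len(selection)))


def feasible_original(selection):
    """Original form: only the maximum-latency (critical) processor counts."""
    first = sum(l1 for (l1, _, _) in selection)
    return first + (M - 1) * max(l for (_, l, _) in selection) <= RD


best = None
for selection in product(*configs):
    # Both forms must agree on every design point.
    assert feasible_simplified(selection) == feasible_original(selection)
    if feasible_simplified(selection):
        area = sum(a for (_, _, a) in selection)
        if best is None or area < best[0]:
            best = (area, selection)

print(best)
```

The two forms agree because a bound that holds for every processor holds in particular for the one with the largest L, and vice versa. Exhaustive enumeration is only workable for a toy space like this one; the design space in the experiments needs an ILP solver.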

Improved Results

Rd (Clock Cycles)    Solution Time       Improved Solution Time
3500000              3 secs              < 1 sec
3200000              17 mins             11 secs
2900000              35 mins             1 min
2800000              No sol. (2 days)    5 mins
2700000              No sol. (2 days)    6 mins

Further Reading

H. Javaid and S. Parameswaran. Synthesis of Heterogeneous Pipelined Multiprocessor Systems Using ILP: JPEG Case Study. In CODES+ISSS, 2008.

H. Javaid, A. Ignjatovic and S. Parameswaran. Rapid Design Space Exploration of Application Specific Heterogeneous Pipelined Multiprocessor Systems. In IEEE TCAD, 2010.

Questions?

If you are interested in research related to design, modelling and optimisation of multiprocessor systems, embedded systems and FPGA based systems for thesis or non-thesis reasons ;-)

contact me at harisj@cse.unsw.edu.au