Making Good Points: Application-Specific Pareto-Point Generation for Design Space Exploration using Rigorous Statistical Methods David Sheldon, Frank Vahid * Department of Computer Science and Engineering University of California, Riverside *Also with the Center for Embedded Computer Systems at UC Irvine

Page 1: David Sheldon, Frank Vahid * Department of Computer Science and Engineering

Making Good Points: Application-Specific Pareto-Point Generation for Design Space Exploration using Rigorous Statistical Methods

David Sheldon, Frank Vahid*

Department of Computer Science and Engineering, University of California, Riverside

*Also with the Center for Embedded Computer Systems at UC Irvine

Page 2

Parameterized Component: Cache

Line Concatenation [Zhang/Vahid/Najjar, ISCA 2003, ISVLSI 2003, TECS 2005]

[Figure: cache way W1 with a counter and a bus to off-chip memory. Labels: "16 bytes"; "4 physical lines filled when line size is 32 bytes".]

[Figure: bar chart of normalized energy (axis 0% to 120%; three off-scale bars labeled 127%, 620%, 126%) per benchmark for cache configurations cnv8K4W32B, cnv8K1W32B, and cfg8Kwcwslc. Benchmarks: padpcm, crc, auto2, bcnt, bilv, binary, blit, brev, g3fax, fir, pjpeg, ucbqsort, v42, adpcm, epic, g721, pegwit, mpeg, jpeg, art, mcf, parser, vpr, Ave.]

40% avg savings

Page 3

David Sheldon, UC Riverside 3 of 19

FPGA Systems are Often Built from Parameterized Components

Parameterized components include:
Cache (e.g., size, associativity, line size)
Processors
Co-processors
Buses (e.g., bit width, network-on-chip structure)

[Figure: FPGA containing a microprocessor (uP), cache, MPEG encoder, DSP, and bus, each with its own config input.]

Page 4

Microblaze Soft-Core Processor – Design Space due to Parameters

[Figure: scatter plot of the design space. x-axis: Equivalent LUTs (thousands, 0 to 200); y-axis: Cycles (millions, 0 to 14).]

520 points
Over 10 days
~35 min per point
<1 min to execute
Remaining time was in synthesis and place and route

Pareto points: Points where no point exists that is better in all metrics.
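The Pareto-point definition on this slide can be turned into a small filter. The following is a minimal sketch, assuming that lower is better for every metric; the sample design points are hypothetical and are not taken from the Microblaze data.

```python
from typing import List, Sequence


def better_in_all(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if design a beats design b in every metric (lower is better)."""
    return all(x < y for x, y in zip(a, b))


def pareto_points(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep each point for which no other point is better in all metrics."""
    return [p for p in points if not any(better_in_all(q, p) for q in points)]


# Hypothetical (cycles in millions, equivalent LUTs in thousands) pairs.
designs = [(14.0, 20.0), (9.0, 60.0), (12.0, 70.0), (6.0, 150.0)]
print(pareto_points(designs))
```

Here (12.0, 70.0) is dropped because (9.0, 60.0) is better in both metrics. Under this strict reading of the definition, two points that tie in one metric are both kept.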

Page 5

Pareto Points Differ Per Application and Per Criteria

[Figure, panels (a) and (b): Designer A (App a1) and Designer B (App a2) use the same platform with configurations c1, c2, c3, ...; the energy vs. time plots for the two applications show different Pareto points.]

Page 6

Previous Work: Parameter Interdependency Graph

Platune [Givargis/Vahid 2002]: Introduced parameter interdependency graph
Edges – parameters are dependent
Nodes not connected – independent
Search dependent parameters exhaustively; compose local Pareto points into global points
Greatly reduces search space if parameters are independent
Good results, 44 hours

Randomized Approaches
Pareto Simulated Annealing (PSA) [Talarico 2006]: good results, 6 hours
Genetic Algorithms [Ascia 2005]: good results, 4 hours

[Figure: Platune's Architecture. MIPS, I$, D$, and MEM connected by the CPU–I$ Bus, CPU–D$ Bus, and $-MEM Bus. Parameter labels: size, assoc., line size (caches); code, a code (buses); Supply Voltage.]
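To illustrate why an interdependency graph shrinks the search, the sketch below groups parameters into connected clusters and compares the number of evaluations for a per-cluster exhaustive search against a fully exhaustive one. It is not Platune's implementation: the parameter names, level counts, and edges are hypothetical, and the step that composes local Pareto points into global ones is omitted.

```python
from math import prod
from typing import Dict, List, Tuple


def clusters(params: List[str], edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Connected components of the interdependency graph."""
    adj = {p: set() for p in params}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, out = set(), []
    for p in params:
        if p in seen:
            continue
        seen.add(p)
        stack, comp = [p], []
        while stack:
            n = stack.pop()
            comp.append(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        out.append(sorted(comp))
    return out


def search_cost(levels: Dict[str, int], edges: List[Tuple[str, str]]):
    """Evaluations needed per-cluster versus for the whole space."""
    comps = clusters(list(levels), edges)
    clustered = sum(prod(levels[p] for p in c) for c in comps)
    exhaustive = prod(levels.values())
    return comps, clustered, exhaustive


# Hypothetical number of settings per parameter and hypothetical dependencies.
levels = {"i$ size": 4, "i$ assoc": 3, "d$ size": 4, "d$ assoc": 3, "voltage": 5}
edges = [("i$ size", "i$ assoc"), ("d$ size", "d$ assoc")]
comps, clustered, exhaustive = search_cost(levels, edges)
print(comps, clustered, exhaustive)
```

With these assumed numbers, searching each cluster exhaustively takes 12 + 12 + 5 = 29 evaluations instead of 720 for the full cross product.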

Page 7

Our Approach

We developed:
A Design-of-Experiments (DoE)-based technique to automatically generate a parameter interdependency graph
Relieves designer of burden
A technique to generate Pareto points via a parameter interdependency graph edge-weight-based algorithm
Improves speed versus Platune
Called DoE-Based Pareto-Point Generator (DPG)

[Figure: plot of energy (J, 0 to 0.04) vs. time (sec, 0 to 12). Additional labels: Time, Performance.]

Page 8

Design of Experiments (DoE)

Parameters: i$ size, i$ assoc, d$ size, d$ line, d$ assoc, m-i$ code, m-i$ a code, $-m code, Supply Voltage

[Figure: architecture diagram (MIPS, I$, D$, MEM, CPU–I$ Bus, CPU–D$ Bus, $-MEM Bus, Supply Voltage) annotated with the parameter values of one experiment: 2k, 8, 8k, 8, 32, Bi, Bi, Bi, 4.1.]

DoE generates a set of orthogonal experiments that allows for statistical analysis of the search space
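The slides do not say which DoE design is used. As one standard way to obtain orthogonal two-level experiments, the sketch below builds a Hadamard-based design (Sylvester construction) in which every factor is set to -1 (low) or +1 (high). The function names are illustrative, and the choice of nine factors simply matches the nine parameters listed on this slide.

```python
from typing import List


def hadamard(n: int) -> List[List[int]]:
    """Sylvester construction of an n x n Hadamard matrix; n is a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def two_level_design(num_factors: int) -> List[List[int]]:
    """One row per experiment, one -1/+1 column per factor."""
    runs = 1
    while runs <= num_factors:  # need at least num_factors + 1 runs
        runs *= 2
    # Drop the all-ones first column and keep one column per factor.
    return [row[1:num_factors + 1] for row in hadamard(runs)]


design = two_level_design(9)
print(len(design), "experiments for 9 two-level parameters")
for row in design[:4]:
    print(row)
```

Each column is balanced (as many +1 as -1 settings) and every pair of columns is orthogonal, which is what lets the main effect of each parameter be estimated from only 16 runs rather than all 2^9 = 512 combinations.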

Page 9

DPG Algorithm

Subsequent DoE analysis determines main effects of parameters

[Figure: Y bar Marginal Means Plot. x-axis: Effect Levels (-1 and 1 for each parameter); y-axis: 0 to 0.00016. Parameters: i$ size, i$ assoc, d$ size, d$ line, d$ assoc, m-i$ code, m-i$ a code, $-m code, Supply Voltage.]

[Figure: architecture diagram (MIPS, I$, D$, MEM, CPU–I$ Bus, CPU–D$ Bus, $-MEM Bus, Supply Voltage).]
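A marginal means plot shows, for each parameter, the average response at its low (-1) and high (+1) level; the main effect is the difference between those two averages. The sketch below computes it for a two-level design. The design matrix and response values are hypothetical and only illustrate the calculation.

```python
from typing import Dict, List, Sequence


def main_effects(design: List[Sequence[int]], responses: List[float],
                 names: List[str]) -> Dict[str, float]:
    """Mean response at level +1 minus mean response at level -1, per factor."""
    effects = {}
    for j, name in enumerate(names):
        hi = [y for row, y in zip(design, responses) if row[j] == 1]
        lo = [y for row, y in zip(design, responses) if row[j] == -1]
        effects[name] = sum(hi) / len(hi) - sum(lo) / len(lo)
    return effects


# Hypothetical 2-factor full factorial with response y = 10 + 3*a + 0.5*b.
design = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
responses = [6.5, 12.5, 7.5, 13.5]
print(main_effects(design, responses, ["i$ size", "i$ assoc"]))
```

In this toy data the first factor moves the response by 6 units between its levels and the second by 1 unit, so the first would show the steeper line in a marginal means plot.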

Page 10

DPG Algorithm (cont.)

Compute weight of each pair of nodes
Sort edges in decreasing weight

DK, (I$ assoc, CPU-I$ address code)
DI, (I$ assoc, CPU-I$ code)
IK, (CPU-I$ code, CPU-I$ address code)
IQ, (CPU-I$ code, $-MEM address code)
KQ, (CPU-I$ address code, $-MEM address code)
...

[Figure: architecture diagram (MIPS, I$, D$, MEM, CPU–I$ Bus, CPU–D$ Bus, $-MEM Bus, Supply Voltage).]
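The slides do not define how the weight of a pair is computed. As an assumption, the sketch below uses the magnitude of the two-factor interaction effect estimated from a two-level design, then sorts the pairs in decreasing weight. The node letters D, I, K are borrowed from the slide; the response values are hypothetical.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple


def interaction_weights(design: List[Sequence[int]], responses: List[float],
                        names: List[str]) -> List[Tuple[float, str, str]]:
    """Assumed weight: |mean(y | xi*xj = +1) - mean(y | xi*xj = -1)| per pair."""
    edges = []
    for i, j in combinations(range(len(names)), 2):
        same = [y for row, y in zip(design, responses) if row[i] * row[j] == 1]
        diff = [y for row, y in zip(design, responses) if row[i] * row[j] == -1]
        weight = abs(sum(same) / len(same) - sum(diff) / len(diff))
        edges.append((weight, names[i], names[j]))
    edges.sort(key=lambda e: e[0], reverse=True)  # decreasing weight
    return edges


# Hypothetical response over a full 2^3 factorial: strong D-I interaction,
# weak I-K interaction, no D-K interaction.
design = list(product((-1, 1), repeat=3))
responses = [5 + a + 2 * a * b + 0.5 * b * c for a, b, c in design]
for weight, u, v in interaction_weights(design, responses, ["D", "I", "K"]):
    print(u + v, weight)
```

A full factorial is used here so that the interaction estimates are not confounded with one another; with a smaller fractional design some pairs would share an estimate.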

Page 11

DPG Algorithm (cont.)

Pairwise merge of nodes
Creates a sparse set of Pareto points
The designer can direct the tool to fill in the regions of interest

[Figure: energy vs. time plot showing original Pareto points and filled-in Pareto points.]
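The slides give no detail on how two nodes are merged. One plausible reading, sketched below purely as an assumption, is that each node carries a list of partial configurations, and merging evaluates every combination of the two lists and keeps only the Pareto-optimal ones. The `toy_evaluate` cost model and all parameter values are hypothetical stand-ins for a real simulator.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def dominated(p: Tuple[float, ...], q: Tuple[float, ...]) -> bool:
    """True if q is better than p in all metrics (lower is better)."""
    return all(b < a for a, b in zip(p, q))


def merge_nodes(node_a: List[Config], node_b: List[Config],
                evaluate: Callable[[Config], Tuple[float, ...]]) -> List[Config]:
    """Combine every pair of partial configurations and keep the Pareto set."""
    candidates = [{**a, **b} for a, b in product(node_a, node_b)]
    scored = [(evaluate(c), c) for c in candidates]
    return [c for m, c in scored
            if not any(dominated(m, other) for other, _ in scored)]


def toy_evaluate(cfg: Config) -> Tuple[float, float]:
    """Hypothetical (time, energy) model, not taken from the slides."""
    time = 10 / cfg["i$ size"] + 4 / cfg["i$ assoc"]
    energy = cfg["i$ size"] * cfg["i$ assoc"] + cfg["i$ assoc"]
    return (time, energy)


node_a = [{"i$ size": 2}, {"i$ size": 8}]
node_b = [{"i$ assoc": 1}, {"i$ assoc": 4}]
print(merge_nodes(node_a, node_b, toy_evaluate))
```

Because each merge keeps only non-dominated combinations, the merged node stays small, which is consistent with the slide's remark that the result is a sparse set of Pareto points.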

Page 12

Platune – Pareto Graph with Fill-in

[Figure: energy (0 to 0.007) vs. time (0 to 4.5) for the jpeg benchmark. Series: Platune, Single Factor, DoE IOT, DPG - 3 value, DPG - fill in.]

Page 13

Platune – Pareto Graph with Fill-in

[Figure: energy (J, 0 to 0.006) vs. time (sec, 0 to 5) for the b1_histogram benchmark. Series: Platune, DPG.]

Page 14

Interdependency Graph Comparison: Manual vs. Automated

[Figure: interdependency graphs for jpeg, b1_histogram, and g3fax.]

Page 15

Platune Results

[Figure: bar chart of run time in hours (axis 0 to 6; one off-scale bar labeled 44) for SF, DoE IOT, DPG, Genetic, PSA, and Platune.]

DPG is 30x faster than Platune
2.5x faster than Genetic Algorithms
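As a quick consistency check, the speedup factors can be combined with the run times quoted earlier in the deck (44 hours for Platune, 4 hours for genetic algorithms). The slides do not print DPG's run time as a number, so the values below are only what the stated ratios imply.

```python
platune_hours = 44.0  # run time quoted for Platune
genetic_hours = 4.0   # run time quoted for genetic algorithms

# DPG run time implied by each stated speedup factor.
dpg_from_platune = platune_hours / 30   # "30x faster than Platune"
dpg_from_genetic = genetic_hours / 2.5  # "2.5x faster than Genetic Algorithms"

print(f"implied by Platune ratio: {dpg_from_platune:.2f} h")
print(f"implied by genetic ratio: {dpg_from_genetic:.2f} h")
```

Both ratios point to a DPG run time of roughly one and a half hours, so the two claims agree with each other to within rounding.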

Page 16

Xilinx Microblaze Soft-Core Processor

Tuned the Microblaze for various benchmarks

Exhaustive data generated for 12 benchmarks for comparison

The Microblaze also has a configurable cache, which allows for over 3,000 configurations.

For these tests we used previously generated results, which limits us to 64 configurations.

[Figure: Microblaze with parameters bs, FPU, div, mul, MSR, PCMP.]
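The figure names six Microblaze options, and 64 configurations is what six on/off choices would give (2^6 = 64). The sketch below enumerates that space under the assumption that each option is binary; the slides do not state this explicitly.

```python
from itertools import product
from typing import Dict, List

# Option labels as they appear on the slide.
OPTIONS = ["bs", "FPU", "div", "mul", "MSR", "PCMP"]


def all_configurations(options: List[str] = OPTIONS) -> List[Dict[str, bool]]:
    """Every on/off combination of the given options."""
    return [dict(zip(options, bits))
            for bits in product((False, True), repeat=len(options))]


configs = all_configurations()
print(len(configs))
print(configs[0])
print(configs[-1])
```

An exhaustive baseline over these 64 points is what makes it possible to compare generated Pareto points against the true Pareto set for each benchmark.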

Page 17

Network on Chip – Results

DPG also works on larger design spaces

Page 18

DPG Scales Well

Number of Parameters   DPG Analysis Phase   Total Design Space   Percent of Design Space
6                      34                   64                   53.13%
10                     67                   1,024                6.54%
15                     136                  32,768               0.42%
20                     234                  1,048,576            0.02%
25                     353                  33,554,432           0.001%
30                     497                  1,073,741,824        0.00005%
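The table's last two columns can be reproduced from the first two: the total design space equals 2^n for n parameters, which suggests two values per parameter, and the percentage is the analysis-phase count divided by that total. The sketch below recomputes them from the counts in the table.

```python
# Number of parameters -> DPG analysis-phase count, copied from the table.
ANALYSIS_RUNS = {6: 34, 10: 67, 15: 136, 20: 234, 25: 353, 30: 497}


def total_design_space(num_params: int) -> int:
    """Size of the space, assuming two values per parameter."""
    return 2 ** num_params


def percent_of_space(num_params: int, runs: int) -> float:
    """Share of the design space visited in the analysis phase, in percent."""
    return 100.0 * runs / total_design_space(num_params)


for n, runs in ANALYSIS_RUNS.items():
    print(n, runs, f"{total_design_space(n):,}", f"{percent_of_space(n, runs):.5f}%")
```

The analysis count grows far more slowly than the design space, which doubles with every added parameter, so the visited fraction falls from about half at 6 parameters to a tiny fraction of a percent at 30.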

Page 19

Conclusion

DoE-Based Pareto-Point Generation (DPG) algorithm quickly finds good Pareto points
Results were better and obtained faster than with the previous Platune or randomized techniques

Approach is easier to use – no designer knowledge of parameter interdependencies is needed

Useful for FPGAs as well as other parameterized systems, such as SOCs synthesized to ASICs, parameterized SOCs, etc.