Requirements of system level design approach

Page 1: Requirements of system level  design approach

F. Catthoor © imec 2004

Requirements of system level design approach

[Figure: Algorithms + Data Structures; Architecture (ARM, IP1, IP2, RAM, ROM); Platform architecture (RAM, ROM, MMU, custom logic, DSP, microprocessor)]

Page 2: Requirements of system level  design approach

Current challenges and solutions

• System Specification and System-level Refinement with Exploration Support (algorithm design level, concurrent task level, system timing simulation)

• Data Transfer and Storage Exploration for Massive Real-Time Data Manipulation (dynamic memory management, static transfer and storage, address generation)

• Co-Design for Heterogeneous Implementation Paradigms (refinement from unified HW/SW model, RTOS modeling, complete system simulation)

• RF front-end exploration (fast mixed-signal co-simulation, chip-package co-design, noise coupling)

Page 3: Requirements of system level  design approach

Task vs Array data vs Instr level issues

[Figure: refinement from the Optimized system specification to the Task-level, Array-level and Proc.-level system architecture, annotated with task-level issues, array data level issues and instr.-level issues (arithmetic + local control + address issues); shows Task1, Task2, Task3 and Proc1, Proc2, Proc3]

Page 4: Requirements of system level  design approach

Concurrency versus DTSE issues

[Figure: Optimized system specification, DTSE optimized specification and Proc.-level system architecture, annotated with data transfer and storage exploration issues, concurrency issues, and arithmetic + local control + addressing; shows Proc1, Proc2, Proc3 and background memories]

Page 5: Requirements of system level  design approach

Why fix data storage/transfer before concurrency management issues?

Recursive image processing algorithm on local neighbourhoods:

(i : 0 .. I-1) ::
  (j : 0 .. J-1) ::
    img[i][j] = f(img[i][j-k], old_img[i][j]);

[Figure: image of I rows by J columns]
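
A minimal Python sketch of this recurrence may help when reading the next slides. The function `f` and the `boundary` value used when `j - k` falls left of the row start are assumptions; the slide defines neither.

```python
def f(left, old):
    """Hypothetical stand-in for the slide's f(); not the original function."""
    return (left + old) // 2


def recursive_filter(old_img, k, boundary=0):
    """Compute img[i][j] = f(img[i][j-k], old_img[i][j]) for every pixel.

    `boundary` is an assumed value used when j - k is left of the row start.
    """
    rows, cols = len(old_img), len(old_img[0])
    img = [[0] * cols for _ in range(rows)]
    for i in range(rows):          # i : 0 .. I-1
        for j in range(cols):      # j : 0 .. J-1
            left = img[i][j - k] if j >= k else boundary
            img[i][j] = f(left, old_img[i][j])
    return img
```

The dependence runs only along a row, from column j-k to column j, so different rows never read each other's results.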

Page 6: Requirements of system level  design approach

Why fix data storage/transfer before concurrency management issues?

For a given speed-up M: at least M data-paths for f()

Unrolling the i loop (limited by I): M J-word double-buffered FIFOs

[Figure: image of I rows by J columns; annotation: 14.4 mm² (0.7 µm)]
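
One way to picture the i-loop unrolling is as M rows advancing in lockstep, one data-path per row. The Python sketch below is an illustration under assumptions: `f`, the `boundary` value and both helper names are hypothetical, and the storage estimate assumes that a double-buffered J-word FIFO holds 2·J words, which the slide does not state.

```python
def f(left, old):
    """Hypothetical stand-in for the slide's f()."""
    return (left + old) // 2


def filter_rows_in_groups(old_img, k, m, boundary=0):
    """Process rows in groups of m, all rows of a group advancing column by column.

    Each row of a group models one data-path. Rows never read each other's
    pixels, so the grouping does not change the result.
    """
    rows, cols = len(old_img), len(old_img[0])
    img = [[0] * cols for _ in range(rows)]
    for base in range(0, rows, m):                  # one group of up to m rows
        group = range(base, min(base + m, rows))
        for j in range(cols):                       # shared column counter
            for i in group:                         # m data-paths side by side
                left = img[i][j - k] if j >= k else boundary
                img[i][j] = f(left, old_img[i][j])
    return img


def fifo_words(m, cols):
    """Assumed buffer cost: m double-buffered FIFOs of `cols` words each."""
    return 2 * m * cols
```

Under that assumption the buffer storage grows with both M and the row length J, which is why this option is expensive for wide images.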

Page 7: Requirements of system level  design approach

Why fix data storage/transfer before concurrency management issues?

Unrolling the j loop (limited by k): M - 1 buffer registers

(i : 0 .. I-1) ::
  (j : 0 .. (J div 2)-1) ::
    begin
      img[i][2j-1] = f(img[i][2j-k-1], old_img[i][2j-1]);
      img[i][2j]   = f(img[i][2j-k], old_img[i][2j]);
    end;

[Figure: image of I rows by J columns]
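
The j-loop unrolling can be checked against the one-pixel-per-iteration loop with a short Python sketch. It is a sketch under assumptions: `f` and the `boundary` value are hypothetical, and the pair of columns handled per iteration is taken as 2j and 2j+1 (rather than 2j-1 and 2j as on the slide) so that all indices stay inside the row.

```python
def f(left, old):
    """Hypothetical stand-in for the slide's f()."""
    return (left + old) // 2


def filter_row(old_row, k, boundary=0):
    """Reference version: one pixel per iteration."""
    out = [0] * len(old_row)
    for j in range(len(old_row)):
        left = out[j - k] if j >= k else boundary
        out[j] = f(left, old_row[j])
    return out


def filter_row_unrolled2(old_row, k, boundary=0):
    """Two pixels per iteration; valid only when k >= 2 and the row length is even."""
    cols = len(old_row)
    if k < 2 or cols % 2:
        raise ValueError("unroll factor 2 needs k >= 2 and an even row length")
    out = [0] * cols
    for jj in range(cols // 2):
        j0, j1 = 2 * jj, 2 * jj + 1
        # Both reads hit pixels written in earlier iterations (k >= 2),
        # so the two f() calls are independent and could run in parallel.
        left0 = out[j0 - k] if j0 >= k else boundary
        left1 = out[j1 - k] if j1 >= k else boundary
        out[j0], out[j1] = f(left0, old_row[j0]), f(left1, old_row[j1])
    return out
```

The guard on `k` reflects the "limited by k" remark: with an unroll factor larger than the dependence distance, the pixels computed in one iteration would depend on each other.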

Page 8: Requirements of system level  design approach

Global data management design flow for dynamic concurrent tasks with data-dominated behaviour

[Figure: design flow with the stages Concurrent OO spec, Data Type Refinement, Task concurrency mgmt, Physical memory mgmt, Address optimization and SW/HW co-design, leading into the SW design flow and the HW design flow. Side panels: Dynamic Data Types (key/data records, Binary Tree (BT)); Virtual Memory Mgmt (sub-pool per size, free blocks); Memory Allocation/Assignment (processor, mem, mem, mem); Mgmt Unit, Memory, controller, ASU]

Page 9: Requirements of system level  design approach

Data Management Flow

[Figure: flow stages Dynamic Data Type (DDT) Trafo & Refinement, Dynamic memory mgmt Refinement and Physical memory mgmt Refinement; labels: Dynamic Data Type Explor., Physical Memory Mngnt., Concrete Data Types, Virtual Memory Segments, Physical Memories]

Page 10: Requirements of system level  design approach

Data-transfer and data-storage bottlenecks: SDRAM access

[Figure: Client and Main Memory connected by a 128 - 1024 bit data bus; the memory contains bank 1 to bank N, each with Local Select and Local Latch, a Cache and Bank comb. block, and Global Bank Select Control (addr, ctrl); annotations: wide word, burst mode]

Page 11: Requirements of system level  design approach

Data-transfer and data-storage bottlenecks: cache misses

[Figure: memory hierarchy with Processors (data-paths, regf), L1 cache (16 kB N-port SRAM), L2 cache (1 MB 1/2-port SRAM) and Main Memory (256 MB (S)DRAM); Client; annotations: many cache misses, page loading]

Page 12: Requirements of system level  design approach

Data-transfer and data-storage bottlenecks: system bus load

[Figure: System chip with data-paths and L1 cache; L2 cache; Main Memory; hard disk; other system resources; buses: L2 bus, main system bus, disk access bus]

Page 13: Requirements of system level  design approach

Multi-processor System Design

Image Proc System:

• Standard subsystem: detailed solution locally optimized by expert. E.g.: 2D convolution

• Subsystem resembles a standard solution but needs small adaptations. E.g.: DCT for specific coder

• New complex subsystem. E.g.: quadtree coder

Locally optimized

Globally optimized => exploration!

[Figure: the subsystems connected through buffers]

Page 14: Requirements of system level  design approach

Platform design requires change

[Figure: road-map cartoon with the labels Multi-media platform city, Traditional architecture city, Traditional compiler boulevard, Power volcano (multi-media), processor trend, Application engineer, Cobblestone bypass road (requires paving)]

Page 15: Requirements of system level  design approach

Ad-Hoc Design: Backtracking?

[Figure: System Specification leading to many candidate Memory Organizations, all marked "?"]

Page 16: Requirements of system level  design approach

Systematic System Exploration

[Figure: System Specification refined in steps toward a Memory Organization; at each step one alternative is marked "!" among candidates marked "?"]

Page 17: Requirements of system level  design approach

Data Transfer & Storage Exploration (DTSE) Principles

[Figure: Chip with Processor Data Paths, L1 Cache, L2 Cache and Cache & Bank Recombine; off-chip SDRAM with Local Latch 1 + Bank 1 to Local Latch N + Bank N]

Page 18: Requirements of system level  design approach

Data Transfer & Storage Exploration (DTSE) Principles

[Figure: Chip with Processor Data Paths, L1 Cache, L2 Cache and Cache & Bank Recombine; off-chip SDRAM with Local Latch 1 + Bank 1 to Local Latch N + Bank N; annotation: ANALYSIS!]

Page 19: Requirements of system level  design approach

Main Data Transfer & Storage Principles

[Figure: Chip with Processor Data Paths, L1 Cache, L2 Cache and Cache & Bank Recombine; off-chip SDRAM with Local Latch 1 + Bank 1 to Local Latch N + Bank N; annotated with the six principles]

1 Reduce redundant transfers

2 Introduce locality

3 Exploit memory hierarchy

4 Avoid N-port memories

5 Meet real-time constraints

6 Exploit limited life-time and data layout freedom
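
As a small illustration of principles 1, 2 and 6, the Python sketch below compares a two-pass computation that stores a full intermediate array with a fused version that keeps the intermediate value in a local variable. The two stage computations and the access counter are hypothetical, and counting one access per array read or write is a simplification, not the cost model used in the slides.

```python
def two_pass(src):
    """Stage 1 writes a full intermediate array; stage 2 reads it back."""
    accesses = 0
    tmp = [0] * len(src)
    for j, x in enumerate(src):
        tmp[j] = x * 2            # read src[j], write tmp[j]
        accesses += 2
    dst = [0] * len(src)
    for j in range(len(src)):
        dst[j] = tmp[j] + 1       # read tmp[j], write dst[j]
        accesses += 2
    return dst, accesses


def fused(src):
    """Both stages in one loop; the intermediate lives in a scalar."""
    accesses = 0
    dst = [0] * len(src)
    for j, x in enumerate(src):
        t = x * 2                 # read src[j]; t is short-lived and stays local
        dst[j] = t + 1            # write dst[j]
        accesses += 2
    return dst, accesses
```

Both versions produce the same output, but the fused loop halves the counted array accesses and needs no intermediate array at all, because each intermediate value is consumed right after it is produced.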

Page 20: Requirements of system level  design approach

Time-Efficient System Exploration Design Flow

[Figure: Initial System Specification explored through design alternatives marked "?", with system-level feedback; annotations: fast implementation with tools; accurate cost figures to guide decision]

Page 21: Requirements of system level  design approach

Physical Memory Management

Page 22: Requirements of system level  design approach

Cavity detection application: medical imaging

Initial description

function GaussBlur
function ComputeEdges
function DetectRoots
function LabelRoots
  (image_in : W[N][N]) image_out : W[N][N] = ...

Every function computes new matrix information from the output of the previous step. The new value of a pixel depends on its neighbors.

Page 23: Requirements of system level  design approach

Cavity detector results: overall summary

[Figure: bar chart (scale 0 to 600) of accesses, size and cycles after each step: Original, DF trafo, Loop trafo, Data reuse, In-place, Data layout, ADOPT - modulo, ADOPT - rest]

Page 24: Requirements of system level  design approach

Conclusions for DTSE stage

• Typically an order of magnitude can be gained on system bus load

• As a result, the energy consumption in the data memory hierarchy is also reduced by about the same amount

• The system performance (board level) is also significantly improved, because there is less competition for resources on these system buses

• Penalty on code size is small (less than 20%)

• Typically the pure CPU speed is improved IF there was a data transfer bottleneck that could not be "hidden" by overlapping the computation and communication in the original code (which was certainly the case for the cavity detector)

Page 25: Requirements of system level  design approach

Task- versus Proc./Instr-level: mapping

[Figure: Task1, Task2 and Task3; Proc1, Proc2 and Proc3; Array Proc1 and Array Proc2]

Page 26: Requirements of system level  design approach

Pareto curves allow task trade-off decision: DAB illustration

[Figure: three energy versus execution time plots, one each for TASK-1, TASK-2 and TASK-3; annotation: mapped on two processors. Source: Digital Audio Broadcast]

Page 27: Requirements of system level  design approach

Pareto curves allow task trade-off decision

[Figure: three energy versus execution time plots, one each for TASK-1, TASK-2 and TASK-3; annotations: single proc., large mem. overhead. Source: Digital Audio Broadcast]

Page 28: Requirements of system level  design approach

Pareto curves allow task trade-off decision

[Figure: three energy versus execution time plots, one each for TASK-1, TASK-2 and TASK-3. Source: Digital Audio Broadcast]

Page 29: Requirements of system level  design approach

Trade-offs in memory organisation (e.g. voice coder SW controlled cache)

[Figure: relative power (cache power and main memory power, scale 0 to 2) versus cache size in words: 32w, 64w, 96w, 128w, 256w, 512w]

Gain in power of an additional factor 6 compared to optimized, platform-independent code

Page 30: Requirements of system level  design approach

Global concurrency management design flow for dynamic concurrent tasks with data-dominated behaviour

[Figure: design flow with the stages Concurrent OO spec, Data Type Refinement, Task concurrency mgmt, Physical memory mgmt, Address optimization and SW/HW co-design, leading into the SW design flow and the HW design flow. Side panels: Task1, Task2, Task3 (transform; task schedule; allocate/assign); System control (HW-Ctrl, uCtrl); Memory organ.; unified model, partition, refine/compile onto uProc, DSP, HW]

Page 31: Requirements of system level  design approach

Why are applications becoming more dynamic and concurrent?

[Figure: JPEG and MPEG4 task graphs with tasks T1, T1', T2, T3, T4]

The workload decreases but the tasks are dynamically created and their size is data dependent

Page 32: Requirements of system level  design approach

Terminal QoS (3D demonstrator)

Page 33: Requirements of system level  design approach

Reduce global system energy by task scheduling + assignment (e.g. 2-processor approach)

[Figure: TN1, TN2, ..., TNn assigned to two ARM processors (Processor 1, Processor 2) with supply voltages Vdd = 1 V and Vdd = 3.3 V]

Codes'01, System Design Automation book Verlag'01

Page 34: Requirements of system level  design approach

Tradeoff between time-budget and energy

Processor 1 (low Vdd): Vdd = 1.5 V, 5 nJ/instr., 2 TUs/instr.

Processor 2 (high speed): Vdd = 3.0 V, 20 nJ/instr., 1 TU/instr.

Tradeoff          Processor 1                      Processor 2                      Total
More time units   90 M instr., 180 M TUs, 450 mJ   20 M instr., 20 M TUs, 400 mJ    180 M TUs, 850 mJ
More energy       40 M instr., 80 M TUs, 200 mJ    70 M instr., 70 M TUs, 1400 mJ   80 M TUs, 1600 mJ
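
The totals on this slide can be reproduced with a few lines of Python. This is a worked check, not the authors' tool: the processor names and the `evaluate` helper are hypothetical, and the sketch assumes the two processors run in parallel, so the slower one sets the overall time.

```python
# Per-instruction figures from the slide: (time units, energy in nJ).
PROCESSORS = {"P1_low_vdd": (2, 5), "P2_high_speed": (1, 20)}


def evaluate(assignment):
    """assignment maps a processor name to its instruction count.

    Returns (time units, energy in mJ), assuming the processors run in
    parallel so that the slower one determines the overall time.
    """
    times, energy_nj = [], 0
    for name, instr in assignment.items():
        tu, nj = PROCESSORS[name]
        times.append(instr * tu)
        energy_nj += instr * nj
    return max(times), energy_nj / 1e6  # nJ -> mJ


slow = evaluate({"P1_low_vdd": 90_000_000, "P2_high_speed": 20_000_000})
fast = evaluate({"P1_low_vdd": 40_000_000, "P2_high_speed": 70_000_000})
```

Here `slow` evaluates to 180 M time units and 850 mJ, and `fast` to 80 M time units and 1600 mJ: moving work to the high-speed processor buys time at the price of energy.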

Page 35: Requirements of system level  design approach

Trade-off between time budget (period/latency) and cost (e.g. energy) leads to Pareto curves

[Figure: cost versus time plot with time budgets TB1 to TB6; the points are processor alloc/assign and scheduling alternatives for TNs in code version 1; points marked x are non-optimal points]
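
Separating the Pareto-optimal alternatives from the non-optimal points can be sketched in Python as follows. The function name and the (time, cost) tuple layout are assumptions made for illustration.

```python
def pareto_front(points):
    """Keep the (time, cost) points that no other point dominates.

    A point dominates another if it is no worse in both time and cost
    and differs in at least one of them.
    """
    front = []
    best_cost = float("inf")
    for time, cost in sorted(set(points)):   # increasing time, then cost
        if cost < best_cost:                 # strictly cheaper than all faster points
            front.append((time, cost))
            best_cost = cost
    return front
```

For example, among the alternatives `(20, 60)` and `(30, 70)` the second is non-optimal, since it is both slower and more costly than the first.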

Page 36: Requirements of system level  design approach

Not single working point but Pareto curves needed in global trade-off

[Figure: energy (nJ, scale 0 to 3500) versus time budget (us, scale 0 to 250)]

Both data transfer-storage and concurrency aspects have to be combined!

Page 37: Requirements of system level  design approach

Comparison of scheduling the original and transformed task-level descriptions

[Figure: energy (nJ, scale 0 to 3500) versus time budget (us, scale 0 to 250) for the original and the transformed description]

Page 38: Requirements of system level  design approach

Overall solution: combination of complex design- and simple run-time schedulers

Cases'00, ISSS'01, Design&Test Sep.'01

[Figure: thread frame 1 (nodes 1, 2, 3) and thread frame 2 (nodes A, B); design-time scheduling yields a cost versus time curve per thread frame (TF 1, TF 2); run-time scheduling produces the combined order 1 A B 3 2]

• Design-time scheduling: at compile time, exploring all the optimization possibilities

• Run-time scheduling: at run time, providing flexibility and dynamic control at low cost as part of synthesized RTOS
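
The division of labour between the two schedulers can be sketched in Python: design-time scheduling supplies one list of (time, energy) points per task or thread frame, and the run-time step only picks one point from each list so that a time limit is met. The function below is a hypothetical illustration, not the authors' scheduler. It assumes the tasks run one after another, so their times add up, and it uses exhaustive search purely for clarity.

```python
from itertools import product


def select_points(task_curves, time_limit):
    """Pick one (time, energy) point per task, minimising total energy.

    Tasks are assumed to run one after another, so their times add up.
    Exhaustive search is used for clarity only.
    Returns (total_energy, total_time, chosen_points), or None if no
    combination meets the time limit.
    """
    best = None
    for choice in product(*task_curves):
        total_time = sum(t for t, _ in choice)
        total_energy = sum(e for _, e in choice)
        if total_time <= time_limit and (best is None or total_energy < best[0]):
            best = (total_energy, total_time, choice)
    return best
```

A looser time limit lets the selection move to slower, lower-energy points on each curve; a tighter one forces faster, more costly points.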

Page 39: Requirements of system level  design approach

Run-time: original Pareto point selection

[Figure: task energy versus task execution time curves for Task 1 and Task 2, combined into an application energy versus application execution time curve with a time limit]

Page 40: Requirements of system level  design approach

Run-time: one selection if new task enters

[Figure: task energy versus task execution time curves for Task 1, Task 2 and Task 3, combined into an application energy versus application execution time curve with a time limit]

Page 41: Requirements of system level  design approach

Run-time: better selection if new task enters

[Figure: task energy versus task execution time curves for Task 1, Task 2 and Task 3, combined into an application energy versus application execution time curve with a time limit; annotation: Gain]

Page 42: Requirements of system level  design approach

Quality of Service (QoS) result

[Figure: bar chart of energy (J), scale 0 to 20, for fps=5 and fps=10]

Energy (J)   no DVS   inter-task DVS   greedy heur.   DP
fps=5        17.53    14.32            6.211          6.171
fps=10       17.53    14.65            9.487          9.469

65% energy saving for 5 fps, 46% for 10 fps
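
The quoted savings follow from the bar values, as the short Python check below shows. It assumes that the first series of values on the slide belongs to fps=5 and the second to fps=10, and that the saving is measured from "no DVS" to "DP"; both assumptions are consistent with the quoted percentages. The dictionary and function names are hypothetical.

```python
# Energy in joules per scheduling approach, as read from the slide.
ENERGY_J = {
    "fps=5": {"no DVS": 17.53, "DP": 6.171},
    "fps=10": {"no DVS": 17.53, "DP": 9.469},
}


def saving_percent(baseline, improved):
    """Relative energy saving of `improved` over `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


savings = {
    fps: round(saving_percent(e["no DVS"], e["DP"]))
    for fps, e in ENERGY_J.items()
}
```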