18
1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory IDA, Linköping University 2 Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6 H.L. Describe and Synthesize Description of a design in terms of behavioral specification. Refinement of the design towards an implementation by adding structural details. Evaluation of the design in terms of a cost function and the design is optimized for the cost function. For I=0 To 2 Loop Wait until clk’event and clk=´1´; If (rgb[I] < 248) Then P=rgb[I] mod 8; ... clk P R enable O

TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

1

TDTS 01 Lecture 6

High-Level Synthesis I

Zebo PengEmbedded Systems Laboratory

IDA, Linköping University

2Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

H.L. Describe and Synthesize

Description of a design in terms of behavioral specification.

Refinement of the design towards an implementation by adding structural details.

Evaluation of the design in terms of a cost function and the design is optimized for the cost function.

For I=0 To 2 LoopWait until clk’eventand clk=´1´;If (rgb[I] < 248) Then

P=rgb[I] mod 8;... clk

PR

enable

O

Page 2: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

2

3Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Lecture 6

Advanced scheduling issues

A typical HLS process

Definition and motivation

Scheduling techniques

4Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Definition

HLS generates automatically a register-transfer level design from a behavioral specification.

For I=0 To 2 LoopWait until clk’eventand clk=´1´;If (rgb[I] < 248) Then

P=rgb[I] mod 8;...

clk

p R

enable

OHLS

HLS is also called architectural synthesis, behavioral synthesis, or algorithmic synthesis.

Layout Synthesis

Logic Synthesis

RTL Synthesis

High-Level Synthesis

Page 3: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

3

5Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Input to HLSThe behavioral specification.

Design constraints (timing, performance, cost, power consumption, pin-count, testability, etc.).

An optimization (cost) function.

A module library representing the available components.

For I=0 To 2 LoopWait until clk’eventand clk=´1´;If (rgb[I] < 248) Then

P=rgb[I] mod 8;...

clk

p R

enable

OHLS

6Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Controller

Output of HLSRTL implementation structure (netlist).

Controller (captured usually as a symbolic FSM).

For I=0 To 2 LoopWait until clk’eventand clk=´1´;If (rgb[I] < 248) Then

P=rgb[I] mod 8;...

clk

p R

enable

OHLS

C1 C2 C3

Other attributes, such as geometrical

information, to guide the back-end design tasks.

Page 4: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

4

7Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Goal of HLS To generate a RTL design that

implements the specified behavior,

satisfies the design constraints, and

leads to the optimization of the given cost function.Ex. Cost = DP-area + Cont-area + Exe-time

For I=0 To 2 LoopWait until clk’eventand clk=´1´;If (rgb[I] < 248) Then

P=rgb[I] mod 8;...

HLS

Controller

clk

p R

enable

O

C1 C2 C3

8Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Why HLS?Efforts spent in system design are now dominating

the design cycle.

Physical design → Logic design → System design.

Improve design productivity.

Increase re-targetability.

New technology generations.

ASICs vs. FPGAs.

New platforms.

Suport design re-use.

Get the design correct the first time.

Page 5: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

5

9Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Lecture 6

Advanced scheduling issues

A typical HLS process

Definition and motivation

Scheduling techniques

10Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

HLS Phase 1 — Specification Which language to use?

Procedural languages

Functional languages

Graphics notations

VHDL vs. Verilog vs. SystemC

Explicit parallelism or not?

Requirement specification

Timing

Power consumption

Ares cost

PROCEDURE Test;VAR

A,B,C,D,E,F,G:integer;

BEGINRead(A,B,C,D,E);F := E*(A+B);G := (A+B)*(C+D);...

END;

Page 6: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

6

11Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Phase 2 — Data-Flow Analysis

Parallelism extraction.

Eliminating high-level

language constructs

(procedure call, complex

operator, etc.).

Program transformation (e.g,

loop invariant movement).

Loop unrolling.

Common sub-expression

detection.

Dead code elimination.

+

A B C DE

GF

+

* *

PROCEDURE Test;VAR

A,B,C,D,E,F,G:integer;BEGIN

Read(A,B,C,D,E);F := E*(A+B);G := (A+B)*(C+D);...

END;

12Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Phase 3 — Operation Scheduling

To decide when to perform

each operation.

Performance/cost trade-

off.

Clocking strategy.

Performance estimation.

+

A B C DE

GF

+

* *

Page 7: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

7

13Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Phase 4 — Data-Path Allocation

Operator selection and

binding.

Register/memory allocation.

Interconnection generation.

Hardware minimization.

+

A B C DE

GF

+

* *

To decide the basic data-path structure.

partial data-path

Reg R2

+

A<0:7> B<0:7>

*

Reg R1

Reg R3

E<0:7>

Reg R4

M

14Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Phase 5 — Control Allocation Selection of control

style (PLA, micro-code, random logic, etc.).

Controller generation.

Controller description:

S1: Load R1 and R2 next S2;

S2: Add, M=1, Load R3 next S3;

S3: Load R4 next S4;

S4: Mult, M=0, Load R3 next S5;

...

Reg R2

+

A<0:7> B<0:7>

*

Reg R1

Reg R3

E<0:7>

Reg R4

M

Page 8: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

8

15Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Phase 6 — Module Binding and Cont. Imp.

Selection of physical modules.

Controller ROM:

0000 : 11000000 0001

0001 : 00100000 0010

0010 : 00011000 0011

0011 : 01000000 0100

...Module

Lib

Reg R1 Reg R2

Reg R3

A<0:7> B<0:7>

M1

Adder 8

L2L1

L3

Decision on module parameters and constraints.

Controller implementation.

16Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

The Basic Issues

Scheduling — Assignment of each operation to a time slot corresponding to a clock cycle or time interval.

Resource Allocation — Selection of the types of hardware components and the number for each type to be included.

Module Binding — Assignment of operation to the allocated hardware components.

Controller Synthesis — Design of control style and clocking scheme.

Page 9: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

9

17Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

The Basic Issues (Cont’d)

Compilation — To convert the input specification

into an internal representation.

Parallelism Extraction — To extract the inherent

parallelism of the original solution.

It is usually done with data flow analysis

techniques.

Operation Decomposition — Implementation of

complex operations in the behavioral

specification.

18Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Lecture 6

Advanced scheduling issues

A typical HLS process

Definition and motivation

Scheduling techniques

Page 10: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

10

19Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

The Scheduling Problems

Resource-constrained (RC) scheduling:Given a set O of operations with a partial ordering,

a set K of functional unit types, a type function, : O K, to map the operations into the

functional unit types, and resource constraints mk for each functional unit type.

Find a (optimal) schedule for the set of operations that obeys the partial ordering and utilizes only the available functional units.

Time-constrained (TC) scheduling

20Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

A RC-Scheduling Example

a := i1 + i2;

o1 := (a - i3) * 3;

o2 := i4 + i5 + i6;

d := i7 * i8;

g := d + i9 + i10;

o3 := i11 * 7 * g;

1 adder, 1 multiplier

: +,- Adder

* Multiplier

+ *

+

*

*+

-

*

+

+

o1

o2

o3

g

da

Page 11: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

11

21Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

ASAP (As Soon As Possible) Sch.

Sort operations topologically according to their dependence.

Schedule operations in the sorted order by placing them in the earliest possible control step.

1 2 3 4 5

6 7 8

9 10

Sorted DFG

+ 1*

3

+ 2

+ 4

*5

- 6

+ 7

+ 8

*9

*10

Co

ntr

ol S

tep

s

1

2

4

6

7

5

3

+ *

+

*

*+

-

*

+

+

22Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

ALAP (As Late As Possible) Sch. Sort operations topologically according to their dependence.

Schedule operations in the reversed order by placing them in the latest possible control step.

1 2 3 4 5

6 7 8

9 10

Sorted DFG

+ 1

*3

+ 2

+ 4

*5- 6

+ 7

+ 8*

9

*10

Co

ntr

ol S

tep

s

N-5

N-3

N-1

N

N-2

N-4

6 steps needed

+ *

+

*

*+

-

*

+

+

6 steps needed = Optimum

Page 12: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

12

23Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

List Scheduling

For each control step, the operations that are available to be scheduled are kept in a list;

The list is ordered by some priority function:

The length of path from the operation to the end of the block (critical path);

Mobility: the number of control steps from the earliest to the latest feasible control step.

Each operation on the list is scheduled one by one if the resources it needs are free; otherwise it is deferred to the next control step.

24Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Sorted list: +1(2), *3(2), +4(2), +2(1), *5(1)

Schedulable operations

Co

ntr

ol S

tep

s

1

3

5

6

4

2

Sorted list: +1(2), *3(2), +4(2), +2(1), *5(1)

Schedulable operations + 1*

3

List Scheduling Example

+ *

+

*

*+

-

*

+

+

1 2 3 4 5

6 7 8

9 10

+ 2

+ 1*

3

+ 4*

5

- 6

+ 7

+ 8*

9

*10

Page 13: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

13

25Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Time-Constrained Scheduling

Force-Directed Method

TC scheduling — to find a schedule for the operations that obeys the partial ordering and utilizes the minimal amount of hardware.

The basic idea is to balance the concurrency of operations.

ASAP and ALAP schedules, assuming no resource constraints, are calculated to derive the time frames for all operations.

For each type of operations, a distribution graph is built to denote the possible control steps for each operation.

If an operation could be done in k steps, then 1/k is added to each of these k steps.

26Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Force-Directed Scheduling Example

ASAP

+

+

*

a1

a2

*

+a3

ALAP

+

+

*

a1

a2*

+a3

Time Constraint: 3 cycles

a1

a2a3

Distribution Graph for Addition

The

idea

l dis

tribu

tion

Page 14: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

14

27Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Force-Directed Scheduling ExampleThe algorithm tries to balance the distribution graph by calculating the force of each operation-to-control step assignment and select the smallest force.

)(

)(

))(( )()(

1)(

iALAP

iASAP

ji

o

osijso sDG

oTsDGForce

ASAP

+

+

*

a1

a2

*

+a3

ALAP

+

+

*

a1

a2*

+a3

Time Constraint: 3 cycles

a1

a2a3

Distribution Graph for Addition

The

idea

l dis

tribu

tion

28Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

5.015.1)(2

15.1

)3(

)3()2)3((

aALAP

aASAPsca sDGForce

Force-Directed Scheduling ExampleThe algorithm tries to balance the distribution graph by calculating the force of each operation-to-control step assignment and select the smallest force.

ASAP

+

+

*

a1

a2

*

+a3

ALAP

+

+

*

a1

a2*

+a3

Time Constraint: 3 cycles

a1

a2a3

Distribution Graph for Addition

The

idea

l dis

tribu

tion

Page 15: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

15

29Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

5.015.0)(2

15.0

)3(

)3()3)3((

aALAP

aASAPsca sDGForce

Force-Directed Scheduling ExampleThe algorithm tries to balance the distribution graph by calculating the force of each operation-to-control step assignment and select the smallest force.

ASAP

+

+

*

a1

a2

*

+a3

ALAP

+

+

*

a1

a2*

+a3

Time Constraint: 3 cycles

a1

a2a3

Distribution Graph for Addition

The

idea

l dis

tribu

tion

30Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Classification of Sch. Approaches, I

Constructive scheduling —

One operation is assigned to one control step at a time and this process is iterated from control step (operation) to control step (operation).

ASAP

ALAP

List scheduling

Page 16: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

16

31Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Classification of Sch. Approaches, II

Global scheduling —

All control step and all operations are considered simultaneously when operations are assigned to control steps.

Force-directed scheduling

Neural net scheduling

Integer Linear Programming algorithms

32Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Classification of Sch. Approaches, III

Transformational scheduling —

Starting from an initial schedule, a final schedule is obtained by successively transformations.

Transformation-Based Approach

PerformanceCostPowerTestability...

Page 17: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

17

33Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Lecture 6

Advanced scheduling issues

A typical HLS process

Definition and motivation

Scheduling techniques

34Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Advanced Scheduling Issues

Chaining and multi-cycling:

*100ns

+

200ns

+

Two chainedadditions*

100ns

+

+

A multi-cyclemultiplication*

100ns

+

+

50ns

Pipelined operations.

Two adders needed

Complex in analysis, verification and testing.

Page 18: TDTS 01 Lecture 6 High-Level Synthesis I - IDATDTS01/lectures/13/lec6.pdf · 2013-01-29 · 1/29/2013 1 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory

1/29/2013

18

35Zebo Peng, IDA, LiTH TDTS01 Lecture Notes – Lecture 6

Summary on Scheduling

Scheduling has long been recognized as a very important step in high-level synthesis, since it determines the performance of the final design.

A wide variety of algorithms exist in the literature for efficiently performing the task of HLS scheduling.

These algorithms can be classified in different ways:

Objectives: Basic scheduling algorithms, Time constrained, or Resource constrained.

Approaches: Constructive, Global, or Transformational.