A parallel 'for' loop memory template for a high level synthesis compiler
Euromicro Conference on Digital System Design, Lille, France, 02/09/2010
Craig Moore, Wim Meeus, Harald Devos, and Dirk Stroobandt


DESCRIPTION

We propose a parametrized memory template for applications with parallel 'for' loops. The template's parameters reflect important trade-offs made during system design. The template is incorporated in our high level synthesis (HLS) compiler, where the template's parameters are adjusted to the application. The template fits parallel 'for' loops with no loop dependencies and sequential bodies. We found two alternative template implementations using our compiler. In the future, we will develop templates for other types of 'for' loops. These will be added to the compiler and it will identify the template that works best for the application it is compiling. Once a template is selected, the compiler will use design space exploration to select the best combination of template parameters for the targeted hardware and application.

30/06/2010 Craig Moore, DSD 02/09/2010

Outline

● High Level Synthesis
● Hardware Development
● External Memory
● Burst memory transfers
● Parallel For Loops
● Memory Template Overview
● Small Example
● Future Work
● Conclusions

High Level Synthesis (HLS): Missing Pieces

Memory Templates as Tools

● HDL Programmers have:
    ● Toolkit of memory designs
    ● Use the right tool for the job
    ● Manually adapt their designs

● HLS Compilers should:
    ● Have a toolkit of templates
    ● Adapt the template to the app
    ● Evaluate each template
    ● Suggest the best template

Basic Steps for any Algorithm

1) Read values from memory
2) Process each value
3) Store output in memory

for (int i = start; i < end; i++)
{
    b[i] = func(a[i]);
}
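The three steps can be made explicit in a compilable form. The following is only a minimal sketch: the body of `func` and the helper name `process` are hypothetical stand-ins, since the slides do not say what the loop body computes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop body: any pure function of one input value would do. */
static int func(int x)
{
    return 2 * x + 1;
}

/* Apply func to a[start..end-1] and store the results in b.
 * Iteration i reads only a[i] and writes only b[i], so the
 * iterations do not depend on each other. */
static void process(const int *a, int *b, size_t start, size_t end)
{
    assert(a != NULL && b != NULL && start <= end);
    for (size_t i = start; i < end; i++) {
        int value = a[i];         /* 1) read value from memory */
        int result = func(value); /* 2) process the value */
        b[i] = result;            /* 3) store output in memory */
    }
}
```

Calling `process(a, b, 1, 3)` touches only elements 1 and 2 of `b`; elements outside the range keep their old contents.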

Implement on Hardware

External Memory for FPGAs

● A bottleneck
● Sequential in nature
● Number of values returned each cycle depends on bus width
● Each memory request requires a handshake

Adapting to the Bottleneck

● Stream values from memory
● Pre-fetch values
● Read/Write more than one value each clock cycle
● Store values locally to mask latency
● Reduce number of requests

Burst Transfers

● Burst of consecutive memory operations

Read Transfer
Start Address: 3
Transfer: 4

[Figure: memory cells labelled 0 to 6]

Write Transfer
Start Address: 2
Transfer: 5

[Figure: memory cells labelled 0 to 6]
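A burst can be modelled in software as one request followed by a run of consecutive accesses. The sketch below assumes that the figure's cells 0 to 6 are word addresses and that "Transfer" is the number of consecutive words; the function names and `MEM_WORDS` are illustrative, not part of the template.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_WORDS 7 /* memory locations 0..6, as in the slide figure */

/* One request (start address + length), then `length` consecutive reads. */
static void burst_read(const int mem[MEM_WORDS], size_t start, size_t length, int *dst)
{
    assert(start + length <= MEM_WORDS);
    for (size_t k = 0; k < length; k++)
        dst[k] = mem[start + k];
}

/* One request (start address + length), then `length` consecutive writes. */
static void burst_write(int mem[MEM_WORDS], size_t start, size_t length, const int *src)
{
    assert(start + length <= MEM_WORDS);
    for (size_t k = 0; k < length; k++)
        mem[start + k] = src[k];
}
```

Under these assumptions, the read transfer from the slides (start address 3, transfer 4) covers addresses 3 to 6, and the write transfer (start address 2, transfer 5) covers addresses 2 to 6. Each burst needs a single handshake instead of one per word.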

Parallel for Loop

● Each iteration is run in parallel
● No loop dependencies
    ● Loop Transformations to remove them

Example with Dependencies

for i = 1 to 4
{
    a(i) = a(i) + 1
    b(i) = a(i - 1) + a(i + 1)
}
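One way to see why this loop cannot be run in parallel as written is to execute its iterations in two different orders and compare the results. The sketch below does that in C; the array length, the all-zero initial values, and the helper names are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define LEN 6 /* a[0..5], so that a[i - 1] and a[i + 1] exist for i = 1..4 */

/* Execute the four iterations in the order given by `order`. */
static void run_loop(const int order[4], int a[LEN], int b[LEN])
{
    for (size_t k = 0; k < 4; k++) {
        int i = order[k];
        assert(i >= 1 && i <= 4);
        a[i] = a[i] + 1;
        b[i] = a[i - 1] + a[i + 1];
    }
}

/* Returns 1 if forward and reverse iteration orders give the same b, else 0. */
static int order_independent(void)
{
    const int forward[4] = {1, 2, 3, 4};
    const int reverse[4] = {4, 3, 2, 1};
    int a1[LEN] = {0}, b1[LEN] = {0};
    int a2[LEN] = {0}, b2[LEN] = {0};

    run_loop(forward, a1, b1);
    run_loop(reverse, a2, b2);
    for (size_t i = 1; i <= 4; i++)
        if (b1[i] != b2[i])
            return 0;
    return 1;
}
```

Here `order_independent()` returns 0: iteration i reads `a[i - 1]` and `a[i + 1]`, which neighbouring iterations modify, so the result depends on the execution order. A loop that fits the template would return 1 for every ordering.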

Template Overview

Requests read bursts and controls execution of data paths, waits for the output buffer if it is full

Non-pipelined loop bodies executing in parallel.

Manual Design

With enough values, performs write bursts.

Starts and stops execution

Controls access to memory, grants permission based on request (output buffer priority)
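The "output buffer priority" rule can be sketched as a fixed-priority arbiter. This is a software illustration under assumptions, not the generated hardware: the slides do not name the signals, so `grant_t`, `arbitrate`, and the two request flags are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the two requesters and the idle state. */
typedef enum { GRANT_NONE, GRANT_INPUT, GRANT_OUTPUT } grant_t;

/* Grant the memory to one requester; the output buffer wins a tie. */
static grant_t arbitrate(bool input_request, bool output_request)
{
    if (output_request)
        return GRANT_OUTPUT;
    if (input_request)
        return GRANT_INPUT;
    return GRANT_NONE;
}

/* Usage example: both sides request at once, the output buffer is served. */
static void check_priority(void)
{
    assert(arbitrate(true, true) == GRANT_OUTPUT);
    assert(arbitrate(true, false) == GRANT_INPUT);
}
```

Serving the output buffer first is a plausible choice because a full output buffer stalls the data paths, whereas a delayed read burst only postpones new work.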

Byte-Enable Signal

● Multiple values for each memory transaction
● Tells which bytes to replace and preserve

[Figure legend: Ignore, Enable]
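The effect of a byte-enable signal on a single write can be modelled as a masked merge. The sketch below assumes a 32-bit bus with a 4-bit byte-enable mask purely for illustration; the slides do not fix these widths, and `write_with_byte_enable` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one 32-bit bus write with a 4-bit byte-enable mask.
 * Bit k of byte_enable set   -> byte k of the stored word is replaced.
 * Bit k of byte_enable clear -> byte k of the stored word is preserved. */
static uint32_t write_with_byte_enable(uint32_t stored, uint32_t data, uint8_t byte_enable)
{
    assert(byte_enable <= 0xFu);
    uint32_t mask = 0;
    for (unsigned k = 0; k < 4; k++)
        if (byte_enable & (1u << k))
            mask |= (uint32_t)0xFFu << (8 * k);
    return (stored & ~mask) | (data & mask);
}
```

With `byte_enable = 0x3`, only the two low bytes are replaced, so writing `0xAABBCCDD` over `0x11223344` yields `0x1122CCDD`. This is what lets a burst start or end in the middle of a bus word without overwriting neighbouring values.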

Parametrized Template

Parameters

● Memory Bus Width = M
● Word Width = W
● Max Words = A = M / W
● Input FIFOs = X = C_X * A
● Iterations = Output FIFOs = N = C_N * X
● Burst Length
● Input FIFO Length
● Iteration Length
● Output FIFO Length
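The derived parameters follow from the formulas above by simple arithmetic. The sketch below assumes that M is a whole multiple of W and that C_X and C_N are integer multipliers chosen by the designer or compiler; the struct and function names are hypothetical.

```c
#include <assert.h>

typedef struct {
    unsigned max_words;   /* A = M / W */
    unsigned input_fifos; /* X = C_X * A */
    unsigned iterations;  /* N = C_N * X, also the number of output FIFOs */
} template_sizes;

/* Derive A, X and N from the bus width M, word width W and the multipliers. */
static template_sizes derive_sizes(unsigned bus_width, unsigned word_width,
                                   unsigned c_x, unsigned c_n)
{
    assert(word_width > 0 && bus_width % word_width == 0);
    template_sizes s;
    s.max_words = bus_width / word_width;
    s.input_fifos = c_x * s.max_words;
    s.iterations = c_n * s.input_fifos;
    return s;
}
```

As an illustrative setting, a 64-bit bus with 16-bit words gives A = 4; with C_X = 2 and C_N = 1 this yields X = 8 input FIFOs and N = 8 parallel iterations and output FIFOs. Sweeping such values is the kind of design space exploration the template is meant to make cheap.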

Example – Reading Values

Example – Processing Values

Example – Writing Values

[Figures, one per step, sharing one legend: values in memory, values to be read, byte enabled, byte disabled, values processed]

Future Work

● More templates for other parallel for loops
    ● Pipelined loop body
    ● Data reuse
● Compiler identifies parallel for loop
    ● No keywords
    ● Check for loop dependencies, and do loop transformations if required
● Compiler suggests best memory template
    ● Chosen based on performance estimate
    ● Design space exploration using templates

Conclusions

● HLS Tools don't create memory designs
● Manual memory designs can take days/weeks/months to complete
● Parametrized memory template designs are generated in seconds
● Easy to perform design space exploration using different parameter values and/or templates

Thank You!

Questions?

[email protected]
www.elis.ugent.be/~cmoore

Wim Meeus*, Harald Devos‡, and Dirk Stroobandt*
*{wim.meeus, dirk.stroobandt}@elis.ugent.be, ‡[email protected]