47
CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Embed Size (px)

DESCRIPTION

Hardware Design with RTL VHDL Pseudocode DatapathController Block diagram Block diagram State diagram or ASM chart VHDL code Interface 3

Citation preview

Page 1: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

CDA 4253 FPGA System DesignRTL Design Methodology

1

Hao ZhengComp Sci & Eng

USF

Page 2: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Structure of a Typical Digital Design

Datapath(Execution

Unit)

Controller(Control

Unit)

Data Inputs

Data Outputs

Control Inputs

Control Outputs

Control Signals

StatusSignals

2

Page 3: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Hardware Design with RTL VHDL

Pseudocode

Datapath Controller

Blockdiagram

Blockdiagram

State diagramor ASM chart

VHDL code VHDL code VHDL code

Interface

3

Page 4: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Hardware Design with RTL VHDL

4

Page 5: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Steps of the Design Process1. Text description2. Interface3. Pseudocode4. Block diagram of the Datapath5. Interface divided into Datapath and Controller 6. FSM of the Controller7. RTL VHDL code of the Datapath, Controller, and Top-

Level Unit8. Testbench for the Datapath, Controller, and Top-Level

Unit9. Functional simulation and debugging10. Synthesis and post-synthesis simulation11. Implementation and timing simulation12. Experimental testing using FPGA board

5

Page 6: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Min_Max_Average

6

Page 7: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Pseudocode Input: M[i]Outputs: max, min, average

max = 0min = MAX // the maximal constantsum = 0for i=0 to 31 do d = M[i]; sum = sum + d if (d < min) then min = d endif if (d > max) then max = d endifendforaverage = sum/32

7

Data M[i] are stored in memory.

Results are stored in the internal registers.

Page 8: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Circuit Interface

n

5

n

2

clk

reset

in_data

in_addr

write

START

DONE

out_data

out_addrMIN_MAX_AVR

8

Page 9: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Interface Table

Port Width Meaning

clk 1 System clock

reset 1 System reset – clears internal registers

in_data n Input data bus

in_addr 5 Address of the internal memory where input data is stored

write 1 Synchronous write control signal

START 1 Starts the computations

DONE 1 Asserted when all results are ready

out_data n Output data bus used to read results

out_addr 2 01 – reading minimum10 – reading maximum11 – reading average

9

Page 10: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Datapath

10

Input: M[i]Outputs: max, min, average

max = 0min = MAXsum = 0for i=0 to 31 do d = M[i]; sum = sum + d if (d < min) then min = d endif if (d > max) then max = d endifendforaverage = sum/32

Page 11: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Datapath

11

Input: M[i]Outputs: max, min, average

max = 0min = MAXsum = 0for i=0 to 31 do d = M[i]; sum = sum + d if (d < min) then min = d endif if (d > max) then max = d endifendforaverage = sum/32

+

d

sum

average

min max

d min d max

min max

< >

mux mux

/32

Page 12: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

State Diagram for Controller

12

Input: M[i]Outputs: max, min, average

max = 0min = MAXsum = 0for i=0 to 31 do d = M[i]; sum = sum + d if (d < min) then min = d endif if (d > max) then max = d endifendforaverage = sum/32

Page 13: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

State Diagram for Controller

13

Input: M[i]Outputs: max, min, average

max = 0min = MAXsum = 0for i=0 to 31 do d = M[i]; sum = sum + d if (d < min) then min = d endif if (d > max) then max = d endifendforaverage = sum/32

start / rst=1

i==32 / done=1

i < 32 / i++

start’/

done=0

init

run

end

Page 14: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Sorting

14

Page 15: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

15

Beforesorting

During Sorting Aftersorting

Addr

0123

3 3 2 2 1 1 1 12 2 3 3 3 3 2 24 4 4 4 4 4 4 31 1 1 1 2 2 3 4

i=0 i=0 i=0 i=1 i=1 i=2j=1 j=2 j=3 j=2 j=3 j=3

Mi Mj

Legend: position of memory indexed by i

position of memory indexed by j

Sorting - Example

Data

Page 16: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

16

Pseudocode

for i=0 to k-2 doA = M[i]for j=i+1 to k-1 do

B = M[j]if A > B then

M[i] = B

M[j] = A

A = Bend if

end forend for

Page 17: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

17

Sorting - Interface

Sort

clock

resetdin

N

done

addrk

we

startMemory

N

dout

Page 18: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

18

Datapathfor i=0 to k-2 do

A = M[i]for j=i+1 to k-1 do

B = M[j]if A > B then

M[i] = B

M[j] = A

A = Bend if

end forend for

• Registers to hold A, B, and memory addresses i and j.

• Incrementor.• Comparator.

Page 19: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

19

Datapathfor i=0 to k-2 do

A = M[i]for j=i+1 to k-1 do

B = M[j]if A > B then

M[i] = B

M[j] = A

A = Bend if

end forend for

counteri

enable

j+1

mux

regj

sel1

en_regj

Page 20: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

20

Datapathfor i=0 to k-2 do

A = M[i]for j=i+1 to k-1 do

B = M[j]if A > B then

M[i] = B

M[j] = A

A = Bend if

end forend for

i j

mux

addr

B A

mux

dout

sel3

din

A B

enA

>

AgtB

A B

regA regBenB

mux

B

sel2

Page 21: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

21

State Diagram for the Controller

for i=0 to k-2 doA = M[i]for j=i+1 to k-1 do

B = M[j]if A > B then

M[i] = B

M[j] = A

A = Bend if

end forend for

• Nested loops by two FSMs: one for the outer loop controls the one for the inner loop.

• Reuse the FSM for the single for loop in the previous example.

Page 22: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

22

State Diagram for the Controller

for i=0 to k-2 doA = M[i]for j=i+1 to k-1 do

B = M[j]if A > B then

M[i] = B

M[j] = A

A = Bend if

end forend for

start / rst=1

j<k-1 / …

i<k-2 / A <= M[i]j <= i+1

start’/

done=0

init

outeri==k-2 /done=1end

j==k-1 / …

inner

Page 23: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

23

Optimization for Performance

Page 24: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

24

Performance Definitions

• Throughput: the number of inputs processed per unit time.

• Latency: the amount of time for an input to be processed.

• Maximizing throughput and minimizing latency in conflict.• Both require timing optimization:

- Reduce delay of the critical path

Page 25: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

25

Achieving High Throughput: Pipelining

• Divide data processing into stages• Process different data inputs in different stages

simultaneously.

xpower = 1;for (i = 0; i < 3; i++) xpower = x * xpower;

process (clk) begin if rising_edge(clk) then if start=‘1’ then

cnt <= 3; end if; if cnt > 0 then

cnt <= cnt – 1;xpower <= xpower * x;

elsif cnt = 0 then done <= ‘1’; end if;end process;

Throughput: 1 data / 3 cycles = 0.33 data / cycle .Latency: 3 cycles.Critical path delay: 1 multiplier delay

Page 26: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

26

Achieving High Throughput: Pipelining

xpower = 1;for (i = 0; i < 3; i++) xpower = x * xpower;

process (clk, rst) begin if rising_edge(clk) then if start=‘1’ then -- stage 1

x1 <= x;xpower1 <= x;done1 <= start;

end if; -- stage 2 x2 <= x1; xpower2 <= xpower1 * x1; done2 <= done1; -- stage 3 xpower <= xpower2 * x2; done <= done2; end if;end process;

Throughput: 1 data / cycleLatency: 3 cycles + register delays.Critical path delay: 1 multiplier delay

Page 27: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

27

Achieving High Throughput: Pipelining

• Divide data processing into stages• Process different data inputs in different stages

simultaneously.

din dout

Page 28: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

28

Achieving High Throughput: Pipelining

• Divide data processing into stages• Process different data inputs in different stages

simultaneously.

din dout…stage 1 stage 2 stage n

Penalty: increase in area as logic needs to be duplicated for different stages

registers

Page 29: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

29

Reducing Latency

• Closely related to reducing critical path delay.• Reducing pipeline registers reduces latency.

din dout…stage 1 stage 2 stage n

registers

Page 30: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

30

Reducing Latency

• Closely related to reducing critical path delay.• Reducing pipeline registers reduces latency.

din dout…stage 1 stage 2 stage n

Page 31: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

31

Timing Optimization

• Maximal clock frequency determined by the longest path delay in any combinational logic blocks.

• Pipelining is one approach.

din dout…stage 1 stage 2 stage n

registers

din dout

Page 32: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

32

Timing Optimization: Spatial Computing

• Extract independent operations• Execute independent operations in parallel.

X = A + B + C + D

process (clk, rst) begin if rising_edge(clk) then X1 := A + B; X2 := X1 + C; X <= X2 + D; end if;end process;

process (clk, rst) begin if rising_edge(clk) then X1 <= A + B; X2 <= C + D; X <= X1 + X2; end if;end process;

Critical path delay: 3 adders Critical path delay: 2 adders

Page 33: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

33

Timing Optimization: Avoid Unwanted Priority

process (clk, rst) begin if rising_edge(clk) then if c0=‘1’ then r0 <= din; elsif c1=‘1’ then r1 <= din; elsif c2=‘1’ then r2 <= din; elsif c3=‘1’ then r3 <= din; end if; end if;end process;

Critical path delay: 4-input AND gate + 4x1 MUX.

Page 34: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

34

Timing Optimization: Avoid Unwanted Priority

Critical path delay: 2x1 MUX

process (clk, rst) begin if rising_edge(clk) then if c0=‘1’ then r0 <= din; end if; if c1=‘1’ then r1 <= din; end if; if c2=‘1’ then r2 <= din; end if; if c3=‘1’ then r3 <= din; end if; end if;end process;

Page 35: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

35

Timing Optimization: Register Balancing• Maximal clock frequency determined by the longest path

delay in any combinational logic blocks.

din block 1 block 2 dout

din block 1 block 2 dout

Page 36: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

Timing Optimization: Register Balancing

process (clk, rst) begin if rising_edge(clk) then rA <= A; rB <= B; rC <= C; sum <= rA + rB + rC; end if;end process;

process (clk, rst) begin if rising_edge(clk) then sumAB <= A + B; rC <= C; sum <= sumAB + rC; end if;end process;

Page 37: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

37

Optimization for Area

Page 38: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

38

Area Optimization: Resource Sharing• Rolling up pipleline: share common resources at different

time – a form of temporal computing

dindout

din dout…stage 1 stage 2 stage n

Block including all all logic in stage 1 to n.

Page 39: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

39

Area Optimization: Resource Sharing

• Use registers to hold inputs• Develop FSM to select which inputs to process in each

cycle.X = A + B + C + D

+

++

AB

CD

X

Page 40: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

40

Area Optimization: Resource Sharing

• Use registers to hold inputs• Develop FSM to select which inputs to process in each

cycle.X = A + B + C + D

+

++

AB

CD

X + X

ABCD

A, B, C, D need to hold steady until X is processed

control

Page 41: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

41

Area Optimization: Resource Sharing

Merge duplicate components

together

Page 42: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

42

Area Optimization: Resource Sharing

Merge duplicate components

together

Page 43: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

43

Impact of Reset on Area

Page 44: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

44

Resetting Block RAM

• Block RAM only supports synchronous reset.• Suppose that Mem is 256x16b RAM. • Implementations of Mem with synchronous and

asynchronous reset on Xilinx Virtex-4.

Page 45: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

45

Optimization for Power

Page 46: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

46

Power Reduction Techniques

• In general, FPGAs are power hungry. • Power consumption is determined by

where V is voltage, C is load capacitance, and f is switching frequency• In FPGAs, V is usually fixed, C depends on the number of

switching gates and length of wires connecting all gates.• To reduce power,

• turn off gates not actively used,• have multiple clock domains,• reduce f.

Page 47: CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF

47

Dual-EdgeTriggered FFs

• A design that is active on both clock edges can reduce clock frequency by 50%.

din doutstage 1 stage 2 stage nstage 4

din doutstage 1 stage 2 stage nstage 4

Example 1

Example 2 positively triggerednegatively triggered