Run-time Adaptive on-chip Communication Scheme. 林孟諭, Dept. of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.


Page 1: Run-time  Adaptive on-chip Communication Scheme

Run-time Adaptive on-chip Communication Scheme

林孟諭, Dept. of Electrical Engineering, National Cheng Kung University

Tainan, Taiwan, R.O.C

Page 2: Run-time  Adaptive on-chip Communication Scheme

Outline

• Abstract
• Introduction
• Definitions
• Algorithm
• Motivation Case Study
• Hardware Implementation
• Conclusion

Page 3: Run-time  Adaptive on-chip Communication Scheme

Abstract

• Workloads and/or constraints that vary at run time require embedded systems to adapt at run time so that they stay highly efficient in every operation mode or scenario.

• We present the first approach to an adaptive on-chip communication scheme.

• It provides an adaptive routing/path allocation algorithm that meets a required level of Quality of Service (QoS), here a guaranteed bandwidth.

Page 4: Run-time  Adaptive on-chip Communication Scheme

Introduction (1/2)

• The scheme is a run-time adaptive network on chip (NoC) that adapts the underlying interconnection infrastructure on demand in response to changing communication requirements imposed by an application.

• To provide on-demand interconnections, we present a novel adaptive routing/path allocation algorithm that meets QoS requirements (bandwidth).

Page 5: Run-time  Adaptive on-chip Communication Scheme

Introduction (2/2)

• The scheme makes decisions locally at each router depending on the available bandwidth in each direction to the neighboring router.

• Dynamic connections are realized by re-assigning a certain number of buffer blocks to different output ports of a router on-demand.

• It also increases the resource utilization, especially buffer utilization, through on-demand buffer block configuration.

Page 6: Run-time  Adaptive on-chip Communication Scheme

Definitions (1/6)

• Definition 1: An application task graph (TG) is a directed graph G_k = (T, F), where
– T is the set of all tasks t_i used by an application
– f_{i,j} ∈ F represents the connection from task t_i to t_j

[Figure: two tasks joined by a connection]

Page 7: Run-time  Adaptive on-chip Communication Scheme

Definitions (1/6)

• Definition 2: Physical Network (PN) is a directed graph P = (N, V, B_t, r), where
– N is a set of tiles n_i
– v_{i,j} ∈ V represents an edge, the physical channel between n_i and n_j
– Each tile has a current buffer configuration at time t; b_{i,t} ∈ B_t represents the state of the buffer assignment to individual output ports.
– r is a routing function which determines the paths taken.

[Figure: tile n1 and tile n2 joined by an edge]

Page 8: Run-time  Adaptive on-chip Communication Scheme

Definitions (2/6)

• Definition 3: Logical Network (LN) at time t is a directed graph L_t = (M, W), where
– M is a set of task groups m_i
– w_{i,j} ∈ W represents the set of connections between two task groups m_i and m_j
– A task group m_i is a set of tasks scheduled to be executed on a particular PE.
– The LN is the subset of the task graph set G that is running at a specific time t.

• Definition 4: The Task Mapping Function is a function l_t : T' ⊆ T → L_t, which maps a subset T' of each task graph's task set T to the logical network LN.

Page 9: Run-time  Adaptive on-chip Communication Scheme

Definitions (3/6)

• Definition 5: The Network Mapping Function is a function p_t : L_t → S ⊆ P, which maps a logical network onto a subset S of the physical network.

• Definition 6: A Routing Function r : N × N → V, r : (n_i, n_j) → v_{i,j}, returns a path v_{i,j} away from the current PE (n_i), given the input port of each transaction and the destination n_j.

For example, v can be the path along which Gauss2 forwards to Filter2.

Page 10: Run-time  Adaptive on-chip Communication Scheme

Definitions (4/6)

• Definition 7:
– The Buffer Configuration b_{i,t} is the current buffer configuration of tile n_i ∈ N.
– A Virtual Channel (VC) is a unidirectional logical or virtual connection between the tiles n_i and n_j.
– Each VC is realized by an independently managed pair of message buffers referred to as the Virtual Channel Buffer (VCB).

Page 11: Run-time  Adaptive on-chip Communication Scheme

Definitions (5/6)

[Figure: Task Graph, Logical Network, and Physical Network (PEs with buffers), related by the Task Mapping Function (Definition 4), the Network Mapping Function (Definition 5), and the Routing Function (Definition 6)]

Page 12: Run-time  Adaptive on-chip Communication Scheme

Definitions (6/6)

• Definition 8: The System Monitor M is an infrastructure which is used to collect, aggregate, and process system statistics.

• Definition 9: Our Adaptive Network on Chip AdNoC is defined as the tuple AdNoC = (P, M, L_t, G_i, p_t, l_t, r) with the parameters as given above.
– P = Physical Network (Definition 2)
– M = System Monitor (Definition 8)
– L_t = Logical Network (Definition 3)
– G_i = Task Graph (Definition 1)
– p_t = Network Mapping Function (Definition 5)
– l_t = Task Mapping Function (Definition 4)
– r = Routing Function (Definition 6)
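The tuple in Definition 9 can be pictured as plain data structures. The following is a minimal Python sketch, not the authors' implementation: every class and field name (`TaskGraph`, `PhysicalNetwork`, `LogicalNetwork`, `AdNoC`, and so on) is hypothetical, tiles are assumed to be identified by (x, y) coordinates, and the system monitor and the two mapping functions are reduced to simple containers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Tile = Tuple[int, int]  # assumption: a tile n_i is identified by its (x, y) position


@dataclass
class TaskGraph:  # Definition 1: G_k = (T, F)
    tasks: Set[str]
    connections: Set[Tuple[str, str]]  # f_{i,j}: connection from t_i to t_j


@dataclass
class PhysicalNetwork:  # Definition 2: P = (N, V, B_t, r)
    tiles: Set[Tile]
    channels: Set[Tuple[Tile, Tile]]  # v_{i,j}: channel between n_i and n_j
    buffer_config: Dict[Tile, Dict[str, int]]  # b_{i,t}: buffers per output port
    route: Callable[[Tile, Tile], Tuple[Tile, Tile]]  # r(n_i, n_j) -> v_{i,j}


@dataclass
class LogicalNetwork:  # Definition 3: L_t = (M, W)
    task_groups: Dict[str, Set[str]]  # m_i: tasks scheduled on one PE
    connections: Set[Tuple[str, str]]  # w_{i,j}: connections between groups


@dataclass
class AdNoC:  # Definition 9
    physical: PhysicalNetwork
    monitor: List[str]  # M, reduced here to a log of status reports
    logical: LogicalNetwork
    task_graphs: List[TaskGraph]
    network_mapping: Dict[str, Tile]  # p_t: task group -> tile
    task_mapping: Dict[str, str]  # l_t: task -> task group
```

With this layout, the tile that hosts a task is found by composing the two mappings: `noc.network_mapping[noc.task_mapping[task]]`.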

Page 13: Run-time  Adaptive on-chip Communication Scheme

Algorithm (1/12)

• To provide a bandwidth guarantee in an adaptive NoC, the underlying communication infrastructure needs to provide an adaptive path allocation strategy.

• Therefore, finding a path/routing for a given logical network and physical mapping of the application is a major challenge. The run-time path allocation algorithm is given in Alg. 1.

Page 14: Run-time  Adaptive on-chip Communication Scheme

Algorithm (2/12)

upon receiving data at runtime do
  if destination = processor port then
    route ⇐ processor port
  else
    if flit type ≠ head or connection in look-up table from same source port then {non-header flits are always in look-up table}
      route ⇐ look-up table
    else {get route}
      route ⇐ do weighted XY route allocation // Alg. 2
      if route found then {assign buffer to route}
        do runtime buffer assignment for found route // Alg. 3
      end if
      if no route found or buffer assignment unsuccessful then
        collect router status information
        send information to higher level
      end if
    end if
    if flit type = tail and keep-alive not requested then {free buffer}
      remove buffer from buffer table
      remove connection from look-up table
    end if
  end if

Algorithm 1 Runtime/On-demand Path Selection Algorithm
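The control flow of Alg. 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' router: `Flit`, `Router`, `on_flit`, and all field names are hypothetical; the route allocator (Alg. 2) and the buffer assigner (Alg. 3) are passed in as callables; the "same source port" check is omitted; and non-head flits are taken to be the ones served from the look-up table, as the comment in the pseudocode indicates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Flit:  # hypothetical flit record
    connection: str
    kind: str  # "head", "body" or "tail"
    to_processor: bool = False  # destination is the local processor port
    keep_alive: bool = False


@dataclass
class Router:
    allocate_route: Callable[[Flit], Optional[str]]  # stands in for Alg. 2
    assign_buffer: Callable[[str, str], bool]  # stands in for Alg. 3
    lookup: Dict[str, str] = field(default_factory=dict)  # connection -> port
    buffers: Dict[str, str] = field(default_factory=dict)  # connection -> port
    reports: List[str] = field(default_factory=list)  # sent to the higher level

    def on_flit(self, flit: Flit) -> Optional[str]:
        if flit.to_processor:
            return "P"
        if flit.kind != "head" or flit.connection in self.lookup:
            # Known connection: reuse the stored route.
            route = self.lookup.get(flit.connection)
        else:
            # New connection: compute a route, then try to back it with a buffer.
            route = self.allocate_route(flit)
            if route is not None and self.assign_buffer(flit.connection, route):
                self.lookup[flit.connection] = route
                self.buffers[flit.connection] = route
            else:
                self.reports.append(f"no route or buffer for {flit.connection}")
                route = None
        if flit.kind == "tail" and not flit.keep_alive:
            # The tail flit frees the buffer and the look-up table entry.
            self.buffers.pop(flit.connection, None)
            self.lookup.pop(flit.connection, None)
        return route
```

Passing the two sub-algorithms as callables keeps the sketch independent of how weights and buffers are actually managed; in this simplification the router itself records the look-up and buffer entries.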

Page 15: Run-time  Adaptive on-chip Communication Scheme

Algorithm (3/12)

1: upon receiving a connection and destination do
2:   if connection in look-up table from different source port then {look for potential loops}
3:     loopRoute ⇐ output port of other connection
4:   end if
5:   for all output ports p_i do {initialize all weights to zero}
6:     w_i ⇐ 0
7:   end for
8:   dx = |destination x − current x| // dx and dy are the coefficients used in the weight calculation below
9:   dy = |destination y − current y|

Algorithm 2 Weighted XY Route Allocation

Page 16: Run-time  Adaptive on-chip Communication Scheme

Algorithm (4/12)

10:   for all p_i with available bandwidth > required bandwidth and loopRoute ≠ p_i do
        {East and West output ports} // check the east-west direction
11:     if p_i points toward destination x then
12:       w_i ⇐ available bandwidth of p_i × dx + total link bandwidth
13:     else if p_i points away from destination x then
14:       w_i ⇐ available bandwidth
15:     end if
        {North and South output ports} // check the north-south direction
16:     if p_i points toward destination y then
17:       w_i ⇐ available bandwidth of p_i × dy + total link bandwidth
18:     else if p_i points away from destination y then
19:       w_i ⇐ available bandwidth
20:     end if
21:   end for
      {route toward the port having highest weight}
22:   route = p_i with max w_i
      {save the route in the look-up table}
23:   look-up table ⇐ connection = route
24:   return route
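A compact Python sketch of the weighted XY allocation in Alg. 2 follows. It is a sketch under assumptions rather than the authors' code: the function name `weighted_xy_route` and its parameters are hypothetical, east is taken as +x and north as +y, a port that is excluded as a loop route is simply skipped, `None` is returned when no port receives a positive weight, and the look-up table update of line 23 is left to the caller.

```python
from typing import Dict, Optional, Tuple

# Assumed orientation (not stated on the slides): east is +x, north is +y.
STEP: Dict[str, Tuple[int, int]] = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def weighted_xy_route(
    current: Tuple[int, int],
    destination: Tuple[int, int],
    available: Dict[str, float],  # available bandwidth per output port
    required: float,  # bandwidth required by the connection
    link_bandwidth: float,  # total link bandwidth
    loop_route: Optional[str] = None,  # port excluded to avoid a loop
) -> Optional[str]:
    """Return the output port with the highest weight, or None if none qualifies."""
    off_x = destination[0] - current[0]
    off_y = destination[1] - current[1]
    dx, dy = abs(off_x), abs(off_y)
    weights = {port: 0.0 for port in STEP}  # lines 5-7
    for port, (sx, sy) in STEP.items():  # lines 10-21
        bandwidth = available.get(port, 0.0)
        if bandwidth <= required or port == loop_route:
            continue
        # Positive when the port points toward the destination, negative when away.
        heading = sx * off_x + sy * off_y
        distance = dx if sx != 0 else dy
        if heading > 0:
            weights[port] = bandwidth * distance + link_bandwidth
        elif heading < 0:
            weights[port] = bandwidth
    best = max(weights, key=lambda p: weights[p])  # line 22
    return best if weights[best] > 0 else None
```

For instance, with bandwidths given in percent, a router at the hypothetical position (1, 0) routing to (2, 1) with a congested east port would weight north as 100 × 1 + 100 = 200 and west as 100, and would therefore pick north. Ties are resolved by dictionary order here, which is an arbitrary choice of this sketch.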

Page 17: Run-time  Adaptive on-chip Communication Scheme

Algorithm (5/12)

• For a requesting transaction, the path is checked in every possible direction and the VCB is assigned accordingly on-demand.

• The weighted XY algorithm wXY presented in Alg. 2 assigns each output port a weight based on available bandwidth and dx or dy between the current and the destination nodes.

• This ideally gives the packet a maximum number of sensible routing choices along its path. The weight is also proportional to the available bandwidth.

Page 18: Run-time  Adaptive on-chip Communication Scheme

Algorithm (6/12)

• The wXY route allocation strategy is described as follows. Given is the tuple ρ = {N, E, S, W, P} of ports.

• Each i ∈ ρ has a weight w_i and an available bandwidth b_i with b_i ≤ b_max, where b_max is the maximum line bandwidth.

Page 19: Run-time  Adaptive on-chip Communication Scheme

Algorithm (7/12)

• The current router coordinates are x, y. Each packet p has destination coordinates xd , yd and a required bandwidth bp. The weights are assigned as follows:
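The weight equations of this slide were not preserved in the text. A reconstruction from lines 10 to 21 of Alg. 2, written with the symbols x, y, x_d, y_d, b_i, b_p, and b_max introduced in the surrounding slides, is sketched below. That the slide showed exactly this form is an assumption.

```latex
w_i =
\begin{cases}
b_i \,\lvert x_d - x \rvert + b_{\mathrm{max}} & \text{if } i \in \{E, W\} \text{ points toward } x_d \text{ and } b_i > b_p \\
b_i \,\lvert y_d - y \rvert + b_{\mathrm{max}} & \text{if } i \in \{N, S\} \text{ points toward } y_d \text{ and } b_i > b_p \\
b_i & \text{if } i \text{ points away from the destination and } b_i > b_p \\
0 & \text{otherwise}
\end{cases}
```

The last case covers ports without enough available bandwidth and the port excluded as a potential loop route.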

Page 20: Run-time  Adaptive on-chip Communication Scheme

Algorithm (8/12)

• The route r chosen is then:
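The equation for the chosen route did not survive in the text of this slide. Line 22 of Alg. 2 selects the port with the maximum weight, which suggests the following form (a reconstruction, not the slide's original typesetting):

```latex
r = \arg\max_{i \in \rho} w_i
```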

• The router distributes the VCBs to any route as needed by assigning them to the corresponding output port.

Page 21: Run-time  Adaptive on-chip Communication Scheme

Algorithm (9/12)

Algorithm 3 On-demand Buffer Assignment
upon receiving a connection and direction do // a connection and its destination are received
  search for next free buffer b_free ∈ buffer pool B and not in buffer table // look for an available buffer
  if b_free found then {assign available buffer to current direction}
    current buffer b_curr ⇐ b_free // assign the available buffer where it is needed
    buffer table ⇐ b_curr → output port // the port it points to is also recorded in the table
    return b_curr
  else
    return no buffer available
  end if

• Our scheme to assign buffers on demand is given in Alg. 3.
• The benefit of such on-demand assignment is evident: buffers are allocated only when needed, so virtual channels can be reused by different ports.
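The on-demand assignment of Alg. 3 amounts to a small pool with a table of current assignments. The sketch below is illustrative only: the class `BufferPool`, its methods, and the use of integer buffer identifiers are assumptions, and the `release` method corresponds to the "remove buffer from buffer table" step of Alg. 1.

```python
from typing import Dict, List, Optional, Tuple


class BufferPool:
    """Buffer blocks of one router, assigned to output ports on demand."""

    def __init__(self, size: int) -> None:
        self.pool: List[int] = list(range(size))  # buffer pool B
        # Buffer table: buffer id -> (connection, output port).
        self.table: Dict[int, Tuple[str, str]] = {}

    def assign(self, connection: str, port: str) -> Optional[int]:
        """Assign the next free buffer to `port`; return None if none is free."""
        for buf in self.pool:  # search for the next free buffer
            if buf not in self.table:
                self.table[buf] = (connection, port)
                return buf
        return None  # no buffer available

    def release(self, buf: int) -> None:
        """Free a buffer so that any output port can reuse it."""
        self.table.pop(buf, None)
```

A buffer released by one connection can be handed to a different output port on the next request, which is the reuse the slide refers to.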

Page 22: Run-time  Adaptive on-chip Communication Scheme

Algorithm (10/12)

• Fig. 3 shows an exemplary scenario to showcase the run-time behavior using different transactions in one router.

Page 23: Run-time  Adaptive on-chip Communication Scheme

Algorithm (11/12)

t0: All four directions are occupied with four different transactions; buffers are also assigned.

t1: Transaction T5 requests a path, and the weights are calculated by tδ, which takes 4 hardware cycles. A buffer is also assigned to the calculated direction before tδ.

t2: Transactions T1, T2, and T4 free their corresponding channels and assigned buffers.

Page 24: Run-time  Adaptive on-chip Communication Scheme

Algorithm (12/12)

t3: Four new transactions T1, T2, T4, and T6 request processing and they are granted resources.

t4: Transaction T7 requests a path and a buffer, but the request cannot be granted because no buffer resources are available. The requesting transaction therefore has to wait or inform the upper layer through the system monitor.

Page 25: Run-time  Adaptive on-chip Communication Scheme

Motivation Case Study (1/9)

• We motivate the need for an adaptive NoC by means of a very simple scenario. We study an MPEG decoder [1] and an Image Processing Line (IPL) [18] application.

The task graphs are shown in Figures 1a and 1b.

Assume that at time t0 the NoC is running the MPEG video decoder (Fig. 1c).

At time t1, the IPL needs to be executed, so it is mapped onto the processing elements alongside the MPEG decoder. Once a mapping is performed, the routers attempt to set up meaningful routes (Fig. 1d).

Page 26: Run-time  Adaptive on-chip Communication Scheme

Motivation Case Study (2/9)

Fig. 1. Motivation to use an adaptive communication architecture (connections labeled A to H)

Page 27: Run-time  Adaptive on-chip Communication Scheme

Motivation Case Study (3/9)

[Figure: mapping with connections A, B, C, and D]

Page 28: Run-time  Adaptive on-chip Communication Scheme

Motivation Case Study (4/9)

(Fig. 1d) Find Conn. F (Gauss1 to Filter2): // see Alg. 2 (slide 16)

First:
↑ (Gauss1 to MC): 100% × 1 + 100% // lines 16-17
← (Gauss1 to Gauss2): 100% // lines 13-14

The weighted path Gauss1 to MC is better than the path Gauss1 to Gauss2, so we choose the former. Then:
→ (MC to Filter2): 100% × 1 + 100% // lines 11-12

[Figure: connections A, B, C, D, E, F1, and F2; the original connection F failed]

Page 29: Run-time  Adaptive on-chip Communication Scheme

Motivation Case Study (5/9)

(Fig. 1e) Find Conn. G: // see Alg. 2 (slide 16)

First, we have two choices, G1 and G2.

With G1:
→ (Gauss2 to Gauss1): 100% × 2 + 100% // lines 11-12
then → (Gauss1 to Filter1): X
It failed.

With G2:
↑ (Gauss2 to VLD): 100% × 1 + 100% // lines 16-17
then → (VLD to MC): 100% × 2 + 100% // lines 11-12
then → (MC to Filter2): X
It failed, too.

Because neither choice succeeds, a re-mapping is needed to find an available route.

[Figure: connections E and F with the two candidate routes G1 and G2]

Page 30: Run-time  Adaptive on-chip Communication Scheme

Motivation Case Study (6/9)

Result after re-mapping:

Find Conn. E
→ (Ga.1 to Fi.1): 100% × 1 + 100%
↑ (Ga.1 to VLD): 100%

Find Conn. F
↑ (Ga.1 to VLD): 100% × 1 + 100%
→ (VLD to MC): 80% × 2 + 100%
→ (MC to Fi.2): 80% × 1 + 100%

Find Conn. G
← (Ga.2 to Fi.1): 100% × 1 + 100%

Find Conn. H
↑ (Ga.2 to Fi.2): 100% × 1 + 100%

[Figure: mapping after re-mapping, with connections A to H]

Finally, the required buffers are assigned using Alg. 3 (slide 21).
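The percentages listed for connection F on this slide can be evaluated directly with the "points toward the destination" rule of Alg. 2 (lines 12 and 17). The short sketch below does only that arithmetic; the function name `toward_weight` is hypothetical, and the total link bandwidth is taken as 100% as on the slide.

```python
def toward_weight(available_pct: float, distance: int, link_pct: float = 100.0) -> float:
    """Weight of a port that points toward the destination (Alg. 2, lines 12 and 17)."""
    return available_pct * distance + link_pct


# Conn. F after re-mapping: (available bandwidth in %, distance factor) per hop,
# copied from the slide.
hops_f = {"Ga.1 to VLD": (100, 1), "VLD to MC": (80, 2), "MC to Fi.2": (80, 1)}
weights_f = {hop: toward_weight(b, d) for hop, (b, d) in hops_f.items()}
```

The resulting weights are 200, 260, and 180 for the three hops, in percent units.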

Page 31: Run-time  Adaptive on-chip Communication Scheme

Motivation Case Study (7/9)

In this example:

(Fig. 1d) Conn. E: The task Gauss1 first establishes a route to its neighboring filter task Filter1.

Conn. F: Then, it uses a deterministic XY routing algorithm for Filter2. However, this fails because of the limited bandwidth availability.

Page 32: Run-time  Adaptive on-chip Communication Scheme

Motivation Case Study (8/9)

(Fig. 1e) Conn. F: This forces the router at Gauss1 to try another route using Alg. 1. Following Alg. 3, the routers supply a corresponding buffer block, allocating the buffer to output ports on demand.

Conn. G & H: The second Gauss task, Gauss2, attempts the same action, but it fails.

(Fig. 1f) Conn. G & H: Thus it becomes necessary to invoke a re-mapping, after which a path with enough bandwidth is found.

Page 33: Run-time  Adaptive on-chip Communication Scheme

33

Motivation Case Study (9/9)

• If path and buffer blocks are not available the mapping function sends appropriate feedback to the upper layer.

• Therefore, a dynamic run-time application scenario needs an adaptive on-chip communication infrastructure that can build connections on demand to provide QoS.

Page 34: Run-time  Adaptive on-chip Communication Scheme

Hardware Implementation

• Our hardware platform for the AdNoC is illustrated in Fig. 4.

• It consists mainly of two parts: the run-time path allocation part and the on-demand VCB assignment part.

• The path allocation part decides the route either from the look-up table or by calculation, depending on the type of the flit.

Page 35: Run-time  Adaptive on-chip Communication Scheme

Page 36: Run-time  Adaptive on-chip Communication Scheme

Conclusion

• We have introduced the first approach to an adaptive on-chip communication architecture. It provides an adaptive path allocation algorithm to meet varying bandwidth guarantees.

• Run-time connections are realized by re-assigning a number of buffer blocks on-demand.

• Our buffer allocation scheme increases the buffer utilization and decreases the overall buffer use.