A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. Mishra Chita R. Das Onur Mutlu


Page 1: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

A Heterogeneous Multiple Network-On-Chip Design:

An Application-Aware Approach

Asit K. Mishra, Chita R. Das, Onur Mutlu

Page 2: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

2

Executive summary

• Problem: Current NoC designs are agnostic to application requirements and are provisioned for the general case or the worst case, yet applications have widely differing demands from the network

• Our goal: To design a NoC that can satisfy the diverse, dynamic performance requirements of applications

• Observation: Applications can be divided into two general classes in terms of their requirements from the network: bandwidth-sensitive and latency-sensitive
- Not all applications are equally sensitive to bandwidth and latency

• Key idea: Design two NoCs
- Each sub-network is customized for either BW or LAT sensitive applications
- Propose metrics to classify applications as BW or LAT sensitive
- Prioritize applications’ packets within the sub-networks based on their sensitivity

• Network design: The BW optimized network has wider links but operates at a lower frequency; the LAT optimized network has narrower links but operates at a higher frequency

• Results: Compared to an iso-resource monolithic network, our proposal improves weighted/instruction throughput by 5%/3% and reduces energy by 31%

Page 3: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

3

Resource requirements of various applications - I

Impact of channel bandwidth on application performance

• Channel bandwidth affects network latency, throughput and energy/power

• Increase in channel BW leads to
- Reduction in packet serialization
- Increase in router power
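As a rough illustration of the serialization effect, the sketch below counts how many flits one packet occupies at each link width used in this study (64b to 512b). The 1024-bit packet size and the helper name `flits_per_packet` are assumptions chosen for illustration, not values from the slides.

```python
import math

# Hypothetical packet size in bits (an assumption, not taken from the slides).
PACKET_BITS = 1024


def flits_per_packet(packet_bits: int, link_width_bits: int) -> int:
    """Number of link-width-sized flits needed to carry one packet."""
    return math.ceil(packet_bits / link_width_bits)


# Link widths evaluated in the talk.
for width in (64, 128, 256, 512):
    print(f"{width}b links: {flits_per_packet(PACKET_BITS, width)} flits per packet")
```

Wider links mean fewer flits per packet and therefore less serialization delay, at the cost of wider (more power-hungry) router datapaths.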

Page 4: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

4

Resource requirements of various applications - I

Impact of channel bandwidth on application performance

Simulation settings:

• 8x8 multi-hop packet-based mesh network

• Each node in the network has an OoO processor (2GHz), a private L1 cache and a router (2GHz)

• Shared L2 cache, 1MB per core

• 6 VCs per physical channel (6VC/PC), 2-stage router

Page 7: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

7

Resource requirements of various applications - I

Impact of channel bandwidth on application performance

[Figure: instruction throughput (IT), normalized to 64b links, with 64b, 128b, 256b and 512b links (y-axis 0 to 7) for applu, wrf, art, deal, sjeng, barnes, grmcs, namd, h264, gcc, pvray, tonto, libq, gobmk, astar, milc, hmmer, swim, sjbb, sap, xalan, sphnx, bzip, lbm, sjas, soplx, cacts, omnet, gems and mcf]

1. For 18/30 (21/36 total) applications, performance is agnostic to channel BW (8x BW increase → less than 2x performance increase)

2. For 12/30 (15/36 total) applications, performance scales with channel BW (8x BW increase → at least 2x performance increase)
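The 2x cut-off used in the two observations above can be written as a small predicate. This is a minimal sketch: the function name and the sample throughput values are hypothetical, and only the 8x bandwidth / 2x performance threshold comes from the slide.

```python
# Threshold from the slide: an 8x bandwidth increase (64b -> 512b links)
# must yield at least a 2x throughput increase to count as "scales with BW".
SCALING_THRESHOLD = 2.0


def scales_with_bandwidth(it_64b: float, it_512b: float) -> bool:
    """True if throughput with 512b links is at least 2x that with 64b links."""
    return it_512b / it_64b >= SCALING_THRESHOLD


# Hypothetical normalized throughput values (illustrative only).
samples = {"app_a": (1.0, 1.3), "app_b": (1.0, 4.5)}
for name, (low, high) in samples.items():
    label = "scales with BW" if scales_with_bandwidth(low, high) else "agnostic to BW"
    print(name, "->", label)
```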

Page 8: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

8

Resource requirements of various applications - II

Impact of network latency on application performance

• Reduction in router latency (by increasing frequency)
- Reduction in packet latency
- Increase in router power consumption

Page 9: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

9

Resource requirements of various applications - II

Impact of network latency on application performance

Simulation settings:

• Same as the last experiment

• 128b links

• Added dummy stages (2-cycle and 4-cycle) to each router
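To get a feel for what the dummy stages cost, the sketch below estimates the average number of cycles a packet spends in router pipelines on an 8x8 mesh for 2-, 4- and 6-cycle routers (the 2-stage baseline plus 2 or 4 dummy cycles). It assumes uniform random traffic between distinct nodes and ignores link traversal, serialization and contention; the function names are hypothetical.

```python
from itertools import product


def average_hops(k: int) -> float:
    """Average Manhattan distance between distinct nodes of a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(ax - bx) + abs(ay - by)
                for (ax, ay) in nodes for (bx, by) in nodes)
    pairs = len(nodes) * (len(nodes) - 1)
    return total / pairs


def router_cycles(hops: float, router_stages: int) -> float:
    """Cycles spent in router pipelines along a path (links and contention ignored)."""
    # A packet that traverses `hops` links passes through hops + 1 routers.
    return (hops + 1) * router_stages


hops = average_hops(8)  # 8x8 mesh, as in the simulation settings
for stages in (2, 4, 6):
    print(f"{stages}-cycle router: {router_cycles(hops, stages):.1f} router cycles on average")
```

Under this simplified model the router component of latency grows linearly with pipeline depth, which matches the 3x latency range (2-cycle to 6-cycle) explored in the experiment.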

Page 12: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

12

Resource requirements of various applications - II

Impact of network latency on application performance

[Figure: instruction throughput (IT), normalized to the 2-cycle router, with 2-cycle, 4-cycle and 6-cycle routers (y-axis 0.5 to 1.1), per application]

1. For 18/30 (21/36 total) applications, performance is sensitive to network latency (3x latency reduction → at least 25% performance improvement)

2. For 12/30 (15/36 total) applications, performance is only marginally sensitive to network latency (3x latency reduction → less than 15% performance improvement)

Page 14: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

14

Application-aware approach to designing multiple NoCs

[Figures: the two sensitivity plots from the preceding slides: IT normalized to the 2-cycle router for 2-cycle, 4-cycle and 6-cycle routers, and IT normalized to 64b links for 64b, 128b, 256b and 512b links]

Based on the observations:

1. Applications can be classified into distinct classes: typically LAT/BW sensitive
2. LAT sensitive applications can benefit from low network latency
3. BW sensitive applications can benefit from high network bandwidth
4. Not all applications are equally sensitive to either LAT or BW
5. Monolithic network cannot optimize both classes simultaneously

Page 15: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

15

Application-aware approach to designing multiple NoCs

[Figures: the two sensitivity plots (router latency and link width), as on the preceding slide]

Solution: Two NoCs where each (sub)network is optimized for either LAT or BW sensitive applications

Page 19: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

19

Design methodology

[Figure: logical view of a multicore processor, with blocks for Processors/L1$, Network, and L2$ and Mem. Controllers, and a DEMUX]

1. Identify LAT/BW sensitive applications
- Proposes a novel dynamic application classification scheme

2. Design sub-networks based on applications’ demand
- This network architecture is better than a monolithic iso-resource network

Page 23: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

23

Design: Dynamic classification of applications

[Figure: application life cycle, plotting outstanding network packets over time as alternating network episodes and compute episodes, annotated with episode length and episode height]

Network episode
• App. has at least one outstanding packet
• Processor is likely stalling → low IPC

Compute episode
• App. has no outstanding packet
• High IPC

Episode length = Number of consecutive cycles with outstanding network packets

Episode height = Avg. number of L1 packets injected during an episode
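The two episode metrics can be computed from a per-cycle trace, as in the minimal sketch below. The trace format (one `(outstanding_packets, injected_packets)` pair per cycle) and the function name are assumptions for illustration; in particular, episode height is interpreted here as the number of packets injected during an episode, averaged over episodes.

```python
from typing import List, Tuple


def episode_metrics(trace: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (average episode length, average episode height).

    `trace` holds one (outstanding_packets, injected_packets) pair per cycle.
    A network episode is a maximal run of cycles with outstanding_packets > 0.
    """
    lengths, heights = [], []
    run_len = run_injected = 0
    for outstanding, injected in trace:
        if outstanding > 0:
            run_len += 1
            run_injected += injected
        elif run_len:
            lengths.append(run_len)
            heights.append(run_injected)
            run_len = run_injected = 0
    if run_len:  # close an episode that reaches the end of the trace
        lengths.append(run_len)
        heights.append(run_injected)
    if not lengths:
        return 0.0, 0.0
    return sum(lengths) / len(lengths), sum(heights) / len(heights)


# Hypothetical trace: two network episodes separated by a compute episode.
trace = [(1, 1), (2, 1), (1, 0), (0, 0), (0, 0), (3, 3), (0, 0)]
print(episode_metrics(trace))
```

In this trace the first episode lasts 3 cycles with 2 injected packets and the second lasts 1 cycle with 3 injected packets.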

Page 24: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

24

Design: Dynamic classification of applications

[Figure: application life cycle, plotting outstanding network packets over time as alternating network episodes and compute episodes, annotated with episode length and episode height]

Network episode
• App. has at least one outstanding packet
• Processor is likely stalling → low IPC

Compute episode
• App. has no outstanding packet
• High IPC

Episode height
• Short episode ht.: Low MLP, each request is critical (LAT sensitive)
• Tall episode ht.: High MLP (BW sensitive)

Episode length
• Short episode len.: Packets are very critical (LAT sensitive)
• Long episode len.: Latency tolerant (could be de-prioritized)
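One way to turn measured episode length and height into the Short/Medium/Long and Short/Medium/Tall buckets is a pair of threshold comparisons. The slides do not give numeric cut-offs, so the values below are purely hypothetical placeholders, as are the function names.

```python
# Hypothetical cut-offs; the slides do not give numeric thresholds.
LENGTH_CUTOFFS = (100, 1000)  # cycles: below first = Short, below second = Medium
HEIGHT_CUTOFFS = (2, 6)       # packets: below first = Short, below second = Medium


def bucket(value, cutoffs, labels):
    """Place `value` into one of three labelled buckets."""
    low, high = cutoffs
    if value < low:
        return labels[0]
    if value < high:
        return labels[1]
    return labels[2]


def classify_episode(avg_length, avg_height):
    """Map average episode length and height to a (height, length) bucket pair."""
    length = bucket(avg_length, LENGTH_CUTOFFS, ("Short", "Medium", "Long"))
    height = bucket(avg_height, HEIGHT_CUTOFFS, ("Short", "Medium", "Tall"))
    return height, length


print(classify_episode(avg_length=50, avg_height=1))
print(classify_episode(avg_length=5000, avg_height=10))
```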

Page 25: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

25

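Classification and ranking

Classification: LAT/BW (rows: episode height; columns: episode length)

• Tall height
- Long length: gems, mcf
- Medium length: sphinx, lbm, cactus, xalan
- Short length: sjeng, tonto

• Medium height
- Long length: omnetpp, apsi
- Medium length: ocean, sjbb, sap, bzip, sjas, soplex, tpc
- Short length: applu, perl, barnes, gromacs, namd, calculix, gcc, povray, h264, gobmk, hmmer, astar

• Short height
- Long length: leslie
- Medium length: art, libq, milc, swim
- Short length: wrf, deal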

Page 26: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

26

Classification and ranking

[Classification: LAT/BW table repeated from the previous slide]

Ranking: Sensitivity to LAT/BW (rows: episode height; columns: episode length)

• High height
- Long length: Rank-4
- Medium length: Rank-2
- Short length: Rank-1

• Medium height
- Long length: Rank-3
- Medium length: Rank-2
- Short length: Rank-2

• Short height
- Long length: Rank-4
- Medium length: Rank-3
- Short length: Rank-1
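The ranking table lends itself to a simple lookup. The sketch below transcribes the ranks from the slide and picks which of several competing packets to serve first; it assumes that a lower rank number means higher priority, which the slide does not state explicitly, and the function names are hypothetical.

```python
# Rank table transcribed from the slide: RANK[height][length].
RANK = {
    "High":   {"Long": 4, "Medium": 2, "Short": 1},
    "Medium": {"Long": 3, "Medium": 2, "Short": 2},
    "Short":  {"Long": 4, "Medium": 3, "Short": 1},
}


def packet_rank(height: str, length: str) -> int:
    """Look up the rank for an application's episode height and length class."""
    return RANK[height][length]


def pick_next(candidates):
    """Choose the packet to serve first among (app, height, length) tuples.

    Assumes a lower rank number means higher priority.
    """
    return min(candidates, key=lambda c: packet_rank(c[1], c[2]))


# mcf is classified as tall/long and wrf as short/short in the classification table.
print(pick_next([("mcf", "High", "Long"), ("wrf", "Short", "Short")]))
```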

Page 31: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

31

Network design

• 1N-128
• 1N-256
• 1N-512 (High BW)
• 2N-128x128
• 2N-64x256-ST (Steering)
• 2N-64x256-ST-RK (Steering + Ranking)
• 2N-64x256-ST-RK(FS) (Steering + Ranking and Frequency Scaling)
• 1N-320 (iso-BW)
• 1N-320(FS) (iso-resource)
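The configuration names appear to encode the number of networks and their link widths in bits; under that reading, the two links of 2N-64x256 add up to 320 bits, which would explain why 1N-320 is the iso-BW baseline. The parser below is a sketch built on that assumed naming convention, and `link_widths` is a hypothetical helper.

```python
import re


def link_widths(config: str):
    """Extract per-network link widths (in bits) from a name such as '2N-64x256-ST'."""
    match = re.match(r"(\d+)N-(\d+(?:[xX]\d+)*)", config)
    if match is None:
        raise ValueError(f"unrecognized configuration name: {config}")
    count = int(match.group(1))
    widths = [int(w) for w in re.split(r"[xX]", match.group(2))]
    if len(widths) != count:
        raise ValueError(f"{config}: expected {count} widths, found {len(widths)}")
    return widths


for name in ("1N-128", "2N-128x128", "2N-64x256-ST", "1N-320(FS)"):
    print(name, "->", sum(link_widths(name)), "bits of total link width")
```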

Page 41: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

41

Analysis

Performance (25 workloads with 50% BW and 50% LAT applications)

[Figure: weighted speedup (y-axis 0 to 60) for 1N-128, 1N-256, 2N-128x128, 1N-512, 2N-64x256-ST, 2N-64x256-ST+RK (no FS), 2N-64x256-ST+RK (FS), 1N-320 (no FS) and 1N-320 (FS); annotations on the chart: +18%, +7%, +5%, 5%, w. 2%, w. 2%]

Energy (25 workloads with 50% BW and 50% LAT applications)

[Figure: normalized energy (y-axis 0 to 2) for the same nine designs; annotations on the chart: -47%, -59%]

Best EDP across all designs
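For reference, energy-delay product (EDP) is simply energy multiplied by delay, so a design wins on EDP when its energy savings outweigh any slowdown. The sketch below compares three designs using hypothetical normalized values; none of these numbers are results from the talk.

```python
def energy_delay_product(energy: float, delay: float) -> float:
    """EDP = energy x delay; lower is better."""
    return energy * delay


# Hypothetical (normalized energy, normalized delay) pairs, not results from the talk.
designs = {
    "design_a": (1.00, 1.00),
    "design_b": (0.60, 0.95),
    "design_c": (1.40, 0.85),
}
edp = {name: energy_delay_product(e, d) for name, (e, d) in designs.items()}
best = min(edp, key=edp.get)
print(best, round(edp[best], 3))
```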

Page 42: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

42

Conclusions

• Problem: Current NoC designs are agnostic to application requirements and are provisioned for the general case or the worst case, yet applications have widely differing demands from the network

• Our goal: To design a NoC that can satisfy the diverse, dynamic performance requirements of applications

• Observation: Applications can be divided into two general classes in terms of their requirements from the network: bandwidth-sensitive and latency-sensitive
- Not all applications are equally sensitive to bandwidth and latency

• Key idea: Design two NoCs
- Each sub-network is customized for either BW or LAT sensitive applications
- Propose metrics to classify applications as BW or LAT sensitive
- Prioritize applications’ packets within the sub-networks based on their sensitivity

• Network design: The BW optimized network has wider links but operates at a lower frequency; the LAT optimized network has narrower links but operates at a higher frequency

• Results: Compared to an iso-resource monolithic network, our proposal improves weighted/instruction throughput by 5%/3% and reduces energy by 31%

Page 43: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

43

Thank you

Q?

Asit [email protected]

Page 44: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

44

Backup Slides . . .

Page 45: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

45

Other metrics considered for application classification

[Figure: L1MPKI, L2MPKI and slack per application; axes: L1/L2 MPKI and slack (in cycles)]

Page 47: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

47

Analysis of network episode length and height

[Figure, first panel: average episode height (network packets), y-axis 0 to 14, per application]

[Figure, second panel: average episode length (in cycles), y-axis 0 to 6000, per application; labels on the chart: 0.3M, 10K, 0.4M, 18K]

Legend: Short length/height, Medium length/height, Long length/High height

Based on performance scaling sensitivity to bandwidth and frequency

Page 48: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

48

Empirical results to support the classification

SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications

Page 53: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

53

Empirical results to support the classification

SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications

Why 9 clusters?

[Figure: within-group sum of squares (y-axis 0 to 225) versus number of clusters (x-axis 0 to 35); annotation on the chart: 13x]
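The quantity on the y-axis, within-group sum of squares, measures how tightly points sit around their cluster centroids; plotting it against the number of clusters is a common way to choose a cluster count. The sketch below computes it for a given grouping of 2D points. The points and groupings are hypothetical, and the slides do not say which clustering algorithm was used.

```python
def within_group_sum_of_squares(clusters):
    """Sum of squared distances from each point to its cluster centroid.

    `clusters` is a list of clusters; each cluster is a non-empty list of (x, y) points.
    """
    total = 0.0
    for points in clusters:
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return total


# Hypothetical (episode length, episode height) points, grouped two different ways.
one_cluster = [[(0, 0), (2, 0), (10, 10), (12, 10)]]
two_clusters = [[(0, 0), (2, 0)], [(10, 10), (12, 10)]]
print(within_group_sum_of_squares(one_cluster))
print(within_group_sum_of_squares(two_clusters))
```

Splitting the four points into two natural groups reduces the sum sharply, which is the kind of drop the plot uses to justify a cluster count.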

Page 55: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

55

Analysis with varying workload combinations

[Figure: WS and IT, normalized to the 1N-128 network (y-axis 0.7 to 1.5), for workload mixes of 0% bandwidth/100% latency, 25% bandwidth/75% latency, 50% bandwidth/50% latency, 75% bandwidth/25% latency and 100% bandwidth/0% latency]
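The slides do not define WS and IT. The sketch below uses the commonly used definitions, which are assumed here: weighted speedup sums each application's IPC when sharing the system divided by its IPC when running alone, and instruction throughput sums the shared IPCs. The IPC values are hypothetical.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum over applications of IPC when sharing the system / IPC when running alone."""
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))


def instruction_throughput(ipc_shared):
    """Sum of the applications' IPCs when sharing the system."""
    return sum(ipc_shared)


# Hypothetical IPC values for a two-application workload.
shared = [0.5, 1.0]
alone = [1.0, 2.0]
print(weighted_speedup(shared, alone), instruction_throughput(shared))
```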

Page 56: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

56

Comparison to prior works

[Figure: WS and IT, normalized to the 1N-128 network (y-axis 0.8 to 1.4), for 1N-128-STC, 1N-128-ST+RK, 2N-128x128-LD-BAL, 2N-64x256-W-LD-BAL and 2N-64x256-ST+RK...]

Page 57: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

57

Dynamic steering of packets

[Figure: % packets in sub-network (y-axis 0% to 100%; legend: Latency-optimized network) for applu, art, barnes, namd, gcc, libq, astar, hmmer, leslie, calculix, sjeng, sap, sphnx, lbm, soplx, omnet, mcf and ocean]

Page 62: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

62

Design: Putting it all together

[Figure: logical view of a multicore processor, with blocks for Processors/L1$, Network, and L2$ and Mem. Controllers, and a MUX]

• Classify applications based on sensitivity to network BW/LAT

• Use network episode length/height (Episode LEN/HT) to dynamically identify apps

• Design LAT/BW optimized networks

• Prioritization within networks
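The steering step can be pictured as a small dispatch function: once an application is classified, its packets go to the matching sub-network. In the sketch below the 64b and 256b widths follow the 2N-64x256 configuration, with the narrower network taken as latency-optimized and the wider one as bandwidth-optimized; the class labels "LAT" and "BW" and all names in the code are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubNetwork:
    name: str
    link_width_bits: int


# Widths follow the 2N-64x256 configuration; narrow = LAT optimized, wide = BW optimized.
LAT_NET = SubNetwork("latency-optimized", 64)
BW_NET = SubNetwork("bandwidth-optimized", 256)


def steer(app_class: str) -> SubNetwork:
    """Pick the sub-network for a packet based on its application's class."""
    if app_class == "LAT":
        return LAT_NET
    if app_class == "BW":
        return BW_NET
    raise ValueError(f"unknown application class: {app_class}")


for app_class in ("LAT", "BW"):
    net = steer(app_class)
    print(app_class, "->", net.name, f"({net.link_width_bits}b links)")
```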

Page 65: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

65

Summary

• A NoC paradigm based on top-down approach (application demand/requirement analysis)

• An efficient design paradigm for future heterogeneous multicores

[Figure: a heterogeneous multicore with big cores, small cores, GPGPUs and accelerators/ASICs; components are labelled as latency critical, throughput (BW) critical, or latency critical (real-time constraints)]

• Providing all these guarantees in one network is hard

• Multiple networks: each customized for one metric

Page 66: A Heterogeneous Multiple Network-On-Chip Design:  An  Application-Aware Approach

66

Summary

• A NoC paradigm based on top-down approach (application demand/requirement analysis)

• An efficient design paradigm for future heterogeneous multicore machines

Latency
Throughput
Local communication
Long haul comm.
Power

1 cycle/bufferless/faster routers
1 cycle/high bandwidth
Power efficient links/DVFS router
Hybrid/fewer connectivity network
Butterfly/express channels

MUX
Share 2D space or 3D layers
Episode LEN/HT/??