Silicon Nanophotonic Network-On-Chip Using TDM Arbitration

Gilbert Hendry – Columbia University

Johnnie Chan, Shoaib Kamil, Lenny Oliker,

John Shalf, Luca P. Carloni, Keren Bergman


Why Photonics?

ELECTRONICS: Buffer, receive, and re-transmit at every router.
Each bus lane is routed independently (P ∝ N_LANES).
Off-chip bandwidth is pin-limited and power hungry.

OPTICS: Modulate/receive a high-bandwidth data stream once per communication event.
A broadband switch routes the entire multi-wavelength stream.
Off-chip bandwidth = on-chip bandwidth for nearly the same power.

Photonics changes the rules for bandwidth, energy, and distance.

[Figure: TX/RX endpoints in an electronic network versus a photonic network]

Silicon Photonic Integration

[Figure: silicon photonic device demonstrations: Cornell, 2005; Ghent, 2007; Sandia, 2008; Columbia, 2008; Cornell, 2009]

Photonic Networks-on-Chip

Corona [U. of Wisconsin, HP]; Photonic Clos [MIT]; Photonic Torus [Columbia]

Ring Resonators

[Figure: ring resonator used as a wavelength-selective modulator/filter (λ) and as a broadband switch. Electronic control: n-region and p-region driven between 0 V and 1 V. Thermal control: ohmic heater. Transmission plot: injected wavelengths with the off-resonance and on-resonance profiles.]

Circuit-switched P-NoCs

Pros:
Energy-efficient end-to-end transmission
High bandwidth through WDM
Electronic network still available for small control messages*
Network-level support for secure regions

Cons:
Path setup latency
Path setup contention (no fairness): longer paths block more
Head-of-line blocking at gateways

* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]

Head of Line Blocking

[Figure: four cores share one network interface (IF). Labels: Tx/Rx, serialization drivers, deserialization receivers, electronic crossbar, control router, 5-port photonic switch, bidirectional waveguide, bidirectional electronic channel, to/from control plane, to/from data plane.]

External Concentration*

* [P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009]
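Head-of-line blocking at a concentrated gateway can be illustrated with a single FIFO shared by the cores behind one network interface. This is a minimal sketch under assumptions that are not stated on the slides:

- There is one queue per gateway.
- Messages are tagged only by their destination node.
- A destination counts as "busy" while another circuit holds it.

The helper names `drain_fifo` and `drain_any` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Set, Tuple

# (source core, destination node); hypothetical message format
Message = Tuple[int, int]


def drain_fifo(queue: Deque[Message], busy: Set[int]) -> List[Message]:
    """Strict FIFO: stop as soon as the head's destination is busy."""
    sent: List[Message] = []
    while queue and queue[0][1] not in busy:
        sent.append(queue.popleft())
    return sent


def drain_any(queue: Deque[Message], busy: Set[int]) -> List[Message]:
    """For comparison: send every queued message whose destination is free."""
    sent = [m for m in queue if m[1] not in busy]
    kept = [m for m in queue if m[1] in busy]
    queue.clear()
    queue.extend(kept)
    return sent


# Four cores share one gateway queue; destination 5 is currently busy.
fifo = deque([(0, 5), (1, 9), (2, 12), (3, 5)])
print(drain_fifo(deque(fifo), {5}))   # head is blocked, so nothing is sent
print(drain_any(deque(fifo), {5}))    # the messages to 9 and 12 could have gone
```

The gap between the two results is the head-of-line penalty. Messages to free destinations 9 and 12 wait only because the message at the head targets a busy node.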

TDM Arbitration

[Figure: time slots 0, 1, …, T repeat in a fixed frame; timeline marks t0, t1, t2, t3, t4, …, tC-3, tC-2, tC-1]

Synchronous gateway/control

Time slot ~ 10 ns; TDM sync clock ~ 100 MHz
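The two numbers on the slide are consistent: a 100 MHz sync clock has a 10 ns period, which suggests one time slot per sync-clock period. Treat that as an assumption here. The sketch below does minimal slot bookkeeping under two further assumptions:

- All gateways share the same time base.
- A gateway may only start transmitting at a slot boundary.

`current_slot` and `wait_for_slot` are hypothetical names.

```python
SYNC_CLOCK_HZ = 100e6   # TDM sync clock from the slide (~100 MHz)
SLOT_NS = 10            # time slot length from the slide (~10 ns)


def current_slot(time_ns: int, num_slots: int, slot_ns: int = SLOT_NS) -> int:
    """Index of the slot in progress at time_ns, for a frame of num_slots slots."""
    return (time_ns // slot_ns) % num_slots


def wait_for_slot(time_ns: int, my_slot: int, num_slots: int,
                  slot_ns: int = SLOT_NS) -> int:
    """Nanoseconds until the next start of my_slot (0 if it starts right now)."""
    frame_ns = num_slots * slot_ns
    return (my_slot * slot_ns - time_ns % frame_ns) % frame_ns


print(1e9 / SYNC_CLOCK_HZ)       # sync-clock period in ns
print(current_slot(25, 8))       # 25 ns into an 8-slot frame
print(wait_for_slot(35, 3, 8))   # slot 3 began at 30 ns; its next start is at 110 ns
```

The worst-case wait is almost one full frame (num_slots x 10 ns). This is why reducing the number of time slots matters for latency.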

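Nonblocking Network Scheduling

[Figure: example schedule across time slots 0, 1, and 2]

Required time slots = N-1 (with N nodes, each node must reach N-1 destinations and can send to only one of them per slot)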
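Why N-1 slots suffice in a nonblocking network: each node has N-1 destinations and can drive only one per slot. If the network itself never blocks, any permutation fits in a single slot. A shift schedule, in which node i sends to (i + k) mod N during slot k, is one standard construction. This is a sketch of that construction, not necessarily the schedule drawn on the slide.

```python
from typing import List, Set, Tuple

Conn = Tuple[int, int]  # (source node, destination node)


def shift_schedule(n: int) -> List[List[Conn]]:
    """Slot k (k = 1 .. n-1): node i sends to node (i + k) mod n."""
    return [[(i, (i + k) % n) for i in range(n)] for k in range(1, n)]


def is_full_coverage(schedule: List[List[Conn]], n: int) -> bool:
    """True if no slot has source or destination contention and
    every ordered pair (s, d) with s != d appears exactly once."""
    seen: Set[Conn] = set()
    for slot in schedule:
        sources = [s for s, _ in slot]
        dests = [d for _, d in slot]
        if len(set(sources)) != len(sources) or len(set(dests)) != len(dests):
            return False
        for conn in slot:
            if conn in seen:
                return False
            seen.add(conn)
    return seen == {(s, d) for s in range(n) for d in range(n) if s != d}


sched = shift_schedule(4)
print(len(sched), is_full_coverage(sched, 4))
```

Each slot is a permutation because (i + k) mod N hits every node exactly once. Every pair (s, d) lands in slot k = (d - s) mod N, so N-1 slots give full coverage.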

However…

[Figure: insertion loss (dB) vs. topology size (nodes) for the nonblocking torus topology; data labels 18.7, 25.3, 31.5, 38.0, 44.1, 50.6, 56.8, and 63.2 dB] [M. Petracca et al. IEEE Micro, 2008]

A nonblocking topology is difficult to implement because of insertion loss.

* [J. Chan et al. Architectural Exploration of Chip-Scale Photonic Interconnection Network Designs Using Physical-Layer Analysis. JLT, May 2010]
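To see why these losses matter, convert decibels to a linear power ratio: a loss of L dB leaves a fraction 10^(-L/10) of the launched optical power at the receiver. The sketch below applies this standard conversion to the smallest and largest insertion-loss values plotted (18.7 dB and 63.2 dB). It ignores every other link-budget term.

```python
def transmitted_fraction(loss_db: float) -> float:
    """Fraction of optical power remaining after loss_db of insertion loss."""
    return 10 ** (-loss_db / 10)


for loss in (18.7, 63.2):
    print(f"{loss} dB -> {transmitted_fraction(loss):.3g} of the launched power")
```

At 18.7 dB only about 1.3% of the light arrives. At 63.2 dB it is less than one millionth, which is why the nonblocking topology is hard to build at larger sizes.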

Scheduling Time Slots

Problem:
Blocking network
Full coverage
Minimize time slots (most communication per slot)

Constraints:
Source contention
Destination contention
Topology contention

Solution: Genetic search
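A time slot is usable only if its connections respect all three constraints. The checker below is a sketch built on these assumptions:

- The network is a k x k mesh using X-then-Y routing (the routing named on the switch slide).
- Two circuits that share a directed link count as topology contention.

The node numbering and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Conn = Tuple[int, int]                          # (source node, destination node)
Link = Tuple[Tuple[int, int], Tuple[int, int]]  # directed link between (x, y) points


def xy_links(src: int, dst: int, k: int) -> List[Link]:
    """Directed links used by X-then-Y routing; node id = y * k + x (assumed)."""
    x, y = src % k, src // k
    tx, ty = dst % k, dst // k
    links: List[Link] = []
    while x != tx:                     # route along X first
        nx = x + (1 if tx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != ty:                     # then along Y
        ny = y + (1 if ty > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def slot_is_valid(slot: List[Conn], k: int) -> bool:
    """Check source, destination, and topology (link) contention for one slot."""
    sources: Set[int] = set()
    dests: Set[int] = set()
    used: Set[Link] = set()
    for src, dst in slot:
        if src in sources or dst in dests:
            return False               # source or destination contention
        path = xy_links(src, dst, k)
        if used.intersection(path):
            return False               # topology contention: a shared link
        sources.add(src)
        dests.add(dst)
        used.update(path)
    return True


print(slot_is_valid([(0, 1), (2, 3)], 4))  # disjoint links
print(slot_is_valid([(0, 2), (1, 3)], 4))  # both need link (1,0)->(2,0)
```

In a search over schedules, a check like this is the feasibility test that every candidate slot must pass.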

[Figure: a population of candidate schedules S]

Population (size P)
Selection (down to size ps x P)
Reproduction (back to P)
Mutation (still P)

Each schedule S assigns connections to time slots, for example:
Slot 0: c0, c5, c7, c8
Slot 1: c23, c6, c58
…
Slot T: c42, c65, c1
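The population loop (select down to ps x P, reproduce back to P, mutate while staying at P) can be sketched generically. The following choices are assumptions and do not come from the slides:

- Selection is truncation selection.
- Parents are chosen uniformly from the survivors.
- Only the new children are mutated, so the best schedule found is never lost.

The default pm = 0.8 and the usage with P = 50 echo the schedule-results slide. ps = 0.5 and the generation count are placeholders. So that the sketch runs standalone, it is exercised on a toy bit-string problem rather than on schedules.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def evolve(population: List[T],
           fitness: Callable[[T], float],
           crossover: Callable[[T, T, random.Random], T],
           mutate: Callable[[T, random.Random], T],
           ps: float = 0.5,
           pm: float = 0.8,
           generations: int = 100,
           seed: int = 0) -> T:
    """Generic GA loop; returns the fittest individual found (needs P >= 2)."""
    rng = random.Random(seed)
    size = len(population)                 # P
    pop = list(population)                 # leave the caller's list untouched
    for _ in range(generations):
        # Selection: keep the fittest ps * P individuals (truncation selection).
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:max(2, int(ps * size))]
        # Reproduction: refill to P with children of random survivor pairs.
        children = []
        while len(survivors) + len(children) < size:
            a, b = rng.sample(survivors, 2)
            child = crossover(a, b, rng)
            # Mutation with probability pm; survivors are carried over
            # unmutated (elitism, an assumption), so the population stays at P.
            if rng.random() < pm:
                child = mutate(child, rng)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)


# Toy usage: maximize the number of 1s in a 20-bit string, with P = 50.
def one_point(a: List[int], b: List[int], rng: random.Random) -> List[int]:
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def flip_one(bits: List[int], rng: random.Random) -> List[int]:
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


init_rng = random.Random(1)
start = [[init_rng.randint(0, 1) for _ in range(20)] for _ in range(50)]
best = evolve(start, sum, one_point, flip_one)
print(sum(best), "ones out of 20")
```

For scheduling, `fitness`, `crossover`, and `mutate` would be replaced by schedule-specific operators, such as 1/(number of slots) and slot-level recombination.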

Initialization

S:
Slot 0: c0
Slot 1: c1
…
Slot N²: cN²

Each connection starts in its own time slot.

Fitness = 1/(number of time slots)
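The initialization and fitness rules translate directly into code. In this sketch a schedule is a list of slots, and each slot is a list of connections. Starting with one connection per slot is always contention-free, so every individual begins valid and the search only has to shrink it. Enumerating the connections as the N(N-1) ordered source/destination pairs (no self-connections) is an assumption; the slide simply numbers them c0 to cN².

```python
from typing import List, Tuple

Conn = Tuple[int, int]           # (source node, destination node)
Schedule = List[List[Conn]]      # one list of connections per time slot


def all_connections(n: int) -> List[Conn]:
    """Every ordered pair of distinct nodes (assumed connection set)."""
    return [(s, d) for s in range(n) for d in range(n) if s != d]


def initial_schedule(conns: List[Conn]) -> Schedule:
    """Initialization: each connection gets its own time slot."""
    return [[c] for c in conns]


def fitness(schedule: Schedule) -> float:
    """Fitness = 1 / (number of time slots); fewer slots is better."""
    return 1.0 / len(schedule)


s = initial_schedule(all_connections(4))
print(len(s), fitness(s))
```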

Reproduction: Birds and Bees

S0:
Slot 0: c0, c3, c60, c19
Slot 1: c27, c4
Slot 2: c100, c71, c9
Slot 3: c1, c17, c23

S1:
Slot 0: c12, c2, c1, c60
Slot 1: c100, c82, c9
Slot 2: c0
Slot 3: c89, c56, c16, c63

C:
Slot 0: c0, c3, c60, c19
Slot 1: c12, c2, c1, c60
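In the reproduction example, the child C starts with the first slot of S0 followed by the first slot of S1. One way to complete that idea, sketched here under our own assumptions, is to interleave the parents' slots and drop any connection already placed, such as c60 in the example. Removing connections from a valid slot cannot create new contention. And since real parents cover the same connection set, the child keeps full coverage. The example leaves c60 in both slots, so the exact rule used may differ.

```python
from typing import List

Schedule = List[List[str]]   # slots of connection labels such as "c60"


def crossover(s0: Schedule, s1: Schedule) -> Schedule:
    """Interleave slots from two parents, skipping connections already placed."""
    child: Schedule = []
    placed = set()
    for i in range(max(len(s0), len(s1))):
        for parent in (s0, s1):
            if i < len(parent):
                slot = [c for c in parent[i] if c not in placed]
                if slot:                    # drop slots that became empty
                    child.append(slot)
                    placed.update(slot)
    return child


S0 = [["c0", "c3", "c60", "c19"], ["c27", "c4"], ["c100", "c71", "c9"], ["c1", "c17", "c23"]]
S1 = [["c12", "c2", "c1", "c60"], ["c100", "c82", "c9"], ["c0"], ["c89", "c56", "c16", "c63"]]
C = crossover(S0, S1)
print(C[:2])   # S0's slot 0, then S1's slot 0 without the repeated c60
```

Because interleaving tends to produce more, smaller slots, a sketch like this relies on mutation to merge slots back together.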

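Mutation: Secret of the Ooze

S (before):
Slot 0: c0, c3, c60, c19
Slot 1: c27, c4
Slot 2: c100, c71, c9
Slot 3: c1, c17, c23

The connections of Slot 2 (c100, c71, c9) are redistributed into the other slots:

S (after):
Slot 0: c0, c3, c60, c19, c9
Slot 1: c27, c4, c100
Slot 2: c1, c17, c23, c71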
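The mutation example removes one slot and moves each of its connections into another slot: c9 goes to the first slot, c100 to the second, and c71 to the last. This cuts the schedule from four slots to three. The sketch below performs that move, under two assumptions:

- The caller supplies a `fits(slot, conn)` contention check, for instance a source/destination/link test.
- A connection that fits nowhere falls back to its own slot.

`fits` and `victim` are hypothetical parameters.

```python
import random
from typing import Callable, List, Optional

Schedule = List[List[str]]


def mutate(schedule: Schedule,
           fits: Callable[[List[str], str], bool],
           rng: random.Random,
           victim: Optional[int] = None) -> Schedule:
    """Dissolve one slot and re-home its connections in other slots."""
    if len(schedule) < 2:
        return [list(slot) for slot in schedule]
    new = [list(slot) for slot in schedule]      # copy; the parent stays untouched
    idx = rng.randrange(len(new)) if victim is None else victim
    homeless = new.pop(idx)
    for conn in homeless:
        options = [slot for slot in new if fits(slot, conn)]
        if options:
            rng.choice(options).append(conn)     # random compatible slot
        else:
            new.append([conn])                   # no room: open a new slot
    return new


S = [["c0", "c3", "c60", "c19"], ["c27", "c4"], ["c100", "c71", "c9"], ["c1", "c17", "c23"]]
# Toy contention rule for this demo only: a slot may hold at most 5 connections.
M = mutate(S, lambda slot, conn: len(slot) < 5, random.Random(0), victim=2)
print(len(S), "->", len(M), "slots")
```

A successful mutation removes one slot and so raises fitness = 1/(number of time slots). A failed one leaves the slot count unchanged.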

Schedule Results

Pop size = 50, mutation prob = 0.8

[Figure: execution time (s) and solution (number of slots), log scale, vs. network size for 16-, 36-, and 64-node networks]

Implementation: Photonic Switch

200µm rings; total switch size = 1.4mm x 1.4mm

No S->W, S->E, N->W, or N->E turns (X-then-Y routing)
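The missing turns follow from X-then-Y (dimension-ordered) routing. Once a circuit is travelling in Y it never turns back into X, so a signal entering from the N or S port never leaves through E or W. The sketch below reads "S->W" as "enters at the S port, exits at the W port", which is our interpretation. It enumerates the turns among the four mesh ports and recovers exactly the four that the slide lists as omitted. The local (core) port is left out for brevity.

```python
PORTS = ("N", "S", "E", "W")   # mesh ports; the local port is omitted here


def turn_allowed(in_port: str, out_port: str) -> bool:
    """X-then-Y routing: a circuit moving in Y may not turn into X."""
    if in_port == out_port:
        return False                    # no U-turns back out the same port
    if in_port in ("N", "S") and out_port in ("E", "W"):
        return False                    # Y-to-X turn is forbidden
    return True


forbidden = sorted((i, o) for i in PORTS for o in PORTS
                   if i != o and not turn_allowed(i, o))
print(forbidden)
```

Dropping these turns removes the corresponding ring switching elements, which keeps the switch smaller.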

Implementation: Switch Control

Width of LUT = 12 (number of rings)
Length of LUT = T (number of time slots)
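The control LUT maps naturally onto an array of T words, each 12 bits wide, indexed by the current time slot. At every slot boundary the controller reads one word and drives each ring on or off. This is a sketch: the bit-to-ring assignment and the example ring settings below are hypothetical.

```python
from typing import Iterable, List

NUM_RINGS = 12   # LUT width from the slide: one bit per ring


def build_lut(on_rings_per_slot: Iterable[Iterable[int]]) -> List[int]:
    """One 12-bit word per time slot; bit r = 1 means ring r is switched on."""
    lut = []
    for on_rings in on_rings_per_slot:
        word = 0
        for r in on_rings:
            if not 0 <= r < NUM_RINGS:
                raise ValueError(f"ring index {r} out of range")
            word |= 1 << r
        lut.append(word)
    return lut


def ring_drive(lut: List[int], slot: int) -> List[int]:
    """Ring on/off states to apply during `slot`; the frame of T slots repeats."""
    word = lut[slot % len(lut)]
    return [(word >> r) & 1 for r in range(NUM_RINGS)]


lut = build_lut([{0, 3}, set(), {11}])   # hypothetical schedule with T = 3
print([format(w, "012b") for w in lut])
```

Storage grows as 12 x T bits per switch, so reducing the number of time slots also shrinks the control memory.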

Implementation: Network Gateway

1. Send request.
2. Grant; set the crossbar and transmit to the serializer.
3. Receive; deserialize.
4. Store in a temporary buffer; request to core.

Simulation Setup

PhoenixSim*: photonic and electronic network simulator
64 cores
Networks: E-mesh, P-mesh, P-TDM
Traffic: random (32 B, 1 kB, and 32 kB messages); scientific application traces

* [Chan et al. PhoenixSim: A Simulator for Physical-Layer Analysis of Chip-Scale Photonic Interconnection Networks. In DATE 2010]

Results – Random Traffic

[Figure: average latency vs. measured bandwidth (GB/s), log-log axes, for E-Mesh, P-Mesh, and P-TDM under random traffic with 32 B, 1 kB, and 32 kB messages]

Results – Scientific Applications

[Figure: execution time (s) and energy (J), log scale, for Cactus, GTC, MADbench, and PARATEC on E-Mesh, P-Mesh, and P-TDM]

Benchmark   Num Phases   Num Messages   Total Size (MB)   Avg Msg Size (B)
Cactus      2            285            7.3               25600
GTC         2            63             8.1               129796
MADbench    195          15414          86.5              5613
PARATEC     34           126059         5.4               43.3
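As a consistency check, the last column should roughly equal Total Size divided by Num Messages. The sketch below assumes 1 MB = 10^6 bytes. Under that assumption every row agrees with the reported average to within about 1.1%.

```python
# Benchmark: (num messages, total size in MB, reported avg msg size in B), from the table
TABLE = {
    "Cactus": (285, 7.3, 25600),
    "GTC": (63, 8.1, 129796),
    "MADbench": (15414, 86.5, 5613),
    "PARATEC": (126059, 5.4, 43.3),
}

for name, (msgs, total_mb, reported) in TABLE.items():
    derived = total_mb * 1e6 / msgs      # assumes 1 MB = 10**6 bytes
    rel = (derived - reported) / reported
    print(f"{name:9s} derived {derived:10.1f} B  reported {reported:>8} B  ({rel:+.1%})")
```

The spread is wide: MADbench and PARATEC have similar total volumes per phase but very different message sizes. That range is what separates small-message latency behaviour from large-message bandwidth behaviour.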

Conclusion

TDM implements fairness.
TDM improves network utilization.
Genetic search is useful for finding a full-coverage static schedule.

Future Work:
Scaling gracefully*
Reducing time slots*
Dynamic scheduling

Contact: gilbert@ee.columbia.edu

* [Hendry et al. Time-Division-Multiplexed Arbitration in Silicon Nanophotonic Networks-on-Chip for High Perf. CMPs. In JPDC, Jan 2011]
