AISTECS 2019 Emerging Silicon Nanophotonic Networks: Time to Bridge the Gap with System Designers Davide Bertozzi University of Ferrara (Italy) - Temporary Guest Scientist at IHP Microelectronics (Germany)

EmergingSilicon NanophotonicNetworks: Time to Bridge the ...mpsoc.unife.it/~aistecs/AISTECSprint.pdf · AISTECS 2019 EmergingSilicon NanophotonicNetworks: Time to Bridge the Gap with


AISTECS 2019

Emerging Silicon Nanophotonic Networks: Time to Bridge the Gap with System Designers

Davide Bertozzi
University of Ferrara (Italy) - Temporary Guest Scientist at IHP Microelectronics (Germany)

Trends in Extreme HPC

• Evolution of the top 10 in the last six years:
  • Average total compute power: from 0.86 PFlops to 21 PFlops (~24x increase)
  • Average node compute power: from 31 GFlops to 600 GFlops (~19x increase)
  • Average number of nodes: from 28k to 35k (~1.3x increase)

Node compute power is the main contributor to performance growth

Node compute power may keep scaling thanks to customization

[Figure: average of top 10 systems, relative to 2010. Total compute power: 24x; node compute power: 19x; number of nodes: 1.3x]

[S. Rumley, et al. Optical Interconnects for Extreme Scale Computing Systems, Journal of Parallel Computing, pp.65-80, 2017]

<<Like 1980s, great time for architects!>> (John L. Hennessy & David A. Patterson, Turing Lecture, ISCA 2018)

Trends in Extreme HPC: What about Connectivity?

• Top 10 average node-level evolutions:
  • Average node compute power: from 31 GFlops to 600 GFlops (~19x increase)
  • Number of nodes: ~1.3x
  • Total compute power: ~24x
• Average bandwidth available per node: from 2.7 GB/s to 7.8 GB/s (~3.2x increase)
• Average byte-per-flop ratio: from 0.06 B/Flop to 0.01 B/Flop (~6x decrease)
  • Sunway TaihuLight (#1) shows 0.004 B/Flop!

The growing gap in interconnect bandwidth might prevent aggregate execution performance from keeping up with the available compute power!

[Figure: average of top 10 systems, relative to 2010. Node compute power: 19x; bandwidth per node: 3.2x; byte-per-flop ratio: 0.17x]

[S. Rumley, et al. Optical Interconnects for Extreme Scale Computing Systems, Journal of Parallel Computing, pp.65-80, 2017]
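The growth factors quoted above can be rechecked with a few lines of Python. This is only a sketch on the rounded slide values; the dictionary keys are illustrative names, not from the source. Small deviations from the quoted factors (the bandwidth figure, for instance) are to be expected, possibly because the slide averages per-system ratios rather than taking a ratio of averages, which is an assumption here.

```python
# Averages over the top 10 systems, as quoted on the slides.
START = {"total_pflops": 0.86, "node_gflops": 31.0, "nodes_k": 28.0,
         "node_bw_gbs": 2.7, "bytes_per_flop": 0.06}
END = {"total_pflops": 21.0, "node_gflops": 600.0, "nodes_k": 35.0,
       "node_bw_gbs": 7.8, "bytes_per_flop": 0.01}


def growth_factors(start, end):
    """Return end/start for every metric."""
    return {name: end[name] / start[name] for name in start}


factors = growth_factors(START, END)
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x")
```

Compute grows roughly 20x per node while bytes per flop shrink to about one sixth, which is the gap the slide points at.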

Interconnect Power Concern

[Figures: energy cost of computation versus communication. Source: W. Dally. Data from 28nm NVIDIA chips. Source: S. Borkar]

Computation will be relatively inexpensive in terms of energy over communication

Bandwidth should be increased within tighter and tighter power budgets

The Communication Hierarchy

A lot of work is going on at the upper layers of the interconnection hierarchy: PCIe, GEN-Z, OpenCAPI, CCIX, Ethernet, InfiniBand, ...

But surprisingly criticalities are showing up even in the lowest layer (chip-scale communications)

EMERGING Network-on-Chip CRITICALITIES:
• Latency sensitivity of the multi-hop fabric
• Bandwidth criticalities for future kilo-core chips
• The power overhead for moving bits around
• Non-seamless scaling to off-chip comm.

WE NEED A GAME CHANGER!

[Figure: network-on-chip with CPU, DSP, DMA, MPEG, Accel, SRAM and Ethnt blocks attached through network interfaces (NI) to switches]

Courtesy of K. Bergman

Silicon Photonics: Game Changer?

[Figure: nodes and routers connected by short-distance node-router links, short-distance router-router links (electrical link or VCSEL-based optical technology) and long-distance router-router links. Legend: electrical transceivers, optical transceivers]

Silicon photonics uses co-integration techniques of optical components and/or transceivers with the standard CMOS manufacturing process

Silicon Photonics: Game Changer?

Silicon photonics is delivering integrated optical transceivers and holds promise of bringing optical communications closer to and deeper into the processing node

[Figure: nodes, routers and cores with a short-distance NR link; conventional hop-by-hop data movement versus flattened end-to-end data movement. Legend: electrical transceivers, integrated optical transceivers. Courtesy of K. Bergman]

Silicon Photonics: Game Changer?

Key enabler for new paradigms: disaggregated architectures

Silicon Photonics: Game Changer?

Silicon photonics holds promise of integrated optical transceivers and of bringing optical communications closer to and deeper into the processing node

Requirements for that to happen:
• Divide cost by 1.5 orders of magnitude at least
• Improve energy efficiency by one order of magnitude at least
• Efficient integration solutions with electronics
• Improve system-ability of the technology

Improving Technology Maturity & Architecture and system-level design

Mind the Gap

The gap between system-level designers and technology developers is huge!
• Architecture design points stem directly from designers’ intuition
• Descriptive information at different abstraction layers is mixed
• Designs are difficult to compare with one another
• The application of well-known optimization techniques is difficult
• No consistent methodologies to explore the design space
• Most of the design space is still largely unknown

Mind the Gap

Early-stage ONoC analysis: inflated expectations

[Figure: Gartner Hype Cycle, annotated with "Golden age of ONoC assessment (~2008-2012)" and "TODAY". Insets: estimated power savings with nanophotonic networks; example of optical parameters used in early-stage analyses]

Mind the Gap

How to turn the trough of disillusionment into a slope of enlightenment?

[Figure: Gartner Hype Cycle, annotated with "TODAY"]

A Framework to Bridge the Gap

Goal:
Bridge the gap between developers of emerging devices and circuit & system designers, thus coupling emerging interconnect technologies and architectures with digital systems and working out novel system-level design concepts.

Focus:
• Photonically-integrated chip-scale parallel computing
• Their coupling with off-chip memory sub-systems

Methodology:
• Addressing the horizontal integration gap
• Addressing the vertical integration gap

[Figure labels: Optical Network, Processor(s), Cache hierarchy, ENoC, DRAM, GPU]


BACKGROUND

OPTICAL NETWORKS‐ON‐CHIP

Optical NoC Initiator

[Figure: ONoC input In1 with a 4-stage modulator. Labels: electrical signal (0101010), wavelength-division multiplexed input signal (carriers 1-4)]

Optical NoC Target

[Figure: ONoC output Out1 carrying carriers 1-4; each carrier goes through a TIA and a comparator (Comp) that recovers the bit stream (010..)]

Wavelength-Routed Optical NoCs

[Figure: initiators I1 to I4 and targets O1 to O4; each initiator reaches every target on a different wavelength (λ1 to λ4)]

Main feature: static allocation of channels to source-destination pairs

The topology needs to avoid interference of same-wavelength carriers

• No time spent in routing and arbitration
• All-optical interconnect solution
• Performance predictability
• All-to-all communications can take place concurrently
• Hard to scale to a large number of cores
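A static allocation of wavelengths to source-destination pairs can be sketched as a small table. The cyclic rule below, `(i + j) % n`, is an assumption chosen for illustration (a Latin square), not necessarily the assignment realized by any specific topology in these slides. The function names are hypothetical.

```python
def assign_wavelengths(n):
    """table[i][j] = wavelength index used from initiator i to target j."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]


def is_conflict_free(table):
    """Check the two conditions for interference-free static allocation."""
    n = len(table)
    # Each initiator uses a different wavelength per target, so the
    # wavelength alone selects the destination.
    rows_ok = all(len(set(row)) == n for row in table)
    # Each target receives a different wavelength from every initiator,
    # so same-wavelength carriers never meet at a receiver.
    cols_ok = all(len({table[i][j] for i in range(n)}) == n for j in range(n))
    return rows_ok and cols_ok


table = assign_wavelengths(4)
for i, row in enumerate(table, start=1):
    print(f"I{i}: " + ", ".join(f"O{j}=λ{w + 1}" for j, w in enumerate(row, start=1)))
print("conflict-free:", is_conflict_free(table))
```

With such a table, the same set of 4 wavelengths is reused across all initiators and no routing or arbitration is needed at run time.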

Wavelength-Routed Optical NoCs

(Naive) Non-blocking Crossbar

[Figure: naive non-blocking crossbar connecting initiators I1 to I4, each using wavelengths 1 to 4, to targets O1 to O4]

Better topologies exist that reuse the same set of 4 wavelengths across all initiators

Wavelength-Routed Optical NoCs

Better topologies exist that reuse the same set of 4 wavelengths across all initiators

State-of-the-art «Snake» topology

[Figure: «Snake» topology]

Static Power Overhead

A major source of overhead of optical NoCs comes from static power:
• Laser sources
• Thermal tuning

[Figure: optical power along a path, from 1 dBm down to 0.763 dBm]
• Passing by a ring: 0.005 dB each
• Waveguide crossing: 0.05 dB each
• Propagation loss: 0.274 dB/cm (0.1 cm)
• Photodetector sensitivity

Insertion loss (and laser power requirements) depends on the connectivity pattern
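The loss figures above add up in dB, so a path budget is a plain sum. The sketch below uses the per-component losses from the slide; the component counts of the example path (2 rings, 4 crossings, 0.1 cm) are hypothetical, picked only because they are consistent with the 1 dBm to 0.763 dBm example, and the actual path in the figure is not recoverable from the text. Function names are illustrative.

```python
RING_PASS_DB = 0.005           # loss per ring passed by (from the slide)
CROSSING_DB = 0.05             # loss per waveguide crossing (from the slide)
PROPAGATION_DB_PER_CM = 0.274  # waveguide propagation loss (from the slide)


def insertion_loss_db(rings_passed, crossings, length_cm):
    """Total insertion loss of one optical path, in dB."""
    return (rings_passed * RING_PASS_DB
            + crossings * CROSSING_DB
            + length_cm * PROPAGATION_DB_PER_CM)


def received_power_dbm(launch_dbm, rings_passed, crossings, length_cm):
    """Optical power left at the end of the path."""
    return launch_dbm - insertion_loss_db(rings_passed, crossings, length_cm)


def required_laser_dbm(sensitivity_dbm, rings_passed, crossings, length_cm):
    """Launch power needed so the photodetector still sees its sensitivity."""
    return sensitivity_dbm + insertion_loss_db(rings_passed, crossings, length_cm)


# Hypothetical path: 2 rings passed, 4 crossings, 0.1 cm of waveguide.
power = received_power_dbm(1.0, rings_passed=2, crossings=4, length_cm=0.1)
print(f"{power:.3f} dBm")
```

Since the laser must be sized for the worst-case path, a connectivity pattern with more crossings directly raises the static laser power.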

Horizontal Integration Challenge

[Figure labels: Optical Network, Processor(s), Li NoC, DRAM, GPU]

Target Architecture

Solutions such as 3D or 2.5D integration allow for the separation of the electronic and photonic processes and open the door to a fully dedicated process optimization for the photonic die

[Figure: 3D stack with an electronic layer (clusters of processor cores) and a photonic layer connected by TSVs; gateways; hubs H1 to H4 towards off-chip memories M1 to M4; array of off-chip CW lasers (λ1 λ2 λ3 λ4)]

System View

[Figure: four local domains, each a group of electronic switches (Eswitch), connected at the top level. Source: IBM]

System View

Not Just E/O and O/E Converters, but an Architecture Integration Challenge:
• Data rate adaptation
• (De-)Serialization
• Flow control
• Clock resynchronization
• Message-dependent deadlock avoidance

[Figure: two local domains of Eswitches exchanging data/valid/stall signals through driver + modulator on the transmit side and photo-detector (PD) + TIA on the receive side. Source: SSSA Pisa]

Architecture Integration Challenge

1) Data rate adaptation

• Clock speed: 0.5 to 3 GHz
• Modulation rate: >= 10 GHz

Architecture Integration Challenge

2) (De-)Serialization

[Figure: parallel words of 32/64/128/256 bits converted to and from an optical bitstream]
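As a minimal sketch of what (de-)serialization means functionally, the snippet below turns a parallel word into a bit sequence and back, and relates the optical line rate to the word rate on the electronic side. The bit order (least significant bit first) and all function names are assumptions made for illustration.

```python
def serialize(word, width):
    """Split a width-bit word into bits, least significant bit first (assumed order)."""
    return [(word >> i) & 1 for i in range(width)]


def deserialize(bits):
    """Rebuild the word from bits produced by serialize()."""
    word = 0
    for i, bit in enumerate(bits):
        word |= bit << i
    return word


def word_rate_ghz(line_rate_gbps, channels, width):
    """Words per nanosecond that `channels` optical channels can carry."""
    return line_rate_gbps * channels / width


flit = 0b01011101
bits = serialize(flit, 8)
assert deserialize(bits) == flit
print(word_rate_ghz(line_rate_gbps=10, channels=1, width=32))
```

The last call illustrates data rate adaptation: a single 10 Gbit/s channel feeds 32-bit words at well below 1 GHz, so wider words need faster channels or more of them.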

Architecture Integration Challenge

3) Flow control

Buffer size is a function of the round trip time for full-throughput operation

[Figure: two local domains of Eswitches exchanging data/valid/stall signals through interfaces, each with a buffer]
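The dependence of buffer size on round trip time can be made concrete with the usual back-of-the-envelope model: the sender can have at most as many flits in flight as the receiver has buffer slots, so full throughput needs one slot per cycle of round trip. This is a generic sketch, not the sizing rule used by the authors; latencies in cycles and the one-cycle reaction time are assumed parameters.

```python
def round_trip_cycles(forward_latency, backward_latency, reaction=1):
    """Cycles between sending a flit and learning its buffer slot is free again."""
    return forward_latency + backward_latency + reaction


def sustained_throughput(buffer_slots, rtt_cycles):
    """Fraction of peak throughput achievable with a finite receive buffer."""
    return min(1.0, buffer_slots / rtt_cycles)


rtt = round_trip_cycles(forward_latency=3, backward_latency=3)
for slots in (2, 4, rtt):
    print(slots, "slots ->", sustained_throughput(slots, rtt))
```

Undersized buffers throttle the link even when the optical channel itself is idle, which is why the interface buffers are part of the integration problem.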

Architecture Integration Challenge

4) Clock Resynchronization

[Figure: the reconverted signal (>= 10 GHz) reaches the receiving interface with a phase offset Δ between the data and the local clock]

Architecture Integration Challenge

5) Message-dependent deadlock avoidance

[Figure: two local domains of Eswitches exchanging data/valid/stall signals through interfaces]

System View

[Figure: two local domains of Eswitches exchanging data/valid/stall signals through interfaces, each implemented as a BRIDGE]

The bridge is a complex block taking care of key functional tasks for correct operation of the architecture, built on top of a multi-technology platform and supporting GHz-range signaling rates

Bridge Configuration

One of the key challenges consists of overcoming the inherent serial nature of optical communications:
• Increasing the signalling rate of optical channels
• Increasing the bit-level parallelism (WDM)
• A combination thereof

Pay Attention: CMOS cannot achieve arbitrary speeds!

Implications over the SerDes architecture, hence over the performance-power trade-off of the bridge

Research Goal: Explore and Characterize the Configuration Space of the Bridge

[Figure: SERDES converting parallel words into one or more serial bitstreams]
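The configuration space mentioned above can be enumerated by trading signalling rate against WDM parallelism for a fixed source-destination data rate. The sketch below is illustrative only: the function name, the channel options and the `max_line_rate_gbps` feasibility limit are assumptions, and real feasibility depends on the technology.

```python
def bridge_configurations(target_gbps, channel_options=(1, 2, 4),
                          max_line_rate_gbps=None):
    """List (channels, line rate per channel, feasible) for a target data rate."""
    configs = []
    for channels in channel_options:
        line_rate = target_gbps / channels
        # Without a limit every point is kept; with one, faster points are flagged.
        feasible = max_line_rate_gbps is None or line_rate <= max_line_rate_gbps
        configs.append((channels, line_rate, feasible))
    return configs


for target in (25, 40):
    for channels, rate, ok in bridge_configurations(target, max_line_rate_gbps=20):
        print(f"{target} Gbit/s: {channels} x {rate} Gbit/s, feasible={ok}")
```

Each point implies a different serializer depth and therefore a different performance-power trade-off for the bridge.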

Technology Partitioning

• 16 3D-stacked computation clusters
• 16x16 optical NoC (ONoC)

In static-power dominated technologies like silicon photonics, operation at high transmission rates may become a priority to cut down on pJ/bit

Better performing technologies than CMOS may be required in the back-end of the bridge

[Figure: the bridge between CMOS and optics; partitioning CMOS | Optics versus CMOS | BiCMOS | Optics, the latter labelled "Our assumption"]

We select IHP 130nm SiGe BiCMOS (SG13S)
- fT/Fmax = 250 GHz / 340 GHz
- 3.3V I/O CMOS, 1.2V logic CMOS
- 5 thin metal layers, 2 thick ones

Target logic family:
- 2.5V compatible ECL
- A cell library provides std cell gates
- Logic synthesis from HDL enabled (Synopsys DC)

Similar technologies provide monolithic integration of optical components with the BiCMOS process

Bridge Architecture

[Figure: bridge between the ENoC and the gateway.
Transmitter side: VC decoder, 1x3 demux, DC-FIFOs (req, reply), decoders, 1x15 demux, 3x1 mux with arbiter, serializer (SER) with PLL, drivers and modulators on λ11, λ12, λ13 and λ1c.
Receiver side: filters, photodetectors (PD), TIAs and comparators (Comp) on λ11, λ12, λ13 and λ1c, clock divider, deserializer (DESER), 1x3 demux, DC-FIFOs (req, reply), (15x2)x1 mux with arbiter, mesochronous synchronizer, credit counters 1 to 15 with their DC-FIFOs]

Bridge Architecture: Transmitter Side

• 1 transmission module for each target (15 of them in a 16x16 ONoC)
• Optimization: only one set of buffers for all destinations
• One virtual channel for each message class to avoid (message-dependent) deadlock
• Bit-level parallelism
• Source-synchronous communication
• 1 GHz network interface: frequency = f(modulation rate)
• Modulation rate (e.g., 10 Gbps)

[Figure: ENoC, VC decoder, 1x3 demux, DC-FIFOs (req, reply), decoders, 1x15 demux, then per target a 3x1 mux with arbiter, serializer (SER) with PLL, drivers and modulators, from λ11, λ12, λ13, λ1c up to λ15_1, λ15_2, λ15_3, λ15c]

Bridge Architecture: Receiver Side

• 1 receiver module for each transmitter (receiver modules 1 to 15)
• Source-synchronous communication
• 1 GHz network interface: frequency = f(transmission frequency)
• Modulation rate (e.g., 10 Gbps)

[Figure: each receiver module has photodetectors (PD), TIAs and comparators (Comp) on its data wavelengths (λ11, λ12, λ13 up to λ15_1, λ15_2, λ15_3) and on λ1c up to λ15c, a clock divider, a deserializer (DESER), a 1x3 demux and DC-FIFOs (req, reply); a (15x2)x1 mux with arbiter merges the modules]

Flow Control

Credit-based flow control:
- Reuses the datapath
- Exploits low dynamic power of ONoCs
- No round-trip timing assumptions

Can fire only if credits available

[Figure: bridge architecture with the flow-control path: credit counters 1 to 15 and their DC-FIFOs, mesochronous synchronizer, reply DC-FIFO]
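The "can fire only if credits available" rule can be captured in a few lines. This is a behavioural sketch of generic credit-based flow control, not the bridge RTL; class and method names are hypothetical.

```python
from collections import deque


class Receiver:
    """Receive buffer with a fixed number of slots."""

    def __init__(self, slots):
        self.slots = slots
        self.buffer = deque()

    def accept(self, flit):
        # With correct credit accounting this can never overflow.
        assert len(self.buffer) < self.slots, "overflow: credits were miscounted"
        self.buffer.append(flit)

    def drain(self, sender):
        """Consume one flit and send the freed slot back as a credit."""
        flit = self.buffer.popleft()
        sender.credit_returned()
        return flit


class CreditedSender:
    """Sender holding one credit per free slot in the receiver buffer."""

    def __init__(self, receiver_slots):
        self.credits = receiver_slots

    def can_fire(self):
        return self.credits > 0

    def send(self, flit, receiver):
        if not self.can_fire():
            return False  # hold the flit, no timing assumption needed
        self.credits -= 1
        receiver.accept(flit)
        return True

    def credit_returned(self):
        self.credits += 1


rx = Receiver(slots=2)
tx = CreditedSender(receiver_slots=2)
print([tx.send(f, rx) for f in "abc"])
```

Because correctness depends only on the credit count, the scheme works whatever the optical round trip time is; the round trip only affects how many credits are needed for full throughput.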

Serializer Architecture

2:1 Mux Cell is the main building block

[Figure: 2:1 mux cell built from a master D-latch, a slave D-latch and a mux; inputs a and b clocked at f are interleaved into output data at 2f]

• Transmission frequency = twice the input clock
• Lower PLL frequency
• No need for additional selectors

PERFECT BINARY TREE STRUCTURE
• M = log2(N) stages working at halved speed with respect to one another
• The number of building blocks per stage is inversely proportional to the operating frequency: energy savings

[Figure: 16:1 mux built as a binary tree of fifteen 2x1 muxes; ÷2 dividers derive the stage clocks f/8, f/4, f/2 from the input clock f; input data at f/8, output data at 2f]
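The binary tree structure can be sketched both structurally (stages, mux count, stage clock) and functionally (each 2:1 cell interleaves two streams). Names are illustrative; the pairing of input i with input i + N/2 in the first stage is an assumption made so that bits leave in index order.

```python
def stage_plan(n_inputs):
    """List (stage, mux_count, clock_divisor) for an N:1 binary tree serializer.

    A divisor d means the stage is clocked at f/d, where 2f is the output bit rate.
    """
    if n_inputs < 2 or n_inputs & (n_inputs - 1):
        raise ValueError("n_inputs must be a power of two, at least 2")
    stages = n_inputs.bit_length() - 1  # M = log2(N)
    return [(k, n_inputs >> k, 1 << (stages - k)) for k in range(1, stages + 1)]


def interleave(a, b):
    """One 2:1 mux cell: alternate between its two input streams."""
    return [bit for pair in zip(a, b) for bit in pair]


def tree_serialize(bits):
    """Serialize N parallel bits through log2(N) stages of 2:1 cells."""
    stage_plan(len(bits))  # validates that N is a power of two
    streams = [[bit] for bit in bits]
    while len(streams) > 1:
        half = len(streams) // 2
        streams = [interleave(streams[i], streams[i + half]) for i in range(half)]
    return streams[0]


for stage, muxes, divisor in stage_plan(16):
    print(f"stage {stage}: {muxes} muxes clocked at f/{divisor}")
```

For N = 16 this gives 8 + 4 + 2 + 1 = 15 cells, with most of them running at the slowest clock, which is where the energy savings come from.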

Serializer Architecture: Flexibility

This architecture is very flexible: it can easily span a wide bridge configuration space
• Scale up: add more stages to the left
• More parallelism: remove stages from the right

[Figure: the 16:1 mux tree (input data at f/8, output data at 2f); the same tree extended to a 32x1 mux (input data at f/16, output data at 2f); and a 16x2 mux with two output data streams at f, giving 2 bits of optical parallelism]

Experimental Results

[Figure: bridge schematic with its clock domains (f0; f1, f1/2, f1/4, f1/8, f1/16). Tx: DC-FIFO 1 and 2, routing, VC decoder, 3x1 mux, 1x15 demux, arbiter, credit counters, 32x1 binary tree serializer with PLL and ÷2 dividers (clk1 to clk5), driver. Rx: PD, TIA, comparator, 1x32 binary tree deserializer, 1x3 demux, DC-FIFO 1 to 30, 30x1 mux, 1x2 demux, arbiter, mesochronous synchronizer (MESO). Laser source and ONOC in between]

[Figure: 8x1 mux tree (input data at f/8, output data at f) and 1x8 demux tree (input data at f, output data at f/8)]

Bridge Front-End Architecture: (De)-Serialization + Transceivers

Partitioning options due to multi-stage nature of the serializer:
• CMOS: two process nodes (bulk 40nm or 28nm FD-SOI)
• ECL 130nm
• Opto-electronics + ONoC

Experimental Results

Bridge Front-End Architecture (DP: design point; technologies: CMOS 40nm, FD-SOI 28nm, ECL 130nm)

Target data rate for source-destination connection: 25 Gbit/s
• Design points: DP1, DP2, DP3, DP4, DP5
• Timing labels along the front-end: 1.28 ns, 0.64 ns, 0.32 ns, 0.16 ns, 0.08 ns

Target data rate for source-destination connection: 40 Gbit/s
• Design points: DP1, DP2, DP3, DP4
• Timing labels along the front-end: 0.8 ns, 0.4 ns, 0.2 ns, 0.1 ns, 0.05 ns

Other labels: 2 bits @ 12.5 Gbit/s, 2 bits @ 20 Gbit/s, 4 bits @ 6.25 Gbit/s, 4 bits @ 10 Gbit/s; Fully CMOS
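The timing labels on this slide are consistent with the clock periods of a five-stage (32:1) binary tree serializer whose last stage is clocked at half the output bit rate. That reading is an interpretation, not stated in the source; the sketch below simply reproduces the numbers under that assumption, with a hypothetical function name.

```python
def stage_clock_periods_ns(output_gbps, n_inputs=32):
    """Clock period of each serializer stage, slowest stage first.

    Assumes the last stage is clocked at half the output bit rate and each
    earlier stage at half the frequency of the next one.
    """
    stages = n_inputs.bit_length() - 1  # log2(N) for a power-of-two N
    bit_time_ns = 1.0 / output_gbps
    return [bit_time_ns * 2 ** k for k in range(stages, 0, -1)]


for rate in (25, 40):
    periods = ", ".join(f"{p:.2f} ns" for p in stage_clock_periods_ns(rate))
    print(f"{rate} Gbit/s: {periods}")
```

Under this reading, the partitioning question is which of these stages are still fast enough to remain in CMOS and which need the ECL back-end.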

Experimental Results

[Figure: energy-per-bit (pJ/bit, axis from 0 to 7) of design points DP1 to DP5 for 130nm ECL + CMOS 40nm and 130nm ECL + 28nm FDSOI CMOS. At 25 Gbit/s: 1 channel x 25 Gbit/s, 2 channels x 12.5 Gbit/s, 4 channels x 6.25 Gbit/s. At 40 Gbit/s: 1 channel x 40 Gbit/s, 2 channels x 20 Gbit/s, 4 channels x 10 Gbit/s. Bars are marked Fully CMOS or Hybrid CMOS-ECL; x: not feasible. Annotated reductions: -31% and -84%]

Experimental Results

[Figure: four bridges (TX/RX pairs), each with the Tx/Rx schematic and clock domains of the bridge, attached to a 16x16 λ-Router topology on a 16mm x 16mm optical layer]

Experimental Results

100% bandwidth utilization

[Figure: energy-per-bit (pJ/bit, axis from 0 to 12) split into bridge CMOS part, bridge ECL part, thermal tuning and Tx-Rx-laser, for 1-bit, 2-bit and 4-bit parallelism. At 25 Gbit/s: 1 channel x 25 Gbit/s, 2 channels x 12.5 Gbit/s, 4 channels x 6.25 Gbit/s. At 40 Gbit/s: 1 channel x 40 Gbit/s, 2 channels x 20 Gbit/s, 4 channels x 10 Gbit/s. Bar labels: Hybrid 10.94 pJ/bit, Fully-CMOS 1.23 pJ/bit, Fully-CMOS 1.6 pJ/bit, Fully-CMOS 1.15 pJ/bit, Hybrid 8.69 pJ/bit, Hybrid 8.31 pJ/bit. Annotated reductions: -88.7% and -85.4%]

Energy efficiencies in the ballpark of 1 to 2 pJ/bit are possible with more WDM channels, a trend that higher signaling speeds exacerbate
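The two annotated reductions can be related to the bar labels with simple arithmetic. Which bars the percentages compare is not recoverable from the text; the pairing used below (both relative to the 10.94 pJ/bit hybrid point) is an assumption, chosen because it matches the annotations to within rounding.

```python
def reduction_percent(baseline, improved):
    """Relative energy saving of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


HYBRID_PJ_PER_BIT = 10.94  # assumed common baseline
for fully_cmos in (1.23, 1.6):
    saving = reduction_percent(HYBRID_PJ_PER_BIT, fully_cmos)
    print(f"{HYBRID_PJ_PER_BIT} -> {fully_cmos} pJ/bit: -{saving:.1f}%")
```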

Experimental Results

Network-Level Trade-Offs

Bridge front-end architecture: ECL 130nm, FD-SOI 28nm

             TSV                   Laser sources         Total power [W]       SNR
             25 Gbit/s  40 Gbit/s  25 Gbit/s  40 Gbit/s  25 Gbit/s  40 Gbit/s  25 Gbit/s  40 Gbit/s
1 channel    1216       2176       32         32         65.6       83.4       16.2       16.14
2 channels   1216       2176       48         48         7.4        79.7       13.13      13.1
4 channels   2176       2176       80         80         9.6        11.01      8.8        8.8

Manufacturing requirements:
• R = [5, 1, 25] um: infeasible
• R = [5, 1, 30] um: up to 1 channel
• R = [5, 0.25, 30] um: up to 19 channels

Optical parallelism comes with cost and signal integrity concerns!


Vertical Integration Challenge

Page 46: EmergingSilicon NanophotonicNetworks: Time to Bridge the ...mpsoc.unife.it/~aistecs/AISTECSprint.pdf · AISTECS 2019 EmergingSilicon NanophotonicNetworks: Time to Bridge the Gap with

The Design Space

ONoC topology design points stem directly from designers' intuition.

HOW TO «SYNTHESIZE» THE MOST EFFICIENT ONoC SOLUTION FOR THE REQUIREMENTS OF THE CONNECTIVITY PROBLEM AT HAND?

The design space is currently largely unknown.

Major requirements: start from a high-level description, operate on abstractions, and refine them into an actual implementation with components from a technology library.

Page 47

Can we extend the paradigms and methodologies of EDA to the context of emerging silicon nanophotonic interconnection networks?

Electronic design flow: High-level specification -> Gate-Level Netlist -> Mapped Gate-Level Netlist -> Planar geometric shapes
Libraries: Technology-independent Logic Library, Technology Library

Nanophotonic network design flow:
0. Routing Protocol Selection
I. Switching Primitives Representation
II. Technology Mapping
III. Assignment of modulation carriers
IV. Netlist connectivity
V. Device Parameter Selection
VI. Placement and routing
VII. Physical Design

Design Automation Beyond E‐Roots

Page 48


Design Automation Beyond E‐Roots

Design automation should not determine which technology to pursue.

Design automation can lead to concrete evaluation of a new technology.

State‐of‐the‐Art PIC Design Tools

Page 49

I. Switching Primitives Representation
II. Technology Mapping
III. Selection of modulation carriers
IV. Netlist connectivity

Can we understand all topology design points in the context of a unified design framework?

Can we populate the design space of wavelength-routed optical NoC topologies?

Front‐End Synthesis Methodology

Page 50

Basic building block for the implementation of any wavelength-routed topology: the 1x2 Drop Filter.

The filter is used to implement both the drop function at the initiator side and the add function at the target side.

[Figure: 1x2 drop filter with resonant wavelength λi, showing the on-resonance and off-resonance signal paths. Panels: DROP FUNCTION (drop function at initiator side) and ADD FUNCTION (add function at target side). Labels: Sj, λ1, λ2.]

Basic Primitive

Page 51

SYNTHESIS METHODOLOGY

1. Wavelength Resolution
Each channel of the WDM input signal should be resolved so as to be routed to a different output.
[Figure: Wavelength Resolution Graph (WRG) for a generic 4x4 WRONoC, inputs A, B, C, D.]

2. Technology Mapping
E.g., grouping the 1x2 DFs into compact 2x2 photonic switching elements (PSEs) from a technology library!
[Figure: 2x2 PSEs labelled AB, CD, BC, AD, AC, BD.]

3. Symbolic Wavelength Assignment
Assign a resonant wavelength to the MRRs.

4. Topology Connection
Draw the topology logic scheme. It's a λ-router! However, it is optimized with respect to the baseline: only 3 resonator types!
[Figure: logic topology with outputs Out1, Out2, Out3, Out4.]

Minimize no. of MRR types.
Constraint: avoid conflicts!
Constraint: drop channels on rows only once!
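One way to read the "avoid conflicts" constraint is as a Latin-square condition on the initiator-to-target wavelength matrix: each initiator reaches every target on a distinct wavelength, and no target receives the same wavelength from two initiators. The sketch below checks that condition. The cyclic assignment and both function names are illustrative assumptions, not the assignment procedure used in the slides.

```python
def cyclic_assignment(n):
    """Hypothetical assignment: initiator i reaches target j on wavelength (i + j) mod n."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]


def is_conflict_free(assign):
    """Check the Latin-square property of a square wavelength matrix."""
    n = len(assign)
    # Each initiator (row) must use n distinct wavelengths.
    rows_ok = all(len(set(row)) == n for row in assign)
    # Each target (column) must receive n distinct wavelengths.
    cols_ok = all(len({assign[i][j] for i in range(n)}) == n for j in range(n))
    return rows_ok and cols_ok


if __name__ == "__main__":
    for row in cyclic_assignment(4):
        print(row)
    print(is_conflict_free(cyclic_assignment(4)))
```

A matrix in which two initiators share a wavelength toward the same target fails the column check, which is the kind of conflict the constraint rules out.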

Page 52

Generic Topology from the Front-End Flow

These crossings should be considered as apparent, since at this stage we are drawing the logic topology, not the physical one.

Our synthesis methodology can potentially populate the complete design space of WRONoC topologies by spanning all possible technology mappings, subject to the constraints of each stage for legal solutions.

With 2x2 PSEs only, the number of WRONoC topologies in the design space amounts to [(n-1)!]^n.

A 4x4 WRONoC topology can be implemented in 1296 different ways.
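As a quick check of the count above, the following sketch evaluates [(n-1)!]^n. The function name is hypothetical; only the formula and the 4x4 figure come from the slide.

```python
from math import factorial


def num_wronoc_topologies(n: int) -> int:
    """Size of the design space with 2x2 PSEs only: [(n-1)!]^n."""
    return factorial(n - 1) ** n


if __name__ == "__main__":
    print(num_wronoc_topologies(4))  # (3!)^4 = 6^4 = 1296
```

The count grows very quickly with n, which is why exhaustive generation is only practical for small radices.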

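[Figure: generic logic topology, with channel sets annotated stage by stage.
Inputs: (1,2,3,4)A, (1,2,3,4)B, (1,2,3,4)C, (1,2,3,4)D.
After the first stage: (1)B(2,3,4)A, (1)C(2,3,4)D, (1)D(2,3,4)C, (1)A(2,3,4)B.
After the second stage: (2)A(1)D(3,4)C, (2)C(1)B(3,4)A, (2)B(1)C(3,4)D, (2)D(1)A(3,4)B.
Outputs: (1)C(2)B(3)A(4)D, (1)B(2)C(3)D(4)A, (1)A(2)D(3)C(4)B, (1)D(2)A(3)B(4)C.
Resonator labels: λ1, λ1, λ2, λ2, λ3, λ3.]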

Generic Abstract Solutions

Page 53

I. to IV. Front-End Methodology (output: LOGIC TOPOLOGY)
V. Device Parameter Selection
VI. Placement and routing
VII. Physical Design (output: PHYSICAL TOPOLOGY)

Back‐End Synthesis Methodology

Page 54

DEVICE PARAMETER SELECTION

• What is the exact radius length of the MRR, typically in the range 5-20 um?
• What is the exact value of the n wavelengths used by each initiator in an n x n wavelength-routed optical NoC?
• What is the maximum bit-level communication parallelism on the I/O optical channels?

Not just cost and reliability, but also feasibility!

This is not just a refinement step, due to the ROUTING FAULT concern: it has implications on network-level throughput and scalability.

[Figure: transmission peaks of an MRR with radius R1 against the wavelength channels λi, λx, λy, λz.]

Parallelism is 6 in PSEx

Page 55

[Figure, continued: rings R1 and R2; conflicting wavelength positions are marked NO.]

Parallelism in PSEx: 6 -> 4
Parallelism in PSEy: 7 -> 5

Page 56

[Figure, continued: rings R1, R2 and R3; conflicting wavelength positions are marked NO.]

Available parallelism in PSEx: 6 -> 4 -> 3
Available parallelism in PSEy: 7 -> 5 -> 3
Available parallelism in PSEz: 9 -> 7

Page 57

As topology size increases, the proliferation of filter types and wavelength channels may limit the availability of non-overlapped transmission peaks, which may cause the topology to be practically infeasible.

[Figure: Electromagnetic Model. Transmission peaks for rings R1, R2 and R3, with the conflicting wavelength positions marked NO.]

Parallelism and Scalability Limitations

Page 58

PARAMETER UNCERTAINTY

[Figure: transmission peaks of rings R1 and R2 under radius variation (R1 - Rtol, R1, R1 + Rtol), with wavelength channels λ2,2 and λ3,1. Variation interval λ ± ().]

There exists a post-fabrication variation scenario that ends up in a routing fault.

Even without overlapping, proximity raises optical crosstalk concerns.

Conservative design-for-reliability constraint: let us assign device parameters and state an achievable bit-level parallelism such that routing faults will not take place under any variability scenario.

We modelled the ring radius/wavelength channel selection problem, subject to routing fault avoidance, as a Constrained Optimization Problem, and used ASP (Answer Set Programming) as the declarative technology.

This is the first refinement step directly exposed to the underlying technology.
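The conservative constraint can be illustrated with a first-order check. This is a minimal sketch and not the ASP model used by the authors. It assumes the textbook resonance condition m * λ = 2π * R * n_eff with a constant effective index (the value 2.4 is a placeholder), ignores dispersion and filter linewidth, and uses the 10 nm radius tolerance and 0.5 nm laser uncertainty quoted on the Scalability slide. All names are hypothetical.

```python
import math

N_EFF = 2.4  # assumed constant effective index (placeholder value)


def resonances(radius_um, lo_nm=1500.0, hi_nm=1600.0):
    """Resonant wavelengths (nm) in [lo_nm, hi_nm], from m * lambda = 2*pi*R*n_eff."""
    optical_length_nm = 2 * math.pi * radius_um * 1e3 * N_EFF
    m_min = math.ceil(optical_length_nm / hi_nm)
    m_max = math.floor(optical_length_nm / lo_nm)
    return [optical_length_nm / m for m in range(m_min, m_max + 1)]


def may_route_fault(radius_um, channel_nm, r_tol_nm=10.0, laser_unc_nm=0.5):
    """True if some variation scenario can put a resonance of this ring on the channel."""
    for lam in resonances(radius_um):
        # First-order shift at fixed mode number: d(lambda)/lambda = dR/R.
        shift_nm = lam * r_tol_nm / (radius_um * 1e3)
        if abs(lam - channel_nm) <= shift_nm + laser_unc_nm:
            return True
    return False


if __name__ == "__main__":
    peaks = resonances(5.0)
    print([round(p, 2) for p in peaks])
    print(may_route_fault(5.0, peaks[0]))  # channel sitting on a resonance
    print(may_route_fault(5.0, 1555.0))    # channel far from every resonance
```

A channel is safe with respect to a ring that should not drop it only if this check is false. A full parameter selection would repeat the check for every unintended ring/channel pair in the topology.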

Page 59

I. to IV. Front-End Methodology (output: LOGIC TOPOLOGY)
V. Device Parameter Selection
VI. Placement and routing
VII. Physical Design (output: PHYSICAL TOPOLOGY)

Back-End Synthesis Methodology

Page 60

Logic Topology

Optical Layer ‐ Layout Planning

Placement and Routing

Lots of unexpected waveguide crossings (which burden the static power budget).

Electronic P&R tools cannot be reused here.

We propose PROTON+, a tool for automatic placement and routing of ONoC topologies (collaboration with Prof. Schlichtmann at TU Munich).

The tool tries to strike a good balance between crossing losses and propagation losses, which might be conflicting objectives:
• Minimize waveguide length
• Minimize no. of crossings

Page 61

[Figure: maximum insertion loss (axis 15 to 40) versus the prop/cross weight ratio (100/0, 90/10, 80/20, 70/30, 60/40, 50/50, 40/60, 30/70, 20/80, 10/90, 0/100). Series: 8x8-lambda-router, 8x8-GWOR, 8x8-Std-Crossbar.]

Where Lp and Cp are approximate functions of path lengths and no. of crossings

By setting the weights of the objective function, the best physical mapping for the technology/topology at hand can be achieved

Placement: non-linear optimization problem solved with an IPM (interior-point method).
Routing: adaptation of Lee's algorithm («Maze Router»).
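For readers unfamiliar with maze routing, the sketch below shows the textbook form of Lee's algorithm: a breadth-first wave expansion from the source over a grid, followed by a backtrace from the target. The slides only state that an adaptation of it is used, so the grid model, the function name and the absence of crossing costs here are assumptions, not a description of PROTON+.

```python
from collections import deque


def lee_route(grid, src, dst):
    """Shortest 4-connected path on a grid (1 = blocked), or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}          # visited cells and their predecessor
    queue = deque([src])
    while queue:                # wave expansion, one BFS layer at a time
        cell = queue.popleft()
        if cell == dst:         # backtrace from target to source
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None


if __name__ == "__main__":
    layout = [[0, 0, 0],
              [1, 1, 0],
              [0, 0, 0]]
    print(lee_route(layout, (0, 0), (2, 0)))
```

An optical adaptation would typically also have to account for waveguide crossings, which plain breadth-first search treats as obstacles or ignores.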

Our objective function minimizes the insertion loss across the lossiest path. This indirectly limits total laser power.
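The min-max idea can be sketched as follows: estimate the insertion loss of each path from its waveguide length and number of crossings, then look at the worst one. The per-centimetre and per-crossing loss coefficients below are placeholder assumptions, not values from the slides, and the function names are hypothetical.

```python
def insertion_loss_db(length_um, crossings, prop_db_per_cm=1.5, cross_db=0.15):
    """Propagation loss plus crossing loss of one path (coefficients are assumed)."""
    return prop_db_per_cm * length_um * 1e-4 + cross_db * crossings


def lossiest_path(paths, **coeffs):
    """Return the (length_um, crossings) pair with the highest insertion loss."""
    return max(paths, key=lambda p: insertion_loss_db(p[0], p[1], **coeffs))


if __name__ == "__main__":
    candidate_paths = [(10000, 20), (5000, 40)]
    worst = lossiest_path(candidate_paths)
    print(worst, round(insertion_loss_db(*worst), 2))
```

With these placeholder coefficients the shorter path with more crossings is the lossiest one, which illustrates why length and crossing count can be conflicting objectives.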

Physical Design Space Exploration

Page 62

Layout of 16x16 λ-Router with PROTON+

[Figure: layout with labels Hubs and Memory controller.]

• Ins. Loss max = 44 dB
• 255 crossings on the critical path
• 28636 um waveguide length on the critical path
• 24425 sec of CPU time (Intel Core 2 Quad CPU running at 2.33 GHz with 8 GB RAM)


Page 64

[Figure: computation time for the 16x16 λ-Router, axis 0 to 30000, bar labels 24426 and 640.]

[Figure: maximum insertion loss (dB), axis 0 to 50, for the 8x8 λ-Router, 8x8 GWOR, 8x8 Standard Crossbar and 16x16 λ-Router. Series: Manual, PROTON [Boos+ICCAD'13], PROTON v2.0 (PLATON).]

• PROTON v2.0 (PLATON) implements a force-directed placement algorithm
• Better computation times, better insertion losses
• PLATON is well-suited for large-scale topologies, rather than for small-scale ones

Prof. Ulf Schlichtmann
TU Munich's Placement and Routing Tools

Page 65

• There is a large variability in the design space: from 18 to 39 crossings!
• This raises the issue of placement-aware logic topology synthesis, a completely new discipline for optical NoCs.
• The λ-Router and Snake topologies proposed in the literature are not the best topologies from the critical path length viewpoint!

Design automation helps to get the most out of a technology.

[Figure: distribution of the critical path after physical mapping. X axis: critical path length (max. no. of crossings), 18 to 39. Y axis: number of topologies, 20 to 180. The Lambda-Router and Snake design points are each marked "Exists in Literature".]

[Figure: physical design with Proton+. Optical layer with 4x4 ONoCs replicated 3x; labels: Memory Controller, Gateway.]

We exhaustively generated all 4x4 WRONoC topologies and mapped them with Proton+.

Exploring the Design Space

Page 66

Conservative PROCESS VARIATIONS
• Only 4x4 ONoCs are certainly feasible
• Multiple wavelength selection options are useless if uncertainty ranges are not reduced accordingly

Ideal Fabrication
• With an overly fine step and large rings, the upper bound is roughly a 60x60 topology, with limited parallelism though!

Achievable parallelism is most sensitive to the incremental step of MRR radii.

We performed device parameter selection to assess the scalability of generic topologies.

Fabrication options (radius selection range and incremental step):

          Rmin    Rstep     Rmax
Ropt      5 μm    1 μm      25 μm
R'opt     5 μm    1 μm      30 μm
R''opt    5 μm    0.25 μm   30 μm

Radius tolerance: 10 nm
Laser uncertainty: 0.5 nm
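To make the three fabrication options concrete, the sketch below enumerates the candidate MRR radii each option offers, assuming radii are taken from Rmin to Rmax inclusive in increments of Rstep. The function name is hypothetical, and how the selection procedure actually uses these candidates is not specified in the slides.

```python
def candidate_radii(r_min_um, r_step_um, r_max_um):
    """Radii from r_min to r_max inclusive, in increments of r_step (all in um)."""
    count = round((r_max_um - r_min_um) / r_step_um) + 1
    return [r_min_um + k * r_step_um for k in range(count)]


if __name__ == "__main__":
    options = {"Ropt": (5, 1, 25), "R'opt": (5, 1, 30), "R''opt": (5, 0.25, 30)}
    for name, (r_min, r_step, r_max) in options.items():
        print(name, len(candidate_radii(r_min, r_step, r_max)))
```

Refining the step from 1 um to 0.25 um enlarges the candidate set far more than extending Rmax from 25 um to 30 um, which is consistent with the slide's remark on sensitivity to the incremental step.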

Scalability

Topology Radix

Page 67

Conclusions
• High-performance computing systems will soon be interconnect-limited again. Emerging technologies can be game changers.
• Time for a concrete evaluation of emerging silicon nanophotonic networks in small-scale systems. How? By bridging the gap with system designers.
• Horizontal integration gap:
  • The ENoC-ONoC bridge is key to determining the configuration of optical connections (data rate, parallelism) and its energy efficiency.
  • 1-2 pJ/bit communication can be realistically targeted at a 40 Gbps connection rate, with 4 WDM channels at 10 Gbps in parallel (bridge in 28 nm CMOS). Signal integrity is an issue.
• Vertical integration gap:
  • Design methods have been developed to populate the largely unknown design space of wavelength-routed topologies.
  • An early-stage, complete cross-layer synthesis methodology has been defined.
  • More energy-efficient topologies than the existing ones in the literature have been «synthesized».
• Design automation: an enabler for emerging technologies.

Page 68

Acknowledgement
