Fault-Tolerant Delay-Insensitive Inter-Chip Communication. Yebin Shi, APT Group, The University of Manchester

Page 1: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Yebin Shi

APT Group

The University of Manchester

Page 2: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Outline

• SpiNNaker inter-chip interconnect
• Basic transmitter and receiver
• Potential problems with the designs
• Robust transmitter and receiver
• Future work and conclusion

Page 3: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Research Aims

• Investigate the impact of transient glitches at inter-chip wires on the interface circuits.

• Redesign the link interface circuits to increase glitch-resistance and avoid deadlock.

Page 4: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

SpiNNaker

Network infrastructure:
– 6 bidirectional inter-chip links
– delay-insensitive on-chip and inter-chip communication
– packets are variable-length, serialized in 4-bit flits, with an end-of-packet marker
– 1 Gb/s throughput per link

Page 5: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Inter-Chip Communication

Inter-Chip Network:
– 2-of-7 data encoding
– 2-phase (NRZ) handshake
– data and control in a single stream

On-Chip Network:
– 3-of-6 data encoding
– 4-phase (RTZ) handshake
– separate data and control channels
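The two m-of-n codes above can be enumerated to check how many symbols each one offers. The sketch below is illustrative only: the helper names are hypothetical, and the actual assignment of data values to codewords is not given in the slides.

```python
from itertools import combinations

def m_of_n_codewords(m, n):
    """Return every n-bit word with exactly m bits set, as integers."""
    return [sum(1 << i for i in wires) for wires in combinations(range(n), m)]

def is_valid(word, m, n):
    """True if `word` fits in n bits and has exactly m bits set."""
    return 0 <= word < (1 << n) and bin(word).count("1") == m

two_of_seven = m_of_n_codewords(2, 7)   # inter-chip code
three_of_six = m_of_n_codewords(3, 6)   # on-chip code

# A 4-bit flit needs 16 data symbols; the inter-chip stream also carries
# an end-of-packet symbol, so it needs at least 17.
print(len(two_of_seven), len(three_of_six))
```

Both codes have enough codewords for the 16 values of a 4-bit flit, and the 2-of-7 code has room left for the end-of-packet symbol that shares the inter-chip stream.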

Page 6: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Link Transmitter

[Figure: link transmitter block diagram. Blocks: Ctrl, code 3-of-6 to code 2-of-7 converter, RTZ-to-NRZ phase converter, pipeline stages pipeline0 and pipeline1, completion detectors (cd). Signals: din[5:0], ctrl[2:0], dat_ack, ctrl_ack, dat_cd, eop_cd, den, cen, eop, d36[5:0], d27[6:0], dout_pre[6:0], dout[5:0], ack2.]

- data channel: pipeline for code and phase conversion
- ctrl channel: merges the EoP symbol into the data stream
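The phase conversion in the data channel can be pictured at the symbol level: in a 4-phase (RTZ) code the two asserted wires pulse and return to zero, while in a 2-phase (NRZ) code the same two wires simply change level. The following is a minimal behavioural sketch under that assumption, not the circuit from the slide; the function names are hypothetical.

```python
def rtz_to_nrz(symbols):
    """Map a sequence of RTZ symbols (bit masks) to NRZ wire levels.

    Each asserted bit of a symbol becomes one transition on that wire.
    """
    levels = 0
    out = []
    for symbol in symbols:
        levels ^= symbol      # toggle the wires named by the symbol
        out.append(levels)
    return out

def nrz_to_rtz(levels_seq):
    """Recover RTZ symbols by comparing each set of levels to the last."""
    previous = 0
    out = []
    for levels in levels_seq:
        out.append(levels ^ previous)   # wires that changed since last time
        previous = levels
    return out

symbols = [0b0000011, 0b0000101, 0b1000001]
wire_levels = rtz_to_nrz(symbols)
recovered = nrz_to_rtz(wire_levels)
```

The round trip only works while both ends agree on the previous wire levels, which is why a glitch that disturbs that shared state is dangerous.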

Page 7: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Link Receiver

[Figure: link receiver block diagram. Blocks: NRZ-to-RTZ phase converter, code 2-of-7 to code 3-of-6 converter, pipeline stages pipeline0 to Pipeline4, C-elements (C), toggle (T), completion detectors (cd). Signals: din[6:0], d36[5:0], dout[5:0], ctrl[2:0], dat_ack, ctrl_ack, ack2, dat_cd0, dat_cd2, dat_cd3, eop_cd0, eop_cd2, cd1.]

- data channel: phase and code conversion pipeline
- ctrl channel: extracts EoP symbols from the stream

Page 8: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Glitch Impact on Simulation

• Automatic packet data generation
• CRC scheme included for result verification
• Random generation of transient glitches
  – injected onto the inter-chip link
  – Single Event Upset (SEU) scenarios
• Configurable frequency and duration of glitches
  – frequency: up to ½ glitch/packet
  – duration scale: 0.1-2 ns
• Extensive simulation
  – a large number of densely packed glitches over 1M packets
  – speed-up fault simulation
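A glitch schedule with these parameters could be generated along the following lines. This is a sketch only: the function name, the uniform duration distribution, the one-glitch-per-packet limit, and the wire count (7 data wires plus 1 acknowledge wire) are assumptions, not details taken from the slides.

```python
import random

def generate_glitches(n_packets, glitches_per_packet=0.5,
                      min_ns=0.1, max_ns=2.0, n_wires=8, seed=0):
    """Return a list of random transient glitches, at most one per packet.

    glitches_per_packet is the probability that a given packet is hit;
    n_wires=8 assumes 7 data wires plus 1 acknowledge wire.
    """
    rng = random.Random(seed)   # seeded so a run can be reproduced
    glitches = []
    for packet in range(n_packets):
        if rng.random() < glitches_per_packet:
            glitches.append({
                "packet": packet,
                "wire": rng.randrange(n_wires),
                "duration_ns": rng.uniform(min_ns, max_ns),
            })
    return glitches

schedule = generate_glitches(1000)
```

Seeding the generator keeps a failing run repeatable, which matters when a deadlock shows up only once in many thousands of packets.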

Page 9: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Fault effects in the Transmitter

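[Figure: transmitter output stage. Pipeline 1 feeds d27[6:0] to the RTZ-to-NRZ converter, which drives dout[6:0]; a toggle (T) on ack2 generates pipe_en. Other signals: d27_o[6:0], dat_cd, eop_cd, completion detectors (cd).]

Deadlock risks:
– A transient glitch may corrupt a 2-of-7 symbol, leading to handshaking failure.
– The phase converter is phase-sensitive.
– Independent resetting.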

Page 10: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Fault Effects in the Receiver

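[Figure: receiver 2-phase to 4-phase converter. Two enabled latches (D, EN, Q) hold lat_q0[6:0] and lat_q1[6:0]; inputs din[6:0] (2ph) and acki (4ph); outputs dout[6:0] (4ph) and acko (2ph); a completion detector (cd), a C-element (C) and a toggle (T).]

Deadlock risks:
– A corrupted 2-of-7 symbol may prevent completion of the conversion to 3-of-6.
– Independent resetting.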

Page 11: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Deadlock in Receiver

- a glitch occurs while dout_cd is in transit
- a wrong value is stored in the bottom latch
- the next data conversion fails

[Figure: the receiver 2-phase to 4-phase converter from the previous slide, with latches lat_q0[6:0] and lat_q1[6:0], inputs din[6:0] (2ph) and acki (4ph), outputs dout[6:0] (4ph) and acko (2ph).]
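The failure sequence can be illustrated with a simplified model in which the converter recovers a symbol by comparing the incoming 2-phase wire levels with a stored reference. This is an assumption made for illustration (the latch arrangement in the figure is not modelled in detail), and the names are hypothetical.

```python
def popcount(word):
    """Number of bits set in word."""
    return bin(word).count("1")

def nrz_to_rtz_step(levels, reference):
    """Recover a symbol and report whether completion would be detected."""
    symbol = levels ^ reference          # wires that differ from the reference
    complete = popcount(symbol) == 2     # a 2-of-7 symbol needs exactly 2
    return symbol, complete

levels = 0b0000011                # transmitter toggled wires 0 and 1

clean_reference = 0b0000000       # reference latch holds the correct value
glitched_reference = 0b0000001    # a glitch left a wrong bit in the latch

ok_symbol, ok_complete = nrz_to_rtz_step(levels, clean_reference)
bad_symbol, bad_complete = nrz_to_rtz_step(levels, glitched_reference)
```

With the wrong reference the recovered word has a single bit set, completion is never detected, no acknowledge is returned, and the link stalls.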

Page 12: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Robust 2-ph to 4-ph Conversion

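[Figure: phase-insensitive 2-phase to 4-phase converter, with inputs din (2-ph) and clrn, internal nodes a, b, c, d, and output dout (4-ph), shown with its state diagram. States are labelled 11/0, 01/0, 10/0 and 00/1; transitions are labelled din+, din-, clrn-, clrn+ & din, and clrn+ & !din. The reset signal is not shown.]

Phase-insensitive converter:
– Used on the 2-phase ack input to the transmitter.
– Used on the 2-phase data inputs to the receiver.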

Page 13: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Robust Receiver Design

[Figure: robust receiver block diagram. Blocks: NRZ-to-RTZ converter, code 2-of-7 to code 3-of-6 converter, pipeline stages pipeline1 to pipeline5, ctrl_gen, toggle (T), completion detectors (cd). Signals: din[6:0], dout[5:0], pipe1_din, pipe2_din, pipe3_din, pipe1_en, pipe2_en, pipe3_en, pipe1_cd, dat_cd, eop_cd, eop, dat_acki, ctrl_acki, ack2, rstn.]

– Phase-insensitive phase converter
– Enhanced code converter and completion detector
– Independent reset capability

Page 14: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

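Receiver Phase Converter

[Figure: seven phase-insensitive converters, pi_conv_0 to pi_conv_6, one per wire of din[6:0], each with ports din, clrn and dout. Their outputs form dout (pipe1_din[6:0]). clrn is derived from acki (pipe1_cd) through a C-element (C); nodes ac0 and ac1 are marked "rst to 1" and "rst to 0".]

acki also triggers the ack signal back to the transmitter.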

Page 15: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Code conversion with Priority Arbitration

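[Figure: code converter with arbitration. din[6:0] feeds a bank of C-elements (labels C0, C1, C15, C17 and C20 are visible) and mutexes, producing onehot[16:0] and req; a 3-of-6 encoder (3of6Enc) then generates dout[5:0].]

– supports the full set of 2-of-7 codes
– converts invalid symbols into valid ones
– stops the propagation of invalid symbols containing more than 2 transitions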
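One way to picture the arbitration is as a selection among the 21 possible wire pairs, so that a word with more than two transitions still yields exactly one valid symbol. The sketch below assumes a fixed priority order (lowest-numbered pair first) and a 21-bit one-hot output; both are illustrative assumptions, since the slide does not give the priority rule and the mutexes in the figure may resolve contention differently.

```python
from itertools import combinations

# All 21 wire pairs of a 2-of-7 code; list position is the assumed priority.
PAIRS = list(combinations(range(7), 2))

def arbitrate(fired):
    """Return a one-hot word selecting a single pair, or 0 if none is ready.

    fired is a 7-bit mask of wires on which a transition has been seen.
    """
    for index, (a, b) in enumerate(PAIRS):
        if ((fired >> a) & 1) and ((fired >> b) & 1):
            return 1 << index    # first matching pair wins, the rest are blocked
    return 0                     # fewer than 2 transitions: keep waiting

valid = arbitrate(0b0000011)      # exactly two transitions
waiting = arbitrate(0b0000001)    # only one transition so far
resolved = arbitrate(0b0000111)   # three transitions, still one winner
```

Whatever the input, the output never has more than one bit set, so a corrupted word cannot propagate downstream as an invalid symbol.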

Page 16: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Independent Reset

– An extra, possibly redundant, transition is created after reset in case the Tx is waiting for an acknowledge token.

– The phase-insensitive converter for ack2 in the Tx absorbs the extra token if it is not needed.

[Figure: a toggle (T) driven by rstn generates the extra transition on ack2.]
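At the token level, the reset scheme can be sketched as follows. This is a hypothetical model of the behaviour described above, not of the circuit: the class and function names are invented for illustration, and the transmitter is reduced to a single "waiting for ack" flag.

```python
class Transmitter:
    """Token-level model of a transmitter waiting on ack2."""

    def __init__(self):
        self.waiting_for_ack = False
        self.absorbed_tokens = 0

    def send_symbol(self):
        if self.waiting_for_ack:
            raise RuntimeError("previous symbol not yet acknowledged")
        self.waiting_for_ack = True

    def on_ack_transition(self):
        if self.waiting_for_ack:
            self.waiting_for_ack = False   # token was needed: Tx resumes
        else:
            self.absorbed_tokens += 1      # redundant token: absorbed

def reset_receiver(tx):
    """Receiver reset always emits one extra transition on ack2."""
    tx.on_ack_transition()

blocked_tx = Transmitter()
blocked_tx.send_symbol()      # the ack for this symbol is lost to the reset
reset_receiver(blocked_tx)    # extra token releases the transmitter

idle_tx = Transmitter()
reset_receiver(idle_tx)       # nothing outstanding: token is absorbed
```

In both cases the transmitter ends up able to send, which is the property that lets one chip reset without deadlocking its neighbour.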

Page 17: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Simulation results

Simulation results for 1 million packets sent

Item                            Original I/F   Proposed I/F
Glitches                        478,280        390,357
Successfully received packets   916,684        863,182
Deadlocks                       7,632          7
Performance (ns/symbol)         17             15
Area (µm²)                      8219.7         8555.7

– Significantly reduced deadlock occurrence (7,632 down to 7).
– Worse packet loss (863,182 packets received, against 916,684).
– Trivial area overhead (about 4%).
– Increased throughput (15 ns/symbol, against 17 ns/symbol).
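The comparison can be re-derived from the table with a few lines of arithmetic. The figures below are copied from the table; the derived quantities (packets not received, deadlocks per glitch, relative area) are computed here for convenience and do not appear in the slides.

```python
SENT = 1_000_000  # packets sent in each simulation run

RESULTS = {
    "original": {"glitches": 478_280, "received": 916_684,
                 "deadlocks": 7_632, "ns_per_symbol": 17, "area_um2": 8219.7},
    "proposed": {"glitches": 390_357, "received": 863_182,
                 "deadlocks": 7, "ns_per_symbol": 15, "area_um2": 8555.7},
}

def summarize(run):
    """Derive loss and deadlock figures for one design."""
    return {
        "not_received": SENT - run["received"],
        "deadlocks_per_glitch": run["deadlocks"] / run["glitches"],
    }

original = summarize(RESULTS["original"])
proposed = summarize(RESULTS["proposed"])

area_overhead = (RESULTS["proposed"]["area_um2"]
                 / RESULTS["original"]["area_um2"]) - 1
symbol_rate_gain = (RESULTS["original"]["ns_per_symbol"]
                    / RESULTS["proposed"]["ns_per_symbol"]) - 1

print(original, proposed, area_overhead, symbol_rate_gain)
```

Normalizing deadlocks by the number of injected glitches makes the two runs comparable even though they saw different glitch counts.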

Page 18: Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Conclusions and Future Work

• Enhance the resistance to transient glitches in inter-chip links by replacing phase converters.
• Avoid deadlocks by hardening completion detection modules in the receiver.
• Remove corrupt symbols by applying an arbitration scheme for symbol conversions.
• Allow independent chip resets without introducing deadlocks by sending safe, possibly redundant tokens (data or ack) on reset.

Future work:
• A generalized approach for circuit evaluation, including the computation of safety margins.
• Investigation into the impact of back-pressure on glitch resistance.