Synthesis of synchronous elastic architectures
Jordi Cortadella (Universitat Politècnica de Catalunya)
Mike Kishinevsky (Intel Corp.)
Bill Grundmann (Intel Corp.)
Network of Computing Units
[Figure: blocks B1, B2 and B3 connected between In and Out]
Latency-insensitive (elastic) system
[Figure: blocks B1, B2 and B3 connected between In and Out]
Every block makes a step only when all of its inputs are valid
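The firing rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `try_fire` and the use of `None` to model an invalid input are assumptions made for the example.

```python
from typing import Callable, Optional, Sequence

def try_fire(inputs: Sequence[Optional[int]],
             compute: Callable[..., int]) -> Optional[int]:
    """Fire the block only if every input carries a valid token.

    None models an invalid input (a bubble); any int is valid data.
    Returns the computed result, or None if the block must wait.
    """
    if any(x is None for x in inputs):
        return None          # at least one input is not valid: no step
    return compute(*inputs)  # all inputs valid: make exactly one step

def add(a: int, b: int) -> int:
    return a + b

print(try_fire([3, 4], add))     # both inputs valid, the block fires
print(try_fire([3, None], add))  # second input is a bubble, the block waits
```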
Why
Scalable
Modular (Plug & Play)
Tolerance to variable latency
– Communication
– Computation
Not asynchronous
– Use existing design paradigms
– CAD tools
Outline
The cost of elasticity
SELF: an elastic protocol
– Basic implementation (linear pipelines)
– General netlists (forks and joins)
– Formal models and verification
Synthesis of elastic architectures
Related work
Elastic block
[Figure: a Core wrapped by a Control block; Data, Valid and Stop signals on both the input and the output side; the control gates the core's clock (CLK, gated clock)]
What’s the cost of elasticity?
Communication channel
[Figure: sender connected to receiver by a Data wire]
Long wires: slow transmission
Pipelined communication
[Figure: sender connected to receiver through a pipelined Data channel]
What if the sender does not always send valid data?
The Valid bit
[Figure: sender connected to receiver through a pipelined channel carrying Data and Valid; animation frames show valid and invalid items advancing through the stages]
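A minimal sketch of a pipelined channel with a Valid bit, assuming each stage holds either a data item (Valid = 1) or `None` (Valid = 0). The function name `shift` and the three-stage example are hypothetical and only illustrate how bubbles travel with the data.

```python
from typing import List, Optional

def shift(pipeline: List[Optional[str]], new: Optional[str]) -> Optional[str]:
    """Advance the pipeline by one clock cycle.

    Each stage holds a data item (Valid = 1) or None (Valid = 0).
    Returns what the receiver sees in this cycle.
    """
    out = pipeline[-1]            # last stage is visible to the receiver
    pipeline[1:] = pipeline[:-1]  # every stage copies its predecessor
    pipeline[0] = new             # first stage samples the sender
    return out

pipeline: List[Optional[str]] = [None, None, None]
sent = ["a", None, "b", "c", None, None, None]
seen = [shift(pipeline, x) for x in sent]
received = [x for x in seen if x is not None]  # receiver ignores bubbles
print(received)
```

Without a Stop signal the receiver must accept every valid item the cycle it arrives.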
What if the receiver is not always ready?
The Stop bit
[Figure: sender connected to receiver through a pipelined channel carrying Data, Valid and Stop; Stop values shown in successive animation frames: 0000000000, 1111000000, 1111110000, 1111111111, 1100000000]
Back-pressure
Long combinational path
Carloni’s relay stations (double storage)
[Figure: sender and receiver, each a pearl wrapped in a shell, connected through relay stations with main and aux registers; Valid (V) and Stop (S) signals between neighbours]
• Handshakes with short wires
• Double storage required
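The need for double storage can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not Carloni's actual relay-station circuit: `TwoSlotBuffer` and `run_chain` are hypothetical names, the Stop output is registered (so the sender reacts one cycle late), and the receiver stalls at random.

```python
from collections import deque
import random

class TwoSlotBuffer:
    """Buffer with two storage slots and a registered Stop output."""

    def __init__(self):
        self.slots = deque()   # at most 2 items: "main" and "aux"
        self.stop_out = False  # registered Stop seen by the upstream side

    def outputs(self):
        """Valid/Data offered downstream, computed from state only."""
        return (bool(self.slots), self.slots[0] if self.slots else None)

    def clock(self, v_in, d_in, s_in):
        """One clock edge, given upstream Valid/Data and downstream Stop."""
        if self.slots and not s_in:
            self.slots.popleft()         # downstream accepted the head
        if v_in and not self.stop_out:
            self.slots.append(d_in)      # upstream transfer completes
        assert len(self.slots) <= 2      # two slots always suffice
        self.stop_out = len(self.slots) == 2

def run_chain(items, n_buffers=3, stall_prob=0.3, seed=0):
    """Send items through a chain of buffers to a randomly stalling receiver."""
    rng = random.Random(seed)
    chain = [TwoSlotBuffer() for _ in range(n_buffers)]
    pending, received = deque(items), []
    while len(received) < len(items):
        # Sample all registered outputs before any state changes.
        outs = [b.outputs() for b in chain]
        stops = [b.stop_out for b in chain]
        rx_stop = rng.random() < stall_prob
        src_valid = bool(pending)
        src_data = pending[0] if pending else None
        if src_valid and not stops[0]:
            pending.popleft()            # sender's transfer completes
        for i, b in enumerate(chain):
            v_in, d_in = (src_valid, src_data) if i == 0 else outs[i - 1]
            s_in = rx_stop if i == n_buffers - 1 else stops[i + 1]
            b.clock(v_in, d_in, s_in)
        v_last, d_last = outs[-1]
        if v_last and not rx_stop:
            received.append(d_last)
    return received

print(run_chain(list(range(10))))
```

Because every Stop is driven by a register, each handshake spans only one stage (short wires). The second slot absorbs the item that is already in flight when Stop is raised.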
Proposal: an elastic protocol
SELF (Synchronous ELastic Flow)
Simple and provably correct
Data-path with no overhead in:
– Area
– Latency
– Energy
Negligible control overhead
Fine-grain elasticity
Flip-flops vs. latches
[Figure: sender connected to receiver through flip-flops (FF), 1 cycle per stage; each flip-flop is then drawn as a pair of H and L (master/slave) latches, and animation frames show tokens moving through them]
Flip-flops already have a double storage capability, but …
Not allowed in conventional FF-based design!
Let’s make the master/slave latches independent (½ cycle per latch)
Only half of the latches (H or L) can move tokens
Elastic buffer keeps data while stop is in flight
[Figure: buffer configurations W1R1, W2R1, W1R2, W2R2]
Cannot be done with single-edge flops without double pumping
Use latches inside MS
Carloni’s relay station belongs to this class
SELF (linear communication)
[Figure: sender connected to receiver through a chain of latches; each stage has a Valid (V) and a Stop (S) control bit and an enable (En) for its data latch; Data, Valid and Stop at both ends; animation frames step the Valid and Stop values through the chain]
The protocol
[Figure: Sender connected to Receiver by Data, Valid and Stop; state diagram with Transfer and Retry]
Idle cycle: Valid = 0
Transfer cycle: Valid = 1, Stop = 0
Retry cycle: Valid = 1, Stop = 1
Persistency: G [ V ∧ S ∧ (Data = D) ⇒ Next (V ∧ Data = D) ]
The protocol
[Figure: Sender connected to Receiver by Data, Valid and Stop, with an example trace]
Data:  * D D * C C C B * A
Valid: 0 1 1 0 1 1 1 1 0 1
Stop:  0 0 1 0 0 1 1 0 0 0
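The three cycle types and the persistency rule can be checked mechanically on the trace above. The sketch below is illustrative: the function names are hypothetical, and it assumes the trace is drawn with the earliest cycle on the right (data flowing from sender to receiver), so the lists are reversed before checking.

```python
def classify(valid: int, stop: int) -> str:
    """Name the cycle type from the Valid and Stop bits."""
    if not valid:
        return "idle"
    return "retry" if stop else "transfer"

def check_persistency(data, valid, stop) -> bool:
    """True if every retry is followed by a valid cycle with the same data."""
    for t in range(len(data) - 1):
        if valid[t] and stop[t]:
            if not (valid[t + 1] and data[t + 1] == data[t]):
                return False
    return True

# Trace from the slide, reversed so that index 0 is the earliest cycle
# (assumption: the slide shows the earliest cycle on the right).
data = "* D D * C C C B * A".split()[::-1]
valid = [int(x) for x in "0 1 1 0 1 1 1 1 0 1".split()][::-1]
stop = [int(x) for x in "0 0 1 0 0 1 1 0 0 0".split()][::-1]

kinds = [classify(v, s) for v, s in zip(valid, stop)]
transferred = [d for d, k in zip(data, kinds) if k == "transfer"]

print(kinds)
print(transferred)
print(check_persistency(data, valid, stop))
```

Read in this order, each retried item (C, then D) is held until its transfer completes, as persistency requires.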
Elastic Half Buffer
[Figure: EHB controller with signals V(i-1), S(i-1), V(i), S(i) and En(i), the enable of the data latch]
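Join
[Figure: join controller (+) merging two input channels (V1, S1) and (V2, S2), each coming from an EHB, into one output channel (V, S) feeding an EHB]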
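A join has to consume its two input tokens together. The slide gives only the figure, so the equations below are an assumption (one common way to write a two-input join), not a transcription of the authors' circuit. The loop checks exhaustively that each input transfers exactly when the output transfers.

```python
from itertools import product

def join(v1: bool, v2: bool, s: bool):
    """Return (v, s1, s2) for a two-input join.

    Assumed equations: the output is valid only when both inputs are,
    and an input is stopped while the output is stopped or the other
    input has not arrived yet.
    """
    v = v1 and v2
    s1 = s or not v2
    s2 = s or not v1
    return v, s1, s2

# Exhaustive check over all input combinations: an input channel
# transfers (Valid and not Stop) exactly when the output transfers.
for v1, v2, s in product([False, True], repeat=3):
    v, s1, s2 = join(v1, v2, s)
    out_transfer = v and not s
    assert (v1 and not s1) == out_transfer
    assert (v2 and not s2) == out_transfer

print(join(True, False, False))  # input 1 waits for input 2
```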
Lazy Fork
[Figure: lazy fork controller splitting one input channel (V, S) into two output channels (V1, S1) and (V2, S2)]
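Eager Fork
[Figure: eager fork controller splitting one input channel (V, S) into two output channels (V1, S1) and (V2, S2), with one state bit per output branch]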
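The difference between the two forks can be sketched behaviourally. The equations and the `pending` state bits below are assumptions chosen to match the usual description (lazy: both branches take the token in the same cycle; eager: each branch takes it as soon as it is ready). They are not copied from the slide.

```python
def lazy_fork(v: bool, s1: bool, s2: bool):
    """Return (v1, v2, s): both branches receive the token in the same cycle."""
    v1 = v and not s2   # offer to branch 1 only if branch 2 is ready
    v2 = v and not s1   # offer to branch 2 only if branch 1 is ready
    s = s1 or s2        # stop the input until both branches are ready
    return v1, v2, s

class EagerFork:
    """Each branch may take the token as soon as it is ready."""

    def __init__(self):
        self.pending = [True, True]  # branch still needs the current token

    def step(self, v: bool, s1: bool, s2: bool):
        """Return (v1, v2, s) for this cycle and update the state.

        When v is False the returned s is a don't-care (idle cycle).
        """
        stops = [s1, s2]
        valids = [v and p for p in self.pending]
        taken = [vi and not si for vi, si in zip(valids, stops)]
        done = [t or not p for t, p in zip(taken, self.pending)]
        s = not all(done)            # input held until every branch is served
        if v:
            # All branches served: rearm for the next token; else remember.
            self.pending = [True, True] if not s else [not d for d in done]
        return valids[0], valids[1], s

fork = EagerFork()
print(fork.step(True, False, True))   # branch 1 takes the token, branch 2 stalls
print(fork.step(True, False, True))   # branch 1 is not offered the token again
print(fork.step(True, False, False))  # branch 2 takes it, input is released
```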
Elastic combinational paths
[Figure: elastic buffers (EB) connected through Fork, Join, Join / Fork and Wire controllers; the control produces the enable signal to the data latches]
Elastic buffer: formal model
[Figure: unbounded buffer with cells …, i, i+1, …, i+k; read pointer rd and write pointer wr; input channel (Din, Vin, Sin) and output channel (Dout, Vout, Sout)]
Buffer [0..∞]
Initial state: rd = wr = 0
Invariant: wr ≥ rd
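The abstract model can be mimicked with an unbounded list and two pointers. This is a sketch, not the authors' formal specification: the class and method names are hypothetical, and the invariant is taken to be wr ≥ rd (the write pointer never falls behind the read pointer).

```python
class ElasticBufferModel:
    """Unbounded FIFO with read and write pointers rd and wr."""

    def __init__(self):
        self.buf = []  # Buffer[0..]: grows without bound
        self.rd = 0    # next position to read
        self.wr = 0    # next position to write

    def write(self, item):
        """Input transfer: store the item at position wr."""
        self.buf.append(item)
        self.wr += 1
        assert self.wr >= self.rd  # invariant

    def can_read(self) -> bool:
        """The buffer holds at least one unread item."""
        return self.rd < self.wr

    def read(self):
        """Output transfer: deliver the item at position rd."""
        assert self.can_read(), "reading would break wr >= rd"
        item = self.buf[self.rd]
        self.rd += 1
        return item

m = ElasticBufferModel()
m.write("a")
m.write("b")
print(m.read(), m.read(), m.can_read())
```

Items leave in the order they were written, which is the in-order property that an implementation must refine.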
Liveness properties (finite unbounded latencies)
• Finite forward latency: G (rd ≠ wr ⇒ F Vout)
• Finite backward latency: G (¬Sout ⇒ F ¬Sin)
Formal verification
[Figure: the abstract buffer model (rd and wr pointers, channels Din/Vin/Sin and Dout/Vout/Sout) compared against the Implementation with the same channels]
Formal verification
The abstract FSM model is appropriate for compositional verification
Verification of implementations with model checking (1-bit abstractions of the datapath)
– LTL specs + NuSMV
– Buffer is a refinement of the spec
– In-order data transmission
– Correct synchronization of fork/join structures
– Absence of deadlocks
Observational equivalence
Synchronous:
D: a b c d e f g h i j k …
Elastic:
D: a a b b b c d e e f g g h i i i j k …
En: 1 0 1 0 0 1 1 1 0 1 1 0 1 1 0 0 1 1 …
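The equivalence can be checked on the two traces above: sampling the elastic stream only in cycles where En = 1 should give back the synchronous stream. The helper name `observe` is an assumption for this sketch, and only the finite prefixes shown on the slide are compared.

```python
def observe(data, enable):
    """Keep only the values sampled in cycles where En = 1."""
    return [d for d, en in zip(data, enable) if en]

sync_d = "a b c d e f g h i j k".split()
elastic_d = "a a b b b c d e e f g g h i i i j k".split()
elastic_en = [int(x) for x in "1 0 1 0 0 1 1 1 0 1 1 0 1 1 0 0 1 1".split()]

print(observe(elastic_d, elastic_en) == sync_d)
```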
Elasticization
[Figure: a synchronous pipeline (PC and the IF/ID, ID/EX, EX/MEM, MEM/WB registers, all driven by CLK) next to its elastic version, in which JOIN and FORK controllers connected by Valid (V) / Stop (S) channels sit alongside the registers; animation frames show Valid and Stop values in the control]
Elastic control layer
Generation of gated clocks
Variable-latency units
[Figure: unit with a latency of [0 - k] cycles, V/S channels on input and output, and a go/done handshake with the control]
Telescopic units:
– 1 cycle for fast operations
– 2 cycles for slow operations
Examples:
– Short / long additions (carry propagation)
– A × 0, A / 1
– Dynamic changes in latency (fast if cold, slow if hot)
Microarchitectural exploration
Bubble insertion + variable-latency units
– May improve performance: more bubbles, but a shorter cycle time
– Reduce power: units designed for the most frequent input data
Exploration at fine granularity
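The trade-off between extra cycles and a shorter cycle time is simple arithmetic. The sketch below uses made-up numbers purely for illustration: the cycle times, the fraction of slow operations, and the function name `time_per_operation` are all hypothetical.

```python
def time_per_operation(cycle_time_ns: float, p_slow: float,
                       fast_cycles: int = 1, slow_cycles: int = 2) -> float:
    """Average time per operation for a telescopic unit."""
    avg_cycles = (1 - p_slow) * fast_cycles + p_slow * slow_cycles
    return cycle_time_ns * avg_cycles

# Hypothetical numbers, for illustration only.
# Fixed-latency design: the clock must cover the slowest operation.
fixed = time_per_operation(1.0, p_slow=0.0)
# Telescopic design: shorter clock, but 10% of operations take 2 cycles.
telescopic = time_per_operation(0.8, p_slow=0.1)

print(fixed, telescopic)
```

With these assumed numbers the telescopic unit is faster on average even though it sometimes takes an extra cycle. The conclusion reverses if slow operations are frequent enough.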
Some related work
Asynchronous design
– Micropipelines (Sutherland)
– Rings (Williams, Sparso)
– CHP and slack-elasticity (Martin, Burns, Manohar et al.)
Latency insensitive design
– Carloni and a few follow-ups (large overhead)
– Wire pipelining: Svensson, Nookala, Casu, …
Interlock pipelines (H. Jacobson et al.)
De-synchronization
– J. Cortadella et al.
– V. Varshavsky
Synchronous implementations of CSP
– J. O’Leary et al.
– A. Peeters et al.
Summary
SELF: a specific protocol and implementation for elastic systems with very small buffering overhead
Compositional theory proving correctness (Krstic et al., FMCAD’06)
A library of controllers has been designed and their correctness verified
Elasticization CAD in progress
New micro-architectural opportunities based on bubbles and variable-latency units