
1 Clockless Logic: Dynamic Logic Pipelines (contd.) Drawbacks of Williams’ PS0 Pipelines Lookahead Pipelines


Page 1:

Clockless Logic: Dynamic Logic Pipelines (contd.)

  • Drawbacks of Williams’ PS0 Pipelines
  • Lookahead Pipelines

Page 2:

Drawbacks of PS0 Pipelining

1. Poor throughput:
  • long cycle time: 6 events per cycle
  • data “tokens” are forced far apart in time

2. Limited storage capacity:
  • at most 50% of stages can hold distinct tokens
  • data tokens must be separated by at least one spacer

Our Research Goals:
  • address both issues
  • still maintain very low latency
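The 50% storage limit follows from the spacer rule. A minimal Python sketch, assuming a hypothetical encoding in which each stage holds either a data token (`True`) or a spacer (`False`); the function name is illustrative and does not come from the slides:

```python
from typing import Sequence


def is_legal_ps0_occupancy(stages: Sequence[bool]) -> bool:
    """Return True if no two adjacent stages both hold data tokens.

    Assumption: True marks a data token, False marks a spacer.
    """
    return not any(a and b for a, b in zip(stages, stages[1:]))


# A 10-stage pipeline with alternating token/spacer: 5 tokens, i.e. 50%.
alternating = [True, False] * 5
# Squeezing in a sixth token forces two tokens into adjacent stages.
too_full = [True, True] + [True, False] * 4
```

Here `is_legal_ps0_occupancy(alternating)` holds while `is_legal_ps0_occupancy(too_full)` does not, which is the capacity limit described above.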

Page 3:

Recent Approaches

3 novel styles for high-speed async pipelining:
  • MOUSETRAP Pipelines [Singh/Nowick, TAU-00, ICCD-01]
  • “Lookahead Pipelines” (LP) [Singh/Nowick, Async-00]
  • “High-Capacity Pipelines” (HC) [Singh/Nowick, WVLSI-00]

Goal: significantly improve throughput of PS0

Two Distinct Strategies:
  • LP: introduce protocol optimizations
      – “shave off” components from critical cycle
  • HC: fundamentally new protocol
      – greater concurrency: “loosely-coupled” stages

Page 4:

Outline

New Asynchronous Pipelines:
  • MOUSETRAP Pipelines
  • Lookahead Pipelines (LP)
  • High-Capacity Pipelines (HC)

Dynamic circuit style

Static circuit style

Page 5:

Lookahead Pipelines: Strategy #1

Use non-neighbor communication:
  • stage receives information from multiple later stages
  • allows “early evaluation”

Benefit: stage gets head-start on next cycle

Page 6:

Lookahead Pipelines: Strategy #2

Use early completion detection:
  • completion detector moved before stage (not after)
  • stage indicates “early done” in parallel with computation

Benefit: again, stage gets head-start on next cycle

[Figure: stage with early completion detector]

Page 7:

Lookahead Pipelines: Overview

5 New Designs:

“Dual-Rail” Data Signaling:
  • LP3/1: “early evaluation”
  • LP2/2: “early done”
  • LP2/1: “early evaluation” + “early done”

“Single-Rail” Bundled-Data Signaling:
  • LPSR2/2: “early done”
  • LPSR2/1: “early evaluation” + “early done”

Page 8:

Dual-Rail Design #1: LP3/1

Optimization = “early evaluation”
  • each stage has two control inputs: from stages N+1 and N+2

Idea: shorten precharge phase
  • terminate precharge early: when N+2 is done evaluating

[Figure: pipeline stages N, N+1, N+2 between Data in and Data out; labels: Processing Block, Completion Detector, PC, Eval (From N+2)]

Page 9:

LP3/1 Protocol

  • PRECHARGE N: when N+1 completes evaluation
  • EVALUATE N: when N+2 completes evaluation (New! Enables “early evaluation”)

[Figure: event sequence across stages N, N+1, N+2 with numbered events 1 to 4; labels: N evaluates, N+1 evaluates, N+2 evaluates, N+1 indicates “done”, N+2 indicates “done”]

Page 10:

LP3/1: Comparison with PS0

PS0: 6 events in cycle
  • PRECHARGE N: when N+1 completes evaluation
  • EVALUATE N: when N+1 completes precharging

LP3/1: only 4 events in cycle!
  • PRECHARGE N: when N+1 completes evaluation
  • EVALUATE N: when N+2 completes evaluation (enables “early evaluation”)

[Figure: side-by-side event sequences for PS0 and LP3/1 across stages N, N+1, N+2, with numbered “evaluates” and “indicates ‘done’” events]

Page 11:

LP3/1 Performance

Cycle Time = $3\,T_{EVAL} + T_{DETECT}$

Savings over PS0: 1 Precharge + 1 Completion Detection

[Figure: LP3/1 cycle with numbered events 1 to 4; the “saved path” is marked]
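Reading the stated savings backwards gives the PS0 cycle time that this slide implies. A short worked equation, taking the LP3/1 cycle time as three evaluations plus one completion detection, and assuming $T_{PRECH}$ (a symbol not used on the slide) denotes the precharge delay of one stage:

```latex
\begin{align*}
T_{LP3/1} &= 3\,T_{EVAL} + T_{DETECT} \\
T_{PS0}   &= T_{LP3/1} + T_{PRECH} + T_{DETECT}
           = 3\,T_{EVAL} + 2\,T_{DETECT} + T_{PRECH}
\end{align*}
```

The six terms of $T_{PS0}$ line up with the 6 events per cycle quoted for PS0, and the four terms of $T_{LP3/1}$ with the 4 events quoted for LP3/1.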

Page 12:

LP3/1: Inside a Stage

Merging 2 Control Inputs:

A NAND gate merges 2 control inputs:
  • PC (From Stage N+1)
  • Eval (From Stage N+2)

Resulting behavior:
  • Precharge when PC=1 (and Eval=0)
  • Evaluate “early” when Eval=1 (or PC=0)

Problem: “early” Eval=1 is non-persistent!
  • may be de-asserted before stage completes evaluation!

[Figure: NAND gate merging PC and Eval; labels “early Eval” and “old Eval”]
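The merged control can be written out as a truth table. A minimal Python sketch, assuming a polarity the slide does not spell out: Eval is inverted before the NAND, and a low NAND output means "precharge". The function names are illustrative:

```python
def nand(a: bool, b: bool) -> bool:
    """Two-input NAND gate."""
    return not (a and b)


def stage_phase(pc: bool, eval_: bool) -> str:
    """Phase of stage N given PC (from N+1) and Eval (from N+2).

    Assumption: Eval enters the NAND inverted, and a low NAND output
    selects precharge. Only the resulting behavior is taken from the slide.
    """
    control = nand(pc, not eval_)
    return "evaluate" if control else "precharge"


if __name__ == "__main__":
    for pc in (False, True):
        for ev in (False, True):
            print(f"PC={int(pc)} Eval={int(ev)} -> {stage_phase(pc, ev)}")
```

Under this polarity the stage precharges only for PC=1, Eval=0 and evaluates in the other three cases, which matches the two conditions listed above.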

Page 13:

LP3/1 Timing Constraints: Example

Problem (cont.): “early” Eval=1 non-persistent

Observation: PC=0 soon after Eval=1, and is persistent

Solution: no change! Use PC as safe “takeover” for Eval!

Timing Constraint: PC=0 must arrive before Eval de-asserted
  • simple one-sided timing requirement
  • other constraints as well… all easily satisfied in practice

[Figure: NAND gate merging PC (From Stage N+1) and Eval (From Stage N+2)]
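The one-sided constraint can be checked on a toy timeline. A minimal Python sketch, assuming hypothetical event times (in arbitrary units) for when Eval rises, when Eval is de-asserted, and when PC falls; none of these names or numbers come from the slides:

```python
from typing import Iterable


def stays_in_evaluate(t_eval_rise: float, t_eval_fall: float,
                      t_pc_fall: float, sample_times: Iterable[float]) -> bool:
    """Check that the stage never drops back to precharge after Eval rises.

    The stage evaluates while Eval=1 or PC=0, so it is safe as long as
    PC=0 has arrived by the time the early Eval=1 is de-asserted.
    """
    for t in sample_times:
        if t < t_eval_rise:
            continue  # before the early Eval arrives
        eval_high = t < t_eval_fall
        pc_high = t < t_pc_fall
        if not (eval_high or not pc_high):
            return False  # neither Eval=1 nor PC=0: stage would precharge
    return True
```

For example, `stays_in_evaluate(0, 5, 3, range(10))` holds because PC falls at time 3, before Eval is de-asserted at time 5, whereas `stays_in_evaluate(0, 5, 7, range(10))` fails because PC is still high when Eval goes away.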

Page 14:

Dual-Rail Design #2: LP2/2

Optimization = “early done”

Idea: move completion detector before processing block
  • stage indicates when “about to” precharge/evaluate

[Figure: stage with an “early” Completion Detector ahead of the Processing Block, between Data in and Data out; output “early done”]

Page 15:

LP2/2 Completion Detector

Modified completion detectors needed:
  • Done=1 when stage starts evaluating, and inputs valid
  • Done=0 when stage starts precharging

[Figure: asymmetric C-element (C) producing Done; inputs are PC and per-bit OR gates for bit0, bit1, …, bitn, with the OR inputs marked “+”]
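The Done behavior above can be modeled as a small state-holding element. A minimal Python sketch, assuming the control input is presented as a boolean `evaluating` (the exact PC polarity is not given on the slide) and that each dual-rail bit is a hypothetical `(true_rail, false_rail)` pair whose validity is the OR of its rails:

```python
from typing import Sequence, Tuple

DualRailBit = Tuple[bool, bool]  # (true_rail, false_rail); assumed encoding


class AsymmetricCElement:
    """Behavioral model: data validity matters only for the rising edge."""

    def __init__(self) -> None:
        self.done = False

    def update(self, evaluating: bool, bits: Sequence[DualRailBit]) -> bool:
        inputs_valid = all(t or f for t, f in bits)  # per-bit OR gates
        if evaluating and inputs_valid:
            self.done = True    # Done=1: stage evaluating and inputs valid
        elif not evaluating:
            self.done = False   # Done=0: stage starts precharging
        # Otherwise hold the previous value.
        return self.done
```

The asymmetry is in the `elif`: Done falls as soon as the stage starts precharging, regardless of the data inputs, and it is held if the data inputs become invalid while the stage is still evaluating.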

Page 16:

LP2/2 Protocol

Completion Detection:
  • performed in parallel with evaluation/precharge of stage

[Figure: event sequence across stages N, N+1, N+2 with numbered events 1 to 4; labels: N evaluates, N+1 evaluates, “early done” of N+1 eval, “early done” of N+2 eval, “early done” of N+1 prech]

Page 17:

LP2/2 Performance

Cycle Time = $2\,T_{EVAL} + 2\,T_{DETECT}$

LP2/2 savings over PS0: 1 Evaluation + 1 Precharge

[Figure: LP2/2 cycle with numbered events 1 to 4]

Page 18:

Dual-Rail Design #3: LP2/1

Hybrid of LP3/1 and LP2/2. Combines:
  • early evaluation of LP3/1
  • early done of LP2/2

Cycle Time = $2\,T_{EVAL} + T_{DETECT}$
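The three dual-rail cycle times can be compared side by side. A minimal Python sketch, assuming hypothetical per-stage delays; the PS0 expression is not stated on the slides and is inferred here from the quoted savings (LP3/1 saves one precharge and one completion detection, LP2/2 saves one evaluation and one precharge):

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StageDelays:
    t_eval: float    # evaluation delay of one stage
    t_prech: float   # precharge delay of one stage (symbol assumed)
    t_detect: float  # completion-detector delay


def cycle_times(d: StageDelays) -> Dict[str, float]:
    """Cycle time of each dual-rail style for the given delays."""
    return {
        # Inferred from the stated savings, not quoted from the slides.
        "PS0": 3 * d.t_eval + 2 * d.t_detect + d.t_prech,
        "LP3/1": 3 * d.t_eval + d.t_detect,
        "LP2/2": 2 * d.t_eval + 2 * d.t_detect,
        "LP2/1": 2 * d.t_eval + d.t_detect,
    }


if __name__ == "__main__":
    unit = StageDelays(t_eval=1.0, t_prech=1.0, t_detect=1.0)
    for name, t in cycle_times(unit).items():
        print(f"{name}: {t}")
```

With all delays set to one unit this gives 6, 4, 4 and 3 units respectively; the first two coincide with the 6-event and 4-event cycles quoted for PS0 and LP3/1.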

Page 19:

Lookahead Pipelines: Overview

5 New Designs:

“Dual-Rail” Data Signaling:
  • LP3/1: “early evaluation”
  • LP2/2: “early done”
  • LP2/1: “early evaluation” + “early done”

“Single-Rail” Bundled-Data Signaling:
  • LPSR2/2: “early done”
  • LPSR2/1: “early evaluation” + “early done”

Page 20:

Single-Rail Design: LPSR2/1

Derivative of LP2/1, adapted to single-rail:
  • bundled-data: matched delays instead of completion detectors

“Ack” to previous stages is “tapped off early”:
  • once in evaluate (precharge), dynamic logic insensitive to input changes

[Figure: pipeline stages, each with a matched delay element]

Page 21:

Inside an LPSR2/1 Stage

PC and Eval are combined exactly as in LP3/1

“done” generated by an asymmetric C-element:
  • done=1 when stage evaluates, and data inputs valid
  • done=0 when stage precharges

[Figure: stage with data in and data out, “req” in and “req” out through a matched delay, a NAND gate merging PC (From Stage N+1) and Eval (From Stage N+2), and an asymmetric C-element (aC, “+” input) producing done and “ack”]

Page 22:

LPSR2/1 Protocol

Cycle Time = $2\,T_{EVAL} + T_{aC}$

$T_{aC}$ = delay through asymmetric C-element

[Figure: event sequence across stages N, N+1, N+2 with numbered events 1 to 3; labels: N evaluates, N+1 evaluates, N+2 evaluates, N+1 indicates “done”, N+2 indicates “done”]

Page 23:

Results

Designed/simulated FIFO’s for each pipeline style

Experimental Setup:
  • design: 4-bit wide, 10-stage FIFO
  • technology: 0.6 µm HP CMOS
  • operating conditions: 3.3 V and 300 K

Page 24:

Comparison with Williams’ PS0

Design     Signaling     Throughput (Mega items/sec)   Improvement (%)
PS0        dual-rail     420                           -
LP3/1      dual-rail     590                           40%
LP2/2      dual-rail     760                           79%
LP2/1      dual-rail     860                           102%
LPSR2/1    single-rail   1208                          188%

  • LP2/1: >2X faster than Williams’ PS0
  • LPSR2/1: 1.2 Giga items/sec

Page 25:

Comparison: LPSR2/1 vs. Molnar FIFO’s

LPSR2/1 FIFO: 1.2 Giga items/sec
  • Adding logic processing to FIFO: simply fold logic into dynamic gate, little overhead

Comparison with Molnar FIFO’s:
  • asp* FIFO: 1.1 Giga items/sec
      – more complex timing assumptions, not easily formalized
      – requires explicit latches, separate from logic!
      – adding logic processing between stages: significant overhead
  • micropipeline: 1.7 Giga items/sec
      – two parallel FIFO’s, each only 0.85 Giga/sec
      – very expensive transition latches
      – cannot add logic processing to FIFO!

Page 26:

Practicality of Gate-Level Pipelining

When datapath is wide:
  • Can often split into narrow “streams”
  • Use “localized” completion detector for each stream:
      – need to examine only a few bits: small fan-in
      – send “done” to only a few gates: small fan-out
  • comp. det. fairly low cost!

[Figure: datapath width = 32 dual-rail bits; per-stream completion detector (comp. det.) with fan-in = 2; “done” with fan-out = 2]
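Splitting a wide datapath into streams with localized detectors can be sketched as follows. A minimal Python sketch, assuming each dual-rail bit is a hypothetical `(true_rail, false_rail)` pair that counts as valid when either rail is high, and using the figure's 32-bit width and fan-in of 2; the function names are illustrative:

```python
from typing import List, Sequence, Tuple

DualRailBit = Tuple[bool, bool]  # (true_rail, false_rail); assumed encoding


def split_into_streams(bits: Sequence[DualRailBit],
                       stream_width: int) -> List[Sequence[DualRailBit]]:
    """Cut the datapath into consecutive narrow streams."""
    return [bits[i:i + stream_width]
            for i in range(0, len(bits), stream_width)]


def stream_done(stream: Sequence[DualRailBit]) -> bool:
    """Localized completion detector: examines only this stream's bits."""
    return all(t or f for t, f in stream)


# 32 dual-rail bits, all carrying valid data, split into 2-bit streams.
datapath = [(True, False)] * 32
streams = split_into_streams(datapath, stream_width=2)
done_signals = [stream_done(s) for s in streams]
```

Each detector looks at only two bits, so its fan-in stays at 2 no matter how wide the full datapath is; a spacer on one bit holds back only that bit's stream.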

Page 27:

Conclusions

Introduced several new dynamic pipelines:
  • Use two novel protocols:
      – “early evaluation”
      – “early done”
  • Especially suitable for fine-grain (gate-level) pipelining
  • Very high throughputs obtained:
      – dual-rail: >2X improvement over Williams’ PS0
      – single-rail: 1.2 Giga items/second in 0.6 µm CMOS
  • Use easy-to-satisfy, one-sided timing constraints
  • Robustly handle arbitrary-speed environments
      – overcome a major shortcoming of Williams’ PS0 pipelines

Recent Improvement: Even faster single-rail pipeline (WVLSI’00)