
Practical Design and Performance Evaluation of Completion Detection Circuits


Page 1: Practical Design and Performance Evaluation of Completion Detection Circuits

Practical Design and Performance Evaluation of Completion Detection Circuits

Fu-Chiung Cheng

Department of Computer Science

Columbia University

Reading 4

Page 2: Practical Design and Performance Evaluation of Completion Detection Circuits

Outline

• Motivation

• Previous Work

• New Completion Detection Circuit

• Performance Evaluation

• Conclusion

Page 3: Practical Design and Performance Evaluation of Completion Detection Circuits

Motivation

• Circuits: synchronous or asynchronous.

• Synchronization:

  Sync: a global clock

  Async: start and completion mechanisms

Page 4: Practical Design and Performance Evaluation of Completion Detection Circuits

Motivation

• Potential advantages of async. design:

  • No clock skew problem

  • Low power consumption

  • Average-case performance

  • Modularity, composability and reusability

  • Easier technology migration

• The promise of high performance is especially attractive.

Page 5: Practical Design and Performance Evaluation of Completion Detection Circuits

Motivation

• High performance async. design:

1. fast self-timed components with good average-case performance

2. fast completion detection circuits that detect the completion.

[Figure: a self-timed component with dual-rail inputs A and B and dual-rail outputs S. Each output pair (S_i^0, S_i^1) is ORed to form Ack_i, and a C-element combines Ack_0 ... Ack_{n-1} into DoneReset.]


Page 7: Practical Design and Performance Evaluation of Completion Detection Circuits

Motivation

• Fast self-timed components:

1. Delay-insensitive carry-lookahead adders:
   Time complexity: $O(\log \log n)$
   Logic complexity: $O(n)$

2. Delay-insensitive comparators:
   Time complexity: $O(1)$
   Logic complexity: $O(n)$

Page 8: Practical Design and Performance Evaluation of Completion Detection Circuits

Motivation

• Fast completion detection circuits:

1. Completion detection circuits (CDCs) are considered the major overhead.

2. This paper addresses the design of fast completion detection circuits.

Page 9: Practical Design and Performance Evaluation of Completion Detection Circuits

Previous Work:

• Self-timed components may use

1. bundled data protocol

2. dual-rail signaling

Page 10: Practical Design and Performance Evaluation of Completion Detection Circuits

Previous Work:

• CDCs for bundled data components

1. Delay elements (an inverter chain): delay > worst-case delay.

2. Speculative completion [Nowick97]: performance depends on
   A. the number of matched delays and
   B. the associated abort detection network

3. Current-Sensing Completion-Detection [Dean94, Grass96]:
   A. consumes substantial power
   B. requires several gate delays

Page 11: Practical Design and Performance Evaluation of Completion Detection Circuits

Previous Work:

• CDCs for dual-rail self-timed components

1. General model:
   A. n two-input ORs
   B. 1 n-input C-element

2. Operations:
   A. computation cycle: DoneReset=1
   B. reset cycle: DoneReset=0

[Figure: general model. The dual-rail outputs (S_i^0, S_i^1) of a self-timed component with inputs A and B feed n two-input OR gates producing Ack_0 ... Ack_{n-1}; an n-input C-element combines them into DoneReset.]
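The general model above can be sketched behaviorally. This is a minimal illustration, not the circuit from the slides: the function names `acks`, `c_element`, and `done_reset` are hypothetical, and timing is ignored so each list entry is just one snapshot of the dual-rail outputs.

```python
from typing import List, Tuple

def acks(rails: List[Tuple[int, int]]) -> List[int]:
    """Ack_i = S_i^0 OR S_i^1 for each dual-rail output pair."""
    return [s0 | s1 for s0, s1 in rails]

def c_element(inputs: List[int], prev: int) -> int:
    """n-input C-element: 1 if all inputs are 1, 0 if all are 0, else hold."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev

def done_reset(rails: List[Tuple[int, int]], prev: int) -> int:
    """DoneReset for one snapshot of the dual-rail outputs."""
    return c_element(acks(rails), prev)

# Two-bit example: spacer -> partially valid -> valid -> partially reset -> spacer.
spacer = [(0, 0), (0, 0)]
partial = [(0, 1), (0, 0)]
valid = [(0, 1), (1, 0)]

state = 0
trace = []
for snapshot in (spacer, partial, valid, partial, spacer):
    state = done_reset(snapshot, state)
    trace.append(state)
print(trace)
```

DoneReset rises only once every bit is valid and falls only once every bit has returned to the spacer, which is the behavior the two cycles on the slide describe.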

Page 12: Practical Design and Performance Evaluation of Completion Detection Circuits

Previous Work:

• N-input C-element: a tree of 2-input C-elements

1. long delay

2. large variance

[Figure: a tree of 2-input C-elements combining Ack_0, Ack_1, ..., Ack_{n-2}, Ack_{n-1} into a single output.]

Page 13: Practical Design and Performance Evaluation of Completion Detection Circuits

Previous Work:

• N-input C-element:

1. More efficient implementation: $DoneReset = done + reset \cdot DoneReset$
   A. done circuit: an n-input AND, $done = Ack_0 \cdot Ack_1 \cdots Ack_{n-1}$
   B. reset circuit: an n-input OR, $reset = Ack_0 + Ack_1 + \cdots + Ack_{n-1}$
   C. a 2-input C-element

2. Delay and variance: better than the tree of 2-input C-elements

[Figure: an n-input AND (done) and an n-input OR (reset), both fed by Ack_0 ... Ack_{n-1}, driving a 2-input C-element whose output is DoneReset.]
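As a sanity check on the equation DoneReset = done + reset · DoneReset, the sketch below compares it exhaustively against the defining behavior of an n-input C-element for small n. The function names are hypothetical and the model is purely logical (no delays).

```python
from itertools import product

def c_element(inputs, prev):
    """Reference n-input C-element: set on all ones, clear on all zeros, else hold."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev

def done_reset_next(acks, prev):
    """Next DoneReset from the done/reset decomposition."""
    done = int(all(acks))    # n-input AND
    reset = int(any(acks))   # n-input OR
    return done | (reset & prev)

def equivalent(n):
    """True if both models agree for every Ack pattern and previous state."""
    return all(
        done_reset_next(a, p) == c_element(a, p)
        for a in product((0, 1), repeat=n)
        for p in (0, 1)
    )

print(all(equivalent(n) for n in range(1, 6)))
```

The two models agree because done = 1 forces the output high, reset = 0 forces it low, and the remaining case (done = 0, reset = 1) leaves the previous value unchanged.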

Page 14: Practical Design and Performance Evaluation of Completion Detection Circuits

Previous Work:

• Wuu's CDCs [Wuu93]:

A. done circuit: a tree of NAND

B. reset circuit: a tree of NOR

C. long delay

D. small variance

E. use static gates

$done = Ack_0 \cdot Ack_1 \cdots Ack_{n-1}$

$reset = Ack_0 + Ack_1 + \cdots + Ack_{n-1}$

$DoneReset = done + (reset \cdot DoneReset) = \overline{\overline{done + (reset \cdot DoneReset)}} = \overline{\overline{done} \cdot \overline{(reset \cdot DoneReset)}}$

Page 15: Practical Design and Performance Evaluation of Completion Detection Circuits

Previous Work:

• Yun's CDCs [Yun97]:

A. done circuit: a tree of domino logic

B. no reset circuit

C. variant delay

D. large variance

E. use dynamic CMOS

[Figure: 8-bit completion detection domino logic. Each precharged (prech) stage computes $\prod_{i}(S_i^0 + S_i^1)$ over eight bits (i = 0..7, 8..15, 16..23, 24..31), with inputs $S_0^0, S_0^1, \ldots, S_7^0, S_7^1$ shown for the first stage; the stage outputs are combined to produce done.]

Page 16: Practical Design and Performance Evaluation of Completion Detection Circuits

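Our Design

• Computation completion detection circuit

  $\overline{Ack_i} = \overline{S_i^0 + S_i^1}$ (static 2-input NOR)

  $done = Ack_0 \cdot Ack_1 \cdots Ack_{n-1} = \overline{\overline{Ack_0} + \overline{Ack_1} + \cdots + \overline{Ack_{n-1}}}$ (dynamic n-input NOR)

[Figure: done circuit. One PMOS pull-up transistor controlled by Ack_i and n parallel NMOS pull-down transistors controlled by Ack_0 ... Ack_{n-1}; each Ack_i is produced from S_i^0 and S_i^1 by a static 2-input NOR.]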

Page 17: Practical Design and Performance Evaluation of Completion Detection Circuits

Our Design

• Reset completion detection circuit

  $Ack_i = S_i^0 + S_i^1$

  $reset = Ack_0 + Ack_1 + \cdots + Ack_{n-1} = (S_0^0 + S_0^1) + \cdots + (S_{n-1}^0 + S_{n-1}^1)$ (dynamic 2n-input OR)

[Figure: reset circuit. A dynamic 2n-input OR with inputs S_0^0, S_0^1, ..., S_{n-1}^0, S_{n-1}^1 and output reset.]

Page 18: Practical Design and Performance Evaluation of Completion Detection Circuits

Our Design

• Computation cycle: either $S_i^0$ or $S_i^1$ will eventually be turned on.

For the done signal,

1. the PMOS transistor (Ack_i) will be closed (conducting) and

2. all NMOS transistors will be open (non-conducting).

3. Thus, the done signal will be turned on.

[Figure: done circuit.]

Page 19: Practical Design and Performance Evaluation of Completion Detection Circuits

Our Design

• Computation cycle: either $S_i^0$ or $S_i^1$ will eventually be turned on.

For the reset signal, the reset signal is turned on as soon as any Ack_i signal goes high.

[Figure: reset circuit.]

Page 20: Practical Design and Performance Evaluation of Completion Detection Circuits

Our Design

• Reset cycle: both $S_i^0$ and $S_i^1$ will eventually be turned off.

For the done signal, the done signal is turned off as soon as any Ack_i signal is turned off.

[Figure: done circuit.]

Page 21: Practical Design and Performance Evaluation of Completion Detection Circuits

Our Design

• Reset cycle: both $S_i^0$ and $S_i^1$ will eventually be turned off.

For the reset signal, the reset signal is turned off only after all Ack_i signals are turned off.

[Figure: reset circuit.]
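The computation-cycle and reset-cycle behavior of the done and reset signals can be traced with a small logical sketch. It assumes done is the AND and reset is the OR of the Ack signals, as on the earlier slides; the helper names and the chosen arrival orders are hypothetical, and no delays are modeled.

```python
def done_and_reset(acks):
    """Return (done, reset) for the current Ack values."""
    done = int(all(acks))    # high only when every Ack is high
    reset = int(any(acks))   # high as soon as any Ack is high
    return done, reset

def trace_cycle(n, rise_order, fall_order):
    """Raise the Acks one at a time (computation), then lower them (reset)."""
    acks = [0] * n
    trace = [done_and_reset(acks)]
    for i in rise_order:
        acks[i] = 1
        trace.append(done_and_reset(acks))
    for i in fall_order:
        acks[i] = 0
        trace.append(done_and_reset(acks))
    return trace

for step in trace_cycle(3, rise_order=[2, 0, 1], fall_order=[1, 2, 0]):
    print(step)
```

In the trace, reset goes high with the first Ack and done with the last; during the reset cycle, done drops with the first Ack that falls and reset with the last.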

Page 22: Practical Design and Performance Evaluation of Completion Detection Circuits

Our Design

• done + reset circuits = dual-rail multi-input C-element

• done + reset circuits + 2-input C-element = single-rail multi-input C-element

• Implementation of 2-input C-element:

[Figure: transistor-level 2-input C-element with inputs done and reset, output DoneReset, and a stage labeled "Weak".]

Page 23: Practical Design and Performance Evaluation of Completion Detection Circuits

DIRCA With CDC: part 1

Page 24: Practical Design and Performance Evaluation of Completion Detection Circuits

DIRCA With CDC: part 2

Page 25: Practical Design and Performance Evaluation of Completion Detection Circuits

Our Design

[Figure: done circuit.]

• The PMOS in the pull-up circuit of the done circuit saves power in non-operation mode.

• In a quiescent state, all Ack_i signals are zero, so all pull-down transistors are closed (conducting).

• To save power, the pull-up transistor is open, cutting off the path from Vdd to ground.

Page 26: Practical Design and Performance Evaluation of Completion Detection Circuits

Our Design

[Figure: done circuit.]

• If the input low arrives too early, power is wasted.

• If the input low arrives too late, it takes longer to turn on the done signal.

• Low power consumption: use the latest Ack_i signal.

• High performance: use any Ack_i signal that is not the latest.

Page 27: Practical Design and Performance Evaluation of Completion Detection Circuits

SPICE Output: done circuit

ChengDone0:
1. Ack0 is the latest signal.
2. input pulses: 3 and 4
3. buffered input: 1004
4. Ack0: 100
5. Done: 24680
6. DoneReset: 200

Delay=0.55ns

Page 28: Practical Design and Performance Evaluation of Completion Detection Circuits

SPICE Output: done circuit

ChengDone1:
1. Ack1 is the latest signal.
2. input pulses: 5 and 6
3. buffered input: 1006
4. Ack1: 101
5. Done: 24680
6. DoneReset: 200

Delay=0.22ns

Page 29: Practical Design and Performance Evaluation of Completion Detection Circuits

SPICE Output: done circuit

ChengDone37:
1. All Ack arrive at the same time
2. Done: 24680
3. DoneReset: 200

Delay=0.64ns

Page 30: Practical Design and Performance Evaluation of Completion Detection Circuits

SPICE Output: reset circuit

Delay=1.23ns

ChengReset0:
1. Ack0 is the latest signal.
2. input pulse: 3 and 4
3. buffered input: 1004
5. Reset: 13579
6. DoneReset: 200

Page 31: Practical Design and Performance Evaluation of Completion Detection Circuits

SPICE Output: reset circuit

Delay=0.87ns

ChengReset1:
1. Ack0 is the latest signal.
2. input pulse: 3 and 4
3. buffered input: 1004
5. Reset: 13579
6. DoneReset: 200

Page 32: Practical Design and Performance Evaluation of Completion Detection Circuits

SPICE Output: reset circuit

Delay=1.34ns

ChengReset37:
1. All Ack reset at the same time
2. Done: 24680
3. DoneReset: 200

Page 33: Practical Design and Performance Evaluation of Completion Detection Circuits

Our Design

[Figure: done circuit.]

• Constraint: when conducting, $R_{pull\text{-}up} \ge 5\,R_{pull\text{-}down}$ when only one pull-down transistor is conducting.

• This can be achieved by properly sizing transistors.
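To see why a ratio between pull-up and pull-down resistance matters, the sketch below treats the conducting transistors as ideal linear resistors forming a voltage divider. That model, the function name `low_level`, and the example values (Vdd = 5 V, a ratio of exactly 5) are assumptions made for illustration only; they are not taken from the slides or from the SPICE runs.

```python
def low_level(vdd, r_pull_up, r_pull_down, conducting=1):
    """Output voltage when the pull-up fights `conducting` parallel pull-downs.

    Assumes ideal linear resistors: V = Vdd * Rdown_eff / (Rup + Rdown_eff).
    """
    if conducting < 1:
        raise ValueError("at least one pull-down must be conducting")
    r_down_eff = r_pull_down / conducting  # parallel identical pull-downs
    return vdd * r_down_eff / (r_pull_up + r_down_eff)

# Worst case: only one pull-down conducting, pull-up five times more resistive.
worst = low_level(vdd=5.0, r_pull_up=5.0, r_pull_down=1.0, conducting=1)
# More conducting pull-downs pull the output lower still.
easier = low_level(vdd=5.0, r_pull_up=5.0, r_pull_down=1.0, conducting=4)
print(worst, easier)
```

Under this simplified model the single-pull-down case gives the highest "low" output, which is why the constraint is stated for exactly one conducting pull-down transistor.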

Page 34: Practical Design and Performance Evaluation of Completion Detection Circuits

Logic Complexity (# of transistors)

         done circuit               done+reset circuit
         n-bit   32-bit   64-bit    n-bit   32-bit   64-bit
Wuu      10n-4   316      636       14n-8   440      888
Yun      4n-5    123      251       N/A
Cheng    5n+1    161      321       7n+5    229      453
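The 32-bit and 64-bit entries follow directly from the n-bit formulas in the table. The short sketch below recomputes them; the dictionary layout and the function name `transistor_count` are only illustrative.

```python
# Transistor-count formulas as listed in the Logic Complexity table.
FORMULAS = {
    "Wuu": {"done": lambda n: 10 * n - 4, "done+reset": lambda n: 14 * n - 8},
    "Yun": {"done": lambda n: 4 * n - 5},  # no reset circuit (N/A in the table)
    "Cheng": {"done": lambda n: 5 * n + 1, "done+reset": lambda n: 7 * n + 5},
}

def transistor_count(design, circuit, n):
    """Number of transistors for an n-bit completion detection circuit."""
    return FORMULAS[design][circuit](n)

for design, circuits in FORMULAS.items():
    for circuit in circuits:
        counts = [transistor_count(design, circuit, n) for n in (32, 64)]
        print(design, circuit, counts)
```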

Page 35: Practical Design and Performance Evaluation of Completion Detection Circuits

Performance Evaluation

• SPICE simulation:
  1. uses MOSIS 2 micron CMOS level 2 parameters
  2. W=3u, L=2u (buffer: 0.4 ns; 2-input NOR: 0.18 ns)

• Computation-completion detection circuits: 38 typical cases (for Wuu, Yun and Cheng). The delay measured includes the delay of the OR gate for Ack_i.

• Reset-completion detection circuits: 38 typical cases (Wuu and Cheng)

Page 36: Practical Design and Performance Evaluation of Completion Detection Circuits

Performance Evaluation

Computation Completion Detection (32-bit)

Case     done (ns)                Speed up
         Wuu     Yun     Cheng    C vs W   C vs Y
Min      2.18    1.46    0.22     4.1      2.8
Max      2.65    3.36    0.64     10.4     14.3
Avg      2.27    2.53    0.28     9.2      10.2

Page 37: Practical Design and Performance Evaluation of Completion Detection Circuits
Page 38: Practical Design and Performance Evaluation of Completion Detection Circuits
Page 39: Practical Design and Performance Evaluation of Completion Detection Circuits

Performance Evaluation

Reset Completion Detection (32-bit)

Case     reset (ns)       Speed up
         Wuu     Cheng    C vs W
Min      2.40    0.87
Max      2.89    1.34
Avg      2.85    0.71     4.0

Page 40: Practical Design and Performance Evaluation of Completion Detection Circuits

Conclusions

• A new completion detection circuit for dual-rail self-timed components:

1. very fast computation-completion detection

2. very fast reset-completion detection

• A low-overhead, very fast completion detection circuit is crucial for high performance self-timed circuits.

Page 41: Practical Design and Performance Evaluation of Completion Detection Circuits

Conclusions

• SPICE simulation results:

1. our computation-completion detection circuit is 9 times faster than Wuu's and Yun's

2. our reset-completion detection circuit is 4 times faster than Wuu's.