Prof. Dejan Markovi ee216a@gmail - UCLAicslwebs.ee.ucla.edu/dejan/classwiki/images/7/7c/Lec-05_Logical... · Example: a 2 ‐input NAND gate

1

EE M216A .:. Fall 2010Lecture 5

Logical Effort

Prof. Dejan Marković[email protected]

Logical Effort Recap

Normalized delay– d = g·h + p– g is the logical effort of the gate

g C /C● g = CIN/CINV

● Inverter is sized such that RINV equals the gate’s drive strength

– h is the electrical effort● h = COUT/CIN

● The combination of g·h is essentially our fan‐out definition

– p is the parasitic effort● p = CSELF/CINV

D. Markovic / Slide 2

p SELF/ INV

● Typically ~1 for an inverter

May have different g, p for each type of gate structure– The g, p may differ per input, and for pull‐up/down

EEM216A .:. Fall 2010 Lecture 5: Logical Effort | 2

2

A Side Note: Total Gate Effort (gTOT)

One way of evaluating a gate is to look at the total logical effort of the gate– gTOT = CIN_TOT/CINV

● A sum of the total input capacitance● A sum of the total input capacitance

– Example:● 3‐input NAND● Equivalent inverter is 4:2● gINPUT = 10/6=5/3● gTOT = 30/6 = 5 (or 3·gINPUT)

WP:WN = 4:6


The total gate effort is not very useful to calculate delay– But it is an indication of the “cost” of the gate– Can be very useful in gate mapping of logic synthesis

● To find which gate is best to use to map a given Boolean expression


A Note on Asymmetry

The gates used in examples so far have been symmetric– All inputs are essentially identical

● Some are bundled

P:N ratio is approximately equal to the mobility ratio– P:N ratio is approximately equal to the mobility ratio● Same as the reference inverter

The truth is that inputs of most gates are not symmetric– Different inputs may see different capacitances– Even series stacked gates may not be the same size

Pull‐up and pull‐down is rarely equal resistanceC ll thi “ k d” t


– Call this “skewed” gates– P:N sizing for optimal delay is β = root(µ)– Inverter in a domino gate often favors the PMOS to improve

speed


3

Asymmetric Examples

PMOS NOR: different sizingEquivalent inv is still 6:2– CinA = 10, CinB = 26

(10/8 5/4) t l t

For more complex gates:Equivalent inv is now 9:3− gEDC=21/12=7/4,

g 30/12 5/2– gA (10/8=5/4) not equal to gB (26/8=13/4)● B input is much worse…

(for a reason)

Bd

e

a’

b’cde

2412

12

18

18

999

gA’B’=30/12=5/2


A

NOR

Outputb’a’

c

d

8

2 2

12

12

12 12

Same β = µ = 3

The larger g is BAD! (More effort to do logic)


What is the Reference Inverter?

The assumption of logical effort is that the reference inverter has equal rise and fall delays– β =µ

The implication for different pull‐up and pull‐down resistance is that the rising and falling delays are not equal– Similar to the parasitic delay calculation earlier

● dUP = gUP·h + pUP

● dDN = gDN·h + pDN


Since static CMOS gates are inverting, the transitions through subsequent gates must be alternating– Use an average logical effort (and parasitic effort) – a good idea

● gAVG = (gUP + gDN)/2


4

Skewed P:N Ratio Gates

Assume: β = 1.7, µ = 3Example: Inverter– Pull‐up: reference inverter is sized P:N of WP:WP/µ

W (1 1/1 7)/W (1 1/ ) 1 19● gUP = WP(1+1/1.7)/WP(1+1/µ) = 1.19

– Pull‐down: reference inverter is sized P:N of µWN:WN

● gDN = WN(1+1.7)/WN(1+ µ) = 0.675

– gAVG = 0.93 (instead of 1)Example: 2‐input NAND Gate– gUP = WP(1+2/1.7)/WP(1+1/µ) = 1.63

W (2 1 7)/W (1 ) 0 925


– gDN = WN(2+1.7)/WN(1+ µ) = 0.925– Average = 1.28 (instead of 5/4)

Skewing rise/fall transition in gates may improve delay(because reference gate wasn’t optimized for min tp)


Velocity Saturation

Velocity saturation complicates a little– Need to account for the change in resistance

A f i t i W 2WAssume reference inverter is WP=2WN

Example: a 2‐input NAND gate with sizing WP=2, WN=2– Assuming RN_nostack=(4/3)RN_stack, RP_nostack=(6/5)RP_stack

– The equivalent inverter with same drive resistance● Pull‐up: equiv inv = 2:1, gUP=4/3 (same as before)


● Pull‐down: equiv inv = 8/3:4/3, gDN=4/4=1

This makes sense because velocity saturation allows the transition through the series stacking be faster (more current)


5

Choosing a Reference Inverter

The reference inverter can actually be any βLogical effort can be used for “optimally” sizing a logic chain– Part of optimal sizing is to add inverters to increase N

(the number of logic stages)(the number of logic stages)– It is inconvenient to add inverters that do not have a LE of 1

A common practice is to choose the reference × inverter β to be the one that is used. Use a normalization to adjust the g.– Calculate gINV(β =µ) assuming the reference × inverter gAVG = 1– Example:

● g = 1 with β = 1 7 (µ = 3)


● gREF = 1 with β = 1.7 (µ = 3)● gAVG_INV(β=µ) = 1.16 for β = 3 (µ = 3) ● A 2‐input NAND β = 1.7 has g = 1.37● A 2‐input NAND β = 3 has g = 1.45 = 1.25·1.16

– g of 2‐input NAND with reference inverter of β = µ is 1.25


Generalization for Multistage Networks

Same concept applies at the path level

Stage effort: fi = gi· hi

Path electrical effort: Hpath = Cout /Cin

Path logical effort: Gpath = g1g2…gN

Branching effort: B th = b1b2 bN I hi

D. Markovic / Slide 10EEM216A .:. Fall 2010 Lecture 5: Logical Effort | 10

Branching effort: Bpath b1b2…bN

Path effort = Gpath· Hpath· Bpath

Path delay D = Σdi = Σpi + Σgi· hi

Ignore thisfor now

6

Total Effort

A path can have a total effort

– G = Path Logical Effort ∏=

=N

iiPATH gG

1

– H = Path Electrical Effort

– F = Path Effort● Consider path effort as fanout. Total fanout

should be a product of all fanouts.

Treat the path as a single “gate”

∏=

=

=

=

N

iiPATH

IN

OUTPATH

i

fF

CCH

1

1


– Treat the path as a single “gate”


Example: Total Effort

Calculations

– G = (4/3)2(5/3) = 3∏

=

=

OUT

N

iiPATH

CH

gG1

– H = 60/4 = 15

– F = 45; F ?= GH (yes for this case)

g=4/3p =2

WP:WN=2:2f = 4

∏=

=

=

N

iiPATH

IN

OUTPATH

fF

CH

1

WP:WN=24:6


g=4/3

p =2

p =2

g=5/3

p =2

Cout=60

WP:WN=6:6

12

3f1 = 4

f2 = 3.33

f3 = 3.33

7

Example: Random Multi‐stage Networks

Delay of a multi‐stage network = sum of stage delays

– Path Effort Delay ∑=i

iF fD

– Path Parasitic Delay

– Total Path Delay

∑=i

ipP

∑ +==i

Fi PDdD

f1 = 4

f 3 33g = 4/3W :W = 2:2


f2 = 3.33

f3 = 3.33

DF = 10.66, P = 6

D = 16.66 (τ delays)g = 4/3p = 2

g /3p = 2

g = 5/3p = 2

Cout = 60

WP:WN = 2:2

WP:WN = 6:6

12

3

WP:WN = 24:6

Add Branching Effort

Ratio of total to on‐path capacitance– How much more current is needed to supply on‐path

(given that some current “flows” off‐path)

Con-path

Con-path + Coff-pathb =


cload

8

Branching Example #1

5

15

90

g =h =F =f

190/5 = 1818 (wrong!)(15+15)/5 6

Introduce new kind of effort to account for branching:

– Branching Effort:

15 90f1 =f2 =F =

(15+15)/5 = 690/15 = 636, not 18!

Con‐path + Coff‐path

Cb =


– Path Branching Effort:


Con‐path

Π biB =

Now we can compute path effort: H = ∏g·h·b

Multistage Networks with Branching

General logical effort formulation

Stage effort: fi = gi· hi

Path electrical effort: Hpath = Cout /Cin

Path logical effort: Gpath = g1g2…gN

Branching effort: B th = b1b2 bN


Branching effort: Bpath b1b2…bN

Path effort = Gpath· Hpath· Bpath

Path delay D = Σdi = Σpi + Σgi · bi · hi

Branching

9

Branching Example #2

g = 4/3p = 2

Cout = 60Cin = 4

WP:WN = 2:2 WP:WN = 24:6

WP:WN = 6:6 WP:WN = 24:6

12

3

f1 = 8

f2 = 6.66

f1 = 4

f2 = 3.33

branching w/o bran.

For circuits with branching:– GPATH is the same = 3– HPATH is the same = 15– FPATH differs

g = 4/3p = 2

g = 5/3p = 2

Cout 60

WP:WN = 6:6

2f3 = 3.33 f3 = 3.33


FPATH differs● h1 = 24/4= 6, h2 = 60/12=5, h3=2● FPATH = 4/3 · 6 · 4/3 · 5 · 5/3 · 2 = 177.8

F is no longer GH– New FPATH with branching: FPATH = G·B·H


Interconnects

Capacitance from wires are the most difficult to deal withFixed load so intuitively, we would increase fanoutShort wires – small parasitic capacitance– Treat them as increasing p for each gateTreat them as increasing p for each gate

● Not exact but accounts for the effectLong wires – large capacitance (dominates gate loading)– Size of driving gate is as if driving a large C (inverter chain)– Size of receiving gate does not impact branching

● It is typically big and limited by area, power, and functionMedium wires – most difficult. Cap similar to gate loading


– Brute force is to write delay as function of gate and wire capacitances ● Nth order polynomial and differentiate

– More realistic method is to iterate


10

Handling Wires & Fixed Loads

iCL

Cw

i


Summary

Delay and/or power of a logic network depend significantly on the relative sizes of logic gates (not transistors within a gate)Inverter buffering is a simple example of the analysis– The analysis leads to ~FO4 as being optimal fanout for drivingThe analysis leads to FO4 as being optimal fanout for driving

larger capacitive loadsTo generalize analysis of delay, we introduce logical effort– Delay normalized by inverter delay, d = g·h + p– g and p are characteristics of a logic gate that depends on its

structure and does not depend on gate size.● May have different g’s and p’s for different inputs and PU / PD


● Simplify by using gAVG and ignoring C’s of intermediate nodes– Once a table of g’s and p’s are created for the catalog of gates,

delay can be calculated quickly and easilyNext we will look at how to size a network (instead of just analyzing it)


11

Optimum Effort per Stage

When each stage bears the same effort:

Fanout of each stage:

Complex gates should drive smaller load!!!


Minimum path delay

Gate Sizing Example (1/2)

From: David Harris

First, compute path effort:

Hpath


The optimal stage effort is:

path


12

Gate Sizing Example (2/2)

We can now size the gates:

8.1345.1201 =⋅=z 5.14

45.135

=⋅=yx


7.1245.13

4=⋅=

zy 1045.1

1 =⋅=xCin

The total normalized delay is (assuming pinv = 1):


Sizing Example with Branching

Size the gates for optimum delayOnce the path effort is determined, it is quite easy to determine the appropriate gate sizes

Start from the output of a pathiiout gC

C _=– Start from the output of a path– Work backwards to the input

● Check your work if the input is the same as the specification● Assuming each unit W has capacitance of unit C

optiin f

C _ =

g = 4/3Gate 2 (dup) Gate 3 (dup)

B = b1b2=4

F = 177.8, fopt = 5.62


g = 4/3p = 2

g 4/3p = 2

g = 5/3p = 2

Cout = 601

23Cin = 4

Cin3 = 60*5/3*1/5.6 = 17.9

Cin2 = b2*17.9*4/3*1/5.6 = 8.5

Cin1 = b1*8.5*4/3*1/5.6 = 4

WP3 = 4/5*17.9 = 14.3

WP2 = ½*8.5 = 4.25

13

Branches: Same # Stages, Different Loads

Example: 2 paths with the same # of stages but different loads– Optimal system has all paths with equal delay– Branching for path 1, b = 1 + α

A ti i th t ~● Assumption is that p1 ~ p2

Cout1

2

2222

1

1111

2211

1 2

in

out

in

out

NN

CCGBF

CCGBF

pFNpFN

DD

===

+=+

=Cin1

Path 1


Cout2

111

222

1

2

out

out

in

in

CGBCGB

CC

== αCin2

Path Effort Estimate

With equal‐stage branches, the Path Effort can be estimatedwithout knowing each stage’s, b– Htot = (Cout1+Cout2)/Cin (with B = 1)

Simple example for Path 1:– G = gNAND

– H1 = Cout1/Cin

– B = (1+Cout2/Cout1) ● Since F1 = F2 and G1 = G2, B = (1+Cout2/Cout1)

– F = G·H1·B = gNAND(Cout1+Cout2)/Cin

Cout1

Cout2

Cin1

Cin2

Path 1

Cin


– Same as F = G·Htot

The errors occur with different g and p for the gates in the two different paths


14

Unequal‐Length Branches (Not Straightforward)

If both branches are long, then N1=N2 is an o.k. assumptionOtherwise, fopt differs between two paths

Solving precisely, example:C = H C

2dinv

1C1

– 2dinv − p = Cout1/C1

– (dinv − p)2 = Cout2/C2

– dnand = gnand(C1 + C2)/Cin

– D = dnand + 2dinv = gnand(Cout1/(2dinv − p) + Cout2 /(dinv − p)2) + 2dinv

– Take partial derivative of ∂D/∂(dinv)

Cout2

= H2·Cin

Cin = 1

Cout1 = H1·Cin

dinv dinvdnand 2

C2


Not easy so ignore p to simplify– If gNAND = 4/3, H1 = H2 = 3: dnand= 2.35, dinv = 1.80– The per stage f = g·h (no p) is different per stage

Most branches are relatively long or not critical path


Re‐convergent Paths

Very similar to unequal branching, another logic structure that adds complexity to Logical Effort is recombined branches.– Typically constrains the sizing problem

Example: ignoring parasitics. Let x = Cin = 1– fa = (y+z)/x– fb = (3z/y)– fc = gNANDCout/z– 2 variables

Cout

Cinx

yz

fa

fb

fc2x NANDz


– Constrain with fa = fb = fc, 2 vars equations (directly solve)● Any constraints are possible

– Increase variables by introducing more buffering (parallel to y)


15

Logic Opt. Example: 2‐Stage; 8‐Input AND

Assume symmetric 8‐input AND (β = µ =2, 3Wo is unit C)– gNAND=10/3, pNAND=8 for an 8‐input NAND– gINV = 1, pINV=1– Cout=100, Cin=1Cout 100, Cin 1

Logical effort– G = 10/3, B = 1, H = 100– F = 333.3– For 2 stages, fopt=18.3

● h2 = 18.3, Cin2 = 5.5, WP = 11Wo● h1 = 5.5, Cin1 = 1, WP = 0.6Wo

D l 36 6 9 45 6

x8

Cout

G1 G2


● Delay = 36.6 + 9 = 45.6– Buffering

● For 3 stages, fopt = 7 (1 extra inverter), Delay = 31● For 4 stages, fopt = 4.3 (2 extra inverters), Delay = 28.2● For 5 stages, fopt = 3.19 (closer to optimal of 3.6), Delay=28● For 6 stages, fopt = 2.63 (below optimal of ~3.6), Delay=28.8


Logic Opt. Example: Multi‐2‐Input Gate 8‐AND

Many ways to implement this same function

Use a tree of fewer input AND gates– (A0A1)(A2A3)…– If multiple ANDs (as in a memory decoder), then partial

results can be shared


16

2‐input Implementation

Assuming that 3W has capacitance of unit C

out

Cout = 100

G7G6G5G4G3G2

Cin = 1

Assuming that 3Wo has capacitance of unit C– F = B·G·H = 4/33 ·100 = 235– fopt = 2.48 (too small)– D = 6·2.48 + 3·2 + 3·1 = 14.88 + 9 = 23.88

● Still better than 8‐input NANDOptimal sizing– G7: Cin7 = 100ginv/f = 40, WP=80W0


– G6: Cin6 = Cin7gnand/f = 21.5, WP=32W0

– G5: Cin5 = Cin6ginv/f = 8.7, WP = 17.4W0

– G4: Cin4 = Cin5gnand/f = 4.7, WP = 7W0

– G3: Cin3 = Cin4ginv/f = 1.88, WP = 3.8W0

– Double check G2: Cin = Cin3gnand/f = 1, WP=1.5W0


4‐input Implementation

out

Cout = 100

G5G4G3G2

Cin = 1

Assuming that unit for C = 3Wo

– F = B·G·H = 4/3·6/3·100 = 266– fopt = 4.04 (a tad high)– D = 4·4.04 + 1·4 + 1·2 + 2·1 = 16 .:. 16 + 8 = 24.16

● Slightly worse than 2‐input! Due to self‐loadingOptimal sizing


p g– G5: Cin5 = 100ginv/f = 24.75, WP=50W0

– G4: Cin4 = Cin5gnand/f = 8.16, WP=12.5W0

– G3: Cin3 = Cin4ginv/f = 2.02, WP=4W0

– Double check G2: Cin=Cin3gnand4/f = 1, WP=1.0W0


17

Example: 2‐NOR Implementation

i h i f

out

Cout = 100

G5G4G3

G2

Cin = 1

Assuming that unit for C = 3Wo

– F= BGH = 4/32*5/3*100 = 296– fopt = 4.14 (a tad high)– D = 4·4.14 + 3·2 + 1·1 = 16.56 + 7 = 23.6 (Best one!)

Optimal sizing– G5: Cin5= 100ginv/f = 24.1, WP=48W0


5 5 0

– G4: Cin4 = Cin5gnand/f = 7.76, WP=11.6W0

– G3: Cin3 = Cin4gnor/f = 3.125, WP=7.5W0

– Double check G2: Cin = Cin3gnand/f = 1, WP = 1.5W0

Lesson: use close to optimal f, use gates with small p


Logical Effort “Design Flow”

Compute the path effort: Path Effort = F = ∏ g· h· b

Find the best number of stages: N* ~ log4(PathEffort)

Compute the stage effort: se* = (PathEffort)1/N

Working from either end, determine gate sizes:


Reference: Sutherland, Sproull, Harris, “Logical Effort,” (Morgan-Kaufmann 1999)

18

Summary of LE Design Methodology

1. Draw network2. Buffer non‐critical paths with minimum‐sized gates

• Minimize loading on critical path• Simplifies sizing of non‐critical pathSimplifies sizing of non critical path

3. Estimate total effort along each path (without branching)4. Verify that the number of stages is appropriate

• Add inverters if fopt > 55. Assign branch ratio of each branch

• Estimate based on the ratio of the Effort of the paths• Ignore paths that have little effect (i.e. min‐sized)


• Include wire capacitances6. Compute delays for the design (include parasitic delay)

• Adjust branching ratios (especially with wire capacitance)• Repeat if necessary until delay meets specification

7. Re‐optimize logic network if fopt is small (Return to step 3)EEM216A .:. Fall 2010 Lecture 5: Logical Effort | 35

Documents

Prof. Dejan Markovi ee216a@gmail - UCLAicslwebs.ee.ucla.edu/dejan/classwiki/images/7/7c/Lec-05_Logical... · Example: a 2 ‐input NAND gate