37
inst.eecs.berkeley.edu/~ee241b Borivoje Nikolić EE241B : Advanced Digital Circuits Lecture 18 – Power-Performance Tradeoffs 2 1 EECS241B L18 POWER-PERFORMANCE II MarketWatch, March 28: Opinion: There’s no returning to regular schooling as online learning goes mainstream, by Alex Hicks When in-person education resumes, online learning tools and methods will be entrenched in the system

EE241B : Advanced Digital Circuitsinst.eecs.berkeley.edu/~ee241b/sp20/Lectures/... · • Optimal logic depth was found to be 8-11 FO4 delays in superscalar processors • 1.8-3 FO4

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

  • inst.eecs.berkeley.edu/~ee241b

    Borivoje Nikolić

    EE241B : Advanced Digital Circuits

    Lecture 18 – Power-Performance Tradeoffs 2

    1EECS241B L18 POWER-PERFORMANCE II

    MarketWatch, March 28: Opinion: There’s no returning to regular

    schooling as online learning goes mainstream, by Alex Hicks

    When in-person education resumes, online learning tools and methods will be entrenched in the system

  • Announcements

    • Project midterm reports due today, March 31• Please e-mail me the link to your web page

    • Assignment 3 due Thursday, April 2.• Quiz next Tuesday

    • Reading – req’d• Rabaey et al, LPDE, Ch. 4

    2EECS241B L18 POWER-PERFORMANCE II

  • Outline

    • Module 5• Power-performance tradeoffs

    3EECS241B L18 POWER-PERFORMANCE II

  • 5.B Power-Performance Tradeoffs

    4EECS241B L18 POWER-PERFORMANCE II

  • Know Your Enemy

    • Where does power go in CMOS?• Switching (dynamic) power

    • Charging capacitors• Leakage power

    • Transistors are imperfect switches• Short-circuit power

    • Both pull-up and pull-down on during transition• Static currents

    • Biasing currents

    EECS241B L18 POWER-PERFORMANCE II 5

  • Summary of Power Dissipation Sources

    • – switching activity• CL – load capacitance• CCS – short-circuit “capacitance”• Vswing – voltage swing• f – frequency

    DDLeakDCDDswingCSL VIIfVVCCP ~

    IDC – static current Ileak – leakage current

    powerstaticrateoperationenergyP

    EECS241B L18 POWER-PERFORMANCE II 6

  • CMOS Performance Optimization• Reminder - sizing: Optimal performance with equal fanout per stage

    • Extendable to general logic cone through ‘logical effort’• Equal effective fanouts (giCi+1/Ci) per stage• Optimal fanout is around 4

    CL

    CL

    predecoder

    3 15

    CW

    word driver

    addrinput

    wordline

    [Ref: I. Sutherland, Morgan-Kaufman‘98]EECS241B L18 POWER-PERFORMANCE II 7

  • Performance Optimization

    Energy

    Increasing performanceincreases power!

    Delay =1/Performance

    EECS241B L18 POWER-PERFORMANCE II 8

  • Performance OptimizationEnergy

    Delay = 1/Performance

    Mircoarchitecture A

    Mircoarchitecture B

    EECS241B L18 POWER-PERFORMANCE II 9

  • The Design Abstraction Stack

    Logic/RT

    (Micro-)Architecture

    Software

    Circuit

    Device

    System/Application

    A very rich set of design parameters to consider!It helps to consider options in relation to their abstraction layer

    sizing, supplies, thresholds

    logic family, standard cell versus custom, clocking

    Parallel versus pipelined, general purpose versus application specific

    Bulk, PDSOI, FDSOI, finFET

    Choice of algorithm

    Amount of concurrency

    EECS241B L18 POWER-PERFORMANCE II 10

  • Achieve the highest performance under the power cap

    Delay

    Unoptimized design

    DmaxDmin

    Energy/op

    Emin

    Emax

    Power-Performance Optimization

    EECS241B L18 POWER-PERFORMANCE II 11

  • Achieve the highest performance under the power cap

    Delay

    Unoptimized design

    Var1

    Energy/op

    Designoptimizationcurves

    Emax

    DmaxDmin

    Emin

    Power-Performance Optimization

    EECS241B L18 POWER-PERFORMANCE II 12

  • Achieve the highest performance under the power cap

    Delay

    Unoptimized design

    Var1

    Var2

    Energy/op

    Designoptimizationcurves

    Emax

    DmaxDmin

    Emin

    Power-Performance Optimization

    EECS241B L18 POWER-PERFORMANCE II 13

  • How far away are we from the optimal solution?

    Delay

    Unoptimized design

    Var1

    Var2

    Var1 + Var2

    Energy/op

    Designoptimizationcurves

    Emax

    DmaxDmin

    Emin

    Power-Performance Optimization

    EECS241B L18 POWER-PERFORMANCE II 14

  • Global optimum – best performance

    Delay

    Unoptimized design

    Var1

    Var2

    Var1 + Var2

    Global

    Energy/op

    Designoptimizationcurves

    Emax

    DmaxDmin

    Emin

    Power-Performance Optimization

    EECS241B L18 POWER-PERFORMANCE II 15

  • Maximize throughput for given energy or

    Minimize energy for given throughput

    Delay

    Unoptimized design

    Emax

    DmaxDmin

    Energy/op

    Emin

    Power-Performance Optimization

    EECS241B L18 POWER-PERFORMANCE II 16

  • topology A

    topology BDelay

    Ener

    gy/o

    p

    Power-Performance Optimization

    • There are many sets of parameters to adjust• Tuning variables• Circuit

    (sizing, supply, threshold)

    • Logic style(std. cells, custom , …)

    • Block topology (adder: CLA, CSA, …)

    • Micro-architecture (parallel, pipelined)

    EECS241B L18 POWER-PERFORMANCE II 17

  • Power-Performance Optimization

    • There are many sets of parameters to adjust• Tuning variables• Circuit

    (sizing, supply, threshold)

    • Logic style(std. cells, custom , …)

    • Block topology (adder: CLA, CSA, …)

    • Micro-architecture (parallel, pipelined)

    Globally optimal power-performance curve for a given function

    EECS241B L18 POWER-PERFORMANCE II 18

    topology A

    topology BDelay

    Ener

    gy/o

    p

  • f (Anom,B)

    Delay

    Ener

    gy

    D*

    SA

    SB

    f (A0,B)

    f (A,B0)

    (A0,B0)

    D0

    0AAA AD

    AES

    Energy-Delay Sensitivity

    EECS241B L18 POWER-PERFORMANCE II 19

  • Delay

    Ener

    gy

    f (A0,B)

    f (A,B0)

    (A0,B0)

    f (A1,B)∆D

    ∆E = SA∙(∆D) + SB∙∆D

    D0At the solution point all sensitivities should be equal

    Solution: Equal Sensitivities

    EECS241B L18 POWER-PERFORMANCE II 20

  • 5. C Architectural Optimization

    21EECS241B L18 POWER-PERFORMANCE II

  • Optimal Processors

    • Processors used to be optimized for performance• Optimal logic depth was found to be 8-11 FO4 delays in superscalar processors• 1.8-3 FO4 in sequentials, rest in combinatorial

    • Kunkel, Smith, ISCA’86• Hriskesh, Jouppi, Farkas, Burger, Keckler, Shivakumar, ISCA’02• Harstein, Puzak, ISCA’02• Sprangle, Carmean, ISCA’02

    • But those designs are have very high power dissipation• Need to optimize for both performance and power/energy

    EECS241B L18 POWER-PERFORMANCE II 22

  • From System View: What is the Optimum?

    • How do sensitivities relate to more traditional metrics:• Power per operation (MIPS/W, GOPS/W, TOPS/W)• Energy per operation (Joules per op)• Energy-delay product

    • Can be reformatted as a goal of optimizing power x delayn• n = 0 – minimize power per operation• n = 1 – minimize energy per operation• n = 2 – minimize energy-delay product• n = 3 – minimize energy-(delay)2 product

    EECS241B L18 POWER-PERFORMANCE II 23

  • Optimization Problem

    • Set up optimization problem:• Maximize performance under energy constraints• Minimize energy under performance constraints

    • Or minimize a composite function of EnDm• What are the right n and m?

    • n = 1, m = 1 is EDP – improves at lower VDD• n = 1, m = 2 is invariant to VDD

    • E ~ CVDD2• D ~ 1/VDD

    EECS241B L18 POWER-PERFORMANCE II 24

  • Hardware Intesnity

    • Introduced by Zyuban and Strenski in 2002.• Measures where is the design on the Energy-Delay curve• Parameter in cost function

    optimization

    EECS241B L18 POWER-PERFORMANCE II 25

    Slope of the optimal E-D curve at the chosen design point

  • Optimum Across Hierarchy Layers

    Zyuban et al, TComp’04EECS241B L18 POWER-PERFORMANCE II 26

    Optimal logic depth in pipelined processors is ~18FO4Relatively flat in the 16-22FO4 range

  • 5.D Circuit-Level Tradeoffs

    27EECS241B L18 POWER-PERFORMANCE II

  • iin

    iL

    ThDD

    DDdpi C

    CVV

    VKt,

    ,1

    iin

    iL

    ThDD

    DDdpi W

    WVV

    VKtD,

    ,1

    CL

    In Out

    1 2 N

    Alpha-Power Based Delay Model

    EECS241B L18 POWER-PERFORMANCE II 28

  • ♦ Leakage

    ♦ Switching 20 1 , int,Sw L i i DDE C C V

    DVeIWE DDnV

    VV

    InLkt

    DDTh

    0

    Energy Models

    EECS241B L18 POWER-PERFORMANCE II 29

  • Sizing, Supply, Threshold Optimization

    • Transistor sizing can yield large power savings with small delay penalties• Gate sizing• Beta-ratio adjustments• (Stack resizing)

    • Supply voltage affects both active and leakage energy• Threshold voltage affects primarily the leakage

    EECS241B L18 POWER-PERFORMANCE II 30

  • CL

    In Out1 2 N

    Unconstrained energy: find min D = tpi

    Constrained energy: find min D, under E < EmaxWhere E = ei

    11 jginjginjgin CCC ,,, 11 jjj WWW

    Apply to Sizing of an Inverter Chain

    EECS241B L18 POWER-PERFORMANCE II 31

    tp1 tp2 tpN

  • Constrained Optimization

    • Find min(D) subject to E = Emax• Constrained function minimization

    • E.g. Lagrange multipliers

    • Can solve analytically for x = Wj, VDD, VThEECS241B L18 POWER-PERFORMANCE II 32

    maxExExDx

    0x

    maxDDxEx

    Or dual:

  • Inverter Chain: Sizing Optimization

    EECS241B L18 POWER-PERFORMANCE II 33

  • • Variable taper achieves minimum energy• Reduce number of stages at large dinc

    [Ma, Franzon, IEEE JSSC, 9/94]

    1 2 3 4 5 6 70

    5

    10

    15

    20

    25

    stage

    effe

    ctiv

    e fa

    nout

    , f

    0%

    1%

    10%

    30%

    dinc= 50%nomopt

    1 1

    11j j

    jj

    W WW

    W

    Wnom

    DD

    SKV

    22

    1

    jW

    j j

    eS

    f f

    Stojanovic, ICCAD’02

    Inverter Chain: Sizing Optimization

    EECS241B L18 POWER-PERFORMANCE II 34

    ej – energy per stagefj – fanout per stage

  • • Gate sizing (Wi)

    • Supply voltage (Vdd)

    xv = (VTh+VTh)/Vdd

    Vth Vdd

    Vddnom

    Sens(Vdd)

    0

    for equal feff (Dmin) 1

    swj j

    nom j jj

    EW e

    D f fW

    ~-2E/Dv

    vsw

    DD

    DD

    sw

    xx

    DE

    VD

    VE

    1

    12

    Sensitivity to Sizing and Supply

    EECS241B L18 POWER-PERFORMANCE II 35

  • • Threshold voltage (Vth)

    Low initial leakage

    speedup comes for “free”0 Vth

    Vthnom

    Sens(Vth)

    1

    t

    ThThDDLk

    th

    Th

    nVVVVP

    VD

    VE

    Sensitivity to Vth

    EECS241B L18 POWER-PERFORMANCE II 36

  • Next Lecture

    • Low-power design• Lowering supply voltage

    EECS241B L18 POWER-PERFORMANCE II 37