Low Power Functional Unit for use in Coarse Grained Reconfigurable Array Nathaniel McVicar Corey Olson Jimmy Xu


Outline

Functional Unit
  Shifter
  ALU
  MADD
Design Flow (all modules)
  VCS
  Design Compiler
  PrimeTime
  Encounter & Cadence
  v2lvs
UPF Tutorial
Results
  Dynamic power consumption of modules
  Power down/up timing
  VDD scaling

FU Top Level

Main Units
  ALU
  MADD
  Barrel Shifter
Supporting Modules
  Output muxes
  Clock gating registers
  Crossbar

IBM 65nm PDK

Process: cmos10lpe
  Low power process
  Very low leakage in power analysis
Standard cells: cp65npksdst_tt1p2v25c

Shifter Specs

32-bit shifter with 5 shift bits
Bi-directional shifting
Logical and arithmetic shifting
Purely combinational design
1 GHz target frequency
  Want it as fast as possible
  Need to be power aware during synthesis

Shifter Design

[Figure: shifter datapath. Input words 31'b0, X[31:0], X[30:0], and 31{X[31]} are selected by the LEFT / LOGICAL controls; shift-amount bits S[4] through S[0] drive the shift stages; output Z.]
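As a behavioral reference for the shifter described above, here is a minimal Python sketch of a five-stage logarithmic barrel shifter, where stage i shifts by 2^i when S[i] is set. The function names and the `left` / `arithmetic` flags are illustrative assumptions, and the staging is the textbook arrangement rather than a reconstruction of the authors' exact datapath.

```python
MASK32 = 0xFFFFFFFF

def shift_stage(value, amount, left, fill_ones):
    """One mux stage: shift a 32-bit value by a fixed amount."""
    if left:
        return (value << amount) & MASK32
    # Right shift: vacated upper bits take the fill value.
    fill = (((1 << amount) - 1) << (32 - amount)) if fill_ones else 0
    return (value >> amount) | fill

def barrel_shift(x, s, left, arithmetic):
    """Shift x by the 5-bit amount s using stages of 16, 8, 4, 2, 1."""
    value = x & MASK32
    # Only an arithmetic right shift of a negative value fills with ones.
    fill_ones = arithmetic and not left and bool(value >> 31)
    for i in (4, 3, 2, 1, 0):          # S[4] down to S[0]
        if (s >> i) & 1:
            value = shift_stage(value, 1 << i, left, fill_ones)
    return value
```

For example, `barrel_shift(0x80000000, 4, left=False, arithmetic=True)` replicates the sign bit into the four vacated positions.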

ALU Specs

32-bit ALU
Supports 15 instructions
Combinational design
1 GHz target frequency
  On critical path
  Want it as fast as possible
  Need to be power aware during synthesis

ALU Design Methodologies

Muxed Output
  Simple functions with muxed output
  Gate off functions not in use
  More gates
  Higher leakage, lower switching
Hardware Reuse
  Do everything with the adder
  Cannot gate the adder
  Fewer gates
  Lower leakage, higher switching

ALU Design 1

[Figure: ALU Design 1 block diagram. A single adder (+) takes inputs A and B conditioned by the flipA, flipB, clearA, clearB, and setA controls; P and G signals are derived from the conditioned operands; a Control block with sel[1:0] selects the output Z.]
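The hardware-reuse idea (one adder plus operand conditioning) can be sketched behaviorally as below. This is a minimal illustration under stated assumptions: the `sel` encoding, the `set_b` and `carry_in` controls, the reading of P and G as propagate and generate, and the function names are all hypothetical, not taken from the slides.

```python
MASK32 = 0xFFFFFFFF

def condition(x, clear, set_all, flip):
    """Operand conditioning ahead of the adder: clear, set, then invert."""
    if clear:
        x = 0
    if set_all:
        x = MASK32
    if flip:
        x ^= MASK32
    return x & MASK32

def alu(a, b, sel, flip_a=False, flip_b=False, clear_a=False,
        clear_b=False, set_a=False, set_b=False, carry_in=0):
    """Assumed sel encoding: 0 = adder sum, 1 = P (XOR), 2 = G (AND), 3 = OR."""
    a = condition(a, clear_a, set_a, flip_a)
    b = condition(b, clear_b, set_b, flip_b)
    p = a ^ b                      # propagate doubles as XOR
    g = a & b                      # generate doubles as AND
    total = (a + b + carry_in) & MASK32
    return (total, p, g, p | g)[sel]
```

Subtraction then reuses the adder as `alu(a, b, 0, flip_b=True, carry_in=1)`, which computes a + ~b + 1, and passing B through is `alu(a, b, 0, clear_a=True)`.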

Power Results (synthesis model values in parentheses)

Switching: 630 uW (3.55 uW)
Interconnect: 1.14 mW (3.94 mW)
Leakage: 135 nW (530 nW)
Total: 1.77 mW (7.5 mW)

ALU Design 2

[Figure: ALU Design 2 block diagram. An adder (+) and the other function blocks take inputs A and B through enable (en) signals and a latch; a Control block with sel[1:0] selects the output Z.]

Power Results (synthesis model values in parentheses)

Switching: 655 uW (3.55 uW)
Interconnect: 1.21 mW (3.94 mW)
Leakage: 160 nW (530 nW)
Total: 1.87 mW (7.5 mW)
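To compare the two ALU designs, the non-parenthesized components reported on the two Power Results slides can be summed and set side by side. The short sketch below does only that arithmetic; the helper name `total_mw` is illustrative.

```python
def total_mw(switching_uw, interconnect_mw, leakage_nw):
    """Sum the three reported components, converted to milliwatts."""
    return switching_uw * 1e-3 + interconnect_mw + leakage_nw * 1e-6

design1 = total_mw(630, 1.14, 135)   # ALU Design 1
design2 = total_mw(655, 1.21, 160)   # ALU Design 2
increase = (design2 - design1) / design1

print(f"Design 1: {design1:.2f} mW")
print(f"Design 2: {design2:.2f} mW")
print(f"Design 2 relative to Design 1: {increase:+.1%}")
```

The sums reproduce the reported totals after rounding, and Design 2 comes out roughly 5% higher than Design 1.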

MADD Specs

32-bit multiply-add unit
2-cycle pipelined module
  Add input arrives on second cycle
1 GHz target frequency
Most power hungry module in design
  Need to be power aware during synthesis
  Ideally would run as fast as possible
  May need to trade speed for power (~700 MHz)

MADD Design

[Figure: MADD pipeline. Inputs A and B feed heterogeneous Booth encoding and partial-product (PP) generation, followed by CSA tree stage 1; pipeline registers (D/Q) clocked by CLK; add input C enters CSA tree stage 2; a final adder produces Z.]
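The two-stage structure can be illustrated with a small behavioral model built from carry-save adders (3:2 compressors). This is a sketch under assumptions: partial products are generated by plain shift-and-AND rather than the Booth encoding used in the design, operands are treated as unsigned, and the function names are illustrative.

```python
def csa(a, b, c):
    """3:2 compressor: returns (sum, carry) with sum + carry == a + b + c."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def reduce_rows(rows):
    """Compress a list of addends down to two rows using CSAs."""
    rows = list(rows)
    while len(rows) > 2:
        s, carry = csa(*rows[:3])
        rows = rows[3:] + [s, carry]
    return rows

def madd(a, b, c, width=32):
    """Behavioral a * b + c following the two pipeline stages."""
    # Cycle 1: partial products and first CSA tree; two rows are registered.
    partial_products = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    registered = reduce_rows(partial_products)
    # Cycle 2: the add input C joins the second CSA stage, then the final adder.
    s, carry = reduce_rows(registered + [c])
    return s + carry
```

Because the add input is only needed by the second CSA stage, it can arrive one cycle after the multiplicands, which matches the spec that the add input arrives on the second cycle.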

VCS

Testbenches written to verify functionality using VCS
  Random input vectors used for data
  Instructions/shift encodings tested sequentially
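The testing strategy (random data, encodings stepped through in order) can be sketched in software as follows. This is only an analogy to the Verilog testbenches: `dut`, `reference`, and `toy_reference` are hypothetical stand-ins, not the authors' code.

```python
import random

MASK32 = 0xFFFFFFFF

def run_random_tests(dut, reference, encodings, vectors_per_encoding=100, seed=0):
    """Step through encodings in order, applying random 32-bit data to each."""
    rng = random.Random(seed)        # fixed seed keeps failures reproducible
    failures = []
    for enc in encodings:
        for _ in range(vectors_per_encoding):
            a = rng.getrandbits(32)
            b = rng.getrandbits(32)
            expected = reference(enc, a, b)
            actual = dut(enc, a, b)
            if actual != expected:
                failures.append((enc, a, b, expected, actual))
    return failures

def toy_reference(enc, a, b):
    """Hypothetical two-instruction model: 0 = add, anything else = xor."""
    return (a + b) & MASK32 if enc == 0 else a ^ b
```

An empty list from `run_random_tests(toy_reference, toy_reference, [0, 1])` indicates agreement; any mismatch is recorded with the vector that exposed it.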

Design Compiler

Compile to standard cell library
  cp65npksdst_tt1p2v25c from IBM's cmos10lpe
  Compile to others for corner analysis (ff, 1p0v, …)
  Control target frequency and synthesize for power
Reports created
  Power: inaccurate, but use as a baseline
  Area: reports number of gates in design
  Timing: design can't always meet timing

DC Example

# standard cells that you synthesize to
set target_library <libname>.db
set link_library <libname>.db

# prepare and synthesize
analyze -f verilog <my_verilog_file>.v
elaborate <my_toplevel>
current_design <my_toplevel>
link
uniquify
compile_ultra -gate_clock
compile_ultra -incremental

# check for errors in the synthesized design (timing violations, cell warnings, …)
check_design
report_constraint -all_violators

# write the output file in verilog netlist format
write -f verilog -output <filename>.vh

# output the timing, power, and cell reports (one redirect per report)
redirect timing.rep { report_timing }
redirect power.rep { report_power }
redirect cell.rep { report_cell }

DC Example Output

Operating Conditions: TT1P2V25C   Library: cp65npksdst_tt1p2v25c
Wire Load Model Mode: enclosed

Design        Wire Load Model            Library
------------------------------------------------
Alu           B0.1X0.1                   cp65npksdst_tt1p2v25c

Global Operating Voltage = 1.2
Power-specific unit information :
    Voltage Units = 1V
    Capacitance Units = 1.000000pf
    Time Units = 1ns
    Dynamic Power Units = 1mW    (derived from V,C,T units)
    Leakage Power Units = 1nW

  Cell Internal Power  = 433.2152 uW   (51%)
  Net Switching Power  = 409.2202 uW   (49%)
                         ---------
Total Dynamic Power    = 842.4354 uW  (100%)

Cell Leakage Power     = 129.3405 nW
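The figures in this report can be cross-checked with a few lines of arithmetic. The sketch below only recomputes the total and percentage shares from the values printed above; the variable names are illustrative.

```python
# Values copied from the Design Compiler power report for the Alu design.
internal_uw = 433.2152
switching_uw = 409.2202
leakage_nw = 129.3405

dynamic_uw = internal_uw + switching_uw
internal_share = internal_uw / dynamic_uw
switching_share = switching_uw / dynamic_uw
# Leakage is reported in nW, so convert before comparing with dynamic power.
leakage_ratio = (leakage_nw * 1e-3) / dynamic_uw

print(f"Total dynamic power: {dynamic_uw:.4f} uW")
print(f"Internal share: {internal_share:.0%}, switching share: {switching_share:.0%}")
print(f"Leakage relative to dynamic power: {leakage_ratio:.4%}")
```

The leakage term is several thousand times smaller than the dynamic power, consistent with the process being described as very low leakage.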

PrimeTime

Power analysis
  Reports breakdown of power consumption: internal switching, intermediate nodes switching, leakage
  More detailed breakdown available: memory, clock network, register, combinational
Timing check: redundant at this stage
No functional verification
  Use simulator for functionality (vcs, ncsim)

PT Example

# setup
link_library <libname>.db
read_verilog <netlist>.vh
current_design <my_toplevel>
link

# for a design without an existing clock input
create_clock -name clock -period

# toggle_count is prob of switching, static is prob of being a 1
set_switching_activity -toggle_count 0.25 -static_probability 0.5 <INPUT>

# get the power analysis and write details to Alu.rpt
check_power
update_power
report_power > Alu.rpt

PT Example Output

Attributes
----------
    i  -  Including register clock pin internal power
    u  -  User defined power group

                Internal    Switching   Leakage     Total
Power Group     Power       Power       Power       Power     (      %)  Attrs
-------------------------------------------------------------------------------
io_pad          0.0000      0.0000      0.0000      0.0000    (  0.00%)
memory          0.0000      0.0000      0.0000      0.0000    (  0.00%)
black_box       0.0000      0.0000      0.0000      0.0000    (  0.00%)
clock_network   0.0000      0.0000      0.0000      0.0000    (  0.00%)  i
register        0.0000      0.0000      0.0000      0.0000    (  0.00%)
combinational   9.606e-04   1.053e-03   1.295e-07   2.014e-03 (100.00%)
sequential      0.0000      0.0000      0.0000      0.0000    (  0.00%)

  Net Switching Power  = 1.053e-03   (52.30%)
  Cell Internal Power  = 9.606e-04   (47.70%)
  Cell Leakage Power   = 1.295e-07   ( 0.01%)
                         ---------
Total Power            = 2.014e-03  (100.00%)

Encounter

Features
  Place and route
  Control the power and ground to all cells
  Extract parasitic capacitances
  Stream out gds for use with Cadence


ALU Encounter Example

Encounter

Failures
  Difficult to use
  Impossible to save netlist views
    Still need to use Cadence tools to generate SPICE netlist
  Unable to extract parasitics
    Could still do this with Cadence

Cadence

Features
  Read in a Verilog netlist
  Stream in standard cell layouts and schematics
  Stream in gds from Encounter
  Create SPICE netlist


ShiftLR Cadence Example

Cadence

Failures
  Unable to properly stream in standard cell schematics
  Unable to create netlist from schematic
  Unable to run LVS or extract parasitics
Solution
  v2lvs

v2lvs

Generates a SPICE netlist from a synthesized Verilog netlist
Include SPICE definitions of standard cells
Run HSPICE simulations for power down/up sequence and VDD scaling

v2lvs Example

Verilog:
SEN_EO2_S_0P5 U2120 ( .A1(pprow4[11]), .A2(pprow5[9]), .X(n566) );
SEN_EO2_S_0P5 U2121 ( .A1(pprow4[13]), .A2(pprow5[11]), .X(n567) );
SEN_EO2_S_0P5 U2122 ( .A1(pprow2[13]), .A2(pprow7[3]), .X(n568) );
SEN_EO2_S_0P5 U2123 ( .A1(pprow2[15]), .A2(pprow7[5]), .X(n569) );

v2lvs:
v2lvs -i -v ../synthesis/ShiftLR.vh -s0 VSS -s1 VDD -s design_model.inc -o ShiftLR.sp -lsr cp65npksdst.lvs

HSPICE:
XU2120 n566 pprow4[11] pprow5[9] SEN_EO2_S_0P5
XU2121 n567 pprow4[13] pprow5[11] SEN_EO2_S_0P5
XU2122 n568 pprow2[13] pprow7[3] SEN_EO2_S_0P5
XU2123 n569 pprow2[15] pprow7[5] SEN_EO2_S_0P5
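The mapping from a named-port Verilog instance to a positional SPICE subcircuit call can be illustrated with a toy converter. This is not how v2lvs is implemented; it is a hypothetical sketch that assumes one instance per line and takes the pin order (X, A1, A2) from the example above, whereas the real tool reads it from the cell's SPICE definition.

```python
import re

# cell name, instance name, then the parenthesized port list
INSTANCE = re.compile(r"(\w+)\s+(\w+)\s*\((.*)\)\s*;")
# .PIN(net) named connections
PIN = re.compile(r"\.(\w+)\(([^)]*)\)")

def to_spice(verilog_line, pin_order):
    """Convert one named-port instance into an 'X' subcircuit call."""
    match = INSTANCE.match(verilog_line.strip())
    if match is None:
        raise ValueError(f"not a cell instance: {verilog_line!r}")
    cell, instance, ports = match.groups()
    nets = {pin: net.strip() for pin, net in PIN.findall(ports)}
    ordered = [nets[pin] for pin in pin_order]
    return f"X{instance} {' '.join(ordered)} {cell}"

line = "SEN_EO2_S_0P5 U2120 ( .A1(pprow4[11]), .A2(pprow5[9]), .X(n566) );"
print(to_spice(line, ["X", "A1", "A2"]))
```

The key point is that SPICE subcircuit calls are positional, so the net order must follow the pin order of the standard cell's subcircuit definition.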


HSPICE

Created simulation test-bench for power measurement using vector input

Adds potential VDD scaling and gating


Final Power Results

Synthesis Matters

At 1 GHz, MADD power is very dependent on synthesis options

              Internal   Switching   Leakage    Total
Naïve         11.2 mW    7.16 mW     1.07 uW    18.3 mW
Constrained   7.77 mW    4.56 mW     0.59 uW    12.3 mW
Ultra         4.08 mW    1.88 mW     0.30 uW    5.96 mW
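The size of the effect is easier to see as a reduction relative to the naïve run. The sketch below computes that from the totals in the table; the dictionary and helper names are illustrative.

```python
# Total MADD power at 1 GHz for each synthesis setting, in mW.
totals_mw = {"Naive": 18.3, "Constrained": 12.3, "Ultra": 5.96}

def reduction(baseline_mw, value_mw):
    """Fractional power saving relative to the baseline."""
    return 1 - value_mw / baseline_mw

savings = {name: reduction(totals_mw["Naive"], power)
           for name, power in totals_mw.items()}

for name, saving in savings.items():
    print(f"{name:12s} {totals_mw[name]:6.2f} mW  saving vs naive: {saving:.1%}")
```

The Ultra run uses roughly a third of the naïve run's power, so synthesis settings alone account for about a threefold difference.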

Synthesis Matters contd.

The lower power synthesis options have trouble reducing clock and register power

              Clock    Register   Comb
Naïve         9.95%    13.0%      77.05%
Constrained   12.7%    14.8%      72.5%
Ultra         27.4%    12.9%      58.5%
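Multiplying these shares by the total MADD power reported for each synthesis setting (18.3 mW, 12.3 mW, and 5.96 mW) gives approximate absolute figures per group. This is a rough sketch: the shares are rounded, and the names are illustrative.

```python
# Total power (mW) and the (clock, register, combinational) shares per run.
totals_mw = {"Naive": 18.3, "Constrained": 12.3, "Ultra": 5.96}
shares = {
    "Naive": (0.0995, 0.130, 0.7705),
    "Constrained": (0.127, 0.148, 0.725),
    "Ultra": (0.274, 0.129, 0.585),
}

def absolute_mw(name):
    """Approximate clock, register, and combinational power in mW."""
    return tuple(totals_mw[name] * share for share in shares[name])

for name in totals_mw:
    clock, register, comb = absolute_mw(name)
    print(f"{name:12s} clock {clock:5.2f}  register {register:5.2f}  comb {comb:5.2f} mW")
```

On these numbers the clock network changes least in absolute terms between the naïve and Ultra runs, which is why its share of the total grows as the combinational power falls.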

Power-up time results

W=0.6um M=1


Power-up time results contd.

W=0.6um M=12


Power-up time results contd.

W=6um M=12


Power-up time results contd.

W=6um M=120

Power-up time results contd.

Iavg during power-down = 10.66 uA
Pavg = 12.792 uW
Power-up Delay = 9.4 ps
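The average power figure follows from the average current if the supply is taken to be 1.2 V, the nominal voltage of the tt1p2v standard cell library. That supply value is an assumption on my part, but the arithmetic reproduces the reported number.

```python
VDD = 1.2            # volts, assumed nominal supply
i_avg = 10.66e-6     # amperes, average current during power-down

p_avg = i_avg * VDD  # watts

print(f"Pavg = {p_avg * 1e6:.3f} uW")
```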

Voltage Scaling - ALU

[Figure: ALU power (mW, 0 to 9) versus delay (ps, 500 to 10500), with the 1 GHz point marked.]

Voltage Scaling - ShiftLR

[Figure: ShiftLR power (mW, 0 to 0.6) versus delay (ps, 100 to 100000, logarithmic axis), with the 1 GHz and 500 MHz points marked and supply voltages of 1.2 V, 1.0 V, 0.8 V, and 0.6 V.]
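For intuition about why lowering VDD trades delay for power, the first-order CMOS relation for dynamic power, P ≈ αCV²f, can be evaluated at the supply voltages shown. This is the textbook approximation only, not a fit to the HSPICE results in the plot; the function name and the 1.2 V reference are assumptions.

```python
def dynamic_power_scale(vdd, vdd_nominal=1.2, freq_ratio=1.0):
    """Dynamic power relative to nominal, assuming P is proportional to V^2 * f."""
    return (vdd / vdd_nominal) ** 2 * freq_ratio

for vdd in (1.2, 1.0, 0.8, 0.6):
    at_speed = dynamic_power_scale(vdd)
    half_speed = dynamic_power_scale(vdd, freq_ratio=0.5)
    print(f"VDD {vdd:.1f} V: {at_speed:.2f}x at full clock, "
          f"{half_speed:.2f}x at half clock")
```

Under this model, halving the supply cuts dynamic power to a quarter at a fixed clock rate, but the logic slows down, so the lower voltages are only usable where the longer delay still meets the target frequency.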

Results

Significantly reduced power for all modules
Explored voltage scaling
Implemented power-up / power-down sleep logic

Intangibles

Gained significant insight, through reading, into the current state of the art in low power FPGA and CGRA design
Gained practical knowledge working with the design tool chain of a commercial PDK


Questions?