Power-aware RAM Processing for FPGA Embedded Memory Blocks

Russell Tessier, University of Massachusetts
Vaughn Betz, David Neto and Thiagaraja Gopalsamy, Altera Corporation

December 9, 2005


Overview

° Operation of FPGA embedded memory blocks (EMBs)

° Power consumption in EMBs

° Opportunities for power saving

• Shut down clocks to memory core

° Three automated power saving techniques

• Unused memory port shutdown

• Memory control signal transform

• Memory mapping to multiple blocks

° Experimental results


FPGA Embedded Memory Blocks

° Embedded memory blocks (EMBs) are important parts of FPGAs

° Consume roughly 14% of Altera Stratix II dynamic power *

• Increasing in recent designs

* Stratix II Low Power Applications Note, 2005


Stratix II Embedded Memory Block – External View

° Input ports (data, address, control) are synchronous

° Mode 1: Single port (ignore Port B)

° Mode 2: True dual port

[Figure: EMB external view. Port A and Port B each connect data in, address, R/W enable, data out, and clock enables to the memory core.]

Stratix II Embedded Memory Block – External View

° Mode 3: Simple dual-port

• Large majority of RAM implementations

[Figure: EMB external view in simple dual-port mode. Port A signals: data in, address, write enable, clock enables. Port B signals: data in, address, read enable, data out, clock enables. Both ports connect to the memory core.]

Embedded Memory Block Port Internal View

[Figure: EMB port internal view. Blocks shown: write data and write enable inputs, pulse generator, write buffers, column mux, sense amps, row decode, column decode, RAM cell with BIT lines, bit-line pre-charge, read enable latch, read data output, and address input. Clk is gated by Clk Enable to produce the internal clock MClk.]

Embedded Memory Block Port Read: Step 1

° Substantial power required to charge bit lines

[Figure: read step 1. Clk Enable = 1; the bit-line pre-charge circuit precharges the BIT lines to VCC.]

Embedded Memory Block Port Read: Step 2

[Figure: read step 2. Clk Enable = 1; the address drives row and column decode, and data is read out of the RAM cells through the column mux and sense amps.]

Embedded Memory Block Port Read: Step 3

[Figure: read step 3. Clk Enable = 1 and Read Enable = 1; data passes through the latch to the Read Data lines.]

Embedded Memory Block Port Read Summary

° If read clock enable = 0, steps 1 and 2 suppressed

° If read enable = 0, step 3 suppressed

[Figure: EMB read path showing bit-line pre-charge, row and column decode, RAM cell, column mux, sense amps, read enable latch, and Read Data output, with Clk gated by Clk Enable to produce MClk.]
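The read summary can be restated as a small behavioral model. This is only a sketch of the slide's rules, not of the Stratix II circuit; the function name and dictionary keys are illustrative, and the model assumes that new data reaches the output only when the clock enable and the read enable are both high.

```python
def read_port_activity(clk_enable, read_enable):
    """Report which read steps run in one cycle (illustrative model only)."""
    return {
        # Steps 1 and 2 depend only on the read clock enable.
        "precharge_bit_lines": clk_enable,
        "read_ram_cells": clk_enable,
        # Step 3 needs the read enable; assumed to deliver new data
        # only if steps 1 and 2 also ran in this cycle.
        "data_to_read_lines": clk_enable and read_enable,
    }


# Read enable low but clock enable high: the bit lines are still pre-charged.
idle_but_clocked = read_port_activity(clk_enable=True, read_enable=False)
print(idle_but_clocked)
```

The case printed above is the one the later optimizations target: no data is delivered, yet the pre-charge step still runs.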


Embedded Memory Block Port Write: Step 1

° Substantial power required to charge bit lines

[Figure: write step 1. Clk Enable = 1; the bit-line pre-charge circuit precharges the BIT lines to VCC.]

Embedded Memory Block Port Write: Step 2

[Figure: write step 2. Clk Enable = 1; data is loaded into the write buffers based on write enable, via the pulse generator.]

Embedded Memory Block Port Write: Step 3

[Figure: write step 3. The address drives row and column decode, and data is loaded from the write buffers into the RAM cells.]

Embedded Memory Block Port Write Summary

° If write clock enable = 0, steps 1, 2, and 3 suppressed

° If write enable = 0, step 2 suppressed

[Figure: EMB write path showing write data and write enable inputs, pulse generator, write buffers, column mux, sense amps, row and column decode, RAM cell, and bit-line pre-charge, with Clk gated by Clk Enable to produce MClk.]

Reducing RAM Power Consumption

° Each RAM element can use an enabled or free running clock

° Use enabled clocks rather than free running clocks to prevent bit-line pre-charge

° Only enable RAMs when access is necessary

° Read enable not always specified by designer

• Write enable created for functionality

[Figure: Clk gated by Clk Enable to produce the memory clock MClk.]
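A rough way to see the benefit of an enabled clock is to count the cycles in which the bit lines are pre-charged. The sketch below assumes one pre-charge per cycle in which the memory clock is active; the function name and the access trace are made up for illustration.

```python
def precharge_cycles(access_needed, use_clock_enable):
    """Count cycles in which the bit lines are pre-charged.

    access_needed: one boolean per clock cycle (True = RAM access required).
    use_clock_enable: False models a free-running clock (enable tied high).
    """
    count = 0
    for needed in access_needed:
        clk_enable = needed if use_clock_enable else True
        if clk_enable:
            count += 1
    return count


# Hypothetical access pattern: 2 accesses in 8 cycles.
trace = [True, False, False, True, False, False, False, False]
free_running = precharge_cycles(trace, use_clock_enable=False)
enabled = precharge_cycles(trace, use_clock_enable=True)
print(free_running, enabled)  # 8 2
```

With this trace the free-running clock pre-charges in all 8 cycles, while the enabled clock pre-charges only in the 2 cycles that need an access.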


Power Optimization #1

° For single-port memories

• Tie Port B clock enable to GND

• Previously tied high, with write enable disabled

[Figure: single-port EMB. Port A keeps data in, address, R/W enable, data out, and clock enables. Port B write enable = 0 and the Port B clock is shut off.]
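As a sketch of this transform, the snippet below rewrites a dictionary that describes one EMB. The dictionary layout, key names, and mode strings are hypothetical and are not Quartus II data structures; only the rule itself (tie the unused Port B clock enable to GND for single-port RAMs and ROMs) comes from the slides.

```python
def shut_down_unused_port_b(emb):
    """Return a copy of an EMB description with an unused Port B clock enable tied low."""
    updated = dict(emb)
    if emb["mode"] in ("single_port", "rom"):
        # Port B is never accessed, so its clock enable can be tied to GND.
        updated["port_b_clock_enable"] = "GND"
    return updated


# Previous behaviour: clock enable tied high, write enable disabled.
before = {"mode": "single_port", "port_b_clock_enable": "VCC", "port_b_write_enable": "GND"}
after = shut_down_unused_port_b(before)
print(after["port_b_clock_enable"])  # GND
```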


Single Port Optimization Experiments

° Determine power effect of shutting off clock to unused Port B

° Only impacts single-port RAM and ROM designs

° 43 Stratix II designs

• Large customer designs with memory

• Targeted to smallest achievable FPGA

• Hand-generated input vectors

° Quartus 5.0

° Target maximum frequency

Memory Power – Port Optimization

° 9.2% average power reduction for designs with memories (only impacts ROMs and single port memories)

[Figure: "Memory Dynamic Power" chart. x-axis: Designs; y-axis: % Power Reduction (0 to 60).]

Dynamic Power - Port Optimization

° 2.4% average power reduction for designs with memories (only impacts ROMs and single port memories)

[Figure: "Dynamic Power" chart. x-axis: Designs; y-axis: % Power Reduction (0 to 35).]

FPGA RAM Processing

° FIFOs and Shift registers converted into logical RAMs

° Logical RAMs broken into RAM blocks of sizes appropriate for physical implementation

° Each RAM block assigned to a physical embedded memory block

[Figure: processing flow. FIFO, shift register, and RAM specification -> Create Logical Memory -> logical RAMs -> Logical-to-physical RAM processing -> RAM blocks/logic -> Memory/logic placement -> placed memory.]

FIFO Elaboration to Logical RAM

° Convert to logic and synchronous RAM with signal pattern found on EMB

[Figure: before and after. Before: a FIFO with Clock, Data, Wrreq, Rdreq, and Q ports. After: a logical RAM whose write address and read address come from counters implemented in LUTs/FFs; Wrreq and Rdreq feed the write enable and read enable, and the write and read clock enables are tied to Vcc.]
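The elaborated structure can be modeled as a RAM plus two address counters. This is a behavioral sketch only: the class name is illustrative, full and empty flags are omitted, and reading before writing within a cycle is an assumption rather than something the slides specify.

```python
class FifoOnRam:
    """FIFO modeled as a synchronous RAM plus write and read address counters."""

    def __init__(self, depth):
        self.depth = depth
        self.ram = [0] * depth
        self.write_addr = 0  # write address counter (LUTs/FFs in the slide)
        self.read_addr = 0   # read address counter (LUTs/FFs in the slide)
        self.q = 0           # registered RAM output

    def clock(self, data=0, wrreq=False, rdreq=False):
        """Advance one cycle; wrreq and rdreq act as write and read enable."""
        if rdreq:
            self.q = self.ram[self.read_addr]
            self.read_addr = (self.read_addr + 1) % self.depth
        if wrreq:
            self.ram[self.write_addr] = data
            self.write_addr = (self.write_addr + 1) % self.depth
        return self.q


fifo = FifoOnRam(depth=4)
fifo.clock(data=10, wrreq=True)
fifo.clock(data=20, wrreq=True)
first = fifo.clock(rdreq=True)
second = fifo.clock(rdreq=True)
print(first, second)  # 10 20
```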


Power Optimization #2

° Convert EMB read enable/write enable signals to associated read/write clock enable signals

° Limitations

• Each port must have a dedicated read or write enable signal (simple dual-port)

• Embedded memory block must have a read enable

[Figure: before and after. Before: Wren and Rden drive the RAM write enable and read enable, and the write and read clock enables are tied to Vcc. After: Wren and Rden drive the write and read clock enables, and the write enable and read enable are tied to Vcc. Clock, Data, write address, read address, and Q are unchanged.]

Read Port Control Signal Equivalence

° Memory core inactive if read clock enable inactive

° Read operation will occur if both read enable and read clock enable are high

• One signal could be tied to VCC

[Figure: read port internal view with the clock enable tied to VCC and the Read Enable signal controlling the output latch.]

Read Port Control Signal Equivalence

° If read clock enable = 0 and read enable = 1, read suppressed

° If read clock enable = 1 and read enable = 0, read suppressed

[Figure: read port internal view with the latch enable tied to VCC and the read enable signal driving the clock enable.]

Write Port Control Signal Equivalence

° Memory core inactive if write clock enable is inactive

° Write operation will occur if both write enable and write clock enable are high

• One signal could be tied to VCC

[Figure: write port internal view with the clock enable tied to VCC and the Write Enable signal driving the pulse generator.]

Write Port Control Signal Equivalence

° If write clock enable = 0 and write enable = 1, write suppressed

° If write enable = 0 and write clock enable = 1, write suppressed

[Figure: write port internal view with the pulse generator enable tied to VCC and the write enable signal driving the clock enable.]
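The equivalence argued on these slides can be checked exhaustively for a single enable signal. The sketch below assumes, as the slides state, that an access occurs only when both the clock enable and the R/W enable are high; the function names are illustrative.

```python
def access_occurs(clock_enable, rw_enable):
    """A read or write happens only if both enables are high."""
    return clock_enable and rw_enable


def original_wiring(user_enable):
    # Before: user enable drives the R/W enable, clock enable tied to VCC.
    return {"access": access_occurs(True, user_enable), "core_clocked": True}


def converted_wiring(user_enable):
    # After: user enable drives the clock enable, R/W enable tied to VCC.
    return {"access": access_occurs(user_enable, True), "core_clocked": user_enable}


for user_enable in (False, True):
    before = original_wiring(user_enable)
    after = converted_wiring(user_enable)
    # Same functional behaviour in both wirings.
    assert before["access"] == after["access"]
    print(user_enable, before["core_clocked"], after["core_clocked"])
```

Both wirings perform the same accesses, but only the converted wiring keeps the memory core unclocked while the user enable is low.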


Quartus II Implementation

° Conversion mode

• Quartus II default

• Ties off R/W enable to RAM clock enables

• Doesn’t make transform if CE already present on port

° Combining mode

• AND user RAM clock enables with derived R/W clock enable

• Could impact performance

[Figure: combining mode. The Write Enable is ANDed with the user-defined write clock enable, and the result gates Clk to produce MClk.]
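The two modes can be summarized as a small decision function. This is a sketch and not Quartus II code: signals are plain strings, the function name is hypothetical, and tying the R/W enable to VCC in combining mode is an assumption carried over from the conversion case.

```python
def transform_port(rw_enable, user_clock_enable=None, combine=False):
    """Return (rw_enable, clock_enable) for one port after the transform.

    Signals are represented as strings; "VCC" means tied high.
    """
    if user_clock_enable is None:
        # Conversion mode: move the R/W enable onto the clock enable.
        return "VCC", rw_enable
    if combine:
        # Combining mode: AND the user clock enable with the R/W enable.
        # Tying the R/W enable high here is an assumption of this sketch.
        return "VCC", f"({user_clock_enable} & {rw_enable})"
    # A clock enable is already present and combining is off: no transform.
    return rw_enable, user_clock_enable


print(transform_port("wren"))
print(transform_port("wren", "user_ce"))
print(transform_port("wren", "user_ce", combine=True))
```

The extra AND gate in combining mode sits on the clock enable path, which is why the slide notes a possible performance impact.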


Clock Enable Conversion Experiments

° 40 Stratix II RAM-based designs

° Quartus 5.1

° Target max frequency

° Quartus II simulation with test vectors

° Dynamic power evaluated with Quartus II PowerPlay power analyzer

° Covers the following optimizations

• Automatic conversion of R/W enable to R/W clock enable

• Combining of R/W enable with existing R/W clock enable

Memory Power – Clock Enable Optimization

° 9.7% average power reduction for convert and combine for all designs (6.3% for convert only)

[Figure: "Memory Dynamic Power" chart. x-axis: Designs (1 to 39); y-axis: % Power Reduction (-10 to 70). Series: Enable convert; Enable convert/combine.]

Core Dynamic Power – Clock Enable Optimization

° 2.6% average power reduction for convert and combine for all designs (1.8% for convert only)

[Figure: "Core Dynamic Power" chart. x-axis: Designs (1 to 39); y-axis: % Power Reduction (-5 to 30). Series: Enable convert; Enable convert/combine.]

Mapping RAM to Multiple EMBs

° User-defined memory often too large to fit in one EMB

° Must use RAM in multiple EMBs to implement logical RAM

° Implementation choice can impact design area, performance, and power.

[Figure: a user-defined (logical) memory of 4K deep x 4 wide (16K bits) mapped onto four physical M4K EMBs of 4K bits each.]

Memory Organization

° Each EMB can be configured to have different depth and width (e.g. Stratix II M4K)

° All hold 4K bits

° Slightly lower power consumption for wider EMB configurations (not including routing)

[Figure: example EMB configurations: 4K words deep x 1 bit wide; 512 words deep x 8 bits wide; 128 words deep x 32 bits wide.]
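The depth and width options, and the number of EMBs a logical RAM needs under each, follow from simple arithmetic. The sketch below treats an EMB as exactly 4K bits, as the slides do, and assumes power-of-two depths from 4K down to 128 words; the function names are illustrative.

```python
import math

EMB_BITS = 4096  # the slides treat each M4K as holding 4K bits


def m4k_configurations(min_depth=128):
    """List (depth, width) pairs with depth * width equal to 4K bits."""
    configs = []
    depth = EMB_BITS
    while depth >= min_depth:
        configs.append((depth, EMB_BITS // depth))
        depth //= 2
    return configs


def embs_needed(logical_depth, logical_width, emb_depth, emb_width):
    """Number of EMBs needed to tile a logical RAM with one configuration."""
    return math.ceil(logical_depth / emb_depth) * math.ceil(logical_width / emb_width)


for depth, width in m4k_configurations():
    print(depth, width, embs_needed(4096, 4, depth, width))
```

For the 4K x 4 logical memory used in the slides, the 4K x 1, 2K x 2, and 1K x 4 configurations each need four EMBs; what differs is how many of them must be active per access.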


Area and Delay Optimal Mapping

° Configure each EMB to be as deep as possible

° Number of address bits on each EMB same as on logical memory

° Area and performance efficient: no external logic needed

° Power inefficient: All EMBs must be active during each logical RAM access

[Figure: vertical slicing. A logical memory 4K words deep and 4 bits wide is mapped to four EMBs, each 4K words deep and 1 bit wide. Addr[0:11] goes to every EMB and each EMB supplies one bit of Data[0:3]. 4 EMBs active during access.]

Alternative Mapping

° Configure EMB to have width of logical RAM (e.g. 1Kx4)

• Allows shutdown of some RAMs each cycle

• But adds some logic

° Saves RAM power, adds combinational logic and register power

[Figure: horizontal slicing (more power efficient). A logical memory 4K words deep and 4 bits wide is mapped to four EMBs, each 1K deep x 4 wide. Addr[0:9] goes to every EMB; Addr[10:11] drives a 4-output address decoder and selects which EMB output drives Data[0:3]. 1 EMB active during access.]
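The address split used by horizontal slicing can be sketched as follows. The function name is illustrative, and the sketch assumes that Addr[0:9] holds the low-order address bits and Addr[10:11] the high-order bits.

```python
def horizontal_slice_access(addr, emb_depth=1024, num_embs=4):
    """Split a logical address into (EMB index, local address, clock enables).

    The low address bits go to every EMB; the upper bits drive the decoder
    that enables exactly one EMB for this access.
    """
    if not 0 <= addr < emb_depth * num_embs:
        raise ValueError("address out of range")
    emb_index = addr // emb_depth   # Addr[10:11] in the slide's example
    local_addr = addr % emb_depth   # Addr[0:9] in the slide's example
    clock_enables = [i == emb_index for i in range(num_embs)]
    return emb_index, local_addr, clock_enables


index, local, enables = horizontal_slice_access(3000)
print(index, local, enables)  # 2 952 [False, False, True, False]
```

Only one entry of the enable list is true per access, which is the source of the RAM power saving; the decoder and output multiplexer are the added logic.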


RAM Slicing - Example

° Power reduction available with different slicing

[Figure: "4kx32 Dynamic Power" chart. x-axis: maximum EMB depth (128, 256, 512, 1k, 2k, 4k); y-axis: dynamic power in mW (0 to 140). Annotations: "Best range", "Multiplexer Power Increasing", "EMB Power Increasing".]

Power Optimization #3: Power-aware RAM Partitioning

° Power optimal EMB configuration often between “horizontal” and “vertical”

° Need algorithm to consider possible logical to physical RAM mappings

[Figure: processing flow with the new step. FIFO, shift register -> Create Logical Memory -> logical RAMs -> Logical to Physical RAM processing (power-aware RAM partitioner, insert decode and mux logic) -> RAM blocks/logic -> Memory/Logic Placement -> completed placement.]

Power-aware RAM Partitioning Algorithm

° For each EMB type

• For each EMB depth versus width configuration

- Determine number of required EMBs, decoder, and output mux circuits

- Estimate power of RAM access (active EMBs, decoder, and output mux)

- Limit to four-way muxing at most

• Save lowest power configuration

° Rank possible EMB implementations by power

° Select lowest-power, feasible choice

• Check if EMB usage overflowed by choice

• If yes, select next choice
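The loop structure above can be sketched as follows. The power model is a deliberately crude placeholder: the per-EMB and per-bit multiplexer power numbers are hypothetical, decoder power is folded into the multiplexer term, and the slightly lower power of wider configurations is ignored. The function names are illustrative and this is not the Quartus II implementation.

```python
import math


def candidate_mappings(depth, width, emb_configs, emb_power_mw, mux_power_mw_per_bit):
    """Enumerate EMB configurations for one logical RAM, lowest estimated power first."""
    candidates = []
    for emb_depth, emb_width in emb_configs:
        ways = math.ceil(depth / emb_depth)       # depth-wise slices behind the mux
        if ways > 4:                              # limit to four-way muxing at most
            continue
        columns = math.ceil(width / emb_width)
        total_embs = ways * columns
        active_embs = columns                     # only one slice is enabled per access
        mux_power = mux_power_mw_per_bit * width * (ways - 1)
        power = active_embs * emb_power_mw + mux_power
        candidates.append((power, total_embs, (emb_depth, emb_width)))
    return sorted(candidates)


def select_mapping(candidates, embs_available):
    """Pick the lowest-power candidate that does not overflow the EMB budget."""
    for _power, total_embs, config in candidates:
        if total_embs <= embs_available:
            return config
    return None


m4k = [(4096, 1), (2048, 2), (1024, 4), (512, 8), (256, 16), (128, 32)]
ranked = candidate_mappings(4096, 32, m4k, emb_power_mw=1.0, mux_power_mw_per_bit=0.05)
print(select_mapping(ranked, embs_available=32))  # (1024, 4)
```

To cover several EMB types, the same ranking could be run per type and the ranked lists merged before selection; the outcome depends entirely on the power estimates supplied.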


Experimental Approach

° Simulation and power estimation performed

• Multi-bit input multiplexers

• Decoders

• EMB blocks in different configurations

° 40 designs evaluated

° Quartus 5.1

° Mapped to smallest possible device and target max frequency

° Simulation with test vectors, power analysis with PowerPlay

° Approach used in combination with clock enable conversion and combining


Memory Power

° 21.0% average power reduction for all techniques for memory designs (9.7% with only enable convert/combine)

[Figure: memory dynamic power chart. x-axis: Designs (1 to 39); y-axis: % Dyn. Power Reduction (-10 to 80). Series: Enable convert/combine; Enable convert/combine + Mem partition.]

Overall Core Dynamic Power

° 6.8% average power reduction for all techniques for memory designs (2.6% with convert/combine)

[Figure: overall core dynamic power chart. x-axis: Designs (1 to 39); y-axis: % Dyn. Power Reduction (-5 to 35). Series: Enable convert/combine; Enable convert/combine + Mem partition.]

Design Performance

° 1.0% average performance loss for all techniques (0.1% for enable convert/combine)

[Figure: "Average Design Clock Frequency" chart. x-axis: Designs; y-axis: % Frequency Improvement (-30 to 10). Series: Enable convert/combine; Enable convert/combine + Mem partition.]

Results Summary

° Almost 7% core dynamic power reduction across all designs

• Some designs benefit more than others

° Minimal clock frequency hit for most designs

                        Enable convert   Enable convert/combine   Enable convert/combine + Mem partition
Core dynamic power      -1.8%            -2.6%                    -6.8%
Memory dynamic power    -6.3%            -9.7%                    -21.0%
Max clk freq            -0.1%            -0.2%                    -1.0%
LUT count               0.0%             0.1%                     0.7%

Impact of Multiple Embedded Memory Blocks

° Rerun 40 designs but only allow one type of target EMB for each mapping

° All designs targeted to Stratix II EP2S180

° Significant power impact for most designs versus EP2S180 target with no restrictions

                      M512      M4K      M-RAM
Designs completed     23        38       4
Core dynamic power    40.4%     6.6%     47.3%
Memory power          279.5%    33.3%    754.0%
Max clk freq.         -2.2%     0.6%     -1.0%
LUT count             0.4%      -0.5%    0.0%

Summary

° Key to reducing RAM power is keeping clocks disabled.

° Single port RAMs a straightforward optimization

° Movement of read/write enables to clock enables limits dynamic activity

° Power-aware RAM partitioner attempts to select power-optimal mapping – combined with clock enable enhancement

° Overall

• About 30% average memory power reduction

- 9% single port optimization

- 21% enable convert/combine and memory partitioning

• About 9% average dynamic power reduction

- 2% single port optimization

- 7% enable convert/combine and memory partitioning