Power-Aware RAM Processing for FPGAs December 9, 2005
Power-aware RAM Processing for FPGA
Embedded Memory Blocks
Russell Tessier
University of Massachusetts
Vaughn Betz, David Neto and Thiagaraja Gopalsamy
Altera Corporation
Overview
° Operation of FPGA embedded memory blocks (EMBs)
° Power consumption in EMBs
° Opportunities for power saving
• Shut down clocks to memory core
° Three automated power saving techniques
• Unused memory port shutdown
• Memory control signal transform
• Memory mapping to multiple blocks
° Experimental results
FPGA Embedded Memory Blocks
° Embedded memory blocks (EMBs) are important parts of FPGAs
° Consume roughly 14% of Altera Stratix II dynamic power *
• Share of dynamic power increasing in recent designs
* Stratix II Low Power Applications Note, 2005
Stratix II Embedded Memory Block – External View
° Input ports (data, address, control) are synchronous
° Mode 1: Single port (ignore Port B)
° Mode 2: True dual port
[Figure: EMB external view. Port A and Port B each have Data In, Address, R/W Enable, clock enables, and Data Out, and both connect to a shared memory core.]
Stratix II Embedded Memory Block – External View
° Mode 3: Simple dual-port
• Used by the large majority of RAM implementations
[Figure: simple dual-port EMB. Port A (write side): Data In, Address, Write Enable, clock enables. Port B (read side): Address, Read Enable, clock enables, Data Out. Both ports share the memory core.]
Embedded Memory Block Port Internal View
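[Figure: EMB port internal view. Write Data and Write Enable feed a pulse generator and write buffers; row and column decoders select a RAM cell on complementary BIT bit lines with bit-line pre-charge; sense amps and a column mux feed a Read Enable latch driving Read Data. All stages are clocked by MClk, derived from Clk and Clk Enable.]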
Embedded Memory Block Port Read: Step 1
° Substantial power required to charge bit lines
[Figure: with Clk Enable = 1, the bit-line pre-charge circuit (clocked by MClk) precharges the BIT lines to VCC.]
Embedded Memory Block Port Read: Step 2
[Figure: with Clk Enable = 1, the row and column decoders select the addressed RAM cell and data is read out of the RAM cells through the column mux to the sense amps.]
Embedded Memory Block Port Read: Step 3
[Figure: with Read Enable = 1 and Clk Enable = 1, the sensed data passes through the latch to the Read Data lines.]
Embedded Memory Block Port Read Summary
° If read clock enable = 0, steps 1 and 2 suppressed
° If read enable = 0, step 3 suppressed
[Figure: EMB port read path, with Clk Enable gating MClk and Read Enable controlling the output latch.]
Embedded Memory Block Port Write: Step 1
° Substantial power required to charge bit lines
[Figure: with Clk Enable = 1, the bit-line pre-charge circuit (clocked by MClk) precharges the BIT lines to VCC.]
Embedded Memory Block Port Write: Step 2
[Figure: with Clk Enable = 1, Write Data is loaded into the write buffers based on the write enable (via the pulse generator).]
Embedded Memory Block Port Write: Step 3
[Figure: the row and column decoders select the addressed cell and the write buffers load the data into the RAM cells.]
Embedded Memory Block Port Write Summary
° If write clock enable = 0, steps 1, 2, and 3 suppressed
° If write enable = 0, step 2 suppressed
[Figure: EMB port write path, with Clk Enable gating MClk and Write Enable controlling the write buffers.]
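As a rough cross-check of the read and write summaries, the step rules can be written as a small lookup. This is a sketch under stated assumptions: `active_steps` is a hypothetical helper, and a read with clock enable = 0 is assumed to deliver no new data (the slides state only that steps 1 and 2 are suppressed).

```python
# Sketch of which EMB port steps run in one cycle, following the
# read/write summary slides. Hypothetical helper, not a vendor model.
# Read:  1 = pre-charge bit lines, 2 = read RAM cells, 3 = pass data through latch
# Write: 1 = pre-charge bit lines, 2 = load write buffers, 3 = load RAM cells


def active_steps(op: str, clock_enable: bool, enable: bool) -> list[int]:
    """Return the steps performed for a read or write on one port."""
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation: {op}")
    if not clock_enable:
        # MClk is gated off. For reads, assume no new data reaches Read Data.
        return []
    if op == "read":
        # Read enable only controls the output latch (step 3).
        return [1, 2, 3] if enable else [1, 2]
    # Write enable only controls loading of the write buffers (step 2).
    return [1, 2, 3] if enable else [1, 3]


if __name__ == "__main__":
    for op in ("read", "write"):
        for ce in (False, True):
            for en in (False, True):
                print(op, "clk_en", int(ce), "en", int(en), "->", active_steps(op, ce, en))
```

The case that matters for power is clock enable = 1 with the R/W enable = 0: step 1, the bit-line pre-charge, still runs.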
Reducing RAM Power Consumption
° Each RAM element can use an enabled or free-running clock
° Use enabled clocks rather than free-running clocks to avoid unnecessary bit-line pre-charge
° Only enable RAMs when access is necessary
° Read enable is not always specified by the designer
• Write enable is created because correct operation requires it
[Figure: Clk Enable gates Clk to produce the memory clock MClk.]
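A minimal way to see the effect of an enabled clock is to count pre-charge events over a trace of cycles. The trace and the helper `precharge_events` are hypothetical, and the pre-charge count is only a proxy for memory core power.

```python
# Compare bit-line pre-charge events for a free-running clock versus an
# enabled (gated) clock over a trace of cycles. Pre-charge count is used as
# a rough proxy for memory core activity; other power components are ignored.


def precharge_events(accesses: list[bool], enabled_clock: bool) -> int:
    """accesses[i] is True when the RAM is really accessed in cycle i."""
    if enabled_clock:
        # Clk Enable is driven by the access condition: MClk only runs on access cycles.
        return sum(accesses)
    # Free-running clock: the core is clocked, and pre-charged, every cycle.
    return len(accesses)


trace = [True, False, False, True, False, False, False, False]  # hypothetical 25% access rate
free = precharge_events(trace, enabled_clock=False)
gated = precharge_events(trace, enabled_clock=True)
print(free, gated)  # 8 2
```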
Power Optimization #1
° For single-port memories
• Tie Port B clock enable to GND
• Previously tied high with Port B write enable disabled, so the unused port was still clocked
[Figure: single-port EMB. Port A (Data In, Address, R/W Enable, clock enables, Data Out) is used; Port B write enable = 0 and the Port B clock is shut off.]
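The Port B shutdown can be sketched as a netlist transform. `PortConfig`, `EmbInstance`, and `shut_down_unused_port_b` are hypothetical records and names for illustration, not Quartus II data structures.

```python
from dataclasses import dataclass

# Hypothetical netlist records for one EMB; field names are illustrative only.


@dataclass
class PortConfig:
    used: bool
    clock_enable: str = "VCC"   # "VCC", "GND", or a driving signal name
    write_enable: str = "GND"


@dataclass
class EmbInstance:
    name: str
    port_a: PortConfig
    port_b: PortConfig


def shut_down_unused_port_b(emb: EmbInstance) -> bool:
    """Tie Port B's clock enable to GND if Port B is unused.

    Previously the unused port's clock enable was tied high with its write
    enable held low, so the port was still clocked every cycle.
    Returns True if the EMB was modified.
    """
    port_b = emb.port_b
    if port_b.used or port_b.clock_enable == "GND":
        return False
    port_b.clock_enable = "GND"
    return True


rom = EmbInstance("rom0", PortConfig(used=True), PortConfig(used=False))
print(shut_down_unused_port_b(rom), rom.port_b.clock_enable)  # True GND
```

True dual-port and simple dual-port EMBs use Port B, so the transform leaves them untouched; only single-port RAMs and ROMs are affected, as the experiments note.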
Single Port Optimization Experiments
° Determine power effect of shutting off clock to unused Port B
° Only impacts single-port RAM and ROM designs
° 43 Stratix II designs
• Large customer designs with memory
• Targeted to smallest achievable FPGA
• Hand-generated input vectors
° Quartus 5.0
° Target maximum frequency
Memory Power – Port Optimization
° 9.2% average power reduction for designs with memories (only impacts ROMs and single-port memories)
[Figure: Memory Dynamic Power, % power reduction for each design (0 to 60%)]
Dynamic Power - Port Optimization
° 2.4% average power reduction for designs with memories (only impacts ROMs and single-port memories)
[Figure: Dynamic Power, % power reduction for each design (0 to 35%)]
FPGA RAM Processing
° FIFOs and shift registers converted into logical RAMs
° Logical RAMs broken into RAM blocks of sizes appropriate for physical implementation
° Each RAM block assigned to a physical embedded memory block
[Figure: flow: FIFO, shift register, RAM specification → Create logical memory → Logical RAMs → Logical-to-physical RAM processing → RAM blocks/logic → Memory/logic placement → Placed memory]
FIFO Elaboration to Logical RAM
° Convert each FIFO into logic plus a synchronous RAM with the signal pattern found on an EMB
[Figure: Before: FIFO with Clock, Data, Wrreq, Rdreq, and Q. After: logical RAM whose write and read addresses come from counters implemented in LUTs/FFs; Wrreq drives the write enable, Rdreq drives the read enable, and the write and read clock enables are tied to Vcc.]
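A behavioral sketch of the elaborated FIFO, assuming a simple circular buffer with no full/empty flags and read-before-write ordering when both requests arrive on the same edge; the class name `ElaboratedFifo` is hypothetical. The `core_clocks` counter shows that with clock enables tied to Vcc the RAM core is clocked on every edge, whether or not Wrreq or Rdreq is asserted.

```python
class ElaboratedFifo:
    """Behavioral sketch of a FIFO after elaboration to a logical RAM.

    Two counters (LUT/FF logic in hardware) generate the write and read
    addresses; the list stands in for the synchronous RAM. Wrreq drives the
    RAM write enable and Rdreq the read enable, while both clock enables are
    tied to Vcc, so the RAM core is clocked on every edge.
    Full/empty flags are omitted: the caller must not overflow or underflow.
    """

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.ram = [0] * depth
        self.write_address = 0  # write address counter
        self.read_address = 0   # read address counter
        self.q = None
        self.core_clocks = 0    # edges on which the RAM core was clocked

    def clock(self, data: int, wrreq: bool, rdreq: bool):
        """Apply one rising clock edge and return Q."""
        self.core_clocks += 1  # Wr/Rd clock enables tied to Vcc
        if rdreq:  # read enable (assumed to read before a same-edge write)
            self.q = self.ram[self.read_address]
            self.read_address = (self.read_address + 1) % self.depth
        if wrreq:  # write enable
            self.ram[self.write_address] = data
            self.write_address = (self.write_address + 1) % self.depth
        return self.q


fifo = ElaboratedFifo(depth=4)
fifo.clock(10, wrreq=True, rdreq=False)
fifo.clock(20, wrreq=True, rdreq=False)
print(fifo.clock(0, wrreq=False, rdreq=True))  # 10
print(fifo.clock(0, wrreq=False, rdreq=True))  # 20
print(fifo.core_clocks)  # 4
```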
Power Optimization #2
° Convert EMB read enable/write enable signals to associated read/write clock enable signals
° Limitations
• Each port must have a dedicated read or write enable signal (simple dual-port)
• Embedded memory block must have a read enable
[Figure: Before: Wren drives the write enable and Rden drives the read enable, with the write and read clock enables tied to Vcc. After: Wren drives the write clock enable and Rden drives the read clock enable, with the write and read enables tied to Vcc.]
Read Port Control Signal Equivalence
° Memory core inactive if read clock enable is inactive
° Read operation will occur only if both read enable and read clock enable are high
• One of the two signals can therefore be tied to VCC
[Figure: EMB read path with one of the read control signals tied to VCC.]
Read Port Control Signal Equivalence
° If read clock enable = 0 and read enable = 1, read suppressed
° If read clock enable = 1 and read enable = 0, read suppressed
[Figure: EMB read path after conversion: the latch read enable is tied to VCC and the read enable signal drives the clock enable that gates MClk.]
Write Port Control Signal Equivalence
° Memory core inactive if write clock enable is inactive
° Write operation will occur if both write enable and write clock enable are high
• One of the two signals can therefore be tied to VCC
[Figure: EMB write path with one of the write control signals tied to VCC.]
Write Port Control Signal Equivalence
° If write clock enable = 0 and write enable = 1, write suppressed
° If write enable = 0 and write clock enable = 1, write suppressed
[Figure: EMB write path after conversion: the write enable input to the pulse generator is tied to VCC and the write enable signal drives the clock enable that gates MClk.]
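The equivalence on the read and write slides can be checked exhaustively for a single designer signal `s` (a hypothetical name). Both wirings give the same read/write behavior, but only the converted one stops clocking the memory core when `s` is low.

```python
# Truth-table check of the control signal equivalence on these slides.
# A read (or write) occurs only when both the clock enable and the R/W
# enable are high; the memory core is clocked whenever the clock enable is high.


def operation_occurs(clock_enable: int, enable: int) -> bool:
    return bool(clock_enable and enable)


def core_clocked(clock_enable: int) -> bool:
    return bool(clock_enable)


VCC = 1
rows = []
for s in (0, 1):  # s is the designer's R/W enable signal
    # Original: s drives the R/W enable, clock enable tied to VCC.
    original = (operation_occurs(VCC, s), core_clocked(VCC))
    # Converted: s drives the clock enable, R/W enable tied to VCC.
    converted = (operation_occurs(s, VCC), core_clocked(s))
    assert original[0] == converted[0]  # same read/write behavior
    rows.append((s, original, converted))

for s, original, converted in rows:
    print(f"s={s} original(op, clocked)={original} converted(op, clocked)={converted}")
```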
Quartus II Implementation
° Conversion mode
• Quartus II default
• Moves R/W enables onto the RAM clock enables and ties the R/W enables to VCC
• Does not make the transform if a clock enable (CE) is already present on the port
° Combining mode
• ANDs user RAM clock enables with the derived R/W clock enables
• Could impact performance
[Figure: combining mode. Before: the user-defined write clock enable alone gates Clk to produce MClk. After: Write Enable is ANDed with the user-defined write clock enable to gate Clk.]
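The two modes can be sketched on a hypothetical port record (`RamPort`, `convert`, and `combine` are illustrative names, not Quartus II internals). Tying the R/W enable to VCC in combining mode is an assumption; since the operation occurs only when both signals are high, leaving it connected would be functionally equivalent.

```python
from dataclasses import dataclass

VCC = "VCC"

# Hypothetical port record; signal names are strings and "VCC" means tied high.
# This illustrates the two modes described above, not Quartus II internals.


@dataclass(frozen=True)
class RamPort:
    enable: str = VCC        # read or write enable
    clock_enable: str = VCC  # user-defined clock enable (VCC if unused)


def convert(port: RamPort) -> RamPort:
    """Conversion mode: move the R/W enable onto the clock enable."""
    if port.clock_enable != VCC:
        return port  # a clock enable is already present: leave the port alone
    return RamPort(enable=VCC, clock_enable=port.enable)


def combine(port: RamPort) -> RamPort:
    """Combining mode: AND the user clock enable with the R/W enable."""
    if port.clock_enable == VCC:
        return convert(port)
    # The extra AND gate sits on the clock enable path, which may cost speed.
    return RamPort(enable=VCC, clock_enable=f"{port.clock_enable} & {port.enable}")


plain = RamPort(enable="wren")
gated = RamPort(enable="wren", clock_enable="user_wr_ce")
print(convert(plain))  # RamPort(enable='VCC', clock_enable='wren')
print(convert(gated))  # RamPort(enable='wren', clock_enable='user_wr_ce')
print(combine(gated))  # RamPort(enable='VCC', clock_enable='user_wr_ce & wren')
```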
Clock Enable Conversion Experiments
° 40 Stratix II RAM-based designs
° Quartus 5.1
° Target max frequency
° Quartus II simulation with test vectors
° Dynamic power evaluated with Quartus II PowerPlay power analyzer
° Covers the following optimizations:
• Automatic conversion of R/W enable to R/W clock enable
• Combining of R/W enable with existing R/W clock enable
Memory Power – Clock Enable Optimization
° 9.7% average power reduction for convert and combine for all designs (6.3% for convert only)
[Figure: Memory Dynamic Power, % power reduction for each of 40 designs; series: Enable convert, Enable convert/combine]
Core Dynamic Power – Clock Enable Optimization
° 2.6% average power reduction for convert and combine for all designs (1.8% for convert only)
[Figure: Core Dynamic Power, % power reduction for each of 40 designs; series: Enable convert, Enable convert/combine]
Mapping RAM to Multiple EMBs
° User-defined memory often too large to fit in one EMB
° Must use RAM in multiple EMBs to implement logical RAM
° Implementation choice can impact design area, performance, and power.
[Figure: a user-defined (logical) memory, 4K deep x 4 wide (16K bits), implemented in four 4K-bit M4K physical (EMB) memories]
Memory Organization
° Each EMB can be configured to have different depth and width (e.g. Stratix II M4K)
° All configurations hold 4K bits
° Slightly lower power consumption for wider EMB configurations (not including routing)
[Figure: three M4K configurations: 4K words x 1 bit, 512 words x 8 bits, 128 words x 32 bits]
Area and Delay Optimal Mapping
° Configure each EMB to be as deep as possible
° Number of address bits on each EMB same as on logical memory
° Area and performance efficient: no external logic needed
° Power inefficient: All EMBs must be active during each logical RAM access
[Figure: Vertical Slicing. A 4K-word x 4-bit logical memory built from four 4K x 1 EMBs; each EMB receives Addr[0:11] and supplies one bit of Data[0:3]; all 4 EMBs are active during each access.]
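The EMB counts for a logical RAM can be computed with ceiling division. The helper `emb_mapping` is hypothetical and assumes only the addressed depth slice is enabled; it reproduces the vertical slicing above and the 1K x 4 alternative that follows.

```python
def emb_mapping(depth: int, width: int, emb_depth: int, emb_width: int) -> tuple[int, int]:
    """Return (total EMBs, EMBs active per access) for a logical RAM.

    The logical RAM is tiled into ceil(depth / emb_depth) depth slices, each
    built from ceil(width / emb_width) EMBs side by side. Assuming only the
    addressed depth slice is enabled, one slice's EMBs are active per access.
    """
    depth_slices = -(-depth // emb_depth)  # ceiling division
    width_slices = -(-width // emb_width)
    return depth_slices * width_slices, width_slices


print(emb_mapping(4096, 4, 4096, 1))  # vertical slicing (4K x 1 EMBs): (4, 4)
print(emb_mapping(4096, 4, 1024, 4))  # horizontal slicing (1K x 4 EMBs): (4, 1)
```

Both mappings use four EMBs; they differ only in how many must be active per access, which is where the power difference comes from.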
Alternative Mapping
° Configure EMB to have width of logical RAM (e.g. 1Kx4)
• Allows shutdown of some RAMs each cycle
• But adds some logic
° Saves RAM power, adds combinational logic and register power
[Figure: Horizontal Slicing (more power efficient). The 4K-word x 4-bit logical memory is built from four 1K x 4 EMBs; Addr[0:9] goes to every EMB, Addr[10:11] drives an address decoder and a 4:1 output mux on Data[0:3]; only 1 EMB is active during each access.]
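The address split for the 4 x (1K x 4) horizontal slicing can be sketched as follows; `horizontal_decode` is a hypothetical helper, and addresses are treated as plain integers with Addr[0] as the least significant bit.

```python
def horizontal_decode(addr: int, local_bits: int = 10, num_embs: int = 4):
    """Split a 12-bit logical address for the 4 x (1K x 4) mapping.

    Addr[0:9] goes to every EMB; Addr[10:11] drives the decoder, whose
    one-hot output enables exactly one EMB and whose value also selects
    that EMB's Data[0:3] in the 4:1 output mux.
    """
    local_addr = addr & ((1 << local_bits) - 1)  # Addr[0:9]
    select = addr >> local_bits                  # Addr[10:11]
    if select >= num_embs:
        raise ValueError("address outside the logical RAM")
    enables = [i == select for i in range(num_embs)]  # decoder output
    return local_addr, select, enables


print(horizontal_decode(0xABC))  # (700, 2, [False, False, True, False])
```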
RAM Slicing - Example
° Power reduction available with different slicing
[Figure: dynamic power (mW) of a 4K x 32 logical RAM versus maximum EMB depth (128, 256, 512, 1K, 2K, 4K). Shallower EMBs increase multiplexer power, deeper EMBs increase EMB power, and the best range lies in between.]
Power Optimization #3: Power-aware RAM Partitioning
° Power-optimal EMB configuration often lies between “horizontal” and “vertical” slicing
° Need algorithm to consider possible logical to physical RAM mappings
[Figure: flow: FIFO, shift register → Create logical memory → Logical RAMs → Logical-to-physical RAM processing (power-aware RAM partitioner, insert decode and mux logic) → RAM blocks/logic → Memory/logic placement → Completed placement]
Power-aware RAM Partitioning Algorithm
° For each EMB type
• For each EMB depth versus width configuration
- Determine number of required EMBs, decoder, and output mux circuits
- Estimate power of RAM access (active EMBs, decoder, and output mux)
- Limit to four-way muxing at most
• Save lowest power configuration
° Rank possible EMB implementations by power
° Select lowest-power, feasible choice
• Check whether the choice overflows the available EMBs
• If yes, select next choice
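A minimal sketch of the partitioning loop above, assuming a made-up linear power model (one active depth slice, a fixed decoder cost, and a mux cost proportional to data bits times mux inputs). The constants, the M4K configuration list (power-of-two widths from 1 to 32, parity bits ignored), and all names are hypothetical; a real partitioner would use characterized EMB, decoder, and mux power.

```python
from dataclasses import dataclass

# Hypothetical power model: the per-access costs below are made-up units,
# not Stratix II characterization data.
DECODER_COST = 0.2         # address decoder, charged when depth_slices > 1
MUX_COST_PER_INPUT = 0.05  # per data bit per mux input
MAX_MUX_WAYS = 4           # "limit to four-way muxing at most"


@dataclass(frozen=True)
class EmbConfig:
    emb_type: str
    depth: int
    width: int
    access_cost: float  # power of one active EMB per access (hypothetical)


def best_per_type(depth: int, width: int, configs: list[EmbConfig]):
    """For each EMB type, keep its lowest-power depth x width configuration."""
    best = {}
    for cfg in configs:
        depth_slices = -(-depth // cfg.depth)
        width_slices = -(-width // cfg.width)
        if depth_slices > MAX_MUX_WAYS:
            continue
        power = width_slices * cfg.access_cost  # one depth slice active
        if depth_slices > 1:
            power += DECODER_COST + MUX_COST_PER_INPUT * depth_slices * width
        count = depth_slices * width_slices
        if cfg.emb_type not in best or power < best[cfg.emb_type][0]:
            best[cfg.emb_type] = (power, cfg, count)
    # Rank the per-type choices by estimated power.
    return sorted(best.values(), key=lambda choice: choice[0])


def select_mapping(depth: int, width: int, configs: list[EmbConfig], available: dict[str, int]):
    """Pick the lowest-power choice that does not overflow EMB usage."""
    for power, cfg, count in best_per_type(depth, width, configs):
        if count <= available.get(cfg.emb_type, 0):
            available[cfg.emb_type] -= count
            return cfg, count, power
    return None  # no feasible choice


# Assumed M4K configurations (4K bits each), all with the same access cost.
m4k = [EmbConfig("M4K", 4096 // w, w, 1.0) for w in (1, 2, 4, 8, 16, 32)]
print(select_mapping(4096, 32, m4k, {"M4K": 40}))
```

With equal per-EMB costs, this toy model picks 1K x 4 over the fully vertical 4K x 1 mapping, illustrating how the optimum can fall between the two extremes.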
Experimental Approach
° Simulation and power estimation performed for:
• Multi-bit input multiplexers
• Decoders
• EMB blocks in different configurations
° 40 designs evaluated
° Quartus 5.1
° Mapped to smallest possible device and target max frequency
° Simulation with test vectors, power analysis with PowerPlay
° Approach used in combination with clock enable conversion and combining
Memory Power
° 21.0% average power reduction for all techniques for memory designs (9.7% with only enable convert/combine)
[Figure: memory dynamic power, % reduction for each of 40 designs; series: Enable convert/combine, Enable convert/combine + Mempartition]
Overall Core Dynamic Power
° 6.8% average power reduction for all techniques for memory designs (2.6% with convert/combine)
[Figure: core dynamic power, % reduction for each of 40 designs; series: Enable convert/combine, Enable convert/combine + mempartition]
Design Performance
° 1.0% average performance loss for all techniques (0.1% for enable convert/combine)
[Figure: Average Design Clock Frequency, % frequency improvement for each design; series: Enable Convert/Combine, Enable Convert/Combine + Mem Partition]
Results Summary
° Almost 7% core dynamic power reduction across all designs
• Some designs benefit more than others
° Minimal clock frequency hit for most designs
Metric                 Enable convert   Enable convert/combine   Enable convert/combine + Mem partition
Core dynamic power     -1.8%            -2.6%                    -6.8%
Memory dynamic power   -6.3%            -9.7%                    -21.0%
Max clk freq           -0.1%            -0.2%                    -1.0%
LUT count              0.0%             0.1%                     0.7%
Impact of Multiple Embedded Memory Blocks
° Rerun 40 designs but only allow one type of target EMB for each mapping
° All designs targeted to Stratix II EP2S180
° Significant power increase for most designs versus an EP2S180 target with no restrictions
Metric               M512 only   M4K only   M-RAM only
Designs completed    23          38         4
Core dynamic power   40.4%       6.6%       47.3%
Memory power         279.5%      33.3%      754.0%
Max clk freq.        -2.2%       0.6%       -1.0%
LUT count            0.4%        -0.5%      0.0%
Summary
° Key to reducing RAM power is keeping clocks disabled.
° Single-port RAMs offer a straightforward optimization
° Movement of read/write enables to clock enables limits dynamic activity
° Power-aware RAM partitioner attempts to select a power-optimal mapping, used together with clock enable conversion and combining
° Overall
• About 30% average memory power reduction
- 9% single port optimization
- 21% enable convert/combine and memory partitioning
• About 9% average dynamic power reduction
- 2% single port optimization
- 7% enable convert/combine and memory partitioning