Commercial FPGAs: Altera Stratix Family
Dr. Philip Brisk
Department of Computer Science and Engineering
University of California, Riverside
CS 223
Notes on These Slides
• Altera has disclosed the details of its devices in both online documentation and academic papers
• The academic papers evaluate different design decisions and tradeoffs; the experiments are a bit too specialized for this course
  – Please do not overemphasize the experimentation in your studies
The Stratix™ Routing and Logic Architecture
D.M. Lewis et al., International Symposium on FPGAs, 2003
Online documentation
Altera Stratix FPGA
Stratix Logic Element (LE)
Register Feedback Mode
Register Cascade (Shift Regs.)
Logic Array Block (LAB)
Directionally Biased Routing
• Long vertical wires require larger, more powerful drivers
  – Fewer vertical wires
• More rows than columns
  – More demand for horizontal wires
The Stratix II Logic and Routing Architecture
D.M. Lewis et al., International Symposium on FPGAs, 2005
Online documentation
Logic Array Block (LAB)
Adaptive Logic Module (ALM)
Adaptive Logic Module (ALM)
Four ALM Operating Modes
• Normal Mode
• Extended LUT Mode
• Arithmetic Mode
• Shared Arithmetic Mode
Normal Mode
LUT Input Utilization
Extended LUT Mode
• Some 7-input logic functions
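The slide does not say which 7-input functions fit. A common description, which we assume here and which should be checked against Altera's Stratix II documentation, is a 2:1 multiplexer choosing between two 5-input functions that share four inputs: 4 shared + 2 private + 1 select = 7 inputs. The names extended_lut and depends_on are hypothetical.

```python
from itertools import product
from typing import Callable

Fn5 = Callable[[int, int, int, int, int], int]


def extended_lut(f0: Fn5, f1: Fn5) -> Callable[..., int]:
    """Compose a 7-input function (assumed form): sel chooses between two
    5-input functions that share inputs a, b, c, d; x0 and x1 are private."""
    def f(a, b, c, d, x0, x1, sel):
        return f1(a, b, c, d, x1) if sel else f0(a, b, c, d, x0)
    return f


def depends_on(f, i: int, n: int = 7) -> bool:
    """True if flipping input i changes f for at least one input assignment."""
    for bits in product((0, 1), repeat=n):
        flipped = list(bits)
        flipped[i] ^= 1
        if f(*bits) != f(*flipped):
            return True
    return False


parity5 = lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e
and5 = lambda a, b, c, d, e: a & b & c & d & e
f7 = extended_lut(parity5, and5)
print([depends_on(f7, i) for i in range(7)])  # every one of the 7 inputs matters
```

Not every 7-input function has this shape, which is why only "some" 7-input functions are supported.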
Arithmetic Mode
Arithmetic Mode Example: R = (X < Y) ? Y : X
(X < Y)
• Compute X - Y using the carry chain
• Look only at the carry output
• Use the carry output to select either X or Y accordingly
Configure the LUTs to pass X through unmodified, and ignore the carry chain outputs
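The (X < Y) steps above can be modeled bit by bit. This sketch assumes unsigned operands: X - Y is formed as X + ~Y + 1 on a ripple carry chain, and the final carry is 1 exactly when X >= Y, so its complement is the X < Y select signal. The function names are illustrative.

```python
def subtract_carry_out(x: int, y: int, width: int) -> int:
    """Ripple-carry X + ~Y + 1 over `width` bits; return only the final carry.
    For unsigned operands the carry is 1 exactly when x >= y."""
    carry = 1  # the "+1" of two's complement enters as the carry-in
    for i in range(width):
        xi = (x >> i) & 1
        nyi = 1 - ((y >> i) & 1)  # bit i of ~Y
        carry = (xi & nyi) | (xi & carry) | (nyi & carry)  # majority = carry-out
    return carry


def max_via_carry(x: int, y: int, width: int) -> int:
    """R = (X < Y) ? Y : X, using only the carry output to drive the select."""
    x_less_than_y = 1 - subtract_carry_out(x, y, width)
    return y if x_less_than_y else x


print(max_via_carry(5, 9, 4), max_via_carry(12, 3, 4))
```

The sum bits of the subtraction are never used, which is why only the carry output matters.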
Shared Arithmetic Mode (3-input Add)
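One way a single carry chain can add three operands is to first compress them into two with a bitwise 3:2 (carry-save) step, then add the two results. The sketch assumes that this is the role the LUTs play in shared arithmetic mode; Python's `+` stands in for the dedicated carry-chain adder, and the function names are hypothetical.

```python
def carry_save(a: int, b: int, c: int) -> tuple[int, int]:
    """Bitwise 3:2 compression of non-negative ints: returns (s, cy) with a + b + c == s + cy."""
    s = a ^ b ^ c                                # per-bit sum without carries
    cy = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, weighted one bit higher
    return s, cy


def add3(a: int, b: int, c: int) -> int:
    """Three-input add: LUT-level 3:2 compression followed by one two-input add."""
    s, cy = carry_save(a, b, c)
    return s + cy  # stands in for the carry chain


print(add3(13, 7, 22))
```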
Register Chain (Shift Registers)
Separates logic and shift register functions
• Cycle 1: combinational logic
• Cycles 2..k+1: shift by k
…
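The timeline above can be sketched cycle by cycle: a LUT result is registered in cycle 1, and the register chain then shifts it through k registers, so it emerges after cycle k+1. This is a behavioral model only; register_chain and its parameters are hypothetical names, and all registers are assumed to start at 0.

```python
from collections import deque
from typing import Callable, Iterable


def register_chain(inputs: Iterable[int], logic: Callable[[int], int], k: int) -> list[int]:
    """Registered LUT output feeding a k-register chain; returns the chain output per cycle."""
    logic_reg = 0                          # register capturing the LUT output
    chain = deque([0] * k, maxlen=k)       # k registers linked by the register chain
    outputs = []
    for x in inputs:
        # All registers sample on the same clock edge, so the chain shifts in
        # the old logic_reg value before logic_reg itself is updated.
        chain.appendleft(logic_reg)        # the oldest value falls off the far end
        logic_reg = logic(x)
        outputs.append(chain[-1])          # output of the last register in the chain
    return outputs


print(register_chain([1, 2, 3, 4, 5], lambda x: x * 10, 2))
```

With k = 2, the result of the cycle-1 logic (10) appears at the chain output in cycle 3.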
ALM Benefits
• 2.6% smaller LAB area than Stratix
• 15% performance improvement
• When shrinking from the 0.13 µm (Stratix) to the 90 nm (Stratix II) technology node:
  – 51% performance improvement
  – 50% area decrease
TriMatrix Embedded Memories
M512 RAM Block
Functions
• 1-port RAM
• 2-port RAM
• FIFO
• ROM
• Shift register
576 RAM bits (32 x 18), including parity bits
M4K RAM Block
4,608 RAM bits (128 x 36), including parity bits
Functions
• 1-port RAM
• 2-port RAM
• True 2-port RAM
• FIFO
• ROM
• Shift register
M-RAM Block
589,824 RAM bits (4K x 144), including parity bits
Functions
• 1-port RAM
• 2-port RAM
• True 2-port RAM
• FIFO
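The bit counts above can be checked directly. Assuming one parity bit per byte (so each 9-bit group carries 8 data bits; this is our assumption, not stated on the slides), the data capacity comes out to exactly 512 and 4K bits, matching the M512 and M4K names, and 512K data bits for M-RAM.

```python
# (depth, width including parity) for each TriMatrix block, as listed on the slides
BLOCKS = {"M512": (32, 18), "M4K": (128, 36), "M-RAM": (4096, 144)}


def capacity(depth: int, width: int) -> tuple[int, int]:
    """Return (total bits, data bits), assuming one parity bit per 8 data bits."""
    total = depth * width
    data = total * 8 // 9  # each 9-bit group = 8 data bits + 1 parity bit (assumption)
    return total, data


for name, (depth, width) in BLOCKS.items():
    total, data = capacity(depth, width)
    print(f"{name}: {total:,} bits total, {data:,} data bits")
```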
M-RAM LAB Interface
DSP Blocks
• Eight 9x9 multipliers
• Four 18x18 multipliers
• One 36x36 multiplier
Add/Sub/Accum Functions
• Multiplier
• Multiply-accumulate
• AB + CD
• AB + CD + EF + GH
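The mode list suggests that four 18x18 multipliers can be combined into one 36x36 multiplier. The standard decomposition splits each operand into 18-bit halves and sums four shifted partial products. The sketch below covers unsigned operands only (signed handling is omitted), and mul18 is a hypothetical stand-in for one hardware multiplier. Summing mul18 products directly gives the AB + CD + EF + GH function.

```python
MASK18 = (1 << 18) - 1


def mul18(a: int, b: int) -> int:
    """Stand-in for one unsigned 18x18 hardware multiplier."""
    assert 0 <= a <= MASK18 and 0 <= b <= MASK18
    return a * b


def mul36(a: int, b: int) -> int:
    """Unsigned 36x36 multiply built from four 18x18 partial products."""
    a_hi, a_lo = a >> 18, a & MASK18
    b_hi, b_lo = b >> 18, b & MASK18
    return ((mul18(a_hi, b_hi) << 36)
            + ((mul18(a_hi, b_lo) + mul18(a_lo, b_hi)) << 18)
            + mul18(a_lo, b_lo))


def sum_of_products(pairs: list[tuple[int, int]]) -> int:
    """AB + CD + ...: multiply each pair, then combine in the adder stage."""
    return sum(mul18(a, b) for a, b in pairs)


print(mul36((1 << 36) - 1, 12345678901))
print(sum_of_products([(1, 2), (3, 4), (5, 6), (7, 8)]))
```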
DSP Block Internals
DSP Block Interconnect Interface
Architectural Enhancements in Stratix-III™ and Stratix-IV™
D.M. Lewis et al., International Symposium on FPGAs, 2009
Online documentation (Stratix III)
Online documentation (Stratix IV)
New Features
• Programmable power management
• LUT-RAM
• LUT-Register Mode
• Enhanced DSP Block
Programmable Body Bias Control
Large regions
• Less body-bias control circuitry
Small regions
• Fine-grained power management
Power Efficiency
LUT-RAM
[Figure: LUT built from four SRAM cells, selected by inputs x and y]
Idea
• Use the SRAM bits as memory
• Granularity is LAB-wide
What is needed?
• Write capability
• Address and data signals for the write path
LUT-RAM Architecture
Supports one read and one write in a single cycle
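A minimal behavioral model of the LUT-RAM idea: the 2**k configuration bits of a k-input LUT are read by applying the address as the LUT inputs, and an added write port updates one bit. It assumes one read and at most one write per cycle, with the read returning the contents from before the write; the real read-during-write behavior is not stated on the slides, and LutRam is a hypothetical name.

```python
from typing import Optional


class LutRam:
    """Hypothetical model of a k-input LUT whose 2**k SRAM bits double as a 2**k x 1 RAM."""

    def __init__(self, k: int):
        self.bits = [0] * (1 << k)  # the LUT's configuration SRAM cells

    def cycle(self, read_addr: int, write_addr: Optional[int] = None, write_data: int = 0) -> int:
        """One read plus at most one write in a single cycle."""
        q = self.bits[read_addr]            # read path: the address acts as the LUT inputs
        if write_addr is not None:          # write path: the extra address/data signals
            self.bits[write_addr] = write_data & 1
        return q                            # assumed: read sees pre-write contents


ram = LutRam(2)                  # 2-input LUT (inputs x, y) -> 4 bits
x, y = 1, 1
ram.cycle(0, write_addr=(x << 1) | y, write_data=1)
print(ram.cycle((x << 1) | y))   # reading at address (x, y) returns the stored bit
```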
MLAB vs. LAB
ALM LUT-Register Mode
[Figure: R-S latch, https://upload.wikimedia.org/wikipedia/commons/c/c6/R-S_mk2.gif]
ALM LUT-Register Mode
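The figure (apparently an R-S latch) hints at the idea: combinational logic with feedback can hold state. The sketch below assumes, purely as an illustration, that LUTs are configured as two D latches in a master-slave arrangement, each latch being a 3-input function of data, enable, and its own fed-back output. LutRegister and d_latch are hypothetical names, and the model ignores timing and glitches.

```python
def d_latch(d: int, enable: int, q: int) -> int:
    """A D latch as a LUT function of (d, enable, fed-back q): q' = enable ? d : q."""
    return d if enable else q


class LutRegister:
    """Master-slave flip-flop from two LUT latches; captures d on a rising clock edge."""

    def __init__(self):
        self.master = 0
        self.slave = 0

    def step(self, d: int, clk: int) -> int:
        """Evaluate both latches for one (d, clk) sample and return the output."""
        self.master = d_latch(d, 1 - clk, self.master)    # transparent while clk is low
        self.slave = d_latch(self.master, clk, self.slave)  # transparent while clk is high
        return self.slave


reg = LutRegister()
samples = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]  # (d, clk)
print([reg.step(d, clk) for d, clk in samples])
```

The output changes only on the 0-to-1 clock transitions, even when d changes while the clock is high.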
DSP Block Capabilities
• High-performance, power-optimized, fully registered and pipelined multiplication operations
• Natively supported 9-bit, 12-bit, 18-bit, and 36-bit word lengths
• Natively supported 18-bit complex multiplications
• Efficiently supported floating-point arithmetic formats (24-bit significands for single precision and 53-bit significands for double precision)
• Signed and unsigned input support
• Built-in addition, subtraction, and accumulation units to combine multiplication results efficiently
• Cascading 18-bit input bus to form a tap-delay line for filtering applications
• Cascading 44-bit output bus to propagate output results from one block to the next block without external logic support
• Rich and flexible arithmetic rounding and saturation units
• Efficient barrel shifter support
• Loopback capability to support adaptive filtering
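As a generic illustration of what a rounding and saturation stage does to a wide accumulator result, the sketch below drops low-order bits with round-half-up and then clamps to a signed output width. The hardware's actual rounding options are not modeled, and the function and parameter names are hypothetical.

```python
def round_and_saturate(value: int, shift: int, out_bits: int) -> int:
    """Drop `shift` LSBs with round-half-up, then clamp to a signed `out_bits` range."""
    if shift > 0:
        # Add half an output LSB, then arithmetic-shift (Python's >> floors negatives)
        value = (value + (1 << (shift - 1))) >> shift
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, value))  # saturate instead of wrapping around


print(round_and_saturate(11, 2, 8), round_and_saturate(1000, 0, 8))
```

Saturating rather than wrapping keeps an overflowing result at the nearest representable extreme, which is usually the less harmful error in signal processing.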
DSP Block Overview
Multiply-Add
4-Multiply Add w/Accumulation
Cascading Output for FIR Filters
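The cascading input and output buses described above map onto a direct-form FIR filter, y[n] = sum over k of h[k]·x[n-k]: the input cascade forms the tap-delay line, and each block adds its products to the partial sum arriving on the output cascade. This is a functional sketch that ignores pipeline latency. fir_cascaded and taps_per_block are hypothetical names, with four taps per block taken from the 4-multiply-add configuration.

```python
def fir_cascaded(samples: list[int], coeffs: list[int], taps_per_block: int = 4) -> list[int]:
    """Direct-form FIR split across blocks of `taps_per_block` multipliers; each block
    adds its products to the partial sum cascaded in from the previous block."""
    delay_line = [0] * len(coeffs)        # shared tap-delay line (input cascade)
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]  # shift in the newest sample
        cascade = 0                         # partial sum entering the first block
        for start in range(0, len(coeffs), taps_per_block):
            block = range(start, min(start + taps_per_block, len(coeffs)))
            # multiply-add inside one block, plus the cascaded input from the previous block
            cascade += sum(coeffs[k] * delay_line[k] for k in block)
        outputs.append(cascade)
    return outputs


print(fir_cascaded([1, 0, 0, 0, 0, 0, 0], [1, 2, 3, 4, 5, 6]))  # impulse response
```

Feeding an impulse returns the coefficients themselves, which confirms that the split across blocks does not change the filter.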
Full DSP Block
Half-DSP Block Architecture
Four 9-bit Independent Half-DSP Multiplier Mode
Three 12-bit Independent Half-DSP Multiplier Mode
Two 18-bit Independent Half-DSP Multiplier Mode
36-bit Half-DSP Multiplier Mode
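The independent half-DSP modes trade multiplier count against operand width. A quick calculation shows that every mode consumes the same number of operand input bits (and produces the same number of product bits), which suggests, as our inference rather than a statement on the slides, that a fixed half-block I/O width limits the count. The count of one multiplier for the 36-bit mode is also an assumption.

```python
# (operand width, independent multipliers per half-DSP block), read off the slide titles;
# the single multiplier in 36-bit mode is an assumption
HALF_DSP_MODES = {"9-bit": (9, 4), "12-bit": (12, 3), "18-bit": (18, 2), "36-bit": (36, 1)}


def operand_bits(width: int, count: int) -> int:
    """Input bits for `count` width x width multipliers (two operands each).
    Each product is also 2 * width bits, so output bits are the same number."""
    return count * 2 * width


for name, (w, n) in HALF_DSP_MODES.items():
    print(name, operand_bits(w, n))
```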
54x54-bit Multiplier Mode
Used for double-precision floating-point
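A double-precision significand is 53 bits including the hidden bit, so it fits in a 54-bit operand, and the full significand product fits in the 108-bit result. The sketch below performs a double-precision multiply through one exact integer significand product. It assumes normal, nonzero inputs and a normal result, and it delegates rounding to Python's correctly rounded int-to-float conversion rather than modeling the hardware.

```python
import math


def significand(x: float) -> tuple[int, int]:
    """Split a normal, nonzero double into (53-bit integer significand, exponent)
    such that abs(x) == sig * 2**exp."""
    m, e = math.frexp(abs(x))      # abs(x) = m * 2**e with 0.5 <= m < 1
    sig = int(m * (1 << 53))       # exact: m carries at most 53 significant bits
    return sig, e - 53


def fp_mul(a: float, b: float) -> float:
    """Multiply two doubles via the integer product a 54x54 multiplier would compute."""
    sa, ea = significand(a)
    sb, eb = significand(b)
    prod = sa * sb                  # at most 106 bits, well within a 108-bit product
    assert prod.bit_length() <= 108
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    # float() rounds to nearest-even, matching IEEE multiplication for normal results
    return sign * math.ldexp(float(prod), ea + eb)


print(fp_mul(1.5, 2.25), fp_mul(0.1, 0.2) == 0.1 * 0.2)
```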
Architectural Enhancements in Stratix-V™
D.M. Lewis et al., International Symposium on FPGAs, 2013
Online documentation
Larger MLAB/LUT-RAM
4 Flip-Flops per ALM
Embedded Memories with Error Correction Codes (ECC)
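The slides do not say which code Stratix V's memories use. As a generic illustration of the principle only, the sketch below implements a textbook Hamming(7,4) single-error-correcting code: three parity bits protect four data bits, and the syndrome (the XOR of the 1-based positions holding a 1) points directly at a flipped bit. This is not Altera's scheme, and the function names are hypothetical.

```python
def hamming74_encode(d: int) -> list[int]:
    """Encode 4 data bits into a codeword laid out as positions 1..7: [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = [(d >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(cw: list[int]) -> int:
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(cw)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos          # zero for a valid codeword
    if syndrome:                     # nonzero: 1-based position of the single error
        c[syndrome - 1] ^= 1
    d1, d2, d3, d4 = c[2], c[4], c[5], c[6]
    return d1 | (d2 << 1) | (d3 << 2) | (d4 << 3)


cw = hamming74_encode(0b1011)
cw[4] ^= 1                           # simulate a single-bit upset in memory
print(hamming74_decode(cw) == 0b1011)
```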