1 Helena Krupnova©
FPGA Technology Snapshot:
Current Devices and Design Tools
Helena Krupnova, Gabriele Saucier
CSI/INPG, Grenoble, France

Outline
o Motivations
o Latest FPGA devices overview
  - Xilinx, Altera, Actel, Lucent FPGAs
  - quantitative and qualitative comparisons
o FPGA design tools
  - flows
  - hot issues
  - strategy comparisons and analysis
o Conclusions

INPG / Design&Reuse Trainings on SoC Design Using Design and Prototyping Platforms
o 8 trainings since March 1998
  - Grenoble (France), Noida (India), Santa Clara (USA)
o About 250 participants in total
o Topics:
  - latest FPGA devices & design tools
  - overview of the FPGA-based emulation and prototyping boards
  - IP and macro blocks available for FPGA implementations
  - multi-FPGA partitioning
  - case studies
  - invited lectures (BARCO Silex, Intel, BULL, STMicroelectronics)
  - vendor presentations
o Future seminars:
  - August 2000, Taiwan; October 2000, UK

Latest FPGA Devices
o Increased capacity
  - several million PLD gates
o Hierarchical structures
  - logic, routing hierarchies
o Embedded memories
  - RAM (single/dual port), ROM, CAM, FIFO support
  - implementation of general logic in Embedded Array Blocks
o Enhanced embedded arithmetic resources
  - carry, cascade chains, dedicated adder, multiplier, counter support
o Support of different I/O standards (LVDS)
o Clock management circuits (DLL, PLL)

FPGA Characteristics

Device Family                    Largest Device  Capacity,      Capacity,              Max User   Technology  Process
                                                 Basic Cells    Eq. Gates              I/O Pins
-------------------------------  --------------  -------------  ---------------------  ---------  ----------  -------
Xilinx 4000                      XC40250XV       8,464 CLBs     250,000                448        SRAM        0.25µ
Xilinx Virtex, Virtex-E          XCV3200E        73,008 LCs     4,074,387              804        SRAM        0.18µ
Xilinx Virtex-E Extended Memory  XCV812E         21,168 LCs     254,016 (logic gates)  556        SRAM        0.18µ
Altera FLEX 8000                 81500           1,296 LEs      16,000                 208        SRAM        0.5µ
Altera FLEX 10K                  EPF10K250       12,160 LEs     250,000                470        SRAM        0.25µ
Altera Apex20K, Apex20KE         EP20K1500E      51,840 LEs     1,500,000              808        SRAM        0.18µ
Altera ACEX 1K                   EP1K100         4,992 LEs      100,000                333        SRAM        0.18µ
Actel PROASIC                    A500K510        51,200 tiles   410,000                623        Flash       0.25µ
Lucent ORCA                      OR3L225B        11,552 LUTs    340,000                612        SRAM        0.25µ

FPGA Characteristics

Xilinx 4000
  Logic primitive:   LUT4
  Basic cell (CLB):  two 4-input LUTs, one 3-input LUT, 2 FFs per CLB (F, G, H function generators)
  Architecture:      CLBs organized in rows and columns
  Interconnect:      CLB routing: vertical and horizontal channels (single-length, double-length, quad, octal lines and long lines), Programmable Switch Matrices
  Carry chains:      YES
  Memories:          270K RAM bits of Distributed Select RAM
  Clock management:  N/A

Xilinx Virtex, Virtex-E
  Logic primitive:   LUT4
  Basic cell (CLB):  2 LUTs per SLICE, 2 SLICEs per CLB, 4 FFs per CLB
  Architecture:      CLBs organized in rows and columns
  Interconnect:      FastConnect: intra-CLB, local routing, general purpose routing (horizontal and vertical channels)
  Carry chains:      YES
  Memories:          851K RAM bits of BlockRAM + 1M of Distributed Select RAM
  Clock management:  DLL

Xilinx Virtex-E Extended Memory
  Logic primitive:   LUT4
  Basic cell (CLB):  2 LUTs per SLICE, 2 SLICEs per CLB, 4 FFs per CLB
  Architecture:      CLBs organized in rows and columns
  Interconnect:      FastConnect: intra-CLB, local routing, general purpose routing (horizontal and vertical channels)
  Carry chains:      YES
  Memories:          1,146K Block RAM bits + 301K Distributed Select RAM
  Clock management:  DLL

Altera FLEX 8000
  Logic primitive:   LUT4
  Basic cell:        1 LUT, 1 FF per Logic Element (LE)
  Architecture:      8 LEs per LAB, LABs organized in rows and columns
  Interconnect:      Local Interconnect within LAB, Row & Column Interconnect
  Carry chains:      YES
  Memories:          N/A
  Clock management:  N/A

Altera FLEX 10K
  Logic primitive:   LUT4
  Basic cell:        1 LUT, 1 FF per Logic Element (LE)
  Architecture:      8 LEs per LAB, LABs organized in rows and columns
  Interconnect:      Local Interconnect within LAB, Row & Column Interconnect
  Carry chains:      YES
  Memories:          41K RAM bits of Embedded Array Blocks (EABs)
  Clock management:  PLL

Altera Apex20K, Apex20KE
  Logic primitive:   LUT4
  Basic cell:        1 LUT, 1 FF per Logic Element (LE)
  Architecture:      MultiCore: LUT, Product Term, Memory; 10 LEs per LAB, 16 LABs per MegaLAB, MegaLABs organized in rows and columns
  Interconnect:      Local Interconnect within LAB, MegaLAB Interconnect, Row & Column Interconnect
  Carry chains:      YES
  Memories:          ESBs able to implement RAM, ROM, CAM, Product Term; 442K RAM bits
  Clock management:  PLL

Altera ACEX 1K
  Logic primitive:   LUT4
  Basic cell:        1 LUT, 1 FF per Logic Element (LE)
  Architecture:      8 LEs per LAB, LABs organized in rows and columns
  Interconnect:      Local Interconnect within LAB, Row & Column Interconnect; full-length and half-length row channels
  Carry chains:      YES
  Memories:          EABs (49K bits)
  Clock management:  PLL

Actel PROASIC
  Logic primitive:   3-input, 1-output tile
  Basic cell:        Tile can be configured either as a combinational cell or as a FF
  Architecture:      Tiles are organized in blocks, blocks of tiles are organized in rows and columns
  Interconnect:      Local network, long line network, bus network, global network
  Carry chains:      NO
  Memories:          Max of 60 Embedded RAM Blocks (256x9)
  Clock management:  N/A

Lucent ORCA
  Logic primitive:   LUT4
  Basic cell:        PLC consists of a PFU and a SLIC; PFU contains 8 LUT4s and 9 FFs
  Architecture:      Array of PLCs, the whole array is split into four quadrants
  Interconnect:      Segmented hierarchical routing: interquad routing, inter-PLC routing
  Carry chains:      YES
  Memories:          185K: implemented as a special operating mode of the PLCs
  Clock management:  PCM

FPGA Density Cross Reference

Device Family                    Largest     Logic  Logic Cells (1)  Capacity,                 Memory Bits        Typical Gates    Logic Gates
                                 Device      Block  per Logic Block  Logic Cells                                  (Logic and RAM)  (no memory)
-------------------------------  ----------  -----  ---------------  ------------------------  -----------------  ---------------  -----------
Xilinx 4000                      XC40250XV   CLB    2.375            8,464 CLBs                270K (2)           440,000 (7)      241,000
Xilinx Virtex-E                  XCV3200E    CLB    4.5              73,008 LCs / 16,224 CLBs  851K + 1M (2)      4,074,387 (3)    876,096
Xilinx Virtex-E Extended Memory  XCV812E     CLB    4.5              21,168 LCs / 4,704 CLBs   1,146K + 301K (2)  -                254,016
Altera FLEX 8000                 81500       LE     1                1,296 LEs                 N/A                16,000 (4)       16,000
Altera FLEX 10K                  EPF10K250   LE     1                12,160 LEs                41K                250,000 (4)      149,000
Altera Apex20K                   EP20K1500E  LE     1                51,840 LEs                442K               1,500,000 (4)    622,000
Altera ACEX 1K                   EP1K100     LE     1                4,992 LEs                 49K                100,000          60,000
Actel PROASIC                    A500K510    tile   0.4 (?)          51,200 tiles              138K               410,000          240,000 (?)
Lucent ORCA                      OR3L225B    LUT    1                11,552                    185K (5)           340,000 (6)      166,000

(1) A Logic Cell (by Xilinx) consists of a 4-input Look-Up Table (LUT) and a flip-flop (FF)
(2) Distributed Select RAM - uses Logic Cell resources
(3) Xilinx XC4000: about 28.5 gates/CLB (from XAPP059)
(4) Altera: about 12 gates/LE
(5) RAM mode uses logic cell resources
(6) 108 gates per PFU/SLIC; 12 gates per LUT/FF pair; the estimation assumes 30% of logic used as memory
(7) 25% of logic used as memory
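
The logic-only gate figures in this table can be cross-checked from the footnotes: cell count times gates per cell. The following is a minimal sketch of that arithmetic, not the authors' method. The function names are hypothetical; the factors come from footnotes (3) and (4), and the Actel block count and block size (60 blocks of 256x9) come from the characteristics table.

```python
# Gates-per-cell factors quoted in the table footnotes.
GATES_PER_XC4000_CLB = 28.5  # footnote (3)
GATES_PER_ALTERA_LE = 12     # footnote (4)


def logic_gates(cells: int, gates_per_cell: float) -> int:
    """Estimate logic-only gate capacity as cell count times gates per cell."""
    return round(cells * gates_per_cell)


def memory_bits(blocks: int, words: int, width: int) -> int:
    """Total bits held by `blocks` embedded RAM blocks of `words` x `width`."""
    return blocks * words * width


# Cell counts are taken from the "Capacity, Logic Cells" column.
estimates = {
    "Xilinx XC40250XV": logic_gates(8_464, GATES_PER_XC4000_CLB),
    "Altera EP20K1500E": logic_gates(51_840, GATES_PER_ALTERA_LE),
    "Altera EP1K100": logic_gates(4_992, GATES_PER_ALTERA_LE),
}
actel_ram_bits = memory_bits(60, 256, 9)

for device, gates in estimates.items():
    print(f"{device}: about {gates:,} logic gates")
print(f"Actel A500K510: {actel_ram_bits:,} embedded RAM bits")
```

The results (241,224; 622,080; 59,904 gates and 138,240 bits) round to the 241,000, 622,000, 60,000 and 138K entries in the table. The Virtex-E entries (876,096 and 254,016) also equal 12 gates per logic cell, although the footnotes do not state that factor.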

FPGA Capacity and I/O Pin Comparisons

[Chart: FPGA Capacity Comparison, Equivalent Gates, Logic Only (no memory). Axis: capacity, eq. gates (0 to 900,000). Device families: Xilinx 4000, Xilinx Virtex E, Xilinx Virtex EM, Altera FLEX 8000, Altera FLEX 10K, Altera APEX 20K, Altera ACEX 1K, Actel ProASIC, Lucent ORCA.]

[Chart: FPGA I/O Pin Number Comparison. Axis: I/O pin number (0 to 900). Same device families.]

FPGA Quantitative Classification

[Chart: device families positioned along three axes: size in equivalent gates (logic, no memory; 100,000 to 1,000,000), number of I/Os (500 to 1000), and embedded block memory (300K to 1.2M). Families shown: Xilinx 4000, Xilinx Virtex E, Xilinx Virtex EM, Altera Flex 8K, Altera Flex 10K, Altera Apex 20K, Actel ProASIC, Lucent ORCA.]

Memory Capacity Comparisons

[Chart: FPGA Memory Capacity Comparisons. Axis: 0 to 2,000,000. Series: block memory, distributed memory, total memory. Device families: Xilinx 4000, Xilinx Virtex E, Xilinx Virtex EM, Altera FLEX 8000, Altera FLEX 10K, Altera APEX 20K, Altera ACEX 1K, Actel ProASIC, Lucent ORCA.]

FPGA Design Tools

Synthesis
o Synopsys: FPGA Compiler, FPGA Compiler II, FPGA Express
o Exemplar: Leonardo Spectrum
o Synplicity: Synplify, Certify, Amplify

Place & Route
o Altera: Max+PLUS II, Quartus
o Xilinx: Alliance Series

FPGA Design Flow Specific Issues
o Partitioning for synthesis
o Floorplanning
o Macro block processing
o Memory inference
o Timing constraints
o FPGA advanced synthesis features
  - pipelining, retiming, register duplication, operator sharing
o HDL coding styles
  - CASE vs. IF-THEN-ELSE
o Debugging facilities
  - embedded logic analyzer functions
o Design issues
  - latches / gated clocks / using DLLs, PLLs / tri-state busses / buffering / FSM encoding styles
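
The slide names FSM encoding styles without listing them. As an illustration only, the sketch below generates state codes for three styles that synthesis tools commonly offer (binary, Gray and one-hot); the choice of these three and all function names are assumptions, not taken from the slides.

```python
def code_width(n_states: int) -> int:
    """Bits needed to number n_states states with a dense code."""
    return max(1, (n_states - 1).bit_length())


def binary_codes(n_states: int) -> list[str]:
    """Sequential binary encoding: state i gets the binary value i."""
    width = code_width(n_states)
    return [format(i, f"0{width}b") for i in range(n_states)]


def gray_codes(n_states: int) -> list[str]:
    """Reflected Gray encoding: consecutive states differ in one bit."""
    width = code_width(n_states)
    return [format(i ^ (i >> 1), f"0{width}b") for i in range(n_states)]


def one_hot_codes(n_states: int) -> list[str]:
    """One-hot encoding: one flip-flop per state, exactly one bit set."""
    return [format(1 << i, f"0{n_states}b") for i in range(n_states)]


n = 5
for name, codes in [("binary", binary_codes(n)),
                    ("gray", gray_codes(n)),
                    ("one-hot", one_hot_codes(n))]:
    print(f"{name:8s} {len(codes[0])} flip-flops: {' '.join(codes)}")
```

For five states the dense codes need 3 flip-flops and one-hot needs 5. One-hot trades registers for simpler next-state logic, which tends to suit register-rich LUT-based devices, but the better choice depends on the design and the tool.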

Partitioning for Synthesis
o Tradeoff between faster CPU times and better results
o RMM (Reuse Methodology Manual) guidelines:
  - putting the registers at the front or the back of the hierarchy blocks
  - grouping related combinatorial logic into the same module
  - separating timing-critical blocks from area-critical blocks
  - limiting each block to one clock and synthesizing the blocks separately
o Preserving the hierarchy blocks of about 50K gates and flattening the sub-hierarchy

Partitioning for synthesis, by tool:
Exemplar Leonardo
  • Manual hierarchy manipulation (group/ungroup)
  • Auto hierarchy control (defining a threshold)
  • The hierarchy is preserved by default
Synplicity Synplify
  • Automatic hierarchy handling (dissolving, optimizing and rebuilding)
  • syn_hier attribute
Synopsys Design Compiler, FPGA Compiler, FPGA Express
  • Manual partitioning by group/ungroup commands
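
The "preserve blocks of about 50K gates, flatten the sub-hierarchy" rule can be pictured as a top-down walk over the design hierarchy. The sketch below is a simplified illustration under stated assumptions, not how any of the listed tools implement it: the `Block` class, the `synthesis_units` function and the example gate counts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    own_gates: int = 0  # gates instantiated directly in this block
    children: list["Block"] = field(default_factory=list)

    def total_gates(self) -> int:
        """Gates in this block plus everything beneath it."""
        return self.own_gates + sum(c.total_gates() for c in self.children)


def synthesis_units(block: Block, limit: int = 50_000) -> list[str]:
    """Names of blocks to preserve; the hierarchy below each is flattened.

    A block that fits within `limit` (or has no children) becomes one unit.
    An oversized block is split into its children; any glue logic in the
    oversized block itself stays at that level and is not listed here.
    """
    if block.total_gates() <= limit or not block.children:
        return [block.name]
    units: list[str] = []
    for child in block.children:
        units.extend(synthesis_units(child, limit))
    return units


# Hypothetical design: a 45K-gate CPU and an 85K-gate DSP subsystem.
cpu = Block("cpu", 30_000, [Block("alu", 10_000), Block("regfile", 5_000)])
dsp = Block("dsp", 0, [Block("fir", 40_000), Block("fft", 45_000)])
top = Block("top", 0, [cpu, dsp])

units = synthesis_units(top)
print(units)
```

Here `cpu` fits under the limit and is kept as one unit with `alu` and `regfile` flattened into it, while `dsp` exceeds the limit and is split into `fir` and `fft`.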

Floorplanning
o "Divide and conquer" approach to the place and route problem
  - creating efficient local placement solutions (RLOC constraints in Xilinx)
  - evenly distributing logic across the chip, avoiding routing congestion (LOC constraints in Xilinx)
  - putting logic belonging to the critical block close together (clique constraints in Altera)
o Requires expert knowledge of the targeted architecture and the design tool
o Gate level - FPGA vendor floorplan tools
o RT-level floorplanner - Amplify tool from Synplicity

Macro Block Processing
o Inference, Instantiation
  - HDL code portability vs. efficiency
o Macro blocks are:
  - specifically optimized for speed/area
  - have predefined placement solutions

Macro block processing, by tool:
Exemplar Leonardo
  • Modgen macro generator
  • Inference
  • Instantiation
Synplicity Synplify
  • Inference
  • Instantiation
  • Context-driven module generation
Synopsys Design Compiler, FPGA Compiler, FPGA Express
  • Inference (DesignWare)
  • Instantiation
Xilinx
  • CoreGEN, LogiCORE, LogiBLOX macro generation tools
  • Macros should be instantiated in HDL code
Altera
  • LPM functions
  • MegaWizard Altera IP generator

Memory Processing
o Implementation possibilities:
  - embedded block memory
  - special LUT configuration mode (distributed memory)
  - using general logic resources
o Automatic inference for limited cases
o Memory wrapping and manual handcrafting

Memory implementation, by tool:
Exemplar Leonardo
  • Automatic RAM inference from HDL code
Synplicity Synplify
  • Automatic RAM inference from HDL code
  • Only synchronous RAM is inferred, asynchronous RAM not supported
  • syn_ramstyle attribute (to select between Xilinx RAM styles)
Synopsys DC, FC, FPGA Express
  • No RAM inference
Xilinx
  • Instantiating Xilinx RAM primitives in VHDL code
  • Generating memories with MemGen, LogiBLOX programs
Altera
  • EAB memory functions are accessible via LPM
  • genmem utility
  • Altera megafunctions (for FIFO, CAM, etc.)

Timing Optimization
o Timing Constraints
  - clock constraints
  - input arrival, output required time constraints
  - multicycle path constraints
  - false path constraints
  - multiple clock constraints
o Timing Optimization
  - pipelining, retiming
  - constraints on the fanout (inserting buffers for cutting long-delay nets)
  - resource sharing
  - register duplication
  - assigning critical logic functions to EABs
o Overconstraining can lead to worse results
  - the result is not a linear function of the constraint values
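
Register duplication, listed among the timing optimizations, reduces the fanout of a heavily loaded register by creating copies and dividing the loads among them. The sketch below shows only that bookkeeping, under simplifying assumptions: the function and the `reg_dup` naming are hypothetical, and loads are assigned round-robin, whereas a real tool would typically group loads by placement and timing criticality.

```python
import math


def duplicate_register(loads: list[str], max_fanout: int) -> dict[str, list[str]]:
    """Split the loads of one register across enough copies that no copy
    drives more than `max_fanout` loads."""
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    copies = max(1, math.ceil(len(loads) / max_fanout))
    groups: dict[str, list[str]] = {f"reg_dup{i}": [] for i in range(copies)}
    # Round-robin assignment keeps the copies evenly loaded.
    for idx, load in enumerate(loads):
        groups[f"reg_dup{idx % copies}"].append(load)
    return groups


# Hypothetical net: one register driving 10 loads, fanout limit of 4.
groups = duplicate_register([f"load{i}" for i in range(10)], max_fanout=4)
for reg, driven in groups.items():
    print(f"{reg} drives {len(driven)} loads: {', '.join(driven)}")
```

With 10 loads and a limit of 4, three copies are needed. The cost is the extra flip-flops and the added load on whatever drives the register inputs.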

Conclusions
o Device vendors
  - extended libraries of IPs and macro blocks
  - FPGA test boards for the latest devices
  - embedded logic analyzers
  - internet-oriented software
  - expert programs
o Tool vendors
  - moving to a higher level of abstraction
    • RTL floorplanning
  - better integration between synthesis and layout
    • second-pass synthesis with place and route SDF files
    • floorplanning-based synthesis
  - incremental synthesis / place and route, ECO support
  - optimizing runtimes