Reconfigurable Computing - FPGA structures
John Morris
Chung-Ang University
The University of Auckland
‘Iolanthe’ at 13 knots on Cockburn Sound, Western Australia
FPGA Architectures
Programmable logic takes many forms
Originally, devices contained tens of gates and flip-flops. These early devices were generally called PALs (Programmable Array Logic). A typical structure was a programmable And-Or array feeding a set of flip-flops.
With 10-20 inputs and outputs and ~20 flip-flops, they could:
• implement small state machines, and
• replace large amounts of discrete ‘glue’ logic
[Figure: a programmable And-Or array feeding flip-flops (FF); Inputs (~20), Outputs (~20)]
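The And-Or structure can be modelled behaviourally. The sketch below is a simplified model, assuming each output ORs its own group of programmed product terms and ignoring the flip-flops; `and_term`, `and_or_array` and `HALF_ADDER` are hypothetical names, not part of any device tool.

```python
# A product term is a list of (input_index, polarity) literals:
# polarity True selects the input itself, False selects its complement.
InputLiteral = tuple[int, bool]
ProductTerm = list[InputLiteral]


def and_term(inputs: list[int], term: ProductTerm) -> int:
    """AND plane: 1 only if every selected (possibly complemented) input matches."""
    return int(all(inputs[i] == int(polarity) for i, polarity in term))


def and_or_array(inputs: list[int], outputs: list[list[ProductTerm]]) -> list[int]:
    """OR plane: each output ORs its own group of product terms."""
    return [int(any(and_term(inputs, t) for t in terms)) for terms in outputs]


# Hypothetical 'programming': out0 = a XOR b, out1 = a AND b (a half adder)
HALF_ADDER: list[list[ProductTerm]] = [
    [[(0, True), (1, False)], [(0, False), (1, True)]],  # a.b' + a'.b
    [[(0, True), (1, True)]],                             # a.b
]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_or_array([a, b], HALF_ADDER))
```

Reprogramming the device amounts to changing which literals appear in each product term.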
Programmable Logic
Memory should also be included in the class of programmable logic! It finds application in LUTs, state machines, ...
From early UV EPROMs holding a few kbytes, we now have many styles of memory that retain their values when power is removed, with capacities in Mbytes.
Memory is an important consideration when designing reconfigurable systems. FPGA technology does not provide large amounts of memory, and this can be a constraint, especially if you are trying to produce a compact, single chip solution to your problem!
Modern Programmable Logic
As technology has evolved, so have programmable devices
Today’s FPGAs contain:
• Millions of ‘gates’
• Memory
• Support for several I/O protocols: TTL, LVDS, GTL, …
• Arithmetic units: adders, multipliers
• Processor cores
FPGA Architecture
The ‘core’ architecture of most modern FPGAs consists of:
• Logic blocks
• Interconnection resources
• I/O blocks
Typical FPGA Architecture
Logic blocks are embedded in a ‘sea’ of connection resources.
CLB = logic block, IOB = I/O buffer, PSM = programmable switch matrix
This particular arrangement is similar to that in Xilinx 4000 (and later) chips; devices from other manufacturers are similar in overall structure.
Logic Blocks
Combination of:
• And-Or array or Look-Up Table (LUT)
• Flip-flops
• Multiplexors
General aim:
• Arbitrary boolean function of several variables
• Storage
Designers try to estimate what combination of resources will produce the most efficient application circuit mappings.
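A LUT implements an arbitrary boolean function by storing its truth table. The sketch below is a behavioural model only, assuming a k-input LUT with 2^k configuration bits indexed by the inputs read as a binary number (input 0 as the least significant bit); the class `LUT` and the example functions are hypothetical.

```python
from typing import Callable


class LUT:
    """A k-input look-up table holding 2**k configuration bits."""

    def __init__(self, k: int, func: Callable[..., int]) -> None:
        self.k = k
        # 'Configure' the LUT: store func's output for every input pattern.
        # The table index is the inputs read as a binary number (input 0 = LSB).
        self.bits = [
            func(*[(index >> i) & 1 for i in range(k)]) & 1
            for index in range(2 ** k)
        ]

    def __call__(self, *inputs: int) -> int:
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        index = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.bits[index]


# Any boolean function of k variables fits in one k-input LUT:
at_least_3 = LUT(4, lambda a, b, c, d: int(a + b + c + d >= 3))
fa_sum = LUT(3, lambda a, b, cin: a ^ b ^ cin)  # a full adder's sum output
```

Evaluation is a single table read, so the delay is the same whatever function is stored; the cost is that the table doubles with every extra input.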
Xilinx 4000 (and on) CLB
•3 LUT blocks
•2 Flip-Flops (Asynch Reset)
•Multiplexors
•Clock / Reset Lines
Adders
Adders appear in most designs:
• Arithmetic
  - Adders (including subtracters)
  - Other arithmetic operators, e.g. multipliers, dividers
• Counters (including program counters in processors)
• Incrementors, decrementors, etc.
They also often appear on the critical path, so adder performance can be crucial for system performance.
Because of their importance, researchers are still searching for better ways to add!
Adder structures proposed already:
• Ripple carry
• Carry select
• Carry skip
• Carry look-ahead
• Manchester
• … and several dozen more variants
Ripple Carry Adder
The simplest and most well known adder
How long does it take an n-bit adder to produce a result?
n × propagation delay (FA: a or b → carry).
We can do better than this, using one of many known faster structures, but what are the advantages of a ripple carry adder?
• Small
• Regular
• Fits easily into a 2-D layout!
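[Figure: n-bit ripple carry adder, a chain of full adders (FA); stage i adds ai, bi and the carry in (cin) from stage i-1, producing sum bit si and a carry out (cout) to stage i+1; stage n-1 produces the final carryout]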
This regularity is very important when packing circuitry into the fixed 2-D layout of an FPGA!
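A bit-level model makes the carry dependency explicit. This is a behavioural sketch, not a hardware description; `full_adder` and `ripple_carry_add` are hypothetical names. Each loop iteration needs the carry produced by the previous one, which is the software counterpart of the n × FA delay.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One FA cell: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_carry_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers with a chain of n full adders.

    Stage i cannot finish until the carry from stage i-1 arrives, so in
    hardware the worst-case delay grows as n x (FA carry delay).
    """
    total = 0
    carry = cin
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry  # (n-bit sum, carryout)


if __name__ == "__main__":
    print(ripple_carry_add(0b1011, 0b0110, 4))
```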
Ripple Carry Adders
Ripple carry adder performance is limited by propagation of carries
[Figure: the ripple carry chain FA0 … FAn-1 (inputs ai, bi, outputs si, final carryout), with groups of FA stages placed in successive logic blocks (LB)]
On an FPGA, the carry link between logic blocks is often the major source of delay, because one or two FA blocks will often fit in a single logic block!
Connections within a logic block are fast!
Connections between logic blocks are slower
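A rough delay model makes this concrete. Assume (hypothetically) that k FA cells fit in each logic block, that each FA adds a carry delay t_c, and that every hop between logic blocks adds a further routing delay t_r; these symbols are introduced here for illustration, not taken from any data sheet:

```latex
t_{\text{add}}(n) \approx n\,t_c + \left(\left\lceil \frac{n}{k} \right\rceil - 1\right) t_r
```

For example, with k = 2 and n = 16 the carry crosses 7 block boundaries, so when t_r is several times t_c the routing term dominates the total.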
‘Fast Carry’ Logic
Critical delay: transmission of the carry out from one logic block to the next.
Solution (in most modern FPGAs): ‘fast carry’ logic, i.e. special paths between logic blocks used specifically for the carry out. The result is very fast ripple carry adders!
More sophisticated adders?
• Carry select: uses ripple carry blocks, so it can use fast carry logic. Should be faster for wide datapaths?
• Carry lookahead: uses large amounts of logic and multiple logic blocks, so it is hard to make it faster for small adders!
Carry Select Adder
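[Figure: 8-bit carry select adder. One n-bit ripple carry adder adds a0-3, b0-3 and cin, producing sum0-3 and cout3. Two more n-bit ripple carry adders add a4-7 and b4-7, one with carry in 0 and one with carry in 1, each producing its own sum4-7 and cout7. Multiplexors (0/1) controlled by cout3 select sum4-7 and the final carry.]
Here we build an 8-bit adder from 4-bit blocks.
The blocks are ‘standard’ n-bit ripple carry adders, with n = any suitable value.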
The first block adds the 4 low order bits; after 4*tpd it will produce a carry out, cout3.
The other two blocks ‘speculate’ on the value of cout3: one assumes it will be 0, the other assumes 1.
After 4*tpd we will have:
• sum0-3 (final sum bits)
• cout3 (from the low order block)
• sum4-7 and cout7 (from the block assuming cin = 0)
• sum4-7 and cout7 (from the block assuming cin = 1)
cout3 selects the correct sum4-7 and carry out.
All 8 sum bits and the carry are available after 4*tpd(FA) + tpd(multiplexor).
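The 8-bit scheme can be checked with a behavioural model. This sketch assumes three 4-bit ripple carry blocks and a 2-way selection driven by cout3; `ripple_block` and `carry_select_add8` are hypothetical names, and the comments mark where the hardware delays would arise.

```python
def ripple_block(a: int, b: int, n: int, cin: int) -> tuple[int, int]:
    """'Standard' n-bit ripple carry block: returns (n-bit sum, carry out)."""
    total = 0
    carry = cin
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (x & carry) | (y & carry)
    return total, carry


def carry_select_add8(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """8-bit carry select adder built from three 4-bit ripple carry blocks."""
    a_lo, a_hi = a & 0xF, (a >> 4) & 0xF
    b_lo, b_hi = b & 0xF, (b >> 4) & 0xF
    # In hardware these three blocks run in parallel: about 4*tpd(FA).
    sum0_3, cout3 = ripple_block(a_lo, b_lo, 4, cin)
    sum4_7_if0, cout7_if0 = ripple_block(a_hi, b_hi, 4, 0)  # assumes cout3 = 0
    sum4_7_if1, cout7_if1 = ripple_block(a_hi, b_hi, 4, 1)  # assumes cout3 = 1
    # cout3 drives the multiplexors: one extra tpd(multiplexor).
    if cout3:
        sum4_7, cout7 = sum4_7_if1, cout7_if1
    else:
        sum4_7, cout7 = sum4_7_if0, cout7_if0
    return (sum4_7 << 4) | sum0_3, cout7
```

For example, `carry_select_add8(200, 100)` returns `(44, 1)`, since 300 = 256 + 44.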
Carry Select Adder
This scheme can be generalized to any number of bits:
• Select a suitable block size (e.g. 4, 8)
• Replicate all blocks except the first: one with cin = 0, one with cin = 1
• Use the final cout from the preceding block to select the correct set of outputs for the current block
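The cost and delay of the generalized scheme can be estimated with a simple model. Assume an n-bit adder split into m = n/k blocks of k bits, with every block (and its duplicate) computing in parallel and each block’s selected carry passing through one multiplexor on its way to the next block:

```latex
\begin{aligned}
t_{\text{CSA}}(n) &\approx k\,t_{pd}(\text{FA}) + (m - 1)\,t_{pd}(\text{mux}), \qquad m = n/k \\
\text{full adders} &= k + 2k\,(m - 1) = 2n - k
\end{aligned}
```

With n = 8 and k = 4 this gives 4 tpd(FA) + tpd(mux) and 12 full adders; with n = 32 and k = 8 it gives 8 tpd(FA) + 3 tpd(mux), against 32 tpd(FA) for a plain ripple carry adder, at the cost of 56 full adders instead of 32.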
Fast Adders
Many other fast adder schemes have been proposed, e.g.:
• Carry-skip
• Manchester
• Carry-save
• Carry look-ahead
If implementing an adder (e.g. in programmable logic), do a little research first!
Fast Adders
Challenge: what style of adder is fastest / most compact for any FPGA technology?
The answer is not simple:
• For small adders (n < ?), fast carry logic will certainly make a simple ripple carry adder fastest. It will also use the minimum resources, but it will need to be laid out as a column or row.
• For larger adders (? < n < ?), carry select styles are likely to be best, because they use ripple carry blocks efficiently.
• For very large adders (n > ?), a carry look-ahead adder may be faster, but it will use considerably more resources!
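The resource cost of carry look-ahead is visible in its carry equations. This sketch computes every carry directly from generate (g_i = a_i AND b_i) and propagate (p_i = a_i XOR b_i) signals as flat two-level logic; it is an illustration only, and practical designs usually build look-ahead hierarchically in small groups (often 4 bits) to limit fan-in. The function names are hypothetical.

```python
def lookahead_carries(a: int, b: int, n: int, c0: int = 0) -> list[int]:
    """Return carries c1..cn, each computed directly from generate/propagate.

    c_{i+1} = g_i + p_i g_{i-1} + ... + p_i ... p_1 g_0 + p_i ... p_0 c0,
    i.e. two-level AND-OR logic: no carry waits for the previous one.
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    carries = []
    for i in range(n):
        term = c0                      # product term p_i ... p_0 c0
        for j in range(i + 1):
            term &= p[j]
        c = term
        for j in range(i + 1):         # product terms g_j p_{j+1} ... p_i
            t = g[j]
            for k in range(j + 1, i + 1):
                t &= p[k]
            c |= t
        carries.append(c)
    return carries


def cla_add(a: int, b: int, n: int, c0: int = 0) -> tuple[int, int]:
    """Sum bit i is p_i XOR c_i; returns (n-bit sum, carry out)."""
    c = [c0] + lookahead_carries(a, b, n, c0)
    total = 0
    for i in range(n):
        total |= ((((a >> i) ^ (b >> i)) & 1) ^ c[i]) << i
    return total, c[n]


def product_terms(n: int) -> int:
    """AND terms needed for c1..cn: carry c_{i+1} needs i + 2 of them."""
    return sum(i + 2 for i in range(n))  # = n(n + 3) / 2
```

For n = 32, `product_terms(32)` is 560 AND terms, the widest with 33 inputs, which is the growth in resources the text warns about.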
Exploiting a manufacturer’s fast carry logic
To use the Altera fast carry logic, write your adder like this:

LIBRARY ieee;
USE ieee.std_logic_1164.all;
LIBRARY lpm;
USE lpm.lpm_components.all;

ENTITY adder IS
  PORT ( c_in  : IN  STD_LOGIC;
         a, b  : IN  STD_LOGIC_VECTOR(15 DOWNTO 0);
         sum   : OUT STD_LOGIC_VECTOR(15 DOWNTO 0);
         c_out : OUT STD_LOGIC );
END adder;

ARCHITECTURE lpm_structure OF adder IS
BEGIN
  -- The LPM adder is mapped by the Altera tools onto the fast carry logic
  instance: lpm_add_sub
    GENERIC MAP ( LPM_WIDTH => 16 )
    PORT MAP ( cin => c_in, dataa => a, datab => b,
               result => sum, cout => c_out );
END lpm_structure;
What about that carry in?
In an ALU, we usually need to do more than just add! Subtractions are also common. Observe that
c = a - b
is equivalent to
c = a + (-b)
So we can use an adder for subtractions if we can negate the 2nd operand
Negation in 2’s complement arithmetic?
Adder / Subtractor
Negation in 2’s complement arithmetic? Rule:
• Complement each bit
• Add 1
e.g.
              Binary   Decimal
              0001       1
Complement    1110
Add 1         1111      -1

              0110       6
Complement    1001
Add 1         1010      -6
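The rule can be checked on n-bit patterns with a short sketch; `negate` and `to_signed` are hypothetical helper names, and the mask keeps results to n bits as a fixed-width adder would.

```python
def negate(x: int, n: int) -> int:
    """n-bit two's complement negation: complement each bit, then add 1."""
    mask = (1 << n) - 1
    return ((x ^ mask) + 1) & mask


def to_signed(x: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's complement number."""
    return x - (1 << n) if x & (1 << (n - 1)) else x


if __name__ == "__main__":
    for x in (0b0001, 0b0110):
        print(f"{x:04b} -> {negate(x, 4):04b} = {to_signed(negate(x, 4), 4)}")
```

One edge case: the most negative value, 1000 (-8 in 4 bits), negates to itself because +8 cannot be represented.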
Adder / Subtractor
Using an adder:
• Complement each bit using an inverter
• Use the carry in to add 1!
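[Figure: adder/subtractor built from a chain of full adders (FA) with inputs a and b, carry in cin, outputs c and carry; an add/subtract control (0/1) selects b or its complement]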
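A behavioural sketch of the adder/subtractor, assuming an XOR with the add/subtract control as the conditional inverter (equivalent to selecting b or its complement) and the same control driving the carry in; `add_sub` is a hypothetical name.

```python
def add_sub(a: int, b: int, subtract: int, n: int) -> tuple[int, int]:
    """n-bit adder/subtractor: subtract = 0 gives a + b, 1 gives a - b.

    When subtracting, each bit of b is complemented (XOR with 1) and the
    carry in is set to 1, which supplies the '+1' of two's complement negation.
    """
    carry = subtract  # cin = 1 for subtraction
    result = 0
    for i in range(n):
        x = (a >> i) & 1
        y = ((b >> i) & 1) ^ subtract  # conditional complement of b
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (x & carry) | (y & carry)
    return result, carry  # (n-bit result, carry out)
```

For unsigned operands with subtract = 1, a final carry of 1 means a ≥ b (no borrow): `add_sub(6, 1, 1, 4)` returns `(5, 1)`, while `add_sub(1, 6, 1, 4)` returns `(11, 0)`, the 4-bit pattern 1011 for -5.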
Example - Generate

LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY adder IS
  GENERIC ( n : INTEGER := 16 );
  PORT ( c_in  : IN  std_ulogic;
         a, b  : IN  std_ulogic_vector(n-1 DOWNTO 0);
         sum   : OUT std_ulogic_vector(n-1 DOWNTO 0);
         c_out : OUT std_ulogic );
END adder;

ARCHITECTURE rc_structure OF adder IS
  SIGNAL c : std_ulogic_vector(1 TO n-1);  -- carries between stages
  -- assumes a fulladd entity with this port list is compiled into work
  COMPONENT fulladd
    PORT ( c_in, x, y : IN  std_ulogic;
           s, c_out   : OUT std_ulogic );
  END COMPONENT;
BEGIN
  -- stage 0 takes the external carry in
  FA_0: fulladd PORT MAP ( c_in => c_in, x => a(0), y => b(0),
                           s => sum(0), c_out => c(1) );
  -- stages 1 .. n-2, positional association (c_in, x, y, s, c_out)
  G_1: FOR i IN 1 TO n-2 GENERATE
    FA_i: fulladd PORT MAP ( c(i), a(i), b(i), sum(i), c(i+1) );
  END GENERATE;
  -- stage n-1 drives the external carry out
  FA_n: fulladd PORT MAP ( c(n-1), a(n-1), b(n-1), sum(n-1), c_out );
END rc_structure;
IEEE 1164 standard logic package
Bus pull-up and pull-down resistors can be ‘inserted’: initialise a bus signal to ‘H’ or ‘L’.
A ‘0’ or ‘1’ from any driver will override the weak ‘H’ or ‘L’:

SIGNAL not_ready : std_logic := 'H';

IF seek_finished = '1' THEN
  not_ready <= '0';
END IF;
[Figure: a shared /ready line, pulled up to VDD through a 10k resistor, driven by DeviceA, DeviceB and DeviceC]
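Why a driven ‘0’ beats the weak ‘H’ can be seen in the std_logic resolution table. The sketch below mirrors the table and folding scheme used by the standard’s resolution function (a single driver passes through; otherwise start from ‘Z’ and combine each driver in turn); `VALUES`, `RESOLUTION` and `resolve` are our own names for this Python model.

```python
# Value order used by the IEEE 1164 resolution table
VALUES = "UX01ZWLH-"

# Rows/columns in VALUES order: RESOLUTION[x][y] is the resolved value of x and y
RESOLUTION = [
    "UUUUUUUUU",  # U
    "UXXXXXXXX",  # X
    "UX0X0000X",  # 0
    "UXX11111X",  # 1
    "UX01ZWLHX",  # Z
    "UX01WWWWX",  # W
    "UX01LWLWX",  # L
    "UX01HWWHX",  # H
    "UXXXXXXXX",  # -
]


def resolve(drivers: list[str]) -> str:
    """Resolve the values of all drivers of a std_logic signal."""
    if len(drivers) == 1:
        return drivers[0]       # a single driver is passed through unchanged
    result = "Z"                # start from 'not driven' and fold in each value
    for value in drivers:
        result = RESOLUTION[VALUES.index(result)][VALUES.index(value)]
    return result
```

For example, `resolve(["H", "0"])` gives `'0'` (one device pulls the line low), `resolve(["H", "Z", "Z"])` gives `'H'` (nobody drives, so the pull-up wins), and `resolve(["0", "1"])` gives `'X'` (a bus conflict).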