Team AND
An introduction to FPGAs & VHDL
Stéphane Domas
Introduction
FPGAs
• architecture principles,
• an example: the Spartan 3,
• advantages and drawbacks for scientific computing.
VHDL
• syntax and semantics,
• programming and simulation tools.
Équipe AND 2 / 27
FPGAs
General architecture
• based on SRAM cells (Static Random-Access Memory),
• composed of configurable logic blocks (CLBs),
• a mesh to interconnect the blocks,
• I/O cells,
• sometimes specialized blocks: RAM, clock generation, multipliers, ...
• the whole is configurable at will.
FPGAs
Logical blocks
• Architecture:
  • one or several Look-Up Tables (LUTs),
  • multiplexers,
  • one or several output flip-flops (or latches).
• Functionalities:
  • LUTs ⇒ simple logic functions on a few bits,
  • multiplexers ⇒ combine the outputs of the LUTs,
  • flip-flops (or latches) ⇒ 1-bit memories.
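As a rough illustration of the three ingredients listed above, the following VHDL sketch models a 4-input LUT whose output can optionally go through a flip-flop. The entity name `lut4_ff`, the generics `INIT` and `REGISTERED`, and the port names are hypothetical and chosen for illustration only; this is a behavioral model, not an actual vendor primitive.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lut4_ff is
  generic (
    INIT       : std_logic_vector(15 downto 0) := x"8000";  -- truth table; x"8000" = 4-input AND
    REGISTERED : boolean := true                             -- true: output goes through the flip-flop
  );
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(3 downto 0);  -- the 4 input bits address the truth table
    q   : out std_logic
  );
end entity lut4_ff;

architecture rtl of lut4_ff is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- LUT: the inputs select one bit of the truth table
  lut_out <= INIT(to_integer(unsigned(a)));

  -- output flip-flop: 1-bit memory updated on the rising clock edge
  ff : process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process ff;

  -- multiplexer: choose the registered or the combinational output
  q <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

Changing `INIT` changes the logic function without touching the structure, which is the essence of a configurable logic block.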
FPGAs
Mesh
• Manhattan type (as a grid or a torus),
• routing matrix at each crossing,
• several channels:
  • between direct neighbors,
  • between close neighbors,
  • sometimes between n-neighbors,
  • some dedicated to distributing clock signals,
  • ...
FPGAs
Inputs/Outputs
• several types:
  • power supply,
  • external clock,
  • FPGA initialization and programming,
  • general purpose.
• configurable I/O cells ⇒ adaptation to the outside environment (e.g. different logic levels).
FPGAs
Abstraction
• state machine:
  • CLB = vertex with n inputs and m outputs of 1 bit each,
  • interconnection mesh = edges,
• particularities:
  • the physical time needed to propagate a signal over an edge must be taken into account,
  • vertices evolve synchronously or asynchronously ⇒ a clock signal is distributed to all/some CLBs,
  • the output of a CLB can be delayed ⇒ the state of several 1-bit signals can be stored in a "register" built from D-type flip-flops.
Spartan 3
• the base: CLBs, I/O cells,
• bonus: RAM and multipliers (18 bits),
• DCM = Digital Clock Manager (clock generation, phase shift, ...)
Spartan 3
• 4 slices per CLB,
• LUTs can be used as RAM or as shift registers.
Spartan 3
version 3A
version 3A-DSP
For scientific computing
Advantages
• natural parallelism,
• natural task pipelining,
• dynamic and fast reconfiguration,
• easy to integrate within an embedded board.
Drawbacks
• extremely low-level programming,
• non-imperative algorithm design,
• binary logic ⇒ computation with integer values (no built-in floating-point operations),
• low clock frequency (compared to classic processors),
• "big" FPGAs are costly,
• development/production environments are not free.
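To make the "integer values only" drawback concrete, here is a minimal sketch of how a fractional multiplication is usually emulated with fixed-point integers. The entity name `fixmul`, the port names, and the Q8.8 format (8 integer bits, 8 fractional bits) are assumptions made for this example, not something the slides prescribe.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fixmul is
  port (
    x : in  signed(15 downto 0);  -- Q8.8: real value = x / 256
    y : in  signed(15 downto 0);  -- Q8.8: real value = y / 256
    p : out signed(15 downto 0)   -- Q8.8 product, truncated, no overflow check
  );
end entity fixmul;

architecture rtl of fixmul is
  signal full : signed(31 downto 0);  -- full-precision product, Q16.16
begin
  full <= x * y;
  -- keep 8 integer and 8 fractional bits: drop the 8 lowest and the 8 highest bits
  p <= full(23 downto 8);
end architecture rtl;
```

For instance, 1.5 is encoded as 384 and 2.25 as 576; their integer product is 221184, and bits 23 to 8 give 864, which is 3.375 in Q8.8. The designer has to choose the format and handle overflow and rounding explicitly.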
VHDL : principles
• VHDL = VHSIC Hardware Description Language (VHSIC = Very High Speed Integrated Circuit),
• provides an abstraction of the hardware,
• based on a functional description of blocks with n inputs and m outputs (called ports),
• hierarchical aggregation of blocks,
• each block contains "instructions" to manipulate values called signals,
• signal = aggregation of n physical signals of 1 bit.
VHDL : syntax
to create a block
1. description of its I/O ports and generic parameters (entity),
2. description of its structure and its code (architecture).
entity clockgen is
  generic ( Tps : Time );
  port ( phase : out std_logic );
end entity clockgen;

architecture rtl of clockgen is
  ...
end architecture rtl;
VHDL : syntax
content of architecture
1. sub-blocks declaration,
2. signal declaration,
3. sub-blocks instantiation,
4. code.
VHDL : syntax
Example:

architecture rtl of div_tb is
  component clockgen
    generic ( Tps : Time );
    port ( phase : out std_logic );
  end component;
  ...
  signal ext_clk : std_logic;
  ...
begin
  clkgen0 : clockgen
    generic map ( Tps => 5 ns )
    port map ( phase => ext_clk );
  ...
end architecture rtl;
VHDL : syntax
the "code"
• assignment: signal <= expression
• expression =
  • composition of signals using logic, relational or arithmetic operators,
  • type conversion (e.g. std_logic_vector → signed),
  • conditional assignment.

valsign <= val(3);
valabs <= "0000" & unsigned(val) when (valsign = '0')
          else "0000" & unsigned(-val);
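The two assignments above only compile once their signals are declared. The following sketch wraps them in a complete design unit, assuming that `val` is a `signed(3 downto 0)` and `valabs` an `unsigned(7 downto 0)`; these types and the entity name `absval` are not given on the slide and are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity absval is
  port (
    val    : in  signed(3 downto 0);    -- assumed type: 4-bit two's complement
    valabs : out unsigned(7 downto 0)   -- assumed type: absolute value on 8 bits
  );
end entity absval;

architecture rtl of absval is
  signal valsign : std_logic;
begin
  -- the most significant bit is the sign bit
  valsign <= val(3);

  -- conditional assignment: pad with zeros, negate first if val is negative
  valabs <= "0000" & unsigned(val) when (valsign = '0')
            else "0000" & unsigned(-val);
end architecture rtl;
```

Note that the negation happens on 4 bits: for val = -8 ("1000"), `-val` wraps back to "1000", which reads as 8 once converted to unsigned, so the result is still correct.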
VHDL : syntax
the "code" : two execution contexts
• outside a process: concurrent execution ⇒ continuous assignments (i.e. they follow physical changes),
• within a process: sequential execution ⇒ signal assignments only become effective at the end of the process.
VHDL : syntax
example outside a process

entity setreset is
  port ( s, r : in bit; q, qb : out bit );
end setreset;

architecture rtl of setreset is
  signal sq, sqb : bit;
begin
  q   <= sq;          -- assignment 1
  qb  <= sqb;         -- assignment 2
  sq  <= sqb NOR r;   -- assignment 3
  sqb <= sq NOR s;    -- assignment 4
end architecture rtl;
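One way to check that the four concurrent assignments above behave as a set/reset latch is a small self-checking testbench. In the sketch below the design is copied from the example; the testbench `setreset_tb`, its stimuli, and its timings are hypothetical. It starts with `r` asserted, because with both inputs at '0' from time zero the two cross-coupled NOR assignments have no stable initial state in simulation.

```vhdl
-- design under test, copied from the example above
entity setreset is
  port ( s, r : in bit; q, qb : out bit );
end entity setreset;

architecture rtl of setreset is
  signal sq, sqb : bit;
begin
  q   <= sq;
  qb  <= sqb;
  sq  <= sqb nor r;
  sqb <= sq nor s;
end architecture rtl;

-- hypothetical testbench
entity setreset_tb is
end entity setreset_tb;

architecture sim of setreset_tb is
  signal s     : bit := '0';
  signal r     : bit := '1';  -- start in reset so the latch has a defined state
  signal q, qb : bit;
begin
  dut : entity work.setreset port map ( s => s, r => r, q => q, qb => qb );

  stim : process
  begin
    wait for 10 ns;
    assert q = '0' and qb = '1' report "reset failed" severity failure;
    r <= '0';                 -- release reset: the latch must hold its state
    wait for 10 ns;
    assert q = '0' and qb = '1' report "hold after reset failed" severity failure;
    s <= '1';                 -- set
    wait for 10 ns;
    assert q = '1' and qb = '0' report "set failed" severity failure;
    s <= '0';                 -- release set: the latch must hold its state
    wait for 10 ns;
    assert q = '1' and qb = '0' report "hold after set failed" severity failure;
    wait;                     -- end of stimuli
  end process stim;
end architecture sim;
```

The order in which the four assignments are written has no influence on the result, since each one is re-evaluated whenever one of its inputs changes.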
VHDL : syntax
process principles:
• sequential execution within the process,
• sensitivity list = parameters of the process,
• any change in the state of a parameter ⇒ the process starts,
• control instructions allowed (if, case, ...),
• inner variables with immediate assignment allowed,
• a signal can be assigned in only one process,
• a signal can be read by several processes,
• special case: a process with the clock as a parameter ⇒ the state of every signal assigned in it is stored in a register, and thus delayed by one clock cycle.
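The difference between a variable (immediate assignment) and a signal (assignment effective at the end of the process) can be observed directly in simulation. The following simulation-only sketch is an illustration; the entity name `sigvar_demo` and the signal names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sigvar_demo is
end entity sigvar_demo;

architecture sim of sigvar_demo is
  signal trig : std_logic := '0';
  signal sig  : integer := 0;
begin
  -- a single event on trig starts the process once with trig = '1'
  trig <= '1' after 10 ns;

  demo : process (trig)
    variable v : integer := 0;
  begin
    if trig = '1' then
      v   := v + 1;    -- variable: updated immediately
      sig <= sig + 1;  -- signal: new value only scheduled here
      assert v = 1   report "variable should already be updated" severity failure;
      assert sig = 0 report "signal should still hold its old value" severity failure;
    end if;
  end process demo;
end architecture sim;
```

Both assertions hold: inside the process `v` already reads 1 while `sig` still reads 0, and `sig` becomes 1 only once the process has finished its run.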
VHDL : syntax
example of an asynchronous process: subtraction/comparison

signal a : unsigned(3 downto 0);
signal b : unsigned(3 downto 0);
signal sum : unsigned(3 downto 0);
signal cmp : std_logic;
...
diffcmp : process (a, b)
  variable s : signed(3 downto 0);
begin
  s := signed(a - b);    -- explicit conversion: a - b is unsigned
  sum <= unsigned(s);    -- totally useless: overridden by the last assignment
  cmp <= '1';
  if a < b then
    cmp <= '0';
    s := -s;
  end if;
  sum <= unsigned(s);
end process;
VHDL : syntax
example of a synchronous process: two-cycle delay

signal s : std_logic_vector(2 downto 0);
signal sdly : std_logic_vector(2 downto 0);
signal r1 : std_logic_vector(2 downto 0);
signal r2 : std_logic_vector(2 downto 0);
...
dly2cy : process (clk)
begin
  if rising_edge(clk) then
    r1 <= s;
    r2 <= r1;
  end if;
end process;
sdly <= r2;
VHDL : syntax
example of a mixed process: delay + asynchronous reset

signal s : std_logic_vector(2 downto 0);
signal sdly : std_logic_vector(2 downto 0);
signal r1 : std_logic_vector(2 downto 0);
signal r2 : std_logic_vector(2 downto 0);
...
dly2cyraz : process (clk, reset)
begin
  if reset = '1' then
    r1 <= "000";
    r2 <= "000";
  elsif rising_edge(clk) then
    r1 <= s;
    r2 <= r1;
  end if;
end process;
sdly <= r2;
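To see the two-cycle delay in simulation, the process above can be wrapped in an entity and driven by a small testbench. The sketch below is one possible arrangement; the names `dly2` and `dly2_tb`, the port list, and the timings are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dly2 is
  port (
    clk, reset : in  std_logic;
    s          : in  std_logic_vector(2 downto 0);
    sdly       : out std_logic_vector(2 downto 0)
  );
end entity dly2;

architecture rtl of dly2 is
  signal r1, r2 : std_logic_vector(2 downto 0);
begin
  dly2cyraz : process (clk, reset)
  begin
    if reset = '1' then
      r1 <= "000";
      r2 <= "000";
    elsif rising_edge(clk) then
      r1 <= s;
      r2 <= r1;
    end if;
  end process dly2cyraz;
  sdly <= r2;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

entity dly2_tb is
end entity dly2_tb;

architecture sim of dly2_tb is
  signal clk   : std_logic := '0';
  signal reset : std_logic := '1';
  signal s     : std_logic_vector(2 downto 0) := "000";
  signal sdly  : std_logic_vector(2 downto 0);
begin
  dut : entity work.dly2 port map ( clk => clk, reset => reset, s => s, sdly => sdly );

  stim : process
  begin
    wait for 5 ns;
    reset <= '0';                -- release the asynchronous reset
    s     <= "101";              -- value to be delayed
    wait for 5 ns;
    clk <= '1'; wait for 5 ns;   -- first rising edge: r1 takes "101"
    assert sdly = "000" report "output changed after one cycle only" severity failure;
    clk <= '0'; wait for 5 ns;
    clk <= '1'; wait for 5 ns;   -- second rising edge: r2 takes "101"
    assert sdly = "101" report "expected a two-cycle delay" severity failure;
    wait;
  end process stim;
end architecture sim;
```

The first assertion checks that the output is still "000" after one rising edge, the second that the input value appears after the second edge.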
VHDL : coding
Solutions
• "by hand" in text files:
  • long and extremely tedious,
  • a lot of syntax mistakes.
• automatic generation (Simulink, HDL Coder, coregen, ...):
  • the design is made by drawing boxes,
  • each box has a fixed functionality or can contain MATLAB instructions,
  • costly tools,
  • the generated code needs modifications to be usable on real FPGAs.
VHDL : coding
Simulink example
VHDL : simulation
Solutions
• GPL solution: GHDL and GTKWave
  • need to write VHDL code (called a testbench) that:
    • instantiates the design + a clock generator if needed,
    • defines a process to generate input signals for the design,
  • VHDL compilation (via GHDL) ⇒ syntax checking,
  • creation of an executable to simulate the behavior of the design,
  • this executable produces traces of the signals,
  • display and analysis of these traces via GTKWave.
• commercial solutions (MATLAB + ModelSim)
  • automatic generation of the testbench,
  • co-simulation,
  • integrated trace visualization.
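A testbench following the recipe above could look like the sketch below. The design under test (a 4-bit counter named `counter4`), the testbench name `counter4_tb`, the `running` flag, and all timings are hypothetical; only the overall structure (design instance, clock generator, stimulus process) comes from the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- hypothetical design under test: a 4-bit counter with asynchronous reset
entity counter4 is
  port ( clk, reset : in std_logic; count : out unsigned(3 downto 0) );
end entity counter4;

architecture rtl of counter4 is
  signal cnt : unsigned(3 downto 0) := (others => '0');
begin
  process (clk, reset)
  begin
    if reset = '1' then
      cnt <= (others => '0');
    elsif rising_edge(clk) then
      cnt <= cnt + 1;
    end if;
  end process;
  count <= cnt;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity counter4_tb is
end entity counter4_tb;

architecture sim of counter4_tb is
  constant Tps   : time := 5 ns;        -- half clock period
  signal clk     : std_logic := '0';
  signal reset   : std_logic := '1';
  signal count   : unsigned(3 downto 0);
  signal running : boolean := true;     -- set to false to stop the clock
begin
  -- 1. instantiate the design
  dut : entity work.counter4 port map ( clk => clk, reset => reset, count => count );

  -- 2. clock generator: toggles every Tps while the test is running
  clk <= not clk after Tps when running else '0';

  -- 3. process generating the input signals and checking the result
  stim : process
  begin
    wait for 12 ns;
    reset <= '0';
    wait for 100 ns;                    -- rising edges at 15, 25, ..., 105 ns
    assert count = 10 report "unexpected count" severity failure;
    running <= false;                   -- no more events: the simulation ends
    wait;
  end process stim;
end architecture sim;
```

Assuming the code is saved in a file named counter4_tb.vhd (a hypothetical name), the GHDL flow is: `ghdl -a counter4_tb.vhd` to analyze, `ghdl -e counter4_tb` to elaborate, `ghdl -r counter4_tb --vcd=counter4.vcd` to run and dump the traces, then `gtkwave counter4.vcd` to display them.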
VHDL : simulation
Visualization example : a divider
VHDL : execution
Principles
• generation of the "microcode", called the bitstream, via a place-and-route tool (ISE),
• uploading the bitstream (via USB, PCI, ...),
• "execution" starts at the end of the upload,
• reading/writing values through I/O pins:
  • with external peripherals,
  • with another FPGA or processor,
  • with USB/PCI/Ethernet/... controllers.
• on some FPGA boards:
  • a part of the bitstream can be uploaded on its own (partial reconfiguration),
  • the bitstream is stored in a flash memory and automatically loaded when powering the board.