10
Spring 2010 EECS150 - Lec03-FPGA Page EECS150 - Digital Design Lecture 3 - Field Programmable Gate Arrays (FPGAs) January 26, 2010 John Wawrzynek 1 Spring 2010 EECS150 - Lec03-FPGA Page FPGA Overview Basic idea: two-dimensional array of logic blocks and flip-flops with a means for the user to configure (program): 1. the interconnection between the logic blocks, 2. the function of each block. Simplified version of FPGA internal architecture: 2 Spring 2010 EECS150 - Lec03-FPGA Page Die Photos: Virtex FPGA vs. Pentium IV FGPA Vertex chip looks remarkably structured Very dense, very regular structure “Full-Custom” Pentium chip somewhat more random in structure Large on-chip memories (caches) are visible 3 Spring 2010 EECS150 - Lec03-FPGA Page FPGAs are in widespread use 4 INSIDE Make MicroBlaze Processing Roar With Hardware Acceleration FPGAs Help CERN Track Particles Approaching Speed of Light Hardware Trumps Software in Medical Device Design Taming Power Draw in Consumer MPUs INSIDE Make MicroBlaze Processing Roar With Hardware Acceleration FPGAs Help CERN Track Particles Approaching Speed of Light Hardware Trumps Software in Medical Device Design Taming Power Draw in Consumer MPUs FPGAs Power Net-Centric Battlefield on Many Fronts FPGAs Power Net-Centric Battlefield on Many Fronts INSIDE Algorithm Developers Power New DA System on Xilinx Automotive FPGA Platform Engineer Turns Blown Engine into Hot Startup How to Beat Your Son at Guitar Hero Using Xilinx FPGA Tips and Tricks for Using FPGA Editor and SystemVerilog INSIDE Algorithm Developers Power New DA System on Xilinx Automotive FPGA Platform Engineer Turns Blown Engine into Hot Startup How to Beat Your Son at Guitar Hero Using Xilinx FPGA Tips and Tricks for Using FPGA Editor and SystemVerilog Automotive Innovators Hit Top Gear in Driver Assistance with FPGA Platforms Automotive Innovators Hit High Gear in Driver Assistance with FPGA Platforms SUB Plugging into High-Volume Consumer Products Plugging into High-Volume Consumer Products HIGH VOLUME Spartan-3E: A New Era Multimedia for Automotive DSP Algorithms DESIGN TOOLS New ISE 7.1i Software Control Your Designs SERIAL I/O Extend Your Reach HIGH VOLUME Spartan-3E: A New Era Multimedia for Automotive DSP Algorithms DESIGN TOOLS New ISE 7.1i Software Control Your Designs SERIAL I/O Extend Your Reach Far more designs are implemented in FPGA than in custom chips.

EECS150 - Digital Design Lecture 3 - Field …inst.eecs.berkeley.edu/~cs150/sp10/Lecture/lec03-fpga.pdfSpring 2010 EECS150 - Lec03-FPGA Page EECS150 - Digital Design Lecture 3 - Field

  • Upload
    votu

  • View
    232

  • Download
    0

Embed Size (px)

Citation preview

Page 1: EECS150 - Digital Design Lecture 3 - Field …inst.eecs.berkeley.edu/~cs150/sp10/Lecture/lec03-fpga.pdfSpring 2010 EECS150 - Lec03-FPGA Page EECS150 - Digital Design Lecture 3 - Field

Spring 2010 EECS150 - Lec03-FPGA Page

EECS150 - Digital DesignLecture 3 - Field Programmable

Gate Arrays (FPGAs)January 26, 2010

John Wawrzynek

1

Spring 2010 EECS150 - Lec03-FPGA Page

FPGA Overview• Basic idea: two-dimensional array of logic blocks and flip-flops with a means

for the user to configure (program): 1. the interconnection between the logic blocks, 2. the function of each block.

Simplified version of FPGA internal architecture:

2

Spring 2010 EECS150 - Lec03-FPGA Page

Die Photos: Virtex FPGA vs. Pentium IV

• FGPA Vertex chip looks remarkably structured– Very dense, very regular structure

• “Full-Custom” Pentium chip somewhat more random in structure– Large on-chip memories (caches) are visible

3

Spring 2010 EECS150 - Lec03-FPGA Page

FPGAs are in widespread use

4

INSIDE

Make MicroBlaze Processing Roar With Hardware Acceleration

FPGAs Help CERN TrackParticles Approaching Speed of Light

Hardware Trumps Software in Medical Device Design

Taming Power Draw inConsumer MPUs

INSIDE

Make MicroBlaze Processing Roar With Hardware Acceleration

FPGAs Help CERN TrackParticles Approaching Speed of Light

Hardware Trumps Software in Medical Device Design

Taming Power Draw inConsumer MPUs

FPGAs Power Net-CentricBattlefield on Many Fronts

FPGAs Power Net-CentricBattlefield on Many Fronts

INSIDE

Algorithm Developers PowerNew DA System on XilinxAutomotive FPGA Platform

Engineer Turns Blown Engine into Hot Startup

How to Beat Your Sonat Guitar Hero Using Xilinx FPGA

Tips and Tricks for Using FPGA Editor and SystemVerilog

INSIDE

Algorithm Developers PowerNew DA System on XilinxAutomotive FPGA Platform

Engineer Turns Blown Engine into Hot Startup

How to Beat Your Sonat Guitar Hero Using Xilinx FPGA

Tips and Tricks for Using FPGA Editor and SystemVerilog

Automotive Innovators Hit Top Gear in

Driver Assistance with FPGA Platforms

Automotive Innovators Hit High Gear in Driver Assistance

with FPGA Platforms

SUBSCRIBE

Plugging into High-VolumeConsumer Products

Plugging into High-VolumeConsumer Products

HIGH VOLUMESpartan-3E: A New Era

Multimedia for Automotive

DSP Algorithms

DESIGN TOOLSNew ISE 7.1i Software

Control Your Designs

SERIAL I/OExtend Your Reach

HIGH VOLUMESpartan-3E: A New Era

Multimedia for Automotive

DSP Algorithms

DESIGN TOOLSNew ISE 7.1i Software

Control Your Designs

SERIAL I/OExtend Your Reach

Far more designs are implemented in FPGA than in custom chips.

Page 2: EECS150 - Digital Design Lecture 3 - Field …inst.eecs.berkeley.edu/~cs150/sp10/Lecture/lec03-fpga.pdfSpring 2010 EECS150 - Lec03-FPGA Page EECS150 - Digital Design Lecture 3 - Field

Spring 2010 EECS150 - Lec03-FPGA Page

Background (review) for upcoming

• A MUX or multiplexor is a combinational logic circuit that chooses between 2N inputs under the control of N control signals.

• A latch is a 1-bit memory (similar to a flip-flop).

5

Spring 2010 EECS150 - Lec03-FPGA Page

FPGA Variations• Families of FPGA’s differ in:

– physical means of implementing user programmability,

– arrangement of interconnection wires, and

– the basic functionality of the logic blocks.

• Most significant difference is in the method for providing flexible blocks and connections:

• Anti-fuse based (ex: Actel)

+ Non-volatile, relatively small

– fixed (non-reprogrammable)

6

Spring 2010 EECS150 - Lec03-FPGA Page

User Programmability

• Latches are used to:1. control a switch to make or

break cross-point connections in the interconnect

2. define the function of the logic blocks

3. set user options:

• within the logic blocks

• in the input/output blocks

• global reset/clock

• “Configuration bit stream” is loaded under user control

• Latch-based (Xilinx, Altera, …)

+ reconfigurable

– volatile

– relatively large.

7

Spring 2010 EECS150 - Lec03-FPGA Page

Idealized FPGA Logic Block

• 4-input look up table (LUT)– implements combinational logic functions

• Register– optionally stores output of LUT

8

Page 3: EECS150 - Digital Design Lecture 3 - Field …inst.eecs.berkeley.edu/~cs150/sp10/Lecture/lec03-fpga.pdfSpring 2010 EECS150 - Lec03-FPGA Page EECS150 - Digital Design Lecture 3 - Field

Spring 2010 EECS150 - Lec03-FPGA Page

4-LUT Implementation• n-bit LUT is implemented as a 2n x

1 memory:– inputs choose one of 2n memory

locations.

– memory locations (latches) are normally loaded with values from user’s configuration bit stream.

– Inputs to mux control are the CLB inputs.

• Result is a general purpose “logic gate”.

– n-LUT can implement any function of n inputs!

9

Spring 2010 EECS150 - Lec03-FPGA Page

LUT as general logic gate• An n-lut as a direct implementation of

a function truth-table.

• Each latch location holds the value of the function corresponding to one input combination.

Example: 4-lut

Example: 2-lut

Implements any function of 2 inputs.

How many of these are there?How many functions of n inputs?

10

Spring 2010 EECS150 - Lec03-FPGA Page

FPGA Generic Design Flow

• Design Entry:– Create your design files using:

• schematic editor or • HDL (hardware description languages: Verilog, VHDL)

• Design Implementation:– Logic synthesis (in case of using HDL entry) followed by,– Partition, place, and route to create configuration bit-

stream file• Design verification:

– Optionally use simulator to check function,– Load design onto FPGA device (cable connects PC to

development board), optional “logic scope” on FPGA• check operation at full speed in real environment.

11

Spring 2010 EECS150 - Lec03-FPGA Page

Example Partition, Placement, and Route

• Example Circuit:– collection of gates and flip-flops

• Idealized FPGA structure:

Circuit combinational logic must be “covered” by 4-input 1-output LUTs.Flip-flops from circuit must map to FPGA flip-flops. (Best to preserve “closeness” to CL to minimize wiring.)Best placement in general attempts to minimize wiring.

12Vdd, GND, clock, and global resets are all “prewired”.

Page 4: EECS150 - Digital Design Lecture 3 - Field …inst.eecs.berkeley.edu/~cs150/sp10/Lecture/lec03-fpga.pdfSpring 2010 EECS150 - Lec03-FPGA Page EECS150 - Digital Design Lecture 3 - Field

Spring 2010 EECS150 - Lec03-FPGA Page

Example Partition, Placement, and Route

• Example Circuit:– collection of gates and flip-flops

13

Two partitions. Each has single output, no more than 4 inputs, and no more than 1 flip-flop. In this case, inverter goes in both partitions. Note: the partition can be arbitrarily large as long as it has not more than 4 inputs and 1 output, and no more than 1 flip-flop.

A

A

B

B

INOUT

Spring 2010 EECS150 - Lec03-FPGA Page

Xilinx FPGAs (interconnect detail)

14

Spring 2010 EECS150 - Lec03-FPGA Page

Project platform: Xilinx ML505-110

15

Spring 2010 EECS150 - Lec03-FPGA Page

FPGA: Xilinx Virtex-5 XC5VLX110TVirtex-5 “die photo”

A die is an unpackaged part

!"#$%&'()*$$+,,-$$$).'/0$1

!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<

!"#$%&'()*!"#$%&'( !"#$%&')

16

Page 5: EECS150 - Digital Design Lecture 3 - Field …inst.eecs.berkeley.edu/~cs150/sp10/Lecture/lec03-fpga.pdfSpring 2010 EECS150 - Lec03-FPGA Page EECS150 - Digital Design Lecture 3 - Field

Spring 2010 EECS150 - Lec03-FPGA Page

From die to PC board ...

!"#$%&'()*$$+,,-$$$).'/0$1

!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<

!"#$%&'()*!"#$%&'( !"#$%&')

2 www.xilinx.com XAPP426 (v1.3) March 3, 2006

Implementing Xilinx Flip-Chip BGA PackagesR

*Xilinx flip-chip packages are not hermetically sealed and exposure to cleaning solvents/ chemicals or excessive moisture during boards assembly could pose serious package reliability concerns. Small vents are kept by design between the heatspreader (lid) and the organic substrate to allow for outgassing and moisture evaporation. Solvents or other corrosive chemicals could seep through these vents and attack the organic materials and components inside the package and hence are strongly discouraged during board assembly of Xilinx flip-chip BGA packages.

Recommended PCB Design Rules

For Xilinx BGA packages, NSMD (Non Solder Mask Defined) pads on the board are suggested. This allows a clearance between the land metal (diameter L) and the solder mask opening (diameter M) as shown in Figure 3. The space between the NSMD pad and the solder mask, and the actual signal trace widths depends on the capability of the PCB vendor. The cost of the PCB is higher when the line widths and spaces are tighter.

Figure 2: Package Construction with Type II Lid

Copper Heatspreader

Solder Ball

Flip Chip Solder Bump

Organic Build-Up Substrate

Underfill Epoxy

Silicon Die

Thermal Interface Material

Adhesive Epoxy*

Figure 3: Suggested Board Layout of Soldered Pads for BGA

Ball Grid Array (BGA)

Flip-Chip Package

17

Spring 2010 EECS150 - Lec03-FPGA Page

Banks of I/O placed on chip floor plan

!"#$%&'()*$$+,,-$$$).'/0$1

!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<

!"#$%&'()*!"#$%&'( !"#$%&')

Virtex-5 FPGA User Guide www.xilinx.com 217UG190 (v4.2) May 9, 2008

SelectIO Resources General GuidelinesR

Virtex-5 I/O Bank RulesIn Virtex-5 devices, with some exceptions in the center column, an I/O bank consists of 40 IOBs (20 CLBs high and a single clock region). There are always four half-sized banks (20 IOBs) and a single configuration bank in the center column. The number of banks depends upon the device size, and in larger devices, there are additional full-sized banks in the center column. In the Virtex-5 Overview the total number of I/O banks is listed by device type. The XC5VLX30 has 12 usable I/O banks and one configuration bank. Figure 6-3 is an example of a columnar floorplan showing the XC5VLX30 I/O banks.

Reference Voltage (VREF) Pins

Low-voltage, single-ended I/O standards with a differential amplifier input buffer require an input reference voltage (VREF). VREF is an external input into Virtex-5 devices. Within each I/O bank, one of every 20 I/O pins is automatically configured as a VREF input, if using a single-ended I/O standard that requires a differential amplifier input buffer.

Output Drive Source Voltage (VCCO) Pins

Many of the low-voltage I/O standards supported by Virtex-5 devices require a different output drive voltage (VCCO). As a result, each device often supports multiple output drive source voltages.

Output buffers within a given VCCO bank must share the same output drive source voltage. The following input buffers use the VCCO voltage: LVTTL, LVCMOS, PCI, LVDCI and other DCI standards.

Figure 6-3: Virtex-5 XC5VLX30 I/O Banks

ug190_6_03_021306

BANK40 I/O

BANK20 I/OBANK20 I/O

BANK20 I/O

BANK20 I/O

BANK40 I/O

BANK40 I/O

BANK40 I/O

CONFIG

BANK40 I/O

BANK40 I/O

BANK40 I/O

BANK40 I/O

Virtex-5 FPGA Packaging and Pinout Specification www.xilinx.com 275UG195 (v4.3) June 18, 2008

FF676 Package—LX30R

Figure 3-8: FF676 Package—LX30 SelectIO Bank Diagram

12

34

56

78

910

1112

1314

1516

1718

1920

2122

2324

2526

ug195_c3_06_031507

12

34

56

78

910

1112

1314

1516

1718

1920

2122

2324

2526

ABCDEFGHJKLMNPRTUVWY

AAABACADAEAF

ABCDEFGHJKLMNPRTUVWYAAABACADAEAF

16 16 16 16 16 16 16 16 16 16 15 15 15 15 15 15 15 15 15 1516 16 16 16 16 16 16 16 16 16 15 15 15 15 15 15 15 15 15 15 1516 16 16 16 16 16 16 16 16 16 16 15 15 15 15 15 15 15 15 15 1516 16 16 16 16 16 16 16 16 3 3 3 15 15 15 15 15 15 15 15 1514 14 14 12 12 12 3 3 3 3 3 3 3 3 3 11 11 11 11 13 13

14 14 12 12 12 3 3 3 3 1 3 1 3 3 3 11 11 11 13 1314 14 12 12 12 12 1 1 1 1 1 1 1 1 1 11 11 11 13 13 1314 14 14 12 12 12 1 1 1 1 1 1 1 1 1 11 11 11 11 13 1314 14 12 12 12 11 11 11 11 13 13 1314 14 14 12 12 12 11 11 11 11 13 13

14 12 12 12 12 11 11 11 11 13 1314 14 12 12 12 12 11 11 11 11 13 13 1314 14 12 12 12 12 11 11 11 11 13 1314 12 12 12 12 11 11 11 11 13 13 1314 14 12 12 12 12 17 17 11 11 13 13

14 12 18 18 18 17 17 17 17 13 1314 14 18 18 18 18 17 17 17 17 13 13 1314 14 18 18 18 18 17 17 17 17 17 1314 18 18 18 18 17 17 17 17 17 13 1314 14 18 18 18 18 2 2 2 2 2 2 17 17 17 17 13 13

14 18 18 18 18 2 2 2 2 2 2 2 2 2 17 17 17 17 17 1314 14 18 18 18 18 4 4 2 4 2 2 2 2 17 17 17 17 17 13 1314 14 18 18 18 4 4 4 4 4 4 4 4 4 4 17 17 17 17 17 1314 18 18 18 18 4 4 4 4 4 4 4 1714 14 18 18 18

14 18 18 18

Colors on this package pinout map to banks.

18

UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture

!"#$%&'()*$$+,,-$$$).'/0$1

!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<

!"#$%&'()*!"#$%&'( !"#$%&')

Colors represent different types of resources:

LogicBlock RAMDSP (ALUs)ClockingI/OSerial I/O + PCI

A routing fabric runs throughout the chip to wire everything together.

UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture

!"#$%&'()*$$+,,-$$$).'/0$1

!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<

!"#$%&'()*!"#$%&'( !"#$%&')

Routing fabric requires many interconnect layers.

Page 6: EECS150 - Digital Design Lecture 3 - Field …inst.eecs.berkeley.edu/~cs150/sp10/Lecture/lec03-fpga.pdfSpring 2010 EECS150 - Lec03-FPGA Page EECS150 - Digital Design Lecture 3 - Field

Spring 2010 EECS150 - Lec03-FPGA Page

!"#$%&'()*$$+,,-$$$).'/0$1

!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<

!"#$%&'()*!"#$%&'( !"#$%&')

Configurable Logic Blocks (CLBs)

� �� �$� � � � �� � � � � !� �� www.xilinx.com ���UG190 (v4.2) May 9, 2008

Chapter 5

Configurable Logic Blocks (CLBs)

���� �"��"��#The Configurable Logic Blocks (CLBs) are the main logic resources for implementing sequential as well as combinatorial circuits. Each CLB element is connected to a switch matrix for access to the general routing matrix (shown in Figure 5-1). A CLB element contains a pair of slices. These two slices do not have direct connections to each other, and each slice is organized as a column. Each slice in a column has an independent carry chain. For each CLB, slices in the bottom of the CLB are labeled as SLICE(0), and slices in the top of the CLB are labeled as SLICE(1).

The Xilinx tools designate slices with the following definitions. An “X” followed by a number identifies the position of each slice in a pair as well as the column position of the slice. The “X” number counts slices starting from the bottom in sequence 0, 1 (the first CLB column); 2, 3 (the second CLB column); etc. A “Y” followed by a number identifies a row of slices. The number remains the same within a CLB, but counts up in sequence from one CLB row to the next CLB row, starting from the bottom. Figure 5-2 shows four CLBs located in the bottom-left corner of the die.

Figure 5-1: ���������� �����������#� ���� ������

SwitchMatrix

Slice(1)

COUTCOUT

CINCIN

Slice(0)

� � �

UG190_5_01_122605

Slices define regular connections to the switching fabric, and to slices in

CLBs above and below it on the die.

The LX110T has 17,280 slices.21

Spring 2010 EECS150 - Lec03-FPGA Page

!"#$%&'()*$$+,,-$$$).'/0$1

!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<

!"#$%&'()*!"#$%&'( !"#$%&')

X-Y naming convention for slices

172 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008

Chapter 5: Configurable Logic Blocks (CLBs)R

Slice DescriptionEvery slice contains four logic-function generators (or look-up tables), four storage elements, wide-function multiplexers, and carry logic. These elements are used by all slices to provide logic, arithmetic, and ROM functions. In addition to this, some slices support two additional functions: storing data using distributed RAM and shifting data with 32-bit registers. Slices that support these additional functions are called SLICEM; others are called SLICEL. SLICEM (shown in Figure 5-3) represents a superset of elements and connections found in all slices. SLICEL is shown in Figure 5-4.

Figure 5-2: Row and Column Relationship between CLBs and Slices

SliceX1Y1

COUTCOUT

CINCIN

SliceX0Y1

CLB

UG190_5_02_122605

SliceX1Y0

COUTCOUT

SliceX0Y0

CLB

SliceX3Y1

COUTCOUT

CINCIN

SliceX2Y1

CLB

SliceX3Y0

COUTCOUT

SliceX2Y0

CLB

Lower-left corner of the die.

X0, X2, ... are lower CLB slices.X1, X3, ... are upper CLB slices.

Y0, Y1, ... are CLB column positions.

22

Spring 2010 EECS150 - Lec03-FPGA Page

Atoms: 5-input Look Up Tables (LUTs)

ExpressFabric Technology

WP245 (v1.1.1) July 7, 2006 www.xilinx.com 5

R

The 6-input LUT leads to several benefits:• As it implements wider functions directly in the LUT, the number of logic levels

between registers is reduced, leading to higher performance.• It implements significantly more logic than a LUT with four inputs.• Power consumption is reduced because the larger LUT reduces the amount of

required interconnect (routing resources).The Virtex-5 family SLICEM LUTs also provide additional benefits:• New aspect ratios for distributed RAM. Every LUT can be configured as a 64 x 1 or

32 x 2 distributed RAM. Benefits for the designer are a much denser and faster implementation of distributed RAM with increased flexibility.

• Longer SRL chains. A single LUT supports a 32-bit SRL. A slice can thus implement a shift register of up to 128 bits, providing significant area savings and reduced routing resources in comparison to previous architectures. Shift registers are features available only in Xilinx devices. The Xilinx ISE™ software packer automatically packs two 16-bit SRLs with common addressing but different data. In other words, if the application needs a 16-bit deep, 8-bit-wide shift register, it can be implemented in a single slice.

Routing and Interconnect ArchitectureWith process technology advancements, interconnect timing delays can account for more than 50% of the critical path delay. A new diagonally symmetric interconnect pattern, developed for the Virtex-5 family, enhances performance by reaching more places in fewer hops. The new pattern allows for more logic connections to be made within two or three hops. Moreover, the more regular routing pattern makes it easier for the Xilinx ISE software to find the most optimal routes. All of the interconnect features are transparent to FPGA designers, but translate to higher overall

Figure 3: Block Diagram of a Virtex-5 6-Input LUT

WP245_03_051006

LUT5

A1

A2

A3

A4 D

A5

A6

A2

A3

A4 D

D6

D5

A5

A6

A2

A3

A4

A5

A6

LUT6

LUT5

A[6:2] D000000000100010

....

101

111011111011111

001

Q

Q

Q

Q

Q

Q

(1)

(1)

(1)

(0)

(0)

(0)

.... D

A[6:2]

Computes any 5-input logic function.

Timing is independent of function.

Latchesset during

configuration. 23

Spring 2010 EECS150 - Lec03-FPGA Page

Virtex-5 6-LUTs: Composition of 5-LUTsMay be used

as one 6-input LUT (D6 out) ...

Combinational logic

(post configuration)

... or as two 5-input LUTS (D6 and D5)

ExpressFabric Technology

WP245 (v1.1.1) July 7, 2006 www.xilinx.com 5

R

The 6-input LUT leads to several benefits:• As it implements wider functions directly in the LUT, the number of logic levels

between registers is reduced, leading to higher performance.• It implements significantly more logic than a LUT with four inputs.• Power consumption is reduced because the larger LUT reduces the amount of

required interconnect (routing resources).The Virtex-5 family SLICEM LUTs also provide additional benefits:• New aspect ratios for distributed RAM. Every LUT can be configured as a 64 x 1 or

32 x 2 distributed RAM. Benefits for the designer are a much denser and faster implementation of distributed RAM with increased flexibility.

• Longer SRL chains. A single LUT supports a 32-bit SRL. A slice can thus implement a shift register of up to 128 bits, providing significant area savings and reduced routing resources in comparison to previous architectures. Shift registers are features available only in Xilinx devices. The Xilinx ISE™ software packer automatically packs two 16-bit SRLs with common addressing but different data. In other words, if the application needs a 16-bit deep, 8-bit-wide shift register, it can be implemented in a single slice.

Routing and Interconnect ArchitectureWith process technology advancements, interconnect timing delays can account for more than 50% of the critical path delay. A new diagonally symmetric interconnect pattern, developed for the Virtex-5 family, enhances performance by reaching more places in fewer hops. The new pattern allows for more logic connections to be made within two or three hops. Moreover, the more regular routing pattern makes it easier for the Xilinx ISE software to find the most optimal routes. All of the interconnect features are transparent to FPGA designers, but translate to higher overall

Figure 3: Block Diagram of a Virtex-5 6-Input LUT

WP245_03_051006

LUT5

A1

A2

A3

A4 D

A5

A6

A2

A3

A4 D

D6

D5

A5

A6

A2

A3

A4

A5

A6

LUT6

LUT5

The LX110T has 69,120 6-LUTs6-LUT delay is 0.9 ns

24

Page 7: EECS150 - Digital Design Lecture 3 - Field …inst.eecs.berkeley.edu/~cs150/sp10/Lecture/lec03-fpga.pdfSpring 2010 EECS150 - Lec03-FPGA Page EECS150 - Digital Design Lecture 3 - Field

Spring 2010 EECS150 - Lec03-FPGA Page

The simplest view of a slice

194 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008

Chapter 5: Configurable Logic Blocks (CLBs)R

Designing Large Multiplexers

4:1 Multiplexer

Each Virtex-5 LUT can be configured into a 4:1 MUX. The 4:1 MUX can be implemented with a flip-flop in the same slice. Up to four 4:1 MUXes can be implemented in a slice, as shown in Figure 5-21.

Figure 5-21: Four 4:1 Multiplexers in a Slice

UG190_5_21_050506

(D[6:1])

(C[6:1])

(B[6:1])

(A[6:1])

(CLK)CLK

6

SLICE

LUT

LUT

LUT

LUT

A[6:1]

O6

6 A[6:1]

O6

RegisteredOutput

4:1 MUX Output

(Optional)

D Q

(D)

(DQ)

RegisteredOutput

4:1 MUX Output

(Optional)

D Q

(C)

(CQ)

RegisteredOutput

4:1 MUX Output

(Optional)

D Q

(B)

(BQ)

RegisteredOutput

4:1 MUX Output

(Optional)

D Q

(A)

(AQ)

6 A[6:1]

O6

6 A[6:1]

O6

SEL D [1:0], DATA D [3:0]Input

SEL C [1:0], DATA C [3:0]Input

SEL B [1:0], DATA B [3:0]Input

SEL A [1:0], DATA A [3:0]Input

Four 6-LUTs

Four Flip-Flops

Switching fabric may see combinational and registered

outputs.

An actual Virtex-5 slice adds many small features to this

simplified diagram. We show them one by one ...

25

Spring 2010 EECS150 - Lec03-FPGA Page

Two 7-LUTs per slice ...

Extra multiplexers(F7AMUX,

F7BMUX)

Virtex-5 FPGA User Guide www.xilinx.com 195UG190 (v4.2) May 9, 2008

CLB OverviewR

8:1 Multiplexer

Each slice has an F7AMUX and an F7BMUX. These two muxes combine the output of two LUTs to form a combinatorial function up to 13 inputs (or an 8:1 MUX). Up to two 8:1 MUXes can be implemented in a slice, as shown in Figure 5-22.

Figure 5-22: Two 8:1 Multiplexers in a Slice

UG190_5_22_090806

(D[6:1])

(C[6:1])

(CX)

(B[6:1])

(A[6:1])

(AX)

SELF7(1)(CLK)

CLK

SELF7(2)

SEL D [1:0], DATA D [3:0]Input (1)

SEL C [1:0], DATA C [3:0]Input (1)

SEL B [1:0], DATA B [3:0]Input (2)

SEL A [1:0], DATA A [3:0]Input (2)

6

SLICE

LUT

LUT

LUT

LUT

A[6:1]

O6

6 A[6:1]

O6 RegisteredOutput

8:1 MUXOutput (1)

(Optional)

D Q

(CMUX)

(CQ)

RegisteredOutput

8:1 MUXOutput (2)

(Optional)

D Q

(AMUX)

(AQ)

6 A[6:1]

O6

6 A[6:1]

O6

F7BMUX

F7AMUX

Extra inputs (AX and CX)

26

Spring 2010 EECS150 - Lec03-FPGA Page

Or one 8-LUTs per slice ...

Third multiplexer(F8MUX)

Third input (BX)

196 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008

Chapter 5: Configurable Logic Blocks (CLBs)R

16:1 Multiplexer

Each slice has an F8MUX. F8MUX combines the outputs of F7AMUX and F7BMUX to form a combinatorial function up to 27 inputs (or a 16:1 MUX). Only one 16:1 MUX can be implemented in a slice, as shown in Figure 5-23.

It is possible to create multiplexers wider than 16:1 across more than one SLICEM. However, there are no direct connections between slices to form these wide multiplexers.

Fast Lookahead Carry Logic

In addition to function generators, dedicated carry logic is provided to perform fast arithmetic addition and subtraction in a slice. The Virtex-5 CLB has two separate carry chains, as shown in Figure 5-1. The carry chains are cascadable to form wider add/subtract logic, as shown in Figure 5-2.

The carry chain in the Virtex-5 device is running upward and has a height of four bits per slice. For each bit, there is a carry multiplexer (MUXCY) and a dedicated XOR gate for adding/subtracting the operands with a selected carry bits. The dedicated carry path and

Figure 5-23: 16:1 Multiplexer in a Slice

UG190_5_23_050506

(D[6:1])

(C[6:1])

(CX)

(B[6:1])

(A[6:1])

(AX)(BX)

(CLK)

SELF7

SELF7SELF8

CLK

6

SLICE

LUT

LUT

LUT

LUT

A[6:1]

O6

6 A[6:1]

O6

RegisteredOutput

16:1 MUXOutput

(Optional)

D Q

(BMUX)

(B)

6 A[6:1]

O6

6 A[6:1]

O6

F7BMUX

F8MUX

F7AMUX

SEL D [1:0], DATA D [3:0]Input

SEL C [1:0], DATA C [3:0]Input

SEL B [1:0], DATA B [3:0]Input

SEL A [1:0], DATA A [3:0]Input Configuring the

“n” of an n-LUT ...27

Spring 2010 EECS150 - Lec03-FPGA Page

Extra muxes to chose LUT option ...

Virtex-5 FPGA User Guide www.xilinx.com 199UG190 (v4.2) May 9, 2008

CLB / Slice Timing ModelsR

General Slice Timing Model and ParametersA simplified Virtex-5 slice is shown in Figure 5-25. Some elements of the Virtex-5 slice are omitted for clarity. Only the elements relevant to the timing paths described in this section are shown.

Figure 5-25: Simplified Virtex-5 Slice

UG190_5_25_050506

LUT

O6

O5

6 D

FE/LAT

DCECLK

SR REV

Q

F7BMUX

F8MUX

DMUX

DQ

D Inputs

LUT

O6

O5

6 C

FE/LAT

DCECLK

SR REV

Q CQ

CMUX

C Inputs

DX

CX

LUT

O6

O5

6 B

FE/LAT

DCECLK

SR REV

Q BQ

BMUX

B Inputs

BX

FE/LAT

DCECLK

SR REV

Q AQ

F7AMUXLUT

O6

O5

6 A

AMUX

A Inputs

AX

CE

CLK

SRREV(DX)

From eight 5-LUTs ... to one 8-LUT.

Combinational or registered outs.

Flip-flops unused by LUTs can be used

standalone.

Flip-flops ...28

Page 8: EECS150 - Digital Design Lecture 3 - Field …inst.eecs.berkeley.edu/~cs150/sp10/Lecture/lec03-fpga.pdfSpring 2010 EECS150 - Lec03-FPGA Page EECS150 - Digital Design Lecture 3 - Field

Spring 2010 EECS150 - Lec03-FPGA Page

Slice flip-flop properties ...

Virtex-5 FPGA User Guide www.xilinx.com 177UG190 (v4.2) May 9, 2008

CLB OverviewR

Table 5-3: Truth Table when SRLOW is Used (Default Condition)

SR REV Function

0 0 No Logic Change

0 1 1

1 0 0

1 1 0

Table 5-4: Truth Table when SRHIGH is Used

SR REV Function

0 0 No Logic Change

0 1 0

1 0 1

1 1 0

Figure 5-5: Register/Latch Configuration in a SliceUG190_5_05_071207

DX

CX

BX

CE

AX

DQ

CQ

BQ

AQ

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

DFFLUT D Output

LUT C Output

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

Q

CECK

Q

Q

Q

SR

LUT B Output

LUT A Output AFF

BFF

CFF

CLK

Reset Type

Sync

Async

Each state element may be edge-triggered FF or latch.

Each state element may respond differently to

set/reset signal.

Clock enable, clock polarity, and set/reset lines in a slice

are shared.

Next: The vertical dimension ...29

Spring 2010 EECS150 - Lec03-FPGA Page

Reminder: arithmetic addition ...

One-bit full adder

[Co,S] = A + B + Ci

A, B, Ci: 1-bit number inputs.[Co,S]: 2-bit number output.“+”: Arithmetic addition.

Simplest multi-bit adder

“Ripple-carry adder”30

UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture

Virtex-5 FPGA User Guide www.xilinx.com 197UG190 (v4.2) May 9, 2008

CLB OverviewR

carry multiplexer (MUXCY) can also be used to cascade function generators for implementing wide logic functions.

Figure 5-24 illustrates the carry chain with associated logic elements in a slice.

The carry chains carry lookahead logic along with the function generators. There are ten independent inputs (S inputs – S0 to S3, DI inputs – DI1 to DI4, CYINIT and CIN) and eight independent outputs (O outputs – O0 to O3, and CO outputs – CO0 to CO3).

The S inputs are used for the “propagate” signals of the carry lookahead logic. The “propagate” signals are sourced from the O6 output of a function generator. The DI inputs are used for the “generate” signals of the carry lookahead logic. The “generate” signals are sourced from either the O5 output of a function generator or the BYPASS input (AX, BX, CX, or DX) of a slice. The former input is used to create a multiplier, while the latter is used

Figure 5-24: Fast Carry Logic Path and Associated Elements

UG190_5_24_050506

O6 From LUTD

DMUX/DQ*

DMUX

DQO5 From LUTD

DX

S3MUXCY

DI3

CO3

O3

COUT (To Next Slice)

Carry Chain Block(CARRY4)

(Optional)

D Q

O6 From LUTC

CMUX/CQ*

CMUX

CQO5 From LUTC

CX

S2MUXCY

DI2

CO2

CO1

CO0

O2

(Optional)

D Q

O6 From LUTB

BMUX/BQ*

BMUX

BQO5 From LUTB

BX

S1MUXCY

DI1

O1

(Optional)

D Q

O6 From LUTA

AMUX/AQ*

AMUX

AQO5 From LUTA

AX

S0MUXCY

DI0

CIN

CIN (From Previous Slice)

* Can be used ifunregistered/registeredoutputs are free.

CYINIT

10

O0

(Optional)

D Q

We can map ripple-carry addition onto

carry-chain block.

The carry-chain block also useful for speeding up other adder structures

and counters.

Virtex 5 Verical Logic

Spring 2010 EECS150 - Lec03-FPGA Page

Putting it all together ... a SLICEL.

174 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008

Chapter 5: Configurable Logic Blocks (CLBs)R

Each CLB can contain zero or one SLICEM. Every other CLB column contains a SLICEMs. In addition, the two CLB columns to the left of the DSP48E columns both contain a SLICEL and a SLICEM.

Figure 5-4: Diagram of SLICEL

A6LUTROM

COUT

D

DX

C

CX

B

BX

A

AX

O6O5

UG190_5_04_032606

A5A4A3A2A1

D6

DMUX

D

DQ

C

CQ

CMUX

B

BQ

BMUX

A

AQ

AMUX

DX

D5D4D3D2D1

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

Q

CECK

CIN

0/1

A6LUTROM

O6O5

A5A4A3A2A1

C6

CX

C5C4C3C2C1

A6LUTROM

O6O5

A5A4A3A2A1

B6

BX

B5B4B3B2B1

A6LUTROM

O6O5

A5A4A3A2A1

A6

AXSRCE

CLK

A5A4A3A2A1

Q

Q

Q

Reset Type

Sync

Async The previous slides explain all

SLICEL features.

About 50% of the 17,280 slices in an LX110T are

SLICELs.

The other slices are SLICEMs, and have extra

features. 32

Page 9: EECS150 - Digital Design Lecture 3 - Field …inst.eecs.berkeley.edu/~cs150/sp10/Lecture/lec03-fpga.pdfSpring 2010 EECS150 - Lec03-FPGA Page EECS150 - Digital Design Lecture 3 - Field

Spring 2010 EECS150 - Lec03-FPGA Page

Recall: 5-LUT architecture ...

ExpressFabric Technology

WP245 (v1.1.1) July 7, 2006 www.xilinx.com 5

R

The 6-input LUT leads to several benefits:• As it implements wider functions directly in the LUT, the number of logic levels

between registers is reduced, leading to higher performance.• It implements significantly more logic than a LUT with four inputs.• Power consumption is reduced because the larger LUT reduces the amount of

required interconnect (routing resources).The Virtex-5 family SLICEM LUTs also provide additional benefits:• New aspect ratios for distributed RAM. Every LUT can be configured as a 64 x 1 or

32 x 2 distributed RAM. Benefits for the designer are a much denser and faster implementation of distributed RAM with increased flexibility.

• Longer SRL chains. A single LUT supports a 32-bit SRL. A slice can thus implement a shift register of up to 128 bits, providing significant area savings and reduced routing resources in comparison to previous architectures. Shift registers are features available only in Xilinx devices. The Xilinx ISE™ software packer automatically packs two 16-bit SRLs with common addressing but different data. In other words, if the application needs a 16-bit deep, 8-bit-wide shift register, it can be implemented in a single slice.

Routing and Interconnect ArchitectureWith process technology advancements, interconnect timing delays can account for more than 50% of the critical path delay. A new diagonally symmetric interconnect pattern, developed for the Virtex-5 family, enhances performance by reaching more places in fewer hops. The new pattern allows for more logic connections to be made within two or three hops. Moreover, the more regular routing pattern makes it easier for the Xilinx ISE software to find the most optimal routes. All of the interconnect features are transparent to FPGA designers, but translate to higher overall

Figure 3: Block Diagram of a Virtex-5 6-Input LUT

WP245_03_051006

LUT5

A1

A2

A3

A4 D

A5

A6

A2

A3

A4 D

D6

D5

A5

A6

A2

A3

A4

A5

A6

LUT6

LUT5

A[6:2] D000000000100010

....

101

111011111011111

001

Q

Q

Q

Q

Q

Q

(1)

(1)

(1)

(0)

(0)

(0)

.... D

A[6:2]

32 Latches. Configured to 1 or 0.

Some parts of a logic design need many state elements.

SLICEMs replace normal 5-LUTs with circuits that can act like 5-LUTs, but can

alternatively use the 32 latches as RAM, ROM,

shift registers.33

Spring 2010 EECS150 - Lec03-FPGA Page

A SLICEM 6-LUT ...

Normal 6-LUT inputs.

Normal 5/6-LUT outputs.

Memory write

address

Memory data input

Memory data input.

Control output for chaining LUTs to

make larger memories.

Virtex-5 FPGA User Guide www.xilinx.com 173UG190 (v4.2) May 9, 2008

CLB OverviewR

Figure 5-3: Diagram of SLICEM

A6DI2

COUT

D

DX

C

CX

B

BX

A

AX

O6

DI1MC31

O5

UG190_5_03_041006

A5A4A3A2A1

D6

DIDMUX

D

DQ

C

CQ

CMUX

B

BQ

BMUX

A

AQ

AMUX

Reset Type

DX

D5D4D3D2D1

WA1-WA6WA7WA8

DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM

DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM

DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM

DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

Q

CECK

CLKWSGEN

CIN

0/1

WE

Sync

Async

A6DI2

O6

DI1

MC31

O5

A5A4A3A2A1

C6

CI

CX

C5C4C3C2C1

A6DI2

O6

DI1

MC31

O5

A5A4A3A2A1

B6

BI

BX

B5B4B3B2B1

A6DI2

O6

DI1

MC31

O5

A5A4A3A2A1

A6

AI

AXSRCE

CLK

WE

A5A4A3A2A1

Q

Q

Q

WA1-WA6WA7WA8

WA1-WA6WA7WA8

WA1-WA6WA7WA8

A 1.1 Mb distributed RAM can be made if all SLICEMs of an LX110T are used as RAM.

34

Spring 2010 EECS150 - Lec03-FPGA Page

Many RAM configurations possible ...

178 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008

Chapter 5: Configurable Logic Blocks (CLBs)R

SRHIGH and SRLOW can be set individually for each storage element in a slice. The choice of synchronous (SYNC) or asynchronous (ASYNC) set/reset (SRTYPE) cannot be set individually for each storage element in a slice.

The initial state after configuration or global initial state is defined by separate INIT0 and INIT1 attributes. By default, setting the SRLOW attribute sets INIT0, and setting the SRHIGH attribute sets INIT1. Virtex-5 devices can set INIT0 and INIT1 independent of SRHIGH and SRLOW.

The configuration options for the set and reset functionality of a register or a latch are as follows:

! No set or reset

! Synchronous set

! Synchronous reset

! Synchronous set and reset

! Asynchronous set (preset)

! Asynchronous reset (clear)

! Asynchronous set and reset (preset and clear)

Distributed RAM and Memory (Available in SLICEM only)

Multiple LUTs in a SLICEM can be combined in various ways to store larger amount of data.

The function generators (LUTs) in SLICEMs can be implemented as a synchronous RAM resource called a distributed RAM element. RAM elements are configurable within a SLICEM to implement the following:

! Single-Port 32 x 1-bit RAM

! Dual-Port 32 x 1-bit RAM

! Quad-Port 32 x 2-bit RAM

! Simple Dual-Port 32 x 6-bit RAM

! Single-Port 64 x 1-bit RAM

! Dual-Port 64 x 1-bit RAM

! Quad-Port 64 x 1-bit RAM

! Simple Dual-Port 64 x 3-bit RAM

! Single-Port 128 x 1-bit RAM

! Dual-Port 128 x 1-bit RAM

! Single-Port 256 x 1-bit RAM

Distributed RAM modules are synchronous (write) resources. A synchronous read can be implemented with a storage element or a flip-flop in the same slice. By placing this flip-flop, the distributed RAM performance is improved by decreasing the delay into the clock-to-out value of the flip-flop. However, an additional clock latency is added. The distributed elements share the same clock input. For a write operation, the Write Enable (WE) input, driven by either the CE or WE pin of a SLICEM, must be set High.

Virtex-5 FPGA User Guide www.xilinx.com 187UG190 (v4.2) May 9, 2008

CLB OverviewR

Distributed RAM configurations greater than the provided examples require more than one SLICEM. There are no direct connections between slices to form larger distributed RAM configurations within a CLB or between slices.

Figure 5-14: Distributed RAM (RAM256X1S)

UG190_5_14_050506

DI1D

A[7:0]

WCLKWE

(CLK)(WE/CE)

68

SPRAM64

RAM256X1S

A[6:1]WA[8:1]CLKWE

O6

DI1

68

SPRAM64

A[6:1]WA[8:1]CLKWE

O6F7BMUX

F8MUXRegisteredOutput

Output

(Optional)

D Q

O

DI1

68

SPRAM64

A[6:1]WA[8:1]CLKWE

O6

DI1

68

SPRAM64

A[6:1]WA[8:1]CLKWE

O6F7AMUX

A6 (CX)

A6 (AX)

A7 (BX)

Example configuration: Single-port 256b x 1,

registered output.

A complete list:

A 128 x 32b LUT RAM has a 1.1ns access time.35

Spring 2010 EECS150 - Lec03-FPGA Page

SLICEM shift register (one of many).

Virtex-5 FPGA User Guide www.xilinx.com 189UG190 (v4.2) May 9, 2008

CLB OverviewR

or flip-flop is available to implement a synchronous read. In this case, the clock-to-out of the flip-flop determines the overall delay and improves performance. However, one additional cycle of clock latency is added. Any of the 32 bits can be read out asynchronously (at the O6 LUT outputs) by varying the 5-bit address. This capability is useful in creating smaller shift registers (less than 32 bits). For example, when building a 13-bit shift register, simply set the address to the 13th bit. Figure 5-15 is a logic block diagram of a 32-bit shift register.

Figure 5-16 illustrates an example shift register configuration occupying one function generator.

Figure 5-15: 32-bit Shift Register Configuration

Figure 5-16: Representation of a Shift Register

ug190_5_15_050506

Output (Q)

RegisteredOutput

(Optional)

(AQ)

DI1

D Q

(AX)

SHIFTIN (MC31 of Previous LUT)

SHIFTIN (D)

A[4:0]

CLKCE

(A[6:2])

(CLK)(WE/CE)

SRL32

SRLC32E

A[6:2]

CLKCE

O6

MC31 SHIFTOUT (Q31)5

UG190_5_16_050506

SHIFTIN (D)

SHIFTOUT(Q31)WE

CLK

Address (A[4:0])

32-bit Shift Register

MUX

Q

5

See Virtex-5 User Guide for an complete list of shift-register types.

36

Page 10: EECS150 - Digital Design Lecture 3 - Field …inst.eecs.berkeley.edu/~cs150/sp10/Lecture/lec03-fpga.pdfSpring 2010 EECS150 - Lec03-FPGA Page EECS150 - Digital Design Lecture 3 - Field

Spring 2010 EECS150 - Lec03-FPGA Page

SLICEL vs SLICEM ...

Virtex-5 FPGA User Guide www.xilinx.com 173UG190 (v4.2) May 9, 2008

CLB OverviewR

Figure 5-3: Diagram of SLICEM

A6DI2

COUT

D

DX

C

CX

B

BX

A

AX

O6

DI1MC31

O5

UG190_5_03_041006

A5A4A3A2A1

D6

DIDMUX

D

DQ

C

CQ

CMUX

B

BQ

BMUX

A

AQ

AMUX

Reset Type

DX

D5D4D3D2D1

WA1-WA6WA7WA8

DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM

DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM

DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM

DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

Q

CECK

CLKWSGEN

CIN

0/1

WE

Sync

Async

A6DI2

O6

DI1

MC31

O5

A5A4A3A2A1

C6

CI

CX

C5C4C3C2C1

A6DI2

O6

DI1

MC31

O5

A5A4A3A2A1

B6

BI

BX

B5B4B3B2B1

A6DI2

O6

DI1

MC31

O5

A5A4A3A2A1

A6

AI

AXSRCE

CLK

WE

A5A4A3A2A1

Q

Q

Q

WA1-WA6WA7WA8

WA1-WA6WA7WA8

WA1-WA6WA7WA8

174 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008

Chapter 5: Configurable Logic Blocks (CLBs)R

Each CLB can contain zero or one SLICEM. Every other CLB column contains a SLICEMs. In addition, the two CLB columns to the left of the DSP48E columns both contain a SLICEL and a SLICEM.

Figure 5-4: Diagram of SLICEL

A6LUTROM

COUT

D

DX

C

CX

B

BX

A

AX

O6O5

UG190_5_04_032606

A5A4A3A2A1

D6

DMUX

D

DQ

C

CQ

CMUX

B

BQ

BMUX

A

AQ

AMUX

DX

D5D4D3D2D1

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

CECK

D

FFLATCHINIT1INIT0SRHIGHSRLOW

SR REV

Q

CECK

CIN

0/1

A6LUTROM

O6O5

A5A4A3A2A1

C6

CX

C5C4C3C2C1

A6LUTROM

O6O5

A5A4A3A2A1

B6

BX

B5B4B3B2B1

A6LUTROM

O6O5

A5A4A3A2A1

A6

AXSRCE

CLK

A5A4A3A2A1

Q

Q

Q

Reset Type

Sync

Async

SLICEMSLICEL

SLICEM adds memory features to LUTs, + muxes.

37

Spring 2010 EECS150 - Lec03-FPGA Page

Virtex-5 DSP48E Slice

38

Efficient implementation of multiply, add, bit-wise logical.

LX110T has 64 in a single column.

Spring 2010 EECS150 - Lec03-FPGA Page 39

UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture

!"#$%&'()*$$+,,-$$$).'/0$1

!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<

!"#$%&'()*!"#$%&'( !"#$%&')

To be continued ...

Throughout the semester, we will look at different Virtex-5 features in-depth.

Switch fabricBlock RAMDSP48 (ALUs)ClockingI/OSerial I/O + PCI