Lecture 2: Field Programmable Gate Arrays September 13, 2004 ECE 697F Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays



Overview

• Three types of FPGAs

- EEPROM

- SRAM

- Antifuse

• SRAM FPGA architectural choices.

• FPGA logic blocks -> size versus performance.

• FPGA switch boxes

• State-of-the-art

- Research issues in architecture.


Configuration vs. programming

° FPGA configuration:

• Bits stay at the device they program.

• A configuration bit controls a switch or a logic bit.

° CPU programming:

• Instructions are fetched from a memory.

• Instructions select complex operations.

[Figure: a CPU fetches the instruction "add r1, r2" from memory into its instruction register (IR).]


Logic element questions

° How many inputs?

° How many functions?

• All functions of n inputs or eliminate some combinations?

• What inputs go to what pieces of the function?

° Any specialized logic?

• Adder, etc.

° What register features?
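The "all functions of n inputs" question can be made concrete with a little arithmetic. The sketch below is only an illustration (the function name is made up here): a truth table for n inputs has 2^n rows, and each row's output can be 0 or 1, so there are 2^(2^n) possible functions.

```python
def count_boolean_functions(n: int) -> int:
    """Number of distinct Boolean functions of n inputs: 2 ** (2 ** n)."""
    rows = 2 ** n       # rows in the truth table
    return 2 ** rows    # each row's output bit is chosen independently


for n in range(1, 5):
    print(n, 2 ** n, count_boolean_functions(n))
```

A logic element that implements every function of n inputs therefore needs 2^n configuration bits, which is one reason to consider eliminating some combinations as n grows.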


Anti-Fuse FPGA (Actel ACT family)

• Anti-fuses are one-time programmable.

- A 16-volt programming pulse breaks down the dielectric

- Only need to program once.

• High performance -> direct connections between poly and N+

• Less appropriate for Reconfigurable Computing

- Good for bus transceivers

- High speed operation.


Antifuses

° Permanently programmed.

° Make a connection with electrical signal.

• More reliable than breaking a connection.

• Avoids shrapnel.

° Resistance of about 100 Ω.


Antifuse structure

[Figure: antifuse cross-section with labels for the substrate, Metal 1, Metal 2, the antifuse, and the via.]


Actel Programmable Gate Arrays

• Rows of programmable logic building blocks + rows of interconnect

• Anti-fuse technology: program once

• 8-input, single-output combinational logic blocks

• FFs constructed from discrete cross-coupled gates

• Use anti-fuses to build up long wiring runs from short segments

[Figure: chip floorplan with rows of logic modules alternating with wiring tracks, bordered on all four sides by I/O buffers, programming and test logic.]


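Actel Logic Module

• Basic module is a modified 4:1 multiplexer

• Example: implementation of an S-R latch

[Figure: logic module built from three 2:1 multiplexers. One has data inputs D0 and D1 and select SOA, one has data inputs D2 and D3 and select SOB, and the output multiplexer, controlled by S0 and S1, drives Y. The S-R latch example ties multiplexer inputs to the constants "0" and "1" and to the signals R, S, and Q.]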
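A minimal Python sketch of a multiplexer-based logic module follows. Two things are assumptions rather than facts from the slide: the output multiplexer is selected by the OR of S0 and S1, and the S-R latch wiring in `sr_latch_next` is one possible mapping (set wins when S = R = 1), not necessarily the one drawn in the figure. All function names are made up for illustration.

```python
def mux2(d0: int, d1: int, sel: int) -> int:
    """2:1 multiplexer: returns d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0


def logic_module(d0, d1, soa, d2, d3, sob, s0, s1):
    """Three 2:1 muxes; the output select is ASSUMED to be (s0 OR s1)."""
    upper = mux2(d0, d1, soa)
    lower = mux2(d2, d3, sob)
    return mux2(upper, lower, s0 | s1)


def sr_latch_next(q: int, s: int, r: int) -> int:
    """Hypothetical S-R latch mapping; q is the fed-back module output."""
    return logic_module(d0=q, d1=0, soa=r,   # upper mux: R ? 0 : Q
                        d2=1, d3=1, sob=0,   # lower mux: constant 1
                        s0=s, s1=0)          # S selects the constant 1


q = 0
for s, r in [(1, 0), (0, 0), (0, 1), (0, 0)]:
    q = sr_latch_next(q, s, r)
    print(f"S={s} R={r} -> Q={q}")
```

The point of the sketch is that tying data inputs to constants and feeding the output back is enough to turn a purely combinational multiplexer module into a storage element.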


Actel Interconnect

• Interconnection fabric

[Figure: logic modules, horizontal tracks, vertical tracks, and anti-fuses.]


Actel Routing Example

• Jogs cross an anti-fuse

• Minimize the number of jogs for speed-critical circuits

• 2-3 hops for most interconnections

[Figure: a route from one logic module's output to the inputs of two other logic modules.]


EEPROM Devices (PLDs)

• Frequently used technology for PALs, GALs, EPLDs

• User design frequently decomposed into a sum-of-products (SOP) representation

• Appropriate for system glue logic.

• Single transistor interconnection point.
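As a reminder of what a sum-of-products form looks like, here is a small sketch; the data layout and names are assumptions chosen for illustration. Each product term ANDs a set of literals, and the terms are ORed together, which is the structure a PLD's AND-OR array evaluates.

```python
from itertools import product

# Each product term maps an input name to the polarity it requires:
# True means the input itself, False means its complement.
XOR_SOP = [
    {"a": True, "b": False},   # a AND (NOT b)
    {"a": False, "b": True},   # (NOT a) AND b
]


def eval_sop(terms, inputs):
    """OR together the AND of every product term's literals."""
    return any(
        all(inputs[name] == polarity for name, polarity in term.items())
        for term in terms
    )


for a, b in product([False, True], repeat=2):
    print(a, b, eval_sop(XOR_SOP, {"a": a, "b": b}))
```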


Altera Max 7000 Macrocell

[Figure: MAX 7000 macrocell. Labels: 36 signals from the PIA, 16 expander product terms, LAB local array, shared logic expanders, parallel logic expanders (from other macrocells), product-term select matrix, clear select, clock/enable select, global clear, global clock, VCC, and a programmable register (D, Q, PRN, CLRN, ENA) whose output goes to the I/O control block and to the PIA. A note marks the symbol that represents a multiplexer controlled by the configuration program.]


Max 7000 PLD Structure

[Figure: MAX 7000 PLD structure. Four LABs (LAB A: macrocells 1-16, LAB B: macrocells 17-32, LAB C: macrocells 33-48, LAB D: macrocells 49-64) are connected through the PIA. Each LAB has an I/O control block serving 6-16 I/O pins. Dedicated pins: Input/GCLK1, Input/OE2/GCLK2, Input/OE1, Input/GCLRn.]


SRAM-based FPGA

• SRAM bits can be programmed many times

• Each programming bit takes up five transistors

• Larger device area reduces speed versus EPROM and antifuse.

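[Figure: left, a programming bit (SRAM cell) with Read or Write and Data inputs and Q outputs; right, a 2-input LUT in which inputs I1 and I2 select one of the programming bits P1-P4 to drive Out.]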
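The 2-input LUT can be sketched in a few lines of Python. This is a behavioral model only, and the bit ordering (I1 as the high select bit) is an assumption made for the example; the class name is also made up.

```python
class Lut2:
    """2-input LUT: four stored programming bits, selected by (i1, i2)."""

    def __init__(self, bits):
        if len(bits) != 4 or any(b not in (0, 1) for b in bits):
            raise ValueError("a 2-input LUT needs exactly four 0/1 bits")
        self.bits = list(bits)   # P1..P4, written at configuration time

    def out(self, i1: int, i2: int) -> int:
        # ASSUMED ordering: i1 is the high select bit, i2 the low one.
        return self.bits[(i1 << 1) | i2]


and_gate = Lut2([0, 0, 0, 1])
xor_gate = Lut2([0, 1, 1, 0])

for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, and_gate.out(i1, i2), xor_gate.out(i1, i2))
```

Changing the four stored bits changes the function without touching any wiring. At five transistors per programming bit, the four bits alone cost 20 transistors of configuration storage.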


Field Programmable Gate Array


Design Tradeoffs

• Some logic clusters are large (e.g., Altera and Xilinx clusters contain 8-10 LUT-FF pairs)

• Three important issues:

- Logic elements per cluster

- Cluster connectivity to interconnect wires: connection box flexibility (Fc)

- Switchbox flexibility (Fs)

[Figure: a logic cluster, its I/O connections to the routing tracks, and a switchbox.]


Issue 1: The Logic Cluster

• Question: How many basic logic elements (BLEs) should there be per cluster?


Logic cluster utilization (Betz & Rose)

° Logic utilization vs. fraction of inputs accessible to LE in cluster.

° Utilization at 100% when only 50%-60% of inputs are accessible.

° Also found that connecting each track to only one LE output per cluster was sufficient.

© 1998 IEEE


Area efficiency vs. cluster size (Betz & Rose)

° Transistors per LE vs. cluster size.

• Includes overhead circuits.

° Clusters of size 1-8 were area-efficient.

© 1998 IEEE


Logic Cluster Size

• Interestingly, small clusters are more area-efficient (Betz, CICC’99)

• Includes area needed for routing.

• Small clusters (e.g. one BLE per cluster) are not “CAD friendly”.

• Most commercial devices have 4-10 BLEs per cluster


Number of Inputs per Cluster

• Lots of opportunities for input sharing in large clusters (Betz – CICC’99)

• Reducing inputs reduces the size of the device and makes it faster.

• Most FPGA devices have more inputs than actually needed to allow for routing flexibility


Connection Box Flexibility

• Fc -> How many tracks does an input pin connect to?

• If the logic cluster is small, Fc is large (Fc = W, where W is the number of tracks in the channel).

• If the logic cluster is large, Fc can be smaller.

- Approximately 0.2W for Xilinx XC4000EX, Virtex

[Figure: a logic cluster whose I/O pin (Out) can connect to tracks T0, T1, and T2, giving Fc = 3.]
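To see what Fc means in switch counts, the sketch below treats Fc as a fraction of the channel width W. The function names, the rounding rule, and the example values (W = 20, 8 pins) are assumptions for illustration, not figures from the lecture.

```python
def tracks_per_pin(fc_fraction: float, channel_width: int) -> int:
    """Tracks one pin can connect to when Fc is a fraction of W."""
    return max(1, round(fc_fraction * channel_width))


def connection_box_switches(num_pins: int, fc_fraction: float,
                            channel_width: int) -> int:
    """Programmable switches needed to attach num_pins pins to a channel."""
    return num_pins * tracks_per_pin(fc_fraction, channel_width)


W = 20  # hypothetical channel width
print(tracks_per_pin(1.0, W))              # small cluster: Fc = W
print(tracks_per_pin(0.2, W))              # large cluster: Fc = 0.2W
print(connection_box_switches(8, 0.2, W))  # 8 pins at Fc = 0.2W
```

Dropping Fc from W to 0.2W cuts the connection box switches per pin by a factor of five in this example, which is why large clusters that tolerate a low Fc save routing area.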


Switchbox Flexibility

• Switch box provides optimized interconnection area.

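• Switch box flexibility (Fs), the number of other wire segments each incoming wire can connect to, was found to be less important than Fc

• Six transistors are needed per switch point for Fs = 3 (one for each pair of the four wire ends)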

[Figure: switchbox with tracks 0 and 1 entering on each of the four sides.]
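The six-transistor count can be checked by enumeration. The sketch below models a switchbox in which track t on one side connects only to track t on the other three sides (often called a disjoint switchbox, a name not used on the slide); the function and variable names are illustrative.

```python
from itertools import combinations

SIDES = ("left", "top", "right", "bottom")


def disjoint_switchbox(width: int):
    """Switches of a switchbox where track t connects only to track t."""
    switches = []
    for t in range(width):
        for a, b in combinations(SIDES, 2):   # 6 pairs of sides
            switches.append(((a, t), (b, t)))
    return switches


def flexibility(switches, wire):
    """Fs for one wire end: how many other wire ends it can switch to."""
    return sum(wire in pair for pair in switches)


box = disjoint_switchbox(2)
print(len(box))                       # total switches for 2 tracks
print(flexibility(box, ("left", 0)))  # Fs of one wire end
```

Four wire ends meet at each switch point, and four ends taken two at a time gives six pairs, hence six pass transistors while every wire end still sees Fs = 3.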


Putting it all together

• Xilinx XC4000EX family

- Fs = 3

- Fc = 0.2W

- I = 8

• Altera Flex10K family

- Fs = 3

- Fc = 0.25W

- I = 22

More contemporary FPGAs have larger cluster sizes and segmented routing.

• It is more difficult to quantify exact Fc and Fs values for them.


Switchbox Issues


Switch Matrix


Xilinx 4000 Interconnect Details


Wilton Switchbox

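• Rotate connections inside the switchbox while keeping Fs = 3

• Still has six transistors for base switch matrix.

• Eliminates the domain issue: with unrotated connections, a signal that enters on track i can only ever continue on track i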

[Figure: Wilton switchbox with tracks 0, 1, and 2 on each side; the track order is rotated on some sides.]
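The domain issue can be demonstrated with a small reachability check. The rotation pattern in `rotated_turn` is an assumption: it follows a permutation commonly used for Wilton-style switchboxes in academic routing tools, but it should not be read as the exact wiring shown on the slide. The sketch also assumes that wires between switchboxes keep their track number.

```python
SIDES = ("left", "top", "right", "bottom")


def disjoint_turn(src, dst, t, w):
    """Unrotated switchbox: the track number never changes."""
    return t


def rotated_turn(src, dst, t, w):
    """ASSUMED rotation pattern; straight-through keeps the track."""
    turns = {
        ("left", "top"): (w - t) % w,
        ("top", "left"): (w - t) % w,
        ("top", "right"): (t + 1) % w,
        ("right", "top"): (t - 1) % w,
        ("right", "bottom"): (2 * w - 2 - t) % w,
        ("bottom", "right"): (2 * w - 2 - t) % w,
        ("bottom", "left"): (t + 1) % w,
        ("left", "bottom"): (t - 1) % w,
    }
    return turns.get((src, dst), t)


def reachable_tracks(turn, w, start=0):
    """Track numbers a signal can occupy after any sequence of hops."""
    seen, frontier = {start}, [start]
    while frontier:
        t = frontier.pop()
        for src in SIDES:
            for dst in SIDES:
                if src != dst:
                    nxt = turn(src, dst, t, w)
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
    return seen


print(sorted(reachable_tracks(disjoint_turn, 3)))
print(sorted(reachable_tracks(rotated_turn, 3)))
```

In both cases each wire end still connects to exactly one wire on each of the other three sides, so Fs stays at 3; only the choice of which track it lands on changes.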


Switchbox Issues


Buffering

• FPGAs need to buffer to isolate large RC networks

• Architects must decide where to place buffers.



Segmentation

• Segmentation distribution: how many of each length?

• Longer length

- Better performance?

- Reduced routability?

[Figure: wire segments of length 1, length 2, and length 4 between points X and Y.]
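A rough way to see the segment-length tradeoff is to count how many segments a connection needs and how much wire it leaves unused. The sketch below assumes all segments in a route have the same length and start aligned with the source; real devices stagger segments, so this is only a back-of-the-envelope model with made-up function names.

```python
def segments_needed(distance: int, segment_length: int) -> int:
    """Segments chained end to end to span `distance` logic blocks."""
    return -(-distance // segment_length)   # ceiling division on ints


def wasted_length(distance: int, segment_length: int) -> int:
    """Wire length occupied beyond what the connection actually needs."""
    return segments_needed(distance, segment_length) * segment_length - distance


for length in (1, 2, 4):
    print(length, segments_needed(5, length), wasted_length(5, length))
```

Fewer segments means fewer switches in series, which is the performance argument for long segments; the wasted length is track capacity that no other net can use, which is the routability concern.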


Translating a Design to an FPGA

• Hierarchical FPGA likely to have a tree-like interconnect.

• Each “sub-array” contains about 100K gates

• Clever VLSI layout needed

[Figure: four FPGA sub-arrays.]


Pipelined Interconnect

• The latest trend in FPGAs is to embed clocked flip-flops in the interconnect to pipeline data.

• Helps create tolerance for delay

• Allows interconnect to be reused

• Large FPGA looks like a parallel processor.



FPGA Comparison

                    SRAM      Antifuse    Flash     EPROM
Speed               Worst     Best        Worst     Medium
Power               Varies    Near Best   Best      Worst
Density             Medium    Second      Best      Worst
Radiation           Worst     Best        Medium    Medium
Routing cell size   1         1/10        1/7       PLD
Reprogrammable      Yes       No          Yes       Yes


Summary

• Three basic types of FPGA devices

- Antifuse

- EEPROM

- SRAM

• Key issues for SRAM FPGA are logic cluster, connection box, and switch box.

• Latest advances examine performance and routability.

Next class: FPGA versus Processor
