
ADAPTIVE FPGAS: HIGH-LEVEL ARCHITECTURE AND A SYNTHESIS METHOD

Valavan Manohararajah, Stephen D. Brown, and Zvonko G. Vranesic

Department of Electrical and Computer Engineering, University of Toronto

manohv|brown|[email protected]

ABSTRACT

This paper presents preliminary work exploring adaptive field programmable gate arrays (AFPGAs). An AFPGA is adaptive in the sense that the functionality of subcircuits placed on the chip can change in response to changes observed on certain control signals. We describe the high-level architecture, which adds control logic and SRAM bits to a traditional FPGA to produce an AFPGA. We also describe a synthesis method that identifies and resynthesizes mutually exclusive pieces of logic so that they may share the resources available in an AFPGA. The architectural feature and its associated synthesis method help reduce circuit size by 28% on average and by up to 40% on select circuits.

1. INTRODUCTION

Field programmable gate arrays (FPGAs) have become an increasingly popular medium for implementing digital circuits due to the high costs associated with application specific integrated circuit (ASIC) implementations. An FPGA can be bought off the shelf and can be configured to implement an arbitrary digital circuit. This programmability allows reduced development and verification times, and avoids the large costs involved in chip fabrication. The programmability of FPGAs comes with a steep price, however. An FPGA-based implementation is estimated to be forty times larger and three times slower than a comparable implementation using standard cells [1]. A number of studies have examined methods of improving the area and speed efficiency of FPGAs [2]–[5]. Here we consider an architectural modification that improves the area efficiency of existing FPGA architectures by dynamically changing the functionality of the circuit implemented in the programmable logic fabric.

The key difference between an AFPGA and an FPGA is in the structure of the configuration element. In an FPGA, each configuration element determines the functionality of a single programmable resource, as illustrated in Figure 1a. Although most of the configuration elements in an AFPGA are the same as in an FPGA, a fraction of them use the alternative structure illustrated in Figure 1b. In this structure, two configuration elements are connected to a single programmable resource through a multiplexer.

Fig. 1. Structure of a configuration element in an FPGA (a) and an AFPGA (b).

Fig. 2. A four-bit add-subtract unit.

Fig. 3. Specialized versions of the add-subtract unit: (a) is specialized to perform an addition and (b) is specialized to perform a subtraction.



Fig. 4. The four-bit add-subtract unit implemented on an AFPGA.

The functionality of the programmable resource is determined by the configuration element selected by the multiplexer. The motivation for the study of this alternative structure is best illustrated with an example. Consider an FPGA architecture where a logic block can implement any function of three inputs. A possible implementation of a four-bit add-subtract unit for this architecture is given in Figure 2. A subtraction is performed when the control signal as (AddSub) is equal to 1 and an addition is performed when it is equal to 0. This implementation uses 12 logic blocks. The add-subtract unit can be specialized to perform either an addition or a subtraction by setting as to zero or one, respectively. After setting as to a constant, logic simplification removes the four XOR blocks, reducing the number of logic blocks required to 8. The specialized versions of the add-subtract unit are illustrated in Figures 3a and 3b. Although the logic blocks have been specialized to perform different functions, the routing structure is still identical. By allowing the logic blocks to use the multiplexed configuration element, the two specialized circuits can be combined into a single circuit as illustrated in Figure 4. In the combined circuit, each logic block implements the functionality required by both addition and subtraction. The appropriate function is activated by as, which is assumed to connect to a special configuration select signal. This example shows that there are potential area benefits when the multiplexed configuration element is used selectively in an FPGA architecture.
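As a concrete illustration of the specialization argument, the following Python sketch models the add-subtract unit behaviourally (it is not the paper's netlist). The function name and the assumption that as also supplies the initial carry-in, giving two's-complement subtraction, are ours; fixing as to a constant is what lets logic simplification remove the XOR stage, as in Figures 3a and 3b.

```python
# Behavioural sketch of the add-subtract unit (assumed model, not the paper's
# netlist).  'as' is a Python keyword, so the control signal is named as_.

def add_sub(a, b, as_, width=4):
    """Return the (width+1)-bit result of a + b (as_ = 0) or a - b (as_ = 1)."""
    carry = as_                       # carry-in: 0 for addition, 1 for subtraction
    bits = []
    for i in range(width):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ as_     # XOR stage; a constant as_ lets logic
                                      # simplification remove it (Figure 3)
        bits.append(ai ^ bi ^ carry)                      # sum block
        carry = (ai & bi) | (ai & carry) | (bi & carry)   # carry block
    bits.append(carry)                # s4, the carry/borrow out
    return sum(bit << i for i, bit in enumerate(bits))

# Exhaustive check of the two specialized behaviours on 4-bit operands.
for a in range(16):
    for b in range(16):
        assert add_sub(a, b, 0) == a + b                  # Figure 3a: adder
        assert add_sub(a, b, 1) & 0xF == (a - b) & 0xF    # Figure 3b: subtractor
```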

The architectural modification we are proposing adds a control signal and some extra configuration memory in order to reduce the size of the circuit to be implemented. In modern FPGAs [6, 7] configuration memory is a small fraction of total chip area, and a small portion of this configuration memory is duplicated in order to allow two distinct subcircuits to share the same set of programmable resources. A detailed study of this area tradeoff can be found in [8]. Here we present a high-level overview of the architecture and describe a synthesis method that reduces circuit size significantly when targeting AFPGAs.

2. AFPGAS: HIGH-LEVEL ARCHITECTURE

AFPGAs are built by modifying an existing SRAM-configurable FPGA. Although other programming technologies, such as antifuses and floating-gate devices, exist, they are less popular and are not considered in this work.

Before we describe the additional circuitry needed to create AFPGAs, we briefly review the academic FPGA architecture [3] that is the basis for our work. At the highest level, an FPGA architecture consists of I/O pins and logic clusters. Programmable routing resources, comprised of programmable multiplexers and routing switches, allow connections to be made between logic clusters, and between logic clusters and I/O pins. Each logic cluster is made up of a number of basic logic elements (BLEs), and each BLE contains a lookup table (LUT) and a register. Here, we assume that the target architecture uses 4-input LUTs. Previous work has established the area efficiency of this particular LUT size [2, 4].

The distinguishing feature of AFPGAs is their use of the adaptive configuration element illustrated in Figure 1b. This is in contrast to FPGAs, which use the traditional configuration element illustrated in Figure 1a. In the traditional configuration element, each programmable resource is attached to a single SRAM cell which determines its functionality. In the adaptive configuration element, two SRAM cells connect to the programmable resource through a multiplexer, and the SRAM cell that gets to control the programmable resource is determined by a configuration select signal attached to the multiplexer. If the two SRAM cells contain differing values, the programmable resource being controlled changes its functionality dynamically in response to changes in the select signal.
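A minimal behavioural model of the two configuration-element styles of Figure 1 is sketched below; the class names and structure are assumptions made for illustration, not taken from the paper.

```python
# Behavioural model of Figure 1 (illustrative only; names and structure assumed).

class ConfigElement:
    """Traditional element: one SRAM bit drives the programmable resource."""
    def __init__(self, bit):
        self.bit = bit
    def value(self, select=0):
        return self.bit               # the select signal is ignored

class AdaptiveConfigElement:
    """Adaptive element: two SRAM bits multiplexed by a configuration select."""
    def __init__(self, bit0, bit1):
        self.bits = (bit0, bit1)
    def value(self, select):
        return self.bits[select]      # the selected SRAM cell controls the resource

# Programming the two cells with differing values makes the controlled resource
# change its behaviour dynamically with the select signal.
cell = AdaptiveConfigElement(bit0=1, bit1=0)
assert cell.value(select=0) == 1 and cell.value(select=1) == 0
```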

An AFPGA architecture is obtained by making two modifications to an existing FPGA architecture. First, a portion of the configuration elements in the FPGA is replaced by adaptive configuration elements. Since the adaptive configuration elements require a select signal, the second modification adds select signals, each controlling the configuration elements in a region of the chip. The select signals are accessible from the programmable routing network, allowing them to be driven by I/O pins and BLE outputs.

The adaptive BLE, illustrated in Figure 5a, is obtained from a traditional BLE by modifying the configuration elements that determine LUT functionality. Using adaptive configuration elements in place of traditional configuration elements, the LUT inside the adaptive BLE has the ability to perform two different functions of its inputs. A select signal, s, connects to the adaptive configuration elements and provides a way of selecting the function computed by the LUT.

Multiplexers and routing switches present in an FPGA can be made adaptive by replacing their configuration elements with adaptive configuration elements.


Fig. 5. (a) An adaptive BLE, (b) an adaptive multiplexer, and (c) an adaptive routing switch.

The adaptive multiplexer, illustrated in Figure 5b, has the capability of gating two different signals through to the output depending on the value of the select signal, s. The adaptive routing switch, illustrated in Figure 5c, supports two modes of operation where the current mode is determined by the select signal, s. The switch has the capability of being turned on or off under the control of the select signal if its SRAM cells have been programmed to two different values.

An AFPGA uses adaptive BLEs, multiplexers and routing switches to allow two subcircuits to share the same set of programmable resources.

3. COMPARISON WITH PREVIOUS WORK

The AFPGA architecture proposed here has similarities to previous work on multicontext FPGAs. An FPGA's context is the set of configuration elements that determine its functionality. A traditional FPGA contains a single context, which allows it to implement a single function. A multicontext FPGA has a number of contexts, but only one of them is active at any time. There has been both academic and industrial interest in multicontext FPGAs. Academic works include the Dharma architecture [9], the dynamically programmable gate array (DPGA) [10] and the virtual gate element array (VEGA) [11], while industrial work carried out at Xilinx considers a version of the Xilinx XC4000E FPGA [12] with several contexts.

An AFPGA can be viewed as a multicontext FPGA with two contexts. However, unlike a true multicontext FPGA, only a portion of an FPGA's context is duplicated to obtain the second context. A second difference is in the way that the contexts are switched. In a multicontext FPGA, the context switch is global; a chip-wide signal determines the context that is made active. This signal is usually connected to a control circuit that sequences through the contexts or to an external signal provided by the user. On an AFPGA, a number of different context activation signals are used, each one serving as the activation signal for a different region on the chip. Furthermore, a synthesized subcircuit within the AFPGA determines when the context is to be switched. A third difference is in the nature of the transformation used to map circuits onto multicontext FPGAs and AFPGAs.

Fig. 6. The AFPGA CAD flow: Design Entry → Synthesis → Technology Mapping → Adaptation → Technology Mapping → Packing → Clustering → Placement → Routing → FPGA Configuration.

Multicontext FPGAs use time multiplexing [12], which is a transformation based on sequential techniques, while AFPGAs use adaptation, which is a transformation based on combinational techniques.

4. AFPGAS: CAD FLOW

The CAD flow for AFPGAs is illustrated in Figure 6. The first two steps, synthesis and technology mapping, are the same as in an FPGA CAD flow. However, starting with the third step, adaptation, there are significant differences between the two CAD flows.

The process of adaptation creates pairs of subcircuits that share programmable resources in an AFPGA. There are two technology mapping steps in the AFPGA CAD flow: the first precedes adaptation and the second follows adaptation. The first mapping step is useful because it improves the performance of adaptation. During adaptation, the selection of control signals is guided by area concerns, and the accuracy of the selection process is greatly enhanced if the circuit has been mapped into LUTs. The second technology mapping step is useful because it produces a smaller circuit by remapping parts of the circuit affected by the simplification operations (described in Section 5.2) that take place during adaptation.

An AFPGA architecture is based on an existing FPGA architecture; thus the final four steps in the AFPGA CAD flow, packing, clustering, placement and routing, perform their traditional FPGA CAD tasks in addition to new tasks that are AFPGA specific. In addition to the task of grouping LUTs and registers into BLEs, the packing step groups pairs of LUTs selected from pairs of subcircuits to form adaptive BLEs. Clustering and placement have the additional task of ensuring that the BLEs that are part of a subcircuit pair are clustered and placed in a region with adaptive structures (BLEs, multiplexers and routing switches) whose functionality changes in response to a single configuration select signal.


Finally, the routing step has two additional tasks. First, it has to make use of adaptive multiplexers and routing switches when making connections for subcircuit pairs. Second, it has to connect the control signals that generated subcircuit pairs to the appropriate configuration select signals.

5. ADAPTATION

5.1. Introduction

A circuit may be viewed as being composed of several subcircuits. The process of adaptation replaces some of these subcircuits with adaptive subcircuits capable of sharing the adaptive resources present in an AFPGA. An adaptive subcircuit consists of two half circuits, each a specialized version of the subcircuit being replaced. The creation of an adaptive subcircuit begins with the selection of a control signal, which is then used to identify a subcircuit whose area is greatly reduced when the control signal is assumed to be a constant. Two specialized versions of the subcircuit can be generated, one assuming that the control signal is zero and the other assuming that the control signal is one. Since the control signal can only be in one of two possible states, zero or one, the specialized subcircuits are not needed at the same time and can be combined to form the two halves of an adaptive subcircuit. The control signal connects to the adaptive subcircuit and determines which of the two halves is to be active at any time.

5.2. Preliminaries and Problem Definition

The combinational portion of a boolean circuit can be represented as a directed acyclic graph (DAG) G = (V(G), E(G)). A node in the graph v ∈ V(G) represents a logic gate, primary input or primary output, and a directed edge in the graph e ∈ E(G) with head, u = head(e), and tail, v = tail(e), represents a signal in the logic circuit that is an output of gate u and an input of gate v. The set of input edges for a node v, iedge(v), is defined to be the set of edges with v as a tail. Similarly, the set of output edges for v, oedge(v), is defined to be the set of edges with v as a head. A primary input (PI) node has no input edges and a primary output (PO) node has no output edges. An internal node has both input edges and output edges. The set of distinct nodes that supply input edges to v are referred to as input nodes and denoted inode(v). Similarly, the set of distinct nodes that connect to output edges from v are referred to as output nodes and denoted onode(v).
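One possible encoding of these definitions is sketched below purely for illustration; the class and function names are assumptions, not the paper's implementation. Note the paper's convention that head(e) drives the signal and tail(e) consumes it.

```python
# Illustrative encoding of the DAG G = (V(G), E(G)) and its accessors.
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    name: str
    function: object = None                        # the boolean function Fv
    iedges: list = field(default_factory=list)     # edges with this node as tail
    oedges: list = field(default_factory=list)     # edges with this node as head

@dataclass(eq=False)
class Edge:
    head: Node                                     # gate producing the signal
    tail: Node                                     # gate consuming the signal

def connect(u, v):
    """Add an edge carrying the output of u into an input of v."""
    e = Edge(head=u, tail=v)
    u.oedges.append(e)
    v.iedges.append(e)
    return e

def inode(v): return {e.head for e in v.iedges}    # distinct driver nodes of v
def onode(v): return {e.tail for e in v.oedges}    # distinct fanout nodes of v
def is_primary_input(v):  return not v.iedges      # PI: no input edges
def is_primary_output(v): return not v.oedges      # PO: no output edges
```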

The circuit of Figure 7a and its graph representation, Figure 7b, will be used as an example when illustrating the notions defined below.

Fig. 7. A circuit and its corresponding graph.

A node v in the graph is considered to be a boolean variable v whose value is determined by the boolean function Fv at node v. Although we use the same notation for both a node and the variable it is associated with, it will be clear from the context which one of the two is meant. An edge e from u to v indicates that the function used to compute v, Fv, depends on variable u. In the example, the function at d depends on a and b, and is defined to be Fd(a, b) = (a + b)′, the NOR of a and b.

In addition to the use of the standard logic values 0 and 1,the symbol ‘∗’ is used to represent an unknown or undefinedlogic value.

The ordered pair [v, l], where l ∈ {0, 1}, represents the assignment of logic value l to variable v. An implication of the assignment [v, l] is another assignment that is a consequence of [v, l]. The implication set, impl[v, l], is the set of implications that are a consequence of the assignment [v, l]. In the example, an implication of the assignment [b, 0] is the assignment [e, 0], and the implication set of [d, 1] is as follows:

impl[d, 1] = {[d, 1], [a, 0], [b, 0], [e, 0], [f, 1]}.

Note that the implication set of an assignment includes the assignment itself.

Two laws regarding implications will be useful when generating implication sets. The transitive law states that if [v, l] ∈ impl[u, k] and [w, m] ∈ impl[v, l], then [w, m] ∈ impl[u, k]. The contrapositive law states that if [v, l] ∈ impl[u, k], then [u, k̄] ∈ impl[v, l̄], where k̄ and l̄ denote the complemented logic values.

If the output or any of the inputs of a function Fv have been assigned values, then a justification operation, JUSTIFY(v), can be performed on v. The operation returns a set of assignments that are a consequence of the assignments incident on v. The scope of the operation is limited to the assignments that can be derived by examining Fv; the assignments returned only apply to the function's inputs or to its output. Returning to our example, if d = 1 and f = ∗ then JUSTIFY(f) returns the assignment [f, 1]. Similarly, if d = 1, a = ∗, and b = ∗ then JUSTIFY(d) returns two assignments, [a, 0] and [b, 0].

The structure of a graph can be simplified based on the assignments made to its variables. For example, in Figure 7, if d = 1 then the function at d can be replaced by the constant-1 function, Fd = 1, and edges e1 and e2, and node a can be removed.


Fig. 8. The process of adaptation.

Similarly, if b = 0, then the function at d can be replaced by Fd = a′, and edge e2 can be removed. After some assignments have been made to the variables in G, the simplification operation, SIMPLIFY(H), on a subgraph H of G returns a new subgraph which is a simplified version of H based on the assignments present. The process of simplification can be expressed as a sequence of two simple operations. The first operation simplifies a node's function based on the assignments present and removes the input edges that are no longer needed by the new function. The second operation removes a node and its input edges if all of its output edges have been removed. These two operations are continually applied until they have no further effect on the graph.
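A sketch of SIMPLIFY in terms of these two operations is given below, written against the Node/Edge encoding sketched earlier. The Boolean simplification of a node's function is abstracted into a stand-in callback, and primary outputs are assumed to carry an is_po flag so that they are never removed; both of these choices are assumptions made for illustration.

```python
# Sketch of SIMPLIFY(H) as the two operations described above (illustrative).

def remove_edge(e):
    e.tail.iedges.remove(e)
    e.head.oedges.remove(e)

def simplify_subgraph(H, assignments, simplify_function):
    """H: collection of nodes; simplify_function(v, assignments) is assumed to
    simplify Fv in place and return the set of input edges it still needs."""
    changed = True
    while changed:
        changed = False
        for v in list(H):
            # operation 1: simplify Fv under the assignments and drop the input
            # edges the simplified function no longer needs
            needed = simplify_function(v, assignments)
            for e in list(v.iedges):
                if e not in needed:
                    remove_edge(e)
                    changed = True
            # operation 2: remove a node whose output edges have all been removed
            if not v.oedges and not getattr(v, "is_po", False):
                for e in list(v.iedges):
                    remove_edge(e)
                H.remove(v)
                changed = True
    return H
```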

The subgraph controlled by a variable v, Hv, consists of the nodes and edges affected by the simplifications that are possible after using the assignments in impl[v, 0] and impl[v, 1]. The subgraph used to compute v is not part of the subgraph controlled by v. In our example, impl[b, 0] = {[b, 0], [e, 0]} and impl[b, 1] = {[b, 1], [d, 0]}. If we use the assignments in impl[b, 0] then the subgraph consisting of nodes {d, e, c, f} and edges {e2, e3, e4, e6} is affected by the simplifications that are possible, and if we use the assignments in impl[b, 1] then the subgraph consisting of nodes {d, e, a, f} and edges {e2, e3, e1, e5} is affected by the simplifications that are possible. Combining these two subgraphs, we obtain the subgraph controlled by b, which is essentially the graph of Figure 7b with node b removed.

Figure 8 illustrates the transformation that takes place during adaptation. First, a control variable, v, and the subgraph it controls, Hv, are identified. Then, two specialized versions of Hv are generated; H0 is obtained by simplifying Hv after applying the assignments in impl[v, 0] and H1 is obtained by simplifying Hv after applying the assignments in impl[v, 1]. Finally, the two subgraphs are combined to form the adaptive subgraph, H′v.

Fig. 9. The process of adaptation on the graph of Figure 7.

Variable v remains connected to the adaptive subgraph and determines which one of H0 and H1 is to be active at any time.

The area of a subgraph H, area(H), is defined to be the number of nodes in it. In an adaptive subgraph, only one of the component subgraphs is active at any time. Thus, the area of an adaptive subgraph is defined to be the maximum of the areas of the component subgraphs,

area(H′v) = max{area(H0), area(H1)}. (1)

Note that the area of an adaptive subgraph H′v is always smaller than or equal to the area of Hv because H′v is composed of the two subgraphs H0 and H1, which are both simplified versions of Hv.

The goal of adaptation is to replace controlled subgraphs with adaptive subgraphs such that the area of the resulting graph is minimized. Consider the process of adaptation on the graph of Figure 7b using b as the control variable. We identified the subgraph controlled by b earlier. The subgraphs of Figures 9a and 9b are obtained when the controlled subgraph is simplified assuming b is 0 and 1, respectively. These two subgraphs are combined into the adaptive subgraph illustrated in Figure 9c. Although the adaptive subgraph reduces the area of the graph by two nodes, even more gains are possible if the adaptive subgraph is remapped. In the original graph, all nodes had two inputs, and if we assume that the target architecture consists of 2-LUTs then remapping produces the graph of Figure 9d. The final graph has a three-fold reduction in area compared to the original.
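As a worked instance of Equation (1), counting nodes from Figures 7 and 9: the subgraph controlled by b contains the five nodes {a, c, d, e, f}, while each specialized half contains three nodes, so

area(H′b) = max{area(H0), area(H1)} = max{3, 3} = 3,

and replacing Hb by H′b removes 5 − 3 = 2 nodes, which is the two-node reduction noted above.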

5.3. Generating Implication Sets

Implication sets are central to the problem of adaptation. They are used to identify controlled subgraphs as well as to generate adaptive subgraphs. Our use of implication sets is not new; they have found several uses in the literature. They have been used to solve satisfiability (SAT), automatic test pattern generation (ATPG), and logic synthesis problems.


1   ADAPT()
2     IMPLICATIONSETS()
3     C ← V(G)
4     forever
5       bestv ← nil, bestc ← 0
6       for v ∈ C
7         skip if v ∉ V(G)
8         H0 ← SPECIALIZE(v, 0)
9         H1 ← SPECIALIZE(v, 1)
10        c ← COST(Hv, H0, H1)
11        if c > bestc
12          bestv ← v, bestc ← c
13        end if
14      end for
15      break if bestv = nil
16      H0 ← SPECIALIZE(bestv, 0)
17      H1 ← SPECIALIZE(bestv, 1)
18      G ← G − Hbestv
19      G ← G ∪ FORMADAPTIVE(H0, H1)
20    end forever

Fig. 10. The procedure used to generate adaptive subgraphs.

Thus, there have been a number of algorithms proposed to generate implication sets [13]–[17]. Our work uses a modified version of the algorithm proposed in [17]. The primary benefits of this algorithm over the others are its speed, its ability to generate all possible implication sets simultaneously, and its ability to discover a large number of implications. We briefly review the algorithm here.

The algorithm maintains implication sets for each signal in the circuit. Initially, each implication set contains a single entry: the assignment itself. As the algorithm progresses the implication sets are expanded. The algorithm stops when there are no further changes observed on any of the implication sets. An implication set is expanded by first making the assignments within it. Then justification operations are used to discover new implications. The transitive law is used to discover further implications of the newly discovered implications. In addition to the transitive law, during the process of discovery, the contrapositive law is used to add new entries to other implication sets.
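The sketch below illustrates this fixed-point computation on the example of Section 5.2. The gate functions (Fd a NOR, Fe an AND, Ff an OR) are chosen only to be consistent with the implication sets listed there, and the justification step is deliberately limited to these three gate types; the algorithm of [17] is considerably more elaborate, but the interplay of JUSTIFY with the transitive and contrapositive laws is the same. All names here are assumptions made for illustration.

```python
# Compact sketch of implication-set generation on the Figure 7 example.

GATES = {                      # output -> (operation, inputs); a, b, c are PIs
    "d": ("NOR", ("a", "b")),
    "e": ("AND", ("b", "c")),
    "f": ("OR",  ("d", "e")),
}
SIGNALS = ("a", "b", "c", "d", "e", "f")

def justify(values):
    """One JUSTIFY pass over every gate; returns assignments forced by 'values'."""
    forced = {}
    for out, (op, ins) in GATES.items():
        vals = [values.get(i) for i in ins]
        o = values.get(out)
        if op == "AND":
            if 0 in vals: forced[out] = 0
            elif all(v == 1 for v in vals): forced[out] = 1
            if o == 1:
                for i in ins: forced[i] = 1
        elif op == "OR":
            if 1 in vals: forced[out] = 1
            elif all(v == 0 for v in vals): forced[out] = 0
            if o == 0:
                for i in ins: forced[i] = 0
        elif op == "NOR":
            if 1 in vals: forced[out] = 0
            elif all(v == 0 for v in vals): forced[out] = 1
            if o == 1:
                for i in ins: forced[i] = 0
    return {s: v for s, v in forced.items() if values.get(s) is None}

def implication_sets():
    # each implication set starts with the assignment itself
    impl = {(v, l): {(v, l)} for v in SIGNALS for l in (0, 1)}
    changed = True
    while changed:                        # stop once no set changes
        changed = False
        for (v, l) in impl:
            values = dict(impl[(v, l)])   # make the assignments in the set
            while True:                   # discover implications via JUSTIFY
                new = justify(values)
                if not new:
                    break
                values.update(new)
            found = set(values.items())
            for (u, k) in list(found):    # transitive law
                found |= impl[(u, k)]
            if not found <= impl[(v, l)]:
                impl[(v, l)] |= found
                changed = True
            for (u, k) in found:          # contrapositive law
                if u != v and (v, 1 - l) not in impl[(u, 1 - k)]:
                    impl[(u, 1 - k)].add((v, 1 - l))
                    changed = True
    return impl

# implication_sets()[("d", 1)] contains ("a", 0), ("b", 0), ("e", 0) and
# ("f", 1), matching impl[d, 1] in the text.
```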

5.4. Generating Adaptive Subgraphs

The procedure used to generate adaptive subgraphs is presented in Figure 10. It begins with a call to IMPLICATIONSETS, which generates all implication sets in the graph. Then, before any modifications are made to the graph, a copy of the variables is placed in C. The implication sets that were generated only apply to the variables that are present in the unmodified graph, but some of these variables will disappear as the graph is modified during adaptation. Set C allows us to keep track of the variables for which implication sets are available.

The inner loop (lines 6–14) uses a greedy method to find a control variable that can be used to generate an adaptive subgraph, and the outer loop (lines 4–20) keeps replacing subgraphs with their adaptive equivalents until the inner loop fails to find a control variable.

1   SPECIALIZE(v, l)
2     for [u, k] ∈ impl[v, l]
3       skip if u ∉ V(G)
4       u ← k
5     end for
6     H′ ← SIMPLIFY(Hv)
7     for [u, k] ∈ impl[v, l]
8       skip if u ∉ V(G)
9       u ← ∗
10    end for
11    return H′

Fig. 11. The procedure used to generate a specialized version of Hv under the assumption that v has logic value l.

For every variable that is present in both C and the modified graph G, the inner loop determines a cost that reflects its suitability as a control variable. The cost of a variable v is determined by COST and is based on the controlled subgraph, Hv, as well as the two subgraphs H0 and H1 produced by SPECIALIZE. The variable with the highest non-zero cost is selected as the control variable to be used in generating an adaptive subgraph.

Given a variable v and a logic value l, the specialization procedure, illustrated in Figure 11, produces a version of Hv simplified using the implications in impl[v, l]. Note that some assignments in impl[v, l] cannot be used (lines 3 and 8) as the variables they apply to may no longer be a part of the modified graph G.

Returning to Figure 10, having selected a control variable in the inner loop, the outer loop removes the subgraph controlled by the variable and replaces it with an adaptive subgraph composed of the two specialized versions of the removed subgraph.

Since the goal of adaptation is to minimize the area of the resulting circuit, the cost computed for a variable v is based on the number of edges eliminated in Hv to obtain H0 and H1. Although the area savings can be measured directly by counting the number of nodes eliminated, the edge-based measure is finer grained and better reflects the area obtained after the technology mapping step that follows adaptation; the technology mapping step may be able to pack more logic into the nodes where some edges have been eliminated. The cost is computed as the minimum of two values c0 and c1, where c0 is the number of edges eliminated to obtain H0 and c1 is the number of edges eliminated to obtain H1. Furthermore, there are two special cases that may cause the cost of a variable to be zero, preventing its selection by the inner loop (lines 11–13). It is assumed that at minimum each control signal in the target architecture determines the functionality of the adaptive structures within a region that is approximately the size of a cluster. Thus, to produce adaptive subgraphs that are at least as big as a cluster, the cost of a variable is set to zero if it does not eliminate as many edges as there are inputs to a typical cluster in the target architecture.


Fig. 12. Existence of adaptive subgraphs imposes constraints on the way cones are generated.

For example, if the target architecture uses clusters with four BLEs, then a variable's cost is set to zero if it does not eliminate at least 10 edges [3]. If a variable v controls a large subgraph Hv and has disproportionate values for c0 and c1, then it is unlikely to be a good control variable and its cost is set to zero. For example, consider a circuit with a global reset signal. The reset signal controls most of the circuit, and most of the circuit can be removed if it is assumed to be one. However, if the reset signal is assumed to be zero, most of the circuit remains unchanged. If an adaptive subcircuit is generated using the reset signal, area would not be reduced significantly and a large part of the circuit would no longer be able to participate in other specialization operations. Far better area results can be obtained for the circuit by using a number of control signals, each of which controls a smaller subcircuit and is able to produce balanced values for c0 and c1.
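A sketch of a COST evaluation along these lines is given below, over the Node/Edge encoding sketched earlier. The cluster-input threshold of 10 follows the four-BLE example above; the imbalance test uses an assumed ratio, since the paper does not give its exact form.

```python
# Illustrative COST evaluation (threshold values partly assumed).

def num_edges(H):
    return sum(len(v.iedges) for v in H)

def cost(h_v, h0, h1, cluster_inputs=10, max_imbalance=4.0):
    c0 = num_edges(h_v) - num_edges(h0)      # edges eliminated to obtain H0
    c1 = num_edges(h_v) - num_edges(h1)      # edges eliminated to obtain H1
    c = min(c0, c1)
    if c < cluster_inputs:
        return 0                             # adaptive subgraph smaller than a cluster
    if max(c0, c1) > max_imbalance * max(c, 1):
        return 0                             # lopsided savings, e.g. a global reset
    return c
```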

The complexity of generating adaptive subgraphs (lines 4–20) can be derived as follows. The number of nodes in the graph is denoted n, and the average number of nodes in the subgraph controlled by a variable is denoted k. The inner loop simplifies the subgraph controlled by each variable to determine a cost, thus its complexity is O(kn). The outer loop will iterate O(n/k) times as each iteration replaces a subgraph of size O(k) with an adaptive subgraph. Therefore, the complete process has a complexity of O(n²). On a Pentium III 1 GHz computer, adaptive subgraphs can be generated for the 20 largest MCNC [18] circuits in 19 minutes.

6. TECHNOLOGY MAPPING AFTER ADAPTATION

Although the technology mapping procedure used after adaptation is the same as the one that precedes adaptation, the existence of adaptive subgraphs imposes constraints on the way cones are generated. A node v that is a part of an adaptive subgraph may include a predecessor node u in a cone at v only if u is a part of the same adaptive subgraph or u is not a part of any adaptive subgraph. A node v that is not a part of an adaptive subgraph may only include predecessors that are not a part of any adaptive subgraph in a cone at v.

Circuit      Ctrls   Unadapted   Adapted   Mapped   Adaptive
                     (LUTs)      (LUTs)    (LUTs)   (Ratio)
C6288        15      904         904       714      0.69
alu4         7       1035        820       780      0.97
apex2        9       1201        960       921      0.95
apex4        4       1087        850       733      0.89
bigkey       1       1594        1366      912      0.87
clma         13      4545        2801      2686     0.98
des          18      1225        1092      1010     0.85
diffeq       12      849         836       823      0.73
dsip         1       1153        1145      691      0.99
elliptic     15      2113        2100      2070     0.62
ex1010       6       2532        1663      1480     0.94
ex5p         2       931         710       602      0.93
frisc        22      2269        2087      1997     0.85
i10          13      779         680       661      0.81
misex3       4       1086        678       611      0.98
pdc          4       1882        1349      1190     0.97
s38417       30      3672        3477      3362     0.86
s38584.1     16      3730        3586      2916     0.93
seq          5       1089        789       726      0.96
spla         5       1337        981       853      0.99
GeoAvg       7.10    1510.20     1241.06   1090.78  0.88
Ratio                1.0         0.82      0.72

Table 1. The effect of adaptation on the 20 largest MCNC circuits.

For example, in Figure 12, d can be a part of a cone at f, e can be a part of a cone at g, and c can be a part of a cone at e. However, a0/a1 cannot be a part of a cone at b or d.
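The constraint can be expressed as a simple legality test, sketched below; the adaptive_id attribute is an assumed annotation that is None when a node belongs to no adaptive subgraph.

```python
# Legality of including predecessor u in a cone rooted at v (illustrative).

def may_include_in_cone(v, u):
    if v.adaptive_id is not None:
        # v is in an adaptive subgraph: u must be in the same one, or in none
        return u.adaptive_id is None or u.adaptive_id == v.adaptive_id
    # v is not in an adaptive subgraph: u must not be in one either
    return u.adaptive_id is None
```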

7. RESULTS

The 20 largest MCNC [18] circuits were used to study the benefits of the adaptation step. Before undergoing adaptation, each circuit was first synthesized using SIS (script.rugged) [19] and then technology mapped into 4-LUTs using IMap [20]. IMap was also used for the second technology mapping step following adaptation. For each circuit, Table 1 presents the number of control variables used during adaptation (Ctrls), the area before adaptation (Unadapted), the area following adaptation (Adapted), the area following the second technology mapping step (Mapped), and the fraction of LUTs in the final circuit that are adaptive (Adaptive).

Although an average of 7.1 control signals were used during adaptation, there are significant differences in the number of control variables used to adapt each circuit. Some circuits such as bigkey produce significant area reductions with a single control variable, while circuits such as clma produce similar area gains with a much larger number of control variables (13). Architectural decisions will be guided by the larger number of control variables, as an architecture that supports several control variables can be made to mimic a single control variable, whereas an architecture that supports a single control variable cannot mimic the presence of several control variables.

When adaptation is used in isolation, an average area savings of 18% is observed. The addition of the second technology mapping step increases the area savings to 28%.


A large fraction of the circuit produced by adaptation (88%) is adaptive. From an architectural point of view, this suggests that most of the LUTs in an AFPGA must be of the adaptive variety to effectively support the circuits being produced by adaptation.

8. WORK DESCRIBED ELSEWHERE

Although adaptive circuits use 28% fewer LUTs than non-adaptive circuits, this is only one of the factors in the area comparison between AFPGAs and FPGAs. To determine the true area benefits of AFPGAs, two other factors need to be quantified. First, the area of the extra circuitry required to implement adaptive circuits needs to be measured. Second, the structural changes that a circuit underwent to become adaptive may produce an associated increase or decrease in the demand for routing resources, and this effect needs to be measured. A detailed area study considering these effects is presented in [8]. The study showed that even after the area overheads were considered, an AFPGA was found to be approximately 8–14% more area efficient than a comparable FPGA.

9. SUMMARY

We presented a high-level overview of the AFPGA architecture and detailed the adaptation step, which was used to exploit the architectural features present in an AFPGA. Two algorithms that were part of the adaptation step were described. The first algorithm generated implication sets, which were then used by the second algorithm to generate adaptive subcircuits. Following adaptation, a technology mapping step was run to reduce both the area and depth of the resulting circuits. The entire procedure reduced the area of the 20 largest MCNC circuits by 28%.

10. REFERENCES

[1] I. Kuon and J. Rose. Measuring the Gap Between FPGAs and ASICs. In Proceedings of the ACM International Symposium on FPGAs, Monterey, CA, February 2006, pp. 21–30.

[2] J. Rose, R. Francis, D. Lewis and P. Chow. Architecture of Field-Programmable Gate Arrays: The Effect of Logic Block Functionality on Area Efficiency. IEEE Journal of Solid-State Circuits, Vol. 25, No. 5, October 1990, pp. 1217–1225.

[3] V. Betz, J. Rose and A. Marquardt. Architecture and CAD for Deep Submicron FPGAs. Kluwer Academic Publishers, 1999.

[4] E. Ahmed and J. Rose. The Effect of LUT and Cluster Size on Deep-Submicron FPGA Performance and Density. In Proceedings of the ACM Symposium on FPGAs, Monterey, CA, February 2000, pp. 3–12.

[5] M. Sheng and J. Rose. Mixing Buffers and Pass Transistors in FPGA Routing Architectures. In Proceedings of the ACM International Symposium on FPGAs, Monterey, CA, February 2001, pp. 75–84.

[6] Altera. Altera Device Handbook. 2005.

[7] Xilinx. Xilinx Device Handbook. 2005.

[8] V. Manohararajah. Area Optimizations in FPGA Architecture and CAD. Ph.D. Thesis, University of Toronto, 2005.

[9] N. B. Bhat, K. Chaudhary and E. S. Kuh. Performance-Oriented Fully Routable Dynamic Architecture for a Field Programmable Logic Device. University of California at Berkeley, Memo. No. UCB/ERL M93/42, June 1993.

[10] M. Bolotski, A. DeHon and T. F. Knight Jr. Unifying FPGAs and SIMD Arrays. In Proceedings of the ACM/SIGDA International Workshop on FPGAs, Berkeley, CA, February 1994, pp. 1–10.

[11] D. Jones and D. M. Lewis. A Time-Multiplexed FPGA Architecture for Logic Emulation. In Proceedings of the IEEE Custom Integrated Circuits Conference, 1995, pp. 495–498.

[12] S. Trimberger, D. Carberry, A. Johnson and J. Wong. A Time-Multiplexed FPGA. In Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines, Los Alamitos, April 1997, pp. 22–28.

[13] M. H. Schulz, E. Trischler and T. M. Sarfert. SOCRATES: A Highly Efficient Automatic Test Pattern Generation System. IEEE Transactions on Computer-Aided Design, Vol. 7, No. 1, January 1988, pp. 126–136.

[14] J. Rajski and H. Cox. A Method to Calculate Necessary Assignments in Algorithmic Test Pattern Generation. In Proceedings of the IEEE International Test Conference, September 1990, pp. 25–34.

[15] S. T. Chakradhar and V. D. Agrawal. A Transitive Closure Algorithm for Test Generation. IEEE Transactions on Computer-Aided Design, Vol. 12, No. 7, July 1993, pp. 1015–1028.

[16] W. Kunz and D. Stoffel. Reasoning in Boolean Networks. Kluwer Academic Publishers, 1997.

[17] J. Zhao, M. Rudnick and J. Patel. Static Logic Implication with Application to Fast Redundancy Identification. In Proceedings of the VLSI Test Symposium, 1997, pp. 288–293.

[18] Collaborative Benchmarking Laboratory. LGSynth93 Benchmark Suite. Available from http://www.cbl.ncsu.edu/www/.

[19] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton and A. L. Sangiovanni-Vincentelli. SIS: A System for Sequential Circuit Synthesis. Technical Report, University of California at Berkeley, 1992, Memorandum No. UCB/ERL M92/41.

[20] V. Manohararajah, S. D. Brown, and Z. G. Vranesic. Heuristics for Area Minimization in LUT-Based FPGA Technology Mapping. In Proceedings of the International Workshop on Logic and Synthesis, Temecula Creek Inn, CA, USA, June 2004, pp. 14–21.