Dynamically Reconfigurable FIR Filter Architectures with Fast Reconfiguration

Martin Kumm, Konrad Möller and Peter Zipf

University of Kassel, Germany




FIR FILTER

Fundamental component in digital signal processing

Computationally complex due to numerous multiply/accumulate operations


WHY RECONFIGURATION?

Many applications require changing the coefficients...

...but only from time to time

➯ Possibility to reduce complexity


METHODS OF RECONFIGURATION

1. Integrating multiplexers into the design

2. Partial reconfiguration (e.g., using ICAP)

3. Reconfigurable LUTs


MULTIPLEXER BASED RECONFIGURATION

Multiplexers are integrated into the shift-and-add networks

☺ Extremely fast reconfiguration (single clock cycle)

☹ Only a limited set of coefficients possible!

[Figure: reconfigurable multiplier for the constants 815, 621, 831, 105, excerpt from the cited paper]

[Faust et al. ’10]
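A toy Python model makes the trade-off concrete. It is not the algorithm of [Faust et al. ’10]; it assumes a hypothetical network with one fixed adder whose second operand passes through a shift multiplexer. Switching the select input (a single clock cycle in hardware) changes the coefficient, but only constants of the form 1 + 2^s are reachable.

```python
# Toy model of multiplexer-based reconfiguration (hypothetical example network):
# one fixed adder computes x + (x << s), where the shift s is picked by a mux.
SHIFT_OPTIONS = (1, 2, 3, 4)  # hypothetical mux inputs, fixed at design time

# The only coefficients this network can ever realize.
REACHABLE_CONSTANTS = tuple(1 + (1 << s) for s in SHIFT_OPTIONS)


def mux_multiplier(x: int, sel: int) -> int:
    """Multiply x by REACHABLE_CONSTANTS[sel] with one adder and one shift mux."""
    shift = SHIFT_OPTIONS[sel]   # the multiplexer: changing sel reconfigures the multiplier
    return x + (x << shift)      # the fixed add/shift network


if __name__ == "__main__":
    for sel, const in enumerate(REACHABLE_CONSTANTS):
        assert mux_multiplier(7, sel) == 7 * const
    print(REACHABLE_CONSTANTS)
```

A coefficient outside this set, such as 7, would need a different network, which is exactly the limitation listed above.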


PARTIAL RECONFIGURATION

Partial regions of the FPGA are reconfigured via ICAP

☺ Least resources

☺ Arbitrary coefficients...

☹ ... but synthesis needed for each coefficient set

☹ Slow reconfiguration (≈μs/ms)!


RECONFIGURABLE LUTS

Changing the LUT content only

Routing has to be fixed

First academic tool available (TLUT flow, [Bruneel et al. ’11])

☺ Fast reconfiguration (a few clock cycles, ≈ns/μs)

☺ Arbitrary coefficients...

☹ ... but (again) synthesis needed for each coefficient set

➯ Not if a generic architecture is mapped to a fixed routing


RECONFIGURABLE LUTS

FPGA components to realize reconfigurable LUTs

Older Xilinx FPGAs (Virtex 1-4): Shift-Register LUT (SRL16)

Newer Xilinx FPGAs (Virtex 5/6, Spartan 6, 7-Series): CFGLUT5 (similar to SRLC32E but with two output functions)

Other FPGA vendors: Distributed RAM or block RAM
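The Python sketch below models a CFGLUT5 as we understand it from the Xilinx primitive documentation. The exact bit ordering (new bits enter at bit 0, bit 31 leaves on CDO, O5 reads the lower 16 bits, I4 tied high for dual-output use) is an assumption to check against the vendor libraries guide, and the class and function names are hypothetical. It illustrates why reconfiguration takes only a few clock cycles: a new 32-bit truth table is shifted in serially without touching the routing.

```python
class CFGLUT5Model:
    """Behavioral sketch of a reconfigurable 5-input LUT (assumed CFGLUT5 semantics)."""

    def __init__(self, init: int = 0) -> None:
        self.data = init & 0xFFFF_FFFF  # 32 configuration bits = truth table

    def clock(self, cdi: int, ce: int = 1) -> None:
        """One clock edge: if CE is high, shift CDI in at bit 0 (assumed order)."""
        if ce:
            self.data = ((self.data << 1) | (cdi & 1)) & 0xFFFF_FFFF

    @property
    def cdo(self) -> int:
        """Serial output (bit 31), usable to cascade several LUTs in one chain."""
        return (self.data >> 31) & 1

    def o6(self, i4: int, i3_0: int) -> int:
        """5-input function: address {I4, I3..I0} selects one of 32 bits."""
        return (self.data >> (((i4 & 1) << 4) | (i3_0 & 0xF))) & 1

    def o5(self, i3_0: int) -> int:
        """4-input function of I3..I0, read from the lower 16 bits."""
        return (self.data >> (i3_0 & 0xF)) & 1


def reconfigure(lut: CFGLUT5Model, init: int) -> int:
    """Shift a new truth table in, MSB first; returns the number of clock cycles."""
    for bit in range(31, -1, -1):
        lut.clock((init >> bit) & 1)
    return 32


if __name__ == "__main__":
    # Two 4-input functions in one LUT (I4 tied high for O6): upper half AND, lower half XOR.
    and4 = 1 << 0xF                      # AND is 1 only at address 15
    xor4 = sum(1 << a for a in range(16) if bin(a).count("1") % 2)
    lut = CFGLUT5Model()
    cycles = reconfigure(lut, (and4 << 16) | xor4)
    print(cycles, lut.o6(1, 0b1111), lut.o5(0b0111))
```

Using both outputs with four shared inputs is what lets one CFGLUT deliver two output bits of a 4-input table.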


METHODS OF RECONFIGURATION

1. Integrating multiplexers into the design ➯ Logic fixed, routing flexible

2. Partial reconfiguration (e.g., using ICAP) ➯ Logic flexible, routing flexible

3. Reconfigurable LUTs ➯ Logic flexible, routing fixed


LUT BASED FIR FILTER

Two well-known methods that employ LUTs in a fixed structure, suitable for FIR filters:

1. Distributed Arithmetic [Croisier et al. ’73] [Zohar ’73] ... [Kumm et al. ’13]

2. LUT based multipliers [Chapman ’96] [Wiatr et al. ’01]


The main question is:

“Which architecture performs best?”


DISTRIBUTED ARITHMETIC

Main idea: rearrange the underlying inner product

The resulting function (realized as a LUT) is identical for each bit b

➯ Less configuration memory

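$$\mathbf{x}_b^N = (x_{0,b}, x_{1,b}, \ldots, x_{N-1,b})^T$$

$$y = \mathbf{c} \cdot \mathbf{x} = \sum_{n=0}^{N-1} c_n x_n = \sum_{n=0}^{N-1} c_n \sum_{b=0}^{B_x-1} 2^b x_{n,b} = \sum_{b=0}^{B_x-1} 2^b \underbrace{\sum_{n=0}^{N-1} c_n x_{n,b}}_{=f(\mathbf{x}_b^N)\ \text{(LUT)}}$$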
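As a software check of this rearrangement, the Python sketch below builds the table f(x_b^N) once and reuses it for every bit position b. It assumes unsigned B_x-bit inputs, as the formula does (a two's complement input would subtract the MSB term instead of adding it); the function names are hypothetical.

```python
def build_da_lut(coeffs):
    """Table f: entry addr holds the sum of c_n over all n whose address bit n is set."""
    return [
        sum(c for n, c in enumerate(coeffs) if (addr >> n) & 1)
        for addr in range(1 << len(coeffs))
    ]


def da_inner_product(coeffs, xs, bx):
    """y = sum_b 2^b * f(x_b^N) for unsigned bx-bit inputs xs."""
    lut = build_da_lut(coeffs)          # only this table depends on the coefficients
    y = 0
    for b in range(bx):
        # bit b of every input x_n forms the LUT address x_b^N
        addr = sum(((x >> b) & 1) << n for n, x in enumerate(xs))
        y += lut[addr] << b             # identical LUT for every bit weight 2^b
    return y


if __name__ == "__main__":
    c, x = [3, -5, 7], [10, 4, 255]
    print(da_inner_product(c, x, 8), sum(ci * xi for ci, xi in zip(c, x)))
```

Reconfiguring the filter means rewriting only `lut`; the bit slicing and the shift-and-add loop stay unchanged, which is what makes a fixed routing possible.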


DISTRIBUTED ARITHMETIC OVERALL ARCHITECTURE

Pre-processing to exploit coefficient symmetry

Reconfigurable LUTs

Output adder tree

Reconfiguration circuit
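The symmetry pre-processing can be sketched in the same style. For a linear-phase filter with c_n = c_{N-1-n}, pairs of samples are added before the DA stage. This leaves M = ⌈N/2⌉ LUT inputs but widens each folded sample to B_x + 1 bits, so B_x + 1 bit slices are needed. The Python sketch computes one output sample in software, omits the reconfiguration circuit, assumes unsigned inputs, and uses a hypothetical function name and argument layout.

```python
def symmetric_da_output(c_unique, window, bx):
    """One output of a symmetric FIR filter via pre-adders + DA.

    c_unique: the M = ceil(N/2) unique coefficients c_0 .. c_{M-1}
    window:   the N current unsigned bx-bit samples of the tapped delay line
    """
    n_taps, m = len(window), len(c_unique)
    assert m == (n_taps + 1) // 2, "need ceil(N/2) unique coefficients"

    # Pre-processing: fold x_n + x_{N-1-n}; the middle tap (odd N) is not paired.
    folded = [
        window[n] + window[n_taps - 1 - n] if n != n_taps - 1 - n else window[n]
        for n in range(m)
    ]

    # Reconfigurable LUT contents: all subset sums of the M unique coefficients.
    lut = [sum(c for n, c in enumerate(c_unique) if (a >> n) & 1) for a in range(1 << m)]

    # Folded samples need bx + 1 bits, so bx + 1 LUT lookups feed the adder tree.
    return sum(
        lut[sum(((v >> b) & 1) << n for n, v in enumerate(folded))] << b
        for b in range(bx + 1)
    )


if __name__ == "__main__":
    c_full = [1, 2, 3, 2, 1]
    samples = [10, 20, 30, 40, 50]
    print(symmetric_da_output([1, 2, 3], samples, 8),
          sum(c * x for c, x in zip(c_full, samples)))
```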


DISTRIBUTED ARITHMETIC MAPPING TO CFGLUT5


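LUT MULTIPLIER FIR FILTER

Basic Idea: Split a multiplication into smaller chunks which fit into the FPGA LUT:

$$\underbrace{c_n \cdot x_n}_{B_c \times B_x\ \text{mult.}} = \underbrace{c_n \sum_{b=0}^{L-1} 2^b x_{n,b}}_{B_c \times L\ \text{mult.}} + 2^L \underbrace{c_n \sum_{b=0}^{L-1} 2^b x_{n,b+L}}_{B_c \times L\ \text{mult.}} + \ldots$$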
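A minimal Python sketch of this splitting, assuming an unsigned B_x-bit input and chunks of L = 4 bits (the function name is hypothetical): each chunk addresses a small table of c_n · k, i.e. a B_c × L multiplier whose contents are the only coefficient-dependent part, and the partial products are shifted by multiples of L and summed.

```python
def lut_multiply(c, x, bx, l=4):
    """c * x for an unsigned bx-bit x, built from ceil(bx / l) lookups of size 2**l."""
    # In hardware each chunk has its own LUT; all of them hold this same content.
    table = [c * k for k in range(1 << l)]    # B_c x L multiplier as a lookup table
    y = 0
    for i in range(-(-bx // l)):              # ceil(bx / l) chunks of the input
        chunk = (x >> (i * l)) & ((1 << l) - 1)
        y += table[chunk] << (i * l)          # weight 2^(i*L), as in the sum above
    return y


if __name__ == "__main__":
    print(lut_multiply(815, 0xBEEF, 16), 815 * 0xBEEF)
```

For unsigned values, a table entry of a B_c × 4 multiplier is at most B_c + 4 bits wide; with a CFGLUT5 providing two output bits per 4-input table, one chunk then takes ⌈B_c/2 + 2⌉ CFGLUTs, which appears consistent with the corresponding factor in the resource comparison.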


LUT MULTIPLIER MAPPING TO CFGLUT5


LUT MULTIPLIER OVERALL ARCHITECTURE

Replaced by reconfigurable multipliers


CONTROL ARCHITECTURE


RESOURCE COMPARISON

Distributed Arithmetic: $B_x + 1$ LUTs with $M$ inputs

CFGLUTs: $(B_x + 1)\lceil M/4 \rceil \lceil B_c/2 + 1 \rceil \approx \frac{1}{4}(B_x + 1)M(B_c/2 + 1)$

LUT Multiplier FIR: $M$ LUTs with $B_x$ inputs

CFGLUTs: $M \lceil B_x/4 \rceil \lceil B_c/2 + 2 \rceil \approx \frac{1}{4}B_x M(B_c/2 + 2)$

$M = \lceil N/2 \rceil$: No. of unique taps

$B_x/B_c$: input/coefficient bit width


Surprisingly, CFGLUT requirements are very similar!
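The two CFGLUT formulas are easy to evaluate for a concrete filter. The Python sketch below only transcribes them (exact ceilings and the approximations); the function names and the example parameters N = 41, B_x = B_c = 16 are illustrative choices, not results from the experiments.

```python
from math import ceil


def unique_taps(n_taps: int) -> int:
    return ceil(n_taps / 2)                       # M = ceil(N/2)


def cfgluts_da(n_taps: int, bx: int, bc: int) -> int:
    m = unique_taps(n_taps)
    return (bx + 1) * ceil(m / 4) * ceil(bc / 2 + 1)


def cfgluts_lut_mult(n_taps: int, bx: int, bc: int) -> int:
    m = unique_taps(n_taps)
    return m * ceil(bx / 4) * ceil(bc / 2 + 2)


def approx_da(n_taps: int, bx: int, bc: int) -> float:
    return (bx + 1) * unique_taps(n_taps) * (bc / 2 + 1) / 4


def approx_lut_mult(n_taps: int, bx: int, bc: int) -> float:
    return bx * unique_taps(n_taps) * (bc / 2 + 2) / 4


if __name__ == "__main__":
    for f in (cfgluts_da, cfgluts_lut_mult, approx_da, approx_lut_mult):
        print(f.__name__, f(41, 16, 16))
```

Both expressions are dominated by the same product, roughly B_x · M · B_c / 8, which explains why the CFGLUT counts end up close to each other.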


RESOURCE COMPARISON

Adders:

Distributed Arithmetic: $M + B_x + (B_x + 1)\lceil M/4 \rceil$

LUT Multiplier FIR: $2M - 1 + M\lceil B_x/4 \rceil$

➯ So, LUT multiplier based FIR filters are better when (neglecting the ceiling operations)

$$2M - 1 + \frac{M B_x}{4} < M + B_x + \frac{(B_x + 1)M}{4} \quad\Longleftrightarrow\quad \frac{3}{4}M - 1 < B_x, \qquad M = \lceil N/2 \rceil$$

...i.e., the input word size is greater than approximately half the number of coefficients
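The adder counts and the resulting rule of thumb can be transcribed the same way. The Python sketch below uses hypothetical function names and only restates the two expressions and the simplified inequality, so it inherits their approximations.

```python
from math import ceil


def adders_da(n_taps: int, bx: int) -> int:
    m = ceil(n_taps / 2)                          # unique taps M
    return m + bx + (bx + 1) * ceil(m / 4)


def adders_lut_mult(n_taps: int, bx: int) -> int:
    m = ceil(n_taps / 2)
    return 2 * m - 1 + m * ceil(bx / 4)


def prefer_lut_mult(n_taps: int, bx: int) -> bool:
    """Rule of thumb from the inequality above (ceilings neglected): 3/4 M - 1 < B_x."""
    m = ceil(n_taps / 2)
    return 3 * m / 4 - 1 < bx


if __name__ == "__main__":
    for n, bx in [(41, 16), (151, 8)]:
        print(n, bx, adders_da(n, bx), adders_lut_mult(n, bx), prefer_lut_mult(n, bx))
```

For a short filter with wide inputs (N = 41, B_x = 16) the rule favors the LUT multiplier, while a long filter with narrow inputs (N = 151, B_x = 8) favors DA; the exact adder counts agree in both cases.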


RESULTS: 1ST EXPERIMENT

Synthesis experiment for Virtex 6

Nine benchmark filters with length N=6...151

Input word size $B_x \in \{8, 16, 24, 32\}$ bit

➯ Very fast reconfiguration times: 49...106 ns

➯ High clock frequencies: 472 MHz/494 MHz (DA/LUT mult.)


RESULTS: 1ST EXPERIMENT

[Figure: slice improvement in % of the reconfigurable FIR filter based on LUT multipliers in comparison with the reconfigurable DA FIR filter, over the filter length N = 6 ... 151, for input word sizes (a) Bx = 8, (b) Bx = 16, (c) Bx = 24 and (d) Bx = 32 bit]

[Page excerpt also showing Table III: comparison of a single filter MIRZAEI10 41 with Bx = 16 bit using ICAP reconfiguration and the CFGLUT methods]

LUT Multiplier improvement compared to DA:

As expected, the LUT multiplier architecture is best for low N


Choosing the right architecture can save up to 40% slices


RESULTS: 2ND EXPERIMENT

Method | Config. memory S [bit] | Slices | fclk [MHz] | Trec [ns]
RPAG with ICAP | 746496 | 502...569 | 386.7...448.8 | 233280
Reconf. FIR DA | 1920 | 1071 | 521.9 | 61.3
Reconf. FIR LUT | 14784 | 1108 | 487.8 | 65.6

Comparison with partial reconfiguration via ICAP

Ten different filters with N=41 were highly optimized using PMCM optimization RPAG [Kumm et al. ’12]

Configuration memory is reduced by a factor of 388 (DA) and 50 (LUT Mult.) ☺


Slice requirements are roughly doubled ☹


Performance is similar


Reconfiguration time is drastically reduced, by a factor of 3556! ☺
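The factors on this slide follow from the table with a few lines of arithmetic. The Python sketch below reproduces them together with the ICAP memory estimate from the underlying paper (reconfiguration frames of 80 slices, 93312 bit per frame, and full ICAP speed of 32 bit per 10 ns); the constant and function names are hypothetical.

```python
from math import ceil

SLICES_PER_FRAME = 80      # reconfiguration frame size in slices (from the paper text)
BITS_PER_FRAME = 93312     # configuration bits per frame (from the paper text)
ICAP_BITS_PER_WORD = 32    # ICAP data width, assuming full ICAP performance
ICAP_NS_PER_WORD = 10      # one word per 10 ns

# (S [bit], slices, Trec [ns]) of the two CFGLUT methods, taken from the table
CFGLUT_METHODS = {"DA": (1920, 1071, 61.3), "LUT": (14784, 1108, 65.6)}


def icap_cost(max_slices: int):
    """Frames, configuration bits and reconfiguration time for an ICAP region."""
    frames = ceil(max_slices / SLICES_PER_FRAME)
    bits = frames * BITS_PER_FRAME
    t_ns = bits / ICAP_BITS_PER_WORD * ICAP_NS_PER_WORD
    return frames, bits, t_ns


def compare(max_slices: int = 569):
    """Memory, time and slice ratios of ICAP versus each CFGLUT method."""
    _, s_icap, t_icap = icap_cost(max_slices)
    return {
        name: {
            "memory_factor": s_icap / s,
            "time_factor": t_icap / t,
            "slice_overhead": slices / max_slices - 1,
        }
        for name, (s, slices, t) in CFGLUT_METHODS.items()
    }


if __name__ == "__main__":
    print(icap_cost(569))
    for name, r in compare().items():
        print(name, {k: round(v, 2) for k, v in r.items()})
```

The factor 3556 corresponds to the slower CFGLUT method (65.6 ns); for the DA variant the same calculation gives a somewhat larger factor.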


CONCLUSION


Two different reconfigurable FIR filter architectures for arbitrary coefficient sets were analyzed

Both are implemented using reconfigurable LUTs (CFGLUTs)

The LUT multiplier architecture typically needs fewer slices when the input word size is greater than approx. half the number of coefficients (and vice versa)

Both architectures offer reconfiguration about 3500 times faster than partial reconfiguration using ICAP

This is paid for with roughly twice the number of slices


RECOSOC CONCLUSION


If you have a reconfigurable FPGA circuit which allows a fixed routing:

Use reconfigurable LUTs!


THANK YOU!