3 Fir Comparsion

50This article can be downloaded from http://www.ijeetc.com/currentissue.php

Int. J. Elec&Electr.Eng&Telecoms. 2014 H V Kumaraswamy et al., 2014

COMPARATIVE ANALYSIS OF DIFFERENT AREAEFFICIENT FIR FILTER STRUCTURES FOR

SYMMETRIC CONVOLUTIONS

H V Kumaraswamy1*, AmruthKaranth P1, Samarth Athreyas1 and Akash Bharadwaj B R1

*Corresponding Author: H V Kumaraswamy,[email protected]

A comparative analysis based on speed vs area is done on parallel, serial, and cascade serialFIR filter architectures and a DTMF test bench is also generated to test the architectures forvarious input frequencies. The VHDL code generated from MATLAB HDL code generator issynthesized in XILINX ISE 14.2 and the various parameters which determine the speed andarea efficiency of various architectures is tabulated and it is found out that for high speed operationsParallel architecture is the best and for applications involving lot of computations where lot ofspeed is required serial architecture is well suited.

Keywords: Parallel, Serial, Cascade serial, LUT

INTRODUCTIONInsignal processing, aFinite ImpulseResponse (FIR)filter is afilter,whoseimpulseresponse(or response to any finite lengthinput) is offiniteduration, because it settles tozero in finite time. This is in contrast to InfiniteImpulse Response(IIR) filters, which may haveinternal feedback and may continue to respondindefinitely (usually decaying). Theimpulseresponseof an Nth-order discrete-time FIR filter(i.e., with aKronecker deltaimpulse input)lasts forN+1 samples, and then settles tozero. FIR filters can bediscrete-timeorcontinuous-time, anddigitaloranalog.

ISSN 2319 2518 www.ijeetc.comVol. 3, No. 2, April 2014

2014 IJEETC. All Rights Reserved

Int. J. Elec&Electr.Eng&Telecoms. 2014

1 Department of Telecommunication Engineering, R.V. College of Engineering, Bangalore, India.

The outputy of a linear time invariant systemis determined byconvolvingits inputsignalxwith itsimpulse responseb.

For adiscrete-timeFIR filter, the outputis a weighted sum of the current and a finitenumber of previous values of the input. Theoperation is described by the followingequat ion, which def ines the outputsequence y [n ] in terms of i ts inputsequencex[n]:

Nnxbnxbnxbny N 110

N

ii inxb

0

Research Paper



The main disadvantage of FIR filters is thatconsiderably more computation power in ageneral purpose processor is requiredcompared to an IIR filter with similar sharpnessorselectivity, especially when low frequency(relative to the sample rate) cut-offs areneeded. However many digital signalprocessors provide specialized hardwarefeatures to make FIR filters approximately asefficient as IIR for many applications.

SPECIFICATIONS ANDARCHITECTURESIn this chapter the specifications for variousarchitectures, the design of the architecturestheir diagrams, advantages anddisadvantages are mentioned. Initially, thedesign of the architectures along with theircircuit diagrams is explained. Then, theadvantages and disadvantages of variousarchitectures are mentioned.

SpecificationFs = 48e3;

Fpass = 9.6e3;

Fstop = 12e3;

Apass = 1;

Astop = 90;

The above is the basic FIR filterspecification for various architecture and wehave chosen various parameters:

Sampling frequency is 48000 Hz

Pass band frequency is 9600 Hz

Stop band frequency is 12000 Hz

Pass band attenuation is 1 dB and

Stop band attenuation is 90 dBs

Design of Architectures with CircuitDiagramsIn this section various design of thearchitectures are mentioned. Architectures likefully parallel, fully serial, cascade areexplained.

Fully Parallel ArchitectureA ful ly paral le l archi tecture uses adedicated multiplier and adder for eachfilter tap; all taps execute in parallel. We canperform the same processing fordifferentsignalson the correspondingduplicated function units. Further, due to thefeatures ofparallel processing, the parallelDSP design often contains multiple outputs,resulting in higher throughput than notparallel. Three tap fully parallel architectureis as shown in the Figure 1.

Figure 1: Three-Tap Fir Fully Parallel

This results in the largest area since itemploys a dedicated multiplier and adder foreach lter tap and all taps execute in parallel.A fully parallel architecture is optimal for speed.However, it requires more multipliers andadders than a serial architecture, and thereforeconsumes more chip area.



Fully Serial ArchitectureIn fully serial architecture, instead of having adedicated multiplier for each tap, the inputsample for each tap is selected serially and ismultiplied with the corresponding coefficient.For symmetric (and anti-symmetrical)structures the input samples corresponding toeach set of symmetric taps are pre-added (forsymmetric) or pre-subtracted (for anti-symmetric) before multiplication with thecorresponding coefficients. The product isaccumulated sequentially using a register andthe final result is stored in a register beforethe next set of input samples arrive. Thisimplementation needs a clock rate that is asmany times faster than input sample rate asthe number of products to be computed. Thisresults in reducing the required chip area asthe implementation involves just one multiplierwith a few additional logic elements likemultiplexers and registers. A fully serialarchitecture conserves area by reusingmultiplier and adder resources sequentially.

partition is cascaded to the accumulator of theprevious partition. The output of all partitionsis therefore computed at the accumulator ofthe first partition. This technique is termedaccumulator reuse. No final adder is required,which saves area.

The cascade-serial architecture requires anextra cycle of the system clock to complete thefinal summation to the output. Therefore, thefrequency of the system clock must beincreased slightly with respect to the clock usedin a non-cascade partly serial architecture.

To generate a cascade-serial architecture,you specify a partly serial architecture withaccumulator reuse enabled. If you do notspecify the serial partitions, the coderautomatically selects an optimal partitioning.Cascade architecture is as shown in theFigure 3.

Figure 2: Serial Architecture

Cascade ArchitectureA cascade-serial architecture closelyresembles a partly serial architecture. As in apartly serial architecture, the filter taps aregrouped into a number of serial partitions thatexecute in parallel with respect to one another.However, the accumulated output of each

Figure 3: Cascade Architecture

RESULTS AND INFERENCEThe simulation results for four architectures areobtained for 1) Family: ARTIX7, 2) Device:XCTA100T, 3) Package: CSG324 and thedelays and the hardware is tabulated. Theresults for the architectures are compared andspeed versus area performance is compared.

Simulation ResultsSimulation results for various architectures areobtained and are shown in the below figures.



Comparison of Delay ResultsDelay results of various architectures are foundafter simulation and are tabulated in the table5.1 above. From the table it is inferred that fullyparallel is the most efficient architecture interms of the speed. It has only 0.814 ns of totaldelay as opposed to a greater delay of fullyserial and cascade architectures. That is fullyparallel can perform at a faster rate than theother architecture and hence can be called aperformance efficient architecture.

Figure 4: Gain Characteristic for FullySerial Architecture

Figure 5: Simulation Results for FullySerial Architecture

Figure 6: Simulation Results for FullyParallel Architecture

Figure 7: Simulation Results for CascadeSerial

Logic Delay 1.524 0.35 1.524

Route Delay 1.422 0.464 1.476

Total Delay 2.946 0.814 3.0

Table 1: Delay Results

Fully Serial(ns)

Fully Parallel(ns)

CascadeSerial (ns)

Comparison of Slice Registers andSlice LUT ResultsTable 2 shows the number of slice registers,LUTs, multipliers, adders/subtractors andRAMs used by each architecture. SliceRegisters and Slice LUT results of variousarchitectures are found after simulation and aretabulated in the table above. From the table it



can be inferred that fully serial is the bestarchitecture among the three in terms of thearea consumption to design an FIR filter as ituses only 1 multiplier and 4 adders/subtractorswhich is a very meagre as opposed to otherarchitectures and the number of slice registersand slice LUTs are comparable and hence isnot a deciding factor for the area. Multipliersapproximately weigh twice more than thenumber of adders/subtractors. Hence, fullyparallel consumes the maximum area amongthe three architectures.

CONCLUSIONWe have studied the realisation of FIR filtersusing various architectures like fully parallel,fully serial, partly serial and cascade serialarchitectures. The realisation of thesearchitectures is achieved through MATLAB.The filter characteristics for audiospecifications are verified using filter designtoolbox. Later, by the function called generatehdl the vhdl code for the architectures aregenerated and is executed to find out thespeed versus area performance of eacharchitecture.

Based on the results obtained from thesynthesis report it is evident that in terms ofspeed the partly serial and in terms of areathe fully parallel structure gives the best results.

It can be extrapolated from that the speedversus area performance results can beutilised to choose the best availablearchitecture in areas like video conferencing,high performance multimedia DSP structuresto improvise the filter performance andutilisation silicon area and bandwidth.

This project has the potential to be realisedon a FPGA in future and later the performanceof the various different architectures can bestudied. Based on the result obtained the bestpossible architectures in terms of speed Vsarea can be used in various applications likemultimedia processing, real time signalanalysis and conversion in various military andmedical applications.

REFERENCES1. Behrooz Parhami and Ding-Ming Kwai

(2003), Parallel Architectures andAdaptation Algorithmsfor ProgrammableFIR Digital FiltersWith Fully PipelinedData and Control Flows, Journal ofInformation Science and Engineering,Vol. 19, pp. 59-74, Department ofElectrical and Computer Engineering,University of California, Santa Barbara,CA 93106-9560, USA.

2. Chao Chengand Parhi K K (2004),Hardware Efficient Fast Parallel FIR FilterStructures Based on Iterated ShortConvolution, IEEE Transactions onCircuits and SystemsI: RegularPapers, Vol. 51, No. 8, pp. 1492-1500,ISBN: 1057-7122/04.

3. Florian Achleitner, Philipp Miedl andThomas Unterluggauer (2011), FilterStructures for FIR Filters, January 2,Conference Paper, Telematics Graz,

Number of Slice Registers 764 708 798

Number of Slice LUTs 294 213 385

Multipliers 1 29 2

Adders/Subtractors 4 58 6

RAM 1 0 2

Table 2: Slice Registers and Slice LUTResults

FullySerial

FullyParallel

CascadeSerial



University of Technology, Graz Universityof Technology.

4. Raveenakoppula and Masthan S K(2011), An Efficient Distributed ArithmeticArchitecture for High Order DigitalFilters, International Journal ofComputer Trends and Technology,Vol. 2, No. 2, Electronics andCommunication Engineering JayamukhiInstitute of Technology and SciencesNarsampet, Andhra Pradesh, India.

5. Sirisha A, Balanagu P and Suresh BabuN (2012), Design of Fir Filter Using LookUp Table and Memory Based Approach,Department of Electronics andCommunication Engineering, ChiralaEngineering College, Chirala.

6. Yu-Chi Tsao and Ken Choi (2012a), Area-EfficientVLSI Implementation forParallel

Linear-PhaseFIRDigitalFiltersof OddLength Based on FastFIRAlgorithm,IEEE Transactions on Circuits andSystemsII: Express Briefs, Vol. 59,No. 6, pp. 371-375, ISBN: 1549-7747.

7. Yu-Chi Tsao and Ken Choi (2012b), Area-Efficient Parallel FIR Digital FilterStructures for Symmetric ConvolutionsBased on Fast FIR Algorithm, IEEETransactions on Very Large ScaleIntegration (VLSI) Systems, Vol. 20,No. 2, pp. 366-371, ISBN: 1063-8210.

8. Zhenqi Liu, Fan Ye and Junyan Ren(2012), Low Cost Parallel Fir DigitalFilterStructures Utilizing the CoefficientSymmetry, State Key Laboratory of ASICand System, Fudan University, Shanghai200433, China, ISBN : 978-1-4673-2475-5/12.

Documents

3 Fir Comparsion