



Research Article

Design Considerations When Accelerating an FPGA-Based Digital Microphone Array for Sound-Source Localization

Bruno da Silva, An Braeken, Kris Steenhaut, and Abdellah Touhafi

INDI Department, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium

Correspondence should be addressed to Bruno da Silva; bruno.da.silva@vub.be

Received 16 March 2017; Accepted 16 May 2017; Published 20 June 2017

Academic Editor: Paolo Bruschi

Copyright © 2017 Bruno da Silva et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Hindawi Journal of Sensors, Volume 2017, Article ID 6782176, 20 pages. https://doi.org/10.1155/2017/6782176

The use of microphone arrays for sound-source localization is a well-researched topic. The response of such sensor arrays depends on the number of microphones operating on the array. A higher number of microphones, however, increases the computational demand, making a real-time response challenging. In this paper, we present a Filter-and-Sum based architecture and several acceleration techniques to provide accurate sound-source localization in real time. Experiments demonstrate how an accurate sound-source localization is obtained in a couple of milliseconds, independently of the number of microphones. Finally, we also propose different strategies to further accelerate the sound-source localization while offering increased angular resolution.

1. Introduction

Most of the signal processing needed in microphone arrays is traditionally done using general purpose processors. However, the computational demand is directly related to the number of microphones of the array, a number which is drastically increasing now that low-cost MEMS technology is readily available. Current FPGAs are a potential solution thanks to their high computational power and low-latency response. In fact, FPGAs have already been considered by other researchers, mainly for converting the analogue or digital microphone signals into an audio format [1, 2] without further signal processing computation. We believe that FPGAs are not only able to manage relatively large microphone arrays, but also enable a faster response when compared to general purpose processors.

In order to satisfy the most time-stringent sound-source localization applications, which also use an increasing number of microphones, we propose a flexible, scalable, and real-time architecture. The main targets are the performance, scalability, and accuracy of the system when detecting the direction of sound sources in real time. Furthermore, we propose several techniques based on our architecture to accelerate the sound-source localization and to guarantee real-time detection.

The architecture presented in this paper is an improved and more detailed version of the one presented in [3]. Because this novel architecture is designed to be part of an embedded system, the resource and power consumption are included together with the performance in our analysis of the system. A frequency analysis is also done based on design parameters such as the number of microphones or the number of orientations. Altogether, this leads to an architecture whose frequency response satisfies the basic needs of an application requiring real-time sound-source localization.

The main contributions of this work can be summarized as follows:

(i) A Filter-and-Sum based architecture for fast sound-source localization.

(ii) A complete frequency and performance analysis of the system.

(iii) Strategies to speed up the overall execution time.

This paper is organized as follows. Section 2 presents related work. The principles used for the sound-source localization are introduced in Section 3. In Section 4 our proposed architecture is detailed. A complete time analysis and different strategies to increase performance are presented in Section 5. In Section 6 the proposed architecture is analysed. Finally, the conclusions are drawn in Section 7.

2. Related Work

The use of microphone arrays for sound-source localization is a well-researched problem whose complexity increases with the number of microphones involved and the required response time of the application. The response time is indeed crucial for applications such as counter-sniper systems [4, 5]. Such military systems are composed of microphone arrays mounted on top of a soldier's helmet and connected to an FPGA for signal processing. A similar approach is applied in [6], where the authors present a hat-type hearing system composed of a 48 digital MEMS microphone array with an FPGA as the computational component. Their main target is a hearing aid system which emphasizes the sound coming from a certain direction by up to 10 dB. Such types of applications demand a fast response of the system while being power efficient.

Indoor applications such as videoconferencing, home surveillance, and patient care also make use of microphone arrays for speech detection [1, 7]. One of these works describes the design and implementation on an FPGA of an eight-element digital MEMS microphone array for distant speech recognition. In [8], the authors propose a beamforming-based acoustic system for localization of the dominant noise source. The signal acquisition consists of a microphone array composed of up to 33 MEMS microphones, whereas the PDM demodulation and the beamforming are implemented in an FPGA. The implementation in the FPGA is completed with delay-and-sum beamforming measuring 60 angles and generating a polar map for directivity pattern presentation. Another example is proposed in [9], in which the sound-source localization is obtained by using distributed microphone arrays in a WSN. The distributed information collected by the nodes is transferred and processed using data-fusion techniques in order to locate and profile the sound sources. Despite the fact that they implement most of the processing components on an FPGA, the 64k-FFT component becomes too large and resource-hungry to be suitable for low- and middle-end FPGAs. In both publications, however, the solutions are neither scalable nor adaptable to dynamic acoustic environments. Furthermore, they do not provide information about how fast their systems can be. Instead, we present a detailed description and analysis of a flexible, scalable, and real-time architecture.

3. Sound-Source Localization

Our microphone array is designed to spatially sample its surrounding sound field in order to detect and to locate certain types of sound sources. A 360° sound power scan is performed for a configurable number of orientations. A beamforming technique focuses the array in one specific direction, or orientation, by amplifying all sounds coming from that direction and by suppressing sounds coming from other directions. A polar power plot is obtained, from which the lobes can be used to estimate the nearby sound sources. Figure 1 shows the functional elements required to locate the sound source, which involve several filters, a beamformer, and a relative sound power estimator.

3.1. Microphone Array Description. The sensor array is composed of 52 digital MEMS microphones and designed for far-field and nondiffuse sound fields [9]. The array pattern consists of four concentric subarrays of 4, 8, 16, and 24 MEMS microphones mounted on a 20 cm circular printed board (Figure 2). Each subarray is positioned differently in order to facilitate the capture of spatial acoustic information using a beamforming technique. Furthermore, the sensor array response is dynamically modified by individually activating or deactivating subarrays. This distributed geometry allows adapting the sensor to different sound sources. For instance, not all the subarrays need to be active to detect a particular sound source. The computational requirements drastically decrease and the sensor array becomes more power efficient if only a few subarrays are active.

3.2. Filters. The selected digital MEMS microphones are the ADMP521 MEMS microphones designed by Analog Devices, which offer an omnidirectional polar response and a wideband frequency response ranging from 100 Hz up to 16 kHz [10]. These digital MEMS microphones have a multiplexed pulse density modulation (PDM) output. The PDM signals are generated by using an analogue-to-digital converter (ADC) based on a sigma-delta converter. The sigma-delta conversion technique uses an embedded integrator-comparator circuit to sample the analogue signal and outputs a 1-bit signal [11]. The ADMP521 MEMS microphones use a fourth-order sigma-delta converter, which reduces the added noise in the audio frequency spectrum by shifting it to higher frequency ranges. This undesirable high-frequency noise needs to be removed. The ADMP521 MEMS microphones require a clock input of around 1 to 3 MHz as sampling frequency ($F_S$). This range of $F_S$ is chosen to oversample the audio signal in order to have sufficient audio quality and to generate the PDM output signal. Therefore, the PDM signal not only needs to be filtered to remove the noise but also needs to be downsampled to convert the audio signal to a Pulse-Code Modulation (PCM) format. The target audible frequency range, from $F_{\min}$ to $F_{\max}$, determines the decimation factor ($D_F$) needed to properly downsample the PDM signal while satisfying the Nyquist theorem:

$$D_F = \left\lceil \frac{F_S}{2 \cdot F_{\max}} \right\rceil \quad (1)$$

The usual range of $D_F$ is from a few tens up to hundreds when targeting audible frequency ranges. For instance, $D_F$ needs to be 83 to recover an audio signal oversampled at 2.49 MHz for a target $F_{\max}$ of 15 kHz.
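As a quick illustration of (1), the following Python sketch (a reading aid on our part, not part of the original design flow) evaluates the decimation factor for the example above; the function name is ours.

```python
import math

def decimation_factor(fs_hz: float, fmax_hz: float) -> int:
    """D_F = ceil(F_S / (2 * F_max)), following (1)."""
    return math.ceil(fs_hz / (2.0 * fmax_hz))

# Example from the text: audio oversampled at 2.49 MHz with a
# target F_max of 15 kHz requires D_F = 83.
print(decimation_factor(2.49e6, 15e3))  # -> 83
```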

3.3. Filter-and-Sum Beamforming. The beamforming technique applied in our proposed architecture is based on Filter-and-Sum beamforming [12]. The original Filter-and-Sum beamforming applies an independent weight to


Figure 1: Operations needed for the proposed architecture to locate a sound source: filters (PDM to PCM), beamforming (sums of PCM signals), and a power value per angle forming a power polar map.

Figure 2: Sound-source localization device composed of 4 MEMS microphone subarrays: Ring 1 (Ø = 4.5 cm, 4 MICs), Ring 2 (Ø = 8.9 cm, 8 MICs), Ring 3 (Ø = 13.5 cm, 16 MICs), and Ring 4 (Ø = 18 cm, 24 MICs).

each microphone output before summing them. The overall effect is an amplification of the signal coming from the target orientation while suppressing signals from other orientations. A variant of the Filter-and-Sum recovers the audio signal from the PDM signal, applies the same low-pass FIR filter, and delays the filter output signal of each microphone by a specific amount of time ($\Delta$) before adding all the output signals together (Figure 3). The time delay ($\Delta_m$) for a microphone $m$ is determined by the focus direction $\theta$, the position vector ($\vec{r}_m$) of microphone $m$, and the speed of sound ($c$):

$$\Delta_m = \frac{\vec{r}_m \cdot \hat{u}}{c} \quad (2)$$

where the unitary vector ($\hat{u}$) defines the direction of a far-field propagating signal with a focus direction $\theta$. The total output ($O(\theta, t)$) of the array can be expressed based on the signal output of each microphone in the time domain, $s_m(t)$, and the number of microphones in the array ($M$):

$$O(\theta, t) = \sum_{m=1}^{M} s_m\left(t - \Delta_m(\theta)\right) \quad (3)$$

The response of the Filter-and-Sum beamforming, however, is usually represented in the frequency domain due to its dependence on the signal frequency. Let $S_m(\omega)$ be the output signal of each microphone at angular speed $\omega = 2\pi f$ for frequency $f$, and let $M$ be the number of microphones in the array. The total output ($O(\theta, \omega)$) is defined as in [13]:

$$O(\theta, \omega) = \sum_{m=1}^{M} S_m(\omega)\, e^{-j\omega\Delta_m(\theta)} \quad (4)$$

which can be simplified, by assuming a monochromatic acoustic wave, as

$$O(\theta, \omega) = S_o(\omega) \sum_{m=1}^{M} e^{j r_m w_n (\theta_0 - \theta)} = S_o(\omega)\, W(w_n, \theta_0, \theta) \quad (5)$$

where $S_o(\omega)$ is the output signal of the monochromatic wave, $w_n$ is the incoming monochromatic angular speed, $\theta_0$ is its direction, and $\theta$ is the array focus. $W(w_n, \theta_0, \theta)$ is known as the array pattern, which determines the amplification or gain of the array output. For instance, when $\theta_0 = \theta$, which occurs when the array is focusing in the direction of the incoming monochromatic wave, the gain reaches its maximum $M$, equal to the number of microphones.
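To make the delay computation of (2) and the sum of (3) concrete, here is a hedged Python sketch of time-domain delay-and-sum beamforming for a planar array. It is a floating-point reference with delays rounded to whole samples, not the fixed-point FPGA implementation described later; the microphone coordinates, the signal matrix, and the speed-of-sound value are placeholders of ours.

```python
import numpy as np

C = 343.0  # assumed speed of sound (m/s)

def delays(mic_xy: np.ndarray, theta: float) -> np.ndarray:
    """Per-microphone delay from (2): projection of r_m on the unit
    vector of the focus direction theta, divided by c (seconds)."""
    u = np.array([np.cos(theta), np.sin(theta)])
    return mic_xy @ u / C

def delay_and_sum(signals: np.ndarray, mic_xy: np.ndarray,
                  theta: float, fs: float) -> np.ndarray:
    """Time-domain output O(theta, t) from (3).
    signals has shape (M, n_samples), sampled at rate fs."""
    d = np.round(delays(mic_xy, theta) * fs).astype(int)
    d -= d.min()                     # make all shifts non-negative
    n = signals.shape[1]
    out = np.zeros(n)
    for s, k in zip(signals, d):
        out[k:] += s[:n - k]         # delay microphone m by k samples
    return out
```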

3.4. Polar Steered Response Power. The direction of the sound source is located by measuring the relative sound


Figure 3: The proposed Filter-and-Sum beamforming filters and delays the output of each microphone before adding them together. (a) The acoustic wave received at each microphone is measured and filtered. The beamforming technique considers that the time $\Delta_m$ the input signal takes to travel from microphone $m$ to the origin is proportional to the projection of the microphone vector $\vec{r}_m$ on $\hat{u}$. (b) This $\Delta_m$ is determined by the position of the microphone in the array and the desired focus direction $\theta$ of the array. Consequently, the signals coming from the same direction are amplified after the addition of the delayed inputs. Source: [9].

power per horizontal direction, which is done by a 360° sweep overview of the surrounding sound field. The directional power output of a microphone array, defined here as the polar steered response power (P-SRP), corresponds to the array's directional response to sound sources present in a sound field (Figure 4). The P-SRP is obtained by considering multiple broadband sources coming from different directions, for instance, human speech.

The output power when the microphone array is exposed to a broadband sound source $S(w)$ with an angle of incidence $\theta_0$ can be modelled as

$$O(\theta, S) = A_1 W(w_{n_1}, \theta_0, \theta) + A_2 W(w_{n_2}, \theta_0, \theta) + \cdots + A_n W(w_{n_n}, \theta_0, \theta) \quad (6)$$

where $A_i$ with $i \in \{1, \ldots, n\}$ is the amplitude of one of the $n$ frequency components of $S(w)$. The equation can be generalized to consider a sound field $\phi$ composed of multiple broadband sound sources at different locations and with uncorrelated noise:

$$O(\theta, \phi) = O(\theta, S_1) + O(\theta, S_2) + \cdots + O(\theta, S_n) + \text{Noise}_{\text{uncorrelated}} \quad (7)$$

The array's power output can be expressed as

$$P(\theta, \phi) = \left|O(\theta, \phi)\right|^2 \quad (8)$$

since the power of a signal is the square of the array's output. Finally, the normalized power output is defined as the P-SRP:

$$\text{P-SRP}(\theta, \phi) = \frac{P(\theta, \phi)}{\max_{\theta \in [0, 2\pi]} P(\theta, \phi)} \quad (9)$$

The comparison of $P(\theta, \phi)$ for different values of $\theta$ determines in which direction the sound source is located, since the maximum power is obtained when the focus corresponds to the location of a sound source.

The calculation of the P-SRP is usually defined in the frequency domain [14, 15], which requires the computation of a Fourier transform. Instead, we propose applying Parseval's theorem, which states that the sum of the squares of a function is equal to the sum of the squares of its transform. This theorem drastically simplifies the calculations, since the P-SRP can be computed in the time domain. Let us define the sensing time ($t_s$) as the time the array is registering the previously defined sound field $\phi$ for each orientation. The power $P(\theta, t_s)$ can then be expressed as follows:

$$P(\theta, t_s) = \frac{1}{t_s} \sum_{t=1}^{t_s} \left|O(\theta, t_\phi)\right|^2 \quad (10)$$

Consequently, the P-SRP can be expressed in the time domain by

$$\text{P-SRP}(\theta, t_s) = \frac{P(\theta, t_s)}{\max_{\theta \in [0, 2\pi]} P(\theta, t_s)} \quad (11)$$
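The time-domain P-SRP of (10) and (11) then reduces to a mean of squares per orientation followed by a normalization. A minimal sketch (variable names are ours; the beamformed outputs could come from the delay-and-sum sketch above):

```python
import numpy as np

def power(o_theta: np.ndarray) -> float:
    """P(theta, t_s) from (10): mean squared beamformed output
    over the sensing window."""
    return float(np.mean(o_theta ** 2))

def p_srp(outputs_per_orientation) -> np.ndarray:
    """P-SRP from (11): per-orientation powers normalized by the
    maximum; the argmax is the estimated source orientation."""
    p = np.array([power(o) for o in outputs_per_orientation])
    return p / p.max()

# estimated_orientation = int(np.argmax(p_srp(outputs)))
```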


Figure 4: Examples of a polar map obtained under experimental conditions for sound sources of 5 kHz (a) and 8 kHz (b).

Figure 5: Examples of polar maps with different angular resolutions (steering 4, 8, 16, 32, and 64 orientations) locating a sound source of 8 kHz. A low number of orientations clearly leads to a wrong sound-source location.

3.5. Sensor Array Evaluation. The defined P-SRP allows estimating the direction of arrival of multiple sound sources under different sound field conditions. Nevertheless, the precision and accuracy of its estimation can be determined by different quality metrics.

The Filter-and-Sum beamforming is applied to a discrete number of orientations or angles. The angular resolution of the microphone array is determined by the number of measurements per 360° sweep. A higher number of measurements increases the resolution of the P-SRP, displayed as a polar power map (Figure 5), and decreases the location error of the sound source. The lobes of this polar power map can then be used to estimate the bearing of nearby sound sources under nondiffuse sound field conditions. In fact, the characteristics of the main lobe when considering a single sound-source scenario determine the directivity of the microphone array. A definition of array directivity $D_P$ is proposed in [16] for broadband signals. The authors propose the use of $D_P$ as a metric of the quality of the array, since $D_P$ depends on the main lobe shape and its capacity to unambiguously point to a specific bearing. The definition of array directivity presented in [16] is adapted for 2D polar coordinates in [9] as follows:

$$D_P(\theta, \omega) = \frac{\pi P(\theta, \omega)^2}{\frac{1}{2} \int_0^{2\pi} P(\theta, \omega)^2 \, d\theta} \quad (12)$$

where $P(\theta, \omega)$ is the output power of the array when pointing to the direction $\theta$, and $\frac{1}{2} \int_0^{2\pi} P(\theta, \omega)^2 \, d\theta$ is the sum of the squared output power in all other directions. It can be expressed as the ratio between the area of a circle whose radius is the maximum power of the array and the total area of the power output. Consequently, $D_P$ defines the quality of the microphone array and can be used to specify a certain threshold for the microphone array. For instance, if $D_P$ equals 8, the main lobe is eight times slimmer than the unit circle and offers a confident estimation of a sound source within half a quadrant.
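Since the architecture only evaluates a discrete set of orientations, (12) can be approximated by replacing the integral with a sum over the sampled angles. The sketch below does exactly that; the discretization is our assumption, not the evaluation procedure of [16].

```python
import numpy as np

def directivity(polar_power: np.ndarray) -> float:
    """Discrete approximation of D_P from (12): polar_power holds
    P(theta) at N_o equally spaced angles over [0, 2*pi); the
    main-lobe power is taken as the maximum sample."""
    p2 = polar_power.astype(float) ** 2
    d_theta = 2.0 * np.pi / len(p2)
    half_integral = 0.5 * np.sum(p2) * d_theta  # (1/2) * integral of P^2
    return np.pi * p2.max() / half_integral

# A perfectly flat response yields D_P = 1; the text uses D_P = 8
# as an example of a confident, slim main lobe.
```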

Whereas $D_P$ is usually considered for broadband sound sources, other metrics are necessary to profile the array's response for different types of sound sources. Figure 6 depicts the maximum side lobe (MSL) and the half-power beamwidth, which are two complementary metrics used to characterize the response of arrays for narrowband sound sources. Half-power beamwidth is the angular extent by which the power response has fallen to half of the maximum level of the main lobe. Since the half-power level coincides with a 3 dB drop in power level, it is often called the 3 dB beamwidth ($\text{BW}_{-3\,\text{dB}}$). This metric determines the angular ratio between the power signal level which is at least 50% of the peak power level and the remaining circle. By contrast, MSL is another important parameter used to represent the impact of the side lobes when characterizing arrays. MSL is the normalized ratio


Figure 6: Definitions of the maximum side lobe (MSL) and the 3 dB beamwidth ($\text{BW}_{-3\,\text{dB}}$) on an amplitude-versus-angle-of-arrival plot.

Figure 7: Main stages of the proposed architecture: the microphone array (subarrays 1 to 4), a PDM splitter, the filter stage, the beamforming stage, the power stage, and a control unit, all implemented on the FPGA.

between the highest side lobe and the power level of the main lobe, expressed in dB. Both metrics, the MSL and $\text{BW}_{-3\,\text{dB}}$, are desired to be as low as possible, whereas $D_P$ should be as high as possible to guarantee a precise sound-source location.

4. A Filter-and-Sum Based Architecture

The proposed architecture uses a Filter-and-Sum based beamforming technique to locate a sound source with an array of digital MEMS microphones. Many applications, however, demand a certain scalability and flexibility when locating the sound source. With such requirements in mind, the proposed architecture has some additional features to support a dynamic response targeting applications with real-time demands. The proposed architecture is also designed to be battery-power efficient and to operate in a streaming fashion to achieve the fastest possible response.

One of the features of the ADMP521 microphone is its low-power sleep mode capability. When no clock signal is provided, the ADMP521 microphone enters a low-power sleep mode (<1 μA), which makes this sound-source localizer suitable for battery-powered implementations. The PCB of the MEMS microphone array is designed to exploit this capability. Figure 2 depicts the subarray distribution of the MEMS microphones. Using the clock signal, it is possible to activate or deactivate subarrays, since each subarray is fed with an individual clock signal. This flexibility allows disabling not only subarrays of microphones but also the associated computational components, decreasing the computational

Table 1: Relevant parameters involved in the proposed architecture.

Parameter | Definition
$F_S$ | Sampling frequency
$F_{\min}$ | Minimum frequency of the target sound source
$F_{\max}$ | Maximum frequency of the target sound source
BW | Minimum bandwidth to satisfy Nyquist
$D_F$ | Decimation factor
$D_{\text{CIC}}$ | CIC filter decimation factor
$N_{\text{CIC}}$ | Order of the CIC filter
$D_{\text{FIR}}$ | FIR filter decimation factor
$N_{\text{FIR}}$ | Order of the FIR filter

demand and the power consumption. The proposed architecture is properly designed to support such flexibility.

The array computes its response as fast as possible to reach real-time sound-source localization. The proposed architecture is designed to process in a streaming fashion and is mainly composed of three cascaded stages operating in a pipeline (Figure 7). The first stage is the filter chain, which is composed of the minimum number of components required to recover the audio signal in the target frequency range. The second stage computes the Filter-and-Sum beamforming operation. The final stage obtains $P(\theta, t)$ for the focused orientation. A polar power map is obtained once a complete steering loop is completed. The different stages are discussed in more detail in the following subsections. Table 1 summarizes the most relevant parameters of the proposed architecture.


Figure 8: The filter stage consists of a PDM splitter and one filter chain per microphone (up to 52), each composed of an $N_{\text{CIC}}$th-order CIC decimator filter (decimation $D_{\text{CIC}}$), a DC-removal block, and an $N_{\text{FIR}}$th-order low-pass FIR filter (decimation $D_{\text{FIR}}$), converting the PDM input into filtered PCM output.

4.1. Filter Stage. The filter stage contains a PDM demultiplexer and as many filter chain blocks as MEMS microphones (Figure 8). Each microphone of the array is associated with a filter chain composed of a couple of cascaded filters. The full-capacity design supports up to 52 filter chain blocks working in parallel, but their number is defined by the number of active microphones. The unnecessary filter chain blocks are disabled at runtime.

The microphones' clock $F_S$ determines the input rate and, therefore, how fast the filter stage should operate. The low operating frequency for current FPGAs allows interesting power savings [17].

Every pair of microphones has its PDM output signal multiplexed in time. Thus, at every edge of the clock cycle, the output is the sampled data from one of the microphones. The PDM demultiplexing is the first operation, to obtain the individual sampled data from each microphone. This task is done in the PDM splitter block.

The next component consists of a cascade of filters to filter and to downsample each microphone signal. Traditional digital filters, such as Finite Impulse Response (FIR) filters, are a good solution to reduce the signal bandwidth and to remove the higher frequency noise. Once the signal is filtered, it can be decimated to decrease the oversampling to a reasonable audio quality rate (e.g., 48 kHz). However, such a filter consumes many adders and dedicated multipliers (DSPs) from the FPGA resources, particularly if its order increases.

The Cascaded Integrator-Comb (CIC) filter is an alternative low-pass filtering technique, which has been developed in [18, 19] and involves only additions and subtractions. This type of filter consists of three stages: the integrating stage, the decimator stage, and the comb section. PDM samples are recursively added in the integrating stage while being recursively subtracted, with a differential delay, in the comb stage. The number of recursive operations in both the

integrating and comb sections determines the order of the filter ($N_{\text{CIC}}$) and should at least be equal to the order of the sigma-delta converter of the microphones' ADC. After the CIC filter, the signal growth ($G$) is proportional to the decimation factor ($D_{\text{CIC}}$) and the differential delay (DD) and is exponential in the filter order [19]:

$$G = (D_{\text{CIC}} \cdot \text{DD})^{N_{\text{CIC}}} \quad (13)$$

The output bit width grows proportionally to $G$. Denote by $B_{\text{in}}$ the number of input bits; then the number of output bits $B_{\text{out}}$ is as follows:

$$B_{\text{out}} = \left\lceil N_{\text{CIC}} \cdot \log_2(D_{\text{CIC}} \cdot \text{DD}) + B_{\text{in}} \right\rceil \quad (14)$$
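Equations (13) and (14) are what dimension the CIC datapath. A small sketch follows, using for illustration the CIC configuration analysed later in the paper ($D_{\text{CIC}} = 16$, $N_{\text{CIC}} = 2$) together with a differential delay of 1 and a 1-bit PDM input, which are our assumptions:

```python
import math

def cic_gain(d_cic: int, dd: int, n_cic: int) -> int:
    """Signal growth G = (D_CIC * DD)^N_CIC, from (13)."""
    return (d_cic * dd) ** n_cic

def cic_output_bits(d_cic: int, dd: int, n_cic: int, b_in: int) -> int:
    """Output bit width B_out from (14)."""
    return math.ceil(n_cic * math.log2(d_cic * dd) + b_in)

print(cic_gain(16, 1, 2))            # -> 256
print(cic_output_bits(16, 1, 2, 1))  # -> 9 bits
```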

The proposed CIC decimation filter eliminates higher frequency noise components and decimates the signal by $D_{\text{CIC}}$ at the same time. However, a major disadvantage of this filter is its nonflat frequency response in the desired audio frequency range. In order to improve the flatness of the frequency response, a CIC filter with a lower decimation factor followed by a compensation FIR filter is often chosen, as in [20–22].

The CIC filter is followed by an averager, which is used to cancel out the effects caused by the microphones' DC offset output, which would otherwise lead to a constant offset in the beamforming values. This block improves the dynamic range, reducing the bit width required to represent the data after the CIC.

The last component of each filter chain is a low-pass compensation FIR filter based on a Kaiser window. This filter equalises the passband drop usually introduced by CIC filters [19]. It additionally performs a low rate change. The proposed filter needs a cut-off frequency of $F_{\max}$ at a sampling rate of $F_S/D_{\text{CIC}}$, which is the sampling rate obtained after the CIC decimator filter with a decimation factor of $D_{\text{CIC}}$. This low-pass FIR filter is designed in a serial fashion to reduce the resource consumption. In fact, the FIR filter order


Figure 9: Details of the internal structure of the proposed modular Filter-and-Sum beamforming. The filtered signals of microphones 1 to 52 enter per-subarray delay memories (delays for subarrays 1 to 4), whose delay values per orientation are stored in a precomputed table; local sums per subarray are followed by a global sum producing the power value per angle.

is also determined by $D_{\text{CIC}}$. Thanks to the streaming nature of the architecture, the CIC filter is able to generate an output value every clock cycle. Due to the decimation factor, only one output value per $D_{\text{CIC}}$ input values is propagated to the low-pass FIR filter. Therefore, the FIR filter has $D_{\text{CIC}}$ clock cycles to compute each input value, which determines its maximum order. The filtered signal is then further decimated by a factor of $D_{\text{FIR}}$ to obtain a minimum bandwidth $\text{BW} = 2 \cdot F_{\max}$ of the audio signals to satisfy the Nyquist theorem. The overall $D_F$ can be expressed based on the rate change of each filter:

$$D_F = D_{\text{CIC}} \cdot D_{\text{FIR}} \quad (15)$$
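The behaviour of the whole decimation chain (CIC decimator, DC removal, compensation FIR) can be prototyped in software before committing to the FPGA datapath. The following Python/SciPy sketch is a behavioural reference under our own assumptions (differential delay of 1, Kaiser beta of 8.6, mean-based DC removal), not the serial fixed-point implementation described in the text.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def cic_decimator(pdm: np.ndarray, r: int, n: int, dd: int = 1) -> np.ndarray:
    """Behavioural model of an n-th order CIC decimator (factor r)."""
    y = pdm.astype(np.int64)
    for _ in range(n):                       # integrator section at F_S
        y = np.cumsum(y)
    y = y[::r]                               # decimation by D_CIC
    for _ in range(n):                       # comb section at F_S / D_CIC
        y = y - np.concatenate((np.zeros(dd, dtype=np.int64), y[:-dd]))
    return y

def filter_chain(pdm, fs, d_cic, n_cic, d_fir, n_fir, f_max):
    """CIC + DC removal + compensation FIR; total decimation
    D_F = D_CIC * D_FIR, as in (15)."""
    x = cic_decimator(pdm, d_cic, n_cic).astype(float)
    x -= x.mean()                            # crude DC-offset removal
    taps = firwin(n_fir + 1, f_max, fs=fs / d_cic,
                  window=("kaiser", 8.6))    # illustrative design only
    return lfilter(taps, 1.0, x)[::d_fir]
```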

4.2. Beamforming Stage. As detailed before, the main purpose of the beamforming operation is to focus the MEMS microphone array in one particular direction. The detection of sound sources is possible by continuously steering in loops of 360°. The number of orientations $N_o$ determines the angular resolution. Higher angular resolutions demand not only a larger execution time per steering loop but also more FPGA memory resources to store the precomputed delays per orientation.

The beamforming stage depends on the number of microphones and subarrays. Although Filter-and-Sum beamforming assumes a fixed number of microphones and a fixed geometry, our scalable solution satisfies those restrictions while offering a flexible geometry. Figure 9 shows our proposed Filter-and-Sum based beamformer. This stage is basically composed of FPGA blocks of memory (BRAM) in ring-buffer fashion that properly delay the filtered microphone signals. The values of the delays at a given moment depend on the focus orientation at that moment and are determined by the array pattern $W(w_n, \theta_0, \theta)$ from (5). The delay for a given microphone is determined by its position on the array and by the focus orientation. All possible delay values per microphone for each beamed orientation are precomputed, grouped per orientation, and stored in ROMs at compilation time. At execution time, the delay values $\Delta_m(\theta)$ of each microphone $m$ when pointing to a certain orientation $\theta$ are obtained from this precomputed table.

The beamforming stage is designed to support a variable number of microphones. This is enabled by grouping the input signals following their subarray structure. Therefore, instead of implementing one single Filter-and-Sum of 52 microphones, there are four Filter-and-Sum operations in parallel for the 4, 8, 16, and 24 microphones. Their sum operation is firstly done locally for each subarray and afterwards between subarrays. The only restriction of this modular beamforming is the synchronization of the outputs in order to have them properly delayed. Therefore, the easiest solution is to delay all the subarrays with the maximum delay of the


Figure 10: The power stage consists of a couple of components (power value per angle and peak detection) to calculate the P-SRP and the estimated location of the sound source.

subarrays. Although the output of some subarrays is already properly delayed, additional delays, shown in the Sums section of Figure 9, are inserted to assure that the proper delay of each subarray has been obtained. This is achieved by using the valid output signals of each subarray beamforming, without additional resource cost. Consequently, only the Filter-and-Sum beamforming modules linked to an active subarray are enabled. The inactive beamformers are set to zero in order to avoid any negative impact on the beamforming operation.

A side benefit of this modular approach is a reduction of the memory resource consumption. Since each subarray has its ring-buffer memory properly dimensioned to its maximum sample delay, the portion of underused regions of the consumed memories is significantly low.

4.3. Power Stage. Figure 10 shows the components of the power stage. Once the filtered data has been properly delayed and added for a particular orientation $\theta$, $P(\theta, t)$ is calculated following (10). The P-SRP is obtained after a steering loop, allowing the determination of the sound sources. The sound source is estimated to be located in the direction shown by the peak of the polar power map, which corresponds to the orientation with the maximum $P(\theta, t)$.

5. Performance Analysis of the Filter-and-Sum Based Architecture

A performance analysis of the proposed architecture is presented in this section. The analysis shows how the design parameters, such as the filters' characteristics, affect the final execution time of the sound-source locator. The links between performance and design parameters are explained, followed by the description of the different acceleration strategies. These strategies can be considered standalone or combined for certain timing constraints. The advantages of these strategies are presented later in Section 6.

5.1. Time Parameters. The overall execution time of the proposed architecture is defined by the latency of the main components. A detailed analysis of the implementation of the components and the latency that they incur provides a good insight into the speed of the system (Table 2). The operating frequency of the design can be assumed to be the same as the sampling frequency. Let us define $t_{\text{P-SRP}}$ as the overall

Table 2: Relevant parameters involved in the performance calculation for the proposed architecture.

Parameter | Definition
$t_s$ | Sensing time
$t_o$ | Execution time of one orientation
$N_o$ | Number of orientations
$L_o$ | Latency of the system
$t_{\text{P-SRP}}$ | Time required to obtain a polar power map
$t_{\text{filters}}^{\text{II}}$ | Initiation interval of the filter stage
$t_{\text{filters}}$ | Execution time of the filter stage
$t_{\text{beamforming}}^{\text{II}}$ | Initiation interval of the beamforming stage
$t_{\text{beamforming}}$ | Execution time of the beamforming stage
$t_{\text{power}}^{\text{II}}$ | Initiation interval of the power stage
$t_{\text{power}}$ | Execution time of the power stage
$t_{\text{II}}$ | Sum of all initiation intervals
$t_{\text{CIC}}^{\text{II}}$ | Initiation interval of the CIC filter
$t_{\text{DC}}^{\text{II}}$ | Initiation interval of the DC-removal block
$t_{\text{FIR}}^{\text{II}}$ | Initiation interval of the FIR filter
$t_{\text{Delay}}^{\text{II}}$ | Initiation interval of the delay memories
$t_{\text{Sum}}^{\text{II}}$ | Initiation interval of the cascaded sums
$t_{\text{Power}}^{\text{II}}$ | Initiation interval of the power calculation

execution time in clock cycles required to obtain the P-SRP. Thus, $t_{\text{P-SRP}}$ is defined as

$$t_{\text{P-SRP}} = N_o \cdot t_o = N_o \cdot \left(t_{\text{filters}} + t_{\text{beamforming}} + t_{\text{power}}\right) \quad (16)$$

where $t_o$ is the execution time of one orientation and is determined by the execution time of the filter stage ($t_{\text{filters}}$), the execution time of the beamforming ($t_{\text{beamforming}}$), and the execution time of the power stage ($t_{\text{power}}$), which are the main components of the system, as explained in the previous section. The proposed architecture is designed to pipeline each stage, overlapping the execution of each component of the design. Therefore, only the initial latency or initiation interval (II) of the components needs to be considered, since it corresponds to the system group delay.

Let us assume that the design operates at the same frequency $F_S$ as the microphones; then (16) can be rearranged as follows:

$$t_{\text{P-SRP}} = N_o \cdot \frac{L_o}{F_S} = N_o \cdot \left(t_{\text{filters}}^{\text{II}} + t_{\text{beamforming}}^{\text{II}} + t_{\text{power}}^{\text{II}} + t_s\right) \quad (17)$$

where $L_o$ is the latency of the system, determined by the initiation interval of the filter stage ($t_{\text{filters}}^{\text{II}}$), the initiation interval of the beamforming stage ($t_{\text{beamforming}}^{\text{II}}$), and the initiation interval of the power stage ($t_{\text{power}}^{\text{II}}$). The time during which the microphone array is monitoring one particular orientation is known as $t_s$. This is the time required to calculate a certain number of output samples ($N_s$). As previously detailed, the digital microphones oversample the audio signal by operating at $F_S$. The reconstruction of the audio signal in the target range demands a certain level of decimation $D_F$.


Figure 11: Timing analysis of the pipelined execution of the components (CIC, DC removal, FIR filter, delay, sum, and power), showing the initiation intervals $t_{\text{CIC}}^{\text{II}}$, $t_{\text{DC}}^{\text{II}}$, $t_{\text{FIR}}^{\text{II}}$, $t_{\text{Delay}}^{\text{II}}$, $t_{\text{Sum}}^{\text{II}}$, and $t_{\text{Power}}^{\text{II}}$ together with the sensing time $t_s$.

This level of decimation is done by the CIC and the FIR filters in the filter stage, with decimation factors ($D_{\text{CIC}}$) and ($D_{\text{FIR}}$), respectively. Based on $D_F$ defined in (1), the time $t_s$ is expressed as follows:

$$t_s = \frac{D_F \cdot N_s}{F_S} = \frac{\lceil F_S/\text{BW} \rceil \cdot N_s}{F_S} \approx \frac{N_s}{2 \cdot F_{\max}} \quad (18)$$

The initiation interval of each stage of the implementation can also be further decomposed based on the latency of its components:

$$t_{\text{filters}}^{\text{II}} = t_{\text{CIC}}^{\text{II}} + t_{\text{DC}}^{\text{II}} + t_{\text{FIR}}^{\text{II}}, \qquad t_{\text{beamforming}}^{\text{II}} = t_{\text{Delay}}^{\text{II}} + t_{\text{Sum}}^{\text{II}} \quad (19)$$

where $t_i^{\text{II}}$ is the initiation interval of each component $i$. Therefore, $t_{\text{II}}$ is defined as the sum of all the initiation intervals:

$$t_{\text{II}} = t_{\text{CIC}}^{\text{II}} + t_{\text{DC}}^{\text{II}} + t_{\text{FIR}}^{\text{II}} + t_{\text{Delay}}^{\text{II}} + t_{\text{Sum}}^{\text{II}} + t_{\text{Power}}^{\text{II}} \quad (20)$$

Equation (16) can be rearranged (see Figure 11) as

$$t_{\text{P-SRP}} = N_o \cdot (t_{\text{II}} + t_s) \quad (21)$$

The execution time $t_{\text{P-SRP}}$ is determined by $N_o$ and $N_s$, since the level of decimation is determined by the target frequency range, and $t_{\text{II}}$ is determined by the components' design. Although most of the latency of each component of the design is hidden thanks to the pipelined operation, there are still some cycles dedicated to initializing the components. A detailed analysis of $t_{\text{II}}$ provides valuable information about the performance leaks.

CIC. The initiation interval of the CIC filter represents the time required to fill the integrator and the comb stages. Therefore, the order of the CIC ($N_{\text{CIC}}$) determines $t_{\text{CIC}}^{\text{II}}$:

$$t_{\text{CIC}}^{\text{II}} = \frac{2 \cdot N_{\text{CIC}} + 1}{F_S} \quad (22)$$

DC. The component which removes the DC level of the signal introduces a minor initial latency due to its internal registers. Since it needs at least two input values to calculate the DC level, it also depends on $D_{\text{CIC}}$:

$$t_{\text{DC}}^{\text{II}} = \frac{D_{\text{CIC}} + 2}{F_S} \quad (23)$$

FIR. The initiation interval of the FIR filter is also determined by the order of this filter ($N_{\text{FIR}}$). Since the filter operation is basically a convolution, the initial output values are not correct until at least the $\lceil (N_{\text{FIR}}+1)/2 \rceil$th input signal of the filter. Because the filters are cascaded, $D_{\text{CIC}}$ also affects $t_{\text{FIR}}^{\text{II}}$:

$$t_{\text{FIR}}^{\text{II}} = \frac{D_{\text{CIC}} \cdot \left(\lceil (N_{\text{FIR}}+1)/2 \rceil + 1\right)}{F_S} \quad (24)$$

Therefore, $t_{\text{filters}}^{\text{II}}$ is expressed as follows:

$$t_{\text{filters}}^{\text{II}} = t_{\text{CIC}}^{\text{II}} + t_{\text{DC}}^{\text{II}} + t_{\text{FIR}}^{\text{II}} = \frac{2 \cdot N_{\text{CIC}} + D_{\text{CIC}} \cdot \left(2 + \lceil (N_{\text{FIR}}+1)/2 \rceil\right) + 3}{F_S} \quad (25)$$

Delay. The beamforming operation is done through memories which properly delay the audio samples for a particular orientation. The maximum number of samples determines the minimum size of these delay memories. This value represents the maximum distance between a pair of microphones for a certain microphone array distribution and may vary for each orientation. The initiation interval of the Filter-and-Sum beamformer is therefore expressed through the maximum delay between pairs of microphones for a particular orientation:

$$t_{\text{Delay}}^{\text{II}} = \frac{\max(\Delta_{\text{am}}(\theta)) \cdot D_F}{F_S} \quad (26)$$

where $\max(\Delta_{\text{am}}(\theta))$ is the maximum time delay of the active microphones for the beamed orientation $\theta$. Therefore, $t_{\text{Delay}}^{\text{II}}$ is mainly determined by the microphone array distribution, $F_S$, and the target frequencies determining $D_F$. Due to the symmetry of the microphone array, and for the sake of simplicity, it is assumed that each orientation has the same $\max(\Delta_{\text{am}})$. Notice that this does not need to be true for different array configurations.


Sum. The proposed beamforming is composed of not only a set of delay memories but also a sum tree. The initiation interval of this component is defined by the number of active microphones ($N_{\text{am}}$):

$$t_{\text{Sum}}^{\text{II}} = \frac{\lceil \log_2(N_{\text{am}}) \rceil}{F_S} \quad (27)$$

Therefore, $t_{\text{beamforming}}^{\text{II}}$ is expressed as follows:

$$t_{\text{beamforming}}^{\text{II}} = t_{\text{Delay}}^{\text{II}} + t_{\text{Sum}}^{\text{II}} = \frac{\max(\Delta_{\text{am}}(\theta)) \cdot D_F + \lceil \log_2(N_{\text{am}}) \rceil}{F_S} \quad (28)$$

Power. The final component is the calculation of the power per orientation. This simple component has a constant latency of a couple of clock cycles:

$$t_{\text{Power}}^{\text{II}} = \frac{2}{F_S} \quad (29)$$
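For convenience, (18) and (22)–(29) can be folded into one small timing model; the sketch below (ours, in Python rather than in the FPGA design flow) returns $t_{\text{II}}$ and the $t_{\text{P-SRP}}$ of (21) in seconds. The maximum sample delay and the number of active microphones are inputs, since they depend on the array geometry and the enabled subarrays.

```python
import math

def timing_model(fs, n_s, n_o, d_cic, n_cic, d_fir, n_fir,
                 max_delay_samples, n_active_mics):
    """Initiation intervals (22)-(29), t_s (18), and t_P-SRP (21)."""
    d_f = d_cic * d_fir                                      # (15)
    t_s = d_f * n_s / fs                                     # (18)
    t_cic = (2 * n_cic + 1) / fs                             # (22)
    t_dc = (d_cic + 2) / fs                                  # (23)
    t_fir = d_cic * (math.ceil((n_fir + 1) / 2) + 1) / fs    # (24)
    t_delay = max_delay_samples * d_f / fs                   # (26)
    t_sum = math.ceil(math.log2(n_active_mics)) / fs         # (27)
    t_power = 2 / fs                                         # (29)
    t_ii = t_cic + t_dc + t_fir + t_delay + t_sum + t_power  # (20)
    return t_ii, n_o * (t_ii + t_s)                          # (21)
```

For example, with $F_S = 2$ MHz, $D_F = 64$, and $N_s = 64$ (the configuration analysed later in Section 6), (18) alone already gives $t_s = 64 \cdot 64 / 2\,\text{MHz} \approx 2.05$ ms per orientation.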

The timing analysis of the initiation interval of each component of the architecture gives an idea of the design parameters with the highest impact. The definition of the filters, mainly their order, is determined by the application specifications, so it should not be modified to reduce the overall execution time. On the other hand, the distribution of the microphones in the array affects not only the frequency response of the system but also the execution time. Notice, however, that the number of microphones does not have a timing impact. Only the number of active microphones has a minor impact, in terms of a couple of clock cycles of difference. Nevertheless, (21) already shows that the dominant parameters are $t_s$ and $N_o$.

5.2. Sensitive Parameters. The timing analysis provides an indication of the parameters dominating the execution time. Some parameters, like the microphone array distribution, which determines the beamforming latency, are fixed, while others, like $N_o$ or $t_s$ per orientation, are variable.

Orientations. Figure 5 depicts how an increment of $N_o$ leads to a better sound-source localization. This resolution, however, has a high repercussion on the response time. A simple strategy is to maintain the angular resolution only where it is needed while quickly exploring the surrounding sound field. For instance, the authors in [3] propose a strategy to reduce the beamforming exploration to 8 orientations with an angular separation of 45 degrees. Once a steering loop ends, the orientations are rotated one position, which represents a shift operation in the precomputed orientation table. Therefore, all the supported 64 orientations are monitored after 8 steering loops. Despite this strategy intending to accelerate the peak detection by monitoring the minimum $N_o$, the overall $N_o$ remains the same for achieving the equivalent angular resolution.

Sensing Time. The sensing time is a well-known parameter of radio frequency applications. The time $t_s$ is known to strengthen the robustness against noise [23]. In our case, the time a receiver is monitoring the surrounding sound field determines the probability of properly detecting a sound source. Consequently, a higher $t_s$ is needed to detect and locate sound sources under low Signal-to-Noise Ratio (SNR) conditions. Despite the fact that this term could be modified at runtime to adapt the sensing of the array based on an estimated SNR, this would demand a continuous SNR estimation, which is out of the scope of this paper.

To conclude, Table 2 summarizes the timing definitions. On one hand, $t_s$ determines the number of processed acoustic samples and therefore directly affects the sensing of the system. On the other hand, $N_o$ determines the angular resolution of the sound-source search and influences the accuracy. There is a trade-off between $t_s$, $N_o$, and the quality of the sound-source location.

5.3. Strategies for Time Reduction. The following three strategies are proposed to accelerate the sound-source localization without any impact on the frequency response and $D_P$ of the architecture. An additional strategy is proposed especially for dynamic acoustic environments, but with a certain accuracy cost.

5.3.1. Continuous Processing. The proposed architecture is designed to reset the filter and beamforming stages after $t_o$ due to the orientation transition. Thanks to the beamforming being placed after the filter stage, the system can continue processing while resetting. The filter stage does not need to stop its processing. The input data is not lost due to the reset operations, since the filtered input values are stored in the beamforming stage. Furthermore, the initialization of the beamforming stage can also be eliminated, since the stored data from the previous orientation can be reused for the calculation of the new one. With this approach, (17) becomes as follows:

$$t_{\text{P-SRP}} = t_{\text{filters}}^{\text{II}} + t_{\text{beamforming}}^{\text{II}} + N_o \cdot \left(t_{\text{power}}^{\text{II}} + t_s\right) \approx t_{\text{II}} + N_o \cdot t_s \quad (30)$$
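A tiny numeric comparison of (21) versus (30), with illustrative (not measured) values for $t_{\text{II}}$ and $t_s$:

```python
# Illustrative values only: total initiation interval t_II, sensing
# time per orientation t_s, and N_o = 64 orientations.
t_ii, t_s, n_o = 100e-6, 2e-3, 64

reset_per_orientation = n_o * (t_ii + t_s)   # (21)
continuous_processing = t_ii + n_o * t_s     # (30): pay t_II only once
print(reset_per_orientation, continuous_processing)  # 0.1344 s vs 0.1281 s
```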

5.3.2. Time Multiplexing. Nowadays, FPGAs can operate at clock speeds of hundreds of MHz. Despite the fact that the power consumption is significantly lower when operating at low frequency [17], the proposed architecture is able to operate at a much higher frequency than the data sampling rate. This capability provides the opportunity to parallelize the beamforming computations without any additional resource consumption. Instead of consuming more logic resources by replicating the main operations, the proposed strategy, similar to Time-Division Multiplexing in communications, consists in time multiplexing these parallel operations. Because the input data is oversampled audio, the selection of the operations to be time multiplexed is limited. Based on (21), the candidates to be parallelized are $N_o$ and $t_s$. Since the input data rate is determined by $F_S$, (18) shows that $t_s$ cannot be reduced without decreasing $N_s$ or changing the target frequency range. Nevertheless, since the computation of each orientation is data independent, they can be parallelized. The


Figure 12: Clock regions for the time multiplexing of the computation of multiple orientations $N_o$: the PDM splitter and the filter stage operate at $F_S$, while the beamforming and power stages operate at $F_P$.

simultaneous computation of multiple orientations is only possible after the beamforming operation. Let us define $t_P^{\text{II}}$ as the monitoring time before being able to process multiple orientations in parallel. Therefore,

$$t_P^{\text{II}} = t_{\text{CIC}}^{\text{II}} + t_{\text{DC}}^{\text{II}} + t_{\text{FIR}}^{\text{II}} + t_{\text{Delay}}^{\text{II}} \quad (31)$$

After $t_P^{\text{II}}$, the delay memories which compose the Filter-and-Sum beamforming stage have already stored enough audio data to start locating the sound source. Because the beamforming operation relies on delaying the recovered audio signal, multiple orientations can be computed in parallel by accessing the content of the delay memories at a higher speed than the sampling of the input data. It basically multiplexes the output beamforming computations over time. The required frequency $F_P$ to parallelize all $N_o$ for this architecture is defined as follows:

$$F_P = \frac{F_S \cdot N_o}{D_F} \quad (32)$$

Due to (1), $F_P$ can also be expressed based on the target frequency range:

$$F_P \approx \text{BW} \cdot N_o \quad (33)$$

Notice that the required frequency to time multiplex the computation of the orientations does not depend on the number of microphones in the array. Figure 12 shows the clock domains when applying this strategy. While the front-end, consisting of the microphone array and the filter stage, operates at $F_S$, the output of the beamforming is processed at $F_P$. The additional cost in terms of resources is the extension of the register for the power-per-angle calculation: a memory of $N_o$ positions is required instead of the single register used to store the accumulated power values. This strategy allows fully parallelizing the computation of all the orientations. Thus, $t_{\text{P-SRP}}$ is mainly limited by $N_o$ and the maximum reachable frequency of the design, since $F_S$ is determined by the microphones' operational frequency and $D_F$ by the frequency range of the target sound source. In fact, $D_F$ determines how many orientations can be processed in parallel.
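As a sanity check of (32) and (33), a few lines with the representative values used later in the evaluation ($F_S$ = 2 MHz, $D_F$ = 64) and an assumed $N_o$ = 64:

```python
def multiplex_clock(fs_hz: float, n_o: int, d_f: int) -> float:
    """Required beamforming clock F_P = F_S * N_o / D_F, from (32)."""
    return fs_hz * n_o / d_f

# F_S = 2 MHz and D_F = 64, with N_o = 64 orientations:
# F_P = 2 MHz, which equals BW * N_o = 31.25 kHz * 64, as in (33).
print(multiplex_clock(2e6, 64, 64))  # -> 2000000.0
```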

5.3.3. Parallel Time Multiplexing. This proposed strategy is an extension of the previous one. The frequency $F_P$ is limited by the maximum attainable operating frequency of the implementation, which is determined by many factors, from the technology to the available resources on the FPGA. For instance, if $F_{\max}$ equals 30 kHz and the maximum attainable operating frequency is 100 MHz, then up to 1666 orientations could be computed in parallel. However, if not all the resources of the FPGA are completely consumed, especially the internal blocks of memory (BRAM), there is still space for improvement. With the time multiplexing strategy, the memories of the beamforming stage are fully accessed, since in each clock cycle there is at least one memory access, or even two memory accesses when new data is stored. Therefore, more memory resources can be used to further accelerate the computation of the P-SRP. The simple replication of the beamforming stage, preconfigured for different orientations, is enough to double the number of processed orientations while maintaining the same $t_{\text{P-SRP}}$. The strategy mainly consumes BRAMs. Nevertheless, due to the value of $\max(\Delta_m)$ at BW for our microphone array, only a few audio samples are needed to complete the beamforming. This fact drastically reduces the memory consumption, which provides the potential computation of thousands of orientations by applying both strategies.

All strategies can be applied independently, despite the fact that some will only work properly when combined. Not all strategy combinations are beneficial. For instance, a dynamic angular resolution should only be combined with the time multiplexing of the orientations when $F_P$ is higher than $F_S$. Otherwise, the reduction of $N_o$ by dynamically readjusting the target orientations does not provide any acceleration, and it would only degrade the response of the system.

6. Results

The proposed architecture is evaluated in this section. Our analysis starts by evaluating different design solutions based on the timing analysis introduced in Section 5.1. One representative configuration is evaluated based on the frequency response and accuracy, using the metrics described in Section 3.5. This evaluation also considers sensitive parameters such as the number of active subarrays and the relevance of $N_o$, already introduced in Section 5.2. The resource and power consumption for a Zynq 7020 target FPGA are also


Figure 13: Minimum values of $t_o$ (in ms) based on $F_S$ (sample rate, roughly 1.2–3.2 MHz) and $F_{\max}$ (10–16 kHz). Different perspectives are displayed in the bottom figures. Notice how the shortest $t_o$ is obtained when increasing $F_{\max}$ and $F_S$.

presented. Finally, the strategies presented in Section 5.3 are applied to the representative design.

6.1. General Performance Analysis. The proposed performance analysis from the previous section is here applied to a concrete example. The explored design parameters are $F_S$ and $F_{\max}$, keeping $N_s$ and $N_o$ both constant at 64. Whereas $F_S$ is determined by the microphone's sampling frequency, $F_{\max}$ is determined by the target application. For our design space exploration, we consider an $F_{\max}$ from 10 kHz to 16 kHz in steps of 125 Hz, and $F_S$ ranges from 1.25 MHz to 3.072 MHz, as specified in [10].

Equations (16) to (18) and (20) to (32) are used to obtain $t_{\text{P-SRP}}$. The performance analysis starts by obtaining $D_F$ for every possible value of $F_S$ and $F_{\max}$. All possible combinations of $D_{\text{CIC}}$ and $D_{\text{FIR}}$ are considered based on (15). The low-pass FIR filter parameters are $N_{\text{FIR}}$, which is determined by $D_{\text{CIC}}$, and $F_{\max}$ as the cut-off frequency. Each possible low-pass FIR filter is generated considering a transition band of 2 kHz and an attenuation of at least 60 dB at the stop band. If the minimum order of the filter is higher than $N_{\text{FIR}}$, the filter is discarded. We consider these parameters realistic constraints for low-pass FIR filters. Furthermore, a minimum order of 4 is defined as the threshold for $N_{\text{FIR}}$. Thus, some values are discarded because $D_F$ is a prime number or $N_{\text{FIR}}$ is below 4. Each low-pass FIR filter is generated and evaluated in Matlab 2016b.
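The factorization step of this exploration is easy to reproduce; the sketch below (ours) only enumerates the candidate ($D_{\text{CIC}}$, $D_{\text{FIR}}$) splits of $D_F$ allowed by (15), and leaves the FIR feasibility checks (minimum order, transition band, stopband attenuation) to the caller, since those were evaluated in Matlab in the original flow.

```python
import math

def decimation_splits(fs_hz: float, fmax_hz: float):
    """Candidate (D_CIC, D_FIR) factorizations of D_F, as in (15).
    Prime D_F values yield an empty list and are discarded."""
    d_f = math.ceil(fs_hz / (2.0 * fmax_hz))              # (1)
    return [(d, d_f // d) for d in range(2, d_f) if d_f % d == 0]

# Example: F_S = 2 MHz, F_max = 15.625 kHz -> D_F = 64
print(decimation_splits(2e6, 15.625e3))
# [(2, 32), (4, 16), (8, 8), (16, 4), (32, 2)]
```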

Figure 13 depicts the minimum timings of the DSE that the proposed Filter-and-Sum architecture needs to compute one orientation. t_o is slightly reduced when varying F_S; for instance, it is reduced from 5.03 ms to 3.97 ms when F_max = 10 kHz. A higher F_S means a faster sampling, which is in fact the operational frequency limiting factor. Furthermore, a higher decrement of t_P-SRP is produced when increasing F_S and F_max. Higher values of F_max allow higher values of D_CIC, which can greatly reduce the computational complexity of narrowband low-pass filtering. However, too high values of D_CIC lead to such low rates that, although a higher order low-pass FIR filter is supported, it cannot satisfy the low-pass filtering specifications. Notice how the number of possible solutions decreases while increasing F_max. Due to the F_S and F_max ranges, the values of D_F vary between 39 and 154, though, as previously explained, many values cannot be considered since they are either prime numbers or the decomposition in factors of D_CIC leads to values below 4. Because higher values of F_max lead to low values of D_CIC for low F_S, these D_CIC values cannot satisfy the specifications of the low-pass FIR filter.

Finally, relatively low values of t_P-SRP are obtained for F_max values from 10 kHz to 10.65 kHz and F_S ranging from 2.7 MHz to 3.072 MHz. This is produced by high values of D_CIC, which means that a higher order low-pass FIR filter is supported. As expected, high values of D_CIC lead to high order low-pass FIR filters and lower D_FIR. A lower t_P-SRP is possible thanks to avoiding unnecessary computations, since fewer samples are decimated after the low-pass FIR filter.

6.2. Analysis of a Design. As shown in Figure 13, several design considerations drastically affect the final performance.

Figure 14: Waterfall diagrams of the proposed architecture (sound-source frequency in kHz versus angle of arrival in degrees). The figures are obtained by enabling only a certain number of subarrays. From (a) to (d): only the 4 innermost microphones, only the 12 innermost microphones, the 28 innermost microphones, and all 52 microphones.

Table 3: Configuration of the architecture under analysis.

Parameter | Definition | Value
F_s | Sampling frequency | 2 MHz
F_min | Minimum frequency | 1 kHz
F_max | Maximum frequency | 15.625 kHz
BW | Minimum bandwidth to satisfy Nyquist | 31.25 kHz
D_F | Decimation factor | 64
D_CIC | CIC filter decimation factor | 16
N_CIC | Order of the CIC filter | 2
D_FIR | FIR filter decimation factor | 4
N_FIR | Order of the FIR filter | 16

However, most of these design decisions do not have a significant impact on the system response compared to other factors, such as the number of active microphones or the number of orientations. The analysis of the impact of these parameters on the system's response and performance is done over one particular design.

Table 3 summarizes the configuration of the architecture. The design considers F_s = 2 MHz, which is the clock for the microphones and the functional frequency of the design. This value of F_s is the intermediate value between the required clock signals of the ADMP521 microphones [10]. The selected cut-off frequency is F_max = 15.625 kHz, which leads to D_F = 64. In this example design N_CIC = 4, with a decimation factor of 16 and a differential delay of 32. The chosen FIR filter has a beta factor of 2.7 and a cut-off frequency of F_max at a sampling rate of 125 kHz, which is the sampling rate obtained after the CIC decimator filter with D_CIC = 16. The filtered signal is then further decimated by a factor D_FIR = 4 to obtain a BW = 31.25 kHz audio signal.
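The sample-rate chain of this configuration, together with one possible version of the compensation FIR, can be sketched as follows (the published design gives the order, the Kaiser beta, and the cut-off, but not the coefficients, so the filter below is only an assumed reconstruction).

```python
from scipy.signal import firwin

F_S   = 2_000_000      # microphone clock (Hz)
D_CIC = 16             # CIC decimation factor
D_FIR = 4              # FIR decimation factor
F_MAX = 15_625         # cut-off frequency (Hz)

f_cic   = F_S / D_CIC  # 125 kHz, rate seen by the compensation FIR
rate_out = f_cic / D_FIR  # 31.25 kHz output rate (BW in the paper's notation)

# 16th-order low-pass compensation FIR, Kaiser window with beta = 2.7
taps = firwin(16 + 1, F_MAX, window=("kaiser", 2.7), fs=f_cic)
print(f_cic, rate_out, len(taps))   # 125000.0 31250.0 17
```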

The architecture is designed to support a complete steering loop of up to 64 orientations, which represents an angular resolution of 5.625°. On the other hand, the subarray approach allows activating all 52 microphones if all 4 subarrays are active. The final results are obtained by assuming a speed of sound of ≈343.2 m/s.
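Under these assumptions, the per-orientation delay tables used by the beamforming stage can be precomputed as in the sketch below (hypothetical helper, with the 8.9 cm ring of Figure 2 as example geometry); the delays follow Δ_m = r_m · û(θ)/c from Section 3.3 and are quantized to samples of the 31.25 kHz filtered signal, which is why only a few samples per microphone need to be buffered.

```python
import numpy as np

C_SOUND = 343.2        # speed of sound (m/s) used in the evaluation
F_PCM   = 31_250       # sample rate after the filter stage (Hz)
N_O     = 64           # number of steered orientations

def delay_table(mic_xy: np.ndarray) -> np.ndarray:
    """mic_xy: (M, 2) microphone positions in metres -> (N_O, M) delays in samples."""
    thetas = 2 * np.pi * np.arange(N_O) / N_O
    u = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)   # unit focus vectors
    delays_s = mic_xy @ u.T / C_SOUND                        # seconds, shape (M, N_O)
    delays = np.round(delays_s * F_PCM).astype(int).T        # samples, shape (N_O, M)
    return delays - delays.min(axis=1, keepdims=True)        # make them non-negative

# Example: the 8-microphone subarray on a ring of 4.45 cm radius
angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
ring2 = 0.0445 * np.column_stack([np.cos(angles), np.sin(angles)])
print(delay_table(ring2).max())   # only a few samples at 31.25 kHz
```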

6.2.1. Frequency Response. The waterfall diagrams of Figure 14 show the power output of the combined subarrays in all directions for all frequencies. In our case the results are calculated with a single sound-source varying between 100 Hz and 15 kHz in steps of 100 Hz and placed at 180°. All results are normalized per frequency. Every waterfall shows a clear distinctive main lobe. When only subarray 1 is active, there are side lobes at 5.3 kHz and 10.6 kHz which impede the sound-source location for those frequencies. The frequency response of the subarrays improves when they are combined, since their frequency responses are superposed. The combination of subarrays 1 and 2 reaches a minimum

Figure 15: Directivities when considering a variable number of orientations and active microphones (D_P versus sound-source frequency in Hz, for the inner 4, 12, and 28 microphones, all 52 microphones, and the detection threshold). From (a) to (d): D_P with only 8 orientations up to 64 orientations on (d).

detectable frequency of 3.1 kHz; when combining subarrays 1, 2, and 3 and when combining all subarrays, the minimum detectable frequency reaches 2.1 kHz and 1.6 kHz, respectively. These minimum values are clearly depicted in Figure 15 with a threshold of 8 for D_P, which indicates that the main lobe's surface corresponds to maximally half of a quadrant. The frequency response of the combination of subarrays has a strong variation at the main lobe and therefore in D_P. Figure 15 depicts the evolution of D_P when increasing the angular resolution and when combining subarrays. The angular resolution determines the upper bound to which D_P converges, which is dependent on the number of orientations. The number of active microphones, on the other hand, influences how fast D_P converges to its upper limit. Consequently, the number of active microphones determines the minimum frequency which can be located when considering a threshold of 8 for D_P. Alongside the directivity, other metrics such as the main beamwidth and the MSL are also calculated to properly evaluate the quality of the array's response. Figure 16 depicts the MSL when varying the number of active subarrays and the number of orientations. A low angular resolution leads to a lower resolution of the waterfall diagrams, but only the metrics can show the impact. At frequencies between 1 and 3 kHz the main lobe converges to a unit circle, which can be explained by the lack of any side lobe. Higher frequencies present secondary lobes, especially when only the inner subarray is active, which increases the MSL values independently of the angular resolution. A low angular resolution leads to unexpectedly low values of MSL, since the secondary lobes are not detected. On the other hand, a higher number of active microphones leads to lower values of MSL, independently of the angular resolution.

Figure 17 depicts the BW−3dB metric for a similar analysis of the number of microphones and the angular resolution.

Figure 16: Measured MSL (maximum side lobe level in dB versus sound-source frequency in Hz, for the inner 4, 12, and 28 microphones and all 52 microphones) when considering a variable number of orientations and active microphones. From (a) to (d): the MSL with only 8 orientations up to 64 orientations on (d).

On one hand, a higher number of microphones produces a faster decrement of BW−3dB, reflected as a thinner main lobe. Nevertheless, the BW−3dB of each subarray converges to a minimum which is only reached at higher frequencies. The angular resolution determines this minimum, which ranges from 90° to 11.25° when 8 or 64 orientations are considered, respectively.

6.2.2. Resource Consumption and Power Analysis. Table 4 summarizes the resource consumption when combining subarrays. The consumed resources are divided into the resources for the filter stage, the beamforming stage, and the total consumption per group of subarrays. The filter stage mostly consumes DSPs, while the beamforming stage mainly demands BRAMs. Most of the resource consumption is dominated by the filter stage, since a filter chain is dedicated to each MEMS microphone. What determines the resource consumption is the number of active subarrays.

The flexibility of our architecture allows the creation of heterogeneous sound-source locators. Thus, the architecture can be scaled for small FPGAs based on the target sound-source profile or a particular desirable power consumption. For instance, the combination of the two inner subarrays would use 12 microphones while consuming less than 10% of the available resources. The LUTs are the limiting resource due to the internal registers of the filters. In fact, when all the subarrays are used, around 80% of the available LUTs are required. Nevertheless, any subarray can be disabled at runtime, which directly deactivates its associated filter and beamforming components. Although this does not affect the resource consumption, it has a direct impact on the power consumption. Table 5 shows the power consumption

Figure 17: Measured BW−3dB level (beamwidth at −3 dB in degrees versus sound-source frequency in Hz, for the inner 4, 12, and 28 microphones and all 52 microphones) when considering a variable number of orientations and active microphones. From (a) to (d): the BW−3dB with only 8 orientations up to 64 orientations on (d).

in mW based on the number of active subarrays. The power consumption of the microphones is also considered, since the FPGA and the microphone array are powered from the same source. Thus, the overall power consumption must be considered, since the architecture is designed for an embedded system. The MEMS microphones are powered with 3.3 volts, which represents a power consumption per microphone of 2.64 µW and 3.96 mW for the inactive and active microphones, respectively. Notice how the power consumption increases with the number of active subarrays. There is a turning point when 3 or 4 subarrays are active: the microphone array consumes more power than the FPGA when all the subarrays are active.
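The totals of Table 5 can be reproduced from these per-microphone figures and the reported on-chip power, as in the sketch below (hypothetical function name; the on-chip values are simply the static plus dynamic power reported by Vivado).

```python
# Reproduce the Table 5 totals from the per-microphone power figures quoted above.
P_MIC_ACTIVE   = 3.96e-3      # W per active microphone
P_MIC_INACTIVE = 2.64e-6      # W per microphone in sleep mode
TOTAL_MICS     = 52

def total_power_mw(active_mics: int, fpga_on_chip_mw: float) -> float:
    mics_w = (active_mics * P_MIC_ACTIVE
              + (TOTAL_MICS - active_mics) * P_MIC_INACTIVE)
    return mics_w * 1e3 + fpga_on_chip_mw

# On-chip power (static + dynamic) reported for each configuration
for mics, fpga_mw in [(4, 122), (12, 125), (28, 132), (52, 138)]:
    print(mics, round(total_power_mw(mics, fpga_mw), 2))  # ~137.97, 172.63, 242.94, 343.92 mW
```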

6.2.3. Timing Analysis. The timing analysis based on Section 5 for the design under evaluation is summarized in Table 6. A complete steering loop requires around 169.6 ms, while t_o rounds to 2.65 ms. Notice that the initialization (t_II) consumes around 21.5% of the execution time. Fortunately, this initialization can almost be completely removed when applying the first strategy described in Section 5.3.1.
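The sketch below restates this timing budget with the Table 6 values (in microseconds); t_o is dominated by t_II + t_s, and the complete steering loop is simply N_o · t_o.

```python
t_II = 571.0     # sum of all initiation intervals (us), Table 6
t_s  = 2048.0    # sensing time (us): N_s = 64 samples at 31.25 kHz
N_o  = 64        # orientations per steering loop
t_o  = 2650.0    # execution time of one orientation (us), dominated by t_II + t_s

print(t_II + t_s)          # 2619.0 us, the bulk of t_o
print(N_o * t_o / 1e3)     # 169.6 ms, the t_P-SRP of Table 6
print(100 * t_II / t_o)    # ~21.5 % of t_o spent on initialization
```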

Table 7 summarizes the timing results when applying the first strategies proposed in Section 5. The elimination of the initialization after each orientation's transition slightly reduces t_P-SRP. In this case, t_P-SRP is expressed as follows:

\[ t_{\text{P-SRP}} = t_{II} + N_o \cdot t_s. \quad (34) \]

The main improvement is obtained after time multiplexing the computation of the power per orientation. In this case F_P, the operational frequency of the beamforming computation needed to process all N_o in parallel, equals F_S, as expressed in (32). This is possible because D_F and N_o have the same value. Therefore, there is no need to have a different clock for the beamforming operation, since the spacing between


Table 4: Resource consumption after placement and routing when combining microphone subarrays. Each subarray combination details the resource consumption of the filter and the beamforming stage.

Resources | Available | Inner 4 MICs (Filters / Beamforming / Total) | Inner 12 MICs (Filters / Beamforming / Total) | Inner 28 MICs (Filters / Beamforming / Total) | All 52 MICs (Filters / Beamforming / Total)
Slice registers | 106400 | 5043 / 626 / 6144 | 14859 / 1540 / 16882 | 34489 / 3195 / 38183 | 54042 / 4447 / 59093
Slice LUTs | 53200 | 3612 / 344 / 4732 | 10759 / 754 / 12299 | 25032 / 1486 / 27318 | 37221 / 2221 / 42319
LUT-FF pairs | 86689 | 2329 / 199 / 2773 | 7013 / 512 / 7779 | 16353 / 1069 / 17698 | 23656 / 1664 / 27619
BRAM | 140 | 0 / 2 / 2 | 0 / 6 / 6 | 0 / 14 / 14 | 0 / 22 / 22
DSP48 | 220 | 8 / 4 / 12 | 24 / 4 / 28 | 56 / 4 / 60 | 88 / 4 / 92

Table 5: Power consumption (in mW) at F_s = 2 MHz when combining microphone subarrays. Values obtained from the Vivado 2016.4 power report.

Subarrays | Active MEMS microphones (Active / Inactive / Total) | Reported on-chip power (Static / Dynamic / Total) | Total power
Inner 4 MICs | 15.84 / 0.13 / 15.97 | 120 / 2 / 122 | 137.97
Inner 12 MICs | 47.52 / 0.11 / 47.63 | 120 / 5 / 125 | 172.63
Inner 28 MICs | 110.88 / 0.06 / 110.94 | 121 / 11 / 132 | 242.94
All 52 MICs | 205.92 / 0 / 205.92 | 122 / 16 / 138 | 343.92

Table 6: Timing analysis without any optimization of the design under evaluation. The values are expressed in µs.

Parameter | Definition | Value [µs]
t_II^CIC | Initiation interval of the CIC filter | 4.5
t_II^DC | Initiation interval of the removed DC block | 9
t_II^FIR | Initiation interval of the FIR filter | 72
t_II^Delay | Initiation interval of the delay memories | 480
t_II^Sum | Initiation interval of the cascaded sums | 3.5
t_II^Power | Initiation interval of the power calculation | 1
t_II^filters | Initiation interval of the filter stage | 85.5
t_II^beamforming | Initiation interval of the beamforming stage | 484.5
t_II^power | Initiation interval of the power stage | 1
t_II | Sum of all initiation intervals | 571
t_s | Sensing time | 2048
t_o | Execution time of one orientation | 2650
t_P-SRP | Time required to obtain a polar power map | 169600

output filtered values from the filter stage is large enough. By combining the first two strategies, t_P-SRP rounds to 2 ms, and only the first steering loop needs 2.6 ms due to t_II^P. In this case, t_P-SRP is expressed as follows:

\[ t_{\text{P-SRP}} = t_{II}^{P} + t_s \approx t_s. \quad (35) \]
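The four timing figures of Table 7 follow directly from these expressions and the Table 6 values, as in the sketch below (hypothetical variable names; all times in microseconds).

```python
t_II, t_s, t_o, N_o = 571.0, 2048.0, 2650.0, 64

initial       = N_o * t_o          # eq. (16): re-initialize for every orientation
continuous    = t_II + N_o * t_s   # eq. (34): initialize only once per loop
time_mux      = t_II + t_s         # all orientations share one sensing window
cont_time_mux = t_s                # eq. (35): steady state after the first loop

for name, us in [("initial", initial), ("continuous", continuous),
                 ("time multiplexing", time_mux),
                 ("continuous time multiplexing", cont_time_mux)]:
    print(f"{name}: {us / 1e3:.1f} ms")   # ~169.6, 131.6, 2.6, 2.0 ms
```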

The other two strategies proposed in Section 5.3.1 are designed to fully exploit the FPGA resources and to overcome time constraints when considering a high angular resolution. In the first case, since the design under evaluation has a small angular resolution (N_o = 64), there is no need for a higher F_P when applying the time multiplexing strategy. However, a higher angular resolution can be obtained when considering the unconsumed resources, without additional timing cost. Table 8 shows how the combination of strategies increases the angular resolution without additional time penalty. The operational frequency (F_op) determines at what speed the FPGA can operate. Following (33), the beamforming operation can be exploited by increasing F_P up to the maximum frequency, which increases N_o as well:

\[ \max(N_o) = \frac{\max(F_{op})}{\text{BW}} = \frac{F_P}{\text{BW}}. \quad (36) \]

Many thousands of orientations can be computed in parallel when combining all strategies. The beamforming stage can be replicated as many times as the remaining available resources allow. Of course, this estimation is certainly optimistic, since the frequency drops when the resource consumption increases. Nevertheless, this provides an upper bound for N_o. For instance, when only the inner subarray is considered, the DSPs are the limiting component; however, up to 53 beamforming stages could theoretically be placed in parallel. When more subarrays are active, the BRAMs are the constraining component. Notice how the number of supported orientations increases if the number of subarrays decreases. It has, however, an impact on the frequency response and the accuracy of the system, as shown in Section 6.2.1. Nevertheless, tens of thousands of orientations can be computed in parallel, consuming only around 2 ms, by operating at the highest F_op and by replicating the beamforming stage to exploit all the available resources.
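A minimal sketch of this upper bound (assumed rounding, with BW = 31.25 kHz as in Table 3) reproduces the order of magnitude of the Table 8 entries for the all-microphones configuration.

```python
BW = 31_250.0                                   # Hz, audio bandwidth of the design

def max_orientations(f_op_hz: float, replicas: int = 1) -> int:
    return replicas * int(f_op_hz // BW)        # eq. (36), scaled by the replicas

print(max_orientations(87.91e6))                # ~2813 with a single stage
print(max_orientations(87.91e6, replicas=6))    # ~16878 with six replicated stages
```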


Table 7: Timing analysis of the optimized designs when applying and combining the first two strategies. The values are expressed in ms.

 | Initial | Continuous | Time multiplexing | Continuous time multiplexing
t_P-SRP | 169.6 ms | 131.6 ms | 2.6 ms | 2 ms

Table 8: Maximum N_o when combining strategies. The maximum number of beamformers is obtained based on the available resources and the resource consumption of each beamformer (Table 4). The maximum F_op is reported by the Vivado 2016.4 tool after placement and routing.

 | Continuous time multiplexing (Inner 4 / Inner 12 / Inner 28 / All 52 MICs) | Parallel continuous time multiplexing (Inner 4 / Inner 12 / Inner 28 / All 52 MICs)
Max beamformers | — / — / — / — | 55 / 23 / 10 / 6
Max F_op | 95.62 MHz / 93.27 MHz / 91.97 MHz / 87.91 MHz | 95.62 MHz / 93.27 MHz / 91.97 MHz / 87.91 MHz
Max N_o | 3059 / 2984 / 2943 / 2813 | 168292 / 68650 / 29430 / 16879

7. Conclusions

In this paper we have presented a scalable and flexible architecture for fast sound-source localization. On one hand, the architecture can flexibly disable sections of the microphone array that are not needed, or disable them to respect power restrictions. The modular approach of the architecture allows scaling the system to a larger or smaller number of microphones. Nevertheless, such capabilities do not impact the frequency response and accuracy of our sound-source locator. On the other hand, several strategies to offer real-time sound-source localization have been presented and evaluated. These strategies not only accelerate the localization but also provide solutions for time stringent applications with a high angular resolution demand. Thousands of angles can be monitored in parallel, offering a high-resolution sound-source localization in a couple of milliseconds.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014–2020 through the ERDF-2020 Project ICITY-RDI.BRU.

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106–5109, IEEE, Dallas, Tex, USA, March 2010.

[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th SENSORS '13 Conference, IEEE, Baltimore, MD, USA, November 2013.

[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.

[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340–345, IEEE, Qinhuangdao, China, September 2015.

[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869–885, 2011.

[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, Hat-type hearing system using MEMS microphone array, 2016.

[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729–2733, IEEE, Brisbane, Australia, April 2015.

[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.

[9] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.

[10] Analog Devices, "ADMP521 datasheet: Ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.

[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.

[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.

[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Brüel & Kjær, Denmark, 2004.

[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [PhD thesis], Brown University, Providence, RI, USA, 2000.

[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157–180, Springer, Berlin, Germany, 2001.

[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465–468, IEEE, Hoboken, NJ, USA, June 2012.

[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS), Stanford University, Stanford, Calif, USA, 2010.

[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155–162, 1981.

[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf748fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.

[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.

[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.

[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14–27, 2005.

[23] T. E. Bogale, L. Vandendorpe, and L. L. Bao, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435–441, IEEE, Oulu, Finland, June 2014.


and different strategies to increase performance are presented in Section 5. In Section 6 the proposed architecture is analysed. Finally, the conclusions are drawn in Section 7.

2. Related Work

The use of microphone arrays for sound-source localizationis a well-researched problem where complexity increaseswith the number of microphones involved and the requiredresponse time of the applicationThe response time is indeedcrucial for applications such as a counter-sniper systems [45] Suchmilitary systems are composed ofmicrophone arraysmounted on top of a soldiers helmet and connected to anFPGA for signal processing A similar approach is appliedin [6] where the authors present a hat-type hearing systemcomposed of 48 digital MEMs microphone array with anFPGA as the computational component Their main target isa hearing aid system which emphasizes up to 10 dB the soundcoming from a certain direction Such type of applicationsdemands a fast response of the system while being powerefficient

Indoor applications such as videoconferencing homesurveillance and patient care make also use of microphonearrays for speech detection [1 7] This paper describesthe design and implementation on an FPGA of an eight-element digital MEMS microphone array for distant speechrecognition In [8] the authors propose a beamforming-based acoustic system for localization of the dominant noisesource The signal acquisition consists of a microphone arraycomposed of up to 33MEMSmicrophones whereas the PDMdemodulation and the beamforming are implemented in anFPGA The implementation in the FPGA is completed withthe delay-and-sum beamforming measuring 60 angles andgenerating a polar map for directivity pattern presentationAnother example is proposed in [9] in which the sound-source localization is obtained by using distributed micro-phone arrays in aWSNThedistributed information collectedby the nodes is transferred and processed using data-fusiontechniques in order to locate and profile the sound sourcesDespite the fact that they implement most of the processingcomponents on an FPGA the 64k-FFT component becomestoo large and resource hungry such that it is not suitable forlow and middle-end FPGAs In both publications howevertheir solutions are not scalable and not adaptable to dynamicacoustic environments Furthermore they do not provideinformation about how fast their systems can be Insteadwe present a detailed description and analysis of a flexiblescalable and real-time architecture

3. Sound-Source Localization

Our microphone array is designed to spatially sample its surrounding sound field in order to detect and to locate certain types of sound sources. A 360° sound power scan is performed for a configurable number of orientations. A beamforming technique focuses the array in one specific direction, or orientation, by amplifying all sounds coming from that direction and by suppressing sounds coming from other directions. A polar power plot is obtained, from which the lobes can be used to estimate the nearby sound sources. Figure 1 shows the functional elements required to locate the sound-source, which involve several filters, a beamformer, and a relative sound power estimator.

3.1. Microphone Array Description. The sensor array is composed of 52 digital MEMS microphones and designed for far-field and nondiffuse sound fields [9]. The array pattern consists of four concentric subarrays of 4, 8, 16, and 24 MEMS microphones mounted on a 20 cm circular printed board (Figure 2). Each subarray is differently positioned in order to facilitate the capture of spatial acoustic information using a beamforming technique. Furthermore, the sensor array response is dynamically modified by individually activating or deactivating subarrays. This distributed geometry allows adapting the sensor to different sound sources. For instance, not all the subarrays need to be active to detect a particular sound-source. The computational requirements drastically decrease and the sensor array becomes more power efficient if only a few subarrays are active.

3.2. Filters. The selected digital MEMS microphones are the ADMP521 MEMS microphones designed by Analog Devices, which offer an omnidirectional polar response and a wide-band frequency response ranging from 100 Hz up to 16 kHz [10]. These digital MEMS microphones have a multiplexed pulse density modulation (PDM) output. The PDM signals are generated by using an analogue-to-digital converter (ADC) based on a sigma delta converter. The sigma delta conversion technique uses an embedded integrator-comparator circuit to sample the analogue signal and outputs a 1-bit signal [11]. The ADMP521 MEMS microphones use a fourth-order sigma delta converter, which reduces the added noise in the audio frequency spectrum by shifting it to higher frequency ranges. This undesirable high-frequency noise needs to be removed. The ADMP521 MEMS microphones require a clock input of around 1 to 3 MHz as sampling frequency (F_S). This range of F_S is chosen to oversample the audio signal in order to have sufficient audio quality and to generate the PDM output signal. Therefore, the PDM signal needs not only to be filtered to remove the noise but also to be downsampled to convert the audio signal to a Pulse-Code Modulation (PCM) format. The target audible frequency range, from F_min to F_max, determines the decimation factor (D_F) to properly downsample the PDM signal while satisfying the Nyquist theorem:

\[ D_F = \left\lceil \frac{F_S}{2 \cdot F_{\max}} \right\rceil. \quad (1) \]

The usual range of D_F is from a few tens up to hundreds when targeting audible frequency ranges. For instance, D_F needs to be 83 to recover an audio signal oversampled at 2.49 MHz for a target F_max of 15 kHz.

3.3. Filter-and-Sum Beamforming. The beamforming technique applied in our proposed architecture is based on Filter-and-Sum beamforming [12]. The original Filter-and-Sum beamforming applies an independent weight to


Figure 1: Operations needed for the proposed architecture to locate a sound-source (filters converting PDM to PCM, beamforming producing sums of PCM signals, and the power value per angle forming the power polar map).

Figure 2: Sound-source localization device composed of 4 MEMS microphone subarrays: ring 1 (Ø = 4.5 cm) with 4 microphones, ring 2 (Ø = 8.9 cm) with 8 microphones, ring 3 (Ø = 13.5 cm) with 16 microphones, and ring 4 (Ø = 18 cm) with 24 microphones.

each microphone output before summing them. The overall effect is an amplification of the signal coming from a target orientation while suppressing signals from other orientations. A variant version of the Filter-and-Sum recovers the audio signal from the PDM signal, applies the same low-pass FIR filter, and delays the filter output signal of each microphone by a specific amount of time (Δ) before adding all the output signals together (Figure 3). The time delay (Δ_m) for a microphone m is determined by the focus direction θ, the position vector r_m of microphone m, and the speed of sound (c):

\[ \Delta_m = \frac{\vec{r}_m \cdot \hat{u}}{c}, \quad (2) \]

where the unitary vector û defines the direction of a far-field propagating signal with a focus direction θ. The total output O(θ, t) of the array can be expressed based on the signal output of each microphone in the time domain, s_m(t), and the number of microphones in the array (M):

\[ O(\theta, t) = \sum_{m=1}^{M} s_m\bigl(t - \Delta_m(\theta)\bigr). \quad (3) \]

The response of the Filter-and-Sum beamforming, however, is usually represented in the frequency domain due to its dependence on the signal frequency. Let S_m(ω) be the output signal of each microphone at angular speed ω = 2πf for frequency f, and M the number of microphones in the array. The total output O(θ, ω) is defined as in [13]:

\[ O(\theta, \omega) = \sum_{m=1}^{M} S_m(\omega)\, e^{-j\omega\Delta_m(\theta)}, \quad (4) \]

which can be simplified by assuming a monochromatic acoustic wave as

\[ O(\theta, \omega) = S_o(\omega) \sum_{m=1}^{M} e^{j r_m \omega_n (\theta_0 - \theta)} = S_o(\omega)\, W(w_n, \theta_0, \theta), \quad (5) \]

where S_o(ω) is the output signal of the monochromatic wave, w_n is the incoming monochromatic angular speed, θ_0 is its direction, and θ is the array focus. W(w_n, θ_0, θ) is known as the array pattern, which determines the amplification or gain of the array output. For instance, when θ_0 = θ, which occurs when the array is focusing in the direction of the incoming monochromatic wave, the gain reaches its maximum M, equal to the number of microphones.

3.4. Polar Steered Response Power. The direction of the sound-source is located by measuring the relative sound power per horizontal direction, which is done by a 360° sweep overview of the surrounding sound field.

Figure 3: The proposed Filter-and-Sum beamforming filters and delays the output of each microphone before adding them together. (a) The acoustic wave received at each microphone is measured and filtered. The beamforming technique considers that the time Δ_m that the input signal takes to travel from the microphone m to the origin is proportional to the projection of the microphone vector r_m on û. (b) This Δ_m is determined by the position of the microphone in the array and the desired focus direction θ of the array. Consequently, the signals coming from the same direction are amplified after the addition of the delayed inputs. Source: [9].

The directional power output of a microphone array, defined here as the polar steered response power (P-SRP), corresponds to the array's directional response to sound sources present in a sound field (Figure 4). The P-SRP is obtained by considering multiple broadband sources coming from different directions, for instance, human speech.

The output power when the microphone array is exposed to a broadband sound-source S(w) with an angle of incidence θ_0 can be modelled as

\[ O(\theta, S) = A_1 W(w_{n1}, \theta_0, \theta) + A_2 W(w_{n2}, \theta_0, \theta) + \cdots + A_n W(w_{nn}, \theta_0, \theta), \quad (6) \]

where A_i, with i ∈ {1, ..., n}, is the amplitude of one of the n frequency components of S(w). The equation can be generalized to consider a sound field φ composed of multiple broadband sound sources at different locations and with uncorrelated noise:

\[ O(\theta, \phi) = O(\theta, S_1) + O(\theta, S_2) + \cdots + O(\theta, S_n) + \text{Noise}_{\text{uncorrelated}}. \quad (7) \]

The array's power output can be expressed as

\[ P(\theta, \phi) = \left| O(\theta, \phi) \right|^2, \quad (8) \]

since the power of a signal is the square of the array's output. Finally, the normalized power output is defined as the P-SRP:

\[ \text{P-SRP}(\theta, \phi) = \frac{P(\theta, \phi)}{\max_{\theta \in [0, 2\pi]} P(\theta, \phi)}. \quad (9) \]

The comparison of P(θ, φ) for different values of θ determines in which direction the sound-source is located, since the maximum power is obtained when the focus corresponds to the location of a sound-source.

The calculation of the P-SRP is usually defined in the frequency domain [14, 15], which requires the computation of a Fourier transform. Instead, we propose applying Parseval's theorem, which states that the sum of the squares of a function is equal to the sum of the squares of its transform. This theorem drastically simplifies the calculations, since the P-SRP can be computed in the time domain. Let us define the sensing time (t_s) as the time the array is registering the previously defined sound field φ for each orientation. The power P(θ, t_s) can then be expressed as follows:

\[ P(\theta, t_s) = \frac{1}{t_s} \sum_{t=1}^{t_s} \left| O(\theta, t, \phi) \right|^2. \quad (10) \]

Consequently, the P-SRP can be expressed in the time domain by

\[ \text{P-SRP}(\theta, t_s) = \frac{P(\theta, t_s)}{\max_{\theta \in [0, 2\pi]} P(\theta, t_s)}. \quad (11) \]
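A minimal sketch of this time-domain computation (synthetic input and hypothetical helper names; real inputs would be the filtered and decimated microphone streams) combines the delay-and-sum of (3) with the power estimate of (10) and the normalization of (11).

```python
import numpy as np

def delay_and_sum(signals: np.ndarray, delays: np.ndarray) -> np.ndarray:
    """signals: (M, T) filtered microphone samples; delays: (M,) integer sample delays."""
    m, t = signals.shape
    out = np.zeros(t)
    for i in range(m):
        out[delays[i]:] += signals[i, :t - delays[i]]   # eq. (3), integer-sample delays
    return out

def p_srp(signals: np.ndarray, delay_table: np.ndarray) -> np.ndarray:
    """delay_table: (N_o, M) precomputed delays -> normalized power per orientation."""
    power = np.array([np.mean(delay_and_sum(signals, d) ** 2)   # eq. (10), time domain
                      for d in delay_table])
    return power / power.max()                                  # eq. (11)

# Toy usage: 8 microphones, 2048 samples, a random precomputed delay table
rng = np.random.default_rng(0)
sig = rng.standard_normal((8, 2048))
table = rng.integers(0, 8, size=(64, 8))
print(np.argmax(p_srp(sig, table)))   # index of the estimated source orientation
```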

Figure 4: Examples of a polar map obtained under experimental conditions for sound sources of 5 kHz (a) and 8 kHz (b).

Figure 5: Examples of polar maps with different angular resolutions (steering 4, 8, 16, 32, and 64 orientations) locating a sound-source of 8 kHz. A low number of orientations clearly leads to a wrong sound-source location.

3.5. Sensor Array Evaluation. The defined P-SRP allows estimating the direction of arrival of multiple sound sources under different sound field conditions. Nevertheless, the precision and accuracy of its estimation can be determined by different quality metrics.

The Filter-and-Sum beamforming is applied to a discrete number of orientations or angles. The angular resolution of the microphone array is determined by the number of measurements per 360° sweep. A higher number of measurements increases the resolution of the P-SRP, displayed as a polar power map (Figure 5), and decreases the location error of the sound-source. The lobes of this polar power map can then be used to estimate the bearing of nearby sound sources in nondiffuse sound field conditions. In fact, the characteristics of the main lobe when considering a single sound-source scenario determine the directivity of the microphone array. The definition of array directivity D_P is proposed in [16] for broadband signals. The authors propose the use of D_P as a metric of the quality of the array, since D_P depends on the main lobe shape and its capacity to unambiguously point to a specific bearing. The definition of array directivity presented in [16] is adapted for 2D polar coordinates in [9] as follows:

\[ D_P(\theta, \omega) = \frac{\pi\, P(\theta, \omega)^2}{\frac{1}{2} \int_0^{2\pi} P(\theta, \omega)^2 \, d\theta}, \quad (12) \]

where P(θ, ω) is the output power of the array when pointing to the direction θ, and (1/2)∫₀^{2π} P(θ, ω)² dθ is the sum of the squared output power in all other directions. It can be expressed as the ratio between the area of a circle whose radius is the maximum power of the array and the total area of the power output. Consequently, D_P defines the quality of the microphone array and can be used to specify a certain threshold for the microphone array. For instance, if D_P equals 8, the main lobe is eight times slimmer than the unit circle and offers a confident estimation of a sound-source within half a quadrant.

Whereas D_P is usually considered for broadband sound sources, other metrics are necessary to profile the array's response for different types of sound sources. Figure 6 depicts the maximum side lobe (MSL) and the half-power beamwidth, which are two complementary metrics used to characterize the response of arrays for narrowband sound sources. The half-power beamwidth is the angular extent by which the power response has fallen to half of the maximum level of the main lobe. Since the half-power level coincides with a 3 dB drop in power level, it is often called the 3 dB beamwidth (BW−3dB). This metric determines the angular ratio between the power signal level which is at least 50% of the peak power level and the remaining circle. By contrast, MSL is another important parameter used to represent the impact of the side lobes when characterizing arrays. MSL is the normalized ratio

Figure 6: Definitions of the maximum side lobe (MSL) and the 3 dB beamwidth (BW−3dB) on a power response (amplitude versus angle of arrival) with a main lobe.

Figure 7: Main stages of the proposed architecture: the PDM splitter, the filter stage, the beamforming stage fed by the microphone subarrays, the power stage, and the control unit on the FPGA.

between the highest side lobe and the power level of the main lobe, expressed in dB. Both metrics, the MSL and BW−3dB, are desired to be as low as possible, whereas D_P should be as high as possible to guarantee a precise sound-source location.
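The three metrics can be computed directly from a sampled polar response, as in the sketch below (assumed uniform angular sampling and a crude main-lobe exclusion for the MSL; the toy response is synthetic and only illustrates the calculations).

```python
import numpy as np

def directivity(p: np.ndarray) -> float:
    """Eq. (12) with the integral replaced by a Riemann sum over the samples."""
    dtheta = 2 * np.pi / p.size
    return np.pi * p.max() ** 2 / (0.5 * np.sum(p ** 2) * dtheta)

def max_side_lobe_db(p: np.ndarray) -> float:
    """Ratio between the highest sample outside the main lobe and the peak (dB)."""
    peak = int(np.argmax(p))
    mask = np.ones(p.size, dtype=bool)
    mask[[peak - 1, peak, (peak + 1) % p.size]] = False   # crude main-lobe exclusion
    return 10 * np.log10(p[mask].max() / p[peak])

def beamwidth_3db_deg(p: np.ndarray) -> float:
    """Angular extent where the power stays above half of the peak."""
    return 360.0 * np.count_nonzero(p >= 0.5 * p.max()) / p.size

# Toy polar power response with a main lobe at 180 degrees (64 orientations)
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
p = 0.05 + np.exp(-((theta - np.pi) ** 2) / 0.05)
print(directivity(p), max_side_lobe_db(p), beamwidth_3db_deg(p))
```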

4. A Filter-and-Sum Based Architecture

The proposed architecture uses a Filter-and-Sum based beamforming technique to locate a sound-source with an array of digital MEMS microphones. Many applications, however, demand a certain scalability and flexibility when locating the sound-source. With such requirements in mind, the proposed architecture has some additional features to support a dynamic response, targeting applications with real-time demands. The proposed architecture is also designed to be battery power efficient and to operate in streaming fashion to achieve the fastest possible response.

One of the features of the ADMP521 microphone is its low-power sleep mode capability. When no clock signal is provided, the ADMP521 microphone enters a low-power sleep mode (<1 µA), which makes this sound-source localizer suitable for battery powered implementations. The PCB of the MEMS microphone array is designed to exploit this capability. Figure 2 depicts the subarray distribution of the MEMS microphones. Using the clock signal, it is possible to activate or deactivate subarrays, since each subarray is fed with an individual clock signal. This flexibility allows disabling not only subarrays of microphones but also the associated computational components, decreasing the computational

Table 1: Relevant parameters involved in the proposed architecture.

Parameter | Definition
F_s | Sampling frequency
F_min | Minimum frequency of the target sound source
F_max | Maximum frequency of the target sound source
BW | Minimum bandwidth to satisfy Nyquist
D_F | Decimation factor
D_CIC | CIC filter decimation factor
N_CIC | Order of the CIC filter
D_FIR | FIR filter decimation factor
N_FIR | Order of the FIR filter

demand and the power consumption. The proposed architecture is properly designed to support such flexibility.

The array computes its response as fast as possible to reach real-time sound-source location. The proposed architecture is designed to process in streaming fashion and is mainly composed of three cascaded stages operating in pipeline (Figure 7). The first stage is the filter chain, which is composed of the minimum number of components required to recover the audio signal in the target frequency range. The second stage computes the Filter-and-Sum beamforming operation. The final stage obtains P(θ, t) for the focused orientation. A polar power map is obtained once a complete steering loop is completed. The different stages are discussed in more detail in the following subsections. Table 1 summarizes the most relevant parameters of the proposed architecture.

Figure 8: The filter stage consists of a pair of cascaded filters with a downsampling factor per filter chain: the PDM splitter feeds each chain, which contains an N_CIC-th-order CIC decimator filter (decimation D_CIC), a DC-removal block, and an N_FIR-th-order low-pass FIR filter (decimation D_FIR) producing the filtered PCM output.

4.1. Filter Stage. The filter stage contains a PDM demultiplexer and as many filter chain blocks as MEMS microphones (Figure 8). Each microphone of the array is associated with a filter chain composed of a couple of cascaded filters. The full-capacity design supports up to 52 filter chain blocks working in parallel, but their number is defined by the number of active microphones. The unnecessary filter chain blocks are disabled at runtime.

The microphones' clock F_S determines the input rate and therefore how fast the filter stage should operate. The low operating frequency for current FPGAs allows interesting power savings [17].

Every pair of microphones has its PDM output signal multiplexed in time. Thus, at every edge of the clock cycle the output is the sampled data from one of the microphones. The PDM demultiplexing is the first operation to obtain the individual sampled data from each microphone. This task is done in the PDM splitter block.

The next component consists of a cascade of filters to filter and to downsample each microphone signal. Traditional digital filters such as the Finite Impulse Response (FIR) type of filters are a good solution to reduce the signal bandwidth and to remove the higher frequency noise. Once the signal is filtered, it can be decimated to decrease the oversampling to a reasonable audio quality rate (e.g., 48 kHz). However, this filter consumes many adders and dedicated multipliers (DSPs) from the FPGA resources, particularly if its order increases.

The Cascaded Integrator-Comb (CIC) filter is an alternative low-pass filtering technique, which has been developed in [18, 19] and involves only additions and subtractions. This type of filter consists of 3 stages: the integrating stage, the decimator or integrator stage, and the comb section. PDM samples are recursively added in the integrating stage, while being recursively subtracted with a differential delay in the comb stage. The number of recursive operations in both the integrating and comb sections determines the order of the filter (N_CIC) and should at least be equal to the order of the sigma delta converter of the ADC of the microphones. After the CIC filter the signal growth (G) is proportional to the decimation factor (D_CIC) and the differential delay (DD), and is exponential in the filter order [19]:

\[ G = (D_{CIC} \cdot DD)^{N_{CIC}}. \quad (13) \]

The output bit width grows proportionally to G. Denote by B_in the number of input bits; then the number of output bits B_out is as follows:

\[ B_{out} = \left\lceil N_{CIC} \cdot \log_2(D_{CIC} \cdot DD) + B_{in} \right\rceil. \quad (14) \]
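The sketch below evaluates (13) and (14) for a 1-bit PDM input (the differential delay values are only examples, not the ones fixed by the design).

```python
import math

def cic_growth_and_bits(n_cic: int, d_cic: int, dd: int, b_in: int = 1):
    gain = (d_cic * dd) ** n_cic                              # eq. (13)
    b_out = math.ceil(n_cic * math.log2(d_cic * dd) + b_in)   # eq. (14)
    return gain, b_out

print(cic_growth_and_bits(4, 16, 1))   # (65536, 17)  -> 17 output bits
print(cic_growth_and_bits(4, 16, 2))   # (1048576, 21) -> 21 output bits
```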

The proposed CIC decimation filter eliminates higher frequency noise components and decimates the signal by D_CIC at the same time. However, a major disadvantage of this filter is its nonflat frequency response in the desired audio frequency range. In order to improve the flatness of the frequency response, a CIC filter with a lower decimation factor followed by a compensation FIR filter is often chosen, as in [20–22].

The CIC filter is followed by an averager, which is used to cancel out the effects caused by the microphones' DC offset output, which would otherwise lead to a constant offset in the beamforming values. This block improves the dynamic range, reducing the bit width required to represent the data after the CIC.

The last component of each filter chain is a low-pass compensation FIR filter based on a Kaiser window. This filter equalises the passband drop usually introduced by CIC filters [19]. It additionally performs a low rate change. The proposed filter also needs a cut-off frequency of F_max at a sampling rate of F_s/D_CIC, which is the sampling rate obtained after the CIC decimator filter with a decimation factor of D_CIC. This low-pass FIR filter is designed in a serial fashion to reduce the resource consumption. In fact, the FIR filter order

Figure 9: Details of the internal structure of the proposed modular Filter-and-Sum beamforming: the filtered microphone signals (filtered MIC1 to MIC52) enter per-microphone delay memories grouped per subarray, the delayed signals are summed locally per subarray and then across subarrays, and the precomputed delays per orientation are read from a table to produce the power value per angle. Note that the delay values are stored in a precomputed table.

is also determined by D_CIC. Thanks to the streaming nature of the architecture, the CIC filter is able to generate an output value every clock cycle. Due to the decimation factor, only one output value per D_CIC input values is propagated to the low-pass FIR filter. Therefore, the FIR filter has D_CIC clock cycles to compute each input value, which determines its maximum order. The filtered signal is then further decimated by a factor of D_FIR to obtain a minimum bandwidth BW = 2 · F_max of the audio signals, to satisfy the Nyquist theorem. The overall D_F can be expressed based on the rate change of each filter:

\[ D_F = D_{CIC} \cdot D_{FIR}. \quad (15) \]

4.2. Beamforming Stage. As detailed before, the main purpose of the beamforming operation is to focus the MEMS microphone array in one particular direction. The detection of sound sources is possible by continuously steering in loops of 360°. The number of orientations N_o determines the angular resolution. Higher angular resolutions demand not only a larger execution time per steering loop but also more FPGA memory resources to store the precomputed delays per orientation.

The beamforming stage depends on the number of microphones and subarrays. Although Filter-and-Sum beamforming assumes a fixed number of microphones and a fixed geometry, our scalable solution satisfies those restrictions while offering a flexible geometry. Figure 9 shows our proposed Filter-and-Sum based beamformer. This stage is basically composed of FPGA blocks of memory (BRAM) in ring-buffer fashion that properly delay the filtered microphone signals. The values of the delays at a given moment depend on the focus orientation at that moment and are determined by the array pattern W(w_n, θ_0, θ) from (5). The delay for a given microphone is determined by its position on the array and by the focus orientation. All possible delay values per microphone for each beamed orientation are precomputed, grouped per orientation, and stored in ROMs during compilation time. During execution time, the delay values Δ_m(θ) of each microphone m when pointing to a certain orientation θ are obtained from this precomputed table.

The beamforming stage is designed to support a variable number of microphones. This is enabled by grouping the input signals following their subarray structure. Therefore, instead of implementing one single Filter-and-Sum of 52 microphones, there are four Filter-and-Sum operations in parallel for the 4, 8, 16, and 24 microphones. Their sum operation is firstly done locally for each subarray and afterwards between subarrays. The only restriction of this modular beamforming is the synchronization of the outputs in order to have them properly delayed. Therefore, the easiest solution is to delay all the subarrays with the maximum delay of the

Figure 10: The power stage consists of a couple of components to calculate the P-SRP (power value per angle) and, through peak detection, the estimated location of the sound-source.

subarrays. Although the output of some subarrays is already properly delayed, additional delays, shown in the Sums section of Figure 9, are inserted to assure that the proper delay of each subarray has been obtained. This is achieved by using the valid output signals of each subarray beamforming, without additional resource cost. Consequently, only the Filter-and-Sum beamforming module linked to an active subarray is enabled. The inactive beamformers are set to zero in order to avoid any negative impact on the beamforming operation.

A side benefit of this modular approach is a reduction of the memory resource consumption. Since each subarray has its ring-buffer memory properly dimensioned to its maximum sample delay, the portion of underused regions of the consumed memories is significantly low.
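A minimal sketch of this modular organization is given below (hypothetical class and function names); each subarray keeps its own delay table and local sum, a disabled subarray simply contributes zeros, and the global result is the sum of the local outputs.

```python
import numpy as np

class SubarrayBeamformer:
    def __init__(self, delays_per_orientation: np.ndarray, active: bool = True):
        self.delays = delays_per_orientation          # (N_o, M_sub) delays in samples
        self.active = active                          # disabled subarrays output zeros

    def output(self, samples: np.ndarray, orientation: int) -> np.ndarray:
        """samples: (M_sub, T) filtered microphone streams -> local delayed sum."""
        t = samples.shape[1]
        if not self.active:
            return np.zeros(t)
        out = np.zeros(t)
        for mic, d in enumerate(self.delays[orientation]):
            out[d:] += samples[mic, :t - d]           # per-microphone ring-buffer delay
        return out

def beamform(subarrays, samples_per_subarray, orientation):
    # Local sums per subarray, then the global sum across (active) subarrays
    return sum(sa.output(s, orientation)
               for sa, s in zip(subarrays, samples_per_subarray))
```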

4.3. Power Stage. Figure 10 shows the components of the power stage. Once the filtered data has been properly delayed and added for a particular orientation θ, P(θ, t) is calculated following (10). The P-SRP is obtained after a steering loop, allowing the determination of the sound sources. The sound-source is estimated to be located in the direction shown by the peak of the polar power map, which corresponds to the orientation with the maximum P(θ, t).

5. Performance Analysis of the Filter-and-Sum Based Architecture

A performance analysis of the proposed architecture is presented in this section. The analysis shows how the design parameters, such as the filters' characteristics, affect the final execution time of the sound-source locator. The links between performance and design parameters are explained, followed by the description of the different acceleration strategies. These strategies can be considered standalone or combined for certain timing constraints. The advantages of these strategies are later presented in Section 6.

5.1. Time Parameters. The overall execution time of the proposed architecture is defined by the latency of the main components. A detailed analysis of the implementation of the components and the latency that they incur provides a good insight into the speed of the system (Table 2). The operation frequency of the design can be assumed to be the same as the sampling frequency. Let us define t_P-SRP as the overall

Table 2: Relevant parameters involved in the performance calculation for the proposed architecture.

Parameter | Definition
t_s | Sensing time
t_o | Execution time of one orientation
N_o | Number of orientations
L_o | Latency of the system
t_P-SRP | Time required to obtain a polar power map
t_II^filters | Initiation interval of the filter stage
t_filters | Execution time of the filter stage
t_II^beamforming | Initiation interval of the beamforming stage
t_beamforming | Execution time of the beamforming stage
t_II^power | Initiation interval of the power stage
t_power | Execution time of the power stage
t_II | Sum of all initiation intervals
t_II^CIC | Initiation interval of the CIC filter
t_II^DC | Initiation interval of the removed DC block
t_II^FIR | Initiation interval of the FIR filter
t_II^Delay | Initiation interval of the delay memories
t_II^Sum | Initiation interval of the cascaded sums
t_II^Power | Initiation interval of the power calculation

execution time in clock cycles required to obtain the P-SRP. Thus, t_P-SRP is defined as

t_P-SRP = N_o · t_o = N_o · (t_filters + t_beamforming + t_power),  (16)

where t_o is the execution time of one orientation and is determined by the execution time of the filter stage (t_filters), the execution time of the beamforming stage (t_beamforming), and the execution time of the power stage (t_power), which are the main components of the system, as explained in the previous section. The proposed architecture is designed to pipeline each stage, overlapping the execution of each component of the design. Therefore, only the initial latency or initiation interval (II) of each component needs to be considered, since it corresponds to the system group delay.

Let us assume that the design operates at the same frequency F_S as the microphones; then (16) can be rearranged as follows:

t_P-SRP = N_o · L_o / F_S = N_o · (t_filters_II + t_beamforming_II + t_power_II + t_s),  (17)

where L_o is the latency of the system, determined by the initiation interval of the filter stage (t_filters_II), the initiation interval of the beamforming stage (t_beamforming_II), and the initiation interval of the power stage (t_power_II). The time during which the microphone array is monitoring one particular orientation is known as t_s. This is the time required to calculate a certain number of output samples (N_s). As previously detailed, the digital microphones oversample the audio signal by operating at F_S. The reconstruction of the audio signal in the target range demands a certain level of decimation, D_F.


Figure 11: Timing analysis of the pipelined execution of the components.

This level of decimation is done by the CIC and the FIR filter in the filter stage, with decimation factors D_CIC and D_FIR, respectively. Based on D_F defined in (1), the time t_s is expressed as follows:

t_s = D_F · N_s / F_S = ⌈F_S / BW⌉ · N_s / F_S ≈ N_s / (2 · F_max).  (18)

The initiation interval of each stage of the implementation can be further decomposed based on the latency of its components:

t_filters_II = t_CIC_II + t_DC_II + t_FIR_II,
t_beamforming_II = t_Delay_II + t_Sum_II,  (19)

where t_i_II is the initiation interval of each component i. Therefore, t_II is defined as the sum of all the initiation intervals:

t_II = t_CIC_II + t_DC_II + t_FIR_II + t_Delay_II + t_Sum_II + t_Power_II.  (20)

Equation (16) can be rearranged (see Figure 11) as

t_P-SRP = N_o · (t_II + t_s).  (21)

The execution time t_P-SRP is determined by N_o and N_s, since the level of decimation is determined by the target frequency range and t_II is determined by the components' design. Although most of the latency of each component of the design is hidden thanks to the pipelined operation, there are still some cycles dedicated to initializing the components. A detailed analysis of t_II provides valuable information about the performance leaks.

CIC. The initiation interval of the CIC filter represents the time required to fill the integrator and the comb stages. Therefore, the order of the CIC filter (N_CIC) determines t_CIC_II:

t_CIC_II = (2 · N_CIC + 1) / F_S.  (22)

DC. The component that removes the DC level of the signal introduces a minor initial latency due to its internal registers. Since it needs at least two input values to calculate the DC level, it also depends on D_CIC:

t_DC_II = (D_CIC + 2) / F_S.  (23)

FIR. The initiation interval of the FIR filter is also determined by the order of this filter (N_FIR). Since the filter operation is basically a convolution, the initial output values are not correct until at least the ⌈(N_FIR + 1)/2⌉-th input sample reaches the filter. Because the filters are cascaded, D_CIC also affects t_FIR_II:

t_FIR_II = D_CIC · (⌈(N_FIR + 1)/2⌉ + 1) / F_S.  (24)

Therefore, t_filters_II is expressed as follows:

t_filters_II = t_CIC_II + t_DC_II + t_FIR_II = (2 · N_CIC + D_CIC · (2 + ⌈(N_FIR + 1)/2⌉) + 3) / F_S.  (25)

Delay. The beamforming operation is done through memories which properly delay the audio samples for a particular orientation. The maximum number of samples determines the minimum size of these delay memories. This value represents the maximum delay between a pair of microphones for a certain microphone array distribution and may vary for each orientation. The initiation interval of the Filter-and-Sum beamformer is therefore expressed through the maximum delay between pairs of microphones for a particular orientation:

t_Delay_II = max(Δ_am(θ)) · D_F / F_S,  (26)

where max(Δ_am(θ)) is the maximum time delay of the active microphones for the beamed orientation θ. Therefore, t_Delay_II is mainly determined by the microphone array distribution, F_S, and the target frequencies determining D_F. Due to the symmetry of the microphone array, and for the sake of simplicity, it is assumed that each orientation has the same max(Δ_am). Notice that this does not need to be true for different array configurations.


Sum. The proposed beamforming is composed not only of a set of delay memories but also of a sum tree. The initiation interval of this component is defined by the number of active microphones (N_am):

t_Sum_II = ⌈log₂(N_am)⌉ / F_S.  (27)

Therefore, t_beamforming_II is expressed as follows:

t_beamforming_II = t_Delay_II + t_Sum_II = (max(Δ_am(θ)) · D_F + ⌈log₂(N_am)⌉) / F_S.  (28)

Power. The final component is the calculation of the power per orientation. This simple component has a constant latency of a couple of clock cycles:

t_Power_II = 2 / F_S.  (29)
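Equations (16)-(29) can be collected into a small latency model. The sketch below (Python, for illustration only; the actual design is an FPGA pipeline, and max_delay_samples stands for max(Δ_am(θ)) expressed in audio samples at the decimated rate) evaluates t_II, t_s, and t_P-SRP from the parameters of Table 2:

    from math import ceil, log2

    def timing_model(f_s, n_cic, d_cic, n_fir, max_delay_samples, d_f, n_am, n_o, n_s):
        """Latency model of the pipeline, in seconds, following (16)-(29)."""
        t_cic   = (2 * n_cic + 1) / f_s                        # (22)
        t_dc    = (d_cic + 2) / f_s                            # (23)
        t_fir   = d_cic * (ceil((n_fir + 1) / 2) + 1) / f_s    # (24)
        t_delay = max_delay_samples * d_f / f_s                # (26)
        t_sum   = ceil(log2(n_am)) / f_s                       # (27)
        t_power = 2 / f_s                                      # (29)
        t_ii    = t_cic + t_dc + t_fir + t_delay + t_sum + t_power   # (20)
        t_s     = d_f * n_s / f_s                              # (18)
        t_psrp  = n_o * (t_ii + t_s)                           # (21)
        return t_ii, t_s, t_psrp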

The timing analysis of the initiation interval of each component of the architecture gives an idea of the design parameters with the highest impact. The definition of the filters, mainly their order, is determined by the application specifications, so it should not be modified to reduce the overall execution time. On the other hand, the distribution of the microphones in the array affects not only the frequency response of the system but also the execution time. Notice, however, that the number of microphones does not have a timing impact; only the number of active microphones has a minor impact, in the order of a couple of clock cycles of difference. Nevertheless, (21) already shows that the dominant parameters are t_s and N_o.

5.2. Sensitive Parameters. The timing analysis provides an indication of the parameters dominating the execution time. Some parameters, like the microphone array distribution which determines the beamforming latency, are fixed, while others, like N_o or t_s per orientation, are variable.

Orientations. Figure 5 depicts how an increment of N_o leads to a better sound-source localization. This resolution, however, has a high repercussion on the response time. A simple strategy is to maintain the angular resolution only where it is needed while quickly exploring the surrounding sound field. For instance, the authors in [3] propose a strategy to reduce the beamforming exploration to 8 orientations with an angular separation of 45 degrees. Once a steering loop ends, the orientations are rotated one position, which represents a shift operation in the precomputed orientation table, as sketched below. Therefore, all the supported 64 orientations are monitored after 8 steering loops. Despite this strategy intending to accelerate the peak detection by monitoring the minimum N_o, the overall N_o remains the same for achieving the equivalent angular resolution.
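A minimal sketch of that rotation schedule, assuming the supported orientations are stored in a table indexed from 0 to N_o − 1 (the function name and the generator form are ours, not part of the original design):

    def rotating_orientation_schedule(n_total=64, n_per_loop=8):
        """Orientation indices monitored in each steering loop: n_per_loop equally
        spaced orientations per loop, shifted by one table position after every
        loop, so all n_total orientations are covered after n_total/n_per_loop loops."""
        step = n_total // n_per_loop
        for loop in range(step):
            yield [(loop + k * step) % n_total for k in range(n_per_loop)]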

Sensing Time. The sensing time is a well-known parameter in radio frequency applications. A longer t_s is known to strengthen the robustness against noise [23]. In our case, the time a receiver is monitoring the surrounding sound field determines the probability of properly detecting a sound-source. Consequently, a higher t_s is needed to detect and locate sound sources under low Signal-to-Noise Ratio (SNR) conditions. Despite the fact that this term could be modified at runtime to adapt the sensing of the array based on an estimated SNR, it would demand a continuous SNR estimation, which is out of the scope of this paper.

To conclude, Table 2 summarizes the timing definitions. On one hand, t_s determines the number of processed acoustic samples and therefore directly affects the sensing of the system. On the other hand, N_o determines the angular resolution of the sound-source search and influences the accuracy. There is a trade-off between t_s and N_o and the quality of the sound-source location.

5.3. Strategies for Time Reduction. The following three strategies are proposed to accelerate the sound-source localization without any impact on the frequency response and the D_P of the architecture. An additional strategy is proposed specifically for dynamic acoustic environments, but with a certain accuracy cost.

5.3.1. Continuous Processing. The proposed architecture is designed to reset the filter and beamforming stages after t_o due to the orientation transition. Because the beamforming takes place after the filter stage, the system can continue processing while resetting: the filter stage does not need to stop its processing, and the input data is not lost due to the reset operations, since the filtered input values are stored in the beamforming stage. Furthermore, the initialization of the beamforming stage can also be eliminated, since the stored data from the previous orientation can be reused for the calculation of the new one. With this approach, (17) becomes as follows:

t_P-SRP = t_filters_II + t_beamforming_II + N_o · (t_power_II + t_s) ≈ t_II + N_o · t_s.  (30)

5.3.2. Time Multiplexing. Nowadays, FPGAs can operate at clock speeds of hundreds of MHz. Despite the fact that the power consumption is significantly lower when operating at low frequency [17], the proposed architecture is able to operate at a much higher frequency than the data sampling rate. This capability provides the opportunity to parallelize the beamforming computations without any additional resource consumption. Instead of consuming more logic resources by replicating the main operations, the proposed strategy, similar to Time-Division Multiplexing in communications, consists in time multiplexing these parallel operations. Because the input data is oversampled audio, the selection of the operations to be time multiplexed is limited. Based on (21), the candidates to be parallelized are N_o and t_s. Since the input data rate is determined by F_S, (18) shows that t_s cannot be reduced without decreasing N_s or changing the target frequency range. Nevertheless, since the computation of each orientation is data independent, the orientations can be parallelized. The


Figure 12: Clock regions for the time multiplexing of the computation of multiple N_o.

simultaneous computation of multiple orientations is only possible after the beamforming operation. Let us define t_P_II as the monitoring time before being able to process multiple orientations in parallel. Therefore,

t_P_II = t_CIC_II + t_DC_II + t_FIR_II + t_Delay_II.  (31)

After t_P_II, the delay memories which compose the Filter-and-Sum beamforming stage have already stored enough audio data to start locating the sound-source. Because the beamforming operation relies on delaying the recovered audio signal, multiple orientations can be computed in parallel by accessing the content of the delay memories at a higher speed than the sampling of the input data. It basically multiplexes the output beamforming computations over time. The required frequency F_P to parallelize all N_o for this architecture is defined as follows:

F_P = F_S · N_o / D_F.  (32)

Due to (1), F_P can also be expressed based on the target frequency range:

F_P ≈ BW · N_o.  (33)

Notice that the required frequency to multiplex in time the computation of the orientations does not depend on the number of microphones in the array. Figure 12 shows the clock domains when applying this strategy. While the front-end, consisting of the microphone array and the filter stage, operates at F_S, the output of the beamforming is processed at F_P. The additional cost in terms of resources is the extension of the register for the power-per-angle calculation: a memory of N_o positions is required instead of the single register used to store the accumulated power values. This strategy allows fully parallelizing the computation of all the orientations. Thus, t_P-SRP is mainly limited by N_o and the maximum reachable frequency of the design, since F_S is determined by the microphones' operational frequency and D_F by the frequency range of the target sound-source. In fact, D_F determines how many orientations can be processed in parallel.
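A quick check of (32)-(33), given here only as an illustration:

    def multiplexing_clock(f_s, d_f, n_o):
        """Clock frequency F_P needed to time-multiplex all N_o orientations, cf. (32)-(33)."""
        return f_s * n_o / d_f    # equivalently ≈ BW · N_o, with BW = F_S / D_F

    # Design of Section 6: F_S = 2 MHz, D_F = 64, N_o = 64 gives F_P = 2 MHz,
    # i.e. the beamforming can reuse the front-end clock.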

5.3.3. Parallel Time Multiplexing. This proposed strategy is an extension of the previous one. The frequency F_P is limited by the maximum attainable operating frequency of the implementation, which is determined by many factors, from the technology to the available resources of the FPGA. For instance, if F_max equals 30 kHz and the maximum attainable operating frequency is 100 MHz, then up to 1666 orientations could be computed in parallel. However, if not all the resources of the FPGA are consumed, especially the internal blocks of memory (BRAM), there is still room for improvement. With the time multiplexing strategy, the memories of the beamforming stage are fully accessed, since in each clock cycle there is at least one memory access, or even two memory accesses when new data is stored. Therefore, more memory resources can be used to further accelerate the computation of the P-SRP. The simple replication of the beamforming stage, preconfigured for different orientations, is enough to double the number of processed orientations while maintaining the same t_P-SRP. This strategy mainly consumes BRAMs. Nevertheless, due to the value of max(Δ_m) at BW for our microphone array, only a few audio samples are needed to complete the beamforming. This fact drastically reduces the memory consumption, which enables the potential computation of thousands of orientations by applying both strategies.

All strategies can be applied independently, despite the fact that some will only work properly when combined. Not all strategy combinations are beneficial. For instance, a dynamic angular resolution should only be combined with the time multiplexing of the orientations when F_P is higher than F_S. Otherwise, the reduction of N_o by dynamically readjusting the target orientations does not provide any acceleration and would only degrade the response of the system.

6. Results

The proposed architecture is evaluated in this section. Our analysis starts by evaluating different design solutions based on the timing analysis introduced in Section 5.1. One representative configuration is then evaluated in terms of frequency response and accuracy by using the metrics described in Section 3.5. This evaluation also considers sensitive parameters such as the number of active subarrays and the relevance of N_o, already introduced in Section 5.2. The resource and power consumption for a Zynq 7020 target FPGA are also


Figure 13: Minimum values of t_o based on F_S and F_max. Different perspectives are displayed in the bottom figures. Notice how the shortest t_o is obtained when increasing F_max and F_S.

presented. Finally, the strategies presented in Section 5.3 are applied to the representative design.

6.1. General Performance Analysis. The performance analysis from the previous section is here applied to a concrete example. The explored design parameters are F_S and F_max, keeping N_s and N_o both constant at 64. Whereas F_S is determined by the microphone's sampling frequency, F_max is determined by the target application. For our design space exploration, we consider an F_max from 10 kHz to 16 kHz in steps of 125 Hz, and an F_S ranging from 1.25 MHz to 3.072 MHz, as specified in [10].

Equations (16) to (18) and (20) to (32) are used to obtain t_P-SRP. The performance analysis starts by obtaining D_F for every possible value of F_S and F_max. All possible combinations of D_CIC and D_FIR are considered based on (15). The low-pass FIR filter parameters are N_FIR, which is determined by D_CIC, and F_max as the cut-off frequency. Each possible low-pass FIR filter is generated considering a transition band of 2 kHz and an attenuation of at least 60 dB at the stop band. If the minimum order of the filter is higher than N_FIR, the filter is discarded. We consider these parameters as realistic constraints for low-pass FIR filters. Furthermore, a minimum order of 4 is defined as a threshold for N_FIR. Thus, some values are discarded because D_F is a prime number or N_FIR is below 4. Each low-pass FIR filter is generated and evaluated in Matlab 2016b.
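The structure of this exploration loop can be sketched as follows (in Python rather than Matlab, purely for illustration; the fir_is_feasible callback stands for the Matlab filter-generation step, and the exact feasibility rules are our assumptions):

    from math import ceil

    def factorizations(d_f):
        """All (D_CIC, D_FIR) integer splits of the overall decimation factor."""
        return [(d, d_f // d) for d in range(2, d_f) if d_f % d == 0]

    def explore(fs_values, fmax_values, fir_is_feasible, min_nfir=4):
        """Enumerate candidate design points of the exploration described above.

        fir_is_feasible(f_cic, f_max) is a placeholder for the filter generation:
        it should return the order of a low-pass FIR filter meeting the 2 kHz
        transition band / 60 dB stop-band specification at the post-CIC rate
        f_cic, or None when no acceptable filter exists.
        """
        for f_s in fs_values:
            for f_max in fmax_values:
                bw = 2.0 * f_max              # minimum bandwidth to satisfy Nyquist
                d_f = ceil(f_s / bw)          # overall decimation factor, cf. (1)
                splits = factorizations(d_f)
                if not splits:                # e.g. D_F is a prime number
                    continue
                for d_cic, d_fir in splits:
                    n_fir = fir_is_feasible(f_s / d_cic, f_max)
                    if n_fir is not None and n_fir >= min_nfir:
                        yield (f_s, f_max, d_f, d_cic, d_fir, n_fir)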

Figure 13 depicts the minimum timings of the DSE that the proposed Filter-and-Sum architecture needs to compute one orientation. t_o is slightly reduced when varying F_S; for instance, it is reduced from 5.03 ms to 3.97 ms when F_max = 10 kHz. A higher F_S means a faster sampling, which is in fact the limiting factor for the operational frequency. Furthermore, a larger decrement of t_P-SRP is produced when increasing both F_S and F_max. Higher values of F_max allow higher values of D_CIC, which can greatly reduce the computational complexity of the narrowband low-pass filtering. However, too high values of D_CIC lead to such low rates that, although a higher-order low-pass FIR filter is supported, it cannot satisfy the low-pass filtering specifications. Notice how the number of possible solutions decreases when increasing F_max. Due to the F_S and F_max ranges, the values of D_F vary between 39 and 154. Though, as previously explained, many values cannot be considered since they are either prime numbers or the decomposition into factors of D_CIC leads to values below 4. Because higher values of F_max lead to low values of D_CIC for low F_S, these D_CIC values cannot satisfy the specifications of the low-pass FIR filter.

Finally, relatively low values of t_P-SRP are obtained for F_max values from 10 kHz to 10.65 kHz and F_S ranging from 2.7 MHz to 3.072 MHz. This is produced by high values of D_CIC, which means that a higher-order low-pass FIR filter is supported. As expected, high values of D_CIC lead to high-order low-pass FIR filters and a lower D_FIR. A lower t_P-SRP is possible thanks to avoiding unnecessary computations, since fewer samples are decimated after the low-pass FIR filter.

6.2. Analysis of a Design. As shown in Figure 13, several design considerations drastically affect the final performance.


Figure 14: Waterfall diagrams of the proposed architecture. The figures are obtained by enabling only a certain number of subarrays. From (a) to (d): only the 4 innermost microphones, only the 12 innermost microphones, the 28 innermost microphones, and all 52 microphones.

Table 3: Configuration of the architecture under analysis.

  Parameter   Definition                              Value
  F_S         Sampling frequency                      2 MHz
  F_min       Minimum frequency                       1 kHz
  F_max       Maximum frequency                       15.625 kHz
  BW          Minimum bandwidth to satisfy Nyquist    31.25 kHz
  D_F         Decimation factor                       64
  D_CIC       CIC filter decimation factor            16
  N_CIC       Order of the CIC filter                 2
  D_FIR       FIR filter decimation factor            4
  N_FIR       Order of the FIR filter                 16

However, most of these design decisions do not have a significant impact on the system response compared to other factors, such as the number of active microphones or the number of orientations. The analysis of the impact of these parameters on the system's response and performance is done on one particular design.

Table 3 summarizes the configuration of the architecture. The design considers F_S = 2 MHz, which is the clock for the microphones and the functional frequency of the design. This value of F_S is an intermediate value between the required clock signals of the ADMP521 microphones [10]. The selected cut-off frequency is F_max = 15.625 kHz, which leads to D_F = 64. In this example design, N_CIC = 4, with a decimation factor of 16 and a differential delay of 32. The chosen FIR filter has a beta factor of 2.7 and a cut-off frequency of F_max at a sampling rate of 125 kHz, which is the sampling rate obtained after the CIC decimator filter with D_CIC = 16. The filtered signal is then further decimated by a factor D_FIR = 4 to obtain a BW = 31.25 kHz audio signal.

The architecture is designed to support a complete steering loop of up to 64 orientations, which represents an angular resolution of 5.625°. On the other hand, the subarray approach allows activating all 52 microphones if all 4 subarrays are active. The final results are obtained by assuming a speed of sound of ≈343.2 m/s.

6.2.1. Frequency Response. The waterfall diagrams of Figure 14 show the power output of the combined subarrays in all directions for all frequencies. In our case, the results are calculated with a single sound-source varying between 100 Hz and 15 kHz in steps of 100 Hz and placed at 180°. All results are normalized per frequency. Every waterfall shows a clear, distinctive main lobe. When only subarray 1 is active, there are side lobes at 5.3 kHz and 10.6 kHz which impede the sound-source localization for those frequencies. The frequency response of the subarrays improves when they are combined, since their frequency responses are superposed. The combination of subarrays 1 and 2 reaches a minimum


Figure 15: Directivities when considering a variable number of orientations and active microphones. From (a) to (d): D_P with only 8 orientations up to 64 orientations in (d).

detectable frequency of 3.1 kHz, while the combination of subarrays 1, 2, and 3 and the combination of all subarrays reach 2.1 kHz and 1.6 kHz, respectively. These minimum values are clearly depicted in Figure 15 with a threshold of 8 for D_P, which indicates that the main lobe's surface corresponds to maximally half of a quadrant. The frequency response of the combination of subarrays has a strong variation at the main lobe, and therefore in D_P. Figure 15 depicts the evolution of D_P when increasing the angular resolution and when combining subarrays. The angular resolution determines the upper bound to which D_P converges, which is dependent on the number of orientations. The number of active microphones, on the other hand, influences how fast D_P converges to its upper limit. Consequently, the number of active microphones determines the minimum frequency which can be located when considering a threshold of 8 for D_P. Alongside the directivity, other metrics such as the main beamwidth and

the MSL levels are also calculated to properly evaluate the quality of the array's response. Figure 16 depicts the MSL when varying the number of active subarrays and the number of orientations. A low angular resolution leads to a lower resolution of the waterfall diagrams, but only the metrics can show the impact. At frequencies between 1 and 3 kHz, the main lobe converges to a unit circle, which can be explained by the lack of any side lobe. Higher frequencies present secondary lobes, especially when only the inner subarray is active, which increases the MSL values independently of the angular resolution. A low angular resolution leads to unexpectedly low values of MSL, since the secondary lobes are not detected. On the other hand, a higher number of active microphones leads to lower values of MSL, independently of the angular resolution.

Figure 17 depicts the BW−3dB metric for a similar analysis of the number of microphones and the angular resolution. On


Figure 16: Measured MSL when considering a variable number of orientations and active microphones. From (a) to (d): the MSL with only 8 orientations up to 64 orientations in (d).

one hand, a higher number of microphones produces a faster decrement of BW−3dB, reflected as a thinner main lobe. Nevertheless, the BW−3dB of each subarray converges to a minimum which is only reached at higher frequencies. The angular resolution determines this minimum, which ranges from 90° to 11.25° when 8 or 64 orientations are considered, respectively.

6.2.2. Resource Consumption and Power Analysis. Table 4 summarizes the resource consumption when combining subarrays. The consumed resources are divided into the resources for the filter stage, the beamforming stage, and the total consumption per group of subarrays. The filter stage mostly consumes DSPs, while the beamforming stage mainly demands BRAMs. Most of the resource consumption is dominated by the filter stage, since a filter chain is dedicated to each MEMS microphone. What determines the resource consumption is the number of active subarrays.

The flexibility of our architecture allows the creation of heterogeneous sound-source locators. Thus, the architecture can be scaled for small FPGAs based on the target sound-source profile or on a particular desirable power consumption. For instance, the combination of the two inner subarrays would use 12 microphones while consuming less than 10% of the available resources. The LUTs are the limiting resource due to the internal registers of the filters. In fact, when all the subarrays are used, around 80% of the available LUTs are required. Nevertheless, any subarray can be disabled at runtime, which directly deactivates its associated filter and beamforming components. Although this does not affect the resource consumption, it has a direct impact on the power consumption. Table 5 shows the power consumption


Figure 17: Measured BW−3dB level when considering a variable number of orientations and active microphones. From (a) to (d): the BW−3dB with only 8 orientations up to 64 orientations in (d).

in mW based on the number of active subarrays. The power consumption of the microphones is also considered, since the FPGA and the microphone array are powered from the same source. Thus, the overall power consumption must be considered, since the architecture is designed for an embedded system. The MEMS microphones are powered with 3.3 volts, which represents a power consumption per microphone of 2.64 µW and 3.96 mW for the inactive and active microphones, respectively. Notice how the power consumption increases with the number of active subarrays. There is a turning point when 3 or 4 subarrays are active: the microphone array then consumes more power than the FPGA when all the subarrays are active.

6.2.3. Timing Analysis. The timing analysis, based on Section 5, of the design under evaluation is summarized in Table 6. A complete steering loop requires around 169.6 ms, while t_o rounds to 2.6 ms. Notice that the initialization (t_II) consumes around 21.5% of the execution time. Fortunately, this initialization can be almost completely removed when applying the first strategy described in Section 5.3.1.

Table 7 summarizes the timing results when applying the first strategies proposed in Section 5. The elimination of the initialization after each orientation transition slightly reduces t_P-SRP. In this case, t_P-SRP is expressed as follows:

t_P-SRP = t_II + N_o · t_s.  (34)

The main improvement is obtained after time multiplexing the computation of the power per orientation. In this case, F_P, the operational frequency of the beamforming computation needed to process all N_o in parallel, equals F_S, as expressed in (32). This is possible because D_F and N_o have the same value. Therefore, there is no need for a different clock for the beamforming operation, since the spacing between


Table 4: Resource consumption after placement and routing when combining microphone subarrays. Each subarray combination details the resource consumption of the filter and the beamforming stage (Filters / Beamforming / Total).

  Resources        Available   Inner 4 MICs         Inner 12 MICs          Inner 28 MICs          All 52 MICs
  Slice registers  106400      5043 / 626 / 6144    14859 / 1540 / 16882   34489 / 3195 / 38183   54042 / 4447 / 59093
  Slice LUTs       53200       3612 / 344 / 4732    10759 / 754 / 12299    25032 / 1486 / 27318   37221 / 2221 / 42319
  LUT-FF           86689       2329 / 199 / 2773    7013 / 512 / 7779      16353 / 1069 / 17698   23656 / 1664 / 27619
  BRAM             140         0 / 2 / 2            0 / 6 / 6              0 / 14 / 14            0 / 22 / 22
  DSP48            220         8 / 4 / 12           24 / 4 / 28            56 / 4 / 60            88 / 4 / 92

Table 5: Power consumption at F_S = 2 MHz, expressed in mW, when combining microphone subarrays. Values obtained from the Vivado 2016.4 power report.

  Subarrays       MEMS microphones (Active / Inactive / Total)   On-chip power (Static / Dynamic / Total)   Total power
  Inner 4 MICs    15.84 / 0.13 / 15.97                           120 / 2 / 122                              137.97
  Inner 12 MICs   47.52 / 0.11 / 47.63                           120 / 5 / 125                              172.63
  Inner 28 MICs   110.88 / 0.06 / 110.94                         121 / 11 / 132                             242.94
  All 52 MICs     205.92 / 0 / 205.92                            122 / 16 / 138                             343.92

Table 6: Timing analysis without any optimization of the design under evaluation. The values are expressed in µs.

  Parameter          Definition                                      Value [µs]
  t_CIC_II           Initiation interval of the CIC filter                4.5
  t_DC_II            Initiation interval of the Remove DC block           9
  t_FIR_II           Initiation interval of the FIR filter               72
  t_Delay_II         Initiation interval of the delay memories          480
  t_Sum_II           Initiation interval of the cascaded sums             3.5
  t_Power_II         Initiation interval of the power calculation         1
  t_filters_II       Initiation interval of the filter stage             85.5
  t_beamforming_II   Initiation interval of the beamforming stage       484.5
  t_power_II         Initiation interval of the power stage               1
  t_II               Sum of all initiation intervals                    571
  t_s                Sensing time                                      2048
  t_o                Execution time of one orientation                 2650
  t_P-SRP            Time required to obtain a polar power map       169600

output filtered values from the filter stage is large enough. By combining the first two strategies, t_P-SRP rounds to 2 ms, and only the first steering loop needs 2.6 ms due to t_P_II. In this case, t_P-SRP is expressed as follows:

t_P-SRP = t_P_II + t_s ≈ t_s.  (35)

The other two strategies proposed in Section 5.3 are designed to fully exploit the FPGA resources and to overcome time constraints when considering a high angular resolution. In the first case, since the design under evaluation has a small angular resolution (N_o = 64), there is no need for a higher F_P when applying the time multiplexing strategy. However, a higher angular resolution can be obtained by using the unconsumed resources without additional timing cost. Table 8 shows that the combination of strategies increases the angular resolution without additional time penalty. The operational frequency (F_op) determines at what speed the FPGA can operate. Following (33), the beamforming operation can be exploited by increasing F_P up to this maximum frequency, which increases N_o as well:

max(N_o) = max(F_op) / BW = F_P / BW.  (36)

Many thousands of orientations can be computed in parallel when combining all strategies. The beamforming stage can be replicated as many times as the remaining available resources allow. Of course, this estimation is certainly optimistic, since the frequency drops when the resource consumption increases. Nevertheless, it provides an upper bound for N_o. For instance, when only the inner subarray is considered, the DSPs are the limiting component; however, up to 53 beamforming stages could theoretically be placed in parallel. When more subarrays are active, the BRAMs are the constraining component. Notice how the number of supported orientations increases if the number of subarrays decreases. This has, however, an impact on the frequency response and the accuracy of the system, as shown in Section 6.2.1. Nevertheless, tens of thousands of orientations can be computed in parallel, consuming only around 2 ms, by operating at the highest F_op and by replicating the beamforming stage to exploit all the available resources.
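Under these assumptions, the upper bound of (36), extended with the replication factor of the parallel strategy, can be estimated with the following sketch (the beamformer count and F_op must come from the implementation reports, as in Table 8):

    def max_orientations(f_op_hz, bw_hz, n_beamformers=1):
        """Upper bound on the number of orientations per sensing period, cf. (36),
        multiplied by the number of replicated beamforming stages."""
        return int(f_op_hz // bw_hz) * n_beamformers

    # Example with the figures reported for all 52 microphones: F_op ≈ 87.91 MHz
    # and BW = 31.25 kHz give 2813 orientations per beamformer; with 6 replicated
    # stages the bound grows to roughly 16.9k orientations.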


Table 7: Timing analysis of the optimized designs when applying and combining the first two strategies. The values are expressed in ms.

             Initial   Continuous   Time multiplexing   Continuous time multiplexing
  t_P-SRP    169.6     131.6        2.6                 2

Table 8: Maximum N_o when combining strategies. The maximum number of beamformers is obtained based on the available resources and the resource consumption of each beamformer (Table 4). The maximum F_op is reported by the Vivado 2016.4 tool after placement and routing.

  Continuous time multiplexing:
    Subarrays          Inner 4 MICs   Inner 12 MICs   Inner 28 MICs   All 52 MICs
    Max beamformers    —              —               —               —
    Max F_op           95.62 MHz      93.27 MHz       91.97 MHz       87.91 MHz
    Max N_o            3059           2984            2943            2813

  Parallel continuous time multiplexing:
    Subarrays          Inner 4 MICs   Inner 12 MICs   Inner 28 MICs   All 52 MICs
    Max beamformers    55             23              10              6
    Max F_op           95.62 MHz      93.27 MHz       91.97 MHz       87.91 MHz
    Max N_o            168292         68650           29430           16879

7. Conclusions

In this paper, we have presented a scalable and flexible architecture for fast sound-source localization. On one hand, the architecture can flexibly disable sections of the microphone array that are not needed or disable them to respect power restrictions. The modular approach of the architecture allows scaling the system to a larger or smaller number of microphones. Nevertheless, such capabilities do not impact the frequency response and accuracy of our sound-source locator. On the other hand, several strategies to offer real-time sound-source localization have been presented and evaluated. These strategies not only accelerate the localization but also provide solutions for time-stringent applications with a high angular resolution demand. Thousands of angles can be monitored in parallel, offering a high-resolution sound-source localization in a couple of milliseconds.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014–2020 through the ERDF-2020 Project ICITY-RDI.BRU.

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106–5109, IEEE, Dallas, Tex, USA, March 2010.

[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th SENSORS '13 Conference, IEEE, Baltimore, MD, USA, November 2013.

[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.

[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340–345, IEEE, Qinhuangdao, China, September 2015.

[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869–885, 2011.

[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, Hat-type hearing system using MEMS microphone array, 2016.

[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729–2733, IEEE, Brisbane, Australia, April 2014.

[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.

[9] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.

[10] Analog Devices, "ADMP521 datasheet: Ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.

[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.

[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.

[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Brüel & Kjær, Denmark, 2004.

[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [PhD thesis], Brown University, Providence, RI, USA, 2000.

[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157–180, Springer, Berlin, Germany, 2001.

[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465–468, IEEE, Hoboken, NJ, USA, June 2012.

[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS), Stanford University, Stanford, Calif, USA, 2010.

[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155–162, 1981.

[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf7/48fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.

[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.

[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.

[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14–27, 2005.

[23] T. E. Bogale, L. Vandendorpe, and L. L. Bao, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435–441, IEEE, Oulu, Finland, June 2014.


Figure 1: Operations needed for the proposed architecture to locate a sound-source.

Figure 2: Sound-source localization device composed of 4 MEMS microphone subarrays: ring 1 (Ø = 4.5 cm) with 4 microphones, ring 2 (Ø = 8.9 cm) with 8, ring 3 (Ø = 13.5 cm) with 16, and ring 4 (Ø = 18 cm) with 24.

each microphone output before summing them. The overall effect is an amplification of the signal coming from a target orientation while suppressing signals from other orientations. A variant of the Filter-and-Sum recovers the audio signal from the PDM signal, applies the same low-pass FIR filter, and delays the filtered output signal of each microphone by a specific amount of time (Δ) before adding all the output signals together (Figure 3). The time delay (Δ_m) for a microphone m is determined by the focus direction θ, the position vector (r⃗_m) of microphone m, and the speed of sound (c):

Δ_m = (r⃗_m · û) / c,  (2)

where the unitary vector (û) defines the direction of a far-field propagating signal with a focus direction θ. The total output (O(θ, t)) of the array can be expressed based on the signal output of each microphone in the time domain, s_m(t), and the number of microphones in the array (M):

O(θ, t) = Σ_{m=1}^{M} s_m(t − Δ_m(θ)).  (3)

The response of the Filter-and-Sum beamforming, however, is usually represented in the frequency domain due to its dependence on the signal frequency. Let S_m(ω) be the output signal of each microphone at angular speed ω = 2πf for frequency f, and M the number of microphones in the array. The total output (O(θ, ω)) is defined as in [13]:

O(θ, ω) = Σ_{m=1}^{M} S_m(ω) · e^{−jωΔ_m(θ)},  (4)

which can be simplified, by assuming a monochromatic acoustic wave, as

O(θ, ω) = S_o(ω) · Σ_{m=1}^{M} e^{j r_m w_n (θ_0 − θ)} = S_o(ω) · W(w_n, θ_0, θ),  (5)

where S_o(ω) is the output signal of the monochromatic wave, w_n is the incoming monochromatic angular speed, θ_0 is its direction, and θ is the array focus. W(w_n, θ_0, θ) is known as the array pattern, which determines the amplification or gain of the array output. For instance, when θ_0 = θ, which occurs when the array is focusing in the direction of the incoming monochromatic wave, the gain reaches its maximum M, equal to the number of microphones.
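For clarity, the time-domain Delay-and-Sum output of (3) can be sketched as follows; this is a behavioural model, not the FPGA implementation, and it assumes integer sample delays already derived from (2):

    def delay_and_sum(signals, delays):
        """Time-domain output O(θ, t) of (3) for one orientation.

        signals : per-microphone filtered sample buffers (equal-length lists)
        delays  : per-microphone integer sample delays Δ_m(θ)
        """
        d_max = max(delays)
        n = len(signals[0])
        # O(θ, t) = Σ_m s_m(t − Δ_m(θ)), valid once every delayed sample exists
        return [sum(s[t - d] for s, d in zip(signals, delays))
                for t in range(d_max, n)]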

3.4. Polar Steered Response Power. The direction of the sound-source is located by measuring the relative sound


Figure 3: The proposed Filter-and-Sum beamforming filters and delays the output of each microphone before adding them together. (a) The acoustic wave received at each microphone is measured and filtered. The beamforming technique considers that the time Δ_m that the input signal takes to travel from microphone m to the origin is proportional to the projection of the microphone vector r⃗_m on û. (b) This Δ_m is determined by the position of the microphone in the array and the desired focus direction θ of the array. Consequently, the signals coming from the same direction are amplified after the addition of the delayed inputs. Source: [9].

power per horizontal direction, which is done by a 360° sweep overview of the surrounding sound field. The directional power output of a microphone array, defined here as the polar steered response power (P-SRP), corresponds to the array's directional response to sound sources present in a sound field (Figure 4). The P-SRP is obtained by considering multiple broadband sources coming from different directions, for instance, human speech.

The output power when the microphone array is exposed to a broadband sound-source S(w) with an angle of incidence θ_0 can be modelled as

O(θ, S) = A_1 · W(w_n1, θ_0, θ) + A_2 · W(w_n2, θ_0, θ) + ⋅⋅⋅ + A_n · W(w_nn, θ_0, θ),  (6)

where A_i, with i ∈ {1, ..., n}, is the amplitude of one of the n frequency components of S(w). The equation can be generalized to consider a sound field φ composed of multiple broadband sound sources at different locations and with uncorrelated noise:

O(θ, φ) = O(θ, S_1) + O(θ, S_2) + ⋅⋅⋅ + O(θ, S_n) + Noise_uncorrelated.  (7)

The array's power output can be expressed as

P(θ, φ) = |O(θ, φ)|²,  (8)

since the power of a signal is the square of the array's output. Finally, the normalized power output is defined as the P-SRP:

P-SRP(θ, φ) = P(θ, φ) / max_{θ∈[0,2π]} P(θ, φ).  (9)

The comparison of P(θ, φ) for different values of θ determines in which direction the sound-source is located, since the maximum power is obtained when the focus corresponds to the location of a sound-source.

The calculation of the P-SRP is usually defined in the frequency domain [14, 15], which requires the computation of a Fourier transform. Instead, we propose applying Parseval's theorem, which states that the sum of the squares of a function is equal to the sum of the squares of its transform. This theorem drastically simplifies the calculations, since the P-SRP can then be computed in the time domain. Let us define the sensing time (t_s) as the time the array is registering the previously defined sound field φ for each orientation. Therefore, the power P(θ, t_s) can be expressed as follows:

P(θ, t_s) = (1/t_s) · Σ_{t=1}^{t_s} |O(θ, t, φ)|².  (10)

Consequently, the P-SRP can be expressed in the time domain by

P-SRP(θ, t_s) = P(θ, t_s) / max_{θ∈[0,2π]} P(θ, t_s).  (11)


Figure 4: Examples of a polar map obtained under experimental conditions for sound sources of 5 kHz (a) and 8 kHz (b).

Figure 5: Examples of polar maps with different angular resolutions (steering 4, 8, 16, 32, and 64 orientations) locating a sound-source of 8 kHz. A low number of orientations clearly leads to a wrong sound-source location.

3.5. Sensor Array Evaluation. The defined P-SRP allows estimating the direction of arrival of multiple sound sources under different sound field conditions. Nevertheless, the precision and accuracy of its estimation can be determined by different quality metrics.

The Filter-and-Sum beamforming is applied to a discrete number of orientations or angles. The angular resolution of the microphone array is determined by the number of measurements per 360° sweep. A higher number of measurements increases the resolution of the P-SRP, displayed as a polar power map (Figure 5), and decreases the location error of the sound-source. The lobes of this polar power map can then be used to estimate the bearing of nearby sound sources in nondiffuse sound field conditions. In fact, the characteristics of the main lobe when considering a single sound-source scenario determine the directivity of the microphone array. The definition of array directivity, D_P, is proposed in [16] for broadband signals. The authors propose the use of D_P as a metric of the quality of the array, since D_P depends on the main lobe shape and on its capacity to unambiguously point to a specific bearing. The definition of array directivity presented in [16] is adapted for 2D polar coordinates in [9] as follows:

D_P(θ, ω) = π · P(θ, ω)² / ((1/2) ∫₀^{2π} P(θ, ω)² dθ),  (12)

where P(θ, ω) is the output power of the array when pointing to the direction θ, and (1/2) ∫₀^{2π} P(θ, ω)² dθ is the sum of the squared output power in all other directions. It can be expressed as the ratio between the area of a circle whose radius is the maximum power of the array and the total area of the power output. Consequently, D_P defines the quality of the microphone array and can be used to specify a certain threshold for the microphone array. For instance, if D_P equals 8, the main lobe is eight times slimmer than the unit circle and offers a confident estimation of a sound-source within half a quadrant.

Whereas D_P is usually considered for broadband sound sources, other metrics are necessary to profile the array's response for different types of sound sources. Figure 6 depicts the maximum side lobe (MSL) and the half-power beamwidth, which are two complementary metrics used to characterize the response of arrays for narrowband sound sources. The half-power beamwidth is the angular extent by which the power response has fallen to half of the maximum level of the main lobe. Since the half-power point coincides with a 3 dB drop in power level, it is often called the 3 dB beamwidth (BW−3dB). This metric determines the angular ratio between the power signal level which is at least 50% of the peak power level and the remaining circle. By contrast, the MSL is another important parameter used to represent the impact of the side lobes when characterizing arrays. The MSL is the normalized ratio


Figure 6: Definitions of the maximum side lobe (MSL) and the 3 dB beamwidth (BW−3dB).

Figure 7: Main stages of the proposed architecture.

between the highest side lobe and the power level of the main lobe, expressed in dB. Both metrics, the MSL and the BW−3dB, are desired to be as low as possible, whereas D_P should be as high as possible to guarantee a precise sound-source location.
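The three metrics can be computed directly from a sampled polar power response. The sketch below is a simplified software model (uniform angular sampling is assumed, and any local maximum other than the main peak is treated as a side lobe), not the exact procedure used by the authors:

    import math

    def directivity(power):
        """D_P of (12): area of a circle of radius max(power) divided by the
        area enclosed by the polar power response (uniform angular sampling)."""
        d_theta = 2.0 * math.pi / len(power)
        area = 0.5 * sum(p * p for p in power) * d_theta
        return math.pi * max(power) ** 2 / area

    def beamwidth_3db(power):
        """Main-lobe width at the half-power (-3 dB) level, in degrees."""
        n, half = len(power), max(power) / 2.0
        i_peak = power.index(max(power))
        width, i = 1, i_peak
        while width < n and power[(i + 1) % n] >= half:   # walk clockwise from the peak
            i, width = i + 1, width + 1
        i = i_peak
        while width < n and power[(i - 1) % n] >= half:   # walk counter-clockwise
            i, width = i - 1, width + 1
        return 360.0 * width / n

    def msl_db(power):
        """Maximum side-lobe level relative to the peak, in dB (simplified:
        every local maximum below the global peak is treated as a side lobe)."""
        n, p_max = len(power), max(power)
        side = [p for i, p in enumerate(power)
                if p < p_max
                and p >= power[(i - 1) % n]
                and p >= power[(i + 1) % n]]
        return 10.0 * math.log10(max(side) / p_max) if side else float("-inf")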

4. A Filter-and-Sum Based Architecture

The proposed architecture uses a Filter-and-Sum based beamforming technique to locate a sound-source with an array of digital MEMS microphones. Many applications, however, demand a certain scalability and flexibility when locating the sound-source. With such requirements in mind, the proposed architecture has some additional features to support a dynamic response, targeting applications with real-time demands. The proposed architecture is also designed to be battery-power efficient and to operate in streaming fashion to achieve the fastest possible response.

One of the features of the ADMP521 microphone is itslow-power sleep mode capability When no clock signal isprovided the ADMP521 microphone enters in a low-powersleepmode (lt1 120583A) whichmakes this sound-source localizersuitable for battery powered implementationsThePCBof theMEMsmicrophone array is designed to exploit this capabilityFigure 2 depicts the subarray distribution of the MEMsmicrophones Using the clock signal it is possible to activateor deactivate subarrays since each subarray is fetched withan individual clock signal This flexibility allows disablingnot only subarrays of microphones but also the associatedcomputational components decreasing the computational

Table 1: Relevant parameters involved in the proposed architecture.

Parameter | Definition
$F_s$ | Sampling frequency
$F_{min}$ | Minimum frequency of the target sound source
$F_{max}$ | Maximum frequency of the target sound source
BW | Minimum bandwidth to satisfy Nyquist
$D_F$ | Decimation factor
$D_{CIC}$ | CIC filter decimation factor
$N_{CIC}$ | Order of the CIC filter
$D_{FIR}$ | FIR filter decimation factor
$N_{FIR}$ | Order of the FIR filter

The array computes its response as fast as possible to reach real-time sound-source location. The proposed architecture is designed to process in streaming fashion and is mainly composed of three cascaded stages operating in pipeline (Figure 7). The first stage is the filter chain, which is composed of the minimum number of components required to recover the audio signal in the target frequency range. The second stage computes the Filter-and-Sum beamforming operation. The final stage obtains $P(\theta, t)$ for the focused orientation. A polar power map is obtained once a complete steering loop is completed. The different stages are discussed in more detail in the following subsections. Table 1 summarizes the most relevant parameters of the proposed architecture.

Figure 8: The filter stage consists of one filter chain per microphone: the PDM splitter feeds each chain, which cascades an $N_{CIC}$th-order CIC decimator filter (decimation factor $D_{CIC}$), a DC-removal block, and an $N_{FIR}$th-order low-pass FIR filter (decimation factor $D_{FIR}$), producing the filtered PCM output.

4.1. Filter Stage. The filter stage contains a PDM demultiplexer and as many filter chain blocks as MEMS microphones (Figure 8). Each microphone of the array is associated with a filter chain composed of a couple of cascaded filters. The full-capacity design supports up to 52 filter chain blocks working in parallel, but their number is defined by the number of active microphones. The unnecessary filter chain blocks are disabled at runtime.

The microphones' clock $F_S$ determines the input rate and, therefore, how fast the filter stage should operate. The low operating frequency for current FPGAs allows interesting power savings [17].

Every pair of microphones has its PDM output signal multiplexed in time. Thus, at every edge of the clock cycle, the output is the sampled data from one of the microphones. The PDM demultiplexing is the first operation to obtain the individual sampled data from each microphone. This task is done in the PDM splitter block.

The next component consists of a cascade of filters to filter and to downsample each microphone signal. Traditional digital filters, such as the Finite Impulse Response (FIR) type of filters, are a good solution to reduce the signal bandwidth and to remove the higher frequency noise. Once the signal is filtered, it can be decimated to decrease the oversampling to a reasonable audio quality rate (e.g., 48 kHz). However, this filter consumes many adders and dedicated multipliers (DSPs) from the FPGA resources, particularly if its order increases.

The Cascaded Integrator-Comb (CIC) filter is an alternative for low-pass filtering techniques, which has been developed in [18, 19] and involves only additions and subtractions. This type of filter consists of 3 stages: the integrating stage, the decimator stage, and the comb section. PDM samples are recursively added in the integrating stage while being recursively subtracted, with a differential delay, in the comb stage. The number of recursive operations in both the integrating and comb sections determines the order of the filter ($N_{CIC}$) and should at least be equal to the order of the sigma-delta converter from the DAC of the microphones. After the CIC filter, the signal growth ($G$) is proportional to the decimation factor ($D_{CIC}$) and the differential delay (DD) and grows exponentially with the filter order [19]:

$$G = (D_{CIC} \cdot DD)^{N_{CIC}}. \qquad (13)$$

The output bit width grows proportionally to $G$. Denote by $B_{in}$ the number of input bits; then the number of output bits $B_{out}$ is as follows:

$$B_{out} = \lceil N_{CIC} \cdot \log_2(D_{CIC} \cdot DD) + B_{in} \rceil. \qquad (14)$$
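Equations (13) and (14) can be evaluated directly to size the datapath. The short sketch below computes the CIC gain and output bit width for the parameter values of the example design described later in the paper (fourth-order CIC, decimation factor 16, differential delay 32) and a 1-bit PDM input; it is a numerical illustration rather than part of the implementation.

```python
import math

def cic_gain(D_cic, DD, N_cic):
    """Signal growth G = (D_CIC * DD)^N_CIC, as in (13)."""
    return (D_cic * DD) ** N_cic

def cic_output_bits(B_in, D_cic, DD, N_cic):
    """Output bit width B_out = ceil(N_CIC * log2(D_CIC * DD) + B_in), as in (14)."""
    return math.ceil(N_cic * math.log2(D_cic * DD) + B_in)

print(cic_gain(16, 32, 4))             # 68719476736
print(cic_output_bits(1, 16, 32, 4))   # 37 bits before truncation
```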

The proposed CIC decimation filter eliminates higher frequency noise components and decimates the signal by $D_{CIC}$ at the same time. However, a major disadvantage of this filter is the nonflat frequency response in the desired audio frequency range. In order to improve the flatness of the frequency response, a CIC filter with a lower decimation factor followed by a compensation FIR filter is often chosen, like in [20–22].

The CIC filter is followed by an averager, which is used to cancel out the effects caused by the microphones' DC offset output, leading to a constant offset in the beamforming values. This block improves the dynamic range, reducing the bit width required to represent the data after the CIC.

The last component of each filter chain is a low-pass compensation FIR filter based on a Kaiser window. This filter equalises the passband drop usually introduced by CIC filters [19]. It additionally performs a low rate change. The proposed filter also needs a cut-off frequency of $F_{max}$ at a sampling rate of $F_s/D_{CIC}$, which is the sampling rate obtained after the CIC decimator filter with a decimation factor of $D_{CIC}$. This low-pass FIR filter is designed in a serial fashion to reduce the resource consumption.

Figure 9: Details of the internal structure of the proposed modular Filter-and-Sum beamforming. Each subarray has its own set of delay memories (one per filtered microphone signal) and a local sum; the partial sums of the four subarrays are then combined to obtain the power value per angle. Note that the delay values per orientation are stored in a precomputed table.

In fact, the FIR filter order is also determined by $D_{CIC}$. Thanks to the streaming nature of the architecture, the CIC filter is able to generate an output value every clock cycle. Due to the decimation factor, only one output value per $D_{CIC}$ input values is propagated to the low-pass FIR filter. Therefore, the FIR filter has $D_{CIC}$ clock cycles to compute each input value, which determines its maximum order. The filtered signal is then further decimated by a factor of $D_{FIR}$ to obtain a minimum bandwidth BW = $2 \cdot F_{max}$ of the audio signals to satisfy the Nyquist theorem. The overall $D_F$ can be expressed based on the rate change of each filter:

$$D_F = D_{CIC} \cdot D_{FIR}. \qquad (15)$$
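Because $D_F$ must decompose into the two per-filter factors of (15), enumerating the candidate splits is straightforward. The helper below is a sketch; the lower bound of 4 on both factors mirrors the threshold used later in the design-space exploration (Section 6.1), and the function name is illustrative.

```python
def decimation_splits(D_F, min_factor=4):
    """List the (D_CIC, D_FIR) pairs with D_CIC * D_FIR == D_F.

    Prime values of D_F return an empty list, matching the cases
    discarded in Section 6.1.
    """
    splits = []
    for d_cic in range(min_factor, D_F // min_factor + 1):
        if D_F % d_cic == 0 and D_F // d_cic >= min_factor:
            splits.append((d_cic, D_F // d_cic))
    return splits

print(decimation_splits(64))   # [(4, 16), (8, 8), (16, 4)]
```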

4.2. Beamforming Stage. As detailed before, the main purpose of the beamforming operation is to focus the MEMS microphone array on one particular direction. The detection of sound sources is possible by continuously steering in loops of 360°. The number of orientations $N_o$ determines the angular resolution. Higher angular resolutions demand not only a larger execution time per steering loop but also more FPGA memory resources to store the precomputed delays per orientation.

The beamforming stage depends on the number of microphones and subarrays. Although Filter-and-Sum beamforming assumes a fixed number of microphones and a fixed geometry, our scalable solution satisfies those restrictions while offering a flexible geometry. Figure 9 shows our proposed Filter-and-Sum based beamformer. This stage is basically composed of FPGA blocks of memory (BRAM) in ring-buffer fashion that properly delay the filtered microphone signals. The values of the delays at a given moment depend on the focus orientation at that moment and are determined by the array pattern $W(w_n, \theta_0, \theta)$ from (5). The delay for a given microphone is determined by its position on the array and by the focus orientation. All possible delay values per microphone for each beamed orientation are precomputed, grouped per orientation, and stored in ROMs during compilation time. During execution time, the delay values $\Delta_m(\theta)$ of each microphone $m$ when pointing to a certain orientation $\theta$ are obtained from this precomputed table.
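A possible offline computation of such a table is sketched below. It assumes 2D microphone coordinates in metres, the decimated sampling rate, and the far-field model described earlier in the paper, where the delay of microphone $m$ is proportional to the projection of its position vector on the steering direction; the array layout, rate, and function name are placeholders rather than values taken from the actual design.

```python
import numpy as np

def delay_table(mic_xy, num_orientations, f_samp, c=343.2):
    """Integer sample delays, one row per orientation and one column per
    microphone, shifted to be non-negative as required by ring buffers."""
    mic_xy = np.asarray(mic_xy, dtype=float)                        # (M, 2) positions in metres
    angles = 2 * np.pi * np.arange(num_orientations) / num_orientations
    steering = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # (N_o, 2) unit vectors
    tau = steering @ mic_xy.T / c                                   # time delays in seconds, (N_o, M)
    samples = np.round(tau * f_samp).astype(int)
    samples -= samples.min(axis=1, keepdims=True)                   # make every delay non-negative
    return samples

# Example: 4 microphones on a 4 cm radius, 64 orientations, 31.25 kHz rate.
mics = 0.04 * np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
print(delay_table(mics, 64, 31250).shape)   # (64, 4)
```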

The beamforming stage is designed to support a variable number of microphones. This is enabled by grouping the input signals following their subarray structure. Therefore, instead of implementing one single Filter-and-Sum of 52 microphones, there are four Filter-and-Sum operations in parallel for the 4, 8, 16, and 24 microphones. Their sum operation is firstly done locally for each subarray and afterwards between subarrays. The only restriction of this modular beamforming is the synchronization of the outputs in order to have them properly delayed. Therefore, the easiest solution is to delay all the subarrays with the maximum delay of the subarrays.

Figure 10: The power stage consists of a couple of components to calculate the P-SRP and the estimated location of the sound-source: the power value per angle followed by peak detection.

Although the output of some subarrays is already properly delayed, additional delays, shown in the Sums section in Figure 9, are inserted to assure that the proper delay of each subarray has been obtained. This is achieved by using the valid output signals of each subarray beamforming without additional resource cost. Consequently, only the Filter-and-Sum beamforming module linked to an active subarray is enabled. The inactive beamformers are set to zero in order to avoid any negative impact on the beamforming operation.

A side benefit of this modular approach is a reduction of the memory resource consumption. Since each subarray has its ring-buffer memory properly dimensioned to its maximum sample delay, the portion of underused regions of the consumed memories is significantly low.

4.3. Power Stage. Figure 10 shows the components of the power stage. Once the filtered data has been properly delayed and added for a particular orientation $\theta$, $P(\theta, t)$ is calculated following (10). The P-SRP is obtained after a steering loop, allowing the determination of the sound sources. The sound-source is estimated to be located in the direction shown by the peak of the polar power map, which corresponds to the orientation with the maximum $P(\theta, t)$.
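In software terms, the power stage reduces each orientation's beamformed samples to a single power value and then selects the orientation with the largest one. The sketch below assumes the beamformed output of a full steering loop is available as an $(N_o \times N_s)$ array and uses the mean squared output as a stand-in for the power definition of (10), which is given earlier in the paper.

```python
import numpy as np

def p_srp(beamformed, sweep_degrees=360.0):
    """Return the power per orientation and the bearing of the P-SRP peak.

    `beamformed` holds one row of Filter-and-Sum output samples per
    steered orientation; the power per orientation is taken here as the
    mean squared output over the sensing window.
    """
    power = np.mean(np.asarray(beamformed, dtype=float) ** 2, axis=1)
    peak = int(np.argmax(power))
    return power, sweep_degrees * peak / power.size
```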

5. Performance Analysis of the Filter-and-Sum Based Architecture

A performance analysis of the proposed architecture is presented in this section. The analysis shows how the design parameters, such as the filters' characteristics, affect the final execution time of the sound-source locator. The links between performance and design parameters are explained, followed by the description of the different acceleration strategies. These strategies can be considered standalone or combined for certain timing constraints. The advantages of these strategies are later presented in Section 6.

5.1. Time Parameters. The overall execution time of the proposed architecture is defined by the latency of the main components. A detailed analysis of the implementation of the components and the latency that they incur provides a good insight into the speed of the system (Table 2). The operation frequency of the design can be assumed to be the same as the sampling frequency. Let us define $t_{P\text{-}SRP}$ as the overall execution time in clock cycles required to obtain the P-SRP.

Table 2: Relevant parameters involved in the performance calculation for the proposed architecture.

Parameter | Definition
$t_s$ | Sensing time
$t_o$ | Execution time of one orientation
$N_o$ | Number of orientations
$L_o$ | Latency of the system
$t_{P\text{-}SRP}$ | Time required to obtain a polar power map
$t^{II}_{\text{filters}}$ | Initiation interval of the filter stage
$t_{\text{filters}}$ | Execution time of the filter stage
$t^{II}_{\text{beamforming}}$ | Initiation interval of the beamforming stage
$t_{\text{beamforming}}$ | Execution time of the beamforming stage
$t^{II}_{\text{power}}$ | Initiation interval of the power stage
$t_{\text{power}}$ | Execution time of the power stage
$t_{II}$ | Sum of all initiation intervals
$t^{II}_{CIC}$ | Initiation interval of the CIC filter
$t^{II}_{DC}$ | Initiation interval of the DC-removal block
$t^{II}_{FIR}$ | Initiation interval of the FIR filter
$t^{II}_{Delay}$ | Initiation interval of the delay memories
$t^{II}_{Sum}$ | Initiation interval of the cascaded sums
$t^{II}_{Power}$ | Initiation interval of the power calculation

Thus, $t_{P\text{-}SRP}$ is defined as

$$t_{P\text{-}SRP} = N_o \cdot t_o = N_o \cdot (t_{\text{filters}} + t_{\text{beamforming}} + t_{\text{power}}), \qquad (16)$$

where $t_o$ is the execution time of one orientation and is determined by the execution time of the filter stage ($t_{\text{filters}}$), the execution time of the beamforming ($t_{\text{beamforming}}$), and the execution time of the power stage ($t_{\text{power}}$), which are the main components of the system, as explained in the previous section. The proposed architecture is designed to pipeline each stage, overlapping the execution of each component of the design. Therefore, only the initial latency or initiation interval (II) of the components needs to be considered, since it corresponds to the system group delay.

Let us assume that the design operates at the same frequency $F_S$ as the microphones; then (16) can be rearranged as follows:

$$t_{P\text{-}SRP} = N_o \cdot \frac{L_o}{F_S} = N_o \cdot \left(t^{II}_{\text{filters}} + t^{II}_{\text{beamforming}} + t^{II}_{\text{power}} + t_s\right), \qquad (17)$$

where $L_o$ is the latency of the system, determined by the initiation interval of the filter stage ($t^{II}_{\text{filters}}$), the initiation interval of the beamforming stage ($t^{II}_{\text{beamforming}}$), and the initiation interval of the power stage ($t^{II}_{\text{power}}$). The time during which the microphone array is monitoring one particular orientation is known as $t_s$. This is the time required to calculate a certain number of output samples ($N_s$). As previously detailed, the digital microphones oversample the audio signal by operating at $F_S$. The reconstruction of the audio signal in the target range demands a certain level of decimation $D_F$.

Figure 11: Timing analysis of the pipelined execution of the components. The initiation intervals of the CIC filter, DC-removal block, FIR filter, delay memories, cascaded sums, and power calculation ($t^{II}_{CIC}$, $t^{II}_{DC}$, $t^{II}_{FIR}$, $t^{II}_{Delay}$, $t^{II}_{Sum}$, $t^{II}_{Power}$) precede the sensing time $t_s$.

This level of decimation is done by the CIC and the FIR filter in the filter stage, with decimation factors $D_{CIC}$ and $D_{FIR}$, respectively. Based on $D_F$ defined in (1), the time $t_s$ is expressed as follows:

$$t_s = \frac{D_F \cdot N_s}{F_S} = \frac{\lceil F_S / BW \rceil \cdot N_s}{F_S} \approx \frac{N_s}{2 \cdot F_{max}}. \qquad (18)$$

The II of each stage of the implementation can also be further decomposed based on the latency of its components:

$$t^{II}_{\text{filters}} = t^{II}_{CIC} + t^{II}_{DC} + t^{II}_{FIR},$$
$$t^{II}_{\text{beamforming}} = t^{II}_{Delay} + t^{II}_{Sum}, \qquad (19)$$

where $t^{II}_{i}$ is the initiation interval of each component $i$. Therefore, $t_{II}$ is defined as the sum of all the initiation intervals:

$$t_{II} = t^{II}_{CIC} + t^{II}_{DC} + t^{II}_{FIR} + t^{II}_{Delay} + t^{II}_{Sum} + t^{II}_{Power}. \qquad (20)$$

Equation (16) can be rearranged (see Figure 11) as

$$t_{P\text{-}SRP} = N_o \cdot (t_{II} + t_s). \qquad (21)$$

The execution time $t_{P\text{-}SRP}$ is determined by $N_o$ and $N_s$, since the level of decimation is determined by the target frequency range, and $t_{II}$ is determined by the components' design. Although most of the latency of each component of the design is hidden thanks to the pipelined operation, there are still some cycles dedicated to initializing the components. A detailed analysis of $t_{II}$ provides valuable information about the performance leaks.

CIC. The initiation interval of the CIC filter represents the time required to fill the integrator and the comb stages. Therefore, the order of the CIC filter ($N_{CIC}$) determines $t^{II}_{CIC}$:

$$t^{II}_{CIC} = \frac{2 \cdot N_{CIC} + 1}{F_S}. \qquad (22)$$

DC. The component which must remove the DC level of the signal introduces a minor initial latency due to its internal registers. Since it needs at least two input values to calculate the DC level, it also depends on $D_{CIC}$:

$$t^{II}_{DC} = \frac{D_{CIC} + 2}{F_S}. \qquad (23)$$

FIR. The initiation interval of the FIR filter is also determined by the order of this filter ($N_{FIR}$). Since the filter operation is basically a convolution, the initial output values are not correct until at least the $\lceil (N_{FIR}+1)/2 \rceil$-th input signal of the filter. Because the filters are cascaded, $D_{CIC}$ also affects $t^{II}_{FIR}$:

$$t^{II}_{FIR} = \frac{D_{CIC} \cdot \left(\lceil (N_{FIR}+1)/2 \rceil + 1\right)}{F_S}. \qquad (24)$$

Therefore, $t^{II}_{\text{filters}}$ is expressed as follows:

$$t^{II}_{\text{filters}} = t^{II}_{CIC} + t^{II}_{DC} + t^{II}_{FIR} = \frac{2 \cdot N_{CIC} + D_{CIC} \cdot \left(2 + \lceil (N_{FIR}+1)/2 \rceil\right) + 3}{F_S}. \qquad (25)$$

Delay. The beamforming operation is done through memories which properly delay the audio samples for a particular orientation. The maximum number of samples determines the minimum size of these delay memories. This value represents the maximum distance between a pair of microphones for a certain microphone array distribution and may vary for each orientation. The initiation interval of the Filter-and-Sum beamformer is therefore expressed as the maximum delay between pairs of microphones for a particular orientation:

$$t^{II}_{Delay} = \frac{\max(\Delta_{am}(\theta)) \cdot D_F}{F_S}, \qquad (26)$$

where $\max(\Delta_{am}(\theta))$ is the maximum time delay of the active microphones for the beamed orientation $\theta$. Therefore, $t^{II}_{Delay}$ is mainly determined by the microphone array distribution, $F_S$, and the target frequencies determining $D_F$. Due to the symmetry of the microphone array, and for the sake of simplicity, it is assumed that each orientation has the same $\max(\Delta_{am})$. Notice that this does not need to be true for different array configurations.


Sum. The proposed beamforming is composed of not only a set of delay memories but also a sum tree. The initiation interval of this component is defined by the number of active microphones ($N_{am}$):

$$t^{II}_{Sum} = \frac{\lceil \log_2(N_{am}) \rceil}{F_S}. \qquad (27)$$

Therefore, $t^{II}_{\text{beamforming}}$ is expressed as follows:

$$t^{II}_{\text{beamforming}} = t^{II}_{Delay} + t^{II}_{Sum} = \frac{\max(\Delta_{am}(\theta)) \cdot D_F + \lceil \log_2(N_{am}) \rceil}{F_S}. \qquad (28)$$

Power. The final component is the calculation of the power per orientation. This simple component has a constant latency of a couple of clock cycles:

$$t^{II}_{Power} = \frac{2}{F_S}. \qquad (29)$$
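The initiation intervals (22) to (29) and the resulting $t_{P\text{-}SRP}$ of (21) can be evaluated together in a few lines. The sketch below follows the formulas above literally; parameter names follow Table 2, and the maximum delay is assumed to be expressed in samples at the decimated rate, which is an interpretation of (26) rather than something stated explicitly in the text.

```python
import math

def timing_model(F_S, N_CIC, D_CIC, N_FIR, max_delay_samples, N_am, D_F, N_s, N_o):
    """Return (t_II, t_s, t_P-SRP) in seconds for the unoptimized pipeline."""
    t_cic = (2 * N_CIC + 1) / F_S                              # (22)
    t_dc = (D_CIC + 2) / F_S                                   # (23)
    t_fir = D_CIC * (math.ceil((N_FIR + 1) / 2) + 1) / F_S     # (24)
    t_delay = max_delay_samples * D_F / F_S                    # (26)
    t_sum = math.ceil(math.log2(N_am)) / F_S                   # (27)
    t_power = 2 / F_S                                          # (29)
    t_II = t_cic + t_dc + t_fir + t_delay + t_sum + t_power    # (20)
    t_s = D_F * N_s / F_S                                      # (18)
    return t_II, t_s, N_o * (t_II + t_s)                       # (21)
```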

The timing analysis of the initiation interval of each component of the architecture gives an idea about the design parameters with the highest impact. The definition of the filters, mainly their order, is determined by the application specifications, so it should not be modified to reduce the overall execution time. On the other hand, the distribution of the microphones in the array affects not only the frequency response of the system but also the execution time. Notice, however, that the number of microphones does not have a timing impact. Only the number of active microphones has a minor impact, in terms of a couple of clock cycles of difference. Nevertheless, (21) already shows that the dominant parameters are $t_s$ and $N_o$.

5.2. Sensitive Parameters. The timing analysis provides an indication of the parameters dominating the execution time. Some parameters, like the microphone array distribution which determines the beamforming latency, are fixed, while others, like $N_o$ or $t_s$ per orientation, are variable.

Orientations. Figure 5 depicts how an increment of $N_o$ leads to a better sound-source localization. This resolution, however, has a high repercussion on the response time. A simple strategy is to maintain the angular resolution only where it is needed while quickly exploring the surrounding sound field. For instance, the authors in [3] propose a strategy to reduce the beamforming exploration to 8 orientations with an angular separation of 45 degrees. Once a steering loop ends, the orientations are rotated one position, which represents a shift operation in the precomputed orientation table. Therefore, all the supported 64 orientations are monitored after 8 steering loops. Despite this strategy intending to accelerate the peak detection by monitoring the minimum $N_o$, the overall $N_o$ remains the same for achieving the equivalent angular resolution.
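The rotation of the monitored subset amounts to a simple index shift into the precomputed orientation table. The sketch below reproduces the 8-out-of-64 scheme of [3] in plain Python, with illustrative names; it only illustrates the indexing, not the hardware shift operation itself.

```python
def rotating_orientations(total=64, monitored=8):
    """Yield, per steering loop, the indices of the monitored orientations.

    Each loop shifts the subset by one position, so all `total`
    orientations are visited after total // monitored steering loops.
    """
    step = total // monitored
    for shift in range(step):
        yield [(shift + k * step) % total for k in range(monitored)]

for loop, subset in enumerate(rotating_orientations()):
    print(loop, subset)   # loop 0 -> [0, 8, ..., 56], loop 1 -> [1, 9, ..., 57], ...
```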

Sensing Time. The sensing time is a well-known parameter of radio frequency applications. The time $t_s$ is known to strengthen the robustness against noise [23]. In our case, the time a receiver is monitoring the surrounding sound field determines the probability of properly detecting a sound-source. Consequently, a higher $t_s$ is needed to detect and locate sound sources under low Signal-to-Noise Ratio (SNR) conditions. Despite the fact that this term could be modified at runtime to adapt the sensing of the array based on an estimated SNR, it would demand a continuous SNR estimation, which is out of the scope of this paper.

To conclude, Table 2 summarizes the timing definitions. On one hand, $t_s$ determines the number of processed acoustic samples and therefore directly affects the sensing of the system. On the other hand, $N_o$ determines the angular resolution of the sound-source search and influences the accuracy. There is a trade-off between $t_s$ and $N_o$ and the quality of the sound-source location.

5.3. Strategies for Time Reduction. The following three strategies are proposed to accelerate the sound-source localization without any impact on the frequency response and $D_P$ of the architecture. An additional strategy is proposed especially for dynamic acoustic environments, but with a certain accuracy cost.

5.3.1. Continuous Processing. The proposed architecture is designed to reset the filter and beamforming stages after $t_o$ due to the orientation transition. Thanks to beamforming after the filter stage, the system can continue processing while resetting: the filter stage does not need to stop its processing. The input data is not lost due to the reset operations, since the filtered input values are stored in the beamforming stage. Furthermore, the initialization of the beamforming stage can also be eliminated, since the stored data from the previous orientation can be reused for the calculation of the new one. With this approach, (17) becomes as follows:

$$t_{P\text{-}SRP} = t^{II}_{\text{filters}} + t^{II}_{\text{beamforming}} + N_o \cdot (t^{II}_{\text{power}} + t_s) \approx t_{II} + N_o \cdot t_s. \qquad (30)$$

5.3.2. Time Multiplexing. Nowadays, FPGAs can operate at clock speeds of hundreds of MHz. Despite the fact that the power consumption is significantly lower when operating at low frequency [17], the proposed architecture is able to operate at a much higher frequency than the data sampling rate. This capability provides the opportunity to parallelize the beamforming computations without any additional resource consumption. Instead of consuming more logic resources by replicating the main operations, the proposed strategy, similar to Time-Division Multiplexing in communications, consists in time multiplexing these parallel operations. Because the type of the input data is oversampled audio, the selection of the operations to be time multiplexed is limited. Based on (21), the candidates to be parallelized are $N_o$ and $t_s$. Since the input data rate is determined by $F_S$, (18) shows that $t_s$ cannot be reduced without decreasing $N_s$ or changing the target frequency range. Nevertheless, since the computation of each orientation is data independent, they can be parallelized.

Figure 12: Clock regions for the time multiplexing of the computation of multiple orientations ($N_o$). The microphone array and the filter stage operate at $F_S$, while the beamforming and power stages operate at $F_P$.

The simultaneous computation of multiple orientations is only possible after the beamforming operation. Let us define $t^{II}_{P}$ as the monitoring time before being able to process multiple orientations in parallel. Therefore,

$$t^{II}_{P} = t^{II}_{CIC} + t^{II}_{DC} + t^{II}_{FIR} + t^{II}_{Delay}. \qquad (31)$$

After $t^{II}_{P}$, the delay memories which compose the Filter-and-Sum beamforming stage have already stored enough audio data to start locating the sound-source. Because the beamforming operation relies on delaying the recovered audio signal, multiple orientations can be computed in parallel by accessing the content of the delay memories at a higher speed than the sampling of the input data. It basically multiplexes the output beamforming computations over time. The required frequency $F_P$ to parallelize all $N_o$ orientations for this architecture is defined as follows:

$$F_P = \frac{F_S \cdot N_o}{D_F}. \qquad (32)$$

Due to (1), $F_P$ can also be expressed based on the target frequency range:

$$F_P \approx BW \cdot N_o. \qquad (33)$$

Notice that the required frequency to multiplex in time the computation of the orientations does not depend on the number of microphones in the array. Figure 12 shows the clock domains when applying this strategy. While the front-end, consisting of the microphone array and the filter stage, operates at $F_S$, the output of the beamforming is processed at $F_P$. The additional cost in terms of resources is the extension of the register for the power per angle calculation: a memory of $N_o$ positions is required instead of the single register used to store the accumulated power values. This strategy allows fully parallelizing the computation of all the orientations. Thus, $t_{P\text{-}SRP}$ is mainly limited by $N_o$ and the maximum reachable frequency of the design, since $F_S$ is determined by the microphones' operational frequency and $D_F$ by the frequency range of the target sound-source. In fact, $D_F$ determines how many orientations can be processed in parallel.
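Equations (32) and (33) translate directly into a quick feasibility check: given the attainable clock of the implementation, the number of orientations that can be multiplexed in time is bounded by the audio bandwidth. The numbers in the example below match the 30 kHz / 100 MHz case discussed in the next subsection; the function names are illustrative.

```python
def required_clock(F_S, N_o, D_F):
    """F_P needed to time multiplex N_o orientations, as in (32)."""
    return F_S * N_o / D_F

def max_orientations(F_op_max, BW):
    """Upper bound on N_o for a given attainable clock, following (33)."""
    return int(F_op_max // BW)

print(required_clock(2e6, 64, 64))        # 2.0 MHz: F_P equals F_S when N_o == D_F
print(max_orientations(100e6, 2 * 30e3))  # 1666 orientations at BW = 60 kHz
```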

5.3.3. Parallel Time Multiplexing. This proposed strategy is an extension of the previous one. The frequency $F_P$ is limited by the maximum attainable operating frequency of the implementation, which is determined by many factors, from the technology to the available resources on the FPGA. For instance, if $F_{max}$ equals 30 kHz and the maximum attainable operating frequency is 100 MHz, then up to 1666 orientations could be computed in parallel. However, if not all the resources of the FPGA are completely consumed, especially the internal blocks of memory (BRAM), there is still space for improvement. With the time multiplexing strategy, the memories of the beamforming stage are fully accessed, since in each clock cycle there is at least one memory access, or even two memory accesses when new data is stored. Therefore, more memory resources can be used to further accelerate the computation of the P-SRP. The simple replication of the beamforming stage, preconfigured for different orientations, is enough to double the number of processed orientations while maintaining the same $t_{P\text{-}SRP}$. This strategy mainly consumes BRAMs. Nevertheless, due to the value of $\max(\Delta_m)$ at BW for our microphone array, only a few audio samples are needed to complete the beamforming. This fact drastically reduces the memory consumption, which provides the potential computation of thousands of orientations by applying both strategies.

All strategies can be applied independently, despite the fact that some will only work properly when combined. Not all strategy combinations are beneficial. For instance, a dynamic angular resolution should only be combined with the time multiplexing of the orientations when $F_P$ is higher than $F_S$. Otherwise, the reduction of $N_o$ by dynamically readjusting the target orientations does not provide any acceleration and would only degrade the response of the system.

6. Results

The proposed architecture is evaluated in this section. Our analysis starts by evaluating different design solutions based on the timing analysis introduced in Section 5.1. One representative configuration is evaluated based on the frequency response and accuracy by using the metrics described in Section 3.5. This evaluation also considers sensitive parameters such as the number of active subarrays and the relevance of $N_o$, already introduced in Section 5.2.

Figure 13: Minimum values of $t_o$ (in ms) based on $F_S$ (sample rate, in MHz) and $F_{max}$ (in kHz). Different perspectives are displayed in the bottom figures. Notice how the shortest $t_o$ is obtained when increasing $F_{max}$ and $F_S$.

The resource and the power consumption for a Zynq 7020 target FPGA are also presented. Finally, the strategies presented in Section 5.3 are applied to the representative design.

6.1. General Performance Analysis. The proposed performance analysis from the previous section is here applied to a concrete example. The explored design parameters are $F_S$ and $F_{max}$, keeping $N_s$ and $N_o$ both constant at 64. Whereas $F_S$ is determined by the microphone's sampling frequency, $F_{max}$ is determined by the target application. For our design space exploration, we consider an $F_{max}$ from 10 kHz to 16 kHz in steps of 125 Hz, and an $F_S$ ranging from 1.25 MHz to 3.072 MHz, as specified in [10].

Equations (16) to (18) and (20) to (32) are used to obtain $t_{P\text{-}SRP}$. The performance analysis starts by obtaining $D_F$ for every possible value of $F_S$ and $F_{max}$. All possible combinations of $D_{CIC}$ and $D_{FIR}$ are considered based on (15). The low-pass FIR filter parameters are $N_{FIR}$, which is determined by $D_{CIC}$, and $F_{max}$ as the cut-off frequency. Each possible low-pass FIR filter is generated considering a transition band of 2 kHz and an attenuation of at least 60 dB in the stop band. If the minimum order of the filter is higher than $N_{FIR}$, the filter is discarded. We consider these parameters as realistic constraints for low-pass FIR filters. Furthermore, a minimum order of 4 is defined as a threshold for $N_{FIR}$. Thus, some values are discarded because $D_F$ is a prime number or $N_{FIR}$ is below 4. Each low-pass FIR filter is generated and evaluated in Matlab 2016b.
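A compact version of this exploration can be scripted. The sketch below mirrors the described procedure: it derives $D_F$ from $F_S$ and $F_{max}$, enumerates the $(D_{CIC}, D_{FIR})$ splits, and estimates the minimum Kaiser-window FIR order for the stated 2 kHz transition band and 60 dB attenuation. The comparison of that order against the budget set by $D_{CIC}$ is a simplification of the Matlab-based procedure, so the numbers it yields should be taken as indicative only.

```python
from scipy.signal import kaiserord

def candidate_splits(F_S, F_max, trans_band=2e3, atten_db=60, min_factor=4):
    """For one (F_S, F_max) pair, list (D_CIC, D_FIR, minimum FIR order)."""
    BW = 2 * F_max
    D_F = round(F_S / BW)                     # overall decimation factor
    results = []
    for D_CIC in range(min_factor, D_F // min_factor + 1):
        if D_F % D_CIC:
            continue
        D_FIR = D_F // D_CIC
        if D_FIR < min_factor:
            continue
        fir_rate = F_S / D_CIC                # sampling rate seen by the FIR filter
        numtaps, _ = kaiserord(atten_db, trans_band / (0.5 * fir_rate))
        results.append((D_CIC, D_FIR, numtaps - 1))
    return D_F, results

print(candidate_splits(2e6, 15625))           # D_F = 64 and its factorizations
```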

Figure 13 depicts the minimum timings of the DSE that the proposed Filter-and-Sum architecture needs to compute one orientation. $t_o$ is slightly reduced when varying $F_S$; for instance, it is reduced from 5.03 ms to 3.97 ms when $F_{max}$ = 10 kHz. A higher $F_S$ means a faster sampling, which is in fact the operational frequency limiting factor. Furthermore, a higher decrement of $t_{P\text{-}SRP}$ is produced when increasing $F_S$ and $F_{max}$. Higher values of $F_{max}$ allow higher values of $D_{CIC}$, which can greatly reduce the computational complexity of narrowband low-pass filtering. However, too high values of $D_{CIC}$ lead to such low rates that, although a higher order low-pass FIR filter is supported, it cannot satisfy the low-pass filtering specifications. Notice how the number of possible solutions decreases when increasing $F_{max}$. Due to the $F_S$ and $F_{max}$ ranges, the values of $D_F$ vary between 39 and 154, though, as previously explained, many values cannot be considered since they are either prime numbers or their decomposition in factors leads to values of $D_{CIC}$ below 4. Because higher values of $F_{max}$ lead to low values of $D_{CIC}$ for low $F_S$, these $D_{CIC}$ values cannot satisfy the specifications of the low-pass FIR filter.

Finally, relatively low values of $t_{P\text{-}SRP}$ are obtained for $F_{max}$ values from 10 kHz to 10.65 kHz and $F_S$ ranging from 2.7 MHz to 3.072 MHz. This is produced by high values of $D_{CIC}$, which means that a higher order low-pass FIR filter is supported. As expected, high values of $D_{CIC}$ lead to high order low-pass FIR filters and a lower $D_{FIR}$. A lower $t_{P\text{-}SRP}$ is possible thanks to avoiding unnecessary computations, since fewer samples are decimated after the low-pass FIR filter.

6.2. Analysis of a Design. As shown in Figure 13, several design considerations drastically affect the final performance.

Figure 14: Waterfall diagrams of the proposed architecture (normalized power per sound-source frequency versus angle of arrival). The figures are obtained by enabling only a certain number of subarrays: from (a) to (d), only the 4 innermost microphones, only the 12 innermost microphones, the 28 innermost microphones, and all 52 microphones.

Table 3: Configuration of the architecture under analysis.

Parameter | Definition | Value
$F_s$ | Sampling frequency | 2 MHz
$F_{min}$ | Minimum frequency | 1 kHz
$F_{max}$ | Maximum frequency | 15.625 kHz
BW | Minimum bandwidth to satisfy Nyquist | 31.25 kHz
$D_F$ | Decimation factor | 64
$D_{CIC}$ | CIC filter decimation factor | 16
$N_{CIC}$ | Order of the CIC filter | 4
$D_{FIR}$ | FIR filter decimation factor | 4
$N_{FIR}$ | Order of the FIR filter | 16

However, most of these design decisions do not have a significant impact on the system response compared to other factors, such as the number of active microphones or the number of orientations. The analysis of the impact of these parameters on the system's response and performance is done over one particular design.

Table 3 summarizes the configuration of the architecture. The design considers $F_s$ = 2 MHz, which is the clock for the microphones and the functional frequency of the design. This value of $F_s$ is the intermediate value between the required clock signals of the ADMP521 microphones [10]. The selected cut-off frequency is $F_{max}$ = 15.625 kHz, which leads to $D_F$ = 64. In this example design, $N_{CIC}$ = 4, with a decimation factor of 16 and a differential delay of 32. The chosen FIR filter has a beta factor of 2.7 and a cut-off frequency of $F_{max}$ at a sampling rate of 125 kHz, which is the sampling rate obtained after the CIC decimator filter with $D_{CIC}$ = 16. The filtered signal is then further decimated by a factor $D_{FIR}$ = 4 to obtain a BW = 31.25 kHz audio signal.

The architecture is designed to support a complete steering loop of up to 64 orientations, which represents an angular resolution of 5.625°. On the other hand, the subarray approach allows activating all 52 microphones when the 4 subarrays are active. The final results are obtained by assuming a speed of sound of ≈343.2 m/s.
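The internal consistency of Table 3 can be checked in a few lines; the snippet below only re-derives quantities stated in the table and adds one illustrative derived value (the acoustic path-length difference corresponding to one sample delay at the decimated rate), which is not part of the original analysis.

```python
F_S = 2_000_000            # microphone clock (Hz)
F_max = 15_625             # cut-off frequency (Hz)
BW = 2 * F_max             # 31.25 kHz audio bandwidth
D_F = F_S // BW            # overall decimation factor -> 64
D_CIC, D_FIR = 16, 4       # split used by the example design
assert D_F == D_CIC * D_FIR

angular_resolution = 360 / 64          # 5.625 degrees for 64 orientations
path_per_sample = 343.2 / BW           # ~11 mm of path difference per sample delay
print(D_F, angular_resolution, round(path_per_sample * 1000, 1))
```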

6.2.1. Frequency Response. The waterfall diagrams of Figure 14 show the power output of the combined subarrays in all directions for all frequencies. In our case, the results are calculated with a single sound-source varying between 100 Hz and 15 kHz in steps of 100 Hz and placed at 180°. All results are normalized per frequency. Every waterfall shows a clear distinctive main lobe. When only subarray 1 is active, there are side lobes at 5.3 kHz and 10.6 kHz which impede the sound-source location for those frequencies. The frequency response of the subarrays improves when they are combined, since their frequency responses are superposed.

Figure 15: Directivities ($D_P$) versus sound-source frequency when considering a variable number of orientations and active microphones. From (a) to (d): $D_P$ with only 8 orientations up to 64 orientations in (d).

The combination of subarrays 1 and 2 reaches a minimum detectable frequency of 3.1 kHz, while combining subarrays 1, 2, and 3 and all subarrays reaches 2.1 kHz and 1.6 kHz, respectively. These minimum values are clearly depicted in Figure 15 with a threshold of 8 for $D_P$, which indicates that the main lobe's surface corresponds to maximally half of a quadrant. The frequency response of the combination of subarrays has a strong variation at the main lobe and therefore in $D_P$. Figure 15 depicts the evolution of $D_P$ when increasing the angular resolution and when combining subarrays. The angular resolution determines the upper bound to which $D_P$ converges, which is dependent on the number of orientations. The number of active microphones, on the other hand, influences how fast $D_P$ converges to its upper limit. Consequently, the number of active microphones determines the minimum frequency which can be located when considering a threshold of 8 for $D_P$.

Alongside the directivity, other metrics such as the main beamwidth and the MSL levels are also calculated to properly evaluate the quality of the array's response. Figure 16 depicts the MSL when varying the number of active subarrays and the number of orientations. A low angular resolution leads to a lower resolution of the waterfall diagrams, but only the metrics can show the impact. At frequencies between 1 and 3 kHz, the main lobe converges to a unit circle, which can be explained by the lack of any side lobe. Higher frequencies present secondary lobes, especially when only the inner subarray is active, which increases the MSL values independently of the angular resolution. A low angular resolution leads to unexpectedly low values of MSL, since the secondary lobes are not detected. On the other hand, a higher number of active microphones leads to lower values of MSL, independently of the angular resolution.

Figure 17 depicts the $BW_{-3\,dB}$ metric for a similar analysis of the number of microphones and the angular resolution.

Figure 16: Measured MSL (in dB) versus sound-source frequency when considering a variable number of orientations and active microphones. From (a) to (d): the MSL with only 8 orientations up to 64 orientations in (d).

On one hand, a higher number of microphones produces a faster decrement of $BW_{-3\,dB}$, reflected as a thinner main lobe. Nevertheless, the $BW_{-3\,dB}$ of each subarray converges to a minimum which is only reached at higher frequencies. The angular resolution determines this minimum, which ranges from 90° to 11.25° when 8 or 64 orientations are considered, respectively.

6.2.2. Resource Consumption and Power Analysis. Table 4 summarizes the resource consumption when combining subarrays. The consumed resources are divided into the resources for the filter stage, the beamforming stage, and the total consumption per group of subarrays. The filter stage mostly consumes DSPs, while the beamforming stage mainly demands BRAMs. Most of the resource consumption is dominated by the filter stage, since a filter chain is dedicated to each MEMS microphone. What determines the resource consumption is the number of active subarrays.

The flexibility of our architecture allows the creation of heterogeneous sound-source locators. Thus, the architecture can be scaled for small FPGAs based on the target sound-source profile or a particular desirable power consumption. For instance, the combination of the two inner subarrays would use 12 microphones while consuming less than 10% of the available resources. The LUTs are the limiting resource due to the internal registers of the filters. In fact, when all the subarrays are used, around 80% of the available LUTs are required. Nevertheless, any subarray can be disabled at runtime, which directly deactivates its associated filter and beamforming components. Although this does not affect the resource consumption, it has a direct impact on the power consumption.

Figure 17: Measured $BW_{-3\,dB}$ level (in degrees) versus sound-source frequency when considering a variable number of orientations and active microphones. From (a) to (d): the $BW_{-3\,dB}$ with only 8 orientations up to 64 orientations in (d).

Table 5 shows the power consumption, in mW, based on the number of active subarrays. The power consumption of the microphones is also considered, since the FPGA and the microphone array are powered from the same source. Thus, the overall power consumption must be considered, since the architecture is designed for an embedded system. The MEMS microphones are powered with 3.3 volts, which represents a power consumption per microphone of 2.64 μW and 3.96 mW for the inactive and active microphones, respectively. Notice how the power consumption increases with the number of active subarrays. There is a turning point when 3 or 4 subarrays are active: the microphone array consumes more power than the FPGA when all the subarrays are active.

6.2.3. Timing Analysis. The timing analysis, based on Section 5, of the design under evaluation is summarized in Table 6. A complete steering loop requires around 169.6 ms, while $t_o$ rounds to 2.6 ms. Notice that the initialization ($t_{II}$) consumes around 21.5% of the execution time. Fortunately, this initialization can almost be completely removed when applying the first strategy described in Section 5.3.1.

Table 7 summarizes the timing results when applying the first strategies proposed in Section 5. The elimination of the initialization after each orientation transition slightly reduces $t_{P\text{-}SRP}$. In this case, $t_{P\text{-}SRP}$ is expressed as follows:

$$t_{P\text{-}SRP} = t_{II} + N_o \cdot t_s. \qquad (34)$$

The main improvement is obtained after time multiplexing the computation of the power per orientation. In this case, $F_P$, the operational frequency of the beamforming computation needed to process all $N_o$ orientations in parallel, equals $F_S$, as expressed in (32). This is possible because $D_F$ and $N_o$ have the same value. Therefore, there is no need to have a different clock for the beamforming operation, since the spacing between output filtered values from the filter stage is large enough.


Table 4: Resource consumption after placement and routing when combining microphone subarrays. Each subarray combination details the resource consumption of the filter stage, the beamforming stage, and the total (Filters / Beamforming / Total).

Resources (Available) | Inner 4 MICs | Inner 12 MICs | Inner 28 MICs | All 52 MICs
Slice registers (106400) | 5043 / 626 / 6144 | 14859 / 1540 / 16882 | 34489 / 3195 / 38183 | 54042 / 4447 / 59093
Slice LUTs (53200) | 3612 / 344 / 4732 | 10759 / 754 / 12299 | 25032 / 1486 / 27318 | 37221 / 2221 / 42319
LUT-FF pairs (86689) | 2329 / 199 / 2773 | 7013 / 512 / 7779 | 16353 / 1069 / 17698 | 23656 / 1664 / 27619
BRAM (140) | 0 / 2 / 2 | 0 / 6 / 6 | 0 / 14 / 14 | 0 / 22 / 22
DSP48 (220) | 8 / 4 / 12 | 24 / 4 / 28 | 56 / 4 / 60 | 88 / 4 / 92

Table 5: Power consumption at $F_s$ = 2 MHz, expressed in mW, when combining microphone subarrays. Values obtained from the Vivado 2016.4 power report.

Subarrays | MEMS microphones (Active / Inactive / Total) | Reported on-chip power (Static / Dynamic / Total) | Total power
Inner 4 MICs | 15.84 / 0.13 / 15.97 | 120 / 2 / 122 | 137.97
Inner 12 MICs | 47.52 / 0.11 / 47.63 | 120 / 5 / 125 | 172.63
Inner 28 MICs | 110.88 / 0.06 / 110.94 | 121 / 11 / 132 | 242.94
All 52 MICs | 205.92 / 0 / 205.92 | 122 / 16 / 138 | 343.92

Table 6: Timing analysis, without any optimization, of the design under evaluation. The values are expressed in μs.

Parameter | Definition | Value [μs]
$t^{II}_{CIC}$ | Initiation interval of the CIC filter | 4.5
$t^{II}_{DC}$ | Initiation interval of the DC-removal block | 9
$t^{II}_{FIR}$ | Initiation interval of the FIR filter | 72
$t^{II}_{Delay}$ | Initiation interval of the delay memories | 480
$t^{II}_{Sum}$ | Initiation interval of the cascaded sums | 3.5
$t^{II}_{Power}$ | Initiation interval of the power calculation | 1
$t^{II}_{\text{filters}}$ | Initiation interval of the filter stage | 85.5
$t^{II}_{\text{beamforming}}$ | Initiation interval of the beamforming stage | 484.5
$t^{II}_{\text{power}}$ | Initiation interval of the power stage | 1
$t_{II}$ | Sum of all initiation intervals | 571
$t_s$ | Sensing time | 2048
$t_o$ | Execution time of one orientation | 2650
$t_{P\text{-}SRP}$ | Time required to obtain a polar power map | 169600

By combining the first two strategies, $t_{P\text{-}SRP}$ rounds to 2 ms, and only the first steering loop needs 2.6 ms due to $t^{II}_{P}$. In this case, $t_{P\text{-}SRP}$ is expressed as follows:

$$t_{P\text{-}SRP} = t^{II}_{P} + t_s \approx t_s. \qquad (35)$$

The other two strategies proposed in Section 5.3 are designed to fully exploit the FPGA resources and to overcome time constraints when considering a high angular resolution. In the first case, since the design under evaluation has a small angular resolution ($N_o$ = 64), there is no need for a higher $F_P$ when applying the time multiplexing strategy. However, a higher angular resolution can be obtained when considering the unconsumed resources, without additional timing cost. Table 8 shows that the combination of strategies increases the angular resolution without additional time penalty. The operational frequency ($F_{op}$) determines at what speed the FPGA can operate. Following (33), the beamforming operation can be exploited by increasing $F_P$ up to the maximum frequency, which increases $N_o$ as well:

$$\max(N_o) = \frac{\max(F_{op})}{BW} = \frac{F_P}{BW}. \qquad (36)$$

Many thousands of orientations can be computed in parallel when combining all strategies. The beamforming stage can be replicated as many times as the remaining available resources allow. Of course, this estimation is certainly optimistic, since the frequency drops when the resource consumption increases. Nevertheless, it provides an upper bound for $N_o$. For instance, when only the inner subarray is considered, the DSPs are the limiting component; however, up to 53 beamforming stages could theoretically be placed in parallel. When more subarrays are active, the BRAMs are the constraining component. Notice how the number of supported orientations increases if the number of subarrays decreases. It has, however, an impact on the frequency response and the accuracy of the system, as shown in Section 6.2.1. Nevertheless, tens of thousands of orientations can be computed in parallel, consuming only around 2 ms, by operating at the highest $F_{op}$ and by replicating the beamforming stage to exploit all the available resources.


Table 7: Timing analysis of the optimized designs when applying and combining the first two strategies. The values are expressed in ms.

 | Initial | Continuous | Time multiplexing | Continuous time multiplexing
$t_{P\text{-}SRP}$ | 169.6 ms | 131.6 ms | 2.6 ms | 2 ms

Table 8: Maximum $N_o$ when combining strategies. The maximum number of beamformers is obtained based on the available resources and the resource consumption of each beamformer (Table 4). The maximum $F_{op}$ is reported by the Vivado 2016.4 tool after placement and routing.

 | Continuous time multiplexing (Inner 4 / Inner 12 / Inner 28 / All 52 MICs) | Parallel continuous time multiplexing (Inner 4 / Inner 12 / Inner 28 / All 52 MICs)
Max beamformers | — / — / — / — | 55 / 23 / 10 / 6
Max $F_{op}$ | 95.62 / 93.27 / 91.97 / 87.91 MHz | 95.62 / 93.27 / 91.97 / 87.91 MHz
Max $N_o$ | 3059 / 2984 / 2943 / 2813 | 168292 / 68650 / 29430 / 16879

7. Conclusions

In this paper, we have presented a scalable and flexible architecture for fast sound-source localization. On one hand, the architecture can flexibly disable sections of the microphone array that are not needed or disable them to respect power restrictions. The modular approach of the architecture allows scaling the system for a larger or smaller number of microphones. Nevertheless, such capabilities do not impact the frequency response and accuracy of our sound-source locator. On the other hand, several strategies to offer real-time sound-source localization have been presented and evaluated. These strategies not only accelerate but also provide solutions for those time-stringent applications with a high angular resolution demand. Thousands of angles can be monitored in parallel, offering a high-resolution sound-source localization in a couple of milliseconds.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014–2020 through the ERDF-2020 Project ICITY-RDI.BRU.

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106–5109, IEEE, Dallas, Tex, USA, March 2010.
[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th IEEE SENSORS Conference, IEEE, Baltimore, MD, USA, November 2013.
[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.
[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340–345, IEEE, Qinhuangdao, China, September 2015.
[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869–885, 2011.
[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, "Hat-type hearing system using MEMS microphone array," 2016.
[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729–2733, IEEE, Brisbane, Australia, April 2015.
[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.
[9] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.
[10] Analog Devices, "ADMP521 datasheet: ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.
[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.
[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.
[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Brüel & Kjær, Denmark, 2004.
[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [PhD thesis], Brown University, Providence, RI, USA, 2000.
[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157–180, Springer, Berlin, Germany, 2001.
[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465–468, IEEE, Hoboken, NJ, USA, June 2012.
[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS), Stanford University, Stanford, Calif, USA, 2010.
[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155–162, 1981.
[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf7/48fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.
[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.
[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.
[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14–27, 2005.
[23] T. E. Bogale, L. Vandendorpe, and L. B. Le, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435–441, IEEE, Oulu, Finland, June 2014.



Figure 3: The proposed Filter-and-Sum beamforming filters and delays the output of each microphone before adding them together. (a) The acoustic wave received at each microphone is measured and filtered. The beamforming technique considers that the time Δ_m that the input signal takes to travel from microphone m to the origin is proportional to the projection of the microphone vector r⃗_m on the focus direction. (b) Δ_m is determined by the position of the microphone in the array and the desired focus direction θ of the array. Consequently, the signals coming from the same direction are amplified after the addition of the delayed inputs. Source: [9].

power per horizontal direction, which is done by a 360° sweep overview of the surrounding sound field. The directional power output of a microphone array, defined here as the polar steering response power (P-SRP), corresponds to the array's directional response to sound sources present in a sound field (Figure 4). The P-SRP is obtained by considering multiple broadband sources coming from different directions, for instance, human speech.

The output power when the microphone array is exposed to a broadband sound-source S(w) with an angle of incidence θ_0 can be modelled as

$O(\theta, S) = A_1 W(w_{n_1}, \theta_0, \theta) + A_2 W(w_{n_2}, \theta_0, \theta) + \cdots + A_n W(w_{n_n}, \theta_0, \theta)$  (6)

where A_i, with i ∈ {1, …, n}, is the amplitude of one of the n frequency components of S(w). The equation can be generalized to consider a sound field φ composed of multiple broadband sound sources at different locations and with uncorrelated noise:

$O(\theta, \phi) = O(\theta, S_1) + O(\theta, S_2) + \cdots + O(\theta, S_n) + \text{Noise}_{\text{uncorrelated}}$  (7)

The array's power output can be expressed as

$P(\theta, \phi) = |O(\theta, \phi)|^2$  (8)

since the power of a signal is the square of the array's output. Finally, the normalized power output is defined as the P-SRP:

$\text{P-SRP}(\theta, \phi) = \dfrac{P(\theta, \phi)}{\max_{\theta \in [0, 2\pi]} P(\theta, \phi)}$  (9)

The comparison of P(θ, φ) for different values of θ determines in which direction the sound-source is located, since the maximum power is obtained when the focus corresponds to the location of a sound-source.

The calculation of the P-SRP is usually defined in the frequency domain [14, 15], which requires the computation of a Fourier transform. Instead, we propose applying Parseval's theorem, which states that the sum of the squares of a function is equal to the sum of the squares of its transform. This theorem drastically simplifies the calculations, since the P-SRP can then be computed in the time domain. Let us define the sensing time (t_s) as the time the array is registering the previously defined sound field φ for each orientation. Therefore, the power P(θ, t_s) can be expressed as follows:

$P(\theta, t_s) = \dfrac{1}{t_s} \sum_{t=1}^{t_s} |O(\theta, t, \phi)|^2$  (10)

Consequently, the P-SRP can be expressed in the time domain by

$\text{P-SRP}(\theta, t_s) = \dfrac{P(\theta, t_s)}{\max_{\theta \in [0, 2\pi]} P(\theta, t_s)}$  (11)
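As a point of reference, the short Python sketch below (not the FPGA implementation; the array shapes and names are illustrative) computes Eqs. (10)-(11) directly from the beamformed output samples of every steered orientation:

```python
import numpy as np

def p_srp(beamformed, t_s):
    """beamformed: (N_o, >= t_s) array holding the delay-and-sum output
    O(theta, t) of each steered orientation; returns the normalized P-SRP."""
    o = beamformed[:, :t_s]
    power = np.mean(np.abs(o) ** 2, axis=1)   # P(theta, t_s), Eq. (10)
    return power / power.max()                # P-SRP, Eq. (11)

# Toy usage: 64 orientations, 64 output samples each; the argmax of the
# P-SRP gives the index of the estimated source orientation.
rng = np.random.default_rng(0)
srp = p_srp(rng.standard_normal((64, 64)), t_s=64)
print(srp.argmax())
```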


Figure 4: Examples of a polar map obtained under experimental conditions for sound sources of 5 kHz (a) and 8 kHz (b).

[Figure 5 panels, left to right: polar maps obtained when steering 4, 8, 16, 32, and 64 orientations.]

Figure 5: Examples of polar maps with different angular resolutions locating a sound-source of 8 kHz. A low number of orientations clearly leads to a wrong sound-source location.

3.5. Sensor Array Evaluation. The defined P-SRP allows estimating the direction of arrival of multiple sound sources under different sound field conditions. Nevertheless, the precision and accuracy of its estimation can be determined by different quality metrics.

The Filter-and-Sum beamforming is applied to a discrete number of orientations or angles. The angular resolution of the microphone array is determined by the number of measurements per 360° sweep. A higher number of measurements increases the resolution of the P-SRP, displayed as a polar power map (Figure 5), and decreases the location error of the sound-source. The lobes of this polar power map can then be used to estimate the bearing of nearby sound sources in nondiffuse sound field conditions. In fact, the characteristics of the main lobe when considering a single sound-source scenario determine the directivity of the microphone array. The definition of array directivity D_P is proposed in [16] for broadband signals. The authors propose the use of D_P as a metric of the quality of the array, since D_P depends on the main lobe shape and its capacity to unambiguously point to a specific bearing. The definition of array directivity presented in [16] is adapted for 2D polar coordinates in [9] as follows:

$D_P(\theta, \omega) = \dfrac{\pi P(\theta, \omega)^2}{\frac{1}{2} \int_0^{2\pi} P(\theta, \omega)^2 \, d\theta}$  (12)

where P(θ, ω) is the output power of the array when pointing to the direction θ and (1/2)∫_0^{2π} P(θ, ω)² dθ is the sum of the squared output power in all other directions. D_P can be expressed as the ratio between the area of a circle whose radius is the maximum power of the array and the total area of the power output. Consequently, D_P defines the quality of the microphone array and can be used to specify a certain threshold for the microphone array. For instance, if D_P equals 8, the main lobe is eight times slimmer than the unit circle and offers a confident estimation of a sound-source within half a quadrant.
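For illustration, a minimal Python sketch of Eq. (12) follows, under the assumption that the power response is sampled at evenly spaced orientations so the integral can be replaced by a Riemann sum:

```python
import numpy as np

def directivity(power):
    """power: P(theta, w) sampled at evenly spaced orientations over 2*pi."""
    d_theta = 2 * np.pi / len(power)
    circle_area = np.pi * power.max() ** 2              # circle whose radius is the peak power
    pattern_area = 0.5 * np.sum(power ** 2) * d_theta   # (1/2) * integral of P(theta)^2 dtheta
    return circle_area / pattern_area                   # D_P of Eq. (12)

# A flat response yields D_P = 1; a single narrow lobe yields a high D_P.
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
print(directivity(np.exp(-((theta - np.pi) ** 2) / 0.05)))
```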

Whereas D_P is usually considered for broadband sound sources, other metrics are necessary to profile the array's response for different types of sound sources. Figure 6 depicts the maximum side lobe (MSL) and the half-power beamwidth, which are two complementary metrics used to characterize the response of arrays to narrowband sound sources. The half-power beamwidth is the angular extent by which the power response has fallen to half of the maximum level of the main lobe. Since the half-power point coincides with a 3 dB drop in power level, it is often called the 3 dB beamwidth (BW−3dB). This metric determines the angular ratio between the power signal level which is at least 50% of the peak power level and the remaining circle. By contrast, the MSL is another important parameter used to represent the impact of the side lobes when characterizing arrays. The MSL is the normalized ratio


Figure 6: Definitions of the maximum side lobe (MSL) and the 3 dB beamwidth (BW−3dB).


Figure 7: Main stages of the proposed architecture.

between the highest side lobe and the power level of the main lobe, expressed in dB. Both metrics, the MSL and BW−3dB, should be as low as possible, whereas D_P should be as high as possible to guarantee a precise sound-source location.
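Both metrics can be estimated directly from a sampled power response. The sketch below is a simplified software approximation (it assumes evenly spaced orientations and a main lobe that spans only a few bins around the peak; the helper name is hypothetical):

```python
import numpy as np

def beam_metrics(power):
    """Return (BW-3dB in degrees, MSL in dB) for a sampled power response."""
    p = np.asarray(power, dtype=float)
    p = p / p.max()
    step = 360.0 / len(p)
    bw_3db = np.count_nonzero(p >= 0.5) * step            # angular extent above half power
    main = int(p.argmax())
    mask = np.ones(len(p), dtype=bool)
    mask[[i % len(p) for i in range(main - 2, main + 3)]] = False  # crude main-lobe exclusion
    msl_db = 10 * np.log10(max(p[mask].max(), 1e-12))      # highest remaining (side) lobe
    return bw_3db, msl_db
```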

4. A Filter-and-Sum Based Architecture

The proposed architecture uses a Filter-and-Sum based beamforming technique to locate a sound-source with an array of digital MEMS microphones. Many applications, however, demand a certain scalability and flexibility when locating the sound-source. With such requirements in mind, the proposed architecture has some additional features to support a dynamic response targeting applications with real-time demands. The proposed architecture is also designed to be battery power efficient and to operate in a streaming fashion to achieve the fastest possible response.

One of the features of the ADMP521 microphone is its low-power sleep mode capability. When no clock signal is provided, the ADMP521 microphone enters a low-power sleep mode (<1 μA), which makes this sound-source localizer suitable for battery powered implementations. The PCB of the MEMS microphone array is designed to exploit this capability. Figure 2 depicts the subarray distribution of the MEMS microphones. Using the clock signal, it is possible to activate or deactivate subarrays, since each subarray is fed with an individual clock signal. This flexibility allows disabling not only subarrays of microphones but also the associated computational components, decreasing the computational demand and the power consumption. The proposed architecture is properly designed to support such flexibility.

Table 1: Relevant parameters involved in the proposed architecture.
Parameter | Definition
F_s | Sampling frequency
F_min | Minimum frequency of the target sound source
F_max | Maximum frequency of the target sound source
BW | Minimum bandwidth to satisfy Nyquist
D_F | Decimation factor
D_CIC | CIC filter decimation factor
N_CIC | Order of the CIC filter
D_FIR | FIR filter decimation factor
N_FIR | Order of the FIR filter

The array computes its response as fast as possible to reach real-time sound-source location. The proposed architecture is designed to process in a streaming fashion and is mainly composed of three cascaded stages operating in a pipeline (Figure 7). The first stage is the filter chain, which is composed of the minimum number of components required to recover the audio signal in the target frequency range. The second stage computes the Filter-and-Sum beamforming operation. The final stage obtains P(θ, t) for the focused orientation. A polar power map is obtained once a complete steering loop is completed. The different stages are discussed in more detail in the following subsections. Table 1 summarizes the most relevant parameters of the proposed architecture.


Figure 8: The filtering stage consists of a couple of filters with a downsampling factor.

4.1. Filter Stage. The filter stage contains a PDM demultiplexer and as many filter chain blocks as MEMS microphones (Figure 8). Each microphone of the array is associated with a filter chain composed of a couple of cascaded filters. The full-capacity design supports up to 52 filter chain blocks working in parallel, but their number is defined by the number of active microphones. The unnecessary filter chain blocks are disabled at runtime.

The microphones' clock F_S determines the input rate and therefore how fast the filter stage should operate. The low operating frequency for current FPGAs allows interesting power savings [17].

Every pair of microphones has its PDM output signal multiplexed in time. Thus, at every edge of the clock cycle, the output is the sampled data from one of the microphones. The PDM demultiplexing is the first operation needed to obtain the individual sampled data from each microphone. This task is done in the PDM splitter block.

The next component consists of a cascade of filters to filter and to downsample each microphone signal. Traditional digital filters, such as Finite Impulse Response (FIR) filters, are a good solution to reduce the signal bandwidth and to remove the higher frequency noise. Once the signal is filtered, it can be decimated to decrease the oversampling to a reasonable audio quality rate (e.g., 48 kHz). However, such a filter consumes many adders and dedicated multipliers (DSPs) from the FPGA resources, particularly if its order increases.

The Cascaded Integrator-Comb (CIC) filter is an alternative low-pass filtering technique, developed in [18, 19], which involves only additions and subtractions. This type of filter consists of three stages: the integrating stage, the decimator, and the comb section. PDM samples are recursively added in the integrating stage while being recursively subtracted, with a differential delay, in the comb stage. The number of recursive operations in both the integrating and comb sections determines the order of the filter (N_CIC) and should at least be equal to the order of the sigma-delta converter of the microphones. After the CIC filter, the signal growth (G) is proportional to the decimation factor (D_CIC) and the differential delay (DD) and is exponential in the filter order [19]:

$G = (D_{\text{CIC}} \cdot \text{DD})^{N_{\text{CIC}}}$  (13)

The output bit width grows proportionally to G. Denoting by B_in the number of input bits, the number of output bits B_out is as follows:

$B_{\text{out}} = \lceil N_{\text{CIC}} \cdot \log_2(D_{\text{CIC}} \cdot \text{DD}) + B_{\text{in}} \rceil$  (14)
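To illustrate Eqs. (13)-(14), and the fact that the filter needs only additions and subtractions, the following Python sketch implements a basic N-th order CIC decimator and its output bit width. It is not the FPGA implementation, and the parameter values in the example call are illustrative only:

```python
import math
import numpy as np

def cic_decimate(x, order, decim, diff_delay=1):
    """N-th order CIC decimator: integrators at the input rate, decimation
    by D_CIC, and comb stages with differential delay DD at the low rate."""
    y = np.asarray(x, dtype=np.int64)
    for _ in range(order):                      # cascaded integrators (additions only)
        y = np.cumsum(y)
    y = y[::decim]                              # decimation by D_CIC
    for _ in range(order):                      # cascaded combs (subtractions only)
        delayed = np.concatenate((np.zeros(diff_delay, dtype=np.int64), y[:-diff_delay]))
        y = y - delayed
    return y

def cic_output_bits(order, decim, diff_delay, b_in):
    # G = (D_CIC * DD)^N_CIC, Eq. (13); output bit width, Eq. (14)
    return math.ceil(order * math.log2(decim * diff_delay) + b_in)

pdm = np.random.randint(0, 2, 4096)             # a 1-bit PDM-like input stream
print(cic_decimate(pdm, order=4, decim=16, diff_delay=32).shape)
print(cic_output_bits(order=4, decim=16, diff_delay=32, b_in=1))   # -> 37 bits
```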

The proposed CIC decimation filter eliminates higher frequency noise components and decimates the signal by D_CIC at the same time. However, a major disadvantage of this filter is its nonflat frequency response in the desired audio frequency range. In order to improve the flatness of the frequency response, a CIC filter with a lower decimation factor followed by a compensation FIR filter is often chosen, as in [20–22].

The CIC filter is followed by an averager, which is used to cancel out the effects caused by the microphones' DC offset output, which would otherwise lead to a constant offset in the beamforming values. This block improves the dynamic range, reducing the bit width required to represent the data after the CIC.

The last component of each filter chain is a low-pass compensation FIR filter based on a Kaiser window. This filter equalises the passband drop usually introduced by CIC filters [19]. It additionally performs a low rate change. The proposed filter needs a cut-off frequency of F_max at a sampling rate of F_S/D_CIC, which is the sampling rate obtained after the CIC decimator filter with a decimation factor of D_CIC. This low-pass FIR filter is designed in a serial fashion to reduce the resource consumption. In fact, the FIR filter order


Figure 9: Details of the internal structure of the proposed modular Filter-and-Sum beamforming. Note that the delay values are stored in a precomputed table.

is also determined by D_CIC. Thanks to the streaming nature of the architecture, the CIC filter is able to generate an output value every clock cycle, but, due to the decimation factor, only one output value per D_CIC input values is propagated to the low-pass FIR filter. Therefore, the FIR filter has D_CIC clock cycles to compute each input value, which determines its maximum order. The filtered signal is then further decimated by a factor D_FIR to obtain a minimum bandwidth BW = 2 · F_max for the audio signals, to satisfy the Nyquist theorem. The overall decimation factor D_F can be expressed based on the rate change of each filter:

$D_F = D_{\text{CIC}} \cdot D_{\text{FIR}}$  (15)

4.2. Beamforming Stage. As detailed before, the main purpose of the beamforming operation is to focus the MEMS microphone array on one particular direction. The detection of sound sources is possible by continuously steering in loops of 360°. The number of orientations N_o determines the angular resolution. Higher angular resolutions demand not only a larger execution time per steering loop but also more FPGA memory resources to store the precomputed delays per orientation.

The beamforming stage depends on the number of microphones and subarrays. Although Filter-and-Sum beamforming assumes a fixed number of microphones and a fixed geometry, our scalable solution satisfies those restrictions while offering a flexible geometry. Figure 9 shows our proposed Filter-and-Sum based beamformer. This stage is basically composed of FPGA blocks of memory (BRAM) used in a ring-buffer fashion that properly delay the filtered microphone signals. The values of the delays at a given moment depend on the focus orientation at that moment and are determined by the array pattern W(w_n, θ_0, θ) from (5). The delay for a given microphone is determined by its position in the array and by the focus orientation. All possible delay values per microphone for each beamed orientation are precomputed, grouped per orientation, and stored in ROMs during compilation time. During execution time, the delay values Δ_m(θ) of each microphone m when pointing to a certain orientation θ are obtained from this precomputed table.
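A software analogue of this lookup-and-delay scheme may help clarify the data flow. The sketch below (a hypothetical helper, not the HDL) sums the filtered microphone signals for one orientation using a precomputed table of integer sample delays, as the ROMs of Figure 9 would provide:

```python
import numpy as np

def filter_and_sum(filtered, delay_table, orientation):
    """filtered: (n_mics, n_samples) filtered PCM streams.
    delay_table: (N_o, n_mics) precomputed integer delays in samples.
    Returns the delayed-and-summed output for the requested orientation."""
    delays = delay_table[orientation]
    max_d = int(delays.max())
    n = filtered.shape[1] - max_d                 # usable length once every delay is applied
    out = np.zeros(n)
    for mic, d in enumerate(delays):
        out += filtered[mic, max_d - d : max_d - d + n]   # x_m(t - delta_m)
    return out
```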

The beamforming stage is designed to support a variable number of microphones. This is enabled by grouping the input signals following their subarray structure. Therefore, instead of implementing one single Filter-and-Sum of 52 microphones, there are four Filter-and-Sum operations in parallel for the 4, 8, 16, and 24 microphones of the respective subarrays. Their sum operation is done first locally for each subarray and afterwards between subarrays. The only restriction of this modular beamforming is the synchronization of the outputs in order to have them properly delayed. Therefore, the easiest solution is to delay all the subarrays with the maximum delay of the


Figure 10: The power stage consists of a couple of components to calculate the P-SRP and the estimated location of the sound-source.

subarrays. Although the output of some subarrays is already properly delayed, additional delays, shown in the Sums section of Figure 9, are inserted to assure that the proper delay of each subarray has been applied. This is achieved by using the valid output signals of each subarray beamforming without additional resource cost. Consequently, only the Filter-and-Sum beamforming module linked to an active subarray is enabled. The inactive beamformers are set to zero in order to avoid any negative impact on the beamforming operation.

A side benefit of this modular approach is a reduction of the memory resource consumption. Since each subarray has its ring-buffer memory properly dimensioned to its maximum sample delay, the portion of underused regions of the consumed memories is significantly low.

4.3. Power Stage. Figure 10 shows the components of the power stage. Once the filtered data has been properly delayed and added for a particular orientation θ, P(θ, t) is calculated following (10). The P-SRP is obtained after a steering loop, allowing the determination of the sound sources. The sound-source is estimated to be located in the direction shown by the peak of the polar power map, which corresponds to the orientation with the maximum P(θ, t).

5. Performance Analysis of the Filter-and-Sum Based Architecture

A performance analysis of the proposed architecture is presented in this section. The analysis shows how the design parameters, such as the filters' characteristics, affect the final execution time of the sound-source locator. The links between performance and design parameters are explained, followed by the description of the different acceleration strategies. These strategies can be considered standalone or combined for certain timing constraints. The advantages of these strategies are later presented in Section 6.

5.1. Time Parameters. The overall execution time of the proposed architecture is defined by the latency of the main components. A detailed analysis of the implementation of the components and the latency they incur provides a good insight into the speed of the system (Table 2). The operation frequency of the design can be assumed to be the same as the sampling frequency. Let us define t_P-SRP as the overall execution time in clock cycles required to obtain the P-SRP.

Table 2: Relevant parameters involved in the performance calculation for the proposed architecture.
Parameter | Definition
t_s | Sensing time
t_o | Execution time of one orientation
N_o | Number of orientations
L_o | Latency of the system
t_P-SRP | Time required to obtain a polar power map
t_filters^II | Initiation interval of the filter stage
t_filters | Execution time of the filter stage
t_beamforming^II | Initiation interval of the beamforming stage
t_beamforming | Execution time of the beamforming stage
t_power^II | Initiation interval of the power stage
t_power | Execution time of the power stage
t_II | Sum of all initiation intervals
t_CIC^II | Initiation interval of the CIC filter
t_DC^II | Initiation interval of the remove-DC block
t_FIR^II | Initiation interval of the FIR filter
t_Delay^II | Initiation interval of the delay memories
t_Sum^II | Initiation interval of the cascaded sums
t_Power^II | Initiation interval of the power calculation

Thus, t_P-SRP is defined as

$t_{\text{P-SRP}} = N_o \cdot t_o = N_o \cdot (t_{\text{filters}} + t_{\text{beamforming}} + t_{\text{power}})$  (16)

where t_o is the execution time of one orientation and is determined by the execution time of the filter stage (t_filters), the execution time of the beamforming stage (t_beamforming), and the execution time of the power stage (t_power), which are the main components of the system, as explained in the previous section. The proposed architecture is designed to pipeline each stage, overlapping the execution of each component of the design. Therefore, only the initial latency or initiation interval (II) of the components needs to be considered, since it corresponds to the system group delay.

Let us assume that the design operates at the same frequency F_S as the microphones; then (16) can be rearranged as follows:

$t_{\text{P-SRP}} = \dfrac{N_o \cdot L_o}{F_S} = N_o \cdot (t_{\text{filters}}^{\text{II}} + t_{\text{beamforming}}^{\text{II}} + t_{\text{power}}^{\text{II}} + t_s)$  (17)

where L_o is the latency of the system, determined by the initiation interval of the filter stage (t_filters^II), the initiation interval of the beamforming stage (t_beamforming^II), and the initiation interval of the power stage (t_power^II). The time during which the microphone array is monitoring one particular orientation is known as t_s. This is the time required to calculate a certain number of output samples (N_s). As previously detailed, the digital microphones oversample the audio signal by operating at F_S. The reconstruction of the audio signal in the target range demands a certain level of decimation D_F.


Figure 11: Timing analysis of the pipelined execution of the components.

This level of decimation is done by the CIC and the FIR filters in the filter stage, with decimation factors D_CIC and D_FIR, respectively. Based on D_F defined in (1), the time t_s is expressed as follows:

$t_s = \dfrac{D_F \cdot N_s}{F_S} = \dfrac{\lceil F_S/\text{BW} \rceil \cdot N_s}{F_S} \approx \dfrac{N_s}{2 \cdot F_{\max}}$  (18)

The II of each stage of the implementation can be further decomposed based on the latency of its components:

$t_{\text{filters}}^{\text{II}} = t_{\text{CIC}}^{\text{II}} + t_{\text{DC}}^{\text{II}} + t_{\text{FIR}}^{\text{II}}, \qquad t_{\text{beamforming}}^{\text{II}} = t_{\text{Delay}}^{\text{II}} + t_{\text{Sum}}^{\text{II}}$  (19)

where t_i^II is the initiation interval of each component i. Therefore, t_II is defined as the sum of all the initiation intervals:

$t_{\text{II}} = t_{\text{CIC}}^{\text{II}} + t_{\text{DC}}^{\text{II}} + t_{\text{FIR}}^{\text{II}} + t_{\text{Delay}}^{\text{II}} + t_{\text{Sum}}^{\text{II}} + t_{\text{Power}}^{\text{II}}$  (20)

Equation (16) can be rearranged (see Figure 11) as

$t_{\text{P-SRP}} = N_o \cdot (t_{\text{II}} + t_s)$  (21)

The execution time t_P-SRP is determined by N_o and N_s, since the level of decimation is determined by the target frequency range, and t_II is determined by the components' design. Although most of the latency of each component of the design is hidden thanks to the pipelined operation, there are still some cycles dedicated to initializing the components. A detailed analysis of t_II provides valuable information about the performance leaks.

CIC. The initiation interval of the CIC filter represents the time required to fill the integrator and comb stages. Therefore, the order of the CIC filter (N_CIC) determines t_CIC^II:

$t_{\text{CIC}}^{\text{II}} = \dfrac{2 \cdot N_{\text{CIC}} + 1}{F_S}$  (22)

DC. The component which removes the DC level of the signal introduces a minor initial latency due to its internal registers. Since it needs at least two input values to calculate the DC level, it also depends on D_CIC:

$t_{\text{DC}}^{\text{II}} = \dfrac{D_{\text{CIC}} + 2}{F_S}$  (23)

FIR. The initiation interval of the FIR filter is also determined by the order of this filter (N_FIR). Since the filter operation is basically a convolution, the initial output values are not correct until at least the ⌈(N_FIR + 1)/2⌉th input signal of the filter. Because the filters are cascaded, D_CIC also affects t_FIR^II:

$t_{\text{FIR}}^{\text{II}} = \dfrac{D_{\text{CIC}} \cdot (\lceil (N_{\text{FIR}} + 1)/2 \rceil + 1)}{F_S}$  (24)

Therefore, t_filters^II is expressed as follows:

$t_{\text{filters}}^{\text{II}} = t_{\text{CIC}}^{\text{II}} + t_{\text{DC}}^{\text{II}} + t_{\text{FIR}}^{\text{II}} = \dfrac{2 \cdot N_{\text{CIC}} + D_{\text{CIC}} \cdot (2 + \lceil (N_{\text{FIR}} + 1)/2 \rceil) + 3}{F_S}$  (25)

Delay. The beamforming operation is done through memories which properly delay the audio samples for a particular orientation. The maximum number of samples determines the minimum size of these delay memories. This value represents the maximum distance between a pair of microphones for a certain microphone array distribution and may vary for each orientation. The initiation interval of the Filter-and-Sum beamformer is therefore expressed through the maximum delay between pairs of microphones for a particular orientation:

$t_{\text{Delay}}^{\text{II}} = \dfrac{\max(\Delta_{\text{am}}(\theta)) \cdot D_F}{F_S}$  (26)

where max(Δ_am(θ)) is the maximum time delay of the active microphones for the beamed orientation θ. Therefore, t_Delay^II is mainly determined by the microphone array distribution, F_S, and the target frequencies determining D_F. Due to the symmetry of the microphone array, and for the sake of simplicity, it is assumed that each orientation has the same max(Δ_am). Notice this does not need to be true for different array configurations.


Sum. The proposed beamforming is composed of not only a set of delay memories but also a sum tree. The initiation interval of this component is defined by the number of active microphones (N_am):

$t_{\text{Sum}}^{\text{II}} = \dfrac{\lceil \log_2(N_{\text{am}}) \rceil}{F_S}$  (27)

Therefore, t_beamforming^II is expressed as follows:

$t_{\text{beamforming}}^{\text{II}} = t_{\text{Delay}}^{\text{II}} + t_{\text{Sum}}^{\text{II}} = \dfrac{\max(\Delta_{\text{am}}(\theta)) \cdot D_F + \lceil \log_2(N_{\text{am}}) \rceil}{F_S}$  (28)

Power. The final component is the calculation of the power per orientation. This simple component has a constant latency of a couple of clock cycles:

$t_{\text{Power}}^{\text{II}} = \dfrac{2}{F_S}$  (29)
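Putting Eqs. (18) and (20)-(29) together, a small calculator such as the Python sketch below gives t_P-SRP for a chosen set of design parameters. It is purely illustrative: it mirrors the formulas rather than the measured values reported later, and the arguments in the example call are only in the spirit of the design of Section 6.

```python
from math import ceil, log2

def t_p_srp(f_s, n_cic, d_cic, n_fir, d_fir, max_delay, n_active_mics, n_s, n_o):
    """All times in seconds; max_delay is max(Delta_am) in samples."""
    d_f = d_cic * d_fir
    t_cic = (2 * n_cic + 1) / f_s                        # Eq. (22)
    t_dc = (d_cic + 2) / f_s                             # Eq. (23)
    t_fir = d_cic * (ceil((n_fir + 1) / 2) + 1) / f_s    # Eq. (24)
    t_delay = max_delay * d_f / f_s                      # Eq. (26)
    t_sum = ceil(log2(n_active_mics)) / f_s              # Eq. (27)
    t_power = 2 / f_s                                    # Eq. (29)
    t_ii = t_cic + t_dc + t_fir + t_delay + t_sum + t_power   # Eq. (20)
    t_s = d_f * n_s / f_s                                # Eq. (18)
    return n_o * (t_ii + t_s)                            # Eq. (21)

print(t_p_srp(2e6, 4, 16, 16, 4, 15, 52, 64, 64))        # roughly 0.17 s for these inputs
```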

The timing analysis of the initiation interval of each component of the architecture gives an idea about the design parameters with the highest impact. The definition of the filters, mainly their order, is determined by the application specifications, so it should not be modified to reduce the overall execution time. On the other hand, the distribution of the microphones in the array affects not only the frequency response of the system but also the execution time. Notice, however, that the number of microphones has no timing impact; only the number of active microphones has a minor impact of a couple of clock cycles. Nevertheless, (21) already shows that the dominant parameters are t_s and N_o.

5.2. Sensitive Parameters. The timing analysis provides an indication of the parameters dominating the execution time. Some parameters, like the microphone array distribution which determines the beamforming latency, are fixed, while others, like N_o or t_s per orientation, are variable.

Orientations. Figure 5 depicts how an increment of N_o leads to a better sound-source localization. This resolution, however, has a high repercussion on the response time. A simple strategy is to maintain the angular resolution only where it is needed while quickly exploring the surrounding sound field. For instance, the authors in [3] propose a strategy to reduce the beamforming exploration to 8 orientations with an angular separation of 45 degrees. Once a steering loop ends, the orientations are rotated one position, which represents a shift operation in the precomputed orientation table. Therefore, all the supported 64 orientations are monitored after 8 steering loops. Although this strategy intends to accelerate the peak detection by monitoring the minimum N_o, the overall N_o remains the same for achieving the equivalent angular resolution.

Sensing Time. The sensing time is a well-known parameter of radio frequency applications. The time t_s is known to strengthen the robustness against noise [23]. In our case, the time a receiver monitors the surrounding sound field determines the probability of properly detecting a sound-source. Consequently, a higher t_s is needed to detect and locate sound sources under low Signal-to-Noise Ratio (SNR) conditions. Although this term could be modified at runtime to adapt the sensing of the array based on an estimated SNR, that would demand a continuous SNR estimation, which is out of the scope of this paper.

To conclude, Table 2 summarizes the timing definitions. On one hand, t_s determines the number of processed acoustic samples and therefore directly affects the sensing of the system. On the other hand, N_o determines the angular resolution of the sound-source search and influences the accuracy. There is a trade-off between t_s and N_o and the quality of the sound-source location.

5.3. Strategies for Time Reduction. The following three strategies are proposed to accelerate the sound-source localization without any impact on the frequency response and D_P of the architecture. An additional strategy is proposed especially for dynamic acoustic environments, but with a certain accuracy cost.

5.3.1. Continuous Processing. The proposed architecture is designed to reset the filter and beamforming stages after t_o due to the orientation transition. Because the beamforming is placed after the filter stage, the system can continue processing while resetting: the filter stage does not need to stop its processing, and the input data is not lost due to the reset operations, since the filtered input values are stored in the beamforming stage. Furthermore, the initialization of the beamforming stage can also be eliminated, since the stored data from the previous orientation can be reused for the calculation of the new one. With this approach, (17) becomes as follows:

$t_{\text{P-SRP}} = t_{\text{filters}}^{\text{II}} + t_{\text{beamforming}}^{\text{II}} + N_o \cdot (t_{\text{power}}^{\text{II}} + t_s) \approx t_{\text{II}} + N_o \cdot t_s$  (30)

5.3.2. Time Multiplexing. Nowadays, FPGAs can operate at clock speeds of hundreds of MHz. Although the power consumption is significantly lower when operating at a low frequency [17], the proposed architecture is able to operate at a much higher frequency than the data sampling rate. This capability provides the opportunity to parallelize the beamforming computations without any additional resource consumption. Instead of consuming more logic resources by replicating the main operations, the proposed strategy, similar to Time-Division Multiplexing in communications, consists in time multiplexing these parallel operations. Because the input data is oversampled audio, the selection of the operations to be time multiplexed is limited. Based on (21), the candidates to be parallelized are N_o and t_s. Since the input data rate is determined by F_S, (18) shows that t_s cannot be reduced without decreasing N_s or changing the target frequency range. Nevertheless, since the computation of each orientation is data independent, they can be parallelized. The


Figure 12: Clock regions for the time multiplexing of the computation of multiple orientations N_o.

simultaneous computation of multiple orientations is only possible after the beamforming operation. Let us define t_P^II as the monitoring time before being able to process multiple orientations in parallel. Therefore,

$t_P^{\text{II}} = t_{\text{CIC}}^{\text{II}} + t_{\text{DC}}^{\text{II}} + t_{\text{FIR}}^{\text{II}} + t_{\text{Delay}}^{\text{II}}$  (31)

After t_P^II, the delay memories which compose the Filter-and-Sum beamforming stage have already stored enough audio data to start locating the sound-source. Because the beamforming operation relies on delaying the recovered audio signal, multiple orientations can be computed in parallel by accessing the content of the delay memories at a higher speed than the sampling of the input data. It basically multiplexes the output beamforming computations over time. The frequency F_P required to parallelize all N_o orientations in this architecture is defined as follows:

$F_P = \dfrac{F_S \cdot N_o}{D_F}$  (32)

Due to (1), F_P can also be expressed based on the target frequency range:

$F_P \approx \text{BW} \cdot N_o$  (33)

Notice that the frequency required to time multiplex the computation of the orientations does not depend on the number of microphones in the array. Figure 12 shows the clock domains when applying this strategy. While the front-end, consisting of the microphone array and the filter stage, operates at F_S, the output of the beamforming is processed at F_P. The additional cost in terms of resources is the extension of the register for the power-per-angle calculation: a memory of N_o positions is required instead of the single register used to store the accumulated power values. This strategy allows fully parallelizing the computation of all the orientations. Thus, t_P-SRP is mainly limited by N_o and the maximum reachable frequency of the design, since F_S is determined by the microphones' operational frequency and D_F by the frequency range of the target sound-source. In fact, D_F determines how many orientations can be processed in parallel.
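As a quick numeric check of Eq. (32), the hypothetical helper below evaluates the multiplexing clock for the example design discussed later in Section 6 (F_S = 2 MHz, N_o = 64, D_F = 64):

```python
def f_p(f_s, n_o, d_f):
    """Clock needed to time-multiplex all N_o orientations, Eq. (32)."""
    return f_s * n_o / d_f

# With N_o equal to D_F, F_P collapses to F_S, so a single beamforming stage
# serves all 64 orientations without a faster clock (here 2.0 MHz).
print(f_p(2e6, 64, 64))
```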

5.3.3. Parallel Time Multiplexing. This strategy is an extension of the previous one. The frequency F_P is limited by the maximum attainable operating frequency of the implementation, which is determined by many factors, from the technology to the available resources on the FPGA. For instance, if F_max equals 30 kHz and the maximum attainable operating frequency is 100 MHz, then up to 1666 orientations could be computed in parallel. However, if not all the resources of the FPGA are completely consumed, especially the internal blocks of memory (BRAM), there is still room for improvement. With the time multiplexing strategy, the memories of the beamforming stage are fully accessed, since in each clock cycle there is at least one memory access, or even two memory accesses when new data is stored. Therefore, more memory resources can be used to further accelerate the computation of the P-SRP. The simple replication of the beamforming stage, preconfigured for different orientations, is enough to double the number of processed orientations while maintaining the same t_P-SRP. This strategy mainly consumes BRAMs. Nevertheless, due to the value of max(Δ_m) at BW for our microphone array, only a few audio samples are needed to complete the beamforming. This fact drastically reduces the memory consumption, which enables the potential computation of thousands of orientations by applying both strategies.

All strategies can be applied independently, although some will only work properly when combined. Not all strategy combinations are beneficial. For instance, a dynamic angular resolution should only be combined with the time multiplexing of the orientations when F_P is higher than F_S. Otherwise, the reduction of N_o by dynamically readjusting the target orientations does not provide any acceleration and would only degrade the response of the system.

6. Results

The proposed architecture is evaluated in this section. Our analysis starts by evaluating different design solutions based on the timing analysis introduced in Section 5.1. One representative configuration is evaluated based on the frequency response and accuracy by using the metrics described in Section 3.5. This evaluation also considers sensitive parameters such as the number of active subarrays and the relevance of N_o, already introduced in Section 5.2. The resource and the power consumption for a Zynq 7020 target FPGA are also


Figure 13: Minimum values of t_o based on F_S and F_max. Different perspectives are displayed in the bottom figures. Notice how the shortest t_o is obtained when increasing F_max and F_S.

presented. Finally, the strategies presented in Section 5.3 are applied to the representative design.

6.1. General Performance Analysis. The proposed performance analysis from the previous section is here applied to a concrete example. The explored design parameters are F_S and F_max, keeping N_s and N_o both constant at 64. Whereas F_S is determined by the microphone's sampling frequency, F_max is determined by the target application. For our design space exploration, we consider an F_max from 10 kHz to 16 kHz in steps of 125 Hz, and an F_S ranging from 1.25 MHz to 3.072 MHz, as specified in [10].

Equations (16) to (18) and (20) to (32) are used to obtain t_P-SRP. The performance analysis starts by obtaining D_F for every possible value of F_S and F_max. All possible combinations of D_CIC and D_FIR are considered based on (15). The low-pass FIR filter parameters are N_FIR, which is determined by D_CIC, and F_max as the cut-off frequency. Each possible low-pass FIR filter is generated considering a transition band of 2 kHz and an attenuation of at least 60 dB at the stop band. If the minimum order of the filter is higher than N_FIR, the filter is discarded. We consider these parameters to be realistic constraints for low-pass FIR filters. Furthermore, a minimum order of 4 is defined as a threshold for N_FIR. Thus, some values are discarded because D_F is a prime number or N_FIR is below 4. Each low-pass FIR filter is generated and evaluated in Matlab 2016b.

Figure 13 depicts the minimum timings of the DSE that the proposed Filter-and-Sum architecture needs to compute one orientation. t_o is slightly reduced when varying F_S; for instance, it is reduced from 5.03 ms to 3.97 ms when F_max = 10 kHz. A higher F_S means a faster sampling, which is in fact the operational frequency limiting factor. Furthermore, a larger decrement of t_P-SRP is produced when increasing F_S and F_max. Higher values of F_max allow higher values of D_CIC, which can greatly reduce the computational complexity of narrowband low-pass filtering. However, too high values of D_CIC lead to such low rates that, although a higher order low-pass FIR filter is supported, it cannot satisfy the low-pass filtering specifications. Notice how the number of possible solutions decreases while increasing F_max. Due to the F_S and F_max ranges, the values of D_F vary between 39 and 154, though, as previously explained, many values cannot be considered since they are either prime numbers or the decomposition in factors of D_CIC leads to values below 4. Because higher values of F_max lead to low values of D_CIC for low F_S, these D_CIC values cannot satisfy the specifications of the low-pass FIR filter.

Finally, relatively low values of t_P-SRP are obtained for F_max values from 10 kHz to 10.65 kHz and F_S ranging from 2.7 MHz to 3.072 MHz. This is produced by high values of D_CIC, which means that a higher order low-pass FIR filter is supported. As expected, high values of D_CIC lead to high order low-pass FIR filters and a lower D_FIR. A lower t_P-SRP is possible thanks to avoiding unnecessary computations, since fewer samples are decimated after the low-pass FIR filter.

6.2. Analysis of a Design. As shown in Figure 13, several design considerations drastically affect the final performance.


Figure 14: Waterfall diagrams of the proposed architecture. The figures are obtained by enabling only a certain number of subarrays. From (a) to (d): only the 4 innermost microphones, only the 12 innermost microphones, the 28 innermost microphones, and all 52 microphones.

Table 3: Configuration of the architecture under analysis.
Parameter | Definition | Value
F_s | Sampling frequency | 2 MHz
F_min | Minimum frequency | 1 kHz
F_max | Maximum frequency | 15.625 kHz
BW | Minimum bandwidth to satisfy Nyquist | 31.25 kHz
D_F | Decimation factor | 64
D_CIC | CIC filter decimation factor | 16
N_CIC | Order of the CIC filter | 2
D_FIR | FIR filter decimation factor | 4
N_FIR | Order of the FIR filter | 16

However, most of these design decisions do not have a significant impact on the system response compared to other factors, such as the number of active microphones or the number of orientations. The analysis of the impact of these parameters on the system's response and performance is done for one particular design.

Table 3 summarizes the configuration of the architecture. The design considers F_s = 2 MHz, which is the clock for the microphones and the functional frequency of the design. This value of F_s is an intermediate value between the required clock signals of the ADMP521 microphones [10]. The selected cut-off frequency is F_max = 15.625 kHz, which leads to D_F = 64. In this example design, N_CIC = 4, with a decimation factor of 16 and a differential delay of 32. The chosen FIR filter has a beta factor of 2.7 and a cut-off frequency of F_max at a sampling rate of 125 kHz, which is the sampling rate obtained after the CIC decimator filter with D_CIC = 16. The filtered signal is then further decimated by a factor D_FIR = 4 to obtain a BW = 31.25 kHz audio signal.

The architecture is designed to support a complete steering loop of up to 64 orientations, which represents an angular resolution of 5.625°. On the other hand, the subarray approach allows activating all 52 microphones if the 4 subarrays are active. The final results are obtained assuming a speed of sound of ≈343.2 m/s.

6.2.1. Frequency Response. The waterfall diagrams of Figure 14 show the power output of the combined subarrays in all directions for all frequencies. In our case, the results are calculated with a single sound-source varying between 100 Hz and 15 kHz in steps of 100 Hz and placed at 180°. All results are normalized per frequency. Every waterfall shows a clear distinctive main lobe. When only subarray 1 is active, there are side lobes at 5.3 kHz and 10.6 kHz which impede the sound-source location for those frequencies. The frequency response of the subarrays improves when they are combined, since their frequency responses are superposed. The combination of subarrays 1 and 2 reaches a minimum


Figure 15: Directivities when considering a variable number of orientations and active microphones. From (a) to (d): D_P with only 8 orientations up to 64 orientations in (d).

detectable frequency of 3.1 kHz, while the combination of subarrays 1, 2, and 3 and the combination of all subarrays reach 2.1 kHz and 1.6 kHz, respectively. These minimum values are clearly depicted in Figure 15 with a threshold of 8 for D_P, which indicates that the main lobe's surface corresponds to maximally half of a quadrant. The frequency response of the combination of subarrays has a strong variation at the main lobe and therefore in D_P. Figure 15 depicts the evolution of D_P when increasing the angular resolution and when combining subarrays. The angular resolution determines the upper bound to which D_P converges, which depends on the number of orientations. The number of active microphones, on the other hand, influences how fast D_P converges to its upper limit. Consequently, the number of active microphones determines the minimum frequency which can be located when considering a threshold of 8 for D_P. Alongside the directivity, other metrics such as the main beamwidth and the MSL are also calculated to properly evaluate the quality of the array's response. Figure 16 depicts the MSL when varying the number of active subarrays and the number of orientations. A low angular resolution leads to a lower resolution of the waterfall diagrams, but only the metrics can show the impact. At frequencies between 1 and 3 kHz, the main lobe converges to a unit circle, which can be explained by the lack of any side lobe. Higher frequencies present secondary lobes, especially when only the inner subarray is active, which increases the MSL values independently of the angular resolution. A low angular resolution leads to unexpectedly low values of MSL, since the secondary lobes are not detected. On the other hand, a higher number of active microphones leads to lower values of MSL, independently of the angular resolution.

Figure 17 depicts the BW−3dB metric for a similar analysis of the number of microphones and angular resolution. On


Figure 16: Measured MSL when considering a variable number of orientations and active microphones. From (a) to (d): the MSL with only 8 orientations up to 64 orientations in (d).

one hand, a higher number of microphones produces a faster decrement of BW−3dB, reflected as a thinner main lobe. Nevertheless, the BW−3dB of each subarray converges to a minimum, which is only reached at higher frequencies. The angular resolution determines this minimum, which ranges from 90° to 11.25° when 8 or 64 orientations are considered, respectively.

6.2.2. Resource Consumption and Power Analysis. Table 4 summarizes the resource consumption when combining subarrays. The consumed resources are divided into the resources for the filter stage, the beamforming stage, and the total consumption per group of subarrays. The filter stage mostly consumes DSPs, while the beamforming stage mainly demands BRAMs. Most of the resource consumption is dominated by the filter stage, since a filter chain is dedicated to each MEMS microphone. What determines the resource consumption is the number of active subarrays.

The flexibility of our architecture allows the creation of heterogeneous sound-source locators. Thus, the architecture can be scaled for small FPGAs based on the target sound-source profile or a particular desirable power consumption. For instance, the combination of the two inner subarrays would use 12 microphones while consuming less than 10% of the available resources. The LUTs are the limiting resource due to the internal registers of the filters; in fact, when all the subarrays are used, around 80% of the available LUTs are required. Nevertheless, any subarray can be disabled at runtime, which directly deactivates its associated filter and beamforming components. Although this does not affect the resource consumption, it has a direct impact on the power consumption. Table 5 shows the power consumption


Figure 17: Measured BW−3dB level when considering a variable number of orientations and active microphones. From (a) to (d): the BW−3dB with only 8 orientations up to 64 orientations in (d).

in mW based on the number of active subarrays. The power consumption of the microphones is also considered, since the FPGA and the microphone array are powered from the same source. Thus, the overall power consumption must be considered, since the architecture is designed for an embedded system. The MEMS microphones are powered with 3.3 volts, which represents a power consumption per microphone of 2.64 μW and 3.96 mW for the inactive and active microphones, respectively. Notice how the power consumption increases with the number of active subarrays. There is a turning point when 3 or 4 subarrays are active: the microphone array then consumes more power than the FPGA.

6.2.3. Timing Analysis. The timing analysis based on Section 5 of the design under evaluation is summarized in Table 6. A complete steering loop requires around 169 ms, while t_o rounds to 2.65 ms. Notice that the initialization (t_II) consumes around 21.5% of the execution time. Fortunately, this initialization can almost be completely removed when applying the first strategy described in Section 5.3.1.

Table 7 summarizes the timing results when applying the first strategies proposed in Section 5. The elimination of the initialization after each orientation transition slightly reduces t_P-SRP. In this case, t_P-SRP is expressed as follows:

$t_{\text{P-SRP}} = t_{\text{II}} + N_o \cdot t_s$  (34)

The main improvement is obtained after time multiplexing the computation of the power per orientation. In this case, F_P, the operational frequency of the beamforming computation needed to process all N_o orientations in parallel, equals F_S, as expressed in (32). This is possible because D_F and N_o have the same value. Therefore, there is no need for a different clock for the beamforming operation, since the spacing between


Table 4: Resource consumption after placement and routing when combining microphone subarrays. Each subarray combination details the resource consumption of the filter and the beamforming stage (Filters / Beamforming / Total).
Resources (Available) | Inner 4 MICs | Inner 12 MICs | Inner 28 MICs | All 52 MICs
Slice registers (106400) | 5043 / 626 / 6144 | 14859 / 1540 / 16882 | 34489 / 3195 / 38183 | 54042 / 4447 / 59093
Slice LUTs (53200) | 3612 / 344 / 4732 | 10759 / 754 / 12299 | 25032 / 1486 / 27318 | 37221 / 2221 / 42319
LUT-FF (86689) | 2329 / 199 / 2773 | 7013 / 512 / 7779 | 16353 / 1069 / 17698 | 23656 / 1664 / 27619
BRAM (140) | 0 / 2 / 2 | 0 / 6 / 6 | 0 / 14 / 14 | 0 / 22 / 22
DSP48 (220) | 8 / 4 / 12 | 24 / 4 / 28 | 56 / 4 / 60 | 88 / 4 / 92

Table 5: Power consumption at F_s = 2 MHz, expressed in mW, when combining microphone subarrays. Values obtained from the Vivado 2016.4 power report.
Subarrays | MEMS microphones (Active / Inactive / Total) | Reported on-chip power (Static / Dynamic / Total) | Total power
Inner 4 MICs | 15.84 / 0.13 / 15.97 | 120 / 2 / 122 | 137.97
Inner 12 MICs | 47.52 / 0.11 / 47.63 | 120 / 5 / 125 | 172.63
Inner 28 MICs | 110.88 / 0.06 / 110.94 | 121 / 11 / 132 | 242.94
All 52 MICs | 205.92 / 0 / 205.92 | 122 / 16 / 138 | 343.92

Table 6: Timing analysis, without any optimization, of the design under evaluation. The values are expressed in μs.

$t_{II}^{\text{CIC}}$: Initiation interval of the CIC filter: 4.5
$t_{II}^{\text{DC}}$: Initiation interval of the remove-DC block: 9
$t_{II}^{\text{FIR}}$: Initiation interval of the FIR filter: 72
$t_{II}^{\text{Delay}}$: Initiation interval of the delay memories: 480
$t_{II}^{\text{Sum}}$: Initiation interval of the cascaded sums: 3.5
$t_{II}^{\text{Power}}$: Initiation interval of the power calculation: 1
$t_{II}^{\text{filters}}$: Initiation interval of the filter stage: 85.5
$t_{II}^{\text{beamforming}}$: Initiation interval of the beamforming stage: 484.5
$t_{II}^{\text{power}}$: Initiation interval of the power stage: 1
$t_{II}$: Sum of all initiation intervals: 571
$t_s$: Sensing time: 2048
$t_o$: Execution time of one orientation: 2650
$t_{\text{P-SRP}}$: Time required to obtain a polar power map: 169600

By combining the first two strategies, $t_{\text{P-SRP}}$ rounds to 2 ms, and only the first steering loop needs 2.6 ms due to $t_{II}^{P}$. In this case, $t_{\text{P-SRP}}$ is expressed as follows:

$$t_{\text{P-SRP}} = t_{II}^{P} + t_s \approx t_s. \tag{35}$$

The other two strategies proposed in Section 5.3 are designed to fully exploit the FPGA resources and to overcome time constraints when considering a high angular resolution. In the first case, since the design under evaluation has a small angular resolution ($N_o = 64$), there is no need for a higher $F_P$ when applying the time multiplexing strategy. However, a higher angular resolution can be obtained by using the unconsumed resources without additional timing cost. Table 8 shows that the combination of strategies increases the angular resolution without additional time penalty. The operational frequency ($F_{op}$) determines at what speed the FPGA can operate. Following (33), the beamforming operation can be exploited by increasing $F_P$ up to this maximum frequency, which increases $N_o$ as well:

$$\max(N_o) = \frac{\max(F_{op})}{\text{BW}} = \frac{F_P}{\text{BW}}. \tag{36}$$

Many thousands of orientations can be computed in parallel when combining all strategies. The beamforming stage can be replicated as many times as the remaining available resources allow. Of course, this estimation is certainly optimistic, since the frequency drops when the resource consumption increases. Nevertheless, it provides an upper bound for $N_o$. For instance, when only the inner subarray is considered, the DSPs are the limiting component; even so, up to 53 beamforming stages could theoretically be placed in parallel. When more subarrays are active, the BRAMs are the constraining component. Notice how the number of supported orientations increases as the number of subarrays decreases. This has, however, an impact on the frequency response and the accuracy of the system, as shown in Section 6.2.1. Nevertheless, tens of thousands of orientations can be computed in parallel, consuming only around 2 ms, by operating at the highest $F_{op}$ and by replicating the beamforming stage to exploit all the available resources.
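As an illustration of this upper-bound reasoning, the sketch below combines (36) with the per-beamformer resource figures of Table 4. It is only a back-of-the-envelope estimate, not a synthesis result: the maximum operating frequencies and the BRAM/DSP budgets are taken from Tables 4 and 8, the real attainable frequency would drop as the device fills up, and the printed totals land close to, but not exactly on, the figures reported in Table 8.

```python
# Upper-bound estimate of the number of orientations (N_o) monitored in parallel,
# following Eq. (36) and the beamformer-replication strategy.
BW = 31.25e3  # minimum bandwidth (Hz) of the evaluated design (Table 3)

# (max F_op [Hz], BRAM per beamformer, DSP per beamformer), from Tables 4 and 8
configs = {
    "Inner 4 MICs":  (95.62e6, 2, 4),
    "Inner 12 MICs": (93.27e6, 6, 4),
    "Inner 28 MICs": (91.97e6, 14, 4),
    "All 52 MICs":   (87.91e6, 22, 4),
}
AVAILABLE_BRAM, AVAILABLE_DSP = 140, 220  # Zynq 7020 totals (Table 4)

for name, (f_op, bram, dsp) in configs.items():
    n_o_single = f_op / BW                                   # Eq. (36): one beamformer
    replicas = min(AVAILABLE_BRAM // bram, AVAILABLE_DSP // dsp)
    print(f"{name}: ~{n_o_single:.0f} orientations per beamformer, "
          f"up to {replicas} replicas -> ~{n_o_single * replicas:.0f} in total")
```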


Table 7: Timing analysis of the optimized designs when applying and combining the first two strategies. The values are expressed in ms.

$t_{\text{P-SRP}}$: Initial 169.6 ms; Continuous 131.6 ms; Time multiplexing 2.6 ms; Continuous time multiplexing 2 ms.

Table 8: Maximum $N_o$ when combining strategies. The maximum number of beamformers is obtained based on the available resources and the resource consumption of each beamformer (Table 4). The maximum $F_{op}$ is reported by the Vivado 2016.4 tool after placement and routing.

Continuous time multiplexing: Inner 4 MICs (max beamformers n/a, max $F_{op}$ 95.62 MHz, max $N_o$ 3059); Inner 12 MICs (n/a, 93.27 MHz, 2984); Inner 28 MICs (n/a, 91.97 MHz, 2943); All 52 MICs (n/a, 87.91 MHz, 2813).
Parallel continuous time multiplexing: Inner 4 MICs (max beamformers 55, max $F_{op}$ 95.62 MHz, max $N_o$ 168292); Inner 12 MICs (23, 93.27 MHz, 68650); Inner 28 MICs (10, 91.97 MHz, 29430); All 52 MICs (6, 87.91 MHz, 16879).

7. Conclusions

In this paper, we have presented a scalable and flexible architecture for fast sound-source localization. On one hand, the architecture can flexibly disable sections of the microphone array that are not needed, or disable them to respect power restrictions. The modular approach of the architecture allows scaling the system to a larger or smaller number of microphones. Nevertheless, such capabilities do not impact the frequency response and accuracy of our sound-source locator. On the other hand, several strategies to offer real-time sound-source localization have been presented and evaluated. These strategies not only accelerate the localization but also provide solutions for time-stringent applications with a high angular resolution demand. Thousands of angles can be monitored in parallel, offering high-resolution sound-source localization in a couple of milliseconds.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014-2020 through the ERDF-2020 Project ICITY-RDI.BRU.

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106-5109, IEEE, Dallas, Tex, USA, March 2010.
[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th SENSORS '13 Conference, IEEE, Baltimore, MD, USA, November 2013.
[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.
[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340-345, IEEE, Qinhuangdao, China, September 2015.
[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869-885, 2011.
[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, Hat-type hearing system using MEMS microphone array, 2016.
[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729-2733, IEEE, Brisbane, Australia, April 2015.
[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.
[9] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918-1949, 2014.
[10] Analog Devices, "ADMP521 datasheet: ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.
[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.
[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.
[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Brüel & Kjær, Denmark, 2004.
[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [PhD thesis], Brown University, Providence, RI, USA, 2000.
[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157-180, Springer, Berlin, Germany, 2001.
[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465-468, IEEE, Hoboken, NJ, USA, June 2012.
[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS), Stanford University, Stanford, Calif, USA, 2010.
[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155-162, 1981.
[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf7/48fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.
[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.
[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.
[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14-27, 2005.
[23] T. E. Bogale, L. Vandendorpe, and L. L. Bao, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435-441, IEEE, Oulu, Finland, June 2014.


Figure 4: Examples of a polar map obtained under experimental conditions for sound sources of 5 kHz (a) and 8 kHz (b).

Figure 5: Examples of polar maps with different angular resolutions locating a sound source of 8 kHz (steering 4, 8, 16, 32, and 64 orientations). A low number of orientations clearly leads to a wrong sound-source location.

3.5. Sensor Array Evaluation. The defined P-SRP allows estimating the direction of arrival of multiple sound sources under different sound field conditions. Nevertheless, the precision and accuracy of its estimation can be determined by different quality metrics.

The Filter-and-Sum beamforming is applied to a discrete number of orientations or angles. The angular resolution of the microphone array is determined by the number of measurements per 360° sweep. A higher number of measurements increases the resolution of the P-SRP, displayed as a polar power map (Figure 5), and decreases the location error of the sound source. The lobes of this polar power map can then be used to estimate the bearing of nearby sound sources in nondiffuse sound field conditions. In fact, the characteristics of the main lobe when considering a single sound-source scenario determine the directivity of the microphone array. The definition of array directivity, $D_p$, is proposed in [16] for broadband signals. The authors propose the use of $D_p$ as a metric of the quality of the array, since $D_p$ depends on the main lobe shape and its capacity to unambiguously point to a specific bearing. The definition of array directivity presented in [16] is adapted for 2D polar coordinates in [9] as follows:

$$D_p(\theta, \omega) = \frac{\pi \, P(\theta, \omega)^2}{\frac{1}{2}\int_0^{2\pi} P(\theta, \omega)^2 \, d\theta}, \tag{12}$$

where $P(\theta, \omega)$ is the output power of the array when pointing to the direction $\theta$, and $\frac{1}{2}\int_0^{2\pi} P(\theta, \omega)^2 \, d\theta$ is the sum of the squared output power in all other directions. It can be expressed as the ratio between the area of a circle whose radius is the maximum power of the array and the total area of the power output. Consequently, $D_p$ defines the quality of the microphone array and can be used to specify a certain threshold for the microphone array. For instance, if $D_p$ equals 8, the main lobe is eight times slimmer than the unit circle and offers a confident estimation of a sound source within half a quadrant.

Whereas $D_p$ is usually considered for broadband sound sources, other metrics are necessary to profile the array's response for different types of sound sources. Figure 6 depicts the maximum side lobe (MSL) and the half-power beamwidth, which are two complementary metrics used to characterize the response of arrays for narrowband sound sources. The half-power beamwidth is the angular extent by which the power response has fallen to half of the maximum level of the main lobe. Since the half-power level coincides with a 3 dB drop in power level, it is often called the 3 dB beamwidth (BW−3 dB). This metric determines the angular ratio between the power signal level which is at least 50% of the peak power level and the remaining circle. By contrast, MSL is another important parameter used to represent the impact of the side lobes when characterizing arrays. MSL is the normalized ratio between the highest side lobe and the power level of the main lobe, expressed in dB.

Figure 6: Definitions of the maximum side lobe (MSL) and the 3 dB beamwidth (BW−3 dB).

Figure 7: Main stages of the proposed architecture.

Both metrics, the MSL and the BW−3 dB, are desired to be as low as possible, whereas $D_p$ should be as high as possible to guarantee a precise sound-source location.
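To make these two definitions concrete, the sketch below extracts both metrics from a sampled 360° beam pattern (power versus steering angle). It is a simplified illustration that assumes the main lobe is the global maximum and that anything outside the −3 dB region of that lobe counts as side-lobe territory.

```python
import numpy as np

def msl_and_bw3db(power: np.ndarray):
    """Return (MSL in dB, BW-3dB in degrees) for a sampled 360-degree beam pattern."""
    n = len(power)
    step = 360.0 / n
    p = power / np.max(power)                 # normalize so the main lobe peaks at 1
    peak = int(np.argmax(p))

    # Walk away from the peak in both directions until the power drops below 0.5 (-3 dB).
    left = right = peak
    while p[(left - 1) % n] >= 0.5 and (peak - left) < n:
        left -= 1
    while p[(right + 1) % n] >= 0.5 and (right - peak) < n:
        right += 1
    bw_3db = (right - left + 1) * step        # angular extent of the half-power region

    if right - left + 1 >= n:                 # degenerate case: the whole circle is "main lobe"
        return float("-inf"), 360.0
    # MSL: highest lobe outside the main-lobe region, relative to the main lobe (power ratio in dB).
    outside = [p[i % n] for i in range(right + 1, left + n)]
    return 10.0 * np.log10(max(outside)), bw_3db

# Synthetic example: main lobe at 180 degrees plus one smaller side lobe at 90 degrees.
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
pattern = 0.05 + np.exp(-((theta - np.pi) ** 2) / (2 * 0.15**2)) \
        + 0.3 * np.exp(-((theta - np.pi / 2) ** 2) / (2 * 0.1**2))
print(msl_and_bw3db(pattern))
```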

4. A Filter-and-Sum Based Architecture

The proposed architecture uses a Filter-and-Sum based beamforming technique to locate a sound source with an array of digital MEMS microphones. Many applications, however, demand a certain scalability and flexibility when locating the sound source. With such requirements in mind, the proposed architecture has some additional features to support a dynamic response targeting applications with real-time demands. The proposed architecture is also designed to be battery-power efficient and to operate in a streaming fashion to achieve the fastest possible response.

One of the features of the ADMP521 microphone is its low-power sleep mode capability. When no clock signal is provided, the ADMP521 microphone enters a low-power sleep mode (<1 μA), which makes this sound-source locator suitable for battery-powered implementations. The PCB of the MEMS microphone array is designed to exploit this capability. Figure 2 depicts the subarray distribution of the MEMS microphones. Using the clock signal, it is possible to activate or deactivate subarrays, since each subarray is fed with an individual clock signal. This flexibility allows disabling not only subarrays of microphones but also the associated computational components, decreasing the computational demand and the power consumption.

Table 1: Relevant parameters involved in the proposed architecture.

$F_s$: Sampling frequency
$F_{min}$: Minimum frequency of the target sound source
$F_{max}$: Maximum frequency of the target sound source
BW: Minimum bandwidth to satisfy Nyquist
$D_F$: Decimation factor
$D_{CIC}$: CIC filter decimation factor
$N_{CIC}$: Order of the CIC filter
$D_{FIR}$: FIR filter decimation factor
$N_{FIR}$: Order of the FIR filter

The proposed architecture is properly designed to support such flexibility.

The array computes its response as fast as possible to reach real-time sound-source location. The proposed architecture is designed to process in a streaming fashion and is mainly composed of three cascaded stages operating in pipeline (Figure 7). The first stage is the filter chain, which is composed of the minimum number of components required to recover the audio signal in the target frequency range. The second stage computes the Filter-and-Sum beamforming operation. The final stage obtains $P(\theta, t)$ for the focused orientation. A polar power map is obtained once a complete steering loop is completed. The different stages are discussed in more detail in the following subsections. Table 1 summarizes the most relevant parameters of the proposed architecture.

Figure 8: The filtering stage consists of a couple of filters with a downsampling factor.

4.1. Filter Stage. The filter stage contains a PDM demultiplexer and as many filter chain blocks as MEMS microphones (Figure 8). Each microphone of the array is associated with a filter chain composed of a couple of cascaded filters. The full-capacity design supports up to 52 filter chain blocks working in parallel, but their number is defined by the number of active microphones. The unnecessary filter chain blocks are disabled at runtime.

The microphones' clock $F_S$ determines the input rate and, therefore, how fast the filter stage should operate. The low operating frequency for current FPGAs allows interesting power savings [17].

Every pair of microphones has its PDM output signal multiplexed in time. Thus, at every edge of the clock cycle, the output is the sampled data from one of the microphones. The PDM demultiplexing is the first operation needed to obtain the individual sampled data from each microphone. This task is done in the PDM splitter block.

The next component consists of a cascade of filters to filter and to downsample each microphone signal. Traditional digital filters, such as the Finite Impulse Response (FIR) type of filters, are a good solution to reduce the signal bandwidth and to remove the higher-frequency noise. Once the signal is filtered, it can be decimated to decrease the oversampling to a reasonable audio quality rate (e.g., 48 kHz). However, such a filter consumes many adders and dedicated multipliers (DSPs) from the FPGA resources, particularly as its order increases.

The Cascaded Integrator-Comb (CIC) filter is an alternative low-pass filtering technique, which has been developed in [18, 19] and involves only additions and subtractions. This type of filter consists of three stages: the integrating stage, the decimator or integrator stage, and the comb section. PDM samples are recursively added in the integrating stage while being recursively subtracted, with a differential delay, in the comb stage. The number of recursive operations in both the integrating and comb sections determines the order of the filter ($N_{CIC}$) and should at least be equal to the order of the sigma-delta converter from the DAC of the microphones. After the CIC filter, the signal growth ($G$) is proportional to the decimation factor ($D_{CIC}$) and the differential delay (DD) and is exponential in the filter order [19]:

$$G = (D_{CIC} \cdot \text{DD})^{N_{CIC}}. \tag{13}$$

The output bit width grows proportionally to $G$. Denote by $B_{in}$ the number of input bits; then the number of output bits, $B_{out}$, is as follows:

$$B_{out} = \left\lceil N_{CIC} \cdot \log_2(D_{CIC} \cdot \text{DD}) + B_{in} \right\rceil. \tag{14}$$
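As a small numerical illustration of (13) and (14), the sketch below computes the CIC gain and output bit width for the example values quoted later in Section 6.2 ($N_{CIC}$ = 4, $D_{CIC}$ = 16, DD = 32, 1-bit PDM input). The helper itself is generic; only the plugged-in numbers come from that example design.

```python
import math

def cic_gain_and_width(n_cic: int, d_cic: int, dd: int, b_in: int):
    """Eq. (13) and (14): CIC signal growth and required output bit width."""
    gain = (d_cic * dd) ** n_cic                           # G = (D_CIC * DD)^N_CIC
    b_out = math.ceil(n_cic * math.log2(d_cic * dd) + b_in)
    return gain, b_out

# Example values from Section 6.2: N_CIC = 4, D_CIC = 16, DD = 32, 1-bit PDM input.
gain, b_out = cic_gain_and_width(n_cic=4, d_cic=16, dd=32, b_in=1)
print(f"G = {gain}, output width = {b_out} bits")          # G = 2^36, 37 bits
```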

The proposed CIC decimation filter eliminates higher-frequency noise components and decimates the signal by $D_{CIC}$ at the same time. However, a major disadvantage of this filter is its nonflat frequency response in the desired audio frequency range. In order to improve the flatness of the frequency response, a CIC filter with a lower decimation factor followed by a compensation FIR filter is often chosen, as in [20-22].

The CIC filter is followed by an averager, which is used to cancel out the effect of the microphones' DC offset output, which would otherwise lead to a constant offset in the beamforming values. This block improves the dynamic range, reducing the bit width required to represent the data after the CIC.

The last component of each filter chain is a low-pass compensation FIR filter based on a Kaiser window. This filter equalizes the passband drop usually introduced by CIC filters [19]. It additionally performs a low rate change. The proposed filter needs a cut-off frequency of $F_{max}$ at a sampling rate of $F_s / D_{CIC}$, which is the sampling rate obtained after the CIC decimator filter with a decimation factor of $D_{CIC}$. This low-pass FIR filter is designed in a serial fashion to reduce the resource consumption. In fact, the FIR filter order is also determined by $D_{CIC}$.

Figure 9: Details of the internal structure of the proposed modular Filter-and-Sum beamforming. Note that the delay values are stored in a precomputed table.

Due to the streaming nature of the architecture, the CIC filter is able to generate an output value every clock cycle. Due to the decimation factor, only one output value per $D_{CIC}$ input values is propagated to the low-pass FIR filter. Therefore, the FIR filter has $D_{CIC}$ clock cycles to compute each input value, which determines its maximum order. The filtered signal is then further decimated by a factor of $D_{FIR}$ to obtain a minimum bandwidth BW = $2 \cdot F_{max}$ of the audio signals, satisfying the Nyquist theorem. The overall decimation factor $D_F$ can be expressed based on the rate change of each filter:

$$D_F = D_{CIC} \cdot D_{FIR}. \tag{15}$$

4.2. Beamforming Stage. As detailed before, the main purpose of the beamforming operation is to focus the MEMS microphone array on one particular direction. The detection of sound sources is possible by continuously steering in loops of 360°. The number of orientations $N_o$ determines the angular resolution. Higher angular resolutions demand not only a larger execution time per steering loop but also more FPGA memory resources to store the precomputed delays per orientation.

The beamforming stage depends on the number of microphones and subarrays. Although Filter-and-Sum beamforming assumes a fixed number of microphones and a fixed geometry, our scalable solution satisfies those restrictions while offering a flexible geometry. Figure 9 shows our proposed Filter-and-Sum based beamformer. This stage is basically composed of FPGA blocks of memory (BRAM) used in ring-buffer fashion that properly delay the filtered microphone signals. The values of the delays at a given moment depend on the focus orientation at that moment and are determined by the array pattern $W(w_n, \theta_0, \theta)$ from (5). The delay for a given microphone is determined by its position on the array and by the focus orientation. All possible delay values per microphone for each beamed orientation are precomputed, grouped per orientation, and stored in ROMs at compilation time. At execution time, the delay values $\Delta_m(\theta)$ of each microphone $m$ when pointing to a certain orientation $\theta$ are obtained from this precomputed table.
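The snippet below sketches, in software, the behaviour that the BRAM-based ring buffers implement in hardware: the filtered samples of each microphone are delayed by a precomputed, orientation-dependent number of samples and then summed. The signal array and delay table are placeholders; in the actual design the delays come from the precomputed ROMs described above.

```python
import numpy as np

def delay_and_sum(filtered: np.ndarray, delays: np.ndarray) -> np.ndarray:
    """Delay-and-sum for one orientation.

    filtered: (n_mics, n_samples) filtered PCM streams, one row per microphone.
    delays:   (n_mics,) integer sample delays for this orientation (precomputed).
    """
    n_mics, n_samples = filtered.shape
    max_delay = int(delays.max())
    out = np.zeros(n_samples - max_delay)
    for m in range(n_mics):
        d = int(delays[m])
        # Align microphone m by shifting it with its orientation-specific delay.
        out += filtered[m, max_delay - d : n_samples - d]
    return out

# Hypothetical usage: 52 microphones, one precomputed delay row per orientation.
# beamformed = delay_and_sum(filtered_pcm, delay_table[orientation_index])
```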

The beamforming stage is designed to support a variable number of microphones. This is enabled by grouping the input signals following their subarray structure. Therefore, instead of implementing one single Filter-and-Sum of 52 microphones, there are four Filter-and-Sum operations in parallel for the 4, 8, 16, and 24 microphones. Their sum operation is firstly done locally for each subarray and afterwards between subarrays. The only restriction of this modular beamforming is the synchronization of the outputs in order to have them properly delayed. Therefore, the easiest solution is to delay all the subarrays with the maximum delay of the subarrays.

Figure 10: The power stage consists of a couple of components to calculate the P-SRP and the estimated location of the sound source.

Although the output of some subarrays is already properly delayed, additional delays, shown in the Sums section of Figure 9, are inserted to assure that the proper delay of each subarray has been applied. This is achieved by using the valid output signals of each subarray beamforming, without additional resource cost. Consequently, only the Filter-and-Sum beamforming modules linked to an active subarray are enabled. The inactive beamformers are set to zero in order to avoid any negative impact on the beamforming operation.

A side benefit of this modular approach is a reduction of the memory resource consumption. Since each subarray has its ring-buffer memory properly dimensioned to its maximum sample delay, the portion of underused regions of the consumed memories is significantly low.

4.3. Power Stage. Figure 10 shows the components of the power stage. Once the filtered data has been properly delayed and added for a particular orientation $\theta$, $P(\theta, t)$ is calculated following (10). The P-SRP is obtained after a steering loop, allowing the determination of the sound sources. The sound source is estimated to be located in the direction shown by the peak of the polar power map, which corresponds to the orientation with the maximum $P(\theta, t)$.
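A software analogue of this stage is sketched below: the mean of the squared beamformed output stands in for $P(\theta, t)$ for each steered orientation, and the peak of the resulting polar map gives the estimated bearing. The synthetic "beamformed" matrix is purely illustrative; in the design those signals come from the beamforming stage described above.

```python
import numpy as np

def polar_power_map(beamformed: np.ndarray) -> np.ndarray:
    """P(theta): average power of the beamformed output, one row per orientation."""
    return np.mean(beamformed**2, axis=1)

def estimated_bearing(p_srp: np.ndarray) -> float:
    """Peak detection: orientation (degrees) of the maximum of the P-SRP."""
    return 360.0 * float(np.argmax(p_srp)) / len(p_srp)

# Hypothetical usage with a synthetic steering loop of 64 orientations.
rng = np.random.default_rng(0)
beams = rng.normal(size=(64, 2048))   # stand-in for the beamformed signals
beams[32] *= 3.0                      # pretend the source sits at orientation index 32
p_srp = polar_power_map(beams)
print(f"estimated bearing: {estimated_bearing(p_srp):.1f} degrees")  # ~180 degrees
```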

5. Performance Analysis of the Filter-and-Sum Based Architecture

A performance analysis of the proposed architecture is presented in this section. The analysis shows how the design parameters, such as the filters' characteristics, affect the final execution time of the sound-source locator. The links between performance and design parameters are explained, followed by the description of the different acceleration strategies. These strategies can be applied standalone or combined under certain timing constraints. The advantages of these strategies are presented later, in Section 6.

5.1. Time Parameters. The overall execution time of the proposed architecture is defined by the latency of the main components. A detailed analysis of the implementation of the components and the latency they incur provides good insight into the speed of the system (Table 2). The operating frequency of the design can be assumed to be the same as the sampling frequency. Let us define $t_{\text{P-SRP}}$ as the overall

Table 2: Relevant parameters involved in the performance calculation for the proposed architecture.

$t_s$: Sensing time
$t_o$: Execution time of one orientation
$N_o$: Number of orientations
$L_o$: Latency of the system
$t_{\text{P-SRP}}$: Time required to obtain a polar power map
$t_{II}^{\text{filters}}$: Initiation interval of the filter stage
$t_{\text{filters}}$: Execution time of the filter stage
$t_{II}^{\text{beamforming}}$: Initiation interval of the beamforming stage
$t_{\text{beamforming}}$: Execution time of the beamforming stage
$t_{II}^{\text{power}}$: Initiation interval of the power stage
$t_{\text{power}}$: Execution time of the power stage
$t_{II}$: Sum of all initiation intervals
$t_{II}^{\text{CIC}}$: Initiation interval of the CIC filter
$t_{II}^{\text{DC}}$: Initiation interval of the remove-DC block
$t_{II}^{\text{FIR}}$: Initiation interval of the FIR filter
$t_{II}^{\text{Delay}}$: Initiation interval of the delay memories
$t_{II}^{\text{Sum}}$: Initiation interval of the cascaded sums
$t_{II}^{\text{Power}}$: Initiation interval of the power calculation

execution time, in clock cycles, required to obtain the P-SRP. Thus, $t_{\text{P-SRP}}$ is defined as

$$t_{\text{P-SRP}} = N_o \cdot t_o = N_o \cdot \left(t_{\text{filters}} + t_{\text{beamforming}} + t_{\text{power}}\right), \tag{16}$$

where $t_o$ is the execution time of one orientation and is determined by the execution time of the filter stage ($t_{\text{filters}}$), the execution time of the beamforming stage ($t_{\text{beamforming}}$), and the execution time of the power stage ($t_{\text{power}}$), which are the main components of the system, as explained in the previous section. The proposed architecture is designed to pipeline each stage, overlapping the execution of each component of the design. Therefore, only the initial latency or initiation interval (II) of the components needs to be considered, since it corresponds to the system group delay.

Let us assume that the design operates at the same frequency $F_S$ as the microphones; then (16) can be rearranged as follows:

$$t_{\text{P-SRP}} = N_o \cdot \frac{L_o}{F_S} = N_o \cdot \left(t_{II}^{\text{filters}} + t_{II}^{\text{beamforming}} + t_{II}^{\text{power}} + t_s\right), \tag{17}$$

where $L_o$ is the latency of the system, determined by the initiation interval of the filter stage ($t_{II}^{\text{filters}}$), the initiation interval of the beamforming stage ($t_{II}^{\text{beamforming}}$), and the initiation interval of the power stage ($t_{II}^{\text{power}}$). The time during which the microphone array is monitoring one particular orientation is known as $t_s$. This is the time required to calculate a certain number of output samples ($N_s$). As previously detailed, the digital microphones oversample the audio signal by operating at $F_S$. The reconstruction of the audio signal in the target range demands a certain level of decimation $D_F$.

Figure 11: Timing analysis of the pipelined execution of the components.

This level of decimation is done by the CIC filter and the FIR filter in the filter stage, with decimation factors of $D_{CIC}$ and $D_{FIR}$, respectively. Based on $D_F$, defined in (1), the time $t_s$ is expressed as follows:

$$t_s = \frac{D_F \cdot N_s}{F_S} = \left\lceil \frac{F_S}{\text{BW}} \right\rceil \cdot \frac{N_s}{F_S} \approx \frac{N_s}{2 \cdot F_{max}}. \tag{18}$$

The initiation interval of each stage of the implementation can also be further decomposed based on the latency of its components:

$$t_{II}^{\text{filters}} = t_{II}^{\text{CIC}} + t_{II}^{\text{DC}} + t_{II}^{\text{FIR}}, \qquad t_{II}^{\text{beamforming}} = t_{II}^{\text{Delay}} + t_{II}^{\text{Sum}}, \tag{19}$$

where $t_{II}^{i}$ is the initiation interval of each component $i$. Therefore, $t_{II}$ is defined as the sum of all the initiation intervals:

$$t_{II} = t_{II}^{\text{CIC}} + t_{II}^{\text{DC}} + t_{II}^{\text{FIR}} + t_{II}^{\text{Delay}} + t_{II}^{\text{Sum}} + t_{II}^{\text{Power}}. \tag{20}$$

Equation (16) can then be rearranged (see Figure 11) as

$$t_{\text{P-SRP}} = N_o \cdot (t_{II} + t_s). \tag{21}$$

The execution time $t_{\text{P-SRP}}$ is determined by $N_o$ and $N_s$, since the level of decimation is determined by the target frequency range, and $t_{II}$ is determined by the components' design. Although most of the latency of each component of the design is hidden thanks to the pipelined operation, there are still some cycles dedicated to initializing the components. A detailed analysis of $t_{II}$ provides valuable information about the performance leaks.

CIC. The initiation interval of the CIC filter represents the time required to fill the integrator and the comb stages. Therefore, the order of the CIC filter ($N_{CIC}$) determines $t_{II}^{\text{CIC}}$:

$$t_{II}^{\text{CIC}} = \frac{2 \cdot N_{CIC} + 1}{F_S}. \tag{22}$$

DC. The component which removes the DC level of the signal introduces a minor initial latency due to its internal registers. Since it needs at least two input values to calculate the DC level, it also depends on $D_{CIC}$:

$$t_{II}^{\text{DC}} = \frac{D_{CIC} + 2}{F_S}. \tag{23}$$

FIR. The initiation interval of the FIR filter is also determined by the order of this filter ($N_{FIR}$). Since the filter operation is basically a convolution, the initial output values are not correct until at least the $\lceil (N_{FIR}+1)/2 \rceil$-th input sample of the filter. Because the filters are cascaded, $D_{CIC}$ also affects $t_{II}^{\text{FIR}}$:

$$t_{II}^{\text{FIR}} = \frac{D_{CIC} \cdot \left(\lceil (N_{FIR}+1)/2 \rceil + 1\right)}{F_S}. \tag{24}$$

Therefore, $t_{II}^{\text{filters}}$ is expressed as follows:

$$t_{II}^{\text{filters}} = t_{II}^{\text{CIC}} + t_{II}^{\text{DC}} + t_{II}^{\text{FIR}} = \frac{2 \cdot N_{CIC} + D_{CIC} \cdot \left(2 + \lceil (N_{FIR}+1)/2 \rceil\right) + 3}{F_S}. \tag{25}$$

Delay. The beamforming operation is done through memories which properly delay the audio samples for a particular orientation. The maximum number of samples determines the minimum size of these delay memories. This value represents the maximum delay between a pair of microphones for a certain microphone array distribution and may vary for each orientation. The initiation interval of the Filter-and-Sum beamformer is therefore expressed through the maximum delay between pairs of microphones for a particular orientation:

$$t_{II}^{\text{Delay}} = \frac{\max\left(\Delta_{am}(\theta)\right) \cdot D_F}{F_S}, \tag{26}$$

where $\max(\Delta_{am}(\theta))$ is the maximum time delay of the active microphones for the beamed orientation $\theta$. Therefore, $t_{II}^{\text{Delay}}$ is mainly determined by the microphone array distribution, $F_S$, and the target frequencies determining $D_F$. Due to the symmetry of the microphone array, and for the sake of simplicity, it is assumed that each orientation has the same $\max(\Delta_{am})$. Notice that this does not need to be true for different array configurations.


Sum. The proposed beamforming is composed of not only a set of delay memories but also a sum tree. The initiation interval of this component is defined by the number of active microphones ($N_{am}$):

$$t_{II}^{\text{Sum}} = \frac{\lceil \log_2(N_{am}) \rceil}{F_S}. \tag{27}$$

Therefore, $t_{II}^{\text{beamforming}}$ is expressed as follows:

$$t_{II}^{\text{beamforming}} = t_{II}^{\text{Delay}} + t_{II}^{\text{Sum}} = \frac{\max\left(\Delta_{am}(\theta)\right) \cdot D_F + \lceil \log_2(N_{am}) \rceil}{F_S}. \tag{28}$$

Power. The final component is the calculation of the power per orientation. This simple component has a constant latency of a couple of clock cycles:

$$t_{II}^{\text{Power}} = \frac{2}{F_S}. \tag{29}$$
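To tie (18) and (22)-(29) together, the sketch below evaluates the initiation intervals and the resulting $t_o = t_{II} + t_s$ for one parameter set. The plugged-in values ($F_S$ = 2 MHz, $N_{CIC}$ = 4, $D_{CIC}$ = 16, $N_{FIR}$ = 16, $D_{FIR}$ = 4, $N_s$ = 64, 52 active microphones, and a maximum inter-microphone delay of 15 samples) are taken or inferred from the evaluated design of Section 6; the per-component results land close to, though not exactly on, the figures later reported in Table 6.

```python
import math

F_S = 2e6             # microphone sampling clock (Hz)
N_CIC, D_CIC = 4, 16  # CIC order and decimation factor (Section 6.2)
N_FIR, D_FIR = 16, 4  # FIR order and decimation factor (Section 6.2)
N_S = 64              # output samples per orientation
N_AM = 52             # active microphones (all subarrays)
MAX_DELAY = 15        # max inter-microphone delay in samples (inferred from Table 6)

D_F = D_CIC * D_FIR                                           # Eq. (15)
t_cic   = (2 * N_CIC + 1) / F_S                               # Eq. (22)
t_dc    = (D_CIC + 2) / F_S                                   # Eq. (23)
t_fir   = D_CIC * (math.ceil((N_FIR + 1) / 2) + 1) / F_S      # Eq. (24)
t_delay = MAX_DELAY * D_F / F_S                               # Eq. (26)
t_sum   = math.ceil(math.log2(N_AM)) / F_S                    # Eq. (27)
t_power = 2 / F_S                                             # Eq. (29)

t_ii = t_cic + t_dc + t_fir + t_delay + t_sum + t_power       # Eq. (20)
t_s  = D_F * N_S / F_S                                        # Eq. (18)
t_o  = t_ii + t_s                                             # one orientation, Eq. (21)
print(f"t_II = {t_ii * 1e6:.1f} us, t_s = {t_s * 1e6:.0f} us, t_o = {t_o * 1e3:.2f} ms")
```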

The timing analysis of the initiation interval of each component of the architecture gives an idea of which design parameters have the highest impact. The definition of the filters, mainly their order, is determined by the application specifications, so it should not be modified to reduce the overall execution time. On the other hand, the distribution of the microphones in the array affects not only the frequency response of the system but also the execution time. Notice, however, that the number of microphones has no timing impact; only the number of active microphones has a minor impact, in terms of a couple of clock cycles of difference. Nevertheless, (21) already shows that the dominant parameters are $t_s$ and $N_o$.

5.2. Sensitive Parameters. The timing analysis provides an indication of the parameters dominating the execution time. Some parameters, like the microphone array distribution, which determines the beamforming latency, are fixed, while others, like $N_o$ or $t_s$ per orientation, are variable.

Orientations. Figure 5 depicts how an increment of $N_o$ leads to a better sound-source localization. This resolution, however, has a high repercussion on the response time. A simple strategy is to maintain the angular resolution only where it is needed, while quickly exploring the surrounding sound field. For instance, the authors in [3] propose a strategy to reduce the beamforming exploration to 8 orientations with an angular separation of 45 degrees. Once a steering loop ends, the orientations are rotated one position, which represents a shift operation in the precomputed orientation table. Therefore, all the supported 64 orientations are monitored after 8 steering loops. Although this strategy intends to accelerate the peak detection by monitoring the minimum $N_o$, the overall $N_o$ remains the same for achieving the equivalent angular resolution.

Sensing Time. The sensing time is a well-known parameter in radio frequency applications. The time $t_s$ is known to strengthen the robustness against noise [23]. In our case, the time a receiver monitors the surrounding sound field determines the probability of properly detecting a sound source. Consequently, a higher $t_s$ is needed to detect and locate sound sources under low Signal-to-Noise Ratio (SNR) conditions. Although this term could be modified at runtime to adapt the sensing of the array based on an estimated SNR, this would demand a continuous SNR estimation, which is out of the scope of this paper.

To conclude, Table 2 summarizes the timing definitions. On one hand, $t_s$ determines the number of processed acoustic samples and therefore directly affects the sensing of the system. On the other hand, $N_o$ determines the angular resolution of the sound-source search and influences the accuracy. There is a trade-off between $t_s$, $N_o$, and the quality of the sound-source location.

5.3. Strategies for Time Reduction. The following three strategies are proposed to accelerate the sound-source localization without any impact on the frequency response and $D_P$ of the architecture. An additional strategy is proposed especially for dynamic acoustic environments, but at a certain accuracy cost.

5.3.1. Continuous Processing. The proposed architecture is designed to reset the filter and beamforming stages after $t_o$ due to the orientation transition. Thanks to placing the beamforming after the filter stage, the system can continue processing while resetting: the filter stage does not need to stop. The input data is not lost due to the reset operations, since the filtered input values are stored in the beamforming stage. Furthermore, the initialization of the beamforming stage can also be eliminated, since the stored data from the previous orientation can be reused for the calculation of the new one. With this approach, (17) becomes as follows:

$$t_{\text{P-SRP}} = t_{II}^{\text{filters}} + t_{II}^{\text{beamforming}} + N_o \cdot \left(t_{II}^{\text{power}} + t_s\right) \approx t_{II} + N_o \cdot t_s. \tag{30}$$

5.3.2. Time Multiplexing. Nowadays, FPGAs can operate at clock speeds of hundreds of MHz. Although the power consumption is significantly lower when operating at low frequency [17], the proposed architecture is able to operate at a much higher frequency than the data sampling rate. This capability provides the opportunity to parallelize the beamforming computations without any additional resource consumption. Instead of consuming more logic resources by replicating the main operations, the proposed strategy, similar to Time-Division Multiplexing in communications, consists in time multiplexing these parallel operations. Because the input data is oversampled audio, the selection of the operations to be time multiplexed is limited. Based on (21), the candidates to be parallelized are $N_o$ and $t_s$. Since the input data rate is determined by $F_S$, (18) shows that $t_s$ cannot be reduced without decreasing $N_s$ or changing the target frequency range. Nevertheless, since the computation of each orientation is data independent, the orientations can be parallelized. The

Figure 12: Clock regions for the time multiplexing of the computation of multiple $N_o$.

simultaneous computation of multiple orientations is only possible after the beamforming operation. Let us define $t_{II}^{P}$ as the monitoring time before being able to process multiple orientations in parallel. Therefore,

$$t_{II}^{P} = t_{II}^{\text{CIC}} + t_{II}^{\text{DC}} + t_{II}^{\text{FIR}} + t_{II}^{\text{Delay}}. \tag{31}$$

After $t_{II}^{P}$, the delay memories which compose the Filter-and-Sum beamforming stage have already stored enough audio data to start locating the sound source. Because the beamforming operation relies on delaying the recovered audio signal, multiple orientations can be computed in parallel by accessing the content of the delay memories at a higher speed than the sampling of the input data. It basically multiplexes the output beamforming computations over time. The required frequency $F_P$ to parallelize all $N_o$ for this architecture is defined as follows:

$$F_P = \frac{F_S \cdot N_o}{D_F}. \tag{32}$$

Due to (1), $F_P$ can also be expressed based on the target frequency range:

$$F_P \approx \text{BW} \cdot N_o. \tag{33}$$

Notice that the required frequency to multiplex the computation of the orientations in time does not depend on the number of microphones in the array. Figure 12 shows the clock domains when applying this strategy. While the front-end, consisting of the microphone array and the filter stage, operates at $F_S$, the output of the beamforming is processed at $F_P$. The additional cost in terms of resources is the extension of the register for the power-per-angle calculation: a memory of $N_o$ positions is required instead of the single register used to store the accumulated power values. This strategy allows fully parallelizing the computation of all the orientations. Thus, $t_{\text{P-SRP}}$ is mainly limited by $N_o$ and the maximum reachable frequency of the design, since $F_S$ is determined by the microphones' operational frequency and $D_F$ by the frequency range of the target sound source. In fact, $D_F$ determines how many orientations can be processed in parallel.
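The required multiplexing clock of (32)-(33) can be computed directly, as the short sketch below does for the evaluated configuration ($F_S$ = 2 MHz, $D_F$ = 64). It simply checks, for a desired number of orientations, whether the needed $F_P$ stays below an assumed maximum attainable clock of the implementation; the 100 MHz limit is the illustrative value used in Section 5.3.3, not a synthesis result.

```python
F_S = 2e6           # microphone clock (Hz)
D_F = 64            # overall decimation factor, so BW = F_S / D_F = 31.25 kHz
F_MAX_IMPL = 100e6  # assumed maximum attainable clock of the implementation

def required_fp(n_orientations: int) -> float:
    """Eq. (32): clock needed to time-multiplex all orientations."""
    return F_S * n_orientations / D_F

for n_o in (64, 512, 3200):
    fp = required_fp(n_o)
    verdict = "fits" if fp <= F_MAX_IMPL else "exceeds the assumed clock limit"
    print(f"N_o = {n_o:5d}: F_P = {fp / 1e6:7.2f} MHz ({verdict})")
```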

5.3.3. Parallel Time Multiplexing. This strategy is an extension of the previous one. The frequency $F_P$ is limited by the maximum attainable operating frequency of the implementation, which is determined by many factors, from the technology to the available resources on the FPGA. For instance, if $F_{max}$ equals 30 kHz and the maximum attainable operating frequency is 100 MHz, then up to 1666 orientations could be computed in parallel. However, if not all the resources of the FPGA are completely consumed, especially the internal blocks of memory (BRAM), there is still room for improvement. With the time multiplexing strategy, the memories of the beamforming stage are fully accessed, since in each clock cycle there is at least one memory access, or even two memory accesses when new data is stored. Therefore, more memory resources can be used to further accelerate the computation of the P-SRP. The simple replication of the beamforming stage, preconfigured for different orientations, is enough to double the number of processed orientations while maintaining the same $t_{\text{P-SRP}}$. This strategy mainly consumes BRAMs. Nevertheless, due to the value of $\max(\Delta_m)$ at BW for our microphone array, only a few audio samples are needed to complete the beamforming. This fact drastically reduces the memory consumption, which enables the potential computation of thousands of orientations when applying both strategies.

All strategies can be applied independently, although some will only work properly when combined. Not all strategy combinations are beneficial. For instance, a dynamic angular resolution should only be combined with the time multiplexing of the orientations when $F_P$ is higher than $F_S$. Otherwise, the reduction of $N_o$ by dynamically readjusting the target orientations does not provide any acceleration, and it would only degrade the response of the system.

6. Results

The proposed architecture is evaluated in this section. Our analysis starts by evaluating different design solutions based on the timing analysis introduced in Section 5.1. One representative configuration is then evaluated in terms of frequency response and accuracy by using the metrics described in Section 3.5. This evaluation also considers sensitive parameters such as the number of active subarrays and the relevance of $N_o$, already introduced in Section 5.2. The resource and power consumption for a Zynq 7020 target FPGA are also presented.

Figure 13: Minimum values of $t_o$ based on $F_S$ and $F_{max}$. Different perspectives are displayed in the bottom figures. Notice how the shortest $t_o$ is obtained when increasing $F_{max}$ and $F_S$.

Finally, the strategies presented in Section 5.3 are applied to the representative design.

6.1. General Performance Analysis. The performance analysis proposed in the previous section is here applied to a concrete example. The explored design parameters are $F_S$ and $F_{max}$, keeping $N_s$ and $N_o$ both constant at 64. Whereas $F_S$ is determined by the microphone's sampling frequency, $F_{max}$ is determined by the target application. For our design space exploration, we consider an $F_{max}$ from 10 kHz to 16 kHz in steps of 125 Hz, and $F_S$ ranging from 1.25 MHz to 3.072 MHz, as specified in [10].

Equations (16) to (18) and (20) to (32) are used to obtain $t_{\text{P-SRP}}$. The performance analysis starts by obtaining $D_F$ for every possible value of $F_S$ and $F_{max}$. All possible combinations of $D_{CIC}$ and $D_{FIR}$ are considered based on (15). The low-pass FIR filter parameters are $N_{FIR}$, which is determined by $D_{CIC}$, and $F_{max}$ as the cut-off frequency. Each possible low-pass FIR filter is generated considering a transition band of 2 kHz and an attenuation of at least 60 dB in the stop band. If the minimum order of the filter is higher than $N_{FIR}$, the filter is discarded. We consider these parameters to be realistic constraints for low-pass FIR filters. Furthermore, a minimum order of 4 is defined as a threshold for $N_{FIR}$. Thus, some values are discarded because $D_F$ is a prime number or $N_{FIR}$ is below 4. Each low-pass FIR filter is generated and evaluated in Matlab 2016b.
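A compact way to reproduce the skeleton of this design space exploration is sketched below: for each ($F_S$, $F_{max}$) pair it derives $D_F$, enumerates the $D_{CIC} \cdot D_{FIR}$ factorizations, and drops candidates that cannot sustain a usable FIR order. The actual filter synthesis and the 60 dB stop-band check are done in Matlab in the paper; they are only represented here by the factorization step, and the rule "$D_{CIC}$ must leave at least 4 cycles per FIR output" is an assumption standing in for the minimum-order constraint described above.

```python
import math

def decimation_options(f_s_hz: float, f_max_hz: float, min_fir_order: int = 4):
    """Enumerate (D_F, D_CIC, D_FIR) candidates for one (F_S, F_max) design point.

    D_F = ceil(F_S / BW) with BW = 2 * F_max, following Eq. (18).
    A factorization is kept only if the serial FIR gets at least `min_fir_order`
    cycles per output (i.e. D_CIC >= min_fir_order); prime D_F values therefore
    yield no candidates, as noted in the text.
    """
    d_f = math.ceil(f_s_hz / (2.0 * f_max_hz))
    options = []
    for d_cic in range(min_fir_order, d_f):
        if d_f % d_cic == 0 and d_f // d_cic > 1:
            options.append((d_f, d_cic, d_f // d_cic))
    return options

# Example: the configuration of Table 3 (F_S = 2 MHz, F_max = 15.625 kHz -> D_F = 64).
for cand in decimation_options(2e6, 15625.0):
    print("D_F = %d, D_CIC = %d, D_FIR = %d" % cand)
```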

Figure 13 depicts the minimum timings of the DSE that the proposed Filter-and-Sum architecture needs to compute one orientation. $t_o$ is slightly reduced when varying $F_S$; for instance, it is reduced from 5.03 ms to 3.97 ms when $F_{max}$ = 10 kHz. A higher $F_S$ means faster sampling, which is in fact the operational frequency limiting factor. Furthermore, a larger decrement of $t_{\text{P-SRP}}$ is produced when increasing $F_S$ and $F_{max}$. Higher values of $F_{max}$ allow higher values of $D_{CIC}$, which can greatly reduce the computational complexity of narrowband low-pass filtering. However, too high values of $D_{CIC}$ lead to such low rates that, although a higher-order low-pass FIR filter is supported, it cannot satisfy the low-pass filtering specifications. Notice how the number of possible solutions decreases while increasing $F_{max}$. Due to the $F_S$ and $F_{max}$ ranges, the values of $D_F$ vary between 39 and 154. As previously explained, however, many values cannot be considered, since they are either prime numbers or their decomposition in factors of $D_{CIC}$ leads to values below 4. Because higher values of $F_{max}$ lead to low values of $D_{CIC}$ for low $F_S$, these $D_{CIC}$ values cannot satisfy the specifications of the low-pass FIR filter.

Finally, relatively low values of $t_{\text{P-SRP}}$ are obtained for $F_{max}$ values from 10 kHz to 10.65 kHz and $F_S$ ranging from 2.7 MHz to 3.072 MHz. This is produced by high values of $D_{CIC}$, which means that a higher-order low-pass FIR filter is supported. As expected, high values of $D_{CIC}$ lead to high-order low-pass FIR filters and lower $D_{FIR}$. A lower $t_{\text{P-SRP}}$ is possible thanks to avoiding unnecessary computations, since fewer samples are decimated after the low-pass FIR filter.

6.2. Analysis of a Design. As shown in Figure 13, several design considerations drastically affect the final performance.

Figure 14: Waterfall diagrams of the proposed architecture. The figures are obtained by enabling only a certain number of subarrays. From (a) to (d): only the 4 innermost microphones, only the 12 innermost microphones, the 28 innermost microphones, and all 52 microphones.

Table 3: Configuration of the architecture under analysis.

$F_s$: Sampling frequency: 2 MHz
$F_{min}$: Minimum frequency: 1 kHz
$F_{max}$: Maximum frequency: 15.625 kHz
BW: Minimum bandwidth to satisfy Nyquist: 31.25 kHz
$D_F$: Decimation factor: 64
$D_{CIC}$: CIC filter decimation factor: 16
$N_{CIC}$: Order of the CIC filter: 4
$D_{FIR}$: FIR filter decimation factor: 4
$N_{FIR}$: Order of the FIR filter: 16

However, most of these design decisions do not have a significant impact on the system response compared to other factors, such as the number of active microphones or the number of orientations. The analysis of the impact of these parameters on the system's response and performance is done over one particular design.

Table 3 summarizes the configuration of the architecture. The design considers $F_s$ = 2 MHz, which is the clock for the microphones and the functional frequency of the design. This value of $F_s$ is an intermediate value within the range of clock signals required by the ADMP521 microphones [10]. The selected cut-off frequency is $F_{max}$ = 15.625 kHz, which leads to $D_F$ = 64. In this example design, $N_{CIC}$ = 4, with a decimation factor of 16 and a differential delay of 32. The chosen FIR filter has a beta factor of 2.7 and a cut-off frequency of $F_{max}$ at a sampling rate of 125 kHz, which is the sampling rate obtained after the CIC decimator filter with $D_{CIC}$ = 16. The filtered signal is then further decimated by a factor $D_{FIR}$ = 4 to obtain a BW = 31.25 kHz audio signal.

The architecture is designed to support a complete steering loop of up to 64 orientations, which represents an angular resolution of 5.625°. On the other hand, the subarray approach allows activating all 52 microphones when all 4 subarrays are active. The final results are obtained by assuming a speed of sound of approximately 343.2 m/s.

6.2.1. Frequency Response. The waterfall diagrams of Figure 14 show the power output of the combined subarrays in all directions for all frequencies. In our case, the results are calculated with a single sound source varying between 100 Hz and 15 kHz in steps of 100 Hz and placed at 180°. All results are normalized per frequency. Every waterfall shows a clear, distinctive main lobe. When only subarray 1 is active, there are side lobes at 5.3 kHz and 10.6 kHz which impede the sound-source location for those frequencies. The frequency response of the subarrays improves when they are combined, since their frequency responses are superposed. The combination of subarrays 1 and 2 reaches a minimum detectable frequency of 3.1 kHz.

Figure 15: Directivities when considering a variable number of orientations and active microphones. From (a) to (d): $D_P$ with only 8 orientations in (a) up to 64 orientations in (d).

When subarrays 1, 2, and 3 are combined, and when all subarrays are active, the minimum detectable frequency reaches 2.1 kHz and 1.6 kHz, respectively. These minimum values are clearly depicted in Figure 15, with a threshold of 8 for $D_P$, which indicates that the main lobe's surface corresponds to maximally half of a quadrant. The frequency response of the combination of subarrays shows a strong variation in the main lobe and, therefore, in $D_P$. Figure 15 depicts the evolution of $D_P$ when increasing the angular resolution and when combining subarrays. The angular resolution determines the upper bound to which $D_P$ converges, which depends on the number of orientations. The number of active microphones, on the other hand, influences how fast $D_P$ converges to this upper limit. Consequently, the number of active microphones determines the minimum frequency which can be located when considering a threshold of 8 for $D_P$. Alongside the directivity, other metrics, such as the main beamwidth and the MSL level, are also calculated to properly evaluate the quality of the array's response. Figure 16 depicts the MSL when varying the number of active subarrays and the number of orientations. A low angular resolution leads to a lower resolution of the waterfall diagrams, but only the metrics can show the real impact. At frequencies between 1 and 3 kHz, the main lobe converges to a unit circle, which can be explained by the lack of any side lobe. Higher frequencies present secondary lobes, especially when only the inner subarray is active, which increases the MSL values independently of the angular resolution. A low angular resolution leads to unexpectedly low values of MSL, since the secondary lobes are not detected. On the other hand, a higher number of active microphones leads to lower values of MSL, independently of the angular resolution.

Figure 17 depicts the BW−3 dB metric for a similar analysis of the number of microphones and the angular resolution.

Figure 16: Measured MSL when considering a variable number of orientations and active microphones. From (a) to (d): the MSL with only 8 orientations in (a) up to 64 orientations in (d).

Nevertheless, the $BW_{-3dB}$ of each subarray converges to a minimum which is only reached at higher frequencies. The angular resolution determines this minimum, which ranges from 90° to 11.25° when 8 or 64 orientations are considered, respectively.
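A similar sketch can extract BW_-3dB from a sampled response; because the steering grid is discrete, the measured width is quantized to the angular step, which is why the reachable minimum depends on the number of orientations. The helper below is an illustrative approximation, not the authors' measurement code.

```python
import numpy as np

def beamwidth_3db_degrees(p, n_orientations):
    """BW-3dB: angular width of the main lobe at half of its peak power.

    p is the P-SRP sampled at n_orientations equally spaced angles, so the
    reported width is a multiple of 360/n_orientations degrees.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.max()
    step = 360.0 / n_orientations
    n, peak = len(p), int(np.argmax(p))
    width = step                              # the peak sample itself
    for direction in (1, -1):                 # grow the lobe to both sides
        i = peak
        while True:
            i = (i + direction) % n
            if p[i] < 0.5 or i == peak:
                break
            width += step
    return min(width, 360.0)

# With 64 orientations a main lobe spanning two samples is reported as
# 2 * 5.625 = 11.25 degrees; with 8 orientations the same lobe would be
# reported as 2 * 45 = 90 degrees, consistent with the floors quoted above.
```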

6.2.2. Resource Consumption and Power Analysis. Table 4 summarizes the resource consumption when combining subarrays. The consumed resources are divided into the resources for the filter stage, the beamforming stage, and the total consumption per group of subarrays. The filter stage mostly consumes DSPs, while the beamforming stage mainly demands BRAMs. Most of the resource consumption is dominated by the filter stage, since a filter chain is dedicated to each MEMS microphone. What determines the resource consumption is the number of active subarrays.

The flexibility of our architecture allows the creation of heterogeneous sound-source locators. Thus, the architecture can be scaled for small FPGAs based on the target sound-source profile or a particular desirable power consumption. For instance, the combination of the two inner subarrays would use 12 microphones while consuming less than 10% of the available resources. The LUTs are the limiting resource due to the internal registers of the filters. In fact, when all the subarrays are used, around 80% of the available LUTs are required. Nevertheless, any subarray can be disabled at runtime, which directly deactivates its associated filter and beamforming components. Although this does not affect the resource consumption, it has a direct impact on the power consumption. Table 5 shows the power consumption in mW based on the number of active subarrays.
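As a rough check of the utilization figures discussed above, the snippet below divides the totals of Table 4 (after placement and routing) by the resources available on the target FPGA and reports the limiting resource per subarray combination; it is a bookkeeping sketch, not part of the design itself.

```python
# Per-resource utilization for each subarray combination, using the
# availability and total-consumption columns of Table 4.
AVAILABLE = {"registers": 106400, "LUTs": 53200, "BRAM": 140, "DSP48": 220}

TOTALS = {
    "Inner 4 MICs":  {"registers": 6144,  "LUTs": 4732,  "BRAM": 2,  "DSP48": 12},
    "Inner 12 MICs": {"registers": 16882, "LUTs": 12299, "BRAM": 6,  "DSP48": 28},
    "Inner 28 MICs": {"registers": 38183, "LUTs": 27318, "BRAM": 14, "DSP48": 60},
    "All 52 MICs":   {"registers": 59093, "LUTs": 42319, "BRAM": 22, "DSP48": 92},
}

for combo, used in TOTALS.items():
    util = {r: 100.0 * used[r] / AVAILABLE[r] for r in AVAILABLE}
    limiting = max(util, key=util.get)
    print(f"{combo:14s} " + "  ".join(f"{r}: {u:4.1f}%" for r, u in util.items())
          + f"  -> limiting: {limiting}")
# With all 52 microphones the LUTs are the bottleneck at roughly 80% utilization,
# matching the discussion above.
```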


Figure 17: Measured $BW_{-3dB}$ level when considering a variable number of orientations and active microphones. From (a) to (d): the $BW_{-3dB}$ with only 8 orientations up to 64 orientations in (d). Each panel plots the beamwidth at -3 dB (degrees) against the sound-source frequency (Hz) for the inner 4, 12, and 28 microphones and all 52 microphones.

The power consumption of the microphones is also considered, since the FPGA and the microphone array are powered from the same source. Thus, the overall power consumption must be considered, since the architecture is designed for an embedded system. The MEMS microphones are powered with 3.3 volts, which represents a power consumption per microphone of 2.64 µW and 3.96 mW for the inactive and active microphones, respectively. Notice how the power consumption increases with the number of active subarrays. There is a turning point when 3 or 4 subarrays are active: the microphone array consumes more power than the FPGA when all the subarrays are active.
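The following sketch reproduces the totals of Table 5 from the per-microphone figures quoted above and the reported on-chip power; it is a simple budget estimate under the assumption that inactive microphones stay clock-gated.

```python
# Power budget (mW): active/inactive MEMS microphones plus the on-chip
# static and dynamic power reported by Vivado (Table 5).
MIC_ACTIVE_MW = 3.96
MIC_INACTIVE_MW = 0.00264
TOTAL_MICS = 52

ONCHIP_MW = {          # (static, dynamic) per combination, from Table 5
    "Inner 4 MICs":  (120, 2),
    "Inner 12 MICs": (120, 5),
    "Inner 28 MICs": (121, 11),
    "All 52 MICs":   (122, 16),
}
ACTIVE_MICS = {"Inner 4 MICs": 4, "Inner 12 MICs": 12, "Inner 28 MICs": 28, "All 52 MICs": 52}

for combo, (static, dynamic) in ONCHIP_MW.items():
    n_active = ACTIVE_MICS[combo]
    mics = n_active * MIC_ACTIVE_MW + (TOTAL_MICS - n_active) * MIC_INACTIVE_MW
    total = mics + static + dynamic
    print(f"{combo:14s} microphones: {mics:6.2f} mW  FPGA: {static + dynamic} mW  total: {total:6.2f} mW")
# For all 52 microphones the array draws about 205.9 mW against 138 mW on-chip,
# i.e. the microphones dominate once three or four subarrays are active.
```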

6.2.3. Timing Analysis. The timing analysis, based on Section 5, of the design under evaluation is summarized in Table 6. A complete steering loop requires around 169.6 ms, while $t_o$ rounds to 2.6 ms. Notice that the initialization ($t_{II}$) consumes around 21.5% of the execution time. Fortunately, this initialization can be almost completely removed when applying the first strategy described in Section 5.3.1.
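A quick sanity check of these figures, using the values reported in Table 6, is sketched below.

```python
# Unoptimized timing of the evaluated design (values in microseconds, Table 6).
t_ii = 571      # sum of all initiation intervals
t_o = 2650      # execution time of one orientation (initiation plus sensing)
N_o = 64        # orientations in a full steering loop

print(t_o * N_o / 1000.0)             # 169.6 ms for one polar power map
print(round(100.0 * t_ii / t_o, 1))   # 21.5 -> share of t_o spent on initialization, in %
```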

Table 7 summarizes the timing results when applying the first strategies proposed in Section 5. The elimination of the initialization after each orientation's transition slightly reduces $t_{\text{P-SRP}}$. In this case, $t_{\text{P-SRP}}$ is expressed as follows:

$t_{\text{P-SRP}} = t_{II} + N_o \cdot t_s$ (34)
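As a worked example, plugging the Table 6 values into (34) reproduces the "Continuous" column of Table 7.

```python
# (34) with the Table 6 values: the initiation intervals are paid only once,
# so the steering loop is dominated by the sensing time.
t_ii_ms = 0.571   # sum of all initiation intervals
t_s_ms = 2.048    # sensing time per orientation
N_o = 64

print(round(t_ii_ms + N_o * t_s_ms, 1))   # 131.6 ms, "Continuous" in Table 7
```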

The main improvement is obtained after time multiplexing the computation of the power per orientation. In this case, $F_P$, the operational frequency of the beamforming computation needed to process all $N_o$ in parallel, equals $F_S$, as expressed in (32). This is possible because $D_F$ and $N_o$ have the same value. Therefore, there is no need to have a different clock for the beamforming operation, since the spacing between output filtered values from the filter stage is large enough.


Table 4: Resource consumption after placement and routing when combining microphone subarrays. Each subarray combination details the resource consumption of the filter and the beamforming stages (F = filters, B = beamforming, T = total).

Resources (available) | Inner 4 MICs F/B/T | Inner 12 MICs F/B/T | Inner 28 MICs F/B/T | All 52 MICs F/B/T
Slice registers (106400) | 5043 / 626 / 6144 | 14859 / 1540 / 16882 | 34489 / 3195 / 38183 | 54042 / 4447 / 59093
Slice LUTs (53200) | 3612 / 344 / 4732 | 10759 / 754 / 12299 | 25032 / 1486 / 27318 | 37221 / 2221 / 42319
LUT-FF pairs (86689) | 2329 / 199 / 2773 | 7013 / 512 / 7779 | 16353 / 1069 / 17698 | 23656 / 1664 / 27619
BRAM (140) | 0 / 2 / 2 | 0 / 6 / 6 | 0 / 14 / 14 | 0 / 22 / 22
DSP48 (220) | 8 / 4 / 12 | 24 / 4 / 28 | 56 / 4 / 60 | 88 / 4 / 92

Table 5: Power consumption at $F_S$ = 2 MHz, expressed in mW, when combining microphone subarrays. Values obtained from the Vivado 2016.4 power report.

Subarrays | MEMS microphones (active / inactive / total) | Reported on-chip power (static / dynamic / total) | Total power
Inner 4 MICs | 15.84 / 0.13 / 15.97 | 120 / 2 / 122 | 137.97
Inner 12 MICs | 47.52 / 0.11 / 47.63 | 120 / 5 / 125 | 172.63
Inner 28 MICs | 110.88 / 0.06 / 110.94 | 121 / 11 / 132 | 242.94
All 52 MICs | 205.92 / 0 / 205.92 | 122 / 16 / 138 | 343.92

Table 6: Timing analysis, without any optimization, of the design under evaluation. The values are expressed in µs.

Parameter | Definition | Value [µs]
$t_{II}^{CIC}$ | Initiation interval of the CIC filter | 4.5
$t_{II}^{DC}$ | Initiation interval of the removed DC block | 9
$t_{II}^{FIR}$ | Initiation interval of the FIR filter | 72
$t_{II}^{Delay}$ | Initiation interval of the delay memories | 480
$t_{II}^{Sum}$ | Initiation interval of the cascaded sums | 3.5
$t_{II}^{Power}$ | Initiation interval of the power calculation | 1
$t_{II}^{filters}$ | Initiation interval of the filter stage | 85.5
$t_{II}^{beamforming}$ | Initiation interval of the beamforming stage | 484.5
$t_{II}^{power}$ | Initiation interval of the power stage | 1
$t_{II}$ | Sum of all initiation intervals | 571
$t_s$ | Sensing time | 2048
$t_o$ | Execution time of one orientation | 2650
$t_{\text{P-SRP}}$ | Time required to obtain a polar power map | 169600

By combining the first two strategies, $t_{\text{P-SRP}}$ rounds to 2 ms, and only the first steering loop needs 2.6 ms due to $t_{II}^{P}$. In this case, $t_{\text{P-SRP}}$ is expressed as follows:

$t_{\text{P-SRP}} = t_{II}^{P} + t_s \approx t_s$ (35)
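A short numerical check, using the parameters of the evaluated configuration (F_S = 2 MHz, D_F = N_o = 64, and the sensing time of Table 6), illustrates (32) and (35).

```python
# Time multiplexing of the orientations: with D_F = N_o, the beamforming
# clock F_P of (32) equals the sampling clock F_S, and by (35) the steering
# loop collapses to roughly one sensing time.
F_S = 2_000_000    # microphone sampling clock (Hz)
D_F = 64           # overall decimation factor
N_o = 64           # orientations computed in parallel
t_s_ms = 2.048     # sensing time per orientation (Table 6)

F_P = F_S * N_o / D_F
print(F_P == F_S)                  # True: no faster beamforming clock is needed
print(f"t_P-SRP ~= {t_s_ms} ms")   # about 2 ms, as in the last column of Table 7
```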

The other two strategies proposed in Section 5.3 are designed to fully exploit the FPGA resources and to overcome time constraints when considering a high angular resolution. In the first case, since the design under evaluation has a small angular resolution ($N_o$ = 64), there is no need for a higher $F_P$ when applying the time multiplexing strategy. However, a higher angular resolution can be obtained by using the unconsumed resources without additional timing cost. Table 8 shows that the combination of strategies increases the angular resolution without additional time penalty. The operational frequency ($F_{op}$) determines at what speed the FPGA can operate. Following (33), the beamforming operation can be exploited by increasing $F_P$ up to the maximum frequency, which increases $N_o$ as well:

$\max(N_o) = \frac{\max(F_{op})}{\text{BW}} = \frac{F_P}{\text{BW}}$ (36)

Many thousands of orientations can be computed in parallel when combining all strategies. The beamforming stage can be replicated as many times as the remaining available resources allow. Of course, this estimation is certainly optimistic, since the frequency drops when the resource consumption increases. Nevertheless, this provides an upper bound for $N_o$. For instance, when only the inner subarray is considered, the DSPs are the limiting component; even so, up to 53 beamforming stages could theoretically be placed in parallel. When more subarrays are active, the BRAMs are the constraining component. Notice how the number of supported orientations increases as the number of subarrays decreases. This has, however, an impact on the frequency response and the accuracy of the system, as shown in Section 6.2.1. Nevertheless, tens of thousands of orientations can be computed in parallel, consuming only around 2 ms, by operating at the highest $F_{op}$ and by replicating the beamforming stage to exploit all the available resources.
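The sketch below reproduces the upper bounds of Table 8 from (36), using the reported maximum operating frequencies and beamformer counts; BW = F_S/D_F = 31.25 kHz for the evaluated design, and the printed values agree with Table 8 up to the rounding of the reported F_op.

```python
# Upper bound on N_o from (36), with and without replicating the beamforming
# stage (the "max beamformers" row of Table 8).
BW = 31_250.0
MAX_FOP_HZ = {"Inner 4 MICs": 95.62e6, "Inner 12 MICs": 93.27e6,
              "Inner 28 MICs": 91.97e6, "All 52 MICs": 87.91e6}
MAX_BEAMFORMERS = {"Inner 4 MICs": 55, "Inner 12 MICs": 23,
                   "Inner 28 MICs": 10, "All 52 MICs": 6}

for combo, f_op in MAX_FOP_HZ.items():
    n_o = int(f_op / BW)                          # time multiplexing alone
    n_o_parallel = n_o * MAX_BEAMFORMERS[combo]   # plus replicated beamformers
    print(f"{combo:14s} max N_o: {n_o:5d}  with replication: {n_o_parallel:6d}")
```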


Table 7: Timing analysis of the optimized designs when applying and combining the first two strategies. The values are expressed in ms.

 | Initial | Continuous | Time multiplexing | Continuous time multiplexing
$t_{\text{P-SRP}}$ | 169.6 ms | 131.6 ms | 2.6 ms | 2 ms

Table 8: Maximum $N_o$ when combining strategies. The maximum number of beamformers is obtained based on the available resources and the resource consumption of each beamformer (Table 4). The maximum $F_{op}$ is reported by the Vivado 2016.4 tool after placement and routing.

 | Continuous time multiplexing (Inner 4 / Inner 12 / Inner 28 / All 52 MICs) | Parallel continuous time multiplexing (Inner 4 / Inner 12 / Inner 28 / All 52 MICs)
Max beamformers | - / - / - / - | 55 / 23 / 10 / 6
Max $F_{op}$ | 95.62 / 93.27 / 91.97 / 87.91 MHz | 95.62 / 93.27 / 91.97 / 87.91 MHz
Max $N_o$ | 3059 / 2984 / 2943 / 2813 | 168292 / 68650 / 29430 / 16879

7. Conclusions

In this paper, we have presented a scalable and flexible architecture for fast sound-source localization. On the one hand, the architecture can flexibly disable sections of the microphone array that are not needed, or disable them to respect power restrictions. The modular approach of the architecture allows scaling the system to a larger or smaller number of microphones. Nevertheless, such capabilities do not impact the frequency response and accuracy of our sound-source locator. On the other hand, several strategies to offer real-time sound-source localization have been presented and evaluated. These strategies not only accelerate the localization but also provide solutions for time-stringent applications with a high angular resolution demand. Thousands of angles can be monitored in parallel, offering a high-resolution sound-source localization in a couple of milliseconds.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014–2020 through the ERDF-2020 Project ICITY-RDI.BRU.

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106–5109, IEEE, Dallas, Tex, USA, March 2010.

[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th SENSORS '13 Conference, IEEE, Baltimore, MD, USA, November 2013.

[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.

[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340–345, IEEE, Qinhuangdao, China, September 2015.

[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869–885, 2011.

[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, Hat-type hearing system using MEMS microphone array, 2016.

[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729–2733, IEEE, Brisbane, Australia, April 2015.

[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.

[9] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.

[10] Analog Devices, "ADMP521 datasheet: ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.

[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.

[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.

[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Brüel & Kjær, Denmark, 2004.

[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [PhD thesis], Brown University, Providence, RI, USA, 2000.

[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157–180, Springer, Berlin, Germany, 2001.

[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465–468, IEEE, Hoboken, NJ, USA, June 2012.

[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS), Stanford University, Stanford, Calif, USA, 2010.

[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155–162, 1981.

[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf7/48fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.

[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.

[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.

[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14–27, 2005.

[23] T. E. Bogale, L. Vandendorpe, and L. L. Bao, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435–441, IEEE, Oulu, Finland, June 2014.




621 Frequency Response The waterfall diagrams of Fig-ure 14 show the power output of the combined subarraysin all directions for all frequencies In our case the resultsare calculated with a single sound-source varying between100Hz and 15 kHz in steps of 100Hz and placed at 180∘All results are normalized per frequency Every waterfallshows a clear distinctive main lobe When only subarray 1is active there are side lobes at 53 kHz and 106 kHz whichimpede the sound-source location for those frequencies Thefrequency response of the subarrays improves when they arecombined since their frequency responses are superposedThe combination of the subarrays 1 and 2 reaches a minimum

Journal of Sensors 15

Figure 15: Directivities when considering a variable number of orientations and active microphones. From (a) to (d): $D_P$ with only 8 orientations up to 64 orientations on (d), plotted against the sound-source frequency (Hz) for the inner 4, 12, and 28 microphones and all 52 microphones, together with the threshold of 8.

The combination of subarrays 1 and 2 reaches a minimum detectable frequency of 3.1 kHz; the combination of subarrays 1, 2, and 3 and of all subarrays reaches 2.1 kHz and 1.6 kHz, respectively. These minimum values are clearly depicted in Figure 15 with a threshold of 8 for $D_P$, which indicates that the main lobe's surface corresponds to maximally half of a quadrant. The frequency response of the combination of subarrays has a strong variation at the main lobe and, therefore, in $D_P$. Figure 15 depicts the evolution of $D_P$ when increasing the angular resolution and when combining subarrays. The angular resolution determines the upper bound to which $D_P$ converges, which is dependent on the number of orientations. The number of active microphones, on the other hand, influences how fast $D_P$ converges to its upper limit. Consequently, the number of active microphones determines the minimum frequency which can be located when considering a threshold of 8 for $D_P$.

Alongside the directivity, other metrics such as the main beamwidth and the MSL levels are also calculated to properly evaluate the quality of the array's response. Figure 16 depicts the MSL when varying the number of active subarrays and the number of orientations. A low angular resolution leads to a lower resolution of the waterfall diagrams, but only the metrics can show the impact. At frequencies between 1 and 3 kHz, the main lobe converges to a unit circle, which can be explained by the lack of any side lobe. Higher frequencies present secondary lobes, especially when only the inner subarray is active, which increases the MSL values independently of the angular resolution. A low angular resolution leads to unexpectedly low values of MSL, since the secondary lobes are not detected. On the other hand, a higher number of active microphones leads to lower values of MSL independently of the angular resolution.

Figure 17 depicts the BW$_{-3\,\text{dB}}$ metric for a similar analysis of the number of microphones and the angular resolution.


Figure 16: Measured MSL when considering a variable number of orientations and active microphones. From (a) to (d): the MSL with only 8 orientations up to 64 orientations on (d), showing the maximum side lobe level (dB) against the sound-source frequency (Hz) for the inner 4, 12, and 28 microphones and all 52 microphones.

On the one hand, a higher number of microphones produces a faster decrement of BW$_{-3\,\text{dB}}$, reflected as a thinner main lobe. Nevertheless, the BW$_{-3\,\text{dB}}$ of each subarray converges to a minimum, which is only reached at higher frequencies. The angular resolution determines this minimum, which ranges from 90° to 11.25° when 8 or 64 orientations are considered, respectively.

6.2.2. Resource Consumption and Power Analysis. Table 4 summarizes the resource consumption when combining subarrays. The consumed resources are divided into the resources for the filter stage, the beamforming stage, and the total consumption per group of subarrays. The filter stage mostly consumes DSPs, while the beamforming stage mainly demands BRAMs. Most of the resource consumption is dominated by the filter stage, since a filter chain is dedicated to each MEMS microphone. What determines the resource consumption is therefore the number of active subarrays.

The flexibility of our architecture allows the creation of heterogeneous sound-source locators. Thus, the architecture can be scaled for small FPGAs based on the target sound-source profile or a particular desirable power consumption. For instance, the combination of the two inner subarrays would use 12 microphones while consuming less than 10% of the available resources. The LUTs are the limiting resource due to the internal registers of the filters. In fact, when all the subarrays are used, around 80% of the available LUTs are required. Nevertheless, any subarray can be disabled at runtime, which directly deactivates its associated filter and beamforming components. Although this does not affect the resource consumption, it has a direct impact on the power consumption.


Figure 17: Measured BW$_{-3\,\text{dB}}$ level when considering a variable number of orientations and active microphones. From (a) to (d): the BW$_{-3\,\text{dB}}$ with only 8 orientations up to 64 orientations on (d), showing the beamwidth at −3 dB (degrees) against the sound-source frequency (Hz) for the inner 4, 12, and 28 microphones and all 52 microphones.

Table 5 shows the power consumption, in mW, based on the number of active subarrays. The power consumption of the microphones is also considered, since the FPGA and the microphone array are powered from the same source. Thus, the overall power consumption must be considered, since the architecture is designed for an embedded system. The MEMS microphones are powered with 3.3 volts, which represents a power consumption per microphone of 2.64 μW and 3.96 mW for the inactive and active microphones, respectively. Notice how the power consumption increases with the number of active subarrays. There is a turning point when going from 3 to 4 active subarrays: the microphone array consumes more power than the FPGA when all the subarrays are active.
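A quick check of the microphone-array contribution (a minimal sketch, assuming 3.96 mW per active and 2.64 μW per inactive microphone out of the 52 available) reproduces the microphone columns of Table 5 and makes this turning point explicit:

```python
# Microphone-array power for each subarray combination, as in Table 5 (mW).
P_ACTIVE_MW = 3.96
P_INACTIVE_MW = 0.00264
TOTAL_MICS = 52

for label, active in [("Inner 4 MICs", 4), ("Inner 12 MICs", 12),
                      ("Inner 28 MICs", 28), ("All 52 MICs", 52)]:
    p_mics = active * P_ACTIVE_MW + (TOTAL_MICS - active) * P_INACTIVE_MW
    print(f"{label}: {p_mics:.2f} mW for the microphone array")
```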

6.2.3. Timing Analysis. The timing analysis, based on Section 5, of the design under evaluation is summarized in Table 6. A complete steering loop requires around 169.6 ms, while $t_o$ rounds to 2.65 ms. Notice that the initialization ($t_{II}$) consumes around 21.5% of the execution time of one orientation. Fortunately, this initialization can be almost completely removed when applying the first strategy described in Section 5.3.1.

Table 7 summarizes the timing results when applying the first strategies proposed in Section 5. The elimination of the initialization after each orientation's transition slightly reduces $t_{\text{P-SRP}}$. In this case, $t_{\text{P-SRP}}$ is expressed as follows:

$$t_{\text{P-SRP}} = t_{II} + N_o \cdot t_s. \quad (34)$$

The main improvement is obtained after time multiplexing the computation of the power per orientation. In this case, $F_P$, the operational frequency of the beamforming computation needed to process all $N_o$ in parallel, equals $F_S$, as expressed in (32). This is possible because $D_F$ and $N_o$ have the same value. Therefore, there is no need to have a different clock for the beamforming operation, since the spacing between output filtered values from the filter stage is large enough.


Table 4: Resource consumption after placement and routing when combining microphone subarrays. Each subarray combination details the resource consumption of the filter and the beamforming stage.

                             Inner 4 MICs              Inner 12 MICs              Inner 28 MICs              All 52 MICs
Resources        Available   Filters  Beamf.  Total    Filters  Beamf.  Total     Filters  Beamf.  Total     Filters  Beamf.  Total
Slice registers    106400      5043     626    6144     14859    1540   16882      34489    3195   38183      54042    4447   59093
Slice LUTs          53200      3612     344    4732     10759     754   12299      25032    1486   27318      37221    2221   42319
LUT-FF pairs        86689      2329     199    2773      7013     512    7779      16353    1069   17698      23656    1664   27619
BRAM                  140         0       2       2         0       6       6          0      14      14          0      22      22
DSP48                 220         8       4      12        24       4      28         56       4      60         88       4      92

Table 5: Power consumption at $F_S$ = 2 MHz, expressed in mW, when combining microphone subarrays. Values obtained from the Vivado 2016.4 power report.

                  Active MEMS microphones         Reported on-chip power          Total
Subarrays         Active   Inactive   Total       Static   Dynamic   Total        power
Inner 4 MICs       15.84     0.13      15.97       120        2       122         137.97
Inner 12 MICs      47.52     0.11      47.63       120        5       125         172.63
Inner 28 MICs     110.88     0.06     110.94       121       11       132         242.94
All 52 MICs       205.92     0        205.92       122       16       138         343.92

Table 6: Timing analysis, without any optimization, of the design under evaluation. The values are expressed in μs.

Parameter                       Definition                                      Value [μs]
$t^{\text{CIC}}_{II}$           Initiation interval of the CIC filter                4.5
$t^{\text{DC}}_{II}$            Initiation interval of the remove-DC block           9
$t^{\text{FIR}}_{II}$           Initiation interval of the FIR filter               72
$t^{\text{Delay}}_{II}$         Initiation interval of the delay memories          480
$t^{\text{Sum}}_{II}$           Initiation interval of the cascaded sums             3.5
$t^{\text{Power}}_{II}$         Initiation interval of the power calculation         1
$t^{\text{filters}}_{II}$       Initiation interval of the filter stage             85.5
$t^{\text{beamforming}}_{II}$   Initiation interval of the beamforming stage       484.5
$t^{\text{power}}_{II}$         Initiation interval of the power stage               1
$t_{II}$                        Sum of all initiation intervals                    571
$t_s$                           Sensing time                                      2048
$t_o$                           Execution time of one orientation                 2650
$t_{\text{P-SRP}}$              Time required to obtain a polar power map       169600

By combining the first two strategies, $t_{\text{P-SRP}}$ rounds to 2 ms and only the first steering loop needs 2.6 ms due to $t^{P}_{II}$. In this case, $t_{\text{P-SRP}}$ is expressed as follows:

$$t_{\text{P-SRP}} = t^{P}_{II} + t_s \approx t_s. \quad (35)$$

The other two strategies proposed in Section 5.3 are designed to fully exploit the FPGA resources and to overcome time constraints when considering a high angular resolution. In the first case, since the design under evaluation has a small angular resolution ($N_o$ = 64), there is no need for a higher $F_P$ when applying the time multiplexing strategy. However, a higher angular resolution can be obtained by using the unconsumed resources, without additional timing cost. Table 8 shows how the combination of strategies increases the angular resolution without additional time penalty. The operational frequency ($F_{op}$) determines at what speed the FPGA can operate. By following (33), the beamforming operation can be exploited by increasing $F_P$ up to this maximum frequency, which increases $N_o$ as well:

$$\max(N_o) = \frac{\max(F_{op})}{\text{BW}} = \frac{F_P}{\text{BW}}. \quad (36)$$

Many thousands of orientations can be computed in parallel when combining all strategies. The beamforming stage can be replicated as many times as the remaining available resources allow. Of course, this estimation is certainly optimistic, since the frequency drops when the resource consumption increases. Nevertheless, it provides an upper bound for $N_o$. For instance, when only the inner subarray is considered, the DSPs are the limiting component; even so, up to 53 beamforming stages could theoretically be placed in parallel. When more subarrays are active, the BRAMs are the constraining component. Notice how the number of supported orientations increases as the number of subarrays decreases. This has, however, an impact on the frequency response and the accuracy of the system, as shown in Section 6.2.1. Nevertheless, tens of thousands of orientations can be computed in parallel, consuming only around 2 ms, by operating at the highest $F_{op}$ and by replicating the beamforming stage to exploit all the available resources.


Table 7: Timing analysis of the optimized designs when applying and combining the first two strategies. The values are expressed in ms.

                       Initial    Continuous    Time multiplexing    Continuous time multiplexing
$t_{\text{P-SRP}}$      169.6       131.6              2.6                       2
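The values of Table 7 follow directly from Table 6 together with (34) and (35); the short sketch below, a minimal illustration using those numbers, recomputes them.

```python
# Timing model behind Table 7, using the values of Table 6 (converted to ms).
t_II = 0.571   # sum of all initiation intervals
t_s = 2.048    # sensing time per orientation
t_o = 2.650    # execution time of one orientation
N_o = 64       # number of orientations per steering loop

initial = N_o * t_o                 # no optimization
continuous = t_II + N_o * t_s       # continuous processing, see (34)
time_mux = t_II + t_s               # all orientations multiplexed in time
continuous_time_mux = t_s           # both strategies combined, see (35)

print(initial, continuous, time_mux, continuous_time_mux)
# ~169.6, 131.6, 2.6 and 2.0 ms, matching Table 7
```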

Table 8: Maximum $N_o$ when combining strategies. The maximum number of beamformers is obtained based on the available resources and the resource consumption of each beamformer (Table 4). The maximum $F_{op}$ is reported by the Vivado 2016.4 tool after placement and routing.

                          Continuous time multiplexing                        Parallel continuous time multiplexing
                    Inner 4     Inner 12    Inner 28    All 52          Inner 4     Inner 12    Inner 28    All 52
Max beamformers        —           —           —           —              55          23          10          6
Max $F_{op}$       95.62 MHz   93.27 MHz   91.97 MHz   87.91 MHz      95.62 MHz   93.27 MHz   91.97 MHz   87.91 MHz
Max $N_o$            3059        2984        2943        2813          168292       68650       29430       16879
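The upper bounds of Table 8 can be recomputed from (36); the sketch below does so with BW = 31.25 kHz and the reported $F_{op}$ and beamformer counts (small rounding differences with respect to the table are expected).

```python
# Upper bound on the number of orientations, following (36).
BW = 31_250.0  # Hz, minimum bandwidth of the design (Table 3)

configs = {  # max F_op (Hz) and number of replicated beamformers (Table 8)
    "Inner 4 MICs": (95.62e6, 55),
    "Inner 12 MICs": (93.27e6, 23),
    "Inner 28 MICs": (91.97e6, 10),
    "All 52 MICs": (87.91e6, 6),
}

for name, (f_op, beamformers) in configs.items():
    max_no = int(f_op / BW)                         # continuous time multiplexing
    max_no_parallel = int(beamformers * f_op / BW)  # with replicated beamforming stages
    print(f"{name}: {max_no} orientations, {max_no_parallel} with {beamformers} beamformers")
```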

7. Conclusions

In this paper, we have presented a scalable and flexible architecture for fast sound-source localization. On one hand, the architecture can flexibly disable the sections of the microphone array that are not needed or that must be switched off to respect power restrictions. The modular approach of the architecture allows scaling the system to a larger or smaller number of microphones. Nevertheless, such capabilities do not impact the frequency response and accuracy of our sound-source locator. On the other hand, several strategies to offer real-time sound-source localization have been presented and evaluated. These strategies not only accelerate the localization but also provide solutions for time-stringent applications with a high angular resolution demand. Thousands of angles can be monitored in parallel, offering a high-resolution sound-source localization in a couple of milliseconds.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014–2020 through the ERDF-2020 Project ICITY-RDI.BRU.

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106–5109, IEEE, Dallas, Tex, USA, March 2010.

[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th SENSORS '13 Conference, IEEE, Baltimore, MD, USA, November 2013.

[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.

[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340–345, IEEE, Qinhuangdao, China, September 2015.

[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869–885, 2011.

[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, Hat-type hearing system using MEMS microphone array, 2016.

[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729–2733, IEEE, Brisbane, Australia, April 2015.

[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.

[9] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.

[10] Analog Devices, "ADMP521 datasheet: ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.

[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.

[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.

[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Brüel & Kjær, Denmark, 2004.

[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [PhD thesis], Brown University, Providence, RI, USA, 2000.

[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157–180, Springer, Berlin, Germany, 2001.

[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465–468, IEEE, Hoboken, NJ, USA, June 2012.

[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS), Stanford University, Stanford, Calif, USA, 2010.

[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155–162, 1981.

[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf7/48fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.

[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.

[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.

[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14–27, 2005.

[23] T. E. Bogale, L. Vandendorpe, and L. L. Bao, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435–441, IEEE, Oulu, Finland, June 2014.



14 Journal of Sensors

4 MICsSo

und-

sour

ce fr

eque

ncy

(kH

z) 14

12

10

8

6

4

2

Angle of arrival (degrees)0 50 100 150 200 250 300

1

08

06

04

02

(a)

12 MICs

Soun

d-so

urce

freq

uenc

y (k

Hz) 14

12

10

8

6

4

2

Angle of arrival (degrees)0

1

08

06

04

02

50 100 150 200 250 300

(b)

28 MICs

Soun

d-so

urce

freq

uenc

y (k

Hz) 14

12

10

8

6

4

2

Angle of arrival (degrees)0 50 100 150 200 250 300

1

08

06

04

02

(c)

52 MICs

Soun

d-so

urce

freq

uenc

y (k

Hz) 14

12

10

8

6

4

2

1

08

06

04

02

Angle of arrival (degrees)0 50 100 150 200 250 300 350

(d)

Figure 14 Waterfall diagrams of the proposed architecture The figures are obtained by enabling only a certain number of subarrays From(a) to (d) only the 4 innermost microphones only the 12 innermost microphones the 28 innermost microphones and all microphones

Table 3 Configuration of the architecture under analysis

Parameter Definition Value119865119904 Sampling frequency 2MHz119865min Minimum frequency 1 kHz119865max Maximum frequency 15625 kHz

BW Minimum bandwidth to satisfyNyquist 3125 kHz

119863119865 Decimation factor 64119863CIC CIC filter decimation factor 16119873CIC Order of the CIC filter 2119863FIR FIR filter decimation factor 4119873FIR Order of the FIR filter 16

However most of these design decisions do not have asignificant impact on the system response compared to otherfactors such as the number of active microphones or thenumber of orientations The analysis of impact of theseparameters on the systemrsquos response and performance is doneover one particular design

Table 3 summarizes the configuration of the architectureThe design considers 119865119904 = 2MHz which is the clock for themicrophones and the functional frequency of the designThisvalue of 119865119904 is the intermediate value between the requiredclock signals of the ADMP521microphones [10]The selected

cut-off frequency is 119865max = 15625 kHz which leads to 119863119865 =64 In this example design 119873CIC = 4 with a decimation factorof 16 and a differential delay of 32 The chosen FIR filter has abeta factor of 27 and a cut-off frequency of119865max at a samplingrate of 125 kHz which is the sampling rate obtained after theCIC decimator filter with a 119863CIC = 16 The filtered signalis then further decimated by a factor 119863FIR = 4 to obtain aBW = 31250 kHz audio signal

The architecture is designed to support a completesteering loop up to 64 orientations which represents anangular resolution of 5625∘ On the other hand the subarrayapproach allows activating the 52 microphones if all the4 subarrays are active The final results are obtained byassuming a speed sound of asymp3432ms

621 Frequency Response The waterfall diagrams of Fig-ure 14 show the power output of the combined subarraysin all directions for all frequencies In our case the resultsare calculated with a single sound-source varying between100Hz and 15 kHz in steps of 100Hz and placed at 180∘All results are normalized per frequency Every waterfallshows a clear distinctive main lobe When only subarray 1is active there are side lobes at 53 kHz and 106 kHz whichimpede the sound-source location for those frequencies Thefrequency response of the subarrays improves when they arecombined since their frequency responses are superposedThe combination of the subarrays 1 and 2 reaches a minimum

Journal of Sensors 15

Sound-source frequency (Hz)

0

10

20

30

40

50

608 orientations

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

103 104

Dp

(a)

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

0

10

20

30

40

50

6016 orientations

Sound-source frequency (Hz)103 104

Dp

(b)

0

10

20

30

40

50

60 32 orientations

Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

103 104

Dp

(c)

0

10

20

30

40

50

60 64 orientations

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

Sound-source frequency (Hz)103 104

Dp

(d)

Figure 15 Directivities when considering a variable number of orientations and active microphones From (a) to (d) 119863119875 with only 8orientations up to 64 orientations on (d)

The combination of subarrays 1 and 2 reaches a minimum detectable frequency of 3.1 kHz, whereas the combinations of subarrays 1, 2, and 3 and of all four subarrays reach 2.1 kHz and 1.6 kHz, respectively. These minimum values are clearly depicted in Figure 15 with a threshold of 8 for $D_P$, which indicates that the main lobe's surface corresponds to maximally half of a quadrant. The frequency response of the combination of subarrays has a strong variation at the main lobe and therefore in $D_P$. Figure 15 depicts the evolution of $D_P$ when increasing the angular resolution and when combining subarrays. The angular resolution determines the upper bound to which $D_P$ converges, which is dependent on the number of orientations. The number of active microphones, on the other hand, influences how fast $D_P$ converges to its upper limit. Consequently, the number of active microphones determines the minimum frequency which can be located when considering a threshold of 8 for $D_P$.

Alongside the directivity, other metrics such as the main beamwidth and the MSL levels are also calculated to properly evaluate the quality of the array's response. Figure 16 depicts the MSL when varying the number of active subarrays and the number of orientations. A low angular resolution leads to a lower resolution of the waterfall diagrams, but only the metrics can show the impact. At frequencies between 1 and 3 kHz the main lobe converges to a unit circle, which can be explained by the lack of any side lobe. Higher frequencies present secondary lobes, especially when only the inner subarray is active, which increases the MSL values independently of the angular resolution. A low angular resolution leads to unexpectedly low values of MSL since the secondary lobes are not detected. On the other hand, a higher number of active microphones leads to lower values of MSL independently of the angular resolution.

Figure 17 depicts the $BW_{-3\,dB}$ metric for a similar analysis of the number of microphones and the angular resolution.


Figure 16: Measured MSL (maximum side lobe level, in dB) versus sound-source frequency for a variable number of orientations and active microphones (inner 4 MICs, inner 12 MICs, inner 28 MICs, and all 52 MICs). From (a) to (d): the MSL with 8, 16, 32, and 64 orientations, respectively.

On the one hand, a higher number of microphones produces a faster decrement of $BW_{-3\,dB}$, reflected as a thinner main lobe. Nevertheless, the $BW_{-3\,dB}$ of each subarray converges to a minimum which is only reached at higher frequencies. The angular resolution determines this minimum, which ranges from 90° to 11.25° when 8 or 64 orientations are considered, respectively.
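This lower limit follows from the angular sampling alone, under the assumption that a lobe cannot be measured as narrower than two steps of the steering grid; the short sketch below is a back-of-the-envelope check of that assumption against the values quoted above.

```python
# Minimum measurable -3 dB beamwidth imposed by the angular resolution,
# assuming a lobe cannot be resolved below two steps of the steering grid.
for N_o in (8, 16, 32, 64):
    step = 360 / N_o                  # angular spacing between beamed orientations
    print(f"{N_o:2d} orientations: step = {step:7.3f} deg, "
          f"minimum BW-3dB = {2 * step:6.2f} deg")
# -> 90.00 deg with 8 orientations down to 11.25 deg with 64, as observed in Figure 17.
```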

6.2.2. Resource Consumption and Power Analysis. Table 4 summarizes the resource consumption when combining subarrays. The consumed resources are divided into the resources for the filter stage, the beamforming stage, and the total consumption per group of subarrays. The filter stage mostly consumes DSPs, while the beamforming stage mainly demands BRAMs. Most of the resource consumption is dominated by the filter stage since a filter chain is dedicated to each MEMS microphone. What determines the resource consumption is therefore the number of active subarrays.

The flexibility of our architecture allows the creation of heterogeneous sound-source locators. Thus, the architecture can be scaled for small FPGAs based on the target sound-source profile or a particular desirable power consumption. For instance, the combination of the two inner subarrays would use 12 microphones while consuming less than 10% of the available resources. The LUTs are the limiting resource due to the internal registers of the filters. In fact, when all the subarrays are used, around 80% of the available LUTs are required. Nevertheless, any subarray can be disabled at runtime, which directly deactivates its associated filter and beamforming components. Although this does not affect the resource consumption, it has a direct impact on the power consumption.
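The utilization figures quoted here can be rederived from the totals reported in Table 4; the sketch below is an illustrative helper (not part of the tool flow) that computes the percentage of each resource type occupied by every subarray combination and confirms, for instance, that the full 52-microphone design needs roughly 80% of the available LUTs.

```python
# Utilization percentages derived from the totals of Table 4.
available = {"Slice registers": 106400, "Slice LUTs": 53200,
             "LUT-FF pairs": 86689, "BRAM": 140, "DSP48": 220}

totals = {  # resource -> totals for (inner 4, inner 12, inner 28, all 52) microphones
    "Slice registers": (6144, 16882, 38183, 59093),
    "Slice LUTs":      (4732, 12299, 27318, 42319),
    "LUT-FF pairs":    (2773, 7779, 17698, 27619),
    "BRAM":            (2, 6, 14, 22),
    "DSP48":           (12, 28, 60, 92),
}

for resource, used in totals.items():
    pct = ", ".join(f"{100 * u / available[resource]:.1f}%" for u in used)
    print(f"{resource}: {pct}")
# Slice LUTs -> 8.9%, 23.1%, 51.4%, 79.5%: the full array needs roughly 80% of the LUTs.
```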


Figure 17: Measured $BW_{-3\,dB}$ level (beamwidth at −3 dB, in degrees) versus sound-source frequency for a variable number of orientations and active microphones (inner 4 MICs, inner 12 MICs, inner 28 MICs, and all 52 MICs). From (a) to (d): the $BW_{-3\,dB}$ with 8, 16, 32, and 64 orientations, respectively.

Table 5 shows the power consumption, in mW, based on the number of active subarrays. The power consumption of the microphones is also considered, since the FPGA and the microphone array are powered from the same source; the overall power consumption must be considered because the architecture is designed for an embedded system. The MEMS microphones are powered with 3.3 volts, which represents a power consumption per microphone of 2.64 µW and 3.96 mW for the inactive and active microphones, respectively. Notice how the power consumption increases with the number of active subarrays. There is a turning point when 3 or 4 subarrays are active: the microphone array consumes more power than the FPGA when all the subarrays are active.
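The MEMS figures of Table 5 follow directly from these per-microphone values; the sketch below is a simple reconstruction of the microphone power columns under the 2.64 µW and 3.96 mW figures quoted above (an arithmetic illustration, not a power model of the FPGA itself).

```python
# Reconstruction of the MEMS microphone columns of Table 5 (values in mW).
P_ACTIVE_MW = 3.96        # per active microphone at 3.3 V, taken from the text
P_INACTIVE_MW = 0.00264   # per inactive microphone (2.64 uW), taken from the text
TOTAL_MICS = 52

for label, active in [("Inner 4 MICs", 4), ("Inner 12 MICs", 12),
                      ("Inner 28 MICs", 28), ("All 52 MICs", 52)]:
    p_active = active * P_ACTIVE_MW
    p_inactive = (TOTAL_MICS - active) * P_INACTIVE_MW
    print(f"{label}: active {p_active:.2f} mW, inactive {p_inactive:.2f} mW, "
          f"total {p_active + p_inactive:.2f} mW")
# Matches the MEMS columns of Table 5: 15.84/0.13, 47.52/0.11, 110.88/0.06, 205.92/0.00 mW.
```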

6.2.3. Timing Analysis. The timing analysis, based on Section 5, of the design under evaluation is summarized in Table 6. A complete steering loop requires around 169.6 ms, while $t_o$ rounds to 2.6 ms. Notice that the initialization ($t_{II}$) consumes around 21.5% of the execution time. Fortunately, this initialization can almost be completely removed when applying the first strategy described in Section 5.3.1.

Table 7 summarizes the timing results when applying the first strategies proposed in Section 5. The elimination of the initialization after each orientation's transition slightly reduces $t_{\text{P-SRP}}$. In this case $t_{\text{P-SRP}}$ is expressed as follows:

$$t_{\text{P-SRP}} = t_{II} + N_o \cdot t_s \quad (34)$$

The main improvement is obtained after time multiplexing the computation of the power per orientation. In this case $F_P$, the operational frequency of the beamforming computation needed to process all $N_o$ in parallel, equals $F_S$, as expressed in (32). This is possible because $D_F$ and $N_o$ have the same value. Therefore there is no need for a different clock for the beamforming operation, since the spacing between the output filtered values from the filter stage is large enough.


Table 4: Resource consumption after placement and routing when combining microphone subarrays. Each subarray combination details the resource consumption of the filter and the beamforming stages.

Resources (available) | Inner 4 MICs (Filters / Beamforming / Total) | Inner 12 MICs (Filters / Beamforming / Total) | Inner 28 MICs (Filters / Beamforming / Total) | All 52 MICs (Filters / Beamforming / Total)
Slice registers (106400) | 5043 / 626 / 6144 | 14859 / 1540 / 16882 | 34489 / 3195 / 38183 | 54042 / 4447 / 59093
Slice LUTs (53200) | 3612 / 344 / 4732 | 10759 / 754 / 12299 | 25032 / 1486 / 27318 | 37221 / 2221 / 42319
LUT-FF pairs (86689) | 2329 / 199 / 2773 | 7013 / 512 / 7779 | 16353 / 1069 / 17698 | 23656 / 1664 / 27619
BRAM (140) | 0 / 2 / 2 | 0 / 6 / 6 | 0 / 14 / 14 | 0 / 22 / 22
DSP48 (220) | 8 / 4 / 12 | 24 / 4 / 28 | 56 / 4 / 60 | 88 / 4 / 92

Table 5: Power consumption at $F_S$ = 2 MHz, expressed in mW, when combining microphone subarrays. Values obtained from the Vivado 2016.4 power report.

Subarrays | MEMS microphones (Active / Inactive / Total) | Reported on-chip power (Static / Dynamic / Total) | Total power
Inner 4 MICs | 15.84 / 0.13 / 15.97 | 120 / 2 / 122 | 137.97
Inner 12 MICs | 47.52 / 0.11 / 47.63 | 120 / 5 / 125 | 172.63
Inner 28 MICs | 110.88 / 0.06 / 110.94 | 121 / 11 / 132 | 242.94
All 52 MICs | 205.92 / 0 / 205.92 | 122 / 16 / 138 | 343.92

Table 6: Timing analysis, without any optimization, of the design under evaluation. The values are expressed in µs.

Parameter | Definition | Value [µs]
$t_{II}^{CIC}$ | Initiation interval of the CIC filter | 4.5
$t_{II}^{DC}$ | Initiation interval of the removed-DC block | 9
$t_{II}^{FIR}$ | Initiation interval of the FIR filter | 72
$t_{II}^{Delay}$ | Initiation interval of the delay memories | 480
$t_{II}^{Sum}$ | Initiation interval of the cascaded sums | 3.5
$t_{II}^{Power}$ | Initiation interval of the power calculation | 1
$t_{II}^{filters}$ | Initiation interval of the filter stage | 85.5
$t_{II}^{beamforming}$ | Initiation interval of the beamforming stage | 484.5
$t_{II}^{power}$ | Initiation interval of the power stage | 1
$t_{II}$ | Sum of all initiation intervals | 571
$t_s$ | Sensing time | 2048
$t_o$ | Execution time of one orientation | 2650
$t_{\text{P-SRP}}$ | Time required to obtain a polar power map | 169600

By combining the first two strategies, $t_{\text{P-SRP}}$ rounds to 2 ms and only the first steering loop needs 2.6 ms due to $t_{II}^{P}$. In this case $t_{\text{P-SRP}}$ is expressed as follows:

$$t_{\text{P-SRP}} = t_{II}^{P} + t_s \approx t_s \quad (35)$$
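Using the initiation intervals and sensing time listed in Table 6, these expressions can be evaluated directly; the sketch below is a numerical illustration of the timing model under those values (not a cycle-accurate simulation) and reproduces the figures of Table 7.

```python
# Evaluation of the timing model with the values of Table 6 (all in microseconds).
t_II = 571.0      # sum of all initiation intervals
t_s = 2048.0      # sensing time per orientation
t_o = 2650.0      # execution time of one orientation, as reported
N_o = 64          # orientations per steering loop

results = [
    ("unoptimized loop", N_o * t_o),                 # full restart per orientation
    ("continuous processing, eq. (34)", t_II + N_o * t_s),
    ("time multiplexing, first loop", t_II + t_s),   # t_II used as approximation of t_II^P
    ("continuous time multiplexing, eq. (35)", t_s),
]
for name, t_us in results:
    print(f"{name}: {t_us / 1000:.1f} ms")
# -> 169.6 ms, 131.6 ms, 2.6 ms and 2.0 ms, in line with Table 7.
```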

The other two strategies proposed in Section 5.3 are designed to fully exploit the FPGA resources and to overcome time constraints when considering a high angular resolution. In the first case, since the design under evaluation has a small angular resolution ($N_o = 64$), there is no need for a higher $F_P$ when applying the time multiplexing strategy. However, a higher angular resolution can be obtained by using the unconsumed resources without additional timing cost. Table 8 shows how the combination of strategies increases the angular resolution without additional time penalty. The operational frequency ($F_{op}$) determines at what speed the FPGA can operate. Following (33), the beamforming operation can be exploited by increasing $F_P$ up to this maximum frequency, which increases $N_o$ as well:

$$\max(N_o) = \frac{\max(F_{op})}{\text{BW}} = \frac{F_P}{\text{BW}} \quad (36)$$

Many thousands of orientations can be computed in parallel when combining all strategies. The beamforming stage can be replicated as many times as the remaining available resources allow. Of course, this estimation is certainly optimistic, since the frequency drops when the resource consumption increases. Nevertheless, this provides an upper bound for $N_o$. For instance, when only the inner subarray is considered, the DSPs are the limiting component; even so, up to 53 beamforming stages could theoretically be placed in parallel. When more subarrays are active, the BRAMs are the constraining component. Notice how the number of supported orientations increases if the number of subarrays decreases. This has, however, an impact on the frequency response and the accuracy of the system, as shown in Section 6.2.1. Nevertheless, tens of thousands of orientations can be computed in parallel, consuming only around 2 ms, by operating at the highest $F_{op}$ and by replicating the beamforming stage to exploit all the available resources.
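Combining (36) with the maximum operating frequencies and beamformer counts reported in Table 8 gives the upper bounds discussed here; the sketch below is a back-of-the-envelope estimate under those reported values and inherits the optimism noted above regarding the frequency drop at high utilization.

```python
# Upper bound on the number of parallel orientations, following equation (36).
BW = 31_250.0  # audio bandwidth after decimation [Hz]

configs = {  # subarray combination -> (reported max F_op [Hz], max parallel beamformers)
    "Inner 4 MICs":  (95.62e6, 55),
    "Inner 12 MICs": (93.27e6, 23),
    "Inner 28 MICs": (91.97e6, 10),
    "All 52 MICs":   (87.91e6, 6),
}

for label, (f_op, beamformers) in configs.items():
    max_no_single = f_op / BW                        # one time-multiplexed beamforming stage
    max_no_parallel = max_no_single * beamformers    # replicated beamforming stages
    print(f"{label}: about {int(max_no_single)} orientations with one stage, "
          f"about {int(max_no_parallel)} with {beamformers} stages")
# Roughly reproduces the maximum N_o values of Table 8.
```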


Table 7: Timing analysis of the optimized designs when applying and combining the first two strategies. The values are expressed in ms.

 | Initial | Continuous | Time multiplexing | Continuous time multiplexing
$t_{\text{P-SRP}}$ | 169.6 ms | 131.6 ms | 2.6 ms | 2 ms

Table 8: Maximum $N_o$ when combining strategies. The maximum number of beamformers is obtained based on the available resources and the resource consumption of each beamformer (Table 4). The maximum $F_{op}$ is reported by the Vivado 2016.4 tool after placement and routing.

Continuous time multiplexing:
 | Inner 4 MICs | Inner 12 MICs | Inner 28 MICs | All 52 MICs
Max beamformers | — | — | — | —
Max $F_{op}$ | 95.62 MHz | 93.27 MHz | 91.97 MHz | 87.91 MHz
Max $N_o$ | 3059 | 2984 | 2943 | 2813

Parallel continuous time multiplexing:
 | Inner 4 MICs | Inner 12 MICs | Inner 28 MICs | All 52 MICs
Max beamformers | 55 | 23 | 10 | 6
Max $F_{op}$ | 95.62 MHz | 93.27 MHz | 91.97 MHz | 87.91 MHz
Max $N_o$ | 168292 | 68650 | 29430 | 16879

7. Conclusions

In this paper we have presented a scalable and flexible architecture for fast sound-source localization. On one hand, the architecture can flexibly disable sections of the microphone array that are not needed, or disable them to respect power restrictions. The modular approach of the architecture allows scaling the system to a larger or smaller number of microphones. Nevertheless, such capabilities do not impact the frequency response and accuracy of our sound-source locator. On the other hand, several strategies to offer real-time sound-source localization have been presented and evaluated. These strategies not only accelerate the localization but also provide solutions for time-stringent applications with a high angular resolution demand. Thousands of angles can be monitored in parallel, offering a high-resolution sound-source localization in a couple of milliseconds.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014–2020 through the ERDF-2020 Project ICITY-RDI.BRU.

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106–5109, IEEE, Dallas, Tex, USA, March 2010.

[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th SENSORS '13 Conference, IEEE, Baltimore, MD, USA, November 2013.

[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.

[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340–345, IEEE, Qinhuangdao, China, September 2015.

[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869–885, 2011.

[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, Hat-type hearing system using MEMS microphone array, 2016.

[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729–2733, IEEE, Brisbane, Australia, April 2015.

[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.

[9] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.

[10] Analog Devices, "ADMP521 datasheet: ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.

[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.

[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.

[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Brüel & Kjær, Denmark, 2004.

[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [PhD thesis], Brown University, Providence, RI, USA, 2000.

[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157–180, Springer, Berlin, Germany, 2001.

[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465–468, IEEE, Hoboken, NJ, USA, June 2012.

[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS), Stanford University, Stanford, Calif, USA, 2010.

[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155–162, 1981.

[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf7/48fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.

[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.

[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.

[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14–27, 2005.

[23] T. E. Bogale, L. Vandendorpe, and L. L. Bao, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435–441, IEEE, Oulu, Finland, June 2014.



4 MICsSo

und-

sour

ce fr

eque

ncy

(kH

z) 14

12

10

8

6

4

2

Angle of arrival (degrees)0 50 100 150 200 250 300

1

08

06

04

02

(a)

12 MICs

Soun

d-so

urce

freq

uenc

y (k

Hz) 14

12

10

8

6

4

2

Angle of arrival (degrees)0

1

08

06

04

02

50 100 150 200 250 300

(b)

28 MICs

Soun

d-so

urce

freq

uenc

y (k

Hz) 14

12

10

8

6

4

2

Angle of arrival (degrees)0 50 100 150 200 250 300

1

08

06

04

02

(c)

52 MICs

Soun

d-so

urce

freq

uenc

y (k

Hz) 14

12

10

8

6

4

2

1

08

06

04

02

Angle of arrival (degrees)0 50 100 150 200 250 300 350

(d)

Figure 14 Waterfall diagrams of the proposed architecture The figures are obtained by enabling only a certain number of subarrays From(a) to (d) only the 4 innermost microphones only the 12 innermost microphones the 28 innermost microphones and all microphones

Table 3 Configuration of the architecture under analysis

Parameter Definition Value119865119904 Sampling frequency 2MHz119865min Minimum frequency 1 kHz119865max Maximum frequency 15625 kHz

BW Minimum bandwidth to satisfyNyquist 3125 kHz

119863119865 Decimation factor 64119863CIC CIC filter decimation factor 16119873CIC Order of the CIC filter 2119863FIR FIR filter decimation factor 4119873FIR Order of the FIR filter 16

However most of these design decisions do not have asignificant impact on the system response compared to otherfactors such as the number of active microphones or thenumber of orientations The analysis of impact of theseparameters on the systemrsquos response and performance is doneover one particular design

Table 3 summarizes the configuration of the architectureThe design considers 119865119904 = 2MHz which is the clock for themicrophones and the functional frequency of the designThisvalue of 119865119904 is the intermediate value between the requiredclock signals of the ADMP521microphones [10]The selected

cut-off frequency is 119865max = 15625 kHz which leads to 119863119865 =64 In this example design 119873CIC = 4 with a decimation factorof 16 and a differential delay of 32 The chosen FIR filter has abeta factor of 27 and a cut-off frequency of119865max at a samplingrate of 125 kHz which is the sampling rate obtained after theCIC decimator filter with a 119863CIC = 16 The filtered signalis then further decimated by a factor 119863FIR = 4 to obtain aBW = 31250 kHz audio signal

The architecture is designed to support a completesteering loop up to 64 orientations which represents anangular resolution of 5625∘ On the other hand the subarrayapproach allows activating the 52 microphones if all the4 subarrays are active The final results are obtained byassuming a speed sound of asymp3432ms

621 Frequency Response The waterfall diagrams of Fig-ure 14 show the power output of the combined subarraysin all directions for all frequencies In our case the resultsare calculated with a single sound-source varying between100Hz and 15 kHz in steps of 100Hz and placed at 180∘All results are normalized per frequency Every waterfallshows a clear distinctive main lobe When only subarray 1is active there are side lobes at 53 kHz and 106 kHz whichimpede the sound-source location for those frequencies Thefrequency response of the subarrays improves when they arecombined since their frequency responses are superposedThe combination of the subarrays 1 and 2 reaches a minimum

Journal of Sensors 15

Sound-source frequency (Hz)

0

10

20

30

40

50

608 orientations

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

103 104

Dp

(a)

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

0

10

20

30

40

50

6016 orientations

Sound-source frequency (Hz)103 104

Dp

(b)

0

10

20

30

40

50

60 32 orientations

Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

103 104

Dp

(c)

0

10

20

30

40

50

60 64 orientations

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

Sound-source frequency (Hz)103 104

Dp

(d)

Figure 15 Directivities when considering a variable number of orientations and active microphones From (a) to (d) 119863119875 with only 8orientations up to 64 orientations on (d)

detectable frequency of 31 kHz when combining subarrays1 2 and 3 and all subarrays reach 21 kHz and 16 kHzrespectively These minimum values are clearly depicted inFigure 15 with a threshold of 8 for 119863119875 which indicatesthat the main lobersquos surface corresponds to maximally halfof a quadrant The frequency response of the combinationof subarrays has a strong variation at the main lobe andtherefore in 119863119875 Figure 15 depicts the evolution of 119863119875when increasing the angular resolution and when combiningsubarrays The angular resolution determines that the upperbound 119863119875 converges which is dependent on the numberof orientations The number of active microphones onthe other hand influences how fast 119863119875 converges to itsupper limit Consequently the number of activemicrophonesdetermines the minimum frequency which can be locatedwhen considering a threshold of 8 for 119863119875 Alongside thedirectivity other metrics such as the main beamwidth and

theMSL levelsmetrics are also calculated to properly evaluatethe quality of the arrayrsquos response Figure 16 depicts the MSLwhen varying the number of active subarrays and the numberof orientations A low angular resolution leads to a lowerresolution of the waterfall diagrams but only the metrics canshow the impact At frequencies between 1 and 3 kHz themain lobe converges to a unit circle which can be explainedby the lack of any side lobe Higher frequencies presentsecondary lobes especially when only the inner subarrayis active which increases the MSL values independently ofthe angular resolution A low angular resolution leads tounexpected low values of MSL since the secondary lobes arenot detected On the other hand a higher number of activemicrophones lead to lower values of MSL independently ofthe angular resolution

Figure 17 depicts the BWminus3 dB metric for a similar analysisof the number of microphones and angular resolution On

16 Journal of Sensors

8 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(a)

16 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(b)

32 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(c)

64 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(d)

Figure 16 Measured MSL when considering a variable number of orientations and active microphones From (a) to (d) the MSL with only8 orientations up to 64 orientations on (d)

one hand a higher number of microphones produce afaster decrement of BWminus3 dB reflected as a thinner mainlobe Nevertheless BWminus3 dB of each subarray converges to aminimum which is only reached at higher frequencies Theangular resolution determines this minimum which rangesfrom 90∘ till 1125∘ when 8 or 64 orientations are consideredrespectively

622 Resource Consumption and Power Analysis Table 4summarizes the resource consumption when combiningsubarrays The consumed resources are divided into theresources for the filter stage the beamforming stage andthe total consumption per groups of subarrays The filterstage mostly consumes DSPs while the beamforming stagemainly demands BRAMs Most of the resource consumptionis dominated by the filter stage since a filter chain is dedicated

to each MEMs microphone What determines the resourceconsumption is the number of active subarrays

The flexibility of our architecture allows the creation ofheterogeneous source-sound locators Thus the architecturecan be scaled for small FPGAs based on the target sound-source profile or a particular desirable power consumptionFor instance the combination of the two inner subarrayswould use 12 microphones while consuming less than 10 ofthe available resources The LUTs are the limiting resourcedue to the internal registers of the filters In fact when allthe subarrays are used around 80 of the available LUTsare required Nevertheless any subarray can be disabled inruntime which directly deactivates its associated filter andbeamforming components Although this does not affectthe resource consumption it has a direct impact over thepower consumption Table 5 shows the power consumption

Figure 17: Measured BW−3dB level when considering a variable number of orientations and active microphones. From (a) to (d): the BW−3dB with only 8 orientations on (a) up to 64 orientations on (d). Each panel plots the beamwidth at −3 dB (degrees) against the sound-source frequency (Hz) for the inner 4, inner 12, inner 28, and all 52 MICs.

Table 5 shows the power consumption, in mW, based on the number of active subarrays. The power consumption of the microphones is also considered, since the FPGA and the microphone array are powered from the same source; the overall power consumption must be considered because the architecture is designed for an embedded system. The MEMS microphones are powered with 3.3 volts, which represents a power consumption per microphone of 2.64 μW and 3.96 mW for the inactive and active microphones, respectively. Notice how the power consumption increases with the number of active subarrays. There is a turning point when 3 or 4 subarrays are active: the microphone array consumes more power than the FPGA when all the subarrays are active.
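These figures can be reproduced with a rough power model that only adds the per-microphone consumption quoted above to the on-chip power reported by Vivado; the sketch below is such a model (the constant and function names are ours). It returns 137.97, 172.63, 242.94, and 343.92 mW for the four combinations, matching the totals of Table 5, and makes the turning point visible: with all subarrays active, the array itself (205.92 mW) draws more than the FPGA (138 mW).

```python
# Rough power model for the embedded system: MEMS microphone array plus
# FPGA on-chip power (static + dynamic, taken from the Vivado report in Table 5).
P_MIC_ACTIVE_MW = 3.96        # 3.3 V supply, active PDM microphone
P_MIC_INACTIVE_MW = 0.00264   # 3.3 V supply, microphone in sleep mode (2.64 uW)
N_MICS = 52

FPGA_ON_CHIP_MW = {4: 122, 12: 125, 28: 132, 52: 138}  # Table 5, static + dynamic

def total_power_mw(active_mics):
    mics = (active_mics * P_MIC_ACTIVE_MW
            + (N_MICS - active_mics) * P_MIC_INACTIVE_MW)
    return mics + FPGA_ON_CHIP_MW[active_mics]

for n in (4, 12, 28, 52):
    print(f"{n} active microphones: {total_power_mw(n):.1f} mW")
```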

6.2.3. Timing Analysis. The timing analysis of the design under evaluation, based on Section 5, is summarized in Table 6. A complete steering loop requires around 169 ms, while t_o rounds to 2.6 ms. Notice that the initialization (t_II) consumes around 21.5% of the execution time. Fortunately, this initialization can be almost completely removed when applying the first strategy described in Section 5.3.1.

Table 7 summarizes the timing results when applying the first strategies proposed in Section 5. The elimination of the initialization after each orientation's transition slightly reduces t_P-SRP. In this case, t_P-SRP is expressed as follows:

$t_{\text{P-SRP}} = t_{\text{II}} + N_{o} \cdot t_{s}$  (34)

The main improvement is obtained after time multiplexing the computation of the power per orientation. In this case F_P, the operational frequency of the beamforming computation needed to process all N_o orientations in parallel, equals F_S, as expressed in (32). This is possible because D_F and N_o have the same value. Therefore, there is no need for a different clock for the beamforming operation, since the spacing between output filtered values from the filter stage is large enough.


Table 4: Resource consumption after placement and routing when combining microphone subarrays. Each subarray combination details the resource consumption of the filter and the beamforming stage (Filters / Beamforming / Total).

Resources        Available   Inner 4 MICs          Inner 12 MICs           Inner 28 MICs            All 52 MICs
Slice registers  106400      5043 / 626 / 6144     14859 / 1540 / 16882    34489 / 3195 / 38183     54042 / 4447 / 59093
Slice LUTs       53200       3612 / 344 / 4732     10759 / 754 / 12299     25032 / 1486 / 27318     37221 / 2221 / 42319
LUT-FF pairs     86689       2329 / 199 / 2773     7013 / 512 / 7779       16353 / 1069 / 17698     23656 / 1664 / 27619
BRAM             140         0 / 2 / 2             0 / 6 / 6               0 / 14 / 14              0 / 22 / 22
DSP48            220         8 / 4 / 12            24 / 4 / 28             56 / 4 / 60              88 / 4 / 92

Table 5: Power consumption at F_s = 2 MHz, expressed in mW, when combining microphone subarrays. Values obtained from the Vivado 2016.4 power report.

Subarrays       MEMS microphones (Active / Inactive / Total)   On-chip power (Static / Dynamic / Total)   Total power
Inner 4 MICs    15.84 / 0.13 / 15.97                           120 / 2 / 122                              137.97
Inner 12 MICs   47.52 / 0.11 / 47.63                           120 / 5 / 125                              172.63
Inner 28 MICs   110.88 / 0.06 / 110.94                         121 / 11 / 132                             242.94
All 52 MICs     205.92 / 0 / 205.92                            122 / 16 / 138                             343.92

Table 6: Timing analysis without any optimization of the design under evaluation. The values are expressed in μs.

Parameter            Definition                                      Value [μs]
t^CIC_II             Initiation interval of the CIC filter           4.5
t^DC_II              Initiation interval of the DC-removal block     9
t^FIR_II             Initiation interval of the FIR filter           72
t^Delay_II           Initiation interval of the delay memories       480
t^Sum_II             Initiation interval of the cascaded sums        3.5
t^Power_II           Initiation interval of the power calculation    1
t^filters_II         Initiation interval of the filter stage         85.5
t^beamforming_II     Initiation interval of the beamforming stage    484.5
t^power_II           Initiation interval of the power stage          1
t_II                 Sum of all initiation intervals                 571
t_s                  Sensing time                                    2048
t_o                  Execution time of one orientation               2650
t_P-SRP              Time required to obtain a polar power map       169600

By combining the first two strategies, t_P-SRP rounds to 2 ms, and only the first steering loop needs 2.6 ms due to t^P_II. In this case, t_P-SRP is expressed as follows:

$t_{\text{P-SRP}} = t^{P}_{\text{II}} + t_{s} \approx t_{s}$  (35)
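The effect of the first two strategies can be cross-checked with a few lines of arithmetic. The following sketch (a simplified model using only t_o, t_II, t_s, and N_o from Table 6; the labels are ours) evaluates the unoptimized loop together with (34) and (35) and reproduces the values of Table 7.

```python
# Cross-check of the steering-loop timing (Table 6 values, in microseconds)
# against the unoptimized baseline and the optimized forms (34) and (35).
t_o  = 2650.0   # execution time of one orientation
t_II = 571.0    # sum of all initiation intervals
t_s  = 2048.0   # sensing time
N_o  = 64       # number of steered orientations

timings_ms = {
    "baseline (reset per orientation)": N_o * t_o / 1000,
    "continuous processing, eq. (34)":  (t_II + N_o * t_s) / 1000,
    "time multiplexing only":           (t_II + t_s) / 1000,
    "continuous + time mux, eq. (35)":  t_s / 1000,
}

for name, ms in timings_ms.items():
    print(f"{name}: {ms:.1f} ms")   # prints 169.6, 131.6, 2.6, 2.0 ms
```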

The other two strategies proposed in Section 5.3 are designed to fully exploit the FPGA resources and to overcome time constraints when considering a high angular resolution. In the first case, since the design under evaluation has a small angular resolution (N_o = 64), there is no need for a higher F_P when applying the time multiplexing strategy. However, a higher angular resolution can be obtained from the unconsumed resources without additional timing cost. Table 8 shows how the combination of strategies increases the angular resolution without additional time penalty. The operational frequency (F_op) determines at what speed the FPGA can operate. Following (33), the beamforming operation can be exploited by increasing F_P up to this maximum frequency, which increases N_o as well:

$\max(N_{o}) = \dfrac{\max(F_{\text{op}})}{\text{BW}} = \dfrac{F_{P}}{\text{BW}}$  (36)

Many thousands of orientations can be computed in parallel when combining all strategies. The beamforming stage can be replicated as many times as the remaining available resources allow. Of course, this estimation is certainly optimistic, since the frequency drops when the resource consumption increases; nevertheless, it provides an upper bound for N_o. For instance, when only the inner subarray is considered, the DSPs are the limiting component, yet up to 53 beamforming stages could theoretically be placed in parallel. When more subarrays are active, the BRAMs are the constraining component. Notice how the number of supported orientations increases as the number of subarrays decreases. This has, however, an impact on the frequency response and the accuracy of the system, as shown in Section 6.2.1. Nevertheless, tens of thousands of orientations can be computed in parallel, consuming only around 2 ms, by operating at the highest F_op and by replicating the beamforming stage to exploit all the available resources.


Table 7: Timing analysis of the optimized designs when applying and combining the first two strategies. The values are expressed in ms.

           Initial    Continuous   Time multiplexing   Continuous time multiplexing
t_P-SRP    169.6 ms   131.6 ms     2.6 ms              2 ms

Table 8: Maximum N_o when combining strategies. The maximum number of beamformers is obtained based on the available resources and the resource consumption of each beamformer (Table 4). The maximum F_op is reported by the Vivado 2016.4 tool after placement and routing.

Continuous time multiplexing:
                   Inner 4 MICs   Inner 12 MICs   Inner 28 MICs   All 52 MICs
Max beamformers    —              —               —               —
Max F_op           95.62 MHz      93.27 MHz       91.97 MHz       87.91 MHz
Max N_o            3059           2984            2943            2813

Parallel continuous time multiplexing:
                   Inner 4 MICs   Inner 12 MICs   Inner 28 MICs   All 52 MICs
Max beamformers    55             23              10              6
Max F_op           95.62 MHz      93.27 MHz       91.97 MHz       87.91 MHz
Max N_o            168292         68650           29430           16879
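Equation (36), combined with the replica counts of Table 8, gives a quick way to bound the attainable angular resolution; the sketch below is our own cross-check using BW = 31.25 kHz and the post-route F_op values of the table. The products differ slightly from Table 8 because the reported F_op values are rounded.

```python
# Upper bound on the number of orientations, following eq. (36):
# max(N_o) = F_op / BW, multiplied by the number of beamforming-stage
# replicas that still fit on the FPGA (Table 8).
BW_HZ = 31_250

CONFIGS = {  # subarray combination: (max F_op in MHz, max replicated beamformers)
    "inner 4 MICs":  (95.62, 55),
    "inner 12 MICs": (93.27, 23),
    "inner 28 MICs": (91.97, 10),
    "all 52 MICs":   (87.91, 6),
}

for combo, (f_op_mhz, replicas) in CONFIGS.items():
    n_o_single = int(f_op_mhz * 1e6 // BW_HZ)   # one time-multiplexed beamformer
    print(f"{combo}: {n_o_single} orientations per beamformer, "
          f"~{n_o_single * replicas} with {replicas} replicas")
```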

7. Conclusions

In this paper we have presented a scalable and flexible architecture for fast sound-source localization. On the one hand, the architecture can flexibly disable sections of the microphone array that are not needed or that must be switched off to respect power restrictions. The modular approach of the architecture allows scaling the system to a larger or smaller number of microphones. Such capabilities do not impact the frequency response or the accuracy of our sound-source locator. On the other hand, several strategies to offer real-time sound-source localization have been presented and evaluated. These strategies not only accelerate the localization but also provide solutions for time-stringent applications with a high angular resolution demand. Thousands of angles can be monitored in parallel, offering high-resolution sound-source localization in a couple of milliseconds.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014–2020 through the ERDF-2020 Project ICITY-RDI.BRU.

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106–5109, IEEE, Dallas, Tex, USA, March 2010.

[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th SENSORS '13 Conference, IEEE, Baltimore, MD, USA, November 2013.

[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.

[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340–345, IEEE, Qinhuangdao, China, September 2015.

[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869–885, 2011.

[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, Hat-type hearing system using MEMS microphone array, 2016.

[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729–2733, IEEE, Brisbane, Australia, April 2015.

[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.

[9] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.

[10] Analog Devices, "ADMP521 datasheet: Ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.

[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.

[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.

[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Brüel & Kjær, Denmark, 2004.

[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [Ph.D. thesis], Brown University, Providence, RI, USA, 2000.

[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157–180, Springer, Berlin, Germany, 2001.

[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465–468, IEEE, Hoboken, NJ, USA, June 2012.

[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS), Stanford University, Stanford, Calif, USA, 2010.

[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155–162, 1981.

[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf7/48fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.

[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.

[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.

[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14–27, 2005.

[23] T. E. Bogale, L. Vandendorpe, and L. L. Bao, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435–441, IEEE, Oulu, Finland, June 2014.




[17] H Blasinski F Amiel and E Thomas ldquoImpact of differentpower reduction techniques at architectural level on modernFPGAsrdquo in Proceedings of the Latin American Symposium onCircuits and Systems LASCAS Stanford University StanfordCalif USA 2010

[18] E Hogenauer ldquoAn economical class of digital filters for decima-tion and interpolationrdquo IEEE Transactions on Acoustics Speechand Signal Processing vol 29 no 2 pp 155ndash162 1981

[19] MPDonadioldquoCICfilter introductionrdquo2000 httpspdfsseman-ticscholarorg5bf748fbdeb1ff68a2407c0ccfd58b816e9937d5pdf

[20] N Hegde ldquoSeamlessly interfacing MEMs microphones withblackfin processorsrdquo EE-350 Engineer-to-Engineer Note 2010

[21] G J Dolecek and J Diaz-Carmona On Design of CIC Decima-tors INTECH Open Access Publisher 2011

[22] R Lyons ldquoUnderstanding cascaded integrator-comb filtersrdquoEmbed System Program vol 18 no 4 pp 14ndash27 2005

[23] T E Bogale L Vandendorpe and L L Bao ldquoSensing through-put tradeoff for cognitive radio networks with noise varianceuncertaintyrdquo in Proceedings of the 9th International Conferenceon Cognitive Radio Oriented Wireless Networks CROWNCOMrsquo14 pp 435ndash441 IEEE Oulu Finland June 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 10: Design Considerations When Accelerating an FPGA-Based ...downloads.hindawi.com › journals › js › 2017 › 6782176.pdf · Design Considerations When Accelerating an FPGA-Based

10 Journal of Sensors

Figure 11: Timing analysis of the pipelined execution of the components (CIC filter, DC removal, FIR filter, delay, sum, and power), showing the initiation interval of each component.

This level of decimation is done by the CIC and the FIR filter in the filter stage, with decimation factors D_CIC and D_FIR, respectively. Based on D_F defined in (1), the time t_s is expressed as follows:

$$ t_s = \frac{D_F \cdot N_s}{F_S} = \frac{\lceil F_S / \mathrm{BW} \rceil \cdot N_s}{F_S} \approx \frac{N_s}{2 \cdot F_{\max}} \qquad (18) $$
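As a quick sanity check of (18), the short Python sketch below computes D_F and t_s for an illustrative configuration. The numeric values (F_S, F_max, N_s) are placeholders taken from the example design analyzed later in Section 6.2, not additional results.

```python
import math

# Illustrative parameters (borrowed from the example design of Section 6.2)
F_S = 2_000_000      # microphone sampling frequency [Hz]
F_max = 15_625       # maximum target frequency [Hz]
N_s = 64             # number of decimated samples per orientation

BW = 2 * F_max                   # minimum bandwidth to satisfy Nyquist [Hz]
D_F = math.ceil(F_S / BW)        # overall decimation factor, (1)
t_s = D_F * N_s / F_S            # sensing time per orientation [s], (18)

print(D_F, t_s * 1e3)            # -> 64 and 2.048 ms for these values
```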

The initiation interval (II) of each stage of the implementation can also be further decomposed based on the latency of its components:

$$ t^{II}_{\text{filters}} = t^{II}_{\text{CIC}} + t^{II}_{\text{DC}} + t^{II}_{\text{FIR}}, \qquad t^{II}_{\text{beamforming}} = t^{II}_{\text{Delay}} + t^{II}_{\text{Sum}}, \qquad (19) $$

where t^II_i is the initiation interval of component i. Therefore, t_II is defined as the sum of all the initiation intervals:

$$ t_{II} = t^{II}_{\text{CIC}} + t^{II}_{\text{DC}} + t^{II}_{\text{FIR}} + t^{II}_{\text{Delay}} + t^{II}_{\text{Sum}} + t^{II}_{\text{Power}} \qquad (20) $$

Equation (16) can be rearranged (see Figure 11) as

$$ t_{\text{P-SRP}} = N_o \cdot (t_{II} + t_s) \qquad (21) $$

The execution time t_P-SRP is determined by N_o and N_s, since the level of decimation is determined by the target frequency range and t_II is determined by the components' design. Although most of the latency of each component of the design is hidden thanks to the pipelined operation, there are still some cycles dedicated to initializing the components. A detailed analysis of t_II provides valuable information about where performance is lost.

CIC. The initiation interval of the CIC filter represents the time required to fill the integrator and comb stages. Therefore, the order of the CIC (N_CIC) determines t^II_CIC:

$$ t^{II}_{\text{CIC}} = \frac{2 \cdot N_{\text{CIC}} + 1}{F_S} \qquad (22) $$

DC. The component that removes the DC level of the signal introduces a minor initial latency due to its internal registers. Since it needs at least two input values to calculate the DC level, it also depends on D_CIC:

$$ t^{II}_{\text{DC}} = \frac{D_{\text{CIC}} + 2}{F_S} \qquad (23) $$

FIR. The initiation interval of the FIR filter is also determined by the order of this filter (N_FIR). Since the filter operation is basically a convolution, the initial output values are not correct until at least the ⌈(N_FIR + 1)/2⌉-th input sample of the filter. Because the filters are cascaded, D_CIC also affects t^II_FIR:

$$ t^{II}_{\text{FIR}} = \frac{D_{\text{CIC}} \cdot (\lceil (N_{\text{FIR}} + 1)/2 \rceil + 1)}{F_S} \qquad (24) $$

Therefore, t^II_filters is expressed as follows:

$$ t^{II}_{\text{filters}} = t^{II}_{\text{CIC}} + t^{II}_{\text{DC}} + t^{II}_{\text{FIR}} = \frac{2 \cdot N_{\text{CIC}} + D_{\text{CIC}} \cdot (2 + \lceil (N_{\text{FIR}} + 1)/2 \rceil) + 3}{F_S} \qquad (25) $$

Delay. The beamforming operation is done through memories which properly delay the audio samples for a particular orientation. The maximum number of samples determines the minimum size of these delay memories. This value represents the maximum distance between a pair of microphones for a certain microphone array distribution and may vary for each orientation. The initiation interval of the Filter-and-Sum beamformer is therefore expressed as the maximum delay between pairs of microphones for a particular orientation:

$$ t^{II}_{\text{Delay}} = \frac{\max(\Delta_{\text{am}}(\theta)) \cdot D_F}{F_S} \qquad (26) $$

where max(Δ_am(θ)) is the maximum time delay of the active microphones for the beamed orientation θ. Therefore, t^II_Delay is mainly determined by the microphone array distribution, F_S, and the target frequencies determining D_F. Due to the symmetry of the microphone array, and for the sake of simplicity, it is assumed that each orientation has the same max(Δ_am); notice that this does not need to be true for different array configurations.
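For intuition, the following Python sketch mimics in software what the delay memories and the subsequent sum tree implement in hardware: each active microphone stream is read back with its precomputed, orientation-dependent delay (in samples) and the delayed streams are added. The signals and delay values are placeholders for illustration only; the hardware realizes the same operation with one delay memory per microphone.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Delay each microphone signal by its per-orientation delay (in samples)
    and sum the aligned streams, as the delay memories and sum tree do."""
    max_delay = max(delays)
    out_len = signals.shape[1] - max_delay
    acc = np.zeros(out_len)
    for sig, d in zip(signals, delays):
        # Reading "older" samples emulates the delay memory of that microphone.
        acc += sig[max_delay - d : max_delay - d + out_len]
    return acc

# Toy example: 4 microphones, random data, hypothetical delays for one orientation.
rng = np.random.default_rng(0)
mics = rng.standard_normal((4, 256))
print(delay_and_sum(mics, [0, 1, 3, 2])[:5])
```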


Sum. The proposed beamforming is composed of not only a set of delay memories but also a sum tree. The initiation interval of this component is defined by the number of active microphones (N_am):

$$ t^{II}_{\text{Sum}} = \frac{\lceil \log_2(N_{\text{am}}) \rceil}{F_S} \qquad (27) $$

Therefore, t^II_beamforming is expressed as follows:

$$ t^{II}_{\text{beamforming}} = t^{II}_{\text{Delay}} + t^{II}_{\text{Sum}} = \frac{\max(\Delta_{\text{am}}(\theta)) \cdot D_F + \lceil \log_2(N_{\text{am}}) \rceil}{F_S} \qquad (28) $$

Power. The final component is the calculation of the power per orientation. This simple component has a constant latency of a couple of clock cycles:

$$ t^{II}_{\text{Power}} = \frac{2}{F_S} \qquad (29) $$
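The per-component initiation intervals above, together with (20) and (21), can be collected into a small timing model. The Python sketch below does exactly that; the parameter values are placeholders (loosely based on the example design of Section 6.2 and an assumed maximum inter-microphone delay), so the output is illustrative rather than a reproduction of Table 6.

```python
from math import ceil, log2

def timing_model(F_S, N_CIC, D_CIC, N_FIR, D_F, max_delta_am, N_am, N_o, N_s):
    # Initiation intervals of the individual components, (22)-(29)
    t_cic   = (2 * N_CIC + 1) / F_S
    t_dc    = (D_CIC + 2) / F_S
    t_fir   = D_CIC * (ceil((N_FIR + 1) / 2) + 1) / F_S
    t_delay = max_delta_am * D_F / F_S
    t_sum   = ceil(log2(N_am)) / F_S
    t_power = 2 / F_S
    t_II = t_cic + t_dc + t_fir + t_delay + t_sum + t_power   # (20)
    t_s = D_F * N_s / F_S                                     # (18)
    return N_o * (t_II + t_s)                                 # (21), t_P-SRP [s]

# Placeholder values; max_delta_am (in samples) is an assumption for illustration.
print(timing_model(F_S=2e6, N_CIC=4, D_CIC=16, N_FIR=16, D_F=64,
                   max_delta_am=15, N_am=52, N_o=64, N_s=64))
```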

The timing analysis of the initiation interval of each component of the architecture gives an idea of the design parameters with the highest impact. The definition of the filters, mainly their order, is determined by the application specifications, so it should not be modified to reduce the overall execution time. On the other hand, the distribution of the microphones in the array affects not only the frequency response of the system but also the execution time. Notice, however, that the number of microphones does not have a timing impact: only the number of active microphones has a minor impact, in terms of a couple of clock cycles of difference. Nevertheless, (21) already shows that the dominant parameters are t_s and N_o.

5.2. Sensitive Parameters

The timing analysis provides an indication of the parameters dominating the execution time. Some parameters, like the microphone array distribution which determines the beamforming latency, are fixed, while others, like N_o or t_s per orientation, are variable.

Orientations. Figure 5 depicts how an increment of N_o leads to a better sound-source localization. This resolution, however, has a high repercussion on the response time. A simple strategy is to maintain the angular resolution only where it is needed while quickly exploring the surrounding sound field. For instance, the authors in [3] propose a strategy to reduce the beamforming exploration to 8 orientations with an angular separation of 45 degrees. Once a steering loop ends, the orientations are rotated one position, which represents a shift operation in the precomputed orientation table. Therefore, all the supported 64 orientations are monitored after 8 steering loops. Although this strategy intends to accelerate the peak detection by monitoring the minimum N_o, the overall N_o remains the same for achieving the equivalent angular resolution.

Sensing Time. The sensing time is a well-known parameter of radio frequency applications. The time t_s is known to strengthen the robustness against noise [23]. In our case, the time a receiver monitors the surrounding sound field determines the probability of properly detecting a sound source. Consequently, a higher t_s is needed to detect and locate sound sources under low signal-to-noise ratio (SNR) conditions. Although this term could be modified at runtime to adapt the sensing of the array based on an estimated SNR, that would demand a continuous SNR estimation, which is out of the scope of this paper.

To conclude, Table 2 summarizes the timing definitions. On one hand, t_s determines the number of processed acoustic samples and therefore directly affects the sensing of the system. On the other hand, N_o determines the angular resolution of the sound-source search and influences the accuracy. Both parameters trade off response time against the quality of the sound-source localization.

5.3. Strategies for Time Reduction

The following three strategies are proposed to accelerate the sound-source localization without any impact on the frequency response and D_P of the architecture. An additional strategy is proposed especially for dynamic acoustic environments, but at a certain accuracy cost.

5.3.1. Continuous Processing. The proposed architecture is designed to reset the filter and beamforming stages after t_o due to the orientation transition. Because the beamforming stage follows the filter stage, the system can continue processing while resetting: the filter stage does not need to stop, and the input data is not lost due to the reset operations, since the filtered input values are stored in the beamforming stage. Furthermore, the initialization of the beamforming stage can also be eliminated, since the stored data from the previous orientation can be reused for the calculation of the new one. With this approach, (17) becomes:

$$ t_{\text{P-SRP}} = t^{II}_{\text{filters}} + t^{II}_{\text{beamforming}} + N_o \cdot (t^{II}_{\text{power}} + t_s) \approx t_{II} + N_o \cdot t_s \qquad (30) $$

5.3.2. Time Multiplexing. Nowadays, FPGAs can operate at clock speeds of hundreds of MHz. Although the power consumption is significantly lower when operating at low frequency [17], the proposed architecture is able to operate at a much higher frequency than the data sampling rate. This capability provides the opportunity to parallelize the beamforming computations without any additional resource consumption. Instead of consuming more logic resources by replicating the main operations, the proposed strategy, similar to time-division multiplexing in communications, consists in time multiplexing these parallel operations. Because the input data is oversampled audio, the selection of the operations to be time multiplexed is limited. Based on (21), the candidates to be parallelized are N_o and t_s. Since the input data rate is determined by F_S, (18) shows that t_s cannot be reduced without decreasing N_s or changing the target frequency range. Nevertheless, since the computation of each orientation is data independent, the orientations can be computed in parallel.


Figure 12: Clock regions for the time multiplexing of the computation of multiple N_o. The PDM splitter and the filter stage of each subarray operate at F_S, while the beamforming and power stages operate at F_P.

The simultaneous computation of multiple orientations is only possible after the beamforming operation. Let us define t^II_P as the monitoring time before being able to process multiple orientations in parallel. Therefore,

$$ t^{II}_{P} = t^{II}_{\text{CIC}} + t^{II}_{\text{DC}} + t^{II}_{\text{FIR}} + t^{II}_{\text{Delay}} \qquad (31) $$

After t^II_P, the delay memories which compose the Filter-and-Sum beamforming stage have already stored enough audio data to start locating the sound source. Because the beamforming operation relies on delaying the recovered audio signal, multiple orientations can be computed in parallel by accessing the content of the delay memories at a higher speed than the sampling of the input data. It basically multiplexes the output beamforming computations over time. The required frequency F_P to parallelize all N_o for this architecture is defined as follows:

$$ F_P = \frac{F_S \cdot N_o}{D_F} \qquad (32) $$

Due to (1), F_P can also be expressed based on the target frequency range:

$$ F_P \approx \mathrm{BW} \cdot N_o \qquad (33) $$

Notice that the frequency required to multiplex the computation of the orientations in time does not depend on the number of microphones in the array. Figure 12 shows the clock domains when applying this strategy. While the front-end, consisting of the microphone array and the filter stage, operates at F_S, the output of the beamforming is processed at F_P. The additional cost in terms of resources is the extension of the register for the power-per-angle calculation: a memory of N_o positions is required instead of the single register used to store the accumulated power values. This strategy allows fully parallelizing the computation of all the orientations. Thus, t_P-SRP is mainly limited by N_o and the maximum reachable frequency of the design, since F_S is determined by the microphones' operational frequency and D_F by the frequency range of the target sound source. In fact, D_F determines how many orientations can be processed in parallel.
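As an illustration of (32) and (33), the sketch below computes the clock F_P required to time multiplex all orientations for a given configuration and compares it against an assumed maximum attainable frequency. The 100 MHz limit and the specific parameter values are placeholders, not measured results.

```python
import math

F_S = 2_000_000          # front-end sampling clock [Hz]
F_max = 15_625           # highest target frequency [Hz]
N_o = 64                 # orientations to multiplex in time

BW = 2 * F_max
D_F = math.ceil(F_S / BW)
F_P_exact = F_S * N_o / D_F      # required beamforming clock, (32)
F_P_approx = BW * N_o            # approximation of (33)

F_attainable = 100e6             # assumed maximum post-route clock (placeholder)
print(F_P_exact / 1e6, F_P_approx / 1e6,
      "feasible" if F_P_exact <= F_attainable else "not feasible")
```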

5.3.3. Parallel Time Multiplexing. This proposed strategy is an extension of the previous one. The frequency F_P is limited by the maximum attainable operating frequency of the implementation, which is determined by many factors, from the technology to the available resources on the FPGA. For instance, if F_max equals 30 kHz and the maximum attainable operating frequency is 100 MHz, then up to 1666 orientations could be computed in parallel. However, if not all the resources of the FPGA are consumed, especially the internal blocks of memory (BRAM), there is still room for improvement. With the time multiplexing strategy, the memories of the beamforming stage are fully accessed, since in each clock cycle there is at least one memory access, or even two memory accesses when new data is stored. Therefore, more memory resources can be used to further accelerate the computation of the P-SRP. The simple replication of the beamforming stage, preconfigured for different orientations, is enough to double the number of processed orientations while maintaining the same t_P-SRP. The strategy mainly consumes BRAMs. Nevertheless, due to the value of max(Δ_m) at BW for our microphone array, only a few audio samples are needed to complete the beamforming. This fact drastically reduces the memory consumption, which provides the potential computation of thousands of orientations by applying both strategies.

All strategies can be applied independently, despite the fact that some will only work properly when combined. Not all strategy combinations are beneficial. For instance, a dynamic angular resolution should only be combined with the time multiplexing of the orientations when F_P is higher than F_S. Otherwise, the reduction of N_o by dynamically readjusting the target orientations does not provide any acceleration and would only degrade the response of the system.

6. Results

The proposed architecture is evaluated in this section. Our analysis starts by evaluating different design solutions based on the timing analysis introduced in Section 5.1. One representative configuration is then evaluated in terms of frequency response and accuracy by using the metrics described in Section 3.5. This evaluation also considers sensitive parameters such as the number of active subarrays and the relevance of N_o, already introduced in Section 5.2. The resource and power consumption for a Zynq 7020 target FPGA are also presented.


Figure 13: Minimum values of t_o based on F_S and F_max; different perspectives are displayed in the bottom figures. Notice how the shortest t_o is obtained when increasing F_max and F_S.

Finally, the strategies presented in Section 5.3 are applied to the representative design.

6.1. General Performance Analysis. The performance analysis proposed in the previous section is here applied to a concrete example. The explored design parameters are F_S and F_max, keeping N_s and N_o both constant at 64. Whereas F_S is determined by the microphone's sampling frequency, F_max is determined by the target application. For our design space exploration, we consider an F_max from 10 kHz to 16 kHz in steps of 125 Hz, while F_S ranges from 1.25 MHz to 3.072 MHz, as specified in [10].

Equations (16) to (18) and (20) to (32) are used to obtain t_P-SRP. The performance analysis starts by obtaining D_F for every possible value of F_S and F_max. All possible combinations of D_CIC and D_FIR are considered based on (15). The low-pass FIR filter parameter is N_FIR, which is determined by D_CIC, with F_max as the cut-off frequency. Each possible low-pass FIR filter is generated considering a transition band of 2 kHz and an attenuation of at least 60 dB at the stop band. If the minimum order of the filter is higher than N_FIR, the filter is discarded. We consider these parameters realistic constraints for low-pass FIR filters. Furthermore, a minimum order of 4 is defined as a threshold for N_FIR. Thus, some values are discarded because D_F is a prime number or N_FIR is below 4. Each low-pass FIR filter is generated and evaluated in Matlab 2016b.
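The enumeration part of this design space exploration can be summarized by the Python sketch below. It only sweeps the candidate (F_S, F_max) pairs and the factorizations D_F = D_CIC · D_FIR; the FIR filters themselves were generated and verified in Matlab 2016b, so the acceptance rule used here (both factors at least 4) and the F_S sweep step are simplifying assumptions, not the exact filter-feasibility check of the paper.

```python
import math

def factor_pairs(D_F, min_factor=4):
    """All (D_CIC, D_FIR) with D_CIC * D_FIR == D_F and both factors >= min_factor."""
    return [(d, D_F // d) for d in range(min_factor, D_F // min_factor + 1)
            if D_F % d == 0 and D_F // d >= min_factor]

candidates = []
for F_S in range(1_250_000, 3_072_001, 2_000):        # microphone clock sweep [Hz] (assumed step)
    for F_max in range(10_000, 16_001, 125):          # target frequency sweep [Hz]
        D_F = math.ceil(F_S / (2 * F_max))
        pairs = factor_pairs(D_F)
        if pairs:                                     # prime/unfactorable D_F is discarded
            candidates.append((F_S, F_max, D_F, pairs))

print(len(candidates), "candidate configurations before FIR filter generation")
```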

Figure 13 depicts the minimum timings of the DSE that the proposed Filter-and-Sum architecture needs to compute one orientation. t_o is slightly reduced when varying F_S; for instance, it is reduced from 5.03 ms to 3.97 ms when F_max = 10 kHz. A higher F_S means a faster sampling, which is in fact the limiting factor of the operational frequency. Furthermore, a higher decrement of t_P-SRP is produced when increasing F_S and F_max. Higher values of F_max allow higher values of D_CIC, which can greatly reduce the computational complexity of the narrowband low-pass filtering. However, too high values of D_CIC lead to such low rates that, although a higher-order low-pass FIR filter is supported, it cannot satisfy the low-pass filtering specifications. Notice how the number of possible solutions decreases while increasing F_max. Due to the F_S and F_max ranges, the values of D_F vary between 39 and 154, though, as previously explained, many values cannot be considered since they are either prime numbers or their decomposition in factors of D_CIC leads to values below 4. Because higher values of F_max lead to low values of D_CIC for low F_S, these D_CIC values cannot satisfy the specifications of the low-pass FIR filter.

Finally, relatively low values of t_P-SRP are obtained for F_max values from 10 kHz to 10.65 kHz and F_S ranging from 2.7 MHz to 3.072 MHz. This is produced by high values of D_CIC, which means that a higher-order low-pass FIR filter is supported. As expected, high values of D_CIC lead to high-order low-pass FIR filters and a lower D_FIR. A lower t_P-SRP is possible thanks to avoiding unnecessary computations, since fewer samples are decimated after the low-pass FIR filter.

6.2. Analysis of a Design. As shown in Figure 13, several design considerations drastically affect the final performance.


Figure 14: Waterfall diagrams of the proposed architecture. The figures are obtained by enabling only a certain number of subarrays: from (a) to (d), only the 4 innermost microphones, only the 12 innermost microphones, the 28 innermost microphones, and all 52 microphones.

Table 3: Configuration of the architecture under analysis.

Parameter | Definition | Value
F_s | Sampling frequency | 2 MHz
F_min | Minimum frequency | 1 kHz
F_max | Maximum frequency | 15.625 kHz
BW | Minimum bandwidth to satisfy Nyquist | 31.25 kHz
D_F | Decimation factor | 64
D_CIC | CIC filter decimation factor | 16
N_CIC | Order of the CIC filter | 2
D_FIR | FIR filter decimation factor | 4
N_FIR | Order of the FIR filter | 16

However, most of these design decisions do not have a significant impact on the system response compared to other factors, such as the number of active microphones or the number of orientations. The analysis of the impact of these parameters on the system's response and performance is done for one particular design.

Table 3 summarizes the configuration of the architecture. The design considers F_s = 2 MHz, which is the clock for the microphones and the functional frequency of the design. This value of F_s is an intermediate value within the clock range required by the ADMP521 microphones [10]. The selected cut-off frequency is F_max = 15.625 kHz, which leads to D_F = 64. In this example design, N_CIC = 4, with a decimation factor of 16 and a differential delay of 32. The chosen FIR filter has a beta factor of 2.7 and a cut-off frequency of F_max at a sampling rate of 125 kHz, which is the sampling rate obtained after the CIC decimator filter with D_CIC = 16. The filtered signal is then further decimated by a factor D_FIR = 4 to obtain a BW = 31.25 kHz audio signal.

The architecture is designed to support a complete steering loop of up to 64 orientations, which represents an angular resolution of 5.625°. On the other hand, the subarray approach allows activating all 52 microphones when the 4 subarrays are active. The final results are obtained by assuming a speed of sound of ≈343.2 m/s.
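The derived quantities of this configuration follow directly from the choices above. A minimal check, using only the Table 3 values and the 64 supported orientations (a sketch, not part of the original evaluation):

```python
F_S = 2_000_000                  # Hz, microphone clock
D_CIC, D_FIR = 16, 4
D_F = D_CIC * D_FIR              # overall decimation factor -> 64
F_intermediate = F_S / D_CIC     # rate after the CIC decimator -> 125 kHz
BW = F_S / D_F                   # final audio sample rate / bandwidth -> 31.25 kHz
F_max = BW / 2                   # cut-off frequency -> 15.625 kHz
N_o = 64
angular_resolution = 360 / N_o   # -> 5.625 degrees
print(D_F, F_intermediate, BW, F_max, angular_resolution)
```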

6.2.1. Frequency Response. The waterfall diagrams of Figure 14 show the power output of the combined subarrays in all directions for all frequencies. In our case, the results are calculated with a single sound source varying between 100 Hz and 15 kHz in steps of 100 Hz and placed at 180°. All results are normalized per frequency. Every waterfall shows a clear, distinctive main lobe. When only subarray 1 is active, there are side lobes at 5.3 kHz and 10.6 kHz which impede the sound-source localization for those frequencies. The frequency response of the subarrays improves when they are combined, since their frequency responses are superposed. The combination of subarrays 1 and 2 reaches a minimum detectable frequency of 3.1 kHz, while combining subarrays 1, 2, and 3 and combining all subarrays reach 2.1 kHz and 1.6 kHz, respectively.


Figure 15: Directivities when considering a variable number of orientations and active microphones. From (a) to (d): D_P with only 8 orientations up to 64 orientations in (d).

These minimum values are clearly depicted in Figure 15 with a threshold of 8 for D_P, which indicates that the main lobe's surface corresponds to at most half of a quadrant. The frequency response of the combination of subarrays has a strong variation at the main lobe and therefore in D_P. Figure 15 depicts the evolution of D_P when increasing the angular resolution and when combining subarrays. The angular resolution determines the upper bound to which D_P converges, which depends on the number of orientations. The number of active microphones, on the other hand, influences how fast D_P converges to its upper limit. Consequently, the number of active microphones determines the minimum frequency which can be located when considering a threshold of 8 for D_P. Alongside the directivity, other metrics, such as the main beamwidth and the MSL levels, are also calculated to properly evaluate the quality of the array's response.

Figure 16 depicts the MSL when varying the number of active subarrays and the number of orientations. A low angular resolution leads to a lower resolution of the waterfall diagrams, but only the metrics can show the impact. At frequencies between 1 and 3 kHz, the main lobe converges to a unit circle, which can be explained by the lack of any side lobe. Higher frequencies present secondary lobes, especially when only the inner subarray is active, which increases the MSL values independently of the angular resolution. A low angular resolution leads to unexpectedly low values of MSL, since the secondary lobes are not detected. On the other hand, a higher number of active microphones leads to lower values of MSL, independently of the angular resolution.

Figure 17 depicts the BW−3dB metric for a similar analysis of the number of microphones and the angular resolution.


Figure 16: Measured MSL when considering a variable number of orientations and active microphones. From (a) to (d): the MSL with only 8 orientations up to 64 orientations in (d).

On the one hand, a higher number of microphones produces a faster decrement of BW−3dB, reflected as a thinner main lobe. Nevertheless, the BW−3dB of each subarray converges to a minimum which is only reached at higher frequencies. The angular resolution determines this minimum, which ranges from 90° down to 11.25° when 8 or 64 orientations are considered, respectively.

6.2.2. Resource Consumption and Power Analysis. Table 4 summarizes the resource consumption when combining subarrays. The consumed resources are divided into the resources for the filter stage, the beamforming stage, and the total consumption per group of subarrays. The filter stage mostly consumes DSPs, while the beamforming stage mainly demands BRAMs. Most of the resource consumption is dominated by the filter stage, since a filter chain is dedicated to each MEMS microphone. What determines the resource consumption is the number of active subarrays.

The flexibility of our architecture allows the creation of heterogeneous sound-source locators. Thus, the architecture can be scaled for small FPGAs based on the target sound-source profile or a particular desirable power consumption. For instance, the combination of the two inner subarrays would use 12 microphones while consuming less than 10% of the available resources. The LUTs are the limiting resource due to the internal registers of the filters; in fact, when all the subarrays are used, around 80% of the available LUTs are required. Nevertheless, any subarray can be disabled at runtime, which directly deactivates its associated filter and beamforming components. Although this does not affect the resource consumption, it has a direct impact on the power consumption.


Figure 17: Measured BW−3dB level when considering a variable number of orientations and active microphones. From (a) to (d): the BW−3dB with only 8 orientations up to 64 orientations in (d).

Table 5 shows the power consumption in mW based on the number of active subarrays. The power consumption of the microphones is also considered, since the FPGA and the microphone array are powered from the same source; the overall power consumption must therefore be considered, because the architecture is designed for an embedded system. The MEMS microphones are powered at 3.3 volts, which represents a power consumption per microphone of 2.64 µW and 3.96 mW for the inactive and active microphones, respectively. Notice how the power consumption increases with the number of active subarrays. There is a turning point when 3 or 4 subarrays are active; thus, the microphone array consumes more power than the FPGA when all the subarrays are active.
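The totals in Table 5 follow, up to rounding, from a simple additive model: each active microphone draws about 3.96 mW and each inactive one about 2.64 µW at 3.3 V, and the on-chip FPGA power reported by Vivado is added on top. The sketch below encodes this; the on-chip figures are simply copied from Table 5, not recomputed.

```python
P_ACTIVE_MIC = 3.96        # mW per active MEMS microphone at 3.3 V
P_INACTIVE_MIC = 0.00264   # mW per inactive microphone
TOTAL_MICS = 52

# (active microphones, reported on-chip FPGA power in mW) per configuration (Table 5)
configs = {"Inner 4 MICs": (4, 122), "Inner 12 MICs": (12, 125),
           "Inner 28 MICs": (28, 132), "All 52 MICs": (52, 138)}

for name, (active, p_fpga) in configs.items():
    p_mics = active * P_ACTIVE_MIC + (TOTAL_MICS - active) * P_INACTIVE_MIC
    print(f"{name}: microphones {p_mics:.2f} mW, total {p_mics + p_fpga:.2f} mW")
```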

6.2.3. Timing Analysis. The timing analysis based on Section 5 of the design under evaluation is summarized in Table 6. A complete steering loop requires around 169.6 ms, while t_o rounds to 2.6 ms. Notice that the initialization (t_II) consumes around 21.5% of the execution time. Fortunately, this initialization can almost be completely removed when applying the first strategy described in Section 5.3.1.

Table 7 summarizes the timing results when applying the first two strategies proposed in Section 5.3. The elimination of the initialization after each orientation transition slightly reduces t_P-SRP. In this case, t_P-SRP is expressed as follows:

$$ t_{\text{P-SRP}} = t_{II} + N_o \cdot t_s \qquad (34) $$

The main improvement is obtained after time multiplexing the computation of the power per orientation. In this case, F_P, the operational frequency of the beamforming computation needed to process all N_o in parallel, equals F_S, as expressed in (32). This is possible because D_F and N_o have the same value; therefore, there is no need for a different clock for the beamforming operation, since the spacing between the output filtered values of the filter stage is large enough.


Table 4: Resource consumption after placement and routing when combining microphone subarrays. Each subarray combination details the resource consumption as Filters / Beamforming / Total.

Resource (available) | Inner 4 MICs | Inner 12 MICs | Inner 28 MICs | All 52 MICs
Slice registers (106400) | 5043 / 626 / 6144 | 14859 / 1540 / 16882 | 34489 / 3195 / 38183 | 54042 / 4447 / 59093
Slice LUTs (53200) | 3612 / 344 / 4732 | 10759 / 754 / 12299 | 25032 / 1486 / 27318 | 37221 / 2221 / 42319
LUT-FF pairs (86689) | 2329 / 199 / 2773 | 7013 / 512 / 7779 | 16353 / 1069 / 17698 | 23656 / 1664 / 27619
BRAM (140) | 0 / 2 / 2 | 0 / 6 / 6 | 0 / 14 / 14 | 0 / 22 / 22
DSP48 (220) | 8 / 4 / 12 | 24 / 4 / 28 | 56 / 4 / 60 | 88 / 4 / 92

Table 5: Power consumption at F_s = 2 MHz, expressed in mW, when combining microphone subarrays. Values obtained from the Vivado 2016.4 power report.

Subarrays | MEMS microphones (Active / Inactive / Total) | On-chip power (Static / Dynamic / Total) | Total power
Inner 4 MICs | 15.84 / 0.13 / 15.97 | 120 / 2 / 122 | 137.97
Inner 12 MICs | 47.52 / 0.11 / 47.63 | 120 / 5 / 125 | 172.63
Inner 28 MICs | 110.88 / 0.06 / 110.94 | 121 / 11 / 132 | 242.94
All 52 MICs | 205.92 / 0 / 205.92 | 122 / 16 / 138 | 343.92

Table 6: Timing analysis, without any optimization, of the design under evaluation. The values are expressed in µs.

Parameter | Definition | Value [µs]
t^II_CIC | Initiation interval of the CIC filter | 4.5
t^II_DC | Initiation interval of the remove-DC block | 9
t^II_FIR | Initiation interval of the FIR filter | 72
t^II_Delay | Initiation interval of the delay memories | 480
t^II_Sum | Initiation interval of the cascaded sums | 3.5
t^II_Power | Initiation interval of the power calculation | 1
t^II_filters | Initiation interval of the filter stage | 85.5
t^II_beamforming | Initiation interval of the beamforming stage | 484.5
t^II_power | Initiation interval of the power stage | 1
t_II | Sum of all initiation intervals | 571
t_s | Sensing time | 2048
t_o | Execution time of one orientation | 2650
t_P-SRP | Time required to obtain a polar power map | 169600

By combining the first two strategies, t_P-SRP rounds to 2 ms, and only the first steering loop needs 2.6 ms due to t^II_P. In this case, t_P-SRP is expressed as follows:

$$ t_{\text{P-SRP}} = t^{II}_{P} + t_s \approx t_s \qquad (35) $$
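Using the values of Table 6, the four entries of Table 7 follow from (21), (34), and (35). The short computation below makes the arithmetic explicit (values in milliseconds; results match Table 7 up to rounding).

```python
# Values from Table 6, expressed in milliseconds
t_II = 0.571      # sum of all initiation intervals
t_s  = 2.048      # sensing time per orientation
t_o  = 2.650      # execution time of one orientation
N_o  = 64

initial          = N_o * t_o          # full steering loop, no optimization: ~169.6 ms
continuous       = t_II + N_o * t_s   # (34): ~131.6 ms
time_multiplexed = t_II + t_s         # all N_o in parallel, first loop: ~2.6 ms
combined         = t_s                # (35): ~2.0 ms after the first loop
print(initial, continuous, time_multiplexed, combined)
```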

The other two strategies proposed in Section 5.3 are designed to fully exploit the FPGA resources and to overcome time constraints when considering a high angular resolution. In the first case, since the design under evaluation has a small angular resolution (N_o = 64), there is no need for a higher F_P when applying the time multiplexing strategy. However, a higher angular resolution can be obtained by using the unconsumed resources without additional timing cost. Table 8 shows that the combination of strategies increases the angular resolution without additional time penalty. The operational frequency (F_op) determines at what speed the FPGA can operate. Following (33), the beamforming operation can be exploited by increasing F_P up to the maximum frequency, which increases N_o as well:

$$ \max(N_o) = \frac{\max(F_{\text{op}})}{\mathrm{BW}} = \frac{F_P}{\mathrm{BW}} \qquad (36) $$

Many thousands of orientations can be computed in parallel when combining all strategies. The beamforming stage can be replicated as many times as the remaining available resources allow. Of course, this estimation is certainly optimistic, since the frequency drops when the resource consumption increases; nevertheless, it provides an upper bound for N_o. For instance, when only the inner subarray is considered, the DSPs are the limiting component, yet up to 53 beamforming stages could theoretically be placed in parallel. When more subarrays are active, the BRAMs are the constraining component. Notice how the number of supported orientations increases when the number of subarrays decreases. This has, however, an impact on the frequency response and the accuracy of the system, as shown in Section 6.2.1. Nevertheless, tens of thousands of orientations can be computed in parallel, consuming only around 2 ms, by operating at the highest F_op and by replicating the beamforming stage to exploit all the available resources.
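The upper bounds of Table 8 can be approximated from the per-beamformer resource costs of Table 4 and the post-route F_op values, as in the sketch below (BRAM-limited except for the inner subarray, where the DSP budget dominates). Small deviations from the table are expected because of rounding in F_op.

```python
import math

BW = 31_250                      # audio bandwidth [Hz]
AVAILABLE = {"BRAM": 140, "DSP": 220}
# Cost per replicated beamforming stage (Table 4): (BRAM, DSP)
BEAMFORMER_COST = {"Inner 4 MICs": (2, 4), "Inner 12 MICs": (6, 4),
                   "Inner 28 MICs": (14, 4), "All 52 MICs": (22, 4)}
F_OP = {"Inner 4 MICs": 95.62e6, "Inner 12 MICs": 93.27e6,
        "Inner 28 MICs": 91.97e6, "All 52 MICs": 87.91e6}

for name, (bram, dsp) in BEAMFORMER_COST.items():
    n_beamformers = min(AVAILABLE["BRAM"] // bram, AVAILABLE["DSP"] // dsp)
    n_o_single = math.floor(F_OP[name] / BW)          # (36), a single beamformer
    print(name, n_beamformers, n_o_single, n_beamformers * n_o_single)
```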


Table 7: Timing analysis of the optimized designs when applying and combining the first two strategies. The values are expressed in ms.

 | Initial | Continuous | Time multiplexing | Continuous time multiplexing
t_P-SRP | 169.6 ms | 131.6 ms | 2.6 ms | 2 ms

Table 8: Maximum N_o when combining strategies. The maximum number of beamformers is obtained based on the available resources and the resource consumption of each beamformer (Table 4). The maximum F_op is reported by the Vivado 2016.4 tool after placement and routing.

Continuous time multiplexing:
 | Inner 4 MICs | Inner 12 MICs | Inner 28 MICs | All 52 MICs
Max beamformers | — | — | — | —
Max F_op | 95.62 MHz | 93.27 MHz | 91.97 MHz | 87.91 MHz
Max N_o | 3059 | 2984 | 2943 | 2813

Parallel continuous time multiplexing:
 | Inner 4 MICs | Inner 12 MICs | Inner 28 MICs | All 52 MICs
Max beamformers | 55 | 23 | 10 | 6
Max F_op | 95.62 MHz | 93.27 MHz | 91.97 MHz | 87.91 MHz
Max N_o | 168292 | 68650 | 29430 | 16879

7. Conclusions

In this paper, we have presented a scalable and flexible architecture for fast sound-source localization. On one hand, the architecture can flexibly disable sections of the microphone array that are not needed or that must be switched off to respect power restrictions. The modular approach of the architecture allows scaling the system to a larger or smaller number of microphones. Nevertheless, such capabilities do not impact the frequency response and accuracy of our sound-source locator. On the other hand, several strategies to offer real-time sound-source localization have been presented and evaluated. These strategies not only accelerate the localization but also provide solutions for time-stringent applications with a high angular resolution demand. Thousands of angles can be monitored in parallel, offering a high-resolution sound-source localization in a couple of milliseconds.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014–2020 through the ERDF-2020 Project ICITY-RDI.BRU.

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106–5109, IEEE, Dallas, Tex, USA, March 2010.

[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th SENSORS '13 Conference, IEEE, Baltimore, MD, USA, November 2013.

[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.

[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340–345, IEEE, Qinhuangdao, China, September 2015.

[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869–885, 2011.

[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, Hat-type hearing system using MEMS microphone array, 2016.

[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729–2733, IEEE, Brisbane, Australia, April 2015.

[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.

[9] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.

[10] Analog Devices, "ADMP521 datasheet: Ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.

[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.

[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.

[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Brüel & Kjær, Denmark, 2004.

[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [Ph.D. thesis], Brown University, Providence, RI, USA, 2000.

[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157–180, Springer, Berlin, Germany, 2001.

[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465–468, IEEE, Hoboken, NJ, USA, June 2012.

[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS), Stanford University, Stanford, Calif, USA, 2010.

[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155–162, 1981.

[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf7/48fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.

[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.

[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.

[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14–27, 2005.

[23] T. E. Bogale, L. Vandendorpe, and L. L. Bao, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435–441, IEEE, Oulu, Finland, June 2014.


Page 11: Design Considerations When Accelerating an FPGA-Based ...downloads.hindawi.com › journals › js › 2017 › 6782176.pdf · Design Considerations When Accelerating an FPGA-Based

Journal of Sensors 11

Sum The proposed beamforming is composed of not onlya set of delay memories but also a sum tree The initiationinterval of this component is defined by the number of activemicrophones (119873am)

119905SumII = lceillog2 (119873am)rceil119865119878 (27)

Therefore 119905beamformingII is expressed as follows

119905beamformingII = 119905Delay

II + 119905SumII

= max (Δ am (120579)) sdot 119863119865 + lceillog2 (119873am)rceil119865119878

(28)

Power The final component is the calculation of the powerper orientation This simple component has a constantlatency of a couple of clock cycles

119905PowerII = 2119865119878 (29)

The timing analysis of the initiation interval of eachcomponent of the architecture gives an idea about thedesign parameters with higher impact The definition of thefilters mainly their order is determined by the applicationspecifications so it should not be modified to reduce theoverall execution time On the other hand the distribution ofthe microphones in the array affects not only the frequencyresponse of the system but also the execution time Noticehowever that the number of microphones does not havetiming impact Only the number of active microphones hasa minor impact in terms of a couple of clock cycles of dif-ference Nevertheless (21) already shows that the dominantparameters are 119905119904 and 11987311990052 Sensitive Parameters The timing analysis provides anindication of the parameters dominating the execution timeSome parameters like the microphone array distributionwhich determine the beamforming latency are fixed whileothers like 119873119900 or 119905119904 per orientation are variable

Orientations Figure 5 depicts howan increment of119873119900 leads toa better sound-source localization This resolution howeverhas a high repercussion on the response time A simplestrategy is to maintain the angular resolution only for whereit is needed while quickly exploring the surrounding soundfield For instance the authors in [3] propose a strategyto reduce the beamforming exploration to 8 orientationswith an angular separation of 45 degrees Once a steeringloop ends the orientations are rotated one position whichrepresents a shift operation in the precomputed orientationtable Therefore all the supported 64 orientations are mon-itored after 8 steering loops Despite this strategy intendingto accelerate the peak detection by monitoring the minimum119873119900 the overall 119873119900 remains the same for achieving theequivalent angular resolution

Sensing Time The sensing time is a well-known parameterof radio frequency applications The time 119905119904 is known to

strengthen the robustness against noise [23] In our casethe time a receiver is monitoring the surrounding soundfield determines the probability of properly detection ofa sound-source Consequently a higher 119905119904 is needed todetect and locate sound sources under low Signal-to-Noise(SNR) conditions Despite the fact that this term could bemodified in runtime to adapt the sensing of the array basedon an estimated SNR it would demand a continuous SNRestimation which is out of the scope of this paper

To conclude Table 2 summarizes the timing definitionsOn one hand 119905119904 determines the number of processed acousticsamples and therefore directly affects the sensing of thesystem On the other hand 119873119900 determines the angularresolution of the sound-source search and influences theaccuracy There is a trade-off between 119905119904 and 119873119900 and thequality of the sound-source location

53 Strategies for Time Reduction The following three strate-gies are proposed to accelerate the sound-source localizationwithout any impact on the frequency response and 119863119875 of thearchitecture An additional strategy is proposed specially fordynamic acoustic environments but with a certain accuracycost

531 Continuous Processing The proposed architecture isdesigned to reset the filter and beamforming stages after 119905119900due to orientation transition Thanks to beamforming afterthe filter stage the system can be continuously processingwhile resetting The filter stage does not need to stop itsprocessing The input data is not lost due to the resetoperations since the filtered input values are stored in thebeamforming stage Furthermore the initialization of thebeamforming stage can also be eliminated since the storeddata from the previous orientation can be reused for thecalculation of the new one With this approach (17) becomesas follows

119905P-SRP = 119905filtersII + 119905beamformingII + 119873119900 sdot (119905powerII + 119905119904)

asymp 119905II + 119873119900 sdot 119905119904(30)

532 Time Multiplexing Nowadays FPGAs can operate atclock speeds of hundreds of MHz Despite the fact that thepower consumption is significantly lower when operating atlow frequency [17] the proposed architecture is able to oper-ate at much higher frequency than the data sampling rateThis capability provides the opportunity to parallelize thebeamforming computations without any additional resourceconsumption Instead of consuming more logic resources byreplicating the main operations the proposed strategy simi-lar to Time-Division Multiplexing in communications con-sists in time multiplexing these parallel operations Becausethe type of the input data is oversampled audio the selectionof the operations to be time multiplexed is limited Based on(21) the candidates to be parallelized are 119873119900 and 119905119904 Since theinput data rate is determined by 119865119878 (18) shows that 119905119904 cannotbe reduced without decreasing 119873119904 or changing the targetfrequency range Nevertheless since the computation of eachorientation is data independent they can be parallelizedThe

12 Journal of Sensors

PDM splitter

BeamformingstagehellipFilter stagehellip Power stage

ormingage Power stage

Subarray 1

Subarray 4

middot middot middotmiddot middot middot PDMsplitter

BeamfostahellipFilter stagehellip

Subarray 1

Subarray 4

middot middot middotmiddot middot middot

FS FP

Figure 12 Clock regions for the time multiplexing of the computation of multiple 119873119900

simultaneous computation of multiple orientations is onlypossible after the beamforming operation Let us define 119905119875IIas the monitoring time before being able to process multipleorientations in parallel Therefore

t_II^P = t_II^CIC + t_II^DC + t_II^FIR + t_II^Delay    (31)

After t_II^P, the delay memories which compose the Filter-and-Sum beamforming stage have already stored enough audio data to start locating the sound-source. Because the beamforming operation relies on delaying the recovered audio signal, multiple orientations can be computed in parallel by accessing the content of the delay memories at a higher speed than the sampling of the input data. It basically multiplexes the output beamforming computations over time. The required frequency F_P to parallelize all N_o for this architecture is defined as follows:

F_P = (F_S · N_o) / D_F    (32)

Due to (1), F_P can also be expressed based on the target frequency range:

F_P ≈ BW · N_o    (33)

Notice that the required frequency to multiplex in time the computation of the orientations does not depend on the number of microphones in the array. Figure 12 shows the clock domains when applying this strategy. While the front-end, consisting of the microphone array and the filter stage, operates at F_S, the output of the beamforming is processed at F_P. The additional cost in terms of resources is the extension of the register for the power-per-angle calculation: a memory of N_o positions is required instead of the single register used to store the accumulated power values. This strategy allows fully parallelizing the computation of all the orientations. Thus, t_P-SRP is mainly limited by N_o and the maximum reachable frequency of the design, since F_S is determined by the microphones' operational frequency and D_F by the frequency range of the target sound-source. In fact, D_F determines how many orientations can be processed in parallel.
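As a quick numerical check of (32) and (33), the following minimal Python sketch (the variable names and values are ours, borrowed from the example configuration analysed later in Section 6.2) computes the multiplexing clock F_P:

    # Required clock to time-multiplex N_o orientations, equations (32)/(33).
    F_S = 2_000_000      # PDM sampling frequency of the microphones (Hz)
    D_F = 64             # total decimation factor of the filter stage
    N_o = 64             # number of steered orientations
    BW = F_S / D_F       # bandwidth of the decimated audio signal (Hz)

    F_P = F_S * N_o / D_F       # equation (32)
    F_P_approx = BW * N_o       # equation (33), identical by construction

    print(F_P, F_P_approx)      # 2.0 MHz: F_P equals F_S whenever N_o == D_F

Since N_o equals D_F in that configuration, F_P collapses to the 2 MHz microphone clock, which is precisely the situation exploited in Section 6.2.3.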

5.3.3. Parallel Time Multiplexing. This proposed strategy is an extension of the previous one. The frequency F_P is limited by the maximum attainable operating frequency of the implementation, which is determined by many factors, from the technology to the available resources on the FPGA. For instance, if F_max equals 30 kHz and the maximum attainable operating frequency is 100 MHz, then up to 1666 orientations could be computed in parallel. However, if not all the resources of the FPGA are completely consumed, especially the internal blocks of memory (BRAM), there is still room for improvement. With the time multiplexing strategy, the memories of the beamforming stage are fully accessed, since in each clock cycle there is at least one memory access, or even two memory accesses when new data is stored. Therefore, more memory resources can be used to further accelerate the computation of the P-SRP. Simply replicating the beamforming stage, preconfigured for different orientations, is enough to double the number of processed orientations while maintaining the same t_P-SRP. The strategy mainly consumes BRAMs. Nevertheless, due to the value of max(Δ_m) at BW for our microphone array, only a few audio samples are needed to complete the beamforming. This fact drastically reduces the memory consumption, which enables the potential computation of thousands of orientations by applying both strategies, as sketched below.
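The capacity estimate above can be reproduced with a short sketch; the replica count used here is only an assumed figure to illustrate the linear scaling, not a synthesis result:

    # Orientations that one time-multiplexed beamforming stage can cover,
    # and the scaling obtained by replicating that stage.
    F_max = 30_000          # highest frequency of interest (Hz)
    BW = 2 * F_max          # bandwidth after decimation (Nyquist)
    F_op = 100_000_000      # attainable operating frequency (Hz)

    per_stage = F_op // BW              # -> 1666 orientations per stage
    replicas = 4                        # assumed number of stages fitting in BRAM
    print(per_stage, per_stage * replicas)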

All strategies can be applied independently, although some will only work properly when combined. Not all strategy combinations are beneficial. For instance, a dynamic angular resolution should only be combined with the time multiplexing of the orientations when F_P is higher than F_S. Otherwise, the reduction of N_o by dynamically readjusting the target orientations does not provide any acceleration and would only degrade the response of the system.

6. Results

The proposed architecture is evaluated in this section. Our analysis starts by evaluating different design solutions based on the timing analysis introduced in Section 5.1. One representative configuration is then evaluated based on the frequency response and accuracy by using the metrics described in Section 3.5. This evaluation also considers sensitive parameters such as the number of active subarrays and the relevance of N_o, already introduced in Section 5.2. The resource and the power consumption for a Zynq 7020 target FPGA are also presented.

Figure 13: Minimum values of t_o based on F_S and F_max (timing in ms versus F_max in kHz and sample rate in MHz). Different perspectives are displayed in the bottom figures. Notice how the shortest t_o is obtained when increasing F_max and F_S.

Finally, the strategies presented in Section 5.3 are applied to the representative design.

6.1. General Performance Analysis. The proposed performance analysis from the previous section is here applied to a concrete example. The explored design parameters are F_S and F_max, keeping N_s and N_o both constant at 64. Whereas F_S is determined by the microphone's sampling frequency, F_max is determined by the target application. For our design space exploration we consider an F_max from 10 kHz to 16 kHz in steps of 125 Hz, and F_S ranges from 1.25 MHz to 3.072 MHz as specified in [10].

Equations (16) to (18) and (20) to (32) are used to obtain t_P-SRP. The performance analysis starts by obtaining D_F for every possible value of F_S and F_max. All possible combinations of D_CIC and D_FIR are considered based on (15). The low-pass FIR filter parameters are N_FIR, which is determined by D_CIC, and F_max as the cut-off frequency. Each possible low-pass FIR filter is generated considering a transition band of 2 kHz and an attenuation of at least 60 dB at the stop band. If the minimum order of the filter is higher than N_FIR, the filter is discarded. We consider these parameters as realistic constraints for low-pass FIR filters. Furthermore, a minimum order of 4 is defined as the threshold for N_FIR. Thus, some values are discarded because D_F is a prime number or N_FIR is below 4. Each low-pass FIR filter is generated and evaluated in Matlab 2016b.
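A rough outline of this enumeration step is sketched below. The original filters were generated in Matlab 2016b; this Python version is only illustrative, the F_S grid is a handful of representative clocks rather than the full sweep, and requiring D_FIR >= 4 is our reading of the decomposition constraint:

    # Enumerate candidate decimation factors of the DSE: for each (F_S, F_max)
    # pair, D_F must be an integer that can be split into D_CIC * D_FIR with
    # D_FIR >= 4 (prime D_F values are therefore discarded). Each surviving
    # candidate would then still have to pass the low-pass FIR constraints
    # (60 dB stop band, 2 kHz transition band), which is not re-checked here.
    import numpy as np

    F_S_candidates = [1.25e6, 2.0e6, 2.4e6, 3.072e6]        # PDM clocks (Hz)
    F_max_candidates = np.arange(10e3, 16e3 + 1, 125.0)      # cut-off candidates (Hz)

    def splits(d_f):
        """All (D_CIC, D_FIR) pairs with D_CIC * D_FIR == d_f and D_FIR >= 4."""
        return [(d_f // d, d) for d in range(4, d_f + 1) if d_f % d == 0]

    candidates = []
    for F_S in F_S_candidates:
        for F_max in F_max_candidates:
            D_F = F_S / (2 * F_max)          # D_F = F_S / BW with BW = 2 * F_max
            if D_F != int(D_F):
                continue                      # non-integer decimation factor
            pairs = splits(int(D_F))
            if pairs:                         # empty list for prime D_F: discarded
                candidates.append((F_S, F_max, int(D_F), pairs))

    print(len(candidates), "candidate (F_S, F_max, D_F) points")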

Figure 13 depicts the minimum timings of the DSE that the proposed Filter-and-Sum architecture needs to compute one orientation. t_o is slightly reduced when varying F_S; for instance, it is reduced from 5.03 ms to 3.97 ms when F_max = 10 kHz. A higher F_S means a faster sampling, which is in fact the operational frequency limiting factor. Furthermore, a higher decrement of t_P-SRP is produced when increasing F_S and F_max. Higher values of F_max allow higher values of D_CIC, which can greatly reduce the computational complexity of narrowband low-pass filtering. However, too high values of D_CIC lead to such low rates that, although a higher order low-pass FIR filter is supported, it cannot satisfy the low-pass filtering specifications. Notice how the number of possible solutions decreases while increasing F_max. Due to the F_S and F_max ranges, the values of D_F vary between 39 and 154, though, as previously explained, many values cannot be considered since they are either prime numbers or the decomposition in factors of D_CIC leads to values below 4. Because higher values of F_max lead to low values of D_CIC for low F_S, these D_CIC values cannot satisfy the specifications of the low-pass FIR filter.

Finally, relatively low values of t_P-SRP are obtained for F_max values from 10 kHz to 10.65 kHz and F_S ranging from 2.7 MHz to 3.072 MHz. This is produced by high values of D_CIC, which means that a higher order low-pass FIR filter is supported. As expected, high values of D_CIC lead to high order low-pass FIR filters and a lower D_FIR. A lower t_P-SRP is possible thanks to avoiding unnecessary computations, since fewer samples are decimated after the low-pass FIR filter.

6.2. Analysis of a Design. As shown in Figure 13, several design considerations drastically affect the final performance.

Figure 14: Waterfall diagrams of the proposed architecture (normalized power output versus angle of arrival and sound-source frequency). The figures are obtained by enabling only a certain number of subarrays. From (a) to (d): only the 4 innermost microphones, only the 12 innermost microphones, the 28 innermost microphones, and all 52 microphones.

Table 3: Configuration of the architecture under analysis.

Parameter   Definition                               Value
F_s         Sampling frequency                       2 MHz
F_min       Minimum frequency                        1 kHz
F_max       Maximum frequency                        15.625 kHz
BW          Minimum bandwidth to satisfy Nyquist     31.25 kHz
D_F         Decimation factor                        64
D_CIC       CIC filter decimation factor             16
N_CIC       Order of the CIC filter                  2
D_FIR       FIR filter decimation factor             4
N_FIR       Order of the FIR filter                  16

However, most of these design decisions do not have a significant impact on the system response compared to other factors, such as the number of active microphones or the number of orientations. The analysis of the impact of these parameters on the system's response and performance is done over one particular design.

Table 3 summarizes the configuration of the architecture. The design considers F_s = 2 MHz, which is the clock for the microphones and the functional frequency of the design. This value of F_s is an intermediate value within the range of clock signals required by the ADMP521 microphones [10].

The selected cut-off frequency is F_max = 15.625 kHz, which leads to D_F = 64. In this example design, N_CIC = 4 with a decimation factor of 16 and a differential delay of 32. The chosen FIR filter has a beta factor of 2.7 and a cut-off frequency of F_max at a sampling rate of 125 kHz, which is the sampling rate obtained after the CIC decimator filter with D_CIC = 16. The filtered signal is then further decimated by a factor D_FIR = 4 to obtain a BW = 31.25 kHz audio signal.
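For reference, the rates of this chain and an equivalent low-pass FIR design can be reproduced with SciPy; the firwin call below is our illustration of a 16th-order Kaiser filter with beta 2.7, not the exact coefficients used on the FPGA:

    # Rates of the example decimation chain of Table 3, plus an equivalent
    # Kaiser-window low-pass FIR designed at the intermediate 125 kHz rate.
    from scipy.signal import firwin

    F_s = 2_000_000          # microphone PDM clock (Hz)
    D_CIC, D_FIR = 16, 4     # decimation factors of the CIC and FIR stages

    f_cic = F_s / D_CIC      # 125 kHz after the CIC decimator
    bw = f_cic / D_FIR       # 31.25 kHz recovered audio bandwidth
    print(f_cic, bw)

    # 16th-order FIR (17 taps), Kaiser window with beta = 2.7, cut-off at F_max.
    taps = firwin(17, 15.625e3, window=("kaiser", 2.7), fs=f_cic)
    print(len(taps))         # -> 17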

The architecture is designed to support a complete steering loop of up to 64 orientations, which represents an angular resolution of 5.625°. On the other hand, the subarray approach allows activating all 52 microphones if the 4 subarrays are active. The final results are obtained by assuming a speed of sound of ≈343.2 m/s.

6.2.1. Frequency Response. The waterfall diagrams of Figure 14 show the power output of the combined subarrays in all directions for all frequencies. In our case, the results are calculated with a single sound-source varying between 100 Hz and 15 kHz in steps of 100 Hz and placed at 180°. All results are normalized per frequency. Every waterfall shows a clear distinctive main lobe. When only subarray 1 is active, there are side lobes at 5.3 kHz and 10.6 kHz which impede the sound-source location for those frequencies. The frequency response of the subarrays improves when they are combined, since their frequency responses are superposed. The combination of subarrays 1 and 2 reaches a minimum detectable frequency of 3.1 kHz, whereas combining subarrays 1, 2, and 3 and combining all subarrays reach 2.1 kHz and 1.6 kHz, respectively.

Figure 15: Directivities (D_P versus sound-source frequency) when considering a variable number of orientations and active microphones. From (a) to (d): D_P with only 8 orientations up to 64 orientations in (d).

These minimum values are clearly depicted in Figure 15 with a threshold of 8 for D_P, which indicates that the main lobe's surface corresponds to maximally half of a quadrant. The frequency response of the combination of subarrays has a strong variation at the main lobe and therefore in D_P. Figure 15 depicts the evolution of D_P when increasing the angular resolution and when combining subarrays. The angular resolution determines the upper bound to which D_P converges, which is dependent on the number of orientations. The number of active microphones, on the other hand, influences how fast D_P converges to its upper limit. Consequently, the number of active microphones determines the minimum frequency which can be located when considering a threshold of 8 for D_P. Alongside the directivity, other metrics such as the main beamwidth and

the MSL levels are also calculated to properly evaluate the quality of the array's response. Figure 16 depicts the MSL when varying the number of active subarrays and the number of orientations. A low angular resolution leads to a lower resolution of the waterfall diagrams, but only the metrics can show the impact. At frequencies between 1 and 3 kHz the main lobe converges to a unit circle, which can be explained by the lack of any side lobe. Higher frequencies present secondary lobes, especially when only the inner subarray is active, which increases the MSL values independently of the angular resolution. A low angular resolution leads to unexpectedly low values of MSL, since the secondary lobes are not detected. On the other hand, a higher number of active microphones leads to lower values of MSL, independently of the angular resolution.

Figure 17 depicts the BW_-3dB metric for a similar analysis of the number of microphones and angular resolution.

Figure 16: Measured MSL (maximum side lobe level in dB versus sound-source frequency) when considering a variable number of orientations and active microphones. From (a) to (d): the MSL with only 8 orientations up to 64 orientations in (d).

On one hand, a higher number of microphones produces a faster decrement of BW_-3dB, reflected as a thinner main lobe. Nevertheless, the BW_-3dB of each subarray converges to a minimum which is only reached at higher frequencies. The angular resolution determines this minimum, which ranges from 90° to 11.25° when 8 or 64 orientations are considered, respectively.

6.2.2. Resource Consumption and Power Analysis. Table 4 summarizes the resource consumption when combining subarrays. The consumed resources are divided into the resources for the filter stage, the beamforming stage, and the total consumption per group of subarrays. The filter stage mostly consumes DSPs, while the beamforming stage mainly demands BRAMs. Most of the resource consumption is dominated by the filter stage, since a filter chain is dedicated

to each MEMS microphone. The number of active subarrays is what determines the resource consumption.

The flexibility of our architecture allows the creation of heterogeneous sound-source locators. Thus, the architecture can be scaled down for small FPGAs based on the target sound-source profile or a particular desirable power consumption. For instance, the combination of the two inner subarrays would use 12 microphones while consuming less than 10% of the available resources. The LUTs are the limiting resource due to the internal registers of the filters; in fact, when all the subarrays are used, around 80% of the available LUTs are required. Nevertheless, any subarray can be disabled at runtime, which directly deactivates its associated filter and beamforming components. Although this does not affect the resource consumption, it has a direct impact on the power consumption. Table 5 shows the power consumption in mW based on the number of active subarrays.

Figure 17: Measured BW_-3dB level (beamwidth at -3 dB in degrees versus sound-source frequency) when considering a variable number of orientations and active microphones. From (a) to (d): the BW_-3dB with only 8 orientations up to 64 orientations in (d).

The power consumption of the microphones is also considered, since the FPGA and the microphone array are powered from the same source. Thus, the overall power consumption must be considered, since the architecture is designed for an embedded system. The MEMS microphones are powered with 3.3 volts, which represents a power consumption per microphone of 2.64 µW and 3.96 mW for the inactive and active microphones, respectively. Notice how the power consumption increases with the number of active subarrays. There is a turning point when 3 or 4 subarrays are active; thus, the microphone array consumes more power than the FPGA when all the subarrays are active.
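The microphone-side arithmetic behind Table 5 can be reproduced directly from these per-microphone figures; the on-chip values below are simply the static-plus-dynamic totals reported by Vivado, entered as constants:

    # Rough reconstruction of the power figures of Table 5 (values in mW).
    P_ACTIVE_MIC = 3.96        # per active ADMP521 microphone at 3.3 V
    P_INACTIVE_MIC = 0.00264   # per inactive microphone (2.64 uW)
    TOTAL_MICS = 52

    on_chip = {4: 122, 12: 125, 28: 132, 52: 138}   # Vivado static + dynamic

    for active in (4, 12, 28, 52):
        array = active * P_ACTIVE_MIC + (TOTAL_MICS - active) * P_INACTIVE_MIC
        print(active, round(array, 2), round(array + on_chip[active], 2))
        # e.g. 52 active -> 205.92 mW for the array, ~343.92 mW overall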

6.2.3. Timing Analysis. The timing analysis from Section 5 applied to the design under evaluation is summarized in Table 6. A complete steering loop requires around 169.6 ms,

while t_o rounds to 2.6 ms. Notice that the initialization (t_II) consumes around 21.5% of the execution time. Fortunately, this initialization can almost be completely removed when applying the first strategy described in Section 5.3.1.

Table 7 summarizes the timing results when applying the first strategies proposed in Section 5. The elimination of the initialization after each orientation's transition slightly reduces t_P-SRP. In this case, t_P-SRP is expressed as follows:

t_P-SRP = t_II + N_o · t_s    (34)

The main improvement is obtained after time multiplexing the computation of the power per orientation. In this case F_P, the operational frequency of the beamforming computation to process all N_o in parallel, equals F_S, as expressed in (32). This is possible because D_F and N_o have the same value. Therefore, there is no need for a different clock for the beamforming operation, since the spacing between output filtered values from the filter stage is large enough.

Table 4: Resource consumption after placement and routing when combining microphone subarrays. Each subarray combination details the resource consumption of the filter and the beamforming stage.

Resources (Available)       Inner 4 MICs               Inner 12 MICs              Inner 28 MICs              All 52 MICs
                            Filters  Beamf.  Total     Filters  Beamf.  Total     Filters  Beamf.  Total     Filters  Beamf.  Total
Slice registers (106400)    5043     626     6144      14859    1540    16882     34489    3195    38183     54042    4447    59093
Slice LUTs (53200)          3612     344     4732      10759    754     12299     25032    1486    27318     37221    2221    42319
LUT-FF (86689)              2329     199     2773      7013     512     7779      16353    1069    17698     23656    1664    27619
BRAM (140)                  0        2       2         0        6       6         0        14      14        0        22      22
DSP48 (220)                 8        4       12        24       4       28        56       4       60        88       4       92

Table 5: Power consumption at F_s = 2 MHz, expressed in mW, when combining microphone subarrays. Values obtained from the Vivado 2016.4 power report.

                   Active MEMS microphones          Reported on-chip power        Total
Subarrays          Active    Inactive   Total       Static   Dynamic   Total      power
Inner 4 MICs       15.84     0.13       15.97       120      2         122        137.97
Inner 12 MICs      47.52     0.11       47.63       120      5         125        172.63
Inner 28 MICs      110.88    0.06       110.94      121      11        132        242.94
All 52 MICs        205.92    0          205.92      122      16        138        343.92

Table 6: Timing analysis, without any optimization, of the design under evaluation. The values are expressed in µs.

Parameter            Definition                                        Value [µs]
t_II^CIC             Initiation interval of the CIC filter             4.5
t_II^DC              Initiation interval of the removed DC block       9
t_II^FIR             Initiation interval of the FIR filter             72
t_II^Delay           Initiation interval of the delay memories         480
t_II^Sum             Initiation interval of the cascaded sums          3.5
t_II^Power           Initiation interval of the power calculation      1
t_II^filters         Initiation interval of the filter stage           85.5
t_II^beamforming     Initiation interval of the beamforming stage      484.5
t_II^power           Initiation interval of the power stage            1
t_II                 Sum of all initiation intervals                   571
t_s                  Sensing time                                      2048
t_o                  Execution time of one orientation                 2650
t_P-SRP              Time required to obtain a polar power map         169600

By combining the first two strategies, t_P-SRP rounds to 2 ms and only the first steering loop needs 2.6 ms due to t_II^P. In this case, t_P-SRP is expressed as follows:

t_P-SRP = t_II^P + t_s ≈ t_s    (35)
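The timings of Table 7 follow directly from the initiation intervals and the sensing time of Table 6; the small sketch below recomputes them (the per-orientation overhead beyond t_II + t_s is folded into the measured t_o):

    # Recompute the Table 7 timings from the Table 6 values (all in microseconds).
    t_II = 571          # sum of all initiation intervals
    t_s = 2048          # sensing time
    t_o = 2650          # measured execution time of one orientation
    t_P_II = 4.5 + 9 + 72 + 480   # initiation before parallel orientations, eq. (31)
    N_o = 64

    initial = N_o * t_o                # no optimization            ~169.6 ms
    continuous = t_II + N_o * t_s      # continuous processing      ~131.6 ms
    time_mux = t_II + t_s              # time multiplexing          ~2.6 ms
    combined_first = t_P_II + t_s      # combined, first loop       ~2.6 ms
    combined_steady = t_s              # combined, steady state     ~2.0 ms

    for us in (initial, continuous, time_mux, combined_first, combined_steady):
        print(round(us / 1000, 2), "ms")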

The other two strategies proposed in Section 5.3 are designed to fully exploit the FPGA resources and to overcome time constraints when considering a high angular resolution. In the first case, since the design under evaluation has a small angular resolution (N_o = 64), there is no need for a higher F_P when applying the time multiplexing strategy. However, a higher angular resolution can be obtained using the unconsumed resources without additional timing cost. Table 8 shows that the combination of strategies increases the angular resolution without additional time penalty. The operational frequency (F_op) determines at what speed the FPGA can operate. Following (33), the beamforming operation can be exploited by increasing F_P up to the maximum frequency, which increases N_o as well:

max(N_o) = max(F_op) / BW = F_P / BW    (36)

Many thousands of orientations can be computed in parallel when combining all strategies, since the beamforming stage can be replicated as many times as the remaining available resources allow. Of course, this estimation is certainly optimistic, since the frequency drops when the resource consumption increases; nevertheless, it provides an upper bound for N_o. For instance, when only the inner subarray is considered, the DSPs are the limiting component, yet up to 53 beamforming stages could theoretically be placed in parallel. When more subarrays are active, the BRAMs are the constraining component. Notice how the number of supported orientations increases if the number of subarrays decreases. This has, however, an impact on the frequency response and the accuracy of the system, as shown in Section 6.2.1. Nevertheless, tens of thousands of orientations can be computed in parallel, consuming only around 2 ms, by operating at the highest F_op and by replicating the beamforming stage to exploit all the available resources, as the sketch below illustrates.
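The upper bounds of Table 8 follow from dividing the maximum operating frequency by BW and multiplying by the number of replicated beamformers; the sketch below reproduces them (small differences with the published figures come from rounding):

    # Upper bound on N_o per Table 8: orientations covered by one time-multiplexed
    # beamformer, times the number of replicas that still fit on the FPGA.
    BW = 31.25e3     # bandwidth of the decimated audio signal (Hz)

    cases = {        # subarrays: (max F_op in MHz, max replicated beamformers)
        "inner 4 MICs": (95.62, 55),
        "inner 12 MICs": (93.27, 23),
        "inner 28 MICs": (91.97, 10),
        "all 52 MICs": (87.91, 6),
    }

    for name, (f_op_mhz, replicas) in cases.items():
        per_stage = int(f_op_mhz * 1e6 / BW)    # one beamforming stage
        print(name, per_stage, per_stage * replicas)
        # e.g. inner 4 MICs -> 3059 orientations, ~168k with 55 parallel stages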


Table 7: Timing analysis of the optimized designs when applying and combining the first two strategies. The values are expressed in ms.

           Initial     Continuous    Time multiplexing    Continuous time multiplexing
t_P-SRP    169.6 ms    131.6 ms      2.6 ms               2 ms

Table 8: Maximum N_o when combining strategies. The maximum number of beamformers is obtained based on the available resources and the resource consumption of each beamformer (Table 4). The maximum F_op is reported by the Vivado 2016.4 tool after placement and routing.

                     Continuous time multiplexing                                Parallel continuous time multiplexing
                     Inner 4 MICs  Inner 12 MICs  Inner 28 MICs  All 52 MICs     Inner 4 MICs  Inner 12 MICs  Inner 28 MICs  All 52 MICs
Max. beamformers     —             —              —              —               55            23             10             6
Max. F_op            95.62 MHz     93.27 MHz      91.97 MHz      87.91 MHz       95.62 MHz     93.27 MHz      91.97 MHz      87.91 MHz
Max. N_o             3059          2984           2943           2813            168292        68650          29430          16879

7. Conclusions

In this paper we have presented a scalable and flexible architecture for fast sound-source localization. On one hand, the architecture can flexibly disable sections of the microphone array that are not needed or that must be disabled to respect power restrictions. The modular approach of the architecture allows scaling the system to a larger or smaller number of microphones. Nevertheless, such capabilities do not impact the frequency response and accuracy of our sound-source locator. On the other hand, several strategies to offer real-time sound-source localization have been presented and evaluated. These strategies not only accelerate the localization but also provide solutions for time-stringent applications with a high angular resolution demand. Thousands of angles can be monitored in parallel, offering high-resolution sound-source localization in a couple of milliseconds.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014-2020 through the ERDF-2020 Project ICITY-RDI.BRU.

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106-5109, IEEE, Dallas, Tex, USA, March 2010.

[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th SENSORS '13 Conference, IEEE, Baltimore, MD, USA, November 2013.

[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.

[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340-345, IEEE, Qinhuangdao, China, September 2015.

[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869-885, 2011.

[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, Hat-type hearing system using MEMS microphone array, 2016.

[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729-2733, IEEE, Brisbane, Australia, April 2015.

[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.

[9] J. Tiete, F. Dominguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918-1949, 2014.

[10] Analog Devices, "ADMP521 datasheet: ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.

[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.

[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.

[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Bruel & Kjaer, Denmark, 2004.

[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [PhD thesis], Brown University, Providence, RI, USA, 2000.

[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157-180, Springer, Berlin, Germany, 2001.

[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465-468, IEEE, Hoboken, NJ, USA, June 2012.

[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS), Stanford University, Stanford, Calif, USA, 2010.

[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155-162, 1981.

[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf7/48fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.

[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.

[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.

[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14-27, 2005.

[23] T. E. Bogale, L. Vandendorpe, and L. L. Bao, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435-441, IEEE, Oulu, Finland, June 2014.


Page 12: Design Considerations When Accelerating an FPGA-Based ...downloads.hindawi.com › journals › js › 2017 › 6782176.pdf · Design Considerations When Accelerating an FPGA-Based

12 Journal of Sensors

PDM splitter

BeamformingstagehellipFilter stagehellip Power stage

ormingage Power stage

Subarray 1

Subarray 4

middot middot middotmiddot middot middot PDMsplitter

BeamfostahellipFilter stagehellip

Subarray 1

Subarray 4

middot middot middotmiddot middot middot

FS FP

Figure 12 Clock regions for the time multiplexing of the computation of multiple 119873119900

simultaneous computation of multiple orientations is onlypossible after the beamforming operation Let us define 119905119875IIas the monitoring time before being able to process multipleorientations in parallel Therefore

119905119875II = 119905CICII + 119905DCII + 119905FIRII + 119905Delay

II (31)

After 119905119875II the delay memories which compose the Filter-and-Sum beamforming stage have already stored enoughaudio data to start locating the sound-source Because thebeamforming operation relies on delaying the recoveredaudio signal multiple orientations can be computed inparallel by accessing the content of the delay memoriesat a higher speed than the sampling of the input data Itbasically multiplexes the output beamforming computationsover time The required frequency 119865119875 to parallelize all 119873119900 forthis architecture is defined as follows

119865119875 = 119865119878 sdot 119873119900119863119865 (32)

Due to (1) 119865119875 can be also expressed based on the targetfrequency range

119865119875 asymp BW sdot 119873119900 (33)

Notice that the required frequency to multiplex in time thecomputation of the orientations does not depend on thenumber of microphones in the array Figure 12 shows theclock domains when applying this strategy While the front-end consisting of the microphone array and the filter stageoperates at 119865119878 the output of the beamforming is processedat 119865119875 The additional cost in terms of resources is theextension of the register for the power per angle calculationA memory of 119873119900 positions is required instead of the singleregister used to store the accumulated power values Thisstrategy allows fully parallelizing the computation of all theorientations Thus 119905P-SRP is mainly limited by 119873119900 and themaximum reachable frequency of the design since 119865119878 isdetermined by the microphonesrsquo operational frequency and119863119865 by the frequency range of the target sound-source In fact119863119865 determines how many orientations can be processed inparallel

533 Parallel Time Multiplexing This proposed strategy isan extension of the previous one The frequency 119865119875 is

limited by the maximum attainable operating frequency ofthe implementation which is determined by many factorsfrom the technology to the available resources on the FPGAFor instance if 119865max equals 30 kHz and the maximumattainable operating frequency is 100MHz then up to 1666orientations could be computed in parallel However if notall the resources of the FPGA are completely consumedespecially the internal blocks of memory (BRAM) thereis still space for improvement With the time multiplexingstrategy the memories of the beamforming stage are fullyaccessed since in each clock cycle there is at least onememory access or even two memory accesses when new datais stored Therefore more memory resources can be used tofurther accelerate the computation of the P-SRP The simplereplication of the beamforming stage preconfigured fordifferent orientations will be enough to double the numberof processed orientations while maintaining the same 119905P-SRPThe strategy mainly consumes BRAMs Nevertheless dueto the value of the max(Δ119898) at BW for our microphonearray only few audio samples are needed to complete thebeamforming This fact drastically reduces the memoryconsumption which provides the potential computation ofthousands of orientations by applying both strategies

All strategies can be applied independently despite thefact that some will only work properly when combinedNot all strategy combinations are beneficial For instance adynamic angular resolution should be only combined withthe time multiplexing of the orientations when 119865119875 is higherthan 119865119878 Otherwise the reduction of 119873119900 by dynamicallyreadjusting the target orientations does not provide anyacceleration and it would only degrade the response of thesystem

6 Results

The proposed architecture is evaluated in this section Ouranalysis starts evaluating different design solutions based onthe timing analysis introduced in Section 51 One repre-sentative configuration is evaluated based on the frequencyresponse and accuracy by using the metrics described in Sec-tion 35 This evaluation also considers sensitive parameterssuch as the number of active subarrays and the relevance of119873119900 already introduced in Section 52 The resource and thepower consumption for a Zynq 7020 target FPGA are also

Journal of Sensors 13

Tim

ing

(ms)

55

5

45

4

35

3

25

Tim

ing

(ms)

555

454

353

25Tim

ing

(ms)

555

454

353

25

Freqmax (kHz)

Freqmax (kHz)

Fmax

(kH

z)11

11

1110

10

10

12

12

12

13

13

13

14

14

14

15

15

15

16

16

16

Sample rate (MHz)

Sample rate (MHz)

12

12

14

14

16

16

18

18

2

2

22

22

24

24

26

26

28

28

3

3

32

32

Sample rate (MHz)12 14 16 18 2 22 24 26 28 3 32

Figure 13 Minimum values of 119905119900 based on 119865119878 and 119865max Different perspectives are displayed in the bottom figures Notice how the shortest 119905119900is obtained when increasing 119865max and 119865119878

presented Finally the strategies presented in Section 53 areapplied for the representative design

61 General Performance Analysis The proposed perfor-mance analysis from the previous section is here applied on aconcrete exampleThe explored design parameters are 119865119878 and119865max keeping 119873119904 and 119873119900 both constant to 64 Whereas 119865119878 isdetermined by the microphonersquos sampling frequency 119865max isdetermined by the target application For our design spaceexploration we consider an 119865max from 10 kHz to 16 kHz insteps of 125Hz and 119865119878 ranges from 125MHz until 3072MHzas specified in [10]

Equations (16) to (18) and (20) to (32) are used to obtain119905P-SRPThe performance analysis starts obtaining119863119865 for everypossible value of 119865119878 and 119865max All possible combinationsof 119863CIC and 119863FIR are considered based on (15) The low-pass FIR filter parameters are 119873FIR which is determined by119863CIC and 119865max as the cut-off frequency Each possible low-pass FIR filter is generated considering a transition band of2 kHz and an attenuation of at least 60 dB at the stop bandIf the minimum order or the filter is higher than 119873FIR thefilter is discarded We consider these parameters as realisticconstraints for low-pass FIR filters Furthermore aminimumorder of 4 is defined as threshold for 119873FIR Thus some valuesare discarded because 119863119865 is a prime number or 119873FIR is below4 Each low-pass FIR filter is generated and evaluated inMatlab 2016b

Figure 13 depicts the minimum timings of the DSE thatthe proposed Filter-and-Sum architecture needs to compute

one orientation 119905119900 is slightly reduced when varying 119865119878 Forinstance it is reduced from 503ms to 397ms when 119865max =10 kHz A higher 119865119878 means a faster sampling which is infact the operational frequency limiting factor Furthermorea higher decrement of 119905P-SRP is produced when increasing119865119878 and 119865max Higher values of 119865max allow higher values of119863CIC which can greatly reduce computational complexity ofnarrowband low-pass filtering However too high values of119863CIC lead to such low rates that although a higher orderlow-pass FIR filter is supported it cannot satisfy the low-passfiltering specifications Notice how the number of possiblesolutions decreases while increasing 119865max Due to 119865119878 and 119865maxranges the values of 119863119865 vary between 39 and 154 Thoughas previously explained many values cannot be consideredsince they are either prime numbers or the decomposition infactors of119863CIC leads to values below 4 Because higher valuesof119865max lead to low values of119863CIC for low119865119878 these119863CIC valuescannot satisfy the specifications of the low-pass FIR filter

Finally relatively low values of 119905P-SRP are obtained for119865maxvalues from 10 kHz to 1065 kHz and119865119878 ranging from27MHzto 3072MHz It is produced by high values of 119863CIC whichmeans that a higher order low-pass FIR filter is supportedAs expected high values of 119863CIC lead to high order low-passFIR filters and lower119863FIR A lower 119905P-SRP is possible thanks toavoiding unnecessary computations since fewer samples aredecimated after the low-pass FIR filter

62 Analysis of a Design As shown in Figure 13 severaldesign considerations drastically affect the final performance

14 Journal of Sensors

4 MICsSo

und-

sour

ce fr

eque

ncy

(kH

z) 14

12

10

8

6

4

2

Angle of arrival (degrees)0 50 100 150 200 250 300

1

08

06

04

02

(a)

12 MICs

Soun

d-so

urce

freq

uenc

y (k

Hz) 14

12

10

8

6

4

2

Angle of arrival (degrees)0

1

08

06

04

02

50 100 150 200 250 300

(b)

28 MICs

Soun

d-so

urce

freq

uenc

y (k

Hz) 14

12

10

8

6

4

2

Angle of arrival (degrees)0 50 100 150 200 250 300

1

08

06

04

02

(c)

52 MICs

Soun

d-so

urce

freq

uenc

y (k

Hz) 14

12

10

8

6

4

2

1

08

06

04

02

Angle of arrival (degrees)0 50 100 150 200 250 300 350

(d)

Figure 14 Waterfall diagrams of the proposed architecture The figures are obtained by enabling only a certain number of subarrays From(a) to (d) only the 4 innermost microphones only the 12 innermost microphones the 28 innermost microphones and all microphones

Table 3 Configuration of the architecture under analysis

Parameter Definition Value119865119904 Sampling frequency 2MHz119865min Minimum frequency 1 kHz119865max Maximum frequency 15625 kHz

BW Minimum bandwidth to satisfyNyquist 3125 kHz

119863119865 Decimation factor 64119863CIC CIC filter decimation factor 16119873CIC Order of the CIC filter 2119863FIR FIR filter decimation factor 4119873FIR Order of the FIR filter 16

However most of these design decisions do not have asignificant impact on the system response compared to otherfactors such as the number of active microphones or thenumber of orientations The analysis of impact of theseparameters on the systemrsquos response and performance is doneover one particular design

Table 3 summarizes the configuration of the architectureThe design considers 119865119904 = 2MHz which is the clock for themicrophones and the functional frequency of the designThisvalue of 119865119904 is the intermediate value between the requiredclock signals of the ADMP521microphones [10]The selected

cut-off frequency is 119865max = 15625 kHz which leads to 119863119865 =64 In this example design 119873CIC = 4 with a decimation factorof 16 and a differential delay of 32 The chosen FIR filter has abeta factor of 27 and a cut-off frequency of119865max at a samplingrate of 125 kHz which is the sampling rate obtained after theCIC decimator filter with a 119863CIC = 16 The filtered signalis then further decimated by a factor 119863FIR = 4 to obtain aBW = 31250 kHz audio signal

The architecture is designed to support a completesteering loop up to 64 orientations which represents anangular resolution of 5625∘ On the other hand the subarrayapproach allows activating the 52 microphones if all the4 subarrays are active The final results are obtained byassuming a speed sound of asymp3432ms

621 Frequency Response The waterfall diagrams of Fig-ure 14 show the power output of the combined subarraysin all directions for all frequencies In our case the resultsare calculated with a single sound-source varying between100Hz and 15 kHz in steps of 100Hz and placed at 180∘All results are normalized per frequency Every waterfallshows a clear distinctive main lobe When only subarray 1is active there are side lobes at 53 kHz and 106 kHz whichimpede the sound-source location for those frequencies Thefrequency response of the subarrays improves when they arecombined since their frequency responses are superposedThe combination of the subarrays 1 and 2 reaches a minimum

Journal of Sensors 15

Sound-source frequency (Hz)

0

10

20

30

40

50

608 orientations

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

103 104

Dp

(a)

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

0

10

20

30

40

50

6016 orientations

Sound-source frequency (Hz)103 104

Dp

(b)

0

10

20

30

40

50

60 32 orientations

Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

103 104

Dp

(c)

0

10

20

30

40

50

60 64 orientations

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

Sound-source frequency (Hz)103 104

Dp

(d)

Figure 15 Directivities when considering a variable number of orientations and active microphones From (a) to (d) 119863119875 with only 8orientations up to 64 orientations on (d)

detectable frequency of 31 kHz when combining subarrays1 2 and 3 and all subarrays reach 21 kHz and 16 kHzrespectively These minimum values are clearly depicted inFigure 15 with a threshold of 8 for 119863119875 which indicatesthat the main lobersquos surface corresponds to maximally halfof a quadrant The frequency response of the combinationof subarrays has a strong variation at the main lobe andtherefore in 119863119875 Figure 15 depicts the evolution of 119863119875when increasing the angular resolution and when combiningsubarrays The angular resolution determines that the upperbound 119863119875 converges which is dependent on the numberof orientations The number of active microphones onthe other hand influences how fast 119863119875 converges to itsupper limit Consequently the number of activemicrophonesdetermines the minimum frequency which can be locatedwhen considering a threshold of 8 for 119863119875 Alongside thedirectivity other metrics such as the main beamwidth and

theMSL levelsmetrics are also calculated to properly evaluatethe quality of the arrayrsquos response Figure 16 depicts the MSLwhen varying the number of active subarrays and the numberof orientations A low angular resolution leads to a lowerresolution of the waterfall diagrams but only the metrics canshow the impact At frequencies between 1 and 3 kHz themain lobe converges to a unit circle which can be explainedby the lack of any side lobe Higher frequencies presentsecondary lobes especially when only the inner subarrayis active which increases the MSL values independently ofthe angular resolution A low angular resolution leads tounexpected low values of MSL since the secondary lobes arenot detected On the other hand a higher number of activemicrophones lead to lower values of MSL independently ofthe angular resolution

Figure 17 depicts the BWminus3 dB metric for a similar analysisof the number of microphones and angular resolution On

16 Journal of Sensors

8 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(a)

16 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(b)

32 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(c)

64 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(d)

Figure 16 Measured MSL when considering a variable number of orientations and active microphones From (a) to (d) the MSL with only8 orientations up to 64 orientations on (d)

one hand a higher number of microphones produce afaster decrement of BWminus3 dB reflected as a thinner mainlobe Nevertheless BWminus3 dB of each subarray converges to aminimum which is only reached at higher frequencies Theangular resolution determines this minimum which rangesfrom 90∘ till 1125∘ when 8 or 64 orientations are consideredrespectively

622 Resource Consumption and Power Analysis Table 4summarizes the resource consumption when combiningsubarrays The consumed resources are divided into theresources for the filter stage the beamforming stage andthe total consumption per groups of subarrays The filterstage mostly consumes DSPs while the beamforming stagemainly demands BRAMs Most of the resource consumptionis dominated by the filter stage since a filter chain is dedicated

to each MEMs microphone What determines the resourceconsumption is the number of active subarrays

The flexibility of our architecture allows the creation ofheterogeneous source-sound locators Thus the architecturecan be scaled for small FPGAs based on the target sound-source profile or a particular desirable power consumptionFor instance the combination of the two inner subarrayswould use 12 microphones while consuming less than 10 ofthe available resources The LUTs are the limiting resourcedue to the internal registers of the filters In fact when allthe subarrays are used around 80 of the available LUTsare required Nevertheless any subarray can be disabled inruntime which directly deactivates its associated filter andbeamforming components Although this does not affectthe resource consumption it has a direct impact over thepower consumption Table 5 shows the power consumption

Journal of Sensors 17

10000 120008000600040002000Sound-source frequency (Hz)

0

50

100

150

200

250

300

350

8 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

Beam

wid

th at

minus3

dBs (

degr

ees)

(a)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

16 orientations

0

50

100

150

200

250

300

350

Beam

wid

th at

minus3

dBs (

degr

ees)

(b)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

32 orientations

0

50

100

150

200

250

300

350

Beam

wid

th at

minus3

dBs (

degr

ees)

(c)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

0

64 orientations

50

100

150

200

250

300

350Be

amw

idth

atminus3

dBs (

degr

ees)

(d)

Figure 17 Measured BWminus3 dB level when considering a variable number of orientations and active microphones From (a) to (d) the BWminus3 dBwith only 8 orientations up to 64 orientations on (d)

in mW based on the number of active subarrays The powerconsumption of the microphones is also considered sincethe FPGA and the microphone array are powered from thesame source Thus the overall power consumption mustbe considered since the architecture is designed for anembedded system The MEMS microphones are poweredwith 33 volts which represents a power consumption permicrophone of 264 120583W and 396mW for the inactive andactive microphones respectively Notice how the powerconsumption increases with the number of active subarraysThere is a turning pointwhen 3 or 4 subarrays are activeThusthe microphone array consumes more power than the FPGAwhen all the subarrays are active

623 Timing Analysis The timing analysis based on Sec-tion 5 of the design under evaluation is summarized inTable 6 A complete steering loop requires around 169ms

while 119905119900 rounds to 26ms Notice that the initialization (119905II)consumes around 215 of the execution time Fortunatelythis initialization can almost be completely removed whenapplying the first strategy described in Section 531

Table 7 summarizes the timing results when applyingthe first strategies proposed in Section 5 The eliminationof the initialization after each orientationrsquos transition slightlyreduces 119905P-SRP In this case 119905P-SRP is expressed as follows

119905P-SRP = 119905II + 119873119900 sdot 119905119904 (34)

The main improvement is obtained after time multiplexingthe computation of the power per orientations In this case119865119875 the operational frequency of the beamforming compu-tation to process all 119873119900 in parallel equals 119865119878 as expressedin (32) This is possible because 119863119865 and 119873119900 have the samevalue Therefore there is no need to have a different clockfor the beamforming operation since the spacing between

18 Journal of Sensors

Table 4 Resource consumption after placement and routing when combining microphone subarrays Each subarray combination details theresource consumption of the filter and the beamforming stage

Resources Available Inner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICsResources Filters Beamforming Total Filters Beamforming Total Filters Beamforming Total Filters Beamforming Total

Sliceregisters 106400 5043 626 6144 14859 1540 16882 34489 3195 38183 54042 4447 59093

Slice LUTs 53200 3612 344 4732 10759 754 12299 25032 1486 27318 37221 2221 42319LUT-FF 86689 2329 199 2773 7013 512 7779 16353 1069 17698 23656 1664 27619BRAM 140 0 2 2 0 6 6 0 14 14 0 22 22DSP48 220 8 4 12 24 4 28 56 4 60 88 4 92

Table 5 Power consumption at 119865119904 = 2MHz expressed in mW when combining microphone subarrays Values obtained from the Vivado20164 power report

Active MEMS microphones Reported on-chip power TotalSubarrays Active Inactive Total Static Dynamic Total PowerInner 4 MICs 1584 013 1597 120 2 122 13797Inner 12 MICs 4752 011 4763 120 5 125 17263Inner 28 MICs 11088 006 11094 121 11 132 24294All 52 MICs 20592 0 20592 122 16 138 34392

Table 6 Timing analysis without any optimization of the designunder evaluation The values are expressed in 120583sParameter Definition Values [120583s]119905CICII Initiation interval of the CIC filter 45

119905DCII

Initiation interval of the removed DCblock 9

119905FIRII Initiation interval of the FIR filter 72119905DelayII Initiation interval of the delay memories 480

119905SumII Initiation interval of the cascaded sums 35

119905PowerIIInitiation interval of the power

calculation 1

119905filtersII Initiation interval of the filter stage 855

119905beamformingII

Initiation interval of the beamformingstage 4845

119905powerII Initiation interval of the power stage 1119905II Sum of all initiation intervals 571119905119904 Sensing time 2048119905119900 Execution time of one orientation 2650

119905P-SRP Time required to obtain a polar powermap 169600

output filtered values from the filter stage is large enough Bycombining the first two strategies 119905P-SRP rounds to 2ms andonly the first steering loop needs 26ms due to 119905119875II In this case119905P-SRP is expressed as follows

119905P-SRP = 119905119875II + 119905119904 asymp 119905119904 (35)

The other two strategies proposed in Section 531 aredesigned to fully exploit the FPGA resources and to overcome

time constraints when considering a high angular resolutionIn the first case since the design under evaluation has a smallangular resolution (119873119900 = 64) there is no need for a higher119865119875 when applying the time multiplexing strategy However ahigher angular resolution can be obtained when consideringthe unconsumed resources without additional timing costTable 8 shows the combination of strategies increases theangular resolution without additional time penaltyThe oper-ational frequency (119865op) determines at what speed the FPGAcan operate By following (33) the beamforming operationcan be exploited by increasing 119865119875 up to the maximumfrequency which increases 119873119900 as well

$\max(N_o) = \dfrac{\max(F_{op})}{BW} = \dfrac{F_P}{BW}. \quad (36)$
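
For instance, taking the maximum post-route frequency reported in Table 8 for the inner subarray ($\max(F_{op}) = 95.62$ MHz) and the bandwidth of the design under evaluation ($BW = 31.25$ kHz), (36) yields

$\max(N_o) = \dfrac{95.62\,\text{MHz}}{31.25\,\text{kHz}} \approx 3059,$

which matches the corresponding entry in the continuous time multiplexing columns of Table 8.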

Many thousands of orientations can be computed in parallel when combining all strategies. The beamforming stage can be replicated as many times as the remaining available resources allow. Of course, this estimation is certainly optimistic, since the frequency drops when the resource consumption increases. Nevertheless, it provides an upper bound for $N_o$. For instance, when only the inner subarray is considered, the DSPs are the limiting component; even so, up to 53 beamforming stages could theoretically be placed in parallel. When more subarrays are active, the BRAMs are the constraining component. Notice how the number of supported orientations increases if the number of subarrays decreases. This has, however, an impact on the frequency response and the accuracy of the system, as shown in Section 6.2.1. Nevertheless, tens of thousands of orientations can be computed in parallel, consuming only around 2 ms, by operating at the highest $F_{op}$ and by replicating the beamforming stage to exploit all the available resources.
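
The following back-of-the-envelope sketch is ours, based only on the per-beamformer costs and device totals of Table 4, the maximum post-route frequencies of Table 8, and $BW = 31.25$ kHz. It reproduces the 53 DSP-limited beamformers mentioned above for the inner subarray and the BRAM-limited replica counts of Table 8 for the larger combinations.

```python
# Back-of-the-envelope upper bound on N_o when replicating the beamforming stage.
BRAM_AVAIL, DSP_AVAIL, BW = 140, 220, 31.25e3

configs = {  # name: (BRAM per beamformer, DSP per beamformer, DSPs used by the filters, max F_op [Hz])
    "Inner 4 MICs":  (2,  4,  8, 95.62e6),
    "Inner 12 MICs": (6,  4, 24, 93.27e6),
    "Inner 28 MICs": (14, 4, 56, 91.97e6),
    "All 52 MICs":   (22, 4, 88, 87.91e6),
}

for name, (bram, dsp, dsp_filters, f_op) in configs.items():
    n_o_single = int(f_op / BW)                      # eq. (36): orientations per beamformer
    by_bram    = BRAM_AVAIL // bram                  # replicas allowed by the BRAMs
    by_dsp     = (DSP_AVAIL - dsp_filters) // dsp    # replicas allowed by the remaining DSPs
    replicas   = min(by_bram, by_dsp)                # the limiting resource decides
    # Note: Table 8 is slightly less conservative for the inner subarray (55 beamformers).
    print(f"{name:14s} {replicas:2d} beamformers x {n_o_single} orientations "
          f"~ {replicas * n_o_single} orientations in ~2 ms")
```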


Table 7: Timing analysis of the optimized designs when applying and combining the first two strategies. The values are expressed in ms.

                     Initial     Continuous   Time multiplexing   Continuous time multiplexing
$t_{\text{P-SRP}}$   169.6 ms    131.6 ms     2.6 ms              2 ms

Table 8: Maximum $N_o$ when combining strategies. The maximum number of beamformers is obtained based on the available resources and the resource consumption of each beamformer (Table 4). The maximum $F_{op}$ is reported by the Vivado 2016.4 tool after placement and routing.

                   Continuous time multiplexing                               Parallel continuous time multiplexing
                   Inner 4 MICs  Inner 12 MICs  Inner 28 MICs  All 52 MICs    Inner 4 MICs  Inner 12 MICs  Inner 28 MICs  All 52 MICs
Max. beamformers   --            --             --             --             55            23             10             6
Max. $F_{op}$      95.62 MHz     93.27 MHz      91.97 MHz      87.91 MHz      95.62 MHz     93.27 MHz      91.97 MHz      87.91 MHz
Max. $N_o$         3059          2984           2943           2813           168292        68650          29430          16879

7. Conclusions

In this paper we have presented a scalable and flexible architecture for fast sound-source localization. On one hand, the architecture can flexibly disable the sections of the microphone array that are not needed, or disable them to respect power restrictions. The modular approach of the architecture allows scaling the system to a larger or smaller number of microphones. Nevertheless, such capabilities do not impact the frequency response or the accuracy of our sound-source locator. On the other hand, several strategies to offer real-time sound-source localization have been presented and evaluated. These strategies not only accelerate the localization but also provide solutions for time-stringent applications with a high angular-resolution demand. Thousands of angles can be monitored in parallel, offering high-resolution sound-source localization in a couple of milliseconds.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014–2020 through the ERDF-2020 Project ICITY-RDIBRU.

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106–5109, IEEE, Dallas, Tex, USA, March 2010.
[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th SENSORS '13 Conference, IEEE, Baltimore, MD, USA, November 2013.
[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.
[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340–345, IEEE, Qinhuangdao, China, September 2015.
[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869–885, 2011.
[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, Hat-type hearing system using MEMS microphone array, 2016.
[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729–2733, IEEE, Brisbane, Australia, April 2015.
[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.
[9] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.
[10] Analog Devices, "ADMP521 datasheet: ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.
[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.
[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.
[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Brüel & Kjær, Denmark, 2004.
[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [Ph.D. thesis], Brown University, Providence, RI, USA, 2000.
[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157–180, Springer, Berlin, Germany, 2001.
[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465–468, IEEE, Hoboken, NJ, USA, June 2012.
[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS '10), Stanford University, Stanford, Calif, USA, 2010.
[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155–162, 1981.
[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf7/48fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.
[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.
[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.
[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14–27, 2005.
[23] T. E. Bogale, L. Vandendorpe, and L. L. Bao, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435–441, IEEE, Oulu, Finland, June 2014.



18 Journal of Sensors

Table 4 Resource consumption after placement and routing when combining microphone subarrays Each subarray combination details theresource consumption of the filter and the beamforming stage

Resources Available Inner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICsResources Filters Beamforming Total Filters Beamforming Total Filters Beamforming Total Filters Beamforming Total

Sliceregisters 106400 5043 626 6144 14859 1540 16882 34489 3195 38183 54042 4447 59093

Slice LUTs 53200 3612 344 4732 10759 754 12299 25032 1486 27318 37221 2221 42319LUT-FF 86689 2329 199 2773 7013 512 7779 16353 1069 17698 23656 1664 27619BRAM 140 0 2 2 0 6 6 0 14 14 0 22 22DSP48 220 8 4 12 24 4 28 56 4 60 88 4 92

Table 5 Power consumption at 119865119904 = 2MHz expressed in mW when combining microphone subarrays Values obtained from the Vivado20164 power report

Active MEMS microphones Reported on-chip power TotalSubarrays Active Inactive Total Static Dynamic Total PowerInner 4 MICs 1584 013 1597 120 2 122 13797Inner 12 MICs 4752 011 4763 120 5 125 17263Inner 28 MICs 11088 006 11094 121 11 132 24294All 52 MICs 20592 0 20592 122 16 138 34392

Table 6 Timing analysis without any optimization of the designunder evaluation The values are expressed in 120583sParameter Definition Values [120583s]119905CICII Initiation interval of the CIC filter 45

119905DCII

Initiation interval of the removed DCblock 9

119905FIRII Initiation interval of the FIR filter 72119905DelayII Initiation interval of the delay memories 480

119905SumII Initiation interval of the cascaded sums 35

119905PowerIIInitiation interval of the power

calculation 1

119905filtersII Initiation interval of the filter stage 855

119905beamformingII

Initiation interval of the beamformingstage 4845

119905powerII Initiation interval of the power stage 1119905II Sum of all initiation intervals 571119905119904 Sensing time 2048119905119900 Execution time of one orientation 2650

119905P-SRP Time required to obtain a polar powermap 169600

output filtered values from the filter stage is large enough Bycombining the first two strategies 119905P-SRP rounds to 2ms andonly the first steering loop needs 26ms due to 119905119875II In this case119905P-SRP is expressed as follows

119905P-SRP = 119905119875II + 119905119904 asymp 119905119904 (35)

The other two strategies proposed in Section 531 aredesigned to fully exploit the FPGA resources and to overcome

time constraints when considering a high angular resolutionIn the first case since the design under evaluation has a smallangular resolution (119873119900 = 64) there is no need for a higher119865119875 when applying the time multiplexing strategy However ahigher angular resolution can be obtained when consideringthe unconsumed resources without additional timing costTable 8 shows the combination of strategies increases theangular resolution without additional time penaltyThe oper-ational frequency (119865op) determines at what speed the FPGAcan operate By following (33) the beamforming operationcan be exploited by increasing 119865119875 up to the maximumfrequency which increases 119873119900 as well

max (119873119900) = max (119865op)BW

= 119865119875BW

(36)

Many thousands of orientations can be computed inparallel when combining all strategies The beamformingstage can be replicated as many times as the remainingavailable resources allow Of course this estimation is cer-tainly optimistic since the frequency drops when the resourceconsumption increases Nevertheless this provides an upperbound for 119873119900 For instance when only the inner subarray isconsidered the DSPs are the limiting component Howeverup to 53 beamforming stages could be theoretically placedin parallel When more subarrays are active the BRAMsare the constrained component Notice how the number ofsupported orientations increases if the number of subarraysdecreases It has however an impact on the frequencyresponse and the accuracy of the system as shown inSection 621 Nevertheless tens of thousands of orientationscan be computed in parallel consuming only around 2msby operating at the highest 119865op and by replicating thebeamforming stage to exploit all the available resources

Journal of Sensors 19

Table 7 Timing analysis of the optimized designs when applying and combining the first two strategies The values are expressed in ms

Initial Continuous Time multiplexing Continuous time multiplexing119905P-SRP 1696ms 1316ms 26ms 2ms

Table 8 Maximum 119873119900 when combining strategies The maximum number of beamformers is obtained based on the available resources andthe resource consumption of each beamformer (Table 4)Themaximum119865op is reported by the Vivado 20164 tool after placement and routing

Continuous time multiplexing Parallel continuous time multiplexingInner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICs Inner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICs

max beamformers mdash mdash mdash mdash 55 23 10 6max119865op 9562MHz 9327MHz 9197MHz 8791MHz 9562MHz 9327MHz 9197MHz 8791MHzmax119873119900 3059 2984 2943 2813 168292 68650 29430 16879

7 Conclusions

In this paper we have presented a scalable and flexiblearchitecture for fast sound-source localization On one handthe architecture can flexibly disable sections of the micro-phone array that are not needed or disable them to respectpower restrictionsThemodular approach of the architectureallows scaling the system for a larger or smaller number ofmicrophones Nevertheless such capabilities do not impactthe frequency and accuracy of our sound-source locator Onthe other hand several strategies to offer real-time sound-source localization have been presented and evaluatedThesestrategies not only accelerate but also provide solutionsfor those time stringent applications with a high angularresolution demandThousands of angles can be monitored inparallel offering a high-resolution sound-source localizationin a couple of milliseconds

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the European Regional Devel-opment Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Pro-gramme 2014ndash2020 through the ERDF-2020 Project ICITY-RDIBRU

References

[1] E Zwyssig M Lincoln and S Renals ldquoA digital microphonearray for distant speech recognitionrdquo in Proceedings of the Inter-national Conference on Acoustics Speech and Signal Processing(ICASSP rsquo10) pp 5106ndash5109 IEEE Dallas Tex USA March2010

[2] A Abdeen and R Laxmi ldquoDesign and performance of areal-time acoustic beamforming systemrdquo in Proceedings of the12th SENSORS rsquo13 Conference IEEE Baltimore MD USANovember 2013

[3] B da Silva L Segers A Braeken and A Touhafi ldquoRuntimereconfigurable beamforming architecture for real-time sound-source localizationrdquo in Proceedings of the 26th InternationalConference on Field-Programmable Logic and Applications (FPLrsquo16) IEEE Lausanne Switzerland September 2016

[4] Y Zhang and S Baobin ldquoSound source localization algorithmbased onwearable acoustic counter-sniper systemsrdquo in Proceed-ings of the 5th International Conference on Instrumentation andMeasurement Computer Communication and Control IMCCCrsquo15 pp 340ndash345 IEEE Qinhuangdao China September 2015

[5] J Sallai W Hedgecock P Volgyesi A Nadas G Balogh and ALedeczi ldquoWeapon classification and shooter localization usingdistributed multichannel acoustic sensorsrdquo Journal of SystemsArchitecture vol 57 no 10 pp 869ndash885 2011

[6] T Inoue R Imai Y Ikeda and Y Oikawa Hat-type hearingsystem using MEMS microphone array 2016

[7] Z I Skordilis A Tsiami P Maragos G Potamianos L Spel-gatti andR Sannino ldquoMultichannel speech enhancement usingMEMS microphonesrdquo in Proceedings of the 40th InternationalConference on Acoustics Speech and Signal Processing ICASSPrsquo15 pp 2729ndash2733 IEEE Brisbane Australia April 2014

[8] I Salom V Celebic M Milanovic D Todorovic and J PrezeljldquoAn implementation of beamforming algorithm on FPGAplatform with digital microphone arrayrdquo in Proceedings of the138th Audio Engineering Society Convention AES rsquo15 AudioEngineering Society New York Ny USA May 2015

[9] J Tiete F Domınguez B da Silva L Segers K Steenhaut andA Touhafi ldquoSoundCompass a distributed MEMS microphonearray-based sensor for sound source localizationrdquo Sensors vol14 no 2 pp 1918ndash1949 2014

[10] Analog Devices ldquoADMP521 datasheetUltralow noise micro-phone with bottom Port and PDM digital outputrdquo TechnicalReport Analog Devices Norwood MA USA 2012

[11] Texas Instruments ldquoHow delta-sigma ADCs workrdquo Tehcni-cal report Texas Intruments httpwwwticomlitanslyt423slyt423pdf

[12] D H Johnson and D E Dudgeon Array Signal ProcessingConcepts and Techniques Simon amp Schuster New York NYUSA 1992

[13] J J Christensen and J Hald ldquoTechnical Review BeamformingrdquoTech Rep Bruel amp Kjear Danmark 2004

[14] J H DiBiase A high-accuracy low-latency technique for talkerlocalization in reverberant environments using microphone

20 Journal of Sensors

arrays [Phd thesis] Brown University Providence RI USA2000

[15] J H DiBiase H F Silverman and M S Brandstein ldquoRobustlocalization in reverberant roomsrdquo in Microphone Arrays pp157ndash180 Springer Berlin Germany 2001

[16] M J Taghizadeh P N Garner and H Bourlard ldquoMicrophonearray beampattern characterization for hands-free speech appli-cationsrdquo in Proceedings of the 7th Sensor Array andMultichannelSignal Processing Workshop SAM rsquo12 pp 465ndash468 IEEEHoboken NJ USA June 2012

[17] H Blasinski F Amiel and E Thomas ldquoImpact of differentpower reduction techniques at architectural level on modernFPGAsrdquo in Proceedings of the Latin American Symposium onCircuits and Systems LASCAS Stanford University StanfordCalif USA 2010

[18] E Hogenauer ldquoAn economical class of digital filters for decima-tion and interpolationrdquo IEEE Transactions on Acoustics Speechand Signal Processing vol 29 no 2 pp 155ndash162 1981

[19] MPDonadioldquoCICfilter introductionrdquo2000 httpspdfsseman-ticscholarorg5bf748fbdeb1ff68a2407c0ccfd58b816e9937d5pdf

[20] N Hegde ldquoSeamlessly interfacing MEMs microphones withblackfin processorsrdquo EE-350 Engineer-to-Engineer Note 2010

[21] G J Dolecek and J Diaz-Carmona On Design of CIC Decima-tors INTECH Open Access Publisher 2011

[22] R Lyons ldquoUnderstanding cascaded integrator-comb filtersrdquoEmbed System Program vol 18 no 4 pp 14ndash27 2005

[23] T E Bogale L Vandendorpe and L L Bao ldquoSensing through-put tradeoff for cognitive radio networks with noise varianceuncertaintyrdquo in Proceedings of the 9th International Conferenceon Cognitive Radio Oriented Wireless Networks CROWNCOMrsquo14 pp 435ndash441 IEEE Oulu Finland June 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 14: Design Considerations When Accelerating an FPGA-Based ...downloads.hindawi.com › journals › js › 2017 › 6782176.pdf · Design Considerations When Accelerating an FPGA-Based

14 Journal of Sensors

4 MICsSo

und-

sour

ce fr

eque

ncy

(kH

z) 14

12

10

8

6

4

2

Angle of arrival (degrees)0 50 100 150 200 250 300

1

08

06

04

02

(a)

12 MICs

Soun

d-so

urce

freq

uenc

y (k

Hz) 14

12

10

8

6

4

2

Angle of arrival (degrees)0

1

08

06

04

02

50 100 150 200 250 300

(b)

28 MICs

Soun

d-so

urce

freq

uenc

y (k

Hz) 14

12

10

8

6

4

2

Angle of arrival (degrees)0 50 100 150 200 250 300

1

08

06

04

02

(c)

52 MICs

Soun

d-so

urce

freq

uenc

y (k

Hz) 14

12

10

8

6

4

2

1

08

06

04

02

Angle of arrival (degrees)0 50 100 150 200 250 300 350

(d)

Figure 14 Waterfall diagrams of the proposed architecture The figures are obtained by enabling only a certain number of subarrays From(a) to (d) only the 4 innermost microphones only the 12 innermost microphones the 28 innermost microphones and all microphones

Table 3 Configuration of the architecture under analysis

Parameter Definition Value119865119904 Sampling frequency 2MHz119865min Minimum frequency 1 kHz119865max Maximum frequency 15625 kHz

BW Minimum bandwidth to satisfyNyquist 3125 kHz

119863119865 Decimation factor 64119863CIC CIC filter decimation factor 16119873CIC Order of the CIC filter 2119863FIR FIR filter decimation factor 4119873FIR Order of the FIR filter 16

However most of these design decisions do not have asignificant impact on the system response compared to otherfactors such as the number of active microphones or thenumber of orientations The analysis of impact of theseparameters on the systemrsquos response and performance is doneover one particular design

Table 3 summarizes the configuration of the architectureThe design considers 119865119904 = 2MHz which is the clock for themicrophones and the functional frequency of the designThisvalue of 119865119904 is the intermediate value between the requiredclock signals of the ADMP521microphones [10]The selected

cut-off frequency is 119865max = 15625 kHz which leads to 119863119865 =64 In this example design 119873CIC = 4 with a decimation factorof 16 and a differential delay of 32 The chosen FIR filter has abeta factor of 27 and a cut-off frequency of119865max at a samplingrate of 125 kHz which is the sampling rate obtained after theCIC decimator filter with a 119863CIC = 16 The filtered signalis then further decimated by a factor 119863FIR = 4 to obtain aBW = 31250 kHz audio signal

The architecture is designed to support a completesteering loop up to 64 orientations which represents anangular resolution of 5625∘ On the other hand the subarrayapproach allows activating the 52 microphones if all the4 subarrays are active The final results are obtained byassuming a speed sound of asymp3432ms

621 Frequency Response The waterfall diagrams of Fig-ure 14 show the power output of the combined subarraysin all directions for all frequencies In our case the resultsare calculated with a single sound-source varying between100Hz and 15 kHz in steps of 100Hz and placed at 180∘All results are normalized per frequency Every waterfallshows a clear distinctive main lobe When only subarray 1is active there are side lobes at 53 kHz and 106 kHz whichimpede the sound-source location for those frequencies Thefrequency response of the subarrays improves when they arecombined since their frequency responses are superposedThe combination of the subarrays 1 and 2 reaches a minimum

Journal of Sensors 15

Sound-source frequency (Hz)

0

10

20

30

40

50

608 orientations

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

103 104

Dp

(a)

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

0

10

20

30

40

50

6016 orientations

Sound-source frequency (Hz)103 104

Dp

(b)

0

10

20

30

40

50

60 32 orientations

Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

103 104

Dp

(c)

0

10

20

30

40

50

60 64 orientations

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

Sound-source frequency (Hz)103 104

Dp

(d)

Figure 15 Directivities when considering a variable number of orientations and active microphones From (a) to (d) 119863119875 with only 8orientations up to 64 orientations on (d)

detectable frequency of 31 kHz when combining subarrays1 2 and 3 and all subarrays reach 21 kHz and 16 kHzrespectively These minimum values are clearly depicted inFigure 15 with a threshold of 8 for 119863119875 which indicatesthat the main lobersquos surface corresponds to maximally halfof a quadrant The frequency response of the combinationof subarrays has a strong variation at the main lobe andtherefore in 119863119875 Figure 15 depicts the evolution of 119863119875when increasing the angular resolution and when combiningsubarrays The angular resolution determines that the upperbound 119863119875 converges which is dependent on the numberof orientations The number of active microphones onthe other hand influences how fast 119863119875 converges to itsupper limit Consequently the number of activemicrophonesdetermines the minimum frequency which can be locatedwhen considering a threshold of 8 for 119863119875 Alongside thedirectivity other metrics such as the main beamwidth and

theMSL levelsmetrics are also calculated to properly evaluatethe quality of the arrayrsquos response Figure 16 depicts the MSLwhen varying the number of active subarrays and the numberof orientations A low angular resolution leads to a lowerresolution of the waterfall diagrams but only the metrics canshow the impact At frequencies between 1 and 3 kHz themain lobe converges to a unit circle which can be explainedby the lack of any side lobe Higher frequencies presentsecondary lobes especially when only the inner subarrayis active which increases the MSL values independently ofthe angular resolution A low angular resolution leads tounexpected low values of MSL since the secondary lobes arenot detected On the other hand a higher number of activemicrophones lead to lower values of MSL independently ofthe angular resolution

Figure 17 depicts the BWminus3 dB metric for a similar analysisof the number of microphones and angular resolution On

16 Journal of Sensors

8 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(a)

16 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(b)

32 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(c)

64 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(d)

Figure 16 Measured MSL when considering a variable number of orientations and active microphones From (a) to (d) the MSL with only8 orientations up to 64 orientations on (d)

one hand a higher number of microphones produce afaster decrement of BWminus3 dB reflected as a thinner mainlobe Nevertheless BWminus3 dB of each subarray converges to aminimum which is only reached at higher frequencies Theangular resolution determines this minimum which rangesfrom 90∘ till 1125∘ when 8 or 64 orientations are consideredrespectively

622 Resource Consumption and Power Analysis Table 4summarizes the resource consumption when combiningsubarrays The consumed resources are divided into theresources for the filter stage the beamforming stage andthe total consumption per groups of subarrays The filterstage mostly consumes DSPs while the beamforming stagemainly demands BRAMs Most of the resource consumptionis dominated by the filter stage since a filter chain is dedicated

to each MEMs microphone What determines the resourceconsumption is the number of active subarrays

The flexibility of our architecture allows the creation ofheterogeneous source-sound locators Thus the architecturecan be scaled for small FPGAs based on the target sound-source profile or a particular desirable power consumptionFor instance the combination of the two inner subarrayswould use 12 microphones while consuming less than 10 ofthe available resources The LUTs are the limiting resourcedue to the internal registers of the filters In fact when allthe subarrays are used around 80 of the available LUTsare required Nevertheless any subarray can be disabled inruntime which directly deactivates its associated filter andbeamforming components Although this does not affectthe resource consumption it has a direct impact over thepower consumption Table 5 shows the power consumption

Journal of Sensors 17

10000 120008000600040002000Sound-source frequency (Hz)

0

50

100

150

200

250

300

350

8 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

Beam

wid

th at

minus3

dBs (

degr

ees)

(a)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

16 orientations

0

50

100

150

200

250

300

350

Beam

wid

th at

minus3

dBs (

degr

ees)

(b)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

32 orientations

0

50

100

150

200

250

300

350

Beam

wid

th at

minus3

dBs (

degr

ees)

(c)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

0

64 orientations

50

100

150

200

250

300

350Be

amw

idth

atminus3

dBs (

degr

ees)

(d)

Figure 17 Measured BWminus3 dB level when considering a variable number of orientations and active microphones From (a) to (d) the BWminus3 dBwith only 8 orientations up to 64 orientations on (d)

in mW based on the number of active subarrays The powerconsumption of the microphones is also considered sincethe FPGA and the microphone array are powered from thesame source Thus the overall power consumption mustbe considered since the architecture is designed for anembedded system The MEMS microphones are poweredwith 33 volts which represents a power consumption permicrophone of 264 120583W and 396mW for the inactive andactive microphones respectively Notice how the powerconsumption increases with the number of active subarraysThere is a turning pointwhen 3 or 4 subarrays are activeThusthe microphone array consumes more power than the FPGAwhen all the subarrays are active

623 Timing Analysis The timing analysis based on Sec-tion 5 of the design under evaluation is summarized inTable 6 A complete steering loop requires around 169ms

while 119905119900 rounds to 26ms Notice that the initialization (119905II)consumes around 215 of the execution time Fortunatelythis initialization can almost be completely removed whenapplying the first strategy described in Section 531

Table 7 summarizes the timing results when applyingthe first strategies proposed in Section 5 The eliminationof the initialization after each orientationrsquos transition slightlyreduces 119905P-SRP In this case 119905P-SRP is expressed as follows

119905P-SRP = 119905II + 119873119900 sdot 119905119904 (34)

The main improvement is obtained after time multiplexingthe computation of the power per orientations In this case119865119875 the operational frequency of the beamforming compu-tation to process all 119873119900 in parallel equals 119865119878 as expressedin (32) This is possible because 119863119865 and 119873119900 have the samevalue Therefore there is no need to have a different clockfor the beamforming operation since the spacing between

18 Journal of Sensors

Table 4 Resource consumption after placement and routing when combining microphone subarrays Each subarray combination details theresource consumption of the filter and the beamforming stage

Resources Available Inner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICsResources Filters Beamforming Total Filters Beamforming Total Filters Beamforming Total Filters Beamforming Total

Sliceregisters 106400 5043 626 6144 14859 1540 16882 34489 3195 38183 54042 4447 59093

Slice LUTs 53200 3612 344 4732 10759 754 12299 25032 1486 27318 37221 2221 42319LUT-FF 86689 2329 199 2773 7013 512 7779 16353 1069 17698 23656 1664 27619BRAM 140 0 2 2 0 6 6 0 14 14 0 22 22DSP48 220 8 4 12 24 4 28 56 4 60 88 4 92

Table 5 Power consumption at 119865119904 = 2MHz expressed in mW when combining microphone subarrays Values obtained from the Vivado20164 power report

Active MEMS microphones Reported on-chip power TotalSubarrays Active Inactive Total Static Dynamic Total PowerInner 4 MICs 1584 013 1597 120 2 122 13797Inner 12 MICs 4752 011 4763 120 5 125 17263Inner 28 MICs 11088 006 11094 121 11 132 24294All 52 MICs 20592 0 20592 122 16 138 34392

Table 6 Timing analysis without any optimization of the designunder evaluation The values are expressed in 120583sParameter Definition Values [120583s]119905CICII Initiation interval of the CIC filter 45

119905DCII

Initiation interval of the removed DCblock 9

119905FIRII Initiation interval of the FIR filter 72119905DelayII Initiation interval of the delay memories 480

119905SumII Initiation interval of the cascaded sums 35

119905PowerIIInitiation interval of the power

calculation 1

119905filtersII Initiation interval of the filter stage 855

119905beamformingII

Initiation interval of the beamformingstage 4845

119905powerII Initiation interval of the power stage 1119905II Sum of all initiation intervals 571119905119904 Sensing time 2048119905119900 Execution time of one orientation 2650

119905P-SRP Time required to obtain a polar powermap 169600

output filtered values from the filter stage is large enough Bycombining the first two strategies 119905P-SRP rounds to 2ms andonly the first steering loop needs 26ms due to 119905119875II In this case119905P-SRP is expressed as follows

119905P-SRP = 119905119875II + 119905119904 asymp 119905119904 (35)

The other two strategies proposed in Section 531 aredesigned to fully exploit the FPGA resources and to overcome

time constraints when considering a high angular resolutionIn the first case since the design under evaluation has a smallangular resolution (119873119900 = 64) there is no need for a higher119865119875 when applying the time multiplexing strategy However ahigher angular resolution can be obtained when consideringthe unconsumed resources without additional timing costTable 8 shows the combination of strategies increases theangular resolution without additional time penaltyThe oper-ational frequency (119865op) determines at what speed the FPGAcan operate By following (33) the beamforming operationcan be exploited by increasing 119865119875 up to the maximumfrequency which increases 119873119900 as well

max (119873119900) = max (119865op)BW

= 119865119875BW

(36)

Many thousands of orientations can be computed inparallel when combining all strategies The beamformingstage can be replicated as many times as the remainingavailable resources allow Of course this estimation is cer-tainly optimistic since the frequency drops when the resourceconsumption increases Nevertheless this provides an upperbound for 119873119900 For instance when only the inner subarray isconsidered the DSPs are the limiting component Howeverup to 53 beamforming stages could be theoretically placedin parallel When more subarrays are active the BRAMsare the constrained component Notice how the number ofsupported orientations increases if the number of subarraysdecreases It has however an impact on the frequencyresponse and the accuracy of the system as shown inSection 621 Nevertheless tens of thousands of orientationscan be computed in parallel consuming only around 2msby operating at the highest 119865op and by replicating thebeamforming stage to exploit all the available resources

Journal of Sensors 19

Table 7 Timing analysis of the optimized designs when applying and combining the first two strategies The values are expressed in ms

Initial Continuous Time multiplexing Continuous time multiplexing119905P-SRP 1696ms 1316ms 26ms 2ms

Table 8 Maximum 119873119900 when combining strategies The maximum number of beamformers is obtained based on the available resources andthe resource consumption of each beamformer (Table 4)Themaximum119865op is reported by the Vivado 20164 tool after placement and routing

Continuous time multiplexing Parallel continuous time multiplexingInner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICs Inner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICs

max beamformers mdash mdash mdash mdash 55 23 10 6max119865op 9562MHz 9327MHz 9197MHz 8791MHz 9562MHz 9327MHz 9197MHz 8791MHzmax119873119900 3059 2984 2943 2813 168292 68650 29430 16879

7 Conclusions

In this paper we have presented a scalable and flexiblearchitecture for fast sound-source localization On one handthe architecture can flexibly disable sections of the micro-phone array that are not needed or disable them to respectpower restrictionsThemodular approach of the architectureallows scaling the system for a larger or smaller number ofmicrophones Nevertheless such capabilities do not impactthe frequency and accuracy of our sound-source locator Onthe other hand several strategies to offer real-time sound-source localization have been presented and evaluatedThesestrategies not only accelerate but also provide solutionsfor those time stringent applications with a high angularresolution demandThousands of angles can be monitored inparallel offering a high-resolution sound-source localizationin a couple of milliseconds

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014–2020 through the ERDF-2020 Project ICITY-RDI.BRU.

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106–5109, IEEE, Dallas, Tex, USA, March 2010.

[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th SENSORS '13 Conference, IEEE, Baltimore, MD, USA, November 2013.

[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.

[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340–345, IEEE, Qinhuangdao, China, September 2015.

[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869–885, 2011.

[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, Hat-type hearing system using MEMS microphone array, 2016.

[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729–2733, IEEE, Brisbane, Australia, April 2015.

[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.

[9] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.

[10] Analog Devices, "ADMP521 datasheet: ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.

[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.

[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.

[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Brüel & Kjær, Denmark, 2004.

[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [Ph.D. thesis], Brown University, Providence, RI, USA, 2000.

[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157–180, Springer, Berlin, Germany, 2001.

[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465–468, IEEE, Hoboken, NJ, USA, June 2012.

[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS), Stanford University, Stanford, Calif, USA, 2010.

[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155–162, 1981.

[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf7/48fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.

[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.

[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.

[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14–27, 2005.

[23] T. E. Bogale, L. Vandendorpe, and L. B. Le, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435–441, IEEE, Oulu, Finland, June 2014.

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 15: Design Considerations When Accelerating an FPGA-Based ...downloads.hindawi.com › journals › js › 2017 › 6782176.pdf · Design Considerations When Accelerating an FPGA-Based

Journal of Sensors 15

Sound-source frequency (Hz)

0

10

20

30

40

50

608 orientations

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

103 104

Dp

(a)

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

0

10

20

30

40

50

6016 orientations

Sound-source frequency (Hz)103 104

Dp

(b)

0

10

20

30

40

50

60 32 orientations

Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

103 104

Dp

(c)

0

10

20

30

40

50

60 64 orientations

Inner 4 MICsInner 12 MICsInner 28 MICs

All 52 MICsThreshold

Sound-source frequency (Hz)103 104

Dp

(d)

Figure 15 Directivities when considering a variable number of orientations and active microphones From (a) to (d) 119863119875 with only 8orientations up to 64 orientations on (d)

detectable frequency of 31 kHz when combining subarrays1 2 and 3 and all subarrays reach 21 kHz and 16 kHzrespectively These minimum values are clearly depicted inFigure 15 with a threshold of 8 for 119863119875 which indicatesthat the main lobersquos surface corresponds to maximally halfof a quadrant The frequency response of the combinationof subarrays has a strong variation at the main lobe andtherefore in 119863119875 Figure 15 depicts the evolution of 119863119875when increasing the angular resolution and when combiningsubarrays The angular resolution determines that the upperbound 119863119875 converges which is dependent on the numberof orientations The number of active microphones onthe other hand influences how fast 119863119875 converges to itsupper limit Consequently the number of activemicrophonesdetermines the minimum frequency which can be locatedwhen considering a threshold of 8 for 119863119875 Alongside thedirectivity other metrics such as the main beamwidth and

theMSL levelsmetrics are also calculated to properly evaluatethe quality of the arrayrsquos response Figure 16 depicts the MSLwhen varying the number of active subarrays and the numberof orientations A low angular resolution leads to a lowerresolution of the waterfall diagrams but only the metrics canshow the impact At frequencies between 1 and 3 kHz themain lobe converges to a unit circle which can be explainedby the lack of any side lobe Higher frequencies presentsecondary lobes especially when only the inner subarrayis active which increases the MSL values independently ofthe angular resolution A low angular resolution leads tounexpected low values of MSL since the secondary lobes arenot detected On the other hand a higher number of activemicrophones lead to lower values of MSL independently ofthe angular resolution

Figure 17 depicts the BWminus3 dB metric for a similar analysisof the number of microphones and angular resolution On

16 Journal of Sensors

8 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(a)

16 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(b)

32 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(c)

64 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(d)

Figure 16 Measured MSL when considering a variable number of orientations and active microphones From (a) to (d) the MSL with only8 orientations up to 64 orientations on (d)

one hand a higher number of microphones produce afaster decrement of BWminus3 dB reflected as a thinner mainlobe Nevertheless BWminus3 dB of each subarray converges to aminimum which is only reached at higher frequencies Theangular resolution determines this minimum which rangesfrom 90∘ till 1125∘ when 8 or 64 orientations are consideredrespectively

622 Resource Consumption and Power Analysis Table 4summarizes the resource consumption when combiningsubarrays The consumed resources are divided into theresources for the filter stage the beamforming stage andthe total consumption per groups of subarrays The filterstage mostly consumes DSPs while the beamforming stagemainly demands BRAMs Most of the resource consumptionis dominated by the filter stage since a filter chain is dedicated

to each MEMs microphone What determines the resourceconsumption is the number of active subarrays

The flexibility of our architecture allows the creation ofheterogeneous source-sound locators Thus the architecturecan be scaled for small FPGAs based on the target sound-source profile or a particular desirable power consumptionFor instance the combination of the two inner subarrayswould use 12 microphones while consuming less than 10 ofthe available resources The LUTs are the limiting resourcedue to the internal registers of the filters In fact when allthe subarrays are used around 80 of the available LUTsare required Nevertheless any subarray can be disabled inruntime which directly deactivates its associated filter andbeamforming components Although this does not affectthe resource consumption it has a direct impact over thepower consumption Table 5 shows the power consumption

Journal of Sensors 17

10000 120008000600040002000Sound-source frequency (Hz)

0

50

100

150

200

250

300

350

8 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

Beam

wid

th at

minus3

dBs (

degr

ees)

(a)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

16 orientations

0

50

100

150

200

250

300

350

Beam

wid

th at

minus3

dBs (

degr

ees)

(b)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

32 orientations

0

50

100

150

200

250

300

350

Beam

wid

th at

minus3

dBs (

degr

ees)

(c)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

0

64 orientations

50

100

150

200

250

300

350Be

amw

idth

atminus3

dBs (

degr

ees)

(d)

Figure 17 Measured BWminus3 dB level when considering a variable number of orientations and active microphones From (a) to (d) the BWminus3 dBwith only 8 orientations up to 64 orientations on (d)

in mW based on the number of active subarrays The powerconsumption of the microphones is also considered sincethe FPGA and the microphone array are powered from thesame source Thus the overall power consumption mustbe considered since the architecture is designed for anembedded system The MEMS microphones are poweredwith 33 volts which represents a power consumption permicrophone of 264 120583W and 396mW for the inactive andactive microphones respectively Notice how the powerconsumption increases with the number of active subarraysThere is a turning pointwhen 3 or 4 subarrays are activeThusthe microphone array consumes more power than the FPGAwhen all the subarrays are active

623 Timing Analysis The timing analysis based on Sec-tion 5 of the design under evaluation is summarized inTable 6 A complete steering loop requires around 169ms

while 119905119900 rounds to 26ms Notice that the initialization (119905II)consumes around 215 of the execution time Fortunatelythis initialization can almost be completely removed whenapplying the first strategy described in Section 531

Table 7 summarizes the timing results when applyingthe first strategies proposed in Section 5 The eliminationof the initialization after each orientationrsquos transition slightlyreduces 119905P-SRP In this case 119905P-SRP is expressed as follows

119905P-SRP = 119905II + 119873119900 sdot 119905119904 (34)

The main improvement is obtained after time multiplexingthe computation of the power per orientations In this case119865119875 the operational frequency of the beamforming compu-tation to process all 119873119900 in parallel equals 119865119878 as expressedin (32) This is possible because 119863119865 and 119873119900 have the samevalue Therefore there is no need to have a different clockfor the beamforming operation since the spacing between

18 Journal of Sensors

Table 4 Resource consumption after placement and routing when combining microphone subarrays Each subarray combination details theresource consumption of the filter and the beamforming stage

Resources Available Inner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICsResources Filters Beamforming Total Filters Beamforming Total Filters Beamforming Total Filters Beamforming Total

Sliceregisters 106400 5043 626 6144 14859 1540 16882 34489 3195 38183 54042 4447 59093

Slice LUTs 53200 3612 344 4732 10759 754 12299 25032 1486 27318 37221 2221 42319LUT-FF 86689 2329 199 2773 7013 512 7779 16353 1069 17698 23656 1664 27619BRAM 140 0 2 2 0 6 6 0 14 14 0 22 22DSP48 220 8 4 12 24 4 28 56 4 60 88 4 92

Table 5 Power consumption at 119865119904 = 2MHz expressed in mW when combining microphone subarrays Values obtained from the Vivado20164 power report

Active MEMS microphones Reported on-chip power TotalSubarrays Active Inactive Total Static Dynamic Total PowerInner 4 MICs 1584 013 1597 120 2 122 13797Inner 12 MICs 4752 011 4763 120 5 125 17263Inner 28 MICs 11088 006 11094 121 11 132 24294All 52 MICs 20592 0 20592 122 16 138 34392

Table 6 Timing analysis without any optimization of the designunder evaluation The values are expressed in 120583sParameter Definition Values [120583s]119905CICII Initiation interval of the CIC filter 45

119905DCII

Initiation interval of the removed DCblock 9

119905FIRII Initiation interval of the FIR filter 72119905DelayII Initiation interval of the delay memories 480

119905SumII Initiation interval of the cascaded sums 35

119905PowerIIInitiation interval of the power

calculation 1

119905filtersII Initiation interval of the filter stage 855

119905beamformingII

Initiation interval of the beamformingstage 4845

119905powerII Initiation interval of the power stage 1119905II Sum of all initiation intervals 571119905119904 Sensing time 2048119905119900 Execution time of one orientation 2650

119905P-SRP Time required to obtain a polar powermap 169600

output filtered values from the filter stage is large enough Bycombining the first two strategies 119905P-SRP rounds to 2ms andonly the first steering loop needs 26ms due to 119905119875II In this case119905P-SRP is expressed as follows

119905P-SRP = 119905119875II + 119905119904 asymp 119905119904 (35)

The other two strategies proposed in Section 531 aredesigned to fully exploit the FPGA resources and to overcome

time constraints when considering a high angular resolutionIn the first case since the design under evaluation has a smallangular resolution (119873119900 = 64) there is no need for a higher119865119875 when applying the time multiplexing strategy However ahigher angular resolution can be obtained when consideringthe unconsumed resources without additional timing costTable 8 shows the combination of strategies increases theangular resolution without additional time penaltyThe oper-ational frequency (119865op) determines at what speed the FPGAcan operate By following (33) the beamforming operationcan be exploited by increasing 119865119875 up to the maximumfrequency which increases 119873119900 as well

max (119873119900) = max (119865op)BW

= 119865119875BW

(36)

Many thousands of orientations can be computed inparallel when combining all strategies The beamformingstage can be replicated as many times as the remainingavailable resources allow Of course this estimation is cer-tainly optimistic since the frequency drops when the resourceconsumption increases Nevertheless this provides an upperbound for 119873119900 For instance when only the inner subarray isconsidered the DSPs are the limiting component Howeverup to 53 beamforming stages could be theoretically placedin parallel When more subarrays are active the BRAMsare the constrained component Notice how the number ofsupported orientations increases if the number of subarraysdecreases It has however an impact on the frequencyresponse and the accuracy of the system as shown inSection 621 Nevertheless tens of thousands of orientationscan be computed in parallel consuming only around 2msby operating at the highest 119865op and by replicating thebeamforming stage to exploit all the available resources

Journal of Sensors 19

Table 7 Timing analysis of the optimized designs when applying and combining the first two strategies The values are expressed in ms

Initial Continuous Time multiplexing Continuous time multiplexing119905P-SRP 1696ms 1316ms 26ms 2ms

Table 8 Maximum 119873119900 when combining strategies The maximum number of beamformers is obtained based on the available resources andthe resource consumption of each beamformer (Table 4)Themaximum119865op is reported by the Vivado 20164 tool after placement and routing

Continuous time multiplexing Parallel continuous time multiplexingInner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICs Inner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICs

max beamformers mdash mdash mdash mdash 55 23 10 6max119865op 9562MHz 9327MHz 9197MHz 8791MHz 9562MHz 9327MHz 9197MHz 8791MHzmax119873119900 3059 2984 2943 2813 168292 68650 29430 16879

7 Conclusions

In this paper we have presented a scalable and flexiblearchitecture for fast sound-source localization On one handthe architecture can flexibly disable sections of the micro-phone array that are not needed or disable them to respectpower restrictionsThemodular approach of the architectureallows scaling the system for a larger or smaller number ofmicrophones Nevertheless such capabilities do not impactthe frequency and accuracy of our sound-source locator Onthe other hand several strategies to offer real-time sound-source localization have been presented and evaluatedThesestrategies not only accelerate but also provide solutionsfor those time stringent applications with a high angularresolution demandThousands of angles can be monitored inparallel offering a high-resolution sound-source localizationin a couple of milliseconds

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the European Regional Devel-opment Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Pro-gramme 2014ndash2020 through the ERDF-2020 Project ICITY-RDIBRU

References

[1] E Zwyssig M Lincoln and S Renals ldquoA digital microphonearray for distant speech recognitionrdquo in Proceedings of the Inter-national Conference on Acoustics Speech and Signal Processing(ICASSP rsquo10) pp 5106ndash5109 IEEE Dallas Tex USA March2010

[2] A Abdeen and R Laxmi ldquoDesign and performance of areal-time acoustic beamforming systemrdquo in Proceedings of the12th SENSORS rsquo13 Conference IEEE Baltimore MD USANovember 2013

[3] B da Silva L Segers A Braeken and A Touhafi ldquoRuntimereconfigurable beamforming architecture for real-time sound-source localizationrdquo in Proceedings of the 26th InternationalConference on Field-Programmable Logic and Applications (FPLrsquo16) IEEE Lausanne Switzerland September 2016

[4] Y Zhang and S Baobin ldquoSound source localization algorithmbased onwearable acoustic counter-sniper systemsrdquo in Proceed-ings of the 5th International Conference on Instrumentation andMeasurement Computer Communication and Control IMCCCrsquo15 pp 340ndash345 IEEE Qinhuangdao China September 2015

[5] J Sallai W Hedgecock P Volgyesi A Nadas G Balogh and ALedeczi ldquoWeapon classification and shooter localization usingdistributed multichannel acoustic sensorsrdquo Journal of SystemsArchitecture vol 57 no 10 pp 869ndash885 2011

[6] T Inoue R Imai Y Ikeda and Y Oikawa Hat-type hearingsystem using MEMS microphone array 2016

[7] Z I Skordilis A Tsiami P Maragos G Potamianos L Spel-gatti andR Sannino ldquoMultichannel speech enhancement usingMEMS microphonesrdquo in Proceedings of the 40th InternationalConference on Acoustics Speech and Signal Processing ICASSPrsquo15 pp 2729ndash2733 IEEE Brisbane Australia April 2014

[8] I Salom V Celebic M Milanovic D Todorovic and J PrezeljldquoAn implementation of beamforming algorithm on FPGAplatform with digital microphone arrayrdquo in Proceedings of the138th Audio Engineering Society Convention AES rsquo15 AudioEngineering Society New York Ny USA May 2015

[9] J Tiete F Domınguez B da Silva L Segers K Steenhaut andA Touhafi ldquoSoundCompass a distributed MEMS microphonearray-based sensor for sound source localizationrdquo Sensors vol14 no 2 pp 1918ndash1949 2014

[10] Analog Devices ldquoADMP521 datasheetUltralow noise micro-phone with bottom Port and PDM digital outputrdquo TechnicalReport Analog Devices Norwood MA USA 2012

[11] Texas Instruments ldquoHow delta-sigma ADCs workrdquo Tehcni-cal report Texas Intruments httpwwwticomlitanslyt423slyt423pdf

[12] D H Johnson and D E Dudgeon Array Signal ProcessingConcepts and Techniques Simon amp Schuster New York NYUSA 1992

[13] J J Christensen and J Hald ldquoTechnical Review BeamformingrdquoTech Rep Bruel amp Kjear Danmark 2004

[14] J H DiBiase A high-accuracy low-latency technique for talkerlocalization in reverberant environments using microphone

20 Journal of Sensors

arrays [Phd thesis] Brown University Providence RI USA2000

[15] J H DiBiase H F Silverman and M S Brandstein ldquoRobustlocalization in reverberant roomsrdquo in Microphone Arrays pp157ndash180 Springer Berlin Germany 2001

[16] M J Taghizadeh P N Garner and H Bourlard ldquoMicrophonearray beampattern characterization for hands-free speech appli-cationsrdquo in Proceedings of the 7th Sensor Array andMultichannelSignal Processing Workshop SAM rsquo12 pp 465ndash468 IEEEHoboken NJ USA June 2012

[17] H Blasinski F Amiel and E Thomas ldquoImpact of differentpower reduction techniques at architectural level on modernFPGAsrdquo in Proceedings of the Latin American Symposium onCircuits and Systems LASCAS Stanford University StanfordCalif USA 2010

[18] E Hogenauer ldquoAn economical class of digital filters for decima-tion and interpolationrdquo IEEE Transactions on Acoustics Speechand Signal Processing vol 29 no 2 pp 155ndash162 1981

[19] MPDonadioldquoCICfilter introductionrdquo2000 httpspdfsseman-ticscholarorg5bf748fbdeb1ff68a2407c0ccfd58b816e9937d5pdf

[20] N Hegde ldquoSeamlessly interfacing MEMs microphones withblackfin processorsrdquo EE-350 Engineer-to-Engineer Note 2010

[21] G J Dolecek and J Diaz-Carmona On Design of CIC Decima-tors INTECH Open Access Publisher 2011

[22] R Lyons ldquoUnderstanding cascaded integrator-comb filtersrdquoEmbed System Program vol 18 no 4 pp 14ndash27 2005

[23] T E Bogale L Vandendorpe and L L Bao ldquoSensing through-put tradeoff for cognitive radio networks with noise varianceuncertaintyrdquo in Proceedings of the 9th International Conferenceon Cognitive Radio Oriented Wireless Networks CROWNCOMrsquo14 pp 435ndash441 IEEE Oulu Finland June 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 16: Design Considerations When Accelerating an FPGA-Based ...downloads.hindawi.com › journals › js › 2017 › 6782176.pdf · Design Considerations When Accelerating an FPGA-Based

16 Journal of Sensors

8 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(a)

16 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(b)

32 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(c)

64 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

10000 14000120008000600040002000Sound-source frequency (Hz)

Max

imum

side

lobe

leve

l (dB

)

0

minus10

minus20

minus30

minus40

minus50

minus60

minus70

minus80

minus90

minus100

minus110

(d)

Figure 16 Measured MSL when considering a variable number of orientations and active microphones From (a) to (d) the MSL with only8 orientations up to 64 orientations on (d)

one hand a higher number of microphones produce afaster decrement of BWminus3 dB reflected as a thinner mainlobe Nevertheless BWminus3 dB of each subarray converges to aminimum which is only reached at higher frequencies Theangular resolution determines this minimum which rangesfrom 90∘ till 1125∘ when 8 or 64 orientations are consideredrespectively

622 Resource Consumption and Power Analysis Table 4summarizes the resource consumption when combiningsubarrays The consumed resources are divided into theresources for the filter stage the beamforming stage andthe total consumption per groups of subarrays The filterstage mostly consumes DSPs while the beamforming stagemainly demands BRAMs Most of the resource consumptionis dominated by the filter stage since a filter chain is dedicated

to each MEMs microphone What determines the resourceconsumption is the number of active subarrays

The flexibility of our architecture allows the creation ofheterogeneous source-sound locators Thus the architecturecan be scaled for small FPGAs based on the target sound-source profile or a particular desirable power consumptionFor instance the combination of the two inner subarrayswould use 12 microphones while consuming less than 10 ofthe available resources The LUTs are the limiting resourcedue to the internal registers of the filters In fact when allthe subarrays are used around 80 of the available LUTsare required Nevertheless any subarray can be disabled inruntime which directly deactivates its associated filter andbeamforming components Although this does not affectthe resource consumption it has a direct impact over thepower consumption Table 5 shows the power consumption

Journal of Sensors 17

10000 120008000600040002000Sound-source frequency (Hz)

0

50

100

150

200

250

300

350

8 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

Beam

wid

th at

minus3

dBs (

degr

ees)

(a)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

16 orientations

0

50

100

150

200

250

300

350

Beam

wid

th at

minus3

dBs (

degr

ees)

(b)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

32 orientations

0

50

100

150

200

250

300

350

Beam

wid

th at

minus3

dBs (

degr

ees)

(c)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

0

64 orientations

50

100

150

200

250

300

350Be

amw

idth

atminus3

dBs (

degr

ees)

(d)

Figure 17 Measured BWminus3 dB level when considering a variable number of orientations and active microphones From (a) to (d) the BWminus3 dBwith only 8 orientations up to 64 orientations on (d)

in mW based on the number of active subarrays The powerconsumption of the microphones is also considered sincethe FPGA and the microphone array are powered from thesame source Thus the overall power consumption mustbe considered since the architecture is designed for anembedded system The MEMS microphones are poweredwith 33 volts which represents a power consumption permicrophone of 264 120583W and 396mW for the inactive andactive microphones respectively Notice how the powerconsumption increases with the number of active subarraysThere is a turning pointwhen 3 or 4 subarrays are activeThusthe microphone array consumes more power than the FPGAwhen all the subarrays are active

623 Timing Analysis The timing analysis based on Sec-tion 5 of the design under evaluation is summarized inTable 6 A complete steering loop requires around 169ms

while 119905119900 rounds to 26ms Notice that the initialization (119905II)consumes around 215 of the execution time Fortunatelythis initialization can almost be completely removed whenapplying the first strategy described in Section 531

Table 7 summarizes the timing results when applyingthe first strategies proposed in Section 5 The eliminationof the initialization after each orientationrsquos transition slightlyreduces 119905P-SRP In this case 119905P-SRP is expressed as follows

119905P-SRP = 119905II + 119873119900 sdot 119905119904 (34)

The main improvement is obtained after time multiplexingthe computation of the power per orientations In this case119865119875 the operational frequency of the beamforming compu-tation to process all 119873119900 in parallel equals 119865119878 as expressedin (32) This is possible because 119863119865 and 119873119900 have the samevalue Therefore there is no need to have a different clockfor the beamforming operation since the spacing between

18 Journal of Sensors

Table 4 Resource consumption after placement and routing when combining microphone subarrays Each subarray combination details theresource consumption of the filter and the beamforming stage

Resources Available Inner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICsResources Filters Beamforming Total Filters Beamforming Total Filters Beamforming Total Filters Beamforming Total

Sliceregisters 106400 5043 626 6144 14859 1540 16882 34489 3195 38183 54042 4447 59093

Slice LUTs 53200 3612 344 4732 10759 754 12299 25032 1486 27318 37221 2221 42319LUT-FF 86689 2329 199 2773 7013 512 7779 16353 1069 17698 23656 1664 27619BRAM 140 0 2 2 0 6 6 0 14 14 0 22 22DSP48 220 8 4 12 24 4 28 56 4 60 88 4 92

Table 5 Power consumption at 119865119904 = 2MHz expressed in mW when combining microphone subarrays Values obtained from the Vivado20164 power report

Active MEMS microphones Reported on-chip power TotalSubarrays Active Inactive Total Static Dynamic Total PowerInner 4 MICs 1584 013 1597 120 2 122 13797Inner 12 MICs 4752 011 4763 120 5 125 17263Inner 28 MICs 11088 006 11094 121 11 132 24294All 52 MICs 20592 0 20592 122 16 138 34392

Table 6 Timing analysis without any optimization of the designunder evaluation The values are expressed in 120583sParameter Definition Values [120583s]119905CICII Initiation interval of the CIC filter 45

119905DCII

Initiation interval of the removed DCblock 9

119905FIRII Initiation interval of the FIR filter 72119905DelayII Initiation interval of the delay memories 480

119905SumII Initiation interval of the cascaded sums 35

119905PowerIIInitiation interval of the power

calculation 1

119905filtersII Initiation interval of the filter stage 855

119905beamformingII

Initiation interval of the beamformingstage 4845

119905powerII Initiation interval of the power stage 1119905II Sum of all initiation intervals 571119905119904 Sensing time 2048119905119900 Execution time of one orientation 2650

119905P-SRP Time required to obtain a polar powermap 169600

output filtered values from the filter stage is large enough Bycombining the first two strategies 119905P-SRP rounds to 2ms andonly the first steering loop needs 26ms due to 119905119875II In this case119905P-SRP is expressed as follows

119905P-SRP = 119905119875II + 119905119904 asymp 119905119904 (35)

The other two strategies proposed in Section 531 aredesigned to fully exploit the FPGA resources and to overcome

time constraints when considering a high angular resolutionIn the first case since the design under evaluation has a smallangular resolution (119873119900 = 64) there is no need for a higher119865119875 when applying the time multiplexing strategy However ahigher angular resolution can be obtained when consideringthe unconsumed resources without additional timing costTable 8 shows the combination of strategies increases theangular resolution without additional time penaltyThe oper-ational frequency (119865op) determines at what speed the FPGAcan operate By following (33) the beamforming operationcan be exploited by increasing 119865119875 up to the maximumfrequency which increases 119873119900 as well

max (119873119900) = max (119865op)BW

= 119865119875BW

(36)

Many thousands of orientations can be computed inparallel when combining all strategies The beamformingstage can be replicated as many times as the remainingavailable resources allow Of course this estimation is cer-tainly optimistic since the frequency drops when the resourceconsumption increases Nevertheless this provides an upperbound for 119873119900 For instance when only the inner subarray isconsidered the DSPs are the limiting component Howeverup to 53 beamforming stages could be theoretically placedin parallel When more subarrays are active the BRAMsare the constrained component Notice how the number ofsupported orientations increases if the number of subarraysdecreases It has however an impact on the frequencyresponse and the accuracy of the system as shown inSection 621 Nevertheless tens of thousands of orientationscan be computed in parallel consuming only around 2msby operating at the highest 119865op and by replicating thebeamforming stage to exploit all the available resources

Journal of Sensors 19

Table 7 Timing analysis of the optimized designs when applying and combining the first two strategies The values are expressed in ms

Initial Continuous Time multiplexing Continuous time multiplexing119905P-SRP 1696ms 1316ms 26ms 2ms

Table 8 Maximum 119873119900 when combining strategies The maximum number of beamformers is obtained based on the available resources andthe resource consumption of each beamformer (Table 4)Themaximum119865op is reported by the Vivado 20164 tool after placement and routing

Continuous time multiplexing Parallel continuous time multiplexingInner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICs Inner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICs

max beamformers mdash mdash mdash mdash 55 23 10 6max119865op 9562MHz 9327MHz 9197MHz 8791MHz 9562MHz 9327MHz 9197MHz 8791MHzmax119873119900 3059 2984 2943 2813 168292 68650 29430 16879

7 Conclusions

In this paper we have presented a scalable and flexiblearchitecture for fast sound-source localization On one handthe architecture can flexibly disable sections of the micro-phone array that are not needed or disable them to respectpower restrictionsThemodular approach of the architectureallows scaling the system for a larger or smaller number ofmicrophones Nevertheless such capabilities do not impactthe frequency and accuracy of our sound-source locator Onthe other hand several strategies to offer real-time sound-source localization have been presented and evaluatedThesestrategies not only accelerate but also provide solutionsfor those time stringent applications with a high angularresolution demandThousands of angles can be monitored inparallel offering a high-resolution sound-source localizationin a couple of milliseconds

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the European Regional Devel-opment Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Pro-gramme 2014ndash2020 through the ERDF-2020 Project ICITY-RDIBRU

References

[1] E Zwyssig M Lincoln and S Renals ldquoA digital microphonearray for distant speech recognitionrdquo in Proceedings of the Inter-national Conference on Acoustics Speech and Signal Processing(ICASSP rsquo10) pp 5106ndash5109 IEEE Dallas Tex USA March2010

[2] A Abdeen and R Laxmi ldquoDesign and performance of areal-time acoustic beamforming systemrdquo in Proceedings of the12th SENSORS rsquo13 Conference IEEE Baltimore MD USANovember 2013

[3] B da Silva L Segers A Braeken and A Touhafi ldquoRuntimereconfigurable beamforming architecture for real-time sound-source localizationrdquo in Proceedings of the 26th InternationalConference on Field-Programmable Logic and Applications (FPLrsquo16) IEEE Lausanne Switzerland September 2016

[4] Y Zhang and S Baobin ldquoSound source localization algorithmbased onwearable acoustic counter-sniper systemsrdquo in Proceed-ings of the 5th International Conference on Instrumentation andMeasurement Computer Communication and Control IMCCCrsquo15 pp 340ndash345 IEEE Qinhuangdao China September 2015

[5] J Sallai W Hedgecock P Volgyesi A Nadas G Balogh and ALedeczi ldquoWeapon classification and shooter localization usingdistributed multichannel acoustic sensorsrdquo Journal of SystemsArchitecture vol 57 no 10 pp 869ndash885 2011

[6] T Inoue R Imai Y Ikeda and Y Oikawa Hat-type hearingsystem using MEMS microphone array 2016

[7] Z I Skordilis A Tsiami P Maragos G Potamianos L Spel-gatti andR Sannino ldquoMultichannel speech enhancement usingMEMS microphonesrdquo in Proceedings of the 40th InternationalConference on Acoustics Speech and Signal Processing ICASSPrsquo15 pp 2729ndash2733 IEEE Brisbane Australia April 2014

[8] I Salom V Celebic M Milanovic D Todorovic and J PrezeljldquoAn implementation of beamforming algorithm on FPGAplatform with digital microphone arrayrdquo in Proceedings of the138th Audio Engineering Society Convention AES rsquo15 AudioEngineering Society New York Ny USA May 2015

[9] J Tiete F Domınguez B da Silva L Segers K Steenhaut andA Touhafi ldquoSoundCompass a distributed MEMS microphonearray-based sensor for sound source localizationrdquo Sensors vol14 no 2 pp 1918ndash1949 2014

[10] Analog Devices ldquoADMP521 datasheetUltralow noise micro-phone with bottom Port and PDM digital outputrdquo TechnicalReport Analog Devices Norwood MA USA 2012

[11] Texas Instruments ldquoHow delta-sigma ADCs workrdquo Tehcni-cal report Texas Intruments httpwwwticomlitanslyt423slyt423pdf

[12] D H Johnson and D E Dudgeon Array Signal ProcessingConcepts and Techniques Simon amp Schuster New York NYUSA 1992

[13] J J Christensen and J Hald ldquoTechnical Review BeamformingrdquoTech Rep Bruel amp Kjear Danmark 2004

[14] J H DiBiase A high-accuracy low-latency technique for talkerlocalization in reverberant environments using microphone

20 Journal of Sensors

arrays [Phd thesis] Brown University Providence RI USA2000

[15] J H DiBiase H F Silverman and M S Brandstein ldquoRobustlocalization in reverberant roomsrdquo in Microphone Arrays pp157ndash180 Springer Berlin Germany 2001

[16] M J Taghizadeh P N Garner and H Bourlard ldquoMicrophonearray beampattern characterization for hands-free speech appli-cationsrdquo in Proceedings of the 7th Sensor Array andMultichannelSignal Processing Workshop SAM rsquo12 pp 465ndash468 IEEEHoboken NJ USA June 2012

[17] H Blasinski F Amiel and E Thomas ldquoImpact of differentpower reduction techniques at architectural level on modernFPGAsrdquo in Proceedings of the Latin American Symposium onCircuits and Systems LASCAS Stanford University StanfordCalif USA 2010

[18] E Hogenauer ldquoAn economical class of digital filters for decima-tion and interpolationrdquo IEEE Transactions on Acoustics Speechand Signal Processing vol 29 no 2 pp 155ndash162 1981

[19] MPDonadioldquoCICfilter introductionrdquo2000 httpspdfsseman-ticscholarorg5bf748fbdeb1ff68a2407c0ccfd58b816e9937d5pdf

[20] N Hegde ldquoSeamlessly interfacing MEMs microphones withblackfin processorsrdquo EE-350 Engineer-to-Engineer Note 2010

[21] G J Dolecek and J Diaz-Carmona On Design of CIC Decima-tors INTECH Open Access Publisher 2011

[22] R Lyons ldquoUnderstanding cascaded integrator-comb filtersrdquoEmbed System Program vol 18 no 4 pp 14ndash27 2005

[23] T E Bogale L Vandendorpe and L L Bao ldquoSensing through-put tradeoff for cognitive radio networks with noise varianceuncertaintyrdquo in Proceedings of the 9th International Conferenceon Cognitive Radio Oriented Wireless Networks CROWNCOMrsquo14 pp 435ndash441 IEEE Oulu Finland June 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 17: Design Considerations When Accelerating an FPGA-Based ...downloads.hindawi.com › journals › js › 2017 › 6782176.pdf · Design Considerations When Accelerating an FPGA-Based

Journal of Sensors 17

10000 120008000600040002000Sound-source frequency (Hz)

0

50

100

150

200

250

300

350

8 orientations

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

Beam

wid

th at

minus3

dBs (

degr

ees)

(a)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

16 orientations

0

50

100

150

200

250

300

350

Beam

wid

th at

minus3

dBs (

degr

ees)

(b)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

32 orientations

0

50

100

150

200

250

300

350

Beam

wid

th at

minus3

dBs (

degr

ees)

(c)

10000 120008000600040002000Sound-source frequency (Hz)

Inner 4 MICsInner 12 MICs

Inner 28 MICsAll 52 MICs

0

64 orientations

50

100

150

200

250

300

350Be

amw

idth

atminus3

dBs (

degr

ees)

(d)

Figure 17 Measured BWminus3 dB level when considering a variable number of orientations and active microphones From (a) to (d) the BWminus3 dBwith only 8 orientations up to 64 orientations on (d)

in mW based on the number of active subarrays The powerconsumption of the microphones is also considered sincethe FPGA and the microphone array are powered from thesame source Thus the overall power consumption mustbe considered since the architecture is designed for anembedded system The MEMS microphones are poweredwith 33 volts which represents a power consumption permicrophone of 264 120583W and 396mW for the inactive andactive microphones respectively Notice how the powerconsumption increases with the number of active subarraysThere is a turning pointwhen 3 or 4 subarrays are activeThusthe microphone array consumes more power than the FPGAwhen all the subarrays are active

623 Timing Analysis The timing analysis based on Sec-tion 5 of the design under evaluation is summarized inTable 6 A complete steering loop requires around 169ms

while 119905119900 rounds to 26ms Notice that the initialization (119905II)consumes around 215 of the execution time Fortunatelythis initialization can almost be completely removed whenapplying the first strategy described in Section 531

Table 7 summarizes the timing results when applyingthe first strategies proposed in Section 5 The eliminationof the initialization after each orientationrsquos transition slightlyreduces 119905P-SRP In this case 119905P-SRP is expressed as follows

119905P-SRP = 119905II + 119873119900 sdot 119905119904 (34)

The main improvement is obtained after time multiplexingthe computation of the power per orientations In this case119865119875 the operational frequency of the beamforming compu-tation to process all 119873119900 in parallel equals 119865119878 as expressedin (32) This is possible because 119863119865 and 119873119900 have the samevalue Therefore there is no need to have a different clockfor the beamforming operation since the spacing between

18 Journal of Sensors

Table 4 Resource consumption after placement and routing when combining microphone subarrays Each subarray combination details theresource consumption of the filter and the beamforming stage

Resources Available Inner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICsResources Filters Beamforming Total Filters Beamforming Total Filters Beamforming Total Filters Beamforming Total

Sliceregisters 106400 5043 626 6144 14859 1540 16882 34489 3195 38183 54042 4447 59093

Slice LUTs 53200 3612 344 4732 10759 754 12299 25032 1486 27318 37221 2221 42319LUT-FF 86689 2329 199 2773 7013 512 7779 16353 1069 17698 23656 1664 27619BRAM 140 0 2 2 0 6 6 0 14 14 0 22 22DSP48 220 8 4 12 24 4 28 56 4 60 88 4 92

Table 5 Power consumption at 119865119904 = 2MHz expressed in mW when combining microphone subarrays Values obtained from the Vivado20164 power report

Active MEMS microphones Reported on-chip power TotalSubarrays Active Inactive Total Static Dynamic Total PowerInner 4 MICs 1584 013 1597 120 2 122 13797Inner 12 MICs 4752 011 4763 120 5 125 17263Inner 28 MICs 11088 006 11094 121 11 132 24294All 52 MICs 20592 0 20592 122 16 138 34392

Table 6 Timing analysis without any optimization of the designunder evaluation The values are expressed in 120583sParameter Definition Values [120583s]119905CICII Initiation interval of the CIC filter 45

119905DCII

Initiation interval of the removed DCblock 9

119905FIRII Initiation interval of the FIR filter 72119905DelayII Initiation interval of the delay memories 480

119905SumII Initiation interval of the cascaded sums 35

119905PowerIIInitiation interval of the power

calculation 1

119905filtersII Initiation interval of the filter stage 855

119905beamformingII

Initiation interval of the beamformingstage 4845

119905powerII Initiation interval of the power stage 1119905II Sum of all initiation intervals 571119905119904 Sensing time 2048119905119900 Execution time of one orientation 2650

119905P-SRP Time required to obtain a polar powermap 169600

output filtered values from the filter stage is large enough Bycombining the first two strategies 119905P-SRP rounds to 2ms andonly the first steering loop needs 26ms due to 119905119875II In this case119905P-SRP is expressed as follows

119905P-SRP = 119905119875II + 119905119904 asymp 119905119904 (35)

The other two strategies proposed in Section 531 aredesigned to fully exploit the FPGA resources and to overcome

time constraints when considering a high angular resolutionIn the first case since the design under evaluation has a smallangular resolution (119873119900 = 64) there is no need for a higher119865119875 when applying the time multiplexing strategy However ahigher angular resolution can be obtained when consideringthe unconsumed resources without additional timing costTable 8 shows the combination of strategies increases theangular resolution without additional time penaltyThe oper-ational frequency (119865op) determines at what speed the FPGAcan operate By following (33) the beamforming operationcan be exploited by increasing 119865119875 up to the maximumfrequency which increases 119873119900 as well

max (119873119900) = max (119865op)BW

= 119865119875BW

(36)

Many thousands of orientations can be computed inparallel when combining all strategies The beamformingstage can be replicated as many times as the remainingavailable resources allow Of course this estimation is cer-tainly optimistic since the frequency drops when the resourceconsumption increases Nevertheless this provides an upperbound for 119873119900 For instance when only the inner subarray isconsidered the DSPs are the limiting component Howeverup to 53 beamforming stages could be theoretically placedin parallel When more subarrays are active the BRAMsare the constrained component Notice how the number ofsupported orientations increases if the number of subarraysdecreases It has however an impact on the frequencyresponse and the accuracy of the system as shown inSection 621 Nevertheless tens of thousands of orientationscan be computed in parallel consuming only around 2msby operating at the highest 119865op and by replicating thebeamforming stage to exploit all the available resources

Journal of Sensors 19

Table 7 Timing analysis of the optimized designs when applying and combining the first two strategies The values are expressed in ms

Initial Continuous Time multiplexing Continuous time multiplexing119905P-SRP 1696ms 1316ms 26ms 2ms

Table 8 Maximum 119873119900 when combining strategies The maximum number of beamformers is obtained based on the available resources andthe resource consumption of each beamformer (Table 4)Themaximum119865op is reported by the Vivado 20164 tool after placement and routing

Continuous time multiplexing Parallel continuous time multiplexingInner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICs Inner 4 MICs Inner 12 MICs Inner 28 MICs All 52 MICs

max beamformers mdash mdash mdash mdash 55 23 10 6max119865op 9562MHz 9327MHz 9197MHz 8791MHz 9562MHz 9327MHz 9197MHz 8791MHzmax119873119900 3059 2984 2943 2813 168292 68650 29430 16879

7 Conclusions

In this paper we have presented a scalable and flexiblearchitecture for fast sound-source localization On one handthe architecture can flexibly disable sections of the micro-phone array that are not needed or disable them to respectpower restrictionsThemodular approach of the architectureallows scaling the system for a larger or smaller number ofmicrophones Nevertheless such capabilities do not impactthe frequency and accuracy of our sound-source locator Onthe other hand several strategies to offer real-time sound-source localization have been presented and evaluatedThesestrategies not only accelerate but also provide solutionsfor those time stringent applications with a high angularresolution demandThousands of angles can be monitored inparallel offering a high-resolution sound-source localizationin a couple of milliseconds

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the European Regional Devel-opment Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Pro-gramme 2014ndash2020 through the ERDF-2020 Project ICITY-RDIBRU

References

[1] E. Zwyssig, M. Lincoln, and S. Renals, "A digital microphone array for distant speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '10), pp. 5106–5109, IEEE, Dallas, Tex, USA, March 2010.

[2] A. Abdeen and R. Laxmi, "Design and performance of a real-time acoustic beamforming system," in Proceedings of the 12th SENSORS '13 Conference, IEEE, Baltimore, MD, USA, November 2013.

[3] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, "Runtime reconfigurable beamforming architecture for real-time sound-source localization," in Proceedings of the 26th International Conference on Field-Programmable Logic and Applications (FPL '16), IEEE, Lausanne, Switzerland, September 2016.

[4] Y. Zhang and S. Baobin, "Sound source localization algorithm based on wearable acoustic counter-sniper systems," in Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC '15), pp. 340–345, IEEE, Qinhuangdao, China, September 2015.

[5] J. Sallai, W. Hedgecock, P. Volgyesi, A. Nadas, G. Balogh, and A. Ledeczi, "Weapon classification and shooter localization using distributed multichannel acoustic sensors," Journal of Systems Architecture, vol. 57, no. 10, pp. 869–885, 2011.

[6] T. Inoue, R. Imai, Y. Ikeda, and Y. Oikawa, Hat-type hearing system using MEMS microphone array, 2016.

[7] Z. I. Skordilis, A. Tsiami, P. Maragos, G. Potamianos, L. Spelgatti, and R. Sannino, "Multichannel speech enhancement using MEMS microphones," in Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP '15), pp. 2729–2733, IEEE, Brisbane, Australia, April 2015.

[8] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, "An implementation of beamforming algorithm on FPGA platform with digital microphone array," in Proceedings of the 138th Audio Engineering Society Convention (AES '15), Audio Engineering Society, New York, NY, USA, May 2015.

[9] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.

[10] Analog Devices, "ADMP521 datasheet: ultralow noise microphone with bottom port and PDM digital output," Technical Report, Analog Devices, Norwood, MA, USA, 2012.

[11] Texas Instruments, "How delta-sigma ADCs work," Technical report, Texas Instruments, http://www.ti.com/lit/an/slyt423/slyt423.pdf.

[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, New York, NY, USA, 1992.

[13] J. J. Christensen and J. Hald, "Technical Review: Beamforming," Tech. Rep., Brüel & Kjær, Denmark, 2004.

[14] J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays [Ph.D. thesis], Brown University, Providence, RI, USA, 2000.

[15] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157–180, Springer, Berlin, Germany, 2001.

[16] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, "Microphone array beampattern characterization for hands-free speech applications," in Proceedings of the 7th Sensor Array and Multichannel Signal Processing Workshop (SAM '12), pp. 465–468, IEEE, Hoboken, NJ, USA, June 2012.

[17] H. Blasinski, F. Amiel, and E. Thomas, "Impact of different power reduction techniques at architectural level on modern FPGAs," in Proceedings of the Latin American Symposium on Circuits and Systems (LASCAS), Stanford University, Stanford, Calif, USA, 2010.

[18] E. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155–162, 1981.

[19] M. P. Donadio, "CIC filter introduction," 2000, https://pdfs.semanticscholar.org/5bf7/48fbdeb1ff68a2407c0ccfd58b816e9937d5.pdf.

[20] N. Hegde, "Seamlessly interfacing MEMS microphones with Blackfin processors," EE-350 Engineer-to-Engineer Note, 2010.

[21] G. J. Dolecek and J. Diaz-Carmona, On Design of CIC Decimators, INTECH Open Access Publisher, 2011.

[22] R. Lyons, "Understanding cascaded integrator-comb filters," Embedded Systems Programming, vol. 18, no. 4, pp. 14–27, 2005.

[23] T. E. Bogale, L. Vandendorpe, and L. B. Le, "Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty," in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM '14), pp. 435–441, IEEE, Oulu, Finland, June 2014.
