Audio Engineering Society

Convention Paper
Presented at the 130th Convention
2011 May 13–16, London, UK

The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Real-time Reproduction of Moving Sound Sources by Wave Field Synthesis: Objective and Subjective Quality Evaluation

Michele Gasparini1, Paolo Peretti1, Stefania Cecchi1, Laura Romoli1, Francesco Piazza1

1 A3Lab - DIBET - Università Politecnica delle Marche, Via Brecce Bianche, 60131 Ancona, Italy

Correspondence should be addressed to Michele Gasparini ([email protected])

ABSTRACT
Wave Field Synthesis (WFS) is an audio rendering technique that allows the reproduction of an acoustic image over an extended listening area. In order to achieve a realistic sensation, true representation of moving sound sources is essential. In this paper, a real-time algorithm that significantly reduces the computational effort in the synthesis of WFS driving functions in the presence of moving sound sources is presented. High efficiency is obtained with a model based on phase approximation and the Short Time Fourier Transform (STFT). The influence of the streaming frame size and of the source velocity on well-known sound field artifacts has been studied through PC simulations and listening tests.

1. INTRODUCTION
The goal of every spatial audio technique is the true reproduction of the acoustic image. Over the years, Wave Field Synthesis (WFS) has proven to be one of the most effective methods for reaching high spatial precision, especially in videoconferencing and entertainment systems. However, the high computational load of WFS, due to the large number of loudspeakers and processing channels required to obtain sufficient quality in the reproduced field, limits its applicability. Real-time algorithms capable of reducing the computational complexity are hence essential. Efficient algorithms have been presented over the years for the synthesis of static source fields [1, 2, 3, 4]. On the contrary, the reproduction of moving sound sources is still problematic, even though it is of primary importance for a true spatial reconstruction of the sound scene. The movement entails the use of time-varying filters with a significant


Fig. 1: Geometry used for the Kirchhoff-Helmholtz integral.

increase of computational load [3]. In this paper, a real-time WFS algorithm capable of reproducing moving sound sources is presented. Each source trajectory is approximated with two simultaneous virtual sources. High efficiency is obtained through phase approximation and fractional delays. Finally, quality evaluations obtained by means of computer simulations and listening tests are reported.
In Section 2, a brief summary of the traditional WFS theory is presented, together with efficient processing techniques used to reduce the computational load. Section 3 analyzes the theory of moving sources. A specific formulation of the sound field is derived starting from Green's functions. Common solutions for the synthesis of movement by WFS are then reported, and a novel approach is presented in order to overcome some specific issues. The quality of the proposed algorithm is evaluated in Section 4, both with PC simulations and preliminary subjective listening tests.

2. STATIC SOURCES WFS THEORY
WFS is a multichannel technique effective in reproducing the sound field of a given virtual source over an extended listening area, overcoming the classic stereophonic sweet-spot limitation. The improvement of the acoustical image is achieved using arrays of loudspeakers controlled by particular signals called “driving functions” in the WFS context. The WFS theory is based on the Kirchhoff-Helmholtz integral, which states that “the sound field inside a volume can be calculated if the pressure field and normal particle velocity due to the primary sources on the enclosing surface are known” [1]. With reference to Figure 1, considering a sound wave propagating in a volume V enclosed by a surface S, the Kirchhoff-Helmholtz integral is given by

P(\vec{r}_0,\omega) = \frac{1}{4\pi}\oint_S P(\vec{r},\omega)\,\frac{\partial G_{\vec{r}_0}(\vec{r},\omega)}{\partial n}\,dS - \frac{1}{4\pi}\oint_S jk\rho_0 c\,V_n(\vec{r},\omega)\,G_{\vec{r}_0}(\vec{r},\omega)\,dS \quad (1)

where \vec{r}_0 and \vec{r} denote generic positions inside V and on S respectively, ω is the angular frequency, k is the wave number, c is the sound velocity, ρ_0 is the medium density, P is the sound pressure and V_n is the particle velocity normal to S. G_{\vec{r}_0}(\vec{r},ω) is the Green function which synthesizes the secondary sources located in \vec{r}_0. The Green function is not unique. The simplest Green functions are the monopole solutions of the wave equation for a source at position \vec{r}_0

G_{\vec{r}_0}(\vec{r},\omega) = \frac{e^{-jk|\vec{r}-\vec{r}_0|}}{|\vec{r}-\vec{r}_0|} \quad (2)

G_{\vec{r}_0}(\vec{r},\omega) = \frac{e^{jk|\vec{r}-\vec{r}_0|}}{|\vec{r}-\vec{r}_0|}. \quad (3)
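As a quick numerical check, the forward Green function (2) can be evaluated directly. The sketch below is illustrative only (plain NumPy, with an assumed source position and the 500 Hz tone used in the paper's later examples); it confirms the 1/|\vec{r}-\vec{r}_0| amplitude decay.

```python
import numpy as np

def green_forward(r, r0, k):
    """Forward free-field monopole Green function e^{-jk|r-r0|}/|r-r0| (Eq. 2)."""
    d = np.linalg.norm(np.asarray(r, float) - np.asarray(r0, float))
    return np.exp(-1j * k * d) / d

c = 343.0                # sound speed [m/s]
f = 500.0                # tone frequency [Hz], as in the paper's examples
k = 2 * np.pi * f / c    # wave number

r0 = np.array([0.0, -1.0, 0.0])               # assumed source position
g1 = green_forward([0.0, 1.0, 0.0], r0, k)    # receiver at distance 2 m
g2 = green_forward([0.0, 3.0, 0.0], r0, k)    # receiver at distance 4 m

# amplitude halves when the distance doubles; phase rotates by -k*d
```

Doubling the source-receiver distance halves |G|, while the phase term e^{-jkd} carries the propagation delay.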

By inserting (2) in (1), the 3D forward Kirchhoff-Helmholtz integral is obtained:

P(\vec{r}_0,\omega) = \frac{1}{4\pi}\oint_S jk\rho_0 c\,V_n(\vec{r},\omega)\,\frac{e^{-jk|\vec{r}-\vec{r}_0|}}{|\vec{r}-\vec{r}_0|}\,dS + \frac{1}{4\pi}\oint_S P(\vec{r},\omega)\,\frac{1+jk|\vec{r}-\vec{r}_0|}{|\vec{r}-\vec{r}_0|}\,\cos\varphi\,\frac{e^{-jk|\vec{r}-\vec{r}_0|}}{|\vec{r}-\vec{r}_0|}\,dS \quad (4)

where ϕ is the angle between the normal vector to S and the vector \vec{r} − \vec{r}_0 (Figure 1). From (4) it can be seen that the sound field inside V is obtained by a distribution of monopole and dipole sources [1]. Similarly, by inserting (3) in (1), the 3D inverse Kirchhoff-Helmholtz integral is obtained. Considering a linear distribution of monopole sources, the resulting field is expressed by the two-dimensional


Fig. 2: Geometry used for the driving functions.

Rayleigh I integral [2]:

P_{WFS}(\vec{r}_0,\omega) = \int_{-\infty}^{+\infty} Q_m(\vec{r}_n,\omega)\,\frac{e^{-jk|\vec{r}_0-\vec{r}_n|}}{|\vec{r}_0-\vec{r}_n|}\,dx \quad (5)

where \vec{r}_0 and \vec{r}_n are the receiver and the speaker positions respectively, according to Figure 2. Q_m is a fundamental function, called the driving function in the WFS context, and is given by

Q_m(\vec{r}_n,\omega) = \frac{S(\omega)}{D_n(\varphi,\omega)}\,\sqrt{\frac{jk}{2\pi}}\,\sqrt{\frac{|y_0-y_n|}{|y_0-y_n|+|y_m-y_n|}}\,\cos\theta_n\,\frac{e^{-jk|\vec{r}_m-\vec{r}_n|}}{\sqrt{|\vec{r}_m-\vec{r}_n|}} \quad (6)

where S(ω) is the Fourier transform of the source signal, ϕ is the angle between the vector \vec{n} normal to the loudspeaker array and the vector \vec{r}_0 − \vec{r}_n, θ_n is the angle between \vec{n} and the vector \vec{r}_n − \vec{r}_m, and D_n(ϕ, ω) is the directivity function of the n-th loudspeaker. In the following, the loudspeakers will be considered omnidirectional, so D_n(ϕ, ω) will be assumed equal to 1.
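For concreteness, (6) can be evaluated per loudspeaker at a single frequency. The sketch below is a minimal NumPy illustration, not the paper's implementation: it assumes omnidirectional loudspeakers (D_n = 1), a line array on y = 0, and an invented 16-speaker geometry with a source behind the array; all names and values are hypothetical.

```python
import numpy as np

def driving_function(S_omega, r_m, r_0, x_n, omega, c=343.0):
    """Complex driving weight Q_m (Eq. 6) for one loudspeaker at (x_n, 0),
    assuming an omnidirectional speaker (D_n = 1) and a line array on y = 0."""
    k = omega / c
    r_n = np.array([x_n, 0.0])
    d_mn = np.linalg.norm(r_m - r_n)            # |r_m - r_n|
    cos_theta = abs(r_m[1] - r_n[1]) / d_mn     # cosine of the angle to the array normal
    weight = np.sqrt(abs(r_0[1] - r_n[1]) /
                     (abs(r_0[1] - r_n[1]) + abs(r_m[1] - r_n[1])))
    return (S_omega * np.sqrt(1j * k / (2 * np.pi)) * weight *
            cos_theta * np.exp(-1j * k * d_mn) / np.sqrt(d_mn))

# hypothetical geometry: source 1 m behind a 16-speaker array, reference line at y = 2 m
source = np.array([0.0, -1.0])
reference = np.array([0.0, 2.0])
omega = 2 * np.pi * 500.0
weights = [driving_function(1.0, source, reference, x_n, omega)
           for x_n in np.linspace(-0.4, 0.35, 16)]
```

As expected, the loudspeaker closest to the source (the array center here) gets the largest magnitude, through the cos θ_n and 1/√|r_m − r_n| factors.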

2.1. Phase approximation
It is worth underlining that equation (6) is composed of a time-varying and a time-invariant part [5]. In the case of non-moving sources, the first component is just given by S(ω). Therefore, the WFS technique

Fig. 3: Geometry of the field of a moving monopole source.

in the case of linear geometry, becomes the application of N filters, one for each loudspeaker, to the source stream S(ω) [5]. The filters' frequency response F_n(ω) is given by the time-invariant part of (6), weighted by ∆x, and it can be seen as a product of two functions:

F_n(\omega) = F_n^{|\cdot|}(\omega)\,F_n^{\phi}(\omega) \quad (7)

where F_n^{|\cdot|} and F_n^{\phi} account for the magnitude and phase behavior, respectively. The first part is given by

F_n^{|\cdot|}(\omega) = \frac{|\cos\theta_n|}{\sqrt{|\vec{r}_m-\vec{r}_n|}}\,\sqrt{\frac{|\omega|}{2\pi c}}\,\sqrt{\frac{|y_0-y_n|}{|y_0-y_m|}}\,\Delta x \quad (8)

because k = ω/c and |j| = 1. As regards F_n^{\phi}, considering sources both behind and in front of the line array, it is given by

F_n^{\phi}(\omega) = a\,e^{-jak|\vec{r}_m-\vec{r}_n|}\,\sqrt{ja\,\mathrm{sgn}(\omega)} \quad (9)

where sgn(·) represents the sign function and a is equal to 1 or −1 for virtual sources behind or in front of the array, respectively. Efficient methods for the calculation of the phase term are presented in [4]. The advantage of these techniques is that the filter impulse response is bounded within a short time interval, which allows filter truncation. The driving functions can therefore be implemented in a more optimized way, being composed of a simple time delay and an FIR filter [4].
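The delay-plus-FIR decomposition can be sketched with one common realization of a fractional delay, a Lagrange-interpolation FIR. This is a textbook design offered only as an illustration, not the specific filter structure of [4] or [10]; the delay value and filter order are arbitrary.

```python
import numpy as np

def lagrange_fd_fir(delay, order=3):
    """Lagrange-interpolation FIR approximating a fractional delay of `delay`
    samples (0 <= delay <= order); a standard textbook design."""
    h = np.ones(order + 1)
    for k in range(order + 1):
        for m in range(order + 1):
            if m != k:
                h[k] *= (delay - m) / (k - m)
    return h

# split a 10.3-sample total delay into an integer shift plus a fractional FIR,
# keeping the fractional part near the middle of the filter's valid range
total_delay = 10.3
order = 3
int_delay = int(np.floor(total_delay - order / 2))
frac = total_delay - int_delay
h = lagrange_fd_fir(frac, order)

x = np.zeros(64)
x[0] = 1.0                                      # unit impulse
y = np.convolve(np.roll(x, int_delay), h)[:64]  # impulse delayed by ~10.3 samples
```

The coefficients sum to one (unity DC gain), and the response peaks near sample 10, consistent with the requested 10.3-sample delay.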

3. MOVING SOURCES WFS
Traditional WFS theory cannot represent the


Fig. 4: Sound field of a source moving at 80 m/s parallel to the x-axis toward the positive x direction at y = −1 m. The source oscillates harmonically at a frequency of 500 Hz. The spectrogram is measured at \vec{r}_{obs} = [0, 1.5] m. (a) Sound field; (b) Spectrogram.

sound field of a moving source correctly. The reproduction of movement is challenging both for the nature of the physical phenomena related to moving sources and for the limits of WFS theory. Some peculiar aspects have to be taken into account in order to reproduce a correct acoustic image.

3.1. Field of a moving monopole source
In the case of moving sound sources, Green's functions (2) and (3) have to be modified in order to take into account the new geometry (Figure 3). Considering a sound source s_0 moving with velocity \vec{v}_m, the resulting field is obtained starting from the theory of the Liénard-Wiechert potentials for electromagnetic fields and is given by [6]

P(\vec{r}_0,t) = \int_{\tau_{\min}}^{\infty} \frac{\delta(\tau)}{4\pi}\,\frac{S_0(\tilde{t})}{\left|\vec{r}_0-\vec{r}_m(\tilde{t})\right| - \vec{v}_m(\tilde{t})\cdot[\vec{r}_0-\vec{r}_m(\tilde{t})]/c}\,d\tau \quad (10)

with

\tilde{t} = t - T(\vec{r}_0,t,\tau) \quad (11)

and

\tau_{\min} = -\frac{|\vec{r}_0-\vec{r}_m(t)|}{c}. \quad (12)

The quantity \tilde{t} is defined as the retarded time and represents the time of emission of the wave front that reaches \vec{r}_0 at time t. According to the sifting property of the Dirac delta distribution, the solution of (10) is

P(\vec{r}_0,t) = \frac{S_0(t_0)}{4\pi}\,\frac{1}{\left|\vec{r}_0-\vec{r}_m(t_0)\right| - \vec{v}_m(t_0)\cdot[\vec{r}_0-\vec{r}_m(t_0)]/c} \quad (13)

where · indicates the scalar product between vectors and

t_0 = t - T(\vec{r}_0,t,0) \quad (14)

in which T(\vec{r}_0,t,0) is the unique solution of

T(\vec{r}_0,t,0) - \frac{\left|\vec{r}_0-\vec{r}_m[t-T(\vec{r}_0,t,0)]\right|}{c} = 0. \quad (15)

Introducing the Mach factor M, defined as v/c, equation (13) can be rewritten as:

P(\vec{r}_0,t) = \frac{S_0(t_0)}{4\pi\left|\vec{r}_0-\vec{r}_m(t_0)\right|\,[1-M\cos\vartheta(t_0)]}. \quad (16)
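The implicit equation (15) has no closed form for a general trajectory, but for subsonic sources (M < 1) it can be solved by fixed-point iteration, since the mapping is a contraction with factor M. A minimal sketch, assuming a uniform linear trajectory like the one in Figure 4 (the trajectory and observer position are illustrative):

```python
import numpy as np

C = 343.0  # sound speed [m/s]

def r_m(t, v=80.0, y=-1.0):
    """Assumed trajectory: uniform motion along x at speed v, at height y."""
    return np.array([v * t, y])

def retarded_delay(r_obs, t, n_iter=50):
    """Solve T = |r_obs - r_m(t - T)| / c  (Eq. 15) by fixed-point iteration."""
    T = 0.0
    for _ in range(n_iter):
        T = np.linalg.norm(r_obs - r_m(t - T)) / C
    return T

r_obs = np.array([0.0, 1.5])       # observer position, as in Fig. 4
T = retarded_delay(r_obs, t=0.1)   # propagation delay at observation time t
t0 = 0.1 - T                       # retarded (emission) time, Eq. (14)
```

Each iteration shrinks the error by roughly M = v/c ≈ 0.23 here, so a few dozen iterations are far more than enough.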

An example of the sound field generated by a moving point source is represented in Figure 4. The noticeable wavelength variation along the movement direction is referred to as the Doppler effect and is a typical feature of the field of moving sources, caused by the invariance of the sound velocity. It consists of a frequency variation perceived by a listener


moving relative to the source. The observed frequency is given by

f_{obs} = \frac{c \pm v_o}{c \pm v_s}\,f \quad (17)

where f_{obs} is the observed frequency, c is the sound speed, v_o and v_s are the observer and source velocities respectively, and f is the emitted frequency. Hence, a listener will experience a frequency f_{obs} < f while the source is moving away and a frequency f_{obs} > f when the sound source is approaching.
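Equation (17) is easy to check numerically. The sketch below adopts one common sign convention (positive velocities mean motion toward the counterpart, which resolves the ± in (17)); the chosen numbers are illustrative.

```python
C = 343.0  # sound speed [m/s]

def doppler(f, v_source, v_observer=0.0):
    """Observed frequency (Eq. 17) for motion along the line joining source and
    observer; positive velocities mean 'approaching'."""
    return f * (C + v_observer) / (C - v_source)

f = 500.0                                   # emitted frequency [Hz]
approaching = doppler(f, v_source=22.2)     # source at ~80 km/h toward a static listener
receding = doppler(f, v_source=-22.2)       # same speed, moving away
```

As expected, the approaching source is heard above 500 Hz and the receding one below it.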

3.2. Stationary formulation
In order to reproduce a moving sound source with static WFS driving functions, common approaches sample the source trajectory, reducing the calculation to a stationary problem, without explicitly considering the features of the sound field. In the presence of movement, the static driving function (6) becomes:

Q_m(\vec{r}_n,\omega,t) = \frac{S(\omega)}{D_n(\varphi,\omega)}\,\sqrt{\frac{jk}{2\pi}}\,\sqrt{\frac{|y_0-y_n|}{|y_0-y_m(t)|}}\,\cos\theta_n\,\frac{e^{-jk|\vec{r}_m(t)-\vec{r}_n|}}{\sqrt{|\vec{r}_m(t)-\vec{r}_n|}}. \quad (18)

The resulting filters are time-varying and are not suitable for a real-time implementation. In order to achieve computational efficiency, the source signal is divided into frames and the source trajectory is sampled. In this way, the driving function (18) reduces to (6) and can be evaluated as shown in Section 2.1. Nevertheless, this approach leads to heavy phase discontinuities between frames and cannot correctly represent particular features like the Doppler effect. An accurate analysis of the main artifacts deriving from this formulation is presented in [7]. Moreover, in [8] it is shown that a specific formulation of the secondary source driving functions, directly derived from the wave field of the moving source, is necessary in order to achieve a correct representation of the movement characteristics.

3.3. Proposed approach
The traditional formulation of WFS is not suitable to correctly represent moving sound sources for two main reasons:

• The sampled source trajectory causes heavy phase discontinuities in the output signals, resulting in audible artifacts. However, the discretization of the source positions is essential in order to calculate the driving functions in a real-time scenario.

• Traditional driving functions do not expressly take into account the Mach factor M, although this variable is highly related to the resulting field.

Fig. 5: Sound field of a source moving at 80 m/s parallel to the x-axis toward the positive x direction at y = −1 m, reproduced with conventional WFS, oscillating harmonically at 500 Hz. The array is composed of 64 loudspeakers equally spaced by 5 cm and is placed at y = 0. The frame size is 256 samples.

Even if a specific moving-source formulation of the WFS theory is presented in [8], to the authors' knowledge no feasible solutions for real-time scenarios have been presented in the literature so far. The basic idea of the proposed approach is the approximation of each moving virtual source with two sources. The former is the virtual source itself and the latter is a copy of the virtual source, named the retarded source, whose position is delayed by one frame time. The source audio signals are combined using a cross-fading technique based on ramp functions. A similar solution has been adopted in [9] for the computation of Head Related Transfer Functions in binaural reproduction. This method allows a smooth change between frames, suppressing phase jumps in the output signals. A scheme of the entire signal processing is reported in Figure 6. The signal elaboration is composed of two different filtering operations. The magnitude is derived within a Short Time


Fig. 6: Processing to synthesize the driving signal of a loudspeaker.

Fourier Transform (STFT) framework, and then a fractional delay is applied via the Overlap and Save (OLS) technique. The frame signal is the same for the virtual source and for the retarded source, multiplied by an ascending and a descending ramp function, respectively. Source positions are sampled at the end of the ramp. Each source signal is frequency filtered with a modified driving function. The signal Fourier transform is obtained by STFT in order to reduce the computational load. The driving function magnitude (8) described in Section 2.1 has been modified, according to (16), in order to better fit the moving source behavior and is given by

Q_m^{|\cdot|}(\vec{r}_n,\omega) = \frac{|\cos\theta_n|}{\sqrt{|\vec{r}_m-\vec{r}_n|}}\,\sqrt{\frac{|\omega|}{2\pi c}}\,\sqrt{\frac{|y_0-y_n|}{|y_0-y_m|}}\,\frac{\Delta x}{1-M\cos\alpha} \quad (19)

where \vec{r}_m points to the current frame position for the virtual source and to the previous frame position for the retarded source, and α represents the angle between the vectors \vec{v}_m and \vec{r}_n − \vec{r}_m (Figure 3). The driving function phase (9) is applied to the signals using fractional delays and shift registers. Efficient structures for the computation of fractional delays in WFS systems can be found in [10]. The two source signals obtained in this way are summed together in the output stream. The processing has to be repeated for each loudspeaker.
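The per-frame crossfade between the virtual source and the retarded source can be sketched as follows. This is a deliberately simplified illustration: the full driving-function filtering of (19) and the fractional delays are abstracted into per-source gains and integer delays, `np.roll` stands in for a proper shift register, and all names and values are hypothetical.

```python
import numpy as np

def render_frame(frame, g_curr, d_curr, g_prev, d_prev):
    """Mix one frame for one loudspeaker: the current-position (virtual) source is
    faded in by an ascending ramp while the previous-position (retarded) source is
    faded out by a descending ramp, suppressing inter-frame phase jumps."""
    N = len(frame)
    up = np.linspace(0.0, 1.0, N)    # ascending ramp for the virtual source
    down = 1.0 - up                  # descending ramp for the retarded source
    curr = g_curr * np.roll(frame, d_curr) * up
    prev = g_prev * np.roll(frame, d_prev) * down
    return curr + prev

# one 256-sample frame of a 500 Hz tone at 44.1 kHz (values illustrative)
frame = np.sin(2 * np.pi * 500.0 * np.arange(256) / 44100.0)
out = render_frame(frame, g_curr=0.8, d_curr=3, g_prev=0.82, d_prev=5)
```

At the frame start the output is entirely the retarded source; by the frame end it is entirely the virtual source, so the next frame starts from a position consistent with its own "previous" source.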

4. OBJECTIVE AND SUBJECTIVE RESULTS
The proposed algorithm has been evaluated with both PC simulations and subjective listening tests. The reproduced fields have been represented in the Matlab® environment in order to analyze the influence of the source velocity and frame size on the field artifacts. Figure 8 shows the sound field generated


by a linear array of 20 loudspeakers. It is easy to notice how higher source velocities require a shorter frame. Figure 9 shows the spectrograms of the synthesized signals. It is possible to notice the presence of copies of the original signal shifted in frequency; however, these duplicates are about 50 dB below the original signal. It is worth underlining how the signal repetitions get closer to the original one as the frame size increases. From the figures it is also possible to see that the Doppler effect follows the expected behavior.

4.1. Listening tests
The listening test has been executed following a modified version of the ITU-R BS.1534-1 ("MUSHRA") method [11], without hidden reference, as in [12, 13]. In order to easily calculate the driving functions when varying some parameters, the processing has been developed in a real-time framework called NU-Tech [14]. The algorithm has been used to drive a linear array of 16 loudspeakers equally spaced by 10 cm, for a total length of 1.6 m. The tests have been carried out in a non-anechoic room of 6 m × 6 m × 3.5 m with typical office furniture (desks, chairs, windows). A complete list of the equipment used is reported:

• PC with Intel Core 2 @ 2.5 GHz running Microsoft Windows 7 (32 bit).

• External audio interface MOTU 24 I/O PCI.

• M-Audio Speaker Studiophile AV20.

• Software NU-Tech ver. 2.3 beta.

The reproduced field has been presented to a group of eleven experienced listeners, composed of two women and nine men aged between 20 and 40, who were asked to evaluate the overall plausibility of the virtual source movement. They were asked to concentrate on velocity perception and the Doppler effect. Listeners evaluated two different signals moving along a circular trajectory centered 30 m behind the array with a radius of 15 m. The source velocity was randomly changed between 30, 60 and 100 km/h. Ratings have been reported on a linear continuous quality scale (CQS) between 0 (bad) and 100 (excellent).
Figure 7 shows the results of the listening test. The

Fig. 7: Perceived quality of the algorithm at various source velocities. The Mean value represents the mean rating of the three previous tests.

algorithm has obtained an overall good rating from all of the subjects, although the quality of the reproduced signals decreases as the source velocity increases, as expected from theory and PC simulations. However, the executed listening test has to be considered preliminary; further trials have to be done for a final evaluation.

5. CONCLUSIONS
In this paper, a novel approach to moving-source WFS has been presented. A real-time implementation has been considered, with particular attention to obtaining good movement reproduction quality. The proposed approach is based on the synthesis of two virtual sources, moving together spaced by the distance covered in one frame time. Each source reproduces the same signal weighted by an increasing and a decreasing ramp function, respectively. The algorithm quality has been evaluated with both PC simulations and listening tests, and the results have been presented. Tests showed an overall good perceived quality. Future developments will concern the further reduction of the computational load and a deeper investigation of the psychoacoustic perception of the movement.

6. REFERENCES

[1] A. J. Berkhout, D. De Vries, and P. Vogel,“Acoustic Control by Wave Field Synthesis,”J. Acoust. Soc. Amer., vol. 93, no. 5, pp. 2764–2778, May 1993.


[2] E. N. G. Verheijen, "Sound Reproduction by Wave Field Synthesis," Ph.D. dissertation, Delft University of Technology, Delft, The Netherlands, 1997.

[3] A. Franck, K. Brandenburg, and U. Richter, "Efficient Delay Interpolation for Wave Field Synthesis," in Proc. 125th Audio Engineering Society Convention, New York, NY, USA, Oct. 2008.

[4] P. Peretti, L. Romoli, S. Cecchi, L. Palestini, and F. Piazza, "Phase Approximation of Linear Geometry Driving Functions for Sound Field Synthesis," in Proc. European Signal Processing Conference, Aalborg, Denmark, Aug. 2010, pp. 1939–1943.

[5] L. Romoli, P. Peretti, S. Cecchi, L. Palestini, and F. Piazza, "Real-Time Implementation of Wave Field Synthesis for Sound Reproduction Systems," in Proc. IEEE Asia Pacific Conference on Circuits and Systems, Macao, China, Nov. 2008, pp. 430–433.

[6] A. T. de Hoop, "Fields and waves excited by impulsive point sources in motion – The general 3D time-domain Doppler effect," Wave Motion, vol. 43, pp. 116–122, Sept. 2005.

[7] A. Franck, A. Gräfe, T. Korn, and M. Strauß, "Reproduction of Moving Sound Sources by Wave Field Synthesis: An Analysis of Artifacts," in Proc. 32nd Audio Engineering Society International Conference, Hillerød, Denmark, Sept. 2007.

[8] J. Ahrens and S. Spors, "Reproduction of Moving Virtual Sound Sources with Special Attention to the Doppler Effect," in Proc. 124th Audio Engineering Society Convention, Amsterdam, The Netherlands, May 2008.

[9] C. Tsakostas and A. Floros, "Real-time Spatial Representation of Moving Sound Sources," in Proc. 123rd Audio Engineering Society Convention, New York, NY, USA, Oct. 2007.

[10] H. Zhao and J. Yu, "A Simple and Efficient Design of Variable Fractional Delay FIR Filters," vol. 53, no. 2, pp. 157–160, Feb. 2006.

[11] ITU-R BS.1534-1, "Method for subjective listening tests of intermediate audio quality," 2001.

[12] B. Klehs and T. Sporer, "Wave Field Synthesis in the Real World: Part 1 – In the Living Room," in Proc. 114th Audio Engineering Society Convention, Amsterdam, The Netherlands, Mar. 2003.

[13] T. Sporer and B. Klehs, "Wave Field Synthesis in the Real World: Part 2 – In the Movie Theatre," in Proc. 116th Audio Engineering Society Convention, Berlin, Germany, May 2004.

[14] A. Lattanzi, F. Bettarelli, and S. Cecchi, "NU-Tech: The Entry Tool of the hArtes Toolchain for Algorithms Design," in Proc. 124th Audio Engineering Society Convention, Amsterdam, The Netherlands, May 2008.


Fig. 8: Sound fields reproduced by a linear array of 20 loudspeakers equally spaced by 20 cm at y = 0 m, for a source moving parallel to the x-axis at various velocities and for different frame sizes. The source frequency is 500 Hz, sampled at 44100 Hz. Panels: (a) v = 20 m/s, frame size 128 points; (b) v = 20 m/s, 512 points; (c) v = 40 m/s, 128 points; (d) v = 40 m/s, 512 points; (e) v = 80 m/s, 128 points; (f) v = 80 m/s, 512 points.


Fig. 9: Spectrograms registered at \vec{r}_{obs} = [0, 1.5] m of the field reproduced by a linear array of 20 loudspeakers equally spaced by 20 cm at y = 0 m, for a source moving parallel to the x-axis at various velocities and for different frame sizes. The source frequency is 500 Hz, sampled at 44100 Hz. Panels: (a) v = 40 m/s, frame size 32 points; (b) v = 80 m/s, 32 points; (c) v = 40 m/s, 64 points; (d) v = 80 m/s, 64 points; (e) v = 40 m/s, 128 points; (f) v = 80 m/s, 128 points.
