

FEATURE ARTICLE

Time–frequency processing

Francis Rumsey, Staff Technical Writer

Audio event identification and sound-morphing techniques were two key themes of the AES 45th International Conference, Applications of Time-Frequency Processing in Audio.

As digital signal processing becomes more powerful, there is increasing scope for handling audio signals in the frequency domain, dividing the signal into narrow frequency bands and performing operations on those bands independently. This is not a new concept, having been used in applications such as low bit-rate coding for many years. But the amount of work in the field has become so great that the AES recently organized a conference in Finland, chaired by Ville Pulkki, devoted entirely to the topic. This article attempts to condense some of the contributions from that conference, taking what is essentially a heavily mathematical topic and presenting it in a more digestible form.

Because the human hearing mechanism is frequency-selective, and models of it tend to assume a number of so-called critical bands within which energy is integrated and masking takes place, a frequency-selective approach to audio signal processing lends itself well to applications that need to take into account features of human hearing. These include psychoacoustic models, low bit-rate coding, audio effects, quality enhancement, sound synthesis, semantic analysis, and speech processing. The concept of time–frequency (TF) processing comes into play because frequency-domain processing needs to take into account the temporal aspects of the signal. Transforms from the time to the frequency domain typically work on the basis of chopping the audio signal into a series of time windows. It is a basic principle of the theory underlying this field that there is a direct relationship between the time and frequency domains: the resolution available in one domain is directly connected to the resolution selected in the other.

One of the keys to success in algorithms that process audio in this way is to determine the optimum trade-off between time and frequency resolution to achieve the best quality, and this may be made variable or adaptive based on the signal content and other contextual information. For example, when a signal has rapid transients separated by relatively low-level continuous segments, there can be advantages to choosing relatively short time windows, whereas higher-level continuous signals can warrant the use of longer time windows.
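By way of illustration (this sketch is the editor's, not drawn from any conference paper), the following Python fragment shows how the window length of a short-time Fourier transform (STFT) trades time resolution against frequency resolution:

    # Minimal sketch: short vs. long STFT windows on a signal that
    # contains both a steady tone and a single-sample transient.
    import numpy as np
    from scipy.signal import stft

    fs = 48000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t)   # steady tone
    x[fs // 2] += 0.9                 # click: a single-sample transient

    # Short windows: good time resolution, coarse frequency resolution.
    f_s, t_s, X_short = stft(x, fs, nperseg=256)
    # Long windows: fine frequency resolution, but the click is smeared.
    f_l, t_l, X_long = stft(x, fs, nperseg=4096)

    print(X_short.shape, X_long.shape)   # 129 vs. 2049 frequency bins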

We concentrate here on two specific applications of TF processing in audio engineering, represented prominently at the conference, namely the identification or classification of sounds and audio morphing.

IDENTIFYING AUDIO EVENTS

One of the primary themes that emerged from the papers presented at the conference was the use of time–frequency processing in the identification of audio events. This can include the automatic isolation of specific features of an audio signal such as percussive sounds, or higher-level content recognition and classification such as the identification of sounds made by particular individuals or things.

THE BENEFITS OF BLT

In his paper "Detection of Audio Events by Boosted Learning of Local Time–Frequency Patterns," Aki Härmä talks about a classifier that he calls BLT, which in this case is not bacon, lettuce, and tomato but "boosted learning of local time–frequency patterns," based on a previous algorithm known as AdaBoost. It is applied particularly to the detection of short audio events such as footsteps in environmental recordings and percussion sounds in music. Such sounds have a distinctive template that makes them stand out from others such as speech or musical tones. Härmä likens the problem of detecting them to similar problems of object recognition in machine vision. However, the prototype templates that are needed as training data for pattern matching can be confused by local variants in the signal such as reverberation and background noise, so there are advantages to deriving the training data from the local signal. In the BLT process the training data are derived from a small number of locally collected TF patterns of interesting events.

BLT uses a nonuniform frequency scale, partly because footsteps and other natural sounds have more energy at low frequencies than at high, and partly because the human hearing system employs an approximately logarithmic frequency scale. These things inform an assumption that nonuniform frequency scaling may be beneficial in the recognition of natural sounds. BLT also takes account of a combination of spectrum and temporal envelope features, splitting the signal into a number of short-time power spectrum estimates, grouped into auditory bands. Template matching is undertaken using a series of weak classifiers, which are trained on successive similar events in the signal (e.g., a series of footsteps). Features of the signal are selected that most successfully divide the training data into "events" and "nonevents," which in Härmä's case requires that training recordings of ten individuals walking in a hallway be manually annotated to separate real footstep sounds from other similar noises in the recording.
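As a rough illustration of the general flavor of such a pipeline, and emphatically not Härmä's published algorithm, the sketch below groups short-time power spectra into logarithmically spaced bands and feeds flattened local TF patches around annotated event times to an off-the-shelf AdaBoost classifier; the band edges, patch width, and helper names are all assumptions.

    # BLT-flavored sketch (assumed parameters, not the published method):
    # log-spaced band energies + AdaBoost over local TF patches.
    import numpy as np
    from scipy.signal import stft
    from sklearn.ensemble import AdaBoostClassifier

    def log_band_patterns(x, fs, n_bands=24, nperseg=512):
        """Short-time power spectra grouped into log-spaced bands."""
        f, t, X = stft(x, fs, nperseg=nperseg)
        power = np.abs(X) ** 2
        edges = np.geomspace(50.0, fs / 2, n_bands + 1)  # nonuniform scale
        bands = np.stack([power[(f >= lo) & (f < hi)].sum(axis=0)
                          for lo, hi in zip(edges[:-1], edges[1:])])
        return bands, t                                  # (n_bands, n_frames)

    def patches_at(bands, t, event_times, width=8):
        """Fixed-size local TF patches centered on annotated times."""
        idx = np.searchsorted(t, event_times)
        half = width // 2
        return np.stack([bands[:, i - half:i + half].ravel()
                         for i in idx
                         if half <= i <= bands.shape[1] - half])

    # Hypothetical usage with manually annotated footstep/noise times:
    # bands, t = log_band_patterns(recording, fs)
    # X_train = np.vstack([patches_at(bands, t, step_times),
    #                      patches_at(bands, t, noise_times)])
    # y_train = np.r_[np.ones(len(step_times)), np.zeros(len(noise_times))]
    # clf = AdaBoostClassifier(n_estimators=50).fit(X_train, y_train)

AdaBoost's boosted ensemble of weak learners corresponds to the "series of weak classifiers" described above; everything else here is a placeholder.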


In testing the BLT algorithm on other real recordings, footstep sounds were combined with other sounds such as door slams, speech, air-conditioning, and printer noise. Another round of testing used a song in which snare drum hits were to be identified from among guitars and a vocalist. BLT was compared with another typical sound classification method and found to perform considerably better in the case of the footstep sounds. In the case of the drum sounds the performances of the two methods were similar. Härmä concludes that BLT has the potential to perform well on noisy and difficult data.

SONIC HANDPRINTS

Antti Jylhä and his colleagues explain that nonspeech sounds such as hand claps may be able to convey enough information to identify a person. This is particularly interesting in security applications where multiple sources of information can be employed to identify individuals. In "Sonic Handprints: Person Identification with Hand Clapping Sounds by a Model-Based Method" they note that people can often distinguish their own hand claps from those of others, and that automatic recognition algorithms trained on one person's hand claps do not necessarily work well on another's.

Similarly to the previously mentioned work by Härmä, the challenge in this case is to distinguish subtle differences between percussive sounds, again using a form of template matching. The authors adapted a probabilistic model involving a hidden Markov model, developed by others for pitch tracking in streamed audio, turning it into one capable of dealing with predefined, short percussive events. Because reverberation in recordings can degrade the recognition accuracy of such systems, some post-processing was employed to skip over ten frames after the start of each detected event in order to reduce this problem.

When testing the algorithm, a sound bank of hand claps was captured using a laptop computer microphone from 16 different people, in a room with a reverberation time of about 0.7 seconds. No particular effort was made to control the style of clapping, in order to capture the most natural sounds possible. A series of spectral templates was derived from a subset of the hand-clap recordings used as training data, shown in Fig. 1. Here it can be seen how different the templates of each subject are. In the case of some people the way they used their hands changed over the clapping sequence, so the spectral shape of the templates is based on an overall pattern of what was essentially a changing phenomenon. The classification accuracy was 64%, which is well above the chance level of 6.25% that applies to this number of individuals. Inevitably the performance was particularly good if a person's template contained unique regions of high energy that differentiated them clearly from others, and also if they were particularly consistent in their clapping.
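A much-simplified sketch of the general idea of spectral-template matching follows; note that the paper itself uses a hidden-Markov-model-based method, not this nearest-template shortcut, and all names here are hypothetical.

    # Simplified nearest-template hand-clap classifier (illustrative only;
    # the paper's method is HMM-based, not a nearest-template search).
    import numpy as np
    from scipy.signal import stft

    def spectral_template(claps, fs, nperseg=512):
        """Average normalized power spectrum over one person's claps."""
        specs = []
        for clap in claps:
            f, t, X = stft(clap, fs, nperseg=nperseg)
            s = (np.abs(X) ** 2).mean(axis=1)   # average over time frames
            specs.append(s / (s.sum() + 1e-12)) # normalize total energy
        return np.mean(specs, axis=0)

    def identify(clap, templates, fs, nperseg=512):
        """Return the name whose template is closest to this clap."""
        f, t, X = stft(clap, fs, nperseg=nperseg)
        s = (np.abs(X) ** 2).mean(axis=1)
        s /= s.sum() + 1e-12
        return min(templates,
                   key=lambda name: np.linalg.norm(s - templates[name]))

    # Hypothetical usage:
    # templates = {p: spectral_template(train_claps[p], fs) for p in people}
    # who = identify(test_clap, templates, fs)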

EVALUATING BUTTON SOUNDS

Another possible application of TF processing in the identification of audio events is described by Kensuke Fujinoki and his colleagues in "Automated Evaluation for Button Sounds from Wavelet-Based Features." Here the aim is to find features of the audio signals arising from button pushes that distinguish the quality of one from that of another. The authors describe their use in a previous study of a particular type of TF transform whose resolution in both domains corresponds closely to the characteristics of human hearing: the continuous wavelet transform (CWT).

The waveform of a typical push-button sound is shown in Fig. 2, along with its TF representation using wavelets. In this study a slightly different method using triangular biorthogonal wavelets is employed, which enables a complicated multiscale pyramid representation of the signal in three dimensions. From this the cumulative sound pressure and reverberation elements of the signal can be identified, with a better rejection of background noise than when simply measuring the sound pressure of the original signal.
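As a point of reference only (the paper's triangular biorthogonal wavelets are not part of standard libraries), a basic Morlet CWT of a click-like signal might look like this, assuming the PyWavelets package is available:

    # Continuous wavelet transform of a click-like signal using a Morlet
    # wavelet (illustrative; not the paper's triangular biorthogonal CWT).
    import numpy as np
    import pywt

    fs = 48000
    t = np.arange(int(0.05 * fs)) / fs
    # Crude stand-in for a button push: decaying noise burst.
    x = np.random.randn(t.size) * np.exp(-t * 200)

    scales = np.geomspace(2, 256, 64)   # small scale = high frequency
    coef, freqs = pywt.cwt(x, scales, 'morl', sampling_period=1 / fs)
    print(coef.shape)   # (64, n_samples): one row per scale
    print(freqs[:3])    # approximate center frequencies in Hz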

In a psychoacoustic experiment the authors employed a semantic differential method (one involving distinctions between two opposing verbal terms) to determine descriptive features of the button-push sounds that could be mapped to the features extracted from the audio signals. Some success was had in using the wavelet-based features to enable the automatic recognition of listeners' preference–annoyance scores for the different button-push sounds.

SOUND MORPHING USING WAVELETS

Aside from its use in the identification and classification of sounds, TF processing can also be used in sound-morphing applications. Morphing involves either the gradual transformation of one sound into another or the hybridization of two sounds, so that the characteristics of one are superimposed on those of another.

Fig. 1. Spectral templates of 16 subjects' hand claps used in template matching (courtesy Jylhä et al.)

WHAT'S A HIDDEN MARKOV MODEL?

Hidden Markov models are used quite widely in audio recognition systems. The Markov assumption is that a system's next state depends only on its current state, rather than on the entire history of previous states (higher-order models depend on a few recent states). A hidden Markov model attempts to infer information about a hidden process that is creating observable information, enabling it to estimate, with a certain degree of probability, the most likely features of the hidden process.
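A minimal sketch of the standard "forward algorithm," which computes the likelihood of an observation sequence under an HMM, follows; the two-state model and its numbers are invented purely for illustration.

    # Forward algorithm for a tiny discrete HMM (textbook version; the
    # two-state model and its probabilities are invented for illustration).
    import numpy as np

    A = np.array([[0.9, 0.1],   # state transition probabilities
                  [0.2, 0.8]])
    B = np.array([[0.7, 0.3],   # P(observation symbol | state)
                  [0.1, 0.9]])
    pi = np.array([0.5, 0.5])   # initial state distribution

    def forward_likelihood(obs):
        """P(observation sequence), summed over all hidden state paths."""
        alpha = pi * B[:, obs[0]]
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
        return alpha.sum()

    print(forward_likelihood([0, 0, 1, 1]))  # likelihood of a short sequence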

Fig. 2. (a) Typical waveform of a push-button sound. (b) Time–frequency representation of the same sound using Morlet wavelets. Different colors represent the sound amplitude at a point on the plot, according to the right-hand scale (courtesy Fujinoki et al.).

It cannot usually be done by just a simple mixture of two sounds in the time domain, as the ear is very good at identifying the original components of linear mixtures and decomposing them perceptually. The new sound has to be a true product of the two original sounds, which usually means some sort of convolution, or feature extraction and interpolation.

Gabrielli and Squartini describe a new process for achieving this in "Ibrida: A New DWT-Domain Sound Hybridization Tool." Like the sound-analysis method mentioned in the previous section, Ibrida employs a wavelet transform to decompose the original sound into its spectral elements, because of its nonuniform time–frequency resolution (preserving signal transients) and low computational requirements. This is followed by an inverse wavelet transform to put the signal back in the time domain.

Advantages of the nonuniform characteristics of the wavelet transform over alternatives such as the short-time Fourier transform (STFT) include the likelihood that a greater number of the transformed frequency bands will cover the main content of music and speech signals. Uniform transforms tend to result in a lot of the bands being in regions of the spectrum where there isn't much action in the audio signal, so they are largely irrelevant.

The idea of hybridizing more than one sound source is taken back to its earliest instances, considering instruments such as the jaw harp (in which the vocal tract interacts with the instrument) and the didgeridoo, in which the resonances of the vocal tract interact with a resonating wooden tube. In modern electronic-music terms we tend to speak of cross synthesis, vocoding, and morphing. Fig. 3 shows the basic process involved, in which a number of original signals are subjected to a discrete wavelet transform (DWT), resulting in a set of subband signals (in separated frequency bands). The equivalent subband signals from each source are mixed in appropriate proportions to create the new sound, then the resulting signal is subjected to an inverse discrete wavelet transform (IDWT) to return it to the time domain. By adjusting the proportions of the different signals mixed together in each frequency band, the characteristics of the output sound can be made more or less like one of the input signals in different regions of the spectrum. Another advantage of the DWT in this context is said to be that it has a relatively small number of control parameters, which makes for easy user interaction.
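A minimal sketch of that DWT-domain mixing step, assuming PyWavelets and equal-length mono inputs, is shown below; it is a bare illustration of the principle, not the Ibrida implementation, and the wavelet choice and weights are arbitrary.

    # DWT-domain hybridization of two sounds (bare sketch of the principle;
    # not the Ibrida implementation). Assumes equal-length mono inputs.
    import numpy as np
    import pywt

    def hybridize(x1, x2, weights, wavelet='db8', level=6):
        """Mix corresponding DWT subbands of two signals.

        weights: one value in [0, 1] per subband (level + 1 of them);
        0 takes that band entirely from x1, 1 entirely from x2.
        """
        c1 = pywt.wavedec(x1, wavelet, level=level)
        c2 = pywt.wavedec(x2, wavelet, level=level)
        mixed = [(1 - w) * a + w * b for w, a, b in zip(weights, c1, c2)]
        return pywt.waverec(mixed, wavelet)

    # Hypothetical usage: low bands mostly from x1, high bands from x2.
    # y = hybridize(x1, x2, weights=np.linspace(0.0, 1.0, 7))

The per-band weight vector plays the role of the small set of control parameters mentioned above.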

The authors implemented a version of Ibrida in the Pure Data graphical programming environment, which enables the hybridization of two mono inputs with control of the wavelet coefficients and mixing parameters. It can be downloaded in prototype form from http://a3lab.dibet.univpm.it/downloads/Ibrida-tool.

MORPHING USING A GABOR MASK

Olivero et al., in "Sound Morphing Strategies Based on Alterations of Time–Frequency Representations by Gabor Multipliers," define a sound morph as a hybrid sound whose timbre is intermediate between a source and a target sound, having the same fundamental frequency, duration, and loudness. Many traditional approaches to morphing are said to work according to a sinusoid-plus-noise model of the signal, in which the original signals are represented as a summation of harmonic partials.

ALTERNATIVE TIME–FREQUENCY TRANSFORMS

The original TF transform devised by Fourier assumed that the sound being transformed continued over an infinite period without changing, resulting in a representation in the frequency domain by a set of sine-wave components and their phases. Short-time granular transforms, such as the Gabor and wavelet transforms, divide the signal into a series of small time units, allowing the spectral components and their phases to be tracked as the signal changes over time. In the Gabor case the units are known as atoms, and time windowing involves multiplying the time-domain signal by a Gaussian function before using the short-time Fourier transform to move it into the frequency domain. Wavelet transforms involve the decomposition of complex signals into a set of wavelets, which are brief wave-like oscillations that start and end with zero amplitude, building to a peak in the center of the time period, such as shown here.

Typical pattern of a wavelet, in this case showing the real (blue) and imaginary (red dotted) parts of a complex Morlet wavelet. The Morlet (or Gabor) wavelet has characteristics closely related to human perception (courtesy Fujinoki et al.).
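For reference (not taken from the conference papers, and with normalization conventions that vary between texts), the standard textbook form of a Gabor atom (a Gaussian-windowed complex sinusoid) and of the complex Morlet wavelet, ignoring the small correction term needed for strict admissibility, can be written as:

    g_{u,\xi}(t) = (\sigma^{2}\pi)^{-1/4}\, e^{-(t-u)^{2}/(2\sigma^{2})}\, e^{i\xi t}

    \psi(t) = \pi^{-1/4}\, e^{i\omega_{0} t}\, e^{-t^{2}/2},
    \qquad
    \psi_{a,b}(t) = a^{-1/2}\, \psi\!\left(\frac{t-b}{a}\right)

Here u and ξ are the time and frequency of the atom, σ sets the Gaussian window width, ω₀ is the wavelet's center frequency, and a and b are the scale and time shift of the wavelet family.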


(Morphing can be achieved in this approach by interpolating between the partials in the frequency domain to obtain a timbre that lies between the two original signals.) These authors, however, prefer not to assume a formal signal model and deal directly with the TF representation of sounds. One of the central planks of their approach is the Gabor TF representation of sounds. The Gabor transform is essentially a way of converting a signal from the time to the frequency domain that allows for the fact that real sounds change over time.

Gabor multipliers are essentially functions that alter the level and phase of groups of frequency bands in the TF representation of the signal so as to perform a modification to the spectrum during each short time frame. They can be used to create time-varying filters as long as the signal does not have massive TF shifts.

The process is a form of convolution, whereby the filter function is convolved with the signal by point-wise multiplication in the frequency domain. The Gabor mask is the spectral shape, or transfer function, of the multiplier needed to achieve a convincing morph between the two original sounds.
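A minimal sketch of applying such a mask by point-wise multiplication follows, using SciPy's STFT with a Gaussian window as a stand-in for a proper Gabor frame; the mask itself (a cutoff that falls over time) is arbitrary and purely for illustration.

    # Point-wise TF mask applied in the STFT domain (a stand-in for a
    # Gabor multiplier; the Gaussian-windowed STFT approximates a Gabor frame).
    import numpy as np
    from scipy.signal import stft, istft, get_window

    fs, nperseg = 48000, 1024
    win = get_window(('gaussian', nperseg / 8), nperseg)  # Gaussian window
    x = np.random.randn(fs)                               # placeholder signal

    f, t, X = stft(x, fs, window=win, nperseg=nperseg)

    # Arbitrary illustrative mask: a low-pass cutoff that falls over time.
    cutoff = np.linspace(8000, 1000, t.size)              # Hz, per frame
    mask = (f[:, None] < cutoff[None, :]).astype(float)

    _, y = istft(X * mask, fs, window=win, nperseg=nperseg)

A real morphing mask would instead be estimated from the source and target spectrograms, as discussed next.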

Various approaches to estimating suitable masks for morphing are discussed by the authors of this paper, including values that give rise to conventional cross-synthesis morphing effects such as addition (simple mixing) and multiplication (essentially filtering of one signal by another) of the source and target spectrograms. As this represented Ph.D. work in progress, there was considerably more work to be done before a clear conclusion could be reached about the best choice of parameters and control methods for high-quality morphing using this approach.

MORPHING OF PERCUSSIVE SOUNDS

The sinusoidal model of sounds mentioned at the beginning of the previous section is not particularly suitable for use when morphing noisy sounds such as percussion instruments. Andrea Primavera and his colleagues therefore discuss an alternative that can be used for automatic morphing of such sounds in their paper "Audio Morphing for Percussive Hybrid Sound Generation." It involves two main features, namely preprocessing of the original signals in the frequency domain, followed by linear interpolation in the time domain.

A basic block diagram of the approach is shown in Fig. 4. The main elements of preprocessing involve time alignment of the original signals and scaling of the release portion of the sound envelopes so that the two original sounds have the same length. This is not applied to the attack portion of the envelopes, because the attack is crucial to a sound's unique perceptual identity. In order to determine the attack and release portions of the percussive sounds, the authors make use of the Amplitude Centroid Trajectory (ACT) and the reverberation time of the sound (the time taken for it to decay by 60 dB). The ACT model is based on an evaluation of the spectral centroid, which essentially determines the primary energy peak in the frequency spectrum. As shown in Fig. 5, the boundary between the attack and the decay is determined to be the point where the slope of the spectral centroid changes direction.
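A rough sketch of that idea follows; the frame sizes and the simplistic slope test are assumptions, not the authors' exact ACT procedure.

    # Spectral-centroid trajectory and a naive attack/release boundary
    # (illustrative; parameters and slope test are assumptions).
    import numpy as np
    from scipy.signal import stft

    def centroid_trajectory(x, fs, nperseg=1024):
        """Spectral centroid (Hz) of each STFT frame."""
        f, t, X = stft(x, fs, nperseg=nperseg)
        power = np.abs(X) ** 2
        c = (f[:, None] * power).sum(axis=0) / (power.sum(axis=0) + 1e-12)
        return c, t

    def attack_end(x, fs):
        """Time at which the centroid slope first changes sign."""
        c, t = centroid_trajectory(x, fs)
        slope = np.diff(c)
        sign_change = np.nonzero(slope[:-1] * slope[1:] < 0)[0]
        return t[sign_change[0] + 1] if sign_change.size else t[-1]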

It is explained that linear interpolation in the time domain between two sets of original audio samples tends not to be very effective as a high-quality morphing technique, because one can hear the original sounds in the morphed result. This happens most when the two sounds are noticeably different from each other. In this case, however, it is claimed to be appropriate, because the drum sounds concerned often have a similar pitch to each other and a very noisy spectrum, which makes the approach more suitable than alternatives based on additive synthesis. In order to confirm this, the authors attempt some subjective and objective comparisons of the various alternatives, namely linear interpolation on its own, the new approach including a preprocessing stage, and sinusoid-plus-noise modeling.
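In its simplest form the chain described above reduces to aligning the sounds, stretching the release of the shorter one, and taking a weighted sum; the sketch below is a bare-bones reading of that process, with the resampling method and function names assumed rather than taken from the paper.

    # Minimal reading of the morphing chain (assumptions throughout):
    # stretch each release by resampling so lengths match, then
    # linearly interpolate in the time domain.
    import numpy as np

    def stretch_release(x, attack_len, target_len):
        """Resample only the release so len(result) == target_len."""
        attack, release = x[:attack_len], x[attack_len:]
        n_out = target_len - attack_len
        idx = np.linspace(0, release.size - 1, n_out)
        stretched = np.interp(idx, np.arange(release.size), release)
        return np.concatenate([attack, stretched])

    def morph(x1, x2, attack1, attack2, alpha):
        """alpha = 0 gives x1, alpha = 1 gives x2."""
        n = max(x1.size, x2.size)
        y1 = stretch_release(x1, attack1, n)
        y2 = stretch_release(x2, attack2, n)
        return (1 - alpha) * y1 + alpha * y2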

In the subjective tests listeners were asked to estimate the interpolation factor between the two original sounds (in other words, how much of each was contained in the morphed result), along with the quality (defined in terms of naturalness).


Fig. 3. Basic operation of the wavelet transform-based sound hybridization approach (courtesy Gabrielli and Squartini)

The estimated interpolation factor was found to be very close to the physical interpolation factor when using the new approach involving preprocessing, whereas with conventional interpolation the relationship was nothing like linear (a perceived factor of 0.8 related to a physical factor of around 0.5, for example). The morphed sounds created using sinusoidal modeling were perceived as highly unnatural, which confirmed the opinion that this method was unsuitable for percussive sounds. A multidimensional scaling experiment in which linearly interpolated sounds were compared in pairs for their similarity demonstrated that release time seemed to be the most important feature in the perception of these hybrid percussive sounds.

CONCLUSION

Audio identification and classification systems, as well as sound-morphing algorithms, depend heavily on time–frequency processing to achieve high-quality results. The emphasis of current research seems to be on the choice of transform used when moving between domains and on the precise ways in which the transfer functions and frequency scalings are determined in order to achieve the most successful results. Formal psychoacoustic experiments are increasingly used to evaluate the outcomes of the TF processing algorithms used in these fields, which lends another dimension to what has hitherto been a field primarily concerned with signal processing challenges.

Fig. 4. Block diagram of a morphing process for percussive sounds, showing frequency-domain processes in yellow and time-domain processes in blue (Figs. 4 and 5 courtesy Primavera et al.).

Fig. 5. Time evolution of the spectral centroid (shown in red) for a percussive signal waveform (shown in blue). The estimated boundary between attack and release portions is shown in green.



Editor's note: Purchase a CD-ROM with all AES 45th Conference papers at www.aes.org/publications/conferences/. To purchase individual papers go to www.aes.org/e-lib/.