12
Approximating the head-related transfer function using simple geometric models of the head and torso V. Ralph Algazi and Richard O. Duda a) CIPIC, Center for Image Processing and Integrated Computing, University of California, Davis, California 95616 Ramani Duraiswami, Nail A. Gumerov, and Zhihui Tang Perceptual Interfaces and Reality Laboratory, Institute for Advanced Computer Studies, University of Maryland, College Park, Maryland 20742 ~Received 5 April 2002; accepted for publication 1 August 2002! The head-related transfer function ~HRTF! for distant sources is a complicated function of azimuth, elevation and frequency. This paper presents simple geometric models of the head and torso that provide insight into its low-frequency behavior, especially at low elevations. The head-and-torso models are obtained by adding both spherical and ellipsoidal models of the torso to a classical spherical-head model. Two different numerical techniques—multipole reexpansion and boundary element methods—are used to compute the HRTF of the models in both the frequency domain and the time domain. These computed HRTFs quantify the characteristics of elevation-dependent torso reflections for sources above the torso-shadow cone, and reveal the qualitatively different effects of torso shadow for sources within the torso-shadow cone. These effects include a torso bright spot that is prominent for the spherical torso, and significant attenuation of frequencies above 1 kHz in a range of elevations. Both torso reflections and torso shadow provide potentially significant elevation cues. Comparisons of the model HRTF with acoustic measurements in the horizontal, median, and frontal planes confirm the basic validity of the computational methods and establish that the geometric models provide good approximations of the HRTF for the KEMAR mannequin with its pinnae removed. © 2002 Acoustical Society of America. @DOI: 10.1121/1.1508780# PACS numbers: 43.64.Bt, 43.66.Qp, 43.66.Pn @LHC# I. INTRODUCTION A. Variation of the HRTF with elevation Head-related transfer functions ~HRTFs! are central to spatial hearing, and have been studied extensively ~Blauert, 1997; Carlile, 1996; Wightman and Kistler, 1997!. The HRTF depends not only on the position of the sound source relative to the listener, but also on the size and shape of the listener’s torso, head, and pinnae. The resulting complexity makes its behavior difficult to understand. In this paper, we investigate the HRTFs for distant sources using very simple geometric models of the head and torso to gain insight into various features observed in acous- tically measured human HRTFs. The simplest informative model is the spherical-head model. Introduced by Lord Ray- leigh almost a century ago ~Strutt, 1907!, it has been used by many researchers to explain how the head affects the inci- dent sound field ~Hartley and Fry, 1921; Kuhn, 1977, 1987; Brungart and Rabinowitz, 1999!. Although this model pro- vides only a crude approximation to a human HRTF, it yields a first-order explanation and approximation of how the inter- aural time difference ~ITD! and the interaural level differ- ence ~ILD! vary with azimuth and range. However, the spherical-head model does not provide any cues for elevation. 1 It is well established that the pinna pro- vides the major source of elevation cues ~Batteau, 1967; Gardner and Gardner, 1973; Wright et al., 1974!. 2 The effect of the pinna on the HRTF has been studied both experimen- tally ~Mehrgardt and Mellert, 1977; Shaw, 1974, 1997; Wightman and Kistler, 1989! and computationally ~Lopez- Poveda and Meddis, 1996; Kahana et al., 1999; Kahana and Nelson, 2000; Katz, 2001!. This work shows that the influ- ence of the pinna is negligible below about 3 kHz, but is both significant and complicated at frequencies where the wave- length is short compared to the size of the pinna. The torso also influences the HRTF and provides eleva- tion dependent information ~Kuhn and Gurnsey, 1983; Kuhn, 1987; Genuit and Platte, 1981!. Although torso cues are not as strong as pinna cues perceptually, they appear at lower frequencies where typical sound signals have most of their energy. It has been shown that a simple ellipsoidal model of the torso can be used to calculate a torso reflection, and that such reflections provide significant elevation cues away from the median plane, even for sources having no spectral energy above 3 kHz ~Algazi et al., 2001a!. However, reflection is a short-wavelength or high- frequency concept, and modeling the effects of the torso by a specular reflection is only a first approximation. Further- more, as the source descends in elevation, a point of grazing incidence is reached, below which torso reflections disappear and torso shadowing emerges. Rays drawn from the ear to points of tangency around the upper torso define a cone that we call the torso-shadow cone ~see Fig. 1!. Clearly, the specular reflection model does not apply within the torso- shadow cone. Instead, diffraction and scattering produce a qualitatively different behavior, characterized by the attenu- a! Electronic mail: [email protected] 2053 J. Acoust. Soc. Am. 112 (5), Pt. 1, Nov. 2002 0001-4966/2002/112(5)/2053/12/$19.00 © 2002 Acoustical Society of America

Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

Embed Size (px)

Citation preview

Page 1: Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

Approximating the head-related transfer function using simplegeometric models of the head and torso

V. Ralph Algazi and Richard O. Dudaa)

CIPIC, Center for Image Processing and Integrated Computing, University of California, Davis,California 95616

Ramani Duraiswami, Nail A. Gumerov, and Zhihui TangPerceptual Interfaces and Reality Laboratory, Institute for Advanced Computer Studies,University of Maryland, College Park, Maryland 20742

~Received 5 April 2002; accepted for publication 1 August 2002!

The head-related transfer function~HRTF! for distant sources is a complicated function of azimuth,elevation and frequency. This paper presents simple geometric models of the head and torso thatprovide insight into its low-frequency behavior, especially at low elevations. The head-and-torsomodels are obtained by adding both spherical and ellipsoidal models of the torso to a classicalspherical-head model. Two different numerical techniques—multipole reexpansion and boundaryelement methods—are used to compute the HRTF of the models in both the frequency domain andthe time domain. These computed HRTFs quantify the characteristics of elevation-dependent torsoreflections for sources above the torso-shadow cone, and reveal the qualitatively different effects oftorso shadow for sources within the torso-shadow cone. These effects include a torso bright spot thatis prominent for the spherical torso, and significant attenuation of frequencies above 1 kHz in arange of elevations. Both torso reflections and torso shadow provide potentially significant elevationcues. Comparisons of the model HRTF with acoustic measurements in the horizontal, median, andfrontal planes confirm the basic validity of the computational methods and establish that thegeometric models provide good approximations of the HRTF for the KEMAR mannequin with itspinnae removed. ©2002 Acoustical Society of America.@DOI: 10.1121/1.1508780#

PACS numbers: 43.64.Bt, 43.66.Qp, 43.66.Pn@LHC#

rct

xi

nauivay

inc;

lder-

an-

en-7;

thve-

va-,twer

heirl ofthatomergy

h-y a

er-zingearr tothat

so-e au-

I. INTRODUCTION

A. Variation of the HRTF with elevation

Head-related transfer functions~HRTFs! are central tospatial hearing, and have been studied extensively~Blauert,1997; Carlile, 1996; Wightman and Kistler, 1997!. TheHRTF depends not only on the position of the sound sourelative to the listener, but also on the size and shape oflistener’s torso, head, and pinnae. The resulting complemakes its behavior difficult to understand.

In this paper, we investigate the HRTFs for distasources using very simple geometric models of the headtorso to gain insight into various features observed in acotically measured human HRTFs. The simplest informatmodel is the spherical-head model. Introduced by Lord Rleigh almost a century ago~Strutt, 1907!, it has been used bymany researchers to explain how the head affects thedent sound field~Hartley and Fry, 1921; Kuhn, 1977, 1987Brungart and Rabinowitz, 1999!. Although this model pro-vides only a crude approximation to a human HRTF, it yiea first-order explanation and approximation of how the intaural time difference~ITD! and the interaural level difference~ILD ! vary with azimuth and range.

However, the spherical-head model does not providecues for elevation.1 It is well established that the pinna provides the major source of elevation cues~Batteau, 1967;Gardner and Gardner, 1973; Wrightet al., 1974!.2 The effect

a!Electronic mail: [email protected]

J. Acoust. Soc. Am. 112 (5), Pt. 1, Nov. 2002 0001-4966/2002/112(5)/2

ehety

tnds-e-

i-

s-

y

of the pinna on the HRTF has been studied both experimtally ~Mehrgardt and Mellert, 1977; Shaw, 1974, 199Wightman and Kistler, 1989! and computationally~Lopez-Poveda and Meddis, 1996; Kahanaet al., 1999; Kahana andNelson, 2000; Katz, 2001!. This work shows that the influ-ence of the pinna is negligible below about 3 kHz, but is bosignificant and complicated at frequencies where the walength is short compared to the size of the pinna.

The torso also influences the HRTF and provides eletion dependent information~Kuhn and Gurnsey, 1983; Kuhn1987; Genuit and Platte, 1981!. Although torso cues are noas strong as pinna cues perceptually, they appear at lofrequencies where typical sound signals have most of tenergy. It has been shown that a simple ellipsoidal modethe torso can be used to calculate a torso reflection, andsuch reflections provide significant elevation cues away frthe median plane, even for sources having no spectral enabove 3 kHz~Algazi et al., 2001a!.

However, reflection is a short-wavelength or higfrequency concept, and modeling the effects of the torso bspecular reflection is only a first approximation. Furthmore, as the source descends in elevation, a point of graincidence is reached, below which torso reflections disappand torso shadowing emerges. Rays drawn from the eapoints of tangency around the upper torso define a conewe call the torso-shadow cone~see Fig. 1!. Clearly, thespecular reflection model does not apply within the torshadow cone. Instead, diffraction and scattering producqualitatively different behavior, characterized by the atten

2053053/12/$19.00 © 2002 Acoustical Society of America

Page 2: Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

b

oududmdr

A,inm

oromoiffiiolasoueF

rspve

o

anba

er

th

tages

iza-nge,

re-sen-

aand

itythe

hisedtic

mp-ffsingtly,gerier

y ans,

ofateoutre,very

es

r is

gle

s for

gd to

ing

usede-

besen beent

rob-d.ur-RTFalud-ry-

the runow

ation of high frequencies when the wavelength is comparato or smaller than the size of the torso.

There are several reasons why the effect of the torsothe behavior of the HRTF for sources in the complete srounding sphere has not been systematically measurestudied. First, the lengthy measurement process preclasking human subjects to stand motionless, and seatedsurements at low elevations are influenced by posture anthe supporting chair, which introduces too many arbitravariables. Although the use of a mannequin such as KEM~Burkhardt and Sachs, 1975! solves this particular problemat very low elevations a truncated torso introduces meanless artifacts of its own, and one would have to use a coplete mannequin with intact arms and legs. Second, the teffects appear at relatively low frequencies, where roreflections—even in anechoic chambers—make it hard totain accurate measurements. Third, it is experimentally dcult to place sufficiently large loudspeakers in the regdirectly below the subject. These obstacles have led to aof knowledge of HRTF behavior for sources in the torshadow cone, a lack that may be responsible for the freqobservation that virtual sources synthesized with HRTrarely appear to come from really low elevations.

B. Methods for determining the HRTF for thesnowman model

To gain a better understanding of the effects of the toon the HRTF at all frequencies and elevations, a simhead-and-torso model called the snowman model was intigated. In its simplest form, the snowman model consistsa spherical head located above a spherical torso~Gumerovet al., 2002!. Unlike the isolated sphere, there is no eleginfinite-series solution for the scattering of sound wavesthe snowman model. However, there are at least three wto obtain the HRTF, all of which are employed in this pap

~1! acoustic measurements,~2! numerical computation using boundary-element me

ods, and

FIG. 1. The torso-shadow cone for the KEMAR mannequin. Rays fromear are tangent to the torso at the points shown. At wavelengths whertracing is valid, specular reflections are produced by the torso for sosources above the cone. The sound for sources within the cone is shadby the torso.

2054 J. Acoust. Soc. Am., Vol. 112, No. 5, Pt. 1, Nov. 2002

le

nr-oresea-/oryR

g--

so

b--nck

nts

oles-f

tyys:

-

~3! numerical computation using multipole reexpansions.

Each of these approaches has its characteristic advanand disadvantages, which are summarized in turn.

Acoustic measurements entail no mathematical idealtions, are accurate over much of the audible frequency raand can produce both HRTFs and head-related impulsesponses~HRIRs! equally easily. However, room reflectionmake it difficult to measure the response at very low frequcies, physical constraints can make it difficult to positionloudspeaker at very low elevations, and measurementalignment errors make it difficult to get the repeatabilneeded to study systematically the effects of changingvalues of snowman parameters.

By contrast, the computational methods used in tstudy work particularly well at low frequencies, can be usfor any source location, and are well suited to systemaparametric studies. However, they employ idealized assutions, require validation, and have time/accuracy tradeothat limit the highest frequencies that can be used. Befrequency-domain methods, they provide the HRTF direcbut they require the computation of the HRTF at a larnumber of linearly spaced frequencies to invert the Foutransform and extract the HRIR.

Boundary-element methods~or similar finite-differenceand finite-element methods! can be applied to an arbitrarilyshaped boundary surface~Ciskowski and Brebbia, 1991!.However, the continuous surface must be approximated bdiscretely sampled mesh of points in three dimensiospaced at roughly one-tenth of the shortest wavelengthinterest. It is a challenge to obtain a sufficiently accurmesh for the human torso, head and pinnae, even withtaking the possible effects of hair into account. Furthermoto determine the response at high frequencies requiresdense sampling and correspondingly long computation tim~Katz, 2001!.

The multipole reexpansion method used in this papesimilar to the T-matrix method~Waterman and Truell, 1961!,and it extends the classical infinite series solution for a sinsphere~Morse and Ingard, 1968! to scattering by multiplespheres. The technique used employs new expressionreexpansion of multipole solutions~Gumerov and Du-raiswami, 2001a!. Coupled with a procedure for enforcinboundary conditions on the sphere surfaces, it can be usesolve multiple scattering problems in domains containmultiple spheres~Gumerov and Duraiswami, 2001b!. Nomeshes are required. Although reexpansion requires theof numerical methods to solve the linear equations thatfine the boundary conditions, space and frequency cansampled with arbitrarily fine resolution. In the particular cawhere the spheres are coaxial, multipole reexpansion caseveral orders of magnitude faster than boundary-elemmethods. However, in the current version, convergence plems limit the highest frequencies that can be investigate

In this paper, each method is used for a different ppose. Acoustic measurements are used to obtain the Hfor the KEMAR mannequin and to validate the numericmethods. Multipole reexpansion is used for systematic sties of the snowman with a spherical torso. The bounda

eayded

Algazi et al.: Geometric head-related transfer function models

Page 3: Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

id-r

thoos

oheanatie

thu

w

st

t oeniminha

onwct

sub

diboenca

ballallelballith

l,ron-sa-rsup-

re-

per

inltz

,

nt

ofs areld

red

cmobthd. en-

is

element method is used for the snowman with an ellipsotorso. In the process, we~a! mutually validate the computational methods used,~b! identify the features of HRTFs fosimple geometric models of the head and torso,~c! evaluatethe adequacy of the snowman as an approximation tohuman head and torso, and~d! use the snowman model treveal the first-order effects of the torso and to identify psible localization cues.

II. METHODS

A. Measurement procedure

Acoustically measured HRTFs were obtained for twobjects: the KEMAR mannequin shown in Fig. 1 and tphysical snowman model shown in Fig. 2. Because headtorso effects are obscured by the presence of the pinKEMAR’s pinnae were removed and the exposed caviwere filled with putty and tape. Two Etymot̄ic ResearchER-7C microphones were placed inside the head, withprobe tips emerging at the entrance of the ear canals flwith the surface of the head. The Golay-code techniqueused to measure the HRIRs~Zhou et al., 1992!. The testsounds were played through 3.2-cm-radius Bose Acoumass™ Cube speakers mounted on a 1-m-radius hoopwas rotated about a horizontal axis through the midpointhe interaural axis. The sampling rate for the measuremwas 44.1 kHz. To remove room reflections, the resultingpulse responses were windowed using a modified Hammwindow that eliminated everything occurring 2.5 ms after tinitial pulse. The windowed responses were free-field equized to compensate for the loudspeaker and microphtransfer functions. Because the small loudspeakers usedinefficient radiators at low frequencies, the low-frequensignal-to-noise ratio was poor, and it was not possiblecompletely restore the response below 500 Hz. As a remeasured HRTF values below 500 Hz should eithertreated with suspicion or ignored.

To extend the useful range by another octave, the raof the head of the physical snowman was made to be ahalf that of a human head, and the results were subsequscaled in frequency accordingly. Specifically, the physi

FIG. 2. The physical snowman model, which is composed of a 4.15-radius boccie ball resting on top of a 10.9-cm-radius bowling ball. The prtube microphone is inside the boccie ball, with the probe tip flush withsurface. The bowling ball is supported by a 0.5-cm-radius cylindrical ro

J. Acoust. Soc. Am., Vol. 112, No. 5, Pt. 1, Nov. 2002

al

e

-

nde,s

eshas

ti-hatfts-g

el-e

ereyolt,e

usuttlyl

snowman model consisted of a 4.15-cm-radius boccieresting on top of a 10.9-cm-radius bowling ball, with a smcollar added to keep the head from rolling off. The modwas supported by a 0.5-cm-radius metal rod. The bocciewas drilled to accommodate one ER-7C microphone, wthe probe tip emerging at a horizontal diameter.

For both KEMAR and the physical snowman modemeasurements were made in the horizontal, median and ftal planes~see Fig. 3!. The hoop was rotated in uniform stepof 360 degrees/128'2.8 degrees. For horizontal-plane mesurements, the azimuthu covered a full 360 degrees. Fomedian-plane and frontal-plane measurements, the hoopport structure limited the elevation anglesf and d to theinterval from281.6 to1261.6 degrees, so that no measuments could be made in a68.4-degree cone directly belowthe subject.

B. Computational procedures

Both of the computational methods used in this pasolve the Helmholtz equation~the Fourier transform of thewave equation! at a specified frequency in an infinite domacontaining one or more scattering bodies. The Helmhoequation is given by

¹2p1k2p50, ~1!

wherep is the Fourier transform of the acoustic pressurek5v/c is the wave number,v is the circular frequency, andcis the speed of sound. The incident pressure fieldpinc istypically prescribed as the field from an isotropic poisource, and the goal is to compute the scattered field,pscat

5p2pinc , subject to boundary conditions at the surfacesthe scatterers and at infinity. We assume that the surface‘‘sound hard’’ (]p/]n50), and that the scattered sound fieis outgoing at infinity.

When there is a single spherical scatterer, the scattefield at a point specified by the spherical coordinates (r , u,f! can be written in the form

-eeFIG. 3. The coordinate planes. In the horizontal plane, the azimuthu rangesfrom 0 to 360 degrees. Support structures limit the range of the experimtally measurable elevation anglesf and d. The entire 360-degree rangecovered in the computational solutions.

2055Algazi et al.: Geometric head-related transfer function models

Page 4: Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

,nt

wfeite

gsr

aioa

tun

b

ss

ac

th’sab

is

heTe

al

b

a-d

lved

hease

ue

od-fius,Tox-

esndeure

areifi-im-

re-he

in-nitplyly

histhe

ithiderauseof

ni-inggesar-

othent

n by

par-is

ear.teraln-nt,’’dthis

pscat5(l 50

`

(m52 l

l

almhl~kr !Ylm~u,f!, ~2!

where hl(•) is the l th-order spherical Hankel functionYlm(•,•) are the spherical harmonics, and the coefficiealm are determined by requiring thatp5pinc1pscat satisfiesthe boundary conditions on the sphere. Such a solutionused by Duda and Martens~1998! to represent the HRTF oa spherical head. If there areN spheres in the domain, oncan exploit the linearity of the Helmholtz equation and wrthe solution as

pscat5p11p21¯1pN , ~3!

where

pj5(l 50

`

(m52 l

l

almj hl~kr j !Ylm~u j ,f j !. ~4!

Each of the functionspj is centered at the correspondinsphere, and is expressed in a local spherical coordinatetem. These series are truncated at some finite numbeterms, and the coefficientsalm

j are found by requiring that theboundary conditions at the surface of each sphere be sfied. The procedure for doing this using multipole translatand reexpansion is presented in Gumerov and Duraisw~2001b!.

When the scattering surfaces are not spherical andmultipole reexpansion technique cannot be used, the boary element method is used instead. This method worksusing Green’s identity to write Eq.~1! as an integral equationfor the acoustic pressure. This equation specifies the presat afield point X on the surface of the acoustic domain a

CXpX5EGY

FGXY]pY

]nY 2]GXY

]nY pYGdGY , ~5!

whereG is the surface of the acoustic domain,n is the unitoutward normal vector to the acoustic domain at a surf~source! point Y, G is the free-space Green’s function, andCis the jump term that results due to the treatment ofsingular integral involving the derivative of the Greenfunction. The Green’s function for the three-dimensionfree-space problem, expressed in terms of the wave numk and the distancer between the source and field points,

GXY5exp$2 ikr %

4pr, ~6!

where i 5A21 is the complex constant. The surface of tscatterers is discretized using plane triangular elements.equation is written at each boundary element, and a linsystem of equations is obtained, which can be symbolicrepresented as

@F#$P%5@G#H ]P

]n J , ~7!

where@F# and @G# are matrices whose coefficients are otained by evaluating integrals involving]G/]n and G ker-nels, respectively;$P% is the vector of acoustic pressuresthe surface nodes, and$]P/]n% is the vector of normal derivatives of the pressure. Imposition of the boundary con

2056 J. Acoust. Soc. Am., Vol. 112, No. 5, Pt. 1, Nov. 2002

s

as

ys-of

tis-nmi

hed-y

ure

e

e

ler

hearly

-

t

i-

tions leads to a system of linear equations that can be sofor the pressure.

Both the multipole reexpansion method and tboundary-element procedure yield the magnitude and phof the HRTF at a particular frequencyf . Computationaltime/accuracy tradeoffs limit the maximum allowable valof ka, wherea is the head radius andk52p f /c is the wavenumber,c being the speed of sound. For the snowman mels that were investigated, the maximum useable value okawas approximately 10. For the standard 8.75-cm head radthis corresponds to a maximum frequency of about 6 kHz.be conservative, all HRTF calculations were limited to eactly 5 kHz.

To obtain the HRIRs, the HRTFs for 500 frequenciuniformly spaced from 0 Hz to 5 kHz were calculated, athe ifft function in MATLAB™ was used to calculate thinverse discrete Fourier transform. Because this procedimplicitly assumes that values of the HRTF above 5 kHzall zero, direct use of the inverse transform leads to signcant and distracting Gibbs phenomenon ripples in thepulse response. For graphical display, these ripples weremoved by applying a standard Hamming window to tmagnitude spectrum, leaving the phase unchanged~Oppen-heim and Schafer, 1969!. Like low-pass filtering, windowingsmooths the impulse response, reducing the height andcreasing the width of pulses. For the window used, a uimpulse that would have a duration of 0.2 ms when sharband limited to 5 kHz has its peak height approximatehalved and its duration approximately doubled. Although tresults in some loss of information, it greatly increasesclarity of the graphs.

III. THE FRONTAL-PLANE HEAD-AND-TORSO HRTF

To investigate the behavior of the HRTF, we start wthe familiar spherical-head model and subsequently consa sequence of progressively more complex cases. Becthe results for the frontal plane exhibit a greater varietybehavior, we focus on it first.

A. The spherical head model

Figure 4 shows two image representations of the magtude of the right-ear, frontal-plane HRTF for a sphere havthe standard 8.75-cm head radius. These particular imawere created using the algorithm presented in Duda and Mtens~1998!, but the same results were also produced by bthe multipole reexpansion code and the boundary elemcode. Brightness corresponds to dB magnitude as showthe grayscale bar at the top of the image. In Fig. 4~a!, eachvertical line corresponds to the frequency response at aticular elevation angle,d. At low frequencies the response0 dB for any elevation angle. The largest response~which isapproximately 6 dB at high frequencies! occurs at d50 degrees, where the source points directly at the rightAs expected, the response is generally large on the ipsilaside (290 degrees,d,90 degrees) and small on the cotralateral side (90 degrees,d,270 degrees). However, othe contralateral side the sphere exhibits a ‘‘bright spowhich appears in Fig. 4~a! as a bright vertical streak centereat d5180 degrees. The dark bands on each side of

Algazi et al.: Geometric head-related transfer function models

Page 5: Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

tsuen

n-se0ri

on

ha

a

serae

e atan

ncetionlingr iseral

uc-, at tothat. 2.e-gs.h-ed 1

rns

detaneig

thsp

Fig.t are

theed by

streak are interference patterns whose regularity is due toperfect symmetry of the sphere; more irregularly shapedfaces have the same general behavior, but the interferpatterns become smeared.

Figure 4~b! is a useful alternative remapping of the iformation in Fig. 4~a!. In this polar plot, frequency rangefrom 0 to 5 kHz along any radial line. The center of thimage corresponds tof 50, where the response is exactlydB. The frequency response for incident sound waves aring at an angled is found along the radial line at the angledas shown. This puts the HRTF display into direct correspdence with the coordinate system shown in Fig. 3.

The HRIR, which includes both the magnitude and tphase response of the HRTF, is particularly useful for reveing multipath components of the response. Figure 5~a! showsa family of HRIR curves corresponding to Fig. 4. Note ththe ipsilateral response (290 degrees,d,90 degrees) isnot only stronger than the contralateral respon(90 degrees,d,270 degrees), but it also occurs soonThe approximately 0.7 ms difference in the arrival timesd50 degrees andd5180 degrees is the maximum ITD. Th

FIG. 4. Image representations of the magnitude of the HRTF for an irigid sphere. The magnitude in dB is represented by brightness. The dafor the right ear of an 8.75-cm-radius sphere. The images are for the froplane ~see Fig. 3!. In ~a!, the left half of the image corresponds to thipsilateral side, and the right half to the contralateral side. Note the brspot that appears as a vertical streak atd5180 degrees.~b! shows the samedata in polar coordinates. The elevation angled corresponds directly to thefrontal view in Fig. 3. Thus, once again the left half corresponds toipsilateral side and the right half to the contralateral side, but the brightappears as a broad horizontal streak.

J. Acoust. Soc. Am., Vol. 112, No. 5, Pt. 1, Nov. 2002

her-ce

v-

-

el-

t

e.t

bright spot appears as a local maximum in the responsd5180 degrees. In the vicinity of the bright spot, one csee a second pulse that follows the first pulse@see Fig. 5~b!#.It is this second pulse that is the source of the interferepatterns seen in the frequency domain. A rough interpretais that one pulse is composed of wave components travearound the ipsilateral side of the sphere, and the othecomposed of components traveling around the contralatside; their in-phase confluence atd5180 degrees is thesource of the bright spot.3

B. The physical snowman model

We now examine the effects produced by the introdtion of the torso. Using the same 8.75-cm head radius23-cm spherical torso is added directly below and tangenthe head. This results in a ratio of head size to torso sizeis the same as that for the physical snowman shown in Fig

The frontal plane HRTF computed by the multipole rexpansion method is shown in Fig. 6. Comparison of Fi4~a! and 6~a! reveals two major differences. First, three arcshaped notches centered neard580 degrees appear on thipsilateral side. The lowest-frequency notch occurs arounkHz atd580 degrees, where the response dips to25 dB. Aswas shown in Algazi et al. ~2001a!, these elevation-dependent notches are comb-filter interference patte

alis

tal

ht

eot

FIG. 6. The computed HRTF for the physical snowman model shown in2, scaled for a head radius of 8.75 cm. The arch-shaped notches thasymmetric aboutd590 degrees are due to specular reflections fromupper torso. The deeper notches around 210 to 250 degrees are caustorso shadow. A torso bright spot can be seen aroundd5255 degrees.

-d

al

et-kl

ecein

FIG. 5. ~a! The HRIR for the sphere.The waveforms have a 5-kHz bandwidth, and the spectrum was smoothewith a Hamming window before inver-sion. The bright spot appears as a locmaximum atd5180 degrees.~b! Anexpanded plot of the impulse responsat d5150 degrees. This illustrates thanear the bright spot the impulse response is bimodal. The weaker peacan be attributed to waves that travearound the contralateral side of thsphere. This second pulse is the sourof the interference patterns seenFig. 4.

2057Algazi et al.: Geometric head-related transfer function models

Page 6: Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

aa

psi

m

onon

o

vr

tw

. Ilinapon

-thre

i

raE

rathharaFo

otsi-ghtenetrics ofex-

re-he.ctedion.orthehetedheow-

ex-sed

hend-n-for

stheme-hen

that

at

50

mae

ay

re

tioneotision

caused by torso reflection. They extend throughout thedible frequency range, and provide cues for elevation awfrom the median plane.

A second major difference is the appearance of deeand more closely spaced notches on the contralateralbelow the horizontal plane, whered ranges from roughly 195to 250 degrees. These low-elevation notches, which forpattern of parallel lines in Fig. 6~b!, fall in the torso-shadowcone; combined with head shadow, they cause the respfor frequencies above 1 kHz to be much lower on the ctralateral side than on the ipsilateral side.

Somewhat surprisingly, the lowest response does notcur when the source is directly below (d5290 or1270 degrees). Instead, another bright spot appears atlow elevations. This ‘‘torso bright spot’’ is particularly cleain Fig. 6~b!, where it forms a bright radial ridge neard5255 degrees. Thus, the snowman model exhibitsbright spots, one due to the head aroundd5180 degrees, andone due to the torso aroundd5255 degrees.

There is a simple explanation for the torso bright spota sound source below the torso were directed along afrom the center of the torso to the location of the right eand if the head did not disturb the sound field, a bright swould be formed on the contralateral side of the torso awould strongly ‘‘illuminate’’ the right ear. For the dimensions of the physical snowman, the elevation angle forline is 254.6 degrees, which is consistent with this interptation.

The torso-shadow cone for the physical snowmanshown drawn to scale in Fig. 7~a!. The ipsilateral limit isdefined by the ray AE tangent to the torso on the ipsilateside, and the contralateral limit is defined by the ray Ctangent to the torso on the contralateral side. BE, thethrough the center of the torso to the right ear, bisectstorso-shadow region. Sources in the ipsilateral zone are sowed by just the torso, while sources in the contralatezone are shadowed by both the torso and the head. In7~b! these three rays are shown superimposed on the p

FIG. 7. Boundaries of the torso-shadow cone for the physical snowmodel. The bisecting ray BE through the center of the torso to the rightshown in~a! defines the direction of the torso bright spot. The tangent rAE and CE are shown superimposed on the computed HRTF in~b!. There isa close correspondence between the geometrical boundaries and theof reduced response.

2058 J. Acoust. Soc. Am., Vol. 112, No. 5, Pt. 1, Nov. 2002

u-y

erde

a

se-

c-

ery

o

fe

r,td

is-

s

l

yed-l

ig.lar

HRTF plot of Fig. 6~b!. Although the correspondence is nperfect, AE is closely aligned with the edge of reduced iplateral response, BE is closely aligned with the torso brispot, and CE is roughly aligned with the edge of the evmore reduced contralateral response. Thus, the geomtorso-shadow cone is in good agreement with the zonereduced response. A similar geometric argument helps toplain why the torso reflection notches attain their lowest fquencies neard580 degrees. If the head is removed, tdiagram in Fig. 7~a! is symmetric about the ray from B to EOutside of the torso shadow cone, a pulse of sound direto the ear at E is followed by a subsequent torso reflectFor pulses directed along the tangent ray from A to Ealong the tangent ray from C to E, the delay betweeninitial pulse and the reflection is zero. By symmetry, tmaximum time delay occurs for an overhead ray direcfrom E to B, and this leads to the lowest frequency for tinterference notch. For the dimensions of the physical snman, this overhead ray would be found atd574.6 degrees.Although the head disturbs the symmetry, this argumentplains why the angle for the lowest notch frequency is biabelow 90 degrees.

Additional insight can be obtained by comparing tHRIRs for the sphere and the snowman. The 5-kHz bawidth HRIR, computed by inverting the multipole reexpasion, is shown in Fig. 8. Comparing these results to thosethe isolated sphere@Fig. 5~a!#, three prominent differencecan be seen. First, the torso reflection is clearly present insnowman HRIR in the general range of elevations froabout230 to 1150 degrees. The maximum time delay btween the main pulse and the torso reflection occurs wthe source is overhead, and is approximatelyDT50.7 ms. Inthe frequency domain, this corresponds to the first notchoccurs at f 051/(2DT)5700 Hz. This value is in goodagreement with the location of the lowest frequency archd580 degrees in Fig. 6~a!.

Second, the response in the interval from 200 to 2

nars

gion

FIG. 8. The computed HRIR for the physical snowman. The torso reflecis prominent when the elevationd is between230 and 150 degrees. Thhead bright spot can be seen neard5180 degrees, and the torso bright spneard5255 degrees. The response ford between 200 and 250 degreesflattened and broadened by torso shadow. The long thin ‘‘tail’’ in this regis responsible for the strong interference notches seen in Fig. 6~b!. A sym-metric tail for d between290 and240 degrees is barely visible.

Algazi et al.: Geometric head-related transfer function models

Page 7: Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

aenhiehiaatathonrs

heer

mexheu

rr

njus

delaled. Ased

areonnce

forgen-eralhebit

datapat-thisec-e ofular

haten-theex-ely

h ay a

upingw-

ea-ted

h ator-

re-cy

entivesro-.

Fipa

degrees is significantly broader and flatter for the snowmthan for the isolated sphere. This corresponds to the presof torso shadowing on the contralateral side. Third, in tsame interval there is a long, thin ‘‘tail’’ whose arrival timchanges rapidly with elevation. Like the torso reflection, tsecond pulse produces notch filter interference patterns,is the source of the deep notches that appear in the contreral torso shadow zone in Fig. 6. This presence for thiscan be explained by imagining that waves traveling overtorso from the contralateral side of the torso-shadow ccan be divided into two groups. The group that arrives fitravels over the contralateral side@along the line CE in Fig.7~a!#, where it is further shadowed by the head, while tgroup that arrives later takes a longer path over the ipsilatside @along the line AE in Fig. 7~a!#, and is not subject tohead shadow. Although this is a crude explanation of a coplicated diffraction and scattering phenomenon, it alsoplains why the same ‘‘tail’’ is much less prominent when tsource is on the ipsilateral side, where the first arriving groof waves is not subject to head shadow but the second aing group~the ‘‘tail’’ ! is attenuated by head shadow.

IV. VALIDATION OF THE MULTIPOLE REEXPANSIONTECHNIQUE

To confirm the validity of the multipole reexpansiomethod and of the characteristics of the snowman HRTFpresented, we now compare those results to the result

FIG. 9. The measured HRTF for the physical snowman model shown in2, scaled for a head radius of 8.75 cm. These results should be comwith the computational results in Fig. 6.

J. Acoust. Soc. Am., Vol. 112, No. 5, Pt. 1, Nov. 2002

nce

s

sndlat-ileet

al

--

piv-

stof

acoustic measurements of the physical snowman moshown in Fig. 2. Figure 9 shows the measured HRTFs, scin frequency to match the standard 8.75-cm head radiuswas mentioned in Sec. II A, experimental constraints limitthe elevation angles to the interval from281.6 to1261.6 degrees; regions for which no data are availableshown in solid black. Ignoring those regions, a comparisof Figs. 6 and 9 shows a generally very good correspondebetween the computational and the acoustic results.

In particular, the torso reflections and the bright spotthe head are in excellent agreement. There is also gooderal agreement regarding the behavior in the contralattorso shadow region, including the torso bright spot. Tmain difference is that the interference notches are adeeper and more closely spaced in the computationalthan in the measured results. Because such interferenceterns are quite sensitive to small changes in path lengths,difference could be caused by any of a number of imperftions in the experimental conditions, such as the presenca supporting rod, the presence of the collar, or small angmisalignments.

Comparison of the HRIRs reveals some differences tare not as evident in the frequency domain. As was mtioned in Sec. II A, it was not possible to compensate forlow response of the loudspeakers below 500 Hz withoutposing low-frequency noise. The measured data is effectiva high-pass-filtered version of the true responses, wit3-dB corner frequency around 500 Hz. This is revealed bslight darkening at the top of Fig. 9~a! that is not visuallyprominent in the frequency domain, but that showsclearly in the time domain as a negative overshoot followthe main pulse. The high-pass filtering hides the lofrequency noise that is present in the measured data.

Thus, in the comparison of the computed and the msured impulse responses shown in Fig. 10, the compuresponse was filtered by a single-pole high-pass filter wit500-Hz corner frequency to introduce a comparable distion to the curve@see Fig. 10~a!#. In addition, a Hammingwindow was used to band-limit the measured impulsesponses to 5 kHz to account for the lack of high-frequenenergy in the computed results@see Fig. 10~b!#. With thesecorrections understood, there is again very good agreembetween the computed and the measured data. This gconfidence that the multipole reexpansion technique is pviding correct HRTF values in both magnitude and phase

g.red

n.ss

e

-

s-

FIG. 10. Comparison of the~a! com-puted and~b! measured impulse re-sponses for the physical snowmaThe computed HRIR was high-pasfiltered to simulate the effects of losof low-frequency information in themeasured data. This accounts for thnegative overshoot following the mainpulse. Similarly, the measured HRIRwas low-pass filtered~bandlimited to 5kHz! to simulate the absence of highfrequency information in the com-puted results. The range of elevationis limited to the range for which measurements are available.

2059Algazi et al.: Geometric head-related transfer function models

Page 8: Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

xit

inrsw-s

ermp

or

teiny

iufnmr

th1i-thh

oidnt

ar’’

xi-ndhesd

ro-thatc-nd,Rw

orsothate istheh aas

en-y afun-oxi-xissy

atalltheoleerearetal-by

eim-

.wnIntrongs atteralon-

n to-r by

he

t isl.

hift.

an

V. EVALUATING THE SNOWMAN MODEL

To investigate how well a human HRTF can be appromated by the HRTF of the snowman model, we decidedcompare the HRTF for the KEMAR mannequin shownFig. 1 to two snowman models, one with a spherical toand one with an ellipsoidal torso. This process involves tsteps:~a! geometrically fitting the snowman models to KEMAR, and~b! comparing the resulting HRTFs. Each proceis explained in turn.

A. Fitting the snowman model to KEMAR

The optimum snowman model for KEMAR is the onwhose HRTF best matches KEMAR’s HRTF. However, theis no obvious or well-established measure of error in coparing two HRTFs. As a consequence, it was decided simto match anthropometric characteristics. The head and tcomponents were fit separately, placing more emphasisfitting the upper torso than the lower torso. Although betresults could be obtained with a more sophisticated fittprocedure, the method used had the virtue of being easunderstand and apply.

The spherical head is defined by its center and radThe center was located at the center of a bounding boxKEMAR’s head~see Fig. 11!. When the regression equatioin Algazi et al. ~2001b! was used to estimate the optimuhead radius, a value of 8.70 cm was obtained, which ismarkably close to the standard 8.75-cm value. To fittorso, the five control points shown as open circles in Fig.were used.4 This was almost sufficient to define an ellipsodal torso model whose principal axes were parallel tocoordinate axes, but left one degree of freedom free. Twas resolved by arbitrarily placing the center of the ellipsat the intersection of the lines joining the left/right and froback control points.

The resulting spherical head and ellipsoidal torsoshown in Fig. 11. The dimensions for this ‘‘best-fittingmodel were as follows:

Head radius,a 8.7 cm

Torso half height,bh 19.3 cm

Torso half width,bw 21.5 cm

Torso half depth,bd 11.6 cm

Torso geometric mean radius,b 16.9 cm

FIG. 11. Fitting a spherical head and ellipsoidal torso to the KEMAR mnequin.

2060 J. Acoust. Soc. Am., Vol. 112, No. 5, Pt. 1, Nov. 2002

-o

oo

s

e-lysoonrgto

s.or

e-e1

eis

/

e

Torso displacement from head, back,Db 1.7 cm

Torso displacement from head, down,Dd 30.9 cm.

The resulting snowman model provides just a first appromation to KEMAR. The spherical head is wider, shorter, ashallower than the real head. The ellipsoidal torso matcthe real torso fairly well on top, but is badly mismatchebelow the control points. This was an intentional compmise, and was made for three reasons. First, an ellipsoidwould provide a better match to the lower torso would neessarily sacrifice the more important upper region. Secoin the torso shadow region where the match is poor, KEMAis no longer a realistic model for a human. Third, at very loelevations, the wavelengths that are unattenuated by tshadow are sufficiently long that there is reason to hopea crude fit will be adequate and that the specific shaprelatively unimportant. Finally, the head is not tangent totorso, leaving a gap where the neck should be. Althougmore realistic model would include a neck component, it womitted in the interest of simplicity.

For an even simpler approximation, the head was ctered over the torso, and the ellipsoid was approximated bsphere whose radius~16.9 cm! was the geometric mean othe semiaxes of the ellipsoid, so that the volume waschanged. The resulting spherical torso is a coarser apprmation, exhibiting a symmetry about the vertical polar athat is not realistic. However, it provides a model that is eato understand and evaluate.

B. Comparing the KEMAR and the snowman HRTFs

We now compare the pinnaless KEMAR HRTF to thof the spherical-torso and ellipsoidal-torso models. Incases, the KEMAR HRTFs were measured acoustically,spherical-torso results were computed using the multipreexpansion method, and the ellipsoidal-torso results wcomputed using the boundary-element method. We compfrontal-plane results in some detail, and compare horizonplane and median-plane results more briefly. We startcomparing the KEMAR HRTF to the HRTF for thspherical-torso approximation, and then examine theprovement that the ellipsoidal-torso model can provide.

1. Frontal plane

The frontal-plane KEMAR HRTF is shown in Fig12~a!, and the corresponding spherical-torso HRTF is shoin Fig. 12~b!. The gross characteristics are very similar.particular, in both cases one can see the presence of a sresponse on the ipsilateral side, torso reflection notcheupper elevations, head shadowing on the upper contralaside, and even stronger torso shadowing on the lower ctralateral side. Thus, the model provides an approximatioKEMAR that captures important, first-order effects. However, there are numerous differences as well, caused eithethe simplicity of the model or the choices made in fitting tmodel to KEMAR.

On the upper contralateral side, the head bright spoalmost as well defined for KEMAR as it is for the modeHowever, it occurs aroundd5155 degrees for KEMAR ver-sus 180 degrees for the model, which is a 25-degree s

-

Algazi et al.: Geometric head-related transfer function models

Page 9: Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

tho

erm

arb

eld

th.s

heefoties

asolahe

n

t isisis-nd

dible

isotew-so,Fs.the

delrical

thehe

hane

dscect

ondernsrplyso.soorso

eadin

ral-rsohepthpy

chesestivex-be

ow

calthe,ow-all

The reason for this shift can be traced to the way thatspherical head was fit to KEMAR’s head. The actual topKEMAR’s head is about 3 cm above the top of the sph~see Fig. 11!. If the spherical head were to be shifted up 3 cwithout moving thez coordinate of the ears in space, the ewould no longer be across a diameter, but would be downsin21(3/8.7)'20 degrees, which would introduce the corrsponding shift in the position of the bright spot and wouaccount for most of the discrepancy. However, shiftinghead up results in a poorer fit to the data at low elevationsthis difference is important, a better solution would be to uan ellipsoidal head model~Dudaet al., 1999!.

On the upper ipsilateral side, the torso-reflection arcfor KEMAR and the model are very similar, with thhighest-frequency fifth arch being even better definedKEMAR than for the model. However, there is a systemashift in the elevation at which the notches reach their lowfrequency, being aroundd595 degrees for KEMAR andaround 75 degrees for the model. This shows up clearly20-degree rotation of the torso reflection notches in the pplots. Using the dimensions for the model, the heuristicgument given in Sec. III B predicts that the elevation for tlowest frequency should appear atd590 degrees2tan21(8.7/30.9)'74 degrees, which is in close agreeme

FIG. 12. Measured and computed frontal-plane HRTFs.~a! KEMAR withthe pinnae removed.~b! The spherical-torso snowman approximation.~c!The ellipsoidal-torso snowman approximation.

J. Acoust. Soc. Am., Vol. 112, No. 5, Pt. 1, Nov. 2002

efe

sy

-

eIfe

s

rct

aarr-

t

with what is observed. The exact reason why this poindifferent for KEMAR is not completely understood, but itprobably a combination of the effect of the neck and a mmatch between the surface orientation of the ellipsoid athe top of the torso in the vicinity of the neck.

Although a small zone directly below KEMAR coulnot be measured, the ipsilateral torso shadow that is vismatches the model well. The contralateral torso shadowmore complex for KEMAR than it is for the model, and—nsurprisingly—there is no sign of a torso bright spot. The ninterference patterns that appear ford.240 degrees are undoubtedly due to scattering from the truncated lower torand are not expected to be encountered in human HRTHowever, the general presence of strong torso shadow oncontralateral side for both KEMAR and the snowman mopresents a strong contrast with the behavior of the sphehead model alone~cf. Fig. 4!.

The boundary-element method was used to computefrontal-plane HRTF for the ellipsoidal-torso model, and tresults are shown in Fig. 12~c!. As might be expected, theresults are closer to those for the spherical-torso model tto KEMAR. There are two primary effects of stretching thspherical torso into an ellipsoid:~a! introducing angular an-isotropy, and~b! breaking up the perfect symmetry that leato the torso bright spot. Because there is little differenbetween the height and width of the ellipsoid, the first effeis not very prominent in a frontal-plane response. The seceffect can be seen, however, in that the interference pattin the contralateral torso shadow region are not as shadefined for the ellipsoidal torso as for the spherical torIndeed, in this region, the polar plot for the ellipsoidal torseems to be intermediate between that for the spherical tand that for KEMAR.

2. Horizontal plane

The torso has relatively little effect on the HRTF in thhorizontal plane. The HRTF for the isolated spherical helooks exactly the same in the horizontal plane as it doesthe frontal plane~see Fig. 4!. The horizontal-plane results foKEMAR, the spherical-torso snowman, and the ellipsoidtorso snowman are shown in Fig. 13. The effects of toreflection are more irregular in the KEMAR data than in tmodels. The difference between torso width and torso defor the ellipsoidal-torso snowman introduces an anisotrothat can be seen as a flattening of the torso reflection arin Fig. 13~c!, but the effect is small. Perhaps the greatdifference between KEMAR and the models is the relatweakness of the head bright spot in KEMAR. As was eplained in the discussion of frontal-plane results, this canattributed to the fact that KEMAR’s ears are displaced belthe center of the head.

3. Median plane

In the median plane, the HRTF for the isolated spherihead is uninteresting, because there is no variation withelevation angle,f. The median-plane results for KEMARthe spherical-torso snowman, and the ellipsoidal-torso snman are shown in Fig. 14. Above the horizontal plane,

2061Algazi et al.: Geometric head-related transfer function models

Page 10: Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

soa

s,sopaly

oilyla

tarssorsr

ailere

eldd

n-tly,to

o-andom-del.

etterso

ideen

ve, buttheensese

par-der

toallar

n.

three results are very similar, with the slightly flatter torreflection arches for the ellipsoidal model providingslightly better fit to the KEMAR data. At lower elevationthe differences between the spherical torso and the ellipdal torso models are greater than might be expected. Inticular, the ellipsoidal-torso model exhibits two moderatedeep interferences notches, with a bright spot at the pdirectly below, while the spherical-torso model exhibits ona general shadowing. The low-elevation shadowing is retively weak compared to what is observed in the fronplane. This is explained by the fact that head and toshadow both occur in the frontal plane, while only torshadowing occurs in the median plane. The ellipsoidal-toresults, although less easily explained, seem to be closewhat is observed in the KEMAR data. However, once agKEMAR’s sharply truncated torso introduces some compinterference patterns at low elevations that are not reallyevant to human HRTFs.

VI. DISCUSSION AND CONCLUSIONS

The results of this investigation show that simple modof the head and torso can explain the major features founthe pinnaless KEMAR HRTF for distant sources. The ad

FIG. 13. Measured and computed horizontal-plane HRTFs.~a! KEMARwith the pinnae removed.~b! The spherical-torso snowman approximatio~c! The ellipsoidal-torso snowman approximation.

2062 J. Acoust. Soc. Am., Vol. 112, No. 5, Pt. 1, Nov. 2002

i-r-

nt

-lo

otonxl-

sini-

tion of either a spherical or an ellipsoidal torso to the stadard spherical-head model changes the HRTF significanbringing the behavior of the model substantially closerthat of a pinnaless KEMAR. In particular, the torso intrduces reflections when the source is at high elevation,shadow when the source is at low elevations. These phenena are not seen with the isolated spherical head moMore elaborate geometric models should produce even bapproximations, but the simplicity of the spherical-torsnowman model facilitates systematic studies.

Our previous work showed that torso reflections provrelatively weak but genuine elevation cues, particularly whthe source is away from the median plane~Algazi et al.,2001a!. The elevation cues provided by torso shadow hanot been subjected to systematic psychoacoustic testsinformal listening experiments using HRIRs generated bymodels indicate that torso shadow does increase the sthat a virtual auditory source is at a low elevation. Becauthese elevation cues occur at low frequencies, they areticularly important for sources such as footsteps or thunthat have little high-frequency energy.

Simple head and torso models have uses in additionproviding insight into HRTF behavior. As part of a structurmodel of the HRTF, they can be customized to particu

FIG. 14. Measured and computed median-plane HRTFs.~a! KEMAR withthe pinnae removed.~b! The spherical-torso snowman approximation.~c!The ellipsoidal-torso snowman approximation.

Algazi et al.: Geometric head-related transfer function models

Page 11: Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

tr-

anri-

esilet

siptll

eaetico

weo

foaand-tr

arte

eraheu

m

innRT

iduls

Te

eigentsrs

rrerof theKE

’’ J.

s

g to

.

n

-

p-

-

ess-

r

c-

--Dtyu/

R-

ei-

ar,’’e-

loc.

i-

to

individuals by matching their parameters to anthropome~Algazi et al., 2001d!. In addition, they can be used to compensate acoustically measured HRTFs, filling in importlow-frequency information that is difficult to measure expementally.

There are several open issues that need further invgation. Perhaps most important, it is clear that the detabehavior of human HRTFs at low elevations is sensitiveposture. However, relatively little is known as to how sentive people are to the acoustic changes that accompanytural changes. Another open question concerns the effecthe neck on the HRTF and its perceptual importance. Finathe sensitivity of the results to displacements of the hrelative to the torso and displacements of the ears relativthe center of the head need to be better understood, parlarly because these displacements have been found tquite substantial for human subjects~Algazi et al., 2001c!.Although the geometric modeling approach cannot ansthe perceptual questions, it offers an attractive way to invtigate the effects of different components of the bodyHRTFs.

ACKNOWLEDGMENTS

The authors would like to thank Dennis Thompsonhis help with much of the experimental work. Support wprovided by the National Science Foundation under GrNos. IIS-00-97256 and ITR-00-86075. Any opinions, finings, and conclusions or recommendations expressed inmaterial are those of the authors and do not necessarilyflect the view of the National Science Foundation.

1To be more precise, the ITD and ILD for the spherical-head modelconstant on a cone of confusion. In the interaural-polar coordinate sysa cone of confusion is a cone of constant interaural-polar azimuthu ip

~Algazi et al., 2001a!. Thus, the ITD and ILD uniquely determine thinteraural-polar azimuth, but provide no information about the interaupolar elevationf ip . A simple coordinate conversion shows that tinteraural-polar azimuth is related to the standard vertical-polar azimuvp and vertical-polar elevationfvp by u ip5sin21@sinuvp cosfvp#. Thus,strictly speaking, the ITD and ILD do not determine eitheruvp or f ip , butinstead determine the product sinuvp cosfvp . However, the main point isthat the spherical-head model does not provide sufficient cues to deterboth the azimuth and the elevation.

2Butler ~1975! gives an excellent survey of the extensive research on pcues. In addition to the static acoustic cues that are captured by the Hthere are also dynamic elevation cues generated by head motion~Perrettand Noble, 1997!. This paper is concerned with static cues only.

3This interpretation is presented by Duda and Martens~1998!. The impulseresponses shown in that paper were computed for a 22.05-kHz bandwand their higher temporal resolution makes it easier to see the two pthat appear in the vicinity of the bright spot.

4These control points were drawn from earlier work on creating an HRdatabase for human subjects in which operational definitions for 17 msurements of the head and torso, including the head height and neck hwere provided~Algazi et al., 2001c!. This unambiguously specified thlocations of four of the torso control points: the left and right arm poiand the front and back torso points. The definition for the top of the toused with the HRTF database gave a point that was too low for the custudy. For this paper, the top of the torso was located by going down fthe center of the bounding box for the head by the amount of one-halhead height plus one third of the neck height. Although this procedurarbitrary, it is well defined and can be applied to subjects other thanMAR.

J. Acoust. Soc. Am., Vol. 112, No. 5, Pt. 1, Nov. 2002

y

t

ti-d

o-os-ofy,dtou-be

ers-n

rst

hise-

em,

l-

th

ine

aF,

th,es

Fa-ht,

ontme

is-

Algazi, V. R., Avendano, C., and Duda, R. O.~2001a!. ‘‘Elevation localiza-tion and head-related transfer function analysis at low frequencies,Acoust. Soc. Am.109, 1110–1122.

Algazi, V. R., Avendano, C., and Duda, R. O.~2001b!. ‘‘Estimation of aspherical-head model from anthropometry,’’ J. Audio Eng. Soc.49~6!,472–478.

Algazi, V. R., Duda, R. O., Thompson, D. M., and Avendano, C.~2001c!.‘‘The CIPIC HRTF database,’’ inWASPAA’01~Proc. 2001 IEEE ASSPWorkshop on Applications of Signal Processing to Audio and Acoustic!,New Paltz, New York, pp. 99–102.

Algazi, V. R., Duda, R. O., Morrison, R. P., and Thompson, D. M.~2001d!.‘‘Structural composition and decomposition of HRTFs,’’ inWASPAA’01~Proc. 2001 IEEE ASSP Workshop on Applications of Signal ProcessinAudio and Acoustics!, New Paltz, New York, pp. 103–106.

Batteau, D. W.~1967!. ‘‘The role of the pinna in human localization,’’ ProcR. Soc. London, Ser. B168, 158–180.

Blauert, J. P.~1997!. Spatial Hearing~revised edition! ~MIT, Cambridge,MA !, pp. 50–93, 373–374.

Brungart, D. S., and Rabinowitz, W. R.~1999!. ‘‘Auditory localization ofnearby sources. Head-related transfer functions,’’ J. Acoust. Soc. Am.106,1465–1479.

Burkhardt, M. D., and Sachs, R. M.~1975!. ‘‘Anthropometric manikin foracoustic research,’’ J. Acoust. Soc. Am.58, 214–222.

Butler, R. A. ~1975!. ‘‘The influence of the external and middle ear oauditory discriminations,’’ inHandbook of Sensory Physiology, edited byW. D. Keidel and W. D. Neff~Springer Verlag, New York!, pp. 247–260.

Carlile, S. ~ed.! ~1996!. Virtual Auditory Space: Generation and Applications ~Landes, Austin, TX!, pp. 38–45.

Ciskowski, R. D., and Brebbia, C. A.~eds.! ~1991!. Boundary ElementMethods in Acoustics~Computational Mechanics Publications, Southamton!.

Duda, R. O., and Martens, W. L.~1998!. ‘‘Range dependence of the response of a spherical head model,’’ J. Acoust. Soc. Am.104, 3048–3058.

Duda, R. O., Avendano, C., and Algazi, V. R.~1999!. ‘‘An adaptable ellip-soidal head model for the interaural time difference,’’ inICASSP’99~Proc.IEEE International Conference in Acoustics Speech and Signal Procing!, Vol. II, pp. 965–968.

Gardner, M. B., and Gardner, R. S.~1973!. ‘‘Problem of localization in themedian plane: Effect of pinna cavity occlusion,’’ J. Acoust. Soc. Am.53,400–408.

Genuit, K., and Platte, H. J.~1981!. ‘‘Untersuchungen zur Realisation einerichtungsgetreuen U¨ bertragung mit elektroakustischen Mitteln~Investiga-tions on the implementation of directionally faithful transmission by eletroacoustical means!,’’ Fortschritte der Akustik, FASE/DAGA ’81 Berlin~VDE-Verlag, Berlin!, pp. 629–632.

Gumerov, N. A., and Duraiswami, R.~2001a!. ‘‘Fast, exact, and stable computation of multipole translation and rotation coefficients for the 3Helmholtz equation,’’ UMIACS Technical Report TR-2001-44, Universiof Maryland, College Park, MD (http://www.umiacs.umd.ed;ramani/pubs/multipole.pdf).

Gumerov, N. A., and Duraiswami, R.~2001b!. ‘‘Multiple scattering from Nspheres using multipole reexpansion,’’ UMIACS Technical Report T2001-72, University of Maryland, College Park, MD(http://www.umiacs.umd.edu/;ramani/pubs/multisphere.pdf).

Gumerov, N., Duraiswami, R., and Tang, Z.~2002!. ‘‘Numerical study of theinfluence of the torso on the HRTF,’’ inICASSP’02~Proc. IEEE Int. Conf.Acoustics Speech and Signal Processing!.

Hartley, R. V. L., and Fry, T. C.~1921!. ‘‘The binaural localization of puretones,’’ Phys. Rev.18, 431–442.

Kahana, Y., and Nelson, P. A.~2000!. ‘‘Spatial acoustic mode shapes of thhuman pinna,’’ preprint 5218,109th Convention Audio Engineering Socety, Los Angeles, CA.

Kahana, Y., Nelson, P. A., Petyt, M., and Choi, S.~1999!. ‘‘Numerical mod-eling of the transfer functions of a dummy-head and of the external ePaper s52654,AES 16th International Conference on Spatial Sound Rproduction, Rovaniemi, Finland.

Katz, B. F. G.~2001!. ‘‘Boundary element method calculation of individuahead-related transfer function. I. Rigid model calculation,’’ J. Acoust. SAm. 110, 2440–2448.

Kuhn, G. F.~1977!. ‘‘Model for the interaural time differences in the azmuthal plane,’’ J. Acoust. Soc. Am.62, 157–167.

Kuhn, G. F. ~1987!. ‘‘Physical acoustics and measurements pertainingdirectional hearing,’’ inDirectional Hearing, edited by W. A. Yost and G.Gourevitch~Springer Verlag, New York!, pp. 3–25.

2063Algazi et al.: Geometric head-related transfer function models

Page 12: Approximating the head-related transfer function using ... · tally ~Mehrgardt and Mellert, 1977; Shaw, ... rounding sphere has not been systematically measured or ... lems limit

-

l

-f

in

n

Am.

Kuhn, G. F., and Guernsey, R. M.~1983!. ‘‘Sound pressure distributionabout the human head and torso,’’ J. Acoust. Soc. Am.73, 95–105.

Lopez-Poveda, E. A., and Meddis, R.~1996!. ‘‘A physical model of sounddiffraction and reflections in the human concha,’’ J. Acoust. Soc. Am.100,3248–3259.

Mehrgardt, S., and Mellert, V.~1977!. ‘‘Transformation characteristics of theexternal human ear,’’ J. Acoust. Soc. Am.61, 1567–1576.

Morse, P. M., and Ingard, K. U.~1968!. Theoretical Acoustics~Princeton U.P., Princeton, NJ!, pp. 418–441.

Oppenheim, A. V., and Schafer, R. W.~1969!. Discrete-Time Signal Processing ~Prentice Hall, Englewood Cliffs, NJ!, pp. 447–450.

Perrett, S., and Noble, W.~1997!. ‘‘The effect of head rotations on verticaplane sound localization,’’ J. Acoust. Soc. Am.102, 2325–2332.

Shaw, E. A. G.~1974!. ‘‘The external ear,’’ inHandbook of Sensory Physiology, Vol. V/I: Auditory System, edited by W. D. Keidel and W. D. Nef~Springer Verlag, New York!, pp. 455–490.

Shaw, E. A. G.~1997!. ‘‘Acoustical features of the human external ear,’’Binaural and Spatial Hearing in Real and Virtual Environments, edited by

2064 J. Acoust. Soc. Am., Vol. 112, No. 5, Pt. 1, Nov. 2002

R. H. Gilkey and T. R. Anderson~Lawrence Erlbaum, Mahwah, NJ!, pp.25–47.

Strutt, J. W.~Lord Rayleigh! ~1907!. ‘‘On our perception of sound direc-tion,’’ Philos. Mag.13, 214–232.

Waterman, P. C., and Truell, R.~1961!. ‘‘Multiple scattering of waves,’’ J.Math. Phys.2, 512–537.

Wightman, F. L., and Kistler, D. J.~1989!. ‘‘Headphone simulation of free-field listening. I: Stimulus synthesis,’’ J. Acoust. Soc. Am.85, 858–867.

Wightman, F. L., and Kistler, D. J.~1997!. ‘‘Factors effecting the relativesalience of sound localization cues,’’ inBinaural and Spatial Hearing inReal and Virtual Environments, edited by R. H. Gilkey and T. R. Anderso~Lawrence Erlbaum, Mahwah, NJ!, pp. 1–23.

Wright, D., Hebrank, J. H., and Wilson, B.~1974!. ‘‘Pinna reflections ascues for localization,’’ J. Acoust. Soc. Am.56, 957–962.

Zhou, B., Green, D. M., and Middlebrooks, J. C.~1992!. ‘‘Characterizationof external ear impulse responses using Golay codes,’’ J. Acoust. Soc.92, 1169–1171.

Algazi et al.: Geometric head-related transfer function models