
SIMULATION OF ROOM ACOUSTICS TO TRAIN SOME ASPECTS OF HUMAN ECHOLOCATION

by

Eduard Fernández Aguilar

Final Thesis for the degree of

Telecommunication Engineering at Polytechnic University of Catalonia under the Erasmus exchange programme at Delft University of Technology

Supervisor: Dr. ir. Richard Hendriks


ACKNOWLEDGEMENTS

I would like to thank Richard Hendriks and Jorge Martínez for their continuous help during the last six months. I would also like to thank my family, who have been there showing their support. In addition, I have to show my gratitude to Cristina Leal, whose help with the statistical part has been providential. Finally, I need to thank Anna Bosquet, who has been extremely helpful in providing some of the ideas that have allowed me to finish the thesis.

Eduard Fernández Aguilar



ABSTRACT

Human echolocation is a technique that could improve the quality of life of many people who suffer from a visual impairment. The method consists of navigating the surroundings using the information provided by echoes. These echoes are the reflections of sounds deliberately produced by the person who is trying to echolocate.

This thesis is divided into a literature study and an implementation that simulates room acoustics to train some aspects of echolocation. The literature study covers three main topics. First, sound localization is detailed; since this is a large subject of study, only the factors involved in human echolocation are explained. These include, among others, sound localization cues, the role of the pinnae and head related transfer functions, and the precedence effect. Secondly, a review of the most important cues and characteristics of human echolocation is given. To finish the theoretical part of the thesis, the method chosen to simulate room acoustics is explained: the image method. Finally, some simulations and examples of the proposed model are provided. In addition to these simulations, the results of a user test are also shown. These results were evaluated using an exact binomial test.



CONTENTS

1 Introduction

2 Sound Localization
  2.1 Introduction
  2.2 Sound Localization and Lateralization
    2.2.1 Localization Cues
  2.3 Role of the Pinnae and HRTFs
    2.3.1 Role of the Pinnae
    2.3.2 Role of the HRTF
  2.4 Precedence Effect

3 Echolocation
  3.1 Introduction
  3.2 Sound
  3.3 Echo Information and Perception
    3.3.1 Surface Detection
    3.3.2 Object Perception
  3.4 Interpretation of Echo Information
    3.4.1 Signal Parameters
    3.4.2 The Ideal Echo Signal

4 Image Method
  4.1 Introduction
  4.2 Computational Models to Simulate Room Acoustics
    4.2.1 Wave-based Modelling
    4.2.2 Ray-based Modelling
  4.3 Image Method
    4.3.1 Image Model
    4.3.2 Image Method

5 Experiments and Simulations
  5.1 Experiment Setup
  5.2 Room Impulse Response
    5.2.1 Ideal
    5.2.2 Real Signals
  5.3 Simulations
    5.3.1 Distance discrimination
    5.3.2 Lateral discrimination
    5.3.3 Distance and lateral discrimination
  5.4 User Tests

6 Conclusions

Bibliography


1 INTRODUCTION

One of the most feared impairments is blindness. According to the World Health Organization, 285 million people are estimated to be visually impaired worldwide: 39 million are blind and 246 million have low vision. Almost everybody has seen a blind person walking in the street with a cane, accompanied by another person or with a seeing eye dog. However, it is less common to see a blind person producing sounds with the mouth as he or she moves. These people are using active echolocation.

To date there are no statistics available on how many blind people use echolocation, but anecdotal reports in the literature suggest that between 20 and 30% of totally blind people may use it [1]. It would be an incredible step forward for blind people if they could stop depending on a seeing eye dog or another person. Moreover, echolocation does not require any kind of device: it is learned through training, and all the needed equipment is provided by the human body. Blind people who can echolocate with ease thus become more self-sufficient.

There are associations, such as World Access for the Blind, that already teach people how to echolocate. However, there is no software that allows people to practise and train. In this thesis a simulated environment is proposed to train certain aspects of human echolocation. To that end, a literature study is presented first, and then the developed model is explained.

Human echolocation is the ability of some people to navigate their surroundings by making use of sounds. These sounds reflect off surfaces or objects, creating echoes, which are the elements that provide the needed information to the echolocation user. Some basic knowledge must be taken into account to understand echolocation. The most basic element is to comprehend how humans can identify where a sound comes from. This effect is named sound localization. Several features are involved in this process; some of the most important are the localization cues, involving interaural differences; the role of the pinnae and head related transfer functions, meaning how the body affects the received sounds; and finally the precedence effect, which is the phenomenon that explains why humans are often incapable of perceiving the echoes of close surfaces.

Although sound localization is the basis of echolocation, some particularities of echolocation need to be noticed. Not all types of sounds perform equally well, and, in the same way, different surfaces and objects provide different information to the subject who is trying to echolocate. It is therefore important to know which signals are best suited to echolocation and which surfaces or objects are easier to identify.

In order to simulate room acoustics, a mathematical model is needed to recreate this process virtually.




There are many ways to do so, but one of the most used is the image model. This method calculates the room impulse response of a point to point transmission in a rectangular room without objects in it. Using this method it is possible to obtain the room impulse response that would modify the sound produced by a person trying to echolocate. Applying this method, it is then possible to simulate what a person would hear if he or she produced a sound in an empty rectangular room. To check whether the model works correctly, some simulations are needed. It is also interesting to receive some feedback about how real people deal with these simulations.


2 SOUND LOCALIZATION

2.1. INTRODUCTION

Sound localization is the ability to estimate the direction and distance of a sound source in the environment of the listener. There is some terminology that should be known when dealing with sound localization. A monaural sound is one that reaches only one ear; a binaural sound, on the other hand, reaches both ears. For sound localization, binaural hearing is essential, as most of the cues that allow people to echolocate are based on interaural differences. These interaural differences will be explained in detail in this chapter. Two other useful terms are dichotic and diotic: if the sound arriving at the two ears is different it is called dichotic, whereas if it is identical it is called diotic.

In order to indicate accurately where a sound is coming from, a coordinate system is needed. Usually this coordinate system is defined by three planes: the horizontal, the frontal and the median plane. The horizontal plane is parallel to the eyes and passes through the entrances of the ear canals. The frontal plane is perpendicular to the horizontal plane and also passes through the entrances of the ear canals. Finally, the median plane is formed by all positions that form a right angle with both the horizontal and the frontal plane, in other words, all the points that are equally distant from both ears. Any possible direction of a sound can be specified by its azimuth and elevation. Any sound in the median plane has 0° azimuth and any sound in the horizontal plane has 0° elevation. Usually the azimuth ranges from 0° to 360°, where 0° is just in front of the head, 90° corresponds to the left ear, 180° is behind the head and 270° corresponds to the right ear. The elevation ranges from -90° to 90°, with 90° just above the head, 0° on the horizontal plane and -90° below the head. Figure 2.1 shows a graphic example of what this coordinate system looks like.

There is a special case of locating a sound that receives a different name. When the sound is emitted from headphones, the localization happens inside the head: a sound image is located inside the head. This process, called lateralization, differs from localization in that the sound source is not placed in the environment. Usually, lateralized sounds are located on the axis that links both ears, whereas localized sounds can be perceived as coming from any direction. Although the definition of sound localization refers to a sound source, it will be explained later that it can also be used to echolocate, as an echo itself also has an origin, equivalent to a sound source.
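As a small illustration of this convention, the sketch below converts an azimuth/elevation pair into a unit direction vector. The Cartesian axis assignment (x pointing to the front, y to the left, z upward) is an assumption made for the example, since the text only fixes the angle conventions; the function name is likewise illustrative.

    import math

    def direction_vector(azimuth_deg, elevation_deg):
        """Unit vector for a source direction in the head-centred
        coordinate system described above (axes are an assumed choice:
        x = front, y = left ear, z = up)."""
        az = math.radians(azimuth_deg)
        el = math.radians(elevation_deg)
        return (math.cos(el) * math.cos(az),   # front component
                math.cos(el) * math.sin(az),   # left component
                math.sin(el))                  # upward component

    # direction_vector(90, 0) -> (0.0, 1.0, 0.0): directly at the left ear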




Figure 2.1: System of coordinates used to define positions relative to the head [2].

2.2. SOUND LOCALIZATION AND LATERALIZATION

2.2.1. LOCALIZATION CUES

To explain the localization cues, only pure tones will be considered. Assume that there is a sinusoidal sound source placed in the horizontal plane at one side of the head. It is quite straightforward to see that the sound will arrive earlier and more intensely at the nearer ear. Therefore, two cues can be extracted from this scenario. The Interaural Time Difference (ITD) is the difference in arrival time between the ears. The second cue is the Interaural Intensity Difference (IID), which is the difference in intensity of a sound reaching the two ears. If an IID is given in decibels, it is called the Interaural Level Difference (ILD). For physical reasons, these two cues are not equally effective at all frequencies.

For low-frequency sounds, ILDs are negligible when the sound source is at a considerable distance. This is due to the fact that low-frequency sounds have a long wavelength compared to the size of the head: the sound bends around the head (also known as diffraction), making ILDs imperceptible. For high-frequency sounds the exact opposite happens. As they have a short wavelength relative to the head, the ILDs are very noticeable and can reach 20 dB for this type of signal. Figure 2.2 shows ILDs for different frequencies and angles of incidence.

ITDs range from 0 µs, for a sound impinging directly from the front, to 690 µs, for one impinging from -90° or 90°, that is, directly at one ear. In figure 2.3 ITDs are plotted as a function of azimuth. The ITD can, however, vary slightly depending on the frequency of the input.

If the input is a sinusoid, the ITD is equivalent to the Interaural Phase Difference (IPD), which refers to the difference in phase between the ears. For low-frequency tones, below 725 Hz, the IPD provides clear and unequivocal information about the sound location. The number 725 Hz is not chosen randomly: a tone of 725 Hz has a period of 1380 µs, which is exactly double the maximum ITD (690 µs). This means that, at the maximum ITD, a tone of this frequency is shifted by half a period between the ears, so the same waveform is presented at both ears and it is ambiguous which ear is leading. Thus, for higher frequencies the location of the sound may be ambiguous, especially above 1500 Hz. These ambiguities can be resolved by moving either the head or the sound source.

To conclude, it can be extracted that ITDs are the more useful cue for localizing low-frequency tones, whereas for high-frequency tones ILDs are the most suitable. This idea is called the duplex theory and it dates back to Lord Rayleigh (1907).
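To make these numbers concrete, a common back-of-the-envelope model, Woodworth's spherical-head approximation, estimates the ITD of a distant source as (r/c)(theta + sin(theta)). This formula is not part of the thesis and is offered only as an illustration; the head radius below is a typical assumed value.

    import math

    def itd_woodworth(azimuth_deg, head_radius=0.0875, c=343.0):
        """Woodworth's spherical-head approximation of the ITD, in
        seconds, for a distant source (0 degrees = straight ahead)."""
        theta = math.radians(azimuth_deg)
        return (head_radius / c) * (theta + math.sin(theta))

    # itd_woodworth(90) -> about 656 microseconds, of the same order as
    # the 690 microsecond maximum quoted above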



Figure 2.2: ILDs for a sinusoidal input [2].

Figure 2.3: ITD as a function of azimuth [2].

2.3. ROLE OF THE PINNAE AND HRTFS

2.3.1. ROLE OF THE PINNAE

Although head or sound source movements help with localization in the vertical direction, the ability to discern position in this scenario is not limited to these two resources. Many studies [3–5] have suggested that the pinnae provide important information for judging the vertical location of a sound; these studies also suggest that the pinnae provide significant information not only for vertical localization but for all directions.

Some studies [6–11] have proven that the pinnae modify the spectrum of an incoming sound depending on the angle at which this sound impinges relative to the head.



The sound that enters the ear canal can be divided into two parts: the sound that enters the ear canal directly, and the reflections that take place in the pinna. These reflections are delayed relative to the direct sound and, when added to it, can either cancel some frequencies, when the phase difference is 180°, or enhance others, when the reflections are in phase with the direct path. These spectral changes produced by the pinnae only affect frequencies above 6000 Hz, which are the ones with wavelengths small enough to interact with the pinnae. However, spectral changes are not limited to high frequencies, as the torso and head can also produce variations in the spectrum. Thus, the pinnae, the head and the torso form a filter that is direction dependent. This filter is the so-called Head Related Transfer Function (HRTF).

2.3.2. ROLE OF THE HRTF

HRTFs are individualized filters, as they depend on the shape and size of the head, torso and pinnae. A subject using another person's HRTF is still able to localize in the horizontal plane with accuracy. However, the accuracy when distinguishing a sound coming from the front or the back, or from above or below the head, is reduced [10].

HRTFs describe the filtering of the head, pinna and torso when sound from an acoustic point source is received at a defined position in the ear canal of a listener under free-field acoustic conditions [12]. The time-domain equivalent of the HRTF is called the Head Related Impulse Response (HRIR). The HRTF, or HRIR, is defined in free space. This means that it must be measured in an anechoic chamber, such that the environment does not affect the measurements.
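As an illustration of how an HRIR is used in practice, the sketch below renders a mono signal binaurally by convolving it with a measured left/right HRIR pair. The function name is illustrative, and the HRIRs are assumed to come from a measured dataset such as those discussed below.

    import numpy as np
    from scipy.signal import fftconvolve

    def render_binaural(mono, hrir_left, hrir_right):
        """Place a mono signal at the direction the HRIR pair was measured
        for, by convolving it with each ear's impulse response."""
        left = fftconvolve(mono, hrir_left)
        right = fftconvolve(mono, hrir_right)
        return np.stack([left, right], axis=1)  # stereo: one column per ear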

HRTF CHARACTERISTICS AND THEIR PERCEPTUAL RELEVANCE

According to [13], sound transfer from a given sound source in free space to a given listener, more specifically to the eardrum, can be divided into three parts. The first part is the transmission of the sound through free space until it reaches the blocked entrance of the ear canal. The second part is the impedance conversion related to the ear canal blocking. The last part is the transmission along the ear canal.

In the same study, all these parts were found to be very listener dependent. Nevertheless, the measurements taken at the blocked entrance showed the least deviation between individuals, and on this basis it was considered the most suitable point at which to make HRTF measurements.

A well known study [8, 14] reports accurate sound localization when subjects use their own HRTFs. However, when the same subjects used non-individual HRTFs, a high percentage of front-back and up-down confusions was reported. It can be deduced from this study that the interaural cues for horizontal sound localization are not strongly individual. On the other hand, spectral cues must be considered an important factor for resolving location along the cones of confusion. These are the cones on whose surface any sound source provides the same ITD.

SEQUENTIAL HRTF CAPTURING

A given HRTF depends on many parameters, the most important of which are the source position, the listener position, and the head and torso orientation. Most setups used to obtain HRTFs allow one or two of these parameters to be varied. In most of the available datasets [15–18], the incidence angle of the sound source was changed with respect to a fixed head and torso orientation at a constant source distance. In this setup, rotating the head and torso is equivalent to moving the sound source spherically. It is important to emphasize that the head orientation did not change in the mentioned datasets.

For a given angle of incidence, a measuring signal is emitted by the sound source and subsequently the response at the left and the right ear, the HRTF, is measured. In order to obtain a full set of HRTFs, this procedure is repeated for all the desired source positions.



2.4. PRECEDENCE EFFECT

In normal acoustic conditions, a sound from a given source arrives at the listener's ears via different paths. A part of this sound, usually most of it, arrives through the direct path, while the remaining parts reach the ears after reflecting off the surfaces of the environment. There are times, however, when the greatest part of the energy is concentrated in the reflections. Despite these echoes, which people usually cannot recognize, it is possible to discern the direction the sound was emitted from.

Some studies [19, 20] have investigated how the auditory system deals with echoes and how they affect sound localization. These studies usually consist of tests performed in free space or over headphones. Figure 2.4 shows the type of signals that were fed to the subjects when using headphones. It is important to know how the hearing system copes with echoes in order to understand echolocation, and some conclusions that affect echolocation can be extracted from these studies.

The first conclusion is that if the interval between the direct sound and the echo is short enough, they fuse and are heard as a single sound. This interval depends on the type of sound: for a single click it is about 5 ms, whereas for a complex sound (speech or music) it can be as long as 40 ms. This effect is called echo suppression. It is important to mention that even if the echo is not heard, because it is suppressed, the perception of the overall sound still changes.

Given echo suppression, it has been shown that the sound location is determined by the location of the first sound. This is the so-called precedence effect. There are, however, some constraints for the precedence effect to take place. It only occurs for sounds of a discontinuous or transient character, and the delay between the direct path and the echo must be larger than 1 ms; otherwise, the perceived location is determined by the average of the directions of the direct path and the echoes. This last effect is known as summing localization.

It may seem that the precedence effect is disruptive for echolocation, and this is true. However, the precedence effect does not suppress the echoes completely and, as explained above, a sound with an echo is not heard the same way as one without. Moreover, the ability to detect interaural delays of echoes can be improved through practice. What can be extracted from these conclusions is that echolocation needs practice and training in order to minimize the precedence effect.



Figure 2.4: Stimulus used in [19].


3 ECHOLOCATION

3.1. INTRODUCTION

Although most people think of animals when talking about echolocation, it is also possible for humans to develop skills that allow them to echolocate. It is especially useful for blind people, who can use this technique to navigate. All the information given in the following sections concerns human echolocation.

Echolocation can be defined as the ability to perceive echoes and use them to obtain information about the surrounding space and the objects in it. The auditory system processes the sound waves that reflect off surfaces, thus being able to obtain certain information.

Three components are needed to perceive echoes: a sound, a surface, which will produce an echo, and a receiver. The quality of the perceived reflections depends on each of these three components and the interactions between them. The main characteristics of these factors will be explained in this chapter.

3.2. SOUND

Sounds that can be perceived by humans are characterized by five basic parameters: directionality, pitch, timbre, intensity and envelope. Directionality, or directivity, is understood as the amount of focus that a sound has as it is produced by its source. In terms of human echolocation, the sounds used to produce an echo should have some directivity in order to know where the echoes come from.

The pitch refers to the dominant frequency of a sound. However, although pitch and frequency are closely related, they are not equivalent: the pitch is a subjective measure, whereas the frequency is an objective one. That said, the pitch and the dominant frequency can coincide. Humans are able to distinguish a large range of pitches, as they can hear sounds from 20 Hz to 20 kHz.

The third parameter is the timbre. It refers to the unique sound that a source makes. What allows different sounds to be distinguished is their spectral composition, that is, how the frequencies present in a sound are distributed over the spectrum. Simple timbres contain few frequencies while complex ones contain more. Furthermore, these frequencies can be grouped in a small range (narrow-band) or spread over a large range (broad-band).

The next parameter is the intensity, which is the loudness of a given sound. It is measured in decibels (dB).

The last parameter is the envelope. It is closely related to three factors: rise time, sustain time and decay time. The rise time, or onset, is the amount of time the signal takes to reach its peak from zero.




The sustain time is how long the signal stays at its average intensity. Finally, the decay time is the length of time the signal takes to decrease from its average intensity to zero. In practical terms, the envelope is the contour of the signal.
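To make the envelope terminology concrete, the following sketch synthesizes a short tonal click with explicit rise, sustain and decay segments; all parameter values are arbitrary examples, not values taken from the thesis.

    import numpy as np

    def click(fs=44100, rise_ms=1.0, sustain_ms=3.0, decay_ms=2.0, f0=3000.0):
        """Short tonal click whose envelope has the three parts described
        above: rise (onset), sustain and decay."""
        n_rise = int(fs * rise_ms / 1000)
        n_sustain = int(fs * sustain_ms / 1000)
        n_decay = int(fs * decay_ms / 1000)
        envelope = np.concatenate([
            np.linspace(0.0, 1.0, n_rise),    # rise time: zero to peak
            np.ones(n_sustain),               # sustain at average intensity
            np.linspace(1.0, 0.0, n_decay),   # decay back to zero
        ])
        t = np.arange(envelope.size) / fs
        return envelope * np.sin(2.0 * np.pi * f0 * t)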

3.3. ECHO INFORMATION AND PERCEPTION

Echoes are characterized by the same five parameters as a sound generated by a source. However, their characteristics correspond to the surface they reflected off, so the properties of the surface can be obtained by analyzing the echo. Surface detection will be explained in the following sections; once this explanation is given, object perception will be covered.

3.3.1. SURFACE DETECTION

Surface detection is the most basic element in echolocation. Being able to perceive the presence of a surface from its reflected echo is the most important factor: if no surface is detected, no further information can be obtained.

An echo can only exist if there is a surface. However, the absence of an echo does not necessarily mean that there is no surface. It might mean that the surface only casts echoes too weak to be perceived, or that environmental sounds mask the echoes.

The main factors involved in the change of intensity of an echo are the target parameters; the spatial relationship between the target, the sound source and the observer; and the background noise that might mask the echoes.

TARGET PARAMETERS

It is obvious that the more reflective a surface is, the more intense the echoes it produces will be.

One key factor that contributes to the quality of the reflected echoes is the target geometry. The dimensions, width and curvature of a target affect the strength of the reflected echo. The thinner a target is, the more difficult it is to detect, because thin surfaces tend to scatter or diffract more energy than they reflect. However, if such a surface is curved to increase its directivity, it can be detected again [21].

Another important factor in the echo quality produced by a surface is its composition, which refers to the density and texture of the surface [22]. Targets of low density tend to perform poorly in terms of reflectivity: soft surfaces absorb much of the energy, whereas sparse surfaces let the energy pass through them rather than reflecting it. In the same way, very smooth surfaces tend to reflect less energy than rougher surfaces; sound waves slide off polished surfaces, causing a lot of scattering [23, 24].

SPATIAL RELATIONSHIP BETWEEN TARGET AND OBSERVER

Distance and Size The echo, being a sound signal, is affected by distance as well: its intensity decreases as the distance the signal travels increases. Thus, the further away the surface is, the weaker the echo will be. Regarding size, as explained above, thin targets tend to scatter most of the energy of the impinging sound wave. Likewise, signals tend to pass around small targets, since the area they can bounce back from is not big enough.

Target Position There are not enough data about lateral and vertical target positioning to enable a clear understanding of the contradictions that have been found in different studies. Signal characteristics may be responsible for the apparent contradiction in these findings [25–30].



The manner in which the target is situated relative to the observer provides clearer results. If targets are flat planes facing the observer squarely, optimum perception is achieved. However, this is not the type of situation an observer will find on a daily basis. When a surface becomes more oblique, it diverts more energy away from the observer. Similarly, it can be perceived as a thinner surface, which leads to more scattering, as explained above [31].

EFFECTS OF SOUND SOURCE POSITION

Although there are many types of sound blind people can use (hand claps, cane taps, footsteps, tongue clicks...), it has been shown that one key factor is their position relative to the observer's ears [32, ?]. What type of sound is best for echolocation purposes will be discussed later on.

3.3.2. OBJECT PERCEPTION

Object perception is not just being able to detect a surface: the observer is able to perceive different features of the object, such as its shape, size or location. This ability does not just allow a blind person to avoid obstacles; it gives them the chance to interact with the objects they want in advance, and not just treat them as something to evade.

The most important features to take into account when dealing with object perception are object localization and the perception of size, form and composition.

OBJECT LOCALIZATION

Object localization refers to the ability to distinguish where an object is located. The most widely studied aspects of object localization are distance perception and lateral localization [?].

Distance Perception Some features of the envelope and the pitch seem to be the main parameters playing an important role in distance perception in humans [33].

Considering the envelope, apart from the factors already explained, there is another factor to take into account: the time delay. The time delay is defined as the interval between the onset of the source sound and the onset of the echo. This delay is directly proportional to the distance between the source and the target. There is a point at which the human ear can no longer distinguish the source sound from the echo; this occurs when the distance between them is around two or three meters.
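As a simple worked example of this proportionality (a sketch assuming the usual speed of sound of about 343 m/s), the delay is the round-trip travel time between the source and the reflecting surface:

    def echo_delay(distance_m, c=343.0):
        """Interval between a sound's onset and its echo's onset for a
        surface at the given distance (c = speed of sound in m/s)."""
        return 2.0 * distance_m / c

    # echo_delay(2.0) -> about 0.0117 s: a surface 2 m away echoes after ~11.7 ms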

Beyond that point, the ear relies on the pitch to perceive the distance. When the distance between the surface and the sound source decreases, the pitch rises compared to that of the source sound [34]. This effect can lead to the cancellation of certain frequencies or the enhancement of others; these changes can be explained by interference patterns between the reflected wave and the impinging one [?].

Lateral Localization As can be deduced, the ability to localize objects laterally comes from being able to identify the directional parameters of the echo. Still, some studies show that once an object is moved away from the frontal position, the ability to localize it drops off [25–28].

PERCEPTION OF SIZE

Studies of size discrimination have all followed a similar paradigm: the largest and the smallest stimuli from a given set are presented to the subject, then the next largest and smallest, and so on, until the subject is no longer able to perceive a difference between them [?].



These studies have shown that size perception is closely related to the distance between the subject and the object [35–39]. This is in accordance with what has been explained: small surfaces reflect less sound and therefore less intensity, in the same way that reflections from far surfaces lack intensity.

Other parameters that might theoretically be involved in size perception are timbre and directionality. Small surfaces tend to reflect high frequencies more easily, as their wavelengths are smaller than those of low frequencies, which may pass around the object and not bounce off it. This effect could change the timbre of the reflected sound compared to the original one. As for directionality, larger surfaces reflect a broader spread of wave fronts than smaller objects. This can be perceived by the listener as if the surface occupied a larger space, and hence as a larger object.

PERCEPTION OF FORM

Some studies show that blind people can distinguish forms. In theory, the directional characteristics of reflected energy combined with intensity variations should allow the perception of general form through the use of echoes [40, 41].

PERCEPTION OF COMPOSITION

Through spectrographic analysis of ultrasonic reflections, it has been shown that the ability to perceive surface composition from echoes is determined largely by the echo timbre [37, 38]. Some textures tend to reflect certain frequencies better than others, which changes the timbre of the echo and makes it possible to identify the composite nature of surfaces.

3.4. INTERPRETATION OF ECHO INFORMATION

In order for echolocation to be useful, the variables that characterize it must be understood under all circumstances. The degree to which a subject can extract useful information from an echo depends on the characteristics of the echo information and the nature of the environment in which it occurs, as well as on the physical and physiological capacities of the observer to perceive and process all this information. The signals that can be used to generate echoes are only useful as long as the listener can extract information from the echoes they generate; otherwise, all this information is either lost or meaningless [?].

3.4.1. SIGNAL PARAMETERS

FREQUENCY

It is believed that humans need to use high frequency sounds in order to be able to echolocate [42]. Although high frequencies cannot travel as far as low frequencies, the energy they carry reflects more completely from the surfaces they encounter. This is due to the fact that high frequencies have smaller wavelengths, which makes them reflect better from small objects or small features of surfaces [?].

However, high frequencies might not be as efficient when the objective is to perceive large features or to perceive at greater distances. Another limitation is that they do not perform well with tilted surfaces, as they tend to be scattered or diffracted [25]. It is also important to take into account that high frequencies are more easily obscured or buried by low frequency sounds than the other way around [25]. Therefore, in high noise environments, low frequencies may be the best option. Furthermore, as has already been explained, pitch and intensity discrimination, the most important abilities that enable echolocation, tend to be poor at high frequencies.

It is straightforward, then, to think that midrange frequencies are the most suitable for echolocation. Standard movement and navigation tasks rarely require detecting the smallest details, which allows the use of midrange frequencies.



TIMBRE

Studies of timbre agree that complex, wide band timbres are better than simple narrow band signals, as they can carry more useful information [43–45]. The reason why wide band signals are better than narrow band ones is that they contain a large range of frequencies. This way, high frequencies can be used to distinguish small details while midrange frequencies allow maximum intensity discriminability.

INTENSITY

It has been reported that sounds of medium intensity work better for echolocation than loud sounds [46]. There are two main reasons for this. The first is that, as the echo is always quieter than the original sound, a sound that is too loud could mask the reflection. The second is the design of the human auditory system, which tends to dampen reception about two milliseconds after the onset of a sound [47]; the mechanisms involved include the stapedius reflex and the neural refractory period [48]. This means that a sound seems to get quieter right after its start, especially a loud one. This mechanism also dampens echoes, which can thereby become undetectable.

ENVELOPE

To be able to use echolocation, a person needs to hear the majority of the echo, so not all kinds of signals can be used. Suitable signals are those that are short enough that the echo can be heard well: if the signal is over very quickly, most of the echo returns after the signal has finished, so there is no masking. It has been suggested that pulsed signals of less than ten milliseconds in duration are the most suitable for good echolocation in humans [49].

In addition, it is also very helpful if the signal has a very fast rise and decay time. This type of signal produces a phenomenon called a click transient: a brief burst of white noise at the rise time of the signal, which can reach very high frequencies depending on the physical nature of the signal. Even if the signal contains only low frequencies, a very quick rise and decay time provides a complex spread of frequencies up to a very high range. This is important because signals that are commonly used in human echolocation, such as finger snaps or tongue clicks, have high frequency components thanks to the click transient.

DIRECTIONALITY

In order for signals to provoke useful echoes, they must allow most of the reflected energy to come back to the listener's ears. In terms of echolocation, directionality has two components: the direction of the source signal and the direction of the reflected sound.

Directed signals are the most useful, as the energy is focused away from the observer [43, 46]. This type of signal brings important benefits. More intense signals can be used, because the ears are not in the direct path of the source signal, so the auditory system tends not to engage the suppressive mechanisms that might mask the echoes. Moreover, the use of more intense signals also elicits stronger echoes, which makes it easier to obtain information from them.

The direction of the reflected energy is determined by the direction of the source signal relative to the reflecting surface [32]. The amount of useful energy depends upon the position and orientation of the observer relative to the position and direction of the source signal and to the reflecting surface. Thus, it is quite straightforward to imagine that the best possible scenario is a signal emitted near the observer's ears and aimed at a perpendicular surface.

3.4.2. THE IDEAL ECHO SIGNAL

Analyzing all the given information, it can be said that the ideal signal should make use of frequencies throughout the audio spectrum and maximize the return of echo information to the ears.



A suitable signal would be a pulsed, directed and complex signal of variable intensity and short duration, originating near the ears. Additionally, the signal should be produced deliberately and should be consistent in its acoustic parameters.

There are two types of signals that fit the above description: artificial and organic signals.

Artificial signals need to be produced by an external device. These devices tend to be uncomfortable and are easily noticed. In terms of how they produce a signal, they can be classified as electronic or mechanical devices. Electronic devices can be designed to produce signals that cause optimal echoes. However, they tend to be expensive and they need a power source as well as periodic maintenance. Mechanical devices are usually clickers. These clickers are less obtrusive and cheaper than electronic devices, but their signal parameters cannot be changed and their directivity is limited as well.

Cane taps and footsteps are also considered mechanically produced sounds. They are better than the other mechanical devices in that they need no maintenance and are not expensive. On the other hand, in terms of signal production, they perform as poorly as a typical mechanical device: cane taps and footsteps produce sounds far from the ears, and they are highly affected by the ground.

Organic signals do not have most of the disadvantages of artificial ones. They do not require the manipulation of an external device, they are always available, they need no maintenance and they are free of charge. It is true that the produced signals are not as flexible as those produced by an electronic device, but they do have some of the needed parameters. Blind echo users can generate many signal types, but the most common are handclaps, finger snaps, vocalizations and oral clicks. The first two (handclaps and finger snaps) have some advantages, such as strong intensity and a suitable envelope; nevertheless, the lack of directivity and the need to use the hands are their main inconveniences. Oral signals do not need extra manipulation and additionally have more directivity than handclaps. Considering all this, almost all blind echo users use oral clicks.

Phoneticians classify oral clicks into five groups depending on how they are physically generated [50]. Each type of click has its own basic parameters (envelope, intensity and spectral characteristics). Theoretically, oral clicks should be signals with good properties for eliciting proper echoes, and empirical evidence confirms this [51, 52]. They can last for a very short time, around 4 ms, although the normal duration ranges from about 6.6 ms to 20 ms. Rise times are also quite short, fluctuating between 1.2 ms and 8 ms. In terms of frequency, depending on the type of oral click used, they can vary from 0.9 kHz to 8 kHz.

To conclude, the pulsed, complex and directional nature of oral clicks makes them a good candidate signal for echolocation. The possibility of controlling several parameters, such as intensity, timbre and directionality, makes them suitable for a large range of situations. For all these reasons, the oral click is considered the best candidate for blind people to echolocate with.


4 IMAGE METHOD

4.1. INTRODUCTION

One of the most popular methods to simulate room acoustics is the image method proposed by Allen and Berkley in 1979 [53]. They chose the image model because what they were seeking was the transfer function of a point to point transmission. The image model only takes into account those images that have an effect on the room impulse response. The most important feature of the image model is that, in the time domain, each image contributes only a pulse characterized by a gain and a delay.

The image method is not the only way to find a room impulse response (RIR). In this chapter, alternatives to the image method are explained first, followed by a more detailed explanation of the image method itself.

4.2. COMPUTATIONAL MODELS TO SIMULATE ROOM ACOUSTICS

Mathematically, sound propagation is described by the wave equation. Impulse responses from a source to a receiver can be obtained by solving the wave equation; however, the solution can rarely be expressed analytically, so approximations are very commonly used. That is the reason why computational models are employed.

Computational models to simulate room acoustics can be divided into three groups: wave-based modelling, ray-based modelling and statistical modelling. Statistical modelling will not be explained, as it is not suitable for auralization problems, but since it is a way to simulate room acoustics it needed to be mentioned.

4.2.1. WAVE-BASED MODELLING

Wave-based models return the most accurate results. However, analytical solutions can only be achieved in very simple cases, such as a rectangular room with rigid walls. Among all the wave-based models, only three will be mentioned, along with their main characteristics: the Finite Element Method (FEM), the Boundary Element Method (BEM) and the Finite-Difference Time-Domain (FDTD) method.

The FEM and BEM are numerical methods [54, 55], which means that they are computationally very demanding. These methods are therefore quite limited for real time auralization, as the high computational requirements are a big handicap. In FEM, the space is divided into volume elements, while in BEM only the boundaries of the space are divided into surface elements. The elements interact with each other according to the basics of wave propagation [56].




There are further requirements for these methods to be usable. The size of the elements the waves interact with needs to be much smaller than the wavelength and, moreover, it is highly recommended to use only low frequencies, as at high frequencies the required number of elements becomes very large. Thus, these methods should preferably be used only for small enclosures and low frequencies.

The last wave-based method that will be mentioned is FDTD [57, 58]. The principle FDTD is based on is to substitute the derivatives of the wave equation by finite differences. With this method, better impulse responses for auralization purposes are achieved compared with FEM and BEM. FEM and BEM, however, have the advantage that more complex structures can be modelled with the element methods than with FDTD.

4.2.2. RAY-BASED MODELLING

Ray-based models are based on geometrical room acoustics [59]. The most used ray-based models are ray-tracing [60] and the image method [53]. The main difference between these two methods of obtaining the room impulse response is how they calculate the reflection paths [61]. To calculate any room impulse response, all the sound reflection paths should be taken into account. In ray-tracing models the emitted sound is treated as a set of finite rays. These rays propagate through the room, and all the reflections caused by the collisions of these rays with the room walls are considered. All the rays are affected by the attenuation of travelling through the air and by the collisions with the room boundaries. Once all the rays reach the receiver, they are processed and the room impulse response is obtained. The rays emitted by the source can be structured as a set of randomly distributed angles, uniformly distributed angles, or a restricted set of angles. As can be noticed, ray-tracing models are not exhaustive, as they do not contemplate all the possible reflection paths, just the ones produced by the given set of rays emitted by the source. In contrast with ray-tracing, the image method is exhaustive. In return, the image method can only be applied in enclosures with plane surfaces, whereas ray-tracing can be used in rooms with arbitrary surfaces.

It should be mentioned that ray-based models do not take phase changes into account, as they are based on energy propagation. With this said, a further explanation of the image method is given in the next section.

4.3. IMAGE METHOD

Like the other methods explained above, the image method is suitable for calculating the reverberation of a room given a sound source and a receiver. Allen and Berkley developed a model to compute a Finite Impulse Response (FIR) between a source and a receiver within a rectangular room.

4.3.1. IMAGE MODEL

In order to explain how the image model works, it is best to use some figures, as it is much easier to see graphically. Figure 4.1 shows a source (S) and a receiver (D). It can be seen that two signals arrive at D: one corresponds to the direct path and the other to the reflected signal. The figure also shows the image source (S'), which is obtained by mirroring the room and the original source. The triangle SRS' being isosceles, by symmetry

\vec{SR} + \vec{RD} = \vec{S'D},    (4.1)

so computing the path length by adding \vec{SR} and \vec{RD} is the same as simply obtaining \vec{S'D}. Also, the use of an image source guarantees the presence of a reflection.



If reflections of higher order need to be found, such as second or third order reflections, it becomes even more evident that using the image model saves calculations. Figure 4.2 shows an example of how a third order reflection would be computed. In this case the equality is

\vec{SR} + \vec{RK} + \vec{KF} + \vec{FD} = \vec{S'''D},    (4.2)

so it is clear that the higher the order of the reflection to be calculated, the more efficient it is to use the image model.

Finally, in order to obtain a room impulse response of a given order, a lattice with as many image orders as the desired reflection order has to be created. An example of what this lattice could look like is shown in figure 4.3.

Figure 4.1: Path involving one reflection obtained using one image [56].



Figure 4.2: Path involving three reflections obtained using three images [56].

Figure 4.3: Lattice formed by a set of virtual sources.



4.3.2. IMAGE METHOD

Consider a rectangular room with a length of Lx , a width of Ly and a height of Lz . Then consider a source

located at s = [xs , ys , zs ] and a receiver located at r = [x, y, z]. Both of them are located at that coordinates

from x = 0, y = 0 and z = 0. The relative positions of the images respect to the receiver, taking the origin of

coordinates as origin, can be written as

Rp = [(−1)qx xs −x, (−1)qy ys − y, (−1)qz zs − z], (4.3)

and considering that

q = [qx , qy , qz ] ∈Q = {[qx , qy , qz ] : qx , qy , qz ∈ {0,1}

}(4.4)

8 combinations of image sources are obtained. It must be noticed that whenever (q) is 1 in any dimension,

an image source exists and it does not necessary have to be of order one. A possible solution to consider all

the images could be adding a vector to Rp. This new vector should look like

Rm = [2mx Lx ,2my Ly ,2mz Lz ] (4.5)

being

m = [mx ,my ,mz ] ∈M = {[mx ,my ,mz ] : −N < mx ,my ,mz < N ∈Z}

. (4.6)

Adding (4.5) to (4.3) all image sources can be obtained with the only restriction of N . Therefore, the order

reflection of an image source located at r+Rp +Rm can be obtained by the following equation

Op,m = |2mx −qx ,2my −qy ,2mz −qz |. (4.7)

The distance between any image source and the receiver can be written as

d = ‖Rp +Rm‖, (4.8)

therefore the Time Delay Of Arrival (TDOA) can be expressed as

τ= ‖Rp +Rm‖c

, (4.9)

where c is the sound velocity in meters per second.

Taking into account all the previous considerations, the impulse response for the given source and receiver can be written as

h(r, s, t) = \sum_{p \in Q} \sum_{m \in M} \beta_{x_1}^{|m_x - q_x|} \beta_{x_2}^{|m_x|} \beta_{y_1}^{|m_y - q_y|} \beta_{y_2}^{|m_y|} \beta_{z_1}^{|m_z - q_z|} \beta_{z_2}^{|m_z|} \, \frac{\delta(t - \tau)}{4 \pi d},    (4.10)

where the parameters \beta_{x_1}, \beta_{x_2}, \beta_{y_1}, \beta_{y_2}, \beta_{z_1} and \beta_{z_2} are the reflection coefficients of the six walls that form the room. Since the elements of m range between -N and N, there are (2N+1)^3 combinations of m; as stated above, q yields 8 combinations, so there are 8(2N+1)^3 possible combinations in total. The delay \tau of each impulse is calculated using (4.9). Once both summations are done, the signal that reaches the receiver is obtained by convolving the signal emitted by the sound source with the calculated impulse response.
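To make the construction concrete, the following minimal Python sketch evaluates (4.3)-(4.10) directly, placing each impulse at the nearest sample (an approximation discussed below). It is an illustration, not the implementation used in this thesis; the function name and argument layout are assumptions made here.

import numpy as np
from itertools import product

def image_method_rir(room, src, rcv, beta, fs=44100, c=340.0, N=10, length=4096):
    # room: (Lx, Ly, Lz); src/rcv: source and receiver positions;
    # beta: ((bx1, bx2), (by1, by2), (bz1, bz2)), one pair of walls per axis.
    h = np.zeros(length)
    for q in product((0, 1), repeat=3):                # the 8 combinations of (4.4)
        Rp = [(-1) ** q[i] * src[i] - rcv[i] for i in range(3)]       # (4.3)
        for m in product(range(-N, N + 1), repeat=3):  # the (2N+1)^3 shifts of (4.6)
            Rm = [2 * m[i] * room[i] for i in range(3)]               # (4.5)
            d = float(np.linalg.norm(np.add(Rp, Rm)))                 # (4.8)
            n = int(round(d / c * fs))                 # TDOA (4.9), rounded to a sample
            if n >= length:
                continue
            g = 1.0                                    # product of the six betas in (4.10)
            for i in range(3):
                g *= beta[i][0] ** abs(m[i] - q[i]) * beta[i][1] ** abs(m[i])
            h[n] += g / (4 * np.pi * d)
    return h

Note that the nested loops visit exactly 8(2N+1)^3 image sources, in agreement with the count given above.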

In order to implement this model, it must be noticed that the delay (4.9) might not match a sampling instant. Therefore, the discrete version of (4.10) can be expressed as

h(r, s, t) = \sum_{p \in Q} \sum_{m \in M} \beta_{x_1}^{|m_x - q_x|} \beta_{x_2}^{|m_x|} \beta_{y_1}^{|m_y - q_y|} \beta_{y_2}^{|m_y|} \beta_{z_1}^{|m_z - q_z|} \beta_{z_2}^{|m_z|} \, \frac{LPF\{\delta(t - \tau f_s)\}}{4 \pi d},    (4.11)

where f_s is the sampling frequency and LPF denotes a theoretically ideal Low Pass Filter (LPF) with cut-off frequency f_s/2. In [53] the following approximation was made:

LPF\{\delta(t - \tau f_s)\} \approx \delta(t - \mathrm{round}\{\tau f_s\}),    (4.12)

so the Time Of Arrival (TOA), in samples, was shifted to the nearest integer value. Nevertheless, there are applications where the TOA is a critical parameter, and this approximation can then be harmful for the desired purpose. Peterson suggested replacing each impulse with the impulse response of a Hanning-windowed ideal low-pass filter of the form [62],

\delta_{LPF}(t) = \begin{cases} \frac{1}{2} \left( 1 + \cos\left( \frac{2 \pi t}{T_\omega} \right) \right) \mathrm{sinc}(2 \pi f_c t) & \text{for } -\frac{T_\omega}{2} < t < \frac{T_\omega}{2} \\ 0 & \text{otherwise,} \end{cases}    (4.13)

where T_\omega is the width of the impulse response (in time) and f_c is the cut-off frequency of the low-pass filter. Using this approach, the true delays of arrival can be obtained even with a low sampling frequency. Figure 4.4 shows a comparison between the values obtained by Allen and Berkley's method (squares) and those obtained by Peterson's method (circles).
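As an illustration, (4.13) can be sampled with a few lines of Python; this is a sketch under assumptions made here (the 4 ms window width and all names are choices of this example, not values from the thesis):

import numpy as np

def peterson_impulse(tau, fs=44100, fc=22050.0, Tw=0.004):
    # Fractional-delay impulse of (4.13): a Hanning-windowed ideal low-pass
    # filter centred on the true, non-integer arrival time tau (in seconds).
    n0 = int(np.floor((tau - Tw / 2) * fs))            # first sample of the window
    n = np.arange(n0, int(np.ceil((tau + Tw / 2) * fs)) + 1)
    t = n / fs - tau                                   # time relative to the true TOA
    win = 0.5 * (1.0 + np.cos(2 * np.pi * t / Tw))     # Hanning window of width Tw
    imp = win * np.sinc(2 * fc * t)   # np.sinc(x) = sin(pi x)/(pi x), i.e. sinc(2 pi fc t)
    imp[np.abs(t) >= Tw / 2] = 0.0                     # zero outside the window
    return n0, imp                                     # offset and samples to add to the RIR

Adding these few samples into the RIR buffer, instead of a single rounded impulse, preserves the true time of arrival even at low sampling rates.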

The last thing that must be considered when simulating room acoustics is the reverberation time, defined as the time needed for the sound level to decay 60 dB below the level of the direct path. An empirical formula known as the Sabine-Franklin formula [63] can be used to determine this reverberation time, also known as RT60:

RT_{60} = \frac{24 \ln(10) V}{c \sum_{i=1}^{6} S_i (1 - \beta_i^2)},    (4.14)

where V is the volume of the room, \beta_i the reflection coefficient and S_i the surface of the i-th wall.
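For illustration, (4.14) can be evaluated for a rectangular room with the short sketch below; the ordering of the six walls in the argument list is an assumption of this example:

import numpy as np

def rt60_sabine_franklin(L, beta, c=340.0):
    # L: (Lx, Ly, Lz) in meters; beta: reflection coefficients of the six
    # walls, ordered as (x1, x2, y1, y2, z1, z2), two walls per axis.
    Lx, Ly, Lz = L
    V = Lx * Ly * Lz
    S = np.array([Ly * Lz, Ly * Lz, Lx * Lz, Lx * Lz, Lx * Ly, Lx * Ly])
    beta = np.asarray(beta, dtype=float)
    return 24.0 * np.log(10.0) * V / (c * np.sum(S * (1.0 - beta ** 2)))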


Figure 4.4: Comparison of the shifted and low-pass impulse method [62].


5. EXPERIMENTS AND SIMULATIONS

The aim of this thesis is to check whether it is possible to simulate room acoustics to train certain aspects of echolocation. The proposed experiments and simulations are explained in this chapter. Additionally, some user tests are performed to assess the viability of the developed simulations.

5.1. EXPERIMENT SETUP

The used scenario is an empty room of variable dimensions; however, most of the simulations have been performed using a room with a width of 4 meters, a length of 4 meters and a height of 3 meters. As the RIR is calculated using the image method explained in Chapter 4, no objects can be placed in the room. In order to calculate suitable reflection coefficients, the Sabine-Franklin formula (4.14) is used. The chosen RT60 ranges from 0.5 to 0.8 seconds according to the German standard DIN 18041. In this case, according to the recommendations given in DIN 18041, which can be seen in Figure 5.1, for a volume of 48 m^3 the corresponding RT60 is 0.4 seconds. With the given data, the average reflection coefficient is 0.7583. However, in order to make the simulation more realistic and to mitigate the lack of data from the used HRTF database, the floor reflection coefficient is set to 0.001, emulating a carpeted floor.



Figure 5.1: Recommendations of DIN 18041 regarding the reverberation time in a room at 500 Hz as a function of its use and volume.

The used HRTF is the one provided by [15]. It includes data from an elevation of -40º to 90º in steps of 10º. Regarding the azimuth, the 360º are sampled in equidistant steps, but the step size might differ between elevations. Figure 5.2 shows the number of measurements and the azimuth increment for each elevation.

Figure 5.2: Number of measurements for each elevation.

The listener is modelled with a loudspeaker and two microphones. The loudspeaker represents the mouth and the two microphones represent the ears. The ears are separated by 18 centimeters along the intersection of the frontal and horizontal planes. The mouth lies 9 centimeters from the axis formed by the ears, in the horizontal plane, equidistant from both ears. The scenario can be seen in Figure 5.3.

Figure 5.3: Scenario.

5.2. ROOM IMPULSE RESPONSE

5.2.1. IDEAL

The first thing to do is to understand how a basic Room Impulse Response (RIR) under these conditions should look. The room dimensions are the ones given above, i.e. 4x4x3 meters. Figure 5.4 illustrates the explanation.

The first and largest peak (sample 18) corresponds to the direct path of the signal: the sound that goes directly from the mouth to the ears. The ceiling reflection is located at sample 312. Considering that the sampling frequency is 44100 Hz, this is correct, as the sample number N is given by the following expression:

N = \frac{x f_s}{c},    (5.1)

thus, with a distance to the ceiling of 1.2 meters and c = 340, the obtained sample is 311. It should be taken into account that the distance to be used is double the physical one, as the sound has to reach the ceiling, bounce, and come back.
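This check can be reproduced with a direct evaluation of (5.1), shown here for the ceiling reflection:

def reflection_sample(x, fs=44100, c=340.0):
    # N = x * fs / c, with x the total travel distance of the sound (5.1)
    return x * fs / c

# The ceiling is 1.2 m away; the sound goes up and comes back, so x = 2 * 1.2 m.
print(round(reflection_sample(2 * 1.2)))   # -> 311, matching the peak in Figure 5.4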

The next peak is the main reflection: the one that bounces off the front wall. With the subject 2.91 meters away from the front wall, the reflection should theoretically be at sample 377; the obtained sample is 379.

What comes next are the lateral and crosswise reflections. The peak at sample 490 corresponds to the echo produced by the floor and the lower part of the front wall. Most of the subsequent echoes do not impinge frontally, thus there is a small time difference between their arrival at one ear and at the other. This is very clear in the peaks at samples 508 and 532, which are the lateral reflections. The difference of 24 samples corresponds to 18 centimeters, which is the separation between the ears.

Once the HRTF is applied, the signal becomes much more difficult to analyze. However, the presence of the echoes can still be seen clearly. Figure 5.5 shows the RIRs with and without the application of the HRTF.

Figure 5.4: Room Impulse Response.

Figure 5.5: Room Impulse Response with HRTF.

There is a delay of approximately 50 samples, because convolving the RIR with the HRTF delays the resulting signal by 50 samples.


5.2.2. REAL SIGNALS

To work with a realistic environment, a real human click is needed (Figure 5.6). When convolving a real human click with the RIRs seen above, it becomes really difficult to distinguish individual reflections. However, it is still easy to identify the first reflection. Second reflections may even produce peaks with a higher amplitude than the first reflection; this happens when, in a given interval, the sum of different reflections creates a peak of greater amplitude than the first reflection. Figure 5.7 shows a comparison between the resulting signals with and without the HRTF.
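The rendering step itself is a plain convolution per ear. A minimal sketch, assuming the click and the binaural RIRs are available as one-dimensional arrays (all names here are illustrative), is:

from scipy.signal import fftconvolve

def render_binaural(click, rir_left, rir_right):
    # What the listener hears: the emitted click convolved with the RIR of
    # each ear; the RIRs may or may not already include the HRTF filtering.
    return fftconvolve(click, rir_left), fftconvolve(click, rir_right)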

For this experiment to be useful for human echolocation purposes, it must be dynamic: the room dimensions and the position of the listener need to be easily changed. That is one of the advantages of using the image method.

Figure 5.6: Real human click.


Figure 5.7: Real click comparison.

5.3. SIMULATIONS

5.3.1. DISTANCE DISCRIMINATION

Figures 5.8 and 5.9 clearly show that the proposed model deals correctly with distance variations. Approaching the wall makes the first reflection arrive earlier, and the samples of arrival are consistent with the distance between the listener and the wall. It can also be noticed that non-varying elements, such as the ceiling or lateral reflections, are not altered.

Figure 5.8: Distance discrimination.


Figure 5.9: Distance discrimination with HRTF.

5.3.2. LATERAL DISCRIMINATION

Just as the model deals properly with distance variations, it also works correctly when testing lateral discrimination. When the listener is moved closer to a lateral wall, figures 5.10 and 5.11 show how the first reflection now becomes the lateral reflection resulting from the echo off that side of the room. It can also be noticed how the channel closer to the wall receives the echo earlier than the other one (blue for the left channel and red for the right channel).

In this case, the graphics corresponding to the test with HRTF make the lateral discrimination clearer (Figure 5.11). This is due to the nature of the HRTF, which is what allows humans to distinguish whether sounds come from the left, right, above or below the head, so it makes sense that this is noticeable in the graphics.

However, as already explained in Chapter 3, it is not proven that humans can perceive objects laterally. It is therefore interesting to see how the model deals with lateral movements, although this might not be useful for the echolocation training problem.


Figure 5.10: Lateral discrimination.

Figure 5.11: Lateral discrimination with HRTF.

5.3.3. DISTANCE AND LATERAL DISCRIMINATION

The last test was performed to check whether the model is able to handle both distance and lateral discrimination at once. As can be observed in figures 5.12 and 5.13, the model is able to do so. When the listener is close to the front wall and to one side wall, the front reflection arrives earlier and the corresponding side channel receives the lateral reflection earlier. Similarly, when the subject is further from the front wall but closer to a side wall, the lateral reflections prevail. As explained above, this last case is much more evident in the graphics that include the HRTF.


Figure 5.12: Distance and lateral discrimination.

Figure 5.13: Distance and lateral discrimination with HRTF.

5.4. USER TESTS

In order to validate whether the designed model is applicable, some user tests were performed. The user test consists of noticing the difference between two sounds recorded at different distances from a wall. To make the task easier, all the reflection coefficients except the one of the front wall were set to zero. The compared distances can be found in Table 5.1.


Distances
Long vs Long       10    8
Long vs Medium     10    4
Long vs Short      10    0.5
Medium vs Medium    4    4
Medium vs Short     4    0.8
Short vs Short      1    0.5

Table 5.1: Pairs of distances used in user tests.

The number of user tests carried out was 14: 7 with visually impaired people and 7 with people without vision problems. The sound pairs that each user heard were random. This led to a situation where some subjects compared a given pair more than once and did not evaluate others.

Each user test consisted of the comparison of 10 pairs. The first five sound pairs were filtered by the HRTF while the last five did not have any HRTF involved. The obtained results can be found in Tables 5.2 and 5.3.

Visually impaired people
                    HRTF            No HRTF
                  Right  Wrong    Right  Wrong
Long vs Long         0      3        4      3
Long vs Medium       3      2        2      2
Long vs Short        2      2        1      2
Medium vs Medium     4      3        1      3
Medium vs Short      6      5        6      4
Short vs Short       2      3        6      1
Total               17     18       20     15

Table 5.2: Obtained results for visually impaired people.

People without vision problems
                    HRTF            No HRTF
                  Right  Wrong    Right  Wrong
Long vs Long         1      2        3      3
Long vs Medium       2      1        1      4
Long vs Short        2      0        2      0
Medium vs Medium     7      2        2      2
Medium vs Short     10      2        7      4
Short vs Short       3      3        4      3
Total               25     10       19     16

Table 5.3: Obtained results for people without vision problems.

In order to determine whether the test gives significant results, an exact binomial test was performed for each pair of sounds and globally. A result is considered significant when the probability of obtaining it purely by chance is sufficiently low. The tests that provided significant results are the medium vs short distance comparison (with HRTF) for people without vision problems and the overall test (with HRTF) for the same group. The reason why more tests were not validated is the reduced population used during these user tests.
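These two significant outcomes can be reproduced from the counts in Table 5.3 with an exact binomial test against chance level; a two-sided test with p = 0.5 is assumed here:

from scipy.stats import binomtest

# Medium vs short, with HRTF, sighted group: 10 right out of 12 trials.
print(binomtest(10, n=12, p=0.5).pvalue)   # ~0.039, significant at the 5% level
# Overall score with HRTF for the same group: 25 right out of 35 trials.
print(binomtest(25, n=35, p=0.5).pvalue)   # ~0.017, significant at the 5% level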

Looking at the results test by test, in most of them there is no clear tendency towards the right option. This can again be attributed to the small population used. Considering the tests globally, it can be seen that both groups perform reasonably well when the sounds are not filtered by an HRTF.


However, while the visually impaired group performs poorly when dealing with sounds filtered by an HRTF, the other group performs particularly well in this case.


6. CONCLUSIONS

In this thesis, a model to simulate room acoustics, to be used to train certain aspects of echolocation, has been developed. A theoretical introduction explaining how the proposed approach works has been given as well.

The implemented model, as explained above, works properly: the echoes arrive when they should, and the model is capable of dealing with several reflections at the same time. Therefore, it can be concluded that the proposed model could be suitable to accomplish the goal it was created for. A user test has also been performed, in which both women and men participated, although men outnumbered women. It is also important to consider that not all the visually impaired subjects were completely blind: some of them were blind, whereas the others had low vision. Unfortunately, due to the lack of resources, the test has not been as large as it should have been to gather enough data to draw firm conclusions.

Despite the inconclusive results, the developed model is promising, as demonstrated in Chapter 5. Further testing is suggested in order to validate whether it is possible to train certain aspects of echolocation by simulating room acoustics. Some factors that should be looked into are the necessity of a personal HRTF as well as the use of a self-made click.


BIBLIOGRAPHY

[1] L. Thaler, Echolocation may have real-life advantages for blind people: an analysis of survey data, Frontiers in Physiology 4 (2013).

[2] B. C. Moore, An Introduction to the Psychology of Hearing (Brill, 2012).

[3] R. Butler, Monaural and binaural localization of noise bursts vertically in the median sagittal plane, Journal of Auditory Research 9, 230 (1969).

[4] D. W. Batteau, The role of the pinna in human localization, Proceedings of the Royal Society of London B: Biological Sciences 168, 158 (1967).

[5] H. G. Fisher and S. J. Freedman, The role of the pinna in auditory localization, Journal of Auditory Research (1968).

[6] E. Shaw, Transformation of sound pressure level from the free field to the eardrum in the horizontal plane, The Journal of the Acoustical Society of America 56, 1848 (1974).

[7] S. R. Oldfield and S. P. Parker, Acuity of sound localisation: a topography of auditory space. I. Normal hearing conditions, Perception 13, 581 (1984).

[8] F. L. Wightman and D. J. Kistler, Headphone simulation of free-field listening. I: Stimulus synthesis, The Journal of the Acoustical Society of America 85, 858 (1989).

[9] J. Kawaura, Y. Suzuki, F. Asano, and T. Sone, Sound localization in headphone reproduction by simulating transfer functions from the sound source to the external ear, Journal of the Acoustical Society of Japan (E) 12, 203 (1991).

[10] E. M. Wenzel, M. Arruda, D. J. Kistler, and F. L. Wightman, Localization using nonindividualized head-related transfer functions, The Journal of the Acoustical Society of America 94, 111 (1993).

[11] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization (MIT Press, 1997).

[12] J. Blauert, The Technology of Binaural Listening (Springer, 2013).

[13] D. Hammershøi, H. Møller, et al., Sound transmission to and within the human ear canal, The Journal of the Acoustical Society of America 100, 408 (1996).

[14] F. L. Wightman and D. J. Kistler, Headphone simulation of free-field listening. II: Psychophysical validation, The Journal of the Acoustical Society of America 85, 868 (1989).

[15] W. G. Gardner and K. D. Martin, HRTF measurements of a KEMAR, The Journal of the Acoustical Society of America 97, 3907 (1995).

[16] J. Blauert, M. Brueggen, K. Hartung, A. W. Bronkhorst, R. Drullmann, G. Reynaud, L. Pellieux, W. Krebber, and R. Sottek, The AUDIS catalog of human HRTFs, in 16th International Congress on Acoustics (1998) pp. 2901–2902.

[17] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, The CIPIC HRTF database, in Applications of Signal Processing to Audio and Acoustics, 2001 IEEE Workshop on the (IEEE, 2001) pp. 99–102.

[18] N. Gupta, A. Barreto, M. Joshi, and J. C. Agudelo, HRTF database at FIU DSP Lab, in Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on (IEEE, 2010) pp. 169–172.

[19] H. Wallach, E. B. Newman, and M. R. Rosenzweig, A precedence effect in sound localization, The Journal of the Acoustical Society of America 21, 468 (1949).


[20] R. Y. Litovsky, H. S. Colburn, W. A. Yost, and S. J. Guzman, The precedence effect, The Journal of the Acoustical Society of America 106, 1633 (1999).

[21] C. E. Rice and S. H. Feinstein, The influence of target parameters on a human echo-detection task, in Proceedings of the Annual Convention of the American Psychological Association (American Psychological Association, 1965).

[22] V. Twersky, On the Scattered Reflection of Scalar Waves from Absorbant Surfaces (Reports from the Mathematics Research Group, Washington Square College of New York University, 1950).

[23] V. Twersky, On the non-specular reflection of plane waves of sound, The Journal of the Acoustical Society of America 22, 539 (1950).

[24] V. Twersky, On the physical basis of the perception of obstacles by the blind, The American Journal of Psychology, 409 (1951).

[25] I. Kohler, Orientation by aural clues (American Foundation for the Blind, Research Bulletin, 1964) pp. 14–53.

[26] C. E. Rice, Perceptual enhancement in the early blind?, The Psychological Record (1969).

[27] C. Rice, Early blindness, early experience and perceptual enhancement, Research Bulletin of the American Foundation for the Blind 22, 1 (1970).

[28] B. N. Schenkman, Human Echolocation as a Function of Kind of Sound Source and Object Position (Department of Psychology, University of Uppsala, 1983).

[29] W. Dolanski, Les aveugles possèdent-ils le «sens des obstacles»?, L'Année Psychologique 31, 1 (1930).

[30] V. Dolanski, Do the blind sense obstacles?, And There Was Light 1, 8 (1931).

[31] N. Clarke, G. Pick, and J. Wilson, Obstacle detection with and without the aid of a directional noise generator, American Foundation for the Blind, Research Bulletin (1975).

[32] J. Wilson, Psychoacoustics of obstacle detection using ambient or self-generated noise, Animal Sonar Systems: Biology and Bionics 1, 89 (1967).

[33] B. N. Schenkman, Human Echolocation: A Review of the Literature and a Theoretical Analysis (Department of Psychology, University of Uppsala, 1985).

[34] I. G. Bassett and E. J. Eastmond, Echolocation: Measurement of pitch versus distance for sounds reflected from a flat surface, The Journal of the Acoustical Society of America 36, 911 (1964).

[35] C. E. Rice, S. H. Feinstein, and R. J. Schusterman, Echo-detection ability of the blind: Size and distance factors, Journal of Experimental Psychology 70, 246 (1965).

[36] C. E. Rice and S. H. Feinstein, Sonar system of the blind: size discrimination, Science 148, 1107 (1965).

[37] J. Juurmaa, Analysis of orientation ability and its significance for the rehabilitation of the blind, Scandinavian Journal of Rehabilitation Medicine 1, 80 (1968).

[38] J. Juurmaa, On the accuracy of obstacle detection by the blind, New Outlook for the Blind (1970).

[39] W. N. Kellogg, Sonar system of the blind: new research measures their accuracy in detecting the texture, size, and distance of objects "by ear", Science 137, 399 (1962).

[40] C. E. Rice, Human echo perception, Science 155, 656 (1967).

[41] S. Hausfeld, R. P. Power, A. Gorta, and P. Harris, Echo perception of shape and texture by sighted subjects, Perceptual and Motor Skills 55, 623 (1982).

[42] L. H. Riley, D. M. Luterman, and M. F. Cohen, Relationship between hearing ability and mobility in a blinded adult population, New Outlook for the Blind 58, 139 (1964).


[43] H. Laufer, The detection of obstacles with the aid of sound directing devices, Biological Review 10, 30 (1948).

[44] M. Supa, M. Cotzin, and K. M. Dallenbach, "Facial vision": the perception of obstacles by the blind, The American Journal of Psychology, 133 (1944).

[45] M. Cotzin and K. M. Dallenbach, "Facial vision": the role of pitch and loudness in the perception of obstacles by the blind, The American Journal of Psychology, 485 (1950).

[46] V. Twersky, Auxiliary mechanical sound sources for obstacle perception by audition, The Journal of the Acoustical Society of America 25, 156 (1953).

[47] W. Wiener and G. Lawson, Audition for the traveler who is visually impaired, Foundations of Orientation and Mobility 2, 104 (1997).

[48] C. Carlson-Smith and W. Wiener, The auditory skills necessary for echolocation: a new explanation, Journal of Visual Impairment and Blindness 90, 21 (1996).

[49] D. R. Griffin, Listening in the Dark: The Acoustic Orientation of Bats and Men (1958).

[50] P. Ladefoged and A. Traill, Clicks and their accompaniments, Journal of Phonetics 22, 33 (1994).

[51] C. Rice, The human sonar system, in Animal Sonar Systems: Biology and Bionics (NATO Advanced Study Institute, Frascati, Italy, 1966) pp. 719–755.

[52] C. Rice, Quantitative measures of unaided echo detection in the blind: auditory echo localization, in Proceedings of the International Conference on Sensory Devices for the Blind (1966) pp. 89–102.

[53] J. B. Allen and D. A. Berkley, Image method for efficiently simulating small-room acoustics, The Journal of the Acoustical Society of America 65, 943 (1979).

[54] M. Kleiner, B.-I. Dalenbäck, and P. Svensson, Auralization: an overview, Journal of the Audio Engineering Society 41, 861 (1993).

[55] A. Pietrzyk, Computer modeling of the sound field in small rooms, in Audio Engineering Society Conference: 15th International Conference: Audio, Acoustics & Small Spaces (Audio Engineering Society, 1998).

[56] E. A. Habets, Room impulse response generator, Technische Universiteit Eindhoven, Tech. Rep. 2, 1 (2006).

[57] D. Botteldooren, Finite-difference time-domain simulation of low-frequency room acoustic problems, The Journal of the Acoustical Society of America 98, 3302 (1995).

[58] L. Savioja, J. Backman, A. Järvinen, and T. Takala, Waveguide mesh method for low-frequency simulation of room acoustics (1995).

[59] H. Kuttruff, Room Acoustics (CRC Press, 2009).

[60] A. Kulowski, Algorithmic representation of the ray tracing technique, Applied Acoustics 18, 449 (1985).

[61] L. Savioja, J. Huopaniemi, T. Lokki, and R. Väänänen, Creating interactive virtual acoustic environments, Journal of the Audio Engineering Society 47, 675 (1999).

[62] P. M. Peterson, Simulating the response of multiple microphones to a single acoustic source in a reverberant room, The Journal of the Acoustical Society of America 80, 1527 (1986).

[63] A. D. Pierce et al., Acoustics: An Introduction to Its Physical Principles and Applications (Acoustical Society of America, Melville, NY, 1991).