Compression Techniques For Digital Hearing Aids

    Garini Nikoleta

    September 15, 2009

Preface

    This thesis is based upon studies conducted from October 2008 to August 2009 at the Department of Electrical and Computer Engineering of the University of Patras. It deals with some basic issues related to Digital Hearing Aids and, more specifically, with the matter of compression in hearing aid devices.

    There has been an explosion in the number of digital hearing aids on the market in the last five years. At last count, there were 22 manufacturers with digital hearing aids marketed under 40 different model names. Manufacturers are moving toward their third or fourth generation of digital products.

    The first chapter is a general introduction to hearing aids. It refers briefly to the human auditory system and the exact problems faced by people with hearing impairment. It also presents the underlying theory behind compression and its major role in decreasing the range of sound levels in the environment to better match the dynamic range of a hearing-impaired person. Compression systems are used to achieve specific aims, and different compression parameters are needed for each rationale.

    Chapter 2 contains different approaches for compression in the frequency domain, among them Multiband Compression, Wide Dynamic Range Compression and Output Limiting Compression. The classic frequency-domain compression uses FFT processing, and both the ideal and the practical FFT systems are described. In order to approximate the non-uniform frequency resolution of the human auditory system, warped compression systems are used for speech enhancement.

    Chapter 3 is dedicated to the theory of Multirate Filter Banks and the Polyphase Decomposition as an efficient way of implementing them. A different prototype filter design is thoroughly described and is proposed since it provides a minimum combined approximation error.

    Chapter 4 explains the approximation of the time-domain post filter, whose gain coefficients are adapted in the frequency domain, by an all-pole filter of lower degree. A way of eliminating sharp zeros in the filter's frequency response is suggested, and simulation results provide an evaluation of the proposed technique.

    Appendices A, B, C and D serve as reference and provide Matlab code and some useful proofs and derivations.


Acknowledgements

    This master thesis was successfully completed during my graduate studies in the inter-departmental program Signal Processing & Communication Systems of the Department of Computer Engineering and Informatics at the University of Patras. Its main target is to enhance and evaluate some compression techniques applied to Digital Hearing Aids.

    I am deeply thankful to my Professor George Moustakides for his advice, for his unique support and for the pleasant environment he has offered me at the Department of Electrical and Computer Engineering at the University of Patras. His enthusiasm about the project and his experience helped me to cope with issues in Digital Signal Processing that seemed difficult to me at first.

    The master thesis evaluation was performed by Nikolaos P. Galatsanos, Professor at the Department of Electrical and Computer Engineering at the University of Patras, and Professor Emmanouil Psarakis of the Department of Computer Engineering and Informatics at the University of Patras.

    Garini Nikoleta
    Patras, 2009


Contents

1 Introduction to Hearing Aids
   1.1 Description of Human Auditory System and Acoustic Measurements
      1.1.1 Cochlear Tuning and Frequency Selectivity
      1.1.2 Linear Amplifiers and Gains
      1.1.3 Sound Pressure Level and Absolute Threshold of Hearing
   1.2 Problems Faced by Hearing-impaired People
   1.3 Compression In Hearing Aids
      1.3.1 Compression's major role: Reducing the Signal's Dynamic Range
      1.3.2 Basic Characteristics of a Compressor
      1.3.3 Rationales for use of Compressors

2 Approaches for Compression in Frequency Domain
   2.1 Multiband Compression and FFT Processing
   2.2 Frequency-Domain Compression
      2.2.1 Describing Ideal and Practical FFT system
      2.2.2 Side-Branch Architecture
   2.3 Warped Compression System
      2.3.1 Concerns in designing Compression Systems
      2.3.2 Digital Frequency Warping
      2.3.3 Compressor using frequency warping
   2.4 Warped Low-Delay Post Filter
      2.4.1 Warped Post Filter for Speech Enhancement
      2.4.2 Warped Low-Delay Post Filter

3 Filter Banks and Prototype Filter Structures
   3.1 Multirate Systems and Filter Banks
   3.2 Uniform and Non-Uniform DFT Filter Banks
   3.3 Polyphase Representation of Filter Banks
      3.3.1 Basic concept of Polyphase Decomposition
      3.3.2 Why the name "Polyphase Decomposition"?
      3.3.3 Polyphase Implementation of Uniform DFT filter banks
   3.4 Efficient Non-Uniform Filter Bank Equalizer
   3.5 Prototype Filter Design
      3.5.1 Different Parameters of prototype filter
      3.5.2 Realization of different prototype filter structures
   3.6 Conclusions

4 Low Delay Time-Domain Post Filter
   4.1 Uniform Auto-Regressive Low-Delay Post Filter
   4.2 Allpass transformed Auto-Regressive Low-Delay Post Filter
      4.2.1 Approximation of the warped post filter
      4.2.2 Approximation of the uniform post filter
   4.3 Simulation Results
   4.4 Low Delay FIR Filter Design
   4.5 Elimination of Deep Nulls
   4.6 Conclusions

A The DFT and IDFT Matrices

B Solving a min-max problem using Linear Programming

C Proofs & Derivations

D Matlab code

List of Figures

1.1 The anatomy of the peripheral auditory system [3].
1.2 Frequency Threshold Tuning curves.
1.3 Tuning curves for five neurons [6].
1.4 Results of a masking experiment [7].
1.5 Human auditory thresholds as a function of frequency. Sounds that fall in the shaded region below the curve are below threshold and therefore inaudible.
1.6 Saturation Sound Pressure Level Frequency Response of a hearing aid [1].
1.7 Audiogram with different speech sounds.
1.8 Decreased Dynamic Range for hearing-impaired people.
1.9 Input/Output curves showing effects of Output Limiting Compression (left) and Wide Dynamic Range Compression (right).
1.10 The effects of a compressor on a signal. Only the middle portion of the input is above the compressor's threshold. Note the overshoot when the signal level increases (it takes some time for the gain to decrease), and the attenuation when the input signal returns to the first level (and the gain increases). The release time is generally longer than the attack time.
2.1 Block diagram of a multi-channel compression system.
2.2 Block diagram of an ideal frequency-domain compression system using a 128-point FFT and a sampling rate of 16 kHz.
2.3 Side-branch compression architecture.
2.4 Group delay in samples for a single all-pass filter having the warping parameter a = 0.5756 [12].
2.5 Block diagram of a compression system using frequency warping for both frequency analysis and filtered signal synthesis.
2.6 Diagram of a warped post-filter for speech enhancement.
2.7 Diagram of a warped low-delay post-filter for speech enhancement.
3.1 Block Diagram of an M-band analysis-synthesis filter bank.
3.2 Typical filter responses of digital filter banks.
3.3 The simplest example of a uniform-DFT filter bank.
3.4 Schematic of the relation between h(n) and its l-th polyphase component.
3.5 A prototype lowpass response of an M-th band filter.
3.6 Implementation of the uniform DFT bank using polyphase decomposition.
3.7 Polyphase decomposition of the uniform DFT filter bank with decimation by a factor of M.
3.8 Polyphase network (PPN) realization of a DFT analysis-synthesis filter bank for a prototype filter of length L + 1 = 2M.
3.9 System of filter-bank equalizer.
3.10 Filter-bank summation method for time-varying spectral gain factors Wi(k) adapted at a reduced sampling rate.
3.11 Polyphase network implementation of the FBE for the direct-form filter.
4.1 Approximation of a uniform FIR filter by a uniform AR filter.
4.2 Basic concept of approximation: the two signals y and e must have the same statistical characteristics.
4.3 Approximation of the uniform post-filter by a warped AR filter.
4.4 Network for calculation of the (P + 1) warped impulse autocorrelation coefficients.
4.5 Approximation of the uniform time-domain post-filter by an allpass transformed AR filter.
4.6 Magnitude Response in dB and approximation of an FIR filter by a uniform AR filter.
4.7 Magnitude Response in dB and approximation of a uniform FIR post-filter by an allpass transformed AR filter.
4.8 Magnitude Response in dB and approximation of a warped FIR post-filter by a warped AR filter.
4.9 Introduction of an all-pole filter so as to eliminate deep nulls in FIR filter frequency response.
4.10 Magnitude and Phase Response of a filter with one sharp zero at a specified frequency.
4.11 Magnitude and Phase Response of a filter after the elimination of its sharp zero.

List of Tables

3.1 The maximum combined error for 32 frequency bands.
3.2 The maximum combined error for 64 frequency bands.
3.3 The maximum combined error for different compression factors Mc.
4.1 The maximum approximation error for different filter lengths in the symmetrical and non-symmetrical case.

Chapter 1

    Introduction to Hearing Aids

    Hearing aids are devices that partially overcome auditory deficits and are normally employed to compensate for hearing loss in hearing-impaired people. The main objective of a hearing aid is to fit the dynamic range of speech and everyday sounds into the restricted dynamic range of the impaired ear. In order to achieve a better understanding of this device and its function, we need to explain how sound is perceived by the human auditory system and exactly which problems hearing-impaired people encounter.

    Some sounds are totally inaudible, and others can be detected because part of their spectra is audible but may not be correctly identified because other parts of their spectra, typically those at high frequencies, remain inaudible. The range of levels between the weakest sound that can be heard and the most intense sound that can be tolerated is smaller for a person with hearing impairment than for a normal listener. To compensate for this, hearing aids amplify weak sounds more than they amplify intense sounds [1].

    The most common type of hearing loss is sensorineural hearing loss, in which the root cause lies in the vestibulocochlear nerve, the inner ear or the central processing centers of the brain. People with sensorineural hearing loss usually experience an abnormal perception of loudness called loudness recruitment: a slight increase in sound intensity above the threshold of hearing can be unbearably loud for them, while very-low-intensity sounds remain inaudible. Sensorineural impairment diminishes the ability of a person to detect and analyze energy at one frequency in the presence of energy at other frequencies. Similarly, a person with hearing loss has a decreased ability to hear a signal that rapidly follows, or is rapidly followed by, a different signal. This decreased frequency and temporal resolution makes it more likely that noise will mask speech for a hearing-impaired person.

    In order to avoid such difficulties, different types of compression hearing aids are suggested. The compression algorithm is a system-dependent characteristic, since the core of the hearing aid in use constrains the set of allowed algorithms. Apart from compression, the main parameters to program are the Noise Reduction techniques and the Feedback Cancellation algorithms. Noise Reduction is an important stage in hearing aid signal processing, since hearing-impaired people have to understand speech against background noise. The third problem to solve is Feedback Cancellation, which is needed to fit the hearing aid to the patient. Feedback is produced when sound travels from the loudspeaker back to the microphone. It often causes the hearing aid to howl and limits the maximum gain that can be used without instability, reducing the sound quality when the gain is close to this limit.



    1.1 Description of Human Auditory System and Acoustic Measurements

    In order to hear a sound, the auditory system must accomplish three basic tasks. First, it must deliver the acoustic stimulus to the receptors; second, it must transduce the stimulus from pressure changes into electrical signals; and third, it must process these electrical signals so that they can efficiently indicate the qualities of the sound source, such as pitch, loudness and location. The human ear can be divided into three fairly distinct components according to both anatomical position and function: the outer ear, which is responsible for gathering sound energy and funnelling it to the eardrum; the middle ear, which acts as a mechanical transformer; and the inner ear, where the auditory receptors (hair cells) are located [2]. Fig. 1.1 shows the detailed anatomy of the human ear:

    Figure 1.1: The anatomy of the peripheral auditory system [3].

    Sound waves enter the auditory canal, travel through it and hit the tympanic membrane (eardrum). This wave information travels across the air-filled middle ear cavity via a series of delicate bones (malleus, incus and stapes) which convert the lower-pressure eardrum sound vibrations into higher-pressure sound vibrations at the oval window. Higher pressure is necessary because the inner ear beyond the oval window contains liquid rather than air. The sound information is then converted from waveform to nerve impulses in the cochlea.

    In the inner ear, the cochlea is a tube coiled up into a spiral, divided along its length by two membranes: Reissner's membrane and the basilar membrane. It has three fluid-filled sections: the scala media with the Organ of Corti, which transforms mechanical waves into electric signals in neurons, the scala tympani and the scala vestibuli [4]. The Organ of Corti contains hair cells, columnar cells with a bundle of 100 to 200 specialized mechanosensors for hearing at the top (cilia) that transform the fluid waves into nerve signals. Atop the longest cilia rests the tectorial membrane, which moves back and forth with each cycle of sound, tilting the cilia and allowing electric current into the hair cell. The sound information travels down the vestibulocochlear nerve and is further processed until it eventually reaches the thalamus, from where it is relayed to the cortex.


    1.1.1 Cochlear Tuning and Frequency Selectivity

    Hair cells are the sensory receptors of both the auditory and the vestibular systems and transform mechanical energy into neural signals. They are mainly classified as inner hair cells and outer hair cells; the latter are over three times more numerous and affect the response of the basilar membrane. Mechanical properties of the basilar membrane affect the way it responds to sounds of different frequencies.

    It is known that the location of the peak of the traveling wave on the basilar membrane is determined by the frequency of the originating sound. When a sound of a certain frequency stimulates a point on the membrane, the membrane responds by moving, and hair cells at that site are stimulated by the force that this movement creates. Therefore, groups of hair cells only respond if certain frequencies are present in the originating sound [5].

    Each place on the basilar membrane is tuned to a particular characteristic frequency. As a whole, the basilar membrane behaves as a bank of overlapping bandpass filters (auditory filters). In this way, it extracts quite detailed information about the spectral decomposition of sounds and performs a partial spectral/Fourier analysis of the sound, with each place on it being most sensitive to a different frequency component. The frequency sensitivity of a hair cell can be displayed as a tuning curve, and the phenomenon is known as cochlear tuning.

    Frequency Threshold Tuning curves can be obtained by finding the level of a pure tone required to produce a just-measurable increase in the firing rate of a neuron, as a function of the frequency of the pure tone. These curves are equivalent to the tuning curves on the basilar membrane; they are characteristically V-shaped, as shown in Fig. 1.2, and their peak represents the frequency at which the cell is most sensitive:

    Figure 1.2: Frequency Threshold Tuning curves.

    The closer the frequency of the tone to the characteristic frequency of the neuron, the lower is the level required. The important point is that frequency selectivity in the auditory nerve is very similar to that on the basilar membrane, since each nerve fiber innervates a single inner hair cell. Because of the difficulties involved in measuring the vibration of the basilar membrane directly, most of our knowledge derives from auditory nerve recordings. An example is illustrated in Fig. 1.3, which shows frequency threshold tuning curves recorded from the auditory nerve of a chinchilla (Ruggero and Semple, 1992). Five curves are shown, depicting the tuning properties of five neurons with characteristic frequencies ranging from about 500 Hz to 16 kHz. The tuning curves for those five neurons are plotted on a linear and a logarithmic axis.

    Figure 1.3: Tuning curves for five neurons [6].

    Frequency Selectivity is one of the most important topics in hearing, because the nature of auditory perception is largely determined by the ear's ability to separate out the different frequency components of sounds. It refers to the ability of the auditory system to resolve the sinusoidal components of a complex sound. Frequency selectivity can be measured at all stages of the auditory system, from the basilar membrane to the auditory cortex, as well as in our perceptions.

    The perception of a sound depends not only on its own frequency and intensity but also on other sounds present at the same time. For example, typical classroom sounds created by movement, coughing or the rustling of papers make the instructor's voice difficult to hear. This phenomenon is called masking. Technically speaking, masking is defined as the rise in the threshold of one tone (the test tone) due to the presence of another (masker) tone. It is known that a signal is most easily masked by a sound having frequency components close to those of the signal. This led to the idea that our ability to separate the components of a complex sound depends on the frequency-resolving power of the basilar membrane. It also led to the idea that masking reflects the limits of frequency selectivity and provides a way to quantify it. Fig. 1.4 illustrates the results of a masking experiment. The line indicates the amount by which the threshold is raised in the presence of a masking noise centered at 410 Hz. For a 410 Hz tone, the threshold is raised by about 60 dB above the absolute threshold.

    Figure 1.4: Results of a masking experiment [7].

    As far as the perception of intensity is concerned, the human ear has incredible absolute sensitivity and dynamic range. The most intense sound we can hear without immediate damage to the ear is at least 140 dB above the faintest sound we can just detect. This corresponds to an intensity ratio of 100,000,000,000,000:1. The Absolute Threshold is the smallest value of some stimulus that a listener can detect. In order to investigate our perceptual capabilities, it is useful to generate an absolute threshold curve, which relates the frequency of a signal to the intensity at which it can be detected by the ear. Fig. 1.5 shows a plot of the thresholds of hearing for a range of frequencies.

    Figure 1.5: Human auditory thresholds as a function of frequency. Sounds that fall in the shaded region below the curve are below threshold and therefore inaudible.

    The smallest detectable change in intensity, a matter of Intensity Discrimination, is measured using a variety of psychophysical methods and various stimuli. Although the difference threshold depends on several factors, including duration, intensity and the kind of stimuli on which the measurement is made, Weber's law holds for most stimuli. In other words, the smallest detectable change is a constant fraction of the intensity of the stimulus.
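    The numbers above are easy to verify. A level difference in dB corresponds to an intensity ratio of 10^(dB/10), and Weber's law predicts a just-noticeable difference proportional to intensity. The short Python sketch below checks both; the Weber fraction of 0.1 is purely illustrative, not a value taken from the text.

```python
# Intensity ratio corresponding to a level difference in dB: I2/I1 = 10**(dB/10).
def intensity_ratio(level_diff_db):
    return 10.0 ** (level_diff_db / 10.0)

# The ~140 dB dynamic range of normal hearing spans fourteen orders of magnitude:
print(intensity_ratio(140.0))  # 1e+14, i.e. 100,000,000,000,000:1

# Weber's law: the just-detectable change is a constant fraction k of the intensity.
def weber_jnd(intensity, k=0.1):  # k = 0.1 is an illustrative Weber fraction
    return k * intensity

print(weber_jnd(1.0), weber_jnd(100.0))  # 0.1 10.0
```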

    1.1.2 Linear Amplifiers and Gains

    Amplifiers inside hearing aids can be classified as linear and nonlinear. Linear amplifiers multiply the input signal by a fixed amount regardless of its magnitude. The behavior of a linear amplifier is not affected by how many signals it is amplifying at the same time. For example, if signal C is amplified by 30 dB when it is the only signal present at the input, then it will still be amplified by 30 dB even when several other signals are simultaneously being amplified by the device [1].

    The gain of any device relates the amplitude of the signal coming out of the device to the amplitude of the signal going into the device. Gain is thus calculated as the output amplitude divided by the input amplitude, or as the Output Level minus the Input Level expressed in dB SPL (Sound Pressure Level). To fully describe the gain of a linear amplifier, it is necessary to state its gain at every frequency within the frequency range of interest. This is referred to as the gain-frequency response or gain curve. Thus, the degree of amplification is represented as a graph of Gain versus frequency (Gain-Frequency Response) or a graph of Output Level versus Input Level (I/O curve), which shows the dependence of Output Sound Pressure Level on Input Sound Pressure Level for a particular signal or frequency. It should be noted that the highest level produced by a hearing aid is known as the Saturation Sound Pressure Level (SSPL).
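    The two equivalent statements of gain, as a level difference in dB and as an amplitude ratio, can be written down directly. This is a generic sketch; the function names are ours, not from the thesis code.

```python
import math

def gain_db(input_level_db, output_level_db):
    """Gain in dB: Output Level minus Input Level, both in dB SPL."""
    return output_level_db - input_level_db

def gain_db_from_amplitudes(a_in, a_out):
    """Equivalent amplitude form: 20*log10(output amplitude / input amplitude)."""
    return 20.0 * math.log10(a_out / a_in)

# A linear 30 dB amplifier maps 50 dB SPL in to 80 dB SPL out;
# equivalently, the output amplitude is 10**(30/20) ~ 31.6 times the input.
print(gain_db(50.0, 80.0))  # 30.0
print(gain_db_from_amplitudes(1.0, 10 ** (30 / 20)))
```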

    1.1.3 Sound Pressure Level and Absolute Threshold of Hearing

    All amplifiers become nonlinear when the input or output signals exceed a certain level. This happens because amplifiers are unable to handle signals larger than the voltage of the battery that powers the amplifier. As with gain, the SSPL varies with frequency, and a useful measure is the SSPL Response curve.

    Figure 1.6: Saturation Sound Pressure Level Frequency Response of a hearing-aid [1].

    The Absolute Threshold of Hearing is the minimum sound level of a pure tone that an average ear with normal hearing can hear in a noiseless environment. It relates to the faintest sound that the organism can just hear; since it is not a discrete point, it is classed as the level at which a response is elicited a specified percentage of the time. It is expressed in dB SPL and can be measured using psychophysical methods.

    For a hearing-impaired person, the threshold of hearing is different from that of a normal listener, and one way to determine the hearing loss is the Acoustic Audiogram. An Audiogram is a chart depicting hearing test results:

    Figure 1.7: Audiogram with different speech sounds.

    A hearing test, which is performed by an audiologist in a sound-insulated room, determines a person's hearing sensitivity at different frequencies and diagnoses the exact type of hearing loss. Generally, prolonged exposure to excessive sound levels may cause hearing defects. Specifically, there are two types of hearing loss:

    Sensorineural hearing loss occurs when the hair cells of the cochlea are damaged or worn out. Typical causes are the aging process and excessive exposure to noise. This is the most common type of hearing loss and as yet there is no cure, though hearing aids can help.

    Conductive hearing loss occurs when sound is not transmitted through the ear canal and middle ear to the inner ear. Common causes are wax in the ear canal, fluid in the middle ear or damage to the middle ear bones. This type of hearing loss can often be successfully treated with medication or surgery.

    1.2 Problems Faced by Hearing-impaired People

    The following problems are mainly related to the most common type of hearing loss, sensorineural hearing loss [1]:

    Decreased Audibility
    While hearing-impaired people do not hear some sounds at all, people with a severe hearing loss may not hear any speech sounds unless they are shouted, and those with a moderate loss are more likely to hear some sounds and not others. In particular, softer phonemes, usually consonants, may not be heard: the sequence of sounds "i e a a r" might have originated as "pick the black harp" but could have been heard as "kick the cat hard". For people with hearing impairment, essential parts of some phonemes are not audible, and they recognize sounds by noting which frequencies contain the most energy. In general, the high-frequency components of speech are weaker than the low-frequency components. Thus, hearing-impaired people usually miss high-frequency information.

    To overcome these problems, a hearing aid has to provide more amplification at frequencies where speech has its weakest components (usually high frequencies). Hence, hearing aids provide different amounts of gain in different frequency regions.

    Decreased Dynamic Range
    Unfortunately, it is not always appropriate to amplify soft sounds by the amount needed to make them audible. Sensorineural hearing loss increases the threshold of hearing much more than the threshold of loudness discomfort. The dynamic range of the ear is the level difference between discomfort and the threshold of audibility, and it is smaller in the case of a hearing-impaired person. The reduced dynamic range of people with hearing loss is depicted in Fig. 1.8.

    If the sounds of the environment are to fit within the restricted dynamic range of a patient, then the hearing aid must amplify weak sounds more than it amplifies intense sounds. This is the main target of compression.

    Decreased Frequency Resolution
    People with sensorineural loss have difficulty separating sounds of different frequencies, which are represented at different places within the cochlea. Their decreased frequency resolution is due to the loss of the outer hair cells' ability to increase the sensitivity of the cochlea at its tuning frequencies (the frequencies at which the affected part of the cochlea is tuned). The essence is that even when a speech component and a noise component have different frequencies, if these are close enough the cochlea will have a single broad region of activity rather than two finely tuned separate regions. The brain is thus unable to untangle the signal from the noise.


    Figure 1.8: Decreased Dynamic Range for hearing-impaired people.

    If frequency resolution is sufficiently decreased, relatively intense low-frequency parts of speech may mask the weaker higher-frequency components. This is referred to as the upward spread of masking, and a prescribed hearing aid will minimize its amount by making sure that there is no frequency region in which speech is much louder than in the remaining regions.

    Even normal-hearing people have poorer resolution at high intensity levels than at lower levels. The difficulty hearing-impaired people have in separating sounds is partly caused by the damaged hair cells in the cochlea and partly by their need to listen at elevated levels. In general, frequency resolution gradually decreases as hearing loss increases.

    Some ways to minimize the problems caused by decreased frequency resolution are noise reduction in the hearing aid system, the use of directional microphones to focus on wanted sounds and suppress those coming from unwanted directions and, lastly, an appropriate variation of gain with frequency so that low-frequency parts of speech or noise will not mask high-frequency parts.

    Decreased Temporal Resolution
    Weaker sounds may often be masked by intense sounds that immediately precede or follow them, and this affects speech intelligibility. As hearing loss gets worse, the ability to hear weak sounds during brief gaps in a more intense masker gradually decreases. To compensate for this decreased temporal resolution, hearing aids perform Fast-Acting Compression, where the gain is rapidly increased during weak sounds and decreased during intense sounds. The main disadvantage of this method is that unwanted weak background noises are made more audible.

    Combining all the above-mentioned deficits which cause hearing loss, we conclude that together they can cause a severe reduction in intelligibility. A hearing-impaired person needs a better Signal-to-Noise Ratio (SNR) than a normal listener if both are to understand the same amount of speech. More specifically, the average SNR deficit associated with a moderate hearing loss is estimated to be about 4 dB, while in the case of a severe hearing loss it is about 10 dB.

    1.3 Compression In Hearing Aids

    Compression's major role is to decrease the range of sound levels in the environment so as to better match the dynamic range of a person with hearing impairment. The compressor may be most active at low, mid or high sound levels, or it may vary its gain across a wide range of sound levels, in which case it is known as a Wide Dynamic Range Compressor (WDRC). A compressor can react to a change in input level within only a few thousandths of a second, or it can be slow enough to take many tens of seconds to fully react. The degree to which a compressor reacts to a change in input level may be represented as an Input-Output Diagram or as a Gain-Input Diagram.

    Amplification may be either linear or nonlinear. For sounds of a given frequency, a linear amplifier applies the same gain no matter what the level of the signal is, or what other sounds are simultaneously present. The problem in this case is that intense sounds become even more intense and thus annoying. The solution is to introduce a compression threshold, the input level above which the compressor becomes active; it is clearly visible on the Input-Output Diagram.

    Another measure, related to the slope of the curves on the Input-Output or Gain-Input Diagrams, is the compression ratio, which describes the variation in output level that corresponds to a variation in input level.
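As an illustration of how these two static parameters interact, the input-output rule described above can be sketched as follows. The function name and default values are illustrative only, not taken from any particular hearing aid:

```python
def output_level_db(input_db, threshold_db=50.0, ratio=3.0):
    """Static input/output rule of a simple compressor.

    Below the compression threshold the behaviour is linear (1:1);
    above it, each 1 dB rise in input level yields only 1/ratio dB
    rise in output level. All levels are in dB SPL.
    """
    if input_db <= threshold_db:
        return input_db                       # linear region
    return threshold_db + (input_db - threshold_db) / ratio
```

For example, with a 50 dB threshold and a 3:1 ratio, an 80 dB input maps to a 60 dB output, while a 40 dB input passes through unchanged.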

    Benefits of compression can be summarized as follows:

    • It can make low-level speech more intelligible, by increasing gain and hence audibility.

    • It can make high-level sounds more comfortable and less distorted.

    • In mid-level environments, it offers little advantage relative to a well-fitted linear aid, but once the input level departs from this range, its advantages become evident.

    However, the two most important disadvantages are:

    • Greater likelihood of feedback oscillation.

    • Excessive amplification of unwanted lower-level background noises.

    1.3.1 Compression's Major Role: Reducing the Signal's Dynamic Range

    The rationale for compression is to compensate for the reduced dynamic range found in theimpaired ear and the increased growth of loudness (recruitment) that accompanies hearingloss. In fact, a compressor is an amplifier that automatically turns its gain down as theinput signal level rises.

    There are three basic ways by which the dynamic range of signals can be reduced:

    • Low-Level Compression, where after amplification the lower levels come closer together while the spacing of the upper levels is not affected.

    • Wide Dynamic-Range Compression (WDRC), in which compression is applied more gradually over a wide range of input levels.

    • Compression Limiting or High-Level Compression, where low-level sounds are amplified linearly, but inputs from moderate to intense sounds are squashed into a narrower range of outputs. Its name reflects the fact that the output is not allowed to exceed a set limit.


    Salient features of Output Limiting Compression and Wide Dynamic Range Compression are shown in Fig. 1.9. Output Limiting Compression has two main features: a high compression kneepoint and a high compression ratio. On the other hand, WDRC is associated with low compression thresholds (below 55 dB SPL) and low compression ratios (less than 5:1) [8].

    Figure 1.9: Input/Output curves showing effects of Output Limiting Compression (left) and Wide Dynamic Range Compression (right).

    1.3.2 Basic Characteristics of a Compressor

    A compressor is intrinsically a dynamic device: it changes the gain according to changes inthe input signal level. Sound in the environment is changing constantly in intensity overtime and a compression hearing aid has to respond to these changes. The dynamic aspectsof compression as well as its static compression characteristics must be taken into account.We refer below to some of its basic characteristics like attack and release times, compressionthreshold and compression ratio.

    Dynamic Compression Characteristics

    The attack and release times are the lengths of time it takes for a compression circuit to respond to changes in the intensity of the input SPL. Specifically, for any type of compressor, the attack time is the time taken for the compressor to react to an increase in signal level, while the release time is the time taken to react to a decrease in input level. In Fig. 1.10, we observe two waveforms, the input and the output of a compressor, and mark the attack and release transitions that follow a rise and a fall, respectively, in signal level:

    Most attack and release times are set to achieve the best compromise between two undesirable extremes. Times that are too fast will cause the gain to fluctuate rapidly, and this may cause a "pumping" sensation for the listener. Quick attack times (i.e. less than 10 ms) prevent sudden, transient sounds from becoming too loud for the listener. In general, release times need to be longer than attack times to prevent a "fluttering" perception on the part of the listener [8].

    Although the attack and release times could be made to have extremely short values(close to zero), the consequences are undesirable. If the release time is too short, the gainwill vary during each voice pitch period and the compressor will thus distort the waveform.


    Figure 1.10: The effects of a compressor on a signal. Only the middle portion of the input is above the compressor's threshold. Note the overshoot when the signal level increases (it takes some time for the gain to decrease), and the attenuation when the input signal returns to the first level (and the gain increases). The release time is generally longer than the attack time.

    On the other hand, if the attack time is extremely short and the release time long, distortion will be minimal. However, very brief sounds, like clicks, will cause a decrease in gain, and the gain will stay low for a long time afterwards. Suitable values for attack times in hearing aids are usually around 5 ms, while release times are rarely less than 20 ms. In addition, the attack and release times have a major effect on how compressors affect the levels of the different syllables of speech.
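The attack/release behaviour described above is commonly realized as a level detector with two smoothing time constants, one for rising and one for falling input. The following sketch shows one conventional form; the function name and default times are illustrative only:

```python
import math

def envelope_detector(x, fs_hz, attack_ms=5.0, release_ms=50.0):
    """One-pole envelope (peak) detector with separate attack and
    release time constants, of the kind used to drive a compressor's
    gain computation."""
    alpha_a = math.exp(-1.0 / (fs_hz * attack_ms / 1000.0))
    alpha_r = math.exp(-1.0 / (fs_hz * release_ms / 1000.0))
    env, out = 0.0, []
    for sample in x:
        level = abs(sample)
        # rising input -> fast attack coefficient, falling -> slow release
        alpha = alpha_a if level > env else alpha_r
        env = alpha * env + (1.0 - alpha) * level
        out.append(env)
    return out
```

With the defaults above, the tracked envelope rises within a few milliseconds after a step onset but decays roughly ten times more slowly after the step ends, mirroring the usual attack-shorter-than-release design.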

    It should be noted that apart from attack and release times, the parameters of a compression system also include the number of frequency channels and the compression ratio in each frequency band. It may well be that the optimum compressor adjustment is a function of the type and amount of background noise and interference and the characteristics of the individual hearing loss. A serious challenge is to identify different sound environments for the purpose of adjusting compression or other signal-processing system parameters.

    Static Compression Characteristics

    The attack and release times tell us how quickly a compressor operates; we need different terms to tell us by how much a compressor decreases the gain as the level rises. Having specified these gain changes, we assume that the compressor has had time to fully react to variations in signal level, and thus we study the static characteristics that apply to signals. The sound pressure level above which the hearing aid begins compression is referred to as the compression threshold. Another significant characteristic is the compression ratio, defined as the change in input level needed to produce a 1 dB change in output level. Compression ratios can have any value greater than 1:1; values less than 1:1 are also possible, but they correspond to dynamic-range expanders rather than compressors. In hearing aids with WDRC, compression ratios in the range of 1.5:1 to 3:1 are very common.

    The design of a dynamic-range compressor involves several engineering trade-offs. It is very important to realize that there is no unique best compressor design.


    Each system involves trade-offs between processing complexity, frequency resolution, time delay and quantization noise. The most important processing concerns are the system frequency resolution and the processing time delay. Most digital compression systems use multiple frequency bands. For any given processing approach, increased frequency resolution comes at the price of increased processing delay.

    Interaction between static and dynamic aspects of compression

    With incoming sounds, the attack/release times of a hearing aid interact with the compression ratio, and these interactions affect the sound quality for the listener. Fast attack/release times have the effect of temporarily reducing the ratio or amount of compression for any given sound stimulus. In general, a combination of short attack/release times (e.g. 10 ms) and high compression ratios (e.g. 10:1) causes distortion. If the same short attack/release times are combined with low compression ratios (e.g. 2:1), then the sound quality is not quite so compromised.

    Dynamic and static aspects of compression are found in predictable combinations today. Syllabic compression, with its relatively short attack and release times, is most often associated with Wide Dynamic Range Compression hearing aids, which have a low compression threshold (or kneepoint) and a low compression ratio of less than 5:1. It is also sometimes encountered in Output Limiting Compression hearing aids, in which the thresholds and ratios of compression are high.

    1.3.3 Rationales for use of Compressors

    This section outlines several theoretical reasons why compressors should be included in hearing aids:

    Avoiding discomfort, distortion and damage

    As the input to the hearing aid increases, its output cannot be allowed to keep on growing. There are two reasons why the maximum must be limited. Firstly, if excessively intense signals are presented to the hearing aid wearer, the resulting loudness will cause discomfort. This loudness discomfort level, which is subjective for each wearer, provides an upper limit to the hearing aid's SSPL (saturation sound pressure level). Secondly, excessively intense signals may cause further damage to the aid wearer's residual hearing capability.

    These two reasons explain why the maximum output must be limited, but this limiting could be achieved with either peak clipping or compression limiting. The reason for preferring compression limiting over peak clipping in nearly all cases is that peak clipping creates distortion; although compression limiting does too, the type of distortion created by peak clipping is far more objectionable than that created by compression limiting.

    When compression limiting is used to control the SSPL of a hearing aid, a high compression ratio is needed so that the output SPL does not rise significantly for very intense input levels. The attack time must be short so that the gain decreases rapidly enough to prevent loudness discomfort. As with all compressors, the release time must not be so short that it starts distorting the waveform. If a hearing aid does not include a compression limiter, peak clipping will occur once the input signal becomes sufficiently intense. If the hearing aid contains Wide Dynamic Range Compression, the input level needed to cause peak clipping may be so high that peak clipping seldom occurs.


    Reducing intersyllabic and interphonemic intensity differences

    The most intense speech sounds, like some vowels, are about 30 dB more intense than the weakest sounds, like some unvoiced consonants. For people with reduced dynamic range, even when the range is adequate to hear weak phonemes without intense ones being too loud, the weaker phonemes are likely to be temporally masked by the stronger ones. A possible solution is to include a fast-acting compressor; in such a case the compression is called syllabic compression or phonemic compression.

    A problem that might appear with fast compression is that it alters the intensity relationships between different phonemes and syllables. In some cases, the hearing aid wearer uses the relative intensities of sounds to help identify them; thus, even if altering relative intensities increases their audibility, it decreases the intelligibility of some speech sounds. Another possible problem is the effect of compression on brief weak sounds that follow closely after sustained intense sounds. If the release time is longer than the gap between the intense and the weak sound, the gain will still be decreased when the brief weak sound arrives. Hence, such weak sounds will be less audible than they would be with linear amplification.

    The most severe problem of fast-acting compression is that if the gain increases fast enough when a soft phoneme occurs, it will also increase during pauses between words. This matters when there is background noise that is less intense than the speech: the compressor increases its gain during the noise and decreases it during the speech. This disadvantage has to be weighed against the advantages of fast-acting compression.

    One important observation is that compressors intended to decrease the intensity differences between syllables must have compression thresholds low enough for the compression to be active across a range of input levels, and compression ratios low enough to leave some intensity differences intact but high enough to significantly decrease the dynamic range. Attack and release times must be short enough that the gain can vary appreciably from one syllable or phoneme to the next, but not so short as to introduce a significant amount of distortion to the waveform.

    Reducing differences in long-term level

    As well as changing the intersyllabic relationships, a fast-acting compressor decreases the mean level difference between soft and intense speech. An alternative use of compression is to decrease the longer-term dynamic range without changing the intensity relationships between syllables that follow each other closely in time. This is achieved by using attack and release times longer than the typical duration of syllables.

    Normalizing loudness

    Normalizing the perception of loudness is possibly the most popular rationale for using compression. It is known that sensorineural hearing loss greatly affects loudness perception. The principle of loudness normalization is as follows: for any input level and frequency, give the hearing aid the gain needed for the wearer to report the same loudness as a person with normal hearing would report. The required amount of gain at each input level can be deduced from graphs that depict the loudness of sounds at different levels. Loudness can be measured in several ways, but only subjectively.

    The most common way of achieving loudness normalization is with separate compressors located in each channel of a multichannel hearing aid. Alternatively, a hearing aid may contain only two channels and have a compressor in only the high-frequency channel. However, it is also possible to combine a compressor with a filter that alters its shape with input level, so that even a single-channel hearing aid can have a level-dependent frequency response.

    Maximizing intelligibility

    Multichannel compression can be used to achieve, in each frequency region, the amount of audibility that maximizes intelligibility, subject to some constraint on overall loudness. Although the overall loudness of broadband sounds may be well normalized, such an approach will result in loudness not being normalized in any single frequency region.

    Reducing noise

    The interfering effect of background noise is the single biggest problem faced by hearing aid wearers. Several assumptions underlie the expectation that compression will decrease the effects of noise:

    • Noise usually has a greater low-frequency emphasis than speech, and thus the low-frequency parts of speech are more likely to be masked and hence convey little information.

    • Low-frequency parts of noise may cause upward spread of masking and so mask the high-frequency parts of speech.

    • Low-frequency parts of noise contribute most to the loudness of the noise.

    • Noise is more of a problem in high-level environments than in low-level environments.

    Consequently, if the low-frequency parts of the noise cause masking and excessive loudness and, at the same time, the low-frequency parts of speech do not convey useful information, then increased comfort and improved intelligibility can be achieved by decreasing low-frequency gain in high-level environments. Hearing aids aimed at noise reduction have often been marketed as Automatic Signal Processing devices. An additional benefit of such devices is that the aid wearer's own voice has a greater low-frequency emphasis and a greater overall level at the hearing aid microphone than the voices of other people. Consequently, low-frequency compression can help give the wearer's own voice a more acceptable tonal quality than would occur with linear amplification.

    Although the noise reduction discussed so far aims to minimize only low-frequency noise, more advanced multichannel hearing aids can decrease the noise or signal in any frequency region where the SNR is estimated to be particularly poor. This type of hearing aid estimates the SNR within each channel by taking advantage of the fluctuations in level that are characteristic of speech, in comparison to the more constant level of background noise. In each channel, the envelope is analyzed by a speech/non-speech detector, where higher-level parts are assumed to represent the peaks of the speech signal and lower-level parts the background noise. The detector combines these estimates of signal level and noise level to estimate the SNR in each channel, and the appropriate gain for each channel is then calculated.
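The envelope-based SNR estimate described above can be caricatured in a few lines. The percentile choices below are arbitrary placeholders, not values taken from any real device:

```python
def channel_snr_db(envelope_db):
    """Very rough per-channel SNR estimate from an envelope in dB.

    A high percentile of the envelope stands in for the speech peaks
    and a low percentile for the steadier noise floor.
    """
    s = sorted(envelope_db)
    noise_floor = s[len(s) // 10]           # ~10th percentile
    speech_peak = s[(len(s) * 9) // 10]     # ~90th percentile
    return speech_peak - noise_floor
```

A channel whose estimated SNR falls below some criterion would then be assigned a reduced gain.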

  Chapter 2

    Approaches for Compression in the Frequency Domain

    Current hearing aid devices employ a set of bandpass filters with different gains. The number of filters in a given hearing aid may vary, and their center frequencies are most commonly set at 250 Hz, 500 Hz, 750 Hz, 1 kHz, 1.5 kHz, 2 kHz and 4 kHz, covering the audible frequency range of the human ear [9]. The gains at these specified frequencies are programmable and set according to the specific audiogram of a patient. We therefore understand that existing methods for compression in hearing aids are related to the frequency domain, in which gain calculation takes place.

    First, a standard clinical procedure is followed to measure the patient's audiogram, which normally shows the hearing loss at specified frequencies (e.g. 250 Hz, 500 Hz, 1 kHz, 2 kHz and 4 kHz). After that, a target is generated depending upon how much gain the patient needs in order to compensate for hearing loss at the specified frequencies. Finally, the target is programmed into the hearing aid, and the patient is tested with it under different sound situations. Mapping the audiogram to the target is a basic process for preselecting the required gains of the hearing aid for the user. It is thus desirable to make this first gain as close to the real requirement as possible, to save effort and time in the subsequent fitting process.
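As a toy illustration of mapping an audiogram to preliminary target gains, a simple proportional rule might look like the following. The 0.5 factor echoes the classic "half-gain" rule of thumb; actual prescriptive formulas account for level, frequency shaping and compression and are far more elaborate:

```python
def target_gains(audiogram_db, fraction=0.5):
    """Map an audiogram (hearing loss in dB at each test frequency)
    to preliminary target gains with a simple proportional rule.
    Illustrative only; real fitting formulas are more sophisticated."""
    return {freq: loss * fraction for freq, loss in audiogram_db.items()}
```

For instance, a 60 dB loss at 1 kHz would yield a 30 dB preliminary target gain in that band.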

    2.1 Multiband Compression and FFT Processing

    Multi-channel dynamic-range compression is a basic part of digital hearing aids. The design of a digital compressor involves many considerations, including frequency resolution, processing group delay, quantization noise, and algorithm complexity. A multichannel compressor combines a filter bank with compression in each frequency band. In most implementations, the compressors operate independently in each channel, but there are some systems where compression gains can be grouped across adjacent bands. The compressor output involves the response of each frequency band to the signal present in that band, and even some simple signals might cause complicated responses. The system output is finally produced by adding the compressed signals in each band, as shown in Fig. 2.1:

    Through multiband compression, hearing aids separate the input signal into different frequency bands, and each subband signal goes through a different channel. Each channel has its own compressor, and the amount of compression is different at each frequency, depending on the patient's hearing loss and the input signal level. The amount of compression is larger for higher compression ratios and lower compression thresholds. Furthermore, a disadvantage of single-channel compression relative to multichannel compression is that in the former, when


  CHAPTER 2. APPROACHES FOR COMPRESSION IN THE FREQUENCY DOMAIN

    Figure 2.1: Block diagram of a multi-channel compression system.

    gain is reduced it is reduced across the entire signal spectrum, while multichannel compression avoids this problem.
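A minimal two-band version of the structure in Fig. 2.1 can be sketched as follows. The crude one-pole band split and the fixed per-band gains are simplifying assumptions; a real aid would derive each gain from that band's measured level:

```python
def one_pole_lowpass(x, alpha=0.1):
    """Crude first-order low-pass, used here only to split the
    signal into a low band and its complementary high band."""
    y, state = [], 0.0
    for s in x:
        state += alpha * (s - state)
        y.append(state)
    return y

def multiband_compress(x, band_gains):
    """Minimal two-band compressor: split the signal, scale each
    band by its (pre-computed) compression gain, and sum the bands."""
    low = one_pole_lowpass(x)
    high = [s - l for s, l in zip(x, low)]   # complementary high band
    g_low, g_high = band_gains
    return [g_low * l + g_high * h for l, h in zip(low, high)]
```

With both gains at 1.0 the two bands sum back to the original signal, which is the perfect-reconstruction property a complementary band split provides; unequal gains change the spectrum only in the corresponding band.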

    The Fast Fourier Transform (FFT) provides a convenient way of calculating the spectrum of a signal, extracting its frequency information. The usefulness of the FFT for digital hearing aid processing is obvious, given that all hearing aid processing depends on the frequency of the signal, for example increasing the compression ratio at high frequencies. The healthy cochlea can be viewed as a biological Fourier transform, since it separates sound into frequency regions along the basilar membrane, with high-frequency sounds vibrating the basal end and low frequencies vibrating the apical end [10].

    The FFT analysis technique provides great sound quality; however, it poses other challenges for effective implementation. The FFT is based on a uniform spacing of frequency components, while the auditory system is based on a logarithmic spacing. The human ear's ability to resolve sounds is best modeled by a system in which the bandwidth of frequency analysis is nearly constant at low frequencies and increases proportionally with frequency at higher frequencies (the auditory Bark scale). This is due to the logarithmic frequency coding along the basilar membrane.

    The hearing instrument channels can be matched to the auditory system channels, but this is accomplished at the expense of processing efficiency. For example, in an FFT-based system, the uniformly spaced bands can be combined to provide bandwidths similar to those of the auditory system. This approach can provide an excellent representation of the auditory system; however, it requires a high-resolution FFT to achieve the necessary low-frequency resolution. While this approach can provide excellent sound quality, the required processing can delay the output. If this processing delay is too long, the hearing aid might produce a negative user perception (e.g. an echo).

    2.2 Frequency-Domain Compression

    Filter banks represent one approach: time-domain processing. The input sequence is convolved with the filters one sample at a time, and the resulting output sequence is formed by summing the filter outputs. An alternative approach is to divide the signal into short segments, transform each segment into the frequency domain using an FFT, compute the compression gains from the computed input spectrum, apply them to the signal, and then inverse transform to return to the time domain.


    2.2.1 Describing the Ideal and Practical FFT Systems

    Ideal FFT system

    In Figure 2.2, a block diagram of a frequency-domain compressor is shown, where the sampling rate is 16 kHz and the FFT size is set to 128 samples [11]:

    Figure 2.2: Block diagram of an ideal frequency-domain compression system using a 128-point FFT and a sampling rate of 16 kHz.

    Initially, the input fills a data buffer and is windowed and zero-padded. Using the overlap-add method, the FFT of the block is calculated and the power spectrum is estimated at a 125 Hz frequency spacing. The power estimates in the desired frequency bands are computed from individual frequency bins at low frequencies and from combined frequency bins at higher frequencies. In this way, an approximation to human auditory frequency analysis is achieved. The power spectrum is thus computed for each block of data, and a sequence of signal samples is produced at the block sampling rate.

    Compressor gains in each band are computed for the FFT system, and afterwards the FFT of the input signal is multiplied by the compressor gains to give the compressed signal in the frequency domain. The compressed signal is finally inverse transformed to give the time sequence, and all sequences are combined using the overlap-add technique.

    The frequency-domain compressor can be considered a filtering operation: the spectrum of the input signal is multiplied by the spectrum of the compression filter to give the spectrum of the compressed output signal. However, the compression filter is designed in the frequency domain, so the length of its impulse response is not known and can lead to temporal aliasing. Consequently, the length of the filter response must be chosen appropriately so as to eliminate temporal aliasing.
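The per-block core of such a frequency-domain compressor might be sketched as below. This is a NumPy-based illustration only: the window choice, FFT size and `gain_fn` interface are assumptions, and the overlap-add recombination of successive blocks is omitted:

```python
import numpy as np

def fft_compress_block(block, gain_fn, fft_size=128):
    """One block of a frequency-domain compressor: window and
    zero-pad the block, take its FFT, derive real positive per-bin
    gains from the measured power spectrum, apply them, and return
    to the time domain."""
    win = np.hanning(len(block))
    spec = np.fft.rfft(block * win, n=fft_size)   # zero-pads to fft_size
    gains = gain_fn(np.abs(spec) ** 2)            # one real gain per bin
    return np.fft.irfft(spec * gains, n=fft_size)
```

With unity gains the block passes through unchanged (apart from the analysis window), which is a convenient sanity check on such an implementation.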

    Practical FFT system

    The FFT system with temporal aliasing eliminated requires a total of four FFTs: a forward FFT for the input segment, an inverse FFT for the compression gains, a forward FFT for the truncated compression impulse response, and an inverse FFT for the filtered segment. A practical digital hearing aid, in general, will not have the signal-processing capability to perform four FFTs. The DSP may not be fast enough, or the battery drain may be too great. One solution to this problem is to provide circuitry on the DSP chip that is dedicated to computing the FFT, or to exploit special properties of the FFT and digital filters to design a transform with a reduced operation count.

    An additional solution is to compromise on the compression filter design to reduce the number of FFTs needed. The shorter the impulse response, the smoother the frequency response; thus, smoothing the compression-gain frequency response is equivalent to an approximate truncation of the impulse response. The smoothing does not produce an exact truncation, so some residual temporal aliasing distortion is possible. A careful selection of the input segment length, FFT size and frequency-domain smoothing will result in temporal aliasing distortion that cannot be perceived under most listening conditions.

    Furthermore, the time delay of the FFT compressor depends on the size of the input buffer and the size of the FFT. The FFT cannot be computed until the input buffer is filled, so there is a processing delay while the input segment is accumulated. The compression frequency response is also specified as a real number greater than zero in each frequency band; a frequency response that is purely real has a corresponding impulse response that is linear-phase. Another possible way to adjust the delay is by changing the size of the input segment and/or that of the FFT. A shorter input segment means that the input buffer will be filled sooner, with a corresponding reduction in the overall delay. However, a shorter input buffer means that the FFTs will have to be computed more often, and the processing capacity of the DSP or the battery drain will need to be increased. The other option is using a smaller FFT. If the input buffer size is halved and the FFT size halved, then the delay will also be halved without an increase in the computational or power requirements. However, the frequency resolution of a smaller FFT is reduced.
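The buffer-fill component of this delay is simple arithmetic (the function name is illustrative):

```python
def buffer_delay_ms(buffer_len, fs_hz):
    """Time needed to fill the input buffer, one component of the
    total processing delay of a block-based FFT compressor."""
    return 1000.0 * buffer_len / fs_hz
```

At a 16 kHz sampling rate, a 128-sample buffer contributes 8 ms; halving it to 64 samples halves that contribution to 4 ms, consistent with the trade-off described above.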

    2.2.2 Side-Branch Architecture

    In Figure 2.3, we observe the block diagram of the side-branch compression architecture, which has the advantage of combining the low quantization noise of an FIR filter bank with the efficiency of spectral gain calculation using the FFT:

    Figure 2.3: Side-branch compression architecture.

    The side-branch system separates the input signal filtering from the frequency analysis and the calculation of the compression gains. Increasing the number of taps in the FIR filter allows finer adjustment of the compressor frequency response at the expense of increased processing complexity and system delay. Another way of viewing the side-branch compressor is as an FFT system in which the compression filter is transformed into the time domain, with the filtering then performed via time-domain convolution rather than frequency-domain multiplication.


    In this implementation, the input signal fills a K/2-sample buffer, and the present K/2 samples are appended to the previous K/2 samples to give a total of K samples, which are then windowed to provide the input to the K-point FFT. The signal power spectrum is computed from the FFT bins. Frequency bands are peak-detected, and compressor gains are computed from the peak-detector outputs. The compression gains are inverse transformed to give the impulse response of the compression filter. Because the gains as a function of frequency are real, the impulse response has even symmetry and yields a linear-phase filter. The impulse response can be windowed, if desired, to smooth its frequency response. The K/2 most recent input samples are then convolved with the K-point FIR filter to produce the final output.
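The step that turns the real per-band gains into a linear-phase FIR filter can be sketched as follows (a NumPy-based illustration; the optional windowing of the impulse response is omitted):

```python
import numpy as np

def gains_to_fir(gains):
    """Convert real, positive per-bin compression gains into a
    linear-phase FIR impulse response: inverse FFT, then rotate so
    the even-symmetric response is centered, which adds a fixed
    delay of half the filter length."""
    h = np.fft.irfft(gains)          # gains at N/2+1 bins -> N taps
    return np.roll(h, len(h) // 2)   # center the response -> linear phase
```

The filtering stage is then an ordinary time-domain convolution of the input samples with `h` (e.g. via `np.convolve`). Unity gains reduce the filter to a pure delay of half its length, a useful sanity check.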

    2.3 Warped Compression System

    Hearing losses are typically frequency-dependent, so the compressor is designed to provide different amounts of dynamic-range compression in different frequency regions. The solution is a multichannel system, such as a filter bank with different degrees of compression in each channel. The design of a multichannel compressor involves a fundamental trade-off between frequency resolution and time delay.

    For any given processing approach, increased frequency resolution comes at the price of increased processing delay. Compared to conventional digital processing algorithms, the use of digital frequency warping inherently gives frequency resolution on an auditory frequency scale and reduces the amount of processing delay for a specified degree of low-frequency resolution. The processing delay of a frequency-warped compressor, described in a following section, is frequency-dependent, with greater delay at low frequencies than at high frequencies. Consequently, a frequency-warped compressor must take into account the frequency resolution, the overall system processing delay and the delay variation across frequency. The target is to design a compression system that avoids audible artifacts caused by the system delay and has good frequency resolution on a critical-band frequency scale [12].

    2.3.1 Concerns in designing Compression Systems

    Frequency Resolution

    The main concern in designing a multichannel compressor is to match the system's frequency resolution to that of the human auditory system. Digital frequency analysis typically provides constant-bandwidth frequency resolution. However, the human auditory system's resolution is more accurately modeled by a filter bank having a nearly constant bandwidth at low frequencies and a bandwidth proportional to frequency at higher frequencies. This mismatch between digital and auditory analysis can be greatly reduced by replacing the conventional uniform frequency analysis with a warped frequency analysis. Frequency warping uses a conformal mapping to reallocate frequency samples close to the Bark frequency scale, and is described in more detail in a following section.

    Overall Processing Delay

    A second concern in designing a compression system for a hearing aid is the overall processing delay, which might cause coloration effects when the hearing-aid wearer is talking. When talking, the talker's own voice reaches the cochlea with minimal delay via bone conduction and through the hearing-aid vent. This signal interacts with the delayed and amplified signal produced by the hearing aid to produce a comb-filtered spectrum at the cochlea. Delays as short as 3 to 6 ms that are constant across frequency are detectable, and overall delays in the range of 15 to 20 ms can be judged as disturbing or objectionable.

The overall processing delay is due to several factors. Certain aspects of the overall system delay, such as the A/D and D/A converter delays, are not affected by the signal processing since they are fixed by the hardware. The total software processing delay is the sum of the time required to fill the input buffer, the group delay inherent in frequency-domain or time-domain filtering, and the time needed to execute the code before the output signal is available.
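This sum can be budgeted before any code is written. The block length and FIR order below are illustrative assumptions (not values from the text), chosen only to show how quickly the budget approaches the detectability thresholds quoted above:

```python
fs = 16_000          # sampling rate in Hz (16 kHz, as used later in the chapter)
block_len = 32       # input buffer length in samples -- illustrative assumption
fir_taps = 97        # linear-phase FIR length -- illustrative assumption
group_delay = (fir_taps - 1) // 2   # K/2 samples for a symmetric FIR

total_ms = 1000.0 * (block_len + group_delay) / fs
print(f"software delay ~ {total_ms:.1f} ms")  # 5.0 ms, already in the detectable 3-6 ms range
```

Even this modest configuration lands inside the 3 to 6 msec detectability range, before code execution time and the fixed converter delays are added.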

    2.3.2 Digital Frequency Warping

Frequency warping uses a conformal mapping to give a non-uniform spacing of frequency samples around the unit circle in the complex z-plane. It is achieved by replacing the unit delays in a digital filter with first-order all-pass filters. The all-pass filter is given by:

A(z) = (z^{-1} - a) / (1 - a z^{-1})    (2.1)

where a is the warping parameter. The value of the warping parameter that gives the closest fit to the Bark frequency scale is a = 0.5756 for a sampling rate of 16 kHz. For this choice of parameter, the group delay at low frequencies exceeds one sample and is less than one sample at high frequencies, as illustrated in Fig. 2.4.

Figure 2.4: Group delay in samples for a single all-pass filter having the warping parameter a = 0.5756 [12].
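This frequency dependence follows from the standard closed-form group delay of a real first-order all-pass, tau(w) = (1 - a^2) / (1 - 2a cos(w) + a^2) samples. A short numerical check for a = 0.5756, reproducing the behavior shown in Fig. 2.4:

```python
import numpy as np

a = 0.5756                       # warping parameter for ~Bark fit at 16 kHz
w = np.linspace(0.0, np.pi, 512)
# closed-form group delay of A(z) = (z^-1 - a) / (1 - a z^-1), in samples
tau = (1 - a**2) / (1 - 2 * a * np.cos(w) + a**2)

print(tau[0])    # (1+a)/(1-a) ~ 3.71 samples at DC
print(tau[-1])   # (1-a)/(1+a) ~ 0.27 samples at Nyquist
```

The extremes (1+a)/(1-a) and (1-a)/(1+a) make the trade explicit: each all-pass stage behaves like a delay of several samples at low frequencies and a fraction of a sample at high frequencies.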

The transfer function of the warped FIR filter is the weighted sum of the outputs of each all-pass section:

W(z) = Σ_{k=0}^{K} b_k A^k(z)    (2.2)

for a filter with K+1 taps. Forcing the real filter coefficients {b_k} to have even symmetry, which for an unwarped FIR filter yields a linear-phase filter, makes the filter delay independent


    Figure 2.5: Block diagram of a compression system using frequency warping for both fre-quency analysis and filtered signal synthesis.

of the coefficients as long as symmetry is preserved. This symmetry property guarantees that no phase modification will occur as the compressor changes gain in response to the incoming signal. In a binaural fitting (hearing aids on both ears), the coefficient symmetry also ensures that identical amounts of group delay are introduced at the two ears, thus preserving the interaural phase differences used for sound localization.
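The symmetry property is easy to verify numerically: for any even-symmetric real tap vector, removing a constant K/2-sample delay leaves a purely real frequency response, regardless of the tap values. A sketch with arbitrary random taps:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8                                      # filter order (K + 1 = 9 taps)
half = rng.standard_normal(K // 2 + 1)
b = np.concatenate([half, half[-2::-1]])   # even-symmetric taps: b[k] = b[K - k]

w = np.linspace(0.0, np.pi, 256)
k = np.arange(K + 1)
H = np.array([np.sum(b * np.exp(-1j * wi * k)) for wi in w])

# Undo the constant K/2-sample delay; a linear-phase filter leaves no imaginary part
residual = H * np.exp(1j * w * K / 2)
print(np.max(np.abs(residual.imag)))       # ~1e-14: the delay does not depend on the taps
```

Because the residual is real for every choice of symmetric taps, the compressor can update the gains freely without introducing any phase modification.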

Frequency warping can be used to design both finite-impulse response (FIR) and infinite-impulse response (IIR) filters. Improved frequency resolution in a conventional FIR filter requires increasing the filter length, which leads to a further increase in group delay. Similarly, improved frequency resolution in a warped FIR filter requires an increase in the number of all-pass filter sections, which also increases the filter delay. There is therefore a trade-off between frequency resolution and group delay for both conventional and warped filters, although the warped filter has less delay at low frequencies than a conventional filter with the same low-frequency resolution.

    2.3.3 Compressor using frequency warping

A dynamic-range compression system using warped frequency analysis is presented in Fig. 2.5.

The compressor combines a warped FIR filter and a warped FFT. The input signal x(n) is passed through a cascade of all-pass filters, with the output of the k-th all-pass stage given by p_k(n). The sequence of delayed samples {p_k(n)} is then windowed and its FFT is calculated. Because the data sequence is windowed, the spectrum is smoothed in the warped frequency domain, giving smoothly overlapping frequency bands. The result of the FFT is a spectrum sampled on a Bark frequency scale. The algorithm can be implemented on a sample-by-sample basis or using block data processing, where


the compression gains are updated once per block (block processing is typically used) [12]. The compression gains are computed from the warped power spectrum and are purely real numbers, so the inverse FFT that gives the warped time-domain filter yields a filter with real, even-symmetric coefficients. The system output is finally calculated by convolving the delayed samples with the compression gain filter:

y(n) = Σ_{k=0}^{K} g_k(n) p_k(n)    (2.3)

where {g_k(n)} are the compression filter coefficients.

Comparing a conventional FIR system to a warped compression system of the same

    length, the latter will require more computational resources because of the all-pass filters inthe tapped delay line. Nevertheless, in many cases the warped FIR filter is shorter than theconventional FIR filter needed to achieve the same degree of auditory frequency resolution.
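The tapped all-pass delay line and the output sum of Eq. (2.3) can be sketched sample by sample as below. The gains are held fixed for simplicity (a real compressor updates them per block from the warped spectrum), and the function name and structure are this sketch's own, not taken from [12]:

```python
import numpy as np

def warped_fir(x, g, a):
    """Warped FIR filter y(n) = sum_k g[k] * p_k(n), where p_k(n) is the
    output of the k-th stage of a cascade of first-order all-pass
    filters A(z) = (z^-1 - a) / (1 - a z^-1).  With a = 0 every stage
    degenerates to a unit delay and this is an ordinary FIR filter."""
    K = len(g) - 1
    prev = np.zeros(K + 1)      # p_k(n-1) for every tap, including tap 0
    p = np.empty(K + 1)
    y = np.empty(len(x))
    for n, xn in enumerate(x):
        p[0] = xn               # tap 0 is the input itself
        for k in range(1, K + 1):
            # all-pass update: p_k(n) = p_{k-1}(n-1) + a * (p_k(n-1) - p_{k-1}(n))
            p[k] = prev[k - 1] + a * (prev[k] - p[k - 1])
        prev[:] = p
        y[n] = g @ p
    return y
```

Setting a = 0 makes the output match ordinary convolution, which is a convenient unit test for the delay-line bookkeeping; setting a = 0.5756 gives the Bark-warped tap spacing discussed above.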

2.4 Warped Low-Delay Post Filter

We live in a noisy world. In all applications that require at least one microphone (telecommunications, hands-free communication, recording, human-machine interfaces, etc.), the signal of interest is usually contaminated by background noise and reverberation. Therefore, the microphone signal has to be cleaned with digital signal processing tools before it is played out, transmitted or stored. As a result, digital hearing aids nowadays are mostly equipped with speech enhancement systems.

By speech enhancement, we mean the improvement in intelligibility and/or quality of a degraded speech signal, which includes not only noise reduction but also dereverberation and the separation of independent signals. Speech enhancement is a very difficult problem because the nature and characteristics of noise signals can change dramatically over time, and the performance measure can be defined differently for each application. To measure performance, two perceptual criteria are widely used: quality (subjective) and intelligibility (objective). In general, it is not possible to improve both quality and intelligibility at the same time; quality is usually improved at the expense of intelligibility [13].

    2.4.1 Warped Post Filter for Speech Enhancement

For the suppression of background noise, a noise reduction system has to achieve high quality for the enhanced speech without introducing a significant signal delay. A high signal delay can cause coloration effects while the hearing-aid user is talking. In such a case, the talker's own voice reaches the cochlea with minimal delay (via bone conduction and through the hearing-aid vent) and interacts with the delayed and amplified signal produced by the hearing aid. This leads to perceptually annoying artifacts.

To reconcile these two conflicting goals, the main focus is the development of a post-filter in the noise reduction system sketched in Fig. 2.6 [14]:

The calculation of the filter coefficients is done in the frequency domain, while the actual filtering is performed in the time domain. For the adaptation of the filter coefficients in the frequency domain, the noisy input speech signal x(k) is transformed into the spectral domain by means of a frequency-warped Discrete Fourier Transform (DFT) analysis filter bank, which is described in more detail in Chapter 3. It should be mentioned that a filter bank with non-uniform (approximately Bark-scaled) frequency resolution incorporates a perceptual model of the human auditory system. Thus, a lower number of frequency channels can be used compared to a uniform filter bank.


    Figure 2.6: Diagram of a warped post-filter for speech enhancement.

The M spectral coefficients X_i(k') are calculated by the DFT polyphase network (PPN) analysis filter bank with downsampling, and the M spectral gain factors W_i(k') are calculated at the decimated time instants:

k' = ⌊k/r⌋    (2.4)

where r denotes the downsampling rate. The gains W_i(k') are estimated by a common spectral speech estimator (such as the Wiener filter) and are real and bounded by W_thres < W_i(k')
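A hedged sketch of such a bounded spectral gain: a plain Wiener rule with a maximum-likelihood SNR estimate and a lower bound W_thres. The function and parameter names are illustrative, not the estimator actually specified in [14]:

```python
import numpy as np

def bounded_wiener_gains(X, noise_psd, w_thres=0.1):
    """Per-band Wiener gain W_i = SNR / (1 + SNR), floored at w_thres.
    X: complex spectral coefficients X_i(k'); noise_psd: estimated
    noise power per band.  The returned gains are real and lie in
    [w_thres, 1), matching the boundedness stated in the text."""
    snr = np.maximum(np.abs(X) ** 2 / noise_psd - 1.0, 0.0)  # ML a-priori SNR estimate
    w = snr / (1.0 + snr)                                     # Wiener gain in [0, 1)
    return np.maximum(w, w_thres)
```

Flooring the gains at w_thres, rather than letting them reach zero, attenuates noise-dominated bands smoothly and limits musical-noise artifacts in the enhanced output.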