A project report on A Time-Varying Convergence Parameter for the LMS Algorithm in the Presence of White Gaussian Noise


A project report

on

A Time-Varying Convergence Parameter for the LMS Algorithm in the Presence of White Gaussian Noise


ABSTRACT:

A novel approach for the least-mean-square (LMS) estimation algorithm is proposed.

Rather than using a fixed convergence parameter μ, this approach utilizes a time-varying LMS

parameter μn. This technique leads to faster convergence and provides reduced mean-squared

error compared to the conventional fixed parameter LMS algorithm. The algorithm has been

tested for noise reduction and estimation in narrow-band FM signals corrupted by additive white

Gaussian noise.


INTRODUCTION:

The LMS algorithm is a well-known adaptive estimation and prediction technique. It has

been extensively studied in the literature and widely used in a variety of applications. The

performance of the LMS algorithm is highly dependent on the selected convergence parameter μ

and the signal condition. A larger convergence parameter value leads to faster convergence of

the LMS algorithm, i.e., convergence of the filter coefficients to their optimal values. After the coefficients converge, the convergence parameter should be made small for better estimation accuracy.

In this project, we propose a time-varying convergence parameter μn for the LMS algorithm in a white Gaussian noise environment. A general power-decaying law has been studied; however, other time-varying laws could also be applicable. The main idea is to set the

convergence parameter to a large value in the initial state in order to speed up the algorithm

convergence. As time passes, the parameter will be adjusted to a smaller value so that the

adaptive filter will have a smaller mean-squared error. The modified algorithm has been tested

for noise reduction and estimation in linear frequency-modulated (LFM) narrowband signals

corrupted by additive white Gaussian noise. Simulation results have shown that the modified

LMS algorithm has better performance in terms of convergence speed than the conventional

LMS algorithm with a constant convergence parameter, and that its mean-squared error is close to the optimal value.


SPEECH SIGNALS:

A speech signal consists of three classes of sounds: voiced, fricative and plosive sounds. Voiced sounds are caused by excitation of the vocal tract with quasi-periodic pulses of airflow. Fricative sounds are formed by constricting the vocal tract and passing air through it, causing turbulence that results in a noise-like sound. Plosive sounds are created by closing up the vocal tract, building up air behind it and then suddenly releasing it; this is heard in plosive consonants such as 'p'. The figure below shows a discrete-time representation of a speech signal.

By looking at the signal as a whole we can tell that it is non-stationary: its mean value varies with time and cannot be predicted using the above mathematical models for random processes. However, a speech signal can be considered a linear composite of the above three classes of sound; each of these sounds is stationary and remains fairly constant over intervals of the order of 30 to 40 ms. The theory behind the derivation of many adaptive filtering algorithms usually requires the input signal to be stationary. Although speech is non-stationary over long time scales, the assumption is that the short-term stationary behaviour outlined above will prove adequate for the adaptive filters to function as desired.

Representation of Speech Signal


SPEECH GENERATION:

Speech generation and recognition are used to communicate between humans and

machines. Rather than using your hands and eyes, you use your mouth and ears. This is very

convenient when your hands and eyes should be doing something else, such as: driving a car,

performing surgery, or (unfortunately) firing your weapons at the enemy. Two approaches are

used for computer generated speech: digital recording and vocal tract simulation. In digital

recording, the voice of a human speaker is digitized and stored, usually in a compressed form.

During playback, the stored data are uncompressed and converted back into an analog

signal. An entire hour of recorded speech requires only about three megabytes of storage, well

within the capabilities of even small computer systems. This is the most common method of

digital speech generation used today. Vocal tract simulators are more complicated, trying to

mimic the physical mechanisms by which humans create speech. The human vocal tract is an

acoustic cavity with resonant frequencies determined by the size and shape of the chambers.

Sound originates in the vocal tract in one of two basic ways, called voiced and fricative sounds.

With voiced sounds, vocal cord vibration produces near periodic pulses of air into the vocal

cavities. In comparison, fricative sounds originate from the noisy air turbulence at narrow

constrictions, such as the teeth and lips. Vocal tract simulators operate by generating digital

signals that resemble these two types of excitation. The characteristics of the resonant chamber

are simulated by passing the excitation signal through a digital filter with similar resonances.

This approach was used in one of the very early DSP success stories, the Speak & Spell, a widely

sold electronic learning aid for children.

SPEECH PRODUCTION:

Speech is produced when air is forced from the lungs through the vocal cords and

along the vocal tract. The vocal tract extends from the opening in the vocal cords (called the

glottis) to the mouth, and in an average man is about 17 cm long. It introduces short-term

correlations (of the order of 1 ms) into the speech signal, and can be thought of as a filter with

broad resonances called formants. The frequencies of these formants are controlled by varying


the shape of the tract, for example by moving the position of the tongue. An important part of

many speech codecs is the modeling of the vocal tract as a short-term filter. As the shape of the

vocal tract varies relatively slowly, the transfer function of its modeling filter needs to be

updated only relatively infrequently (typically every 20 ms or so).

The vocal tract filter is excited by air forced into it through the vocal cords. Speech sounds can

be broken into three classes depending on their mode of excitation.

Voiced sounds are produced when the vocal cords vibrate open and closed, thus

interrupting the flow of air from the lungs to the vocal tract and producing quasi-periodic

pulses of air as the excitation. The rate of the opening and closing gives the pitch of the

sound, which can be adjusted by varying the shape of, and the tension in, the vocal cords and the pressure of the air behind them. Voiced sounds show a high degree of periodicity at the

pitch period, which is typically between 2 and 20 ms. This long-term periodicity can be

seen in Figure 1 which shows a segment of voiced speech sampled at 8 kHz. Here the

pitch period is about 8 ms or 64 samples.

Unvoiced sounds result when the excitation is a noise-like turbulence produced by

forcing air at high velocities through a constriction in the vocal tract while the glottis is

held open. Such sounds show little long-term periodicity as can be seen from Figures 3

and 4 although short-term correlations due to the vocal tract are still present.

Plosive sounds result when a complete closure is made in the vocal tract, and air pressure

is built up behind this closure and released suddenly.

Some sounds cannot be considered to fall into any one of the three classes above,

but are a mixture. For example voiced fricatives result when both vocal cord vibration and a

constriction in the vocal tract are present.

Although there are many possible speech sounds which can be produced, the shape of the

vocal tract and its mode of excitation change relatively slowly, and so speech can be considered

to be quasi-stationary over short periods of time (of the order of 20 ms). Speech signals show a

high degree of predictability, due sometimes to the quasi-periodic vibrations of the vocal cords

and also due to the resonances of the vocal tract. Speech coders attempt to exploit this

predictability in order to reduce the data rate necessary for good quality voice transmission

From the technical, signal-oriented point of view, the production of speech is

widely described as a two-level process: in the first stage the sound is initiated, and in the second stage it is filtered. This distinction between the two phases has its origin in the source-filter model of speech production.

Fig: Source Filter Model of Speech Production

The basic assumption of the model is that the source signal produced at the

glottal level is linearly filtered through the vocal tract. The resulting sound is emitted to the

surrounding air through radiation loading (lips). The model assumes that source and filter are

independent of each other. Although recent findings show some interaction between the vocal

tract and a glottal source (Rothenberg 1981; Fant 1986), Fant's theory of speech production is

still used as a framework for the description of the human voice, especially as far as the

articulation of vowels is concerned.

SPEECH PROCESSING:

The term speech processing basically refers to the scientific discipline concerning

the analysis and processing of speech signals in order to achieve the best benefit in various

practical scenarios. The field of speech processing is, at present, undergoing a rapid growth in

terms of both performance and applications. The advances being made in the field of

microelectronics, computation and algorithm design stimulate this. Nevertheless, speech

processing still covers an extremely broad area, which relates to the following three engineering

applications:

• Speech Coding and transmission, which is mainly concerned with man-to-man voice communication;

• Speech Synthesis, which deals with machine-to-man communication;

• Speech Recognition, relating to man-to-machine communication.


Speech Coding:

Speech coding or compression is the field concerned with compact digital

representations of speech signals for the purpose of efficient transmission or storage. The central

objective is to represent a signal with a minimum number of bits while maintaining perceptual

quality. Current applications for speech and audio coding algorithms include cellular and

personal communications networks (PCNs), teleconferencing, desktop multi-media systems, and

secure communications.

SPEECH SYNTHESIS:

The process that involves the conversion of a command sequence or input text

(words or sentences) into speech waveform using algorithms and previously coded speech data is

known as speech synthesis. The inputting of text can be processed through by keyboard, optical

character recognition, or from a previously stored database. A speech synthesizer can be

characterized by the size of the speech units they concatenate to yield the output speech as well

as by the method used to code, store and synthesize the speech. If large speech units are

involved, such as phrases and sentences, high-quality output speech (with large memory

requirements) can be achieved. On the contrary, efficient coding methods can be used for

reducing memory needs, but these usually degrade speech quality.

NOISE SOURCES:

Sources of noise exist throughout the environment. One type of noise is due to turbulence

and is therefore totally random and impossible to predict. Engineers like to look at signals, noise

included, in the frequency domain. That is, "How is the noise energy distributed as a function of

frequency?"

These turbulent noises tend to distribute their energy evenly across the frequency bands

and are therefore referred to as "broadband noise". A very common term is "white noise", which falls under the category of broadband noise. White noise is noise having a frequency spectrum that is continuous and uniform over a specified frequency band.

Note: white noise has equal power per hertz over the specified frequency band (synonym: additive white Gaussian noise). Examples of broadband noise are the low-frequency noise from

jet planes and the impulse noise of an explosion.


Many environmental noises are different in character. These "narrow-band noises"

concentrate most of their noise energy at specific frequencies. When the source of the noise is a

rotating or repetitive machine, the noise frequencies are all multiples of a basic "Noise Cycle"

and the noise is approximately periodic. This "Tonal Noise" is common in the environment as

man made machinery tends to generate it (along with a smaller amount of broadband noise) at

increasingly high levels.

Filter design

Filter design is the process of designing a filter (in the sense in which the term is used in

signal processing, statistics, and applied mathematics), often a linear shift-invariant filter, which

satisfies a set of requirements, some of which are contradictory. The problem is to find a

realization of the filter which meets each of the requirements to a sufficient degree to make it

useful.

The filter design process can be described as an optimization problem where each requirement

contributes with a term to an error function which should be minimized. Certain parts of the

design process can be automated, but normally an experienced electrical engineer is needed to

get a good result.

Typical design requirements:

Typical requirements which are considered in the design process are:

The filter should have a specific frequency response

The filter should have a specific impulse response

The filter should be causal

The filter should be stable

The filter should be localized

The computational complexity of the filter should be low

The filter should be implemented in particular hardware or software


The frequency function:

Typical examples of frequency functions are:

A low-pass filter is used to cut unwanted high-frequency signals.

A high-pass filter passes high frequencies fairly well; it is helpful as a filter to cut any unwanted

low frequency components.

A band-pass filter passes a limited range of frequencies.

A band-stop filter passes frequencies above and below a certain range. A very narrow band-stop

filter is known as a notch filter.

A low-shelf filter passes all frequencies, but increases or reduces frequencies below the cutoff frequency by a specified amount.

A high-shelf filter passes all frequencies, but increases or reduces frequencies above the cutoff frequency by a specified amount.

A peak EQ filter makes a peak or a dip in the frequency response, commonly used in graphic

equalizers.

An all-pass filter passes through all frequencies unchanged, but changes the phase of the signal.

This is a filter commonly used in phaser effects.

An important parameter is the required frequency response. In particular, the steepness and

complexity of the response curve is a deciding factor for the filter order and feasibility.

A first order recursive filter will only have a single frequency-dependent component. This means

that the slope of the frequency response is limited to 6 dB per octave. For many purposes, this is

not sufficient. To achieve steeper slopes, higher order filters are required.

In relation to the desired frequency function, there may also be an accompanying weighting

function which describes, for each frequency, how important it is that the resulting frequency

function approximates the desired one. The larger the weight, the more important a close approximation is.


The impulse response

There is a direct correspondence between the filter's frequency function and its impulse

response: the former is the Fourier transform of the latter. This means that any requirement on

the frequency function is a requirement on the impulse response, and vice versa.

However, in certain applications it may be the filter's impulse response which is explicit

and the design process then aims at producing as close an approximation as possible to the

requested impulse response given all other requirements.

In some cases it may even be relevant to consider a frequency function and impulse

response of the filter which are chosen independently of each other. For example, we may want both a specific frequency function of the filter and an effective width in the signal domain that is as small as possible. The latter condition can be realized by

considering a very narrow function as the wanted impulse response of the filter even though this

function has no relation to the desired frequency function. The goal of the design process is then

to realize a filter which tries to meet both these contradicting design goals as much as possible.

Causality

In order to be implementable, any time-dependent filter must be causal: the filter

response only depends on the current and past inputs. A standard approach is to leave this

requirement until the final step. If the resulting filter is not causal, it can be made causal by

introducing an appropriate time-shift (or delay). If the filter is a part of a larger system (which it

normally is) these types of delays have to be introduced with care since they affect the operation

of the entire system.

Stability

A stable filter assures that every limited input signal produces a limited filter response. A

filter which does not meet this requirement may in some situations prove useless or even

harmful. Certain design approaches can guarantee stability, for example by using only feed-

forward circuits such as an FIR filter. On the other hand, filters based on feedback circuits have other advantages and may therefore be preferred, even though this class of filters includes unstable ones. In this case, the filters must be carefully designed in order to avoid instability.

Locality

In certain applications we have to deal with signals which contain components which can

be described as local phenomena, for example pulses or steps, which have certain time duration.

A consequence of applying a filter to a signal is, in intuitive terms, that the duration of the local

phenomena is extended by the width of the filter. This implies that it is sometimes important to

keep the width of the filter's impulse response function as short as possible.

According to the uncertainty relation of the Fourier transform, the product of the width of

the filter's impulse response function and the width of its frequency function must exceed a

certain constant. This means that any requirement on the filter's locality also implies a bound on

its frequency function's width. Consequently, it may not be possible to simultaneously meet

requirements on the locality of the filter's impulse response function as well as on its frequency

function. This is a typical example of contradicting requirements.

Computational complexity

A general desire in any design is that the number of operations (additions and

multiplications) needed to compute the filter response is as low as possible. In certain

applications, this desire is a strict requirement, for example due to limited computational

resources, limited power resources, or limited time. The last limitation is typical in real-time

applications.

There are several ways in which a filter can have different computational complexity. For

example, the order of a filter is more or less proportional to the number of operations. This

means that by choosing a low order filter, the computation time can be reduced.

For discrete filters the computational complexity is more or less proportional to the

number of filter coefficients. If the filter has many coefficients, for example in the case of


multidimensional signals such as tomography data, it may be relevant to reduce the number of

coefficients by removing those which are sufficiently close to zero.

Another issue related to computational complexity is separability, that is, if and how a

filter can be written as a convolution of two or more simpler filters. In particular, this issue is of

importance for multidimensional filters, e.g., 2D filter which are used in image processing. In

this case, a significant reduction in computational complexity can be obtained if the filter can be

separated as the convolution of one 1D filter in the horizontal direction and one 1D filter in the

vertical direction. A result of the filter design process may, e.g., be to approximate some desired

filter as a separable filter or as a sum of separable filters.

Other considerations

It must also be decided how the filter is going to be implemented:

Analog filter

Analog sampled filter

Digital filter

Mechanical filter

Analog filters

The design of linear analog filters is for the most part covered in the linear filter section.

Digital filters

Digital filters are classified into one of two basic forms, according to how they respond to a unit

impulse:

Finite impulse response, or FIR, filters express each output sample as a weighted sum of the last

N inputs, where N is the order of the filter. Since they do not use feedback, they are inherently

stable. If the coefficients are symmetrical (the usual case), then such a filter is linear phase, so it

delays signals of all frequencies equally. This is important in many applications. It is also

straightforward to avoid overflow in an FIR filter. The main disadvantage is that they may require


significantly more processing and memory resources than cleverly designed IIR variants. FIR

filters are generally easier to design than IIR filters - the Remez exchange algorithm is one

suitable method for designing quite good filters semi-automatically. (See Methodology.)

Infinite impulse response, or IIR, filters are the digital counterpart to analog filters. Such a filter

contains internal state, and the output and the next internal state are determined by a linear

combination of the previous inputs and outputs (in other words, they use feedback, which FIR

filters normally do not). In theory, the impulse response of such a filter never dies out completely,

hence the name IIR, though in practice, this is not true given the finite resolution of computer

arithmetic. IIR filters normally require less computing resources than an FIR filter of similar

performance. However, due to the feedback, high order IIR filters may have problems with

instability, arithmetic overflow, and limit cycles, and require careful design to avoid such pitfalls.

Additionally, since the phase shift is inherently a non-linear function of frequency, the time delay

through such a filter is frequency-dependent, which can be a problem in many situations. 2nd

order IIR filters are often called 'biquads' and a common implementation of higher order filters is

to cascade biquads. A useful reference for computing biquad coefficients is the RBJ Audio EQ

Cookbook.

Sample rate

Unless the sample rate is fixed by some outside constraint, selecting a suitable sample

rate is an important design decision. A high rate will require more in terms of computational

resources, but less in terms of anti-aliasing filters. Interference and beating with

other signals in the system may also be an issue.

Anti-aliasing

For any digital filter design, it is crucial to analyze and avoid aliasing effects. Often, this

is done by adding analog anti-aliasing filters at the input and output, thus avoiding any frequency

component above the Nyquist frequency. The complexity (i.e., steepness) of such filters depends

on the required signal to noise ratio and the ratio between the sampling rate and the highest

frequency of the signal.


Theoretical basis

Parts of the design problem relate to the fact that certain requirements are described in the

frequency domain while others are expressed in the signal domain and that these may contradict.

For example, it is not possible to obtain a filter which has both an arbitrary impulse response and

arbitrary frequency function. Other effects which refer to relations between the signal and

frequency domain are

The uncertainty principle between the signal and frequency domains

The variance extension theorem

The asymptotic behaviour of one domain versus discontinuities in the other

The uncertainty principle

As stated in the uncertainty principle, the product of the width of the frequency function and the

width of the impulse response cannot be smaller than a specific constant. This implies that if a

specific frequency function is requested, corresponding to a specific frequency width, the

minimum width of the filter in the signal domain is set. Vice versa, if the maximum width of the

response is given, this determines the smallest possible width in the frequency. This is a typical

example of contradicting requirements where the filter design process may try to find a useful

compromise.

The variance extension theorem

Let σs² be the variance of the input signal and let σf² be the variance of the filter. The variance of the filter response, σr², is then given by

σr² = σs² + σf²


This means that σr > σf and implies that the localization of various features such as pulses or

steps in the filter response is limited by the filter width in the signal domain. If a precise

localization is requested, we need a filter of small width in the signal domain and, via the

uncertainty principle, its width in the frequency domain cannot be arbitrarily small.

Discontinuities versus asymptotic behaviour

Let f(t) be a function and let F(ω) be its Fourier transform. There is a theorem which states that if

the first derivative of F which is discontinuous has order n, then f has an asymptotic decay like t^(−n−1).

A consequence of this theorem is that the frequency function of a filter should be as smooth as

possible to allow its impulse response to have a fast decay, and thereby a short width.

Methodology

One common method for designing FIR filters is the Remez exchange algorithm. Here

the user specifies a desired frequency response, a weighting function for errors from this

response, and a filter order N. The algorithm then finds the set of N coefficients that minimize

the maximum deviation from the ideal. Intuitively, this finds the filter that is as close as you can

get to the desired response given that you can use only N coefficients. This method is

particularly easy in practice and at least one text[1] includes a program that takes the desired filter

and N and returns the optimum coefficients. One possible drawback to filters designed this way

is that they contain many small ripples in the passband(s), since such a filter minimizes the peak

error.

Another method to finding a discrete FIR filter is filter optimization described in

Knutsson et al., which minimizes the integral of the square of the error, instead of its maximum

value. In its basic form this approach requires that an ideal frequency function of the filter FI(ω)

is specified together with a frequency weighting function W(ω) and set of coordinates xk in the

signal domain where the filter coefficients are located.

An error function is defined as

ε = || W(ω) · (F̂(ω) − FI(ω)) ||²

where f(x) is the discrete filter and F̂(ω) is its discrete-time Fourier transform defined on the specified set of coordinates. The norm used here is, formally, the usual norm on L2 spaces. This means that ε measures the deviation between the requested frequency function of the filter, FI, and the actual frequency function of the realized filter, F̂. However, the deviation is also subject to the weighting function W before the error function is computed.

Once the error function is established, the optimal filter is given by the coefficients f(x)

which minimize ε. This can be done by solving the corresponding least squares problem. In

practice, the L2 norm has to be approximated by means of a suitable sum over discrete points in

the frequency domain. In general, however, these points should be significantly more than the

number of coefficients in the signal domain to obtain a useful approximation.

Simultaneous optimization in both domains

The previous method can be extended to include an additional error term related to a

desired filter impulse response in the signal domain, with a corresponding weighting function.

The ideal impulse response can be chosen independently of the ideal frequency function and is in

practice used to limit the effective width and to remove ringing effects of the resulting filter in

the signal domain. This is done by choosing a narrow ideal filter impulse response function, e.g.,

an impulse, and a weighting function which grows fast with the distance from the origin, e.g., the

distance squared. The optimal filter can still be calculated by solving a simple least squares

problem and the resulting filter is then a "compromise" which has a total optimal fit to the ideal

functions in both domains. An important parameter is the relative strength of the two weighting

functions which determines in which domain it is more important to have a good fit relative to

the ideal function.


FINITE IMPULSE RESPONSE:

A finite impulse response (FIR) filter is a type of a digital filter. The impulse response,

the filter's response to a Kronecker delta input, is finite because it settles to zero in a finite

number of sample intervals. This is in contrast to infinite impulse response (IIR) filters, which

have internal feedback and may continue to respond indefinitely. The impulse response of an

Nth-order FIR filter lasts for N+1 samples, and then dies to zero.

The difference equation that defines the output of an FIR filter in terms of its input is:

y[n] = b0·x[n] + b1·x[n−1] + … + bN·x[n−N] = Σ (i = 0 to N) bi·x[n−i]

where:

x[n] is the input signal,

y[n] is the output signal,

bi are the filter coefficients, and

N is the filter order – an Nth-order filter has (N + 1) terms on the right-hand side; these

are commonly referred to as taps.

This equation can also be expressed as a convolution of the coefficient sequence bi with the input signal: y[n] = (b ∗ x)[n].

That is, the filter output is a weighted sum of the current and a finite number of previous values

of the input.
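To make the difference equation concrete, here is a minimal Python sketch (the coefficients and input are hypothetical, chosen only for illustration) that evaluates y[n] = Σ bi·x[n−i] directly:

    import numpy as np

    def fir_filter(b, x):
        # y[n] = sum_i b[i] * x[n - i], with x[n - i] taken as 0 for n - i < 0
        N = len(b) - 1                      # filter order
        y = np.zeros(len(x))
        for n in range(len(x)):
            for i in range(min(n, N) + 1):
                y[n] += b[i] * x[n - i]
        return y

    b = [0.25, 0.5, 0.25]                   # hypothetical 3-tap coefficients
    x = np.array([1.0, 2.0, 3.0, 4.0])
    print(fir_filter(b, x))                 # matches np.convolve(b, x)[:len(x)]

The same result can be obtained with np.convolve, which reflects the convolution form given above.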

An FIR filter has a number of useful properties which sometimes make it preferable to an

infinite impulse response (IIR) filter. FIR filters:


1. Are inherently stable. This is due to the fact that all the poles are located at the origin and thus within the unit circle.

2. Require no feedback. This means that any rounding errors are not compounded by summed iterations; the same relative error occurs in each calculation. This also makes implementation simpler.

3. Can easily be designed to be linear phase by making the coefficient sequence symmetric. Linear phase, or phase change proportional to frequency, corresponds to equal delay at all frequencies. This property is sometimes desired for phase-sensitive applications, for example crossover filters and mastering.

The main disadvantage of FIR filters is that considerably more computation power is required compared to an IIR filter with similar sharpness or selectivity, especially when low-frequency (relative to the sample rate) cutoffs are needed.

IMPULSE RESPONSE:

The impulse response h[n] can be calculated by setting x[n] = δ[n] in the above relation, where δ[n] is the Kronecker delta impulse. The impulse response of an FIR filter is then simply the set of coefficients:

h[n] = bn for n = 0 to N.

The Z-transform of the impulse response yields the transfer function of the FIR filter:

H(z) = Σ (n = 0 to N) bn·z^(−n)


FIR filters are clearly bounded-input bounded-output (BIBO) stable, since the output is a sum of a finite number of finite multiples of the input values: |y[n]| can be no greater than Σ|bi| times the largest value appearing in the input.

FILTER DESIGN

To design a filter means to select the coefficients such that the system has specific

characteristics. The required characteristics are stated in filter specifications. Most of the time

filter specifications refer to the frequency response of the filter. There are different methods to

find the coefficients from frequency specifications:

1. Window design method

2. Frequency Sampling method

3. Weighted least squares design

4. Parks-McClellan method (also known as the Equiripple, Optimal, or Minimax method).

The Remez exchange algorithm is commonly used to find an optimal equiripple set of

coefficients. Here the user specifies a desired frequency response, a weighting function

for errors from this response, and a filter order N. The algorithm then finds the set of (N +

1) coefficients that minimize the maximum deviation from the ideal. Intuitively, this finds

the filter that is as close as you can get to the desired response given that you can use

only (N + 1) coefficients. This method is particularly easy in practice since at least one

text[1] includes a program that takes the desired filter and N, and returns the optimum

coefficients.

Software packages like MATLAB, GNU Octave, Scilab, and SciPy provide convenient ways to

apply these different methods.
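As a concrete illustration, the SciPy route might look like the following sketch (the band edges, tap count and sample rate are hypothetical):

    import numpy as np
    from scipy.signal import remez, freqz

    # Equiripple low-pass: passband 0-0.1, stopband 0.2-0.5 (frequencies normalised to fs = 1.0)
    numtaps = 31
    b = remez(numtaps, [0.0, 0.1, 0.2, 0.5], [1.0, 0.0], fs=1.0)

    w, h = freqz(b, worN=1024)              # w in radians per sample, 0..pi
    stop = np.abs(h[w >= 0.2 * 2 * np.pi])  # stopband begins at 0.2 cycles/sample
    print("stopband attenuation (dB):", 20 * np.log10(stop.max()))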


Some of the time, the filter specifications refer to the time-domain shape of the input signal the

filter is expected to "recognize". The optimum matched filter is to sample that shape and use

those samples directly as the coefficients of the filter -- giving the filter an impulse response that

is the time-reverse of the expected input signal.

WINDOW DESIGN METHOD

In the Window Design Method, one designs an ideal IIR filter, then applies a window

function to it – in the time domain, multiplying the infinite impulse by the window function. This

results in the frequency response of the IIR being convolved with the frequency response of the

window function[2] – thus the imperfections of the FIR filter (compared to the ideal IIR filter) can

be understood in terms of the frequency response of the window function.

The ideal frequency response of a window is a Dirac delta function, as that results in the

frequency response of the FIR filter being identical to that of the IIR filter, but this is not

attainable for finite windows, and deviations from this yield differences between the FIR

response and the IIR response.
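A minimal sketch of the window design method (cutoff and order are hypothetical choices) is to sample the ideal low-pass impulse response, a sinc function, and multiply it by a finite window:

    import numpy as np

    fc = 0.1                                # hypothetical cutoff, cycles/sample
    N = 50                                  # filter order (N + 1 taps)
    n = np.arange(N + 1) - N / 2            # centre the ideal impulse response
    h_ideal = 2 * fc * np.sinc(2 * fc * n)  # ideal low-pass impulse response, truncated
    h = h_ideal * np.hamming(N + 1)         # the window shapes the truncation ripple

The imperfections of the resulting FIR filter relative to the ideal response are then governed by the window's frequency response, as described above.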

MOVING AVERAGE EXAMPLE


Block diagram of a simple FIR filter (2nd-order/3-tap filter in this case, implementing a moving

average)

A moving average filter is a very simple FIR filter. The filter coefficients are found via the following equation:

bi = 1/(N + 1) for i = 0, 1, …, N.

To provide a more specific example, we select the filter order N = 2. The impulse response of the resulting filter is:

h[n] = (1/3)·δ[n] + (1/3)·δ[n−1] + (1/3)·δ[n−2].

The following figure shows the block diagram of such a 2nd-order moving-average filter. To discuss stability and spectral topics we take the z-transform of the impulse response:

H(z) = (1/3)·(1 + z^(−1) + z^(−2)) = (z² + z + 1) / (3z²).

The following figure shows the pole-zero diagram of the filter. Two poles are located at the origin, and two zeros are located at z = e^(±j2π/3).


The frequency response, for frequency ω in radians per sample, is:

H(e^(jω)) = (1/3)·(1 + e^(−jω) + e^(−j2ω)).

The following figure shows the absolute value of the frequency response. Clearly, the moving-average filter passes low frequencies with a gain near 1, and attenuates high frequencies. This is a typical low-pass filter characteristic. Frequencies above π are aliases of the frequencies below π, and are generally ignored or filtered out if reconstructing a continuous-time signal.
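These values are easy to verify numerically; a short sketch:

    import numpy as np
    from scipy.signal import freqz

    b = np.ones(3) / 3                  # the 2nd-order (3-tap) moving average
    w, h = freqz(b, worN=3000)          # w in radians per sample, 0..pi
    print(np.abs(h[0]))                 # ~1 at DC
    print(np.min(np.abs(h)))            # ~0 near w = 2*pi/3, the zero on the unit circle
    print(np.abs(h[-1]))                # ~1/3 near w = pi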


INFINITE IMPULSE RESPONSE

Infinite impulse response (IIR) is a property of signal processing systems. Systems with

this property are known as IIR systems or, when dealing with filter systems, as IIR filters. IIR

systems have an impulse response function that is non-zero over an infinite length of time. This

is in contrast to finite impulse response filters (FIR), which have fixed-duration impulse

responses. The simplest analog IIR filter is an RC filter made up of a single resistor (R) feeding

into a node shared with a single capacitor (C). This filter has an exponential impulse response

characterized by an RC time constant.


IIR filters may be implemented as either analog or digital filters. In digital IIR filters, the

output feedback is immediately apparent in the equations defining the output. Note that unlike

with FIR filters, in designing IIR filters it is necessary to carefully consider the "time zero" case, in which the outputs of the filter have not yet been clearly defined.

Design of digital IIR filters is heavily dependent on that of their analog counterparts

because there are plenty of resources, works and straightforward design methods concerning

analog feedback filter design while there are hardly any for digital IIR filters. As a result,

usually, when a digital IIR filter is going to be implemented, an analog filter (e.g. Chebyshev

filter, Butterworth filter, Elliptic filter) is first designed and then is converted to a digital filter by

applying discretization techniques such as Bilinear transform or Impulse invariance.

Example IIR filters include the Chebyshev filter, Butterworth filter, and the Bessel filter.

TRANSFER FUNCTION DERIVATION

Digital filters are often described and implemented in terms of the difference equation that defines how the output signal is related to the input signal:

y[n] = (1/a0) · ( Σ (i = 0 to P) bi·x[n−i] − Σ (j = 1 to Q) aj·y[n−j] )

where:

P is the feedforward filter order

bi are the feedforward filter coefficients

Q is the feedback filter order

aj are the feedback filter coefficients

x[n] is the input signal

y[n] is the output signal.


A more condensed form of the difference equation is:

Σ (j = 0 to Q) aj·y[n−j] = Σ (i = 0 to P) bi·x[n−i]

which, when rearranged, becomes the expression above. To find the transfer function of the filter, we first take the Z-transform of each side of the equation, where we use the time-shift property to obtain:

Y(z) · Σ (j = 0 to Q) aj·z^(−j) = X(z) · Σ (i = 0 to P) bi·z^(−i).

We define the transfer function to be:

H(z) = Y(z)/X(z) = ( Σ (i = 0 to P) bi·z^(−i) ) / ( Σ (j = 0 to Q) aj·z^(−j) ).

Considering that in most IIR filter designs the coefficient a0 is 1, the IIR filter transfer function takes the more traditional form:

H(z) = ( Σ (i = 0 to P) bi·z^(−i) ) / ( 1 + Σ (j = 1 to Q) aj·z^(−j) ).
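The following sketch (with hypothetical coefficients) evaluates this difference equation; scipy.signal.lfilter implements exactly this recursion in direct form:

    import numpy as np
    from scipy.signal import lfilter

    b = [0.2, 0.2]                # hypothetical feedforward coefficients
    a = [1.0, -0.6]               # hypothetical feedback coefficients, a0 = 1
    x = np.random.randn(100)      # arbitrary input

    y = lfilter(b, a, x)          # y[n] = 0.2*x[n] + 0.2*x[n-1] + 0.6*y[n-1]

Because of the feedback term, the impulse response of this filter decays geometrically but never reaches exactly zero, which is the defining IIR property.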

DESCRIPTION OF BLOCK DIAGRAM


Simple IIR filter block diagram

A typical block diagram of an IIR filter looks like the following. The z^(−1) block is a unit delay. The coefficients and number of feedback/feedforward paths are implementation-dependent.

Stability

The transfer function allows us to judge whether or not a system is bounded-input, bounded-

output (BIBO) stable. To be specific, the BIBO stability criterion requires that the region of convergence (ROC) of the system include the unit circle. For example, for a causal system, all poles of the transfer function must have an absolute value smaller than one; in other words, all poles must be located within the unit circle in the z-plane.

The poles are defined as the values of z which make the denominator of H(z) equal to 0:

1 + Σ (j = 1 to Q) aj·z^(−j) = 0.

Clearly, if any aj ≠ 0 then the poles are not located at the origin of the z-plane. This is in contrast to the FIR filter, where all poles are located at the origin and which is therefore always stable.

IIR filters are sometimes preferred over FIR filters because an IIR filter can achieve a much

sharper transition-region roll-off than an FIR filter of the same order.


Example

Let the transfer function of a filter H be

H(z) = 1 / (1 − a·z^(−1)), with ROC a < |z| and 0 < a < 1,

which has a pole at a and is stable and causal. The time-domain impulse response is

h(n) = a^n·u(n)

which is non-zero for n ≥ 0.
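This example can be checked numerically; a minimal sketch with a = 0.5:

    import numpy as np
    from scipy.signal import lfilter

    a_pole = 0.5
    impulse = np.zeros(10)
    impulse[0] = 1.0
    h = lfilter([1.0], [1.0, -a_pole], impulse)     # H(z) = 1 / (1 - a z^-1)
    print(np.allclose(h, a_pole ** np.arange(10)))  # True: h(n) = a^n u(n)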

ADAPTIVE FILTER

An adaptive filter is a filter that self-adjusts its transfer function according to an

optimizing algorithm. Because of the complexity of the optimizing algorithms, most adaptive

filters are digital filters that perform digital signal processing and adapt their performance based

on the input signal. By way of contrast, a non-adaptive filter has static filter coefficients (which

collectively form the transfer function).

For some applications, adaptive coefficients are required since some parameters of the

desired processing operation (for instance, the properties of some noise signal) are not known in

advance. In these situations it is common to employ an adaptive filter, which uses feedback to

refine the values of the filter coefficients and hence its frequency response.

Generally speaking, the adapting process involves the use of a cost function, which is a

criterion for optimum performance of the filter (for example, minimizing the noise component of

the input), to feed an algorithm, which determines how to modify the filter coefficients to

minimize the cost on the next iteration.


As the power of digital signal processors has increased, adaptive filters have become

much more common and are now routinely used in devices such as mobile phones and other

communication devices, camcorders and digital cameras, and medical monitoring equipment.

EXAMPLE

Suppose a hospital is recording a heart beat (an ECG), which is being corrupted by a 50

Hz noise (the frequency coming from the power supply in many countries).

One way to remove the noise is to filter the signal with a notch filter at 50 Hz. However,

due to slight variations in the power supply to the hospital, the exact frequency of the power

supply might (hypothetically) wander between 47 Hz and 53 Hz. A static filter would need to

remove all the frequencies between 47 and 53 Hz, which could excessively degrade the quality

of the ECG since the heart beat would also likely have frequency components in the rejected

range.

To circumvent this potential loss of information, an adaptive filter could be used. The

adaptive filter would take input both from the patient and from the power supply directly and

would thus be able to track the actual frequency of the noise as it fluctuates. Such an adaptive

technique generally allows for a filter with a smaller rejection range, which means, in our case,

that the quality of the output signal is more accurate for medical diagnoses.

BLOCK DIAGRAM

The block diagram, shown in the following figure, serves as a foundation for particular

adaptive filter realisations, such as Least Mean Squares (LMS) and Recursive Least Squares

(RLS). The idea behind the block diagram is that a variable filter extracts an estimate of the

desired signal.


To start the discussion of the block diagram we take the following assumptions:

The input signal is the sum of a desired signal d(n) and interfering noise v(n)

x(n) = d(n) + v(n)

The variable filter has a Finite Impulse Response (FIR) structure. For such structures the

impulse response is equal to the filter coefficients. The coefficients for a filter of order p are defined as

wn = [wn(0), wn(1), …, wn(p)]^T.

The error signal or cost function is the difference between the desired and the estimated signal:

e(n) = d(n) − d̂(n).

The variable filter estimates the desired signal by convolving the input signal with the impulse response. In vector notation this is expressed as

d̂(n) = wn^T · x(n)

where

x(n) = [x(n), x(n−1), …, x(n−p)]^T

is an input signal vector. Moreover, the variable filter updates the filter coefficients at every time instant:

wn+1 = wn + Δwn

where Δwn is a correction factor for the filter coefficients. The adaptive algorithm generates this correction factor based on the input and error signals. LMS and RLS define two different coefficient update algorithms.

APPLICATIONS OF ADAPTIVE FILTERS

Noise cancellation

Signal prediction

Adaptive feedback cancellation

Echo cancellation


LEAST MEAN SQUARES (LMS):

The least mean squares (LMS) algorithm is a type of adaptive filter used to mimic a desired

filter by finding the filter coefficients that relate to producing the least mean squares of the error

signal (difference between the desired and the actual signal). It is a stochastic gradient descent

method in that the filter is only adapted based on the error at the current time. It was invented in

1960 by Stanford University professor Bernard Widrow and his first Ph.D. student, Ted Hoff.


PROBLEM FORMULATION:

Most linear adaptive filtering problems can be formulated using the block diagram above.

That is, an unknown system h(n) is to be identified, and the adaptive filter attempts to adapt the filter ĥ(n) to make it as close as possible to h(n), while using only the observable signals x(n), d(n) and e(n); y(n), v(n) and h(n) are not directly observable. Its solution is closely related to

the Wiener filter.

The idea behind LMS filters is to use steepest descent to find filter weights which

minimize a cost function. We start by defining the cost function as

C(n) = E{ |e(n)|² }

where e(n) is the error at the current sample n and E{·} denotes the expected value.

Applying steepest descent means to take the partial derivatives with respect to the individual

entries of the filter coefficient (weight) vector:

∇C(n) = ∇E{ e(n)·e*(n) } = 2·E{ ∇e(n)·e*(n) }

where ∇ is the gradient operator. With e(n) = d(n) − ŵ^H(n)·x(n) it follows that ∇e(n) = −x(n), and therefore

∇C(n) = −2·E{ x(n)·e*(n) }.

Now, ∇C(n) is a vector which points towards the steepest ascent of the cost function. To find the minimum of the cost function we need to take a step in the opposite direction of ∇C(n). To express that in mathematical terms:

ŵ(n+1) = ŵ(n) − (μ/2)·∇C(n) = ŵ(n) + μ·E{ x(n)·e*(n) }

where μ/2 is the step size. That means we have found a sequential update algorithm which minimizes the cost function. Unfortunately, this algorithm is not realizable until we know E{ x(n)·e*(n) }.

SIMPLIFICATIONS:

For most systems the expectation function E{ x(n)·e*(n) } must be approximated. This can be done with the following unbiased estimator:

Ê{ x(n)·e*(n) } = (1/N) · Σ (i = 0 to N−1) x(n−i)·e*(n−i)

where N indicates the number of samples we use for that estimate. The simplest case is N = 1:

Ê{ x(n)·e*(n) } = x(n)·e*(n).

For that simple case the update algorithm follows as

ŵ(n+1) = ŵ(n) + μ·x(n)·e*(n).

Indeed this constitutes the update algorithm for the LMS filter.

LMS ALGORITHM:

The LMS algorithm for a pth-order filter can be summarized as

Parameters: p = filter order

μ = step size

Initialisation: ŵ(0) = 0

Computation: For n = 0, 1, 2, ...

x(n) = [x(n), x(n−1), …, x(n−p)]^T

e(n) = d(n) − ŵ^H(n)·x(n)

ŵ(n+1) = ŵ(n) + μ·e*(n)·x(n)

where ŵ^H(n) denotes the Hermitian transpose of ŵ(n).
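A minimal real-valued implementation of this summary (the test system and signals are hypothetical, for illustration only) might look like:

    import numpy as np

    def lms(x, d, p, mu):
        # LMS for real signals: w(0) = 0, then e(n) = d(n) - w.x(n), w <- w + mu*e(n)*x(n)
        w = np.zeros(p + 1)
        e = np.zeros(len(x))
        for n in range(p, len(x)):
            xn = x[n - p:n + 1][::-1]       # [x(n), x(n-1), ..., x(n-p)]
            e[n] = d[n] - w @ xn
            w = w + mu * e[n] * xn
        return w, e

    # Identify a hypothetical unknown 3-tap system from noisy observations
    rng = np.random.default_rng(0)
    h_true = np.array([0.7, -0.3, 0.2])
    x = rng.standard_normal(5000)
    d = np.convolve(x, h_true)[:len(x)] + 0.01 * rng.standard_normal(len(x))
    w, e = lms(x, d, p=2, mu=0.01)
    print(w)                                # converges towards h_true

For real signals the Hermitian transpose reduces to the ordinary transpose, which is why the conjugation disappears in this sketch.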

LEAST MEAN SQUARE ALGORITHM:

As described above, LMS algorithms are used in adaptive filters to find the filter coefficients that produce the least mean square of the error signal (the difference between the desired and the actual signal). The adaptive linear combiner output is a linear combination of the input samples:

y(n) = w^T(n)·x(n).

The error in measurement is given by

e(n) = d(n) − w^T(n)·x(n)

where w^T(n) is the transpose of the weight vector and x(n) is the vector of input samples.


To develop an adaptive algorithm, it is required to estimate the gradient of ξ = E[e²(n)], conventionally by taking differences between short-term averages of e²(n). Instead, to develop the LMS algorithm, e²(n) itself is taken as the estimate of ξ. Thus at each iteration of the adaptive process the gradient estimate takes the form

∇̂(n) = −2·e(n)·x(n).

With this simple estimate, the steepest descent type of adaptive algorithm is specified as

w(n+1) = w(n) + 2μ·e(n)·x(n).

This is the LMS algorithm, where μ is the gain constant that regulates the speed and stability of adaptation. Since the weight changes at each iteration are based on imperfect gradient estimates, the adaptive process is expected to be noisy. The LMS algorithm can be implemented without squaring, averaging or differentiation, and is a simple and efficient process.

CONVERGENCE OF WEIGHT VECTOR: As with all adaptive algorithms, the primary concern with the LMS algorithm is its convergence to the weight-vector solution at which the error E[e²(n)] is minimized.

NORMALISED LEAST MEAN SQUARES FILTER (NLMS):

The main drawback of the "pure" LMS algorithm is that it is sensitive to the scaling of its

input x(n). This makes it very hard (if not impossible) to choose a learning rate μ that guarantees

stability of the algorithm. The normalised least mean squares filter (NLMS) is a variant of the


LMS algorithm that solves this problem by normalising with the power of the input. The NLMS

algorithm can be summarised as:

Parameters: p = filter order

μ = step size

Initialization: ŵ(0) = 0

Computation: For n = 0, 1, 2, ...

x(n) = [x(n), x(n−1), …, x(n−p)]^T

e(n) = d(n) − ŵ^H(n)·x(n)

ŵ(n+1) = ŵ(n) + μ·e*(n)·x(n) / (x^H(n)·x(n))
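In code, the NLMS variant changes only the weight-update line of the LMS sketch given earlier; a small regularisation constant (an assumption, commonly added in practice) guards against division by zero when the input is silent:

    eps = 1e-8                                    # assumed regularisation constant
    w = w + (mu / (xn @ xn + eps)) * e[n] * xn    # step normalised by input power

Because the step is divided by the input power x^H(n)·x(n), the effective update no longer depends on the absolute scale of the input.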

OPTIMAL LEARNING RATE:

It can be shown that if there is no interference (v(n) = 0), then the optimal learning rate

for the NLMS algorithm is

μopt = 1

and is independent of the input x(n) and the real (unknown) impulse response h(n). In the general case with interference (v(n) ≠ 0), the optimal learning rate is

μopt = E[ |y(n) − ŷ(n)|² ] / E[ |e(n)|² ].

The results above assume that the signals v(n) and x(n) are uncorrelated to each other, which is generally the case in practice.

PROOF:


Let the filter misalignment be defined as Λ(n) = |h(n) − ŵ(n)|². The expected misalignment for the next sample, E[Λ(n+1)], can be expanded as a function of μ; assuming that the noise and the input are independent, the optimal learning rate is found where dE[Λ(n+1)]/dμ = 0, which leads to the expression for μopt given above.


MEAN SQUARED ERROR:

In statistics, the mean square error or MSE of an estimator is one of many ways to

quantify the difference between an estimator and the true value of the quantity being estimated.

MSE is a risk function, corresponding to the expected value of the squared error loss or quadratic

loss. MSE measures the average of the square of the "error." The error is the amount by which

the estimator differs from the quantity to be estimated. The difference occurs because of

randomness or because the estimator doesn't account for information that could produce a more

accurate estimate.

The MSE is the second moment (about the origin) of the error, and thus incorporates both

the variance of the estimator and its bias. For an unbiased estimator, the MSE is the variance.

Like the variance, MSE has the same unit of measurement as the square of the quantity being

estimated. In an analogy to standard deviation, taking the square root of MSE yields the root

mean squared error or RMSE, which has the same units as the quantity being estimated; for an

unbiased estimator, the RMSE is the square root of the variance, known as the standard error.

DEFINITION AND BASIC PROPERTIES:

The MSE of an estimator θ̂ with respect to the estimated parameter θ is defined as

MSE(θ̂) = E[ (θ̂ − θ)² ].

The MSE is equal to the sum of the variance and the squared bias of the estimator:

MSE(θ̂) = Var(θ̂) + ( Bias(θ̂, θ) )².


The MSE thus assesses the quality of an estimator in terms of its variation and

unbiasedness. Note that the MSE is not equivalent to the expected value of the absolute error.

Since MSE is an expectation, it is a scalar, and not a random variable. It may be a

function of the unknown parameter θ, but it does not depend on any random quantities. However,

when MSE is computed for a particular estimator of θ the true value of which is not known, it

will be subject to estimation error. In a Bayesian sense, this means that there are cases in which it

may be treated as a random variable.

ALTERNATIVE USAGE:

The term mean squared error is sometimes used to refer to residual sum of squares,

divided by the number of observations. This is an observed quantity, whereas the definition

above is a function of an unknown parameter. For more details, see errors and residuals in

statistics.

EXAMPLES:

Suppose we have a random sample of size n from an identically distributed population, X1, …, Xn.

Some commonly used estimators of the true parameters of the population, μ and σ², are shown in the following table (see notes for distribution requirements for the MSEs in the table related to variance estimators).

True value: θ = μ. Estimator: the unbiased estimator of the population mean, X̄ = (1/n)·Σ Xi. Mean squared error: MSE(X̄) = σ²/n.

True value: θ = σ². Estimator: the unbiased estimator of the population variance, S²(n−1) = (1/(n−1))·Σ (Xi − X̄)². Mean squared error: MSE(S²(n−1)) = 2σ⁴/(n−1).

True value: θ = σ². Estimator: the biased estimator of the population variance, S²(n) = (1/n)·Σ (Xi − X̄)². Mean squared error: MSE(S²(n)) = (2n−1)σ⁴/n².

True value: θ = σ². Estimator: the biased estimator of the population variance, S²(n+1) = (1/(n+1))·Σ (Xi − X̄)². Mean squared error: MSE(S²(n+1)) = 2σ⁴/(n+1).

1. The MSEs shown for the variance estimators assume the Xi are i.i.d. normal, so that (n−1)·S²(n−1)/σ² follows a chi-squared distribution with n−1 degrees of freedom. The result for S²(n−1) follows easily from this chi-squared variance, which is 2n − 2.

2. The general MSE expression for the unbiased variance estimator, without distribution assumptions, is MSE(S²(n−1)) = (1/n)·( μ4 − σ⁴·(n−3)/(n−1) ), where μ4 is the fourth central moment.[3]


3. Unbiased estimators may not produce estimates with the smallest total variation (as measured by MSE): the MSE of S²(n−1) is larger than that of S²(n) or S²(n+1).

4. Estimators with the smallest total variation may produce biased estimates: S²(n+1) typically underestimates σ², by 2σ²/(n+1) in expectation.
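The variance-estimator rows of the table are straightforward to confirm by simulation; a Monte Carlo sketch (sample size, variance and trial count are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    n, sigma2, trials = 10, 4.0, 200_000
    X = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
    ss = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

    for divisor in (n - 1, n, n + 1):
        est = ss / divisor
        print(divisor, np.mean((est - sigma2) ** 2))    # empirical MSE

    # Expected: 2*sigma2**2/(n-1), (2n-1)*sigma2**2/n**2, 2*sigma2**2/(n+1),
    # i.e. about 3.56, 3.04 and 2.91 -- the MSE falls as the divisor grows.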

INTERPRETATION:

An MSE of zero, meaning that the estimator predicts observations of the parameter θ

with perfect accuracy, is the ideal and forms the basis for the least squares method of regression

analysis.

While particular values of MSE other than zero are meaningless in and of themselves,

they may be used for comparative purposes. Two or more statistical models may be compared

using their MSEs as a measure of how well they explain a given set of observations: The

unbiased model with the smallest MSE is generally interpreted as best explaining the variability

in the observations.

Statistical modelling techniques such as linear regression and analysis of variance estimate the MSE as part

of the analysis and use the estimated MSE to determine the statistical significance of the factors

or predictors under study. The goal of experimental design is to construct experiments in such a

way that when the observations are analyzed, the MSE is close to zero relative to the magnitude

of at least one of the estimated treatment effects.

MSE is also used in several stepwise regression techniques as part of the determination as

to how many predictors from a candidate set to include in a model for a given set of

observations.

APPLICATIONS:

Minimizing MSE is a key criterion in selecting estimators. Among unbiased estimators, minimizing the MSE is equivalent to minimizing the variance, and the minimum is attained by the minimum-variance unbiased estimator (MVUE). However, a biased estimator may have a lower MSE; see estimator bias.


In statistical modelling, the MSE is computed from the differences between the actual observations and the responses predicted by the model, and is used to determine whether the model fits the data or whether it can be simplified by removing terms.

NOISE:

In common use, the word noise means any unwanted sound. In both analog and digital

electronics, noise is an unwanted perturbation to a wanted signal; it is called noise as a

generalisation of the audible noise heard when listening to a weak radio transmission. Signal

noise is heard as acoustic noise if played through a loudspeaker; it manifests as 'snow' on a

television or video image. In signal processing or computing it can be considered unwanted data

without meaning; that is, data that is not being used to transmit a signal, but is simply produced

as an unwanted by-product of other activities. In Information Theory, however, noise is still

considered to be information. In a broader sense, film grain or even advertisements encountered

while looking for something else can be considered noise. In biology, noise can describe the

variability of a measurement around the mean, for example transcriptional noise describes the

variability in gene activity between cells in a population.

Noise can block, distort, change or interfere with the meaning of a message in both

human and electronic communication.

In many of these areas, the special case of thermal noise arises, which sets a fundamental

lower limit to what can be measured or signaled and is related to basic physical processes at the

molecular level described by well-established thermodynamics considerations, some of which

are expressible by simple formulae.

GAUSSIAN NOISE:


Gaussian noise is statistical noise that has a probability density function (abbreviated pdf)

of the normal distribution (also known as Gaussian distribution). In other words, the values that

the noise can take on are Gaussian-distributed. It is most commonly used as additive white noise

to yield additive white Gaussian noise (AWGN).

ADDITIVE WHITE GAUSSIAN NOISE (AWGN):

Additive white Gaussian noise (AWGN) is a channel model in which the only

impairment to communication is a linear addition of wideband or white noise with a constant

spectral density (expressed as watts per hertz of bandwidth) and a Gaussian distribution of

amplitude. The model does not account for fading, frequency selectivity, interference,

nonlinearity or dispersion. However, it produces simple and tractable mathematical models, which are useful for gaining insight into the underlying behavior of a system before these

other phenomena are considered.

Wideband Gaussian noise comes from many natural sources, such as the thermal

vibrations of atoms in conductors (referred to as thermal noise or Johnson-Nyquist noise), shot

noise, black body radiation from the earth and other warm objects, and from celestial sources

such as the Sun.

The AWGN channel is a good model for many satellite and deep space communication

links. It is not a good model for most terrestrial links because of multipath, terrain blocking,

interference, etc. However, for terrestrial path modeling, AWGN is commonly used to simulate

background noise of the channel under study, in addition to multipath, terrain blocking,

interference, ground clutter and self interference that modern radio systems encounter in

terrestrial operation.


LINEAR FREQUENCY MODULATION (LFM):

So far we have seen signals that do not change in frequency over time. How do we modify a signal to obtain a time-varying frequency?

A chirp signal is one that sweeps linearly from a low to a high frequency.

Can we create such a signal by concatenating small sequences, each with a frequency that

is higher than the last?

This approach will likely lead to problems lining up the phase of each segment so that

discontinuities aren't introduced in the resulting waveform (as seen below).

Figure : A signal made by concatenating sinusoids of different frequencies will result in

discontinuities if care is not taken to match the initial phase.
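To make the phase-continuity point concrete, the following sketch (our own illustration in Python/NumPy, not code from the report) contrasts naive concatenation of fixed-frequency segments with a phase-continuous sweep obtained by accumulating the instantaneous phase:

```python
import numpy as np

fs = 8000.0                               # sampling rate (Hz), arbitrary example
seg = int(0.01 * fs)                      # 10 ms segments
freqs = [200.0, 400.0, 600.0, 800.0]      # stepped frequencies (Hz)
t_seg = np.arange(seg) / fs

# Naive concatenation: each segment restarts at phase 0, so the
# waveform generally jumps at the segment boundaries.
naive = np.concatenate([np.sin(2 * np.pi * f * t_seg) for f in freqs])

# Phase-continuous construction: integrate the instantaneous frequency
# so the phase (and hence the waveform) is continuous everywhere.
f_inst = np.repeat(freqs, seg)            # piecewise-constant frequency
phase = 2 * np.pi * np.cumsum(f_inst) / fs
smooth = np.sin(phase)
```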

THE PROPOSED ALGORITHM:

In the conventional LMS algorithm, the weight vector coefficients w(n) for the FIR filter are updated according to the formula

w(n) = w(n−1) + μ e(n) y(n)   (1)

where w(n) = [w0(n) w1(n) … wM(n)] (M+1 being the filter length), μ is the convergence parameter (sometimes referred to as the step size), e(n) = d(n) − z(n) is the output error (z(n) being the filter output), and d(n) is the reference signal. Note that z(n) = w(n−1) yᵀ(n) = x̂(n), where x̂(n) is the estimate of the original signal and y(n) = [y(n) y(n−1) … y(n−M)] is the input signal to the filter.


For the algorithm to be useful for a range of FM signals with different bandwidths (including single-tone sinusoids), we first specify the centre frequency fm of the spectrum of interest. The conventional LMS algorithm is then used (with a single tone of frequency fm) to find an optimal value of μ at that frequency. This optimal value μ0 is used to update the time-varying convergence parameter μn according to the following formula:

μn = αn · μ0   (2)

where αn is a decaying factor. We will consider the following decaying law:

αn = C^(1/(1 + a·nᵇ))   (3)

where C, a, and b are positive constants that determine the magnitude and the rate of decrease of αn. According to the above law, C has to be a positive number larger than 1. When C = 1, αn is equal to 1 for all n and the new algorithm reduces to the conventional LMS algorithm. A summary of the time-varying LMS algorithm is shown below:

z(n) = w(n−1) yᵀ(n)   (4)

e(n) = d(n) − z(n)   (5)

αn = C^(1/(1 + a·nᵇ))   (6)

μn = αn · μ0   (7)

w(n) = w(n−1) + μn e(n) y(n)   (8)
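The update equations (4)–(8) map directly onto code. Below is a minimal sketch in Python/NumPy (our own illustration; function and parameter names are not from the report) of the time-varying LMS filter, assuming the power-decay law of equation (3):

```python
import numpy as np

def time_varying_lms(d, y, M=100, mu0=4e-4, C=5.0, a=0.01, b=0.7):
    """Time-varying LMS following equations (4)-(8).

    d : reference signal d(n)
    y : filter input signal
    M : filter order (M+1 taps)
    Returns (x_hat, e): filter output and error sequence.
    """
    N = len(y)
    w = np.zeros(M + 1)                        # tap weight vector w(n)
    x_hat = np.zeros(N)
    e = np.zeros(N)
    for n in range(M, N):
        yv = y[n - M:n + 1][::-1]              # y(n) = [y(n) y(n-1) ... y(n-M)]
        x_hat[n] = w @ yv                      # z(n) = w(n-1) y^T(n)        (4)
        e[n] = d[n] - x_hat[n]                 # e(n) = d(n) - z(n)          (5)
        alpha = C ** (1.0 / (1.0 + a * n**b))  # decaying factor alpha_n     (6)
        mu = alpha * mu0                       # mu_n = alpha_n * mu_0       (7)
        w = w + mu * e[n] * yv                 # weight update               (8)
    return x_hat, e
```

Setting C = 1 makes αn equal to 1 for every n, so the sketch then reduces to the conventional fixed-step LMS algorithm, consistent with the remark above.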


Fig. 1. Spectrum for LFM narrowband signal with fo = 100 Hz, Bandwidth = 100 Hz and

Ts = 0.001.

ADAPTIVE FILTERS:

Adaptive filtering techniques can be used to reduce unwanted echoes, thus increasing communication quality. These echoes can be very annoying to callers. A widely used technique to suppress echoes is to employ adaptive echo cancellers.

ADAPTIVE ECHO CANCELLERS:

A technique to remove or cancel echoes is shown in Figure. The echo canceller mimics

the transfer function of the echo path (or room acoustic) to synthesize a replica of the echo, and

then subtracts that replica from the combined echo and near-end speech (or disturbance) signal to

obtain the near end signal alone. However, the transfer function is unknown in practice, and so it


must be identified. The solution to this problem is to use an adaptive filter; the method used to cancel the echo signal is known as adaptive filtering.

Adaptive filters are dynamic filters which iteratively alter their characteristics in order to

achieve an optimal desired output. An adaptive filter algorithmically alters its parameters in

order to minimize a function of the difference between the desired output d(n) and its actual output y(n). This function is known as the cost function of the adaptive algorithm. Figure shows a block diagram of the adaptive echo cancellation system implemented in this project. Here the filter H(n) represents the impulse response of the acoustic environment, and W(n) represents

the adaptive filter used to cancel the echo signal. The adaptive filter aims to equate its output

y(n) to the desired output d(n) (the signal reverberated within the acoustic environment). At each

iteration the error signal, e(n)=d(n)-y(n), is fed back into the filter, where the filter characteristics

are altered accordingly.

Block diagram of Adaptive Echo Canceller
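As a concrete (hypothetical) illustration of this block diagram, the sketch below simulates an echo path with a made-up impulse response h and drives the adaptive canceller, reusing the time_varying_lms sketch given earlier; all names and values are our own examples:

```python
import numpy as np

# Hypothetical echo-cancellation wiring: the far-end signal x excites an
# unknown echo path h, and the adaptive filter tries to reproduce the echo.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)                  # far-end signal (white, for demo)
h = np.array([0.0, 0.5, 0.3, -0.2, 0.1])       # made-up room impulse response
echo = np.convolve(x, h)[:len(x)]              # echo picked up by the microphone
d = echo + 0.01 * rng.standard_normal(len(x))  # plus a small near-end disturbance

# W(n) adapts so its output y(n) matches d(n); e(n) = d(n) - y(n) is the
# echo-free residual that would be sent back to the far end.
y_hat, e = time_varying_lms(d, x, M=15, mu0=1e-3, C=5.0, a=0.01, b=0.7)
```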

CHOICE OF ALGORITHM:

A wide variety of recursive algorithms have been developed in the literature for the operation of linear adaptive filters. In the final analysis, the choice of one algorithm over another is determined by one or more of the following factors:

RATE OF CONVERGENCE:


This is defined as the number of iterations required for the algorithm, in response to stationary inputs, to converge “close enough” to the optimum Wiener solution in the mean-square

error sense. A fast rate of convergence allows the algorithm to adapt rapidly to a stationary

environment of unknown statistics.

MISADJUSTMENT:

For an algorithm of interest, this parameter provides a quantitative measure of the amount by which the final value of the mean-square error, averaged over an ensemble of adaptive filters,

deviates from the minimum mean-square error produced by the Wiener filter.

TRACKING:

When an adaptive filtering algorithm operates in a non-stationary environment, the algorithm is required to track statistical variations in the environment. Two contradictory features, however, influence the tracking performance of the algorithm:

(1) Rate of convergence, and

(2) Steady-state fluctuation due to algorithm noise.

ROBUSTNESS:

For an adaptive filter to be robust, small disturbances (i.e., disturbances with small

energy) can only result in small estimation errors. The disturbances may arise from a variety of

factors, internal or external to the filter.

COMPUTATIONAL REQUIREMENTS:

Here the issues of concern include

(a) The number of operations (i.e., multiplications, divisions, and additions/subtractions) required to make one complete iteration of the algorithm;
(b) The size of the memory locations required to store the data and the program; and


(c) The investment required to program the algorithm on a computer.

APPROACH TO DEVELOP A LINEAR ADAPTIVE FILTER:

STOCHASTIC GRADIENT APPROACH:

The stochastic gradient approach uses a tapped-delay line, or transversal filter, as the

structural basis for implementing the linear adaptive filter. For the case of stationary inputs, the

cost function, also referred to as the index of performance, is defined as the mean square error

(i.e., the mean square value of the difference between the desired response and the transversal

filter output). This cost function is precisely a second order function of the tap weights in the

transversal filter.

To develop a recursive algorithm for updating the tap weights of the adaptive transversal filter, we proceed in two stages. First, we use an iterative procedure to solve the Wiener-Hopf equations (i.e., the matrix equation defining the optimum Wiener solution); the iterative procedure is based on the method of steepest descent, which is a well-known technique in optimization theory. This method requires the use of a gradient vector, the value of which depends on two parameters: the correlation matrix of the tap inputs in the transversal filter and the cross-correlation vector between the desired response and the same tap inputs. Next, we use instantaneous values for these correlations, so as to derive an estimate for the gradient vector, making it assume a stochastic character in general.

The resulting algorithm is widely known as the least mean square (LMS) algorithm, the essence of which, for the case of a transversal filter operating on real-valued data, may be described as

w(n+1) = w(n) + 2μ e(n) x(n)

where the error signal e(n) is defined as the difference between some desired response and the actual response of the transversal filter produced by the tap input vector.

LEAST MEAN SQUARE (LMS) ALGORITHM:

The Least Mean Square (LMS) algorithm was first developed by Widrow and Hoff in 1959 through their studies of pattern recognition. From there it has become one of the most widely used algorithms in adaptive filtering. The LMS algorithm is an important member of the family of stochastic gradient-based algorithms, as it utilizes the gradient vector of the filter tap weights to converge on the optimal Wiener solution. It is well known and widely used due to its computational simplicity. It is this simplicity that has made it the benchmark against which all other adaptive filtering algorithms are judged.

The LMS algorithm is a linear adaptive filter algorithm, which in general consists of two

basic processes.

1. A filter process: This involves

a. Computing the output of a linear filter in response to an input signal.

b. Generating an estimation error by comparing this output with a desired response.

2. An adaptive process, which involves the automatic adjustment of the parameters of the filter in accordance with the estimation error.


The combination of these two processes working together constitutes a feedback loop.

First, we have a transversal filter, around which the LMS algorithm is built. This component is

responsible for performing the filtering process. Second, we have a mechanism for performing

the adaptive control process on the tap weights of the transversal filter. With each iteration of the

LMS algorithm, the filter tap weights of the adaptive filter are updated according to the

following formula (Farhang-Boroujeny 1999).

w(n+1) = w(n) + 2μ e(n) x(n)

Here x(n) is the input vector of time-delayed input values,

x(n) = [x(n) x(n−1) x(n−2) … x(n−N+1)]

and the vector w(n) represents the coefficients of the adaptive FIR filter tap weight vector at time n. The parameter μ is known as the step-size parameter and is a small positive constant. This step-size parameter controls the influence of the updating factor.

Selection of a suitable value for μ is imperative to the performance of the LMS algorithm: if the value is too small, the time the adaptive filter takes to converge on the optimal solution will be too long; if μ is too large, the adaptive filter becomes unstable and its output diverges. The LMS algorithm is the simplest to implement and is stable when the step-size parameter is selected appropriately. This requires prior knowledge of the input signal statistics, which is not feasible for the echo cancellation system.
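A common sufficient condition for LMS stability (a standard result, not from this report) bounds the step size by the total tap-input power, N·E[x²]; the sketch below derives a conservative μ from an estimate of that power:

```python
import numpy as np

def safe_step_size(x, N, margin=0.01):
    """Conservative LMS step size.

    Uses the standard sufficient condition 0 < mu < 2 / (N * E[x^2]),
    where N * E[x^2] approximates the total tap-input power (the trace
    of the input correlation matrix), then backs off by `margin`.
    """
    power = np.mean(x ** 2)            # estimate of E[x^2]
    return margin * 2.0 / (N * power)

# Example: white input of unit power, 100-tap filter -> mu ~= 2e-4,
# the same order of magnitude as the values reported in the simulations.
x = np.random.default_rng(1).standard_normal(10000)
mu = safe_step_size(x, N=100)
```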

DERIVATION OF THE LMS ALGORITHM:

The derivation of the LMS algorithm builds upon the theory of the Wiener solution for the optimal filter tap weights, w0. It also depends on the steepest-descent algorithm, a formula which updates the filter coefficients using the current tap weight vector and the current gradient of the cost function with respect to the filter tap weight coefficient vector, ∇ξ(n):

w(n+1) = w(n) − μ ∇ξ(n)


As the negative gradient vector points in the direction of steepest descent for the N-

dimensional quadratic cost function, each recursion shifts the value of the filter coefficients

closer toward their optimum value, which corresponds to the minimum achievable value of the

cost function, ξ(n). The LMS algorithm is a random-process implementation of the steepest descent algorithm. Here the expectation of the squared error is not known, so the instantaneous value ξ̂(n) = e²(n) is used as an estimate. The steepest descent algorithm then becomes

w(n+1) = w(n) − μ ∇ξ̂(n)

The gradient of the cost function, ∇ξ̂(n), can alternatively be expressed in the following form:

∇ξ̂(n) = −2 e(n) x(n)

Substituting this into the steepest descent recursion above, we arrive at the recursion for the LMS adaptive algorithm:

w(n+1) = w(n) + 2μ e(n) x(n)


IMPLEMENTATION OF THE LMS ALGORITHM:

Each iteration of the LMS algorithm requires three distinct steps, in this order:

1. The output of the FIR filter, y(n), is calculated from the current tap weights and the input vector.

2. The value of the estimation error, e(n) = d(n) − y(n), is calculated.

3. The tap weights of the FIR vector are updated in preparation for the next iteration.

The main reason for the LMS algorithm's popularity in adaptive filtering is its computational simplicity, which makes it easier to implement than all other commonly used adaptive algorithms. For each iteration the LMS algorithm requires 2N additions and 2N+1 multiplications (N for calculating the output y(n), one for 2μe(n), and an additional N for the scalar-by-vector multiplication).
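The three steps map onto a few lines per iteration. The sketch below (our own illustration, not code from the report) follows the 2μ convention of the update formula above, with comments tallying the multiplication count:

```python
import numpy as np

def lms_iteration(w, x_vec, d_n, mu):
    """One LMS iteration following the three steps above.

    w     : current tap weight vector (length N)
    x_vec : current input vector [x(n) ... x(n-N+1)]
    d_n   : desired response d(n)
    mu    : step-size parameter
    """
    y_n = w @ x_vec                      # step 1: filter output (N multiplications)
    e_n = d_n - y_n                      # step 2: estimation error
    w_new = w + (2 * mu * e_n) * x_vec   # step 3: update (1 + N multiplications)
    return w_new, y_n, e_n
```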

SIMULATION RESULTS:


In this section, we shall present simulation results to evaluate the performance of the proposed algorithm using Matlab. In this simulation, the input signal for both algorithms has the form y(t) = x(t) + n(t), where n(t) is white Gaussian noise (SNR = 1 dB) and x(t) is the original signal, assumed to be a finite-length LFM signal of the form

x(t) = cos(ω0·t + α·t²/2),  0 ≤ t ≤ T   (9)

where ω0 = 2π·fo is a constant (the initial frequency, taken here as 100 Hz), T is the signal duration, and α is the modulation index, which determines the bandwidth of the LFM signal.
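For reference, a noisy test signal of this form can be generated as follows (an illustrative sketch; the modulation-index value is our own example, chosen to sweep roughly 100 Hz over the signal duration, as in Fig. 1):

```python
import numpy as np

Ts = 0.001                       # sampling period (s), as in the simulations
T = 1.0                          # signal duration (s), example value
t = np.arange(0, T, Ts)
f0 = 100.0                       # initial frequency (Hz)
alpha = 2 * np.pi * 100.0 / T    # modulation index: ~100 Hz sweep over T (example)

x = np.cos(2 * np.pi * f0 * t + 0.5 * alpha * t**2)   # LFM signal, eq. (9)
snr_db = 1.0                                          # SNR used in the simulations
noise_power = np.mean(x**2) / 10**(snr_db / 10)
n = np.sqrt(noise_power) * np.random.default_rng(2).standard_normal(len(t))
y = x + n                                             # input to the adaptive filter
```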

Fig. (1) shows the spectrum for an LFM narrowband signal that will be used in the simulation, where fm is its mean frequency. The bandwidth BW of this LFM signal can be adjusted by varying the parameter α. Increasing α will increase the signal bandwidth, as can be shown numerically using the relationships

fm = (1/2π) · ( ∫0∞ ω |X(ω)|² dω ) / ( ∫0∞ |X(ω)|² dω )   (10)

BW = (1/2π) · [ ( ∫0∞ (ω − 2π·fm)² |X(ω)|² dω ) / ( ∫0∞ |X(ω)|² dω ) ]^(1/2)   (11)

where X(ω) is the Fourier transform of x(t).
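These spectral moments are straightforward to evaluate numerically. The sketch below approximates them with an FFT, assuming equations (10) and (11) are the usual mean-frequency and RMS-bandwidth definitions:

```python
import numpy as np

def spectral_moments(x, Ts):
    """Approximate mean frequency fm, eq. (10), and RMS bandwidth BW, eq. (11)."""
    X = np.fft.rfft(x)                     # one-sided spectrum (0 .. fs/2)
    f = np.fft.rfftfreq(len(x), d=Ts)      # frequency axis in Hz
    p = np.abs(X) ** 2                     # |X(w)|^2
    fm = np.sum(f * p) / np.sum(p)         # first moment: mean frequency
    bw = np.sqrt(np.sum((f - fm) ** 2 * p) / np.sum(p))  # RMS bandwidth
    return fm, bw

# Example with the LFM test signal generated earlier:
# fm, bw = spectral_moments(x, Ts)   # expect fm near 150 Hz for a 100-200 Hz sweep
```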

The mean squared error (MSE) for each convergence parameter is calculated as follows:

MSE = (1/N) · Σ(n=0..N) [x(n) − x̂(n)]²   (12)

where x(n) is the original signal and x̂ (n) is the filter output, which represents an

estimate of the input signal.
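Equation (12) is a one-line computation; for example, applied to the output of the time_varying_lms sketch given earlier:

```python
import numpy as np

def mse(x, x_hat):
    """Mean squared error between the original signal and its estimate, eq. (12)."""
    return np.mean((x - x_hat) ** 2)
```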

Fig. (2) shows the MSE for different LFM signals using the conventional and the time-varying LMS algorithms. The performance of the conventional LMS algorithm varies depending

on the LFM signal bandwidth. However, the optimal value for μ0 in the case of LFM is still

located in the lower region, within the range of 0.0001 to 0.0005. As a result, if the algorithm is


to be used with LFM signals with the same centre frequency (but probably with different

bandwidths), we should choose μ0 from this range for the time-varying LMS algorithm.

Fig. 2. MSE for conventional LMS algorithm (filter order = 100, Ts= 0.001, SNR = 1 dB, fo = 100 Hz).


Fig. 3. The effect of parameter C on MSE for time-varying LMS algorithm (filter order = 100, Ts = 0.001, SNR = 1 dB, fo = 100 Hz, LFM bandwidth = 50 Hz).

Fig. (3) shows the relationship between C and μ0, using a 100th-order filter, for the time-varying LMS algorithm used for noise reduction in an LFM narrowband signal of 50 Hz bandwidth. Note that Fig. (3) can be divided into two regions: a small-μ region (the optimal region, ranging from μ = 0.0002 to 0.0006) and a large-μ region (μ larger than 0.006). In the optimal region, the time-varying LMS algorithm always provides an MSE close to that achieved by the conventional LMS algorithm. The parameter C affects not only the MSE but also the convergence time, as shown in Fig. (5), Fig. (9) and Fig. (10). Fig. (3) also shows that the time-varying LMS algorithm works better when the convergence parameter is smaller than the optimal μ for the conventional LMS algorithm.

Fig. (4) shows the performance of the conventional LMS algorithm and the time-varying LMS algorithm for an LFM narrowband signal with a bandwidth of 50 Hz and a single-tone signal of 125 Hz (close to the mean frequency of the LFM signal). Fig. (4) shows that the optimal μ for the two


signals are different. The optimal μ for a single-tone sinusoid is in the range 0.15×10⁻³ to 0.2×10⁻³, and for an LFM narrowband signal of 50 Hz bandwidth it is around 0.4×10⁻³.

Fig. (5) shows the estimation curve when the time-varying LMS algorithm is used for

noise reduction in narrowband FM signals. The curve in Fig. (5) is the estimated output error

e(n) from equation (5). In general, the time-varying LMS algorithm provides faster

convergence than the conventional LMS algorithm (C = 1). Fig. (5) also shows the effect of the

parameter C as a convergence-controlling factor: a larger C provides faster convergence.

Fig. (6), Fig. (7), and Fig. (8) show the mean-squared error versus the number of samples

N for LFM signals with different bandwidths and different values of C. These figures show that

the time-varying LMS algorithm provides a lower MSE. It can be concluded that the time-varying

LMS algorithm provides better MSE performance for a larger bandwidth.

Fig. (9) and Fig. (10) show the convergence of the time-varying LMS algorithm to a limit of MSE = 0.05 using an LFM signal with a bandwidth of 50 Hz and to MSE = 0.08 for another

LFM signal with a bandwidth of 100 Hz. Fig. (9) and Fig. (10) also show that the time-varying

LMS algorithm provides faster convergence for larger C.


Fig. 4. MSE performance comparison for narrowband LFM of 50 Hz and single-tone at 125 Hz for both the conventional LMS and time-varying LMS algorithms (filter order = 100, Ts = 0.001, SNR = 1 dB).


Fig. 5. The effect of parameter C on the estimation error curve using the time-varying LMS for noise reduction in narrowband signals (filter order = 10, μ0 = 0.001, SNR = 2 dB, fo = 100 Hz, and LFM bandwidth = 50 Hz).

Fig. 6. MSE vs. number of samples for different C values (filter order = 100, μ0 = 0.0002, SNR = 1 dB, single-tone at 125 Hz, a = 0.01, and b = 0.7).


Fig. 7. MSE vs. number of samples for different values of C (filter order = 100, μ0 = 0.0002, SNR = 1 dB, fo = 100 Hz, LFM bandwidth = 50 Hz, a = 0.01, b = 0.7).


CONCLUSIONS:

A new structure for the LMS algorithm with a decaying, time-varying law for the

convergence parameter has been proposed. In a stationary white Gaussian noise environment,

simulations show that the time-varying LMS algorithm provides faster convergence than the conventional LMS algorithm and a smaller mean-squared error (MSE), close to the optimal value. The algorithm is based on selecting the optimal value of the convergence parameter using a single-tone sinusoid with a frequency that equals the centre frequency of the expected LFM signals, assuming they are narrowband. The best decay-controlling factor is

bandwidth-dependent. Further study of different decay laws is needed to extend the algorithm to

deal with non-linear FM signals.