
Order-Statistic Filtering and Smoothing of Time-Series: Part II

Kenneth E. Barner† and Gonzalo R. Arce‡

† Applied Science and Engineering Laboratories, Department of Electrical Engineering, University of Delaware, Newark, Delaware. Email: barner@udel.edu

‡ Department of Electrical Engineering, University of Delaware, Newark, Delaware

Abstract

This is the second paper of a two-part tutorial on the fundamentals of univariate time-series filtering using order statistics, where both temporal and rank orderings are considered jointly. This second paper focuses on order-statistic selection filters, where the filter output is restricted to be one of the input samples. In particular, we treat the class of Weighted Order Statistic (WOS) filters and the more general class of Permutation Weighted Order Statistic (PWOS) filters. By combining temporal- and rank-order based weighting with order-statistic selection, detail- and edge-preserving filters that are robust to outliers and sample contamination can be constructed. Like their weighted sum counterparts, these selection filters can be applied to the smoothing, filtering, and forecasting of time-series. Furthermore, selection filters can be optimized as a function of the underlying signal statistics. While the weighted sum filter optimization is formulated under the Mean Squared Error, the selection filters use the more robust Mean Absolute Error (MAE) criterion. This MAE optimization and the selection-based estimates result in a robust class of filters that has advantages over the weighted sum counterparts in many applications. The evolution of these filters is covered, and illustrative examples are given demonstrating the properties and performance of this class of estimators.

Invited paper to appear in the Handbook of Statistics: Order Statistics and Their Applications, C. R. Rao and N. Balakrishnan, Editors.


Contents

1 Introduction
2 The Median Filter
    2.1 The Running Median Filter
    2.2 Statistical Properties
    2.3 Deterministic Properties
    2.4 Median Filtering and Threshold Decomposition
3 Weighted Median Filters
    3.1 Center Weighted Median Filters
    3.2 Weighted Median Filters
    3.3 Weighted Order Statistic Filters
    3.4 Threshold Decomposition and Logic
4 Time-Rank Coupling Extensions: PWOS Filters

    4.1 $L^1\ell$ PWOS Filters

    4.2 $L^j\ell$ PWOS Filters
    4.3 PWOS Filter Lattices
    4.4 Model Order (Complexity) Reduction
5 Optimization Techniques
    5.1 Problem Formulation
    5.2 Algorithm I
    5.3 The Steepest Descent and LMS Algorithms
    5.4 Algorithm II
6 Applications to Image Restoration
7 Conclusion


1 Introduction

Time-series data occur naturally in numerous fields of study, including economics, engineering, medicine, and many social fields. These time-series must often be processed, or filtered, to extract some information of interest. Traditionally, this filtering has been linear. Certainly, linear filters have a sound theoretical basis and have been extensively studied. Unfortunately, linear filters suffer from poor performance in many applications. Among the signals on which linear filters perform poorly are those with changing levels and corrupting noise that is either heavy tailed or signal dependent. This poor performance has led to the investigation of nonlinear filtering methodologies.

The design of nonlinear filters can follow many approaches, since there is no single underlying theory of nonlinear filters. Thus, nonlinear filters range from simple ad hoc methods designed to tackle a single problem to increasingly theoretically founded approaches that are more widely applicable. One nonlinear filtering approach that has received considerable attention, and for which much theoretical study has been conducted, is that based on rank-order. Indeed, much attention has been paid to rank-order filters since the running median filter was first applied to the smoothing of time-series by Tukey in 1974.

The rank-ordering of samples allows the design of filter structures that (a) are robust in environments where the assumed statistics deviate from Gaussian models and are possibly contaminated with outliers, and (b) track signal discontinuities without introducing the transient or blurring artifacts that linear filters do. Filter classes that operate on rank-order information can be broadly divided into two categories according to how the estimate is formed: weighted sum and selection type. Weighted sum filters form estimates by weighting the input samples, often as a function of temporal- and rank-order, and then summing the weighted samples. Such filters were discussed in Part I of this paper. Selection type filters take a different approach, restricting the output to be one of the input samples. As in the weighted sum case, the input samples can be weighted to reflect importance, but the filter output must be one of the observation samples.

Selection rank-order filters have advantages over their weighted sum counterparts in many applications. This is particularly true for signals with numerous edges, such as images or biomedical signals, where the measured process can change states abruptly. Weighted sum based filters tend to blur the edges of such signals, even if their weights are a function of temporal- and rank-order. In images, accurate tracking of edges is vital due to the nonlinear nature of the human visual system.

Selection type filters have considerable advantages in edge tracking as compared to weighted sum filters. To illustrate this and motivate the selection approach, consider the raster-scan order filtering of an image corrupted by impulsive noise. A common approach to limiting the effect of impulsive outliers is trimming. In a weighted sum approach, this leads to the $\alpha$-trimmed mean. The output of this filter at instant $n$ is

$$y(n) = \frac{1}{N - 2\alpha} \sum_{i=\alpha+1}^{N-\alpha} x_{(i)},$$

where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}$ are the $N$ observed samples in rank order. Thus, the $\alpha$-trimmed mean averages over all but the $\alpha$ smallest and $\alpha$ largest samples. If $\alpha = 0$, the sample mean is realized, while for $\alpha = (N-1)/2$ the sample median is realized. A comparable trimming method of the selection type is the center weighted median, which can be expressed as

$$y(n) = \mathrm{MED}[x_{(\alpha+1)}, x(n), x_{(N-\alpha)}].$$

For this filter, the output is identical to the input as long as $x_{(\alpha+1)} \le x(n) \le x_{(N-\alpha)}$. If $x(n)$ is outside this range, the output is trimmed to either $x_{(\alpha+1)}$ or $x_{(N-\alpha)}$.

Figure 1: A single scan line from the image "aerial." The original signal, the corrupted signal, and the running order statistics $x_{(\alpha+1)}$ and $x_{(N-\alpha)}$ are shown. The corruption is additive Laplacian noise. (Plot legend: Original Signal, Noisy Signal, Trimming Order Statistic; horizontal axis 0 to 100, vertical axis 60 to 260.)

To compare the weighted sum and selection approaches, consider the single image scan line shown in Fig. 1. This figure shows the original scan line, the scan line corrupted by impulsive noise, and the running trimming statistics $x_{(\alpha+1)}$ and $x_{(N-\alpha)}$. As the figure shows, these statistics form a band within which the samples are either averaged (weighted sum approach) or the input is passed to the output (selection approach). Figure 2 shows the results of the two filtering operations. While both suppress outliers, the selection approach clearly performs better than the weighted sum approach, which excessively smooths all edges.

Figure 2: The output of the selection and weighted sum filters operating on the corrupted scan line in Fig. 1. (Plot legend: Original Signal, Selection Filter Output, Weighted Sum Filter Output; horizontal axis 0 to 100, vertical axis 60 to 240.)

The advantages of the selection approach can be seen more clearly in the image in Fig. 3, whose upper left quarter is the original "aerial" image. The upper right quarter is the corresponding quarter of the image after corruption by noise, the bottom left quarter is the output of a weighted-sum type order-statistic filter, and the bottom right quarter is the output of a selection-type order-statistic filter. Both filters operate on a raster scan and have a width of seven.
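The contrast between the two trimming rules can be reproduced on a toy signal. The following Python sketch is illustrative only: the function names, the seven-sample window, $\alpha = 2$, the step signal, and the end handling (repeating the first and last samples) are assumptions chosen for the example, not values taken from the figures.

```python
def alpha_trimmed_mean(window, alpha):
    """Weighted sum rule: average what is left after discarding
    the alpha smallest and alpha largest samples."""
    ranked = sorted(window)
    kept = ranked[alpha:len(ranked) - alpha]
    return sum(kept) / len(kept)


def trimming_selection(window, alpha):
    """Selection rule: median of x_(alpha+1), the center sample, and x_(N-alpha)."""
    ranked = sorted(window)
    center = window[len(window) // 2]
    lower = ranked[alpha]                     # x_(alpha+1) with 0-based indexing
    upper = ranked[len(ranked) - alpha - 1]   # x_(N-alpha) with 0-based indexing
    return sorted([lower, center, upper])[1]


def run_filter(signal, width, alpha, rule):
    """Slide an odd-width window over the signal (ends padded by repetition)."""
    half = width // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [rule(padded[n:n + width], alpha) for n in range(len(signal))]


# Hypothetical test signal: an ideal step edge with one impulsive outlier.
step = [100] * 6 + [200] * 6
noisy = list(step)
noisy[3] = 250

selected = run_filter(noisy, 7, 2, trimming_selection)
averaged = run_filter(noisy, 7, 2, alpha_trimmed_mean)
print("selection :", selected)
print("trimmed mean:", [round(v, 1) for v in averaged])
```

On this toy input the selection rule removes the outlier and leaves the step untouched, while the trimmed mean produces intermediate values on the low side of the edge. A single example of this kind is only suggestive; it does not establish the behavior in general.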

This example illustrates that the selection approach to filtering has clear advantages for certain signals. It is this general category of selection order-statistic filters that we cover in this paper. We begin with a brief review of the best known and most thoroughly studied selection order-statistic filter, the median filter. The median filter is also the starting point for many generalizations that have been developed. Therefore, a thorough understanding of the median filter is necessary to fully understand the principles behind the generalizations.

The median filter, as will be shown, possesses many optimality properties. However, the filter offers little flexibility and is temporally blind. That is, all temporal information is lost in the filtering process. Permuting the time ordered observations, for instance, does not alter the filter output. This lack of temporal information causes performance to suffer. As a result, numerous generalizations of the median filter have been introduced that incorporate some form of temporal information.

Temporal information can be incorporated into order-statistic filtering through weighting of the time ordered samples prior to rank ordering. This leads to the classes of Weighted Median filters and Weighted Order Statistic (WOS) filters. Through weighting, certain temporal samples can be emphasized while others are deemphasized. This weighting scheme incorporates temporal information and results in considerable performance gain over temporally blind (strictly rank-order) filters. Still, temporal-order weighting followed by rank ordering decouples the temporal information from the rank information during weighting. Due to this decoupling, these filters use only a fraction of the temporal and rank information contained in the two orderings.

Figure 3: The image "aerial" broken into four quadrants: upper left, original; upper right, noisy; lower left, weighted sum filter output; lower right, selection output.

The full temporal and rank information is represented by the mapping that takes one ordering to the other, $p: \mathbf{x} \mapsto \mathbf{x}_L$, where $\mathbf{x}$ and $\mathbf{x}_L$ are vectors containing the temporally and rank ordered observation samples, respectively. The full permutation mapping information can be utilized by coupling the temporal- and rank-order during weighting. This results in the powerful class of Permutation Weighted Order-Statistic (PWOS) filters. While the performance achieved by using the full permutation information can be impressive, the explosive growth in the parameter set limits the number of samples for which the full information can be used. To combat this problem, an $L^j\ell$ lattice approach to coupling temporal- and rank-order information is used. In the lattice terminology, $\ell$ and $L$ refer to temporal- and rank-order, respectively, and the exponent $j$ governs the amount of rank coupling used. Thus, the amount of temporal and rank coupling is easily controlled. This offers flexibility in performance as well as control over the size of the parameter set. In addition to the lattice approach, we detail alternative methods for reducing the permutation information while retaining performance gains. To effectively utilize these classes of filters, the parameters must be set appropriately for the task at hand. To this end, we present two adaptive optimization techniques. Lastly, numerous examples are given illustrating the performance of the various filters.

2 The Median Filter

The running median filter was the genesis of the broad array of rank-order based filtering techniques that exist today and that continue to be developed. The running median filter was first suggested as a nonlinear smoother for time-series data by Tukey in 1974. Since median filters are the foundation upon which current rank-order based filtering techniques are built, a thorough understanding of the median filter and its properties is crucial to the development and understanding of current techniques. As such, a brief review of the median filter is given in this section. The review includes formal definitions and a survey of the statistical and deterministic properties developed to characterize median filter performance. Also included is a review of threshold decomposition, which was instrumental in developing many of the median filter properties. This review serves as a starting point for the median filter generalizations developed in the following sections.

2.1 The Running Median Filter

To define the running median filter, let $\{x\}$ be a discrete time sequence. The running median passes a window over the sequence $\{x\}$ that selects, at each instant $n$, an odd number of samples to comprise the observation vector $\mathbf{x}(n)$. The observation window is typically symmetric and centered at $n$, resulting in

$$\mathbf{x}(n) = [x(n-N_1), \ldots, x(n), \ldots, x(n+N_1)]^T, \tag{1}$$

where $N_1$ may range in value over the nonnegative integers and $N = 2N_1 + 1$ is the (odd valued) window size. While processing such non-causal observation vectors has traditionally been referred to as smoothing, we loosen the terminology somewhat and refer to the processing of both causal and non-causal observations as simply filtering. The median filter operating on the input sequence $\{x\}$ produces the output sequence $\{y\}$, where at time index $n$

$$y(n) = \mathrm{MED}[\mathbf{x}(n)] = \text{median value of } [x(n-N_1), \ldots, x(n), \ldots, x(n+N_1)].$$

That is, the samples in the observation window are sorted and the middle, or median, value is taken as the output.

The input sequence $\{x\}$ may be either finite or infinite in extent. For the finite case, the samples of $\{x\}$ can be indexed as $x(1), x(2), \ldots, x(L)$, where $L$ is the length of the sequence. Due to the symmetric nature of the observation window, the window extends beyond a finite extent input sequence at both the beginning and end. These end effects are generally accounted for by appending $N_1$ samples at the beginning and end of $\{x\}$. Although the appended samples can be arbitrarily chosen, they are typically selected so that the points appended at the beginning of the sequence have the same value as the first signal point, and the points appended at the end of the sequence all have the value of the last signal point.

Figure 4: The operation of the window width 5 median filter. (Panels: Input and Output, each on a vertical axis from 0 to 5, with the direction of filter motion indicated; appended points are marked separately.)

To illustrate the appending of the input sequence and the median filtering operation, consider the input signal $\{x\}$ of Figure 4. In this example, $\{x\}$ consists of observations from a six-level process, $x(n) \in \{0, 1, \ldots, 5\}$. The figure shows the input sequence and the resulting output sequence for a window size 5 median filter. Note that to account for edge effects, two samples have been appended to both the beginning and end of the sequence. The median filter output at the window location shown in the figure is the median of the five samples inside the window,

$$y(n) = \mathrm{MED}[x(n-2), x(n-1), x(n), x(n+1), x(n+2)].$$
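A minimal sketch of the running median with the end-point appending described above. The function name `running_median` and the input values are hypothetical and are not the data plotted in the figure; the window has size $N = 2N_1 + 1$ and the ends are extended by repeating the first and last samples $N_1$ times.

```python
from statistics import median


def running_median(x, n1):
    """Running median with window size N = 2*n1 + 1.

    The sequence is extended by n1 copies of its first sample at the
    beginning and n1 copies of its last sample at the end, so the output
    has the same length as the input.
    """
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    width = 2 * n1 + 1
    return [median(padded[n:n + width]) for n in range(len(x))]


# Hypothetical six-level input, values in {0, ..., 5}.
x = [1, 1, 4, 3, 3, 5, 0, 2, 2, 2]
y = running_median(x, 2)   # window size 5, two samples appended at each end
print(y)
```

Because the window length is odd, every output value is one of the samples inside the window, which is the defining property of a selection filter.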

The median filtering operation is clearly nonlinear. As such, the median filter does not possess the superposition property. Thus, traditional frequency and impulse response analyses are not applicable. The impulse response of a median filter is, in fact, zero for all time (for any window size $N \ge 3$). Consequently, alternative methods for analyzing and characterizing median filters must be employed. Broadly speaking, two types of analysis have been applied to the characterization of median filters: statistical and deterministic. Statistical properties examine the performance of the median filter, through such measures as optimality and output variance, for the case of white noise time sequences. Conversely, deterministic properties examine the filter output characteristics for specific types of commonly occurring deterministic time sequences. In the following, we review some of the statistical and deterministic properties of running median filters.
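Both claims, the zero impulse response and the failure of superposition, are easy to check numerically. The sketch below assumes a window size 3 median filter with the end samples repeated; the helper name and the test sequences are hypothetical.

```python
from statistics import median


def running_median(x, n1):
    """Running median, window size 2*n1 + 1, ends padded by repetition."""
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    width = 2 * n1 + 1
    return [median(padded[n:n + width]) for n in range(len(x))]


# Impulse response: a single nonzero sample never reaches the output.
impulse = [0, 0, 0, 1, 0, 0, 0]
response = running_median(impulse, 1)

# Superposition: filtering a sum differs from summing the filtered parts.
a = [0, 2, 0, 0, 0]
b = [0, 0, 2, 0, 0]
sum_then_filter = running_median([p + q for p, q in zip(a, b)], 1)
filter_then_sum = [p + q for p, q in zip(running_median(a, 1), running_median(b, 1))]

print(response)          # all zeros
print(sum_then_filter)   # the two-sample pulse survives
print(filter_then_sum)   # each isolated sample was removed before summing
```

Each isolated sample is removed on its own, yet the two adjacent samples together form a structure that the window size 3 filter preserves, so the two results differ.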



2.2 Statistical Properties

The statistical properties of median filters can be examined through the derivation of output distributions and statistical conditions on the optimality of median estimates. These analyses generally assume that the input to the median filter is a constant signal with additive white noise. The assumption that the noise is additive and white is quite natural and is made similarly in the analysis of linear filters. The assumption that the underlying signal is a constant is certainly convenient but, more importantly, often valid. This is especially true for the types of signals to which median filters are most frequently applied, such as images. Signals such as images are characterized by regions of constant value separated by sharp transitions, or edges. Thus, the statistical analysis of a constant region is valid for large portions of these commonly used signals. By calculating the output distribution of the median filter over a constant region, the noise smoothing capabilities of the median can be measured through statistics such as the filter output variance.

The median filter properties covered here are for time-series signals consisting of white noise observation samples with known distribution. Since the observation sequence is probabilistic, the time index can be dropped and attention focused on a single observation vector. In this case, and others for which the time index $n$ can be dropped without confusion, we do so and denote the observation vector as simply $\mathbf{x} = [x_1, x_2, \ldots, x_N]$. Consider first the case where the observation samples are white noise with a double exponential, or Laplacian, distribution. In this case, the common probability density function (pdf) is given by $f_x(t) = \frac{1}{\sqrt{2}\sigma} e^{-\sqrt{2}|t-\mu|/\sigma}$, where $\mu$ and $\sigma^2$ are the mean and variance, respectively. For a vector of samples, the joint pdf is

$$f_{\mathbf{x}}(\mathbf{t}) = \left(\frac{1}{\sqrt{2}\sigma}\right)^{N} e^{-\frac{\sqrt{2}}{\sigma}\sum_{i=1}^{N} |t_i - \mu|}. \tag{2}$$

Given an observation vector $\mathbf{x}$, the Maximum Likelihood (ML) estimate of the mean, or location parameter, is found by maximizing the joint pdf (2) over $\mu$ with $\mathbf{t} = \mathbf{x}$. To simplify the notation, define the distance operator $D_\gamma(\cdot)$ as

$$D_\gamma(\beta) = \sum_{i=1}^{N} |x_i - \beta|^{\gamma}. \tag{3}$$

Since maximizing (2) is equivalent to minimizing the sum in its exponent, the ML estimate of the location, for Laplacian distributed samples, is the value $\beta$ that minimizes $D_\gamma(\beta)$ with $\gamma = 1$. It is easy to show that

$$\mathrm{MED}[x_1, x_2, \ldots, x_N] = \arg\min_{\beta} D_1(\beta).$$

Thus, the median of the samples $x_1, x_2, \ldots, x_N$ is the value $\beta$ that minimizes $D_1(\beta)$ and is, consequently, the ML estimate of location for samples with a Laplacian distribution. As a comparison,

$$\mathrm{MEAN}[x_1, x_2, \ldots, x_N] = \frac{1}{N}\sum_{i=1}^{N} x_i = \arg\min_{\beta} D_2(\beta)$$

is the ML estimate of location for samples with a Gaussian distribution. The median and sample mean are, thus, optimal estimates of location for the Laplacian and Gaussian distributions, respectively. This shows that for heavy tailed distributions, such as the Laplacian, the median has advantages over the linear combination based sample mean. A further examination of $D_1(\cdot)$ and $D_2(\cdot)$ reinforces this point. The median is clearly the least absolute error estimate of the center of the distribution for $x_1, x_2, \ldots, x_N$, while the mean is the least squared error estimate. The reliance on the absolute error criterion means that the median is less influenced by outliers than the squared error based mean.
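The two minimization properties can be checked by brute force. In the sketch below, the distance function $\sum_i |x_i - \beta|^{\gamma}$ is evaluated on a grid of candidate locations $\beta$; the sample values, the grid, and the function name `distance` are assumptions made for the illustration.

```python
from statistics import mean, median


def distance(samples, beta, gamma):
    """Sum over the samples of |x_i - beta| ** gamma."""
    return sum(abs(x - beta) ** gamma for x in samples)


# Hypothetical samples with one large outlier.
samples = [2.0, 3.0, 4.0, 5.0, 100.0]

# Candidate locations 0.0, 0.1, ..., 100.0.
grid = [i / 10 for i in range(0, 1001)]

best_abs = min(grid, key=lambda b: distance(samples, b, 1))   # least absolute error
best_sq = min(grid, key=lambda b: distance(samples, b, 2))    # least squared error

print(best_abs, median(samples))   # both 4.0
print(best_sq, mean(samples))      # both 22.8
```

The outlier at 100 pulls the squared error minimizer far from the bulk of the data, while the absolute error minimizer stays at the middle sample.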

Having established the types of signals for which median filters are optimal, the filtering operation can be further characterized through the determination of output distributions. Assume again that the input time series consists of white noise samples with pdf $f_x(\cdot)$ and cumulative distribution function (cdf) $F_x(\cdot)$. Under these conditions on the input samples, it is well known that the median filter output cdf, $F_{med}(\cdot)$, and pdf, $f_{med}(\cdot)$, are given by

$$F_{med}(t) = \sum_{i=N_1+1}^{N} \binom{N}{i} F_x(t)^{i} \, [1 - F_x(t)]^{N-i} \tag{4}$$

and

$$f_{med}(t) = \frac{N!}{N_1! \, N_1!} \, f_x(t) \, F_x(t)^{N_1} \, [1 - F_x(t)]^{N_1}, \tag{5}$$

respectively. From these expressions it can be shown that for $t_1$ and $t_2$ such that $F_x(t_1) \le 1/2 \le F_x(t_2)$, the relation $F_{med}(t_1) \le 1/2 \le F_{med}(t_2)$ also holds. By setting $t_1 = t_2 = t_{0.5}$, where by definition $t_{0.5}$ is the point satisfying $F_x(t_{0.5}) = 1/2$, we see that the median is statistically unbiased in the sense that the median of the input is the median of the output. Moreover, the median behaves consistently for samples with asymmetric distributions.
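The output cdf is a binomial tail sum: the median of $N = 2N_1 + 1$ independent samples is at most $t$ exactly when at least $N_1 + 1$ of the samples are at most $t$. The sketch below evaluates that sum as a function of $p = F_x(t)$; the function name and the chosen values of $p$ are illustrative assumptions.

```python
from math import comb


def median_output_cdf(p, n1):
    """Output cdf of the window size N = 2*n1 + 1 median filter for white
    input, evaluated at a point t where the input cdf equals p."""
    n = 2 * n1 + 1
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(n1 + 1, n + 1))


for n1 in (1, 2, 5):
    below = median_output_cdf(0.3, n1)    # input cdf below 1/2
    at_half = median_output_cdf(0.5, n1)  # input median
    above = median_output_cdf(0.7, n1)    # input cdf above 1/2
    print(2 * n1 + 1, round(below, 4), at_half, round(above, 4))
```

For every window size the value at $p = 1/2$ is $1/2$, so the input median is also the output median, while the values on either side move toward 0 and 1 as the window grows.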

The calculation of statistics such as the output mean and variance from the expressions in (4) and (5) is often quite difficult. Insight into the smoothing characteristics of the median filter can, however, be gained by examining the asymptotic behavior ($N \to \infty$) of these statistics, where, under some general assumptions, results can be derived. For the case of white noise input samples, the asymptotic mean, $\mu_{med}$, and variance, $\sigma^2_{med}$, of the median filter output are

$$\mu_{med} = t_{0.5}$$

and

$$\sigma^2_{med} = \frac{1}{4 N f_x^2(t_{0.5})},$$

where $t_{0.5}$ denotes the median of the input distribution, $F_x(t_{0.5}) = 1/2$. Thus, the median produces a consistent ($\lim_{N \to \infty} \sigma^2_{med} = 0$) and unbiased estimate of the median of the input distribution, irrespective of the input distribution. Note that the output variance is not proportional to the input variance, but rather to $1/f_x^2(t_{0.5})$. For heavy tailed noise such as impulsive noise, $1/f_x^2(t_{0.5})$ is not related to the input variance: the input variance is proportional to the impulse magnitude, whereas $1/f_x^2(t_{0.5})$ is not. Thus, the output variance of the median in this case is not proportional to the input variance. The output variance of the sample mean, in contrast, is always proportional to the input variance, which further explains the more robust behavior of the median.

Table 1: Asymptotic output variances for the window size $N$ mean and median filters for white input samples with uniform, Gaussian, and Laplacian distributions.

| Input sample probability density function | Mean filter | Median filter |
|---|---|---|
| Uniform: $f_x(t) = \frac{1}{2\sqrt{3}\sigma}$ for $-\sqrt{3}\sigma \le t \le \sqrt{3}\sigma$, $0$ otherwise | $\sigma^2/N$ | $3\sigma^2/(N+2)$ |
| Gaussian: $f_x(t) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(t-\mu)^2/(2\sigma^2)}$ | $\sigma^2/N$ | $\pi\sigma^2/(2N)$ |
| Laplacian: $f_x(t) = \frac{1}{\sqrt{2}\sigma} e^{-\sqrt{2}\lvert t-\mu\rvert/\sigma}$ | $\sigma^2/N$ | $\sigma^2/(2N)$ |

The variances of the sample mean and median filter outputs are given in Table 1 for the uniform, Gaussian, and Laplacian input distribution cases. The results hold for all $N$ in the uniform case and are asymptotic for the Gaussian and Laplacian cases. Note that the median performs about 3 dB better than the sample mean in the Laplacian case and about 2 dB worse in the Gaussian case.
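The decibel figures follow from the ratio of the two output variances. The sketch below uses the standard expressions $\sigma^2/N$ for the sample mean and $3\sigma^2/(N+2)$, $\pi\sigma^2/(2N)$, and $\sigma^2/(2N)$ for the median under uniform, Gaussian, and Laplacian inputs; the function names and the choice $\sigma^2 = 1$, $N = 101$ are assumptions for the illustration.

```python
from math import log10, pi


def output_variances(sigma2, n):
    """(mean filter, median filter) output variance for each input distribution."""
    return {
        "uniform": (sigma2 / n, 3 * sigma2 / (n + 2)),
        "gaussian": (sigma2 / n, pi * sigma2 / (2 * n)),
        "laplacian": (sigma2 / n, sigma2 / (2 * n)),
    }


def median_gain_db(mean_var, median_var):
    """Positive when the median has the lower output variance."""
    return 10 * log10(mean_var / median_var)


variances = output_variances(1.0, 101)
gains = {name: median_gain_db(*pair) for name, pair in variances.items()}
for name, gain in gains.items():
    print(f"{name:9s} {gain:+.2f} dB")
```

The Laplacian gain is $10\log_{10} 2 \approx 3.01$ dB and the Gaussian loss is $10\log_{10}(\pi/2) \approx 1.96$ dB, independent of $N$; in the uniform case the loss approaches $10\log_{10} 3 \approx 4.77$ dB as $N$ grows.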

The median filter possesses numerous statistical properties in addition to those discussed above. Among the properties that illustrate the optimality of the median are the following:

1. The conditional median at each time instant $n$ is the minimum Mean Absolute Error (MAE) estimator of the signal value at time $n$, where the conditioning is on the past history, up to time $n$, of the noise corrupted observations of the signal.

2. The running median is, with high probability, a maximum a posteriori estimator of a constant signal in symmetric impulsive noise.

These statistical properties are complemented by a set of deterministic properties, which are discussed next.

2.3 Deterministic Properties

Statistical properties give considerable insight into median filter performance. The median filter cannot, however, be sufficiently characterized through statistical properties alone. For instance, an important question not answered by the statistical properties is what type of signal, if any, is passed through the median filter unaltered. Linear filters, for example, can be analyzed in the frequency domain to determine, among other things, pass-band and stop-band frequencies. If the frequency content of the input signal lies exclusively in the filter passband, then the signal passes through the filter unaltered.¹ Conversely, signal content in the stop band does not pass through, or is at least attenuated by, the filter. Somewhat analogous results do in fact exist for the median filter. For median filters, passband or invariant signals are referred to as root signals. The concept of root signals is important to the understanding of median filters and their effect on general signal structures. A review of the significant results in root signal analysis is given in the following, along with the main median filter properties resulting from this analysis.

The definition of a root signal is quite simple: a signal is a median filter root if the signal is invariant under the median filtering operation. Thus, a signal $\{x\}$ is a root of the window size $N = 2N_1 + 1$ median filter if

$$x(n) = \mathrm{MED}[x(n-N_1), \ldots, x(n), \ldots, x(n+N_1)]$$

for all $n$. As an example, consider the signal shown in Fig. 5. This signal is filtered by median filters of three different window sizes ($N_1 = 1$, $2$, and $3$). Note that for the window size three case ($N_1 = 1$), the filter output is a root. That is, further filtering of this signal with the window size three median filter does not alter the signal. Notice, however, that if this same signal is filtered with a larger window size median, the signal will be modified. Thus, the signal in Fig. 5(b) is in the passband, or a root, of the $N_1 = 1$ median filter but outside the passband, or not a root, of the $N_1 = 2$ and $N_1 = 3$ filters.
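The dependence of the root property on window size can be tested directly by filtering a signal once and comparing the result with the input. The sketch below assumes the end samples are repeated; the signal and the helper names are hypothetical and are not the data of the figure.

```python
from statistics import median


def running_median(x, n1):
    """Running median, window size 2*n1 + 1, ends padded by repetition."""
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    width = 2 * n1 + 1
    return [median(padded[n:n + width]) for n in range(len(x))]


def is_root(x, n1):
    """True if one pass of the window size 2*n1 + 1 median leaves x unchanged."""
    return running_median(x, n1) == list(x)


# Hypothetical signal: a rise to a three-sample plateau, then a fall.
x = [0, 0, 3, 3, 3, 1, 1, 1]
root_flags = [is_root(x, n1) for n1 in range(4)]
print(root_flags)   # window sizes 1, 3, 5 and 7
```

The three-sample plateau is long enough to survive the window size 3 and 5 filters but not the window size 7 filter, so the signal is a root of the first three filters only.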

The goal of root analysis is to relate the filtering of desired signals corrupted by noise to root and non-root signals. If it can be shown that certain types of desired signals are in the median filter root set, while noise is outside the root set, then the median filtering of a time series will preserve desired structures while altering the noise. Such a result does in fact hold and will be made clear through the following definitions and properties. First note that, as the example above illustrates, whether or not a signal is a median filter root depends on the window size of the filter in question. Clearly, all signals are roots of the window size one median (identity) filter. To investigate this dependence on window size, median filter root signals can be characterized in terms of local signal structures, where the local signal structures are related to the filter window size. Such a local structure based analysis serves two purposes. First, it defines signal structures that, when properly combined, form the median filter root set. Second, by relating the local structures to the filter window size, the effect of window size on roots is made clear. The local structure analysis of median filter roots relies on the following definitions.

Definition 1. A constant neighborhood is a region of at least $N_1 + 1$ consecutive identically valued points.

Definition 2. An edge is a monotonic region between two constant neighborhoods of different value. The connecting monotonic region cannot contain any constant neighborhoods.

Definition 3. An impulse is a constant neighborhood followed by at least one, but no more than $N_1$, points which are then followed by another constant neighborhood having the same value as the first constant neighborhood. The two boundary points of these at most $N_1$ points do not have the same value as the two constant neighborhoods.

Definition 4. An oscillation is a sequence of points which is not part of a constant neighborhood, an edge, or an impulse.

Figure 5: Effects of window size on a median filtered signal. (Panels, in order: input signal $x(n)$; output signal for a window of size 3; output signal for a window of size 5; output signal for a window of size 7. Each panel has a vertical axis from 0 to 4; appended points are marked separately.)

¹ In general, the pass-band is defined in terms of the magnitude response. Thus, there may be some time shifting of signals in the pass-band, depending on the filter phase response.

These definitions may now be used to develop a description of those signals which do and those which do not pass through a median filter without being perturbed. In particular, Gallagher and Wise have developed a number of properties which characterize these signal sets for the case of finite length sequences. First, any impulse will be eliminated upon median filtering. Second, a finite length signal is a median filter root if it consists of constant neighborhoods and edges only. Thus, if a desired signal is constructed solely of constant neighborhoods and edges, then it will not be altered by the median filtering operation. Conversely, if observation noise consists of impulses (as defined above), it will be removed by the median filtering operation. These median filter root properties are made exact by the following.

Definition 5. A sequence $\{x\}$ is said to be locally monotonic of length $m$, denoted LOMO($m$), if the subsequence $x(n), x(n+1), \ldots, x(n+m-1)$ is monotonic for all $n \ge 1$.

Property 1. Given a length $L$ sequence to be median filtered with a length $N = 2N_1 + 1$ window, a necessary and sufficient condition for the signal to be invariant (a root) under median filtering is that the extended (beginning and end appended) signal be LOMO($N_1 + 2$).
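Local monotonicity is straightforward to test, which makes the root condition easy to explore on small examples. The sketch below assumes the extended signal is formed by repeating the first and last samples $N_1$ times; the helper names (`extend`, `is_lomo`, `is_root`) and the example signal are hypothetical.

```python
from statistics import median


def extend(x, n1):
    """Append n1 copies of the first and last samples."""
    return [x[0]] * n1 + list(x) + [x[-1]] * n1


def running_median(x, n1):
    """Running median, window size 2*n1 + 1, applied to the extended signal."""
    padded = extend(x, n1)
    width = 2 * n1 + 1
    return [median(padded[n:n + width]) for n in range(len(x))]


def is_root(x, n1):
    return running_median(x, n1) == list(x)


def is_monotonic(segment):
    pairs = list(zip(segment, segment[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def is_lomo(x, m):
    """True if every run of m consecutive samples is monotonic."""
    return all(is_monotonic(x[n:n + m]) for n in range(len(x) - m + 1))


# Hypothetical signal: a two-sample pulse on a constant background.
x = [0, 0, 2, 2, 0, 0]
for n1 in (1, 2):
    print(2 * n1 + 1, is_root(x, n1), is_lomo(extend(x, n1), n1 + 2))
```

For window size 3 the pulse is a root and the extended signal is LOMO(3); for window size 5 it is neither a root nor LOMO(4), in agreement with the stated condition.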

Thus, the set of signals that forms the passband or root set (invariant to filtering) of a size $N$ median filter consists solely of those signals that are formed of constant neighborhoods and edges. Note that by the definition of LOMO($m$), a change of trend implies that the sequence must stay constant for at least $m - 1$ points. It follows that for a median filter root signal to contain both increasing and decreasing regions, these regions must be separated by a constant neighborhood of at least $N_1 + 1$ identically valued samples. It is also clear from the definition of LOMO($\cdot$) that a LOMO($m_1$) sequence is also LOMO($m_2$) for any two positive integers $m_1 \ge m_2$. This implies that the roots for decreasing window size median filters are nested, i.e., every root of a window size $M$ filter is also a root of a window size $N$ median filter for all $N < M$. This is formalized by the following.

Property 2. Let $S$ denote a set of finite length sequences and $R_{N_1}$ be the root set of the window size $N = 2N_1 + 1$ median filter operating on $S$. Then the root sets are nested such that $\cdots \subseteq R_{N_1+1} \subseteq R_{N_1} \subseteq R_{N_1-1} \subseteq \cdots \subseteq R_1 \subseteq R_0 = S$.

In addition to the above description of the root signal set for a median filter, it can be shown that any signal of finite length is mapped to a root signal by repeated median filtering. In fact, it is simple to show that the first and last points to change value on a median filtering operation remain invariant upon additional filter passes, where repeated filter passes consist of using the output of the prior filter pass as the input to an identical filter on the current pass. This fact, in turn, indicates that any length $L$ nonroot signal (one containing oscillations and impulses) will become a root structure after a maximum of $(L-2)/2$ successive filterings. This simple bound was later improved, and it was shown that at most

$$3\left\lceil \frac{L-2}{2(N_1+2)} \right\rceil \tag{6}$$

passes of the median filter are required to reach a root. This bound is conservative in practice, since in most cases a root signal is obtained after ten or so filter passes.
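Convergence to a root can be observed by filtering repeatedly until the signal stops changing. The sketch below counts the passes that alter the signal and compares the count with the bound $3\lceil (L-2)/(2(N_1+2)) \rceil$; the end handling (repeated end samples), the helper names, and the alternating test signal are assumptions for the illustration.

```python
from math import ceil
from statistics import median


def running_median(x, n1):
    """Running median, window size 2*n1 + 1, ends padded by repetition."""
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    width = 2 * n1 + 1
    return [median(padded[n:n + width]) for n in range(len(x))]


def filter_to_root(x, n1):
    """Filter repeatedly; return the root and the number of passes that changed the signal."""
    current, passes = list(x), 0
    while True:
        filtered = running_median(current, n1)
        if filtered == current:
            return current, passes
        current, passes = filtered, passes + 1


def pass_bound(length, n1):
    """Upper bound on the number of passes needed to reach a root."""
    return 3 * ceil((length - 2) / (2 * (n1 + 2)))


# Hypothetical oscillating signal of length 8, window size 3.
x = [0, 1, 0, 1, 0, 1, 0, 1]
root, passes = filter_to_root(x, 1)
print(root, passes, pass_bound(len(x), 1))
```

Each pass shortens the oscillation by one sample at each end, so this particular signal needs three passes, which here coincides with both the simple bound $(L-2)/2$ and the improved bound.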

Figure 6: Root signals obtained by median filters of size 3, 5, and 7. (Panels, in order: input signal $x(n)$; root signal for a window of size 3, 1 filter pass; root signal for a window of size 5, 2 filter passes; root signal for a window of size 7, 2 filter passes. Each panel has a vertical axis from 0 to 4; appended points are marked separately.)

The median filter root properties are illustrated through an example in Fig. 6. This figure shows an original signal and the resultant root signals after multiple passes of window size 3, 5, and 7 median filters. Note that while it takes only a single pass of the window size 3 median filter to obtain a root, it takes two passes for the window size 5 and 7 median filters. Clearly, the locally monotonic structure requirements of the root signals are satisfied in Fig. 6. For the window size 3 case, the input sequence becomes LOMO(3) after a single pass of the filter. Thus, this sequence is in the root set of the window size 3 median filter, but it is not a root of any window size $N > 3$ median filter, since it is not LOMO($N_1 + 2$) for $N_1 > 1$.

The deterministic and statistical properties form a powerful set of tools for describing the median filtering operation and performance. Together, they show that the median filter is an optimal estimator of location for Laplacian noise and that common signal structures, e.g., constant neighborhoods and edges in images, are in the filter pass-band (root set). Moreover, impulses are removed by the filtering operation, and repeated passes of the median filter always result in the signal converging to a root, where the root consists of a well defined set of structures related to the filter window size.


�� Median Filtering and Threshold Decomposition

A fundamental property of median filters is threshold decomposition [�]. This property was the key to deriving many of the median filter statistical and deterministic properties. Moreover, threshold decomposition is instrumental in the optimization of the median filter generalizations discussed in the following sections. A review of this important property is therefore in order.

Threshold decomposition is simply a means of decomposing an $M$-level signal into an equivalent set of $M - 1$ binary sequences. Let $\mathbf{x}(n) = [x_1, x_2, \ldots, x_N]$ be an $N$ element observation vector where the signal is quantized to $M$ levels such that $x(n) \in Z_M = \{0, 1, \ldots, M-1\}$. The threshold decomposition of $\mathbf{x}(n)$ results in the set of binary vectors $\mathbf{X}^1(n), \mathbf{X}^2(n), \ldots, \mathbf{X}^{M-1}(n)$, where $\mathbf{X}^i(n) \in \{0, 1\}^N$ is the observation vector thresholded at level $i$ for $i = 1, 2, \ldots, M-1$. As a function of the threshold operator $T^i(\cdot)$,
$$\mathbf{X}^i(n) = T^i(\mathbf{x}(n)) = [T^i(x_1), T^i(x_2), \ldots, T^i(x_N)] = [X^i_1, X^i_2, \ldots, X^i_N],$$
where $T^i(\cdot)$ is defined as
$$X^i_j = T^i(x_j) = \begin{cases} 1 & \text{if } x_j \ge i \\ 0 & \text{otherwise} \end{cases}$$
for $i = 1, 2, \ldots, M-1$ and $j = 1, 2, \ldots, N$. In terms of the time indexed samples, $X^i(n) = T^i(x(n))$. Threshold decomposition can be reversed by simply adding the threshold decomposed signals:
$$\mathbf{x}(n) = \sum_{i=1}^{M-1} \mathbf{X}^i(n) \quad \text{and} \quad x(n) = \sum_{i=1}^{M-1} X^i(n).$$

Furthermore, it was shown by Fitch et al. that the median operation commutes with thresholding [�]. Stated more formally, the median filtering of an $M$-level signal $x(n) \in \{0, 1, \ldots, M-1\}$ is equivalent to filtering the $M - 1$ threshold signals $X^1(n), X^2(n), \ldots, X^{M-1}(n)$ and summing the results:
$$\mathrm{MED}(\mathbf{x}(n)) = \sum_{i=1}^{M-1} \mathrm{MED}(\mathbf{X}^i(n))$$
for all $n$. Thus, threshold decomposition is a weak superposition property. A related property is the partial ordering property known as the stacking property.

Definition �: Let $\mathbf{X}$ and $\mathbf{Y}$ be $N$ element binary vectors. Then $\mathbf{X}$ stacks on $\mathbf{Y}$, which is denoted as $\mathbf{Y} \le \mathbf{X}$, if and only if $Y_i \le X_i$ for $i = 1, 2, \ldots, N$. A function $f(\cdot)$ possesses the stacking property if and only if
$$\mathbf{Y} \le \mathbf{X} \;\Rightarrow\; f(\mathbf{Y}) \le f(\mathbf{X}).$$

Footnote: For now we restrict the discussion to quantized signals. This restriction is lifted in Section �.

Figure �: Median filtering by threshold decomposition. The 4-valued input signal is filtered by the running sorting method in the top part of the figure. In the bottom part of the figure, the signal is first decomposed (thresholded at levels 1, 2, and 3) into a set of binary signals, and each of these is filtered by a binary median filter. The output is produced by adding together the outputs of the binary median filters.

The median filter was shown to possess the stacking property [�], which can be stated as follows. In the threshold decomposition domain, the binary median filter output at threshold level $i$ is always less than or equal to the binary median filter output at lower threshold levels:
$$\mathrm{MED}(\mathbf{X}^i(n)) \le \mathrm{MED}(\mathbf{X}^j(n))$$
for all $i, j$ such that $1 \le j \le i \le M - 1$.

The stacking property is a partial ordering property. It states that the results of applying the median filter to each of the binary sequences obtained by thresholding the original signal have a specific structure. Thus, in median filtering by threshold decomposition, the input sequence is first decomposed into $M - 1$ binary sequences, and each of these is then filtered by a binary median filter. Furthermore, the set of output sequences possesses the stacking property. As a simple example, consider the median filter of window size three ($N = 3$) being applied to a 4-level input signal as shown in Fig. �. The outputs of the multi-level median filter and of the threshold decomposition median filter are identical because of the weak superposition property.
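The weak superposition and stacking properties can be verified numerically. The sketch below is illustrative only: the helper names are hypothetical, the input is assumed to be a NumPy integer array with values in $\{0, \ldots, M-1\}$, and the end samples are replicated so that each output has a full window.

```python
import numpy as np

def running_median(x, window):
    """Running median of odd size `window` with replicated end samples."""
    n1 = window // 2
    padded = np.concatenate([np.full(n1, x[0]), x, np.full(n1, x[-1])])
    return np.array([int(np.median(padded[i:i + window])) for i in range(len(x))])

def threshold_decompose(x, levels):
    """Binary signals X^i(n) = 1 if x(n) >= i, for i = 1, ..., M-1."""
    return np.array([(x >= i).astype(int) for i in range(1, levels)])

def median_by_threshold_decomposition(x, window, levels):
    """Filter each binary signal, then add the binary outputs."""
    binary = threshold_decompose(x, levels)
    filtered = np.array([running_median(b, window) for b in binary])
    return filtered, filtered.sum(axis=0)
```

For any integer signal `x` with `levels` quantization levels, the summed output equals `running_median(x, window)`, and each row of the filtered binary signals is element-wise no larger than the row for the next lower threshold.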

� Weighted Median Filters

Numerous generalizations to the median filtering operation have been introduced since Tukey first suggested the median filter as a smoother in ��. While many different approaches have been taken in an attempt to improve the median filter performance, most have, in some way, attempted to include temporal information in the filtering process. For most signals, and certainly those of practical interest, it is clear that certain observation samples have a higher degree of correlation with the desired estimate than do others. In the linear filter case, this correlation is reflected in the weight given each sample. A similar weighting approach can be taken to generalize the median filter.

The sample weighting approach to generalizing the median filter is developed in this section. We begin by discussing the Center Weighted Median (CWM) filter, in which only one sample, the sample centrally located in the observation window, is weighted. This is then generalized to the Weighted Median (WM) filter case, in which all observation samples are weighted. In both the CWM and WM filter cases the output is the median value of the weighted set. A further generalization can be achieved by allowing the output to be an order statistic other than the median. This leads to the class of Weighted Order Statistic (WOS) filters. Following the development of these generalizations, we show that each possesses the threshold decomposition property. As noted earlier, threshold decomposition is an extremely powerful tool for both filter analysis and optimization, and is the final topic covered in this section.

�� Center Weighted Median Filters

The median filter is strictly a rank order operator. Thus, all temporal locations within the observation window are considered equivalent. That is, given a window of observation samples, any permutation of the samples within the observation window results in an identical median filter output. As stated above, for most signals certain samples within the observation window are more correlated with the desired estimate than are others. Due to the symmetric nature of the observation window, the sample most correlated with the desired estimate is, in general, the center observation sample.

The center observation sample can be weighted to reflect its importance, or correlation with the desired estimate. Since median filters select the output in a different fashion than do linear filters, i.e., ranking versus summing, the observation samples must also be weighted differently. In the median filtering case, weighting is accomplished through repetition. Thus, the output of the CWM filter is given by
$$y(n) = \mathrm{MED}[x_1, \ldots, x_{c-1}, x_c \diamond w_c, x_{c+1}, \ldots, x_N],$$
where $x_c \diamond w_c$ denotes the replication operator
$$x_c \diamond w_c = \underbrace{x_c, x_c, \ldots, x_c}_{w_c \text{ times}}$$
and $c = (N+1)/2 = N_1 + 1$ is the index of the center sample. The center sample is thus repeated $w_c$ times, where $w_c$ is an odd positive integer. Consequently, the output of the CWM filter is the median over an extended set containing multiple center samples. When $w_c = 1$, the operator is a median filter, and for $w_c \ge N$, the CWM reduces to an identity operation. On the right side of the expression above, the time index $n$ has been dropped for notational simplicity and the observation samples are indexed according to their location in the observation window. In terms of the time series, the samples in the observation window are $x_i = x(n - (N_1 + 1) + i)$ for $i = 1, 2, \ldots, N$.

Figure �: Effects of increasing the center weight of a CWM filter of size $N$ = � operating on the voiced speech "a". The CWM filter output is shown for $w_c$ = �. Note that for $w_c = 1$ the CWM reduces to the median filter, and for $w_c$ = � it becomes the identity filter. (Axes: time $n$; weight $w_c$.)

The effect of varying the center sample weight is perhaps best seen by way of an example.

Consider a segment of recorded speech. The voiced waveform "a" is shown at the top of Fig. �. This speech signal is taken as the input of a CWM filter of size �. The outputs of the CWM, as the weight parameter $w_c$ is varied from � to �, are also shown in Fig. �. The vertical index denotes the value given to $w_c$. The signal at the top is the original signal, or the output signal of the CWM when $w_c = N$, or � in this example. The second signal from the top is the CWM filtered signal with $w_c = N - 2$. The weight $w_c$ is successively decreased until $w_c = 1$, in which case the CWM filter reduces to the standard median.

The smoothing characteristics of the CWM filter, as a function of the center sample weight, are illustrated in the previous example and figure. Clearly, as $w_c$ is increased less smoothing occurs. This response of the CWM filter is explained by the following property, which relates the weight $w_c$ and the CWM filter output to select order statistics (OS). The $N$ observation samples $x_1, x_2, \ldots, x_N$ can be written as an OS vector,
$$\mathbf{x}_L = [x_{(1)}, x_{(2)}, \ldots, x_{(N)}],$$
where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}$. The following relation [�] utilizes this notation.

Figure �: The center weighted median filtering operation. The center observation sample is mapped to the order statistic $x_{(k)}$ ($x_{(N+1-k)}$) if the center sample is less (greater) than $x_{(k)}$ ($x_{(N+1-k)}$), and left unaltered otherwise.

Property �: Let $\{y\}$ be the output of a CWM filter operating on the sequence $\{x\}$. Then
$$y(n) = \mathrm{MED}[x_1, \ldots, x_{c-1}, x_c \diamond w_c, x_{c+1}, \ldots, x_N] = \mathrm{MED}[x_{(k)}, x_c, x_{(N+1-k)}],$$
where $k = (N + 2 - w_c)/2$ for $1 \le w_c \le N$, and $k = 1$ for $w_c > N$.

From this property we can write the CWM filter output $y(n)$ as
$$y(n) = \begin{cases} x_c & \text{if } x_{(k)} \le x_c \le x_{(N+1-k)} \\ x_{(k)} & \text{if } x_c < x_{(k)} \\ x_{(N+1-k)} & \text{if } x_c > x_{(N+1-k)}. \end{cases}$$
Since $x(n)$ is the center sample in the observation window, i.e., $x_c = x(n)$, this expression indicates that the output of the filter is identical to the input as long as $x(n)$ lies in the interval $[x_{(k)}, x_{(N+1-k)}]$. If the center input sample is greater than $x_{(N+1-k)}$, the filter outputs $x_{(N+1-k)}$, guarding against a high rank order (large) aberrant data point being taken as the output. Similarly, the filter's output is $x_{(k)}$ if the sample $x(n)$ is smaller than this order statistic. This CWM filter performance characteristic is illustrated in Figs. � and �. Figure � shows how the input sample is left unaltered if it is between the trimming statistics $x_{(k)}$ and $x_{(N+1-k)}$ and mapped to one of these statistics if it is outside this range. Figure � shows an example of the CWM filter operating on a Laplacian sequence. Along with the input and output, the trimming statistics are shown. It is easily seen how increasing $k$ tightens the range in which the input is passed directly to the output.
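The equivalence between center-sample replication and trimming against the order statistics $x_{(k)}$ and $x_{(N+1-k)}$ can be checked with a short sketch. The two function names below are hypothetical; both assume an odd window length $N$ and an odd positive center weight $w_c$.

```python
import numpy as np

def cwm_by_replication(window, wc):
    """CWM output as the median of the window with the center repeated wc times."""
    window = np.asarray(window)
    center = window[len(window) // 2]
    # The window already holds one copy of the center; append wc - 1 more.
    expanded = np.concatenate([window, np.full(wc - 1, center)])
    return np.median(expanded)

def cwm_by_trimming(window, wc):
    """CWM output obtained by clipping the center sample to [x_(k), x_(N+1-k)]."""
    window = np.asarray(window)
    n = len(window)
    k = (n + 2 - wc) // 2 if wc <= n else 1
    ordered = np.sort(window)
    lower, upper = ordered[k - 1], ordered[n - k]  # x_(k) and x_(N+1-k)
    return np.clip(window[n // 2], lower, upper)
```

With $w_c = 1$ both functions return the ordinary median of the window, and with $w_c \ge N$ both return the center sample unchanged.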

�� Weighted Median Filters

Figure �: An example of the CWM filter operating on an i.i.d. Laplacian sequence with unit variance. Shown are the filter input and output sequences as well as the trimming statistics $x_{(k)}$ and $x_{(N+1-k)}$. The filter window size is � and $k$ = �.

The weighting scheme used by CWM filters can be naturally extended to include all input samples. To this end, let $\mathbf{w} = [w_1, w_2, \ldots, w_N]$ be an $N$ long weight vector with positive integer elements that sum to an odd number, i.e., $\sum_{i=1}^{N} w_i$ is odd. Given this vector of weights, the WM filter operation is defined as [�]
$$y(n) = \mathrm{MED}[\mathbf{x}(n) \diamond \mathbf{w}] = \mathrm{MED}[x_1 \diamond w_1, x_2 \diamond w_2, \ldots, x_N \diamond w_N].$$
Thus, WM filters incorporate temporal order information by weighting samples according to their temporal order prior to rank filtering. The filtering operation is illustrated through the following example.

Example �: Consider the window size � WM filter defined by the symmetric weight vector w = �. For the observation x(n) = �, the filter output is found as

y�n� � MED� �� MED� �� MED� ��

��

where the median value is underlined in equation �� The large weighting on the center

input sample results in this sample being taken as the output� As a comparison� the standard

median output for the given input is y�n� � ��

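Figure �: The weighted median filtering operation. (Schematic: the samples $x_1, x_2, \ldots, x_N$ in the observation window of the input $\{x\}$ are weighted by $w_1, w_2, \ldots, w_N$ and passed to the MED operator, which produces the output $\{y\}$.)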

The WM filtering operation can be schematically described as in Fig. �. This figure illustrates that as the filter window slides over an input sequence, the observation samples are duplicated (weighted) according to their temporal order within the window. This replication forms an expanded observation set which is then ordered according to rank, and the median sample is selected as the output. In this fashion specific temporal order samples can be emphasized, and others de-emphasized. The figure also illustrates that, structurally, the WM filter is similar to the linear FIR filter. This relationship between linear and WM filters can be further explored through an alternative WM filter definition.

The constraint that the WM filter weights be integer valued can be relaxed through a second, equivalent, filter definition. Thus, let $\mathbf{w}$ be an $N$ element weight vector with positive (possibly non-integer) elements. The output of the WM filter defined by $\mathbf{w}$ and operating on the observation $\mathbf{x}(n)$ can be defined as
$$y(n) = \arg\min_{\beta} D_{\mathbf{w}}(\beta),$$
where $D_{\mathbf{w}}(\cdot)$ is the weighted distance operator
$$D_{\mathbf{w}}(\beta) = \sum_{i=1}^{N} w_i |x_i - \beta|.$$
Note that $D_{\mathbf{w}}(\beta)$ is piecewise linear and convex for $w_i \ge 0$, $i = 1, 2, \ldots, N$. Thus, $\arg\min_{\beta} D_{\mathbf{w}}(\beta)$ is guaranteed to be one of the input samples $x_1, x_2, \ldots, x_N$. The WM filter output for non-integer weights can be determined from this definition as follows:

1. Calculate the threshold $w_0 = \frac{1}{2}\sum_{i=1}^{N} w_i$.
2. Sort the samples in the observation vector $\mathbf{x}(n)$.
3. Sum the weights corresponding to the sorted samples, beginning with the maximum sample and continuing down in order.
4. The output is the sample whose weight causes the sum to become $\ge w_0$.

The following example illustrates this procedure�

Example �: Consider the window size � WM filter defined by the real valued weights

w � �� The output for this lter operating on the observation x�n� �

�� is found as follows� Summing the weights gives the threshold w� ��

P�i�� wi �

�� The observation samples� sorted observation samples� their corresponding weight� and

the partial sum of weights �from each ordered sample to the maximum� are�

observation samples �� corresponding weights ��

sorted observation samples �� corresponding weights �� partial weight sums ��

��

Thus, the output is �, since when starting from the right (maximum sample) and summing the weights, the threshold $w_0$ = � is not reached until the weight associated with � is added. The underlined sum value above indicates that this is the first sum which meets or exceeds the threshold.
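The threshold procedure for real-valued weights can be sketched as follows. The function names are hypothetical, and the observation and weight values mentioned below are illustrative choices made for this sketch, not values taken from the examples in the text.

```python
import numpy as np

def weighted_median(x, w):
    """Weighted median via the partial-sum procedure for positive real weights."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    threshold = 0.5 * w.sum()                # step 1: w0 = (1/2) * sum of weights
    order = np.argsort(x)[::-1]              # step 2: indices, largest sample first
    partial_sums = np.cumsum(w[order])       # step 3: sum weights from the maximum down
    first = np.argmax(partial_sums >= threshold)  # step 4: first sum reaching w0
    return x[order[first]]

def weighted_distance(x, w, beta):
    """Weighted absolute distance D_w(beta) = sum_i w_i * |x_i - beta|."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * np.abs(x - beta)))
```

For the illustrative observation `[12, 6, 4, 1, 9]`, the integer weights `[1, 2, 3, 2, 1]` and the scaled weights `[0.1, 0.2, 0.3, 0.2, 0.1]` select the same sample, and that sample also minimizes `weighted_distance` over the observation samples.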

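In the previous section the median and sample mean filters were related through the distance operator $D^{\gamma}(\beta)$, where $\gamma$ denotes the order of the distance norm. There, it was shown that $\mathrm{MED}(\mathbf{x}(n)) = \arg\min_{\beta} D^{1}(\beta)$ while $\mathrm{MEAN}(\mathbf{x}(n)) = \arg\min_{\beta} D^{2}(\beta)$. Similar results hold relating the WM and linear FIR filters by means of the weighted distance measure $D^{\gamma}_{\mathbf{w}}(\beta)$. As stated above, the WM of $\mathbf{x}(n)$ is $\arg\min_{\beta} D^{\gamma}_{\mathbf{w}}(\beta)$ for $\gamma = 1$. Interestingly, if the distance norm is changed to two, then
$$\arg\min_{\beta} D^{2}_{\mathbf{w}}(\beta) = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i},$$
which is a normalized linear FIR filter [�].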
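The weighted-mean result follows from a one-line minimization. The derivation below is a sketch that assumes the squared weighted distance has the form $\sum_i w_i (x_i - \beta)^2$ with positive weights; the symbol $\beta$ for the location estimate is used here for illustration.

```latex
\begin{align*}
\frac{d}{d\beta} \sum_{i=1}^{N} w_i (x_i - \beta)^2
  &= -2 \sum_{i=1}^{N} w_i (x_i - \beta) = 0 \\
\Longrightarrow\quad \beta \sum_{i=1}^{N} w_i
  &= \sum_{i=1}^{N} w_i x_i \\
\Longrightarrow\quad \beta
  &= \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}.
\end{align*}
```

Because the second derivative, $2\sum_i w_i$, is positive for positive weights, this stationary point is the unique minimum.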

Before ending the discussion on WM filters it is important to point out that the two filter definitions given (equations (�) and (�)) are equivalent. It has been shown that any WM filter based on real valued weights has an equivalent integer valued weight representation [�]. As an illustration, multiplying a weight vector by a positive constant results in an identical filter. Thus, the WM filter defined by the weight vector w = � is identical to that used in Example �. Consequently, there are only a finite number of WM filters for a given window size. The number of WM filters, however, grows rapidly with window size. For instance, there are only � window size � WM filters, but � and � window size � and � WM filters, respectively [�].



�� Weighted Order Statistic Filters

The weighting scheme used in WM filters is an effective method for emphasizing samples in certain observation window locations and de-emphasizing others. However, the WM filter output is restricted to be the median of the weight (repetition) expanded set. This lack of freedom in choosing the rank of the output can limit performance in certain cases.

This limitation can be eliminated by allowing the rank of the output to be an adjustable parameter. This leads to the class of WOS filters, which includes WM and all rank-order filters as a subset. Moreover, the more powerful generalizations developed in the remainder of the paper are based on the WOS filtering operation.

The operation of a window size $N$ WOS filter is defined by the $N$ element weight vector $\mathbf{w}$ and the rank parameter $w_0$. For positive integer valued weights and rank parameter (the integer constraint will be lifted shortly), the output of the WOS filter is computed as
$$y(n) = w_0\text{-th Largest}[\mathbf{x}(n) \diamond \mathbf{w}].$$
Note that if $w_0 = \frac{1}{2}\left(\sum_{i=1}^{N} w_i + 1\right)$ (or, for non-integer weights, $w_0 = \frac{1}{2}\sum_{i=1}^{N} w_i$), then the WOS filter reduces to a WM filter. The WOS filters also contain rank-order filters as a special case. By restricting each of the weights to be unity, $w_i = 1$, $i = 1, 2, \ldots, N$, the WOS filter output becomes $y(n) = w_0\text{-th Largest}[\mathbf{x}(n) \diamond \mathbf{w}] = x_{(w_0)}$, where again $x_{(1)}, x_{(2)}, \ldots, x_{(N)}$ are the order statistics. While rather simple, there are several applications where rank-order filters can be effectively utilized. The demodulation of AM signals is one such example, where the output rank is selected so as to track the envelope function of the AM signal. Figure � depicts the AM detection of a � kHz tone signal on a � kHz carrier, sampled at � kHz, using an eighth-ranked-order operation with a window size of �. Figure �(a) shows the envelope detection when no noise is present, whereas Fig. �(b) shows the envelope detection in an impulsive noise environment. Note that while impulsive noise is very disruptive with most envelope detectors, the output of the rank-order filter is hardly perturbed by the noise.

As with WM filters, the restriction that the weights, and in this case $w_0$, be integer valued can be relaxed. For non-integer values, $w_0$ is referred to as the threshold, and the WOS filter output is determined by the same procedure used to find the WM filter output for non-integer weights. The only difference is that $w_0$ is free to be chosen and is not restricted to $w_0 = \frac{1}{2}\sum_{i=1}^{N} w_i$. Thus, WOS filters have $N + 1$ degrees of freedom. The freedom to set the threshold, in addition to the weights, makes WOS filters a powerful class of filters with wide ranging applications. Moreover, effective (adaptive) optimization procedures exist for WOS filters. Furthermore, since the WM and WOS filters are simple generalizations of the median, we can expect some properties of the median filter to extrapolate to these more general filters. This is in fact the case for the root signal properties and threshold decomposition. We revisit threshold decomposition next because of its importance in the analysis and optimization of WM and WOS filters.
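A WOS output for a single observation window can be sketched by reusing the partial-sum procedure with a freely chosen threshold. This is an illustrative sketch with a hypothetical function name; it assumes the rank is counted from the largest sample downward, as in the procedure given for non-integer WM weights, and the sample values used to exercise it are not from the text.

```python
import numpy as np

def wos_output(x, w, w0):
    """WOS output for one window: sum weights from the largest sample down
    and return the sample whose weight first makes the sum reach w0."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)[::-1]            # indices, largest sample first
    reached = np.cumsum(w[order]) >= w0
    if not reached.any():
        raise ValueError("w0 exceeds the sum of the weights")
    return x[order[np.argmax(reached)]]
```

With unit weights the function acts as a rank-order selector, and with $w_0$ set to half the weight sum (or $(\sum_i w_i + 1)/2$ for integer weights with an odd sum) it returns the weighted median.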

Figure �: Rank-order AM demodulation. The window size is �, and the output is the 8th largest in the window. The baseband signal is at � kHz with a carrier of � kHz. The sampling frequency is � kHz. (a) Noiseless reception (original and detected signals). (b) Noisy reception with impulsive noise, Pr = � (noisy and detected signals).

�� Threshold Decomposition and Logic

As stated above, threshold decomposition extends to the class of WOS filters. To show this, we begin by again restricting the input signal to have $M$ levels. After proving that WOS filters possess the threshold decomposition and stacking properties, the conditions on the input signal are relaxed to allow for the case of real valued inputs.

To begin, denote the input vector as $\mathbf{x}(n) = [x_1, x_2, \ldots, x_N]^T$, where $x_i \in Z_M = \{0, 1, \ldots, M-1\}$. Recall that $\mathbf{x}(n)$ can be decomposed into $M - 1$ binary vectors $\mathbf{X}^1(n), \mathbf{X}^2(n), \ldots, \mathbf{X}^{M-1}(n)$, where the elements of the binary vectors are $X^m_i = T^m(x_i)$ for $m = 1, 2, \ldots, M-1$ and $i = 1, 2, \ldots, N$. Also, the decomposition is reversible, $x_i = \sum_{m=1}^{M-1} X^m_i$ for $i = 1, 2, \ldots, N$. The decomposition can now be applied to the WOS filter operation:
$$y(n) = w_0\text{-th largest}[x_1 \diamond w_1, \ldots, x_N \diamond w_N]$$
$$\phantom{y(n)} = w_0\text{-th largest}\left[\left(\sum_{m=1}^{M-1} X^m_1\right) \diamond w_1, \ldots, \left(\sum_{m=1}^{M-1} X^m_N\right) \diamond w_N\right].$$
Since replicating each $x_i$ sample $w_i$ times is equivalent to replicating each $X^m_i$ binary sample $w_i$ times and adding all these together, the above can be written as
$$y(n) = w_0\text{-th largest}\left[\sum_{m=1}^{M-1} (X^m_1 \diamond w_1), \ldots, \sum_{m=1}^{M-1} (X^m_N \diamond w_N)\right].$$
The next step is to invoke the stacking property of threshold decomposition, which states that if $X^p_i = 1$ for a given threshold level $p$, then $X^q_i = 1$ for all levels $q < p$. Similarly, if $X^p_i = 0$ then $X^q_i = 0$ for all $q > p$. Thus, finding the $w_0$-th largest sample in the set $x_1 \diamond w_1, \ldots, x_N \diamond w_N$ is equivalent to finding the maximum level $m$ at which there are $w_0$ or more ones in the set $X^m_1 \diamond w_1, \ldots, X^m_N \diamond w_N$. The maximum level which satisfies this condition, in turn, can be found by counting the levels which have $w_0$ or more ones in the binary vectors. Hence, the output of the WOS filter can be written as
$$y(n) = \sum_{m=1}^{M-1} w_0\text{-th largest}[X^m_1 \diamond w_1, \ldots, X^m_N \diamond w_N].$$
This expression can be further simplified as [�]
$$y(n) = \sum_{m=1}^{M-1} f(\mathbf{X}^m; \mathbf{w}),$$
where the function $f(\mathbf{X}; \mathbf{w})$ is a linearly separable threshold function
$$f(\mathbf{X}^m; \mathbf{w}) = \begin{cases} 1 & \text{if } \sum_{i=1}^{N} w_i X^m_i \ge w_0 \\ 0 & \text{otherwise.} \end{cases}$$
The output of a WOS filter can be finally expressed as
$$y(n) = \sum_{m=1}^{M-1} U(\tilde{\mathbf{W}}^T \tilde{\mathbf{X}}^m),$$
where $U(\cdot)$ is a unit step function, and where $\tilde{\mathbf{W}} = [w_0, w_1, w_2, \ldots, w_N]^T$ and $\tilde{\mathbf{X}}^m = [-1, X^m_1, \ldots, X^m_N]^T$ are the extended weight and extended observation vectors, respectively.

Hence, the WOS filter output is shown to not only satisfy the threshold decomposition property but also to be characterized by a sum of linear threshold functions. Notice that in the threshold domain representation, the WOS filter weights, including $w_0$, are required to be positive but can also be real-valued. The restriction that the input be integer-valued can also be relaxed to allow for real-valued observations. Next, we generalize the threshold decomposition architecture to handle real-valued signals [�].

Take $x(n)$ to be nonnegative and real-valued. The nonnegative constraint is taken for convenience, and will be relaxed shortly. As in the integer-valued case, a real-valued observation $x(n)$ can be decomposed into a set of binary signals,
$$X^{\theta}(n) = U(x(n) - \theta),$$
from which $x(n)$ can be recovered:
$$x(n) = \int_0^{\infty} X^{\theta}(n)\, d\theta = \int_0^{\infty} U(x(n) - \theta)\, d\theta.$$
The WOS filtering of a real-valued signal can now be implemented using threshold decomposition as
$$y(n) = \int_0^{\infty} U(\tilde{\mathbf{W}}^T \tilde{\mathbf{X}}^{\theta})\, d\theta,$$
where $\tilde{\mathbf{X}}^{\theta} = [-1, X^{\theta}_1, X^{\theta}_2, \ldots, X^{\theta}_N]^T$. The integration is simplified by the fact that the observation vector contains, at most, $N$ different valued samples. Consequently, there are at most $N + 1$ different binary vectors $\mathbf{X}^{\theta}$. The possible vectors are
$$\mathbf{X}^{\theta} = \begin{cases} [1, 1, \ldots, 1]^T & \text{if } \theta \in [0, x_{(1)}] \\ [X^{x_{(i)}}_1, X^{x_{(i)}}_2, \ldots, X^{x_{(i)}}_N]^T & \text{if } \theta \in (x_{(i-1)}, x_{(i)}] \\ [0, 0, \ldots, 0]^T & \text{if } \theta \in (x_{(N)}, \infty). \end{cases}$$
Using this fact it can be shown that
$$y(n) = x_{(1)} + \sum_{i=2}^{N} (x_{(i)} - x_{(i-1)})\, U(\tilde{\mathbf{W}}^T \tilde{\mathbf{X}}^{x_{(i)}}).$$
This decomposition holds for both integer- and real-valued signals, as well as those that are not strictly positive. Moreover, this decomposition is much more efficient than that originally derived for integer-valued signals, since it requires only $N - 1$ threshold logic operations rather than $M - 1$. This reduction in complexity simplifies WOS analysis and optimization, both of which are performed in the threshold domain. By combining this threshold decomposition with unit step function approximations, fast adaptive optimization algorithms can be developed. This is the approach taken in Section �, which describes the optimization of WOS filters.
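The real-valued threshold decomposition can be compared against direct WOS selection with a brief sketch. The function names are hypothetical; the sketch assumes that the unit step satisfies $U(0) = 1$, that a sample is "on" at a threshold when it is greater than or equal to it, and that $w_0$ does not exceed the sum of the weights.

```python
import numpy as np

def wos_direct(x, w, w0):
    """Direct WOS output: accumulate weights from the largest sample down."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)[::-1]
    return x[order[np.argmax(np.cumsum(w[order]) >= w0)]]

def wos_by_threshold_decomposition(x, w, w0):
    """WOS output as x_(1) plus increments gated by linear threshold functions."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    ordered = np.sort(x)
    y = ordered[0]
    for i in range(1, len(x)):
        binary = (x >= ordered[i]).astype(float)      # X thresholded at x_(i)
        step = 1.0 if w @ binary - w0 >= 0 else 0.0   # unit step of W~^T X~
        y += (ordered[i] - ordered[i - 1]) * step
    return y
```

Only the $N - 1$ thresholds located at the order statistics $x_{(2)}, \ldots, x_{(N)}$ are evaluated, and negative sample values need no special handling.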

� Time�Rank Coupling Extensions� PWOS Filters

The generalizations of the median filter discussed in the previous section are based on the weighting of samples. In the most general case covered, WOS filters, the observation samples are weighted according to their temporal-order prior to rank ordering. This median filter generalization method, as well as others such as stack filters [�], has been proposed to incorporate some form of temporal-order information into rank-order filters. Still, due to their constrained nature, these methods do not fully utilize the information contained in both the temporal and rank ordering of the observed data [�].

An observation set of samples can, of course, be ordered in many ways. In most practical situations samples are observed on a time ordered basis, e.g., from a sensor which is regularly sampled. This results in the temporal-ordered observation $\mathbf{x}$. The samples comprising $\mathbf{x}$ can be permuted such that they are ordered according to a different criterion, such as rank. The rank-ordered samples are written as $\mathbf{x}_L$. Thus, the mapping $\mathbf{x} \mapsto \mathbf{x}_L$ is simply a permutation of samples. Moreover, this permutation mapping contains both the temporal and rank orderings of an observation set of samples.

The temporal and rank natural orderings are important for the filtering process. Rank-order information is vital for reducing the effect of outliers in non-Gaussian environments and accurately tracking non-stationary signal discontinuities. Conversely, temporal-order information is essential for preserving/rejecting signal frequency content and processing temporally correlated signals. The class of Permutation (P) filters has been designed to take full advantage of the permutation mapping $\mathbf{x} \mapsto \mathbf{x}_L$ and, consequently, both the temporal- and rank-order of observation samples. By utilizing both orderings, permutation filters have been shown to be both robust and frequency selective [�]. Moreover, the temporal- and rank-order information can be simply augmented with additional statistics, resulting in extended P filters [�], which have additional capabilities. Selection P filters contain WOS filters, stack filters [�], and some compositions of discrete morphological operators as a proper subset.

The use of the observation permutation as a basis for filtering has considerable advantages. However, the factorial growth in the number of permutations, as a function of window size, limits the practicality of using the full permutation information. Thus, a subset of the permutation information must be used in practice. Optimizing on what, and how much, temporal and rank information should be used is very difficult. Therefore, we adopt a nested lattice formulation of permutation filters. This lattice formulation gives a well structured method for controlling the amount of temporal and rank information used. Each vertex of the lattice defines a class of P filters which uses a fixed amount of temporal- and rank-order information. This lattice is an extension of the $L\ell$ ordering used in [�]. This extension results in an $L^{j}\ell$ time/rank ordering and lattice, where $j$ indicates the amount of rank information incorporated. To illustrate the concept, the following discussion starts with the simple $L^{1}\ell$ case. Extensions are then made to the more general cases.

�� $L^{1}\ell$ PWOS filters

The WOS filter operates on limited temporal- and rank-order information. Clearly, samples are weighted according to their temporal-order or, equivalently, their location within the observation window. The expanded set is then rank ordered and the $w_0$-th sample taken as the output. The observation samples are ordered only after weighting. That is, the weight applied to a sample is not dependent upon its rank-order. For instance, if the center sample is heavily weighted to reflect its importance, then the observation sample in that location is emphasized regardless of whether it is a "good" sample or an outlier. In fact, all outliers are emphasized under this scheme, since each outlier occupies the center observation window location once, assuming the window is sequentially shifted over the sequence one sample at a time.

Footnote: Two orderings that arise naturally are temporal and rank. Other natural orderings include spatial, spectral, and likelihood.

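The samples in the observation window can be more appropriately weighted by considering the temporal- and rank-order of each sample. To accomplish this weighting, define the rank indicator vector $\mathbf{R}_i = [R_{i1}, R_{i2}, \ldots, R_{iN}]^T$, where
$$R_{ik} = \begin{cases} 1 & \text{if } x_i \leftrightarrow x_{(k)} \\ 0 & \text{otherwise} \end{cases}$$
and $x_i \leftrightarrow x_{(k)}$ means that the $k$th order statistic occupies the $i$th temporal location in $\mathbf{x}$. Let the variable $r_i$ be the rank of $x_i$ in $\mathbf{x}_L$; hence, $R_{i r_i} = 1$ by definition. Thus, $\mathbf{R}_i$ is a length $N$ binary vector with a 1 in position $r_i$. The other $N - 1$ positions in the vector are zeros. The $N$ rank indicators can be combined into an $N^2 \times N$ matrix $\mathbf{P}$ that indicates the rank of each sample,
$$\mathbf{P} = \begin{bmatrix} \mathbf{R}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{R}_2 & \cdots & \mathbf{0} \\ \vdots & & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{R}_N \end{bmatrix},$$
where $\mathbf{0}$ is an $N$ long vector of zeros.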

Example �: Consider the 4 sample (temporally-ordered) observation x = �, which results in the rank-ordered vector x_L = �. The four rank indicator vectors and their respective rank parameters are

R� � �� T � r� � � R� � �� T � r� � �R � �� T � r � � R � �� T � r � ��

Combining them into the P matrix produces�

P �

��

� � �� j � � � j � � � j � � �

� � � j � �� j � � � j � � �

� � � j � � � j � � � � j � � �

� � � j � � � j � � � j ��

��T

� ��

where the vertical separation bars have been added for convenience. Thus, the first section of the $\mathbf{P}$ matrix gives the rank of $x_1$, the second gives the rank of $x_2$, and so on until the last section, which gives the rank of $x_N$.

Footnote: Note that this same rank information could be represented by an $N^2$ element vector. We use the less efficient representation only to allow simple matrix products. This will simplify the notation used shortly.


Having defined $\mathbf{P}$, which gives the temporal- and rank-order of each sample, we can now define a corresponding weight vector. Since the goal is to weight each sample according to its temporal- and rank-order, the weight vector must have $N^2$ entries. Consider the $i$th temporal sample $x_i$. This sample can take on $N$ rank values, so $N$ weights must be associated with this sample. Define the weight vector
$$\mathbf{W}_i = [w_{i(1)}, w_{i(2)}, \ldots, w_{i(N)}]^T$$
with positive valued elements to be that associated with $x_i$. Thus, each $x_i$ has $N$ weights, and the single weight used at any given instant depends on the rank of $x_i$. Recalling that $r_i$ is the rank of $x_i$, the weight used at each instant is $w_{i(r_i)}$. Thus, each observation sample is weighted according to both its temporal- and rank-order.

The N weight vectors can be stacked to form a single PWOS weight vector�

W � �WT� jWT� j � � � jWTN �T � ��

The appropriate weights from W (only $N$ weights are used at any given time) can be selected using P. Once the weights are selected, the output of a PWOS filter is found in a manner analogous to the WOS filter output. Formally, the PWOS output is defined as

$$
\begin{aligned}
y(n) &= W_0\text{th Largest}\,\bigl(x^T \diamond W^T P\bigr) \\
     &= W_0\text{th Largest}\,\bigl(x_1 \diamond W_1^T P_1,\; x_2 \diamond W_2^T P_2,\; \ldots,\; x_N \diamond W_N^T P_N\bigr) \\
     &= W_0\text{th Largest}\,\bigl(x_1 \diamond w_{1,(r_1)},\; x_2 \diamond w_{2,(r_2)},\; \ldots,\; x_N \diamond w_{N,(r_N)}\bigr),
\end{aligned}
$$

where $\diamond$ denotes the replication operator. Thus, each input sample is weighted according to its temporal- and rank-order, and the $W_0$th largest sample is chosen as the output from the expanded set. Since the weight of each sample depends on the temporal- and rank-order of one sample (itself), this filter is said to use $L^1\ell$ temporal-rank information and reside at the $L^1\ell$ location on the $L\ell$ lattice, which is defined shortly.
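For integer-valued weights, the definition above amounts to replicating each sample as many times as its selected weight and then picking the $W_0$th largest element of the expanded set. The following Python sketch illustrates this under stated assumptions: the function names, the sample values, and the weights are hypothetical, rank 1 is taken to be the smallest sample, and ties are broken by temporal order.

```python
def ranks(x):
    """1-based rank r_i of each sample; rank 1 = smallest (assumed convention).
    Ties are broken by temporal order (assumption)."""
    order = sorted(range(len(x)), key=lambda i: (x[i], i))
    r = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def pwos_output(x, W, w0):
    """PWOS output for positive integer weights.

    x  : observation vector of length N
    W  : list of N weight vectors; W[i][k-1] is the weight of x_i at rank k
    w0 : threshold; the w0-th largest element of the expanded set is returned
    """
    r = ranks(x)
    expanded = []
    for i, sample in enumerate(x):
        weight = W[i][r[i] - 1]            # w_{i,(r_i)}
        expanded.extend([sample] * weight)  # replicate the sample
    if not 1 <= w0 <= len(expanded):
        raise ValueError("threshold must lie between 1 and the sum of the selected weights")
    expanded.sort(reverse=True)
    return expanded[w0 - 1]


# Hypothetical window-size-3 filter: extreme ranks get small weights.
W = [[1, 2, 1],
     [1, 3, 1],
     [1, 2, 1]]
x = [5, 2, 8]
y = pwos_output(x, W, 2)   # expanded set {5, 5, 2, 8}; second largest is 5
```

Note that the output is always one of the input samples, as required of a selection filter, and that the sum of the selected weights (and hence the valid range of the threshold) changes with the ranks of the observed samples.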

The following example illustrates the operation of PWOS filtering. The weights in the example are integer-valued. However, like WOS filters, PWOS filter weights need only be positive. We give only an integer-valued weight PWOS example, as the output for real-valued weights is found similarly to the WOS case.

Example �� Consider the window size � PWOS lter with W� � �� Let x � �x�� x�� x� �

�� then� xr � �x�� x�� x�� Let the PWOS weight vector be

W � �w�� w�� w��j � � � jw�� w�� w��T

� � �� j�� j�� T ��



From the observation vector, we can compute the matrix P. The rank indicator vectors for x

are�R� � �� T � r� � �R� � �� T � r� � �R � �� T � r � ��

��

The weights obtained for the replication of the input samples are computed via $W^T P$, which evaluates to

WTP � �� j�� j��

�� j � � � j � �

� � � j �� j � �

� � � j � � � j � ��

��T

��

� ��

The output of the filter is

y�n� � ��thLargest�xT WTP�

� ��thLargest��

� ��thLargest��

��

�

The advantage of considering both temporal- and rank-order when assigning weights is that outliers can be detected and given a smaller weight. This is illustrated in the figure below, which shows optimal PWOS filter weights plotted as a mesh function of temporal- and rank-order. The input to this filter was an image corrupted by heavy-tailed noise. As the figure shows, the samples given the most weight are centrally located in both time and rank. This makes intuitive sense, as the central temporal samples are expected to be more correlated with the desired center sample than those that are temporally distant. Similarly, samples that lie in the extreme ranks may be outliers and should be given smaller weight. Next, we extend temporal-rank coupling to include more than one sample.

$L^j\ell$ PWOS filters

The weighting scheme derived in the previous section can be extended to include information on the rank-order of multiple samples. Thus, each input sample can be weighted according to not only its own temporal- and rank-order, but also the rank-order of its neighbors. This scheme allows the ranks of adjacent samples to be compared during the weighting process. Through such comparisons, it can be better determined whether a sample is truly an outlier. For instance, if two adjacent samples both have high rank, then they may simply be samples that crossed an edge. If only one sample has high rank, then it is more likely to be an outlier. To take advantage of neighboring rank information, a general $L^j\ell$ rank coupling technique is developed next.



[Figure: mesh plot of filter weight versus temporal-order and rank-order, each axis running from 1 to 9.]

Figure: Optimal window size 9 PWOS filter weights plotted as a mesh function of temporal- and rank-order. The greatest weight is given to those samples that are centrally located in both time and rank.

In the previous section, rank indicators were used to characterize the rank of each (temporal) sample $x_i$. Suppose that we want to jointly characterize the ranks of two input samples, $x_i$ and $x_{i\oplus1}$. If the rank indicator vector for $x_i$, $R_i$, is given, then we can form an additional indicator vector for $x_{i\oplus1}$ that does not contain the information provided in $R_i$. This vector, denoted by $R_i^1$, is the length $N-1$ reduced indicator vector formed by removing the $r_i$th element from $R_{i\oplus1}$. Thus, $R_i$ gives the rank of $x_i$, and $R_i^1$ gives the rank of $x_{i\oplus1}$ given that we already know the rank of $x_i$.

We can extend this concept to more than two samples. Associated with the $i$th input sample $x_i$, the reduced rank indicator $R_i^a$ is formed by removing the $r_i$th, $r_{i\oplus1}$th, $\ldots$, $r_{i\oplus(a-1)}$th elements from the vector $R_{i\oplus a}$, where $\oplus$ denotes Modulo $N$ addition, $i \oplus a = (i + a)$ Mod $N$. For example, for a given observation vector $x$ and its sorted version $x_r$, the rank indicator vectors and their respective rank parameters are

R� � �� T � r� � � R� � �� T � r� � �R � �� T � r � � R � �� T � r � ��

The reduced rank indicator vectors $R_1^1$ and $R_1^2$ are, for example,

R� � �� r� � � �� T � �� TR� � �� r� � � �� r� �T � �� T

��

where the $r_1$th sample was removed from $R_{1\oplus1} = R_2$ to get $R_1^1$, and where the $r_1$th and $r_2$th samples were deleted from $R_{1\oplus2} = R_3$ to get $R_1^2$.

The Modulo $N$ operation defined here is on the group $\{1, \ldots, N\}$, such that $N$ Mod $N = N$ and $(N+1)$ Mod $N = 1$. The ranks can, of course, be coupled in a fashion other than the cyclical Modulo $N$ method used here; e.g., the next sample coupled to $x_i$ could be that of minimum temporal distance from $x_i$, resulting in coupling progressions $x_i, x_{i+1}, x_{i-1}, x_{i+2}, \ldots$ Such couplings result in similar filter structures and results. For simplicity, we use the notationally simple cyclic Modulo $N$ coupling here.
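The reduced rank indicator construction, together with the cyclic Modulo $N$ coupling on $\{1, \ldots, N\}$, can be sketched as follows. The function names and the sample values are hypothetical, and the conventions that rank 1 is the smallest sample and that ties are broken by temporal order are assumptions made for this illustration.

```python
def ranks(x):
    """1-based rank r_i of each sample; rank 1 = smallest (assumed convention).
    Ties are broken by temporal order (assumption)."""
    order = sorted(range(len(x)), key=lambda i: (x[i], i))
    r = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def rank_indicator(r_i, n):
    """Length-n indicator vector with a single 1 in position r_i."""
    return [1 if k == r_i else 0 for k in range(1, n + 1)]


def mod_add(i, a, n):
    """Modulo-N addition on {1, ..., N}: N maps to N, and N + 1 maps to 1."""
    return (i + a - 1) % n + 1


def reduced_rank_indicator(x, i, a):
    """R_i^a for 1-based sample index i and 0 <= a <= N - 1.

    Start from R_{i (+) a} and delete the elements in positions
    r_i, r_{i (+) 1}, ..., r_{i (+) (a - 1)}, leaving N - a elements.
    """
    n = len(x)
    if not (1 <= i <= n and 0 <= a <= n - 1):
        raise ValueError("need 1 <= i <= N and 0 <= a <= N - 1")
    r = ranks(x)
    target = mod_add(i, a, n)
    removed = {r[mod_add(i, b, n) - 1] for b in range(a)}
    full = rank_indicator(r[target - 1], n)
    return [full[k - 1] for k in range(1, n + 1) if k not in removed]


x = [5, 2, 8, 1]                          # illustrative values; ranks are [3, 2, 4, 1]
R1_1 = reduced_rank_indicator(x, 1, 1)    # [0, 1, 0]
R1_2 = reduced_rank_indicator(x, 1, 2)    # [0, 1]
R4_1 = reduced_rank_indicator(x, 4, 1)    # wraps around to x_1: [0, 1, 0]
```

Because the ranks are distinct, the position of the 1 in $R_{i\oplus a}$ is never among the deleted positions, so every reduced indicator still contains exactly one nonzero entry.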



The rank indicator vectors $R_i, R_i^1, \ldots, R_i^{j-1}$ can be used to express the ranks of $j$ consecutive samples starting at $x_i$. The rank permutation indicator associated with the $i$th input sample $x_i$ is defined as

$$P_i^j = R_i \otimes R_i^1 \otimes \cdots \otimes R_i^{j-1}$$

for $1 \le j \le N$, where $\otimes$ denotes the matrix Kronecker product. Note that the vector $P_i^j$ has length $N(N-1)\cdots(N-j+1) = P_N^j$, the number of ordered selections of $j$ ranks out of $N$. The indicator vector $P_i^j$ characterizes the relative ranking of the samples $x_i, x_{i\oplus1}, \ldots, x_{i\oplus(j-1)}$. Thus, P�i contains no ran
